author | Will McVicker <willmcvicker@google.com> | 2024-04-15 11:41:22 -0700
committer | Will McVicker <willmcvicker@google.com> | 2024-04-16 10:17:07 -0700
commit | 0aa4c41c172f1e2acdf976c655f75a7a21db9791 (patch)
tree | 878a00410737d020c7be8fa0e2ab6849e310645e
parent | de85b3c05698f1ce2829d3ff977dee90be48b2d8 (diff)
parent | cfb55729953d62d99f66b0adc59963b189e9394b (diff)
download | gpu-android14-gs-pixel-6.1.tar.gz
Merge aosp/android-gs-raviole-5.10-android14-qpr2 into aosp/android14-gs-pixel-6.1
* aosp/android-gs-raviole-5.10-android14-qpr2: (354 commits)
[Official] MIDCET-5090, GPUCORE-40350: Flushes for L2 powerdown
Fix invalid page table entries from occurring.
Fix deadlock BTW user thread and page fault worker
Fix deadlock BTW user thread and page fault worker
csf: Fix kbase_kcpu_command_queue UaF due to bad queue creation
Fix kernel build warnings
Fix kernel build warnings
Add firmware core dump error code in sscd
GPUCORE-39469 Error handling for invalid slot when parsing trace data
mali_kbase: platform: Add missing bounds check
mali_kbase: Zero-initialize the dump_bufs_meta array
mali_kbase: Fix OOB write in kbase_csf_cpu_queue_dump()
mali_kbase: Move epoll-consumed waitqueue to struct kbase_file
Integrate firmware core dump into sscd
MIDCET-4870: Fix GPU page fault issue due to reclaiming of Tiler heap chunks
mali_kbase: platform: Fix integer overflow
mali_kbase: Tracepoints for governor recommendation
mali_kbase: Add tracepoints to hint_min_freq / hint_max_freq
mali_kbase: Enable mali_kutf_clk_rate_trace_test_portal build
mali_kbase: restore CSF ftrace events
Refactor helpers for creating RT threads
Update KMD to 'mini release: update r44p1-00dev2 to r44p1-00dev3'
mali_kbase: Use kthread for protm_event_worker
GPUCORE-34589 jit_lock all JIT operations
[Official] MIDCET-4458, GPUCORE-36765: Stop the use of tracking page for GPU memory accounting
mali_kbase: Unmask RESET_COMPLETED irq before resetting the GPU
[Official] MIDCET-4820,GPUCORE-36255 Sync whole USER_BUFFER pages upon GPU mapping
mali_kbase: Use rt_mutex for scheduler lock
mali_kbase: fix incorrect auto-merger change
mali_pixel: Disable mgm debugfs by default
mali_kbase: platform: Batch MMU flushes after liveness update
mali_kbase: refactor kbase_mmu_update_pages
[Official] MIDCET-4806,GPUCORE-38732 Continue FLUSH_MEM after power transition timeout
mali_pixel: mgm: Compensate for group migration
mali_pixel: mgm: Remove race condition
mali_pixel: mgm: Refactor update_size
mali_kbase: add missing deinitialization
[Official] MIDCET-4458, GPUCORE-36765: Stop the use of tracking page for GPU memory accounting
mali_kbase: restore hysteresis time.
Update KMD to 'mini release: update r44p1-01bet1 to r44p1-00dev2'
mali_kbase: Reduce kernel log spam.
csf: Setup kcpu_fence->metadata before accessing it
mali_kbase: Add an ITMON notifier callback to check GPU page tables.
mali_kbase: shorten 'mali_kbase_*' thread names
Constrain protected memory allocation during FW initialization
Merge upstream DDK R43P0 KMD
Mali allocations: unconditionally check for pending kill signals
pixel_gpu_uevent: Increase uevent ratelimiting timeout to 20mins
GPUCORE-38292 Fix Use-After-Free Race with Memory-Pool Grow
kbase: csf: Reboot on failed GPU reset
Add missing hwaccess_lock around atom_flags updates.
GPUCORE-35754: Add barrier before updating GLB_DB_REQ to ring CSG DB
mali_kbase: Enable kutf modules
GPUCORE-36682 Lock MMU while disabling AS to prevent use after free
kbase_mem: Reduce per-memory-group pool size to 4.
mali_pixel: mgm: Ensure partition size is set to 0 when disabled.
GPUCORE-37961 Deadlock issue due to lock ordering issue
Make sure jobs are flushed before kbasep_platform_context_term
[Official] MIDCET-4546, GPUCORE-37946: Synchronize GPU cache flush cmds with silent reset on GPU power up
mali_kbase: hold GPU utilization for premature update.
mali_kbase: Remove incorrect WARN()
MIDCET-4324/GPUCORE-35611 Unmapping of aliased sink-page memory
[Official] MIDCET-4458, GPUCORE-36402: Check for process exit before page alloc from kthread
Revert "Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling""
Mali Valhall Android DDK r43p0-01eac0 KMD
Mali Valhall Android DDK r42p0-01eac0 KMD
mali_kbase: platform: [SLC-VK] Add new MGM group id for explicit SLC allocations.
mali_kbase: [SLC-VK] Add new BASE_MEM_GROUP for explicit SLC allocations.
mali_kbase: [SLC-VK] Add CCTX memory class for explicit SLC allocations.
platform: Fix mgm_term_data behavior
platform: Disable the GPU SLC partition when not in demand
Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling"
Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free"
Revert "GPUCORE-36748 Fix kbase_gpu_mmap() error handling"
Revert "GPUCORE-36682 Lock MMU while disabling AS to prevent use after free"
[Official] MIDCET-4458, GPUCORE-36429: Prevent JIT allocations following unmap of tracking page
[Official] MIDCET-4458, GPUCORE-36635 Fix memory leak via GROUP_SUSPEND
Flush mmu updates regardless of coherency mode
kbase: Add a debugfs file to test GPU uevents
kbase: Add new GPU uevents to kbase
pixel: Introduce GPU uevents to notify userspace of GPU failures
[Official] MIDCET-4458, GPUCORE-36654 Use %pK on GPU bus fault
mali_kbase: platform: Init GPU SLC context
Add partial term support to pixel gpu init
mali_kbase: Add missing wake_up(poweroff_wait) when cancelling poweroff.
mali_pixel: mgm: Factor out common code between enabling/mutating partitions
mali_pixel: mgm: Get accurate size from slc pt mutate
mali_kbase: platform: mgm: Get accurate SLC partition size
mali_kbase: Remove redundant if check to unblock suspend
mali_kbase: reset: Flush SSCD worker before resetting the GPU
pixel_gpu_sscd: Prevent dumping multiple SSCDs when the GPU hangs
mali_kbase: reset: Add a helper to check GPU reset failure
mali-pma: Defer probing until the dma_heap is found
Revert "mali_kbase: mem: Prevent vma splits"
GPUCORE-36682 Lock MMU while disabling AS to prevent use after free
GPUCORE-36748 Fix kbase_gpu_mmap() error handling
Powercycle mali to recover from a PM timeout
mali_pixel: Downgrade invalid region warning to dev_dbg
mali_pixel: Fix PBHA bit pos for ZUMA and PRO
mali_kbase: platform: Perform partition resize and region migration
...
Test: Verify `git diff aosp/android-gs-raviole-5.10-android14-qpr2..HEAD`
Change-Id: I0711654dd45ae2996e837ce3353f0790394d7c72
Signed-off-by: Will McVicker <willmcvicker@google.com>
332 files changed, 40150 insertions, 15611 deletions
diff --git a/mali_kbase/arbiter/mali_kbase_arbiter_interface.h b/common/include/linux/mali_arbiter_interface.h index a0ca1cc..8e675ec 100644 --- a/mali_kbase/arbiter/mali_kbase_arbiter_interface.h +++ b/common/include/linux/mali_arbiter_interface.h @@ -41,7 +41,7 @@ * 4 - Added max_config support * 5 - Added GPU clock frequency reporting support from arbiter */ -#define MALI_KBASE_ARBITER_INTERFACE_VERSION 5 +#define MALI_ARBITER_INTERFACE_VERSION 5 /** * DOC: NO_FREQ is used in case platform doesn't support reporting frequency diff --git a/common/include/linux/mali_kbase_debug_coresight_csf.h b/common/include/linux/mali_kbase_debug_coresight_csf.h new file mode 100644 index 0000000..8356fd4 --- /dev/null +++ b/common/include/linux/mali_kbase_debug_coresight_csf.h @@ -0,0 +1,241 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CORESIGHT_CSF_ +#define _KBASE_DEBUG_CORESIGHT_CSF_ + +#include <linux/types.h> +#include <linux/list.h> + +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP 0U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM 1U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE 2U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE 3U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ 4U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL 5U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR 6U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR 7U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND 8U +#define KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT 9U + +/** + * struct kbase_debug_coresight_csf_write_imm_op - Coresight immediate write operation structure + * + * @reg_addr: Register address to write to. + * @val: Value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_imm_op { + __u32 reg_addr; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_write_imm_range_op - Coresight immediate write range + * operation structure + * + * @reg_start: Register address to start writing from. + * @reg_end: Register address to stop writing from. End address included in the write range. + * @val: Value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_imm_range_op { + __u32 reg_start; + __u32 reg_end; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_write_op - Coresight write operation structure + * + * @reg_addr: Register address to write to. + * @ptr: Pointer to the value to write at @reg_addr. + */ +struct kbase_debug_coresight_csf_write_op { + __u32 reg_addr; + __u32 *ptr; +}; + +/** + * struct kbase_debug_coresight_csf_read_op - Coresight read operation structure + * + * @reg_addr: Register address to read. + * @ptr: Pointer where to store the read value. 
+ */ +struct kbase_debug_coresight_csf_read_op { + __u32 reg_addr; + __u32 *ptr; +}; + +/** + * struct kbase_debug_coresight_csf_poll_op - Coresight poll operation structure + * + * @reg_addr: Register address to poll. + * @val: Expected value after poll. + * @mask: Mask to apply on the read value from @reg_addr when comparing against @val. + */ +struct kbase_debug_coresight_csf_poll_op { + __u32 reg_addr; + __u32 val; + __u32 mask; +}; + +/** + * struct kbase_debug_coresight_csf_bitw_op - Coresight bitwise operation structure + * + * @ptr: Pointer to the variable on which to execute the bit operation. + * @val: Value with which the operation should be executed against @ptr value. + */ +struct kbase_debug_coresight_csf_bitw_op { + __u32 *ptr; + __u32 val; +}; + +/** + * struct kbase_debug_coresight_csf_op - Coresight supported operations + * + * @type: Operation type. + * @padding: Padding for 64bit alignment. + * @op: Operation union. + * @op.write_imm: Parameters for immediate write operation. + * @op.write_imm_range: Parameters for immediate range write operation. + * @op.write: Parameters for write operation. + * @op.read: Parameters for read operation. + * @op.poll: Parameters for poll operation. + * @op.bitw: Parameters for bitwise operation. + * @op.padding: Padding for 64bit alignment. + * + * All operation structures should include padding to ensure they are the same size. + */ +struct kbase_debug_coresight_csf_op { + __u8 type; + __u8 padding[7]; + union { + struct kbase_debug_coresight_csf_write_imm_op write_imm; + struct kbase_debug_coresight_csf_write_imm_range_op write_imm_range; + struct kbase_debug_coresight_csf_write_op write; + struct kbase_debug_coresight_csf_read_op read; + struct kbase_debug_coresight_csf_poll_op poll; + struct kbase_debug_coresight_csf_bitw_op bitw; + u32 padding[3]; + } op; +}; + +/** + * struct kbase_debug_coresight_csf_sequence - Coresight sequence of operations + * + * @ops: Arrays containing Coresight operations. + * @nr_ops: Size of @ops. + */ +struct kbase_debug_coresight_csf_sequence { + struct kbase_debug_coresight_csf_op *ops; + int nr_ops; +}; + +/** + * struct kbase_debug_coresight_csf_address_range - Coresight client address range + * + * @start: Start offset of the address range. + * @end: End offset of the address range. + */ +struct kbase_debug_coresight_csf_address_range { + __u32 start; + __u32 end; +}; + +/** + * kbase_debug_coresight_csf_register - Register as a client for set ranges of MCU memory. + * + * @drv_data: Pointer to driver device data. + * @ranges: Pointer to an array of struct kbase_debug_coresight_csf_address_range + * that contains start and end addresses that the client will manage. + * @nr_ranges: Size of @ranges array. + * + * This function checks @ranges against current client claimed ranges. If there + * are no overlaps, a new client is created and added to the list. + * + * Return: A pointer of the registered client instance on success. NULL on failure. + */ +void *kbase_debug_coresight_csf_register(void *drv_data, + struct kbase_debug_coresight_csf_address_range *ranges, + int nr_ranges); + +/** + * kbase_debug_coresight_csf_unregister - Removes a coresight client. + * + * @client_data: A pointer to a coresight client. + * + * This function removes a client from the client list and frees the client struct. + */ +void kbase_debug_coresight_csf_unregister(void *client_data); + +/** + * kbase_debug_coresight_csf_config_create - Creates a configuration containing + * enable and disable sequence. 
+ * + * @client_data: Pointer to a coresight client. + * @enable_seq: Pointer to a struct containing the ops needed to enable coresight blocks. + * It's optional so could be NULL. + * @disable_seq: Pointer to a struct containing ops to run to disable coresight blocks. + * It's optional so could be NULL. + * + * Return: Valid pointer on success. NULL on failure. + */ +void * +kbase_debug_coresight_csf_config_create(void *client_data, + struct kbase_debug_coresight_csf_sequence *enable_seq, + struct kbase_debug_coresight_csf_sequence *disable_seq); +/** + * kbase_debug_coresight_csf_config_free - Frees a configuration containing + * enable and disable sequence. + * + * @config_data: Pointer to a coresight configuration. + */ +void kbase_debug_coresight_csf_config_free(void *config_data); + +/** + * kbase_debug_coresight_csf_config_enable - Enables a coresight configuration + * + * @config_data: Pointer to coresight configuration. + * + * If GPU is turned on, the configuration is immediately applied the CoreSight blocks. + * If the GPU is turned off, the configuration is scheduled to be applied on the next + * time the GPU is turned on. + * + * A configuration is enabled by executing read/write/poll ops defined in config->enable_seq. + * + * Return: 0 if success. Error code on failure. + */ +int kbase_debug_coresight_csf_config_enable(void *config_data); +/** + * kbase_debug_coresight_csf_config_disable - Disables a coresight configuration + * + * @config_data: Pointer to coresight configuration. + * + * If the GPU is turned off, this is effective a NOP as kbase should have disabled + * the configuration when GPU is off. + * If the GPU is on, the configuration will be disabled. + * + * A configuration is disabled by executing read/write/poll ops defined in config->disable_seq. + * + * Return: 0 if success. Error code on failure. + */ +int kbase_debug_coresight_csf_config_disable(void *config_data); + +#endif /* _KBASE_DEBUG_CORESIGHT_CSF_ */ diff --git a/common/include/linux/memory_group_manager.h b/common/include/linux/memory_group_manager.h index efa35f5..7561363 100644 --- a/common/include/linux/memory_group_manager.h +++ b/common/include/linux/memory_group_manager.h @@ -30,7 +30,7 @@ typedef int vm_fault_t; #endif -#define MEMORY_GROUP_MANAGER_NR_GROUPS (16) +#define MEMORY_GROUP_MANAGER_NR_GROUPS (4) struct memory_group_manager_device; struct memory_group_manager_import_data; @@ -43,6 +43,8 @@ struct memory_group_manager_import_data; * @mgm_free_page: Callback to free physical memory in a group * @mgm_get_import_memory_id: Callback to get the group ID for imported memory * @mgm_update_gpu_pte: Callback to modify a GPU page table entry + * @mgm_pte_to_original_pte: Callback to get the original PTE entry as given + * to mgm_update_gpu_pte * @mgm_vmf_insert_pfn_prot: Callback to map a physical memory page for the CPU */ struct memory_group_manager_ops { @@ -120,7 +122,8 @@ struct memory_group_manager_ops { * This function allows the memory group manager to modify a GPU page * table entry before it is stored by the kbase module (controller * driver). It may set certain bits in the page table entry attributes - * or in the physical address, based on the physical memory group ID. + * or modify the physical address, based on the physical memory group ID + * and/or additional data in struct memory_group_manager_device. * * Return: A modified GPU page table entry to be stored in a page table. 
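The kbase_debug_coresight_csf.h interface added above is a small client API: claim MCU address ranges, build a configuration from enable/disable operation sequences, then enable it. Below is a minimal, hypothetical kernel-side sketch based only on the declarations in that header; the driver data pointer, the claimed range, and the register write are placeholders, not values taken from this diff.

```c
/* Hypothetical CoreSight client sketch using only the declarations above.
 * drv_data, the claimed range, and the register write are placeholders.
 */
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/mali_kbase_debug_coresight_csf.h>

static int example_coresight_attach(void *drv_data)
{
	struct kbase_debug_coresight_csf_address_range range = {
		.start = 0x1000, .end = 0x1fff, /* placeholder MCU range */
	};
	struct kbase_debug_coresight_csf_op enable_ops[] = {
		{ .type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM,
		  .op.write_imm = { .reg_addr = 0x1000, .val = 0x1 } },
	};
	struct kbase_debug_coresight_csf_sequence enable_seq = {
		.ops = enable_ops, .nr_ops = ARRAY_SIZE(enable_ops),
	};
	void *client, *config;
	int err;

	client = kbase_debug_coresight_csf_register(drv_data, &range, 1);
	if (!client)
		return -EBUSY; /* range overlaps an existing client */

	/* disable_seq is optional, so NULL is passed in this sketch. */
	config = kbase_debug_coresight_csf_config_create(client, &enable_seq, NULL);
	if (!config) {
		kbase_debug_coresight_csf_unregister(client);
		return -ENOMEM;
	}

	/* Applied now if the GPU is powered, otherwise on the next power-up. */
	err = kbase_debug_coresight_csf_config_enable(config);
	if (err) {
		kbase_debug_coresight_csf_config_free(config);
		kbase_debug_coresight_csf_unregister(client);
	}
	return err;
}
```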
*/ @@ -128,6 +131,27 @@ struct memory_group_manager_ops { int group_id, int mmu_level, u64 pte); /* + * mgm_pte_to_original_pte - Undo any modification done during mgm_update_gpu_pte() + * + * @mgm_dev: The memory group manager through which the request + * is being made. + * @group_id: A physical memory group ID. The meaning of this is + * defined by the systems integrator. Its valid range is + * 0 .. MEMORY_GROUP_MANAGER_NR_GROUPS-1. + * @mmu_level: The level of the page table entry in @ate. + * @pte: The page table entry to restore the original representation for, + * in LPAE or AArch64 format (depending on the driver's configuration). + * + * Undo any modifications done during mgm_update_gpu_pte(). + * This function allows getting back the original PTE entry as given + * to mgm_update_gpu_pte(). + * + * Return: PTE entry as originally specified to mgm_update_gpu_pte() + */ + u64 (*mgm_pte_to_original_pte)(struct memory_group_manager_device *mgm_dev, int group_id, + int mmu_level, u64 pte); + + /* * mgm_vmf_insert_pfn_prot - Map a physical page in a group for the CPU * * @mgm_dev: The memory group manager through which the request diff --git a/common/include/linux/version_compat_defs.h b/common/include/linux/version_compat_defs.h index 8d289f2..47551f2 100644 --- a/common/include/linux/version_compat_defs.h +++ b/common/include/linux/version_compat_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,11 +23,46 @@ #define _VERSION_COMPAT_DEFS_H_ #include <linux/version.h> +#include <linux/highmem.h> +#include <linux/timer.h> -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE +#if (KERNEL_VERSION(4, 4, 267) < LINUX_VERSION_CODE) +#include <linux/overflow.h> +#endif + +#include <linux/bitops.h> +#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) +#include <linux/bits.h> +#endif + +#ifndef BITS_PER_TYPE +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) +#endif + +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE typedef unsigned int __poll_t; #endif +#if KERNEL_VERSION(4, 9, 78) >= LINUX_VERSION_CODE + +#ifndef EPOLLHUP +#define EPOLLHUP POLLHUP +#endif + +#ifndef EPOLLERR +#define EPOLLERR POLLERR +#endif + +#ifndef EPOLLIN +#define EPOLLIN POLLIN +#endif + +#ifndef EPOLLRDNORM +#define EPOLLRDNORM POLLRDNORM +#endif + +#endif + #if KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE /* This is defined inside kbase for matching the default to kernel's * mmap_min_addr, used inside file mali_kbase_mmap.c. @@ -36,21 +71,173 @@ typedef unsigned int __poll_t; */ #ifdef CONFIG_MMU #define kbase_mmap_min_addr CONFIG_DEFAULT_MMAP_MIN_ADDR + #ifdef CONFIG_LSM_MMAP_MIN_ADDR #if (CONFIG_LSM_MMAP_MIN_ADDR > CONFIG_DEFAULT_MMAP_MIN_ADDR) /* Replace the default definition with CONFIG_LSM_MMAP_MIN_ADDR */ #undef kbase_mmap_min_addr #define kbase_mmap_min_addr CONFIG_LSM_MMAP_MIN_ADDR -#pragma message "kbase_mmap_min_addr compiled to CONFIG_LSM_MMAP_MIN_ADDR, no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to CONFIG_LSM_MMAP_MIN_ADDR, no runtime update possible! 
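The memory_group_manager.h change above pairs mgm_update_gpu_pte with a new mgm_pte_to_original_pte callback that must undo whatever the update did. A minimal sketch of such a pair is shown below, assuming a provider that stamps the group ID into otherwise-unused PTE bits; the bit position is a placeholder and this is not the Pixel MGM implementation.

```c
/* Sketch of a paired mgm_update_gpu_pte / mgm_pte_to_original_pte.
 * EXAMPLE_PTE_GROUP_SHIFT is a placeholder bit position, not the real layout.
 */
#include <linux/memory_group_manager.h>
#include <linux/types.h>

#define EXAMPLE_PTE_GROUP_SHIFT 54
#define EXAMPLE_PTE_GROUP_MASK  (0x3ull << EXAMPLE_PTE_GROUP_SHIFT)

static u64 example_mgm_update_gpu_pte(struct memory_group_manager_device *mgm_dev,
				      int group_id, int mmu_level, u64 pte)
{
	/* Stamp the group ID into software-defined PTE bits. */
	return (pte & ~EXAMPLE_PTE_GROUP_MASK) |
	       (((u64)group_id << EXAMPLE_PTE_GROUP_SHIFT) & EXAMPLE_PTE_GROUP_MASK);
}

static u64 example_mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev,
					   int group_id, int mmu_level, u64 pte)
{
	/* Undo the modification so kbase sees the PTE it originally supplied. */
	return pte & ~EXAMPLE_PTE_GROUP_MASK;
}
```

Since MEMORY_GROUP_MANAGER_NR_GROUPS is reduced to 4 in this change, two bits are enough for the group ID in this sketch.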
*" #endif /* (CONFIG_LSM_MMAP_MIN_ADDR > CONFIG_DEFAULT_MMAP_MIN_ADDR) */ #endif /* CONFIG_LSM_MMAP_MIN_ADDR */ + #if (kbase_mmap_min_addr == CONFIG_DEFAULT_MMAP_MIN_ADDR) -#pragma message "kbase_mmap_min_addr compiled to CONFIG_DEFAULT_MMAP_MIN_ADDR, no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to CONFIG_DEFAULT_MMAP_MIN_ADDR, no runtime update possible! *" #endif + #else /* CONFIG_MMU */ #define kbase_mmap_min_addr (0UL) -#pragma message "kbase_mmap_min_addr compiled to (0UL), no runtime update!" +#define KBASE_COMPILED_MMAP_MIN_ADDR_MSG \ + "* MALI kbase_mmap_min_addr compiled to (0UL), no runtime update possible! *" #endif /* CONFIG_MMU */ #endif /* KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE */ +static inline void kbase_timer_setup(struct timer_list *timer, + void (*callback)(struct timer_list *timer)) +{ +#if KERNEL_VERSION(4, 14, 0) > LINUX_VERSION_CODE + setup_timer(timer, (void (*)(unsigned long))callback, (unsigned long)timer); +#else + timer_setup(timer, callback, 0); +#endif +} + +#ifndef WRITE_ONCE +#ifdef ASSIGN_ONCE +#define WRITE_ONCE(x, val) ASSIGN_ONCE(val, x) +#else +#define WRITE_ONCE(x, val) (ACCESS_ONCE(x) = (val)) +#endif +#endif + +#ifndef READ_ONCE +#define READ_ONCE(x) ACCESS_ONCE(x) +#endif + +static inline void *kbase_kmap(struct page *p) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + return kmap_local_page(p); +#else + return kmap(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void *kbase_kmap_atomic(struct page *p) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + return kmap_local_page(p); +#else + return kmap_atomic(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void kbase_kunmap(struct page *p, void *address) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + kunmap_local(address); +#else + kunmap(p); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +static inline void kbase_kunmap_atomic(void *address) +{ +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + kunmap_local(address); +#else + kunmap_atomic(address); +#endif /* KERNEL_VERSION(5, 11, 0) */ +} + +/* Some of the older 4.4 kernel patch versions do + * not contain the overflow check functions. However, + * they are based on compiler instrinsics, so they + * are simple to reproduce. + */ +#if (KERNEL_VERSION(4, 4, 267) >= LINUX_VERSION_CODE) +/* Some of the older 4.4 kernel patch versions do + * not contain the overflow check functions. However, + * they are based on compiler instrinsics, so they + * are simple to reproduce. + */ +#define check_mul_overflow(a, b, d) __builtin_mul_overflow(a, b, d) +#endif + +/* + * There was a big rename in the 4.10 kernel (fence* -> dma_fence*), + * with most of the related functions keeping the same signatures. + */ + +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + +#include <linux/fence.h> + +#define dma_fence fence +#define dma_fence_ops fence_ops +#define dma_fence_context_alloc(a) fence_context_alloc(a) +#define dma_fence_init(a, b, c, d, e) fence_init(a, b, c, d, e) +#define dma_fence_get(a) fence_get(a) +#define dma_fence_put(a) fence_put(a) +#define dma_fence_signal(a) fence_signal(a) +#define dma_fence_is_signaled(a) fence_is_signaled(a) +#define dma_fence_add_callback(a, b, c) fence_add_callback(a, b, c) +#define dma_fence_remove_callback(a, b) fence_remove_callback(a, b) +#define dma_fence_default_wait fence_default_wait + +#if (KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) +#define dma_fence_get_status(a) (fence_is_signaled(a) ? 
(a)->error ?: 1 : 0) +#else +#define dma_fence_get_status(a) (fence_is_signaled(a) ? (a)->status ?: 1 : 0) +#endif + +#else + +#include <linux/dma-fence.h> + +#if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) +#define dma_fence_get_status(a) (dma_fence_is_signaled(a) ? (a)->status ?: 1 : 0) +#endif + +#endif /* < 4.10.0 */ + +static inline void dma_fence_set_error_helper( +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence, +#else + struct dma_fence *fence, +#endif + int error) +{ +#if (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE) + dma_fence_set_error(fence, error); +#elif (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ + KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) + fence_set_error(fence, error); +#else + fence->status = error; +#endif +} + +#include <linux/mm.h> +#if !((KERNEL_VERSION(6, 3, 0) <= LINUX_VERSION_CODE) || \ + ((KERNEL_VERSION(6, 1, 25) <= LINUX_VERSION_CODE) && defined(__ANDROID_COMMON_KERNEL__))) +static inline void vm_flags_set(struct vm_area_struct *vma, vm_flags_t flags) +{ + vma->vm_flags |= flags; +} +static inline void vm_flags_clear(struct vm_area_struct *vma, vm_flags_t flags) +{ + vma->vm_flags &= ~flags; +} +#endif + +#if (KERNEL_VERSION(6, 4, 0) <= LINUX_VERSION_CODE) +#define KBASE_CLASS_CREATE(owner, name) class_create(name) +#else +#define KBASE_CLASS_CREATE(owner, name) class_create(owner, name) +#endif + #endif /* _VERSION_COMPAT_DEFS_H_ */ diff --git a/common/include/linux/dma-buf-test-exporter.h b/common/include/uapi/base/arm/dma_buf_test_exporter/dma-buf-test-exporter.h index aae12f9..a92e296 100644 --- a/common/include/linux/dma-buf-test-exporter.h +++ b/common/include/uapi/base/arm/dma_buf_test_exporter/dma-buf-test-exporter.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2012-2013, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2013, 2017, 2020-2022 ARM Limited. All rights reserved. 
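version_compat_defs.h now hides several kernel API transitions behind wrappers (kmap_local_page vs kmap, the 6.3-era vm_flags accessors, class_create, the dma_fence rename). Below is an illustrative caller of the wrappers; the surrounding function and its payload are invented for the example and assume the header is on the include path.

```c
/* Illustrative use of the compat wrappers; the surrounding function is invented. */
#include <linux/mm.h>
#include <linux/types.h>
#include <linux/version_compat_defs.h>

static void example_fill_page(struct page *p, struct vm_area_struct *vma)
{
	/* kmap_local_page() on >= 5.11 kernels, plain kmap() on older ones. */
	u32 *va = kbase_kmap(p);

	va[0] = 0xdeadbeef; /* placeholder payload */
	kbase_kunmap(p, va);

	/* Direct vm_flags writes were removed in 6.3 (and android14-6.1.25),
	 * so the header provides fallbacks under the new accessor names.
	 */
	vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND);
}
```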
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,21 +19,22 @@ * */ -#ifndef _LINUX_DMA_BUF_TEST_EXPORTER_H_ -#define _LINUX_DMA_BUF_TEST_EXPORTER_H_ +#ifndef _UAPI_DMA_BUF_TEST_EXPORTER_H_ +#define _UAPI_DMA_BUF_TEST_EXPORTER_H_ #include <linux/types.h> #include <linux/ioctl.h> -#define DMA_BUF_TE_VER_MAJOR 1 -#define DMA_BUF_TE_VER_MINOR 0 #define DMA_BUF_TE_ENQ 0x642d7465 #define DMA_BUF_TE_ACK 0x68692100 struct dma_buf_te_ioctl_version { - int op; /**< Must be set to DMA_BUF_TE_ENQ by client, driver will set it to DMA_BUF_TE_ACK */ - int major; /**< Major version */ - int minor; /**< Minor version */ + /** Must be set to DMA_BUF_TE_ENQ by client, driver will set it to DMA_BUF_TE_ACK */ + int op; + /** Major version */ + int major; + /** Minor version */ + int minor; }; struct dma_buf_te_ioctl_alloc { @@ -46,7 +47,7 @@ struct dma_buf_te_ioctl_status { /* out */ int attached_devices; /* number of devices attached (active 'dma_buf_attach's) */ int device_mappings; /* number of device mappings (active 'dma_buf_map_attachment's) */ - int cpu_mappings; /* number of cpu mappings (active 'mmap's) */ + int cpu_mappings; /* number of cpu mappings (active 'mmap's) */ }; struct dma_buf_te_ioctl_set_failing { @@ -66,11 +67,12 @@ struct dma_buf_te_ioctl_fill { #define DMA_BUF_TE_IOCTL_BASE 'E' /* Below all returning 0 if successful or -errcode except DMA_BUF_TE_ALLOC which will return fd or -errcode */ -#define DMA_BUF_TE_VERSION _IOR(DMA_BUF_TE_IOCTL_BASE, 0x00, struct dma_buf_te_ioctl_version) -#define DMA_BUF_TE_ALLOC _IOR(DMA_BUF_TE_IOCTL_BASE, 0x01, struct dma_buf_te_ioctl_alloc) -#define DMA_BUF_TE_QUERY _IOR(DMA_BUF_TE_IOCTL_BASE, 0x02, struct dma_buf_te_ioctl_status) -#define DMA_BUF_TE_SET_FAILING _IOW(DMA_BUF_TE_IOCTL_BASE, 0x03, struct dma_buf_te_ioctl_set_failing) -#define DMA_BUF_TE_ALLOC_CONT _IOR(DMA_BUF_TE_IOCTL_BASE, 0x04, struct dma_buf_te_ioctl_alloc) -#define DMA_BUF_TE_FILL _IOR(DMA_BUF_TE_IOCTL_BASE, 0x05, struct dma_buf_te_ioctl_fill) +#define DMA_BUF_TE_VERSION _IOR(DMA_BUF_TE_IOCTL_BASE, 0x00, struct dma_buf_te_ioctl_version) +#define DMA_BUF_TE_ALLOC _IOR(DMA_BUF_TE_IOCTL_BASE, 0x01, struct dma_buf_te_ioctl_alloc) +#define DMA_BUF_TE_QUERY _IOR(DMA_BUF_TE_IOCTL_BASE, 0x02, struct dma_buf_te_ioctl_status) +#define DMA_BUF_TE_SET_FAILING \ + _IOW(DMA_BUF_TE_IOCTL_BASE, 0x03, struct dma_buf_te_ioctl_set_failing) +#define DMA_BUF_TE_ALLOC_CONT _IOR(DMA_BUF_TE_IOCTL_BASE, 0x04, struct dma_buf_te_ioctl_alloc) +#define DMA_BUF_TE_FILL _IOR(DMA_BUF_TE_IOCTL_BASE, 0x05, struct dma_buf_te_ioctl_fill) -#endif /* _LINUX_DMA_BUF_TEST_EXPORTER_H_ */ +#endif /* _UAPI_DMA_BUF_TEST_EXPORTER_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h index 9d677ca..a44da7b 100644 --- a/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h +++ b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,7 +29,11 @@ #include <linux/types.h> #define KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS (4) +#if MALI_USE_CSF +#define KBASE_DUMMY_MODEL_COUNTER_PER_CORE (65) +#else /* MALI_USE_CSF */ #define KBASE_DUMMY_MODEL_COUNTER_PER_CORE (60) +#endif /* !MALI_USE_CSF */ #define KBASE_DUMMY_MODEL_COUNTERS_PER_BIT (4) #define KBASE_DUMMY_MODEL_COUNTER_ENABLED(enable_mask, ctr_idx) \ (enable_mask & (1 << (ctr_idx / KBASE_DUMMY_MODEL_COUNTERS_PER_BIT))) @@ -43,13 +47,29 @@ (KBASE_DUMMY_MODEL_VALUES_PER_BLOCK * sizeof(__u32)) #define KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS 8 #define KBASE_DUMMY_MODEL_MAX_SHADER_CORES 32 -#define KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS \ +#define KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS 0 +#define KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS \ (1 + 1 + KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS + KBASE_DUMMY_MODEL_MAX_SHADER_CORES) +#define KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS \ + (KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS + KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS) #define KBASE_DUMMY_MODEL_COUNTER_TOTAL \ (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * \ KBASE_DUMMY_MODEL_COUNTER_PER_CORE) +#define KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE \ + (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * KBASE_DUMMY_MODEL_VALUES_PER_BLOCK) +#define KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE \ + (KBASE_DUMMY_MODEL_MAX_NUM_PERF_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE) +/* + * Bit mask - no. bits set is no. cores + * Values obtained from talking to HW team + * Example: tODx has 10 cores, 0b11 1111 1111 -> 0x3FF + */ #define DUMMY_IMPLEMENTATION_SHADER_PRESENT (0xFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TBEX (0x7FFFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TODX (0x3FFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTUX (0x7FFull) +#define DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTIX (0xFFFull) #define DUMMY_IMPLEMENTATION_TILER_PRESENT (0x1ull) #define DUMMY_IMPLEMENTATION_L2_PRESENT (0x1ull) #define DUMMY_IMPLEMENTATION_STACK_PRESENT (0xFull) diff --git a/mali_kbase/mali_kbase_bits.h b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h index a085fd8..c83cedd 100644 --- a/mali_kbase/mali_kbase_bits.h +++ b/common/include/uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,13 +19,18 @@ * */ -#ifndef _KBASE_BITS_H_ -#define _KBASE_BITS_H_ +/* + * Dummy Model interface + */ + +#ifndef _UAPI_KBASE_MODEL_LINUX_H_ +#define _UAPI_KBASE_MODEL_LINUX_H_ + +/* Generic model IRQs */ +#define MODEL_LINUX_JOB_IRQ (0x1 << 0) +#define MODEL_LINUX_GPU_IRQ (0x1 << 1) +#define MODEL_LINUX_MMU_IRQ (0x1 << 2) -#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) -#include <linux/bits.h> -#else -#include <linux/bitops.h> -#endif +#define MODEL_LINUX_IRQ_MASK (MODEL_LINUX_JOB_IRQ | MODEL_LINUX_GPU_IRQ | MODEL_LINUX_MMU_IRQ) -#endif /* _KBASE_BITS_H_ */ +#endif /* _UAPI_KBASE_MODEL_LINUX_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h b/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h index 7f7b9dd..a8e5802 100644 --- a/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_base_csf_kernel.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,99 +23,16 @@ #define _UAPI_BASE_CSF_KERNEL_H_ #include <linux/types.h> +#include "../mali_base_common_kernel.h" -/* Memory allocation, access/hint flags. +/* Memory allocation, access/hint flags & mask specific to CSF GPU. * * See base_mem_alloc_flags. */ -/* IN */ -/* Read access CPU side - */ -#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) - -/* Write access CPU side - */ -#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) - -/* Read access GPU side - */ -#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) - -/* Write access GPU side - */ -#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) - -/* Execute allowed on the GPU side - */ -#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) - -/* Will be permanently mapped in kernel space. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) - -/* The allocation will completely reside within the same 4GB chunk in the GPU - * virtual space. - * Since this flag is primarily required only for the TLS memory which will - * not be used to contain executable code and also not used for Tiler heap, - * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. - */ -#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) - -/* Userspace is not allowed to free this memory. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) - /* Must be FIXED memory. 
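The relocated dummy-model header (mali_kbase_model_linux.h) exposes three generic IRQ bits plus a combined mask. A trivial sketch decoding a status word with them is shown below; the include path assumes the uapi directory is on the search path.

```c
/* Sketch: decode a dummy-model IRQ status word into its three sources. */
#include <linux/printk.h>
#include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h>

static void example_report_model_irqs(unsigned int status)
{
	if (status & MODEL_LINUX_JOB_IRQ)
		pr_info("model: JOB irq\n");
	if (status & MODEL_LINUX_GPU_IRQ)
		pr_info("model: GPU irq\n");
	if (status & MODEL_LINUX_MMU_IRQ)
		pr_info("model: MMU irq\n");
	if (status & ~MODEL_LINUX_IRQ_MASK)
		pr_warn("model: unknown irq bits 0x%x\n", status & ~MODEL_LINUX_IRQ_MASK);
}
```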
*/ #define BASE_MEM_FIXED ((base_mem_alloc_flags)1 << 8) -/* Grow backing store on GPU Page Fault - */ -#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) - -/* Page coherence Outer shareable, if available - */ -#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) - -/* Page coherence Inner shareable - */ -#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) - -/* IN/OUT */ -/* Should be cached on the CPU, returned if actually cached - */ -#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) - -/* IN/OUT */ -/* Must have same VA on both the GPU and the CPU - */ -#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) - -/* OUT */ -/* Must call mmap to acquire a GPU address for the alloc - */ -#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) - -/* IN */ -/* Page coherence Outer shareable, required. - */ -#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) - -/* Protected memory - */ -#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) - -/* Not needed physical memory - */ -#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) - -/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the - * addresses to be the same - */ -#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) - /* CSF event memory * * If Outer shareable coherence is not specified or not available, then on @@ -131,46 +48,15 @@ #define BASE_MEM_RESERVED_BIT_20 ((base_mem_alloc_flags)1 << 20) -/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu - * mode. Some components within the GPU might only be able to access memory - * that is GPU cacheable. Refer to the specific GPU implementation for more - * details. The 3 shareability flags will be ignored for GPU uncached memory. - * If used while importing USER_BUFFER type memory, then the import will fail - * if the memory is not aligned to GPU and CPU cache line width. - */ -#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) - -/* - * Bits [22:25] for group_id (0~15). - * - * base_mem_group_id_set() should be used to pack a memory group ID into a - * base_mem_alloc_flags value instead of accessing the bits directly. - * base_mem_group_id_get() should be used to extract the memory group ID from - * a base_mem_alloc_flags value. - */ -#define BASEP_MEM_GROUP_ID_SHIFT 22 -#define BASE_MEM_GROUP_ID_MASK \ - ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) - -/* Must do CPU cache maintenance when imported memory is mapped/unmapped - * on GPU. Currently applicable to dma-buf type only. - */ -#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) - -/* OUT */ -/* Kernel side cache sync ops required */ -#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) /* Must be FIXABLE memory: its GPU VA will be determined at a later point, * at which time it will be at a fixed GPU VA. */ #define BASE_MEM_FIXABLE ((base_mem_alloc_flags)1 << 29) -/* Number of bits used as flags for base memory management - * - * Must be kept in sync with the base_mem_alloc_flags flags +/* Note that the number of bits used for base_mem_alloc_flags + * must be less than BASE_MEM_FLAGS_NR_BITS !!! */ -#define BASE_MEM_FLAGS_NR_BITS 30 /* A mask of all the flags which are only valid for allocations within kbase, * and may not be passed from user space. @@ -178,62 +64,23 @@ #define BASEP_MEM_FLAGS_KERNEL_ONLY \ (BASEP_MEM_PERMANENT_KERNEL_MAPPING | BASEP_MEM_NO_USER_FREE) -/* A mask for all output bits, excluding IN/OUT bits. 
- */ -#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP - -/* A mask for all input bits, including IN/OUT bits. - */ -#define BASE_MEM_FLAGS_INPUT_MASK \ - (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) - /* A mask of all currently reserved flags */ #define BASE_MEM_FLAGS_RESERVED BASE_MEM_RESERVED_BIT_20 -#define BASEP_MEM_INVALID_HANDLE (0ul) -#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) -#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) -/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ +/* Special base mem handles specific to CSF. + */ #define BASEP_MEM_CSF_USER_REG_PAGE_HANDLE (47ul << LOCAL_PAGE_SHIFT) #define BASEP_MEM_CSF_USER_IO_PAGES_HANDLE (48ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_FIRST_FREE_ADDRESS \ - ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) #define KBASE_CSF_NUM_USER_IO_PAGES_HANDLE \ ((BASE_MEM_COOKIE_BASE - BASEP_MEM_CSF_USER_IO_PAGES_HANDLE) >> \ LOCAL_PAGE_SHIFT) -/** - * Valid set of just-in-time memory allocation flags - */ +/* Valid set of just-in-time memory allocation flags */ #define BASE_JIT_ALLOC_VALID_FLAGS ((__u8)0) -/* Flags to pass to ::base_context_init. - * Flags can be ORed together to enable multiple things. - * - * These share the same space as BASEP_CONTEXT_FLAG_*, and so must - * not collide with them. - */ -typedef __u32 base_context_create_flags; - -/* No flags set */ -#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) - -/* Base context is embedded in a cctx object (flag used for CINSTR - * software counter macros) - */ -#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) - -/* Base context is a 'System Monitor' context for Hardware counters. - * - * One important side effect of this is that job submission is disabled. - */ -#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED \ - ((base_context_create_flags)1 << 1) +/* flags for base context specific to CSF */ /* Base context creates a CSF event notification thread. * @@ -242,22 +89,6 @@ typedef __u32 base_context_create_flags; */ #define BASE_CONTEXT_CSF_EVENT_THREAD ((base_context_create_flags)1 << 2) -/* Bit-shift used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) - -/* Bitmask used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ - ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) - -/* Bitpattern describing the base_context_create_flags that can be - * passed to the kernel - */ -#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ - (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | \ - BASEP_CONTEXT_MMU_GROUP_ID_MASK) - /* Bitpattern describing the ::base_context_create_flags that can be * passed to base_context_init() */ @@ -266,15 +97,7 @@ typedef __u32 base_context_create_flags; BASE_CONTEXT_CSF_EVENT_THREAD | \ BASEP_CONTEXT_CREATE_KERNEL_FLAGS) -/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, - * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) - */ -#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) - -/* Indicate that job dumping is enabled. This could affect certain timers - * to account for the performance impact. 
- */ -#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) +/* Flags for base tracepoint specific to CSF */ /* Enable KBase tracepoints for CSF builds */ #define BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS (1 << 2) @@ -295,9 +118,21 @@ typedef __u32 base_context_create_flags; #define BASE_QUEUE_MAX_PRIORITY (15U) -/* CQS Sync object is an array of __u32 event_mem[2], error field index is 1 */ -#define BASEP_EVENT_VAL_INDEX (0U) -#define BASEP_EVENT_ERR_INDEX (1U) +/* Sync32 object fields definition */ +#define BASEP_EVENT32_VAL_OFFSET (0U) +#define BASEP_EVENT32_ERR_OFFSET (4U) +#define BASEP_EVENT32_SIZE_BYTES (8U) + +/* Sync64 object fields definition */ +#define BASEP_EVENT64_VAL_OFFSET (0U) +#define BASEP_EVENT64_ERR_OFFSET (8U) +#define BASEP_EVENT64_SIZE_BYTES (16U) + +/* Sync32 object alignment, equal to its size */ +#define BASEP_EVENT32_ALIGN_BYTES (8U) + +/* Sync64 object alignment, equal to its size */ +#define BASEP_EVENT64_ALIGN_BYTES (16U) /* The upper limit for number of objects that could be waited/set per command. * This limit is now enforced as internally the error inherit inputs are @@ -306,6 +141,13 @@ typedef __u32 base_context_create_flags; */ #define BASEP_KCPU_CQS_MAX_NUM_OBJS ((size_t)32) +/* CSF CSI EXCEPTION_HANDLER_FLAGS */ +#define BASE_CSF_TILER_OOM_EXCEPTION_FLAG (1u << 0) +#define BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK (BASE_CSF_TILER_OOM_EXCEPTION_FLAG) + +/* Initial value for LATEST_FLUSH register */ +#define POWER_DOWN_LATEST_FLUSH_VALUE ((uint32_t)1) + /** * enum base_kcpu_command_type - Kernel CPU queue command type. * @BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL: fence_signal, @@ -335,7 +177,7 @@ enum base_kcpu_command_type { BASE_KCPU_COMMAND_TYPE_JIT_ALLOC, BASE_KCPU_COMMAND_TYPE_JIT_FREE, BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND, - BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER + BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER, }; /** @@ -725,4 +567,47 @@ struct base_csf_notification { } payload; }; +/** + * struct mali_base_gpu_core_props - GPU core props info + * + * @product_id: Pro specific value. + * @version_status: Status of the GPU release. No defined values, but starts at + * 0 and increases by one for each release status (alpha, beta, EAC, etc.). + * 4 bit values (0-15). + * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" + * release number. + * 8 bit values (0-255). + * @major_revision: Major release number of the GPU. "R" part of an "RnPn" + * release number. + * 4 bit values (0-15). + * @padding: padding to align to 8-byte + * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by + * clGetDeviceInfo() + * @log2_program_counter_size: Size of the shader program counter, in bits. + * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This + * is a bitpattern where a set bit indicates that the format is supported. + * Before using a texture format, it is recommended that the corresponding + * bit be checked. + * @paddings: Padding bytes. + * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. + * It is unlikely that a client will be able to allocate all of this memory + * for their own purposes, but this at least provides an upper bound on the + * memory available to the GPU. + * This is required for OpenCL's clGetDeviceInfo() call when + * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The + * client will not be expecting to allocate anywhere near this value. 
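The CQS sync object is now described by explicit offset/size/alignment constants instead of the old __u32 event_mem[2] layout. A hypothetical C view of the 64-bit variant matching those offsets is shown below; allocations must still honour BASEP_EVENT64_ALIGN_BYTES, which a plain struct declaration does not enforce by itself.

```c
/* Hypothetical layout matching the Sync64 offsets above; allocations must be
 * 16-byte aligned per BASEP_EVENT64_ALIGN_BYTES.
 */
#include <linux/types.h>

struct example_basep_sync64 {
	__u64 val; /* BASEP_EVENT64_VAL_OFFSET (0) */
	__u64 err; /* BASEP_EVENT64_ERR_OFFSET (8) */
};                 /* BASEP_EVENT64_SIZE_BYTES (16) */
```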
+ */ +struct mali_base_gpu_core_props { + __u32 product_id; + __u16 version_status; + __u16 minor_revision; + __u16 major_revision; + __u16 padding; + __u32 gpu_freq_khz_max; + __u32 log2_program_counter_size; + __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; + __u8 paddings[4]; + __u64 gpu_available_memory_size; +}; + #endif /* _UAPI_BASE_CSF_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h new file mode 100644 index 0000000..f49ab00 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ +#define _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ + +/** + * enum dumpfault_error_type - Enumeration to define errors to be dumped + * + * @DF_NO_ERROR: No pending error + * @DF_CSG_SUSPEND_TIMEOUT: CSG suspension timeout + * @DF_CSG_TERMINATE_TIMEOUT: CSG group termination timeout + * @DF_CSG_START_TIMEOUT: CSG start timeout + * @DF_CSG_RESUME_TIMEOUT: CSG resume timeout + * @DF_CSG_EP_CFG_TIMEOUT: CSG end point configuration timeout + * @DF_CSG_STATUS_UPDATE_TIMEOUT: CSG status update timeout + * @DF_PROGRESS_TIMER_TIMEOUT: Progress timer timeout + * @DF_FW_INTERNAL_ERROR: Firmware internal error + * @DF_CS_FATAL: CS fatal error + * @DF_CS_FAULT: CS fault error + * @DF_FENCE_WAIT_TIMEOUT: Fence wait timeout + * @DF_PROTECTED_MODE_EXIT_TIMEOUT: P.mode exit timeout + * @DF_PROTECTED_MODE_ENTRY_FAILURE: P.mode entrance failure + * @DF_PING_REQUEST_TIMEOUT: Ping request timeout + * @DF_CORE_DOWNSCALE_REQUEST_TIMEOUT: DCS downscale request timeout + * @DF_TILER_OOM: Tiler Out-of-memory error + * @DF_GPU_PAGE_FAULT: GPU page fault + * @DF_BUS_FAULT: MMU BUS Fault + * @DF_GPU_PROTECTED_FAULT: GPU P.mode fault + * @DF_AS_ACTIVE_STUCK: AS active stuck + * @DF_GPU_SOFT_RESET_FAILURE: GPU soft reset falure + * + * This is used for kbase to notify error type of an event whereby + * user space client will dump relevant debugging information via debugfs. + * @DF_NO_ERROR is used to indicate no pending fault, thus the client will + * be blocked on reading debugfs file till a fault happens. 
+ */ +enum dumpfault_error_type { + DF_NO_ERROR = 0, + DF_CSG_SUSPEND_TIMEOUT, + DF_CSG_TERMINATE_TIMEOUT, + DF_CSG_START_TIMEOUT, + DF_CSG_RESUME_TIMEOUT, + DF_CSG_EP_CFG_TIMEOUT, + DF_CSG_STATUS_UPDATE_TIMEOUT, + DF_PROGRESS_TIMER_TIMEOUT, + DF_FW_INTERNAL_ERROR, + DF_CS_FATAL, + DF_CS_FAULT, + DF_FENCE_WAIT_TIMEOUT, + DF_PROTECTED_MODE_EXIT_TIMEOUT, + DF_PROTECTED_MODE_ENTRY_FAILURE, + DF_PING_REQUEST_TIMEOUT, + DF_CORE_DOWNSCALE_REQUEST_TIMEOUT, + DF_TILER_OOM, + DF_GPU_PAGE_FAULT, + DF_BUS_FAULT, + DF_GPU_PROTECTED_FAULT, + DF_AS_ACTIVE_STUCK, + DF_GPU_SOFT_RESET_FAILURE, +}; + +#endif /* _UAPI_KBASE_CSF_ERRORS_DUMPFAULT_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h index 1794ddc..c9de5fd 100644 --- a/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/csf/mali_kbase_csf_ioctl.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,10 +56,44 @@ * - Added new Base memory allocation interface * 1.10: * - First release of new HW performance counters interface. + * 1.11: + * - Dummy model (no mali) backend will now clear HWC values after each sample + * 1.12: + * - Added support for incremental rendering flag in CSG create call + * 1.13: + * - Added ioctl to query a register of USER page. + * 1.14: + * - Added support for passing down the buffer descriptor VA in tiler heap init + * 1.15: + * - Enable new sync_wait GE condition + * 1.16: + * - Remove legacy definitions: + * - base_jit_alloc_info_10_2 + * - base_jit_alloc_info_11_5 + * - kbase_ioctl_mem_jit_init_10_2 + * - kbase_ioctl_mem_jit_init_11_5 + * 1.17: + * - Fix kinstr_prfcnt issues: + * - Missing implicit sample for CMD_STOP when HWCNT buffer is full. + * - Race condition when stopping periodic sampling. + * - prfcnt_block_metadata::block_idx gaps. + * - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC is removed. + * 1.18: + * - Relax the requirement to create a mapping with BASE_MEM_MAP_TRACKING_HANDLE + * before allocating GPU memory for the context. + * - CPU mappings of USER_BUFFER imported memory handles must be cached. + * 1.19: + * - Add NE support in queue_group_create IOCTL fields + * - Previous version retained as KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18 for + * backward compatibility. + * 1.20: + * - Restrict child process from doing supported file operations (like mmap, ioctl, + * read, poll) on the file descriptor of mali device file that was inherited + * from the parent process. */ #define BASE_UK_VERSION_MAJOR 1 -#define BASE_UK_VERSION_MINOR 10 +#define BASE_UK_VERSION_MINOR 20 /** * struct kbase_ioctl_version_check - Check version compatibility between @@ -232,6 +266,56 @@ union kbase_ioctl_cs_queue_group_create_1_6 { _IOWR(KBASE_IOCTL_TYPE, 42, union kbase_ioctl_cs_queue_group_create_1_6) /** + * union kbase_ioctl_cs_queue_group_create_1_18 - Create a GPU command queue group + * @in: Input parameters + * @in.tiler_mask: Mask of tiler endpoints the group is allowed to use. + * @in.fragment_mask: Mask of fragment endpoints the group is allowed to use. + * @in.compute_mask: Mask of compute endpoints the group is allowed to use. 
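dumpfault_error_type is what a user-space client sees after blocking on the dump-on-fault debugfs file (DF_NO_ERROR means keep waiting). A short helper mapping a few of the values to strings is sketched below; the selection and wording are illustrative only, and the debugfs path itself is not part of this hunk.

```c
/* Sketch: turn a handful of dumpfault_error_type values into readable strings.
 * Assumes an installed copy of the uapi header is on the include path.
 */
#include "mali_kbase_csf_errors_dumpfault.h"

static const char *example_df_error_name(enum dumpfault_error_type err)
{
	switch (err) {
	case DF_NO_ERROR:               return "no pending error";
	case DF_PROGRESS_TIMER_TIMEOUT: return "progress timer timeout";
	case DF_FW_INTERNAL_ERROR:      return "firmware internal error";
	case DF_TILER_OOM:              return "tiler out of memory";
	case DF_GPU_PAGE_FAULT:         return "GPU page fault";
	case DF_BUS_FAULT:              return "MMU bus fault";
	default:                        return "other fault (see header)";
	}
}
```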
+ * @in.cs_min: Minimum number of CSs required. + * @in.priority: Queue group's priority within a process. + * @in.tiler_max: Maximum number of tiler endpoints the group is allowed + * to use. + * @in.fragment_max: Maximum number of fragment endpoints the group is + * allowed to use. + * @in.compute_max: Maximum number of compute endpoints the group is allowed + * to use. + * @in.csi_handlers: Flags to signal that the application intends to use CSI + * exception handlers in some linear buffers to deal with + * the given exception types. + * @in.padding: Currently unused, must be zero + * @out: Output parameters + * @out.group_handle: Handle of a newly created queue group. + * @out.padding: Currently unused, must be zero + * @out.group_uid: UID of the queue group available to base. + */ +union kbase_ioctl_cs_queue_group_create_1_18 { + struct { + __u64 tiler_mask; + __u64 fragment_mask; + __u64 compute_mask; + __u8 cs_min; + __u8 priority; + __u8 tiler_max; + __u8 fragment_max; + __u8 compute_max; + __u8 csi_handlers; + __u8 padding[2]; + /** + * @in.dvs_buf: buffer for deferred vertex shader + */ + __u64 dvs_buf; + } in; + struct { + __u8 group_handle; + __u8 padding[3]; + __u32 group_uid; + } out; +}; + +#define KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18 \ + _IOWR(KBASE_IOCTL_TYPE, 58, union kbase_ioctl_cs_queue_group_create_1_18) + +/** * union kbase_ioctl_cs_queue_group_create - Create a GPU command queue group * @in: Input parameters * @in.tiler_mask: Mask of tiler endpoints the group is allowed to use. @@ -245,6 +329,9 @@ union kbase_ioctl_cs_queue_group_create_1_6 { * allowed to use. * @in.compute_max: Maximum number of compute endpoints the group is allowed * to use. + * @in.csi_handlers: Flags to signal that the application intends to use CSI + * exception handlers in some linear buffers to deal with + * the given exception types. * @in.padding: Currently unused, must be zero * @out: Output parameters * @out.group_handle: Handle of a newly created queue group. @@ -261,11 +348,16 @@ union kbase_ioctl_cs_queue_group_create { __u8 tiler_max; __u8 fragment_max; __u8 compute_max; - __u8 padding[3]; + __u8 csi_handlers; + /** + * @in.reserved: Reserved, currently unused, must be zero. + */ + __u16 reserved; /** - * @reserved: Reserved + * @in.dvs_buf: buffer for deferred vertex shader */ - __u64 reserved; + __u64 dvs_buf; + __u64 padding[9]; } in; struct { __u8 group_handle; @@ -353,6 +445,7 @@ struct kbase_ioctl_kcpu_queue_enqueue { * allowed. * @in.group_id: Group ID to be used for physical allocations. * @in.padding: Padding + * @in.buf_desc_va: Buffer descriptor GPU VA for tiler heap reclaims. * @out: Output parameters * @out.gpu_heap_va: GPU VA (virtual address) of Heap context that was set up * for the heap. @@ -368,6 +461,7 @@ union kbase_ioctl_cs_tiler_heap_init { __u16 target_in_flight; __u8 group_id; __u8 padding; + __u64 buf_desc_va; } in; struct { __u64 gpu_heap_va; @@ -379,6 +473,43 @@ union kbase_ioctl_cs_tiler_heap_init { _IOWR(KBASE_IOCTL_TYPE, 48, union kbase_ioctl_cs_tiler_heap_init) /** + * union kbase_ioctl_cs_tiler_heap_init_1_13 - Initialize chunked tiler memory heap, + * earlier version upto 1.13 + * @in: Input parameters + * @in.chunk_size: Size of each chunk. + * @in.initial_chunks: Initial number of chunks that heap will be created with. + * @in.max_chunks: Maximum number of chunks that the heap is allowed to use. + * @in.target_in_flight: Number of render-passes that the driver should attempt to + * keep in flight for which allocation of new chunks is + * allowed. 
+ * @in.group_id: Group ID to be used for physical allocations. + * @in.padding: Padding + * @out: Output parameters + * @out.gpu_heap_va: GPU VA (virtual address) of Heap context that was set up + * for the heap. + * @out.first_chunk_va: GPU VA of the first chunk allocated for the heap, + * actually points to the header of heap chunk and not to + * the low address of free memory in the chunk. + */ +union kbase_ioctl_cs_tiler_heap_init_1_13 { + struct { + __u32 chunk_size; + __u32 initial_chunks; + __u32 max_chunks; + __u16 target_in_flight; + __u8 group_id; + __u8 padding; + } in; + struct { + __u64 gpu_heap_va; + __u64 first_chunk_va; + } out; +}; + +#define KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13 \ + _IOWR(KBASE_IOCTL_TYPE, 48, union kbase_ioctl_cs_tiler_heap_init_1_13) + +/** * struct kbase_ioctl_cs_tiler_heap_term - Terminate a chunked tiler heap * instance * @@ -479,6 +610,29 @@ union kbase_ioctl_mem_alloc_ex { #define KBASE_IOCTL_MEM_ALLOC_EX _IOWR(KBASE_IOCTL_TYPE, 59, union kbase_ioctl_mem_alloc_ex) +/** + * union kbase_ioctl_read_user_page - Read a register of USER page + * + * @in: Input parameters. + * @in.offset: Register offset in USER page. + * @in.padding: Padding to round up to a multiple of 8 bytes, must be zero. + * @out: Output parameters. + * @out.val_lo: Value of 32bit register or the 1st half of 64bit register to be read. + * @out.val_hi: Value of the 2nd half of 64bit register to be read. + */ +union kbase_ioctl_read_user_page { + struct { + __u32 offset; + __u32 padding; + } in; + struct { + __u32 val_lo; + __u32 val_hi; + } out; +}; + +#define KBASE_IOCTL_READ_USER_PAGE _IOWR(KBASE_IOCTL_TYPE, 60, union kbase_ioctl_read_user_page) + /*************** * test ioctls * ***************/ diff --git a/mali_kbase/mali_kbase_strings.h b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_csf.h index c3f94f9..eaa4b2d 100644 --- a/mali_kbase/mali_kbase_strings.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,5 +19,18 @@ * */ -extern const char kbase_drv_name[]; -extern const char kbase_timeline_name[]; +#ifndef _UAPI_KBASE_GPU_REGMAP_CSF_H_ +#define _UAPI_KBASE_GPU_REGMAP_CSF_H_ + +/* USER base address */ +#define USER_BASE 0x0010000 +#define USER_REG(r) (USER_BASE + (r)) + +/* USER register offsets */ +#define LATEST_FLUSH 0x0000 /* () Flush ID of latest clean-and-invalidate operation */ + +/* DOORBELLS base address */ +#define DOORBELLS_BASE 0x0080000 +#define DOORBELLS_REG(r) (DOORBELLS_BASE + (r)) + +#endif /* _UAPI_KBASE_GPU_REGMAP_CSF_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h index f466389..d24afcc 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/backend/mali_kbase_gpu_regmap_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
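Interface version 1.13 adds KBASE_IOCTL_READ_USER_PAGE, and the CSF regmap above places LATEST_FLUSH at offset 0 of the USER page. A hedged user-space sketch that reads it follows; /dev/mali0, the include path, and the omission of the usual version-check/set-flags handshake are assumptions, not taken from this diff.

```c
/* Userspace sketch: read LATEST_FLUSH through KBASE_IOCTL_READ_USER_PAGE.
 * /dev/mali0 and the header location are assumptions; the normal kbase
 * version-check/set-flags handshake is omitted for brevity.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include "mali_kbase_csf_ioctl.h" /* KBASE_IOCTL_READ_USER_PAGE */

int main(void)
{
	union kbase_ioctl_read_user_page args = {
		.in = { .offset = 0 /* LATEST_FLUSH */, .padding = 0 },
	};
	int fd = open("/dev/mali0", O_RDWR);

	if (fd < 0)
		return 1;
	if (ioctl(fd, KBASE_IOCTL_READ_USER_PAGE, &args) == 0)
		printf("LATEST_FLUSH = 0x%x\n", args.out.val_lo);
	close(fd);
	return 0;
}
```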
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,9 +22,4 @@ #ifndef _UAPI_KBASE_GPU_REGMAP_JM_H_ #define _UAPI_KBASE_GPU_REGMAP_JM_H_ -/* GPU control registers */ -#define LATEST_FLUSH 0x038 /* (RO) Flush ID of latest - * clean-and-invalidate operation - */ - #endif /* _UAPI_KBASE_GPU_REGMAP_JM_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h index 1a99e56..784e09a 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h @@ -119,13 +119,14 @@ #define GPU_ID2_PRODUCT_TBEX GPU_ID2_MODEL_MAKE(9, 2) #define GPU_ID2_PRODUCT_LBEX GPU_ID2_MODEL_MAKE(9, 4) #define GPU_ID2_PRODUCT_TBAX GPU_ID2_MODEL_MAKE(9, 5) -#define GPU_ID2_PRODUCT_TDUX GPU_ID2_MODEL_MAKE(10, 1) #define GPU_ID2_PRODUCT_TODX GPU_ID2_MODEL_MAKE(10, 2) #define GPU_ID2_PRODUCT_TGRX GPU_ID2_MODEL_MAKE(10, 3) #define GPU_ID2_PRODUCT_TVAX GPU_ID2_MODEL_MAKE(10, 4) #define GPU_ID2_PRODUCT_LODX GPU_ID2_MODEL_MAKE(10, 7) #define GPU_ID2_PRODUCT_TTUX GPU_ID2_MODEL_MAKE(11, 2) #define GPU_ID2_PRODUCT_LTUX GPU_ID2_MODEL_MAKE(11, 3) +#define GPU_ID2_PRODUCT_TTIX GPU_ID2_MODEL_MAKE(12, 0) +#define GPU_ID2_PRODUCT_LTIX GPU_ID2_MODEL_MAKE(12, 1) /** * GPU_ID_MAKE - Helper macro to generate GPU_ID using id, major, minor, status diff --git a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h index deca665..8256191 100644 --- a/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h +++ b/common/include/uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,13 +22,10 @@ #ifndef _UAPI_KBASE_GPU_REGMAP_H_ #define _UAPI_KBASE_GPU_REGMAP_H_ -#if !MALI_USE_CSF +#if MALI_USE_CSF +#include "backend/mali_kbase_gpu_regmap_csf.h" +#else #include "backend/mali_kbase_gpu_regmap_jm.h" #endif /* !MALI_USE_CSF */ -/* MMU control registers */ -#define MEMORY_MANAGEMENT_BASE 0x2000 -#define MMU_REG(r) (MEMORY_MANAGEMENT_BASE + (r)) -#define MMU_IRQ_RAWSTAT 0x000 /* (RW) Raw interrupt status register */ - #endif /* _UAPI_KBASE_GPU_REGMAP_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h b/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h index 94f4dc7..1a3098d 100644 --- a/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/jm/mali_base_jm_kernel.h @@ -23,100 +23,16 @@ #define _UAPI_BASE_JM_KERNEL_H_ #include <linux/types.h> +#include "../mali_base_common_kernel.h" -/* Memory allocation, access/hint flags. +/* Memory allocation, access/hint flags & mask specific to JM GPU. * * See base_mem_alloc_flags. 
*/ -/* IN */ -/* Read access CPU side - */ -#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) - -/* Write access CPU side - */ -#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) - -/* Read access GPU side - */ -#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) - -/* Write access GPU side - */ -#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) - -/* Execute allowed on the GPU side - */ -#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) - -/* Will be permanently mapped in kernel space. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) - -/* The allocation will completely reside within the same 4GB chunk in the GPU - * virtual space. - * Since this flag is primarily required only for the TLS memory which will - * not be used to contain executable code and also not used for Tiler heap, - * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. - */ -#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) - -/* Userspace is not allowed to free this memory. - * Flag is only allowed on allocations originating from kbase. - */ -#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) - -/* Used as BASE_MEM_FIXED in other backends - */ +/* Used as BASE_MEM_FIXED in other backends */ #define BASE_MEM_RESERVED_BIT_8 ((base_mem_alloc_flags)1 << 8) -/* Grow backing store on GPU Page Fault - */ -#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) - -/* Page coherence Outer shareable, if available - */ -#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) - -/* Page coherence Inner shareable - */ -#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) - -/* IN/OUT */ -/* Should be cached on the CPU, returned if actually cached - */ -#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) - -/* IN/OUT */ -/* Must have same VA on both the GPU and the CPU - */ -#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) - -/* OUT */ -/* Must call mmap to acquire a GPU address for the allocation - */ -#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) - -/* IN */ -/* Page coherence Outer shareable, required. - */ -#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) - -/* Protected memory - */ -#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) - -/* Not needed physical memory - */ -#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) - -/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the - * addresses to be the same - */ -#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) - /** * BASE_MEM_RESERVED_BIT_19 - Bit 19 is reserved. * @@ -131,47 +47,15 @@ */ #define BASE_MEM_TILER_ALIGN_TOP ((base_mem_alloc_flags)1 << 20) -/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu - * mode. Some components within the GPU might only be able to access memory - * that is GPU cacheable. Refer to the specific GPU implementation for more - * details. The 3 shareability flags will be ignored for GPU uncached memory. - * If used while importing USER_BUFFER type memory, then the import will fail - * if the memory is not aligned to GPU and CPU cache line width. - */ -#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) - -/* - * Bits [22:25] for group_id (0~15). - * - * base_mem_group_id_set() should be used to pack a memory group ID into a - * base_mem_alloc_flags value instead of accessing the bits directly. 
- * base_mem_group_id_get() should be used to extract the memory group ID from - * a base_mem_alloc_flags value. - */ -#define BASEP_MEM_GROUP_ID_SHIFT 22 -#define BASE_MEM_GROUP_ID_MASK \ - ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) - -/* Must do CPU cache maintenance when imported memory is mapped/unmapped - * on GPU. Currently applicable to dma-buf type only. - */ -#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) - /* Use the GPU VA chosen by the kernel client */ #define BASE_MEM_FLAG_MAP_FIXED ((base_mem_alloc_flags)1 << 27) -/* OUT */ -/* Kernel side cache sync ops required */ -#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) - /* Force trimming of JIT allocations when creating a new allocation */ #define BASEP_MEM_PERFORM_JIT_TRIM ((base_mem_alloc_flags)1 << 29) -/* Number of bits used as flags for base memory management - * - * Must be kept in sync with the base_mem_alloc_flags flags +/* Note that the number of bits used for base_mem_alloc_flags + * must be less than BASE_MEM_FLAGS_NR_BITS !!! */ -#define BASE_MEM_FLAGS_NR_BITS 30 /* A mask of all the flags which are only valid for allocations within kbase, * and may not be passed from user space. @@ -180,29 +64,11 @@ (BASEP_MEM_PERMANENT_KERNEL_MAPPING | BASEP_MEM_NO_USER_FREE | \ BASE_MEM_FLAG_MAP_FIXED | BASEP_MEM_PERFORM_JIT_TRIM) -/* A mask for all output bits, excluding IN/OUT bits. - */ -#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP - -/* A mask for all input bits, including IN/OUT bits. - */ -#define BASE_MEM_FLAGS_INPUT_MASK \ - (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) - /* A mask of all currently reserved flags */ #define BASE_MEM_FLAGS_RESERVED \ (BASE_MEM_RESERVED_BIT_8 | BASE_MEM_RESERVED_BIT_19) -#define BASEP_MEM_INVALID_HANDLE (0ul) -#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) -#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) -/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ -#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) -#define BASE_MEM_FIRST_FREE_ADDRESS \ - ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) /* Similar to BASE_MEM_TILER_ALIGN_TOP, memory starting from the end of the * initial commit is aligned to 'extension' pages, where 'extension' must be a power @@ -227,47 +93,6 @@ #define BASE_JIT_ALLOC_VALID_FLAGS \ (BASE_JIT_ALLOC_MEM_TILER_ALIGN_TOP | BASE_JIT_ALLOC_HEAP_INFO_IS_SIZE) -/** - * typedef base_context_create_flags - Flags to pass to ::base_context_init. - * - * Flags can be ORed together to enable multiple things. - * - * These share the same space as BASEP_CONTEXT_FLAG_*, and so must - * not collide with them. - */ -typedef __u32 base_context_create_flags; - -/* No flags set */ -#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) - -/* Base context is embedded in a cctx object (flag used for CINSTR - * software counter macros) - */ -#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) - -/* Base context is a 'System Monitor' context for Hardware counters. - * - * One important side effect of this is that job submission is disabled. 
- */ -#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED \ - ((base_context_create_flags)1 << 1) - -/* Bit-shift used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) - -/* Bitmask used to encode a memory group ID in base_context_create_flags - */ -#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ - ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) - -/* Bitpattern describing the base_context_create_flags that can be - * passed to the kernel - */ -#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ - (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | \ - BASEP_CONTEXT_MMU_GROUP_ID_MASK) - /* Bitpattern describing the ::base_context_create_flags that can be * passed to base_context_init() */ @@ -287,16 +112,7 @@ typedef __u32 base_context_create_flags; #define BASEP_CONTEXT_FLAG_JOB_DUMP_DISABLED \ ((base_context_create_flags)(1 << 31)) -/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, - * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) - */ -#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) - -/* Indicate that job dumping is enabled. This could affect certain timers - * to account for the performance impact. - */ -#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) - +/* Flags for base tracepoint specific to JM */ #define BASE_TLSTREAM_FLAGS_MASK (BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS | \ BASE_TLSTREAM_JOB_DUMPING_ENABLED) /* @@ -509,9 +325,6 @@ typedef __u32 base_jd_core_req; * takes priority * * This is only guaranteed to work for BASE_JD_REQ_ONLY_COMPUTE atoms. - * - * If the core availability policy is keeping the required core group turned - * off, then the job will fail with a BASE_JD_EVENT_PM_EVENT error code. */ #define BASE_JD_REQ_SPECIFIC_COHERENT_GROUP ((base_jd_core_req)1 << 11) @@ -770,6 +583,9 @@ typedef __u8 base_jd_prio; */ #define BASE_JD_PRIO_REALTIME ((base_jd_prio)3) +/* Invalid atom priority (max uint8_t value) */ +#define BASE_JD_PRIO_INVALID ((base_jd_prio)255) + /* Count of the number of priority levels. This itself is not a valid * base_jd_prio setting */ @@ -1016,11 +832,6 @@ enum { * BASE_JD_EVENT_JOB_CONFIG_FAULT, or if the * platform doesn't support the feature specified in * the atom. 
- * @BASE_JD_EVENT_PM_EVENT: TODO: remove as it's not used - * @BASE_JD_EVENT_TIMED_OUT: TODO: remove as it's not used - * @BASE_JD_EVENT_BAG_INVALID: TODO: remove as it's not used - * @BASE_JD_EVENT_PROGRESS_REPORT: TODO: remove as it's not used - * @BASE_JD_EVENT_BAG_DONE: TODO: remove as it's not used * @BASE_JD_EVENT_DRV_TERMINATED: this is a special event generated to indicate * to userspace that the KBase context has been * destroyed and Base should stop listening for @@ -1115,17 +926,10 @@ enum base_jd_event_code { /* SW defined exceptions */ BASE_JD_EVENT_MEM_GROWTH_FAILED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x000, - BASE_JD_EVENT_TIMED_OUT = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x001, BASE_JD_EVENT_JOB_CANCELLED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x002, BASE_JD_EVENT_JOB_INVALID = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x003, - BASE_JD_EVENT_PM_EVENT = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_JOB | 0x004, - - BASE_JD_EVENT_BAG_INVALID = - BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_BAG | 0x003, BASE_JD_EVENT_RANGE_HW_FAULT_OR_SW_ERROR_END = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_RESERVED | 0x3FF, @@ -1133,10 +937,6 @@ enum base_jd_event_code { BASE_JD_EVENT_RANGE_SW_SUCCESS_START = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | 0x000, - BASE_JD_EVENT_PROGRESS_REPORT = BASE_JD_SW_EVENT | - BASE_JD_SW_EVENT_SUCCESS | BASE_JD_SW_EVENT_JOB | 0x000, - BASE_JD_EVENT_BAG_DONE = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | - BASE_JD_SW_EVENT_BAG | 0x000, BASE_JD_EVENT_DRV_TERMINATED = BASE_JD_SW_EVENT | BASE_JD_SW_EVENT_SUCCESS | BASE_JD_SW_EVENT_INFO | 0x000, @@ -1203,4 +1003,53 @@ struct base_dump_cpu_gpu_counters { __u8 padding[36]; }; +/** + * struct mali_base_gpu_core_props - GPU core props info + * + * @product_id: Pro specific value. + * @version_status: Status of the GPU release. No defined values, but starts at + * 0 and increases by one for each release status (alpha, beta, EAC, etc.). + * 4 bit values (0-15). + * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" + * release number. + * 8 bit values (0-255). + * @major_revision: Major release number of the GPU. "R" part of an "RnPn" + * release number. + * 4 bit values (0-15). + * @padding: padding to align to 8-byte + * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by + * clGetDeviceInfo() + * @log2_program_counter_size: Size of the shader program counter, in bits. + * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This + * is a bitpattern where a set bit indicates that the format is supported. + * Before using a texture format, it is recommended that the corresponding + * bit be checked. + * @paddings_1: Padding bytes. + * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. + * It is unlikely that a client will be able to allocate all of this memory + * for their own purposes, but this at least provides an upper bound on the + * memory available to the GPU. + * This is required for OpenCL's clGetDeviceInfo() call when + * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The + * client will not be expecting to allocate anywhere near this value. + * @num_exec_engines: The number of execution engines. Only valid for tGOX + * (Bifrost) GPUs, where GPU_HAS_REG_CORE_FEATURES is defined. Otherwise, + * this is always 0. + * @paddings_2: Padding bytes. 
+ */ +struct mali_base_gpu_core_props { + __u32 product_id; + __u16 version_status; + __u16 minor_revision; + __u16 major_revision; + __u16 padding; + __u32 gpu_freq_khz_max; + __u32 log2_program_counter_size; + __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; + __u8 paddings_1[4]; + __u64 gpu_available_memory_size; + __u8 num_exec_engines; + __u8 paddings_2[7]; +}; + #endif /* _UAPI_BASE_JM_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h b/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h index 215f12d..f2329f9 100644 --- a/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/jm/mali_kbase_jm_ioctl.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -125,9 +125,32 @@ * - Removed Kernel legacy HWC interface * 11.34: * - First release of new HW performance counters interface. + * 11.35: + * - Dummy model (no mali) backend will now clear HWC values after each sample + * 11.36: + * - Remove legacy definitions: + * - base_jit_alloc_info_10_2 + * - base_jit_alloc_info_11_5 + * - kbase_ioctl_mem_jit_init_10_2 + * - kbase_ioctl_mem_jit_init_11_5 + * 11.37: + * - Fix kinstr_prfcnt issues: + * - Missing implicit sample for CMD_STOP when HWCNT buffer is full. + * - Race condition when stopping periodic sampling. + * - prfcnt_block_metadata::block_idx gaps. + * - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC is removed. + * 11.38: + * - Relax the requirement to create a mapping with BASE_MEM_MAP_TRACKING_HANDLE + * before allocating GPU memory for the context. + * - CPU mappings of USER_BUFFER imported memory handles must be cached. + * 11.39: + * - Restrict child process from doing supported file operations (like mmap, ioctl, + * read, poll) on the file descriptor of mali device file that was inherited + * from the parent process. */ + #define BASE_UK_VERSION_MAJOR 11 -#define BASE_UK_VERSION_MINOR 34 +#define BASE_UK_VERSION_MINOR 39 /** * struct kbase_ioctl_version_check - Check version compatibility between diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h b/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h new file mode 100644 index 0000000..f837814 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/mali_base_common_kernel.h @@ -0,0 +1,231 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#ifndef _UAPI_BASE_COMMON_KERNEL_H_ +#define _UAPI_BASE_COMMON_KERNEL_H_ + +#include <linux/types.h> + +struct base_mem_handle { + struct { + __u64 handle; + } basep; +}; + +#define BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS 4 + +/* Memory allocation, access/hint flags & mask. + * + * See base_mem_alloc_flags. + */ + +/* IN */ +/* Read access CPU side + */ +#define BASE_MEM_PROT_CPU_RD ((base_mem_alloc_flags)1 << 0) + +/* Write access CPU side + */ +#define BASE_MEM_PROT_CPU_WR ((base_mem_alloc_flags)1 << 1) + +/* Read access GPU side + */ +#define BASE_MEM_PROT_GPU_RD ((base_mem_alloc_flags)1 << 2) + +/* Write access GPU side + */ +#define BASE_MEM_PROT_GPU_WR ((base_mem_alloc_flags)1 << 3) + +/* Execute allowed on the GPU side + */ +#define BASE_MEM_PROT_GPU_EX ((base_mem_alloc_flags)1 << 4) + +/* Will be permanently mapped in kernel space. + * Flag is only allowed on allocations originating from kbase. + */ +#define BASEP_MEM_PERMANENT_KERNEL_MAPPING ((base_mem_alloc_flags)1 << 5) + +/* The allocation will completely reside within the same 4GB chunk in the GPU + * virtual space. + * Since this flag is primarily required only for the TLS memory which will + * not be used to contain executable code and also not used for Tiler heap, + * it can't be used along with BASE_MEM_PROT_GPU_EX and TILER_ALIGN_TOP flags. + */ +#define BASE_MEM_GPU_VA_SAME_4GB_PAGE ((base_mem_alloc_flags)1 << 6) + +/* Userspace is not allowed to free this memory. + * Flag is only allowed on allocations originating from kbase. + */ +#define BASEP_MEM_NO_USER_FREE ((base_mem_alloc_flags)1 << 7) + +/* Grow backing store on GPU Page Fault + */ +#define BASE_MEM_GROW_ON_GPF ((base_mem_alloc_flags)1 << 9) + +/* Page coherence Outer shareable, if available + */ +#define BASE_MEM_COHERENT_SYSTEM ((base_mem_alloc_flags)1 << 10) + +/* Page coherence Inner shareable + */ +#define BASE_MEM_COHERENT_LOCAL ((base_mem_alloc_flags)1 << 11) + +/* IN/OUT */ +/* Should be cached on the CPU, returned if actually cached + */ +#define BASE_MEM_CACHED_CPU ((base_mem_alloc_flags)1 << 12) + +/* IN/OUT */ +/* Must have same VA on both the GPU and the CPU + */ +#define BASE_MEM_SAME_VA ((base_mem_alloc_flags)1 << 13) + +/* OUT */ +/* Must call mmap to acquire a GPU address for the allocation + */ +#define BASE_MEM_NEED_MMAP ((base_mem_alloc_flags)1 << 14) + +/* IN */ +/* Page coherence Outer shareable, required. + */ +#define BASE_MEM_COHERENT_SYSTEM_REQUIRED ((base_mem_alloc_flags)1 << 15) + +/* Protected memory + */ +#define BASE_MEM_PROTECTED ((base_mem_alloc_flags)1 << 16) + +/* Not needed physical memory + */ +#define BASE_MEM_DONT_NEED ((base_mem_alloc_flags)1 << 17) + +/* Must use shared CPU/GPU zone (SAME_VA zone) but doesn't require the + * addresses to be the same + */ +#define BASE_MEM_IMPORT_SHARED ((base_mem_alloc_flags)1 << 18) + +/* Should be uncached on the GPU, will work only for GPUs using AARCH64 mmu + * mode. Some components within the GPU might only be able to access memory + * that is GPU cacheable. Refer to the specific GPU implementation for more + * details. The 3 shareability flags will be ignored for GPU uncached memory. + * If used while importing USER_BUFFER type memory, then the import will fail + * if the memory is not aligned to GPU and CPU cache line width. + */ +#define BASE_MEM_UNCACHED_GPU ((base_mem_alloc_flags)1 << 21) + +/* + * Bits [22:25] for group_id (0~15). 
+ * + * base_mem_group_id_set() should be used to pack a memory group ID into a + * base_mem_alloc_flags value instead of accessing the bits directly. + * base_mem_group_id_get() should be used to extract the memory group ID from + * a base_mem_alloc_flags value. + */ +#define BASEP_MEM_GROUP_ID_SHIFT 22 +#define BASE_MEM_GROUP_ID_MASK ((base_mem_alloc_flags)0xF << BASEP_MEM_GROUP_ID_SHIFT) + +/* Must do CPU cache maintenance when imported memory is mapped/unmapped + * on GPU. Currently applicable to dma-buf type only. + */ +#define BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP ((base_mem_alloc_flags)1 << 26) + +/* OUT */ +/* Kernel side cache sync ops required */ +#define BASE_MEM_KERNEL_SYNC ((base_mem_alloc_flags)1 << 28) + +/* Number of bits used as flags for base memory management + * + * Must be kept in sync with the base_mem_alloc_flags flags + */ +#define BASE_MEM_FLAGS_NR_BITS 30 + +/* A mask for all output bits, excluding IN/OUT bits. + */ +#define BASE_MEM_FLAGS_OUTPUT_MASK BASE_MEM_NEED_MMAP + +/* A mask for all input bits, including IN/OUT bits. + */ +#define BASE_MEM_FLAGS_INPUT_MASK \ + (((1 << BASE_MEM_FLAGS_NR_BITS) - 1) & ~BASE_MEM_FLAGS_OUTPUT_MASK) + +/* Special base mem handles. + */ +#define BASEP_MEM_INVALID_HANDLE (0ul) +#define BASE_MEM_MMU_DUMP_HANDLE (1ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_TRACE_BUFFER_HANDLE (2ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_MAP_TRACKING_HANDLE (3ul << LOCAL_PAGE_SHIFT) +#define BASEP_MEM_WRITE_ALLOC_PAGES_HANDLE (4ul << LOCAL_PAGE_SHIFT) +/* reserved handles ..-47<<PAGE_SHIFT> for future special handles */ +#define BASE_MEM_COOKIE_BASE (64ul << LOCAL_PAGE_SHIFT) +#define BASE_MEM_FIRST_FREE_ADDRESS ((BITS_PER_LONG << LOCAL_PAGE_SHIFT) + BASE_MEM_COOKIE_BASE) + +/* Flags to pass to ::base_context_init. + * Flags can be ORed together to enable multiple things. + * + * These share the same space as BASEP_CONTEXT_FLAG_*, and so must + * not collide with them. + */ +typedef __u32 base_context_create_flags; + +/* Flags for base context */ + +/* No flags set */ +#define BASE_CONTEXT_CREATE_FLAG_NONE ((base_context_create_flags)0) + +/* Base context is embedded in a cctx object (flag used for CINSTR + * software counter macros) + */ +#define BASE_CONTEXT_CCTX_EMBEDDED ((base_context_create_flags)1 << 0) + +/* Base context is a 'System Monitor' context for Hardware counters. + * + * One important side effect of this is that job submission is disabled. + */ +#define BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED ((base_context_create_flags)1 << 1) + +/* Bit-shift used to encode a memory group ID in base_context_create_flags + */ +#define BASEP_CONTEXT_MMU_GROUP_ID_SHIFT (3) + +/* Bitmask used to encode a memory group ID in base_context_create_flags + */ +#define BASEP_CONTEXT_MMU_GROUP_ID_MASK \ + ((base_context_create_flags)0xF << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) + +/* Bitpattern describing the base_context_create_flags that can be + * passed to the kernel + */ +#define BASEP_CONTEXT_CREATE_KERNEL_FLAGS \ + (BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED | BASEP_CONTEXT_MMU_GROUP_ID_MASK) + +/* Flags for base tracepoint + */ + +/* Enable additional tracepoints for latency measurements (TL_ATOM_READY, + * TL_ATOM_DONE, TL_ATOM_PRIO_CHANGE, TL_ATOM_EVENT_POST) + */ +#define BASE_TLSTREAM_ENABLE_LATENCY_TRACEPOINTS (1 << 0) + +/* Indicate that job dumping is enabled. This could affect certain timers + * to account for the performance impact. 
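/*
 * Illustrative sketch (not part of this merge): packing an MMU memory group
 * ID into base_context_create_flags and handing it to the driver via
 * KBASE_IOCTL_SET_FLAGS (struct kbase_ioctl_set_flags, defined further down
 * in mali_kbase_ioctl.h). The fd and helper name are assumptions; the
 * shift/mask definitions are the ones relocated into
 * mali_base_common_kernel.h above.
 */
#include <sys/ioctl.h>
#include <linux/types.h>

static int example_set_context_flags(int mali_fd, __u32 mmu_group_id)
{
	struct kbase_ioctl_set_flags args;
	base_context_create_flags flags = BASE_CONTEXT_CREATE_FLAG_NONE;

	/* Encode the MMU group ID into bits [3:6] of the create flags */
	flags |= ((base_context_create_flags)mmu_group_id
		  << BASEP_CONTEXT_MMU_GROUP_ID_SHIFT) &
		 BASEP_CONTEXT_MMU_GROUP_ID_MASK;

	/* Only bits covered by BASEP_CONTEXT_CREATE_KERNEL_FLAGS may be set */
	args.create_flags = flags;

	return ioctl(mali_fd, KBASE_IOCTL_SET_FLAGS, &args);
}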
+ */ +#define BASE_TLSTREAM_JOB_DUMPING_ENABLED (1 << 1) + +#endif /* _UAPI_BASE_COMMON_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h b/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h index f3ffb36..b1b2912 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h +++ b/common/include/uapi/gpu/arm/midgard/mali_base_kernel.h @@ -27,19 +27,10 @@ #define _UAPI_BASE_KERNEL_H_ #include <linux/types.h> - -struct base_mem_handle { - struct { - __u64 handle; - } basep; -}; - #include "mali_base_mem_priv.h" #include "gpu/mali_kbase_gpu_id.h" #include "gpu/mali_kbase_gpu_coherency.h" -#define BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS 4 - #define BASE_MAX_COHERENT_GROUPS 16 #if defined(PAGE_MASK) && defined(PAGE_SHIFT) @@ -62,9 +53,13 @@ struct base_mem_handle { */ #define BASE_MEM_GROUP_DEFAULT (0) +/* Physical memory group ID for explicit SLC allocations. + */ +#define BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT (2) + /* Number of physical memory groups. */ -#define BASE_MEM_GROUP_COUNT (16) +#define BASE_MEM_GROUP_COUNT (4) /** * typedef base_mem_alloc_flags - Memory allocation, access/hint flags. @@ -206,55 +201,6 @@ struct base_mem_aliasing_info { */ #define BASE_JIT_ALLOC_COUNT (255) -/* base_jit_alloc_info in use for kernel driver versions 10.2 to early 11.5 - * - * jit_version is 1 - * - * Due to the lack of padding specified, user clients between 32 and 64-bit - * may have assumed a different size of the struct - * - * An array of structures was not supported - */ -struct base_jit_alloc_info_10_2 { - __u64 gpu_alloc_addr; - __u64 va_pages; - __u64 commit_pages; - __u64 extension; - __u8 id; -}; - -/* base_jit_alloc_info introduced by kernel driver version 11.5, and in use up - * to 11.19 - * - * This structure had a number of modifications during and after kernel driver - * version 11.5, but remains size-compatible throughout its version history, and - * with earlier variants compatible with future variants by requiring - * zero-initialization to the unused space in the structure. - * - * jit_version is 2 - * - * Kernel driver version history: - * 11.5: Initial introduction with 'usage_id' and padding[5]. All padding bytes - * must be zero. Kbase minor version was not incremented, so some - * versions of 11.5 do not have this change. - * 11.5: Added 'bin_id' and 'max_allocations', replacing 2 padding bytes (Kbase - * minor version not incremented) - * 11.6: Added 'flags', replacing 1 padding byte - * 11.10: Arrays of this structure are supported - */ -struct base_jit_alloc_info_11_5 { - __u64 gpu_alloc_addr; - __u64 va_pages; - __u64 commit_pages; - __u64 extension; - __u8 id; - __u8 bin_id; - __u8 max_allocations; - __u8 flags; - __u8 padding[2]; - __u16 usage_id; -}; - /** * struct base_jit_alloc_info - Structure which describes a JIT allocation * request. @@ -284,16 +230,6 @@ struct base_jit_alloc_info_11_5 { * @heap_info_gpu_addr: Pointer to an object in GPU memory describing * the actual usage of the region. * - * jit_version is 3. - * - * When modifications are made to this structure, it is still compatible with - * jit_version 3 when: a) the size is unchanged, and b) new members only - * replace the padding bytes. - * - * Previous jit_version history: - * jit_version == 1, refer to &base_jit_alloc_info_10_2 - * jit_version == 2, refer to &base_jit_alloc_info_11_5 - * * Kbase version history: * 11.20: added @heap_info_gpu_addr */ @@ -458,49 +394,6 @@ struct base_jd_debug_copy_buffer { * 16 coherent groups, since core groups are typically 4 cores. 
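/*
 * Illustrative sketch (not part of this merge): how an allocation could
 * target the new explicit-SLC physical memory group
 * (BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT == 2) by packing the group ID into
 * base_mem_alloc_flags. The helper below is only an illustration of what
 * base_mem_group_id_set(), referenced in mali_base_common_kernel.h above,
 * is expected to do; the shift and mask values are taken from this merge.
 */
static inline base_mem_alloc_flags
example_mem_flags_with_group(base_mem_alloc_flags flags, unsigned int group_id)
{
	/* group_id occupies bits [22:25]; BASE_MEM_GROUP_COUNT is now 4 */
	return (flags & ~BASE_MEM_GROUP_ID_MASK) |
	       (((base_mem_alloc_flags)group_id << BASEP_MEM_GROUP_ID_SHIFT) &
		BASE_MEM_GROUP_ID_MASK);
}

/*
 * Usage: a CPU/GPU readable and writable allocation placed in the explicit
 * SLC group would carry flags such as:
 *
 *   example_mem_flags_with_group(BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_CPU_WR |
 *                                BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR,
 *                                BASE_MEM_GROUP_PIXEL_SLC_EXPLICIT);
 */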
*/ -/** - * struct mali_base_gpu_core_props - GPU core props info - * - * @product_id: Pro specific value. - * @version_status: Status of the GPU release. No defined values, but starts at - * 0 and increases by one for each release status (alpha, beta, EAC, etc.). - * 4 bit values (0-15). - * @minor_revision: Minor release number of the GPU. "P" part of an "RnPn" - * release number. - * 8 bit values (0-255). - * @major_revision: Major release number of the GPU. "R" part of an "RnPn" - * release number. - * 4 bit values (0-15). - * @padding: padding to allign to 8-byte - * @gpu_freq_khz_max: The maximum GPU frequency. Reported to applications by - * clGetDeviceInfo() - * @log2_program_counter_size: Size of the shader program counter, in bits. - * @texture_features: TEXTURE_FEATURES_x registers, as exposed by the GPU. This - * is a bitpattern where a set bit indicates that the format is supported. - * Before using a texture format, it is recommended that the corresponding - * bit be checked. - * @gpu_available_memory_size: Theoretical maximum memory available to the GPU. - * It is unlikely that a client will be able to allocate all of this memory - * for their own purposes, but this at least provides an upper bound on the - * memory available to the GPU. - * This is required for OpenCL's clGetDeviceInfo() call when - * CL_DEVICE_GLOBAL_MEM_SIZE is requested, for OpenCL GPU devices. The - * client will not be expecting to allocate anywhere near this value. - * @num_exec_engines: The number of execution engines. - */ -struct mali_base_gpu_core_props { - __u32 product_id; - __u16 version_status; - __u16 minor_revision; - __u16 major_revision; - __u16 padding; - __u32 gpu_freq_khz_max; - __u32 log2_program_counter_size; - __u32 texture_features[BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS]; - __u64 gpu_available_memory_size; - __u8 num_exec_engines; -}; - /* * More information is possible - but associativity and bus width are not * required by upper-level apis. @@ -531,7 +424,7 @@ struct mali_base_gpu_tiler_props { * field. * @impl_tech: 0 = Not specified, 1 = Silicon, 2 = FPGA, * 3 = SW Model/Emulation - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * @tls_alloc: Number of threads per core that TLS must be * allocated for */ @@ -551,7 +444,7 @@ struct mali_base_gpu_thread_props { * struct mali_base_gpu_coherent_group - descriptor for a coherent group * @core_mask: Core restriction mask required for the group * @num_cores: Number of cores in the group - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * * \c core_mask exposes all cores in that coherent group, and \c num_cores * provides a cached population-count for that mask. @@ -581,7 +474,7 @@ struct mali_base_gpu_coherent_group { * are in the group[] member. Use num_groups instead. * @coherency: Coherency features of the memory, accessed by gpu_mem_features * methods - * @padding: padding to allign to 8-byte + * @padding: padding to align to 8-byte * @group: Descriptors of coherent groups * * Note that the sizes of the members could be reduced. However, the \c group @@ -599,6 +492,12 @@ struct mali_base_gpu_coherent_group_info { struct mali_base_gpu_coherent_group group[BASE_MAX_COHERENT_GROUPS]; }; +#if MALI_USE_CSF +#include "csf/mali_base_csf_kernel.h" +#else +#include "jm/mali_base_jm_kernel.h" +#endif + /** * struct gpu_raw_gpu_props - A complete description of the GPU's Hardware * Configuration Discovery registers. 
@@ -696,12 +595,6 @@ struct base_gpu_props { struct mali_base_gpu_coherent_group_info coherency_info; }; -#if MALI_USE_CSF -#include "csf/mali_base_csf_kernel.h" -#else -#include "jm/mali_base_jm_kernel.h" -#endif - #define BASE_MEM_GROUP_ID_GET(flags) \ ((flags & BASE_MEM_GROUP_ID_MASK) >> BASEP_MEM_GROUP_ID_SHIFT) diff --git a/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h b/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h index 304a334..70f5b09 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h +++ b/common/include/uapi/gpu/arm/midgard/mali_base_mem_priv.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2015, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2015, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,8 +23,7 @@ #define _UAPI_BASE_MEM_PRIV_H_ #include <linux/types.h> - -#include "mali_base_kernel.h" +#include "mali_base_common_kernel.h" #define BASE_SYNCSET_OP_MSYNC (1U << 0) #define BASE_SYNCSET_OP_CSYNC (1U << 1) diff --git a/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h index 42d93ba..5089bf2 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h @@ -221,6 +221,7 @@ struct prfcnt_enum_sample_info { /** * struct prfcnt_enum_item - Performance counter enumeration item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing discriptor for enumeration item type. * @u.block_counter: Performance counter block descriptor. @@ -229,6 +230,7 @@ struct prfcnt_enum_sample_info { */ struct prfcnt_enum_item { struct prfcnt_item_header hdr; + __u8 padding[4]; /** union u - union of block_counter and request */ union { struct prfcnt_enum_block_counter block_counter; @@ -305,6 +307,7 @@ struct prfcnt_request_scope { /** * struct prfcnt_request_item - Performance counter request item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing descriptor for request type. * @u.req_mode: Mode request descriptor. @@ -313,6 +316,7 @@ struct prfcnt_request_scope { */ struct prfcnt_request_item { struct prfcnt_item_header hdr; + __u8 padding[4]; /** union u - union on req_mode and req_enable */ union { struct prfcnt_request_mode req_mode; @@ -417,6 +421,7 @@ struct prfcnt_block_metadata { /** * struct prfcnt_metadata - Performance counter metadata item. + * @padding: Padding bytes. * @hdr: Header describing the type of item in the list. * @u: Structure containing descriptor for metadata type. * @u.sample_md: Counter sample data metadata descriptor. @@ -425,6 +430,7 @@ struct prfcnt_block_metadata { */ struct prfcnt_metadata { struct prfcnt_item_header hdr; + __u8 padding[4]; union { struct prfcnt_sample_metadata sample_md; struct prfcnt_clock_metadata clock_md; @@ -439,7 +445,7 @@ struct prfcnt_metadata { * @PRFCNT_CONTROL_CMD_STOP: Stop the counter data dump run for the * calling client session. * @PRFCNT_CONTROL_CMD_SAMPLE_SYNC: Trigger a synchronous manual sample. - * @PRFCNT_CONTROL_CMD_SAMPLE_ASYNC: Trigger an asynchronous manual sample. + * @PRFCNT_CONTROL_CMD_RESERVED: Previously SAMPLE_ASYNC not supported any more. 
* @PRFCNT_CONTROL_CMD_DISCARD: Discard all samples which have not yet * been consumed by userspace. Note that * this can race with new samples if @@ -449,7 +455,7 @@ enum prfcnt_control_cmd_code { PRFCNT_CONTROL_CMD_START = 1, PRFCNT_CONTROL_CMD_STOP, PRFCNT_CONTROL_CMD_SAMPLE_SYNC, - PRFCNT_CONTROL_CMD_SAMPLE_ASYNC, + PRFCNT_CONTROL_CMD_RESERVED, PRFCNT_CONTROL_CMD_DISCARD, }; diff --git a/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h index d1d5f3d..e72c82e 100644 --- a/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_ioctl.h @@ -46,8 +46,7 @@ struct kbase_ioctl_set_flags { __u32 create_flags; }; -#define KBASE_IOCTL_SET_FLAGS \ - _IOW(KBASE_IOCTL_TYPE, 1, struct kbase_ioctl_set_flags) +#define KBASE_IOCTL_SET_FLAGS _IOW(KBASE_IOCTL_TYPE, 1, struct kbase_ioctl_set_flags) /** * struct kbase_ioctl_get_gpuprops - Read GPU properties from the kernel @@ -81,8 +80,7 @@ struct kbase_ioctl_get_gpuprops { __u32 flags; }; -#define KBASE_IOCTL_GET_GPUPROPS \ - _IOW(KBASE_IOCTL_TYPE, 3, struct kbase_ioctl_get_gpuprops) +#define KBASE_IOCTL_GET_GPUPROPS _IOW(KBASE_IOCTL_TYPE, 3, struct kbase_ioctl_get_gpuprops) /** * union kbase_ioctl_mem_alloc - Allocate memory on the GPU @@ -108,8 +106,7 @@ union kbase_ioctl_mem_alloc { } out; }; -#define KBASE_IOCTL_MEM_ALLOC \ - _IOWR(KBASE_IOCTL_TYPE, 5, union kbase_ioctl_mem_alloc) +#define KBASE_IOCTL_MEM_ALLOC _IOWR(KBASE_IOCTL_TYPE, 5, union kbase_ioctl_mem_alloc) /** * struct kbase_ioctl_mem_query - Query properties of a GPU memory region @@ -131,12 +128,11 @@ union kbase_ioctl_mem_query { } out; }; -#define KBASE_IOCTL_MEM_QUERY \ - _IOWR(KBASE_IOCTL_TYPE, 6, union kbase_ioctl_mem_query) +#define KBASE_IOCTL_MEM_QUERY _IOWR(KBASE_IOCTL_TYPE, 6, union kbase_ioctl_mem_query) -#define KBASE_MEM_QUERY_COMMIT_SIZE ((__u64)1) -#define KBASE_MEM_QUERY_VA_SIZE ((__u64)2) -#define KBASE_MEM_QUERY_FLAGS ((__u64)3) +#define KBASE_MEM_QUERY_COMMIT_SIZE ((__u64)1) +#define KBASE_MEM_QUERY_VA_SIZE ((__u64)2) +#define KBASE_MEM_QUERY_FLAGS ((__u64)3) /** * struct kbase_ioctl_mem_free - Free a memory region @@ -146,8 +142,7 @@ struct kbase_ioctl_mem_free { __u64 gpu_addr; }; -#define KBASE_IOCTL_MEM_FREE \ - _IOW(KBASE_IOCTL_TYPE, 7, struct kbase_ioctl_mem_free) +#define KBASE_IOCTL_MEM_FREE _IOW(KBASE_IOCTL_TYPE, 7, struct kbase_ioctl_mem_free) /** * struct kbase_ioctl_hwcnt_reader_setup - Setup HWC dumper/reader @@ -167,7 +162,7 @@ struct kbase_ioctl_hwcnt_reader_setup { __u32 mmu_l2_bm; }; -#define KBASE_IOCTL_HWCNT_READER_SETUP \ +#define KBASE_IOCTL_HWCNT_READER_SETUP \ _IOW(KBASE_IOCTL_TYPE, 8, struct kbase_ioctl_hwcnt_reader_setup) /** @@ -182,8 +177,7 @@ struct kbase_ioctl_hwcnt_values { __u32 padding; }; -#define KBASE_IOCTL_HWCNT_SET \ - _IOW(KBASE_IOCTL_TYPE, 32, struct kbase_ioctl_hwcnt_values) +#define KBASE_IOCTL_HWCNT_SET _IOW(KBASE_IOCTL_TYPE, 32, struct kbase_ioctl_hwcnt_values) /** * struct kbase_ioctl_disjoint_query - Query the disjoint counter @@ -193,8 +187,7 @@ struct kbase_ioctl_disjoint_query { __u32 counter; }; -#define KBASE_IOCTL_DISJOINT_QUERY \ - _IOR(KBASE_IOCTL_TYPE, 12, struct kbase_ioctl_disjoint_query) +#define KBASE_IOCTL_DISJOINT_QUERY _IOR(KBASE_IOCTL_TYPE, 12, struct kbase_ioctl_disjoint_query) /** * struct kbase_ioctl_get_ddk_version - Query the kernel version @@ -215,54 +208,7 @@ struct kbase_ioctl_get_ddk_version { __u32 padding; }; -#define KBASE_IOCTL_GET_DDK_VERSION \ - _IOW(KBASE_IOCTL_TYPE, 13, struct 
kbase_ioctl_get_ddk_version) - -/** - * struct kbase_ioctl_mem_jit_init_10_2 - Initialize the just-in-time memory - * allocator (between kernel driver - * version 10.2--11.4) - * @va_pages: Number of VA pages to reserve for JIT - * - * Note that depending on the VA size of the application and GPU, the value - * specified in @va_pages may be ignored. - * - * New code should use KBASE_IOCTL_MEM_JIT_INIT instead, this is kept for - * backwards compatibility. - */ -struct kbase_ioctl_mem_jit_init_10_2 { - __u64 va_pages; -}; - -#define KBASE_IOCTL_MEM_JIT_INIT_10_2 \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init_10_2) - -/** - * struct kbase_ioctl_mem_jit_init_11_5 - Initialize the just-in-time memory - * allocator (between kernel driver - * version 11.5--11.19) - * @va_pages: Number of VA pages to reserve for JIT - * @max_allocations: Maximum number of concurrent allocations - * @trim_level: Level of JIT allocation trimming to perform on free (0 - 100%) - * @group_id: Group ID to be used for physical allocations - * @padding: Currently unused, must be zero - * - * Note that depending on the VA size of the application and GPU, the value - * specified in @va_pages may be ignored. - * - * New code should use KBASE_IOCTL_MEM_JIT_INIT instead, this is kept for - * backwards compatibility. - */ -struct kbase_ioctl_mem_jit_init_11_5 { - __u64 va_pages; - __u8 max_allocations; - __u8 trim_level; - __u8 group_id; - __u8 padding[5]; -}; - -#define KBASE_IOCTL_MEM_JIT_INIT_11_5 \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init_11_5) +#define KBASE_IOCTL_GET_DDK_VERSION _IOW(KBASE_IOCTL_TYPE, 13, struct kbase_ioctl_get_ddk_version) /** * struct kbase_ioctl_mem_jit_init - Initialize the just-in-time memory @@ -287,8 +233,7 @@ struct kbase_ioctl_mem_jit_init { __u64 phys_pages; }; -#define KBASE_IOCTL_MEM_JIT_INIT \ - _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init) +#define KBASE_IOCTL_MEM_JIT_INIT _IOW(KBASE_IOCTL_TYPE, 14, struct kbase_ioctl_mem_jit_init) /** * struct kbase_ioctl_mem_sync - Perform cache maintenance on memory @@ -308,8 +253,7 @@ struct kbase_ioctl_mem_sync { __u8 padding[7]; }; -#define KBASE_IOCTL_MEM_SYNC \ - _IOW(KBASE_IOCTL_TYPE, 15, struct kbase_ioctl_mem_sync) +#define KBASE_IOCTL_MEM_SYNC _IOW(KBASE_IOCTL_TYPE, 15, struct kbase_ioctl_mem_sync) /** * union kbase_ioctl_mem_find_cpu_offset - Find the offset of a CPU pointer @@ -332,7 +276,7 @@ union kbase_ioctl_mem_find_cpu_offset { } out; }; -#define KBASE_IOCTL_MEM_FIND_CPU_OFFSET \ +#define KBASE_IOCTL_MEM_FIND_CPU_OFFSET \ _IOWR(KBASE_IOCTL_TYPE, 16, union kbase_ioctl_mem_find_cpu_offset) /** @@ -344,8 +288,7 @@ struct kbase_ioctl_get_context_id { __u32 id; }; -#define KBASE_IOCTL_GET_CONTEXT_ID \ - _IOR(KBASE_IOCTL_TYPE, 17, struct kbase_ioctl_get_context_id) +#define KBASE_IOCTL_GET_CONTEXT_ID _IOR(KBASE_IOCTL_TYPE, 17, struct kbase_ioctl_get_context_id) /** * struct kbase_ioctl_tlstream_acquire - Acquire a tlstream fd @@ -358,11 +301,9 @@ struct kbase_ioctl_tlstream_acquire { __u32 flags; }; -#define KBASE_IOCTL_TLSTREAM_ACQUIRE \ - _IOW(KBASE_IOCTL_TYPE, 18, struct kbase_ioctl_tlstream_acquire) +#define KBASE_IOCTL_TLSTREAM_ACQUIRE _IOW(KBASE_IOCTL_TYPE, 18, struct kbase_ioctl_tlstream_acquire) -#define KBASE_IOCTL_TLSTREAM_FLUSH \ - _IO(KBASE_IOCTL_TYPE, 19) +#define KBASE_IOCTL_TLSTREAM_FLUSH _IO(KBASE_IOCTL_TYPE, 19) /** * struct kbase_ioctl_mem_commit - Change the amount of memory backing a region @@ -379,8 +320,7 @@ struct kbase_ioctl_mem_commit { __u64 pages; }; -#define 
KBASE_IOCTL_MEM_COMMIT \ - _IOW(KBASE_IOCTL_TYPE, 20, struct kbase_ioctl_mem_commit) +#define KBASE_IOCTL_MEM_COMMIT _IOW(KBASE_IOCTL_TYPE, 20, struct kbase_ioctl_mem_commit) /** * union kbase_ioctl_mem_alias - Create an alias of memory regions @@ -408,8 +348,7 @@ union kbase_ioctl_mem_alias { } out; }; -#define KBASE_IOCTL_MEM_ALIAS \ - _IOWR(KBASE_IOCTL_TYPE, 21, union kbase_ioctl_mem_alias) +#define KBASE_IOCTL_MEM_ALIAS _IOWR(KBASE_IOCTL_TYPE, 21, union kbase_ioctl_mem_alias) /** * union kbase_ioctl_mem_import - Import memory for use by the GPU @@ -437,8 +376,7 @@ union kbase_ioctl_mem_import { } out; }; -#define KBASE_IOCTL_MEM_IMPORT \ - _IOWR(KBASE_IOCTL_TYPE, 22, union kbase_ioctl_mem_import) +#define KBASE_IOCTL_MEM_IMPORT _IOWR(KBASE_IOCTL_TYPE, 22, union kbase_ioctl_mem_import) /** * struct kbase_ioctl_mem_flags_change - Change the flags for a memory region @@ -452,8 +390,7 @@ struct kbase_ioctl_mem_flags_change { __u64 mask; }; -#define KBASE_IOCTL_MEM_FLAGS_CHANGE \ - _IOW(KBASE_IOCTL_TYPE, 23, struct kbase_ioctl_mem_flags_change) +#define KBASE_IOCTL_MEM_FLAGS_CHANGE _IOW(KBASE_IOCTL_TYPE, 23, struct kbase_ioctl_mem_flags_change) /** * struct kbase_ioctl_stream_create - Create a synchronisation stream @@ -470,8 +407,7 @@ struct kbase_ioctl_stream_create { char name[32]; }; -#define KBASE_IOCTL_STREAM_CREATE \ - _IOW(KBASE_IOCTL_TYPE, 24, struct kbase_ioctl_stream_create) +#define KBASE_IOCTL_STREAM_CREATE _IOW(KBASE_IOCTL_TYPE, 24, struct kbase_ioctl_stream_create) /** * struct kbase_ioctl_fence_validate - Validate a fd refers to a fence @@ -481,8 +417,7 @@ struct kbase_ioctl_fence_validate { int fd; }; -#define KBASE_IOCTL_FENCE_VALIDATE \ - _IOW(KBASE_IOCTL_TYPE, 25, struct kbase_ioctl_fence_validate) +#define KBASE_IOCTL_FENCE_VALIDATE _IOW(KBASE_IOCTL_TYPE, 25, struct kbase_ioctl_fence_validate) /** * struct kbase_ioctl_mem_profile_add - Provide profiling information to kernel @@ -498,8 +433,7 @@ struct kbase_ioctl_mem_profile_add { __u32 padding; }; -#define KBASE_IOCTL_MEM_PROFILE_ADD \ - _IOW(KBASE_IOCTL_TYPE, 27, struct kbase_ioctl_mem_profile_add) +#define KBASE_IOCTL_MEM_PROFILE_ADD _IOW(KBASE_IOCTL_TYPE, 27, struct kbase_ioctl_mem_profile_add) /** * struct kbase_ioctl_sticky_resource_map - Permanently map an external resource @@ -511,7 +445,7 @@ struct kbase_ioctl_sticky_resource_map { __u64 address; }; -#define KBASE_IOCTL_STICKY_RESOURCE_MAP \ +#define KBASE_IOCTL_STICKY_RESOURCE_MAP \ _IOW(KBASE_IOCTL_TYPE, 29, struct kbase_ioctl_sticky_resource_map) /** @@ -525,7 +459,7 @@ struct kbase_ioctl_sticky_resource_unmap { __u64 address; }; -#define KBASE_IOCTL_STICKY_RESOURCE_UNMAP \ +#define KBASE_IOCTL_STICKY_RESOURCE_UNMAP \ _IOW(KBASE_IOCTL_TYPE, 30, struct kbase_ioctl_sticky_resource_unmap) /** @@ -553,17 +487,16 @@ union kbase_ioctl_mem_find_gpu_start_and_offset { } out; }; -#define KBASE_IOCTL_MEM_FIND_GPU_START_AND_OFFSET \ +#define KBASE_IOCTL_MEM_FIND_GPU_START_AND_OFFSET \ _IOWR(KBASE_IOCTL_TYPE, 31, union kbase_ioctl_mem_find_gpu_start_and_offset) -#define KBASE_IOCTL_CINSTR_GWT_START \ - _IO(KBASE_IOCTL_TYPE, 33) +#define KBASE_IOCTL_CINSTR_GWT_START _IO(KBASE_IOCTL_TYPE, 33) -#define KBASE_IOCTL_CINSTR_GWT_STOP \ - _IO(KBASE_IOCTL_TYPE, 34) +#define KBASE_IOCTL_CINSTR_GWT_STOP _IO(KBASE_IOCTL_TYPE, 34) /** - * union kbase_ioctl_gwt_dump - Used to collect all GPU write fault addresses. + * union kbase_ioctl_cinstr_gwt_dump - Used to collect all GPU write fault + * addresses. 
* @in: Input parameters * @in.addr_buffer: Address of buffer to hold addresses of gpu modified areas. * @in.size_buffer: Address of buffer to hold size of modified areas (in pages) @@ -592,8 +525,7 @@ union kbase_ioctl_cinstr_gwt_dump { } out; }; -#define KBASE_IOCTL_CINSTR_GWT_DUMP \ - _IOWR(KBASE_IOCTL_TYPE, 35, union kbase_ioctl_cinstr_gwt_dump) +#define KBASE_IOCTL_CINSTR_GWT_DUMP _IOWR(KBASE_IOCTL_TYPE, 35, union kbase_ioctl_cinstr_gwt_dump) /** * struct kbase_ioctl_mem_exec_init - Initialise the EXEC_VA memory zone @@ -604,8 +536,7 @@ struct kbase_ioctl_mem_exec_init { __u64 va_pages; }; -#define KBASE_IOCTL_MEM_EXEC_INIT \ - _IOW(KBASE_IOCTL_TYPE, 38, struct kbase_ioctl_mem_exec_init) +#define KBASE_IOCTL_MEM_EXEC_INIT _IOW(KBASE_IOCTL_TYPE, 38, struct kbase_ioctl_mem_exec_init) /** * union kbase_ioctl_get_cpu_gpu_timeinfo - Request zero or more types of @@ -634,7 +565,7 @@ union kbase_ioctl_get_cpu_gpu_timeinfo { } out; }; -#define KBASE_IOCTL_GET_CPU_GPU_TIMEINFO \ +#define KBASE_IOCTL_GET_CPU_GPU_TIMEINFO \ _IOWR(KBASE_IOCTL_TYPE, 50, union kbase_ioctl_get_cpu_gpu_timeinfo) /** @@ -646,7 +577,7 @@ struct kbase_ioctl_context_priority_check { __u8 priority; }; -#define KBASE_IOCTL_CONTEXT_PRIORITY_CHECK \ +#define KBASE_IOCTL_CONTEXT_PRIORITY_CHECK \ _IOWR(KBASE_IOCTL_TYPE, 54, struct kbase_ioctl_context_priority_check) /** @@ -658,7 +589,7 @@ struct kbase_ioctl_set_limited_core_count { __u8 max_core_count; }; -#define KBASE_IOCTL_SET_LIMITED_CORE_COUNT \ +#define KBASE_IOCTL_SET_LIMITED_CORE_COUNT \ _IOW(KBASE_IOCTL_TYPE, 55, struct kbase_ioctl_set_limited_core_count) /** @@ -679,11 +610,11 @@ struct kbase_ioctl_kinstr_prfcnt_enum_info { __u64 info_list_ptr; }; -#define KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO \ +#define KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO \ _IOWR(KBASE_IOCTL_TYPE, 56, struct kbase_ioctl_kinstr_prfcnt_enum_info) /** - * struct kbase_ioctl_hwcnt_reader_setup - Setup HWC dumper/reader + * struct kbase_ioctl_kinstr_prfcnt_setup - Setup HWC dumper/reader * @in: input parameters. * @in.request_item_count: Number of requests in the requests array. * @in.request_item_size: Size in bytes of each request in the requests array. 
@@ -708,7 +639,7 @@ union kbase_ioctl_kinstr_prfcnt_setup { } out; }; -#define KBASE_IOCTL_KINSTR_PRFCNT_SETUP \ +#define KBASE_IOCTL_KINSTR_PRFCNT_SETUP \ _IOWR(KBASE_IOCTL_TYPE, 57, union kbase_ioctl_kinstr_prfcnt_setup) /*************** @@ -727,6 +658,27 @@ struct kbase_ioctl_apc_request { #define KBASE_IOCTL_APC_REQUEST \ _IOW(KBASE_IOCTL_TYPE, 66, struct kbase_ioctl_apc_request) +/** + * struct kbase_ioctl_buffer_liveness_update - Update the live ranges of buffers from previous frame + * + * @live_ranges_address: Array of live ranges + * @live_ranges_count: Number of elements in the live ranges buffer + * @buffer_va_address: Array of buffer base virtual addresses + * @buffer_sizes_address: Array of buffer sizes + * @buffer_count: Number of buffers + * @padding: Unused + */ +struct kbase_ioctl_buffer_liveness_update { + __u64 live_ranges_address; + __u64 live_ranges_count; + __u64 buffer_va_address; + __u64 buffer_sizes_address; + __u64 buffer_count; +}; + +#define KBASE_IOCTL_BUFFER_LIVENESS_UPDATE \ + _IOW(KBASE_IOCTL_TYPE, 67, struct kbase_ioctl_buffer_liveness_update) + /*************** * test ioctls * ***************/ @@ -748,8 +700,7 @@ struct kbase_ioctl_tlstream_stats { __u32 bytes_generated; }; -#define KBASE_IOCTL_TLSTREAM_STATS \ - _IOR(KBASE_IOCTL_TEST_TYPE, 2, struct kbase_ioctl_tlstream_stats) +#define KBASE_IOCTL_TLSTREAM_STATS _IOR(KBASE_IOCTL_TEST_TYPE, 2, struct kbase_ioctl_tlstream_stats) #endif /* MALI_UNIT_TEST */ @@ -767,108 +718,107 @@ struct kbase_ioctl_tlstream_stats { * _IOWR(KBASE_IOCTL_EXTRA_TYPE, 0, struct my_ioctl_args) */ - /********************************** * Definitions for GPU properties * **********************************/ -#define KBASE_GPUPROP_VALUE_SIZE_U8 (0x0) -#define KBASE_GPUPROP_VALUE_SIZE_U16 (0x1) -#define KBASE_GPUPROP_VALUE_SIZE_U32 (0x2) -#define KBASE_GPUPROP_VALUE_SIZE_U64 (0x3) - -#define KBASE_GPUPROP_PRODUCT_ID 1 -#define KBASE_GPUPROP_VERSION_STATUS 2 -#define KBASE_GPUPROP_MINOR_REVISION 3 -#define KBASE_GPUPROP_MAJOR_REVISION 4 +#define KBASE_GPUPROP_VALUE_SIZE_U8 (0x0) +#define KBASE_GPUPROP_VALUE_SIZE_U16 (0x1) +#define KBASE_GPUPROP_VALUE_SIZE_U32 (0x2) +#define KBASE_GPUPROP_VALUE_SIZE_U64 (0x3) + +#define KBASE_GPUPROP_PRODUCT_ID 1 +#define KBASE_GPUPROP_VERSION_STATUS 2 +#define KBASE_GPUPROP_MINOR_REVISION 3 +#define KBASE_GPUPROP_MAJOR_REVISION 4 /* 5 previously used for GPU speed */ -#define KBASE_GPUPROP_GPU_FREQ_KHZ_MAX 6 +#define KBASE_GPUPROP_GPU_FREQ_KHZ_MAX 6 /* 7 previously used for minimum GPU speed */ -#define KBASE_GPUPROP_LOG2_PROGRAM_COUNTER_SIZE 8 -#define KBASE_GPUPROP_TEXTURE_FEATURES_0 9 -#define KBASE_GPUPROP_TEXTURE_FEATURES_1 10 -#define KBASE_GPUPROP_TEXTURE_FEATURES_2 11 -#define KBASE_GPUPROP_GPU_AVAILABLE_MEMORY_SIZE 12 - -#define KBASE_GPUPROP_L2_LOG2_LINE_SIZE 13 -#define KBASE_GPUPROP_L2_LOG2_CACHE_SIZE 14 -#define KBASE_GPUPROP_L2_NUM_L2_SLICES 15 - -#define KBASE_GPUPROP_TILER_BIN_SIZE_BYTES 16 -#define KBASE_GPUPROP_TILER_MAX_ACTIVE_LEVELS 17 - -#define KBASE_GPUPROP_MAX_THREADS 18 -#define KBASE_GPUPROP_MAX_WORKGROUP_SIZE 19 -#define KBASE_GPUPROP_MAX_BARRIER_SIZE 20 -#define KBASE_GPUPROP_MAX_REGISTERS 21 -#define KBASE_GPUPROP_MAX_TASK_QUEUE 22 -#define KBASE_GPUPROP_MAX_THREAD_GROUP_SPLIT 23 -#define KBASE_GPUPROP_IMPL_TECH 24 - -#define KBASE_GPUPROP_RAW_SHADER_PRESENT 25 -#define KBASE_GPUPROP_RAW_TILER_PRESENT 26 -#define KBASE_GPUPROP_RAW_L2_PRESENT 27 -#define KBASE_GPUPROP_RAW_STACK_PRESENT 28 -#define KBASE_GPUPROP_RAW_L2_FEATURES 29 -#define KBASE_GPUPROP_RAW_CORE_FEATURES 30 
-#define KBASE_GPUPROP_RAW_MEM_FEATURES 31 -#define KBASE_GPUPROP_RAW_MMU_FEATURES 32 -#define KBASE_GPUPROP_RAW_AS_PRESENT 33 -#define KBASE_GPUPROP_RAW_JS_PRESENT 34 -#define KBASE_GPUPROP_RAW_JS_FEATURES_0 35 -#define KBASE_GPUPROP_RAW_JS_FEATURES_1 36 -#define KBASE_GPUPROP_RAW_JS_FEATURES_2 37 -#define KBASE_GPUPROP_RAW_JS_FEATURES_3 38 -#define KBASE_GPUPROP_RAW_JS_FEATURES_4 39 -#define KBASE_GPUPROP_RAW_JS_FEATURES_5 40 -#define KBASE_GPUPROP_RAW_JS_FEATURES_6 41 -#define KBASE_GPUPROP_RAW_JS_FEATURES_7 42 -#define KBASE_GPUPROP_RAW_JS_FEATURES_8 43 -#define KBASE_GPUPROP_RAW_JS_FEATURES_9 44 -#define KBASE_GPUPROP_RAW_JS_FEATURES_10 45 -#define KBASE_GPUPROP_RAW_JS_FEATURES_11 46 -#define KBASE_GPUPROP_RAW_JS_FEATURES_12 47 -#define KBASE_GPUPROP_RAW_JS_FEATURES_13 48 -#define KBASE_GPUPROP_RAW_JS_FEATURES_14 49 -#define KBASE_GPUPROP_RAW_JS_FEATURES_15 50 -#define KBASE_GPUPROP_RAW_TILER_FEATURES 51 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_0 52 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_1 53 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_2 54 -#define KBASE_GPUPROP_RAW_GPU_ID 55 -#define KBASE_GPUPROP_RAW_THREAD_MAX_THREADS 56 -#define KBASE_GPUPROP_RAW_THREAD_MAX_WORKGROUP_SIZE 57 -#define KBASE_GPUPROP_RAW_THREAD_MAX_BARRIER_SIZE 58 -#define KBASE_GPUPROP_RAW_THREAD_FEATURES 59 -#define KBASE_GPUPROP_RAW_COHERENCY_MODE 60 - -#define KBASE_GPUPROP_COHERENCY_NUM_GROUPS 61 -#define KBASE_GPUPROP_COHERENCY_NUM_CORE_GROUPS 62 -#define KBASE_GPUPROP_COHERENCY_COHERENCY 63 -#define KBASE_GPUPROP_COHERENCY_GROUP_0 64 -#define KBASE_GPUPROP_COHERENCY_GROUP_1 65 -#define KBASE_GPUPROP_COHERENCY_GROUP_2 66 -#define KBASE_GPUPROP_COHERENCY_GROUP_3 67 -#define KBASE_GPUPROP_COHERENCY_GROUP_4 68 -#define KBASE_GPUPROP_COHERENCY_GROUP_5 69 -#define KBASE_GPUPROP_COHERENCY_GROUP_6 70 -#define KBASE_GPUPROP_COHERENCY_GROUP_7 71 -#define KBASE_GPUPROP_COHERENCY_GROUP_8 72 -#define KBASE_GPUPROP_COHERENCY_GROUP_9 73 -#define KBASE_GPUPROP_COHERENCY_GROUP_10 74 -#define KBASE_GPUPROP_COHERENCY_GROUP_11 75 -#define KBASE_GPUPROP_COHERENCY_GROUP_12 76 -#define KBASE_GPUPROP_COHERENCY_GROUP_13 77 -#define KBASE_GPUPROP_COHERENCY_GROUP_14 78 -#define KBASE_GPUPROP_COHERENCY_GROUP_15 79 - -#define KBASE_GPUPROP_TEXTURE_FEATURES_3 80 -#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_3 81 - -#define KBASE_GPUPROP_NUM_EXEC_ENGINES 82 - -#define KBASE_GPUPROP_RAW_THREAD_TLS_ALLOC 83 -#define KBASE_GPUPROP_TLS_ALLOC 84 -#define KBASE_GPUPROP_RAW_GPU_FEATURES 85 +#define KBASE_GPUPROP_LOG2_PROGRAM_COUNTER_SIZE 8 +#define KBASE_GPUPROP_TEXTURE_FEATURES_0 9 +#define KBASE_GPUPROP_TEXTURE_FEATURES_1 10 +#define KBASE_GPUPROP_TEXTURE_FEATURES_2 11 +#define KBASE_GPUPROP_GPU_AVAILABLE_MEMORY_SIZE 12 + +#define KBASE_GPUPROP_L2_LOG2_LINE_SIZE 13 +#define KBASE_GPUPROP_L2_LOG2_CACHE_SIZE 14 +#define KBASE_GPUPROP_L2_NUM_L2_SLICES 15 + +#define KBASE_GPUPROP_TILER_BIN_SIZE_BYTES 16 +#define KBASE_GPUPROP_TILER_MAX_ACTIVE_LEVELS 17 + +#define KBASE_GPUPROP_MAX_THREADS 18 +#define KBASE_GPUPROP_MAX_WORKGROUP_SIZE 19 +#define KBASE_GPUPROP_MAX_BARRIER_SIZE 20 +#define KBASE_GPUPROP_MAX_REGISTERS 21 +#define KBASE_GPUPROP_MAX_TASK_QUEUE 22 +#define KBASE_GPUPROP_MAX_THREAD_GROUP_SPLIT 23 +#define KBASE_GPUPROP_IMPL_TECH 24 + +#define KBASE_GPUPROP_RAW_SHADER_PRESENT 25 +#define KBASE_GPUPROP_RAW_TILER_PRESENT 26 +#define KBASE_GPUPROP_RAW_L2_PRESENT 27 +#define KBASE_GPUPROP_RAW_STACK_PRESENT 28 +#define KBASE_GPUPROP_RAW_L2_FEATURES 29 +#define KBASE_GPUPROP_RAW_CORE_FEATURES 30 +#define KBASE_GPUPROP_RAW_MEM_FEATURES 31 
+#define KBASE_GPUPROP_RAW_MMU_FEATURES 32 +#define KBASE_GPUPROP_RAW_AS_PRESENT 33 +#define KBASE_GPUPROP_RAW_JS_PRESENT 34 +#define KBASE_GPUPROP_RAW_JS_FEATURES_0 35 +#define KBASE_GPUPROP_RAW_JS_FEATURES_1 36 +#define KBASE_GPUPROP_RAW_JS_FEATURES_2 37 +#define KBASE_GPUPROP_RAW_JS_FEATURES_3 38 +#define KBASE_GPUPROP_RAW_JS_FEATURES_4 39 +#define KBASE_GPUPROP_RAW_JS_FEATURES_5 40 +#define KBASE_GPUPROP_RAW_JS_FEATURES_6 41 +#define KBASE_GPUPROP_RAW_JS_FEATURES_7 42 +#define KBASE_GPUPROP_RAW_JS_FEATURES_8 43 +#define KBASE_GPUPROP_RAW_JS_FEATURES_9 44 +#define KBASE_GPUPROP_RAW_JS_FEATURES_10 45 +#define KBASE_GPUPROP_RAW_JS_FEATURES_11 46 +#define KBASE_GPUPROP_RAW_JS_FEATURES_12 47 +#define KBASE_GPUPROP_RAW_JS_FEATURES_13 48 +#define KBASE_GPUPROP_RAW_JS_FEATURES_14 49 +#define KBASE_GPUPROP_RAW_JS_FEATURES_15 50 +#define KBASE_GPUPROP_RAW_TILER_FEATURES 51 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_0 52 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_1 53 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_2 54 +#define KBASE_GPUPROP_RAW_GPU_ID 55 +#define KBASE_GPUPROP_RAW_THREAD_MAX_THREADS 56 +#define KBASE_GPUPROP_RAW_THREAD_MAX_WORKGROUP_SIZE 57 +#define KBASE_GPUPROP_RAW_THREAD_MAX_BARRIER_SIZE 58 +#define KBASE_GPUPROP_RAW_THREAD_FEATURES 59 +#define KBASE_GPUPROP_RAW_COHERENCY_MODE 60 + +#define KBASE_GPUPROP_COHERENCY_NUM_GROUPS 61 +#define KBASE_GPUPROP_COHERENCY_NUM_CORE_GROUPS 62 +#define KBASE_GPUPROP_COHERENCY_COHERENCY 63 +#define KBASE_GPUPROP_COHERENCY_GROUP_0 64 +#define KBASE_GPUPROP_COHERENCY_GROUP_1 65 +#define KBASE_GPUPROP_COHERENCY_GROUP_2 66 +#define KBASE_GPUPROP_COHERENCY_GROUP_3 67 +#define KBASE_GPUPROP_COHERENCY_GROUP_4 68 +#define KBASE_GPUPROP_COHERENCY_GROUP_5 69 +#define KBASE_GPUPROP_COHERENCY_GROUP_6 70 +#define KBASE_GPUPROP_COHERENCY_GROUP_7 71 +#define KBASE_GPUPROP_COHERENCY_GROUP_8 72 +#define KBASE_GPUPROP_COHERENCY_GROUP_9 73 +#define KBASE_GPUPROP_COHERENCY_GROUP_10 74 +#define KBASE_GPUPROP_COHERENCY_GROUP_11 75 +#define KBASE_GPUPROP_COHERENCY_GROUP_12 76 +#define KBASE_GPUPROP_COHERENCY_GROUP_13 77 +#define KBASE_GPUPROP_COHERENCY_GROUP_14 78 +#define KBASE_GPUPROP_COHERENCY_GROUP_15 79 + +#define KBASE_GPUPROP_TEXTURE_FEATURES_3 80 +#define KBASE_GPUPROP_RAW_TEXTURE_FEATURES_3 81 + +#define KBASE_GPUPROP_NUM_EXEC_ENGINES 82 + +#define KBASE_GPUPROP_RAW_THREAD_TLS_ALLOC 83 +#define KBASE_GPUPROP_TLS_ALLOC 84 +#define KBASE_GPUPROP_RAW_GPU_FEATURES 85 #ifdef __cpluscplus } #endif diff --git a/mali_kbase/mali_kbase_mem_profile_debugfs_buf_size.h b/common/include/uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h index c2fb3f5..1649100 100644 --- a/mali_kbase/mali_kbase_mem_profile_debugfs_buf_size.h +++ b/common/include/uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h @@ -23,14 +23,13 @@ * DOC: Header file for the size of the buffer to accumulate the histogram report text in */ -#ifndef _KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ -#define _KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ +#ifndef _UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ +#define _UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_ /** * KBASE_MEM_PROFILE_MAX_BUF_SIZE - The size of the buffer to accumulate the histogram report text * in @see @ref CCTXP_HIST_BUF_SIZE_MAX_LENGTH_REPORT */ -#define KBASE_MEM_PROFILE_MAX_BUF_SIZE ((size_t)(64 + ((80 + (56 * 64)) * 54) + 56)) - -#endif /*_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_*/ +#define KBASE_MEM_PROFILE_MAX_BUF_SIZE ((size_t)(64 + ((80 + (56 * 64)) * 55) + 56)) +#endif /*_UAPI_KBASE_MEM_PROFILE_DEBUGFS_BUF_SIZE_H_*/ diff --git 
a/common/include/uapi/gpu/arm/midgard/mali_uk.h b/common/include/uapi/gpu/arm/midgard/mali_uk.h deleted file mode 100644 index 78946f6..0000000 --- a/common/include/uapi/gpu/arm/midgard/mali_uk.h +++ /dev/null @@ -1,70 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2010, 2012-2015, 2018, 2020-2022 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/** - * DOC: Types and definitions that are common across OSs for both the user - * and kernel side of the User-Kernel interface. - */ - -#ifndef _UAPI_UK_H_ -#define _UAPI_UK_H_ - -#ifdef __cplusplus -extern "C" { -#endif /* __cplusplus */ - -/** - * DOC: uk_api User-Kernel Interface API - * - * The User-Kernel Interface abstracts the communication mechanism between the user and kernel-side code of device - * drivers developed as part of the Midgard DDK. Currently that includes the Base driver. - * - * It exposes an OS independent API to user-side code (UKU) which routes functions calls to an OS-independent - * kernel-side API (UKK) via an OS-specific communication mechanism. - * - * This API is internal to the Midgard DDK and is not exposed to any applications. - * - */ - -/** - * enum uk_client_id - These are identifiers for kernel-side drivers - * implementing a UK interface, aka UKK clients. - * @UK_CLIENT_MALI_T600_BASE: Value used to identify the Base driver UK client. - * @UK_CLIENT_COUNT: The number of uk clients supported. This must be - * the last member of the enum - * - * The UK module maps this to an OS specific device name, e.g. "gpu_base" -> "GPU0:". Specify this - * identifier to select a UKK client to the uku_open() function. - * - * When a new UKK client driver is created a new identifier needs to be added to the uk_client_id - * enumeration and the uku_open() implemenation for the various OS ports need to be updated to - * provide a mapping of the identifier to the OS specific device name. - * - */ -enum uk_client_id { - UK_CLIENT_MALI_T600_BASE, - UK_CLIENT_COUNT -}; - -#ifdef __cplusplus -} -#endif /* __cplusplus */ -#endif /* _UAPI_UK_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h new file mode 100644 index 0000000..d2de578 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_kernel.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ +#define _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ + +#include "pixel_gpu_common_slc.h" + +#endif /* _UAPI_PIXEL_GPU_COMMON_KERNEL_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h new file mode 100644 index 0000000..76e631d --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_GPU_COMMON_SLC_H_ +#define _UAPI_PIXEL_GPU_COMMON_SLC_H_ + +#include <linux/types.h> + +/** + * enum kbase_pixel_gpu_slc_liveness_mark_type - Determines the type of a live range mark + * + * @KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN: Signifies that a mark is the start of a live range + * @KBASE_PIXEL_GPU_LIVE_RANGE_END: Signifies that a mark is the end of a live range + * + */ +enum kbase_pixel_gpu_slc_liveness_mark_type { + KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN, + KBASE_PIXEL_GPU_LIVE_RANGE_END, +}; + +/** + * struct kbase_pixel_gpu_slc_liveness_mark - Live range marker + * + * @type: See @struct kbase_pixel_gpu_slc_liveness_mark_type + * @index: Buffer index (within liveness update array) that this mark represents + * + */ +struct kbase_pixel_gpu_slc_liveness_mark { + __u32 type : 1; + __u32 index : 31; +}; + +#endif /* _UAPI_PIXEL_GPU_COMMON_SLC_H_ */ diff --git a/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h new file mode 100644 index 0000000..b575c79 --- /dev/null +++ b/common/include/uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ +#define _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ + +/** + * enum pixel_mgm_group_id - Symbolic names for used memory groups + */ +enum pixel_mgm_group_id +{ + /* The Mali driver requires that allocations made on one of the groups + * are not treated specially. + */ + MGM_RESERVED_GROUP_ID = 0, + + /* Group for memory that should be cached in the system level cache. */ + MGM_SLC_GROUP_ID = 1, + + /* Group for memory explicitly allocated in SLC. */ + MGM_SLC_EXPLICIT_GROUP_ID = 2, + + /* Imported memory is handled by the allocator of the memory, and the Mali + * DDK will request a group_id for such memory via mgm_get_import_memory_id(). + * We specify which group we want to use for this here. + */ + MGM_IMPORTED_MEMORY_GROUP_ID = (MEMORY_GROUP_MANAGER_NR_GROUPS - 1), +}; + +/** + * pixel_mgm_query_group_size - Query the current size of a memory group + * + * @mgm_dev: The memory group manager through which the request is being made. + * @group_id: Memory group to query. + * + * Returns the actual size of the memory group's active partition + */ +extern u64 pixel_mgm_query_group_size(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id); + +/** + * pixel_mgm_resize_group_to_fit - Resize a memory group to meet @demand, if possible + * + * @mgm_dev: The memory group manager through which the request is being made. + * @group_id: Memory group for which we will change the backing partition. + * @demand: The demanded space from the memory group. 
+ */ +extern void pixel_mgm_resize_group_to_fit(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id, + u64 demand); + +#endif /* _UAPI_PIXEL_MEMORY_GROUP_MANAGER_H_ */ diff --git a/mali_kbase/BUILD.bazel b/mali_kbase/BUILD.bazel index 86d8658..b987493 100644 --- a/mali_kbase/BUILD.bazel +++ b/mali_kbase/BUILD.bazel @@ -1,25 +1,59 @@ -# SPDX-License-Identifier: GPL-2.0-or-later +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# -load("//build/kernel/kleaf:kernel.bzl", "kernel_module") +load( + "//build/kernel/kleaf:kernel.bzl", + "kernel_module", +) + +_midgard_modules = [ + "mali_kbase.ko", + "tests/kutf/mali_kutf.ko", + "tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test_portal.ko", +] kernel_module( name = "mali_kbase", srcs = glob([ "**/*.c", "**/*.h", - "**/Kbuild", + "**/*Kbuild", + "**/*Makefile", ]) + [ "//private/google-modules/gpu/common:headers", "//private/google-modules/soc/gs:gs_soc_headers", ], - outs = [ - "mali_kbase.ko", - ], + outs = _midgard_modules, kernel_build = "//private/google-modules/soc/gs:gs_kernel_build", visibility = [ "//private/google-modules/soc/gs:__pkg__", ], deps = [ + "//private/google-modules/gpu/mali_pixel", "//private/google-modules/soc/gs:gs_soc_module", ], ) + +filegroup( + name = "midgard_kconfig.cloudripper", + srcs = glob([ + "**/*Kconfig", + ]), + visibility = [ + "//common:__pkg__", + "//common-modules/mali:__subpackages__", + ], +) diff --git a/mali_kbase/Kbuild b/mali_kbase/Kbuild index e0703ab..666498c 100644 --- a/mali_kbase/Kbuild +++ b/mali_kbase/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -59,10 +59,8 @@ ifeq ($(CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS), y) endif ifeq ($(CONFIG_MALI_FENCE_DEBUG), y) - ifneq ($(CONFIG_SYNC), y) - ifneq ($(CONFIG_SYNC_FILE), y) - $(error CONFIG_MALI_FENCE_DEBUG depends on CONFIG_SYNC || CONFIG_SYNC_FILE to be set in Kernel configuration) - endif + ifneq ($(CONFIG_SYNC_FILE), y) + $(error CONFIG_MALI_FENCE_DEBUG depends on CONFIG_SYNC_FILE to be set in Kernel configuration) endif endif @@ -70,12 +68,11 @@ endif # Configurations # -# Driver version string which is returned to userspace via an ioctl -MALI_RELEASE_NAME ?= '"r36p0-01eac0"' - # We are building for Pixel CONFIG_MALI_PLATFORM_NAME="pixel" +# Driver version string which is returned to userspace via an ioctl +MALI_RELEASE_NAME ?= '"r44p1-00dev3"' # Set up defaults if not defined by build system ifeq ($(CONFIG_MALI_DEBUG), y) MALI_UNIT_TEST = 1 @@ -89,9 +86,19 @@ MALI_COVERAGE ?= 0 # Kconfig passes in the name with quotes for in-tree builds - remove them. 
MALI_PLATFORM_DIR := $(shell echo $(CONFIG_MALI_PLATFORM_NAME)) +ifneq ($(CONFIG_SOC_GS101),y) + CONFIG_MALI_CSF_SUPPORT ?= y +endif + ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) MALI_JIT_PRESSURE_LIMIT_BASE = 0 MALI_USE_CSF = 1 + ccflags-y += -DCONFIG_MALI_PIXEL_GPU_SSCD +ifeq ($(CONFIG_SOC_GS201),y) +ifeq ($(CONFIG_MALI_HOST_CONTROLS_SC_RAILS),y) + ccflags-y += -DCONFIG_MALI_HOST_CONTROLS_SC_RAILS +endif +endif else MALI_JIT_PRESSURE_LIMIT_BASE ?= 1 MALI_USE_CSF ?= 0 @@ -110,12 +117,12 @@ endif # # Experimental features must default to disabled, e.g.: # MALI_EXPERIMENTAL_FEATURE ?= 0 -MALI_INCREMENTAL_RENDERING ?= 0 +MALI_INCREMENTAL_RENDERING_JM ?= 0 # # ccflags # -ccflags-y = \ +ccflags-y += \ -DMALI_CUSTOMER_RELEASE=$(MALI_CUSTOMER_RELEASE) \ -DMALI_USE_CSF=$(MALI_USE_CSF) \ -DMALI_KERNEL_TEST_API=$(MALI_KERNEL_TEST_API) \ @@ -123,10 +130,9 @@ ccflags-y = \ -DMALI_COVERAGE=$(MALI_COVERAGE) \ -DMALI_RELEASE_NAME=$(MALI_RELEASE_NAME) \ -DMALI_JIT_PRESSURE_LIMIT_BASE=$(MALI_JIT_PRESSURE_LIMIT_BASE) \ - -DMALI_INCREMENTAL_RENDERING=$(MALI_INCREMENTAL_RENDERING) \ + -DMALI_INCREMENTAL_RENDERING_JM=$(MALI_INCREMENTAL_RENDERING_JM) \ -DMALI_PLATFORM_DIR=$(MALI_PLATFORM_DIR) - ifeq ($(KBUILD_EXTMOD),) # in-tree ccflags-y +=-DMALI_KBASE_PLATFORM_PATH=../../$(src)/platform/$(CONFIG_MALI_PLATFORM_NAME) @@ -139,7 +145,8 @@ ccflags-y += \ -I$(src) \ -I$(src)/platform/$(MALI_PLATFORM_DIR) \ -I$(src)/../../../base \ - -I$(src)/../../../../include + -I$(src)/../../../../include \ + -I$(src)/tests/include # Add include path for related GPU modules ccflags-y += -I$(src)/../common/include @@ -150,13 +157,14 @@ subdir-ccflags-y += $(ccflags-y) # Kernel Modules # obj-$(CONFIG_MALI_MIDGARD) += mali_kbase.o -obj-$(CONFIG_MALI_ARBITRATION) += arbitration/ +obj-$(CONFIG_MALI_ARBITRATION) += ../arbitration/ obj-$(CONFIG_MALI_KUTF) += tests/ mali_kbase-y := \ mali_kbase_cache_policy.o \ mali_kbase_ccswe.o \ mali_kbase_mem.o \ + mali_kbase_mem_migrate.o \ mali_kbase_mem_pool_group.o \ mali_kbase_native_mgm.o \ mali_kbase_ctx_sched.o \ @@ -165,12 +173,6 @@ mali_kbase-y := \ mali_kbase_config.o \ mali_kbase_kinstr_prfcnt.o \ mali_kbase_vinstr.o \ - mali_kbase_hwcnt.o \ - mali_kbase_hwcnt_gpu.o \ - mali_kbase_hwcnt_gpu_narrow.o \ - mali_kbase_hwcnt_types.o \ - mali_kbase_hwcnt_virtualizer.o \ - mali_kbase_hwcnt_watchdog_if_timer.o \ mali_kbase_softjobs.o \ mali_kbase_hw.o \ mali_kbase_debug.o \ @@ -180,11 +182,12 @@ mali_kbase-y := \ mali_kbase_mem_profile_debugfs.o \ mali_kbase_disjoint_events.o \ mali_kbase_debug_mem_view.o \ + mali_kbase_debug_mem_zones.o \ + mali_kbase_debug_mem_allocs.o \ mali_kbase_smc.o \ mali_kbase_mem_pool.o \ mali_kbase_mem_pool_debugfs.o \ mali_kbase_debugfs_helper.o \ - mali_kbase_strings.o \ mali_kbase_as_fault_debugfs.o \ mali_kbase_regs_history_debugfs.o \ mali_kbase_dvfs_debugfs.o \ @@ -196,24 +199,18 @@ mali_kbase-$(CONFIG_DEBUG_FS) += mali_kbase_pbha_debugfs.o mali_kbase-$(CONFIG_MALI_CINSTR_GWT) += mali_kbase_gwt.o -mali_kbase-$(CONFIG_SYNC) += \ - mali_kbase_sync_android.o \ - mali_kbase_sync_common.o - mali_kbase-$(CONFIG_SYNC_FILE) += \ mali_kbase_fence_ops.o \ mali_kbase_sync_file.o \ mali_kbase_sync_common.o -ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) - mali_kbase-y += \ - mali_kbase_hwcnt_backend_csf.o \ - mali_kbase_hwcnt_backend_csf_if_fw.o -else +mali_kbase-$(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) += \ + mali_power_gpu_work_period_trace.o \ + mali_kbase_gpu_metrics.o + +ifneq ($(CONFIG_MALI_CSF_SUPPORT),y) mali_kbase-y += \ mali_kbase_jm.o \ - mali_kbase_hwcnt_backend_jm.o \ - 
mali_kbase_hwcnt_backend_jm_watchdog.o \ mali_kbase_dummy_job_wa.o \ mali_kbase_debug_job_fault.o \ mali_kbase_event.o \ @@ -223,11 +220,6 @@ else mali_kbase_js_ctx_attr.o \ mali_kbase_kinstr_jm.o - mali_kbase-$(CONFIG_MALI_DMA_FENCE) += \ - mali_kbase_fence_ops.o \ - mali_kbase_dma_fence.o \ - mali_kbase_fence.o - mali_kbase-$(CONFIG_SYNC_FILE) += \ mali_kbase_fence_ops.o \ mali_kbase_fence.o @@ -241,6 +233,7 @@ INCLUDE_SUBDIR = \ $(src)/backend/gpu/Kbuild \ $(src)/mmu/Kbuild \ $(src)/tl/Kbuild \ + $(src)/hwcnt/Kbuild \ $(src)/gpu/Kbuild \ $(src)/thirdparty/Kbuild \ $(src)/platform/$(MALI_PLATFORM_DIR)/Kbuild diff --git a/mali_kbase/Kconfig b/mali_kbase/Kconfig index a563d35..bb25ef4 100644 --- a/mali_kbase/Kconfig +++ b/mali_kbase/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -43,12 +43,40 @@ config MALI_PLATFORM_NAME include in the build. 'platform/$(MALI_PLATFORM_NAME)/Kbuild' must exist. +choice + prompt "Mali HW backend" + depends on MALI_MIDGARD + default MALI_REAL_HW + config MALI_REAL_HW + bool "Enable build of Mali kernel driver for real HW" depends on MALI_MIDGARD - def_bool !MALI_NO_MALI + help + This is the default HW backend. + +config MALI_NO_MALI + bool "Enable build of Mali kernel driver for No Mali" + depends on MALI_MIDGARD && MALI_EXPERT + help + This can be used to test the driver in a simulated environment + whereby the hardware is not physically present. If the hardware is physically + present it will not be used. This can be used to test the majority of the + driver without needing actual hardware or for software benchmarking. + All calls to the simulated hardware will complete immediately as if the hardware + completed the task. + +config MALI_NO_MALI_DEFAULT_GPU + string "Default GPU for No Mali" + depends on MALI_NO_MALI + default "tMIx" + help + This option sets the default GPU to identify as for No Mali builds. + + +endchoice menu "Platform specific options" -source "drivers/gpu/arm/midgard/platform/Kconfig" +source "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/platform/Kconfig" endmenu config MALI_CSF_SUPPORT @@ -94,16 +122,6 @@ config MALI_MIDGARD_ENABLE_TRACE Enables tracing in kbase. Trace log available through the "mali_trace" debugfs file, when the CONFIG_DEBUG_FS is enabled -config MALI_DMA_FENCE - bool "Enable DMA_BUF fence support for Mali" - depends on MALI_MIDGARD - default n - help - Support DMA_BUF fences for Mali. - - This option should only be enabled if the Linux Kernel has built in - support for DMA_BUF fences. - config MALI_ARBITER_SUPPORT bool "Enable arbiter support for Mali" depends on MALI_MIDGARD && !MALI_CSF_SUPPORT @@ -120,7 +138,7 @@ config MALI_DMA_BUF_MAP_ON_DEMAND depends on MALI_MIDGARD default n help - This option caused kbase to set up the GPU mapping of imported + This option will cause kbase to set up the GPU mapping of imported dma-buf when needed to run atoms. This is the legacy behavior. This is intended for testing and the option will get removed in the @@ -140,6 +158,11 @@ config MALI_DMA_BUF_LEGACY_COMPAT flushes in other drivers. This only has an effect for clients using UK 11.18 or older. For later UK versions it is not possible. 
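The "Mali HW backend" choice added in the Kconfig hunk above replaces the old standalone MALI_NO_MALI option and introduces MALI_NO_MALI_DEFAULT_GPU, the GPU name a simulated (No Mali) build identifies as by default. As a rough illustration only (the symbol names below are assumptions, not code from this patch), such a default is typically surfaced as a read-only module parameter:

#include <linux/module.h>
#include <linux/moduleparam.h>

#ifndef CONFIG_MALI_NO_MALI_DEFAULT_GPU
#define CONFIG_MALI_NO_MALI_DEFAULT_GPU "tMIx" /* mirrors the Kconfig default above */
#endif

/* GPU the dummy model reports; can be overridden on the insmod command line. */
static char *no_mali_gpu = CONFIG_MALI_NO_MALI_DEFAULT_GPU;
module_param(no_mali_gpu, charp, 0444);
MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as when no hardware is present");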
+config MALI_CORESIGHT + depends on MALI_MIDGARD && MALI_CSF_SUPPORT && !MALI_NO_MALI + bool "Enable Kbase CoreSight tracing support" + default n + menuconfig MALI_EXPERT depends on MALI_MIDGARD bool "Enable Expert Settings" @@ -150,7 +173,19 @@ menuconfig MALI_EXPERT if MALI_EXPERT -config MALI_2MB_ALLOC +config LARGE_PAGE_ALLOC_OVERRIDE + bool "Override default setting of 2MB pages" + depends on MALI_MIDGARD && MALI_EXPERT + default n + help + An override config for LARGE_PAGE_ALLOC config. + When LARGE_PAGE_ALLOC_OVERRIDE is Y, 2MB page allocation will be + enabled by LARGE_PAGE_ALLOC. When this is N, the feature will be + enabled when GPU HW satisfies requirements. + + If in doubt, say N + +config LARGE_PAGE_ALLOC bool "Attempt to allocate 2MB pages" depends on MALI_MIDGARD && MALI_EXPERT default n @@ -159,8 +194,28 @@ config MALI_2MB_ALLOC allocate 2MB pages from the kernel. This reduces TLB pressure and helps to prevent memory fragmentation. + Note this config applies only when LARGE_PAGE_ALLOC_OVERRIDE config + is enabled and enabling this on a GPU HW that does not satisfy + requirements can cause serious problem. + If in doubt, say N +config PAGE_MIGRATION_SUPPORT + bool "Enable support for page migration" + depends on MALI_MIDGARD && MALI_EXPERT + default y + default n if ANDROID + help + Compile in support for page migration. + If set to disabled ('n') then page migration cannot + be enabled at all, and related symbols are not compiled in. + If not set, page migration is compiled in by default, and + if not explicitly enabled or disabled with the insmod parameter, + page migration becomes automatically enabled with large pages. + + If in doubt, say Y. To strip out page migration symbols and support, + say N. + config MALI_MEMORY_FULLY_BACKED bool "Enable memory fully physically-backed" depends on MALI_MIDGARD && MALI_EXPERT @@ -187,18 +242,6 @@ config MALI_CORESTACK comment "Platform options" depends on MALI_MIDGARD && MALI_EXPERT -config MALI_NO_MALI - bool "Enable No Mali" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This can be used to test the driver in a simulated environment - whereby the hardware is not physically present. If the hardware is physically - present it will not be used. This can be used to test the majority of the - driver without needing actual hardware or for software benchmarking. - All calls to the simulated hardware will complete immediately as if the hardware - completed the task. - config MALI_ERROR_INJECT bool "Enable No Mali error injection" depends on MALI_MIDGARD && MALI_EXPERT && MALI_NO_MALI @@ -206,14 +249,6 @@ config MALI_ERROR_INJECT help Enables insertion of errors to test module failure and recovery mechanisms. -config MALI_GEM5_BUILD - bool "Enable build of Mali kernel driver for GEM5" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This option is to do a Mali GEM5 build. - If unsure, say N. - comment "Debug options" depends on MALI_MIDGARD && MALI_EXPERT @@ -226,7 +261,7 @@ config MALI_DEBUG config MALI_FENCE_DEBUG bool "Enable debug sync fence usage" - depends on MALI_MIDGARD && MALI_EXPERT && (SYNC || SYNC_FILE) + depends on MALI_MIDGARD && MALI_EXPERT && SYNC_FILE default y if MALI_DEBUG help Select this option to enable additional checking and reporting on the @@ -363,6 +398,15 @@ config MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE tree using the property, opp-mali-errata-1485982. Otherwise the slowest clock will be selected. 
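The MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE help text above depends on the device tree flagging one OPP with the opp-mali-errata-1485982 property. A minimal sketch of how a driver could test an OPP node for that marker with the standard OF helpers follows; the function name is invented for illustration and is not part of this patch.

#include <linux/of.h>
#include <linux/types.h>

/* True if this OPP node is flagged as the errata-1485982 alternative clock. */
static bool opp_has_mali_errata_1485982(const struct device_node *opp_node)
{
        return of_property_read_bool(opp_node, "opp-mali-errata-1485982");
}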
+config MALI_HOST_CONTROLS_SC_RAILS + bool "Enable Host based control of the shader core power rails" + depends on MALI_CSF_SUPPORT + default n + help + This option enables the Host based control of the power rails for + shader cores. It is recommended to use PDCA (Power Domain Control + Adapter) inside the GPU to handshake with SoC PMU to control the + power of cores. endif config MALI_ARBITRATION @@ -374,10 +418,16 @@ config MALI_ARBITRATION virtualization setup for Mali If unsure, say N. -if MALI_ARBITRATION -source "drivers/gpu/arm/midgard/arbitration/Kconfig" -endif +config MALI_TRACE_POWER_GPU_WORK_PERIOD + bool "Enable per-application GPU metrics tracepoints" + depends on MALI_MIDGARD + default y + help + This option enables per-application GPU metrics tracepoints. + + If unsure, say N. + -source "drivers/gpu/arm/midgard/tests/Kconfig" +source "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/tests/Kconfig" endif diff --git a/mali_kbase/Makefile b/mali_kbase/Makefile index ae4609c..6ee3a2d 100644 --- a/mali_kbase/Makefile +++ b/mali_kbase/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -20,8 +20,6 @@ KERNEL_SRC ?= /lib/modules/$(shell uname -r)/build KDIR ?= $(KERNEL_SRC) - -# Ensure build intermediates are in OUT_DIR instead of alongside the source M ?= $(shell pwd) ifeq ($(KDIR),) @@ -33,17 +31,21 @@ endif # Pixel integration configuration values # +# Debug Ftrace configuration options +CONFIG_MALI_SYSTEM_TRACE=y + # Core kbase configuration options CONFIG_MALI_EXPERT=y CONFIG_MALI_MIDGARD_DVFS=y +CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD = y # Pixel integration specific configuration options CONFIG_MALI_PLATFORM_NAME="pixel" -CONFIG_MALI_PIXEL_GPU_QOS=y -CONFIG_MALI_PIXEL_GPU_BTS=y -CONFIG_MALI_PIXEL_GPU_THERMAL=y -CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING=y - +CONFIG_MALI_PIXEL_GPU_QOS ?= y +CONFIG_MALI_PIXEL_GPU_BTS ?= y +CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING ?= y +CONFIG_MALI_PIXEL_GPU_THERMAL ?= y +CONFIG_MALI_PIXEL_GPU_SLC ?= y # # Default configuration values @@ -51,175 +53,179 @@ CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING=y # Dependency resolution is done through statements as Kconfig # is not supported for out-of-tree builds. 
# +CONFIGS := +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + CONFIG_MALI_MIDGARD ?= m + ifeq ($(CONFIG_MALI_MIDGARD),m) + CONFIG_MALI_PLATFORM_NAME ?= "devicetree" + CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD ?= y + CONFIG_MALI_GATOR_SUPPORT ?= y + CONFIG_MALI_ARBITRATION ?= n + CONFIG_MALI_PARTITION_MANAGER ?= n -CONFIG_MALI_MIDGARD ?= m -ifeq ($(CONFIG_MALI_MIDGARD),m) - CONFIG_MALI_PLATFORM_NAME ?= "devicetree" - CONFIG_MALI_GATOR_SUPPORT ?= y - CONFIG_MALI_ARBITRATION ?= n - CONFIG_MALI_PARTITION_MANAGER ?= n - - ifeq ($(origin CONFIG_MALI_ABITER_MODULES), undefined) - CONFIG_MALI_ARBITER_MODULES := $(CONFIG_MALI_ARBITRATION) - endif - - ifeq ($(origin CONFIG_MALI_GPU_POWER_MODULES), undefined) - CONFIG_MALI_GPU_POWER_MODULES := $(CONFIG_MALI_ARBITRATION) - endif - - ifneq ($(CONFIG_MALI_NO_MALI),y) - # Prevent misuse when CONFIG_MALI_NO_MALI=y - CONFIG_MALI_REAL_HW ?= y - endif - - ifeq ($(CONFIG_MALI_MIDGARD_DVFS),y) - # Prevent misuse when CONFIG_MALI_MIDGARD_DVFS=y - CONFIG_MALI_DEVFREQ ?= n - else - CONFIG_MALI_DEVFREQ ?= y - endif - - ifeq ($(CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND), y) - # Prevent misuse when CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y - CONFIG_MALI_DMA_BUF_LEGACY_COMPAT = n - endif - - ifeq ($(CONFIG_XEN),y) - ifneq ($(CONFIG_MALI_ARBITRATION), n) - CONFIG_MALI_XEN ?= m + ifneq ($(CONFIG_MALI_NO_MALI),y) + # Prevent misuse when CONFIG_MALI_NO_MALI=y + CONFIG_MALI_REAL_HW ?= y + CONFIG_MALI_CORESIGHT = n endif - endif - # - # Expert/Debug/Test released configurations - # - ifeq ($(CONFIG_MALI_EXPERT), y) - ifeq ($(CONFIG_MALI_NO_MALI), y) - CONFIG_MALI_REAL_HW = n + ifeq ($(CONFIG_MALI_MIDGARD_DVFS),y) + # Prevent misuse when CONFIG_MALI_MIDGARD_DVFS=y + CONFIG_MALI_DEVFREQ ?= n else - # Prevent misuse when CONFIG_MALI_NO_MALI=n - CONFIG_MALI_REAL_HW = y - CONFIG_MALI_ERROR_INJECT = n + CONFIG_MALI_DEVFREQ ?= y endif - ifeq ($(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED), y) - # Prevent misuse when CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED=y - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + ifeq ($(CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND), y) + # Prevent misuse when CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND=y + CONFIG_MALI_DMA_BUF_LEGACY_COMPAT = n endif - ifeq ($(CONFIG_MALI_DEBUG), y) - CONFIG_MALI_MIDGARD_ENABLE_TRACE ?= y - CONFIG_MALI_SYSTEM_TRACE ?= y + ifeq ($(CONFIG_MALI_CSF_SUPPORT), y) + CONFIG_MALI_CORESIGHT ?= n + endif + + # + # Expert/Debug/Test released configurations + # + ifeq ($(CONFIG_MALI_EXPERT), y) + ifeq ($(CONFIG_MALI_NO_MALI), y) + CONFIG_MALI_REAL_HW = n + CONFIG_MALI_NO_MALI_DEFAULT_GPU ?= "tMIx" - ifeq ($(CONFIG_SYNC), y) - CONFIG_MALI_FENCE_DEBUG ?= y else + # Prevent misuse when CONFIG_MALI_NO_MALI=n + CONFIG_MALI_REAL_HW = y + CONFIG_MALI_ERROR_INJECT = n + endif + + + ifeq ($(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED), y) + # Prevent misuse when CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED=y + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + endif + + ifeq ($(CONFIG_MALI_DEBUG), y) + CONFIG_MALI_MIDGARD_ENABLE_TRACE ?= y + CONFIG_MALI_SYSTEM_TRACE ?= y + ifeq ($(CONFIG_SYNC_FILE), y) CONFIG_MALI_FENCE_DEBUG ?= y else CONFIG_MALI_FENCE_DEBUG = n endif + else + # Prevent misuse when CONFIG_MALI_DEBUG=n + CONFIG_MALI_MIDGARD_ENABLE_TRACE = n + CONFIG_MALI_FENCE_DEBUG = n endif else - # Prevent misuse when CONFIG_MALI_DEBUG=n + # Prevent misuse when CONFIG_MALI_EXPERT=n + CONFIG_MALI_CORESTACK = n + CONFIG_LARGE_PAGE_ALLOC_OVERRIDE = n + CONFIG_LARGE_PAGE_ALLOC = n + CONFIG_MALI_PWRSOFT_765 = n + CONFIG_MALI_MEMORY_FULLY_BACKED = n + CONFIG_MALI_JOB_DUMP = 
n + CONFIG_MALI_NO_MALI = n + CONFIG_MALI_REAL_HW = y + CONFIG_MALI_ERROR_INJECT = n + CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED = n + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n + CONFIG_MALI_HOST_CONTROLS_SC_RAILS = n + CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS = n + CONFIG_MALI_DEBUG = n CONFIG_MALI_MIDGARD_ENABLE_TRACE = n - CONFIG_MALI_SYSTEM_TRACE = n CONFIG_MALI_FENCE_DEBUG = n endif - else - # Prevent misuse when CONFIG_MALI_EXPERT=n - CONFIG_MALI_CORESTACK = n - CONFIG_MALI_2MB_ALLOC = n - CONFIG_MALI_PWRSOFT_765 = n - CONFIG_MALI_MEMORY_FULLY_BACKED = n - CONFIG_MALI_JOB_DUMP = n - CONFIG_MALI_NO_MALI = n - CONFIG_MALI_REAL_HW = y - CONFIG_MALI_ERROR_INJECT = n - CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED = n - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE = n - CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS = n - CONFIG_MALI_DEBUG = n - CONFIG_MALI_MIDGARD_ENABLE_TRACE = n - CONFIG_MALI_SYSTEM_TRACE = n - CONFIG_MALI_FENCE_DEBUG = n - endif - ifeq ($(CONFIG_MALI_DEBUG), y) - CONFIG_MALI_KUTF ?= y - ifeq ($(CONFIG_MALI_KUTF), y) - CONFIG_MALI_KUTF_IRQ_TEST ?= y - CONFIG_MALI_KUTF_CLK_RATE_TRACE ?= y + ifeq ($(CONFIG_MALI_DEBUG), y) + CONFIG_MALI_KUTF ?= y + ifeq ($(CONFIG_MALI_KUTF), y) + CONFIG_MALI_KUTF_IRQ_TEST ?= y + CONFIG_MALI_KUTF_CLK_RATE_TRACE ?= y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST ?= y + ifeq ($(CONFIG_MALI_DEVFREQ), y) + ifeq ($(CONFIG_MALI_NO_MALI), y) + CONFIG_MALI_KUTF_IPA_UNIT_TEST ?= y + endif + endif + + else + # Prevent misuse when CONFIG_MALI_KUTF=n + CONFIG_MALI_KUTF_IRQ_TEST = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n + endif else - # Prevent misuse when CONFIG_MALI_KUTF=n + # Prevent misuse when CONFIG_MALI_DEBUG=n + CONFIG_MALI_KUTF = y CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n endif else - # Prevent misuse when CONFIG_MALI_DEBUG=n + # Prevent misuse when CONFIG_MALI_MIDGARD=n + CONFIG_MALI_ARBITRATION = n CONFIG_MALI_KUTF = n CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n + CONFIG_MALI_KUTF_CLK_RATE_TRACE = y + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST = n endif -else - # Prevent misuse when CONFIG_MALI_MIDGARD=n - CONFIG_MALI_ARBITRATION = n - CONFIG_MALI_ARBITER_MODULES = n - CONFIG_MALI_GPU_POWER_MODULES = n - CONFIG_MALI_KUTF = n - CONFIG_MALI_KUTF_IRQ_TEST = n - CONFIG_MALI_KUTF_CLK_RATE_TRACE = n -endif -# All Mali CONFIG should be listed here -CONFIGS := \ - CONFIG_MALI_MIDGARD \ - CONFIG_MALI_CSF_SUPPORT \ - CONFIG_MALI_GATOR_SUPPORT \ - CONFIG_MALI_DMA_FENCE \ - CONFIG_MALI_ARBITER_SUPPORT \ - CONFIG_MALI_ARBITRATION \ - CONFIG_MALI_ARBITER_MODULES \ - CONFIG_MALI_GPU_POWER_MODULES \ - CONFIG_MALI_PARTITION_MANAGER \ - CONFIG_MALI_REAL_HW \ - CONFIG_MALI_GEM5_BUILD \ - CONFIG_MALI_DEVFREQ \ - CONFIG_MALI_MIDGARD_DVFS \ - CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND \ - CONFIG_MALI_DMA_BUF_LEGACY_COMPAT \ - CONFIG_MALI_EXPERT \ - CONFIG_MALI_CORESTACK \ - CONFIG_MALI_2MB_ALLOC \ - CONFIG_MALI_PWRSOFT_765 \ - CONFIG_MALI_MEMORY_FULLY_BACKED \ - CONFIG_MALI_JOB_DUMP \ - CONFIG_MALI_NO_MALI \ - CONFIG_MALI_ERROR_INJECT \ - CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED \ - CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE \ - CONFIG_MALI_PRFCNT_SET_PRIMARY \ - CONFIG_MALI_PRFCNT_SET_SECONDARY \ - CONFIG_MALI_PRFCNT_SET_TERTIARY \ - CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS \ - CONFIG_MALI_DEBUG \ - CONFIG_MALI_MIDGARD_ENABLE_TRACE \ - CONFIG_MALI_SYSTEM_TRACE \ - 
CONFIG_MALI_FENCE_DEBUG \ - CONFIG_MALI_KUTF \ - CONFIG_MALI_KUTF_IRQ_TEST \ - CONFIG_MALI_KUTF_CLK_RATE_TRACE \ - CONFIG_MALI_XEN - -# Pixel integration CONFIG options -CONFIGS += \ - CONFIG_MALI_PIXEL_GPU_QOS \ - CONFIG_MALI_PIXEL_GPU_BTS \ - CONFIG_MALI_PIXEL_GPU_THERMAL \ - CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING + # All Mali CONFIG should be listed here + CONFIGS := \ + CONFIG_MALI_MIDGARD \ + CONFIG_MALI_GATOR_SUPPORT \ + CONFIG_MALI_ARBITER_SUPPORT \ + CONFIG_MALI_ARBITRATION \ + CONFIG_MALI_PARTITION_MANAGER \ + CONFIG_MALI_REAL_HW \ + CONFIG_MALI_DEVFREQ \ + CONFIG_MALI_MIDGARD_DVFS \ + CONFIG_MALI_DMA_BUF_MAP_ON_DEMAND \ + CONFIG_MALI_DMA_BUF_LEGACY_COMPAT \ + CONFIG_MALI_EXPERT \ + CONFIG_MALI_CORESTACK \ + CONFIG_LARGE_PAGE_ALLOC_OVERRIDE \ + CONFIG_LARGE_PAGE_ALLOC \ + CONFIG_MALI_PWRSOFT_765 \ + CONFIG_MALI_MEMORY_FULLY_BACKED \ + CONFIG_MALI_JOB_DUMP \ + CONFIG_MALI_NO_MALI \ + CONFIG_MALI_ERROR_INJECT \ + CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED \ + CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE \ + CONFIG_MALI_HOST_CONTROLS_SC_RAILS \ + CONFIG_MALI_PRFCNT_SET_PRIMARY \ + CONFIG_MALI_PRFCNT_SET_SECONDARY \ + CONFIG_MALI_PRFCNT_SET_TERTIARY \ + CONFIG_MALI_PRFCNT_SET_SELECT_VIA_DEBUG_FS \ + CONFIG_MALI_DEBUG \ + CONFIG_MALI_MIDGARD_ENABLE_TRACE \ + CONFIG_MALI_SYSTEM_TRACE \ + CONFIG_MALI_FENCE_DEBUG \ + CONFIG_MALI_KUTF \ + CONFIG_MALI_KUTF_IRQ_TEST \ + CONFIG_MALI_KUTF_CLK_RATE_TRACE \ + CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST \ + CONFIG_MALI_XEN \ + CONFIG_MALI_CORESIGHT \ + CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD + # Pixel integration CONFIG options + CONFIGS += \ + CONFIG_MALI_PIXEL_GPU_QOS \ + CONFIG_MALI_PIXEL_GPU_BTS \ + CONFIG_MALI_PIXEL_GPU_THERMAL \ + CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING \ + CONFIG_MALI_PIXEL_GPU_SLC + +endif + +THIS_DIR := $(dir $(lastword $(MAKEFILE_LIST))) +-include $(THIS_DIR)/../arbitration/Makefile -# # MAKE_ARGS to pass the custom CONFIGs on out-of-tree build # # Generate the list of CONFIGs and values. 
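The CONFIGS list above feeds the MAKE_ARGS and EXTRA_CFLAGS rules in the hunks that follow: for out-of-tree builds, every option that resolves to y or m is handed to the compiler as -DCONFIG_<NAME>=1, so driver code can gate features with ordinary preprocessor checks. A minimal sketch using the Pixel SLC option as an example (the helper below is illustrative, not code from this patch):

#include <linux/kconfig.h>
#include <linux/types.h>

/* Evaluates to true when CONFIG_MALI_PIXEL_GPU_SLC is set to y/m by Kconfig
 * (in-tree) or passed as -DCONFIG_MALI_PIXEL_GPU_SLC=1 (out-of-tree).
 */
static inline bool pixel_gpu_slc_enabled(void)
{
        return IS_ENABLED(CONFIG_MALI_PIXEL_GPU_SLC);
}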
@@ -231,7 +237,9 @@ MAKE_ARGS := $(foreach config,$(CONFIGS), \ $(value config)=$(value $(value config)), \ $(value config)=n)) -MAKE_ARGS += CONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + MAKE_ARGS += CONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +endif # # EXTRA_CFLAGS to define the custom CONFIGs on out-of-tree build @@ -243,13 +251,71 @@ EXTRA_CFLAGS := $(foreach config,$(CONFIGS), \ $(if $(filter y m,$(value $(value config))), \ -D$(value config)=1)) -EXTRA_CFLAGS += -DCONFIG_MALI_PLATFORM_NAME=$(CONFIG_MALI_PLATFORM_NAME) +ifeq ($(MALI_KCONFIG_EXT_PREFIX),) + EXTRA_CFLAGS += -DCONFIG_MALI_PLATFORM_NAME='\"$(CONFIG_MALI_PLATFORM_NAME)\"' + EXTRA_CFLAGS += -DCONFIG_MALI_NO_MALI_DEFAULT_GPU='\"$(CONFIG_MALI_NO_MALI_DEFAULT_GPU)\"' +endif include $(KDIR)/../private/google-modules/soc/gs/Makefile.include # # KBUILD_EXTRA_SYMBOLS to prevent warnings about unknown functions # +EXTRA_SYMBOLS += $(OUT_DIR)/../private/google-modules/gpu/mali_pixel/Module.symvers + +CFLAGS_MODULE += -Wall -Werror + +# The following were added to align with W=1 in scripts/Makefile.extrawarn +# from the Linux source tree (v5.18.14) +CFLAGS_MODULE += -Wextra -Wunused -Wno-unused-parameter +CFLAGS_MODULE += -Wmissing-declarations +CFLAGS_MODULE += -Wmissing-format-attribute +CFLAGS_MODULE += -Wmissing-prototypes +CFLAGS_MODULE += -Wold-style-definition +# The -Wmissing-include-dirs cannot be enabled as the path to some of the +# included directories change depending on whether it is an in-tree or +# out-of-tree build. +CFLAGS_MODULE += $(call cc-option, -Wunused-but-set-variable) +CFLAGS_MODULE += $(call cc-option, -Wunused-const-variable) +CFLAGS_MODULE += $(call cc-option, -Wpacked-not-aligned) +CFLAGS_MODULE += $(call cc-option, -Wstringop-truncation) +# The following turn off the warnings enabled by -Wextra +CFLAGS_MODULE += -Wno-sign-compare +CFLAGS_MODULE += -Wno-shift-negative-value +# This flag is needed to avoid build errors on older kernels +CFLAGS_MODULE += $(call cc-option, -Wno-cast-function-type) + +KBUILD_CPPFLAGS += -DKBUILD_EXTRA_WARN1 + +# The following were added to align with W=2 in scripts/Makefile.extrawarn +# from the Linux source tree (v5.18.14) +CFLAGS_MODULE += -Wdisabled-optimization +# The -Wshadow flag cannot be enabled unless upstream kernels are +# patched to fix redefinitions of certain built-in functions and +# global variables. 
+CFLAGS_MODULE += $(call cc-option, -Wlogical-op) +CFLAGS_MODULE += -Wmissing-field-initializers +# -Wtype-limits must be disabled due to build failures on kernel 5.x +CFLAGS_MODULE += -Wno-type-limits +CFLAGS_MODULE += $(call cc-option, -Wmaybe-uninitialized) +CFLAGS_MODULE += $(call cc-option, -Wunused-macros) + +KBUILD_CPPFLAGS += -DKBUILD_EXTRA_WARN2 + +# This warning is disabled to avoid build failures in some kernel versions +CFLAGS_MODULE += -Wno-ignored-qualifiers + +ifeq ($(CONFIG_GCOV_KERNEL),y) + CFLAGS_MODULE += $(call cc-option, -ftest-coverage) + CFLAGS_MODULE += $(call cc-option, -fprofile-arcs) + EXTRA_CFLAGS += -DGCOV_PROFILE=1 +endif + +ifeq ($(CONFIG_MALI_KCOV),y) + CFLAGS_MODULE += $(call cc-option, -fsanitize-coverage=trace-cmp) + EXTRA_CFLAGS += -DKCOV=1 + EXTRA_CFLAGS += -DKCOV_ENABLE_COMPARISONS=1 +endif modules modules_install clean: $(MAKE) -C $(KDIR) M=$(M) W=1 $(MAKE_ARGS) EXTRA_CFLAGS="$(EXTRA_CFLAGS)" KBUILD_EXTRA_SYMBOLS="$(EXTRA_SYMBOLS)" $(@) diff --git a/mali_kbase/Mconfig b/mali_kbase/Mconfig index 0f8f273..2d6fca0 100644 --- a/mali_kbase/Mconfig +++ b/mali_kbase/Mconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -41,11 +41,31 @@ config MALI_PLATFORM_NAME When PLATFORM_CUSTOM is set, this needs to be set manually to pick up the desired platform files. +choice + prompt "Mali HW backend" + depends on MALI_MIDGARD + default MALI_NO_MALI if NO_MALI + default MALI_REAL_HW + config MALI_REAL_HW - bool + bool "Enable build of Mali kernel driver for real HW" depends on MALI_MIDGARD - default y - default n if NO_MALI + help + This is the default HW backend. + +config MALI_NO_MALI + bool "Enable build of Mali kernel driver for No Mali" + depends on MALI_MIDGARD && MALI_EXPERT + help + This can be used to test the driver in a simulated environment + whereby the hardware is not physically present. If the hardware is physically + present it will not be used. This can be used to test the majority of the + driver without needing actual hardware or for software benchmarking. + All calls to the simulated hardware will complete immediately as if the hardware + completed the task. + + +endchoice config MALI_PLATFORM_DT_PIN_RST bool "Enable Juno GPU Pin reset" @@ -65,8 +85,7 @@ config MALI_CSF_SUPPORT config MALI_DEVFREQ bool "Enable devfreq support for Mali" depends on MALI_MIDGARD - default y if PLATFORM_JUNO - default y if PLATFORM_CUSTOM + default y help Support devfreq for Mali. @@ -98,16 +117,6 @@ config MALI_MIDGARD_ENABLE_TRACE Enables tracing in kbase. Trace log available through the "mali_trace" debugfs file, when the CONFIG_DEBUG_FS is enabled -config MALI_DMA_FENCE - bool "Enable DMA_BUF fence support for Mali" - depends on MALI_MIDGARD - default n - help - Support DMA_BUF fences for Mali. - - This option should only be enabled if the Linux Kernel has built in - support for DMA_BUF fences. 
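The MALI_DMA_FENCE option removed above provided implicit dma-buf fencing for atoms; with it gone, and the legacy CONFIG_SYNC path also dropped from the Kbuild earlier in this patch, fence objects are exported to user space through the mainline sync_file interface guarded by CONFIG_SYNC_FILE. A rough sketch of that export path is given below for orientation only; it is illustrative and not taken from this patch.

#include <linux/dma-fence.h>
#include <linux/sync_file.h>
#include <linux/file.h>
#include <linux/fcntl.h>

/* Wrap a fence in a sync_file and hand it to user space as a file descriptor. */
static int export_fence_as_fd(struct dma_fence *fence)
{
        struct sync_file *sfile = sync_file_create(fence);
        int fd;

        if (!sfile)
                return -ENOMEM;

        fd = get_unused_fd_flags(O_CLOEXEC);
        if (fd < 0) {
                fput(sfile->file);
                return fd;
        }
        fd_install(fd, sfile->file);
        return fd;
}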
- config MALI_ARBITER_SUPPORT bool "Enable arbiter support for Mali" depends on MALI_MIDGARD && !MALI_CSF_SUPPORT @@ -130,7 +139,7 @@ config MALI_DMA_BUF_MAP_ON_DEMAND default n default y if !DMA_BUF_SYNC_IOCTL_SUPPORTED help - This option caused kbase to set up the GPU mapping of imported + This option will cause kbase to set up the GPU mapping of imported dma-buf when needed to run atoms. This is the legacy behavior. This is intended for testing and the option will get removed in the @@ -150,6 +159,12 @@ config MALI_DMA_BUF_LEGACY_COMPAT flushes in other drivers. This only has an effect for clients using UK 11.18 or older. For later UK versions it is not possible. +config MALI_CORESIGHT + depends on MALI_MIDGARD && MALI_CSF_SUPPORT && !NO_MALI + select CSFFW_DEBUG_FW_AS_RW + bool "Enable Kbase CoreSight tracing support" + default n + menuconfig MALI_EXPERT depends on MALI_MIDGARD bool "Enable Expert Settings" @@ -158,17 +173,6 @@ menuconfig MALI_EXPERT Enabling this option and modifying the default settings may produce a driver with performance or other limitations. -config MALI_2MB_ALLOC - bool "Attempt to allocate 2MB pages" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - Rather than allocating all GPU memory page-by-page, attempt to - allocate 2MB pages from the kernel. This reduces TLB pressure and - helps to prevent memory fragmentation. - - If in doubt, say N - config MALI_MEMORY_FULLY_BACKED bool "Enable memory fully physically-backed" depends on MALI_MIDGARD && MALI_EXPERT @@ -192,6 +196,18 @@ config MALI_CORESTACK If unsure, say N. +config PAGE_MIGRATION_SUPPORT + bool "Compile with page migration support" + depends on BACKEND_KERNEL + default y + default n if ANDROID + help + Compile in support for page migration. + If set to disabled ('n') then page migration cannot + be enabled at all. If set to enabled, then page migration + support is explicitly compiled in. This has no effect when + PAGE_MIGRATION_OVERRIDE is disabled. + choice prompt "Error injection level" depends on MALI_MIDGARD && MALI_EXPERT @@ -231,14 +247,6 @@ config MALI_ERROR_INJECT depends on MALI_MIDGARD && MALI_EXPERT default y if !MALI_ERROR_INJECT_NONE -config MALI_GEM5_BUILD - bool "Enable build of Mali kernel driver for GEM5" - depends on MALI_MIDGARD && MALI_EXPERT - default n - help - This option is to do a Mali GEM5 build. - If unsure, say N. - config MALI_DEBUG bool "Enable debug build" depends on MALI_MIDGARD && MALI_EXPERT @@ -247,6 +255,23 @@ config MALI_DEBUG help Select this option for increased checking and reporting of errors. +config MALI_GCOV_KERNEL + bool "Enable branch coverage via gcov" + depends on MALI_MIDGARD && MALI_DEBUG + default n + help + Choose this option to enable building kbase with branch + coverage information. When built against a supporting kernel, + the coverage information will be available via debugfs. + +config MALI_KCOV + bool "Enable kcov coverage to support fuzzers" + depends on MALI_MIDGARD && MALI_DEBUG + default n + help + Choose this option to enable building with fuzzing-oriented + coverage, to improve the random test cases that are generated. + config MALI_FENCE_DEBUG bool "Enable debug sync fence usage" depends on MALI_MIDGARD && MALI_EXPERT @@ -329,6 +354,55 @@ config MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE tree using the property, opp-mali-errata-1485982. Otherwise the slowest clock will be selected. 
+config MALI_HOST_CONTROLS_SC_RAILS + bool "Enable Host based control of the shader core power rails" + depends on MALI_EXPERT && MALI_CSF_SUPPORT + default n + help + This option enables the Host based control of the power rails for + shader cores. It is recommended to use PDCA (Power Domain Control + Adapter) inside the GPU to handshake with SoC PMU to control the + power of cores. + +config MALI_TRACE_POWER_GPU_WORK_PERIOD + bool "Enable per-application GPU metrics tracepoints" + depends on MALI_MIDGARD + default y + help + This option enables per-application GPU metrics tracepoints. + + If unsure, say N. + +choice + prompt "CSF Firmware trace mode" + depends on MALI_MIDGARD + default MALI_FW_TRACE_MODE_MANUAL + help + CSF Firmware log operating mode. + +config MALI_FW_TRACE_MODE_MANUAL + bool "manual mode" + depends on MALI_MIDGARD + help + firmware log can be read manually by the userspace (and it will + also be dumped automatically into dmesg on GPU reset). + +config MALI_FW_TRACE_MODE_AUTO_PRINT + bool "automatic printing mode" + depends on MALI_MIDGARD + help + firmware log will be periodically emptied into dmesg, manual + reading through debugfs is disabled. + +config MALI_FW_TRACE_MODE_AUTO_DISCARD + bool "automatic discarding mode" + depends on MALI_MIDGARD + help + firmware log will be periodically discarded, the remaining log can be + read manually by the userspace (and it will also be dumped + automatically into dmesg on GPU reset). + +endchoice -source "kernel/drivers/gpu/arm/midgard/arbitration/Mconfig" +source "kernel/drivers/gpu/arm/arbitration/Mconfig" source "kernel/drivers/gpu/arm/midgard/tests/Mconfig" diff --git a/mali_kbase/arbiter/mali_kbase_arbif.c b/mali_kbase/arbiter/mali_kbase_arbif.c index 64e11ce..b5d3cd6 100644 --- a/mali_kbase/arbiter/mali_kbase_arbif.c +++ b/mali_kbase/arbiter/mali_kbase_arbif.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,12 +28,12 @@ #include <tl/mali_kbase_tracepoints.h> #include <linux/of.h> #include <linux/of_platform.h> -#include "mali_kbase_arbiter_interface.h" +#include "linux/mali_arbiter_interface.h" /* Arbiter interface version against which was implemented this module */ #define MALI_REQUIRED_KBASE_ARBITER_INTERFACE_VERSION 5 #if MALI_REQUIRED_KBASE_ARBITER_INTERFACE_VERSION != \ - MALI_KBASE_ARBITER_INTERFACE_VERSION + MALI_ARBITER_INTERFACE_VERSION #error "Unsupported Mali Arbiter interface version." 
#endif @@ -205,6 +205,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (!pdev->dev.driver || !try_module_get(pdev->dev.driver->owner)) { dev_err(kbdev->dev, "arbiter_if driver not available\n"); + put_device(&pdev->dev); return -EPROBE_DEFER; } kbdev->arb.arb_dev = &pdev->dev; @@ -212,6 +213,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (!arb_if) { dev_err(kbdev->dev, "arbiter_if driver not ready\n"); module_put(pdev->dev.driver->owner); + put_device(&pdev->dev); return -EPROBE_DEFER; } @@ -233,6 +235,7 @@ int kbase_arbif_init(struct kbase_device *kbdev) if (err) { dev_err(&pdev->dev, "Failed to register with arbiter\n"); module_put(pdev->dev.driver->owner); + put_device(&pdev->dev); if (err != -EPROBE_DEFER) err = -EFAULT; return err; @@ -262,8 +265,10 @@ void kbase_arbif_destroy(struct kbase_device *kbdev) arb_if->vm_ops.vm_arb_unregister_dev(kbdev->arb.arb_if); } kbdev->arb.arb_if = NULL; - if (kbdev->arb.arb_dev) + if (kbdev->arb.arb_dev) { module_put(kbdev->arb.arb_dev->driver->owner); + put_device(kbdev->arb.arb_dev); + } kbdev->arb.arb_dev = NULL; } diff --git a/mali_kbase/arbiter/mali_kbase_arbiter_pm.c b/mali_kbase/arbiter/mali_kbase_arbiter_pm.c index d813a04..667552c 100644 --- a/mali_kbase/arbiter/mali_kbase_arbiter_pm.c +++ b/mali_kbase/arbiter/mali_kbase_arbiter_pm.c @@ -955,7 +955,6 @@ static inline bool kbase_arbiter_pm_vm_gpu_assigned_lockheld( int kbase_arbiter_pm_ctx_active_handle_suspend(struct kbase_device *kbdev, enum kbase_pm_suspend_handler suspend_handler) { - struct kbasep_js_device_data *js_devdata = &kbdev->js_data; struct kbase_arbiter_vm_state *arb_vm_state = kbdev->pm.arb_vm_state; int res = 0; @@ -1008,11 +1007,9 @@ int kbase_arbiter_pm_ctx_active_handle_suspend(struct kbase_device *kbdev, /* Need to synchronously wait for GPU assignment */ atomic_inc(&kbdev->pm.gpu_users_waiting); mutex_unlock(&arb_vm_state->vm_state_lock); - mutex_unlock(&kbdev->pm.lock); - mutex_unlock(&js_devdata->runpool_mutex); + kbase_pm_unlock(kbdev); kbase_arbiter_pm_vm_wait_gpu_assignment(kbdev); - mutex_lock(&js_devdata->runpool_mutex); - mutex_lock(&kbdev->pm.lock); + kbase_pm_lock(kbdev); mutex_lock(&arb_vm_state->vm_state_lock); atomic_dec(&kbdev->pm.gpu_users_waiting); } @@ -1111,7 +1108,7 @@ static int arb_gpu_clk_notifier_register(struct kbase_device *kbdev, } /** - * gpu_clk_notifier_unregister() - Unregister clock rate change notifier + * arb_gpu_clk_notifier_unregister() - Unregister clock rate change notifier * @kbdev: kbase_device pointer * @gpu_clk_handle: Handle unique to the enumerated GPU clock * @nb: notifier block containing the callback function pointer diff --git a/mali_kbase/arbitration/Kconfig b/mali_kbase/arbitration/Kconfig deleted file mode 100644 index 1935c81..0000000 --- a/mali_kbase/arbitration/Kconfig +++ /dev/null @@ -1,49 +0,0 @@ -# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note OR MIT -# -# (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. -# -# This program is free software and is provided to you under the terms of the -# GNU General Public License version 2 as published by the Free Software -# Foundation, and any use by you of this program is subject to the terms -# of such GNU license. -# -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. 
-# -# You should have received a copy of the GNU General Public License -# along with this program; if not, you can access it online at -# http://www.gnu.org/licenses/gpl-2.0.html. -# -# - -config MALI_XEN - tristate "Enable Xen Interface reference code" - depends on MALI_ARBITRATION && XEN - default n - help - Enables the build of xen interface modules used in the reference - virtualization setup for Mali - If unsure, say N. - -config MALI_ARBITER_MODULES - tristate "Enable mali arbiter modules" - depends on MALI_ARBITRATION - default y - help - Enables the build of the arbiter modules used in the reference - virtualization setup for Mali - If unsure, say N - -config MALI_GPU_POWER_MODULES - tristate "Enable gpu power modules" - depends on MALI_ARBITRATION - default y - help - Enables the build of the gpu power modules used in the reference - virtualization setup for Mali - If unsure, say N - - -source "drivers/gpu/arm/midgard/arbitration/ptm/Kconfig" diff --git a/mali_kbase/backend/gpu/Kbuild b/mali_kbase/backend/gpu/Kbuild index 49abc1c..c37cc59 100644 --- a/mali_kbase/backend/gpu/Kbuild +++ b/mali_kbase/backend/gpu/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -22,7 +22,6 @@ mali_kbase-y += \ backend/gpu/mali_kbase_cache_policy_backend.o \ backend/gpu/mali_kbase_gpuprops_backend.o \ backend/gpu/mali_kbase_irq_linux.o \ - backend/gpu/mali_kbase_js_backend.o \ backend/gpu/mali_kbase_pm_backend.o \ backend/gpu/mali_kbase_pm_driver.o \ backend/gpu/mali_kbase_pm_metrics.o \ @@ -31,6 +30,7 @@ mali_kbase-y += \ backend/gpu/mali_kbase_pm_coarse_demand.o \ backend/gpu/mali_kbase_pm_adaptive.o \ backend/gpu/mali_kbase_pm_policy.o \ + backend/gpu/mali_kbase_pm_event_log.o \ backend/gpu/mali_kbase_time.o \ backend/gpu/mali_kbase_l2_mmu_config.o \ backend/gpu/mali_kbase_clk_rate_trace_mgr.o @@ -41,15 +41,20 @@ ifeq ($(MALI_USE_CSF),0) backend/gpu/mali_kbase_jm_as.o \ backend/gpu/mali_kbase_debug_job_fault_backend.o \ backend/gpu/mali_kbase_jm_hw.o \ - backend/gpu/mali_kbase_jm_rb.o + backend/gpu/mali_kbase_jm_rb.o \ + backend/gpu/mali_kbase_js_backend.o endif mali_kbase-$(CONFIG_MALI_DEVFREQ) += \ backend/gpu/mali_kbase_devfreq.o -# Dummy model +ifneq ($(CONFIG_MALI_REAL_HW),y) + mali_kbase-y += backend/gpu/mali_kbase_model_linux.o +endif + +# NO_MALI Dummy model interface mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_dummy.o -mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_linux.o # HW error simulation mali_kbase-$(CONFIG_MALI_NO_MALI) += backend/gpu/mali_kbase_model_error_generator.o + diff --git a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c index 9587c70..86539d5 100644 --- a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,22 +22,59 @@ #include "backend/gpu/mali_kbase_cache_policy_backend.h" #include <device/mali_kbase_device.h> +/** + * kbasep_amba_register_present() - Check AMBA_<> register is present + * in the GPU. + * @kbdev: Device pointer + * + * Note: Only for arch version 12.x.1 onwards. + * + * Return: true if AMBA_FEATURES/ENABLE registers are present. + */ +static bool kbasep_amba_register_present(struct kbase_device *kbdev) +{ + return (ARCH_MAJOR_REV_REG(kbdev->gpu_props.props.raw_props.gpu_id) >= + GPU_ID2_ARCH_MAJOR_REV_MAKE(12, 1)); +} void kbase_cache_set_coherency_mode(struct kbase_device *kbdev, u32 mode) { kbdev->current_gpu_coherency_mode = mode; - kbase_reg_write(kbdev, COHERENCY_ENABLE, mode); + if (kbasep_amba_register_present(kbdev)) { + u32 val = kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_ENABLE)); + + val = AMBA_ENABLE_COHERENCY_PROTOCOL_SET(val, mode); + kbase_reg_write(kbdev, GPU_CONTROL_REG(AMBA_ENABLE), val); + } else + kbase_reg_write(kbdev, GPU_CONTROL_REG(COHERENCY_ENABLE), mode); } u32 kbase_cache_get_coherency_features(struct kbase_device *kbdev) { u32 coherency_features; + if (kbasep_amba_register_present(kbdev)) + coherency_features = + kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_FEATURES)); + else coherency_features = kbase_reg_read( kbdev, GPU_CONTROL_REG(COHERENCY_FEATURES)); return coherency_features; } +void kbase_amba_set_memory_cache_support(struct kbase_device *kbdev, + bool enable) +{ + if (kbasep_amba_register_present(kbdev)) { + u32 val = kbase_reg_read(kbdev, GPU_CONTROL_REG(AMBA_ENABLE)); + + val = AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SET(val, enable); + kbase_reg_write(kbdev, GPU_CONTROL_REG(AMBA_ENABLE), val); + + } else { + WARN(1, "memory_cache_support not supported"); + } +} diff --git a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h index 13c79d6..0103695 100644 --- a/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h +++ b/mali_kbase/backend/gpu/mali_kbase_cache_policy_backend.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,4 +43,14 @@ void kbase_cache_set_coherency_mode(struct kbase_device *kbdev, */ u32 kbase_cache_get_coherency_features(struct kbase_device *kbdev); +/** + * kbase_amba_set_memory_cache_support() - Sets AMBA memory cache support + * in the GPU. + * @kbdev: Device pointer + * @enable: true for enable. + * + * Note: Only for arch version 12.x.1 onwards. + */ +void kbase_amba_set_memory_cache_support(struct kbase_device *kbdev, + bool enable); #endif /* _KBASE_CACHE_POLICY_BACKEND_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c index d6b9750..cca4f74 100644 --- a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c +++ b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -58,8 +58,10 @@ get_clk_rate_trace_callbacks(__maybe_unused struct kbase_device *kbdev) if (WARN_ON(!kbdev) || WARN_ON(!kbdev->dev)) return callbacks; - arbiter_if_node = - of_get_property(kbdev->dev->of_node, "arbiter_if", NULL); + arbiter_if_node = of_get_property(kbdev->dev->of_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(kbdev->dev->of_node, "arbiter_if", NULL); + /* Arbitration enabled, override the callback pointer.*/ if (arbiter_if_node) callbacks = &arb_clk_rate_trace_ops; @@ -72,49 +74,6 @@ get_clk_rate_trace_callbacks(__maybe_unused struct kbase_device *kbdev) return callbacks; } -int kbase_lowest_gpu_freq_init(struct kbase_device *kbdev) -{ - /* Uses default reference frequency defined in below macro */ - u64 lowest_freq_khz = DEFAULT_REF_TIMEOUT_FREQ_KHZ; - - /* Only check lowest frequency in cases when OPPs are used and - * present in the device tree. - */ -#ifdef CONFIG_PM_OPP - struct dev_pm_opp *opp_ptr; - unsigned long found_freq = 0; - - /* find lowest frequency OPP */ - opp_ptr = dev_pm_opp_find_freq_ceil(kbdev->dev, &found_freq); - if (IS_ERR(opp_ptr)) { - dev_err(kbdev->dev, - "No OPPs found in device tree! Scaling timeouts using %llu kHz", - (unsigned long long)lowest_freq_khz); - } else { -#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE - dev_pm_opp_put(opp_ptr); /* decrease OPP refcount */ -#endif - /* convert found frequency to KHz */ - found_freq /= 1000; - - /* If lowest frequency in OPP table is still higher - * than the reference, then keep the reference frequency - * as the one to use for scaling . - */ - if (found_freq < lowest_freq_khz) - lowest_freq_khz = found_freq; - } -#else - dev_err(kbdev->dev, - "No operating-points-v2 node or operating-points property in DT"); -#endif - - kbdev->lowest_gpu_freq_khz = lowest_freq_khz; - dev_dbg(kbdev->dev, "Lowest frequency identified is %llu kHz", - kbdev->lowest_gpu_freq_khz); - return 0; -} - static int gpu_clk_rate_change_notifier(struct notifier_block *nb, unsigned long event, void *data) { diff --git a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h index a6ee959..35b3b8d 100644 --- a/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h +++ b/mali_kbase/backend/gpu/mali_kbase_clk_rate_trace_mgr.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -61,20 +61,6 @@ struct kbase_clk_data { int kbase_clk_rate_trace_manager_init(struct kbase_device *kbdev); /** - * kbase_init_lowest_gpu_freq() - Find the lowest frequency that the GPU can - * run as using the device tree, and save this - * within kbdev. - * @kbdev: Pointer to kbase device. - * - * This function could be called from kbase_clk_rate_trace_manager_init, - * but is left separate as it can be called as soon as - * dev_pm_opp_of_add_table() has been called to initialize the OPP table. - * - * Return: 0 in any case. - */ -int kbase_lowest_gpu_freq_init(struct kbase_device *kbdev); - -/** * kbase_clk_rate_trace_manager_term - Terminate GPU clock rate trace manager. 
* * @kbdev: Device pointer diff --git a/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c b/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c index e121b41..cd3b29d 100644 --- a/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_debug_job_fault_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2015, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,7 +59,7 @@ static int job_slot_reg_snapshot[] = { JS_CONFIG_NEXT }; -/*MMU_REG(r)*/ +/*MMU_CONTROL_REG(r)*/ static int mmu_reg_snapshot[] = { MMU_IRQ_MASK, MMU_IRQ_STATUS @@ -118,15 +118,14 @@ bool kbase_debug_job_fault_reg_snapshot_init(struct kbase_context *kctx, /* get the MMU registers*/ for (i = 0; i < sizeof(mmu_reg_snapshot)/4; i++) { - kctx->reg_dump[offset] = MMU_REG(mmu_reg_snapshot[i]); + kctx->reg_dump[offset] = MMU_CONTROL_REG(mmu_reg_snapshot[i]); offset += 2; } /* get the Address space registers*/ for (j = 0; j < as_number; j++) { for (i = 0; i < sizeof(as_reg_snapshot)/4; i++) { - kctx->reg_dump[offset] = - MMU_AS_REG(j, as_reg_snapshot[i]); + kctx->reg_dump[offset] = MMU_STAGE1_REG(MMU_AS_REG(j, as_reg_snapshot[i])); offset += 2; } } diff --git a/mali_kbase/backend/gpu/mali_kbase_devfreq.c b/mali_kbase/backend/gpu/mali_kbase_devfreq.c index 00b32b9..a389cd9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_devfreq.c +++ b/mali_kbase/backend/gpu/mali_kbase_devfreq.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -57,7 +57,7 @@ static unsigned long get_voltage(struct kbase_device *kbdev, unsigned long freq) opp = dev_pm_opp_find_freq_exact(kbdev->dev, freq, true); if (IS_ERR_OR_NULL(opp)) - dev_err(kbdev->dev, "Failed to get opp (%ld)\n", PTR_ERR(opp)); + dev_err(kbdev->dev, "Failed to get opp (%d)\n", PTR_ERR_OR_ZERO(opp)); else { voltage = dev_pm_opp_get_voltage(opp); #if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE @@ -133,8 +133,8 @@ kbase_devfreq_target(struct device *dev, unsigned long *target_freq, u32 flags) rcu_read_unlock(); #endif if (IS_ERR_OR_NULL(opp)) { - dev_err(dev, "Failed to get opp (%ld)\n", PTR_ERR(opp)); - return PTR_ERR(opp); + dev_err(dev, "Failed to get opp (%d)\n", PTR_ERR_OR_ZERO(opp)); + return IS_ERR(opp) ? PTR_ERR(opp) : -ENODEV; } #if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE dev_pm_opp_put(opp); @@ -317,6 +317,7 @@ static int kbase_devfreq_init_freq_table(struct kbase_device *kbdev, dp->max_state = i; + /* Have the lowest clock as suspend clock. * It may be overridden by 'opp-mali-errata-1485982'. 
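The devfreq changes that follow replace PTR_ERR() with PTR_ERR_OR_ZERO() when a lookup is checked with IS_ERR_OR_NULL(), and turn the NULL case into an explicit -ENODEV. The reason is that PTR_ERR(NULL) evaluates to 0, so the old code could log "error 0" and even return success for a NULL OPP. A self-contained sketch of the corrected idiom is below; the error-pointer helpers are re-implemented here only so the example compiles outside the kernel, and find_opp() is an invented stand-in for dev_pm_opp_find_freq_exact().

#include <errno.h>
#include <stdio.h>

/* User-space re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err)               { return (void *)err; }
static long  PTR_ERR(const void *p)          { return (long)p; }
static int   IS_ERR(const void *p)           { return (unsigned long)p >= (unsigned long)-MAX_ERRNO; }
static int   IS_ERR_OR_NULL(const void *p)   { return !p || IS_ERR(p); }
static long  PTR_ERR_OR_ZERO(const void *p)  { return IS_ERR(p) ? PTR_ERR(p) : 0; }

/* Pretend OPP lookup: may return a valid pointer, an error pointer, or NULL. */
static void *find_opp(int variant)
{
    static int dummy_opp;
    if (variant == 0) return &dummy_opp;
    if (variant == 1) return ERR_PTR(-EINVAL);
    return NULL;
}

static int use_opp(int variant)
{
    void *opp = find_opp(variant);

    if (IS_ERR_OR_NULL(opp)) {
        /* PTR_ERR() on NULL would print 0 and the caller would "succeed";
         * PTR_ERR_OR_ZERO() plus an explicit -ENODEV keeps both cases as errors. */
        fprintf(stderr, "Failed to get opp (%ld)\n", PTR_ERR_OR_ZERO(opp));
        return IS_ERR(opp) ? (int)PTR_ERR(opp) : -ENODEV;
    }
    return 0;
}

int main(void)
{
    for (int v = 0; v < 3; v++)
        printf("variant %d -> %d\n", v, use_opp(v));
    return 0;
}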
*/ @@ -630,12 +631,12 @@ static void kbase_devfreq_work_term(struct kbase_device *kbdev) destroy_workqueue(workq); } - int kbase_devfreq_init(struct kbase_device *kbdev) { struct devfreq_dev_profile *dp; int err; unsigned int i; + bool free_devfreq_freq_table = true; if (kbdev->nr_clocks == 0) { dev_err(kbdev->dev, "Clock not available for devfreq\n"); @@ -669,32 +670,35 @@ int kbase_devfreq_init(struct kbase_device *kbdev) dp->freq_table[0] / 1000; } - err = kbase_devfreq_init_core_mask_table(kbdev); +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + err = kbase_ipa_init(kbdev); if (err) { - kbase_devfreq_term_freq_table(kbdev); - return err; + dev_err(kbdev->dev, "IPA initialization failed"); + goto ipa_init_failed; } +#endif + + err = kbase_devfreq_init_core_mask_table(kbdev); + if (err) + goto init_core_mask_table_failed; kbdev->devfreq = devfreq_add_device(kbdev->dev, dp, "simple_ondemand", NULL); if (IS_ERR(kbdev->devfreq)) { err = PTR_ERR(kbdev->devfreq); kbdev->devfreq = NULL; - kbase_devfreq_term_core_mask_table(kbdev); - kbase_devfreq_term_freq_table(kbdev); - dev_err(kbdev->dev, "Fail to add devfreq device(%d)\n", err); - return err; + dev_err(kbdev->dev, "Fail to add devfreq device(%d)", err); + goto devfreq_add_dev_failed; } + /* Explicit free of freq table isn't needed after devfreq_add_device() */ + free_devfreq_freq_table = false; + /* Initialize devfreq suspend/resume workqueue */ err = kbase_devfreq_work_init(kbdev); if (err) { - if (devfreq_remove_device(kbdev->devfreq)) - dev_err(kbdev->dev, "Fail to rm devfreq\n"); - kbdev->devfreq = NULL; - kbase_devfreq_term_core_mask_table(kbdev); - dev_err(kbdev->dev, "Fail to init devfreq workqueue\n"); - return err; + dev_err(kbdev->dev, "Fail to init devfreq workqueue"); + goto devfreq_work_init_failed; } /* devfreq_add_device only copies a few of kbdev->dev's fields, so @@ -705,26 +709,20 @@ int kbase_devfreq_init(struct kbase_device *kbdev) err = devfreq_register_opp_notifier(kbdev->dev, kbdev->devfreq); if (err) { dev_err(kbdev->dev, - "Failed to register OPP notifier (%d)\n", err); + "Failed to register OPP notifier (%d)", err); goto opp_notifier_failed; } #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) - err = kbase_ipa_init(kbdev); - if (err) { - dev_err(kbdev->dev, "IPA initialization failed\n"); - goto ipa_init_failed; - } - kbdev->devfreq_cooling = of_devfreq_cooling_register_power( kbdev->dev->of_node, kbdev->devfreq, &kbase_ipa_power_model_ops); if (IS_ERR_OR_NULL(kbdev->devfreq_cooling)) { - err = PTR_ERR(kbdev->devfreq_cooling); + err = PTR_ERR_OR_ZERO(kbdev->devfreq_cooling); dev_err(kbdev->dev, - "Failed to register cooling device (%d)\n", - err); + "Failed to register cooling device (%d)", err); + err = err == 0 ? 
-ENODEV : err; goto cooling_reg_failed; } #endif @@ -733,21 +731,29 @@ int kbase_devfreq_init(struct kbase_device *kbdev) #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) cooling_reg_failed: - kbase_ipa_term(kbdev); -ipa_init_failed: devfreq_unregister_opp_notifier(kbdev->dev, kbdev->devfreq); #endif /* CONFIG_DEVFREQ_THERMAL */ opp_notifier_failed: kbase_devfreq_work_term(kbdev); +devfreq_work_init_failed: if (devfreq_remove_device(kbdev->devfreq)) - dev_err(kbdev->dev, "Failed to terminate devfreq (%d)\n", err); + dev_err(kbdev->dev, "Failed to terminate devfreq (%d)", err); kbdev->devfreq = NULL; +devfreq_add_dev_failed: kbase_devfreq_term_core_mask_table(kbdev); +init_core_mask_table_failed: +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + kbase_ipa_term(kbdev); +ipa_init_failed: +#endif + if (free_devfreq_freq_table) + kbase_devfreq_term_freq_table(kbdev); + return err; } @@ -760,8 +766,6 @@ void kbase_devfreq_term(struct kbase_device *kbdev) #if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) if (kbdev->devfreq_cooling) devfreq_cooling_unregister(kbdev->devfreq_cooling); - - kbase_ipa_term(kbdev); #endif devfreq_unregister_opp_notifier(kbdev->dev, kbdev->devfreq); @@ -775,4 +779,8 @@ void kbase_devfreq_term(struct kbase_device *kbdev) kbdev->devfreq = NULL; kbase_devfreq_term_core_mask_table(kbdev); + +#if IS_ENABLED(CONFIG_DEVFREQ_THERMAL) + kbase_ipa_term(kbdev); +#endif } diff --git a/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c b/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c index 0ea14bc..10e92ec 100644 --- a/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_gpuprops_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,19 +40,7 @@ int kbase_backend_gpuprops_get(struct kbase_device *kbdev, registers.l2_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_FEATURES)); - registers.core_features = 0; -#if !MALI_USE_CSF - /* TGOx */ - registers.core_features = kbase_reg_read(kbdev, - GPU_CONTROL_REG(CORE_FEATURES)); -#else /* !MALI_USE_CSF */ - if (!(((registers.gpu_id & GPU_ID2_PRODUCT_MODEL) == - GPU_ID2_PRODUCT_TDUX) || - ((registers.gpu_id & GPU_ID2_PRODUCT_MODEL) == - GPU_ID2_PRODUCT_TODX))) - registers.core_features = - kbase_reg_read(kbdev, GPU_CONTROL_REG(CORE_FEATURES)); -#endif /* MALI_USE_CSF */ + registers.tiler_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(TILER_FEATURES)); registers.mem_features = kbase_reg_read(kbdev, @@ -170,6 +158,11 @@ int kbase_backend_gpuprops_get_features(struct kbase_device *kbdev, regdump->coherency_features = coherency_features; + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CORE_FEATURES)) + regdump->core_features = kbase_reg_read(kbdev, GPU_CONTROL_REG(CORE_FEATURES)); + else + regdump->core_features = 0; + kbase_pm_register_access_disable(kbdev); return error; diff --git a/mali_kbase/backend/gpu/mali_kbase_instr_backend.c b/mali_kbase/backend/gpu/mali_kbase_instr_backend.c index 0ece571..b89b917 100644 --- a/mali_kbase/backend/gpu/mali_kbase_instr_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_instr_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. 
All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,6 +29,20 @@ #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_instr_internal.h> +static int wait_prfcnt_ready(struct kbase_device *kbdev) +{ + u32 loops; + + for (loops = 0; loops < KBASE_PRFCNT_ACTIVE_MAX_LOOPS; loops++) { + const u32 prfcnt_active = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)) & + GPU_STATUS_PRFCNT_ACTIVE; + if (!prfcnt_active) + return 0; + } + + dev_err(kbdev->dev, "PRFCNT_ACTIVE bit stuck\n"); + return -EBUSY; +} int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, struct kbase_context *kctx, @@ -43,20 +57,20 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, /* alignment failure */ if ((enable->dump_buffer == 0ULL) || (enable->dump_buffer & (2048 - 1))) - goto out_err; + return err; spin_lock_irqsave(&kbdev->hwcnt.lock, flags); if (kbdev->hwcnt.backend.state != KBASE_INSTR_STATE_DISABLED) { /* Instrumentation is already enabled */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - goto out_err; + return err; } if (kbase_is_gpu_removed(kbdev)) { /* GPU has been removed by Arbiter */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - goto out_err; + return err; } /* Enable interrupt */ @@ -81,9 +95,19 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, prfcnt_config |= enable->counter_set << PRFCNT_CONFIG_SETSELECT_SHIFT; #endif + /* Wait until prfcnt config register can be written */ + err = wait_prfcnt_ready(kbdev); + if (err) + return err; + kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_CONFIG), prfcnt_config | PRFCNT_CONFIG_MODE_OFF); + /* Wait until prfcnt is disabled before writing configuration registers */ + err = wait_prfcnt_ready(kbdev); + if (err) + return err; + kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_LO), enable->dump_buffer & 0xFFFFFFFF); kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_HI), @@ -111,12 +135,8 @@ int kbase_instr_hwcnt_enable_internal(struct kbase_device *kbdev, spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - err = 0; - dev_dbg(kbdev->dev, "HW counters dumping set-up for context %pK", kctx); - return err; - out_err: - return err; + return 0; } static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) @@ -135,7 +155,10 @@ static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), irq_mask & ~PRFCNT_SAMPLE_COMPLETED); - /* Disable the counters */ + /* Wait until prfcnt config register can be written, then disable the counters. + * Return value is ignored as we are disabling anyway. 
+ */ + wait_prfcnt_ready(kbdev); kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_CONFIG), 0); kbdev->hwcnt.kctx = NULL; @@ -146,7 +169,6 @@ static void kbasep_instr_hwc_disable_hw_prfcnt(struct kbase_device *kbdev) int kbase_instr_hwcnt_disable_internal(struct kbase_context *kctx) { unsigned long flags, pm_flags; - int err = -EINVAL; struct kbase_device *kbdev = kctx->kbdev; while (1) { @@ -167,14 +189,14 @@ int kbase_instr_hwcnt_disable_internal(struct kbase_context *kctx) /* Instrumentation is not enabled */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, pm_flags); - return err; + return -EINVAL; } if (kbdev->hwcnt.kctx != kctx) { /* Instrumentation has been setup for another context */ spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, pm_flags); - return err; + return -EINVAL; } if (kbdev->hwcnt.backend.state == KBASE_INSTR_STATE_IDLE) @@ -233,6 +255,11 @@ int kbase_instr_hwcnt_request_dump(struct kbase_context *kctx) */ kbdev->hwcnt.backend.state = KBASE_INSTR_STATE_DUMPING; + /* Wait until prfcnt is ready to request dump */ + err = wait_prfcnt_ready(kbdev); + if (err) + goto unlock; + /* Reconfigure the dump address */ kbase_reg_write(kbdev, GPU_CONTROL_REG(PRFCNT_BASE_LO), kbdev->hwcnt.addr & 0xFFFFFFFF); @@ -248,11 +275,8 @@ int kbase_instr_hwcnt_request_dump(struct kbase_context *kctx) dev_dbg(kbdev->dev, "HW counters dumping done for context %pK", kctx); - err = 0; - unlock: spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); - return err; } KBASE_EXPORT_SYMBOL(kbase_instr_hwcnt_request_dump); @@ -346,21 +370,24 @@ int kbase_instr_hwcnt_clear(struct kbase_context *kctx) */ if (kbdev->hwcnt.kctx != kctx || kbdev->hwcnt.backend.state != KBASE_INSTR_STATE_IDLE) - goto out; + goto unlock; if (kbase_is_gpu_removed(kbdev)) { /* GPU has been removed by Arbiter */ - goto out; + goto unlock; } + /* Wait until prfcnt is ready to clear */ + err = wait_prfcnt_ready(kbdev); + if (err) + goto unlock; + /* Clear the counters */ KBASE_KTRACE_ADD(kbdev, CORE_GPU_PRFCNT_CLEAR, NULL, 0); kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), GPU_COMMAND_PRFCNT_CLEAR); - err = 0; - -out: +unlock: spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); return err; } diff --git a/mali_kbase/backend/gpu/mali_kbase_instr_defs.h b/mali_kbase/backend/gpu/mali_kbase_instr_defs.h index 7190f42..bd2eb8a 100644 --- a/mali_kbase/backend/gpu/mali_kbase_instr_defs.h +++ b/mali_kbase/backend/gpu/mali_kbase_instr_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2016, 2018-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_INSTR_DEFS_H_ #define _KBASE_INSTR_DEFS_H_ -#include <mali_kbase_hwcnt_gpu.h> +#include <hwcnt/mali_kbase_hwcnt_gpu.h> /* * Instrumentation State Machine States diff --git a/mali_kbase/backend/gpu/mali_kbase_irq_linux.c b/mali_kbase/backend/gpu/mali_kbase_irq_linux.c index a29f7ef..b95277c 100644 --- a/mali_kbase/backend/gpu/mali_kbase_irq_linux.c +++ b/mali_kbase/backend/gpu/mali_kbase_irq_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. 
All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,12 +25,12 @@ #include <linux/interrupt.h> -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +#if IS_ENABLED(CONFIG_MALI_REAL_HW) /* GPU IRQ Tags */ -#define JOB_IRQ_TAG 0 -#define MMU_IRQ_TAG 1 -#define GPU_IRQ_TAG 2 +#define JOB_IRQ_TAG 0 +#define MMU_IRQ_TAG 1 +#define GPU_IRQ_TAG 2 static void *kbase_tag(void *ptr, u32 tag) { @@ -99,7 +99,7 @@ static irqreturn_t kbase_mmu_irq_handler(int irq, void *data) atomic_inc(&kbdev->faults_pending); - val = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_STATUS)); + val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)); #ifdef CONFIG_MALI_DEBUG if (!kbdev->pm.backend.driver_ready_for_irqs) @@ -163,7 +163,6 @@ static irq_handler_t kbase_handler_table[] = { #ifdef CONFIG_MALI_DEBUG #define JOB_IRQ_HANDLER JOB_IRQ_TAG -#define MMU_IRQ_HANDLER MMU_IRQ_TAG #define GPU_IRQ_HANDLER GPU_IRQ_TAG /** @@ -299,7 +298,7 @@ static irqreturn_t kbase_mmu_irq_test_handler(int irq, void *data) return IRQ_NONE; } - val = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_STATUS)); + val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -311,7 +310,7 @@ static irqreturn_t kbase_mmu_irq_test_handler(int irq, void *data) kbasep_irq_test_data.triggered = 1; wake_up(&kbasep_irq_test_data.wait); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), val); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), val); return IRQ_HANDLED; } @@ -345,8 +344,8 @@ static int kbasep_common_test_interrupt( break; case MMU_IRQ_TAG: test_handler = kbase_mmu_irq_test_handler; - rawstat_offset = MMU_REG(MMU_IRQ_RAWSTAT); - mask_offset = MMU_REG(MMU_IRQ_MASK); + rawstat_offset = MMU_CONTROL_REG(MMU_IRQ_RAWSTAT); + mask_offset = MMU_CONTROL_REG(MMU_IRQ_MASK); break; case GPU_IRQ_TAG: /* already tested by pm_driver - bail out */ @@ -501,4 +500,4 @@ void kbase_synchronize_irqs(struct kbase_device *kbdev) KBASE_EXPORT_TEST_API(kbase_synchronize_irqs); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_as.c b/mali_kbase/backend/gpu/mali_kbase_jm_as.c index 309e5c7..7059c84 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_as.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_as.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. 
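The wait_prfcnt_ready() helper introduced in mali_kbase_instr_backend.c above bounds every access to the performance-counter registers with a polling loop on the PRFCNT_ACTIVE status bit, returning -EBUSY instead of touching PRFCNT_CONFIG while a sample is still in flight (the disable path deliberately ignores the result). A compact user-space sketch of that bounded busy-wait follows; the status source, loop limit and bit position are placeholders rather than the real register interface.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define PRFCNT_ACTIVE    (1u << 6)   /* assumed bit position, illustration only */
#define PRFCNT_MAX_LOOPS 100000u

/* Stand-in for a GPU status register whose busy bit eventually clears. */
static uint32_t read_gpu_status(void)
{
    static int countdown = 25;
    return (countdown-- > 0) ? PRFCNT_ACTIVE : 0;
}

/* Poll until the counters are idle, or give up after a fixed number of reads. */
static int wait_prfcnt_ready(void)
{
    for (uint32_t loops = 0; loops < PRFCNT_MAX_LOOPS; loops++) {
        if (!(read_gpu_status() & PRFCNT_ACTIVE))
            return 0;
    }
    fprintf(stderr, "PRFCNT_ACTIVE bit stuck\n");
    return -EBUSY;
}

int main(void)
{
    /* Callers bail out (or ignore the result when disabling) before writing config. */
    printf("wait_prfcnt_ready() = %d\n", wait_prfcnt_ready());
    return 0;
}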
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,9 +67,8 @@ static void assign_and_activate_kctx_addr_space(struct kbase_device *kbdev, kbase_js_runpool_inc_context_count(kbdev, kctx); } -bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { int i; @@ -240,4 +239,3 @@ bool kbase_backend_use_ctx(struct kbase_device *kbdev, return true; } - diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_hw.c b/mali_kbase/backend/gpu/mali_kbase_jm_hw.c index 32bdf72..dd8f4d9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_hw.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,7 +34,7 @@ #include <mali_kbase_ctx_sched.h> #include <mali_kbase_kinstr_jm.h> #include <mali_kbase_hwaccess_instr.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_jm_internal.h> @@ -44,9 +44,8 @@ static void kbasep_try_reset_gpu_early_locked(struct kbase_device *kbdev); static u64 kbasep_apply_limited_core_mask(const struct kbase_device *kbdev, const u64 affinity, const u64 limited_core_mask); -static u64 kbase_job_write_affinity(struct kbase_device *kbdev, - base_jd_core_req core_req, - int js, const u64 limited_core_mask) +static u64 kbase_job_write_affinity(struct kbase_device *kbdev, base_jd_core_req core_req, + unsigned int js, const u64 limited_core_mask) { u64 affinity; bool skip_affinity_check = false; @@ -191,9 +190,28 @@ static u64 select_job_chain(struct kbase_jd_atom *katom) return jc; } -void kbase_job_hw_submit(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - int js) +static inline bool kbasep_jm_wait_js_free(struct kbase_device *kbdev, unsigned int js, + struct kbase_context *kctx) +{ + const ktime_t wait_loop_start = ktime_get_raw(); + const s64 max_timeout = (s64)kbdev->js_data.js_free_wait_time_ms; + s64 diff = 0; + + /* wait for the JS_COMMAND_NEXT register to reach the given status value */ + do { + if (!kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT))) + return true; + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < max_timeout); + + dev_err(kbdev->dev, "Timeout in waiting for job slot %u to become free for ctx %d_%u", js, + kctx->tgid, kctx->id); + + return false; +} + +int kbase_job_hw_submit(struct kbase_device *kbdev, struct kbase_jd_atom *katom, unsigned int js) { struct kbase_context *kctx; u32 cfg; @@ -202,13 +220,12 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, struct slot_rb *ptr_slot_rb = &kbdev->hwaccess.backend.slot_rb[js]; lockdep_assert_held(&kbdev->hwaccess_lock); - KBASE_DEBUG_ASSERT(kbdev); - KBASE_DEBUG_ASSERT(katom); kctx = katom->kctx; /* Command register must be available */ - KBASE_DEBUG_ASSERT(kbasep_jm_is_js_free(kbdev, js, kctx)); + if (!kbasep_jm_wait_js_free(kbdev, js, kctx)) + return -EPERM; dev_dbg(kctx->kbdev->dev, "Write JS_HEAD_NEXT 0x%llx 
for atom %pK\n", jc_head, (void *)katom); @@ -226,36 +243,47 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, */ cfg = kctx->as_nr; - if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION) && - !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) - cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; + if(!kbase_jd_katom_is_protected(katom)) { + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION) && + !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) + cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION; + + if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_START)) { + /* Force a cache maintenance operation if the newly submitted + * katom to the slot is from a different kctx. For a JM GPU + * that has the feature BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, + * applies a FLUSH_INV_SHADER_OTHER. Otherwise, do a + * FLUSH_CLEAN_INVALIDATE. + */ + u64 tagged_kctx = ptr_slot_rb->last_kctx_tagged; + + if (tagged_kctx != SLOT_RB_NULL_TAG_VAL && + tagged_kctx != SLOT_RB_TAG_KCTX(kctx)) { + if (kbase_hw_has_feature(kbdev, + BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER)) + cfg |= JS_CONFIG_START_FLUSH_INV_SHADER_OTHER; + else + cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; + } else + cfg |= JS_CONFIG_START_FLUSH_NO_ACTION; + } else + cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_START)) { - /* Force a cache maintenance operation if the newly submitted - * katom to the slot is from a different kctx. For a JM GPU - * that has the feature BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, - * applies a FLUSH_INV_SHADER_OTHER. Otherwise, do a - * FLUSH_CLEAN_INVALIDATE. + if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_END) && + !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) + cfg |= JS_CONFIG_END_FLUSH_NO_ACTION; + else if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CLEAN_ONLY_SAFE)) + cfg |= JS_CONFIG_END_FLUSH_CLEAN; + else + cfg |= JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + } else { + /* Force cache flush on job chain start/end if katom is protected. + * Valhall JM GPUs have BASE_HW_FEATURE_CLEAN_ONLY_SAFE feature, + * so DDK set JS_CONFIG_END_FLUSH_CLEAN config */ - u64 tagged_kctx = ptr_slot_rb->last_kctx_tagged; - - if (tagged_kctx != SLOT_RB_NULL_TAG_VAL && tagged_kctx != SLOT_RB_TAG_KCTX(kctx)) { - if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER)) - cfg |= JS_CONFIG_START_FLUSH_INV_SHADER_OTHER; - else - cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - } else - cfg |= JS_CONFIG_START_FLUSH_NO_ACTION; - } else cfg |= JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE; - - if (0 != (katom->core_req & BASE_JD_REQ_SKIP_CACHE_END) && - !(kbdev->serialize_jobs & KBASE_SERIALIZE_RESET)) - cfg |= JS_CONFIG_END_FLUSH_NO_ACTION; - else if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_CLEAN_ONLY_SAFE)) cfg |= JS_CONFIG_END_FLUSH_CLEAN; - else - cfg |= JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE; + } cfg |= JS_CONFIG_THREAD_PRI(8); @@ -281,7 +309,7 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, /* Write an approximate start timestamp. * It's approximate because there might be a job in the HEAD register. */ - katom->start_timestamp = ktime_get(); + katom->start_timestamp = ktime_get_raw(); /* GO ! 
*/ dev_dbg(kbdev->dev, "JS: Submitting atom %pK from ctx %pK to js[%d] with head=0x%llx", @@ -329,6 +357,8 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT), JS_COMMAND_START); + + return 0; } /** @@ -344,10 +374,8 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, * work out the best estimate (which might still result in an over-estimate to * the calculated time spent) */ -static void kbasep_job_slot_update_head_start_timestamp( - struct kbase_device *kbdev, - int js, - ktime_t end_timestamp) +static void kbasep_job_slot_update_head_start_timestamp(struct kbase_device *kbdev, unsigned int js, + ktime_t end_timestamp) { ktime_t timestamp_diff; struct kbase_jd_atom *katom; @@ -377,8 +405,7 @@ static void kbasep_job_slot_update_head_start_timestamp( * Make a tracepoint call to the instrumentation module informing that * softstop happened on given lpu (job slot). */ -static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, - int js) +static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, unsigned int js) { KBASE_TLSTREAM_TL_EVENT_LPU_SOFTSTOP( kbdev, @@ -387,19 +414,17 @@ static void kbasep_trace_tl_event_lpu_softstop(struct kbase_device *kbdev, void kbase_job_done(struct kbase_device *kbdev, u32 done) { - int i; u32 count = 0; ktime_t end_timestamp; lockdep_assert_held(&kbdev->hwaccess_lock); - KBASE_DEBUG_ASSERT(kbdev); - KBASE_KTRACE_ADD_JM(kbdev, JM_IRQ, NULL, NULL, 0, done); - end_timestamp = ktime_get(); + end_timestamp = ktime_get_raw(); while (done) { + unsigned int i; u32 failed = done >> 16; /* treat failed slots as finished slots */ @@ -409,7 +434,6 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) * numbered interrupts before the higher numbered ones. */ i = ffs(finished) - 1; - KBASE_DEBUG_ASSERT(i >= 0); do { int nr_done; @@ -561,7 +585,7 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) count += nr_done; while (nr_done) { - if (nr_done == 1) { + if (likely(nr_done == 1)) { kbase_gpu_complete_hw(kbdev, i, completion_code, job_tail, @@ -580,6 +604,14 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) BASE_JD_EVENT_DONE, 0, &end_timestamp); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /* Increment the end timestamp value by 1 ns to + * avoid having the same value for 'start_time_ns' + * and 'end_time_ns' for the 2nd atom whose job + * completion IRQ got merged with the 1st atom. 
+ */ + end_timestamp = ktime_add(end_timestamp, ns_to_ktime(1)); +#endif } nr_done--; } @@ -590,7 +622,7 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) failed = done >> 16; finished = (done & 0xFFFF) | failed; if (done) - end_timestamp = ktime_get(); + end_timestamp = ktime_get_raw(); } while (finished & (1 << i)); kbasep_job_slot_update_head_start_timestamp(kbdev, i, @@ -608,18 +640,16 @@ void kbase_job_done(struct kbase_device *kbdev, u32 done) KBASE_KTRACE_ADD_JM(kbdev, JM_IRQ_END, NULL, NULL, 0, count); } -void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, - int js, - u32 action, - base_jd_core_req core_reqs, - struct kbase_jd_atom *target_katom) +void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, unsigned int js, + u32 action, base_jd_core_req core_reqs, + struct kbase_jd_atom *target_katom) { #if KBASE_KTRACE_ENABLE u32 status_reg_before; u64 job_in_head_before; u32 status_reg_after; - KBASE_DEBUG_ASSERT(!(action & (~JS_COMMAND_MASK))); + WARN_ON(action & (~JS_COMMAND_MASK)); /* Check the head pointer */ job_in_head_before = ((u64) kbase_reg_read(kbdev, @@ -670,6 +700,10 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, struct kbase_context *head_kctx; head = kbase_gpu_inspect(kbdev, js, 0); + if (unlikely(!head)) { + dev_err(kbdev->dev, "Can't get a katom from js(%d)\n", js); + return; + } head_kctx = head->kctx; if (status_reg_before == BASE_JD_EVENT_ACTIVE) @@ -697,7 +731,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, KBASE_KTRACE_ADD_JM_SLOT(kbdev, JM_HARDSTOP_1, head_kctx, head, head->jc, js); break; default: - BUG(); + WARN(1, "Unknown action %d on atom %pK in kctx %pK\n", action, + (void *)target_katom, (void *)target_katom->kctx); break; } } else { @@ -726,7 +761,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, KBASE_KTRACE_ADD_JM_SLOT(kbdev, JM_HARDSTOP_1, NULL, NULL, 0, js); break; default: - BUG(); + WARN(1, "Unknown action %d on atom %pK in kctx %pK\n", action, + (void *)target_katom, (void *)target_katom->kctx); break; } } @@ -736,7 +772,7 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, void kbase_backend_jm_kill_running_jobs_from_kctx(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; - int i; + unsigned int i; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -748,13 +784,11 @@ void kbase_job_slot_ctx_priority_check_locked(struct kbase_context *kctx, struct kbase_jd_atom *target_katom) { struct kbase_device *kbdev; - int target_js = target_katom->slot_nr; + unsigned int target_js = target_katom->slot_nr; int i; bool stop_sent = false; - KBASE_DEBUG_ASSERT(kctx != NULL); kbdev = kctx->kbdev; - KBASE_DEBUG_ASSERT(kbdev != NULL); lockdep_assert_held(&kbdev->hwaccess_lock); @@ -884,11 +918,11 @@ u32 kbase_backend_get_current_flush_id(struct kbase_device *kbdev) u32 flush_id = 0; if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_FLUSH_REDUCTION)) { - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); if (kbdev->pm.backend.gpu_powered) flush_id = kbase_reg_read(kbdev, GPU_CONTROL_REG(LATEST_FLUSH)); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } return flush_id; @@ -928,13 +962,17 @@ KBASE_EXPORT_TEST_API(kbase_job_slot_term); * * Where possible any job in the next register is evicted before the soft-stop. 
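The kbase_job_done() changes above move the completion timestamps to the raw monotonic clock and, when CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD is enabled, nudge the shared end timestamp forward by 1 ns for the second atom whose completion IRQ was merged with the first, so the two reported work periods never have identical start and end times. The sketch below shows only that disambiguation step; the job structure and clock plumbing are simplified stand-ins.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct job { uint64_t start_ns, end_ns; };

static uint64_t now_raw_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    struct job jobs[2] = { { now_raw_ns(), 0 }, { now_raw_ns(), 0 } };
    uint64_t end_timestamp = now_raw_ns();   /* one IRQ covered both completions */

    for (int i = 0; i < 2; i++) {
        jobs[i].end_ns = end_timestamp;
        /* Bump by 1 ns so the next merged completion gets a distinct interval. */
        end_timestamp += 1;
    }

    printf("job0 end %llu, job1 end %llu\n",
           (unsigned long long)jobs[0].end_ns, (unsigned long long)jobs[1].end_ns);
    return 0;
}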
*/ -void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, int js, - struct kbase_jd_atom *target_katom, u32 sw_flags) +void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *target_katom, u32 sw_flags) { dev_dbg(kbdev->dev, "Soft-stop atom %pK with flags 0x%x (s:%d)\n", target_katom, sw_flags, js); - KBASE_DEBUG_ASSERT(!(sw_flags & JS_COMMAND_MASK)); + if (sw_flags & JS_COMMAND_MASK) { + WARN(true, "Atom %pK in kctx %pK received non-NOP flags %d\n", (void *)target_katom, + target_katom ? (void *)target_katom->kctx : NULL, sw_flags); + sw_flags &= ~((u32)JS_COMMAND_MASK); + } kbase_backend_soft_hard_stop_slot(kbdev, NULL, js, target_katom, JS_COMMAND_SOFT_STOP | sw_flags); } @@ -945,8 +983,8 @@ void kbase_job_slot_softstop(struct kbase_device *kbdev, int js, kbase_job_slot_softstop_swflags(kbdev, js, target_katom, 0u); } -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom) +void kbase_job_slot_hardstop(struct kbase_context *kctx, unsigned int js, + struct kbase_jd_atom *target_katom) { struct kbase_device *kbdev = kctx->kbdev; @@ -1031,12 +1069,12 @@ static void kbase_debug_dump_registers(struct kbase_device *kbdev) i, kbase_reg_read(kbdev, JOB_SLOT_REG(i, JS_HEAD_LO))); } dev_err(kbdev->dev, " MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_RAWSTAT)), + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS))); dev_err(kbdev->dev, " GPU_IRQ_MASK=0x%08x JOB_IRQ_MASK=0x%08x MMU_IRQ_MASK=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)), kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK))); + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK))); dev_err(kbdev->dev, " PWR_OVERRIDE0=0x%08x PWR_OVERRIDE1=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE0)), kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1))); @@ -1052,17 +1090,14 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) { unsigned long flags; struct kbase_device *kbdev; - ktime_t end_timestamp = ktime_get(); + ktime_t end_timestamp = ktime_get_raw(); struct kbasep_js_device_data *js_devdata; bool silent = false; u32 max_loops = KBASE_CLEAN_CACHE_MAX_LOOPS; - KBASE_DEBUG_ASSERT(data); - kbdev = container_of(data, struct kbase_device, hwaccess.backend.reset_work); - KBASE_DEBUG_ASSERT(kbdev); js_devdata = &kbdev->js_data; if (atomic_read(&kbdev->hwaccess.backend.reset_gpu) == @@ -1097,7 +1132,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) return; } - KBASE_DEBUG_ASSERT(kbdev->irq_reset_flush == false); + WARN(kbdev->irq_reset_flush, "%s: GPU reset already in flight\n", __func__); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); spin_lock(&kbdev->mmu_mask_change); @@ -1136,9 +1171,10 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) WARN(!max_loops, "L2 power transition timed out while trying to reset\n"); } - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); /* We hold the pm lock, so there ought to be a current policy */ - KBASE_DEBUG_ASSERT(kbdev->pm.backend.pm_current_policy); + if (unlikely(!kbdev->pm.backend.pm_current_policy)) + dev_warn(kbdev->dev, "No power policy set!"); /* All slot have been soft-stopped and we've waited * SOFT_STOP_RESET_TIMEOUT for the slots to clear, at this point we @@ -1174,7 +1210,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) /* 
Reset the GPU */ kbase_pm_init_hw(kbdev, 0); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); mutex_lock(&js_devdata->runpool_mutex); @@ -1190,7 +1226,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) mutex_unlock(&js_devdata->runpool_mutex); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_reset_complete(kbdev); @@ -1202,7 +1238,7 @@ static void kbasep_reset_timeout_worker(struct work_struct *data) */ kbase_pm_wait_for_desired_state(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_NOT_PENDING); @@ -1235,8 +1271,6 @@ static enum hrtimer_restart kbasep_reset_timer_callback(struct hrtimer *timer) struct kbase_device *kbdev = container_of(timer, struct kbase_device, hwaccess.backend.reset_timer); - KBASE_DEBUG_ASSERT(kbdev); - /* Reset still pending? */ if (atomic_cmpxchg(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED, KBASE_RESET_GPU_HAPPENING) == @@ -1254,11 +1288,9 @@ static enum hrtimer_restart kbasep_reset_timer_callback(struct hrtimer *timer) static void kbasep_try_reset_gpu_early_locked(struct kbase_device *kbdev) { - int i; + unsigned int i; int pending_jobs = 0; - KBASE_DEBUG_ASSERT(kbdev); - /* Count the number of jobs */ for (i = 0; i < kbdev->gpu_props.num_job_slots; i++) pending_jobs += kbase_backend_nr_atoms_submitted(kbdev, i); @@ -1316,8 +1348,6 @@ bool kbase_prepare_to_reset_gpu_locked(struct kbase_device *kbdev, { int i; - KBASE_DEBUG_ASSERT(kbdev); - #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbase_pm_is_gpu_lost(kbdev)) { /* GPU access has been removed, reset will be done by @@ -1371,13 +1401,11 @@ KBASE_EXPORT_TEST_API(kbase_prepare_to_reset_gpu); */ void kbase_reset_gpu(struct kbase_device *kbdev) { - KBASE_DEBUG_ASSERT(kbdev); - /* Note this is an assert/atomic_set because it is a software issue for * a race to be occurring here */ - KBASE_DEBUG_ASSERT(atomic_read(&kbdev->hwaccess.backend.reset_gpu) == - KBASE_RESET_GPU_PREPARED); + if (WARN_ON(atomic_read(&kbdev->hwaccess.backend.reset_gpu) != KBASE_RESET_GPU_PREPARED)) + return; atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED); @@ -1395,13 +1423,11 @@ KBASE_EXPORT_TEST_API(kbase_reset_gpu); void kbase_reset_gpu_locked(struct kbase_device *kbdev) { - KBASE_DEBUG_ASSERT(kbdev); - /* Note this is an assert/atomic_set because it is a software issue for * a race to be occurring here */ - KBASE_DEBUG_ASSERT(atomic_read(&kbdev->hwaccess.backend.reset_gpu) == - KBASE_RESET_GPU_PREPARED); + if (WARN_ON(atomic_read(&kbdev->hwaccess.backend.reset_gpu) != KBASE_RESET_GPU_PREPARED)) + return; atomic_set(&kbdev->hwaccess.backend.reset_gpu, KBASE_RESET_GPU_COMMITTED); @@ -1442,6 +1468,11 @@ bool kbase_reset_gpu_is_active(struct kbase_device *kbdev) return true; } +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->hwaccess.backend.reset_gpu) == KBASE_RESET_GPU_NOT_PENDING; +} + int kbase_reset_gpu_wait(struct kbase_device *kbdev) { wait_event(kbdev->hwaccess.backend.reset_wait, diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_internal.h b/mali_kbase/backend/gpu/mali_kbase_jm_internal.h index 1039e85..380a530 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_internal.h +++ b/mali_kbase/backend/gpu/mali_kbase_jm_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2016, 2018-2021 ARM Limited. All rights reserved. 
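The reset path above also drops KBASE_DEBUG_ASSERT in favour of WARN_ON plus an early return: kbase_reset_gpu() only advances the reset state machine when it is actually in PREPARED, and kbasep_reset_timer_callback() uses atomic_cmpxchg() so that exactly one path claims the COMMITTED-to-HAPPENING transition. A C11-atomics sketch of that claim-the-transition idiom follows; the state names mirror the driver, but the rest (including expressing the PREPARED check as a compare-exchange) is illustrative.

#include <stdatomic.h>
#include <stdio.h>

enum reset_state { RESET_NOT_PENDING, RESET_PREPARED, RESET_COMMITTED, RESET_HAPPENING };

static _Atomic int reset_gpu = RESET_NOT_PENDING;

/* Only the caller that successfully swaps COMMITTED -> HAPPENING runs the reset. */
static int try_claim_reset(void)
{
    int expected = RESET_COMMITTED;
    return atomic_compare_exchange_strong(&reset_gpu, &expected, RESET_HAPPENING);
}

int main(void)
{
    atomic_store(&reset_gpu, RESET_PREPARED);

    /* kbase_reset_gpu(): warn and bail out unless a reset was prepared first
     * (the driver checks with WARN_ON + atomic_set; a compare-exchange expresses
     * the same intent in this sketch). */
    int expected = RESET_PREPARED;
    if (!atomic_compare_exchange_strong(&reset_gpu, &expected, RESET_COMMITTED)) {
        fprintf(stderr, "reset requested without prepare\n");
        return 1;
    }

    printf("first claim: %d, second claim: %d\n", try_claim_reset(), try_claim_reset());
    return 0;
}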
+ * (C) COPYRIGHT 2011-2016, 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,21 +34,6 @@ #include <device/mali_kbase_device.h> /** - * kbase_job_submit_nolock() - Submit a job to a certain job-slot - * @kbdev: Device pointer - * @katom: Atom to submit - * @js: Job slot to submit on - * - * The caller must check kbasep_jm_is_submit_slots_free() != false before - * calling this. - * - * The following locking conditions are made on the caller: - * - it must hold the hwaccess_lock - */ -void kbase_job_submit_nolock(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, int js); - -/** * kbase_job_done_slot() - Complete the head job on a particular job-slot * @kbdev: Device pointer * @s: Job slot @@ -60,23 +45,13 @@ void kbase_job_done_slot(struct kbase_device *kbdev, int s, u32 completion_code, u64 job_tail, ktime_t *end_timestamp); #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) -static inline char *kbasep_make_job_slot_string(int js, char *js_string, - size_t js_size) +static inline char *kbasep_make_job_slot_string(unsigned int js, char *js_string, size_t js_size) { - snprintf(js_string, js_size, "job_slot_%i", js); + (void)scnprintf(js_string, js_size, "job_slot_%u", js); return js_string; } #endif -#if !MALI_USE_CSF -static inline int kbasep_jm_is_js_free(struct kbase_device *kbdev, int js, - struct kbase_context *kctx) -{ - return !kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT)); -} -#endif - - /** * kbase_job_hw_submit() - Submit a job to the GPU * @kbdev: Device pointer @@ -88,10 +63,10 @@ static inline int kbasep_jm_is_js_free(struct kbase_device *kbdev, int js, * * The following locking conditions are made on the caller: * - it must hold the hwaccess_lock + * + * Return: 0 if the job was successfully submitted to hardware, an error otherwise. 
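kbase_job_hw_submit() now returns an int because submission is guarded by kbasep_jm_wait_js_free(), which spins until the slot's JS_COMMAND_NEXT register reads zero or the js_free_wait_time_ms budget elapses, instead of asserting that the slot is already free; on timeout the submit path returns -EPERM. Below is a self-contained sketch of that elapsed-time-bounded wait; the register read, the timeout value and the chosen error code are illustrative assumptions.

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Stand-in for reading JS_COMMAND_NEXT: non-zero means the slot is still busy. */
static uint32_t read_js_command_next(void)
{
    static int busy_reads = 3;
    return busy_reads-- > 0 ? 1u : 0u;
}

/* Wait for the slot to drain, but never longer than the configured budget. */
static bool wait_js_free(int64_t max_timeout_ms)
{
    const int64_t start = now_ms();

    do {
        if (!read_js_command_next())
            return true;
    } while (now_ms() - start < max_timeout_ms);

    fprintf(stderr, "Timeout waiting for job slot to become free\n");
    return false;
}

static int job_hw_submit(void)
{
    if (!wait_js_free(100))
        return -EPERM;       /* caller keeps the atom and releases any held resources */
    /* ... write affinity, config and JS_COMMAND_START here ... */
    return 0;
}

int main(void)
{
    printf("job_hw_submit() = %d\n", job_hw_submit());
    return 0;
}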
*/ -void kbase_job_hw_submit(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - int js); +int kbase_job_hw_submit(struct kbase_device *kbdev, struct kbase_jd_atom *katom, unsigned int js); #if !MALI_USE_CSF /** @@ -107,11 +82,9 @@ void kbase_job_hw_submit(struct kbase_device *kbdev, * The following locking conditions are made on the caller: * - it must hold the hwaccess_lock */ -void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, - int js, - u32 action, - base_jd_core_req core_reqs, - struct kbase_jd_atom *target_katom); +void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, unsigned int js, + u32 action, base_jd_core_req core_reqs, + struct kbase_jd_atom *target_katom); #endif /* !MALI_USE_CSF */ /** @@ -135,11 +108,8 @@ void kbasep_job_slot_soft_or_hard_stop_do_action(struct kbase_device *kbdev, * * Return: true if an atom was stopped, false otherwise */ -bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js, - struct kbase_jd_atom *katom, - u32 action); +bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js, struct kbase_jd_atom *katom, u32 action); /** * kbase_job_slot_init - Initialise job slot framework diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_rb.c b/mali_kbase/backend/gpu/mali_kbase_jm_rb.c index eaa3640..66f068a 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_rb.c +++ b/mali_kbase/backend/gpu/mali_kbase_jm_rb.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,9 +29,12 @@ #include <mali_kbase_jm.h> #include <mali_kbase_js.h> #include <tl/mali_kbase_tracepoints.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_reset_gpu.h> #include <mali_kbase_kinstr_jm.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif #include <backend/gpu/mali_kbase_cache_policy_backend.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_jm_internal.h> @@ -93,9 +96,8 @@ static void kbase_gpu_enqueue_atom(struct kbase_device *kbdev, * * Return: Atom removed from ringbuffer */ -static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, - int js, - ktime_t *end_timestamp) +static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, unsigned int js, + ktime_t *end_timestamp) { struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; struct kbase_jd_atom *katom; @@ -118,8 +120,7 @@ static struct kbase_jd_atom *kbase_gpu_dequeue_atom(struct kbase_device *kbdev, return katom; } -struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, - int idx) +struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, unsigned int js, int idx) { struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; @@ -131,8 +132,7 @@ struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, return rb->entries[(rb->read_idx + idx) & SLOT_RB_MASK].katom; } -struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, - int js) +struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, unsigned int js) { 
struct slot_rb *rb = &kbdev->hwaccess.backend.slot_rb[js]; @@ -144,12 +144,13 @@ struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, bool kbase_gpu_atoms_submitted_any(struct kbase_device *kbdev) { - int js; - int i; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { + int i; + for (i = 0; i < SLOT_RB_SIZE; i++) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, i); @@ -160,7 +161,7 @@ bool kbase_gpu_atoms_submitted_any(struct kbase_device *kbdev) return false; } -int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js) +int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, unsigned int js) { int nr = 0; int i; @@ -178,7 +179,7 @@ int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js) return nr; } -int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js) +int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, unsigned int js) { int nr = 0; int i; @@ -193,8 +194,8 @@ int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js) return nr; } -static int kbase_gpu_nr_atoms_on_slot_min(struct kbase_device *kbdev, int js, - enum kbase_atom_gpu_rb_state min_rb_state) +static int kbase_gpu_nr_atoms_on_slot_min(struct kbase_device *kbdev, unsigned int js, + enum kbase_atom_gpu_rb_state min_rb_state) { int nr = 0; int i; @@ -244,9 +245,11 @@ static bool check_secure_atom(struct kbase_jd_atom *katom, bool secure) static bool kbase_gpu_check_secure_atoms(struct kbase_device *kbdev, bool secure) { - int js, i; + unsigned int js; for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { + int i; + for (i = 0; i < SLOT_RB_SIZE; i++) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, i); @@ -261,7 +264,7 @@ static bool kbase_gpu_check_secure_atoms(struct kbase_device *kbdev, return false; } -int kbase_backend_slot_free(struct kbase_device *kbdev, int js) +int kbase_backend_slot_free(struct kbase_device *kbdev, unsigned int js) { lockdep_assert_held(&kbdev->hwaccess_lock); @@ -274,6 +277,59 @@ int kbase_backend_slot_free(struct kbase_device *kbdev, int js) return SLOT_RB_SIZE - kbase_backend_nr_atoms_on_slot(kbdev, js); } +/** + * trace_atom_completion_for_gpu_metrics - Report the completion of atom for the + * purpose of emitting power/gpu_work_period + * tracepoint. + * + * @katom: Pointer to the atom that completed execution on GPU. + * @end_timestamp: Pointer to the timestamp of atom completion. May be NULL, in + * which case current time will be used. + * + * The function would also report the start for an atom that was in the HEAD_NEXT + * register. + * + * Note: Caller must hold the HW access lock. + */ +static inline void trace_atom_completion_for_gpu_metrics( + struct kbase_jd_atom *const katom, + ktime_t *end_timestamp) +{ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + u64 complete_ns; + struct kbase_context *kctx = katom->kctx; + struct kbase_jd_atom *queued = + kbase_gpu_inspect(kctx->kbdev, katom->slot_nr, 1); + +#ifdef CONFIG_MALI_DEBUG + WARN_ON(!kbase_gpu_inspect(kctx->kbdev, katom->slot_nr, 0)); +#endif + + lockdep_assert_held(&kctx->kbdev->hwaccess_lock); + + if (unlikely(queued == katom)) + return; + + /* A protected atom and a non-protected atom cannot be in the RB_SUBMITTED + * state at the same time in the job slot ringbuffer. 
Atom submission state + * machine prevents the submission of a non-protected atom until all + * protected atoms have completed and GPU has exited the protected mode. + * This implies that if the queued atom is in RB_SUBMITTED state, it shall + * be a protected atom and so we can return early. + */ + if (unlikely(kbase_jd_katom_is_protected(katom))) + return; + + if (likely(end_timestamp)) + complete_ns = ktime_to_ns(*end_timestamp); + else + complete_ns = ktime_get_raw_ns(); + + kbase_gpu_metrics_ctx_end_activity(kctx, complete_ns); + if (queued && queued->gpu_rb_state == KBASE_ATOM_GPU_RB_SUBMITTED) + kbase_gpu_metrics_ctx_start_activity(queued->kctx, complete_ns); +#endif +} static void kbase_gpu_release_atom(struct kbase_device *kbdev, struct kbase_jd_atom *katom, @@ -290,6 +346,7 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, break; case KBASE_ATOM_GPU_RB_SUBMITTED: + trace_atom_completion_for_gpu_metrics(katom, end_timestamp); kbase_kinstr_jm_atom_hw_release(katom); /* Inform power management at start/finish of atom so it can * update its GPU utilisation metrics. Mark atom as not @@ -298,8 +355,7 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; kbase_pm_metrics_update(kbdev, end_timestamp); - /* Inform platform at start/finish of atom */ - kbasep_platform_event_atom_complete(katom); + kbasep_platform_event_work_end(katom); if (katom->core_req & BASE_JD_REQ_PERMON) kbase_pm_release_gpu_cycle_counter_nolock(kbdev); @@ -347,16 +403,35 @@ static void kbase_gpu_release_atom(struct kbase_device *kbdev, katom->protected_state.exit != KBASE_ATOM_EXIT_PROTECTED_CHECK) kbdev->protected_mode_transition = false; + + /* If the atom is at KBASE_ATOM_ENTER_PROTECTED_HWCNT state, it means + * one of two events prevented it from progressing to the next state and + * ultimately reach protected mode: + * - hwcnts were enabled, and the atom had to schedule a worker to + * disable them. + * - the hwcnts were already disabled, but some other error occurred. + * In the first case, if the worker has not yet completed + * (kbdev->protected_mode_hwcnt_disabled == false), we need to re-enable + * them and signal to the worker they have already been enabled + */ + if (kbase_jd_katom_is_protected(katom) && + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_HWCNT)) { + kbdev->protected_mode_hwcnt_desired = true; + if (kbdev->protected_mode_hwcnt_disabled) { + kbase_hwcnt_context_enable(kbdev->hwcnt_gpu_ctx); + kbdev->protected_mode_hwcnt_disabled = false; + } + } + /* If the atom has suspended hwcnt but has not yet entered * protected mode, then resume hwcnt now. If the GPU is now in * protected mode then hwcnt will be resumed by GPU reset so * don't resume it here. 
*/ if (kbase_jd_katom_is_protected(katom) && - ((katom->protected_state.enter == - KBASE_ATOM_ENTER_PROTECTED_IDLE_L2) || - (katom->protected_state.enter == - KBASE_ATOM_ENTER_PROTECTED_SET_COHERENCY))) { + ((katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_IDLE_L2) || + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_SET_COHERENCY) || + (katom->protected_state.enter == KBASE_ATOM_ENTER_PROTECTED_FINISHED))) { WARN_ON(!kbdev->protected_mode_hwcnt_disabled); kbdev->protected_mode_hwcnt_desired = true; if (kbdev->protected_mode_hwcnt_disabled) { @@ -411,9 +486,9 @@ static void kbase_gpu_mark_atom_for_return(struct kbase_device *kbdev, * * Return: true if any slots other than @js are busy, false otherwise */ -static inline bool other_slots_busy(struct kbase_device *kbdev, int js) +static inline bool other_slots_busy(struct kbase_device *kbdev, unsigned int js) { - int slot; + unsigned int slot; for (slot = 0; slot < kbdev->gpu_props.num_job_slots; slot++) { if (slot == js) @@ -507,17 +582,14 @@ static int kbase_jm_protected_entry(struct kbase_device *kbdev, KBASE_TLSTREAM_AUX_PROTECTED_ENTER_END(kbdev, kbdev); if (err) { /* - * Failed to switch into protected mode, resume - * GPU hwcnt and fail atom. + * Failed to switch into protected mode. + * + * At this point we expect: + * katom->gpu_rb_state = KBASE_ATOM_GPU_RB_WAITING_PROTECTED_MODE_TRANSITION && + * katom->protected_state.enter = KBASE_ATOM_ENTER_PROTECTED_FINISHED + * ==> + * kbdev->protected_mode_hwcnt_disabled = false */ - WARN_ON(!kbdev->protected_mode_hwcnt_disabled); - kbdev->protected_mode_hwcnt_desired = true; - if (kbdev->protected_mode_hwcnt_disabled) { - kbase_hwcnt_context_enable( - kbdev->hwcnt_gpu_ctx); - kbdev->protected_mode_hwcnt_disabled = false; - } - katom[idx]->event_code = BASE_JD_EVENT_JOB_INVALID; kbase_gpu_mark_atom_for_return(kbdev, katom[idx]); /* @@ -537,12 +609,9 @@ static int kbase_jm_protected_entry(struct kbase_device *kbdev, /* * Protected mode sanity checks. 
*/ - KBASE_DEBUG_ASSERT_MSG( - kbase_jd_katom_is_protected(katom[idx]) == - kbase_gpu_in_protected_mode(kbdev), - "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", - kbase_jd_katom_is_protected(katom[idx]), - kbase_gpu_in_protected_mode(kbdev)); + WARN(kbase_jd_katom_is_protected(katom[idx]) != kbase_gpu_in_protected_mode(kbdev), + "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", + kbase_jd_katom_is_protected(katom[idx]), kbase_gpu_in_protected_mode(kbdev)); katom[idx]->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; @@ -831,7 +900,7 @@ static int kbase_jm_exit_protected_mode(struct kbase_device *kbdev, void kbase_backend_slot_update(struct kbase_device *kbdev) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -853,6 +922,9 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) for (idx = 0; idx < SLOT_RB_SIZE; idx++) { bool cores_ready; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + bool trace_atom_submit_for_gpu_metrics = true; +#endif int ret; if (!katom[idx]) @@ -952,18 +1024,6 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) cores_ready = kbase_pm_cores_requested(kbdev, true); - if (katom[idx]->event_code == - BASE_JD_EVENT_PM_EVENT) { - KBASE_KTRACE_ADD_JM_SLOT_INFO( - kbdev, JM_MARK_FOR_RETURN_TO_JS, - katom[idx]->kctx, katom[idx], - katom[idx]->jc, js, - katom[idx]->event_code); - katom[idx]->gpu_rb_state = - KBASE_ATOM_GPU_RB_RETURN_TO_JS; - break; - } - if (!cores_ready) break; @@ -975,12 +1035,21 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) case KBASE_ATOM_GPU_RB_READY: if (idx == 1) { + enum kbase_atom_gpu_rb_state atom_0_gpu_rb_state = + katom[0]->gpu_rb_state; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + trace_atom_submit_for_gpu_metrics = + (atom_0_gpu_rb_state == + KBASE_ATOM_GPU_RB_NOT_IN_SLOT_RB); +#endif + /* Only submit if head atom or previous * atom already submitted */ - if ((katom[0]->gpu_rb_state != + if ((atom_0_gpu_rb_state != KBASE_ATOM_GPU_RB_SUBMITTED && - katom[0]->gpu_rb_state != + atom_0_gpu_rb_state != KBASE_ATOM_GPU_RB_NOT_IN_SLOT_RB)) break; @@ -1000,36 +1069,42 @@ void kbase_backend_slot_update(struct kbase_device *kbdev) other_slots_busy(kbdev, js)) break; -#ifdef CONFIG_MALI_GEM5_BUILD - if (!kbasep_jm_is_js_free(kbdev, js, - katom[idx]->kctx)) - break; -#endif /* Check if this job needs the cycle counter * enabled before submission */ if (katom[idx]->core_req & BASE_JD_REQ_PERMON) - kbase_pm_request_gpu_cycle_counter_l2_is_on( - kbdev); + kbase_pm_request_gpu_cycle_counter_l2_is_on(kbdev); - kbase_job_hw_submit(kbdev, katom[idx], js); - katom[idx]->gpu_rb_state = - KBASE_ATOM_GPU_RB_SUBMITTED; + if (!kbase_job_hw_submit(kbdev, katom[idx], js)) { + katom[idx]->gpu_rb_state = KBASE_ATOM_GPU_RB_SUBMITTED; + + /* Inform power management at start/finish of + * atom so it can update its GPU utilisation + * metrics. 
+ */ + kbase_pm_metrics_update(kbdev, + &katom[idx]->start_timestamp); + + /* Inform platform at start/finish of atom */ + + kbasep_platform_event_work_begin(katom[idx]); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (likely(trace_atom_submit_for_gpu_metrics && + !kbase_jd_katom_is_protected(katom[idx]))) + kbase_gpu_metrics_ctx_start_activity( + katom[idx]->kctx, + ktime_to_ns(katom[idx]->start_timestamp)); +#endif + } else { + if (katom[idx]->core_req & BASE_JD_REQ_PERMON) + kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + + break; + } /* ***TRANSITION TO HIGHER STATE*** */ fallthrough; case KBASE_ATOM_GPU_RB_SUBMITTED: - - /* Inform power management at start/finish of - * atom so it can update its GPU utilisation - * metrics. - */ - kbase_pm_metrics_update(kbdev, - &katom[idx]->start_timestamp); - - /* Inform platform at start/finish of atom */ - kbasep_platform_event_atom_submit(katom[idx]); - break; case KBASE_ATOM_GPU_RB_RETURN_TO_JS: @@ -1081,6 +1156,25 @@ kbase_rb_atom_might_depend(const struct kbase_jd_atom *katom_a, KBASE_KATOM_FLAG_FAIL_BLOCKER))); } +static inline void kbase_gpu_remove_atom(struct kbase_device *kbdev, + struct kbase_jd_atom *katom, + u32 action, + bool disjoint) +{ + struct kbase_context *kctx = katom->kctx; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; + kbase_gpu_mark_atom_for_return(kbdev, katom); + kbase_jsctx_slot_prio_blocked_set(kctx, katom->slot_nr, + katom->sched_priority); + + if (disjoint) + kbase_job_check_enter_disjoint(kbdev, action, katom->core_req, + katom); +} + /** * kbase_gpu_irq_evict - evict a slot's JSn_HEAD_NEXT atom from the HW if it is * related to a failed JSn_HEAD atom @@ -1109,8 +1203,7 @@ kbase_rb_atom_might_depend(const struct kbase_jd_atom *katom_a, * * Return: true if an atom was evicted, false otherwise. 
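kbase_gpu_irq_evict() and kbase_gpu_complete_hw() below now treat an empty slot ring buffer as a recoverable condition: kbase_gpu_inspect() returns NULL when the requested index is past the occupied entries, and the callers log and bail out rather than dereferencing the result. A small stand-alone sketch of that bounded ring-buffer peek follows; the ring size, field names and stored element type are invented for the example.

#include <stdio.h>

#define SLOT_RB_SIZE 2
#define SLOT_RB_MASK (SLOT_RB_SIZE - 1)

struct slot_rb {
    const char *entries[SLOT_RB_SIZE];
    unsigned char read_idx;
    unsigned char write_idx;
};

/* Peek at the idx-th oldest entry, or NULL if the ring holds fewer entries. */
static const char *rb_inspect(const struct slot_rb *rb, int idx)
{
    if ((rb->read_idx + idx) >= rb->write_idx)
        return NULL;
    return rb->entries[(rb->read_idx + idx) & SLOT_RB_MASK];
}

int main(void)
{
    struct slot_rb rb = { { "atom0", NULL }, 0, 1 };   /* one atom submitted */
    const char *head = rb_inspect(&rb, 0);
    const char *next = rb_inspect(&rb, 1);

    if (!head) {
        /* The driver logs and returns early here rather than crashing. */
        fprintf(stderr, "Can't get a katom from the slot\n");
        return 1;
    }
    printf("head=%s next=%s\n", head, next ? next : "(empty)");
    return 0;
}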
*/ -bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, - u32 completion_code) +bool kbase_gpu_irq_evict(struct kbase_device *kbdev, unsigned int js, u32 completion_code) { struct kbase_jd_atom *katom; struct kbase_jd_atom *next_katom; @@ -1118,6 +1211,10 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, lockdep_assert_held(&kbdev->hwaccess_lock); katom = kbase_gpu_inspect(kbdev, js, 0); + if (!katom) { + dev_err(kbdev->dev, "Can't get a katom from js(%u)\n", js); + return false; + } next_katom = kbase_gpu_inspect(kbdev, js, 1); if (next_katom && @@ -1128,9 +1225,9 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, kbase_reg_read(kbdev, JOB_SLOT_REG(js, JS_HEAD_NEXT_HI)) != 0)) { kbase_reg_write(kbdev, JOB_SLOT_REG(js, JS_COMMAND_NEXT), JS_COMMAND_NOP); - next_katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; if (completion_code == BASE_JD_EVENT_STOPPED) { + kbase_gpu_remove_atom(kbdev, next_katom, JS_COMMAND_SOFT_STOP, false); KBASE_TLSTREAM_TL_NRET_ATOM_LPU(kbdev, next_katom, &kbdev->gpu_props.props.raw_props.js_features [next_katom->slot_nr]); @@ -1139,10 +1236,12 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, KBASE_TLSTREAM_TL_NRET_CTX_LPU(kbdev, next_katom->kctx, &kbdev->gpu_props.props.raw_props.js_features [next_katom->slot_nr]); - } + } else { + next_katom->gpu_rb_state = KBASE_ATOM_GPU_RB_READY; - if (next_katom->core_req & BASE_JD_REQ_PERMON) - kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + if (next_katom->core_req & BASE_JD_REQ_PERMON) + kbase_pm_release_gpu_cycle_counter_nolock(kbdev); + } /* On evicting the next_katom, the last submission kctx on the * given job slot then reverts back to the one that owns katom. @@ -1181,13 +1280,19 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, * otherwise we would be in the incorrect state of having an atom both running * on the HW and returned to the JS. 
*/ -void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, - u32 completion_code, - u64 job_tail, - ktime_t *end_timestamp) + +void kbase_gpu_complete_hw(struct kbase_device *kbdev, unsigned int js, u32 completion_code, + u64 job_tail, ktime_t *end_timestamp) { struct kbase_jd_atom *katom = kbase_gpu_inspect(kbdev, js, 0); - struct kbase_context *kctx = katom->kctx; + struct kbase_context *kctx = NULL; + + if (unlikely(!katom)) { + dev_err(kbdev->dev, "Can't get a katom from js(%d)\n", js); + return; + } + + kctx = katom->kctx; dev_dbg(kbdev->dev, "Atom %pK completed on hw with code 0x%x and job_tail 0x%llx (s:%d)\n", @@ -1240,7 +1345,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, } } else if (completion_code != BASE_JD_EVENT_DONE) { struct kbasep_js_device_data *js_devdata = &kbdev->js_data; - int i; + unsigned int i; if (!kbase_ctx_flag(katom->kctx, KCTX_DYING)) { dev_warn(kbdev->dev, "error detected from slot %d, job status 0x%08x (%s)", @@ -1348,11 +1453,9 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, } else { char js_string[16]; - trace_gpu_sched_switch(kbasep_make_job_slot_string(js, - js_string, - sizeof(js_string)), - ktime_to_ns(ktime_get()), 0, 0, - 0); + trace_gpu_sched_switch(kbasep_make_job_slot_string(js, js_string, + sizeof(js_string)), + ktime_to_ns(ktime_get_raw()), 0, 0, 0); } } #endif @@ -1387,7 +1490,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1408,14 +1511,14 @@ void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp) if (katom->protected_state.exit == KBASE_ATOM_EXIT_PROTECTED_RESET_WAIT) { /* protected mode sanity checks */ - KBASE_DEBUG_ASSERT_MSG( - kbase_jd_katom_is_protected(katom) == kbase_gpu_in_protected_mode(kbdev), - "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", - kbase_jd_katom_is_protected(katom), kbase_gpu_in_protected_mode(kbdev)); - KBASE_DEBUG_ASSERT_MSG( - (kbase_jd_katom_is_protected(katom) && js == 0) || - !kbase_jd_katom_is_protected(katom), - "Protected atom on JS%d not supported", js); + WARN(kbase_jd_katom_is_protected(katom) != + kbase_gpu_in_protected_mode(kbdev), + "Protected mode of atom (%d) doesn't match protected mode of GPU (%d)", + kbase_jd_katom_is_protected(katom), + kbase_gpu_in_protected_mode(kbdev)); + WARN(!(kbase_jd_katom_is_protected(katom) && js == 0) && + kbase_jd_katom_is_protected(katom), + "Protected atom on JS%u not supported", js); } if ((katom->gpu_rb_state < KBASE_ATOM_GPU_RB_SUBMITTED) && !kbase_ctx_flag(katom->kctx, KCTX_DYING)) @@ -1511,10 +1614,8 @@ static bool should_stop_next_atom(struct kbase_device *kbdev, return ret; } -static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, - int js, - struct kbase_jd_atom *katom, - u32 action) +static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *katom, u32 action) { struct kbase_context *kctx = katom->kctx; u32 hw_action = action & JS_COMMAND_MASK; @@ -1525,25 +1626,6 @@ static inline void kbase_gpu_stop_atom(struct kbase_device *kbdev, kbase_jsctx_slot_prio_blocked_set(kctx, js, katom->sched_priority); } -static inline void kbase_gpu_remove_atom(struct kbase_device *kbdev, - struct kbase_jd_atom *katom, - u32 action, - bool disjoint) -{ - struct kbase_context *kctx = katom->kctx; - - lockdep_assert_held(&kbdev->hwaccess_lock); - - 
katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; - kbase_gpu_mark_atom_for_return(kbdev, katom); - kbase_jsctx_slot_prio_blocked_set(kctx, katom->slot_nr, - katom->sched_priority); - - if (disjoint) - kbase_job_check_enter_disjoint(kbdev, action, katom->core_req, - katom); -} - static int should_stop_x_dep_slot(struct kbase_jd_atom *katom) { if (katom->x_post_dep) { @@ -1558,11 +1640,8 @@ static int should_stop_x_dep_slot(struct kbase_jd_atom *katom) return -1; } -bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js, - struct kbase_jd_atom *katom, - u32 action) +bool kbase_backend_soft_hard_stop_slot(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js, struct kbase_jd_atom *katom, u32 action) { struct kbase_jd_atom *katom_idx0; struct kbase_context *kctx_idx0 = NULL; @@ -1806,18 +1885,16 @@ void kbase_backend_complete_wq_post_sched(struct kbase_device *kbdev, base_jd_core_req core_req) { if (!kbdev->pm.active_count) { - mutex_lock(&kbdev->js_data.runpool_mutex); - mutex_lock(&kbdev->pm.lock); + kbase_pm_lock(kbdev); kbase_pm_update_active(kbdev); - mutex_unlock(&kbdev->pm.lock); - mutex_unlock(&kbdev->js_data.runpool_mutex); + kbase_pm_unlock(kbdev); } } void kbase_gpu_dump_slots(struct kbase_device *kbdev) { unsigned long flags; - int js; + unsigned int js; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -1832,12 +1909,10 @@ void kbase_gpu_dump_slots(struct kbase_device *kbdev) idx); if (katom) - dev_info(kbdev->dev, - " js%d idx%d : katom=%pK gpu_rb_state=%d\n", - js, idx, katom, katom->gpu_rb_state); + dev_info(kbdev->dev, " js%u idx%d : katom=%pK gpu_rb_state=%d\n", + js, idx, katom, katom->gpu_rb_state); else - dev_info(kbdev->dev, " js%d idx%d : empty\n", - js, idx); + dev_info(kbdev->dev, " js%u idx%d : empty\n", js, idx); } } @@ -1846,7 +1921,7 @@ void kbase_gpu_dump_slots(struct kbase_device *kbdev) void kbase_backend_slot_kctx_purge_locked(struct kbase_device *kbdev, struct kbase_context *kctx) { - int js; + unsigned int js; bool tracked = false; lockdep_assert_held(&kbdev->hwaccess_lock); diff --git a/mali_kbase/backend/gpu/mali_kbase_jm_rb.h b/mali_kbase/backend/gpu/mali_kbase_jm_rb.h index d3ff203..32be0bf 100644 --- a/mali_kbase/backend/gpu/mali_kbase_jm_rb.h +++ b/mali_kbase/backend/gpu/mali_kbase_jm_rb.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,8 +40,7 @@ * * Return: true if job evicted from NEXT registers, false otherwise */ -bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, - u32 completion_code); +bool kbase_gpu_irq_evict(struct kbase_device *kbdev, unsigned int js, u32 completion_code); /** * kbase_gpu_complete_hw - Complete an atom on job slot js @@ -53,10 +52,8 @@ bool kbase_gpu_irq_evict(struct kbase_device *kbdev, int js, * completed * @end_timestamp: Time of completion */ -void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, - u32 completion_code, - u64 job_tail, - ktime_t *end_timestamp); +void kbase_gpu_complete_hw(struct kbase_device *kbdev, unsigned int js, u32 completion_code, + u64 job_tail, ktime_t *end_timestamp); /** * kbase_gpu_inspect - Inspect the contents of the HW access ringbuffer @@ -68,8 +65,7 @@ void kbase_gpu_complete_hw(struct kbase_device *kbdev, int js, * Return: The atom at that position in the ringbuffer * or NULL if no atom present */ -struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, int js, - int idx); +struct kbase_jd_atom *kbase_gpu_inspect(struct kbase_device *kbdev, unsigned int js, int idx); /** * kbase_gpu_dump_slots - Print the contents of the slot ringbuffers diff --git a/mali_kbase/backend/gpu/mali_kbase_js_backend.c b/mali_kbase/backend/gpu/mali_kbase_js_backend.c index 02d7cdb..ff4e114 100644 --- a/mali_kbase/backend/gpu/mali_kbase_js_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_js_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,28 +28,18 @@ #include <mali_kbase_reset_gpu.h> #include <backend/gpu/mali_kbase_jm_internal.h> #include <backend/gpu/mali_kbase_js_internal.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> + +#endif -#if !MALI_USE_CSF /* * Hold the runpool_mutex for this */ -static inline bool timer_callback_should_run(struct kbase_device *kbdev) +static inline bool timer_callback_should_run(struct kbase_device *kbdev, int nr_running_ctxs) { - struct kbase_backend_data *backend = &kbdev->hwaccess.backend; - int nr_running_ctxs; - lockdep_assert_held(&kbdev->js_data.runpool_mutex); - /* Timer must stop if we are suspending */ - if (backend->suspend_timer) - return false; - - /* nr_contexts_pullable is updated with the runpool_mutex. 
However, the - * locking in the caller gives us a barrier that ensures - * nr_contexts_pullable is up-to-date for reading - */ - nr_running_ctxs = atomic_read(&kbdev->js_data.nr_contexts_runnable); - #ifdef CONFIG_MALI_DEBUG if (kbdev->js_data.softstop_always) { /* Debug support for allowing soft-stop on a single context */ @@ -91,7 +81,7 @@ static enum hrtimer_restart timer_callback(struct hrtimer *timer) struct kbase_device *kbdev; struct kbasep_js_device_data *js_devdata; struct kbase_backend_data *backend; - int s; + unsigned int s; bool reset_needed = false; KBASE_DEBUG_ASSERT(timer != NULL); @@ -273,18 +263,20 @@ static enum hrtimer_restart timer_callback(struct hrtimer *timer) return HRTIMER_NORESTART; } -#endif /* !MALI_USE_CSF */ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbasep_js_device_data *js_devdata = &kbdev->js_data; struct kbase_backend_data *backend = &kbdev->hwaccess.backend; unsigned long flags; + /* Timer must stop if we are suspending */ + const bool suspend_timer = backend->suspend_timer; + const int nr_running_ctxs = + atomic_read(&kbdev->js_data.nr_contexts_runnable); lockdep_assert_held(&js_devdata->runpool_mutex); - if (!timer_callback_should_run(kbdev)) { + if (suspend_timer || !timer_callback_should_run(kbdev, nr_running_ctxs)) { /* Take spinlock to force synchronisation with timer */ spin_lock_irqsave(&kbdev->hwaccess_lock, flags); backend->timer_running = false; @@ -298,7 +290,8 @@ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) hrtimer_cancel(&backend->scheduling_timer); } - if (timer_callback_should_run(kbdev) && !backend->timer_running) { + if (!suspend_timer && timer_callback_should_run(kbdev, nr_running_ctxs) && + !backend->timer_running) { /* Take spinlock to force synchronisation with timer */ spin_lock_irqsave(&kbdev->hwaccess_lock, flags); backend->timer_running = true; @@ -309,36 +302,59 @@ void kbase_backend_ctx_count_changed(struct kbase_device *kbdev) KBASE_KTRACE_ADD_JM(kbdev, JS_POLICY_TIMER_START, NULL, NULL, 0u, 0u); } -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (unlikely(suspend_timer)) { + js_devdata->gpu_metrics_timer_needed = false; + /* Cancel the timer as System suspend is happening */ + hrtimer_cancel(&js_devdata->gpu_metrics_timer); + js_devdata->gpu_metrics_timer_running = false; + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Explicitly emit the tracepoint on System suspend */ + kbase_gpu_metrics_emit_tracepoint(kbdev, ktime_get_raw_ns()); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return; + } + + if (!nr_running_ctxs) { + /* Just set the flag to not restart the timer on expiry */ + js_devdata->gpu_metrics_timer_needed = false; + return; + } + + /* There are runnable contexts so the timer is needed */ + if (!js_devdata->gpu_metrics_timer_needed) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + js_devdata->gpu_metrics_timer_needed = true; + /* No need to restart the timer if it is already running. 
*/ + if (!js_devdata->gpu_metrics_timer_running) { + hrtimer_start(&js_devdata->gpu_metrics_timer, + HR_TIMER_DELAY_NSEC(kbase_gpu_metrics_get_emit_interval()), + HRTIMER_MODE_REL); + js_devdata->gpu_metrics_timer_running = true; + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } +#endif } int kbase_backend_timer_init(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbase_backend_data *backend = &kbdev->hwaccess.backend; hrtimer_init(&backend->scheduling_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); backend->scheduling_timer.function = timer_callback; backend->timer_running = false; -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ return 0; } void kbase_backend_timer_term(struct kbase_device *kbdev) { -#if !MALI_USE_CSF struct kbase_backend_data *backend = &kbdev->hwaccess.backend; hrtimer_cancel(&backend->scheduling_timer); -#else /* !MALI_USE_CSF */ - CSTD_UNUSED(kbdev); -#endif /* !MALI_USE_CSF */ } void kbase_backend_timer_suspend(struct kbase_device *kbdev) @@ -365,4 +381,3 @@ void kbase_backend_timeouts_changed(struct kbase_device *kbdev) backend->timeouts_updated = true; } - diff --git a/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c b/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c index 9ce5075..6eedc00 100644 --- a/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c +++ b/mali_kbase/backend/gpu/mali_kbase_l2_mmu_config.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,8 +19,9 @@ * */ +#include <linux/version_compat_defs.h> + #include <mali_kbase.h> -#include <mali_kbase_bits.h> #include <mali_kbase_config_defaults.h> #include <device/mali_kbase_device.h> #include "mali_kbase_l2_mmu_config.h" diff --git a/mali_kbase/backend/gpu/mali_kbase_model_dummy.c b/mali_kbase/backend/gpu/mali_kbase_model_dummy.c index 603ffcf..46bcdc7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_dummy.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_dummy.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
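The kbase_backend_ctx_count_changed() changes above (mali_kbase_js_backend.c) arm the gpu_metrics hrtimer only while runnable contexts exist, and cancel it outright when the backend is suspending. Below is a minimal, self-contained sketch of that conditional start/cancel pattern using only standard hrtimer and spinlock APIs; the names (struct metrics_state, EMIT_PERIOD_NS, metrics_timer_cb) are hypothetical and this is not the driver's actual code.

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

#define EMIT_PERIOD_NS (8 * NSEC_PER_MSEC)	/* hypothetical emission period */

struct metrics_state {
	spinlock_t lock;
	struct hrtimer timer;
	bool timer_needed;	/* set while periodic emission is wanted */
	bool timer_running;	/* timer started and not yet cancelled */
};

static enum hrtimer_restart metrics_timer_cb(struct hrtimer *t)
{
	struct metrics_state *ms = container_of(t, struct metrics_state, timer);

	/* Stop re-arming once nobody needs periodic emission any more. */
	if (!READ_ONCE(ms->timer_needed)) {
		WRITE_ONCE(ms->timer_running, false);
		return HRTIMER_NORESTART;
	}
	hrtimer_forward_now(t, ns_to_ktime(EMIT_PERIOD_NS));
	return HRTIMER_RESTART;
}

static void metrics_init(struct metrics_state *ms)
{
	spin_lock_init(&ms->lock);
	hrtimer_init(&ms->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	ms->timer.function = metrics_timer_cb;
	ms->timer_needed = false;
	ms->timer_running = false;
}

static void metrics_ctx_count_changed(struct metrics_state *ms, bool suspending,
				      int nr_runnable)
{
	unsigned long flags;

	if (suspending) {
		/* Synchronously stop the timer across system suspend. */
		WRITE_ONCE(ms->timer_needed, false);
		hrtimer_cancel(&ms->timer);
		ms->timer_running = false;
		return;
	}

	if (!nr_runnable) {
		/* Let the callback retire itself on its next expiry. */
		WRITE_ONCE(ms->timer_needed, false);
		return;
	}

	spin_lock_irqsave(&ms->lock, flags);
	ms->timer_needed = true;
	if (!ms->timer_running) {
		hrtimer_start(&ms->timer, ns_to_ktime(EMIT_PERIOD_NS), HRTIMER_MODE_REL);
		ms->timer_running = true;
	}
	spin_unlock_irqrestore(&ms->lock, flags);
}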
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -62,8 +62,9 @@ * document */ #include <mali_kbase.h> +#include <device/mali_kbase_device.h> #include <gpu/mali_kbase_gpu_regmap.h> -#include <backend/gpu/mali_kbase_model_dummy.h> +#include <backend/gpu/mali_kbase_model_linux.h> #include <mali_kbase_mem_linux.h> #if MALI_USE_CSF @@ -80,71 +81,23 @@ static bool ipa_control_timer_enabled; #endif #define LO_MASK(M) ((M) & 0xFFFFFFFF) - -static u32 get_implementation_register(u32 reg) -{ - switch (reg) { - case GPU_CONTROL_REG(SHADER_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_SHADER_PRESENT); - case GPU_CONTROL_REG(TILER_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_TILER_PRESENT); - case GPU_CONTROL_REG(L2_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_L2_PRESENT); - case GPU_CONTROL_REG(STACK_PRESENT_LO): - return LO_MASK(DUMMY_IMPLEMENTATION_STACK_PRESENT); - - case GPU_CONTROL_REG(SHADER_PRESENT_HI): - case GPU_CONTROL_REG(TILER_PRESENT_HI): - case GPU_CONTROL_REG(L2_PRESENT_HI): - case GPU_CONTROL_REG(STACK_PRESENT_HI): - /* *** FALLTHROUGH *** */ - default: - return 0; - } -} - -struct { - unsigned long prfcnt_base; - u32 *prfcnt_base_cpu; - struct kbase_device *kbdev; - struct tagged_addr *pages; - size_t page_count; - - u32 time; - - struct { - u32 jm; - u32 tiler; - u32 l2; - u32 shader; - } prfcnt_en; - - u64 l2_present; - u64 shader_present; - #if !MALI_USE_CSF - u64 jm_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; -#else - u64 cshw_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; -#endif /* !MALI_USE_CSF */ - u64 tiler_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; - u64 l2_counters[KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS * - KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; - u64 shader_counters[KBASE_DUMMY_MODEL_MAX_SHADER_CORES * - KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#define HI_MASK(M) ((M) & 0xFFFFFFFF00000000) +#endif -} performance_counters = { - .l2_present = DUMMY_IMPLEMENTATION_L2_PRESENT, - .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, -}; +/* Construct a value for the THREAD_FEATURES register, *except* the two most + * significant bits, which are set to IMPLEMENTATION_MODEL in + * midgard_model_read_reg(). 
+ */ +#if MALI_USE_CSF +#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ + ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 24)) +#else +#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ + ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 16) | ((MAX_TG_SPLIT) << 24)) +#endif -struct job_slot { - int job_active; - int job_queued; - int job_complete_irq_asserted; - int job_irq_mask; - int job_disabled; -}; +struct error_status_t hw_error_status; /** * struct control_reg_values_t - control register values specific to the GPU being 'emulated' @@ -162,6 +115,9 @@ struct job_slot { * @mmu_features: MMU features * @gpu_features_lo: GPU features (low) * @gpu_features_hi: GPU features (high) + * @shader_present: Available shader bitmap + * @stack_present: Core stack present bitmap + * */ struct control_reg_values_t { const char *name; @@ -176,16 +132,32 @@ struct control_reg_values_t { u32 mmu_features; u32 gpu_features_lo; u32 gpu_features_hi; + u32 shader_present; + u32 stack_present; +}; + +struct job_slot { + int job_active; + int job_queued; + int job_complete_irq_asserted; + int job_irq_mask; + int job_disabled; }; struct dummy_model_t { int reset_completed; int reset_completed_mask; +#if !MALI_USE_CSF int prfcnt_sample_completed; +#endif /* !MALI_USE_CSF */ int power_changed_mask; /* 2bits: _ALL,_SINGLE */ int power_changed; /* 1bit */ bool clean_caches_completed; bool clean_caches_completed_irq_enabled; +#if MALI_USE_CSF + bool flush_pa_range_completed; + bool flush_pa_range_completed_irq_enabled; +#endif int power_on; /* 6bits: SHADER[4],TILER,L2 */ u32 stack_power_on_lo; u32 coherency_enable; @@ -196,45 +168,6 @@ struct dummy_model_t { void *data; }; -void gpu_device_set_data(void *model, void *data) -{ - struct dummy_model_t *dummy = (struct dummy_model_t *)model; - - dummy->data = data; -} - -void *gpu_device_get_data(void *model) -{ - struct dummy_model_t *dummy = (struct dummy_model_t *)model; - - return dummy->data; -} - -#define signal_int(m, s) m->slots[(s)].job_complete_irq_asserted = 1 - -/* SCons should pass in a default GPU, but other ways of building (e.g. - * in-tree) won't, so define one here in case. - */ -#ifndef CONFIG_MALI_NO_MALI_DEFAULT_GPU -#define CONFIG_MALI_NO_MALI_DEFAULT_GPU "tMIx" -#endif - -static char *no_mali_gpu = CONFIG_MALI_NO_MALI_DEFAULT_GPU; -module_param(no_mali_gpu, charp, 0000); -MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as"); - -/* Construct a value for the THREAD_FEATURES register, *except* the two most - * significant bits, which are set to IMPLEMENTATION_MODEL in - * midgard_model_read_reg(). - */ -#if MALI_USE_CSF -#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ - ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 24)) -#else -#define THREAD_FEATURES_PARTIAL(MAX_REGISTERS, MAX_TASK_QUEUE, MAX_TG_SPLIT) \ - ((MAX_REGISTERS) | ((MAX_TASK_QUEUE) << 16) | ((MAX_TG_SPLIT) << 24)) -#endif - /* Array associating GPU names with control register values. The first * one is used in the case of no match. 
*/ @@ -251,6 +184,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tHEx", @@ -264,6 +199,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tSIx", @@ -277,6 +214,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2821, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tDVx", @@ -290,6 +229,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2821, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tNOx", @@ -303,6 +244,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGOx_r0p0", @@ -316,6 +259,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGOx_r1p0", @@ -330,6 +275,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2823, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tTRx", @@ -343,6 +290,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tNAx", @@ -356,6 +305,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tBEx", @@ -369,6 +320,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TBEX, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tBAx", @@ -382,19 +335,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, - }, - { - .name = "tDUx", - .gpu_id = GPU_ID2_MAKE(10, 2, 0, 1, 0, 0, 0), - .as_present = 0xFF, - .thread_max_threads = 0x180, - .thread_max_workgroup_size = 0x180, - .thread_max_barrier_size = 0x180, - .thread_features = THREAD_FEATURES_PARTIAL(0x6000, 4, 0), - .tiler_features = 0x809, - .mmu_features = 0x2830, - .gpu_features_lo = 0, - .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = 
DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tODx", @@ -408,6 +350,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TODX, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tGRx", @@ -422,6 +366,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tVAx", @@ -436,6 +382,8 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT, + .stack_present = DUMMY_IMPLEMENTATION_STACK_PRESENT, }, { .name = "tTUx", @@ -450,10 +398,95 @@ static const struct control_reg_values_t all_control_reg_values[] = { .mmu_features = 0x2830, .gpu_features_lo = 0xf, .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTUX, + .stack_present = 0xF, + }, + { + .name = "tTIx", + .gpu_id = GPU_ID2_MAKE(12, 8, 1, 0, 0, 0, 0), + .as_present = 0xFF, + .thread_max_threads = 0x800, + .thread_max_workgroup_size = 0x400, + .thread_max_barrier_size = 0x400, + .thread_features = THREAD_FEATURES_PARTIAL(0x10000, 16, 0), + .core_features = 0x1, /* core_1e64fma4tex */ + .tiler_features = 0x809, + .mmu_features = 0x2830, + .gpu_features_lo = 0xf, + .gpu_features_hi = 0, + .shader_present = DUMMY_IMPLEMENTATION_SHADER_PRESENT_TTIX, + .stack_present = 0xF, }, }; -struct error_status_t hw_error_status; +static struct { + spinlock_t access_lock; +#if !MALI_USE_CSF + unsigned long prfcnt_base; +#endif /* !MALI_USE_CSF */ + u32 *prfcnt_base_cpu; + + u32 time; + + struct gpu_model_prfcnt_en prfcnt_en; + + u64 l2_present; + u64 shader_present; + +#if !MALI_USE_CSF + u64 jm_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#else + u64 cshw_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +#endif /* !MALI_USE_CSF */ + u64 tiler_counters[KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; + u64 l2_counters[KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS * + KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; + u64 shader_counters[KBASE_DUMMY_MODEL_MAX_SHADER_CORES * + KBASE_DUMMY_MODEL_COUNTER_PER_CORE]; +} performance_counters; + +static u32 get_implementation_register(u32 reg, + const struct control_reg_values_t *const control_reg_values) +{ + switch (reg) { + case GPU_CONTROL_REG(SHADER_PRESENT_LO): + return LO_MASK(control_reg_values->shader_present); + case GPU_CONTROL_REG(TILER_PRESENT_LO): + return LO_MASK(DUMMY_IMPLEMENTATION_TILER_PRESENT); + case GPU_CONTROL_REG(L2_PRESENT_LO): + return LO_MASK(DUMMY_IMPLEMENTATION_L2_PRESENT); + case GPU_CONTROL_REG(STACK_PRESENT_LO): + return LO_MASK(control_reg_values->stack_present); + + case GPU_CONTROL_REG(SHADER_PRESENT_HI): + case GPU_CONTROL_REG(TILER_PRESENT_HI): + case GPU_CONTROL_REG(L2_PRESENT_HI): + case GPU_CONTROL_REG(STACK_PRESENT_HI): + /* *** FALLTHROUGH *** */ + default: + return 0; + } +} + +void gpu_device_set_data(void *model, void *data) +{ + struct dummy_model_t *dummy = (struct dummy_model_t *)model; + + dummy->data = data; +} + +void *gpu_device_get_data(void *model) +{ + struct dummy_model_t *dummy = (struct dummy_model_t *)model; + + return dummy->data; +} + +#define signal_int(m, s) m->slots[(s)].job_complete_irq_asserted = 1 + +static char *no_mali_gpu = 
CONFIG_MALI_NO_MALI_DEFAULT_GPU; +module_param(no_mali_gpu, charp, 0000); +MODULE_PARM_DESC(no_mali_gpu, "GPU to identify as"); #if MALI_USE_CSF static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, @@ -464,6 +497,7 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, u32 event_index; u64 value = 0; u32 core; + unsigned long flags; if (WARN_ON(core_type >= KBASE_IPA_CORE_TYPE_NUM)) return 0; @@ -475,17 +509,20 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, (ipa_ctl_select_config[core_type] >> (cnt_idx * 8)) & 0xFF; /* Currently only primary counter blocks are supported */ - if (WARN_ON(event_index >= 64)) + if (WARN_ON(event_index >= + (KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS + KBASE_DUMMY_MODEL_COUNTER_PER_CORE))) return 0; /* The actual events start index 4 onwards. Spec also says PRFCNT_EN, * TIMESTAMP_LO or TIMESTAMP_HI pseudo-counters do not make sense for * IPA counters. If selected, the value returned for them will be zero. */ - if (WARN_ON(event_index <= 3)) + if (WARN_ON(event_index < KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS)) return 0; - event_index -= 4; + event_index -= KBASE_DUMMY_MODEL_COUNTER_HEADER_DWORDS; + + spin_lock_irqsave(&performance_counters.access_lock, flags); switch (core_type) { case KBASE_IPA_CORE_TYPE_CSHW: @@ -514,28 +551,46 @@ static u32 gpu_model_get_prfcnt_value(enum kbase_ipa_core_type core_type, event_index += KBASE_DUMMY_MODEL_COUNTER_PER_CORE; } + spin_unlock_irqrestore(&performance_counters.access_lock, flags); + if (is_low_word) return (value & U32_MAX); else return (value >> 32); } +#endif /* MALI_USE_CSF */ -void gpu_model_clear_prfcnt_values(void) +/** + * gpu_model_clear_prfcnt_values_nolock - Clear performance counter values + * + * Sets all performance counter values to zero. The performance counter access + * lock must be held when calling this function. 
+ */ +static void gpu_model_clear_prfcnt_values_nolock(void) { - memset(performance_counters.cshw_counters, 0, - sizeof(performance_counters.cshw_counters)); - - memset(performance_counters.tiler_counters, 0, - sizeof(performance_counters.tiler_counters)); - - memset(performance_counters.l2_counters, 0, - sizeof(performance_counters.l2_counters)); - + lockdep_assert_held(&performance_counters.access_lock); +#if !MALI_USE_CSF + memset(performance_counters.jm_counters, 0, sizeof(performance_counters.jm_counters)); +#else + memset(performance_counters.cshw_counters, 0, sizeof(performance_counters.cshw_counters)); +#endif /* !MALI_USE_CSF */ + memset(performance_counters.tiler_counters, 0, sizeof(performance_counters.tiler_counters)); + memset(performance_counters.l2_counters, 0, sizeof(performance_counters.l2_counters)); memset(performance_counters.shader_counters, 0, sizeof(performance_counters.shader_counters)); } + +#if MALI_USE_CSF +void gpu_model_clear_prfcnt_values(void) +{ + unsigned long flags; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + gpu_model_clear_prfcnt_values_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} KBASE_EXPORT_TEST_API(gpu_model_clear_prfcnt_values); -#endif +#endif /* MALI_USE_CSF */ /** * gpu_model_dump_prfcnt_blocks() - Dump performance counter values to buffer @@ -545,17 +600,20 @@ KBASE_EXPORT_TEST_API(gpu_model_clear_prfcnt_values); * @block_count: Number of blocks to dump * @prfcnt_enable_mask: Counter enable mask * @blocks_present: Available blocks bit mask + * + * The performance counter access lock must be held before calling this + * function. */ -static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, - u32 block_count, - u32 prfcnt_enable_mask, - u64 blocks_present) +static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, u32 block_count, + u32 prfcnt_enable_mask, u64 blocks_present) { u32 block_idx, counter; u32 counter_value = 0; u32 *prfcnt_base; u32 index = 0; + lockdep_assert_held(&performance_counters.access_lock); + prfcnt_base = performance_counters.prfcnt_base_cpu; for (block_idx = 0; block_idx < block_count; block_idx++) { @@ -594,35 +652,18 @@ static void gpu_model_dump_prfcnt_blocks(u64 *values, u32 *out_index, } } -/** - * gpu_model_sync_dummy_prfcnt() - Synchronize dumped performance counter values - * - * Used to ensure counter values are not lost if cache invalidation is performed - * prior to reading. 
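The dummy model's counter arrays above are now guarded by performance_counters.access_lock, split into a *_nolock worker that asserts the lock via lockdep and a thin public wrapper that takes it. A minimal sketch of that locked-wrapper/_nolock idiom follows; the names (counter_bank, bank_lock) are hypothetical and it is illustrative only, not the model's actual code.

#include <linux/lockdep.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>

static DEFINE_SPINLOCK(bank_lock);
static u64 counter_bank[64];

/* Callers must already hold bank_lock; lockdep verifies this on debug kernels. */
static void counter_bank_clear_nolock(void)
{
	lockdep_assert_held(&bank_lock);
	memset(counter_bank, 0, sizeof(counter_bank));
}

/* Public entry point: takes the lock, then defers to the _nolock worker, so
 * paths that already hold the lock can call the worker directly instead.
 */
void counter_bank_clear(void)
{
	unsigned long flags;

	spin_lock_irqsave(&bank_lock, flags);
	counter_bank_clear_nolock();
	spin_unlock_irqrestore(&bank_lock, flags);
}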
- */ -static void gpu_model_sync_dummy_prfcnt(void) -{ - int i; - struct page *pg; - - for (i = 0; i < performance_counters.page_count; i++) { - pg = as_page(performance_counters.pages[i]); - kbase_sync_single_for_device(performance_counters.kbdev, - kbase_dma_addr(pg), PAGE_SIZE, - DMA_BIDIRECTIONAL); - } -} - -static void midgard_model_dump_prfcnt(void) +static void gpu_model_dump_nolock(void) { u32 index = 0; + lockdep_assert_held(&performance_counters.access_lock); + #if !MALI_USE_CSF - gpu_model_dump_prfcnt_blocks(performance_counters.jm_counters, &index, - 1, 0xffffffff, 0x1); + gpu_model_dump_prfcnt_blocks(performance_counters.jm_counters, &index, 1, + performance_counters.prfcnt_en.fe, 0x1); #else - gpu_model_dump_prfcnt_blocks(performance_counters.cshw_counters, &index, - 1, 0xffffffff, 0x1); + gpu_model_dump_prfcnt_blocks(performance_counters.cshw_counters, &index, 1, + performance_counters.prfcnt_en.fe, 0x1); #endif /* !MALI_USE_CSF */ gpu_model_dump_prfcnt_blocks(performance_counters.tiler_counters, &index, 1, @@ -637,12 +678,48 @@ static void midgard_model_dump_prfcnt(void) performance_counters.prfcnt_en.shader, performance_counters.shader_present); - gpu_model_sync_dummy_prfcnt(); + /* Counter values are cleared after each dump */ + gpu_model_clear_prfcnt_values_nolock(); /* simulate a 'long' time between samples */ performance_counters.time += 10; } +#if !MALI_USE_CSF +static void midgard_model_dump_prfcnt(void) +{ + unsigned long flags; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + gpu_model_dump_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} +#else +void gpu_model_prfcnt_dump_request(u32 *sample_buf, struct gpu_model_prfcnt_en enable_maps) +{ + unsigned long flags; + + if (WARN_ON(!sample_buf)) + return; + + spin_lock_irqsave(&performance_counters.access_lock, flags); + performance_counters.prfcnt_base_cpu = sample_buf; + performance_counters.prfcnt_en = enable_maps; + gpu_model_dump_nolock(); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); +} + +void gpu_model_glb_request_job_irq(void *model) +{ + unsigned long flags; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + hw_error_status.job_irq_status |= JOB_IRQ_GLOBAL_IF; + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); + gpu_device_raise_irq(model, MODEL_LINUX_JOB_IRQ); +} +#endif /* !MALI_USE_CSF */ + static void init_register_statuses(struct dummy_model_t *dummy) { int i; @@ -671,8 +748,10 @@ static void init_register_statuses(struct dummy_model_t *dummy) performance_counters.time = 0; } -static void update_register_statuses(struct dummy_model_t *dummy, int job_slot) +static void update_register_statuses(struct dummy_model_t *dummy, unsigned int job_slot) { + lockdep_assert_held(&hw_error_status.access_lock); + if (hw_error_status.errors_mask & IS_A_JOB_ERROR) { if (job_slot == hw_error_status.current_job_slot) { #if !MALI_USE_CSF @@ -922,6 +1001,7 @@ static void update_job_irq_js_state(struct dummy_model_t *dummy, int mask) { int i; + lockdep_assert_held(&hw_error_status.access_lock); pr_debug("%s", "Updating the JS_ACTIVE register"); for (i = 0; i < NUM_SLOTS; i++) { @@ -967,6 +1047,21 @@ static const struct control_reg_values_t *find_control_reg_values(const char *gp size_t i; const struct control_reg_values_t *ret = NULL; + /* Edge case for tGOx, as it has 2 entries in the table for its R0 and R1 + * revisions respectively. As none of them are named "tGOx" the name comparison + * needs to be fixed in these cases. 
CONFIG_GPU_HWVER should be one of "r0p0" + * or "r1p0" and is derived from the DDK's build configuration. In cases + * where it is unavailable, it defaults to tGOx r1p0. + */ + if (!strcmp(gpu, "tGOx")) { +#ifdef CONFIG_GPU_HWVER + if (!strcmp(CONFIG_GPU_HWVER, "r0p0")) + gpu = "tGOx_r0p0"; + else if (!strcmp(CONFIG_GPU_HWVER, "r1p0")) +#endif /* CONFIG_GPU_HWVER defined */ + gpu = "tGOx_r1p0"; + } + for (i = 0; i < ARRAY_SIZE(all_control_reg_values); ++i) { const struct control_reg_values_t * const fcrv = &all_control_reg_values[i]; @@ -986,17 +1081,29 @@ static const struct control_reg_values_t *find_control_reg_values(const char *gp return ret; } -void *midgard_model_create(const void *config) +void *midgard_model_create(struct kbase_device *kbdev) { struct dummy_model_t *dummy = NULL; + spin_lock_init(&hw_error_status.access_lock); + spin_lock_init(&performance_counters.access_lock); + dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); if (dummy) { dummy->job_irq_js_state = 0; init_register_statuses(dummy); dummy->control_reg_values = find_control_reg_values(no_mali_gpu); + performance_counters.l2_present = get_implementation_register( + GPU_CONTROL_REG(L2_PRESENT_LO), dummy->control_reg_values); + performance_counters.shader_present = get_implementation_register( + GPU_CONTROL_REG(SHADER_PRESENT_LO), dummy->control_reg_values); + + gpu_device_set_data(dummy, kbdev); + + dev_info(kbdev->dev, "Using Dummy Model"); } + return dummy; } @@ -1009,18 +1116,24 @@ static void midgard_model_get_outputs(void *h) { struct dummy_model_t *dummy = (struct dummy_model_t *)h; + lockdep_assert_held(&hw_error_status.access_lock); + if (hw_error_status.job_irq_status) - gpu_device_raise_irq(dummy, GPU_DUMMY_JOB_IRQ); + gpu_device_raise_irq(dummy, MODEL_LINUX_JOB_IRQ); if ((dummy->power_changed && dummy->power_changed_mask) || (dummy->reset_completed & dummy->reset_completed_mask) || hw_error_status.gpu_error_irq || - (dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled) || - dummy->prfcnt_sample_completed) - gpu_device_raise_irq(dummy, GPU_DUMMY_GPU_IRQ); +#if !MALI_USE_CSF + dummy->prfcnt_sample_completed || +#else + (dummy->flush_pa_range_completed && dummy->flush_pa_range_completed_irq_enabled) || +#endif + (dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled)) + gpu_device_raise_irq(dummy, MODEL_LINUX_GPU_IRQ); if (hw_error_status.mmu_irq_rawstat & hw_error_status.mmu_irq_mask) - gpu_device_raise_irq(dummy, GPU_DUMMY_MMU_IRQ); + gpu_device_raise_irq(dummy, MODEL_LINUX_MMU_IRQ); } static void midgard_model_update(void *h) @@ -1028,6 +1141,8 @@ static void midgard_model_update(void *h) struct dummy_model_t *dummy = (struct dummy_model_t *)h; int i; + lockdep_assert_held(&hw_error_status.access_lock); + for (i = 0; i < NUM_SLOTS; i++) { if (!dummy->slots[i].job_active) continue; @@ -1074,6 +1189,8 @@ static void invalidate_active_jobs(struct dummy_model_t *dummy) { int i; + lockdep_assert_held(&hw_error_status.access_lock); + for (i = 0; i < NUM_SLOTS; i++) { if (dummy->slots[i].job_active) { hw_error_status.job_irq_rawstat |= (1 << (16 + i)); @@ -1083,13 +1200,17 @@ static void invalidate_active_jobs(struct dummy_model_t *dummy) } } -u8 midgard_model_write_reg(void *h, u32 addr, u32 value) +void midgard_model_write_reg(void *h, u32 addr, u32 value) { + unsigned long flags; struct dummy_model_t *dummy = (struct dummy_model_t *)h; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + #if !MALI_USE_CSF if ((addr >= JOB_CONTROL_REG(JOB_SLOT0)) && (addr < 
(JOB_CONTROL_REG(JOB_SLOT15) + 0x80))) { - int slot_idx = (addr >> 7) & 0xf; + unsigned int slot_idx = (addr >> 7) & 0xf; KBASE_DEBUG_ASSERT(slot_idx < NUM_SLOTS); if (addr == JOB_SLOT_REG(slot_idx, JS_HEAD_NEXT_LO)) { @@ -1176,6 +1297,9 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->reset_completed_mask = (value >> 8) & 0x01; dummy->power_changed_mask = (value >> 9) & 0x03; dummy->clean_caches_completed_irq_enabled = (value & (1u << 17)) != 0u; +#if MALI_USE_CSF + dummy->flush_pa_range_completed_irq_enabled = (value & (1u << 20)) != 0u; +#endif } else if (addr == GPU_CONTROL_REG(COHERENCY_ENABLE)) { dummy->coherency_enable = value; } else if (addr == GPU_CONTROL_REG(GPU_IRQ_CLEAR)) { @@ -1188,8 +1312,16 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) if (value & (1 << 17)) dummy->clean_caches_completed = false; - if (value & (1 << 16)) + +#if MALI_USE_CSF + if (value & (1u << 20)) + dummy->flush_pa_range_completed = false; +#endif /* MALI_USE_CSF */ + +#if !MALI_USE_CSF + if (value & PRFCNT_SAMPLE_COMPLETED) /* (1 << 16) */ dummy->prfcnt_sample_completed = 0; +#endif /* !MALI_USE_CSF */ /*update error status */ hw_error_status.gpu_error_irq &= ~(value); @@ -1214,21 +1346,42 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) pr_debug("clean caches requested"); dummy->clean_caches_completed = true; break; +#if MALI_USE_CSF + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2: + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC: + case GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_FULL: + pr_debug("pa range flush requested"); + dummy->flush_pa_range_completed = true; + break; +#endif /* MALI_USE_CSF */ +#if !MALI_USE_CSF case GPU_COMMAND_PRFCNT_SAMPLE: midgard_model_dump_prfcnt(); dummy->prfcnt_sample_completed = 1; +#endif /* !MALI_USE_CSF */ default: break; } +#if MALI_USE_CSF + } else if (addr >= GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO) && + addr <= GPU_CONTROL_REG(GPU_COMMAND_ARG1_HI)) { + /* Writes ignored */ +#endif } else if (addr == GPU_CONTROL_REG(L2_CONFIG)) { dummy->l2_config = value; } #if MALI_USE_CSF - else if (addr >= GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET) && - addr < GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET + - (CSF_NUM_DOORBELL * CSF_HW_DOORBELL_PAGE_SIZE))) { - if (addr == GPU_CONTROL_REG(CSF_HW_DOORBELL_PAGE_OFFSET)) + else if (addr >= CSF_HW_DOORBELL_PAGE_OFFSET && + addr < CSF_HW_DOORBELL_PAGE_OFFSET + + (CSF_NUM_DOORBELL * CSF_HW_DOORBELL_PAGE_SIZE)) { + if (addr == CSF_HW_DOORBELL_PAGE_OFFSET) hw_error_status.job_irq_status = JOB_IRQ_GLOBAL_IF; + } else if ((addr >= GPU_CONTROL_REG(SYSC_ALLOC0)) && + (addr < GPU_CONTROL_REG(SYSC_ALLOC(SYSC_ALLOC_COUNT)))) { + /* Do nothing */ + } else if ((addr >= GPU_CONTROL_REG(ASN_HASH_0)) && + (addr < GPU_CONTROL_REG(ASN_HASH(ASN_HASH_COUNT)))) { + /* Do nothing */ } else if (addr == IPA_CONTROL_REG(COMMAND)) { pr_debug("Received IPA_CONTROL command"); } else if (addr == IPA_CONTROL_REG(TIMER)) { @@ -1249,14 +1402,13 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) } } #endif - else if (addr == MMU_REG(MMU_IRQ_MASK)) { + else if (addr == MMU_CONTROL_REG(MMU_IRQ_MASK)) { hw_error_status.mmu_irq_mask = value; - } else if (addr == MMU_REG(MMU_IRQ_CLEAR)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_CLEAR)) { hw_error_status.mmu_irq_rawstat &= (~value); - } else if ((addr >= MMU_AS_REG(0, AS_TRANSTAB_LO)) && - (addr <= MMU_AS_REG(15, AS_STATUS))) { - int mem_addr_space = (addr - MMU_AS_REG(0, AS_TRANSTAB_LO)) - >> 6; + } else if ((addr >= MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) && + (addr <= 
MMU_STAGE1_REG(MMU_AS_REG(15, AS_STATUS)))) { + int mem_addr_space = (addr - MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) >> 6; switch (addr & 0x3F) { case AS_COMMAND: @@ -1346,20 +1498,24 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) mem_addr_space, addr, value); break; } - } else if (addr >= GPU_CONTROL_REG(PRFCNT_BASE_LO) && - addr <= GPU_CONTROL_REG(PRFCNT_MMU_L2_EN)) { + } else { switch (addr) { +#if !MALI_USE_CSF case PRFCNT_BASE_LO: - performance_counters.prfcnt_base |= value; + performance_counters.prfcnt_base = + HI_MASK(performance_counters.prfcnt_base) | value; + performance_counters.prfcnt_base_cpu = + (u32 *)(uintptr_t)performance_counters.prfcnt_base; break; case PRFCNT_BASE_HI: - performance_counters.prfcnt_base |= ((u64) value) << 32; + performance_counters.prfcnt_base = + LO_MASK(performance_counters.prfcnt_base) | (((u64)value) << 32); + performance_counters.prfcnt_base_cpu = + (u32 *)(uintptr_t)performance_counters.prfcnt_base; break; -#if !MALI_USE_CSF case PRFCNT_JM_EN: - performance_counters.prfcnt_en.jm = value; + performance_counters.prfcnt_en.fe = value; break; -#endif /* !MALI_USE_CSF */ case PRFCNT_SHADER_EN: performance_counters.prfcnt_en.shader = value; break; @@ -1369,9 +1525,7 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) case PRFCNT_MMU_L2_EN: performance_counters.prfcnt_en.l2 = value; break; - } - } else { - switch (addr) { +#endif /* !MALI_USE_CSF */ case TILER_PWRON_LO: dummy->power_on |= (value & 1) << 1; /* Also ensure L2 is powered on */ @@ -1379,7 +1533,8 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->power_changed = 1; break; case SHADER_PWRON_LO: - dummy->power_on |= (value & 0xF) << 2; + dummy->power_on |= + (value & dummy->control_reg_values->shader_present) << 2; dummy->power_changed = 1; break; case L2_PWRON_LO: @@ -1395,7 +1550,8 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) dummy->power_changed = 1; break; case SHADER_PWROFF_LO: - dummy->power_on &= ~((value & 0xF) << 2); + dummy->power_on &= + ~((value & dummy->control_reg_values->shader_present) << 2); dummy->power_changed = 1; break; case L2_PWROFF_LO: @@ -1416,6 +1572,7 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) case PWR_OVERRIDE0: #if !MALI_USE_CSF case JM_CONFIG: + case PRFCNT_CONFIG: #else /* !MALI_USE_CSF */ case CSF_CONFIG: #endif /* !MALI_USE_CSF */ @@ -1434,13 +1591,16 @@ u8 midgard_model_write_reg(void *h, u32 addr, u32 value) midgard_model_update(dummy); midgard_model_get_outputs(dummy); - - return 1; + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); } -u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) +void midgard_model_read_reg(void *h, u32 addr, u32 *const value) { + unsigned long flags; struct dummy_model_t *dummy = (struct dummy_model_t *)h; + + spin_lock_irqsave(&hw_error_status.access_lock, flags); + *value = 0; /* 0 by default */ #if !MALI_USE_CSF if (addr == JOB_CONTROL_REG(JOB_IRQ_JS_STATE)) { @@ -1475,24 +1635,44 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) #endif /* !MALI_USE_CSF */ else if (addr == GPU_CONTROL_REG(GPU_IRQ_MASK)) { *value = (dummy->reset_completed_mask << 8) | - (dummy->power_changed_mask << 9) | (1 << 7) | 1; + ((dummy->clean_caches_completed_irq_enabled ? 1u : 0u) << 17) | +#if MALI_USE_CSF + ((dummy->flush_pa_range_completed_irq_enabled ? 
1u : 0u) << 20) | +#endif + (dummy->power_changed_mask << 9) | (1 << 7) | 1; pr_debug("GPU_IRQ_MASK read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) { *value = (dummy->power_changed << 9) | (dummy->power_changed << 10) | (dummy->reset_completed << 8) | +#if !MALI_USE_CSF + (dummy->prfcnt_sample_completed ? PRFCNT_SAMPLE_COMPLETED : 0) | +#endif /* !MALI_USE_CSF */ ((dummy->clean_caches_completed ? 1u : 0u) << 17) | - (dummy->prfcnt_sample_completed << 16) | hw_error_status.gpu_error_irq; +#if MALI_USE_CSF + ((dummy->flush_pa_range_completed ? 1u : 0u) << 20) | +#endif + hw_error_status.gpu_error_irq; pr_debug("GPU_IRQ_RAWSTAT read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_IRQ_STATUS)) { *value = ((dummy->power_changed && (dummy->power_changed_mask & 0x1)) << 9) | ((dummy->power_changed && (dummy->power_changed_mask & 0x2)) << 10) | ((dummy->reset_completed & dummy->reset_completed_mask) << 8) | +#if !MALI_USE_CSF + (dummy->prfcnt_sample_completed ? PRFCNT_SAMPLE_COMPLETED : 0) | +#endif /* !MALI_USE_CSF */ (((dummy->clean_caches_completed && dummy->clean_caches_completed_irq_enabled) ? 1u : 0u) << 17) | - (dummy->prfcnt_sample_completed << 16) | hw_error_status.gpu_error_irq; +#if MALI_USE_CSF + (((dummy->flush_pa_range_completed && + dummy->flush_pa_range_completed_irq_enabled) ? + 1u : + 0u) + << 20) | +#endif + hw_error_status.gpu_error_irq; pr_debug("GPU_IRQ_STAT read %x", *value); } else if (addr == GPU_CONTROL_REG(GPU_STATUS)) { *value = 0; @@ -1504,8 +1684,18 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = hw_error_status.gpu_fault_status; } else if (addr == GPU_CONTROL_REG(L2_CONFIG)) { *value = dummy->l2_config; - } else if ((addr >= GPU_CONTROL_REG(SHADER_PRESENT_LO)) && - (addr <= GPU_CONTROL_REG(L2_MMU_CONFIG))) { + } +#if MALI_USE_CSF + else if ((addr >= GPU_CONTROL_REG(SYSC_ALLOC0)) && + (addr < GPU_CONTROL_REG(SYSC_ALLOC(SYSC_ALLOC_COUNT)))) { + *value = 0; + } else if ((addr >= GPU_CONTROL_REG(ASN_HASH_0)) && + (addr < GPU_CONTROL_REG(ASN_HASH(ASN_HASH_COUNT)))) { + *value = 0; + } +#endif + else if ((addr >= GPU_CONTROL_REG(SHADER_PRESENT_LO)) && + (addr <= GPU_CONTROL_REG(L2_MMU_CONFIG))) { switch (addr) { case GPU_CONTROL_REG(SHADER_PRESENT_LO): case GPU_CONTROL_REG(SHADER_PRESENT_HI): @@ -1515,27 +1705,27 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) case GPU_CONTROL_REG(L2_PRESENT_HI): case GPU_CONTROL_REG(STACK_PRESENT_LO): case GPU_CONTROL_REG(STACK_PRESENT_HI): - *value = get_implementation_register(addr); + *value = get_implementation_register(addr, dummy->control_reg_values); break; case GPU_CONTROL_REG(SHADER_READY_LO): *value = (dummy->power_on >> 0x02) & - get_implementation_register( - GPU_CONTROL_REG(SHADER_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(SHADER_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(TILER_READY_LO): *value = (dummy->power_on >> 0x01) & - get_implementation_register( - GPU_CONTROL_REG(TILER_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(TILER_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(L2_READY_LO): *value = dummy->power_on & - get_implementation_register( - GPU_CONTROL_REG(L2_PRESENT_LO)); + get_implementation_register(GPU_CONTROL_REG(L2_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(STACK_READY_LO): *value = dummy->stack_power_on_lo & - get_implementation_register( - GPU_CONTROL_REG(STACK_PRESENT_LO)); + 
get_implementation_register(GPU_CONTROL_REG(STACK_PRESENT_LO), + dummy->control_reg_values); break; case GPU_CONTROL_REG(SHADER_READY_HI): @@ -1729,10 +1919,9 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) } else if (addr >= GPU_CONTROL_REG(CYCLE_COUNT_LO) && addr <= GPU_CONTROL_REG(TIMESTAMP_HI)) { *value = 0; - } else if (addr >= MMU_AS_REG(0, AS_TRANSTAB_LO) - && addr <= MMU_AS_REG(15, AS_STATUS)) { - int mem_addr_space = (addr - MMU_AS_REG(0, AS_TRANSTAB_LO)) - >> 6; + } else if (addr >= MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO)) && + addr <= MMU_STAGE1_REG(MMU_AS_REG(15, AS_STATUS))) { + int mem_addr_space = (addr - MMU_STAGE1_REG(MMU_AS_REG(0, AS_TRANSTAB_LO))) >> 6; switch (addr & 0x3F) { case AS_TRANSTAB_LO: @@ -1776,11 +1965,11 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = 0; break; } - } else if (addr == MMU_REG(MMU_IRQ_MASK)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_MASK)) { *value = hw_error_status.mmu_irq_mask; - } else if (addr == MMU_REG(MMU_IRQ_RAWSTAT)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)) { *value = hw_error_status.mmu_irq_rawstat; - } else if (addr == MMU_REG(MMU_IRQ_STATUS)) { + } else if (addr == MMU_CONTROL_REG(MMU_IRQ_STATUS)) { *value = hw_error_status.mmu_irq_mask & hw_error_status.mmu_irq_rawstat; } @@ -1788,8 +1977,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) else if (addr == IPA_CONTROL_REG(STATUS)) { *value = (ipa_control_timer_enabled << 31); } else if ((addr >= IPA_CONTROL_REG(VALUE_CSHW_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_CSHW_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_CSHW_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_CSHW_REG_LO(0))) >> 3; bool is_low_word = @@ -1798,8 +1986,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_CSHW, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_MEMSYS_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_MEMSYS_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_MEMSYS_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_MEMSYS_REG_LO(0))) >> 3; bool is_low_word = @@ -1808,8 +1995,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_MEMSYS, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_TILER_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_TILER_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_TILER_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_TILER_REG_LO(0))) >> 3; bool is_low_word = @@ -1818,8 +2004,7 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = gpu_model_get_prfcnt_value(KBASE_IPA_CORE_TYPE_TILER, counter_index, is_low_word); } else if ((addr >= IPA_CONTROL_REG(VALUE_SHADER_REG_LO(0))) && - (addr <= IPA_CONTROL_REG(VALUE_SHADER_REG_HI( - IPA_CTL_MAX_VAL_CNT_IDX)))) { + (addr <= IPA_CONTROL_REG(VALUE_SHADER_REG_HI(IPA_CTL_MAX_VAL_CNT_IDX)))) { u32 counter_index = (addr - IPA_CONTROL_REG(VALUE_SHADER_REG_LO(0))) >> 3; bool is_low_word = @@ -1840,18 +2025,18 @@ u8 midgard_model_read_reg(void *h, u32 addr, u32 * const value) *value = 0; } + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); CSTD_UNUSED(dummy); - - return 1; } -static u32 set_user_sample_core_type(u64 *counters, - u32 
*usr_data_start, u32 usr_data_offset, - u32 usr_data_size, u32 core_count) +static u32 set_user_sample_core_type(u64 *counters, u32 *usr_data_start, u32 usr_data_offset, + u32 usr_data_size, u32 core_count) { u32 sample_size; u32 *usr_data = NULL; + lockdep_assert_held(&performance_counters.access_lock); + sample_size = core_count * KBASE_DUMMY_MODEL_COUNTER_PER_CORE * sizeof(u32); @@ -1866,11 +2051,7 @@ static u32 set_user_sample_core_type(u64 *counters, u32 i; for (i = 0; i < loop_cnt; i++) { - if (copy_from_user(&counters[i], &usr_data[i], - sizeof(u32))) { - model_error_log(KBASE_CORE, "Unable to set counter sample 2"); - break; - } + counters[i] = usr_data[i]; } } @@ -1884,6 +2065,8 @@ static u32 set_kernel_sample_core_type(u64 *counters, u32 sample_size; u64 *usr_data = NULL; + lockdep_assert_held(&performance_counters.access_lock); + sample_size = core_count * KBASE_DUMMY_MODEL_COUNTER_PER_CORE * sizeof(u64); @@ -1900,49 +2083,70 @@ static u32 set_kernel_sample_core_type(u64 *counters, } /* Counter values injected through ioctl are of 32 bits */ -void gpu_model_set_dummy_prfcnt_sample(u32 *usr_data, u32 usr_data_size) +int gpu_model_set_dummy_prfcnt_user_sample(u32 __user *data, u32 size) { + unsigned long flags; + u32 *user_data; u32 offset = 0; + if (data == NULL || size == 0 || size > KBASE_DUMMY_MODEL_COUNTER_TOTAL * sizeof(u32)) + return -EINVAL; + + /* copy_from_user might sleep so can't be called from inside a spinlock + * allocate a temporary buffer for user data and copy to that before taking + * the lock + */ + user_data = kmalloc(size, GFP_KERNEL); + if (!user_data) + return -ENOMEM; + + if (copy_from_user(user_data, data, size)) { + model_error_log(KBASE_CORE, "Unable to copy prfcnt data from userspace"); + kfree(user_data); + return -EINVAL; + } + + spin_lock_irqsave(&performance_counters.access_lock, flags); #if !MALI_USE_CSF - offset = set_user_sample_core_type(performance_counters.jm_counters, - usr_data, offset, usr_data_size, 1); + offset = set_user_sample_core_type(performance_counters.jm_counters, user_data, offset, + size, 1); #else - offset = set_user_sample_core_type(performance_counters.cshw_counters, - usr_data, offset, usr_data_size, 1); + offset = set_user_sample_core_type(performance_counters.cshw_counters, user_data, offset, + size, 1); #endif /* !MALI_USE_CSF */ - offset = set_user_sample_core_type(performance_counters.tiler_counters, - usr_data, offset, usr_data_size, - hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); - offset = set_user_sample_core_type(performance_counters.l2_counters, - usr_data, offset, usr_data_size, - KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS); - offset = set_user_sample_core_type(performance_counters.shader_counters, - usr_data, offset, usr_data_size, - KBASE_DUMMY_MODEL_MAX_SHADER_CORES); + offset = set_user_sample_core_type(performance_counters.tiler_counters, user_data, offset, + size, hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); + offset = set_user_sample_core_type(performance_counters.l2_counters, user_data, offset, + size, KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS); + offset = set_user_sample_core_type(performance_counters.shader_counters, user_data, offset, + size, KBASE_DUMMY_MODEL_MAX_SHADER_CORES); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); + + kfree(user_data); + return 0; } /* Counter values injected through kutf are of 64 bits */ -void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *usr_data, u32 usr_data_size) +void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *data, u32 size) { + unsigned long flags; 
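gpu_model_set_dummy_prfcnt_user_sample() above copies the userspace buffer into a kmalloc'd scratch area before taking the counter spinlock, because copy_from_user() can fault and sleep, and sleeping calls must not happen under a spinlock. A compact sketch of that copy-then-lock pattern using standard uaccess/slab/spinlock APIs; the names (apply_user_sample, sample_lock, MAX_SAMPLE_BYTES) are hypothetical, not the driver's.

#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

#define MAX_SAMPLE_BYTES 4096	/* hypothetical upper bound on a sample */

static DEFINE_SPINLOCK(sample_lock);
static u32 sample_values[MAX_SAMPLE_BYTES / sizeof(u32)];

int apply_user_sample(const u32 __user *data, u32 size)
{
	unsigned long flags;
	u32 *tmp;

	if (!data || !size || size > MAX_SAMPLE_BYTES)
		return -EINVAL;

	/* kmalloc() and copy_from_user() may sleep, so do both unlocked. */
	tmp = kmalloc(size, GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;

	if (copy_from_user(tmp, data, size)) {
		kfree(tmp);
		return -EFAULT;
	}

	/* Only the copy into the shared state happens under the lock. */
	spin_lock_irqsave(&sample_lock, flags);
	memcpy(sample_values, tmp, size);
	spin_unlock_irqrestore(&sample_lock, flags);

	kfree(tmp);
	return 0;
}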
u32 offset = 0; + spin_lock_irqsave(&performance_counters.access_lock, flags); #if !MALI_USE_CSF - offset = set_kernel_sample_core_type(performance_counters.jm_counters, - usr_data, offset, usr_data_size, 1); + offset = set_kernel_sample_core_type(performance_counters.jm_counters, data, offset, size, + 1); #else - offset = set_kernel_sample_core_type(performance_counters.cshw_counters, - usr_data, offset, usr_data_size, 1); + offset = set_kernel_sample_core_type(performance_counters.cshw_counters, data, offset, size, + 1); #endif /* !MALI_USE_CSF */ - offset = set_kernel_sample_core_type(performance_counters.tiler_counters, - usr_data, offset, usr_data_size, - hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); - offset = set_kernel_sample_core_type(performance_counters.l2_counters, - usr_data, offset, usr_data_size, - hweight64(performance_counters.l2_present)); - offset = set_kernel_sample_core_type(performance_counters.shader_counters, - usr_data, offset, usr_data_size, - hweight64(performance_counters.shader_present)); + offset = set_kernel_sample_core_type(performance_counters.tiler_counters, data, offset, + size, hweight64(DUMMY_IMPLEMENTATION_TILER_PRESENT)); + offset = set_kernel_sample_core_type(performance_counters.l2_counters, data, offset, size, + hweight64(performance_counters.l2_present)); + offset = set_kernel_sample_core_type(performance_counters.shader_counters, data, offset, + size, hweight64(performance_counters.shader_present)); + spin_unlock_irqrestore(&performance_counters.access_lock, flags); } KBASE_EXPORT_TEST_API(gpu_model_set_dummy_prfcnt_kernel_sample); @@ -1977,21 +2181,12 @@ void gpu_model_set_dummy_prfcnt_cores(struct kbase_device *kbdev, } KBASE_EXPORT_TEST_API(gpu_model_set_dummy_prfcnt_cores); -void gpu_model_set_dummy_prfcnt_base_cpu(u32 *base, struct kbase_device *kbdev, - struct tagged_addr *pages, - size_t page_count) -{ - performance_counters.prfcnt_base_cpu = base; - performance_counters.kbdev = kbdev; - performance_counters.pages = pages; - performance_counters.page_count = page_count; -} - int gpu_model_control(void *model, struct kbase_model_control_params *params) { struct dummy_model_t *dummy = (struct dummy_model_t *)model; int i; + unsigned long flags; if (params->command == KBASE_MC_DISABLE_JOBS) { for (i = 0; i < NUM_SLOTS; i++) @@ -2000,8 +2195,10 @@ int gpu_model_control(void *model, return -EINVAL; } + spin_lock_irqsave(&hw_error_status.access_lock, flags); midgard_model_update(dummy); midgard_model_get_outputs(dummy); + spin_unlock_irqrestore(&hw_error_status.access_lock, flags); return 0; } diff --git a/mali_kbase/backend/gpu/mali_kbase_model_dummy.h b/mali_kbase/backend/gpu/mali_kbase_model_dummy.h index 87690f4..2a3351b 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_dummy.h +++ b/mali_kbase/backend/gpu/mali_kbase_model_dummy.h @@ -21,11 +21,24 @@ /* * Dummy Model interface + * + * Support for NO_MALI dummy Model interface. + * + * +-----------------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------------+ + * | Model Linux Framework | + * +-----------------------------------+ + * | Model Dummy interface definitions | + * +-----------------+-----------------+ + * | Fake R/W | Fake IRQ | + * +-----------------+-----------------+ */ #ifndef _KBASE_MODEL_DUMMY_H_ #define _KBASE_MODEL_DUMMY_H_ +#include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_linux.h> #include <uapi/gpu/arm/midgard/backend/gpu/mali_kbase_model_dummy.h> #define model_error_log(module, ...) 
pr_err(__VA_ARGS__) @@ -116,6 +129,8 @@ struct kbase_error_atom { /*struct to track the system error state*/ struct error_status_t { + spinlock_t access_lock; + u32 errors_mask; u32 mmu_table_level; int faulty_mmu_as; @@ -138,38 +153,71 @@ struct error_status_t { u64 as_transtab[NUM_MMU_AS]; }; -void *midgard_model_create(const void *config); -void midgard_model_destroy(void *h); -u8 midgard_model_write_reg(void *h, u32 addr, u32 value); -u8 midgard_model_read_reg(void *h, u32 addr, - u32 * const value); +/** + * struct gpu_model_prfcnt_en - Performance counter enable masks + * @fe: Enable mask for front-end block + * @tiler: Enable mask for tiler block + * @l2: Enable mask for L2/Memory system blocks + * @shader: Enable mask for shader core blocks + */ +struct gpu_model_prfcnt_en { + u32 fe; + u32 tiler; + u32 l2; + u32 shader; +}; + void midgard_set_error(int job_slot); int job_atom_inject_error(struct kbase_error_params *params); int gpu_model_control(void *h, struct kbase_model_control_params *params); -void gpu_model_set_dummy_prfcnt_sample(u32 *usr_data, u32 usr_data_size); -void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *usr_data, u32 usr_data_size); +/** + * gpu_model_set_dummy_prfcnt_user_sample() - Set performance counter values + * @data: Userspace pointer to array of counter values + * @size: Size of counter value array + * + * Counter values set by this function will be used for one sample dump only + * after which counters will be cleared back to zero. + * + * Return: 0 on success, else error code. + */ +int gpu_model_set_dummy_prfcnt_user_sample(u32 __user *data, u32 size); + +/** + * gpu_model_set_dummy_prfcnt_kernel_sample() - Set performance counter values + * @data: Pointer to array of counter values + * @size: Size of counter value array + * + * Counter values set by this function will be used for one sample dump only + * after which counters will be cleared back to zero. + */ +void gpu_model_set_dummy_prfcnt_kernel_sample(u64 *data, u32 size); + void gpu_model_get_dummy_prfcnt_cores(struct kbase_device *kbdev, u64 *l2_present, u64 *shader_present); void gpu_model_set_dummy_prfcnt_cores(struct kbase_device *kbdev, u64 l2_present, u64 shader_present); -void gpu_model_set_dummy_prfcnt_base_cpu(u32 *base, struct kbase_device *kbdev, - struct tagged_addr *pages, - size_t page_count); + /* Clear the counter values array maintained by the dummy model */ void gpu_model_clear_prfcnt_values(void); -enum gpu_dummy_irq { - GPU_DUMMY_JOB_IRQ, - GPU_DUMMY_GPU_IRQ, - GPU_DUMMY_MMU_IRQ -}; +#if MALI_USE_CSF +/** + * gpu_model_prfcnt_dump_request() - Request performance counter sample dump. + * @sample_buf: Pointer to KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE sized array + * in which to store dumped performance counter values. + * @enable_maps: Physical enable maps for performance counter blocks. + */ +void gpu_model_prfcnt_dump_request(uint32_t *sample_buf, struct gpu_model_prfcnt_en enable_maps); -void gpu_device_raise_irq(void *model, - enum gpu_dummy_irq irq); -void gpu_device_set_data(void *model, void *data); -void *gpu_device_get_data(void *model); +/** + * gpu_model_glb_request_job_irq() - Trigger job interrupt with global request + * flag set. + * @model: Model pointer returned by midgard_model_create(). 
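The two prfcnt injection entry points declared above differ only in where the counter values come from: the user-sample variant copies a __user buffer into a temporary allocation before taking the dummy model's spinlock, while the kernel-sample variant (used via kutf) takes 64-bit values directly. The following is only an illustrative sketch of a caller, assuming nothing beyond the two prototypes shown in this hunk; inject_fake_counters() and the buffer contents are invented for illustration and are not part of the driver.

static int inject_fake_counters(struct kbase_device *kbdev, u32 __user *ubuf, u32 ubuf_bytes)
{
	/* 64-bit values injected from kernel context, e.g. a kutf test */
	u64 kvals[4] = { 10, 20, 30, 40 };

	gpu_model_set_dummy_prfcnt_kernel_sample(kvals, sizeof(kvals));

	/*
	 * 32-bit values injected on behalf of user space; per the hunk above
	 * this returns -EINVAL for a NULL/oversized buffer or a failed
	 * copy_from_user(), and -ENOMEM if the temporary buffer allocation
	 * fails.
	 */
	return gpu_model_set_dummy_prfcnt_user_sample(ubuf, ubuf_bytes);
}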
+ */ +void gpu_model_glb_request_job_irq(void *model); +#endif /* MALI_USE_CSF */ extern struct error_status_t hw_error_status; diff --git a/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c b/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c index c91c0d8..f310cc7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_error_generator.c @@ -21,30 +21,29 @@ #include <mali_kbase.h> #include <linux/random.h> -#include "backend/gpu/mali_kbase_model_dummy.h" +#include "backend/gpu/mali_kbase_model_linux.h" -/* all the error conditions supported by the model */ -#define TOTAL_FAULTS 27 -/* maximum number of levels in the MMU translation table tree */ -#define MAX_MMU_TABLE_LEVEL 4 -/* worst case scenario is <1 MMU fault + 1 job fault + 2 GPU faults> */ -#define MAX_CONCURRENT_FAULTS 3 +static struct kbase_error_atom *error_track_list; + +#ifdef CONFIG_MALI_ERROR_INJECT_RANDOM /** Kernel 6.1.0 has dropped prandom_u32(), use get_random_u32() */ #if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) #define prandom_u32 get_random_u32 #endif -static struct kbase_error_atom *error_track_list; - -unsigned int rand_seed; - /*following error probability are set quite high in order to stress the driver*/ -unsigned int error_probability = 50; /* to be set between 0 and 100 */ +static unsigned int error_probability = 50; /* to be set between 0 and 100 */ /* probability to have multiple error give that there is an error */ -unsigned int multiple_error_probability = 50; +static unsigned int multiple_error_probability = 50; + +/* all the error conditions supported by the model */ +#define TOTAL_FAULTS 27 +/* maximum number of levels in the MMU translation table tree */ +#define MAX_MMU_TABLE_LEVEL 4 +/* worst case scenario is <1 MMU fault + 1 job fault + 2 GPU faults> */ +#define MAX_CONCURRENT_FAULTS 3 -#ifdef CONFIG_MALI_ERROR_INJECT_RANDOM /** * gpu_generate_error - Generate GPU error */ diff --git a/mali_kbase/backend/gpu/mali_kbase_model_linux.c b/mali_kbase/backend/gpu/mali_kbase_model_linux.c index 7887cb2..67e00e9 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_linux.c +++ b/mali_kbase/backend/gpu/mali_kbase_model_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010, 2012-2015, 2017-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,12 +20,12 @@ */ /* - * Model interface + * Model Linux Framework interfaces. 
*/ #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> -#include <backend/gpu/mali_kbase_model_dummy.h> + #include "backend/gpu/mali_kbase_model_linux.h" #include "device/mali_kbase_device.h" #include "mali_kbase_irq_internal.h" @@ -95,8 +95,7 @@ static void serve_mmu_irq(struct work_struct *work) if (atomic_cmpxchg(&kbdev->serving_mmu_irq, 1, 0) == 1) { u32 val; - while ((val = kbase_reg_read(kbdev, - MMU_REG(MMU_IRQ_STATUS)))) { + while ((val = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_STATUS)))) { /* Handle the IRQ */ kbase_mmu_interrupt(kbdev, val); } @@ -105,8 +104,7 @@ static void serve_mmu_irq(struct work_struct *work) kmem_cache_free(kbdev->irq_slab, data); } -void gpu_device_raise_irq(void *model, - enum gpu_dummy_irq irq) +void gpu_device_raise_irq(void *model, u32 irq) { struct model_irq_data *data; struct kbase_device *kbdev = gpu_device_get_data(model); @@ -120,15 +118,15 @@ void gpu_device_raise_irq(void *model, data->kbdev = kbdev; switch (irq) { - case GPU_DUMMY_JOB_IRQ: + case MODEL_LINUX_JOB_IRQ: INIT_WORK(&data->work, serve_job_irq); atomic_set(&kbdev->serving_job_irq, 1); break; - case GPU_DUMMY_GPU_IRQ: + case MODEL_LINUX_GPU_IRQ: INIT_WORK(&data->work, serve_gpu_irq); atomic_set(&kbdev->serving_gpu_irq, 1); break; - case GPU_DUMMY_MMU_IRQ: + case MODEL_LINUX_MMU_IRQ: INIT_WORK(&data->work, serve_mmu_irq); atomic_set(&kbdev->serving_mmu_irq, 1); break; @@ -157,7 +155,7 @@ KBASE_EXPORT_TEST_API(kbase_reg_write); u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { unsigned long flags; - u32 val; + u32 val = 0; spin_lock_irqsave(&kbdev->reg_op_lock, flags); midgard_model_read_reg(kbdev->model, offset, &val); @@ -165,22 +163,8 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } - KBASE_EXPORT_TEST_API(kbase_reg_read); -/** - * kbase_is_gpu_removed - Has the GPU been removed. - * @kbdev: Kbase device pointer - * - * This function would return true if the GPU has been removed. - * It is stubbed here - * Return: Always false - */ -bool kbase_is_gpu_removed(struct kbase_device *kbdev) -{ - return false; -} - int kbase_install_interrupts(struct kbase_device *kbdev) { KBASE_DEBUG_ASSERT(kbdev); @@ -239,16 +223,12 @@ KBASE_EXPORT_TEST_API(kbase_gpu_irq_test_handler); int kbase_gpu_device_create(struct kbase_device *kbdev) { - kbdev->model = midgard_model_create(NULL); + kbdev->model = midgard_model_create(kbdev); if (kbdev->model == NULL) return -ENOMEM; - gpu_device_set_data(kbdev->model, kbdev); - spin_lock_init(&kbdev->reg_op_lock); - dev_warn(kbdev->dev, "Using Dummy Model"); - return 0; } diff --git a/mali_kbase/backend/gpu/mali_kbase_model_linux.h b/mali_kbase/backend/gpu/mali_kbase_model_linux.h index dcb2e7c..4cf1235 100644 --- a/mali_kbase/backend/gpu/mali_kbase_model_linux.h +++ b/mali_kbase/backend/gpu/mali_kbase_model_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,13 +20,132 @@ */ /* - * Model interface + * Model Linux Framework interfaces. + * + * This framework is used to provide generic Kbase Models interfaces. + * Note: Backends cannot be used together; the selection is done at build time. 
+ * + * - Without Model Linux Framework: + * +-----------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------+ + * | HW interface definitions | + * +-----------------------------+ + * + * - With Model Linux Framework: + * +-----------------------------+ + * | Kbase read/write/IRQ | + * +-----------------------------+ + * | Model Linux Framework | + * +-----------------------------+ + * | Model interface definitions | + * +-----------------------------+ */ #ifndef _KBASE_MODEL_LINUX_H_ #define _KBASE_MODEL_LINUX_H_ +/* + * Include Model definitions + */ + +#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#include <backend/gpu/mali_kbase_model_dummy.h> +#endif /* IS_ENABLED(CONFIG_MALI_NO_MALI) */ + +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) +/** + * kbase_gpu_device_create() - Generic create function. + * + * @kbdev: Kbase device. + * + * Specific model hook is implemented by midgard_model_create() + * + * Return: 0 on success, error code otherwise. + */ int kbase_gpu_device_create(struct kbase_device *kbdev); + +/** + * kbase_gpu_device_destroy() - Generic create function. + * + * @kbdev: Kbase device. + * + * Specific model hook is implemented by midgard_model_destroy() + */ void kbase_gpu_device_destroy(struct kbase_device *kbdev); -#endif /* _KBASE_MODEL_LINUX_H_ */ +/** + * midgard_model_create() - Private create function. + * + * @kbdev: Kbase device. + * + * This hook is specific to the model built in Kbase. + * + * Return: Model handle. + */ +void *midgard_model_create(struct kbase_device *kbdev); + +/** + * midgard_model_destroy() - Private destroy function. + * + * @h: Model handle. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_destroy(void *h); + +/** + * midgard_model_write_reg() - Private model write function. + * + * @h: Model handle. + * @addr: Address at which to write. + * @value: value to write. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_write_reg(void *h, u32 addr, u32 value); + +/** + * midgard_model_read_reg() - Private model read function. + * + * @h: Model handle. + * @addr: Address from which to read. + * @value: Pointer where to store the read value. + * + * This hook is specific to the model built in Kbase. + */ +void midgard_model_read_reg(void *h, u32 addr, u32 *const value); + +/** + * gpu_device_raise_irq() - Private IRQ raise function. + * + * @model: Model handle. + * @irq: IRQ type to raise. + * + * This hook is global to the model Linux framework. + */ +void gpu_device_raise_irq(void *model, u32 irq); + +/** + * gpu_device_set_data() - Private model set data function. + * + * @model: Model handle. + * @data: Data carried by model. + * + * This hook is global to the model Linux framework. + */ +void gpu_device_set_data(void *model, void *data); + +/** + * gpu_device_get_data() - Private model get data function. + * + * @model: Model handle. + * + * This hook is global to the model Linux framework. + * + * Return: Pointer to the data carried by model. + */ +void *gpu_device_get_data(void *model); +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ + +#endif /* _KBASE_MODEL_LINUX_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_backend.c b/mali_kbase/backend/gpu/mali_kbase_pm_backend.c index 2d52eca..311ce90 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_backend.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_backend.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. 
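The header changes above formalise the split between the generic Model Linux Framework (register read/write plumbing, IRQ workers, device data) and the model backing it. The stand-in below is only a sketch of which hooks a backend is expected to provide and how it signals interrupts; struct toy_model and its trivial behaviour are invented for illustration, and only the hook prototypes and the MODEL_LINUX_*_IRQ values are taken from the hunks above (where the real set/get_data helpers live is not shown in this diff).

struct toy_model {
	void *data;	/* the kbase_device, see gpu_device_{set,get}_data() */
	u32 last_write;
};

void gpu_device_set_data(void *model, void *data)
{
	((struct toy_model *)model)->data = data;
}

void *gpu_device_get_data(void *model)
{
	return ((struct toy_model *)model)->data;
}

void *midgard_model_create(struct kbase_device *kbdev)
{
	struct toy_model *m = kzalloc(sizeof(*m), GFP_KERNEL);

	if (m)
		gpu_device_set_data(m, kbdev);
	return m;
}

void midgard_model_destroy(void *h)
{
	kfree(h);
}

void midgard_model_read_reg(void *h, u32 addr, u32 *const value)
{
	*value = ((struct toy_model *)h)->last_write;
}

void midgard_model_write_reg(void *h, u32 addr, u32 value)
{
	struct toy_model *m = h;

	m->last_write = value;
	/* pretend the write completed some work and raise the job IRQ */
	gpu_device_raise_irq(m, MODEL_LINUX_JOB_IRQ);
}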
+ * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -36,7 +36,7 @@ #include <linux/pm_runtime.h> #include <mali_kbase_reset_gpu.h> #endif /* !MALI_USE_CSF */ -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <backend/gpu/mali_kbase_devfreq.h> #include <mali_kbase_dummy_job_wa.h> @@ -72,10 +72,18 @@ int kbase_pm_runtime_init(struct kbase_device *kbdev) callbacks->power_runtime_idle_callback; kbdev->pm.backend.callback_soft_reset = callbacks->soft_reset_callback; + kbdev->pm.backend.callback_hardware_reset = + callbacks->hardware_reset_callback; kbdev->pm.backend.callback_power_runtime_gpu_idle = callbacks->power_runtime_gpu_idle_callback; kbdev->pm.backend.callback_power_runtime_gpu_active = callbacks->power_runtime_gpu_active_callback; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->pm.backend.callback_power_on_sc_rails = + callbacks->power_on_sc_rails_callback; + kbdev->pm.backend.callback_power_off_sc_rails = + callbacks->power_off_sc_rails_callback; +#endif if (callbacks->power_runtime_init_callback) return callbacks->power_runtime_init_callback(kbdev); @@ -93,8 +101,13 @@ int kbase_pm_runtime_init(struct kbase_device *kbdev) kbdev->pm.backend.callback_power_runtime_off = NULL; kbdev->pm.backend.callback_power_runtime_idle = NULL; kbdev->pm.backend.callback_soft_reset = NULL; + kbdev->pm.backend.callback_hardware_reset = NULL; kbdev->pm.backend.callback_power_runtime_gpu_idle = NULL; kbdev->pm.backend.callback_power_runtime_gpu_active = NULL; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->pm.backend.callback_power_on_sc_rails = NULL; + kbdev->pm.backend.callback_power_off_sc_rails = NULL; +#endif return 0; } @@ -140,7 +153,9 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) KBASE_DEBUG_ASSERT(kbdev != NULL); - mutex_init(&kbdev->pm.lock); + rt_mutex_init(&kbdev->pm.lock); + + kbase_pm_init_event_log(kbdev); kbdev->pm.backend.gpu_poweroff_wait_wq = alloc_workqueue("kbase_pm_poweroff_wait", WQ_HIGHPRI | WQ_UNBOUND, 1); @@ -154,6 +169,7 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) kbdev->pm.backend.gpu_powered = false; kbdev->pm.backend.gpu_ready = false; kbdev->pm.suspending = false; + kbdev->pm.resuming = false; #ifdef CONFIG_MALI_ARBITER_SUPPORT kbase_pm_set_gpu_lost(kbdev, false); #endif @@ -207,6 +223,10 @@ int kbase_hwaccess_pm_init(struct kbase_device *kbdev) !kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TURSEHW_1997) && kbdev->pm.backend.callback_power_runtime_gpu_active && kbdev->pm.backend.callback_power_runtime_gpu_idle; + + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa = + kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TITANHW_2938) && + kbdev->pm.backend.gpu_sleep_supported; #endif if (IS_ENABLED(CONFIG_MALI_HW_ERRATA_1485982_NOT_AFFECTED)) { @@ -422,8 +442,7 @@ static void kbase_pm_l2_clock_slow(struct kbase_device *kbdev) return; /* Stop the metrics gathering framework */ - if (kbase_pm_metrics_is_active(kbdev)) - kbase_pm_metrics_stop(kbdev); + kbase_pm_metrics_stop(kbdev); /* Keep the current freq to restore it upon resume */ kbdev->previous_frequency = clk_get_rate(clk); @@ -576,11 +595,13 @@ static int kbase_pm_do_poweroff_sync(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; unsigned long flags; - int ret = 0; + int ret; WARN_ON(kbdev->pm.active_count); - 
kbase_pm_wait_for_poweroff_work_complete(kbdev); + ret = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (ret) + return ret; kbase_pm_lock(kbdev); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -665,25 +686,6 @@ unlock_hwaccess: spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -static bool is_poweroff_in_progress(struct kbase_device *kbdev) -{ - bool ret; - unsigned long flags; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - ret = (kbdev->pm.backend.poweroff_wait_in_progress == false); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - return ret; -} - -void kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev) -{ - wait_event_killable(kbdev->pm.backend.poweroff_wait, - is_poweroff_in_progress(kbdev)); -} -KBASE_EXPORT_TEST_API(kbase_pm_wait_for_poweroff_work_complete); - /** * is_gpu_powered_down - Check whether GPU is powered down * @@ -807,9 +809,9 @@ void kbase_hwaccess_pm_halt(struct kbase_device *kbdev) #if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) WARN_ON(kbase_pm_do_poweroff_sync(kbdev)); #else - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_do_poweroff(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); kbase_pm_wait_for_poweroff_work_complete(kbdev); #endif @@ -865,7 +867,7 @@ void kbase_pm_power_changed(struct kbase_device *kbdev) kbase_pm_update_state(kbdev); #if !MALI_USE_CSF - kbase_backend_slot_update(kbdev); + kbase_backend_slot_update(kbdev); #endif /* !MALI_USE_CSF */ spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -937,7 +939,13 @@ int kbase_hwaccess_pm_suspend(struct kbase_device *kbdev) kbase_pm_unlock(kbdev); - kbase_pm_wait_for_poweroff_work_complete(kbdev); + ret = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (ret) { +#if !MALI_USE_CSF + kbase_backend_timer_resume(kbdev); +#endif /* !MALI_USE_CSF */ + return ret; + } #endif WARN_ON(kbdev->pm.backend.gpu_powered); @@ -953,6 +961,8 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) { kbase_pm_lock(kbdev); + /* System resume callback has begun */ + kbdev->pm.resuming = true; kbdev->pm.suspending = false; #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbase_pm_is_gpu_lost(kbdev)) { @@ -967,7 +977,6 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) kbase_backend_timer_resume(kbdev); #endif /* !MALI_USE_CSF */ - wake_up_all(&kbdev->pm.resume_wait); kbase_pm_unlock(kbdev); } @@ -975,13 +984,13 @@ void kbase_hwaccess_pm_resume(struct kbase_device *kbdev) void kbase_pm_handle_gpu_lost(struct kbase_device *kbdev) { unsigned long flags; - ktime_t end_timestamp = ktime_get(); + ktime_t end_timestamp = ktime_get_raw(); struct kbase_arbiter_vm_state *arb_vm_state = kbdev->pm.arb_vm_state; if (!kbdev->arb.arb_if) return; - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); mutex_lock(&arb_vm_state->vm_state_lock); if (kbdev->pm.backend.gpu_powered && !kbase_pm_is_gpu_lost(kbdev)) { @@ -1021,7 +1030,7 @@ void kbase_pm_handle_gpu_lost(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->hwcnt.lock, flags); } mutex_unlock(&arb_vm_state->vm_state_lock); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } #endif /* CONFIG_MALI_ARBITER_SUPPORT */ @@ -1050,6 +1059,7 @@ static int pm_handle_mcu_sleep_on_runtime_suspend(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->csf.scheduler.lock); lockdep_assert_held(&kbdev->pm.lock); +#ifdef CONFIG_MALI_DEBUG /* In case of no active CSG on slot, powering up L2 could be skipped and * proceed directly to suspend GPU. 
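A knock-on effect of the power-off rework in the hunks above is that kbase_pm_wait_for_poweroff_work_complete() now reports failure instead of waiting indefinitely, so callers have to propagate the error (the suspend path above also re-arms the JM backend timer before bailing out). A hedged sketch of the calling pattern only; do_suspend_prep() is a hypothetical caller, not a function in this tree.

static int do_suspend_prep(struct kbase_device *kbdev)
{
	int ret = kbase_pm_wait_for_poweroff_work_complete(kbdev);

	if (ret) {
		/* Power-off work did not complete in time; undo any
		 * suspend preparation and report the error upwards.
		 */
		return ret;
	}

	WARN_ON(kbdev->pm.backend.gpu_powered);
	return 0;
}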
* ToDo: firmware has to be reloaded after wake-up as no halt command @@ -1059,6 +1069,7 @@ static int pm_handle_mcu_sleep_on_runtime_suspend(struct kbase_device *kbdev) dev_info( kbdev->dev, "No active CSGs. Can skip the power up of L2 and go for suspension directly"); +#endif ret = kbase_pm_force_mcu_wakeup_after_sleep(kbdev); if (ret) { diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_ca.c b/mali_kbase/backend/gpu/mali_kbase_pm_ca.c index 7d14be9..b02f77f 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_ca.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_ca.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2013-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,9 +26,7 @@ #include <mali_kbase.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ +#include <backend/gpu/mali_kbase_model_linux.h> #include <mali_kbase_dummy_job_wa.h> int kbase_pm_ca_init(struct kbase_device *kbdev) @@ -92,29 +90,10 @@ void kbase_devfreq_set_core_mask(struct kbase_device *kbdev, u64 core_mask) * for those cores to get powered down */ if ((core_mask & old_core_mask) != old_core_mask) { - bool can_wait; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - can_wait = kbdev->pm.backend.gpu_ready && kbase_pm_is_mcu_desired(kbdev); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* This check is ideally not required, the wait function can - * deal with the GPU power down. But it has been added to - * address the scenario where down-scaling request comes from - * the platform specific code soon after the GPU power down - * and at the time same time application thread tries to - * power up the GPU (on the flush of GPU queue). - * The platform specific @ref callback_power_on that gets - * invoked on power up does not return until down-scaling - * request is complete. The check mitigates the race caused by - * the problem in platform specific code. - */ - if (likely(can_wait)) { - if (kbase_pm_wait_for_desired_state(kbdev)) { - dev_warn(kbdev->dev, - "Wait for update of core_mask from %llx to %llx failed", - old_core_mask, core_mask); - } + if (kbase_pm_wait_for_cores_down_scale(kbdev)) { + dev_warn(kbdev->dev, + "Wait for update of core_mask from %llx to %llx failed", + old_core_mask, core_mask); } } #endif diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_defs.h b/mali_kbase/backend/gpu/mali_kbase_pm_defs.h index 80da093..ad49019 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_defs.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -136,7 +136,7 @@ struct kbasep_pm_metrics { * or removed from a GPU slot. * @active_cl_ctx: number of CL jobs active on the GPU. Array is per-device. * @active_gl_ctx: number of GL jobs active on the GPU. Array is per-slot. 
- * @lock: spinlock protecting the kbasep_pm_metrics_data structure + * @lock: spinlock protecting the kbasep_pm_metrics_state structure * @platform_data: pointer to data controlled by platform specific code * @kbdev: pointer to kbase device for which metrics are collected * @values: The current values of the power management metrics. The @@ -145,7 +145,7 @@ struct kbasep_pm_metrics { * @initialized: tracks whether metrics_state has been initialized or not. * @timer: timer to regularly make DVFS decisions based on the power * management metrics. - * @timer_active: boolean indicating @timer is running + * @timer_state: atomic indicating current @timer state, on, off, or stopped. * @dvfs_last: values of the PM metrics from the last DVFS tick * @dvfs_diff: different between the current and previous PM metrics. */ @@ -169,7 +169,7 @@ struct kbasep_pm_metrics_state { #ifdef CONFIG_MALI_MIDGARD_DVFS bool initialized; struct hrtimer timer; - bool timer_active; + atomic_t timer_state; struct kbasep_pm_metrics dvfs_last; struct kbasep_pm_metrics dvfs_diff; #endif @@ -215,6 +215,60 @@ union kbase_pm_policy_data { }; /** + * enum kbase_pm_log_event_type - The types of core in a GPU. + * + * @KBASE_PM_LOG_EVENT_NONE: an unused log event, default state at + * initialization. Carries no data. + * @KBASE_PM_LOG_EVENT_SHADERS_STATE: a transition of the JM shader state + * machine. .state is populated. + * @KBASE_PM_LOG_EVENT_L2_STATE: a transition of the L2 state machine. + * .state is populated. + * @KBASE_PM_LOG_EVENT_MCU_STATE: a transition of the MCU state machine. + * .state is populated. + * @KBASE_PM_LOG_EVENT_CORES: a transition of core availability. + * .cores is populated. + * + * Each event log event has a type which determines the data it carries. + */ +enum kbase_pm_log_event_type { + KBASE_PM_LOG_EVENT_NONE = 0, + KBASE_PM_LOG_EVENT_SHADERS_STATE, + KBASE_PM_LOG_EVENT_L2_STATE, + KBASE_PM_LOG_EVENT_MCU_STATE, + KBASE_PM_LOG_EVENT_CORES +}; + +/** + * struct kbase_pm_event_log_event - One event in the PM log. + * + * @type: The type of the event, from &enum kbase_pm_log_event_type. + * @timestamp: The time the log event was generated. + **/ +struct kbase_pm_event_log_event { + u8 type; + ktime_t timestamp; + union { + struct { + u8 next; + u8 prev; + } state; + struct { + u64 l2; + u64 shader; + u64 tiler; + u64 stack; + } cores; + }; +}; + +#define EVENT_LOG_MAX (PAGE_SIZE / sizeof(struct kbase_pm_event_log_event)) + +struct kbase_pm_event_log { + u32 last_event; + struct kbase_pm_event_log_event events[EVENT_LOG_MAX]; +}; + +/** * struct kbase_pm_backend_data - Data stored per device for power management. * * @pm_current_policy: The policy that is currently actively controlling the @@ -279,6 +333,8 @@ union kbase_pm_policy_data { * &struct kbase_pm_callback_conf * @callback_soft_reset: Optional callback to software reset the GPU. See * &struct kbase_pm_callback_conf + * @callback_hardware_reset: Optional callback to hardware reset the GPU. See + * &struct kbase_pm_callback_conf * @callback_power_runtime_gpu_idle: Callback invoked by Kbase when GPU has * become idle. * See &struct kbase_pm_callback_conf. @@ -286,7 +342,13 @@ union kbase_pm_policy_data { * @callback_power_runtime_gpu_idle was * called previously. * See &struct kbase_pm_callback_conf. + * @callback_power_on_sc_rails: Callback invoked to turn on the shader core + * power rails. See &struct kbase_pm_callback_conf. + * @callback_power_off_sc_rails: Callback invoked to turn off the shader core + * power rails. 
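The event log introduced above is a fixed-size ring holding one page worth of entries, with @last_event tracking the most recently written slot; the state machines later in this diff obtain a slot via kbase_pm_add_log_event() and fill in the union. That helper comes from mali_kbase_pm_event_log.h, which is not part of this hunk, so the following is only a plausible sketch under the assumption that last_event wraps modulo EVENT_LOG_MAX; the _sketch suffix marks it as illustrative rather than the in-tree implementation.

static struct kbase_pm_event_log_event *
kbase_pm_add_log_event_sketch(struct kbase_device *kbdev)
{
	struct kbase_pm_event_log *log = &kbdev->pm.backend.event_log;
	struct kbase_pm_event_log_event *event;

	/* Call sites in this diff appear to run under hwaccess_lock, so no
	 * extra locking is added here (an assumption, not shown in the hunk).
	 */
	log->last_event = (log->last_event + 1) % EVENT_LOG_MAX;
	event = &log->events[log->last_event];

	memset(event, 0, sizeof(*event));
	event->timestamp = ktime_get();
	return event;
}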
See &struct kbase_pm_callback_conf. * @ca_cores_enabled: Cores that are currently available + * @apply_hw_issue_TITANHW_2938_wa: Indicates if the workaround for BASE_HW_ISSUE_TITANHW_2938 + * needs to be applied when unmapping memory from GPU. * @mcu_state: The current state of the micro-control unit, only applicable * to GPUs that have such a component * @l2_state: The current state of the L2 cache state machine. See @@ -391,6 +453,7 @@ union kbase_pm_policy_data { * work function, kbase_pm_gpu_clock_control_worker. * @gpu_clock_control_work: work item to set GPU clock during L2 power cycle * using gpu_clock_control + * @event_log: data for the always-on event log * * This structure contains data for the power management framework. There is one * instance of this structure per device in the system. @@ -444,12 +507,18 @@ struct kbase_pm_backend_data { void (*callback_power_runtime_off)(struct kbase_device *kbdev); int (*callback_power_runtime_idle)(struct kbase_device *kbdev); int (*callback_soft_reset)(struct kbase_device *kbdev); + void (*callback_hardware_reset)(struct kbase_device *kbdev); void (*callback_power_runtime_gpu_idle)(struct kbase_device *kbdev); void (*callback_power_runtime_gpu_active)(struct kbase_device *kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + void (*callback_power_on_sc_rails)(struct kbase_device *kbdev); + void (*callback_power_off_sc_rails)(struct kbase_device *kbdev); +#endif u64 ca_cores_enabled; #if MALI_USE_CSF + bool apply_hw_issue_TITANHW_2938_wa; enum kbase_mcu_state mcu_state; #endif enum kbase_l2_core_state l2_state; @@ -463,6 +532,11 @@ struct kbase_pm_backend_data { struct mutex policy_change_lock; struct workqueue_struct *core_idle_wq; struct work_struct core_idle_work; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct work_struct sc_rails_on_work; + bool sc_power_rails_off; + bool sc_pwroff_safe; +#endif #ifdef KBASE_PM_RUNTIME bool gpu_sleep_supported; @@ -496,10 +570,12 @@ struct kbase_pm_backend_data { bool gpu_clock_slow_down_desired; bool gpu_clock_slowed_down; struct work_struct gpu_clock_control_work; + + struct kbase_pm_event_log event_log; }; #if MALI_USE_CSF -/* CSF PM flag, signaling that the MCU CORE should be kept on */ +/* CSF PM flag, signaling that the MCU shader Core should be kept on */ #define CSF_DYNAMIC_PM_CORE_KEEP_ON (1 << 0) /* CSF PM flag, signaling no scheduler suspension on idle groups */ #define CSF_DYNAMIC_PM_SCHED_IGNORE_IDLE (1 << 1) diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_driver.c b/mali_kbase/backend/gpu/mali_kbase_pm_driver.c index 240c31a..7c891c1 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_driver.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_driver.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -39,20 +39,18 @@ #include <mali_kbase_reset_gpu.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_pbha.h> #include <backend/gpu/mali_kbase_cache_policy_backend.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <backend/gpu/mali_kbase_l2_mmu_config.h> +#include <backend/gpu/mali_kbase_pm_event_log.h> #include <mali_kbase_dummy_job_wa.h> #ifdef CONFIG_MALI_ARBITER_SUPPORT #include <arbiter/mali_kbase_arbiter_pm.h> #endif /* CONFIG_MALI_ARBITER_SUPPORT */ -#if MALI_USE_CSF -#include <csf/ipa_control/mali_kbase_csf_ipa_control.h> -#endif #if MALI_USE_CSF #include <linux/delay.h> @@ -148,9 +146,9 @@ bool kbase_pm_is_l2_desired(struct kbase_device *kbdev) if (unlikely(kbdev->pm.backend.policy_change_clamp_state_to_off)) return false; - /* Power up the L2 cache only when MCU is desired */ - if (likely(kbdev->csf.firmware_inited)) - return kbase_pm_is_mcu_desired(kbdev); + /* We need to power up the L2 when the MCU is desired */ + if (kbase_pm_is_mcu_desired(kbdev)) + return true; #endif return kbdev->pm.backend.l2_desired; @@ -538,6 +536,14 @@ static void kbase_pm_l2_config_override(struct kbase_device *kbdev) if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_L2_CONFIG)) return; +#if MALI_USE_CSF + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) { + val = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_CONFIG)); + kbase_reg_write(kbdev, GPU_CONTROL_REG(L2_CONFIG), + L2_CONFIG_PBHA_HWU_SET(val, kbdev->pbha_propagate_bits)); + } +#endif /* MALI_USE_CSF */ + /* * Skip if size and hash are not given explicitly, * which means default values are used. @@ -599,6 +605,21 @@ static const char *kbase_mcu_state_to_string(enum kbase_mcu_state state) return strings[state]; } +static +void kbase_ktrace_log_mcu_state(struct kbase_device *kbdev, enum kbase_mcu_state state) +{ +#if KBASE_KTRACE_ENABLE + switch (state) { +#define KBASEP_MCU_STATE(n) \ + case KBASE_MCU_ ## n: \ + KBASE_KTRACE_ADD(kbdev, PM_MCU_ ## n, NULL, state); \ + break; +#include "mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE + } +#endif +} + static inline bool kbase_pm_handle_mcu_core_attr_update(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; @@ -655,8 +676,39 @@ static void kbase_pm_enable_mcu_db_notification(struct kbase_device *kbdev) val &= ~MCU_CNTRL_DOORBELL_DISABLE_MASK; kbase_reg_write(kbdev, GPU_CONTROL_REG(MCU_CONTROL), val); } -#endif +/** + * wait_mcu_as_inactive - Wait for AS used by MCU FW to get configured + * + * @kbdev: Pointer to the device. + * + * This function is called to wait for the AS used by MCU FW to get configured + * before DB notification on MCU is enabled, as a workaround for HW issue. 
+ */ +static void wait_mcu_as_inactive(struct kbase_device *kbdev) +{ + unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (!kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_TURSEHW_2716)) + return; + + /* Wait for the AS_ACTIVE_INT bit to become 0 for the AS used by MCU FW */ + while (--max_loops && + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) & + AS_STATUS_AS_ACTIVE_INT) + ; + + if (!WARN_ON_ONCE(max_loops == 0)) + return; + + dev_err(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d used by MCU FW", MCU_AS_NR); + + if (kbase_prepare_to_reset_gpu(kbdev, 0)) + kbase_reset_gpu(kbdev); +} +#endif /** * kbasep_pm_toggle_power_interrupt - Toggles the IRQ mask for power interrupts @@ -665,10 +717,10 @@ static void kbase_pm_enable_mcu_db_notification(struct kbase_device *kbdev) * @kbdev: Pointer to the device * @enable: boolean indicating to enable interrupts or not * - * The POWER_CHANGED_ALL and POWER_CHANGED_SINGLE interrupts can be disabled - * after L2 has been turned on when FW is controlling the power for the shader - * cores. Correspondingly, the interrupts can be re-enabled after the MCU has - * been disabled before the power down of L2. + * The POWER_CHANGED_ALL interrupt can be disabled after L2 has been turned on + * when FW is controlling the power for the shader cores. Correspondingly, the + * interrupts can be re-enabled after the MCU has been disabled before the + * power down of L2. */ static void kbasep_pm_toggle_power_interrupt(struct kbase_device *kbdev, bool enable) { @@ -678,10 +730,16 @@ static void kbasep_pm_toggle_power_interrupt(struct kbase_device *kbdev, bool en irq_mask = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)); - if (enable) - irq_mask |= POWER_CHANGED_ALL | POWER_CHANGED_SINGLE; - else - irq_mask &= ~(POWER_CHANGED_ALL | POWER_CHANGED_SINGLE); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* For IFPO, we require the POWER_CHANGED_ALL interrupt to be always on */ + enable = true; +#endif + if (enable) { + irq_mask |= POWER_CHANGED_ALL; + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), POWER_CHANGED_ALL); + } else { + irq_mask &= ~POWER_CHANGED_ALL; + } kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), irq_mask); } @@ -742,12 +800,31 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->shaders_desired_mask; backend->pm_shaders_core_mask = 0; if (kbdev->csf.firmware_hctl_core_pwr) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* On rail up, this state machine will be re-invoked */ + if (backend->sc_power_rails_off) { + /* The work should already be queued or executing */ + WARN_ON(!work_busy(&backend->sc_rails_on_work)); + break; + } +#endif kbase_pm_invoke(kbdev, KBASE_PM_CORE_SHADER, backend->shaders_avail, ACTION_PWRON); backend->mcu_state = KBASE_MCU_HCTL_SHADERS_PEND_ON; } else backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + backend->mcu_state = KBASE_MCU_CORESIGHT_ENABLE; + } else if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) { + backend->mcu_state = KBASE_MCU_CORESIGHT_ENABLE; + } +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ } break; @@ -776,8 +853,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) unsigned long flags; kbase_csf_scheduler_spin_lock(kbdev, 
&flags); - kbase_hwcnt_context_enable( - kbdev->hwcnt_gpu_ctx); + kbase_hwcnt_context_enable(kbdev->hwcnt_gpu_ctx); kbase_csf_scheduler_spin_unlock(kbdev, flags); backend->hwcnt_disabled = false; } @@ -798,9 +874,19 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->mcu_state = KBASE_MCU_HCTL_MCU_ON_RECHECK; } - } else if (kbase_pm_handle_mcu_core_attr_update(kbdev)) { + } else if (kbase_pm_handle_mcu_core_attr_update(kbdev)) backend->mcu_state = KBASE_MCU_ON_CORE_ATTR_UPDATE_PEND; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + else if (kbdev->csf.coresight.disable_on_pmode_enter) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED); + backend->mcu_state = KBASE_MCU_ON_PMODE_ENTER_CORESIGHT_DISABLE; + } else if (kbdev->csf.coresight.enable_on_pmode_exit) { + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + backend->mcu_state = KBASE_MCU_ON_PMODE_EXIT_CORESIGHT_ENABLE; } +#endif break; case KBASE_MCU_HCTL_MCU_ON_RECHECK: @@ -891,12 +977,46 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) #ifdef KBASE_PM_RUNTIME if (backend->gpu_sleep_mode_active) backend->mcu_state = KBASE_MCU_ON_SLEEP_INITIATE; - else + else { #endif backend->mcu_state = KBASE_MCU_ON_HALT; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_state_request( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED); + backend->mcu_state = KBASE_MCU_CORESIGHT_DISABLE; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + } } break; +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + case KBASE_MCU_ON_PMODE_ENTER_CORESIGHT_DISABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) { + backend->mcu_state = KBASE_MCU_ON; + kbdev->csf.coresight.disable_on_pmode_enter = false; + } + break; + case KBASE_MCU_ON_PMODE_EXIT_CORESIGHT_ENABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) { + backend->mcu_state = KBASE_MCU_ON; + kbdev->csf.coresight.enable_on_pmode_exit = false; + } + break; + case KBASE_MCU_CORESIGHT_DISABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_DISABLED)) + backend->mcu_state = KBASE_MCU_ON_HALT; + break; + + case KBASE_MCU_CORESIGHT_ENABLE: + if (kbase_debug_coresight_csf_state_check( + kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) + backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; + break; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + case KBASE_MCU_ON_HALT: if (!kbase_pm_is_mcu_desired(kbdev)) { kbase_csf_firmware_trigger_mcu_halt(kbdev); @@ -907,7 +1027,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_ON_PEND_HALT: if (kbase_csf_firmware_mcu_halted(kbdev)) { - KBASE_KTRACE_ADD(kbdev, MCU_HALTED, NULL, + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_MCU_HALTED, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); if (kbdev->csf.firmware_hctl_core_pwr) backend->mcu_state = @@ -954,7 +1074,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_ON_PEND_SLEEP: if (kbase_csf_firmware_is_mcu_in_sleep(kbdev)) { - KBASE_KTRACE_ADD(kbdev, MCU_IN_SLEEP, NULL, + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_MCU_SLEEP, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); backend->mcu_state = KBASE_MCU_IN_SLEEP; kbase_pm_enable_db_mirror_interrupt(kbdev); @@ -970,6 +1090,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) case KBASE_MCU_IN_SLEEP: if (kbase_pm_is_mcu_desired(kbdev) && backend->l2_state == KBASE_L2_ON) { + wait_mcu_as_inactive(kbdev); 
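The error path of wait_mcu_as_inactive() shown a little earlier is easy to misread: WARN_ON_ONCE() evaluates to its condition, so the early return fires when the loop budget was not exhausted, and the fall-through (error log plus GPU reset) runs only when AS_ACTIVE_INT never cleared. Below is an equivalent, more explicit shape of the same bounded-poll-then-reset pattern; the identifiers are reused from the hunk above purely to illustrate the control flow, this is not a proposed change to the driver.

	while (--max_loops &&
	       (kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) &
		AS_STATUS_AS_ACTIVE_INT))
		;

	if (max_loops != 0)
		return;	/* bit cleared in time, nothing to do */

	WARN_ON_ONCE(1);
	dev_err(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d used by MCU FW", MCU_AS_NR);
	if (kbase_prepare_to_reset_gpu(kbdev, 0))
		kbase_reset_gpu(kbdev);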
KBASE_TLSTREAM_TL_KBASE_CSFFW_FW_REQUEST_WAKEUP( kbdev, kbase_backend_get_cycle_cnt(kbdev)); kbase_pm_enable_mcu_db_notification(kbdev); @@ -980,6 +1101,7 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) if (!kbdev->csf.firmware_hctl_core_pwr) kbasep_pm_toggle_power_interrupt(kbdev, false); backend->mcu_state = KBASE_MCU_ON_HWCNT_ENABLE; + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } break; #endif @@ -987,6 +1109,11 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) /* Reset complete */ if (!backend->in_reset) backend->mcu_state = KBASE_MCU_OFF; + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ break; default: @@ -994,10 +1121,18 @@ static int kbase_pm_mcu_update_state(struct kbase_device *kbdev) backend->mcu_state); } - if (backend->mcu_state != prev_state) + if (backend->mcu_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_MCU_STATE; + event->state.prev = prev_state; + event->state.next = backend->mcu_state; + dev_dbg(kbdev->dev, "MCU state transition: %s to %s\n", kbase_mcu_state_to_string(prev_state), kbase_mcu_state_to_string(backend->mcu_state)); + kbase_ktrace_log_mcu_state(kbdev, backend->mcu_state); + } } while (backend->mcu_state != prev_state); @@ -1032,6 +1167,31 @@ static void core_idle_worker(struct work_struct *work) } #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void sc_rails_on_worker(struct work_struct *work) +{ + struct kbase_device *kbdev = + container_of(work, struct kbase_device, pm.backend.sc_rails_on_work); + unsigned long flags; + + /* + * Intentionally not synchronized using the scheduler.lock, as the scheduler may be waiting + * on the SC rail to power up + */ + kbase_pm_lock(kbdev); + + kbase_pm_turn_on_sc_power_rails_locked(kbdev); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Push the state machine forward in case it was waiting on SC rail power up */ + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_pm_unlock(kbdev); +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + + static const char *kbase_l2_core_state_to_string(enum kbase_l2_core_state state) { const char *const strings[] = { @@ -1045,6 +1205,21 @@ static const char *kbase_l2_core_state_to_string(enum kbase_l2_core_state state) return strings[state]; } +static +void kbase_ktrace_log_l2_core_state(struct kbase_device *kbdev, enum kbase_l2_core_state state) +{ +#if KBASE_KTRACE_ENABLE + switch (state) { +#define KBASEP_L2_STATE(n) \ + case KBASE_L2_ ## n: \ + KBASE_KTRACE_ADD(kbdev, PM_L2_ ## n, NULL, state); \ + break; +#include "mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE + } +#endif +} + #if !MALI_USE_CSF /* On powering on the L2, the tracked kctx becomes stale and can be cleared. * This enables the backend to spare the START_FLUSH.INV_SHADER_OTHER @@ -1062,13 +1237,82 @@ static void kbase_pm_l2_clear_backend_slot_submit_kctx(struct kbase_device *kbde } #endif +/* wait_as_active_int - Wait for AS_ACTIVE_INT bits to become 0 for all AS + * + * @kbdev: Pointer to the device. + * + * This function is supposed to be called before the write to L2_PWROFF register + * to wait for AS_ACTIVE_INT bit to become 0 for all the GPU address space slots. 
+ * AS_ACTIVE_INT bit can become 1 for an AS, only when L2_READY becomes 1, based + * on the value in TRANSCFG register and would become 0 once AS has been reconfigured. + */ +static void wait_as_active_int(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) + int as_no; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (!kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_GPU2019_3878)) + return; + + for (as_no = 0; as_no != kbdev->nr_hw_address_spaces; as_no++) { + unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + + /* Wait for the AS_ACTIVE_INT bit to become 0 for the AS. + * The wait is actually needed only for the enabled AS. + */ + while (--max_loops && + kbase_reg_read(kbdev, MMU_AS_REG(as_no, AS_STATUS)) & + AS_STATUS_AS_ACTIVE_INT) + ; + +#ifdef CONFIG_MALI_DEBUG + /* For a disabled AS the loop should run for a single iteration only. */ + if (!kbdev->as_to_kctx[as_no] && (max_loops != (KBASE_AS_INACTIVE_MAX_LOOPS -1))) + dev_warn(kbdev->dev, "AS_ACTIVE_INT bit found to be set for disabled AS %d", as_no); +#endif + + if (max_loops) + continue; + + dev_warn(kbdev->dev, "AS_ACTIVE_INT bit stuck for AS %d", as_no); + + if (kbase_prepare_to_reset_gpu(kbdev, 0)) + kbase_reset_gpu(kbdev); + return; + } +#endif +} + static bool can_power_down_l2(struct kbase_device *kbdev) { #if MALI_USE_CSF /* Due to the HW issue GPU2019-3878, need to prevent L2 power off * whilst MMU command is in progress. + * Also defer the power-down if MMU is in process of page migration. */ - return !kbdev->mmu_hw_operation_in_progress; + return !kbdev->mmu_hw_operation_in_progress && !kbdev->mmu_page_migrate_in_progress; +#else + return !kbdev->mmu_page_migrate_in_progress; +#endif +} + +static bool can_power_up_l2(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Avoiding l2 transition if MMU is undergoing page migration */ + return !kbdev->mmu_page_migrate_in_progress; +} + +static bool need_tiler_control(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF + if (kbase_pm_no_mcu_core_pwroff(kbdev)) + return true; + else + return false; #else return true; #endif @@ -1078,9 +1322,8 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) { struct kbase_pm_backend_data *backend = &kbdev->pm.backend; u64 l2_present = kbdev->gpu_props.curr_config.l2_present; -#if !MALI_USE_CSF u64 tiler_present = kbdev->gpu_props.props.raw_props.tiler_present; -#endif + bool l2_power_up_done; enum kbase_l2_core_state prev_state; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1092,23 +1335,12 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) u64 l2_ready = kbase_pm_get_ready_cores(kbdev, KBASE_PM_CORE_L2); -#if !MALI_USE_CSF - u64 tiler_trans = kbase_pm_get_trans_cores(kbdev, - KBASE_PM_CORE_TILER); - u64 tiler_ready = kbase_pm_get_ready_cores(kbdev, - KBASE_PM_CORE_TILER); -#endif - +#ifdef CONFIG_MALI_ARBITER_SUPPORT /* * kbase_pm_get_ready_cores and kbase_pm_get_trans_cores * are vulnerable to corruption if gpu is lost */ - if (kbase_is_gpu_removed(kbdev) -#ifdef CONFIG_MALI_ARBITER_SUPPORT - || kbase_pm_is_gpu_lost(kbdev)) { -#else - ) { -#endif + if (kbase_is_gpu_removed(kbdev) || kbase_pm_is_gpu_lost(kbdev)) { backend->shaders_state = KBASE_SHADERS_OFF_CORESTACK_OFF; backend->hwcnt_desired = false; @@ -1122,41 +1354,59 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) */ backend->l2_state = KBASE_L2_ON_HWCNT_DISABLE; + KBASE_KTRACE_ADD(kbdev, PM_L2_ON_HWCNT_DISABLE, NULL, + backend->l2_state); 
kbase_pm_trigger_hwcnt_disable(kbdev); } if (backend->hwcnt_disabled) { backend->l2_state = KBASE_L2_OFF; + KBASE_KTRACE_ADD(kbdev, PM_L2_OFF, NULL, backend->l2_state); dev_dbg(kbdev->dev, "GPU lost has occurred - L2 off\n"); } break; } +#endif /* mask off ready from trans in case transitions finished * between the register reads */ l2_trans &= ~l2_ready; -#if !MALI_USE_CSF - tiler_trans &= ~tiler_ready; -#endif + prev_state = backend->l2_state; switch (backend->l2_state) { case KBASE_L2_OFF: - if (kbase_pm_is_l2_desired(kbdev)) { + if (kbase_pm_is_l2_desired(kbdev) && can_power_up_l2(kbdev)) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + // Workaround: give a short pause here before starting L2 transition. + udelay(200); + /* Enable HW timer of IPA control before + * L2 cache is powered-up. + */ + kbase_ipa_control_handle_gpu_sleep_exit(kbdev); +#endif /* * Set the desired config for L2 before * powering it on */ kbase_pm_l2_config_override(kbdev); kbase_pbha_write_settings(kbdev); -#if !MALI_USE_CSF - /* L2 is required, power on. Powering on the - * tiler will also power the first L2 cache. - */ - kbase_pm_invoke(kbdev, KBASE_PM_CORE_TILER, - tiler_present, ACTION_PWRON); + /* If Host is controlling the power for shader + * cores, then it also needs to control the + * power for Tiler. + * Powering on the tiler will also power the + * L2 cache. + */ + if (need_tiler_control(kbdev)) { + kbase_pm_invoke(kbdev, KBASE_PM_CORE_TILER, tiler_present, + ACTION_PWRON); + } else { + kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, l2_present, + ACTION_PWRON); + } +#if !MALI_USE_CSF /* If we have more than one L2 cache then we * must power them on explicitly. */ @@ -1166,30 +1416,34 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) ACTION_PWRON); /* Clear backend slot submission kctx */ kbase_pm_l2_clear_backend_slot_submit_kctx(kbdev); -#else - /* With CSF firmware, Host driver doesn't need to - * handle power management with both shader and tiler cores. - * The CSF firmware will power up the cores appropriately. - * So only power the l2 cache explicitly. - */ - kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, - l2_present, ACTION_PWRON); #endif backend->l2_state = KBASE_L2_PEND_ON; } break; case KBASE_L2_PEND_ON: -#if !MALI_USE_CSF - if (!l2_trans && l2_ready == l2_present && !tiler_trans - && tiler_ready == tiler_present) { - KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_TILER, NULL, - tiler_ready); -#else + l2_power_up_done = false; if (!l2_trans && l2_ready == l2_present) { - KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_L2, NULL, - l2_ready); -#endif + if (need_tiler_control(kbdev)) { + u64 tiler_trans = kbase_pm_get_trans_cores( + kbdev, KBASE_PM_CORE_TILER); + u64 tiler_ready = kbase_pm_get_ready_cores( + kbdev, KBASE_PM_CORE_TILER); + tiler_trans &= ~tiler_ready; + + if (!tiler_trans && tiler_ready == tiler_present) { + KBASE_KTRACE_ADD(kbdev, + PM_CORES_CHANGE_AVAILABLE_TILER, + NULL, tiler_ready); + l2_power_up_done = true; + } + } else { + KBASE_KTRACE_ADD(kbdev, PM_CORES_CHANGE_AVAILABLE_L2, NULL, + l2_ready); + l2_power_up_done = true; + } + } + if (l2_power_up_done) { /* * Ensure snoops are enabled after L2 is powered * up. 
Note that kbase keeps track of the snoop @@ -1356,14 +1610,15 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) if (kbase_pm_is_l2_desired(kbdev)) backend->l2_state = KBASE_L2_PEND_ON; else if (can_power_down_l2(kbdev)) { - if (!backend->l2_always_on) + if (!backend->l2_always_on) { + wait_as_active_int(kbdev); /* Powering off the L2 will also power off the * tiler. */ kbase_pm_invoke(kbdev, KBASE_PM_CORE_L2, l2_present, ACTION_PWROFF); - else + } else /* If L2 cache is powered then we must flush it * before we power off the GPU. Normally this * would have been handled when the L2 was @@ -1385,12 +1640,26 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) /* We only need to check the L2 here - if the L2 * is off then the tiler is definitely also off. */ - if (!l2_trans && !l2_ready) + if (!l2_trans && !l2_ready) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + /* Allow clock gating within the GPU and prevent it + * from being seen as active during sleep. + */ + kbase_ipa_control_handle_gpu_sleep_enter(kbdev); +#endif /* L2 is now powered off */ backend->l2_state = KBASE_L2_OFF; + } } else { - if (!kbdev->cache_clean_in_progress) + if (!kbdev->cache_clean_in_progress) { +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + /* Allow clock gating within the GPU and prevent it + * from being seen as active during sleep. + */ + kbase_ipa_control_handle_gpu_sleep_enter(kbdev); +#endif backend->l2_state = KBASE_L2_OFF; + } } break; @@ -1405,11 +1674,19 @@ static int kbase_pm_l2_update_state(struct kbase_device *kbdev) backend->l2_state); } - if (backend->l2_state != prev_state) + if (backend->l2_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_L2_STATE; + event->state.prev = prev_state; + event->state.next = backend->l2_state; + dev_dbg(kbdev->dev, "L2 state transition: %s to %s\n", kbase_l2_core_state_to_string(prev_state), kbase_l2_core_state_to_string( backend->l2_state)); + kbase_ktrace_log_l2_core_state(kbdev, backend->l2_state); + } } while (backend->l2_state != prev_state); @@ -1845,11 +2122,18 @@ static int kbase_pm_shaders_update_state(struct kbase_device *kbdev) break; } - if (backend->shaders_state != prev_state) + if (backend->shaders_state != prev_state) { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_SHADERS_STATE; + event->state.prev = prev_state; + event->state.next = backend->shaders_state; + dev_dbg(kbdev->dev, "Shader state transition: %s to %s\n", kbase_shader_core_state_to_string(prev_state), kbase_shader_core_state_to_string( backend->shaders_state)); + } } while (backend->shaders_state != prev_state); @@ -1873,7 +2157,7 @@ static bool kbase_pm_is_in_desired_state_nolock(struct kbase_device *kbdev) kbdev->pm.backend.shaders_state != KBASE_SHADERS_OFF_CORESTACK_OFF) in_desired_state = false; #else - in_desired_state = kbase_pm_mcu_is_in_desired_state(kbdev); + in_desired_state &= kbase_pm_mcu_is_in_desired_state(kbdev); #endif return in_desired_state; @@ -1910,6 +2194,22 @@ static void kbase_pm_trace_power_state(struct kbase_device *kbdev) { lockdep_assert_held(&kbdev->hwaccess_lock); + { + struct kbase_pm_event_log_event *event = + kbase_pm_add_log_event(kbdev); + event->type = KBASE_PM_LOG_EVENT_CORES; + event->cores.l2 = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_L2, ACTION_READY); + event->cores.shader = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_SHADER, ACTION_READY); + event->cores.tiler = 
kbase_pm_get_state( + kbdev, KBASE_PM_CORE_TILER, ACTION_READY); + if (corestack_driver_control) { + event->cores.stack = kbase_pm_get_state( + kbdev, KBASE_PM_CORE_STACK, ACTION_READY); + } + } + KBASE_TLSTREAM_AUX_PM_STATE( kbdev, KBASE_PM_CORE_L2, @@ -2048,6 +2348,9 @@ int kbase_pm_state_machine_init(struct kbase_device *kbdev) } INIT_WORK(&kbdev->pm.backend.core_idle_work, core_idle_worker); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + INIT_WORK(&kbdev->pm.backend.sc_rails_on_work, sc_rails_on_worker); +#endif #endif return 0; @@ -2070,6 +2373,7 @@ void kbase_pm_reset_start_locked(struct kbase_device *kbdev) backend->in_reset = true; backend->l2_state = KBASE_L2_RESET_WAIT; + KBASE_KTRACE_ADD(kbdev, PM_L2_RESET_WAIT, NULL, backend->l2_state); #if !MALI_USE_CSF backend->shaders_state = KBASE_SHADERS_RESET_WAIT; #else @@ -2078,6 +2382,7 @@ void kbase_pm_reset_start_locked(struct kbase_device *kbdev) */ if (likely(kbdev->csf.firmware_inited)) { backend->mcu_state = KBASE_MCU_RESET_WAIT; + KBASE_KTRACE_ADD(kbdev, PM_MCU_RESET_WAIT, NULL, backend->mcu_state); #ifdef KBASE_PM_RUNTIME backend->exit_gpu_sleep_mode = true; #endif @@ -2134,22 +2439,38 @@ void kbase_pm_reset_complete(struct kbase_device *kbdev) #define PM_TIMEOUT_MS (5000) /* 5s */ #endif -static void kbase_pm_timed_out(struct kbase_device *kbdev) +void kbase_gpu_timeout_debug_message(struct kbase_device *kbdev, const char *timeout_msg) { unsigned long flags; - dev_err(kbdev->dev, "Power transition timed out unexpectedly\n"); + dev_err(kbdev->dev, "%s", timeout_msg); #if !MALI_USE_CSF CSTD_UNUSED(flags); dev_err(kbdev->dev, "Desired state :\n"); dev_err(kbdev->dev, "\tShader=%016llx\n", kbdev->pm.backend.shaders_desired ? kbdev->pm.backend.shaders_avail : 0); #else + dev_err(kbdev->dev, "GPU pm state :\n"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + dev_err(kbdev->dev, "\tscheduler.pm_active_count = %d", kbdev->csf.scheduler.pm_active_count); + dev_err(kbdev->dev, "\tpoweron_required %d pm.active_count %d invoke_poweroff_wait_wq_when_l2_off %d", + kbdev->pm.backend.poweron_required, + kbdev->pm.active_count, + kbdev->pm.backend.invoke_poweroff_wait_wq_when_l2_off); + dev_err(kbdev->dev, "\tgpu_poweroff_wait_work pending %d", + work_pending(&kbdev->pm.backend.gpu_poweroff_wait_work)); dev_err(kbdev->dev, "\tMCU desired = %d\n", kbase_pm_is_mcu_desired(kbdev)); dev_err(kbdev->dev, "\tMCU sw state = %d\n", kbdev->pm.backend.mcu_state); + dev_err(kbdev->dev, "\tL2 desired = %d (locked_off: %d)\n", + kbase_pm_is_l2_desired(kbdev), kbdev->pm.backend.policy_change_clamp_state_to_off); + dev_err(kbdev->dev, "\tL2 sw state = %d\n", + kbdev->pm.backend.l2_state); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + dev_err(kbdev->dev, "\tbackend.sc_power_rails_off = %d\n", + kbdev->pm.backend.sc_power_rails_off); +#endif spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); #endif dev_err(kbdev->dev, "Current state :\n"); @@ -2169,8 +2490,7 @@ static void kbase_pm_timed_out(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_READY_LO))); #if MALI_USE_CSF - dev_err(kbdev->dev, "\tMCU status = %d\n", - kbase_reg_read(kbdev, GPU_CONTROL_REG(MCU_STATUS))); + kbase_csf_debug_dump_registers(kbdev); #endif dev_err(kbdev->dev, "Cores transitioning :\n"); dev_err(kbdev->dev, "\tShader=%08x%08x\n", @@ -2189,9 +2509,28 @@ static void kbase_pm_timed_out(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG( L2_PWRTRANS_LO))); + dump_stack(); +} + +static void kbase_pm_timed_out(struct kbase_device *kbdev, const char 
*timeout_msg) +{ + kbase_gpu_timeout_debug_message(kbdev, timeout_msg); + /* pixel: If either: + * 1. L2/MCU power transition timed out, or, + * 2. kbase state machine fell out of sync with the hw state, + * a soft/hard reset (ie writing to SOFT/HARD_RESET regs) is insufficient to resume + * operation. + * + * Besides, Odin TRM advises against touching SOFT/HARD_RESET + * regs if L2_PWRTRANS is 1 to avoid undefined state. + * + * We have already lost work if we end up here, so send a powercycle to reset the hw, + * which is more reliable. + */ dev_err(kbdev->dev, "Sending reset to GPU - all running jobs will be lost\n"); if (kbase_prepare_to_reset_gpu(kbdev, - RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + RESET_FLAGS_HWC_UNRECOVERABLE_ERROR | + RESET_FLAGS_FORCE_PM_HW_RESET)) kbase_reset_gpu(kbdev); } @@ -2214,15 +2553,22 @@ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev) /* Wait for cores */ #if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE - remaining = wait_event_killable_timeout( + remaining = wait_event_killable_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state_with_l2_powered(kbdev), + timeout); #else remaining = wait_event_timeout( -#endif kbdev->pm.backend.gpu_in_desired_state_wait, kbase_pm_is_in_desired_state_with_l2_powered(kbdev), timeout); +#endif if (!remaining) { - kbase_pm_timed_out(kbdev); + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_L2_PM_TIMEOUT + }; + pixel_gpu_uevent_send(kbdev, &evt); + kbase_pm_timed_out(kbdev, "Wait for desired PM state with L2 powered timed out"); err = -ETIMEDOUT; } else if (remaining < 0) { dev_info( @@ -2234,7 +2580,7 @@ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev) return err; } -int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) +static int pm_wait_for_desired_state(struct kbase_device *kbdev, bool killable_wait) { unsigned long flags; long remaining; @@ -2252,27 +2598,193 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) /* Wait for cores */ #if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE + if (killable_wait) + remaining = wait_event_killable_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state(kbdev), + timeout); +#else + killable_wait = false; +#endif + if (!killable_wait) + remaining = wait_event_timeout(kbdev->pm.backend.gpu_in_desired_state_wait, + kbase_pm_is_in_desired_state(kbdev), timeout); + if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_PM_TIMEOUT + }; + pixel_gpu_uevent_send(kbdev, &evt); + kbase_pm_timed_out(kbdev, "Wait for power transition timed out"); + err = -ETIMEDOUT; + } else if (remaining < 0) { + WARN_ON_ONCE(!killable_wait); + dev_info(kbdev->dev, "Wait for power transition got interrupted"); + err = (int)remaining; + } + + return err; +} + +int kbase_pm_killable_wait_for_desired_state(struct kbase_device *kbdev) +{ + return pm_wait_for_desired_state(kbdev, true); +} + +int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev) +{ + return pm_wait_for_desired_state(kbdev, false); +} +KBASE_EXPORT_TEST_API(kbase_pm_wait_for_desired_state); + +#if MALI_USE_CSF +/** + * core_mask_update_done - Check if downscaling of shader cores is done + * + * @kbdev: The kbase device structure for the device. + * + * This function checks if the downscaling of cores is effectively complete. + * + * Return: true if the downscale is done. 
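The killable and plain waits introduced above differ only in which wait primitive is used and in how an interrupted wait is reported. A minimal sketch of that wrapper pattern, using placeholder names (my_dev, my_cond(), MY_TIMEOUT_JIFFIES) rather than real driver symbols:

	static int my_timed_wait(struct my_dev *dev, bool killable)
	{
		long remaining = 0;

	#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE
		if (killable)
			remaining = wait_event_killable_timeout(dev->wq, my_cond(dev),
								MY_TIMEOUT_JIFFIES);
	#else
		/* older kernels lack the killable variant, use a plain wait */
		killable = false;
	#endif
		if (!killable)
			remaining = wait_event_timeout(dev->wq, my_cond(dev),
						       MY_TIMEOUT_JIFFIES);

		if (!remaining)
			return -ETIMEDOUT;	/* condition not met before the timeout */
		if (remaining < 0)
			return (int)remaining;	/* -ERESTARTSYS: SIGKILL during the wait */
		return 0;
	}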
+ */ +static bool core_mask_update_done(struct kbase_device *kbdev) +{ + bool update_done = false; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* If MCU is in stable ON state then it implies that the downscale + * request had completed. + * If MCU is not active then it implies all cores are off, so can + * consider the downscale request as complete. + */ + if ((kbdev->pm.backend.mcu_state == KBASE_MCU_ON) || + kbase_pm_is_mcu_inactive(kbdev, kbdev->pm.backend.mcu_state)) + update_done = true; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return update_done; +} + +int kbase_pm_wait_for_cores_down_scale(struct kbase_device *kbdev) +{ + long timeout = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_PM_TIMEOUT)); + long remaining; + int err = 0; + + /* Wait for core mask update to complete */ +#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE remaining = wait_event_killable_timeout( kbdev->pm.backend.gpu_in_desired_state_wait, - kbase_pm_is_in_desired_state(kbdev), timeout); + core_mask_update_done(kbdev), timeout); #else remaining = wait_event_timeout( kbdev->pm.backend.gpu_in_desired_state_wait, - kbase_pm_is_in_desired_state(kbdev), timeout); + core_mask_update_done(kbdev), timeout); #endif if (!remaining) { - kbase_pm_timed_out(kbdev); + kbase_pm_timed_out(kbdev, "Wait for cores down scaling timed out"); err = -ETIMEDOUT; } else if (remaining < 0) { - dev_info(kbdev->dev, - "Wait for desired PM state got interrupted"); + dev_info( + kbdev->dev, + "Wait for cores down scaling got interrupted"); err = (int)remaining; } return err; } -KBASE_EXPORT_TEST_API(kbase_pm_wait_for_desired_state); +#endif + +static bool is_poweroff_wait_in_progress(struct kbase_device *kbdev) +{ + bool ret; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + ret = kbdev->pm.backend.poweroff_wait_in_progress; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return ret; +} + +static int pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev, bool killable_wait) +{ + long remaining; +#if MALI_USE_CSF + /* gpu_poweroff_wait_work would be subjected to the kernel scheduling + * and so the wait time can't only be the function of GPU frequency. + */ + const unsigned int extra_wait_time_ms = 2000; + const long timeout = kbase_csf_timeout_in_jiffies( + kbase_get_timeout_ms(kbdev, CSF_PM_TIMEOUT) + extra_wait_time_ms); +#else +#ifdef CONFIG_MALI_ARBITER_SUPPORT + /* Handling of timeout error isn't supported for arbiter builds */ + const long timeout = MAX_SCHEDULE_TIMEOUT; +#else + const long timeout = msecs_to_jiffies(PM_TIMEOUT_MS); +#endif +#endif + int err = 0; + +#if KERNEL_VERSION(4, 13, 1) <= LINUX_VERSION_CODE + if (killable_wait) + remaining = wait_event_killable_timeout(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev), + timeout); +#else + killable_wait = false; +#endif + + if (!killable_wait) + remaining = wait_event_timeout(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev), timeout); + if (!remaining) { + /* If work is now pending, kbase_pm_gpu_poweroff_wait_wq() will + * definitely be called, so it's safe to continue waiting for it. 
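kbase_pm_wait_for_cores_down_scale() above exists so that a platform's DVFS path can delay lowering the voltage until the smaller core mask has actually taken effect. A hedged usage sketch, where the surrounding governor callback and my_platform_set_voltage() are hypothetical:

	/* Hypothetical: called after a core-mask downscale request has been
	 * pushed through kbase_pm_update_state(). Only drop the rail once
	 * the downscale is confirmed complete.
	 */
	static int my_set_lower_opp(struct kbase_device *kbdev)
	{
		int err = kbase_pm_wait_for_cores_down_scale(kbdev);

		if (err)
			return err;	/* keep the current voltage on timeout/interrupt */

		return my_platform_set_voltage(kbdev);	/* placeholder */
	}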
+ */ + if (work_pending(&kbdev->pm.backend.gpu_poweroff_wait_work)) { + wait_event_killable(kbdev->pm.backend.poweroff_wait, + !is_poweroff_wait_in_progress(kbdev)); + } else { + unsigned long flags; + kbasep_platform_event_core_dump(kbdev, "poweroff work timeout"); + kbase_gpu_timeout_debug_message(kbdev, "failed to wait for poweroff worker"); +#if MALI_USE_CSF + //csf.scheduler.state should be accessed with scheduler lock! + //callchains go through this function though holding that lock + //so just print without locking. + dev_err(kbdev->dev, "scheduler.state %d", kbdev->csf.scheduler.state); + dev_err(kbdev->dev, "Firmware ping %d", kbase_csf_firmware_ping_wait(kbdev, 0)); +#endif + //Attempt another state machine transition prompt. + dev_err(kbdev->dev, "Attempt to prompt state machine"); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_gpu_timeout_debug_message(kbdev, "GPU state after re-prompt of state machine"); + err = -ETIMEDOUT; + } + } else if (remaining < 0) { + WARN_ON_ONCE(!killable_wait); + dev_info(kbdev->dev, "Wait for poweroff work got interrupted"); + err = (int)remaining; + } + return err; +} + +int kbase_pm_killable_wait_for_poweroff_work_complete(struct kbase_device *kbdev) +{ + return pm_wait_for_poweroff_work_complete(kbdev, true); +} + +int kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev) +{ + return pm_wait_for_poweroff_work_complete(kbdev, false); +} +KBASE_EXPORT_TEST_API(kbase_pm_wait_for_poweroff_work_complete); void kbase_pm_enable_interrupts(struct kbase_device *kbdev) { @@ -2291,12 +2803,12 @@ void kbase_pm_enable_interrupts(struct kbase_device *kbdev) kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), 0xFFFFFFFF); kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK), 0xFFFFFFFF); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); #if MALI_USE_CSF /* Enable only the Page fault bits part */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0xFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0xFFFF); #else - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0xFFFFFFFF); #endif } @@ -2316,8 +2828,8 @@ void kbase_pm_disable_interrupts_nolock(struct kbase_device *kbdev) kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK), 0); kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), 0xFFFFFFFF); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), 0xFFFFFFFF); } void kbase_pm_disable_interrupts(struct kbase_device *kbdev) @@ -2332,24 +2844,37 @@ void kbase_pm_disable_interrupts(struct kbase_device *kbdev) KBASE_EXPORT_TEST_API(kbase_pm_disable_interrupts); #if MALI_USE_CSF +/** + * update_user_reg_page_mapping - Update the mapping for USER Register page + * + * @kbdev: The kbase device structure for the device. + * + * This function must be called to unmap the dummy or real page from USER Register page + * mapping whenever GPU is powered up or down. The dummy or real page would get + * appropriately mapped in when Userspace reads the LATEST_FLUSH value. 
+ */ static void update_user_reg_page_mapping(struct kbase_device *kbdev) { + struct kbase_context *kctx, *n; + lockdep_assert_held(&kbdev->pm.lock); mutex_lock(&kbdev->csf.reg_lock); - if (kbdev->csf.mali_file_inode) { - /* This would zap the pte corresponding to the mapping of User - * register page for all the Kbase contexts. + list_for_each_entry_safe(kctx, n, &kbdev->csf.user_reg.list, csf.user_reg.link) { + /* This would zap the PTE corresponding to the mapping of User + * Register page of the kbase context. The mapping will be reestablished + * when the context (user process) needs to access to the page. */ - unmap_mapping_range(kbdev->csf.mali_file_inode->i_mapping, - BASEP_MEM_CSF_USER_REG_PAGE_HANDLE, - PAGE_SIZE, 1); + unmap_mapping_range(kbdev->csf.user_reg.filp->f_inode->i_mapping, + kctx->csf.user_reg.file_offset << PAGE_SHIFT, PAGE_SIZE, 1); + list_del_init(&kctx->csf.user_reg.link); + dev_dbg(kbdev->dev, "Updated USER Reg page mapping of ctx %d_%d", kctx->tgid, + kctx->id); } mutex_unlock(&kbdev->csf.reg_lock); } #endif - /* * pmu layout: * 0x0000: PMU TAG (RO) (0xCAFECAFE) @@ -2487,7 +3012,6 @@ void kbase_pm_clock_on(struct kbase_device *kbdev, bool is_resume) backend->gpu_idled = false; } #endif - } KBASE_EXPORT_TEST_API(kbase_pm_clock_on); @@ -2722,9 +3246,13 @@ static int kbase_pm_hw_issues_detect(struct kbase_device *kbdev) kbdev->hw_quirks_tiler = 0; kbdev->hw_quirks_mmu = 0; - if (!of_property_read_u32(np, "quirks_gpu", &kbdev->hw_quirks_gpu)) { - dev_info(kbdev->dev, - "Found quirks_gpu = [0x%x] in Devicetree\n", + /* Read the "-" versions of the properties and fall back to + * the "_" versions if these are not found + */ + + if (!of_property_read_u32(np, "quirks-gpu", &kbdev->hw_quirks_gpu) || + !of_property_read_u32(np, "quirks_gpu", &kbdev->hw_quirks_gpu)) { + dev_info(kbdev->dev, "Found quirks_gpu = [0x%x] in Devicetree\n", kbdev->hw_quirks_gpu); } else { error = kbase_set_gpu_quirks(kbdev, prod_id); @@ -2732,33 +3260,30 @@ static int kbase_pm_hw_issues_detect(struct kbase_device *kbdev) return error; } - if (!of_property_read_u32(np, "quirks_sc", - &kbdev->hw_quirks_sc)) { - dev_info(kbdev->dev, - "Found quirks_sc = [0x%x] in Devicetree\n", - kbdev->hw_quirks_sc); + if (!of_property_read_u32(np, "quirks-sc", &kbdev->hw_quirks_sc) || + !of_property_read_u32(np, "quirks_sc", &kbdev->hw_quirks_sc)) { + dev_info(kbdev->dev, "Found quirks_sc = [0x%x] in Devicetree\n", + kbdev->hw_quirks_sc); } else { error = kbase_set_sc_quirks(kbdev, prod_id); if (error) return error; } - if (!of_property_read_u32(np, "quirks_tiler", - &kbdev->hw_quirks_tiler)) { - dev_info(kbdev->dev, - "Found quirks_tiler = [0x%x] in Devicetree\n", - kbdev->hw_quirks_tiler); + if (!of_property_read_u32(np, "quirks-tiler", &kbdev->hw_quirks_tiler) || + !of_property_read_u32(np, "quirks_tiler", &kbdev->hw_quirks_tiler)) { + dev_info(kbdev->dev, "Found quirks_tiler = [0x%x] in Devicetree\n", + kbdev->hw_quirks_tiler); } else { error = kbase_set_tiler_quirks(kbdev); if (error) return error; } - if (!of_property_read_u32(np, "quirks_mmu", - &kbdev->hw_quirks_mmu)) { - dev_info(kbdev->dev, - "Found quirks_mmu = [0x%x] in Devicetree\n", - kbdev->hw_quirks_mmu); + if (!of_property_read_u32(np, "quirks-mmu", &kbdev->hw_quirks_mmu) || + !of_property_read_u32(np, "quirks_mmu", &kbdev->hw_quirks_mmu)) { + dev_info(kbdev->dev, "Found quirks_mmu = [0x%x] in Devicetree\n", + kbdev->hw_quirks_mmu); } else { error = kbase_set_mmu_quirks(kbdev); } @@ -2827,15 +3352,73 @@ static void 
reenable_protected_mode_hwcnt(struct kbase_device *kbdev) } #endif +static int kbase_pm_hw_reset(struct kbase_device *kbdev) +{ + unsigned long flags; + bool gpu_ready; + + lockdep_assert_held(&kbdev->pm.lock); + + if (!kbdev->pm.backend.callback_hardware_reset) { + dev_warn(kbdev->dev, "No hardware reset provided"); + return -EINVAL; + } + + /* Save GPU power state */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + WARN_ON(!kbdev->pm.backend.gpu_powered); + gpu_ready = kbdev->pm.backend.gpu_ready; + kbdev->pm.backend.gpu_ready = false; + kbdev->pm.backend.gpu_powered = false; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + +#if MALI_USE_CSF + /* Swap for dummy page */ + update_user_reg_page_mapping(kbdev); +#endif + + /* Delegate hardware reset to platform */ + kbdev->pm.backend.callback_hardware_reset(kbdev); + +#if MALI_USE_CSF + /* Swap for real page */ + update_user_reg_page_mapping(kbdev); +#endif + + /* GPU is powered again, restore state */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbdev->pm.backend.gpu_powered = true; + kbdev->pm.backend.gpu_ready = gpu_ready; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + /* Check register access for success */ + if (kbase_is_gpu_removed(kbdev)) { + dev_err(kbdev->dev, "Registers in-accessible after platform reset"); + return -EINVAL; + } + return 0; +} + static int kbase_pm_do_reset(struct kbase_device *kbdev) { struct kbasep_reset_timeout_data rtdata; int ret; +#if MALI_USE_CSF + if (kbdev->csf.reset.force_pm_hw_reset && kbdev->pm.backend.callback_hardware_reset) { + dev_err(kbdev->dev, "Power Cycle reset mali"); + kbdev->csf.reset.force_pm_hw_reset = false; + return kbase_pm_hw_reset(kbdev); + } +#endif + KBASE_KTRACE_ADD(kbdev, CORE_GPU_SOFT_RESET, NULL, 0); KBASE_TLSTREAM_JD_GPU_SOFT_RESET(kbdev, kbdev); + /* Unmask the reset complete interrupt only */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), RESET_COMPLETED); + if (kbdev->pm.backend.callback_soft_reset) { ret = kbdev->pm.backend.callback_soft_reset(kbdev); if (ret < 0) @@ -2847,9 +3430,6 @@ static int kbase_pm_do_reset(struct kbase_device *kbdev) GPU_COMMAND_SOFT_RESET); } - /* Unmask the reset complete interrupt only */ - kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK), RESET_COMPLETED); - /* Initialize a structure for tracking the status of the reset */ rtdata.kbdev = kbdev; rtdata.timed_out = false; @@ -2921,8 +3501,12 @@ static int kbase_pm_do_reset(struct kbase_device *kbdev) destroy_hrtimer_on_stack(&rtdata.timer); - dev_err(kbdev->dev, "Failed to hard-reset the GPU (timed out after %d ms)\n", - RESET_TIMEOUT); + dev_err(kbdev->dev, + "Failed to hard-reset the GPU (timed out after %d ms) GPU_IRQ_RAWSTAT: %d\n", + RESET_TIMEOUT, kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT))); + + /* Last resort, trigger a hardware reset of the GPU */ + return kbase_pm_hw_reset(kbdev); #ifdef CONFIG_MALI_ARBITER_SUPPORT } #endif /* CONFIG_MALI_ARBITER_SUPPORT */ @@ -2959,6 +3543,10 @@ int kbase_pm_init_hw(struct kbase_device *kbdev, unsigned int flags) kbdev->pm.backend.gpu_powered = true; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Ensure the SC rail is up otherwise the FW will get stuck during reset */ + kbase_pm_turn_on_sc_power_rails_locked(kbdev); +#endif /* Ensure interrupts are off to begin with, this also clears any * outstanding interrupts diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c new file mode 100644 index 0000000..b752af8 --- /dev/null +++ 
b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 Google LLC. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <backend/gpu/mali_kbase_pm_event_log.h> + +static inline u32 kbase_pm_next_log_event( + struct kbase_pm_event_log *log) +{ + u32 ret = log->last_event; + ++ret; + ret %= EVENT_LOG_MAX; + log->last_event = ret; + return ret; +} + +struct kbase_pm_event_log_event *kbase_pm_add_log_event( + struct kbase_device *kbdev) +{ + struct kbase_pm_event_log *log = &kbdev->pm.backend.event_log; + struct kbase_pm_event_log_event *ret = NULL; + + lockdep_assert_held(&kbdev->hwaccess_lock); + ret = &log->events[kbase_pm_next_log_event(log)]; + + memset(ret, 0, sizeof(*ret)); + ret->timestamp = ktime_get(); + return ret; +} + +/** + * struct kbase_pm_event_log_metadata - Info about the event log. + * + * @magic: always 'kpel', helps find the log in memory dumps + * @version: updated whenever the binary layout changes + * @events_address: the memory address of the log, or in a file the offset + * from the start of the metadata to the log + * @num_events: the capacity of the event log + * @event_stride: distance between log entries, to aid in parsing if only some + * entry types are supported by the parser + **/ +struct kbase_pm_event_log_metadata { + char magic[4]; + u8 version; + u64 events_address; + u32 num_events; + u32 event_stride; +} __attribute__((packed)); + +static struct kbase_pm_event_log_metadata global_event_log_metadata; + +void kbase_pm_init_event_log(struct kbase_device *kbdev) +{ + struct kbase_pm_event_log_metadata *md = + &global_event_log_metadata; + kbdev->pm.backend.event_log.last_event = -1; + md->magic[0] = 'k'; + md->magic[1] = 'p'; + md->magic[2] = 'e'; + md->magic[3] = 'l'; + md->version = 1; + md->num_events = EVENT_LOG_MAX; + md->events_address = (u64)kbdev->pm.backend.event_log.events; + md->event_stride = ((u8*)&kbdev->pm.backend.event_log.events[1] - + (u8*)&kbdev->pm.backend.event_log.events[0]); +} + +u64 kbase_pm_max_event_log_size(struct kbase_device *kbdev) +{ + return sizeof(struct kbase_pm_event_log_metadata) + + sizeof(kbdev->pm.backend.event_log.events); +} + +int kbase_pm_copy_event_log(struct kbase_device *kbdev, + void *buffer, u64 size) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + if (size < kbase_pm_max_event_log_size(kbdev)) { + return -EINVAL; + } + memcpy(buffer, &global_event_log_metadata, + sizeof(global_event_log_metadata)); + memcpy(((u8*)buffer) + sizeof(global_event_log_metadata), + &kbdev->pm.backend.event_log.events, + sizeof(kbdev->pm.backend.event_log.events)); + ((struct kbase_pm_event_log_metadata*)buffer)->events_address = + sizeof(struct kbase_pm_event_log_metadata); + + return 0; +} + diff --git 
a/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h new file mode 100644 index 0000000..072efa5 --- /dev/null +++ b/mali_kbase/backend/gpu/mali_kbase_pm_event_log.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 Google LLC. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/* + * Power management API definitions used internally by GPU backend + */ + +#ifndef _KBASE_BACKEND_PM_EVENT_LOG_H_ +#define _KBASE_BACKEND_PM_EVENT_LOG_H_ + +#include <mali_kbase.h> +#include <mali_kbase_pm.h> + +/** + * kbase_pm_add_log_event - Add a newly-initialized event to the event log. + * + * @kbdev: Device pointer + * + * Return: a pointer to the event, which has been nulled out and had its + * timestamp set to the current time. + * + */ +struct kbase_pm_event_log_event *kbase_pm_add_log_event( + struct kbase_device *kbdev); + +#endif /* _KBASE_BACKEND_PM_INTERNAL_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_internal.h b/mali_kbase/backend/gpu/mali_kbase_pm_internal.h index 68ded7d..d7f19fb 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_internal.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -224,7 +224,7 @@ void kbase_pm_reset_done(struct kbase_device *kbdev); * power off in progress and kbase_pm_context_active() was called instead of * kbase_csf_scheduler_pm_active(). * - * Return: 0 on success, error code on error + * Return: 0 on success, or -ETIMEDOUT code on timeout error. */ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); #else @@ -247,12 +247,27 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); * must ensure that this is not the case by, for example, calling * kbase_pm_wait_for_poweroff_work_complete() * - * Return: 0 on success, error code on error + * Return: 0 on success, or -ETIMEDOUT error code on timeout error. */ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); #endif /** + * kbase_pm_killable_wait_for_desired_state - Wait for the desired power state to be + * reached in a killable state. + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function is same as kbase_pm_wait_for_desired_state(), expect that it would + * allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. 
+ * + * Return: 0 on success, or -ETIMEDOUT code on timeout error or -ERESTARTSYS if the + * wait was interrupted. + */ +int kbase_pm_killable_wait_for_desired_state(struct kbase_device *kbdev); + +/** * kbase_pm_wait_for_l2_powered - Wait for the L2 cache to be powered on * * @kbdev: The kbase device structure for the device (must be a valid pointer) @@ -269,6 +284,37 @@ int kbase_pm_wait_for_desired_state(struct kbase_device *kbdev); */ int kbase_pm_wait_for_l2_powered(struct kbase_device *kbdev); +#if MALI_USE_CSF +/** + * kbase_pm_wait_for_cores_down_scale - Wait for the downscaling of shader cores + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function can be called to ensure that the downscaling of cores is + * effectively complete and it would be safe to lower the voltage. + * The function assumes that caller had exercised the MCU state machine for the + * downscale request through the kbase_pm_update_state() function. + * + * This function needs to be used by the caller to safely wait for the completion + * of downscale request, instead of kbase_pm_wait_for_desired_state(). + * The downscale request would trigger a state change in MCU state machine + * and so when MCU reaches the stable ON state, it can be inferred that + * downscaling is complete. But it has been observed that the wake up of the + * waiting thread can get delayed by few milli seconds and by the time the + * thread wakes up the power down transition could have started (after the + * completion of downscale request). + * On the completion of power down transition another wake up signal would be + * sent, but again by the time thread wakes up the power up transition can begin. + * And the power up transition could then get blocked inside the platform specific + * callback_power_on() function due to the thread that called into Kbase (from the + * platform specific code) to perform the downscaling and then ended up waiting + * for the completion of downscale request. + * + * Return: 0 on success, error code on error or remaining jiffies on timeout. + */ +int kbase_pm_wait_for_cores_down_scale(struct kbase_device *kbdev); +#endif + /** * kbase_pm_update_dynamic_cores_onoff - Update the L2 and shader power state * machines after changing shader core @@ -436,8 +482,26 @@ void kbase_pm_release_gpu_cycle_counter_nolock(struct kbase_device *kbdev); * This function effectively just waits for the @gpu_poweroff_wait_work work * item to complete, if it was enqueued. GPU may not have been powered down * before this function returns. + * + * Return: 0 on success, error code on error + */ +int kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev); + +/** + * kbase_pm_killable_wait_for_poweroff_work_complete - Wait for the poweroff workqueue to + * complete in killable state. + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * This function is same as kbase_pm_wait_for_poweroff_work_complete(), expect that + * it would allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. + * + * Return: 0 on success, or -ETIMEDOUT code on timeout error or -ERESTARTSYS if the + * wait was interrupted. 
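Both killable variants are meant for ioctl/user context, where a SIGKILL'd caller should not be left stuck in the kernel. A hypothetical call site (my_ioctl_flush() is illustrative, not a real entry point):

	static int my_ioctl_flush(struct kbase_device *kbdev)
	{
		int err = kbase_pm_killable_wait_for_poweroff_work_complete(kbdev);

		if (err == -ERESTARTSYS)
			return err;	/* caller is being killed, bail out quietly */
		if (err == -ETIMEDOUT)
			dev_err(kbdev->dev, "poweroff work did not complete in time");

		return err;
	}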
*/ -void kbase_pm_wait_for_poweroff_work_complete(struct kbase_device *kbdev); +int kbase_pm_killable_wait_for_poweroff_work_complete(struct kbase_device *kbdev); /** * kbase_pm_wait_for_gpu_power_down - Wait for the GPU power down to complete @@ -800,7 +864,7 @@ bool kbase_pm_no_runnables_sched_suspendable(struct kbase_device *kbdev) /** * kbase_pm_no_mcu_core_pwroff - Check whether the PM is required to keep the - * MCU core powered in accordance to the active + * MCU shader Core powered in accordance to the active * power management policy * * @kbdev: Device pointer @@ -826,6 +890,8 @@ static inline bool kbase_pm_mcu_is_in_desired_state(struct kbase_device *kbdev) { bool in_desired_state = true; + lockdep_assert_held(&kbdev->hwaccess_lock); + if (kbase_pm_is_mcu_desired(kbdev) && kbdev->pm.backend.mcu_state != KBASE_MCU_ON) in_desired_state = false; else if (!kbase_pm_is_mcu_desired(kbdev) && @@ -869,7 +935,7 @@ static inline void kbase_pm_lock(struct kbase_device *kbdev) #if !MALI_USE_CSF mutex_lock(&kbdev->js_data.runpool_mutex); #endif /* !MALI_USE_CSF */ - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); } /** @@ -879,7 +945,7 @@ static inline void kbase_pm_lock(struct kbase_device *kbdev) */ static inline void kbase_pm_unlock(struct kbase_device *kbdev) { - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #if !MALI_USE_CSF mutex_unlock(&kbdev->js_data.runpool_mutex); #endif /* !MALI_USE_CSF */ @@ -964,4 +1030,27 @@ static inline void kbase_pm_disable_db_mirror_interrupt(struct kbase_device *kbd } #endif +/** + * kbase_pm_l2_allow_mmu_page_migration - L2 state allows MMU page migration or not + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * Check whether the L2 state is in power transition phase or not. If it is, the MMU + * page migration should be deferred. The caller must hold hwaccess_lock, and, if MMU + * page migration is intended, immediately start the MMU migration action without + * dropping the lock. When page migration begins, a flag is set in kbdev that would + * prevent the L2 state machine traversing into power transition phases, until + * the MMU migration action ends. + * + * Return: true if MMU page migration is allowed + */ +static inline bool kbase_pm_l2_allow_mmu_page_migration(struct kbase_device *kbdev) +{ + struct kbase_pm_backend_data *backend = &kbdev->pm.backend; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + return (backend->l2_state != KBASE_L2_PEND_ON && backend->l2_state != KBASE_L2_PEND_OFF); +} + #endif /* _KBASE_BACKEND_PM_INTERNAL_H_ */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h b/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h index 5e57c9d..3b448e3 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h +++ b/mali_kbase/backend/gpu/mali_kbase_pm_mcu_states.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -66,6 +66,13 @@ * is being put to sleep. * @ON_PEND_SLEEP: MCU sleep is in progress. * @IN_SLEEP: Sleep request is completed and MCU has halted. + * @ON_PMODE_ENTER_CORESIGHT_DISABLE: The MCU is on, protected mode enter is about to + * be requested, Coresight is being disabled. 
+ * @ON_PMODE_EXIT_CORESIGHT_ENABLE : The MCU is on, protected mode exit has happened + * Coresight is being enabled. + * @CORESIGHT_DISABLE: The MCU is on and Coresight is being disabled. + * @CORESIGHT_ENABLE: The MCU is on, host does not have control and + * Coresight is being enabled. */ KBASEP_MCU_STATE(OFF) KBASEP_MCU_STATE(PEND_ON_RELOAD) @@ -92,3 +99,10 @@ KBASEP_MCU_STATE(HCTL_SHADERS_CORE_OFF_PEND) KBASEP_MCU_STATE(ON_SLEEP_INITIATE) KBASEP_MCU_STATE(ON_PEND_SLEEP) KBASEP_MCU_STATE(IN_SLEEP) +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +/* Additional MCU states for Coresight */ +KBASEP_MCU_STATE(ON_PMODE_ENTER_CORESIGHT_DISABLE) +KBASEP_MCU_STATE(ON_PMODE_EXIT_CORESIGHT_ENABLE) +KBASEP_MCU_STATE(CORESIGHT_DISABLE) +KBASEP_MCU_STATE(CORESIGHT_ENABLE) +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c b/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c index f85b466..5d98bd7 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_metrics.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,6 +24,7 @@ */ #include <mali_kbase.h> +#include <mali_kbase_config_defaults.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -37,38 +38,64 @@ #include <backend/gpu/mali_kbase_pm_defs.h> #include <mali_linux_trace.h> +#if defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) || !MALI_USE_CSF /* Shift used for kbasep_pm_metrics_data.time_busy/idle - units of (1 << 8) ns * This gives a maximum period between samples of 2^(32+8)/100 ns = slightly * under 11s. Exceeding this will cause overflow */ #define KBASE_PM_TIME_SHIFT 8 +#endif #if MALI_USE_CSF /* To get the GPU_ACTIVE value in nano seconds unit */ #define GPU_ACTIVE_SCALING_FACTOR ((u64)1E9) #endif +/* + * Possible state transitions + * ON -> ON | OFF | STOPPED + * STOPPED -> ON | OFF + * OFF -> ON + * + * + * ┌─e─┐┌────────────f─────────────┐ + * │ v│ v + * └───ON ──a──> STOPPED ──b──> OFF + * ^^ │ │ + * │└──────c─────┘ │ + * │ │ + * └─────────────d─────────────┘ + * + * Transition effects: + * a. None + * b. Timer expires without restart + * c. Timer is not stopped, timer period is unaffected + * d. Timer must be restarted + * e. Callback is executed and the timer is restarted + * f. Timer is cancelled, or the callback is waited on if currently executing. 
This is called during + * tear-down and should not be subject to a race from an OFF->ON transition + */ +enum dvfs_metric_timer_state { TIMER_OFF, TIMER_STOPPED, TIMER_ON }; + #ifdef CONFIG_MALI_MIDGARD_DVFS static enum hrtimer_restart dvfs_callback(struct hrtimer *timer) { - unsigned long flags; struct kbasep_pm_metrics_state *metrics; - KBASE_DEBUG_ASSERT(timer != NULL); + if (WARN_ON(!timer)) + return HRTIMER_NORESTART; metrics = container_of(timer, struct kbasep_pm_metrics_state, timer); - kbase_pm_get_dvfs_action(metrics->kbdev); - - spin_lock_irqsave(&metrics->lock, flags); - if (metrics->timer_active) - hrtimer_start(timer, - HR_TIMER_DELAY_MSEC(metrics->kbdev->pm.dvfs_period), - HRTIMER_MODE_REL); + /* Transition (b) to fully off if timer was stopped, don't restart the timer in this case */ + if (atomic_cmpxchg(&metrics->timer_state, TIMER_STOPPED, TIMER_OFF) != TIMER_ON) + return HRTIMER_NORESTART; - spin_unlock_irqrestore(&metrics->lock, flags); + kbase_pm_get_dvfs_action(metrics->kbdev); - return HRTIMER_NORESTART; + /* Set the new expiration time and restart (transition e) */ + hrtimer_forward_now(timer, HR_TIMER_DELAY_MSEC(metrics->kbdev->pm.dvfs_period)); + return HRTIMER_RESTART; } #endif /* CONFIG_MALI_MIDGARD_DVFS */ @@ -83,7 +110,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) KBASE_DEBUG_ASSERT(kbdev != NULL); kbdev->pm.backend.metrics.kbdev = kbdev; - kbdev->pm.backend.metrics.time_period_start = ktime_get(); + kbdev->pm.backend.metrics.time_period_start = ktime_get_raw(); kbdev->pm.backend.metrics.values.time_busy = 0; kbdev->pm.backend.metrics.values.time_idle = 0; kbdev->pm.backend.metrics.values.time_in_protm = 0; @@ -111,7 +138,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) #else KBASE_DEBUG_ASSERT(kbdev != NULL); kbdev->pm.backend.metrics.kbdev = kbdev; - kbdev->pm.backend.metrics.time_period_start = ktime_get(); + kbdev->pm.backend.metrics.time_period_start = ktime_get_raw(); kbdev->pm.backend.metrics.gpu_active = false; kbdev->pm.backend.metrics.active_cl_ctx[0] = 0; @@ -134,6 +161,7 @@ int kbasep_pm_metrics_init(struct kbase_device *kbdev) HRTIMER_MODE_REL); kbdev->pm.backend.metrics.timer.function = dvfs_callback; kbdev->pm.backend.metrics.initialized = true; + atomic_set(&kbdev->pm.backend.metrics.timer_state, TIMER_OFF); kbase_pm_metrics_start(kbdev); #endif /* CONFIG_MALI_MIDGARD_DVFS */ @@ -152,16 +180,12 @@ KBASE_EXPORT_TEST_API(kbasep_pm_metrics_init); void kbasep_pm_metrics_term(struct kbase_device *kbdev) { #ifdef CONFIG_MALI_MIDGARD_DVFS - unsigned long flags; - KBASE_DEBUG_ASSERT(kbdev != NULL); - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - kbdev->pm.backend.metrics.timer_active = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - hrtimer_cancel(&kbdev->pm.backend.metrics.timer); + /* Cancel the timer, and block if the callback is currently executing (transition f) */ kbdev->pm.backend.metrics.initialized = false; + atomic_set(&kbdev->pm.backend.metrics.timer_state, TIMER_OFF); + hrtimer_cancel(&kbdev->pm.backend.metrics.timer); #endif /* CONFIG_MALI_MIDGARD_DVFS */ #if MALI_USE_CSF @@ -177,7 +201,7 @@ KBASE_EXPORT_TEST_API(kbasep_pm_metrics_term); */ #if MALI_USE_CSF #if defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) -static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) +static bool kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) { int err; u64 gpu_active_counter; @@ -199,7 +223,7 @@ static void 
kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) * elapsed time. The lock taken inside kbase_ipa_control_query() * function can cause lot of variation. */ - now = ktime_get(); + now = ktime_get_raw(); if (err) { dev_err(kbdev->dev, @@ -215,7 +239,20 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) diff_ns_signed = ktime_to_ns(diff); if (diff_ns_signed < 0) - return; + return false; + + /* + * The GPU internal counter is updated every IPA_CONTROL_TIMER_DEFAULT_VALUE_MS + * milliseconds. If an update occurs prematurely and the counter has not been + * updated, the same counter value will be obtained, resulting in a difference + * of zero. To handle this scenario, we will skip the update if the difference + * is zero and the update occurred less than 1.5 times the internal update period + * (IPA_CONTROL_TIMER_DEFAULT_VALUE_MS). Ideally, we should check the counter + * update timestamp in the GPU internal register to ensure accurate updates. + */ + if (gpu_active_counter == 0 && + diff_ns_signed < IPA_CONTROL_TIMER_DEFAULT_VALUE_MS * NSEC_PER_MSEC * 3 / 2) + return false; diff_ns = (u64)diff_ns_signed; @@ -231,12 +268,14 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) * time. */ if (!kbdev->pm.backend.metrics.skip_gpu_active_sanity_check) { - /* Use a margin value that is approximately 1% of the time - * difference. + /* The margin is scaled to allow for the worst-case + * scenario where the samples are maximally separated, + * plus a small offset for sampling errors. */ - u64 margin_ns = diff_ns >> 6; + u64 const MARGIN_NS = + IPA_CONTROL_TIMER_DEFAULT_VALUE_MS * NSEC_PER_MSEC * 3 / 2; - if (gpu_active_counter > (diff_ns + margin_ns)) { + if (gpu_active_counter > (diff_ns + MARGIN_NS)) { dev_info( kbdev->dev, "GPU activity takes longer than time interval: %llu ns > %llu ns", @@ -282,10 +321,11 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev) } kbdev->pm.backend.metrics.time_period_start = now; + return true; } #endif /* defined(CONFIG_MALI_DEVFREQ) || defined(CONFIG_MALI_MIDGARD_DVFS) */ #else -static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, +static bool kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, ktime_t now) { ktime_t diff; @@ -294,7 +334,7 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, diff = ktime_sub(now, kbdev->pm.backend.metrics.time_period_start); if (ktime_to_ns(diff) < 0) - return; + return false; if (kbdev->pm.backend.metrics.gpu_active) { u32 ns_time = (u32) (ktime_to_ns(diff) >> KBASE_PM_TIME_SHIFT); @@ -316,6 +356,7 @@ static void kbase_pm_get_dvfs_utilisation_calc(struct kbase_device *kbdev, } kbdev->pm.backend.metrics.time_period_start = now; + return true; } #endif /* MALI_USE_CSF */ @@ -329,10 +370,13 @@ void kbase_pm_get_dvfs_metrics(struct kbase_device *kbdev, spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); #if MALI_USE_CSF - kbase_pm_get_dvfs_utilisation_calc(kbdev); + if (!kbase_pm_get_dvfs_utilisation_calc(kbdev)) { #else - kbase_pm_get_dvfs_utilisation_calc(kbdev, ktime_get()); + if (!kbase_pm_get_dvfs_utilisation_calc(kbdev, ktime_get_raw())) { #endif + spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); + return; + } memset(diff, 0, sizeof(*diff)); diff->time_busy = cur->time_busy - last->time_busy; @@ -396,57 +440,33 @@ void kbase_pm_get_dvfs_action(struct kbase_device *kbdev) bool kbase_pm_metrics_is_active(struct kbase_device *kbdev) { - bool isactive; - unsigned 
long flags; - KBASE_DEBUG_ASSERT(kbdev != NULL); - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - isactive = kbdev->pm.backend.metrics.timer_active; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - return isactive; + return atomic_read(&kbdev->pm.backend.metrics.timer_state) == TIMER_ON; } KBASE_EXPORT_TEST_API(kbase_pm_metrics_is_active); void kbase_pm_metrics_start(struct kbase_device *kbdev) { - unsigned long flags; - bool update = true; + struct kbasep_pm_metrics_state *metrics = &kbdev->pm.backend.metrics; - if (unlikely(!kbdev->pm.backend.metrics.initialized)) + if (unlikely(!metrics->initialized)) return; - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - if (!kbdev->pm.backend.metrics.timer_active) - kbdev->pm.backend.metrics.timer_active = true; - else - update = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - if (update) - hrtimer_start(&kbdev->pm.backend.metrics.timer, - HR_TIMER_DELAY_MSEC(kbdev->pm.dvfs_period), - HRTIMER_MODE_REL); + /* Transition to ON, from a stopped state (transition c) */ + if (atomic_xchg(&metrics->timer_state, TIMER_ON) == TIMER_OFF) + /* Start the timer only if it's been fully stopped (transition d)*/ + hrtimer_start(&metrics->timer, HR_TIMER_DELAY_MSEC(kbdev->pm.dvfs_period), + HRTIMER_MODE_REL); } void kbase_pm_metrics_stop(struct kbase_device *kbdev) { - unsigned long flags; - bool update = true; - if (unlikely(!kbdev->pm.backend.metrics.initialized)) return; - spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); - if (kbdev->pm.backend.metrics.timer_active) - kbdev->pm.backend.metrics.timer_active = false; - else - update = false; - spin_unlock_irqrestore(&kbdev->pm.backend.metrics.lock, flags); - - if (update) - hrtimer_cancel(&kbdev->pm.backend.metrics.timer); + /* Timer is Stopped if its currently on (transition a) */ + atomic_cmpxchg(&kbdev->pm.backend.metrics.timer_state, TIMER_ON, TIMER_STOPPED); } @@ -462,7 +482,7 @@ void kbase_pm_metrics_stop(struct kbase_device *kbdev) */ static void kbase_pm_metrics_active_calc(struct kbase_device *kbdev) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->pm.backend.metrics.lock); @@ -512,7 +532,7 @@ void kbase_pm_metrics_update(struct kbase_device *kbdev, ktime_t *timestamp) spin_lock_irqsave(&kbdev->pm.backend.metrics.lock, flags); if (!timestamp) { - now = ktime_get(); + now = ktime_get_raw(); timestamp = &now; } diff --git a/mali_kbase/backend/gpu/mali_kbase_pm_policy.c b/mali_kbase/backend/gpu/mali_kbase_pm_policy.c index cb38c6e..7d7650c 100644 --- a/mali_kbase/backend/gpu/mali_kbase_pm_policy.c +++ b/mali_kbase/backend/gpu/mali_kbase_pm_policy.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
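The DVFS metrics timer rework above replaces a spinlock-protected boolean with a single atomic tri-state so that start, stop and the expiry callback can race safely without taking a lock in the hrtimer path. An illustrative reduction of that pattern with generic names (my_metrics, do_sampling()):

	enum { T_OFF, T_STOPPED, T_ON };

	static void my_metrics_start(struct my_metrics *m)
	{
		/* Only (re)arm when coming from fully OFF (transition d in the
		 * diagram above); STOPPED->ON (transition c) leaves the still
		 * pending timer alone.
		 */
		if (atomic_xchg(&m->timer_state, T_ON) == T_OFF)
			hrtimer_start(&m->timer, ms_to_ktime(m->period_ms),
				      HRTIMER_MODE_REL);
	}

	static void my_metrics_stop(struct my_metrics *m)
	{
		/* Ask the callback to let the timer lapse at its next expiry */
		atomic_cmpxchg(&m->timer_state, T_ON, T_STOPPED);
	}

	static enum hrtimer_restart my_metrics_cb(struct hrtimer *timer)
	{
		struct my_metrics *m = container_of(timer, struct my_metrics, timer);

		/* If stop() ran since the last expiry, move to OFF and stop */
		if (atomic_cmpxchg(&m->timer_state, T_STOPPED, T_OFF) != T_ON)
			return HRTIMER_NORESTART;

		do_sampling(m);		/* placeholder for the real work */
		hrtimer_forward_now(timer, ms_to_ktime(m->period_ms));
		return HRTIMER_RESTART;
	}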
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -54,7 +54,9 @@ void kbase_pm_policy_init(struct kbase_device *kbdev) unsigned long flags; int i; - if (of_property_read_string(np, "power_policy", &power_policy_name) == 0) { + /* Read "power-policy" property and fallback to "power_policy" if not found */ + if ((of_property_read_string(np, "power-policy", &power_policy_name) == 0) || + (of_property_read_string(np, "power_policy", &power_policy_name) == 0)) { for (i = 0; i < ARRAY_SIZE(all_policy_list); i++) if (sysfs_streq(all_policy_list[i]->name, power_policy_name)) { default_policy = all_policy_list[i]; @@ -117,10 +119,12 @@ void kbase_pm_update_active(struct kbase_device *kbdev) } else { /* Cancel the invocation of * kbase_pm_gpu_poweroff_wait_wq() from the L2 state - * machine. This is safe - it + * machine. This is safe - if * invoke_poweroff_wait_wq_when_l2_off is true, then * the poweroff work hasn't even been queued yet, - * meaning we can go straight to powering on. + * meaning we can go straight to powering on. We must + * however wake_up(poweroff_wait) in case someone was + * waiting for poweroff_wait_in_progress to become false. */ pm->backend.invoke_poweroff_wait_wq_when_l2_off = false; pm->backend.poweroff_wait_in_progress = false; @@ -130,6 +134,7 @@ void kbase_pm_update_active(struct kbase_device *kbdev) #endif spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + wake_up(&kbdev->pm.backend.poweroff_wait); kbase_pm_do_poweron(kbdev, false); } } else { @@ -293,6 +298,10 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, unsigned int new_policy_csf_pm_sched_flags; bool sched_suspend; bool reset_gpu = false; + bool reset_op_prevented = true; + struct kbase_csf_scheduler *scheduler = NULL; + u32 pwroff; + bool switching_to_always_on; #endif KBASE_DEBUG_ASSERT(kbdev != NULL); @@ -301,9 +310,33 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, KBASE_KTRACE_ADD(kbdev, PM_SET_POLICY, NULL, new_policy->id); #if MALI_USE_CSF + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev); + switching_to_always_on = new_policy == &kbase_pm_always_on_policy_ops; + if (pwroff == 0 && !switching_to_always_on) { + dev_warn(kbdev->dev, + "power_policy: cannot switch away from always_on with mcu_shader_pwroff_timeout set to 0\n"); + dev_warn(kbdev->dev, + "power_policy: resetting mcu_shader_pwroff_timeout to default value to switch policy from always_on\n"); + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); + } + + scheduler = &kbdev->csf.scheduler; + KBASE_DEBUG_ASSERT(scheduler != NULL); + /* Serialize calls on kbase_pm_set_policy() */ mutex_lock(&kbdev->pm.backend.policy_change_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, "Set PM policy failing to prevent gpu reset"); + reset_op_prevented = false; + } + + /* In case of CSF, the scheduler may be invoked to suspend. In that + * case, there is a risk that the L2 may be turned on by the time we + * check it here. So we hold the scheduler lock to avoid other operations + * interfering with the policy change and vice versa. 
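The "power-policy"/"power_policy" lookup above follows the same devicetree convention as the quirks properties earlier in this patch: prefer the dash spelling and fall back to the legacy underscore spelling. Since of_property_read_string() returns 0 on success, the second lookup only runs when the first name is absent; a minimal sketch:

	const char *name;

	if (!of_property_read_string(np, "power-policy", &name) ||
	    !of_property_read_string(np, "power_policy", &name))
		dev_info(kbdev->dev, "power policy from DT: %s\n", name);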
+ */ + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); /* policy_change_clamp_state_to_off, when needed, is set/cleared in * this function, a very limited temporal scope for covering the @@ -316,23 +349,22 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, * the always_on policy, reflected by the CSF_DYNAMIC_PM_CORE_KEEP_ON * flag bit. */ - sched_suspend = kbdev->csf.firmware_inited && + sched_suspend = reset_op_prevented && (CSF_DYNAMIC_PM_CORE_KEEP_ON & - (new_policy_csf_pm_sched_flags | - kbdev->pm.backend.csf_pm_sched_flags)); + (new_policy_csf_pm_sched_flags | kbdev->pm.backend.csf_pm_sched_flags)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - if (sched_suspend) - kbase_csf_scheduler_pm_suspend(kbdev); + if (sched_suspend) { + /* Update the suspend flag to reflect actually suspend being done ! */ + sched_suspend = !kbase_csf_scheduler_pm_suspend_no_lock(kbdev); + /* Set the reset recovery flag if the required suspend failed */ + reset_gpu = !sched_suspend; + } spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - /* If the current active policy is always_on, one needs to clamp the - * MCU/L2 for reaching off-state - */ - if (sched_suspend) - kbdev->pm.backend.policy_change_clamp_state_to_off = - CSF_DYNAMIC_PM_CORE_KEEP_ON & kbdev->pm.backend.csf_pm_sched_flags; + + kbdev->pm.backend.policy_change_clamp_state_to_off = sched_suspend; kbase_pm_update_state(kbdev); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -392,13 +424,19 @@ void kbase_pm_set_policy(struct kbase_device *kbdev, #if MALI_USE_CSF /* Reverse the suspension done */ + if (sched_suspend) + kbase_csf_scheduler_pm_resume_no_lock(kbdev); + rt_mutex_unlock(&scheduler->lock); + + if (reset_op_prevented) + kbase_reset_gpu_allow(kbdev); + if (reset_gpu) { dev_warn(kbdev->dev, "Resorting to GPU reset for policy change\n"); if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); kbase_reset_gpu_wait(kbdev); - } else if (sched_suspend) - kbase_csf_scheduler_pm_resume(kbdev); + } mutex_unlock(&kbdev->pm.backend.policy_change_lock); #endif diff --git a/mali_kbase/backend/gpu/mali_kbase_time.c b/mali_kbase/backend/gpu/mali_kbase_time.c index a83206a..28365c0 100644 --- a/mali_kbase/backend/gpu/mali_kbase_time.c +++ b/mali_kbase/backend/gpu/mali_kbase_time.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
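The reset handling added to kbase_pm_set_policy() above brackets the policy change with reset prevention so the change cannot overlap a concurrent GPU reset. A simplified sketch of that bracket (error handling trimmed; 0 from kbase_reset_gpu_prevent_and_wait() means the reset was successfully prevented):

	if (!kbase_reset_gpu_prevent_and_wait(kbdev)) {
		/* ... work that must not race a GPU reset ... */
		kbase_reset_gpu_allow(kbdev);
	} else {
		dev_warn(kbdev->dev, "could not prevent GPU reset, skipping");
	}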
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,9 +21,47 @@ #include <mali_kbase.h> #include <mali_kbase_hwaccess_time.h> +#if MALI_USE_CSF +#include <asm/arch_timer.h> +#include <linux/gcd.h> +#include <csf/mali_kbase_csf_timeout.h> +#endif #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <mali_kbase_config_defaults.h> +#include <linux/version_compat_defs.h> + +struct kbase_timeout_info { + char *selector_str; + u64 timeout_cycles; +}; + +#if MALI_USE_CSF +static struct kbase_timeout_info timeout_info[KBASE_TIMEOUT_SELECTOR_COUNT] = { + [CSF_FIRMWARE_TIMEOUT] = { "CSF_FIRMWARE_TIMEOUT", MIN(CSF_FIRMWARE_TIMEOUT_CYCLES, + CSF_FIRMWARE_PING_TIMEOUT_CYCLES) }, + [CSF_PM_TIMEOUT] = { "CSF_PM_TIMEOUT", CSF_PM_TIMEOUT_CYCLES }, + [CSF_GPU_RESET_TIMEOUT] = { "CSF_GPU_RESET_TIMEOUT", CSF_GPU_RESET_TIMEOUT_CYCLES }, + [CSF_CSG_SUSPEND_TIMEOUT] = { "CSF_CSG_SUSPEND_TIMEOUT", CSF_CSG_SUSPEND_TIMEOUT_CYCLES }, + [CSF_FIRMWARE_BOOT_TIMEOUT] = { "CSF_FIRMWARE_BOOT_TIMEOUT", + CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES }, + [CSF_FIRMWARE_PING_TIMEOUT] = { "CSF_FIRMWARE_PING_TIMEOUT", + CSF_FIRMWARE_PING_TIMEOUT_CYCLES }, + [CSF_SCHED_PROTM_PROGRESS_TIMEOUT] = { "CSF_SCHED_PROTM_PROGRESS_TIMEOUT", + DEFAULT_PROGRESS_TIMEOUT_CYCLES }, + [MMU_AS_INACTIVE_WAIT_TIMEOUT] = { "MMU_AS_INACTIVE_WAIT_TIMEOUT", + MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES }, + [KCPU_FENCE_SIGNAL_TIMEOUT] = { "KCPU_FENCE_SIGNAL_TIMEOUT", + KCPU_FENCE_SIGNAL_TIMEOUT_CYCLES }, +}; +#else +static struct kbase_timeout_info timeout_info[KBASE_TIMEOUT_SELECTOR_COUNT] = { + [MMU_AS_INACTIVE_WAIT_TIMEOUT] = { "MMU_AS_INACTIVE_WAIT_TIMEOUT", + MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES }, + [JM_DEFAULT_JS_FREE_TIMEOUT] = { "JM_DEFAULT_JS_FREE_TIMEOUT", + JM_DEFAULT_JS_FREE_TIMEOUT_CYCLES }, +}; +#endif void kbase_backend_get_gpu_time_norequest(struct kbase_device *kbdev, u64 *cycle_counter, @@ -103,72 +141,132 @@ void kbase_backend_get_gpu_time(struct kbase_device *kbdev, u64 *cycle_counter, #endif } -unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, - enum kbase_timeout_selector selector) +static u64 kbase_device_get_scaling_frequency(struct kbase_device *kbdev) { + u64 freq_khz = kbdev->lowest_gpu_freq_khz; + + if (!freq_khz) { + dev_dbg(kbdev->dev, + "Lowest frequency uninitialized! 
Using reference frequency for scaling"); + return DEFAULT_REF_TIMEOUT_FREQ_KHZ; + } + + return freq_khz; +} + +void kbase_device_set_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + unsigned int timeout_ms) +{ + char *selector_str; + + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + selector = KBASE_DEFAULT_TIMEOUT; + selector_str = timeout_info[selector].selector_str; + dev_warn(kbdev->dev, + "Unknown timeout selector passed, falling back to default: %s\n", + timeout_info[selector].selector_str); + } + selector_str = timeout_info[selector].selector_str; + + kbdev->backend_time.device_scaled_timeouts[selector] = timeout_ms; + dev_dbg(kbdev->dev, "\t%-35s: %ums\n", selector_str, timeout_ms); +} + +void kbase_device_set_timeout(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + u64 timeout_cycles, u32 cycle_multiplier) +{ + u64 final_cycles; + u64 timeout; + u64 freq_khz = kbase_device_get_scaling_frequency(kbdev); + + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + selector = KBASE_DEFAULT_TIMEOUT; + dev_warn(kbdev->dev, + "Unknown timeout selector passed, falling back to default: %s\n", + timeout_info[selector].selector_str); + } + + /* If the multiplication overflows, we will have unsigned wrap-around, and so might + * end up with a shorter timeout. In those cases, we then want to have the largest + * timeout possible that will not run into these issues. Note that this will not + * wait for U64_MAX/frequency ms, as it will be clamped to a max of UINT_MAX + * milliseconds by subsequent steps. + */ + if (check_mul_overflow(timeout_cycles, (u64)cycle_multiplier, &final_cycles)) + final_cycles = U64_MAX; + /* Timeout calculation: * dividing number of cycles by freq in KHz automatically gives value * in milliseconds. nr_cycles will have to be multiplied by 1e3 to * get result in microseconds, and 1e6 to get result in nanoseconds. */ + timeout = div_u64(final_cycles, freq_khz); - u64 timeout, nr_cycles = 0; - /* Default value to mean 'no cap' */ - u64 timeout_cap = U64_MAX; - u64 freq_khz = kbdev->lowest_gpu_freq_khz; - /* Only for debug messages, safe default in case it's mis-maintained */ - const char *selector_str = "(unknown)"; + if (unlikely(timeout > UINT_MAX)) { + dev_dbg(kbdev->dev, + "Capping excessive timeout %llums for %s at freq %llukHz to UINT_MAX ms", + timeout, timeout_info[selector].selector_str, + kbase_device_get_scaling_frequency(kbdev)); + timeout = UINT_MAX; + } - WARN_ON(!freq_khz); + kbase_device_set_timeout_ms(kbdev, selector, (unsigned int)timeout); +} - switch (selector) { - case KBASE_TIMEOUT_SELECTOR_COUNT: - default: -#if !MALI_USE_CSF - WARN(1, "Invalid timeout selector used! Using default value"); - nr_cycles = JM_DEFAULT_TIMEOUT_CYCLES; - break; -#else - /* Use Firmware timeout if invalid selection */ - WARN(1, - "Invalid timeout selector used! Using CSF Firmware timeout"); - fallthrough; - case CSF_FIRMWARE_TIMEOUT: - selector_str = "CSF_FIRMWARE_TIMEOUT"; - nr_cycles = CSF_FIRMWARE_TIMEOUT_CYCLES; - /* Setup a cap on CSF FW timeout to FIRMWARE_PING_INTERVAL_MS, - * if calculated timeout exceeds it. This should be adapted to - * a direct timeout comparison once the - * FIRMWARE_PING_INTERVAL_MS option is added to this timeout - * function. A compile-time check such as BUILD_BUG_ON can also - * be done once the firmware ping interval in cycles becomes - * available as a macro. 
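The cycle-to-millisecond scaling above works because dividing a cycle count by a frequency expressed in kHz yields milliseconds directly. A worked example with made-up numbers (the real cycle budgets live in the *_CYCLES macros):

	u64 cycles = 250000000ULL;	/* hypothetical cycle budget */
	u32 freq_khz = 100000;		/* hypothetical lowest GPU clock, 100 MHz */
	u64 timeout_ms = div_u64(cycles, freq_khz);	/* 250000000 / 100000 == 2500 ms */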
+/** + * kbase_timeout_scaling_init - Initialize the table of scaled timeout + * values associated with a @kbase_device. + * + * @kbdev: KBase device pointer. + * + * Return: 0 on success, negative error code otherwise. + */ +static int kbase_timeout_scaling_init(struct kbase_device *kbdev) +{ + int err; + enum kbase_timeout_selector selector; + + /* First, we initialize the minimum and maximum device frequencies, which + * are used to compute the timeouts. + */ + err = kbase_pm_gpu_freq_init(kbdev); + if (unlikely(err < 0)) { + dev_dbg(kbdev->dev, "Could not initialize GPU frequency\n"); + return err; + } + + dev_dbg(kbdev->dev, "Scaling kbase timeouts:\n"); + for (selector = 0; selector < KBASE_TIMEOUT_SELECTOR_COUNT; selector++) { + u32 cycle_multiplier = 1; + u64 nr_cycles = timeout_info[selector].timeout_cycles; +#if MALI_USE_CSF + /* Special case: the scheduler progress timeout can be set manually, + * and does not have a canonical length defined in the headers. Hence, + * we query it once upon startup to get a baseline, and change it upon + * every invocation of the appropriate functions */ - timeout_cap = FIRMWARE_PING_INTERVAL_MS; - break; - case CSF_PM_TIMEOUT: - selector_str = "CSF_PM_TIMEOUT"; - nr_cycles = CSF_PM_TIMEOUT_CYCLES; - break; - case CSF_GPU_RESET_TIMEOUT: - selector_str = "CSF_GPU_RESET_TIMEOUT"; - nr_cycles = CSF_GPU_RESET_TIMEOUT_CYCLES; - break; + if (selector == CSF_SCHED_PROTM_PROGRESS_TIMEOUT) + nr_cycles = kbase_csf_timeout_get(kbdev); #endif + + /* Since we are in control of the iteration bounds for the selector, + * we don't have to worry about bounds checking when setting the timeout. + */ + kbase_device_set_timeout(kbdev, selector, nr_cycles, cycle_multiplier); } + return 0; +} - timeout = div_u64(nr_cycles, freq_khz); - if (timeout > timeout_cap) { - dev_dbg(kbdev->dev, "Capped %s %llu to %llu", selector_str, - (unsigned long long)timeout, (unsigned long long)timeout_cap); - timeout = timeout_cap; +unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector) +{ + if (unlikely(selector >= KBASE_TIMEOUT_SELECTOR_COUNT)) { + dev_warn(kbdev->dev, "Querying wrong selector, falling back to default\n"); + selector = KBASE_DEFAULT_TIMEOUT; } - if (WARN(timeout > UINT_MAX, - "Capping excessive timeout %llums for %s at freq %llukHz to UINT_MAX ms", - (unsigned long long)timeout, selector_str, (unsigned long long)freq_khz)) - timeout = UINT_MAX; - return (unsigned int)timeout; + + return kbdev->backend_time.device_scaled_timeouts[selector]; } +KBASE_EXPORT_TEST_API(kbase_get_timeout_ms); u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev) { @@ -186,3 +284,79 @@ u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev) return lo | (((u64) hi1) << 32); } + +#if MALI_USE_CSF +u64 __maybe_unused kbase_backend_time_convert_gpu_to_cpu(struct kbase_device *kbdev, u64 gpu_ts) +{ + if (WARN_ON(!kbdev)) + return 0; + + return div64_u64(gpu_ts * kbdev->backend_time.multiplier, kbdev->backend_time.divisor) + + kbdev->backend_time.offset; +} + +/** + * get_cpu_gpu_time() - Get current CPU and GPU timestamps. + * + * @kbdev: Kbase device. + * @cpu_ts: Output CPU timestamp. + * @gpu_ts: Output GPU timestamp. + * @gpu_cycle: Output GPU cycle counts. 
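kbase_backend_time_convert_gpu_to_cpu() above applies a linear mapping, cpu_ns = gpu_ts * multiplier / divisor + offset, where multiplier and divisor are NSEC_PER_SEC and the arch timer frequency reduced by their GCD (computed in kbase_backend_time_init() further below). Illustrative arithmetic assuming a hypothetical 26 MHz arch timer:

	u64 freq = 26000000ULL;				/* hypothetical CNTFRQ */
	u64 common = gcd(NSEC_PER_SEC, freq);		/* 2,000,000 */
	u64 multiplier = div64_u64(NSEC_PER_SEC, common);	/* 500 */
	u64 divisor = div64_u64(freq, common);			/* 13 */

Reducing by the GCD first keeps gpu_ts * multiplier within 64 bits for as long as possible.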
+ */ +static void get_cpu_gpu_time(struct kbase_device *kbdev, u64 *cpu_ts, u64 *gpu_ts, u64 *gpu_cycle) +{ + struct timespec64 ts; + + kbase_backend_get_gpu_time(kbdev, gpu_cycle, gpu_ts, &ts); + + if (cpu_ts) + *cpu_ts = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec; +} +#endif + +int kbase_backend_time_init(struct kbase_device *kbdev) +{ + int err = 0; +#if MALI_USE_CSF + u64 cpu_ts = 0; + u64 gpu_ts = 0; + u64 freq; + u64 common_factor; + + kbase_pm_register_access_enable(kbdev); + get_cpu_gpu_time(kbdev, &cpu_ts, &gpu_ts, NULL); + freq = arch_timer_get_cntfrq(); + + if (!freq) { + dev_warn(kbdev->dev, "arch_timer_get_rate() is zero!"); + err = -EINVAL; + goto disable_registers; + } + + common_factor = gcd(NSEC_PER_SEC, freq); + + kbdev->backend_time.multiplier = div64_u64(NSEC_PER_SEC, common_factor); + kbdev->backend_time.divisor = div64_u64(freq, common_factor); + + if (!kbdev->backend_time.divisor) { + dev_warn(kbdev->dev, "CPU to GPU divisor is zero!"); + err = -EINVAL; + goto disable_registers; + } + + kbdev->backend_time.offset = cpu_ts - div64_u64(gpu_ts * kbdev->backend_time.multiplier, + kbdev->backend_time.divisor); +#endif + + if (kbase_timeout_scaling_init(kbdev)) { + dev_warn(kbdev->dev, "Could not initialize timeout scaling"); + err = -EINVAL; + } + +#if MALI_USE_CSF +disable_registers: + kbase_pm_register_access_disable(kbdev); +#endif + + return err; +} diff --git a/mali_kbase/build.bp b/mali_kbase/build.bp index 5dd5fd5..381b1fe 100644 --- a/mali_kbase/build.bp +++ b/mali_kbase/build.bp @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,10 +28,11 @@ bob_defaults { defaults: [ "kernel_defaults", ], - no_mali: { + mali_no_mali: { kbuild_options: [ "CONFIG_MALI_NO_MALI=y", "CONFIG_MALI_NO_MALI_DEFAULT_GPU={{.gpu}}", + "CONFIG_GPU_HWVER={{.hwver}}", ], }, mali_platform_dt_pin_rst: { @@ -52,9 +53,6 @@ bob_defaults { mali_midgard_enable_trace: { kbuild_options: ["CONFIG_MALI_MIDGARD_ENABLE_TRACE=y"], }, - mali_dma_fence: { - kbuild_options: ["CONFIG_MALI_DMA_FENCE=y"], - }, mali_arbiter_support: { kbuild_options: ["CONFIG_MALI_ARBITER_SUPPORT=y"], }, @@ -64,8 +62,14 @@ bob_defaults { mali_dma_buf_legacy_compat: { kbuild_options: ["CONFIG_MALI_DMA_BUF_LEGACY_COMPAT=y"], }, - mali_2mb_alloc: { - kbuild_options: ["CONFIG_MALI_2MB_ALLOC=y"], + large_page_alloc_override: { + kbuild_options: ["CONFIG_LARGE_PAGE_ALLOC_OVERRIDE=y"], + }, + large_page_alloc: { + kbuild_options: ["CONFIG_LARGE_PAGE_ALLOC=y"], + }, + page_migration_support: { + kbuild_options: ["CONFIG_PAGE_MIGRATION_SUPPORT=y"], }, mali_memory_fully_backed: { kbuild_options: ["CONFIG_MALI_MEMORY_FULLY_BACKED=y"], @@ -88,9 +92,6 @@ bob_defaults { mali_error_inject: { kbuild_options: ["CONFIG_MALI_ERROR_INJECT=y"], }, - mali_gem5_build: { - kbuild_options: ["CONFIG_MALI_GEM5_BUILD=y"], - }, mali_debug: { kbuild_options: [ "CONFIG_MALI_DEBUG=y", @@ -136,6 +137,27 @@ bob_defaults { mali_hw_errata_1485982_use_clock_alternative: { kbuild_options: ["CONFIG_MALI_HW_ERRATA_1485982_USE_CLOCK_ALTERNATIVE=y"], }, + mali_host_controls_sc_rails: { + kbuild_options: ["CONFIG_MALI_HOST_CONTROLS_SC_RAILS=y"], + }, + platform_is_fpga: { + kbuild_options: ["CONFIG_MALI_IS_FPGA=y"], + }, + mali_coresight: { + kbuild_options: 
["CONFIG_MALI_CORESIGHT=y"], + }, + mali_fw_trace_mode_manual: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_MANUAL=y"], + }, + mali_fw_trace_mode_auto_print: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_AUTO_PRINT=y"], + }, + mali_fw_trace_mode_auto_discard: { + kbuild_options: ["CONFIG_MALI_FW_TRACE_MODE_AUTO_DISCARD=y"], + }, + mali_trace_power_gpu_work_period: { + kbuild_options: ["CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD=y"], + }, kbuild_options: [ "CONFIG_MALI_PLATFORM_NAME={{.mali_platform_name}}", "MALI_CUSTOMER_RELEASE={{.release}}", @@ -156,10 +178,8 @@ bob_defaults { // is an umbrella feature that would be open for inappropriate use // (catch-all for experimental CS code without separating it into // different features). - "MALI_INCREMENTAL_RENDERING={{.incremental_rendering}}", - "MALI_GPU_TIMESTAMP_CORRECTION={{.gpu_timestamp_correction}}", + "MALI_INCREMENTAL_RENDERING_JM={{.incremental_rendering_jm}}", "MALI_BASE_CSF_PERFORMANCE_TESTS={{.base_csf_performance_tests}}", - "MALI_GPU_TIMESTAMP_INTERPOLATION={{.gpu_timestamp_interpolation}}", ], } @@ -178,6 +198,10 @@ bob_kernel_module { "context/*.c", "context/*.h", "context/Kbuild", + "hwcnt/*.c", + "hwcnt/*.h", + "hwcnt/backend/*.h", + "hwcnt/Kbuild", "ipa/*.c", "ipa/*.h", "ipa/Kbuild", @@ -185,6 +209,15 @@ bob_kernel_module { "platform/*/*.c", "platform/*/*.h", "platform/*/Kbuild", + "platform/*/*/*.c", + "platform/*/*/*.h", + "platform/*/*/Kbuild", + "platform/*/*/*.c", + "platform/*/*/*.h", + "platform/*/*/Kbuild", + "platform/*/*/*/*.c", + "platform/*/*/*/*.h", + "platform/*/*/*/Kbuild", "thirdparty/*.c", "thirdparty/Kbuild", "debug/*.c", @@ -211,6 +244,10 @@ bob_kernel_module { "device/backend/*_jm.c", "gpu/backend/*_jm.c", "gpu/backend/*_jm.h", + "hwcnt/backend/*_jm.c", + "hwcnt/backend/*_jm.h", + "hwcnt/backend/*_jm_*.c", + "hwcnt/backend/*_jm_*.h", "jm/*.h", "tl/backend/*_jm.c", "mmu/backend/*_jm.c", @@ -232,6 +269,10 @@ bob_kernel_module { "device/backend/*_csf.c", "gpu/backend/*_csf.c", "gpu/backend/*_csf.h", + "hwcnt/backend/*_csf.c", + "hwcnt/backend/*_csf.h", + "hwcnt/backend/*_csf_*.c", + "hwcnt/backend/*_csf_*.h", "tl/backend/*_csf.c", "mmu/backend/*_csf.c", "ipa/backend/*_csf.c", diff --git a/mali_kbase/context/backend/mali_kbase_context_csf.c b/mali_kbase/context/backend/mali_kbase_context_csf.c index 7d45a08..45a5a6c 100644 --- a/mali_kbase/context/backend/mali_kbase_context_csf.c +++ b/mali_kbase/context/backend/mali_kbase_context_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,27 +26,33 @@ #include <context/mali_kbase_context_internal.h> #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase.h> -#include <mali_kbase_dma_fence.h> #include <mali_kbase_mem_linux.h> #include <mali_kbase_mem_pool_group.h> #include <mmu/mali_kbase_mmu.h> #include <tl/mali_kbase_timeline.h> +#include <backend/gpu/mali_kbase_pm_internal.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include <csf/mali_kbase_csf_csg_debugfs.h> #include <csf/mali_kbase_csf_kcpu_debugfs.h> +#include <csf/mali_kbase_csf_sync_debugfs.h> #include <csf/mali_kbase_csf_tiler_heap_debugfs.h> #include <csf/mali_kbase_csf_cpu_queue_debugfs.h> #include <mali_kbase_debug_mem_view.h> +#include <mali_kbase_debug_mem_zones.h> +#include <mali_kbase_debug_mem_allocs.h> #include <mali_kbase_mem_pool_debugfs.h> void kbase_context_debugfs_init(struct kbase_context *const kctx) { kbase_debug_mem_view_init(kctx); + kbase_debug_mem_zones_init(kctx); + kbase_debug_mem_allocs_init(kctx); kbase_mem_pool_debugfs_init(kctx->kctx_dentry, kctx); kbase_jit_debugfs_init(kctx); kbase_csf_queue_group_debugfs_init(kctx); kbase_csf_kcpu_debugfs_init(kctx); + kbase_csf_sync_debugfs_init(kctx); kbase_csf_tiler_heap_debugfs_init(kctx); kbase_csf_tiler_heap_total_debugfs_init(kctx); kbase_csf_cpu_queue_debugfs_init(kctx); @@ -96,6 +102,8 @@ static const struct kbase_context_init context_init[] = { { kbase_sticky_resource_init, kbase_context_sticky_resource_term, "Sticky resource initialization failed" }, { kbase_jit_init, kbase_jit_term, "JIT initialization failed" }, + { kbasep_platform_context_init, kbasep_platform_context_term, + "Platform callback for kctx initialization failed" }, { kbase_csf_ctx_init, kbase_csf_ctx_term, "CSF context initialization failed" }, { kbase_context_add_to_dev_list, kbase_context_remove_from_dev_list, @@ -116,7 +124,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long const api_version, - struct file *const filp) + struct kbase_file *const kfile) { struct kbase_context *kctx; unsigned int i = 0; @@ -135,9 +143,11 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, kctx->kbdev = kbdev; kctx->api_version = api_version; - kctx->filp = filp; + kctx->kfile = kfile; kctx->create_flags = flags; + memcpy(kctx->comm, current->comm, sizeof(current->comm)); + if (is_compat) kbase_ctx_flag_set(kctx, KCTX_COMPAT); #if defined(CONFIG_64BIT) @@ -172,6 +182,7 @@ KBASE_EXPORT_SYMBOL(kbase_create_context); void kbase_destroy_context(struct kbase_context *kctx) { struct kbase_device *kbdev; + int err; if (WARN_ON(!kctx)) return; @@ -192,6 +203,27 @@ void kbase_destroy_context(struct kbase_context *kctx) wait_event(kbdev->pm.resume_wait, !kbase_pm_is_suspending(kbdev)); } + /* + * Taking a pm reference does not guarantee that the GPU has finished powering up. + * It's possible that the power up has been deferred until after a scheduled power down. + * We must wait here for the L2 to be powered up, and holding a pm reference guarantees that + * it will not be powered down afterwards. + */ + err = kbase_pm_wait_for_l2_powered(kbdev); + if (err) { + dev_err(kbdev->dev, "Wait for L2 power up failed on term of ctx %d_%d", + kctx->tgid, kctx->id); + } + + /* Have synchronized against the System suspend and incremented the + * pm.active_count. 
So any subsequent invocation of System suspend + * callback would get blocked. + * If System suspend callback was already in progress then the above loop + * would have waited till the System resume callback has begun. + * So wait for the System resume callback to also complete as we want to + * avoid context termination during System resume also. + */ + wait_event(kbdev->pm.resume_wait, !kbase_pm_is_resuming(kbdev)); kbase_mem_pool_group_mark_dying(&kctx->mem_pools); diff --git a/mali_kbase/context/backend/mali_kbase_context_jm.c b/mali_kbase/context/backend/mali_kbase_context_jm.c index 74402ec..39595d9 100644 --- a/mali_kbase/context/backend/mali_kbase_context_jm.c +++ b/mali_kbase/context/backend/mali_kbase_context_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,6 @@ #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_dma_fence.h> #include <mali_kbase_kinstr_jm.h> #include <mali_kbase_mem_linux.h> #include <mali_kbase_mem_pool_group.h> @@ -36,11 +35,15 @@ #if IS_ENABLED(CONFIG_DEBUG_FS) #include <mali_kbase_debug_mem_view.h> +#include <mali_kbase_debug_mem_zones.h> +#include <mali_kbase_debug_mem_allocs.h> #include <mali_kbase_mem_pool_debugfs.h> void kbase_context_debugfs_init(struct kbase_context *const kctx) { kbase_debug_mem_view_init(kctx); + kbase_debug_mem_zones_init(kctx); + kbase_debug_mem_allocs_init(kctx); kbase_mem_pool_debugfs_init(kctx->kctx_dentry, kctx); kbase_jit_debugfs_init(kctx); kbasep_jd_debugfs_ctx_init(kctx); @@ -126,8 +129,6 @@ static const struct kbase_context_init context_init[] = { { NULL, kbase_context_free, NULL }, { kbase_context_common_init, kbase_context_common_term, "Common context initialization failed" }, - { kbase_dma_fence_init, kbase_dma_fence_term, - "DMA fence initialization failed" }, { kbase_context_mem_pool_group_init, kbase_context_mem_pool_group_term, "Memory pool group initialization failed" }, { kbase_mem_evictable_init, kbase_mem_evictable_deinit, @@ -157,11 +158,11 @@ static const struct kbase_context_init context_init[] = { kbase_debug_job_fault_context_term, "Job fault context initialization failed" }, #endif + { kbasep_platform_context_init, kbasep_platform_context_term, + "Platform callback for kctx initialization failed" }, { NULL, kbase_context_flush_jobs, NULL }, { kbase_context_add_to_dev_list, kbase_context_remove_from_dev_list, "Adding kctx to device failed" }, - { kbasep_platform_context_init, kbasep_platform_context_term, - "Platform callback for kctx initialization failed" }, }; static void kbase_context_term_partial( @@ -178,7 +179,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long const api_version, - struct file *const filp) + struct kbase_file *const kfile) { struct kbase_context *kctx; unsigned int i = 0; @@ -197,7 +198,7 @@ struct kbase_context *kbase_create_context(struct kbase_device *kbdev, kctx->kbdev = kbdev; kctx->api_version = api_version; - kctx->filp = filp; + kctx->kfile = kfile; kctx->create_flags = flags; if (is_compat) @@ -257,6 +258,17 @@ void kbase_destroy_context(struct kbase_context *kctx) 
wait_event(kbdev->pm.resume_wait, !kbase_pm_is_suspending(kbdev)); } + + /* Have synchronized against the System suspend and incremented the + * pm.active_count. So any subsequent invocation of System suspend + * callback would get blocked. + * If System suspend callback was already in progress then the above loop + * would have waited till the System resume callback has begun. + * So wait for the System resume callback to also complete as we want to + * avoid context termination during System resume also. + */ + wait_event(kbdev->pm.resume_wait, !kbase_pm_is_resuming(kbdev)); + #ifdef CONFIG_MALI_ARBITER_SUPPORT atomic_dec(&kbdev->pm.gpu_users_waiting); #endif /* CONFIG_MALI_ARBITER_SUPPORT */ diff --git a/mali_kbase/context/mali_kbase_context.c b/mali_kbase/context/mali_kbase_context.c index c7d7585..d227084 100644 --- a/mali_kbase/context/mali_kbase_context.c +++ b/mali_kbase/context/mali_kbase_context.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,12 @@ /* * Base kernel context APIs */ +#include <linux/version.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/task.h> +#else +#include <linux/sched.h> +#endif #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> @@ -176,17 +182,51 @@ int kbase_context_common_init(struct kbase_context *kctx) /* creating a context is considered a disjoint event */ kbase_disjoint_event(kctx->kbdev); - kctx->as_nr = KBASEP_AS_NR_INVALID; - - atomic_set(&kctx->refcount, 0); - - spin_lock_init(&kctx->mm_update_lock); kctx->process_mm = NULL; + kctx->task = NULL; atomic_set(&kctx->nonmapped_pages, 0); atomic_set(&kctx->permanent_mapped_pages, 0); kctx->tgid = current->tgid; kctx->pid = current->pid; + /* Check if this is a Userspace created context */ + if (likely(kctx->kfile)) { + struct pid *pid_struct; + + rcu_read_lock(); + pid_struct = find_get_pid(kctx->tgid); + if (likely(pid_struct)) { + struct task_struct *task = pid_task(pid_struct, PIDTYPE_PID); + + if (likely(task)) { + /* Take a reference on the task to avoid slow lookup + * later on from the page allocation loop. 
+ */ + get_task_struct(task); + kctx->task = task; + } else { + dev_err(kctx->kbdev->dev, + "Failed to get task pointer for %s/%d", + current->comm, current->pid); + err = -ESRCH; + } + + put_pid(pid_struct); + } else { + dev_err(kctx->kbdev->dev, + "Failed to get pid pointer for %s/%d", + current->comm, current->pid); + err = -ESRCH; + } + rcu_read_unlock(); + + if (unlikely(err)) + return err; + + kbase_mem_mmgrab(); + kctx->process_mm = current->mm; + } + atomic_set(&kctx->used_pages, 0); mutex_init(&kctx->reg_lock); @@ -197,7 +237,6 @@ int kbase_context_common_init(struct kbase_context *kctx) spin_lock_init(&kctx->waiting_soft_jobs_lock); INIT_LIST_HEAD(&kctx->waiting_soft_jobs); - init_waitqueue_head(&kctx->event_queue); atomic_set(&kctx->event_count, 0); #if !MALI_USE_CSF @@ -212,18 +251,23 @@ int kbase_context_common_init(struct kbase_context *kctx) atomic64_set(&kctx->num_fixed_allocs, 0); #endif + kbase_gpu_vm_lock(kctx); bitmap_copy(kctx->cookies, &cookies_mask, BITS_PER_LONG); + kbase_gpu_vm_unlock(kctx); kctx->id = atomic_add_return(1, &(kctx->kbdev->ctx_num)) - 1; mutex_lock(&kctx->kbdev->kctx_list_lock); - err = kbase_insert_kctx_to_process(kctx); - if (err) - dev_err(kctx->kbdev->dev, - "(err:%d) failed to insert kctx to kbase_process\n", err); - mutex_unlock(&kctx->kbdev->kctx_list_lock); + if (err) { + dev_err(kctx->kbdev->dev, + "(err:%d) failed to insert kctx to kbase_process", err); + if (likely(kctx->kfile)) { + mmdrop(kctx->process_mm); + put_task_struct(kctx->task); + } + } return err; } @@ -286,7 +330,9 @@ static void kbase_remove_kctx_from_process(struct kbase_context *kctx) /* Add checks, so that the terminating process Should not * hold any gpu_memory. */ + spin_lock(&kctx->kbdev->gpu_mem_usage_lock); WARN_ON(kprcs->total_gpu_pages); + spin_unlock(&kctx->kbdev->gpu_mem_usage_lock); WARN_ON(!RB_EMPTY_ROOT(&kprcs->dma_buf_root)); kobject_del(&kprcs->kobj); kobject_put(&kprcs->kobj); @@ -296,15 +342,8 @@ static void kbase_remove_kctx_from_process(struct kbase_context *kctx) void kbase_context_common_term(struct kbase_context *kctx) { - unsigned long flags; int pages; - mutex_lock(&kctx->kbdev->mmu_hw_mutex); - spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); - kbase_ctx_sched_remove_ctx(kctx); - spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); - mutex_unlock(&kctx->kbdev->mmu_hw_mutex); - pages = atomic_read(&kctx->used_pages); if (pages != 0) dev_warn(kctx->kbdev->dev, @@ -316,15 +355,18 @@ void kbase_context_common_term(struct kbase_context *kctx) kbase_remove_kctx_from_process(kctx); mutex_unlock(&kctx->kbdev->kctx_list_lock); + if (likely(kctx->kfile)) { + mmdrop(kctx->process_mm); + put_task_struct(kctx->task); + } + KBASE_KTRACE_ADD(kctx->kbdev, CORE_CTX_DESTROY, kctx, 0u); } int kbase_context_mem_pool_group_init(struct kbase_context *kctx) { - return kbase_mem_pool_group_init(&kctx->mem_pools, - kctx->kbdev, - &kctx->kbdev->mem_pool_defaults, - &kctx->kbdev->mem_pools); + return kbase_mem_pool_group_init(&kctx->mem_pools, kctx->kbdev, + &kctx->kbdev->mem_pool_defaults, &kctx->kbdev->mem_pools); } void kbase_context_mem_pool_group_term(struct kbase_context *kctx) diff --git a/mali_kbase/context/mali_kbase_context.h b/mali_kbase/context/mali_kbase_context.h index a0c51c9..22cb00c 100644 --- a/mali_kbase/context/mali_kbase_context.h +++ b/mali_kbase/context/mali_kbase_context.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2017, 2019-2021 ARM Limited. All rights reserved. 
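For contexts created from userspace (kctx->kfile non-NULL), the code above pins both the creating task and its mm at context creation so later page-accounting paths can use them without a PID lookup; the matching put_task_struct()/mmdrop() appear in the error path and in kbase_context_common_term(). A hedged sketch of that reference-taking pattern using standard kernel APIs (pin_creator is a made-up name, not a driver function):

#include <linux/errno.h>
#include <linux/pid.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/sched/task.h>

/* Illustrative only: take references on the current thread-group leader and its
 * mm_struct so both outlive this call. The caller must later call
 * put_task_struct() and mmdrop() in the reverse teardown path.
 */
static int pin_creator(struct task_struct **out_task, struct mm_struct **out_mm)
{
	struct pid *pid_struct;
	struct task_struct *task = NULL;

	rcu_read_lock();
	pid_struct = find_get_pid(current->tgid);
	if (pid_struct) {
		task = pid_task(pid_struct, PIDTYPE_PID);
		if (task)
			get_task_struct(task);	/* hold the task */
		put_pid(pid_struct);
	}
	rcu_read_unlock();

	if (!task)
		return -ESRCH;

	mmgrab(current->mm);	/* pin the mm_struct itself, not its pages */
	*out_task = task;
	*out_mm = current->mm;
	return 0;
}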
+ * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,8 +56,9 @@ void kbase_context_debugfs_term(struct kbase_context *const kctx); * BASEP_CONTEXT_CREATE_KERNEL_FLAGS. * @api_version: Application program interface version, as encoded in * a single integer by the KBASE_API_VERSION macro. - * @filp: Pointer to the struct file corresponding to device file - * /dev/malixx instance, passed to the file's open method. + * @kfile: Pointer to the object representing the /dev/malixx device + * file instance. Shall be passed as NULL for internally created + * contexts. * * Up to one context can be created for each client that opens the device file * /dev/malixx. Context creation is deferred until a special ioctl() system call @@ -69,7 +70,7 @@ struct kbase_context * kbase_create_context(struct kbase_device *kbdev, bool is_compat, base_context_create_flags const flags, unsigned long api_version, - struct file *filp); + struct kbase_file *const kfile); /** * kbase_destroy_context - Destroy a kernel base context. @@ -93,6 +94,19 @@ static inline bool kbase_ctx_flag(struct kbase_context *kctx, } /** + * kbase_ctx_compat_mode - Indicate whether a kbase context needs to operate + * in compatibility mode for 32-bit userspace. + * @kctx: kbase context + * + * Return: True if needs to maintain compatibility, False otherwise. + */ +static inline bool kbase_ctx_compat_mode(struct kbase_context *kctx) +{ + return !IS_ENABLED(CONFIG_64BIT) || + (IS_ENABLED(CONFIG_64BIT) && kbase_ctx_flag(kctx, KCTX_COMPAT)); +} + +/** * kbase_ctx_flag_clear - Clear @flag on @kctx * @kctx: Pointer to kbase context * @flag: Flag to clear diff --git a/mali_kbase/csf/Kbuild b/mali_kbase/csf/Kbuild index 29983fb..c626092 100644 --- a/mali_kbase/csf/Kbuild +++ b/mali_kbase/csf/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -31,14 +31,24 @@ mali_kbase-y += \ csf/mali_kbase_csf_reset_gpu.o \ csf/mali_kbase_csf_csg_debugfs.o \ csf/mali_kbase_csf_kcpu_debugfs.o \ + csf/mali_kbase_csf_sync_debugfs.o \ + csf/mali_kbase_csf_kcpu_fence_debugfs.o \ csf/mali_kbase_csf_protected_memory.o \ csf/mali_kbase_csf_tiler_heap_debugfs.o \ csf/mali_kbase_csf_cpu_queue_debugfs.o \ - csf/mali_kbase_csf_event.o + csf/mali_kbase_csf_event.o \ + csf/mali_kbase_csf_firmware_log.o \ + csf/mali_kbase_csf_firmware_core_dump.o \ + csf/mali_kbase_csf_tiler_heap_reclaim.o \ + csf/mali_kbase_csf_mcu_shared_reg.o -mali_kbase-$(CONFIG_MALI_REAL_HW) += csf/mali_kbase_csf_firmware.o +ifeq ($(CONFIG_MALI_NO_MALI),y) +mali_kbase-y += csf/mali_kbase_csf_firmware_no_mali.o +else +mali_kbase-y += csf/mali_kbase_csf_firmware.o +endif -mali_kbase-$(CONFIG_MALI_NO_MALI) += csf/mali_kbase_csf_firmware_no_mali.o +mali_kbase-$(CONFIG_DEBUG_FS) += csf/mali_kbase_debug_csf_fault.o ifeq ($(KBUILD_EXTMOD),) # in-tree diff --git a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c index a56b689..bbf2e4e 100644 --- a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c +++ b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,6 +20,7 @@ */ #include <mali_kbase.h> +#include <mali_kbase_config_defaults.h> #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include "mali_kbase_csf_ipa_control.h" @@ -27,8 +28,6 @@ * Status flags from the STATUS register of the IPA Control interface. */ #define STATUS_COMMAND_ACTIVE ((u32)1 << 0) -#define STATUS_TIMER_ACTIVE ((u32)1 << 1) -#define STATUS_AUTO_ACTIVE ((u32)1 << 2) #define STATUS_PROTECTED_MODE ((u32)1 << 8) #define STATUS_RESET ((u32)1 << 9) #define STATUS_TIMER_ENABLED ((u32)1 << 31) @@ -36,27 +35,15 @@ /* * Commands for the COMMAND register of the IPA Control interface. */ -#define COMMAND_NOP ((u32)0) #define COMMAND_APPLY ((u32)1) -#define COMMAND_CLEAR ((u32)2) #define COMMAND_SAMPLE ((u32)3) #define COMMAND_PROTECTED_ACK ((u32)4) #define COMMAND_RESET_ACK ((u32)5) /* - * Default value for the TIMER register of the IPA Control interface, - * expressed in milliseconds. - * - * The chosen value is a trade off between two requirements: the IPA Control - * interface should sample counters with a resolution in the order of - * milliseconds, while keeping GPU overhead as limited as possible. - */ -#define TIMER_DEFAULT_VALUE_MS ((u32)10) /* 10 milliseconds */ - -/* * Number of timer events per second. */ -#define TIMER_EVENTS_PER_SECOND ((u32)1000 / TIMER_DEFAULT_VALUE_MS) +#define TIMER_EVENTS_PER_SECOND ((u32)1000 / IPA_CONTROL_TIMER_DEFAULT_VALUE_MS) /* * Maximum number of loops polling the GPU before we assume the GPU has hung. @@ -77,12 +64,19 @@ * struct kbase_ipa_control_listener_data - Data for the GPU clock frequency * listener * - * @listener: GPU clock frequency listener. - * @kbdev: Pointer to kbase device. + * @listener: GPU clock frequency listener. + * @kbdev: Pointer to kbase device. 
+ * @clk_chg_wq: Dedicated workqueue to process the work item corresponding to + * a clock rate notification. + * @clk_chg_work: Work item to process the clock rate change + * @rate: The latest notified rate change, in unit of Hz */ struct kbase_ipa_control_listener_data { struct kbase_clk_rate_listener listener; struct kbase_device *kbdev; + struct workqueue_struct *clk_chg_wq; + struct work_struct clk_chg_work; + atomic_t rate; }; static u32 timer_value(u32 gpu_rate) @@ -284,58 +278,61 @@ kbase_ipa_control_rate_change_notify(struct kbase_clk_rate_listener *listener, u32 clk_index, u32 clk_rate_hz) { if ((clk_index == KBASE_CLOCK_DOMAIN_TOP) && (clk_rate_hz != 0)) { - size_t i; - unsigned long flags; struct kbase_ipa_control_listener_data *listener_data = - container_of(listener, - struct kbase_ipa_control_listener_data, - listener); - struct kbase_device *kbdev = listener_data->kbdev; - struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; + container_of(listener, struct kbase_ipa_control_listener_data, listener); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + /* Save the rate and delegate the job to a work item */ + atomic_set(&listener_data->rate, clk_rate_hz); + queue_work(listener_data->clk_chg_wq, &listener_data->clk_chg_work); + } +} - if (!kbdev->pm.backend.gpu_ready) { - dev_err(kbdev->dev, - "%s: GPU frequency cannot change while GPU is off", - __func__); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - return; - } +static void kbase_ipa_ctrl_rate_change_worker(struct work_struct *data) +{ + struct kbase_ipa_control_listener_data *listener_data = + container_of(data, struct kbase_ipa_control_listener_data, clk_chg_work); + struct kbase_device *kbdev = listener_data->kbdev; + struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; + unsigned long flags; + u32 rate; + size_t i; - /* Interrupts are already disabled and interrupt state is also saved */ - spin_lock(&ipa_ctrl->lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - for (i = 0; i < KBASE_IPA_CONTROL_MAX_SESSIONS; i++) { - struct kbase_ipa_control_session *session = &ipa_ctrl->sessions[i]; + if (!kbdev->pm.backend.gpu_ready) { + dev_err(kbdev->dev, "%s: GPU frequency cannot change while GPU is off", __func__); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return; + } - if (session->active) { - size_t j; + spin_lock(&ipa_ctrl->lock); + /* Picking up the latest notified rate */ + rate = (u32)atomic_read(&listener_data->rate); - for (j = 0; j < session->num_prfcnts; j++) { - struct kbase_ipa_control_prfcnt *prfcnt = - &session->prfcnts[j]; + for (i = 0; i < KBASE_IPA_CONTROL_MAX_SESSIONS; i++) { + struct kbase_ipa_control_session *session = &ipa_ctrl->sessions[i]; - if (prfcnt->gpu_norm) - calc_prfcnt_delta(kbdev, prfcnt, true); - } - } - } + if (session->active) { + size_t j; - ipa_ctrl->cur_gpu_rate = clk_rate_hz; + for (j = 0; j < session->num_prfcnts; j++) { + struct kbase_ipa_control_prfcnt *prfcnt = &session->prfcnts[j]; - /* Update the timer for automatic sampling if active sessions - * are present. Counters have already been manually sampled. - */ - if (ipa_ctrl->num_active_sessions > 0) { - kbase_reg_write(kbdev, IPA_CONTROL_REG(TIMER), - timer_value(ipa_ctrl->cur_gpu_rate)); + if (prfcnt->gpu_norm) + calc_prfcnt_delta(kbdev, prfcnt, true); + } } + } - spin_unlock(&ipa_ctrl->lock); + ipa_ctrl->cur_gpu_rate = rate; + /* Update the timer for automatic sampling if active sessions + * are present. Counters have already been manually sampled. 
+ */ + if (ipa_ctrl->num_active_sessions > 0) + kbase_reg_write(kbdev, IPA_CONTROL_REG(TIMER), timer_value(rate)); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - } + spin_unlock(&ipa_ctrl->lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } void kbase_ipa_control_init(struct kbase_device *kbdev) @@ -344,6 +341,7 @@ void kbase_ipa_control_init(struct kbase_device *kbdev) struct kbase_clk_rate_trace_manager *clk_rtm = &kbdev->pm.clk_rtm; struct kbase_ipa_control_listener_data *listener_data; size_t i, j; + unsigned long flags; for (i = 0; i < KBASE_IPA_CORE_TYPE_NUM; i++) { for (j = 0; j < KBASE_IPA_CONTROL_NUM_BLOCK_COUNTERS; j++) { @@ -362,20 +360,35 @@ void kbase_ipa_control_init(struct kbase_device *kbdev) listener_data = kmalloc(sizeof(struct kbase_ipa_control_listener_data), GFP_KERNEL); if (listener_data) { - listener_data->listener.notify = - kbase_ipa_control_rate_change_notify; - listener_data->kbdev = kbdev; - ipa_ctrl->rtm_listener_data = listener_data; - } + listener_data->clk_chg_wq = + alloc_workqueue("ipa_ctrl_wq", WQ_HIGHPRI | WQ_UNBOUND, 1); + if (listener_data->clk_chg_wq) { + INIT_WORK(&listener_data->clk_chg_work, kbase_ipa_ctrl_rate_change_worker); + listener_data->listener.notify = kbase_ipa_control_rate_change_notify; + listener_data->kbdev = kbdev; + ipa_ctrl->rtm_listener_data = listener_data; + /* Initialise to 0, which is out of normal notified rates */ + atomic_set(&listener_data->rate, 0); + } else { + dev_warn(kbdev->dev, + "%s: failed to allocate workqueue, clock rate update disabled", + __func__); + kfree(listener_data); + listener_data = NULL; + } + } else + dev_warn(kbdev->dev, + "%s: failed to allocate memory, IPA control clock rate update disabled", + __func__); - spin_lock(&clk_rtm->lock); + spin_lock_irqsave(&clk_rtm->lock, flags); if (clk_rtm->clks[KBASE_CLOCK_DOMAIN_TOP]) ipa_ctrl->cur_gpu_rate = clk_rtm->clks[KBASE_CLOCK_DOMAIN_TOP]->clock_val; if (listener_data) kbase_clk_rate_trace_manager_subscribe_no_lock( clk_rtm, &listener_data->listener); - spin_unlock(&clk_rtm->lock); + spin_unlock_irqrestore(&clk_rtm->lock, flags); } KBASE_EXPORT_TEST_API(kbase_ipa_control_init); @@ -389,8 +402,10 @@ void kbase_ipa_control_term(struct kbase_device *kbdev) WARN_ON(ipa_ctrl->num_active_sessions); - if (listener_data) + if (listener_data) { kbase_clk_rate_trace_manager_unsubscribe(clk_rtm, &listener_data->listener); + destroy_workqueue(listener_data->clk_chg_wq); + } kfree(ipa_ctrl->rtm_listener_data); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -602,9 +617,10 @@ int kbase_ipa_control_register( */ for (session_idx = 0; session_idx < KBASE_IPA_CONTROL_MAX_SESSIONS; session_idx++) { - session = &ipa_ctrl->sessions[session_idx]; - if (!session->active) + if (!ipa_ctrl->sessions[session_idx].active) { + session = &ipa_ctrl->sessions[session_idx]; break; + } } if (!session) { @@ -659,7 +675,7 @@ int kbase_ipa_control_register( /* Reports to this client for GPU time spent in protected mode * should begin from the point of registration. */ - session->last_query_time = ktime_get_ns(); + session->last_query_time = ktime_get_raw_ns(); /* Initially, no time has been spent in protected mode */ session->protm_time = 0; @@ -829,7 +845,7 @@ int kbase_ipa_control_query(struct kbase_device *kbdev, const void *client, } if (protected_time) { - u64 time_now = ktime_get_ns(); + u64 time_now = ktime_get_raw_ns(); /* This is the amount of protected-mode time spent prior to * the current protm period. 
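The restructuring above takes the heavy work out of the clock-rate notifier: the callback just stores the newest rate atomically and queues a work item on a dedicated high-priority workqueue, and the worker re-reads the rate under the locks it actually needs. A minimal sketch of that defer-to-worker shape (generic type and function names, not the driver's):

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/workqueue.h>

struct rate_listener {
	struct workqueue_struct *wq;	/* dedicated high-priority, unbound queue */
	struct work_struct work;
	atomic_t rate_hz;		/* latest notified rate */
};

/* Notifier side: may run in a context where sleeping or lock ordering is awkward,
 * so only record the value and defer.
 */
static void rate_notify(struct rate_listener *l, u32 rate_hz)
{
	atomic_set(&l->rate_hz, rate_hz);
	queue_work(l->wq, &l->work);
}

/* Worker side: process context, free to take the subsystem's own locks. */
static void rate_worker(struct work_struct *work)
{
	struct rate_listener *l = container_of(work, struct rate_listener, work);
	u32 rate = (u32)atomic_read(&l->rate_hz);

	/* ... apply 'rate': update cached frequency, reprogram sampling timer ... */
	(void)rate;
}

static int rate_listener_init(struct rate_listener *l)
{
	l->wq = alloc_workqueue("rate_wq", WQ_HIGHPRI | WQ_UNBOUND, 1);
	if (!l->wq)
		return -ENOMEM;
	INIT_WORK(&l->work, rate_worker);
	atomic_set(&l->rate_hz, 0);	/* 0 is outside the range of real notifications */
	return 0;
}

Because only the latest rate matters, coalescing several notifications into one pending work item is harmless; the test helper in the next hunk uses flush_work() to make the deferred update observable before returning.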
@@ -973,16 +989,53 @@ void kbase_ipa_control_handle_gpu_reset_post(struct kbase_device *kbdev) } KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_reset_post); +#ifdef KBASE_PM_RUNTIME +void kbase_ipa_control_handle_gpu_sleep_enter(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (kbdev->pm.backend.mcu_state == KBASE_MCU_IN_SLEEP) { + /* GPU Sleep is treated as a power down */ + kbase_ipa_control_handle_gpu_power_off(kbdev); + + /* SELECT_CSHW register needs to be cleared to prevent any + * IPA control message to be sent to the top level GPU HWCNT. + */ + kbase_reg_write(kbdev, IPA_CONTROL_REG(SELECT_CSHW_LO), 0); + kbase_reg_write(kbdev, IPA_CONTROL_REG(SELECT_CSHW_HI), 0); + + /* No need to issue the APPLY command here */ + } +} +KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_sleep_enter); + +void kbase_ipa_control_handle_gpu_sleep_exit(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (kbdev->pm.backend.mcu_state == KBASE_MCU_IN_SLEEP) { + /* To keep things simple, currently exit from + * GPU Sleep is treated as a power on event where + * all 4 SELECT registers are reconfigured. + * On exit from sleep, reconfiguration is needed + * only for the SELECT_CSHW register. + */ + kbase_ipa_control_handle_gpu_power_on(kbdev); + } +} +KBASE_EXPORT_TEST_API(kbase_ipa_control_handle_gpu_sleep_exit); +#endif + #if MALI_UNIT_TEST void kbase_ipa_control_rate_change_notify_test(struct kbase_device *kbdev, u32 clk_index, u32 clk_rate_hz) { struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; - struct kbase_ipa_control_listener_data *listener_data = - ipa_ctrl->rtm_listener_data; + struct kbase_ipa_control_listener_data *listener_data = ipa_ctrl->rtm_listener_data; - kbase_ipa_control_rate_change_notify(&listener_data->listener, - clk_index, clk_rate_hz); + kbase_ipa_control_rate_change_notify(&listener_data->listener, clk_index, clk_rate_hz); + /* Ensure the callback has taken effect before returning back to the test caller */ + flush_work(&listener_data->clk_chg_work); } KBASE_EXPORT_TEST_API(kbase_ipa_control_rate_change_notify_test); #endif @@ -992,14 +1045,14 @@ void kbase_ipa_control_protm_entered(struct kbase_device *kbdev) struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; lockdep_assert_held(&kbdev->hwaccess_lock); - ipa_ctrl->protm_start = ktime_get_ns(); + ipa_ctrl->protm_start = ktime_get_raw_ns(); } void kbase_ipa_control_protm_exited(struct kbase_device *kbdev) { struct kbase_ipa_control *ipa_ctrl = &kbdev->csf.ipa_control; size_t i; - u64 time_now = ktime_get_ns(); + u64 time_now = ktime_get_raw_ns(); u32 status; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1035,4 +1088,3 @@ void kbase_ipa_control_protm_exited(struct kbase_device *kbdev) } } } - diff --git a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h index 0469c48..69ff897 100644 --- a/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h +++ b/mali_kbase/csf/ipa_control/mali_kbase_csf_ipa_control.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. 
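The ktime_get_ns() to ktime_get_raw_ns() switches in this file move the protected-mode time accounting onto CLOCK_MONOTONIC_RAW, which is not frequency-slewed by NTP, so measured protm durations stay consistent between the entry/exit markers and later queries. The pattern, as a hedged one-liner (illustrative helper name):

#include <linux/ktime.h>
#include <linux/types.h>

/* protm_start_ns is assumed to have been captured with ktime_get_raw_ns() at
 * protected-mode entry; the difference is the un-slewed elapsed time.
 */
static u64 protm_elapsed_ns(u64 protm_start_ns)
{
	return ktime_get_raw_ns() - protm_start_ns;
}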
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -198,6 +198,33 @@ void kbase_ipa_control_handle_gpu_reset_pre(struct kbase_device *kbdev); */ void kbase_ipa_control_handle_gpu_reset_post(struct kbase_device *kbdev); +#ifdef KBASE_PM_RUNTIME +/** + * kbase_ipa_control_handle_gpu_sleep_enter - Handle the pre GPU Sleep event + * + * @kbdev: Pointer to kbase device. + * + * This function is called after MCU has been put to sleep state & L2 cache has + * been powered down. The top level part of GPU is still powered up when this + * function is called. + */ +void kbase_ipa_control_handle_gpu_sleep_enter(struct kbase_device *kbdev); + +/** + * kbase_ipa_control_handle_gpu_sleep_exit - Handle the post GPU Sleep event + * + * @kbdev: Pointer to kbase device. + * + * This function is called when L2 needs to be powered up and MCU can exit the + * sleep state. The top level part of GPU is powered up when this function is + * called. + * + * This function must be called only if kbase_ipa_control_handle_gpu_sleep_enter() + * was called previously. + */ +void kbase_ipa_control_handle_gpu_sleep_exit(struct kbase_device *kbdev); +#endif + #if MALI_UNIT_TEST /** * kbase_ipa_control_rate_change_notify_test - Notify GPU rate change diff --git a/mali_kbase/csf/mali_kbase_csf.c b/mali_kbase/csf/mali_kbase_csf.c index 1a92267..91d5c43 100644 --- a/mali_kbase/csf/mali_kbase_csf.c +++ b/mali_kbase/csf/mali_kbase_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,10 +34,19 @@ #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <mali_kbase_hwaccess_time.h> #include "mali_kbase_csf_event.h" +#include <mali_linux_trace.h> +#include <linux/protected_memory_allocator.h> +#include <tl/mali_kbase_tracepoints.h> +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <linux/version_compat_defs.h> #define CS_REQ_EXCEPTION_MASK (CS_REQ_FAULT_MASK | CS_REQ_FATAL_MASK) #define CS_ACK_EXCEPTION_MASK (CS_ACK_FAULT_MASK | CS_ACK_FATAL_MASK) -#define POWER_DOWN_LATEST_FLUSH_VALUE ((u32)1) + +#define CS_RING_BUFFER_MAX_SIZE ((uint32_t)(1 << 31)) /* 2GiB */ +#define CS_RING_BUFFER_MIN_SIZE ((uint32_t)4096) + +#define PROTM_ALLOC_MAX_RETRIES ((u8)5) const u8 kbasep_csf_queue_group_priority_to_relative[BASE_QUEUE_GROUP_PRIORITY_COUNT] = { KBASE_QUEUE_GROUP_PRIORITY_HIGH, @@ -52,6 +61,55 @@ const u8 kbasep_csf_relative_to_queue_group_priority[KBASE_QUEUE_GROUP_PRIORITY_ BASE_QUEUE_GROUP_PRIORITY_LOW }; +/* + * struct irq_idle_and_protm_track - Object that tracks the idle and protected mode + * request information in an interrupt case across + * groups. + * + * @protm_grp: Possibly schedulable group that requested protected mode in the interrupt. + * If NULL, no such case observed in the tracked interrupt case. + * @idle_seq: The highest priority group that notified idle. If no such instance in the + * interrupt case, marked with the largest field value: U32_MAX. + * @idle_slot: The slot number if @p idle_seq is valid in the given tracking case. 
+ */ +struct irq_idle_and_protm_track { + struct kbase_queue_group *protm_grp; + u32 idle_seq; + s8 idle_slot; +}; + +/** + * kbasep_ctx_user_reg_page_mapping_term() - Terminate resources for USER Register Page. + * + * @kctx: Pointer to the kbase context + */ +static void kbasep_ctx_user_reg_page_mapping_term(struct kbase_context *kctx) +{ + struct kbase_device *kbdev = kctx->kbdev; + + if (unlikely(kctx->csf.user_reg.vma)) + dev_err(kbdev->dev, "VMA for USER Register page exist on termination of ctx %d_%d", + kctx->tgid, kctx->id); + if (WARN_ON_ONCE(!list_empty(&kctx->csf.user_reg.link))) + list_del_init(&kctx->csf.user_reg.link); +} + +/** + * kbasep_ctx_user_reg_page_mapping_init() - Initialize resources for USER Register Page. + * + * @kctx: Pointer to the kbase context + * + * @return: 0 on success. + */ +static int kbasep_ctx_user_reg_page_mapping_init(struct kbase_context *kctx) +{ + INIT_LIST_HEAD(&kctx->csf.user_reg.link); + kctx->csf.user_reg.vma = NULL; + kctx->csf.user_reg.file_offset = 0; + + return 0; +} + static void put_user_pages_mmap_handle(struct kbase_context *kctx, struct kbase_queue *queue) { @@ -112,116 +170,32 @@ static int get_user_pages_mmap_handle(struct kbase_context *kctx, return 0; } -static void gpu_munmap_user_io_pages(struct kbase_context *kctx, - struct kbase_va_region *reg) -{ - size_t num_pages = 2; - - kbase_mmu_teardown_pages(kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - reg->start_pfn, num_pages, MCU_AS_NR); - - WARN_ON(reg->flags & KBASE_REG_FREE); - - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); -} - static void init_user_io_pages(struct kbase_queue *queue) { - u32 *input_addr = (u32 *)(queue->user_io_addr); - u32 *output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); - - input_addr[CS_INSERT_LO/4] = 0; - input_addr[CS_INSERT_HI/4] = 0; - - input_addr[CS_EXTRACT_INIT_LO/4] = 0; - input_addr[CS_EXTRACT_INIT_HI/4] = 0; - - output_addr[CS_EXTRACT_LO/4] = 0; - output_addr[CS_EXTRACT_HI/4] = 0; - - output_addr[CS_ACTIVE/4] = 0; -} - -/* Map the input/output pages in the shared interface segment of MCU firmware - * address space. - */ -static int gpu_mmap_user_io_pages(struct kbase_device *kbdev, - struct tagged_addr *phys, struct kbase_va_region *reg) -{ - unsigned long mem_flags = KBASE_REG_GPU_RD; - const size_t num_pages = 2; - int ret; + u64 *input_addr = queue->user_io_addr; + u64 *output_addr64 = queue->user_io_addr + PAGE_SIZE / sizeof(u64); + u32 *output_addr32 = (u32 *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64)); - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. + /* + * CS_INSERT and CS_EXTRACT registers contain 64-bit memory addresses which + * should be accessed atomically. Here we update them 32-bits at a time, but + * as this is initialisation code, non-atomic accesses are safe. 
*/ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; - -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - mem_flags |= - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); -#else - if (kbdev->system_coherency == COHERENCY_NONE) { - mem_flags |= - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); - } else { - mem_flags |= KBASE_REG_SHARE_BOTH | - KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_SHARED); - } -#endif - - mutex_lock(&kbdev->csf.reg_lock); - ret = kbase_add_va_region_rbtree(kbdev, reg, 0, num_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kbdev->csf.reg_lock); - - if (ret) - return ret; - - /* Map input page */ - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, reg->start_pfn, - &phys[0], 1, mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_IO, mmu_sync_info); - if (ret) - goto bad_insert; - - /* Map output page, it needs rw access */ - mem_flags |= KBASE_REG_GPU_WR; - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, - reg->start_pfn + 1, &phys[1], 1, mem_flags, - MCU_AS_NR, KBASE_MEM_GROUP_CSF_IO, - mmu_sync_info); - if (ret) - goto bad_insert_output_page; - - return 0; - -bad_insert_output_page: - kbase_mmu_teardown_pages(kbdev, &kbdev->csf.mcu_mmu, - reg->start_pfn, 1, MCU_AS_NR); -bad_insert: - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, reg); - mutex_unlock(&kbdev->csf.reg_lock); - - return ret; + input_addr[CS_INSERT_LO / sizeof(*input_addr)] = 0; + input_addr[CS_EXTRACT_INIT_LO / sizeof(*input_addr)] = 0; + output_addr64[CS_EXTRACT_LO / sizeof(*output_addr64)] = 0; + output_addr32[CS_ACTIVE / sizeof(*output_addr32)] = 0; } static void kernel_unmap_user_io_pages(struct kbase_context *kctx, struct kbase_queue *queue) { - const size_t num_pages = 2; - kbase_gpu_vm_lock(kctx); vunmap(queue->user_io_addr); - WARN_ON(num_pages > atomic_read(&kctx->permanent_mapped_pages)); - atomic_sub(num_pages, &kctx->permanent_mapped_pages); + WARN_ON(atomic_read(&kctx->permanent_mapped_pages) < KBASEP_NUM_CS_USER_IO_PAGES); + atomic_sub(KBASEP_NUM_CS_USER_IO_PAGES, &kctx->permanent_mapped_pages); kbase_gpu_vm_unlock(kctx); } @@ -231,6 +205,8 @@ static int kernel_map_user_io_pages(struct kbase_context *kctx, { struct page *page_list[2]; pgprot_t cpu_map_prot; + unsigned long flags; + uint64_t *user_io_addr; int ret = 0; size_t i; @@ -245,27 +221,25 @@ static int kernel_map_user_io_pages(struct kbase_context *kctx, /* The pages are mapped to Userspace also, so use the same mapping * attributes as used inside the CPU page fault handler. 
*/ -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - cpu_map_prot = pgprot_device(PAGE_KERNEL); -#else if (kctx->kbdev->system_coherency == COHERENCY_NONE) cpu_map_prot = pgprot_writecombine(PAGE_KERNEL); else cpu_map_prot = PAGE_KERNEL; -#endif for (i = 0; i < ARRAY_SIZE(page_list); i++) page_list[i] = as_page(queue->phys[i]); - queue->user_io_addr = vmap(page_list, ARRAY_SIZE(page_list), VM_MAP, cpu_map_prot); + user_io_addr = vmap(page_list, ARRAY_SIZE(page_list), VM_MAP, cpu_map_prot); - if (!queue->user_io_addr) + if (!user_io_addr) ret = -ENOMEM; else atomic_add(ARRAY_SIZE(page_list), &kctx->permanent_mapped_pages); + kbase_csf_scheduler_spin_lock(kctx->kbdev, &flags); + queue->user_io_addr = user_io_addr; + kbase_csf_scheduler_spin_unlock(kctx->kbdev, flags); + unlock: kbase_gpu_vm_unlock(kctx); return ret; @@ -273,7 +247,7 @@ unlock: static void term_queue_group(struct kbase_queue_group *group); static void get_queue(struct kbase_queue *queue); -static void release_queue(struct kbase_queue *queue); +static bool release_queue(struct kbase_queue *queue); /** * kbase_csf_free_command_stream_user_pages() - Free the resources allocated @@ -297,70 +271,62 @@ static void release_queue(struct kbase_queue *queue); * If an explicit or implicit unbind was missed by the userspace then the * mapping will persist. On process exit kernel itself will remove the mapping. */ -static void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, - struct kbase_queue *queue) +void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue) { - const size_t num_pages = 2; - - gpu_munmap_user_io_pages(kctx, queue->reg); kernel_unmap_user_io_pages(kctx, queue); kbase_mem_pool_free_pages( &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, true, false); + KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, true, false); + kbase_process_page_usage_dec(kctx, KBASEP_NUM_CS_USER_IO_PAGES); - kfree(queue->reg); - queue->reg = NULL; + /* The user_io_gpu_va should have been unmapped inside the scheduler */ + WARN_ONCE(queue->user_io_gpu_va, "Userio pages appears still have mapping"); /* If the queue has already been terminated by userspace * then the ref count for queue object will drop to 0 here. 
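kernel_map_user_io_pages() above gives the kernel a linear alias of the two per-queue user I/O pages using the same attributes the CPU fault handler applies to the userspace mapping: write-combined when the system is not I/O coherent, normal cacheable otherwise. A hedged sketch of that mapping choice (simplified; 'coherent' stands in for the driver's system_coherency check):

#include <linux/mm.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

/* Map two already-allocated pages into one contiguous kernel virtual range.
 * The caller vunmap()s the returned address when the mapping is torn down.
 */
static void *map_io_pages(struct page *pages[2], bool coherent)
{
	pgprot_t prot = coherent ? PAGE_KERNEL : pgprot_writecombine(PAGE_KERNEL);

	return vmap(pages, 2, VM_MAP, prot);
}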
*/ release_queue(queue); } +KBASE_EXPORT_TEST_API(kbase_csf_free_command_stream_user_pages); -int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, - struct kbase_queue *queue) +int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, struct kbase_queue *queue) { struct kbase_device *kbdev = kctx->kbdev; - struct kbase_va_region *reg; - const size_t num_pages = 2; int ret; lockdep_assert_held(&kctx->csf.lock); - reg = kbase_alloc_free_region(&kctx->kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); - if (!reg) + ret = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], + KBASEP_NUM_CS_USER_IO_PAGES, + queue->phys, false, kctx->task); + if (ret != KBASEP_NUM_CS_USER_IO_PAGES) { + /* Marking both the phys to zero for indicating there is no phys allocated */ + queue->phys[0].tagged_addr = 0; + queue->phys[1].tagged_addr = 0; return -ENOMEM; - - ret = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, false); - - if (ret != num_pages) - goto phys_alloc_failed; + } ret = kernel_map_user_io_pages(kctx, queue); if (ret) goto kernel_map_failed; + kbase_process_page_usage_inc(kctx, KBASEP_NUM_CS_USER_IO_PAGES); init_user_io_pages(queue); - ret = gpu_mmap_user_io_pages(kctx->kbdev, queue->phys, reg); - if (ret) - goto gpu_mmap_failed; - - queue->reg = reg; + /* user_io_gpu_va is only mapped when scheduler decides to put the queue + * on slot at runtime. Initialize it to 0, signalling no mapping. + */ + queue->user_io_gpu_va = 0; mutex_lock(&kbdev->csf.reg_lock); - if (kbdev->csf.db_file_offsets > - (U32_MAX - BASEP_QUEUE_NR_MMAP_USER_PAGES + 1)) + if (kbdev->csf.db_file_offsets > (U32_MAX - BASEP_QUEUE_NR_MMAP_USER_PAGES + 1)) kbdev->csf.db_file_offsets = 0; queue->db_file_offset = kbdev->csf.db_file_offsets; kbdev->csf.db_file_offsets += BASEP_QUEUE_NR_MMAP_USER_PAGES; - - WARN(atomic_read(&queue->refcount) != 1, "Incorrect refcounting for queue object\n"); + WARN(kbase_refcount_read(&queue->refcount) != 1, + "Incorrect refcounting for queue object\n"); /* This is the second reference taken on the queue object and * would be dropped only when the IO mapping is removed either * explicitly by userspace or implicitly by kernel on process exit. 
@@ -371,19 +337,16 @@ int kbase_csf_alloc_command_stream_user_pages(struct kbase_context *kctx, return 0; -gpu_mmap_failed: - kernel_unmap_user_io_pages(kctx, queue); - kernel_map_failed: - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], - num_pages, queue->phys, false, false); - -phys_alloc_failed: - kfree(reg); + kbase_mem_pool_free_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_IO], + KBASEP_NUM_CS_USER_IO_PAGES, queue->phys, false, false); + /* Marking both the phys to zero for indicating there is no phys allocated */ + queue->phys[0].tagged_addr = 0; + queue->phys[1].tagged_addr = 0; - return -ENOMEM; + return ret; } +KBASE_EXPORT_TEST_API(kbase_csf_alloc_command_stream_user_pages); static struct kbase_queue_group *find_queue_group(struct kbase_context *kctx, u8 group_handle) @@ -401,14 +364,20 @@ static struct kbase_queue_group *find_queue_group(struct kbase_context *kctx, return NULL; } +struct kbase_queue_group *kbase_csf_find_queue_group(struct kbase_context *kctx, u8 group_handle) +{ + return find_queue_group(kctx, group_handle); +} +KBASE_EXPORT_TEST_API(kbase_csf_find_queue_group); + int kbase_csf_queue_group_handle_is_valid(struct kbase_context *kctx, u8 group_handle) { struct kbase_queue_group *group; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return group ? 0 : -EINVAL; } @@ -429,25 +398,49 @@ static struct kbase_queue *find_queue(struct kbase_context *kctx, u64 base_addr) static void get_queue(struct kbase_queue *queue) { - WARN_ON(!atomic_inc_not_zero(&queue->refcount)); + WARN_ON(!kbase_refcount_inc_not_zero(&queue->refcount)); } -static void release_queue(struct kbase_queue *queue) +/** + * release_queue() - Release a reference to a GPU queue + * + * @queue: The queue to release. + * + * Return: true if the queue has been released. + * + * The queue will be released when its reference count reaches zero. + */ +static bool release_queue(struct kbase_queue *queue) { lockdep_assert_held(&queue->kctx->csf.lock); - - WARN_ON(atomic_read(&queue->refcount) <= 0); - - if (atomic_dec_and_test(&queue->refcount)) { + if (kbase_refcount_dec_and_test(&queue->refcount)) { /* The queue can't still be on the per context list. */ WARN_ON(!list_empty(&queue->link)); WARN_ON(queue->group); + dev_dbg(queue->kctx->kbdev->dev, + "Remove any pending command queue fatal from ctx %d_%d", + queue->kctx->tgid, queue->kctx->id); + + /* After this the Userspace would be able to free the + * memory for GPU queue. In case the Userspace missed + * terminating the queue, the cleanup will happen on + * context termination where tear down of region tracker + * would free up the GPU queue memory. 
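get_queue()/release_queue() above are ordinary reference counting: the creator holds the first reference, extra references are only taken while one is already held, and the final put frees the object (and, in the driver, drops the no-user-free count on the backing region). Sketched with the generic refcount_t API rather than the driver's kbase_refcount wrappers:

#include <linux/bug.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct queue_obj {
	refcount_t refcount;
	/* ... payload ... */
};

static struct queue_obj *queue_create(void)
{
	struct queue_obj *q = kzalloc(sizeof(*q), GFP_KERNEL);

	if (q)
		refcount_set(&q->refcount, 1);	/* creator owns the first reference */
	return q;
}

static void queue_get(struct queue_obj *q)
{
	/* Only legal while at least one reference is already held. */
	WARN_ON(!refcount_inc_not_zero(&q->refcount));
}

/* Returns true when this call dropped the last reference and freed the object. */
static bool queue_put(struct queue_obj *q)
{
	if (refcount_dec_and_test(&q->refcount)) {
		kfree(q);
		return true;
	}
	return false;
}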
+ */ + kbase_gpu_vm_lock(queue->kctx); + kbase_va_region_no_user_free_dec(queue->queue_reg); + kbase_gpu_vm_unlock(queue->kctx); + kfree(queue); + + return true; } + + return false; } static void oom_event_worker(struct work_struct *data); -static void fatal_event_worker(struct work_struct *data); +static void cs_error_worker(struct work_struct *data); /* Between reg and reg_ex, one and only one must be null */ static int csf_queue_register_internal(struct kbase_context *kctx, @@ -482,7 +475,7 @@ static int csf_queue_register_internal(struct kbase_context *kctx, queue_addr = reg->buffer_gpu_addr; queue_size = reg->buffer_size >> PAGE_SHIFT; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); /* Check if queue is already registered */ if (find_queue(kctx, queue_addr) != NULL) { @@ -495,7 +488,8 @@ static int csf_queue_register_internal(struct kbase_context *kctx, region = kbase_region_tracker_find_region_enclosing_address(kctx, queue_addr); - if (kbase_is_region_invalid_or_free(region)) { + if (kbase_is_region_invalid_or_free(region) || kbase_is_region_shrinkable(region) || + region->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) { ret = -ENOENT; goto out_unlock_vm; } @@ -544,41 +538,31 @@ static int csf_queue_register_internal(struct kbase_context *kctx, queue->kctx = kctx; queue->base_addr = queue_addr; + queue->queue_reg = region; + kbase_va_region_no_user_free_inc(region); + queue->size = (queue_size << PAGE_SHIFT); queue->csi_index = KBASEP_IF_NR_INVALID; - queue->enabled = false; queue->priority = reg->priority; - atomic_set(&queue->refcount, 1); + /* Default to a safe value, this would be updated on binding */ + queue->group_priority = KBASE_QUEUE_GROUP_PRIORITY_LOW; + kbase_refcount_set(&queue->refcount, 1); - queue->group = NULL; queue->bind_state = KBASE_CSF_QUEUE_UNBOUND; queue->handle = BASEP_MEM_INVALID_HANDLE; queue->doorbell_nr = KBASEP_USER_DB_NR_INVALID; - queue->status_wait = 0; - queue->sync_ptr = 0; - queue->sync_value = 0; - -#if IS_ENABLED(CONFIG_DEBUG_FS) - queue->saved_cmd_ptr = 0; -#endif - - queue->sb_status = 0; queue->blocked_reason = CS_STATUS_BLOCKED_REASON_REASON_UNBLOCKED; - atomic_set(&queue->pending, 0); - INIT_LIST_HEAD(&queue->link); - INIT_LIST_HEAD(&queue->error.link); + atomic_set(&queue->pending_kick, 0); + INIT_LIST_HEAD(&queue->pending_kick_link); INIT_WORK(&queue->oom_event_work, oom_event_worker); - INIT_WORK(&queue->fatal_event_work, fatal_event_worker); + INIT_WORK(&queue->cs_error_work, cs_error_worker); list_add(&queue->link, &kctx->csf.queue_list); - queue->extract_ofs = 0; - - region->flags |= KBASE_REG_NO_USER_FREE; region->user_data = queue; /* Initialize the cs_trace configuration parameters, When buffer_size @@ -600,7 +584,7 @@ static int csf_queue_register_internal(struct kbase_context *kctx, out_unlock_vm: kbase_gpu_vm_unlock(kctx); out: - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return ret; } @@ -608,6 +592,13 @@ out: int kbase_csf_queue_register(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_register *reg) { + /* Validate the ring buffer configuration parameters */ + if (reg->buffer_size < CS_RING_BUFFER_MIN_SIZE || + reg->buffer_size > CS_RING_BUFFER_MAX_SIZE || + reg->buffer_size & (reg->buffer_size - 1) || !reg->buffer_gpu_addr || + reg->buffer_gpu_addr & ~PAGE_MASK) + return -EINVAL; + return csf_queue_register_internal(kctx, reg, NULL); } @@ -626,6 +617,13 @@ int kbase_csf_queue_register_ex(struct kbase_context *kctx, if (glb_version < kbase_csf_interface_version(1, 1, 0)) return 
-EINVAL; + /* Validate the ring buffer configuration parameters */ + if (reg->buffer_size < CS_RING_BUFFER_MIN_SIZE || + reg->buffer_size > CS_RING_BUFFER_MAX_SIZE || + reg->buffer_size & (reg->buffer_size - 1) || !reg->buffer_gpu_addr || + reg->buffer_gpu_addr & ~PAGE_MASK) + return -EINVAL; + /* Validate the cs_trace configuration parameters */ if (reg->ex_buffer_size && ((reg->ex_event_size > max_size) || @@ -639,6 +637,22 @@ int kbase_csf_queue_register_ex(struct kbase_context *kctx, static void unbind_queue(struct kbase_context *kctx, struct kbase_queue *queue); +static void wait_pending_queue_kick(struct kbase_queue *queue) +{ + struct kbase_context *const kctx = queue->kctx; + + /* Drain a pending queue kick if any. It should no longer be + * possible to issue further queue kicks at this point: either the + * queue has been unbound, or the context is being terminated. + * + * Signal kbase_csf_scheduler_kthread() to allow for the + * eventual completion of the current iteration. Once it's done the + * event_wait wait queue shall be signalled. + */ + complete(&kctx->kbdev->csf.scheduler.kthread_signal); + wait_event(kctx->kbdev->csf.event_wait, atomic_read(&queue->pending_kick) == 0); +} + void kbase_csf_queue_terminate(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_terminate *term) { @@ -656,7 +670,7 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); queue = find_queue(kctx, term->buffer_gpu_addr); if (queue) { @@ -672,27 +686,26 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, unbind_queue(kctx, queue); kbase_gpu_vm_lock(kctx); - if (!WARN_ON(!queue->queue_reg)) { - /* After this the Userspace would be able to free the - * memory for GPU queue. In case the Userspace missed - * terminating the queue, the cleanup will happen on - * context termination where tear down of region tracker - * would free up the GPU queue memory. - */ - queue->queue_reg->flags &= ~KBASE_REG_NO_USER_FREE; + if (!WARN_ON(!queue->queue_reg)) queue->queue_reg->user_data = NULL; - } kbase_gpu_vm_unlock(kctx); - dev_dbg(kctx->kbdev->dev, - "Remove any pending command queue fatal from context %pK\n", - (void *)kctx); - kbase_csf_event_remove_error(kctx, &queue->error); + rt_mutex_unlock(&kctx->csf.lock); + /* The GPU reset can be allowed now as the queue has been unbound. 
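The new up-front checks in kbase_csf_queue_register() and kbase_csf_queue_register_ex() reject malformed ring buffers before any state is created: the size must be a power of two between CS_RING_BUFFER_MIN_SIZE (4 KiB) and CS_RING_BUFFER_MAX_SIZE (2 GiB), and the buffer GPU address must be non-zero and page aligned. A standalone restatement of that predicate (illustrative; assumes 4 KiB pages, which is what the driver's PAGE_MASK test expresses):

#include <stdbool.h>
#include <stdint.h>

#define RB_PAGE_SIZE	4096ULL
#define RB_MIN_SIZE	4096ULL		/* CS_RING_BUFFER_MIN_SIZE */
#define RB_MAX_SIZE	(1ULL << 31)	/* CS_RING_BUFFER_MAX_SIZE, 2 GiB */

static bool ring_buffer_params_valid(uint64_t gpu_addr, uint64_t size)
{
	if (size < RB_MIN_SIZE || size > RB_MAX_SIZE)
		return false;
	if (size & (size - 1))					/* must be a power of two */
		return false;
	if (!gpu_addr || (gpu_addr & (RB_PAGE_SIZE - 1)))	/* non-zero, page aligned */
		return false;
	return true;
}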
*/ + if (reset_prevented) { + kbase_reset_gpu_allow(kbdev); + reset_prevented = false; + } + wait_pending_queue_kick(queue); + /* The work items can be cancelled as Userspace is terminating the queue */ + cancel_work_sync(&queue->oom_event_work); + cancel_work_sync(&queue->cs_error_work); + rt_mutex_lock(&kctx->csf.lock); release_queue(queue); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); } @@ -704,7 +717,7 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_ u8 max_streams; int ret = -EINVAL; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, bind->in.group_handle); queue = find_queue(kctx, bind->in.buffer_gpu_addr); @@ -733,21 +746,30 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, union kbase_ioctl_cs_queue_ bind->out.mmap_handle = queue->handle; group->bound_queues[bind->in.csi_index] = queue; queue->group = group; + queue->group_priority = group->priority; queue->csi_index = bind->in.csi_index; queue->bind_state = KBASE_CSF_QUEUE_BIND_IN_PROGRESS; out: - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return ret; } -static struct kbase_queue_group *get_bound_queue_group( - struct kbase_queue *queue) +/** + * get_bound_queue_group - Get the group to which a queue was bound + * + * @queue: Pointer to the queue for this group + * + * Return: The group to which this queue was bound, or NULL on error. + */ +static struct kbase_queue_group *get_bound_queue_group(struct kbase_queue *queue) { struct kbase_context *kctx = queue->kctx; struct kbase_queue_group *group; + lockdep_assert_held(&kctx->csf.lock); + if (queue->bind_state == KBASE_CSF_QUEUE_UNBOUND) return NULL; @@ -769,53 +791,13 @@ static struct kbase_queue_group *get_bound_queue_group( return group; } -/** - * pending_submission_worker() - Work item to process pending kicked GPU command queues. - * - * @work: Pointer to pending_submission_work. - * - * This function starts all pending queues, for which the work - * was previously submitted via ioctl call from application thread. - * If the queue is already scheduled and resident, it will be started - * right away, otherwise once the group is made resident. - */ -static void pending_submission_worker(struct work_struct *work) -{ - struct kbase_context *kctx = - container_of(work, struct kbase_context, csf.pending_submission_work); - struct kbase_device *kbdev = kctx->kbdev; - struct kbase_queue *queue; - int err = kbase_reset_gpu_prevent_and_wait(kbdev); - - if (err) { - dev_err(kbdev->dev, "Unsuccessful GPU reset detected when kicking queue "); - return; - } - - mutex_lock(&kctx->csf.lock); - - /* Iterate through the queue list and schedule the pending ones for submission. 
*/ - list_for_each_entry(queue, &kctx->csf.queue_list, link) { - if (atomic_cmpxchg(&queue->pending, 1, 0) == 1) { - struct kbase_queue_group *group = get_bound_queue_group(queue); - - if (!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND) - dev_dbg(kbdev->dev, "queue is not bound to a group"); - else - WARN_ON(kbase_csf_scheduler_queue_start(queue)); - } - } - - mutex_unlock(&kctx->csf.lock); - - kbase_reset_gpu_allow(kbdev); -} - void kbase_csf_ring_csg_doorbell(struct kbase_device *kbdev, int slot) { if (WARN_ON(slot < 0)) return; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + kbase_csf_ring_csg_slots_doorbell(kbdev, (u32) (1 << slot)); } @@ -828,9 +810,20 @@ void kbase_csf_ring_csg_slots_doorbell(struct kbase_device *kbdev, (u32) ((1U << kbdev->csf.global_iface.group_num) - 1); u32 value; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + if (WARN_ON(slot_bitmap > allowed_bitmap)) return; + /* The access to GLB_DB_REQ/ACK needs to be ordered with respect to CSG_REQ/ACK and + * CSG_DB_REQ/ACK to avoid a scenario where a CSI request overlaps with a CSG request + * or 2 CSI requests overlap and FW ends up missing the 2nd request. + * Memory barrier is required, both on Host and FW side, to guarantee the ordering. + * + * 'osh' is used as CPU and GPU would be in the same Outer shareable domain. + */ + dmb(osh); + value = kbase_csf_firmware_global_output(global_iface, GLB_DB_ACK); value ^= slot_bitmap; kbase_csf_firmware_global_input_mask(global_iface, GLB_DB_REQ, value, @@ -857,6 +850,8 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, struct kbase_csf_cmd_stream_group_info *ginfo; u32 value; + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + if (WARN_ON(csg_nr < 0) || WARN_ON(csg_nr >= kbdev->csf.global_iface.group_num)) return; @@ -867,6 +862,14 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, WARN_ON(csi_index >= ginfo->stream_num)) return; + /* The access to CSG_DB_REQ/ACK needs to be ordered with respect to + * CS_REQ/ACK to avoid a scenario where CSG_DB_REQ/ACK becomes visible to + * FW before CS_REQ/ACK is set. + * + * 'osh' is used as CPU and GPU would be in the same outer shareable domain. + */ + dmb(osh); + value = kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK); value ^= (1 << csi_index); kbase_csf_firmware_csg_input_mask(ginfo, CSG_DB_REQ, value, @@ -876,19 +879,15 @@ void kbase_csf_ring_cs_kernel_doorbell(struct kbase_device *kbdev, kbase_csf_ring_csg_doorbell(kbdev, csg_nr); } -static void enqueue_gpu_submission_work(struct kbase_context *const kctx) -{ - queue_work(system_highpri_wq, &kctx->csf.pending_submission_work); -} - int kbase_csf_queue_kick(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_kick *kick) { struct kbase_device *kbdev = kctx->kbdev; - bool trigger_submission = false; struct kbase_va_region *region; int err = 0; + KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK(kbdev, kctx->id, kick->buffer_gpu_addr); + /* GPU work submission happening asynchronously to prevent the contention with * scheduler lock and as the result blocking application thread. 
For this reason, * the vm_lock is used here to get the reference to the queue based on its buffer_gpu_addr @@ -901,9 +900,19 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, if (!kbase_is_region_invalid_or_free(region)) { struct kbase_queue *queue = region->user_data; - if (queue) { - atomic_cmpxchg(&queue->pending, 0, 1); - trigger_submission = true; + if (queue && (queue->bind_state == KBASE_CSF_QUEUE_BOUND)) { + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + if (list_empty(&queue->pending_kick_link)) { + /* Queue termination shall block until this + * kick has been handled. + */ + atomic_inc(&queue->pending_kick); + list_add_tail( + &queue->pending_kick_link, + &kbdev->csf.pending_gpuq_kicks[queue->group_priority]); + complete(&kbdev->csf.scheduler.kthread_signal); + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); } } else { dev_dbg(kbdev->dev, @@ -912,9 +921,6 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, } kbase_gpu_vm_unlock(kctx); - if (likely(trigger_submission)) - enqueue_gpu_submission_work(kctx); - return err; } @@ -923,19 +929,23 @@ static void unbind_stopped_queue(struct kbase_context *kctx, { lockdep_assert_held(&kctx->csf.lock); + if (WARN_ON(queue->csi_index < 0)) + return; + if (queue->bind_state != KBASE_CSF_QUEUE_UNBOUND) { unsigned long flags; kbase_csf_scheduler_spin_lock(kctx->kbdev, &flags); bitmap_clear(queue->group->protm_pending_bitmap, queue->csi_index, 1); - KBASE_KTRACE_ADD_CSF_GRP_Q(kctx->kbdev, PROTM_PENDING_CLEAR, + KBASE_KTRACE_ADD_CSF_GRP_Q(kctx->kbdev, CSI_PROTM_PEND_CLEAR, queue->group, queue, queue->group->protm_pending_bitmap[0]); queue->group->bound_queues[queue->csi_index] = NULL; queue->group = NULL; kbase_csf_scheduler_spin_unlock(kctx->kbdev, flags); put_user_pages_mmap_handle(kctx, queue); + WARN_ON_ONCE(queue->doorbell_nr != KBASEP_USER_DB_NR_INVALID); queue->bind_state = KBASE_CSF_QUEUE_UNBOUND; } } @@ -977,7 +987,16 @@ static void unbind_queue(struct kbase_context *kctx, struct kbase_queue *queue) } } -void kbase_csf_queue_unbind(struct kbase_queue *queue) +static bool kbase_csf_queue_phys_allocated(struct kbase_queue *queue) +{ + /* The queue's phys are zeroed when allocation fails. Both of them being + * zero is an impossible condition for a successful allocated set of phy pages. + */ + + return (queue->phys[0].tagged_addr | queue->phys[1].tagged_addr); +} + +void kbase_csf_queue_unbind(struct kbase_queue *queue, bool process_exit) { struct kbase_context *kctx = queue->kctx; @@ -991,7 +1010,7 @@ void kbase_csf_queue_unbind(struct kbase_queue *queue) * whereas CSG TERM request would result in an immediate abort or * cancellation of the pending work. */ - if (current->flags & PF_EXITING) { + if (process_exit) { struct kbase_queue_group *group = get_bound_queue_group(queue); if (group) @@ -1002,8 +1021,8 @@ void kbase_csf_queue_unbind(struct kbase_queue *queue) unbind_queue(kctx, queue); } - /* Free the resources, if allocated for this queue. */ - if (queue->reg) + /* Free the resources, if allocated phys for this queue */ + if (kbase_csf_queue_phys_allocated(queue)) kbase_csf_free_command_stream_user_pages(kctx, queue); } @@ -1016,8 +1035,8 @@ void kbase_csf_queue_unbind_stopped(struct kbase_queue *queue) WARN_ON(queue->bind_state == KBASE_CSF_QUEUE_BOUND); unbind_stopped_queue(kctx, queue); - /* Free the resources, if allocated for this queue. 
*/ - if (queue->reg) + /* Free the resources, if allocated phys for this queue */ + if (kbase_csf_queue_phys_allocated(queue)) kbase_csf_free_command_stream_user_pages(kctx, queue); } @@ -1080,172 +1099,43 @@ static bool iface_has_enough_streams(struct kbase_device *const kbdev, * @kctx: Pointer to kbase context where the queue group is created at * @s_buf: Pointer to suspend buffer that is attached to queue group * - * Return: 0 if suspend buffer is successfully allocated and reflected to GPU - * MMU page table. Otherwise -ENOMEM. + * Return: 0 if phy-pages for the suspend buffer is successfully allocated. + * Otherwise -ENOMEM or error code. */ static int create_normal_suspend_buffer(struct kbase_context *const kctx, struct kbase_normal_suspend_buffer *s_buf) { - struct kbase_va_region *reg = NULL; - const unsigned long mem_flags = KBASE_REG_GPU_RD | KBASE_REG_GPU_WR; const size_t nr_pages = PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); - int err = 0; - - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. - */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + int err; lockdep_assert_held(&kctx->csf.lock); - /* Allocate and initialize Region Object */ - reg = kbase_alloc_free_region(&kctx->kbdev->csf.shared_reg_rbtree, 0, - nr_pages, KBASE_REG_ZONE_MCU_SHARED); - - if (!reg) - return -ENOMEM; - - s_buf->phy = kcalloc(nr_pages, sizeof(*s_buf->phy), GFP_KERNEL); - - if (!s_buf->phy) { - err = -ENOMEM; - goto phy_alloc_failed; - } - - /* Get physical page for a normal suspend buffer */ - err = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - nr_pages, &s_buf->phy[0], false); - - if (err < 0) - goto phy_pages_alloc_failed; - - /* Insert Region Object into rbtree and make virtual address available - * to map it to physical page - */ - mutex_lock(&kctx->kbdev->csf.reg_lock); - err = kbase_add_va_region_rbtree(kctx->kbdev, reg, 0, nr_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kctx->kbdev->csf.reg_lock); - - if (err) - goto add_va_region_failed; - - /* Update MMU table */ - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - reg->start_pfn, &s_buf->phy[0], nr_pages, - mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_FW, mmu_sync_info); - if (err) - goto mmu_insert_failed; - - s_buf->reg = reg; - - return 0; - -mmu_insert_failed: - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); - -add_va_region_failed: - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, - &s_buf->phy[0], false, false); - -phy_pages_alloc_failed: - kfree(s_buf->phy); -phy_alloc_failed: - kfree(reg); - - return err; -} - -/** - * create_protected_suspend_buffer() - Create protected-mode suspend buffer - * per queue group - * - * @kbdev: Instance of a GPU platform device that implements a CSF interface. - * @s_buf: Pointer to suspend buffer that is attached to queue group - * - * Return: 0 if suspend buffer is successfully allocated and reflected to GPU - * MMU page table. Otherwise -ENOMEM. 
- */ -static int create_protected_suspend_buffer(struct kbase_device *const kbdev, - struct kbase_protected_suspend_buffer *s_buf) -{ - struct kbase_va_region *reg = NULL; - struct tagged_addr *phys = NULL; - const unsigned long mem_flags = KBASE_REG_GPU_RD | KBASE_REG_GPU_WR; - const size_t nr_pages = - PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); - int err = 0; - - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. + /* The suspend buffer's mapping address is valid only when the CSG is to + * run on slot, initializing it 0, signalling the buffer is not mapped. */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + s_buf->gpu_va = 0; - /* Allocate and initialize Region Object */ - reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - nr_pages, KBASE_REG_ZONE_MCU_SHARED); + s_buf->phy = kcalloc(nr_pages, sizeof(*s_buf->phy), GFP_KERNEL); - if (!reg) + if (!s_buf->phy) return -ENOMEM; - phys = kcalloc(nr_pages, sizeof(*phys), GFP_KERNEL); - if (!phys) { - err = -ENOMEM; - goto phy_alloc_failed; - } + /* Get physical page for a normal suspend buffer */ + err = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, + &s_buf->phy[0], false, kctx->task); - s_buf->pma = kbase_csf_protected_memory_alloc(kbdev, phys, - nr_pages, true); - if (s_buf->pma == NULL) { - err = -ENOMEM; - goto pma_alloc_failed; + if (err < 0) { + kfree(s_buf->phy); + return err; } - /* Insert Region Object into rbtree and make virtual address available - * to map it to physical page - */ - mutex_lock(&kbdev->csf.reg_lock); - err = kbase_add_va_region_rbtree(kbdev, reg, 0, nr_pages, 1); - reg->flags &= ~KBASE_REG_FREE; - mutex_unlock(&kbdev->csf.reg_lock); - - if (err) - goto add_va_region_failed; - - /* Update MMU table */ - err = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, reg->start_pfn, - phys, nr_pages, mem_flags, MCU_AS_NR, - KBASE_MEM_GROUP_CSF_FW, mmu_sync_info); - if (err) - goto mmu_insert_failed; - - s_buf->reg = reg; - kfree(phys); + kbase_process_page_usage_inc(kctx, nr_pages); return 0; - -mmu_insert_failed: - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, reg); - mutex_unlock(&kbdev->csf.reg_lock); - -add_va_region_failed: - kbase_csf_protected_memory_free(kbdev, s_buf->pma, nr_pages, true); -pma_alloc_failed: - kfree(phys); -phy_alloc_failed: - kfree(reg); - - return err; } static void timer_event_worker(struct work_struct *data); -static void protm_event_worker(struct work_struct *data); +static void protm_event_worker(struct kthread_work *work); static void term_normal_suspend_buffer(struct kbase_context *const kctx, struct kbase_normal_suspend_buffer *s_buf); @@ -1262,26 +1152,17 @@ static void term_normal_suspend_buffer(struct kbase_context *const kctx, static int create_suspend_buffers(struct kbase_context *const kctx, struct kbase_queue_group * const group) { - int err = 0; - if (create_normal_suspend_buffer(kctx, &group->normal_suspend_buf)) { dev_err(kctx->kbdev->dev, "Failed to create normal suspend buffer\n"); return -ENOMEM; } - if (kctx->kbdev->csf.pma_dev) { - err = create_protected_suspend_buffer(kctx->kbdev, - &group->protected_suspend_buf); - if (err) { - term_normal_suspend_buffer(kctx, - &group->normal_suspend_buf); - dev_err(kctx->kbdev->dev, "Failed to create protected suspend buffer\n"); - } - } else { - group->protected_suspend_buf.reg = NULL; - } + /* Protected suspend buffer, runtime binding so just initialize it */ + 
group->protected_suspend_buf.gpu_va = 0; + group->protected_suspend_buf.pma = NULL; + group->protected_suspend_buf.alloc_retries = 0; - return err; + return 0; } /** @@ -1328,6 +1209,9 @@ static int create_queue_group(struct kbase_context *const kctx, } else { int err = 0; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + group->prev_act = false; +#endif group->kctx = kctx; group->handle = group_handle; group->csg_nr = KBASEP_CSG_NR_INVALID; @@ -1339,11 +1223,23 @@ static int create_queue_group(struct kbase_context *const kctx, group->tiler_max = create->in.tiler_max; group->fragment_max = create->in.fragment_max; group->compute_max = create->in.compute_max; + group->csi_handlers = create->in.csi_handlers; group->priority = kbase_csf_priority_queue_group_priority_to_relative( kbase_csf_priority_check(kctx->kbdev, create->in.priority)); group->doorbell_nr = KBASEP_USER_DB_NR_INVALID; group->faulted = false; + group->cs_unrecoverable = false; + group->reevaluate_idle_status = false; + + group->csg_reg = NULL; + group->csg_reg_bind_retries = 0; + group->dvs_buf = create->in.dvs_buf; + + +#if IS_ENABLED(CONFIG_DEBUG_FS) + group->deschedule_deferred_cnt = 0; +#endif group->group_uid = generate_group_uid(); create->out.group_uid = group->group_uid; @@ -1351,14 +1247,15 @@ static int create_queue_group(struct kbase_context *const kctx, INIT_LIST_HEAD(&group->link); INIT_LIST_HEAD(&group->link_to_schedule); INIT_LIST_HEAD(&group->error_fatal.link); - INIT_LIST_HEAD(&group->error_timeout.link); - INIT_LIST_HEAD(&group->error_tiler_oom.link); INIT_WORK(&group->timer_event_work, timer_event_worker); - INIT_WORK(&group->protm_event_work, protm_event_worker); + kthread_init_work(&group->protm_event_work, protm_event_worker); bitmap_zero(group->protm_pending_bitmap, MAX_SUPPORTED_STREAMS_PER_GROUP); group->run_state = KBASE_CSF_GROUP_INACTIVE; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_INACTIVE, group, + group->run_state); + err = create_suspend_buffers(kctx, group); if (err < 0) { @@ -1378,6 +1275,17 @@ static int create_queue_group(struct kbase_context *const kctx, return group_handle; } +static bool dvs_supported(u32 csf_version) +{ + if (GLB_VERSION_MAJOR_GET(csf_version) < 3) + return false; + + if (GLB_VERSION_MAJOR_GET(csf_version) == 3) + if (GLB_VERSION_MINOR_GET(csf_version) < 2) + return false; + + return true; +} int kbase_csf_queue_group_create(struct kbase_context *const kctx, union kbase_ioctl_cs_queue_group_create *const create) @@ -1386,11 +1294,18 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, const u32 tiler_count = hweight64(create->in.tiler_mask); const u32 fragment_count = hweight64(create->in.fragment_mask); const u32 compute_count = hweight64(create->in.compute_mask); + size_t i; - mutex_lock(&kctx->csf.lock); + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } - if ((create->in.tiler_max > tiler_count) || - (create->in.fragment_max > fragment_count) || + rt_mutex_lock(&kctx->csf.lock); + + if ((create->in.tiler_max > tiler_count) || (create->in.fragment_max > fragment_count) || (create->in.compute_max > compute_count)) { dev_dbg(kctx->kbdev->dev, "Invalid maximum number of endpoints for a queue group"); @@ -1404,8 +1319,20 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, "No CSG has at least %d CSs", create->in.cs_min); err = -EINVAL; - } else if (create->in.reserved) { 
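The padding loop added to kbase_csf_queue_group_create() above applies the usual UAPI forward-compatibility rule: reserved bytes must be zero today so they can be given meaning in a later revision without old kernels silently accepting garbage. A minimal sketch of that rule only, using a hypothetical payload rather than the kbase ioctl structures:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/kernel.h>

/* Hypothetical ioctl input struct; only the zero-padding rule is of
 * interest here. This is not a kbase structure.
 */
struct demo_group_create_in {
	__u64 tiler_mask;
	__u8 priority;
	__u8 padding[7];	/* must be 0 for forward compatibility */
};

static int demo_validate_create(const struct demo_group_create_in *in)
{
	size_t i;

	/* Reject requests with bits set that this driver version does not
	 * understand, so the same bytes can carry new fields later.
	 */
	for (i = 0; i < ARRAY_SIZE(in->padding); i++) {
		if (in->padding[i] != 0)
			return -EINVAL;
	}

	return 0;
}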
- dev_warn(kctx->kbdev->dev, "Reserved field was set to non-0"); + } else if (create->in.csi_handlers & ~BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK) { + dev_warn(kctx->kbdev->dev, "Unknown exception handler flags set: %u", + create->in.csi_handlers & ~BASE_CSF_EXCEPTION_HANDLER_FLAGS_MASK); + err = -EINVAL; + } else if (!dvs_supported(kctx->kbdev->csf.global_iface.version) && create->in.dvs_buf) { + dev_warn( + kctx->kbdev->dev, + "GPU does not support DVS but userspace is trying to use it"); + err = -EINVAL; + } else if (dvs_supported(kctx->kbdev->csf.global_iface.version) && + !CSG_DVS_BUF_BUFFER_POINTER_GET(create->in.dvs_buf) && + CSG_DVS_BUF_BUFFER_SIZE_GET(create->in.dvs_buf)) { + dev_warn(kctx->kbdev->dev, + "DVS buffer pointer is null but size is not 0"); err = -EINVAL; } else { /* For the CSG which satisfies the condition for having @@ -1423,7 +1350,7 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, err = group_handle; } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return err; } @@ -1435,60 +1362,39 @@ int kbase_csf_queue_group_create(struct kbase_context *const kctx, * @s_buf: Pointer to queue group suspend buffer to be freed */ static void term_normal_suspend_buffer(struct kbase_context *const kctx, - struct kbase_normal_suspend_buffer *s_buf) + struct kbase_normal_suspend_buffer *s_buf) { - const size_t nr_pages = - PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); + const size_t nr_pages = PFN_UP(kctx->kbdev->csf.global_iface.groups[0].suspend_size); lockdep_assert_held(&kctx->csf.lock); - WARN_ON(kbase_mmu_teardown_pages( - kctx->kbdev, &kctx->kbdev->csf.mcu_mmu, - s_buf->reg->start_pfn, nr_pages, MCU_AS_NR)); - - WARN_ON(s_buf->reg->flags & KBASE_REG_FREE); + /* The group should not have a bind remaining on any suspend buf region */ + WARN_ONCE(s_buf->gpu_va, "Suspend buffer address should be 0 at termination"); - mutex_lock(&kctx->kbdev->csf.reg_lock); - kbase_remove_va_region(kctx->kbdev, s_buf->reg); - mutex_unlock(&kctx->kbdev->csf.reg_lock); - - kbase_mem_pool_free_pages( - &kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - nr_pages, &s_buf->phy[0], false, false); + kbase_mem_pool_free_pages(&kctx->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], nr_pages, + &s_buf->phy[0], false, false); + kbase_process_page_usage_dec(kctx, nr_pages); kfree(s_buf->phy); s_buf->phy = NULL; - kfree(s_buf->reg); - s_buf->reg = NULL; } /** - * term_protected_suspend_buffer() - Free normal-mode suspend buffer of + * term_protected_suspend_buffer() - Free protected-mode suspend buffer of * queue group * * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
- * @s_buf: Pointer to queue group suspend buffer to be freed + * @sbuf: Pointer to queue group suspend buffer to be freed */ static void term_protected_suspend_buffer(struct kbase_device *const kbdev, - struct kbase_protected_suspend_buffer *s_buf) + struct kbase_protected_suspend_buffer *sbuf) { - const size_t nr_pages = - PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); - - WARN_ON(kbase_mmu_teardown_pages( - kbdev, &kbdev->csf.mcu_mmu, - s_buf->reg->start_pfn, nr_pages, MCU_AS_NR)); - - WARN_ON(s_buf->reg->flags & KBASE_REG_FREE); - - mutex_lock(&kbdev->csf.reg_lock); - kbase_remove_va_region(kbdev, s_buf->reg); - mutex_unlock(&kbdev->csf.reg_lock); - - kbase_csf_protected_memory_free(kbdev, s_buf->pma, nr_pages, true); - s_buf->pma = NULL; - kfree(s_buf->reg); - s_buf->reg = NULL; + WARN_ONCE(sbuf->gpu_va, "Suspend buf should have been unmapped inside scheduler!"); + if (sbuf->pma) { + const size_t nr_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + kbase_csf_protected_memory_free(kbdev, sbuf->pma, nr_pages, true); + sbuf->pma = NULL; + } } void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group) @@ -1520,6 +1426,7 @@ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group) &group->protected_suspend_buf); group->run_state = KBASE_CSF_GROUP_TERMINATED; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_TERMINATED, group, group->run_state); } /** @@ -1550,10 +1457,38 @@ static void term_queue_group(struct kbase_queue_group *group) kbase_csf_term_descheduled_queue_group(group); } +/** + * wait_group_deferred_deschedule_completion - Wait for the refcount of the group, + * taken when the group deschedule had to be deferred, to drop to 0. + * + * @group: Pointer to GPU command queue group that is being deleted. + * + * This function is called when Userspace deletes the group and after the group + * has been descheduled. The function synchronizes with the other threads that were + * also trying to deschedule the group whilst the dumping was going on for a fault. + * Please refer to the documentation of wait_for_dump_complete_on_group_deschedule() + * for more details.
+ */ +static void wait_group_deferred_deschedule_completion(struct kbase_queue_group *group) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_context *kctx = group->kctx; + + lockdep_assert_held(&kctx->csf.lock); + + if (likely(!group->deschedule_deferred_cnt)) + return; + + rt_mutex_unlock(&kctx->csf.lock); + wait_event(kctx->kbdev->csf.event_wait, !group->deschedule_deferred_cnt); + rt_mutex_lock(&kctx->csf.lock); +#endif +} + static void cancel_queue_group_events(struct kbase_queue_group *group) { cancel_work_sync(&group->timer_event_work); - cancel_work_sync(&group->protm_event_work); + kthread_cancel_work_sync(&group->protm_event_work); } static void remove_pending_group_fatal_error(struct kbase_queue_group *group) @@ -1564,8 +1499,6 @@ static void remove_pending_group_fatal_error(struct kbase_queue_group *group) "Remove any pending group fatal error from context %pK\n", (void *)group->kctx); - kbase_csf_event_remove_error(kctx, &group->error_tiler_oom); - kbase_csf_event_remove_error(kctx, &group->error_timeout); kbase_csf_event_remove_error(kctx, &group->error_fatal); } @@ -1586,32 +1519,49 @@ void kbase_csf_queue_group_terminate(struct kbase_context *kctx, else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); if (group) { - remove_pending_group_fatal_error(group); - term_queue_group(group); kctx->csf.queue_groups[group_handle] = NULL; + /* Stop the running of the given group */ + term_queue_group(group); + rt_mutex_unlock(&kctx->csf.lock); + + if (reset_prevented) { + /* Allow GPU reset before cancelling the group specific + * work item to avoid potential deadlock. + * Reset prevention isn't needed after group termination. + */ + kbase_reset_gpu_allow(kbdev); + reset_prevented = false; + } + + /* Cancel any pending event callbacks. If one is in progress + * then this thread waits synchronously for it to complete (which + * is why we must unlock the context first). We already ensured + * that no more callbacks can be enqueued by terminating the group. + */ + cancel_queue_group_events(group); + + rt_mutex_lock(&kctx->csf.lock); + + /* Clean up after the termination */ + remove_pending_group_fatal_error(group); + + wait_group_deferred_deschedule_completion(group); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); - if (!group) - return; - - /* Cancel any pending event callbacks. If one is in progress - * then this thread waits synchronously for it to complete (which - * is why we must unlock the context first). We already ensured - * that no more callbacks can be enqueued by terminating the group. 
- */ - cancel_queue_group_events(group); kfree(group); } +KBASE_EXPORT_TEST_API(kbase_csf_queue_group_terminate); +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST int kbase_csf_queue_group_suspend(struct kbase_context *kctx, struct kbase_suspend_copy_buffer *sus_buf, u8 group_handle) @@ -1628,7 +1578,7 @@ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, group_handle); return err; } - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = find_queue_group(kctx, group_handle); if (group) @@ -1637,11 +1587,12 @@ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, else err = -EINVAL; - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); kbase_reset_gpu_allow(kbdev); return err; } +#endif void kbase_csf_add_group_fatal_error( struct kbase_queue_group *const group, @@ -1677,7 +1628,7 @@ void kbase_csf_active_queue_groups_reset(struct kbase_device *kbdev, INIT_LIST_HEAD(&evicted_groups); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kbase_csf_scheduler_evict_ctx_slots(kbdev, kctx, &evicted_groups); while (!list_empty(&evicted_groups)) { @@ -1698,12 +1649,11 @@ void kbase_csf_active_queue_groups_reset(struct kbase_device *kbdev, kbase_csf_term_descheduled_queue_group(group); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); } int kbase_csf_ctx_init(struct kbase_context *kctx) { - struct kbase_device *kbdev = kctx->kbdev; int err = -ENOMEM; INIT_LIST_HEAD(&kctx->csf.queue_list); @@ -1711,21 +1661,6 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) kbase_csf_event_init(kctx); - kctx->csf.user_reg_vma = NULL; - mutex_lock(&kbdev->pm.lock); - /* The inode information for /dev/malixx file is not available at the - * time of device probe as the inode is created when the device node - * is created by udevd (through mknod). 
- */ - if (kctx->filp) { - if (!kbdev->csf.mali_file_inode) - kbdev->csf.mali_file_inode = kctx->filp->f_inode; - - /* inode is unique for a file */ - WARN_ON(kbdev->csf.mali_file_inode != kctx->filp->f_inode); - } - mutex_unlock(&kbdev->pm.lock); - /* Mark all the cookies as 'free' */ bitmap_fill(kctx->csf.cookies, KBASE_CSF_NUM_USER_IO_PAGES_HANDLE); @@ -1742,10 +1677,24 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) err = kbase_csf_tiler_heap_context_init(kctx); if (likely(!err)) { - mutex_init(&kctx->csf.lock); - INIT_WORK(&kctx->csf.pending_submission_work, - pending_submission_worker); - } else + rt_mutex_init(&kctx->csf.lock); + + err = kbasep_ctx_user_reg_page_mapping_init(kctx); + + if (likely(!err)) { + err = kbase_kthread_run_worker_rt(kctx->kbdev, + &kctx->csf.protm_event_worker, "mali_protm_event"); + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "error initializing protm event worker thread"); + kbasep_ctx_user_reg_page_mapping_term(kctx); + } + } + + if (unlikely(err)) + kbase_csf_tiler_heap_context_term(kctx); + } + + if (unlikely(err)) kbase_csf_kcpu_queue_context_term(kctx); } @@ -1760,6 +1709,36 @@ int kbase_csf_ctx_init(struct kbase_context *kctx) return err; } +void kbase_csf_ctx_report_page_fault_for_active_groups(struct kbase_context *kctx, + struct kbase_fault *fault) +{ + struct base_gpu_queue_group_error err_payload = + (struct base_gpu_queue_group_error){ .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { + .sideband = fault->addr, + .status = fault->status, + } } }; + struct kbase_device *kbdev = kctx->kbdev; + const u32 num_groups = kbdev->csf.global_iface.group_num; + unsigned long flags; + int csg_nr; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + + if (!group || (group->kctx != kctx)) + continue; + + group->faulted = true; + kbase_csf_add_group_fatal_error(group, &err_payload); + } + kbase_csf_scheduler_spin_unlock(kbdev, flags); +} + void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, struct kbase_fault *fault) { @@ -1793,7 +1772,7 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, } }; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); for (gr = 0; gr < MAX_QUEUE_GROUP_NUM; gr++) { struct kbase_queue_group *const group = @@ -1801,12 +1780,15 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, if (group && group->run_state != KBASE_CSF_GROUP_TERMINATED) { term_queue_group(group); + /* This would effectively be a NOP if the fatal error was already added to + * the error_list by kbase_csf_ctx_report_page_fault_for_active_groups(). + */ kbase_csf_add_group_fatal_error(group, &err_payload); reported = true; } } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reported) kbase_event_wakeup_sync(kctx); @@ -1839,9 +1821,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) else reset_prevented = true; - cancel_work_sync(&kctx->csf.pending_submission_work); - - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); /* Iterate through the queue groups that were not terminated by * userspace and issue the term request to firmware for them. 
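The reworked kbase_csf_ctx_init() above chains several sub-initialisations (KCPU queue context, tiler heap context, the USER_REG page mapping and the new protm event kthread) and tears the earlier stages back down when a later one fails. A rough sketch of that init/unwind shape only, with stand-in helpers rather than the kbase ones:

/* demo_ctx and the demo_stage_* helpers are illustrative stand-ins,
 * not kbase symbols.
 */
struct demo_ctx;

static int demo_stage_a_init(struct demo_ctx *c);
static int demo_stage_b_init(struct demo_ctx *c);
static int demo_stage_c_init(struct demo_ctx *c);
static void demo_stage_a_term(struct demo_ctx *c);
static void demo_stage_b_term(struct demo_ctx *c);

static int demo_ctx_init(struct demo_ctx *c)
{
	int err;

	err = demo_stage_a_init(c);
	if (unlikely(err))
		return err;

	err = demo_stage_b_init(c);
	if (unlikely(err))
		goto err_stage_a;

	err = demo_stage_c_init(c);
	if (unlikely(err))
		goto err_stage_b;

	return 0;

err_stage_b:
	demo_stage_b_term(c);	/* undo stage B before stage A */
err_stage_a:
	demo_stage_a_term(c);
	return err;
}

kbase_csf_ctx_init() expresses the same ordering with nested if (likely(!err)) blocks instead of gotos, but the invariant is identical: whatever was initialised before the failing stage is terminated again, in reverse order, before the error is returned.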
@@ -1854,7 +1834,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) term_queue_group(group); } } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); @@ -1881,7 +1861,7 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) if (as) flush_workqueue(as->pf_wq); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); for (i = 0; i < MAX_QUEUE_GROUP_NUM; i++) { kfree(kctx->csf.queue_groups[i]); @@ -1897,34 +1877,40 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) queue = list_first_entry(&kctx->csf.queue_list, struct kbase_queue, link); + list_del_init(&queue->link); + + rt_mutex_unlock(&kctx->csf.lock); + wait_pending_queue_kick(queue); + rt_mutex_lock(&kctx->csf.lock); + /* The reference held when the IO mapping was created on bind * would have been dropped otherwise the termination of Kbase * context itself wouldn't have kicked-in. So there shall be * only one reference left that was taken when queue was * registered. */ - if (atomic_read(&queue->refcount) != 1) - dev_warn(kctx->kbdev->dev, - "Releasing queue with incorrect refcounting!\n"); - list_del_init(&queue->link); + WARN_ON(kbase_refcount_read(&queue->refcount) != 1); + release_queue(queue); } - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); + kbase_destroy_kworker_stack(&kctx->csf.protm_event_worker); + kbasep_ctx_user_reg_page_mapping_term(kctx); kbase_csf_tiler_heap_context_term(kctx); kbase_csf_kcpu_queue_context_term(kctx); kbase_csf_scheduler_context_term(kctx); kbase_csf_event_term(kctx); - mutex_destroy(&kctx->csf.lock); + rt_mutex_destroy(&kctx->csf.lock); } /** * handle_oom_event - Handle the OoM event generated by the firmware for the * CSI. * - * @kctx: Pointer to the kbase context in which the tiler heap was initialized. + * @group: Pointer to the CSG group the oom-event belongs to. * @stream: Pointer to the structure containing info provided by the firmware * about the CSI. * @@ -1939,9 +1925,10 @@ void kbase_csf_ctx_term(struct kbase_context *kctx) * Return: 0 if successfully handled the request, otherwise a negative error * code on failure. 
*/ -static int handle_oom_event(struct kbase_context *const kctx, - struct kbase_csf_cmd_stream_info const *const stream) +static int handle_oom_event(struct kbase_queue_group *const group, + struct kbase_csf_cmd_stream_info const *const stream) { + struct kbase_context *const kctx = group->kctx; u64 gpu_heap_va = kbase_csf_firmware_cs_output(stream, CS_HEAP_ADDRESS_LO) | ((u64)kbase_csf_firmware_cs_output(stream, CS_HEAP_ADDRESS_HI) << 32); @@ -1968,12 +1955,18 @@ static int handle_oom_event(struct kbase_context *const kctx, err = kbase_csf_tiler_heap_alloc_new_chunk(kctx, gpu_heap_va, renderpasses_in_flight, pending_frag_count, &new_chunk_ptr); - /* It is okay to acknowledge with a NULL chunk (firmware will then wait - * for the fragment jobs to complete and release chunks) - */ - if (err == -EBUSY) + if ((group->csi_handlers & BASE_CSF_TILER_OOM_EXCEPTION_FLAG) && + (pending_frag_count == 0) && (err == -ENOMEM || err == -EBUSY)) { + /* The group allows incremental rendering, trigger it */ + new_chunk_ptr = 0; + dev_dbg(kctx->kbdev->dev, "Group-%d (slot-%d) enter incremental render\n", + group->handle, group->csg_nr); + } else if (err == -EBUSY) { + /* Acknowledge with a NULL chunk (firmware will then wait for + * the fragment jobs to complete and release chunks) + */ new_chunk_ptr = 0; - else if (err) + } else if (err) return err; kbase_csf_firmware_cs_input(stream, CS_TILER_HEAP_START_LO, @@ -2007,11 +2000,33 @@ static void report_tiler_oom_error(struct kbase_queue_group *group) } } } }; kbase_csf_event_add_error(group->kctx, - &group->error_tiler_oom, + &group->error_fatal, &error); kbase_event_wakeup_sync(group->kctx); } +static void flush_gpu_cache_on_fatal_error(struct kbase_device *kbdev) +{ + kbase_pm_lock(kbdev); + /* With the advent of partial cache flush, dirty cache lines could + * be left in the GPU L2 caches by terminating the queue group here + * without waiting for proper cache maintenance. A full cache flush + * here will prevent these dirty cache lines from being arbitrarily + * evicted later and possibly causing memory corruption. + */ + if (kbdev->pm.backend.gpu_powered) { + kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + if (kbase_gpu_wait_cache_clean_timeout(kbdev, + kbdev->mmu_or_gpu_cache_op_wait_time_ms)) + dev_warn( + kbdev->dev, + "[%llu] Timeout waiting for CACHE_CLN_INV_L2_LSC to complete after fatal error", + kbase_backend_get_cycle_cnt(kbdev)); + } + + kbase_pm_unlock(kbdev); +} + /** * kbase_queue_oom_event - Handle tiler out-of-memory for a GPU command queue. * @@ -2024,8 +2039,8 @@ static void report_tiler_oom_error(struct kbase_queue_group *group) * notification to allow the firmware to report out-of-memory again in future. * If the out-of-memory condition was successfully handled then this function * rings the relevant doorbell to notify the firmware; otherwise, it terminates - the GPU command queue group to which the queue is bound. See - term_queue_group() for details. + the GPU command queue group to which the queue is bound and notifies a waiting + * user space client of the failure.
*/ static void kbase_queue_oom_event(struct kbase_queue *const queue) { @@ -2037,6 +2052,7 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) struct kbase_csf_cmd_stream_info const *stream; int csi_index = queue->csi_index; u32 cs_oom_ack, cs_oom_req; + unsigned long flags; lockdep_assert_held(&kctx->csf.lock); @@ -2048,6 +2064,13 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) kbase_csf_scheduler_lock(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (kbdev->csf.scheduler.sc_power_rails_off) { + dev_warn(kctx->kbdev->dev, "SC power rails off unexpectedly when handling OoM event"); + goto unlock; + } +#endif + slot_num = kbase_csf_scheduler_group_get_slot(group); /* The group could have gone off slot before this work item got @@ -2080,22 +2103,25 @@ static void kbase_queue_oom_event(struct kbase_queue *const queue) if (cs_oom_ack == cs_oom_req) goto unlock; - err = handle_oom_event(kctx, stream); + err = handle_oom_event(group, stream); + kbase_csf_scheduler_spin_lock(kbdev, &flags); kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_oom_ack, CS_REQ_TILER_OOM_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); + kbase_csf_scheduler_spin_unlock(kbdev, flags); - if (err) { + if (unlikely(err)) { dev_warn( kbdev->dev, "Queue group to be terminated, couldn't handle the OoM event\n"); + kbase_debug_csf_fault_notify(kbdev, kctx, DF_TILER_OOM); kbase_csf_scheduler_unlock(kbdev); term_queue_group(group); + flush_gpu_cache_on_fatal_error(kbdev); report_tiler_oom_error(group); return; } - - kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); unlock: kbase_csf_scheduler_unlock(kbdev); } @@ -2117,18 +2143,18 @@ static void oom_event_worker(struct work_struct *data) struct kbase_device *const kbdev = kctx->kbdev; int err = kbase_reset_gpu_try_prevent(kbdev); + /* Regardless of whether reset failed or is currently happening, exit * early */ if (err) return; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kbase_queue_oom_event(queue); - release_queue(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); kbase_reset_gpu_allow(kbdev); } @@ -2153,7 +2179,7 @@ static void report_group_timeout_error(struct kbase_queue_group *const group) "Notify the event notification thread, forward progress timeout (%llu cycles)\n", kbase_csf_timeout_get(group->kctx->kbdev)); - kbase_csf_event_add_error(group->kctx, &group->error_timeout, &error); + kbase_csf_event_add_error(group->kctx, &group->error_fatal, &error); kbase_event_wakeup_sync(group->kctx); } @@ -2169,25 +2195,27 @@ static void timer_event_worker(struct work_struct *data) struct kbase_queue_group *const group = container_of(data, struct kbase_queue_group, timer_event_work); struct kbase_context *const kctx = group->kctx; + struct kbase_device *const kbdev = kctx->kbdev; bool reset_prevented = false; - int err = kbase_reset_gpu_prevent_and_wait(kctx->kbdev); + int err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) dev_warn( - kctx->kbdev->dev, + kbdev->dev, "Unsuccessful GPU reset detected when terminating group %d on progress timeout, attempting to terminate regardless", group->handle); else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); term_queue_group(group); + flush_gpu_cache_on_fatal_error(kbdev); report_group_timeout_error(group); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) - kbase_reset_gpu_allow(kctx->kbdev); + 
kbase_reset_gpu_allow(kbdev); } /** @@ -2195,30 +2223,125 @@ static void timer_event_worker(struct work_struct *data) * * @group: Pointer to GPU queue group for which the timeout event is received. * + * Notify a waiting user space client of the timeout. * Enqueue a work item to terminate the group and notify the event notification * thread of progress timeout fault for the GPU command queue group. */ static void handle_progress_timer_event(struct kbase_queue_group *const group) { + kbase_debug_csf_fault_notify(group->kctx->kbdev, group->kctx, + DF_PROGRESS_TIMER_TIMEOUT); + queue_work(group->kctx->csf.wq, &group->timer_event_work); } /** + * alloc_grp_protected_suspend_buffer_pages() - Allocate physical pages from the protected + * memory for the protected mode suspend buffer. + * @group: Pointer to the GPU queue group. + * + * Return: 0 if suspend buffer allocation is successful or if it's already allocated, otherwise + * a negative error value. + */ +static int alloc_grp_protected_suspend_buffer_pages(struct kbase_queue_group *const group) +{ + struct kbase_device *const kbdev = group->kctx->kbdev; + struct kbase_context *kctx = group->kctx; + struct tagged_addr *phys = NULL; + struct kbase_protected_suspend_buffer *sbuf = &group->protected_suspend_buf; + size_t nr_pages; + int err = 0; + + if (likely(sbuf->pma)) + return 0; + + nr_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + phys = kcalloc(nr_pages, sizeof(*phys), GFP_KERNEL); + if (unlikely(!phys)) { + err = -ENOMEM; + goto phys_free; + } + + rt_mutex_lock(&kctx->csf.lock); + kbase_csf_scheduler_lock(kbdev); + + if (unlikely(!group->csg_reg)) { + /* The only way the bound csg_reg can have been removed from the group is + * that it has been put off slot by the scheduler and the csg_reg resource + * is contended by other groups. In this case, another occasion is needed for + * mapping the pma, which needs a bound csg_reg. Since the group is already + * off-slot, returning no error is harmless as the scheduler, when placing the + * group back on slot again, will do the required MMU map operation on the + * allocated and retained pma. + */ + WARN_ON(group->csg_nr >= 0); + dev_dbg(kbdev->dev, "No bound csg_reg for group_%d_%d_%d to enter protected mode", + group->kctx->tgid, group->kctx->id, group->handle); + goto unlock; + } + + /* Allocate the protected mode pages */ + sbuf->pma = kbase_csf_protected_memory_alloc(kbdev, phys, nr_pages, true); + if (unlikely(!sbuf->pma)) { + err = -ENOMEM; + goto unlock; + } + + /* Map the bound susp_reg to the just allocated pma pages */ + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + +unlock: + kbase_csf_scheduler_unlock(kbdev); + rt_mutex_unlock(&kctx->csf.lock); +phys_free: + kfree(phys); + return err; +} + +static void report_group_fatal_error(struct kbase_queue_group *const group) +{ + struct base_gpu_queue_group_error const + err_payload = { .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { + .status = GPU_EXCEPTION_TYPE_SW_FAULT_0, + } } }; + + kbase_csf_add_group_fatal_error(group, &err_payload); + kbase_event_wakeup_sync(group->kctx); +} + +/** * protm_event_worker - Protected mode switch request event handler - * called from a workqueue. + * called from a kthread. * - * @data: Pointer to a work_struct embedded in GPU command queue group data. + * @work: Pointer to a kthread_work struct embedded in GPU command queue group data. * * Request to switch to protected mode.
*/ -static void protm_event_worker(struct work_struct *data) +static void protm_event_worker(struct kthread_work *work) { struct kbase_queue_group *const group = - container_of(data, struct kbase_queue_group, protm_event_work); + container_of(work, struct kbase_queue_group, protm_event_work); + struct kbase_protected_suspend_buffer *sbuf = &group->protected_suspend_buf; + int err = 0; - KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_BEGIN, + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_START, group, 0u); - kbase_csf_scheduler_group_protm_enter(group); + + err = alloc_grp_protected_suspend_buffer_pages(group); + if (!err) { + kbase_csf_scheduler_group_protm_enter(group); + } else if (err == -ENOMEM && sbuf->alloc_retries <= PROTM_ALLOC_MAX_RETRIES) { + sbuf->alloc_retries++; + /* try again to allocate pages */ + kthread_queue_work(&group->kctx->csf.protm_event_worker, &group->protm_event_work); + } else if (sbuf->alloc_retries >= PROTM_ALLOC_MAX_RETRIES || err != -ENOMEM) { + dev_err(group->kctx->kbdev->dev, + "Failed to allocate physical pages for Protected mode suspend buffer for the group %d of context %d_%d", + group->handle, group->kctx->tgid, group->kctx->id); + report_group_fatal_error(group); + } + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, PROTM_EVENT_WORKER_END, group, 0u); } @@ -2227,16 +2350,20 @@ static void protm_event_worker(struct work_struct *data) * handle_fault_event - Handler for CS fault. * * @queue: Pointer to queue for which fault event was received. - * @stream: Pointer to the structure containing info provided by the - * firmware about the CSI. - * - * Prints meaningful CS fault information. + * @cs_ack: Value of the CS_ACK register in the CS kernel input page used for + * the queue. * + * Print required information about the CS fault and notify the user space client + * about the fault. */ static void -handle_fault_event(struct kbase_queue *const queue, - struct kbase_csf_cmd_stream_info const *const stream) +handle_fault_event(struct kbase_queue *const queue, const u32 cs_ack) { + struct kbase_device *const kbdev = queue->kctx->kbdev; + struct kbase_csf_cmd_stream_group_info const *ginfo = + &kbdev->csf.global_iface.groups[queue->group->csg_nr]; + struct kbase_csf_cmd_stream_info const *stream = + &ginfo->streams[queue->csi_index]; const u32 cs_fault = kbase_csf_firmware_cs_output(stream, CS_FAULT); const u64 cs_fault_info = kbase_csf_firmware_cs_output(stream, CS_FAULT_INFO_LO) | @@ -2248,7 +2375,6 @@ handle_fault_event(struct kbase_queue *const queue, CS_FAULT_EXCEPTION_DATA_GET(cs_fault); const u64 cs_fault_info_exception_data = CS_FAULT_INFO_EXCEPTION_DATA_GET(cs_fault_info); - struct kbase_device *const kbdev = queue->kctx->kbdev; kbase_csf_scheduler_spin_lock_assert_held(kbdev); @@ -2263,53 +2389,82 @@ handle_fault_event(struct kbase_queue *const queue, kbase_gpu_exception_name(cs_fault_exception_type), cs_fault_exception_data, cs_fault_info_exception_data); + +#if IS_ENABLED(CONFIG_DEBUG_FS) + /* CS_RESOURCE_TERMINATED type fault event can be ignored from the + * standpoint of dump on error. It is used to report fault for the CSIs + * that are associated with the same CSG as the CSI for which the actual + * fault was reported by the Iterator. + * Dumping would be triggered when the actual fault is reported. + * + * CS_INHERIT_FAULT can also be ignored. It could happen due to the error + * in other types of queues (cpu/kcpu). 
If a fault had occurred in some + * other GPU queue then the dump would have been performed anyways when + * that fault was reported. + */ + if ((cs_fault_exception_type != CS_FAULT_EXCEPTION_TYPE_CS_INHERIT_FAULT) && + (cs_fault_exception_type != CS_FAULT_EXCEPTION_TYPE_CS_RESOURCE_TERMINATED)) { + if (unlikely(kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_CS_FAULT))) { + queue->cs_error = cs_fault; + queue->cs_error_info = cs_fault_info; + queue->cs_error_fatal = false; + queue_work(queue->kctx->csf.wq, &queue->cs_error_work); + return; + } + } +#endif + + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FAULT_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, queue->csi_index, queue->group->csg_nr, true); } -static void report_queue_fatal_error(struct kbase_queue *const queue, - u32 cs_fatal, u64 cs_fatal_info, - u8 group_handle) +static void report_queue_fatal_error(struct kbase_queue *const queue, u32 cs_fatal, + u64 cs_fatal_info, struct kbase_queue_group *group) { - struct base_csf_notification error = { - .type = BASE_CSF_NOTIFICATION_GPU_QUEUE_GROUP_ERROR, - .payload = { - .csg_error = { - .handle = group_handle, - .error = { - .error_type = - BASE_GPU_QUEUE_GROUP_QUEUE_ERROR_FATAL, - .payload = { - .fatal_queue = { - .sideband = cs_fatal_info, - .status = cs_fatal, - .csi_index = queue->csi_index, - } - } - } - } - } - }; + struct base_csf_notification + error = { .type = BASE_CSF_NOTIFICATION_GPU_QUEUE_GROUP_ERROR, + .payload = { + .csg_error = { + .error = { .error_type = + BASE_GPU_QUEUE_GROUP_QUEUE_ERROR_FATAL, + .payload = { .fatal_queue = { + .sideband = cs_fatal_info, + .status = cs_fatal, + } } } } } }; + + if (!queue) + return; - kbase_csf_event_add_error(queue->kctx, &queue->error, &error); - kbase_event_wakeup(queue->kctx); + if (WARN_ON_ONCE(!group)) + return; + + error.payload.csg_error.handle = group->handle; + error.payload.csg_error.error.payload.fatal_queue.csi_index = queue->csi_index; + kbase_csf_event_add_error(queue->kctx, &group->error_fatal, &error); + kbase_event_wakeup_sync(queue->kctx); } /** - * fatal_event_worker - Handle the fatal error for the GPU queue + * cs_error_worker - Handle the CS_FATAL/CS_FAULT error for the GPU queue * * @data: Pointer to a work_struct embedded in GPU command queue. * * Terminate the CSG and report the error to userspace. 
*/ -static void fatal_event_worker(struct work_struct *const data) +static void cs_error_worker(struct work_struct *const data) { struct kbase_queue *const queue = - container_of(data, struct kbase_queue, fatal_event_work); + container_of(data, struct kbase_queue, cs_error_work); + const u32 cs_fatal_exception_type = CS_FATAL_EXCEPTION_TYPE_GET(queue->cs_error); struct kbase_context *const kctx = queue->kctx; struct kbase_device *const kbdev = kctx->kbdev; struct kbase_queue_group *group; - u8 group_handle; bool reset_prevented = false; - int err = kbase_reset_gpu_prevent_and_wait(kbdev); + int err; + + kbase_debug_csf_fault_wait_completion(kbdev); + err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) dev_warn( @@ -2318,7 +2473,7 @@ static void fatal_event_worker(struct work_struct *const data) else reset_prevented = true; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); group = get_bound_queue_group(queue); if (!group) { @@ -2326,14 +2481,48 @@ static void fatal_event_worker(struct work_struct *const data) goto unlock; } - group_handle = group->handle; +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (!queue->cs_error_fatal) { + unsigned long flags; + int slot_num; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + slot_num = kbase_csf_scheduler_group_get_slot_locked(group); + if (slot_num >= 0) { + struct kbase_csf_cmd_stream_group_info const *ginfo = + &kbdev->csf.global_iface.groups[slot_num]; + struct kbase_csf_cmd_stream_info const *stream = + &ginfo->streams[queue->csi_index]; + u32 const cs_ack = + kbase_csf_firmware_cs_output(stream, CS_ACK); + + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FAULT_MASK); + kbase_csf_ring_cs_kernel_doorbell(kbdev, queue->csi_index, + slot_num, true); + } + kbase_csf_scheduler_spin_unlock(kbdev, flags); + goto unlock; + } +#endif + term_queue_group(group); - report_queue_fatal_error(queue, queue->cs_fatal, queue->cs_fatal_info, - group_handle); + flush_gpu_cache_on_fatal_error(kbdev); + /* For an invalid GPU page fault, CS_BUS_FAULT fatal error is expected after the + * page fault handler disables the AS of faulty context. Need to skip reporting the + * CS_BUS_FAULT fatal error to the Userspace as it doesn't have the full fault info. + * Page fault handler will report the fatal error with full page fault info. + */ + if ((cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_CS_BUS_FAULT) && group->faulted) { + dev_dbg(kbdev->dev, + "Skipped reporting CS_BUS_FAULT for queue %d of group %d of ctx %d_%d", + queue->csi_index, group->handle, kctx->tgid, kctx->id); + } else { + report_queue_fatal_error(queue, queue->cs_error, queue->cs_error_info, group); + } unlock: - release_queue(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); } @@ -2344,14 +2533,18 @@ unlock: * @queue: Pointer to queue for which fatal event was received. * @stream: Pointer to the structure containing info provided by the * firmware about the CSI. + * @cs_ack: Value of the CS_ACK register in the CS kernel input page used for + * the queue. * - * Prints meaningful CS fatal information. + * Notify a waiting user space client of the CS fatal and prints meaningful + * information. * Enqueue a work item to terminate the group and report the fatal error * to user space. 
*/ static void handle_fatal_event(struct kbase_queue *const queue, - struct kbase_csf_cmd_stream_info const *const stream) + struct kbase_csf_cmd_stream_info const *const stream, + u32 cs_ack) { const u32 cs_fatal = kbase_csf_firmware_cs_output(stream, CS_FATAL); const u64 cs_fatal_info = @@ -2381,52 +2574,24 @@ handle_fatal_event(struct kbase_queue *const queue, if (cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_FIRMWARE_INTERNAL_ERROR) { + kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_FW_INTERNAL_ERROR); queue_work(system_wq, &kbdev->csf.fw_error_work); } else { - get_queue(queue); - queue->cs_fatal = cs_fatal; - queue->cs_fatal_info = cs_fatal_info; - if (!queue_work(queue->kctx->csf.wq, &queue->fatal_event_work)) - release_queue(queue); + kbase_debug_csf_fault_notify(kbdev, queue->kctx, DF_CS_FATAL); + if (cs_fatal_exception_type == CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE) { + queue->group->cs_unrecoverable = true; + if (kbase_prepare_to_reset_gpu(queue->kctx->kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu(queue->kctx->kbdev); + } + queue->cs_error = cs_fatal; + queue->cs_error_info = cs_fatal_info; + queue->cs_error_fatal = true; + queue_work(queue->kctx->csf.wq, &queue->cs_error_work); } -} - -/** - * handle_queue_exception_event - Handler for CS fatal/fault exception events. - * - * @queue: Pointer to queue for which fatal/fault event was received. - * @cs_req: Value of the CS_REQ register from the CS's input page. - * @cs_ack: Value of the CS_ACK register from the CS's output page. - */ -static void handle_queue_exception_event(struct kbase_queue *const queue, - const u32 cs_req, const u32 cs_ack) -{ - struct kbase_csf_cmd_stream_group_info const *ginfo; - struct kbase_csf_cmd_stream_info const *stream; - struct kbase_context *const kctx = queue->kctx; - struct kbase_device *const kbdev = kctx->kbdev; - struct kbase_queue_group *group = queue->group; - int csi_index = queue->csi_index; - int slot_num = group->csg_nr; + kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, + CS_REQ_FATAL_MASK); - kbase_csf_scheduler_spin_lock_assert_held(kbdev); - - ginfo = &kbdev->csf.global_iface.groups[slot_num]; - stream = &ginfo->streams[csi_index]; - - if ((cs_ack & CS_ACK_FATAL_MASK) != (cs_req & CS_REQ_FATAL_MASK)) { - handle_fatal_event(queue, stream); - kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, - CS_REQ_FATAL_MASK); - } - - if ((cs_ack & CS_ACK_FAULT_MASK) != (cs_req & CS_REQ_FAULT_MASK)) { - handle_fault_event(queue, stream); - kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack, - CS_REQ_FAULT_MASK); - kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, slot_num, true); - } } /** @@ -2436,6 +2601,9 @@ static void handle_queue_exception_event(struct kbase_queue *const queue, * @ginfo: The CSG interface provided by the firmware. * @irqreq: CSG's IRQ request bitmask (one bit per CS). * @irqack: CSG's IRQ acknowledge bitmask (one bit per CS). + * @track: Pointer that tracks the highest scanout priority idle CSG + * and any newly potentially viable protected mode requesting + * CSG in current IRQ context. * * If the interrupt request bitmask differs from the acknowledge bitmask * then the firmware is notifying the host of an event concerning those @@ -2444,8 +2612,9 @@ static void handle_queue_exception_event(struct kbase_queue *const queue, * the request and acknowledge registers for the individual CS(s). 
*/ static void process_cs_interrupts(struct kbase_queue_group *const group, - struct kbase_csf_cmd_stream_group_info const *const ginfo, - u32 const irqreq, u32 const irqack) + struct kbase_csf_cmd_stream_group_info const *const ginfo, + u32 const irqreq, u32 const irqack, + struct irq_idle_and_protm_track *track) { struct kbase_device *const kbdev = group->kctx->kbdev; u32 remaining = irqreq ^ irqack; @@ -2475,10 +2644,16 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, kbase_csf_firmware_cs_output(stream, CS_ACK); struct workqueue_struct *wq = group->kctx->csf.wq; - if ((cs_req & CS_REQ_EXCEPTION_MASK) ^ - (cs_ack & CS_ACK_EXCEPTION_MASK)) { - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_FAULT_INTERRUPT, group, queue, cs_req ^ cs_ack); - handle_queue_exception_event(queue, cs_req, cs_ack); + if ((cs_ack & CS_ACK_FATAL_MASK) != (cs_req & CS_REQ_FATAL_MASK)) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_FAULT, + group, queue, cs_req ^ cs_ack); + handle_fatal_event(queue, stream, cs_ack); + } + + if ((cs_ack & CS_ACK_FAULT_MASK) != (cs_req & CS_REQ_FAULT_MASK)) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_FAULT, + group, queue, cs_req ^ cs_ack); + handle_fault_event(queue, cs_ack); } /* PROTM_PEND and TILER_OOM can be safely ignored @@ -2489,30 +2664,35 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, u32 const cs_req_remain = cs_req & ~CS_REQ_EXCEPTION_MASK; u32 const cs_ack_remain = cs_ack & ~CS_ACK_EXCEPTION_MASK; - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND, - group, queue, cs_req_remain ^ cs_ack_remain); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, + CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED, + group, queue, + cs_req_remain ^ cs_ack_remain); continue; } if (((cs_req & CS_REQ_TILER_OOM_MASK) ^ (cs_ack & CS_ACK_TILER_OOM_MASK))) { - get_queue(queue); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_TILER_OOM_INTERRUPT, group, queue, - cs_req ^ cs_ack); - if (WARN_ON(!queue_work(wq, &queue->oom_event_work))) { + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_TILER_OOM, + group, queue, cs_req ^ cs_ack); + if (!queue_work(wq, &queue->oom_event_work)) { /* The work item shall not have been * already queued, there can be only * one pending OoM event for a * queue. 
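The tiler-OOM branch above relies on queue_work() returning false when the work item is already pending; since at most one OOM event per queue should ever be outstanding, the new code logs that case instead of the old WARN_ON plus queue release. A sketch of the idiom (not kbase code):

#include <linux/printk.h>
#include <linux/workqueue.h>

static void example_submit_oom_work(struct workqueue_struct *wq, struct work_struct *oom_work)
{
	/* queue_work() refuses to double-queue: a false return means an
	 * earlier OOM event is still pending and will cover this one too. */
	if (!queue_work(wq, oom_work))
		pr_debug("tiler OOM work already pending\n");
}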
*/ - release_queue(queue); + dev_warn( + kbdev->dev, + "Tiler OOM work pending: queue %d group %d (ctx %d_%d)", + queue->csi_index, group->handle, queue->kctx->tgid, + queue->kctx->id); } } if ((cs_req & CS_REQ_PROTM_PEND_MASK) ^ (cs_ack & CS_ACK_PROTM_PEND_MASK)) { - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_INTERRUPT, group, queue, - cs_req ^ cs_ack); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_INTERRUPT_PROTM_PEND, + group, queue, cs_req ^ cs_ack); dev_dbg(kbdev->dev, "Protected mode entry request for queue on csi %d bound to group-%d on slot %d", @@ -2520,7 +2700,7 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, group->csg_nr); bitmap_set(group->protm_pending_bitmap, i, 1); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, PROTM_PENDING_SET, group, queue, + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_SET, group, queue, group->protm_pending_bitmap[0]); protm_pend = true; } @@ -2529,17 +2709,21 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, if (protm_pend) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - u32 current_protm_pending_seq = - scheduler->tick_protm_pending_seq; - if (current_protm_pending_seq > group->scan_seq_num) { + if (scheduler->tick_protm_pending_seq > group->scan_seq_num) { scheduler->tick_protm_pending_seq = group->scan_seq_num; - queue_work(group->kctx->csf.wq, &group->protm_event_work); + track->protm_grp = group; } + if (!group->protected_suspend_buf.pma) + kthread_queue_work(&group->kctx->csf.protm_event_worker, + &group->protm_event_work); + if (test_bit(group->csg_nr, scheduler->csg_slots_idle_mask)) { clear_bit(group->csg_nr, scheduler->csg_slots_idle_mask); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, + scheduler->csg_slots_idle_mask[0]); dev_dbg(kbdev->dev, "Group-%d on slot %d de-idled by protm request", group->handle, group->csg_nr); @@ -2552,6 +2736,8 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, * * @kbdev: Instance of a GPU platform device that implements a CSF interface. * @csg_nr: CSG number. + * @track: Pointer that tracks the highest idle CSG and the newly possible viable + * protected mode requesting group, in current IRQ context. * * Handles interrupts for a CSG and for CSs within it. * @@ -2562,8 +2748,8 @@ static void process_cs_interrupts(struct kbase_queue_group *const group, * * See process_cs_interrupts() for details of per-stream interrupt handling. */ -static void process_csg_interrupts(struct kbase_device *const kbdev, - int const csg_nr) +static void process_csg_interrupts(struct kbase_device *const kbdev, int const csg_nr, + struct irq_idle_and_protm_track *track) { struct kbase_csf_cmd_stream_group_info *ginfo; struct kbase_queue_group *group = NULL; @@ -2574,8 +2760,6 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, if (WARN_ON(csg_nr >= kbdev->csf.global_iface.group_num)) return; - KBASE_KTRACE_ADD(kbdev, CSG_INTERRUPT_PROCESS, NULL, csg_nr); - ginfo = &kbdev->csf.global_iface.groups[csg_nr]; req = kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ); ack = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); @@ -2584,7 +2768,7 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, /* There may not be any pending CSG/CS interrupts to process */ if ((req == ack) && (irqreq == irqack)) - goto out; + return; /* Immediately set IRQ_ACK bits to be same as the IRQ_REQ bits before * examining the CS_ACK & CS_REQ bits. 
This would ensure that Host @@ -2605,33 +2789,28 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, * slot scheduler spinlock is required. */ if (!group) - goto out; + return; if (WARN_ON(kbase_csf_scheduler_group_get_slot_locked(group) != csg_nr)) - goto out; - - if ((req ^ ack) & CSG_REQ_SYNC_UPDATE_MASK) { - kbase_csf_firmware_csg_input_mask(ginfo, - CSG_REQ, ack, CSG_REQ_SYNC_UPDATE_MASK); + return; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SYNC_UPDATE_INTERRUPT, group, req ^ ack); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROCESS_START, group, csg_nr); - /* SYNC_UPDATE events shall invalidate GPU idle event */ - atomic_set(&kbdev->csf.scheduler.gpu_no_longer_idle, true); - - kbase_csf_event_signal_cpu_only(group->kctx); - } + kbase_csf_handle_csg_sync_update(kbdev, ginfo, group, req, ack); if ((req ^ ack) & CSG_REQ_IDLE_MASK) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( + kbdev, kbdev->gpu_props.props.raw_props.gpu_id, csg_nr); + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, CSG_REQ_IDLE_MASK); set_bit(csg_nr, scheduler->csg_slots_idle_mask); KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, group, scheduler->csg_slots_idle_mask[0]); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_IDLE_INTERRUPT, group, req ^ ack); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_IDLE, group, req ^ ack); dev_dbg(kbdev->dev, "Idle notification received for Group %u on slot %d\n", group->handle, csg_nr); @@ -2639,42 +2818,37 @@ static void process_csg_interrupts(struct kbase_device *const kbdev, /* If there are non-idle CSGs waiting for a slot, fire * a tock for a replacement. */ - mod_delayed_work(scheduler->wq, &scheduler->tock_work, 0); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_NON_IDLE_GROUPS, + group, req ^ ack); + kbase_csf_scheduler_invoke_tock(kbdev); } else { - u32 current_protm_pending_seq = - scheduler->tick_protm_pending_seq; - - if ((current_protm_pending_seq != - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) && - (group->scan_seq_num < current_protm_pending_seq)) { - /* If the protm enter was prevented due to groups - * priority, then fire a tock for the scheduler - * to re-examine the case. 
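The CSG idle handling above is bitmap bookkeeping: a slot's bit in csg_slots_idle_mask is set when its CSG reports IDLE and cleared again when a protected-mode request shows the group has become active. A reduced sketch of that bookkeeping with stand-in names:

#include <linux/bitmap.h>
#include <linux/bitops.h>

#define EXAMPLE_MAX_CSG_SLOTS 32

static DECLARE_BITMAP(example_idle_mask, EXAMPLE_MAX_CSG_SLOTS);

static void example_csg_reported_idle(unsigned int csg_nr)
{
	set_bit(csg_nr, example_idle_mask);
}

static void example_csg_deidled(unsigned int csg_nr)
{
	if (test_bit(csg_nr, example_idle_mask))
		clear_bit(csg_nr, example_idle_mask);
}

/* A caller can then ask whether every in-use slot has gone idle. */
static bool example_all_idle(const unsigned long *in_use, unsigned int nbits)
{
	return bitmap_subset(in_use, example_idle_mask, nbits);
}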
- */ - mod_delayed_work(scheduler->wq, - &scheduler->tock_work, 0); - } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_NO_NON_IDLE_GROUPS, + group, req ^ ack); + } + + if (group->scan_seq_num < track->idle_seq) { + track->idle_seq = group->scan_seq_num; + track->idle_slot = csg_nr; } } if ((req ^ ack) & CSG_REQ_PROGRESS_TIMER_EVENT_MASK) { kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, - CSG_REQ_PROGRESS_TIMER_EVENT_MASK); + CSG_REQ_PROGRESS_TIMER_EVENT_MASK); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_PROGRESS_TIMER_INTERRUPT, - group, req ^ ack); - dev_info(kbdev->dev, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROGRESS_TIMER_EVENT, group, + req ^ ack); + dev_info( + kbdev->dev, "[%llu] Iterator PROGRESS_TIMER timeout notification received for group %u of ctx %d_%d on slot %d\n", - kbase_backend_get_cycle_cnt(kbdev), - group->handle, group->kctx->tgid, group->kctx->id, csg_nr); + kbase_backend_get_cycle_cnt(kbdev), group->handle, group->kctx->tgid, + group->kctx->id, csg_nr); handle_progress_timer_event(group); } - process_cs_interrupts(group, ginfo, irqreq, irqack); + process_cs_interrupts(group, ginfo, irqreq, irqack, track); -out: - /* group may still be NULL here */ KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_PROCESS_END, group, ((u64)req ^ ack) | (((u64)irqreq ^ irqack) << 32)); } @@ -2793,6 +2967,7 @@ static inline void check_protm_enter_req_complete(struct kbase_device *kbdev, dev_dbg(kbdev->dev, "Protected mode entry interrupt received"); kbdev->protected_mode = true; + trace_mali_protected_mode(kbdev->protected_mode); kbase_ipa_protection_mode_switch_event(kbdev); kbase_ipa_control_protm_entered(kbdev); kbase_hwcnt_backend_csf_protm_entered(&kbdev->hwcnt_gpu_iface); @@ -2822,7 +2997,7 @@ static inline void process_protm_exit(struct kbase_device *kbdev, u32 glb_ack) GLB_REQ_PROTM_EXIT_MASK); if (likely(scheduler->active_protm_grp)) { - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_EXIT_PROTM, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_EXIT, scheduler->active_protm_grp, 0u); scheduler->active_protm_grp = NULL; } else { @@ -2831,80 +3006,230 @@ static inline void process_protm_exit(struct kbase_device *kbdev, u32 glb_ack) if (!WARN_ON(!kbdev->protected_mode)) { kbdev->protected_mode = false; + trace_mali_protected_mode(kbdev->protected_mode); kbase_ipa_control_protm_exited(kbdev); kbase_hwcnt_backend_csf_protm_exited(&kbdev->hwcnt_gpu_iface); } + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_enable_pmode_exit(kbdev); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ } -void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val) +static inline void process_tracked_info_for_protm(struct kbase_device *kbdev, + struct irq_idle_and_protm_track *track) { - unsigned long flags; - u32 csg_interrupts = val & ~JOB_IRQ_GLOBAL_IF; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_queue_group *group = track->protm_grp; + u32 current_protm_pending_seq = scheduler->tick_protm_pending_seq; - lockdep_assert_held(&kbdev->hwaccess_lock); + kbase_csf_scheduler_spin_lock_assert_held(kbdev); - KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT, NULL, val); - kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), val); + if (likely(current_protm_pending_seq == KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID)) + return; - if (csg_interrupts != 0) { - kbase_csf_scheduler_spin_lock(kbdev, &flags); - while (csg_interrupts != 0) { - int const csg_nr = ffs(csg_interrupts) - 1; + /* Handle protm from the tracked information */ + if (track->idle_seq < 
current_protm_pending_seq) { + /* If the protm enter was prevented due to groups priority, then fire a tock + * for the scheduler to re-examine the case. + */ + dev_dbg(kbdev->dev, "Attempt pending protm from idle slot %d\n", track->idle_slot); + kbase_csf_scheduler_invoke_tock(kbdev); + } else if (group) { + u32 i, num_groups = kbdev->csf.global_iface.group_num; + struct kbase_queue_group *grp; + bool tock_triggered = false; + + /* A new protm request, and track->idle_seq is not sufficient, check across + * previously notified idle CSGs in the current tick/tock cycle. + */ + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + if (i == track->idle_slot) + continue; + grp = kbase_csf_scheduler_get_group_on_slot(kbdev, i); + /* If not NULL then the group pointer cannot disappear as the + * scheduler spinlock is held. + */ + if (grp == NULL) + continue; - process_csg_interrupts(kbdev, csg_nr); - csg_interrupts &= ~(1 << csg_nr); + if (grp->scan_seq_num < current_protm_pending_seq) { + tock_triggered = true; + dev_dbg(kbdev->dev, + "Attempt new protm from tick/tock idle slot %d\n", i); + kbase_csf_scheduler_invoke_tock(kbdev); + break; + } + } + + if (!tock_triggered) { + dev_dbg(kbdev->dev, "Group-%d on slot-%d start protm work\n", + group->handle, group->csg_nr); + kthread_queue_work(&group->kctx->csf.protm_event_worker, + &group->protm_event_work); } - kbase_csf_scheduler_spin_unlock(kbdev, flags); } +} - if (val & JOB_IRQ_GLOBAL_IF) { - const struct kbase_csf_global_iface *const global_iface = - &kbdev->csf.global_iface; +static void order_job_irq_clear_with_iface_mem_read(void) +{ + /* Ensure that write to the JOB_IRQ_CLEAR is ordered with regards to the + * read from interface memory. The ordering is needed considering the way + * FW & Kbase writes to the JOB_IRQ_RAWSTAT and JOB_IRQ_CLEAR registers + * without any synchronization. Without the barrier there is no guarantee + * about the ordering, the write to IRQ_CLEAR can take effect after the read + * from interface memory and that could cause a problem for the scenario where + * FW sends back to back notifications for the same CSG for events like + * SYNC_UPDATE and IDLE, but Kbase gets a single IRQ and observes only the + * first event. Similar thing can happen with glb events like CFG_ALLOC_EN + * acknowledgment and GPU idle notification. 
+ * + * MCU CPU + * --------------- ---------------- + * Update interface memory Write to IRQ_CLEAR to clear current IRQ + * <barrier> <barrier> + * Write to IRQ_RAWSTAT to raise new IRQ Read interface memory + */ - kbdev->csf.interrupt_received = true; + /* CPU and GPU would be in the same Outer shareable domain */ + dmb(osh); +} - if (!kbdev->csf.firmware_reloaded) - kbase_csf_firmware_reload_completed(kbdev); - else if (global_iface->output) { - u32 glb_req, glb_ack; +void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val) +{ + bool deferred_handling_glb_idle_irq = false; - kbase_csf_scheduler_spin_lock(kbdev, &flags); - glb_req = kbase_csf_firmware_global_input_read( - global_iface, GLB_REQ); - glb_ack = kbase_csf_firmware_global_output( - global_iface, GLB_ACK); - KBASE_KTRACE_ADD(kbdev, GLB_REQ_ACQ, NULL, glb_req ^ glb_ack); + lockdep_assert_held(&kbdev->hwaccess_lock); - check_protm_enter_req_complete(kbdev, glb_req, glb_ack); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_START, NULL, val); - if ((glb_req ^ glb_ack) & GLB_REQ_PROTM_EXIT_MASK) - process_protm_exit(kbdev, glb_ack); + do { + unsigned long flags; + u32 csg_interrupts = val & ~JOB_IRQ_GLOBAL_IF; + bool glb_idle_irq_received = false; - /* Handle IDLE Hysteresis notification event */ - if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { - dev_dbg(kbdev->dev, "Idle-hysteresis event flagged"); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_REQ, glb_ack, - GLB_REQ_IDLE_EVENT_MASK); + kbase_reg_write(kbdev, JOB_CONTROL_REG(JOB_IRQ_CLEAR), val); + order_job_irq_clear_with_iface_mem_read(); - kbase_csf_scheduler_process_gpu_idle_event(kbdev); - } + if (csg_interrupts != 0) { + struct irq_idle_and_protm_track track = { .protm_grp = NULL, + .idle_seq = U32_MAX, + .idle_slot = S8_MAX }; - process_prfcnt_interrupts(kbdev, glb_req, glb_ack); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + /* Looping through and track the highest idle and protm groups */ + while (csg_interrupts != 0) { + int const csg_nr = ffs(csg_interrupts) - 1; + + process_csg_interrupts(kbdev, csg_nr, &track); + csg_interrupts &= ~(1 << csg_nr); + } + /* Handle protm from the tracked information */ + process_tracked_info_for_protm(kbdev, &track); kbase_csf_scheduler_spin_unlock(kbdev, flags); + } - /* Invoke the MCU state machine as a state transition - * might have completed. 
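The dmb(osh) introduced above pairs the CPU's write to JOB_IRQ_CLEAR with its subsequent reads of the firmware interface memory, mirroring the MCU-side barrier shown in the comment. A stripped-down illustration of that ordering under those assumptions; the example_* structure and field names are stand-ins, only the barrier and accessors are real kernel primitives, and the snippet is arm64-specific because of dmb(osh):

#include <linux/io.h>
#include <linux/types.h>
#include <asm/barrier.h>

struct example_dev {
	void __iomem *job_irq_clear;   /* MMIO clear register              */
	u32 *iface_mem;                /* shared firmware interface memory */
};

static u32 example_clear_irq_then_read_iface(struct example_dev *dev, u32 irq_bits)
{
	/* 1) Tell the GPU the current IRQ has been observed. */
	writel_relaxed(irq_bits, dev->job_irq_clear);

	/* 2) Order the clear against the interface-memory read below. CPU and
	 *    GPU sit in the same outer-shareable domain, hence dmb(osh);
	 *    without it the read could complete before the clear and a
	 *    back-to-back notification for the same CSG would be missed. */
	dmb(osh);

	/* 3) Now it is safe to sample CSG/CS state from interface memory. */
	return READ_ONCE(dev->iface_mem[0]);
}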
- */ - kbase_pm_update_state(kbdev); + if (val & JOB_IRQ_GLOBAL_IF) { + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + + kbdev->csf.interrupt_received = true; + + if (!kbdev->csf.firmware_reloaded) + kbase_csf_firmware_reload_completed(kbdev); + else if (global_iface->output) { + u32 glb_req, glb_ack; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + glb_req = + kbase_csf_firmware_global_input_read(global_iface, GLB_REQ); + glb_ack = kbase_csf_firmware_global_output(global_iface, GLB_ACK); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_GLB_REQ_ACK, NULL, + glb_req ^ glb_ack); + + check_protm_enter_req_complete(kbdev, glb_req, glb_ack); + + if ((glb_req ^ glb_ack) & GLB_REQ_PROTM_EXIT_MASK) + process_protm_exit(kbdev, glb_ack); + + /* Handle IDLE Hysteresis notification event */ + if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { + dev_dbg(kbdev->dev, "Idle-hysteresis event flagged"); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (kbase_csf_scheduler_process_gpu_idle_event(kbdev)) { + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); + } +#else + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); +#endif + + glb_idle_irq_received = true; + /* Defer handling this IRQ to account for a race condition + * where the idle worker could be executed before we have + * finished handling all pending IRQs (including CSG IDLE + * IRQs). + */ + deferred_handling_glb_idle_irq = true; + } + + process_prfcnt_interrupts(kbdev, glb_req, glb_ack); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Invoke the MCU state machine as a state transition + * might have completed. + */ + kbase_pm_update_state(kbdev); + } } + + if (!glb_idle_irq_received) + break; + /* Attempt to serve potential IRQs that might have occurred + * whilst handling the previous IRQ. In case we have observed + * the GLB IDLE IRQ without all CSGs having been marked as + * idle, the GPU would be treated as no longer idle and left + * powered on. 
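The comment above explains why the handler now loops: the GLB idle action must not run until every IRQ that was pending alongside it has been consumed. Reduced to its control flow (the example_* helpers are stand-ins, not the kbase functions), the loop looks like this:

#include <linux/types.h>

struct example_dev;
u32 example_read_irq_status(struct example_dev *dev);
void example_clear_irq(struct example_dev *dev, u32 val);
void example_handle_csg_bits(struct example_dev *dev, u32 val);
void example_handle_global_bits(struct example_dev *dev, u32 val, bool *glb_idle_seen);
void example_process_gpu_idle(struct example_dev *dev);

static void example_handle_job_irq(struct example_dev *dev, u32 val)
{
	bool deferred_idle = false;

	do {
		bool glb_idle_seen = false;

		example_clear_irq(dev, val);
		example_handle_csg_bits(dev, val);
		example_handle_global_bits(dev, val, &glb_idle_seen);

		if (!glb_idle_seen)
			break;

		/* Remember the idle event, but serve any IRQ that arrived
		 * while we were busy before acting on it. */
		deferred_idle = true;
		val = example_read_irq_status(dev);
	} while (val);

	if (deferred_idle)
		example_process_gpu_idle(dev);
}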
+ */ + val = kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_STATUS)); + } while (val); + + if (deferred_handling_glb_idle_irq) { + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_scheduler_process_gpu_idle_event(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); } wake_up_all(&kbdev->csf.event_wait); + KBASE_KTRACE_ADD(kbdev, CSF_INTERRUPT_END, NULL, val); } +void kbase_csf_handle_csg_sync_update(struct kbase_device *const kbdev, + struct kbase_csf_cmd_stream_group_info *ginfo, + struct kbase_queue_group *group, u32 req, u32 ack) +{ + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + + if ((req ^ ack) & CSG_REQ_SYNC_UPDATE_MASK) { + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, ack, CSG_REQ_SYNC_UPDATE_MASK); + + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_INTERRUPT_SYNC_UPDATE, group, req ^ ack); + + /* SYNC_UPDATE events shall invalidate GPU idle event */ + atomic_set(&kbdev->csf.scheduler.gpu_no_longer_idle, true); + + kbase_csf_event_signal_cpu_only(group->kctx); + } +} + void kbase_csf_doorbell_mapping_term(struct kbase_device *kbdev) { if (kbdev->csf.db_filp) { @@ -2924,13 +3249,12 @@ int kbase_csf_doorbell_mapping_init(struct kbase_device *kbdev) struct file *filp; int ret; - filp = shmem_file_setup("mali csf", MAX_LFS_FILESIZE, VM_NORESERVE); + filp = shmem_file_setup("mali csf db", MAX_LFS_FILESIZE, VM_NORESERVE); if (IS_ERR(filp)) return PTR_ERR(filp); - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - 1, &phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, + false, NULL); if (ret <= 0) { fput(filp); @@ -2944,47 +3268,74 @@ int kbase_csf_doorbell_mapping_init(struct kbase_device *kbdev) return 0; } +void kbase_csf_pending_gpuq_kicks_init(struct kbase_device *kbdev) +{ + size_t i; + + for (i = 0; i != ARRAY_SIZE(kbdev->csf.pending_gpuq_kicks); ++i) + INIT_LIST_HEAD(&kbdev->csf.pending_gpuq_kicks[i]); + spin_lock_init(&kbdev->csf.pending_gpuq_kicks_lock); +} + +void kbase_csf_pending_gpuq_kicks_term(struct kbase_device *kbdev) +{ + size_t i; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + for (i = 0; i != ARRAY_SIZE(kbdev->csf.pending_gpuq_kicks); ++i) { + if (!list_empty(&kbdev->csf.pending_gpuq_kicks[i])) + dev_warn(kbdev->dev, + "Some GPU queue kicks for priority %zu were not handled", i); + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); +} + void kbase_csf_free_dummy_user_reg_page(struct kbase_device *kbdev) { - if (as_phys_addr_t(kbdev->csf.dummy_user_reg_page)) { - struct page *page = as_page(kbdev->csf.dummy_user_reg_page); + if (kbdev->csf.user_reg.filp) { + struct page *page = as_page(kbdev->csf.user_reg.dummy_page); - kbase_mem_pool_free( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, - false); + kbase_mem_pool_free(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, false); + fput(kbdev->csf.user_reg.filp); } } int kbase_csf_setup_dummy_user_reg_page(struct kbase_device *kbdev) { struct tagged_addr phys; + struct file *filp; struct page *page; u32 *addr; - int ret; - kbdev->csf.dummy_user_reg_page = as_tagged(0); + kbdev->csf.user_reg.filp = NULL; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, - false); + filp = shmem_file_setup("mali csf user_reg", MAX_LFS_FILESIZE, VM_NORESERVE); + if (IS_ERR(filp)) { + dev_err(kbdev->dev, "failed to get an unlinked file for user_reg"); + return PTR_ERR(filp); + } - if (ret <= 0) - return ret; + if 
(kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, &phys, + false, NULL) <= 0) { + fput(filp); + return -ENOMEM; + } page = as_page(phys); - addr = kmap_atomic(page); + addr = kbase_kmap_atomic(page); /* Write a special value for the latest flush register inside the * dummy page */ addr[LATEST_FLUSH / sizeof(u32)] = POWER_DOWN_LATEST_FLUSH_VALUE; - kbase_sync_single_for_device(kbdev, kbase_dma_addr(page), sizeof(u32), + kbase_sync_single_for_device(kbdev, kbase_dma_addr(page) + LATEST_FLUSH, sizeof(u32), DMA_BIDIRECTIONAL); - kunmap_atomic(addr); - - kbdev->csf.dummy_user_reg_page = phys; + kbase_kunmap_atomic(addr); + kbdev->csf.user_reg.filp = filp; + kbdev->csf.user_reg.dummy_page = phys; + kbdev->csf.user_reg.file_offset = 0; return 0; } @@ -3001,3 +3352,60 @@ u8 kbase_csf_priority_check(struct kbase_device *kbdev, u8 req_priority) return out_priority; } + +void kbase_csf_process_queue_kick(struct kbase_queue *queue) +{ + struct kbase_context *kctx = queue->kctx; + struct kbase_device *kbdev = kctx->kbdev; + bool retry_kick = false; + int err = kbase_reset_gpu_prevent_and_wait(kbdev); + + if (err) { + dev_err(kbdev->dev, "Unsuccessful GPU reset detected when kicking queue"); + goto out_release_queue; + } + + rt_mutex_lock(&kctx->csf.lock); + + if (queue->bind_state != KBASE_CSF_QUEUE_BOUND) + goto out_allow_gpu_reset; + + err = kbase_csf_scheduler_queue_start(queue); + if (unlikely(err)) { + dev_dbg(kbdev->dev, "Failed to start queue"); + if (err == -EBUSY) { + retry_kick = true; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + if (list_empty(&queue->pending_kick_link)) { + /* A failed queue kick shall be pushed to the + * back of the queue to avoid potential abuse. + */ + list_add_tail( + &queue->pending_kick_link, + &kbdev->csf.pending_gpuq_kicks[queue->group_priority]); + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + } else { + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + WARN_ON(atomic_read(&queue->pending_kick) == 0); + } + + complete(&kbdev->csf.scheduler.kthread_signal); + } + } + +out_allow_gpu_reset: + if (likely(!retry_kick)) { + WARN_ON(atomic_read(&queue->pending_kick) == 0); + atomic_dec(&queue->pending_kick); + } + + rt_mutex_unlock(&kctx->csf.lock); + + kbase_reset_gpu_allow(kbdev); + + return; +out_release_queue: + WARN_ON(atomic_read(&queue->pending_kick) == 0); + atomic_dec(&queue->pending_kick); +} diff --git a/mali_kbase/csf/mali_kbase_csf.h b/mali_kbase/csf/mali_kbase_csf.h index 46a0529..29119e1 100644 --- a/mali_kbase/csf/mali_kbase_csf.h +++ b/mali_kbase/csf/mali_kbase_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,14 +40,17 @@ */ #define KBASEP_USER_DB_NR_INVALID ((s8)-1) +/* Number of pages used for GPU command queue's User input & output data */ +#define KBASEP_NUM_CS_USER_IO_PAGES (2) + /* Indicates an invalid value for the scan out sequence number, used to * signify there is no group that has protected mode execution pending. 
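The kbase_csf_process_queue_kick() hunk above re-queues a kick that failed with -EBUSY onto the per-priority pending list, and uses list_empty() on the element's own link to avoid listing it twice. A small sketch of that idiom (stand-in types; the link must be initialised with INIT_LIST_HEAD() and removed with list_del_init() for the emptiness test to work):

#include <linux/list.h>
#include <linux/spinlock.h>

struct example_kick_lists {
	spinlock_t lock;
	struct list_head per_prio[4];    /* one list per group priority */
};

static void example_requeue_kick(struct example_kick_lists *kl,
				 struct list_head *kick_link, unsigned int prio)
{
	spin_lock(&kl->lock);
	/* An empty (self-linked) link means the kick is not queued yet; push
	 * it to the back of its priority list so it is retried after the
	 * kicks that are already waiting. */
	if (list_empty(kick_link))
		list_add_tail(kick_link, &kl->per_prio[prio]);
	spin_unlock(&kl->lock);
}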
*/ #define KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID (U32_MAX) -#define FIRMWARE_PING_INTERVAL_MS (12000) /* 12 seconds */ - -#define FIRMWARE_IDLE_HYSTERESIS_TIME_MS (10) /* Default 10 milliseconds */ +/* 60ms optimizes power while minimizing latency impact for UI test cases. */ +#define MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS (600 * 1000) +#define FIRMWARE_IDLE_HYSTERESIS_TIME_NS (60 * 1000 * 1000) /* Default 60 milliseconds */ /* Idle hysteresis time can be scaled down when GPU sleep feature is used */ #define FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER (5) @@ -75,6 +78,18 @@ void kbase_csf_ctx_handle_fault(struct kbase_context *kctx, struct kbase_fault *fault); /** + * kbase_csf_ctx_report_page_fault_for_active_groups - Notify Userspace about GPU page fault + * for active groups of the faulty context. + * + * @kctx: Pointer to faulty kbase context. + * @fault: Pointer to the fault. + * + * This function notifies the event notification thread of the GPU page fault. + */ +void kbase_csf_ctx_report_page_fault_for_active_groups(struct kbase_context *kctx, + struct kbase_fault *fault); + +/** * kbase_csf_ctx_term - Terminate the CSF interface for a GPU address space. * * @kctx: Pointer to the kbase context which is being terminated. @@ -126,6 +141,25 @@ void kbase_csf_queue_terminate(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_terminate *term); /** + * kbase_csf_free_command_stream_user_pages() - Free the resources allocated + * for a queue at the time of bind. + * + * @kctx: Address of the kbase context within which the queue was created. + * @queue: Pointer to the queue to be unlinked. + * + * This function will free the pair of physical pages allocated for a GPU + * command queue, and also release the hardware doorbell page, that were mapped + * into the process address space to enable direct submission of commands to + * the hardware. Also releases the reference taken on the queue when the mapping + * was created. + * + * If an explicit or implicit unbind was missed by the userspace then the + * mapping will persist. On process exit kernel itself will remove the mapping. + */ +void kbase_csf_free_command_stream_user_pages(struct kbase_context *kctx, + struct kbase_queue *queue); + +/** * kbase_csf_alloc_command_stream_user_pages - Allocate resources for a * GPU command queue. * * @@ -161,8 +195,9 @@ int kbase_csf_queue_bind(struct kbase_context *kctx, * are any. * * @queue: Pointer to queue to be unbound. + * @process_exit: Flag to indicate if process exit is happening. */ -void kbase_csf_queue_unbind(struct kbase_queue *queue); +void kbase_csf_queue_unbind(struct kbase_queue *queue, bool process_exit); /** * kbase_csf_queue_unbind_stopped - Unbind a GPU command queue in the case @@ -187,6 +222,20 @@ int kbase_csf_queue_kick(struct kbase_context *kctx, struct kbase_ioctl_cs_queue_kick *kick); /** + * kbase_csf_find_queue_group - Find the queue group corresponding + * to the indicated handle. + * + * @kctx: The kbase context under which the queue group exists. + * @group_handle: Handle for the group which uniquely identifies it within + * the context with which it was created. + * + * This function is used to find the queue group when passed a handle. + * + * Return: Pointer to a queue group on success, NULL on failure + */ +struct kbase_queue_group *kbase_csf_find_queue_group(struct kbase_context *kctx, u8 group_handle); + +/** * kbase_csf_queue_group_handle_is_valid - Find if the given queue group handle * is valid.
* @@ -239,6 +288,7 @@ void kbase_csf_queue_group_terminate(struct kbase_context *kctx, */ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group); +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST /** * kbase_csf_queue_group_suspend - Suspend a GPU command queue group * @@ -256,6 +306,7 @@ void kbase_csf_term_descheduled_queue_group(struct kbase_queue_group *group); */ int kbase_csf_queue_group_suspend(struct kbase_context *kctx, struct kbase_suspend_copy_buffer *sus_buf, u8 group_handle); +#endif /** * kbase_csf_add_group_fatal_error - Report a fatal group error to userspace @@ -276,6 +327,19 @@ void kbase_csf_add_group_fatal_error( void kbase_csf_interrupt(struct kbase_device *kbdev, u32 val); /** + * kbase_csf_handle_csg_sync_update - Handle SYNC_UPDATE notification for the group. + * + * @kbdev: The kbase device to handle the SYNC_UPDATE interrupt. + * @ginfo: Pointer to the CSG interface used by the @group + * @group: Pointer to the GPU command queue group. + * @req: CSG_REQ register value corresponding to @group. + * @ack: CSG_ACK register value corresponding to @group. + */ +void kbase_csf_handle_csg_sync_update(struct kbase_device *const kbdev, + struct kbase_csf_cmd_stream_group_info *ginfo, + struct kbase_queue_group *group, u32 req, u32 ack); + +/** * kbase_csf_doorbell_mapping_init - Initialize the fields that facilitates * the update of userspace mapping of HW * doorbell page. @@ -324,6 +388,22 @@ int kbase_csf_setup_dummy_user_reg_page(struct kbase_device *kbdev); void kbase_csf_free_dummy_user_reg_page(struct kbase_device *kbdev); /** + * kbase_csf_pending_gpuq_kicks_init - Initialize the data used for handling + * GPU queue kicks. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +void kbase_csf_pending_gpuq_kicks_init(struct kbase_device *kbdev); + +/** + * kbase_csf_pending_gpuq_kicks_init - De-initialize the data used for handling + * GPU queue kicks. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +void kbase_csf_pending_gpuq_kicks_term(struct kbase_device *kbdev); + +/** * kbase_csf_ring_csg_doorbell - ring the doorbell for a CSG interface. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -465,4 +545,18 @@ static inline u64 kbase_csf_ktrace_gpu_cycle_cnt(struct kbase_device *kbdev) return 0; #endif } + +/** + * kbase_csf_process_queue_kick() - Process a pending kicked GPU command queue. + * + * @queue: Pointer to the queue to process. + * + * This function starts the pending queue, for which the work + * was previously submitted via ioctl call from application thread. + * If the queue is already scheduled and resident, it will be started + * right away, otherwise once the group is made resident. + */ +void kbase_csf_process_queue_kick(struct kbase_queue *queue); + + #endif /* _KBASE_CSF_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c b/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c index 66b671d..d783650 100644 --- a/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_cpu_queue_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,18 +51,18 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) { struct kbase_context *kctx = file->private; - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); if (atomic_read(&kctx->csf.cpu_queue.dump_req_status) != BASE_CSF_CPU_QUEUE_DUMP_COMPLETE) { seq_puts(file, "Dump request already started! (try again)\n"); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return -EBUSY; } atomic_set(&kctx->csf.cpu_queue.dump_req_status, BASE_CSF_CPU_QUEUE_DUMP_ISSUED); init_completion(&kctx->csf.cpu_queue.dump_cmp); kbase_event_wakeup_nosync(kctx); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); seq_puts(file, "CPU Queues table (version:v" __stringify(MALI_CSF_CPU_QUEUE_DEBUGFS_VERSION) "):\n"); @@ -70,7 +70,7 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) wait_for_completion_timeout(&kctx->csf.cpu_queue.dump_cmp, msecs_to_jiffies(3000)); - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); if (kctx->csf.cpu_queue.buffer) { WARN_ON(atomic_read(&kctx->csf.cpu_queue.dump_req_status) != BASE_CSF_CPU_QUEUE_DUMP_PENDING); @@ -86,7 +86,7 @@ static int kbasep_csf_cpu_queue_debugfs_show(struct seq_file *file, void *data) atomic_set(&kctx->csf.cpu_queue.dump_req_status, BASE_CSF_CPU_QUEUE_DUMP_COMPLETE); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return 0; } @@ -126,33 +126,30 @@ void kbase_csf_cpu_queue_debugfs_init(struct kbase_context *kctx) int kbase_csf_cpu_queue_dump(struct kbase_context *kctx, u64 buffer, size_t buf_size) { - int err = 0; - size_t alloc_size = buf_size; char *dump_buffer; if (!buffer || !alloc_size) - goto done; + return 0; + + if (alloc_size > SIZE_MAX - PAGE_SIZE) + return -ENOMEM; alloc_size = (alloc_size + PAGE_SIZE) & ~(PAGE_SIZE - 1); dump_buffer = kzalloc(alloc_size, GFP_KERNEL); - if (ZERO_OR_NULL_PTR(dump_buffer)) { - err = -ENOMEM; - goto done; - } + if (!dump_buffer) + return -ENOMEM; WARN_ON(kctx->csf.cpu_queue.buffer != NULL); - err = copy_from_user(dump_buffer, + if (copy_from_user(dump_buffer, u64_to_user_ptr(buffer), - buf_size); - if (err) { + buf_size)) { kfree(dump_buffer); - err = -EFAULT; - goto done; + return -EFAULT; } - mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); kfree(kctx->csf.cpu_queue.buffer); @@ -161,13 +158,12 @@ int kbase_csf_cpu_queue_dump(struct kbase_context *kctx, kctx->csf.cpu_queue.buffer = dump_buffer; kctx->csf.cpu_queue.buffer_size = buf_size; complete_all(&kctx->csf.cpu_queue.dump_cmp); - } else { + } else kfree(dump_buffer); - } - mutex_unlock(&kctx->csf.lock); -done: - return err; + rt_mutex_unlock(&kctx->csf.lock); + + return 0; } #else /* diff --git a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c index 2075797..c94e656 100644 --- a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,11 +23,137 @@ #include <mali_kbase.h> #include <linux/seq_file.h> #include <linux/delay.h> -#include <csf/mali_kbase_csf_trace_buffer.h> #include <backend/gpu/mali_kbase_pm_internal.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include "mali_kbase_csf_tl_reader.h" +#include <linux/version_compat_defs.h> + +/* Wait time to be used cumulatively for all the CSG slots. + * Since scheduler lock is held when STATUS_UPDATE request is sent, there won't be + * any other Host request pending on the FW side and usually FW would be responsive + * to the Doorbell IRQs as it won't do any polling for a long time and also it won't + * have to wait for any HW state transition to complete for publishing the status. + * So it is reasonable to expect that handling of STATUS_UPDATE request would be + * relatively very quick. + */ +#define STATUS_UPDATE_WAIT_TIMEOUT 500 + +/* The bitmask of CSG slots for which the STATUS_UPDATE request completed. + * The access to it is serialized with scheduler lock, so at a time it would + * get used either for "active_groups" or per context "groups" debugfs file. + */ +static DECLARE_BITMAP(csg_slots_status_updated, MAX_SUPPORTED_CSGS); + +static +bool csg_slot_status_update_finish(struct kbase_device *kbdev, u32 csg_nr) +{ + struct kbase_csf_cmd_stream_group_info const *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + + return !((kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ) ^ + kbase_csf_firmware_csg_output(ginfo, CSG_ACK)) & + CSG_REQ_STATUS_UPDATE_MASK); +} + +static +bool csg_slots_status_update_finish(struct kbase_device *kbdev, + const unsigned long *slots_mask) +{ + const u32 max_csg_slots = kbdev->csf.global_iface.group_num; + bool changed = false; + u32 csg_nr; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + for_each_set_bit(csg_nr, slots_mask, max_csg_slots) { + if (csg_slot_status_update_finish(kbdev, csg_nr)) { + set_bit(csg_nr, csg_slots_status_updated); + changed = true; + } + } + + return changed; +} + +static void wait_csg_slots_status_update_finish(struct kbase_device *kbdev, + unsigned long *slots_mask) +{ + const u32 max_csg_slots = kbdev->csf.global_iface.group_num; + long remaining = kbase_csf_timeout_in_jiffies(STATUS_UPDATE_WAIT_TIMEOUT); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + bitmap_zero(csg_slots_status_updated, max_csg_slots); + + while (!bitmap_empty(slots_mask, max_csg_slots) && remaining) { + remaining = wait_event_timeout(kbdev->csf.event_wait, + csg_slots_status_update_finish(kbdev, slots_mask), + remaining); + if (likely(remaining)) { + bitmap_andnot(slots_mask, slots_mask, + csg_slots_status_updated, max_csg_slots); + } else { + dev_warn(kbdev->dev, + "STATUS_UPDATE request timed out for slots 0x%lx", + slots_mask[0]); + } + } +} + +void kbase_csf_debugfs_update_active_groups_status(struct kbase_device *kbdev) +{ + u32 max_csg_slots = kbdev->csf.global_iface.group_num; + DECLARE_BITMAP(used_csgs, MAX_SUPPORTED_CSGS) = { 0 }; + u32 csg_nr; + unsigned long flags; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* Global doorbell ring for CSG STATUS_UPDATE request or User doorbell + * ring for Extract offset update, shall not be made when MCU has been + * put to sleep otherwise it will undesirably make MCU exit the sleep + * state. 
Also it isn't really needed as FW will implicitly update the + * status of all on-slot groups when MCU sleep request is sent to it. + */ + if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { + /* Wait for the MCU sleep request to complete. */ + kbase_pm_wait_for_desired_state(kbdev); + bitmap_copy(csg_slots_status_updated, + kbdev->csf.scheduler.csg_inuse_bitmap, max_csg_slots); + return; + } + + for (csg_nr = 0; csg_nr < max_csg_slots; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + if (!group) + continue; + /* Ring the User doorbell for FW to update the Extract offset */ + kbase_csf_ring_doorbell(kbdev, group->doorbell_nr); + set_bit(csg_nr, used_csgs); + } + + /* Return early if there are no on-slot groups */ + if (bitmap_empty(used_csgs, max_csg_slots)) + return; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + for_each_set_bit(csg_nr, used_csgs, max_csg_slots) { + struct kbase_csf_cmd_stream_group_info const *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, + ~kbase_csf_firmware_csg_output(ginfo, CSG_ACK), + CSG_REQ_STATUS_UPDATE_MASK); + } + + BUILD_BUG_ON(MAX_SUPPORTED_CSGS > (sizeof(used_csgs[0]) * BITS_PER_BYTE)); + kbase_csf_ring_csg_slots_doorbell(kbdev, used_csgs[0]); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_csg_slots_status_update_finish(kbdev, used_csgs); + /* Wait for the User doobell ring to take effect */ + msleep(100); +} #define MAX_SCHED_STATE_STRING_LEN (16) static const char *scheduler_state_to_string(struct kbase_device *kbdev, @@ -77,16 +203,32 @@ static const char *blocked_reason_to_string(u32 reason_id) return cs_blocked_reason[reason_id]; } +static bool sb_source_supported(u32 glb_version) +{ + bool supported = false; + + if (((GLB_VERSION_MAJOR_GET(glb_version) == 3) && + (GLB_VERSION_MINOR_GET(glb_version) >= 5)) || + ((GLB_VERSION_MAJOR_GET(glb_version) == 2) && + (GLB_VERSION_MINOR_GET(glb_version) >= 6)) || + ((GLB_VERSION_MAJOR_GET(glb_version) == 1) && + (GLB_VERSION_MINOR_GET(glb_version) >= 3))) + supported = true; + + return supported; +} + static void kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - struct seq_file *file, u32 wait_status, u32 wait_sync_value, - u64 wait_sync_live_value, u64 wait_sync_pointer, u32 sb_status, - u32 blocked_reason) + struct seq_file *file, u32 glb_version, u32 wait_status, u32 wait_sync_value, + u64 wait_sync_live_value, u64 wait_sync_pointer, u32 sb_status, u32 blocked_reason) { #define WAITING "Waiting" #define NOT_WAITING "Not waiting" seq_printf(file, "SB_MASK: %d\n", CS_STATUS_WAIT_SB_MASK_GET(wait_status)); + if (sb_source_supported(glb_version)) + seq_printf(file, "SB_SOURCE: %d\n", CS_STATUS_WAIT_SB_SOURCE_GET(wait_status)); seq_printf(file, "PROGRESS_WAIT: %s\n", CS_STATUS_WAIT_PROGRESS_WAIT_GET(wait_status) ? 
WAITING : NOT_WAITING); @@ -145,7 +287,8 @@ static void kbasep_csf_scheduler_dump_active_cs_trace(struct seq_file *file, static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, struct kbase_queue *queue) { - u32 *addr; + u64 *addr; + u32 *addr32; u64 cs_extract; u64 cs_insert; u32 cs_active; @@ -156,20 +299,25 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, struct kbase_vmap_struct *mapping; u64 *evt; u64 wait_sync_live_value; + u32 glb_version; if (!queue) return; + glb_version = queue->kctx->kbdev->csf.global_iface.version; + if (WARN_ON(queue->csi_index == KBASEP_IF_NR_INVALID || !queue->group)) return; - addr = (u32 *)queue->user_io_addr; - cs_insert = addr[CS_INSERT_LO/4] | ((u64)addr[CS_INSERT_HI/4] << 32); + addr = queue->user_io_addr; + cs_insert = addr[CS_INSERT_LO / sizeof(*addr)]; + + addr = queue->user_io_addr + PAGE_SIZE / sizeof(*addr); + cs_extract = addr[CS_EXTRACT_LO / sizeof(*addr)]; - addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); - cs_extract = addr[CS_EXTRACT_LO/4] | ((u64)addr[CS_EXTRACT_HI/4] << 32); - cs_active = addr[CS_ACTIVE/4]; + addr32 = (u32 *)(queue->user_io_addr + PAGE_SIZE / sizeof(*addr)); + cs_active = addr32[CS_ACTIVE / sizeof(*addr32)]; #define KBASEP_CSF_DEBUGFS_CS_HEADER_USER_IO \ "Bind Idx, Ringbuf addr, Size, Prio, Insert offset, Extract offset, Active, Doorbell\n" @@ -200,9 +348,8 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, } kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - file, wait_status, wait_sync_value, - wait_sync_live_value, wait_sync_pointer, - sb_status, blocked_reason); + file, glb_version, wait_status, wait_sync_value, + wait_sync_live_value, wait_sync_pointer, sb_status, blocked_reason); } } else { struct kbase_device const *const kbdev = @@ -257,9 +404,8 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, } kbasep_csf_scheduler_dump_active_queue_cs_status_wait( - file, wait_status, wait_sync_value, - wait_sync_live_value, wait_sync_pointer, sb_status, - blocked_reason); + file, glb_version, wait_status, wait_sync_value, wait_sync_live_value, + wait_sync_pointer, sb_status, blocked_reason); /* Dealing with cs_trace */ if (kbase_csf_scheduler_queue_has_trace(queue)) kbasep_csf_scheduler_dump_active_cs_trace(file, stream); @@ -270,54 +416,6 @@ static void kbasep_csf_scheduler_dump_active_queue(struct seq_file *file, seq_puts(file, "\n"); } -static void update_active_group_status(struct seq_file *file, - struct kbase_queue_group *const group) -{ - struct kbase_device *const kbdev = group->kctx->kbdev; - struct kbase_csf_cmd_stream_group_info const *const ginfo = - &kbdev->csf.global_iface.groups[group->csg_nr]; - long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); - unsigned long flags; - - /* Global doorbell ring for CSG STATUS_UPDATE request or User doorbell - * ring for Extract offset update, shall not be made when MCU has been - * put to sleep otherwise it will undesirably make MCU exit the sleep - * state. Also it isn't really needed as FW will implicitly update the - * status of all on-slot groups when MCU sleep request is sent to it. - */ - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) - return; - - /* Ring the User doobell shared between the queues bound to this - * group, to have FW update the CS_EXTRACT for all the queues - * bound to the group. Ring early so that FW gets adequate time - * for the handling. 
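The debugfs dump above now reads the ring-buffer pointers through u64 loads from the queue's user I/O pages: the first page is the input page carrying CS_INSERT, the page after it is the output page carrying CS_EXTRACT and CS_ACTIVE, presumably so each 64-bit pointer is read in a single access rather than as two u32 halves. The same access pattern in isolation (the offsets are the CS_* constants from the CSF interface headers):

static void example_read_queue_pointers(u64 *user_io_addr, u64 *insert, u64 *extract,
					u32 *active)
{
	u64 *input_page = user_io_addr;
	u64 *output_page = user_io_addr + PAGE_SIZE / sizeof(u64);
	u32 *output_page32 = (u32 *)output_page;

	*insert = input_page[CS_INSERT_LO / sizeof(u64)];     /* host-written insert offset     */
	*extract = output_page[CS_EXTRACT_LO / sizeof(u64)];  /* firmware-written extract offset */
	*active = output_page32[CS_ACTIVE / sizeof(u32)];     /* firmware-written active flag    */
}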
- */ - kbase_csf_ring_doorbell(kbdev, group->doorbell_nr); - - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, - ~kbase_csf_firmware_csg_output(ginfo, CSG_ACK), - CSG_REQ_STATUS_UPDATE_MASK); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - kbase_csf_ring_csg_doorbell(kbdev, group->csg_nr); - - remaining = wait_event_timeout(kbdev->csf.event_wait, - !((kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ) ^ - kbase_csf_firmware_csg_output(ginfo, CSG_ACK)) & - CSG_REQ_STATUS_UPDATE_MASK), remaining); - - if (!remaining) { - dev_err(kbdev->dev, - "Timed out for STATUS_UPDATE on group %d on slot %d", - group->handle, group->csg_nr); - - seq_printf(file, "*** Warn: Timed out for STATUS_UPDATE on slot %d\n", - group->csg_nr); - seq_puts(file, "*** The following group-record is likely stale\n"); - } -} - static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, struct kbase_queue_group *const group) { @@ -331,8 +429,6 @@ static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, u8 slot_priority = kbdev->csf.scheduler.csg_slots[group->csg_nr].priority; - update_active_group_status(file, group); - ep_c = kbase_csf_firmware_csg_output(ginfo, CSG_STATUS_EP_CURRENT); ep_r = kbase_csf_firmware_csg_output(ginfo, CSG_STATUS_EP_REQ); @@ -348,25 +444,25 @@ static void kbasep_csf_scheduler_dump_active_group(struct seq_file *file, CSG_STATUS_STATE_IDLE_MASK) idle = 'Y'; - seq_puts(file, "GroupID, CSG NR, CSG Prio, Run State, Priority, C_EP(Alloc/Req), F_EP(Alloc/Req), T_EP(Alloc/Req), Exclusive, Idle\n"); - seq_printf(file, "%7d, %6d, %8d, %9d, %8d, %11d/%3d, %11d/%3d, %11d/%3d, %9c, %4c\n", - group->handle, - group->csg_nr, - slot_priority, - group->run_state, - group->priority, - CSG_STATUS_EP_CURRENT_COMPUTE_EP_GET(ep_c), - CSG_STATUS_EP_REQ_COMPUTE_EP_GET(ep_r), - CSG_STATUS_EP_CURRENT_FRAGMENT_EP_GET(ep_c), - CSG_STATUS_EP_REQ_FRAGMENT_EP_GET(ep_r), - CSG_STATUS_EP_CURRENT_TILER_EP_GET(ep_c), - CSG_STATUS_EP_REQ_TILER_EP_GET(ep_r), - exclusive, - idle); - - /* Wait for the User doobell ring to take effect */ - if (kbdev->csf.scheduler.state != SCHED_SLEEPING) - msleep(100); + if (!test_bit(group->csg_nr, csg_slots_status_updated)) { + seq_printf(file, "*** Warn: Timed out for STATUS_UPDATE on slot %d\n", + group->csg_nr); + seq_puts(file, "*** The following group-record is likely stale\n"); + } + seq_puts( + file, + "GroupID, CSG NR, CSG Prio, Run State, Priority, C_EP(Alloc/Req), F_EP(Alloc/Req), T_EP(Alloc/Req), Exclusive, Idle\n"); + seq_printf( + file, + "%7d, %6d, %8d, %9d, %8d, %11d/%3d, %11d/%3d, %11d/%3d, %9c, %4c\n", + group->handle, group->csg_nr, slot_priority, group->run_state, + group->priority, CSG_STATUS_EP_CURRENT_COMPUTE_EP_GET(ep_c), + CSG_STATUS_EP_REQ_COMPUTE_EP_GET(ep_r), + CSG_STATUS_EP_CURRENT_FRAGMENT_EP_GET(ep_c), + CSG_STATUS_EP_REQ_FRAGMENT_EP_GET(ep_r), + CSG_STATUS_EP_CURRENT_TILER_EP_GET(ep_c), + CSG_STATUS_EP_REQ_TILER_EP_GET(ep_r), exclusive, idle); + } else { seq_puts(file, "GroupID, CSG NR, Run State, Priority\n"); seq_printf(file, "%7d, %6d, %9d, %8d\n", @@ -404,22 +500,19 @@ static int kbasep_csf_queue_group_debugfs_show(struct seq_file *file, { u32 gr; struct kbase_context *const kctx = file->private; - struct kbase_device *const kbdev = kctx->kbdev; + struct kbase_device *kbdev; if (WARN_ON(!kctx)) return -EINVAL; + kbdev = kctx->kbdev; + seq_printf(file, "MALI_CSF_CSG_DEBUGFS_VERSION: v%u\n", MALI_CSF_CSG_DEBUGFS_VERSION); - mutex_lock(&kctx->csf.lock); + 
rt_mutex_lock(&kctx->csf.lock); kbase_csf_scheduler_lock(kbdev); - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { - /* Wait for the MCU sleep request to complete. Please refer the - * update_active_group_status() function for the explanation. - */ - kbase_pm_wait_for_desired_state(kbdev); - } + kbase_csf_debugfs_update_active_groups_status(kbdev); for (gr = 0; gr < MAX_QUEUE_GROUP_NUM; gr++) { struct kbase_queue_group *const group = kctx->csf.queue_groups[gr]; @@ -428,7 +521,7 @@ static int kbasep_csf_queue_group_debugfs_show(struct seq_file *file, kbasep_csf_scheduler_dump_active_group(file, group); } kbase_csf_scheduler_unlock(kbdev); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); return 0; } @@ -453,12 +546,7 @@ static int kbasep_csf_scheduler_dump_active_groups(struct seq_file *file, MALI_CSF_CSG_DEBUGFS_VERSION); kbase_csf_scheduler_lock(kbdev); - if (kbdev->csf.scheduler.state == SCHED_SLEEPING) { - /* Wait for the MCU sleep request to complete. Please refer the - * update_active_group_status() function for the explanation. - */ - kbase_pm_wait_for_desired_state(kbdev); - } + kbase_csf_debugfs_update_active_groups_status(kbdev); for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { struct kbase_queue_group *const group = kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; @@ -500,11 +588,7 @@ static const struct file_operations kbasep_csf_queue_group_debugfs_fops = { void kbase_csf_queue_group_debugfs_init(struct kbase_context *kctx) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) return; @@ -556,14 +640,11 @@ static int kbasep_csf_debugfs_scheduling_timer_kick_set( return 0; } -DEFINE_SIMPLE_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_enabled_fops, - &kbasep_csf_debugfs_scheduling_timer_enabled_get, - &kbasep_csf_debugfs_scheduling_timer_enabled_set, - "%llu\n"); -DEFINE_SIMPLE_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_kick_fops, - NULL, - &kbasep_csf_debugfs_scheduling_timer_kick_set, - "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_enabled_fops, + &kbasep_csf_debugfs_scheduling_timer_enabled_get, + &kbasep_csf_debugfs_scheduling_timer_enabled_set, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbasep_csf_debugfs_scheduling_timer_kick_fops, NULL, + &kbasep_csf_debugfs_scheduling_timer_kick_set, "%llu\n"); /** * kbase_csf_debugfs_scheduler_state_get() - Get the state of scheduler. @@ -671,7 +752,6 @@ void kbase_csf_debugfs_init(struct kbase_device *kbdev) &kbasep_csf_debugfs_scheduler_state_fops); kbase_csf_tl_reader_debugfs_init(kbdev); - kbase_csf_firmware_trace_buffer_debugfs_init(kbdev); } #else diff --git a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h index 397e657..16a548b 100644 --- a/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h +++ b/mali_kbase/csf/mali_kbase_csf_csg_debugfs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -44,4 +44,11 @@ void kbase_csf_queue_group_debugfs_init(struct kbase_context *kctx); */ void kbase_csf_debugfs_init(struct kbase_device *kbdev); +/** + * kbase_csf_debugfs_update_active_groups_status() - Update on-slot group statuses + * + * @kbdev: Pointer to the device + */ +void kbase_csf_debugfs_update_active_groups_status(struct kbase_device *kbdev); + #endif /* _KBASE_CSF_CSG_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_defs.h b/mali_kbase/csf/mali_kbase_csf_defs.h index 07b5874..fdaa10f 100644 --- a/mali_kbase/csf/mali_kbase_csf_defs.h +++ b/mali_kbase/csf/mali_kbase_csf_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,7 +30,13 @@ #include <linux/wait.h> #include "mali_kbase_csf_firmware.h" +#include "mali_kbase_refcount_defs.h" #include "mali_kbase_csf_event.h" +#include <uapi/gpu/arm/midgard/csf/mali_kbase_csf_errors_dumpfault.h> + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +#include <debug/backend/mali_kbase_debug_coresight_internal_csf.h> +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ /* Maximum number of KCPU command queues to be created per GPU address space. */ @@ -55,7 +61,7 @@ #define CSF_FIRMWARE_ENTRY_ZERO (1ul << 31) /** - * enum kbase_csf_bind_state - bind state of the queue + * enum kbase_csf_queue_bind_state - bind state of the queue * * @KBASE_CSF_QUEUE_UNBOUND: Set when the queue is registered or when the link * between queue and the group to which it was bound or being bound is removed. @@ -259,16 +265,33 @@ enum kbase_queue_group_priority { * @CSF_PM_TIMEOUT: Timeout for GPU Power Management to reach the desired * Shader, L2 and MCU state. * @CSF_GPU_RESET_TIMEOUT: Waiting timeout for GPU reset to complete. + * @CSF_CSG_SUSPEND_TIMEOUT: Timeout given for a CSG to be suspended. + * @CSF_FIRMWARE_BOOT_TIMEOUT: Maximum time to wait for firmware to boot. + * @CSF_FIRMWARE_PING_TIMEOUT: Maximum time to wait for firmware to respond + * to a ping from KBase. + * @CSF_SCHED_PROTM_PROGRESS_TIMEOUT: Timeout used to prevent protected mode execution hang. + * @MMU_AS_INACTIVE_WAIT_TIMEOUT: Maximum waiting time in ms for the completion + * of a MMU operation. + * @KCPU_FENCE_SIGNAL_TIMEOUT: Waiting time in ms for triggering a KCPU queue sync state dump * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. Must be last in * the enum. + * @KBASE_DEFAULT_TIMEOUT: Default timeout used when an invalid selector is passed + * to the pre-computed timeout getter. */ enum kbase_timeout_selector { CSF_FIRMWARE_TIMEOUT, CSF_PM_TIMEOUT, CSF_GPU_RESET_TIMEOUT, + CSF_CSG_SUSPEND_TIMEOUT, + CSF_FIRMWARE_BOOT_TIMEOUT, + CSF_FIRMWARE_PING_TIMEOUT, + CSF_SCHED_PROTM_PROGRESS_TIMEOUT, + MMU_AS_INACTIVE_WAIT_TIMEOUT, + KCPU_FENCE_SIGNAL_TIMEOUT, /* Must be the last in the enum */ - KBASE_TIMEOUT_SELECTOR_COUNT + KBASE_TIMEOUT_SELECTOR_COUNT, + KBASE_DEFAULT_TIMEOUT = CSF_FIRMWARE_TIMEOUT }; /** @@ -288,9 +311,9 @@ struct kbase_csf_notification { * * @kctx: Pointer to the base context with which this GPU command queue * is associated. 
- * @reg: Pointer to the region allocated from the shared - * interface segment for mapping the User mode - * input/output pages in MCU firmware address space. + * @user_io_gpu_va: The start GPU VA address of this queue's userio pages. Only + * valid (i.e. not 0 ) when the queue is enabled and its owner + * group has a runtime bound csg_reg (group region). * @phys: Pointer to the physical pages allocated for the * pair or User mode input/output page * @user_io_addr: Pointer to the permanent kernel mapping of User mode @@ -306,6 +329,14 @@ struct kbase_csf_notification { * It is in page units. * @link: Link to the linked list of GPU command queues created per * GPU address space. + * @pending_kick: Indicates whether there is a pending kick to be handled. + * @pending_kick_link: Link to the linked list of GPU command queues that have + * been kicked, but the kick has not yet been processed. + * This link would be deleted right before the kick is + * handled to allow for future kicks to occur in the mean + * time. For this reason, this must not be used to check + * for the presence of a pending queue kick. @pending_kick + * should be used instead. * @refcount: Reference count, stands for the number of times the queue * has been referenced. The reference is taken when it is * created, when it is bound to the group and also when the @@ -318,6 +349,7 @@ struct kbase_csf_notification { * @base_addr: Base address of the CS buffer. * @size: Size of the CS buffer. * @priority: Priority of this queue within the group. + * @group_priority: Priority of the group to which this queue has been bound. * @bind_state: Bind state of the queue as enum @kbase_csf_queue_bind_state * @csi_index: The ID of the assigned CS hardware interface. * @enabled: Indicating whether the CS is running, or not. @@ -345,15 +377,18 @@ struct kbase_csf_notification { * @trace_offset_ptr: Pointer to the CS trace buffer offset variable. * @trace_buffer_size: CS trace buffer size for the queue. * @trace_cfg: CS trace configuration parameters. - * @error: GPU command queue fatal information to pass to user space. - * @fatal_event_work: Work item to handle the CS fatal event reported for this - * queue. - * @cs_fatal_info: Records additional information about the CS fatal event. - * @cs_fatal: Records information about the CS fatal event. - * @pending: Indicating whether the queue has new submitted work. - * @extract_ofs: The current EXTRACT offset, this is updated during certain - * events such as GPU idle IRQ in order to help detect a - * queue's true idle status. + * @cs_error_work: Work item to handle the CS fatal event reported for this + * queue or the CS fault event if dump on fault is enabled + * and acknowledgment for CS fault event needs to be done + * after dumping is complete. + * @cs_error_info: Records additional information about the CS fatal event or + * about CS fault event if dump on fault is enabled. + * @cs_error: Records information about the CS fatal event or + * about CS fault event if dump on fault is enabled. + * @cs_error_fatal: Flag to track if the CS fault or CS fatal event occurred. + * @extract_ofs: The current EXTRACT offset, this is only updated when handling + * the GLB IDLE IRQ if the idle timeout value is non-0 in order + * to help detect a queue's true idle status. * @saved_cmd_ptr: The command pointer value for the GPU queue, saved when the * group to which queue is bound is suspended. 
* This can be useful in certain cases to know that till which @@ -361,20 +396,23 @@ struct kbase_csf_notification { */ struct kbase_queue { struct kbase_context *kctx; - struct kbase_va_region *reg; + u64 user_io_gpu_va; struct tagged_addr phys[2]; - char *user_io_addr; + u64 *user_io_addr; u64 handle; int doorbell_nr; unsigned long db_file_offset; struct list_head link; - atomic_t refcount; + atomic_t pending_kick; + struct list_head pending_kick_link; + kbase_refcount_t refcount; struct kbase_queue_group *group; struct kbase_va_region *queue_reg; struct work_struct oom_event_work; u64 base_addr; u32 size; u8 priority; + u8 group_priority; s8 csi_index; enum kbase_csf_queue_bind_state bind_state; bool enabled; @@ -387,40 +425,46 @@ struct kbase_queue { u64 trace_offset_ptr; u32 trace_buffer_size; u32 trace_cfg; - struct kbase_csf_notification error; - struct work_struct fatal_event_work; - u64 cs_fatal_info; - u32 cs_fatal; - atomic_t pending; + struct work_struct cs_error_work; + u64 cs_error_info; + u32 cs_error; + bool cs_error_fatal; u64 extract_ofs; #if IS_ENABLED(CONFIG_DEBUG_FS) u64 saved_cmd_ptr; -#endif +#endif /* CONFIG_DEBUG_FS */ }; /** * struct kbase_normal_suspend_buffer - Object representing a normal * suspend buffer for queue group. - * @reg: Memory region allocated for the normal-mode suspend buffer. + * @gpu_va: The start GPU VA address of the bound suspend buffer. Note, this + * field is only valid when the owner group has a region bound at + * runtime. * @phy: Array of physical memory pages allocated for the normal- * mode suspend buffer. */ struct kbase_normal_suspend_buffer { - struct kbase_va_region *reg; + u64 gpu_va; struct tagged_addr *phy; }; /** * struct kbase_protected_suspend_buffer - Object representing a protected * suspend buffer for queue group. - * @reg: Memory region allocated for the protected-mode suspend buffer. + * @gpu_va: The start GPU VA address of the bound protected mode suspend buffer. + * Note, this field is only valid when the owner group has a region + * bound at runtime. * @pma: Array of pointer to protected mode allocations containing * information about memory pages allocated for protected mode * suspend buffer. + * @alloc_retries: Number of times we retried allocing physical pages + * for protected suspend buffers. */ struct kbase_protected_suspend_buffer { - struct kbase_va_region *reg; + u64 gpu_va; struct protected_memory_allocation **pma; + u8 alloc_retries; }; /** @@ -446,6 +490,7 @@ struct kbase_protected_suspend_buffer { * allowed to use. * @compute_max: Maximum number of compute endpoints the group is * allowed to use. + * @csi_handlers: Requested CSI exception handler flags for the group. * @tiler_mask: Mask of tiler endpoints the group is allowed to use. * @fragment_mask: Mask of fragment endpoints the group is allowed to use. * @compute_mask: Mask of compute endpoints the group is allowed to use. @@ -467,6 +512,12 @@ struct kbase_protected_suspend_buffer { * @faulted: Indicates that a GPU fault occurred for the queue group. * This flag persists until the fault has been queued to be * reported to userspace. + * @cs_unrecoverable: Flag to unblock the thread waiting for CSG termination in + * case of CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE + * @reevaluate_idle_status : Flag set when work is submitted for the normal group + * or it becomes unblocked during protected mode. The + * flag helps Scheduler confirm if the group actually + * became non idle or not. * @bound_queues: Array of registered queues bound to this queue group. 
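The @pending_kick flag and @pending_kick_link added to struct kbase_queue above follow a pattern worth spelling out: a kicked queue goes onto a device-wide list, but its link is detached before the kick is processed, so only the flag can be trusted to answer "is a kick outstanding". A single-threaded sketch of that bookkeeping (the real driver serialises it with a spinlock and an atomic; every name here is illustrative):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct gpu_queue {
	const char *name;
	bool pending_kick;           /* source of truth for "kick outstanding" */
	struct gpu_queue *kick_next; /* link on the kicked list, detached early */
};

static struct gpu_queue *kick_list_head;

static void queue_kick(struct gpu_queue *q)
{
	if (q->pending_kick)
		return;              /* already queued, nothing to do */
	q->pending_kick = true;
	q->kick_next = kick_list_head;
	kick_list_head = q;
}

static void process_pending_kicks(void)
{
	while (kick_list_head) {
		struct gpu_queue *q = kick_list_head;

		/* Detach *before* handling so a new kick can be queued in the
		 * meantime; this is why list membership must not be used to
		 * test whether a kick is pending.
		 */
		kick_list_head = q->kick_next;
		q->kick_next = NULL;

		printf("handling kick for %s\n", q->name);
		q->pending_kick = false;
	}
}

int main(void)
{
	struct gpu_queue a = { .name = "q0" }, b = { .name = "q1" };

	queue_kick(&a);
	queue_kick(&b);
	queue_kick(&a);              /* ignored: still pending */
	process_pending_kicks();
	return 0;
}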
* @doorbell_nr: Index of the hardware doorbell page assigned to the * group. @@ -476,12 +527,18 @@ struct kbase_protected_suspend_buffer { * have pending protected mode entry requests. * @error_fatal: An error of type BASE_GPU_QUEUE_GROUP_ERROR_FATAL to be * returned to userspace if such an error has occurred. - * @error_timeout: An error of type BASE_GPU_QUEUE_GROUP_ERROR_TIMEOUT - * to be returned to userspace if such an error has occurred. - * @error_tiler_oom: An error of type BASE_GPU_QUEUE_GROUP_ERROR_TILER_HEAP_OOM - * to be returned to userspace if such an error has occurred. * @timer_event_work: Work item to handle the progress timeout fatal event * for the group. + * @deschedule_deferred_cnt: Counter keeping a track of the number of threads + * that tried to deschedule the group and had to defer + * the descheduling due to the dump on fault. + * @csg_reg: An opaque pointer to the runtime bound shared regions. It is + * dynamically managed by the scheduler and can be NULL if the + * group is off-slot. + * @csg_reg_bind_retries: Runtime MCU shared region map operation attempted counts. + * It is accumulated on consecutive mapping attempt failures. On + * reaching a preset limit, the group is regarded as suffered + * a fatal error and triggers a fatal error notification. */ struct kbase_queue_group { struct kbase_context *kctx; @@ -494,6 +551,8 @@ struct kbase_queue_group { u8 tiler_max; u8 fragment_max; u8 compute_max; + u8 csi_handlers; + u64 tiler_mask; u64 fragment_mask; @@ -507,19 +566,36 @@ struct kbase_queue_group { u32 prepared_seq_num; u32 scan_seq_num; bool faulted; + bool cs_unrecoverable; + bool reevaluate_idle_status; struct kbase_queue *bound_queues[MAX_SUPPORTED_STREAMS_PER_GROUP]; int doorbell_nr; - struct work_struct protm_event_work; + struct kthread_work protm_event_work; DECLARE_BITMAP(protm_pending_bitmap, MAX_SUPPORTED_STREAMS_PER_GROUP); struct kbase_csf_notification error_fatal; - struct kbase_csf_notification error_timeout; - struct kbase_csf_notification error_tiler_oom; struct work_struct timer_event_work; + /** + * @dvs_buf: Address and size of scratch memory. + * + * Used to store intermediate DVS data by the GPU. + */ + u64 dvs_buf; +#if IS_ENABLED(CONFIG_DEBUG_FS) + u32 deschedule_deferred_cnt; +#endif + void *csg_reg; + u8 csg_reg_bind_retries; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @prev_act: Previous CSG activity transition in a GPU metrics. + */ + bool prev_act; +#endif }; /** @@ -529,10 +605,10 @@ struct kbase_queue_group { * @lock: Lock preventing concurrent access to @array and the @in_use bitmap. * @array: Array of pointers to kernel CPU command queues. * @in_use: Bitmap which indicates which kernel CPU command queues are in use. - * @wq: Dedicated workqueue for processing kernel CPU command queues. - * @num_cmds: The number of commands that have been enqueued across - * all the KCPU command queues. This could be used as a - * timestamp to determine the command's enqueueing time. + * @cmd_seq_num: The sequence number assigned to an enqueued command, + * in incrementing order (older commands shall have a + * smaller number). + * @jit_lock: Lock to serialise JIT operations. * @jit_cmds_head: A list of the just-in-time memory commands, both * allocate & free, in submission order, protected * by kbase_csf_kcpu_queue_context.lock. 
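The @cmd_seq_num counter described above replaces the old running command count: every enqueued KCPU command is stamped with a monotonically increasing number so relative age can be compared without a timestamp. A hedged sketch with C11 atomics; the structure and function names are invented for illustration:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct kcpu_ctx {
	atomic_uint_fast64_t cmd_seq_num; /* next sequence number to hand out */
};

struct kcpu_cmd {
	uint64_t seq; /* an older command always carries a smaller value */
};

static void enqueue_cmd(struct kcpu_ctx *ctx, struct kcpu_cmd *cmd)
{
	cmd->seq = atomic_fetch_add(&ctx->cmd_seq_num, 1);
}

int main(void)
{
	struct kcpu_ctx ctx;
	struct kcpu_cmd first, second;

	atomic_init(&ctx.cmd_seq_num, 0);
	enqueue_cmd(&ctx, &first);
	enqueue_cmd(&ctx, &second);
	printf("first=%llu second=%llu\n",
	       (unsigned long long)first.seq,
	       (unsigned long long)second.seq);
	return 0;
}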
@@ -545,9 +621,9 @@ struct kbase_csf_kcpu_queue_context { struct mutex lock; struct kbase_kcpu_command_queue *array[KBASEP_MAX_KCPU_QUEUES]; DECLARE_BITMAP(in_use, KBASEP_MAX_KCPU_QUEUES); - struct workqueue_struct *wq; - u64 num_cmds; + atomic64_t cmd_seq_num; + struct mutex jit_lock; struct list_head jit_cmds_head; struct list_head jit_blocked_queues; }; @@ -581,6 +657,8 @@ struct kbase_csf_cpu_queue_context { * @lock: Lock preventing concurrent access to the @in_use bitmap. * @in_use: Bitmap that indicates which heap context structures are currently * allocated (in @region). + * @heap_context_size_aligned: Size of a heap context structure, in bytes, + * aligned to GPU cacheline size. * * Heap context structures are allocated by the kernel for use by the firmware. * The current implementation subdivides a single GPU memory region for use as @@ -592,6 +670,7 @@ struct kbase_csf_heap_context_allocator { u64 gpu_va; struct mutex lock; DECLARE_BITMAP(in_use, MAX_TILER_HEAPS); + u32 heap_context_size_aligned; }; /** @@ -618,6 +697,28 @@ struct kbase_csf_tiler_heap_context { }; /** + * struct kbase_csf_ctx_heap_reclaim_info - Object representing the data section of + * a kctx for tiler heap reclaim manger + * @mgr_link: Link for hooking up to the heap reclaim manger's kctx lists + * @nr_freed_pages: Number of freed pages from the the kctx, after its attachment + * to the reclaim manager. This is used for tracking reclaim's + * free operation progress. + * @nr_est_unused_pages: Estimated number of pages that could be freed for the kctx + * when all its CSGs are off-slot, on attaching to the reclaim + * manager. + * @on_slot_grps: Number of on-slot groups from this kctx. In principle, if a + * kctx has groups on-slot, the scheduler will detach it from + * the tiler heap reclaim manager, i.e. no tiler heap memory + * reclaiming operations on the kctx. + */ +struct kbase_csf_ctx_heap_reclaim_info { + struct list_head mgr_link; + u32 nr_freed_pages; + u32 nr_est_unused_pages; + u8 on_slot_grps; +}; + +/** * struct kbase_csf_scheduler_context - Object representing the scheduler's * context for a GPU address space. * @@ -629,7 +730,7 @@ struct kbase_csf_tiler_heap_context { * GPU command queues are idle and at least one of them * is blocked on a sync wait operation. * @num_idle_wait_grps: Length of the @idle_wait_groups list. - * @sync_update_wq: Dedicated workqueue to process work items corresponding + * @sync_update_worker: Dedicated workqueue to process work items corresponding * to the sync_update events by sync_set/sync_add * instruction execution on CSs bound to groups * of @idle_wait_groups list. @@ -638,15 +739,20 @@ struct kbase_csf_tiler_heap_context { * streams bound to groups of @idle_wait_groups list. * @ngrp_to_schedule: Number of groups added for the context to the * 'groups_to_schedule' list of scheduler instance. + * @heap_info: Heap reclaim information data of the kctx. As the + * reclaim action needs to be coordinated with the scheduler + * operations, any manipulations on the data needs holding + * the scheduler's mutex lock. 
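The @heap_context_size_aligned field added to the heap context allocator earlier in this hunk caches the per-context stride (the context size rounded up to the GPU cache line) used when sub-dividing the single backing region into slots. A small sketch of that slot arithmetic, assuming a 64-byte cache line and a 32-bit in-use bitmap purely for illustration:

#include <stdint.h>
#include <stdio.h>

#define GPU_CACHELINE_SIZE 64u   /* assumption for illustration only */
#define MAX_TILER_HEAPS    32u

struct heap_ctx_allocator {
	uint64_t region_gpu_va;    /* start of the backing region */
	uint32_t ctx_size_aligned; /* per-context stride */
	uint32_t in_use;           /* one bit per heap context slot */
};

static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static uint64_t heap_ctx_alloc(struct heap_ctx_allocator *hca)
{
	for (uint32_t idx = 0; idx < MAX_TILER_HEAPS; idx++) {
		if (!(hca->in_use & (1u << idx))) {
			hca->in_use |= 1u << idx;
			return hca->region_gpu_va +
			       (uint64_t)idx * hca->ctx_size_aligned;
		}
	}
	return 0; /* no free slot */
}

int main(void)
{
	struct heap_ctx_allocator hca = {
		.region_gpu_va = 0x100000,
		.ctx_size_aligned = align_up(40, GPU_CACHELINE_SIZE),
	};

	printf("stride=%u first=0x%llx second=0x%llx\n",
	       (unsigned)hca.ctx_size_aligned,
	       (unsigned long long)heap_ctx_alloc(&hca),
	       (unsigned long long)heap_ctx_alloc(&hca));
	return 0;
}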
*/ struct kbase_csf_scheduler_context { struct list_head runnable_groups[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; u32 num_runnable_grps; struct list_head idle_wait_groups; u32 num_idle_wait_grps; - struct workqueue_struct *sync_update_wq; - struct work_struct sync_update_work; + struct kthread_worker sync_update_worker; + struct kthread_work sync_update_work; u32 ngrp_to_schedule; + struct kbase_csf_ctx_heap_reclaim_info heap_info; }; /** @@ -687,6 +793,23 @@ struct kbase_csf_event { }; /** + * struct kbase_csf_user_reg_context - Object containing members to manage the mapping + * of USER Register page for a context. + * + * @vma: Pointer to the VMA corresponding to the virtual mapping + * of the USER register page. + * @file_offset: File offset value that is assigned to userspace mapping + * of the USER Register page. It is in page units. + * @link: Links the context to the device list when mapping is pointing to + * either the dummy or the real Register page. + */ +struct kbase_csf_user_reg_context { + struct vm_area_struct *vma; + u32 file_offset; + struct list_head link; +}; + +/** * struct kbase_csf_context - Object representing CSF for a GPU address space. * * @event_pages_head: A list of pages allocated for the event memory used by @@ -724,20 +847,18 @@ struct kbase_csf_event { * used by GPU command queues, and progress timeout events. * @link: Link to this csf context in the 'runnable_kctxs' list of * the scheduler instance - * @user_reg_vma: Pointer to the vma corresponding to the virtual mapping - * of the USER register page. Currently used only for sanity - * checking. * @sched: Object representing the scheduler's context - * @pending_submission_work: Work item to process pending kicked GPU command queues. + * @protm_event_worker: Worker to process requests to enter protected mode. * @cpu_queue: CPU queue information. Only be available when DEBUG_FS * is enabled. + * @user_reg: Collective information to support mapping to USER Register page. */ struct kbase_csf_context { struct list_head event_pages_head; DECLARE_BITMAP(cookies, KBASE_CSF_NUM_USER_IO_PAGES_HANDLE); struct kbase_queue *user_pages_info[ KBASE_CSF_NUM_USER_IO_PAGES_HANDLE]; - struct mutex lock; + struct rt_mutex lock; struct kbase_queue_group *queue_groups[MAX_QUEUE_GROUP_NUM]; struct list_head queue_list; struct kbase_csf_kcpu_queue_context kcpu_queues; @@ -745,12 +866,12 @@ struct kbase_csf_context { struct kbase_csf_tiler_heap_context tiler_heaps; struct workqueue_struct *wq; struct list_head link; - struct vm_area_struct *user_reg_vma; struct kbase_csf_scheduler_context sched; - struct work_struct pending_submission_work; + struct kthread_worker protm_event_worker; #if IS_ENABLED(CONFIG_DEBUG_FS) struct kbase_csf_cpu_queue_context cpu_queue; #endif + struct kbase_csf_user_reg_context user_reg; }; /** @@ -765,6 +886,7 @@ struct kbase_csf_context { * mechanism to check for deadlocks involving reset waits. * @state: Tracks if the GPU reset is in progress or not. * The state is represented by enum @kbase_csf_reset_gpu_state. + * @force_pm_hw_reset: pixel: Powercycle the GPU instead of attempting a soft/hard reset. */ struct kbase_csf_reset_gpu { struct workqueue_struct *workq; @@ -772,6 +894,7 @@ struct kbase_csf_reset_gpu { wait_queue_head_t wait; struct rw_semaphore sem; atomic_t state; + bool force_pm_hw_reset; }; /** @@ -790,6 +913,49 @@ struct kbase_csf_csg_slot { }; /** + * struct kbase_csf_sched_heap_reclaim_mgr - Object for managing tiler heap reclaim + * kctx lists inside the CSF device's scheduler. 
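struct kbase_csf_ctx_heap_reclaim_info from the previous hunk and the reclaim manager documented here feed a memory shrinker: each context estimates how many tiler heap pages it could free once all of its groups are off-slot, and the manager keeps the aggregate count the shrinker would report. A condensed model of that accounting only (no shrinker registration, invented names):

#include <stdio.h>

struct ctx_heap_reclaim_info {
	unsigned int nr_est_unused_pages; /* estimate while all CSGs are off-slot */
	unsigned int on_slot_grps;        /* non-zero: not eligible for reclaim */
};

struct heap_reclaim_mgr {
	unsigned long unused_pages;       /* what the shrinker's count() reports */
};

static void mgr_attach_ctx(struct heap_reclaim_mgr *mgr,
			   const struct ctx_heap_reclaim_info *info)
{
	/* Only contexts with every group off-slot contribute to the estimate. */
	if (info->on_slot_grps == 0)
		mgr->unused_pages += info->nr_est_unused_pages;
}

int main(void)
{
	struct heap_reclaim_mgr mgr = { 0 };
	struct ctx_heap_reclaim_info idle_ctx = { .nr_est_unused_pages = 64 };
	struct ctx_heap_reclaim_info busy_ctx = { .nr_est_unused_pages = 32,
						  .on_slot_grps = 1 };

	mgr_attach_ctx(&mgr, &idle_ctx);
	mgr_attach_ctx(&mgr, &busy_ctx); /* ignored: still has on-slot groups */
	printf("reclaimable pages: %lu\n", mgr.unused_pages);
	return 0;
}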
+ * + * @heap_reclaim: Tiler heap reclaim shrinker object. + * @ctx_lists: Array of kctx lists, size matching CSG defined priorities. The + * lists track the kctxs attached to the reclaim manager. + * @unused_pages: Estimated number of unused pages from the @ctxlist array. The + * number is indicative for use with reclaim shrinker's count method. + */ +struct kbase_csf_sched_heap_reclaim_mgr { + struct shrinker heap_reclaim; + struct list_head ctx_lists[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; + atomic_t unused_pages; +}; + +/** + * struct kbase_csf_mcu_shared_regions - Control data for managing the MCU shared + * interface segment regions for scheduler + * operations + * + * @array_csg_regs: Base pointer of an internally created array_csg_regs[]. + * @unused_csg_regs: List contains unused csg_regs items. When an item is bound to a + * group that is placed onto on-slot by the scheduler, it is dropped + * from the list (i.e busy active). The Scheduler will put an active + * item back when it's becoming off-slot (not in use). + * @dummy_phys: An array of dummy phys[nr_susp_pages] pages for use with normal + * and pmode suspend buffers, as a default replacement of a CSG's pages + * for the MMU mapping when the csg_reg is not bound to a group. + * @pma_phys: Pre-allocated array phy[nr_susp_pages] for transitional use with + * protected suspend buffer MMU map operations. + * @userio_mem_rd_flags: Userio input page's read access mapping configuration flags. + * @dummy_phys_allocated: Indicating the @p dummy_phy page is allocated when true. + */ +struct kbase_csf_mcu_shared_regions { + void *array_csg_regs; + struct list_head unused_csg_regs; + struct tagged_addr *dummy_phys; + struct tagged_addr *pma_phys; + unsigned long userio_mem_rd_flags; + bool dummy_phys_allocated; +}; + +/** * struct kbase_csf_scheduler - Object representing the scheduler used for * CSF for an instance of GPU platform device. * @lock: Lock to serialize the scheduler operations and @@ -848,19 +1014,19 @@ struct kbase_csf_csg_slot { * "tock" schedule operation concluded. Used for * evaluating the exclusion window for in-cycle * schedule operation. + * @csf_worker: Dedicated kthread_worker to execute the @tick_work. * @timer_enabled: Whether the CSF scheduler wakes itself up for * periodic scheduling tasks. If this value is 0 * then it will only perform scheduling under the * influence of external factors e.g., IRQs, IOCTLs. - * @wq: Dedicated workqueue to execute the @tick_work. * @tick_timer: High-resolution timer employed to schedule tick * workqueue items (kernel-provided delayed_work * items do not use hrtimer and for some reason do * not provide sufficiently reliable periodicity). - * @tick_work: Work item that performs the "schedule on tick" - * operation to implement timeslice-based scheduling. - * @tock_work: Work item that would perform the schedule on tock - * operation to implement the asynchronous scheduling. + * @pending_tick_work: Indicates that kbase_csf_scheduler_kthread() should perform + * a scheduling tick. + * @pending_tock_work: Indicates that kbase_csf_scheduler_kthread() should perform + * a scheduling tock. * @ping_work: Work item that would ping the firmware at regular * intervals, only if there is a single active CSG * slot, to check if firmware is alive and would @@ -870,8 +1036,6 @@ struct kbase_csf_csg_slot { * @top_grp. * @top_grp: Pointer to queue group inside @groups_to_schedule * list that was assigned the highest slot priority. 
- * @tock_pending_request: A "tock" request is pending: a group that is not - * currently on the GPU demands to be scheduled. * @active_protm_grp: Indicates if firmware has been permitted to let GPU * enter protected mode with the given group. On exit * from protected mode the pointer is reset to NULL. @@ -884,6 +1048,13 @@ struct kbase_csf_csg_slot { * handler. * @gpu_idle_work: Work item for facilitating the scheduler to bring * the GPU to a low-power mode on becoming idle. + * @fast_gpu_idle_handling: Indicates whether to relax many of the checks + * normally done in the GPU idle worker. This is + * set to true when handling the GLB IDLE IRQ if the + * idle hysteresis timeout is 0, since it makes it + * possible to receive this IRQ before the extract + * offset is published (which would cause more + * extensive GPU idle checks to fail). * @gpu_no_longer_idle: Effective only when the GPU idle worker has been * queued for execution, this indicates whether the * GPU has become non-idle since the last time the @@ -901,22 +1072,41 @@ struct kbase_csf_csg_slot { * after GPU and L2 cache have been powered up. So when * this count is zero, MCU will not be powered up. * @csg_scheduling_period_ms: Duration of Scheduling tick in milliseconds. - * @tick_timer_active: Indicates whether the @tick_timer is effectively - * active or not, as the callback function of - * @tick_timer will enqueue @tick_work only if this - * flag is true. This is mainly useful for the case - * when scheduling tick needs to be advanced from - * interrupt context, without actually deactivating - * the @tick_timer first and then enqueing @tick_work. * @tick_protm_pending_seq: Scan out sequence number of the group that has * protected mode execution pending for the queue(s) * bound to it and will be considered first for the * protected mode execution compared to other such * groups. It is updated on every tick/tock. * @interrupt_lock is used to serialize the access. + * @sc_rails_off_work: Work item enqueued on GPU idle notification to + * turn off the shader core power rails. + * @sc_power_rails_off: Flag to keep a track of the status of shader core + * power rails, set to true when power rails are + * turned off. + * @gpu_idle_work_pending: Flag to indicate that the power down of GPU is + * pending and it is set after turning off the + * shader core power rails. The power down is skipped + * if the flag is cleared. @lock is used to serialize + * the access. Scheduling actions are skipped whilst + * this flag is set. + * @gpu_idle_fw_timer_enabled: Flag to keep a track if GPU idle event reporting + * is disabled on FW side. It is set for the power + * policy where the power managment of shader cores + * needs to be done by the Host. + * @protm_enter_time: GPU protected mode enter time. + * @reclaim_mgr: CSGs tiler heap manager object. + * @mcu_regs_data: Scheduler MCU shared regions data for managing the + * shared interface mappings for on-slot queues and + * CSG suspend buffers. + * @kthread_signal: Used to wake up the GPU queue submission + * thread when a queue needs attention. + * @kthread_running: Whether the GPU queue submission thread should keep + * executing. + * @gpuq_kthread: High-priority thread used to handle GPU queue + * submissions. 
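The scheduler fields listed above trade per-event work items for one long-lived thread: @pending_tick_work and @pending_tock_work are flags the thread polls, @kthread_signal wakes it, and @kthread_running tells it when to stop. A pthread-based approximation of that loop, using a condition variable where the kernel code uses a completion; all identifiers are illustrative:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t sig_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sig_cond = PTHREAD_COND_INITIALIZER;
static atomic_bool pending_tick;
static atomic_bool pending_tock;
static atomic_bool kthread_running;

static void scheduler_wake(void)
{
	pthread_mutex_lock(&sig_lock);
	pthread_cond_signal(&sig_cond);
	pthread_mutex_unlock(&sig_lock);
}

static void *scheduler_kthread(void *arg)
{
	(void)arg;
	while (atomic_load(&kthread_running)) {
		pthread_mutex_lock(&sig_lock);
		while (atomic_load(&kthread_running) &&
		       !atomic_load(&pending_tick) &&
		       !atomic_load(&pending_tock))
			pthread_cond_wait(&sig_cond, &sig_lock);
		pthread_mutex_unlock(&sig_lock);

		if (atomic_exchange(&pending_tick, false))
			puts("scheduling tick");
		if (atomic_exchange(&pending_tock, false))
			puts("scheduling tock");
	}
	return NULL;
}

int main(void)
{
	pthread_t th;

	atomic_store(&kthread_running, true);
	pthread_create(&th, NULL, scheduler_kthread, NULL);

	atomic_store(&pending_tick, true);
	scheduler_wake();

	atomic_store(&kthread_running, false);
	scheduler_wake();
	pthread_join(th, NULL);
	return 0;
}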
*/ struct kbase_csf_scheduler { - struct mutex lock; + struct rt_mutex lock; spinlock_t interrupt_lock; enum kbase_csf_scheduler_state state; DECLARE_BITMAP(doorbell_inuse_bitmap, CSF_NUM_DOORBELL); @@ -935,25 +1125,46 @@ struct kbase_csf_scheduler { DECLARE_BITMAP(csg_slots_idle_mask, MAX_SUPPORTED_CSGS); DECLARE_BITMAP(csg_slots_prio_update, MAX_SUPPORTED_CSGS); unsigned long last_schedule; - bool timer_enabled; - struct workqueue_struct *wq; + struct kthread_worker csf_worker; + atomic_t timer_enabled; struct hrtimer tick_timer; - struct work_struct tick_work; - struct delayed_work tock_work; + atomic_t pending_tick_work; + atomic_t pending_tock_work; struct delayed_work ping_work; struct kbase_context *top_ctx; struct kbase_queue_group *top_grp; - bool tock_pending_request; struct kbase_queue_group *active_protm_grp; - struct workqueue_struct *idle_wq; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct delayed_work gpu_idle_work; +#else struct work_struct gpu_idle_work; +#endif + struct workqueue_struct *idle_wq; + bool fast_gpu_idle_handling; atomic_t gpu_no_longer_idle; atomic_t non_idle_offslot_grps; u32 non_idle_scanout_grps; u32 pm_active_count; unsigned int csg_scheduling_period_ms; - bool tick_timer_active; u32 tick_protm_pending_seq; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct work_struct sc_rails_off_work; + bool sc_power_rails_off; + bool gpu_idle_work_pending; + bool gpu_idle_fw_timer_enabled; +#endif + ktime_t protm_enter_time; + struct kbase_csf_sched_heap_reclaim_mgr reclaim_mgr; + struct kbase_csf_mcu_shared_regions mcu_regs_data; + struct completion kthread_signal; + bool kthread_running; + struct task_struct *gpuq_kthread; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_tb: Handler of firmware trace buffer for gpu_metrics + */ + struct firmware_trace_buffer *gpu_metrics_tb; +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ }; /* @@ -970,9 +1181,9 @@ struct kbase_csf_scheduler { GLB_PROGRESS_TIMER_TIMEOUT_SCALE) /* - * Default GLB_PWROFF_TIMER_TIMEOUT value in unit of micro-seconds. + * Default GLB_PWROFF_TIMER_TIMEOUT value in unit of nanosecond. */ -#define DEFAULT_GLB_PWROFF_TIMEOUT_US (800) +#define DEFAULT_GLB_PWROFF_TIMEOUT_NS (800 * 1000) /* * In typical operations, the management of the shader core power transitions @@ -1140,6 +1351,7 @@ struct kbase_ipa_control { * @flags: bitmask of CSF_FIRMWARE_ENTRY_* conveying the interface attributes * @data_start: Offset into firmware image at which the interface data starts * @data_end: Offset into firmware image at which the interface data ends + * @virtual_exe_start: Starting GPU execution virtual address of this interface * @kernel_map: A kernel mapping of the memory or NULL if not required to be * mapped in the kernel * @pma: Array of pointers to protected memory allocations. @@ -1156,6 +1368,7 @@ struct kbase_csf_firmware_interface { u32 flags; u32 data_start; u32 data_end; + u32 virtual_exe_start; void *kernel_map; struct protected_memory_allocation **pma; }; @@ -1174,6 +1387,144 @@ struct kbase_csf_hwcnt { bool enable_pending; }; +/* + * struct kbase_csf_mcu_fw - Object containing device loaded MCU firmware data. + * + * @size: Loaded firmware data size. Meaningful only when the + * other field @p data is not NULL. + * @data: Pointer to the device retained firmware data. If NULL + * means not loaded yet or error in loading stage. + */ +struct kbase_csf_mcu_fw { + size_t size; + u8 *data; +}; + +/* + * Firmware log polling period. 
+ */ +#define KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT 25 + +/** + * enum kbase_csf_firmware_log_mode - Firmware log operating mode + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL: Manual mode, firmware log can be read + * manually by the userspace (and it will also be dumped automatically into + * dmesg on GPU reset). + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT: Automatic printing mode, firmware log + * will be periodically emptied into dmesg, manual reading through debugfs is + * disabled. + * + * @KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD: Automatic discarding mode, firmware + * log will be periodically discarded, the remaining log can be read manually by + * the userspace (and it will also be dumped automatically into dmesg on GPU + * reset). + */ +enum kbase_csf_firmware_log_mode { + KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL, + KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT, + KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD +}; + +/** + * struct kbase_csf_firmware_log - Object containing members for handling firmware log. + * + * @mode: Firmware log operating mode. + * @busy: Indicating whether a firmware log operation is in progress. + * @poll_work: Work item that would poll firmware log buffer + * at regular intervals to perform any periodic + * activities required by current log mode. + * @dump_buf: Buffer used for dumping the log. + * @func_call_list_va_start: Virtual address of the start of the call list of FW log functions. + * @func_call_list_va_end: Virtual address of the end of the call list of FW log functions. + * @poll_period_ms: Firmware log polling period in milliseconds. + */ +struct kbase_csf_firmware_log { + enum kbase_csf_firmware_log_mode mode; + atomic_t busy; + struct delayed_work poll_work; + u8 *dump_buf; + u32 func_call_list_va_start; + u32 func_call_list_va_end; + atomic_t poll_period_ms; +}; + +/** + * struct kbase_csf_firmware_core_dump - Object containing members for handling + * firmware core dump. + * + * @mcu_regs_addr: GPU virtual address of the start of the MCU registers buffer + * in Firmware. + * @version: Version of the FW image header core dump data format. Bits + * 7:0 specify version minor and 15:8 specify version major. + * @available: Flag to identify if the FW core dump buffer is available. + * True if entry is available in the FW image header and version + * is supported, False otherwise. + */ +struct kbase_csf_firmware_core_dump { + u32 mcu_regs_addr; + u16 version; + bool available; +}; + +#if IS_ENABLED(CONFIG_DEBUG_FS) +/** + * struct kbase_csf_dump_on_fault - Faulty information to deliver to the daemon + * + * @error_code: Error code. + * @kctx_tgid: tgid value of the Kbase context for which the fault happened. + * @kctx_id: id of the Kbase context for which the fault happened. + * @enabled: Flag to indicate that 'csf_fault' debugfs has been opened + * so dump on fault is enabled. + * @fault_wait_wq: Waitqueue on which user space client is blocked till kbase + * reports a fault. + * @dump_wait_wq: Waitqueue on which kbase threads are blocked till user space client + * completes the dump on fault. + * @lock: Lock to protect this struct members from concurrent access. 
+ */ +struct kbase_csf_dump_on_fault { + enum dumpfault_error_type error_code; + u32 kctx_tgid; + u32 kctx_id; + atomic_t enabled; + wait_queue_head_t fault_wait_wq; + wait_queue_head_t dump_wait_wq; + spinlock_t lock; +}; +#endif /* CONFIG_DEBUG_FS*/ + +/** + * struct kbase_csf_user_reg - Object containing members to manage the mapping + * of USER Register page for all contexts + * + * @dummy_page: Address of a dummy page that is mapped in place + * of the real USER Register page just before the GPU + * is powered down. The USER Register page is mapped + * in the address space of every process, that created + * a Base context, to enable the access to LATEST_FLUSH + * register from userspace. + * @filp: Pointer to a dummy file, that along with @file_offset, + * facilitates the use of unique file offset for the userspace mapping + * created for USER Register page. + * The userspace mapping is made to point to this file + * inside the mmap handler. + * @file_offset: Counter that is incremented every time Userspace creates a mapping of + * USER Register page, to provide a unique file offset range for + * @filp file, so that the CPU PTE of the Userspace mapping can be zapped + * through the kernel function unmap_mapping_range(). + * It is incremented in page units. + * @list: Linked list to maintain user processes(contexts) + * having the mapping to USER Register page. + * It's protected by &kbase_csf_device.reg_lock. + */ +struct kbase_csf_user_reg { + struct tagged_addr dummy_page; + struct file *filp; + u32 file_offset; + struct list_head list; +}; + /** * struct kbase_csf_device - Object representing CSF for an instance of GPU * platform device. @@ -1192,7 +1543,7 @@ struct kbase_csf_hwcnt { * image. * @shared_interface: Pointer to the interface object containing info for * the memory area shared between firmware & host. - * @shared_reg_rbtree: RB tree of the memory regions allocated from the + * @mcu_shared_zone: Memory zone tracking memory regions allocated from the * shared interface segment in MCU firmware address * space. * @db_filp: Pointer to a dummy file, that alongwith @@ -1211,17 +1562,6 @@ struct kbase_csf_hwcnt { * of the real Hw doorbell page for the active GPU * command queues after they are stopped or after the * GPU is powered down. - * @dummy_user_reg_page: Address of the dummy page that is mapped in place - * of the real User register page just before the GPU - * is powered down. The User register page is mapped - * in the address space of every process, that created - * a Base context, to enable the access to LATEST_FLUSH - * register from userspace. - * @mali_file_inode: Pointer to the inode corresponding to mali device - * file. This is needed in order to switch to the - * @dummy_user_reg_page on GPU power down. - * All instances of the mali device file will point to - * the same inode. * @reg_lock: Lock to serialize the MCU firmware related actions * that affect all contexts such as allocation of * regions from shared interface area, assignment of @@ -1264,27 +1604,48 @@ struct kbase_csf_hwcnt { * acknowledgement is pending. * @fw_error_work: Work item for handling the firmware internal error * fatal event. + * @coredump_work: Work item for initiating a platform core dump. * @ipa_control: IPA Control component manager. - * @mcu_core_pwroff_dur_us: Sysfs attribute for the glb_pwroff timeout input - * in unit of micro-seconds. The firmware does not use + * @mcu_core_pwroff_dur_ns: Sysfs attribute for the glb_pwroff timeout input + * in unit of nanoseconds. 
The firmware does not use * it directly. * @mcu_core_pwroff_dur_count: The counterpart of the glb_pwroff timeout input * in interface required format, ready to be used * directly in the firmware. + * @mcu_core_pwroff_dur_count_modifier: Update csffw_glb_req_cfg_pwroff_timer + * to make the shr(10) modifier conditional + * on new flag in GLB_PWROFF_TIMER_CONFIG * @mcu_core_pwroff_reg_shadow: The actual value that has been programed into * the glb_pwoff register. This is separated from * the @p mcu_core_pwroff_dur_count as an update * to the latter is asynchronous. - * @gpu_idle_hysteresis_ms: Sysfs attribute for the idle hysteresis time - * window in unit of ms. The firmware does not use it - * directly. + * @gpu_idle_hysteresis_ns: Sysfs attribute for the idle hysteresis time + * window in unit of nanoseconds. The firmware does not + * use it directly. * @gpu_idle_dur_count: The counterpart of the hysteresis time window in * interface required format, ready to be used * directly in the firmware. + * @gpu_idle_dur_count_modifier: Update csffw_glb_req_idle_enable to make the shr(10) + * modifier conditional on the new flag + * in GLB_IDLE_TIMER_CONFIG. * @fw_timeout_ms: Timeout value (in milliseconds) used when waiting * for any request sent to the firmware. * @hwcnt: Contain members required for handling the dump of * HW counters. + * @fw: Copy of the loaded MCU firmware image. + * @fw_log: Contain members required for handling firmware log. + * @fw_core_dump: Contain members required for handling the firmware + * core dump. + * @dof: Structure for dump on fault. + * @user_reg: Collective information to support the mapping to + * USER Register page for user processes. + * @pending_gpuq_kicks: Lists of GPU queue that have been kicked but not + * yet processed, categorised by queue group's priority. + * @pending_gpuq_kicks_lock: Protect @pending_gpu_kicks and + * kbase_queue.pending_kick_link. + * @quirks_ext: Pointer to an allocated buffer containing the firmware + * workarounds configuration. + * @pmode_sync_sem: RW Semaphore to prevent MMU operations during P.Mode entrance. */ struct kbase_csf_device { struct kbase_mmu_table mcu_mmu; @@ -1294,12 +1655,10 @@ struct kbase_csf_device { struct kobject *fw_cfg_kobj; struct kbase_csf_trace_buffers firmware_trace_buffers; void *shared_interface; - struct rb_root shared_reg_rbtree; + struct kbase_reg_zone mcu_shared_zone; struct file *db_filp; u32 db_file_offsets; struct tagged_addr dummy_db_page; - struct tagged_addr dummy_user_reg_page; - struct inode *mali_file_inode; struct mutex reg_lock; wait_queue_head_t event_wait; bool interrupt_received; @@ -1316,14 +1675,34 @@ struct kbase_csf_device { struct work_struct firmware_reload_work; bool glb_init_request_pending; struct work_struct fw_error_work; + struct work_struct coredump_work; struct kbase_ipa_control ipa_control; - u32 mcu_core_pwroff_dur_us; + u32 mcu_core_pwroff_dur_ns; u32 mcu_core_pwroff_dur_count; + u32 mcu_core_pwroff_dur_count_modifier; u32 mcu_core_pwroff_reg_shadow; - u32 gpu_idle_hysteresis_ms; + u32 gpu_idle_hysteresis_ns; u32 gpu_idle_dur_count; + u32 gpu_idle_dur_count_modifier; unsigned int fw_timeout_ms; struct kbase_csf_hwcnt hwcnt; + struct kbase_csf_mcu_fw fw; + struct kbase_csf_firmware_log fw_log; + struct kbase_csf_firmware_core_dump fw_core_dump; +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_csf_dump_on_fault dof; +#endif /* CONFIG_DEBUG_FS */ +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + /** + * @coresight: Coresight device structure. 
+ */ + struct kbase_debug_coresight_device coresight; +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + struct kbase_csf_user_reg user_reg; + struct list_head pending_gpuq_kicks[KBASE_QUEUE_GROUP_PRIORITY_COUNT]; + spinlock_t pending_gpuq_kicks_lock; + u32 *quirks_ext; + struct rw_semaphore pmode_sync_sem; }; /** diff --git a/mali_kbase/csf/mali_kbase_csf_event.c b/mali_kbase/csf/mali_kbase_csf_event.c index 5c86688..63e6c15 100644 --- a/mali_kbase/csf/mali_kbase_csf_event.c +++ b/mali_kbase/csf/mali_kbase_csf_event.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -102,7 +102,7 @@ static void sync_update_notify_gpu(struct kbase_context *kctx) if (can_notify_gpu) { kbase_csf_ring_doorbell(kctx->kbdev, CSF_KERNEL_DOORBELL_NR); - KBASE_KTRACE_ADD(kctx->kbdev, SYNC_UPDATE_EVENT_NOTIFY_GPU, kctx, 0u); + KBASE_KTRACE_ADD(kctx->kbdev, CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT, kctx, 0u); } spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); @@ -120,7 +120,7 @@ void kbase_csf_event_signal(struct kbase_context *kctx, bool notify_gpu) /* First increment the signal count and wake up event thread. */ atomic_set(&kctx->event_count, 1); - kbase_event_wakeup(kctx); + kbase_event_wakeup_nosync(kctx); /* Signal the CSF firmware. This is to ensure that pending command * stream synch object wait operations are re-evaluated. @@ -169,7 +169,8 @@ void kbase_csf_event_term(struct kbase_context *kctx) kfree(event_cb); } - WARN_ON(!list_empty(&kctx->csf.event.error_list)); + WARN(!list_empty(&kctx->csf.event.error_list), + "Error list not empty for ctx %d_%d\n", kctx->tgid, kctx->id); spin_unlock_irqrestore(&kctx->csf.event.lock, flags); } @@ -226,12 +227,15 @@ void kbase_csf_event_add_error(struct kbase_context *const kctx, return; spin_lock_irqsave(&kctx->csf.event.lock, flags); - if (!WARN_ON(!list_empty(&error->link))) { + if (list_empty(&error->link)) { error->data = *data; list_add_tail(&error->link, &kctx->csf.event.error_list); dev_dbg(kctx->kbdev->dev, "Added error %pK of type %d in context %pK\n", (void *)error, data->type, (void *)kctx); + } else { + dev_dbg(kctx->kbdev->dev, "Error %pK of type %d already pending in context %pK", + (void *)error, error->data.type, (void *)kctx); } spin_unlock_irqrestore(&kctx->csf.event.lock, flags); } @@ -241,6 +245,14 @@ bool kbase_csf_event_error_pending(struct kbase_context *kctx) bool error_pending = false; unsigned long flags; + /* Withhold the error event if the dump on fault is ongoing. + * This would prevent the Userspace from taking error recovery actions + * (which can potentially affect the state that is being dumped). + * Event handling thread would eventually notice the error event. 
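The comment above documents the new early return in kbase_csf_event_error_pending(): while a dump-on-fault is in progress, pending errors are withheld so userspace does not start recovery and disturb the state being dumped, and the event handling thread picks them up once the dump completes. Reduced to its essence (locking and the real API omitted, names invented):

#include <stdbool.h>
#include <stdio.h>

struct dev_state {
	bool dump_on_fault_in_progress;
	int  pending_error_count;
};

/* Report "no error yet" while a dump is ongoing so userspace does not start
 * recovery; the error is reported again once the dump has completed.
 */
static bool error_pending(const struct dev_state *dev)
{
	if (dev->dump_on_fault_in_progress)
		return false;
	return dev->pending_error_count > 0;
}

int main(void)
{
	struct dev_state dev = { .dump_on_fault_in_progress = true,
				 .pending_error_count = 1 };

	printf("pending while dumping: %d\n", error_pending(&dev));
	dev.dump_on_fault_in_progress = false;
	printf("pending after dump:    %d\n", error_pending(&dev));
	return 0;
}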
+ */ + if (unlikely(!kbase_debug_csf_fault_dump_complete(kctx->kbdev))) + return false; + spin_lock_irqsave(&kctx->csf.event.lock, flags); error_pending = !list_empty(&kctx->csf.event.error_list); diff --git a/mali_kbase/csf/mali_kbase_csf_event.h b/mali_kbase/csf/mali_kbase_csf_event.h index 4c853b5..52122a9 100644 --- a/mali_kbase/csf/mali_kbase_csf_event.h +++ b/mali_kbase/csf/mali_kbase_csf_event.h @@ -30,8 +30,8 @@ struct kbase_csf_event; enum kbase_csf_event_callback_action; /** - * kbase_csf_event_callback_action - type for callback functions to be - * called upon CSF events. + * kbase_csf_event_callback - type for callback functions to be + * called upon CSF events. * @param: Generic parameter to pass to the callback function. * * This is the type of callback functions that can be registered diff --git a/mali_kbase/csf/mali_kbase_csf_firmware.c b/mali_kbase/csf/mali_kbase_csf_firmware.c index bf7cdf4..cf4bb4c 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,6 +21,8 @@ #include "mali_kbase.h" #include "mali_kbase_csf_firmware_cfg.h" +#include "mali_kbase_csf_firmware_log.h" +#include "mali_kbase_csf_firmware_core_dump.h" #include "mali_kbase_csf_trace_buffer.h" #include "mali_kbase_csf_timeout.h" #include "mali_kbase_mem.h" @@ -37,27 +39,29 @@ #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <csf/mali_kbase_csf_registers.h> - #include <linux/list.h> #include <linux/slab.h> #include <linux/firmware.h> #include <linux/mman.h> #include <linux/string.h> #include <linux/mutex.h> +#include <linux/ctype.h> #if (KERNEL_VERSION(4, 13, 0) <= LINUX_VERSION_CODE) #include <linux/set_memory.h> #endif #include <mmu/mali_kbase_mmu.h> #include <asm/arch_timer.h> +#include <linux/delay.h> +#include <linux/version_compat_defs.h> -#define MALI_MAX_FIRMWARE_NAME_LEN ((size_t)20) +#define MALI_MAX_DEFAULT_FIRMWARE_NAME_LEN ((size_t)20) -static char fw_name[MALI_MAX_FIRMWARE_NAME_LEN] = "mali_csffw.bin"; -module_param_string(fw_name, fw_name, sizeof(fw_name), 0644); +static char default_fw_name[MALI_MAX_DEFAULT_FIRMWARE_NAME_LEN] = "mali_csffw.bin"; +module_param_string(fw_name, default_fw_name, sizeof(default_fw_name), 0644); MODULE_PARM_DESC(fw_name, "firmware image"); /* The waiting time for firmware to boot */ -static unsigned int csf_firmware_boot_timeout_ms = 500; +static unsigned int csf_firmware_boot_timeout_ms; module_param(csf_firmware_boot_timeout_ms, uint, 0444); MODULE_PARM_DESC(csf_firmware_boot_timeout_ms, "Maximum time to wait for firmware to boot."); @@ -75,9 +79,10 @@ MODULE_PARM_DESC(fw_debug, "Enables effective use of a debugger for debugging firmware code."); #endif -#define FIRMWARE_HEADER_MAGIC (0xC3F13A6Eul) -#define FIRMWARE_HEADER_VERSION (0ul) -#define FIRMWARE_HEADER_LENGTH (0x14ul) +#define FIRMWARE_HEADER_MAGIC (0xC3F13A6Eul) +#define FIRMWARE_HEADER_VERSION_MAJOR (0ul) +#define FIRMWARE_HEADER_VERSION_MINOR (3ul) +#define FIRMWARE_HEADER_LENGTH (0x14ul) #define CSF_FIRMWARE_ENTRY_SUPPORTED_FLAGS \ (CSF_FIRMWARE_ENTRY_READ | \ @@ -88,11 +93,13 @@ MODULE_PARM_DESC(fw_debug, CSF_FIRMWARE_ENTRY_ZERO 
| \ CSF_FIRMWARE_ENTRY_CACHE_MODE) -#define CSF_FIRMWARE_ENTRY_TYPE_INTERFACE (0) -#define CSF_FIRMWARE_ENTRY_TYPE_CONFIGURATION (1) -#define CSF_FIRMWARE_ENTRY_TYPE_FUTF_TEST (2) -#define CSF_FIRMWARE_ENTRY_TYPE_TRACE_BUFFER (3) -#define CSF_FIRMWARE_ENTRY_TYPE_TIMELINE_METADATA (4) +#define CSF_FIRMWARE_ENTRY_TYPE_INTERFACE (0) +#define CSF_FIRMWARE_ENTRY_TYPE_CONFIGURATION (1) +#define CSF_FIRMWARE_ENTRY_TYPE_TRACE_BUFFER (3) +#define CSF_FIRMWARE_ENTRY_TYPE_TIMELINE_METADATA (4) +#define CSF_FIRMWARE_ENTRY_TYPE_BUILD_INFO_METADATA (6) +#define CSF_FIRMWARE_ENTRY_TYPE_FUNC_CALL_LIST (7) +#define CSF_FIRMWARE_ENTRY_TYPE_CORE_DUMP (9) #define CSF_FIRMWARE_CACHE_MODE_NONE (0ul << 3) #define CSF_FIRMWARE_CACHE_MODE_CACHED (1ul << 3) @@ -109,6 +116,8 @@ MODULE_PARM_DESC(fw_debug, (GLB_REQ_CFG_ALLOC_EN_MASK | GLB_REQ_CFG_PROGRESS_TIMER_MASK | \ GLB_REQ_CFG_PWROFF_TIMER_MASK | GLB_REQ_IDLE_ENABLE_MASK) +char fw_git_sha[BUILD_INFO_GIT_SHA_LEN]; + static inline u32 input_page_read(const u32 *const input, const u32 offset) { WARN_ON(offset % sizeof(u32)); @@ -176,7 +185,7 @@ struct firmware_timeline_metadata { /* The shared interface area, used for communicating with firmware, is managed * like a virtual memory zone. Reserve the virtual space from that zone * corresponding to shared interface entry parsed from the firmware image. - * The shared_reg_rbtree should have been initialized before calling this + * The MCU_SHARED_ZONE should have been initialized before calling this * function. */ static int setup_shared_iface_static_region(struct kbase_device *kbdev) @@ -189,8 +198,7 @@ static int setup_shared_iface_static_region(struct kbase_device *kbdev) if (!interface) return -EINVAL; - reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - interface->num_pages_aligned, KBASE_REG_ZONE_MCU_SHARED); + reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, interface->num_pages_aligned); if (reg) { mutex_lock(&kbdev->csf.reg_lock); ret = kbase_add_va_region_rbtree(kbdev, reg, @@ -249,10 +257,15 @@ static void stop_csf_firmware(struct kbase_device *kbdev) static void wait_for_firmware_boot(struct kbase_device *kbdev) { - const long wait_timeout = - kbase_csf_timeout_in_jiffies(csf_firmware_boot_timeout_ms); + long wait_timeout; long remaining; + if (!csf_firmware_boot_timeout_ms) + csf_firmware_boot_timeout_ms = + kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_BOOT_TIMEOUT); + + wait_timeout = kbase_csf_timeout_in_jiffies(csf_firmware_boot_timeout_ms); + /* Firmware will generate a global interface interrupt once booting * is complete */ @@ -269,22 +282,53 @@ static void boot_csf_firmware(struct kbase_device *kbdev) { kbase_csf_firmware_enable_mcu(kbdev); +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + kbase_debug_coresight_csf_state_request(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED); + + if (!kbase_debug_coresight_csf_state_wait(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED)) + dev_err(kbdev->dev, "Timeout waiting for CoreSight to be enabled"); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + wait_for_firmware_boot(kbdev); } -static void wait_ready(struct kbase_device *kbdev) +/** + * wait_ready() - Wait for previously issued MMU command to complete. + * + * @kbdev: Kbase device to wait for a MMU command to complete. + * + * Reset GPU if the wait for previously issued command times out. + * + * Return: 0 on success, error code otherwise. 
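The reworked wait_ready() whose kernel-doc ends above swaps the old fixed iteration budget for a wall-clock deadline: poll AS_STATUS in short bursts and, once the configured number of milliseconds has passed, give up and trigger a GPU reset. A userspace approximation with clock_gettime; the register read is simulated and the timeout value is arbitrary:

#include <stdio.h>
#include <time.h>

#define STATUS_ACTIVE_BIT 0x1u

static unsigned int fake_status = STATUS_ACTIVE_BIT; /* stand-in for AS_STATUS */

static unsigned int read_status(void)
{
	/* Pretend the MMU command completes after a few polls. */
	static int polls;

	if (++polls > 3)
		fake_status = 0;
	return fake_status;
}

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000 +
	       (now.tv_nsec - start->tv_nsec) / 1000000;
}

static int wait_ready(long timeout_ms)
{
	struct timespec start;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		/* Poll in a tight inner burst before re-checking the clock. */
		for (int i = 0; i < 1000; i++) {
			if (!(read_status() & STATUS_ACTIVE_BIT))
				return 0;
		}
	} while (elapsed_ms(&start) < timeout_ms);

	return -1; /* timed out; the driver would reset the GPU here */
}

int main(void)
{
	printf("wait_ready -> %d\n", wait_ready(100));
	return 0;
}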
+ */ +static int wait_ready(struct kbase_device *kbdev) { - u32 max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; - u32 val; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 mmu_as_inactive_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; - val = kbase_reg_read(kbdev, MMU_AS_REG(MCU_AS_NR, AS_STATUS)); + do { + unsigned int i; - /* Wait for a while for the update command to take effect */ - while (--max_loops && (val & AS_STATUS_AS_ACTIVE)) - val = kbase_reg_read(kbdev, MMU_AS_REG(MCU_AS_NR, AS_STATUS)); + for (i = 0; i < 1000; i++) { + /* Wait for the MMU status to indicate there is no active command */ + if (!(kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(MCU_AS_NR, AS_STATUS))) & + AS_STATUS_AS_ACTIVE)) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < mmu_as_inactive_wait_time_ms); - if (max_loops == 0) - dev_err(kbdev->dev, "AS_ACTIVE bit stuck, might be caused by slow/unstable GPU clock or possible faulty FPGA connector\n"); + dev_err(kbdev->dev, + "AS_ACTIVE bit stuck for MCU AS. Might be caused by unstable GPU clk/pwr or faulty system"); + queue_work(system_highpri_wq, &kbdev->csf.coredump_work); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; } static void unload_mmu_tables(struct kbase_device *kbdev) @@ -299,7 +343,7 @@ static void unload_mmu_tables(struct kbase_device *kbdev) mutex_unlock(&kbdev->mmu_hw_mutex); } -static void load_mmu_tables(struct kbase_device *kbdev) +static int load_mmu_tables(struct kbase_device *kbdev) { unsigned long irq_flags; @@ -310,7 +354,7 @@ static void load_mmu_tables(struct kbase_device *kbdev) mutex_unlock(&kbdev->mmu_hw_mutex); /* Wait for a while for the update command to take effect */ - wait_ready(kbdev); + return wait_ready(kbdev); } /** @@ -402,7 +446,7 @@ static void load_fw_image_section(struct kbase_device *kbdev, const u8 *data, for (page_num = 0; page_num < page_limit; ++page_num) { struct page *const page = as_page(phys[page_num]); - char *const p = kmap_atomic(page); + char *const p = kbase_kmap_atomic(page); u32 const copy_len = min_t(u32, PAGE_SIZE, data_len); if (copy_len > 0) { @@ -417,9 +461,9 @@ static void load_fw_image_section(struct kbase_device *kbdev, const u8 *data, memset(p + copy_len, 0, zi_len); } - kbase_sync_single_for_device(kbdev, kbase_dma_addr(page), - PAGE_SIZE, DMA_TO_DEVICE); - kunmap_atomic(p); + kbase_sync_single_for_device(kbdev, kbase_dma_addr_from_tagged(phys[page_num]), + PAGE_SIZE, DMA_TO_DEVICE); + kbase_kunmap_atomic(p); } } @@ -427,24 +471,17 @@ static int reload_fw_image(struct kbase_device *kbdev) { const u32 magic = FIRMWARE_HEADER_MAGIC; struct kbase_csf_firmware_interface *interface; - const struct firmware *firmware; + struct kbase_csf_mcu_fw *const mcu_fw = &kbdev->csf.fw; int ret = 0; - if (request_firmware(&firmware, fw_name, kbdev->dev) != 0) { - dev_err(kbdev->dev, - "Failed to reload firmware image '%s'\n", - fw_name); - return -ENOENT; - } - - /* Do couple of basic sanity checks */ - if (firmware->size < FIRMWARE_HEADER_LENGTH) { - dev_err(kbdev->dev, "Firmware image unexpectedly too small\n"); + if (WARN_ON(mcu_fw->data == NULL)) { + dev_err(kbdev->dev, "Firmware image copy not loaded\n"); ret = -EINVAL; goto out; } - if (memcmp(firmware->data, &magic, sizeof(magic)) != 0) { + /* Do a basic sanity check on MAGIC signature */ + if (memcmp(mcu_fw->data, &magic, sizeof(magic)) != 0) { dev_err(kbdev->dev, "Incorrect 
magic value, firmware image could have been corrupted\n"); ret = -EINVAL; goto out; @@ -459,16 +496,14 @@ static int reload_fw_image(struct kbase_device *kbdev) continue; } - load_fw_image_section(kbdev, firmware->data, interface->phys, - interface->num_pages, interface->flags, - interface->data_start, interface->data_end); + load_fw_image_section(kbdev, mcu_fw->data, interface->phys, interface->num_pages, + interface->flags, interface->data_start, interface->data_end); } kbdev->csf.firmware_full_reload_needed = false; kbase_csf_firmware_reload_trace_buffers_data(kbdev); out: - release_firmware(firmware); return ret; } @@ -480,6 +515,7 @@ out: * @kbdev: Kbase device structure * @virtual_start: Start of the virtual address range required for an entry allocation * @virtual_end: End of the virtual address range required for an entry allocation + * @flags: Firmware entry flags for comparison with the reusable pages found * @phys: Pointer to the array of physical (tagged) addresses making up the new * FW interface entry. It is an output parameter which would be made to * point to an already existing array allocated for the previously parsed @@ -494,16 +530,19 @@ out: * within the 2MB pages aligned allocation. * @is_small_page: This is an output flag used to select between the small and large page * to be used for the FW entry allocation. + * @force_small_page: Use 4kB pages to allocate memory needed for FW loading * * Go through all the already initialized interfaces and find if a previously * allocated large page can be used to store contents of new FW interface entry. * * Return: true if a large page can be reused, false otherwise. */ -static inline bool entry_find_large_page_to_reuse( - struct kbase_device *kbdev, const u32 virtual_start, const u32 virtual_end, - struct tagged_addr **phys, struct protected_memory_allocation ***pma, - u32 num_pages, u32 *num_pages_aligned, bool *is_small_page) +static inline bool entry_find_large_page_to_reuse(struct kbase_device *kbdev, + const u32 virtual_start, const u32 virtual_end, + const u32 flags, struct tagged_addr **phys, + struct protected_memory_allocation ***pma, + u32 num_pages, u32 *num_pages_aligned, + bool *is_small_page, bool force_small_page) { struct kbase_csf_firmware_interface *interface = NULL; struct kbase_csf_firmware_interface *target_interface = NULL; @@ -519,7 +558,61 @@ static inline bool entry_find_large_page_to_reuse( *phys = NULL; *pma = NULL; + if (force_small_page) + goto out; + + /* If the section starts at 2MB aligned boundary, + * then use 2MB page(s) for it. + */ + if (!(virtual_start & (SZ_2M - 1))) { + *num_pages_aligned = + round_up(*num_pages_aligned, NUM_4K_PAGES_IN_2MB_PAGE); + *is_small_page = false; + goto out; + } + + /* If the section doesn't lie within the same 2MB aligned boundary, + * then use 4KB pages as it would be complicated to use a 2MB page + * for such section. + */ + if ((virtual_start & ~(SZ_2M - 1)) != (virtual_end & ~(SZ_2M - 1))) + goto out; + + /* Find the nearest 2MB aligned section which comes before the current + * section. 
+ */ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + const u32 virtual_diff = virtual_start - interface->virtual; + + if (interface->virtual > virtual_end) + continue; + + if (interface->virtual & (SZ_2M - 1)) + continue; + + if ((virtual_diff < virtual_diff_min) && (interface->flags == flags)) { + target_interface = interface; + virtual_diff_min = virtual_diff; + } + } + + if (target_interface) { + const u32 page_index = virtual_diff_min >> PAGE_SHIFT; + + if (page_index >= target_interface->num_pages_aligned) + goto out; + if (target_interface->phys) + *phys = &target_interface->phys[page_index]; + + if (target_interface->pma) + *pma = &target_interface->pma[page_index / NUM_4K_PAGES_IN_2MB_PAGE]; + + *is_small_page = false; + reuse_large_page = true; + } + +out: return reuse_large_page; } @@ -538,8 +631,8 @@ static inline bool entry_find_large_page_to_reuse( * Return: 0 if successful, negative error code on failure */ static int parse_memory_setup_entry(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, const u32 *entry, + unsigned int size) { int ret = 0; const u32 flags = entry[0]; @@ -550,6 +643,8 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, u32 num_pages; u32 num_pages_aligned; char *name; + void *name_entry; + unsigned int name_len; struct tagged_addr *phys = NULL; struct kbase_csf_firmware_interface *interface = NULL; bool allocated_pages = false, protected_mode = false; @@ -558,6 +653,7 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, struct protected_memory_allocation **pma = NULL; bool reuse_pages = false; bool is_small_page = true; + bool force_small_page = false; if (data_end < data_start) { dev_err(kbdev->dev, "Firmware corrupt, data_end < data_start (0x%x<0x%x)\n", @@ -592,7 +688,7 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, protected_mode = true; if (protected_mode && kbdev->csf.pma_dev == NULL) { - dev_err(kbdev->dev, + dev_warn(kbdev->dev, "Protected memory allocator not found, Firmware protected mode entry will not be supported"); return 0; } @@ -600,9 +696,15 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, num_pages = (virtual_end - virtual_start) >> PAGE_SHIFT; - reuse_pages = entry_find_large_page_to_reuse( - kbdev, virtual_start, virtual_end, &phys, &pma, - num_pages, &num_pages_aligned, &is_small_page); + if(protected_mode) { + force_small_page = true; + dev_warn(kbdev->dev, "Protected memory allocation requested for %u bytes (%u pages), serving with small pages and tight allocation.", (virtual_end - virtual_start), num_pages); + } + +retry_alloc: + reuse_pages = entry_find_large_page_to_reuse(kbdev, virtual_start, virtual_end, flags, + &phys, &pma, num_pages, &num_pages_aligned, + &is_small_page, force_small_page); if (!reuse_pages) phys = kmalloc_array(num_pages_aligned, sizeof(*phys), GFP_KERNEL); @@ -613,23 +715,41 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, if (!reuse_pages) { pma = kbase_csf_protected_memory_alloc( kbdev, phys, num_pages_aligned, is_small_page); + if (!pma) { + /* If we can't allocate sufficient memory for FW - bail out and leave protected execution unsupported by termintating the allocator. 
*/ + dev_warn(kbdev->dev, + "Protected memory allocation failed during FW initialization - Firmware protected mode entry will not be supported"); + kbase_csf_protected_memory_term(kbdev); + kbdev->csf.pma_dev = NULL; + kfree(phys); + return 0; + } + } else if (WARN_ON(!pma)) { + ret = -EINVAL; + goto out; } - - if (!pma) - ret = -ENOMEM; } else { if (!reuse_pages) { ret = kbase_mem_pool_alloc_pages( - kbase_mem_pool_group_select( - kbdev, KBASE_MEM_GROUP_CSF_FW, is_small_page), - num_pages_aligned, phys, false); + kbase_mem_pool_group_select(kbdev, KBASE_MEM_GROUP_CSF_FW, + is_small_page), + num_pages_aligned, phys, false, NULL); } } if (ret < 0) { - dev_err(kbdev->dev, - "Failed to allocate %u physical pages for the firmware interface entry at VA 0x%x\n", - num_pages_aligned, virtual_start); + dev_warn( + kbdev->dev, + "Failed to allocate %u physical pages for the firmware interface entry at VA 0x%x using %s ", + num_pages_aligned, virtual_start, + is_small_page ? "small pages" : "large page"); + WARN_ON(reuse_pages); + if (!is_small_page) { + dev_warn(kbdev->dev, "Retrying by using small pages"); + force_small_page = true; + kfree(phys); + goto retry_alloc; + } goto out; } @@ -638,21 +758,24 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, data_start, data_end); /* Allocate enough memory for the struct kbase_csf_firmware_interface and - * the name of the interface. An extra byte is allocated to place a - * NUL-terminator in. This should already be included according to the - * specification but here we add it anyway to be robust against a - * corrupt firmware image. + * the name of the interface. */ - interface = kmalloc(sizeof(*interface) + - size - INTERFACE_ENTRY_NAME_OFFSET + 1, GFP_KERNEL); + name_entry = (void *)entry + INTERFACE_ENTRY_NAME_OFFSET; + name_len = strnlen(name_entry, size - INTERFACE_ENTRY_NAME_OFFSET); + if (size < (INTERFACE_ENTRY_NAME_OFFSET + name_len + 1 + sizeof(u32))) { + dev_err(kbdev->dev, "Memory setup entry too short to contain virtual_exe_start"); + ret = -EINVAL; + goto out; + } + + interface = kmalloc(sizeof(*interface) + name_len + 1, GFP_KERNEL); if (!interface) { ret = -ENOMEM; goto out; } name = (void *)(interface + 1); - memcpy(name, entry + (INTERFACE_ENTRY_NAME_OFFSET / sizeof(*entry)), - size - INTERFACE_ENTRY_NAME_OFFSET); - name[size - INTERFACE_ENTRY_NAME_OFFSET] = 0; + memcpy(name, name_entry, name_len); + name[name_len] = 0; interface->name = name; interface->phys = phys; @@ -667,6 +790,11 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, interface->data_end = data_end; interface->pma = pma; + /* Discover the virtual execution address field after the end of the name + * field taking into account the NULL-termination character. 
+ */ + interface->virtual_exe_start = *((u32 *)(name_entry + name_len + 1)); + mem_flags = convert_mem_flags(kbdev, flags, &cache_mode); if (flags & CSF_FIRMWARE_ENTRY_SHARED) { @@ -722,8 +850,9 @@ static int parse_memory_setup_entry(struct kbase_device *kbdev, if (!reuse_pages) { ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - virtual_start >> PAGE_SHIFT, phys, num_pages_aligned, mem_flags, - KBASE_MEM_GROUP_CSF_FW); + virtual_start >> PAGE_SHIFT, phys, + num_pages_aligned, mem_flags, + KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret != 0) { dev_err(kbdev->dev, "Failed to insert firmware pages\n"); @@ -770,7 +899,8 @@ out: * @size: Size (in bytes) of the section */ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, - const struct firmware *fw, const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, const u32 *entry, + unsigned int size) { const u32 data_start = entry[0]; const u32 data_size = entry[1]; @@ -813,6 +943,59 @@ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, } /** + * parse_build_info_metadata_entry() - Process a "build info metadata" section + * @kbdev: Kbase device structure + * @fw: Firmware image containing the section + * @entry: Pointer to the section + * @size: Size (in bytes) of the section + * + * This prints the git SHA of the firmware on frimware load. + * + * Return: 0 if successful, negative error code on failure + */ +static int parse_build_info_metadata_entry(struct kbase_device *kbdev, + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size) +{ + const u32 meta_start_addr = entry[0]; + char *ptr = NULL; + size_t sha_pattern_len = strlen(BUILD_INFO_GIT_SHA_PATTERN); + + /* Only print git SHA to avoid releasing sensitive information */ + ptr = strstr(fw->data + meta_start_addr, BUILD_INFO_GIT_SHA_PATTERN); + /* Check that we won't overrun the found string */ + if (ptr && + strlen(ptr) >= BUILD_INFO_GIT_SHA_LEN + BUILD_INFO_GIT_DIRTY_LEN + sha_pattern_len) { + char git_sha[BUILD_INFO_GIT_SHA_LEN + BUILD_INFO_GIT_DIRTY_LEN + 1]; + int i = 0; + + /* Move ptr to start of SHA */ + ptr += sha_pattern_len; + for (i = 0; i < BUILD_INFO_GIT_SHA_LEN; i++) { + /* Ensure that the SHA is made up of hex digits */ + if (!isxdigit(ptr[i])) + break; + + git_sha[i] = ptr[i]; + } + + /* Check if the next char indicates git SHA is dirty */ + if (ptr[i] == ' ' || ptr[i] == '+') { + git_sha[i] = ptr[i]; + i++; + } + git_sha[i] = '\0'; + + memcpy(fw_git_sha, git_sha, BUILD_INFO_GIT_SHA_LEN); + + dev_info(kbdev->dev, "Mali firmware git_sha: %s\n", git_sha); + } else + dev_info(kbdev->dev, "Mali firmware git_sha not found or invalid\n"); + + return 0; +} + +/** * load_firmware_entry() - Process an entry from a firmware image * * @kbdev: Kbase device @@ -828,9 +1011,8 @@ static int parse_timeline_metadata_entry(struct kbase_device *kbdev, * * Return: 0 if successful, negative error code on failure */ -static int load_firmware_entry(struct kbase_device *kbdev, - const struct firmware *fw, - u32 offset, u32 header) +static int load_firmware_entry(struct kbase_device *kbdev, const struct kbase_csf_mcu_fw *const fw, + u32 offset, u32 header) { const unsigned int type = entry_type(header); unsigned int size = entry_size(header); @@ -892,13 +1074,35 @@ static int load_firmware_entry(struct kbase_device *kbdev, return -EINVAL; } return parse_timeline_metadata_entry(kbdev, fw, entry, size); - } - - if (!optional) { - dev_err(kbdev->dev, - "Unsupported non-optional entry type %u in firmware\n", 
- type); - return -EINVAL; + case CSF_FIRMWARE_ENTRY_TYPE_BUILD_INFO_METADATA: + if (size < BUILD_INFO_METADATA_SIZE_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "Build info metadata entry too short (size=%u)\n", + size); + return -EINVAL; + } + return parse_build_info_metadata_entry(kbdev, fw, entry, size); + case CSF_FIRMWARE_ENTRY_TYPE_FUNC_CALL_LIST: + /* Function call list section */ + if (size < FUNC_CALL_LIST_ENTRY_NAME_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "Function call list entry too short (size=%u)\n", + size); + return -EINVAL; + } + kbase_csf_firmware_log_parse_logging_call_list_entry(kbdev, entry); + return 0; + case CSF_FIRMWARE_ENTRY_TYPE_CORE_DUMP: + /* Core Dump section */ + if (size < CORE_DUMP_ENTRY_START_ADDR_OFFSET + sizeof(*entry)) { + dev_err(kbdev->dev, "FW Core dump entry too short (size=%u)\n", size); + return -EINVAL; + } + return kbase_csf_firmware_core_dump_entry_parse(kbdev, entry); + default: + if (!optional) { + dev_err(kbdev->dev, "Unsupported non-optional entry type %u in firmware\n", + type); + return -EINVAL; + } } return 0; @@ -1115,40 +1319,80 @@ static int parse_capabilities(struct kbase_device *kbdev) return 0; } +static inline void access_firmware_memory_common(struct kbase_device *kbdev, + struct kbase_csf_firmware_interface *interface, u32 offset_bytes, + u32 *value, const bool read) +{ + u32 page_num = offset_bytes >> PAGE_SHIFT; + u32 offset_in_page = offset_bytes & ~PAGE_MASK; + struct page *target_page = as_page(interface->phys[page_num]); + uintptr_t cpu_addr = (uintptr_t)kbase_kmap_atomic(target_page); + u32 *addr = (u32 *)(cpu_addr + offset_in_page); + + if (read) { + kbase_sync_single_for_device(kbdev, + kbase_dma_addr_from_tagged(interface->phys[page_num]) + offset_in_page, + sizeof(u32), DMA_BIDIRECTIONAL); + *value = *addr; + } else { + *addr = *value; + kbase_sync_single_for_device(kbdev, + kbase_dma_addr_from_tagged(interface->phys[page_num]) + offset_in_page, + sizeof(u32), DMA_BIDIRECTIONAL); + } + + kbase_kunmap_atomic((u32 *)cpu_addr); +} + static inline void access_firmware_memory(struct kbase_device *kbdev, u32 gpu_addr, u32 *value, const bool read) { - struct kbase_csf_firmware_interface *interface; + struct kbase_csf_firmware_interface *interface, *access_interface = NULL; + u32 offset_bytes = 0; list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { if ((gpu_addr >= interface->virtual) && (gpu_addr < interface->virtual + (interface->num_pages << PAGE_SHIFT))) { - u32 offset_bytes = gpu_addr - interface->virtual; - u32 page_num = offset_bytes >> PAGE_SHIFT; - u32 offset_in_page = offset_bytes & ~PAGE_MASK; - struct page *target_page = as_page( - interface->phys[page_num]); - u32 *cpu_addr = kmap_atomic(target_page); - - if (read) { - kbase_sync_single_for_device(kbdev, - kbase_dma_addr(target_page) + offset_in_page, - sizeof(u32), DMA_BIDIRECTIONAL); - - *value = cpu_addr[offset_in_page >> 2]; - } else { - cpu_addr[offset_in_page >> 2] = *value; + offset_bytes = gpu_addr - interface->virtual; + access_interface = interface; + break; + } + } - kbase_sync_single_for_device(kbdev, - kbase_dma_addr(target_page) + offset_in_page, - sizeof(u32), DMA_BIDIRECTIONAL); - } + if (access_interface) + access_firmware_memory_common(kbdev, access_interface, offset_bytes, value, read); + else + dev_warn(kbdev->dev, "Invalid GPU VA %x passed", gpu_addr); +} + +static inline void access_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value, const bool read) +{ + struct 
kbase_csf_firmware_interface *interface, *access_interface = NULL; + u32 offset_bytes = 0; - kunmap_atomic(cpu_addr); - return; + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + if ((gpu_addr >= interface->virtual_exe_start) && + (gpu_addr < interface->virtual_exe_start + + (interface->num_pages << PAGE_SHIFT))) { + offset_bytes = gpu_addr - interface->virtual_exe_start; + access_interface = interface; + + /* If there's an overlap in execution address range between a moved and a + * non-moved areas, always prefer the moved one. The idea is that FW may + * move sections around during init time, but after the layout is settled, + * any moved sections are going to override non-moved areas at the same + * location. + */ + if (interface->virtual_exe_start != interface->virtual) + break; } } - dev_warn(kbdev->dev, "Invalid GPU VA %x passed\n", gpu_addr); + + if (access_interface) + access_firmware_memory_common(kbdev, access_interface, offset_bytes, value, read); + else + dev_warn(kbdev->dev, "Invalid GPU VA %x passed", gpu_addr); } void kbase_csf_read_firmware_memory(struct kbase_device *kbdev, @@ -1163,6 +1407,18 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, access_firmware_memory(kbdev, gpu_addr, &value, false); } +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value) +{ + access_firmware_memory_exe(kbdev, gpu_addr, value, true); +} + +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value) +{ + access_firmware_memory_exe(kbdev, gpu_addr, &value, false); +} + void kbase_csf_firmware_cs_input( const struct kbase_csf_cmd_stream_info *const info, const u32 offset, const u32 value) @@ -1295,6 +1551,26 @@ u32 kbase_csf_firmware_global_output( KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_output); /** + * csf_doorbell_offset() - Calculate the offset to the CSF host doorbell + * @doorbell_nr: Doorbell number + * + * Return: CSF host register offset for the specified doorbell number. + */ +static u32 csf_doorbell_offset(int doorbell_nr) +{ + WARN_ON(doorbell_nr < 0); + WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); + + return CSF_HW_DOORBELL_PAGE_OFFSET + (doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE); +} + +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr) +{ + kbase_reg_write(kbdev, csf_doorbell_offset(doorbell_nr), (u32)1); +} +EXPORT_SYMBOL(kbase_csf_ring_doorbell); + +/** * handle_internal_firmware_fatal - Handler for CS internal firmware fault. 
* * @kbdev: Pointer to kbase device @@ -1306,6 +1582,8 @@ static void handle_internal_firmware_fatal(struct kbase_device *const kbdev) { int as; + kbasep_platform_event_core_dump(kbdev, "Internal firmware error"); + for (as = 0; as < kbdev->nr_hw_address_spaces; as++) { unsigned long flags; struct kbase_context *kctx; @@ -1378,11 +1656,10 @@ static bool global_request_complete(struct kbase_device *const kbdev, return complete; } -static int wait_for_global_request(struct kbase_device *const kbdev, - u32 const req_mask) +static int wait_for_global_request_with_timeout(struct kbase_device *const kbdev, + u32 const req_mask, unsigned int timeout_ms) { - const long wait_timeout = - kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + const long wait_timeout = kbase_csf_timeout_in_jiffies(timeout_ms); long remaining; int err = 0; @@ -1391,10 +1668,9 @@ static int wait_for_global_request(struct kbase_device *const kbdev, wait_timeout); if (!remaining) { - dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for global request %x to complete", - kbase_backend_get_cycle_cnt(kbdev), - kbdev->csf.fw_timeout_ms, - req_mask); + dev_warn(kbdev->dev, + "[%llu] Timeout (%d ms) waiting for global request %x to complete", + kbase_backend_get_cycle_cnt(kbdev), timeout_ms, req_mask); err = -ETIMEDOUT; } @@ -1402,6 +1678,11 @@ static int wait_for_global_request(struct kbase_device *const kbdev, return err; } +static int wait_for_global_request(struct kbase_device *const kbdev, u32 const req_mask) +{ + return wait_for_global_request_with_timeout(kbdev, req_mask, kbdev->csf.fw_timeout_ms); +} + static void set_global_request( const struct kbase_csf_global_iface *const global_iface, u32 const req_mask) @@ -1442,6 +1723,11 @@ static void enable_shader_poweroff_timer(struct kbase_device *const kbdev, kbase_csf_firmware_global_input(global_iface, GLB_PWROFF_TIMER, pwroff_reg); + + kbase_csf_firmware_global_input_mask(global_iface, GLB_PWROFF_TIMER_CONFIG, + kbdev->csf.mcu_core_pwroff_dur_count_modifier, + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK); + set_global_request(global_iface, GLB_REQ_CFG_PWROFF_TIMER_MASK); /* Save the programed reg value in its shadow field */ @@ -1468,12 +1754,102 @@ static void enable_gpu_idle_timer(struct kbase_device *const kbdev) kbase_csf_firmware_global_input(global_iface, GLB_IDLE_TIMER, kbdev->csf.gpu_idle_dur_count); + + kbase_csf_firmware_global_input_mask(global_iface, GLB_IDLE_TIMER_CONFIG, + kbdev->csf.gpu_idle_dur_count_modifier, + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, GLB_REQ_REQ_IDLE_ENABLE, GLB_REQ_IDLE_ENABLE_MASK); dev_dbg(kbdev->dev, "Enabling GPU idle timer with count-value: 0x%.8x", kbdev->csf.gpu_idle_dur_count); } +static bool global_debug_request_complete(struct kbase_device *const kbdev, u32 const req_mask) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + bool complete = false; + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + if ((kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK) & req_mask) == + (kbase_csf_firmware_global_input_read(global_iface, GLB_DEBUG_REQ) & req_mask)) + complete = true; + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + return complete; +} + +static void set_global_debug_request(const struct kbase_csf_global_iface *const global_iface, + u32 const req_mask) +{ + u32 glb_debug_req; + + kbase_csf_scheduler_spin_lock_assert_held(global_iface->kbdev); + + glb_debug_req = 
kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_debug_req ^= req_mask; + + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_debug_req, req_mask); +} + +static void request_fw_core_dump( + const struct kbase_csf_global_iface *const global_iface) +{ + uint32_t run_mode = GLB_DEBUG_REQ_RUN_MODE_SET(0, GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP); + + set_global_debug_request(global_iface, GLB_DEBUG_REQ_DEBUG_RUN_MASK | run_mode); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); +} + +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev) +{ + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + unsigned long flags; + int ret; + + /* Serialize CORE_DUMP requests. */ + mutex_lock(&kbdev->csf.reg_lock); + + /* Update GLB_REQ with CORE_DUMP request and make firmware act on it. */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + request_fw_core_dump(global_iface); + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Wait for firmware to acknowledge completion of the CORE_DUMP request. */ + ret = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + if (!ret) + WARN_ON(!global_debug_request_complete(kbdev, GLB_DEBUG_REQ_DEBUG_RUN_MASK)); + + mutex_unlock(&kbdev->csf.reg_lock); + + return ret; +} + +/** + * kbasep_enable_rtu - Enable Ray Tracing Unit on powering up shader core + * + * @kbdev: The kbase device structure of the device + * + * This function needs to be called to enable the Ray Tracing Unit + * by writing SHADER_PWRFEATURES only when host controls shader cores power. + */ +static void kbasep_enable_rtu(struct kbase_device *kbdev) +{ + const u32 gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; + + if (gpu_id < GPU_ID2_PRODUCT_MAKE(12, 8, 3, 0)) + return; + + if (kbdev->csf.firmware_hctl_core_pwr) + kbase_reg_write(kbdev, GPU_CONTROL_REG(SHADER_PWRFEATURES), 1); +} + static void global_init(struct kbase_device *const kbdev, u64 core_mask) { u32 const ack_irq_mask = @@ -1481,30 +1857,49 @@ static void global_init(struct kbase_device *const kbdev, u64 core_mask) GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | - GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; + GLB_REQ_DEBUG_CSF_REQ_MASK | GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; const struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; unsigned long flags; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* If the power_policy will grant host control over FW PM, we need to turn on the SC rail*/ + if (kbdev->csf.firmware_hctl_core_pwr) { + queue_work(system_highpri_wq, &kbdev->pm.backend.sc_rails_on_work); + } +#endif + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbasep_enable_rtu(kbdev); + /* Update shader core allocation enable mask */ enable_endpoints_global(global_iface, core_mask); enable_shader_poweroff_timer(kbdev, global_iface); - set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); - +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The GPU idle timer is always enabled for simplicity. Checks will be * done before scheduling the GPU idle worker to see if it is * appropriate for the current power policy. 
*/ enable_gpu_idle_timer(kbdev); +#endif + + set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); /* Unmask the interrupts */ kbase_csf_firmware_global_input(global_iface, GLB_ACK_IRQ_MASK, ack_irq_mask); +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + /* Enable FW MCU read/write debug interfaces */ + kbase_csf_firmware_global_input_mask( + global_iface, GLB_DEBUG_ACK_IRQ_MASK, + GLB_DEBUG_REQ_FW_AS_READ_MASK | GLB_DEBUG_REQ_FW_AS_WRITE_MASK, + GLB_DEBUG_REQ_FW_AS_READ_MASK | GLB_DEBUG_REQ_FW_AS_WRITE_MASK); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); kbase_csf_scheduler_spin_unlock(kbdev, flags); @@ -1550,7 +1945,9 @@ void kbase_csf_firmware_global_reinit(struct kbase_device *kbdev, bool kbase_csf_firmware_global_reinit_complete(struct kbase_device *kbdev) { lockdep_assert_held(&kbdev->hwaccess_lock); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS WARN_ON(!kbdev->csf.glb_init_request_pending); +#endif if (global_request_complete(kbdev, CSF_GLB_REQ_CFG_MASK)) kbdev->csf.glb_init_request_pending = false; @@ -1613,6 +2010,20 @@ static void kbase_csf_firmware_reload_worker(struct work_struct *work) kbase_csf_tl_reader_reset(&kbdev->timeline->csf_tl_reader); + err = kbasep_platform_fw_config_init(kbdev); + if (WARN_ON(err)) + return; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + err = kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(kbdev); + if (WARN_ON(err)) + return; +#endif + + err = kbase_csf_firmware_cfg_fw_wa_enable(kbdev); + if (WARN_ON(err)) + return; + /* Reboot the firmware */ kbase_csf_firmware_enable_mcu(kbdev); } @@ -1625,7 +2036,7 @@ void kbase_csf_firmware_trigger_reload(struct kbase_device *kbdev) if (kbdev->csf.firmware_reload_needed) { kbdev->csf.firmware_reload_needed = false; - queue_work(system_wq, &kbdev->csf.firmware_reload_work); + queue_work(system_highpri_wq, &kbdev->csf.firmware_reload_work); } else { kbase_csf_firmware_enable_mcu(kbdev); } @@ -1648,19 +2059,20 @@ void kbase_csf_firmware_reload_completed(struct kbase_device *kbdev) if (version != kbdev->csf.global_iface.version) dev_err(kbdev->dev, "Version check failed in firmware reboot."); - KBASE_KTRACE_ADD(kbdev, FIRMWARE_REBOOT, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_REBOOT, NULL, 0u); /* Tell MCU state machine to transit to next state */ kbdev->csf.firmware_reloaded = true; kbase_pm_update_state(kbdev); } -static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms) +static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ns, u32 *modifier) { +#define MICROSECONDS_PER_SECOND 1000000u #define HYSTERESIS_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); - u64 dur_val = dur_ms; + u64 dur_val = dur_ns; u32 cnt_val_u32, reg_val_u32; bool src_system_timestamp = freq > 0; @@ -1673,25 +2085,29 @@ static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_m dev_warn(kbdev->dev, "No GPU clock, unexpected intregration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter format with firmware idle hysteresis!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter format with firmware idle hysteresis!"); } - /* Formula for dur_val = ((dur_ms/1000) * freq_HZ) >> 10) */ - dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; - dur_val = div_u64(dur_val, 1000); + /* Formula for dur_val = (dur/1e9) * freq_HZ) 
*/ + dur_val = dur_val * freq; + dur_val = div_u64(dur_val, NSEC_PER_SEC); + if (dur_val < S32_MAX) { + *modifier = 1; + } else { + dur_val = dur_val >> HYSTERESIS_VAL_UNIT_SHIFT; + *modifier = 0; + } /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? S32_MAX : (u32)dur_val; reg_val_u32 = GLB_IDLE_TIMER_TIMEOUT_SET(0, cnt_val_u32); /* add the source flag */ - if (src_system_timestamp) - reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_IDLE_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP); - else - reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER); + reg_val_u32 = GLB_IDLE_TIMER_TIMER_SOURCE_SET( + reg_val_u32, (src_system_timestamp ? GLB_IDLE_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP : + GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER)); return reg_val_u32; } @@ -1702,16 +2118,22 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 dur; kbase_csf_scheduler_spin_lock(kbdev, &flags); - dur = kbdev->csf.gpu_idle_hysteresis_ms; + dur = kbdev->csf.gpu_idle_hysteresis_ns; kbase_csf_scheduler_spin_unlock(kbdev, flags); return dur; } -u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur) +u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur_ns) { unsigned long flags; - const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur); + u32 modifier = 0; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS, &modifier); +#else + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur_ns, &modifier); +#endif /* The 'fw_load_lock' is taken to synchronize against the deferred * loading of FW, where the idle timer will be enabled. @@ -1719,22 +2141,32 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, mutex_lock(&kbdev->fw_load_lock); if (unlikely(!kbdev->csf.firmware_inited)) { kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; kbase_csf_scheduler_spin_unlock(kbdev, flags); mutex_unlock(&kbdev->fw_load_lock); goto end; } mutex_unlock(&kbdev->fw_load_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, + "Failed to prevent GPU reset when updating idle_hysteresis_time"); + return kbdev->csf.gpu_idle_dur_count; + } + kbase_csf_scheduler_pm_active(kbdev); - if (kbase_csf_scheduler_wait_mcu_active(kbdev)) { + if (kbase_csf_scheduler_killable_wait_mcu_active(kbdev)) { dev_err(kbdev->dev, "Unable to activate the MCU, the idle hysteresis value shall remain unchanged"); kbase_csf_scheduler_pm_idle(kbdev); + kbase_reset_gpu_allow(kbdev); + return kbdev->csf.gpu_idle_dur_count; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The 'reg_lock' is also taken and is held till the update is not * complete, to ensure the update of idle timer value by multiple Users * gets serialized. @@ -1743,22 +2175,49 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, /* The firmware only reads the new idle timer value when the timer is * disabled. 
*/ - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_disable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - /* Ensure that the request has taken effect */ - wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); +#endif - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; - kbdev->csf.gpu_idle_dur_count = hysteresis_val; - kbase_csf_firmware_enable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbase_csf_scheduler_lock(kbdev); + if (kbdev->csf.scheduler.gpu_idle_fw_timer_enabled) { +#endif + /* The firmware only reads the new idle timer value when the timer is + * disabled. + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Ensure that the request has taken effect */ + wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + } else { + /* Record the new values. Would be used later when timer is + * enabled + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_ns = dur_ns; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + kbase_csf_scheduler_unlock(kbdev); +#else mutex_unlock(&kbdev->csf.reg_lock); +#endif + dev_dbg(kbdev->dev, "GPU suspend timeout updated: %i ns (0x%.8x)", + kbdev->csf.gpu_idle_hysteresis_ns, + kbdev->csf.gpu_idle_dur_count); kbase_csf_scheduler_pm_idle(kbdev); - + kbase_reset_gpu_allow(kbdev); end: dev_dbg(kbdev->dev, "CSF set firmware idle hysteresis count-value: 0x%.8x", hysteresis_val); @@ -1766,15 +2225,18 @@ end: return hysteresis_val; } -static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us) +static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_ns, + u32 *modifier) { -#define PWROFF_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); - u64 dur_val = dur_us; + u64 dur_val = dur_ns; u32 cnt_val_u32, reg_val_u32; bool src_system_timestamp = freq > 0; + const struct kbase_pm_policy *current_policy = kbase_pm_get_policy(kbdev); + bool always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (!src_system_timestamp) { /* Get the cycle_counter source alternative */ spin_lock(&kbdev->pm.clk_rtm.lock); @@ -1784,49 +2246,76 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 dev_warn(kbdev->dev, "No GPU clock, unexpected integration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter with MCU Core Poweroff timer!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter with MCU shader Core Poweroff timer!"); } - /* Formula for dur_val = ((dur_us/1e6) * freq_HZ) >> 10) */ - dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; - dur_val = 
div_u64(dur_val, 1000000); + /* Formula for dur_val = (dur/1e9) * freq_HZ) */ + dur_val = dur_val * freq; + dur_val = div_u64(dur_val, NSEC_PER_SEC); + if (dur_val < S32_MAX) { + *modifier = 1; + } else { + dur_val = dur_val >> HYSTERESIS_VAL_UNIT_SHIFT; + *modifier = 0; + } - /* Interface limits the value field to S32_MAX */ - cnt_val_u32 = (dur_val > S32_MAX) ? S32_MAX : (u32)dur_val; + if (dur_val == 0 && !always_on) { + /* Lower Bound - as 0 disables timeout and host controls shader-core power management. */ + cnt_val_u32 = 1; + } else if (dur_val > S32_MAX) { + /* Upper Bound - as interface limits the field to S32_MAX */ + cnt_val_u32 = S32_MAX; + } else { + cnt_val_u32 = (u32)dur_val; + } reg_val_u32 = GLB_PWROFF_TIMER_TIMEOUT_SET(0, cnt_val_u32); /* add the source flag */ - if (src_system_timestamp) - reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_PWROFF_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP); - else - reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET(reg_val_u32, - GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER); + reg_val_u32 = GLB_PWROFF_TIMER_TIMER_SOURCE_SET( + reg_val_u32, + (src_system_timestamp ? GLB_PWROFF_TIMER_TIMER_SOURCE_SYSTEM_TIMESTAMP : + GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER)); return reg_val_u32; } u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev) { - return kbdev->csf.mcu_core_pwroff_dur_us; + u32 pwroff; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + pwroff = kbdev->csf.mcu_core_pwroff_dur_ns; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return pwroff; } -u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur) +u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur_ns) { unsigned long flags; - const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur); + u32 modifier = 0; + + const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur_ns, &modifier); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - kbdev->csf.mcu_core_pwroff_dur_us = dur; + kbdev->csf.mcu_core_pwroff_dur_ns = dur_ns; kbdev->csf.mcu_core_pwroff_dur_count = pwroff; + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - dev_dbg(kbdev->dev, "MCU Core Poweroff input update: 0x%.8x", pwroff); + dev_dbg(kbdev->dev, "MCU shader Core Poweroff input update: 0x%.8x", pwroff); return pwroff; } +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev) +{ + return kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS); +} + /** * kbase_device_csf_iterator_trace_init - Send request to enable iterator * trace port. @@ -1838,19 +2327,25 @@ u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 static int kbase_device_csf_iterator_trace_init(struct kbase_device *kbdev) { /* Enable the iterator trace port if supported by the GPU. - * It requires the GPU to have a nonzero "iter_trace_enable" + * It requires the GPU to have a nonzero "iter-trace-enable" * property in the device tree, and the FW must advertise * this feature in GLB_FEATURES. 
*/ if (kbdev->pm.backend.gpu_powered) { - /* check device tree for iterator trace enable property */ + /* check device tree for iterator trace enable property + * and fallback to "iter_trace_enable" if it is not found + */ const void *iter_trace_param = of_get_property( kbdev->dev->of_node, - "iter_trace_enable", NULL); + "iter-trace-enable", NULL); const struct kbase_csf_global_iface *iface = &kbdev->csf.global_iface; + if (!iter_trace_param) + iter_trace_param = + of_get_property(kbdev->dev->of_node, "iter_trace_enable", NULL); + if (iter_trace_param) { u32 iter_trace_value = be32_to_cpup(iter_trace_param); @@ -1889,50 +2384,105 @@ static int kbase_device_csf_iterator_trace_init(struct kbase_device *kbdev) return 0; } +static void coredump_worker(struct work_struct *data) +{ + struct kbase_device *kbdev = container_of(data, struct kbase_device, csf.coredump_work); + + kbasep_platform_event_core_dump(kbdev, "GPU hang"); +} + int kbase_csf_firmware_early_init(struct kbase_device *kbdev) { + u32 modifier = 0; + init_waitqueue_head(&kbdev->csf.event_wait); kbdev->csf.interrupt_received = false; kbdev->csf.fw_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_TIMEOUT); - kbdev->csf.gpu_idle_hysteresis_ms = FIRMWARE_IDLE_HYSTERESIS_TIME_MS; -#ifdef KBASE_PM_RUNTIME - if (kbase_pm_gpu_sleep_allowed(kbdev)) - kbdev->csf.gpu_idle_hysteresis_ms /= - FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; -#endif - WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ms); - kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( - kbdev, kbdev->csf.gpu_idle_hysteresis_ms); - - kbdev->csf.mcu_core_pwroff_dur_us = DEFAULT_GLB_PWROFF_TIMEOUT_US; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Set to the lowest possible value for FW to immediately write + * to the power off register to disable the cores. 
+ */ + kbdev->csf.mcu_core_pwroff_dur_count = 1; +#else + kbdev->csf.mcu_core_pwroff_dur_ns = DEFAULT_GLB_PWROFF_TIMEOUT_NS; kbdev->csf.mcu_core_pwroff_dur_count = convert_dur_to_core_pwroff_count( - kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_US); + kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS, &modifier); + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; +#endif + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); INIT_LIST_HEAD(&kbdev->csf.firmware_interfaces); INIT_LIST_HEAD(&kbdev->csf.firmware_config); INIT_LIST_HEAD(&kbdev->csf.firmware_timeline_metadata); INIT_LIST_HEAD(&kbdev->csf.firmware_trace_buffers.list); + INIT_LIST_HEAD(&kbdev->csf.user_reg.list); INIT_WORK(&kbdev->csf.firmware_reload_work, kbase_csf_firmware_reload_worker); INIT_WORK(&kbdev->csf.fw_error_work, firmware_error_worker); + INIT_WORK(&kbdev->csf.coredump_work, coredump_worker); + init_rwsem(&kbdev->csf.pmode_sync_sem); mutex_init(&kbdev->csf.reg_lock); + kbase_csf_pending_gpuq_kicks_init(kbdev); + + kbdev->csf.fw = (struct kbase_csf_mcu_fw){ .data = NULL }; + + return 0; +} + +void kbase_csf_firmware_early_term(struct kbase_device *kbdev) +{ + kbase_csf_pending_gpuq_kicks_term(kbdev); + mutex_destroy(&kbdev->csf.reg_lock); +} + +int kbase_csf_firmware_late_init(struct kbase_device *kbdev) +{ + u32 modifier = 0; + + kbdev->csf.gpu_idle_hysteresis_ns = FIRMWARE_IDLE_HYSTERESIS_TIME_NS; + +#ifdef KBASE_PM_RUNTIME + if (kbase_pm_gpu_sleep_allowed(kbdev)) + kbdev->csf.gpu_idle_hysteresis_ns /= FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; +#endif + WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ns); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( + kbdev, MALI_HOST_CONTROLS_SC_RAILS_IDLE_TIMER_NS, &modifier); + + /* Set to the lowest possible value for FW to immediately write + * to the power off register to disable the cores. + */ + kbdev->csf.mcu_core_pwroff_dur_count = 1; +#else + kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( + kbdev, kbdev->csf.gpu_idle_hysteresis_ns, &modifier); + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbdev->csf.mcu_core_pwroff_dur_ns = DEFAULT_GLB_PWROFF_TIMEOUT_NS; + kbdev->csf.mcu_core_pwroff_dur_count = convert_dur_to_core_pwroff_count( + kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS, &modifier); + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; +#endif return 0; } -int kbase_csf_firmware_init(struct kbase_device *kbdev) +int kbase_csf_firmware_load_init(struct kbase_device *kbdev) { - const struct firmware *firmware; + const struct firmware *firmware = NULL; + struct kbase_csf_mcu_fw *const mcu_fw = &kbdev->csf.fw; const u32 magic = FIRMWARE_HEADER_MAGIC; u8 version_major, version_minor; u32 version_hash; u32 entry_end_offset; u32 entry_offset; int ret; + const char *fw_name = default_fw_name; lockdep_assert_held(&kbdev->fw_load_lock); @@ -1953,51 +2503,95 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) if (ret != 0) { dev_err(kbdev->dev, "Failed to setup the rb tree for managing shared interface segment\n"); - goto error; + goto err_out; + } + +#if IS_ENABLED(CONFIG_OF) + /* If we can't read CSF firmware name from DTB, + * fw_name is not modified and remains the default. + */ + ret = of_property_read_string(kbdev->dev->of_node, "firmware-name", &fw_name); + if (ret == -EINVAL) { + /* Property doesn't exist in DTB, and fw_name already points to default FW name + * so just reset return value and continue. 
+ */ + ret = 0; + } else if (ret == -ENODATA) { + dev_warn(kbdev->dev, + "\"firmware-name\" DTB property contains no data, using default FW name"); + /* Reset return value so FW does not fail to load */ + ret = 0; + } else if (ret == -EILSEQ) { + /* This is reached when the size of the fw_name buffer is too small for the string + * stored in the DTB and the null terminator. + */ + dev_warn(kbdev->dev, + "\"firmware-name\" DTB property value too long, using default FW name."); + /* Reset return value so FW does not fail to load */ + ret = 0; } +#endif /* IS_ENABLED(CONFIG_OF) */ + if (request_firmware(&firmware, fw_name, kbdev->dev) != 0) { dev_err(kbdev->dev, "Failed to load firmware image '%s'\n", fw_name); ret = -ENOENT; - goto error; + } else { + /* Try to save a copy and then release the loaded firmware image */ + mcu_fw->size = firmware->size; + mcu_fw->data = vmalloc((unsigned long)mcu_fw->size); + + if (mcu_fw->data == NULL) { + ret = -ENOMEM; + } else { + memcpy(mcu_fw->data, firmware->data, mcu_fw->size); + dev_dbg(kbdev->dev, "Firmware image (%zu-bytes) retained in csf.fw\n", + mcu_fw->size); + } + + release_firmware(firmware); } - if (firmware->size < FIRMWARE_HEADER_LENGTH) { + /* If error in loading or saving the image, branches to error out */ + if (ret) + goto err_out; + + if (mcu_fw->size < FIRMWARE_HEADER_LENGTH) { dev_err(kbdev->dev, "Firmware too small\n"); ret = -EINVAL; - goto error; + goto err_out; } - if (memcmp(firmware->data, &magic, sizeof(magic)) != 0) { + if (memcmp(mcu_fw->data, &magic, sizeof(magic)) != 0) { dev_err(kbdev->dev, "Incorrect firmware magic\n"); ret = -EINVAL; - goto error; + goto err_out; } - version_minor = firmware->data[4]; - version_major = firmware->data[5]; + version_minor = mcu_fw->data[4]; + version_major = mcu_fw->data[5]; - if (version_major != FIRMWARE_HEADER_VERSION) { + if (version_major != FIRMWARE_HEADER_VERSION_MAJOR || + version_minor != FIRMWARE_HEADER_VERSION_MINOR) { dev_err(kbdev->dev, "Firmware header version %d.%d not understood\n", version_major, version_minor); ret = -EINVAL; - goto error; + goto err_out; } - memcpy(&version_hash, &firmware->data[8], sizeof(version_hash)); + memcpy(&version_hash, &mcu_fw->data[8], sizeof(version_hash)); dev_notice(kbdev->dev, "Loading Mali firmware 0x%x", version_hash); - memcpy(&entry_end_offset, &firmware->data[0x10], - sizeof(entry_end_offset)); + memcpy(&entry_end_offset, &mcu_fw->data[0x10], sizeof(entry_end_offset)); - if (entry_end_offset > firmware->size) { + if (entry_end_offset > mcu_fw->size) { dev_err(kbdev->dev, "Firmware image is truncated\n"); ret = -EINVAL; - goto error; + goto err_out; } entry_offset = FIRMWARE_HEADER_LENGTH; @@ -2005,15 +2599,14 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) u32 header; unsigned int size; - memcpy(&header, &firmware->data[entry_offset], sizeof(header)); + memcpy(&header, &mcu_fw->data[entry_offset], sizeof(header)); size = entry_size(header); - ret = load_firmware_entry(kbdev, firmware, entry_offset, - header); + ret = load_firmware_entry(kbdev, mcu_fw, entry_offset, header); if (ret != 0) { dev_err(kbdev->dev, "Failed to load firmware image\n"); - goto error; + goto err_out; } entry_offset += size; } @@ -2021,75 +2614,104 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) if (!kbdev->csf.shared_interface) { dev_err(kbdev->dev, "Shared interface region not found\n"); ret = -EINVAL; - goto error; + goto err_out; } else { ret = setup_shared_iface_static_region(kbdev); if (ret != 0) { dev_err(kbdev->dev, "Failed to 
insert a region for shared iface entry parsed from fw image\n"); - goto error; + goto err_out; } } ret = kbase_csf_firmware_trace_buffers_init(kbdev); if (ret != 0) { dev_err(kbdev->dev, "Failed to initialize trace buffers\n"); + goto err_out; + } + + ret = kbasep_platform_fw_config_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to perform platform specific FW configuration"); + goto err_out; + } + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + ret = kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to enable SC PM WA"); goto error; } +#endif + + ret = kbase_csf_firmware_cfg_fw_wa_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to initialize firmware workarounds"); + goto err_out; + } /* Make sure L2 cache is powered up */ kbase_pm_wait_for_l2_powered(kbdev); /* Load the MMU tables into the selected address space */ - load_mmu_tables(kbdev); + ret = load_mmu_tables(kbdev); + if (ret != 0) + goto err_out; boot_csf_firmware(kbdev); ret = parse_capabilities(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_doorbell_mapping_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_scheduler_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_setup_dummy_user_reg_page(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_csf_timeout_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = global_init_on_boot(kbdev); if (ret != 0) - goto error; + goto err_out; + + ret = kbase_csf_firmware_log_init(kbdev); + if (ret != 0) { + dev_err(kbdev->dev, "Failed to initialize FW trace (err %d)", ret); + goto err_out; + } ret = kbase_csf_firmware_cfg_init(kbdev); if (ret != 0) - goto error; + goto err_out; ret = kbase_device_csf_iterator_trace_init(kbdev); if (ret != 0) - goto error; + goto err_out; + + if (kbdev->csf.fw_core_dump.available) + kbase_csf_firmware_core_dump_init(kbdev); - /* Firmware loaded successfully */ - release_firmware(firmware); - KBASE_KTRACE_ADD(kbdev, FIRMWARE_BOOT, NULL, + /* Firmware loaded successfully, ret = 0 */ + KBASE_KTRACE_ADD(kbdev, CSF_FIRMWARE_BOOT, NULL, (((u64)version_hash) << 32) | (((u64)version_major) << 8) | version_minor); return 0; -error: - kbase_csf_firmware_term(kbdev); - release_firmware(firmware); +err_out: + kbase_csf_firmware_unload_term(kbdev); return ret; } -void kbase_csf_firmware_term(struct kbase_device *kbdev) +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev) { unsigned long flags; int ret = 0; @@ -2102,6 +2724,8 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) kbase_csf_firmware_cfg_term(kbdev); + kbase_csf_firmware_log_term(kbdev); + kbase_csf_timeout_term(kbdev); kbase_csf_free_dummy_user_reg_page(kbdev); @@ -2129,6 +2753,8 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) unload_mmu_tables(kbdev); + kbase_csf_firmware_cfg_fw_wa_term(kbdev); + kbase_csf_firmware_trace_buffers_term(kbdev); while (!list_empty(&kbdev->csf.firmware_interfaces)) { @@ -2175,19 +2801,137 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) kfree(metadata); } + if (kbdev->csf.fw.data) { + /* Free the copy of the firmware image */ + vfree(kbdev->csf.fw.data); + kbdev->csf.fw.data = NULL; + dev_dbg(kbdev->dev, "Free retained image csf.fw (%zu-bytes)\n", kbdev->csf.fw.size); + } + /* This will also free up the region allocated for the shared interface * entry parsed from the firmware image. 
*/ kbase_mcu_shared_interface_region_tracker_term(kbdev); - mutex_destroy(&kbdev->csf.reg_lock); - kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); /* Release the address space */ kbdev->as_free |= MCU_AS_BITMASK; } +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +int kbase_csf_firmware_mcu_register_write(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const reg_val) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + unsigned long flags; + int err; + u32 glb_req; + + mutex_lock(&kbdev->csf.reg_lock); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Set the address and value to write */ + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN0, reg_addr); + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN1, reg_val); + + /* Set the Global Debug request for FW MCU write */ + glb_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_req ^= GLB_DEBUG_REQ_FW_AS_WRITE_MASK; + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_req, + GLB_DEBUG_REQ_FW_AS_WRITE_MASK); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); + + /* Notify FW about the Global Debug request */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + err = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + + mutex_unlock(&kbdev->csf.reg_lock); + + dev_dbg(kbdev->dev, "w: reg %08x val %08x", reg_addr, reg_val); + + return err; +} + +int kbase_csf_firmware_mcu_register_read(struct kbase_device *const kbdev, u32 const reg_addr, + u32 *reg_val) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + unsigned long flags; + int err; + u32 glb_req; + + if (WARN_ON(reg_val == NULL)) + return -EINVAL; + + mutex_lock(&kbdev->csf.reg_lock); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Set the address to read */ + kbase_csf_firmware_global_input(global_iface, GLB_DEBUG_ARG_IN0, reg_addr); + + /* Set the Global Debug request for FW MCU read */ + glb_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_req ^= GLB_DEBUG_REQ_FW_AS_READ_MASK; + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_req, + GLB_DEBUG_REQ_FW_AS_READ_MASK); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); + + /* Notify FW about the Global Debug request */ + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + err = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + + if (!err) { + kbase_csf_scheduler_spin_lock(kbdev, &flags); + *reg_val = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ARG_OUT0); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + + mutex_unlock(&kbdev->csf.reg_lock); + + dev_dbg(kbdev->dev, "r: reg %08x val %08x", reg_addr, *reg_val); + + return err; +} + +int kbase_csf_firmware_mcu_register_poll(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const val_mask, u32 const reg_val) +{ + unsigned long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms) + jiffies; + u32 read_val; + + dev_dbg(kbdev->dev, "p: reg %08x val %08x mask %08x", reg_addr, reg_val, val_mask); + + while (time_before(jiffies, remaining)) { + int err = kbase_csf_firmware_mcu_register_read(kbdev, reg_addr, &read_val); + + if (err) { + dev_err(kbdev->dev, + "Error reading MCU register value (read_val = %u, expect = %u)\n", + read_val, reg_val); + return err; + } + + if ((read_val & val_mask) == reg_val) + return 0; + 
} + + dev_err(kbdev->dev, + "Timeout waiting for MCU register value to be set (read_val = %u, expect = %u)\n", + read_val, reg_val); + + return -ETIMEDOUT; +} +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ + void kbase_csf_firmware_enable_gpu_idle_timer(struct kbase_device *kbdev) { struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; @@ -2233,10 +2977,11 @@ void kbase_csf_firmware_ping(struct kbase_device *const kbdev) kbase_csf_scheduler_spin_unlock(kbdev, flags); } -int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev) +int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev, unsigned int wait_timeout_ms) { kbase_csf_firmware_ping(kbdev); - return wait_for_global_request(kbdev, GLB_REQ_PING_MASK); + + return wait_for_global_request_with_timeout(kbdev, GLB_REQ_PING_MASK, wait_timeout_ms); } int kbase_csf_firmware_set_timeout(struct kbase_device *const kbdev, @@ -2275,16 +3020,52 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev) kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) { - int err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); + int err; + + err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); + + if (!err) { +#define WAIT_TIMEOUT 5000 /* 50ms timeout */ +#define DELAY_TIME_IN_US 10 + const int max_iterations = WAIT_TIMEOUT; + int loop; - if (err) { - if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) + /* Wait for the GPU to actually enter protected mode */ + for (loop = 0; loop < max_iterations; loop++) { + unsigned long flags; + bool pmode_exited; + + if (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)) & + GPU_STATUS_PROTECTED_MODE_ACTIVE) + break; + + /* Check if GPU already exited the protected mode */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + pmode_exited = + !kbase_csf_scheduler_protected_mode_in_use(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + if (pmode_exited) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_err(kbdev->dev, "Timeout for actual pmode entry after PROTM_ENTER ack"); + err = -ETIMEDOUT; + } + } + + if (unlikely(err)) { + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) kbase_reset_gpu(kbdev); } KBASE_TLSTREAM_AUX_PROTECTED_ENTER_END(kbdev, kbdev); + + return err; } void kbase_csf_firmware_trigger_mcu_halt(struct kbase_device *kbdev) @@ -2348,7 +3129,9 @@ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev) /* Ensure GPU is powered-up until we complete config update.*/ kbase_csf_scheduler_pm_active(kbdev); - kbase_csf_scheduler_wait_mcu_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); + if (err) + goto exit; /* The 'reg_lock' is also taken and is held till the update is * complete, to ensure the config update gets serialized. 
@@ -2365,6 +3148,7 @@ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev) GLB_REQ_FIRMWARE_CONFIG_UPDATE_MASK); mutex_unlock(&kbdev->csf.reg_lock); +exit: kbase_csf_scheduler_pm_idle(kbdev); return err; } @@ -2488,7 +3272,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_prot = KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); cpu_map_prot = pgprot_writecombine(cpu_map_prot); - }; + } phys = kmalloc_array(num_pages, sizeof(*phys), GFP_KERNEL); if (!phys) @@ -2498,9 +3282,8 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!page_list) goto page_list_alloc_error; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - num_pages, phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret <= 0) goto phys_mem_pool_alloc_error; @@ -2511,8 +3294,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!cpu_addr) goto vmap_error; - va_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); + va_reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, num_pages); if (!va_reg) goto va_region_alloc_error; @@ -2526,9 +3308,9 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_properties &= (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR); gpu_map_properties |= gpu_map_prot; - ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - va_reg->start_pfn, &phys[0], num_pages, - gpu_map_properties, KBASE_MEM_GROUP_CSF_FW); + ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, va_reg->start_pfn, + &phys[0], num_pages, gpu_map_properties, + KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret) goto mmu_insert_pages_error; diff --git a/mali_kbase/csf/mali_kbase_csf_firmware.h b/mali_kbase/csf/mali_kbase_csf_firmware.h index 74bae39..15d7b58 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware.h +++ b/mali_kbase/csf/mali_kbase_csf_firmware.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,7 +56,7 @@ #define CSF_NUM_DOORBELL ((u8)24) /* Offset to the first HW doorbell page */ -#define CSF_HW_DOORBELL_PAGE_OFFSET ((u32)0x80000) +#define CSF_HW_DOORBELL_PAGE_OFFSET ((u32)DOORBELLS_BASE) /* Size of HW Doorbell page, used to calculate the offset to subsequent pages */ #define CSF_HW_DOORBELL_PAGE_SIZE ((u32)0x10000) @@ -78,6 +78,13 @@ /* MAX_SUPPORTED_STREAMS_PER_GROUP: Maximum CSs per csg. */ #define MAX_SUPPORTED_STREAMS_PER_GROUP 32 +#define BUILD_INFO_METADATA_SIZE_OFFSET (0x4) +#define BUILD_INFO_GIT_SHA_LEN (40U) +#define BUILD_INFO_GIT_DIRTY_LEN (1U) +#define BUILD_INFO_GIT_SHA_PATTERN "git_sha: " + +extern char fw_git_sha[BUILD_INFO_GIT_SHA_LEN]; + struct kbase_device; @@ -324,24 +331,13 @@ u32 kbase_csf_firmware_global_input_read( u32 kbase_csf_firmware_global_output( const struct kbase_csf_global_iface *iface, u32 offset); -/* Calculate the offset to the Hw doorbell page corresponding to the - * doorbell number. 
+/** + * kbase_csf_ring_doorbell() - Ring the doorbell + * + * @kbdev: An instance of the GPU platform device + * @doorbell_nr: Index of the HW doorbell page */ -static u32 csf_doorbell_offset(int doorbell_nr) -{ - WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); - - return CSF_HW_DOORBELL_PAGE_OFFSET + - (doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE); -} - -static inline void kbase_csf_ring_doorbell(struct kbase_device *kbdev, - int doorbell_nr) -{ - WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); - - kbase_reg_write(kbdev, csf_doorbell_offset(doorbell_nr), (u32)1); -} +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr); /** * kbase_csf_read_firmware_memory - Read a value in a GPU address @@ -374,7 +370,45 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, u32 gpu_addr, u32 value); /** - * kbase_csf_firmware_early_init() - Early initializatin for the firmware. + * kbase_csf_read_firmware_memory_exe - Read a value in a GPU address in the + * region of its final execution location. + * + * @kbdev: Device pointer + * @gpu_addr: GPU address to read + * @value: Output pointer to which the read value will be written + * + * This function read a value in a GPU address that belongs to a private loaded + * firmware memory region based on its final execution location. The function + * assumes that the location is not permanently mapped on the CPU address space, + * therefore it maps it and then unmaps it to access it independently. This function + * needs to be used when accessing firmware memory regions which will be moved to + * their final execution location during firmware boot using an address based on the + * final execution location. + */ +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value); + +/** + * kbase_csf_update_firmware_memory_exe - Write a value in a GPU address in the + * region of its final execution location. + * + * @kbdev: Device pointer + * @gpu_addr: GPU address to write + * @value: Value to write + * + * This function writes a value in a GPU address that belongs to a private loaded + * firmware memory region based on its final execution location. The function + * assumes that the location is not permanently mapped on the CPU address space, + * therefore it maps it and then unmaps it to access it independently. This function + * needs to be used when accessing firmware memory regions which will be moved to + * their final execution location during firmware boot using an address based on the + * final execution location. + */ +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value); + +/** + * kbase_csf_firmware_early_init() - Early initialization for the firmware. * @kbdev: Kbase device * * Initialize resources related to the firmware. Must be called at kbase probe. @@ -384,22 +418,87 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, int kbase_csf_firmware_early_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_init() - Load the firmware for the CSF MCU + * kbase_csf_firmware_early_term() - Terminate resources related to the firmware + * after the firmware unload has been done. + * + * @kbdev: Device pointer + * + * This should be called only when kbase probe fails or gets rmmoded. + */ +void kbase_csf_firmware_early_term(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_late_init() - Late initialization for the firmware. + * @kbdev: Kbase device + * + * Initialize resources related to the firmware. 
But must be called after + * backend late init is done. Must be used at probe time only. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_late_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_load_init() - Load the firmware for the CSF MCU * @kbdev: Kbase device * * Request the firmware from user space and load it into memory. * * Return: 0 if successful, negative error code on failure */ -int kbase_csf_firmware_init(struct kbase_device *kbdev); +int kbase_csf_firmware_load_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_term() - Unload the firmware + * kbase_csf_firmware_unload_term() - Unload the firmware * @kbdev: Kbase device * - * Frees the memory allocated by kbase_csf_firmware_init() + * Frees the memory allocated by kbase_csf_firmware_load_init() + */ +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev); + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) +/** + * kbase_csf_firmware_mcu_register_write - Write to MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to write into + * @reg_val: Value to be written + * + * Write a desired value to a register in MCU address space. + * + * return: 0 on success, or negative on failure. + */ +int kbase_csf_firmware_mcu_register_write(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const reg_val); +/** + * kbase_csf_firmware_mcu_register_read - Read from MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to read from + * @reg_val: Value as present in reg_addr register + * + * Read a value from MCU address space. + * + * return: 0 on success, or negative on failure. + */ +int kbase_csf_firmware_mcu_register_read(struct kbase_device *const kbdev, u32 const reg_addr, + u32 *reg_val); + +/** + * kbase_csf_firmware_mcu_register_poll - Poll MCU register + * + * @kbdev: Instance of a gpu platform device that implements a csf interface. + * @reg_addr: Register address to read from + * @val_mask: Value to mask the read value for comparison + * @reg_val: Value to be compared against + * + * Continue to read a value from MCU address space until it matches given mask and value. + * + * return: 0 on success, or negative on failure. */ -void kbase_csf_firmware_term(struct kbase_device *kbdev); +int kbase_csf_firmware_mcu_register_poll(struct kbase_device *const kbdev, u32 const reg_addr, + u32 const val_mask, u32 const reg_val); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ /** * kbase_csf_firmware_ping - Send the ping request to firmware. @@ -414,13 +513,14 @@ void kbase_csf_firmware_ping(struct kbase_device *kbdev); * kbase_csf_firmware_ping_wait - Send the ping request to firmware and waits. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @wait_timeout_ms: Timeout to get the acknowledgment for PING request from FW. * * The function sends the ping request to firmware and waits to confirm it is * alive. * * Return: 0 on success, or negative on failure. */ -int kbase_csf_firmware_ping_wait(struct kbase_device *kbdev); +int kbase_csf_firmware_ping_wait(struct kbase_device *kbdev, unsigned int wait_timeout_ms); /** * kbase_csf_firmware_set_timeout - Set a hardware endpoint progress timeout. @@ -454,11 +554,13 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev); * * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
* - * This function needs to be called after kbase_csf_wait_protected_mode_enter() - * to wait for the protected mode entry to complete. GPU reset is triggered if + * This function needs to be called after kbase_csf_enter_protected_mode() to + * wait for the GPU to actually enter protected mode. GPU reset is triggered if * the wait is unsuccessful. + * + * Return: 0 on success, or negative on failure. */ -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev); +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev); static inline bool kbase_csf_firmware_mcu_halted(struct kbase_device *kbdev) { @@ -523,9 +625,9 @@ bool kbase_csf_firmware_is_mcu_in_sleep(struct kbase_device *kbdev); #endif /** - * kbase_trigger_firmware_reload - Trigger the reboot of MCU firmware, for the - * cold boot case firmware image would be - * reloaded from filesystem into memory. + * kbase_csf_firmware_trigger_reload() - Trigger the reboot of MCU firmware, for + * the cold boot case firmware image would + * be reloaded from filesystem into memory. * * @kbdev: Instance of a GPU platform device that implements a CSF interface. */ @@ -738,18 +840,18 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev); u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur); /** - * kbase_csf_firmware_get_mcu_core_pwroff_time - Get the MCU core power-off + * kbase_csf_firmware_get_mcu_core_pwroff_time - Get the MCU shader Core power-off * time value * * @kbdev: Instance of a GPU platform device that implements a CSF interface. * - * Return: the internally recorded MCU core power-off (nominal) value. The unit + * Return: the internally recorded MCU shader Core power-off (nominal) timeout value. The unit * of the value is in micro-seconds. */ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); /** - * kbase_csf_firmware_set_mcu_core_pwroff_time - Set the MCU core power-off + * kbase_csf_firmware_set_mcu_core_pwroff_time - Set the MCU shader Core power-off * time value * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -766,7 +868,7 @@ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); * returned value is the source configuration flag, and it is set to '1' * when CYCLE_COUNTER alternative source is used. * - * The configured MCU core power-off timer will only have effect when the host + * The configured MCU shader Core power-off timer will only have effect when the host * driver has delegated the shader cores' power management to MCU. * * Return: the actual internal core power-off timer value in register defined @@ -775,6 +877,22 @@ u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev); u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur); /** + * kbase_csf_firmware_reset_mcu_core_pwroff_time - Reset the MCU shader Core power-off + * time value + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Sets the MCU Shader Core power-off time value to the default. + * + * The configured MCU shader Core power-off timer will only have effect when the host + * driver has delegated the shader cores' power management to MCU. + * + * Return: the actual internal core power-off timer value in register defined + * format. 
+ */ +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev); + +/** * kbase_csf_interface_version - Helper function to build the full firmware * interface version in a format compatible with * GLB_VERSION register @@ -805,4 +923,27 @@ static inline u32 kbase_csf_interface_version(u32 major, u32 minor, u32 patch) * Return: 0 if success, or negative error code on failure. */ int kbase_csf_trigger_firmware_config_update(struct kbase_device *kbdev); + +/** + * kbase_csf_debug_dump_registers - Print CSF debug message. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Prints CSF debug message cccontaining critical CSF firmware information. + * GPU must be powered during this call. + */ +void kbase_csf_debug_dump_registers(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_req_core_dump - Request a firmware core dump + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Request a firmware core dump and wait for for firmware to acknowledge. + * Firmware will enter infinite loop after the firmware core dump is created. + * + * Return: 0 if success, or negative error code on failure. + */ +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev); + #endif diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c index b114817..48ddbb5 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,12 +20,23 @@ */ #include <mali_kbase.h> -#include "mali_kbase_csf_firmware_cfg.h" #include <mali_kbase_reset_gpu.h> +#include <linux/version.h> + +#include "mali_kbase_csf_firmware_cfg.h" +#include "mali_kbase_csf_firmware_log.h" #if CONFIG_SYSFS #define CSF_FIRMWARE_CFG_SYSFS_DIR_NAME "firmware_config" +#define CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME "Log verbosity" + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +#define HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME "Host controls SC rails" +#endif + +#define CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME "WA_CFG0" + /** * struct firmware_config - Configuration item within the MCU firmware * @@ -107,7 +118,7 @@ static ssize_t show_fw_cfg(struct kobject *kobj, return -EINVAL; } - return snprintf(buf, PAGE_SIZE, "%u\n", val); + return scnprintf(buf, PAGE_SIZE, "%u\n", val); } static ssize_t store_fw_cfg(struct kobject *kobj, @@ -124,7 +135,7 @@ static ssize_t store_fw_cfg(struct kobject *kobj, if (attr == &fw_cfg_attr_cur) { unsigned long flags; - u32 val; + u32 val, cur_val; int ret = kstrtouint(buf, 0, &val); if (ret) { @@ -135,11 +146,22 @@ static ssize_t store_fw_cfg(struct kobject *kobj, return -EINVAL; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (!strcmp(config->name, + HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME)) + return -EPERM; +#endif + if (!strcmp(config->name, + CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME)) + return -EPERM; + if ((val < config->min) || (val > config->max)) return -EINVAL; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - if (config->cur_val == val) { + + cur_val = config->cur_val; + if (cur_val == val) { spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); return count; } @@ -176,6 
+198,20 @@ static ssize_t store_fw_cfg(struct kobject *kobj, spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + /* Enable FW logging only if Log verbosity is non-zero */ + if (!strcmp(config->name, CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME) && + (!cur_val || !val)) { + ret = kbase_csf_firmware_log_toggle_logging_calls(kbdev, val); + if (ret) { + /* Undo FW configuration changes */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + config->cur_val = cur_val; + kbase_csf_update_firmware_memory(kbdev, config->address, cur_val); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + return ret; + } + } + /* If we can update the config without firmware reset then * we need to just trigger FIRMWARE_CONFIG_UPDATE. */ @@ -209,11 +245,18 @@ static struct attribute *fw_cfg_attrs[] = { &fw_cfg_attr_cur, NULL, }; +#if (KERNEL_VERSION(5, 2, 0) <= LINUX_VERSION_CODE) +ATTRIBUTE_GROUPS(fw_cfg); +#endif static struct kobj_type fw_cfg_kobj_type = { .release = &fw_cfg_kobj_release, .sysfs_ops = &fw_cfg_ops, +#if (KERNEL_VERSION(5, 2, 0) <= LINUX_VERSION_CODE) + .default_groups = fw_cfg_groups, +#else .default_attrs = fw_cfg_attrs, +#endif }; int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) @@ -236,6 +279,19 @@ int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) kbase_csf_read_firmware_memory(kbdev, config->address, &config->cur_val); + if (!strcmp(config->name, CSF_FIRMWARE_CFG_LOG_VERBOSITY_ENTRY_NAME) && + (config->cur_val)) { + err = kbase_csf_firmware_log_toggle_logging_calls(config->kbdev, + config->cur_val); + + if (err) { + kobject_put(&config->kobj); + dev_err(kbdev->dev, "Failed to enable logging (result: %d)", err); + return err; + } + } + + err = kobject_init_and_add(&config->kobj, &fw_cfg_kobj_type, kbdev->csf.fw_cfg_kobj, "%s", config->name); if (err) { @@ -273,9 +329,8 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev) } int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, - unsigned int size, bool updatable) + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size, bool updatable) { const char *name = (char *)&entry[3]; struct firmware_config *config; @@ -307,6 +362,108 @@ int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, return 0; } + +int kbase_csf_firmware_cfg_find_config_address(struct kbase_device *kbdev, const char *name, u32* addr) +{ + struct firmware_config *config; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, name) || !config->address) + continue; + + *addr = config->address; + return 0; + } + + return -ENOENT; +} + +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev) +{ + struct firmware_config *config; + + /* "quirks_ext" property is optional */ + if (!kbdev->csf.quirks_ext) + return 0; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, CSF_FIRMWARE_CFG_WA_CFG0_ENTRY_NAME)) + continue; + dev_info(kbdev->dev, "External quirks 0: 0x%08x", kbdev->csf.quirks_ext[0]); + kbase_csf_update_firmware_memory(kbdev, config->address, kbdev->csf.quirks_ext[0]); + return 0; + } + + return -ENOENT; +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) +{ + struct firmware_config *config; + + list_for_each_entry(config, &kbdev->csf.firmware_config, node) { + if (strcmp(config->name, + HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME)) + continue; + + 
kbase_csf_update_firmware_memory(kbdev, config->address, 1); + return 0; + } + + return -ENOENT; +} +#endif + +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev) +{ + int ret; + int entry_count; + size_t entry_bytes; + + /* "quirks-ext" property is optional and may have no value. + * Also try fallback "quirks_ext" property if it doesn't exist. + */ + entry_count = of_property_count_u32_elems(kbdev->dev->of_node, "quirks-ext"); + + if (entry_count == -EINVAL) + entry_count = of_property_count_u32_elems(kbdev->dev->of_node, "quirks_ext"); + + if (entry_count == -EINVAL || entry_count == -ENODATA) + return 0; + + entry_bytes = entry_count * sizeof(u32); + kbdev->csf.quirks_ext = kzalloc(entry_bytes, GFP_KERNEL); + if (!kbdev->csf.quirks_ext) + return -ENOMEM; + + ret = of_property_read_u32_array(kbdev->dev->of_node, "quirks-ext", kbdev->csf.quirks_ext, + entry_count); + + if (ret == -EINVAL) + ret = of_property_read_u32_array(kbdev->dev->of_node, "quirks_ext", + kbdev->csf.quirks_ext, entry_count); + + if (ret == -EINVAL || ret == -ENODATA) { + /* This is unexpected since the property is already accessed for counting the number + * of its elements. + */ + dev_err(kbdev->dev, "\"quirks_ext\" DTB property data read failed"); + return ret; + } + if (ret == -EOVERFLOW) { + dev_err(kbdev->dev, "\"quirks_ext\" DTB property data size exceeds 32 bits"); + return ret; + } + + return kbase_csf_firmware_cfg_fw_wa_enable(kbdev); +} + +void kbase_csf_firmware_cfg_fw_wa_term(struct kbase_device *kbdev) +{ + kfree(kbdev->csf.quirks_ext); +} + #else int kbase_csf_firmware_cfg_init(struct kbase_device *kbdev) { @@ -319,9 +476,27 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev) } int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, unsigned int size) + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size) +{ + return 0; +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) { return 0; } +#endif + +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev) +{ + return 0; +} + +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev) +{ + return 0; +} + #endif /* CONFIG_SYSFS */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h index c2d2fc5..f565290 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h +++ b/mali_kbase/csf/mali_kbase_csf_firmware_cfg.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,8 +67,67 @@ void kbase_csf_firmware_cfg_term(struct kbase_device *kbdev); * Return: 0 if successful, negative error code on failure */ int kbase_csf_firmware_cfg_option_entry_parse(struct kbase_device *kbdev, - const struct firmware *fw, - const u32 *entry, - unsigned int size, - bool updatable); + const struct kbase_csf_mcu_fw *const fw, + const u32 *entry, unsigned int size, bool updatable); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails() - Enable the config in FW to support + * Host based control of SC power rails + * + * Look for the config entry that enables support in FW for the Host based + * control of shader core power rails and set it before the intial boot + * or reload of firmware. + * + * @kbdev: Kbase device structure + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_enable_host_ctrl_sc_rails(struct kbase_device *kbdev); +#endif + +/** + * kbase_csf_firmware_cfg_find_config_address() - Get a FW config option address + * + * @kbdev: Kbase device structure + * @name: Name of cfg option to find + * @addr: Pointer to store the address + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_find_config_address(struct kbase_device *kbdev, const char *name, + u32 *addr); +/** + * kbase_csf_firmware_cfg_fw_wa_enable() - Enable firmware workarounds configuration. + * + * @kbdev: Kbase device structure + * + * Look for the config entry that enables support in FW for workarounds and set it according to + * the firmware workaround configuration before the initial boot or reload of firmware. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_fw_wa_enable(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_cfg_fw_wa_init() - Initialize firmware workarounds configuration. + * + * @kbdev: Kbase device structure + * + * Retrieve and save the firmware workarounds configuration from device-tree "quirks_ext" property. + * Then, look for the config entry that enables support in FW for workarounds and set it according + * to the configuration before the initial firmware boot. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_cfg_fw_wa_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_cfg_fw_wa_term - Delete local cache for firmware workarounds configuration. + * + * @kbdev: Pointer to the Kbase device + * + */ +void kbase_csf_firmware_cfg_fw_wa_term(struct kbase_device *kbdev); + #endif /* _KBASE_CSF_FIRMWARE_CFG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c new file mode 100644 index 0000000..e371db2 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.c @@ -0,0 +1,833 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <linux/kernel.h> +#include <linux/device.h> +#include <linux/list.h> +#include <linux/file.h> +#include <linux/elf.h> +#include <linux/elfcore.h> +#include <linux/version_compat_defs.h> + +#include "mali_kbase.h" +#include "mali_kbase_csf_firmware_core_dump.h" +#include "backend/gpu/mali_kbase_pm_internal.h" + +/* + * FW image header core dump data format supported. + * Currently only version 0.1 is supported. + */ +#define FW_CORE_DUMP_DATA_VERSION_MAJOR 0 +#define FW_CORE_DUMP_DATA_VERSION_MINOR 1 + +/* Full version of the image header core dump data format */ +#define FW_CORE_DUMP_DATA_VERSION \ + ((FW_CORE_DUMP_DATA_VERSION_MAJOR << 8) | FW_CORE_DUMP_DATA_VERSION_MINOR) + +/* Validity flag to indicate if the MCU registers in the buffer are valid */ +#define FW_MCU_STATUS_MASK 0x1 +#define FW_MCU_STATUS_VALID (1 << 0) + +/* Core dump entry fields */ +#define FW_CORE_DUMP_VERSION_INDEX 0 +#define FW_CORE_DUMP_START_ADDR_INDEX 1 + +/* MCU registers stored by a firmware core dump */ +struct fw_core_dump_mcu { + u32 r0; + u32 r1; + u32 r2; + u32 r3; + u32 r4; + u32 r5; + u32 r6; + u32 r7; + u32 r8; + u32 r9; + u32 r10; + u32 r11; + u32 r12; + u32 sp; + u32 lr; + u32 pc; +}; + +/* Any ELF definitions used in this file are from elf.h/elfcore.h except + * when specific 32-bit versions are required (mainly for the + * ELF_PRSTATUS32 note that is used to contain the MCU registers). + */ + +/* - 32-bit version of timeval structures used in ELF32 PRSTATUS note. */ +struct prstatus32_timeval { + int tv_sec; + int tv_usec; +}; + +/* - Structure defining ELF32 PRSTATUS note contents, as defined by the + * GNU binutils BFD library used by GDB, in bfd/hosts/x86-64linux.h. + * Note: GDB checks for the size of this structure to be 0x94. + * Modified pr_reg (array containing the Arm 32-bit MCU registers) to + * use u32[18] instead of elf_gregset32_t to prevent introducing new typedefs. + */ +struct elf_prstatus32 { + struct elf_siginfo pr_info; /* Info associated with signal. */ + short int pr_cursig; /* Current signal. */ + unsigned int pr_sigpend; /* Set of pending signals. */ + unsigned int pr_sighold; /* Set of held signals. */ + pid_t pr_pid; + pid_t pr_ppid; + pid_t pr_pgrp; + pid_t pr_sid; + struct prstatus32_timeval pr_utime; /* User time. */ + struct prstatus32_timeval pr_stime; /* System time. */ + struct prstatus32_timeval pr_cutime; /* Cumulative user time. */ + struct prstatus32_timeval pr_cstime; /* Cumulative system time. */ + u32 pr_reg[18]; /* GP registers. */ + int pr_fpvalid; /* True if math copro being used. */ +}; + +/* + * struct fw_core_dump_seq_off - Iterator for seq_file operations used on 'fw_core_dump' + * debugfs file. + * @interface: current firmware memory interface + * @page_num: current page number (0..) within @interface + */ +struct fw_core_dump_seq_off { + struct kbase_csf_firmware_interface *interface; + u32 page_num; +}; + +/** + * fw_get_core_dump_mcu - Get the MCU registers saved by a firmware core dump + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * @regs: Pointer to a core dump mcu struct where the MCU registers are copied + * to. Should be allocated by the called. + * + * Return: 0 if successfully copied the MCU registers, negative error code otherwise. + */ +static int fw_get_core_dump_mcu(struct kbase_device *kbdev, struct fw_core_dump_mcu *regs) +{ + unsigned int i; + u32 status = 0; + u32 data_addr = kbdev->csf.fw_core_dump.mcu_regs_addr; + u32 *data = (u32 *)regs; + + /* Check if the core dump entry exposed the buffer */ + if (!regs || !kbdev->csf.fw_core_dump.available) + return -EPERM; + + /* Check if the data in the buffer is valid, if not, return error */ + kbase_csf_read_firmware_memory(kbdev, data_addr, &status); + if ((status & FW_MCU_STATUS_MASK) != FW_MCU_STATUS_VALID) + return -EPERM; + + /* According to image header documentation, the MCU registers core dump + * buffer is 32-bit aligned. + */ + for (i = 1; i <= sizeof(struct fw_core_dump_mcu) / sizeof(u32); ++i) + kbase_csf_read_firmware_memory(kbdev, data_addr + i * sizeof(u32), &data[i - 1]); + + return 0; +} + +/** + * fw_core_dump_fill_elf_header - Initializes an ELF32 header + * @hdr: ELF32 header to initialize + * @sections: Number of entries in the ELF program header table + * + * Initializes an ELF32 header for an ARM 32-bit little-endian + * 'Core file' object file. + */ +static void fw_core_dump_fill_elf_header(struct elf32_hdr *hdr, unsigned int sections) +{ + /* Reset all members in header. */ + memset(hdr, 0, sizeof(*hdr)); + + /* Magic number identifying file as an ELF object. */ + memcpy(hdr->e_ident, ELFMAG, SELFMAG); + + /* Identify file as 32-bit, little-endian, using current + * ELF header version, with no OS or ABI specific ELF + * extensions used. + */ + hdr->e_ident[EI_CLASS] = ELFCLASS32; + hdr->e_ident[EI_DATA] = ELFDATA2LSB; + hdr->e_ident[EI_VERSION] = EV_CURRENT; + hdr->e_ident[EI_OSABI] = ELFOSABI_NONE; + + /* 'Core file' type of object file. */ + hdr->e_type = ET_CORE; + + /* ARM 32-bit architecture (AARCH32) */ + hdr->e_machine = EM_ARM; + + /* Object file version: the original format. */ + hdr->e_version = EV_CURRENT; + + /* Offset of program header table in file. */ + hdr->e_phoff = sizeof(struct elf32_hdr); + + /* No processor specific flags. */ + hdr->e_flags = 0; + + /* Size of the ELF header in bytes. */ + hdr->e_ehsize = sizeof(struct elf32_hdr); + + /* Size of the ELF program header entry in bytes. */ + hdr->e_phentsize = sizeof(struct elf32_phdr); + + /* Number of entries in the program header table. */ + hdr->e_phnum = sections; +} + +/** + * fw_core_dump_fill_elf_program_header_note - Initializes an ELF32 program header + * for holding auxiliary information + * @phdr: ELF32 program header + * @file_offset: Location of the note in the file in bytes + * @size: Size of the note in bytes. + * + * Initializes an ELF32 program header describing auxiliary information (containing + * one or more notes) of @size bytes alltogether located in the file at offset + * @file_offset. + */ +static void fw_core_dump_fill_elf_program_header_note(struct elf32_phdr *phdr, u32 file_offset, + u32 size) +{ + /* Auxiliary information (note) in program header. */ + phdr->p_type = PT_NOTE; + + /* Location of first note in file in bytes. */ + phdr->p_offset = file_offset; + + /* Size of all notes combined in bytes. */ + phdr->p_filesz = size; + + /* Other members not relevant for a note. 
*/ + phdr->p_vaddr = 0; + phdr->p_paddr = 0; + phdr->p_memsz = 0; + phdr->p_align = 0; + phdr->p_flags = 0; +} + +/** + * fw_core_dump_fill_elf_program_header - Initializes an ELF32 program header for a loadable segment + * @phdr: ELF32 program header to initialize. + * @file_offset: Location of loadable segment in file in bytes + * (aligned to FW_PAGE_SIZE bytes) + * @vaddr: 32-bit virtual address where to write the segment + * (aligned to FW_PAGE_SIZE bytes) + * @size: Size of the segment in bytes. + * @flags: CSF_FIRMWARE_ENTRY_* flags describing access permissions. + * + * Initializes an ELF32 program header describing a loadable segment of + * @size bytes located in the file at offset @file_offset to be loaded + * at virtual address @vaddr with access permissions as described by + * CSF_FIRMWARE_ENTRY_* flags in @flags. + */ +static void fw_core_dump_fill_elf_program_header(struct elf32_phdr *phdr, u32 file_offset, + u32 vaddr, u32 size, u32 flags) +{ + /* Loadable segment in program header. */ + phdr->p_type = PT_LOAD; + + /* Location of segment in file in bytes. Aligned to p_align bytes. */ + phdr->p_offset = file_offset; + + /* Virtual address of segment. Aligned to p_align bytes. */ + phdr->p_vaddr = vaddr; + + /* Physical address of segment. Not relevant. */ + phdr->p_paddr = 0; + + /* Size of segment in file and memory. */ + phdr->p_filesz = size; + phdr->p_memsz = size; + + /* Alignment of segment in the file and memory in bytes (integral power of 2). */ + phdr->p_align = FW_PAGE_SIZE; + + /* Set segment access permissions. */ + phdr->p_flags = 0; + if (flags & CSF_FIRMWARE_ENTRY_READ) + phdr->p_flags |= PF_R; + if (flags & CSF_FIRMWARE_ENTRY_WRITE) + phdr->p_flags |= PF_W; + if (flags & CSF_FIRMWARE_ENTRY_EXECUTE) + phdr->p_flags |= PF_X; +} + +/** + * fw_core_dump_get_prstatus_note_size - Calculates size of a ELF32 PRSTATUS note + * @name: Name given to the PRSTATUS note. + * + * Calculates the size of a 32-bit PRSTATUS note (which contains information + * about a process like the current MCU registers) taking into account + * @name must be padded to a 4-byte multiple. + * + * Return: size of 32-bit PRSTATUS note in bytes. + */ +static unsigned int fw_core_dump_get_prstatus_note_size(char *name) +{ + return sizeof(struct elf32_note) + roundup(strlen(name) + 1, 4) + + sizeof(struct elf_prstatus32); +} + +/** + * fw_core_dump_fill_elf_prstatus - Initializes an ELF32 PRSTATUS structure + * @prs: ELF32 PRSTATUS note to initialize + * @regs: MCU registers to copy into the PRSTATUS note + * + * Initializes an ELF32 PRSTATUS structure with MCU registers @regs. + * Other process information is N/A for CSF Firmware. + */ +static void fw_core_dump_fill_elf_prstatus(struct elf_prstatus32 *prs, + struct fw_core_dump_mcu *regs) +{ + /* Only fill in registers (32-bit) of PRSTATUS note. 
*/ + memset(prs, 0, sizeof(*prs)); + prs->pr_reg[0] = regs->r0; + prs->pr_reg[1] = regs->r1; + prs->pr_reg[2] = regs->r2; + prs->pr_reg[3] = regs->r3; + prs->pr_reg[4] = regs->r4; + prs->pr_reg[5] = regs->r5; + prs->pr_reg[6] = regs->r0; + prs->pr_reg[7] = regs->r7; + prs->pr_reg[8] = regs->r8; + prs->pr_reg[9] = regs->r9; + prs->pr_reg[10] = regs->r10; + prs->pr_reg[11] = regs->r11; + prs->pr_reg[12] = regs->r12; + prs->pr_reg[13] = regs->sp; + prs->pr_reg[14] = regs->lr; + prs->pr_reg[15] = regs->pc; +} + +/** + * fw_core_dump_create_prstatus_note - Creates an ELF32 PRSTATUS note + * @name: Name for the PRSTATUS note + * @prs: ELF32 PRSTATUS structure to put in the PRSTATUS note + * @created_prstatus_note: + * Pointer to the allocated ELF32 PRSTATUS note + * + * Creates an ELF32 note with one PRSTATUS entry containing the + * ELF32 PRSTATUS structure @prs. Caller needs to free the created note in + * @created_prstatus_note. + * + * Return: 0 on failure, otherwise size of ELF32 PRSTATUS note in bytes. + */ +static unsigned int fw_core_dump_create_prstatus_note(char *name, struct elf_prstatus32 *prs, + struct elf32_note **created_prstatus_note) +{ + struct elf32_note *note; + unsigned int note_name_sz; + unsigned int note_sz; + + /* Allocate memory for ELF32 note containing a PRSTATUS note. */ + note_name_sz = strlen(name) + 1; + note_sz = sizeof(struct elf32_note) + roundup(note_name_sz, 4) + + sizeof(struct elf_prstatus32); + note = kmalloc(note_sz, GFP_KERNEL); + if (!note) + return 0; + + /* Fill in ELF32 note with one entry for a PRSTATUS note. */ + note->n_namesz = note_name_sz; + note->n_descsz = sizeof(struct elf_prstatus32); + note->n_type = NT_PRSTATUS; + memcpy(note + 1, name, note_name_sz); + memcpy((char *)(note + 1) + roundup(note_name_sz, 4), prs, sizeof(*prs)); + + /* Return pointer and size of the created ELF32 note. */ + *created_prstatus_note = note; + return note_sz; +} + +/** + * fw_core_dump_write_elf_header - Writes ELF header for the FW core dump + * @m: the seq_file handle + * + * Writes the ELF header of the core dump including program headers for + * memory sections and a note containing the current MCU register + * values. + * + * Excludes memory sections without read access permissions or + * are for protected memory. + * + * The data written is as follows: + * - ELF header + * - ELF PHDRs for memory sections + * - ELF PHDR for program header NOTE + * - ELF PRSTATUS note + * - 0-bytes padding to multiple of ELF_EXEC_PAGESIZE + * + * The actual memory section dumps should follow this (not written + * by this function). + * + * Retrieves the necessary information via the struct + * fw_core_dump_data stored in the private member of the seq_file + * handle. + * + * Return: + * * 0 - success + * * -ENOMEM - not enough memory for allocating ELF32 note + */ +int fw_core_dump_write_elf_header(struct seq_file *m) +{ + struct elf32_hdr hdr; + struct elf32_phdr phdr; + struct fw_core_dump_data *dump_data = m->private; + struct kbase_device *const kbdev = dump_data->kbdev; + struct kbase_csf_firmware_interface *interface; + struct elf_prstatus32 elf_prs; + struct elf32_note *elf_prstatus_note; + unsigned int sections = 0; + unsigned int elf_prstatus_note_size; + u32 elf_prstatus_offset; + u32 elf_phdr_note_offset; + u32 elf_memory_sections_data_offset; + u32 total_pages = 0; + u32 padding_size, *padding; + struct fw_core_dump_mcu regs = { 0 }; + + CSTD_UNUSED(total_pages); + + /* Count number of memory sections. 
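For reference only (not part of the patch): a standalone user-space sketch of the ELF32 core layout that fw_core_dump_write_elf_header() above emits — an ELF header, a PT_NOTE program header and a PRSTATUS note whose name is padded to a 4-byte multiple, matching fw_core_dump_get_prstatus_note_size(). The 0x94-byte descriptor is a stand-in for the kernel-internal struct elf_prstatus32; a real dump additionally carries one PT_LOAD header per readable firmware section.

#include <elf.h>
#include <stdio.h>
#include <string.h>

#define ROUND4(x) (((x) + 3u) & ~3u)

int main(void)
{
	static const char name[] = "CORE";
	unsigned char desc[0x94] = { 0 };	/* stand-in for struct elf_prstatus32 (GDB expects 0x94 bytes) */
	char padded_name[ROUND4(sizeof(name))] = { 0 };
	Elf32_Ehdr ehdr;
	Elf32_Phdr phdr;
	Elf32_Nhdr nhdr;
	FILE *f = fopen("example-core.elf", "wb");

	if (!f)
		return 1;
	memcpy(padded_name, name, sizeof(name));

	/* ELF header: 32-bit little-endian Arm 'core file' with one program header. */
	memset(&ehdr, 0, sizeof(ehdr));
	memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
	ehdr.e_ident[EI_CLASS] = ELFCLASS32;
	ehdr.e_ident[EI_DATA] = ELFDATA2LSB;
	ehdr.e_ident[EI_VERSION] = EV_CURRENT;
	ehdr.e_ident[EI_OSABI] = ELFOSABI_NONE;
	ehdr.e_type = ET_CORE;
	ehdr.e_machine = EM_ARM;
	ehdr.e_version = EV_CURRENT;
	ehdr.e_phoff = sizeof(ehdr);
	ehdr.e_ehsize = sizeof(ehdr);
	ehdr.e_phentsize = sizeof(phdr);
	ehdr.e_phnum = 1;

	/* PT_NOTE program header: the note data directly follows the header table. */
	memset(&phdr, 0, sizeof(phdr));
	phdr.p_type = PT_NOTE;
	phdr.p_offset = sizeof(ehdr) + sizeof(phdr);
	phdr.p_filesz = sizeof(nhdr) + sizeof(padded_name) + sizeof(desc);

	/* PRSTATUS note: note header, name padded to 4 bytes, then the descriptor. */
	nhdr.n_namesz = sizeof(name);
	nhdr.n_descsz = sizeof(desc);
	nhdr.n_type = NT_PRSTATUS;

	fwrite(&ehdr, sizeof(ehdr), 1, f);
	fwrite(&phdr, sizeof(phdr), 1, f);
	fwrite(&nhdr, sizeof(nhdr), 1, f);
	fwrite(padded_name, sizeof(padded_name), 1, f);
	fwrite(desc, sizeof(desc), 1, f);
	fclose(f);

	printf("PRSTATUS note size: %zu bytes\n",
	       sizeof(nhdr) + sizeof(padded_name) + sizeof(desc));
	return 0;
}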
*/ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + sections++; + } + + /* Prepare ELF header. */ + fw_core_dump_fill_elf_header(&hdr, sections + 1); + seq_write(m, &hdr, sizeof(struct elf32_hdr)); + + elf_prstatus_note_size = fw_core_dump_get_prstatus_note_size("CORE"); + /* PHDRs of PT_LOAD type. */ + elf_phdr_note_offset = sizeof(struct elf32_hdr) + sections * sizeof(struct elf32_phdr); + /* PHDR of PT_NOTE type. */ + elf_prstatus_offset = elf_phdr_note_offset + sizeof(struct elf32_phdr); + elf_memory_sections_data_offset = elf_prstatus_offset + elf_prstatus_note_size; + + /* Calculate padding size to page offset. */ + padding_size = roundup(elf_memory_sections_data_offset, ELF_EXEC_PAGESIZE) - + elf_memory_sections_data_offset; + elf_memory_sections_data_offset += padding_size; + + /* Prepare ELF program header table. */ + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + fw_core_dump_fill_elf_program_header(&phdr, elf_memory_sections_data_offset, + interface->virtual, + interface->num_pages * FW_PAGE_SIZE, + interface->flags); + + seq_write(m, &phdr, sizeof(struct elf32_phdr)); + + elf_memory_sections_data_offset += interface->num_pages * FW_PAGE_SIZE; + total_pages += interface->num_pages; + } + + /* Prepare PHDR of PT_NOTE type. */ + fw_core_dump_fill_elf_program_header_note(&phdr, elf_prstatus_offset, + elf_prstatus_note_size); + seq_write(m, &phdr, sizeof(struct elf32_phdr)); + + /* Prepare ELF note of PRSTATUS type. */ + if (fw_get_core_dump_mcu(kbdev, ®s)) + dev_dbg(kbdev->dev, "MCU Registers not available, all registers set to zero"); + /* Even if MCU Registers are not available the ELF prstatus is still + * filled with the registers equal to zero. + */ + fw_core_dump_fill_elf_prstatus(&elf_prs, ®s); + elf_prstatus_note_size = + fw_core_dump_create_prstatus_note("CORE", &elf_prs, &elf_prstatus_note); + if (elf_prstatus_note_size == 0) + return -ENOMEM; + + seq_write(m, elf_prstatus_note, elf_prstatus_note_size); + kfree(elf_prstatus_note); + + /* Pad file to page size. */ + padding = kzalloc(padding_size, GFP_KERNEL); + seq_write(m, padding, padding_size); + kfree(padding); + + return 0; +} + +#define MAX_FW_CORE_DUMP_HEADER_SIZE (1 << 14) + +/** + * get_fw_core_dump_size - Get firmware core dump size + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: size on success, -1 otherwise. + */ +size_t get_fw_core_dump_size(struct kbase_device *kbdev) +{ + static char buffer[MAX_FW_CORE_DUMP_HEADER_SIZE]; + size_t size; + struct fw_core_dump_data private = {.kbdev = kbdev}; + struct seq_file m = {.private = &private, .buf = buffer, .size = MAX_FW_CORE_DUMP_HEADER_SIZE}; + struct kbase_csf_firmware_interface *interface; + + fw_core_dump_write_elf_header(&m); + if (unlikely(m.count >= m.size)) { + dev_warn(kbdev->dev, "firmware core dump header may be larger than buffer size"); + return -1; + } + size = m.count; + + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. 
*/ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + size += interface->num_pages * FW_PAGE_SIZE; + } + + return size; +} + +/** + * fw_core_dump_create - Requests firmware to save state for a firmware core dump + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: 0 on success, error code otherwise. + */ +int fw_core_dump_create(struct kbase_device *kbdev) +{ + int err; + + /* Ensure MCU is active before requesting the core dump. */ + kbase_csf_scheduler_pm_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); + if (!err) + err = kbase_csf_firmware_req_core_dump(kbdev); + + kbase_csf_scheduler_pm_idle(kbdev); + + return err; +} + +/** + * fw_core_dump_seq_start - seq_file start operation for firmware core dump file + * @m: the seq_file handle + * @_pos: holds the current position in pages + * (0 or most recent position used in previous session) + * + * Starts a seq_file session, positioning the iterator for the session to page @_pos - 1 + * within the firmware interface memory sections. @_pos value 0 is used to indicate the + * position of the ELF header at the start of the file. + * + * Retrieves the necessary information via the struct fw_core_dump_data stored in + * the private member of the seq_file handle. + * + * Return: + * * iterator pointer - pointer to iterator struct fw_core_dump_seq_off + * * SEQ_START_TOKEN - special iterator pointer indicating its is the start of the file + * * NULL - iterator could not be allocated + */ +static void *fw_core_dump_seq_start(struct seq_file *m, loff_t *_pos) +{ + struct fw_core_dump_data *dump_data = m->private; + struct fw_core_dump_seq_off *data; + struct kbase_csf_firmware_interface *interface; + loff_t pos = *_pos; + + if (pos == 0) + return SEQ_START_TOKEN; + + /* Move iterator in the right position based on page number within + * available pages of firmware interface memory sections. + */ + pos--; /* ignore start token */ + list_for_each_entry(interface, &dump_data->kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + if (pos >= interface->num_pages) { + pos -= interface->num_pages; + } else { + data = kmalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return NULL; + data->interface = interface; + data->page_num = pos; + return data; + } + } + + return NULL; +} + +/** + * fw_core_dump_seq_stop - seq_file stop operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * + * Closes the current session and frees any memory related. + */ +static void fw_core_dump_seq_stop(struct seq_file *m, void *v) +{ + kfree(v); +} + +/** + * fw_core_dump_seq_next - seq_file next operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * @pos: holds the current position in pages + * (0 or most recent position used in previous session) + * + * Moves the iterator @v forward to the next page within the firmware interface + * memory sections and returns the updated position in @pos. + * @v value SEQ_START_TOKEN indicates the ELF header position. 
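A small standalone sketch (not part of the patch) of the size arithmetic in get_fw_core_dump_size() above: the ELF header material is padded up to ELF_EXEC_PAGESIZE and each readable, unprotected firmware section then contributes num_pages * FW_PAGE_SIZE. The figures below are invented and a 4 KiB ELF_EXEC_PAGESIZE is assumed.

#include <stdio.h>

int main(void)
{
	const unsigned long elf_page = 4096;	/* assumed ELF_EXEC_PAGESIZE (4 KiB pages) */
	const unsigned long fw_page = 4096;	/* FW_PAGE_SIZE used by the MCU */
	unsigned long header_bytes = 1234;	/* ELF header + PHDRs + PRSTATUS note (invented) */
	unsigned long readable_pages = 28;	/* pages across readable, unprotected sections (invented) */
	unsigned long header_padded = ((header_bytes + elf_page - 1) / elf_page) * elf_page;

	/* Total = padded header area followed by every dumped section, page by page. */
	printf("core dump size = %lu bytes\n", header_padded + readable_pages * fw_page);
	return 0;
}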
+ * + * Return: + * * iterator pointer - pointer to iterator struct fw_core_dump_seq_off + * * NULL - iterator could not be allocated + */ +static void *fw_core_dump_seq_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct fw_core_dump_data *dump_data = m->private; + struct fw_core_dump_seq_off *data = v; + struct kbase_csf_firmware_interface *interface; + struct list_head *interfaces = &dump_data->kbdev->csf.firmware_interfaces; + + /* Is current position at the ELF header ? */ + if (v == SEQ_START_TOKEN) { + if (list_empty(interfaces)) + return NULL; + + /* Prepare iterator for starting at first page in firmware interface + * memory sections. + */ + data = kmalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return NULL; + data->interface = + list_first_entry(interfaces, struct kbase_csf_firmware_interface, node); + data->page_num = 0; + ++*pos; + return data; + } + + /* First attempt to satisfy from current firmware interface memory section. */ + interface = data->interface; + if (data->page_num + 1 < interface->num_pages) { + data->page_num++; + ++*pos; + return data; + } + + /* Need next firmware interface memory section. This could be the last one. */ + if (list_is_last(&interface->node, interfaces)) { + kfree(data); + return NULL; + } + + /* Move to first page in next firmware interface memory section. */ + data->interface = list_next_entry(interface, node); + data->page_num = 0; + ++*pos; + + return data; +} + +/** + * fw_core_dump_seq_show - seq_file show operation for firmware core dump file + * @m: the seq_file handle + * @v: the current iterator (pointer to struct fw_core_dump_seq_off) + * + * Writes the current page in a firmware interface memory section indicated + * by the iterator @v to the file. If @v is SEQ_START_TOKEN the ELF + * header is written. + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_seq_show(struct seq_file *m, void *v) +{ + struct fw_core_dump_seq_off *data = v; + struct page *page; + u32 *p; + + /* Either write the ELF header or current page. */ + if (v == SEQ_START_TOKEN) + return fw_core_dump_write_elf_header(m); + + /* Write the current page. */ + page = as_page(data->interface->phys[data->page_num]); + p = kbase_kmap_atomic(page); + seq_write(m, p, FW_PAGE_SIZE); + kbase_kunmap_atomic(p); + + return 0; +} + +/* Sequence file operations for firmware core dump file. */ +static const struct seq_operations fw_core_dump_seq_ops = { + .start = fw_core_dump_seq_start, + .next = fw_core_dump_seq_next, + .stop = fw_core_dump_seq_stop, + .show = fw_core_dump_seq_show, +}; + +/** + * fw_core_dump_debugfs_open - callback for opening the 'fw_core_dump' debugfs file + * @inode: inode of the file + * @file: file pointer + * + * Prepares for servicing a write request to request a core dump from firmware and + * a read request to retrieve the core dump. + * + * Returns an error if the firmware is not initialized yet. + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_debugfs_open(struct inode *inode, struct file *file) +{ + struct kbase_device *const kbdev = inode->i_private; + struct fw_core_dump_data *dump_data; + int ret; + + /* Fail if firmware is not initialized yet. */ + if (!kbdev->csf.firmware_inited) { + ret = -ENODEV; + goto open_fail; + } + + /* Open a sequence file for iterating through the pages in the + * firmware interface memory pages. seq_open stores a + * struct seq_file * in the private_data field of @file. 
+ */ + ret = seq_open(file, &fw_core_dump_seq_ops); + if (ret) + goto open_fail; + + /* Allocate a context for sequence file operations. */ + dump_data = kmalloc(sizeof(*dump_data), GFP_KERNEL); + if (!dump_data) { + ret = -ENOMEM; + goto out; + } + + /* Kbase device will be shared with sequence file operations. */ + dump_data->kbdev = kbdev; + + /* Link our sequence file context. */ + ((struct seq_file *)file->private_data)->private = dump_data; + + return 0; +out: + seq_release(inode, file); +open_fail: + return ret; +} + +/** + * fw_core_dump_debugfs_write - callback for a write to the 'fw_core_dump' debugfs file + * @file: file pointer + * @ubuf: user buffer containing data to store + * @count: number of bytes in user buffer + * @ppos: file position + * + * Any data written to the file triggers a firmware core dump request which + * subsequently can be retrieved by reading from the file. + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t fw_core_dump_debugfs_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + int err; + struct fw_core_dump_data *dump_data = ((struct seq_file *)file->private_data)->private; + struct kbase_device *const kbdev = dump_data->kbdev; + + CSTD_UNUSED(ppos); + + err = fw_core_dump_create(kbdev); + + return err ? err : count; +} + +/** + * fw_core_dump_debugfs_release - callback for releasing the 'fw_core_dump' debugfs file + * @inode: inode of the file + * @file: file pointer + * + * Return: 0 on success, error code otherwise. + */ +static int fw_core_dump_debugfs_release(struct inode *inode, struct file *file) +{ + struct fw_core_dump_data *dump_data = ((struct seq_file *)file->private_data)->private; + + seq_release(inode, file); + + kfree(dump_data); + + return 0; +} +/* Debugfs file operations for firmware core dump file. */ +static const struct file_operations kbase_csf_fw_core_dump_fops = { + .owner = THIS_MODULE, + .open = fw_core_dump_debugfs_open, + .read = seq_read, + .write = fw_core_dump_debugfs_write, + .llseek = seq_lseek, + .release = fw_core_dump_debugfs_release, +}; + +void kbase_csf_firmware_core_dump_init(struct kbase_device *const kbdev) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + debugfs_create_file("fw_core_dump", 0600, kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_fw_core_dump_fops); +#endif /* CONFIG_DEBUG_FS */ +} + +int kbase_csf_firmware_core_dump_entry_parse(struct kbase_device *kbdev, const u32 *entry) +{ + /* Casting to u16 as version is defined by bits 15:0 */ + kbdev->csf.fw_core_dump.version = (u16)entry[FW_CORE_DUMP_VERSION_INDEX]; + + if (kbdev->csf.fw_core_dump.version != FW_CORE_DUMP_DATA_VERSION) + return -EPERM; + + kbdev->csf.fw_core_dump.mcu_regs_addr = entry[FW_CORE_DUMP_START_ADDR_INDEX]; + kbdev->csf.fw_core_dump.available = true; + + return 0; +} diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h new file mode 100644 index 0000000..940e8af --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_core_dump.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ +#define _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ + +struct kbase_device; + +/** Offset of the last field of core dump entry from the image header */ +#define CORE_DUMP_ENTRY_START_ADDR_OFFSET (0x4) + +/* Page size in bytes in use by MCU. */ +#define FW_PAGE_SIZE 4096 + +/** + * struct fw_core_dump_data - Context for seq_file operations used on 'fw_core_dump' + * debugfs file. + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + */ +struct fw_core_dump_data { + struct kbase_device *kbdev; +}; + +/** + * kbase_csf_firmware_core_dump_entry_parse() - Parse a "core dump" entry from + * the image header. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @entry: Pointer to section. + * + * Read a "core dump" entry from the image header, check the version for + * compatibility and store the address pointer. + * + * Return: 0 if the entry was parsed successfully, negative error code otherwise. + */ +int kbase_csf_firmware_core_dump_entry_parse(struct kbase_device *kbdev, const u32 *entry); + +/** + * kbase_csf_firmware_core_dump_init() - Initialize firmware core dump support + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * Must be zero-initialized. + * + * Creates the fw_core_dump debugfs file through which to request a firmware + * core dump. The created debugfs file is cleaned up as part of kbdev debugfs + * cleanup. + * + * The fw_core_dump debugfs file can be used in the following way: + * + * To explicitly request core dump: + * echo 1 >/sys/kernel/debug/mali0/fw_core_dump + * + * To output current core dump (after explicitly requesting a core dump, or + * kernel driver reported an internal firmware error): + * cat /sys/kernel/debug/mali0/fw_core_dump + */ +void kbase_csf_firmware_core_dump_init(struct kbase_device *const kbdev); + +/** + * get_fw_core_dump_size - Get firmware core dump size + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: size on success, -1 otherwise. + */ +size_t get_fw_core_dump_size(struct kbase_device *kbdev); + +/** + * fw_core_dump_create - Requests firmware to save state for a firmware core dump + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * Return: 0 on success, error code otherwise. + */ +int fw_core_dump_create(struct kbase_device *kbdev); + +/** + * fw_core_dump_write_elf_header - Writes ELF header for the FW core dump + * @m: the seq_file handle + * + * Writes the ELF header of the core dump including program headers for + * memory sections and a note containing the current MCU register + * values. + * + * Excludes memory sections without read access permissions or + * are for protected memory. + * + * The data written is as follows: + * - ELF header + * - ELF PHDRs for memory sections + * - ELF PHDR for program header NOTE + * - ELF PRSTATUS note + * - 0-bytes padding to multiple of ELF_EXEC_PAGESIZE + * + * The actual memory section dumps should follow this (not written + * by this function). 
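A standalone user-space sketch (not part of the patch) that follows the debugfs usage documented above: any write to fw_core_dump requests a dump, and a subsequent read returns the ELF image. It assumes a mounted debugfs, root privileges, and the mali0 directory name shown above.

#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/mali0/fw_core_dump";
	FILE *req = fopen(path, "w");
	FILE *dump, *out;
	char buf[4096];
	size_t n;

	if (!req) {
		perror("open fw_core_dump for write");
		return 1;
	}
	fputs("1\n", req);	/* any write triggers a firmware core dump request */
	fclose(req);

	dump = fopen(path, "r");
	out = fopen("fw_core_dump.elf", "w");
	if (!dump || !out) {
		perror("open");
		return 1;
	}

	/* Stream the ELF core image out of debugfs into a regular file. */
	while ((n = fread(buf, 1, sizeof(buf), dump)) > 0)
		fwrite(buf, 1, n, out);

	fclose(dump);
	fclose(out);
	return 0;
}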
+ * + * Retrieves the necessary information via the struct + * fw_core_dump_data stored in the private member of the seq_file + * handle. + * + * Return: + * * 0 - success + * * -ENOMEM - not enough memory for allocating ELF32 note + */ +int fw_core_dump_write_elf_header(struct seq_file *m); + +#endif /* _KBASE_CSF_FIRMWARE_CORE_DUMP_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_log.c b/mali_kbase/csf/mali_kbase_csf_firmware_log.c new file mode 100644 index 0000000..89df839 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_log.c @@ -0,0 +1,547 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> +#include "backend/gpu/mali_kbase_pm_internal.h" +#include <csf/mali_kbase_csf_firmware_log.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#include <linux/debugfs.h> +#include <linux/string.h> +#include <linux/workqueue.h> + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address. + */ +#define ARMV7_T1_BL_IMM_INSTR 0xd800f000 + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address, maximum + * negative jump offset. + */ +#define ARMV7_T1_BL_IMM_RANGE_MIN -16777216 + +/* + * ARMv7 instruction: Branch with Link calls a subroutine at a PC-relative address, maximum + * positive jump offset. + */ +#define ARMV7_T1_BL_IMM_RANGE_MAX 16777214 + +/* + * ARMv7 instruction: Double NOP instructions. 
+ */ +#define ARMV7_DOUBLE_NOP_INSTR 0xbf00bf00 + +#if defined(CONFIG_DEBUG_FS) + +static int kbase_csf_firmware_log_enable_mask_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + /* The enabled traces limited to u64 here, regarded practical */ + *val = kbase_csf_firmware_trace_buffer_get_active_mask64(tb); + return 0; +} + +static int kbase_csf_firmware_log_enable_mask_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + u64 new_mask; + unsigned int enable_bits_count; + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + + /* Ignore unsupported types */ + enable_bits_count = kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count(tb); + if (enable_bits_count > 64) { + dev_dbg(kbdev->dev, "Limit enabled bits count from %u to 64", enable_bits_count); + enable_bits_count = 64; + } + new_mask = val & (UINT64_MAX >> (64 - enable_bits_count)); + + if (new_mask != kbase_csf_firmware_trace_buffer_get_active_mask64(tb)) + return kbase_csf_firmware_trace_buffer_set_active_mask64(tb, new_mask); + else + return 0; +} + +static int kbasep_csf_firmware_log_debugfs_open(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev = in->i_private; + + file->private_data = kbdev; + dev_dbg(kbdev->dev, "Opened firmware trace buffer dump debugfs file"); + + return 0; +} + +static ssize_t kbasep_csf_firmware_log_debugfs_read(struct file *file, char __user *buf, + size_t size, loff_t *ppos) +{ + struct kbase_device *kbdev = file->private_data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + unsigned int n_read; + unsigned long not_copied; + /* Limit reads to the kernel dump buffer size */ + size_t mem = MIN(size, FIRMWARE_LOG_DUMP_BUF_SIZE); + int ret; + + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); + return -EIO; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + /* Reading from userspace is only allowed in manual mode or auto-discard mode */ + if (fw_log->mode != KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL && + fw_log->mode != KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD) { + ret = -EINVAL; + goto out; + } + + n_read = kbase_csf_firmware_trace_buffer_read_data(tb, fw_log->dump_buf, mem); + + /* Do the copy, if we have obtained some trace data */ + not_copied = (n_read) ? 
copy_to_user(buf, fw_log->dump_buf, n_read) : 0; + + if (not_copied) { + dev_err(kbdev->dev, "Couldn't copy trace buffer data to user space buffer"); + ret = -EFAULT; + goto out; + } + + *ppos += n_read; + ret = n_read; + +out: + atomic_set(&fw_log->busy, 0); + return ret; +} + +static int kbase_csf_firmware_log_mode_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + *val = fw_log->mode; + return 0; +} + +static int kbase_csf_firmware_log_mode_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + int ret = 0; + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + if (val == fw_log->mode) + goto out; + + switch (val) { + case KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL: + cancel_delayed_work_sync(&fw_log->poll_work); + break; + case KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT: + case KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD: + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(atomic_read(&fw_log->poll_period_ms))); + break; + default: + ret = -EINVAL; + goto out; + } + + fw_log->mode = val; + +out: + atomic_set(&fw_log->busy, 0); + return ret; +} + +static int kbase_csf_firmware_log_poll_period_read(void *data, u64 *val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + *val = atomic_read(&fw_log->poll_period_ms); + return 0; +} + +static int kbase_csf_firmware_log_poll_period_write(void *data, u64 val) +{ + struct kbase_device *kbdev = (struct kbase_device *)data; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + atomic_set(&fw_log->poll_period_ms, val); + return 0; +} + +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_enable_mask_fops, + kbase_csf_firmware_log_enable_mask_read, + kbase_csf_firmware_log_enable_mask_write, "%llx\n"); + +static const struct file_operations kbasep_csf_firmware_log_debugfs_fops = { + .owner = THIS_MODULE, + .open = kbasep_csf_firmware_log_debugfs_open, + .read = kbasep_csf_firmware_log_debugfs_read, + .llseek = no_llseek, +}; + +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_mode_fops, kbase_csf_firmware_log_mode_read, + kbase_csf_firmware_log_mode_write, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_firmware_log_poll_period_fops, + kbase_csf_firmware_log_poll_period_read, + kbase_csf_firmware_log_poll_period_write, "%llu\n"); + +#endif /* CONFIG_DEBUG_FS */ + +static void kbase_csf_firmware_log_discard_buffer(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware log discard skipped"); + return; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return; + + kbase_csf_firmware_trace_buffer_discard(tb); + + atomic_set(&fw_log->busy, 0); +} + +static void kbase_csf_firmware_log_poll(struct work_struct *work) +{ + struct kbase_device *kbdev = + container_of(work, struct kbase_device, csf.fw_log.poll_work.work); + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + if (fw_log->mode == KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT) + kbase_csf_firmware_log_dump_buffer(kbdev); + else if (fw_log->mode == KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD) + kbase_csf_firmware_log_discard_buffer(kbdev); + else + return; + + 
schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(atomic_read(&fw_log->poll_period_ms))); +} + +int kbase_csf_firmware_log_init(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + int err = 0; +#if defined(CONFIG_DEBUG_FS) + struct dentry *dentry; +#endif /* CONFIG_DEBUG_FS */ + + /* Add one byte for null-termination */ + fw_log->dump_buf = kmalloc(FIRMWARE_LOG_DUMP_BUF_SIZE + 1, GFP_KERNEL); + if (fw_log->dump_buf == NULL) { + err = -ENOMEM; + goto out; + } + + /* Ensure null-termination for all strings */ + fw_log->dump_buf[FIRMWARE_LOG_DUMP_BUF_SIZE] = 0; + + /* Set default log polling period */ + atomic_set(&fw_log->poll_period_ms, KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT); + + INIT_DEFERRABLE_WORK(&fw_log->poll_work, kbase_csf_firmware_log_poll); +#ifdef CONFIG_MALI_FW_TRACE_MODE_AUTO_DISCARD + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_DISCARD; + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT)); +#elif defined(CONFIG_MALI_FW_TRACE_MODE_AUTO_PRINT) + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_AUTO_PRINT; + schedule_delayed_work(&fw_log->poll_work, + msecs_to_jiffies(KBASE_CSF_FIRMWARE_LOG_POLL_PERIOD_MS_DEFAULT)); +#else /* CONFIG_MALI_FW_TRACE_MODE_MANUAL */ + fw_log->mode = KBASE_CSF_FIRMWARE_LOG_MODE_MANUAL; +#endif + + atomic_set(&fw_log->busy, 0); + +#if !defined(CONFIG_DEBUG_FS) + return 0; +#else /* !CONFIG_DEBUG_FS */ + dentry = debugfs_create_file("fw_trace_enable_mask", 0644, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_firmware_log_enable_mask_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_enable_mask\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_traces", 0444, kbdev->mali_debugfs_directory, kbdev, + &kbasep_csf_firmware_log_debugfs_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_traces\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_trace_mode", 0644, kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_firmware_log_mode_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_mode\n"); + err = -ENOENT; + goto free_out; + } + dentry = debugfs_create_file("fw_trace_poll_period_ms", 0644, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_firmware_log_poll_period_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create fw_trace_poll_period_ms"); + err = -ENOENT; + goto free_out; + } + + return 0; + +free_out: + kfree(fw_log->dump_buf); + fw_log->dump_buf = NULL; +#endif /* CONFIG_DEBUG_FS */ +out: + return err; +} + +void kbase_csf_firmware_log_term(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + + if (fw_log->dump_buf) { + cancel_delayed_work_sync(&fw_log->poll_work); + kfree(fw_log->dump_buf); + fw_log->dump_buf = NULL; + } +} + +void kbase_csf_firmware_log_dump_buffer(struct kbase_device *kbdev) +{ + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + u8 *buf = fw_log->dump_buf, *p, *pnewline, *pend, *pendbuf; + unsigned int read_size, remaining_size; + struct firmware_trace_buffer *tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware trace dump skipped"); + return; + } + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return; + + /* FW should only print complete messages, so there's no need to 
handle + * partial messages over multiple invocations of this function + */ + + p = buf; + pendbuf = &buf[FIRMWARE_LOG_DUMP_BUF_SIZE]; + + while ((read_size = kbase_csf_firmware_trace_buffer_read_data(tb, p, pendbuf - p))) { + pend = p + read_size; + p = buf; + + while (p < pend && (pnewline = memchr(p, '\n', pend - p))) { + /* Null-terminate the string */ + *pnewline = 0; + + dev_err(kbdev->dev, "FW> %s", p); + + p = pnewline + 1; + } + + remaining_size = pend - p; + + if (!remaining_size) { + p = buf; + } else if (remaining_size < FIRMWARE_LOG_DUMP_BUF_SIZE) { + /* Copy unfinished string to the start of the buffer */ + memmove(buf, p, remaining_size); + p = &buf[remaining_size]; + } else { + /* Print abnormally long string without newlines */ + dev_err(kbdev->dev, "FW> %s", buf); + p = buf; + } + } + + if (p != buf) { + /* Null-terminate and print last unfinished string */ + *p = 0; + dev_err(kbdev->dev, "FW> %s", buf); + } + + atomic_set(&fw_log->busy, 0); +} + +void kbase_csf_firmware_log_parse_logging_call_list_entry(struct kbase_device *kbdev, + const uint32_t *entry) +{ + kbdev->csf.fw_log.func_call_list_va_start = entry[0]; + kbdev->csf.fw_log.func_call_list_va_end = entry[1]; +} + +/** + * toggle_logging_calls_in_loaded_image - Toggles FW log func calls in loaded FW image. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @enable: Whether to enable or disable the function calls. + */ +static void toggle_logging_calls_in_loaded_image(struct kbase_device *kbdev, bool enable) +{ + uint32_t bl_instruction, diff; + uint32_t imm11, imm10, i1, i2, j1, j2, sign; + uint32_t calling_address = 0, callee_address = 0; + uint32_t list_entry = kbdev->csf.fw_log.func_call_list_va_start; + const uint32_t list_va_end = kbdev->csf.fw_log.func_call_list_va_end; + + if (list_entry == 0 || list_va_end == 0) + return; + + if (enable) { + for (; list_entry < list_va_end; list_entry += 2 * sizeof(uint32_t)) { + /* Read calling address */ + kbase_csf_read_firmware_memory(kbdev, list_entry, &calling_address); + /* Read callee address */ + kbase_csf_read_firmware_memory(kbdev, list_entry + sizeof(uint32_t), + &callee_address); + + diff = callee_address - calling_address - 4; + sign = !!(diff & 0x80000000); + if (ARMV7_T1_BL_IMM_RANGE_MIN > (int32_t)diff || + ARMV7_T1_BL_IMM_RANGE_MAX < (int32_t)diff) { + dev_warn(kbdev->dev, "FW log patch 0x%x out of range, skipping", + calling_address); + continue; + } + + i1 = (diff & 0x00800000) >> 23; + j1 = !i1 ^ sign; + i2 = (diff & 0x00400000) >> 22; + j2 = !i2 ^ sign; + imm11 = (diff & 0xffe) >> 1; + imm10 = (diff & 0x3ff000) >> 12; + + /* Compose BL instruction */ + bl_instruction = ARMV7_T1_BL_IMM_INSTR; + bl_instruction |= j1 << 29; + bl_instruction |= j2 << 27; + bl_instruction |= imm11 << 16; + bl_instruction |= sign << 10; + bl_instruction |= imm10; + + /* Patch logging func calls in their load location */ + dev_dbg(kbdev->dev, "FW log patch 0x%x: 0x%x\n", calling_address, + bl_instruction); + kbase_csf_update_firmware_memory_exe(kbdev, calling_address, + bl_instruction); + } + } else { + for (; list_entry < list_va_end; list_entry += 2 * sizeof(uint32_t)) { + /* Read calling address */ + kbase_csf_read_firmware_memory(kbdev, list_entry, &calling_address); + + /* Overwrite logging func calls with 2 NOP instructions */ + kbase_csf_update_firmware_memory_exe(kbdev, calling_address, + ARMV7_DOUBLE_NOP_INSTR); + } + } +} + +int kbase_csf_firmware_log_toggle_logging_calls(struct kbase_device *kbdev, u32 val) +{ + unsigned long 
flags; + struct kbase_csf_firmware_log *fw_log = &kbdev->csf.fw_log; + bool mcu_inactive; + bool resume_needed = false; + int ret = 0; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + if (atomic_cmpxchg(&fw_log->busy, 0, 1) != 0) + return -EBUSY; + + /* Suspend all the active CS groups */ + dev_dbg(kbdev->dev, "Suspend all the active CS groups"); + + kbase_csf_scheduler_lock(kbdev); + while (scheduler->state != SCHED_SUSPENDED) { + kbase_csf_scheduler_unlock(kbdev); + kbase_csf_scheduler_pm_suspend(kbdev); + kbase_csf_scheduler_lock(kbdev); + resume_needed = true; + } + + /* Wait for the MCU to get disabled */ + dev_info(kbdev->dev, "Wait for the MCU to get disabled"); + ret = kbase_pm_killable_wait_for_desired_state(kbdev); + if (ret) { + dev_err(kbdev->dev, + "wait for PM state failed when toggling FW logging calls"); + ret = -EAGAIN; + goto out; + } + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + mcu_inactive = + kbase_pm_is_mcu_inactive(kbdev, kbdev->pm.backend.mcu_state); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + if (!mcu_inactive) { + dev_err(kbdev->dev, + "MCU not inactive after PM state wait when toggling FW logging calls"); + ret = -EAGAIN; + goto out; + } + + /* Toggle FW logging call in the loaded FW image */ + toggle_logging_calls_in_loaded_image(kbdev, val); + dev_dbg(kbdev->dev, "FW logging: %s", val ? "enabled" : "disabled"); + +out: + kbase_csf_scheduler_unlock(kbdev); + if (resume_needed) + /* Resume queue groups and start mcu */ + kbase_csf_scheduler_pm_resume(kbdev); + atomic_set(&fw_log->busy, 0); + return ret; +} diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_log.h b/mali_kbase/csf/mali_kbase_csf_firmware_log.h new file mode 100644 index 0000000..1008320 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_firmware_log.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_FIRMWARE_LOG_H_ +#define _KBASE_CSF_FIRMWARE_LOG_H_ + +#include <mali_kbase.h> + +/** Offset of the last field of functions call list entry from the image header */ +#define FUNC_CALL_LIST_ENTRY_NAME_OFFSET (0x8) + +/* + * Firmware log dumping buffer size. + */ +#define FIRMWARE_LOG_DUMP_BUF_SIZE PAGE_SIZE + +/** + * kbase_csf_firmware_log_init - Initialize firmware log handling. + * + * @kbdev: Pointer to the Kbase device + * + * Return: The initialization error code. + */ +int kbase_csf_firmware_log_init(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_term - Terminate firmware log handling. + * + * @kbdev: Pointer to the Kbase device + */ +void kbase_csf_firmware_log_term(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_dump_buffer - Read remaining data in the firmware log + * buffer and print it to dmesg. 
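+ *
+ * Descriptive note (not part of the original patch): each newline-terminated
+ * message is printed as its own "FW> " line; an unterminated tail left at the
+ * end of a read is printed as-is rather than carried over to a later call, and
+ * the dump is skipped entirely if another log operation holds the busy flag.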
+ * + * @kbdev: Pointer to the Kbase device + */ +void kbase_csf_firmware_log_dump_buffer(struct kbase_device *kbdev); + +/** + * kbase_csf_firmware_log_parse_logging_call_list_entry - Parse FW logging function call list entry. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @entry: Pointer to section. + */ +void kbase_csf_firmware_log_parse_logging_call_list_entry(struct kbase_device *kbdev, + const uint32_t *entry); +/** + * kbase_csf_firmware_log_toggle_logging_calls - Enables/Disables FW logging function calls. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @val: Configuration option value. + * + * Return: 0 if successful, negative error code on failure + */ +int kbase_csf_firmware_log_toggle_logging_calls(struct kbase_device *kbdev, u32 val); + +#endif /* _KBASE_CSF_FIRMWARE_LOG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c b/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c index 0fd848f..93d7c36 100644 --- a/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c +++ b/mali_kbase/csf/mali_kbase_csf_firmware_no_mali.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,6 +32,8 @@ #include "mali_kbase_csf_scheduler.h" #include "mmu/mali_kbase_mmu.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" +#include <backend/gpu/mali_kbase_model_linux.h> +#include <csf/mali_kbase_csf_registers.h> #include <linux/list.h> #include <linux/slab.h> @@ -227,7 +229,8 @@ static int invent_capabilities(struct kbase_device *kbdev) iface->version = 1; iface->kbdev = kbdev; iface->features = 0; - iface->prfcnt_size = 64; + iface->prfcnt_size = + GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET(0, KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE); if (iface->version >= kbase_csf_interface_version(1, 1, 0)) { /* update rate=1, max event size = 1<<8 = 256 */ @@ -270,6 +273,18 @@ void kbase_csf_update_firmware_memory(struct kbase_device *kbdev, /* NO_MALI: Nothing to do here */ } +void kbase_csf_read_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 *value) +{ + /* NO_MALI: Nothing to do here */ +} + +void kbase_csf_update_firmware_memory_exe(struct kbase_device *kbdev, + u32 gpu_addr, u32 value) +{ + /* NO_MALI: Nothing to do here */ +} + void kbase_csf_firmware_cs_input( const struct kbase_csf_cmd_stream_info *const info, const u32 offset, const u32 value) @@ -371,37 +386,6 @@ u32 kbase_csf_firmware_csg_output( } KBASE_EXPORT_TEST_API(kbase_csf_firmware_csg_output); -static void -csf_firmware_prfcnt_process(const struct kbase_csf_global_iface *const iface, - const u32 glb_req) -{ - struct kbase_device *kbdev = iface->kbdev; - u32 glb_ack = output_page_read(iface->output, GLB_ACK); - /* If the value of GLB_REQ.PRFCNT_SAMPLE is different from the value of - * GLB_ACK.PRFCNT_SAMPLE, the CSF will sample the performance counters. - */ - if ((glb_req ^ glb_ack) & GLB_REQ_PRFCNT_SAMPLE_MASK) { - /* NO_MALI only uses the first buffer in the ring buffer. */ - input_page_write(iface->input, GLB_PRFCNT_EXTRACT, 0); - output_page_write(iface->output, GLB_PRFCNT_INSERT, 1); - kbase_reg_write(kbdev, GPU_COMMAND, GPU_COMMAND_PRFCNT_SAMPLE); - } - - /* Propagate enable masks to model if request to enable. 
*/ - if (glb_req & GLB_REQ_PRFCNT_ENABLE_MASK) { - u32 tiler_en, l2_en, sc_en; - - tiler_en = input_page_read(iface->input, GLB_PRFCNT_TILER_EN); - l2_en = input_page_read(iface->input, GLB_PRFCNT_MMU_L2_EN); - sc_en = input_page_read(iface->input, GLB_PRFCNT_SHADER_EN); - - /* NO_MALI platform enabled all CSHW counters by default. */ - kbase_reg_write(kbdev, PRFCNT_TILER_EN, tiler_en); - kbase_reg_write(kbdev, PRFCNT_MMU_L2_EN, l2_en); - kbase_reg_write(kbdev, PRFCNT_SHADER_EN, sc_en); - } -} - void kbase_csf_firmware_global_input( const struct kbase_csf_global_iface *const iface, const u32 offset, const u32 value) @@ -412,9 +396,17 @@ void kbase_csf_firmware_global_input( input_page_write(iface->input, offset, value); if (offset == GLB_REQ) { - csf_firmware_prfcnt_process(iface, value); - /* NO_MALI: Immediately acknowledge requests */ - output_page_write(iface->output, GLB_ACK, value); + /* NO_MALI: Immediately acknowledge requests - except for PRFCNT_ENABLE + * and PRFCNT_SAMPLE. These will be processed along with the + * corresponding performance counter registers when the global doorbell + * is rung in order to emulate the performance counter sampling behavior + * of the real firmware. + */ + const u32 ack = output_page_read(iface->output, GLB_ACK); + const u32 req_mask = ~(GLB_REQ_PRFCNT_ENABLE_MASK | GLB_REQ_PRFCNT_SAMPLE_MASK); + const u32 toggled = (value ^ ack) & req_mask; + + output_page_write(iface->output, GLB_ACK, ack ^ toggled); } } KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_input); @@ -455,6 +447,99 @@ u32 kbase_csf_firmware_global_output( KBASE_EXPORT_TEST_API(kbase_csf_firmware_global_output); /** + * csf_doorbell_prfcnt() - Process CSF performance counter doorbell request + * + * @kbdev: An instance of the GPU platform device + */ +static void csf_doorbell_prfcnt(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *iface; + u32 req; + u32 ack; + u32 extract_index; + + if (WARN_ON(!kbdev)) + return; + + iface = &kbdev->csf.global_iface; + + req = input_page_read(iface->input, GLB_REQ); + ack = output_page_read(iface->output, GLB_ACK); + extract_index = input_page_read(iface->input, GLB_PRFCNT_EXTRACT); + + /* Process enable bit toggle */ + if ((req ^ ack) & GLB_REQ_PRFCNT_ENABLE_MASK) { + if (req & GLB_REQ_PRFCNT_ENABLE_MASK) { + /* Reset insert index to zero on enable bit set */ + output_page_write(iface->output, GLB_PRFCNT_INSERT, 0); + WARN_ON(extract_index != 0); + } + ack ^= GLB_REQ_PRFCNT_ENABLE_MASK; + } + + /* Process sample request */ + if ((req ^ ack) & GLB_REQ_PRFCNT_SAMPLE_MASK) { + const u32 ring_size = GLB_PRFCNT_CONFIG_SIZE_GET( + input_page_read(iface->input, GLB_PRFCNT_CONFIG)); + u32 insert_index = output_page_read(iface->output, GLB_PRFCNT_INSERT); + + const bool prev_overflow = (req ^ ack) & GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK; + const bool prev_threshold = (req ^ ack) & GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK; + + /* If ringbuffer is full toggle PRFCNT_OVERFLOW and skip sample */ + if (insert_index - extract_index >= ring_size) { + WARN_ON(insert_index - extract_index > ring_size); + if (!prev_overflow) + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK; + } else { + struct gpu_model_prfcnt_en enable_maps = { + .fe = input_page_read(iface->input, GLB_PRFCNT_CSF_EN), + .tiler = input_page_read(iface->input, GLB_PRFCNT_TILER_EN), + .l2 = input_page_read(iface->input, GLB_PRFCNT_MMU_L2_EN), + .shader = input_page_read(iface->input, GLB_PRFCNT_SHADER_EN), + }; + + const u64 prfcnt_base = + input_page_read(iface->input, GLB_PRFCNT_BASE_LO) + 
+ ((u64)input_page_read(iface->input, GLB_PRFCNT_BASE_HI) << 32); + + u32 *sample_base = (u32 *)(uintptr_t)prfcnt_base + + (KBASE_DUMMY_MODEL_MAX_VALUES_PER_SAMPLE * + (insert_index % ring_size)); + + /* trigger sample dump in the dummy model */ + gpu_model_prfcnt_dump_request(sample_base, enable_maps); + + /* increment insert index and toggle PRFCNT_SAMPLE bit in ACK */ + output_page_write(iface->output, GLB_PRFCNT_INSERT, ++insert_index); + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK; + } + + /* When the ringbuffer reaches 50% capacity toggle PRFCNT_THRESHOLD */ + if (!prev_threshold && (insert_index - extract_index >= (ring_size / 2))) + ack ^= GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK; + } + + /* Update GLB_ACK */ + output_page_write(iface->output, GLB_ACK, ack); +} + +void kbase_csf_ring_doorbell(struct kbase_device *kbdev, int doorbell_nr) +{ + WARN_ON(doorbell_nr < 0); + WARN_ON(doorbell_nr >= CSF_NUM_DOORBELL); + + if (WARN_ON(!kbdev)) + return; + + if (doorbell_nr == CSF_KERNEL_DOORBELL_NR) { + csf_doorbell_prfcnt(kbdev); + gpu_model_glb_request_job_irq(kbdev->model); + } +} +EXPORT_SYMBOL(kbase_csf_ring_doorbell); + +/** * handle_internal_firmware_fatal - Handler for CS internal firmware fault. * * @kbdev: Pointer to kbase device @@ -631,17 +716,80 @@ static void enable_gpu_idle_timer(struct kbase_device *const kbdev) kbdev->csf.gpu_idle_dur_count); } +static bool global_debug_request_complete(struct kbase_device *const kbdev, u32 const req_mask) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + bool complete = false; + unsigned long flags; + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + if ((kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK) & req_mask) == + (kbase_csf_firmware_global_input_read(global_iface, GLB_DEBUG_REQ) & req_mask)) + complete = true; + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + return complete; +} + +static void set_global_debug_request(const struct kbase_csf_global_iface *const global_iface, + u32 const req_mask) +{ + u32 glb_debug_req; + + kbase_csf_scheduler_spin_lock_assert_held(global_iface->kbdev); + + glb_debug_req = kbase_csf_firmware_global_output(global_iface, GLB_DEBUG_ACK); + glb_debug_req ^= req_mask; + + kbase_csf_firmware_global_input_mask(global_iface, GLB_DEBUG_REQ, glb_debug_req, req_mask); +} + +static void request_fw_core_dump( + const struct kbase_csf_global_iface *const global_iface) +{ + uint32_t run_mode = GLB_DEBUG_REQ_RUN_MODE_SET(0, GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP); + + set_global_debug_request(global_iface, GLB_DEBUG_REQ_DEBUG_RUN_MASK | run_mode); + + set_global_request(global_iface, GLB_REQ_DEBUG_CSF_REQ_MASK); +} + +int kbase_csf_firmware_req_core_dump(struct kbase_device *const kbdev) +{ + const struct kbase_csf_global_iface *const global_iface = + &kbdev->csf.global_iface; + unsigned long flags; + int ret; + + /* Serialize CORE_DUMP requests. */ + mutex_lock(&kbdev->csf.reg_lock); + + /* Update GLB_REQ with CORE_DUMP request and make firmware act on it. */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + request_fw_core_dump(global_iface); + kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + + /* Wait for firmware to acknowledge completion of the CORE_DUMP request. 
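+ * The debug request is edge-triggered: set_global_debug_request() toggles the
+ * DEBUG_RUN bit in GLB_DEBUG_REQ away from the current GLB_DEBUG_ACK value and
+ * the firmware is expected to mirror it back, so global_debug_request_complete()
+ * reports completion once the masked REQ and ACK bits match again (illustrative
+ * summary of the helpers above, not part of the original patch).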
*/ + ret = wait_for_global_request(kbdev, GLB_REQ_DEBUG_CSF_REQ_MASK); + if (!ret) + WARN_ON(!global_debug_request_complete(kbdev, GLB_DEBUG_REQ_DEBUG_RUN_MASK)); + + mutex_unlock(&kbdev->csf.reg_lock); + + return ret; +} + static void global_init(struct kbase_device *const kbdev, u64 core_mask) { - u32 const ack_irq_mask = GLB_ACK_IRQ_MASK_CFG_ALLOC_EN_MASK | - GLB_ACK_IRQ_MASK_PING_MASK | - GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | - GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | - GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | - GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | - GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | - GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | - GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK; + u32 const ack_irq_mask = + GLB_ACK_IRQ_MASK_CFG_ALLOC_EN_MASK | GLB_ACK_IRQ_MASK_PING_MASK | + GLB_ACK_IRQ_MASK_CFG_PROGRESS_TIMER_MASK | GLB_ACK_IRQ_MASK_PROTM_ENTER_MASK | + GLB_ACK_IRQ_MASK_PROTM_EXIT_MASK | GLB_ACK_IRQ_MASK_FIRMWARE_CONFIG_UPDATE_MASK | + GLB_ACK_IRQ_MASK_CFG_PWROFF_TIMER_MASK | GLB_ACK_IRQ_MASK_IDLE_EVENT_MASK | + GLB_ACK_IRQ_MASK_IDLE_ENABLE_MASK | GLB_REQ_DEBUG_CSF_REQ_MASK; const struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; @@ -655,11 +803,14 @@ static void global_init(struct kbase_device *const kbdev, u64 core_mask) set_timeout_global(global_iface, kbase_csf_timeout_get(kbdev)); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The GPU idle timer is always enabled for simplicity. Checks will be * done before scheduling the GPU idle worker to see if it is * appropriate for the current power policy. */ enable_gpu_idle_timer(kbdev); +#endif + /* Unmask the interrupts */ kbase_csf_firmware_global_input(global_iface, @@ -785,7 +936,7 @@ void kbase_csf_firmware_reload_completed(struct kbase_device *kbdev) kbase_pm_update_state(kbdev); } -static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms) +static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_ms, u32 *modifier) { #define HYSTERESIS_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ @@ -803,14 +954,17 @@ static u32 convert_dur_to_idle_count(struct kbase_device *kbdev, const u32 dur_m dev_warn(kbdev->dev, "No GPU clock, unexpected intregration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter format with firmware idle hysteresis!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter format with firmware idle hysteresis!"); } /* Formula for dur_val = ((dur_ms/1000) * freq_HZ) >> 10) */ dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; dur_val = div_u64(dur_val, 1000); + *modifier = 0; + /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? 
S32_MAX : (u32)dur_val; @@ -832,7 +986,7 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 dur; kbase_csf_scheduler_spin_lock(kbdev, &flags); - dur = kbdev->csf.gpu_idle_hysteresis_ms; + dur = kbdev->csf.gpu_idle_hysteresis_ns; kbase_csf_scheduler_spin_unlock(kbdev, flags); return dur; @@ -841,7 +995,9 @@ u32 kbase_csf_firmware_get_gpu_idle_hysteresis_time(struct kbase_device *kbdev) u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, u32 dur) { unsigned long flags; - const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur); + u32 modifier = 0; + + const u32 hysteresis_val = convert_dur_to_idle_count(kbdev, dur, &modifier); /* The 'fw_load_lock' is taken to synchronize against the deferred * loading of FW, where the idle timer will be enabled. @@ -849,46 +1005,77 @@ u32 kbase_csf_firmware_set_gpu_idle_hysteresis_time(struct kbase_device *kbdev, mutex_lock(&kbdev->fw_load_lock); if (unlikely(!kbdev->csf.firmware_inited)) { kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; + kbdev->csf.gpu_idle_hysteresis_ns = dur; kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; kbase_csf_scheduler_spin_unlock(kbdev, flags); mutex_unlock(&kbdev->fw_load_lock); goto end; } mutex_unlock(&kbdev->fw_load_lock); + if (kbase_reset_gpu_prevent_and_wait(kbdev)) { + dev_warn(kbdev->dev, + "Failed to prevent GPU reset when updating idle_hysteresis_time"); + return kbdev->csf.gpu_idle_dur_count; + } + kbase_csf_scheduler_pm_active(kbdev); - if (kbase_csf_scheduler_wait_mcu_active(kbdev)) { + if (kbase_csf_scheduler_killable_wait_mcu_active(kbdev)) { dev_err(kbdev->dev, "Unable to activate the MCU, the idle hysteresis value shall remain unchanged"); kbase_csf_scheduler_pm_idle(kbdev); + kbase_reset_gpu_allow(kbdev); + return kbdev->csf.gpu_idle_dur_count; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* The 'reg_lock' is also taken and is held till the update is not * complete, to ensure the update of idle timer value by multiple Users * gets serialized. */ mutex_lock(&kbdev->csf.reg_lock); - /* The firmware only reads the new idle timer value when the timer is - * disabled. - */ - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbase_csf_firmware_disable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - /* Ensure that the request has taken effect */ - wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); +#endif - kbase_csf_scheduler_spin_lock(kbdev, &flags); - kbdev->csf.gpu_idle_hysteresis_ms = dur; - kbdev->csf.gpu_idle_dur_count = hysteresis_val; - kbase_csf_firmware_enable_gpu_idle_timer(kbdev); - kbase_csf_scheduler_spin_unlock(kbdev, flags); - wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + kbase_csf_scheduler_lock(kbdev); + if (kbdev->csf.scheduler.gpu_idle_fw_timer_enabled) { +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + /* The firmware only reads the new idle timer value when the timer is + * disabled. 
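+ * The sequence below therefore disables the idle timer, waits for the
+ * IDLE_DISABLE request to be acknowledged, publishes the new duration count
+ * and modifier under the scheduler spinlock, then re-enables the timer and
+ * waits for the IDLE_ENABLE acknowledgement (descriptive note, not part of
+ * the original patch).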
+ */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Ensure that the request has taken effect */ + wait_for_global_request(kbdev, GLB_REQ_IDLE_DISABLE_MASK); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_us = dur; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + kbase_csf_scheduler_spin_unlock(kbdev, flags); + wait_for_global_request(kbdev, GLB_REQ_IDLE_ENABLE_MASK); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + } else { + /* Record the new values. Would be used later when timer is + * enabled + */ + kbase_csf_scheduler_spin_lock(kbdev, &flags); + kbdev->csf.gpu_idle_hysteresis_us = dur; + kbdev->csf.gpu_idle_dur_count = hysteresis_val; + kbdev->csf.gpu_idle_dur_count_modifier = modifier; + kbase_csf_scheduler_spin_unlock(kbdev, flags); + } + kbase_csf_scheduler_unlock(kbdev); +#else mutex_unlock(&kbdev->csf.reg_lock); +#endif kbase_csf_scheduler_pm_idle(kbdev); - + kbase_reset_gpu_allow(kbdev); end: dev_dbg(kbdev->dev, "CSF set firmware idle hysteresis count-value: 0x%.8x", hysteresis_val); @@ -896,9 +1083,9 @@ end: return hysteresis_val; } -static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us) +static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u32 dur_us, + u32 *modifier) { -#define PWROFF_VAL_UNIT_SHIFT (10) /* Get the cntfreq_el0 value, which drives the SYSTEM_TIMESTAMP */ u64 freq = arch_timer_get_cntfrq(); u64 dur_val = dur_us; @@ -914,14 +1101,17 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 dev_warn(kbdev->dev, "No GPU clock, unexpected integration issue!"); spin_unlock(&kbdev->pm.clk_rtm.lock); - dev_info(kbdev->dev, "Can't get the timestamp frequency, " - "use cycle counter with MCU Core Poweroff timer!"); + dev_info( + kbdev->dev, + "Can't get the timestamp frequency, use cycle counter with MCU shader Core Poweroff timer!"); } /* Formula for dur_val = ((dur_us/1e6) * freq_HZ) >> 10) */ dur_val = (dur_val * freq) >> HYSTERESIS_VAL_UNIT_SHIFT; dur_val = div_u64(dur_val, 1000000); + *modifier = 0; + /* Interface limits the value field to S32_MAX */ cnt_val_u32 = (dur_val > S32_MAX) ? 
S32_MAX : (u32)dur_val; @@ -939,24 +1129,39 @@ static u32 convert_dur_to_core_pwroff_count(struct kbase_device *kbdev, const u3 u32 kbase_csf_firmware_get_mcu_core_pwroff_time(struct kbase_device *kbdev) { - return kbdev->csf.mcu_core_pwroff_dur_us; + u32 pwroff; + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + pwroff = kbdev->csf.mcu_core_pwroff_dur_ns; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return pwroff; } u32 kbase_csf_firmware_set_mcu_core_pwroff_time(struct kbase_device *kbdev, u32 dur) { unsigned long flags; - const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur); + u32 modifier = 0; + + const u32 pwroff = convert_dur_to_core_pwroff_count(kbdev, dur, &modifier); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - kbdev->csf.mcu_core_pwroff_dur_us = dur; + kbdev->csf.mcu_core_pwroff_dur_ns = dur; kbdev->csf.mcu_core_pwroff_dur_count = pwroff; + kbdev->csf.mcu_core_pwroff_dur_count_modifier = modifier; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - dev_dbg(kbdev->dev, "MCU Core Poweroff input update: 0x%.8x", pwroff); + dev_dbg(kbdev->dev, "MCU shader Core Poweroff input update: 0x%.8x", pwroff); return pwroff; } +u32 kbase_csf_firmware_reset_mcu_core_pwroff_time(struct kbase_device *kbdev) +{ + return kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, DEFAULT_GLB_PWROFF_TIMEOUT_NS); +} + int kbase_csf_firmware_early_init(struct kbase_device *kbdev) { init_waitqueue_head(&kbdev->csf.event_wait); @@ -965,29 +1170,46 @@ int kbase_csf_firmware_early_init(struct kbase_device *kbdev) kbdev->csf.fw_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_TIMEOUT); - kbdev->csf.gpu_idle_hysteresis_ms = FIRMWARE_IDLE_HYSTERESIS_TIME_MS; -#ifdef KBASE_PM_RUNTIME - if (kbase_pm_gpu_sleep_allowed(kbdev)) - kbdev->csf.gpu_idle_hysteresis_ms /= - FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; -#endif - WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ms); - kbdev->csf.gpu_idle_dur_count = convert_dur_to_idle_count( - kbdev, kbdev->csf.gpu_idle_hysteresis_ms); - + kbase_csf_firmware_reset_mcu_core_pwroff_time(kbdev); INIT_LIST_HEAD(&kbdev->csf.firmware_interfaces); INIT_LIST_HEAD(&kbdev->csf.firmware_config); INIT_LIST_HEAD(&kbdev->csf.firmware_trace_buffers.list); + INIT_LIST_HEAD(&kbdev->csf.user_reg.list); INIT_WORK(&kbdev->csf.firmware_reload_work, kbase_csf_firmware_reload_worker); INIT_WORK(&kbdev->csf.fw_error_work, firmware_error_worker); + init_rwsem(&kbdev->csf.pmode_sync_sem); mutex_init(&kbdev->csf.reg_lock); + kbase_csf_pending_gpuq_kicks_init(kbdev); + + return 0; +} + +void kbase_csf_firmware_early_term(struct kbase_device *kbdev) +{ + kbase_csf_pending_gpuq_kicks_term(kbdev); + mutex_destroy(&kbdev->csf.reg_lock); +} + +int kbase_csf_firmware_late_init(struct kbase_device *kbdev) +{ + u32 modifier = 0; + + kbdev->csf.gpu_idle_hysteresis_ns = FIRMWARE_IDLE_HYSTERESIS_TIME_NS; +#ifdef KBASE_PM_RUNTIME + if (kbase_pm_gpu_sleep_allowed(kbdev)) + kbdev->csf.gpu_idle_hysteresis_ns /= FIRMWARE_IDLE_HYSTERESIS_GPU_SLEEP_SCALER; +#endif + WARN_ON(!kbdev->csf.gpu_idle_hysteresis_ns); + kbdev->csf.gpu_idle_dur_count = + convert_dur_to_idle_count(kbdev, kbdev->csf.gpu_idle_hysteresis_ns, &modifier); + kbdev->csf.gpu_idle_dur_count_modifier = modifier; return 0; } -int kbase_csf_firmware_init(struct kbase_device *kbdev) +int kbase_csf_firmware_load_init(struct kbase_device *kbdev) { int ret; @@ -1053,11 +1275,11 @@ int kbase_csf_firmware_init(struct kbase_device *kbdev) return 0; error: - kbase_csf_firmware_term(kbdev); + 
kbase_csf_firmware_unload_term(kbdev); return ret; } -void kbase_csf_firmware_term(struct kbase_device *kbdev) +void kbase_csf_firmware_unload_term(struct kbase_device *kbdev) { cancel_work_sync(&kbdev->csf.fw_error_work); @@ -1065,12 +1287,10 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) /* NO_MALI: Don't stop firmware or unload MMU tables */ - kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); + kbase_csf_free_dummy_user_reg_page(kbdev); kbase_csf_scheduler_term(kbdev); - kbase_csf_free_dummy_user_reg_page(kbdev); - kbase_csf_doorbell_mapping_term(kbdev); free_global_iface(kbdev); @@ -1092,12 +1312,12 @@ void kbase_csf_firmware_term(struct kbase_device *kbdev) /* NO_MALI: No trace buffers to terminate */ - mutex_destroy(&kbdev->csf.reg_lock); - /* This will also free up the region allocated for the shared interface * entry parsed from the firmware image. */ kbase_mcu_shared_interface_region_tracker_term(kbdev); + + kbase_mmu_term(kbdev, &kbdev->csf.mcu_mmu); } void kbase_csf_firmware_enable_gpu_idle_timer(struct kbase_device *kbdev) @@ -1146,8 +1366,9 @@ void kbase_csf_firmware_ping(struct kbase_device *const kbdev) kbase_csf_scheduler_spin_unlock(kbdev, flags); } -int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev) +int kbase_csf_firmware_ping_wait(struct kbase_device *const kbdev, unsigned int wait_timeout_ms) { + CSTD_UNUSED(wait_timeout_ms); kbase_csf_firmware_ping(kbdev); return wait_for_global_request(kbdev, GLB_REQ_PING_MASK); } @@ -1186,7 +1407,7 @@ void kbase_csf_enter_protected_mode(struct kbase_device *kbdev) kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) +int kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) { int err = wait_for_global_request(kbdev, GLB_REQ_PROTM_ENTER_MASK); @@ -1194,6 +1415,8 @@ void kbase_csf_wait_protected_mode_enter(struct kbase_device *kbdev) if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); } + + return err; } void kbase_csf_firmware_trigger_mcu_halt(struct kbase_device *kbdev) @@ -1392,7 +1615,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_prot = KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); cpu_map_prot = pgprot_writecombine(cpu_map_prot); - }; + } phys = kmalloc_array(num_pages, sizeof(*phys), GFP_KERNEL); if (!phys) @@ -1402,9 +1625,8 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!page_list) goto page_list_alloc_error; - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - num_pages, phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret <= 0) goto phys_mem_pool_alloc_error; @@ -1415,8 +1637,7 @@ int kbase_csf_firmware_mcu_shared_mapping_init( if (!cpu_addr) goto vmap_error; - va_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, 0, - num_pages, KBASE_REG_ZONE_MCU_SHARED); + va_reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, num_pages); if (!va_reg) goto va_region_alloc_error; @@ -1430,9 +1651,9 @@ int kbase_csf_firmware_mcu_shared_mapping_init( gpu_map_properties &= (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR); gpu_map_properties |= gpu_map_prot; - ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, - va_reg->start_pfn, &phys[0], num_pages, - gpu_map_properties, KBASE_MEM_GROUP_CSF_FW); + ret = kbase_mmu_insert_pages_no_flush(kbdev, &kbdev->csf.mcu_mmu, va_reg->start_pfn, + &phys[0], num_pages, gpu_map_properties, + 
KBASE_MEM_GROUP_CSF_FW, NULL, NULL); if (ret) goto mmu_insert_pages_error; diff --git a/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c b/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c index 4b3931f..7c14b8e 100644 --- a/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c +++ b/mali_kbase/csf/mali_kbase_csf_heap_context_alloc.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,7 @@ #include "mali_kbase_csf_heap_context_alloc.h" /* Size of one heap context structure, in bytes. */ -#define HEAP_CTX_SIZE ((size_t)32) - -/* Total size of the GPU memory region allocated for heap contexts, in bytes. */ -#define HEAP_CTX_REGION_SIZE (MAX_TILER_HEAPS * HEAP_CTX_SIZE) +#define HEAP_CTX_SIZE ((u32)32) /** * sub_alloc - Sub-allocate a heap context from a GPU memory region @@ -38,8 +35,8 @@ static u64 sub_alloc(struct kbase_csf_heap_context_allocator *const ctx_alloc) { struct kbase_context *const kctx = ctx_alloc->kctx; - int heap_nr = 0; - size_t ctx_offset = 0; + unsigned long heap_nr = 0; + u32 ctx_offset = 0; u64 heap_gpu_va = 0; struct kbase_vmap_struct mapping; void *ctx_ptr = NULL; @@ -55,30 +52,65 @@ static u64 sub_alloc(struct kbase_csf_heap_context_allocator *const ctx_alloc) return 0; } - ctx_offset = heap_nr * HEAP_CTX_SIZE; + ctx_offset = heap_nr * ctx_alloc->heap_context_size_aligned; heap_gpu_va = ctx_alloc->gpu_va + ctx_offset; ctx_ptr = kbase_vmap_prot(kctx, heap_gpu_va, - HEAP_CTX_SIZE, KBASE_REG_CPU_WR, &mapping); + ctx_alloc->heap_context_size_aligned, KBASE_REG_CPU_WR, &mapping); if (unlikely(!ctx_ptr)) { dev_err(kctx->kbdev->dev, - "Failed to map tiler heap context %d (0x%llX)\n", + "Failed to map tiler heap context %lu (0x%llX)\n", heap_nr, heap_gpu_va); return 0; } - memset(ctx_ptr, 0, HEAP_CTX_SIZE); + memset(ctx_ptr, 0, ctx_alloc->heap_context_size_aligned); kbase_vunmap(ctx_ptr, &mapping); bitmap_set(ctx_alloc->in_use, heap_nr, 1); - dev_dbg(kctx->kbdev->dev, "Allocated tiler heap context %d (0x%llX)\n", + dev_dbg(kctx->kbdev->dev, "Allocated tiler heap context %lu (0x%llX)\n", heap_nr, heap_gpu_va); return heap_gpu_va; } /** + * evict_heap_context - Evict the data of heap context from GPU's L2 cache. + * + * @ctx_alloc: Pointer to the heap context allocator. + * @heap_gpu_va: The GPU virtual address of a heap context structure to free. + * + * This function is called when memory for the heap context is freed. It uses the + * FLUSH_PA_RANGE command to evict the data of heap context, so on older CSF GPUs + * there is nothing done. The whole GPU cache is anyways expected to be flushed + * on older GPUs when initial chunks of the heap are freed just before the memory + * for heap context is freed. 
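+ *
+ * The context's physical address is derived from its byte offset within the
+ * backing region: the offset picks a page index into the region's physical
+ * pages plus a byte offset inside that page. As an illustration (not from the
+ * original patch), assuming a 64-byte GPU L2 line, the 32-byte heap context is
+ * padded to 64 bytes, so context N starts at byte N * 64 of the region and
+ * never straddles a page boundary.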
+ */ +static void evict_heap_context(struct kbase_csf_heap_context_allocator *const ctx_alloc, + u64 const heap_gpu_va) +{ + struct kbase_context *const kctx = ctx_alloc->kctx; + u32 offset_in_bytes = (u32)(heap_gpu_va - ctx_alloc->gpu_va); + u32 offset_within_page = offset_in_bytes & ~PAGE_MASK; + u32 page_index = offset_in_bytes >> PAGE_SHIFT; + struct tagged_addr page = + kbase_get_gpu_phy_pages(ctx_alloc->region)[page_index]; + phys_addr_t heap_context_pa = as_phys_addr_t(page) + offset_within_page; + + lockdep_assert_held(&ctx_alloc->lock); + + /* There is no need to take vm_lock here as the ctx_alloc region is protected + * via a nonzero no_user_free_count. The region and the backing page can't + * disappear whilst this function is executing. Flush type is passed as FLUSH_PT + * to CLN+INV L2 only. + */ + kbase_mmu_flush_pa_range(kctx->kbdev, kctx, + heap_context_pa, ctx_alloc->heap_context_size_aligned, + KBASE_MMU_OP_FLUSH_PT); +} + +/** * sub_free - Free a heap context sub-allocated from a GPU memory region * * @ctx_alloc: Pointer to the heap context allocator. @@ -88,7 +120,7 @@ static void sub_free(struct kbase_csf_heap_context_allocator *const ctx_alloc, u64 const heap_gpu_va) { struct kbase_context *const kctx = ctx_alloc->kctx; - u64 ctx_offset = 0; + u32 ctx_offset = 0; unsigned int heap_nr = 0; lockdep_assert_held(&ctx_alloc->lock); @@ -99,13 +131,15 @@ static void sub_free(struct kbase_csf_heap_context_allocator *const ctx_alloc, if (WARN_ON(heap_gpu_va < ctx_alloc->gpu_va)) return; - ctx_offset = heap_gpu_va - ctx_alloc->gpu_va; + ctx_offset = (u32)(heap_gpu_va - ctx_alloc->gpu_va); - if (WARN_ON(ctx_offset >= HEAP_CTX_REGION_SIZE) || - WARN_ON(ctx_offset % HEAP_CTX_SIZE)) + if (WARN_ON(ctx_offset >= (ctx_alloc->region->nr_pages << PAGE_SHIFT)) || + WARN_ON(ctx_offset % ctx_alloc->heap_context_size_aligned)) return; - heap_nr = ctx_offset / HEAP_CTX_SIZE; + evict_heap_context(ctx_alloc, heap_gpu_va); + + heap_nr = ctx_offset / ctx_alloc->heap_context_size_aligned; dev_dbg(kctx->kbdev->dev, "Freed tiler heap context %d (0x%llX)\n", heap_nr, heap_gpu_va); @@ -116,12 +150,17 @@ int kbase_csf_heap_context_allocator_init( struct kbase_csf_heap_context_allocator *const ctx_alloc, struct kbase_context *const kctx) { + const u32 gpu_cache_line_size = + (1U << kctx->kbdev->gpu_props.props.l2_props.log2_line_size); + /* We cannot pre-allocate GPU memory here because the * custom VA zone may not have been created yet. 
*/ ctx_alloc->kctx = kctx; ctx_alloc->region = NULL; ctx_alloc->gpu_va = 0; + ctx_alloc->heap_context_size_aligned = + (HEAP_CTX_SIZE + gpu_cache_line_size - 1) & ~(gpu_cache_line_size - 1); mutex_init(&ctx_alloc->lock); bitmap_zero(ctx_alloc->in_use, MAX_TILER_HEAPS); @@ -142,7 +181,9 @@ void kbase_csf_heap_context_allocator_term( if (ctx_alloc->region) { kbase_gpu_vm_lock(kctx); - ctx_alloc->region->flags &= ~KBASE_REG_NO_USER_FREE; + WARN_ON(!kbase_va_region_is_no_user_free(ctx_alloc->region)); + + kbase_va_region_no_user_free_dec(ctx_alloc->region); kbase_mem_free_region(kctx, ctx_alloc->region); kbase_gpu_vm_unlock(kctx); } @@ -154,9 +195,9 @@ u64 kbase_csf_heap_context_allocator_alloc( struct kbase_csf_heap_context_allocator *const ctx_alloc) { struct kbase_context *const kctx = ctx_alloc->kctx; - u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | - BASE_MEM_PROT_CPU_WR | BASEP_MEM_NO_USER_FREE; - u64 nr_pages = PFN_UP(HEAP_CTX_REGION_SIZE); + u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | BASE_MEM_PROT_CPU_WR | + BASEP_MEM_NO_USER_FREE | BASE_MEM_PROT_CPU_RD; + u64 nr_pages = PFN_UP(MAX_TILER_HEAPS * ctx_alloc->heap_context_size_aligned); u64 heap_gpu_va = 0; /* Calls to this function are inherently asynchronous, with respect to @@ -164,10 +205,6 @@ u64 kbase_csf_heap_context_allocator_alloc( */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; -#ifdef CONFIG_MALI_VECTOR_DUMP - flags |= BASE_MEM_PROT_CPU_RD; -#endif - mutex_lock(&ctx_alloc->lock); /* If the pool of heap contexts wasn't already allocated then diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu.c b/mali_kbase/csf/mali_kbase_csf_kcpu.c index 5380994..0b08dba 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu.c +++ b/mali_kbase/csf/mali_kbase_csf_kcpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,7 +24,9 @@ #include <mali_kbase_ctx_sched.h> #include "device/mali_kbase_device.h" #include "mali_kbase_csf.h" +#include "mali_kbase_csf_sync_debugfs.h" #include <linux/export.h> +#include <linux/version_compat_defs.h> #if IS_ENABLED(CONFIG_SYNC_FILE) #include "mali_kbase_fence.h" @@ -33,10 +35,14 @@ static DEFINE_SPINLOCK(kbase_csf_fence_lock); #endif +#ifdef CONFIG_MALI_FENCE_DEBUG +#define FENCE_WAIT_TIMEOUT_MS 3000 +#endif + static void kcpu_queue_process(struct kbase_kcpu_command_queue *kcpu_queue, bool drain_queue); -static void kcpu_queue_process_worker(struct work_struct *data); +static void kcpu_queue_process_worker(struct kthread_work *data); static int kbase_kcpu_map_import_prepare( struct kbase_kcpu_command_queue *kcpu_queue, @@ -51,7 +57,7 @@ static int kbase_kcpu_map_import_prepare( long i; int ret = 0; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); /* Take the processes mmap lock */ down_read(kbase_mem_get_process_mmap_lock()); @@ -76,7 +82,14 @@ static int kbase_kcpu_map_import_prepare( * on the physical pages tracking object. When the last * reference to the tracking object is dropped the pages * would be unpinned if they weren't unpinned before. + * + * Region should be CPU cached: abort if it isn't. 
*/ + if (WARN_ON(!(reg->flags & KBASE_REG_CPU_CACHED))) { + ret = -EINVAL; + goto out; + } + ret = kbase_jd_user_buf_pin_pages(kctx, reg); if (ret) goto out; @@ -110,7 +123,7 @@ static int kbase_kcpu_unmap_import_prepare_internal( struct kbase_va_region *reg; int ret = 0; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); kbase_gpu_vm_lock(kctx); @@ -178,7 +191,8 @@ static void kbase_jit_add_to_pending_alloc_list( &kctx->csf.kcpu_queues.jit_blocked_queues; struct kbase_kcpu_command_queue *blocked_queue; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); list_for_each_entry(blocked_queue, &kctx->csf.kcpu_queues.jit_blocked_queues, @@ -223,25 +237,28 @@ static int kbase_kcpu_jit_allocate_process( u32 i; int ret; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); - - if (alloc_info->blocked) { - list_del(&queue->jit_blocked); - alloc_info->blocked = false; - } + lockdep_assert_held(&queue->lock); if (WARN_ON(!info)) return -EINVAL; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + /* Check if all JIT IDs are not in use */ for (i = 0; i < count; i++, info++) { /* The JIT ID is still in use so fail the allocation */ if (kctx->jit_alloc[info->id]) { dev_dbg(kctx->kbdev->dev, "JIT ID still in use"); - return -EINVAL; + ret = -EINVAL; + goto fail; } } + if (alloc_info->blocked) { + list_del(&queue->jit_blocked); + alloc_info->blocked = false; + } + /* Now start the allocation loop */ for (i = 0, info = alloc_info->info; i < count; i++, info++) { /* Create a JIT allocation */ @@ -276,7 +293,7 @@ static int kbase_kcpu_jit_allocate_process( */ dev_warn_ratelimited(kctx->kbdev->dev, "JIT alloc command failed: %pK\n", cmd); ret = -ENOMEM; - goto fail; + goto fail_rollback; } /* There are pending frees for an active allocation @@ -294,7 +311,8 @@ static int kbase_kcpu_jit_allocate_process( kctx->jit_alloc[info->id] = NULL; } - return -EAGAIN; + ret = -EAGAIN; + goto fail; } /* Bind it to the user provided ID. */ @@ -310,7 +328,7 @@ static int kbase_kcpu_jit_allocate_process( KBASE_REG_CPU_WR, &mapping); if (!ptr) { ret = -ENOMEM; - goto fail; + goto fail_rollback; } reg = kctx->jit_alloc[info->id]; @@ -319,9 +337,11 @@ static int kbase_kcpu_jit_allocate_process( kbase_vunmap(kctx, &mapping); } + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); + return 0; -fail: +fail_rollback: /* Roll back completely */ for (i = 0, info = alloc_info->info; i < count; i++, info++) { /* Free the allocations that were successful. 
@@ -334,6 +354,8 @@ fail: kctx->jit_alloc[info->id] = KBASE_RESERVED_REG_JIT_ALLOC; } +fail: + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return ret; } @@ -345,15 +367,16 @@ static int kbase_kcpu_jit_allocate_prepare( { struct kbase_context *const kctx = kcpu_queue->kctx; void __user *data = u64_to_user_ptr(alloc_info->info); - struct base_jit_alloc_info *info; + struct base_jit_alloc_info *info = NULL; u32 count = alloc_info->count; int ret = 0; u32 i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (!data || count > kcpu_queue->kctx->jit_max_allocations || - count > ARRAY_SIZE(kctx->jit_alloc)) { + if ((count == 0) || (count > ARRAY_SIZE(kctx->jit_alloc)) || + (count > kcpu_queue->kctx->jit_max_allocations) || (!data) || + !kbase_mem_allow_alloc(kctx)) { ret = -EINVAL; goto out; } @@ -388,11 +411,13 @@ static int kbase_kcpu_jit_allocate_prepare( } current_command->type = BASE_KCPU_COMMAND_TYPE_JIT_ALLOC; - list_add_tail(¤t_command->info.jit_alloc.node, - &kctx->csf.kcpu_queues.jit_cmds_head); current_command->info.jit_alloc.info = info; current_command->info.jit_alloc.count = count; current_command->info.jit_alloc.blocked = false; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + list_add_tail(¤t_command->info.jit_alloc.node, + &kctx->csf.kcpu_queues.jit_cmds_head); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return 0; out_free: @@ -411,7 +436,9 @@ static void kbase_kcpu_jit_allocate_finish( struct kbase_kcpu_command_queue *queue, struct kbase_kcpu_command *cmd) { - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + + mutex_lock(&queue->kctx->csf.kcpu_queues.jit_lock); /* Remove this command from the jit_cmds_head list */ list_del(&cmd->info.jit_alloc.node); @@ -425,6 +452,8 @@ static void kbase_kcpu_jit_allocate_finish( cmd->info.jit_alloc.blocked = false; } + mutex_unlock(&queue->kctx->csf.kcpu_queues.jit_lock); + kfree(cmd->info.jit_alloc.info); } @@ -437,18 +466,17 @@ static void kbase_kcpu_jit_retry_pending_allocs(struct kbase_context *kctx) { struct kbase_kcpu_command_queue *blocked_queue; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); /* * Reschedule all queues blocked by JIT_ALLOC commands. * NOTE: This code traverses the list of blocked queues directly. It * only works as long as the queued works are not executed at the same * time. This precondition is true since we're holding the - * kbase_csf_kcpu_queue_context.lock . + * kbase_csf_kcpu_queue_context.jit_lock . 
*/ - list_for_each_entry(blocked_queue, - &kctx->csf.kcpu_queues.jit_blocked_queues, jit_blocked) - queue_work(kctx->csf.kcpu_queues.wq, &blocked_queue->work); + list_for_each_entry(blocked_queue, &kctx->csf.kcpu_queues.jit_blocked_queues, jit_blocked) + kthread_queue_work(&blocked_queue->csf_kcpu_worker, &blocked_queue->work); } static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, @@ -465,7 +493,8 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, if (WARN_ON(!ids)) return -EINVAL; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); KBASE_TLSTREAM_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_EXECUTE_JIT_FREE_END(queue->kctx->kbdev, queue); @@ -497,9 +526,6 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, queue->kctx->kbdev, queue, item_err, pages_used); } - /* Free the list of ids */ - kfree(ids); - /* * Remove this command from the jit_cmds_head list and retry pending * allocations. @@ -507,6 +533,11 @@ static int kbase_kcpu_jit_free_process(struct kbase_kcpu_command_queue *queue, list_del(&cmd->info.jit_free.node); kbase_kcpu_jit_retry_pending_allocs(kctx); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); + + /* Free the list of ids */ + kfree(ids); + return rc; } @@ -522,7 +553,7 @@ static int kbase_kcpu_jit_free_prepare( int ret; u32 i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); /* Sanity checks */ if (!count || count > ARRAY_SIZE(kctx->jit_alloc)) { @@ -568,10 +599,12 @@ static int kbase_kcpu_jit_free_prepare( } current_command->type = BASE_KCPU_COMMAND_TYPE_JIT_FREE; - list_add_tail(¤t_command->info.jit_free.node, - &kctx->csf.kcpu_queues.jit_cmds_head); current_command->info.jit_free.ids = ids; current_command->info.jit_free.count = count; + mutex_lock(&kctx->csf.kcpu_queues.jit_lock); + list_add_tail(¤t_command->info.jit_free.node, + &kctx->csf.kcpu_queues.jit_cmds_head); + mutex_unlock(&kctx->csf.kcpu_queues.jit_lock); return 0; out_free: @@ -580,6 +613,7 @@ out: return ret; } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST static int kbase_csf_queue_group_suspend_prepare( struct kbase_kcpu_command_queue *kcpu_queue, struct base_kcpu_command_group_suspend_info *suspend_buf, @@ -597,7 +631,7 @@ static int kbase_csf_queue_group_suspend_prepare( int pinned_pages = 0, ret = 0; struct kbase_va_region *reg; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (suspend_buf->size < csg_suspend_buf_size) return -EINVAL; @@ -647,10 +681,11 @@ static int kbase_csf_queue_group_suspend_prepare( struct tagged_addr *page_array; u64 start, end, i; - if (((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_SAME_VA) || - reg->nr_pages < nr_pages || - kbase_reg_current_backed_size(reg) != - reg->nr_pages) { + if ((kbase_bits_to_zone(reg->flags) != SAME_VA_ZONE) || + (kbase_reg_current_backed_size(reg) < nr_pages) || + !(reg->flags & KBASE_REG_CPU_WR) || + (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) || + (kbase_is_region_shrinkable(reg)) || (kbase_va_region_is_no_user_free(reg))) { ret = -EINVAL; goto out_clean_pages; } @@ -694,14 +729,14 @@ static int kbase_csf_queue_group_suspend_process(struct kbase_context *kctx, { return kbase_csf_queue_group_suspend(kctx, sus_buf, group_handle); } +#endif static enum kbase_csf_event_callback_action event_cqs_callback(void *param) { struct kbase_kcpu_command_queue *kcpu_queue = (struct 
kbase_kcpu_command_queue *)param; - struct kbase_context *const kctx = kcpu_queue->kctx; - queue_work(kctx->csf.kcpu_queues.wq, &kcpu_queue->work); + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); return KBASE_CSF_EVENT_CALLBACK_KEEP; } @@ -731,7 +766,7 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, { u32 i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_wait->objs)) return -EINVAL; @@ -748,7 +783,7 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START(kbdev, queue); queue->command_started = true; - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_WAIT_START, + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_WAIT_START, queue, cqs_wait->nr_objs, 0); } @@ -759,23 +794,24 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, return -EINVAL; } - sig_set = evt[BASEP_EVENT_VAL_INDEX] > cqs_wait->objs[i].val; + sig_set = + evt[BASEP_EVENT32_VAL_OFFSET / sizeof(u32)] > cqs_wait->objs[i].val; if (sig_set) { bool error = false; bitmap_set(cqs_wait->signaled, i, 1); if ((cqs_wait->inherit_err_flags & (1U << i)) && - evt[BASEP_EVENT_ERR_INDEX] > 0) { + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)] > 0) { queue->has_error = true; error = true; } - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_WAIT_END, + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_WAIT_END, queue, cqs_wait->objs[i].addr, error); KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END( - kbdev, queue, evt[BASEP_EVENT_ERR_INDEX]); + kbdev, queue, evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)]); queue->command_started = false; } @@ -792,14 +828,36 @@ static int kbase_kcpu_cqs_wait_process(struct kbase_device *kbdev, return bitmap_full(cqs_wait->signaled, cqs_wait->nr_objs); } +static inline bool kbase_kcpu_cqs_is_data_type_valid(u8 data_type) +{ + return data_type == BASEP_CQS_DATA_TYPE_U32 || data_type == BASEP_CQS_DATA_TYPE_U64; +} + +static inline bool kbase_kcpu_cqs_is_aligned(u64 addr, u8 data_type) +{ + BUILD_BUG_ON(BASEP_EVENT32_ALIGN_BYTES != BASEP_EVENT32_SIZE_BYTES); + BUILD_BUG_ON(BASEP_EVENT64_ALIGN_BYTES != BASEP_EVENT64_SIZE_BYTES); + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid(data_type)); + + switch (data_type) { + default: + return false; + case BASEP_CQS_DATA_TYPE_U32: + return (addr & (BASEP_EVENT32_ALIGN_BYTES - 1)) == 0; + case BASEP_CQS_DATA_TYPE_U64: + return (addr & (BASEP_EVENT64_ALIGN_BYTES - 1)) == 0; + } +} + static int kbase_kcpu_cqs_wait_prepare(struct kbase_kcpu_command_queue *queue, struct base_kcpu_command_cqs_wait_info *cqs_wait_info, struct kbase_kcpu_command *current_command) { struct base_cqs_wait_info *objs; unsigned int nr_objs = cqs_wait_info->nr_objs; + unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -817,6 +875,17 @@ static int kbase_kcpu_cqs_wait_prepare(struct kbase_kcpu_command_queue *queue, return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. 
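+ * For instance (illustrative, not from the original patch), a 4-byte Sync32
+ * object at a 4-byte-aligned address such as PAGE_SIZE - 4 still lies wholly
+ * within the page, whereas an address like PAGE_SIZE - 2 would straddle the
+ * page boundary and is rejected by the check below.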
+ */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_aligned(objs[i].addr, BASEP_CQS_DATA_TYPE_U32)) { + kfree(objs); + return -EINVAL; + } + } + if (++queue->cqs_wait_count == 1) { if (kbase_csf_event_wait_add(queue->kctx, event_cqs_callback, queue)) { @@ -853,7 +922,7 @@ static void kbase_kcpu_cqs_set_process(struct kbase_device *kbdev, { unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_set->objs)) return; @@ -872,14 +941,13 @@ static void kbase_kcpu_cqs_set_process(struct kbase_device *kbdev, "Sync memory %llx already freed", cqs_set->objs[i].addr); queue->has_error = true; } else { - evt[BASEP_EVENT_ERR_INDEX] = queue->has_error; + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)] = queue->has_error; /* Set to signaled */ - evt[BASEP_EVENT_VAL_INDEX]++; + evt[BASEP_EVENT32_VAL_OFFSET / sizeof(u32)]++; kbase_phy_alloc_mapping_put(queue->kctx, mapping); - KBASE_KTRACE_ADD_CSF_KCPU(kbdev, CQS_SET, - queue, cqs_set->objs[i].addr, - evt[BASEP_EVENT_ERR_INDEX]); + KBASE_KTRACE_ADD_CSF_KCPU(kbdev, KCPU_CQS_SET, queue, cqs_set->objs[i].addr, + evt[BASEP_EVENT32_ERR_OFFSET / sizeof(u32)]); } } @@ -894,11 +962,11 @@ static int kbase_kcpu_cqs_set_prepare( struct base_kcpu_command_cqs_set_info *cqs_set_info, struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; struct base_cqs_set *objs; unsigned int nr_objs = cqs_set_info->nr_objs; + unsigned int i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -916,6 +984,17 @@ static int kbase_kcpu_cqs_set_prepare( return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. 
+ */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_aligned(objs[i].addr, BASEP_CQS_DATA_TYPE_U32)) { + kfree(objs); + return -EINVAL; + } + } + current_command->type = BASE_KCPU_COMMAND_TYPE_CQS_SET; current_command->info.cqs_set.nr_objs = nr_objs; current_command->info.cqs_set.objs = objs; @@ -948,7 +1027,7 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, { u32 i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_wait_operation->objs)) return -EINVAL; @@ -958,12 +1037,16 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, if (!test_bit(i, cqs_wait_operation->signaled)) { struct kbase_vmap_struct *mapping; bool sig_set; - u64 *evt = (u64 *)kbase_phy_alloc_mapping_get(queue->kctx, - cqs_wait_operation->objs[i].addr, &mapping); + uintptr_t evt = (uintptr_t)kbase_phy_alloc_mapping_get( + queue->kctx, cqs_wait_operation->objs[i].addr, &mapping); + u64 val = 0; - /* GPUCORE-28172 RDT to review */ - if (!queue->command_started) + if (!queue->command_started) { queue->command_started = true; + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( + kbdev, queue); + } + if (!evt) { dev_warn(kbdev->dev, @@ -972,12 +1055,29 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, return -EINVAL; } + switch (cqs_wait_operation->objs[i].data_type) { + default: + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid( + cqs_wait_operation->objs[i].data_type)); + kbase_phy_alloc_mapping_put(queue->kctx, mapping); + queue->has_error = true; + return -EINVAL; + case BASEP_CQS_DATA_TYPE_U32: + val = *(u32 *)evt; + evt += BASEP_EVENT32_ERR_OFFSET - BASEP_EVENT32_VAL_OFFSET; + break; + case BASEP_CQS_DATA_TYPE_U64: + val = *(u64 *)evt; + evt += BASEP_EVENT64_ERR_OFFSET - BASEP_EVENT64_VAL_OFFSET; + break; + } + switch (cqs_wait_operation->objs[i].operation) { case BASEP_CQS_WAIT_OPERATION_LE: - sig_set = *evt <= cqs_wait_operation->objs[i].val; + sig_set = val <= cqs_wait_operation->objs[i].val; break; case BASEP_CQS_WAIT_OPERATION_GT: - sig_set = *evt > cqs_wait_operation->objs[i].val; + sig_set = val > cqs_wait_operation->objs[i].val; break; default: dev_dbg(kbdev->dev, @@ -989,28 +1089,15 @@ static int kbase_kcpu_cqs_wait_operation_process(struct kbase_device *kbdev, return -EINVAL; } - /* Increment evt up to the error_state value depending on the CQS data type */ - switch (cqs_wait_operation->objs[i].data_type) { - default: - dev_dbg(kbdev->dev, "Unreachable data_type=%d", cqs_wait_operation->objs[i].data_type); - /* Fallthrough - hint to compiler that there's really only 2 options at present */ - fallthrough; - case BASEP_CQS_DATA_TYPE_U32: - evt = (u64 *)((u8 *)evt + sizeof(u32)); - break; - case BASEP_CQS_DATA_TYPE_U64: - evt = (u64 *)((u8 *)evt + sizeof(u64)); - break; - } - if (sig_set) { bitmap_set(cqs_wait_operation->signaled, i, 1); if ((cqs_wait_operation->inherit_err_flags & (1U << i)) && - *evt > 0) { + *(u32 *)evt > 0) { queue->has_error = true; } - /* GPUCORE-28172 RDT to review */ + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( + kbdev, queue, *(u32 *)evt); queue->command_started = false; } @@ -1034,8 +1121,9 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue { struct base_cqs_wait_operation_info *objs; unsigned int nr_objs = cqs_wait_operation_info->nr_objs; + unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if 
(nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -1053,6 +1141,18 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. + */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_data_type_valid(objs[i].data_type) || + !kbase_kcpu_cqs_is_aligned(objs[i].addr, objs[i].data_type)) { + kfree(objs); + return -EINVAL; + } + } + if (++queue->cqs_wait_count == 1) { if (kbase_csf_event_wait_add(queue->kctx, event_cqs_callback, queue)) { @@ -1083,6 +1183,44 @@ static int kbase_kcpu_cqs_wait_operation_prepare(struct kbase_kcpu_command_queue return 0; } +static void kbasep_kcpu_cqs_do_set_operation_32(struct kbase_kcpu_command_queue *queue, + uintptr_t evt, u8 operation, u64 val) +{ + struct kbase_device *kbdev = queue->kctx->kbdev; + + switch (operation) { + case BASEP_CQS_SET_OPERATION_ADD: + *(u32 *)evt += (u32)val; + break; + case BASEP_CQS_SET_OPERATION_SET: + *(u32 *)evt = val; + break; + default: + dev_dbg(kbdev->dev, "Unsupported CQS set operation %d", operation); + queue->has_error = true; + break; + } +} + +static void kbasep_kcpu_cqs_do_set_operation_64(struct kbase_kcpu_command_queue *queue, + uintptr_t evt, u8 operation, u64 val) +{ + struct kbase_device *kbdev = queue->kctx->kbdev; + + switch (operation) { + case BASEP_CQS_SET_OPERATION_ADD: + *(u64 *)evt += val; + break; + case BASEP_CQS_SET_OPERATION_SET: + *(u64 *)evt = val; + break; + default: + dev_dbg(kbdev->dev, "Unsupported CQS set operation %d", operation); + queue->has_error = true; + break; + } +} + static void kbase_kcpu_cqs_set_operation_process( struct kbase_device *kbdev, struct kbase_kcpu_command_queue *queue, @@ -1090,58 +1228,49 @@ static void kbase_kcpu_cqs_set_operation_process( { unsigned int i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); if (WARN_ON(!cqs_set_operation->objs)) return; for (i = 0; i < cqs_set_operation->nr_objs; i++) { struct kbase_vmap_struct *mapping; - u64 *evt; + uintptr_t evt; - evt = (u64 *)kbase_phy_alloc_mapping_get( + evt = (uintptr_t)kbase_phy_alloc_mapping_get( queue->kctx, cqs_set_operation->objs[i].addr, &mapping); - /* GPUCORE-28172 RDT to review */ - if (!evt) { dev_warn(kbdev->dev, "Sync memory %llx already freed", cqs_set_operation->objs[i].addr); queue->has_error = true; } else { - switch (cqs_set_operation->objs[i].operation) { - case BASEP_CQS_SET_OPERATION_ADD: - *evt += cqs_set_operation->objs[i].val; - break; - case BASEP_CQS_SET_OPERATION_SET: - *evt = cqs_set_operation->objs[i].val; - break; - default: - dev_dbg(kbdev->dev, - "Unsupported CQS set operation %d", cqs_set_operation->objs[i].operation); - queue->has_error = true; - break; - } + struct base_cqs_set_operation_info *obj = &cqs_set_operation->objs[i]; - /* Increment evt up to the error_state value depending on the CQS data type */ - switch (cqs_set_operation->objs[i].data_type) { + switch (obj->data_type) { default: - dev_dbg(kbdev->dev, "Unreachable data_type=%d", cqs_set_operation->objs[i].data_type); - /* Fallthrough - hint to compiler that there's really only 2 options at present */ - fallthrough; + WARN_ON(!kbase_kcpu_cqs_is_data_type_valid(obj->data_type)); + queue->has_error = true; + goto skip_err_propagation; case BASEP_CQS_DATA_TYPE_U32: - evt = (u64 *)((u8 *)evt + sizeof(u32)); + 
kbasep_kcpu_cqs_do_set_operation_32(queue, evt, obj->operation, + obj->val); + evt += BASEP_EVENT32_ERR_OFFSET - BASEP_EVENT32_VAL_OFFSET; break; case BASEP_CQS_DATA_TYPE_U64: - evt = (u64 *)((u8 *)evt + sizeof(u64)); + kbasep_kcpu_cqs_do_set_operation_64(queue, evt, obj->operation, + obj->val); + evt += BASEP_EVENT64_ERR_OFFSET - BASEP_EVENT64_VAL_OFFSET; break; } - /* GPUCORE-28172 RDT to review */ + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( + kbdev, queue, *(u32 *)evt ? 1 : 0); /* Always propagate errors */ - *evt = queue->has_error; + *(u32 *)evt = queue->has_error; +skip_err_propagation: kbase_phy_alloc_mapping_put(queue->kctx, mapping); } } @@ -1157,11 +1286,11 @@ static int kbase_kcpu_cqs_set_operation_prepare( struct base_kcpu_command_cqs_set_operation_info *cqs_set_operation_info, struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; struct base_cqs_set_operation_info *objs; unsigned int nr_objs = cqs_set_operation_info->nr_objs; + unsigned int i; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (nr_objs > BASEP_KCPU_CQS_MAX_NUM_OBJS) return -EINVAL; @@ -1179,6 +1308,18 @@ static int kbase_kcpu_cqs_set_operation_prepare( return -ENOMEM; } + /* Check the CQS objects as early as possible. By checking their alignment + * (required alignment equals to size for Sync32 and Sync64 objects), we can + * prevent overrunning the supplied event page. + */ + for (i = 0; i < nr_objs; i++) { + if (!kbase_kcpu_cqs_is_data_type_valid(objs[i].data_type) || + !kbase_kcpu_cqs_is_aligned(objs[i].addr, objs[i].data_type)) { + kfree(objs); + return -EINVAL; + } + } + current_command->type = BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION; current_command->info.cqs_set_operation.nr_objs = nr_objs; current_command->info.cqs_set_operation.objs = objs; @@ -1200,20 +1341,24 @@ static void kbase_csf_fence_wait_callback(struct dma_fence *fence, struct kbase_kcpu_command_queue *kcpu_queue = fence_info->kcpu_queue; struct kbase_context *const kctx = kcpu_queue->kctx; - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_WAIT_END, kcpu_queue, +#ifdef CONFIG_MALI_FENCE_DEBUG + /* Fence gets signaled. Deactivate the timer for fence-wait timeout */ + del_timer(&kcpu_queue->fence_timeout); +#endif + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, fence->context, fence->seqno); /* Resume kcpu command queue processing. */ - queue_work(kctx->csf.kcpu_queues.wq, &kcpu_queue->work); + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); } -static void kbase_kcpu_fence_wait_cancel( - struct kbase_kcpu_command_queue *kcpu_queue, - struct kbase_kcpu_command_fence_info *fence_info) +static void kbasep_kcpu_fence_wait_cancel(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) { struct kbase_context *const kctx = kcpu_queue->kctx; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (WARN_ON(!fence_info->fence)) return; @@ -1222,8 +1367,15 @@ static void kbase_kcpu_fence_wait_cancel( bool removed = dma_fence_remove_callback(fence_info->fence, &fence_info->fence_cb); +#ifdef CONFIG_MALI_FENCE_DEBUG + /* Fence-wait cancelled or fence signaled. In the latter case + * the timer would already have been deactivated inside + * kbase_csf_fence_wait_callback(). 
+ */ + del_timer_sync(&kcpu_queue->fence_timeout); +#endif if (removed) - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_WAIT_END, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, fence_info->fence->context, fence_info->fence->seqno); } @@ -1235,6 +1387,80 @@ static void kbase_kcpu_fence_wait_cancel( fence_info->fence = NULL; } +#ifdef CONFIG_MALI_FENCE_DEBUG +/** + * fence_timeout_callback() - Timeout callback function for fence-wait + * + * @timer: Timer struct + * + * Context and seqno of the timed-out fence will be displayed in dmesg. + * If the fence has been signalled a work will be enqueued to process + * the fence-wait without displaying debugging information. + */ +static void fence_timeout_callback(struct timer_list *timer) +{ + struct kbase_kcpu_command_queue *kcpu_queue = + container_of(timer, struct kbase_kcpu_command_queue, fence_timeout); + struct kbase_context *const kctx = kcpu_queue->kctx; + struct kbase_kcpu_command *cmd = &kcpu_queue->commands[kcpu_queue->start_offset]; + struct kbase_kcpu_command_fence_info *fence_info; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_sync_fence_info info; + + if (cmd->type != BASE_KCPU_COMMAND_TYPE_FENCE_WAIT) { + dev_err(kctx->kbdev->dev, + "%s: Unexpected command type %d in ctx:%d_%d kcpu queue:%u", __func__, + cmd->type, kctx->tgid, kctx->id, kcpu_queue->id); + return; + } + + fence_info = &cmd->info.fence; + + fence = kbase_fence_get(fence_info); + if (!fence) { + dev_err(kctx->kbdev->dev, "no fence found in ctx:%d_%d kcpu queue:%u", kctx->tgid, + kctx->id, kcpu_queue->id); + return; + } + + kbase_sync_fence_info_get(fence, &info); + + if (info.status == 1) { + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->work); + } else if (info.status == 0) { + dev_warn(kctx->kbdev->dev, "fence has not yet signalled in %ums", + FENCE_WAIT_TIMEOUT_MS); + dev_warn(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue:%u still waiting for fence[%pK] context#seqno:%s", + kctx->tgid, kctx->id, kcpu_queue->id, fence, info.name); + } else { + dev_warn(kctx->kbdev->dev, "fence has got error"); + dev_warn(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue:%u faulty fence[%pK] context#seqno:%s error(%d)", + kctx->tgid, kctx->id, kcpu_queue->id, fence, info.name, info.status); + } + + kbase_fence_put(fence); +} + +/** + * fence_wait_timeout_start() - Start a timer to check fence-wait timeout + * + * @cmd: KCPU command queue + * + * Activate a timer to check whether a fence-wait command in the queue + * gets completed within FENCE_WAIT_TIMEOUT_MS + */ +static void fence_wait_timeout_start(struct kbase_kcpu_command_queue *cmd) +{ + mod_timer(&cmd->fence_timeout, jiffies + msecs_to_jiffies(FENCE_WAIT_TIMEOUT_MS)); +} +#endif + /** * kbase_kcpu_fence_wait_process() - Process the kcpu fence wait command * @@ -1254,8 +1480,9 @@ static int kbase_kcpu_fence_wait_process( #else struct dma_fence *fence; #endif + struct kbase_context *const kctx = kcpu_queue->kctx; - lockdep_assert_held(&kcpu_queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); if (WARN_ON(!fence_info->fence)) return -EINVAL; @@ -1265,18 +1492,38 @@ static int kbase_kcpu_fence_wait_process( if (kcpu_queue->fence_wait_processed) { fence_status = dma_fence_get_status(fence); } else { - int cb_err = dma_fence_add_callback(fence, + int cb_err; + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_START, kcpu_queue, + fence->context, fence->seqno); + + cb_err = 
dma_fence_add_callback(fence, &fence_info->fence_cb, kbase_csf_fence_wait_callback); - KBASE_KTRACE_ADD_CSF_KCPU(kcpu_queue->kctx->kbdev, - FENCE_WAIT_START, kcpu_queue, - fence->context, fence->seqno); fence_status = cb_err; - if (cb_err == 0) + if (cb_err == 0) { kcpu_queue->fence_wait_processed = true; - else if (cb_err == -ENOENT) +#ifdef CONFIG_MALI_FENCE_DEBUG + fence_wait_timeout_start(kcpu_queue); +#endif + } else if (cb_err == -ENOENT) { fence_status = dma_fence_get_status(fence); + if (!fence_status) { + struct kbase_sync_fence_info info; + + kbase_sync_fence_info_get(fence, &info); + dev_warn(kctx->kbdev->dev, + "Unexpected status for fence %s of ctx:%d_%d kcpu queue:%u", + info.name, kctx->tgid, kctx->id, kcpu_queue->id); + } + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, + fence->context, fence->seqno); + } else { + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_WAIT_END, kcpu_queue, + fence->context, fence->seqno); + } } /* @@ -1289,17 +1536,15 @@ static int kbase_kcpu_fence_wait_process( */ if (fence_status) - kbase_kcpu_fence_wait_cancel(kcpu_queue, fence_info); + kbasep_kcpu_fence_wait_cancel(kcpu_queue, fence_info); return fence_status; } -static int kbase_kcpu_fence_wait_prepare( - struct kbase_kcpu_command_queue *kcpu_queue, - struct base_kcpu_command_fence_info *fence_info, - struct kbase_kcpu_command *current_command) +static int kbase_kcpu_fence_wait_prepare(struct kbase_kcpu_command_queue *kcpu_queue, + struct base_kcpu_command_fence_info *fence_info, + struct kbase_kcpu_command *current_command) { - struct kbase_context *const kctx = kcpu_queue->kctx; #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_in; #else @@ -1307,10 +1552,9 @@ static int kbase_kcpu_fence_wait_prepare( #endif struct base_fence fence; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), - sizeof(fence))) + if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), sizeof(fence))) return -ENOMEM; fence_in = sync_file_get_fence(fence.basep.fd); @@ -1321,62 +1565,267 @@ static int kbase_kcpu_fence_wait_prepare( current_command->type = BASE_KCPU_COMMAND_TYPE_FENCE_WAIT; current_command->info.fence.fence = fence_in; current_command->info.fence.kcpu_queue = kcpu_queue; - return 0; } -static int kbase_kcpu_fence_signal_process( +/** + * fence_signal_timeout_start() - Start a timer to check enqueued fence-signal command is + * blocked for too long a duration + * + * @kcpu_queue: KCPU command queue + * + * Activate the queue's fence_signal_timeout timer to check whether a fence-signal command + * enqueued has been blocked for longer than a configured wait duration. 
+ */ +static void fence_signal_timeout_start(struct kbase_kcpu_command_queue *kcpu_queue) +{ + struct kbase_device *kbdev = kcpu_queue->kctx->kbdev; + unsigned int wait_ms = kbase_get_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT); + + if (atomic_read(&kbdev->fence_signal_timeout_enabled)) + mod_timer(&kcpu_queue->fence_signal_timeout, jiffies + msecs_to_jiffies(wait_ms)); +} + +static void kbase_kcpu_command_fence_force_signaled_set( + struct kbase_kcpu_command_fence_info *fence_info, + bool has_force_signaled) +{ + fence_info->fence_has_force_signaled = has_force_signaled; +} + +bool kbase_kcpu_command_fence_has_force_signaled(struct kbase_kcpu_command_fence_info *fence_info) +{ + return fence_info->fence_has_force_signaled; +} + +static int kbase_kcpu_fence_force_signal_process( struct kbase_kcpu_command_queue *kcpu_queue, struct kbase_kcpu_command_fence_info *fence_info) { struct kbase_context *const kctx = kcpu_queue->kctx; int ret; + /* already force signaled just return*/ + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return 0; + + if (WARN_ON(!fence_info->fence)) + return -EINVAL; + + ret = dma_fence_signal(fence_info->fence); + if (unlikely(ret < 0)) { + dev_warn(kctx->kbdev->dev, "dma_fence(%d) has been signalled already\n", ret); + /* Treated as a success */ + ret = 0; + } + + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_SIGNAL, kcpu_queue, + fence_info->fence->context, + fence_info->fence->seqno); + +#if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) + dev_info(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue[%pK]:%u signal fence[%pK] context#seqno:%llu#%u\n", + kctx->tgid, kctx->id, kcpu_queue, kcpu_queue->id, fence_info->fence, + fence_info->fence->context, fence_info->fence->seqno); +#else + dev_info(kctx->kbdev->dev, + "ctx:%d_%d kcpu queue[%pK]:%u signal fence[%pK] context#seqno:%llu#%llu\n", + kctx->tgid, kctx->id, kcpu_queue, kcpu_queue->id, fence_info->fence, + fence_info->fence->context, fence_info->fence->seqno); +#endif + + /* dma_fence refcount needs to be decreased to release it. */ + dma_fence_put(fence_info->fence); + fence_info->fence = NULL; + + return ret; +} + +static void kcpu_force_signal_fence(struct kbase_kcpu_command_queue *kcpu_queue) +{ + int status; + int i; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_context *const kctx = kcpu_queue->kctx; +#ifdef CONFIG_MALI_FENCE_DEBUG + int del; +#endif + + /* Force trigger all pending fence-signal commands */ + for (i = 0; i != kcpu_queue->num_pending_cmds; ++i) { + struct kbase_kcpu_command *cmd = + &kcpu_queue->commands[(u8)(kcpu_queue->start_offset + i)]; + + if (cmd->type == BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL) { + /* If a fence had already force-signalled previously, + * just skip it in this round of force signalling. 
+ */ + if (kbase_kcpu_command_fence_has_force_signaled(&cmd->info.fence)) + continue; + + fence = kbase_fence_get(&cmd->info.fence); + + dev_info(kctx->kbdev->dev, "kbase KCPU[%pK] cmd%d fence[%pK] force signaled\n", + kcpu_queue, i+1, fence); + + /* Set the ETIMEDOUT error flag before signalling the fence */ + dma_fence_set_error_helper(fence, -ETIMEDOUT); + + /* Force-signal the fence */ + status = kbase_kcpu_fence_force_signal_process( + kcpu_queue, &cmd->info.fence); + if (status < 0) + dev_err(kctx->kbdev->dev, "kbase signal failed\n"); + else + kbase_kcpu_command_fence_force_signaled_set(&cmd->info.fence, true); + + kcpu_queue->has_error = true; + } + } + + /* Set fence_signal_pending_cnt to 0 + * and delete the kcpu_queue's fence-signal timer, + * because all the pending fences in the queue have been signalled + */ + atomic_set(&kcpu_queue->fence_signal_pending_cnt, 0); +#ifdef CONFIG_MALI_FENCE_DEBUG + del = del_timer_sync(&kcpu_queue->fence_signal_timeout); + dev_info(kctx->kbdev->dev, "kbase KCPU [%pK] delete fence signal timeout timer ret: %d", + kcpu_queue, del); +#else + del_timer_sync(&kcpu_queue->fence_signal_timeout); +#endif +} + +static void kcpu_queue_force_fence_signal(struct kbase_kcpu_command_queue *kcpu_queue) +{ + struct kbase_context *const kctx = kcpu_queue->kctx; + char buff[] = "surfaceflinger"; + + /* Force-signal unsignalled fences, except for surfaceflinger */ + if (memcmp(kctx->comm, buff, sizeof(buff))) { + mutex_lock(&kcpu_queue->lock); + kcpu_force_signal_fence(kcpu_queue); + mutex_unlock(&kcpu_queue->lock); + } +} + +/** + * fence_signal_timeout_cb() - Timeout callback function for fence-signal-wait + * + * @timer: Timer struct + * + * Callback function invoked when an enqueued fence-signal command has exceeded its configured + * wait duration. At the moment it is just a simple placeholder, for other tasks to expand into an + * actual sync-state dump via a bottom-half workqueue item. + */ +static void fence_signal_timeout_cb(struct timer_list *timer) +{ + struct kbase_kcpu_command_queue *kcpu_queue = + container_of(timer, struct kbase_kcpu_command_queue, fence_signal_timeout); + struct kbase_context *const kctx = kcpu_queue->kctx; +#ifdef CONFIG_MALI_FENCE_DEBUG + dev_warn(kctx->kbdev->dev, "kbase KCPU fence signal timeout callback triggered"); +#endif + + /* If we have additional pending fence signal commands in the queue, re-arm for the + * remaining fence signal commands, and dump the work to dmesg, only if the + * global configuration option is set.
+ */ + if (atomic_read(&kctx->kbdev->fence_signal_timeout_enabled)) { + if (atomic_read(&kcpu_queue->fence_signal_pending_cnt) > 1) + fence_signal_timeout_start(kcpu_queue); + + kthread_queue_work(&kcpu_queue->csf_kcpu_worker, &kcpu_queue->timeout_work); + } +} + +static int kbasep_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) +{ + struct kbase_context *const kctx = kcpu_queue->kctx; + int ret; + + /* already force signaled */ + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return 0; + if (WARN_ON(!fence_info->fence)) return -EINVAL; ret = dma_fence_signal(fence_info->fence); if (unlikely(ret < 0)) { - dev_warn(kctx->kbdev->dev, - "fence_signal() failed with %d\n", ret); + dev_warn(kctx->kbdev->dev, "dma_fence(%d) has been signalled already\n", ret); + /* Treated as a success */ + ret = 0; } - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, FENCE_SIGNAL, kcpu_queue, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_FENCE_SIGNAL, kcpu_queue, fence_info->fence->context, fence_info->fence->seqno); - dma_fence_put(fence_info->fence); + /* If one has multiple enqueued fence signal commands, re-arm the timer */ + if (atomic_dec_return(&kcpu_queue->fence_signal_pending_cnt) > 0) { + fence_signal_timeout_start(kcpu_queue); +#ifdef CONFIG_MALI_FENCE_DEBUG + dev_dbg(kctx->kbdev->dev, + "kbase re-arm KCPU fence signal timeout timer for next signal command"); +#endif + } else { +#ifdef CONFIG_MALI_FENCE_DEBUG + int del = del_timer_sync(&kcpu_queue->fence_signal_timeout); + + dev_dbg(kctx->kbdev->dev, "kbase KCPU delete fence signal timeout timer ret: %d", + del); + CSTD_UNUSED(del); +#else + del_timer_sync(&kcpu_queue->fence_signal_timeout); +#endif + } + + /* dma_fence refcount needs to be decreased to release it. 
*/ + kbase_fence_put(fence_info->fence); fence_info->fence = NULL; return ret; } -static int kbase_kcpu_fence_signal_prepare( - struct kbase_kcpu_command_queue *kcpu_queue, - struct base_kcpu_command_fence_info *fence_info, - struct kbase_kcpu_command *current_command) +static int kbasep_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, + int *fd) { - struct kbase_context *const kctx = kcpu_queue->kctx; #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_out; #else struct dma_fence *fence_out; #endif - struct base_fence fence; - struct sync_file *sync_file; + struct kbase_kcpu_dma_fence *kcpu_fence; int ret = 0; - int fd; - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&kcpu_queue->lock); - if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), - sizeof(fence))) - return -EFAULT; - - fence_out = kzalloc(sizeof(*fence_out), GFP_KERNEL); - if (!fence_out) + kcpu_fence = kzalloc(sizeof(*kcpu_fence), GFP_KERNEL); + if (!kcpu_fence) return -ENOMEM; + /* Set reference to KCPU metadata */ + kcpu_fence->metadata = kcpu_queue->metadata; + + /* Set reference to KCPU metadata and increment refcount */ + kcpu_fence->metadata = kcpu_queue->metadata; + WARN_ON(!kbase_refcount_inc_not_zero(&kcpu_fence->metadata->refcount)); + +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + fence_out = (struct fence *)kcpu_fence; +#else + fence_out = (struct dma_fence *)kcpu_fence; +#endif dma_fence_init(fence_out, &kbase_fence_ops, @@ -1394,55 +1843,197 @@ static int kbase_kcpu_fence_signal_prepare( #endif /* create a sync_file fd representing the fence */ - sync_file = sync_file_create(fence_out); - if (!sync_file) { -#if (KERNEL_VERSION(4, 9, 67) >= LINUX_VERSION_CODE) - dma_fence_put(fence_out); -#endif + *sync_file = sync_file_create(fence_out); + if (!(*sync_file)) { ret = -ENOMEM; goto file_create_fail; } - fd = get_unused_fd_flags(O_CLOEXEC); - if (fd < 0) { - ret = fd; + *fd = get_unused_fd_flags(O_CLOEXEC); + if (*fd < 0) { + ret = *fd; goto fd_flags_fail; } - fd_install(fd, sync_file->file); - - fence.basep.fd = fd; + fence->basep.fd = *fd; current_command->type = BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL; current_command->info.fence.fence = fence_out; + kbase_kcpu_command_fence_force_signaled_set(¤t_command->info.fence, false); + + return 0; + +fd_flags_fail: + fput((*sync_file)->file); +file_create_fail: + /* + * Upon failure, dma_fence refcount that was increased by + * dma_fence_get() or sync_file_create() needs to be decreased + * to release it. 
+ */ + kbase_fence_put(fence_out); + current_command->info.fence.fence = NULL; + + return ret; +} + +static int kbase_kcpu_fence_signal_prepare(struct kbase_kcpu_command_queue *kcpu_queue, + struct base_kcpu_command_fence_info *fence_info, + struct kbase_kcpu_command *current_command) +{ + struct base_fence fence; + struct sync_file *sync_file = NULL; + int fd; + int ret = 0; + + lockdep_assert_held(&kcpu_queue->lock); + + if (copy_from_user(&fence, u64_to_user_ptr(fence_info->fence), sizeof(fence))) + return -EFAULT; + + ret = kbasep_kcpu_fence_signal_init(kcpu_queue, current_command, &fence, &sync_file, &fd); + if (ret) + return ret; if (copy_to_user(u64_to_user_ptr(fence_info->fence), &fence, sizeof(fence))) { ret = -EFAULT; - goto fd_flags_fail; + goto fail; } + /* 'sync_file' pointer can't be safely dereferenced once 'fd' is + * installed, so the install step needs to be done at the last + * before returning success. + */ + fd_install(fd, sync_file->file); + + if (atomic_inc_return(&kcpu_queue->fence_signal_pending_cnt) == 1) + fence_signal_timeout_start(kcpu_queue); + return 0; -fd_flags_fail: +fail: fput(sync_file->file); -file_create_fail: - dma_fence_put(fence_out); + kbase_fence_put(current_command->info.fence.fence); + current_command->info.fence.fence = NULL; return ret; } + +int kbase_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info) +{ + if (!kcpu_queue || !fence_info) + return -EINVAL; + + return kbasep_kcpu_fence_signal_process(kcpu_queue, fence_info); +} +KBASE_EXPORT_TEST_API(kbase_kcpu_fence_signal_process); + +int kbase_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, int *fd) +{ + if (!kcpu_queue || !current_command || !fence || !sync_file || !fd) + return -EINVAL; + + return kbasep_kcpu_fence_signal_init(kcpu_queue, current_command, fence, sync_file, fd); +} +KBASE_EXPORT_TEST_API(kbase_kcpu_fence_signal_init); #endif /* CONFIG_SYNC_FILE */ -static void kcpu_queue_process_worker(struct work_struct *data) +static void kcpu_queue_dump(struct kbase_kcpu_command_queue *queue) +{ + struct kbase_context *kctx = queue->kctx; + struct kbase_kcpu_command *cmd; + struct kbase_kcpu_command_fence_info *fence_info; + struct kbase_kcpu_dma_fence *kcpu_fence; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence; +#else + struct dma_fence *fence; +#endif + struct kbase_sync_fence_info info; + size_t i; + + mutex_lock(&queue->lock); + + /* Find the next fence signal command in the queue */ + for (i = 0; i != queue->num_pending_cmds; ++i) { + cmd = &queue->commands[(u8)(queue->start_offset + i)]; + if (cmd->type == BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL) { + fence_info = &cmd->info.fence; + /* find the first unforce signaled fence */ + if (!kbase_kcpu_command_fence_has_force_signaled(fence_info)) + break; + } + } + + if (i == queue->num_pending_cmds) { + dev_err(kctx->kbdev->dev, + "%s: No fence signal command found in ctx:%d_%d kcpu queue:%u", __func__, + kctx->tgid, kctx->id, queue->id); + mutex_unlock(&queue->lock); + return; + } + + + fence = kbase_fence_get(fence_info); + if (!fence) { + dev_err(kctx->kbdev->dev, "no fence found in ctx:%d_%d kcpu queue:%u", kctx->tgid, + kctx->id, queue->id); + mutex_unlock(&queue->lock); + return; + } + + kcpu_fence = kbase_kcpu_dma_fence_get(fence); + if (!kcpu_fence) { + dev_err(kctx->kbdev->dev, "no fence metadata found in ctx:%d_%d 
kcpu queue:%u", + kctx->tgid, kctx->id, queue->id); + kbase_fence_put(fence); + mutex_unlock(&queue->lock); + return; + } + + kbase_sync_fence_info_get(fence, &info); + + dev_warn(kctx->kbdev->dev, "------------------------------------------------\n"); + dev_warn(kctx->kbdev->dev, "KCPU Fence signal timeout detected for ctx:%d_%d\n", kctx->tgid, + kctx->id); + dev_warn(kctx->kbdev->dev, "------------------------------------------------\n"); + dev_warn(kctx->kbdev->dev, "Kcpu queue:%u still waiting for fence[%pK] context#seqno:%s\n", + queue->id, fence, info.name); + dev_warn(kctx->kbdev->dev, "Fence metadata timeline name: %s\n", + kcpu_fence->metadata->timeline_name); + + kbase_fence_put(fence); + mutex_unlock(&queue->lock); + + mutex_lock(&kctx->csf.kcpu_queues.lock); + kbasep_csf_sync_kcpu_dump_locked(kctx, NULL); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + + dev_warn(kctx->kbdev->dev, "-----------------------------------------------\n"); +} + +static void kcpu_queue_timeout_worker(struct kthread_work *data) +{ + struct kbase_kcpu_command_queue *queue = + container_of(data, struct kbase_kcpu_command_queue, timeout_work); + + kcpu_queue_dump(queue); + + kcpu_queue_force_fence_signal(queue); +} + +static void kcpu_queue_process_worker(struct kthread_work *data) { struct kbase_kcpu_command_queue *queue = container_of(data, struct kbase_kcpu_command_queue, work); - mutex_lock(&queue->kctx->csf.kcpu_queues.lock); - + mutex_lock(&queue->lock); kcpu_queue_process(queue, false); - - mutex_unlock(&queue->kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); } static int delete_queue(struct kbase_context *kctx, u32 id) @@ -1455,9 +2046,23 @@ static int delete_queue(struct kbase_context *kctx, u32 id) struct kbase_kcpu_command_queue *queue = kctx->csf.kcpu_queues.array[id]; - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_DESTROY, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_DELETE, queue, queue->num_pending_cmds, queue->cqs_wait_count); + /* Disassociate the queue from the system to prevent further + * submissions. Draining pending commands would be acceptable + * even if a new queue is created using the same ID. + */ + kctx->csf.kcpu_queues.array[id] = NULL; + bitmap_clear(kctx->csf.kcpu_queues.in_use, id, 1); + + mutex_unlock(&kctx->csf.kcpu_queues.lock); + + mutex_lock(&queue->lock); + + /* Metadata struct may outlive KCPU queue. */ + kbase_kcpu_dma_fence_meta_put(queue->metadata); + /* Drain the remaining work for this queue first and go past * all the waits. */ @@ -1469,17 +2074,16 @@ static int delete_queue(struct kbase_context *kctx, u32 id) /* All CQS wait commands should have been cleaned up */ WARN_ON(queue->cqs_wait_count); - kctx->csf.kcpu_queues.array[id] = NULL; - bitmap_clear(kctx->csf.kcpu_queues.in_use, id, 1); - /* Fire the tracepoint with the mutex held to enforce correct * ordering with the summary stream. 
*/ KBASE_TLSTREAM_TL_KBASE_DEL_KCPUQUEUE(kctx->kbdev, queue); - mutex_unlock(&kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); + + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); - cancel_work_sync(&queue->work); + mutex_destroy(&queue->lock); kfree(queue); } else { @@ -1546,7 +2150,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, bool process_next = true; size_t i; - lockdep_assert_held(&queue->kctx->csf.kcpu_queues.lock); + lockdep_assert_held(&queue->lock); for (i = 0; i != queue->num_pending_cmds; ++i) { struct kbase_kcpu_command *cmd = @@ -1564,8 +2168,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = 0; #if IS_ENABLED(CONFIG_SYNC_FILE) if (drain_queue) { - kbase_kcpu_fence_wait_cancel(queue, - &cmd->info.fence); + kbasep_kcpu_fence_wait_cancel(queue, &cmd->info.fence); } else { status = kbase_kcpu_fence_wait_process(queue, &cmd->info.fence); @@ -1595,8 +2198,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = 0; #if IS_ENABLED(CONFIG_SYNC_FILE) - status = kbase_kcpu_fence_signal_process( - queue, &cmd->info.fence); + status = kbasep_kcpu_fence_signal_process(queue, &cmd->info.fence); if (status < 0) queue->has_error = true; @@ -1668,10 +2270,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - meta = kbase_sticky_resource_acquire( - queue->kctx, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + meta = kbase_sticky_resource_acquire(queue->kctx, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (meta == NULL) { queue->has_error = true; @@ -1690,10 +2292,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - ret = kbase_sticky_resource_release( - queue->kctx, NULL, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + ret = kbase_sticky_resource_release(queue->kctx, NULL, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (!ret) { queue->has_error = true; @@ -1711,10 +2313,10 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_FORCE_START(kbdev, queue); - kbase_gpu_vm_lock(queue->kctx); - ret = kbase_sticky_resource_release_force( - queue->kctx, NULL, cmd->info.import.gpu_va); - kbase_gpu_vm_unlock(queue->kctx); + kbase_gpu_vm_lock_with_pmode_sync(queue->kctx); + ret = kbase_sticky_resource_release_force(queue->kctx, NULL, + cmd->info.import.gpu_va); + kbase_gpu_vm_unlock_with_pmode_sync(queue->kctx); if (!ret) { queue->has_error = true; @@ -1756,7 +2358,7 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, break; } - case BASE_KCPU_COMMAND_TYPE_JIT_FREE: + case BASE_KCPU_COMMAND_TYPE_JIT_FREE: { KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_JIT_FREE_START(kbdev, queue); status = kbase_kcpu_jit_free_process(queue, cmd); @@ -1766,6 +2368,8 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_JIT_FREE_END( kbdev, queue); break; + } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: { struct kbase_suspend_copy_buffer *sus_buf = 
cmd->info.suspend_buf_copy.sus_buf; @@ -1777,29 +2381,31 @@ static void kcpu_queue_process(struct kbase_kcpu_command_queue *queue, status = kbase_csf_queue_group_suspend_process( queue->kctx, sus_buf, cmd->info.suspend_buf_copy.group_handle); + if (status) queue->has_error = true; KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_GROUP_SUSPEND_END( kbdev, queue, status); + } - if (!sus_buf->cpu_alloc) { - int i; + if (!sus_buf->cpu_alloc) { + int i; - for (i = 0; i < sus_buf->nr_pages; i++) - put_page(sus_buf->pages[i]); - } else { - kbase_mem_phy_alloc_kernel_unmapped( - sus_buf->cpu_alloc); - kbase_mem_phy_alloc_put( - sus_buf->cpu_alloc); - } + for (i = 0; i < sus_buf->nr_pages; i++) + put_page(sus_buf->pages[i]); + } else { + kbase_mem_phy_alloc_kernel_unmapped( + sus_buf->cpu_alloc); + kbase_mem_phy_alloc_put( + sus_buf->cpu_alloc); } kfree(sus_buf->pages); kfree(sus_buf); break; } +#endif default: dev_dbg(kbdev->dev, "Unrecognized command type"); @@ -1874,12 +2480,29 @@ static void KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_COMMAND( } case BASE_KCPU_COMMAND_TYPE_CQS_WAIT_OPERATION: { - /* GPUCORE-28172 RDT to review */ + const struct base_cqs_wait_operation_info *waits = + cmd->info.cqs_wait_operation.objs; + u32 inherit_err_flags = cmd->info.cqs_wait_operation.inherit_err_flags; + unsigned int i; + + for (i = 0; i < cmd->info.cqs_wait_operation.nr_objs; i++) { + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( + kbdev, queue, waits[i].addr, waits[i].val, + waits[i].operation, waits[i].data_type, + (inherit_err_flags & ((uint32_t)1 << i)) ? 1 : 0); + } break; } case BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION: { - /* GPUCORE-28172 RDT to review */ + const struct base_cqs_set_operation_info *sets = cmd->info.cqs_set_operation.objs; + unsigned int i; + + for (i = 0; i < cmd->info.cqs_set_operation.nr_objs; i++) { + KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( + kbdev, queue, sets[i].addr, sets[i].val, + sets[i].operation, sets[i].data_type); + } break; } case BASE_KCPU_COMMAND_TYPE_ERROR_BARRIER: @@ -1926,11 +2549,13 @@ static void KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_COMMAND( KBASE_TLSTREAM_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_FREE(kbdev, queue); break; } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( kbdev, queue, cmd->info.suspend_buf_copy.sus_buf, cmd->info.suspend_buf_copy.group_handle); break; +#endif default: dev_dbg(kbdev->dev, "Unknown command type %u", cmd->type); break; @@ -1947,9 +2572,11 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, /* The offset to the first command that is being processed or yet to * be processed is of u8 type, so the number of commands inside the - * queue cannot be more than 256. + * queue cannot be more than 256. The current implementation expects + * exactly 256, any other size will require the addition of wrapping + * logic. */ - BUILD_BUG_ON(KBASEP_KCPU_QUEUE_SIZE > 256); + BUILD_BUG_ON(KBASEP_KCPU_QUEUE_SIZE != 256); /* Whilst the backend interface allows enqueueing multiple commands in * a single operation, the Base interface does not expose any mechanism @@ -1964,14 +2591,30 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, return -EINVAL; } + /* There might be a race between one thread trying to enqueue commands to the queue + * and other thread trying to delete the same queue. 
+ * This racing could lead to use-after-free problem by enqueuing thread if + * resources for the queue has already been freed by deleting thread. + * + * To prevent the issue, two mutexes are acquired/release asymmetrically as follows. + * + * Lock A (kctx mutex) + * Lock B (queue mutex) + * Unlock A + * Unlock B + * + * With the kctx mutex being held, enqueuing thread will check the queue + * and will return error code if the queue had already been deleted. + */ mutex_lock(&kctx->csf.kcpu_queues.lock); - - if (!kctx->csf.kcpu_queues.array[enq->id]) { - ret = -EINVAL; - goto out; - } - queue = kctx->csf.kcpu_queues.array[enq->id]; + if (queue == NULL) { + dev_dbg(kctx->kbdev->dev, "Invalid KCPU queue (id:%u)", enq->id); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + return -EINVAL; + } + mutex_lock(&queue->lock); + mutex_unlock(&kctx->csf.kcpu_queues.lock); if (kcpu_queue_get_space(queue) < enq->nr_commands) { ret = -EBUSY; @@ -1986,7 +2629,7 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, * for the possibility to roll back. */ - for (i = 0; (i != enq->nr_commands) && !ret; ++i, ++kctx->csf.kcpu_queues.num_cmds) { + for (i = 0; (i != enq->nr_commands) && !ret; ++i) { struct kbase_kcpu_command *kcpu_cmd = &queue->commands[(u8)(queue->start_offset + queue->num_pending_cmds + i)]; struct base_kcpu_command command; @@ -2009,7 +2652,7 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, } } - kcpu_cmd->enqueue_ts = kctx->csf.kcpu_queues.num_cmds; + kcpu_cmd->enqueue_ts = atomic64_inc_return(&kctx->csf.kcpu_queues.cmd_seq_num); switch (command.type) { case BASE_KCPU_COMMAND_TYPE_FENCE_WAIT: #if IS_ENABLED(CONFIG_SYNC_FILE) @@ -2069,11 +2712,13 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, ret = kbase_kcpu_jit_free_prepare(queue, &command.info.jit_free, kcpu_cmd); break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_KCPU_COMMAND_TYPE_GROUP_SUSPEND: ret = kbase_csf_queue_group_suspend_prepare(queue, &command.info.suspend_buf_copy, kcpu_cmd); break; +#endif default: dev_dbg(queue->kctx->kbdev->dev, "Unknown command type %u", command.type); @@ -2097,13 +2742,10 @@ int kbase_csf_kcpu_queue_enqueue(struct kbase_context *kctx, queue->num_pending_cmds += enq->nr_commands; kcpu_queue_process(queue, false); - } else { - /* Roll back the number of enqueued commands */ - kctx->csf.kcpu_queues.num_cmds -= i; } out: - mutex_unlock(&kctx->csf.kcpu_queues.lock); + mutex_unlock(&queue->lock); return ret; } @@ -2117,14 +2759,9 @@ int kbase_csf_kcpu_queue_context_init(struct kbase_context *kctx) for (idx = 0; idx < KBASEP_MAX_KCPU_QUEUES; ++idx) kctx->csf.kcpu_queues.array[idx] = NULL; - kctx->csf.kcpu_queues.wq = alloc_workqueue("mali_kbase_csf_kcpu", - WQ_UNBOUND | WQ_HIGHPRI, 0); - if (!kctx->csf.kcpu_queues.wq) - return -ENOMEM; - mutex_init(&kctx->csf.kcpu_queues.lock); - kctx->csf.kcpu_queues.num_cmds = 0; + atomic64_set(&kctx->csf.kcpu_queues.cmd_seq_num, 0); return 0; } @@ -2142,9 +2779,9 @@ void kbase_csf_kcpu_queue_context_term(struct kbase_context *kctx) (void)delete_queue(kctx, id); } - destroy_workqueue(kctx->csf.kcpu_queues.wq); mutex_destroy(&kctx->csf.kcpu_queues.lock); } +KBASE_EXPORT_TEST_API(kbase_csf_kcpu_queue_context_term); int kbase_csf_kcpu_queue_delete(struct kbase_context *kctx, struct kbase_ioctl_kcpu_queue_delete *del) @@ -2157,8 +2794,11 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, { struct kbase_kcpu_command_queue *queue; int idx; + int n; int ret = 0; - +#if IS_ENABLED(CONFIG_SYNC_FILE) + struct 
kbase_kcpu_dma_fence_meta *metadata; +#endif /* The queue id is of u8 type and we use the index of the kcpu_queues * array as an id, so the number of elements in the array can't be * more than 256. @@ -2186,8 +2826,14 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, goto out; } - bitmap_set(kctx->csf.kcpu_queues.in_use, idx, 1); - kctx->csf.kcpu_queues.array[idx] = queue; + ret = kbase_kthread_run_worker_rt(kctx->kbdev, &queue->csf_kcpu_worker, "csf_kcpu_%i", idx); + + if (ret) { + kfree(queue); + goto out; + } + + mutex_init(&queue->lock); queue->kctx = kctx; queue->start_offset = 0; queue->num_pending_cmds = 0; @@ -2195,12 +2841,37 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, queue->fence_context = dma_fence_context_alloc(1); queue->fence_seqno = 0; queue->fence_wait_processed = false; -#endif + + metadata = kzalloc(sizeof(*metadata), GFP_KERNEL); + if (!metadata) { + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); + kfree(queue); + ret = -ENOMEM; + goto out; + } + + metadata->kbdev = kctx->kbdev; + metadata->kctx_id = kctx->id; + n = snprintf(metadata->timeline_name, MAX_TIMELINE_NAME, "%d-%d_%d-%lld-kcpu", + kctx->kbdev->id, kctx->tgid, kctx->id, queue->fence_context); + if (WARN_ON(n >= MAX_TIMELINE_NAME)) { + kbase_destroy_kworker_stack(&queue->csf_kcpu_worker); + kfree(queue); + kfree(metadata); + ret = -EINVAL; + goto out; + } + + kbase_refcount_set(&metadata->refcount, 1); + queue->metadata = metadata; + atomic_inc(&kctx->kbdev->live_fence_metadata); +#endif /* CONFIG_SYNC_FILE */ queue->enqueue_failed = false; queue->command_started = false; INIT_LIST_HEAD(&queue->jit_blocked); queue->has_error = false; - INIT_WORK(&queue->work, kcpu_queue_process_worker); + kthread_init_work(&queue->work, kcpu_queue_process_worker); + kthread_init_work(&queue->timeout_work, kcpu_queue_timeout_worker); queue->id = idx; newq->id = idx; @@ -2211,10 +2882,103 @@ int kbase_csf_kcpu_queue_new(struct kbase_context *kctx, KBASE_TLSTREAM_TL_KBASE_NEW_KCPUQUEUE(kctx->kbdev, queue, queue->id, kctx->id, queue->num_pending_cmds); - KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_NEW, queue, + KBASE_KTRACE_ADD_CSF_KCPU(kctx->kbdev, KCPU_QUEUE_CREATE, queue, queue->fence_context, 0); +#ifdef CONFIG_MALI_FENCE_DEBUG + kbase_timer_setup(&queue->fence_timeout, fence_timeout_callback); +#endif + +#if IS_ENABLED(CONFIG_SYNC_FILE) + atomic_set(&queue->fence_signal_pending_cnt, 0); + kbase_timer_setup(&queue->fence_signal_timeout, fence_signal_timeout_cb); +#endif + bitmap_set(kctx->csf.kcpu_queues.in_use, idx, 1); + kctx->csf.kcpu_queues.array[idx] = queue; out: mutex_unlock(&kctx->csf.kcpu_queues.lock); return ret; } +KBASE_EXPORT_TEST_API(kbase_csf_kcpu_queue_new); + +int kbase_csf_kcpu_queue_halt_timers(struct kbase_device *kbdev) +{ + struct kbase_context *kctx; + + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + unsigned long queue_idx; + struct kbase_csf_kcpu_queue_context *kcpu_ctx = &kctx->csf.kcpu_queues; + + mutex_lock(&kcpu_ctx->lock); + + for_each_set_bit(queue_idx, kcpu_ctx->in_use, KBASEP_MAX_KCPU_QUEUES) { + struct kbase_kcpu_command_queue *kcpu_queue = kcpu_ctx->array[queue_idx]; + + if (unlikely(!kcpu_queue)) + continue; + + mutex_lock(&kcpu_queue->lock); + + if (atomic_read(&kcpu_queue->fence_signal_pending_cnt)) { + int ret = del_timer_sync(&kcpu_queue->fence_signal_timeout); + + dev_dbg(kbdev->dev, + "Fence signal timeout on KCPU queue(%lu), kctx (%d_%d) was %s on suspend", + queue_idx, kctx->tgid, kctx->id, + ret ? 
"pending" : "not pending"); + } + +#ifdef CONFIG_MALI_FENCE_DEBUG + if (kcpu_queue->fence_wait_processed) { + int ret = del_timer_sync(&kcpu_queue->fence_timeout); + + dev_dbg(kbdev->dev, + "Fence wait timeout on KCPU queue(%lu), kctx (%d_%d) was %s on suspend", + queue_idx, kctx->tgid, kctx->id, + ret ? "pending" : "not pending"); + } +#endif + mutex_unlock(&kcpu_queue->lock); + } + mutex_unlock(&kcpu_ctx->lock); + } + return 0; +} + +void kbase_csf_kcpu_queue_resume_timers(struct kbase_device *kbdev) +{ + struct kbase_context *kctx; + + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + unsigned long queue_idx; + struct kbase_csf_kcpu_queue_context *kcpu_ctx = &kctx->csf.kcpu_queues; + + mutex_lock(&kcpu_ctx->lock); + + for_each_set_bit(queue_idx, kcpu_ctx->in_use, KBASEP_MAX_KCPU_QUEUES) { + struct kbase_kcpu_command_queue *kcpu_queue = kcpu_ctx->array[queue_idx]; + + if (unlikely(!kcpu_queue)) + continue; + + mutex_lock(&kcpu_queue->lock); +#ifdef CONFIG_MALI_FENCE_DEBUG + if (kcpu_queue->fence_wait_processed) { + fence_wait_timeout_start(kcpu_queue); + dev_dbg(kbdev->dev, + "Fence wait timeout on KCPU queue(%lu), kctx (%d_%d) has been resumed on system resume", + queue_idx, kctx->tgid, kctx->id); + } +#endif + if (atomic_read(&kbdev->fence_signal_timeout_enabled) && + atomic_read(&kcpu_queue->fence_signal_pending_cnt)) { + fence_signal_timeout_start(kcpu_queue); + dev_dbg(kbdev->dev, + "Fence signal timeout on KCPU queue(%lu), kctx (%d_%d) has been resumed on system resume", + queue_idx, kctx->tgid, kctx->id); + } + mutex_unlock(&kcpu_queue->lock); + } + mutex_unlock(&kcpu_ctx->lock); + } +} diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu.h b/mali_kbase/csf/mali_kbase_csf_kcpu.h index 2216cb7..4a8d937 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu.h +++ b/mali_kbase/csf/mali_kbase_csf_kcpu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,9 @@ #ifndef _KBASE_CSF_KCPU_H_ #define _KBASE_CSF_KCPU_H_ +#include <mali_kbase_fence.h> +#include <mali_kbase_sync.h> + #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) #include <linux/fence.h> #else @@ -44,12 +47,13 @@ struct kbase_kcpu_command_import_info { }; /** - * struct kbase_kcpu_command_fence_info - Structure which holds information - * about the fence object enqueued in the kcpu command queue + * struct kbase_kcpu_command_fence_info - Structure which holds information about the + * fence object enqueued in the kcpu command queue * - * @fence_cb: Fence callback - * @fence: Fence - * @kcpu_queue: kcpu command queue + * @fence_cb: Fence callback + * @fence: Fence + * @kcpu_queue: kcpu command queue + * @fence_has_force_signaled: fence has forced signaled after fence timeouted */ struct kbase_kcpu_command_fence_info { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) @@ -60,6 +64,7 @@ struct kbase_kcpu_command_fence_info { struct dma_fence *fence; #endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ struct kbase_kcpu_command_queue *kcpu_queue; + bool fence_has_force_signaled; }; /** @@ -183,8 +188,9 @@ struct kbase_suspend_copy_buffer { struct kbase_mem_phy_alloc *cpu_alloc; }; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST /** - * struct base_kcpu_command_group_suspend - structure which contains + * struct kbase_kcpu_command_group_suspend_info - structure which contains * suspend buffer data captured for a suspended queue group. * * @sus_buf: Pointer to the structure which contains details of the @@ -195,10 +201,11 @@ struct kbase_kcpu_command_group_suspend_info { struct kbase_suspend_copy_buffer *sus_buf; u8 group_handle; }; +#endif /** - * struct kbase_cpu_command - Command which is to be part of the kernel + * struct kbase_kcpu_command - Command which is to be part of the kernel * command queue * * @type: Type of the command. @@ -229,20 +236,28 @@ struct kbase_kcpu_command { struct kbase_kcpu_command_import_info import; struct kbase_kcpu_command_jit_alloc_info jit_alloc; struct kbase_kcpu_command_jit_free_info jit_free; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST struct kbase_kcpu_command_group_suspend_info suspend_buf_copy; +#endif } info; }; /** * struct kbase_kcpu_command_queue - a command queue executed by the kernel * + * @lock: Lock to protect accesses to this queue. * @kctx: The context to which this command queue belongs. * @commands: Array of commands which have been successfully * enqueued to this command queue. - * @work: struct work_struct which contains a pointer to + * @csf_kcpu_worker: Dedicated worker for processing kernel CPU command + * queues. + * @work: struct kthread_work which contains a pointer to * the function which handles processing of kcpu * commands enqueued into a kcpu command queue; * part of kernel API for processing workqueues + * @timeout_work: struct kthread_work which contains a pointer to the + * function which handles post-timeout actions + * queue when a fence signal timeout occurs. * @start_offset: Index of the command to be executed next * @id: KCPU command queue ID. * @num_pending_cmds: The number of commands enqueued but not yet @@ -271,11 +286,20 @@ struct kbase_kcpu_command { * or without errors since last cleaned. * @jit_blocked: Used to keep track of command queues blocked * by a pending JIT allocation command. 
+ * @fence_timeout: Timer used to detect the fence wait timeout. + * @metadata: Metadata structure containing basic information about + * this queue for any fence objects associated with this queue. + * @fence_signal_timeout: Timer used for detect a fence signal command has + * been blocked for too long. + * @fence_signal_pending_cnt: Enqueued fence signal commands in the queue. */ struct kbase_kcpu_command_queue { + struct mutex lock; struct kbase_context *kctx; struct kbase_kcpu_command commands[KBASEP_KCPU_QUEUE_SIZE]; - struct work_struct work; + struct kthread_worker csf_kcpu_worker; + struct kthread_work work; + struct kthread_work timeout_work; u8 start_offset; u8 id; u16 num_pending_cmds; @@ -287,6 +311,14 @@ struct kbase_kcpu_command_queue { bool command_started; struct list_head jit_blocked; bool has_error; +#ifdef CONFIG_MALI_FENCE_DEBUG + struct timer_list fence_timeout; +#endif /* CONFIG_MALI_FENCE_DEBUG */ +#if IS_ENABLED(CONFIG_SYNC_FILE) + struct kbase_kcpu_dma_fence_meta *metadata; +#endif /* CONFIG_SYNC_FILE */ + struct timer_list fence_signal_timeout; + atomic_t fence_signal_pending_cnt; }; /** @@ -351,4 +383,42 @@ int kbase_csf_kcpu_queue_context_init(struct kbase_context *kctx); */ void kbase_csf_kcpu_queue_context_term(struct kbase_context *kctx); +#if IS_ENABLED(CONFIG_SYNC_FILE) +/* Test wrappers for dma fence operations. */ +int kbase_kcpu_fence_signal_process(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command_fence_info *fence_info); + +int kbase_kcpu_fence_signal_init(struct kbase_kcpu_command_queue *kcpu_queue, + struct kbase_kcpu_command *current_command, + struct base_fence *fence, struct sync_file **sync_file, int *fd); +#endif /* CONFIG_SYNC_FILE */ + +/* + * kbase_csf_kcpu_queue_halt_timers - Halt the KCPU fence timers associated with + * the kbase device. + * + * @kbdev: Kbase device + * + * Note that this function assumes that the caller has ensured that the + * kbase_device::kctx_list does not get updated during this function's runtime. + * At the moment, the function is only safe to call during system suspend, when + * the device PM active count has reached zero. + * + * Return: 0 on success, negative value otherwise. + */ +int kbase_csf_kcpu_queue_halt_timers(struct kbase_device *kbdev); + +/* + * kbase_csf_kcpu_queue_resume_timers - Resume the KCPU fence timers associated + * with the kbase device. + * + * @kbdev: Kbase device + * + * Note that this function assumes that the caller has ensured that the + * kbase_device::kctx_list does not get updated during this function's runtime. + * At the moment, the function is only safe to call during system resume. + */ +void kbase_csf_kcpu_queue_resume_timers(struct kbase_device *kbdev); + +bool kbase_kcpu_command_fence_has_force_signaled(struct kbase_kcpu_command_fence_info *fence_info); #endif /* _KBASE_CSF_KCPU_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c b/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c index 0a2cde0..fa87777 100644 --- a/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,7 +30,7 @@ #if IS_ENABLED(CONFIG_DEBUG_FS) /** - * kbasep_csf_kcpu_debugfs_print_queue() - Print additional info for KCPU + * kbasep_csf_kcpu_debugfs_print_cqs_waits() - Print additional info for KCPU * queues blocked on CQS wait commands. * * @file: The seq_file to print to @@ -167,11 +167,7 @@ static const struct file_operations kbasep_csf_kcpu_debugfs_fops = { void kbase_csf_kcpu_debugfs_init(struct kbase_context *kctx) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) return; diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c new file mode 100644 index 0000000..cd55f62 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.c @@ -0,0 +1,151 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ +#include <linux/fs.h> +#include <linux/version.h> +#include <linux/module.h> +#if IS_ENABLED(CONFIG_DEBUG_FS) +#include <linux/debugfs.h> +#endif + +#include <mali_kbase.h> +#include <csf/mali_kbase_csf_kcpu_fence_debugfs.h> +#include <mali_kbase_hwaccess_time.h> + +#define BUF_SIZE 10 + +#if IS_ENABLED(CONFIG_DEBUG_FS) +static ssize_t kbase_csf_kcpu_queue_fence_signal_enabled_get(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int ret; + struct kbase_device *kbdev = file->private_data; + + if (atomic_read(&kbdev->fence_signal_timeout_enabled)) + ret = simple_read_from_buffer(buf, count, ppos, "1\n", 2); + else + ret = simple_read_from_buffer(buf, count, ppos, "0\n", 2); + + return ret; +}; + +static ssize_t kbase_csf_kcpu_queue_fence_signal_enabled_set(struct file *file, + const char __user *buf, size_t count, + loff_t *ppos) +{ + int ret; + unsigned int enabled; + struct kbase_device *kbdev = file->private_data; + + ret = kstrtouint_from_user(buf, count, 10, &enabled); + if (ret < 0) + return ret; + + atomic_set(&kbdev->fence_signal_timeout_enabled, enabled); + + return count; +} + +static const struct file_operations kbase_csf_kcpu_queue_fence_signal_fops = { + .owner = THIS_MODULE, + .read = kbase_csf_kcpu_queue_fence_signal_enabled_get, + .write = kbase_csf_kcpu_queue_fence_signal_enabled_set, + .open = simple_open, + .llseek = default_llseek, +}; + +static ssize_t kbase_csf_kcpu_queue_fence_signal_timeout_get(struct file *file, char __user *buf, + size_t count, loff_t *ppos) +{ + int size; + char buffer[BUF_SIZE]; + struct kbase_device *kbdev = file->private_data; + unsigned int timeout_ms = kbase_get_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT); + + size = scnprintf(buffer, sizeof(buffer), "%u\n", timeout_ms); + return simple_read_from_buffer(buf, count, ppos, buffer, size); +} + +static ssize_t kbase_csf_kcpu_queue_fence_signal_timeout_set(struct file *file, + const char __user *buf, size_t count, + loff_t *ppos) +{ + int ret; + unsigned int timeout_ms; + struct kbase_device *kbdev = file->private_data; + + ret = kstrtouint_from_user(buf, count, 10, &timeout_ms); + if (ret < 0) + return ret; + + /* The timeout passed by the user is bounded when trying to insert it into + * the precomputed timeout table, so we don't need to do any more validation + * before-hand. 
+ */ + kbase_device_set_timeout_ms(kbdev, KCPU_FENCE_SIGNAL_TIMEOUT, timeout_ms); + + return count; +} + +static const struct file_operations kbase_csf_kcpu_queue_fence_signal_timeout_fops = { + .owner = THIS_MODULE, + .read = kbase_csf_kcpu_queue_fence_signal_timeout_get, + .write = kbase_csf_kcpu_queue_fence_signal_timeout_set, + .open = simple_open, + .llseek = default_llseek, +}; + +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev) +{ + struct dentry *file; + const mode_t mode = 0644; + + if (WARN_ON(IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) + return -1; + + file = debugfs_create_file("fence_signal_timeout_enable", mode, + kbdev->mali_debugfs_directory, kbdev, + &kbase_csf_kcpu_queue_fence_signal_fops); + + if (IS_ERR_OR_NULL(file)) { + dev_warn(kbdev->dev, "Unable to create fence signal timer toggle entry"); + return -1; + } + + file = debugfs_create_file("fence_signal_timeout_ms", mode, kbdev->mali_debugfs_directory, + kbdev, &kbase_csf_kcpu_queue_fence_signal_timeout_fops); + + if (IS_ERR_OR_NULL(file)) { + dev_warn(kbdev->dev, "Unable to create fence signal timeout entry"); + return -1; + } + return 0; +} + +#else +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev) +{ + return 0; +} + +#endif +void kbase_csf_fence_timer_debugfs_term(struct kbase_device *kbdev) +{ +} diff --git a/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h new file mode 100644 index 0000000..e3799fb --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_kcpu_fence_debugfs.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +#ifndef _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ +#define _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ + +struct kbase_device; + +/* + * kbase_csf_fence_timer_debugfs_init - Initialize fence signal timeout debugfs + * entries. + * @kbdev: Kbase device. + * + * Return: 0 on success, -1 on failure. + */ +int kbase_csf_fence_timer_debugfs_init(struct kbase_device *kbdev); + +/* + * kbase_csf_fence_timer_debugfs_term - Terminate fence signal timeout debugfs + * entries. + * @kbdev: Kbase device. + */ +void kbase_csf_fence_timer_debugfs_term(struct kbase_device *kbdev); + +#endif /* _KBASE_CSF_KCPU_FENCE_SIGNAL_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c new file mode 100644 index 0000000..863cf10 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.c @@ -0,0 +1,818 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <linux/protected_memory_allocator.h> +#include <mali_kbase.h> +#include "mali_kbase_csf.h" +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <mali_kbase_mem_migrate.h> + +/* Scaling factor in pre-allocating shared regions for suspend bufs and userios */ +#define MCU_SHARED_REGS_PREALLOCATE_SCALE (8) + +/* MCU shared region map attempt limit */ +#define MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT (4) + +/* Convert a VPFN to its start addr */ +#define GET_VPFN_VA(vpfn) ((vpfn) << PAGE_SHIFT) + +/* Macros for extract the corresponding VPFNs from a CSG_REG */ +#define CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages) (reg->start_pfn) +#define CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages) (reg->start_pfn + nr_susp_pages) +#define CSG_REG_USERIO_VPFN(reg, csi, nr_susp_pages) (reg->start_pfn + 2 * (nr_susp_pages + csi)) + +/* MCU shared segment dummy page mapping flags */ +#define DUMMY_PAGE_MAP_FLAGS (KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_DEFAULT) | KBASE_REG_GPU_NX) + +/* MCU shared segment suspend buffer mapping flags */ +#define SUSP_PAGE_MAP_FLAGS \ + (KBASE_REG_GPU_RD | KBASE_REG_GPU_WR | KBASE_REG_GPU_NX | \ + KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_DEFAULT)) + +/** + * struct kbase_csg_shared_region - Wrapper object for use with a CSG on runtime + * resources for suspend buffer pages, userio pages + * and their corresponding mapping GPU VA addresses + * from the MCU shared interface segment + * + * @link: Link to the managing list for the wrapper object. + * @reg: pointer to the region allocated from the shared interface segment, which + * covers the normal/P-mode suspend buffers, userio pages of the queues + * @grp: Pointer to the bound kbase_queue_group, or NULL if no binding (free). + * @pmode_mapped: Boolean for indicating the region has MMU mapped with the bound group's + * protected mode suspend buffer pages. 
+ */ +struct kbase_csg_shared_region { + struct list_head link; + struct kbase_va_region *reg; + struct kbase_queue_group *grp; + bool pmode_mapped; +}; + +static unsigned long get_userio_mmu_flags(struct kbase_device *kbdev) +{ + unsigned long userio_map_flags; + + if (kbdev->system_coherency == COHERENCY_NONE) + userio_map_flags = + KBASE_REG_GPU_RD | KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); + else + userio_map_flags = KBASE_REG_GPU_RD | KBASE_REG_SHARE_BOTH | + KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_SHARED); + + return (userio_map_flags | KBASE_REG_GPU_NX); +} + +static void set_page_meta_status_not_movable(struct tagged_addr phy) +{ + if (kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = kbase_page_private(as_page(phy)); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } +} + +static struct kbase_csg_shared_region *get_group_bound_csg_reg(struct kbase_queue_group *group) +{ + return (struct kbase_csg_shared_region *)group->csg_reg; +} + +static inline int update_mapping_with_dummy_pages(struct kbase_device *kbdev, u64 vpfn, + u32 nr_pages) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const unsigned long mem_flags = DUMMY_PAGE_MAP_FLAGS; + + return kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, shared_regs->dummy_phys, nr_pages, + mem_flags, KBASE_MEM_GROUP_CSF_FW); +} + +static inline int insert_dummy_pages(struct kbase_device *kbdev, u64 vpfn, u32 nr_pages) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const unsigned long mem_flags = DUMMY_PAGE_MAP_FLAGS; + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + + return kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_pages, mem_flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, + mmu_sync_info, NULL); +} + +/* Reset consecutive retry count to zero */ +static void notify_group_csg_reg_map_done(struct kbase_queue_group *group) +{ + lockdep_assert_held(&group->kctx->kbdev->csf.scheduler.lock); + + /* Just clear the internal map retry count */ + group->csg_reg_bind_retries = 0; +} + +/* Return true if a fatal group error has already been triggered */ +static bool notify_group_csg_reg_map_error(struct kbase_queue_group *group) +{ + struct kbase_device *kbdev = group->kctx->kbdev; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (group->csg_reg_bind_retries < U8_MAX) + group->csg_reg_bind_retries++; + + /* Allow only one fatal error notification */ + if (group->csg_reg_bind_retries == MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT) { + struct base_gpu_queue_group_error const err_payload = { + .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL, + .payload = { .fatal_group = { .status = GPU_EXCEPTION_TYPE_SW_FAULT_0 } } + }; + + dev_err(kbdev->dev, "Fatal: group_%d_%d_%d exceeded shared region map retry limit", + group->kctx->tgid, group->kctx->id, group->handle); + kbase_csf_add_group_fatal_error(group, &err_payload); + kbase_event_wakeup_nosync(group->kctx); + } + + return group->csg_reg_bind_retries >= MCU_SHARED_REGS_BIND_ATTEMPT_LIMIT; +} + +/* Replace the given phys at vpfn (reflecting a queue's userio_pages) mapping. + * If phys is NULL, the internal dummy_phys is used, which effectively + * restores back to the initialized state for the given queue's userio_pages + * (i.e. mapped to the default dummy page). 
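For readers following the CSG_REG_*_VPFN macros above, each pre-allocated CSG shared region is laid out as [normal suspend buffer][protected-mode suspend buffer][two userio pages per CS]. The small standalone sketch below is editorial and only works the offsets through; the page counts are illustrative assumptions, not values taken from this patch.

/* Editorial sketch, not part of this patch: per-CSG region layout implied by
 * the CSG_REG_*_VPFN macros. nr_susp_pages and nr_csis are assumed values.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t start_pfn = 0x1000;	/* hypothetical reg->start_pfn */
	const uint32_t nr_susp_pages = 8;	/* assumed suspend buffer pages */
	const uint32_t nr_csis = 4;		/* assumed streams per group    */
	uint32_t csi;

	/* CSG_REG_SUSP_BUF_VPFN: normal suspend buffer at the region start */
	printf("susp buf  : pfn %#llx\n", (unsigned long long)start_pfn);

	/* CSG_REG_PMOD_BUF_VPFN: protected-mode suspend buffer follows it */
	printf("pmode buf : pfn %#llx\n",
	       (unsigned long long)(start_pfn + nr_susp_pages));

	/* CSG_REG_USERIO_VPFN: two userio pages (input/output) per CS */
	for (csi = 0; csi < nr_csis; csi++)
		printf("userio[%u]: pfn %#llx\n", csi,
		       (unsigned long long)(start_pfn + 2 * (nr_susp_pages + csi)));

	/* Matches nr_csg_reg_pages = 2 * (nr_susp_pages + nr_csis) used when
	 * the region is allocated later in this file. */
	printf("total pages: %u\n", 2 * (nr_susp_pages + nr_csis));
	return 0;
}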
+ * In case of CSF mmu update error on a queue, the dummy phy is used to restore + * back the default 'unbound' (i.e. mapped to dummy) condition. + * + * It's the caller's responsibility to ensure that the given vpfn is extracted + * correctly from a CSG_REG object, for example, using CSG_REG_USERIO_VPFN(). + */ +static int userio_pages_replace_phys(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (phys) { + unsigned long mem_flags_input = shared_regs->userio_mem_rd_flags; + unsigned long mem_flags_output = mem_flags_input | KBASE_REG_GPU_WR; + + /* Dealing with a queue's INPUT page */ + err = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, &phys[0], 1, mem_flags_input, + KBASE_MEM_GROUP_CSF_IO); + /* Dealing with a queue's OUTPUT page */ + err1 = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn + 1, &phys[1], 1, + mem_flags_output, KBASE_MEM_GROUP_CSF_IO); + if (unlikely(err1)) + err = err1; + } + + if (unlikely(err) || !phys) { + /* Restore back to dummy_userio_phy */ + update_mapping_with_dummy_pages(kbdev, vpfn, KBASEP_NUM_CS_USER_IO_PAGES); + } + + return err; +} + +/* Update a group's queues' mappings for a group with its runtime bound group region */ +static int csg_reg_update_on_csis(struct kbase_device *kbdev, struct kbase_queue_group *group, + struct kbase_queue_group *prev_grp) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + struct tagged_addr *phy; + int err = 0, err1; + u32 i; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg, "Update_userio pages: group has no bound csg_reg")) + return -EINVAL; + + for (i = 0; i < nr_csis; i++) { + struct kbase_queue *queue = group->bound_queues[i]; + struct kbase_queue *prev_queue = prev_grp ? prev_grp->bound_queues[i] : NULL; + + /* Set the phy if the group's queue[i] needs mapping, otherwise NULL */ + phy = (queue && queue->enabled && !queue->user_io_gpu_va) ? queue->phys : NULL; + + /* Either phy is valid, or this update is for a transition change from + * prev_group, and the prev_queue was mapped, so an update is required. + */ + if (phy || (prev_queue && prev_queue->user_io_gpu_va)) { + u64 vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, i, nr_susp_pages); + + err1 = userio_pages_replace_phys(kbdev, vpfn, phy); + + if (unlikely(err1)) { + dev_warn(kbdev->dev, + "%s: Error in update queue-%d mapping for csg_%d_%d_%d", + __func__, i, group->kctx->tgid, group->kctx->id, + group->handle); + err = err1; + } else if (phy) + queue->user_io_gpu_va = GET_VPFN_VA(vpfn); + + /* Mark prev_group's queue has lost its mapping */ + if (prev_queue) + prev_queue->user_io_gpu_va = 0; + } + } + + return err; +} + +/* Bind a group to a given csg_reg, any previous mappings with the csg_reg are replaced + * with the given group's phy pages, or, if no replacement, the default dummy pages. + * Note, the csg_reg's fields are in transition step-by-step from the prev_grp to its + * new binding owner in this function. At the end, the prev_grp would be completely + * detached away from the previously bound csg_reg. 
+ */ +static int group_bind_csg_reg(struct kbase_device *kbdev, struct kbase_queue_group *group, + struct kbase_csg_shared_region *csg_reg) +{ + const unsigned long mem_flags = SUSP_PAGE_MAP_FLAGS; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + struct kbase_queue_group *prev_grp = csg_reg->grp; + struct kbase_va_region *reg = csg_reg->reg; + struct tagged_addr *phy; + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* The csg_reg is expected still on the unused list so its link is not empty */ + if (WARN_ON_ONCE(list_empty(&csg_reg->link))) { + dev_dbg(kbdev->dev, "csg_reg is marked in active use"); + return -EINVAL; + } + + if (WARN_ON_ONCE(prev_grp && prev_grp->csg_reg != csg_reg)) { + dev_dbg(kbdev->dev, "Unexpected bound lost on prev_group"); + prev_grp->csg_reg = NULL; + return -EINVAL; + } + + /* Replacing the csg_reg bound group to the newly given one */ + csg_reg->grp = group; + group->csg_reg = csg_reg; + + /* Resolving mappings, deal with protected mode first */ + if (group->protected_suspend_buf.pma) { + /* We are binding a new group with P-mode ready, the prev_grp's P-mode mapping + * status is now stale during this transition of ownership. For the new owner, + * its mapping would have been updated away when it lost its binding previously. + * So it needs an update to this pma map. By clearing here the mapped flag + * ensures it reflects the new owner's condition. + */ + csg_reg->pmode_mapped = false; + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + } else if (csg_reg->pmode_mapped) { + /* Need to unmap the previous one, use the dummy pages */ + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + + if (unlikely(err)) + dev_warn(kbdev->dev, "%s: Failed to update P-mode dummy for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, group->handle); + + csg_reg->pmode_mapped = false; + } + + /* Unlike the normal suspend buf, the mapping of the protected mode suspend buffer is + * actually reflected by a specific mapped flag (due to phys[] is only allocated on + * in-need basis). So the GPU_VA is always updated to the bound region's corresponding + * VA, as a reflection of the binding to the csg_reg. 
+ */ + group->protected_suspend_buf.gpu_va = + GET_VPFN_VA(CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages)); + + /* Deal with normal mode suspend buffer */ + phy = group->normal_suspend_buf.phy; + err1 = kbase_mmu_update_csf_mcu_pages(kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), phy, + nr_susp_pages, mem_flags, KBASE_MEM_GROUP_CSF_FW); + + if (unlikely(err1)) { + dev_warn(kbdev->dev, "%s: Failed to update suspend buffer for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, group->handle); + + /* Attempt a restore to default dummy for removing previous mapping */ + if (prev_grp) + update_mapping_with_dummy_pages( + kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + err = err1; + /* Marking the normal suspend buffer is not mapped (due to error) */ + group->normal_suspend_buf.gpu_va = 0; + } else { + /* Marking the normal suspend buffer is actually mapped */ + group->normal_suspend_buf.gpu_va = + GET_VPFN_VA(CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages)); + } + + /* Deal with queue uerio_pages */ + err1 = csg_reg_update_on_csis(kbdev, group, prev_grp); + if (likely(!err1)) + err = err1; + + /* Reset the previous group's suspend buffers' GPU_VAs as it has lost its bound */ + if (prev_grp) { + prev_grp->normal_suspend_buf.gpu_va = 0; + prev_grp->protected_suspend_buf.gpu_va = 0; + prev_grp->csg_reg = NULL; + } + + return err; +} + +/* Notify the group is placed on-slot, hence the bound csg_reg is active in use */ +void kbase_csf_mcu_shared_set_group_csg_reg_active(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg || csg_reg->grp != group, "Group_%d_%d_%d has no csg_reg bounding", + group->kctx->tgid, group->kctx->id, group->handle)) + return; + + /* By dropping out the csg_reg from the unused list, it becomes active and is tracked + * by its bound group that is on-slot. The design is that, when this on-slot group is + * moved to off-slot, the scheduler slot-clean up will add it back to the tail of the + * unused list. + */ + if (!WARN_ON_ONCE(list_empty(&csg_reg->link))) + list_del_init(&csg_reg->link); +} + +/* Notify the group is placed off-slot, hence the bound csg_reg is not in active use + * anymore. Existing bounding/mappings are left untouched. These would only be dealt with + * if the bound csg_reg is to be reused with another group. + */ +void kbase_csf_mcu_shared_set_group_csg_reg_unused(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg || csg_reg->grp != group, "Group_%d_%d_%d has no csg_reg bound", + group->kctx->tgid, group->kctx->id, group->handle)) + return; + + /* By adding back the csg_reg to the unused list, it becomes available for another + * group to break its existing binding and set up a new one. 
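The active/unused transitions above amount to an LRU-style recycling scheme over a single list_head: a region leaves unused_csg_regs while its bound group is on-slot and rejoins the tail when the group goes off-slot, so the list head is always the least-recently-used candidate for rebinding. The sketch below is an editorial restatement of that idiom with hypothetical types, not part of this patch.

/* Editorial sketch, not part of this patch: LRU-style recycling over a
 * list_head, mirroring how unused_csg_regs is used above. The types and
 * function names are hypothetical stand-ins.
 */
#include <linux/list.h>

struct demo_region {
	struct list_head link;	/* empty <=> region is in active use;
				 * INIT_LIST_HEAD() it at creation time. */
};

static LIST_HEAD(demo_unused);

/* Pick the least-recently used region, as group_bind_csg_reg() does */
static struct demo_region *demo_acquire(void)
{
	return list_first_entry_or_null(&demo_unused, struct demo_region, link);
}

/* Group placed on-slot: take the region out of the recycling pool */
static void demo_set_active(struct demo_region *r)
{
	if (!list_empty(&r->link))
		list_del_init(&r->link);
}

/* Group moved off-slot: make the region reclaimable again, at the tail */
static void demo_set_unused(struct demo_region *r)
{
	if (list_empty(&r->link))
		list_add_tail(&r->link, &demo_unused);
	else
		list_move_tail(&r->link, &demo_unused);
}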
+ */ + if (!list_empty(&csg_reg->link)) { + WARN_ONCE(group->csg_nr >= 0, "Group is assumed vacated from slot"); + list_move_tail(&csg_reg->link, &shared_regs->unused_csg_regs); + } else + list_add_tail(&csg_reg->link, &shared_regs->unused_csg_regs); +} + +/* Adding a new queue to an existing on-slot group */ +int kbase_csf_mcu_shared_add_queue(struct kbase_device *kbdev, struct kbase_queue *queue) +{ + struct kbase_queue_group *group = queue->group; + struct kbase_csg_shared_region *csg_reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u64 vpfn; + int err; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!group || group->csg_nr < 0, "No bound group, or group is not on-slot")) + return -EIO; + + csg_reg = get_group_bound_csg_reg(group); + if (WARN_ONCE(!csg_reg || !list_empty(&csg_reg->link), + "No bound csg_reg, or in wrong state")) + return -EIO; + + vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, queue->csi_index, nr_susp_pages); + err = userio_pages_replace_phys(kbdev, vpfn, queue->phys); + if (likely(!err)) { + /* Mark the queue has been successfully mapped */ + queue->user_io_gpu_va = GET_VPFN_VA(vpfn); + } else { + /* Mark the queue has no mapping on its phys[] */ + queue->user_io_gpu_va = 0; + dev_dbg(kbdev->dev, + "%s: Error in mapping userio pages for queue-%d of csg_%d_%d_%d", __func__, + queue->csi_index, group->kctx->tgid, group->kctx->id, group->handle); + + /* notify the error for the bound group */ + if (notify_group_csg_reg_map_error(group)) + err = -EIO; + } + + return err; +} + +/* Unmap a given queue's userio pages, when the queue is deleted */ +void kbase_csf_mcu_shared_drop_stopped_queue(struct kbase_device *kbdev, struct kbase_queue *queue) +{ + struct kbase_queue_group *group; + struct kbase_csg_shared_region *csg_reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u64 vpfn; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* The queue has no existing mapping, nothing to do */ + if (!queue || !queue->user_io_gpu_va) + return; + + group = queue->group; + if (WARN_ONCE(!group || !group->csg_reg, "Queue/Group has no bound region")) + return; + + csg_reg = get_group_bound_csg_reg(group); + + vpfn = CSG_REG_USERIO_VPFN(csg_reg->reg, queue->csi_index, nr_susp_pages); + + WARN_ONCE(userio_pages_replace_phys(kbdev, vpfn, NULL), + "Unexpected restoring to dummy map update error"); + queue->user_io_gpu_va = 0; +} + +int kbase_csf_mcu_shared_group_update_pmode_map(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + int err = 0, err1; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + if (WARN_ONCE(!csg_reg, "Update_pmode_map: the bound csg_reg can't be NULL")) + return -EINVAL; + + /* If the pmode already mapped, nothing to do */ + if (csg_reg->pmode_mapped) + return 0; + + /* P-mode map not in place and the group has allocated P-mode pages, map it */ + if (group->protected_suspend_buf.pma) { + unsigned long mem_flags = SUSP_PAGE_MAP_FLAGS; + struct tagged_addr *phy = shared_regs->pma_phys; + struct kbase_va_region *reg = csg_reg->reg; + u64 vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + u32 i; + + /* Populate the protected phys from pma to phy[] */ + for (i = 0; i < nr_susp_pages; i++) + phy[i] = 
as_tagged(group->protected_suspend_buf.pma[i]->pa); + + /* Add the P-mode suspend buffer mapping */ + err = kbase_mmu_update_csf_mcu_pages(kbdev, vpfn, phy, nr_susp_pages, mem_flags, + KBASE_MEM_GROUP_CSF_FW); + + /* If error, restore to default dummpy */ + if (unlikely(err)) { + err1 = update_mapping_with_dummy_pages(kbdev, vpfn, nr_susp_pages); + if (unlikely(err1)) + dev_warn( + kbdev->dev, + "%s: Failed in recovering to P-mode dummy for csg_%d_%d_%d", + __func__, group->kctx->tgid, group->kctx->id, + group->handle); + + csg_reg->pmode_mapped = false; + } else + csg_reg->pmode_mapped = true; + } + + return err; +} + +void kbase_csf_mcu_shared_clear_evicted_group_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg = get_group_bound_csg_reg(group); + struct kbase_va_region *reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + int err = 0; + u32 i; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + /* Nothing to do for clearing up if no bound csg_reg */ + if (!csg_reg) + return; + + reg = csg_reg->reg; + /* Restore mappings default dummy pages for any mapped pages */ + if (csg_reg->pmode_mapped) { + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + WARN_ONCE(unlikely(err), "Restore dummy failed for clearing pmod buffer mapping"); + + csg_reg->pmode_mapped = false; + } + + if (group->normal_suspend_buf.gpu_va) { + err = update_mapping_with_dummy_pages( + kbdev, CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages), nr_susp_pages); + WARN_ONCE(err, "Restore dummy failed for clearing suspend buffer mapping"); + } + + /* Deal with queue uerio pages */ + for (i = 0; i < nr_csis; i++) + kbase_csf_mcu_shared_drop_stopped_queue(kbdev, group->bound_queues[i]); + + group->normal_suspend_buf.gpu_va = 0; + group->protected_suspend_buf.gpu_va = 0; + + /* Break the binding */ + group->csg_reg = NULL; + csg_reg->grp = NULL; + + /* Put the csg_reg to the front of the unused list */ + if (WARN_ON_ONCE(list_empty(&csg_reg->link))) + list_add(&csg_reg->link, &shared_regs->unused_csg_regs); + else + list_move(&csg_reg->link, &shared_regs->unused_csg_regs); +} + +int kbase_csf_mcu_shared_group_bind_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_csg_shared_region *csg_reg; + int err; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + csg_reg = get_group_bound_csg_reg(group); + if (!csg_reg) + csg_reg = list_first_entry_or_null(&shared_regs->unused_csg_regs, + struct kbase_csg_shared_region, link); + + if (!WARN_ON_ONCE(!csg_reg)) { + struct kbase_queue_group *prev_grp = csg_reg->grp; + + /* Deal with the previous binding and lazy unmap, i.e if the previous mapping not + * the required one, unmap it. 
+ */ + if (prev_grp == group) { + /* Update existing bindings, if there have been some changes */ + err = kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group); + if (likely(!err)) + err = csg_reg_update_on_csis(kbdev, group, NULL); + } else + err = group_bind_csg_reg(kbdev, group, csg_reg); + } else { + /* This should not have been possible if the code operates rightly */ + dev_err(kbdev->dev, "%s: Unexpected NULL csg_reg for group %d of context %d_%d", + __func__, group->handle, group->kctx->tgid, group->kctx->id); + return -EIO; + } + + if (likely(!err)) + notify_group_csg_reg_map_done(group); + else + notify_group_csg_reg_map_error(group); + + return err; +} + +static int shared_mcu_csg_reg_init(struct kbase_device *kbdev, + struct kbase_csg_shared_region *csg_reg) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + const size_t nr_csg_reg_pages = 2 * (nr_susp_pages + nr_csis); + struct kbase_va_region *reg; + u64 vpfn; + int err, i; + + INIT_LIST_HEAD(&csg_reg->link); + reg = kbase_alloc_free_region(&kbdev->csf.mcu_shared_zone, 0, nr_csg_reg_pages); + + if (!reg) { + dev_err(kbdev->dev, "%s: Failed to allocate a MCU shared region for %zu pages\n", + __func__, nr_csg_reg_pages); + return -ENOMEM; + } + + /* Insert the region into rbtree, so it becomes ready to use */ + mutex_lock(&kbdev->csf.reg_lock); + err = kbase_add_va_region_rbtree(kbdev, reg, 0, nr_csg_reg_pages, 1); + reg->flags &= ~KBASE_REG_FREE; + mutex_unlock(&kbdev->csf.reg_lock); + if (err) { + kfree(reg); + dev_err(kbdev->dev, "%s: Failed to add a region of %zu pages into rbtree", __func__, + nr_csg_reg_pages); + return err; + } + + /* Initialize the mappings so MMU only need to update the the corresponding + * mapped phy-pages at runtime. + * Map the normal suspend buffer pages to the prepared dummy phys[]. 
+ */ + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, nr_susp_pages); + + if (unlikely(err)) + goto fail_susp_map_fail; + + /* Map the protected suspend buffer pages to the prepared dummy phys[] */ + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, nr_susp_pages); + + if (unlikely(err)) + goto fail_pmod_map_fail; + + for (i = 0; i < nr_csis; i++) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + err = insert_dummy_pages(kbdev, vpfn, KBASEP_NUM_CS_USER_IO_PAGES); + + if (unlikely(err)) + goto fail_userio_pages_map_fail; + } + + /* Replace the previous NULL-valued field with the successully initialized reg */ + csg_reg->reg = reg; + + return 0; + +fail_userio_pages_map_fail: + while (i-- > 0) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, + shared_regs->dummy_phys, + KBASEP_NUM_CS_USER_IO_PAGES, + KBASEP_NUM_CS_USER_IO_PAGES, MCU_AS_NR); + } + + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); +fail_pmod_map_fail: + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); +fail_susp_map_fail: + mutex_lock(&kbdev->csf.reg_lock); + kbase_remove_va_region(kbdev, reg); + mutex_unlock(&kbdev->csf.reg_lock); + kfree(reg); + + return err; +} + +/* Note, this helper can only be called on scheduler shutdown */ +static void shared_mcu_csg_reg_term(struct kbase_device *kbdev, + struct kbase_csg_shared_region *csg_reg) +{ + struct kbase_csf_mcu_shared_regions *shared_regs = &kbdev->csf.scheduler.mcu_regs_data; + struct kbase_va_region *reg = csg_reg->reg; + const u32 nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_csis = kbdev->csf.global_iface.groups[0].stream_num; + u64 vpfn; + int i; + + for (i = 0; i < nr_csis; i++) { + vpfn = CSG_REG_USERIO_VPFN(reg, i, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, + shared_regs->dummy_phys, + KBASEP_NUM_CS_USER_IO_PAGES, + KBASEP_NUM_CS_USER_IO_PAGES, MCU_AS_NR); + } + + vpfn = CSG_REG_PMOD_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); + vpfn = CSG_REG_SUSP_BUF_VPFN(reg, nr_susp_pages); + kbase_mmu_teardown_firmware_pages(kbdev, &kbdev->csf.mcu_mmu, vpfn, shared_regs->dummy_phys, + nr_susp_pages, nr_susp_pages, MCU_AS_NR); + + mutex_lock(&kbdev->csf.reg_lock); + kbase_remove_va_region(kbdev, reg); + mutex_unlock(&kbdev->csf.reg_lock); + kfree(reg); +} + +int kbase_csf_mcu_shared_regs_data_init(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_csf_mcu_shared_regions *shared_regs = &scheduler->mcu_regs_data; + struct kbase_csg_shared_region *array_csg_regs; + const size_t nr_susp_pages = PFN_UP(kbdev->csf.global_iface.groups[0].suspend_size); + const u32 nr_groups = kbdev->csf.global_iface.group_num; + const u32 nr_csg_regs = MCU_SHARED_REGS_PREALLOCATE_SCALE * nr_groups; + const u32 nr_dummy_phys = MAX(nr_susp_pages, KBASEP_NUM_CS_USER_IO_PAGES); + u32 i; + int err; + + shared_regs->userio_mem_rd_flags = get_userio_mmu_flags(kbdev); + 
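shared_mcu_csg_reg_init() above uses the conventional goto-based unwind: each mapping that was successfully inserted gets a matching teardown, executed in strict reverse order when a later step fails. The compact sketch below is editorial, with placeholder step names standing in for the insert_dummy_pages() / kbase_mmu_teardown_firmware_pages() pairs; it is not part of this patch.

/* Editorial sketch, not part of this patch: reverse-order goto unwind as used
 * by shared_mcu_csg_reg_init(). The step names are placeholders.
 */
#include <stdio.h>

static int step_setup(const char *name) { printf("setup %s\n", name); return 0; }
static void step_teardown(const char *name) { printf("teardown %s\n", name); }

static int demo_init(void)
{
	int err;

	err = step_setup("susp_buf");
	if (err)
		return err;		/* nothing established yet */

	err = step_setup("pmod_buf");
	if (err)
		goto undo_susp;

	err = step_setup("userio");
	if (err)
		goto undo_pmod;

	return 0;

undo_pmod:
	step_teardown("pmod_buf");	/* undo in strict reverse order */
undo_susp:
	step_teardown("susp_buf");
	return err;
}

int main(void) { return demo_init(); }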
INIT_LIST_HEAD(&shared_regs->unused_csg_regs); + + shared_regs->dummy_phys = + kcalloc(nr_dummy_phys, sizeof(*shared_regs->dummy_phys), GFP_KERNEL); + if (!shared_regs->dummy_phys) + return -ENOMEM; + + if (kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], 1, + &shared_regs->dummy_phys[0], false, NULL) <= 0) + return -ENOMEM; + + shared_regs->dummy_phys_allocated = true; + set_page_meta_status_not_movable(shared_regs->dummy_phys[0]); + + /* Replicate the allocated single shared_regs->dummy_phys[0] to the full array */ + for (i = 1; i < nr_dummy_phys; i++) + shared_regs->dummy_phys[i] = shared_regs->dummy_phys[0]; + + shared_regs->pma_phys = kcalloc(nr_susp_pages, sizeof(*shared_regs->pma_phys), GFP_KERNEL); + if (!shared_regs->pma_phys) + return -ENOMEM; + + array_csg_regs = kcalloc(nr_csg_regs, sizeof(*array_csg_regs), GFP_KERNEL); + if (!array_csg_regs) + return -ENOMEM; + shared_regs->array_csg_regs = array_csg_regs; + + /* All fields in scheduler->mcu_regs_data except the shared_regs->array_csg_regs + * are properly populated and ready to use. Now initialize the items in + * shared_regs->array_csg_regs[] + */ + for (i = 0; i < nr_csg_regs; i++) { + err = shared_mcu_csg_reg_init(kbdev, &array_csg_regs[i]); + if (err) + return err; + + list_add_tail(&array_csg_regs[i].link, &shared_regs->unused_csg_regs); + } + + return 0; +} + +void kbase_csf_mcu_shared_regs_data_term(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_csf_mcu_shared_regions *shared_regs = &scheduler->mcu_regs_data; + struct kbase_csg_shared_region *array_csg_regs = + (struct kbase_csg_shared_region *)shared_regs->array_csg_regs; + const u32 nr_groups = kbdev->csf.global_iface.group_num; + const u32 nr_csg_regs = MCU_SHARED_REGS_PREALLOCATE_SCALE * nr_groups; + + if (array_csg_regs) { + struct kbase_csg_shared_region *csg_reg; + u32 i, cnt_csg_regs = 0; + + for (i = 0; i < nr_csg_regs; i++) { + csg_reg = &array_csg_regs[i]; + /* There should not be any group mapping bindings */ + WARN_ONCE(csg_reg->grp, "csg_reg has a bound group"); + + if (csg_reg->reg) { + shared_mcu_csg_reg_term(kbdev, csg_reg); + cnt_csg_regs++; + } + } + + /* The nr_susp_regs counts should match the array_csg_regs' length */ + list_for_each_entry(csg_reg, &shared_regs->unused_csg_regs, link) + cnt_csg_regs--; + + WARN_ONCE(cnt_csg_regs, "Unmatched counts of susp_regs"); + kfree(shared_regs->array_csg_regs); + } + + if (shared_regs->dummy_phys_allocated) { + struct page *page = as_page(shared_regs->dummy_phys[0]); + + kbase_mem_pool_free(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], page, false); + } + + kfree(shared_regs->dummy_phys); + kfree(shared_regs->pma_phys); +} diff --git a/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h new file mode 100644 index 0000000..61943cb --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_mcu_shared_reg.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_CSF_MCU_SHARED_REG_H_ +#define _KBASE_CSF_MCU_SHARED_REG_H_ + +/** + * kbase_csf_mcu_shared_set_group_csg_reg_active - Notify that the group is active on-slot with + * scheduling action. Essential runtime resources + * are bound with the group for it to run + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that is placed into active on-slot running by the scheduler. + * + */ +void kbase_csf_mcu_shared_set_group_csg_reg_active(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_set_group_csg_reg_unused - Notify that the group is placed off-slot with + * scheduling action. Some of bound runtime + * resources can be reallocated for others to use + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that is placed off-slot by the scheduler. + * + */ +void kbase_csf_mcu_shared_set_group_csg_reg_unused(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_group_update_pmode_map - Request to update the given group's protected + * suspend buffer pages to be mapped for supporting + * protected mode operations. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group for attempting a protected mode suspend buffer binding/mapping. + * + * Return: 0 for success, the group has a protected suspend buffer region mapped. Otherwise an + * error code is returned. + */ +int kbase_csf_mcu_shared_group_update_pmode_map(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_clear_evicted_group_csg_reg - Clear any bound regions/mappings as the + * given group is evicted out of the runtime + * operations. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the group that has been evicted out of set of operational groups. + * + * This function will taken away any of the bindings/mappings immediately so the resources + * are not tied up to the given group, which has been evicted out of scheduling action for + * termination. + */ +void kbase_csf_mcu_shared_clear_evicted_group_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_add_queue - Request to add a newly activated queue for a group to be + * run on slot. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @queue: Pointer to the queue that requires some runtime resource to be bound for joining + * others that are already running on-slot with their bound group. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_csf_mcu_shared_add_queue(struct kbase_device *kbdev, struct kbase_queue *queue); + +/** + * kbase_csf_mcu_shared_drop_stopped_queue - Request to drop a queue after it has been stopped + * from its operational state from a group. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
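Taken together, the kernel-doc in this header describes a per-group lifecycle for the MCU shared-region helpers. The sketch below is an editorial, hedged reading of the expected call order from a scheduler path; it is not part of this patch, the surrounding function is hypothetical, and locking (the scheduler lock these helpers assert) plus error handling are elided.

/* Editorial sketch, not part of this patch: call order implied by the
 * kernel-doc above, seen from a hypothetical scheduler path. Locking and
 * error handling are deliberately elided.
 */
#include <mali_kbase.h>
#include "mali_kbase_csf_mcu_shared_reg.h"

static void demo_group_lifecycle(struct kbase_device *kbdev,
				 struct kbase_queue_group *group,
				 struct kbase_queue *new_queue)
{
	/* Device init (once): kbase_csf_mcu_shared_regs_data_init(kbdev); */

	/* Before programming the group on a CSG slot: bind suspend-buffer and
	 * userio mappings, then mark the bound region as actively in use. */
	if (!kbase_csf_mcu_shared_group_bind_csg_reg(kbdev, group))
		kbase_csf_mcu_shared_set_group_csg_reg_active(kbdev, group);

	/* A queue enabled while the group is already on-slot gets its userio
	 * pages mapped individually. */
	kbase_csf_mcu_shared_add_queue(kbdev, new_queue);

	/* Entering protected mode requires the P-mode suspend buffer map. */
	kbase_csf_mcu_shared_group_update_pmode_map(kbdev, group);

	/* Group taken off-slot: the region becomes reclaimable again. */
	kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);

	/* Group terminated/evicted: drop all of its bindings immediately. */
	kbase_csf_mcu_shared_clear_evicted_group_csg_reg(kbdev, group);

	/* Device term (once): kbase_csf_mcu_shared_regs_data_term(kbdev); */
}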
+ * @queue: Pointer to the queue that has been stopped from operational state. + * + */ +void kbase_csf_mcu_shared_drop_stopped_queue(struct kbase_device *kbdev, struct kbase_queue *queue); + +/** + * kbase_csf_mcu_shared_group_bind_csg_reg - Bind some required runtime resources to the given + * group for ready to run on-slot. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @group: Pointer to the queue group that requires the runtime resources. + * + * This function binds/maps the required suspend buffer pages and userio pages for the given + * group, readying it to run on-slot. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_csf_mcu_shared_group_bind_csg_reg(struct kbase_device *kbdev, + struct kbase_queue_group *group); + +/** + * kbase_csf_mcu_shared_regs_data_init - Allocate and initialize the MCU shared regions data for + * the given device. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function allocate and initialize the MCU shared VA regions for runtime operations + * of the CSF scheduler. + * + * Return: 0 on success, or an error code. + */ +int kbase_csf_mcu_shared_regs_data_init(struct kbase_device *kbdev); + +/** + * kbase_csf_mcu_shared_regs_data_term - Terminate the allocated MCU shared regions data for + * the given device. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function terminates the MCU shared VA regions allocated for runtime operations + * of the CSF scheduler. + */ +void kbase_csf_mcu_shared_regs_data_term(struct kbase_device *kbdev); + +#endif /* _KBASE_CSF_MCU_SHARED_REG_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_protected_memory.c b/mali_kbase/csf/mali_kbase_csf_protected_memory.c index bf1835b..1bb1c03 100644 --- a/mali_kbase/csf/mali_kbase_csf_protected_memory.c +++ b/mali_kbase/csf/mali_kbase_csf_protected_memory.c @@ -51,7 +51,12 @@ int kbase_csf_protected_memory_init(struct kbase_device *const kbdev) dev_err(kbdev->dev, "Failed to get Protected memory allocator module\n"); err = -ENODEV; } else { - dev_info(kbdev->dev, "Protected memory allocator successfully loaded\n"); + err = dma_set_mask_and_coherent(&pdev->dev, + DMA_BIT_MASK(kbdev->gpu_props.mmu.pa_bits)); + if (err) + dev_err(&(pdev->dev), "protected_memory_allocator set dma fail\n"); + else + dev_info(kbdev->dev, "Protected memory allocator successfully loaded\n"); } } of_node_put(pma_node); diff --git a/mali_kbase/csf/mali_kbase_csf_registers.h b/mali_kbase/csf/mali_kbase_csf_registers.h index 99de444..b5ca885 100644 --- a/mali_kbase/csf/mali_kbase_csf_registers.h +++ b/mali_kbase/csf/mali_kbase_csf_registers.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,10 +31,6 @@ * Begin register sets */ -/* DOORBELLS base address */ -#define DOORBELLS_BASE 0x0080000 -#define DOORBELLS_REG(r) (DOORBELLS_BASE + (r)) - /* CS_KERNEL_INPUT_BLOCK base address */ #define CS_KERNEL_INPUT_BLOCK_BASE 0x0000 #define CS_KERNEL_INPUT_BLOCK_REG(r) (CS_KERNEL_INPUT_BLOCK_BASE + (r)) @@ -71,10 +67,6 @@ #define GLB_OUTPUT_BLOCK_BASE 0x0000 #define GLB_OUTPUT_BLOCK_REG(r) (GLB_OUTPUT_BLOCK_BASE + (r)) -/* USER base address */ -#define USER_BASE 0x0010000 -#define USER_REG(r) (USER_BASE + (r)) - /* End register sets */ /* @@ -151,18 +143,23 @@ #define CSG_ACK_IRQ_MASK 0x0004 /* () Global acknowledge interrupt mask */ #define CSG_DB_REQ 0x0008 /* () Global doorbell request */ #define CSG_IRQ_ACK 0x000C /* () CS IRQ acknowledge */ + + #define CSG_ALLOW_COMPUTE_LO 0x0020 /* () Allowed compute endpoints, low word */ #define CSG_ALLOW_COMPUTE_HI 0x0024 /* () Allowed compute endpoints, high word */ #define CSG_ALLOW_FRAGMENT_LO 0x0028 /* () Allowed fragment endpoints, low word */ #define CSG_ALLOW_FRAGMENT_HI 0x002C /* () Allowed fragment endpoints, high word */ #define CSG_ALLOW_OTHER 0x0030 /* () Allowed other endpoints */ -#define CSG_EP_REQ 0x0034 /* () Maximum number of endpoints allowed */ +#define CSG_EP_REQ_LO 0x0034 /* () Maximum number of endpoints allowed, low word */ +#define CSG_EP_REQ_HI 0x0038 /* () Maximum number of endpoints allowed, high word */ #define CSG_SUSPEND_BUF_LO 0x0040 /* () Normal mode suspend buffer, low word */ #define CSG_SUSPEND_BUF_HI 0x0044 /* () Normal mode suspend buffer, high word */ #define CSG_PROTM_SUSPEND_BUF_LO 0x0048 /* () Protected mode suspend buffer, low word */ #define CSG_PROTM_SUSPEND_BUF_HI 0x004C /* () Protected mode suspend buffer, high word */ #define CSG_CONFIG 0x0050 /* () CSG configuration options */ #define CSG_ITER_TRACE_CONFIG 0x0054 /* () CSG trace configuration */ +#define CSG_DVS_BUF_LO 0x0060 /* () Normal mode deferred vertex shading work buffer, low word */ +#define CSG_DVS_BUF_HI 0x0064 /* () Normal mode deferred vertex shading work buffer, high word */ /* CSG_OUTPUT_BLOCK register offsets */ #define CSG_ACK 0x0000 /* () CSG acknowledge flags */ @@ -227,24 +224,43 @@ #define GLB_PRFCNT_TILER_EN 0x0058 /* () Performance counter enable for tiler */ #define GLB_PRFCNT_MMU_L2_EN 0x005C /* () Performance counter enable for MMU/L2 cache */ -#define GLB_DEBUG_FWUTF_DESTROY 0x0FE0 /* () Test fixture destroy function address */ -#define GLB_DEBUG_FWUTF_TEST 0x0FE4 /* () Test index */ -#define GLB_DEBUG_FWUTF_FIXTURE 0x0FE8 /* () Test fixture index */ -#define GLB_DEBUG_FWUTF_CREATE 0x0FEC /* () Test fixture create function address */ +#define GLB_DEBUG_ARG_IN0 0x0FE0 /* Firmware Debug argument array element 0 */ +#define GLB_DEBUG_ARG_IN1 0x0FE4 /* Firmware Debug argument array element 1 */ +#define GLB_DEBUG_ARG_IN2 0x0FE8 /* Firmware Debug argument array element 2 */ +#define GLB_DEBUG_ARG_IN3 0x0FEC /* Firmware Debug argument array element 3 */ + +/* Mappings based on GLB_DEBUG_REQ.FWUTF_RUN bit being different from GLB_DEBUG_ACK.FWUTF_RUN */ +#define GLB_DEBUG_FWUTF_DESTROY GLB_DEBUG_ARG_IN0 /* () Test fixture destroy function address */ +#define GLB_DEBUG_FWUTF_TEST GLB_DEBUG_ARG_IN1 /* () Test index */ +#define GLB_DEBUG_FWUTF_FIXTURE GLB_DEBUG_ARG_IN2 /* () Test fixture index */ +#define GLB_DEBUG_FWUTF_CREATE GLB_DEBUG_ARG_IN3 /* () Test fixture create 
function address */ + #define GLB_DEBUG_ACK_IRQ_MASK 0x0FF8 /* () Global debug acknowledge interrupt mask */ #define GLB_DEBUG_REQ 0x0FFC /* () Global debug request */ /* GLB_OUTPUT_BLOCK register offsets */ +#define GLB_DEBUG_ARG_OUT0 0x0FE0 /* Firmware debug result element 0 */ +#define GLB_DEBUG_ARG_OUT1 0x0FE4 /* Firmware debug result element 1 */ +#define GLB_DEBUG_ARG_OUT2 0x0FE8 /* Firmware debug result element 2 */ +#define GLB_DEBUG_ARG_OUT3 0x0FEC /* Firmware debug result element 3 */ + #define GLB_ACK 0x0000 /* () Global acknowledge */ #define GLB_DB_ACK 0x0008 /* () Global doorbell acknowledge */ #define GLB_HALT_STATUS 0x0010 /* () Global halt status */ #define GLB_PRFCNT_STATUS 0x0014 /* () Performance counter status */ #define GLB_PRFCNT_INSERT 0x0018 /* () Performance counter buffer insert index */ -#define GLB_DEBUG_FWUTF_RESULT 0x0FE0 /* () Firmware debug test result */ +#define GLB_DEBUG_FWUTF_RESULT GLB_DEBUG_ARG_OUT0 /* () Firmware debug test result */ #define GLB_DEBUG_ACK 0x0FFC /* () Global debug acknowledge */ -/* USER register offsets */ -#define LATEST_FLUSH 0x0000 /* () Flush ID of latest clean-and-invalidate operation */ +#ifdef CONFIG_MALI_CORESIGHT +#define GLB_DEBUG_REQ_FW_AS_WRITE_SHIFT 4 +#define GLB_DEBUG_REQ_FW_AS_WRITE_MASK (0x1 << GLB_DEBUG_REQ_FW_AS_WRITE_SHIFT) +#define GLB_DEBUG_REQ_FW_AS_READ_SHIFT 5 +#define GLB_DEBUG_REQ_FW_AS_READ_MASK (0x1 << GLB_DEBUG_REQ_FW_AS_READ_SHIFT) +#define GLB_DEBUG_ARG_IN0 0x0FE0 +#define GLB_DEBUG_ARG_IN1 0x0FE4 +#define GLB_DEBUG_ARG_OUT0 0x0FE0 +#endif /* CONFIG_MALI_CORESIGHT */ /* End register offsets */ @@ -302,10 +318,17 @@ #define CS_REQ_IDLE_RESOURCE_REQ_SHIFT 11 #define CS_REQ_IDLE_RESOURCE_REQ_MASK (0x1 << CS_REQ_IDLE_RESOURCE_REQ_SHIFT) #define CS_REQ_IDLE_RESOURCE_REQ_GET(reg_val) \ - (((reg_val)&CS_REQ_IDLE_RESOURCE_REQ_MASK) >> CS_REQ_IDLE_RESOURCE_REQ_SHIFT) + (((reg_val) & CS_REQ_IDLE_RESOURCE_REQ_MASK) >> CS_REQ_IDLE_RESOURCE_REQ_SHIFT) #define CS_REQ_IDLE_RESOURCE_REQ_SET(reg_val, value) \ (((reg_val) & ~CS_REQ_IDLE_RESOURCE_REQ_MASK) | \ (((value) << CS_REQ_IDLE_RESOURCE_REQ_SHIFT) & CS_REQ_IDLE_RESOURCE_REQ_MASK)) +#define CS_REQ_IDLE_SHARED_SB_DEC_SHIFT 12 +#define CS_REQ_IDLE_SHARED_SB_DEC_MASK (0x1 << CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) +#define CS_REQ_IDLE_SHARED_SB_DEC_GET(reg_val) \ + (((reg_val) & CS_REQ_IDLE_SHARED_SB_DEC_MASK) >> CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) +#define CS_REQ_IDLE_SHARED_SB_DEC_REQ_SET(reg_val, value) \ + (((reg_val) & ~CS_REQ_IDLE_SHARED_SB_DEC_MASK) | \ + (((value) << CS_REQ_IDLE_SHARED_SB_DEC_SHIFT) & CS_REQ_IDLE_SHARED_SB_DEC_MASK)) #define CS_REQ_TILER_OOM_SHIFT 26 #define CS_REQ_TILER_OOM_MASK (0x1 << CS_REQ_TILER_OOM_SHIFT) #define CS_REQ_TILER_OOM_GET(reg_val) (((reg_val)&CS_REQ_TILER_OOM_MASK) >> CS_REQ_TILER_OOM_SHIFT) @@ -387,7 +410,7 @@ /* CS_BASE register */ #define CS_BASE_POINTER_SHIFT 0 -#define CS_BASE_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_BASE_POINTER_SHIFT) +#define CS_BASE_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_BASE_POINTER_SHIFT) #define CS_BASE_POINTER_GET(reg_val) (((reg_val)&CS_BASE_POINTER_MASK) >> CS_BASE_POINTER_SHIFT) #define CS_BASE_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_BASE_POINTER_MASK) | (((value) << CS_BASE_POINTER_SHIFT) & CS_BASE_POINTER_MASK)) @@ -401,7 +424,8 @@ /* CS_TILER_HEAP_START register */ #define CS_TILER_HEAP_START_POINTER_SHIFT 0 -#define CS_TILER_HEAP_START_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_TILER_HEAP_START_POINTER_SHIFT) +#define CS_TILER_HEAP_START_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << 
CS_TILER_HEAP_START_POINTER_SHIFT) #define CS_TILER_HEAP_START_POINTER_GET(reg_val) \ (((reg_val)&CS_TILER_HEAP_START_POINTER_MASK) >> CS_TILER_HEAP_START_POINTER_SHIFT) #define CS_TILER_HEAP_START_POINTER_SET(reg_val, value) \ @@ -412,7 +436,8 @@ /* CS_TILER_HEAP_END register */ #define CS_TILER_HEAP_END_POINTER_SHIFT 0 -#define CS_TILER_HEAP_END_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_TILER_HEAP_END_POINTER_SHIFT) +#define CS_TILER_HEAP_END_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_TILER_HEAP_END_POINTER_SHIFT) #define CS_TILER_HEAP_END_POINTER_GET(reg_val) \ (((reg_val)&CS_TILER_HEAP_END_POINTER_MASK) >> CS_TILER_HEAP_END_POINTER_SHIFT) #define CS_TILER_HEAP_END_POINTER_SET(reg_val, value) \ @@ -423,7 +448,7 @@ /* CS_USER_INPUT register */ #define CS_USER_INPUT_POINTER_SHIFT 0 -#define CS_USER_INPUT_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_USER_INPUT_POINTER_SHIFT) +#define CS_USER_INPUT_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_USER_INPUT_POINTER_SHIFT) #define CS_USER_INPUT_POINTER_GET(reg_val) (((reg_val)&CS_USER_INPUT_POINTER_MASK) >> CS_USER_INPUT_POINTER_SHIFT) #define CS_USER_INPUT_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_USER_INPUT_POINTER_MASK) | \ @@ -431,7 +456,7 @@ /* CS_USER_OUTPUT register */ #define CS_USER_OUTPUT_POINTER_SHIFT 0 -#define CS_USER_OUTPUT_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_USER_OUTPUT_POINTER_SHIFT) +#define CS_USER_OUTPUT_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_USER_OUTPUT_POINTER_SHIFT) #define CS_USER_OUTPUT_POINTER_GET(reg_val) (((reg_val)&CS_USER_OUTPUT_POINTER_MASK) >> CS_USER_OUTPUT_POINTER_SHIFT) #define CS_USER_OUTPUT_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_USER_OUTPUT_POINTER_MASK) | \ @@ -470,7 +495,8 @@ /* CS_INSTR_BUFFER_BASE register */ #define CS_INSTR_BUFFER_BASE_POINTER_SHIFT (0) -#define CS_INSTR_BUFFER_BASE_POINTER_MASK ((u64)0xFFFFFFFFFFFFFFFF << CS_INSTR_BUFFER_BASE_POINTER_SHIFT) +#define CS_INSTR_BUFFER_BASE_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_INSTR_BUFFER_BASE_POINTER_SHIFT) #define CS_INSTR_BUFFER_BASE_POINTER_GET(reg_val) \ (((reg_val)&CS_INSTR_BUFFER_BASE_POINTER_MASK) >> CS_INSTR_BUFFER_BASE_POINTER_SHIFT) #define CS_INSTR_BUFFER_BASE_POINTER_SET(reg_val, value) \ @@ -479,8 +505,8 @@ /* CS_INSTR_BUFFER_OFFSET_POINTER register */ #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT (0) -#define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK \ - (((u64)0xFFFFFFFFFFFFFFFF) << CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) +#define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK \ + ((GPU_ULL(0xFFFFFFFFFFFFFFFF)) << CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_GET(reg_val) \ (((reg_val)&CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_MASK) >> CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SHIFT) #define CS_INSTR_BUFFER_OFFSET_POINTER_POINTER_SET(reg_val, value) \ @@ -529,7 +555,8 @@ /* CS_STATUS_CMD_PTR register */ #define CS_STATUS_CMD_PTR_POINTER_SHIFT 0 -#define CS_STATUS_CMD_PTR_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_STATUS_CMD_PTR_POINTER_SHIFT) +#define CS_STATUS_CMD_PTR_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_STATUS_CMD_PTR_POINTER_SHIFT) #define CS_STATUS_CMD_PTR_POINTER_GET(reg_val) \ (((reg_val)&CS_STATUS_CMD_PTR_POINTER_MASK) >> CS_STATUS_CMD_PTR_POINTER_SHIFT) #define CS_STATUS_CMD_PTR_POINTER_SET(reg_val, value) \ @@ -543,6 +570,13 @@ #define CS_STATUS_WAIT_SB_MASK_SET(reg_val, value) \ (((reg_val) & ~CS_STATUS_WAIT_SB_MASK_MASK) | \ (((value) << CS_STATUS_WAIT_SB_MASK_SHIFT) & CS_STATUS_WAIT_SB_MASK_MASK)) +#define 
CS_STATUS_WAIT_SB_SOURCE_SHIFT 16 +#define CS_STATUS_WAIT_SB_SOURCE_MASK (0xF << CS_STATUS_WAIT_SB_SOURCE_SHIFT) +#define CS_STATUS_WAIT_SB_SOURCE_GET(reg_val) \ + (((reg_val)&CS_STATUS_WAIT_SB_SOURCE_MASK) >> CS_STATUS_WAIT_SB_SOURCE_SHIFT) +#define CS_STATUS_WAIT_SB_SOURCE_SET(reg_val, value) \ + (((reg_val) & ~CS_STATUS_WAIT_SB_SOURCE_MASK) | \ + (((value) << CS_STATUS_WAIT_SB_SOURCE_SHIFT) & CS_STATUS_WAIT_SB_SOURCE_MASK)) #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_SHIFT 24 #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_MASK (0xF << CS_STATUS_WAIT_SYNC_WAIT_CONDITION_SHIFT) #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GET(reg_val) \ @@ -553,6 +587,7 @@ /* CS_STATUS_WAIT_SYNC_WAIT_CONDITION values */ #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE 0x0 #define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT 0x1 +#define CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE 0x5 /* End of CS_STATUS_WAIT_SYNC_WAIT_CONDITION values */ #define CS_STATUS_WAIT_PROGRESS_WAIT_SHIFT 28 #define CS_STATUS_WAIT_PROGRESS_WAIT_MASK (0x1 << CS_STATUS_WAIT_PROGRESS_WAIT_SHIFT) @@ -568,6 +603,13 @@ #define CS_STATUS_WAIT_PROTM_PEND_SET(reg_val, value) \ (((reg_val) & ~CS_STATUS_WAIT_PROTM_PEND_MASK) | \ (((value) << CS_STATUS_WAIT_PROTM_PEND_SHIFT) & CS_STATUS_WAIT_PROTM_PEND_MASK)) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT 30 +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK (0x1 << CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_GET(reg_val) \ + (((reg_val)&CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK) >> CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) +#define CS_STATUS_WAIT_SYNC_WAIT_SIZE_SET(reg_val, value) \ + (((reg_val) & ~CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK) | \ + (((value) << CS_STATUS_WAIT_SYNC_WAIT_SIZE_SHIFT) & CS_STATUS_WAIT_SYNC_WAIT_SIZE_MASK)) #define CS_STATUS_WAIT_SYNC_WAIT_SHIFT 31 #define CS_STATUS_WAIT_SYNC_WAIT_MASK (0x1 << CS_STATUS_WAIT_SYNC_WAIT_SHIFT) #define CS_STATUS_WAIT_SYNC_WAIT_GET(reg_val) \ @@ -606,9 +648,11 @@ (((reg_val) & ~CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_MASK) | \ (((value) << CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_SHIFT) & CS_STATUS_REQ_RESOURCE_IDVS_RESOURCES_MASK)) + /* CS_STATUS_WAIT_SYNC_POINTER register */ #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT 0 -#define CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) +#define CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_GET(reg_val) \ (((reg_val)&CS_STATUS_WAIT_SYNC_POINTER_POINTER_MASK) >> CS_STATUS_WAIT_SYNC_POINTER_POINTER_SHIFT) #define CS_STATUS_WAIT_SYNC_POINTER_POINTER_SET(reg_val, value) \ @@ -677,6 +721,27 @@ #define CS_FAULT_EXCEPTION_TYPE_ADDR_RANGE_FAULT 0x5A #define CS_FAULT_EXCEPTION_TYPE_IMPRECISE_FAULT 0x5B #define CS_FAULT_EXCEPTION_TYPE_RESOURCE_EVICTION_TIMEOUT 0x69 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L0 0xC0 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L1 0xC1 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L2 0xC2 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L3 0xC3 +#define CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L4 0xC4 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_0 0xC8 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_1 0xC9 +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_2 0xCA +#define CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_3 0xCB +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_1 0xD9 +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_2 0xDA +#define CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_3 0xDB +#define 
CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_IN 0xE0 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_0 0xE4 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_1 0xE5 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_2 0xE6 +#define CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_3 0xE7 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_0 0xE8 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_1 0xE9 +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_2 0xEA +#define CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_3 0xEB /* End of CS_FAULT_EXCEPTION_TYPE values */ #define CS_FAULT_EXCEPTION_DATA_SHIFT 8 #define CS_FAULT_EXCEPTION_DATA_MASK (0xFFFFFF << CS_FAULT_EXCEPTION_DATA_SHIFT) @@ -694,6 +759,7 @@ (((value) << CS_FATAL_EXCEPTION_TYPE_SHIFT) & CS_FATAL_EXCEPTION_TYPE_MASK)) /* CS_FATAL_EXCEPTION_TYPE values */ #define CS_FATAL_EXCEPTION_TYPE_CS_CONFIG_FAULT 0x40 +#define CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE 0x41 #define CS_FATAL_EXCEPTION_TYPE_CS_ENDPOINT_FAULT 0x44 #define CS_FATAL_EXCEPTION_TYPE_CS_BUS_FAULT 0x48 #define CS_FATAL_EXCEPTION_TYPE_CS_INVALID_INSTRUCTION 0x49 @@ -709,7 +775,8 @@ /* CS_FAULT_INFO register */ #define CS_FAULT_INFO_EXCEPTION_DATA_SHIFT 0 -#define CS_FAULT_INFO_EXCEPTION_DATA_MASK (0xFFFFFFFFFFFFFFFF << CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) +#define CS_FAULT_INFO_EXCEPTION_DATA_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) #define CS_FAULT_INFO_EXCEPTION_DATA_GET(reg_val) \ (((reg_val)&CS_FAULT_INFO_EXCEPTION_DATA_MASK) >> CS_FAULT_INFO_EXCEPTION_DATA_SHIFT) #define CS_FAULT_INFO_EXCEPTION_DATA_SET(reg_val, value) \ @@ -718,7 +785,8 @@ /* CS_FATAL_INFO register */ #define CS_FATAL_INFO_EXCEPTION_DATA_SHIFT 0 -#define CS_FATAL_INFO_EXCEPTION_DATA_MASK (0xFFFFFFFFFFFFFFFF << CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) +#define CS_FATAL_INFO_EXCEPTION_DATA_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) #define CS_FATAL_INFO_EXCEPTION_DATA_GET(reg_val) \ (((reg_val)&CS_FATAL_INFO_EXCEPTION_DATA_MASK) >> CS_FATAL_INFO_EXCEPTION_DATA_SHIFT) #define CS_FATAL_INFO_EXCEPTION_DATA_SET(reg_val, value) \ @@ -750,7 +818,7 @@ /* CS_HEAP_ADDRESS register */ #define CS_HEAP_ADDRESS_POINTER_SHIFT 0 -#define CS_HEAP_ADDRESS_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CS_HEAP_ADDRESS_POINTER_SHIFT) +#define CS_HEAP_ADDRESS_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_HEAP_ADDRESS_POINTER_SHIFT) #define CS_HEAP_ADDRESS_POINTER_GET(reg_val) (((reg_val)&CS_HEAP_ADDRESS_POINTER_MASK) >> CS_HEAP_ADDRESS_POINTER_SHIFT) #define CS_HEAP_ADDRESS_POINTER_SET(reg_val, value) \ (((reg_val) & ~CS_HEAP_ADDRESS_POINTER_MASK) | \ @@ -761,14 +829,14 @@ /* CS_INSERT register */ #define CS_INSERT_VALUE_SHIFT 0 -#define CS_INSERT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_INSERT_VALUE_SHIFT) +#define CS_INSERT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_INSERT_VALUE_SHIFT) #define CS_INSERT_VALUE_GET(reg_val) (((reg_val)&CS_INSERT_VALUE_MASK) >> CS_INSERT_VALUE_SHIFT) #define CS_INSERT_VALUE_SET(reg_val, value) \ (((reg_val) & ~CS_INSERT_VALUE_MASK) | (((value) << CS_INSERT_VALUE_SHIFT) & CS_INSERT_VALUE_MASK)) /* CS_EXTRACT_INIT register */ #define CS_EXTRACT_INIT_VALUE_SHIFT 0 -#define CS_EXTRACT_INIT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_EXTRACT_INIT_VALUE_SHIFT) +#define CS_EXTRACT_INIT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_EXTRACT_INIT_VALUE_SHIFT) #define CS_EXTRACT_INIT_VALUE_GET(reg_val) (((reg_val)&CS_EXTRACT_INIT_VALUE_MASK) >> CS_EXTRACT_INIT_VALUE_SHIFT) #define CS_EXTRACT_INIT_VALUE_SET(reg_val, value) \ 
(((reg_val) & ~CS_EXTRACT_INIT_VALUE_MASK) | \ @@ -779,7 +847,7 @@ /* CS_EXTRACT register */ #define CS_EXTRACT_VALUE_SHIFT 0 -#define CS_EXTRACT_VALUE_MASK (0xFFFFFFFFFFFFFFFF << CS_EXTRACT_VALUE_SHIFT) +#define CS_EXTRACT_VALUE_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CS_EXTRACT_VALUE_SHIFT) #define CS_EXTRACT_VALUE_GET(reg_val) (((reg_val)&CS_EXTRACT_VALUE_MASK) >> CS_EXTRACT_VALUE_SHIFT) #define CS_EXTRACT_VALUE_SET(reg_val, value) \ (((reg_val) & ~CS_EXTRACT_VALUE_MASK) | (((value) << CS_EXTRACT_VALUE_SHIFT) & CS_EXTRACT_VALUE_MASK)) @@ -827,11 +895,6 @@ #define CSG_REQ_IDLE_GET(reg_val) (((reg_val)&CSG_REQ_IDLE_MASK) >> CSG_REQ_IDLE_SHIFT) #define CSG_REQ_IDLE_SET(reg_val, value) \ (((reg_val) & ~CSG_REQ_IDLE_MASK) | (((value) << CSG_REQ_IDLE_SHIFT) & CSG_REQ_IDLE_MASK)) -#define CSG_REQ_DOORBELL_SHIFT 30 -#define CSG_REQ_DOORBELL_MASK (0x1 << CSG_REQ_DOORBELL_SHIFT) -#define CSG_REQ_DOORBELL_GET(reg_val) (((reg_val)&CSG_REQ_DOORBELL_MASK) >> CSG_REQ_DOORBELL_SHIFT) -#define CSG_REQ_DOORBELL_SET(reg_val, value) \ - (((reg_val) & ~CSG_REQ_DOORBELL_MASK) | (((value) << CSG_REQ_DOORBELL_SHIFT) & CSG_REQ_DOORBELL_MASK)) #define CSG_REQ_PROGRESS_TIMER_EVENT_SHIFT 31 #define CSG_REQ_PROGRESS_TIMER_EVENT_MASK (0x1 << CSG_REQ_PROGRESS_TIMER_EVENT_SHIFT) #define CSG_REQ_PROGRESS_TIMER_EVENT_GET(reg_val) \ @@ -894,45 +957,50 @@ /* CSG_EP_REQ register */ #define CSG_EP_REQ_COMPUTE_EP_SHIFT 0 -#define CSG_EP_REQ_COMPUTE_EP_MASK (0xFF << CSG_EP_REQ_COMPUTE_EP_SHIFT) +#define CSG_EP_REQ_COMPUTE_EP_MASK ((u64)0xFF << CSG_EP_REQ_COMPUTE_EP_SHIFT) #define CSG_EP_REQ_COMPUTE_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_COMPUTE_EP_MASK) >> CSG_EP_REQ_COMPUTE_EP_SHIFT) -#define CSG_EP_REQ_COMPUTE_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_COMPUTE_EP_MASK) | \ - (((value) << CSG_EP_REQ_COMPUTE_EP_SHIFT) & CSG_EP_REQ_COMPUTE_EP_MASK)) +#define CSG_EP_REQ_COMPUTE_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_COMPUTE_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_COMPUTE_EP_SHIFT) & CSG_EP_REQ_COMPUTE_EP_MASK)) #define CSG_EP_REQ_FRAGMENT_EP_SHIFT 8 -#define CSG_EP_REQ_FRAGMENT_EP_MASK (0xFF << CSG_EP_REQ_FRAGMENT_EP_SHIFT) +#define CSG_EP_REQ_FRAGMENT_EP_MASK ((u64)0xFF << CSG_EP_REQ_FRAGMENT_EP_SHIFT) #define CSG_EP_REQ_FRAGMENT_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_FRAGMENT_EP_MASK) >> CSG_EP_REQ_FRAGMENT_EP_SHIFT) -#define CSG_EP_REQ_FRAGMENT_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_FRAGMENT_EP_MASK) | \ - (((value) << CSG_EP_REQ_FRAGMENT_EP_SHIFT) & CSG_EP_REQ_FRAGMENT_EP_MASK)) +#define CSG_EP_REQ_FRAGMENT_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_FRAGMENT_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_FRAGMENT_EP_SHIFT) & CSG_EP_REQ_FRAGMENT_EP_MASK)) #define CSG_EP_REQ_TILER_EP_SHIFT 16 -#define CSG_EP_REQ_TILER_EP_MASK (0xF << CSG_EP_REQ_TILER_EP_SHIFT) +#define CSG_EP_REQ_TILER_EP_MASK ((u64)0xF << CSG_EP_REQ_TILER_EP_SHIFT) #define CSG_EP_REQ_TILER_EP_GET(reg_val) (((reg_val)&CSG_EP_REQ_TILER_EP_MASK) >> CSG_EP_REQ_TILER_EP_SHIFT) -#define CSG_EP_REQ_TILER_EP_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_TILER_EP_MASK) | (((value) << CSG_EP_REQ_TILER_EP_SHIFT) & CSG_EP_REQ_TILER_EP_MASK)) +#define CSG_EP_REQ_TILER_EP_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_TILER_EP_MASK) | \ + ((((u64)value) << CSG_EP_REQ_TILER_EP_SHIFT) & CSG_EP_REQ_TILER_EP_MASK)) #define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT 20 -#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK (0x1 << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) +#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK ((u64)0x1 << 
CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) #define CSG_EP_REQ_EXCLUSIVE_COMPUTE_GET(reg_val) \ (((reg_val)&CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) >> CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) -#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) | \ - (((value) << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) & CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK)) +#define CSG_EP_REQ_EXCLUSIVE_COMPUTE_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK) | \ + ((((u64)value) << CSG_EP_REQ_EXCLUSIVE_COMPUTE_SHIFT) & \ + CSG_EP_REQ_EXCLUSIVE_COMPUTE_MASK)) #define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT 21 -#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK (0x1 << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) +#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK ((u64)0x1 << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) #define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_GET(reg_val) \ (((reg_val)&CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) >> CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) -#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ - (((value) << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) +#define CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ + ((((u64)value) << CSG_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & \ + CSG_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) #define CSG_EP_REQ_PRIORITY_SHIFT 28 -#define CSG_EP_REQ_PRIORITY_MASK (0xF << CSG_EP_REQ_PRIORITY_SHIFT) +#define CSG_EP_REQ_PRIORITY_MASK ((u64)0xF << CSG_EP_REQ_PRIORITY_SHIFT) #define CSG_EP_REQ_PRIORITY_GET(reg_val) (((reg_val)&CSG_EP_REQ_PRIORITY_MASK) >> CSG_EP_REQ_PRIORITY_SHIFT) -#define CSG_EP_REQ_PRIORITY_SET(reg_val, value) \ - (((reg_val) & ~CSG_EP_REQ_PRIORITY_MASK) | (((value) << CSG_EP_REQ_PRIORITY_SHIFT) & CSG_EP_REQ_PRIORITY_MASK)) +#define CSG_EP_REQ_PRIORITY_SET(reg_val, value) \ + (((reg_val) & ~CSG_EP_REQ_PRIORITY_MASK) | \ + ((((u64)value) << CSG_EP_REQ_PRIORITY_SHIFT) & CSG_EP_REQ_PRIORITY_MASK)) + /* CSG_SUSPEND_BUF register */ #define CSG_SUSPEND_BUF_POINTER_SHIFT 0 -#define CSG_SUSPEND_BUF_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CSG_SUSPEND_BUF_POINTER_SHIFT) +#define CSG_SUSPEND_BUF_POINTER_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CSG_SUSPEND_BUF_POINTER_SHIFT) #define CSG_SUSPEND_BUF_POINTER_GET(reg_val) (((reg_val)&CSG_SUSPEND_BUF_POINTER_MASK) >> CSG_SUSPEND_BUF_POINTER_SHIFT) #define CSG_SUSPEND_BUF_POINTER_SET(reg_val, value) \ (((reg_val) & ~CSG_SUSPEND_BUF_POINTER_MASK) | \ @@ -940,13 +1008,29 @@ /* CSG_PROTM_SUSPEND_BUF register */ #define CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT 0 -#define CSG_PROTM_SUSPEND_BUF_POINTER_MASK (0xFFFFFFFFFFFFFFFF << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) +#define CSG_PROTM_SUSPEND_BUF_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFFFFF) << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) #define CSG_PROTM_SUSPEND_BUF_POINTER_GET(reg_val) \ (((reg_val)&CSG_PROTM_SUSPEND_BUF_POINTER_MASK) >> CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) #define CSG_PROTM_SUSPEND_BUF_POINTER_SET(reg_val, value) \ (((reg_val) & ~CSG_PROTM_SUSPEND_BUF_POINTER_MASK) | \ (((value) << CSG_PROTM_SUSPEND_BUF_POINTER_SHIFT) & CSG_PROTM_SUSPEND_BUF_POINTER_MASK)) +/* CSG_DVS_BUF_BUFFER register */ +#define CSG_DVS_BUF_BUFFER_SIZE_SHIFT GPU_U(0) +#define CSG_DVS_BUF_BUFFER_SIZE_MASK (GPU_U(0xFFF) << CSG_DVS_BUF_BUFFER_SIZE_SHIFT) +#define CSG_DVS_BUF_BUFFER_SIZE_GET(reg_val) (((reg_val)&CSG_DVS_BUF_BUFFER_SIZE_MASK) >> CSG_DVS_BUF_BUFFER_SIZE_SHIFT) +#define CSG_DVS_BUF_BUFFER_SIZE_SET(reg_val, value) \ + (((reg_val) & 
~CSG_DVS_BUF_BUFFER_SIZE_MASK) | \ + (((value) << CSG_DVS_BUF_BUFFER_SIZE_SHIFT) & CSG_DVS_BUF_BUFFER_SIZE_MASK)) +#define CSG_DVS_BUF_BUFFER_POINTER_SHIFT GPU_U(12) +#define CSG_DVS_BUF_BUFFER_POINTER_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFF) << CSG_DVS_BUF_BUFFER_POINTER_SHIFT) +#define CSG_DVS_BUF_BUFFER_POINTER_GET(reg_val) \ + (((reg_val)&CSG_DVS_BUF_BUFFER_POINTER_MASK) >> CSG_DVS_BUF_BUFFER_POINTER_SHIFT) +#define CSG_DVS_BUF_BUFFER_POINTER_SET(reg_val, value) \ + (((reg_val) & ~CSG_DVS_BUF_BUFFER_POINTER_MASK) | \ + (((value) << CSG_DVS_BUF_BUFFER_POINTER_SHIFT) & CSG_DVS_BUF_BUFFER_POINTER_MASK)) /* End of CSG_INPUT_BLOCK register set definitions */ @@ -1021,6 +1105,7 @@ (((reg_val) & ~CSG_STATUS_EP_CURRENT_TILER_EP_MASK) | \ (((value) << CSG_STATUS_EP_CURRENT_TILER_EP_SHIFT) & CSG_STATUS_EP_CURRENT_TILER_EP_MASK)) + /* CSG_STATUS_EP_REQ register */ #define CSG_STATUS_EP_REQ_COMPUTE_EP_SHIFT 0 #define CSG_STATUS_EP_REQ_COMPUTE_EP_MASK (0xFF << CSG_STATUS_EP_REQ_COMPUTE_EP_SHIFT) @@ -1058,6 +1143,7 @@ (((reg_val) & ~CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_MASK) | \ (((value) << CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_SHIFT) & CSG_STATUS_EP_REQ_EXCLUSIVE_FRAGMENT_MASK)) + /* End of CSG_OUTPUT_BLOCK register set definitions */ /* STREAM_CONTROL_BLOCK register set definitions */ @@ -1406,9 +1492,23 @@ #define GLB_PWROFF_TIMER_TIMER_SOURCE_GPU_COUNTER 0x1 /* End of GLB_PWROFF_TIMER_TIMER_SOURCE values */ +/* GLB_PWROFF_TIMER_CONFIG register */ +#ifndef GLB_PWROFF_TIMER_CONFIG +#define GLB_PWROFF_TIMER_CONFIG 0x0088 /* () Configuration fields for GLB_PWROFF_TIMER */ +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT 0 +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK (0x1 << GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_GET(reg_val) \ + (((reg_val)&GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK) >> \ + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SET(reg_val, value) \ + (((reg_val) & ~GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK) | \ + (((value) << GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_SHIFT) & \ + GLB_PWROFF_TIMER_CONFIG_NO_MODIFIER_MASK)) +#endif /* End of GLB_PWROFF_TIMER_CONFIG values */ + /* GLB_ALLOC_EN register */ #define GLB_ALLOC_EN_MASK_SHIFT 0 -#define GLB_ALLOC_EN_MASK_MASK (0xFFFFFFFFFFFFFFFF << GLB_ALLOC_EN_MASK_SHIFT) +#define GLB_ALLOC_EN_MASK_MASK (GPU_ULL(0xFFFFFFFFFFFFFFFF) << GLB_ALLOC_EN_MASK_SHIFT) #define GLB_ALLOC_EN_MASK_GET(reg_val) (((reg_val)&GLB_ALLOC_EN_MASK_MASK) >> GLB_ALLOC_EN_MASK_SHIFT) #define GLB_ALLOC_EN_MASK_SET(reg_val, value) \ (((reg_val) & ~GLB_ALLOC_EN_MASK_MASK) | (((value) << GLB_ALLOC_EN_MASK_SHIFT) & GLB_ALLOC_EN_MASK_MASK)) @@ -1471,6 +1571,20 @@ #define GLB_IDLE_TIMER_TIMER_SOURCE_GPU_COUNTER 0x1 /* End of GLB_IDLE_TIMER_TIMER_SOURCE values */ +/* GLB_IDLE_TIMER_CONFIG values */ +#ifndef GLB_IDLE_TIMER_CONFIG +#define GLB_IDLE_TIMER_CONFIG 0x0084 /* () Configuration fields for GLB_IDLE_TIMER */ +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT 0 +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK (0x1 << GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_GET(reg_val) \ + (((reg_val)&GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK) >> \ + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) +#define GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SET(reg_val, value) \ + (((reg_val) & ~GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK) | \ + (((value) << GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_SHIFT) & \ + GLB_IDLE_TIMER_CONFIG_NO_MODIFIER_MASK)) +#endif /* End of GLB_IDLE_TIMER_CONFIG values */ + /* 
GLB_INSTR_FEATURES register */ #define GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_SHIFT (0) #define GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_MASK ((u32)0xF << GLB_INSTR_FEATURES_OFFSET_UPDATE_RATE_SHIFT) @@ -1521,4 +1635,84 @@ (((value) << GLB_REQ_ITER_TRACE_ENABLE_SHIFT) & \ GLB_REQ_ITER_TRACE_ENABLE_MASK)) +/* GLB_PRFCNT_CONFIG register */ +#define GLB_PRFCNT_CONFIG_SIZE_SHIFT (0) +#define GLB_PRFCNT_CONFIG_SIZE_MASK (0xFF << GLB_PRFCNT_CONFIG_SIZE_SHIFT) +#define GLB_PRFCNT_CONFIG_SIZE_GET(reg_val) \ + (((reg_val)&GLB_PRFCNT_CONFIG_SIZE_MASK) >> GLB_PRFCNT_CONFIG_SIZE_SHIFT) +#define GLB_PRFCNT_CONFIG_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_CONFIG_SIZE_MASK) | \ + (((value) << GLB_PRFCNT_CONFIG_SIZE_SHIFT) & GLB_PRFCNT_CONFIG_SIZE_MASK)) +#define GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT GPU_U(8) +#define GLB_PRFCNT_CONFIG_SET_SELECT_MASK (GPU_U(0x3) << GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) +#define GLB_PRFCNT_CONFIG_SET_SELECT_GET(reg_val) \ + (((reg_val)&GLB_PRFCNT_CONFIG_SET_SELECT_MASK) >> GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) +#define GLB_PRFCNT_CONFIG_SET_SELECT_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_CONFIG_SET_SELECT_MASK) | \ + (((value) << GLB_PRFCNT_CONFIG_SET_SELECT_SHIFT) & GLB_PRFCNT_CONFIG_SET_SELECT_MASK)) + +/* GLB_PRFCNT_SIZE register */ +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET_MOD(value) ((value) >> 8) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET_MOD(value) ((value) << 8) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT GPU_U(0) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK (GPU_U(0xFFFF) << GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET(reg_val) \ + (GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET_MOD(((reg_val)&GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK) >> \ + GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT)) +#define GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK) | \ + ((GLB_PRFCNT_SIZE_HARDWARE_SIZE_SET_MOD(value) << GLB_PRFCNT_SIZE_HARDWARE_SIZE_SHIFT) & \ + GLB_PRFCNT_SIZE_HARDWARE_SIZE_MASK)) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET_MOD(value) ((value) >> 8) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET_MOD(value) ((value) << 8) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT GPU_U(16) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK (GPU_U(0xFFFF) << GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET(reg_val) \ + (GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET_MOD(((reg_val)&GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK) >> \ + GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT)) +#define GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET(reg_val, value) \ + (((reg_val) & ~GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK) | \ + ((GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SET_MOD(value) << GLB_PRFCNT_SIZE_FIRMWARE_SIZE_SHIFT) & \ + GLB_PRFCNT_SIZE_FIRMWARE_SIZE_MASK)) + +/* GLB_DEBUG_REQ register */ +#define GLB_DEBUG_REQ_DEBUG_RUN_SHIFT GPU_U(23) +#define GLB_DEBUG_REQ_DEBUG_RUN_MASK (GPU_U(0x1) << GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_REQ_DEBUG_RUN_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_REQ_DEBUG_RUN_MASK) >> GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_REQ_DEBUG_RUN_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_REQ_DEBUG_RUN_MASK) | \ + (((value) << GLB_DEBUG_REQ_DEBUG_RUN_SHIFT) & GLB_DEBUG_REQ_DEBUG_RUN_MASK)) + +#define GLB_DEBUG_REQ_RUN_MODE_SHIFT GPU_U(24) +#define GLB_DEBUG_REQ_RUN_MODE_MASK (GPU_U(0xFF) << GLB_DEBUG_REQ_RUN_MODE_SHIFT) +#define GLB_DEBUG_REQ_RUN_MODE_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_REQ_RUN_MODE_MASK) >> GLB_DEBUG_REQ_RUN_MODE_SHIFT) +#define GLB_DEBUG_REQ_RUN_MODE_SET(reg_val, value) \ + (((reg_val) & 
~GLB_DEBUG_REQ_RUN_MODE_MASK) | \ + (((value) << GLB_DEBUG_REQ_RUN_MODE_SHIFT) & GLB_DEBUG_REQ_RUN_MODE_MASK)) + +/* GLB_DEBUG_ACK register */ +#define GLB_DEBUG_ACK_DEBUG_RUN_SHIFT GPU_U(23) +#define GLB_DEBUG_ACK_DEBUG_RUN_MASK (GPU_U(0x1) << GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_ACK_DEBUG_RUN_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_ACK_DEBUG_RUN_MASK) >> GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) +#define GLB_DEBUG_ACK_DEBUG_RUN_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_ACK_DEBUG_RUN_MASK) | \ + (((value) << GLB_DEBUG_ACK_DEBUG_RUN_SHIFT) & GLB_DEBUG_ACK_DEBUG_RUN_MASK)) + +#define GLB_DEBUG_ACK_RUN_MODE_SHIFT GPU_U(24) +#define GLB_DEBUG_ACK_RUN_MODE_MASK (GPU_U(0xFF) << GLB_DEBUG_ACK_RUN_MODE_SHIFT) +#define GLB_DEBUG_ACK_RUN_MODE_GET(reg_val) \ + (((reg_val)&GLB_DEBUG_ACK_RUN_MODE_MASK) >> GLB_DEBUG_ACK_RUN_MODE_SHIFT) +#define GLB_DEBUG_ACK_RUN_MODE_SET(reg_val, value) \ + (((reg_val) & ~GLB_DEBUG_ACK_RUN_MODE_MASK) | \ + (((value) << GLB_DEBUG_ACK_RUN_MODE_SHIFT) & GLB_DEBUG_ACK_RUN_MODE_MASK)) + + +/* RUN_MODE values */ +#define GLB_DEBUG_RUN_MODE_TYPE_NOP 0x0 +#define GLB_DEBUG_RUN_MODE_TYPE_CORE_DUMP 0x1 +/* End of RUN_MODE values */ + #endif /* _KBASE_CSF_REGISTERS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_reset_gpu.c b/mali_kbase/csf/mali_kbase_csf_reset_gpu.c index 10de93f..8ed65b1 100644 --- a/mali_kbase/csf/mali_kbase_csf_reset_gpu.c +++ b/mali_kbase/csf/mali_kbase_csf_reset_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <device/mali_kbase_device.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -29,7 +29,10 @@ #include <csf/mali_kbase_csf_trace_buffer.h> #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> #include <mali_kbase_reset_gpu.h> -#include <linux/string.h> +#include <csf/mali_kbase_csf_firmware_log.h> +#include "mali_kbase_config_platform.h" + +#include <soc/google/debug-snapshot.h> enum kbasep_soft_reset_status { RESET_SUCCESS = 0, @@ -163,6 +166,11 @@ void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev) WARN_ON(kbase_reset_gpu_is_active(kbdev)); } +bool kbase_reset_gpu_failed(struct kbase_device *kbdev) +{ + return (atomic_read(&kbdev->csf.reset.state) == KBASE_CSF_RESET_GPU_FAILED); +} + /* Mark the reset as now happening, and synchronize with other threads that * might be trying to access the GPU */ @@ -173,6 +181,9 @@ static void kbase_csf_reset_begin_hw_access_sync( unsigned long hwaccess_lock_flags; unsigned long scheduler_spin_lock_flags; + /* Flush any pending coredumps */ + flush_work(&kbdev->csf.coredump_work); + /* Note this is a WARN/atomic_set because it is a software issue for a * race to be occurring here */ @@ -185,7 +196,7 @@ static void kbase_csf_reset_begin_hw_access_sync( */ spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_lock_flags); kbase_csf_scheduler_spin_lock(kbdev, &scheduler_spin_lock_flags); - atomic_set(&kbdev->csf.reset.state, KBASE_RESET_GPU_HAPPENING); + atomic_set(&kbdev->csf.reset.state, KBASE_CSF_RESET_GPU_HAPPENING); kbase_csf_scheduler_spin_unlock(kbdev, 
scheduler_spin_lock_flags); spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_lock_flags); } @@ -215,6 +226,9 @@ static void kbase_csf_reset_end_hw_access(struct kbase_device *kbdev, } else { dev_err(kbdev->dev, "Reset failed to complete"); atomic_set(&kbdev->csf.reset.state, KBASE_CSF_RESET_GPU_FAILED); + + /* pixel: This is unrecoverable, collect a ramdump and reboot. */ + dbg_snapshot_emergency_reboot("mali: reset failed - unrecoverable GPU"); } kbase_csf_scheduler_spin_unlock(kbdev, scheduler_spin_lock_flags); @@ -231,23 +245,27 @@ static void kbase_csf_reset_end_hw_access(struct kbase_device *kbdev, kbase_csf_scheduler_enable_tick_timer(kbdev); } -static void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) +void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) { +#define DOORBELL_CFG_BASE 0x20000 +#define MCUC_DB_VALUE_0 0x80 + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; kbase_io_history_dump(kbdev); - dev_err(kbdev->dev, "Register state:"); + dev_err(kbdev->dev, "MCU state:"); dev_err(kbdev->dev, " GPU_IRQ_RAWSTAT=0x%08x GPU_STATUS=0x%08x MCU_STATUS=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)), kbase_reg_read(kbdev, GPU_CONTROL_REG(MCU_STATUS))); - dev_err(kbdev->dev, " JOB_IRQ_RAWSTAT=0x%08x MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", + dev_err(kbdev->dev, + " JOB_IRQ_RAWSTAT=0x%08x MMU_IRQ_RAWSTAT=0x%08x GPU_FAULTSTATUS=0x%08x", kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_RAWSTAT)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_RAWSTAT)), + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT)), kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS))); dev_err(kbdev->dev, " GPU_IRQ_MASK=0x%08x JOB_IRQ_MASK=0x%08x MMU_IRQ_MASK=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_MASK)), kbase_reg_read(kbdev, JOB_CONTROL_REG(JOB_IRQ_MASK)), - kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK))); + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK))); dev_err(kbdev->dev, " PWR_OVERRIDE0=0x%08x PWR_OVERRIDE1=0x%08x", kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE0)), kbase_reg_read(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1))); @@ -255,68 +273,12 @@ static void kbase_csf_debug_dump_registers(struct kbase_device *kbdev) kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_CONFIG)), kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_MMU_CONFIG)), kbase_reg_read(kbdev, GPU_CONTROL_REG(TILER_CONFIG))); -} - -static void kbase_csf_dump_firmware_trace_buffer(struct kbase_device *kbdev) -{ - u8 *buf, *p, *pnewline, *pend, *pendbuf; - unsigned int read_size, remaining_size; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_dbg(kbdev->dev, "Can't get the trace buffer, firmware trace dump skipped"); - return; - } - - buf = kmalloc(PAGE_SIZE + 1, GFP_KERNEL); - if (buf == NULL) { - dev_err(kbdev->dev, "Short of memory, firmware trace dump skipped"); - return; - } - - buf[PAGE_SIZE] = 0; - - p = buf; - pendbuf = &buf[PAGE_SIZE]; - - dev_err(kbdev->dev, "Firmware trace buffer dump:"); - while ((read_size = kbase_csf_firmware_trace_buffer_read_data(tb, p, - pendbuf - p))) { - pend = p + read_size; - p = buf; - - while (p < pend && (pnewline = memchr(p, '\n', pend - p))) { - /* Null-terminate the string */ - *pnewline = 0; - - dev_err(kbdev->dev, "FW> %s", p); - - p = pnewline + 1; - } - - remaining_size = pend - p; - - if (!remaining_size) { - p = buf; - } else if (remaining_size < PAGE_SIZE) { - /* Copy unfinished 
string to the start of the buffer */ - memmove(buf, p, remaining_size); - p = &buf[remaining_size]; - } else { - /* Print abnormal page-long string without newlines */ - dev_err(kbdev->dev, "FW> %s", buf); - p = buf; - } - } - - if (p != buf) { - /* Null-terminate and print last unfinished string */ - *p = 0; - dev_err(kbdev->dev, "FW> %s", buf); - } - - kfree(buf); + dev_err(kbdev->dev, " MCU DB0: %x", kbase_reg_read(kbdev, DOORBELL_CFG_BASE + MCUC_DB_VALUE_0)); + dev_err(kbdev->dev, " MCU GLB_REQ %x GLB_ACK %x", + kbase_csf_firmware_global_input_read(global_iface, GLB_REQ), + kbase_csf_firmware_global_output(global_iface, GLB_ACK)); +#undef MCUC_DB_VALUE_0 +#undef DOORBELL_CFG_BASE } /** @@ -378,7 +340,6 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic "The flush has completed so reset the active indicator\n"); kbdev->irq_reset_flush = false; - mutex_lock(&kbdev->pm.lock); if (!silent) dev_err(kbdev->dev, "Resetting GPU (allowing up to %d ms)", RESET_TIMEOUT); @@ -389,7 +350,7 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic if (!silent) { kbase_csf_debug_dump_registers(kbdev); if (likely(firmware_inited)) - kbase_csf_dump_firmware_trace_buffer(kbdev); + kbase_csf_firmware_log_dump_buffer(kbdev); } spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -403,10 +364,11 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic */ kbase_hwcnt_backend_csf_on_before_reset(&kbdev->hwcnt_gpu_iface); + rt_mutex_lock(&kbdev->pm.lock); /* Reset the GPU */ err = kbase_pm_init_hw(kbdev, 0); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); if (WARN_ON(err)) return SOFT_RESET_FAILED; @@ -420,17 +382,19 @@ static enum kbasep_soft_reset_status kbase_csf_reset_gpu_once(struct kbase_devic kbase_pm_enable_interrupts(kbdev); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_pm_reset_complete(kbdev); /* Synchronously wait for the reload of firmware to complete */ err = kbase_pm_wait_for_desired_state(kbdev); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); if (err) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); if (!kbase_pm_l2_is_in_desired_state(kbdev)) ret = L2_ON_FAILED; else if (!kbase_pm_mcu_is_in_desired_state(kbdev)) ret = MCU_REINIT_FAILED; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } return ret; @@ -512,6 +476,7 @@ static void kbase_csf_reset_gpu_worker(struct work_struct *data) atomic_read(&kbdev->csf.reset.state); const bool silent = kbase_csf_reset_state_is_silent(initial_reset_state); + struct gpu_uevent evt; /* Ensure any threads (e.g. 
executing the CSF scheduler) have finished * using the HW @@ -549,6 +514,16 @@ static void kbase_csf_reset_gpu_worker(struct work_struct *data) kbase_disjoint_state_down(kbdev); + if (err) { + evt.type = GPU_UEVENT_TYPE_GPU_RESET; + evt.info = GPU_UEVENT_INFO_CSF_RESET_FAILED; + } else { + evt.type = GPU_UEVENT_TYPE_GPU_RESET; + evt.info = GPU_UEVENT_INFO_CSF_RESET_OK; + } + if (!silent) + pixel_gpu_uevent_send(kbdev, &evt); + /* Allow other threads to once again use the GPU */ kbase_csf_reset_end_hw_access(kbdev, err, firmware_inited); } @@ -566,6 +541,9 @@ bool kbase_prepare_to_reset_gpu(struct kbase_device *kbdev, unsigned int flags) /* Some other thread is already resetting the GPU */ return false; + if (flags & RESET_FLAGS_FORCE_PM_HW_RESET) + kbdev->csf.reset.force_pm_hw_reset = true; + return true; } KBASE_EXPORT_TEST_API(kbase_prepare_to_reset_gpu); @@ -633,6 +611,11 @@ bool kbase_reset_gpu_is_active(struct kbase_device *kbdev) return kbase_csf_reset_state_is_active(reset_state); } +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.reset.state) == KBASE_CSF_RESET_GPU_NOT_PENDING; +} + int kbase_reset_gpu_wait(struct kbase_device *kbdev) { const long wait_timeout = @@ -676,7 +659,7 @@ KBASE_EXPORT_TEST_API(kbase_reset_gpu_wait); int kbase_reset_gpu_init(struct kbase_device *kbdev) { - kbdev->csf.reset.workq = alloc_workqueue("Mali reset workqueue", 0, 1); + kbdev->csf.reset.workq = alloc_workqueue("Mali reset workqueue", WQ_HIGHPRI, 1); if (kbdev->csf.reset.workq == NULL) return -ENOMEM; @@ -684,6 +667,7 @@ int kbase_reset_gpu_init(struct kbase_device *kbdev) init_waitqueue_head(&kbdev->csf.reset.wait); init_rwsem(&kbdev->csf.reset.sem); + kbdev->csf.reset.force_pm_hw_reset = false; return 0; } diff --git a/mali_kbase/csf/mali_kbase_csf_scheduler.c b/mali_kbase/csf/mali_kbase_csf_scheduler.c index 237b7be..01d6feb 100644 --- a/mali_kbase/csf/mali_kbase_csf_scheduler.c +++ b/mali_kbase/csf/mali_kbase_csf_scheduler.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,6 +19,8 @@ * */ +#include <linux/kthread.h> + #include <mali_kbase.h> #include "mali_kbase_config_defaults.h" #include <mali_kbase_ctx_sched.h> @@ -28,9 +30,19 @@ #include <tl/mali_kbase_tracepoints.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <linux/export.h> +#include <linux/delay.h> #include <csf/mali_kbase_csf_registers.h> #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #include <mali_kbase_hwaccess_time.h> +#include <trace/events/power.h> +#include "mali_kbase_csf_tiler_heap.h" +#include "mali_kbase_csf_tiler_heap_reclaim.h" +#include "mali_kbase_csf_mcu_shared_reg.h" +#include <linux/version_compat_defs.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ /* Value to indicate that a queue group is not groups_to_schedule list */ #define KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID (U32_MAX) @@ -50,36 +62,18 @@ /* CSF scheduler time slice value */ #define CSF_SCHEDULER_TIME_TICK_MS (100) /* 100 milliseconds */ -/* - * CSF scheduler time threshold for converting "tock" requests into "tick" if - * they come too close to the end of a tick interval. This avoids scheduling - * twice in a row. - */ -#define CSF_SCHEDULER_TIME_TICK_THRESHOLD_MS \ - CSF_SCHEDULER_TIME_TICK_MS - -#define CSF_SCHEDULER_TIME_TICK_THRESHOLD_JIFFIES \ - msecs_to_jiffies(CSF_SCHEDULER_TIME_TICK_THRESHOLD_MS) - -/* Nanoseconds per millisecond */ -#define NS_PER_MS ((u64)1000 * 1000) - -/* - * CSF minimum time to reschedule for a new "tock" request. Bursts of "tock" - * requests are not serviced immediately, but shall wait for a minimum time in - * order to reduce load on the CSF scheduler thread. - */ -#define CSF_SCHEDULER_TIME_TOCK_JIFFIES 1 /* 1 jiffies-time */ - -/* CS suspended and is idle (empty ring buffer) */ -#define CS_IDLE_FLAG (1 << 0) - -/* CS suspended and is wait for a CQS condition */ -#define CS_WAIT_SYNC_FLAG (1 << 1) +/* CSG_REQ:STATUS_UPDATE timeout */ +#define CSG_STATUS_UPDATE_REQ_TIMEOUT_MS (250) /* 250 milliseconds */ /* A GPU address space slot is reserved for MCU. 
*/ #define NUM_RESERVED_AS_SLOTS (1) +/* Time to wait for completion of PING req before considering MCU as hung */ +#define FW_PING_AFTER_ERROR_TIMEOUT_MS (10) + +/* Explicitly defining this blocked_reason code as SB_WAIT for clarity */ +#define CS_STATUS_BLOCKED_ON_SB_WAIT CS_STATUS_BLOCKED_REASON_REASON_WAIT + static int scheduler_group_schedule(struct kbase_queue_group *group); static void remove_group_from_idle_wait(struct kbase_queue_group *const group); static @@ -97,9 +91,441 @@ static int suspend_active_queue_groups(struct kbase_device *kbdev, static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, bool system_suspend); static void schedule_in_cycle(struct kbase_queue_group *group, bool force); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static bool evaluate_sync_update(struct kbase_queue *queue); +#endif +static bool queue_group_scheduled_locked(struct kbase_queue_group *group); #define kctx_as_enabled(kctx) (!kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +void turn_on_sc_power_rails(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + WARN_ON(kbdev->csf.scheduler.state == SCHED_SUSPENDED); + + if (kbdev->csf.scheduler.sc_power_rails_off) { + if (kbdev->pm.backend.callback_power_on_sc_rails) + kbdev->pm.backend.callback_power_on_sc_rails(kbdev); + kbdev->csf.scheduler.sc_power_rails_off = false; + } +} + +/** + * turn_off_sc_power_rails - Turn off the shader core power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called to synchronously turn off the shader core power rails. + */ +static void turn_off_sc_power_rails(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + WARN_ON(kbdev->csf.scheduler.state == SCHED_SUSPENDED); + + if (!kbdev->csf.scheduler.sc_power_rails_off) { + if (kbdev->pm.backend.callback_power_off_sc_rails) + kbdev->pm.backend.callback_power_off_sc_rails(kbdev); + kbdev->csf.scheduler.sc_power_rails_off = true; + } +} + +/** + * gpu_idle_event_is_pending - Check if there is a pending GPU idle event + * + * @kbdev: Pointer to the device. + */ +static bool gpu_idle_event_is_pending(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + lockdep_assert_held(&kbdev->csf.scheduler.interrupt_lock); + + return (kbase_csf_firmware_global_input_read(global_iface, GLB_REQ) ^ + kbase_csf_firmware_global_output(global_iface, GLB_ACK)) & + GLB_REQ_IDLE_EVENT_MASK; +} + +/** + * ack_gpu_idle_event - Acknowledge the GPU idle event + * + * @kbdev: Pointer to the device. + * + * This function is called to acknowledge the GPU idle event. It is expected + * that firmware will re-enable the User submission only when it receives a + * CSI kernel doorbell after the idle event acknowledgement. 
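+ *
+ * The IDLE_EVENT bit follows the usual GLB_REQ/GLB_ACK toggle convention: the
+ * event is pending while the bit differs between the two registers (see
+ * gpu_idle_event_is_pending() above), and it is acknowledged by copying the
+ * GLB_ACK value of the bit back into GLB_REQ, which is what the masked write
+ * in the body below does.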
+ */ +static void ack_gpu_idle_event(struct kbase_device *kbdev) +{ + struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface; + u32 glb_req, glb_ack; + unsigned long flags; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); + glb_req = kbase_csf_firmware_global_input_read(global_iface, GLB_REQ); + glb_ack = kbase_csf_firmware_global_output(global_iface, GLB_ACK); + if ((glb_req ^ glb_ack) & GLB_REQ_IDLE_EVENT_MASK) { + kbase_csf_firmware_global_input_mask( + global_iface, GLB_REQ, glb_ack, + GLB_REQ_IDLE_EVENT_MASK); + } + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); +} + +static void cancel_gpu_idle_work(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + kbdev->csf.scheduler.gpu_idle_work_pending = false; + cancel_delayed_work(&kbdev->csf.scheduler.gpu_idle_work); +} + +static bool queue_empty_or_blocked(struct kbase_queue *queue) +{ + bool empty = false; + bool blocked = false; + + if (CS_STATUS_WAIT_SYNC_WAIT_GET(queue->status_wait)) { + if (!evaluate_sync_update(queue)) + blocked = true; + else + queue->status_wait = 0; + } + + if (!blocked) { + u64 *input_addr = (u64 *)queue->user_io_addr; + u64 *output_addr = (u64 *)(queue->user_io_addr + PAGE_SIZE); + + empty = (input_addr[CS_INSERT_LO / sizeof(u64)] == + output_addr[CS_EXTRACT_LO / sizeof(u64)]); + } + + return (empty || blocked); +} +#endif + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * gpu_metrics_ctx_init() - Take a reference on GPU metrics context if it exists, + * otherwise allocate and initialise one. + * + * @kctx: Pointer to the Kbase context. + * + * The GPU metrics context represents an "Application" for the purposes of GPU metrics + * reporting. There may be multiple kbase_contexts contributing data to a single GPU + * metrics context. + * This function takes a reference on GPU metrics context if it already exists + * corresponding to the Application that is creating the Kbase context, otherwise + * memory is allocated for it and initialised. + * + * Return: 0 on success, or negative on failure. + */ +static inline int gpu_metrics_ctx_init(struct kbase_context *kctx) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + struct kbase_device *kbdev = kctx->kbdev; + int ret = 0; + + const struct cred *cred = get_current_cred(); + const unsigned int aid = cred->euid.val; + + put_cred(cred); + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return 0; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kbdev->kctx_list_lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); + gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + + if (!gpu_metrics_ctx) { + gpu_metrics_ctx = kmalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL); + + if (gpu_metrics_ctx) { + rt_mutex_lock(&kbdev->csf.scheduler.lock); + kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + } else { + dev_err(kbdev->dev, "Allocation for gpu_metrics_ctx failed"); + ret = -ENOMEM; + } + } + + kctx->gpu_metrics_ctx = gpu_metrics_ctx; + mutex_unlock(&kbdev->kctx_list_lock); + + return ret; +} + +/** + * gpu_metrics_ctx_term() - Drop a reference on a GPU metrics context and free it + * if the refcount becomes 0. + * + * @kctx: Pointer to the Kbase context. 
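+ *
+ * This function is the counterpart of gpu_metrics_ctx_init(): a context that
+ * successfully ran the init function on creation is expected to call this one
+ * exactly once on teardown. A rough sketch of the pairing (the caller paths
+ * named here are illustrative, not taken from this change):
+ *
+ *   ret = gpu_metrics_ctx_init(kctx);    <- context creation path
+ *   ...
+ *   gpu_metrics_ctx_term(kctx);          <- context teardown path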
+ */ +static inline void gpu_metrics_ctx_term(struct kbase_context *kctx) +{ + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kctx->kbdev->kctx_list_lock); + rt_mutex_lock(&kctx->kbdev->csf.scheduler.lock); + kbase_gpu_metrics_ctx_put(kctx->kbdev, kctx->gpu_metrics_ctx); + rt_mutex_unlock(&kctx->kbdev->csf.scheduler.lock); + mutex_unlock(&kctx->kbdev->kctx_list_lock); +} + +/** + * struct gpu_metrics_event - A GPU metrics event recorded in trace buffer. + * + * @csg_slot_act: The 32bit data consisting of a GPU metrics event. + * 5 bits[4:0] represents CSG slot number. + * 1 bit [5] represents the transition of the CSG group on the slot. + * '1' means idle->active whilst '0' does active->idle. + * @timestamp: 64bit timestamp consisting of a GPU metrics event. + * + * Note: It's packed and word-aligned as agreed layout with firmware. + */ +struct gpu_metrics_event { + u32 csg_slot_act; + u64 timestamp; +} __packed __aligned(4); +#define GPU_METRICS_EVENT_SIZE sizeof(struct gpu_metrics_event) + +#define GPU_METRICS_ACT_SHIFT 5 +#define GPU_METRICS_ACT_MASK (0x1 << GPU_METRICS_ACT_SHIFT) +#define GPU_METRICS_ACT_GET(val) (((val)&GPU_METRICS_ACT_MASK) >> GPU_METRICS_ACT_SHIFT) + +#define GPU_METRICS_CSG_MASK 0x1f +#define GPU_METRICS_CSG_GET(val) ((val)&GPU_METRICS_CSG_MASK) + +/** + * gpu_metrics_read_event() - Read a GPU metrics trace from trace buffer + * + * @kbdev: Pointer to the device + * @kctx: Kcontext that is derived from CSG slot field of a GPU metrics. + * @prev_act: Previous CSG activity transition in a GPU metrics. + * @cur_act: Current CSG activity transition in a GPU metrics. + * @ts: CSG activity transition timestamp in a GPU metrics. + * + * This function reads firmware trace buffer, named 'gpu_metrics' and + * parse one 12-byte data packet into following information. + * - The number of CSG slot on which CSG was transitioned to active or idle. + * - Activity transition (1: idle->active, 0: active->idle). + * - Timestamp in nanoseconds when the transition occurred. + * + * Return: true on success. + */ +static bool gpu_metrics_read_event(struct kbase_device *kbdev, struct kbase_context **kctx, + bool *prev_act, bool *cur_act, uint64_t *ts) +{ + struct firmware_trace_buffer *tb = kbdev->csf.scheduler.gpu_metrics_tb; + struct gpu_metrics_event e; + + if (kbase_csf_firmware_trace_buffer_read_data(tb, (u8 *)&e, GPU_METRICS_EVENT_SIZE) == + GPU_METRICS_EVENT_SIZE) { + const u8 slot = GPU_METRICS_CSG_GET(e.csg_slot_act); + struct kbase_queue_group *group; + + if (WARN_ON_ONCE(slot >= kbdev->csf.global_iface.group_num)) { + dev_err(kbdev->dev, "invalid CSG slot (%u)", slot); + return false; + } + + group = kbdev->csf.scheduler.csg_slots[slot].resident_group; + + if (unlikely(!group)) { + dev_err(kbdev->dev, "failed to find CSG group from CSG slot (%u)", slot); + return false; + } + + *cur_act = GPU_METRICS_ACT_GET(e.csg_slot_act); + *ts = kbase_backend_time_convert_gpu_to_cpu(kbdev, e.timestamp); + *kctx = group->kctx; + + *prev_act = group->prev_act; + group->prev_act = *cur_act; + + return true; + } + + dev_err(kbdev->dev, "failed to read a GPU metrics from trace buffer"); + + return false; +} + +/** + * emit_gpu_metrics_to_frontend() - Emit GPU metrics events to the frontend. + * + * @kbdev: Pointer to the device + * + * This function must be called to emit GPU metrics data to the + * frontend whenever needed. 
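+ *
+ * Each drained event is decoded with the GPU_METRICS_CSG_GET() and
+ * GPU_METRICS_ACT_GET() helpers defined above. As a worked example (the value
+ * is illustrative, not taken from a real trace):
+ *
+ *   csg_slot_act == 0x23:  slot = 0x23 & 0x1f = 3
+ *                          act  = (0x23 & 0x20) >> 5 = 1  (idle -> active)
+ *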
+ * Calls to this function will be serialized by scheduler lock. + * + * Kbase reports invalid activity traces when detected. + */ +static void emit_gpu_metrics_to_frontend(struct kbase_device *kbdev) +{ + u64 system_time = 0; + u64 ts_before_drain; + u64 ts = 0; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + return; +#endif + + if (WARN_ON_ONCE(kbdev->csf.scheduler.state == SCHED_SUSPENDED)) + return; + + kbase_backend_get_gpu_time_norequest(kbdev, NULL, &system_time, NULL); + ts_before_drain = kbase_backend_time_convert_gpu_to_cpu(kbdev, system_time); + + while (!kbase_csf_firmware_trace_buffer_is_empty(kbdev->csf.scheduler.gpu_metrics_tb)) { + struct kbase_context *kctx; + bool prev_act; + bool cur_act; + + if (gpu_metrics_read_event(kbdev, &kctx, &prev_act, &cur_act, &ts)) { + if (prev_act == cur_act) { + /* Error handling + * + * In case of active CSG, Kbase will try to recover the + * lost event by ending previously active event and + * starting a new one. + * + * In case of inactive CSG, the event is drop as Kbase + * cannot recover. + */ + dev_err(kbdev->dev, + "Invalid activity state transition. (prev_act = %u, cur_act = %u)", + prev_act, cur_act); + if (cur_act) { + kbase_gpu_metrics_ctx_end_activity(kctx, ts); + kbase_gpu_metrics_ctx_start_activity(kctx, ts); + } + } else { + /* Normal handling */ + if (cur_act) + kbase_gpu_metrics_ctx_start_activity(kctx, ts); + else + kbase_gpu_metrics_ctx_end_activity(kctx, ts); + } + } else + break; + } + + kbase_gpu_metrics_emit_tracepoint(kbdev, ts >= ts_before_drain ? ts + 1 : ts_before_drain); +} +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + +/** + * wait_for_dump_complete_on_group_deschedule() - Wait for dump on fault and + * scheduling tick/tock to complete before the group deschedule. + * + * @group: Pointer to the group that is being descheduled. + * + * This function blocks the descheduling of the group until the dump on fault is + * completed and scheduling tick/tock has completed. + * To deschedule an on slot group CSG termination request would be sent and that + * might time out if the fault had occurred and also potentially affect the state + * being dumped. Moreover the scheduler lock would be held, so the access to debugfs + * files would get blocked. + * Scheduler lock and 'kctx->csf.lock' are released before this function starts + * to wait. When a request sent by the Scheduler to the FW times out, Scheduler + * would also wait for the dumping to complete and release the Scheduler lock + * before the wait. Meanwhile Userspace can try to delete the group, this function + * would ensure that the group doesn't exit the Scheduler until scheduling + * tick/tock has completed. Though very unlikely, group deschedule can be triggered + * from multiple threads around the same time and after the wait Userspace thread + * can win the race and get the group descheduled and free the memory for group + * pointer before the other threads wake up and notice that group has already been + * descheduled. To avoid the freeing in such a case, a sort of refcount is used + * for the group which is incremented & decremented across the wait. 
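+ * (The group's deschedule_deferred_cnt, incremented and decremented around
+ * the wait in the body below, is the refcount referred to here.)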
+ */ +static +void wait_for_dump_complete_on_group_deschedule(struct kbase_queue_group *group) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_device *kbdev = group->kctx->kbdev; + struct kbase_context *kctx = group->kctx; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + lockdep_assert_held(&kctx->csf.lock); + lockdep_assert_held(&scheduler->lock); + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return; + + while ((!kbase_debug_csf_fault_dump_complete(kbdev) || + (scheduler->state == SCHED_BUSY)) && + queue_group_scheduled_locked(group)) { + group->deschedule_deferred_cnt++; + rt_mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&kctx->csf.lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&kctx->csf.lock); + rt_mutex_lock(&scheduler->lock); + group->deschedule_deferred_cnt--; + } +#endif +} + +/** + * schedule_actions_trigger_df() - Notify the client about the fault and + * wait for the dumping to complete. + * + * @kbdev: Pointer to the device + * @kctx: Pointer to the context associated with the CSG slot for which + * the timeout was seen. + * @error: Error code indicating the type of timeout that occurred. + * + * This function notifies the Userspace client waiting for the faults and wait + * for the Client to complete the dumping. + * The function is called only from Scheduling tick/tock when a request sent by + * the Scheduler to FW times out or from the protm event work item of the group + * when the protected mode entry request times out. + * In the latter case there is no wait done as scheduler lock would be released + * immediately. In the former case the function waits and releases the scheduler + * lock before the wait. It has been ensured that the Scheduler view of the groups + * won't change meanwhile, so no group can enter/exit the Scheduler, become + * runnable or go off slot. 
+ */ +static void schedule_actions_trigger_df(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ +#if IS_ENABLED(CONFIG_DEBUG_FS) + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + lockdep_assert_held(&scheduler->lock); + + if (!kbase_debug_csf_fault_notify(kbdev, kctx, error)) + return; + + if (unlikely(scheduler->state != SCHED_BUSY)) { + WARN_ON(error != DF_PROTECTED_MODE_ENTRY_FAILURE); + return; + } + + rt_mutex_unlock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); + WARN_ON(scheduler->state != SCHED_BUSY); +#endif +} + #ifdef KBASE_PM_RUNTIME /** * wait_for_scheduler_to_exit_sleep() - Wait for Scheduler to exit the @@ -143,12 +569,12 @@ static int wait_for_scheduler_to_exit_sleep(struct kbase_device *kbdev) remaining = kbase_csf_timeout_in_jiffies(sleep_exit_wait_time); while ((scheduler->state == SCHED_SLEEPING) && !ret) { - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); remaining = wait_event_timeout( kbdev->csf.event_wait, (scheduler->state != SCHED_SLEEPING), remaining); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); if (!remaining && (scheduler->state == SCHED_SLEEPING)) ret = -ETIMEDOUT; } @@ -187,7 +613,8 @@ static int force_scheduler_to_exit_sleep(struct kbase_device *kbdev) goto out; } - if (suspend_active_groups_on_powerdown(kbdev, true)) + ret = suspend_active_groups_on_powerdown(kbdev, true); + if (ret) goto out; kbase_pm_lock(kbdev); @@ -206,6 +633,7 @@ static int force_scheduler_to_exit_sleep(struct kbase_device *kbdev) } scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); return 0; @@ -225,80 +653,20 @@ out: * * @timer: Pointer to the scheduling tick hrtimer * - * This function will enqueue the scheduling tick work item for immediate - * execution, if it has not been queued already. + * This function will wake up kbase_csf_scheduler_kthread() to process a + * pending scheduling tick. It will be restarted manually once a tick has been + * processed if appropriate. * * Return: enum value to indicate that timer should not be restarted. */ static enum hrtimer_restart tick_timer_callback(struct hrtimer *timer) { - struct kbase_device *kbdev = container_of(timer, struct kbase_device, - csf.scheduler.tick_timer); - - kbase_csf_scheduler_advance_tick(kbdev); - return HRTIMER_NORESTART; -} - -/** - * start_tick_timer() - Start the scheduling tick hrtimer. - * - * @kbdev: Pointer to the device - * - * This function will start the scheduling tick hrtimer and is supposed to - * be called only from the tick work item function. The tick hrtimer should - * not be active already. 
- */ -static void start_tick_timer(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - - lockdep_assert_held(&scheduler->lock); - - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - WARN_ON(scheduler->tick_timer_active); - if (likely(!work_pending(&scheduler->tick_work))) { - scheduler->tick_timer_active = true; - - hrtimer_start(&scheduler->tick_timer, - HR_TIMER_DELAY_MSEC(scheduler->csg_scheduling_period_ms), - HRTIMER_MODE_REL); - } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); -} - -/** - * cancel_tick_timer() - Cancel the scheduling tick hrtimer - * - * @kbdev: Pointer to the device - */ -static void cancel_tick_timer(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - scheduler->tick_timer_active = false; - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); - hrtimer_cancel(&scheduler->tick_timer); -} - -/** - * enqueue_tick_work() - Enqueue the scheduling tick work item - * - * @kbdev: Pointer to the device - * - * This function will queue the scheduling tick work item for immediate - * execution. This shall only be called when both the tick hrtimer and tick - * work item are not active/pending. - */ -static void enqueue_tick_work(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - - lockdep_assert_held(&scheduler->lock); + struct kbase_device *kbdev = + container_of(timer, struct kbase_device, csf.scheduler.tick_timer); kbase_csf_scheduler_invoke_tick(kbdev); + + return HRTIMER_NORESTART; } static void release_doorbell(struct kbase_device *kbdev, int doorbell_nr) @@ -398,14 +766,15 @@ static void scheduler_doorbell_init(struct kbase_device *kbdev) bitmap_zero(kbdev->csf.scheduler.doorbell_inuse_bitmap, CSF_NUM_DOORBELL); - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); /* Reserve doorbell 0 for use by kernel driver */ doorbell_nr = acquire_doorbell(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); WARN_ON(doorbell_nr != CSF_KERNEL_DOORBELL_NR); } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** * update_on_slot_queues_offsets - Update active queues' INSERT & EXTRACT ofs * @@ -441,48 +810,90 @@ static void update_on_slot_queues_offsets(struct kbase_device *kbdev) for (j = 0; j < max_streams; ++j) { struct kbase_queue *const queue = group->bound_queues[j]; - if (queue) { + if (queue && queue->user_io_addr) { u64 const *const output_addr = - (u64 const *)(queue->user_io_addr + PAGE_SIZE); + (u64 const *)(queue->user_io_addr + + PAGE_SIZE / sizeof(u64)); + /* + * This 64-bit read will be atomic on a 64-bit kernel but may not + * be atomic on 32-bit kernels. Support for 32-bit kernels is + * limited to build-only. 
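+ *
+ * The PAGE_SIZE / sizeof(u64) offset reflects that user_io_addr is a pointer
+ * to u64 in this version (see e.g. program_cs_extract_init() further down),
+ * so the second, output page of the user I/O region starts PAGE_SIZE bytes,
+ * i.e. PAGE_SIZE / sizeof(u64) elements, past it; with 4 KiB pages that is an
+ * offset of 512 u64 slots.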
+ */ queue->extract_ofs = output_addr[CS_EXTRACT_LO / sizeof(u64)]; } } } } +#endif -static void enqueue_gpu_idle_work(struct kbase_csf_scheduler *const scheduler) +static void enqueue_gpu_idle_work(struct kbase_csf_scheduler *const scheduler, + unsigned long delay) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + lockdep_assert_held(&scheduler->lock); + + scheduler->gpu_idle_work_pending = true; + mod_delayed_work(system_highpri_wq, &scheduler->gpu_idle_work, delay); +#else + CSTD_UNUSED(delay); atomic_set(&scheduler->gpu_no_longer_idle, false); queue_work(scheduler->idle_wq, &scheduler->gpu_idle_work); +#endif } -void kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev) +bool kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; int non_idle_offslot_grps; bool can_suspend_on_idle; + bool ack_gpu_idle_event = true; + lockdep_assert_held(&kbdev->hwaccess_lock); lockdep_assert_held(&scheduler->interrupt_lock); non_idle_offslot_grps = atomic_read(&scheduler->non_idle_offslot_grps); can_suspend_on_idle = kbase_pm_idle_groups_sched_suspendable(kbdev); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_CAN_IDLE, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND, NULL, ((u64)(u32)non_idle_offslot_grps) | (((u64)can_suspend_on_idle) << 32)); if (!non_idle_offslot_grps) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* If FW is managing the cores then we need to turn off the + * the power rails. + */ + if (!kbase_pm_no_mcu_core_pwroff(kbdev)) { + queue_work(system_highpri_wq, + &scheduler->sc_rails_off_work); + ack_gpu_idle_event = false; + } +#else if (can_suspend_on_idle) { + /* fast_gpu_idle_handling is protected by the + * interrupt_lock, which would prevent this from being + * updated whilst gpu_idle_worker() is executing. + */ + scheduler->fast_gpu_idle_handling = + (kbdev->csf.gpu_idle_hysteresis_ns == 0) || + !kbase_csf_scheduler_all_csgs_idle(kbdev); + /* The GPU idle worker relies on update_on_slot_queues_offsets() to have * finished. It's queued before to reduce the time it takes till execution * but it'll eventually be blocked by the scheduler->interrupt_lock. 
*/ - enqueue_gpu_idle_work(scheduler); - update_on_slot_queues_offsets(kbdev); + enqueue_gpu_idle_work(scheduler, 0); + + /* The extract offsets are unused in fast GPU idle handling */ + if (!scheduler->fast_gpu_idle_handling) + update_on_slot_queues_offsets(kbdev); } +#endif } else { - /* Advance the scheduling tick to get the non-idle suspended groups loaded soon */ - kbase_csf_scheduler_advance_tick_nolock(kbdev); + /* Invoke the scheduling tick to get the non-idle suspended groups loaded soon */ + kbase_csf_scheduler_invoke_tick(kbdev); } + + return ack_gpu_idle_event; } u32 kbase_csf_scheduler_get_nr_active_csgs_locked(struct kbase_device *kbdev) @@ -551,6 +962,12 @@ static bool on_slot_group_idle_locked(struct kbase_queue_group *group) return (group->run_state == KBASE_CSF_GROUP_IDLE); } +static bool can_schedule_idle_group(struct kbase_queue_group *group) +{ + return (on_slot_group_idle_locked(group) || + (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)); +} + static bool queue_group_scheduled(struct kbase_queue_group *group) { return (group->run_state != KBASE_CSF_GROUP_INACTIVE && @@ -565,35 +982,52 @@ static bool queue_group_scheduled_locked(struct kbase_queue_group *group) return queue_group_scheduled(group); } +static void update_idle_protm_group_state_to_runnable(struct kbase_queue_group *group) +{ + lockdep_assert_held(&group->kctx->kbdev->csf.scheduler.lock); + + group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_RUNNABLE, group, group->run_state); +} + /** - * scheduler_wait_protm_quit() - Wait for GPU to exit protected mode. + * scheduler_protm_wait_quit() - Wait for GPU to exit protected mode. * * @kbdev: Pointer to the GPU device * * This function waits for the GPU to exit protected mode which is confirmed * when active_protm_grp is set to NULL. + * + * Return: true on success, false otherwise. */ -static void scheduler_wait_protm_quit(struct kbase_device *kbdev) +static bool scheduler_protm_wait_quit(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; long wt = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); long remaining; + bool success = true; lockdep_assert_held(&scheduler->lock); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_WAIT_PROTM_QUIT, NULL, - jiffies_to_msecs(wt)); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_PROTM_WAIT_QUIT_START, NULL, jiffies_to_msecs(wt)); remaining = wait_event_timeout(kbdev->csf.event_wait, !kbase_csf_scheduler_protected_mode_in_use(kbdev), wt); - if (!remaining) + if (unlikely(!remaining)) { + struct kbase_queue_group *group = kbdev->csf.scheduler.active_protm_grp; + struct kbase_context *kctx = group ? group->kctx : NULL; + dev_warn(kbdev->dev, "[%llu] Timeout (%d ms), protm_quit wait skipped", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms); + schedule_actions_trigger_df(kbdev, kctx, DF_PROTECTED_MODE_EXIT_TIMEOUT); + success = false; + } - KBASE_KTRACE_ADD(kbdev, SCHEDULER_WAIT_PROTM_QUIT_DONE, NULL, - jiffies_to_msecs(remaining)); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_PROTM_WAIT_QUIT_END, NULL, jiffies_to_msecs(remaining)); + + return success; } /** @@ -603,31 +1037,39 @@ static void scheduler_wait_protm_quit(struct kbase_device *kbdev) * * This function sends a ping request to the firmware and waits for the GPU * to exit protected mode. + * + * If the GPU does not exit protected mode, it is considered as hang. + * A GPU reset would then be triggered. 
*/ static void scheduler_force_protm_exit(struct kbase_device *kbdev) { + unsigned long flags; + lockdep_assert_held(&kbdev->csf.scheduler.lock); kbase_csf_firmware_ping(kbdev); - scheduler_wait_protm_quit(kbdev); -} -/** - * scheduler_timer_is_enabled_nolock() - Check if the scheduler wakes up - * automatically for periodic tasks. - * - * @kbdev: Pointer to the device - * - * This is a variant of kbase_csf_scheduler_timer_is_enabled() that assumes the - * CSF scheduler lock to already have been held. - * - * Return: true if the scheduler is configured to wake up periodically - */ -static bool scheduler_timer_is_enabled_nolock(struct kbase_device *kbdev) -{ - lockdep_assert_held(&kbdev->csf.scheduler.lock); + if (scheduler_protm_wait_quit(kbdev)) + return; + + dev_err(kbdev->dev, "Possible GPU hang in Protected mode"); + + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); + if (kbdev->csf.scheduler.active_protm_grp) { + dev_err(kbdev->dev, + "Group-%d of context %d_%d ran in protected mode for too long on slot %d", + kbdev->csf.scheduler.active_protm_grp->handle, + kbdev->csf.scheduler.active_protm_grp->kctx->tgid, + kbdev->csf.scheduler.active_protm_grp->kctx->id, + kbdev->csf.scheduler.active_protm_grp->csg_nr); + } + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); - return kbdev->csf.scheduler.timer_enabled; + /* The GPU could be stuck in Protected mode. To prevent a hang, + * a GPU reset is performed. + */ + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu(kbdev); } /** @@ -682,7 +1124,8 @@ static int scheduler_pm_active_handle_suspend(struct kbase_device *kbdev, * Scheduler * * @kbdev: Pointer to the device - * @flags: flags containing previous interrupt state + * @flags: Pointer to the flags variable containing the interrupt state + * when hwaccess lock was acquired. * * This function is called when Scheduler needs to be activated from the * sleeping state. @@ -690,14 +1133,14 @@ static int scheduler_pm_active_handle_suspend(struct kbase_device *kbdev, * MCU is initiated. It resets the flag that indicates to the MCU state * machine that MCU needs to be put in sleep state. * - * Note: This function shall be called with hwaccess lock held and it will - * release that lock. + * Note: This function shall be called with hwaccess lock held and it may + * release that lock and reacquire it. * * Return: zero when the PM reference was taken and non-zero when the * system is being suspending/suspended. 
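 *
 * @flags is passed by pointer because the hwaccess lock may be dropped and
 * re-acquired inside this function; the caller's later
 * spin_unlock_irqrestore() then has to use the IRQ state saved by that
 * re-acquisition rather than its original value.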
*/ static int scheduler_pm_active_after_sleep(struct kbase_device *kbdev, - unsigned long flags) + unsigned long *flags) { u32 prev_count; int ret = 0; @@ -708,20 +1151,20 @@ static int scheduler_pm_active_after_sleep(struct kbase_device *kbdev, prev_count = kbdev->csf.scheduler.pm_active_count; if (!WARN_ON(prev_count == U32_MAX)) kbdev->csf.scheduler.pm_active_count++; - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); /* On 0 => 1, make a pm_ctx_active request */ if (!prev_count) { + spin_unlock_irqrestore(&kbdev->hwaccess_lock, *flags); + ret = kbase_pm_context_active_handle_suspend(kbdev, KBASE_PM_SUSPEND_HANDLER_DONT_REACTIVATE); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + spin_lock_irqsave(&kbdev->hwaccess_lock, *flags); if (ret) kbdev->csf.scheduler.pm_active_count--; else kbdev->pm.backend.gpu_sleep_mode_active = false; kbase_pm_update_state(kbdev); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } return ret; @@ -801,6 +1244,71 @@ static void scheduler_pm_idle_before_sleep(struct kbase_device *kbdev) } #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void enable_gpu_idle_fw_timer(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + if (!scheduler->gpu_idle_fw_timer_enabled) { + kbase_csf_firmware_enable_gpu_idle_timer(kbdev); + scheduler->gpu_idle_fw_timer_enabled = true; + } + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); +} + +static void disable_gpu_idle_fw_timer(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + if (scheduler->gpu_idle_fw_timer_enabled) { + kbase_csf_firmware_disable_gpu_idle_timer(kbdev); + scheduler->gpu_idle_fw_timer_enabled = false; + } + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); +} + +/** + * update_gpu_idle_timer_on_scheduler_wakeup() - Update the GPU idle state + * reporting as per the power policy in use. + * + * @kbdev: Pointer to the device + * + * This function disables the GPU idle state reporting in FW if as per the + * power policy the power management of shader cores needs to be done by the + * Host. This prevents the needless disabling of User submissions in FW on + * reporting the GPU idle event to Host if power rail for shader cores is + * controlled by the Host. + * Scheduler is suspended when switching and out of such power policy, so on + * the wakeup of Scheduler can enable or disable the GPU idle state reporting. 
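+ * In other words, the power policy can only change while the Scheduler is
+ * suspended, so re-evaluating the idle timer state once here, on wakeup, is
+ * sufficient.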
+ */ +static void update_gpu_idle_timer_on_scheduler_wakeup(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + WARN_ON(scheduler->state != SCHED_SUSPENDED); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + if (kbase_pm_no_mcu_core_pwroff(kbdev)) + disable_gpu_idle_fw_timer(kbdev); + else + enable_gpu_idle_fw_timer(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return; +} +#endif + static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; @@ -825,8 +1333,8 @@ static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) "Re-activating the Scheduler out of sleep"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - ret = scheduler_pm_active_after_sleep(kbdev, flags); - /* hwaccess_lock is released in the previous function call. */ + ret = scheduler_pm_active_after_sleep(kbdev, &flags); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); #endif } @@ -839,7 +1347,12 @@ static void scheduler_wakeup(struct kbase_device *kbdev, bool kick) return; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + update_gpu_idle_timer_on_scheduler_wakeup(kbdev); +#endif + scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); if (kick) scheduler_enable_tick_timer_nolock(kbdev); @@ -855,6 +1368,7 @@ static void scheduler_suspend(struct kbase_device *kbdev) dev_dbg(kbdev->dev, "Suspending the Scheduler"); scheduler_pm_idle(kbdev); scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); } } @@ -885,8 +1399,10 @@ static void update_idle_suspended_group_state(struct kbase_queue_group *group) KBASE_CSF_GROUP_SUSPENDED); } else if (group->run_state == KBASE_CSF_GROUP_SUSPENDED_ON_IDLE) { group->run_state = KBASE_CSF_GROUP_SUSPENDED; + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED, group, + group->run_state); - /* If scheduler is not suspended and the given group's + /* If scheduler is not suspended and the given group's * static priority (reflected by the scan_seq_num) is inside * the current tick slot-range, or there are some on_slot * idle groups, schedule an async tock. 
@@ -916,8 +1432,8 @@ static void update_idle_suspended_group_state(struct kbase_queue_group *group) return; new_val = atomic_inc_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, - group, new_val); + KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, + new_val); } int kbase_csf_scheduler_group_get_slot_locked(struct kbase_queue_group *group) @@ -1009,6 +1525,7 @@ static int halt_stream_sync(struct kbase_queue *queue) struct kbase_csf_cmd_stream_info *stream; int csi_index = queue->csi_index; long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + unsigned long flags; if (WARN_ON(!group) || WARN_ON(!kbasep_csf_scheduler_group_is_on_slot_locked(group))) @@ -1026,6 +1543,11 @@ static int halt_stream_sync(struct kbase_queue *queue) == CS_ACK_STATE_START), remaining); if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_QUEUE_START + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to start on csi %d bound to group %d on slot %d", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, csi_index, group->handle, group->csg_nr); @@ -1040,12 +1562,15 @@ static int halt_stream_sync(struct kbase_queue *queue) kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); } + spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags); /* Set state to STOP */ kbase_csf_firmware_cs_input_mask(stream, CS_REQ, CS_REQ_STATE_STOP, CS_REQ_STATE_MASK); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP_REQUESTED, group, queue, 0u); kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags); + + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP_REQ, group, queue, 0u); /* Timed wait */ remaining = wait_event_timeout(kbdev->csf.event_wait, @@ -1053,6 +1578,11 @@ static int halt_stream_sync(struct kbase_queue *queue) == CS_ACK_STATE_STOP), remaining); if (!remaining) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_QUEUE_STOP + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to stop on csi %d bound to group %d on slot %d", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, queue->csi_index, group->handle, group->csg_nr); @@ -1117,8 +1647,7 @@ static int sched_halt_stream(struct kbase_queue *queue) long remaining; int slot; int err = 0; - const u32 group_schedule_timeout = - 20 * kbdev->csf.scheduler.csg_scheduling_period_ms; + const u32 group_schedule_timeout = kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT); if (WARN_ON(!group)) return -EINVAL; @@ -1141,7 +1670,7 @@ retry: /* Update the group state so that it can get scheduled soon */ update_idle_suspended_group_state(group); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); /* This function is called when the queue group is either not on a CSG * slot or is on the slot but undergoing transition. 
@@ -1164,7 +1693,7 @@ retry:
 			kbdev->csf.event_wait,
 			can_halt_stream(kbdev, group),
 			kbase_csf_timeout_in_jiffies(group_schedule_timeout));
-		mutex_lock(&scheduler->lock);
+		rt_mutex_lock(&scheduler->lock);
 
 		if (remaining && queue_group_scheduled_locked(group)) {
 			slot = kbase_csf_scheduler_group_get_slot(group);
@@ -1227,6 +1756,11 @@ retry:
 				kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms));
 
 			if (!remaining) {
+				const struct gpu_uevent evt = {
+					.type = GPU_UEVENT_TYPE_KMD_ERROR,
+					.info = GPU_UEVENT_INFO_QUEUE_STOP_ACK
+				};
+				pixel_gpu_uevent_send(kbdev, &evt);
 				dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue stop ack on csi %d bound to group %d on slot %d",
 					 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
 					 queue->csi_index, group->handle, group->csg_nr);
@@ -1292,7 +1826,7 @@ int kbase_csf_scheduler_queue_stop(struct kbase_queue *queue)
 	kbase_reset_gpu_assert_failed_or_prevented(kbdev);
 	lockdep_assert_held(&queue->kctx->csf.lock);
-	mutex_lock(&kbdev->csf.scheduler.lock);
+	rt_mutex_lock(&kbdev->csf.scheduler.lock);
 
 	queue->enabled = false;
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_STOP, group, queue, cs_enabled);
@@ -1314,9 +1848,11 @@ int kbase_csf_scheduler_queue_stop(struct kbase_queue *queue)
 			err = sched_halt_stream(queue);
 
 		unassign_user_doorbell_from_queue(kbdev, queue);
+		kbase_csf_mcu_shared_drop_stopped_queue(kbdev, queue);
 	}
 
-	mutex_unlock(&kbdev->csf.scheduler.lock);
+	rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_STOP, group, queue, group->run_state);
 	return err;
 }
 
@@ -1324,9 +1860,9 @@ static void update_hw_active(struct kbase_queue *queue, bool active)
 {
 #if IS_ENABLED(CONFIG_MALI_NO_MALI)
 	if (queue && queue->enabled) {
-		u32 *output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE);
+		u64 *output_addr = queue->user_io_addr + PAGE_SIZE / sizeof(u64);
 
-		output_addr[CS_ACTIVE / sizeof(u32)] = active;
+		output_addr[CS_ACTIVE / sizeof(*output_addr)] = active;
 	}
 #else
 	CSTD_UNUSED(queue);
@@ -1336,11 +1872,16 @@ static void update_hw_active(struct kbase_queue *queue, bool active)
 
 static void program_cs_extract_init(struct kbase_queue *queue)
 {
-	u64 *input_addr = (u64 *)queue->user_io_addr;
-	u64 *output_addr = (u64 *)(queue->user_io_addr + PAGE_SIZE);
+	u64 *input_addr = queue->user_io_addr;
+	u64 *output_addr = queue->user_io_addr + PAGE_SIZE / sizeof(u64);
 
-	input_addr[CS_EXTRACT_INIT_LO / sizeof(u64)] =
-		output_addr[CS_EXTRACT_LO / sizeof(u64)];
+	/*
+	 * These 64-bit reads and writes will be atomic on a 64-bit kernel but may
+	 * not be atomic on 32-bit kernels. Support for 32-bit kernels is limited to
+	 * build-only.
+	 */
+	input_addr[CS_EXTRACT_INIT_LO / sizeof(*input_addr)] =
+		output_addr[CS_EXTRACT_LO / sizeof(*output_addr)];
 }
 
 static void program_cs_trace_cfg(struct kbase_csf_cmd_stream_info *stream,
@@ -1394,6 +1935,7 @@ static void program_cs(struct kbase_device *kbdev,
 	struct kbase_csf_cmd_stream_group_info *ginfo;
 	struct kbase_csf_cmd_stream_info *stream;
 	int csi_index = queue->csi_index;
+	unsigned long flags;
 	u64 user_input;
 	u64 user_output;
 
@@ -1411,11 +1953,13 @@ static void program_cs(struct kbase_device *kbdev,
 	    WARN_ON(csi_index >= ginfo->stream_num))
 		return;
 
-	assign_user_doorbell_to_queue(kbdev, queue);
-	if (queue->doorbell_nr == KBASEP_USER_DB_NR_INVALID)
-		return;
+	if (queue->enabled) {
+		assign_user_doorbell_to_queue(kbdev, queue);
+		if (queue->doorbell_nr == KBASEP_USER_DB_NR_INVALID)
+			return;
 
-	WARN_ON(queue->doorbell_nr != queue->group->doorbell_nr);
+		WARN_ON(queue->doorbell_nr != queue->group->doorbell_nr);
+	}
 
 	if (queue->enabled && queue_group_suspended_locked(group))
 		program_cs_extract_init(queue);
@@ -1429,17 +1973,15 @@ static void program_cs(struct kbase_device *kbdev,
 	kbase_csf_firmware_cs_input(stream, CS_SIZE, queue->size);
 
-	user_input = (queue->reg->start_pfn << PAGE_SHIFT);
-	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_LO,
-				    user_input & 0xFFFFFFFF);
-	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_HI,
-				    user_input >> 32);
+	user_input = queue->user_io_gpu_va;
+	WARN_ONCE(!user_input && queue->enabled, "Enabled queue should have a valid gpu_va");
 
-	user_output = ((queue->reg->start_pfn + 1) << PAGE_SHIFT);
-	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_LO,
-				    user_output & 0xFFFFFFFF);
-	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_HI,
-				    user_output >> 32);
+	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_LO, user_input & 0xFFFFFFFF);
+	kbase_csf_firmware_cs_input(stream, CS_USER_INPUT_HI, user_input >> 32);
+
+	user_output = user_input + PAGE_SIZE;
+	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_LO, user_output & 0xFFFFFFFF);
+	kbase_csf_firmware_cs_input(stream, CS_USER_OUTPUT_HI, user_output >> 32);
 
 	kbase_csf_firmware_cs_input(stream, CS_CONFIG,
 		(queue->doorbell_nr << 8) | (queue->priority & 0xF));
@@ -1450,27 +1992,104 @@ static void program_cs(struct kbase_device *kbdev,
 	/* Enable all interrupts for now */
 	kbase_csf_firmware_cs_input(stream, CS_ACK_IRQ_MASK, ~((u32)0));
 
+	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
+
+	/* The fault bit could be misaligned between CS_REQ and CS_ACK if the
+	 * acknowledgment was deferred due to dump on fault and the group was
+	 * removed from the CSG slot before the fault could be acknowledged.
+	 */
+	if (queue->enabled) {
+		u32 const cs_ack =
+			kbase_csf_firmware_cs_output(stream, CS_ACK);
+
+		kbase_csf_firmware_cs_input_mask(stream, CS_REQ, cs_ack,
+						 CS_REQ_FAULT_MASK);
+	}
+
 	/*
 	 * Enable the CSG idle notification once the CS's ringbuffer
 	 * becomes empty or the CS becomes sync_idle, waiting sync update
	 * or protected mode switch.
	 */
 	kbase_csf_firmware_cs_input_mask(stream, CS_REQ,
-			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK,
-			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK);
+			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK |
+			CS_REQ_IDLE_SHARED_SB_DEC_MASK,
+			CS_REQ_IDLE_EMPTY_MASK | CS_REQ_IDLE_SYNC_WAIT_MASK |
+			CS_REQ_IDLE_SHARED_SB_DEC_MASK);
 
 	/* Set state to START/STOP */
 	kbase_csf_firmware_cs_input_mask(stream, CS_REQ, queue->enabled ?
 		CS_REQ_STATE_START : CS_REQ_STATE_STOP, CS_REQ_STATE_MASK);
+	kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr,
+					  ring_csg_doorbell);
+	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_START, group, queue, queue->enabled);
 
-	kbase_csf_ring_cs_kernel_doorbell(kbdev, csi_index, group->csg_nr,
-					  ring_csg_doorbell);
 	update_hw_active(queue, true);
 }
 
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+static void start_stream_sync(struct kbase_queue *queue)
+{
+	struct kbase_queue_group *group = queue->group;
+	struct kbase_device *kbdev = queue->kctx->kbdev;
+	struct kbase_csf_global_iface *global_iface = &kbdev->csf.global_iface;
+	struct kbase_csf_cmd_stream_group_info *ginfo;
+	struct kbase_csf_cmd_stream_info *stream;
+	int csi_index = queue->csi_index;
+	long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms);
+
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	if (WARN_ON(!group) ||
+	    WARN_ON(!kbasep_csf_scheduler_group_is_on_slot_locked(group)))
+		return;
+
+	ginfo = &global_iface->groups[group->csg_nr];
+	stream = &ginfo->streams[csi_index];
+
+	program_cs(kbdev, queue, true);
+
+	/* Timed wait */
+	remaining = wait_event_timeout(kbdev->csf.event_wait,
+		(CS_ACK_STATE_GET(kbase_csf_firmware_cs_output(stream, CS_ACK))
+		 == CS_ACK_STATE_START), remaining);
+
+	if (!remaining) {
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_QUEUE_START
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
+		dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for queue to start on csi %d bound to group %d on slot %d",
+			 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
+			 csi_index, group->handle, group->csg_nr);
+
+		/* TODO GPUCORE-25328: The CSG can't be terminated, the GPU
		 * will be reset as a work-around.
+		 */
+		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
+			kbase_reset_gpu(kbdev);
+	}
+}
+#endif
+
+static int onslot_csg_add_new_queue(struct kbase_queue *queue)
+{
+	struct kbase_device *kbdev = queue->kctx->kbdev;
+	int err;
+
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	err = kbase_csf_mcu_shared_add_queue(kbdev, queue);
+	if (!err)
+		program_cs(kbdev, queue, true);
+
+	return err;
+}
+
 int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 {
 	struct kbase_queue_group *group = queue->group;
@@ -1482,15 +2101,22 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 	kbase_reset_gpu_assert_prevented(kbdev);
 	lockdep_assert_held(&queue->kctx->csf.lock);
 
-	if (WARN_ON(!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND))
+	if (WARN_ON_ONCE(!group || queue->bind_state != KBASE_CSF_QUEUE_BOUND))
 		return -EINVAL;
 
-	mutex_lock(&kbdev->csf.scheduler.lock);
+	rt_mutex_lock(&kbdev->csf.scheduler.lock);
+
+#if IS_ENABLED(CONFIG_DEBUG_FS)
+	if (unlikely(kbdev->csf.scheduler.state == SCHED_BUSY)) {
+		rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+		return -EBUSY;
+	}
+#endif
 
 	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_START, group, queue, group->run_state);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_STATUS_WAIT, queue->group,
-				   queue, queue->status_wait);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group, queue,
+				   queue->status_wait);
 
 	if (group->run_state == KBASE_CSF_GROUP_FAULT_EVICTED) {
 		err = -EIO;
@@ -1504,8 +2130,34 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
 	if (!err) {
 		queue->enabled = true;
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+		/* If the kicked GPU queue can make progress, then only
		 * need to abort the GPU power down.
		 */
+		if (!queue_empty_or_blocked(queue))
+			cancel_gpu_idle_work(kbdev);
+#endif
 		if (kbasep_csf_scheduler_group_is_on_slot_locked(group)) {
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+			/* The shader core power rails need to be turned
			 * on before FW resumes the execution on HW and
			 * that would happen when the CSI kernel doorbell
			 * is rung from the following code.
			 */
+			turn_on_sc_power_rails(kbdev);
+#endif
 			if (cs_enabled) {
+#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS
+				spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock,
+						  flags);
+				kbase_csf_ring_cs_kernel_doorbell(kbdev,
+					queue->csi_index, group->csg_nr,
+					true);
+				spin_unlock_irqrestore(
+					&kbdev->csf.scheduler.interrupt_lock, flags);
+			} else {
+				start_stream_sync(queue);
+#else
 				/* In normal situation, when a queue is
				 * already running, the queue update
				 * would be a doorbell kick on user
@@ -1519,16 +2171,37 @@ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue)
				 * user door-bell on such a case.
				 */
 				kbase_csf_ring_cs_user_doorbell(kbdev, queue);
-			} else
-				program_cs(kbdev, queue, true);
+			} else {
+				err = onslot_csg_add_new_queue(queue);
+				/* For an on slot CSG, the only error in adding a new
+				 * queue to run is that the scheduler could not map
+				 * the required userio pages, likely due to some
+				 * resource constraints. In such a case, and if the
+				 * group has yet to enter its fatal error state, we
+				 * return -EBUSY to the submitter for another kick.
+				 * The queue itself has yet to be programmed and hence
+				 * needs to remain in its previous (disabled) state.
+				 * If the error persists, the group will eventually
+				 * report a fatal error via the group's error reporting
+				 * mechanism, when the MCU shared region map retry
+				 * limit of the group is exceeded. For such a case,
+				 * the expected error value is -EIO.
+				 */
+				if (unlikely(err)) {
+					queue->enabled = cs_enabled;
+					rt_mutex_unlock(&kbdev->csf.scheduler.lock);
+					return (err != -EIO) ? -EBUSY : err;
+				}
+#endif
+			}
 		}
-		queue_delayed_work(system_long_wq,
-				   &kbdev->csf.scheduler.ping_work,
-				   msecs_to_jiffies(FIRMWARE_PING_INTERVAL_MS));
+		queue_delayed_work(system_long_wq, &kbdev->csf.scheduler.ping_work,
+				   msecs_to_jiffies(kbase_get_timeout_ms(
					   kbdev, CSF_FIRMWARE_PING_TIMEOUT)));
 	}
 
-	mutex_unlock(&kbdev->csf.scheduler.lock);
+	rt_mutex_unlock(&kbdev->csf.scheduler.lock);
 
 	if (evicted)
 		kbase_csf_term_descheduled_queue_group(group);
@@ -1559,7 +2232,8 @@ static enum kbase_csf_csg_slot_state update_csg_slot_status(
 		slot_state = CSG_SLOT_RUNNING;
 		atomic_set(&csg_slot->state, slot_state);
 		csg_slot->trigger_jiffies = jiffies;
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STARTED, csg_slot->resident_group, state);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_RUNNING, csg_slot->resident_group,
+					 state);
 		dev_dbg(kbdev->dev, "Group %u running on slot %d\n",
			csg_slot->resident_group->handle, slot);
 	}
@@ -1649,17 +2323,24 @@ static void halt_csg_slot(struct kbase_queue_group *group, bool suspend)
 		dev_dbg(kbdev->dev, "slot %d wait for up-running\n", slot);
 		remaining = wait_event_timeout(kbdev->csf.event_wait,
				csg_slot_running(kbdev, slot), remaining);
-		if (!remaining)
+		if (!remaining) {
+			const struct gpu_uevent evt = {
+				.type = GPU_UEVENT_TYPE_KMD_ERROR,
+				.info = GPU_UEVENT_INFO_CSG_SLOT_READY
+			};
+			pixel_gpu_uevent_send(kbdev, &evt);
 			dev_warn(kbdev->dev,
				 "[%llu] slot %d timeout (%d ms) on up-running\n",
				 kbase_backend_get_cycle_cnt(kbdev),
				 slot, kbdev->csf.fw_timeout_ms);
+		}
 	}
 
 	if (csg_slot_running(kbdev, slot)) {
 		unsigned long flags;
 		struct kbase_csf_cmd_stream_group_info *ginfo =
			&global_iface->groups[slot];
+
 		u32 halt_cmd = suspend ? CSG_REQ_STATE_SUSPEND :
					 CSG_REQ_STATE_TERMINATE;
@@ -1670,15 +2351,15 @@ static void halt_csg_slot(struct kbase_queue_group *group, bool suspend)
 		/* Set state to SUSPEND/TERMINATE */
 		kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, halt_cmd,
						  CSG_REQ_STATE_MASK);
+		kbase_csf_ring_csg_doorbell(kbdev, slot);
 		spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock,
				       flags);
 		atomic_set(&csg_slot[slot].state, CSG_SLOT_DOWN2STOP);
 		csg_slot[slot].trigger_jiffies = jiffies;
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STOP, group, halt_cmd);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STOP_REQ, group, halt_cmd);
 
-		KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG(
-			kbdev, kbdev->gpu_props.props.raw_props.gpu_id, slot);
-		kbase_csf_ring_csg_doorbell(kbdev, slot);
+		KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG(
+			kbdev, kbdev->gpu_props.props.raw_props.gpu_id, slot, suspend);
 	}
 }
 
@@ -1692,6 +2373,31 @@ static void suspend_csg_slot(struct kbase_queue_group *group)
 	halt_csg_slot(group, true);
 }
 
+static bool csf_wait_ge_condition_supported(struct kbase_device *kbdev)
+{
+	const uint32_t glb_major = GLB_VERSION_MAJOR_GET(kbdev->csf.global_iface.version);
+	const uint32_t glb_minor = GLB_VERSION_MINOR_GET(kbdev->csf.global_iface.version);
+
+	switch (glb_major) {
+	case 0:
+		break;
+	case 1:
+		if (glb_minor >= 4)
+			return true;
+		break;
+	case 2:
+		if (glb_minor >= 6)
+			return true;
+		break;
+	case 3:
+		if (glb_minor >= 6)
+			return true;
+		break;
+	default:
+		return true;
+	}
+	return false;
+}
 /**
 * evaluate_sync_update() - Evaluate the sync wait condition the GPU command
 * queue has been blocked on.
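Editor's note on csf_wait_ge_condition_supported() above: the new GE sync-wait condition is only honoured on GLB interface versions 1.4+, 2.6+, 3.6+, or any newer major version. A standalone sketch of that gate and the GT/LE/GE evaluation it feeds is given below; it is not driver code, and the enum/function names in the sketch are hypothetical stand-ins for the CS_STATUS_WAIT_SYNC_WAIT_CONDITION_* handling in evaluate_sync_update().

/* Standalone sketch, compile with: cc -O2 ge_gate.c */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum wait_cond { WAIT_GT, WAIT_LE, WAIT_GE };

/* Mirrors the (major, minor) table in csf_wait_ge_condition_supported(). */
static bool ge_condition_supported(uint32_t major, uint32_t minor)
{
	switch (major) {
	case 0:  return false;
	case 1:  return minor >= 4;
	case 2:
	case 3:  return minor >= 6;
	default: return true;	/* any newer interface supports GE */
	}
}

static bool sync_wait_satisfied(enum wait_cond cond, uint32_t current_val,
				uint32_t wait_val, uint32_t major, uint32_t minor)
{
	switch (cond) {
	case WAIT_GT: return current_val > wait_val;
	case WAIT_LE: return current_val <= wait_val;
	case WAIT_GE: return ge_condition_supported(major, minor) &&
			     current_val >= wait_val;
	}
	return false;
}

int main(void)
{
	/* GE on a 1.3 interface is not honoured; on 1.4 it is. */
	printf("%d %d\n",
	       sync_wait_satisfied(WAIT_GE, 5, 5, 1, 3),
	       sync_wait_satisfied(WAIT_GE, 5, 5, 1, 4));
	return 0;
}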
@@ -1705,23 +2411,38 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 	struct kbase_vmap_struct *mapping;
 	bool updated = false;
 	u32 *sync_ptr;
+	u32 sync_wait_size;
+	u32 sync_wait_align_mask;
 	u32 sync_wait_cond;
 	u32 sync_current_val;
 	struct kbase_device *kbdev;
+	bool sync_wait_align_valid = false;
+	bool sync_wait_cond_valid = false;
 
 	if (WARN_ON(!queue))
 		return false;
 
 	kbdev = queue->kctx->kbdev;
+	lockdep_assert_held(&kbdev->csf.scheduler.lock);
+
+	sync_wait_size = CS_STATUS_WAIT_SYNC_WAIT_SIZE_GET(queue->status_wait);
+	sync_wait_align_mask =
+		(sync_wait_size == 0 ? BASEP_EVENT32_ALIGN_BYTES : BASEP_EVENT64_ALIGN_BYTES) - 1;
+	sync_wait_align_valid = ((uintptr_t)queue->sync_ptr & sync_wait_align_mask) == 0;
+	if (!sync_wait_align_valid) {
+		dev_dbg(queue->kctx->kbdev->dev, "sync memory VA 0x%016llX is misaligned",
+			queue->sync_ptr);
+		goto out;
+	}
 
 	sync_ptr = kbase_phy_alloc_mapping_get(queue->kctx, queue->sync_ptr,
					&mapping);
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE, queue->group,
-				   queue, queue->sync_ptr);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_BLOCKED_REASON,
-				   queue->group, queue, queue->blocked_reason);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVAL_START, queue->group, queue,
+				   queue->sync_ptr);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_BLOCKED_REASON, queue->group, queue,
+				   queue->blocked_reason);
 
 	if (!sync_ptr) {
 		dev_dbg(queue->kctx->kbdev->dev, "sync memory VA 0x%016llX already freed",
@@ -1731,19 +2452,24 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 
 	sync_wait_cond =
		CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GET(queue->status_wait);
+	sync_wait_cond_valid = (sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) ||
+			       (sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE) ||
+			       ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE) &&
+				csf_wait_ge_condition_supported(kbdev));
 
-	WARN_ON((sync_wait_cond != CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) &&
-		(sync_wait_cond != CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE));
+	WARN_ON(!sync_wait_cond_valid);
 
 	sync_current_val = READ_ONCE(*sync_ptr);
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_CURRENT_VAL, queue->group,
-				   queue, sync_current_val);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_CUR_VAL, queue->group, queue,
+				   sync_current_val);
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_TEST_VAL, queue->group,
-				   queue, queue->sync_value);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_TEST_VAL, queue->group, queue,
+				   queue->sync_value);
 
 	if (((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GT) &&
	     (sync_current_val > queue->sync_value)) ||
+	    ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_GE) &&
+	     (sync_current_val >= queue->sync_value) && csf_wait_ge_condition_supported(kbdev)) ||
	    ((sync_wait_cond == CS_STATUS_WAIT_SYNC_WAIT_CONDITION_LE) &&
	     (sync_current_val <= queue->sync_value))) {
		/* The sync wait condition is satisfied so the group to which
@@ -1757,8 +2483,7 @@ static bool evaluate_sync_update(struct kbase_queue *queue)
 	kbase_phy_alloc_mapping_put(queue->kctx, mapping);
 
 out:
-	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVALUATED,
-				   queue->group, queue, updated);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_EVAL_END, queue->group, queue, updated);
 
 	return updated;
 }
@@ -1792,10 +2517,10 @@ bool save_slot_cs(struct kbase_csf_cmd_stream_group_info const *const ginfo,
 	queue->saved_cmd_ptr = cmd_ptr;
 #endif
 
-	KBASE_KTRACE_ADD_CSF_GRP_Q(stream->kbdev, QUEUE_SYNC_STATUS_WAIT,
-				   queue->group, queue, status);
+	KBASE_KTRACE_ADD_CSF_GRP_Q(stream->kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group,
+				   queue, status);
 
-	if (CS_STATUS_WAIT_SYNC_WAIT_GET(status)) {
+	if (CS_STATUS_WAIT_SYNC_WAIT_GET(status) || CS_STATUS_WAIT_SB_MASK_GET(status)) {
 		queue->status_wait = status;
 		queue->sync_ptr = kbase_csf_firmware_cs_output(stream,
			CS_STATUS_WAIT_SYNC_POINTER_LO);
@@ -1811,7 +2536,8 @@ bool save_slot_cs(struct kbase_csf_cmd_stream_group_info const *const ginfo,
			kbase_csf_firmware_cs_output(stream,
						     CS_STATUS_BLOCKED_REASON));
 
-		if (!evaluate_sync_update(queue)) {
+		if ((queue->blocked_reason == CS_STATUS_BLOCKED_ON_SB_WAIT) ||
+		    !evaluate_sync_update(queue)) {
			is_waiting = true;
		} else {
			/* Sync object already got updated & met the condition
@@ -1847,12 +2573,48 @@ static void schedule_in_cycle(struct kbase_queue_group *group, bool force)
	 * of work needs to be enforced in situation such as entering into
	 * protected mode).
	 */
-	if ((likely(scheduler_timer_is_enabled_nolock(kbdev)) || force) &&
-	    !scheduler->tock_pending_request) {
-		scheduler->tock_pending_request = true;
+	if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) || force) {
		dev_dbg(kbdev->dev, "Kicking async for group %d\n",
			group->handle);
-		mod_delayed_work(scheduler->wq, &scheduler->tock_work, 0);
+		kbase_csf_scheduler_invoke_tock(kbdev);
+	}
+}
+
+static void ktrace_log_group_state(struct kbase_queue_group *const group)
+{
+	switch (group->run_state) {
+	case KBASE_CSF_GROUP_INACTIVE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_INACTIVE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_RUNNABLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_RUNNABLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_IDLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_IDLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED_ON_IDLE:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED_ON_IDLE, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_SUSPENDED_ON_WAIT_SYNC,
+					 group, group->run_state);
+		break;
+	case KBASE_CSF_GROUP_FAULT_EVICTED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_FAULT_EVICTED, group,
+					 group->run_state);
+		break;
+	case KBASE_CSF_GROUP_TERMINATED:
+		KBASE_KTRACE_ADD_CSF_GRP(group->kctx->kbdev, CSF_GROUP_TERMINATED, group,
+					 group->run_state);
+		break;
	}
 }
 
@@ -1873,13 +2635,15 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
 
 	group->run_state = run_state;
 
+	ktrace_log_group_state(group);
+
 	if (run_state == KBASE_CSF_GROUP_RUNNABLE)
		group->prepared_seq_num = KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID;
 
 	list_add_tail(&group->link,
			&kctx->csf.sched.runnable_groups[group->priority]);
 	kctx->csf.sched.num_runnable_grps++;
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_INSERT_RUNNABLE, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_INSERT, group,
				 kctx->csf.sched.num_runnable_grps);
 
 	/* Add the kctx if not yet in runnable kctxs */
@@ -1887,18 +2651,17 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
		/* First runnable csg, adds to the runnable_kctxs */
		INIT_LIST_HEAD(&kctx->csf.link);
		list_add_tail(&kctx->csf.link, &scheduler->runnable_kctxs);
-		KBASE_KTRACE_ADD(kbdev, SCHEDULER_INSERT_RUNNABLE, kctx, 0u);
+		KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_INSERT, kctx, 0u);
	}
 
 	scheduler->total_runnable_grps++;
 
-	if (likely(scheduler_timer_is_enabled_nolock(kbdev)) &&
-	    (scheduler->total_runnable_grps == 1 ||
-	     scheduler->state == SCHED_SUSPENDED ||
+	if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) &&
+	    (scheduler->total_runnable_grps == 1 || scheduler->state == SCHED_SUSPENDED ||
	     scheduler->state == SCHED_SLEEPING)) {
		dev_dbg(kbdev->dev, "Kicking scheduler on first runnable group\n");
		/* Fire a scheduling to start the time-slice */
-		enqueue_tick_work(kbdev);
+		kbase_csf_scheduler_invoke_tick(kbdev);
	} else
		schedule_in_cycle(group, false);
 
@@ -1908,6 +2671,17 @@ void insert_group_to_runnable(struct kbase_csf_scheduler *const scheduler,
		scheduler_wakeup(kbdev, false);
 }
 
+static void cancel_tick_work(struct kbase_csf_scheduler *const scheduler)
+{
+	hrtimer_cancel(&scheduler->tick_timer);
+	atomic_set(&scheduler->pending_tick_work, false);
+}
+
+static void cancel_tock_work(struct kbase_csf_scheduler *const scheduler)
+{
+	atomic_set(&scheduler->pending_tock_work, false);
+}
+
 static void
 remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
			   struct kbase_queue_group *group,
@@ -1924,6 +2698,9 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
 	WARN_ON(!queue_group_scheduled_locked(group));
 
 	group->run_state = run_state;
+
+	ktrace_log_group_state(group);
+
 	list_del_init(&group->link);
 
 	spin_lock_irqsave(&scheduler->interrupt_lock, flags);
@@ -1944,7 +2721,7 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
		if (kbase_prepare_to_reset_gpu(kctx->kbdev, RESET_FLAGS_NONE))
			kbase_reset_gpu(kctx->kbdev);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_EXIT_PROTM,
+		KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_PROTM_EXIT,
					 scheduler->active_protm_grp, 0u);
		scheduler->active_protm_grp = NULL;
	}
@@ -1974,13 +2751,12 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
	}
 
 	kctx->csf.sched.num_runnable_grps--;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_REMOVE_RUNNABLE, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_RUNNABLE_REMOVE, group,
				 kctx->csf.sched.num_runnable_grps);
 	new_head_grp = (!list_empty(list)) ?
				list_first_entry(list, struct kbase_queue_group, link) :
				NULL;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_HEAD_RUNNABLE, new_head_grp,
-				 0u);
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_RUNNABLE_HEAD, new_head_grp, 0u);
 
 	if (kctx->csf.sched.num_runnable_grps == 0) {
		struct kbase_context *new_head_kctx;
@@ -1989,23 +2765,21 @@ void remove_group_from_runnable(struct kbase_csf_scheduler *const scheduler,
		list_del_init(&kctx->csf.link);
		if (scheduler->top_ctx == kctx)
			scheduler->top_ctx = NULL;
-		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_REMOVE_RUNNABLE, kctx,
-				 0u);
+		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_RUNNABLE_KCTX_REMOVE, kctx, 0u);
		new_head_kctx = (!list_empty(kctx_list)) ?
					list_first_entry(kctx_list, struct kbase_context, csf.link) :
					NULL;
-		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_HEAD_RUNNABLE,
-				 new_head_kctx, 0u);
+		KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_RUNNABLE_KCTX_HEAD, new_head_kctx, 0u);
	}
 
 	WARN_ON(scheduler->total_runnable_grps == 0);
 	scheduler->total_runnable_grps--;
 	if (!scheduler->total_runnable_grps) {
		dev_dbg(kctx->kbdev->dev, "Scheduler idle has no runnable groups");
-		cancel_tick_timer(kctx->kbdev);
+		cancel_tick_work(scheduler);
		WARN_ON(atomic_read(&scheduler->non_idle_offslot_grps));
		if (scheduler->state != SCHED_SUSPENDED)
-			enqueue_gpu_idle_work(scheduler);
+			enqueue_gpu_idle_work(scheduler, 0);
	}
 	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, SCHEDULER_TOP_GRP, scheduler->top_grp,
			scheduler->num_active_address_spaces |
@@ -2022,9 +2796,11 @@ static void insert_group_to_idle_wait(struct kbase_queue_group *const group)
 
 	list_add_tail(&group->link, &kctx->csf.sched.idle_wait_groups);
 	kctx->csf.sched.num_idle_wait_grps++;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_INSERT_IDLE_WAIT, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_INSERT, group,
				 kctx->csf.sched.num_idle_wait_grps);
 	group->run_state = KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC;
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, CSF_GROUP_SUSPENDED_ON_WAIT_SYNC, group,
+				 group->run_state);
 	dev_dbg(kctx->kbdev->dev,
		"Group-%d suspended on sync_wait, total wait_groups: %u\n",
		group->handle, kctx->csf.sched.num_idle_wait_grps);
@@ -2043,14 +2819,14 @@ static void remove_group_from_idle_wait(struct kbase_queue_group *const group)
 	list_del_init(&group->link);
 	WARN_ON(kctx->csf.sched.num_idle_wait_grps == 0);
 	kctx->csf.sched.num_idle_wait_grps--;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_REMOVE_IDLE_WAIT, group,
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_REMOVE, group,
				 kctx->csf.sched.num_idle_wait_grps);
 	new_head_grp = (!list_empty(list)) ?
				list_first_entry(list, struct kbase_queue_group, link) :
				NULL;
-	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_HEAD_IDLE_WAIT,
-				 new_head_grp, 0u);
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, GROUP_IDLE_WAIT_HEAD, new_head_grp, 0u);
 	group->run_state = KBASE_CSF_GROUP_INACTIVE;
+	KBASE_KTRACE_ADD_CSF_GRP(kctx->kbdev, CSF_GROUP_INACTIVE, group, group->run_state);
 }
 
 static void deschedule_idle_wait_group(struct kbase_csf_scheduler *scheduler,
@@ -2065,7 +2841,7 @@ static void deschedule_idle_wait_group(struct kbase_csf_scheduler *scheduler,
 	insert_group_to_idle_wait(group);
 }
 
-static void update_offslot_non_idle_cnt_for_faulty_grp(struct kbase_queue_group *group)
+static void update_offslot_non_idle_cnt(struct kbase_queue_group *group)
 {
 	struct kbase_device *kbdev = group->kctx->kbdev;
 	struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler;
@@ -2075,8 +2851,7 @@ static void update_offslot_non_idle_cnt_for_faulty_grp(struct kbase_queue_group
 	if (group->prepared_seq_num < scheduler->non_idle_scanout_grps) {
		int new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group, new_val);
	}
 }
 
@@ -2092,8 +2867,7 @@ static void update_offslot_non_idle_cnt_for_onslot_grp(struct kbase_queue_group
 	if (group->prepared_seq_num < scheduler->non_idle_scanout_grps) {
		int new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps);
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group, new_val);
	}
 }
 
@@ -2113,15 +2887,15 @@ static void update_offslot_non_idle_cnt_on_grp_suspend(
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_inc_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC,
						 group, new_val);
		}
	} else {
		if (group->run_state != KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_dec_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC,
						 group, new_val);
		}
	}
 } else {
@@ -2129,8 +2903,8 @@ static void update_offslot_non_idle_cnt_on_grp_suspend(
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED) {
			int new_val = atomic_inc_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group,
						 new_val);
		}
	}
 }
@@ -2148,7 +2922,7 @@ static bool confirm_cmd_buf_empty(struct kbase_queue const *queue)
 	u32 glb_version = iface->version;
 
 	u64 const *input_addr = (u64 const *)queue->user_io_addr;
-	u64 const *output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE);
+	u64 const *output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64));
 
 	if (glb_version >= kbase_csf_interface_version(1, 0, 0)) {
		/* CS_STATUS_SCOREBOARD supported from CSF 1.0 */
@@ -2162,6 +2936,11 @@ static bool confirm_cmd_buf_empty(struct kbase_queue const *queue)
						    CS_STATUS_SCOREBOARDS));
	}
 
+	/*
	 * These 64-bit reads and writes will be atomic on a 64-bit kernel but may
	 * not be atomic on 32-bit kernels. Support for 32-bit kernels is limited to
	 * build-only.
+	 */
 	cs_empty = (input_addr[CS_INSERT_LO / sizeof(u64)] ==
		    output_addr[CS_EXTRACT_LO / sizeof(u64)]);
 	cs_idle = cs_empty && (!sb_status);
@@ -2204,9 +2983,14 @@ static void save_csg_slot(struct kbase_queue_group *group)
			if (!queue || !queue->enabled)
				continue;
 
-			if (save_slot_cs(ginfo, queue))
-				sync_wait = true;
-			else {
+			if (save_slot_cs(ginfo, queue)) {
+				/* sync_wait is only true if the queue is blocked on
				 * a CQS and not a scoreboard.
				 */
+				if (queue->blocked_reason !=
				    CS_STATUS_BLOCKED_ON_SB_WAIT)
+					sync_wait = true;
+			} else {
				/* Need to confirm if ringbuffer of the GPU
				 * queue is empty or not. A race can arise
				 * between the flush of GPU queue and suspend
@@ -2231,14 +3015,19 @@ static void save_csg_slot(struct kbase_queue_group *group)
			else {
				group->run_state =
					KBASE_CSF_GROUP_SUSPENDED_ON_IDLE;
+				KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED_ON_IDLE, group,
							 group->run_state);
				dev_dbg(kbdev->dev, "Group-%d suspended: idle",
					group->handle);
			}
		} else {
			group->run_state = KBASE_CSF_GROUP_SUSPENDED;
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED, group,
						 group->run_state);
		}
 
		update_offslot_non_idle_cnt_on_grp_suspend(group);
+		kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(group);
	}
 }
 
@@ -2255,7 +3044,7 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 	s8 slot;
 	struct kbase_csf_csg_slot *csg_slot;
 	unsigned long flags;
-	u32 i;
+	u32 csg_req, csg_ack, i;
 	bool as_fault = false;
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
@@ -2285,6 +3074,8 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 
 	unassign_user_doorbell_from_group(kbdev, group);
 
+	kbasep_platform_event_work_end(group);
+
 	/* The csg does not need cleanup other than drop its AS */
 	spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags);
 	as_fault = kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT);
@@ -2293,8 +3084,17 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
		as_fault = true;
 	spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags);
 
+#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD)
+	emit_gpu_metrics_to_frontend(kbdev);
+#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */
+
 	/* now marking the slot is vacant */
 	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
+	/* Process pending SYNC_UPDATE, if any */
+	csg_req = kbase_csf_firmware_csg_input_read(ginfo, CSG_REQ);
+	csg_ack = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);
+	kbase_csf_handle_csg_sync_update(kbdev, ginfo, group, csg_req, csg_ack);
+
 	kbdev->csf.scheduler.csg_slots[slot].resident_group = NULL;
 	clear_bit(slot, kbdev->csf.scheduler.csg_slots_idle_mask);
 	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group,
@@ -2315,6 +3115,11 @@ static bool cleanup_csg_slot(struct kbase_queue_group *group)
 	KBASE_TLSTREAM_TL_KBASE_DEVICE_DEPROGRAM_CSG(kbdev,
		kbdev->gpu_props.props.raw_props.gpu_id, slot);
 
+	/* Notify the group is off-slot and the csg_reg might be available for
	 * reuse with other groups in a 'lazy unbinding' style.
+	 */
+	kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
+
 	return as_fault;
 }
 
@@ -2351,16 +3156,17 @@ static void update_csg_slot_priority(struct kbase_queue_group *group, u8 prio)
		return;
 
 	/* Read the csg_ep_cfg back for updating the priority field */
-	ep_cfg = kbase_csf_firmware_csg_input_read(ginfo, CSG_EP_REQ);
+	ep_cfg = kbase_csf_firmware_csg_input_read(ginfo, CSG_EP_REQ_LO);
 	prev_prio = CSG_EP_REQ_PRIORITY_GET(ep_cfg);
 	ep_cfg = CSG_EP_REQ_PRIORITY_SET(ep_cfg, prio);
-	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ, ep_cfg);
+	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ_LO, ep_cfg);
 
 	spin_lock_irqsave(&kbdev->csf.scheduler.interrupt_lock, flags);
 	csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);
 	csg_req ^= CSG_REQ_EP_CFG_MASK;
 	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req,
					  CSG_REQ_EP_CFG_MASK);
+	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	csg_slot->priority = prio;
@@ -2369,9 +3175,8 @@ static void update_csg_slot_priority(struct kbase_queue_group *group, u8 prio)
		group->handle, group->kctx->tgid, group->kctx->id, slot,
		prev_prio, prio);
 
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_PRIO_UPDATE, group, prev_prio);
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_PRIO_UPDATE, group, prev_prio);
 
-	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	set_bit(slot, kbdev->csf.scheduler.csg_slots_prio_update);
 }
 
@@ -2388,18 +3193,17 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	const u64 compute_mask = shader_core_mask & group->compute_mask;
 	const u64 fragment_mask = shader_core_mask & group->fragment_mask;
 	const u64 tiler_mask = tiler_core_mask & group->tiler_mask;
-	const u8 num_cores = kbdev->gpu_props.num_cores;
-	const u8 compute_max = min(num_cores, group->compute_max);
-	const u8 fragment_max = min(num_cores, group->fragment_max);
+	const u8 compute_max = min(kbdev->gpu_props.num_cores, group->compute_max);
+	const u8 fragment_max = min(kbdev->gpu_props.num_cores, group->fragment_max);
 	const u8 tiler_max = min(CSG_TILER_MAX, group->tiler_max);
 	struct kbase_csf_cmd_stream_group_info *ginfo;
-	u32 ep_cfg = 0;
+	u64 ep_cfg = 0;
 	u32 csg_req;
 	u32 state;
 	int i;
 	unsigned long flags;
-	const u64 normal_suspend_buf =
-		group->normal_suspend_buf.reg->start_pfn << PAGE_SHIFT;
+	u64 normal_suspend_buf;
+	u64 protm_suspend_buf;
 	struct kbase_csf_csg_slot *csg_slot =
		&kbdev->csf.scheduler.csg_slots[slot];
 
@@ -2411,6 +3215,19 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 
 	WARN_ON(atomic_read(&csg_slot->state) != CSG_SLOT_READY);
 
+	if (unlikely(kbase_csf_mcu_shared_group_bind_csg_reg(kbdev, group))) {
+		dev_warn(kbdev->dev,
			 "Couldn't bind MCU shared csg_reg for group %d of context %d_%d, slot=%u",
			 group->handle, group->kctx->tgid, kctx->id, slot);
+		kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
+		return;
+	}
+
+	/* The suspend buf has already been mapped through binding to csg_reg */
+	normal_suspend_buf = group->normal_suspend_buf.gpu_va;
+	protm_suspend_buf = group->protected_suspend_buf.gpu_va;
+	WARN_ONCE(!normal_suspend_buf, "Normal suspend buffer not mapped");
+
 	ginfo = &global_iface->groups[slot];
 
 	/* Pick an available address space for this context */
@@ -2423,6 +3240,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	if (kctx->as_nr == KBASEP_AS_NR_INVALID) {
		dev_warn(kbdev->dev, "Could not get a valid AS for group %d of context %d_%d on slot %d\n",
			 group->handle, kctx->tgid, kctx->id, slot);
+		kbase_csf_mcu_shared_set_group_csg_reg_unused(kbdev, group);
		return;
	}
 
@@ -2430,6 +3248,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	set_bit(slot, kbdev->csf.scheduler.csg_inuse_bitmap);
 	kbdev->csf.scheduler.csg_slots[slot].resident_group = group;
 	group->csg_nr = slot;
+
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	assign_user_doorbell_to_group(kbdev, group);
@@ -2452,6 +3271,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
					     fragment_mask & U32_MAX);
 	kbase_csf_firmware_csg_input(ginfo, CSG_ALLOW_FRAGMENT_HI,
				     fragment_mask >> 32);
+
 	kbase_csf_firmware_csg_input(ginfo, CSG_ALLOW_OTHER,
				     tiler_mask & U32_MAX);
 
@@ -2463,7 +3283,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	ep_cfg = CSG_EP_REQ_FRAGMENT_EP_SET(ep_cfg, fragment_max);
 	ep_cfg = CSG_EP_REQ_TILER_EP_SET(ep_cfg, tiler_max);
 	ep_cfg = CSG_EP_REQ_PRIORITY_SET(ep_cfg, prio);
-	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ, ep_cfg);
+	kbase_csf_firmware_csg_input(ginfo, CSG_EP_REQ_LO, ep_cfg & U32_MAX);
 
 	/* Program the address space number assigned to the context */
 	kbase_csf_firmware_csg_input(ginfo, CSG_CONFIG, kctx->as_nr);
@@ -2473,16 +3293,22 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	kbase_csf_firmware_csg_input(ginfo, CSG_SUSPEND_BUF_HI,
				     normal_suspend_buf >> 32);
 
-	if (group->protected_suspend_buf.reg) {
-		const u64 protm_suspend_buf =
-			group->protected_suspend_buf.reg->start_pfn <<
-				PAGE_SHIFT;
-		kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_LO,
-					     protm_suspend_buf & U32_MAX);
-		kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_HI,
-					     protm_suspend_buf >> 32);
-	}
+	/* Note: we program the P-mode suspend buffer pointer here, but actual
	 * entry into P-mode execution is gated on the P-mode phy pages being
	 * allocated and mapped with the bound csg_reg, which has a specific flag
	 * indicating this P-mode runnable condition before a group is granted
	 * its P-mode section entry. Without a P-mode entry, the buffer pointed
	 * to is not going to be accessed at all.
	 */
+	kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_LO, protm_suspend_buf & U32_MAX);
+	kbase_csf_firmware_csg_input(ginfo, CSG_PROTM_SUSPEND_BUF_HI, protm_suspend_buf >> 32);
+
+	if (group->dvs_buf) {
+		kbase_csf_firmware_csg_input(ginfo, CSG_DVS_BUF_LO,
					     group->dvs_buf & U32_MAX);
+		kbase_csf_firmware_csg_input(ginfo, CSG_DVS_BUF_HI,
					     group->dvs_buf >> 32);
+	}
 
 	/* Enable all interrupts for now */
 	kbase_csf_firmware_csg_input(ginfo, CSG_ACK_IRQ_MASK, ~((u32)0));
@@ -2503,6 +3329,7 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 
 	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, state,
			CSG_REQ_STATE_MASK);
+	kbase_csf_ring_csg_doorbell(kbdev, slot);
 	spin_unlock_irqrestore(&kbdev->csf.scheduler.interrupt_lock, flags);
 
 	/* Update status before rings the door-bell, marking ready => run */
@@ -2518,15 +3345,19 @@ static void program_csg_slot(struct kbase_queue_group *group, s8 slot,
 	dev_dbg(kbdev->dev, "Starting group %d of context %d_%d on slot %d with priority %u\n",
		group->handle, kctx->tgid, kctx->id, slot, prio);
 
-	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_START, group,
-				 (((u64)ep_cfg) << 32) |
-				 ((((u32)kctx->as_nr) & 0xF) << 16) |
-				 (state & (CSG_REQ_STATE_MASK >> CS_REQ_STATE_SHIFT)));
+	KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_START_REQ, group,
				 (((u64)ep_cfg) << 32) | ((((u32)kctx->as_nr) & 0xF) << 16) |
					 (state & (CSG_REQ_STATE_MASK >> CS_REQ_STATE_SHIFT)));
 
-	kbase_csf_ring_csg_doorbell(kbdev, slot);
+	kbasep_platform_event_work_begin(group);
+
+	/* Update the heap reclaim manager */
+	kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(group);
 
 	/* Programming a slot consumes a group from scanout */
 	update_offslot_non_idle_cnt_for_onslot_grp(group);
+
+	/* Notify the group's bound csg_reg is now in active use */
+	kbase_csf_mcu_shared_set_group_csg_reg_active(kbdev, group);
 }
 
 static void remove_scheduled_group(struct kbase_device *kbdev,
@@ -2547,7 +3378,7 @@ static void remove_scheduled_group(struct kbase_device *kbdev,
 }
 
 static void sched_evict_group(struct kbase_queue_group *group, bool fault,
-			      bool update_non_idle_offslot_grps_cnt)
+			      bool update_non_idle_offslot_grps_cnt_from_run_state)
 {
 	struct kbase_context *kctx = group->kctx;
 	struct kbase_device *kbdev = kctx->kbdev;
@@ -2558,13 +3389,13 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
 	if (queue_group_scheduled_locked(group)) {
		u32 i;
 
-		if (update_non_idle_offslot_grps_cnt &&
+		if (update_non_idle_offslot_grps_cnt_from_run_state &&
		    (group->run_state == KBASE_CSF_GROUP_SUSPENDED ||
		     group->run_state == KBASE_CSF_GROUP_RUNNABLE)) {
			int new_val = atomic_dec_return(
				&scheduler->non_idle_offslot_grps);
-			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC,
-						 group, new_val);
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC, group,
						 new_val);
		}
 
		for (i = 0; i < MAX_SUPPORTED_STREAMS_PER_GROUP; i++) {
@@ -2573,8 +3404,11 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
		}
 
		if (group->prepared_seq_num !=
-				KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID)
+				KBASEP_GROUP_PREPARED_SEQ_NUM_INVALID) {
+			if (!update_non_idle_offslot_grps_cnt_from_run_state)
+				update_offslot_non_idle_cnt(group);
			remove_scheduled_group(kbdev, group);
+		}
 
		if (group->run_state == KBASE_CSF_GROUP_SUSPENDED_ON_WAIT_SYNC)
			remove_group_from_idle_wait(group);
@@ -2585,17 +3419,25 @@ static void sched_evict_group(struct kbase_queue_group *group, bool fault,
 
		WARN_ON(group->run_state != KBASE_CSF_GROUP_INACTIVE);
 
-		if (fault)
+		if (fault) {
			group->run_state = KBASE_CSF_GROUP_FAULT_EVICTED;
+			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_FAULT_EVICTED, group,
						 scheduler->total_runnable_grps);
+		}
 
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_EVICT_SCHED, group,
-				(((u64)scheduler->total_runnable_grps) << 32) |
-				((u32)group->run_state));
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_EVICT, group,
					 (((u64)scheduler->total_runnable_grps) << 32) |
						 ((u32)group->run_state));
		dev_dbg(kbdev->dev, "group %d exited scheduler, num_runnable_grps %d\n",
			group->handle, scheduler->total_runnable_grps);
		/* Notify a group has been evicted */
		wake_up_all(&kbdev->csf.event_wait);
	}
+
+	kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(group);
+
+	/* Clear all the bound shared regions and unmap any in-place MMU maps */
+	kbase_csf_mcu_shared_clear_evicted_group_csg_reg(kbdev, group);
 }
 
 static int term_group_sync(struct kbase_queue_group *group)
@@ -2607,13 +3449,23 @@ static int term_group_sync(struct kbase_queue_group *group)
 	term_csg_slot(group);
 
 	remaining = wait_event_timeout(kbdev->csf.event_wait,
-		csg_slot_stopped_locked(kbdev, group->csg_nr), remaining);
-
-	if (!remaining) {
+		group->cs_unrecoverable || csg_slot_stopped_locked(kbdev, group->csg_nr),
+		remaining);
+
+	if (unlikely(!remaining)) {
+		enum dumpfault_error_type error_type = DF_CSG_TERMINATE_TIMEOUT;
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_GROUP_TERM
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
		dev_warn(kbdev->dev, "[%llu] term request timeout (%d ms) for group %d of context %d_%d on slot %d",
			 kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms,
			 group->handle, group->kctx->tgid,
			 group->kctx->id, group->csg_nr);
+		if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+			error_type = DF_PING_REQUEST_TIMEOUT;
+		kbase_debug_csf_fault_notify(kbdev, group->kctx, error_type);
		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
			kbase_reset_gpu(kbdev);
 
@@ -2628,13 +3480,15 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 {
 	struct kbase_device *kbdev = group->kctx->kbdev;
 	struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler;
+	bool wait_for_termination = true;
 	bool on_slot;
 
 	kbase_reset_gpu_assert_failed_or_prevented(kbdev);
 	lockdep_assert_held(&group->kctx->csf.lock);
-	mutex_lock(&scheduler->lock);
+	rt_mutex_lock(&scheduler->lock);
 
 	KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_DESCHEDULE, group, group->run_state);
+	wait_for_dump_complete_on_group_deschedule(group);
 	if (!queue_group_scheduled_locked(group))
		goto unlock;
 
@@ -2642,39 +3496,28 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 
 #ifdef KBASE_PM_RUNTIME
 	/* If the queue group is on slot and Scheduler is in SLEEPING state,
-	 * then we need to wait here for Scheduler to exit the sleep state
-	 * (i.e. wait for the runtime suspend or power down of GPU). This would
-	 * be better than aborting the power down. The group will be suspended
-	 * anyways on power down, so won't have to send the CSG termination
-	 * request to FW.
+	 * then we need to wake up the Scheduler to exit the sleep state rather
	 * than waiting for the runtime suspend or power down of GPU.
	 * The group termination is usually triggered in the context of Application
	 * thread and it has been seen that certain Apps can destroy groups at
	 * random points and not necessarily when the App is exiting.
	 */
 	if (on_slot && (scheduler->state == SCHED_SLEEPING)) {
-		if (wait_for_scheduler_to_exit_sleep(kbdev)) {
+		scheduler_wakeup(kbdev, true);
+
+		/* Wait for MCU firmware to start running */
+		if (kbase_csf_scheduler_wait_mcu_active(kbdev)) {
			dev_warn(
				kbdev->dev,
-				"Wait for scheduler to exit sleep state timedout when terminating group %d of context %d_%d on slot %d",
+				"[%llu] Wait for MCU active failed when terminating group %d of context %d_%d on slot %d",
+				kbase_backend_get_cycle_cnt(kbdev),
				group->handle, group->kctx->tgid,
				group->kctx->id, group->csg_nr);
-
-			scheduler_wakeup(kbdev, true);
-
-			/* Wait for MCU firmware to start running */
-			if (kbase_csf_scheduler_wait_mcu_active(kbdev))
-				dev_warn(
-					kbdev->dev,
-					"[%llu] Wait for MCU active failed when terminating group %d of context %d_%d on slot %d",
-					kbase_backend_get_cycle_cnt(kbdev),
-					group->handle, group->kctx->tgid,
-					group->kctx->id, group->csg_nr);
+			/* No point in waiting for CSG termination if MCU didn't
			 * become active.
			 */
+			wait_for_termination = false;
		}
-
-		/* Check the group state again as scheduler lock would have been
-		 * released when waiting for the exit from SLEEPING state.
-		 */
-		if (!queue_group_scheduled_locked(group))
-			goto unlock;
-
-		on_slot = kbasep_csf_scheduler_group_is_on_slot_locked(group);
	}
 #endif
 	if (!on_slot) {
@@ -2682,7 +3525,11 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
	} else {
		bool as_faulty;
 
-		term_group_sync(group);
+		if (likely(wait_for_termination))
+			term_group_sync(group);
+		else
+			term_csg_slot(group);
+
		/* Treat the CSG as terminated */
		as_faulty = cleanup_csg_slot(group);
		/* remove from the scheduler list */
@@ -2692,7 +3539,7 @@ void kbase_csf_scheduler_group_deschedule(struct kbase_queue_group *group)
 	WARN_ON(queue_group_scheduled_locked(group));
 
 unlock:
-	mutex_unlock(&scheduler->lock);
+	rt_mutex_unlock(&scheduler->lock);
 }
 
 /**
@@ -2731,6 +3578,8 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
							 group));
 
		group->run_state = KBASE_CSF_GROUP_RUNNABLE;
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group,
					 group->run_state);
 
		/* A normal mode CSG could be idle onslot during
		 * protected mode. In this case clear the
@@ -2741,6 +3590,8 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
		if (protm_grp && protm_grp != group) {
			clear_bit((unsigned int)group->csg_nr,
				  scheduler->csg_slots_idle_mask);
+			/* Request the update to confirm the condition inferred. */
+			group->reevaluate_idle_status = true;
			KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group,
						 scheduler->csg_slots_idle_mask[0]);
		}
@@ -2767,8 +3618,7 @@ static int scheduler_group_schedule(struct kbase_queue_group *group)
		/* A new group into the scheduler */
		new_val = atomic_inc_return(
			&kbdev->csf.scheduler.non_idle_offslot_grps);
-		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC,
-					 group, new_val);
+		KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, new_val);
	}
 
 	/* Since a group has become active now, check if GPU needs to be
@@ -2971,8 +3821,7 @@ static void program_group_on_vacant_csg_slot(struct kbase_device *kbdev,
				scheduler->remaining_tick_slots--;
			}
		} else {
-			update_offslot_non_idle_cnt_for_faulty_grp(
-				group);
+			update_offslot_non_idle_cnt(group);
			remove_scheduled_group(kbdev, group);
		}
	}
@@ -3064,7 +3913,6 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
 	DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS);
 	DECLARE_BITMAP(evicted_mask, MAX_SUPPORTED_CSGS) = {0};
 	bool suspend_wait_failed = false;
-	long remaining = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms);
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
 
@@ -3076,6 +3924,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
 
 	while (!bitmap_empty(slot_mask, MAX_SUPPORTED_CSGS)) {
		DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS);
+		long remaining = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT));
 
		bitmap_copy(changed, slot_mask, MAX_SUPPORTED_CSGS);
 
@@ -3084,7 +3933,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					 csg_slot_stopped_raw),
			remaining);
 
-		if (remaining) {
+		if (likely(remaining)) {
			u32 i;
 
			for_each_set_bit(i, changed, num_groups) {
@@ -3103,6 +3952,12 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					 * group is not terminated during
					 * the sleep.
					 */
+
+					/* Only emit suspend, if there was no AS fault */
+					if (kctx_as_enabled(group->kctx) && !group->faulted)
+						KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG(
							kbdev,
							kbdev->gpu_props.props.raw_props.gpu_id, i);
					save_csg_slot(group);
					as_fault = cleanup_csg_slot(group);
					/* If AS fault detected, evict it */
@@ -3115,6 +3970,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
				program_vacant_csg_slot(kbdev, (s8)i);
			}
		} else {
+			struct gpu_uevent evt;
			u32 i;
 
			/* Groups that have failed to suspend in time shall
@@ -3124,6 +3980,7 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
			for_each_set_bit(i, slot_mask, num_groups) {
				struct kbase_queue_group *const group =
					scheduler->csg_slots[i].resident_group;
+				enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT;
 
				struct base_gpu_queue_group_error const
					err_payload = { .error_type =
@@ -3137,14 +3994,13 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
				if (unlikely(group == NULL))
					continue;
 
-				kbase_csf_add_group_fatal_error(group,
-								&err_payload);
-				kbase_event_wakeup_nosync(group->kctx);
-
				/* TODO GPUCORE-25328: The CSG can't be
				 * terminated, the GPU will be reset as a
				 * work-around.
				 */
+				evt.type = GPU_UEVENT_TYPE_KMD_ERROR;
+				evt.info = GPU_UEVENT_INFO_CSG_GROUP_SUSPEND;
+				pixel_gpu_uevent_send(kbdev, &evt);
				dev_warn(
					kbdev->dev,
					"[%llu] Group %d of context %d_%d on slot %u failed to suspend (timeout %d ms)",
@@ -3152,14 +4008,19 @@ static void program_suspending_csg_slots(struct kbase_device *kbdev)
					group->handle, group->kctx->tgid,
					group->kctx->id, i,
					kbdev->csf.fw_timeout_ms);
+				if (kbase_csf_firmware_ping_wait(kbdev,
								 FW_PING_AFTER_ERROR_TIMEOUT_MS))
+					error_type = DF_PING_REQUEST_TIMEOUT;
+				schedule_actions_trigger_df(kbdev, group->kctx, error_type);
+
+				kbase_csf_add_group_fatal_error(group, &err_payload);
+				kbase_event_wakeup_nosync(group->kctx);
 
				/* The group has failed suspension, stop
				 * further examination.
				 */
				clear_bit(i, slot_mask);
				set_bit(i, scheduler->csgs_events_enable_mask);
-				update_offslot_non_idle_cnt_for_onslot_grp(
-					group);
			}
 
			suspend_wait_failed = true;
@@ -3239,7 +4100,7 @@ static void wait_csg_slots_start(struct kbase_device *kbdev)
			slots_state_changed(kbdev, changed, csg_slot_running),
			remaining);
 
-		if (remaining) {
+		if (likely(remaining)) {
			for_each_set_bit(i, changed, num_groups) {
				struct kbase_queue_group *group =
					scheduler->csg_slots[i].resident_group;
@@ -3247,12 +4108,27 @@ static void wait_csg_slots_start(struct kbase_device *kbdev)
				/* The on slot csg is now running */
				clear_bit(i, slot_mask);
				group->run_state = KBASE_CSF_GROUP_RUNNABLE;
+				KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group,
							 group->run_state);
			}
		} else {
+			const struct gpu_uevent evt = {
+				.type = GPU_UEVENT_TYPE_KMD_ERROR,
+				.info = GPU_UEVENT_INFO_CSG_SLOTS_START
+			};
+			const int csg_nr = ffs(slot_mask[0]) - 1;
+			struct kbase_queue_group *group =
+				scheduler->csg_slots[csg_nr].resident_group;
+			enum dumpfault_error_type error_type = DF_CSG_START_TIMEOUT;
+
+			pixel_gpu_uevent_send(kbdev, &evt);
			dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for CSG slots to start, slots: 0x%*pb\n",
				 kbase_backend_get_cycle_cnt(kbdev),
				 kbdev->csf.fw_timeout_ms,
				 num_groups, slot_mask);
+			if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+				error_type = DF_PING_REQUEST_TIMEOUT;
+			schedule_actions_trigger_df(kbdev, group->kctx, error_type);
 
			if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE))
				kbase_reset_gpu(kbdev);
@@ -3369,11 +4245,10 @@ static int wait_csg_slots_handshake_ack(struct kbase_device *kbdev,
				slot_mask, dones),
			remaining);
 
-		if (remaining)
+		if (likely(remaining))
			bitmap_andnot(slot_mask, slot_mask, dones, num_groups);
		else {
-			/* Timed-out on the wait */
			return -ETIMEDOUT;
		}
 
@@ -3392,20 +4267,47 @@ static void wait_csg_slots_finish_prio_update(struct kbase_device *kbdev)
 
 	lockdep_assert_held(&kbdev->csf.scheduler.lock);
 
-	if (ret != 0) {
+	if (unlikely(ret != 0)) {
+		const int csg_nr = ffs(slot_mask[0]) - 1;
+		struct kbase_queue_group *group =
			kbdev->csf.scheduler.csg_slots[csg_nr].resident_group;
+		enum dumpfault_error_type error_type = DF_CSG_EP_CFG_TIMEOUT;
		/* The update timeout is not regarded as a serious
		 * issue, no major consequences are expected as a
		 * result, so just warn the case.
		 */
+		const struct gpu_uevent evt = {
+			.type = GPU_UEVENT_TYPE_KMD_ERROR,
+			.info = GPU_UEVENT_INFO_CSG_EP_CFG
+		};
+		pixel_gpu_uevent_send(kbdev, &evt);
		dev_warn(
			kbdev->dev,
			"[%llu] Timeout (%d ms) on CSG_REQ:EP_CFG, skipping the update wait: slot mask=0x%lx",
			kbase_backend_get_cycle_cnt(kbdev),
			kbdev->csf.fw_timeout_ms,
			slot_mask[0]);
+		if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS))
+			error_type = DF_PING_REQUEST_TIMEOUT;
+		schedule_actions_trigger_df(kbdev, group->kctx, error_type);
+
+		/* Timeout could indicate firmware is unresponsive so trigger a GPU reset. */
+		if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR))
+			kbase_reset_gpu(kbdev);
	}
 }
 
+static void report_csg_termination(struct kbase_queue_group *const group)
+{
+	struct base_gpu_queue_group_error
		err = { .error_type = BASE_GPU_QUEUE_GROUP_ERROR_FATAL,
			.payload = { .fatal_group = {
					     .status = GPU_EXCEPTION_TYPE_SW_FAULT_2,
				     } } };
+
+	kbase_csf_add_group_fatal_error(group, &err);
+}
+
 void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
		struct kbase_context *kctx, struct list_head *evicted_groups)
 {
@@ -3416,23 +4318,28 @@ void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
 	DECLARE_BITMAP(slot_mask, MAX_SUPPORTED_CSGS) = {0};
 
 	lockdep_assert_held(&kctx->csf.lock);
-	mutex_lock(&scheduler->lock);
+	rt_mutex_lock(&scheduler->lock);
 
 	/* This code is only called during reset, so we don't wait for the CSG
	 * slots to be stopped
	 */
 	WARN_ON(!kbase_reset_gpu_is_active(kbdev));
 
-	KBASE_KTRACE_ADD(kbdev, EVICT_CTX_SLOTS, kctx, 0u);
+	KBASE_KTRACE_ADD(kbdev, SCHEDULER_EVICT_CTX_SLOTS_START, kctx, 0u);
 	for (slot = 0; slot < num_groups; slot++) {
		group = kbdev->csf.scheduler.csg_slots[slot].resident_group;
		if (group && group->kctx == kctx) {
			bool as_fault;
 
+			dev_dbg(kbdev->dev, "Evicting group [%d] running on slot [%d] due to reset",
				group->handle, group->csg_nr);
+
			term_csg_slot(group);
			as_fault = cleanup_csg_slot(group);
			/* remove the group from the scheduler list */
			sched_evict_group(group, as_fault, false);
+			/* signal Userspace that CSG is being terminated */
+			report_csg_termination(group);
			/* return the evicted group to the caller */
			list_add_tail(&group->link, evicted_groups);
			set_bit(slot, slot_mask);
@@ -3442,7 +4349,17 @@ void kbase_csf_scheduler_evict_ctx_slots(struct kbase_device *kbdev,
 	dev_info(kbdev->dev, "Evicting context %d_%d slots: 0x%*pb\n",
			kctx->tgid, kctx->id, num_groups, slot_mask);
 
-	mutex_unlock(&scheduler->lock);
+	/* Fatal errors may have been the cause of the GPU reset
	 * taking place, in which case we want to make sure that
	 * we wake up the fatal event queue to notify userspace
	 * only once. Otherwise, we may have duplicate event
	 * notifications between the time the first notification
	 * occurs and the time the GPU is reset.
+ */ + kbase_event_wakeup_nosync(kctx); + + rt_mutex_unlock(&scheduler->lock); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_EVICT_CTX_SLOTS_END, kctx, num_groups); } /** @@ -3486,8 +4403,8 @@ static bool scheduler_slot_protm_ack(struct kbase_device *const kbdev, struct kbase_queue *queue = group->bound_queues[i]; clear_bit(i, group->protm_pending_bitmap); - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, PROTM_PENDING_CLEAR, group, - queue, group->protm_pending_bitmap[0]); + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, CSI_PROTM_PEND_CLEAR, group, queue, + group->protm_pending_bitmap[0]); if (!WARN_ON(!queue) && queue->enabled) { struct kbase_csf_cmd_stream_info *stream = @@ -3523,6 +4440,39 @@ static bool scheduler_slot_protm_ack(struct kbase_device *const kbdev, } /** + * protm_enter_set_next_pending_seq - Update the scheduler's field of + * tick_protm_pending_seq to that from the next available on-slot protm + * pending CSG. + * + * @kbdev: Pointer to the GPU device. + * + * If applicable, the function updates the scheduler's tick_protm_pending_seq + * field from the next available on-slot protm pending CSG. If not, the field + * is set to KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID. + */ +static void protm_enter_set_next_pending_seq(struct kbase_device *const kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u32 num_groups = kbdev->csf.global_iface.group_num; + u32 num_csis = kbdev->csf.global_iface.groups[0].stream_num; + u32 i; + + kbase_csf_scheduler_spin_lock_assert_held(kbdev); + + /* Reset the tick's pending protm seq number to invalid initially */ + scheduler->tick_protm_pending_seq = KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID; + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { + struct kbase_queue_group *group = scheduler->csg_slots[i].resident_group; + + /* Set to the next pending protm group's scan_seq_number */ + if ((group != scheduler->active_protm_grp) && + (!bitmap_empty(group->protm_pending_bitmap, num_csis)) && + (group->scan_seq_num < scheduler->tick_protm_pending_seq)) + scheduler->tick_protm_pending_seq = group->scan_seq_num; + } +} + +/** * scheduler_group_check_protm_enter - Request the given group to be evaluated * for triggering the protected mode. * @@ -3540,11 +4490,22 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, struct kbase_queue_group *const input_grp) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct kbase_protected_suspend_buffer *sbuf = &input_grp->protected_suspend_buf; unsigned long flags; bool protm_in_use; lockdep_assert_held(&scheduler->lock); + /* Return early if the physical pages have not been allocated yet */ + if (unlikely(!sbuf->pma)) + return; + + /* This lock is taken to prevent the issuing of MMU command during the + * transition to protected mode. This helps avoid the scenario where the + * entry to protected mode happens with a memory region being locked and + * the same region is then accessed by the GPU in protected mode. 
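The new pmode_sync_sem is taken for writing here; the matching read side is the MMU command path, which is not part of this hunk. The pairing below is a sketch of the intended usage inferred from the comment above — the reader-side call sites are an assumption, not shown in this diff:

	/* MMU command issuers (assumed): may run concurrently with each
	 * other, but never across the protected-mode transition.
	 */
	down_read(&kbdev->csf.pmode_sync_sem);
	/* ... issue MMU command ... */
	up_read(&kbdev->csf.pmode_sync_sem);

	/* Protected-mode entry (this hunk): exclusive, so no MMU command
	 * can be in flight against a locked region when the GPU switches.
	 */
	down_write(&kbdev->csf.pmode_sync_sem);
	/* ... PROTM_ENTER sequence ... */
	up_write(&kbdev->csf.pmode_sync_sem);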
+ */ + down_write(&kbdev->csf.pmode_sync_sem); spin_lock_irqsave(&scheduler->interrupt_lock, flags); /* Check if the previous transition to enter & exit the protected @@ -3552,8 +4513,7 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, */ protm_in_use = kbase_csf_scheduler_protected_mode_in_use(kbdev) || kbdev->protected_mode; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_CHECK_PROTM_ENTER, input_grp, - protm_in_use); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_ENTER_CHECK, input_grp, protm_in_use); /* Firmware samples the PROTM_PEND ACK bit for CSs when * Host sends PROTM_ENTER global request. So if PROTM_PEND ACK bit @@ -3584,6 +4544,8 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, CSG_SLOT_RUNNING) { if (kctx_as_enabled(input_grp->kctx) && scheduler_slot_protm_ack(kbdev, input_grp, slot)) { + int err; + /* Option of acknowledging to multiple * CSGs from the same kctx is dropped, * after consulting with the @@ -3593,22 +4555,75 @@ static void scheduler_group_check_protm_enter(struct kbase_device *const kbdev, /* Switch to protected mode */ scheduler->active_protm_grp = input_grp; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_ENTER_PROTM, - input_grp, 0u); - /* Reset the tick's pending protm seq number */ - scheduler->tick_protm_pending_seq = - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_ENTER, input_grp, + 0u); + +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + /* Coresight must be disabled before entering protected mode. */ + kbase_debug_coresight_csf_disable_pmode_enter(kbdev); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ kbase_csf_enter_protected_mode(kbdev); + /* Set the pending protm seq number to the next one */ + protm_enter_set_next_pending_seq(kbdev); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); - kbase_csf_wait_protected_mode_enter(kbdev); + err = kbase_csf_wait_protected_mode_enter(kbdev); + up_write(&kbdev->csf.pmode_sync_sem); + + if (err) + schedule_actions_trigger_df(kbdev, input_grp->kctx, + DF_PROTECTED_MODE_ENTRY_FAILURE); + + scheduler->protm_enter_time = ktime_get_raw(); + return; } } } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + up_write(&kbdev->csf.pmode_sync_sem); +} + +/** + * scheduler_check_pmode_progress - Check if protected mode execution is progressing + * + * @kbdev: Pointer to the GPU device. + * + * This function is called when the GPU is in protected mode. + * + * It will check if the time spent in protected mode is less + * than CSF_SCHED_PROTM_PROGRESS_TIMEOUT. If not, a PROTM_EXIT + * request is sent to the FW. 
+ */ +static void scheduler_check_pmode_progress(struct kbase_device *kbdev) +{ + u64 protm_spent_time_ms; + u64 protm_progress_timeout = + kbase_get_timeout_ms(kbdev, CSF_SCHED_PROTM_PROGRESS_TIMEOUT); + s64 diff_ms_signed = + ktime_ms_delta(ktime_get_raw(), kbdev->csf.scheduler.protm_enter_time); + + if (diff_ms_signed < 0) + return; + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + protm_spent_time_ms = (u64)diff_ms_signed; + if (protm_spent_time_ms < protm_progress_timeout) + return; + + dev_dbg(kbdev->dev, "Protected mode progress timeout: %llu >= %llu", + protm_spent_time_ms, protm_progress_timeout); + + /* Prompt the FW to exit protected mode */ + scheduler_force_protm_exit(kbdev); } static void scheduler_apply(struct kbase_device *kbdev) @@ -3616,8 +4631,6 @@ static void scheduler_apply(struct kbase_device *kbdev) struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; const u32 total_csg_slots = kbdev->csf.global_iface.group_num; const u32 available_csg_slots = scheduler->num_csg_slots_for_tick; - u32 suspend_cnt = 0; - u32 remain_cnt = 0; u32 resident_cnt = 0; struct kbase_queue_group *group; u32 i; @@ -3630,11 +4643,8 @@ static void scheduler_apply(struct kbase_device *kbdev) group = scheduler->csg_slots[i].resident_group; if (group) { resident_cnt++; - if (group->prepared_seq_num >= available_csg_slots) { + if (group->prepared_seq_num >= available_csg_slots) suspend_queue_group(group); - suspend_cnt++; - } else - remain_cnt++; } } @@ -3664,8 +4674,7 @@ static void scheduler_apply(struct kbase_device *kbdev) if (!kctx_as_enabled(group->kctx) || group->faulted) { /* Drop the head group and continue */ - update_offslot_non_idle_cnt_for_faulty_grp( - group); + update_offslot_non_idle_cnt(group); remove_scheduled_group(kbdev, group); continue; } @@ -3688,8 +4697,9 @@ static void scheduler_apply(struct kbase_device *kbdev) program_suspending_csg_slots(kbdev); } -static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, - struct kbase_context *kctx, int priority) +static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, struct kbase_context *kctx, + int priority, struct list_head *privileged_groups, + struct list_head *active_groups) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; struct kbase_queue_group *group; @@ -3703,8 +4713,9 @@ static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, if (!kctx_as_enabled(kctx)) return; - list_for_each_entry(group, &kctx->csf.sched.runnable_groups[priority], - link) { + list_for_each_entry(group, &kctx->csf.sched.runnable_groups[priority], link) { + bool protm_req; + if (WARN_ON(!list_empty(&group->link_to_schedule))) /* This would be a bug */ list_del_init(&group->link_to_schedule); @@ -3715,33 +4726,30 @@ static void scheduler_ctx_scan_groups(struct kbase_device *kbdev, /* Set the scanout sequence number, starting from 0 */ group->scan_seq_num = scheduler->csg_scan_count_for_tick++; + protm_req = !bitmap_empty(group->protm_pending_bitmap, + kbdev->csf.global_iface.groups[0].stream_num); + if (scheduler->tick_protm_pending_seq == - KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) { - if (!bitmap_empty(group->protm_pending_bitmap, - kbdev->csf.global_iface.groups[0].stream_num)) - scheduler->tick_protm_pending_seq = - group->scan_seq_num; + KBASEP_TICK_PROTM_PEND_SCAN_SEQ_NR_INVALID) { + if (protm_req) + scheduler->tick_protm_pending_seq = group->scan_seq_num; } - if (queue_group_idle_locked(group)) { - if (on_slot_group_idle_locked(group)) + if (protm_req && on_slot_group_idle_locked(group)) + 
update_idle_protm_group_state_to_runnable(group); + else if (queue_group_idle_locked(group)) { + if (can_schedule_idle_group(group)) list_add_tail(&group->link_to_schedule, &scheduler->idle_groups_to_schedule); continue; } - if (!scheduler->ngrp_to_schedule) { - /* keep the top csg's origin */ - scheduler->top_ctx = kctx; - scheduler->top_grp = group; + if (protm_req && (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)) { + list_add_tail(&group->link_to_schedule, privileged_groups); + continue; } - list_add_tail(&group->link_to_schedule, - &scheduler->groups_to_schedule); - group->prepared_seq_num = scheduler->ngrp_to_schedule++; - - kctx->csf.sched.ngrp_to_schedule++; - count_active_address_space(kbdev, kctx); + list_add_tail(&group->link_to_schedule, active_groups); } } @@ -3810,10 +4818,9 @@ static void scheduler_rotate_groups(struct kbase_device *kbdev) new_head_grp = (!list_empty(list)) ? list_first_entry(list, struct kbase_queue_group, link) : NULL; - KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_ROTATE_RUNNABLE, - top_grp, top_ctx->csf.sched.num_runnable_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_HEAD_RUNNABLE, - new_head_grp, 0u); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_ROTATE, top_grp, + top_ctx->csf.sched.num_runnable_grps); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_RUNNABLE_HEAD, new_head_grp, 0u); dev_dbg(kbdev->dev, "groups rotated for a context, num_runnable_groups: %u\n", scheduler->top_ctx->csf.sched.num_runnable_grps); @@ -3844,13 +4851,12 @@ static void scheduler_rotate_ctxs(struct kbase_device *kbdev) struct kbase_context *new_head_kctx; list_move_tail(&pos->csf.link, list); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_ROTATE_RUNNABLE, pos, - 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_ROTATE, pos, 0u); new_head_kctx = (!list_empty(list)) ? list_first_entry(list, struct kbase_context, csf.link) : NULL; - KBASE_KTRACE_ADD(kbdev, SCHEDULER_HEAD_RUNNABLE, - new_head_kctx, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RUNNABLE_KCTX_HEAD, new_head_kctx, + 0u); dev_dbg(kbdev->dev, "contexts rotated\n"); } } @@ -3865,12 +4871,17 @@ static void scheduler_rotate_ctxs(struct kbase_device *kbdev) * @kbdev: Pointer to the GPU device. * @csg_bitmap: Bitmap of the CSG slots for which * the status update request completed successfully. - * @failed_csg_bitmap: Bitmap of the CSG slots for which + * @failed_csg_bitmap: Bitmap of the idle CSG slots for which * the status update request timedout. * * This function sends a CSG status update request for all the CSG slots - * present in the bitmap scheduler->csg_slots_idle_mask and wait for the - * request to complete. + * present in the bitmap scheduler->csg_slots_idle_mask. Additionally, if + * the group's 'reevaluate_idle_status' field is set, the nominally non-idle + * slots are also included in the status update for a confirmation of their + * status. The function wait for the status update request to complete and + * returns the update completed slots bitmap and any timed out idle-flagged + * slots bitmap. + * * The bits set in the scheduler->csg_slots_idle_mask bitmap are cleared by * this function. 
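The status update relies on the CSG request/acknowledge toggle handshake used elsewhere in the interface: the host flips the STATUS_UPDATE bit of CSG_REQ so it differs from CSG_ACK, rings the doorbell, and the request completes once firmware toggles the same bit in CSG_ACK. A condensed single-slot sketch using the accessors visible in this diff (slot is a hypothetical slot index; locking and timeout handling omitted):

	struct kbase_csf_cmd_stream_group_info *ginfo =
		&kbdev->csf.global_iface.groups[slot];
	u32 csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK);

	/* Toggle the STATUS_UPDATE bit so that REQ != ACK, which firmware
	 * treats as a new request for this slot.
	 */
	csg_req ^= CSG_REQ_STATUS_UPDATE_MASK;
	kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req,
					  CSG_REQ_STATUS_UPDATE_MASK);
	kbase_csf_ring_csg_slots_doorbell(kbdev, BIT(slot));

	/* Completion: firmware toggles the bit in CSG_ACK so REQ == ACK
	 * again, which is the condition wait_csg_slots_handshake_ack polls.
	 */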
*/ @@ -3882,60 +4893,119 @@ static void scheduler_update_idle_slots_status(struct kbase_device *kbdev, struct kbase_csf_global_iface *const global_iface = &kbdev->csf.global_iface; unsigned long flags, i; + u32 active_chk = 0; lockdep_assert_held(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); - for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; struct kbase_queue_group *group = csg_slot->resident_group; struct kbase_csf_cmd_stream_group_info *const ginfo = &global_iface->groups[i]; u32 csg_req; + bool idle_flag; - clear_bit(i, scheduler->csg_slots_idle_mask); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, - scheduler->csg_slots_idle_mask[0]); - if (WARN_ON(!group)) + if (WARN_ON(!group)) { + clear_bit(i, scheduler->csg_inuse_bitmap); + clear_bit(i, scheduler->csg_slots_idle_mask); continue; + } - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_STATUS_UPDATE, group, - i); + idle_flag = test_bit(i, scheduler->csg_slots_idle_mask); + if (idle_flag || group->reevaluate_idle_status) { + if (idle_flag) { +#ifdef CONFIG_MALI_DEBUG + if (!bitmap_empty(group->protm_pending_bitmap, + ginfo->stream_num)) { + dev_warn(kbdev->dev, + "Idle bit set for group %d of ctx %d_%d on slot %d with pending protm execution", + group->handle, group->kctx->tgid, + group->kctx->id, (int)i); + } +#endif + clear_bit(i, scheduler->csg_slots_idle_mask); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_CLEAR, group, + scheduler->csg_slots_idle_mask[0]); + } else { + /* Updates include slots for which reevaluation is needed. + * Here one tracks the extra included slots in active_chk. + * For protm pending slots, their status of activeness are + * assured so no need to request an update. + */ + active_chk |= BIT(i); + group->reevaluate_idle_status = false; + } - csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); - csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; - kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, - CSG_REQ_STATUS_UPDATE_MASK); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_UPDATE_IDLE_SLOT_REQ, group, i); + csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); + csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, + CSG_REQ_STATUS_UPDATE_MASK); - set_bit(i, csg_bitmap); + /* Track the slot update requests in csg_bitmap. + * Note, if the scheduler requested extended update, the resulting + * csg_bitmap would be the idle_flags + active_chk. Otherwise it's + * identical to the idle_flags. 
+ */ + set_bit(i, csg_bitmap); + } else { + group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); + } } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); /* The groups are aggregated into a single kernel doorbell request */ if (!bitmap_empty(csg_bitmap, num_groups)) { long wt = - kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + kbase_csf_timeout_in_jiffies(CSG_STATUS_UPDATE_REQ_TIMEOUT_MS); u32 db_slots = (u32)csg_bitmap[0]; kbase_csf_ring_csg_slots_doorbell(kbdev, db_slots); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); if (wait_csg_slots_handshake_ack(kbdev, CSG_REQ_STATUS_UPDATE_MASK, csg_bitmap, wt)) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_REQ_STATUS_UPDATE + }; + const int csg_nr = ffs(csg_bitmap[0]) - 1; + struct kbase_queue_group *group = + scheduler->csg_slots[csg_nr].resident_group; + pixel_gpu_uevent_send(kbdev, &evt); + dev_warn( kbdev->dev, "[%llu] Timeout (%d ms) on CSG_REQ:STATUS_UPDATE, treat groups as not idle: slot mask=0x%lx", kbase_backend_get_cycle_cnt(kbdev), - kbdev->csf.fw_timeout_ms, + CSG_STATUS_UPDATE_REQ_TIMEOUT_MS, csg_bitmap[0]); + schedule_actions_trigger_df(kbdev, group->kctx, + DF_CSG_STATUS_UPDATE_TIMEOUT); /* Store the bitmap of timed out slots */ bitmap_copy(failed_csg_bitmap, csg_bitmap, num_groups); csg_bitmap[0] = ~csg_bitmap[0] & db_slots; + + /* Mask off any failed bit position contributed from active ones, as the + * intention is to retain the failed bit pattern contains only those from + * idle flags reporting back to the caller. This way, any failed to update + * original idle flag would be kept as 'idle' (an informed guess, as the + * update did not come to a conclusive result). So will be the failed + * active ones be treated as still 'non-idle'. This is for a graceful + * handling to the unexpected timeout condition. 
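A small worked example of the masking described above, with purely hypothetical slot numbers: suppose slots 1 and 4 were included because their idle flags were set, slot 6 was included only for reevaluation, and none of them acknowledged before the timeout:

	/* Illustrative values only. */
	csg_bitmap[0] = BIT(1) | BIT(4) | BIT(6);   /* all timed out          */
	active_chk    = BIT(6);                     /* reevaluation-only slot */

	/* failed_csg_bitmap keeps only the idle-flagged slots. */
	failed_csg_bitmap[0] = csg_bitmap[0] & ~active_chk;  /* BIT(1) | BIT(4) */

	/* Slots 1 and 4 retain their 'idle' guess; slot 6 is simply treated
	 * as still non-idle, since no conclusive status arrived.
	 */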
+ */ + failed_csg_bitmap[0] &= ~active_chk; + } else { - KBASE_KTRACE_ADD(kbdev, SLOTS_STATUS_UPDATE_ACK, NULL, - db_slots); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_UPDATE_IDLE_SLOTS_ACK, NULL, db_slots); csg_bitmap[0] = db_slots; } + } else { + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } } @@ -3990,34 +5060,35 @@ static void scheduler_handle_idle_slots(struct kbase_device *kbdev) if (group_on_slot_is_idle(kbdev, i)) { group->run_state = KBASE_CSF_GROUP_IDLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_IDLE, group, group->run_state); set_bit(i, scheduler->csg_slots_idle_mask); KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, group, scheduler->csg_slots_idle_mask[0]); - } else + } else { group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); + } } bitmap_or(scheduler->csg_slots_idle_mask, scheduler->csg_slots_idle_mask, failed_csg_bitmap, num_groups); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, NULL, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_HANDLE_IDLE_SLOTS, NULL, scheduler->csg_slots_idle_mask[0]); spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } -static void scheduler_scan_idle_groups(struct kbase_device *kbdev) +static void scheduler_scan_group_list(struct kbase_device *kbdev, struct list_head *groups) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; struct kbase_queue_group *group, *n; - list_for_each_entry_safe(group, n, &scheduler->idle_groups_to_schedule, - link_to_schedule) { - - WARN_ON(!on_slot_group_idle_locked(group)); - + list_for_each_entry_safe(group, n, groups, link_to_schedule) { if (!scheduler->ngrp_to_schedule) { /* keep the top csg's origin */ scheduler->top_ctx = group->kctx; + /* keep the top csg''s origin */ scheduler->top_grp = group; } @@ -4087,14 +5158,27 @@ static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, int ret = suspend_active_queue_groups(kbdev, slot_mask); - if (ret) { + if (unlikely(ret)) { + const int csg_nr = ffs(slot_mask[0]) - 1; + struct kbase_queue_group *group = + scheduler->csg_slots[csg_nr].resident_group; + enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT; + /* The suspend of CSGs failed, * trigger the GPU reset to be in a deterministic state. 
*/ + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_SLOTS_SUSPEND + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout (%d ms) waiting for CSG slots to suspend on power down, slot_mask: 0x%*pb\n", kbase_backend_get_cycle_cnt(kbdev), kbdev->csf.fw_timeout_ms, kbdev->csf.global_iface.group_num, slot_mask); + if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS)) + error_type = DF_PING_REQUEST_TIMEOUT; + schedule_actions_trigger_df(kbdev, group->kctx, error_type); if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu(kbdev); @@ -4111,6 +5195,7 @@ static int suspend_active_groups_on_powerdown(struct kbase_device *kbdev, return 0; } +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** * all_on_slot_groups_remained_idle - Live check for all groups' idleness * @@ -4147,10 +5232,15 @@ static bool all_on_slot_groups_remained_idle(struct kbase_device *kbdev) u64 const *output_addr; u64 cur_extract_ofs; - if (!queue) + if (!queue || !queue->user_io_addr) continue; - output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE); + output_addr = (u64 const *)(queue->user_io_addr + PAGE_SIZE / sizeof(u64)); + /* + * These 64-bit reads and writes will be atomic on a 64-bit kernel + * but may not be atomic on 32-bit kernels. Support for 32-bit + * kernels is limited to build-only. + */ cur_extract_ofs = output_addr[CS_EXTRACT_LO / sizeof(u64)]; if (cur_extract_ofs != queue->extract_ofs) { /* More work has been executed since the idle @@ -4163,6 +5253,7 @@ static bool all_on_slot_groups_remained_idle(struct kbase_device *kbdev) return true; } +#endif static bool scheduler_idle_suspendable(struct kbase_device *kbdev) { @@ -4178,6 +5269,21 @@ static bool scheduler_idle_suspendable(struct kbase_device *kbdev) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); spin_lock(&scheduler->interrupt_lock); + + if (scheduler->fast_gpu_idle_handling) { + scheduler->fast_gpu_idle_handling = false; + + if (scheduler->total_runnable_grps) { + suspend = !atomic_read(&scheduler->non_idle_offslot_grps) && + kbase_pm_idle_groups_sched_suspendable(kbdev); + } else + suspend = kbase_pm_no_runnables_sched_suspendable(kbdev); + spin_unlock(&scheduler->interrupt_lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return suspend; + } + if (scheduler->total_runnable_grps) { /* Check both on-slots and off-slots groups idle status */ @@ -4187,16 +5293,18 @@ static bool scheduler_idle_suspendable(struct kbase_device *kbdev) } else suspend = kbase_pm_no_runnables_sched_suspendable(kbdev); +#ifndef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /* Confirm that all groups are actually idle before proceeding with * suspension as groups might potentially become active again without * informing the scheduler in case userspace rings a doorbell directly. 
*/ if (suspend && (unlikely(atomic_read(&scheduler->gpu_no_longer_idle)) || unlikely(!all_on_slot_groups_remained_idle(kbdev)))) { - dev_info(kbdev->dev, + dev_dbg(kbdev->dev, "GPU suspension skipped due to active CSGs"); suspend = false; } +#endif spin_unlock(&scheduler->interrupt_lock); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -4224,9 +5332,13 @@ static void scheduler_sleep_on_idle(struct kbase_device *kbdev) dev_dbg(kbdev->dev, "Scheduler to be put to sleep on GPU becoming idle"); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); scheduler_pm_idle_before_sleep(kbdev); scheduler->state = SCHED_SLEEPING; + KBASE_KTRACE_ADD(kbdev, SCHED_SLEEPING, NULL, scheduler->state); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ } #endif @@ -4244,6 +5356,7 @@ static void scheduler_sleep_on_idle(struct kbase_device *kbdev) */ static bool scheduler_suspend_on_idle(struct kbase_device *kbdev) { + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int ret = suspend_active_groups_on_powerdown(kbdev, false); if (ret) { @@ -4251,62 +5364,352 @@ static bool scheduler_suspend_on_idle(struct kbase_device *kbdev) atomic_read( &kbdev->csf.scheduler.non_idle_offslot_grps)); /* Bring forward the next tick */ - kbase_csf_scheduler_advance_tick(kbdev); + kbase_csf_scheduler_invoke_tick(kbdev); return false; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + turn_off_sc_power_rails(kbdev); + ack_gpu_idle_event(kbdev); +#endif + dev_dbg(kbdev->dev, "Scheduler to be suspended on GPU becoming idle"); scheduler_suspend(kbdev); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); return true; } static void gpu_idle_worker(struct work_struct *work) { +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct kbase_device *kbdev = container_of( + work, struct kbase_device, csf.scheduler.gpu_idle_work.work); +#else struct kbase_device *kbdev = container_of( work, struct kbase_device, csf.scheduler.gpu_idle_work); +#endif struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool scheduler_is_idle_suspendable = false; bool all_groups_suspended = false; - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_BEGIN, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_START, NULL, 0u); #define __ENCODE_KTRACE_INFO(reset, idle, all_suspend) \ (((u32)reset) | (((u32)idle) << 4) | (((u32)all_suspend) << 8)) if (kbase_reset_gpu_try_prevent(kbdev)) { dev_warn(kbdev->dev, "Quit idle for failing to prevent gpu reset.\n"); - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_END, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_END, NULL, __ENCODE_KTRACE_INFO(true, false, false)); return; } - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (!scheduler->gpu_idle_work_pending) + goto unlock; + + scheduler->gpu_idle_work_pending = false; +#endif + +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) { + rt_mutex_unlock(&scheduler->lock); + kbase_reset_gpu_allow(kbdev); + return; + } +#endif scheduler_is_idle_suspendable = scheduler_idle_suspendable(kbdev); if (scheduler_is_idle_suspendable) { - KBASE_KTRACE_ADD(kbdev, GPU_IDLE_HANDLING_START, NULL, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_HANDLING_START, NULL, kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); #ifdef KBASE_PM_RUNTIME if (kbase_pm_gpu_sleep_allowed(kbdev) && - scheduler->total_runnable_grps) + 
kbase_csf_scheduler_get_nr_active_csgs(kbdev)) scheduler_sleep_on_idle(kbdev); else #endif all_groups_suspended = scheduler_suspend_on_idle(kbdev); + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_HANDLING_END, NULL, 0u); } - mutex_unlock(&scheduler->lock); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +unlock: +#endif + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); - KBASE_KTRACE_ADD(kbdev, IDLE_WORKER_END, NULL, - __ENCODE_KTRACE_INFO(false, - scheduler_is_idle_suspendable, + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GPU_IDLE_WORKER_END, NULL, + __ENCODE_KTRACE_INFO(false, scheduler_is_idle_suspendable, all_groups_suspended)); #undef __ENCODE_KTRACE_INFO } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * wait_csg_db_ack - Wait for the previously sent CSI kernel DBs for a CSG to + * get acknowledged. + * + * @kbdev: Pointer to the device. + * @csg_nr: The CSG number. + * + * This function is called to wait for the previously sent CSI kernel DBs + * for a CSG to get acknowledged before acknowledging the GPU idle event. + * This is to ensure when @sc_rails_off_worker is doing the GPU idleness + * reevaluation the User submissions remain disabled. + * For firmware to re-enable User submission, two conditions are required to + * be met. + * 1. GLB_IDLE_EVENT acknowledgement + * 2. CSI kernel DB ring + * + * If GLB_IDLE_EVENT is acknowledged and FW notices the previously rung CS kernel + * DB, then it would re-enable the User submission and @sc_rails_off_worker might + * end up turning off the SC rails. + */ +static void wait_csg_db_ack(struct kbase_device *kbdev, int csg_nr) +{ +#define WAIT_TIMEOUT 10 /* 1ms timeout */ +#define DELAY_TIME_IN_US 100 + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[csg_nr]; + const int max_iterations = WAIT_TIMEOUT; + int loop; + + for (loop = 0; loop < max_iterations; loop++) { + if (kbase_csf_firmware_csg_input_read(ginfo, CSG_DB_REQ) == + kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK)) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_err(kbdev->dev, + "Timeout for csg %d CSG_DB_REQ %x != CSG_DB_ACK %x", + csg_nr, + kbase_csf_firmware_csg_input_read(ginfo, CSG_DB_REQ), + kbase_csf_firmware_csg_output(ginfo, CSG_DB_ACK)); + } +} + +/** + * recheck_gpu_idleness - Recheck the idleness of the GPU before turning off + * the SC power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called on the GPU idle notification to recheck the idleness + * of GPU before turning off the SC power rails. The reevaluation of idleness + * is done by sending CSG status update requests. An additional check is done + * for the CSGs that are reported as idle that whether the associated queues + * are empty or blocked. + * + * Return: true if the GPU was reevaluated as idle. 
+ */ +static bool recheck_gpu_idleness(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + DECLARE_BITMAP(csg_bitmap, MAX_SUPPORTED_CSGS) = { 0 }; + long wt = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + u32 num_groups = kbdev->csf.global_iface.group_num; + unsigned long flags, i; + + lockdep_assert_held(&scheduler->lock); + + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[i]; + u32 csg_req = kbase_csf_firmware_csg_output(ginfo, CSG_ACK); + + csg_req ^= CSG_REQ_STATUS_UPDATE_MASK; + kbase_csf_firmware_csg_input_mask(ginfo, CSG_REQ, csg_req, + CSG_REQ_STATUS_UPDATE_MASK); + set_bit(i, csg_bitmap); + wait_csg_db_ack(kbdev, i); + } + kbase_csf_ring_csg_slots_doorbell(kbdev, csg_bitmap[0]); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + if (wait_csg_slots_handshake_ack(kbdev, + CSG_REQ_STATUS_UPDATE_MASK, csg_bitmap, wt)) { + dev_warn( + kbdev->dev, + "[%llu] Timeout (%d ms) on STATUS_UPDATE, treat GPU as not idle: slot mask=0x%lx", + kbase_backend_get_cycle_cnt(kbdev), + kbdev->csf.fw_timeout_ms, + csg_bitmap[0]); + return false; + } + + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSG_SLOT_IDLE_SET, NULL, + scheduler->csg_slots_idle_mask[0]); + + ack_gpu_idle_event(kbdev); + for_each_set_bit(i, scheduler->csg_slots_idle_mask, num_groups) { + struct kbase_csf_cmd_stream_group_info *const ginfo = + &kbdev->csf.global_iface.groups[i]; + struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; + struct kbase_queue_group *group = csg_slot->resident_group; + bool group_idle = true; + int j; + + if (!group_on_slot_is_idle(kbdev, i)) + group_idle = false; + + for (j = 0; j < ginfo->stream_num; j++) { + struct kbase_queue *const queue = + group->bound_queues[j]; + u32 *output_addr; + + if (!queue || !queue->enabled) + continue; + + output_addr = (u32 *)(queue->user_io_addr + PAGE_SIZE); + + if (output_addr[CS_ACTIVE / sizeof(u32)]) { + dev_warn( + kbdev->dev, + "queue %d bound to group %d on slot %d active unexpectedly", + queue->csi_index, queue->group->handle, + queue->group->csg_nr); + group_idle = false; + } + + if (group_idle) { + if (!save_slot_cs(ginfo, queue) && + !confirm_cmd_buf_empty(queue)) + group_idle = false; + } + + if (!group_idle) { + spin_lock_irqsave(&scheduler->interrupt_lock, flags); + kbase_csf_ring_cs_kernel_doorbell(kbdev, + queue->csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_RECHECK_NOT_IDLE, group, i); + return false; + } + } + } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_RECHECK_IDLE, NULL, (u64)scheduler->csg_slots_idle_mask); + return true; +} + +/** + * can_turn_off_sc_rails - Check if the conditions are met to turn off the + * SC power rails. + * + * @kbdev: Pointer to the device. + * + * This function checks both the on-slots and off-slots groups idle status and + * if firmware is managing the cores. If the groups are not idle or Host is + * managing the cores then the rails need to be kept on. + * Additionally, we must check that the Idle event has not already been acknowledged + * as that would indicate that the idle worker has run and potentially re-enabled + * user-submission. + * + * Return: true if the SC power rails can be turned off. 
+ */ +static bool can_turn_off_sc_rails(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + bool turn_off_sc_rails; + bool idle_event_pending; + bool all_csg_idle; + bool non_idle_offslot; + unsigned long flags; + + lockdep_assert_held(&scheduler->lock); + + if (scheduler->state == SCHED_SUSPENDED) + return false; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + spin_lock(&scheduler->interrupt_lock); + /* Ensure the SC power off sequence is complete before powering off the rail. + * If shader rail is turned off during job, APM generates fatal error and GPU firmware + * will generate error interrupt and try to reset. + * Note that this will avert the case when a power off is not complete, but it is not + * designed to handle a situation where a power on races with this code. That situation + * should be prevented by trapping new work through the kernel. + */ + if (!kbdev->pm.backend.sc_pwroff_safe) { + trace_clock_set_rate("rail_off_aborted.", 1, raw_smp_processor_id()); + dev_info(kbdev->dev, "SC Rail off aborted, power sequence incomplete"); + } + + idle_event_pending = gpu_idle_event_is_pending(kbdev); + all_csg_idle = kbase_csf_scheduler_all_csgs_idle(kbdev); + non_idle_offslot = !atomic_read(&scheduler->non_idle_offslot_grps); + turn_off_sc_rails = kbdev->pm.backend.sc_pwroff_safe && + idle_event_pending && + all_csg_idle && + non_idle_offslot && + !kbase_pm_no_mcu_core_pwroff(kbdev) && + !scheduler->sc_power_rails_off; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SC_RAIL_CAN_TURN_OFF, NULL, + kbdev->pm.backend.sc_pwroff_safe | + idle_event_pending << 1 | + all_csg_idle << 2 | + non_idle_offslot << 3 | + !kbase_pm_no_mcu_core_pwroff(kbdev) << 4 | + !scheduler->sc_power_rails_off << 5); + + spin_unlock(&scheduler->interrupt_lock); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return turn_off_sc_rails; +} + +static void sc_rails_off_worker(struct work_struct *work) +{ + struct kbase_device *kbdev = container_of( + work, struct kbase_device, csf.scheduler.sc_rails_off_work); + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_ENTER_SC_RAIL, NULL, + kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); + if (kbase_reset_gpu_try_prevent(kbdev)) { + dev_warn(kbdev->dev, "Skip SC rails off for failing to prevent gpu reset"); + return; + } + + rt_mutex_lock(&scheduler->lock); + /* All the previously sent CSG/CSI level requests are expected to have + * completed at this point. + */ + + if (can_turn_off_sc_rails(kbdev)) { + if (recheck_gpu_idleness(kbdev)) { + /* The GPU idle work, enqueued after previous idle + * notification, could already be pending if GPU became + * active momentarily after the previous idle notification + * and all CSGs were reported as idle. 
+ */ + if (!scheduler->gpu_idle_work_pending) + WARN_ON(scheduler->sc_power_rails_off); + turn_off_sc_power_rails(kbdev); + enqueue_gpu_idle_work(scheduler, + kbdev->csf.gpu_idle_hysteresis_ms); + } + } else { + ack_gpu_idle_event(kbdev); + } + + rt_mutex_unlock(&scheduler->lock); + kbase_reset_gpu_allow(kbdev); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_EXIT_SC_RAIL, NULL, + kbase_csf_ktrace_gpu_cycle_cnt(kbdev)); +} +#endif + static int scheduler_prepare(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct list_head privileged_groups, active_groups; unsigned long flags; int i; @@ -4332,6 +5735,8 @@ static int scheduler_prepare(struct kbase_device *kbdev) scheduler->num_active_address_spaces = 0; scheduler->num_csg_slots_for_tick = 0; bitmap_zero(scheduler->csg_slots_prio_update, MAX_SUPPORTED_CSGS); + INIT_LIST_HEAD(&privileged_groups); + INIT_LIST_HEAD(&active_groups); spin_lock_irqsave(&scheduler->interrupt_lock, flags); scheduler->tick_protm_pending_seq = @@ -4341,10 +5746,17 @@ static int scheduler_prepare(struct kbase_device *kbdev) struct kbase_context *kctx; list_for_each_entry(kctx, &scheduler->runnable_kctxs, csf.link) - scheduler_ctx_scan_groups(kbdev, kctx, i); + scheduler_ctx_scan_groups(kbdev, kctx, i, &privileged_groups, + &active_groups); } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + /* Adds privileged (RT + p.mode) groups to the scanout list */ + scheduler_scan_group_list(kbdev, &privileged_groups); + + /* Adds remainder of active groups to the scanout list */ + scheduler_scan_group_list(kbdev, &active_groups); + /* Update this tick's non-idle groups */ scheduler->non_idle_scanout_grps = scheduler->ngrp_to_schedule; @@ -4355,11 +5767,11 @@ static int scheduler_prepare(struct kbase_device *kbdev) */ atomic_set(&scheduler->non_idle_offslot_grps, scheduler->non_idle_scanout_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, NULL, + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, NULL, scheduler->non_idle_scanout_grps); /* Adds those idle but runnable groups to the scanout list */ - scheduler_scan_idle_groups(kbdev); + scheduler_scan_group_list(kbdev, &scheduler->idle_groups_to_schedule); WARN_ON(scheduler->csg_scan_count_for_tick < scheduler->ngrp_to_schedule); @@ -4451,14 +5863,176 @@ static int prepare_fast_local_tock(struct kbase_device *kbdev) struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; struct kbase_queue_group *group = csg_slot->resident_group; - if (!queue_group_idle_locked(group)) + if (!queue_group_idle_locked(group)) { group->run_state = KBASE_CSF_GROUP_IDLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_IDLE, group, group->run_state); + } } /* Return the number of idle slots for potential replacement */ return bitmap_weight(csg_bitmap, num_groups); } +static int wait_csg_slots_suspend(struct kbase_device *kbdev, unsigned long *slot_mask) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + u32 num_groups = kbdev->csf.global_iface.group_num; + int err = 0; + DECLARE_BITMAP(slot_mask_local, MAX_SUPPORTED_CSGS); + + lockdep_assert_held(&scheduler->lock); + + bitmap_copy(slot_mask_local, slot_mask, MAX_SUPPORTED_CSGS); + + while (!bitmap_empty(slot_mask_local, MAX_SUPPORTED_CSGS)) { + long remaining = kbase_csf_timeout_in_jiffies(kbase_get_timeout_ms(kbdev, CSF_CSG_SUSPEND_TIMEOUT)); + DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS); + + bitmap_copy(changed, slot_mask_local, MAX_SUPPORTED_CSGS); + remaining = wait_event_timeout( + 
kbdev->csf.event_wait, + slots_state_changed(kbdev, changed, csg_slot_stopped_locked), remaining); + + if (likely(remaining)) { + u32 i; + + for_each_set_bit(i, changed, num_groups) { + struct kbase_queue_group *group; + + if (WARN_ON(!csg_slot_stopped_locked(kbdev, (s8)i))) + continue; + + /* The on slot csg is now stopped */ + clear_bit(i, slot_mask_local); + + group = scheduler->csg_slots[i].resident_group; + if (likely(group)) { + /* Only do save/cleanup if the + * group is not terminated during + * the sleep. + */ + + /* Only emit suspend, if there was no AS fault */ + if (kctx_as_enabled(group->kctx) && !group->faulted) + KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( + kbdev, + kbdev->gpu_props.props.raw_props.gpu_id, i); + + save_csg_slot(group); + if (cleanup_csg_slot(group)) { + sched_evict_group(group, true, true); + } + } + } + } else { + dev_warn( + kbdev->dev, + "[%llu] Suspend request sent on CSG slots 0x%lx timed out for slots 0x%lx", + kbase_backend_get_cycle_cnt(kbdev), slot_mask[0], + slot_mask_local[0]); + /* Return the bitmask of the timed out slots to the caller */ + bitmap_copy(slot_mask, slot_mask_local, MAX_SUPPORTED_CSGS); + err = -ETIMEDOUT; + break; + } + } + + return err; +} + +/** + * evict_lru_or_blocked_csg() - Evict the least-recently-used idle or blocked CSG + * + * @kbdev: Pointer to the device + * + * Used to allow for speedier starting/resumption of another CSG. The worst-case + * scenario of the evicted CSG being scheduled next is expected to be rare. + * Also, the eviction will not be applied if the GPU is running in protected mode. + * Otherwise the the eviction attempt would force the MCU to quit the execution of + * the protected mode, and likely re-request to enter it again. + */ +static void evict_lru_or_blocked_csg(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + size_t i; + struct kbase_queue_group *lru_idle_group = NULL; + const u32 total_csg_slots = kbdev->csf.global_iface.group_num; + const bool all_addr_spaces_used = (scheduler->num_active_address_spaces >= + (kbdev->nr_hw_address_spaces - NUM_RESERVED_AS_SLOTS)); + u8 as_usage[BASE_MAX_NR_AS] = { 0 }; + + lockdep_assert_held(&scheduler->lock); + if (kbase_csf_scheduler_protected_mode_in_use(kbdev)) + return; + + BUILD_BUG_ON(MAX_SUPPORTED_CSGS > (sizeof(int) * BITS_PER_BYTE)); + if (fls(scheduler->csg_inuse_bitmap[0]) != total_csg_slots) + return; /* Some CSG slots remain unused */ + + if (all_addr_spaces_used) { + for (i = 0; i != total_csg_slots; ++i) { + if (scheduler->csg_slots[i].resident_group != NULL) { + if (WARN_ON(scheduler->csg_slots[i].resident_group->kctx->as_nr < + 0)) + continue; + + as_usage[scheduler->csg_slots[i].resident_group->kctx->as_nr]++; + } + } + } + + for (i = 0; i != total_csg_slots; ++i) { + struct kbase_queue_group *const group = scheduler->csg_slots[i].resident_group; + + /* We expect that by this point all groups would normally be + * assigned a physical CSG slot, but if circumstances have + * changed then bail out of this optimisation. + */ + if (group == NULL) + return; + + /* Real-time priority CSGs must be kept on-slot even when + * idle. 
+ */ + if ((group->run_state == KBASE_CSF_GROUP_IDLE) && + (group->priority != KBASE_QUEUE_GROUP_PRIORITY_REALTIME) && + ((lru_idle_group == NULL) || + (lru_idle_group->prepared_seq_num < group->prepared_seq_num))) { + if (WARN_ON(group->kctx->as_nr < 0)) + continue; + + /* If all address spaces are used, we need to ensure the group does not + * share the AS with other active CSGs. Or CSG would be freed without AS + * and this optimization would not work. + */ + if ((!all_addr_spaces_used) || (as_usage[group->kctx->as_nr] == 1)) + lru_idle_group = group; + } + } + + if (lru_idle_group != NULL) { + unsigned long slot_mask = 1 << lru_idle_group->csg_nr; + + dev_dbg(kbdev->dev, "Suspending LRU idle group %d of context %d_%d on slot %d", + lru_idle_group->handle, lru_idle_group->kctx->tgid, + lru_idle_group->kctx->id, lru_idle_group->csg_nr); + suspend_queue_group(lru_idle_group); + if (wait_csg_slots_suspend(kbdev, &slot_mask)) { + enum dumpfault_error_type error_type = DF_CSG_SUSPEND_TIMEOUT; + + dev_warn( + kbdev->dev, + "[%llu] LRU idle group %d of context %d_%d failed to suspend on slot %d (timeout %d ms)", + kbase_backend_get_cycle_cnt(kbdev), lru_idle_group->handle, + lru_idle_group->kctx->tgid, lru_idle_group->kctx->id, + lru_idle_group->csg_nr, kbdev->csf.fw_timeout_ms); + if (kbase_csf_firmware_ping_wait(kbdev, FW_PING_AFTER_ERROR_TIMEOUT_MS)) + error_type = DF_PING_REQUEST_TIMEOUT; + schedule_actions_trigger_df(kbdev, lru_idle_group->kctx, error_type); + } + } +} + static void schedule_actions(struct kbase_device *kbdev, bool is_tick) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; @@ -4473,6 +6047,11 @@ static void schedule_actions(struct kbase_device *kbdev, bool is_tick) kbase_reset_gpu_assert_prevented(kbdev); lockdep_assert_held(&scheduler->lock); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (scheduler->gpu_idle_work_pending) + return; +#endif + ret = kbase_csf_scheduler_wait_mcu_active(kbdev); if (ret) { dev_err(kbdev->dev, @@ -4480,6 +6059,10 @@ static void schedule_actions(struct kbase_device *kbdev, bool is_tick) return; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + turn_on_sc_power_rails(kbdev); +#endif + spin_lock_irqsave(&scheduler->interrupt_lock, flags); skip_idle_slots_update = kbase_csf_scheduler_protected_mode_in_use(kbdev); skip_scheduling_actions = @@ -4522,7 +6105,7 @@ redo_local_tock: if (unlikely(!scheduler->ngrp_to_schedule && scheduler->total_runnable_grps)) { dev_dbg(kbdev->dev, "No groups to schedule in the tick"); - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); return; } spin_lock_irqsave(&scheduler->interrupt_lock, flags); @@ -4539,13 +6122,13 @@ redo_local_tock: * queue jobs. 
*/ if (protm_grp && scheduler->top_grp == protm_grp) { - int new_val; - dev_dbg(kbdev->dev, "Scheduler keep protm exec: group-%d", protm_grp->handle); - new_val = atomic_dec_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_DEC, - protm_grp, new_val); + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + + update_offslot_non_idle_cnt_for_onslot_grp(protm_grp); + remove_scheduled_group(kbdev, protm_grp); + scheduler_check_pmode_progress(kbdev); } else if (scheduler->top_grp) { if (protm_grp) dev_dbg(kbdev->dev, "Scheduler drop protm exec: group-%d", @@ -4599,11 +6182,11 @@ redo_local_tock: goto redo_local_tock; } } - - return; + } else { + spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + evict_lru_or_blocked_csg(kbdev); } /** @@ -4625,6 +6208,9 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) lockdep_assert_held(&scheduler->lock); + if (unlikely(!kbase_reset_gpu_is_not_pending(kbdev))) + return true; + if (scheduler->state == SCHED_SUSPENDED) return true; @@ -4634,12 +6220,12 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); if (kbdev->pm.backend.exit_gpu_sleep_mode) { - int ret = scheduler_pm_active_after_sleep(kbdev, flags); - /* hwaccess_lock is released in the previous function - * call. - */ + int ret = scheduler_pm_active_after_sleep(kbdev, &flags); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); if (!ret) { scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); return false; } @@ -4655,16 +6241,11 @@ static bool can_skip_scheduling(struct kbase_device *kbdev) return false; } -static void schedule_on_tock(struct work_struct *work) +static void schedule_on_tock(struct kbase_device *kbdev) { - struct kbase_device *kbdev = container_of(work, struct kbase_device, - csf.scheduler.tock_work.work); struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int err; - /* Tock work item is serviced */ - scheduler->tock_pending_request = false; - err = kbase_reset_gpu_try_prevent(kbdev); /* Regardless of whether reset failed or is currently happening, exit * early @@ -4672,41 +6253,46 @@ static void schedule_on_tock(struct work_struct *work) if (err) return; - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); if (can_skip_scheduling(kbdev)) + { + atomic_set(&scheduler->pending_tock_work, false); goto exit_no_schedule_unlock; + } WARN_ON(!(scheduler->state == SCHED_INACTIVE)); scheduler->state = SCHED_BUSY; + KBASE_KTRACE_ADD(kbdev, SCHED_BUSY, NULL, scheduler->state); /* Undertaking schedule action steps */ - KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK, NULL, 0u); - schedule_actions(kbdev, false); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_START, NULL, 0u); + while (atomic_cmpxchg(&scheduler->pending_tock_work, true, false) == true) + schedule_actions(kbdev, false); /* Record time information on a non-skipped tock */ scheduler->last_schedule = jiffies; scheduler->state = SCHED_INACTIVE; + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); if (!scheduler->total_runnable_grps) - enqueue_gpu_idle_work(scheduler); - mutex_unlock(&scheduler->lock); + enqueue_gpu_idle_work(scheduler, 0); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + 
rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); - dev_dbg(kbdev->dev, - "Waking up for event after schedule-on-tock completes."); - wake_up_all(&kbdev->csf.event_wait); KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_END, NULL, 0u); return; exit_no_schedule_unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } -static void schedule_on_tick(struct work_struct *work) +static void schedule_on_tick(struct kbase_device *kbdev) { - struct kbase_device *kbdev = container_of(work, struct kbase_device, - csf.scheduler.tick_work); struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; int err = kbase_reset_gpu_try_prevent(kbdev); @@ -4716,109 +6302,51 @@ static void schedule_on_tick(struct work_struct *work) if (err) return; - mutex_lock(&scheduler->lock); + kbase_debug_csf_fault_wait_completion(kbdev); + rt_mutex_lock(&scheduler->lock); - WARN_ON(scheduler->tick_timer_active); if (can_skip_scheduling(kbdev)) goto exit_no_schedule_unlock; scheduler->state = SCHED_BUSY; + KBASE_KTRACE_ADD(kbdev, SCHED_BUSY, NULL, scheduler->state); /* Undertaking schedule action steps */ - KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK, NULL, - scheduler->total_runnable_grps); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_START, NULL, scheduler->total_runnable_grps); schedule_actions(kbdev, true); /* Record time information */ scheduler->last_schedule = jiffies; /* Kicking next scheduling if needed */ - if (likely(scheduler_timer_is_enabled_nolock(kbdev)) && - (scheduler->total_runnable_grps > 0)) { - start_tick_timer(kbdev); - dev_dbg(kbdev->dev, - "scheduling for next tick, num_runnable_groups:%u\n", + if (likely(kbase_csf_scheduler_timer_is_enabled(kbdev)) && + (scheduler->total_runnable_grps > 0)) { + hrtimer_start(&scheduler->tick_timer, + HR_TIMER_DELAY_MSEC(scheduler->csg_scheduling_period_ms), + HRTIMER_MODE_REL); + dev_dbg(kbdev->dev, "scheduling for next tick, num_runnable_groups:%u\n", scheduler->total_runnable_grps); } else if (!scheduler->total_runnable_grps) { - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); } scheduler->state = SCHED_INACTIVE; - mutex_unlock(&scheduler->lock); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + emit_gpu_metrics_to_frontend(kbdev); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + rt_mutex_unlock(&scheduler->lock); + KBASE_KTRACE_ADD(kbdev, SCHED_INACTIVE, NULL, scheduler->state); kbase_reset_gpu_allow(kbdev); - dev_dbg(kbdev->dev, "Waking up for event after schedule-on-tick completes."); - wake_up_all(&kbdev->csf.event_wait); KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_END, NULL, scheduler->total_runnable_grps); return; exit_no_schedule_unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } -static int wait_csg_slots_suspend(struct kbase_device *kbdev, - const unsigned long *slot_mask, - unsigned int timeout_ms) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - long remaining = kbase_csf_timeout_in_jiffies(timeout_ms); - u32 num_groups = kbdev->csf.global_iface.group_num; - int err = 0; - DECLARE_BITMAP(slot_mask_local, MAX_SUPPORTED_CSGS); - - lockdep_assert_held(&scheduler->lock); - - bitmap_copy(slot_mask_local, slot_mask, MAX_SUPPORTED_CSGS); - - while (!bitmap_empty(slot_mask_local, MAX_SUPPORTED_CSGS) - && remaining) { - DECLARE_BITMAP(changed, MAX_SUPPORTED_CSGS); - - bitmap_copy(changed, slot_mask_local, MAX_SUPPORTED_CSGS); - - remaining = wait_event_timeout(kbdev->csf.event_wait, 
- slots_state_changed(kbdev, changed, - csg_slot_stopped_locked), - remaining); - - if (remaining) { - u32 i; - - for_each_set_bit(i, changed, num_groups) { - struct kbase_queue_group *group; - - if (WARN_ON(!csg_slot_stopped_locked(kbdev, (s8)i))) - continue; - - /* The on slot csg is now stopped */ - clear_bit(i, slot_mask_local); - - group = scheduler->csg_slots[i].resident_group; - if (likely(group)) { - /* Only do save/cleanup if the - * group is not terminated during - * the sleep. - */ - save_csg_slot(group); - if (cleanup_csg_slot(group)) - sched_evict_group(group, true, true); - } - } - } else { - dev_warn(kbdev->dev, "[%llu] Timeout waiting for CSG slots to suspend, slot_mask: 0x%*pb\n", - kbase_backend_get_cycle_cnt(kbdev), - num_groups, slot_mask_local); - - - err = -ETIMEDOUT; - } - } - - return err; -} - static int suspend_active_queue_groups(struct kbase_device *kbdev, unsigned long *slot_mask) { @@ -4839,7 +6367,7 @@ static int suspend_active_queue_groups(struct kbase_device *kbdev, } } - ret = wait_csg_slots_suspend(kbdev, slot_mask, kbdev->reset_timeout_ms); + ret = wait_csg_slots_suspend(kbdev, slot_mask); return ret; } @@ -4850,13 +6378,18 @@ static int suspend_active_queue_groups_on_reset(struct kbase_device *kbdev) int ret; int ret2; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); ret = suspend_active_queue_groups(kbdev, slot_mask); if (ret) { dev_warn(kbdev->dev, "Timeout waiting for CSG slots to suspend before reset, slot_mask: 0x%*pb\n", kbdev->csf.global_iface.group_num, slot_mask); + //TODO: should introduce SSCD report if this happens. + kbase_gpu_timeout_debug_message(kbdev, ""); + dev_warn(kbdev->dev, "[%llu] Firmware ping %d", + kbase_backend_get_cycle_cnt(kbdev), + kbase_csf_firmware_ping_wait(kbdev, 0)); } /* Need to flush the GPU cache to ensure suspend buffer @@ -4874,16 +6407,15 @@ static int suspend_active_queue_groups_on_reset(struct kbase_device *kbdev) * overflow. */ kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); - ret2 = kbase_gpu_wait_cache_clean_timeout(kbdev, - kbdev->reset_timeout_ms); + ret2 = kbase_gpu_wait_cache_clean_timeout(kbdev, kbdev->mmu_or_gpu_cache_op_wait_time_ms); if (ret2) { - dev_warn(kbdev->dev, "[%llu] Timeout waiting for cache clean to complete before reset", - kbase_backend_get_cycle_cnt(kbdev)); + dev_err(kbdev->dev, "[%llu] Timeout waiting for CACHE_CLN_INV_L2_LSC", + kbase_backend_get_cycle_cnt(kbdev)); if (!ret) ret = ret2; } - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return ret; } @@ -4920,7 +6452,7 @@ static bool scheduler_handle_reset_in_protected_mode(struct kbase_device *kbdev) unsigned long flags; u32 csg_nr; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); protm_grp = scheduler->active_protm_grp; @@ -4981,27 +6513,21 @@ static bool scheduler_handle_reset_in_protected_mode(struct kbase_device *kbdev) cleanup_csg_slot(group); group->run_state = KBASE_CSF_GROUP_SUSPENDED; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_SUSPENDED, group, group->run_state); /* Simply treat the normal mode groups as non-idle. The tick * scheduled after the reset will re-initialize the counter * anyways. 
*/ new_val = atomic_inc_return(&scheduler->non_idle_offslot_grps); - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_INC, - group, new_val); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_NONIDLE_OFFSLOT_GRP_INC, group, new_val); } unlock: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return suspend_on_slot_groups; } -static void cancel_tock_work(struct kbase_csf_scheduler *const scheduler) -{ - cancel_delayed_work_sync(&scheduler->tock_work); - scheduler->tock_pending_request = false; -} - static void scheduler_inner_reset(struct kbase_device *kbdev) { u32 const num_groups = kbdev->csf.global_iface.group_num; @@ -5011,19 +6537,22 @@ static void scheduler_inner_reset(struct kbase_device *kbdev) WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); /* Cancel any potential queued delayed work(s) */ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + cancel_delayed_work_sync(&scheduler->gpu_idle_work); +#else cancel_work_sync(&kbdev->csf.scheduler.gpu_idle_work); - cancel_tick_timer(kbdev); - cancel_work_sync(&scheduler->tick_work); +#endif + cancel_tick_work(scheduler); cancel_tock_work(scheduler); cancel_delayed_work_sync(&scheduler->ping_work); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); spin_lock_irqsave(&scheduler->interrupt_lock, flags); bitmap_fill(scheduler->csgs_events_enable_mask, MAX_SUPPORTED_CSGS); if (scheduler->active_protm_grp) - KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_EXIT_PROTM, - scheduler->active_protm_grp, 0u); + KBASE_KTRACE_ADD_CSF_GRP(kbdev, SCHEDULER_PROTM_EXIT, scheduler->active_protm_grp, + 0u); scheduler->active_protm_grp = NULL; memset(kbdev->csf.scheduler.csg_slots, 0, num_groups * sizeof(struct kbase_csf_csg_slot)); @@ -5037,7 +6566,7 @@ static void scheduler_inner_reset(struct kbase_device *kbdev) scheduler->num_active_address_spaces | (((u64)scheduler->total_runnable_grps) << 32)); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } void kbase_csf_scheduler_reset(struct kbase_device *kbdev) @@ -5046,7 +6575,9 @@ void kbase_csf_scheduler_reset(struct kbase_device *kbdev) WARN_ON(!kbase_reset_gpu_is_active(kbdev)); - KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET, NULL, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET_START, NULL, 0u); + + kbase_debug_csf_fault_wait_completion(kbdev); if (scheduler_handle_reset_in_protected_mode(kbdev) && !suspend_active_queue_groups_on_reset(kbdev)) { @@ -5084,6 +6615,8 @@ void kbase_csf_scheduler_reset(struct kbase_device *kbdev) mutex_unlock(&kbdev->kctx_list_lock); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_RESET_END, NULL, 0u); + /* After queue groups reset, the scheduler data fields clear out */ scheduler_inner_reset(kbdev); } @@ -5111,7 +6644,7 @@ static void firmware_aliveness_monitor(struct work_struct *work) return; } - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); #ifdef CONFIG_MALI_DEBUG if (fw_debug) { @@ -5138,7 +6671,7 @@ static void firmware_aliveness_monitor(struct work_struct *work) kbase_csf_scheduler_wait_mcu_active(kbdev); - err = kbase_csf_firmware_ping_wait(kbdev); + err = kbase_csf_firmware_ping_wait(kbdev, kbdev->csf.fw_timeout_ms); if (err) { /* It is acceptable to enqueue a reset whilst we've prevented @@ -5148,14 +6681,14 @@ static void firmware_aliveness_monitor(struct work_struct *work) kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) kbase_reset_gpu(kbdev); } else if (kbase_csf_scheduler_get_nr_active_csgs(kbdev) == 1) { - queue_delayed_work(system_long_wq, - &kbdev->csf.scheduler.ping_work, - 
msecs_to_jiffies(FIRMWARE_PING_INTERVAL_MS)); + queue_delayed_work( + system_long_wq, &kbdev->csf.scheduler.ping_work, + msecs_to_jiffies(kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_PING_TIMEOUT))); } kbase_pm_context_idle(kbdev); exit: - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); kbase_reset_gpu_allow(kbdev); } @@ -5170,7 +6703,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, kbase_reset_gpu_assert_prevented(kbdev); lockdep_assert_held(&kctx->csf.lock); - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); on_slot = kbasep_csf_scheduler_group_is_on_slot_locked(group); @@ -5207,9 +6740,13 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, if (!WARN_ON(scheduler->state == SCHED_SUSPENDED)) suspend_queue_group(group); - err = wait_csg_slots_suspend(kbdev, slot_mask, - kbdev->csf.fw_timeout_ms); + err = wait_csg_slots_suspend(kbdev, slot_mask); if (err) { + const struct gpu_uevent evt = { + .type = GPU_UEVENT_TYPE_KMD_ERROR, + .info = GPU_UEVENT_INFO_CSG_GROUP_SUSPEND + }; + pixel_gpu_uevent_send(kbdev, &evt); dev_warn(kbdev->dev, "[%llu] Timeout waiting for the group %d to suspend on slot %d", kbase_backend_get_cycle_cnt(kbdev), group->handle, group->csg_nr); @@ -5248,7 +6785,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, target_page_nr < sus_buf->nr_pages; i++) { struct page *pg = as_page(group->normal_suspend_buf.phy[i]); - void *sus_page = kmap(pg); + void *sus_page = kbase_kmap(pg); if (sus_page) { kbase_sync_single_for_cpu(kbdev, @@ -5259,7 +6796,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, sus_buf->pages, sus_page, &to_copy, sus_buf->nr_pages, &target_page_nr, offset); - kunmap(pg); + kbase_kunmap(pg, sus_page); if (err) break; } else { @@ -5274,7 +6811,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, } exit: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); return err; } @@ -5375,15 +6912,31 @@ static struct kbase_queue_group *scheduler_get_protm_enter_async_group( spin_lock_irqsave(&scheduler->interrupt_lock, flags); - if (kbase_csf_scheduler_protected_mode_in_use(kbdev) || - bitmap_empty(pending, ginfo->stream_num)) + if (bitmap_empty(pending, ginfo->stream_num)) { + dev_dbg(kbdev->dev, + "Pmode requested for group %d of ctx %d_%d with no pending queues", + input_grp->handle, input_grp->kctx->tgid, input_grp->kctx->id); + input_grp = NULL; + } else if (kbase_csf_scheduler_protected_mode_in_use(kbdev)) { + kbase_csf_scheduler_invoke_tock(kbdev); input_grp = NULL; + } spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); } else { + if (group && (group->priority == KBASE_QUEUE_GROUP_PRIORITY_REALTIME)) + kbase_csf_scheduler_invoke_tock(kbdev); + input_grp = NULL; } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (input_grp && kbdev->csf.scheduler.sc_power_rails_off) { + dev_warn(kbdev->dev, "SC power rails unexpectedly off in async protm enter"); + return NULL; + } +#endif + return input_grp; } @@ -5399,15 +6952,15 @@ void kbase_csf_scheduler_group_protm_enter(struct kbase_queue_group *group) if (err) return; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); - if (group->run_state == KBASE_CSF_GROUP_IDLE) - group->run_state = KBASE_CSF_GROUP_RUNNABLE; + if (on_slot_group_idle_locked(group)) + update_idle_protm_group_state_to_runnable(group); /* Check if the group is now eligible for execution in protected 
mode. */ if (scheduler_get_protm_enter_async_group(kbdev, group)) scheduler_group_check_protm_enter(kbdev, group); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); } @@ -5450,7 +7003,7 @@ static bool check_sync_update_for_on_slot_group( stream, CS_STATUS_WAIT); unsigned long flags; - KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_STATUS_WAIT, + KBASE_KTRACE_ADD_CSF_GRP_Q(kbdev, QUEUE_SYNC_UPDATE_WAIT_STATUS, queue->group, queue, status); if (!CS_STATUS_WAIT_SYNC_WAIT_GET(status)) @@ -5477,6 +7030,10 @@ static bool check_sync_update_for_on_slot_group( if (!evaluate_sync_update(queue)) continue; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + queue->status_wait = 0; +#endif + /* Update csg_slots_idle_mask and group's run_state */ if (group->run_state != KBASE_CSF_GROUP_RUNNABLE) { /* Only clear the group's idle flag if it has been dealt @@ -5492,11 +7049,34 @@ static bool check_sync_update_for_on_slot_group( scheduler->csg_slots_idle_mask[0]); spin_unlock_irqrestore( &scheduler->interrupt_lock, flags); + /* Request the scheduler to confirm the condition inferred + * here inside the protected mode. + */ + group->reevaluate_idle_status = true; group->run_state = KBASE_CSF_GROUP_RUNNABLE; + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_RUNNABLE, group, + group->run_state); } KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_SYNC_UPDATE_DONE, group, 0u); sync_update_done = true; + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* As the queue of an on-slot group has become unblocked, + * the power rails can be turned on and the execution can + * be resumed on HW. + */ + if (kbdev->csf.scheduler.sc_power_rails_off) { + cancel_gpu_idle_work(kbdev); + turn_on_sc_power_rails(kbdev); + spin_lock_irqsave(&scheduler->interrupt_lock, + flags); + kbase_csf_ring_cs_kernel_doorbell(kbdev, + queue->csi_index, group->csg_nr, true); + spin_unlock_irqrestore(&scheduler->interrupt_lock, + flags); + } +#endif } } @@ -5571,17 +7151,34 @@ static void check_sync_update_in_sleep_mode(struct kbase_device *kbdev) continue; if (check_sync_update_for_on_slot_group(group)) { - /* As sync update has been performed for an on-slot - * group, when MCU is in sleep state, ring the doorbell - * so that FW can re-evaluate the SYNC_WAIT on wakeup. - */ - kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); scheduler_wakeup(kbdev, true); return; } } } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void check_sync_update_after_sc_power_down(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u32 const num_groups = kbdev->csf.global_iface.group_num; + u32 csg_nr; + + lockdep_assert_held(&scheduler->lock); + + for (csg_nr = 0; csg_nr < num_groups; csg_nr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[csg_nr].resident_group; + + if (!group) + continue; + + if (check_sync_update_for_on_slot_group(group)) + return; + } +} +#endif + /** * check_group_sync_update_worker() - Check the sync wait condition for all the * blocked queue groups @@ -5597,7 +7194,7 @@ static void check_sync_update_in_sleep_mode(struct kbase_device *kbdev) * runnable groups so that Scheduler can consider scheduling the group * in next tick or exit protected mode. 
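check_sync_update_for_on_slot_group() above re-reads CS_STATUS_WAIT and only unblocks a queue once evaluate_sync_update() sees the CQS object's live value satisfy the recorded condition. A standalone sketch of that kind of check is below; the le/gt comparison directions are an assumption made for illustration and all names are made up:

/* Build: cc -std=c11 cqs_wait_demo.c */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum cqs_wait_cond { CQS_WAIT_LE, CQS_WAIT_GT };

/* A queue blocked on a CQS object becomes runnable again once the live
 * value satisfies the recorded condition against the compare value.
 */
static bool cqs_wait_satisfied(uint64_t live_val, uint64_t compare_val,
			       enum cqs_wait_cond cond)
{
	switch (cond) {
	case CQS_WAIT_LE:
		return live_val <= compare_val;
	case CQS_WAIT_GT:
		return live_val > compare_val;
	default:
		return false;
	}
}

int main(void)
{
	/* Signalling side bumped the object from 3 to 5, waiter wants > 4. */
	printf("unblocked: %d\n", cqs_wait_satisfied(5, 4, CQS_WAIT_GT));
	return 0;
}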
*/ -static void check_group_sync_update_worker(struct work_struct *work) +static void check_group_sync_update_worker(struct kthread_work *work) { struct kbase_context *const kctx = container_of(work, struct kbase_context, csf.sched.sync_update_work); @@ -5605,9 +7202,18 @@ static void check_group_sync_update_worker(struct work_struct *work) struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool sync_updated = false; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); + +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) { + kthread_queue_work(&kctx->csf.sched.sync_update_worker, + &kctx->csf.sched.sync_update_work); + rt_mutex_unlock(&scheduler->lock); + return; + } +#endif - KBASE_KTRACE_ADD(kbdev, GROUP_SYNC_UPDATE_WORKER_BEGIN, kctx, 0u); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START, kctx, 0u); if (kctx->csf.sched.num_idle_wait_grps != 0) { struct kbase_queue_group *group, *temp; @@ -5620,6 +7226,14 @@ static void check_group_sync_update_worker(struct work_struct *work) */ update_idle_suspended_group_state(group); KBASE_KTRACE_ADD_CSF_GRP(kbdev, GROUP_SYNC_UPDATE_DONE, group, 0u); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + cancel_gpu_idle_work(kbdev); + /* As an off-slot group has become runnable, + * the rails will be turned on and the CS + * kernel doorbell will be rung from the + * scheduling tick. + */ +#endif } } } else { @@ -5637,9 +7251,18 @@ static void check_group_sync_update_worker(struct work_struct *work) if (!sync_updated && (scheduler->state == SCHED_SLEEPING)) check_sync_update_in_sleep_mode(kbdev); - KBASE_KTRACE_ADD(kbdev, GROUP_SYNC_UPDATE_WORKER_END, kctx, 0u); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* Check if the sync update happened for a blocked on-slot group, + * after the shader core power rails were turned off and reactivate + * the GPU if the wait condition is met for the blocked group. 
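check_group_sync_update_worker() is retyped above from a workqueue item to a kthread_work so it can run on a dedicated, RT-capable worker thread. A kernel-style sketch of the same pattern using the stock kthread_worker API follows; the demo_* names are made up, and the kbase wrappers such as kbase_kthread_run_worker_rt layer error handling and priority setup on top of something like this:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

struct demo_ctx {
	struct kthread_worker *worker;
	struct kthread_work sync_update_work;
};

static void demo_sync_update_fn(struct kthread_work *work)
{
	struct demo_ctx *ctx = container_of(work, struct demo_ctx, sync_update_work);

	/* ... re-evaluate blocked groups for this context ... */
	(void)ctx;
}

static int demo_ctx_init(struct demo_ctx *ctx)
{
	ctx->worker = kthread_create_worker(0, "demo_sync_update");
	if (IS_ERR(ctx->worker))
		return PTR_ERR(ctx->worker);

	sched_set_fifo(ctx->worker->task);	/* elevate the worker thread */
	kthread_init_work(&ctx->sync_update_work, demo_sync_update_fn);
	return 0;
}

static void demo_ctx_kick(struct demo_ctx *ctx)
{
	kthread_queue_work(ctx->worker, &ctx->sync_update_work);
}

static void demo_ctx_term(struct demo_ctx *ctx)
{
	kthread_cancel_work_sync(&ctx->sync_update_work);
	kthread_destroy_worker(ctx->worker);
}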
+ */ + if (!sync_updated && scheduler->sc_power_rails_off) + check_sync_update_after_sc_power_down(kbdev); +#endif + + KBASE_KTRACE_ADD(kbdev, SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END, kctx, 0u); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } static @@ -5647,9 +7270,9 @@ enum kbase_csf_event_callback_action check_group_sync_update_cb(void *param) { struct kbase_context *const kctx = param; - KBASE_KTRACE_ADD(kctx->kbdev, SYNC_UPDATE_EVENT, kctx, 0u); + KBASE_KTRACE_ADD(kctx->kbdev, SCHEDULER_GROUP_SYNC_UPDATE_EVENT, kctx, 0u); - queue_work(kctx->csf.sched.sync_update_wq, + kthread_queue_work(&kctx->csf.sched.sync_update_worker, &kctx->csf.sched.sync_update_work); return KBASE_CSF_EVENT_CALLBACK_KEEP; @@ -5659,6 +7282,15 @@ int kbase_csf_scheduler_context_init(struct kbase_context *kctx) { int priority; int err; + struct kbase_device *kbdev = kctx->kbdev; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + err = gpu_metrics_ctx_init(kctx); + if (err) + return err; +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + + kbase_ctx_sched_init_ctx(kctx); for (priority = 0; priority < KBASE_QUEUE_GROUP_PRIORITY_COUNT; ++priority) { @@ -5670,34 +7302,113 @@ int kbase_csf_scheduler_context_init(struct kbase_context *kctx) kctx->csf.sched.num_idle_wait_grps = 0; kctx->csf.sched.ngrp_to_schedule = 0; - kctx->csf.sched.sync_update_wq = - alloc_ordered_workqueue("mali_kbase_csf_sync_update_wq", - WQ_HIGHPRI); - if (!kctx->csf.sched.sync_update_wq) { + err = kbase_kthread_run_worker_rt(kctx->kbdev, &kctx->csf.sched.sync_update_worker, "csf_sync_update"); + if (err) { dev_err(kctx->kbdev->dev, "Failed to initialize scheduler context workqueue"); - return -ENOMEM; + err = -ENOMEM; + goto alloc_wq_failed; } - INIT_WORK(&kctx->csf.sched.sync_update_work, + kthread_init_work(&kctx->csf.sched.sync_update_work, check_group_sync_update_worker); + kbase_csf_tiler_heap_reclaim_ctx_init(kctx); + err = kbase_csf_event_wait_add(kctx, check_group_sync_update_cb, kctx); if (err) { - dev_err(kctx->kbdev->dev, - "Failed to register a sync update callback"); - destroy_workqueue(kctx->csf.sched.sync_update_wq); + dev_err(kbdev->dev, "Failed to register a sync update callback"); + goto event_wait_add_failed; } return err; + +event_wait_add_failed: + kbase_destroy_kworker_stack(&kctx->csf.sched.sync_update_worker); +alloc_wq_failed: + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + return err; } void kbase_csf_scheduler_context_term(struct kbase_context *kctx) { kbase_csf_event_wait_remove(kctx, check_group_sync_update_cb, kctx); - cancel_work_sync(&kctx->csf.sched.sync_update_work); - destroy_workqueue(kctx->csf.sched.sync_update_wq); + kthread_cancel_work_sync(&kctx->csf.sched.sync_update_work); + kbase_destroy_kworker_stack(&kctx->csf.sched.sync_update_worker); + + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ +} + +static int kbase_csf_scheduler_kthread(void *data) +{ + struct kbase_device *const kbdev = data; + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + while (scheduler->kthread_running) { + struct kbase_queue *queue; + + if (wait_for_completion_interruptible(&scheduler->kthread_signal) != 0) + continue; + reinit_completion(&scheduler->kthread_signal); + + /* Iterate through queues with pending kicks 
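The new submission thread, whose loop body continues below, drains kbdev->csf.pending_gpuq_kicks one entry at a time, always taking the oldest kick from the highest-priority non-empty list while holding pending_gpuq_kicks_lock. A standalone model of just that selection step (singly linked lists in place of the kernel list and lock machinery, made-up names):

/* Build: cc -std=c11 kick_prio_demo.c */
#include <stddef.h>
#include <stdio.h>

#define NUM_PRIORITIES 4	/* e.g. realtime, high, medium, low */

struct kick {
	int queue_id;
	struct kick *next;
};

/* Pop the oldest kick from the highest-priority non-empty list; this is
 * how the submission thread decides which queue to service next.
 */
static struct kick *pop_next_kick(struct kick *pending[NUM_PRIORITIES])
{
	for (int prio = 0; prio < NUM_PRIORITIES; prio++) {
		if (pending[prio]) {
			struct kick *k = pending[prio];

			pending[prio] = k->next;
			k->next = NULL;
			return k;
		}
	}
	return NULL;
}

int main(void)
{
	struct kick low = { .queue_id = 7 };
	struct kick high = { .queue_id = 2 };
	struct kick *pending[NUM_PRIORITIES] = { NULL, &high, NULL, &low };
	struct kick *k;

	while ((k = pop_next_kick(pending)) != NULL)
		printf("servicing queue %d\n", k->queue_id);
	return 0;
}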
*/ + do { + u8 prio; + + spin_lock(&kbdev->csf.pending_gpuq_kicks_lock); + queue = NULL; + for (prio = 0; prio != KBASE_QUEUE_GROUP_PRIORITY_COUNT; ++prio) { + if (!list_empty(&kbdev->csf.pending_gpuq_kicks[prio])) { + queue = list_first_entry( + &kbdev->csf.pending_gpuq_kicks[prio], + struct kbase_queue, pending_kick_link); + list_del_init(&queue->pending_kick_link); + break; + } + } + spin_unlock(&kbdev->csf.pending_gpuq_kicks_lock); + + if (queue != NULL) { + WARN_ONCE( + prio != queue->group_priority, + "Queue %pK has priority %hhu but instead its kick was handled at priority %hhu", + (void *)queue, queue->group_priority, prio); + + kbase_csf_process_queue_kick(queue); + + /* Perform a scheduling tock for high-priority queue groups if + * required. + */ + BUILD_BUG_ON(KBASE_QUEUE_GROUP_PRIORITY_REALTIME != 0); + BUILD_BUG_ON(KBASE_QUEUE_GROUP_PRIORITY_HIGH != 1); + if ((prio <= KBASE_QUEUE_GROUP_PRIORITY_HIGH) && + atomic_read(&scheduler->pending_tock_work)) + schedule_on_tock(kbdev); + } + } while (queue != NULL); + + /* Check if we need to perform a scheduling tick/tock. A tick + * event shall override a tock event but not vice-versa. + */ + if (atomic_cmpxchg(&scheduler->pending_tick_work, true, false) == true) { + atomic_set(&scheduler->pending_tock_work, false); + schedule_on_tick(kbdev); + } else if (atomic_read(&scheduler->pending_tock_work)) { + schedule_on_tock(kbdev); + } + + dev_dbg(kbdev->dev, "Waking up for event after a scheduling iteration."); + wake_up_all(&kbdev->csf.event_wait); + } + + return 0; } int kbase_csf_scheduler_init(struct kbase_device *kbdev) @@ -5716,35 +7427,56 @@ int kbase_csf_scheduler_init(struct kbase_device *kbdev) return -ENOMEM; } - return 0; + init_completion(&scheduler->kthread_signal); + scheduler->kthread_running = true; + scheduler->gpuq_kthread = + kbase_kthread_run_rt(kbdev, &kbase_csf_scheduler_kthread, kbdev, "mali-gpuq-kthread"); + if (IS_ERR(scheduler->gpuq_kthread)) { + kfree(scheduler->csg_slots); + scheduler->csg_slots = NULL; + + dev_err(kbdev->dev, "Failed to spawn the GPU queue submission worker thread"); + return -ENOMEM; + } +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) && !IS_ENABLED(CONFIG_MALI_NO_MALI) + scheduler->gpu_metrics_tb = + kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_GPU_METRICS_BUF_NAME); + if (!scheduler->gpu_metrics_tb) { + scheduler->kthread_running = false; + complete(&scheduler->kthread_signal); + kthread_stop(scheduler->gpuq_kthread); + scheduler->gpuq_kthread = NULL; + + kfree(scheduler->csg_slots); + scheduler->csg_slots = NULL; + + dev_err(kbdev->dev, "Failed to get the handler of gpu_metrics from trace buffer"); + return -ENOENT; + } +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ + + return kbase_csf_mcu_shared_regs_data_init(kbdev); } int kbase_csf_scheduler_early_init(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - scheduler->timer_enabled = true; + atomic_set(&scheduler->timer_enabled, true); - scheduler->wq = alloc_ordered_workqueue("csf_scheduler_wq", WQ_HIGHPRI); - if (!scheduler->wq) { - dev_err(kbdev->dev, "Failed to allocate scheduler workqueue\n"); - return -ENOMEM; - } scheduler->idle_wq = alloc_ordered_workqueue( "csf_scheduler_gpu_idle_wq", WQ_HIGHPRI); if (!scheduler->idle_wq) { - dev_err(kbdev->dev, - "Failed to allocate GPU idle scheduler workqueue\n"); - destroy_workqueue(kbdev->csf.scheduler.wq); + dev_err(kbdev->dev, "Failed to allocate GPU idle scheduler workqueue\n"); return -ENOMEM; } - 
INIT_WORK(&scheduler->tick_work, schedule_on_tick); - INIT_DEFERRABLE_WORK(&scheduler->tock_work, schedule_on_tock); + atomic_set(&scheduler->pending_tick_work, false); + atomic_set(&scheduler->pending_tock_work, false); INIT_DEFERRABLE_WORK(&scheduler->ping_work, firmware_aliveness_monitor); - mutex_init(&scheduler->lock); + rt_mutex_init(&scheduler->lock); spin_lock_init(&scheduler->interrupt_lock); /* Internal lists */ @@ -5756,30 +7488,48 @@ int kbase_csf_scheduler_early_init(struct kbase_device *kbdev) (sizeof(scheduler->csgs_events_enable_mask) * BITS_PER_BYTE)); bitmap_fill(scheduler->csgs_events_enable_mask, MAX_SUPPORTED_CSGS); scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); scheduler->pm_active_count = 0; scheduler->ngrp_to_schedule = 0; scheduler->total_runnable_grps = 0; scheduler->top_ctx = NULL; scheduler->top_grp = NULL; scheduler->last_schedule = 0; - scheduler->tock_pending_request = false; scheduler->active_protm_grp = NULL; scheduler->csg_scheduling_period_ms = CSF_SCHEDULER_TIME_TICK_MS; scheduler_doorbell_init(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + INIT_DEFERRABLE_WORK(&scheduler->gpu_idle_work, gpu_idle_worker); + INIT_WORK(&scheduler->sc_rails_off_work, sc_rails_off_worker); + scheduler->sc_power_rails_off = true; + scheduler->gpu_idle_work_pending = false; + scheduler->gpu_idle_fw_timer_enabled = false; +#else INIT_WORK(&scheduler->gpu_idle_work, gpu_idle_worker); +#endif + scheduler->fast_gpu_idle_handling = false; atomic_set(&scheduler->gpu_no_longer_idle, false); atomic_set(&scheduler->non_idle_offslot_grps, 0); hrtimer_init(&scheduler->tick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); scheduler->tick_timer.function = tick_timer_callback; - scheduler->tick_timer_active = false; + + kbase_csf_tiler_heap_reclaim_mgr_init(kbdev); return 0; } void kbase_csf_scheduler_term(struct kbase_device *kbdev) { + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + if (scheduler->gpuq_kthread) { + scheduler->kthread_running = false; + complete(&scheduler->kthread_signal); + kthread_stop(scheduler->gpuq_kthread); + } + if (kbdev->csf.scheduler.csg_slots) { WARN_ON(atomic_read(&kbdev->csf.scheduler.non_idle_offslot_grps)); /* The unload of Driver can take place only when all contexts have @@ -5788,34 +7538,42 @@ void kbase_csf_scheduler_term(struct kbase_device *kbdev) * to be active at the time of Driver unload. */ WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + flush_work(&kbdev->csf.scheduler.sc_rails_off_work); + flush_delayed_work(&kbdev->csf.scheduler.gpu_idle_work); +#else flush_work(&kbdev->csf.scheduler.gpu_idle_work); - mutex_lock(&kbdev->csf.scheduler.lock); +#endif + rt_mutex_lock(&kbdev->csf.scheduler.lock); if (kbdev->csf.scheduler.state != SCHED_SUSPENDED) { + unsigned long flags; /* The power policy could prevent the Scheduler from * getting suspended when GPU becomes idle. 
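kbase_csf_scheduler_term() above stops the submission thread with a three-step handshake: clear kthread_running, post kthread_signal so the thread notices, then reap it with kthread_stop(). A kernel-style sketch of that shutdown (and the matching start-up), with made-up demo_* names and none of the driver's RT priority handling:

#include <linux/completion.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/types.h>

struct demo_sched {
	struct task_struct *thread;
	struct completion signal;
	bool running;
};

static int demo_thread_fn(void *data)
{
	struct demo_sched *s = data;

	while (s->running) {
		if (wait_for_completion_interruptible(&s->signal) != 0)
			continue;
		reinit_completion(&s->signal);
		/* ... service pending kicks, ticks and tocks ... */
	}
	return 0;
}

static int demo_sched_start(struct demo_sched *s)
{
	init_completion(&s->signal);
	s->running = true;
	s->thread = kthread_run(demo_thread_fn, s, "demo-sched");
	return IS_ERR(s->thread) ? PTR_ERR(s->thread) : 0;
}

static void demo_sched_stop(struct demo_sched *s)
{
	s->running = false;
	complete(&s->signal);		/* wake the thread so it sees running == false */
	kthread_stop(s->thread);	/* reap the thread and collect its exit code */
}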
*/ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbase_pm_idle_groups_sched_suspendable(kbdev)); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); scheduler_suspend(kbdev); } - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); cancel_delayed_work_sync(&kbdev->csf.scheduler.ping_work); - cancel_tick_timer(kbdev); - cancel_work_sync(&kbdev->csf.scheduler.tick_work); - cancel_tock_work(&kbdev->csf.scheduler); - mutex_destroy(&kbdev->csf.scheduler.lock); kfree(kbdev->csf.scheduler.csg_slots); kbdev->csf.scheduler.csg_slots = NULL; } + KBASE_KTRACE_ADD_CSF_GRP(kbdev, CSF_GROUP_TERMINATED, NULL, + kbase_csf_scheduler_get_nr_active_csgs(kbdev)); + /* Terminating the MCU shared regions, following the release of slots */ + kbase_csf_mcu_shared_regs_data_term(kbdev); } void kbase_csf_scheduler_early_term(struct kbase_device *kbdev) { if (kbdev->csf.scheduler.idle_wq) destroy_workqueue(kbdev->csf.scheduler.idle_wq); - if (kbdev->csf.scheduler.wq) - destroy_workqueue(kbdev->csf.scheduler.wq); + + kbase_csf_tiler_heap_reclaim_mgr_term(kbdev); } /** @@ -5834,7 +7592,7 @@ static void scheduler_enable_tick_timer_nolock(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->csf.scheduler.lock); - if (unlikely(!scheduler_timer_is_enabled_nolock(kbdev))) + if (unlikely(!kbase_csf_scheduler_timer_is_enabled(kbdev))) return; WARN_ON((scheduler->state != SCHED_INACTIVE) && @@ -5842,30 +7600,18 @@ static void scheduler_enable_tick_timer_nolock(struct kbase_device *kbdev) (scheduler->state != SCHED_SLEEPING)); if (scheduler->total_runnable_grps > 0) { - enqueue_tick_work(kbdev); + kbase_csf_scheduler_invoke_tick(kbdev); dev_dbg(kbdev->dev, "Re-enabling the scheduler timer\n"); } else if (scheduler->state != SCHED_SUSPENDED) { - enqueue_gpu_idle_work(scheduler); + enqueue_gpu_idle_work(scheduler, 0); } } void kbase_csf_scheduler_enable_tick_timer(struct kbase_device *kbdev) { - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_enable_tick_timer_nolock(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); -} - -bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - bool enabled; - - mutex_lock(&scheduler->lock); - enabled = scheduler_timer_is_enabled_nolock(kbdev); - mutex_unlock(&scheduler->lock); - - return enabled; + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } void kbase_csf_scheduler_timer_set_enabled(struct kbase_device *kbdev, @@ -5874,66 +7620,52 @@ void kbase_csf_scheduler_timer_set_enabled(struct kbase_device *kbdev, struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; bool currently_enabled; - mutex_lock(&scheduler->lock); + /* This lock is taken to prevent this code being executed concurrently + * by userspace. + */ + rt_mutex_lock(&scheduler->lock); - currently_enabled = scheduler_timer_is_enabled_nolock(kbdev); + currently_enabled = kbase_csf_scheduler_timer_is_enabled(kbdev); if (currently_enabled && !enable) { - scheduler->timer_enabled = false; - cancel_tick_timer(kbdev); - cancel_delayed_work(&scheduler->tock_work); - scheduler->tock_pending_request = false; - mutex_unlock(&scheduler->lock); - /* The non-sync version to cancel the normal work item is not - * available, so need to drop the lock before cancellation. 
- */ - cancel_work_sync(&scheduler->tick_work); - return; - } - - if (!currently_enabled && enable) { - scheduler->timer_enabled = true; - - scheduler_enable_tick_timer_nolock(kbdev); + atomic_set(&scheduler->timer_enabled, false); + cancel_tick_work(scheduler); + } else if (!currently_enabled && enable) { + atomic_set(&scheduler->timer_enabled, true); + kbase_csf_scheduler_invoke_tick(kbdev); } - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } void kbase_csf_scheduler_kick(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + if (unlikely(kbase_csf_scheduler_timer_is_enabled(kbdev))) + return; - if (unlikely(scheduler_timer_is_enabled_nolock(kbdev))) - goto out; + /* This lock is taken to prevent this code being executed concurrently + * by userspace. + */ + rt_mutex_lock(&scheduler->lock); - if (scheduler->total_runnable_grps > 0) { - enqueue_tick_work(kbdev); - dev_dbg(kbdev->dev, "Kicking the scheduler manually\n"); - } + kbase_csf_scheduler_invoke_tick(kbdev); + dev_dbg(kbdev->dev, "Kicking the scheduler manually\n"); -out: - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } -int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) +int kbase_csf_scheduler_pm_suspend_no_lock(struct kbase_device *kbdev) { - int result = 0; struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + int result = 0; - /* Cancel any potential queued delayed work(s) */ - cancel_work_sync(&scheduler->tick_work); - cancel_tock_work(scheduler); - - result = kbase_reset_gpu_prevent_and_wait(kbdev); - if (result) { - dev_warn(kbdev->dev, - "Stop PM suspending for failing to prevent gpu reset.\n"); - return result; - } + lockdep_assert_held(&scheduler->lock); - mutex_lock(&scheduler->lock); +#if IS_ENABLED(CONFIG_DEBUG_FS) + if (unlikely(scheduler->state == SCHED_BUSY)) + return -EBUSY; +#endif #ifdef KBASE_PM_RUNTIME /* If scheduler is in sleeping state, then MCU needs to be activated @@ -5954,14 +7686,35 @@ int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) dev_warn(kbdev->dev, "failed to suspend active groups"); goto exit; } else { - dev_info(kbdev->dev, "Scheduler PM suspend"); + dev_dbg(kbdev->dev, "Scheduler PM suspend"); scheduler_suspend(kbdev); - cancel_tick_timer(kbdev); + cancel_tick_work(scheduler); } } exit: - mutex_unlock(&scheduler->lock); + return result; +} + +int kbase_csf_scheduler_pm_suspend(struct kbase_device *kbdev) +{ + int result = 0; + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + + /* Cancel any potential queued delayed work(s) */ + cancel_tick_work(scheduler); + cancel_tock_work(scheduler); + + result = kbase_reset_gpu_prevent_and_wait(kbdev); + if (result) { + dev_warn(kbdev->dev, "Stop PM suspending for failing to prevent gpu reset.\n"); + return result; + } + + rt_mutex_lock(&scheduler->lock); + + result = kbase_csf_scheduler_pm_suspend_no_lock(kbdev); + rt_mutex_unlock(&scheduler->lock); kbase_reset_gpu_allow(kbdev); @@ -5969,17 +7722,23 @@ exit: } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_suspend); -void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev) +void kbase_csf_scheduler_pm_resume_no_lock(struct kbase_device *kbdev) { struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + lockdep_assert_held(&scheduler->lock); if ((scheduler->total_runnable_grps > 0) && (scheduler->state == SCHED_SUSPENDED)) { - dev_info(kbdev->dev, "Scheduler PM resume"); + dev_dbg(kbdev->dev, 
"Scheduler PM resume"); scheduler_wakeup(kbdev, true); } - mutex_unlock(&scheduler->lock); +} + +void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev) +{ + rt_mutex_lock(&kbdev->csf.scheduler.lock); + kbase_csf_scheduler_pm_resume_no_lock(kbdev); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_resume); @@ -5989,10 +7748,10 @@ void kbase_csf_scheduler_pm_active(struct kbase_device *kbdev) * callback function, which may need to wake up the MCU for suspending * the CSGs before powering down the GPU. */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_pm_active_handle_suspend(kbdev, KBASE_PM_SUSPEND_HANDLER_NOT_POSSIBLE); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_active); @@ -6001,13 +7760,13 @@ void kbase_csf_scheduler_pm_idle(struct kbase_device *kbdev) /* Here the lock is taken just to maintain symmetry with * kbase_csf_scheduler_pm_active(). */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); scheduler_pm_idle(kbdev); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } KBASE_EXPORT_TEST_API(kbase_csf_scheduler_pm_idle); -int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) +static int scheduler_wait_mcu_active(struct kbase_device *kbdev, bool killable_wait) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; unsigned long flags; @@ -6020,9 +7779,17 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); kbase_pm_unlock(kbdev); - kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (killable_wait) + err = kbase_pm_killable_wait_for_poweroff_work_complete(kbdev); + else + err = kbase_pm_wait_for_poweroff_work_complete(kbdev); + if (err) + return err; - err = kbase_pm_wait_for_desired_state(kbdev); + if (killable_wait) + err = kbase_pm_killable_wait_for_desired_state(kbdev); + else + err = kbase_pm_wait_for_desired_state(kbdev); if (!err) { spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbdev->pm.backend.mcu_state != KBASE_MCU_ON); @@ -6031,6 +7798,17 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) return err; } + +int kbase_csf_scheduler_killable_wait_mcu_active(struct kbase_device *kbdev) +{ + return scheduler_wait_mcu_active(kbdev, true); +} + +int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev) +{ + return scheduler_wait_mcu_active(kbdev, false); +} + KBASE_EXPORT_TEST_API(kbase_csf_scheduler_wait_mcu_active); #ifdef KBASE_PM_RUNTIME @@ -6066,6 +7844,7 @@ int kbase_csf_scheduler_handle_runtime_suspend(struct kbase_device *kbdev) } scheduler->state = SCHED_SUSPENDED; + KBASE_KTRACE_ADD(kbdev, SCHED_SUSPENDED, NULL, scheduler->state); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); kbdev->pm.backend.gpu_sleep_mode_active = false; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -6107,11 +7886,10 @@ void kbase_csf_scheduler_force_sleep(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); - if (kbase_pm_gpu_sleep_allowed(kbdev) && - (scheduler->state == SCHED_INACTIVE)) + rt_mutex_lock(&scheduler->lock); + if (kbase_pm_gpu_sleep_allowed(kbdev) && (scheduler->state == SCHED_INACTIVE)) scheduler_sleep_on_idle(kbdev); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); 
} #endif @@ -6119,7 +7897,7 @@ void kbase_csf_scheduler_force_wakeup(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - mutex_lock(&scheduler->lock); + rt_mutex_lock(&scheduler->lock); scheduler_wakeup(kbdev, true); - mutex_unlock(&scheduler->lock); + rt_mutex_unlock(&scheduler->lock); } diff --git a/mali_kbase/csf/mali_kbase_csf_scheduler.h b/mali_kbase/csf/mali_kbase_csf_scheduler.h index a00a9ca..88521f0 100644 --- a/mali_kbase/csf/mali_kbase_csf_scheduler.h +++ b/mali_kbase/csf/mali_kbase_csf_scheduler.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -36,7 +36,9 @@ * If the CSG is already scheduled and resident, the CSI will be started * right away, otherwise once the group is made resident. * - * Return: 0 on success, or negative on failure. + * Return: 0 on success, or negative on failure. -EBUSY is returned to + * indicate to the caller that queue could not be enabled due to Scheduler + * state and the caller can try to enable the queue after sometime. */ int kbase_csf_scheduler_queue_start(struct kbase_queue *queue); @@ -274,7 +276,7 @@ int kbase_csf_scheduler_group_copy_suspend_buf(struct kbase_queue_group *group, */ static inline void kbase_csf_scheduler_lock(struct kbase_device *kbdev) { - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); } /** @@ -284,7 +286,7 @@ static inline void kbase_csf_scheduler_lock(struct kbase_device *kbdev) */ static inline void kbase_csf_scheduler_unlock(struct kbase_device *kbdev) { - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); } /** @@ -336,7 +338,10 @@ kbase_csf_scheduler_spin_lock_assert_held(struct kbase_device *kbdev) * * Return: true if the scheduler is configured to wake up periodically */ -bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev); +static inline bool kbase_csf_scheduler_timer_is_enabled(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.scheduler.timer_enabled); +} /** * kbase_csf_scheduler_timer_set_enabled() - Enable/disable periodic @@ -410,6 +415,33 @@ void kbase_csf_scheduler_pm_idle(struct kbase_device *kbdev); int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev); /** + * kbase_csf_scheduler_killable_wait_mcu_active - Wait for the MCU to actually become + * active in killable state. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function is same as kbase_csf_scheduler_wait_mcu_active(), expect that + * it would allow the SIGKILL signal to interrupt the wait. + * This function is supposed to be called from the code that is executed in ioctl or + * Userspace context, wherever it is safe to do so. + * + * Return: 0 if the MCU was successfully activated, or -ETIMEDOUT code on timeout error or + * -ERESTARTSYS if the wait was interrupted. + */ +int kbase_csf_scheduler_killable_wait_mcu_active(struct kbase_device *kbdev); + +/** + * kbase_csf_scheduler_pm_resume_no_lock - Reactivate the scheduler on system resume + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * + * This function will make the scheduler resume the scheduling of queue groups + * and take the power managemenet reference, if there are any runnable groups. + * The caller must have acquired the global Scheduler lock. + */ +void kbase_csf_scheduler_pm_resume_no_lock(struct kbase_device *kbdev); + +/** * kbase_csf_scheduler_pm_resume - Reactivate the scheduler on system resume * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -420,6 +452,19 @@ int kbase_csf_scheduler_wait_mcu_active(struct kbase_device *kbdev); void kbase_csf_scheduler_pm_resume(struct kbase_device *kbdev); /** + * kbase_csf_scheduler_pm_suspend_no_lock - Idle the scheduler on system suspend + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function will make the scheduler suspend all the running queue groups + * and drop its power managemenet reference. + * The caller must have acquired the global Scheduler lock. + * + * Return: 0 on success. + */ +int kbase_csf_scheduler_pm_suspend_no_lock(struct kbase_device *kbdev); + +/** * kbase_csf_scheduler_pm_suspend - Idle the scheduler on system suspend * * @kbdev: Instance of a GPU platform device that implements a CSF interface. @@ -448,68 +493,44 @@ static inline bool kbase_csf_scheduler_all_csgs_idle(struct kbase_device *kbdev) } /** - * kbase_csf_scheduler_advance_tick_nolock() - Advance the scheduling tick + * kbase_csf_scheduler_invoke_tick() - Invoke the scheduling tick * * @kbdev: Pointer to the device * - * This function advances the scheduling tick by enqueing the tick work item for - * immediate execution, but only if the tick hrtimer is active. If the timer - * is inactive then the tick work item is already in flight. - * The caller must hold the interrupt lock. - */ -static inline void -kbase_csf_scheduler_advance_tick_nolock(struct kbase_device *kbdev) -{ - struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - - lockdep_assert_held(&scheduler->interrupt_lock); - - if (scheduler->tick_timer_active) { - KBASE_KTRACE_ADD(kbdev, SCHEDULER_ADVANCE_TICK, NULL, 0u); - scheduler->tick_timer_active = false; - queue_work(scheduler->wq, &scheduler->tick_work); - } else { - KBASE_KTRACE_ADD(kbdev, SCHEDULER_NOADVANCE_TICK, NULL, 0u); - } -} - -/** - * kbase_csf_scheduler_advance_tick() - Advance the scheduling tick - * - * @kbdev: Pointer to the device + * This function wakes up kbase_csf_scheduler_kthread() to perform a scheduling + * tick regardless of whether the tick timer is enabled. This can be called + * from interrupt context to resume the scheduling after GPU was put to sleep. * - * This function advances the scheduling tick by enqueing the tick work item for - * immediate execution, but only if the tick hrtimer is active. If the timer - * is inactive then the tick work item is already in flight. + * Caller is expected to check kbase_csf_scheduler.timer_enabled as required + * to see whether it is appropriate before calling this function. 
*/ -static inline void kbase_csf_scheduler_advance_tick(struct kbase_device *kbdev) +static inline void kbase_csf_scheduler_invoke_tick(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - kbase_csf_scheduler_advance_tick_nolock(kbdev); - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TICK_INVOKE, NULL, 0u); + if (atomic_cmpxchg(&scheduler->pending_tick_work, false, true) == false) + complete(&scheduler->kthread_signal); } /** - * kbase_csf_scheduler_invoke_tick() - Invoke the scheduling tick + * kbase_csf_scheduler_invoke_tock() - Invoke the scheduling tock * * @kbdev: Pointer to the device * - * This function will queue the scheduling tick work item for immediate - * execution if tick timer is not active. This can be called from interrupt - * context to resume the scheduling after GPU was put to sleep. + * This function wakes up kbase_csf_scheduler_kthread() to perform a scheduling + * tock. + * + * Caller is expected to check kbase_csf_scheduler.timer_enabled as required + * to see whether it is appropriate before calling this function. */ -static inline void kbase_csf_scheduler_invoke_tick(struct kbase_device *kbdev) +static inline void kbase_csf_scheduler_invoke_tock(struct kbase_device *kbdev) { struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; - unsigned long flags; - spin_lock_irqsave(&scheduler->interrupt_lock, flags); - if (!scheduler->tick_timer_active) - queue_work(scheduler->wq, &scheduler->tick_work); - spin_unlock_irqrestore(&scheduler->interrupt_lock, flags); + KBASE_KTRACE_ADD(kbdev, SCHEDULER_TOCK_INVOKE, NULL, 0u); + if (atomic_cmpxchg(&scheduler->pending_tock_work, false, true) == false) + complete(&scheduler->kthread_signal); } /** @@ -570,15 +591,6 @@ int kbase_csf_scheduler_handle_runtime_suspend(struct kbase_device *kbdev); #endif /** - * kbase_csf_scheduler_process_gpu_idle_event() - Process GPU idle IRQ - * - * @kbdev: Pointer to the device - * - * This function is called when a GPU idle IRQ has been raised. - */ -void kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev); - -/** * kbase_csf_scheduler_get_nr_active_csgs() - Get the number of active CSGs * * @kbdev: Pointer to the device @@ -634,4 +646,28 @@ void kbase_csf_scheduler_force_wakeup(struct kbase_device *kbdev); void kbase_csf_scheduler_force_sleep(struct kbase_device *kbdev); #endif +/** + * kbase_csf_scheduler_process_gpu_idle_event() - Process GPU idle event + * + * @kbdev: Pointer to the device + * + * This function is called when a IRQ for GPU idle event has been raised. + * + * Return: true if the GPU idle event can be acknowledged. + */ +bool kbase_csf_scheduler_process_gpu_idle_event(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * turn_on_sc_power_rails - Turn on the shader core power rails. + * + * @kbdev: Pointer to the device. + * + * This function is called to synchronously turn on the shader core power rails, + * before execution is resumed on the cores. 
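The new inline helpers above coalesce tick/tock requests: only the caller that flips the pending flag from false to true completes kthread_signal, and the scheduler thread later claims the flag back with another compare-and-swap, letting a due tick absorb any pending tock but never the other way round. A standalone model of that handshake using C11 atomics, with a counter standing in for the completion:

/* Build: cc -std=c11 tick_tock_demo.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_bool pending_tick, pending_tock;
static int wakeups;	/* stands in for complete(&kthread_signal) */

static void invoke_tick(void)
{
	bool expected = false;

	/* Only the caller that flips the flag wakes the thread, so
	 * repeated requests coalesce into a single tick.
	 */
	if (atomic_compare_exchange_strong(&pending_tick, &expected, true))
		wakeups++;
}

static void dispatch(void)
{
	bool due = true;

	if (atomic_compare_exchange_strong(&pending_tick, &due, false)) {
		atomic_store(&pending_tock, false);	/* a tick covers any pending tock */
		printf("full tick\n");
	} else if (atomic_load(&pending_tock)) {
		printf("tock only\n");
	}
}

int main(void)
{
	atomic_store(&pending_tock, true);
	invoke_tick();
	invoke_tick();			/* coalesced */
	printf("wakeups: %d\n", wakeups);
	dispatch();			/* tick runs, pending tock absorbed */
	dispatch();			/* nothing left to do */
	return 0;
}

Running this prints "wakeups: 1" followed by "full tick", matching the coalescing and tick-over-tock behaviour described in the scheduler thread's comments.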
+ * + * scheduler lock must be held when calling this function + */ +void turn_on_sc_power_rails(struct kbase_device *kbdev); +#endif #endif /* _KBASE_CSF_SCHEDULER_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c new file mode 100644 index 0000000..0615d5f --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.c @@ -0,0 +1,878 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include "mali_kbase_csf_sync_debugfs.h" +#include "mali_kbase_csf_csg_debugfs.h" +#include <mali_kbase.h> +#include <linux/seq_file.h> +#include <linux/version_compat_defs.h> + +#if IS_ENABLED(CONFIG_SYNC_FILE) +#include "mali_kbase_sync.h" +#endif + +#define CQS_UNREADABLE_LIVE_VALUE "(unavailable)" + +#define CSF_SYNC_DUMP_SIZE 256 + +/** + * kbasep_print() - Helper function to print to either debugfs file or dmesg. + * + * @kctx: The kbase context + * @file: The seq_file for printing to. This is NULL if printing to dmesg. + * @fmt: The message to print. + * @...: Arguments to format the message. + */ +__attribute__((format(__printf__, 3, 4))) static void +kbasep_print(struct kbase_context *kctx, struct seq_file *file, const char *fmt, ...) +{ + int len = 0; + char buffer[CSF_SYNC_DUMP_SIZE]; + va_list arglist; + + va_start(arglist, fmt); + len = vsnprintf(buffer, CSF_SYNC_DUMP_SIZE, fmt, arglist); + if (len <= 0) { + pr_err("message write to the buffer failed"); + goto exit; + } + + if (file) + seq_printf(file, buffer); + else + dev_warn(kctx->kbdev->dev, buffer); + +exit: + va_end(arglist); +} + +/** + * kbasep_csf_debugfs_get_cqs_live_u32() - Obtain live (u32) value for a CQS object. + * + * @kctx: The context of the queue. + * @obj_addr: Pointer to the CQS live 32-bit value. + * @live_val: Pointer to the u32 that will be set to the CQS object's current, live + * value. + * + * Return: 0 if successful or a negative error code on failure. + */ +static int kbasep_csf_debugfs_get_cqs_live_u32(struct kbase_context *kctx, u64 obj_addr, + u32 *live_val) +{ + struct kbase_vmap_struct *mapping; + u32 *const cpu_ptr = (u32 *)kbase_phy_alloc_mapping_get(kctx, obj_addr, &mapping); + + if (!cpu_ptr) + return -1; + + *live_val = *cpu_ptr; + kbase_phy_alloc_mapping_put(kctx, mapping); + return 0; +} + +/** + * kbasep_csf_debugfs_get_cqs_live_u64() - Obtain live (u64) value for a CQS object. + * + * @kctx: The context of the queue. + * @obj_addr: Pointer to the CQS live value (32 or 64-bit). + * @live_val: Pointer to the u64 that will be set to the CQS object's current, live + * value. + * + * Return: 0 if successful or a negative error code on failure. 
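kbasep_print() above formats once into a fixed CSF_SYNC_DUMP_SIZE buffer with vsnprintf() and then routes the result either to the debugfs seq_file or to dmesg. A standalone sketch of the same dual-sink idea, using a FILE pointer and stderr in place of seq_file and dev_warn():

/* Build: cc -std=c11 print_route_demo.c */
#include <stdarg.h>
#include <stdio.h>

#define DUMP_SIZE 256

__attribute__((format(printf, 2, 3)))
static void demo_print(FILE *dump_file, const char *fmt, ...)
{
	char buffer[DUMP_SIZE];
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buffer, sizeof(buffer), fmt, args);
	va_end(args);

	if (len <= 0)
		return;

	/* Emit the already-formatted text to whichever sink is available. */
	fputs(buffer, dump_file ? dump_file : stderr);
}

int main(void)
{
	demo_print(NULL, "queue:KCPU-%d-%d exec:%c\n", 3, 0, 'P');
	return 0;
}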
+ */ +static int kbasep_csf_debugfs_get_cqs_live_u64(struct kbase_context *kctx, u64 obj_addr, + u64 *live_val) +{ + struct kbase_vmap_struct *mapping; + u64 *cpu_ptr = (u64 *)kbase_phy_alloc_mapping_get(kctx, obj_addr, &mapping); + + if (!cpu_ptr) + return -1; + + *live_val = *cpu_ptr; + kbase_phy_alloc_mapping_put(kctx, mapping); + return 0; +} + +/** + * kbasep_csf_sync_print_kcpu_fence_wait_or_signal() - Print details of a CSF SYNC Fence Wait + * or Fence Signal command, contained in a + * KCPU queue. + * + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + * @cmd_name: The name of the command: indicates either a fence SIGNAL or WAIT. + */ +static void kbasep_csf_sync_print_kcpu_fence_wait_or_signal(char *buffer, int *length, + struct kbase_kcpu_command *cmd, + const char *cmd_name) +{ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence *fence = NULL; +#else + struct dma_fence *fence = NULL; +#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ + struct kbase_kcpu_command_fence_info *fence_info; + struct kbase_sync_fence_info info; + const char *timeline_name = NULL; + bool is_signaled = false; + + fence_info = &cmd->info.fence; + if (kbase_kcpu_command_fence_has_force_signaled(fence_info)) + return; + + fence = kbase_fence_get(fence_info); + if (WARN_ON(!fence)) + return; + + kbase_sync_fence_info_get(fence, &info); + timeline_name = fence->ops->get_timeline_name(fence); + is_signaled = info.status > 0; + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:%s obj:0x%pK live_value:0x%.8x | ", cmd_name, fence, is_signaled); + + /* Note: fence->seqno was u32 until 5.1 kernel, then u64 */ + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "timeline_name:%s timeline_context:0x%.16llx fence_seqno:0x%.16llx", + timeline_name, fence->context, (u64)fence->seqno); + + kbase_fence_put(fence); +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_wait() - Print details of a CSF SYNC CQS Wait command, + * contained in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_wait(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_wait.nr_objs; i++) { + struct base_cqs_wait_info *cqs_obj = &cmd->info.cqs_wait.objs[i]; + + u32 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u32(kctx, cqs_obj->addr, &live_val); + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_WAIT_OPERATION obj:0x%.16llx live_value:", cqs_obj->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", (u64)live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:gt arg_value:0x%.8x", cqs_obj->val); + } +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_set() - Print details of a CSF SYNC CQS + * Set command, contained in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. 
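The per-command printers that follow all build one output line by chaining snprintf() calls and advancing a running length, as in "*length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, ...)". A small standalone version of that append pattern, with a guard so the offset never walks past the end of the buffer:

/* Build: cc -std=c11 snprintf_chain_demo.c */
#include <stdio.h>

#define DUMP_SIZE 256

/* Append one formatted field and return the new running length. */
static int append(char *buf, int len, const char *fmt, unsigned long long v)
{
	if (len < 0 || len >= DUMP_SIZE)
		return len;	/* buffer already full, stop writing */
	return len + snprintf(buf + len, DUMP_SIZE - len, fmt, v);
}

int main(void)
{
	char line[DUMP_SIZE];
	int len = 0;

	len = append(line, len, "obj:0x%.16llx ", 0x41000ULL);
	len = append(line, len, "live_value:0x%.16llx", 5ULL);
	puts(line);
	return 0;
}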
+ */ +static void kbasep_csf_sync_print_kcpu_cqs_set(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_set.nr_objs; i++) { + struct base_cqs_set *cqs_obj = &cmd->info.cqs_set.objs[i]; + + u32 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u32(kctx, cqs_obj->addr, &live_val); + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_SET_OPERATION obj:0x%.16llx live_value:", cqs_obj->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", (u64)live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:add arg_value:0x%.8x", 1); + } +} + +/** + * kbasep_csf_sync_get_wait_op_name() - Print the name of a CQS Wait Operation. + * + * @op: The numerical value of operation. + * + * Return: const static pointer to the command name, or '??' if unknown. + */ +static const char *kbasep_csf_sync_get_wait_op_name(basep_cqs_wait_operation_op op) +{ + const char *string; + + switch (op) { + case BASEP_CQS_WAIT_OPERATION_LE: + string = "le"; + break; + case BASEP_CQS_WAIT_OPERATION_GT: + string = "gt"; + break; + default: + string = "??"; + break; + } + return string; +} + +/** + * kbasep_csf_sync_get_set_op_name() - Print the name of a CQS Set Operation. + * + * @op: The numerical value of operation. + * + * Return: const static pointer to the command name, or '??' if unknown. + */ +static const char *kbasep_csf_sync_get_set_op_name(basep_cqs_set_operation_op op) +{ + const char *string; + + switch (op) { + case BASEP_CQS_SET_OPERATION_ADD: + string = "add"; + break; + case BASEP_CQS_SET_OPERATION_SET: + string = "set"; + break; + default: + string = "???"; + break; + } + return string; +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_wait_op() - Print details of a CSF SYNC CQS + * Wait Operation command, contained + * in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. + * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_wait_op(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_wait.nr_objs; i++) { + struct base_cqs_wait_operation_info *wait_op = + &cmd->info.cqs_wait_operation.objs[i]; + const char *op_name = kbasep_csf_sync_get_wait_op_name(wait_op->operation); + + u64 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u64(kctx, wait_op->addr, &live_val); + + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_WAIT_OPERATION obj:0x%.16llx live_value:", wait_op->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:%s arg_value:0x%.16llx", op_name, wait_op->val); + } +} + +/** + * kbasep_csf_sync_print_kcpu_cqs_set_op() - Print details of a CSF SYNC CQS + * Set Operation command, contained + * in a KCPU queue. + * + * @kctx: The kbase context. + * @buffer: The buffer to write to. + * @length: The length of text in the buffer. 
+ * @cmd: The KCPU Command to be printed. + */ +static void kbasep_csf_sync_print_kcpu_cqs_set_op(struct kbase_context *kctx, char *buffer, + int *length, struct kbase_kcpu_command *cmd) +{ + size_t i; + + for (i = 0; i < cmd->info.cqs_set_operation.nr_objs; i++) { + struct base_cqs_set_operation_info *set_op = &cmd->info.cqs_set_operation.objs[i]; + const char *op_name = kbasep_csf_sync_get_set_op_name( + (basep_cqs_set_operation_op)set_op->operation); + + u64 live_val; + int ret = kbasep_csf_debugfs_get_cqs_live_u64(kctx, set_op->addr, &live_val); + + bool live_val_valid = (ret >= 0); + + *length += + snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "cmd:CQS_SET_OPERATION obj:0x%.16llx live_value:", set_op->addr); + + if (live_val_valid) + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + "0x%.16llx", live_val); + else + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + CQS_UNREADABLE_LIVE_VALUE); + + *length += snprintf(buffer + *length, CSF_SYNC_DUMP_SIZE - *length, + " | op:%s arg_value:0x%.16llx", op_name, set_op->val); + } +} + +/** + * kbasep_csf_sync_kcpu_debugfs_print_queue() - Print debug data for a KCPU queue + * + * @kctx: The kbase context. + * @file: The seq_file to print to. + * @queue: Pointer to the KCPU queue. + */ +static void kbasep_csf_sync_kcpu_debugfs_print_queue(struct kbase_context *kctx, + struct seq_file *file, + struct kbase_kcpu_command_queue *queue) +{ + char started_or_pending; + struct kbase_kcpu_command *cmd; + size_t i; + + if (WARN_ON(!queue)) + return; + + lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + mutex_lock(&queue->lock); + + for (i = 0; i != queue->num_pending_cmds; ++i) { + char buffer[CSF_SYNC_DUMP_SIZE]; + int length = 0; + started_or_pending = ((i == 0) && queue->command_started) ? 
'S' : 'P'; + length += snprintf(buffer, CSF_SYNC_DUMP_SIZE, "queue:KCPU-%d-%d exec:%c ", + kctx->id, queue->id, started_or_pending); + + cmd = &queue->commands[(u8)(queue->start_offset + i)]; + switch (cmd->type) { +#if IS_ENABLED(CONFIG_SYNC_FILE) + case BASE_KCPU_COMMAND_TYPE_FENCE_SIGNAL: + kbasep_csf_sync_print_kcpu_fence_wait_or_signal(buffer, &length, cmd, + "FENCE_SIGNAL"); + break; + case BASE_KCPU_COMMAND_TYPE_FENCE_WAIT: + kbasep_csf_sync_print_kcpu_fence_wait_or_signal(buffer, &length, cmd, + "FENCE_WAIT"); + break; +#endif + case BASE_KCPU_COMMAND_TYPE_CQS_WAIT: + kbasep_csf_sync_print_kcpu_cqs_wait(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_SET: + kbasep_csf_sync_print_kcpu_cqs_set(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_WAIT_OPERATION: + kbasep_csf_sync_print_kcpu_cqs_wait_op(kctx, buffer, &length, cmd); + break; + case BASE_KCPU_COMMAND_TYPE_CQS_SET_OPERATION: + kbasep_csf_sync_print_kcpu_cqs_set_op(kctx, buffer, &length, cmd); + break; + default: + length += snprintf(buffer + length, CSF_SYNC_DUMP_SIZE - length, + ", U, Unknown blocking command"); + break; + } + + length += snprintf(buffer + length, CSF_SYNC_DUMP_SIZE - length, "\n"); + kbasep_print(kctx, file, buffer); + } + + mutex_unlock(&queue->lock); +} + +int kbasep_csf_sync_kcpu_dump_locked(struct kbase_context *kctx, struct seq_file *file) +{ + unsigned long queue_idx; + + lockdep_assert_held(&kctx->csf.kcpu_queues.lock); + + kbasep_print(kctx, file, "KCPU queues for ctx %d:\n", kctx->id); + + queue_idx = find_first_bit(kctx->csf.kcpu_queues.in_use, KBASEP_MAX_KCPU_QUEUES); + + while (queue_idx < KBASEP_MAX_KCPU_QUEUES) { + kbasep_csf_sync_kcpu_debugfs_print_queue(kctx, file, + kctx->csf.kcpu_queues.array[queue_idx]); + + queue_idx = find_next_bit(kctx->csf.kcpu_queues.in_use, KBASEP_MAX_KCPU_QUEUES, + queue_idx + 1); + } + + return 0; +} + +int kbasep_csf_sync_kcpu_dump(struct kbase_context *kctx, struct seq_file *file) +{ + mutex_lock(&kctx->csf.kcpu_queues.lock); + kbasep_csf_sync_kcpu_dump_locked(kctx, file); + mutex_unlock(&kctx->csf.kcpu_queues.lock); + return 0; +} + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/* GPU queue related values */ +#define GPU_CSF_MOVE_OPCODE ((u64)0x1) +#define GPU_CSF_MOVE32_OPCODE ((u64)0x2) +#define GPU_CSF_SYNC_ADD_OPCODE ((u64)0x25) +#define GPU_CSF_SYNC_SET_OPCODE ((u64)0x26) +#define GPU_CSF_SYNC_WAIT_OPCODE ((u64)0x27) +#define GPU_CSF_SYNC_ADD64_OPCODE ((u64)0x33) +#define GPU_CSF_SYNC_SET64_OPCODE ((u64)0x34) +#define GPU_CSF_SYNC_WAIT64_OPCODE ((u64)0x35) +#define GPU_CSF_CALL_OPCODE ((u64)0x20) + +#define MAX_NR_GPU_CALLS (5) +#define INSTR_OPCODE_MASK ((u64)0xFF << 56) +#define INSTR_OPCODE_GET(value) ((value & INSTR_OPCODE_MASK) >> 56) +#define MOVE32_IMM_MASK ((u64)0xFFFFFFFFFUL) +#define MOVE_DEST_MASK ((u64)0xFF << 48) +#define MOVE_DEST_GET(value) ((value & MOVE_DEST_MASK) >> 48) +#define MOVE_IMM_MASK ((u64)0xFFFFFFFFFFFFUL) +#define SYNC_SRC0_MASK ((u64)0xFF << 40) +#define SYNC_SRC1_MASK ((u64)0xFF << 32) +#define SYNC_SRC0_GET(value) (u8)((value & SYNC_SRC0_MASK) >> 40) +#define SYNC_SRC1_GET(value) (u8)((value & SYNC_SRC1_MASK) >> 32) +#define SYNC_WAIT_CONDITION_MASK ((u64)0xF << 28) +#define SYNC_WAIT_CONDITION_GET(value) (u8)((value & SYNC_WAIT_CONDITION_MASK) >> 28) + +/* Enumeration for types of GPU queue sync events for + * the purpose of dumping them through debugfs. 
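The INSTR_OPCODE_GET()/SYNC_SRC*_GET() macros above pull fixed-width fields out of a 64-bit CSF instruction word with shifts and masks: the opcode sits in bits 63..56, the MOVE destination in 55..48, the SYNC source registers in 47..40 and 39..32, and the wait condition in 31..28. A standalone illustration of the same decoding (the instruction word below is made up, not a captured one):

/* Build: cc -std=c11 csf_decode_demo.c */
#include <stdint.h>
#include <stdio.h>

static inline uint8_t field8(uint64_t instr, unsigned int shift)
{
	return (uint8_t)((instr >> shift) & 0xFF);
}

int main(void)
{
	/* opcode in bits 63..56, SRC0 in 47..40, SRC1 in 39..32 */
	uint64_t instr = ((uint64_t)0x27 << 56) |	/* SYNC_WAIT opcode */
			 ((uint64_t)0x10 << 40) |	/* register holding the CQS address */
			 ((uint64_t)0x11 << 32);	/* register holding the compare value */

	printf("opcode=0x%02x src0=r%u src1=r%u\n",
	       field8(instr, 56), field8(instr, 40), field8(instr, 32));
	return 0;
}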
+ */ +enum debugfs_gpu_sync_type { + DEBUGFS_GPU_SYNC_WAIT, + DEBUGFS_GPU_SYNC_SET, + DEBUGFS_GPU_SYNC_ADD, + NUM_DEBUGFS_GPU_SYNC_TYPES +}; + +/** + * kbasep_csf_get_move_immediate_value() - Get the immediate values for sync operations + * from a MOVE instruction. + * + * @move_cmd: Raw MOVE instruction. + * @sync_addr_reg: Register identifier from SYNC_* instruction. + * @compare_val_reg: Register identifier from SYNC_* instruction. + * @sync_val: Pointer to store CQS object address for sync operation. + * @compare_val: Pointer to store compare value for sync operation. + * + * Return: True if value is obtained by checking for correct register identifier, + * or false otherwise. + */ +static bool kbasep_csf_get_move_immediate_value(u64 move_cmd, u64 sync_addr_reg, + u64 compare_val_reg, u64 *sync_val, + u64 *compare_val) +{ + u64 imm_mask; + + /* Verify MOVE instruction and get immediate mask */ + if (INSTR_OPCODE_GET(move_cmd) == GPU_CSF_MOVE32_OPCODE) + imm_mask = MOVE32_IMM_MASK; + else if (INSTR_OPCODE_GET(move_cmd) == GPU_CSF_MOVE_OPCODE) + imm_mask = MOVE_IMM_MASK; + else + /* Error return */ + return false; + + /* Verify value from MOVE instruction and assign to variable */ + if (sync_addr_reg == MOVE_DEST_GET(move_cmd)) + *sync_val = move_cmd & imm_mask; + else if (compare_val_reg == MOVE_DEST_GET(move_cmd)) + *compare_val = move_cmd & imm_mask; + else + /* Error return */ + return false; + + return true; +} + +/** kbasep_csf_read_ringbuffer_value() - Reads a u64 from the ringbuffer at a provided + * offset. + * + * @queue: Pointer to the queue. + * @ringbuff_offset: Ringbuffer offset. + * + * Return: the u64 in the ringbuffer at the desired offset. + */ +static u64 kbasep_csf_read_ringbuffer_value(struct kbase_queue *queue, u32 ringbuff_offset) +{ + u64 page_off = ringbuff_offset >> PAGE_SHIFT; + u64 offset_within_page = ringbuff_offset & ~PAGE_MASK; + struct page *page = as_page(queue->queue_reg->gpu_alloc->pages[page_off]); + u64 *ringbuffer = vmap(&page, 1, VM_MAP, pgprot_noncached(PAGE_KERNEL)); + u64 value; + + if (!ringbuffer) { + struct kbase_context *kctx = queue->kctx; + + dev_err(kctx->kbdev->dev, "%s failed to map the buffer page for read a command!", + __func__); + /* Return an alternative 0 for dumpping operation*/ + value = 0; + } else { + value = ringbuffer[offset_within_page / sizeof(u64)]; + vunmap(ringbuffer); + } + + return value; +} + +/** + * kbasep_csf_print_gpu_sync_op() - Print sync operation info for given sync command. + * + * @file: Pointer to debugfs seq_file file struct for writing output. + * @kctx: Pointer to kbase context. + * @queue: Pointer to the GPU command queue. + * @ringbuff_offset: Offset to index the ring buffer with, for the given sync command. + * (Useful for finding preceding MOVE commands) + * @sync_cmd: Entire u64 of the sync command, which has both sync address and + * comparison-value encoded in it. + * @type: Type of GPU sync command (e.g. SYNC_SET, SYNC_ADD, SYNC_WAIT). + * @is_64bit: Bool to indicate if operation is 64 bit (true) or 32 bit (false). + * @follows_wait: Bool to indicate if the operation follows at least one wait + * operation. Used to determine whether it's pending or started. 
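kbasep_csf_read_ringbuffer_value() below maps a single backing page of the ring buffer, so a byte offset is first split into a page index (offset >> PAGE_SHIFT) and an offset within that page before indexing the mapped u64 array. The same arithmetic in standalone form, assuming 4 KiB pages purely for the example:

/* Build: cc -std=c11 ring_page_demo.c */
#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_SHIFT 12u
#define DEMO_PAGE_SIZE  (1u << DEMO_PAGE_SHIFT)

int main(void)
{
	uint32_t ringbuff_offset = 0x1A38;	/* arbitrary example offset */
	uint32_t page_idx = ringbuff_offset >> DEMO_PAGE_SHIFT;
	uint32_t in_page = ringbuff_offset & (DEMO_PAGE_SIZE - 1);

	printf("page %u, byte %u, u64 slot %u\n",
	       page_idx, in_page, (unsigned int)(in_page / sizeof(uint64_t)));
	return 0;
}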
+ */ +static void kbasep_csf_print_gpu_sync_op(struct seq_file *file, struct kbase_context *kctx, + struct kbase_queue *queue, u32 ringbuff_offset, + u64 sync_cmd, enum debugfs_gpu_sync_type type, + bool is_64bit, bool follows_wait) +{ + u64 sync_addr = 0, compare_val = 0, live_val = 0; + u64 move_cmd; + u8 sync_addr_reg, compare_val_reg, wait_condition = 0; + int err; + + static const char *const gpu_sync_type_name[] = { "SYNC_WAIT", "SYNC_SET", "SYNC_ADD" }; + static const char *const gpu_sync_type_op[] = { + "wait", /* This should never be printed, only included to simplify indexing */ + "set", "add" + }; + + if (type >= NUM_DEBUGFS_GPU_SYNC_TYPES) { + dev_warn(kctx->kbdev->dev, "Expected GPU queue sync type is unknown!"); + return; + } + + /* We expect there to be at least 2 preceding MOVE instructions, and + * Base will always arrange for the 2 MOVE + SYNC instructions to be + * contiguously located, and is therefore never expected to be wrapped + * around the ringbuffer boundary. + */ + if (unlikely(ringbuff_offset < (2 * sizeof(u64)))) { + dev_warn(kctx->kbdev->dev, + "Unexpected wraparound detected between %s & MOVE instruction", + gpu_sync_type_name[type]); + return; + } + + /* 1. Get Register identifiers from SYNC_* instruction */ + sync_addr_reg = SYNC_SRC0_GET(sync_cmd); + compare_val_reg = SYNC_SRC1_GET(sync_cmd); + + /* 2. Get values from first MOVE command */ + ringbuff_offset -= sizeof(u64); + move_cmd = kbasep_csf_read_ringbuffer_value(queue, ringbuff_offset); + if (!kbasep_csf_get_move_immediate_value(move_cmd, sync_addr_reg, compare_val_reg, + &sync_addr, &compare_val)) + return; + + /* 3. Get values from next MOVE command */ + ringbuff_offset -= sizeof(u64); + move_cmd = kbasep_csf_read_ringbuffer_value(queue, ringbuff_offset); + if (!kbasep_csf_get_move_immediate_value(move_cmd, sync_addr_reg, compare_val_reg, + &sync_addr, &compare_val)) + return; + + /* 4. Get CQS object value */ + if (is_64bit) + err = kbasep_csf_debugfs_get_cqs_live_u64(kctx, sync_addr, &live_val); + else + err = kbasep_csf_debugfs_get_cqs_live_u32(kctx, sync_addr, (u32 *)(&live_val)); + + if (err) + return; + + /* 5. Print info */ + kbasep_print(kctx, file, "queue:GPU-%u-%u-%u exec:%c cmd:%s ", kctx->id, + queue->group->handle, queue->csi_index, + queue->enabled && !follows_wait ? 'S' : 'P', gpu_sync_type_name[type]); + + if (queue->group->csg_nr == KBASEP_CSG_NR_INVALID) + kbasep_print(kctx, file, "slot:-"); + else + kbasep_print(kctx, file, "slot:%d", (int)queue->group->csg_nr); + + kbasep_print(kctx, file, " obj:0x%.16llx live_value:0x%.16llx | ", sync_addr, live_val); + + if (type == DEBUGFS_GPU_SYNC_WAIT) { + wait_condition = SYNC_WAIT_CONDITION_GET(sync_cmd); + kbasep_print(kctx, file, "op:%s ", + kbasep_csf_sync_get_wait_op_name(wait_condition)); + } else + kbasep_print(kctx, file, "op:%s ", gpu_sync_type_op[type]); + + kbasep_print(kctx, file, "arg_value:0x%.16llx\n", compare_val); +} + +/** + * kbasep_csf_dump_active_queue_sync_info() - Print GPU command queue sync information. + * + * @file: seq_file for printing to. + * @queue: Address of a GPU command queue to examine. + * + * This function will iterate through each command in the ring buffer of the given GPU queue from + * CS_EXTRACT, and if is a SYNC_* instruction it will attempt to decode the sync operation and + * print relevant information to the debugfs file. + * This function will stop iterating once the CS_INSERT address is reached by the cursor (i.e. 
+ * when there are no more commands to view) or a number of consumed GPU CALL commands have + * been observed. + */ +static void kbasep_csf_dump_active_queue_sync_info(struct seq_file *file, struct kbase_queue *queue) +{ + struct kbase_context *kctx; + u64 *addr; + u64 cs_extract, cs_insert, instr, cursor; + bool follows_wait = false; + int nr_calls = 0; + + if (!queue) + return; + + kctx = queue->kctx; + + addr = queue->user_io_addr; + cs_insert = addr[CS_INSERT_LO / sizeof(*addr)]; + + addr = queue->user_io_addr + PAGE_SIZE / sizeof(*addr); + cs_extract = addr[CS_EXTRACT_LO / sizeof(*addr)]; + + cursor = cs_extract; + + if (!is_power_of_2(queue->size)) { + dev_warn(kctx->kbdev->dev, "GPU queue %u size of %u not a power of 2", + queue->csi_index, queue->size); + return; + } + + while ((cursor < cs_insert) && (nr_calls < MAX_NR_GPU_CALLS)) { + bool instr_is_64_bit = false; + /* Calculate offset into ringbuffer from the absolute cursor, + * by finding the remainder of the cursor divided by the + * ringbuffer size. The ringbuffer size is guaranteed to be + * a power of 2, so the remainder can be calculated without an + * explicit modulo. queue->size - 1 is the ringbuffer mask. + */ + u32 cursor_ringbuff_offset = (u32)(cursor & (queue->size - 1)); + + /* Find instruction that cursor is currently on */ + instr = kbasep_csf_read_ringbuffer_value(queue, cursor_ringbuff_offset); + + switch (INSTR_OPCODE_GET(instr)) { + case GPU_CSF_SYNC_ADD64_OPCODE: + case GPU_CSF_SYNC_SET64_OPCODE: + case GPU_CSF_SYNC_WAIT64_OPCODE: + instr_is_64_bit = true; + break; + default: + break; + } + + switch (INSTR_OPCODE_GET(instr)) { + case GPU_CSF_SYNC_ADD_OPCODE: + case GPU_CSF_SYNC_ADD64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_ADD, instr_is_64_bit, + follows_wait); + break; + case GPU_CSF_SYNC_SET_OPCODE: + case GPU_CSF_SYNC_SET64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_SET, instr_is_64_bit, + follows_wait); + break; + case GPU_CSF_SYNC_WAIT_OPCODE: + case GPU_CSF_SYNC_WAIT64_OPCODE: + kbasep_csf_print_gpu_sync_op(file, kctx, queue, cursor_ringbuff_offset, + instr, DEBUGFS_GPU_SYNC_WAIT, instr_is_64_bit, + follows_wait); + follows_wait = true; /* Future commands will follow at least one wait */ + break; + case GPU_CSF_CALL_OPCODE: + nr_calls++; + break; + default: + /* Unrecognized command, skip past it */ + break; + } + + cursor += sizeof(u64); + } +} + +/** + * kbasep_csf_dump_active_group_sync_state() - Prints SYNC commands in all GPU queues of + * the provided queue group. + * + * @kctx: The kbase context + * @file: seq_file for printing to. + * @group: Address of a GPU command group to iterate through. + * + * This function will iterate through each queue in the provided GPU queue group and + * print its SYNC related commands. + */ +static void kbasep_csf_dump_active_group_sync_state(struct kbase_context *kctx, + struct seq_file *file, + struct kbase_queue_group *const group) +{ + unsigned int i; + + kbasep_print(kctx, file, "GPU queues for group %u (slot %d) of ctx %d_%d\n", group->handle, + group->csg_nr, kctx->tgid, kctx->id); + + for (i = 0; i < MAX_SUPPORTED_STREAMS_PER_GROUP; i++) + kbasep_csf_dump_active_queue_sync_info(file, group->bound_queues[i]); +} + +/** + * kbasep_csf_sync_gpu_dump() - Print CSF GPU queue sync info + * + * @kctx: The kbase context + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. 
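The dump loop above treats CS_EXTRACT and CS_INSERT as free-running byte counters and maps each cursor value into the ring buffer with cursor & (queue->size - 1), which is only valid because the queue size is checked to be a power of two. A small self-contained sketch of that mapping, with an invented 64-byte ring and counter values:

/* Userspace sketch of the cursor-to-ringbuffer-offset mapping used in the
 * dump loop: the ring size must be a power of two so (size - 1) acts as a
 * wraparound mask. The ring size and counter values are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint32_t ring_size = 64;               /* must be a power of two   */
	uint64_t cs_extract = 120, cs_insert = 152;  /* free-running byte counts */
	uint64_t cursor;

	for (cursor = cs_extract; cursor < cs_insert; cursor += sizeof(uint64_t)) {
		uint32_t ring_offset = (uint32_t)(cursor & (ring_size - 1));

		printf("cursor %llu -> ring offset %u\n",
		       (unsigned long long)cursor, ring_offset);
	}
	return 0;
}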
+ */ +static int kbasep_csf_sync_gpu_dump(struct kbase_context *kctx, struct seq_file *file) +{ + u32 gr; + struct kbase_device *kbdev; + + if (WARN_ON(!kctx)) + return -EINVAL; + + kbdev = kctx->kbdev; + kbase_csf_scheduler_lock(kbdev); + kbase_csf_debugfs_update_active_groups_status(kbdev); + + for (gr = 0; gr < kbdev->csf.global_iface.group_num; gr++) { + struct kbase_queue_group *const group = + kbdev->csf.scheduler.csg_slots[gr].resident_group; + if (!group || group->kctx != kctx) + continue; + kbasep_csf_dump_active_group_sync_state(kctx, file, group); + } + + kbase_csf_scheduler_unlock(kbdev); + return 0; +} + +/** + * kbasep_csf_sync_debugfs_show() - Print CSF queue sync information + * + * @file: The seq_file for printing to. + * @data: The debugfs dentry private data, a pointer to kbase_context. + * + * Return: Negative error code or 0 on success. + */ +static int kbasep_csf_sync_debugfs_show(struct seq_file *file, void *data) +{ + struct kbase_context *kctx = file->private; + + kbasep_print(kctx, file, "MALI_CSF_SYNC_DEBUGFS_VERSION: v%u\n", + MALI_CSF_SYNC_DEBUGFS_VERSION); + + kbasep_csf_sync_kcpu_dump(kctx, file); + kbasep_csf_sync_gpu_dump(kctx, file); + return 0; +} + +static int kbasep_csf_sync_debugfs_open(struct inode *in, struct file *file) +{ + return single_open(file, kbasep_csf_sync_debugfs_show, in->i_private); +} + +static const struct file_operations kbasep_csf_sync_debugfs_fops = { + .open = kbasep_csf_sync_debugfs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/** + * kbase_csf_sync_debugfs_init() - Initialise debugfs file. + * + * @kctx: Kernel context pointer. + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx) +{ + struct dentry *file; + const mode_t mode = 0444; + + if (WARN_ON(!kctx || IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + file = debugfs_create_file("csf_sync", mode, kctx->kctx_dentry, kctx, + &kbasep_csf_sync_debugfs_fops); + + if (IS_ERR_OR_NULL(file)) + dev_warn(kctx->kbdev->dev, "Unable to create CSF Sync debugfs entry"); +} + +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx) +{ +} + +#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h new file mode 100644 index 0000000..2fe5060 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_sync_debugfs.h @@ -0,0 +1,62 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
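The csf_sync entry follows the usual debugfs single_open() pattern: the open handler binds a show callback and the inode's private data, and seq_read/seq_lseek/single_release complete the file_operations. A minimal, self-contained kernel-module sketch of the same pattern is shown below; the "example_sync" name, the printed line and the top-level placement are invented, and this is not how the driver itself registers its per-context entry.

/* Minimal module sketch of the debugfs + single_open() pattern, assuming an
 * invented entry name and no parent directory. Structure only mirrors the patch.
 */
#include <linux/debugfs.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/seq_file.h>

static struct dentry *example_dentry;

static int example_show(struct seq_file *m, void *data)
{
	seq_printf(m, "EXAMPLE_DEBUGFS_VERSION: v%u\n", 0);
	return 0;
}

static int example_open(struct inode *in, struct file *file)
{
	return single_open(file, example_show, in->i_private);
}

static const struct file_operations example_fops = {
	.open = example_open,
	.read = seq_read,
	.llseek = seq_lseek,
	.release = single_release,
};

static int __init example_init(void)
{
	example_dentry = debugfs_create_file("example_sync", 0444, NULL, NULL,
					     &example_fops);
	if (IS_ERR_OR_NULL(example_dentry))
		pr_warn("Unable to create example debugfs entry\n");
	return 0;
}

static void __exit example_exit(void)
{
	debugfs_remove(example_dentry);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");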
+ * + */ + +#ifndef _KBASE_CSF_SYNC_DEBUGFS_H_ +#define _KBASE_CSF_SYNC_DEBUGFS_H_ + +#include <linux/seq_file.h> + +/* Forward declaration */ +struct kbase_context; + +#define MALI_CSF_SYNC_DEBUGFS_VERSION 0 + +/** + * kbase_csf_sync_debugfs_init() - Create a debugfs entry for CSF queue sync info + * + * @kctx: The kbase_context for which to create the debugfs entry + */ +void kbase_csf_sync_debugfs_init(struct kbase_context *kctx); + +/** + * kbasep_csf_sync_kcpu_dump() - Print CSF KCPU queue sync info + * + * @kctx: The kbase context. + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. + * + * Note: This function should not be used if kcpu_queues.lock is held. Use + * kbasep_csf_sync_kcpu_dump_locked() instead. + */ +int kbasep_csf_sync_kcpu_dump(struct kbase_context *kctx, struct seq_file *file); + +/** + * kbasep_csf_sync_kcpu_dump() - Print CSF KCPU queue sync info + * + * @kctx: The kbase context. + * @file: The seq_file for printing to. + * + * Return: Negative error code or 0 on success. + */ +int kbasep_csf_sync_kcpu_dump_locked(struct kbase_context *kctx, struct seq_file *file); + +#endif /* _KBASE_CSF_SYNC_DEBUGFS_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap.c b/mali_kbase/csf/mali_kbase_csf_tiler_heap.c index 85babf9..f7e1a8d 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap.c +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,26 @@ #include "mali_kbase_csf_tiler_heap_def.h" #include "mali_kbase_csf_heap_context_alloc.h" +/* Tiler heap shrink stop limit for maintaining a minimum number of chunks */ +#define HEAP_SHRINK_STOP_LIMIT (1) + +/** + * struct kbase_csf_gpu_buffer_heap - A gpu buffer object specific to tiler heap + * + * @cdsbp_0: Descriptor_type and buffer_type + * @size: The size of the current heap chunk + * @pointer: Pointer to the current heap chunk + * @low_pointer: Pointer to low end of current heap chunk + * @high_pointer: Pointer to high end of current heap chunk + */ +struct kbase_csf_gpu_buffer_heap { + u32 cdsbp_0; + u32 size; + u64 pointer; + u64 low_pointer; + u64 high_pointer; +} __packed; + /** * encode_chunk_ptr - Encode the address and size of a chunk as an integer. * @@ -74,6 +94,35 @@ static struct kbase_csf_tiler_heap_chunk *get_last_chunk( } /** + * remove_external_chunk_mappings - Remove external mappings from a chunk that + * is being transitioned to the tiler heap + * memory system. + * + * @kctx: kbase context the chunk belongs to. + * @chunk: The chunk whose external mappings are going to be removed. + * + * This function marks the region as DONT NEED. Along with NO_USER_FREE, this indicates + * that the VA region is owned by the tiler heap and could potentially be shrunk at any time. Other + * parts of kbase outside of tiler heap management should not take references on its physical + * pages, and should not modify them. 
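struct kbase_csf_gpu_buffer_heap above is declared __packed because its layout must match, byte for byte, the buffer descriptor object that lives in GPU memory. A userspace sketch that simply makes the implied offsets explicit; the offsets are derived from the field order in the patch and the struct name is changed to mark it as an illustration.

/* Userspace sketch checking the byte layout implied by the __packed buffer
 * descriptor struct. Uses the GCC/Clang packed attribute and C11 static_assert.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct gpu_buffer_heap_example {
	uint32_t cdsbp_0;      /* descriptor_type and buffer_type    */
	uint32_t size;         /* size of the current heap chunk     */
	uint64_t pointer;      /* pointer to the current heap chunk  */
	uint64_t low_pointer;  /* low end of the current heap chunk  */
	uint64_t high_pointer; /* high end of the current heap chunk */
} __attribute__((packed));

int main(void)
{
	static_assert(offsetof(struct gpu_buffer_heap_example, size) == 4, "size at 4");
	static_assert(offsetof(struct gpu_buffer_heap_example, pointer) == 8, "pointer at 8");
	static_assert(offsetof(struct gpu_buffer_heap_example, high_pointer) == 24, "high at 24");
	static_assert(sizeof(struct gpu_buffer_heap_example) == 32, "32 bytes total");

	printf("descriptor is %zu bytes\n", sizeof(struct gpu_buffer_heap_example));
	return 0;
}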
+ */ +static void remove_external_chunk_mappings(struct kbase_context *const kctx, + struct kbase_csf_tiler_heap_chunk *chunk) +{ + lockdep_assert_held(&kctx->reg_lock); + + if (chunk->region->cpu_alloc != NULL) { + kbase_mem_shrink_cpu_mapping(kctx, chunk->region, 0, + chunk->region->cpu_alloc->nents); + } +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags |= KBASE_REG_DONT_NEED; +#endif + + dev_dbg(kctx->kbdev->dev, "Removed external mappings from chunk 0x%llX", chunk->gpu_va); +} + +/** * link_chunk - Link a chunk into a tiler heap * * @heap: Pointer to the tiler heap. @@ -93,19 +142,12 @@ static int link_chunk(struct kbase_csf_tiler_heap *const heap, if (prev) { struct kbase_context *const kctx = heap->kctx; - struct kbase_vmap_struct map; - u64 *const prev_hdr = kbase_vmap_prot(kctx, prev->gpu_va, - sizeof(*prev_hdr), KBASE_REG_CPU_WR, &map); + u64 *prev_hdr = prev->map.addr; - if (unlikely(!prev_hdr)) { - dev_err(kctx->kbdev->dev, - "Failed to map tiler heap chunk 0x%llX\n", - prev->gpu_va); - return -ENOMEM; - } + WARN((prev->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); *prev_hdr = encode_chunk_ptr(heap->chunk_size, chunk->gpu_va); - kbase_vunmap(kctx, &map); dev_dbg(kctx->kbdev->dev, "Linked tiler heap chunks, 0x%llX -> 0x%llX\n", @@ -132,152 +174,284 @@ static int link_chunk(struct kbase_csf_tiler_heap *const heap, static int init_chunk(struct kbase_csf_tiler_heap *const heap, struct kbase_csf_tiler_heap_chunk *const chunk, bool link_with_prev) { - struct kbase_vmap_struct map; - struct u64 *chunk_hdr = NULL; + int err = 0; + u64 *chunk_hdr; struct kbase_context *const kctx = heap->kctx; + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + if (unlikely(chunk->gpu_va & ~CHUNK_ADDR_MASK)) { dev_err(kctx->kbdev->dev, "Tiler heap chunk address is unusable\n"); return -EINVAL; } - chunk_hdr = kbase_vmap_prot(kctx, - chunk->gpu_va, CHUNK_HDR_SIZE, KBASE_REG_CPU_WR, &map); - - if (unlikely(!chunk_hdr)) { - dev_err(kctx->kbdev->dev, - "Failed to map a tiler heap chunk header\n"); - return -ENOMEM; + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + chunk_hdr = chunk->map.addr; + if (WARN(chunk->map.size < CHUNK_HDR_SIZE, + "Tiler chunk kernel mapping was not large enough for zero-init")) { + return -EINVAL; } memset(chunk_hdr, 0, CHUNK_HDR_SIZE); - kbase_vunmap(kctx, &map); + INIT_LIST_HEAD(&chunk->link); if (link_with_prev) - return link_chunk(heap, chunk); - else - return 0; + err = link_chunk(heap, chunk); + + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "Failed to link a chunk to a tiler heap\n"); + return -EINVAL; + } + + list_add_tail(&chunk->link, &heap->chunks_list); + heap->chunk_count++; + + return err; } /** - * create_chunk - Create a tiler heap chunk + * remove_unlinked_chunk - Remove a chunk that is not currently linked into a + * heap. * - * @heap: Pointer to the tiler heap for which to allocate memory. - * @link_with_prev: Flag to indicate if the chunk to be allocated needs to be - * linked with the previously allocated chunk. + * @kctx: Kbase context that was used to allocate the memory. + * @chunk: Chunk that has been allocated, but not linked into a heap. 
+ */ +static void remove_unlinked_chunk(struct kbase_context *kctx, + struct kbase_csf_tiler_heap_chunk *chunk) +{ + if (WARN_ON(!list_empty(&chunk->link))) + return; + + kbase_gpu_vm_lock_with_pmode_sync(kctx); + kbase_vunmap(kctx, &chunk->map); + /* KBASE_REG_DONT_NEED regions will be confused with ephemeral regions (inc freed JIT + * regions), and so we must clear that flag too before freeing. + * For "no user free count", we check that the count is 1 as it is a shrinkable region; + * no other code part within kbase can take a reference to it. + */ + WARN_ON(atomic_read(&chunk->region->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(chunk->region); +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags &= ~KBASE_REG_DONT_NEED; +#endif + kbase_mem_free_region(kctx, chunk->region); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + kfree(chunk); +} + +/** + * alloc_new_chunk - Allocate new chunk metadata for the tiler heap, reserve a fully backed VA + * region for the chunk, and provide a kernel mapping. + * @kctx: kbase context with which the chunk will be linked + * @chunk_size: the size of the chunk from the corresponding heap * - * This function allocates a chunk of memory for a tiler heap and adds it to - * the end of the list of chunks associated with that heap. The size of the - * chunk is not a parameter because it is configured per-heap not per-chunk. + * Allocate the chunk tracking metadata and a corresponding fully backed VA region for the + * chunk. The kernel may need to invoke the reclaim path while trying to fulfill the allocation, so + * we cannot hold any lock that would be held in the shrinker paths (JIT evict lock or tiler heap + * lock). * - * Return: 0 if successful or a negative error code on failure. + * Since the chunk may have its physical backing removed, to prevent use-after-free scenarios we + * ensure that it is protected from being mapped by other parts of kbase. + * + * The chunk's GPU memory can be accessed via its 'map' member, but should only be done so by the + * shrinker path, as it may be otherwise shrunk at any time. + * + * Return: pointer to kbase_csf_tiler_heap_chunk on success or a NULL pointer + * on failure */ -static int create_chunk(struct kbase_csf_tiler_heap *const heap, - bool link_with_prev) +static struct kbase_csf_tiler_heap_chunk *alloc_new_chunk(struct kbase_context *kctx, + u64 chunk_size) { - int err = 0; - struct kbase_context *const kctx = heap->kctx; - u64 nr_pages = PFN_UP(heap->chunk_size); - u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | - BASE_MEM_PROT_CPU_WR | BASEP_MEM_NO_USER_FREE | - BASE_MEM_COHERENT_LOCAL; + u64 nr_pages = PFN_UP(chunk_size); + u64 flags = BASE_MEM_PROT_GPU_RD | BASE_MEM_PROT_GPU_WR | BASE_MEM_PROT_CPU_WR | + BASEP_MEM_NO_USER_FREE | BASE_MEM_COHERENT_LOCAL | BASE_MEM_PROT_CPU_RD; struct kbase_csf_tiler_heap_chunk *chunk = NULL; + /* The chunk kernel mapping needs to be large enough to: + * - initially zero the CHUNK_HDR_SIZE area + * - on shrinking, access the NEXT_CHUNK_ADDR_SIZE area + */ + const size_t chunk_kernel_map_size = max(CHUNK_HDR_SIZE, NEXT_CHUNK_ADDR_SIZE); /* Calls to this function are inherently synchronous, with respect to * MMU operations. 
*/ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; - flags |= kbase_mem_group_id_set(kctx->jit_group_id); -#if defined(CONFIG_MALI_DEBUG) || defined(CONFIG_MALI_VECTOR_DUMP) - flags |= BASE_MEM_PROT_CPU_RD; -#endif - chunk = kzalloc(sizeof(*chunk), GFP_KERNEL); if (unlikely(!chunk)) { dev_err(kctx->kbdev->dev, "No kernel memory for a new tiler heap chunk\n"); - return -ENOMEM; + return NULL; } /* Allocate GPU memory for the new chunk. */ - INIT_LIST_HEAD(&chunk->link); chunk->region = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, &chunk->gpu_va, mmu_sync_info); if (unlikely(!chunk->region)) { - dev_err(kctx->kbdev->dev, - "Failed to allocate a tiler heap chunk\n"); - err = -ENOMEM; - } else { - err = init_chunk(heap, chunk, link_with_prev); - if (unlikely(err)) { - kbase_gpu_vm_lock(kctx); - chunk->region->flags &= ~KBASE_REG_NO_USER_FREE; - kbase_mem_free_region(kctx, chunk->region); - kbase_gpu_vm_unlock(kctx); - } + dev_err(kctx->kbdev->dev, "Failed to allocate a tiler heap chunk!\n"); + goto unroll_chunk; } - if (unlikely(err)) { - kfree(chunk); - } else { - list_add_tail(&chunk->link, &heap->chunks_list); - heap->chunk_count++; + kbase_gpu_vm_lock(kctx); - dev_dbg(kctx->kbdev->dev, "Created tiler heap chunk 0x%llX\n", - chunk->gpu_va); + /* Some checks done here as NO_USER_FREE still allows such things to be made + * whilst we had dropped the region lock + */ + if (unlikely(atomic_read(&chunk->region->gpu_alloc->kernel_mappings) > 0)) { + dev_err(kctx->kbdev->dev, "Chunk region has active kernel mappings!\n"); + goto unroll_region; } - return err; + /* There is a race condition with regard to KBASE_REG_DONT_NEED, where another + * thread can have the "no user free" refcount increased between kbase_mem_alloc + * and kbase_gpu_vm_lock (above) and before KBASE_REG_DONT_NEED is set by + * remove_external_chunk_mappings (below). + * + * It should be fine and not a security risk if we let the region leak till + * region tracker termination in such a case. + */ + if (unlikely(atomic_read(&chunk->region->no_user_free_count) > 1)) { + dev_err(kctx->kbdev->dev, "Chunk region has no_user_free_count > 1!\n"); + goto unroll_region; + } + + /* Whilst we can be sure of a number of other restrictions due to BASEP_MEM_NO_USER_FREE + * being requested, it's useful to document in code what those restrictions are, and ensure + * they remain in place in future. 
+ */ + if (WARN(!chunk->region->gpu_alloc, + "NO_USER_FREE chunks should not have had their alloc freed")) { + goto unroll_region; + } + + if (WARN(chunk->region->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE, + "NO_USER_FREE chunks should not have been freed and then reallocated as imported/non-native regions")) { + goto unroll_region; + } + + if (WARN((chunk->region->flags & KBASE_REG_ACTIVE_JIT_ALLOC), + "NO_USER_FREE chunks should not have been freed and then reallocated as JIT regions")) { + goto unroll_region; + } + + if (WARN((chunk->region->flags & KBASE_REG_DONT_NEED), + "NO_USER_FREE chunks should not have been made ephemeral")) { + goto unroll_region; + } + + if (WARN(atomic_read(&chunk->region->cpu_alloc->gpu_mappings) > 1, + "NO_USER_FREE chunks should not have been aliased")) { + goto unroll_region; + } + + if (unlikely(!kbase_vmap_reg(kctx, chunk->region, chunk->gpu_va, chunk_kernel_map_size, + (KBASE_REG_CPU_RD | KBASE_REG_CPU_WR), &chunk->map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING))) { + dev_err(kctx->kbdev->dev, "Failed to map chunk header for shrinking!\n"); + goto unroll_region; + } + + remove_external_chunk_mappings(kctx, chunk); + kbase_gpu_vm_unlock(kctx); + + /* If page migration is enabled, we don't want to migrate tiler heap pages. + * This does not change if the constituent pages are already marked as isolated. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(chunk->region->gpu_alloc, NOT_MOVABLE); + + return chunk; + +unroll_region: + /* KBASE_REG_DONT_NEED regions will be confused with ephemeral regions (inc freed JIT + * regions), and so we must clear that flag too before freeing. + */ + kbase_va_region_no_user_free_dec(chunk->region); +#if !defined(CONFIG_MALI_VECTOR_DUMP) + chunk->region->flags &= ~KBASE_REG_DONT_NEED; +#endif + kbase_mem_free_region(kctx, chunk->region); + kbase_gpu_vm_unlock(kctx); +unroll_chunk: + kfree(chunk); + return NULL; } /** - * delete_chunk - Delete a tiler heap chunk + * create_chunk - Create a tiler heap chunk * - * @heap: Pointer to the tiler heap for which @chunk was allocated. - * @chunk: Pointer to a chunk to be deleted. + * @heap: Pointer to the tiler heap for which to allocate memory. * - * This function frees a tiler heap chunk previously allocated by @create_chunk - * and removes it from the list of chunks associated with the heap. + * This function allocates a chunk of memory for a tiler heap, adds it to the + * the list of chunks associated with that heap both on the host side and in GPU + * memory. * - * WARNING: The deleted chunk is not unlinked from the list of chunks used by - * the GPU, therefore it is only safe to use this function when - * deleting a heap. + * Return: 0 if successful or a negative error code on failure. 
*/ -static void delete_chunk(struct kbase_csf_tiler_heap *const heap, - struct kbase_csf_tiler_heap_chunk *const chunk) +static int create_chunk(struct kbase_csf_tiler_heap *const heap) { - struct kbase_context *const kctx = heap->kctx; + int err = 0; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; - kbase_gpu_vm_lock(kctx); - chunk->region->flags &= ~KBASE_REG_NO_USER_FREE; - kbase_mem_free_region(kctx, chunk->region); - kbase_gpu_vm_unlock(kctx); - list_del(&chunk->link); - heap->chunk_count--; - kfree(chunk); + chunk = alloc_new_chunk(heap->kctx, heap->chunk_size); + if (unlikely(!chunk)) { + err = -ENOMEM; + goto allocation_failure; + } + + mutex_lock(&heap->kctx->csf.tiler_heaps.lock); + err = init_chunk(heap, chunk, true); + mutex_unlock(&heap->kctx->csf.tiler_heaps.lock); + + if (unlikely(err)) + goto initialization_failure; + + dev_dbg(heap->kctx->kbdev->dev, "Created tiler heap chunk 0x%llX\n", chunk->gpu_va); + + return 0; +initialization_failure: + remove_unlinked_chunk(heap->kctx, chunk); +allocation_failure: + return err; } /** - * delete_all_chunks - Delete all chunks belonging to a tiler heap + * delete_all_chunks - Delete all chunks belonging to an unlinked tiler heap * * @heap: Pointer to a tiler heap. * - * This function empties the list of chunks associated with a tiler heap by - * freeing all chunks previously allocated by @create_chunk. + * This function empties the list of chunks associated with a tiler heap by freeing all chunks + * previously allocated by @create_chunk. + * + * The heap must not be reachable from a &struct kbase_context.csf.tiler_heaps.list, as the + * tiler_heaps lock cannot be held whilst deleting its chunks due to also needing the &struct + * kbase_context.region_lock. + * + * WARNING: Whilst the deleted chunks are unlinked from host memory, they are not unlinked from the + * list of chunks used by the GPU, therefore it is only safe to use this function when + * deleting a heap. */ static void delete_all_chunks(struct kbase_csf_tiler_heap *heap) { + struct kbase_context *const kctx = heap->kctx; struct list_head *entry = NULL, *tmp = NULL; + WARN(!list_empty(&heap->link), + "Deleting a heap's chunks when that heap is still linked requires the tiler_heaps lock, which cannot be held by the caller"); + list_for_each_safe(entry, tmp, &heap->chunks_list) { struct kbase_csf_tiler_heap_chunk *chunk = list_entry( entry, struct kbase_csf_tiler_heap_chunk, link); - delete_chunk(heap, chunk); + list_del_init(&chunk->link); + heap->chunk_count--; + + remove_unlinked_chunk(kctx, chunk); } } @@ -299,7 +473,7 @@ static int create_initial_chunks(struct kbase_csf_tiler_heap *const heap, u32 i; for (i = 0; (i < nchunks) && likely(!err); i++) - err = create_chunk(heap, true); + err = create_chunk(heap); if (unlikely(err)) delete_all_chunks(heap); @@ -308,14 +482,17 @@ static int create_initial_chunks(struct kbase_csf_tiler_heap *const heap, } /** - * delete_heap - Delete a tiler heap + * delete_heap - Delete an unlinked tiler heap * * @heap: Pointer to a tiler heap to be deleted. * * This function frees any chunks allocated for a tiler heap previously - * initialized by @kbase_csf_tiler_heap_init and removes it from the list of - * heaps associated with the kbase context. The heap context structure used by + * initialized by @kbase_csf_tiler_heap_init. The heap context structure used by * the firmware is also freed. 
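The reworked create_chunk() unwinds with labelled gotos so that each failure point releases exactly what was acquired before it (an unlinked chunk is removed, a failed allocation falls straight through). A generic, self-contained sketch of that idiom; the two "resources" are ordinary heap allocations chosen purely for illustration.

/* Self-contained sketch of labelled-goto unwinding: each failure label frees
 * only what was successfully acquired before it.
 */
#include <stdio.h>
#include <stdlib.h>

static int create_pair(void)
{
	int err = 0;
	char *first = NULL, *second = NULL;

	first = malloc(64);
	if (!first) {
		err = -1;
		goto first_alloc_failed;
	}

	second = malloc(64);
	if (!second) {
		err = -1;
		goto second_alloc_failed;
	}

	printf("both resources acquired\n");
	free(second);
	free(first);
	return 0;

second_alloc_failed:
	free(first);
first_alloc_failed:
	return err;
}

int main(void)
{
	return create_pair() ? EXIT_FAILURE : EXIT_SUCCESS;
}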
+ * + * The heap must not be reachable from a &struct kbase_context.csf.tiler_heaps.list, as the + * tiler_heaps lock cannot be held whilst deleting it due to also needing the &struct + * kbase_context.region_lock. */ static void delete_heap(struct kbase_csf_tiler_heap *heap) { @@ -323,23 +500,41 @@ static void delete_heap(struct kbase_csf_tiler_heap *heap) dev_dbg(kctx->kbdev->dev, "Deleting tiler heap 0x%llX\n", heap->gpu_va); - lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + WARN(!list_empty(&heap->link), + "Deleting a heap that is still linked requires the tiler_heaps lock, which cannot be held by the caller"); + /* Make sure that all of the VA regions corresponding to the chunks are + * freed at this time and that the work queue is not trying to access freed + * memory. + * + * Note: since the heap is unlinked, and that no references are made to chunks other + * than from their heap, there is no need to separately move the chunks out of the + * heap->chunks_list to delete them. + */ delete_all_chunks(heap); + kbase_vunmap(kctx, &heap->gpu_va_map); /* We could optimize context destruction by not freeing leaked heap - * contexts but it doesn't seem worth the extra complexity. + * contexts but it doesn't seem worth the extra complexity. After this + * point, the suballocation is returned to the heap context allocator and + * may be overwritten with new data, meaning heap->gpu_va should not + * be used past this point. */ kbase_csf_heap_context_allocator_free(&kctx->csf.tiler_heaps.ctx_alloc, heap->gpu_va); - list_del(&heap->link); - WARN_ON(heap->chunk_count); KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, 0, 0, heap->max_chunks, heap->chunk_size, 0, heap->target_in_flight, 0); + if (heap->buf_desc_reg) { + kbase_vunmap(kctx, &heap->buf_desc_map); + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_dec(heap->buf_desc_reg); + kbase_gpu_vm_unlock(kctx); + } + kfree(heap); } @@ -375,6 +570,23 @@ static struct kbase_csf_tiler_heap *find_tiler_heap( return NULL; } +static struct kbase_csf_tiler_heap_chunk *find_chunk(struct kbase_csf_tiler_heap *heap, + u64 const chunk_gpu_va) +{ + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + list_for_each_entry(chunk, &heap->chunks_list, link) { + if (chunk->gpu_va == chunk_gpu_va) + return chunk; + } + + dev_dbg(heap->kctx->kbdev->dev, "Tiler heap chunk 0x%llX was not found\n", chunk_gpu_va); + + return NULL; +} + int kbase_csf_tiler_heap_context_init(struct kbase_context *const kctx) { int err = kbase_csf_heap_context_allocator_init( @@ -393,37 +605,88 @@ int kbase_csf_tiler_heap_context_init(struct kbase_context *const kctx) void kbase_csf_tiler_heap_context_term(struct kbase_context *const kctx) { + LIST_HEAD(local_heaps_list); struct list_head *entry = NULL, *tmp = NULL; dev_dbg(kctx->kbdev->dev, "Terminating a context for tiler heaps\n"); mutex_lock(&kctx->csf.tiler_heaps.lock); + list_splice_init(&kctx->csf.tiler_heaps.list, &local_heaps_list); + mutex_unlock(&kctx->csf.tiler_heaps.lock); - list_for_each_safe(entry, tmp, &kctx->csf.tiler_heaps.list) { + list_for_each_safe(entry, tmp, &local_heaps_list) { struct kbase_csf_tiler_heap *heap = list_entry( entry, struct kbase_csf_tiler_heap, link); + + list_del_init(&heap->link); delete_heap(heap); } - mutex_unlock(&kctx->csf.tiler_heaps.lock); mutex_destroy(&kctx->csf.tiler_heaps.lock); kbase_csf_heap_context_allocator_term(&kctx->csf.tiler_heaps.ctx_alloc); } -int kbase_csf_tiler_heap_init(struct 
kbase_context *const kctx, - u32 const chunk_size, u32 const initial_chunks, u32 const max_chunks, - u16 const target_in_flight, u64 *const heap_gpu_va, - u64 *const first_chunk_va) +/** + * kbasep_is_buffer_descriptor_region_suitable - Check if a VA region chosen to house + * the tiler heap buffer descriptor + * is suitable for the purpose. + * @kctx: kbase context of the tiler heap + * @reg: VA region being checked for suitability + * + * The tiler heap buffer descriptor memory does not admit page faults according + * to its design, so it must have the entirety of the backing upon allocation, + * and it has to remain alive as long as the tiler heap is alive, meaning it + * cannot be allocated from JIT/Ephemeral, or user freeable memory. + * + * Return: true on suitability, false otherwise. + */ +static bool kbasep_is_buffer_descriptor_region_suitable(struct kbase_context *const kctx, + struct kbase_va_region *const reg) +{ + if (kbase_is_region_invalid_or_free(reg)) { + dev_err(kctx->kbdev->dev, "Region is either invalid or free!\n"); + return false; + } + + if (!(reg->flags & KBASE_REG_CPU_RD) || kbase_is_region_shrinkable(reg) || + (reg->flags & KBASE_REG_PF_GROW)) { + dev_err(kctx->kbdev->dev, "Region has invalid flags: 0x%lX!\n", reg->flags); + return false; + } + + if (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) { + dev_err(kctx->kbdev->dev, "Region has invalid type!\n"); + return false; + } + + if ((reg->nr_pages != kbase_reg_current_backed_size(reg)) || + (reg->nr_pages < PFN_UP(sizeof(struct kbase_csf_gpu_buffer_heap)))) { + dev_err(kctx->kbdev->dev, "Region has invalid backing!\n"); + return false; + } + + return true; +} + +#define TILER_BUF_DESC_SIZE (sizeof(struct kbase_csf_gpu_buffer_heap)) + +int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, u32 const chunk_size, + u32 const initial_chunks, u32 const max_chunks, + u16 const target_in_flight, u64 const buf_desc_va, + u64 *const heap_gpu_va, u64 *const first_chunk_va) { int err = 0; struct kbase_csf_tiler_heap *heap = NULL; struct kbase_csf_heap_context_allocator *const ctx_alloc = &kctx->csf.tiler_heaps.ctx_alloc; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + struct kbase_va_region *gpu_va_reg = NULL; + void *vmap_ptr = NULL; dev_dbg(kctx->kbdev->dev, - "Creating a tiler heap with %u chunks (limit: %u) of size %u\n", - initial_chunks, max_chunks, chunk_size); + "Creating a tiler heap with %u chunks (limit: %u) of size %u, buf_desc_va: 0x%llx\n", + initial_chunks, max_chunks, chunk_size, buf_desc_va); if (!kbase_mem_allow_alloc(kctx)) return -EINVAL; @@ -445,8 +708,7 @@ int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, heap = kzalloc(sizeof(*heap), GFP_KERNEL); if (unlikely(!heap)) { - dev_err(kctx->kbdev->dev, - "No kernel memory for a new tiler heap\n"); + dev_err(kctx->kbdev->dev, "No kernel memory for a new tiler heap"); return -ENOMEM; } @@ -454,57 +716,130 @@ int kbase_csf_tiler_heap_init(struct kbase_context *const kctx, heap->chunk_size = chunk_size; heap->max_chunks = max_chunks; heap->target_in_flight = target_in_flight; + heap->buf_desc_checked = false; INIT_LIST_HEAD(&heap->chunks_list); + INIT_LIST_HEAD(&heap->link); - heap->gpu_va = kbase_csf_heap_context_allocator_alloc(ctx_alloc); + /* Check on the buffer descriptor virtual Address */ + if (buf_desc_va) { + struct kbase_va_region *buf_desc_reg; + + kbase_gpu_vm_lock(kctx); + buf_desc_reg = + kbase_region_tracker_find_region_enclosing_address(kctx, buf_desc_va); + + if (!kbasep_is_buffer_descriptor_region_suitable(kctx, 
buf_desc_reg)) { + kbase_gpu_vm_unlock(kctx); + dev_err(kctx->kbdev->dev, + "Could not find a suitable VA region for the tiler heap buf desc!\n"); + err = -EINVAL; + goto buf_desc_not_suitable; + } + + /* If we don't prevent userspace from unmapping this, we may run into + * use-after-free, as we don't check for the existence of the region throughout. + */ + + heap->buf_desc_va = buf_desc_va; + heap->buf_desc_reg = buf_desc_reg; + kbase_va_region_no_user_free_inc(buf_desc_reg); + vmap_ptr = kbase_vmap_reg(kctx, buf_desc_reg, buf_desc_va, TILER_BUF_DESC_SIZE, + KBASE_REG_CPU_RD, &heap->buf_desc_map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); + + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(buf_desc_reg->gpu_alloc, NOT_MOVABLE); + + kbase_gpu_vm_unlock(kctx); + + if (unlikely(!vmap_ptr)) { + dev_err(kctx->kbdev->dev, + "Could not vmap buffer descriptor into kernel memory (err %d)\n", + err); + err = -ENOMEM; + goto buf_desc_vmap_failed; + } + } + + heap->gpu_va = kbase_csf_heap_context_allocator_alloc(ctx_alloc); if (unlikely(!heap->gpu_va)) { - dev_dbg(kctx->kbdev->dev, - "Failed to allocate a tiler heap context"); + dev_dbg(kctx->kbdev->dev, "Failed to allocate a tiler heap context\n"); err = -ENOMEM; - } else { - err = create_initial_chunks(heap, initial_chunks); - if (unlikely(err)) - kbase_csf_heap_context_allocator_free(ctx_alloc, heap->gpu_va); + goto heap_context_alloc_failed; + } + + gpu_va_reg = ctx_alloc->region; + + kbase_gpu_vm_lock(kctx); + /* gpu_va_reg was created with BASEP_MEM_NO_USER_FREE, the code to unset this only happens + * on kctx termination (after all syscalls on kctx have finished), and so it is safe to + * assume that gpu_va_reg is still present. + */ + vmap_ptr = kbase_vmap_reg(kctx, gpu_va_reg, heap->gpu_va, NEXT_CHUNK_ADDR_SIZE, + (KBASE_REG_CPU_RD | KBASE_REG_CPU_WR), &heap->gpu_va_map, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); + kbase_gpu_vm_unlock(kctx); + if (unlikely(!vmap_ptr)) { + dev_dbg(kctx->kbdev->dev, "Failed to vmap the correct heap GPU VA address\n"); + err = -ENOMEM; + goto heap_context_vmap_failed; } + err = create_initial_chunks(heap, initial_chunks); if (unlikely(err)) { - kfree(heap); - } else { - struct kbase_csf_tiler_heap_chunk const *chunk = list_first_entry( - &heap->chunks_list, struct kbase_csf_tiler_heap_chunk, link); + dev_dbg(kctx->kbdev->dev, "Failed to create the initial tiler heap chunks\n"); + goto create_chunks_failed; + } + chunk = list_first_entry(&heap->chunks_list, struct kbase_csf_tiler_heap_chunk, link); - *heap_gpu_va = heap->gpu_va; - *first_chunk_va = chunk->gpu_va; + *heap_gpu_va = heap->gpu_va; + *first_chunk_va = chunk->gpu_va; - mutex_lock(&kctx->csf.tiler_heaps.lock); - kctx->csf.tiler_heaps.nr_of_heaps++; - heap->heap_id = kctx->csf.tiler_heaps.nr_of_heaps; - list_add(&heap->link, &kctx->csf.tiler_heaps.list); + mutex_lock(&kctx->csf.tiler_heaps.lock); + kctx->csf.tiler_heaps.nr_of_heaps++; + heap->heap_id = kctx->csf.tiler_heaps.nr_of_heaps; + list_add(&heap->link, &kctx->csf.tiler_heaps.list); - KBASE_TLSTREAM_AUX_TILER_HEAP_STATS( - kctx->kbdev, kctx->id, heap->heap_id, - PFN_UP(heap->chunk_size * heap->max_chunks), - PFN_UP(heap->chunk_size * heap->chunk_count), heap->max_chunks, - heap->chunk_size, heap->chunk_count, heap->target_in_flight, 0); + KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, + PFN_UP(heap->chunk_size * heap->max_chunks), + PFN_UP(heap->chunk_size * heap->chunk_count), + heap->max_chunks, heap->chunk_size, heap->chunk_count, + 
heap->target_in_flight, 0); #if defined(CONFIG_MALI_VECTOR_DUMP) - list_for_each_entry(chunk, &heap->chunks_list, link) { - KBASE_TLSTREAM_JD_TILER_HEAP_CHUNK_ALLOC( - kctx->kbdev, kctx->id, heap->heap_id, chunk->gpu_va); - } + list_for_each_entry(chunk, &heap->chunks_list, link) { + KBASE_TLSTREAM_JD_TILER_HEAP_CHUNK_ALLOC(kctx->kbdev, kctx->id, heap->heap_id, + chunk->gpu_va); + } #endif + kctx->running_total_tiler_heap_nr_chunks += heap->chunk_count; + kctx->running_total_tiler_heap_memory += (u64)heap->chunk_size * heap->chunk_count; + if (kctx->running_total_tiler_heap_memory > kctx->peak_total_tiler_heap_memory) + kctx->peak_total_tiler_heap_memory = kctx->running_total_tiler_heap_memory; - dev_dbg(kctx->kbdev->dev, "Created tiler heap 0x%llX\n", heap->gpu_va); - mutex_unlock(&kctx->csf.tiler_heaps.lock); - kctx->running_total_tiler_heap_nr_chunks += heap->chunk_count; - kctx->running_total_tiler_heap_memory += - heap->chunk_size * heap->chunk_count; - if (kctx->running_total_tiler_heap_memory > - kctx->peak_total_tiler_heap_memory) - kctx->peak_total_tiler_heap_memory = - kctx->running_total_tiler_heap_memory; + dev_dbg(kctx->kbdev->dev, + "Created tiler heap 0x%llX, buffer descriptor 0x%llX, ctx_%d_%d\n", heap->gpu_va, + buf_desc_va, kctx->tgid, kctx->id); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + return 0; + +create_chunks_failed: + kbase_vunmap(kctx, &heap->gpu_va_map); +heap_context_vmap_failed: + kbase_csf_heap_context_allocator_free(ctx_alloc, heap->gpu_va); +heap_context_alloc_failed: + if (heap->buf_desc_reg) + kbase_vunmap(kctx, &heap->buf_desc_map); +buf_desc_vmap_failed: + if (heap->buf_desc_reg) { + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_dec(heap->buf_desc_reg); + kbase_gpu_vm_unlock(kctx); } +buf_desc_not_suitable: + kfree(heap); return err; } @@ -517,16 +852,19 @@ int kbase_csf_tiler_heap_term(struct kbase_context *const kctx, u64 heap_size = 0; mutex_lock(&kctx->csf.tiler_heaps.lock); - heap = find_tiler_heap(kctx, heap_gpu_va); if (likely(heap)) { chunk_count = heap->chunk_count; heap_size = heap->chunk_size * chunk_count; - delete_heap(heap); - } else + + list_del_init(&heap->link); + } else { err = -EINVAL; + } - mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* Update stats whilst still holding the lock so they are in sync with the tiler_heaps.list + * at all times + */ if (likely(kctx->running_total_tiler_heap_memory >= heap_size)) kctx->running_total_tiler_heap_memory -= heap_size; else @@ -537,36 +875,46 @@ int kbase_csf_tiler_heap_term(struct kbase_context *const kctx, else dev_warn(kctx->kbdev->dev, "Running total tiler chunk count lower than expected!"); + if (!err) + dev_dbg(kctx->kbdev->dev, + "Terminated tiler heap 0x%llX, buffer descriptor 0x%llX, ctx_%d_%d\n", + heap->gpu_va, heap->buf_desc_va, kctx->tgid, kctx->id); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + /* Deletion requires the kctx->reg_lock, so must only operate on it whilst unlinked from + * the kctx's csf.tiler_heaps.list, and without holding the csf.tiler_heaps.lock + */ + if (likely(heap)) + delete_heap(heap); + return err; } /** - * alloc_new_chunk - Allocate a new chunk for the tiler heap. - * - * @heap: Pointer to the tiler heap. - * @nr_in_flight: Number of render passes that are in-flight, must not be zero. - * @pending_frag_count: Number of render passes in-flight with completed vertex/tiler stage. 
- * The minimum value is zero but it must be less or equal to - * the total number of render passes in flight - * @new_chunk_ptr: Where to store the GPU virtual address & size of the new - * chunk allocated for the heap. - * - * This function will allocate a new chunk for the chunked tiler heap depending - * on the settings provided by userspace when the heap was created and the - * heap's statistics (like number of render passes in-flight). - * - * Return: 0 if a new chunk was allocated otherwise an appropriate negative - * error code. + * validate_allocation_request - Check whether the chunk allocation request + * received on tiler OOM should be handled at + * current time. + * + * @heap: The tiler heap the OOM is associated with + * @nr_in_flight: Number of fragment jobs in flight + * @pending_frag_count: Number of pending fragment jobs + * + * Context: must hold the tiler heap lock to guarantee its lifetime + * + * Return: + * * 0 - allowed to allocate an additional chunk + * * -EINVAL - invalid + * * -EBUSY - there are fragment jobs still in flight, which may free chunks + * after completing + * * -ENOMEM - the targeted number of in-flight chunks has been reached and + * no new ones will be allocated */ -static int alloc_new_chunk(struct kbase_csf_tiler_heap *heap, - u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr) +static int validate_allocation_request(struct kbase_csf_tiler_heap *heap, u32 nr_in_flight, + u32 pending_frag_count) { - int err = -ENOMEM; - lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); - if (WARN_ON(!nr_in_flight) || - WARN_ON(pending_frag_count > nr_in_flight)) + if (WARN_ON(!nr_in_flight) || WARN_ON(pending_frag_count > nr_in_flight)) return -EINVAL; if (nr_in_flight <= heap->target_in_flight) { @@ -574,66 +922,452 @@ static int alloc_new_chunk(struct kbase_csf_tiler_heap *heap, /* Not exceeded the target number of render passes yet so be * generous with memory. */ - err = create_chunk(heap, false); - - if (likely(!err)) { - struct kbase_csf_tiler_heap_chunk *new_chunk = - get_last_chunk(heap); - if (!WARN_ON(!new_chunk)) { - *new_chunk_ptr = - encode_chunk_ptr(heap->chunk_size, - new_chunk->gpu_va); - return 0; - } - } + return 0; } else if (pending_frag_count > 0) { - err = -EBUSY; + return -EBUSY; } else { - err = -ENOMEM; + return -ENOMEM; } } else { /* Reached target number of render passes in flight. * Wait for some of them to finish */ - err = -EBUSY; + return -EBUSY; } - - return err; + return -ENOMEM; } int kbase_csf_tiler_heap_alloc_new_chunk(struct kbase_context *kctx, u64 gpu_heap_va, u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr) { struct kbase_csf_tiler_heap *heap; + struct kbase_csf_tiler_heap_chunk *chunk; int err = -EINVAL; + u64 chunk_size = 0; + u64 heap_id = 0; + + /* To avoid potential locking issues during allocation, this is handled + * in three phases: + * 1. Take the lock, find the corresponding heap, and find its chunk size + * (this is always 2 MB, but may change down the line). + * 2. Allocate memory for the chunk and its region. + * 3. If the heap still exists, link it to the end of the list. If it + * doesn't, roll back the allocation. 
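validate_allocation_request() above reduces the tiler OOM decision to three outcomes: allow a chunk while under the in-flight target, ask the firmware to retry later (-EBUSY) while fragment work that may free chunks is still running, and report -ENOMEM only when nothing more can be done. A small userspace sketch of that decision shape; the target and chunk-limit values are invented, and the chunk-limit check is an assumption based on the surrounding (elided) context of the hunk.

/* Sketch of the OOM allocation decision. Error numbers mirror Linux values;
 * EXAMPLE_TARGET_IN_FLIGHT, EXAMPLE_MAX_CHUNKS and the sample calls are invented.
 */
#include <stdio.h>

#define EXAMPLE_EINVAL 22
#define EXAMPLE_EBUSY  16
#define EXAMPLE_ENOMEM 12
#define EXAMPLE_TARGET_IN_FLIGHT 4u
#define EXAMPLE_MAX_CHUNKS       8u

static int validate_request(unsigned int nr_in_flight, unsigned int pending_frag_count,
			    unsigned int chunk_count)
{
	if (!nr_in_flight || pending_frag_count > nr_in_flight)
		return -EXAMPLE_EINVAL;               /* malformed request            */

	if (nr_in_flight <= EXAMPLE_TARGET_IN_FLIGHT) {
		if (chunk_count < EXAMPLE_MAX_CHUNKS) /* assumed chunk-limit check    */
			return 0;                     /* under target: allow a chunk  */
		if (pending_frag_count > 0)
			return -EXAMPLE_EBUSY;        /* fragment work may free chunks */
		return -EXAMPLE_ENOMEM;               /* heap is at its maximum size  */
	}

	return -EXAMPLE_EBUSY;                        /* target reached: wait          */
}

int main(void)
{
	printf("%d %d %d\n",
	       validate_request(2, 1, 3),   /* under target          -> 0       */
	       validate_request(3, 0, 8),   /* chunk limit, no frags -> -ENOMEM */
	       validate_request(6, 2, 3));  /* over target           -> -EBUSY  */
	return 0;
}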
+ */ mutex_lock(&kctx->csf.tiler_heaps.lock); + heap = find_tiler_heap(kctx, gpu_heap_va); + if (likely(heap)) { + chunk_size = heap->chunk_size; + heap_id = heap->heap_id; + } else { + dev_err(kctx->kbdev->dev, "Heap 0x%llX does not exist", gpu_heap_va); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto prelink_failure; + } + err = validate_allocation_request(heap, nr_in_flight, pending_frag_count); + if (unlikely(err)) { + /* The allocation request can be legitimate, but be invoked on a heap + * that has already reached the maximum pre-configured capacity. This + * is useful debug information, but should not be treated as an error, + * since the request will be re-sent at a later point. + */ + dev_dbg(kctx->kbdev->dev, + "Not allocating new chunk for heap 0x%llX due to current heap state (err %d)", + gpu_heap_va, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto prelink_failure; + } + mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* this heap must not be used whilst we have dropped the lock */ + heap = NULL; + + chunk = alloc_new_chunk(kctx, chunk_size); + if (unlikely(!chunk)) { + dev_err(kctx->kbdev->dev, "Could not allocate chunk of size %lld for ctx %d_%d", + chunk_size, kctx->tgid, kctx->id); + goto prelink_failure; + } + + /* After this point, the heap that we were targeting could already have had the needed + * chunks allocated, if we were handling multiple OoM events on multiple threads, so + * we need to revalidate the need for the allocation. + */ + mutex_lock(&kctx->csf.tiler_heaps.lock); heap = find_tiler_heap(kctx, gpu_heap_va); - if (likely(heap)) { - err = alloc_new_chunk(heap, nr_in_flight, pending_frag_count, - new_chunk_ptr); - if (likely(!err)) { - /* update total and peak tiler heap memory record */ - kctx->running_total_tiler_heap_nr_chunks++; - kctx->running_total_tiler_heap_memory += heap->chunk_size; - - if (kctx->running_total_tiler_heap_memory > - kctx->peak_total_tiler_heap_memory) - kctx->peak_total_tiler_heap_memory = - kctx->running_total_tiler_heap_memory; - } + if (unlikely(!heap)) { + dev_err(kctx->kbdev->dev, "Tiler heap 0x%llX no longer exists!\n", gpu_heap_va); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } - KBASE_TLSTREAM_AUX_TILER_HEAP_STATS( - kctx->kbdev, kctx->id, heap->heap_id, - PFN_UP(heap->chunk_size * heap->max_chunks), - PFN_UP(heap->chunk_size * heap->chunk_count), - heap->max_chunks, heap->chunk_size, heap->chunk_count, - heap->target_in_flight, nr_in_flight); + if (heap_id != heap->heap_id) { + dev_err(kctx->kbdev->dev, + "Tiler heap 0x%llX was removed from ctx %d_%d while allocating chunk of size %lld!", + gpu_heap_va, kctx->tgid, kctx->id, chunk_size); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; } + if (WARN_ON(chunk_size != heap->chunk_size)) { + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + err = validate_allocation_request(heap, nr_in_flight, pending_frag_count); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Aborting linking chunk to heap 0x%llX: heap state changed during allocation (err %d)", + gpu_heap_va, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + err = init_chunk(heap, chunk, false); + + /* On error, the chunk would not be linked, so we can still treat it as an unlinked + * chunk for error handling. 
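The OoM path above deliberately drops the tiler_heaps lock while the chunk is allocated, then re-finds the heap and checks its identity before linking, rolling the allocation back if anything changed. A compact pthread sketch of that "record identity under lock, allocate unlocked, revalidate before linking" shape; the registry, heap id and chunk allocation are all invented stand-ins.

/* Sketch of the three-phase allocate-then-revalidate flow. Everything here
 * (the lock, the id lookup, the 2 MiB allocation) is illustrative only.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t heaps_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t current_heap_id = 42;   /* stands in for the heap list lookup */

static uint64_t find_heap_id_locked(void) { return current_heap_id; }

int main(void)
{
	/* Phase 1: look the heap up and remember its identity */
	pthread_mutex_lock(&heaps_lock);
	uint64_t heap_id = find_heap_id_locked();
	pthread_mutex_unlock(&heaps_lock);

	/* Phase 2: allocate without holding the lock (may be slow, may reclaim) */
	void *chunk = malloc(2 * 1024 * 1024);
	if (!chunk)
		return EXIT_FAILURE;

	/* Phase 3: retake the lock, revalidate, and only then link the chunk */
	pthread_mutex_lock(&heaps_lock);
	if (find_heap_id_locked() != heap_id) {
		pthread_mutex_unlock(&heaps_lock);
		free(chunk);                 /* heap changed: roll the allocation back */
		return EXIT_FAILURE;
	}
	printf("chunk linked to heap %llu\n", (unsigned long long)heap_id);
	pthread_mutex_unlock(&heaps_lock);

	free(chunk);
	return EXIT_SUCCESS;
}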
+ */ + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, + "Could not link chunk(0x%llX) with tiler heap 0%llX in ctx %d_%d due to error %d", + chunk->gpu_va, gpu_heap_va, kctx->tgid, kctx->id, err); + mutex_unlock(&kctx->csf.tiler_heaps.lock); + goto unroll_chunk; + } + + *new_chunk_ptr = encode_chunk_ptr(heap->chunk_size, chunk->gpu_va); + + /* update total and peak tiler heap memory record */ + kctx->running_total_tiler_heap_nr_chunks++; + kctx->running_total_tiler_heap_memory += heap->chunk_size; + + if (kctx->running_total_tiler_heap_memory > kctx->peak_total_tiler_heap_memory) + kctx->peak_total_tiler_heap_memory = kctx->running_total_tiler_heap_memory; + + KBASE_TLSTREAM_AUX_TILER_HEAP_STATS(kctx->kbdev, kctx->id, heap->heap_id, + PFN_UP(heap->chunk_size * heap->max_chunks), + PFN_UP(heap->chunk_size * heap->chunk_count), + heap->max_chunks, heap->chunk_size, heap->chunk_count, + heap->target_in_flight, nr_in_flight); + mutex_unlock(&kctx->csf.tiler_heaps.lock); return err; +unroll_chunk: + remove_unlinked_chunk(kctx, chunk); +prelink_failure: + return err; +} + +static bool delete_chunk_physical_pages(struct kbase_csf_tiler_heap *heap, u64 chunk_gpu_va, + u64 *hdr_val) +{ + int err; + u64 *chunk_hdr; + struct kbase_context *kctx = heap->kctx; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + chunk = find_chunk(heap, chunk_gpu_va); + if (unlikely(!chunk)) { + dev_warn(kctx->kbdev->dev, + "Failed to find tiler heap(0x%llX) chunk(0x%llX) for reclaim-delete\n", + heap->gpu_va, chunk_gpu_va); + return false; + } + + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + chunk_hdr = chunk->map.addr; + *hdr_val = *chunk_hdr; + + dev_dbg(kctx->kbdev->dev, + "Reclaim: delete chunk(0x%llx) in heap(0x%llx), header value(0x%llX)\n", + chunk_gpu_va, heap->gpu_va, *hdr_val); + + err = kbase_mem_shrink_gpu_mapping(kctx, chunk->region, 0, chunk->region->gpu_alloc->nents); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Reclaim: shrinking GPU mapping failed on chunk(0x%llx) in heap(0x%llx) (err %d)\n", + chunk_gpu_va, heap->gpu_va, err); + + /* Cannot free the pages whilst references on the GPU remain, so keep the chunk on + * the heap's chunk list and try a different heap. + */ + + return false; + } + /* Destroy the mapping before the physical pages which are mapped are destroyed. */ + kbase_vunmap(kctx, &chunk->map); + + err = kbase_free_phy_pages_helper(chunk->region->gpu_alloc, + chunk->region->gpu_alloc->nents); + if (unlikely(err)) { + dev_warn( + kctx->kbdev->dev, + "Reclaim: remove physical backing failed on chunk(0x%llx) in heap(0x%llx) (err %d), continuing with deferred removal\n", + chunk_gpu_va, heap->gpu_va, err); + + /* kbase_free_phy_pages_helper() should only fail on invalid input, and WARNs + * anyway, so continue instead of returning early. + * + * Indeed, we don't want to leave the chunk on the heap's chunk list whilst it has + * its mapping removed, as that could lead to problems. It's safest to instead + * continue with deferred destruction of the chunk. 
+ */ + } + + dev_dbg(kctx->kbdev->dev, + "Reclaim: delete chunk(0x%llx) in heap(0x%llx), header value(0x%llX)\n", + chunk_gpu_va, heap->gpu_va, *hdr_val); + + mutex_lock(&heap->kctx->jit_evict_lock); + list_move(&chunk->region->jit_node, &kctx->jit_destroy_head); + mutex_unlock(&heap->kctx->jit_evict_lock); + + list_del(&chunk->link); + heap->chunk_count--; + kfree(chunk); + + return true; +} + +static void sanity_check_gpu_buffer_heap(struct kbase_csf_tiler_heap *heap, + struct kbase_csf_gpu_buffer_heap *desc) +{ + u64 first_hoarded_chunk_gpu_va = desc->pointer & CHUNK_ADDR_MASK; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + if (first_hoarded_chunk_gpu_va) { + struct kbase_csf_tiler_heap_chunk *chunk = + find_chunk(heap, first_hoarded_chunk_gpu_va); + + if (likely(chunk)) { + dev_dbg(heap->kctx->kbdev->dev, + "Buffer descriptor 0x%llX sanity check ok, HW reclaim allowed\n", + heap->buf_desc_va); + + heap->buf_desc_checked = true; + return; + } + } + /* If there is no match, defer the check to next time */ + dev_dbg(heap->kctx->kbdev->dev, "Buffer descriptor 0x%llX runtime sanity check deferred\n", + heap->buf_desc_va); +} + +static bool can_read_hw_gpu_buffer_heap(struct kbase_csf_tiler_heap *heap, u64 *chunk_gpu_va_ptr) +{ + struct kbase_context *kctx = heap->kctx; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + /* Initialize the descriptor pointer value to 0 */ + *chunk_gpu_va_ptr = 0; + + /* The BufferDescriptor on heap is a hint on creation, do a sanity check at runtime */ + if (heap->buf_desc_reg && !heap->buf_desc_checked) { + struct kbase_csf_gpu_buffer_heap *desc = heap->buf_desc_map.addr; + + /* BufferDescriptor is supplied by userspace, so could be CPU-cached */ + if (heap->buf_desc_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED) + kbase_sync_mem_regions(kctx, &heap->buf_desc_map, KBASE_SYNC_TO_CPU); + + sanity_check_gpu_buffer_heap(heap, desc); + if (heap->buf_desc_checked) + *chunk_gpu_va_ptr = desc->pointer & CHUNK_ADDR_MASK; + } + + return heap->buf_desc_checked; +} + +static u32 delete_hoarded_chunks(struct kbase_csf_tiler_heap *heap) +{ + u32 freed = 0; + u64 chunk_gpu_va = 0; + struct kbase_context *kctx = heap->kctx; + struct kbase_csf_tiler_heap_chunk *chunk = NULL; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + if (can_read_hw_gpu_buffer_heap(heap, &chunk_gpu_va)) { + u64 chunk_hdr_val; + u64 *hw_hdr; + + if (!chunk_gpu_va) { + struct kbase_csf_gpu_buffer_heap *desc = heap->buf_desc_map.addr; + + /* BufferDescriptor is supplied by userspace, so could be CPU-cached */ + if (heap->buf_desc_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED) + kbase_sync_mem_regions(kctx, &heap->buf_desc_map, + KBASE_SYNC_TO_CPU); + chunk_gpu_va = desc->pointer & CHUNK_ADDR_MASK; + + if (!chunk_gpu_va) { + dev_dbg(kctx->kbdev->dev, + "Buffer descriptor 0x%llX has no chunks (NULL) for reclaim scan\n", + heap->buf_desc_va); + goto out; + } + } + + chunk = find_chunk(heap, chunk_gpu_va); + if (unlikely(!chunk)) + goto out; + + WARN((chunk->region->flags & KBASE_REG_CPU_CACHED), + "Cannot support CPU cached chunks without sync operations"); + hw_hdr = chunk->map.addr; + + /* Move onto the next chunk relevant information */ + chunk_hdr_val = *hw_hdr; + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + + while (chunk_gpu_va && heap->chunk_count > HEAP_SHRINK_STOP_LIMIT) { + bool success = + delete_chunk_physical_pages(heap, chunk_gpu_va, &chunk_hdr_val); + + if (!success) + break; + + freed++; + /* On success, chunk_hdr_val is updated, extract the next chunk address */ + 
chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + } + + /* Update the existing hardware chunk header, after reclaim deletion of chunks */ + *hw_hdr = chunk_hdr_val; + + dev_dbg(heap->kctx->kbdev->dev, + "HW reclaim scan freed chunks: %u, set hw_hdr[0]: 0x%llX\n", freed, + chunk_hdr_val); + } else { + dev_dbg(kctx->kbdev->dev, + "Skip HW reclaim scan, (disabled: buffer descriptor 0x%llX)\n", + heap->buf_desc_va); + } +out: + return freed; +} + +static u64 delete_unused_chunk_pages(struct kbase_csf_tiler_heap *heap) +{ + u32 freed_chunks = 0; + u64 freed_pages = 0; + u64 chunk_gpu_va; + u64 chunk_hdr_val; + struct kbase_context *kctx = heap->kctx; + u64 *ctx_ptr; + + lockdep_assert_held(&kctx->csf.tiler_heaps.lock); + + WARN(heap->gpu_va_map.flags & KBASE_VMAP_FLAG_SYNC_NEEDED, + "Cannot support CPU cached heap context without sync operations"); + + ctx_ptr = heap->gpu_va_map.addr; + + /* Extract the first chunk address from the context's free_list_head */ + chunk_hdr_val = *ctx_ptr; + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + + while (chunk_gpu_va) { + u64 hdr_val; + bool success = delete_chunk_physical_pages(heap, chunk_gpu_va, &hdr_val); + + if (!success) + break; + + freed_chunks++; + chunk_hdr_val = hdr_val; + /* extract the next chunk address */ + chunk_gpu_va = chunk_hdr_val & CHUNK_ADDR_MASK; + } + + /* Update the post-scan deletion to context header */ + *ctx_ptr = chunk_hdr_val; + + /* Try to scan the HW hoarded list of unused chunks */ + freed_chunks += delete_hoarded_chunks(heap); + freed_pages = freed_chunks * PFN_UP(heap->chunk_size); + dev_dbg(heap->kctx->kbdev->dev, + "Scan reclaim freed chunks/pages %u/%llu, set heap-ctx_u64[0]: 0x%llX\n", + freed_chunks, freed_pages, chunk_hdr_val); + + /* Update context tiler heaps memory usage */ + kctx->running_total_tiler_heap_memory -= freed_pages << PAGE_SHIFT; + kctx->running_total_tiler_heap_nr_chunks -= freed_chunks; + return freed_pages; +} + +u32 kbase_csf_tiler_heap_scan_kctx_unused_pages(struct kbase_context *kctx, u32 to_free) +{ + u64 freed = 0; + struct kbase_csf_tiler_heap *heap; + + mutex_lock(&kctx->csf.tiler_heaps.lock); + + list_for_each_entry(heap, &kctx->csf.tiler_heaps.list, link) { + freed += delete_unused_chunk_pages(heap); + + /* If freed enough, then stop here */ + if (freed >= to_free) + break; + } + + mutex_unlock(&kctx->csf.tiler_heaps.lock); + /* The scan is surely not more than 4-G pages, but for logic flow limit it */ + if (WARN_ON(unlikely(freed > U32_MAX))) + return U32_MAX; + else + return (u32)freed; +} + +static u64 count_unused_heap_pages(struct kbase_csf_tiler_heap *heap) +{ + u32 chunk_cnt = 0; + u64 page_cnt = 0; + + lockdep_assert_held(&heap->kctx->csf.tiler_heaps.lock); + + /* Here the count is basically an informed estimate, avoiding the costly mapping/unmaping + * in the chunk list walk. The downside is that the number is a less reliable guide for + * later on scan (free) calls on this heap for what actually is freeable. 
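The reclaim scan above follows a singly linked list that lives in GPU memory: each chunk's first u64 packs the address of the next chunk, and traversal stops at a NULL link or once only HEAP_SHRINK_STOP_LIMIT chunks would remain. A userspace sketch of walking such packed header words; the mask value, header contents and chunk table are invented, and only the traversal shape mirrors the patch.

/* Sketch of the reclaim-scan walk over packed chunk headers with a stop limit. */
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_ADDR_MASK   0xFFFFFFFFFFFFF000ULL /* stand-in for CHUNK_ADDR_MASK    */
#define EXAMPLE_STOP_LIMIT  1u                    /* mirrors HEAP_SHRINK_STOP_LIMIT  */

struct example_chunk {
	uint64_t gpu_va;
	uint64_t header;  /* next chunk address (plus size bits) written by the tiler */
};

static struct example_chunk *lookup(struct example_chunk *chunks, int n, uint64_t gpu_va)
{
	for (int i = 0; i < n; i++)
		if (chunks[i].gpu_va == gpu_va)
			return &chunks[i];
	return NULL;
}

int main(void)
{
	struct example_chunk chunks[] = {
		{ 0x10000, 0x11000 }, /* chunk 0 links to chunk 1      */
		{ 0x11000, 0x12000 }, /* chunk 1 links to chunk 2      */
		{ 0x12000, 0x0     }, /* chunk 2 terminates the list   */
	};
	unsigned int chunk_count = 3, freed = 0;
	uint64_t next = chunks[0].gpu_va & EXAMPLE_ADDR_MASK;

	while (next && chunk_count > EXAMPLE_STOP_LIMIT) {
		struct example_chunk *c = lookup(chunks, 3, next);

		if (!c)
			break;
		next = c->header & EXAMPLE_ADDR_MASK;  /* follow the packed link */
		chunk_count--;
		freed++;
		printf("freed chunk at 0x%llx\n", (unsigned long long)c->gpu_va);
	}
	printf("freed %u chunks, %u remain\n", freed, chunk_count);
	return 0;
}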
+ */ + if (heap->chunk_count > HEAP_SHRINK_STOP_LIMIT) { + chunk_cnt = heap->chunk_count - HEAP_SHRINK_STOP_LIMIT; + page_cnt = chunk_cnt * PFN_UP(heap->chunk_size); + } + + dev_dbg(heap->kctx->kbdev->dev, + "Reclaim count chunks/pages %u/%llu (estimated), heap_va: 0x%llX\n", chunk_cnt, + page_cnt, heap->gpu_va); + + return page_cnt; +} + +u32 kbase_csf_tiler_heap_count_kctx_unused_pages(struct kbase_context *kctx) +{ + u64 page_cnt = 0; + struct kbase_csf_tiler_heap *heap; + + mutex_lock(&kctx->csf.tiler_heaps.lock); + + list_for_each_entry(heap, &kctx->csf.tiler_heaps.list, link) + page_cnt += count_unused_heap_pages(heap); + + mutex_unlock(&kctx->csf.tiler_heaps.lock); + + /* The count is surely not more than 4-G pages, but for logic flow limit it */ + if (WARN_ON(unlikely(page_cnt > U32_MAX))) + return U32_MAX; + else + return (u32)page_cnt; } diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap.h index 4031ad4..1b5cb56 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap.h +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,6 @@ #define _KBASE_CSF_TILER_HEAP_H_ #include <mali_kbase.h> - /** * kbase_csf_tiler_heap_context_init - Initialize the tiler heaps context for a * GPU address space @@ -58,6 +57,12 @@ void kbase_csf_tiler_heap_context_term(struct kbase_context *kctx); * @target_in_flight: Number of render-passes that the driver should attempt to * keep in flight for which allocation of new chunks is * allowed. Must not be zero. + * @buf_desc_va: Buffer descriptor GPU virtual address. This is a hint for + * indicating that the caller is intending to perform tiler heap + * chunks reclaim for those that are hoarded with hardware while + * the associated shader activites are suspended and the CSGs are + * off slots. If the referred reclaiming is not desired, can + * set it to 0. * @gpu_heap_va: Where to store the GPU virtual address of the context that was * set up for the tiler heap. * @first_chunk_va: Where to store the GPU virtual address of the first chunk @@ -66,13 +71,12 @@ void kbase_csf_tiler_heap_context_term(struct kbase_context *kctx); * * Return: 0 if successful or a negative error code on failure. */ -int kbase_csf_tiler_heap_init(struct kbase_context *kctx, - u32 chunk_size, u32 initial_chunks, u32 max_chunks, - u16 target_in_flight, u64 *gpu_heap_va, - u64 *first_chunk_va); +int kbase_csf_tiler_heap_init(struct kbase_context *kctx, u32 chunk_size, u32 initial_chunks, + u32 max_chunks, u16 target_in_flight, u64 const buf_desc_va, + u64 *gpu_heap_va, u64 *first_chunk_va); /** - * kbasep_cs_tiler_heap_term - Terminate a chunked tiler memory heap. + * kbase_csf_tiler_heap_term - Terminate a chunked tiler memory heap. * * @kctx: Pointer to the kbase context in which the tiler heap was initialized. 
* @gpu_heap_va: The GPU virtual address of the context that was set up for the @@ -112,4 +116,27 @@ int kbase_csf_tiler_heap_term(struct kbase_context *kctx, u64 gpu_heap_va); */ int kbase_csf_tiler_heap_alloc_new_chunk(struct kbase_context *kctx, u64 gpu_heap_va, u32 nr_in_flight, u32 pending_frag_count, u64 *new_chunk_ptr); + +/** + * kbase_csf_tiler_heap_scan_kctx_unused_pages - Performs the tiler heap shrinker reclaim's scan + * functionality. + * + * @kctx: Pointer to the kbase context for which the tiler heap reclaim is to be + * performed. + * @to_free: Number of pages suggested for the reclaim scan (free) method to reach. + * + * Return: the actual number of pages the scan method has freed from the call. + */ +u32 kbase_csf_tiler_heap_scan_kctx_unused_pages(struct kbase_context *kctx, u32 to_free); + +/** + * kbase_csf_tiler_heap_count_kctx_unused_pages - Performs the tiler heap shrinker reclaim's count + * functionality. + * + * @kctx: Pointer to the kbase context for which the tiler heap reclaim is to be + * performed. + * + * Return: a number of pages that could likely be freed on the subsequent scan method call. + */ +u32 kbase_csf_tiler_heap_count_kctx_unused_pages(struct kbase_context *kctx); #endif diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h index 2c006d9..96f2b03 100644 --- a/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_def.h @@ -56,12 +56,20 @@ ((CHUNK_HDR_NEXT_ADDR_MASK >> CHUNK_HDR_NEXT_ADDR_POS) << \ CHUNK_HDR_NEXT_ADDR_ENCODE_SHIFT) +/* The size of the area needed to be vmapped prior to handing the tiler heap + * over to the tiler, so that the shrinker could be invoked. + */ +#define NEXT_CHUNK_ADDR_SIZE (sizeof(u64)) + /** * struct kbase_csf_tiler_heap_chunk - A tiler heap chunk managed by the kernel * * @link: Link to this chunk in a list of chunks belonging to a * @kbase_csf_tiler_heap. * @region: Pointer to the GPU memory region allocated for the chunk. + * @map: Kernel VA mapping so that we would not need to use vmap in the + * shrinker callback, which can allocate. This maps only the header + * of the chunk, so it could be traversed. * @gpu_va: GPU virtual address of the start of the memory region. * This points to the header of the chunk and not to the low address * of free memory within it. @@ -75,9 +83,12 @@ struct kbase_csf_tiler_heap_chunk { struct list_head link; struct kbase_va_region *region; + struct kbase_vmap_struct map; u64 gpu_va; }; + +#define HEAP_BUF_DESCRIPTOR_CHECKED (1 << 0) + /** * struct kbase_csf_tiler_heap - A tiler heap managed by the kernel * * @kctx: Pointer to the kbase context with which this heap is * associated. * @link: Link to this heap in a list of tiler heaps belonging to * the @kbase_csf_tiler_heap_context. + * @chunks_list: Linked list of allocated chunks. + * @gpu_va: The GPU virtual address of the heap context structure that + * was allocated for the firmware. This is also used to + * uniquely identify the heap. + * @heap_id: Unique id representing the heap, assigned during heap + * initialization. + * @buf_desc_va: Buffer descriptor GPU VA. Can be 0 for backward compatibility + * with earlier versions of the base interface. + * @buf_desc_reg: Pointer to the VA region that covers the provided buffer + * descriptor memory object pointed to by buf_desc_va. + * @gpu_va_map: Kernel VA mapping of the GPU VA region. + * @buf_desc_map: Kernel VA mapping of the buffer descriptor, read from + * during the tiler heap shrinker.
Sync operations may need + * to be done before each read. * @chunk_size: Size of each chunk, in bytes. Must be page-aligned. * @chunk_count: The number of chunks currently allocated. Must not be * zero or greater than @max_chunks. @@ -93,22 +118,23 @@ struct kbase_csf_tiler_heap_chunk { * @target_in_flight: Number of render-passes that the driver should attempt * to keep in flight for which allocation of new chunks is * allowed. Must not be zero. - * @gpu_va: The GPU virtual address of the heap context structure that - * was allocated for the firmware. This is also used to - * uniquely identify the heap. - * @heap_id: Unique id representing the heap, assigned during heap - * initialization. - * @chunks_list: Linked list of allocated chunks. + * @buf_desc_checked: Indicates if runtime check on buffer descriptor has been done. */ struct kbase_csf_tiler_heap { struct kbase_context *kctx; struct list_head link; + struct list_head chunks_list; + u64 gpu_va; + u64 heap_id; + u64 buf_desc_va; + struct kbase_va_region *buf_desc_reg; + struct kbase_vmap_struct buf_desc_map; + struct kbase_vmap_struct gpu_va_map; u32 chunk_size; u32 chunk_count; u32 max_chunks; u16 target_in_flight; - u64 gpu_va; - u64 heap_id; - struct list_head chunks_list; + bool buf_desc_checked; }; + #endif /* !_KBASE_CSF_TILER_HEAP_DEF_H_ */ diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c new file mode 100644 index 0000000..39db1a0 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.c @@ -0,0 +1,394 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#include <mali_kbase.h> +#include "backend/gpu/mali_kbase_pm_internal.h" +#include "mali_kbase_csf.h" +#include "mali_kbase_csf_tiler_heap.h" +#include "mali_kbase_csf_tiler_heap_reclaim.h" + +/* Tiler heap shrinker seek value, needs to be higher than jit and memory pools */ +#define HEAP_SHRINKER_SEEKS (DEFAULT_SEEKS + 2) + +/* Tiler heap shrinker batch value */ +#define HEAP_SHRINKER_BATCH (512) + +/* Tiler heap reclaim scan (free) method size for limiting a scan run length */ +#define HEAP_RECLAIM_SCAN_BATCH_SIZE (HEAP_SHRINKER_BATCH << 7) + +static u8 get_kctx_highest_csg_priority(struct kbase_context *kctx) +{ + u8 prio; + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_LOW; + prio++) + if (!list_empty(&kctx->csf.sched.runnable_groups[prio])) + break; + + if (prio != KBASE_QUEUE_GROUP_PRIORITY_REALTIME && kctx->csf.sched.num_idle_wait_grps) { + struct kbase_queue_group *group; + + list_for_each_entry(group, &kctx->csf.sched.idle_wait_groups, link) { + if (group->priority < prio) + prio = group->priority; + } + } + + return prio; +} + +static void detach_ctx_from_heap_reclaim_mgr(struct kbase_context *kctx) +{ + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&scheduler->lock); + + if (!list_empty(&info->mgr_link)) { + u32 remaining = (info->nr_est_unused_pages > info->nr_freed_pages) ? + info->nr_est_unused_pages - info->nr_freed_pages : + 0; + + list_del_init(&info->mgr_link); + if (remaining) + WARN_ON(atomic_sub_return(remaining, &scheduler->reclaim_mgr.unused_pages) < + 0); + + dev_dbg(kctx->kbdev->dev, + "Reclaim_mgr_detach: ctx_%d_%d, est_pages=0%u, freed_pages=%u", kctx->tgid, + kctx->id, info->nr_est_unused_pages, info->nr_freed_pages); + } +} + +static void attach_ctx_to_heap_reclaim_mgr(struct kbase_context *kctx) +{ + struct kbase_csf_ctx_heap_reclaim_info *const info = &kctx->csf.sched.heap_info; + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + u8 const prio = get_kctx_highest_csg_priority(kctx); + + lockdep_assert_held(&scheduler->lock); + + if (WARN_ON(!list_empty(&info->mgr_link))) + list_del_init(&info->mgr_link); + + /* Count the pages that could be freed */ + info->nr_est_unused_pages = kbase_csf_tiler_heap_count_kctx_unused_pages(kctx); + /* Initialize the scan operation tracking pages */ + info->nr_freed_pages = 0; + + list_add_tail(&info->mgr_link, &scheduler->reclaim_mgr.ctx_lists[prio]); + /* Accumulate the estimated pages to the manager total field */ + atomic_add(info->nr_est_unused_pages, &scheduler->reclaim_mgr.unused_pages); + + dev_dbg(kctx->kbdev->dev, "Reclaim_mgr_attach: ctx_%d_%d, est_count_pages=%u", kctx->tgid, + kctx->id, info->nr_est_unused_pages); +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(struct kbase_queue_group *group) +{ + struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + info->on_slot_grps++; + /* If the kctx has an on-slot change from 0 => 1, detach it from reclaim_mgr */ + if (info->on_slot_grps == 1) { + dev_dbg(kctx->kbdev->dev, "CSG_%d_%d_%d on-slot, remove kctx from reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + detach_ctx_from_heap_reclaim_mgr(kctx); + } +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(struct kbase_queue_group *group) +{ + 
struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *const info = &kctx->csf.sched.heap_info; + struct kbase_csf_scheduler *const scheduler = &kctx->kbdev->csf.scheduler; + const u32 num_groups = kctx->kbdev->csf.global_iface.group_num; + u32 on_slot_grps = 0; + u32 i; + + lockdep_assert_held(&scheduler->lock); + + /* Group eviction from the scheduler is a bit more complex, but fairly less + * frequent in operations. Taking the opportunity to actually count the + * on-slot CSGs from the given kctx, for robustness and clearer code logic. + */ + for_each_set_bit(i, scheduler->csg_inuse_bitmap, num_groups) { + struct kbase_csf_csg_slot *csg_slot = &scheduler->csg_slots[i]; + struct kbase_queue_group *grp = csg_slot->resident_group; + + if (unlikely(!grp)) + continue; + + if (grp->kctx == kctx) + on_slot_grps++; + } + + info->on_slot_grps = on_slot_grps; + + /* If the kctx has no other CSGs on-slot, handle the heap reclaim related actions */ + if (!info->on_slot_grps) { + if (kctx->csf.sched.num_runnable_grps || kctx->csf.sched.num_idle_wait_grps) { + /* The kctx has other operational CSGs, attach it if not yet done */ + if (list_empty(&info->mgr_link)) { + dev_dbg(kctx->kbdev->dev, + "CSG_%d_%d_%d evict, add kctx to reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + attach_ctx_to_heap_reclaim_mgr(kctx); + } + } else { + /* The kctx is a zombie after the group eviction, drop it out */ + dev_dbg(kctx->kbdev->dev, + "CSG_%d_%d_%d evict leading to zombie kctx, dettach from reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + detach_ctx_from_heap_reclaim_mgr(kctx); + } + } +} + +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(struct kbase_queue_group *group) +{ + struct kbase_context *kctx = group->kctx; + struct kbase_csf_ctx_heap_reclaim_info *info = &kctx->csf.sched.heap_info; + + lockdep_assert_held(&kctx->kbdev->csf.scheduler.lock); + + if (!WARN_ON(info->on_slot_grps == 0)) + info->on_slot_grps--; + /* If the kctx has no CSGs on-slot, attach it to scheduler's reclaim manager */ + if (info->on_slot_grps == 0) { + dev_dbg(kctx->kbdev->dev, "CSG_%d_%d_%d off-slot, add kctx to reclaim manager", + group->kctx->tgid, group->kctx->id, group->handle); + + attach_ctx_to_heap_reclaim_mgr(kctx); + } +} + +static unsigned long reclaim_unused_heap_pages(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + struct kbase_csf_sched_heap_reclaim_mgr *const mgr = &scheduler->reclaim_mgr; + unsigned long total_freed_pages = 0; + int prio; + + lockdep_assert_held(&scheduler->lock); + + if (scheduler->state != SCHED_SUSPENDED) { + /* Clean and invalidate the L2 cache before reading from the heap contexts, + * headers of the individual chunks and buffer descriptors. + */ + kbase_gpu_start_cache_clean(kbdev, GPU_COMMAND_CACHE_CLN_INV_L2); + if (kbase_gpu_wait_cache_clean_timeout(kbdev, + kbdev->mmu_or_gpu_cache_op_wait_time_ms)) + dev_warn( + kbdev->dev, + "[%llu] Timeout waiting for CACHE_CLN_INV_L2 to complete before Tiler heap reclaim", + kbase_backend_get_cycle_cnt(kbdev)); + + } else { + /* Make sure power down transitions have completed, i.e. L2 has been + * powered off as that would ensure its contents are flushed to memory. + * This is needed as Scheduler doesn't wait for the power down to finish. 
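A quick aside on the bookkeeping performed by the attach/detach helpers earlier in this file: when a context goes fully off-slot its estimated reclaimable pages are added to a single atomic total, and on detach only the not-yet-freed remainder is subtracted, so the shrinker's count callback can read one counter instead of walking every heap. A rough standalone model (stand-in names, C11 atomics in place of the kernel's atomic_t):

#include <stdatomic.h>
#include <stdint.h>

struct model_ctx_info {
	uint32_t est_unused_pages;
	uint32_t freed_pages;
	int attached;
};

static atomic_uint total_unused_pages; /* models reclaim_mgr.unused_pages */

static void model_attach(struct model_ctx_info *info, uint32_t estimate)
{
	info->est_unused_pages = estimate;
	info->freed_pages = 0;
	info->attached = 1;
	atomic_fetch_add(&total_unused_pages, estimate);
}

static void model_detach(struct model_ctx_info *info)
{
	uint32_t remaining;

	if (!info->attached)
		return;
	remaining = info->est_unused_pages > info->freed_pages ?
			    info->est_unused_pages - info->freed_pages : 0;
	atomic_fetch_sub(&total_unused_pages, remaining);
	info->attached = 0;
}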
+ */ + if (kbase_pm_wait_for_desired_state(kbdev)) + dev_warn(kbdev->dev, + "Wait for power down transition failed before Tiler heap reclaim"); + } + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_LOW; + total_freed_pages < HEAP_RECLAIM_SCAN_BATCH_SIZE && + prio >= KBASE_QUEUE_GROUP_PRIORITY_REALTIME; + prio--) { + struct kbase_csf_ctx_heap_reclaim_info *info, *tmp; + u32 cnt_ctxs = 0; + + list_for_each_entry_safe(info, tmp, &scheduler->reclaim_mgr.ctx_lists[prio], + mgr_link) { + struct kbase_context *kctx = + container_of(info, struct kbase_context, csf.sched.heap_info); + u32 freed_pages = kbase_csf_tiler_heap_scan_kctx_unused_pages( + kctx, info->nr_est_unused_pages); + + if (freed_pages) { + /* Remove the freed pages from the manager retained estimate. The + * accumulated removals from the kctx should not exceed the kctx + * initially notified contribution amount: + * info->nr_est_unused_pages. + */ + u32 rm_cnt = MIN(info->nr_est_unused_pages - info->nr_freed_pages, + freed_pages); + + WARN_ON(atomic_sub_return(rm_cnt, &mgr->unused_pages) < 0); + + /* tracking the freed pages, before a potential detach call */ + info->nr_freed_pages += freed_pages; + total_freed_pages += freed_pages; + + schedule_work(&kctx->jit_work); + } + + /* If the kctx can't offer anymore, drop it from the reclaim manger, + * otherwise leave it remaining in. If the kctx changes its state (i.e. + * some CSGs becoming on-slot), the scheduler will pull it out. + */ + if (info->nr_freed_pages >= info->nr_est_unused_pages || freed_pages == 0) + detach_ctx_from_heap_reclaim_mgr(kctx); + + cnt_ctxs++; + + /* Enough has been freed, break to avoid holding the lock too long */ + if (total_freed_pages >= HEAP_RECLAIM_SCAN_BATCH_SIZE) + break; + } + + dev_dbg(kbdev->dev, "Reclaim free heap pages: %lu (cnt_ctxs: %u, prio: %d)", + total_freed_pages, cnt_ctxs, prio); + } + + dev_dbg(kbdev->dev, "Reclaim free total heap pages: %lu (across all CSG priority)", + total_freed_pages); + + return total_freed_pages; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_count_free_pages(struct kbase_device *kbdev, + struct shrink_control *sc) +{ + struct kbase_csf_sched_heap_reclaim_mgr *mgr = &kbdev->csf.scheduler.reclaim_mgr; + unsigned long page_cnt = atomic_read(&mgr->unused_pages); + + dev_dbg(kbdev->dev, "Reclaim count unused pages (estimate): %lu", page_cnt); + + return page_cnt; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_scan_free_pages(struct kbase_device *kbdev, + struct shrink_control *sc) +{ + struct kbase_csf_sched_heap_reclaim_mgr *mgr = &kbdev->csf.scheduler.reclaim_mgr; + unsigned long freed = 0; + unsigned long avail = 0; + + /* If Scheduler is busy in action, return 0 */ + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + + /* Wait for roughly 2-ms */ + wait_event_timeout(kbdev->csf.event_wait, (scheduler->state != SCHED_BUSY), + msecs_to_jiffies(2)); + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + dev_dbg(kbdev->dev, "Tiler heap reclaim scan see device busy (freed: 0)"); + return 0; + } + } + + avail = atomic_read(&mgr->unused_pages); + if (avail) + freed = reclaim_unused_heap_pages(kbdev); + + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + +#if (KERNEL_VERSION(4, 14, 0) <= LINUX_VERSION_CODE) + if (freed > sc->nr_to_scan) + sc->nr_scanned = freed; +#endif /* (KERNEL_VERSION(4, 14, 0) <= LINUX_VERSION_CODE) */ + + dev_dbg(kbdev->dev, "Tiler heap reclaim scan freed pages: %lu (unused: %lu)", freed, + avail); + + /* On 
estimate suggesting available, yet actual free failed, return STOP */ + if (avail && !freed) + return SHRINK_STOP; + else + return freed; +} + +static unsigned long kbase_csf_tiler_heap_reclaim_count_objects(struct shrinker *s, + struct shrink_control *sc) +{ + struct kbase_device *kbdev = + container_of(s, struct kbase_device, csf.scheduler.reclaim_mgr.heap_reclaim); + + return kbase_csf_tiler_heap_reclaim_count_free_pages(kbdev, sc); +} + +static unsigned long kbase_csf_tiler_heap_reclaim_scan_objects(struct shrinker *s, + struct shrink_control *sc) +{ + struct kbase_device *kbdev = + container_of(s, struct kbase_device, csf.scheduler.reclaim_mgr.heap_reclaim); + + return kbase_csf_tiler_heap_reclaim_scan_free_pages(kbdev, sc); +} + +void kbase_csf_tiler_heap_reclaim_ctx_init(struct kbase_context *kctx) +{ + /* Per-kctx heap_info object initialization */ + memset(&kctx->csf.sched.heap_info, 0, sizeof(struct kbase_csf_ctx_heap_reclaim_info)); + INIT_LIST_HEAD(&kctx->csf.sched.heap_info.mgr_link); +} + +void kbase_csf_tiler_heap_reclaim_mgr_init(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + struct shrinker *reclaim = &scheduler->reclaim_mgr.heap_reclaim; + u8 prio; + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_COUNT; + prio++) + INIT_LIST_HEAD(&scheduler->reclaim_mgr.ctx_lists[prio]); + + atomic_set(&scheduler->reclaim_mgr.unused_pages, 0); + + reclaim->count_objects = kbase_csf_tiler_heap_reclaim_count_objects; + reclaim->scan_objects = kbase_csf_tiler_heap_reclaim_scan_objects; + reclaim->seeks = HEAP_SHRINKER_SEEKS; + reclaim->batch = HEAP_SHRINKER_BATCH; + +#if !defined(CONFIG_MALI_VECTOR_DUMP) +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(reclaim); +#else + register_shrinker(reclaim, "mali-csf-tiler-heap"); +#endif +#endif +} + +void kbase_csf_tiler_heap_reclaim_mgr_term(struct kbase_device *kbdev) +{ + struct kbase_csf_scheduler *scheduler = &kbdev->csf.scheduler; + u8 prio; + +#if !defined(CONFIG_MALI_VECTOR_DUMP) + unregister_shrinker(&scheduler->reclaim_mgr.heap_reclaim); +#endif + + for (prio = KBASE_QUEUE_GROUP_PRIORITY_REALTIME; prio < KBASE_QUEUE_GROUP_PRIORITY_COUNT; + prio++) + WARN_ON(!list_empty(&scheduler->reclaim_mgr.ctx_lists[prio])); + + WARN_ON(atomic_read(&scheduler->reclaim_mgr.unused_pages)); +} diff --git a/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h new file mode 100644 index 0000000..b6e580e --- /dev/null +++ b/mali_kbase/csf/mali_kbase_csf_tiler_heap_reclaim.h @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
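The registration code above follows the stock Linux shrinker pattern: count_objects returns a cheap estimate, scan_objects does the actual freeing and may return SHRINK_STOP, and register_shrinker() gained a name argument in kernel 6.0, hence the version split. For reference, a minimal self-contained shrinker module using the same wiring is sketched below; it is a generic example with made-up names, not the Mali callbacks, and it assumes a kernel in the 6.0 to 6.6 range where this register_shrinker() form applies.

#include <linux/module.h>
#include <linux/shrinker.h>
#include <linux/atomic.h>

static atomic_long_t demo_unused_pages = ATOMIC_LONG_INIT(0);

static unsigned long demo_count(struct shrinker *s, struct shrink_control *sc)
{
	/* Cheap estimate only; returning 0 tells the core there is nothing to scan. */
	return atomic_long_read(&demo_unused_pages);
}

static unsigned long demo_scan(struct shrinker *s, struct shrink_control *sc)
{
	unsigned long avail = atomic_long_read(&demo_unused_pages);
	unsigned long freed = avail < sc->nr_to_scan ? avail : sc->nr_to_scan;

	/* Real code would release memory here; SHRINK_STOP aborts further scanning. */
	atomic_long_sub(freed, &demo_unused_pages);
	return freed ? freed : SHRINK_STOP;
}

static struct shrinker demo_shrinker = {
	.count_objects = demo_count,
	.scan_objects = demo_scan,
	.seeks = DEFAULT_SEEKS,
};

static int __init demo_shrinker_init(void)
{
	/* Two-argument form since 6.0; older kernels take only the shrinker pointer. */
	return register_shrinker(&demo_shrinker, "demo-shrinker");
}

static void __exit demo_shrinker_exit(void)
{
	unregister_shrinker(&demo_shrinker);
}

module_init(demo_shrinker_init);
module_exit(demo_shrinker_exit);
MODULE_LICENSE("GPL");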
+ * + */ + +#ifndef _KBASE_CSF_TILER_HEAP_RECLAIM_H_ +#define _KBASE_CSF_TILER_HEAP_RECLAIM_H_ + +#include <mali_kbase.h> + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_active - Notifier function for the scheduler + * to use when a group is put on-slot. + * + * @group: Pointer to the group object that has been placed on-slot for running. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_active(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict - Notifier function for the scheduler + * to use when a group is evicted out of the scheduler's scope, i.e. no run of + * the group is possible afterwards. + * + * @group: Pointer to the group object that has been evicted. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_evict(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend - Notifier function for the scheduler + * to use when a group is suspended from running, but could resume in the future. + * + * @group: Pointer to the group object that is in suspended state. + * + */ +void kbase_csf_tiler_heap_reclaim_sched_notify_grp_suspend(struct kbase_queue_group *group); + +/** + * kbase_csf_tiler_heap_reclaim_ctx_init - Initializer for the per-context data fields used + * with the tiler heap reclaim manager. + * + * @kctx: Pointer to the kbase_context. + * + */ +void kbase_csf_tiler_heap_reclaim_ctx_init(struct kbase_context *kctx); + +/** + * kbase_csf_tiler_heap_reclaim_mgr_init - Initializer for the tiler heap reclaim manager. + * + * @kbdev: Pointer to the device. + * + */ +void kbase_csf_tiler_heap_reclaim_mgr_init(struct kbase_device *kbdev); + +/** + * kbase_csf_tiler_heap_reclaim_mgr_term - Termination call for the tiler heap reclaim manager. + * + * @kbdev: Pointer to the device. + * + */ +void kbase_csf_tiler_heap_reclaim_mgr_term(struct kbase_device *kbdev); + +#endif diff --git a/mali_kbase/csf/mali_kbase_csf_timeout.c b/mali_kbase/csf/mali_kbase_csf_timeout.c index ea6c116..f7fcbb1 100644 --- a/mali_kbase/csf/mali_kbase_csf_timeout.c +++ b/mali_kbase/csf/mali_kbase_csf_timeout.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,7 @@ static int set_timeout(struct kbase_device *const kbdev, u64 const timeout) dev_dbg(kbdev->dev, "New progress timeout: %llu cycles\n", timeout); atomic64_set(&kbdev->csf.progress_timeout, timeout); + kbase_device_set_timeout(kbdev, CSF_SCHED_PROTM_PROGRESS_TIMEOUT, timeout, 1); return 0; } @@ -100,7 +101,7 @@ static ssize_t progress_timeout_store(struct device * const dev, if (!err) { kbase_csf_scheduler_pm_active(kbdev); - err = kbase_csf_scheduler_wait_mcu_active(kbdev); + err = kbase_csf_scheduler_killable_wait_mcu_active(kbdev); if (!err) err = kbase_csf_firmware_set_timeout(kbdev, timeout); @@ -147,8 +148,14 @@ int kbase_csf_timeout_init(struct kbase_device *const kbdev) int err; #if IS_ENABLED(CONFIG_OF) - err = of_property_read_u64(kbdev->dev->of_node, - "progress_timeout", &timeout); + /* Read "progress-timeout" property and fall back to "progress_timeout" + * if not found.
+ */ + err = of_property_read_u64(kbdev->dev->of_node, "progress-timeout", &timeout); + + if (err == -EINVAL) + err = of_property_read_u64(kbdev->dev->of_node, "progress_timeout", &timeout); + if (!err) dev_info(kbdev->dev, "Found progress_timeout = %llu in Devicetree\n", timeout); diff --git a/mali_kbase/csf/mali_kbase_csf_tl_reader.c b/mali_kbase/csf/mali_kbase_csf_tl_reader.c index f40be8f..ce50683 100644 --- a/mali_kbase/csf/mali_kbase_csf_tl_reader.c +++ b/mali_kbase/csf/mali_kbase_csf_tl_reader.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,21 +31,14 @@ #include "mali_kbase_pm.h" #include "mali_kbase_hwaccess_time.h" -#include <linux/gcd.h> #include <linux/math64.h> -#include <asm/arch_timer.h> #if IS_ENABLED(CONFIG_DEBUG_FS) #include "tl/mali_kbase_timeline_priv.h" #include <linux/debugfs.h> - -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif +#include <linux/version_compat_defs.h> #endif -/* Name of the CSFFW timeline tracebuffer. */ -#define KBASE_CSFFW_TRACEBUFFER_NAME "timeline" /* Name of the timeline header metatadata */ #define KBASE_CSFFW_TIMELINE_HEADER_NAME "timeline_header" @@ -92,93 +85,15 @@ DEFINE_DEBUGFS_ATTRIBUTE(kbase_csf_tl_poll_interval_fops, kbase_csf_tl_debugfs_poll_interval_read, kbase_csf_tl_debugfs_poll_interval_write, "%llu\n"); - void kbase_csf_tl_reader_debugfs_init(struct kbase_device *kbdev) { debugfs_create_file("csf_tl_poll_interval_in_ms", 0644, kbdev->debugfs_instr_directory, kbdev, &kbase_csf_tl_poll_interval_fops); - } #endif /** - * get_cpu_gpu_time() - Get current CPU and GPU timestamps. - * - * @kbdev: Kbase device. - * @cpu_ts: Output CPU timestamp. - * @gpu_ts: Output GPU timestamp. - * @gpu_cycle: Output GPU cycle counts. - */ -static void get_cpu_gpu_time( - struct kbase_device *kbdev, - u64 *cpu_ts, - u64 *gpu_ts, - u64 *gpu_cycle) -{ - struct timespec64 ts; - - kbase_pm_context_active(kbdev); - kbase_backend_get_gpu_time(kbdev, gpu_cycle, gpu_ts, &ts); - kbase_pm_context_idle(kbdev); - - if (cpu_ts) - *cpu_ts = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec; -} - - -/** - * kbase_ts_converter_init() - Initialize system timestamp converter. - * - * @self: System Timestamp Converter instance. - * @kbdev: Kbase device pointer - * - * Return: Zero on success, -1 otherwise. - */ -static int kbase_ts_converter_init( - struct kbase_ts_converter *self, - struct kbase_device *kbdev) -{ - u64 cpu_ts = 0; - u64 gpu_ts = 0; - u64 freq; - u64 common_factor; - - get_cpu_gpu_time(kbdev, &cpu_ts, &gpu_ts, NULL); - freq = arch_timer_get_cntfrq(); - - if (!freq) { - dev_warn(kbdev->dev, "arch_timer_get_rate() is zero!"); - return -1; - } - - common_factor = gcd(NSEC_PER_SEC, freq); - - self->multiplier = div64_u64(NSEC_PER_SEC, common_factor); - self->divisor = div64_u64(freq, common_factor); - self->offset = - cpu_ts - div64_u64(gpu_ts * self->multiplier, self->divisor); - - return 0; -} - -/** - * kbase_ts_converter_convert() - Convert GPU timestamp to CPU timestamp. - * - * @self: System Timestamp Converter instance. - * @gpu_ts: System timestamp value to converter. - * - * Return: The CPU timestamp. 
- */ -static void __maybe_unused -kbase_ts_converter_convert(const struct kbase_ts_converter *self, u64 *gpu_ts) -{ - u64 old_gpu_ts = *gpu_ts; - *gpu_ts = div64_u64(old_gpu_ts * self->multiplier, self->divisor) + - self->offset; -} - -/** * tl_reader_overflow_notify() - Emit stream overflow tracepoint. * * @self: CSFFW TL Reader instance. @@ -254,7 +169,6 @@ static void tl_reader_reset(struct kbase_csf_tl_reader *self) self->tl_header.btc = 0; } - int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) { int ret = 0; @@ -279,7 +193,6 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) return -EBUSY; } - /* Copying the whole buffer in a single shot. We assume * that the buffer will not contain partially written messages. */ @@ -301,7 +214,7 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) dev_warn( kbdev->dev, "Unable to parse CSFFW tracebuffer event header."); - ret = -EBUSY; + ret = -EBUSY; break; } @@ -322,7 +235,7 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) dev_warn(kbdev->dev, "event_id: %u, can't read with event_size: %u.", event_id, event_size); - ret = -EBUSY; + ret = -EBUSY; break; } @@ -330,8 +243,8 @@ int kbase_csf_tl_reader_flush_buffer(struct kbase_csf_tl_reader *self) { struct kbase_csffw_tl_message *msg = (struct kbase_csffw_tl_message *) csffw_data_it; - kbase_ts_converter_convert(&self->ts_converter, - &msg->timestamp); + msg->timestamp = + kbase_backend_time_convert_gpu_to_cpu(kbdev, msg->timestamp); } /* Copy the message out to the tl_stream. */ @@ -384,16 +297,13 @@ static int tl_reader_init_late( if (self->kbdev) return 0; - tb = kbase_csf_firmware_get_trace_buffer( - kbdev, KBASE_CSFFW_TRACEBUFFER_NAME); + tb = kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_TIMELINE_BUF_NAME); hdr = kbase_csf_firmware_get_timeline_metadata( kbdev, KBASE_CSFFW_TIMELINE_HEADER_NAME, &hdr_size); if (!tb) { - dev_warn( - kbdev->dev, - "'%s' tracebuffer is not present in the firmware image.", - KBASE_CSFFW_TRACEBUFFER_NAME); + dev_warn(kbdev->dev, "'%s' tracebuffer is not present in the firmware image.", + KBASE_CSFFW_TIMELINE_BUF_NAME); return -1; } @@ -405,9 +315,6 @@ static int tl_reader_init_late( return -1; } - if (kbase_ts_converter_init(&self->ts_converter, kbdev)) - return -1; - self->kbdev = kbdev; self->trace_buffer = tb; self->tl_header.data = hdr; diff --git a/mali_kbase/csf/mali_kbase_csf_tl_reader.h b/mali_kbase/csf/mali_kbase_csf_tl_reader.h index d554d56..12b285f 100644 --- a/mali_kbase/csf/mali_kbase_csf_tl_reader.h +++ b/mali_kbase/csf/mali_kbase_csf_tl_reader.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -40,37 +40,6 @@ struct kbase_tlstream; struct kbase_device; /** - * struct kbase_ts_converter - System timestamp to CPU timestamp converter state. - * - * @multiplier: Numerator of the converter's fraction. - * @divisor: Denominator of the converter's fraction. - * @offset: Converter's offset term. - * - * According to Generic timer spec, system timer: - * - Increments at a fixed frequency - * - Starts operating from zero - * - * Hence CPU time is a linear function of System Time. 
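For reference, the converter being deleted here implements exactly the linear relation spelled out in the comment below (CPU_ts = alpha * SYS_ts + beta, with alpha = NSEC_PER_SEC divided by the counter frequency and the ratio reduced by their GCD to limit rounding error); the helper this diff switches to is kbase_backend_time_convert_gpu_to_cpu(). A standalone userspace model of the arithmetic, with illustrative sample values, might look like this:

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

struct ts_converter {
	uint64_t multiplier; /* NSEC_PER_SEC / gcd */
	uint64_t divisor;    /* counter freq / gcd */
	int64_t offset;      /* beta = CPU_ts_sample - SYS_ts_sample * alpha */
};

static void converter_init(struct ts_converter *c, uint64_t freq, uint64_t cpu_ts, uint64_t sys_ts)
{
	uint64_t g = gcd_u64(NSEC_PER_SEC, freq);

	c->multiplier = NSEC_PER_SEC / g;
	c->divisor = freq / g;
	c->offset = (int64_t)(cpu_ts - (sys_ts * c->multiplier) / c->divisor);
}

static uint64_t converter_convert(const struct ts_converter *c, uint64_t sys_ts)
{
	return (sys_ts * c->multiplier) / c->divisor + (uint64_t)c->offset;
}

int main(void)
{
	struct ts_converter c;

	/* Illustrative sample: 25 MHz system counter, one paired timestamp sample. */
	converter_init(&c, 25000000ULL, 123456789ULL, 1000ULL);
	printf("%llu\n", (unsigned long long)converter_convert(&c, 2000ULL));
	return 0;
}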
- * - * CPU_ts = alpha * SYS_ts + beta - * - * Where - * - alpha = 10^9/SYS_ts_freq - * - beta is calculated by two timer samples taken at the same time: - * beta = CPU_ts_s - SYS_ts_s * alpha - * - * Since alpha is a rational number, we minimizing possible - * rounding error by simplifying the ratio. Thus alpha is stored - * as a simple `multiplier / divisor` ratio. - * - */ -struct kbase_ts_converter { - u64 multiplier; - u64 divisor; - s64 offset; -}; - -/** * struct kbase_csf_tl_reader - CSFFW timeline reader state. * * @read_timer: Timer used for periodical tracebufer reading. @@ -106,7 +75,6 @@ struct kbase_csf_tl_reader { size_t size; size_t btc; } tl_header; - struct kbase_ts_converter ts_converter; bool got_first_event; bool is_active; diff --git a/mali_kbase/csf/mali_kbase_csf_trace_buffer.c b/mali_kbase/csf/mali_kbase_csf_trace_buffer.c index e90d30d..2b63f19 100644 --- a/mali_kbase/csf/mali_kbase_csf_trace_buffer.c +++ b/mali_kbase/csf/mali_kbase_csf_trace_buffer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,12 +28,7 @@ #include <linux/list.h> #include <linux/mman.h> - -#if IS_ENABLED(CONFIG_DEBUG_FS) -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif -#endif +#include <linux/version_compat_defs.h> /** * struct firmware_trace_buffer - Trace Buffer within the MCU firmware @@ -94,7 +89,7 @@ struct firmware_trace_buffer { } cpu_va; u32 num_pages; u32 trace_enable_init_mask[CSF_FIRMWARE_TRACE_ENABLE_INIT_MASK_MAX]; - char name[1]; /* this field must be last */ + char name[]; /* this field must be last */ }; /** @@ -123,11 +118,19 @@ struct firmware_trace_buffer_data { */ static const struct firmware_trace_buffer_data trace_buffer_data[] = { #if MALI_UNIT_TEST - { "fwutf", { 0 }, 1 }, + { KBASE_CSFFW_UTF_BUF_NAME, { 0 }, 1 }, #endif - { FW_TRACE_BUF_NAME, { 0 }, 4 }, - { "benchmark", { 0 }, 2 }, - { "timeline", { 0 }, KBASE_CSF_TL_BUFFER_NR_PAGES }, +#ifdef CONFIG_MALI_PIXEL_GPU_SSCD + /* Enable all the logs */ + { KBASE_CSFFW_LOG_BUF_NAME, { 0xFFFFFFFF }, FW_TRACE_BUF_NR_PAGES }, +#else + { KBASE_CSFFW_LOG_BUF_NAME, { 0 }, FW_TRACE_BUF_NR_PAGES }, +#endif /* CONFIG_MALI_PIXEL_GPU_SSCD */ + { KBASE_CSFFW_BENCHMARK_BUF_NAME, { 0 }, 2 }, + { KBASE_CSFFW_TIMELINE_BUF_NAME, { 0 }, KBASE_CSF_TL_BUFFER_NR_PAGES }, +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { KBASE_CSFFW_GPU_METRICS_BUF_NAME, { 0 }, 8 }, +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ }; int kbase_csf_firmware_trace_buffers_init(struct kbase_device *kbdev) @@ -265,7 +268,7 @@ int kbase_csf_firmware_parse_trace_buffer_entry(struct kbase_device *kbdev, * trace buffer name (with NULL termination). 
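The allocation change visible just after this comment swaps the old one-element `char name[1]` trailer for a C99 flexible array member allocated with struct_size(), which removes the off-by-one ambiguity and saturates on arithmetic overflow. A small userspace analogue of the pattern (plain malloc, so the overflow protection is only noted in the comment) is sketched below.

#include <stdlib.h>
#include <string.h>

struct named_buffer {
	unsigned int num_pages;
	char name[]; /* flexible array member: must be the last field */
};

static struct named_buffer *named_buffer_alloc(const char *name, unsigned int num_pages)
{
	size_t name_len = strlen(name);
	/* Kernel code uses struct_size(buf, name, name_len + 1) here, which
	 * additionally saturates on arithmetic overflow. */
	struct named_buffer *buf = malloc(sizeof(*buf) + name_len + 1);

	if (!buf)
		return NULL;
	buf->num_pages = num_pages;
	memcpy(buf->name, name, name_len + 1);
	return buf;
}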
*/ trace_buffer = - kmalloc(sizeof(*trace_buffer) + name_len + 1, GFP_KERNEL); + kmalloc(struct_size(trace_buffer, name, name_len + 1), GFP_KERNEL); if (!trace_buffer) return -ENOMEM; @@ -512,10 +515,47 @@ unsigned int kbase_csf_firmware_trace_buffer_read_data( } EXPORT_SYMBOL(kbase_csf_firmware_trace_buffer_read_data); -#if IS_ENABLED(CONFIG_DEBUG_FS) +void kbase_csf_firmware_trace_buffer_discard(struct firmware_trace_buffer *trace_buffer) +{ + unsigned int bytes_discarded; + u32 buffer_size = trace_buffer->num_pages << PAGE_SHIFT; + u32 extract_offset = *(trace_buffer->cpu_va.extract_cpu_va); + u32 insert_offset = *(trace_buffer->cpu_va.insert_cpu_va); + unsigned int trace_size; + + if (insert_offset >= extract_offset) { + trace_size = insert_offset - extract_offset; + if (trace_size > buffer_size / 2) { + bytes_discarded = trace_size - buffer_size / 2; + extract_offset += bytes_discarded; + *(trace_buffer->cpu_va.extract_cpu_va) = extract_offset; + } + } else { + unsigned int bytes_tail; + + bytes_tail = buffer_size - extract_offset; + trace_size = bytes_tail + insert_offset; + if (trace_size > buffer_size / 2) { + bytes_discarded = trace_size - buffer_size / 2; + extract_offset += bytes_discarded; + if (extract_offset >= buffer_size) + extract_offset = extract_offset - buffer_size; + *(trace_buffer->cpu_va.extract_cpu_va) = extract_offset; + } + } +} +EXPORT_SYMBOL(kbase_csf_firmware_trace_buffer_discard); + +static void update_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, u64 mask) +{ + unsigned int i; + + for (i = 0; i < tb->trace_enable_entry_count; i++) + kbasep_csf_firmware_trace_buffer_update_trace_enable_bit(tb, i, (mask >> i) & 1); +} #define U32_BITS 32 -static u64 get_trace_buffer_active_mask64(struct firmware_trace_buffer *tb) +u64 kbase_csf_firmware_trace_buffer_get_active_mask64(struct firmware_trace_buffer *tb) { u64 active_mask = tb->trace_enable_init_mask[0]; @@ -525,18 +565,7 @@ static u64 get_trace_buffer_active_mask64(struct firmware_trace_buffer *tb) return active_mask; } -static void update_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, - u64 mask) -{ - unsigned int i; - - for (i = 0; i < tb->trace_enable_entry_count; i++) - kbasep_csf_firmware_trace_buffer_update_trace_enable_bit( - tb, i, (mask >> i) & 1); -} - -static int set_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, - u64 mask) +int kbase_csf_firmware_trace_buffer_set_active_mask64(struct firmware_trace_buffer *tb, u64 mask) { struct kbase_device *kbdev = tb->kbdev; unsigned long flags; @@ -564,124 +593,3 @@ static int set_trace_buffer_active_mask64(struct firmware_trace_buffer *tb, return err; } - -static int kbase_csf_firmware_trace_enable_mask_read(void *data, u64 *val) -{ - struct kbase_device *kbdev = (struct kbase_device *)data; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - /* The enabled traces limited to u64 here, regarded practical */ - *val = get_trace_buffer_active_mask64(tb); - return 0; -} - -static int kbase_csf_firmware_trace_enable_mask_write(void *data, u64 val) -{ - struct kbase_device *kbdev = (struct kbase_device *)data; - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - u64 new_mask; - unsigned int enable_bits_count; - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - - /* 
Ignore unsupported types */ - enable_bits_count = - kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count(tb); - if (enable_bits_count > 64) { - dev_dbg(kbdev->dev, "Limit enabled bits count from %u to 64", - enable_bits_count); - enable_bits_count = 64; - } - new_mask = val & ((1 << enable_bits_count) - 1); - - if (new_mask != get_trace_buffer_active_mask64(tb)) - return set_trace_buffer_active_mask64(tb, new_mask); - else - return 0; -} - -static int kbasep_csf_firmware_trace_debugfs_open(struct inode *in, - struct file *file) -{ - struct kbase_device *kbdev = in->i_private; - - file->private_data = kbdev; - dev_dbg(kbdev->dev, "Opened firmware trace buffer dump debugfs file"); - - return 0; -} - -static ssize_t kbasep_csf_firmware_trace_debugfs_read(struct file *file, - char __user *buf, size_t size, loff_t *ppos) -{ - struct kbase_device *kbdev = file->private_data; - u8 *pbyte; - unsigned int n_read; - unsigned long not_copied; - /* Limit the kernel buffer to no more than two pages */ - size_t mem = MIN(size, 2 * PAGE_SIZE); - unsigned long flags; - - struct firmware_trace_buffer *tb = - kbase_csf_firmware_get_trace_buffer(kbdev, FW_TRACE_BUF_NAME); - - if (tb == NULL) { - dev_err(kbdev->dev, "Couldn't get the firmware trace buffer"); - return -EIO; - } - - pbyte = kmalloc(mem, GFP_KERNEL); - if (pbyte == NULL) { - dev_err(kbdev->dev, "Couldn't allocate memory for trace buffer dump"); - return -ENOMEM; - } - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - n_read = kbase_csf_firmware_trace_buffer_read_data(tb, pbyte, mem); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* Do the copy, if we have obtained some trace data */ - not_copied = (n_read) ? copy_to_user(buf, pbyte, n_read) : 0; - kfree(pbyte); - - if (!not_copied) { - *ppos += n_read; - return n_read; - } - - dev_err(kbdev->dev, "Couldn't copy trace buffer data to user space buffer"); - return -EFAULT; -} - - -DEFINE_SIMPLE_ATTRIBUTE(kbase_csf_firmware_trace_enable_mask_fops, - kbase_csf_firmware_trace_enable_mask_read, - kbase_csf_firmware_trace_enable_mask_write, "%llx\n"); - -static const struct file_operations kbasep_csf_firmware_trace_debugfs_fops = { - .owner = THIS_MODULE, - .open = kbasep_csf_firmware_trace_debugfs_open, - .read = kbasep_csf_firmware_trace_debugfs_read, - .llseek = no_llseek, -}; - -void kbase_csf_firmware_trace_buffer_debugfs_init(struct kbase_device *kbdev) -{ - debugfs_create_file("fw_trace_enable_mask", 0644, - kbdev->mali_debugfs_directory, kbdev, - &kbase_csf_firmware_trace_enable_mask_fops); - - debugfs_create_file("fw_traces", 0444, - kbdev->mali_debugfs_directory, kbdev, - &kbasep_csf_firmware_trace_debugfs_fops); -} -#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_csf_trace_buffer.h b/mali_kbase/csf/mali_kbase_csf_trace_buffer.h index 823ace7..c0a42ca 100644 --- a/mali_kbase/csf/mali_kbase_csf_trace_buffer.h +++ b/mali_kbase/csf/mali_kbase_csf_trace_buffer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,16 @@ #include <linux/types.h> #define CSF_FIRMWARE_TRACE_ENABLE_INIT_MASK_MAX (4) -#define FW_TRACE_BUF_NAME "fwlog" +#define FW_TRACE_BUF_NR_PAGES 4 +#if MALI_UNIT_TEST +#define KBASE_CSFFW_UTF_BUF_NAME "fwutf" +#endif +#define KBASE_CSFFW_LOG_BUF_NAME "fwlog" +#define KBASE_CSFFW_BENCHMARK_BUF_NAME "benchmark" +#define KBASE_CSFFW_TIMELINE_BUF_NAME "timeline" +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#define KBASE_CSFFW_GPU_METRICS_BUF_NAME "gpu_metrics" +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ /* Forward declarations */ struct firmware_trace_buffer; @@ -58,7 +67,7 @@ struct kbase_device; int kbase_csf_firmware_trace_buffers_init(struct kbase_device *kbdev); /** - * kbase_csf_firmware_trace_buffer_term - Terminate trace buffers + * kbase_csf_firmware_trace_buffers_term - Terminate trace buffers * * @kbdev: Device pointer */ @@ -116,7 +125,8 @@ struct firmware_trace_buffer *kbase_csf_firmware_get_trace_buffer( struct kbase_device *kbdev, const char *name); /** - * kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count - Get number of trace enable bits for a trace buffer + * kbase_csf_firmware_trace_buffer_get_trace_enable_bits_count - Get number of trace enable bits + * for a trace buffer * * @trace_buffer: Trace buffer handle * @@ -165,15 +175,32 @@ bool kbase_csf_firmware_trace_buffer_is_empty( unsigned int kbase_csf_firmware_trace_buffer_read_data( struct firmware_trace_buffer *trace_buffer, u8 *data, unsigned int num_bytes); -#if IS_ENABLED(CONFIG_DEBUG_FS) /** - * kbase_csf_fw_trace_buffer_debugfs_init() - Add debugfs entries for setting - * enable mask and dumping the binary - * firmware trace buffer + * kbase_csf_firmware_trace_buffer_discard - Discard data from a trace buffer * - * @kbdev: Pointer to the device + * @trace_buffer: Trace buffer handle + * + * Discard part of the data in the trace buffer to reduce its utilization to half of its size. + */ +void kbase_csf_firmware_trace_buffer_discard(struct firmware_trace_buffer *trace_buffer); + +/** + * kbase_csf_firmware_trace_buffer_get_active_mask64 - Get trace buffer active mask + * + * @tb: Trace buffer handle + * + * Return: Trace buffer active mask. + */ +u64 kbase_csf_firmware_trace_buffer_get_active_mask64(struct firmware_trace_buffer *tb); + +/** + * kbase_csf_firmware_trace_buffer_set_active_mask64 - Set trace buffer active mask + * + * @tb: Trace buffer handle + * @mask: New active mask + * + * Return: 0 if successful, negative error code on failure. */ -void kbase_csf_firmware_trace_buffer_debugfs_init(struct kbase_device *kbdev); -#endif /* CONFIG_DEBUG_FS */ +int kbase_csf_firmware_trace_buffer_set_active_mask64(struct firmware_trace_buffer *tb, u64 mask); #endif /* _KBASE_CSF_TRACE_BUFFER_H_ */ diff --git a/mali_kbase/csf/mali_kbase_debug_csf_fault.c b/mali_kbase/csf/mali_kbase_debug_csf_fault.c new file mode 100644 index 0000000..185779c --- /dev/null +++ b/mali_kbase/csf/mali_kbase_debug_csf_fault.c @@ -0,0 +1,271 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * kbasep_fault_occurred - Check if fault occurred. + * + * @kbdev: Device pointer + * + * Return: true if a fault occurred. + */ +static bool kbasep_fault_occurred(struct kbase_device *kbdev) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + ret = (kbdev->csf.dof.error_code != DF_NO_ERROR); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + return ret; +} + +void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev) +{ + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) { + dev_dbg(kbdev->dev, "No userspace client for dumping exists"); + return; + } + + wait_event(kbdev->csf.dof.dump_wait_wq, kbase_debug_csf_fault_dump_complete(kbdev)); +} +KBASE_EXPORT_TEST_API(kbase_debug_csf_fault_wait_completion); + +/** + * kbase_debug_csf_fault_wakeup - Wake up a waiting user space client. + * + * @kbdev: Kbase device + */ +static void kbase_debug_csf_fault_wakeup(struct kbase_device *kbdev) +{ + wake_up_interruptible(&kbdev->csf.dof.fault_wait_wq); +} + +bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ + unsigned long flags; + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return false; + + if (WARN_ON(error == DF_NO_ERROR)) + return false; + + if (kctx && kbase_ctx_flag(kctx, KCTX_DYING)) { + dev_info(kbdev->dev, "kctx %d_%d is dying when error %d is reported", + kctx->tgid, kctx->id, error); + kctx = NULL; + } + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + + /* Only one fault at a time can be processed */ + if (kbdev->csf.dof.error_code) { + dev_info(kbdev->dev, "skip this fault as there's a pending fault"); + goto unlock; + } + + kbdev->csf.dof.kctx_tgid = kctx ? kctx->tgid : 0; + kbdev->csf.dof.kctx_id = kctx ? 
kctx->id : 0; + kbdev->csf.dof.error_code = error; + kbase_debug_csf_fault_wakeup(kbdev); + +unlock: + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + return true; +} + +static ssize_t debug_csf_fault_read(struct file *file, char __user *buffer, size_t size, + loff_t *f_pos) +{ +#define BUF_SIZE 64 + struct kbase_device *kbdev; + unsigned long flags; + int count; + char buf[BUF_SIZE]; + u32 tgid, ctx_id; + enum dumpfault_error_type error_code; + + if (unlikely(!file)) { + pr_warn("%s: file is NULL", __func__); + return -EINVAL; + } + + kbdev = file->private_data; + if (unlikely(!buffer)) { + dev_warn(kbdev->dev, "%s: buffer is NULL", __func__); + return -EINVAL; + } + + if (unlikely(*f_pos < 0)) { + dev_warn(kbdev->dev, "%s: f_pos is negative", __func__); + return -EINVAL; + } + + if (size < sizeof(buf)) { + dev_warn(kbdev->dev, "%s: buffer is too small", __func__); + return -EINVAL; + } + + if (wait_event_interruptible(kbdev->csf.dof.fault_wait_wq, kbasep_fault_occurred(kbdev))) + return -ERESTARTSYS; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + tgid = kbdev->csf.dof.kctx_tgid; + ctx_id = kbdev->csf.dof.kctx_id; + error_code = kbdev->csf.dof.error_code; + BUILD_BUG_ON(sizeof(buf) < (sizeof(tgid) + sizeof(ctx_id) + sizeof(error_code))); + count = scnprintf(buf, sizeof(buf), "%u_%u_%u\n", tgid, ctx_id, error_code); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + dev_info(kbdev->dev, "debug csf fault info read"); + return simple_read_from_buffer(buffer, size, f_pos, buf, count); +} + +static int debug_csf_fault_open(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev; + + if (unlikely(!in)) { + pr_warn("%s: inode is NULL", __func__); + return -EINVAL; + } + + kbdev = in->i_private; + if (unlikely(!file)) { + dev_warn(kbdev->dev, "%s: file is NULL", __func__); + return -EINVAL; + } + + if (atomic_cmpxchg(&kbdev->csf.dof.enabled, 0, 1) == 1) { + dev_warn(kbdev->dev, "Only one client is allowed for dump on fault"); + return -EBUSY; + } + + dev_info(kbdev->dev, "debug csf fault file open"); + + return simple_open(in, file); +} + +static ssize_t debug_csf_fault_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + struct kbase_device *kbdev; + unsigned long flags; + + if (unlikely(!file)) { + pr_warn("%s: file is NULL", __func__); + return -EINVAL; + } + + kbdev = file->private_data; + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + kbdev->csf.dof.error_code = DF_NO_ERROR; + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + dev_info(kbdev->dev, "debug csf fault dump complete"); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + /* User space finished the dump. + * Wake up blocked kernel threads to proceed. + */ + wake_up(&kbdev->csf.dof.dump_wait_wq); + + return count; +} + +static int debug_csf_fault_release(struct inode *in, struct file *file) +{ + struct kbase_device *kbdev; + unsigned long flags; + + if (unlikely(!in)) { + pr_warn("%s: inode is NULL", __func__); + return -EINVAL; + } + + kbdev = in->i_private; + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + kbdev->csf.dof.error_code = DF_NO_ERROR; + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + atomic_set(&kbdev->csf.dof.enabled, 0); + dev_info(kbdev->dev, "debug csf fault file close"); + + /* User space closed the debugfs file. + * Wake up blocked kernel threads to resume. 
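Taken together, the debugfs handlers in this new file define a simple handshake: a single client opens the node, a blocking read returns a "tgid_ctxid_errorcode" string once a fault is recorded, and any subsequent write acknowledges it so the waiting kernel threads can continue. A hypothetical user-space client illustrating that flow is sketched below; the debugfs mount point and per-device directory name are assumptions, not something this patch defines.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumed path: the mali0 debugfs directory name is device-specific and may differ. */
	const char *node = "/sys/kernel/debug/mali0/csf_fault";
	char info[128] = { 0 };
	unsigned int tgid, ctx_id, error_code;
	int fd = open(node, O_RDWR);

	if (fd < 0)
		return 1;

	/* Blocks until the driver records a fault, then reports it. */
	if (read(fd, info, sizeof(info)) > 0 &&
	    sscanf(info, "%u_%u_%u", &tgid, &ctx_id, &error_code) == 3)
		printf("fault: tgid=%u ctx=%u error=%u\n", tgid, ctx_id, error_code);

	/* ... collect whatever dump state is needed here, then acknowledge ... */
	write(fd, "done", 4);

	close(fd);
	return 0;
}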
+ */ + wake_up(&kbdev->csf.dof.dump_wait_wq); + + return 0; +} + +static const struct file_operations kbasep_debug_csf_fault_fops = { + .owner = THIS_MODULE, + .open = debug_csf_fault_open, + .read = debug_csf_fault_read, + .write = debug_csf_fault_write, + .llseek = default_llseek, + .release = debug_csf_fault_release, +}; + +void kbase_debug_csf_fault_debugfs_init(struct kbase_device *kbdev) +{ + const char *fname = "csf_fault"; + + if (unlikely(!kbdev)) { + pr_warn("%s: kbdev is NULL", __func__); + return; + } + + debugfs_create_file(fname, 0600, kbdev->mali_debugfs_directory, kbdev, + &kbasep_debug_csf_fault_fops); +} + +int kbase_debug_csf_fault_init(struct kbase_device *kbdev) +{ + if (unlikely(!kbdev)) { + pr_warn("%s: kbdev is NULL", __func__); + return -EINVAL; + } + + init_waitqueue_head(&(kbdev->csf.dof.fault_wait_wq)); + init_waitqueue_head(&(kbdev->csf.dof.dump_wait_wq)); + spin_lock_init(&kbdev->csf.dof.lock); + kbdev->csf.dof.kctx_tgid = 0; + kbdev->csf.dof.kctx_id = 0; + kbdev->csf.dof.error_code = DF_NO_ERROR; + atomic_set(&kbdev->csf.dof.enabled, 0); + + return 0; +} + +void kbase_debug_csf_fault_term(struct kbase_device *kbdev) +{ +} +#endif /* CONFIG_DEBUG_FS */ diff --git a/mali_kbase/csf/mali_kbase_debug_csf_fault.h b/mali_kbase/csf/mali_kbase_debug_csf_fault.h new file mode 100644 index 0000000..6e9b1a9 --- /dev/null +++ b/mali_kbase/csf/mali_kbase_debug_csf_fault.h @@ -0,0 +1,137 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CSF_FAULT_H +#define _KBASE_DEBUG_CSF_FAULT_H + +#if IS_ENABLED(CONFIG_DEBUG_FS) +/** + * kbase_debug_csf_fault_debugfs_init - Initialize CSF fault debugfs + * @kbdev: Device pointer + */ +void kbase_debug_csf_fault_debugfs_init(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_init - Create the fault event wait queue per device + * and initialize the required resources. + * @kbdev: Device pointer + * + * Return: Zero on success or a negative error code. + */ +int kbase_debug_csf_fault_init(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_term - Clean up resources created by + * @kbase_debug_csf_fault_init. + * @kbdev: Device pointer + */ +void kbase_debug_csf_fault_term(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_wait_completion - Wait for the client to complete. + * + * @kbdev: Device Pointer + * + * Wait for the user space client to finish reading the fault information. + * This function must be called in thread context. + */ +void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev); + +/** + * kbase_debug_csf_fault_notify - Notify client of a fault. + * + * @kbdev: Device pointer + * @kctx: Faulty context (can be NULL) + * @error: Error code. 
+ * + * Store fault information and wake up the user space client. + * + * Return: true if a dump on fault was initiated or was is in progress and + * so caller can opt to wait for the dumping to complete. + */ +bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error); + +/** + * kbase_debug_csf_fault_dump_enabled - Check if dump on fault is enabled. + * + * @kbdev: Device pointer + * + * Return: true if debugfs file is opened so dump on fault is enabled. + */ +static inline bool kbase_debug_csf_fault_dump_enabled(struct kbase_device *kbdev) +{ + return atomic_read(&kbdev->csf.dof.enabled); +} + +/** + * kbase_debug_csf_fault_dump_complete - Check if dump on fault is completed. + * + * @kbdev: Device pointer + * + * Return: true if dump on fault completes or file is closed. + */ +static inline bool kbase_debug_csf_fault_dump_complete(struct kbase_device *kbdev) +{ + unsigned long flags; + bool ret; + + if (likely(!kbase_debug_csf_fault_dump_enabled(kbdev))) + return true; + + spin_lock_irqsave(&kbdev->csf.dof.lock, flags); + ret = (kbdev->csf.dof.error_code == DF_NO_ERROR); + spin_unlock_irqrestore(&kbdev->csf.dof.lock, flags); + + return ret; +} +#else /* CONFIG_DEBUG_FS */ +static inline int kbase_debug_csf_fault_init(struct kbase_device *kbdev) +{ + return 0; +} + +static inline void kbase_debug_csf_fault_term(struct kbase_device *kbdev) +{ +} + +static inline void kbase_debug_csf_fault_wait_completion(struct kbase_device *kbdev) +{ +} + +static inline bool kbase_debug_csf_fault_notify(struct kbase_device *kbdev, + struct kbase_context *kctx, enum dumpfault_error_type error) +{ + return false; +} + +static inline bool kbase_debug_csf_fault_dump_enabled(struct kbase_device *kbdev) +{ + return false; +} + +static inline bool kbase_debug_csf_fault_dump_complete(struct kbase_device *kbdev) +{ + return true; +} +#endif /* CONFIG_DEBUG_FS */ + +#endif /*_KBASE_DEBUG_CSF_FAULT_H*/ diff --git a/mali_kbase/debug/Kbuild b/mali_kbase/debug/Kbuild index 1682c0f..8beee2d 100644 --- a/mali_kbase/debug/Kbuild +++ b/mali_kbase/debug/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,7 @@ mali_kbase-y += debug/mali_kbase_debug_ktrace.o ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) mali_kbase-y += debug/backend/mali_kbase_debug_ktrace_csf.o + mali_kbase-$(CONFIG_MALI_CORESIGHT) += debug/backend/mali_kbase_debug_coresight_csf.o else mali_kbase-y += debug/backend/mali_kbase_debug_ktrace_jm.o endif diff --git a/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c b/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c new file mode 100644 index 0000000..ff5f947 --- /dev/null +++ b/mali_kbase/debug/backend/mali_kbase_debug_coresight_csf.c @@ -0,0 +1,851 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#include <mali_kbase.h> +#include <linux/slab.h> +#include <csf/mali_kbase_csf_registers.h> +#include <csf/mali_kbase_csf_firmware.h> +#include <backend/gpu/mali_kbase_pm_internal.h> +#include <linux/mali_kbase_debug_coresight_csf.h> +#include <debug/backend/mali_kbase_debug_coresight_internal_csf.h> + +static const char *coresight_state_to_string(enum kbase_debug_coresight_csf_state state) +{ + switch (state) { + case KBASE_DEBUG_CORESIGHT_CSF_DISABLED: + return "DISABLED"; + case KBASE_DEBUG_CORESIGHT_CSF_ENABLED: + return "ENABLED"; + default: + break; + } + + return "UNKNOWN"; +} + +static bool validate_reg_addr(struct kbase_debug_coresight_csf_client *client, + struct kbase_device *kbdev, u32 reg_addr, u8 op_type) +{ + int i; + + if (reg_addr & 0x3) { + dev_err(kbdev->dev, "Invalid operation %d: reg_addr (0x%x) not 32bit aligned", + op_type, reg_addr); + return false; + } + + for (i = 0; i < client->nr_ranges; i++) { + struct kbase_debug_coresight_csf_address_range *range = &client->addr_ranges[i]; + + if ((range->start <= reg_addr) && (reg_addr <= range->end)) + return true; + } + + dev_err(kbdev->dev, "Invalid operation %d: reg_addr (0x%x) not in client range", op_type, + reg_addr); + + return false; +} + +static bool validate_op(struct kbase_debug_coresight_csf_client *client, + struct kbase_debug_coresight_csf_op *op) +{ + struct kbase_device *kbdev; + u32 reg; + + if (!op) + return false; + + if (!client) + return false; + + kbdev = (struct kbase_device *)client->drv_data; + + switch (op->type) { + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP: + return true; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM: + if (validate_reg_addr(client, kbdev, op->op.write_imm.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE: + for (reg = op->op.write_imm_range.reg_start; reg <= op->op.write_imm_range.reg_end; + reg += sizeof(u32)) { + if (!validate_reg_addr(client, kbdev, reg, op->type)) + return false; + } + + return true; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE: + if (!op->op.write.ptr) { + dev_err(kbdev->dev, "Invalid operation %d: ptr not set", op->type); + break; + } + + if (validate_reg_addr(client, kbdev, op->op.write.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ: + if (!op->op.read.ptr) { + dev_err(kbdev->dev, "Invalid operation %d: ptr not set", op->type); + break; + } + + if (validate_reg_addr(client, kbdev, op->op.read.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL: + if (validate_reg_addr(client, kbdev, op->op.poll.reg_addr, op->type)) + return true; + + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR: + fallthrough; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT: + if (op->op.bitw.ptr != NULL) + return true; + + dev_err(kbdev->dev, "Invalid bitwise operation pointer"); + + break; + default: + dev_err(kbdev->dev, "Invalid operation %d", op->type); + 
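validate_reg_addr() and validate_op() above only accept register operations whose addresses are 32-bit aligned and fall inside one of the address ranges the client registered, and WRITE/READ/bitwise operations must also carry a non-NULL pointer. A minimal, illustrative client-side sketch of a sequence that would pass these checks follows; the structure and field names are the ones dereferenced by validate_op()/execute_op() in this file, while the register addresses and values are placeholders rather than real CoreSight offsets.

#include <linux/mali_kbase_debug_coresight_csf.h>

/* Placeholder trace-unit window; must cover every reg_addr used below. */
static struct kbase_debug_coresight_csf_address_range my_ranges[] = {
	{ .start = 0xE0000000, .end = 0xE0000FFC },
};

static u32 saved_ctrl; /* filled in by the READ op when the sequence runs */

static struct kbase_debug_coresight_csf_op my_enable_ops[] = {
	{
		.type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ,
		.op.read = { .reg_addr = 0xE0000010, .ptr = &saved_ctrl },
	},
	{
		/* reg_addr must be 4-byte aligned and inside my_ranges[],
		 * otherwise validate_reg_addr() rejects the whole sequence.
		 */
		.type = KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM,
		.op.write_imm = { .reg_addr = 0xE0000010, .val = 0x1 },
	},
};

static struct kbase_debug_coresight_csf_sequence my_enable_seq = {
	.ops = my_enable_ops,
	.nr_ops = ARRAY_SIZE(my_enable_ops),
};

/* client = kbase_debug_coresight_csf_register(kbdev, my_ranges,
 *                                             ARRAY_SIZE(my_ranges));
 * config = kbase_debug_coresight_csf_config_create(client, &my_enable_seq,
 *                                                  NULL);
 */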
break; + } + + return false; +} + +static bool validate_seq(struct kbase_debug_coresight_csf_client *client, + struct kbase_debug_coresight_csf_sequence *seq) +{ + struct kbase_debug_coresight_csf_op *ops = seq->ops; + int nr_ops = seq->nr_ops; + int i; + + for (i = 0; i < nr_ops; i++) { + if (!validate_op(client, &ops[i])) + return false; + } + + return true; +} + +static int execute_op(struct kbase_device *kbdev, struct kbase_debug_coresight_csf_op *op) +{ + int result = -EINVAL; + u32 reg; + + dev_dbg(kbdev->dev, "Execute operation %d", op->type); + + switch (op->type) { + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_NOP: + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM: + result = kbase_csf_firmware_mcu_register_write(kbdev, op->op.write.reg_addr, + op->op.write_imm.val); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE_IMM_RANGE: + for (reg = op->op.write_imm_range.reg_start; reg <= op->op.write_imm_range.reg_end; + reg += sizeof(u32)) { + result = kbase_csf_firmware_mcu_register_write(kbdev, reg, + op->op.write_imm_range.val); + if (!result) + break; + } + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_WRITE: + result = kbase_csf_firmware_mcu_register_write(kbdev, op->op.write.reg_addr, + *op->op.write.ptr); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_READ: + result = kbase_csf_firmware_mcu_register_read(kbdev, op->op.read.reg_addr, + op->op.read.ptr); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_POLL: + result = kbase_csf_firmware_mcu_register_poll(kbdev, op->op.poll.reg_addr, + op->op.poll.mask, op->op.poll.val); + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_AND: + *op->op.bitw.ptr &= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_OR: + *op->op.bitw.ptr |= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_XOR: + *op->op.bitw.ptr ^= op->op.bitw.val; + result = 0; + break; + case KBASE_DEBUG_CORESIGHT_CSF_OP_TYPE_BIT_NOT: + *op->op.bitw.ptr = ~(*op->op.bitw.ptr); + result = 0; + break; + default: + dev_err(kbdev->dev, "Invalid operation %d", op->type); + break; + } + + return result; +} + +static int coresight_config_enable(struct kbase_device *kbdev, + struct kbase_debug_coresight_csf_config *config) +{ + int ret = 0; + int i; + + if (!config) + return -EINVAL; + + if (config->state == KBASE_DEBUG_CORESIGHT_CSF_ENABLED) + return ret; + + for (i = 0; config->enable_seq && !ret && i < config->enable_seq->nr_ops; i++) + ret = execute_op(kbdev, &config->enable_seq->ops[i]); + + if (!ret) { + dev_dbg(kbdev->dev, "Coresight config (0x%pK) state transition: %s to %s", config, + coresight_state_to_string(config->state), + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_ENABLED)); + config->state = KBASE_DEBUG_CORESIGHT_CSF_ENABLED; + } + + /* Always assign the return code during config enable. + * It gets propagated when calling config disable. 
+ */ + config->error = ret; + + return ret; +} + +static int coresight_config_disable(struct kbase_device *kbdev, + struct kbase_debug_coresight_csf_config *config) +{ + int ret = 0; + int i; + + if (!config) + return -EINVAL; + + if (config->state == KBASE_DEBUG_CORESIGHT_CSF_DISABLED) + return ret; + + for (i = 0; config->disable_seq && !ret && i < config->disable_seq->nr_ops; i++) + ret = execute_op(kbdev, &config->disable_seq->ops[i]); + + if (!ret) { + dev_dbg(kbdev->dev, "Coresight config (0x%pK) state transition: %s to %s", config, + coresight_state_to_string(config->state), + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_DISABLED)); + config->state = KBASE_DEBUG_CORESIGHT_CSF_DISABLED; + } else { + /* Only assign the error if ret is not 0. + * As we don't want to overwrite an error from config enable + */ + if (!config->error) + config->error = ret; + } + + return ret; +} + +void *kbase_debug_coresight_csf_register(void *drv_data, + struct kbase_debug_coresight_csf_address_range *ranges, + int nr_ranges) +{ + struct kbase_debug_coresight_csf_client *client, *client_entry; + struct kbase_device *kbdev; + unsigned long flags; + int k; + + if (unlikely(!drv_data)) { + pr_err("NULL drv_data"); + return NULL; + } + + kbdev = (struct kbase_device *)drv_data; + + if (unlikely(!ranges)) { + dev_err(kbdev->dev, "NULL ranges"); + return NULL; + } + + if (unlikely(!nr_ranges)) { + dev_err(kbdev->dev, "nr_ranges is 0"); + return NULL; + } + + for (k = 0; k < nr_ranges; k++) { + if (ranges[k].end < ranges[k].start) { + dev_err(kbdev->dev, "Invalid address ranges 0x%08x - 0x%08x", + ranges[k].start, ranges[k].end); + return NULL; + } + } + + client = kzalloc(sizeof(struct kbase_debug_coresight_csf_client), GFP_KERNEL); + + if (!client) + return NULL; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(client_entry, &kbdev->csf.coresight.clients, link) { + struct kbase_debug_coresight_csf_address_range *client_ranges = + client_entry->addr_ranges; + int i; + + for (i = 0; i < client_entry->nr_ranges; i++) { + int j; + + for (j = 0; j < nr_ranges; j++) { + if ((ranges[j].start < client_ranges[i].end) && + (client_ranges[i].start < ranges[j].end)) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + kfree(client); + dev_err(kbdev->dev, + "Client with range 0x%08x - 0x%08x already present at address range 0x%08x - 0x%08x", + client_ranges[i].start, client_ranges[i].end, + ranges[j].start, ranges[j].end); + + return NULL; + } + } + } + } + + client->drv_data = drv_data; + client->addr_ranges = ranges; + client->nr_ranges = nr_ranges; + list_add(&client->link, &kbdev->csf.coresight.clients); + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return client; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_register); + +void kbase_debug_coresight_csf_unregister(void *client_data) +{ + struct kbase_debug_coresight_csf_client *client; + struct kbase_debug_coresight_csf_config *config_entry; + struct kbase_device *kbdev; + unsigned long flags; + bool retry = true; + + if (unlikely(!client_data)) { + pr_err("NULL client"); + return; + } + + client = (struct kbase_debug_coresight_csf_client *)client_data; + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return; + } + + /* check for active config from client */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_del_init(&client->link); + + while (retry && !list_empty(&kbdev->csf.coresight.configs)) { + retry = false; + 
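The loop here frees any configuration still owned by the departing client, but kbase_debug_coresight_csf_config_free() can sleep (it runs the disable sequence), so it cannot be called with the coresight spinlock held. The code therefore drops the lock around each free and rescans from the head of the list, because the iterator is stale once the lock has been released. The same logic, extracted into a self-contained helper purely for illustration (the helper name is not part of this patch), looks like:

static void release_client_configs(spinlock_t *lock, struct list_head *configs,
				   void *client)
{
	struct kbase_debug_coresight_csf_config *entry;
	unsigned long flags;
	bool again = true;

	while (again) {
		again = false;
		spin_lock_irqsave(lock, flags);
		list_for_each_entry(entry, configs, link) {
			if (entry->client == client) {
				spin_unlock_irqrestore(lock, flags);
				/* May sleep and unlinks 'entry' from 'configs'. */
				kbase_debug_coresight_csf_config_free(entry);
				spin_lock_irqsave(lock, flags);
				again = true;
				break;	/* iterator is stale; rescan from head */
			}
		}
		spin_unlock_irqrestore(lock, flags);
	}
}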
list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry->client == client) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + kbase_debug_coresight_csf_config_free(config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + retry = true; + break; + } + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + kfree(client); +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_unregister); + +void * +kbase_debug_coresight_csf_config_create(void *client_data, + struct kbase_debug_coresight_csf_sequence *enable_seq, + struct kbase_debug_coresight_csf_sequence *disable_seq) +{ + struct kbase_debug_coresight_csf_client *client; + struct kbase_debug_coresight_csf_config *config; + struct kbase_device *kbdev; + + if (unlikely(!client_data)) { + pr_err("NULL client"); + return NULL; + } + + client = (struct kbase_debug_coresight_csf_client *)client_data; + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return NULL; + } + + if (enable_seq) { + if (!validate_seq(client, enable_seq)) { + dev_err(kbdev->dev, "Invalid enable_seq"); + return NULL; + } + } + + if (disable_seq) { + if (!validate_seq(client, disable_seq)) { + dev_err(kbdev->dev, "Invalid disable_seq"); + return NULL; + } + } + + config = kzalloc(sizeof(struct kbase_debug_coresight_csf_config), GFP_KERNEL); + if (WARN_ON(!client)) + return NULL; + + config->client = client; + config->enable_seq = enable_seq; + config->disable_seq = disable_seq; + config->error = 0; + config->state = KBASE_DEBUG_CORESIGHT_CSF_DISABLED; + + INIT_LIST_HEAD(&config->link); + + return config; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_create); + +void kbase_debug_coresight_csf_config_free(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + + kbase_debug_coresight_csf_config_disable(config); + + kfree(config); +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_free); + +int kbase_debug_coresight_csf_config_enable(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + struct kbase_debug_coresight_csf_client *client; + struct kbase_device *kbdev; + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + int ret = 0; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return -EINVAL; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + client = (struct kbase_debug_coresight_csf_client *)config->client; + + if (unlikely(!client)) { + pr_err("NULL client in config"); + return -EINVAL; + } + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return -EINVAL; + } + + /* Check to prevent double entry of config */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry == config) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + dev_err(kbdev->dev, "Config already enabled"); + return -EINVAL; + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + kbase_csf_scheduler_lock(kbdev); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Check the state of Scheduler to confirm the desired state of MCU */ + if (((kbdev->csf.scheduler.state != SCHED_SUSPENDED) && + (kbdev->csf.scheduler.state != SCHED_SLEEPING) && 
+ !kbase_csf_scheduler_protected_mode_in_use(kbdev)) || + kbase_pm_get_policy(kbdev) == &kbase_pm_always_on_policy_ops) { + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Wait for MCU to reach the stable ON state */ + ret = kbase_pm_wait_for_desired_state(kbdev); + + if (ret) + dev_err(kbdev->dev, + "Wait for PM state failed when enabling coresight config"); + else + ret = coresight_config_enable(kbdev, config); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + } + + /* Add config to next enable sequence */ + if (!ret) { + spin_lock(&kbdev->csf.coresight.lock); + list_add(&config->link, &kbdev->csf.coresight.configs); + spin_unlock(&kbdev->csf.coresight.lock); + } + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + kbase_csf_scheduler_unlock(kbdev); + + return ret; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_enable); + +int kbase_debug_coresight_csf_config_disable(void *config_data) +{ + struct kbase_debug_coresight_csf_config *config; + struct kbase_debug_coresight_csf_client *client; + struct kbase_device *kbdev; + struct kbase_debug_coresight_csf_config *config_entry; + bool found_in_list = false; + unsigned long flags; + int ret = 0; + + if (unlikely(!config_data)) { + pr_err("NULL config"); + return -EINVAL; + } + + config = (struct kbase_debug_coresight_csf_config *)config_data; + + /* Exit early if not enabled prior */ + if (list_empty(&config->link)) + return ret; + + client = (struct kbase_debug_coresight_csf_client *)config->client; + + if (unlikely(!client)) { + pr_err("NULL client in config"); + return -EINVAL; + } + + kbdev = (struct kbase_device *)client->drv_data; + if (unlikely(!kbdev)) { + pr_err("NULL drv_data in client"); + return -EINVAL; + } + + /* Check if the config is in the correct list */ + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (config_entry == config) { + found_in_list = true; + break; + } + } + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + if (!found_in_list) { + dev_err(kbdev->dev, "Config looks corrupted"); + return -EINVAL; + } + + kbase_csf_scheduler_lock(kbdev); + kbase_csf_scheduler_spin_lock(kbdev, &flags); + + /* Check the state of Scheduler to confirm the desired state of MCU */ + if (((kbdev->csf.scheduler.state != SCHED_SUSPENDED) && + (kbdev->csf.scheduler.state != SCHED_SLEEPING) && + !kbase_csf_scheduler_protected_mode_in_use(kbdev)) || + kbase_pm_get_policy(kbdev) == &kbase_pm_always_on_policy_ops) { + kbase_csf_scheduler_spin_unlock(kbdev, flags); + /* Wait for MCU to reach the stable ON state */ + ret = kbase_pm_wait_for_desired_state(kbdev); + + if (ret) + dev_err(kbdev->dev, + "Wait for PM state failed when disabling coresight config"); + else + ret = coresight_config_disable(kbdev, config); + + kbase_csf_scheduler_spin_lock(kbdev, &flags); + } else if (kbdev->pm.backend.mcu_state == KBASE_MCU_OFF) { + /* MCU is OFF, so the disable sequence was already executed. + * + * Propagate any error that would have occurred during the enable + * or disable sequence. + * + * This is done as part of the disable sequence, since the call from + * client is synchronous. 
+ */ + ret = config->error; + } + + /* Remove config from next disable sequence */ + spin_lock(&kbdev->csf.coresight.lock); + list_del_init(&config->link); + spin_unlock(&kbdev->csf.coresight.lock); + + kbase_csf_scheduler_spin_unlock(kbdev, flags); + kbase_csf_scheduler_unlock(kbdev); + + return ret; +} +EXPORT_SYMBOL(kbase_debug_coresight_csf_config_disable); + +static void coresight_config_enable_all(struct work_struct *data) +{ + struct kbase_device *kbdev = + container_of(data, struct kbase_device, csf.coresight.enable_work); + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + if (coresight_config_enable(kbdev, config_entry)) + dev_err(kbdev->dev, "enable config (0x%pK) failed", config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + wake_up_all(&kbdev->csf.coresight.event_wait); +} + +static void coresight_config_disable_all(struct work_struct *data) +{ + struct kbase_device *kbdev = + container_of(data, struct kbase_device, csf.coresight.disable_work); + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + if (coresight_config_disable(kbdev, config_entry)) + dev_err(kbdev->dev, "disable config (0x%pK) failed", config_entry); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_pm_update_state(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + wake_up_all(&kbdev->csf.coresight.event_wait); +} + +void kbase_debug_coresight_csf_disable_pmode_enter(struct kbase_device *kbdev) +{ + unsigned long flags; + + dev_dbg(kbdev->dev, "Coresight state %s before protected mode enter", + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_ENABLED)); + + lockdep_assert_held(&kbdev->csf.scheduler.lock); + + kbase_pm_lock(kbdev); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + kbdev->csf.coresight.disable_on_pmode_enter = true; + kbdev->csf.coresight.enable_on_pmode_exit = false; + kbase_pm_update_state(kbdev); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + kbase_pm_wait_for_desired_state(kbdev); + + kbase_pm_unlock(kbdev); +} + +void kbase_debug_coresight_csf_enable_pmode_exit(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "Coresight state %s after protected mode exit", + coresight_state_to_string(KBASE_DEBUG_CORESIGHT_CSF_DISABLED)); + + lockdep_assert_held(&kbdev->hwaccess_lock); + + WARN_ON(kbdev->csf.coresight.disable_on_pmode_enter); + + kbdev->csf.coresight.enable_on_pmode_exit = true; + kbase_pm_update_state(kbdev); +} + +void kbase_debug_coresight_csf_state_request(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + if (unlikely(!kbdev)) + return; + + if (unlikely(!kbdev->csf.coresight.workq)) + return; + + dev_dbg(kbdev->dev, "Coresight state %s requested", coresight_state_to_string(state)); + + switch (state) { + case 
KBASE_DEBUG_CORESIGHT_CSF_DISABLED: + queue_work(kbdev->csf.coresight.workq, &kbdev->csf.coresight.disable_work); + break; + case KBASE_DEBUG_CORESIGHT_CSF_ENABLED: + queue_work(kbdev->csf.coresight.workq, &kbdev->csf.coresight.enable_work); + break; + default: + dev_err(kbdev->dev, "Invalid Coresight state %d", state); + break; + } +} + +bool kbase_debug_coresight_csf_state_check(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + struct kbase_debug_coresight_csf_config *config_entry; + unsigned long flags; + bool success = true; + + dev_dbg(kbdev->dev, "Coresight check for state: %s", coresight_state_to_string(state)); + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry(config_entry, &kbdev->csf.coresight.configs, link) { + if (state != config_entry->state) { + success = false; + break; + } + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return success; +} +KBASE_EXPORT_TEST_API(kbase_debug_coresight_csf_state_check); + +bool kbase_debug_coresight_csf_state_wait(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state) +{ + const long wait_timeout = kbase_csf_timeout_in_jiffies(kbdev->csf.fw_timeout_ms); + struct kbase_debug_coresight_csf_config *config_entry, *next_config_entry; + unsigned long flags; + bool success = true; + + dev_dbg(kbdev->dev, "Coresight wait for state: %s", coresight_state_to_string(state)); + + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry_safe(config_entry, next_config_entry, &kbdev->csf.coresight.configs, + link) { + const enum kbase_debug_coresight_csf_state prev_state = config_entry->state; + long remaining; + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + remaining = wait_event_timeout(kbdev->csf.coresight.event_wait, + state == config_entry->state, wait_timeout); + spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + if (!remaining) { + success = false; + dev_err(kbdev->dev, + "Timeout waiting for Coresight state transition %s to %s", + coresight_state_to_string(prev_state), + coresight_state_to_string(state)); + } + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); + + return success; +} +KBASE_EXPORT_TEST_API(kbase_debug_coresight_csf_state_wait); + +int kbase_debug_coresight_csf_init(struct kbase_device *kbdev) +{ + kbdev->csf.coresight.workq = alloc_ordered_workqueue("Mali CoreSight workqueue", 0); + if (kbdev->csf.coresight.workq == NULL) + return -ENOMEM; + + INIT_LIST_HEAD(&kbdev->csf.coresight.clients); + INIT_LIST_HEAD(&kbdev->csf.coresight.configs); + INIT_WORK(&kbdev->csf.coresight.enable_work, coresight_config_enable_all); + INIT_WORK(&kbdev->csf.coresight.disable_work, coresight_config_disable_all); + init_waitqueue_head(&kbdev->csf.coresight.event_wait); + spin_lock_init(&kbdev->csf.coresight.lock); + + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; + + return 0; +} + +void kbase_debug_coresight_csf_term(struct kbase_device *kbdev) +{ + struct kbase_debug_coresight_csf_client *client_entry, *next_client_entry; + struct kbase_debug_coresight_csf_config *config_entry, *next_config_entry; + unsigned long flags; + + kbdev->csf.coresight.disable_on_pmode_enter = false; + kbdev->csf.coresight.enable_on_pmode_exit = false; + + cancel_work_sync(&kbdev->csf.coresight.enable_work); + cancel_work_sync(&kbdev->csf.coresight.disable_work); + destroy_workqueue(kbdev->csf.coresight.workq); + kbdev->csf.coresight.workq = NULL; + + 
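kbase_debug_coresight_csf_init() and kbase_debug_coresight_csf_term() are intended to bracket the lifetime of the kbase device: init allocates the ordered workqueue and the empty client/config lists, while term cancels outstanding enable/disable work before tearing the lists down. Note that kbase_debug_coresight_csf_state_request() only queues work; a caller needing the transition to be complete is expected to pair it with kbase_debug_coresight_csf_state_wait(), which blocks up to fw_timeout_ms per config. A hedged sketch of the pairing (the surrounding probe/teardown context and the warning text are placeholders, not taken from this patch):

	/* During device initialisation, before any CoreSight client registers. */
	err = kbase_debug_coresight_csf_init(kbdev);
	if (err)
		return err;	/* -ENOMEM if the ordered workqueue cannot be allocated */

	/* Later, when the MCU is up and traces should start flowing. */
	kbase_debug_coresight_csf_state_request(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED);
	if (!kbase_debug_coresight_csf_state_wait(kbdev, KBASE_DEBUG_CORESIGHT_CSF_ENABLED))
		dev_warn(kbdev->dev, "CoreSight enable did not complete in time");

	/* During device teardown, after the GPU has stopped using the MCU. */
	kbase_debug_coresight_csf_term(kbdev);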
spin_lock_irqsave(&kbdev->csf.coresight.lock, flags); + + list_for_each_entry_safe(config_entry, next_config_entry, &kbdev->csf.coresight.configs, + link) { + list_del_init(&config_entry->link); + kfree(config_entry); + } + + list_for_each_entry_safe(client_entry, next_client_entry, &kbdev->csf.coresight.clients, + link) { + list_del_init(&client_entry->link); + kfree(client_entry); + } + + spin_unlock_irqrestore(&kbdev->csf.coresight.lock, flags); +} diff --git a/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h new file mode 100644 index 0000000..06d62dc --- /dev/null +++ b/mali_kbase/debug/backend/mali_kbase_debug_coresight_internal_csf.h @@ -0,0 +1,182 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ +#define _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ + +#include <mali_kbase.h> +#include <linux/mali_kbase_debug_coresight_csf.h> + +/** + * struct kbase_debug_coresight_csf_client - Coresight client definition + * + * @drv_data: Pointer to driver device data. + * @addr_ranges: Arrays of address ranges used by the registered client. + * @nr_ranges: Size of @addr_ranges array. + * @link: Link item of a Coresight client. + * Linked to &struct_kbase_device.csf.coresight.clients. + */ +struct kbase_debug_coresight_csf_client { + void *drv_data; + struct kbase_debug_coresight_csf_address_range *addr_ranges; + u32 nr_ranges; + struct list_head link; +}; + +/** + * enum kbase_debug_coresight_csf_state - Coresight configuration states + * + * @KBASE_DEBUG_CORESIGHT_CSF_DISABLED: Coresight configuration is disabled. + * @KBASE_DEBUG_CORESIGHT_CSF_ENABLED: Coresight configuration is enabled. + */ +enum kbase_debug_coresight_csf_state { + KBASE_DEBUG_CORESIGHT_CSF_DISABLED = 0, + KBASE_DEBUG_CORESIGHT_CSF_ENABLED, +}; + +/** + * struct kbase_debug_coresight_csf_config - Coresight configuration definition + * + * @client: Pointer to the client for which the configuration is created. + * @enable_seq: Array of operations for Coresight client enable sequence. Can be NULL. + * @disable_seq: Array of operations for Coresight client disable sequence. Can be NULL. + * @state: Current Coresight configuration state. + * @error: Error code used to know if an error occurred during the execution + * of the enable or disable sequences. + * @link: Link item of a Coresight configuration. + * Linked to &struct_kbase_device.csf.coresight.configs. 
+ */ +struct kbase_debug_coresight_csf_config { + void *client; + struct kbase_debug_coresight_csf_sequence *enable_seq; + struct kbase_debug_coresight_csf_sequence *disable_seq; + enum kbase_debug_coresight_csf_state state; + int error; + struct list_head link; +}; + +/** + * struct kbase_debug_coresight_device - Object representing the Coresight device + * + * @clients: List head to maintain Coresight clients. + * @configs: List head to maintain Coresight configs. + * @lock: A lock to protect client/config lists. + * Lists can be accessed concurrently by + * Coresight kernel modules and kernel threads. + * @workq: Work queue for Coresight enable/disable execution. + * @enable_work: Work item used to enable Coresight. + * @disable_work: Work item used to disable Coresight. + * @event_wait: Wait queue for Coresight events. + * @enable_on_pmode_exit: Flag used by the PM state machine to + * identify if Coresight enable is needed. + * @disable_on_pmode_enter: Flag used by the PM state machine to + * identify if Coresight disable is needed. + */ +struct kbase_debug_coresight_device { + struct list_head clients; + struct list_head configs; + spinlock_t lock; + struct workqueue_struct *workq; + struct work_struct enable_work; + struct work_struct disable_work; + wait_queue_head_t event_wait; + bool enable_on_pmode_exit; + bool disable_on_pmode_enter; +}; + +/** + * kbase_debug_coresight_csf_init - Initialize Coresight resources. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called once at device initialization. + * + * Return: 0 on success. + */ +int kbase_debug_coresight_csf_init(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_term - Terminate Coresight resources. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called at device termination to prevent any + * memory leaks if Coresight module would have been removed without calling + * kbasep_debug_coresight_csf_trace_disable(). + */ +void kbase_debug_coresight_csf_term(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_disable_pmode_enter - Disable Coresight on Protected + * mode enter. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called just before requesting to enter protected mode. + * It will trigger a PM state machine transition from MCU_ON + * to ON_PMODE_ENTER_CORESIGHT_DISABLE. + */ +void kbase_debug_coresight_csf_disable_pmode_enter(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_enable_pmode_exit - Enable Coresight on Protected + * mode enter. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * + * This function should be called after protected mode exit is acknowledged. + * It will trigger a PM state machine transition from MCU_ON + * to ON_PMODE_EXIT_CORESIGHT_ENABLE. + */ +void kbase_debug_coresight_csf_enable_pmode_exit(struct kbase_device *kbdev); + +/** + * kbase_debug_coresight_csf_state_request - Request Coresight state transition. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @state: Coresight state to check for. + */ +void kbase_debug_coresight_csf_state_request(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +/** + * kbase_debug_coresight_csf_state_check - Check Coresight state. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. 
+ * @state: Coresight state to check for. + * + * Return: true if all states of configs are @state. + */ +bool kbase_debug_coresight_csf_state_check(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +/** + * kbase_debug_coresight_csf_state_wait - Wait for Coresight state transition to complete. + * + * @kbdev: Instance of a GPU platform device that implements a CSF interface. + * @state: Coresight state to wait for. + * + * Return: true if all configs become @state in pre-defined time period. + */ +bool kbase_debug_coresight_csf_state_wait(struct kbase_device *kbdev, + enum kbase_debug_coresight_csf_state state); + +#endif /* _KBASE_DEBUG_CORESIGHT_INTERNAL_CSF_H_ */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h index 2506ce1..87e13e5 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_codes_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -42,67 +42,75 @@ int dummy_array[] = { /* * Generic CSF events */ - KBASE_KTRACE_CODE_MAKE_CODE(EVICT_CTX_SLOTS), + /* info_val = 0 */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EVICT_CTX_SLOTS_START), + /* info_val == number of CSGs supported */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EVICT_CTX_SLOTS_END), /* info_val[0:7] == fw version_minor * info_val[15:8] == fw version_major * info_val[63:32] == fw version_hash */ - KBASE_KTRACE_CODE_MAKE_CODE(FIRMWARE_BOOT), - KBASE_KTRACE_CODE_MAKE_CODE(FIRMWARE_REBOOT), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_BOOT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_REBOOT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_INVOKE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_INVOKE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_START), KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TOCK_END), /* info_val == total number of runnable groups across all kctxs */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_START), KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_END), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RESET_END), /* info_val = timeout in ms */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_WAIT_PROTM_QUIT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_WAIT_QUIT_START), /* info_val = remaining ms timeout, or 0 if timedout */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_WAIT_PROTM_QUIT_DONE), - KBASE_KTRACE_CODE_MAKE_CODE(SYNC_UPDATE_EVENT), - KBASE_KTRACE_CODE_MAKE_CODE(SYNC_UPDATE_EVENT_NOTIFY_GPU), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_WAIT_QUIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_EVENT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT), /* info_val = JOB_IRQ_STATUS */ - KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_START), /* info_val = JOB_IRQ_STATUS */ KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_END), /* info_val = JOB_IRQ_STATUS */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROCESS), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROCESS_START), /* info_val 
= GLB_REQ ^ GLB_ACQ */ - KBASE_KTRACE_CODE_MAKE_CODE(GLB_REQ_ACQ), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_INTERRUPT_GLB_REQ_ACK), /* info_val[31:0] = num non idle offslot groups * info_val[32] = scheduler can suspend on idle */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_CAN_IDLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ADVANCE_TICK), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NOADVANCE_TICK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_ADVANCE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_TICK_NOADVANCE), /* kctx is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_INSERT_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_REMOVE_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_INSERT), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_REMOVE), /* kctx is moved to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ROTATE_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_HEAD_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_ROTATE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_RUNNABLE_KCTX_HEAD), - KBASE_KTRACE_CODE_MAKE_CODE(IDLE_WORKER_BEGIN), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_START), /* 4-bit encoding of boolean values (ease of reading as hex values) * * info_val[3:0] = was reset active/failed to be prevented * info_val[7:4] = whether scheduler was both idle and suspendable * info_val[11:8] = whether all groups were suspended */ - KBASE_KTRACE_CODE_MAKE_CODE(IDLE_WORKER_END), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_SYNC_UPDATE_WORKER_BEGIN), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_SYNC_UPDATE_WORKER_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_END), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END), /* info_val = bitmask of slots that gave an ACK for STATUS_UPDATE */ - KBASE_KTRACE_CODE_MAKE_CODE(SLOTS_STATUS_UPDATE_ACK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_UPDATE_IDLE_SLOTS_ACK), /* info_val[63:0] = GPU cycle counter, used mainly for benchmarking * purpose. 
*/ - KBASE_KTRACE_CODE_MAKE_CODE(GPU_IDLE_HANDLING_START), - KBASE_KTRACE_CODE_MAKE_CODE(MCU_HALTED), - KBASE_KTRACE_CODE_MAKE_CODE(MCU_IN_SLEEP), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_HANDLING_START), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_GPU_IDLE_WORKER_HANDLING_END), + + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_MCU_HALTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_FIRMWARE_MCU_SLEEP), /* * Group events @@ -111,21 +119,23 @@ int dummy_array[] = { * info_val[19:16] == as_nr * info_val[63:32] == endpoint config (max number of endpoints allowed) */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_START), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_START_REQ), /* info_val == CSG_REQ state issued */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOP), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOP_REQ), /* info_val == CSG_ACK state */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STARTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_RUNNING), /* info_val == CSG_ACK state */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STOPPED), /* info_val == slot cleaned */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_CLEANED), /* info_val = slot requesting STATUS_UPDATE */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_STATUS_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_UPDATE_IDLE_SLOT_REQ), /* info_val = scheduler's new csg_slots_idle_mask[0] * group->csg_nr indicates which bit was set */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_IDLE_SET), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_NO_NON_IDLE_GROUPS), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_NON_IDLE_GROUPS), /* info_val = scheduler's new csg_slots_idle_mask[0] * group->csg_nr indicates which bit was cleared * @@ -133,13 +143,13 @@ int dummy_array[] = { */ KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_IDLE_CLEAR), /* info_val == previous priority */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_PRIO_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_SLOT_PRIO_UPDATE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_SYNC_UPDATE_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_SYNC_UPDATE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_IDLE_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_IDLE), /* info_val == CSG_REQ ^ CSG_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSG_PROGRESS_TIMER_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSG_INTERRUPT_PROGRESS_TIMER_EVENT), /* info_val[31:0] == CSG_REQ ^ CSG_ACQ * info_val[63:32] == CSG_IRQ_REQ ^ CSG_IRQ_ACK */ @@ -152,34 +162,34 @@ int dummy_array[] = { /* info_val[31:0] == new run state of the evicted group * info_val[63:32] == number of runnable groups */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_EVICT_SCHED), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_EVICT), /* info_val == new num_runnable_grps * group is added to the back of the list for its priority level */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_INSERT_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_INSERT), /* info_val == new num_runnable_grps */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_REMOVE_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_REMOVE), /* info_val == num_runnable_grps * group is moved to the back of the list for its priority level */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_ROTATE_RUNNABLE), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_HEAD_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_ROTATE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_RUNNABLE_HEAD), /* info_val == new num_idle_wait_grps * group is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_INSERT_IDLE_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_INSERT), /* info_val == new num_idle_wait_grps * group 
is added to the back of the list */ - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_REMOVE_IDLE_WAIT), - KBASE_KTRACE_CODE_MAKE_CODE(GROUP_HEAD_IDLE_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_REMOVE), + KBASE_KTRACE_CODE_MAKE_CODE(GROUP_IDLE_WAIT_HEAD), /* info_val == is scheduler running with protected mode tasks */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_CHECK_PROTM_ENTER), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ENTER_PROTM), - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EXIT_PROTM), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_ENTER_CHECK), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_ENTER), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_PROTM_EXIT), /* info_val[31:0] == number of GPU address space slots in use * info_val[63:32] == number of runnable groups */ @@ -187,13 +197,40 @@ int dummy_array[] = { /* info_val == new count of off-slot non-idle groups * no group indicates it was set rather than incremented */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_INC), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_GRP_INC), /* info_val == new count of off-slot non-idle groups */ - KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_DEC), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC), + /* info_val = scheduler's new csg_slots_idle_mask[0] + * group->csg_nr indicates which bit was set + */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_HANDLE_IDLE_SLOTS), - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_BEGIN), + KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_START), KBASE_KTRACE_CODE_MAKE_CODE(PROTM_EVENT_WORKER_END), + /* info_val = scheduler state */ + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_BUSY), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_INACTIVE), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_SUSPENDED), + KBASE_KTRACE_CODE_MAKE_CODE(SCHED_SLEEPING), + + /* info_val = mcu state */ +#define KBASEP_MCU_STATE(n) KBASE_KTRACE_CODE_MAKE_CODE(PM_MCU_ ## n), +#include "backend/gpu/mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE + + /* info_val = number of runnable groups */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_INACTIVE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_RUNNABLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED_ON_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_SUSPENDED_ON_WAIT_SYNC), + /* info_val = new run state of the evicted group */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_FAULT_EVICTED), + /* info_val = get the number of active CSGs */ + KBASE_KTRACE_CODE_MAKE_CODE(CSF_GROUP_TERMINATED), + /* * Group + Queue events */ @@ -201,42 +238,42 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(CSI_START), /* info_val == queue->enabled before stop */ KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP), - KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP_REQUESTED), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_STOP_REQ), /* info_val == CS_REQ ^ CS_ACK that were not processed due to the group * being suspended */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_FAULT_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_FAULT), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_TILER_OOM_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_TILER_OOM), /* info_val == CS_REQ ^ CS_ACK */ - KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_INTERRUPT), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_INTERRUPT_PROTM_PEND), /* info_val == CS_ACK_PROTM_PEND ^ 
CS_REQ_PROTM_PEND */ KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_ACK), /* info_val == group->run_State (for group the queue is bound to) */ KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_START), KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_STOP), /* info_val == contents of CS_STATUS_WAIT_SYNC_POINTER */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVAL_START), /* info_val == bool for result of the evaluation */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVALUATED), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_EVAL_END), /* info_val == contents of CS_STATUS_WAIT */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_STATUS_WAIT), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_WAIT_STATUS), /* info_val == current sync value pointed to by queue->sync_ptr */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_CURRENT_VAL), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_CUR_VAL), /* info_val == current value of CS_STATUS_WAIT_SYNC_VALUE */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_TEST_VAL), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_TEST_VAL), /* info_val == current value of CS_STATUS_BLOCKED_REASON */ - KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_BLOCKED_REASON), + KBASE_KTRACE_CODE_MAKE_CODE(QUEUE_SYNC_UPDATE_BLOCKED_REASON), /* info_val = group's new protm_pending_bitmap[0] * queue->csi_index indicates which bit was set */ - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_PENDING_SET), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_SET), /* info_val = group's new protm_pending_bitmap[0] * queue->csi_index indicates which bit was cleared */ - KBASE_KTRACE_CODE_MAKE_CODE(PROTM_PENDING_CLEAR), + KBASE_KTRACE_CODE_MAKE_CODE(CSI_PROTM_PEND_CLEAR), /* * KCPU queue events @@ -244,42 +281,49 @@ int dummy_array[] = { /* KTrace info_val == KCPU queue fence context * KCPU extra_info_val == N/A. */ - KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_NEW), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_CREATE), /* KTrace info_val == Number of pending commands in KCPU queue when * it is destroyed. * KCPU extra_info_val == Number of CQS wait operations present in * the KCPU queue when it is destroyed. */ - KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_DESTROY), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_QUEUE_DELETE), /* KTrace info_val == CQS event memory address * KCPU extra_info_val == Upper 32 bits of event memory, i.e. contents * of error field. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_SET), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_SET), /* KTrace info_val == Number of CQS objects to be waited upon * KCPU extra_info_val == N/A. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_WAIT_START), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_WAIT_START), /* KTrace info_val == CQS event memory address * KCPU extra_info_val == 1 if CQS was signaled with an error and queue * inherited the error, otherwise 0. */ - KBASE_KTRACE_CODE_MAKE_CODE(CQS_WAIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_CQS_WAIT_END), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. */ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_SIGNAL), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_SIGNAL), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. */ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_WAIT_START), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_WAIT_START), /* KTrace info_val == Fence context * KCPU extra_info_val == Fence seqno. 
*/ - KBASE_KTRACE_CODE_MAKE_CODE(FENCE_WAIT_END), + KBASE_KTRACE_CODE_MAKE_CODE(KCPU_FENCE_WAIT_END), +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_ENTER_SC_RAIL), + KBASE_KTRACE_CODE_MAKE_CODE(SCHEDULER_EXIT_SC_RAIL), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_RECHECK_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_RECHECK_NOT_IDLE), + KBASE_KTRACE_CODE_MAKE_CODE(SC_RAIL_CAN_TURN_OFF), +#endif #if 0 /* Dummy section to avoid breaking formatting */ }; #endif -/* ***** THE LACK OF HEADER GUARDS IS INTENTIONAL ***** */ + /* ***** THE LACK OF HEADER GUARDS IS INTENTIONAL ***** */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c index 824ca4b..cff6f89 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,6 +98,9 @@ void kbasep_ktrace_add_csf(struct kbase_device *kbdev, struct kbase_ktrace_msg *trace_msg; struct kbase_context *kctx = NULL; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ @@ -165,6 +168,9 @@ void kbasep_ktrace_add_csf_kcpu(struct kbase_device *kbdev, struct kbase_ktrace_msg *trace_msg; struct kbase_context *kctx = queue->kctx; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h index 7f32cd2..1896e10 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_defs_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -47,7 +47,7 @@ * 1.3: * Add a lot of extra new traces. Tweak some existing scheduler related traces * to contain extra information information/happen at slightly different times. - * SCHEDULER_EXIT_PROTM now has group information + * SCHEDULER_PROTM_EXIT now has group information */ #define KBASE_KTRACE_VERSION_MAJOR 1 #define KBASE_KTRACE_VERSION_MINOR 3 diff --git a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c index 05d1677..6597a15 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c +++ b/mali_kbase/debug/backend/mali_kbase_debug_ktrace_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -80,6 +80,9 @@ void kbasep_ktrace_add_jm(struct kbase_device *kbdev, unsigned long irqflags; struct kbase_ktrace_msg *trace_msg; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); /* Reserve and update indices */ diff --git a/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h b/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h index 9ee7f81..e70a498 100644 --- a/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h +++ b/mali_kbase/debug/backend/mali_kbase_debug_linux_ktrace_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,37 +30,52 @@ /* * Generic CSF events - using the common DEFINE_MALI_ADD_EVENT */ -DEFINE_MALI_ADD_EVENT(EVICT_CTX_SLOTS); -DEFINE_MALI_ADD_EVENT(FIRMWARE_BOOT); -DEFINE_MALI_ADD_EVENT(FIRMWARE_REBOOT); -DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EVICT_CTX_SLOTS_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EVICT_CTX_SLOTS_END); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_BOOT); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_REBOOT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_INVOKE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_INVOKE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_START); DEFINE_MALI_ADD_EVENT(SCHEDULER_TOCK_END); -DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_START); DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_END); -DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET); -DEFINE_MALI_ADD_EVENT(SCHEDULER_WAIT_PROTM_QUIT); -DEFINE_MALI_ADD_EVENT(SCHEDULER_WAIT_PROTM_QUIT_DONE); -DEFINE_MALI_ADD_EVENT(SYNC_UPDATE_EVENT); -DEFINE_MALI_ADD_EVENT(SYNC_UPDATE_EVENT_NOTIFY_GPU); -DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RESET_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_PROTM_WAIT_QUIT_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_PROTM_WAIT_QUIT_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_EVENT); +DEFINE_MALI_ADD_EVENT(CSF_SYNC_UPDATE_NOTIFY_GPU_EVENT); +DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_START); DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_END); -DEFINE_MALI_ADD_EVENT(CSG_INTERRUPT_PROCESS); -DEFINE_MALI_ADD_EVENT(GLB_REQ_ACQ); -DEFINE_MALI_ADD_EVENT(SCHEDULER_CAN_IDLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_ADVANCE_TICK); -DEFINE_MALI_ADD_EVENT(SCHEDULER_NOADVANCE_TICK); -DEFINE_MALI_ADD_EVENT(SCHEDULER_INSERT_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_REMOVE_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_ROTATE_RUNNABLE); -DEFINE_MALI_ADD_EVENT(SCHEDULER_HEAD_RUNNABLE); -DEFINE_MALI_ADD_EVENT(IDLE_WORKER_BEGIN); -DEFINE_MALI_ADD_EVENT(IDLE_WORKER_END); -DEFINE_MALI_ADD_EVENT(GROUP_SYNC_UPDATE_WORKER_BEGIN); -DEFINE_MALI_ADD_EVENT(GROUP_SYNC_UPDATE_WORKER_END); -DEFINE_MALI_ADD_EVENT(SLOTS_STATUS_UPDATE_ACK); -DEFINE_MALI_ADD_EVENT(GPU_IDLE_HANDLING_START); -DEFINE_MALI_ADD_EVENT(MCU_HALTED); -DEFINE_MALI_ADD_EVENT(MCU_IN_SLEEP); +DEFINE_MALI_ADD_EVENT(CSF_INTERRUPT_GLB_REQ_ACK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_EVENT_CAN_SUSPEND); +DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_ADVANCE); 
+DEFINE_MALI_ADD_EVENT(SCHEDULER_TICK_NOADVANCE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_INSERT); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_REMOVE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_ROTATE); +DEFINE_MALI_ADD_EVENT(SCHEDULER_RUNNABLE_KCTX_HEAD); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GROUP_SYNC_UPDATE_WORKER_END); +DEFINE_MALI_ADD_EVENT(SCHEDULER_UPDATE_IDLE_SLOTS_ACK); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_HANDLING_START); +DEFINE_MALI_ADD_EVENT(SCHEDULER_GPU_IDLE_WORKER_HANDLING_END); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_MCU_HALTED); +DEFINE_MALI_ADD_EVENT(CSF_FIRMWARE_MCU_SLEEP); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_ADD_EVENT(SCHEDULER_ENTER_SC_RAIL); +DEFINE_MALI_ADD_EVENT(SCHEDULER_EXIT_SC_RAIL); +#endif +DEFINE_MALI_ADD_EVENT(SCHED_BUSY); +DEFINE_MALI_ADD_EVENT(SCHED_INACTIVE); +DEFINE_MALI_ADD_EVENT(SCHED_SUSPENDED); +DEFINE_MALI_ADD_EVENT(SCHED_SLEEPING); +#define KBASEP_MCU_STATE(n) DEFINE_MALI_ADD_EVENT(PM_MCU_ ## n); +#include "backend/gpu/mali_kbase_pm_mcu_states.h" +#undef KBASEP_MCU_STATE DECLARE_EVENT_CLASS(mali_csf_grp_q_template, TP_PROTO(struct kbase_device *kbdev, struct kbase_queue_group *group, @@ -130,38 +145,55 @@ DECLARE_EVENT_CLASS(mali_csf_grp_q_template, __entry->kctx_tgid, __entry->kctx_id, __entry->group_handle, \ __entry->csg_nr, __entry->slot_prio, __entry->info_val)) -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_START); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOP); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STARTED); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_START_REQ); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOP_REQ); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_RUNNING); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STOPPED); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_CLEANED); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_STATUS_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_UPDATE_IDLE_SLOT_REQ); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_IDLE_SET); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_NO_NON_IDLE_GROUPS); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_NON_IDLE_GROUPS); DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_IDLE_CLEAR); -DEFINE_MALI_CSF_GRP_EVENT(CSG_PRIO_UPDATE); -DEFINE_MALI_CSF_GRP_EVENT(CSG_SYNC_UPDATE_INTERRUPT); -DEFINE_MALI_CSF_GRP_EVENT(CSG_IDLE_INTERRUPT); -DEFINE_MALI_CSF_GRP_EVENT(CSG_PROGRESS_TIMER_INTERRUPT); +DEFINE_MALI_CSF_GRP_EVENT(CSG_SLOT_PRIO_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_SYNC_UPDATE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROGRESS_TIMER_EVENT); +DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROCESS_START); DEFINE_MALI_CSF_GRP_EVENT(CSG_INTERRUPT_PROCESS_END); DEFINE_MALI_CSF_GRP_EVENT(GROUP_SYNC_UPDATE_DONE); DEFINE_MALI_CSF_GRP_EVENT(GROUP_DESCHEDULE); DEFINE_MALI_CSF_GRP_EVENT(GROUP_SCHEDULE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_EVICT_SCHED); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_INSERT_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_REMOVE_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_ROTATE_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_HEAD_RUNNABLE); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_INSERT_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_REMOVE_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(GROUP_HEAD_IDLE_WAIT); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_CHECK_PROTM_ENTER); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_ENTER_PROTM); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_EXIT_PROTM); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_EVICT); 
+DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_INSERT); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_REMOVE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_ROTATE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_RUNNABLE_HEAD); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_INSERT); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_REMOVE); +DEFINE_MALI_CSF_GRP_EVENT(GROUP_IDLE_WAIT_HEAD); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_ENTER_CHECK); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_ENTER); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_PROTM_EXIT); DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_TOP_GRP); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_INC); -DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_DEC); -DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_BEGIN); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_GRP_INC); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_NONIDLE_OFFSLOT_GRP_DEC); +DEFINE_MALI_CSF_GRP_EVENT(SCHEDULER_HANDLE_IDLE_SLOTS); +DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_START); DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_END); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_RECHECK_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_RECHECK_NOT_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(SC_RAIL_CAN_TURN_OFF); +#endif +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_INACTIVE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_RUNNABLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED_ON_IDLE); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_SUSPENDED_ON_WAIT_SYNC); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_FAULT_EVICTED); +DEFINE_MALI_CSF_GRP_EVENT(CSF_GROUP_TERMINATED); #undef DEFINE_MALI_CSF_GRP_EVENT @@ -176,22 +208,22 @@ DEFINE_MALI_CSF_GRP_EVENT(PROTM_EVENT_WORKER_END); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_START); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP_REQUESTED); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_IGNORED_INTERRUPTS_GROUP_SUSPEND); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_FAULT_INTERRUPT); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_TILER_OOM_INTERRUPT); -DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_INTERRUPT); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_STOP_REQ); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_GROUP_SUSPENDS_IGNORED); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_FAULT); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_TILER_OOM); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_INTERRUPT_PROTM_PEND); DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_ACK); DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_START); DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_STOP); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVALUATED); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_STATUS_WAIT); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_CURRENT_VAL); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_TEST_VAL); -DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_BLOCKED_REASON); -DEFINE_MALI_CSF_GRP_Q_EVENT(PROTM_PENDING_SET); -DEFINE_MALI_CSF_GRP_Q_EVENT(PROTM_PENDING_CLEAR); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVAL_START); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_EVAL_END); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_WAIT_STATUS); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_CUR_VAL); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_TEST_VAL); +DEFINE_MALI_CSF_GRP_Q_EVENT(QUEUE_SYNC_UPDATE_BLOCKED_REASON); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_SET); +DEFINE_MALI_CSF_GRP_Q_EVENT(CSI_PROTM_PEND_CLEAR); #undef DEFINE_MALI_CSF_GRP_Q_EVENT @@ -230,14 +262,14 @@ DECLARE_EVENT_CLASS(mali_csf_kcpu_queue_template, u64 info_val1, u64 info_val2), \ TP_ARGS(queue, info_val1, 
info_val2)) -DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_NEW); -DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_DESTROY); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_SET); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_WAIT_START); -DEFINE_MALI_CSF_KCPU_EVENT(CQS_WAIT_END); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_SIGNAL); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_WAIT_START); -DEFINE_MALI_CSF_KCPU_EVENT(FENCE_WAIT_END); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_CREATE); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_QUEUE_DELETE); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_SET); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_WAIT_START); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_CQS_WAIT_END); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_SIGNAL); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_WAIT_START); +DEFINE_MALI_CSF_KCPU_EVENT(KCPU_FENCE_WAIT_END); #undef DEFINE_MALI_CSF_KCPU_EVENT diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace.c b/mali_kbase/debug/mali_kbase_debug_ktrace.c index 9bf8610..3cbd2da 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace.c +++ b/mali_kbase/debug/mali_kbase_debug_ktrace.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,13 +27,13 @@ int kbase_ktrace_init(struct kbase_device *kbdev) #if KBASE_KTRACE_TARGET_RBUF struct kbase_ktrace_msg *rbuf; + spin_lock_init(&kbdev->ktrace.lock); rbuf = kmalloc_array(KBASE_KTRACE_SIZE, sizeof(*rbuf), GFP_KERNEL); if (!rbuf) return -EINVAL; kbdev->ktrace.rbuf = rbuf; - spin_lock_init(&kbdev->ktrace.lock); #endif /* KBASE_KTRACE_TARGET_RBUF */ return 0; } @@ -42,6 +42,7 @@ void kbase_ktrace_term(struct kbase_device *kbdev) { #if KBASE_KTRACE_TARGET_RBUF kfree(kbdev->ktrace.rbuf); + kbdev->ktrace.rbuf = NULL; #endif /* KBASE_KTRACE_TARGET_RBUF */ } @@ -131,7 +132,7 @@ static void kbasep_ktrace_dump_msg(struct kbase_device *kbdev, lockdep_assert_held(&kbdev->ktrace.lock); kbasep_ktrace_format_msg(trace_msg, buffer, sizeof(buffer)); - dev_dbg(kbdev->dev, "%s", buffer); + dev_err(kbdev->dev, "%s", buffer); } struct kbase_ktrace_msg *kbasep_ktrace_reserve(struct kbase_ktrace *ktrace) @@ -183,6 +184,9 @@ void kbasep_ktrace_add(struct kbase_device *kbdev, enum kbase_ktrace_code code, unsigned long irqflags; struct kbase_ktrace_msg *trace_msg; + if (unlikely(!kbasep_ktrace_initialized(&kbdev->ktrace))) + return; + WARN_ON((flags & ~KBASE_KTRACE_FLAG_COMMON_ALL)); spin_lock_irqsave(&kbdev->ktrace.lock, irqflags); @@ -212,34 +216,61 @@ void kbasep_ktrace_clear(struct kbase_device *kbdev) spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); } +static inline u32 ktrace_buffer_distance(u32 start, u32 end) { + if (end == start) + return 0; + if (end > start) + return end - start; + return KBASE_KTRACE_SIZE; +} + void kbasep_ktrace_dump(struct kbase_device *kbdev) { unsigned long flags; u32 start; u32 end; + u32 i = 0; + u32 distance = 0; char buffer[KTRACE_DUMP_MESSAGE_SIZE] = "Dumping trace:\n"; kbasep_ktrace_format_header(buffer, sizeof(buffer), strlen(buffer)); - dev_dbg(kbdev->dev, "%s", buffer); + dev_err(kbdev->dev, "%s", buffer); spin_lock_irqsave(&kbdev->ktrace.lock, flags); start = kbdev->ktrace.first_out; end = kbdev->ktrace.next_in; - - while (start != end) { - struct kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[start]; - + distance = ktrace_buffer_distance(start, end); + for (i = 0; i < distance; ++i) { + struct 
kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[end]; kbasep_ktrace_dump_msg(kbdev, trace_msg); - start = (start + 1) & KBASE_KTRACE_MASK; + end = (end + 1) & KBASE_KTRACE_MASK; } - dev_dbg(kbdev->dev, "TRACE_END"); + dev_err(kbdev->dev, "TRACE_END: (%i entries)", i); kbasep_ktrace_clear_locked(kbdev); spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); } +u32 kbasep_ktrace_copy(struct kbase_device* kbdev, struct kbase_ktrace_msg* msgs, u32 num_msgs) +{ + u32 start = kbdev->ktrace.first_out; + u32 end = kbdev->ktrace.next_in; + u32 i = 0; + u32 distance = min(ktrace_buffer_distance(start, end), num_msgs); + + lockdep_assert_held(&kbdev->ktrace.lock); + + for (i = 0; i < distance; ++i) { + struct kbase_ktrace_msg *trace_msg = &kbdev->ktrace.rbuf[end]; + memcpy(&msgs[i], trace_msg, sizeof(struct kbase_ktrace_msg)); + end = (end + 1) & KBASE_KTRACE_MASK; + } + + return i; +} + #if IS_ENABLED(CONFIG_DEBUG_FS) struct trace_seq_state { struct kbase_ktrace_msg trace_buf[KBASE_KTRACE_SIZE]; diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace.h b/mali_kbase/debug/mali_kbase_debug_ktrace.h index f1e6d3d..7c988f4 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -82,6 +82,18 @@ void kbase_ktrace_debugfs_init(struct kbase_device *kbdev); */ #if KBASE_KTRACE_TARGET_RBUF /** + * kbasep_ktrace_initialized - Check whether kbase ktrace is initialized + * + * @ktrace: ktrace of kbase device. + * + * Return: true if ktrace has been initialized. + */ +static inline bool kbasep_ktrace_initialized(struct kbase_ktrace *ktrace) +{ + return ktrace->rbuf != NULL; +} + +/** * kbasep_ktrace_add - internal function to add trace to the ringbuffer. * @kbdev: kbase device * @code: ktrace code @@ -111,6 +123,18 @@ void kbasep_ktrace_clear(struct kbase_device *kbdev); */ void kbasep_ktrace_dump(struct kbase_device *kbdev); +/** + * kbasep_ktrace_copy - copy ktrace buffer. + * Elements in the buffer will be ordered from earliest to latest. + * Precondition: ktrace lock must be held. + * + * @kbdev: kbase device + * @msgs: a region of memory of size data_size that the ktrace buffer will be copied to + * @num_msgs: the size of data. + * Return: The number of elements copied. + */ + u32 kbasep_ktrace_copy(struct kbase_device* kbdev, struct kbase_ktrace_msg* msgs, u32 num_msgs); + #define KBASE_KTRACE_RBUF_ADD(kbdev, code, kctx, info_val) \ kbasep_ktrace_add(kbdev, KBASE_KTRACE_CODE(code), kctx, 0, \ info_val) \ diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h b/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h index 1c6b4cd..e2a1e8c 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace_codes.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2015, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2015, 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -142,6 +142,11 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(PM_RUNTIME_SUSPEND_CALLBACK), KBASE_KTRACE_CODE_MAKE_CODE(PM_RUNTIME_RESUME_CALLBACK), + /* info_val = l2 state */ +#define KBASEP_L2_STATE(n) KBASE_KTRACE_CODE_MAKE_CODE(PM_L2_ ## n), +#include "backend/gpu/mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE + /* * Context Scheduler events */ @@ -157,6 +162,10 @@ int dummy_array[] = { KBASE_KTRACE_CODE_MAKE_CODE(ARB_VM_STATE), KBASE_KTRACE_CODE_MAKE_CODE(ARB_VM_EVT), #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + KBASE_KTRACE_CODE_MAKE_CODE(PM_RAIL_ON), + KBASE_KTRACE_CODE_MAKE_CODE(PM_RAIL_OFF), +#endif #if MALI_USE_CSF #include "debug/backend/mali_kbase_debug_ktrace_codes_csf.h" diff --git a/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h b/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h index 4694b78..8d9e11e 100644 --- a/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h +++ b/mali_kbase/debug/mali_kbase_debug_ktrace_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -138,8 +138,8 @@ enum kbase_ktrace_code { }; /** - * struct kbase_ktrace - object representing a trace message added to trace - * buffer trace_rbuf in &kbase_device + * struct kbase_ktrace_msg - object representing a trace message added to trace + * buffer trace_rbuf in &kbase_device * @timestamp: CPU timestamp at which the trace message was added. * @thread_id: id of the thread in the context of which trace message was * added. diff --git a/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h b/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h index 5fac763..1b95306 100644 --- a/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h +++ b/mali_kbase/debug/mali_kbase_debug_linux_ktrace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,6 +98,9 @@ DEFINE_MALI_ADD_EVENT(PM_WAKE_WAITERS); DEFINE_MALI_ADD_EVENT(PM_POWEROFF_WAIT_WQ); DEFINE_MALI_ADD_EVENT(PM_RUNTIME_SUSPEND_CALLBACK); DEFINE_MALI_ADD_EVENT(PM_RUNTIME_RESUME_CALLBACK); +#define KBASEP_L2_STATE(n) DEFINE_MALI_ADD_EVENT(PM_L2_ ## n); +#include "backend/gpu/mali_kbase_pm_l2_states.h" +#undef KBASEP_L2_STATE DEFINE_MALI_ADD_EVENT(SCHED_RETAIN_CTX_NOLOCK); DEFINE_MALI_ADD_EVENT(SCHED_RELEASE_CTX); #ifdef CONFIG_MALI_ARBITER_SUPPORT @@ -107,6 +110,11 @@ DEFINE_MALI_ADD_EVENT(ARB_VM_STATE); DEFINE_MALI_ADD_EVENT(ARB_VM_EVT); #endif +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +DEFINE_MALI_ADD_EVENT(PM_RAIL_ON); +DEFINE_MALI_ADD_EVENT(PM_RAIL_OFF); +#endif + #if MALI_USE_CSF #include "backend/mali_kbase_debug_linux_ktrace_csf.h" #else diff --git a/mali_kbase/device/backend/mali_kbase_device_csf.c b/mali_kbase/device/backend/mali_kbase_device_csf.c index 5325658..571761f 100644 --- a/mali_kbase/device/backend/mali_kbase_device_csf.c +++ b/mali_kbase/device/backend/mali_kbase_device_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,26 +23,27 @@ #include <device/mali_kbase_device.h> #include <mali_kbase_hwaccess_backend.h> -#include <mali_kbase_hwcnt_backend_csf_if_fw.h> -#include <mali_kbase_hwcnt_watchdog_if_timer.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_reset_gpu.h> #include <csf/mali_kbase_csf.h> #include <csf/ipa_control/mali_kbase_csf_ipa_control.h> - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) #include <backend/gpu/mali_kbase_model_linux.h> -#endif #include <mali_kbase.h> #include <backend/gpu/mali_kbase_irq_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> -#include <backend/gpu/mali_kbase_js_internal.h> #include <backend/gpu/mali_kbase_clk_rate_trace_mgr.h> #include <csf/mali_kbase_csf_csg_debugfs.h> -#include <mali_kbase_hwcnt_virtualizer.h> +#include <csf/mali_kbase_csf_kcpu_fence_debugfs.h> +#include <hwcnt/mali_kbase_hwcnt_virtualizer.h> #include <mali_kbase_kinstr_prfcnt.h> #include <mali_kbase_vinstr.h> +#include <tl/mali_kbase_timeline.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif /** * kbase_device_firmware_hwcnt_term - Terminate CSF firmware and HWC @@ -60,7 +61,7 @@ static void kbase_device_firmware_hwcnt_term(struct kbase_device *kbdev) kbase_vinstr_term(kbdev->vinstr_ctx); kbase_hwcnt_virtualizer_term(kbdev->hwcnt_gpu_virt); kbase_hwcnt_backend_csf_metadata_term(&kbdev->hwcnt_gpu_iface); - kbase_csf_firmware_term(kbdev); + kbase_csf_firmware_unload_term(kbdev); } } @@ -86,18 +87,14 @@ static int kbase_backend_late_init(struct kbase_device *kbdev) if (err) goto fail_pm_powerup; - err = kbase_backend_timer_init(kbdev); - if (err) - goto fail_timer; - #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) if (kbasep_common_test_interrupt_handlers(kbdev) != 0) { dev_err(kbdev->dev, "Interrupt assignment check failed.\n"); err = -EINVAL; goto 
fail_interrupt_test; } -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ kbase_ipa_control_init(kbdev); @@ -141,13 +138,11 @@ fail_pm_metrics_init: kbase_ipa_control_term(kbdev); #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) fail_interrupt_test: -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ - kbase_backend_timer_term(kbdev); -fail_timer: kbase_pm_context_idle(kbdev); kbase_hwaccess_pm_halt(kbdev); fail_pm_powerup: @@ -191,12 +186,26 @@ static int kbase_csf_early_init(struct kbase_device *kbdev) } /** - * kbase_csf_early_init - Early termination for firmware & scheduler. + * kbase_csf_early_term() - Early termination for firmware & scheduler. * @kbdev: Device pointer */ static void kbase_csf_early_term(struct kbase_device *kbdev) { kbase_csf_scheduler_early_term(kbdev); + kbase_csf_firmware_early_term(kbdev); +} + +/** + * kbase_csf_late_init - late initialization for firmware. + * @kbdev: Device pointer + * + * Return: 0 on success, error code otherwise. + */ +static int kbase_csf_late_init(struct kbase_device *kbdev) +{ + int err = kbase_csf_firmware_late_init(kbdev); + + return err; } /** @@ -268,60 +277,55 @@ static void kbase_device_hwcnt_backend_csf_term(struct kbase_device *kbdev) } static const struct kbase_device_init dev_init[] = { -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - { kbase_gpu_device_create, kbase_gpu_device_destroy, - "Dummy model initialization failed" }, -#else +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) + { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" }, +#else /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ { assign_irqs, NULL, "IRQ search failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) { registers_map, registers_unmap, "Register map failed" }, -#endif - { power_control_init, power_control_term, - "Power control initialization failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { kbase_gpu_metrics_init, kbase_gpu_metrics_term, "GPU metrics initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) */ + { power_control_init, power_control_term, "Power control initialization failed" }, { kbase_device_io_history_init, kbase_device_io_history_term, "Register access history initialization failed" }, - { kbase_device_early_init, kbase_device_early_term, - "Early device initialization failed" }, - { kbase_device_populate_max_freq, NULL, - "Populating max frequency failed" }, + { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" }, + { kbase_backend_time_init, NULL, "Time backend initialization failed" }, { kbase_device_misc_init, kbase_device_misc_term, "Miscellaneous device initialization failed" }, { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term, "Priority control manager initialization failed" }, - { kbase_ctx_sched_init, kbase_ctx_sched_term, - "Context scheduler initialization failed" }, - { kbase_mem_init, kbase_mem_term, - "Memory subsystem initialization failed" }, + { kbase_ctx_sched_init, kbase_ctx_sched_term, "Context scheduler initialization failed" }, + { kbase_mem_init, kbase_mem_term, "Memory subsystem initialization failed" }, { kbase_csf_protected_memory_init, kbase_csf_protected_memory_term, "Protected memory allocator initialization failed" }, { kbase_device_coherency_init, NULL, 
"Device coherency init failed" }, { kbase_protected_mode_init, kbase_protected_mode_term, "Protected mode subsystem initialization failed" }, - { kbase_device_list_init, kbase_device_list_term, - "Device list setup failed" }, + { kbase_device_list_init, kbase_device_list_term, "Device list setup failed" }, { kbase_device_timeline_init, kbase_device_timeline_term, "Timeline stream initialization failed" }, { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term, "Clock rate trace manager initialization failed" }, - { kbase_lowest_gpu_freq_init, NULL, - "Lowest freq initialization failed" }, - { kbase_device_hwcnt_watchdog_if_init, - kbase_device_hwcnt_watchdog_if_term, + { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term, "GPU hwcnt backend watchdog interface creation failed" }, - { kbase_device_hwcnt_backend_csf_if_init, - kbase_device_hwcnt_backend_csf_if_term, + { kbase_device_hwcnt_backend_csf_if_init, kbase_device_hwcnt_backend_csf_if_term, "GPU hwcnt backend CSF interface creation failed" }, - { kbase_device_hwcnt_backend_csf_init, - kbase_device_hwcnt_backend_csf_term, + { kbase_device_hwcnt_backend_csf_init, kbase_device_hwcnt_backend_csf_term, "GPU hwcnt backend creation failed" }, { kbase_device_hwcnt_context_init, kbase_device_hwcnt_context_term, "GPU hwcnt context initialization failed" }, - { kbase_backend_late_init, kbase_backend_late_term, - "Late backend initialization failed" }, - { kbase_csf_early_init, kbase_csf_early_term, - "Early CSF initialization failed" }, + { kbase_csf_early_init, kbase_csf_early_term, "Early CSF initialization failed" }, + { kbase_backend_late_init, kbase_backend_late_term, "Late backend initialization failed" }, + { kbase_csf_late_init, NULL, "Late CSF initialization failed" }, { NULL, kbase_device_firmware_hwcnt_term, NULL }, - { kbase_device_debugfs_init, kbase_device_debugfs_term, - "DebugFS initialization failed" }, + { kbase_debug_csf_fault_init, kbase_debug_csf_fault_term, + "CSF fault debug initialization failed" }, + { kbase_device_debugfs_init, kbase_device_debugfs_term, "DebugFS initialization failed" }, + { kbase_csf_fence_timer_debugfs_init, kbase_csf_fence_timer_debugfs_term, + "Fence timeout DebugFS initialization failed" }, /* Sysfs init needs to happen before registering the device with * misc_register(), otherwise it causes a race condition between * registering the device and a uevent event being generated for @@ -339,8 +343,11 @@ static const struct kbase_device_init dev_init[] = { "Misc device registration failed" }, { kbase_gpuprops_populate_user_buffer, kbase_gpuprops_free_user_buffer, "GPU property population failed" }, - { kbase_device_late_init, kbase_device_late_term, - "Late device initialization failed" }, + { kbase_device_late_init, kbase_device_late_term, "Late device initialization failed" }, +#if IS_ENABLED(CONFIG_MALI_CORESIGHT) + { kbase_debug_coresight_csf_init, kbase_debug_coresight_csf_term, + "Coresight initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_CORESIGHT) */ }; static void kbase_device_term_partial(struct kbase_device *kbdev, @@ -354,7 +361,6 @@ static void kbase_device_term_partial(struct kbase_device *kbdev, void kbase_device_term(struct kbase_device *kbdev) { - kbdev->csf.mali_file_inode = NULL; kbase_device_term_partial(kbdev, ARRAY_SIZE(dev_init)); kbase_mem_halt(kbdev); } @@ -468,7 +474,7 @@ static int kbase_csf_firmware_deferred_init(struct kbase_device *kbdev) lockdep_assert_held(&kbdev->fw_load_lock); - err = kbase_csf_firmware_init(kbdev); + 
err = kbase_csf_firmware_load_init(kbdev); if (!err) { unsigned long flags; @@ -498,11 +504,12 @@ int kbase_device_firmware_init_once(struct kbase_device *kbdev) ret = kbase_device_hwcnt_csf_deferred_init(kbdev); if (ret) { - kbase_csf_firmware_term(kbdev); + kbase_csf_firmware_unload_term(kbdev); goto out; } kbase_csf_debugfs_init(kbdev); + kbase_timeline_io_debugfs_init(kbdev); out: kbase_pm_context_idle(kbdev); } @@ -511,4 +518,4 @@ out: return ret; } -KBASE_EXPORT_TEST_API(kbase_device_firmware_init_once); +KBASE_EXPORT_TEST_API(kbase_device_firmware_init_once);
\ No newline at end of file diff --git a/mali_kbase/device/backend/mali_kbase_device_hw_csf.c b/mali_kbase/device/backend/mali_kbase_device_hw_csf.c index e2228ca..c837f5a 100644 --- a/mali_kbase/device/backend/mali_kbase_device_hw_csf.c +++ b/mali_kbase/device/backend/mali_kbase_device_hw_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,6 +24,7 @@ #include <backend/gpu/mali_kbase_instr_internal.h> #include <backend/gpu/mali_kbase_pm_internal.h> #include <device/mali_kbase_device.h> +#include <device/mali_kbase_device_internal.h> #include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu.h> #include <mali_kbase_ctx_sched.h> @@ -57,7 +58,7 @@ static void kbase_gpu_fault_interrupt(struct kbase_device *kbdev) { const u32 status = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_FAULTSTATUS)); - const bool as_valid = status & GPU_FAULTSTATUS_JASID_VALID_FLAG; + const bool as_valid = status & GPU_FAULTSTATUS_JASID_VALID_MASK; const u32 as_nr = (status & GPU_FAULTSTATUS_JASID_MASK) >> GPU_FAULTSTATUS_JASID_SHIFT; bool bus_fault = (status & GPU_FAULTSTATUS_EXCEPTION_TYPE_MASK) == @@ -83,6 +84,37 @@ static void kbase_gpu_fault_interrupt(struct kbase_device *kbdev) } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/* When the GLB_PWROFF_TIMER expires, FW will write the SHADER_PWROFF register, this sequence + * follows: + * - SHADER_PWRTRANS goes high + * - SHADER_READY goes low + * - Iterator is told not to send any more work to the core + * - Wait for the core to drain + * - SHADER_PWRACTIVE goes low + * - Do an IPA sample + * - Flush the core + * - Apply functional isolation + * - Turn the clock off + * - Put the core in reset + * - Apply electrical isolation + * - Power off the core + * - SHADER_PWRTRANS goes low + * + * It's therefore safe to turn off the SC rail when: + * - SHADER_READY == 0, this means the SC's last transitioned to OFF + * - SHADER_PWRTRANS == 0, this means the SC's have finished transitioning + */ +static bool safe_to_turn_off_sc_rail(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + return (kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_READY_HI)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_READY_LO)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)) || + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO))) == 0; +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) { KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ, NULL, val); @@ -115,6 +147,9 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) GPU_EXCEPTION_TYPE_SW_FAULT_0, } } }; + kbase_debug_csf_fault_notify(kbdev, scheduler->active_protm_grp->kctx, + DF_GPU_PROTECTED_FAULT); + scheduler->active_protm_grp->faulted = true; kbase_csf_add_group_fatal_error( scheduler->active_protm_grp, &err_payload); @@ -146,7 +181,6 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) dev_dbg(kbdev->dev, "Doorbell mirror interrupt received"); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - WARN_ON(!kbase_csf_scheduler_get_nr_active_csgs(kbdev)); kbase_pm_disable_db_mirror_interrupt(kbdev); kbdev->pm.backend.exit_gpu_sleep_mode = true; kbase_csf_scheduler_invoke_tick(kbdev); @@ -166,6 +200,16 @@ 
void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) if (val & CLEAN_CACHES_COMPLETED) kbase_clean_caches_done(kbdev); +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + if (val & POWER_CHANGED_ALL) { + unsigned long flags; + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbdev->pm.backend.sc_pwroff_safe = safe_to_turn_off_sc_rail(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } +#endif + + if (val & (POWER_CHANGED_ALL | MCU_STATUS_GPU_IRQ)) { kbase_pm_power_changed(kbdev); } else if (val & CLEAN_CACHES_COMPLETED) { @@ -184,7 +228,7 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) } #if !IS_ENABLED(CONFIG_MALI_NO_MALI) -static bool kbase_is_register_accessible(u32 offset) +bool kbase_is_register_accessible(u32 offset) { #ifdef CONFIG_MALI_DEBUG if (((offset >= MCU_SUBSYSTEM_BASE) && (offset < IPA_CONTROL_BASE)) || @@ -196,11 +240,16 @@ static bool kbase_is_register_accessible(u32 offset) return true; } +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_REAL_HW) void kbase_reg_write(struct kbase_device *kbdev, u32 offset, u32 value) { - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return; + + if (WARN_ON(kbdev->dev == NULL)) + return; if (!kbase_is_register_accessible(offset)) return; @@ -220,8 +269,11 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { u32 val; - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return 0; + + if (WARN_ON(kbdev->dev == NULL)) + return 0; if (!kbase_is_register_accessible(offset)) return 0; @@ -238,4 +290,4 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } KBASE_EXPORT_TEST_API(kbase_reg_read); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/device/backend/mali_kbase_device_hw_jm.c b/mali_kbase/device/backend/mali_kbase_device_hw_jm.c index ff57cf6..8f7b39b 100644 --- a/mali_kbase/device/backend/mali_kbase_device_hw_jm.c +++ b/mali_kbase/device/backend/mali_kbase_device_hw_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -63,9 +63,6 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) if (val & RESET_COMPLETED) kbase_pm_reset_done(kbdev); - if (val & PRFCNT_SAMPLE_COMPLETED) - kbase_instr_hwcnt_sample_done(kbdev); - /* Defer clearing CLEAN_CACHES_COMPLETED to kbase_clean_caches_done. * We need to acquire hwaccess_lock to avoid a race condition with * kbase_gpu_cache_flush_and_busy_wait @@ -73,6 +70,13 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, val & ~CLEAN_CACHES_COMPLETED); kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), val & ~CLEAN_CACHES_COMPLETED); + /* kbase_instr_hwcnt_sample_done frees the HWCNT pipeline to request another + * sample. Therefore this must be called after clearing the IRQ to avoid a + * race between clearing and the next sample raising the IRQ again. 
+ */ + if (val & PRFCNT_SAMPLE_COMPLETED) + kbase_instr_hwcnt_sample_done(kbdev); + /* kbase_pm_check_transitions (called by kbase_pm_power_changed) must * be called after the IRQ has been cleared. This is because it might * trigger further power transitions and we don't want to miss the @@ -102,11 +106,10 @@ void kbase_gpu_interrupt(struct kbase_device *kbdev, u32 val) KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_DONE, NULL, val); } -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +#if IS_ENABLED(CONFIG_MALI_REAL_HW) void kbase_reg_write(struct kbase_device *kbdev, u32 offset, u32 value) { - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + WARN_ON(!kbdev->pm.backend.gpu_powered); writel(value, kbdev->reg + offset); @@ -121,10 +124,10 @@ KBASE_EXPORT_TEST_API(kbase_reg_write); u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) { - u32 val; + u32 val = 0; - KBASE_DEBUG_ASSERT(kbdev->pm.backend.gpu_powered); - KBASE_DEBUG_ASSERT(kbdev->dev != NULL); + if (WARN_ON(!kbdev->pm.backend.gpu_powered)) + return val; val = readl(kbdev->reg + offset); @@ -138,4 +141,4 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset) return val; } KBASE_EXPORT_TEST_API(kbase_reg_read); -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ diff --git a/mali_kbase/device/backend/mali_kbase_device_jm.c b/mali_kbase/device/backend/mali_kbase_device_jm.c index ab75bc6..0ce2bc8 100644 --- a/mali_kbase/device/backend/mali_kbase_device_jm.c +++ b/mali_kbase/device/backend/mali_kbase_device_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -29,13 +29,10 @@ #include <mali_kbase_hwaccess_backend.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_reset_gpu.h> -#include <mali_kbase_hwcnt_watchdog_if_timer.h> -#include <mali_kbase_hwcnt_backend_jm.h> -#include <mali_kbase_hwcnt_backend_jm_watchdog.h> - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#include <hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> #include <backend/gpu/mali_kbase_model_linux.h> -#endif /* CONFIG_MALI_NO_MALI */ #ifdef CONFIG_MALI_ARBITER_SUPPORT #include <arbiter/mali_kbase_arbiter_pm.h> @@ -48,6 +45,9 @@ #include <backend/gpu/mali_kbase_pm_internal.h> #include <mali_kbase_dummy_job_wa.h> #include <backend/gpu/mali_kbase_clk_rate_trace_mgr.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> +#endif /** * kbase_backend_late_init - Perform any backend-specific initialization. 
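
Both register-access hunks above (mali_kbase_device_hw_csf.c and mali_kbase_device_hw_jm.c) replace the hard KBASE_DEBUG_ASSERT checks with WARN_ON() guards that fail soft: a write is skipped and a read returns 0 when the GPU is not powered, instead of halting the kernel. The stand-alone C sketch below illustrates only that guard pattern; the struct, the simplified WARN_ON macro and the fake_* helper names are illustrative stand-ins, not the driver's real symbols.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fake_dev {
            bool gpu_powered;
            uint32_t regs[64];
    };

    /* Simplified stand-in for the kernel's WARN_ON(): report the condition
     * once to stderr and evaluate to it, so it can gate an early return.
     */
    #define WARN_ON(cond) \
            ({ bool __c = (cond); if (__c) fprintf(stderr, "WARN: %s\n", #cond); __c; })

    static void fake_reg_write(struct fake_dev *dev, unsigned int offset, uint32_t value)
    {
            /* Fail soft: skip the access instead of asserting. */
            if (WARN_ON(!dev->gpu_powered))
                    return;
            dev->regs[offset] = value;
    }

    static uint32_t fake_reg_read(struct fake_dev *dev, unsigned int offset)
    {
            /* Fail soft: a read while powered off returns 0. */
            if (WARN_ON(!dev->gpu_powered))
                    return 0;
            return dev->regs[offset];
    }

    int main(void)
    {
            struct fake_dev dev = { .gpu_powered = false };

            fake_reg_write(&dev, 3, 0xabcd);                            /* skipped, warns   */
            printf("read = %u\n", (unsigned)fake_reg_read(&dev, 3));    /* prints 0, warns  */

            dev.gpu_powered = true;
            fake_reg_write(&dev, 3, 0xabcd);
            printf("read = 0x%x\n", (unsigned)fake_reg_read(&dev, 3)); /* prints 0xabcd    */
            return 0;
    }

The same shape is used by the patched kbase_reg_read()/kbase_reg_write(): warn, bail out early, and let the caller proceed with a benign default value.
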
@@ -76,13 +76,13 @@ static int kbase_backend_late_init(struct kbase_device *kbdev) goto fail_timer; #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) if (kbasep_common_test_interrupt_handlers(kbdev) != 0) { dev_err(kbdev->dev, "Interrupt assignment check failed.\n"); err = -EINVAL; goto fail_interrupt_test; } -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ err = kbase_job_slot_init(kbdev); @@ -121,9 +121,9 @@ fail_devfreq_init: fail_job_slot: #ifdef CONFIG_MALI_DEBUG -#ifndef CONFIG_MALI_NO_MALI +#if IS_ENABLED(CONFIG_MALI_REAL_HW) fail_interrupt_test: -#endif /* !CONFIG_MALI_NO_MALI */ +#endif /* IS_ENABLED(CONFIG_MALI_REAL_HW) */ #endif /* CONFIG_MALI_DEBUG */ kbase_backend_timer_term(kbdev); @@ -215,17 +215,22 @@ static void kbase_device_hwcnt_backend_jm_watchdog_term(struct kbase_device *kbd } static const struct kbase_device_init dev_init[] = { -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" }, -#else +#else /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ { assign_irqs, NULL, "IRQ search failed" }, +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) { registers_map, registers_unmap, "Register map failed" }, -#endif +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + { kbase_gpu_metrics_init, kbase_gpu_metrics_term, "GPU metrics initialization failed" }, +#endif /* IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) */ { kbase_device_io_history_init, kbase_device_io_history_term, "Register access history initialization failed" }, { kbase_device_pm_init, kbase_device_pm_term, "Power management initialization failed" }, { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" }, - { kbase_device_populate_max_freq, NULL, "Populating max frequency failed" }, + { kbase_backend_time_init, NULL, "Time backend initialization failed" }, { kbase_device_misc_init, kbase_device_misc_term, "Miscellaneous device initialization failed" }, { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term, @@ -241,7 +246,6 @@ static const struct kbase_device_init dev_init[] = { "Timeline stream initialization failed" }, { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term, "Clock rate trace manager initialization failed" }, - { kbase_lowest_gpu_freq_init, NULL, "Lowest freq initialization failed" }, { kbase_instr_backend_init, kbase_instr_backend_term, "Instrumentation backend initialization failed" }, { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term, @@ -323,20 +327,21 @@ int kbase_device_init(struct kbase_device *kbdev) } } - kthread_init_worker(&kbdev->job_done_worker); - kbdev->job_done_worker_thread = kbase_create_realtime_thread(kbdev, - kthread_worker_fn, &kbdev->job_done_worker, "mali_jd_thread"); - if (IS_ERR(kbdev->job_done_worker_thread)) - return PTR_ERR(kbdev->job_done_worker_thread); + if (err) + return err; + + err = kbase_kthread_run_worker_rt(kbdev, &kbdev->job_done_worker, "mali_jd_thread"); + if (err) + return err; err = kbase_pm_apc_init(kbdev); if (err) return err; kthread_init_worker(&kbdev->event_worker); - kbdev->event_worker_thread = kthread_run(kthread_worker_fn, - &kbdev->event_worker, "mali_event_thread"); - if (IS_ERR(kbdev->event_worker_thread)) { + kbdev->event_worker.task = + kthread_run(kthread_worker_fn, 
&kbdev->event_worker, "mali_event_thread"); + if (IS_ERR(kbdev->event_worker.task)) { err = -ENOMEM; } @@ -358,4 +363,4 @@ int kbase_device_firmware_init_once(struct kbase_device *kbdev) mutex_unlock(&kbdev->fw_load_lock); return ret; -} +}
\ No newline at end of file diff --git a/mali_kbase/device/mali_kbase_device.c b/mali_kbase/device/mali_kbase_device.c index c123010..e5b3e2b 100644 --- a/mali_kbase/device/mali_kbase_device.c +++ b/mali_kbase/device/mali_kbase_device.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,6 +35,7 @@ #include <mali_kbase.h> #include <mali_kbase_defs.h> #include <mali_kbase_hwaccess_instr.h> +#include <mali_kbase_hwaccess_time.h> #include <mali_kbase_hw.h> #include <mali_kbase_config_defaults.h> #include <linux/priority_control_manager.h> @@ -42,8 +43,8 @@ #include <tl/mali_kbase_timeline.h> #include "mali_kbase_kinstr_prfcnt.h" #include "mali_kbase_vinstr.h" -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" #include "mali_kbase_device.h" #include "mali_kbase_device_internal.h" @@ -56,17 +57,15 @@ #include "arbiter/mali_kbase_arbiter_pm.h" #endif /* CONFIG_MALI_ARBITER_SUPPORT */ -/* NOTE: Magic - 0x45435254 (TRCE in ASCII). - * Supports tracing feature provided in the base module. - * Please keep it in sync with the value of base module. - */ -#define TRACE_BUFFER_HEADER_SPECIAL 0x45435254 +#if defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) /* Number of register accesses for the buffer that we allocate during * initialization time. The buffer size can be changed later via debugfs. */ #define KBASEP_DEFAULT_REGISTER_HISTORY_SIZE ((u16)512) +#endif /* defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) */ + static DEFINE_MUTEX(kbase_dev_list_lock); static LIST_HEAD(kbase_dev_list); static int kbase_dev_nr; @@ -187,8 +186,8 @@ static int mali_oom_notifier_handler(struct notifier_block *nb, kbdev_alloc_total = KBASE_PAGES_TO_KIB(atomic_read(&(kbdev->memdev.used_pages))); - dev_err(kbdev->dev, "OOM notifier: dev %s %lu kB\n", kbdev->devname, - kbdev_alloc_total); + dev_info(kbdev->dev, + "System reports low memory, GPU memory usage summary:\n"); mutex_lock(&kbdev->kctx_list_lock); @@ -202,15 +201,18 @@ static int mali_oom_notifier_handler(struct notifier_block *nb, pid_struct = find_get_pid(kctx->pid); task = pid_task(pid_struct, PIDTYPE_PID); - dev_err(kbdev->dev, - "OOM notifier: tsk %s tgid (%u) pid (%u) %lu kB\n", - task ? task->comm : "[null task]", kctx->tgid, - kctx->pid, task_alloc_total); + dev_info(kbdev->dev, + " tsk %s tgid %u pid %u has allocated %lu kB GPU memory\n", + task ? 
task->comm : "[null task]", kctx->tgid, kctx->pid, + task_alloc_total); put_pid(pid_struct); rcu_read_unlock(); } + dev_info(kbdev->dev, "End of summary, device usage is %lu kB\n", + kbdev_alloc_total); + mutex_unlock(&kbdev->kctx_list_lock); return NOTIFY_OK; } @@ -228,11 +230,14 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) kbdev->cci_snoop_enabled = false; np = kbdev->dev->of_node; if (np != NULL) { - if (of_property_read_u32(np, "snoop_enable_smc", - &kbdev->snoop_enable_smc)) + /* Read "-" versions of the properties and fallback to "_" + * if these are not found + */ + if (of_property_read_u32(np, "snoop-enable-smc", &kbdev->snoop_enable_smc) && + of_property_read_u32(np, "snoop_enable_smc", &kbdev->snoop_enable_smc)) kbdev->snoop_enable_smc = 0; - if (of_property_read_u32(np, "snoop_disable_smc", - &kbdev->snoop_disable_smc)) + if (of_property_read_u32(np, "snoop-disable-smc", &kbdev->snoop_disable_smc) && + of_property_read_u32(np, "snoop_disable_smc", &kbdev->snoop_disable_smc)) kbdev->snoop_disable_smc = 0; /* Either both or none of the calls should be provided. */ if (!((kbdev->snoop_disable_smc == 0 @@ -279,9 +284,7 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) goto dma_set_mask_failed; - /* There is no limit for Mali, so set to max. We only do this if dma_parms - * is already allocated by the platform. - */ + /* There is no limit for Mali, so set to max. */ if (kbdev->dev->dma_parms) err = dma_set_max_seg_size(kbdev->dev, UINT_MAX); if (err) @@ -293,12 +296,9 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) if (err) goto dma_set_mask_failed; - err = kbase_ktrace_init(kbdev); - if (err) - goto term_as; err = kbase_pbha_read_dtb(kbdev); if (err) - goto term_ktrace; + goto term_as; init_waitqueue_head(&kbdev->cache_clean_wait); @@ -308,10 +308,15 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) kbdev->pm.dvfs_period = DEFAULT_PM_DVFS_PERIOD; - kbdev->reset_timeout_ms = DEFAULT_RESET_TIMEOUT_MS; +#if MALI_USE_CSF + kbdev->reset_timeout_ms = kbase_get_timeout_ms(kbdev, CSF_GPU_RESET_TIMEOUT); +#else /* MALI_USE_CSF */ + kbdev->reset_timeout_ms = JM_DEFAULT_RESET_TIMEOUT_MS; +#endif /* !MALI_USE_CSF */ kbdev->mmu_mode = kbase_mmu_mode_get_aarch64(); - + kbdev->mmu_or_gpu_cache_op_wait_time_ms = + kbase_get_timeout_ms(kbdev, MMU_AS_INACTIVE_WAIT_TIMEOUT); mutex_init(&kbdev->kctx_list_lock); INIT_LIST_HEAD(&kbdev->kctx_list); @@ -324,10 +329,16 @@ int kbase_device_misc_init(struct kbase_device * const kbdev) "Unable to register OOM notifier for Mali - but will continue\n"); kbdev->oom_notifier_block.notifier_call = NULL; } + +#if MALI_USE_CSF +#if IS_ENABLED(CONFIG_SYNC_FILE) + atomic_set(&kbdev->live_fence_metadata, 0); +#endif /* IS_ENABLED(CONFIG_SYNC_FILE) */ + atomic_set(&kbdev->fence_signal_timeout_enabled, 1); +#endif + return 0; -term_ktrace: - kbase_ktrace_term(kbdev); term_as: kbase_device_all_as_term(kbdev); dma_set_mask_failed: @@ -344,14 +355,16 @@ void kbase_device_misc_term(struct kbase_device *kbdev) #if KBASE_KTRACE_ENABLE kbase_debug_assert_register_hook(NULL, NULL); #endif - - kbase_ktrace_term(kbdev); - kbase_device_all_as_term(kbdev); if (kbdev->oom_notifier_block.notifier_call) unregister_oom_notifier(&kbdev->oom_notifier_block); + +#if MALI_USE_CSF && IS_ENABLED(CONFIG_SYNC_FILE) + if (atomic_read(&kbdev->live_fence_metadata) > 0) + dev_warn(kbdev->dev, "Terminating Kbase device with live fence metadata!"); +#endif } void kbase_device_free(struct kbase_device *kbdev) @@ -361,8 +374,7 @@ void 
kbase_device_free(struct kbase_device *kbdev) void kbase_device_id_init(struct kbase_device *kbdev) { - scnprintf(kbdev->devname, DEVNAME_SIZE, "%s%d", kbase_drv_name, - kbase_dev_nr); + scnprintf(kbdev->devname, DEVNAME_SIZE, "%s%d", KBASE_DRV_NAME, kbase_dev_nr); kbdev->id = kbase_dev_nr; } @@ -484,10 +496,14 @@ int kbase_device_early_init(struct kbase_device *kbdev) { int err; + err = kbase_ktrace_init(kbdev); + if (err) + return err; + err = kbasep_platform_device_init(kbdev); if (err) - return err; + goto ktrace_term; err = kbase_pm_runtime_init(kbdev); if (err) @@ -501,7 +517,12 @@ int kbase_device_early_init(struct kbase_device *kbdev) /* Ensure we can access the GPU registers */ kbase_pm_register_access_enable(kbdev); - /* Find out GPU properties based on the GPU feature registers */ + /* + * Find out GPU properties based on the GPU feature registers. + * Note that this does not populate the few properties that depend on + * hw_features being initialized. Those are set by kbase_gpuprops_set_features + * soon after this in the init process. + */ kbase_gpuprops_set(kbdev); /* We're done accessing the GPU registers for now. */ @@ -524,6 +545,8 @@ fail_interrupts: kbase_pm_runtime_term(kbdev); fail_runtime_pm: kbasep_platform_device_term(kbdev); +ktrace_term: + kbase_ktrace_term(kbdev); return err; } @@ -540,6 +563,7 @@ void kbase_device_early_term(struct kbase_device *kbdev) #endif /* CONFIG_MALI_ARBITER_SUPPORT */ kbase_pm_runtime_term(kbdev); kbasep_platform_device_term(kbdev); + kbase_ktrace_term(kbdev); } int kbase_device_late_init(struct kbase_device *kbdev) diff --git a/mali_kbase/device/mali_kbase_device.h b/mali_kbase/device/mali_kbase_device.h index 5ff970a..e9cb5c2 100644 --- a/mali_kbase/device/mali_kbase_device.h +++ b/mali_kbase/device/mali_kbase_device.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -39,7 +39,7 @@ const struct list_head *kbase_device_get_list(void); void kbase_device_put_list(const struct list_head *dev_list); /** - * Kbase_increment_device_id - increment device id. + * kbase_increment_device_id - increment device id. * * Used to increment device id on successful initialization of the device. */ @@ -116,6 +116,26 @@ u32 kbase_reg_read(struct kbase_device *kbdev, u32 offset); bool kbase_is_gpu_removed(struct kbase_device *kbdev); /** + * kbase_gpu_cache_flush_pa_range_and_busy_wait() - Start a cache physical range flush + * and busy wait + * + * @kbdev: kbase device to issue the MMU operation on. + * @phys: Starting address of the physical range to start the operation on. + * @nr_bytes: Number of bytes to work on. + * @flush_op: Flush command register value to be sent to HW + * + * Issue a cache flush physical range command, then busy wait an irq status. + * This function will clear FLUSH_PA_RANGE_COMPLETED irq mask bit + * and busy-wait the rawstat register. + * + * Return: 0 if successful or a negative error code on failure. 
+ */ +#if MALI_USE_CSF +int kbase_gpu_cache_flush_pa_range_and_busy_wait(struct kbase_device *kbdev, phys_addr_t phys, + size_t nr_bytes, u32 flush_op); +#endif /* MALI_USE_CSF */ + +/** * kbase_gpu_cache_flush_and_busy_wait - Start a cache flush and busy wait * @kbdev: Kbase device * @flush_op: Flush command register value to be sent to HW @@ -171,6 +191,7 @@ void kbase_gpu_wait_cache_clean(struct kbase_device *kbdev); * called from paths (like GPU reset) where an indefinite wait for the * completion of cache clean operation can cause deadlock, as the operation may * never complete. + * If cache clean times out, reset GPU to recover. * * Return: 0 if successful or a negative error code on failure. */ @@ -188,7 +209,7 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, void kbase_gpu_cache_clean_wait_complete(struct kbase_device *kbdev); /** - * kbase_clean_caches_done - Issue preiously queued cache clean request or + * kbase_clean_caches_done - Issue previously queued cache clean request or * wake up the requester that issued cache clean. * @kbdev: Kbase device * diff --git a/mali_kbase/device/mali_kbase_device_hw.c b/mali_kbase/device/mali_kbase_device_hw.c index 249d5f8..8126b9b 100644 --- a/mali_kbase/device/mali_kbase_device_hw.c +++ b/mali_kbase/device/mali_kbase_device_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,49 +27,108 @@ #include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu.h> -#if !IS_ENABLED(CONFIG_MALI_NO_MALI) bool kbase_is_gpu_removed(struct kbase_device *kbdev) { - u32 val; + if (!IS_ENABLED(CONFIG_MALI_ARBITER_SUPPORT)) + return false; - val = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_ID)); - - return val == 0; + return (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_ID)) == 0); } -#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ -static int busy_wait_cache_clean_irq(struct kbase_device *kbdev) +/** + * busy_wait_cache_operation - Wait for a pending cache flush to complete + * + * @kbdev: Pointer of kbase device. + * @irq_bit: IRQ bit cache flush operation to wait on. + * + * It will reset GPU if the wait fails. + * + * Return: 0 on success, error code otherwise. + */ +static int busy_wait_cache_operation(struct kbase_device *kbdev, u32 irq_bit) { - /* Previously MMU-AS command was used for L2 cache flush on page-table update. - * And we're using the same max-loops count for GPU command, because amount of - * L2 cache flush overhead are same between them. 
- */ - unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + bool completed = false; + s64 diff; + + do { + unsigned int i; + + for (i = 0; i < 1000; i++) { + if (kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) & irq_bit) { + completed = true; + break; + } + } - /* Wait for the GPU cache clean operation to complete */ - while (--max_loops && - !(kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_IRQ_RAWSTAT)) & - CLEAN_CACHES_COMPLETED)) { - ; - } + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while ((diff < wait_time_ms) && !completed); + + if (!completed) { + char *irq_flag_name; + + switch (irq_bit) { + case CLEAN_CACHES_COMPLETED: + irq_flag_name = "CLEAN_CACHES_COMPLETED"; + break; + case FLUSH_PA_RANGE_COMPLETED: + irq_flag_name = "FLUSH_PA_RANGE_COMPLETED"; + break; + default: + irq_flag_name = "UNKNOWN"; + break; + } - /* reset gpu if time-out occurred */ - if (max_loops == 0) { dev_err(kbdev->dev, - "CLEAN_CACHES_COMPLETED bit stuck, might be caused by slow/unstable GPU clock or possible faulty FPGA connector\n"); + "Stuck waiting on %s bit, might be due to unstable GPU clk/pwr or possible faulty FPGA connector\n", + irq_flag_name); + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu_locked(kbdev); + return -EBUSY; } - /* Clear the interrupt CLEAN_CACHES_COMPLETED bit. */ - KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, CLEAN_CACHES_COMPLETED); - kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), - CLEAN_CACHES_COMPLETED); + KBASE_KTRACE_ADD(kbdev, CORE_GPU_IRQ_CLEAR, NULL, irq_bit); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), irq_bit); return 0; } +#if MALI_USE_CSF +#define U64_LO_MASK ((1ULL << 32) - 1) +#define U64_HI_MASK (~U64_LO_MASK) + +int kbase_gpu_cache_flush_pa_range_and_busy_wait(struct kbase_device *kbdev, phys_addr_t phys, + size_t nr_bytes, u32 flush_op) +{ + u64 start_pa, end_pa; + int ret = 0; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* 1. Clear the interrupt FLUSH_PA_RANGE_COMPLETED bit. */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), FLUSH_PA_RANGE_COMPLETED); + + /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_PA_RANGE operation. */ + start_pa = phys; + end_pa = start_pa + nr_bytes - 1; + + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO), start_pa & U64_LO_MASK); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_HI), + (start_pa & U64_HI_MASK) >> 32); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG1_LO), end_pa & U64_LO_MASK); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG1_HI), (end_pa & U64_HI_MASK) >> 32); + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), flush_op); + + /* 3. Busy-wait irq status to be enabled. */ + ret = busy_wait_cache_operation(kbdev, (u32)FLUSH_PA_RANGE_COMPLETED); + + return ret; +} +#endif /* MALI_USE_CSF */ + int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, u32 flush_op) { @@ -97,7 +156,7 @@ int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, irq_mask & ~CLEAN_CACHES_COMPLETED); /* busy wait irq status to be enabled */ - ret = busy_wait_cache_clean_irq(kbdev); + ret = busy_wait_cache_operation(kbdev, (u32)CLEAN_CACHES_COMPLETED); if (ret) return ret; @@ -118,7 +177,7 @@ int kbase_gpu_cache_flush_and_busy_wait(struct kbase_device *kbdev, kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), flush_op); /* 3. Busy-wait irq status to be enabled. 
*/ - ret = busy_wait_cache_clean_irq(kbdev); + ret = busy_wait_cache_operation(kbdev, (u32)CLEAN_CACHES_COMPLETED); if (ret) return ret; @@ -225,8 +284,9 @@ static inline bool get_cache_clean_flag(struct kbase_device *kbdev) void kbase_gpu_wait_cache_clean(struct kbase_device *kbdev) { while (get_cache_clean_flag(kbdev)) { - wait_event_interruptible(kbdev->cache_clean_wait, - !kbdev->cache_clean_in_progress); + if (wait_event_interruptible(kbdev->cache_clean_wait, + !kbdev->cache_clean_in_progress)) + dev_warn(kbdev->dev, "Wait for cache clean is interrupted"); } } @@ -234,6 +294,7 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, unsigned int wait_timeout_ms) { long remaining = msecs_to_jiffies(wait_timeout_ms); + int result = 0; while (remaining && get_cache_clean_flag(kbdev)) { remaining = wait_event_timeout(kbdev->cache_clean_wait, @@ -241,5 +302,15 @@ int kbase_gpu_wait_cache_clean_timeout(struct kbase_device *kbdev, remaining); } - return (remaining ? 0 : -ETIMEDOUT); + if (!remaining) { + dev_err(kbdev->dev, + "Cache clean timed out. Might be caused by unstable GPU clk/pwr or faulty system"); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + result = -ETIMEDOUT; + } + + return result; } diff --git a/mali_kbase/device/mali_kbase_device_internal.h b/mali_kbase/device/mali_kbase_device_internal.h index d4f6875..de54c83 100644 --- a/mali_kbase/device/mali_kbase_device_internal.h +++ b/mali_kbase/device/mali_kbase_device_internal.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -89,3 +89,13 @@ int kbase_device_late_init(struct kbase_device *kbdev); * @kbdev: Device pointer */ void kbase_device_late_term(struct kbase_device *kbdev); + +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/** + * kbase_is_register_accessible - Checks if register is accessible + * @offset: Register offset + * + * Return: true if the register is accessible, false otherwise. + */ +bool kbase_is_register_accessible(u32 offset); +#endif /* MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) */ diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c index 893a335..60ba9be 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -86,6 +86,9 @@ const char *kbase_gpu_exception_name(u32 const exception_code) case CS_FATAL_EXCEPTION_TYPE_FIRMWARE_INTERNAL_ERROR: e = "FIRMWARE_INTERNAL_ERROR"; break; + case CS_FATAL_EXCEPTION_TYPE_CS_UNRECOVERABLE: + e = "CS_UNRECOVERABLE"; + break; case CS_FAULT_EXCEPTION_TYPE_RESOURCE_EVICTION_TIMEOUT: e = "RESOURCE_EVICTION_TIMEOUT"; break; @@ -102,6 +105,70 @@ const char *kbase_gpu_exception_name(u32 const exception_code) case GPU_FAULTSTATUS_EXCEPTION_TYPE_GPU_CACHEABILITY_FAULT: e = "GPU_CACHEABILITY_FAULT"; break; + /* MMU Fault */ + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L0: + e = "TRANSLATION_FAULT at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L1: + e = "TRANSLATION_FAULT at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L2: + e = "TRANSLATION_FAULT at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L3: + e = "TRANSLATION_FAULT at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_TRANSLATION_FAULT_L4: + e = "TRANSLATION_FAULT"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_0: + e = "PERMISSION_FAULT at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_1: + e = "PERMISSION_FAULT at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_2: + e = "PERMISSION_FAULT at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_PERMISSION_FAULT_3: + e = "PERMISSION_FAULT at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_1: + e = "ACCESS_FLAG at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_2: + e = "ACCESS_FLAG at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_ACCESS_FLAG_3: + e = "ACCESS_FLAG at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_IN: + e = "ADDRESS_SIZE_FAULT_IN"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_0: + e = "ADDRESS_SIZE_FAULT_OUT_0 at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_1: + e = "ADDRESS_SIZE_FAULT_OUT_1 at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_2: + e = "ADDRESS_SIZE_FAULT_OUT_2 at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_ADDRESS_SIZE_FAULT_OUT_3: + e = "ADDRESS_SIZE_FAULT_OUT_3 at level 3"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_0: + e = "MEMORY_ATTRIBUTE_FAULT_0 at level 0"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_1: + e = "MEMORY_ATTRIBUTE_FAULT_1 at level 1"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_2: + e = "MEMORY_ATTRIBUTE_FAULT_2 at level 2"; + break; + case CS_FAULT_EXCEPTION_TYPE_MEMORY_ATTRIBUTE_FAULT_3: + e = "MEMORY_ATTRIBUTE_FAULT_3 at level 3"; + break; /* Any other exception code is unknown */ default: e = "UNKNOWN"; diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c index 37015cc..7f3743c 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_fault_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -170,7 +170,7 @@ const char *kbase_gpu_exception_name(u32 const exception_code) default: e = "UNKNOWN"; break; - }; + } return e; } diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h index f6945b3..ab989e0 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,6 +28,17 @@ #error "Cannot be compiled with JM" #endif +/* GPU control registers */ +#define MCU_CONTROL 0x700 + +#define L2_CONFIG_PBHA_HWU_SHIFT GPU_U(12) +#define L2_CONFIG_PBHA_HWU_MASK (GPU_U(0xF) << L2_CONFIG_PBHA_HWU_SHIFT) +#define L2_CONFIG_PBHA_HWU_GET(reg_val) \ + (((reg_val)&L2_CONFIG_PBHA_HWU_MASK) >> L2_CONFIG_PBHA_HWU_SHIFT) +#define L2_CONFIG_PBHA_HWU_SET(reg_val, value) \ + (((reg_val) & ~L2_CONFIG_PBHA_HWU_MASK) | \ + (((value) << L2_CONFIG_PBHA_HWU_SHIFT) & L2_CONFIG_PBHA_HWU_MASK)) + /* GPU_CONTROL_MCU base address */ #define GPU_CONTROL_MCU_BASE 0x3000 @@ -35,38 +46,41 @@ #define MCU_SUBSYSTEM_BASE 0x20000 /* IPA control registers */ -#define IPA_CONTROL_BASE 0x40000 -#define IPA_CONTROL_REG(r) (IPA_CONTROL_BASE+(r)) -#define COMMAND 0x000 /* (WO) Command register */ -#define STATUS 0x004 /* (RO) Status register */ -#define TIMER 0x008 /* (RW) Timer control register */ - -#define SELECT_CSHW_LO 0x010 /* (RW) Counter select for CS hardware, low word */ -#define SELECT_CSHW_HI 0x014 /* (RW) Counter select for CS hardware, high word */ -#define SELECT_MEMSYS_LO 0x018 /* (RW) Counter select for Memory system, low word */ -#define SELECT_MEMSYS_HI 0x01C /* (RW) Counter select for Memory system, high word */ -#define SELECT_TILER_LO 0x020 /* (RW) Counter select for Tiler cores, low word */ -#define SELECT_TILER_HI 0x024 /* (RW) Counter select for Tiler cores, high word */ -#define SELECT_SHADER_LO 0x028 /* (RW) Counter select for Shader cores, low word */ -#define SELECT_SHADER_HI 0x02C /* (RW) Counter select for Shader cores, high word */ +#define IPA_CONTROL_BASE 0x40000 +#define IPA_CONTROL_REG(r) (IPA_CONTROL_BASE + (r)) + +#define COMMAND 0x000 /* (WO) Command register */ +#define STATUS 0x004 /* (RO) Status register */ +#define TIMER 0x008 /* (RW) Timer control register */ + +#define SELECT_CSHW_LO 0x010 /* (RW) Counter select for CS hardware, low word */ +#define SELECT_CSHW_HI 0x014 /* (RW) Counter select for CS hardware, high word */ +#define SELECT_MEMSYS_LO 0x018 /* (RW) Counter select for Memory system, low word */ +#define SELECT_MEMSYS_HI 0x01C /* (RW) Counter select for Memory system, high word */ +#define SELECT_TILER_LO 0x020 /* (RW) Counter select for Tiler cores, low word */ +#define SELECT_TILER_HI 0x024 /* (RW) Counter select for Tiler cores, high word */ +#define SELECT_SHADER_LO 0x028 /* (RW) Counter select for Shader cores, low word */ +#define SELECT_SHADER_HI 0x02C /* (RW) Counter select for Shader cores, high word */ /* Accumulated counter values for CS hardware */ -#define VALUE_CSHW_BASE 0x100 -#define VALUE_CSHW_REG_LO(n) (VALUE_CSHW_BASE + ((n) 
<< 3)) /* (RO) Counter value #n, low word */ -#define VALUE_CSHW_REG_HI(n) (VALUE_CSHW_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_CSHW_BASE 0x100 +#define VALUE_CSHW_REG_LO(n) (VALUE_CSHW_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_CSHW_REG_HI(n) (VALUE_CSHW_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ /* Accumulated counter values for memory system */ -#define VALUE_MEMSYS_BASE 0x140 -#define VALUE_MEMSYS_REG_LO(n) (VALUE_MEMSYS_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_MEMSYS_REG_HI(n) (VALUE_MEMSYS_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_MEMSYS_BASE 0x140 +#define VALUE_MEMSYS_REG_LO(n) (VALUE_MEMSYS_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_MEMSYS_REG_HI(n) (VALUE_MEMSYS_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ -#define VALUE_TILER_BASE 0x180 -#define VALUE_TILER_REG_LO(n) (VALUE_TILER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_TILER_REG_HI(n) (VALUE_TILER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_TILER_BASE 0x180 +#define VALUE_TILER_REG_LO(n) (VALUE_TILER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_TILER_REG_HI(n) (VALUE_TILER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ -#define VALUE_SHADER_BASE 0x1C0 -#define VALUE_SHADER_REG_LO(n) (VALUE_SHADER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ -#define VALUE_SHADER_REG_HI(n) (VALUE_SHADER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ +#define VALUE_SHADER_BASE 0x1C0 +#define VALUE_SHADER_REG_LO(n) (VALUE_SHADER_BASE + ((n) << 3)) /* (RO) Counter value #n, low word */ +#define VALUE_SHADER_REG_HI(n) (VALUE_SHADER_BASE + ((n) << 3) + 4) /* (RO) Counter value #n, high word */ + +#define AS_STATUS_AS_ACTIVE_INT 0x2 /* Set to implementation defined, outer caching */ #define AS_MEMATTR_AARCH64_OUTER_IMPL_DEF 0x88ull @@ -113,7 +127,6 @@ /* GPU control registers */ #define CORE_FEATURES 0x008 /* () Shader Core Features */ -#define MCU_CONTROL 0x700 #define MCU_STATUS 0x704 #define MCU_CNTRL_ENABLE (1 << 0) @@ -123,44 +136,20 @@ #define MCU_CNTRL_DOORBELL_DISABLE_SHIFT (31) #define MCU_CNTRL_DOORBELL_DISABLE_MASK (1 << MCU_CNTRL_DOORBELL_DISABLE_SHIFT) -#define MCU_STATUS_HALTED (1 << 1) - -#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory - * region base address, low word - */ -#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory - * region base address, high word - */ -#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter - * configuration - */ - -#define PRFCNT_CSHW_EN 0x06C /* (RW) Performance counter - * enable for CS Hardware - */ - -#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable - * flags for shader cores - */ -#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable - * flags for tiler - */ -#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable - * flags for MMU/L2 cache - */ +#define MCU_STATUS_HALTED (1 << 1) /* JOB IRQ flags */ -#define JOB_IRQ_GLOBAL_IF (1 << 31) /* Global interface interrupt received */ +#define JOB_IRQ_GLOBAL_IF (1u << 31) /* Global interface interrupt received */ /* GPU_COMMAND codes */ #define GPU_COMMAND_CODE_NOP 0x00 /* No operation, nothing happens */ #define GPU_COMMAND_CODE_RESET 0x01 /* Reset the GPU */ -#define GPU_COMMAND_CODE_PRFCNT 0x02 /* Clear or sample performance counters */ #define GPU_COMMAND_CODE_TIME 
0x03 /* Configure time sources */ #define GPU_COMMAND_CODE_FLUSH_CACHES 0x04 /* Flush caches */ #define GPU_COMMAND_CODE_SET_PROTECTED_MODE 0x05 /* Places the GPU in protected mode */ #define GPU_COMMAND_CODE_FINISH_HALT 0x06 /* Halt CSF */ #define GPU_COMMAND_CODE_CLEAR_FAULT 0x07 /* Clear GPU_FAULTSTATUS and GPU_FAULTADDRESS, TODX */ +#define GPU_COMMAND_CODE_FLUSH_PA_RANGE 0x08 /* Flush the GPU caches for a physical range, TITX */ /* GPU_COMMAND_RESET payloads */ @@ -179,27 +168,34 @@ */ #define GPU_COMMAND_RESET_PAYLOAD_HARD_RESET 0x02 -/* GPU_COMMAND_PRFCNT payloads */ -#define GPU_COMMAND_PRFCNT_PAYLOAD_SAMPLE 0x01 /* Sample performance counters */ -#define GPU_COMMAND_PRFCNT_PAYLOAD_CLEAR 0x02 /* Clear performance counters */ - /* GPU_COMMAND_TIME payloads */ #define GPU_COMMAND_TIME_DISABLE 0x00 /* Disable cycle counter */ #define GPU_COMMAND_TIME_ENABLE 0x01 /* Enable cycle counter */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for L2 caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN 0x001 /* CLN only */ -#define GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE 0x003 /* CLN + INV */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN 0x001 /* CLN only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE 0x003 /* CLN + INV */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for Load-store caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN 0x010 /* CLN only */ -#define GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE 0x030 /* CLN + INV */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN 0x010 /* CLN only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE 0x030 /* CLN + INV */ /* GPU_COMMAND_FLUSH_CACHES payloads bits for Other caches */ -#define GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE 0x000 /* No flush */ -#define GPU_COMMAND_FLUSH_PAYLOAD_OTHER_INVALIDATE 0x200 /* INV only */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE 0x000 /* No flush */ +#define GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_INVALIDATE 0x200 /* INV only */ + +/* GPU_COMMAND_FLUSH_PA_RANGE payload bits for flush modes */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_NONE 0x00 /* No flush */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN 0x01 /* CLN only */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_INVALIDATE 0x02 /* INV only */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE 0x03 /* CLN + INV */ + +/* GPU_COMMAND_FLUSH_PA_RANGE payload bits for which caches should be the target of the command */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_OTHER_CACHE 0x10 /* Other caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE 0x20 /* Load-store caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE 0x40 /* L2 caches */ /* GPU_COMMAND command + payload */ #define GPU_COMMAND_CODE_PAYLOAD(opcode, payload) \ @@ -218,14 +214,6 @@ #define GPU_COMMAND_HARD_RESET \ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_RESET, GPU_COMMAND_RESET_PAYLOAD_HARD_RESET) -/* Clear all performance counters, setting them all to zero. 
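The new FLUSH_PA_RANGE command is composed like the existing FLUSH_CACHES command: a mode value ORed with the target-cache bits, wrapped by GPU_COMMAND_CODE_PAYLOAD(); the physical range itself is presumably supplied through the GPU_COMMAND_ARG0/ARG1 registers added elsewhere in this patch. A small sketch of the composition (the helper function is illustrative):

/* Illustration only: compose a FLUSH_PA_RANGE command word. The MODE_* bits
 * select the clean/invalidate behaviour and the *_CACHE bits select the
 * target caches; this combination matches the
 * GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC define further down.
 */
static inline u32 example_flush_pa_range_cmd(void)
{
	return GPU_COMMAND_CODE_PAYLOAD(
		GPU_COMMAND_CODE_FLUSH_PA_RANGE,
		GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE |
			GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE |
			GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE);
}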
*/ -#define GPU_COMMAND_PRFCNT_CLEAR \ - GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_PRFCNT, GPU_COMMAND_PRFCNT_PAYLOAD_CLEAR) - -/* Sample all performance counters, writing them out to memory */ -#define GPU_COMMAND_PRFCNT_SAMPLE \ - GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_PRFCNT, GPU_COMMAND_PRFCNT_PAYLOAD_SAMPLE) - /* Starts the cycle counter, and system timestamp propagation */ #define GPU_COMMAND_CYCLE_COUNT_START \ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_TIME, GPU_COMMAND_TIME_ENABLE) @@ -235,28 +223,53 @@ GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_TIME, GPU_COMMAND_TIME_DISABLE) /* Clean and invalidate L2 cache (Equivalent to FLUSH_PT) */ -#define GPU_COMMAND_CACHE_CLN_INV_L2 \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_NONE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE)) +#define GPU_COMMAND_CACHE_CLN_INV_L2 \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_NONE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) /* Clean and invalidate L2 and LSC caches (Equivalent to FLUSH_MEM) */ -#define GPU_COMMAND_CACHE_CLN_INV_L2_LSC \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_NONE)) +#define GPU_COMMAND_CACHE_CLN_INV_L2_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) /* Clean and invalidate L2, LSC, and Other caches */ -#define GPU_COMMAND_CACHE_CLN_INV_FULL \ - GPU_COMMAND_CODE_PAYLOAD( \ - GPU_COMMAND_CODE_FLUSH_CACHES, \ - (GPU_COMMAND_FLUSH_PAYLOAD_L2_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_LSC_CLEAN_INVALIDATE | \ - GPU_COMMAND_FLUSH_PAYLOAD_OTHER_INVALIDATE)) +#define GPU_COMMAND_CACHE_CLN_INV_FULL \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_INVALIDATE)) + +/* Clean and invalidate only LSC cache */ +#define GPU_COMMAND_CACHE_CLN_INV_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_CACHES, \ + (GPU_COMMAND_FLUSH_CACHES_PAYLOAD_L2_NONE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_LSC_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_CACHES_PAYLOAD_OTHER_NONE)) + +/* Clean and invalidate physical range L2 cache (equivalent to FLUSH_PT) */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2 \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) + +/* Clean and invalidate physical range L2 and LSC cache (equivalent to FLUSH_MEM) */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) + +/* Clean and invalidate physical range L2, LSC and Other caches */ +#define GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_FULL \ + GPU_COMMAND_CODE_PAYLOAD(GPU_COMMAND_CODE_FLUSH_PA_RANGE, \ + (GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_MODE_CLEAN_INVALIDATE | 
\ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_OTHER_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_LSC_CACHE | \ + GPU_COMMAND_FLUSH_PA_RANGE_PAYLOAD_L2_CACHE)) /* Merge cache flush commands */ #define GPU_COMMAND_FLUSH_CACHE_MERGE(cmd1, cmd2) ((cmd1) | (cmd2)) @@ -285,13 +298,13 @@ #define GPU_FAULTSTATUS_ACCESS_TYPE_MASK \ (0x3ul << GPU_FAULTSTATUS_ACCESS_TYPE_SHIFT) -#define GPU_FAULTSTATUS_ADDR_VALID_SHIFT 10 -#define GPU_FAULTSTATUS_ADDR_VALID_FLAG \ - (1ul << GPU_FAULTSTATUS_ADDR_VALID_SHIFT) +#define GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT GPU_U(10) +#define GPU_FAULTSTATUS_ADDRESS_VALID_MASK \ + (GPU_U(0x1) << GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT) -#define GPU_FAULTSTATUS_JASID_VALID_SHIFT 11 -#define GPU_FAULTSTATUS_JASID_VALID_FLAG \ - (1ul << GPU_FAULTSTATUS_JASID_VALID_SHIFT) +#define GPU_FAULTSTATUS_JASID_VALID_SHIFT GPU_U(11) +#define GPU_FAULTSTATUS_JASID_VALID_MASK \ + (GPU_U(0x1) << GPU_FAULTSTATUS_JASID_VALID_SHIFT) #define GPU_FAULTSTATUS_JASID_SHIFT 12 #define GPU_FAULTSTATUS_JASID_MASK (0xF << GPU_FAULTSTATUS_JASID_SHIFT) @@ -337,14 +350,16 @@ (((value) << GPU_FAULTSTATUS_ADDRESS_VALID_SHIFT) & GPU_FAULTSTATUS_ADDRESS_VALID_MASK)) /* IRQ flags */ -#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ -#define GPU_PROTECTED_FAULT (1 << 1) /* A GPU fault has occurred in protected mode */ -#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ -#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ -#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ -#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ -#define DOORBELL_MIRROR (1 << 18) /* Mirrors the doorbell interrupt line to the CPU */ -#define MCU_STATUS_GPU_IRQ (1 << 19) /* MCU requires attention */ +#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ +#define GPU_PROTECTED_FAULT (1 << 1) /* A GPU fault has occurred in protected mode */ +#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ +#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ +#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ +#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ +#define DOORBELL_MIRROR (1 << 18) /* Mirrors the doorbell interrupt line to the CPU */ +#define MCU_STATUS_GPU_IRQ (1 << 19) /* MCU requires attention */ +#define FLUSH_PA_RANGE_COMPLETED \ + (1 << 20) /* Set when a physical range cache clean operation has completed. 
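FLUSH_PA_RANGE_COMPLETED sits alongside CLEAN_CACHES_COMPLETED in the GPU IRQ status word, so completion of a physical-range flush can be distinguished from a full cache clean. A hedged sketch of testing these bits; the handler function is illustrative, not the driver's real IRQ handler:

/* Illustration only: decode the cache-maintenance completion bits from a
 * GPU_IRQ_STATUS value that has already been read from the hardware.
 */
static void example_handle_flush_irqs(struct kbase_device *kbdev, u32 irq_status)
{
	if (irq_status & CLEAN_CACHES_COMPLETED)
		dev_dbg(kbdev->dev, "cache clean/invalidate completed");

	if (irq_status & FLUSH_PA_RANGE_COMPLETED)
		dev_dbg(kbdev->dev, "physical-range flush completed");
}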
*/ /* * In Debug build, @@ -362,7 +377,11 @@ #define GPU_IRQ_REG_COMMON (GPU_FAULT | GPU_PROTECTED_FAULT | RESET_COMPLETED \ | POWER_CHANGED_ALL | MCU_STATUS_GPU_IRQ) -/* GPU_CONTROL_MCU.GPU_IRQ_RAWSTAT */ -#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when performance count sample has completed */ +/* GPU_FEATURES register */ +#define GPU_FEATURES_RAY_TRACING_SHIFT GPU_U(2) +#define GPU_FEATURES_RAY_TRACING_MASK (GPU_U(0x1) << GPU_FEATURES_RAY_TRACING_SHIFT) +#define GPU_FEATURES_RAY_TRACING_GET(reg_val) \ + (((reg_val)&GPU_FEATURES_RAY_TRACING_MASK) >> GPU_FEATURES_RAY_TRACING_SHIFT) +/* End of GPU_FEATURES register */ #endif /* _KBASE_GPU_REGMAP_CSF_H_ */ diff --git a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h index d1cd8fc..387cd50 100644 --- a/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h +++ b/mali_kbase/gpu/backend/mali_kbase_gpu_regmap_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,28 +59,27 @@ #define CORE_FEATURES 0x008 /* (RO) Shader Core Features */ #define JS_PRESENT 0x01C /* (RO) Job slots present */ - -#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory - * region base address, low word - */ -#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory - * region base address, high word - */ -#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter - * configuration - */ -#define PRFCNT_JM_EN 0x06C /* (RW) Performance counter enable - * flags for Job Manager - */ -#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable - * flags for shader cores - */ -#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable - * flags for tiler - */ -#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable - * flags for MMU/L2 cache - */ +#define LATEST_FLUSH 0x038 /* (RO) Flush ID of latest + * clean-and-invalidate operation + */ +#define PRFCNT_BASE_LO 0x060 /* (RW) Performance counter memory + * region base address, low word + */ +#define PRFCNT_BASE_HI 0x064 /* (RW) Performance counter memory + * region base address, high word + */ +#define PRFCNT_CONFIG 0x068 /* (RW) Performance counter configuration */ +#define PRFCNT_JM_EN 0x06C /* (RW) Performance counter enable + * flags for Job Manager + */ +#define PRFCNT_SHADER_EN 0x070 /* (RW) Performance counter enable + * flags for shader cores */ +#define PRFCNT_TILER_EN 0x074 /* (RW) Performance counter enable + * flags for tiler + */ +#define PRFCNT_MMU_L2_EN 0x07C /* (RW) Performance counter enable + * flags for MMU/L2 cache + */ #define JS0_FEATURES 0x0C0 /* (RO) Features of job slot 0 */ #define JS1_FEATURES 0x0C4 /* (RO) Features of job slot 1 */ @@ -109,6 +108,7 @@ #define JOB_IRQ_THROTTLE 0x014 /* cycles to delay delivering an interrupt externally. The JOB_IRQ_STATUS is NOT affected by this, just the delivery of the interrupt. 
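GPU_FEATURES_RAY_TRACING follows the same SHIFT/MASK/GET pattern used throughout these headers. A minimal sketch of the pattern, assuming the GPU_FEATURES register value was read elsewhere:

/* Illustration only: test the ray tracing feature bit in a GPU_FEATURES
 * register value obtained separately.
 */
static inline bool example_has_ray_tracing(u64 gpu_features)
{
	return GPU_FEATURES_RAY_TRACING_GET(gpu_features) != 0;
}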
*/ #define JOB_SLOT0 0x800 /* Configuration registers for job slot 0 */ +#define JOB_SLOT_REG(n, r) (JOB_CONTROL_REG(JOB_SLOT0 + ((n) << 7)) + (r)) #define JOB_SLOT1 0x880 /* Configuration registers for job slot 1 */ #define JOB_SLOT2 0x900 /* Configuration registers for job slot 2 */ #define JOB_SLOT3 0x980 /* Configuration registers for job slot 3 */ @@ -125,48 +125,41 @@ #define JOB_SLOT14 0xF00 /* Configuration registers for job slot 14 */ #define JOB_SLOT15 0xF80 /* Configuration registers for job slot 15 */ -#define JOB_SLOT_REG(n, r) (JOB_CONTROL_REG(JOB_SLOT0 + ((n) << 7)) + (r)) - -#define JS_HEAD_LO 0x00 /* (RO) Job queue head pointer for job slot n, low word */ -#define JS_HEAD_HI 0x04 /* (RO) Job queue head pointer for job slot n, high word */ -#define JS_TAIL_LO 0x08 /* (RO) Job queue tail pointer for job slot n, low word */ -#define JS_TAIL_HI 0x0C /* (RO) Job queue tail pointer for job slot n, high word */ -#define JS_AFFINITY_LO 0x10 /* (RO) Core affinity mask for job slot n, low word */ -#define JS_AFFINITY_HI 0x14 /* (RO) Core affinity mask for job slot n, high word */ -#define JS_CONFIG 0x18 /* (RO) Configuration settings for job slot n */ -/* (RO) Extended affinity mask for job slot n*/ -#define JS_XAFFINITY 0x1C - -#define JS_COMMAND 0x20 /* (WO) Command register for job slot n */ -#define JS_STATUS 0x24 /* (RO) Status register for job slot n */ - -#define JS_HEAD_NEXT_LO 0x40 /* (RW) Next job queue head pointer for job slot n, low word */ -#define JS_HEAD_NEXT_HI 0x44 /* (RW) Next job queue head pointer for job slot n, high word */ - -#define JS_AFFINITY_NEXT_LO 0x50 /* (RW) Next core affinity mask for job slot n, low word */ -#define JS_AFFINITY_NEXT_HI 0x54 /* (RW) Next core affinity mask for job slot n, high word */ -#define JS_CONFIG_NEXT 0x58 /* (RW) Next configuration settings for job slot n */ -/* (RW) Next extended affinity mask for job slot n */ -#define JS_XAFFINITY_NEXT 0x5C - -#define JS_COMMAND_NEXT 0x60 /* (RW) Next command register for job slot n */ - -#define JS_FLUSH_ID_NEXT 0x70 /* (RW) Next job slot n cache flush ID */ +/* JM Job control register definitions for mali_kbase_debug_job_fault */ +#define JS_HEAD_LO 0x00 /* (RO) Job queue head pointer for job slot n, low word */ +#define JS_HEAD_HI 0x04 /* (RO) Job queue head pointer for job slot n, high word */ +#define JS_TAIL_LO 0x08 /* (RO) Job queue tail pointer for job slot n, low word */ +#define JS_TAIL_HI 0x0C /* (RO) Job queue tail pointer for job slot n, high word */ +#define JS_AFFINITY_LO 0x10 /* (RO) Core affinity mask for job slot n, low word */ +#define JS_AFFINITY_HI 0x14 /* (RO) Core affinity mask for job slot n, high word */ +#define JS_CONFIG 0x18 /* (RO) Configuration settings for job slot n */ +#define JS_XAFFINITY 0x1C /* (RO) Extended affinity mask for job slot n*/ +#define JS_COMMAND 0x20 /* (WO) Command register for job slot n */ +#define JS_STATUS 0x24 /* (RO) Status register for job slot n */ +#define JS_HEAD_NEXT_LO 0x40 /* (RW) Next job queue head pointer for job slot n, low word */ +#define JS_HEAD_NEXT_HI 0x44 /* (RW) Next job queue head pointer for job slot n, high word */ +#define JS_AFFINITY_NEXT_LO 0x50 /* (RW) Next core affinity mask for job slot n, low word */ +#define JS_AFFINITY_NEXT_HI 0x54 /* (RW) Next core affinity mask for job slot n, high word */ +#define JS_CONFIG_NEXT 0x58 /* (RW) Next configuration settings for job slot n */ +#define JS_XAFFINITY_NEXT 0x5C /* (RW) Next extended affinity mask for job slot n */ +#define JS_COMMAND_NEXT 0x60 /* (RW) Next 
command register for job slot n */ + +#define JS_FLUSH_ID_NEXT 0x70 /* (RW) Next job slot n cache flush ID */ /* No JM-specific MMU control registers */ /* No JM-specific MMU address space control registers */ /* JS_COMMAND register commands */ -#define JS_COMMAND_NOP 0x00 /* NOP Operation. Writing this value is ignored */ -#define JS_COMMAND_START 0x01 /* Start processing a job chain. Writing this value is ignored */ -#define JS_COMMAND_SOFT_STOP 0x02 /* Gently stop processing a job chain */ -#define JS_COMMAND_HARD_STOP 0x03 /* Rudely stop processing a job chain */ -#define JS_COMMAND_SOFT_STOP_0 0x04 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 0 */ -#define JS_COMMAND_HARD_STOP_0 0x05 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 0 */ -#define JS_COMMAND_SOFT_STOP_1 0x06 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_COMMAND_HARD_STOP_1 0x07 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ +#define JS_COMMAND_NOP 0x00 /* NOP Operation. Writing this value is ignored */ +#define JS_COMMAND_START 0x01 /* Start processing a job chain. Writing this value is ignored */ +#define JS_COMMAND_SOFT_STOP 0x02 /* Gently stop processing a job chain */ +#define JS_COMMAND_HARD_STOP 0x03 /* Rudely stop processing a job chain */ +#define JS_COMMAND_SOFT_STOP_0 0x04 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 0 */ +#define JS_COMMAND_HARD_STOP_0 0x05 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 0 */ +#define JS_COMMAND_SOFT_STOP_1 0x06 /* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */ +#define JS_COMMAND_HARD_STOP_1 0x07 /* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */ -#define JS_COMMAND_MASK 0x07 /* Mask of bits currently in use by the HW */ +#define JS_COMMAND_MASK 0x07 /* Mask of bits currently in use by the HW */ /* Possible values of JS_CONFIG and JS_CONFIG_NEXT registers */ #define JS_CONFIG_START_FLUSH_NO_ACTION (0u << 0) @@ -262,19 +255,22 @@ #define GPU_COMMAND_CACHE_CLN_INV_L2 GPU_COMMAND_CLEAN_INV_CACHES #define GPU_COMMAND_CACHE_CLN_INV_L2_LSC GPU_COMMAND_CLEAN_INV_CACHES #define GPU_COMMAND_CACHE_CLN_INV_FULL GPU_COMMAND_CLEAN_INV_CACHES +#define GPU_COMMAND_CACHE_CLN_INV_LSC GPU_COMMAND_CLEAN_INV_CACHES /* Merge cache flush commands */ #define GPU_COMMAND_FLUSH_CACHE_MERGE(cmd1, cmd2) \ ((cmd1) > (cmd2) ? (cmd1) : (cmd2)) /* IRQ flags */ -#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ -#define MULTIPLE_GPU_FAULTS (1 << 7) /* More than one GPU Fault occurred. */ -#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ -#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ -#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ -#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when a performance count sample has completed. */ -#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. */ +#define GPU_FAULT (1 << 0) /* A GPU Fault has occurred */ +#define MULTIPLE_GPU_FAULTS (1 << 7) /* More than one GPU Fault occurred. */ +#define RESET_COMPLETED (1 << 8) /* Set when a reset has completed. */ +#define POWER_CHANGED_SINGLE (1 << 9) /* Set when a single core has finished powering up or down. */ +#define POWER_CHANGED_ALL (1 << 10) /* Set when all cores have finished powering up or down. */ +#define PRFCNT_SAMPLE_COMPLETED (1 << 16) /* Set when a performance count sample has completed. */ +#define CLEAN_CACHES_COMPLETED (1 << 17) /* Set when a cache clean operation has completed. 
*/ +#define FLUSH_PA_RANGE_COMPLETED \ + (1 << 20) /* Set when a physical range cache clean operation has completed. */ /* * In Debug build, diff --git a/mali_kbase/gpu/mali_kbase_gpu.c b/mali_kbase/gpu/mali_kbase_gpu.c index 8a84ef5..eee670f 100644 --- a/mali_kbase/gpu/mali_kbase_gpu.c +++ b/mali_kbase/gpu/mali_kbase_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,7 +32,7 @@ const char *kbase_gpu_access_type_name(u32 fault_status) return "READ"; case AS_FAULTSTATUS_ACCESS_TYPE_WRITE: return "WRITE"; - case AS_FAULTSTATUS_ACCESS_TYPE_EX: + case AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE: return "EXECUTE"; default: WARN_ON(1); diff --git a/mali_kbase/gpu/mali_kbase_gpu_fault.h b/mali_kbase/gpu/mali_kbase_gpu_fault.h index 8b50a5d..6a937a5 100644 --- a/mali_kbase/gpu/mali_kbase_gpu_fault.h +++ b/mali_kbase/gpu/mali_kbase_gpu_fault.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,9 +27,9 @@ * * @exception_code: exception code * - * This function is called from the interrupt handler when a GPU fault occurs. + * This function is called by error handlers when GPU reports an error. * - * Return: name associated with the exception code + * Return: Error string associated with the exception code */ const char *kbase_gpu_exception_name(u32 exception_code); diff --git a/mali_kbase/gpu/mali_kbase_gpu_regmap.h b/mali_kbase/gpu/mali_kbase_gpu_regmap.h index 1d2a49b..a92b498 100644 --- a/mali_kbase/gpu/mali_kbase_gpu_regmap.h +++ b/mali_kbase/gpu/mali_kbase_gpu_regmap.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
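kbase_gpu_access_type_name() takes the raw fault status word; with the rename above, the execute case now uses the spelled-out AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE constant. A hedged logging sketch; the wrapper function is hypothetical:

/* Illustration only: report the access type of a fault by name. */
static void example_log_access_type(struct kbase_device *kbdev, u32 fault_status)
{
	dev_warn(kbdev->dev, "GPU fault access type: %s",
		 kbase_gpu_access_type_name(fault_status));
}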
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,7 @@ #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_regmap.h> #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_coherency.h> #include <uapi/gpu/arm/midgard/gpu/mali_kbase_gpu_id.h> + #if MALI_USE_CSF #include "backend/mali_kbase_gpu_regmap_csf.h" #else @@ -34,15 +35,21 @@ /* GPU_U definition */ #ifdef __ASSEMBLER__ #define GPU_U(x) x +#define GPU_UL(x) x +#define GPU_ULL(x) x #else #define GPU_U(x) x##u +#define GPU_UL(x) x##ul +#define GPU_ULL(x) x##ull #endif /* __ASSEMBLER__ */ + /* Begin Register Offsets */ /* GPU control registers */ #define GPU_CONTROL_BASE 0x0000 #define GPU_CONTROL_REG(r) (GPU_CONTROL_BASE + (r)) + #define GPU_ID 0x000 /* (RO) GPU and revision identifier */ #define L2_FEATURES 0x004 /* (RO) Level 2 cache features */ #define TILER_FEATURES 0x00C /* (RO) Tiler Features */ @@ -53,9 +60,12 @@ #define GPU_IRQ_CLEAR 0x024 /* (WO) */ #define GPU_IRQ_MASK 0x028 /* (RW) */ #define GPU_IRQ_STATUS 0x02C /* (RO) */ - #define GPU_COMMAND 0x030 /* (WO) */ + #define GPU_STATUS 0x034 /* (RO) */ +#define GPU_STATUS_PRFCNT_ACTIVE (1 << 2) /* Set if the performance counters are active. */ +#define GPU_STATUS_CYCLE_COUNT_ACTIVE (1 << 6) /* Set if the cycle counter is active. */ +#define GPU_STATUS_PROTECTED_MODE_ACTIVE (1 << 7) /* Set if protected mode is active */ #define GPU_DBGEN (1 << 8) /* DBGEN wire status */ @@ -65,10 +75,9 @@ #define L2_CONFIG 0x048 /* (RW) Level 2 cache configuration */ -#define GROUPS_L2_COHERENT (1 << 0) /* Cores groups are l2 coherent */ -#define SUPER_L2_COHERENT (1 << 1) /* Shader cores within a core - * supergroup are l2 coherent - */ +/* Cores groups are l2 coherent */ +#define MEM_FEATURES_COHERENT_CORE_GROUP_SHIFT GPU_U(0) +#define MEM_FEATURES_COHERENT_CORE_GROUP_MASK (GPU_U(0x1) << MEM_FEATURES_COHERENT_CORE_GROUP_SHIFT) #define PWR_KEY 0x050 /* (WO) Power manager key register */ #define PWR_OVERRIDE0 0x054 /* (RW) Power manager override settings */ @@ -96,6 +105,11 @@ #define TEXTURE_FEATURES_REG(n) GPU_CONTROL_REG(TEXTURE_FEATURES_0 + ((n) << 2)) +#define GPU_COMMAND_ARG0_LO 0x0D0 /* (RW) Additional parameter 0 for GPU commands, low word */ +#define GPU_COMMAND_ARG0_HI 0x0D4 /* (RW) Additional parameter 0 for GPU commands, high word */ +#define GPU_COMMAND_ARG1_LO 0x0D8 /* (RW) Additional parameter 1 for GPU commands, low word */ +#define GPU_COMMAND_ARG1_HI 0x0DC /* (RW) Additional parameter 1 for GPU commands, high word */ + #define SHADER_PRESENT_LO 0x100 /* (RO) Shader core present bitmap, low word */ #define SHADER_PRESENT_HI 0x104 /* (RO) Shader core present bitmap, high word */ @@ -105,9 +119,6 @@ #define L2_PRESENT_LO 0x120 /* (RO) Level 2 cache present bitmap, low word */ #define L2_PRESENT_HI 0x124 /* (RO) Level 2 cache present bitmap, high word */ -#define STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */ -#define STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */ - #define SHADER_READY_LO 0x140 /* (RO) Shader core ready bitmap, low word */ #define SHADER_READY_HI 0x144 /* (RO) Shader core ready bitmap, high word */ @@ -117,18 +128,23 @@ #define L2_READY_LO 0x160 /* (RO) Level 2 cache ready bitmap, low word */ #define L2_READY_HI 0x164 /* (RO) Level 2 cache ready bitmap, high word */ -#define STACK_READY_LO 0xE10 /* (RO) Core stack ready bitmap, low word */ -#define STACK_READY_HI 0xE14 /* (RO) Core stack ready bitmap, high word 
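GPU_COMMAND_ARG0/ARG1 are conventional split 64-bit registers: the low and high words are written separately before issuing the command that consumes them (for example a FLUSH_PA_RANGE). A hedged sketch using the driver's existing kbase_reg_write() accessor; the helper itself is illustrative:

/* Illustration only: program a 64-bit argument into the GPU_COMMAND_ARG0
 * register pair before issuing a command that consumes it.
 */
static void example_write_command_arg0(struct kbase_device *kbdev, u64 arg)
{
	kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_LO),
			(u32)(arg & 0xFFFFFFFFull));
	kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND_ARG0_HI),
			(u32)(arg >> 32));
}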
*/ - #define SHADER_PWRON_LO 0x180 /* (WO) Shader core power on bitmap, low word */ #define SHADER_PWRON_HI 0x184 /* (WO) Shader core power on bitmap, high word */ +#define SHADER_PWRFEATURES 0x188 /* (RW) Shader core power features */ + #define TILER_PWRON_LO 0x190 /* (WO) Tiler core power on bitmap, low word */ #define TILER_PWRON_HI 0x194 /* (WO) Tiler core power on bitmap, high word */ #define L2_PWRON_LO 0x1A0 /* (WO) Level 2 cache power on bitmap, low word */ #define L2_PWRON_HI 0x1A4 /* (WO) Level 2 cache power on bitmap, high word */ +#define STACK_PRESENT_LO 0xE00 /* (RO) Core stack present bitmap, low word */ +#define STACK_PRESENT_HI 0xE04 /* (RO) Core stack present bitmap, high word */ + +#define STACK_READY_LO 0xE10 /* (RO) Core stack ready bitmap, low word */ +#define STACK_READY_HI 0xE14 /* (RO) Core stack ready bitmap, high word */ + #define STACK_PWRON_LO 0xE20 /* (RO) Core stack power on bitmap, low word */ #define STACK_PWRON_HI 0xE24 /* (RO) Core stack power on bitmap, high word */ @@ -176,6 +192,8 @@ #define COHERENCY_FEATURES 0x300 /* (RO) Coherency features present */ #define COHERENCY_ENABLE 0x304 /* (RW) Coherency enable */ +#define AMBA_FEATURES 0x300 /* (RO) AMBA bus supported features */ +#define AMBA_ENABLE 0x304 /* (RW) AMBA features enable */ #define SHADER_CONFIG 0xF04 /* (RW) Shader core configuration (implementation-specific) */ #define TILER_CONFIG 0xF08 /* (RW) Tiler core configuration (implementation-specific) */ @@ -184,7 +202,6 @@ /* Job control registers */ #define JOB_CONTROL_BASE 0x1000 - #define JOB_CONTROL_REG(r) (JOB_CONTROL_BASE + (r)) #define JOB_IRQ_RAWSTAT 0x000 /* Raw interrupt status register */ @@ -194,6 +211,10 @@ /* MMU control registers */ +#define MMU_CONTROL_BASE 0x2000 +#define MMU_CONTROL_REG(r) (MMU_CONTROL_BASE + (r)) + +#define MMU_IRQ_RAWSTAT 0x000 /* (RW) Raw interrupt status register */ #define MMU_IRQ_CLEAR 0x004 /* (WO) Interrupt clear register */ #define MMU_IRQ_MASK 0x008 /* (RW) Interrupt mask register */ #define MMU_IRQ_STATUS 0x00C /* (RO) Interrupt status register */ @@ -217,28 +238,26 @@ /* MMU address space control registers */ -#define MMU_AS_REG(n, r) (MMU_REG(MMU_AS0 + ((n) << 6)) + (r)) - -#define AS_TRANSTAB_LO 0x00 /* (RW) Translation Table Base Address for address space n, low word */ -#define AS_TRANSTAB_HI 0x04 /* (RW) Translation Table Base Address for address space n, high word */ -#define AS_MEMATTR_LO 0x08 /* (RW) Memory attributes for address space n, low word. */ -#define AS_MEMATTR_HI 0x0C /* (RW) Memory attributes for address space n, high word. 
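The MMU interrupt registers are now addressed through a dedicated MMU_CONTROL_REG() wrapper. A hedged sketch of reading the raw MMU IRQ status with the driver's existing kbase_reg_read() accessor (the helper is illustrative):

/* Illustration only: read the raw MMU interrupt status using the new
 * MMU_CONTROL_REG() addressing.
 */
static u32 example_read_mmu_irq_status(struct kbase_device *kbdev)
{
	return kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_RAWSTAT));
}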
*/ -#define AS_LOCKADDR_LO 0x10 /* (RW) Lock region address for address space n, low word */ -#define AS_LOCKADDR_HI 0x14 /* (RW) Lock region address for address space n, high word */ -#define AS_COMMAND 0x18 /* (WO) MMU command register for address space n */ -#define AS_FAULTSTATUS 0x1C /* (RO) MMU fault status register for address space n */ -#define AS_FAULTADDRESS_LO 0x20 /* (RO) Fault Address for address space n, low word */ -#define AS_FAULTADDRESS_HI 0x24 /* (RO) Fault Address for address space n, high word */ -#define AS_STATUS 0x28 /* (RO) Status flags for address space n */ - -/* (RW) Translation table configuration for address space n, low word */ -#define AS_TRANSCFG_LO 0x30 -/* (RW) Translation table configuration for address space n, high word */ -#define AS_TRANSCFG_HI 0x34 -/* (RO) Secondary fault address for address space n, low word */ -#define AS_FAULTEXTRA_LO 0x38 -/* (RO) Secondary fault address for address space n, high word */ -#define AS_FAULTEXTRA_HI 0x3C +#define MMU_STAGE1 0x2000 /* () MMU control registers */ +#define MMU_STAGE1_REG(r) (MMU_STAGE1 + (r)) + +#define MMU_AS_REG(n, r) (MMU_AS0 + ((n) << 6) + (r)) + +#define AS_TRANSTAB_LO 0x00 /* (RW) Translation Table Base Address for address space n, low word */ +#define AS_TRANSTAB_HI 0x04 /* (RW) Translation Table Base Address for address space n, high word */ +#define AS_MEMATTR_LO 0x08 /* (RW) Memory attributes for address space n, low word. */ +#define AS_MEMATTR_HI 0x0C /* (RW) Memory attributes for address space n, high word. */ +#define AS_LOCKADDR_LO 0x10 /* (RW) Lock region address for address space n, low word */ +#define AS_LOCKADDR_HI 0x14 /* (RW) Lock region address for address space n, high word */ +#define AS_COMMAND 0x18 /* (WO) MMU command register for address space n */ +#define AS_FAULTSTATUS 0x1C /* (RO) MMU fault status register for address space n */ +#define AS_FAULTADDRESS_LO 0x20 /* (RO) Fault Address for address space n, low word */ +#define AS_FAULTADDRESS_HI 0x24 /* (RO) Fault Address for address space n, high word */ +#define AS_STATUS 0x28 /* (RO) Status flags for address space n */ +#define AS_TRANSCFG_LO 0x30 /* (RW) Translation table configuration for address space n, low word */ +#define AS_TRANSCFG_HI 0x34 /* (RW) Translation table configuration for address space n, high word */ +#define AS_FAULTEXTRA_LO 0x38 /* (RO) Secondary fault address for address space n, low word */ +#define AS_FAULTEXTRA_HI 0x3C /* (RO) Secondary fault address for address space n, high word */ /* End Register Offsets */ @@ -288,7 +307,7 @@ (((reg_val)&AS_FAULTSTATUS_ACCESS_TYPE_MASK) >> AS_FAULTSTATUS_ACCESS_TYPE_SHIFT) #define AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC (0x0) -#define AS_FAULTSTATUS_ACCESS_TYPE_EX (0x1) +#define AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE (0x1) #define AS_FAULTSTATUS_ACCESS_TYPE_READ (0x2) #define AS_FAULTSTATUS_ACCESS_TYPE_WRITE (0x3) @@ -355,8 +374,8 @@ (((value) << AS_LOCKADDR_LOCKADDR_SIZE_SHIFT) & \ AS_LOCKADDR_LOCKADDR_SIZE_MASK)) #define AS_LOCKADDR_LOCKADDR_BASE_SHIFT GPU_U(12) -#define AS_LOCKADDR_LOCKADDR_BASE_MASK \ - (GPU_U(0xFFFFFFFFFFFFF) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) +#define AS_LOCKADDR_LOCKADDR_BASE_MASK \ + (GPU_ULL(0xFFFFFFFFFFFFF) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) #define AS_LOCKADDR_LOCKADDR_BASE_GET(reg_val) \ (((reg_val)&AS_LOCKADDR_LOCKADDR_BASE_MASK) >> \ AS_LOCKADDR_LOCKADDR_BASE_SHIFT) @@ -364,11 +383,11 @@ (((reg_val) & ~AS_LOCKADDR_LOCKADDR_BASE_MASK) | \ (((value) << AS_LOCKADDR_LOCKADDR_BASE_SHIFT) & \ AS_LOCKADDR_LOCKADDR_BASE_MASK)) - -/* 
GPU_STATUS values */ -#define GPU_STATUS_PRFCNT_ACTIVE (1 << 2) /* Set if the performance counters are active. */ -#define GPU_STATUS_CYCLE_COUNT_ACTIVE (1 << 6) /* Set if the cycle counter is active. */ -#define GPU_STATUS_PROTECTED_MODE_ACTIVE (1 << 7) /* Set if protected mode is active */ +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT (6) +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK ((0xF) << AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT) +#define AS_LOCKADDR_FLUSH_SKIP_LEVELS_SET(reg_val, value) \ + (((reg_val) & ~AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK) | \ + ((value << AS_LOCKADDR_FLUSH_SKIP_LEVELS_SHIFT) & AS_LOCKADDR_FLUSH_SKIP_LEVELS_MASK)) /* PRFCNT_CONFIG register values */ #define PRFCNT_CONFIG_MODE_SHIFT 0 /* Counter mode position. */ @@ -454,6 +473,60 @@ #define L2_CONFIG_ASN_HASH_ENABLE_MASK (1ul << L2_CONFIG_ASN_HASH_ENABLE_SHIFT) /* End L2_CONFIG register */ +/* AMBA_FEATURES register */ +#define AMBA_FEATURES_ACE_LITE_SHIFT GPU_U(0) +#define AMBA_FEATURES_ACE_LITE_MASK (GPU_U(0x1) << AMBA_FEATURES_ACE_LITE_SHIFT) +#define AMBA_FEATURES_ACE_LITE_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_ACE_LITE_MASK) >> \ + AMBA_FEATURES_ACE_LITE_SHIFT) +#define AMBA_FEATURES_ACE_LITE_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_ACE_LITE_MASK) | \ + (((value) << AMBA_FEATURES_ACE_LITE_SHIFT) & \ + AMBA_FEATURES_ACE_LITE_MASK)) +#define AMBA_FEATURES_ACE_SHIFT GPU_U(1) +#define AMBA_FEATURES_ACE_MASK (GPU_U(0x1) << AMBA_FEATURES_ACE_SHIFT) +#define AMBA_FEATURES_ACE_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_ACE_MASK) >> AMBA_FEATURES_ACE_SHIFT) +#define AMBA_FEATURES_ACE_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_ACE_MASK) | \ + (((value) << AMBA_FEATURES_ACE_SHIFT) & AMBA_FEATURES_ACE_MASK)) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT GPU_U(5) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK \ + (GPU_U(0x1) << AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_GET(reg_val) \ + (((reg_val)&AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK) >> \ + AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SET(reg_val, value) \ + (((reg_val) & ~AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK) | \ + (((value) << AMBA_FEATURES_MEMORY_CACHE_SUPPORT_SHIFT) & \ + AMBA_FEATURES_MEMORY_CACHE_SUPPORT_MASK)) + +/* AMBA_ENABLE register */ +#define AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT GPU_U(0) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_MASK \ + (GPU_U(0x1F) << AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_GET(reg_val) \ + (((reg_val)&AMBA_ENABLE_COHERENCY_PROTOCOL_MASK) >> \ + AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) +#define AMBA_ENABLE_COHERENCY_PROTOCOL_SET(reg_val, value) \ + (((reg_val) & ~AMBA_ENABLE_COHERENCY_PROTOCOL_MASK) | \ + (((value) << AMBA_ENABLE_COHERENCY_PROTOCOL_SHIFT) & \ + AMBA_ENABLE_COHERENCY_PROTOCOL_MASK)) +/* AMBA_ENABLE_coherency_protocol values */ +#define AMBA_ENABLE_COHERENCY_PROTOCOL_ACE_LITE 0x0 +#define AMBA_ENABLE_COHERENCY_PROTOCOL_ACE 0x1 +#define AMBA_ENABLE_COHERENCY_PROTOCOL_NO_COHERENCY 0x1F +/* End of AMBA_ENABLE_coherency_protocol values */ +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT GPU_U(5) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK \ + (GPU_U(0x1) << AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_GET(reg_val) \ + (((reg_val)&AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK) >> \ + AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) +#define AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SET(reg_val, value) \ + (((reg_val) & ~AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK) | \ + 
(((value) << AMBA_ENABLE_MEMORY_CACHE_SUPPORT_SHIFT) & \ + AMBA_ENABLE_MEMORY_CACHE_SUPPORT_MASK)) /* IDVS_GROUP register */ #define IDVS_GROUP_SIZE_SHIFT (16) diff --git a/mali_kbase/hwcnt/Kbuild b/mali_kbase/hwcnt/Kbuild new file mode 100644 index 0000000..8c8775f --- /dev/null +++ b/mali_kbase/hwcnt/Kbuild @@ -0,0 +1,37 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. +# +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# + +mali_kbase-y += \ + hwcnt/mali_kbase_hwcnt.o \ + hwcnt/mali_kbase_hwcnt_gpu.o \ + hwcnt/mali_kbase_hwcnt_gpu_narrow.o \ + hwcnt/mali_kbase_hwcnt_types.o \ + hwcnt/mali_kbase_hwcnt_virtualizer.o \ + hwcnt/mali_kbase_hwcnt_watchdog_if_timer.o + +ifeq ($(CONFIG_MALI_CSF_SUPPORT),y) + mali_kbase-y += \ + hwcnt/backend/mali_kbase_hwcnt_backend_csf.o \ + hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.o +else + mali_kbase-y += \ + hwcnt/backend/mali_kbase_hwcnt_backend_jm.o \ + hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.o +endif diff --git a/mali_kbase/mali_kbase_hwcnt_backend.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend.h index b069fc1..6cfa6f5 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -56,8 +56,8 @@ struct kbase_hwcnt_backend; * * Return: Non-NULL pointer to immutable hardware counter metadata. */ -typedef const struct kbase_hwcnt_metadata *kbase_hwcnt_backend_metadata_fn( - const struct kbase_hwcnt_backend_info *info); +typedef const struct kbase_hwcnt_metadata * +kbase_hwcnt_backend_metadata_fn(const struct kbase_hwcnt_backend_info *info); /** * typedef kbase_hwcnt_backend_init_fn - Initialise a counter backend. @@ -69,9 +69,8 @@ typedef const struct kbase_hwcnt_metadata *kbase_hwcnt_backend_metadata_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_init_fn( - const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend); +typedef int kbase_hwcnt_backend_init_fn(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend); /** * typedef kbase_hwcnt_backend_term_fn - Terminate a counter backend. @@ -86,8 +85,7 @@ typedef void kbase_hwcnt_backend_term_fn(struct kbase_hwcnt_backend *backend); * * Return: Backend timestamp in nanoseconds. 
*/ -typedef u64 kbase_hwcnt_backend_timestamp_ns_fn( - struct kbase_hwcnt_backend *backend); +typedef u64 kbase_hwcnt_backend_timestamp_ns_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_enable_fn - Start counter dumping with the @@ -102,9 +100,8 @@ typedef u64 kbase_hwcnt_backend_timestamp_ns_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_enable_fn( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map); +typedef int kbase_hwcnt_backend_dump_enable_fn(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map); /** * typedef kbase_hwcnt_backend_dump_enable_nolock_fn - Start counter dumping @@ -118,9 +115,9 @@ typedef int kbase_hwcnt_backend_dump_enable_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_enable_nolock_fn( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map); +typedef int +kbase_hwcnt_backend_dump_enable_nolock_fn(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map); /** * typedef kbase_hwcnt_backend_dump_disable_fn - Disable counter dumping with @@ -130,8 +127,7 @@ typedef int kbase_hwcnt_backend_dump_enable_nolock_fn( * If the backend is already disabled, does nothing. * Any undumped counter values since the last dump get will be lost. */ -typedef void kbase_hwcnt_backend_dump_disable_fn( - struct kbase_hwcnt_backend *backend); +typedef void kbase_hwcnt_backend_dump_disable_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_clear_fn - Reset all the current undumped @@ -142,8 +138,7 @@ typedef void kbase_hwcnt_backend_dump_disable_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_clear_fn( - struct kbase_hwcnt_backend *backend); +typedef int kbase_hwcnt_backend_dump_clear_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_request_fn - Request an asynchronous counter @@ -157,9 +152,8 @@ typedef int kbase_hwcnt_backend_dump_clear_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_request_fn( - struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns); +typedef int kbase_hwcnt_backend_dump_request_fn(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns); /** * typedef kbase_hwcnt_backend_dump_wait_fn - Wait until the last requested @@ -170,8 +164,7 @@ typedef int kbase_hwcnt_backend_dump_request_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_dump_wait_fn( - struct kbase_hwcnt_backend *backend); +typedef int kbase_hwcnt_backend_dump_wait_fn(struct kbase_hwcnt_backend *backend); /** * typedef kbase_hwcnt_backend_dump_get_fn - Copy or accumulate enable the @@ -189,11 +182,10 @@ typedef int kbase_hwcnt_backend_dump_wait_fn( * * Return: 0 on success, else error code. 
*/ -typedef int kbase_hwcnt_backend_dump_get_fn( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dump_buffer, - const struct kbase_hwcnt_enable_map *enable_map, - bool accumulate); +typedef int kbase_hwcnt_backend_dump_get_fn(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dump_buffer, + const struct kbase_hwcnt_enable_map *enable_map, + bool accumulate); /** * struct kbase_hwcnt_backend_interface - Hardware counter backend virtual diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.c index c42f2a0..27acfc6 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,9 +19,9 @@ * */ -#include "mali_kbase_hwcnt_backend_csf.h" -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/log2.h> #include <linux/kernel.h> @@ -36,8 +36,13 @@ #define BASE_MAX_NR_CLOCKS_REGULATORS 2 #endif +#if IS_ENABLED(CONFIG_MALI_IS_FPGA) && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/* Backend watch dog timer interval in milliseconds: 18 seconds. */ +#define HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS ((u32)18000) +#else /* Backend watch dog timer interval in milliseconds: 1 second. */ #define HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS ((u32)1000) +#endif /* IS_FPGA && !NO_MALI */ /** * enum kbase_hwcnt_backend_csf_dump_state - HWC CSF backend dumping states. @@ -168,23 +173,29 @@ struct kbase_hwcnt_backend_csf_info { /** * struct kbase_hwcnt_csf_physical_layout - HWC sample memory physical layout * information. + * @hw_block_cnt: Total number of hardware counters blocks. The hw counters blocks are + * sub-categorized into 4 classes: front-end, tiler, memory system, and shader. + * hw_block_cnt = fe_cnt + tiler_cnt + mmu_l2_cnt + shader_cnt. * @fe_cnt: Front end block count. * @tiler_cnt: Tiler block count. - * @mmu_l2_cnt: Memory system(MMU and L2 cache) block count. + * @mmu_l2_cnt: Memory system (MMU and L2 cache) block count. * @shader_cnt: Shader Core block count. - * @block_cnt: Total block count (sum of all other block counts). + * @fw_block_cnt: Total number of firmware counters blocks. + * @block_cnt: Total block count (sum of all counter blocks: hw_block_cnt + fw_block_cnt). * @shader_avail_mask: Bitmap of all shader cores in the system. * @enable_mask_offset: Offset in array elements of enable mask in each block * starting from the beginning of block. - * @headers_per_block: Header size per block. - * @counters_per_block: Counters size per block. - * @values_per_block: Total size per block. + * @headers_per_block: For any block, the number of counters designated as block's header. + * @counters_per_block: For any block, the number of counters designated as block's payload. + * @values_per_block: For any block, the number of counters in total (header + payload). 
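A concrete, hypothetical sizing may make the split clearer: with 1 front-end block, 1 tiler block, 2 MMU/L2 blocks and 8 shader cores, hw_block_cnt = 1 + 1 + 2 + 8 = 12; with a single firmware block, fw_block_cnt = 1 and block_cnt = 13. Likewise, a 256-byte sample block made of 4-byte hardware counter values would give values_per_block = 64, split between headers_per_block and counters_per_block.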
*/ struct kbase_hwcnt_csf_physical_layout { + u8 hw_block_cnt; u8 fe_cnt; u8 tiler_cnt; u8 mmu_l2_cnt; u8 shader_cnt; + u8 fw_block_cnt; u8 block_cnt; u64 shader_avail_mask; size_t enable_mask_offset; @@ -256,8 +267,7 @@ struct kbase_hwcnt_backend_csf { struct work_struct hwc_threshold_work; }; -static bool kbasep_hwcnt_backend_csf_backend_exists( - struct kbase_hwcnt_backend_csf_info *csf_info) +static bool kbasep_hwcnt_backend_csf_backend_exists(struct kbase_hwcnt_backend_csf_info *csf_info) { WARN_ON(!csf_info); csf_info->csf_if->assert_lock_held(csf_info->csf_if->ctx); @@ -271,19 +281,22 @@ static bool kbasep_hwcnt_backend_csf_backend_exists( * @backend_csf: Non-NULL pointer to backend. * @enable_map: Non-NULL pointer to enable map specifying enabled counters. */ -static void kbasep_hwcnt_backend_csf_cc_initial_sample( - struct kbase_hwcnt_backend_csf *backend_csf, - const struct kbase_hwcnt_enable_map *enable_map) +static void +kbasep_hwcnt_backend_csf_cc_initial_sample(struct kbase_hwcnt_backend_csf *backend_csf, + const struct kbase_hwcnt_enable_map *enable_map) { u64 clk_enable_map = enable_map->clk_enable_map; u64 cycle_counts[BASE_MAX_NR_CLOCKS_REGULATORS]; size_t clk; + memset(cycle_counts, 0, sizeof(cycle_counts)); + /* Read cycle count from CSF interface for both clock domains. */ - backend_csf->info->csf_if->get_gpu_cycle_count( - backend_csf->info->csf_if->ctx, cycle_counts, clk_enable_map); + backend_csf->info->csf_if->get_gpu_cycle_count(backend_csf->info->csf_if->ctx, cycle_counts, + clk_enable_map); - kbase_hwcnt_metadata_for_each_clock(enable_map->metadata, clk) { + kbase_hwcnt_metadata_for_each_clock(enable_map->metadata, clk) + { if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, clk)) backend_csf->prev_cycle_count[clk] = cycle_counts[clk]; } @@ -292,42 +305,37 @@ static void kbasep_hwcnt_backend_csf_cc_initial_sample( backend_csf->clk_enable_map = clk_enable_map; } -static void -kbasep_hwcnt_backend_csf_cc_update(struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_cc_update(struct kbase_hwcnt_backend_csf *backend_csf) { u64 cycle_counts[BASE_MAX_NR_CLOCKS_REGULATORS]; size_t clk; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + memset(cycle_counts, 0, sizeof(cycle_counts)); + + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); - backend_csf->info->csf_if->get_gpu_cycle_count( - backend_csf->info->csf_if->ctx, cycle_counts, - backend_csf->clk_enable_map); + backend_csf->info->csf_if->get_gpu_cycle_count(backend_csf->info->csf_if->ctx, cycle_counts, + backend_csf->clk_enable_map); - kbase_hwcnt_metadata_for_each_clock(backend_csf->info->metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - backend_csf->clk_enable_map, clk)) { + kbase_hwcnt_metadata_for_each_clock(backend_csf->info->metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(backend_csf->clk_enable_map, clk)) { backend_csf->cycle_count_elapsed[clk] = - cycle_counts[clk] - - backend_csf->prev_cycle_count[clk]; + cycle_counts[clk] - backend_csf->prev_cycle_count[clk]; backend_csf->prev_cycle_count[clk] = cycle_counts[clk]; } } } /* CSF backend implementation of kbase_hwcnt_backend_timestamp_ns_fn */ -static u64 -kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) +static u64 kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf 
*backend_csf = (struct kbase_hwcnt_backend_csf *)backend; if (!backend_csf || !backend_csf->info || !backend_csf->info->csf_if) return 0; - return backend_csf->info->csf_if->timestamp_ns( - backend_csf->info->csf_if->ctx); + return backend_csf->info->csf_if->timestamp_ns(backend_csf->info->csf_if->ctx); } /** kbasep_hwcnt_backend_csf_process_enable_map() - Process the enable_map to @@ -336,8 +344,8 @@ kbasep_hwcnt_backend_csf_timestamp_ns(struct kbase_hwcnt_backend *backend) * required. *@phys_enable_map: HWC physical enable map to be processed. */ -static void kbasep_hwcnt_backend_csf_process_enable_map( - struct kbase_hwcnt_physical_enable_map *phys_enable_map) +static void +kbasep_hwcnt_backend_csf_process_enable_map(struct kbase_hwcnt_physical_enable_map *phys_enable_map) { WARN_ON(!phys_enable_map); @@ -361,46 +369,55 @@ static void kbasep_hwcnt_backend_csf_init_layout( const struct kbase_hwcnt_backend_csf_if_prfcnt_info *prfcnt_info, struct kbase_hwcnt_csf_physical_layout *phys_layout) { - u8 shader_core_cnt; + size_t shader_core_cnt; size_t values_per_block; + size_t fw_blocks_count; + size_t hw_blocks_count; WARN_ON(!prfcnt_info); WARN_ON(!phys_layout); shader_core_cnt = fls64(prfcnt_info->core_mask); - values_per_block = - prfcnt_info->prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + values_per_block = prfcnt_info->prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + fw_blocks_count = div_u64(prfcnt_info->prfcnt_fw_size, prfcnt_info->prfcnt_block_size); + hw_blocks_count = div_u64(prfcnt_info->prfcnt_hw_size, prfcnt_info->prfcnt_block_size); + + /* The number of hardware counters reported by the GPU matches the legacy guess-work we + * have done in the past + */ + WARN_ON(hw_blocks_count != KBASE_HWCNT_V5_FE_BLOCK_COUNT + + KBASE_HWCNT_V5_TILER_BLOCK_COUNT + + prfcnt_info->l2_count + shader_core_cnt); *phys_layout = (struct kbase_hwcnt_csf_physical_layout){ .fe_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT, .tiler_cnt = KBASE_HWCNT_V5_TILER_BLOCK_COUNT, .mmu_l2_cnt = prfcnt_info->l2_count, .shader_cnt = shader_core_cnt, - .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + - KBASE_HWCNT_V5_TILER_BLOCK_COUNT + - prfcnt_info->l2_count + shader_core_cnt, + .fw_block_cnt = fw_blocks_count, + .hw_block_cnt = hw_blocks_count, + .block_cnt = fw_blocks_count + hw_blocks_count, .shader_avail_mask = prfcnt_info->core_mask, .headers_per_block = KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .values_per_block = values_per_block, - .counters_per_block = - values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, + .counters_per_block = values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .enable_mask_offset = KBASE_HWCNT_V5_PRFCNT_EN_HEADER, }; } -static void kbasep_hwcnt_backend_csf_reset_internal_buffers( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_reset_internal_buffers(struct kbase_hwcnt_backend_csf *backend_csf) { size_t user_buf_bytes = backend_csf->info->metadata->dump_buf_bytes; memset(backend_csf->to_user_buf, 0, user_buf_bytes); memset(backend_csf->accum_buf, 0, user_buf_bytes); - memset(backend_csf->old_sample_buf, 0, - backend_csf->info->prfcnt_info.dump_bytes); + memset(backend_csf->old_sample_buf, 0, backend_csf->info->prfcnt_info.dump_bytes); } -static void kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - struct kbase_hwcnt_backend_csf *backend_csf, u32 *sample) +static void +kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(struct kbase_hwcnt_backend_csf *backend_csf, + u32 *sample) { u32 block_idx; const struct kbase_hwcnt_csf_physical_layout *phys_layout; @@ 
-414,8 +431,8 @@ static void kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( } } -static void kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(struct kbase_hwcnt_backend_csf *backend_csf) { u32 idx; u32 *sample; @@ -426,19 +443,16 @@ static void kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header( for (idx = 0; idx < backend_csf->info->ring_buf_cnt; idx++) { sample = (u32 *)&cpu_dump_base[idx * dump_bytes]; - kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - backend_csf, sample); + kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(backend_csf, sample); } } -static void kbasep_hwcnt_backend_csf_update_user_sample( - struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_update_user_sample(struct kbase_hwcnt_backend_csf *backend_csf) { size_t user_buf_bytes = backend_csf->info->metadata->dump_buf_bytes; /* Copy the data into the sample and wait for the user to get it. */ - memcpy(backend_csf->to_user_buf, backend_csf->accum_buf, - user_buf_bytes); + memcpy(backend_csf->to_user_buf, backend_csf->accum_buf, user_buf_bytes); /* After copied data into user sample, clear the accumulator values to * prepare for the next accumulator, such as the next request or @@ -448,9 +462,8 @@ static void kbasep_hwcnt_backend_csf_update_user_sample( } static void kbasep_hwcnt_backend_csf_accumulate_sample( - const struct kbase_hwcnt_csf_physical_layout *phys_layout, - size_t dump_bytes, u64 *accum_buf, const u32 *old_sample_buf, - const u32 *new_sample_buf, bool clearing_samples) + const struct kbase_hwcnt_csf_physical_layout *phys_layout, size_t dump_bytes, + u64 *accum_buf, const u32 *old_sample_buf, const u32 *new_sample_buf, bool clearing_samples) { size_t block_idx; const u32 *old_block = old_sample_buf; @@ -458,11 +471,17 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( u64 *acc_block = accum_buf; const size_t values_per_block = phys_layout->values_per_block; - for (block_idx = 0; block_idx < phys_layout->block_cnt; block_idx++) { - const u32 old_enable_mask = - old_block[phys_layout->enable_mask_offset]; - const u32 new_enable_mask = - new_block[phys_layout->enable_mask_offset]; + /* Performance counter blocks for firmware are stored before blocks for hardware. + * We skip over the firmware's performance counter blocks (counters dumping is not + * supported for firmware blocks, only hardware ones). + */ + old_block += values_per_block * phys_layout->fw_block_cnt; + new_block += values_per_block * phys_layout->fw_block_cnt; + + for (block_idx = phys_layout->fw_block_cnt; block_idx < phys_layout->block_cnt; + block_idx++) { + const u32 old_enable_mask = old_block[phys_layout->enable_mask_offset]; + const u32 new_enable_mask = new_block[phys_layout->enable_mask_offset]; if (new_enable_mask == 0) { /* Hardware block was unavailable or we didn't turn on @@ -475,9 +494,7 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( size_t ctr_idx; /* Unconditionally copy the headers. */ - for (ctr_idx = 0; - ctr_idx < phys_layout->headers_per_block; - ctr_idx++) { + for (ctr_idx = 0; ctr_idx < phys_layout->headers_per_block; ctr_idx++) { acc_block[ctr_idx] = new_block[ctr_idx]; } @@ -506,34 +523,25 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( * counters only, as we know previous * values are zeroes. 
*/ - for (ctr_idx = - phys_layout - ->headers_per_block; - ctr_idx < values_per_block; - ctr_idx++) { - acc_block[ctr_idx] += - new_block[ctr_idx]; + for (ctr_idx = phys_layout->headers_per_block; + ctr_idx < values_per_block; ctr_idx++) { + acc_block[ctr_idx] += new_block[ctr_idx]; } } else { /* Hardware block was previously * available. Accumulate the delta * between old and new counter values. */ - for (ctr_idx = - phys_layout - ->headers_per_block; - ctr_idx < values_per_block; - ctr_idx++) { + for (ctr_idx = phys_layout->headers_per_block; + ctr_idx < values_per_block; ctr_idx++) { acc_block[ctr_idx] += - new_block[ctr_idx] - - old_block[ctr_idx]; + new_block[ctr_idx] - old_block[ctr_idx]; } } } else { for (ctr_idx = phys_layout->headers_per_block; ctr_idx < values_per_block; ctr_idx++) { - acc_block[ctr_idx] += - new_block[ctr_idx]; + acc_block[ctr_idx] += new_block[ctr_idx]; } } } @@ -542,27 +550,25 @@ static void kbasep_hwcnt_backend_csf_accumulate_sample( acc_block += values_per_block; } - WARN_ON(old_block != - old_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(new_block != - new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(acc_block != - accum_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(old_block != old_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(new_block != new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(acc_block != accum_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES) - + (values_per_block * phys_layout->fw_block_cnt)); (void)dump_bytes; } -static void kbasep_hwcnt_backend_csf_accumulate_samples( - struct kbase_hwcnt_backend_csf *backend_csf, u32 extract_index_to_start, - u32 insert_index_to_stop) +static void kbasep_hwcnt_backend_csf_accumulate_samples(struct kbase_hwcnt_backend_csf *backend_csf, + u32 extract_index_to_start, + u32 insert_index_to_stop) { u32 raw_idx; - unsigned long flags; + unsigned long flags = 0UL; u8 *cpu_dump_base = (u8 *)backend_csf->ring_buf_cpu_base; const size_t ring_buf_cnt = backend_csf->info->ring_buf_cnt; const size_t buf_dump_bytes = backend_csf->info->prfcnt_info.dump_bytes; bool clearing_samples = backend_csf->info->prfcnt_info.clearing_samples; u32 *old_sample_buf = backend_csf->old_sample_buf; - u32 *new_sample_buf; + u32 *new_sample_buf = old_sample_buf; if (extract_index_to_start == insert_index_to_stop) /* No samples to accumulate. Early out. */ @@ -570,25 +576,22 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( /* Sync all the buffers to CPU side before read the data. */ backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, - backend_csf->ring_buf, - extract_index_to_start, + backend_csf->ring_buf, extract_index_to_start, insert_index_to_stop, true); /* Consider u32 wrap case, '!=' is used here instead of '<' operator */ - for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; - raw_idx++) { + for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; raw_idx++) { /* The logical "&" acts as a modulo operation since buf_count * must be a power of two. 
*/ const u32 buf_idx = raw_idx & (ring_buf_cnt - 1); - new_sample_buf = - (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; + new_sample_buf = (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; - kbasep_hwcnt_backend_csf_accumulate_sample( - &backend_csf->phys_layout, buf_dump_bytes, - backend_csf->accum_buf, old_sample_buf, new_sample_buf, - clearing_samples); + kbasep_hwcnt_backend_csf_accumulate_sample(&backend_csf->phys_layout, + buf_dump_bytes, backend_csf->accum_buf, + old_sample_buf, new_sample_buf, + clearing_samples); old_sample_buf = new_sample_buf; } @@ -597,19 +600,16 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( memcpy(backend_csf->old_sample_buf, new_sample_buf, buf_dump_bytes); /* Reset the prfcnt_en header on each sample before releasing them. */ - for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; - raw_idx++) { + for (raw_idx = extract_index_to_start; raw_idx != insert_index_to_stop; raw_idx++) { const u32 buf_idx = raw_idx & (ring_buf_cnt - 1); u32 *sample = (u32 *)&cpu_dump_base[buf_idx * buf_dump_bytes]; - kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header( - backend_csf, sample); + kbasep_hwcnt_backend_csf_zero_sample_prfcnt_en_header(backend_csf, sample); } /* Sync zeroed buffers to avoid coherency issues on future use. */ backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, - backend_csf->ring_buf, - extract_index_to_start, + backend_csf->ring_buf, extract_index_to_start, insert_index_to_stop, false); /* After consuming all samples between extract_idx and insert_idx, @@ -617,22 +617,20 @@ static void kbasep_hwcnt_backend_csf_accumulate_samples( * can be released back to the ring buffer pool. */ backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); - backend_csf->info->csf_if->set_extract_index( - backend_csf->info->csf_if->ctx, insert_index_to_stop); + backend_csf->info->csf_if->set_extract_index(backend_csf->info->csf_if->ctx, + insert_index_to_stop); /* Update the watchdog last seen index to check any new FW auto samples * in next watchdog callback. */ backend_csf->watchdog_last_seen_insert_idx = insert_index_to_stop; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); } static void kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( struct kbase_hwcnt_backend_csf *backend_csf, enum kbase_hwcnt_backend_csf_enable_state new_state) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); if (backend_csf->enable_state != new_state) { backend_csf->enable_state = new_state; @@ -645,7 +643,7 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) { struct kbase_hwcnt_backend_csf_info *csf_info = info; struct kbase_hwcnt_backend_csf *backend_csf; - unsigned long flags; + unsigned long flags = 0UL; csf_info->csf_if->lock(csf_info->csf_if->ctx, &flags); @@ -663,26 +661,22 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) (!csf_info->fw_in_protected_mode) && /* 3. dump state indicates no other dumping is in progress. 
*/ ((backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) || - (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED))) { - u32 extract_index; - u32 insert_index; + (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED))) { + u32 extract_index = 0U; + u32 insert_index = 0U; /* Read the raw extract and insert indexes from the CSF interface. */ - csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, - &extract_index, &insert_index); + csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, &extract_index, &insert_index); /* Do watchdog request if no new FW auto samples. */ - if (insert_index == - backend_csf->watchdog_last_seen_insert_idx) { + if (insert_index == backend_csf->watchdog_last_seen_insert_idx) { /* Trigger the watchdog request. */ csf_info->csf_if->dump_request(csf_info->csf_if->ctx); /* A watchdog dump is required, change the state to * start the request process. */ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED; } } @@ -691,12 +685,10 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) * counter enabled interrupt. */ if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) || - (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED)) { + (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED)) { /* Reschedule the timer for next watchdog callback. */ - csf_info->watchdog_if->modify( - csf_info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); + csf_info->watchdog_if->modify(csf_info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); } csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); @@ -712,15 +704,14 @@ static void kbasep_hwcnt_backend_watchdog_timer_cb(void *info) */ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) { - unsigned long flags; + unsigned long flags = 0ULL; struct kbase_hwcnt_backend_csf *backend_csf; u32 insert_index_to_acc; - u32 extract_index; - u32 insert_index; + u32 extract_index = 0U; + u32 insert_index = 0U; WARN_ON(!work); - backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, - hwc_dump_work); + backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, hwc_dump_work); backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); /* Assert the backend is not destroyed. */ WARN_ON(backend_csf != backend_csf->info->backend); @@ -729,26 +720,22 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) * launched. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED); backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING; insert_index_to_acc = backend_csf->insert_index_to_accumulate; /* Read the raw extract and insert indexes from the CSF interface. 
*/ - backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, - &extract_index, &insert_index); + backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, &extract_index, + &insert_index); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Accumulate up to the insert we grabbed at the prfcnt request * interrupt. @@ -769,22 +756,18 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) /* The backend was disabled or had an error while we were accumulating. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - WARN_ON(backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING); + WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_ACCUMULATING); /* Our work here is done - set the wait object and unblock waiters. */ backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; complete_all(&backend_csf->dump_completed); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); } /** @@ -797,30 +780,28 @@ static void kbasep_hwcnt_backend_csf_dump_worker(struct work_struct *work) */ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) { - unsigned long flags; + unsigned long flags = 0ULL; struct kbase_hwcnt_backend_csf *backend_csf; - u32 extract_index; - u32 insert_index; + u32 extract_index = 0U; + u32 insert_index = 0U; WARN_ON(!work); - backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, - hwc_threshold_work); + backend_csf = container_of(work, struct kbase_hwcnt_backend_csf, hwc_threshold_work); backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); /* Assert the backend is not destroyed. */ WARN_ON(backend_csf != backend_csf->info->backend); /* Read the raw extract and insert indexes from the CSF interface. */ - backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, - &extract_index, &insert_index); + backend_csf->info->csf_if->get_indexes(backend_csf->info->csf_if->ctx, &extract_index, + &insert_index); /* The backend was disabled or had an error while the worker was being * launched. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } @@ -829,14 +810,11 @@ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) * interfere. 
*/ if ((backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED)) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED)) { + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Accumulate everything we possibly can. We grabbed the insert index * immediately after we acquired the lock but before we checked whether @@ -845,14 +823,13 @@ static void kbasep_hwcnt_backend_csf_threshold_worker(struct work_struct *work) * fact that our insert will not exceed the concurrent dump's * insert_to_accumulate, so we don't risk accumulating too much data. */ - kbasep_hwcnt_backend_csf_accumulate_samples(backend_csf, extract_index, - insert_index); + kbasep_hwcnt_backend_csf_accumulate_samples(backend_csf, extract_index, insert_index); /* No need to wake up anything since it is not a user dump request. */ } -static void kbase_hwcnt_backend_csf_submit_dump_worker( - struct kbase_hwcnt_backend_csf_info *csf_info) +static void +kbase_hwcnt_backend_csf_submit_dump_worker(struct kbase_hwcnt_backend_csf_info *csf_info) { u32 extract_index; @@ -860,31 +837,26 @@ static void kbase_hwcnt_backend_csf_submit_dump_worker( csf_info->csf_if->assert_lock_held(csf_info->csf_if->ctx); WARN_ON(!kbasep_hwcnt_backend_csf_backend_exists(csf_info)); - WARN_ON(csf_info->backend->enable_state != - KBASE_HWCNT_BACKEND_CSF_ENABLED); - WARN_ON(csf_info->backend->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT); + WARN_ON(csf_info->backend->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED); + WARN_ON(csf_info->backend->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT); /* Save insert index now so that the dump worker only accumulates the * HWC data associated with this request. Extract index is not stored * as that needs to be checked when accumulating to prevent re-reading * buffers that have already been read and returned to the GPU. */ - csf_info->csf_if->get_indexes( - csf_info->csf_if->ctx, &extract_index, - &csf_info->backend->insert_index_to_accumulate); - csf_info->backend->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED; + csf_info->csf_if->get_indexes(csf_info->csf_if->ctx, &extract_index, + &csf_info->backend->insert_index_to_accumulate); + csf_info->backend->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_WORKER_LAUNCHED; /* Submit the accumulator task into the work queue. 
*/ - queue_work(csf_info->backend->hwc_dump_workq, - &csf_info->backend->hwc_dump_work); + queue_work(csf_info->backend->hwc_dump_workq, &csf_info->backend->hwc_dump_work); } -static void kbasep_hwcnt_backend_csf_get_physical_enable( - struct kbase_hwcnt_backend_csf *backend_csf, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_backend_csf_if_enable *enable) +static void +kbasep_hwcnt_backend_csf_get_physical_enable(struct kbase_hwcnt_backend_csf *backend_csf, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_backend_csf_if_enable *enable) { enum kbase_hwcnt_physical_set phys_counter_set; struct kbase_hwcnt_physical_enable_map phys_enable_map; @@ -896,8 +868,7 @@ static void kbasep_hwcnt_backend_csf_get_physical_enable( */ kbasep_hwcnt_backend_csf_process_enable_map(&phys_enable_map); - kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, - backend_csf->info->counter_set); + kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, backend_csf->info->counter_set); /* Use processed enable_map to enable HWC in HW level. */ enable->fe_bm = phys_enable_map.fe_bm; @@ -909,33 +880,29 @@ static void kbasep_hwcnt_backend_csf_get_physical_enable( } /* CSF backend implementation of kbase_hwcnt_backend_dump_enable_nolock_fn */ -static int kbasep_hwcnt_backend_csf_dump_enable_nolock( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int +kbasep_hwcnt_backend_csf_dump_enable_nolock(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; struct kbase_hwcnt_backend_csf_if_enable enable; int err; - if (!backend_csf || !enable_map || - (enable_map->metadata != backend_csf->info->metadata)) + if (!backend_csf || !enable_map || (enable_map->metadata != backend_csf->info->metadata)) return -EINVAL; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); - kbasep_hwcnt_backend_csf_get_physical_enable(backend_csf, enable_map, - &enable); + kbasep_hwcnt_backend_csf_get_physical_enable(backend_csf, enable_map, &enable); /* enable_state should be DISABLED before we transfer it to enabled */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_DISABLED) return -EIO; - err = backend_csf->info->watchdog_if->enable( - backend_csf->info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS, - kbasep_hwcnt_backend_watchdog_timer_cb, backend_csf->info); + err = backend_csf->info->watchdog_if->enable(backend_csf->info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS, + kbasep_hwcnt_backend_watchdog_timer_cb, + backend_csf->info); if (err) return err; @@ -953,58 +920,46 @@ static int kbasep_hwcnt_backend_csf_dump_enable_nolock( } /* CSF backend implementation of kbase_hwcnt_backend_dump_enable_fn */ -static int kbasep_hwcnt_backend_csf_dump_enable( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_backend_csf_dump_enable(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode; - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct 
kbase_hwcnt_backend_csf *)backend; if (!backend_csf) return -EINVAL; backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); - errcode = kbasep_hwcnt_backend_csf_dump_enable_nolock(backend, - enable_map); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + errcode = kbasep_hwcnt_backend_csf_dump_enable_nolock(backend, enable_map); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return errcode; } static void kbasep_hwcnt_backend_csf_wait_enable_transition_complete( struct kbase_hwcnt_backend_csf *backend_csf, unsigned long *lock_flags) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); - - while ((backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) || - (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, *lock_flags); - - wait_event( - backend_csf->enable_state_waitq, - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) && - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)); - - backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, - lock_flags); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); + + while ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) || + (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)) { + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, *lock_flags); + + wait_event(backend_csf->enable_state_waitq, + (backend_csf->enable_state != + KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) && + (backend_csf->enable_state != + KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED)); + + backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, lock_flags); } } /* CSF backend implementation of kbase_hwcnt_backend_dump_disable_fn */ -static void -kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) +static void kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; bool do_disable = false; WARN_ON(!backend_csf); @@ -1014,24 +969,20 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) /* Make sure we wait until any previous enable or disable have completed * before doing anything. */ - kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, - &flags); + kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, &flags); if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED || - backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { + backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { /* If we are already disabled or in an unrecoverable error * state, there is nothing for us to do. 
*/ - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return; } if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + backend_csf, KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; complete_all(&backend_csf->dump_completed); /* Only disable if we were previously enabled - in all other @@ -1043,15 +994,13 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) WARN_ON(backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE); WARN_ON(!completion_done(&backend_csf->dump_completed)); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Deregister the timer and block until any timer callback has completed. * We've transitioned out of the ENABLED state so we can guarantee it * won't reschedule itself. */ - backend_csf->info->watchdog_if->disable( - backend_csf->info->watchdog_if->timer); + backend_csf->info->watchdog_if->disable(backend_csf->info->watchdog_if->timer); /* Block until any async work has completed. We have transitioned out of * the ENABLED state so we can guarantee no new work will concurrently @@ -1062,11 +1011,9 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); if (do_disable) - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); - kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, - &flags); + kbasep_hwcnt_backend_csf_wait_enable_transition_complete(backend_csf, &flags); switch (backend_csf->enable_state) { case KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER: @@ -1075,8 +1022,7 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) break; case KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER: kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); break; default: WARN_ON(true); @@ -1086,8 +1032,7 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) backend_csf->user_requested = false; backend_csf->watchdog_last_seen_insert_idx = 0; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* After disable, zero the header of all buffers in the ring buffer back * to 0 to prepare for the next enable. @@ -1095,9 +1040,9 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(backend_csf); /* Sync zeroed buffers to avoid coherency issues on future use. */ - backend_csf->info->csf_if->ring_buf_sync( - backend_csf->info->csf_if->ctx, backend_csf->ring_buf, 0, - backend_csf->info->ring_buf_cnt, false); + backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, + backend_csf->ring_buf, 0, + backend_csf->info->ring_buf_cnt, false); /* Reset accumulator, old_sample_buf and user_sample to all-0 to prepare * for next enable. 
@@ -1106,13 +1051,11 @@ kbasep_hwcnt_backend_csf_dump_disable(struct kbase_hwcnt_backend *backend) } /* CSF backend implementation of kbase_hwcnt_backend_dump_request_fn */ -static int -kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns) +static int kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; bool do_request = false; bool watchdog_dumping = false; @@ -1125,22 +1068,18 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * the user dump buffer is already zeroed. We can just short circuit to * the DUMP_COMPLETED state. */ - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; *dump_time_ns = kbasep_hwcnt_backend_csf_timestamp_ns(backend); kbasep_hwcnt_backend_csf_cc_update(backend_csf); backend_csf->user_requested = true; - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return 0; } /* Otherwise, make sure we're already enabled. */ if (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_ENABLED) { - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return -EIO; } @@ -1153,15 +1092,12 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * request can be processed instead of ignored. */ if ((backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) && - (backend_csf->dump_state != - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED)) { + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) && + (backend_csf->dump_state != KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED)) { /* HWC is disabled or another user dump is ongoing, * or we're on fault. */ - backend_csf->info->csf_if->unlock( - backend_csf->info->csf_if->ctx, flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* HWC is disabled or another dump is ongoing, or we are on * fault. */ @@ -1171,8 +1107,7 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, /* Reset the completion so dump_wait() has something to wait on. */ reinit_completion(&backend_csf->dump_completed); - if (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) + if (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) watchdog_dumping = true; if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) && @@ -1180,15 +1115,13 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, /* Only do the request if we are fully enabled and not in * protected mode. 
*/ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_REQUESTED; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_REQUESTED; do_request = true; } else { /* Skip the request and waiting for ack and go straight to * checking the insert and kicking off the worker to do the dump */ - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT; + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_QUERYING_INSERT; } /* CSF firmware might enter protected mode now, but still call request. @@ -1210,31 +1143,26 @@ kbasep_hwcnt_backend_csf_dump_request(struct kbase_hwcnt_backend *backend, * ownership of the sample which watchdog requested. */ if (!watchdog_dumping) - backend_csf->info->csf_if->dump_request( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_request(backend_csf->info->csf_if->ctx); } else kbase_hwcnt_backend_csf_submit_dump_worker(backend_csf->info); - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Modify watchdog timer to delay the regular check time since * just requested. */ - backend_csf->info->watchdog_if->modify( - backend_csf->info->watchdog_if->timer, - HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); + backend_csf->info->watchdog_if->modify(backend_csf->info->watchdog_if->timer, + HWCNT_BACKEND_WATCHDOG_TIMER_INTERVAL_MS); return 0; } /* CSF backend implementation of kbase_hwcnt_backend_dump_wait_fn */ -static int -kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int errcode; if (!backend_csf) @@ -1247,26 +1175,21 @@ kbasep_hwcnt_backend_csf_dump_wait(struct kbase_hwcnt_backend *backend) * set. 
*/ if (backend_csf->user_requested && - ((backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) || - (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED))) + ((backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED) || + (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED))) errcode = 0; else errcode = -EIO; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); return errcode; } /* CSF backend implementation of kbase_hwcnt_backend_dump_clear_fn */ -static int -kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int errcode; u64 ts; @@ -1285,13 +1208,12 @@ kbasep_hwcnt_backend_csf_dump_clear(struct kbase_hwcnt_backend *backend) } /* CSF backend implementation of kbase_hwcnt_backend_dump_get_fn */ -static int kbasep_hwcnt_backend_csf_dump_get( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate) +static int kbasep_hwcnt_backend_csf_dump_get(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map, + bool accumulate) { - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; int ret; size_t clk; @@ -1301,9 +1223,9 @@ static int kbasep_hwcnt_backend_csf_dump_get( return -EINVAL; /* Extract elapsed cycle count for each clock domain if enabled. */ - kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) continue; /* Reset the counter to zero if accumulation is off. */ @@ -1316,8 +1238,7 @@ static int kbasep_hwcnt_backend_csf_dump_get( * as it is undefined to call this function without a prior succeeding * one to dump_wait(). */ - ret = kbase_hwcnt_csf_dump_get(dst, backend_csf->to_user_buf, - dst_enable_map, accumulate); + ret = kbase_hwcnt_csf_dump_get(dst, backend_csf->to_user_buf, dst_enable_map, accumulate); return ret; } @@ -1329,8 +1250,7 @@ static int kbasep_hwcnt_backend_csf_dump_get( * Can be safely called on a backend in any state of partial construction. * */ -static void -kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) +static void kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) { if (!backend_csf) return; @@ -1360,9 +1280,8 @@ kbasep_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_csf *backend_csf) * * Return: 0 on success, else error code. 
*/ -static int -kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, - struct kbase_hwcnt_backend_csf **out_backend) +static int kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, + struct kbase_hwcnt_backend_csf **out_backend) { struct kbase_hwcnt_backend_csf *backend_csf = NULL; int errcode = -ENOMEM; @@ -1375,27 +1294,23 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, goto alloc_error; backend_csf->info = csf_info; - kbasep_hwcnt_backend_csf_init_layout(&csf_info->prfcnt_info, - &backend_csf->phys_layout); + kbasep_hwcnt_backend_csf_init_layout(&csf_info->prfcnt_info, &backend_csf->phys_layout); - backend_csf->accum_buf = - kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); + backend_csf->accum_buf = kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend_csf->accum_buf) goto err_alloc_acc_buf; - backend_csf->old_sample_buf = - kzalloc(csf_info->prfcnt_info.dump_bytes, GFP_KERNEL); + backend_csf->old_sample_buf = kzalloc(csf_info->prfcnt_info.dump_bytes, GFP_KERNEL); if (!backend_csf->old_sample_buf) goto err_alloc_pre_sample_buf; - backend_csf->to_user_buf = - kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); + backend_csf->to_user_buf = kzalloc(csf_info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend_csf->to_user_buf) goto err_alloc_user_sample_buf; - errcode = csf_info->csf_if->ring_buf_alloc( - csf_info->csf_if->ctx, csf_info->ring_buf_cnt, - &backend_csf->ring_buf_cpu_base, &backend_csf->ring_buf); + errcode = csf_info->csf_if->ring_buf_alloc(csf_info->csf_if->ctx, csf_info->ring_buf_cnt, + &backend_csf->ring_buf_cpu_base, + &backend_csf->ring_buf); if (errcode) goto err_ring_buf_alloc; errcode = -ENOMEM; @@ -1404,9 +1319,9 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, kbasep_hwcnt_backend_csf_zero_all_prfcnt_en_header(backend_csf); /* Sync zeroed buffers to avoid coherency issues on use. 
*/ - backend_csf->info->csf_if->ring_buf_sync( - backend_csf->info->csf_if->ctx, backend_csf->ring_buf, 0, - backend_csf->info->ring_buf_cnt, false); + backend_csf->info->csf_if->ring_buf_sync(backend_csf->info->csf_if->ctx, + backend_csf->ring_buf, 0, + backend_csf->info->ring_buf_cnt, false); init_completion(&backend_csf->dump_completed); @@ -1420,10 +1335,8 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, if (!backend_csf->hwc_dump_workq) goto err_alloc_workqueue; - INIT_WORK(&backend_csf->hwc_dump_work, - kbasep_hwcnt_backend_csf_dump_worker); - INIT_WORK(&backend_csf->hwc_threshold_work, - kbasep_hwcnt_backend_csf_threshold_worker); + INIT_WORK(&backend_csf->hwc_dump_work, kbasep_hwcnt_backend_csf_dump_worker); + INIT_WORK(&backend_csf->hwc_threshold_work, kbasep_hwcnt_backend_csf_threshold_worker); backend_csf->enable_state = KBASE_HWCNT_BACKEND_CSF_DISABLED; backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; @@ -1434,7 +1347,6 @@ kbasep_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_info *csf_info, *out_backend = backend_csf; return 0; - destroy_workqueue(backend_csf->hwc_dump_workq); err_alloc_workqueue: backend_csf->info->csf_if->ring_buf_free(backend_csf->info->csf_if->ctx, backend_csf->ring_buf); @@ -1454,14 +1366,12 @@ alloc_error: } /* CSF backend implementation of kbase_hwcnt_backend_init_fn */ -static int -kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend) +static int kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf *backend_csf = NULL; - struct kbase_hwcnt_backend_csf_info *csf_info = - (struct kbase_hwcnt_backend_csf_info *)info; + struct kbase_hwcnt_backend_csf_info *csf_info = (struct kbase_hwcnt_backend_csf_info *)info; int errcode; bool success = false; @@ -1482,11 +1392,9 @@ kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, *out_backend = (struct kbase_hwcnt_backend *)backend_csf; success = true; if (csf_info->unrecoverable_error_happened) - backend_csf->enable_state = - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR; + backend_csf->enable_state = KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR; } - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); /* Destroy the new created backend if the backend has already created * before. 
In normal case, this won't happen if the client call init() @@ -1503,9 +1411,8 @@ kbasep_hwcnt_backend_csf_init(const struct kbase_hwcnt_backend_info *info, /* CSF backend implementation of kbase_hwcnt_backend_term_fn */ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) { - unsigned long flags; - struct kbase_hwcnt_backend_csf *backend_csf = - (struct kbase_hwcnt_backend_csf *)backend; + unsigned long flags = 0UL; + struct kbase_hwcnt_backend_csf *backend_csf = (struct kbase_hwcnt_backend_csf *)backend; if (!backend) return; @@ -1517,8 +1424,7 @@ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) */ backend_csf->info->csf_if->lock(backend_csf->info->csf_if->ctx, &flags); backend_csf->info->backend = NULL; - backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, - flags); + backend_csf->info->csf_if->unlock(backend_csf->info->csf_if->ctx, flags); kbasep_hwcnt_backend_csf_destroy(backend_csf); } @@ -1530,8 +1436,7 @@ static void kbasep_hwcnt_backend_csf_term(struct kbase_hwcnt_backend *backend) * Can be safely called on a backend info in any state of partial construction. * */ -static void kbasep_hwcnt_backend_csf_info_destroy( - const struct kbase_hwcnt_backend_csf_info *info) +static void kbasep_hwcnt_backend_csf_info_destroy(const struct kbase_hwcnt_backend_csf_info *info) { if (!info) return; @@ -1558,10 +1463,10 @@ static void kbasep_hwcnt_backend_csf_info_destroy( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_csf_info_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - const struct kbase_hwcnt_backend_csf_info **out_info) +static int +kbasep_hwcnt_backend_csf_info_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + const struct kbase_hwcnt_backend_csf_info **out_info) { struct kbase_hwcnt_backend_csf_info *info = NULL; @@ -1584,8 +1489,7 @@ static int kbasep_hwcnt_backend_csf_info_create( .counter_set = KBASE_HWCNT_SET_PRIMARY, #endif .backend = NULL, .csf_if = csf_if, .ring_buf_cnt = ring_buf_cnt, - .fw_in_protected_mode = false, - .unrecoverable_error_happened = false, + .fw_in_protected_mode = false, .unrecoverable_error_happened = false, .watchdog_if = watchdog_if, }; *out_info = info; @@ -1605,19 +1509,17 @@ kbasep_hwcnt_backend_csf_metadata(const struct kbase_hwcnt_backend_info *info) return ((const struct kbase_hwcnt_backend_csf_info *)info)->metadata; } -static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_handle_unrecoverable_error(struct kbase_hwcnt_backend_csf *backend_csf) { bool do_disable = false; - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); /* We are already in or transitioning to the unrecoverable error state. * Early out. 
*/ - if ((backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) || + if ((backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) || (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER)) return; @@ -1627,8 +1529,7 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( */ if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR); return; } @@ -1636,12 +1537,11 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( * disabled, we don't want to disable twice if an unrecoverable error * happens while we are disabling. */ - do_disable = (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + do_disable = + (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER); + backend_csf, KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR_WAIT_FOR_WORKER); /* Transition the dump to the IDLE state and unblock any waiters. The * IDLE state signifies an error. @@ -1654,15 +1554,13 @@ static void kbasep_hwcnt_backend_csf_handle_unrecoverable_error( * happens while we are disabling. */ if (do_disable) - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); } -static void kbasep_hwcnt_backend_csf_handle_recoverable_error( - struct kbase_hwcnt_backend_csf *backend_csf) +static void +kbasep_hwcnt_backend_csf_handle_recoverable_error(struct kbase_hwcnt_backend_csf *backend_csf) { - backend_csf->info->csf_if->assert_lock_held( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->assert_lock_held(backend_csf->info->csf_if->ctx); switch (backend_csf->enable_state) { case KBASE_HWCNT_BACKEND_CSF_DISABLED: @@ -1678,8 +1576,7 @@ static void kbasep_hwcnt_backend_csf_handle_recoverable_error( /* A seemingly recoverable error that occurs while we are * transitioning to enabled is probably unrecoverable. */ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - backend_csf); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(backend_csf); return; case KBASE_HWCNT_BACKEND_CSF_ENABLED: /* Start transitioning to the disabled state. We can't wait for @@ -1688,22 +1585,19 @@ static void kbasep_hwcnt_backend_csf_handle_recoverable_error( * disable(). */ kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); + backend_csf, KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED); /* Transition the dump to the IDLE state and unblock any * waiters. The IDLE state signifies an error. 
*/ backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_IDLE; complete_all(&backend_csf->dump_completed); - backend_csf->info->csf_if->dump_disable( - backend_csf->info->csf_if->ctx); + backend_csf->info->csf_if->dump_disable(backend_csf->info->csf_if->ctx); return; } } -void kbase_hwcnt_backend_csf_protm_entered( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_protm_entered(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info = (struct kbase_hwcnt_backend_csf_info *)iface->info; @@ -1717,8 +1611,7 @@ void kbase_hwcnt_backend_csf_protm_entered( kbase_hwcnt_backend_csf_on_prfcnt_sample(iface); } -void kbase_hwcnt_backend_csf_protm_exited( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_protm_exited(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1728,10 +1621,9 @@ void kbase_hwcnt_backend_csf_protm_exited( csf_info->fw_in_protected_mode = false; } -void kbase_hwcnt_backend_csf_on_unrecoverable_error( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_unrecoverable_error(struct kbase_hwcnt_backend_interface *iface) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf_info *csf_info; csf_info = (struct kbase_hwcnt_backend_csf_info *)iface->info; @@ -1749,10 +1641,9 @@ void kbase_hwcnt_backend_csf_on_unrecoverable_error( csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); } -void kbase_hwcnt_backend_csf_on_before_reset( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_before_reset(struct kbase_hwcnt_backend_interface *iface) { - unsigned long flags; + unsigned long flags = 0UL; struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1768,8 +1659,7 @@ void kbase_hwcnt_backend_csf_on_before_reset( backend_csf = csf_info->backend; if ((backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_DISABLED) && - (backend_csf->enable_state != - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR)) { + (backend_csf->enable_state != KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR)) { /* Before a reset occurs, we must either have been disabled * (else we lose data) or we should have encountered an * unrecoverable error. Either way, we will have disabled the @@ -1780,13 +1670,11 @@ void kbase_hwcnt_backend_csf_on_before_reset( * We can't wait for this disable to complete, but it doesn't * really matter, the power is being pulled. 
*/ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } /* A reset is the only way to exit the unrecoverable error state */ - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_UNRECOVERABLE_ERROR) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( backend_csf, KBASE_HWCNT_BACKEND_CSF_DISABLED); } @@ -1794,8 +1682,7 @@ void kbase_hwcnt_backend_csf_on_before_reset( csf_info->csf_if->unlock(csf_info->csf_if->ctx, flags); } -void kbase_hwcnt_backend_csf_on_prfcnt_sample( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_sample(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1809,10 +1696,8 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( backend_csf = csf_info->backend; /* Skip the dump_work if it's a watchdog request. */ - if (backend_csf->dump_state == - KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) { - backend_csf->dump_state = - KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; + if (backend_csf->dump_state == KBASE_HWCNT_BACKEND_CSF_DUMP_WATCHDOG_REQUESTED) { + backend_csf->dump_state = KBASE_HWCNT_BACKEND_CSF_DUMP_COMPLETED; return; } @@ -1826,8 +1711,7 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( kbase_hwcnt_backend_csf_submit_dump_worker(csf_info); } -void kbase_hwcnt_backend_csf_on_prfcnt_threshold( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_threshold(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1844,12 +1728,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_threshold( /* Submit the threshold work into the work queue to consume the * available samples. */ - queue_work(backend_csf->hwc_dump_workq, - &backend_csf->hwc_threshold_work); + queue_work(backend_csf->hwc_dump_workq, &backend_csf->hwc_threshold_work); } -void kbase_hwcnt_backend_csf_on_prfcnt_overflow( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_overflow(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1870,8 +1752,7 @@ void kbase_hwcnt_backend_csf_on_prfcnt_overflow( kbasep_hwcnt_backend_csf_handle_recoverable_error(csf_info->backend); } -void kbase_hwcnt_backend_csf_on_prfcnt_enable( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_enable(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1884,12 +1765,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_enable( return; backend_csf = csf_info->backend; - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_ENABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( backend_csf, KBASE_HWCNT_BACKEND_CSF_ENABLED); - } else if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_ENABLED) { + } else if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_ENABLED) { /* Unexpected, but we are already in the right state so just * ignore it. */ @@ -1897,13 +1776,11 @@ void kbase_hwcnt_backend_csf_on_prfcnt_enable( /* Unexpected state change, assume everything is broken until * we reset. 
*/ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } } -void kbase_hwcnt_backend_csf_on_prfcnt_disable( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_on_prfcnt_disable(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_backend_csf *backend_csf; @@ -1916,13 +1793,10 @@ void kbase_hwcnt_backend_csf_on_prfcnt_disable( return; backend_csf = csf_info->backend; - if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED) { + if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_TRANSITIONING_TO_DISABLED) { kbasep_hwcnt_backend_csf_change_es_and_wake_waiters( - backend_csf, - KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER); - } else if (backend_csf->enable_state == - KBASE_HWCNT_BACKEND_CSF_DISABLED) { + backend_csf, KBASE_HWCNT_BACKEND_CSF_DISABLED_WAIT_FOR_WORKER); + } else if (backend_csf->enable_state == KBASE_HWCNT_BACKEND_CSF_DISABLED) { /* Unexpected, but we are already in the right state so just * ignore it. */ @@ -1930,15 +1804,12 @@ void kbase_hwcnt_backend_csf_on_prfcnt_disable( /* Unexpected state change, assume everything is broken until * we reset. */ - kbasep_hwcnt_backend_csf_handle_unrecoverable_error( - csf_info->backend); + kbasep_hwcnt_backend_csf_handle_unrecoverable_error(csf_info->backend); } } -int kbase_hwcnt_backend_csf_metadata_init( - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_csf_metadata_init(struct kbase_hwcnt_backend_interface *iface) { - int errcode; struct kbase_hwcnt_backend_csf_info *csf_info; struct kbase_hwcnt_gpu_info gpu_info; @@ -1949,8 +1820,7 @@ int kbase_hwcnt_backend_csf_metadata_init( WARN_ON(!csf_info->csf_if->get_prfcnt_info); - csf_info->csf_if->get_prfcnt_info(csf_info->csf_if->ctx, - &csf_info->prfcnt_info); + csf_info->csf_if->get_prfcnt_info(csf_info->csf_if->ctx, &csf_info->prfcnt_info); /* The clock domain counts should not exceed the number of maximum * number of clock regulators. @@ -1962,25 +1832,12 @@ int kbase_hwcnt_backend_csf_metadata_init( gpu_info.core_mask = csf_info->prfcnt_info.core_mask; gpu_info.clk_cnt = csf_info->prfcnt_info.clk_cnt; gpu_info.prfcnt_values_per_block = - csf_info->prfcnt_info.prfcnt_block_size / - KBASE_HWCNT_VALUE_HW_BYTES; - errcode = kbase_hwcnt_csf_metadata_create( - &gpu_info, csf_info->counter_set, &csf_info->metadata); - if (errcode) - return errcode; - - /* - * Dump abstraction size should be exactly twice the size and layout as - * the physical dump size since 64-bit per value used in metadata. 
- */ - WARN_ON(csf_info->prfcnt_info.dump_bytes * 2 != - csf_info->metadata->dump_buf_bytes); - - return 0; + csf_info->prfcnt_info.prfcnt_block_size / KBASE_HWCNT_VALUE_HW_BYTES; + return kbase_hwcnt_csf_metadata_create(&gpu_info, csf_info->counter_set, + &csf_info->metadata); } -void kbase_hwcnt_backend_csf_metadata_term( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_csf_metadata_term(struct kbase_hwcnt_backend_interface *iface) { struct kbase_hwcnt_backend_csf_info *csf_info; @@ -1994,10 +1851,9 @@ void kbase_hwcnt_backend_csf_metadata_term( } } -int kbase_hwcnt_backend_csf_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + struct kbase_hwcnt_backend_interface *iface) { int errcode; const struct kbase_hwcnt_backend_csf_info *info = NULL; @@ -2009,8 +1865,7 @@ int kbase_hwcnt_backend_csf_create( if (!is_power_of_2(ring_buf_cnt)) return -EINVAL; - errcode = kbasep_hwcnt_backend_csf_info_create(csf_if, ring_buf_cnt, - watchdog_if, &info); + errcode = kbasep_hwcnt_backend_csf_info_create(csf_if, ring_buf_cnt, watchdog_if, &info); if (errcode) return errcode; diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.h index e0cafbe..9c5a5c9 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,9 +27,9 @@ #ifndef _KBASE_HWCNT_BACKEND_CSF_H_ #define _KBASE_HWCNT_BACKEND_CSF_H_ -#include "mali_kbase_hwcnt_backend.h" -#include "mali_kbase_hwcnt_backend_csf_if.h" -#include "mali_kbase_hwcnt_watchdog_if.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if.h" /** * kbase_hwcnt_backend_csf_create() - Create a CSF hardware counter backend @@ -47,10 +47,9 @@ * * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_create( - struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, - struct kbase_hwcnt_watchdog_interface *watchdog_if, - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_csf_create(struct kbase_hwcnt_backend_csf_if *csf_if, u32 ring_buf_cnt, + struct kbase_hwcnt_watchdog_interface *watchdog_if, + struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_metadata_init() - Initialize the metadata for a CSF @@ -58,16 +57,14 @@ int kbase_hwcnt_backend_csf_create( * @iface: Non-NULL pointer to backend interface structure * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_metadata_init( - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_csf_metadata_init(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_metadata_term() - Terminate the metadata for a CSF * hardware counter backend. * @iface: Non-NULL pointer to backend interface structure. 
*/ -void kbase_hwcnt_backend_csf_metadata_term( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_metadata_term(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_destroy() - Destroy a CSF hardware counter backend @@ -77,8 +74,7 @@ void kbase_hwcnt_backend_csf_metadata_term( * Can be safely called on an all-zeroed interface, or on an already destroyed * interface. */ -void kbase_hwcnt_backend_csf_destroy( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_destroy(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_protm_entered() - CSF HWC backend function to receive @@ -86,8 +82,7 @@ void kbase_hwcnt_backend_csf_destroy( * has been entered. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_protm_entered( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_protm_entered(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_protm_exited() - CSF HWC backend function to receive @@ -95,8 +90,7 @@ void kbase_hwcnt_backend_csf_protm_entered( * been exited. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_protm_exited( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_protm_exited(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_unrecoverable_error() - CSF HWC backend function @@ -108,8 +102,7 @@ void kbase_hwcnt_backend_csf_protm_exited( * with reset, or that may put HWC logic in state that could result in hang. For * example, on bus error, or when FW becomes unresponsive. */ -void kbase_hwcnt_backend_csf_on_unrecoverable_error( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_unrecoverable_error(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_before_reset() - CSF HWC backend function to be @@ -119,16 +112,14 @@ void kbase_hwcnt_backend_csf_on_unrecoverable_error( * were in it. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_before_reset( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_before_reset(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_sample() - CSF performance counter sample * complete interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_sample( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_sample(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_threshold() - CSF performance counter @@ -136,31 +127,27 @@ void kbase_hwcnt_backend_csf_on_prfcnt_sample( * interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_threshold( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_threshold(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_overflow() - CSF performance counter buffer * overflow interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_overflow( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_overflow(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_enable() - CSF performance counter enabled * interrupt handler. 
* @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_enable( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_enable(struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_csf_on_prfcnt_disable() - CSF performance counter * disabled interrupt handler. * @iface: Non-NULL pointer to HWC backend interface. */ -void kbase_hwcnt_backend_csf_on_prfcnt_disable( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_csf_on_prfcnt_disable(struct kbase_hwcnt_backend_interface *iface); #endif /* _KBASE_HWCNT_BACKEND_CSF_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h index 9c4fef5..382a3ad 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -55,8 +55,12 @@ struct kbase_hwcnt_backend_csf_if_enable { /** * struct kbase_hwcnt_backend_csf_if_prfcnt_info - Performance counter * information. + * @prfcnt_hw_size: Total length in bytes of all the hardware counters data. The hardware + * counters are sub-divided into 4 classes: front-end, shader, tiler, and + * memory system (l2 cache + MMU). + * @prfcnt_fw_size: Total length in bytes of all the firmware counters data. * @dump_bytes: Bytes of GPU memory required to perform a performance - * counter dump. + * counter dump. dump_bytes = prfcnt_hw_size + prfcnt_fw_size. * @prfcnt_block_size: Bytes of each performance counter block. * @l2_count: The MMU L2 cache count. * @core_mask: Shader core mask. @@ -65,6 +69,8 @@ struct kbase_hwcnt_backend_csf_if_enable { * is taken. */ struct kbase_hwcnt_backend_csf_if_prfcnt_info { + size_t prfcnt_hw_size; + size_t prfcnt_fw_size; size_t dump_bytes; size_t prfcnt_block_size; size_t l2_count; @@ -79,8 +85,8 @@ struct kbase_hwcnt_backend_csf_if_prfcnt_info { * held. * @ctx: Non-NULL pointer to a CSF context. */ -typedef void kbase_hwcnt_backend_csf_if_assert_lock_held_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void +kbase_hwcnt_backend_csf_if_assert_lock_held_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_lock_fn - Acquire backend spinlock. @@ -89,9 +95,8 @@ typedef void kbase_hwcnt_backend_csf_if_assert_lock_held_fn( * @flags: Pointer to the memory location that would store the previous * interrupt state. */ -typedef void kbase_hwcnt_backend_csf_if_lock_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long *flags); +typedef void kbase_hwcnt_backend_csf_if_lock_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long *flags); /** * typedef kbase_hwcnt_backend_csf_if_unlock_fn - Release backend spinlock. @@ -100,9 +105,8 @@ typedef void kbase_hwcnt_backend_csf_if_lock_fn( * @flags: Previously stored interrupt state when Scheduler interrupt * spinlock was acquired. 
*/ -typedef void kbase_hwcnt_backend_csf_if_unlock_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long flags); +typedef void kbase_hwcnt_backend_csf_if_unlock_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long flags); /** * typedef kbase_hwcnt_backend_csf_if_get_prfcnt_info_fn - Get performance @@ -131,10 +135,10 @@ typedef void kbase_hwcnt_backend_csf_if_get_prfcnt_info_fn( * * Return: 0 on success, else error code. */ -typedef int kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, - void **cpu_dump_base, - struct kbase_hwcnt_backend_csf_if_ring_buf **ring_buf); +typedef int +kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 buf_count, void **cpu_dump_base, + struct kbase_hwcnt_backend_csf_if_ring_buf **ring_buf); /** * typedef kbase_hwcnt_backend_csf_if_ring_buf_sync_fn - Sync HWC dump buffers @@ -153,10 +157,10 @@ typedef int kbase_hwcnt_backend_csf_if_ring_buf_alloc_fn( * Flush cached HWC dump buffer data to ensure that all writes from GPU and CPU * are correctly observed. */ -typedef void kbase_hwcnt_backend_csf_if_ring_buf_sync_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - u32 buf_index_first, u32 buf_index_last, bool for_cpu); +typedef void +kbase_hwcnt_backend_csf_if_ring_buf_sync_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + u32 buf_index_first, u32 buf_index_last, bool for_cpu); /** * typedef kbase_hwcnt_backend_csf_if_ring_buf_free_fn - Free a ring buffer for @@ -165,9 +169,9 @@ typedef void kbase_hwcnt_backend_csf_if_ring_buf_sync_fn( * @ctx: Non-NULL pointer to a CSF interface context. * @ring_buf: Non-NULL pointer to the ring buffer which to be freed. */ -typedef void kbase_hwcnt_backend_csf_if_ring_buf_free_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf); +typedef void +kbase_hwcnt_backend_csf_if_ring_buf_free_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf); /** * typedef kbase_hwcnt_backend_csf_if_timestamp_ns_fn - Get the current @@ -177,8 +181,7 @@ typedef void kbase_hwcnt_backend_csf_if_ring_buf_free_fn( * * Return: CSF interface timestamp in nanoseconds. */ -typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_dump_enable_fn - Setup and enable hardware @@ -189,10 +192,10 @@ typedef u64 kbase_hwcnt_backend_csf_if_timestamp_ns_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_dump_enable_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - struct kbase_hwcnt_backend_csf_if_enable *enable); +typedef void +kbase_hwcnt_backend_csf_if_dump_enable_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + struct kbase_hwcnt_backend_csf_if_enable *enable); /** * typedef kbase_hwcnt_backend_csf_if_dump_disable_fn - Disable hardware counter @@ -201,8 +204,7 @@ typedef void kbase_hwcnt_backend_csf_if_dump_enable_fn( * * Requires lock to be taken before calling. 
*/ -typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_dump_request_fn - Request a HWC dump. @@ -211,8 +213,7 @@ typedef void kbase_hwcnt_backend_csf_if_dump_disable_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_dump_request_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx); +typedef void kbase_hwcnt_backend_csf_if_dump_request_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx); /** * typedef kbase_hwcnt_backend_csf_if_get_indexes_fn - Get current extract and @@ -225,9 +226,8 @@ typedef void kbase_hwcnt_backend_csf_if_dump_request_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 *extract_index, - u32 *insert_index); +typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 *extract_index, u32 *insert_index); /** * typedef kbase_hwcnt_backend_csf_if_set_extract_index_fn - Update the extract @@ -239,8 +239,9 @@ typedef void kbase_hwcnt_backend_csf_if_get_indexes_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_set_extract_index_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 extract_index); +typedef void +kbase_hwcnt_backend_csf_if_set_extract_index_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 extract_index); /** * typedef kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn - Get the current @@ -254,9 +255,9 @@ typedef void kbase_hwcnt_backend_csf_if_set_extract_index_fn( * * Requires lock to be taken before calling. */ -typedef void kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u64 *cycle_counts, - u64 clk_enable_map); +typedef void +kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u64 *cycle_counts, u64 clk_enable_map); /** * struct kbase_hwcnt_backend_csf_if - Hardware counter backend CSF virtual @@ -273,8 +274,6 @@ typedef void kbase_hwcnt_backend_csf_if_get_gpu_cycle_count_fn( * @timestamp_ns: Function ptr to get the current CSF interface * timestamp. * @dump_enable: Function ptr to enable dumping. - * @dump_enable_nolock: Function ptr to enable dumping while the - * backend-specific spinlock is already held. * @dump_disable: Function ptr to disable dumping. * @dump_request: Function ptr to request a dump. * @get_indexes: Function ptr to get extract and insert indexes of the diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.c index 15ffbfa..c8cf934 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,24 +26,19 @@ #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_regmap.h> #include <device/mali_kbase_device.h> -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <csf/mali_kbase_csf_registers.h> #include "csf/mali_kbase_csf_firmware.h" -#include "mali_kbase_hwcnt_backend_csf_if_fw.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h" #include "mali_kbase_hwaccess_time.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" +#include <backend/gpu/mali_kbase_model_linux.h> #include <linux/log2.h> #include "mali_kbase_ccswe.h" -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ - -/** The number of nanoseconds in a second. */ -#define NSECS_IN_SEC 1000000000ull /* ns */ /* Ring buffer virtual address start at 4GB */ #define KBASE_HWC_CSF_RING_BUFFER_VA_START (1ull << 32) @@ -90,8 +85,8 @@ struct kbase_hwcnt_backend_csf_if_fw_ctx { struct kbase_ccswe ccswe_shader_cores; }; -static void kbasep_hwcnt_backend_csf_if_fw_assert_lock_held( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_assert_lock_held(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -104,9 +99,10 @@ static void kbasep_hwcnt_backend_csf_if_fw_assert_lock_held( kbase_csf_scheduler_spin_lock_assert_held(kbdev); } -static void -kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, - unsigned long *flags) +static void kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long *flags) + __acquires(&(struct kbase_hwcnt_backend_csf_if_fw_ctx) + ctx->kbdev->csf.scheduler.interrupt_lock) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -119,8 +115,10 @@ kbasep_hwcnt_backend_csf_if_fw_lock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, kbase_csf_scheduler_spin_lock(kbdev, flags); } -static void kbasep_hwcnt_backend_csf_if_fw_unlock( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, unsigned long flags) +static void kbasep_hwcnt_backend_csf_if_fw_unlock(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + unsigned long flags) + __releases(&(struct kbase_hwcnt_backend_csf_if_fw_ctx) + ctx->kbdev->csf.scheduler.interrupt_lock) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; @@ -141,22 +139,19 @@ static void kbasep_hwcnt_backend_csf_if_fw_unlock( * @clk_index: Clock index * @clk_rate_hz: Clock frequency(hz) */ -static void kbasep_hwcnt_backend_csf_if_fw_on_freq_change( - struct kbase_clk_rate_listener *rate_listener, u32 clk_index, - u32 clk_rate_hz) +static void +kbasep_hwcnt_backend_csf_if_fw_on_freq_change(struct kbase_clk_rate_listener *rate_listener, + u32 clk_index, u32 clk_rate_hz) { - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = - container_of(rate_listener, - struct kbase_hwcnt_backend_csf_if_fw_ctx, - rate_listener); + struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = container_of( + rate_listener, struct kbase_hwcnt_backend_csf_if_fw_ctx, rate_listener); u64 timestamp_ns; if (clk_index != KBASE_CLOCK_DOMAIN_SHADER_CORES) return; timestamp_ns = ktime_get_raw_ns(); - kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, - 
clk_rate_hz); + kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, clk_rate_hz); } /** @@ -165,17 +160,16 @@ static void kbasep_hwcnt_backend_csf_if_fw_on_freq_change( * @fw_ctx: Non-NULL pointer to CSF firmware interface context. * @clk_enable_map: Non-NULL pointer to enable map specifying enabled counters. */ -static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx, u64 clk_enable_map) +static void +kbasep_hwcnt_backend_csf_if_fw_cc_enable(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx, + u64 clk_enable_map) { struct kbase_device *kbdev = fw_ctx->kbdev; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { /* software estimation for non-top clock domains */ struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; - const struct kbase_clk_data *clk_data = - rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; + const struct kbase_clk_data *clk_data = rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; u32 cur_freq; unsigned long flags; u64 timestamp_ns; @@ -186,11 +180,9 @@ static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( cur_freq = (u32)clk_data->clock_val; kbase_ccswe_reset(&fw_ctx->ccswe_shader_cores); - kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, - timestamp_ns, cur_freq); + kbase_ccswe_freq_change(&fw_ctx->ccswe_shader_cores, timestamp_ns, cur_freq); - kbase_clk_rate_trace_manager_subscribe_no_lock( - rtm, &fw_ctx->rate_listener); + kbase_clk_rate_trace_manager_subscribe_no_lock(rtm, &fw_ctx->rate_listener); spin_unlock_irqrestore(&rtm->lock, flags); } @@ -203,17 +195,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_cc_enable( * * @fw_ctx: Non-NULL pointer to CSF firmware interface context. 
*/ -static void kbasep_hwcnt_backend_csf_if_fw_cc_disable( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_cc_disable(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) { struct kbase_device *kbdev = fw_ctx->kbdev; struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; u64 clk_enable_map = fw_ctx->clk_enable_map; - if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, - KBASE_CLOCK_DOMAIN_SHADER_CORES)) - kbase_clk_rate_trace_manager_unsubscribe( - rtm, &fw_ctx->rate_listener); + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) + kbase_clk_rate_trace_manager_unsubscribe(rtm, &fw_ctx->rate_listener); } static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( @@ -221,32 +211,31 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( struct kbase_hwcnt_backend_csf_if_prfcnt_info *prfcnt_info) { #if IS_ENABLED(CONFIG_MALI_NO_MALI) - size_t dummy_model_blk_count; struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; - prfcnt_info->l2_count = KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS; - prfcnt_info->core_mask = - (1ull << KBASE_DUMMY_MODEL_MAX_SHADER_CORES) - 1; - /* 1 FE block + 1 Tiler block + l2_count blocks + shader_core blocks */ - dummy_model_blk_count = - 2 + prfcnt_info->l2_count + fls64(prfcnt_info->core_mask); - prfcnt_info->dump_bytes = - dummy_model_blk_count * KBASE_DUMMY_MODEL_BLOCK_SIZE; - prfcnt_info->prfcnt_block_size = - KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * - KBASE_HWCNT_VALUE_HW_BYTES; - prfcnt_info->clk_cnt = 1; - prfcnt_info->clearing_samples = true; + *prfcnt_info = (struct kbase_hwcnt_backend_csf_if_prfcnt_info){ + .l2_count = KBASE_DUMMY_MODEL_MAX_MEMSYS_BLOCKS, + .core_mask = (1ull << KBASE_DUMMY_MODEL_MAX_SHADER_CORES) - 1, + .prfcnt_hw_size = + KBASE_DUMMY_MODEL_MAX_NUM_HARDWARE_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE, + .prfcnt_fw_size = + KBASE_DUMMY_MODEL_MAX_FIRMWARE_BLOCKS * KBASE_DUMMY_MODEL_BLOCK_SIZE, + .dump_bytes = KBASE_DUMMY_MODEL_MAX_SAMPLE_SIZE, + .prfcnt_block_size = KBASE_DUMMY_MODEL_BLOCK_SIZE, + .clk_cnt = 1, + .clearing_samples = true, + }; + fw_ctx->buf_bytes = prfcnt_info->dump_bytes; #else struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx; struct kbase_device *kbdev; u32 prfcnt_size; - u32 prfcnt_hw_size = 0; - u32 prfcnt_fw_size = 0; - u32 prfcnt_block_size = KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * - KBASE_HWCNT_VALUE_HW_BYTES; + u32 prfcnt_hw_size; + u32 prfcnt_fw_size; + u32 prfcnt_block_size = + KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK * KBASE_HWCNT_VALUE_HW_BYTES; WARN_ON(!ctx); WARN_ON(!prfcnt_info); @@ -254,8 +243,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; kbdev = fw_ctx->kbdev; prfcnt_size = kbdev->csf.global_iface.prfcnt_size; - prfcnt_hw_size = (prfcnt_size & 0xFF) << 8; - prfcnt_fw_size = (prfcnt_size >> 16) << 8; + prfcnt_hw_size = GLB_PRFCNT_SIZE_HARDWARE_SIZE_GET(prfcnt_size); + prfcnt_fw_size = GLB_PRFCNT_SIZE_FIRMWARE_SIZE_GET(prfcnt_size); fw_ctx->buf_bytes = prfcnt_hw_size + prfcnt_fw_size; /* Read the block size if the GPU has the register PRFCNT_FEATURES @@ -263,33 +252,31 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info( */ if ((kbdev->gpu_props.props.raw_props.gpu_id & GPU_ID2_PRODUCT_MODEL) >= GPU_ID2_PRODUCT_TTUX) { - prfcnt_block_size = - PRFCNT_FEATURES_COUNTER_BLOCK_SIZE_GET(kbase_reg_read( - kbdev, GPU_CONTROL_REG(PRFCNT_FEATURES))) - << 8; + prfcnt_block_size = 
PRFCNT_FEATURES_COUNTER_BLOCK_SIZE_GET( + kbase_reg_read(kbdev, GPU_CONTROL_REG(PRFCNT_FEATURES))) + << 8; } - prfcnt_info->dump_bytes = fw_ctx->buf_bytes; - prfcnt_info->prfcnt_block_size = prfcnt_block_size; - prfcnt_info->l2_count = kbdev->gpu_props.props.l2_props.num_l2_slices; - prfcnt_info->core_mask = - kbdev->gpu_props.props.coherency_info.group[0].core_mask; - - prfcnt_info->clk_cnt = fw_ctx->clk_cnt; - prfcnt_info->clearing_samples = true; + *prfcnt_info = (struct kbase_hwcnt_backend_csf_if_prfcnt_info){ + .prfcnt_hw_size = prfcnt_hw_size, + .prfcnt_fw_size = prfcnt_fw_size, + .dump_bytes = fw_ctx->buf_bytes, + .prfcnt_block_size = prfcnt_block_size, + .l2_count = kbdev->gpu_props.props.l2_props.num_l2_slices, + .core_mask = kbdev->gpu_props.props.coherency_info.group[0].core_mask, + .clk_cnt = fw_ctx->clk_cnt, + .clearing_samples = true, + }; /* Block size must be multiple of counter size. */ - WARN_ON((prfcnt_info->prfcnt_block_size % KBASE_HWCNT_VALUE_HW_BYTES) != - 0); + WARN_ON((prfcnt_info->prfcnt_block_size % KBASE_HWCNT_VALUE_HW_BYTES) != 0); /* Total size must be multiple of block size. */ - WARN_ON((prfcnt_info->dump_bytes % prfcnt_info->prfcnt_block_size) != - 0); + WARN_ON((prfcnt_info->dump_bytes % prfcnt_info->prfcnt_block_size) != 0); #endif } static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, - void **cpu_dump_base, + struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 buf_count, void **cpu_dump_base, struct kbase_hwcnt_backend_csf_if_ring_buf **out_ring_buf) { struct kbase_device *kbdev; @@ -341,9 +328,8 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( goto page_list_alloc_error; /* Get physical page for the buffer */ - ret = kbase_mem_pool_alloc_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, - phys, false); + ret = kbase_mem_pool_alloc_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, + phys, false, NULL); if (ret != num_pages) goto phys_mem_pool_alloc_error; @@ -359,16 +345,19 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( KBASE_REG_MEMATTR_INDEX(AS_MEMATTR_INDEX_NON_CACHEABLE); /* Update MMU table */ - ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, - gpu_va_base >> PAGE_SHIFT, phys, num_pages, - flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, - mmu_sync_info); + ret = kbase_mmu_insert_pages(kbdev, &kbdev->csf.mcu_mmu, gpu_va_base >> PAGE_SHIFT, phys, + num_pages, flags, MCU_AS_NR, KBASE_MEM_GROUP_CSF_FW, + mmu_sync_info, NULL); if (ret) goto mmu_insert_failed; kfree(page_list); +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + fw_ring_buf->gpu_dump_base = (uintptr_t)cpu_addr; +#else fw_ring_buf->gpu_dump_base = gpu_va_base; +#endif /* CONFIG_MALI_NO_MALI */ fw_ring_buf->cpu_dump_base = cpu_addr; fw_ring_buf->phys = phys; fw_ring_buf->num_pages = num_pages; @@ -376,23 +365,15 @@ static int kbasep_hwcnt_backend_csf_if_fw_ring_buf_alloc( fw_ring_buf->as_nr = MCU_AS_NR; *cpu_dump_base = fw_ring_buf->cpu_dump_base; - *out_ring_buf = - (struct kbase_hwcnt_backend_csf_if_ring_buf *)fw_ring_buf; - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - /* The dummy model needs the CPU mapping. 
*/ - gpu_model_set_dummy_prfcnt_base_cpu(fw_ring_buf->cpu_dump_base, kbdev, - phys, num_pages); -#endif /* CONFIG_MALI_NO_MALI */ + *out_ring_buf = (struct kbase_hwcnt_backend_csf_if_ring_buf *)fw_ring_buf; return 0; mmu_insert_failed: vunmap(cpu_addr); vmap_error: - kbase_mem_pool_free_pages( - &kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, - phys, false, false); + kbase_mem_pool_free_pages(&kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], num_pages, phys, + false, false); phys_mem_pool_alloc_error: kfree(page_list); page_list_alloc_error: @@ -402,10 +383,10 @@ phys_alloc_error: return -ENOMEM; } -static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - u32 buf_index_first, u32 buf_index_last, bool for_cpu) +static void +kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + u32 buf_index_first, u32 buf_index_last, bool for_cpu) { struct kbase_hwcnt_backend_csf_if_fw_ring_buf *fw_ring_buf = (struct kbase_hwcnt_backend_csf_if_fw_ring_buf *)ring_buf; @@ -422,14 +403,21 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( WARN_ON(!ctx); WARN_ON(!ring_buf); +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + /* When using the dummy backend syncing the ring buffer is unnecessary as + * the ring buffer is only accessed by the CPU. It may also cause data loss + * due to cache invalidation so return early. + */ + return; +#endif /* CONFIG_MALI_NO_MALI */ + /* The index arguments for this function form an inclusive, exclusive * range. * However, when masking back to the available buffers we will make this * inclusive at both ends so full flushes are not 0 -> 0. */ ring_buf_index_first = buf_index_first & (fw_ring_buf->buf_count - 1); - ring_buf_index_last = - (buf_index_last - 1) & (fw_ring_buf->buf_count - 1); + ring_buf_index_last = (buf_index_last - 1) & (fw_ring_buf->buf_count - 1); /* The start address is the offset of the first buffer. 
*/ start_address = fw_ctx->buf_bytes * ring_buf_index_first; @@ -446,15 +434,11 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( struct page *pg = as_page(fw_ring_buf->phys[i]); if (for_cpu) { - kbase_sync_single_for_cpu(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, - DMA_BIDIRECTIONAL); + kbase_sync_single_for_cpu(fw_ctx->kbdev, kbase_dma_addr(pg), + PAGE_SIZE, DMA_BIDIRECTIONAL); } else { - kbase_sync_single_for_device(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, - DMA_BIDIRECTIONAL); + kbase_sync_single_for_device(fw_ctx->kbdev, kbase_dma_addr(pg), + PAGE_SIZE, DMA_BIDIRECTIONAL); } } @@ -466,28 +450,24 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_sync( struct page *pg = as_page(fw_ring_buf->phys[i]); if (for_cpu) { - kbase_sync_single_for_cpu(fw_ctx->kbdev, - kbase_dma_addr(pg), PAGE_SIZE, + kbase_sync_single_for_cpu(fw_ctx->kbdev, kbase_dma_addr(pg), PAGE_SIZE, DMA_BIDIRECTIONAL); } else { - kbase_sync_single_for_device(fw_ctx->kbdev, - kbase_dma_addr(pg), - PAGE_SIZE, + kbase_sync_single_for_device(fw_ctx->kbdev, kbase_dma_addr(pg), PAGE_SIZE, DMA_BIDIRECTIONAL); } } } -static u64 kbasep_hwcnt_backend_csf_if_fw_timestamp_ns( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static u64 kbasep_hwcnt_backend_csf_if_fw_timestamp_ns(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { CSTD_UNUSED(ctx); return ktime_get_raw_ns(); } -static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf) +static void +kbasep_hwcnt_backend_csf_if_fw_ring_buf_free(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf) { struct kbase_hwcnt_backend_csf_if_fw_ring_buf *fw_ring_buf = (struct kbase_hwcnt_backend_csf_if_fw_ring_buf *)ring_buf; @@ -500,17 +480,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( if (fw_ring_buf->phys) { u64 gpu_va_base = KBASE_HWC_CSF_RING_BUFFER_VA_START; - WARN_ON(kbase_mmu_teardown_pages( - fw_ctx->kbdev, &fw_ctx->kbdev->csf.mcu_mmu, - gpu_va_base >> PAGE_SHIFT, fw_ring_buf->num_pages, + WARN_ON(kbase_mmu_teardown_firmware_pages( + fw_ctx->kbdev, &fw_ctx->kbdev->csf.mcu_mmu, gpu_va_base >> PAGE_SHIFT, + fw_ring_buf->phys, fw_ring_buf->num_pages, fw_ring_buf->num_pages, MCU_AS_NR)); vunmap(fw_ring_buf->cpu_dump_base); - kbase_mem_pool_free_pages( - &fw_ctx->kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], - fw_ring_buf->num_pages, fw_ring_buf->phys, false, - false); + kbase_mem_pool_free_pages(&fw_ctx->kbdev->mem_pools.small[KBASE_MEM_GROUP_CSF_FW], + fw_ring_buf->num_pages, fw_ring_buf->phys, false, false); kfree(fw_ring_buf->phys); @@ -518,10 +496,10 @@ static void kbasep_hwcnt_backend_csf_if_fw_ring_buf_free( } } -static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, - struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, - struct kbase_hwcnt_backend_csf_if_enable *enable) +static void +kbasep_hwcnt_backend_csf_if_fw_dump_enable(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + struct kbase_hwcnt_backend_csf_if_ring_buf *ring_buf, + struct kbase_hwcnt_backend_csf_if_enable *enable) { u32 prfcnt_config; struct kbase_device *kbdev; @@ -540,12 +518,11 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( global_iface = &kbdev->csf.global_iface; /* Configure */ - prfcnt_config = fw_ring_buf->buf_count; - prfcnt_config |= enable->counter_set << PRFCNT_CONFIG_SETSELECT_SHIFT; + prfcnt_config = GLB_PRFCNT_CONFIG_SIZE_SET(0, fw_ring_buf->buf_count); + 
prfcnt_config = GLB_PRFCNT_CONFIG_SET_SELECT_SET(prfcnt_config, enable->counter_set); /* Configure the ring buffer base address */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_JASID, - fw_ring_buf->as_nr); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_JASID, fw_ring_buf->as_nr); kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_BASE_LO, fw_ring_buf->gpu_dump_base & U32_MAX); kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_BASE_HI, @@ -555,38 +532,29 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_EXTRACT, 0); /* Configure the enable bitmap */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CSF_EN, - enable->fe_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_SHADER_EN, - enable->shader_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_MMU_L2_EN, - enable->mmu_l2_bm); - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_TILER_EN, - enable->tiler_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CSF_EN, enable->fe_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_SHADER_EN, enable->shader_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_MMU_L2_EN, enable->mmu_l2_bm); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_TILER_EN, enable->tiler_bm); /* Configure the HWC set and buffer size */ - kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CONFIG, - prfcnt_config); + kbase_csf_firmware_global_input(global_iface, GLB_PRFCNT_CONFIG, prfcnt_config); kbdev->csf.hwcnt.enable_pending = true; /* Unmask the interrupts */ - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK, - GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK, + GLB_ACK_IRQ_MASK_PRFCNT_ENABLE_MASK); /* Enable the HWC */ kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, @@ -594,15 +562,12 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_enable( GLB_REQ_PRFCNT_ENABLE_MASK); kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); - prfcnt_config = kbase_csf_firmware_global_input_read(global_iface, - GLB_PRFCNT_CONFIG); + prfcnt_config = kbase_csf_firmware_global_input_read(global_iface, GLB_PRFCNT_CONFIG); - kbasep_hwcnt_backend_csf_if_fw_cc_enable(fw_ctx, - enable->clk_enable_map); + kbasep_hwcnt_backend_csf_if_fw_cc_enable(fw_ctx, enable->clk_enable_map); } -static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( - struct 
kbase_hwcnt_backend_csf_if_ctx *ctx) +static void kbasep_hwcnt_backend_csf_if_fw_dump_disable(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { struct kbase_device *kbdev; struct kbase_csf_global_iface *global_iface; @@ -617,20 +582,16 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( /* Disable the HWC */ kbdev->csf.hwcnt.enable_pending = true; - kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, 0, - GLB_REQ_PRFCNT_ENABLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_REQ, 0, GLB_REQ_PRFCNT_ENABLE_MASK); kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); /* mask the interrupts */ - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); - kbase_csf_firmware_global_input_mask( - global_iface, GLB_ACK_IRQ_MASK, 0, - GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_SAMPLE_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_THRESHOLD_MASK); + kbase_csf_firmware_global_input_mask(global_iface, GLB_ACK_IRQ_MASK, 0, + GLB_ACK_IRQ_MASK_PRFCNT_OVERFLOW_MASK); /* In case we have a previous request in flight when the disable * happens. @@ -640,8 +601,7 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_disable( kbasep_hwcnt_backend_csf_if_fw_cc_disable(fw_ctx); } -static void kbasep_hwcnt_backend_csf_if_fw_dump_request( - struct kbase_hwcnt_backend_csf_if_ctx *ctx) +static void kbasep_hwcnt_backend_csf_if_fw_dump_request(struct kbase_hwcnt_backend_csf_if_ctx *ctx) { u32 glb_req; struct kbase_device *kbdev; @@ -664,9 +624,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_dump_request( kbase_csf_ring_doorbell(kbdev, CSF_KERNEL_DOORBELL_NR); } -static void kbasep_hwcnt_backend_csf_if_fw_get_indexes( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 *extract_index, - u32 *insert_index) +static void kbasep_hwcnt_backend_csf_if_fw_get_indexes(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 *extract_index, u32 *insert_index) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -676,14 +635,15 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_indexes( WARN_ON(!insert_index); kbasep_hwcnt_backend_csf_if_fw_assert_lock_held(ctx); - *extract_index = kbase_csf_firmware_global_input_read( - &fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_EXTRACT); - *insert_index = kbase_csf_firmware_global_output( - &fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_INSERT); + *extract_index = kbase_csf_firmware_global_input_read(&fw_ctx->kbdev->csf.global_iface, + GLB_PRFCNT_EXTRACT); + *insert_index = kbase_csf_firmware_global_output(&fw_ctx->kbdev->csf.global_iface, + GLB_PRFCNT_INSERT); } -static void kbasep_hwcnt_backend_csf_if_fw_set_extract_index( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u32 extract_idx) +static void +kbasep_hwcnt_backend_csf_if_fw_set_extract_index(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u32 extract_idx) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -694,13 +654,13 @@ static void kbasep_hwcnt_backend_csf_if_fw_set_extract_index( /* Set the raw extract index to release the buffer back to the ring * buffer. 
*/ - kbase_csf_firmware_global_input(&fw_ctx->kbdev->csf.global_iface, - GLB_PRFCNT_EXTRACT, extract_idx); + kbase_csf_firmware_global_input(&fw_ctx->kbdev->csf.global_iface, GLB_PRFCNT_EXTRACT, + extract_idx); } -static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( - struct kbase_hwcnt_backend_csf_if_ctx *ctx, u64 *cycle_counts, - u64 clk_enable_map) +static void +kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count(struct kbase_hwcnt_backend_csf_if_ctx *ctx, + u64 *cycle_counts, u64 clk_enable_map) { struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx = (struct kbase_hwcnt_backend_csf_if_fw_ctx *)ctx; @@ -717,12 +677,12 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( if (clk == KBASE_CLOCK_DOMAIN_TOP) { /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - fw_ctx->kbdev, &cycle_counts[clk], NULL, NULL); + kbase_backend_get_gpu_time_norequest(fw_ctx->kbdev, &cycle_counts[clk], + NULL, NULL); } else { /* Estimate cycle count for non-top clock domain. */ - cycle_counts[clk] = kbase_ccswe_cycle_at( - &fw_ctx->ccswe_shader_cores, timestamp_ns); + cycle_counts[clk] = + kbase_ccswe_cycle_at(&fw_ctx->ccswe_shader_cores, timestamp_ns); } } } @@ -732,8 +692,8 @@ static void kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count( * * @fw_ctx: Pointer to context to destroy. */ -static void kbasep_hwcnt_backend_csf_if_fw_ctx_destroy( - struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) +static void +kbasep_hwcnt_backend_csf_if_fw_ctx_destroy(struct kbase_hwcnt_backend_csf_if_fw_ctx *fw_ctx) { if (!fw_ctx) return; @@ -748,9 +708,9 @@ static void kbasep_hwcnt_backend_csf_if_fw_ctx_destroy( * @out_ctx: Non-NULL pointer to where info is stored on success. * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_csf_if_fw_ctx_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_csf_if_fw_ctx **out_ctx) +static int +kbasep_hwcnt_backend_csf_if_fw_ctx_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if_fw_ctx **out_ctx) { u8 clk; int errcode = -ENOMEM; @@ -774,8 +734,7 @@ static int kbasep_hwcnt_backend_csf_if_fw_ctx_create( ctx->clk_enable_map = 0; kbase_ccswe_init(&ctx->ccswe_shader_cores); - ctx->rate_listener.notify = - kbasep_hwcnt_backend_csf_if_fw_on_freq_change; + ctx->rate_listener.notify = kbasep_hwcnt_backend_csf_if_fw_on_freq_change; *out_ctx = ctx; @@ -785,8 +744,7 @@ error: return errcode; } -void kbase_hwcnt_backend_csf_if_fw_destroy( - struct kbase_hwcnt_backend_csf_if *if_fw) +void kbase_hwcnt_backend_csf_if_fw_destroy(struct kbase_hwcnt_backend_csf_if *if_fw) { if (!if_fw) return; @@ -796,8 +754,8 @@ void kbase_hwcnt_backend_csf_if_fw_destroy( memset(if_fw, 0, sizeof(*if_fw)); } -int kbase_hwcnt_backend_csf_if_fw_create( - struct kbase_device *kbdev, struct kbase_hwcnt_backend_csf_if *if_fw) +int kbase_hwcnt_backend_csf_if_fw_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if *if_fw) { int errcode; struct kbase_hwcnt_backend_csf_if_fw_ctx *ctx = NULL; @@ -810,8 +768,7 @@ int kbase_hwcnt_backend_csf_if_fw_create( return errcode; if_fw->ctx = (struct kbase_hwcnt_backend_csf_if_ctx *)ctx; - if_fw->assert_lock_held = - kbasep_hwcnt_backend_csf_if_fw_assert_lock_held; + if_fw->assert_lock_held = kbasep_hwcnt_backend_csf_if_fw_assert_lock_held; if_fw->lock = kbasep_hwcnt_backend_csf_if_fw_lock; if_fw->unlock = kbasep_hwcnt_backend_csf_if_fw_unlock; if_fw->get_prfcnt_info = kbasep_hwcnt_backend_csf_if_fw_get_prfcnt_info; @@ -822,11 +779,9 @@ int 
kbase_hwcnt_backend_csf_if_fw_create( if_fw->dump_enable = kbasep_hwcnt_backend_csf_if_fw_dump_enable; if_fw->dump_disable = kbasep_hwcnt_backend_csf_if_fw_dump_disable; if_fw->dump_request = kbasep_hwcnt_backend_csf_if_fw_dump_request; - if_fw->get_gpu_cycle_count = - kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count; + if_fw->get_gpu_cycle_count = kbasep_hwcnt_backend_csf_if_fw_get_gpu_cycle_count; if_fw->get_indexes = kbasep_hwcnt_backend_csf_if_fw_get_indexes; - if_fw->set_extract_index = - kbasep_hwcnt_backend_csf_if_fw_set_extract_index; + if_fw->set_extract_index = kbasep_hwcnt_backend_csf_if_fw_set_extract_index; return 0; } diff --git a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h index b69668b..71d1506 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_csf_if_fw.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_csf_if_fw.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ #define _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ -#include "mali_kbase_hwcnt_backend_csf_if.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_csf_if.h" /** * kbase_hwcnt_backend_csf_if_fw_create() - Create a firmware CSF interface @@ -36,15 +36,14 @@ * creation success. * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_csf_if_fw_create( - struct kbase_device *kbdev, struct kbase_hwcnt_backend_csf_if *if_fw); +int kbase_hwcnt_backend_csf_if_fw_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_csf_if *if_fw); /** * kbase_hwcnt_backend_csf_if_fw_destroy() - Destroy a firmware CSF interface of * hardware counter backend. * @if_fw: Pointer to a CSF interface to destroy. */ -void kbase_hwcnt_backend_csf_if_fw_destroy( - struct kbase_hwcnt_backend_csf_if *if_fw); +void kbase_hwcnt_backend_csf_if_fw_destroy(struct kbase_hwcnt_backend_csf_if *if_fw); #endif /* _KBASE_HWCNT_BACKEND_CSF_IF_FW_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.c index e418212..8b3caac 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.c @@ -19,18 +19,15 @@ * */ -#include "mali_kbase_hwcnt_backend_jm.h" -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend_jm.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include "mali_kbase.h" #include "backend/gpu/mali_kbase_pm_ca.h" #include "mali_kbase_hwaccess_instr.h" #include "mali_kbase_hwaccess_time.h" #include "mali_kbase_ccswe.h" - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include "backend/gpu/mali_kbase_model_dummy.h" -#endif /* CONFIG_MALI_NO_MALI */ +#include "backend/gpu/mali_kbase_model_linux.h" #include "backend/gpu/mali_kbase_clk_rate_trace_mgr.h" #include "backend/gpu/mali_kbase_pm_internal.h" @@ -136,9 +133,8 @@ struct kbase_hwcnt_backend_jm { * * Return: 0 on success, else error code. 
*/ -static int -kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, - struct kbase_hwcnt_gpu_info *info) +static int kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, + struct kbase_hwcnt_gpu_info *info) { size_t clk; @@ -153,13 +149,11 @@ kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, { const struct base_gpu_props *props = &kbdev->gpu_props.props; const size_t l2_count = props->l2_props.num_l2_slices; - const size_t core_mask = - props->coherency_info.group[0].core_mask; + const size_t core_mask = props->coherency_info.group[0].core_mask; info->l2_count = l2_count; info->core_mask = core_mask; - info->prfcnt_values_per_block = - KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK; + info->prfcnt_values_per_block = KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK; } #endif /* CONFIG_MALI_NO_MALI */ @@ -173,9 +167,8 @@ kbasep_hwcnt_backend_jm_gpu_info_init(struct kbase_device *kbdev, return 0; } -static void kbasep_hwcnt_backend_jm_init_layout( - const struct kbase_hwcnt_gpu_info *gpu_info, - struct kbase_hwcnt_jm_physical_layout *phys_layout) +static void kbasep_hwcnt_backend_jm_init_layout(const struct kbase_hwcnt_gpu_info *gpu_info, + struct kbase_hwcnt_jm_physical_layout *phys_layout) { u8 shader_core_cnt; @@ -189,32 +182,29 @@ static void kbasep_hwcnt_backend_jm_init_layout( .tiler_cnt = KBASE_HWCNT_V5_TILER_BLOCK_COUNT, .mmu_l2_cnt = gpu_info->l2_count, .shader_cnt = shader_core_cnt, - .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + - KBASE_HWCNT_V5_TILER_BLOCK_COUNT + + .block_cnt = KBASE_HWCNT_V5_FE_BLOCK_COUNT + KBASE_HWCNT_V5_TILER_BLOCK_COUNT + gpu_info->l2_count + shader_core_cnt, .shader_avail_mask = gpu_info->core_mask, .headers_per_block = KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .values_per_block = gpu_info->prfcnt_values_per_block, - .counters_per_block = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, + .counters_per_block = + gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK, .enable_mask_offset = KBASE_HWCNT_V5_PRFCNT_EN_HEADER, }; } -static void kbasep_hwcnt_backend_jm_dump_sample( - const struct kbase_hwcnt_backend_jm *const backend_jm) +static void +kbasep_hwcnt_backend_jm_dump_sample(const struct kbase_hwcnt_backend_jm *const backend_jm) { size_t block_idx; const u32 *new_sample_buf = backend_jm->cpu_dump_va; const u32 *new_block = new_sample_buf; u64 *dst_buf = backend_jm->to_user_buf; u64 *dst_block = dst_buf; - const size_t values_per_block = - backend_jm->phys_layout.values_per_block; + const size_t values_per_block = backend_jm->phys_layout.values_per_block; const size_t dump_bytes = backend_jm->info->dump_bytes; - for (block_idx = 0; block_idx < backend_jm->phys_layout.block_cnt; - block_idx++) { + for (block_idx = 0; block_idx < backend_jm->phys_layout.block_cnt; block_idx++) { size_t ctr_idx; for (ctr_idx = 0; ctr_idx < values_per_block; ctr_idx++) @@ -224,10 +214,8 @@ static void kbasep_hwcnt_backend_jm_dump_sample( dst_block += values_per_block; } - WARN_ON(new_block != - new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); - WARN_ON(dst_block != - dst_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(new_block != new_sample_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); + WARN_ON(dst_block != dst_buf + (dump_bytes / KBASE_HWCNT_VALUE_HW_BYTES)); } /** @@ -237,21 +225,18 @@ static void kbasep_hwcnt_backend_jm_dump_sample( * @clk_index: Clock index * @clk_rate_hz: Clock frequency(hz) */ -static void kbasep_hwcnt_backend_jm_on_freq_change( - struct kbase_clk_rate_listener 
*rate_listener, - u32 clk_index, - u32 clk_rate_hz) +static void kbasep_hwcnt_backend_jm_on_freq_change(struct kbase_clk_rate_listener *rate_listener, + u32 clk_index, u32 clk_rate_hz) { - struct kbase_hwcnt_backend_jm *backend_jm = container_of( - rate_listener, struct kbase_hwcnt_backend_jm, rate_listener); + struct kbase_hwcnt_backend_jm *backend_jm = + container_of(rate_listener, struct kbase_hwcnt_backend_jm, rate_listener); u64 timestamp_ns; if (clk_index != KBASE_CLOCK_DOMAIN_SHADER_CORES) return; timestamp_ns = ktime_get_raw_ns(); - kbase_ccswe_freq_change( - &backend_jm->ccswe_shader_cores, timestamp_ns, clk_rate_hz); + kbase_ccswe_freq_change(&backend_jm->ccswe_shader_cores, timestamp_ns, clk_rate_hz); } /** @@ -261,53 +246,42 @@ static void kbasep_hwcnt_backend_jm_on_freq_change( * @enable_map: Non-NULL pointer to enable map specifying enabled counters. * @timestamp_ns: Timestamp(ns) when HWCNT were enabled. */ -static void kbasep_hwcnt_backend_jm_cc_enable( - struct kbase_hwcnt_backend_jm *backend_jm, - const struct kbase_hwcnt_enable_map *enable_map, - u64 timestamp_ns) +static void kbasep_hwcnt_backend_jm_cc_enable(struct kbase_hwcnt_backend_jm *backend_jm, + const struct kbase_hwcnt_enable_map *enable_map, + u64 timestamp_ns) { struct kbase_device *kbdev = backend_jm->kctx->kbdev; u64 clk_enable_map = enable_map->clk_enable_map; u64 cycle_count; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { /* turn on the cycle counter */ kbase_pm_request_gpu_cycle_counter_l2_is_on(kbdev); /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - kbdev, &cycle_count, NULL, NULL); + kbase_backend_get_gpu_time_norequest(kbdev, &cycle_count, NULL, NULL); - backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_TOP] = - cycle_count; + backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_TOP] = cycle_count; } - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { /* software estimation for non-top clock domains */ struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; - const struct kbase_clk_data *clk_data = - rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; + const struct kbase_clk_data *clk_data = rtm->clks[KBASE_CLOCK_DOMAIN_SHADER_CORES]; u32 cur_freq; unsigned long flags; spin_lock_irqsave(&rtm->lock, flags); - cur_freq = (u32) clk_data->clock_val; + cur_freq = (u32)clk_data->clock_val; kbase_ccswe_reset(&backend_jm->ccswe_shader_cores); - kbase_ccswe_freq_change( - &backend_jm->ccswe_shader_cores, - timestamp_ns, - cur_freq); + kbase_ccswe_freq_change(&backend_jm->ccswe_shader_cores, timestamp_ns, cur_freq); - kbase_clk_rate_trace_manager_subscribe_no_lock( - rtm, &backend_jm->rate_listener); + kbase_clk_rate_trace_manager_subscribe_no_lock(rtm, &backend_jm->rate_listener); spin_unlock_irqrestore(&rtm->lock, flags); /* ccswe was reset. The estimated cycle is zero. */ - backend_jm->prev_cycle_count[ - KBASE_CLOCK_DOMAIN_SHADER_CORES] = 0; + backend_jm->prev_cycle_count[KBASE_CLOCK_DOMAIN_SHADER_CORES] = 0; } /* Keep clk_enable_map for dump_request. */ @@ -319,28 +293,22 @@ static void kbasep_hwcnt_backend_jm_cc_enable( * * @backend_jm: Non-NULL pointer to backend. 
*/ -static void kbasep_hwcnt_backend_jm_cc_disable( - struct kbase_hwcnt_backend_jm *backend_jm) +static void kbasep_hwcnt_backend_jm_cc_disable(struct kbase_hwcnt_backend_jm *backend_jm) { struct kbase_device *kbdev = backend_jm->kctx->kbdev; struct kbase_clk_rate_trace_manager *rtm = &kbdev->pm.clk_rtm; u64 clk_enable_map = backend_jm->clk_enable_map; - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_TOP)) { /* turn off the cycle counter */ kbase_pm_release_gpu_cycle_counter(kbdev); } - if (kbase_hwcnt_clk_enable_map_enabled( - clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { - - kbase_clk_rate_trace_manager_unsubscribe( - rtm, &backend_jm->rate_listener); + if (kbase_hwcnt_clk_enable_map_enabled(clk_enable_map, KBASE_CLOCK_DOMAIN_SHADER_CORES)) { + kbase_clk_rate_trace_manager_unsubscribe(rtm, &backend_jm->rate_listener); } } - /** * kbasep_hwcnt_gpu_update_curr_config() - Update the destination buffer with * current config information. @@ -356,38 +324,33 @@ static void kbasep_hwcnt_backend_jm_cc_disable( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_gpu_update_curr_config( - struct kbase_device *kbdev, - struct kbase_hwcnt_curr_config *curr_config) +static int kbasep_hwcnt_gpu_update_curr_config(struct kbase_device *kbdev, + struct kbase_hwcnt_curr_config *curr_config) { if (WARN_ON(!kbdev) || WARN_ON(!curr_config)) return -EINVAL; lockdep_assert_held(&kbdev->hwaccess_lock); - curr_config->num_l2_slices = - kbdev->gpu_props.curr_config.l2_slices; - curr_config->shader_present = - kbdev->gpu_props.curr_config.shader_present; + curr_config->num_l2_slices = kbdev->gpu_props.curr_config.l2_slices; + curr_config->shader_present = kbdev->gpu_props.curr_config.shader_present; return 0; } /* JM backend implementation of kbase_hwcnt_backend_timestamp_ns_fn */ -static u64 kbasep_hwcnt_backend_jm_timestamp_ns( - struct kbase_hwcnt_backend *backend) +static u64 kbasep_hwcnt_backend_jm_timestamp_ns(struct kbase_hwcnt_backend *backend) { (void)backend; return ktime_get_raw_ns(); } /* JM backend implementation of kbase_hwcnt_backend_dump_enable_nolock_fn */ -static int kbasep_hwcnt_backend_jm_dump_enable_nolock( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int +kbasep_hwcnt_backend_jm_dump_enable_nolock(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_context *kctx; struct kbase_device *kbdev; struct kbase_hwcnt_physical_enable_map phys_enable_map; @@ -406,22 +369,25 @@ static int kbasep_hwcnt_backend_jm_dump_enable_nolock( kbase_hwcnt_gpu_enable_map_to_physical(&phys_enable_map, enable_map); - kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, - backend_jm->info->counter_set); + kbase_hwcnt_gpu_set_to_physical(&phys_counter_set, backend_jm->info->counter_set); enable.fe_bm = phys_enable_map.fe_bm; enable.shader_bm = phys_enable_map.shader_bm; enable.tiler_bm = phys_enable_map.tiler_bm; enable.mmu_l2_bm = phys_enable_map.mmu_l2_bm; enable.counter_set = phys_counter_set; +#if IS_ENABLED(CONFIG_MALI_NO_MALI) + /* The dummy model needs the CPU mapping. 
*/ + enable.dump_buffer = (uintptr_t)backend_jm->cpu_dump_va; +#else enable.dump_buffer = backend_jm->gpu_dump_va; +#endif /* CONFIG_MALI_NO_MALI */ enable.dump_buffer_bytes = backend_jm->info->dump_bytes; timestamp_ns = kbasep_hwcnt_backend_jm_timestamp_ns(backend); /* Update the current configuration information. */ - errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, - &backend_jm->curr_config); + errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, &backend_jm->curr_config); if (errcode) goto error; @@ -441,14 +407,12 @@ error: } /* JM backend implementation of kbase_hwcnt_backend_dump_enable_fn */ -static int kbasep_hwcnt_backend_jm_dump_enable( - struct kbase_hwcnt_backend *backend, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_backend_jm_dump_enable(struct kbase_hwcnt_backend *backend, + const struct kbase_hwcnt_enable_map *enable_map) { unsigned long flags; int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_device *kbdev; if (!backend_jm) @@ -458,8 +422,7 @@ static int kbasep_hwcnt_backend_jm_dump_enable( spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - errcode = kbasep_hwcnt_backend_jm_dump_enable_nolock( - backend, enable_map); + errcode = kbasep_hwcnt_backend_jm_dump_enable_nolock(backend, enable_map); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -467,12 +430,10 @@ static int kbasep_hwcnt_backend_jm_dump_enable( } /* JM backend implementation of kbase_hwcnt_backend_dump_disable_fn */ -static void kbasep_hwcnt_backend_jm_dump_disable( - struct kbase_hwcnt_backend *backend) +static void kbasep_hwcnt_backend_jm_dump_disable(struct kbase_hwcnt_backend *backend) { int errcode; - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (WARN_ON(!backend_jm) || !backend_jm->enabled) return; @@ -486,11 +447,9 @@ static void kbasep_hwcnt_backend_jm_dump_disable( } /* JM backend implementation of kbase_hwcnt_backend_dump_clear_fn */ -static int kbasep_hwcnt_backend_jm_dump_clear( - struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_jm_dump_clear(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (!backend_jm || !backend_jm->enabled) return -EINVAL; @@ -499,12 +458,10 @@ static int kbasep_hwcnt_backend_jm_dump_clear( } /* JM backend implementation of kbase_hwcnt_backend_dump_request_fn */ -static int kbasep_hwcnt_backend_jm_dump_request( - struct kbase_hwcnt_backend *backend, - u64 *dump_time_ns) +static int kbasep_hwcnt_backend_jm_dump_request(struct kbase_hwcnt_backend *backend, + u64 *dump_time_ns) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; struct kbase_device *kbdev; const struct kbase_hwcnt_metadata *metadata; u64 current_cycle_count; @@ -523,28 +480,25 @@ static int kbasep_hwcnt_backend_jm_dump_request( *dump_time_ns = kbasep_hwcnt_backend_jm_timestamp_ns(backend); ret = kbase_instr_hwcnt_request_dump(backend_jm->kctx); - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - backend_jm->clk_enable_map, 
clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(backend_jm->clk_enable_map, clk)) continue; if (clk == KBASE_CLOCK_DOMAIN_TOP) { /* Read cycle count for top clock domain. */ - kbase_backend_get_gpu_time_norequest( - kbdev, &current_cycle_count, - NULL, NULL); + kbase_backend_get_gpu_time_norequest(kbdev, &current_cycle_count, + NULL, NULL); } else { /* * Estimate cycle count for non-top clock * domain. */ current_cycle_count = kbase_ccswe_cycle_at( - &backend_jm->ccswe_shader_cores, - *dump_time_ns); + &backend_jm->ccswe_shader_cores, *dump_time_ns); } backend_jm->cycle_count_elapsed[clk] = - current_cycle_count - - backend_jm->prev_cycle_count[clk]; + current_cycle_count - backend_jm->prev_cycle_count[clk]; /* * Keep the current cycle count for later calculation. @@ -558,11 +512,9 @@ static int kbasep_hwcnt_backend_jm_dump_request( } /* JM backend implementation of kbase_hwcnt_backend_dump_wait_fn */ -static int kbasep_hwcnt_backend_jm_dump_wait( - struct kbase_hwcnt_backend *backend) +static int kbasep_hwcnt_backend_jm_dump_wait(struct kbase_hwcnt_backend *backend) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; if (!backend_jm || !backend_jm->enabled) return -EINVAL; @@ -571,14 +523,12 @@ static int kbasep_hwcnt_backend_jm_dump_wait( } /* JM backend implementation of kbase_hwcnt_backend_dump_get_fn */ -static int kbasep_hwcnt_backend_jm_dump_get( - struct kbase_hwcnt_backend *backend, - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate) +static int kbasep_hwcnt_backend_jm_dump_get(struct kbase_hwcnt_backend *backend, + struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map, + bool accumulate) { - struct kbase_hwcnt_backend_jm *backend_jm = - (struct kbase_hwcnt_backend_jm *)backend; + struct kbase_hwcnt_backend_jm *backend_jm = (struct kbase_hwcnt_backend_jm *)backend; size_t clk; #if IS_ENABLED(CONFIG_MALI_NO_MALI) struct kbase_device *kbdev; @@ -592,16 +542,15 @@ static int kbasep_hwcnt_backend_jm_dump_get( return -EINVAL; /* Invalidate the kernel buffer before reading from it. */ - kbase_sync_mem_regions( - backend_jm->kctx, backend_jm->vmap, KBASE_SYNC_TO_CPU); + kbase_sync_mem_regions(backend_jm->kctx, backend_jm->vmap, KBASE_SYNC_TO_CPU); /* Dump sample to the internal 64-bit user buffer. */ kbasep_hwcnt_backend_jm_dump_sample(backend_jm); /* Extract elapsed cycle count for each clock domain if enabled. */ - kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) { - if (!kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(dst_enable_map->metadata, clk) + { + if (!kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) continue; /* Reset the counter to zero if accumulation is off. */ @@ -616,17 +565,16 @@ static int kbasep_hwcnt_backend_jm_dump_get( spin_lock_irqsave(&kbdev->hwaccess_lock, flags); /* Update the current configuration information. 
*/ - errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, - &backend_jm->curr_config); + errcode = kbasep_hwcnt_gpu_update_curr_config(kbdev, &backend_jm->curr_config); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); if (errcode) return errcode; #endif /* CONFIG_MALI_NO_MALI */ - return kbase_hwcnt_jm_dump_get(dst, backend_jm->to_user_buf, - dst_enable_map, backend_jm->pm_core_mask, - &backend_jm->curr_config, accumulate); + return kbase_hwcnt_jm_dump_get(dst, backend_jm->to_user_buf, dst_enable_map, + backend_jm->pm_core_mask, &backend_jm->curr_config, + accumulate); } /** @@ -638,10 +586,8 @@ static int kbasep_hwcnt_backend_jm_dump_get( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_jm_dump_alloc( - const struct kbase_hwcnt_backend_jm_info *info, - struct kbase_context *kctx, - u64 *gpu_dump_va) +static int kbasep_hwcnt_backend_jm_dump_alloc(const struct kbase_hwcnt_backend_jm_info *info, + struct kbase_context *kctx, u64 *gpu_dump_va) { struct kbase_va_region *reg; u64 flags; @@ -656,16 +602,12 @@ static int kbasep_hwcnt_backend_jm_dump_alloc( WARN_ON(!kctx); WARN_ON(!gpu_dump_va); - flags = BASE_MEM_PROT_CPU_RD | - BASE_MEM_PROT_GPU_WR | - BASEP_MEM_PERMANENT_KERNEL_MAPPING | - BASE_MEM_CACHED_CPU | - BASE_MEM_UNCACHED_GPU; + flags = BASE_MEM_PROT_CPU_RD | BASE_MEM_PROT_GPU_WR | BASEP_MEM_PERMANENT_KERNEL_MAPPING | + BASE_MEM_CACHED_CPU | BASE_MEM_UNCACHED_GPU; nr_pages = PFN_UP(info->dump_bytes); - reg = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, gpu_dump_va, - mmu_sync_info); + reg = kbase_mem_alloc(kctx, nr_pages, nr_pages, 0, &flags, gpu_dump_va, mmu_sync_info); if (!reg) return -ENOMEM; @@ -678,9 +620,7 @@ static int kbasep_hwcnt_backend_jm_dump_alloc( * @kctx: Non-NULL pointer to kbase context. * @gpu_dump_va: GPU dump buffer virtual address. */ -static void kbasep_hwcnt_backend_jm_dump_free( - struct kbase_context *kctx, - u64 gpu_dump_va) +static void kbasep_hwcnt_backend_jm_dump_free(struct kbase_context *kctx, u64 gpu_dump_va) { WARN_ON(!kctx); if (gpu_dump_va) @@ -693,8 +633,7 @@ static void kbasep_hwcnt_backend_jm_dump_free( * * Can be safely called on a backend in any state of partial construction. */ -static void kbasep_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_jm *backend) +static void kbasep_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_jm *backend) { if (!backend) return; @@ -707,8 +646,7 @@ static void kbasep_hwcnt_backend_jm_destroy( kbase_phy_alloc_mapping_put(kctx, backend->vmap); if (backend->gpu_dump_va) - kbasep_hwcnt_backend_jm_dump_free( - kctx, backend->gpu_dump_va); + kbasep_hwcnt_backend_jm_dump_free(kctx, backend->gpu_dump_va); kbasep_js_release_privileged_ctx(kbdev, kctx); kbase_destroy_context(kctx); @@ -726,16 +664,12 @@ static void kbasep_hwcnt_backend_jm_destroy( * * Return: 0 on success, else error code. 
*/ -static int kbasep_hwcnt_backend_jm_create( - const struct kbase_hwcnt_backend_jm_info *info, - struct kbase_hwcnt_backend_jm **out_backend) +static int kbasep_hwcnt_backend_jm_create(const struct kbase_hwcnt_backend_jm_info *info, + struct kbase_hwcnt_backend_jm **out_backend) { int errcode; struct kbase_device *kbdev; struct kbase_hwcnt_backend_jm *backend = NULL; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - size_t page_count; -#endif WARN_ON(!info); WARN_ON(!out_backend); @@ -747,42 +681,31 @@ static int kbasep_hwcnt_backend_jm_create( goto alloc_error; backend->info = info; - kbasep_hwcnt_backend_jm_init_layout(&info->hwcnt_gpu_info, - &backend->phys_layout); + kbasep_hwcnt_backend_jm_init_layout(&info->hwcnt_gpu_info, &backend->phys_layout); backend->kctx = kbase_create_context(kbdev, true, - BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED, 0, NULL); + BASE_CONTEXT_SYSTEM_MONITOR_SUBMIT_DISABLED, 0, NULL); if (!backend->kctx) goto alloc_error; kbasep_js_schedule_privileged_ctx(kbdev, backend->kctx); - errcode = kbasep_hwcnt_backend_jm_dump_alloc( - info, backend->kctx, &backend->gpu_dump_va); + errcode = kbasep_hwcnt_backend_jm_dump_alloc(info, backend->kctx, &backend->gpu_dump_va); if (errcode) goto error; - backend->cpu_dump_va = kbase_phy_alloc_mapping_get(backend->kctx, - backend->gpu_dump_va, &backend->vmap); + backend->cpu_dump_va = + kbase_phy_alloc_mapping_get(backend->kctx, backend->gpu_dump_va, &backend->vmap); if (!backend->cpu_dump_va || !backend->vmap) goto alloc_error; - backend->to_user_buf = - kzalloc(info->metadata->dump_buf_bytes, GFP_KERNEL); + backend->to_user_buf = kzalloc(info->metadata->dump_buf_bytes, GFP_KERNEL); if (!backend->to_user_buf) goto alloc_error; kbase_ccswe_init(&backend->ccswe_shader_cores); backend->rate_listener.notify = kbasep_hwcnt_backend_jm_on_freq_change; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - /* The dummy model needs the CPU mapping. */ - page_count = PFN_UP(info->dump_bytes); - gpu_model_set_dummy_prfcnt_base_cpu(backend->cpu_dump_va, kbdev, - backend->vmap->cpu_pages, - page_count); -#endif /* CONFIG_MALI_NO_MALI */ - *out_backend = backend; return 0; @@ -804,9 +727,8 @@ kbasep_hwcnt_backend_jm_metadata(const struct kbase_hwcnt_backend_info *info) } /* JM backend implementation of kbase_hwcnt_backend_init_fn */ -static int kbasep_hwcnt_backend_jm_init( - const struct kbase_hwcnt_backend_info *info, - struct kbase_hwcnt_backend **out_backend) +static int kbasep_hwcnt_backend_jm_init(const struct kbase_hwcnt_backend_info *info, + struct kbase_hwcnt_backend **out_backend) { int errcode; struct kbase_hwcnt_backend_jm *backend = NULL; @@ -814,8 +736,8 @@ static int kbasep_hwcnt_backend_jm_init( if (!info || !out_backend) return -EINVAL; - errcode = kbasep_hwcnt_backend_jm_create( - (const struct kbase_hwcnt_backend_jm_info *) info, &backend); + errcode = kbasep_hwcnt_backend_jm_create((const struct kbase_hwcnt_backend_jm_info *)info, + &backend); if (errcode) return errcode; @@ -831,8 +753,7 @@ static void kbasep_hwcnt_backend_jm_term(struct kbase_hwcnt_backend *backend) return; kbasep_hwcnt_backend_jm_dump_disable(backend); - kbasep_hwcnt_backend_jm_destroy( - (struct kbase_hwcnt_backend_jm *)backend); + kbasep_hwcnt_backend_jm_destroy((struct kbase_hwcnt_backend_jm *)backend); } /** @@ -841,8 +762,7 @@ static void kbasep_hwcnt_backend_jm_term(struct kbase_hwcnt_backend *backend) * * Can be safely called on a backend info in any state of partial construction. 
*/ -static void kbasep_hwcnt_backend_jm_info_destroy( - const struct kbase_hwcnt_backend_jm_info *info) +static void kbasep_hwcnt_backend_jm_info_destroy(const struct kbase_hwcnt_backend_jm_info *info) { if (!info) return; @@ -858,9 +778,8 @@ static void kbasep_hwcnt_backend_jm_info_destroy( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_backend_jm_info_create( - struct kbase_device *kbdev, - const struct kbase_hwcnt_backend_jm_info **out_info) +static int kbasep_hwcnt_backend_jm_info_create(struct kbase_device *kbdev, + const struct kbase_hwcnt_backend_jm_info **out_info) { int errcode = -ENOMEM; struct kbase_hwcnt_backend_jm_info *info = NULL; @@ -883,15 +802,12 @@ static int kbasep_hwcnt_backend_jm_info_create( info->counter_set = KBASE_HWCNT_SET_PRIMARY; #endif - errcode = kbasep_hwcnt_backend_jm_gpu_info_init(kbdev, - &info->hwcnt_gpu_info); + errcode = kbasep_hwcnt_backend_jm_gpu_info_init(kbdev, &info->hwcnt_gpu_info); if (errcode) goto error; - errcode = kbase_hwcnt_jm_metadata_create(&info->hwcnt_gpu_info, - info->counter_set, - &info->metadata, - &info->dump_bytes); + errcode = kbase_hwcnt_jm_metadata_create(&info->hwcnt_gpu_info, info->counter_set, + &info->metadata, &info->dump_bytes); if (errcode) goto error; @@ -903,9 +819,8 @@ error: return errcode; } -int kbase_hwcnt_backend_jm_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_interface *iface) +int kbase_hwcnt_backend_jm_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_interface *iface) { int errcode; const struct kbase_hwcnt_backend_jm_info *info = NULL; @@ -934,8 +849,7 @@ int kbase_hwcnt_backend_jm_create( return 0; } -void kbase_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_interface *iface) +void kbase_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_interface *iface) { if (!iface) return; diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.h index 1bc3906..4a6293c 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,7 @@ #ifndef _KBASE_HWCNT_BACKEND_JM_H_ #define _KBASE_HWCNT_BACKEND_JM_H_ -#include "mali_kbase_hwcnt_backend.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" struct kbase_device; @@ -42,9 +42,8 @@ struct kbase_device; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_backend_jm_create( - struct kbase_device *kbdev, - struct kbase_hwcnt_backend_interface *iface); +int kbase_hwcnt_backend_jm_create(struct kbase_device *kbdev, + struct kbase_hwcnt_backend_interface *iface); /** * kbase_hwcnt_backend_jm_destroy() - Destroy a JM hardware counter backend @@ -54,7 +53,6 @@ int kbase_hwcnt_backend_jm_create( * Can be safely called on an all-zeroed interface, or on an already destroyed * interface. 
*/ -void kbase_hwcnt_backend_jm_destroy( - struct kbase_hwcnt_backend_interface *iface); +void kbase_hwcnt_backend_jm_destroy(struct kbase_hwcnt_backend_interface *iface); #endif /* _KBASE_HWCNT_BACKEND_JM_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.c b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.c index cdf3cd9..a8654ea 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.c +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,13 +21,20 @@ #include <mali_kbase.h> -#include <mali_kbase_hwcnt_gpu.h> -#include <mali_kbase_hwcnt_types.h> +#include <hwcnt/mali_kbase_hwcnt_gpu.h> +#include <hwcnt/mali_kbase_hwcnt_types.h> -#include <mali_kbase_hwcnt_backend.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> +#if IS_ENABLED(CONFIG_MALI_IS_FPGA) && !IS_ENABLED(CONFIG_MALI_NO_MALI) +/* Backend watch dog timer interval in milliseconds: 18 seconds. */ +static const u32 hwcnt_backend_watchdog_timer_interval_ms = 18000; +#else +/* Backend watch dog timer interval in milliseconds: 1 second. */ static const u32 hwcnt_backend_watchdog_timer_interval_ms = 1000; +#endif /* IS_FPGA && !NO_MALI */ /* * IDLE_BUFFER_EMPTY -> USER_DUMPING_BUFFER_EMPTY on dump_request. @@ -112,8 +119,7 @@ enum backend_watchdog_state { */ enum wd_init_state { HWCNT_JM_WD_INIT_START, - HWCNT_JM_WD_INIT_ALLOC = HWCNT_JM_WD_INIT_START, - HWCNT_JM_WD_INIT_BACKEND, + HWCNT_JM_WD_INIT_BACKEND = HWCNT_JM_WD_INIT_START, HWCNT_JM_WD_INIT_ENABLE_MAP, HWCNT_JM_WD_INIT_DUMP_BUFFER, HWCNT_JM_WD_INIT_END @@ -290,16 +296,10 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc if (!wd_backend) return; - /* disable timer thread to avoid concurrent access to shared resources */ - wd_backend->info->dump_watchdog_iface->disable( - wd_backend->info->dump_watchdog_iface->timer); + WARN_ON(state > HWCNT_JM_WD_INIT_END); - /*will exit the loop when state reaches HWCNT_JM_WD_INIT_START*/ while (state-- > HWCNT_JM_WD_INIT_START) { switch (state) { - case HWCNT_JM_WD_INIT_ALLOC: - kfree(wd_backend); - break; case HWCNT_JM_WD_INIT_BACKEND: wd_backend->info->jm_backend_iface->term(wd_backend->jm_backend); break; @@ -313,6 +313,8 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc break; } } + + kfree(wd_backend); } /* Job manager watchdog backend, implementation of kbase_hwcnt_backend_term_fn @@ -320,11 +322,17 @@ kbasep_hwcnt_backend_jm_watchdog_term_partial(struct kbase_hwcnt_backend_jm_watc */ static void kbasep_hwcnt_backend_jm_watchdog_term(struct kbase_hwcnt_backend *backend) { + struct kbase_hwcnt_backend_jm_watchdog *wd_backend = + (struct kbase_hwcnt_backend_jm_watchdog *)backend; + if (!backend) return; - kbasep_hwcnt_backend_jm_watchdog_term_partial( - (struct kbase_hwcnt_backend_jm_watchdog *)backend, HWCNT_JM_WD_INIT_END); + /* disable timer thread to avoid concurrent access to shared resources */ + wd_backend->info->dump_watchdog_iface->disable( + wd_backend->info->dump_watchdog_iface->timer); + + 
kbasep_hwcnt_backend_jm_watchdog_term_partial(wd_backend, HWCNT_JM_WD_INIT_END); } /* Job manager watchdog backend, implementation of kbase_hwcnt_backend_init_fn */ @@ -344,20 +352,20 @@ static int kbasep_hwcnt_backend_jm_watchdog_init(const struct kbase_hwcnt_backen jm_info = wd_info->jm_backend_iface->info; metadata = wd_info->jm_backend_iface->metadata(wd_info->jm_backend_iface->info); + wd_backend = kmalloc(sizeof(*wd_backend), GFP_KERNEL); + if (!wd_backend) { + *out_backend = NULL; + return -ENOMEM; + } + + *wd_backend = (struct kbase_hwcnt_backend_jm_watchdog){ + .info = wd_info, + .timeout_ms = hwcnt_backend_watchdog_timer_interval_ms, + .locked = { .state = HWCNT_JM_WD_IDLE_BUFFER_EMPTY, .is_enabled = false } + }; + while (state < HWCNT_JM_WD_INIT_END && !errcode) { switch (state) { - case HWCNT_JM_WD_INIT_ALLOC: - wd_backend = kmalloc(sizeof(*wd_backend), GFP_KERNEL); - if (wd_backend) { - *wd_backend = (struct kbase_hwcnt_backend_jm_watchdog){ - .info = wd_info, - .timeout_ms = hwcnt_backend_watchdog_timer_interval_ms, - .locked = { .state = HWCNT_JM_WD_IDLE_BUFFER_EMPTY, - .is_enabled = false } - }; - } else - errcode = -ENOMEM; - break; case HWCNT_JM_WD_INIT_BACKEND: errcode = wd_info->jm_backend_iface->init(jm_info, &wd_backend->jm_backend); break; @@ -817,5 +825,5 @@ void kbase_hwcnt_backend_jm_watchdog_destroy(struct kbase_hwcnt_backend_interfac kfree((struct kbase_hwcnt_backend_jm_watchdog_info *)iface->info); /*blanking the watchdog backend interface*/ - *iface = (struct kbase_hwcnt_backend_interface){ NULL }; + memset(iface, 0, sizeof(*iface)); } diff --git a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.h b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h index 5021b4f..02a7952 100644 --- a/mali_kbase/mali_kbase_hwcnt_backend_jm_watchdog.h +++ b/mali_kbase/hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,8 +32,8 @@ #ifndef _KBASE_HWCNT_BACKEND_JM_WATCHDOG_H_ #define _KBASE_HWCNT_BACKEND_JM_WATCHDOG_H_ -#include <mali_kbase_hwcnt_backend.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> /** * kbase_hwcnt_backend_jm_watchdog_create() - Create a job manager hardware counter watchdog diff --git a/mali_kbase/mali_kbase_hwcnt.c b/mali_kbase/hwcnt/mali_kbase_hwcnt.c index a54f005..34deb5d 100644 --- a/mali_kbase/mali_kbase_hwcnt.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,10 @@ * Implementation of hardware counter context and accumulator APIs. 
*/ -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_accumulator.h" -#include "mali_kbase_hwcnt_backend.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_accumulator.h" +#include "hwcnt/backend/mali_kbase_hwcnt_backend.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/mutex.h> #include <linux/spinlock.h> @@ -39,11 +39,7 @@ * @ACCUM_STATE_ENABLED: Enabled state, where dumping is enabled if there are * any enabled counters. */ -enum kbase_hwcnt_accum_state { - ACCUM_STATE_ERROR, - ACCUM_STATE_DISABLED, - ACCUM_STATE_ENABLED -}; +enum kbase_hwcnt_accum_state { ACCUM_STATE_ERROR, ACCUM_STATE_DISABLED, ACCUM_STATE_ENABLED }; /** * struct kbase_hwcnt_accumulator - Hardware counter accumulator structure. @@ -130,9 +126,8 @@ struct kbase_hwcnt_context { struct workqueue_struct *wq; }; -int kbase_hwcnt_context_init( - const struct kbase_hwcnt_backend_interface *iface, - struct kbase_hwcnt_context **out_hctx) +int kbase_hwcnt_context_init(const struct kbase_hwcnt_backend_interface *iface, + struct kbase_hwcnt_context **out_hctx) { struct kbase_hwcnt_context *hctx = NULL; @@ -149,8 +144,7 @@ int kbase_hwcnt_context_init( mutex_init(&hctx->accum_lock); hctx->accum_inited = false; - hctx->wq = - alloc_workqueue("mali_kbase_hwcnt", WQ_HIGHPRI | WQ_UNBOUND, 0); + hctx->wq = alloc_workqueue("mali_kbase_hwcnt", WQ_HIGHPRI | WQ_UNBOUND, 0); if (!hctx->wq) goto err_alloc_workqueue; @@ -208,35 +202,30 @@ static int kbasep_hwcnt_accumulator_init(struct kbase_hwcnt_context *hctx) WARN_ON(!hctx); WARN_ON(!hctx->accum_inited); - errcode = hctx->iface->init( - hctx->iface->info, &hctx->accum.backend); + errcode = hctx->iface->init(hctx->iface->info, &hctx->accum.backend); if (errcode) goto error; hctx->accum.metadata = hctx->iface->metadata(hctx->iface->info); hctx->accum.state = ACCUM_STATE_ERROR; - errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, - &hctx->accum.enable_map); + errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, &hctx->accum.enable_map); if (errcode) goto error; hctx->accum.enable_map_any_enabled = false; - errcode = kbase_hwcnt_dump_buffer_alloc(hctx->accum.metadata, - &hctx->accum.accum_buf); + errcode = kbase_hwcnt_dump_buffer_alloc(hctx->accum.metadata, &hctx->accum.accum_buf); if (errcode) goto error; - errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, - &hctx->accum.scratch_map); + errcode = kbase_hwcnt_enable_map_alloc(hctx->accum.metadata, &hctx->accum.scratch_map); if (errcode) goto error; hctx->accum.accumulated = false; - hctx->accum.ts_last_dump_ns = - hctx->iface->timestamp_ns(hctx->accum.backend); + hctx->accum.ts_last_dump_ns = hctx->iface->timestamp_ns(hctx->accum.backend); return 0; @@ -252,8 +241,7 @@ error: * @hctx: Non-NULL pointer to hardware counter context. * @accumulate: True if we should accumulate before disabling, else false. 
*/ -static void kbasep_hwcnt_accumulator_disable( - struct kbase_hwcnt_context *hctx, bool accumulate) +static void kbasep_hwcnt_accumulator_disable(struct kbase_hwcnt_context *hctx, bool accumulate) { int errcode = 0; bool backend_enabled = false; @@ -272,8 +260,7 @@ static void kbasep_hwcnt_accumulator_disable( WARN_ON(hctx->disable_count != 0); WARN_ON(hctx->accum.state == ACCUM_STATE_DISABLED); - if ((hctx->accum.state == ACCUM_STATE_ENABLED) && - (accum->enable_map_any_enabled)) + if ((hctx->accum.state == ACCUM_STATE_ENABLED) && (accum->enable_map_any_enabled)) backend_enabled = true; if (!backend_enabled) @@ -297,8 +284,8 @@ static void kbasep_hwcnt_accumulator_disable( if (errcode) goto disable; - errcode = hctx->iface->dump_get(accum->backend, - &accum->accum_buf, &accum->enable_map, accum->accumulated); + errcode = hctx->iface->dump_get(accum->backend, &accum->accum_buf, &accum->enable_map, + accum->accumulated); if (errcode) goto disable; @@ -336,8 +323,7 @@ static void kbasep_hwcnt_accumulator_enable(struct kbase_hwcnt_context *hctx) /* The backend only needs enabling if any counters are enabled */ if (accum->enable_map_any_enabled) - errcode = hctx->iface->dump_enable_nolock( - accum->backend, &accum->enable_map); + errcode = hctx->iface->dump_enable_nolock(accum->backend, &accum->enable_map); if (!errcode) accum->state = ACCUM_STATE_ENABLED; @@ -364,12 +350,9 @@ static void kbasep_hwcnt_accumulator_enable(struct kbase_hwcnt_context *hctx) * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_accumulator_dump( - struct kbase_hwcnt_context *hctx, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf, - const struct kbase_hwcnt_enable_map *new_map) +static int kbasep_hwcnt_accumulator_dump(struct kbase_hwcnt_context *hctx, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf, + const struct kbase_hwcnt_enable_map *new_map) { int errcode = 0; unsigned long flags; @@ -379,7 +362,7 @@ static int kbasep_hwcnt_accumulator_dump( bool cur_map_any_enabled; struct kbase_hwcnt_enable_map *cur_map; bool new_map_any_enabled = false; - u64 dump_time_ns; + u64 dump_time_ns = 0; struct kbase_hwcnt_accumulator *accum; WARN_ON(!hctx); @@ -398,8 +381,7 @@ static int kbasep_hwcnt_accumulator_dump( kbase_hwcnt_enable_map_copy(cur_map, &accum->enable_map); if (new_map) - new_map_any_enabled = - kbase_hwcnt_enable_map_any_enabled(new_map); + new_map_any_enabled = kbase_hwcnt_enable_map_any_enabled(new_map); /* * We're holding accum_lock, so the accumulator state might transition @@ -426,8 +408,7 @@ static int kbasep_hwcnt_accumulator_dump( * then we'll do it ourselves after the dump. */ if (new_map) { - kbase_hwcnt_enable_map_copy( - &accum->enable_map, new_map); + kbase_hwcnt_enable_map_copy(&accum->enable_map, new_map); accum->enable_map_any_enabled = new_map_any_enabled; } @@ -440,12 +421,10 @@ static int kbasep_hwcnt_accumulator_dump( /* Initiate the dump if the backend is enabled. 
*/ if ((state == ACCUM_STATE_ENABLED) && cur_map_any_enabled) { if (dump_buf) { - errcode = hctx->iface->dump_request( - accum->backend, &dump_time_ns); + errcode = hctx->iface->dump_request(accum->backend, &dump_time_ns); dump_requested = true; } else { - dump_time_ns = hctx->iface->timestamp_ns( - accum->backend); + dump_time_ns = hctx->iface->timestamp_ns(accum->backend); errcode = hctx->iface->dump_clear(accum->backend); } @@ -457,8 +436,7 @@ static int kbasep_hwcnt_accumulator_dump( /* Copy any accumulation into the dest buffer */ if (accum->accumulated && dump_buf) { - kbase_hwcnt_dump_buffer_copy( - dump_buf, &accum->accum_buf, cur_map); + kbase_hwcnt_dump_buffer_copy(dump_buf, &accum->accum_buf, cur_map); dump_written = true; } @@ -483,8 +461,7 @@ static int kbasep_hwcnt_accumulator_dump( * we're already enabled and holding accum_lock is impossible. */ if (new_map_any_enabled) { - errcode = hctx->iface->dump_enable( - accum->backend, new_map); + errcode = hctx->iface->dump_enable(accum->backend, new_map); if (errcode) goto error; } @@ -495,11 +472,8 @@ static int kbasep_hwcnt_accumulator_dump( /* If we dumped, copy or accumulate it into the destination */ if (dump_requested) { WARN_ON(state != ACCUM_STATE_ENABLED); - errcode = hctx->iface->dump_get( - accum->backend, - dump_buf, - cur_map, - dump_written); + errcode = hctx->iface->dump_get(accum->backend, dump_buf, cur_map, + dump_written); if (errcode) goto error; dump_written = true; @@ -540,8 +514,7 @@ error: * @hctx: Non-NULL pointer to hardware counter context. * @accumulate: True if we should accumulate before disabling, else false. */ -static void kbasep_hwcnt_context_disable( - struct kbase_hwcnt_context *hctx, bool accumulate) +static void kbasep_hwcnt_context_disable(struct kbase_hwcnt_context *hctx, bool accumulate) { unsigned long flags; @@ -563,9 +536,8 @@ static void kbasep_hwcnt_context_disable( } } -int kbase_hwcnt_accumulator_acquire( - struct kbase_hwcnt_context *hctx, - struct kbase_hwcnt_accumulator **accum) +int kbase_hwcnt_accumulator_acquire(struct kbase_hwcnt_context *hctx, + struct kbase_hwcnt_accumulator **accum) { int errcode = 0; unsigned long flags; @@ -618,9 +590,7 @@ int kbase_hwcnt_accumulator_acquire( * Regardless of initial state, counters don't need to be enabled via * the backend, as the initial enable map has no enabled counters. */ - hctx->accum.state = (hctx->disable_count == 0) ? - ACCUM_STATE_ENABLED : - ACCUM_STATE_DISABLED; + hctx->accum.state = (hctx->disable_count == 0) ? 
ACCUM_STATE_ENABLED : ACCUM_STATE_DISABLED; spin_unlock_irqrestore(&hctx->state_lock, flags); @@ -728,8 +698,7 @@ void kbase_hwcnt_context_enable(struct kbase_hwcnt_context *hctx) spin_unlock_irqrestore(&hctx->state_lock, flags); } -const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( - struct kbase_hwcnt_context *hctx) +const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata(struct kbase_hwcnt_context *hctx) { if (!hctx) return NULL; @@ -737,8 +706,7 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( return hctx->iface->metadata(hctx->iface->info); } -bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, - struct work_struct *work) +bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, struct work_struct *work) { if (WARN_ON(!hctx) || WARN_ON(!work)) return false; @@ -746,12 +714,10 @@ bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, return queue_work(hctx->wq, work); } -int kbase_hwcnt_accumulator_set_counters( - struct kbase_hwcnt_accumulator *accum, - const struct kbase_hwcnt_enable_map *new_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_accumulator_set_counters(struct kbase_hwcnt_accumulator *accum, + const struct kbase_hwcnt_enable_map *new_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_context *hctx; @@ -767,19 +733,15 @@ int kbase_hwcnt_accumulator_set_counters( mutex_lock(&hctx->accum_lock); - errcode = kbasep_hwcnt_accumulator_dump( - hctx, ts_start_ns, ts_end_ns, dump_buf, new_map); + errcode = kbasep_hwcnt_accumulator_dump(hctx, ts_start_ns, ts_end_ns, dump_buf, new_map); mutex_unlock(&hctx->accum_lock); return errcode; } -int kbase_hwcnt_accumulator_dump( - struct kbase_hwcnt_accumulator *accum, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_accumulator_dump(struct kbase_hwcnt_accumulator *accum, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_context *hctx; @@ -794,8 +756,7 @@ int kbase_hwcnt_accumulator_dump( mutex_lock(&hctx->accum_lock); - errcode = kbasep_hwcnt_accumulator_dump( - hctx, ts_start_ns, ts_end_ns, dump_buf, NULL); + errcode = kbasep_hwcnt_accumulator_dump(hctx, ts_start_ns, ts_end_ns, dump_buf, NULL); mutex_unlock(&hctx->accum_lock); diff --git a/mali_kbase/mali_kbase_hwcnt_accumulator.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_accumulator.h index af542ea..069e020 100644 --- a/mali_kbase/mali_kbase_hwcnt_accumulator.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_accumulator.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -67,9 +67,8 @@ struct kbase_hwcnt_dump_buffer; * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_acquire( - struct kbase_hwcnt_context *hctx, - struct kbase_hwcnt_accumulator **accum); +int kbase_hwcnt_accumulator_acquire(struct kbase_hwcnt_context *hctx, + struct kbase_hwcnt_accumulator **accum); /** * kbase_hwcnt_accumulator_release() - Release a hardware counter accumulator. 
@@ -102,12 +101,10 @@ void kbase_hwcnt_accumulator_release(struct kbase_hwcnt_accumulator *accum); * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_set_counters( - struct kbase_hwcnt_accumulator *accum, - const struct kbase_hwcnt_enable_map *new_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_accumulator_set_counters(struct kbase_hwcnt_accumulator *accum, + const struct kbase_hwcnt_enable_map *new_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_accumulator_dump() - Perform a dump of the currently enabled @@ -127,11 +124,8 @@ int kbase_hwcnt_accumulator_set_counters( * * Return: 0 on success or error code. */ -int kbase_hwcnt_accumulator_dump( - struct kbase_hwcnt_accumulator *accum, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_accumulator_dump(struct kbase_hwcnt_accumulator *accum, u64 *ts_start_ns, + u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_accumulator_timestamp_ns() - Get the current accumulator backend diff --git a/mali_kbase/mali_kbase_hwcnt_context.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_context.h index 34423d1..89732a9 100644 --- a/mali_kbase/mali_kbase_hwcnt_context.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_context.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,9 +43,8 @@ struct kbase_hwcnt_context; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_context_init( - const struct kbase_hwcnt_backend_interface *iface, - struct kbase_hwcnt_context **out_hctx); +int kbase_hwcnt_context_init(const struct kbase_hwcnt_backend_interface *iface, + struct kbase_hwcnt_context **out_hctx); /** * kbase_hwcnt_context_term() - Terminate a hardware counter context. @@ -61,8 +60,7 @@ void kbase_hwcnt_context_term(struct kbase_hwcnt_context *hctx); * * Return: Non-NULL pointer to metadata, or NULL on error. */ -const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata( - struct kbase_hwcnt_context *hctx); +const struct kbase_hwcnt_metadata *kbase_hwcnt_context_metadata(struct kbase_hwcnt_context *hctx); /** * kbase_hwcnt_context_disable() - Increment the disable count of the context. @@ -145,7 +143,6 @@ void kbase_hwcnt_context_enable(struct kbase_hwcnt_context *hctx); * this meant progress through the power management states could be stalled * for however long that higher priority thread took. */ -bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, - struct work_struct *work); +bool kbase_hwcnt_context_queue_work(struct kbase_hwcnt_context *hctx, struct work_struct *work); #endif /* _KBASE_HWCNT_CONTEXT_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_gpu.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.c index 752d096..74916da 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,9 @@ * */ -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" -#include <linux/bug.h> #include <linux/err.h> /** enum enable_map_idx - index into a block enable map that spans multiple u64 array elements @@ -33,8 +32,7 @@ enum enable_map_idx { EM_COUNT, }; -static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, - bool is_csf) +static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, bool is_csf) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -44,21 +42,20 @@ static void kbasep_get_fe_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED; break; case KBASE_HWCNT_SET_TERTIARY: if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_tiler_block_type(u64 *dst, - enum kbase_hwcnt_set counter_set) +static void kbasep_get_tiler_block_type(u64 *dst, enum kbase_hwcnt_set counter_set) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -66,15 +63,14 @@ static void kbasep_get_tiler_block_type(u64 *dst, break; case KBASE_HWCNT_SET_SECONDARY: case KBASE_HWCNT_SET_TERTIARY: - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, - bool is_csf) +static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, bool is_csf) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -87,15 +83,14 @@ static void kbasep_get_sc_block_type(u64 *dst, enum kbase_hwcnt_set counter_set, if (is_csf) *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3; else - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED; break; default: WARN_ON(true); } } -static void kbasep_get_memsys_block_type(u64 *dst, - enum kbase_hwcnt_set counter_set) +static void kbasep_get_memsys_block_type(u64 *dst, enum kbase_hwcnt_set counter_set) { switch (counter_set) { case KBASE_HWCNT_SET_PRIMARY: @@ -105,7 +100,7 @@ static void kbasep_get_memsys_block_type(u64 *dst, *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2; break; case KBASE_HWCNT_SET_TERTIARY: - *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED; + *dst = KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED; break; default: WARN_ON(true); @@ -123,15 +118,14 @@ static void kbasep_get_memsys_block_type(u64 *dst, * * Return: 0 on success, else error code. 
*/ -static int kbasep_hwcnt_backend_gpu_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, const bool is_csf, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **metadata) +static int kbasep_hwcnt_backend_gpu_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + const bool is_csf, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **metadata) { struct kbase_hwcnt_description desc; struct kbase_hwcnt_group_description group; - struct kbase_hwcnt_block_description - blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; + struct kbase_hwcnt_block_description blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; size_t non_sc_block_count; size_t sc_block_count; @@ -157,22 +151,19 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( kbasep_get_fe_block_type(&blks[0].type, counter_set, is_csf); blks[0].inst_cnt = 1; blks[0].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[0].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[0].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* One Tiler block */ kbasep_get_tiler_block_type(&blks[1].type, counter_set); blks[1].inst_cnt = 1; blks[1].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[1].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[1].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* l2_count memsys blks */ kbasep_get_memsys_block_type(&blks[2].type, counter_set); blks[2].inst_cnt = gpu_info->l2_count; blks[2].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[2].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[2].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; /* * There are as many shader cores in the system as there are bits set in @@ -193,8 +184,7 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( kbasep_get_sc_block_type(&blks[3].type, counter_set, is_csf); blks[3].inst_cnt = sc_block_count; blks[3].hdr_cnt = KBASE_HWCNT_V5_HEADERS_PER_BLOCK; - blks[3].ctr_cnt = gpu_info->prfcnt_values_per_block - - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; + blks[3].ctr_cnt = gpu_info->prfcnt_values_per_block - KBASE_HWCNT_V5_HEADERS_PER_BLOCK; WARN_ON(KBASE_HWCNT_V5_BLOCK_TYPE_COUNT != 4); @@ -221,8 +211,7 @@ static int kbasep_hwcnt_backend_gpu_metadata_create( * * Return: Size of buffer the GPU needs to perform a counter dump. */ -static size_t -kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) +static size_t kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) { WARN_ON(!gpu_info); @@ -230,11 +219,10 @@ kbasep_hwcnt_backend_jm_dump_bytes(const struct kbase_hwcnt_gpu_info *gpu_info) gpu_info->prfcnt_values_per_block * KBASE_HWCNT_VALUE_HW_BYTES; } -int kbase_hwcnt_jm_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata, - size_t *out_dump_bytes) +int kbase_hwcnt_jm_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata, + size_t *out_dump_bytes) { int errcode; const struct kbase_hwcnt_metadata *metadata; @@ -251,8 +239,7 @@ int kbase_hwcnt_jm_metadata_create( * all the available L2 cache and Shader cores are allocated. 
*/ dump_bytes = kbasep_hwcnt_backend_jm_dump_bytes(gpu_info); - errcode = kbasep_hwcnt_backend_gpu_metadata_create( - gpu_info, false, counter_set, &metadata); + errcode = kbasep_hwcnt_backend_gpu_metadata_create(gpu_info, false, counter_set, &metadata); if (errcode) return errcode; @@ -277,10 +264,9 @@ void kbase_hwcnt_jm_metadata_destroy(const struct kbase_hwcnt_metadata *metadata kbase_hwcnt_metadata_destroy(metadata); } -int kbase_hwcnt_csf_metadata_create( - const struct kbase_hwcnt_gpu_info *gpu_info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata) +int kbase_hwcnt_csf_metadata_create(const struct kbase_hwcnt_gpu_info *gpu_info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata) { int errcode; const struct kbase_hwcnt_metadata *metadata; @@ -288,8 +274,7 @@ int kbase_hwcnt_csf_metadata_create( if (!gpu_info || !out_metadata) return -EINVAL; - errcode = kbasep_hwcnt_backend_gpu_metadata_create( - gpu_info, true, counter_set, &metadata); + errcode = kbasep_hwcnt_backend_gpu_metadata_create(gpu_info, true, counter_set, &metadata); if (errcode) return errcode; @@ -298,8 +283,7 @@ int kbase_hwcnt_csf_metadata_create( return 0; } -void kbase_hwcnt_csf_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata) +void kbase_hwcnt_csf_metadata_destroy(const struct kbase_hwcnt_metadata *metadata) { if (!metadata) return; @@ -307,10 +291,7 @@ void kbase_hwcnt_csf_metadata_destroy( kbase_hwcnt_metadata_destroy(metadata); } -static bool is_block_type_shader( - const u64 grp_type, - const u64 blk_type, - const size_t blk) +static bool is_block_type_shader(const u64 grp_type, const u64 blk_type, const size_t blk) { bool is_shader = false; @@ -320,22 +301,22 @@ static bool is_block_type_shader( if (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC || blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2 || - blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3) + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3 || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED) is_shader = true; return is_shader; } -static bool is_block_type_l2_cache( - const u64 grp_type, - const u64 blk_type) +static bool is_block_type_l2_cache(const u64 grp_type, const u64 blk_type) { bool is_l2_cache = false; switch (grp_type) { case KBASE_HWCNT_GPU_GROUP_TYPE_V5: if (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS || - blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2) + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2 || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED) is_l2_cache = true; break; default: @@ -347,10 +328,8 @@ static bool is_block_type_l2_cache( } int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - u64 pm_core_mask, - const struct kbase_hwcnt_curr_config *curr_config, - bool accumulate) + const struct kbase_hwcnt_enable_map *dst_enable_map, u64 pm_core_mask, + const struct kbase_hwcnt_curr_config *curr_config, bool accumulate) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; @@ -361,28 +340,23 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, /* Variables to deal with the current configuration */ int l2_count = 0; - if (!dst || !src || !dst_enable_map || - (dst_enable_map->metadata != dst->metadata)) + if (!dst || !src || !dst_enable_map || (dst_enable_map->metadata != dst->metadata)) return -EINVAL; metadata = dst->metadata; - 
kbase_hwcnt_metadata_for_each_block( - metadata, grp, blk, blk_inst) { - const size_t hdr_cnt = - kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); const size_t ctr_cnt = - kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); + kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); const bool is_shader_core = is_block_type_shader( - kbase_hwcnt_metadata_group_type(metadata, grp), - blk_type, blk); + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type, blk); const bool is_l2_cache = is_block_type_l2_cache( - kbase_hwcnt_metadata_group_type(metadata, grp), - blk_type); + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); + const bool is_undefined = kbase_hwcnt_is_block_type_undefined( + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); bool hw_res_available = true; /* @@ -409,25 +383,46 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, /* * Skip block if no values in the destination block are enabled. */ - if (kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); + if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) { + u64 *dst_blk = + kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); const u64 *src_blk = dump_src + src_offset; + bool blk_powered; + + if (!is_shader_core) { + /* Under the current PM system, counters will + * only be enabled after all non shader core + * blocks are powered up. + */ + blk_powered = true; + } else { + /* Check the PM core mask to see if the shader + * core is powered up. + */ + blk_powered = core_mask & 1; + } - if ((!is_shader_core || (core_mask & 1)) && hw_res_available) { + if (blk_powered && !is_undefined && hw_res_available) { + /* Only powered and defined blocks have valid data. */ if (accumulate) { - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, - ctr_cnt); + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, + hdr_cnt, ctr_cnt); } else { - kbase_hwcnt_dump_buffer_block_copy( - dst_blk, src_blk, - (hdr_cnt + ctr_cnt)); + kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, + (hdr_cnt + ctr_cnt)); + } + } else { + /* Even though the block might be undefined, the + * user has enabled counter collection for it. + * We should not propagate garbage data. 
+ */ + if (accumulate) { + /* No-op to preserve existing values */ + } else { + /* src is garbage, so zero the dst */ + kbase_hwcnt_dump_buffer_block_zero(dst_blk, + (hdr_cnt + ctr_cnt)); } - } else if (!accumulate) { - kbase_hwcnt_dump_buffer_block_zero( - dst_blk, (hdr_cnt + ctr_cnt)); } } @@ -442,42 +437,55 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, } int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate) + const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate) { const struct kbase_hwcnt_metadata *metadata; const u64 *dump_src = src; size_t src_offset = 0; size_t grp, blk, blk_inst; - if (!dst || !src || !dst_enable_map || - (dst_enable_map->metadata != dst->metadata)) + if (!dst || !src || !dst_enable_map || (dst_enable_map->metadata != dst->metadata)) return -EINVAL; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); const size_t ctr_cnt = - kbase_hwcnt_metadata_block_counters_count(metadata, grp, - blk); + kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + const uint64_t blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + const bool is_undefined = kbase_hwcnt_is_block_type_undefined( + kbase_hwcnt_metadata_group_type(metadata, grp), blk_type); /* * Skip block if no values in the destination block are enabled. */ - if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, - blk, blk_inst)) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); + if (kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) { + u64 *dst_blk = + kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); const u64 *src_blk = dump_src + src_offset; - if (accumulate) { - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, ctr_cnt); + if (!is_undefined) { + if (accumulate) { + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, + hdr_cnt, ctr_cnt); + } else { + kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, + (hdr_cnt + ctr_cnt)); + } } else { - kbase_hwcnt_dump_buffer_block_copy( - dst_blk, src_blk, (hdr_cnt + ctr_cnt)); + /* Even though the block might be undefined, the + * user has enabled counter collection for it. + * We should not propagate garbage data. + */ + if (accumulate) { + /* No-op to preserve existing values */ + } else { + /* src is garbage, so zero the dst */ + kbase_hwcnt_dump_buffer_block_zero(dst_blk, + (hdr_cnt + ctr_cnt)); + } } } @@ -498,12 +506,9 @@ int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, * @hi: Non-NULL pointer to where high 64 bits of block enable map abstraction * will be stored. 
*/ -static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical( - u32 phys, - u64 *lo, - u64 *hi) +static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical(u32 phys, u64 *lo, u64 *hi) { - u64 dwords[2] = {0, 0}; + u64 dwords[2] = { 0, 0 }; size_t dword_idx; @@ -528,9 +533,8 @@ static inline void kbasep_hwcnt_backend_gpu_block_map_from_physical( *hi = dwords[1]; } -void kbase_hwcnt_gpu_enable_map_to_physical( - struct kbase_hwcnt_physical_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +void kbase_hwcnt_gpu_enable_map_to_physical(struct kbase_hwcnt_physical_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { const struct kbase_hwcnt_metadata *metadata; u64 fe_bm[EM_COUNT] = { 0 }; @@ -544,17 +548,13 @@ void kbase_hwcnt_gpu_enable_map_to_physical( metadata = src->metadata; - kbase_hwcnt_metadata_for_each_block( - metadata, grp, blk, blk_inst) { - const u64 grp_type = kbase_hwcnt_metadata_group_type( - metadata, grp); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); - const u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - src, grp, blk, blk_inst); - - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + const u64 *blk_map = kbase_hwcnt_enable_map_block_instance(src, grp, blk, blk_inst); + + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); size_t map_idx; @@ -564,7 +564,10 @@ void kbase_hwcnt_gpu_enable_map_to_physical( break; switch ((enum kbase_hwcnt_gpu_v5_block_type)blk_type) { - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: /* Nothing to do in this case. 
*/ break; case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: @@ -602,8 +605,7 @@ void kbase_hwcnt_gpu_enable_map_to_physical( kbase_hwcnt_backend_gpu_block_map_to_physical(mmu_l2_bm[EM_LO], mmu_l2_bm[EM_HI]); } -void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, - enum kbase_hwcnt_set src) +void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, enum kbase_hwcnt_set src) { switch (src) { case KBASE_HWCNT_SET_PRIMARY: @@ -620,9 +622,8 @@ void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, } } -void kbase_hwcnt_gpu_enable_map_from_physical( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_physical_enable_map *src) +void kbase_hwcnt_gpu_enable_map_from_physical(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_physical_enable_map *src) { const struct kbase_hwcnt_metadata *metadata; @@ -645,16 +646,13 @@ void kbase_hwcnt_gpu_enable_map_from_physical( kbasep_hwcnt_backend_gpu_block_map_from_physical(src->mmu_l2_bm, &mmu_l2_bm[EM_LO], &mmu_l2_bm[EM_HI]); - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const u64 grp_type = kbase_hwcnt_metadata_group_type( - metadata, grp); - const u64 blk_type = kbase_hwcnt_metadata_block_type( - metadata, grp, blk); - u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - dst, grp, blk, blk_inst); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + const u64 blk_type = kbase_hwcnt_metadata_block_type(metadata, grp, blk); + u64 *blk_map = kbase_hwcnt_enable_map_block_instance(dst, grp, blk, blk_inst); - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); size_t map_idx; @@ -664,7 +662,10 @@ void kbase_hwcnt_gpu_enable_map_from_physical( break; switch ((enum kbase_hwcnt_gpu_v5_block_type)blk_type) { - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: /* Nothing to do in this case. 
*/ break; case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: @@ -694,29 +695,25 @@ void kbase_hwcnt_gpu_enable_map_from_physical( } } -void kbase_hwcnt_gpu_patch_dump_headers( - struct kbase_hwcnt_dump_buffer *buf, - const struct kbase_hwcnt_enable_map *enable_map) +void kbase_hwcnt_gpu_patch_dump_headers(struct kbase_hwcnt_dump_buffer *buf, + const struct kbase_hwcnt_enable_map *enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!buf) || WARN_ON(!enable_map) || - WARN_ON(buf->metadata != enable_map->metadata)) + if (WARN_ON(!buf) || WARN_ON(!enable_map) || WARN_ON(buf->metadata != enable_map->metadata)) return; metadata = buf->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - const u64 grp_type = - kbase_hwcnt_metadata_group_type(metadata, grp); - u64 *buf_blk = kbase_hwcnt_dump_buffer_block_instance( - buf, grp, blk, blk_inst); - const u64 *blk_map = kbase_hwcnt_enable_map_block_instance( - enable_map, grp, blk, blk_inst); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + const u64 grp_type = kbase_hwcnt_metadata_group_type(metadata, grp); + u64 *buf_blk = kbase_hwcnt_dump_buffer_block_instance(buf, grp, blk, blk_inst); + const u64 *blk_map = + kbase_hwcnt_enable_map_block_instance(enable_map, grp, blk, blk_inst); - if ((enum kbase_hwcnt_gpu_group_type)grp_type == - KBASE_HWCNT_GPU_GROUP_TYPE_V5) { + if ((enum kbase_hwcnt_gpu_group_type)grp_type == KBASE_HWCNT_GPU_GROUP_TYPE_V5) { const size_t map_stride = kbase_hwcnt_metadata_block_enable_map_stride(metadata, grp, blk); u64 prfcnt_bm[EM_COUNT] = { 0 }; diff --git a/mali_kbase/mali_kbase_hwcnt_gpu.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.h index 648f85f..a49c31e 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,6 +22,7 @@ #ifndef _KBASE_HWCNT_GPU_H_ #define _KBASE_HWCNT_GPU_H_ +#include <linux/bug.h> #include <linux/types.h> struct kbase_device; @@ -33,9 +34,8 @@ struct kbase_hwcnt_dump_buffer; #define KBASE_HWCNT_V5_BLOCK_TYPE_COUNT 4 #define KBASE_HWCNT_V5_HEADERS_PER_BLOCK 4 #define KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK 60 -#define KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK \ - (KBASE_HWCNT_V5_HEADERS_PER_BLOCK + \ - KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK) +#define KBASE_HWCNT_V5_DEFAULT_VALUES_PER_BLOCK \ + (KBASE_HWCNT_V5_HEADERS_PER_BLOCK + KBASE_HWCNT_V5_DEFAULT_COUNTERS_PER_BLOCK) /* FrontEnd block count in V5 GPU hardware counter. */ #define KBASE_HWCNT_V5_FE_BLOCK_COUNT 1 @@ -60,33 +60,40 @@ enum kbase_hwcnt_gpu_group_type { /** * enum kbase_hwcnt_gpu_v5_block_type - GPU V5 hardware counter block types, * used to identify metadata blocks. - * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: Undefined block (e.g. if a - * counter set that a block - * doesn't support is used). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE: Front End block (Job manager * or CSF HW). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2: Secondary Front End block (Job * manager or CSF HW). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3: Tertiary Front End block (Job * manager or CSF HW). 
+ * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: Undefined Front End block + * (e.g. if a counter set that + * a block doesn't support is + * used). * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER: Tiler block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: Undefined Tiler block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC: Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2: Secondary Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3: Tertiary Shader Core block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: Undefined Shader Core block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS: Memsys block. * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2: Secondary Memsys block. + * @KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: Undefined Memsys block. */ enum kbase_hwcnt_gpu_v5_block_type { - KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE2, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE3, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC2, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC3, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS, KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS2, + KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED, }; /** @@ -188,6 +195,27 @@ struct kbase_hwcnt_curr_config { }; /** + * kbase_hwcnt_is_block_type_undefined() - Check if a block type is undefined. + * + * @grp_type: Hardware counter group type. + * @blk_type: Hardware counter block type. + * + * Return: true if the block type is undefined, else false. + */ +static inline bool kbase_hwcnt_is_block_type_undefined(const uint64_t grp_type, + const uint64_t blk_type) +{ + /* Warn on unknown group type */ + if (WARN_ON(grp_type != KBASE_HWCNT_GPU_GROUP_TYPE_V5)) + return false; + + return (blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED || + blk_type == KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED); +} + +/** * kbase_hwcnt_jm_metadata_create() - Create hardware counter metadata for the * JM GPUs. * @info: Non-NULL pointer to info struct. @@ -199,19 +227,17 @@ struct kbase_hwcnt_curr_config { * * Return: 0 on success, else error code. */ -int kbase_hwcnt_jm_metadata_create( - const struct kbase_hwcnt_gpu_info *info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata, - size_t *out_dump_bytes); +int kbase_hwcnt_jm_metadata_create(const struct kbase_hwcnt_gpu_info *info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata, + size_t *out_dump_bytes); /** * kbase_hwcnt_jm_metadata_destroy() - Destroy JM GPU hardware counter metadata. * * @metadata: Pointer to metadata to destroy. */ -void kbase_hwcnt_jm_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata); +void kbase_hwcnt_jm_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); /** * kbase_hwcnt_csf_metadata_create() - Create hardware counter metadata for the @@ -223,18 +249,16 @@ void kbase_hwcnt_jm_metadata_destroy( * * Return: 0 on success, else error code. 
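The hunk above replaces the single shared PERF_UNDEFINED entry with one undefined variant per block class and adds kbase_hwcnt_is_block_type_undefined() so callers can test for any of them with a single predicate. The snippet below is a standalone userspace sketch of that check; the reduced enum and the sample dump loop are illustrative only, not driver code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum blk_type {
	BLK_FE, BLK_FE_UNDEFINED,
	BLK_TILER, BLK_TILER_UNDEFINED,
	BLK_SC, BLK_SC_UNDEFINED,
	BLK_MEMSYS, BLK_MEMSYS_UNDEFINED,
};

/* Same shape as kbase_hwcnt_is_block_type_undefined(), minus the group-type
 * WARN_ON, and using the reduced enum above. */
static bool blk_type_is_undefined(enum blk_type t)
{
	return t == BLK_FE_UNDEFINED || t == BLK_TILER_UNDEFINED ||
	       t == BLK_SC_UNDEFINED || t == BLK_MEMSYS_UNDEFINED;
}

int main(void)
{
	/* A block whose chosen counter set is unsupported shows up as the
	 * matching *_UNDEFINED type; consumers simply skip it. */
	const enum blk_type blocks[] = { BLK_FE, BLK_TILER, BLK_SC_UNDEFINED, BLK_MEMSYS };

	for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++)
		printf("block %zu: %s\n", i,
		       blk_type_is_undefined(blocks[i]) ? "skip" : "dump");
	return 0;
}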
*/ -int kbase_hwcnt_csf_metadata_create( - const struct kbase_hwcnt_gpu_info *info, - enum kbase_hwcnt_set counter_set, - const struct kbase_hwcnt_metadata **out_metadata); +int kbase_hwcnt_csf_metadata_create(const struct kbase_hwcnt_gpu_info *info, + enum kbase_hwcnt_set counter_set, + const struct kbase_hwcnt_metadata **out_metadata); /** * kbase_hwcnt_csf_metadata_destroy() - Destroy CSF GPU hardware counter * metadata. * @metadata: Pointer to metadata to destroy. */ -void kbase_hwcnt_csf_metadata_destroy( - const struct kbase_hwcnt_metadata *metadata); +void kbase_hwcnt_csf_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); /** * kbase_hwcnt_jm_dump_get() - Copy or accumulate enabled counters from the raw @@ -260,8 +284,7 @@ void kbase_hwcnt_csf_metadata_destroy( int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, const struct kbase_hwcnt_enable_map *dst_enable_map, const u64 pm_core_mask, - const struct kbase_hwcnt_curr_config *curr_config, - bool accumulate); + const struct kbase_hwcnt_curr_config *curr_config, bool accumulate); /** * kbase_hwcnt_csf_dump_get() - Copy or accumulate enabled counters from the raw @@ -281,8 +304,7 @@ int kbase_hwcnt_jm_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, * Return: 0 on success, else error code. */ int kbase_hwcnt_csf_dump_get(struct kbase_hwcnt_dump_buffer *dst, u64 *src, - const struct kbase_hwcnt_enable_map *dst_enable_map, - bool accumulate); + const struct kbase_hwcnt_enable_map *dst_enable_map, bool accumulate); /** * kbase_hwcnt_backend_gpu_block_map_to_physical() - Convert from a block @@ -336,9 +358,8 @@ static inline u32 kbase_hwcnt_backend_gpu_block_map_to_physical(u64 lo, u64 hi) * individual counter block value, but the physical enable map uses 1 bit for * every 4 counters, shared over all instances of a block. */ -void kbase_hwcnt_gpu_enable_map_to_physical( - struct kbase_hwcnt_physical_enable_map *dst, - const struct kbase_hwcnt_enable_map *src); +void kbase_hwcnt_gpu_enable_map_to_physical(struct kbase_hwcnt_physical_enable_map *dst, + const struct kbase_hwcnt_enable_map *src); /** * kbase_hwcnt_gpu_set_to_physical() - Map counter set selection to physical @@ -347,8 +368,7 @@ void kbase_hwcnt_gpu_enable_map_to_physical( * @dst: Non-NULL pointer to destination physical SET_SELECT value. * @src: Non-NULL pointer to source counter set selection. */ -void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, - enum kbase_hwcnt_set src); +void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, enum kbase_hwcnt_set src); /** * kbase_hwcnt_gpu_enable_map_from_physical() - Convert a physical enable map to @@ -364,9 +384,8 @@ void kbase_hwcnt_gpu_set_to_physical(enum kbase_hwcnt_physical_set *dst, * more than 64, so the enable map abstraction has nowhere to store the enable * information for the 64 non-existent counters. */ -void kbase_hwcnt_gpu_enable_map_from_physical( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_physical_enable_map *src); +void kbase_hwcnt_gpu_enable_map_from_physical(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_physical_enable_map *src); /** * kbase_hwcnt_gpu_patch_dump_headers() - Patch all the performance counter @@ -382,8 +401,7 @@ void kbase_hwcnt_gpu_enable_map_from_physical( * kernel-user boundary, to ensure the header is accurate for the enable map * used by the user. 
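The comments above describe the two directions of the enable-map conversion: the enable map keeps one bit per counter value, while the physical map shares one bit across every four counters of a block (and across all instances of that block), which is also why converting back from a physical map is coarse. The sketch below models that folding in plain userspace C; the OR-of-four rule and the 128-value block size are assumptions for illustration, not a copy of kbase_hwcnt_backend_gpu_block_map_to_physical().

#include <assert.h>
#include <stdint.h>

static uint32_t block_map_to_physical(uint64_t lo, uint64_t hi)
{
	uint32_t phys = 0;

	for (unsigned int group = 0; group < 32; group++) {
		uint64_t word = (group < 16) ? lo : hi;
		unsigned int shift = (group % 16) * 4;

		if ((word >> shift) & 0xF)	/* any of the 4 counters enabled */
			phys |= 1u << group;
	}
	return phys;
}

static void block_map_from_physical(uint32_t phys, uint64_t *lo, uint64_t *hi)
{
	*lo = 0;
	*hi = 0;
	for (unsigned int group = 0; group < 32; group++) {
		if (!(phys & (1u << group)))
			continue;
		/* Expanding is coarse: all 4 counters in the group come back. */
		if (group < 16)
			*lo |= 0xFull << ((group % 16) * 4);
		else
			*hi |= 0xFull << ((group % 16) * 4);
	}
}

int main(void)
{
	uint64_t lo, hi;
	uint32_t phys = block_map_to_physical(0x1, 0);	/* only counter 0 enabled */

	assert(phys == 0x1);
	block_map_from_physical(phys, &lo, &hi);
	assert(lo == 0xF && hi == 0);	/* counters 0-3 return: round trip is lossy */
	return 0;
}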
*/ -void kbase_hwcnt_gpu_patch_dump_headers( - struct kbase_hwcnt_dump_buffer *buf, - const struct kbase_hwcnt_enable_map *enable_map); +void kbase_hwcnt_gpu_patch_dump_headers(struct kbase_hwcnt_dump_buffer *buf, + const struct kbase_hwcnt_enable_map *enable_map); #endif /* _KBASE_HWCNT_GPU_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.c index e2caa1c..0cf2f94 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,21 +19,19 @@ * */ -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_gpu_narrow.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_gpu_narrow.h" #include <linux/bug.h> #include <linux/err.h> #include <linux/slab.h> -int kbase_hwcnt_gpu_metadata_narrow_create( - const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, - const struct kbase_hwcnt_metadata *src_md) +int kbase_hwcnt_gpu_metadata_narrow_create(const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, + const struct kbase_hwcnt_metadata *src_md) { struct kbase_hwcnt_description desc; struct kbase_hwcnt_group_description group; - struct kbase_hwcnt_block_description - blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; + struct kbase_hwcnt_block_description blks[KBASE_HWCNT_V5_BLOCK_TYPE_COUNT]; size_t prfcnt_values_per_block; size_t blk; int err; @@ -47,18 +45,15 @@ int kbase_hwcnt_gpu_metadata_narrow_create( * count in the metadata. */ if ((kbase_hwcnt_metadata_group_count(src_md) != 1) || - (kbase_hwcnt_metadata_block_count(src_md, 0) != - KBASE_HWCNT_V5_BLOCK_TYPE_COUNT)) + (kbase_hwcnt_metadata_block_count(src_md, 0) != KBASE_HWCNT_V5_BLOCK_TYPE_COUNT)) return -EINVAL; /* Get the values count in the first block. */ - prfcnt_values_per_block = - kbase_hwcnt_metadata_block_values_count(src_md, 0, 0); + prfcnt_values_per_block = kbase_hwcnt_metadata_block_values_count(src_md, 0, 0); /* check all blocks should have same values count. */ for (blk = 1; blk < KBASE_HWCNT_V5_BLOCK_TYPE_COUNT; blk++) { - size_t val_cnt = - kbase_hwcnt_metadata_block_values_count(src_md, 0, blk); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(src_md, 0, blk); if (val_cnt != prfcnt_values_per_block) return -EINVAL; } @@ -75,12 +70,10 @@ int kbase_hwcnt_gpu_metadata_narrow_create( prfcnt_values_per_block = 64; for (blk = 0; blk < KBASE_HWCNT_V5_BLOCK_TYPE_COUNT; blk++) { - size_t blk_hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - src_md, 0, blk); + size_t blk_hdr_cnt = kbase_hwcnt_metadata_block_headers_count(src_md, 0, blk); blks[blk] = (struct kbase_hwcnt_block_description){ .type = kbase_hwcnt_metadata_block_type(src_md, 0, blk), - .inst_cnt = kbase_hwcnt_metadata_block_instance_count( - src_md, 0, blk), + .inst_cnt = kbase_hwcnt_metadata_block_instance_count(src_md, 0, blk), .hdr_cnt = blk_hdr_cnt, .ctr_cnt = prfcnt_values_per_block - blk_hdr_cnt, }; @@ -105,8 +98,7 @@ int kbase_hwcnt_gpu_metadata_narrow_create( * only supports 32-bit but the created metadata uses 64-bit for * block entry. 
*/ - metadata_narrow->dump_buf_bytes = - metadata_narrow->metadata->dump_buf_bytes >> 1; + metadata_narrow->dump_buf_bytes = metadata_narrow->metadata->dump_buf_bytes >> 1; *dst_md_narrow = metadata_narrow; } else { kfree(metadata_narrow); @@ -115,8 +107,7 @@ int kbase_hwcnt_gpu_metadata_narrow_create( return err; } -void kbase_hwcnt_gpu_metadata_narrow_destroy( - const struct kbase_hwcnt_metadata_narrow *md_narrow) +void kbase_hwcnt_gpu_metadata_narrow_destroy(const struct kbase_hwcnt_metadata_narrow *md_narrow) { if (!md_narrow) return; @@ -125,9 +116,8 @@ void kbase_hwcnt_gpu_metadata_narrow_destroy( kfree(md_narrow); } -int kbase_hwcnt_dump_buffer_narrow_alloc( - const struct kbase_hwcnt_metadata_narrow *md_narrow, - struct kbase_hwcnt_dump_buffer_narrow *dump_buf) +int kbase_hwcnt_dump_buffer_narrow_alloc(const struct kbase_hwcnt_metadata_narrow *md_narrow, + struct kbase_hwcnt_dump_buffer_narrow *dump_buf) { size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; @@ -137,8 +127,7 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( return -EINVAL; dump_buf_bytes = md_narrow->dump_buf_bytes; - clk_cnt_buf_bytes = - sizeof(*dump_buf->clk_cnt_buf) * md_narrow->metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_buf->clk_cnt_buf) * md_narrow->metadata->clk_cnt; /* Make a single allocation for both dump_buf and clk_cnt_buf. */ buf = kmalloc(dump_buf_bytes + clk_cnt_buf_bytes, GFP_KERNEL); @@ -154,14 +143,15 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( return 0; } -void kbase_hwcnt_dump_buffer_narrow_free( - struct kbase_hwcnt_dump_buffer_narrow *dump_buf_narrow) +void kbase_hwcnt_dump_buffer_narrow_free(struct kbase_hwcnt_dump_buffer_narrow *dump_buf_narrow) { if (!dump_buf_narrow) return; kfree(dump_buf_narrow->dump_buf); - *dump_buf_narrow = (struct kbase_hwcnt_dump_buffer_narrow){ 0 }; + *dump_buf_narrow = (struct kbase_hwcnt_dump_buffer_narrow){ .md_narrow = NULL, + .dump_buf = NULL, + .clk_cnt_buf = NULL }; } int kbase_hwcnt_dump_buffer_narrow_array_alloc( @@ -180,8 +170,7 @@ int kbase_hwcnt_dump_buffer_narrow_array_alloc( return -EINVAL; dump_buf_bytes = md_narrow->dump_buf_bytes; - clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * - md_narrow->metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * md_narrow->metadata->clk_cnt; /* Allocate memory for the dump buffer struct array */ buffers = kmalloc_array(n, sizeof(*buffers), GFP_KERNEL); @@ -234,27 +223,22 @@ void kbase_hwcnt_dump_buffer_narrow_array_free( memset(dump_bufs, 0, sizeof(*dump_bufs)); } -void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt) +void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt) { size_t val; for (val = 0; val < val_cnt; val++) { - bool val_enabled = - kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); - u32 src_val = - (src_blk[val] > U32_MAX) ? U32_MAX : (u32)src_blk[val]; + bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); + u32 src_val = (src_blk[val] > U32_MAX) ? U32_MAX : (u32)src_blk[val]; dst_blk[val] = val_enabled ? 
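kbase_hwcnt_dump_buffer_block_copy_strict_narrow() above makes the narrowing rule explicit: 64-bit values above U32_MAX are clamped rather than truncated on the way into the 32-bit user buffer, and non-enabled values are written as zero. A minimal sketch of the same rule, with the enable map simplified to a single bitmask:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void copy_strict_narrow(uint32_t *dst, const uint64_t *src,
			       uint64_t enable_bits, size_t val_cnt)
{
	for (size_t val = 0; val < val_cnt; val++) {
		int enabled = (enable_bits >> val) & 1;
		uint32_t narrowed = (src[val] > UINT32_MAX) ? UINT32_MAX : (uint32_t)src[val];

		dst[val] = enabled ? narrowed : 0;
	}
}

int main(void)
{
	const uint64_t src[3] = { 42, 0x1FFFFFFFFULL, 7 };
	uint32_t dst[3];

	copy_strict_narrow(dst, src, 0x3 /* only values 0 and 1 enabled */, 3);
	assert(dst[0] == 42);
	assert(dst[1] == UINT32_MAX);	/* saturated, not truncated */
	assert(dst[2] == 0);		/* disabled value is zeroed */
	return 0;
}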
src_val : 0; } } -void kbase_hwcnt_dump_buffer_copy_strict_narrow( - struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy_strict_narrow(struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata_narrow *metadata_narrow; size_t grp; @@ -262,68 +246,53 @@ void kbase_hwcnt_dump_buffer_copy_strict_narrow( if (WARN_ON(!dst_narrow) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst_narrow->md_narrow->metadata == src->metadata) || - WARN_ON(dst_narrow->md_narrow->metadata->grp_cnt != - src->metadata->grp_cnt) || + WARN_ON(dst_narrow->md_narrow->metadata->grp_cnt != src->metadata->grp_cnt) || WARN_ON(src->metadata->grp_cnt != 1) || WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_cnt != src->metadata->grp_metadata[0].blk_cnt) || WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_cnt != KBASE_HWCNT_V5_BLOCK_TYPE_COUNT) || - WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0] - .blk_metadata[0] - .ctr_cnt > + WARN_ON(dst_narrow->md_narrow->metadata->grp_metadata[0].blk_metadata[0].ctr_cnt > src->metadata->grp_metadata[0].blk_metadata[0].ctr_cnt)) return; /* Don't use src metadata since src buffer is bigger than dst buffer. */ metadata_narrow = dst_narrow->md_narrow; - for (grp = 0; - grp < kbase_hwcnt_metadata_narrow_group_count(metadata_narrow); - grp++) { + for (grp = 0; grp < kbase_hwcnt_metadata_narrow_group_count(metadata_narrow); grp++) { size_t blk; - size_t blk_cnt = kbase_hwcnt_metadata_narrow_block_count( - metadata_narrow, grp); + size_t blk_cnt = kbase_hwcnt_metadata_narrow_block_count(metadata_narrow, grp); for (blk = 0; blk < blk_cnt; blk++) { size_t blk_inst; - size_t blk_inst_cnt = - kbase_hwcnt_metadata_narrow_block_instance_count( - metadata_narrow, grp, blk); + size_t blk_inst_cnt = kbase_hwcnt_metadata_narrow_block_instance_count( + metadata_narrow, grp, blk); - for (blk_inst = 0; blk_inst < blk_inst_cnt; - blk_inst++) { + for (blk_inst = 0; blk_inst < blk_inst_cnt; blk_inst++) { /* The narrowed down buffer is only 32-bit. 
*/ - u32 *dst_blk = - kbase_hwcnt_dump_buffer_narrow_block_instance( - dst_narrow, grp, blk, blk_inst); - const u64 *src_blk = - kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = - kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, - blk_inst); - size_t val_cnt = - kbase_hwcnt_metadata_narrow_block_values_count( - metadata_narrow, grp, blk); + u32 *dst_blk = kbase_hwcnt_dump_buffer_narrow_block_instance( + dst_narrow, grp, blk, blk_inst); + const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( + src, grp, blk, blk_inst); + const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( + dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_narrow_block_values_count( + metadata_narrow, grp, blk); /* Align upwards to include padding bytes */ val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( - val_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / + KBASE_HWCNT_VALUE_BYTES)); - kbase_hwcnt_dump_buffer_block_copy_strict_narrow( - dst_blk, src_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict_narrow(dst_blk, src_blk, + blk_em, val_cnt); } } } for (clk = 0; clk < metadata_narrow->metadata->clk_cnt; clk++) { - bool clk_enabled = kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk); + bool clk_enabled = + kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk); - dst_narrow->clk_cnt_buf[clk] = - clk_enabled ? src->clk_cnt_buf[clk] : 0; + dst_narrow->clk_cnt_buf[clk] = clk_enabled ? src->clk_cnt_buf[clk] : 0; } } diff --git a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.h index af6fa19..afd236d 100644 --- a/mali_kbase/mali_kbase_hwcnt_gpu_narrow.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_gpu_narrow.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -22,7 +22,7 @@ #ifndef _KBASE_HWCNT_GPU_NARROW_H_ #define _KBASE_HWCNT_GPU_NARROW_H_ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/types.h> struct kbase_device; @@ -86,8 +86,8 @@ struct kbase_hwcnt_dump_buffer_narrow_array { * * Return: Number of hardware counter groups described by narrow metadata. */ -static inline size_t kbase_hwcnt_metadata_narrow_group_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow) +static inline size_t +kbase_hwcnt_metadata_narrow_group_count(const struct kbase_hwcnt_metadata_narrow *md_narrow) { return kbase_hwcnt_metadata_group_count(md_narrow->metadata); } @@ -100,8 +100,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_group_count( * * Return: Type of the group grp. */ -static inline u64 kbase_hwcnt_metadata_narrow_group_type( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp) +static inline u64 +kbase_hwcnt_metadata_narrow_group_type(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp) { return kbase_hwcnt_metadata_group_type(md_narrow->metadata, grp); } @@ -114,8 +115,9 @@ static inline u64 kbase_hwcnt_metadata_narrow_group_type( * * Return: Number of blocks in group grp. 
*/ -static inline size_t kbase_hwcnt_metadata_narrow_block_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp) +static inline size_t +kbase_hwcnt_metadata_narrow_block_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp) { return kbase_hwcnt_metadata_block_count(md_narrow->metadata, grp); } @@ -131,11 +133,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_count( * Return: Number of instances of block blk in group grp. */ static inline size_t kbase_hwcnt_metadata_narrow_block_instance_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) + const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_instance_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_instance_count(md_narrow->metadata, grp, blk); } /** @@ -148,12 +148,11 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_instance_count( * * Return: Number of counter headers in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_narrow_block_headers_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_narrow_block_headers_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_headers_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_headers_count(md_narrow->metadata, grp, blk); } /** @@ -167,11 +166,9 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_headers_count( * Return: Number of counters in each instance of block blk in group grp. */ static inline size_t kbase_hwcnt_metadata_narrow_block_counters_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) + const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, size_t blk) { - return kbase_hwcnt_metadata_block_counters_count(md_narrow->metadata, - grp, blk); + return kbase_hwcnt_metadata_block_counters_count(md_narrow->metadata, grp, blk); } /** @@ -184,14 +181,12 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_counters_count( * Return: Number of headers plus counters in each instance of block blk * in group grp. */ -static inline size_t kbase_hwcnt_metadata_narrow_block_values_count( - const struct kbase_hwcnt_metadata_narrow *md_narrow, size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_narrow_block_values_count(const struct kbase_hwcnt_metadata_narrow *md_narrow, + size_t grp, size_t blk) { - return kbase_hwcnt_metadata_narrow_block_counters_count(md_narrow, grp, - blk) + - kbase_hwcnt_metadata_narrow_block_headers_count(md_narrow, grp, - blk); + return kbase_hwcnt_metadata_narrow_block_counters_count(md_narrow, grp, blk) + + kbase_hwcnt_metadata_narrow_block_headers_count(md_narrow, grp, blk); } /** @@ -205,18 +200,13 @@ static inline size_t kbase_hwcnt_metadata_narrow_block_values_count( * * Return: u32* to the dump buffer for the block instance. 
*/ -static inline u32 *kbase_hwcnt_dump_buffer_narrow_block_instance( - const struct kbase_hwcnt_dump_buffer_narrow *buf, size_t grp, - size_t blk, size_t blk_inst) +static inline u32 * +kbase_hwcnt_dump_buffer_narrow_block_instance(const struct kbase_hwcnt_dump_buffer_narrow *buf, + size_t grp, size_t blk, size_t blk_inst) { - return buf->dump_buf + - buf->md_narrow->metadata->grp_metadata[grp].dump_buf_index + - buf->md_narrow->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_index + - (buf->md_narrow->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_stride * + return buf->dump_buf + buf->md_narrow->metadata->grp_metadata[grp].dump_buf_index + + buf->md_narrow->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_index + + (buf->md_narrow->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_stride * blk_inst); } @@ -239,17 +229,15 @@ static inline u32 *kbase_hwcnt_dump_buffer_narrow_block_instance( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_gpu_metadata_narrow_create( - const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, - const struct kbase_hwcnt_metadata *src_md); +int kbase_hwcnt_gpu_metadata_narrow_create(const struct kbase_hwcnt_metadata_narrow **dst_md_narrow, + const struct kbase_hwcnt_metadata *src_md); /** * kbase_hwcnt_gpu_metadata_narrow_destroy() - Destroy a hardware counter narrow * metadata object. * @md_narrow: Pointer to hardware counter narrow metadata. */ -void kbase_hwcnt_gpu_metadata_narrow_destroy( - const struct kbase_hwcnt_metadata_narrow *md_narrow); +void kbase_hwcnt_gpu_metadata_narrow_destroy(const struct kbase_hwcnt_metadata_narrow *md_narrow); /** * kbase_hwcnt_dump_buffer_narrow_alloc() - Allocate a narrow dump buffer. @@ -260,9 +248,8 @@ void kbase_hwcnt_gpu_metadata_narrow_destroy( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_narrow_alloc( - const struct kbase_hwcnt_metadata_narrow *md_narrow, - struct kbase_hwcnt_dump_buffer_narrow *dump_buf); +int kbase_hwcnt_dump_buffer_narrow_alloc(const struct kbase_hwcnt_metadata_narrow *md_narrow, + struct kbase_hwcnt_dump_buffer_narrow *dump_buf); /** * kbase_hwcnt_dump_buffer_narrow_free() - Free a narrow dump buffer. @@ -271,8 +258,7 @@ int kbase_hwcnt_dump_buffer_narrow_alloc( * Can be safely called on an all-zeroed narrow dump buffer structure, or on an * already freed narrow dump buffer. */ -void kbase_hwcnt_dump_buffer_narrow_free( - struct kbase_hwcnt_dump_buffer_narrow *dump_buf); +void kbase_hwcnt_dump_buffer_narrow_free(struct kbase_hwcnt_dump_buffer_narrow *dump_buf); /** * kbase_hwcnt_dump_buffer_narrow_array_alloc() - Allocate an array of narrow @@ -320,10 +306,8 @@ void kbase_hwcnt_dump_buffer_narrow_array_free( * source value is bigger than U32_MAX, or copy the value from source if the * corresponding source value is less than or equal to U32_MAX. */ -void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt); +void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt); /** * kbase_hwcnt_dump_buffer_copy_strict_narrow() - Copy all enabled values to a @@ -339,9 +323,8 @@ void kbase_hwcnt_dump_buffer_block_copy_strict_narrow(u32 *dst_blk, * corresponding source value is bigger than U32_MAX, or copy the value from * source if the corresponding source value is less than or equal to U32_MAX. 
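kbase_hwcnt_dump_buffer_narrow_block_instance() above is pure index arithmetic: group offset plus block offset plus stride times instance, into one flat buffer. The reduced model below shows the same addressing with made-up metadata structs; the real structures carry far more fields.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct blk_md { size_t dump_buf_index; size_t dump_buf_stride; };
struct grp_md { size_t dump_buf_index; struct blk_md blk[2]; };

static uint32_t *narrow_block_instance(uint32_t *dump_buf, const struct grp_md *grp,
				       size_t blk, size_t blk_inst)
{
	return dump_buf + grp->dump_buf_index +
	       grp->blk[blk].dump_buf_index +
	       grp->blk[blk].dump_buf_stride * blk_inst;
}

int main(void)
{
	uint32_t buf[256] = { 0 };
	/* One group at offset 0: block 0 has 2 instances of stride 64,
	 * block 1 starts right after them. */
	struct grp_md grp = {
		.dump_buf_index = 0,
		.blk = { { .dump_buf_index = 0, .dump_buf_stride = 64 },
			 { .dump_buf_index = 128, .dump_buf_stride = 64 } },
	};

	assert(narrow_block_instance(buf, &grp, 0, 1) == buf + 64);
	assert(narrow_block_instance(buf, &grp, 1, 0) == buf + 128);
	return 0;
}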
*/ -void kbase_hwcnt_dump_buffer_copy_strict_narrow( - struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy_strict_narrow(struct kbase_hwcnt_dump_buffer_narrow *dst_narrow, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); #endif /* _KBASE_HWCNT_GPU_NARROW_H_ */ diff --git a/mali_kbase/mali_kbase_hwcnt_types.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.c index d925ed7..763eb31 100644 --- a/mali_kbase/mali_kbase_hwcnt_types.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,13 +19,12 @@ * */ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/slab.h> -int kbase_hwcnt_metadata_create( - const struct kbase_hwcnt_description *desc, - const struct kbase_hwcnt_metadata **out_metadata) +int kbase_hwcnt_metadata_create(const struct kbase_hwcnt_description *desc, + const struct kbase_hwcnt_metadata **out_metadata) { char *buf; struct kbase_hwcnt_metadata *metadata; @@ -56,8 +55,7 @@ int kbase_hwcnt_metadata_create( /* Block metadata */ for (grp = 0; grp < desc->grp_cnt; grp++) { - size += sizeof(struct kbase_hwcnt_block_metadata) * - desc->grps[grp].blk_cnt; + size += sizeof(struct kbase_hwcnt_block_metadata) * desc->grps[grp].blk_cnt; } /* Single allocation for the entire metadata */ @@ -83,8 +81,7 @@ int kbase_hwcnt_metadata_create( for (grp = 0; grp < desc->grp_cnt; grp++) { size_t blk; - const struct kbase_hwcnt_group_description *grp_desc = - desc->grps + grp; + const struct kbase_hwcnt_group_description *grp_desc = desc->grps + grp; struct kbase_hwcnt_group_metadata *grp_md = grp_mds + grp; size_t group_enable_map_count = 0; @@ -94,37 +91,28 @@ int kbase_hwcnt_metadata_create( /* Bump allocate this group's block metadata */ struct kbase_hwcnt_block_metadata *blk_mds = (struct kbase_hwcnt_block_metadata *)(buf + offset); - offset += sizeof(struct kbase_hwcnt_block_metadata) * - grp_desc->blk_cnt; + offset += sizeof(struct kbase_hwcnt_block_metadata) * grp_desc->blk_cnt; /* Fill in each block in the group's information */ for (blk = 0; blk < grp_desc->blk_cnt; blk++) { - const struct kbase_hwcnt_block_description *blk_desc = - grp_desc->blks + blk; - struct kbase_hwcnt_block_metadata *blk_md = - blk_mds + blk; - const size_t n_values = - blk_desc->hdr_cnt + blk_desc->ctr_cnt; + const struct kbase_hwcnt_block_description *blk_desc = grp_desc->blks + blk; + struct kbase_hwcnt_block_metadata *blk_md = blk_mds + blk; + const size_t n_values = blk_desc->hdr_cnt + blk_desc->ctr_cnt; blk_md->type = blk_desc->type; blk_md->inst_cnt = blk_desc->inst_cnt; blk_md->hdr_cnt = blk_desc->hdr_cnt; blk_md->ctr_cnt = blk_desc->ctr_cnt; blk_md->enable_map_index = group_enable_map_count; - blk_md->enable_map_stride = - kbase_hwcnt_bitfield_count(n_values); + blk_md->enable_map_stride = kbase_hwcnt_bitfield_count(n_values); blk_md->dump_buf_index = group_dump_buffer_count; - blk_md->dump_buf_stride = - KBASE_HWCNT_ALIGN_UPWARDS( - n_values, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + 
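kbase_hwcnt_metadata_create() above sizes the whole metadata hierarchy up front, makes a single allocation, and then bump-allocates each group's and block's metadata out of it by advancing an offset, so the destroy path is a single kfree(). The sketch below reproduces that pattern with placeholder structs; names and counts are illustrative only.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct block { size_t hdr_cnt, ctr_cnt; };
struct group { struct block *blocks; size_t blk_cnt; };
struct top { struct group *groups; size_t grp_cnt; };

static struct top *metadata_create(size_t grp_cnt, size_t blk_cnt)
{
	size_t size = sizeof(struct top) +
		      sizeof(struct group) * grp_cnt +
		      sizeof(struct block) * grp_cnt * blk_cnt;
	char *buf = malloc(size);
	size_t offset = 0;
	struct top *top;

	if (!buf)
		return NULL;
	memset(buf, 0, size);

	top = (struct top *)(buf + offset);
	offset += sizeof(struct top);

	top->grp_cnt = grp_cnt;
	top->groups = (struct group *)(buf + offset);
	offset += sizeof(struct group) * grp_cnt;

	for (size_t grp = 0; grp < grp_cnt; grp++) {
		/* Bump allocate this group's block metadata. */
		top->groups[grp].blk_cnt = blk_cnt;
		top->groups[grp].blocks = (struct block *)(buf + offset);
		offset += sizeof(struct block) * blk_cnt;
	}
	assert(offset == size);	/* everything fits in the one allocation */
	return top;
}

int main(void)
{
	struct top *md = metadata_create(1, 4);

	assert(md && md->groups[0].blk_cnt == 4);
	free(md);	/* one free releases the whole hierarchy */
	return 0;
}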
blk_md->dump_buf_stride = KBASE_HWCNT_ALIGN_UPWARDS( + n_values, + (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); blk_md->avail_mask_index = group_avail_mask_bits; - group_enable_map_count += - blk_md->enable_map_stride * blk_md->inst_cnt; - group_dump_buffer_count += - blk_md->dump_buf_stride * blk_md->inst_cnt; + group_enable_map_count += blk_md->enable_map_stride * blk_md->inst_cnt; + group_dump_buffer_count += blk_md->dump_buf_stride * blk_md->inst_cnt; group_avail_mask_bits += blk_md->inst_cnt; } @@ -144,8 +132,7 @@ int kbase_hwcnt_metadata_create( /* Fill in the top level metadata's information */ metadata->grp_cnt = desc->grp_cnt; metadata->grp_metadata = grp_mds; - metadata->enable_map_bytes = - enable_map_count * KBASE_HWCNT_BITFIELD_BYTES; + metadata->enable_map_bytes = enable_map_count * KBASE_HWCNT_BITFIELD_BYTES; metadata->dump_buf_bytes = dump_buf_count * KBASE_HWCNT_VALUE_BYTES; metadata->avail_mask = desc->avail_mask; metadata->clk_cnt = desc->clk_cnt; @@ -155,8 +142,7 @@ int kbase_hwcnt_metadata_create( * bit per 4 bytes in the dump buffer. */ WARN_ON(metadata->dump_buf_bytes != - (metadata->enable_map_bytes * - BITS_PER_BYTE * KBASE_HWCNT_VALUE_BYTES)); + (metadata->enable_map_bytes * BITS_PER_BYTE * KBASE_HWCNT_VALUE_BYTES)); *out_metadata = metadata; return 0; @@ -167,9 +153,8 @@ void kbase_hwcnt_metadata_destroy(const struct kbase_hwcnt_metadata *metadata) kfree(metadata); } -int kbase_hwcnt_enable_map_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_enable_map *enable_map) +int kbase_hwcnt_enable_map_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_enable_map *enable_map) { u64 *enable_map_buf; @@ -177,8 +162,7 @@ int kbase_hwcnt_enable_map_alloc( return -EINVAL; if (metadata->enable_map_bytes > 0) { - enable_map_buf = - kzalloc(metadata->enable_map_bytes, GFP_KERNEL); + enable_map_buf = kzalloc(metadata->enable_map_bytes, GFP_KERNEL); if (!enable_map_buf) return -ENOMEM; } else { @@ -200,9 +184,8 @@ void kbase_hwcnt_enable_map_free(struct kbase_hwcnt_enable_map *enable_map) enable_map->metadata = NULL; } -int kbase_hwcnt_dump_buffer_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_dump_buffer_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_dump_buffer *dump_buf) { size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; @@ -235,10 +218,8 @@ void kbase_hwcnt_dump_buffer_free(struct kbase_hwcnt_dump_buffer *dump_buf) memset(dump_buf, 0, sizeof(*dump_buf)); } -int kbase_hwcnt_dump_buffer_array_alloc( - const struct kbase_hwcnt_metadata *metadata, - size_t n, - struct kbase_hwcnt_dump_buffer_array *dump_bufs) +int kbase_hwcnt_dump_buffer_array_alloc(const struct kbase_hwcnt_metadata *metadata, size_t n, + struct kbase_hwcnt_dump_buffer_array *dump_bufs) { struct kbase_hwcnt_dump_buffer *buffers; size_t buf_idx; @@ -251,8 +232,7 @@ int kbase_hwcnt_dump_buffer_array_alloc( return -EINVAL; dump_buf_bytes = metadata->dump_buf_bytes; - clk_cnt_buf_bytes = - sizeof(*dump_bufs->bufs->clk_cnt_buf) * metadata->clk_cnt; + clk_cnt_buf_bytes = sizeof(*dump_bufs->bufs->clk_cnt_buf) * metadata->clk_cnt; /* Allocate memory for the dump buffer struct array */ buffers = kmalloc_array(n, sizeof(*buffers), GFP_KERNEL); @@ -283,15 +263,13 @@ int kbase_hwcnt_dump_buffer_array_alloc( buffers[buf_idx].metadata = metadata; buffers[buf_idx].dump_buf = (u64 *)(addr + dump_buf_offset); - buffers[buf_idx].clk_cnt_buf = - (u64 *)(addr + 
clk_cnt_buf_offset); + buffers[buf_idx].clk_cnt_buf = (u64 *)(addr + clk_cnt_buf_offset); } return 0; } -void kbase_hwcnt_dump_buffer_array_free( - struct kbase_hwcnt_dump_buffer_array *dump_bufs) +void kbase_hwcnt_dump_buffer_array_free(struct kbase_hwcnt_dump_buffer_array *dump_bufs) { if (!dump_bufs) return; @@ -301,84 +279,71 @@ void kbase_hwcnt_dump_buffer_array_free( memset(dump_bufs, 0, sizeof(*dump_bufs)); } -void kbase_hwcnt_dump_buffer_zero( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_zero(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!dst) || - WARN_ON(!dst_enable_map) || + if (WARN_ON(!dst) || WARN_ON(!dst_enable_map) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; size_t val_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); kbase_hwcnt_dump_buffer_block_zero(dst_blk, val_cnt); } - memset(dst->clk_cnt_buf, 0, - sizeof(*dst->clk_cnt_buf) * metadata->clk_cnt); + memset(dst->clk_cnt_buf, 0, sizeof(*dst->clk_cnt_buf) * metadata->clk_cnt); } -void kbase_hwcnt_dump_buffer_zero_strict( - struct kbase_hwcnt_dump_buffer *dst) +void kbase_hwcnt_dump_buffer_zero_strict(struct kbase_hwcnt_dump_buffer *dst) { if (WARN_ON(!dst)) return; memset(dst->dump_buf, 0, dst->metadata->dump_buf_bytes); - memset(dst->clk_cnt_buf, 0, - sizeof(*dst->clk_cnt_buf) * dst->metadata->clk_cnt); + memset(dst->clk_cnt_buf, 0, sizeof(*dst->clk_cnt_buf) * dst->metadata->clk_cnt); } -void kbase_hwcnt_dump_buffer_zero_non_enabled( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_zero_non_enabled(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; - if (WARN_ON(!dst) || - WARN_ON(!dst_enable_map) || + if (WARN_ON(!dst) || WARN_ON(!dst_enable_map) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - val_cnt = KBASE_HWCNT_ALIGN_UPWARDS(val_cnt, - 
(KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); - if (kbase_hwcnt_metadata_block_instance_avail( - metadata, grp, blk, blk_inst)) { + if (kbase_hwcnt_metadata_block_instance_avail(metadata, grp, blk, blk_inst)) { /* Block available, so only zero non-enabled values */ - kbase_hwcnt_dump_buffer_block_zero_non_enabled( - dst_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_zero_non_enabled(dst_blk, blk_em, val_cnt); } else { /* Block not available, so zero the entire thing */ kbase_hwcnt_dump_buffer_block_zero(dst_blk, val_cnt); @@ -386,188 +351,159 @@ void kbase_hwcnt_dump_buffer_zero_non_enabled( } } -void kbase_hwcnt_dump_buffer_copy( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; const u64 *src_blk; size_t val_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + src_blk = kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); kbase_hwcnt_dump_buffer_block_copy(dst_blk, src_blk, val_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] = src->clk_cnt_buf[clk]; } } -void kbase_hwcnt_dump_buffer_copy_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_copy_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 
*dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t val_cnt = kbase_hwcnt_metadata_block_values_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *src_blk = + kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t val_cnt = kbase_hwcnt_metadata_block_values_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - val_cnt = KBASE_HWCNT_ALIGN_UPWARDS(val_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES)); + val_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + val_cnt, (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES)); - kbase_hwcnt_dump_buffer_block_copy_strict( - dst_blk, src_blk, blk_em, val_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict(dst_blk, src_blk, blk_em, val_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { bool clk_enabled = - kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk); + kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk); dst->clk_cnt_buf[clk] = clk_enabled ? src->clk_cnt_buf[clk] : 0; } } -void kbase_hwcnt_dump_buffer_accumulate( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_accumulate(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { u64 *dst_blk; const u64 *src_blk; size_t hdr_cnt; size_t ctr_cnt; - if (!kbase_hwcnt_enable_map_block_enabled( - dst_enable_map, grp, blk, blk_inst)) + if (!kbase_hwcnt_enable_map_block_enabled(dst_enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); - ctr_cnt = kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); - - kbase_hwcnt_dump_buffer_block_accumulate( - dst_blk, src_blk, hdr_cnt, ctr_cnt); + dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + src_blk = kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); + ctr_cnt = kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); + + kbase_hwcnt_dump_buffer_block_accumulate(dst_blk, src_blk, hdr_cnt, ctr_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - 
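Unlike the copy paths, kbase_hwcnt_dump_buffer_accumulate() above hands each block's header/counter split (hdr_cnt, ctr_cnt) to kbase_hwcnt_dump_buffer_block_accumulate(), whose body is not part of this hunk. A plausible per-block step, stated here as an assumption rather than the driver's implementation, is that headers take the latest sample while counters are summed:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: headers come from the new sample, counters are added
 * onto the running total. */
static void block_accumulate(uint64_t *dst, const uint64_t *src,
			     size_t hdr_cnt, size_t ctr_cnt)
{
	size_t i;

	for (i = 0; i < hdr_cnt; i++)
		dst[i] = src[i];	/* headers: latest value wins */
	for (; i < hdr_cnt + ctr_cnt; i++)
		dst[i] += src[i];	/* counters: accumulate */
}

int main(void)
{
	uint64_t dst[4] = { 1, 2, 10, 20 };	/* 2 headers, 2 counters */
	const uint64_t src[4] = { 5, 6, 1, 2 };

	block_accumulate(dst, src, 2, 2);
	assert(dst[0] == 5 && dst[1] == 6);	/* replaced */
	assert(dst[2] == 11 && dst[3] == 22);	/* summed */
	return 0;
}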
if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] += src->clk_cnt_buf[clk]; } } -void kbase_hwcnt_dump_buffer_accumulate_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map) +void kbase_hwcnt_dump_buffer_accumulate_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map) { const struct kbase_hwcnt_metadata *metadata; size_t grp, blk, blk_inst; size_t clk; - if (WARN_ON(!dst) || - WARN_ON(!src) || - WARN_ON(!dst_enable_map) || - WARN_ON(dst == src) || + if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst_enable_map) || WARN_ON(dst == src) || WARN_ON(dst->metadata != src->metadata) || WARN_ON(dst->metadata != dst_enable_map->metadata)) return; metadata = dst->metadata; - kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance( - dst, grp, blk, blk_inst); - const u64 *src_blk = kbase_hwcnt_dump_buffer_block_instance( - src, grp, blk, blk_inst); - const u64 *blk_em = kbase_hwcnt_enable_map_block_instance( - dst_enable_map, grp, blk, blk_inst); - size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count( - metadata, grp, blk); - size_t ctr_cnt = kbase_hwcnt_metadata_block_counters_count( - metadata, grp, blk); + kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) + { + u64 *dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + const u64 *src_blk = + kbase_hwcnt_dump_buffer_block_instance(src, grp, blk, blk_inst); + const u64 *blk_em = + kbase_hwcnt_enable_map_block_instance(dst_enable_map, grp, blk, blk_inst); + size_t hdr_cnt = kbase_hwcnt_metadata_block_headers_count(metadata, grp, blk); + size_t ctr_cnt = kbase_hwcnt_metadata_block_counters_count(metadata, grp, blk); /* Align upwards to include padding bytes */ - ctr_cnt = KBASE_HWCNT_ALIGN_UPWARDS(hdr_cnt + ctr_cnt, - (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / - KBASE_HWCNT_VALUE_BYTES) - hdr_cnt); + ctr_cnt = KBASE_HWCNT_ALIGN_UPWARDS( + hdr_cnt + ctr_cnt, + (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES) - hdr_cnt); - kbase_hwcnt_dump_buffer_block_accumulate_strict( - dst_blk, src_blk, blk_em, hdr_cnt, ctr_cnt); + kbase_hwcnt_dump_buffer_block_accumulate_strict(dst_blk, src_blk, blk_em, hdr_cnt, + ctr_cnt); } - kbase_hwcnt_metadata_for_each_clock(metadata, clk) { - if (kbase_hwcnt_clk_enable_map_enabled( - dst_enable_map->clk_enable_map, clk)) + kbase_hwcnt_metadata_for_each_clock(metadata, clk) + { + if (kbase_hwcnt_clk_enable_map_enabled(dst_enable_map->clk_enable_map, clk)) dst->clk_cnt_buf[clk] += src->clk_cnt_buf[clk]; else dst->clk_cnt_buf[clk] = 0; diff --git a/mali_kbase/mali_kbase_hwcnt_types.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.h index 9397840..5c5ada4 100644 --- a/mali_kbase/mali_kbase_hwcnt_types.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_types.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -104,8 +104,7 @@ #define KBASE_HWCNT_AVAIL_MASK_BITS (sizeof(u64) * BITS_PER_BYTE) /* Minimum alignment of each block of hardware counters */ -#define KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT \ - (KBASE_HWCNT_BITFIELD_BITS * KBASE_HWCNT_VALUE_BYTES) +#define KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT (KBASE_HWCNT_BITFIELD_BITS * KBASE_HWCNT_VALUE_BYTES) /** * KBASE_HWCNT_ALIGN_UPWARDS() - Calculate next aligned value. @@ -115,7 +114,7 @@ * Return: Input value if already aligned to the specified boundary, or next * (incrementing upwards) aligned value. */ -#define KBASE_HWCNT_ALIGN_UPWARDS(value, alignment) \ +#define KBASE_HWCNT_ALIGN_UPWARDS(value, alignment) \ (value + ((alignment - (value % alignment)) % alignment)) /** @@ -307,9 +306,8 @@ struct kbase_hwcnt_dump_buffer_array { * * Return: 0 on success, else error code. */ -int kbase_hwcnt_metadata_create( - const struct kbase_hwcnt_description *desc, - const struct kbase_hwcnt_metadata **metadata); +int kbase_hwcnt_metadata_create(const struct kbase_hwcnt_description *desc, + const struct kbase_hwcnt_metadata **metadata); /** * kbase_hwcnt_metadata_destroy() - Destroy a hardware counter metadata object. @@ -323,8 +321,7 @@ void kbase_hwcnt_metadata_destroy(const struct kbase_hwcnt_metadata *metadata); * * Return: Number of hardware counter groups described by metadata. */ -static inline size_t -kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) +static inline size_t kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) { if (WARN_ON(!metadata)) return 0; @@ -339,9 +336,8 @@ kbase_hwcnt_metadata_group_count(const struct kbase_hwcnt_metadata *metadata) * * Return: Type of the group grp. */ -static inline u64 -kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, - size_t grp) +static inline u64 kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, + size_t grp) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt)) return 0; @@ -356,9 +352,8 @@ kbase_hwcnt_metadata_group_type(const struct kbase_hwcnt_metadata *metadata, * * Return: Number of blocks in group grp. */ -static inline size_t -kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, - size_t grp) +static inline size_t kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, + size_t grp) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt)) return 0; @@ -374,9 +369,8 @@ kbase_hwcnt_metadata_block_count(const struct kbase_hwcnt_metadata *metadata, * * Return: Type of the block blk in group grp. */ -static inline u64 -kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, - size_t grp, size_t blk) +static inline u64 kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, + size_t grp, size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -394,8 +388,9 @@ kbase_hwcnt_metadata_block_type(const struct kbase_hwcnt_metadata *metadata, * * Return: Number of instances of block blk in group grp. 
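KBASE_HWCNT_ALIGN_UPWARDS() above rounds a value count up to the next multiple of the given alignment and leaves already-aligned values untouched. A quick standalone check of the arithmetic, using 8 as a stand-in for the driver's (KBASE_HWCNT_BLOCK_BYTE_ALIGNMENT / KBASE_HWCNT_VALUE_BYTES):

#include <assert.h>

#define ALIGN_UPWARDS(value, alignment) \
	((value) + (((alignment) - ((value) % (alignment))) % (alignment)))

int main(void)
{
	assert(ALIGN_UPWARDS(0, 8) == 0);
	assert(ALIGN_UPWARDS(8, 8) == 8);	/* aligned input is unchanged */
	assert(ALIGN_UPWARDS(61, 8) == 64);	/* rounded up to the next multiple of 8 */
	return 0;
}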
*/ -static inline size_t kbase_hwcnt_metadata_block_instance_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_instance_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -413,8 +408,9 @@ static inline size_t kbase_hwcnt_metadata_block_instance_count( * * Return: Number of counter headers in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_headers_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_headers_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -431,8 +427,9 @@ static inline size_t kbase_hwcnt_metadata_block_headers_count( * * Return: Number of counters in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_counters_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_counters_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -449,8 +446,9 @@ static inline size_t kbase_hwcnt_metadata_block_counters_count( * * Return: enable map stride in each instance of block blk in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_enable_map_stride( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_enable_map_stride(const struct kbase_hwcnt_metadata *metadata, + size_t grp, size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -468,8 +466,9 @@ static inline size_t kbase_hwcnt_metadata_block_enable_map_stride( * Return: Number of headers plus counters in each instance of block blk * in group grp. */ -static inline size_t kbase_hwcnt_metadata_block_values_count( - const struct kbase_hwcnt_metadata *metadata, size_t grp, size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_values_count(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -490,10 +489,13 @@ static inline size_t kbase_hwcnt_metadata_block_values_count( * Iteration order is group, then block, then block instance (i.e. linearly * through memory). 
*/ -#define kbase_hwcnt_metadata_for_each_block(md, grp, blk, blk_inst) \ - for ((grp) = 0; (grp) < kbase_hwcnt_metadata_group_count((md)); (grp)++) \ - for ((blk) = 0; (blk) < kbase_hwcnt_metadata_block_count((md), (grp)); (blk)++) \ - for ((blk_inst) = 0; (blk_inst) < kbase_hwcnt_metadata_block_instance_count((md), (grp), (blk)); (blk_inst)++) +#define kbase_hwcnt_metadata_for_each_block(md, grp, blk, blk_inst) \ + for ((grp) = 0; (grp) < kbase_hwcnt_metadata_group_count((md)); (grp)++) \ + for ((blk) = 0; (blk) < kbase_hwcnt_metadata_block_count((md), (grp)); (blk)++) \ + for ((blk_inst) = 0; \ + (blk_inst) < \ + kbase_hwcnt_metadata_block_instance_count((md), (grp), (blk)); \ + (blk_inst)++) /** * kbase_hwcnt_metadata_block_avail_bit() - Get the bit index into the avail @@ -504,10 +506,9 @@ static inline size_t kbase_hwcnt_metadata_block_values_count( * * Return: The bit index into the avail mask for the block. */ -static inline size_t kbase_hwcnt_metadata_block_avail_bit( - const struct kbase_hwcnt_metadata *metadata, - size_t grp, - size_t blk) +static inline size_t +kbase_hwcnt_metadata_block_avail_bit(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk) { if (WARN_ON(!metadata) || WARN_ON(grp >= metadata->grp_cnt) || WARN_ON(blk >= metadata->grp_metadata[grp].blk_cnt)) @@ -527,11 +528,9 @@ static inline size_t kbase_hwcnt_metadata_block_avail_bit( * * Return: true if the block instance is available, else false. */ -static inline bool kbase_hwcnt_metadata_block_instance_avail( - const struct kbase_hwcnt_metadata *metadata, - size_t grp, - size_t blk, - size_t blk_inst) +static inline bool +kbase_hwcnt_metadata_block_instance_avail(const struct kbase_hwcnt_metadata *metadata, size_t grp, + size_t blk, size_t blk_inst) { size_t bit; u64 mask; @@ -553,9 +552,8 @@ static inline bool kbase_hwcnt_metadata_block_instance_avail( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_enable_map_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_enable_map *enable_map); +int kbase_hwcnt_enable_map_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_enable_map *enable_map); /** * kbase_hwcnt_enable_map_free() - Free an enable map. @@ -577,9 +575,8 @@ void kbase_hwcnt_enable_map_free(struct kbase_hwcnt_enable_map *enable_map); * Return: u64* to the bitfield(s) used as the enable map for the * block instance. 
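The reflowed kbase_hwcnt_metadata_for_each_block() macro above is just three nested loops over group, block and block instance, i.e. a linear walk through the dump-buffer layout. Expanded by hand with placeholder counts standing in for the metadata accessors, it looks like this:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
	const size_t grp_cnt = 1, blk_cnt = 4, inst_cnt[4] = { 1, 1, 8, 2 };
	size_t grp, blk, blk_inst;

	for (grp = 0; grp < grp_cnt; grp++)
		for (blk = 0; blk < blk_cnt; blk++)
			for (blk_inst = 0; blk_inst < inst_cnt[blk]; blk_inst++)
				printf("grp %zu blk %zu inst %zu\n", grp, blk, blk_inst);
	return 0;
}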
*/ -static inline u64 * -kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, - size_t grp, size_t blk, size_t blk_inst) +static inline u64 *kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, + size_t grp, size_t blk, size_t blk_inst) { if (WARN_ON(!map) || WARN_ON(!map->hwcnt_enable_map)) return NULL; @@ -589,15 +586,9 @@ kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, WARN_ON(blk_inst >= map->metadata->grp_metadata[grp].blk_metadata[blk].inst_cnt)) return map->hwcnt_enable_map; - return map->hwcnt_enable_map + - map->metadata->grp_metadata[grp].enable_map_index + - map->metadata->grp_metadata[grp] - .blk_metadata[blk] - .enable_map_index + - (map->metadata->grp_metadata[grp] - .blk_metadata[blk] - .enable_map_stride * - blk_inst); + return map->hwcnt_enable_map + map->metadata->grp_metadata[grp].enable_map_index + + map->metadata->grp_metadata[grp].blk_metadata[blk].enable_map_index + + (map->metadata->grp_metadata[grp].blk_metadata[blk].enable_map_stride * blk_inst); } /** @@ -609,8 +600,7 @@ kbase_hwcnt_enable_map_block_instance(const struct kbase_hwcnt_enable_map *map, */ static inline size_t kbase_hwcnt_bitfield_count(size_t val_cnt) { - return (val_cnt + KBASE_HWCNT_BITFIELD_BITS - 1) / - KBASE_HWCNT_BITFIELD_BITS; + return (val_cnt + KBASE_HWCNT_BITFIELD_BITS - 1) / KBASE_HWCNT_BITFIELD_BITS; } /** @@ -620,11 +610,8 @@ static inline size_t kbase_hwcnt_bitfield_count(size_t val_cnt) * @blk: Index of the block in the group. * @blk_inst: Index of the block instance in the block. */ -static inline void kbase_hwcnt_enable_map_block_disable_all( - struct kbase_hwcnt_enable_map *dst, - size_t grp, - size_t blk, - size_t blk_inst) +static inline void kbase_hwcnt_enable_map_block_disable_all(struct kbase_hwcnt_enable_map *dst, + size_t grp, size_t blk, size_t blk_inst) { size_t val_cnt; size_t bitfld_cnt; @@ -644,15 +631,13 @@ static inline void kbase_hwcnt_enable_map_block_disable_all( * kbase_hwcnt_enable_map_disable_all() - Disable all values in the enable map. * @dst: Non-NULL pointer to enable map to zero. */ -static inline void kbase_hwcnt_enable_map_disable_all( - struct kbase_hwcnt_enable_map *dst) +static inline void kbase_hwcnt_enable_map_disable_all(struct kbase_hwcnt_enable_map *dst) { if (WARN_ON(!dst) || WARN_ON(!dst->metadata)) return; if (dst->hwcnt_enable_map != NULL) - memset(dst->hwcnt_enable_map, 0, - dst->metadata->enable_map_bytes); + memset(dst->hwcnt_enable_map, 0, dst->metadata->enable_map_bytes); dst->clk_enable_map = 0; } @@ -664,11 +649,8 @@ static inline void kbase_hwcnt_enable_map_disable_all( * @blk: Index of the block in the group. * @blk_inst: Index of the block instance in the block. 
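kbase_hwcnt_bitfield_count() above is a ceiling division: the number of u64 bitfield words needed to hold one enable bit per value. A standalone check:

#include <assert.h>
#include <stddef.h>

#define BITFIELD_BITS 64	/* stands in for KBASE_HWCNT_BITFIELD_BITS */

static size_t bitfield_count(size_t val_cnt)
{
	return (val_cnt + BITFIELD_BITS - 1) / BITFIELD_BITS;
}

int main(void)
{
	assert(bitfield_count(1) == 1);
	assert(bitfield_count(64) == 1);	/* exactly one full word */
	assert(bitfield_count(65) == 2);	/* spills into a second word */
	assert(bitfield_count(128) == 2);
	return 0;
}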
*/ -static inline void kbase_hwcnt_enable_map_block_enable_all( - struct kbase_hwcnt_enable_map *dst, - size_t grp, - size_t blk, - size_t blk_inst) +static inline void kbase_hwcnt_enable_map_block_enable_all(struct kbase_hwcnt_enable_map *dst, + size_t grp, size_t blk, size_t blk_inst) { size_t val_cnt; size_t bitfld_cnt; @@ -683,8 +665,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_all( bitfld_cnt = kbase_hwcnt_bitfield_count(val_cnt); for (bitfld_idx = 0; bitfld_idx < bitfld_cnt; bitfld_idx++) { - const u64 remaining_values = val_cnt - - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); + const u64 remaining_values = val_cnt - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); u64 block_enable_map_mask = U64_MAX; if (remaining_values < KBASE_HWCNT_BITFIELD_BITS) @@ -699,8 +680,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_all( * map. * @dst: Non-NULL pointer to enable map. */ -static inline void kbase_hwcnt_enable_map_enable_all( - struct kbase_hwcnt_enable_map *dst) +static inline void kbase_hwcnt_enable_map_enable_all(struct kbase_hwcnt_enable_map *dst) { size_t grp, blk, blk_inst; @@ -708,8 +688,7 @@ static inline void kbase_hwcnt_enable_map_enable_all( return; kbase_hwcnt_metadata_for_each_block(dst->metadata, grp, blk, blk_inst) - kbase_hwcnt_enable_map_block_enable_all( - dst, grp, blk, blk_inst); + kbase_hwcnt_enable_map_block_enable_all(dst, grp, blk, blk_inst); dst->clk_enable_map = (1ull << dst->metadata->clk_cnt) - 1; } @@ -721,9 +700,8 @@ static inline void kbase_hwcnt_enable_map_enable_all( * * The dst and src MUST have been created from the same metadata. */ -static inline void kbase_hwcnt_enable_map_copy( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +static inline void kbase_hwcnt_enable_map_copy(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst->metadata) || WARN_ON(dst->metadata != src->metadata)) @@ -733,8 +711,7 @@ static inline void kbase_hwcnt_enable_map_copy( if (WARN_ON(!src->hwcnt_enable_map)) return; - memcpy(dst->hwcnt_enable_map, - src->hwcnt_enable_map, + memcpy(dst->hwcnt_enable_map, src->hwcnt_enable_map, dst->metadata->enable_map_bytes); } @@ -748,9 +725,8 @@ static inline void kbase_hwcnt_enable_map_copy( * * The dst and src MUST have been created from the same metadata. */ -static inline void kbase_hwcnt_enable_map_union( - struct kbase_hwcnt_enable_map *dst, - const struct kbase_hwcnt_enable_map *src) +static inline void kbase_hwcnt_enable_map_union(struct kbase_hwcnt_enable_map *dst, + const struct kbase_hwcnt_enable_map *src) { if (WARN_ON(!dst) || WARN_ON(!src) || WARN_ON(!dst->metadata) || WARN_ON(dst->metadata != src->metadata)) @@ -781,11 +757,9 @@ static inline void kbase_hwcnt_enable_map_union( * * Return: true if any values in the block are enabled, else false. 
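Several helpers in this hunk (block_enable_all above, block_enabled just below) walk an array of 64-bit enable words and build a mask for the final, partially used word with (1ull << remaining) - 1. A self-contained sketch of that pattern, with hypothetical names:

#include <stdint.h>
#include <stddef.h>

#define BITFIELD_BITS 64

/* Set the first val_cnt value bits across an array of 64-bit words,
 * masking the tail of the last word the same way the kbase helpers do.
 */
static void enable_first_n(uint64_t *bitfld, size_t val_cnt)
{
	size_t words = (val_cnt + BITFIELD_BITS - 1) / BITFIELD_BITS;
	size_t i;

	for (i = 0; i < words; i++) {
		uint64_t remaining = val_cnt - i * BITFIELD_BITS;
		uint64_t mask = UINT64_MAX;

		if (remaining < BITFIELD_BITS)
			mask = (1ull << remaining) - 1;
		bitfld[i] |= mask;
	}
}

The round-up word count on the first line is the same ceiling division performed by kbase_hwcnt_bitfield_count.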
*/ -static inline bool kbase_hwcnt_enable_map_block_enabled( - const struct kbase_hwcnt_enable_map *enable_map, - size_t grp, - size_t blk, - size_t blk_inst) +static inline bool +kbase_hwcnt_enable_map_block_enabled(const struct kbase_hwcnt_enable_map *enable_map, size_t grp, + size_t blk, size_t blk_inst) { bool any_enabled = false; size_t val_cnt; @@ -801,15 +775,13 @@ static inline bool kbase_hwcnt_enable_map_block_enabled( bitfld_cnt = kbase_hwcnt_bitfield_count(val_cnt); for (bitfld_idx = 0; bitfld_idx < bitfld_cnt; bitfld_idx++) { - const u64 remaining_values = val_cnt - - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); + const u64 remaining_values = val_cnt - (bitfld_idx * KBASE_HWCNT_BITFIELD_BITS); u64 block_enable_map_mask = U64_MAX; if (remaining_values < KBASE_HWCNT_BITFIELD_BITS) block_enable_map_mask = (1ull << remaining_values) - 1; - any_enabled = any_enabled || - (block_enable_map[bitfld_idx] & block_enable_map_mask); + any_enabled = any_enabled || (block_enable_map[bitfld_idx] & block_enable_map_mask); } return any_enabled; @@ -821,8 +793,8 @@ static inline bool kbase_hwcnt_enable_map_block_enabled( * * Return: true if any values are enabled, else false. */ -static inline bool kbase_hwcnt_enable_map_any_enabled( - const struct kbase_hwcnt_enable_map *enable_map) +static inline bool +kbase_hwcnt_enable_map_any_enabled(const struct kbase_hwcnt_enable_map *enable_map) { size_t grp, blk, blk_inst; u64 clk_enable_map_mask; @@ -832,14 +804,12 @@ static inline bool kbase_hwcnt_enable_map_any_enabled( clk_enable_map_mask = (1ull << enable_map->metadata->clk_cnt) - 1; - if (enable_map->metadata->clk_cnt > 0 && - (enable_map->clk_enable_map & clk_enable_map_mask)) + if (enable_map->metadata->clk_cnt > 0 && (enable_map->clk_enable_map & clk_enable_map_mask)) return true; - kbase_hwcnt_metadata_for_each_block( - enable_map->metadata, grp, blk, blk_inst) { - if (kbase_hwcnt_enable_map_block_enabled( - enable_map, grp, blk, blk_inst)) + kbase_hwcnt_metadata_for_each_block(enable_map->metadata, grp, blk, blk_inst) + { + if (kbase_hwcnt_enable_map_block_enabled(enable_map, grp, blk, blk_inst)) return true; } @@ -855,9 +825,7 @@ static inline bool kbase_hwcnt_enable_map_any_enabled( * * Return: true if the value was enabled, else false. */ -static inline bool kbase_hwcnt_enable_map_block_value_enabled( - const u64 *bitfld, - size_t val_idx) +static inline bool kbase_hwcnt_enable_map_block_value_enabled(const u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -873,9 +841,7 @@ static inline bool kbase_hwcnt_enable_map_block_value_enabled( * kbase_hwcnt_enable_map_block_instance. * @val_idx: Index of the value to enable in the block instance. */ -static inline void kbase_hwcnt_enable_map_block_enable_value( - u64 *bitfld, - size_t val_idx) +static inline void kbase_hwcnt_enable_map_block_enable_value(u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -891,9 +857,7 @@ static inline void kbase_hwcnt_enable_map_block_enable_value( * kbase_hwcnt_enable_map_block_instance. * @val_idx: Index of the value to disable in the block instance. 
*/ -static inline void kbase_hwcnt_enable_map_block_disable_value( - u64 *bitfld, - size_t val_idx) +static inline void kbase_hwcnt_enable_map_block_disable_value(u64 *bitfld, size_t val_idx) { const size_t idx = val_idx / KBASE_HWCNT_BITFIELD_BITS; const size_t bit = val_idx % KBASE_HWCNT_BITFIELD_BITS; @@ -911,9 +875,8 @@ static inline void kbase_hwcnt_enable_map_block_disable_value( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_dump_buffer_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_dump_buffer_free() - Free a dump buffer. @@ -936,10 +899,8 @@ void kbase_hwcnt_dump_buffer_free(struct kbase_hwcnt_dump_buffer *dump_buf); * * Return: 0 on success, else error code. */ -int kbase_hwcnt_dump_buffer_array_alloc( - const struct kbase_hwcnt_metadata *metadata, - size_t n, - struct kbase_hwcnt_dump_buffer_array *dump_bufs); +int kbase_hwcnt_dump_buffer_array_alloc(const struct kbase_hwcnt_metadata *metadata, size_t n, + struct kbase_hwcnt_dump_buffer_array *dump_bufs); /** * kbase_hwcnt_dump_buffer_array_free() - Free a dump buffer array. @@ -948,8 +909,7 @@ int kbase_hwcnt_dump_buffer_array_alloc( * Can be safely called on an all-zeroed dump buffer array structure, or on an * already freed dump buffer array. */ -void kbase_hwcnt_dump_buffer_array_free( - struct kbase_hwcnt_dump_buffer_array *dump_bufs); +void kbase_hwcnt_dump_buffer_array_free(struct kbase_hwcnt_dump_buffer_array *dump_bufs); /** * kbase_hwcnt_dump_buffer_block_instance() - Get the pointer to a block @@ -961,9 +921,8 @@ void kbase_hwcnt_dump_buffer_array_free( * * Return: u64* to the dump buffer for the block instance. */ -static inline u64 *kbase_hwcnt_dump_buffer_block_instance( - const struct kbase_hwcnt_dump_buffer *buf, size_t grp, size_t blk, - size_t blk_inst) +static inline u64 *kbase_hwcnt_dump_buffer_block_instance(const struct kbase_hwcnt_dump_buffer *buf, + size_t grp, size_t blk, size_t blk_inst) { if (WARN_ON(!buf) || WARN_ON(!buf->dump_buf)) return NULL; @@ -975,10 +934,7 @@ static inline u64 *kbase_hwcnt_dump_buffer_block_instance( return buf->dump_buf + buf->metadata->grp_metadata[grp].dump_buf_index + buf->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_index + - (buf->metadata->grp_metadata[grp] - .blk_metadata[blk] - .dump_buf_stride * - blk_inst); + (buf->metadata->grp_metadata[grp].blk_metadata[blk].dump_buf_stride * blk_inst); } /** @@ -990,9 +946,8 @@ static inline u64 *kbase_hwcnt_dump_buffer_block_instance( * * The dst and dst_enable_map MUST have been created from the same metadata. */ -void kbase_hwcnt_dump_buffer_zero( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_zero(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_zero() - Zero all values in a block. @@ -1000,8 +955,7 @@ void kbase_hwcnt_dump_buffer_zero( * kbase_hwcnt_dump_buffer_block_instance. * @val_cnt: Number of values in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, size_t val_cnt) { if (WARN_ON(!dst_blk)) return; @@ -1017,8 +971,7 @@ static inline void kbase_hwcnt_dump_buffer_block_zero(u64 *dst_blk, * Slower than the non-strict variant. 
* @dst: Non-NULL pointer to dump buffer. */ -void kbase_hwcnt_dump_buffer_zero_strict( - struct kbase_hwcnt_dump_buffer *dst); +void kbase_hwcnt_dump_buffer_zero_strict(struct kbase_hwcnt_dump_buffer *dst); /** * kbase_hwcnt_dump_buffer_zero_non_enabled() - Zero all non-enabled values in @@ -1031,9 +984,8 @@ void kbase_hwcnt_dump_buffer_zero_strict( * * The dst and dst_enable_map MUST have been created from the same metadata. */ -void kbase_hwcnt_dump_buffer_zero_non_enabled( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_zero_non_enabled(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_zero_non_enabled() - Zero all non-enabled @@ -1047,9 +999,8 @@ void kbase_hwcnt_dump_buffer_zero_non_enabled( * kbase_hwcnt_enable_map_block_instance. * @val_cnt: Number of values in the block. */ -static inline void -kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, + size_t val_cnt) { size_t val; @@ -1073,10 +1024,9 @@ kbase_hwcnt_dump_buffer_block_zero_non_enabled(u64 *dst_blk, const u64 *blk_em, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_copy( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_copy() - Copy all block values from src to dst. @@ -1086,8 +1036,7 @@ void kbase_hwcnt_dump_buffer_copy( * kbase_hwcnt_dump_buffer_block_instance. * @val_cnt: Number of values in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, - const u64 *src_blk, +static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, const u64 *src_blk, size_t val_cnt) { if (WARN_ON(!dst_blk) || WARN_ON(!src_blk)) @@ -1113,10 +1062,9 @@ static inline void kbase_hwcnt_dump_buffer_block_copy(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_copy_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_copy_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_copy_strict() - Copy all enabled block values @@ -1134,10 +1082,8 @@ void kbase_hwcnt_dump_buffer_copy_strict( * * After the copy, any disabled values in dst will be zero. 
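The per-value helpers earlier in this hunk all share one indexing convention: value v of a block instance lives at bit (v % 64) of word (v / 64) of its enable-map bitfields. A small standalone sketch of that convention, with hypothetical names:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define BITFIELD_BITS 64

static bool value_enabled(const uint64_t *bitfld, size_t val_idx)
{
	return (bitfld[val_idx / BITFIELD_BITS] >> (val_idx % BITFIELD_BITS)) & 1u;
}

static void value_enable(uint64_t *bitfld, size_t val_idx)
{
	bitfld[val_idx / BITFIELD_BITS] |= 1ull << (val_idx % BITFIELD_BITS);
}

static void value_disable(uint64_t *bitfld, size_t val_idx)
{
	bitfld[val_idx / BITFIELD_BITS] &= ~(1ull << (val_idx % BITFIELD_BITS));
}

The strict copy implemented just below applies the same per-value test to decide whether dst gets src's value or zero.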
*/ -static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, - const u64 *src_blk, - const u64 *blk_em, - size_t val_cnt) +static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, const u64 *src_blk, + const u64 *blk_em, size_t val_cnt) { size_t val; @@ -1145,8 +1091,7 @@ static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, return; for (val = 0; val < val_cnt; val++) { - bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled( - blk_em, val); + bool val_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, val); dst_blk[val] = val_enabled ? src_blk[val] : 0; } @@ -1165,10 +1110,9 @@ static inline void kbase_hwcnt_dump_buffer_block_copy_strict(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_accumulate( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_accumulate(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_accumulate() - Copy all block headers and @@ -1181,10 +1125,8 @@ void kbase_hwcnt_dump_buffer_accumulate( * @hdr_cnt: Number of headers in the block. * @ctr_cnt: Number of counters in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, - const u64 *src_blk, - size_t hdr_cnt, - size_t ctr_cnt) +static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, const u64 *src_blk, + size_t hdr_cnt, size_t ctr_cnt) { size_t ctr; @@ -1219,10 +1161,9 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate(u64 *dst_blk, * The dst, src, and dst_enable_map MUST have been created from the same * metadata. */ -void kbase_hwcnt_dump_buffer_accumulate_strict( - struct kbase_hwcnt_dump_buffer *dst, - const struct kbase_hwcnt_dump_buffer *src, - const struct kbase_hwcnt_enable_map *dst_enable_map); +void kbase_hwcnt_dump_buffer_accumulate_strict(struct kbase_hwcnt_dump_buffer *dst, + const struct kbase_hwcnt_dump_buffer *src, + const struct kbase_hwcnt_enable_map *dst_enable_map); /** * kbase_hwcnt_dump_buffer_block_accumulate_strict() - Copy all enabled block @@ -1241,21 +1182,19 @@ void kbase_hwcnt_dump_buffer_accumulate_strict( * @hdr_cnt: Number of headers in the block. * @ctr_cnt: Number of counters in the block. */ -static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( - u64 *dst_blk, const u64 *src_blk, const u64 *blk_em, size_t hdr_cnt, - size_t ctr_cnt) +static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict(u64 *dst_blk, const u64 *src_blk, + const u64 *blk_em, + size_t hdr_cnt, size_t ctr_cnt) { size_t ctr; if (WARN_ON(!dst_blk) || WARN_ON(!src_blk)) return; - kbase_hwcnt_dump_buffer_block_copy_strict( - dst_blk, src_blk, blk_em, hdr_cnt); + kbase_hwcnt_dump_buffer_block_copy_strict(dst_blk, src_blk, blk_em, hdr_cnt); for (ctr = hdr_cnt; ctr < ctr_cnt + hdr_cnt; ctr++) { - bool ctr_enabled = kbase_hwcnt_enable_map_block_value_enabled( - blk_em, ctr); + bool ctr_enabled = kbase_hwcnt_enable_map_block_value_enabled(blk_em, ctr); if (ctr_enabled) dst_blk[ctr] += src_blk[ctr]; @@ -1270,8 +1209,7 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( * @md: Non-NULL pointer to metadata. * @clk: size_t variable used as clock iterator. 
*/ -#define kbase_hwcnt_metadata_for_each_clock(md, clk) \ - for ((clk) = 0; (clk) < (md)->clk_cnt; (clk)++) +#define kbase_hwcnt_metadata_for_each_clock(md, clk) for ((clk) = 0; (clk) < (md)->clk_cnt; (clk)++) /** * kbase_hwcnt_clk_enable_map_enabled() - Check if the given index is enabled @@ -1281,8 +1219,7 @@ static inline void kbase_hwcnt_dump_buffer_block_accumulate_strict( * * Return: true if the index of the clock domain is enabled, else false. */ -static inline bool kbase_hwcnt_clk_enable_map_enabled( - const u64 clk_enable_map, const size_t index) +static inline bool kbase_hwcnt_clk_enable_map_enabled(const u64 clk_enable_map, const size_t index) { if (WARN_ON(index >= 64)) return false; diff --git a/mali_kbase/mali_kbase_hwcnt_virtualizer.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.c index 52ecb7b..d618764 100644 --- a/mali_kbase/mali_kbase_hwcnt_virtualizer.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,10 @@ * */ -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_accumulator.h" -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_accumulator.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <linux/mutex.h> #include <linux/slab.h> @@ -75,8 +75,8 @@ struct kbase_hwcnt_virtualizer_client { u64 ts_start_ns; }; -const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( - struct kbase_hwcnt_virtualizer *hvirt) +const struct kbase_hwcnt_metadata * +kbase_hwcnt_virtualizer_metadata(struct kbase_hwcnt_virtualizer *hvirt) { if (!hvirt) return NULL; @@ -90,8 +90,7 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( * * Will safely free a client in any partial state of construction. */ -static void kbasep_hwcnt_virtualizer_client_free( - struct kbase_hwcnt_virtualizer_client *hvcli) +static void kbasep_hwcnt_virtualizer_client_free(struct kbase_hwcnt_virtualizer_client *hvcli) { if (!hvcli) return; @@ -110,9 +109,8 @@ static void kbasep_hwcnt_virtualizer_client_free( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_virtualizer_client_alloc( - const struct kbase_hwcnt_metadata *metadata, - struct kbase_hwcnt_virtualizer_client **out_hvcli) +static int kbasep_hwcnt_virtualizer_client_alloc(const struct kbase_hwcnt_metadata *metadata, + struct kbase_hwcnt_virtualizer_client **out_hvcli) { int errcode; struct kbase_hwcnt_virtualizer_client *hvcli = NULL; @@ -145,9 +143,9 @@ error: * @hvcli: Non-NULL pointer to virtualizer client. * @dump_buf: Non-NULL pointer to dump buffer to accumulate from. 
*/ -static void kbasep_hwcnt_virtualizer_client_accumulate( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_dump_buffer *dump_buf) +static void +kbasep_hwcnt_virtualizer_client_accumulate(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_dump_buffer *dump_buf) { WARN_ON(!hvcli); WARN_ON(!dump_buf); @@ -155,12 +153,10 @@ static void kbasep_hwcnt_virtualizer_client_accumulate( if (hvcli->has_accum) { /* If already some accumulation, accumulate */ - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, dump_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, dump_buf, &hvcli->enable_map); } else { /* If no accumulation, copy */ - kbase_hwcnt_dump_buffer_copy( - &hvcli->accum_buf, dump_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_copy(&hvcli->accum_buf, dump_buf, &hvcli->enable_map); } hvcli->has_accum = true; } @@ -173,8 +169,7 @@ static void kbasep_hwcnt_virtualizer_client_accumulate( * * Will safely terminate the accumulator in any partial state of initialisation. */ -static void kbasep_hwcnt_virtualizer_accumulator_term( - struct kbase_hwcnt_virtualizer *hvirt) +static void kbasep_hwcnt_virtualizer_accumulator_term(struct kbase_hwcnt_virtualizer *hvirt) { WARN_ON(!hvirt); lockdep_assert_held(&hvirt->lock); @@ -194,8 +189,7 @@ static void kbasep_hwcnt_virtualizer_accumulator_term( * * Return: 0 on success, else error code. */ -static int kbasep_hwcnt_virtualizer_accumulator_init( - struct kbase_hwcnt_virtualizer *hvirt) +static int kbasep_hwcnt_virtualizer_accumulator_init(struct kbase_hwcnt_virtualizer *hvirt) { int errcode; @@ -204,18 +198,15 @@ static int kbasep_hwcnt_virtualizer_accumulator_init( WARN_ON(hvirt->client_count); WARN_ON(hvirt->accum); - errcode = kbase_hwcnt_accumulator_acquire( - hvirt->hctx, &hvirt->accum); + errcode = kbase_hwcnt_accumulator_acquire(hvirt->hctx, &hvirt->accum); if (errcode) goto error; - errcode = kbase_hwcnt_enable_map_alloc( - hvirt->metadata, &hvirt->scratch_map); + errcode = kbase_hwcnt_enable_map_alloc(hvirt->metadata, &hvirt->scratch_map); if (errcode) goto error; - errcode = kbase_hwcnt_dump_buffer_alloc( - hvirt->metadata, &hvirt->scratch_buf); + errcode = kbase_hwcnt_dump_buffer_alloc(hvirt->metadata, &hvirt->scratch_buf); if (errcode) goto error; @@ -234,10 +225,9 @@ error: * * Return: 0 on success, else error code. 
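kbasep_hwcnt_virtualizer_client_accumulate, reflowed above, either adds a new dump into the client's accumulation buffer or copies it in wholesale on the first dump, tracking the state with has_accum. A standalone sketch of that accumulate-or-copy pattern over a plain counter array (all names hypothetical):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

struct vcli_sketch {
	bool has_accum;
	uint64_t accum[64]; /* hypothetical fixed-size counter buffer; keep n <= 64 */
};

static void client_accumulate(struct vcli_sketch *cli, const uint64_t *dump, size_t n)
{
	size_t i;

	/* First dump copies, later dumps accumulate. */
	for (i = 0; i < n; i++)
		cli->accum[i] = cli->has_accum ? cli->accum[i] + dump[i] : dump[i];
	cli->has_accum = true;
}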
*/ -static int kbasep_hwcnt_virtualizer_client_add( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map) +static int kbasep_hwcnt_virtualizer_client_add(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map) { int errcode = 0; u64 ts_start_ns; @@ -258,28 +248,25 @@ static int kbasep_hwcnt_virtualizer_client_add( if (hvirt->client_count == 1) { /* First client, so just pass the enable map onwards as is */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - enable_map, &ts_start_ns, &ts_end_ns, NULL); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, enable_map, + &ts_start_ns, &ts_end_ns, NULL); } else { struct kbase_hwcnt_virtualizer_client *pos; /* Make the scratch enable map the union of all enable maps */ - kbase_hwcnt_enable_map_copy( - &hvirt->scratch_map, enable_map); - list_for_each_entry(pos, &hvirt->clients, node) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + kbase_hwcnt_enable_map_copy(&hvirt->scratch_map, enable_map); + list_for_each_entry (pos, &hvirt->clients, node) + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, - &ts_start_ns, &ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + &ts_start_ns, &ts_end_ns, + &hvirt->scratch_buf); /* Accumulate into only existing clients' accumulation bufs */ if (!errcode) - list_for_each_entry(pos, &hvirt->clients, node) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + list_for_each_entry (pos, &hvirt->clients, node) + kbasep_hwcnt_virtualizer_client_accumulate(pos, + &hvirt->scratch_buf); } if (errcode) goto error; @@ -307,9 +294,8 @@ error: * @hvirt: Non-NULL pointer to the hardware counter virtualizer. * @hvcli: Non-NULL pointer to the virtualizer client to remove. 
*/ -static void kbasep_hwcnt_virtualizer_client_remove( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli) +static void kbasep_hwcnt_virtualizer_client_remove(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli) { int errcode = 0; u64 ts_start_ns; @@ -329,22 +315,21 @@ static void kbasep_hwcnt_virtualizer_client_remove( struct kbase_hwcnt_virtualizer_client *pos; /* Make the scratch enable map the union of all enable maps */ kbase_hwcnt_enable_map_disable_all(&hvirt->scratch_map); - list_for_each_entry(pos, &hvirt->clients, node) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + list_for_each_entry (pos, &hvirt->clients, node) + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, - &ts_start_ns, &ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + &ts_start_ns, &ts_end_ns, + &hvirt->scratch_buf); /* Accumulate into remaining clients' accumulation bufs */ - if (!errcode) - list_for_each_entry(pos, &hvirt->clients, node) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + if (!errcode) { + list_for_each_entry (pos, &hvirt->clients, node) + kbasep_hwcnt_virtualizer_client_accumulate(pos, + &hvirt->scratch_buf); - /* Store the most recent dump time for rate limiting */ - hvirt->ts_last_dump_ns = ts_end_ns; + /* Store the most recent dump time for rate limiting */ + hvirt->ts_last_dump_ns = ts_end_ns; + } } WARN_ON(errcode); } @@ -370,11 +355,8 @@ static void kbasep_hwcnt_virtualizer_client_remove( * Return: 0 on success or error code. 
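Both client_add and client_remove above rebuild the accumulator's scratch enable map as the union (bitwise OR) of every remaining client's map before calling set_counters. A minimal sketch of that union step over a hypothetical flat representation:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout: each client's enable map is an array of 64-bit words
 * of the same length "words".
 */
static void union_enable_maps(uint64_t *scratch, const uint64_t *const *client_maps,
			      size_t client_cnt, size_t words)
{
	size_t c, w;

	for (w = 0; w < words; w++)
		scratch[w] = 0;
	for (c = 0; c < client_cnt; c++)
		for (w = 0; w < words; w++)
			scratch[w] |= client_maps[c][w];
}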
*/ static int kbasep_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, + struct kbase_hwcnt_virtualizer *hvirt, struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, u64 *ts_start_ns, u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; @@ -391,32 +373,29 @@ static int kbasep_hwcnt_virtualizer_client_set_counters( /* Make the scratch enable map the union of all enable maps */ kbase_hwcnt_enable_map_copy(&hvirt->scratch_map, enable_map); - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) /* Ignore the enable map of the selected client */ if (pos != hvcli) - kbase_hwcnt_enable_map_union( - &hvirt->scratch_map, &pos->enable_map); + kbase_hwcnt_enable_map_union(&hvirt->scratch_map, &pos->enable_map); /* Set the counters with the new union enable map */ - errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, - &hvirt->scratch_map, ts_start_ns, ts_end_ns, - &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, &hvirt->scratch_map, + ts_start_ns, ts_end_ns, &hvirt->scratch_buf); if (errcode) return errcode; /* Accumulate into all accumulation bufs except the selected client's */ - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) if (pos != hvcli) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + kbasep_hwcnt_virtualizer_client_accumulate(pos, &hvirt->scratch_buf); /* Finally, write into the dump buf */ if (dump_buf) { const struct kbase_hwcnt_dump_buffer *src = &hvirt->scratch_buf; if (hvcli->has_accum) { - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, src, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, src, + &hvcli->enable_map); src = &hvcli->accum_buf; } kbase_hwcnt_dump_buffer_copy(dump_buf, src, &hvcli->enable_map); @@ -436,12 +415,10 @@ static int kbasep_hwcnt_virtualizer_client_set_counters( return errcode; } -int kbase_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_virtualizer_client_set_counters(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer *hvirt; @@ -464,14 +441,12 @@ int kbase_hwcnt_virtualizer_client_set_counters( * to the accumulator, saving a fair few copies and * accumulations. 
*/ - errcode = kbase_hwcnt_accumulator_set_counters( - hvirt->accum, enable_map, - ts_start_ns, ts_end_ns, dump_buf); + errcode = kbase_hwcnt_accumulator_set_counters(hvirt->accum, enable_map, + ts_start_ns, ts_end_ns, dump_buf); if (!errcode) { /* Update the selected client's enable map */ - kbase_hwcnt_enable_map_copy( - &hvcli->enable_map, enable_map); + kbase_hwcnt_enable_map_copy(&hvcli->enable_map, enable_map); /* Fix up the timestamps */ *ts_start_ns = hvcli->ts_start_ns; @@ -483,8 +458,7 @@ int kbase_hwcnt_virtualizer_client_set_counters( } else { /* Otherwise, do the full virtualize */ errcode = kbasep_hwcnt_virtualizer_client_set_counters( - hvirt, hvcli, enable_map, - ts_start_ns, ts_end_ns, dump_buf); + hvirt, hvcli, enable_map, ts_start_ns, ts_end_ns, dump_buf); } mutex_unlock(&hvirt->lock); @@ -507,12 +481,10 @@ int kbase_hwcnt_virtualizer_client_set_counters( * * Return: 0 on success or error code. */ -static int kbasep_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +static int kbasep_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer *hvirt, + struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer_client *pos; @@ -525,24 +497,23 @@ static int kbasep_hwcnt_virtualizer_client_dump( lockdep_assert_held(&hvirt->lock); /* Perform the dump */ - errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, - ts_start_ns, ts_end_ns, &hvirt->scratch_buf); + errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, ts_start_ns, ts_end_ns, + &hvirt->scratch_buf); if (errcode) return errcode; /* Accumulate into all accumulation bufs except the selected client's */ - list_for_each_entry(pos, &hvirt->clients, node) + list_for_each_entry (pos, &hvirt->clients, node) if (pos != hvcli) - kbasep_hwcnt_virtualizer_client_accumulate( - pos, &hvirt->scratch_buf); + kbasep_hwcnt_virtualizer_client_accumulate(pos, &hvirt->scratch_buf); /* Finally, write into the dump buf */ if (dump_buf) { const struct kbase_hwcnt_dump_buffer *src = &hvirt->scratch_buf; if (hvcli->has_accum) { - kbase_hwcnt_dump_buffer_accumulate( - &hvcli->accum_buf, src, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_accumulate(&hvcli->accum_buf, src, + &hvcli->enable_map); src = &hvcli->accum_buf; } kbase_hwcnt_dump_buffer_copy(dump_buf, src, &hvcli->enable_map); @@ -578,11 +549,8 @@ static int kbasep_hwcnt_virtualizer_client_dump( * Return: 0 on success or error code. 
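As the comment in the hunk above notes, kbase_hwcnt_virtualizer_client_set_counters (and client_dump further down) keeps a fast path: roughly, when there is a single client with nothing accumulated, the request is passed straight to the accumulator, skipping the scratch-map union and the extra copies. A sketch of that dispatch, with hypothetical names and callbacks:

#include <stdbool.h>

/* Hypothetical dispatch mirroring the single-client fast path. */
static int client_dump_sketch(int client_count, bool has_accum,
			      int (*fast_dump)(void), int (*full_virtualize)(void))
{
	if (client_count == 1 && !has_accum)
		return fast_dump();       /* pass straight through to the accumulator */
	return full_virtualize();         /* union maps, accumulate, then copy out */
}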
*/ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( - struct kbase_hwcnt_virtualizer *hvirt, - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) + struct kbase_hwcnt_virtualizer *hvirt, struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, struct kbase_hwcnt_dump_buffer *dump_buf) { bool rate_limited = true; @@ -602,10 +570,8 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( */ rate_limited = false; } else { - const u64 ts_ns = - kbase_hwcnt_accumulator_timestamp_ns(hvirt->accum); - const u64 time_since_last_dump_ns = - ts_ns - hvirt->ts_last_dump_ns; + const u64 ts_ns = kbase_hwcnt_accumulator_timestamp_ns(hvirt->accum); + const u64 time_since_last_dump_ns = ts_ns - hvirt->ts_last_dump_ns; /* Dump period equals or exceeds the threshold */ if (time_since_last_dump_ns >= hvirt->dump_threshold_ns) @@ -613,8 +579,8 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( } if (!rate_limited) - return kbasep_hwcnt_virtualizer_client_dump( - hvirt, hvcli, ts_start_ns, ts_end_ns, dump_buf); + return kbasep_hwcnt_virtualizer_client_dump(hvirt, hvcli, ts_start_ns, ts_end_ns, + dump_buf); /* If we've gotten this far, the client must have something accumulated * otherwise it is a logic error @@ -622,8 +588,7 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( WARN_ON(!hvcli->has_accum); if (dump_buf) - kbase_hwcnt_dump_buffer_copy( - dump_buf, &hvcli->accum_buf, &hvcli->enable_map); + kbase_hwcnt_dump_buffer_copy(dump_buf, &hvcli->accum_buf, &hvcli->enable_map); hvcli->has_accum = false; *ts_start_ns = hvcli->ts_start_ns; @@ -633,11 +598,9 @@ static int kbasep_hwcnt_virtualizer_client_dump_rate_limited( return 0; } -int kbase_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf) +int kbase_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf) { int errcode; struct kbase_hwcnt_virtualizer *hvirt; @@ -659,8 +622,8 @@ int kbase_hwcnt_virtualizer_client_dump( * to the accumulator, saving a fair few copies and * accumulations. 
*/ - errcode = kbase_hwcnt_accumulator_dump( - hvirt->accum, ts_start_ns, ts_end_ns, dump_buf); + errcode = kbase_hwcnt_accumulator_dump(hvirt->accum, ts_start_ns, ts_end_ns, + dump_buf); if (!errcode) { /* Fix up the timestamps */ @@ -681,20 +644,17 @@ int kbase_hwcnt_virtualizer_client_dump( return errcode; } -int kbase_hwcnt_virtualizer_client_create( - struct kbase_hwcnt_virtualizer *hvirt, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_virtualizer_client **out_hvcli) +int kbase_hwcnt_virtualizer_client_create(struct kbase_hwcnt_virtualizer *hvirt, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_virtualizer_client **out_hvcli) { int errcode; struct kbase_hwcnt_virtualizer_client *hvcli; - if (!hvirt || !enable_map || !out_hvcli || - (enable_map->metadata != hvirt->metadata)) + if (!hvirt || !enable_map || !out_hvcli || (enable_map->metadata != hvirt->metadata)) return -EINVAL; - errcode = kbasep_hwcnt_virtualizer_client_alloc( - hvirt->metadata, &hvcli); + errcode = kbasep_hwcnt_virtualizer_client_alloc(hvirt->metadata, &hvcli); if (errcode) return errcode; @@ -713,8 +673,7 @@ int kbase_hwcnt_virtualizer_client_create( return 0; } -void kbase_hwcnt_virtualizer_client_destroy( - struct kbase_hwcnt_virtualizer_client *hvcli) +void kbase_hwcnt_virtualizer_client_destroy(struct kbase_hwcnt_virtualizer_client *hvcli) { if (!hvcli) return; @@ -728,10 +687,8 @@ void kbase_hwcnt_virtualizer_client_destroy( kbasep_hwcnt_virtualizer_client_free(hvcli); } -int kbase_hwcnt_virtualizer_init( - struct kbase_hwcnt_context *hctx, - u64 dump_threshold_ns, - struct kbase_hwcnt_virtualizer **out_hvirt) +int kbase_hwcnt_virtualizer_init(struct kbase_hwcnt_context *hctx, u64 dump_threshold_ns, + struct kbase_hwcnt_virtualizer **out_hvirt) { struct kbase_hwcnt_virtualizer *virt; const struct kbase_hwcnt_metadata *metadata; @@ -758,8 +715,7 @@ int kbase_hwcnt_virtualizer_init( return 0; } -void kbase_hwcnt_virtualizer_term( - struct kbase_hwcnt_virtualizer *hvirt) +void kbase_hwcnt_virtualizer_term(struct kbase_hwcnt_virtualizer *hvirt) { if (!hvirt) return; @@ -768,7 +724,7 @@ void kbase_hwcnt_virtualizer_term( if (WARN_ON(hvirt->client_count != 0)) { struct kbase_hwcnt_virtualizer_client *pos, *n; - list_for_each_entry_safe(pos, n, &hvirt->clients, node) + list_for_each_entry_safe (pos, n, &hvirt->clients, node) kbase_hwcnt_virtualizer_client_destroy(pos); } diff --git a/mali_kbase/mali_kbase_hwcnt_virtualizer.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.h index 08e8e9f..485ba74 100644 --- a/mali_kbase/mali_kbase_hwcnt_virtualizer.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_virtualizer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,17 +51,14 @@ struct kbase_hwcnt_dump_buffer; * * Return: 0 on success, else error code. */ -int kbase_hwcnt_virtualizer_init( - struct kbase_hwcnt_context *hctx, - u64 dump_threshold_ns, - struct kbase_hwcnt_virtualizer **out_hvirt); +int kbase_hwcnt_virtualizer_init(struct kbase_hwcnt_context *hctx, u64 dump_threshold_ns, + struct kbase_hwcnt_virtualizer **out_hvirt); /** * kbase_hwcnt_virtualizer_term - Terminate a hardware counter virtualizer. * @hvirt: Pointer to virtualizer to be terminated. 
*/ -void kbase_hwcnt_virtualizer_term( - struct kbase_hwcnt_virtualizer *hvirt); +void kbase_hwcnt_virtualizer_term(struct kbase_hwcnt_virtualizer *hvirt); /** * kbase_hwcnt_virtualizer_metadata - Get the hardware counter metadata used by @@ -71,8 +68,8 @@ void kbase_hwcnt_virtualizer_term( * * Return: Non-NULL pointer to metadata, or NULL on error. */ -const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( - struct kbase_hwcnt_virtualizer *hvirt); +const struct kbase_hwcnt_metadata * +kbase_hwcnt_virtualizer_metadata(struct kbase_hwcnt_virtualizer *hvirt); /** * kbase_hwcnt_virtualizer_client_create - Create a new virtualizer client. @@ -84,17 +81,15 @@ const struct kbase_hwcnt_metadata *kbase_hwcnt_virtualizer_metadata( * * Return: 0 on success, else error code. */ -int kbase_hwcnt_virtualizer_client_create( - struct kbase_hwcnt_virtualizer *hvirt, - const struct kbase_hwcnt_enable_map *enable_map, - struct kbase_hwcnt_virtualizer_client **out_hvcli); +int kbase_hwcnt_virtualizer_client_create(struct kbase_hwcnt_virtualizer *hvirt, + const struct kbase_hwcnt_enable_map *enable_map, + struct kbase_hwcnt_virtualizer_client **out_hvcli); /** * kbase_hwcnt_virtualizer_client_destroy() - Destroy a virtualizer client. * @hvcli: Pointer to the hardware counter client. */ -void kbase_hwcnt_virtualizer_client_destroy( - struct kbase_hwcnt_virtualizer_client *hvcli); +void kbase_hwcnt_virtualizer_client_destroy(struct kbase_hwcnt_virtualizer_client *hvcli); /** * kbase_hwcnt_virtualizer_client_set_counters - Perform a dump of the client's @@ -115,12 +110,10 @@ void kbase_hwcnt_virtualizer_client_destroy( * * Return: 0 on success or error code. */ -int kbase_hwcnt_virtualizer_client_set_counters( - struct kbase_hwcnt_virtualizer_client *hvcli, - const struct kbase_hwcnt_enable_map *enable_map, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_virtualizer_client_set_counters(struct kbase_hwcnt_virtualizer_client *hvcli, + const struct kbase_hwcnt_enable_map *enable_map, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_virtualizer_client_dump - Perform a dump of the client's @@ -136,11 +129,9 @@ int kbase_hwcnt_virtualizer_client_set_counters( * * Return: 0 on success or error code. */ -int kbase_hwcnt_virtualizer_client_dump( - struct kbase_hwcnt_virtualizer_client *hvcli, - u64 *ts_start_ns, - u64 *ts_end_ns, - struct kbase_hwcnt_dump_buffer *dump_buf); +int kbase_hwcnt_virtualizer_client_dump(struct kbase_hwcnt_virtualizer_client *hvcli, + u64 *ts_start_ns, u64 *ts_end_ns, + struct kbase_hwcnt_dump_buffer *dump_buf); /** * kbase_hwcnt_virtualizer_queue_work() - Queue hardware counter related async diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if.h index 1873318..501c008 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -50,17 +50,17 @@ typedef void kbase_hwcnt_watchdog_callback_fn(void *user_data); * * Return: 0 if the watchdog timer enabled successfully, error code otherwise. 
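The watchdog header touched below (mali_kbase_hwcnt_watchdog_if.h) exposes the timer as a virtual interface: enable/disable/modify function typedefs gathered as members of struct kbase_hwcnt_watchdog_interface, so a backend can be swapped behind the same calls. A generic sketch of that pattern, with hypothetical names and pointer-style typedefs:

#include <stdint.h>

struct wd_ctx; /* opaque backend context, hypothetical */

typedef int (*wd_enable_fn)(struct wd_ctx *ctx, uint32_t period_ms,
			    void (*callback)(void *), void *user_data);
typedef void (*wd_disable_fn)(struct wd_ctx *ctx);
typedef void (*wd_modify_fn)(struct wd_ctx *ctx, uint32_t delay_ms);

struct wd_interface {
	struct wd_ctx *ctx;
	wd_enable_fn enable;
	wd_disable_fn disable;
	wd_modify_fn modify;
};

/* Callers go through the interface, never the backend directly. */
static int wd_start(struct wd_interface *wif, uint32_t period_ms,
		    void (*cb)(void *), void *user_data)
{
	return wif->enable(wif->ctx, period_ms, cb, user_data);
}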
*/ -typedef int kbase_hwcnt_watchdog_enable_fn( - const struct kbase_hwcnt_watchdog_info *timer, u32 period_ms, - kbase_hwcnt_watchdog_callback_fn *callback, void *user_data); +typedef int kbase_hwcnt_watchdog_enable_fn(const struct kbase_hwcnt_watchdog_info *timer, + u32 period_ms, + kbase_hwcnt_watchdog_callback_fn *callback, + void *user_data); /** * typedef kbase_hwcnt_watchdog_disable_fn - Disable watchdog timer * * @timer: Non-NULL pointer to a watchdog timer interface context */ -typedef void -kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); +typedef void kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); /** * typedef kbase_hwcnt_watchdog_modify_fn - Modify watchdog timer's timeout @@ -68,9 +68,8 @@ kbase_hwcnt_watchdog_disable_fn(const struct kbase_hwcnt_watchdog_info *timer); * @timer: Non-NULL pointer to a watchdog timer interface context * @delay_ms: Watchdog timer expiration in milliseconds */ -typedef void -kbase_hwcnt_watchdog_modify_fn(const struct kbase_hwcnt_watchdog_info *timer, - u32 delay_ms); +typedef void kbase_hwcnt_watchdog_modify_fn(const struct kbase_hwcnt_watchdog_info *timer, + u32 delay_ms); /** * struct kbase_hwcnt_watchdog_interface - Hardware counter watchdog virtual interface. diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.c b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.c index 69b957a..4caa832 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.c +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,8 +20,8 @@ */ #include "mali_kbase.h" -#include "mali_kbase_hwcnt_watchdog_if.h" -#include "mali_kbase_hwcnt_watchdog_if_timer.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if.h" +#include "hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h" #include <linux/workqueue.h> #include <linux/slab.h> @@ -62,12 +62,10 @@ static void kbasep_hwcnt_watchdog_callback(struct work_struct *const work) } static int kbasep_hwcnt_watchdog_if_timer_enable( - const struct kbase_hwcnt_watchdog_info *const timer, - u32 const period_ms, kbase_hwcnt_watchdog_callback_fn *const callback, - void *const user_data) + const struct kbase_hwcnt_watchdog_info *const timer, u32 const period_ms, + kbase_hwcnt_watchdog_callback_fn *const callback, void *const user_data) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer) || WARN_ON(!callback) || WARN_ON(timer_info->timer_enabled)) return -EINVAL; @@ -81,11 +79,10 @@ static int kbasep_hwcnt_watchdog_if_timer_enable( return 0; } -static void kbasep_hwcnt_watchdog_if_timer_disable( - const struct kbase_hwcnt_watchdog_info *const timer) +static void +kbasep_hwcnt_watchdog_if_timer_disable(const struct kbase_hwcnt_watchdog_info *const timer) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer)) return; @@ -97,11 +94,11 @@ static void kbasep_hwcnt_watchdog_if_timer_disable( timer_info->timer_enabled = false; } -static void 
kbasep_hwcnt_watchdog_if_timer_modify( - const struct kbase_hwcnt_watchdog_info *const timer, u32 const delay_ms) +static void +kbasep_hwcnt_watchdog_if_timer_modify(const struct kbase_hwcnt_watchdog_info *const timer, + u32 const delay_ms) { - struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = - (void *)timer; + struct kbase_hwcnt_watchdog_if_timer_info *const timer_info = (void *)timer; if (WARN_ON(!timer) || WARN_ON(!timer_info->timer_enabled)) return; @@ -109,8 +106,7 @@ static void kbasep_hwcnt_watchdog_if_timer_modify( mod_delayed_work(timer_info->workq, &timer_info->dwork, msecs_to_jiffies(delay_ms)); } -void kbase_hwcnt_watchdog_if_timer_destroy( - struct kbase_hwcnt_watchdog_interface *const watchdog_if) +void kbase_hwcnt_watchdog_if_timer_destroy(struct kbase_hwcnt_watchdog_interface *const watchdog_if) { struct kbase_hwcnt_watchdog_if_timer_info *timer_info; @@ -125,11 +121,12 @@ void kbase_hwcnt_watchdog_if_timer_destroy( destroy_workqueue(timer_info->workq); kfree(timer_info); - *watchdog_if = (struct kbase_hwcnt_watchdog_interface){ NULL }; + *watchdog_if = (struct kbase_hwcnt_watchdog_interface){ + .timer = NULL, .enable = NULL, .disable = NULL, .modify = NULL + }; } -int kbase_hwcnt_watchdog_if_timer_create( - struct kbase_hwcnt_watchdog_interface *const watchdog_if) +int kbase_hwcnt_watchdog_if_timer_create(struct kbase_hwcnt_watchdog_interface *const watchdog_if) { struct kbase_hwcnt_watchdog_if_timer_info *timer_info; @@ -140,9 +137,7 @@ int kbase_hwcnt_watchdog_if_timer_create( if (!timer_info) return -ENOMEM; - *timer_info = - (struct kbase_hwcnt_watchdog_if_timer_info){ .timer_enabled = - false }; + *timer_info = (struct kbase_hwcnt_watchdog_if_timer_info){ .timer_enabled = false }; INIT_DELAYED_WORK(&timer_info->dwork, kbasep_hwcnt_watchdog_callback); diff --git a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.h b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h index 3bd69c3..a545ad3 100644 --- a/mali_kbase/mali_kbase_hwcnt_watchdog_if_timer.h +++ b/mali_kbase/hwcnt/mali_kbase_hwcnt_watchdog_if_timer.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,8 +35,7 @@ struct kbase_hwcnt_watchdog_interface; * * Return: 0 on success, error otherwise. 
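One small functional change in the timer backend above: on destroy, the interface is now reset with a compound literal that names every member instead of "{ NULL }", spelling out the cleared state field by field. A generic sketch of the pattern over a hypothetical struct:

#include <stddef.h>

struct iface_sketch {
	void *timer;
	int (*enable)(void);
	void (*disable)(void);
	void (*modify)(void);
};

static void iface_reset(struct iface_sketch *ifc)
{
	/* Designated initializers: any unnamed members are still zeroed,
	 * but the intent is explicit per field.
	 */
	*ifc = (struct iface_sketch){
		.timer = NULL, .enable = NULL, .disable = NULL, .modify = NULL
	};
}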
*/ -int kbase_hwcnt_watchdog_if_timer_create( - struct kbase_hwcnt_watchdog_interface *watchdog_if); +int kbase_hwcnt_watchdog_if_timer_create(struct kbase_hwcnt_watchdog_interface *watchdog_if); /** * kbase_hwcnt_watchdog_if_timer_destroy() - Destroy a watchdog interface of hardware counter @@ -44,7 +43,6 @@ int kbase_hwcnt_watchdog_if_timer_create( * * @watchdog_if: Pointer to watchdog interface to destroy */ -void kbase_hwcnt_watchdog_if_timer_destroy( - struct kbase_hwcnt_watchdog_interface *watchdog_if); +void kbase_hwcnt_watchdog_if_timer_destroy(struct kbase_hwcnt_watchdog_interface *watchdog_if); #endif /* _KBASE_HWCNT_WATCHDOG_IF_TIMER_H_ */ diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c index 81dc56b..60b061e 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_csf.c @@ -281,7 +281,7 @@ int kbase_ipa_counter_dynamic_coeff(struct kbase_ipa_model *model, u32 *coeffp) if (WARN_ON(ret)) return ret; - now = ktime_get(); + now = ktime_get_raw(); diff = ktime_sub(now, kbdev->ipa.last_sample_time); diff_ms = ktime_to_ms(diff); diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c index e240117..34515a9 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,7 +31,7 @@ #define DEFAULT_MIN_SAMPLE_CYCLES 10000 /** - * read_hwcnt() - read a counter value + * kbase_ipa_read_hwcnt() - read a counter value * @model_data: pointer to model data * @offset: offset, in bytes, into vinstr buffer * diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h index e1718c6..6089610 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_common_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,8 +23,8 @@ #define _KBASE_IPA_COUNTER_COMMON_JM_H_ #include "mali_kbase.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" /* Maximum number of IPA groups for an IPA model. */ #define KBASE_IPA_MAX_GROUP_DEF_NUM 16 @@ -83,7 +83,7 @@ struct kbase_ipa_model_vinstr_data { }; /** - * struct ipa_group - represents a single IPA group + * struct kbase_ipa_group - represents a single IPA group * @name: name of the IPA group * @default_value: default value of coefficient for IPA group. 
* Coefficients are interpreted as fractions where the @@ -152,7 +152,7 @@ s64 kbase_ipa_single_counter( s32 coeff, u32 counter); /** - * attach_vinstr() - attach a vinstr_buffer to an IPA model. + * kbase_ipa_attach_vinstr() - attach a vinstr_buffer to an IPA model. * @model_data: pointer to model data * * Attach a vinstr_buffer to an IPA model. The vinstr_buffer @@ -164,7 +164,7 @@ s64 kbase_ipa_single_counter( int kbase_ipa_attach_vinstr(struct kbase_ipa_model_vinstr_data *model_data); /** - * detach_vinstr() - detach a vinstr_buffer from an IPA model. + * kbase_ipa_detach_vinstr() - detach a vinstr_buffer from an IPA model. * @model_data: pointer to model data * * Detach a vinstr_buffer from an IPA model. diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c index 66e56e2..21b4e52 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,10 +23,13 @@ #include "mali_kbase.h" /* MEMSYS counter block offsets */ +#define L2_RD_MSG_IN_CU (13) #define L2_RD_MSG_IN (16) #define L2_WR_MSG_IN (18) +#define L2_SNP_MSG_IN (20) #define L2_RD_MSG_OUT (22) #define L2_READ_LOOKUP (26) +#define L2_EXT_READ_NOSNP (30) #define L2_EXT_WRITE_NOSNP_FULL (43) /* SC counter block offsets */ @@ -36,17 +39,23 @@ #define FULL_QUAD_WARPS (21) #define EXEC_INSTR_FMA (27) #define EXEC_INSTR_CVT (28) +#define EXEC_INSTR_SFU (29) #define EXEC_INSTR_MSG (30) #define TEX_FILT_NUM_OPS (39) #define LS_MEM_READ_SHORT (45) #define LS_MEM_WRITE_SHORT (47) #define VARY_SLOT_16 (51) +#define BEATS_RD_LSC_EXT (57) +#define BEATS_RD_TEX (58) +#define BEATS_RD_TEX_EXT (59) +#define FRAG_QUADS_COARSE (68) /* Tiler counter block offsets */ #define IDVS_POS_SHAD_STALL (23) #define PREFETCH_STALL (25) #define VFETCH_POS_READ_WAIT (29) #define VFETCH_VERTEX_WAIT (30) +#define PRIMASSY_STALL (32) #define IDVS_VAR_SHAD_STALL (38) #define ITER_STALL (40) #define PMGR_PTR_RD_STALL (48) @@ -59,9 +68,6 @@ .counter_block_type = block_type, \ } -#define CSHW_COUNTER_DEF(cnt_name, coeff, cnt_idx) \ - COUNTER_DEF(cnt_name, coeff, cnt_idx, KBASE_IPA_CORE_TYPE_CSHW) - #define MEMSYS_COUNTER_DEF(cnt_name, coeff, cnt_idx) \ COUNTER_DEF(cnt_name, coeff, cnt_idx, KBASE_IPA_CORE_TYPE_MEMSYS) @@ -114,6 +120,15 @@ static const struct kbase_ipa_counter ipa_top_level_cntrs_def_ttux[] = { TILER_COUNTER_DEF("vfetch_vertex_wait", -391964, VFETCH_VERTEX_WAIT), }; +static const struct kbase_ipa_counter ipa_top_level_cntrs_def_ttix[] = { + TILER_COUNTER_DEF("primassy_stall", 471953, PRIMASSY_STALL), + TILER_COUNTER_DEF("idvs_var_shad_stall", -460559, IDVS_VAR_SHAD_STALL), + + MEMSYS_COUNTER_DEF("l2_rd_msg_in_cu", -6189604, L2_RD_MSG_IN_CU), + MEMSYS_COUNTER_DEF("l2_snp_msg_in", 6289609, L2_SNP_MSG_IN), + MEMSYS_COUNTER_DEF("l2_ext_read_nosnp", 512341, L2_EXT_READ_NOSNP), +}; + /* These tables provide a description of each performance counter * used by the shader cores counter model for energy estimation. 
*/ @@ -153,6 +168,17 @@ static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttux[] = { SC_COUNTER_DEF("frag_quads_ezs_update", 372032, FRAG_QUADS_EZS_UPDATE), }; +static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttix[] = { + SC_COUNTER_DEF("exec_instr_fma", 192642, EXEC_INSTR_FMA), + SC_COUNTER_DEF("exec_instr_msg", 1326465, EXEC_INSTR_MSG), + SC_COUNTER_DEF("beats_rd_tex", 163518, BEATS_RD_TEX), + SC_COUNTER_DEF("beats_rd_lsc_ext", 127475, BEATS_RD_LSC_EXT), + SC_COUNTER_DEF("frag_quads_coarse", -36247, FRAG_QUADS_COARSE), + SC_COUNTER_DEF("ls_mem_write_short", 51547, LS_MEM_WRITE_SHORT), + SC_COUNTER_DEF("beats_rd_tex_ext", -43370, BEATS_RD_TEX_EXT), + SC_COUNTER_DEF("exec_instr_sfu", 31583, EXEC_INSTR_SFU), +}; + #define IPA_POWER_MODEL_OPS(gpu, init_token) \ const struct kbase_ipa_model_ops kbase_ ## gpu ## _ipa_model_ops = { \ .name = "mali-" #gpu "-power-model", \ @@ -184,13 +210,13 @@ static const struct kbase_ipa_counter ipa_shader_core_cntrs_def_ttux[] = { #define ALIAS_POWER_MODEL(gpu, as_gpu) \ IPA_POWER_MODEL_OPS(gpu, as_gpu) -/* Reference voltage value is 750 mV. - */ +/* Reference voltage value is 750 mV. */ STANDARD_POWER_MODEL(todx, 750); STANDARD_POWER_MODEL(tgrx, 750); STANDARD_POWER_MODEL(tvax, 750); - STANDARD_POWER_MODEL(ttux, 750); +/* Reference voltage value is 550 mV. */ +STANDARD_POWER_MODEL(ttix, 550); /* Assuming LODX is an alias of TODX for IPA */ ALIAS_POWER_MODEL(lodx, todx); @@ -198,10 +224,14 @@ ALIAS_POWER_MODEL(lodx, todx); /* Assuming LTUX is an alias of TTUX for IPA */ ALIAS_POWER_MODEL(ltux, ttux); +/* Assuming LTUX is an alias of TTUX for IPA */ +ALIAS_POWER_MODEL(ltix, ttix); + static const struct kbase_ipa_model_ops *ipa_counter_model_ops[] = { &kbase_todx_ipa_model_ops, &kbase_lodx_ipa_model_ops, &kbase_tgrx_ipa_model_ops, &kbase_tvax_ipa_model_ops, - &kbase_ttux_ipa_model_ops, &kbase_ltux_ipa_model_ops + &kbase_ttux_ipa_model_ops, &kbase_ltux_ipa_model_ops, + &kbase_ttix_ipa_model_ops, &kbase_ltix_ipa_model_ops, }; const struct kbase_ipa_model_ops *kbase_ipa_counter_model_ops_find( @@ -240,6 +270,10 @@ const char *kbase_ipa_counter_model_name_from_id(u32 gpu_id) return "mali-ttux-power-model"; case GPU_ID2_PRODUCT_LTUX: return "mali-ltux-power-model"; + case GPU_ID2_PRODUCT_TTIX: + return "mali-ttix-power-model"; + case GPU_ID2_PRODUCT_LTIX: + return "mali-ltix-power-model"; default: return NULL; } diff --git a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c index f11be0d..5a204ae 100644 --- a/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c +++ b/mali_kbase/ipa/backend/mali_kbase_ipa_counter_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,28 +23,19 @@ #include "mali_kbase_ipa_counter_common_jm.h" #include "mali_kbase.h" - -#if IS_ENABLED(CONFIG_MALI_NO_MALI) -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ +#include <backend/gpu/mali_kbase_model_linux.h> /* Performance counter blocks base offsets */ #define JM_BASE (0 * KBASE_IPA_NR_BYTES_PER_BLOCK) -#define TILER_BASE (1 * KBASE_IPA_NR_BYTES_PER_BLOCK) #define MEMSYS_BASE (2 * KBASE_IPA_NR_BYTES_PER_BLOCK) /* JM counter block offsets */ #define JM_GPU_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 6) -/* Tiler counter block offsets */ -#define TILER_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 45) - /* MEMSYS counter block offsets */ #define MEMSYS_L2_ANY_LOOKUP (KBASE_IPA_NR_BYTES_PER_CNT * 25) /* SC counter block offsets */ -#define SC_FRAG_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 4) -#define SC_EXEC_CORE_ACTIVE (KBASE_IPA_NR_BYTES_PER_CNT * 26) #define SC_EXEC_INSTR_FMA (KBASE_IPA_NR_BYTES_PER_CNT * 27) #define SC_EXEC_INSTR_COUNT (KBASE_IPA_NR_BYTES_PER_CNT * 28) #define SC_EXEC_INSTR_MSG (KBASE_IPA_NR_BYTES_PER_CNT * 30) @@ -52,16 +43,14 @@ #define SC_TEX_COORD_ISSUE (KBASE_IPA_NR_BYTES_PER_CNT * 40) #define SC_TEX_TFCH_NUM_OPERATIONS (KBASE_IPA_NR_BYTES_PER_CNT * 42) #define SC_VARY_INSTR (KBASE_IPA_NR_BYTES_PER_CNT * 49) -#define SC_VARY_SLOT_32 (KBASE_IPA_NR_BYTES_PER_CNT * 50) -#define SC_VARY_SLOT_16 (KBASE_IPA_NR_BYTES_PER_CNT * 51) -#define SC_BEATS_RD_LSC (KBASE_IPA_NR_BYTES_PER_CNT * 56) -#define SC_BEATS_WR_LSC (KBASE_IPA_NR_BYTES_PER_CNT * 61) #define SC_BEATS_WR_TIB (KBASE_IPA_NR_BYTES_PER_CNT * 62) /** - * get_jm_counter() - get performance counter offset inside the Job Manager block + * kbase_g7x_power_model_get_jm_counter() - get performance counter offset + * inside the Job Manager block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the Job Manager block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the Job Manager block. * * Return: Block offset in bytes of the required performance counter. */ @@ -72,9 +61,11 @@ static u32 kbase_g7x_power_model_get_jm_counter(struct kbase_ipa_model_vinstr_da } /** - * get_memsys_counter() - get performance counter offset inside the Memory System block + * kbase_g7x_power_model_get_memsys_counter() - get performance counter offset + * inside the Memory System block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the (first) Memory System block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the (first) Memory System block. * * Return: Block offset in bytes of the required performance counter. */ @@ -88,9 +79,11 @@ static u32 kbase_g7x_power_model_get_memsys_counter(struct kbase_ipa_model_vinst } /** - * get_sc_counter() - get performance counter offset inside the Shader Cores block + * kbase_g7x_power_model_get_sc_counter() - get performance counter offset + * inside the Shader Cores block * @model_data: pointer to GPU model data. - * @counter_block_offset: offset in bytes of the performance counter inside the (first) Shader Cores block. + * @counter_block_offset: offset in bytes of the performance counter inside + * the (first) Shader Cores block. * * Return: Block offset in bytes of the required performance counter. 
*/ @@ -110,10 +103,12 @@ static u32 kbase_g7x_power_model_get_sc_counter(struct kbase_ipa_model_vinstr_da } /** - * memsys_single_counter() - calculate energy for a single Memory System performance counter. + * kbase_g7x_sum_all_memsys_blocks() - calculate energy for a single Memory + * System performance counter. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. - * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. + * @counter_block_offset: offset in bytes of the counter inside the block it + * belongs to. * * Return: Energy estimation for a single Memory System performance counter. */ @@ -130,12 +125,15 @@ static s64 kbase_g7x_sum_all_memsys_blocks( } /** - * sum_all_shader_cores() - calculate energy for a Shader Cores performance counter for all cores. + * kbase_g7x_sum_all_shader_cores() - calculate energy for a Shader Cores + * performance counter for all cores. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. - * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. + * @counter_block_offset: offset in bytes of the counter inside the block it + * belongs to. * - * Return: Energy estimation for a Shader Cores performance counter for all cores. + * Return: Energy estimation for a Shader Cores performance counter for all + * cores. */ static s64 kbase_g7x_sum_all_shader_cores( struct kbase_ipa_model_vinstr_data *model_data, @@ -150,7 +148,7 @@ static s64 kbase_g7x_sum_all_shader_cores( } /** - * jm_single_counter() - calculate energy for a single Job Manager performance counter. + * kbase_g7x_jm_single_counter() - calculate energy for a single Job Manager performance counter. * @model_data: pointer to GPU model data. * @coeff: default value of coefficient for IPA group. * @counter_block_offset: offset in bytes of the counter inside the block it belongs to. @@ -170,7 +168,7 @@ static s64 kbase_g7x_jm_single_counter( } /** - * get_active_cycles() - return the GPU_ACTIVE counter + * kbase_g7x_get_active_cycles() - return the GPU_ACTIVE counter * @model_data: pointer to GPU model data. 
* * Return: the number of cycles the GPU was active during the counter sampling @@ -457,16 +455,14 @@ static const struct kbase_ipa_group ipa_groups_def_tbax[] = { }, }; - -#define IPA_POWER_MODEL_OPS(gpu, init_token) \ - const struct kbase_ipa_model_ops kbase_ ## gpu ## _ipa_model_ops = { \ - .name = "mali-" #gpu "-power-model", \ - .init = kbase_ ## init_token ## _power_model_init, \ - .term = kbase_ipa_vinstr_common_model_term, \ - .get_dynamic_coeff = kbase_ipa_vinstr_dynamic_coeff, \ - .reset_counter_data = kbase_ipa_vinstr_reset_data, \ - }; \ - KBASE_EXPORT_TEST_API(kbase_ ## gpu ## _ipa_model_ops) +#define IPA_POWER_MODEL_OPS(gpu, init_token) \ + static const struct kbase_ipa_model_ops kbase_##gpu##_ipa_model_ops = { \ + .name = "mali-" #gpu "-power-model", \ + .init = kbase_##init_token##_power_model_init, \ + .term = kbase_ipa_vinstr_common_model_term, \ + .get_dynamic_coeff = kbase_ipa_vinstr_dynamic_coeff, \ + .reset_counter_data = kbase_ipa_vinstr_reset_data, \ + } #define STANDARD_POWER_MODEL(gpu, reference_voltage) \ static int kbase_ ## gpu ## _power_model_init(\ diff --git a/mali_kbase/ipa/mali_kbase_ipa.c b/mali_kbase/ipa/mali_kbase_ipa.c index 428e68b..0e8abb1 100644 --- a/mali_kbase/ipa/mali_kbase_ipa.c +++ b/mali_kbase/ipa/mali_kbase_ipa.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -84,11 +84,11 @@ KBASE_EXPORT_TEST_API(kbase_ipa_model_name_from_id); static struct device_node *get_model_dt_node(struct kbase_ipa_model *model, bool dt_required) { - struct device_node *model_dt_node; + struct device_node *model_dt_node = NULL; char compat_string[64]; - snprintf(compat_string, sizeof(compat_string), "arm,%s", - model->ops->name); + if (unlikely(!scnprintf(compat_string, sizeof(compat_string), "arm,%s", model->ops->name))) + return NULL; /* of_find_compatible_node() will call of_node_put() on the root node, * so take a reference on it first. @@ -111,12 +111,12 @@ int kbase_ipa_model_add_param_s32(struct kbase_ipa_model *model, const char *name, s32 *addr, size_t num_elems, bool dt_required) { - int err, i; + int err = -EINVAL, i; struct device_node *model_dt_node = get_model_dt_node(model, dt_required); char *origin; - err = of_property_read_u32_array(model_dt_node, name, addr, num_elems); + err = of_property_read_u32_array(model_dt_node, name, (u32 *)addr, num_elems); /* We're done with model_dt_node now, so drop the reference taken in * get_model_dt_node()/of_find_compatible_node(). 
*/ @@ -138,11 +138,17 @@ int kbase_ipa_model_add_param_s32(struct kbase_ipa_model *model, for (i = 0; i < num_elems; ++i) { char elem_name[32]; - if (num_elems == 1) - snprintf(elem_name, sizeof(elem_name), "%s", name); - else - snprintf(elem_name, sizeof(elem_name), "%s.%d", - name, i); + if (num_elems == 1) { + if (unlikely(!scnprintf(elem_name, sizeof(elem_name), "%s", name))) { + err = -ENOMEM; + goto exit; + } + } else { + if (unlikely(!scnprintf(elem_name, sizeof(elem_name), "%s.%d", name, i))) { + err = -ENOMEM; + goto exit; + } + } dev_dbg(model->kbdev->dev, "%s.%s = %d (%s)\n", model->ops->name, elem_name, addr[i], origin); @@ -164,7 +170,7 @@ int kbase_ipa_model_add_param_string(struct kbase_ipa_model *model, int err; struct device_node *model_dt_node = get_model_dt_node(model, dt_required); - const char *string_prop_value; + const char *string_prop_value = ""; char *origin; err = of_property_read_string(model_dt_node, name, @@ -324,7 +330,7 @@ int kbase_ipa_init(struct kbase_device *kbdev) kbdev->ipa.configured_model = default_model; } - kbdev->ipa.last_sample_time = ktime_get(); + kbdev->ipa.last_sample_time = ktime_get_raw(); end: if (err) @@ -750,7 +756,7 @@ void kbase_ipa_reset_data(struct kbase_device *kbdev) mutex_lock(&kbdev->ipa.lock); - now = ktime_get(); + now = ktime_get_raw(); diff = ktime_sub(now, kbdev->ipa.last_sample_time); elapsed_time = ktime_to_ms(diff); @@ -765,7 +771,7 @@ void kbase_ipa_reset_data(struct kbase_device *kbdev) if (model != kbdev->ipa.fallback_model) model->ops->reset_counter_data(model); - kbdev->ipa.last_sample_time = ktime_get(); + kbdev->ipa.last_sample_time = ktime_get_raw(); } mutex_unlock(&kbdev->ipa.lock); diff --git a/mali_kbase/ipa/mali_kbase_ipa.h b/mali_kbase/ipa/mali_kbase_ipa.h index c668af9..4f35b9e 100644 --- a/mali_kbase/ipa/mali_kbase_ipa.h +++ b/mali_kbase/ipa/mali_kbase_ipa.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -266,7 +266,6 @@ int kbase_get_real_power(struct devfreq *df, u32 *power, unsigned long freq, unsigned long voltage); -#if MALI_UNIT_TEST /* Called by kbase_get_real_power() to invoke the power models. * Must be called with kbdev->ipa.lock held. * This function is only exposed for use by unit tests. @@ -274,7 +273,6 @@ int kbase_get_real_power(struct devfreq *df, u32 *power, int kbase_get_real_power_locked(struct kbase_device *kbdev, u32 *power, unsigned long freq, unsigned long voltage); -#endif /* MALI_UNIT_TEST */ extern struct devfreq_cooling_power kbase_ipa_power_model_ops; diff --git a/mali_kbase/ipa/mali_kbase_ipa_debugfs.c b/mali_kbase/ipa/mali_kbase_ipa_debugfs.c index d554fff..a8523a7 100644 --- a/mali_kbase/ipa/mali_kbase_ipa_debugfs.c +++ b/mali_kbase/ipa/mali_kbase_ipa_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2022 ARM Limited. All rights reserved. 
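The kbase_ipa_model_add_param_s32 change above swaps unchecked snprintf calls for checked scnprintf calls when building the "arm,<model>" compatible string and the per-element property names, bailing out instead of using an empty or truncated buffer. A small userspace sketch of the same check-before-use pattern, with ordinary snprintf standing in for the kernel's scnprintf and an invented model name:

#include <stdio.h>
#include <string.h>

/* Build "arm,<model>" into buf; fail instead of silently using an empty or
 * truncated compatible string.
 */
static int build_compat_string(char *buf, size_t buf_sz, const char *model)
{
	int n = snprintf(buf, buf_sz, "arm,%s", model);

	if (n <= 0)                  /* nothing was formatted */
		return -1;
	if ((size_t)n >= buf_sz)     /* output did not fit in the buffer */
		return -1;
	return 0;
}

int main(void)
{
	char compat[64];

	if (build_compat_string(compat, sizeof(compat), "ttix-power-model"))
		fprintf(stderr, "could not build compatible string\n");
	else
		printf("compatible = \"%s\"\n", compat);
	return 0;
}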
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,6 +20,7 @@ */ #include <linux/debugfs.h> +#include <linux/version_compat_defs.h> #include <linux/list.h> #include <linux/mutex.h> @@ -27,10 +28,6 @@ #include "mali_kbase_ipa.h" #include "mali_kbase_ipa_debugfs.h" -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) -#define DEFINE_DEBUGFS_ATTRIBUTE DEFINE_SIMPLE_ATTRIBUTE -#endif - struct kbase_ipa_model_param { char *name; union { diff --git a/mali_kbase/ipa/mali_kbase_ipa_simple.c b/mali_kbase/ipa/mali_kbase_ipa_simple.c index fadae7d..0fd2136 100644 --- a/mali_kbase/ipa/mali_kbase_ipa_simple.c +++ b/mali_kbase/ipa/mali_kbase_ipa_simple.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2018, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -33,6 +33,8 @@ #include "mali_kbase_ipa_simple.h" #include "mali_kbase_ipa_debugfs.h" +#if MALI_USE_CSF + /* This is used if the dynamic power for top-level is estimated separately * through the counter model. To roughly match the contribution of top-level * power in the total dynamic power, when calculated through counter model, @@ -43,6 +45,8 @@ */ #define TOP_LEVEL_DYN_COEFF_SCALER (3) +#endif /* MALI_USE_CSF */ + #if MALI_UNIT_TEST static int dummy_temp; @@ -227,14 +231,12 @@ static int add_params(struct kbase_ipa_model *model) (struct kbase_ipa_model_simple_data *)model->model_data; err = kbase_ipa_model_add_param_s32(model, "static-coefficient", - &model_data->static_coefficient, - 1, true); + (s32 *)&model_data->static_coefficient, 1, true); if (err) goto end; err = kbase_ipa_model_add_param_s32(model, "dynamic-coefficient", - &model_data->dynamic_coefficient, - 1, true); + (s32 *)&model_data->dynamic_coefficient, 1, true); if (err) goto end; @@ -321,8 +323,9 @@ static int kbase_simple_power_model_recalculate(struct kbase_ipa_model *model) mutex_lock(&model->kbdev->ipa.lock); if (IS_ERR_OR_NULL(tz)) { - pr_warn_ratelimited("Error %ld getting thermal zone \'%s\', not yet ready?\n", - PTR_ERR(tz), tz_name); + pr_warn_ratelimited( + "Error %d getting thermal zone \'%s\', not yet ready?\n", + PTR_ERR_OR_ZERO(tz), tz_name); return -EPROBE_DEFER; } diff --git a/mali_kbase/jm/mali_kbase_jm_defs.h b/mali_kbase/jm/mali_kbase_jm_defs.h index 3c4d6b2..e694f9f 100644 --- a/mali_kbase/jm/mali_kbase_jm_defs.h +++ b/mali_kbase/jm/mali_kbase_jm_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -135,13 +135,22 @@ /** * enum kbase_timeout_selector - The choice of which timeout to get scaled * using the lowest GPU frequency. - * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. Must be last in - * the enum. + * @MMU_AS_INACTIVE_WAIT_TIMEOUT: Maximum waiting time in ms for the completion + * of a MMU operation + * @JM_DEFAULT_JS_FREE_TIMEOUT: Maximum timeout to wait for JS_COMMAND_NEXT + * to be updated on HW side so a Job Slot is + * considered free. 
+ * @KBASE_TIMEOUT_SELECTOR_COUNT: Number of timeout selectors. + * @KBASE_DEFAULT_TIMEOUT: Fallthrough in case an invalid timeout is + * passed. */ enum kbase_timeout_selector { + MMU_AS_INACTIVE_WAIT_TIMEOUT, + JM_DEFAULT_JS_FREE_TIMEOUT, /* Must be the last in the enum */ - KBASE_TIMEOUT_SELECTOR_COUNT + KBASE_TIMEOUT_SELECTOR_COUNT, + KBASE_DEFAULT_TIMEOUT = JM_DEFAULT_JS_FREE_TIMEOUT }; #if IS_ENABLED(CONFIG_DEBUG_FS) @@ -194,8 +203,6 @@ struct kbase_jd_atom_dependency { static inline const struct kbase_jd_atom * kbase_jd_katom_dep_atom(const struct kbase_jd_atom_dependency *dep) { - KBASE_DEBUG_ASSERT(dep != NULL); - return (const struct kbase_jd_atom *)(dep->atom); } @@ -209,8 +216,6 @@ kbase_jd_katom_dep_atom(const struct kbase_jd_atom_dependency *dep) static inline u8 kbase_jd_katom_dep_type( const struct kbase_jd_atom_dependency *dep) { - KBASE_DEBUG_ASSERT(dep != NULL); - return dep->dep_type; } @@ -227,8 +232,6 @@ static inline void kbase_jd_katom_dep_set( { struct kbase_jd_atom_dependency *dep; - KBASE_DEBUG_ASSERT(const_dep != NULL); - dep = (struct kbase_jd_atom_dependency *)const_dep; dep->atom = a; @@ -245,8 +248,6 @@ static inline void kbase_jd_katom_dep_clear( { struct kbase_jd_atom_dependency *dep; - KBASE_DEBUG_ASSERT(const_dep != NULL); - dep = (struct kbase_jd_atom_dependency *)const_dep; dep->atom = NULL; @@ -361,19 +362,6 @@ enum kbase_atom_exit_protected_state { }; /** - * struct kbase_ext_res - Contains the info for external resources referred - * by an atom, which have been mapped on GPU side. - * @gpu_address: Start address of the memory region allocated for - * the resource from GPU virtual address space. - * @alloc: pointer to physical pages tracking object, set on - * mapping the external resource on GPU side. - */ -struct kbase_ext_res { - u64 gpu_address; - struct kbase_mem_phy_alloc *alloc; -}; - -/** * struct kbase_jd_atom - object representing the atom, containing the complete * state and attributes of an atom. * @work: work item for the bottom half processing of the atom, @@ -406,7 +394,8 @@ struct kbase_ext_res { * each allocation is read in order to enforce an * overall physical memory usage limit. * @nr_extres: number of external resources referenced by the atom. - * @extres: pointer to the location containing info about + * @extres: Pointer to @nr_extres VA regions containing the external + * resource allocation and other information. * @nr_extres external resources referenced by the atom. * @device_nr: indicates the coregroup with which the atom is * associated, when @@ -424,16 +413,21 @@ struct kbase_ext_res { * sync through soft jobs and for the implicit * synchronization required on access to external * resources. - * @dma_fence.fence_in: Input fence + * @dma_fence.fence_in: Points to the dma-buf input fence for this atom. + * The atom would complete only after the fence is + * signaled. * @dma_fence.fence: Points to the dma-buf output fence for this atom. + * @dma_fence.fence_cb: The object that is passed at the time of adding the + * callback that gets invoked when @dma_fence.fence_in + * is signaled. + * @dma_fence.fence_cb_added: Flag to keep a track if the callback was successfully + * added for @dma_fence.fence_in, which is supposed to be + * invoked on the signaling of fence. * @dma_fence.context: The dma-buf fence context number for this atom. A * unique context number is allocated to each katom in * the context on context creation. * @dma_fence.seqno: The dma-buf fence sequence number for this atom. 
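The reworked kbase_timeout_selector enum above closes with a COUNT member and aliases KBASE_DEFAULT_TIMEOUT to an existing selector, so an invalid selector can fall back to a defined timeout. A standalone sketch of that enum-plus-fallback-lookup pattern; the selector names and millisecond values here are invented for the example:

#include <stdio.h>

enum timeout_selector {
	MMU_AS_INACTIVE_WAIT,
	JS_FREE_WAIT,
	/* Must be last */
	TIMEOUT_SELECTOR_COUNT,
	DEFAULT_TIMEOUT = JS_FREE_WAIT
};

static unsigned int get_timeout_ms(int sel)
{
	static const unsigned int timeout_ms[TIMEOUT_SELECTOR_COUNT] = {
		[MMU_AS_INACTIVE_WAIT] = 2000,   /* example value */
		[JS_FREE_WAIT]         = 100,    /* example value */
	};

	if (sel < 0 || sel >= TIMEOUT_SELECTOR_COUNT)
		sel = DEFAULT_TIMEOUT;           /* fall back for bad input */
	return timeout_ms[sel];
}

int main(void)
{
	printf("%u\n", get_timeout_ms(JS_FREE_WAIT));
	printf("%u\n", get_timeout_ms(42));      /* invalid: uses the default */
	return 0;
}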
This * is increased every time this katom uses dma-buf fence - * @dma_fence.callbacks: List of all callbacks set up to wait on other fences - * @dma_fence.dep_count: Atomic counter of number of outstandind dma-buf fence - * dependencies for this atom. * @event_code: Event code for the job chain represented by the atom, * both HW and low-level SW events are represented by * event codes. @@ -516,7 +510,6 @@ struct kbase_ext_res { * BASE_JD_REQ_START_RENDERPASS set in its core requirements * with an atom that has BASE_JD_REQ_END_RENDERPASS set. * @jc_fragment: Set of GPU fragment job chains - * @retry_count: TODO: Not used,to be removed */ struct kbase_jd_atom { struct kthread_work work; @@ -536,21 +529,17 @@ struct kbase_jd_atom { #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ u16 nr_extres; - struct kbase_ext_res *extres; + struct kbase_va_region **extres; u32 device_nr; u64 jc; void *softjob_data; -#if defined(CONFIG_SYNC) - struct sync_fence *fence; - struct sync_fence_waiter sync_waiter; -#endif /* CONFIG_SYNC */ -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) struct { /* Use the functions/API defined in mali_kbase_fence.h to * when working with this sub struct */ -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) struct fence *fence_in; #else @@ -573,38 +562,21 @@ struct kbase_jd_atom { #else struct dma_fence *fence; #endif + + /* This is the callback object that is registered for the fence_in. + * The callback is invoked when the fence_in is signaled. + */ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + struct fence_cb fence_cb; +#else + struct dma_fence_cb fence_cb; +#endif + bool fence_cb_added; + unsigned int context; atomic_t seqno; - /* This contains a list of all callbacks set up to wait on - * other fences. This atom must be held back from JS until all - * these callbacks have been called and dep_count have reached - * 0. The initial value of dep_count must be equal to the - * number of callbacks on this list. - * - * This list is protected by jctx.lock. Callbacks are added to - * this list when the atom is built and the wait are set up. - * All the callbacks then stay on the list until all callbacks - * have been called and the atom is queued, or cancelled, and - * then all callbacks are taken off the list and freed. - */ - struct list_head callbacks; - /* Atomic counter of number of outstandind dma-buf fence - * dependencies for this atom. When dep_count reaches 0 the - * atom may be queued. - * - * The special value "-1" may only be set after the count - * reaches 0, while holding jctx.lock. This indicates that the - * atom has been handled, either queued in JS or cancelled. - * - * If anyone but the dma-fence worker sets this to -1 they must - * ensure that any potentially queued worker must have - * completed before allowing the atom to be marked as unused. - * This can be done by flushing the fence work queue: - * kctx->dma_fence.wq. 
- */ - atomic_t dep_count; } dma_fence; -#endif /* CONFIG_MALI_DMA_FENCE || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ /* Note: refer to kbasep_js_atom_retained_state, which will take a copy * of some of the following members @@ -623,12 +595,10 @@ struct kbase_jd_atom { #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) int work_id; #endif - int slot_nr; + unsigned int slot_nr; u32 atom_flags; - int retry_count; - enum kbase_atom_gpu_rb_state gpu_rb_state; bool need_cache_flush_cores_retained; @@ -672,7 +642,7 @@ static inline bool kbase_jd_katom_is_protected( } /** - * kbase_atom_is_younger - query if one atom is younger by age than another + * kbase_jd_atom_is_younger - query if one atom is younger by age than another * * @katom_a: the first atom * @katom_b: the second atom diff --git a/mali_kbase/jm/mali_kbase_jm_js.h b/mali_kbase/jm/mali_kbase_jm_js.h index f01e8bb..53819ca 100644 --- a/mali_kbase/jm/mali_kbase_jm_js.h +++ b/mali_kbase/jm/mali_kbase_jm_js.h @@ -29,6 +29,8 @@ #include "mali_kbase_js_ctx_attr.h" +#define JS_MAX_RUNNING_JOBS 8 + /** * kbasep_js_devdata_init - Initialize the Job Scheduler * @kbdev: The kbase_device to operate on @@ -130,15 +132,15 @@ void kbasep_js_kctx_term(struct kbase_context *kctx); * Atoms of higher priority might still be able to be pulled from the context * on @js. This helps with starting a high priority atom as soon as possible. */ -static inline void kbase_jsctx_slot_prio_blocked_set(struct kbase_context *kctx, - int js, int sched_prio) +static inline void kbase_jsctx_slot_prio_blocked_set(struct kbase_context *kctx, unsigned int js, + int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); WARN(!slot_tracking->atoms_pulled_pri[sched_prio], - "When marking slot %d as blocked for priority %d on a kctx, no atoms were pulled - the slot cannot become unblocked", + "When marking slot %u as blocked for priority %d on a kctx, no atoms were pulled - the slot cannot become unblocked", js, sched_prio); slot_tracking->blocked |= ((kbase_js_prio_bitmap_t)1) << sched_prio; @@ -508,19 +510,6 @@ bool kbase_js_dep_resolved_submit(struct kbase_context *kctx, struct kbase_jd_atom *katom); /** - * jsctx_ll_flush_to_rb() - Pushes atoms from the linked list to ringbuffer. - * @kctx: Context Pointer - * @prio: Priority (specifies the queue together with js). - * @js: Job slot (specifies the queue together with prio). - * - * Pushes all possible atoms from the linked list to the ringbuffer. - * Number of atoms are limited to free space in the ringbuffer and - * number of available atoms in the linked list. - * - */ -void jsctx_ll_flush_to_rb(struct kbase_context *kctx, int prio, int js); - -/** * kbase_js_pull - Pull an atom from a context in the job scheduler for * execution. * @@ -534,7 +523,7 @@ void jsctx_ll_flush_to_rb(struct kbase_context *kctx, int prio, int js); * Return: a pointer to an atom, or NULL if there are no atoms for this * slot that can be currently run. */ -struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js); +struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, unsigned int js); /** * kbase_js_unpull - Return an atom to the job scheduler ringbuffer. @@ -615,10 +604,10 @@ bool kbase_js_atom_blocked_on_x_dep(struct kbase_jd_atom *katom); * been used. 
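The dma_fence rework above drops the callbacks list and dep_count in favour of a single fence_cb plus a fence_cb_added flag: the callback on fence_in is registered at most once, and teardown only deregisters it if registration actually succeeded. A standalone sketch of that register-once/remember-if-added pattern; the fence type and helpers below are invented stand-ins, not the kernel dma-fence API:

#include <stdbool.h>
#include <stdio.h>

struct fake_fence {
	bool signalled;
	void (*cb)(struct fake_fence *f);
};

/* Returns 0 on success, -1 if the fence is already signalled (so no callback
 * will ever fire), mirroring how adding a callback to a signalled fence fails.
 */
static int fake_fence_add_callback(struct fake_fence *f,
				   void (*cb)(struct fake_fence *f))
{
	if (f->signalled)
		return -1;
	f->cb = cb;
	return 0;
}

static void fake_fence_remove_callback(struct fake_fence *f)
{
	f->cb = NULL;
}

struct atom {
	struct fake_fence *fence_in;
	bool fence_cb_added;     /* set only when add_callback succeeded */
};

static void on_fence_signalled(struct fake_fence *f)
{
	printf("input fence signalled, atom can be queued\n");
}

static void atom_wait_on_fence(struct atom *a)
{
	if (!fake_fence_add_callback(a->fence_in, on_fence_signalled))
		a->fence_cb_added = true;
	else
		printf("fence already signalled, queue atom immediately\n");
}

static void atom_teardown(struct atom *a)
{
	/* Only undo the registration if it was actually made. */
	if (a->fence_cb_added)
		fake_fence_remove_callback(a->fence_in);
	a->fence_cb_added = false;
}

int main(void)
{
	struct fake_fence fence = { .signalled = false };
	struct atom a = { .fence_in = &fence };

	atom_wait_on_fence(&a);
	atom_teardown(&a);
	return 0;
}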
* */ -void kbase_js_sched(struct kbase_device *kbdev, int js_mask); +void kbase_js_sched(struct kbase_device *kbdev, unsigned int js_mask); /** - * kbase_jd_zap_context - Attempt to deschedule a context that is being + * kbase_js_zap_context - Attempt to deschedule a context that is being * destroyed * @kctx: Context pointer * @@ -705,8 +694,10 @@ static inline bool kbasep_js_is_submit_allowed( bool is_allowed; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return false; test_bit = (u16) (1u << kctx->as_nr); @@ -733,8 +724,10 @@ static inline void kbasep_js_set_submit_allowed( u16 set_bit; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return; set_bit = (u16) (1u << kctx->as_nr); @@ -763,8 +756,10 @@ static inline void kbasep_js_clear_submit_allowed( u16 clear_mask; /* Ensure context really is scheduled in */ - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - KBASE_DEBUG_ASSERT(kbase_ctx_flag(kctx, KCTX_SCHEDULED)); + if (WARN((kctx->as_nr == KBASEP_AS_NR_INVALID) || !kbase_ctx_flag(kctx, KCTX_SCHEDULED), + "%s: kctx %pK has assigned AS %d and context flag %d\n", __func__, (void *)kctx, + kctx->as_nr, atomic_read(&kctx->flags))) + return; clear_bit = (u16) (1u << kctx->as_nr); clear_mask = ~clear_bit; @@ -798,7 +793,7 @@ static inline void kbasep_js_atom_retained_state_init_invalid( * @retained_state: where to copy * @katom: where to copy from * - * Copy atom state that can be made available after jd_done_nolock() is called + * Copy atom state that can be made available after kbase_jd_done_nolock() is called * on that atom. 
*/ static inline void kbasep_js_atom_retained_state_copy( @@ -872,9 +867,6 @@ static inline void kbase_js_runpool_inc_context_count( struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; - KBASE_DEBUG_ASSERT(kbdev != NULL); - KBASE_DEBUG_ASSERT(kctx != NULL); - js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -882,13 +874,12 @@ static inline void kbase_js_runpool_inc_context_count( lockdep_assert_held(&js_devdata->runpool_mutex); /* Track total contexts */ - KBASE_DEBUG_ASSERT(js_devdata->nr_all_contexts_running < S8_MAX); + WARN_ON_ONCE(js_devdata->nr_all_contexts_running >= JS_MAX_RUNNING_JOBS); ++(js_devdata->nr_all_contexts_running); if (!kbase_ctx_flag(kctx, KCTX_SUBMIT_DISABLED)) { /* Track contexts that can submit jobs */ - KBASE_DEBUG_ASSERT(js_devdata->nr_user_contexts_running < - S8_MAX); + WARN_ON_ONCE(js_devdata->nr_user_contexts_running >= JS_MAX_RUNNING_JOBS); ++(js_devdata->nr_user_contexts_running); } } @@ -909,9 +900,6 @@ static inline void kbase_js_runpool_dec_context_count( struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; - KBASE_DEBUG_ASSERT(kbdev != NULL); - KBASE_DEBUG_ASSERT(kctx != NULL); - js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -920,12 +908,12 @@ static inline void kbase_js_runpool_dec_context_count( /* Track total contexts */ --(js_devdata->nr_all_contexts_running); - KBASE_DEBUG_ASSERT(js_devdata->nr_all_contexts_running >= 0); + WARN_ON_ONCE(js_devdata->nr_all_contexts_running < 0); if (!kbase_ctx_flag(kctx, KCTX_SUBMIT_DISABLED)) { /* Track contexts that can submit jobs */ --(js_devdata->nr_user_contexts_running); - KBASE_DEBUG_ASSERT(js_devdata->nr_user_contexts_running >= 0); + WARN_ON_ONCE(js_devdata->nr_user_contexts_running < 0); } } @@ -950,8 +938,8 @@ extern const base_jd_prio kbasep_js_relative_priority_to_atom[KBASE_JS_ATOM_SCHED_PRIO_COUNT]; /** - * kbasep_js_atom_prio_to_sched_prio(): - Convert atom priority (base_jd_prio) - * to relative ordering + * kbasep_js_atom_prio_to_sched_prio - Convert atom priority (base_jd_prio) + * to relative ordering. * @atom_prio: Priority ID to translate. * * Atom priority values for @ref base_jd_prio cannot be compared directly to @@ -980,16 +968,33 @@ static inline int kbasep_js_atom_prio_to_sched_prio(base_jd_prio atom_prio) return kbasep_js_atom_priority_to_relative[atom_prio]; } -static inline base_jd_prio kbasep_js_sched_prio_to_atom_prio(int sched_prio) +/** + * kbasep_js_sched_prio_to_atom_prio - Convert relative scheduler priority + * to atom priority (base_jd_prio). + * + * @kbdev: Device pointer + * @sched_prio: Relative scheduler priority to translate. + * + * This function will convert relative scheduler priority back into base_jd_prio + * values. It takes values which priorities are monotonically increasing + * and converts them to the corresponding base_jd_prio values. If an invalid number is + * passed in (i.e. not within the expected range) an error code is returned instead. + * + * The mapping is 1:1 and the size of the valid input range is the same as the + * size of the valid output range, i.e. + * KBASE_JS_ATOM_SCHED_PRIO_COUNT == BASE_JD_NR_PRIO_LEVELS + * + * Return: On success: a value in the inclusive range + * 0..BASE_JD_NR_PRIO_LEVELS-1. On failure: BASE_JD_PRIO_INVALID. 
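The submission-allowed helpers and runpool context counters above trade KBASE_DEBUG_ASSERT for WARN-style checks: the broken precondition is logged at runtime and the function returns a safe result instead of relying on a debug-only assertion. A standalone sketch of that check-warn-and-return pattern, with a simplified context structure and fprintf standing in for the kernel's WARN:

#include <stdbool.h>
#include <stdio.h>

#define AS_NR_INVALID (-1)

struct context {
	int as_nr;          /* assigned GPU address space, or AS_NR_INVALID */
	bool scheduled;
	unsigned short submit_allowed_bits;
};

static bool is_submit_allowed(struct context *kctx)
{
	/* Precondition: the context must really be scheduled in. Warn and
	 * report "not allowed" rather than crashing if it is not.
	 */
	if (kctx->as_nr == AS_NR_INVALID || !kctx->scheduled) {
		fprintf(stderr, "context %p not scheduled (as_nr %d)\n",
			(void *)kctx, kctx->as_nr);
		return false;
	}

	return kctx->submit_allowed_bits & (1u << kctx->as_nr);
}

int main(void)
{
	struct context bad  = { .as_nr = AS_NR_INVALID, .scheduled = false };
	struct context good = { .as_nr = 3, .scheduled = true,
				.submit_allowed_bits = 1u << 3 };

	printf("bad: %d, good: %d\n", is_submit_allowed(&bad),
	       is_submit_allowed(&good));
	return 0;
}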
+ */ +static inline base_jd_prio kbasep_js_sched_prio_to_atom_prio(struct kbase_device *kbdev, + int sched_prio) { - unsigned int prio_idx; - - KBASE_DEBUG_ASSERT(sched_prio >= 0 && - sched_prio < KBASE_JS_ATOM_SCHED_PRIO_COUNT); - - prio_idx = (unsigned int)sched_prio; - - return kbasep_js_relative_priority_to_atom[prio_idx]; + if (likely(sched_prio >= 0 && sched_prio < KBASE_JS_ATOM_SCHED_PRIO_COUNT)) + return kbasep_js_relative_priority_to_atom[sched_prio]; + /* Invalid priority value if reached here */ + dev_warn(kbdev->dev, "Unknown JS scheduling priority %d", sched_prio); + return BASE_JD_PRIO_INVALID; } /** diff --git a/mali_kbase/jm/mali_kbase_js_defs.h b/mali_kbase/jm/mali_kbase_js_defs.h index c5cb9ea..009ff02 100644 --- a/mali_kbase/jm/mali_kbase_js_defs.h +++ b/mali_kbase/jm/mali_kbase_js_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2018, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -277,6 +277,7 @@ typedef u32 kbase_atom_ordering_flag_t; * @nr_contexts_runnable:Number of contexts that can either be pulled from or * arecurrently running * @soft_job_timeout_ms:Value for JS_SOFT_JOB_TIMEOUT + * @js_free_wait_time_ms: Maximum waiting time in ms for a Job Slot to be seen free. * @queue_mutex: Queue Lock, used to access the Policy's queue of contexts * independently of the Run Pool. * Of course, you don't need the Run Pool lock to access this. @@ -329,6 +330,8 @@ struct kbasep_js_device_data { u32 nr_contexts_pullable; atomic_t nr_contexts_runnable; atomic_t soft_job_timeout_ms; + u32 js_free_wait_time_ms; + struct rt_mutex queue_mutex; /* * Run Pool mutex, for managing contexts within the runpool. @@ -339,6 +342,30 @@ struct kbasep_js_device_data { * * the kbasep_js_kctx_info::runpool substructure */ struct mutex runpool_mutex; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_timer: High-resolution timer used to periodically emit the GPU metrics + * tracepoints for applications that are using the GPU. The timer is + * needed for the long duration handling so that the length of work + * period is within the allowed limit. + */ + struct hrtimer gpu_metrics_timer; + + /** + * @gpu_metrics_timer_needed: Flag to indicate if the @gpu_metrics_timer is needed. + * The timer won't be started after the expiry if the flag + * isn't set. + */ + bool gpu_metrics_timer_needed; + + /** + * @gpu_metrics_timer_running: Flag to indicate if the @gpu_metrics_timer is running. + * The flag is set to false when the timer is cancelled or + * is not restarted after the expiry. + */ + bool gpu_metrics_timer_running; +#endif }; /** @@ -387,7 +414,7 @@ struct kbasep_js_kctx_info { * @sched_priority: priority * @device_nr: Core group atom was executed on * - * Subset of atom state that can be available after jd_done_nolock() is called + * Subset of atom state that can be available after kbase_jd_done_nolock() is called * on that atom. A copy must be taken via kbasep_js_atom_retained_state_copy(), * because the original atom could disappear. 
*/ diff --git a/mali_kbase/mali_base_hwconfig_features.h b/mali_kbase/mali_base_hwconfig_features.h index a713681..724145f 100644 --- a/mali_kbase/mali_base_hwconfig_features.h +++ b/mali_kbase/mali_base_hwconfig_features.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ /* AUTOMATICALLY GENERATED FILE. If you want to amend the issues/features, * please update base/tools/hwconfig_generator/hwc_{issues,features}.py - * For more information see base/tools/hwconfig_generator/README + * For more information see base/tools/docs/hwconfig_generator.md */ #ifndef _BASE_HWCONFIG_FEATURES_H_ @@ -38,6 +38,9 @@ enum base_hw_feature { BASE_HW_FEATURE_ASN_HASH, BASE_HW_FEATURE_GPU_SLEEP, BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, + BASE_HW_FEATURE_CORE_FEATURES, + BASE_HW_FEATURE_PBHA_HWU, + BASE_HW_FEATURE_LARGE_PAGE_ALLOC, BASE_HW_FEATURE_END }; @@ -87,6 +90,7 @@ __attribute__((unused)) static const enum base_hw_feature base_hw_features_tGOx[ BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_TLS_HASHING, BASE_HW_FEATURE_IDVS_GROUP_SIZE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; @@ -128,47 +132,52 @@ __attribute__((unused)) static const enum base_hw_feature base_hw_features_tBAx[ BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tDUx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tODx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, - BASE_HW_FEATURE_IDVS_GROUP_SIZE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, - BASE_HW_FEATURE_FLUSH_INV_SHADER_OTHER, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tODx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tGRx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tGRx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tVAx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tVAx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTUx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, + BASE_HW_FEATURE_ASN_HASH, + BASE_HW_FEATURE_GPU_SLEEP, + BASE_HW_FEATURE_CORE_FEATURES, BASE_HW_FEATURE_END }; -__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTUx[] = { +__attribute__((unused)) static const enum base_hw_feature base_hw_features_tTIx[] = { BASE_HW_FEATURE_FLUSH_REDUCTION, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE, BASE_HW_FEATURE_L2_CONFIG, BASE_HW_FEATURE_CLEAN_ONLY_SAFE, BASE_HW_FEATURE_ASN_HASH, BASE_HW_FEATURE_GPU_SLEEP, + BASE_HW_FEATURE_CORE_FEATURES, + BASE_HW_FEATURE_PBHA_HWU, BASE_HW_FEATURE_END }; diff --git 
a/mali_kbase/mali_base_hwconfig_issues.h b/mali_kbase/mali_base_hwconfig_issues.h index 8766a6d..003edda 100644 --- a/mali_kbase/mali_base_hwconfig_issues.h +++ b/mali_kbase/mali_base_hwconfig_issues.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ /* AUTOMATICALLY GENERATED FILE. If you want to amend the issues/features, * please update base/tools/hwconfig_generator/hwc_{issues,features}.py - * For more information see base/tools/hwconfig_generator/README + * For more information see base/tools/docs/hwconfig_generator.md */ #ifndef _BASE_HWCONFIG_ISSUES_H_ @@ -61,6 +61,13 @@ enum base_hw_issue { BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -85,6 +92,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -105,6 +115,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -125,6 +138,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tMIx_r0p1 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -140,6 +156,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tMI BASE_HW_ISSUE_TMIX_8343, BASE_HW_ISSUE_TMIX_8456, BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -153,6 +172,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p0 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -166,6 +188,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p1 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -179,6 +204,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p2 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -191,6 +219,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tHEx_r0p3 BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + 
BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -201,6 +232,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tHE BASE_HW_ISSUE_TMIX_8042, BASE_HW_ISSUE_TMIX_8133, BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -214,6 +248,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -227,6 +264,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r0p1 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -239,6 +279,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r1p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -250,6 +293,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tSIx_r1p1 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -260,6 +306,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tSI BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -271,6 +320,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tDVx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -281,6 +333,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tDV BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -293,6 +348,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNOx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -303,6 +361,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tNO BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -315,6 +376,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGOx_r0p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -327,6 +391,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGOx_r1p0 BASE_HW_ISSUE_TTRX_921, BASE_HW_ISSUE_GPU2017_1336, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -337,6 +404,9 @@ 
__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGO BASE_HW_ISSUE_TSIX_1116, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -353,6 +423,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -369,6 +442,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p1 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -384,6 +460,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTRx_r0p2 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -396,6 +475,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTR BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -412,6 +494,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNAx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -427,6 +512,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tNAx_r0p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -439,6 +527,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tNA BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -453,6 +544,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r0p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -466,6 +560,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r0p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -479,6 +576,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r1p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -492,6 +592,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBEx_r1p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -504,6 +607,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tBE 
BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -518,6 +624,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_lBEx_r1p0 BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, BASE_HW_ISSUE_TTRX_3485, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -531,6 +640,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_lBEx_r1p1 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -544,6 +656,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBAx_r0p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -557,6 +672,9 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_tBAx_r1p0 BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; @@ -569,105 +687,201 @@ __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tBA BASE_HW_ISSUE_TTRX_3083, BASE_HW_ISSUE_TTRX_3470, BASE_HW_ISSUE_TTRX_3464, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tDUx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tODx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TTRX_921, - BASE_HW_ISSUE_TTRX_3414, - BASE_HW_ISSUE_TTRX_3083, + BASE_HW_ISSUE_GPU2019_3212, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tDUx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tODx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TTRX_3414, - BASE_HW_ISSUE_TTRX_3083, + BASE_HW_ISSUE_GPU2019_3212, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tODx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGRx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tODx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGRx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_GPU2019_3212, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + 
BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tGRx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tVAx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tGRx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tVAx[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tVAx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tVAx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p1[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; __attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTUx[] = { - BASE_HW_ISSUE_5736, - BASE_HW_ISSUE_9435, BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r0p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p0[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, - BASE_HW_ISSUE_TURSEHW_1997, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; -__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p0[] = { - BASE_HW_ISSUE_9435, +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p1[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; 
+ +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p2[] = { BASE_HW_ISSUE_TSIX_2033, BASE_HW_ISSUE_TTRX_1337, BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTUx_r1p3[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_GPU2019_3878, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2019_3901, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_model_tTIx[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, + BASE_HW_ISSUE_END +}; + +__attribute__((unused)) static const enum base_hw_issue base_hw_issues_tTIx_r0p0[] = { + BASE_HW_ISSUE_TSIX_2033, + BASE_HW_ISSUE_TTRX_1337, + BASE_HW_ISSUE_TURSEHW_2716, + BASE_HW_ISSUE_GPU2021PRO_290, + BASE_HW_ISSUE_TITANHW_2710, + BASE_HW_ISSUE_TITANHW_2679, + BASE_HW_ISSUE_GPU2022PRO_148, + BASE_HW_ISSUE_TITANHW_2938, BASE_HW_ISSUE_END }; diff --git a/mali_kbase/mali_kbase.h b/mali_kbase/mali_kbase.h index 9f2d209..d9e632f 100644 --- a/mali_kbase/mali_kbase.h +++ b/mali_kbase/mali_kbase.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,7 @@ #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #include <mali_kbase_linux.h> +#include <linux/version_compat_defs.h> /* * Include mali_kbase_defs.h first as this provides types needed by other local @@ -61,9 +62,7 @@ #include "debug/mali_kbase_debug_ktrace.h" #include "context/mali_kbase_context.h" -#include "mali_kbase_strings.h" #include "mali_kbase_mem_lowlevel.h" -#include "mali_kbase_utility.h" #include "mali_kbase_mem.h" #include "mmu/mali_kbase_mmu.h" #include "mali_kbase_gpu_memory_debugfs.h" @@ -75,7 +74,9 @@ #include "mali_kbase_jd_debugfs.h" #include "mali_kbase_jm.h" #include "mali_kbase_js.h" -#endif /* !MALI_USE_CSF */ +#else /* !MALI_USE_CSF */ +#include "csf/mali_kbase_debug_csf_fault.h" +#endif /* MALI_USE_CSF */ #include "ipa/mali_kbase_ipa.h" @@ -85,16 +86,12 @@ #include "mali_linux_trace.h" +#define KBASE_DRV_NAME "mali" +#define KBASE_TIMELINE_NAME KBASE_DRV_NAME ".timeline" + #if MALI_USE_CSF #include "csf/mali_kbase_csf.h" -#endif -#ifndef u64_to_user_ptr -/* Introduced in Linux v4.6 */ -#define u64_to_user_ptr(x) ((void __user *)(uintptr_t)x) -#endif - -#if MALI_USE_CSF /* Physical memory group ID for CSF user I/O. 
*/ #define KBASE_MEM_GROUP_CSF_IO BASE_MEM_GROUP_DEFAULT @@ -266,7 +263,7 @@ void kbase_jd_cancel(struct kbase_device *kbdev, struct kbase_jd_atom *katom); void kbase_jd_zap_context(struct kbase_context *kctx); /* - * jd_done_nolock - Perform the necessary handling of an atom that has completed + * kbase_jd_done_nolock - Perform the necessary handling of an atom that has completed * the execution. * * @katom: Pointer to the atom that completed the execution @@ -282,7 +279,7 @@ void kbase_jd_zap_context(struct kbase_context *kctx); * * The caller must hold the kbase_jd_context.lock. */ -bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately); +bool kbase_jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately); void kbase_jd_free_external_resources(struct kbase_jd_atom *katom); void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom); @@ -345,21 +342,8 @@ int kbase_job_slot_softstop_start_rp(struct kbase_context *kctx, void kbase_job_slot_softstop(struct kbase_device *kbdev, int js, struct kbase_jd_atom *target_katom); -void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, int js, - struct kbase_jd_atom *target_katom, u32 sw_flags); - -/** - * kbase_job_slot_hardstop - Hard-stop the specified job slot - * @kctx: The kbase context that contains the job(s) that should - * be hard-stopped - * @js: The job slot to hard-stop - * @target_katom: The job that should be hard-stopped (or NULL for all - * jobs from the context) - * Context: - * The job slot lock must be held when calling this function. - */ -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom); +void kbase_job_slot_softstop_swflags(struct kbase_device *kbdev, unsigned int js, + struct kbase_jd_atom *target_katom, u32 sw_flags); /** * kbase_job_check_enter_disjoint - potentiall enter disjoint mode @@ -454,19 +438,6 @@ static inline void kbase_free_user_buffer( } } -/** - * kbase_mem_copy_from_extres() - Copy from external resources. - * - * @kctx: kbase context within which the copying is to take place. - * @buf_data: Pointer to the information about external resources: - * pages pertaining to the external resource, number of - * pages to copy. - * - * Return: 0 on success, error code otherwise. - */ -int kbase_mem_copy_from_extres(struct kbase_context *kctx, - struct kbase_debug_copy_buffer *buf_data); - #if !MALI_USE_CSF int kbase_process_soft_job(struct kbase_jd_atom *katom); int kbase_prepare_soft_job(struct kbase_jd_atom *katom); @@ -474,7 +445,7 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom); void kbase_cancel_soft_job(struct kbase_jd_atom *katom); void kbase_resume_suspended_soft_jobs(struct kbase_device *kbdev); void kbasep_remove_waiting_soft_job(struct kbase_jd_atom *katom); -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom); #endif int kbase_soft_event_update(struct kbase_context *kctx, @@ -493,9 +464,9 @@ void kbasep_as_do_poke(struct work_struct *work); * * @kbdev: The kbase device structure for the device * - * The caller should ensure that either kbdev->pm.active_count_lock is held, or - * a dmb was executed recently (to ensure the value is most - * up-to-date). However, without a lock the value could change afterwards. + * The caller should ensure that either kbase_device::kbase_pm_device_data::lock is held, + * or a dmb was executed recently (to ensure the value is most up-to-date). 
+ * However, without a lock the value could change afterwards. * * Return: * * false if a suspend is not in progress @@ -506,6 +477,22 @@ static inline bool kbase_pm_is_suspending(struct kbase_device *kbdev) return kbdev->pm.suspending; } +/** + * kbase_pm_is_resuming - Check whether System resume of GPU device is in progress. + * + * @kbdev: The kbase device structure for the device + * + * The caller should ensure that either kbase_device::kbase_pm_device_data::lock is held, + * or a dmb was executed recently (to ensure the value is most up-to-date). + * However, without a lock the value could change afterwards. + * + * Return: true if System resume is in progress, otherwise false. + */ +static inline bool kbase_pm_is_resuming(struct kbase_device *kbdev) +{ + return kbdev->pm.resuming; +} + #ifdef CONFIG_MALI_ARBITER_SUPPORT /* * Check whether a gpu lost is in progress @@ -559,6 +546,23 @@ static inline bool kbase_pm_is_active(struct kbase_device *kbdev) } /** + * kbase_pm_gpu_freq_init() - Find the lowest frequency that the GPU can + * run as using the device tree, then query the + * GPU properties to find out the highest GPU + * frequency and store both of them within the + * @kbase_device. + * @kbdev: Pointer to kbase device. + * + * This function could be called from kbase_clk_rate_trace_manager_init, + * but is left separate as it can be called as soon as + * dev_pm_opp_of_add_table() has been called to initialize the OPP table, + * which occurs in power_control_init(). + * + * Return: 0 on success, negative error code on failure. + */ +int kbase_pm_gpu_freq_init(struct kbase_device *kbdev); + +/** * kbase_pm_metrics_start - Start the utilization metrics timer * @kbdev: Pointer to the kbase device for which to start the utilization * metrics calculation thread. @@ -576,6 +580,40 @@ void kbase_pm_metrics_start(struct kbase_device *kbdev); */ void kbase_pm_metrics_stop(struct kbase_device *kbdev); +/** + * kbase_pm_init_event_log - Initialize the event log and make it discoverable + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + */ +void kbase_pm_init_event_log(struct kbase_device *kbdev); + +/** + * kbase_pm_max_event_log_size - Get the largest size of the power management event log + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * + * Return: The size of a buffer large enough to contain the log at any time. + */ +u64 kbase_pm_max_event_log_size(struct kbase_device *kbdev); + +/** + * kbase_pm_copy_event_log - Retrieve a copy of the power management event log + * + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * @buffer: If non-NULL, a buffer of @size bytes to copy the data into + * @size: The size of buffer (should be at least as large as returned by + * kbase_pm_event_max_log_size()) + * + * This function is called when dumping a debug log of all recent events in the + * power management backend. + * + * Return: 0 if the log could be copied successfully, otherwise an error code. + * + * Requires kbdev->pmaccess_lock to be held. + */ +int kbase_pm_copy_event_log(struct kbase_device *kbdev, + void *buffer, u64 size); + #if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) /** * kbase_pm_handle_runtime_suspend - Handle the runtime suspend of GPU @@ -614,6 +652,7 @@ int kbase_pm_handle_runtime_suspend(struct kbase_device *kbdev); * Return: 0 if the wake up was successful. 
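/*
 * Illustrative caller, not part of the patch, for the new PM event log API
 * declared above: size a buffer with kbase_pm_max_event_log_size() and then
 * snapshot the log with kbase_pm_copy_event_log(). The allocation strategy
 * and the example_* name are assumptions made for this sketch; per the
 * kerneldoc, kbdev->pmaccess_lock must be held around the copy.
 */
static int example_dump_pm_event_log(struct kbase_device *kbdev)
{
	u64 size = kbase_pm_max_event_log_size(kbdev);
	void *buf = kvmalloc((size_t)size, GFP_KERNEL);
	int err;

	if (!buf)
		return -ENOMEM;

	/* Caller is responsible for taking kbdev->pmaccess_lock here... */
	err = kbase_pm_copy_event_log(kbdev, buf, size);
	/* ...and for dropping it before the buffer is consumed or freed. */

	if (!err) {
		/* e.g. hand 'buf' to a debugfs blob or a crash-dump writer */
	}

	kvfree(buf);
	return err;
}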
*/ int kbase_pm_force_mcu_wakeup_after_sleep(struct kbase_device *kbdev); + #endif #if !MALI_USE_CSF @@ -763,23 +802,153 @@ void kbase_device_pcm_dev_term(struct kbase_device *const kbdev); #define KBASE_DISJOINT_STATE_INTERLEAVED_CONTEXT_COUNT_THRESHOLD 2 /** - * kbase_create_realtime_thread - Create a realtime thread with an appropriate coremask + * kbase_kthread_run_rt - Create a realtime thread with an appropriate coremask * - * @kbdev: the kbase device - * @threadfn: the function the realtime thread will execute - * @data: pointer to the thread's data - * @namefmt: a name for the thread. + * @kbdev: the kbase device + * @threadfn: the function the realtime thread will execute + * @thread_param: data pointer to @threadfn + * @namefmt: a name for the thread. * * Creates a realtime kthread with priority &KBASE_RT_THREAD_PRIO and restricted * to cores defined by &KBASE_RT_THREAD_CPUMASK_MIN and &KBASE_RT_THREAD_CPUMASK_MAX. * - * Return: A valid &struct task_struct pointer on success, or an ERR_PTR on failure. + * Wakes up the task. + * + * Return: IS_ERR() on failure, or a valid task pointer. */ -struct task_struct * kbase_create_realtime_thread(struct kbase_device *kbdev, - int (*threadfn)(void *data), void *data, const char namefmt[]); +struct task_struct *kbase_kthread_run_rt(struct kbase_device *kbdev, + int (*threadfn)(void *data), void *thread_param, const char namefmt[], ...); + +/** + * kbase_kthread_run_worker_rt - Create a realtime kthread_worker_fn with an appropriate coremask + * + * @kbdev: the kbase device + * @worker: pointer to the thread's parameters + * @namefmt: a name for the thread. + * + * Creates a realtime kthread_worker_fn thread with priority &KBASE_RT_THREAD_PRIO and restricted + * to cores defined by &KBASE_RT_THREAD_CPUMASK_MIN and &KBASE_RT_THREAD_CPUMASK_MAX. + * + * Wakes up the task. + * + * Return: Zero on success, or an PTR_ERR on failure. + */ +int kbase_kthread_run_worker_rt(struct kbase_device *kbdev, + struct kthread_worker *worker, const char namefmt[], ...); + +/** + * kbase_destroy_kworker_stack - Destroy a kthread_worker and it's thread on the stack + * + * @worker: pointer to the thread's kworker + */ +void kbase_destroy_kworker_stack(struct kthread_worker *worker); #if !defined(UINT64_MAX) #define UINT64_MAX ((uint64_t)0xFFFFFFFFFFFFFFFFULL) #endif +/** + * kbase_file_fops_count() - Get the kfile::fops_count value + * + * @kfile: Pointer to the object representing the mali device file. + * + * The value is read with kfile::lock held. + * + * Return: sampled value of kfile::fops_count. + */ +static inline u32 kbase_file_fops_count(struct kbase_file *kfile) +{ + u32 fops_count; + + spin_lock(&kfile->lock); + fops_count = kfile->fops_count; + spin_unlock(&kfile->lock); + + return fops_count; +} + +/** + * kbase_file_inc_fops_count_unless_closed() - Increment the kfile::fops_count value if the + * kfile::owner is still set. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * Return: true if the increment was done otherwise false. + */ +static inline bool kbase_file_inc_fops_count_unless_closed(struct kbase_file *kfile) +{ + bool count_incremented = false; + + spin_lock(&kfile->lock); + if (kfile->owner) { + kfile->fops_count++; + count_incremented = true; + } + spin_unlock(&kfile->lock); + + return count_incremented; +} + +/** + * kbase_file_dec_fops_count() - Decrement the kfile::fops_count value + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. 
+ * + * This function shall only be called to decrement kfile::fops_count if a successful call + * to kbase_file_inc_fops_count_unless_closed() was made previously by the current thread. + * + * The function would enqueue the kfile::destroy_kctx_work if the process that originally + * created the file instance has closed its copy and no Kbase handled file operations are + * in progress and no memory mappings are present for the file instance. + */ +static inline void kbase_file_dec_fops_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + WARN_ON_ONCE(kfile->fops_count <= 0); + kfile->fops_count--; + if (unlikely(!kfile->fops_count && !kfile->owner && !kfile->map_count)) { + queue_work(system_wq, &kfile->destroy_kctx_work); +#if IS_ENABLED(CONFIG_DEBUG_FS) + wake_up(&kfile->zero_fops_count_wait); +#endif + } + spin_unlock(&kfile->lock); +} + +/** + * kbase_file_inc_cpu_mapping_count() - Increment the kfile::map_count value. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * This function shall be called when the memory mapping on /dev/malixx device file + * instance is created. The kbase_file::setup_state shall be KBASE_FILE_COMPLETE. + */ +static inline void kbase_file_inc_cpu_mapping_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + kfile->map_count++; + spin_unlock(&kfile->lock); +} + +/** + * kbase_file_dec_cpu_mapping_count() - Decrement the kfile::map_count value + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * This function is called to decrement kfile::map_count value when the memory mapping + * on /dev/malixx device file is closed. + * The function would enqueue the kfile::destroy_kctx_work if the process that originally + * created the file instance has closed its copy and there are no mappings present and no + * Kbase handled file operations are in progress for the file instance. + */ +static inline void kbase_file_dec_cpu_mapping_count(struct kbase_file *kfile) +{ + spin_lock(&kfile->lock); + WARN_ON_ONCE(kfile->map_count <= 0); + kfile->map_count--; + if (unlikely(!kfile->map_count && !kfile->owner && !kfile->fops_count)) + queue_work(system_wq, &kfile->destroy_kctx_work); + spin_unlock(&kfile->lock); +} + #endif diff --git a/mali_kbase/mali_kbase_as_fault_debugfs.c b/mali_kbase/mali_kbase_as_fault_debugfs.c index 77f450d..ad33691 100644 --- a/mali_kbase/mali_kbase_as_fault_debugfs.c +++ b/mali_kbase/mali_kbase_as_fault_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2016-2022 ARM Limited. All rights reserved. 
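/*
 * Illustrative pattern, not part of the patch, for how the fops_count
 * helpers above are meant to bracket a file operation: take the count only
 * while the owning process still has its copy of the file open, do the
 * work, then drop the count. The real entry points later in this patch
 * (kbase_ioctl(), kbase_read(), ...) follow exactly this shape via
 * kbase_file_inc_fops_count_if_allowed(). The example_* name is an
 * assumption for the sketch.
 */
static long example_guarded_fop(struct kbase_file *kfile)
{
	long ret;

	if (!kbase_file_inc_fops_count_unless_closed(kfile))
		return -EPERM;	/* owner already closed its copy of the fd */

	ret = 0;	/* ... the actual file operation would run here ... */

	kbase_file_dec_fops_count(kfile);	/* may queue deferred kctx destruction */
	return ret;
}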
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -98,11 +98,9 @@ void kbase_as_fault_debugfs_init(struct kbase_device *kbdev) "unable to create address_spaces debugfs directory"); } else { for (i = 0; i < kbdev->nr_hw_address_spaces; i++) { - snprintf(as_name, ARRAY_SIZE(as_name), "as%u", i); - debugfs_create_file(as_name, 0444, - debugfs_directory, - (void *)(uintptr_t)i, - &as_fault_fops); + if (likely(scnprintf(as_name, ARRAY_SIZE(as_name), "as%u", i))) + debugfs_create_file(as_name, 0444, debugfs_directory, + (void *)(uintptr_t)i, &as_fault_fops); } } diff --git a/mali_kbase/mali_kbase_config.c b/mali_kbase/mali_kbase_config.c index 37dbca1..32f404b 100644 --- a/mali_kbase/mali_kbase_config.c +++ b/mali_kbase/mali_kbase_config.c @@ -63,7 +63,6 @@ void kbasep_platform_device_late_term(struct kbase_device *kbdev) platform_funcs_p->platform_late_term_func(kbdev); } -#if !MALI_USE_CSF int kbasep_platform_context_init(struct kbase_context *kctx) { struct kbase_platform_funcs_conf *platform_funcs_p; @@ -84,21 +83,41 @@ void kbasep_platform_context_term(struct kbase_context *kctx) platform_funcs_p->platform_handler_context_term_func(kctx); } -void kbasep_platform_event_atom_submit(struct kbase_jd_atom *katom) +void kbasep_platform_event_work_begin(void *param) { struct kbase_platform_funcs_conf *platform_funcs_p; - platform_funcs_p = (struct kbase_platform_funcs_conf *)PLATFORM_FUNCS; - if (platform_funcs_p && platform_funcs_p->platform_handler_atom_submit_func) - platform_funcs_p->platform_handler_atom_submit_func(katom); + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_work_begin_func) + platform_funcs_p->platform_handler_work_begin_func(param); } -void kbasep_platform_event_atom_complete(struct kbase_jd_atom *katom) +void kbasep_platform_event_work_end(void *param) { struct kbase_platform_funcs_conf *platform_funcs_p; - platform_funcs_p = (struct kbase_platform_funcs_conf *)PLATFORM_FUNCS; - if (platform_funcs_p && platform_funcs_p->platform_handler_atom_complete_func) - platform_funcs_p->platform_handler_atom_complete_func(katom); + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_work_end_func) + platform_funcs_p->platform_handler_work_end_func(param); } -#endif + +int kbasep_platform_fw_config_init(struct kbase_device *kbdev) +{ + struct kbase_platform_funcs_conf *platform_funcs_p; + + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_fw_cfg_init_func) + return platform_funcs_p->platform_fw_cfg_init_func(kbdev); + + return 0; +} + +void kbasep_platform_event_core_dump(struct kbase_device *kbdev, const char* reason) +{ + struct kbase_platform_funcs_conf *platform_funcs_p; + + platform_funcs_p = (struct kbase_platform_funcs_conf*)PLATFORM_FUNCS; + if (platform_funcs_p && platform_funcs_p->platform_handler_core_dump_func) + platform_funcs_p->platform_handler_core_dump_func(kbdev, reason); +} + diff --git a/mali_kbase/mali_kbase_config.h b/mali_kbase/mali_kbase_config.h index ecfdb28..ab65216 100644 --- a/mali_kbase/mali_kbase_config.h +++ b/mali_kbase/mali_kbase_config.h @@ -34,14 +34,9 @@ /* Forward declaration of struct kbase_device */ struct kbase_device; -#if !MALI_USE_CSF /* Forward declaration of struct kbase_context */ struct kbase_context; 
-/* Forward declaration of struct kbase_atom */ -struct kbase_jd_atom; -#endif - /** * struct kbase_platform_funcs_conf - Specifies platform integration function * pointers for DDK events such as device init and term. @@ -104,8 +99,6 @@ struct kbase_platform_funcs_conf { * can be accessed (and possibly terminated) in here. */ void (*platform_late_term_func)(struct kbase_device *kbdev); - -#if !MALI_USE_CSF /** * @platform_handler_context_init_func: platform specific handler for * when a new kbase_context is created. @@ -129,33 +122,63 @@ struct kbase_platform_funcs_conf { */ void (*platform_handler_context_term_func)(struct kbase_context *kctx); /** - * @platform_handler_atom_submit_func: platform specific handler for - * when a kbase_jd_atom is submitted. - * @katom - kbase_jd_atom pointer + * platform_handler_work_begin_func - Platform specific handler whose + * function changes depending on the + * backend used. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just started executing. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just started executing. + * + * Function pointer for platform specific handling at the point when a unit + * of work starts running on the GPU or set to NULL if not required. The + * function cannot assume that it is running in a process context. * - * Function pointer for platform specific handling at the point when an - * atom is submitted to the GPU or set to NULL if not required. The + * Context: + * - If job manager: Function must be runnable in an interrupt context. + */ + void (*platform_handler_work_begin_func)(void* param); + /** + * platform_handler_work_end_func - Platform specific handler whose function + * changes depending on the backend used. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just completed. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just completed or suspended + * execution. + * + * Function pointer for platform specific handling at the point when a unit + * of work stops running on the GPU or set to NULL if not required. The * function cannot assume that it is running in a process context. * - * Context: The caller must hold the hwaccess_lock. Function must be - * runnable in an interrupt context. + * Context: + * - If job manager: Function must be runnable in an interrupt context. */ - void (*platform_handler_atom_submit_func)(struct kbase_jd_atom *katom); + void (*platform_handler_work_end_func)(void* param); /** - * @platform_handler_atom_complete_func: platform specific handler for - * when a kbase_jd_atom completes. - * @katom - kbase_jd_atom pointer + * platform_fw_cfg_init_func - Platform specific callback for FW configuration * - * Function pointer for platform specific handling at the point when an - * atom stops running on the GPU or set to NULL if not required. The - * function cannot assume that it is running in a process context. + * @kbdev: kbase_device pointer + * + * Function pointer for platform specific FW configuration * - * Context: The caller must hold the hwaccess_lock. Function must be - * runnable in an interrupt context. 
+ * Context: Process context */ - void (*platform_handler_atom_complete_func)( - struct kbase_jd_atom *katom); -#endif + int (*platform_fw_cfg_init_func)(struct kbase_device *kbdev); + /** + * platform_handler_core_dump_func - Platform specific handler for triggering a core dump. + * + * @kbdev: kbase_device pointer + * @reason: A null terminated string containing a dump reason + * + * Function pointer for platform specific handling at the point an internal error + * has occurred, to dump debug info about the error. Or set to NULL if not required. + * + * Context: The caller must hold the hwaccess lock + */ + void (*platform_handler_core_dump_func)(struct kbase_device *kbdev, const char* reason); }; /* @@ -297,6 +320,14 @@ struct kbase_pm_callback_conf { int (*soft_reset_callback)(struct kbase_device *kbdev); /* + * Optional callback for full hardware reset of the GPU + * + * This callback will be called by the power management core to trigger + * a GPU hardware reset. + */ + void (*hardware_reset_callback)(struct kbase_device *kbdev); + + /* * Optional callback invoked after GPU becomes idle, not supported on * JM GPUs. * @@ -338,6 +369,24 @@ struct kbase_pm_callback_conf { * this feature. */ void (*power_runtime_gpu_active_callback)(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + /* + * This callback will be invoked by the Kbase when GPU becomes active + * to turn on the shader core power rails. + * This callback is invoked from process context and the power rails + * must be turned on before the completion of callback. + */ + void (*power_on_sc_rails_callback)(struct kbase_device *kbdev); + + /* + * This callback will be invoked by the Kbase when GPU becomes idle + * to turn off the shader core power rails. + * This callback is invoked from process context and the power rails + * must be turned off before the completion of callback. + */ + void (*power_off_sc_rails_callback)(struct kbase_device *kbdev); +#endif }; /* struct kbase_gpu_clk_notifier_data - Data for clock rate change notifier. @@ -511,7 +560,6 @@ int kbasep_platform_device_late_init(struct kbase_device *kbdev); */ void kbasep_platform_device_late_term(struct kbase_device *kbdev); -#if !MALI_USE_CSF /** * kbasep_platform_context_init - Platform specific callback when a kernel * context is created @@ -538,28 +586,58 @@ int kbasep_platform_context_init(struct kbase_context *kctx); void kbasep_platform_context_term(struct kbase_context *kctx); /** - * kbasep_platform_event_atom_submit - Platform specific callback when an atom - * is submitted to the GPU - * @katom: kbase_jd_atom pointer + * kbasep_platform_event_work_begin - Platform specific callback whose function + * changes depending on the backend used. + * Signals that a unit of work has started + * running on the GPU. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just started executing. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just started executing. * * Function calls a platform defined routine if specified in the configuration - * attributes. The routine should not assume that it is in a process context. + * attributes. The routine should not assume that it is in a process context. * - * Return: 0 if no errors were encountered. Negative error code otherwise. 
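/*
 * Illustrative sketch, not part of the patch, of a platform integration
 * using the reworked hooks declared in struct kbase_platform_funcs_conf
 * above: the same two work callbacks now cover both backends, receiving a
 * struct kbase_jd_atom * on Job Manager GPUs or a struct kbase_queue_group *
 * on CSF GPUs. The example_* function bodies are assumptions for the sketch;
 * unused hooks may simply be left NULL.
 */
static void example_work_begin(void *param)
{
	/* May run in interrupt context on JM GPUs: keep this non-blocking. */
	(void)param;
}

static void example_work_end(void *param)
{
	(void)param;
}

static int example_fw_cfg_init(struct kbase_device *kbdev)
{
	/* Process context: firmware configuration may sleep. */
	return 0;
}

struct kbase_platform_funcs_conf example_platform_funcs = {
	.platform_handler_work_begin_func = example_work_begin,
	.platform_handler_work_end_func = example_work_end,
	.platform_fw_cfg_init_func = example_fw_cfg_init,
};
/* PLATFORM_FUNCS in the platform config would typically refer to this struct. */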
*/ -void kbasep_platform_event_atom_submit(struct kbase_jd_atom *katom); +void kbasep_platform_event_work_begin(void *param); /** - * kbasep_platform_event_atom_complete - Platform specific callback when an atom - * has stopped running on the GPU - * @katom: kbase_jd_atom pointer + * kbasep_platform_event_work_end - Platform specific callback whose function + * changes depending on the backend used. + * Signals that a unit of work has completed. + * @param + * - If job manager GPU: Param is a pointer of type struct kbase_jd_atom*, + * to the atom that just completed. + * - If CSF GPU: Param is a pointer of type struct kbase_queue_group*, to + * the group resident in a CSG slot which just completed or suspended execution. * * Function calls a platform defined routine if specified in the configuration - * attributes. The routine should not assume that it is in a process context. + * attributes. The routine should not assume that it is in a process context. * */ -void kbasep_platform_event_atom_complete(struct kbase_jd_atom *katom); -#endif +void kbasep_platform_event_work_end(void *param); + +/** + * kbasep_platform_fw_config_init - Platform specific callback to configure FW + * + * @kbdev - kbase_device pointer + * + * Function calls a platform defined routine if specified in the configuration attributes. + * + */ +int kbasep_platform_fw_config_init(struct kbase_device *kbdev); + +/** + * kbasep_platform_event_core_dump - Platform specific callback to act on a firmware error. + * + * @kbdev - kbase_device pointer + * @reason: A null terminated string containing a dump reason + * + * Function calls a platform defined routine if specified in the configuration attributes. + * + */ +void kbasep_platform_event_core_dump(struct kbase_device *kbdev, const char* reason); #ifndef CONFIG_OF /** diff --git a/mali_kbase/mali_kbase_config_defaults.h b/mali_kbase/mali_kbase_config_defaults.h index 18e40b5..fa73612 100644 --- a/mali_kbase/mali_kbase_config_defaults.h +++ b/mali_kbase/mali_kbase_config_defaults.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -89,6 +89,18 @@ enum { KBASE_3BIT_AID_4 = 0x7 }; +#if MALI_USE_CSF +/* + * Default value for the TIMER register of the IPA Control interface, + * expressed in milliseconds. + * + * The chosen value is a trade off between two requirements: the IPA Control + * interface should sample counters with a resolution in the order of + * milliseconds, while keeping GPU overhead as limited as possible. + */ +#define IPA_CONTROL_TIMER_DEFAULT_VALUE_MS ((u32)10) /* 10 milliseconds */ +#endif /* MALI_USE_CSF */ + /* Default period for DVFS sampling (can be overridden by platform header) */ #ifndef DEFAULT_PM_DVFS_PERIOD #define DEFAULT_PM_DVFS_PERIOD 100 /* 100ms */ @@ -158,11 +170,6 @@ enum { */ #define DEFAULT_JS_RESET_TICKS_DUMPING (15020) /* 1502s */ -/* Default number of milliseconds given for other jobs on the GPU to be - * soft-stopped when the GPU needs to be reset. - */ -#define DEFAULT_RESET_TIMEOUT_MS (3000) /* 3s */ - /* Nominal reference frequency that was used to obtain all following * <...>_TIMEOUT_CYCLES macros, in kHz. 
* @@ -176,11 +183,12 @@ enum { * * This is also the default timeout to be used when an invalid timeout * selector is used to retrieve the timeout on CSF GPUs. + * This shouldn't be used as a timeout for the CSG suspend request. * * Based on 75000ms timeout at nominal 100MHz, as is required for Android - based * on scaling from a 50MHz GPU system. */ -#define CSF_FIRMWARE_TIMEOUT_CYCLES (7500000000) +#define CSF_FIRMWARE_TIMEOUT_CYCLES (7500000000ull) /* Timeout in clock cycles for GPU Power Management to reach the desired * Shader, L2 and MCU state. @@ -189,11 +197,41 @@ enum { */ #define CSF_PM_TIMEOUT_CYCLES (250000000) -/* Waiting timeout in clock cycles for GPU reset to complete. +/* Waiting timeout in clock cycles for a CSG to be suspended. + * + * Based on 30s timeout at 100MHz, scaled from 5s at 600Mhz GPU frequency. + * More cycles (1s @ 100Mhz = 100000000) are added up to ensure that + * host timeout is always bigger than FW timeout. + */ +#define CSF_CSG_SUSPEND_TIMEOUT_CYCLES (3100000000ull) + +/* Waiting timeout in clock cycles for GPU reset to complete. */ +#define CSF_GPU_RESET_TIMEOUT_CYCLES (CSF_CSG_SUSPEND_TIMEOUT_CYCLES * 2) + +/* Waiting timeout in clock cycles for GPU firmware to boot. + * + * Based on 250ms timeout at 100MHz, scaled from a 50MHz GPU system. + */ +#define CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES (25000000) + +/* Waiting timeout for a ping request to be acknowledged, in clock cycles. + * + * Based on 6000ms timeout at 100MHz, scaled from a 50MHz GPU system. + */ +#define CSF_FIRMWARE_PING_TIMEOUT_CYCLES (600000000ull) + +/* Waiting timeout for a KCPU queue's fence signal blocked to long, in clock cycles. * - * Based on 2500ms timeout at 100MHz, scaled from a 50MHz GPU system. + * Based on 10s timeout at 100MHz, scaled from a 50MHz GPU system. */ -#define CSF_GPU_RESET_TIMEOUT_CYCLES (250000000) +#define KCPU_FENCE_SIGNAL_TIMEOUT_CYCLES (1000000000ull) + +/* Waiting timeout for task execution on an endpoint. Based on the + * DEFAULT_PROGRESS_TIMEOUT. + * + * Based on 25s timeout at 100Mhz, scaled from a 500MHz GPU system. + */ +#define DEFAULT_PROGRESS_TIMEOUT_CYCLES (2500000000ull) #else /* MALI_USE_CSF */ @@ -202,7 +240,22 @@ enum { */ #define JM_DEFAULT_TIMEOUT_CYCLES (150000000) -#endif /* MALI_USE_CSF */ +/* Default number of milliseconds given for other jobs on the GPU to be + * soft-stopped when the GPU needs to be reset. + */ +#define JM_DEFAULT_RESET_TIMEOUT_MS (3000) /* 3s */ + +/* Default timeout in clock cycles to be used when checking if JS_COMMAND_NEXT + * is updated on HW side so a Job Slot is considered free. + * This timeout will only take effect on GPUs with low value for the minimum + * GPU clock frequency (<= 100MHz). + * + * Based on 1ms timeout at 100MHz. Will default to 0ms on GPUs with higher + * value for minimum GPU clock frequency. + */ +#define JM_DEFAULT_JS_FREE_TIMEOUT_CYCLES (100000) + +#endif /* !MALI_USE_CSF */ /* Default timeslice that a context is scheduled in for, in nanoseconds. * @@ -238,5 +291,18 @@ enum { */ #define DEFAULT_IR_THRESHOLD (192) -#endif /* _KBASE_CONFIG_DEFAULTS_H_ */ +/* Waiting time in clock cycles for the completion of a MMU operation. + * + * Ideally 1.6M GPU cycles required for the L2 cache (512KiB slice) flush. + * + * As a pessimistic value, 50M GPU cycles ( > 30 times bigger ) is chosen. + * It corresponds to 0.5s in GPU @ 100Mhz. 
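/*
 * The *_TIMEOUT_CYCLES values above are scaled to wall-clock time using the
 * GPU clock: with the frequency expressed in kHz (i.e. cycles per
 * millisecond), timeout_ms = cycles / freq_khz. Worked examples from the
 * comments above:
 *
 *   CSF_FIRMWARE_TIMEOUT_CYCLES      7500000000 / 100000 kHz -> 75000 ms
 *   CSF_FIRMWARE_BOOT_TIMEOUT_CYCLES   25000000 / 100000 kHz ->   250 ms
 *
 * A minimal sketch of that conversion (helper name and the fallback value
 * are assumptions for illustration):
 */
static u64 example_timeout_cycles_to_ms(u64 timeout_cycles, u32 freq_khz)
{
	if (!freq_khz)
		freq_khz = 100000;	/* nominal 100 MHz reference frequency */

	return div_u64(timeout_cycles, freq_khz);
}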
+ */ +#define MMU_AS_INACTIVE_WAIT_TIMEOUT_CYCLES ((u64)50 * 1024 * 1024) + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/* Default value of the time interval at which GPU metrics tracepoints are emitted. */ +#define DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS (500000000u) /* 500 ms */ +#endif +#endif /* _KBASE_CONFIG_DEFAULTS_H_ */ diff --git a/mali_kbase/mali_kbase_core_linux.c b/mali_kbase/mali_kbase_core_linux.c index e714056..d8fab9f 100644 --- a/mali_kbase/mali_kbase_core_linux.c +++ b/mali_kbase/mali_kbase_core_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,11 +31,8 @@ #include <ipa/mali_kbase_ipa_debugfs.h> #endif /* CONFIG_DEVFREQ_THERMAL */ #endif /* CONFIG_MALI_DEVFREQ */ -#if IS_ENABLED(CONFIG_MALI_NO_MALI) #include "backend/gpu/mali_kbase_model_linux.h" -#include <backend/gpu/mali_kbase_model_dummy.h> -#endif /* CONFIG_MALI_NO_MALI */ -#include "mali_kbase_mem_profile_debugfs_buf_size.h" +#include "uapi/gpu/arm/midgard/mali_kbase_mem_profile_debugfs_buf_size.h" #include "mali_kbase_mem.h" #include "mali_kbase_mem_pool_debugfs.h" #include "mali_kbase_mem_pool_group.h" @@ -54,8 +51,8 @@ #if !MALI_USE_CSF #include "mali_kbase_kinstr_jm.h" #endif -#include "mali_kbase_hwcnt_context.h" -#include "mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_context.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" #include "mali_kbase_kinstr_prfcnt.h" #include "mali_kbase_vinstr.h" #if MALI_USE_CSF @@ -80,6 +77,9 @@ #include "mali_kbase_pbha_debugfs.h" #endif +/* Pixel includes */ +#include "platform/pixel/pixel_gpu_slc.h" + #include <linux/module.h> #include <linux/init.h> #include <linux/poll.h> @@ -96,14 +96,16 @@ #include <linux/fs.h> #include <linux/uaccess.h> #include <linux/interrupt.h> +#include <linux/irq.h> #include <linux/mm.h> #include <linux/compat.h> /* is_compat_task/in_compat_syscall */ #include <linux/mman.h> #include <linux/version.h> +#include <linux/version_compat_defs.h> #include <mali_kbase_hw.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ #include <linux/clk.h> #include <linux/clk-provider.h> #include <linux/delay.h> @@ -122,11 +124,6 @@ #include <mali_kbase_caps.h> -/* GPU IRQ Tags */ -#define JOB_IRQ_TAG 0 -#define MMU_IRQ_TAG 1 -#define GPU_IRQ_TAG 2 - #define KERNEL_SIDE_DDK_VERSION_STRING "K:" MALI_RELEASE_NAME "(GPL)" /** @@ -138,9 +135,6 @@ (((minor) & 0xFFF) << 8) | \ ((0 & 0xFF) << 0)) -#define KBASE_API_MIN(api_version) ((api_version >> 8) & 0xFFF) -#define KBASE_API_MAJ(api_version) ((api_version >> 20) & 0xFFF) - /** * struct mali_kbase_capability_def - kbase capabilities table * @@ -172,6 +166,13 @@ static const struct mali_kbase_capability_def kbase_caps_table[MALI_KBASE_NUM_CA #endif }; +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) +/* Mutex to synchronize the probe of multiple kbase instances */ +static struct mutex kbase_probe_mutex; +#endif + +static void kbase_file_destroy_kctx_worker(struct work_struct *work); + /** * mali_kbase_supports_cap - Query whether a kbase capability is supported * @@ -200,35 +201,92 @@ bool mali_kbase_supports_cap(unsigned 
long api_version, enum mali_kbase_cap cap) return supported; } -struct task_struct *kbase_create_realtime_thread(struct kbase_device *kbdev, - int (*threadfn)(void *data), void *data, const char namefmt[]) +static void kbase_set_sched_rt(struct kbase_device *kbdev, struct task_struct *task, char *thread_name) { unsigned int i; - - cpumask_t mask = { CPU_BITS_NONE }; - static const struct sched_param param = { .sched_priority = KBASE_RT_THREAD_PRIO, }; - struct task_struct *ret = kthread_create(kthread_worker_fn, data, namefmt); + cpumask_t mask = { CPU_BITS_NONE }; + for (i = KBASE_RT_THREAD_CPUMASK_MIN; i <= KBASE_RT_THREAD_CPUMASK_MAX ; i++) + cpumask_set_cpu(i, &mask); + kthread_bind_mask(task, &mask); - if (!IS_ERR(ret)) { - for (i = KBASE_RT_THREAD_CPUMASK_MIN; i <= KBASE_RT_THREAD_CPUMASK_MAX ; i++) - cpumask_set_cpu(i, &mask); + wake_up_process(task); - kthread_bind_mask(ret, &mask); + if (sched_setscheduler_nocheck(task, SCHED_FIFO, ¶m)) + dev_warn(kbdev->dev, "%s not set to RT prio", thread_name); + else + dev_dbg(kbdev->dev, "%s set to RT prio: %i", + thread_name, param.sched_priority); +} - wake_up_process(ret); +struct task_struct *kbase_kthread_run_rt(struct kbase_device *kbdev, + int (*threadfn)(void *data), void *thread_param, const char namefmt[], ...) +{ + struct task_struct *task; + va_list args; + char name_buf[128]; + int len; - if (sched_setscheduler_nocheck(ret, SCHED_FIFO, ¶m)) - dev_warn(kbdev->dev, "%s not set to RT prio", namefmt); - else - dev_dbg(kbdev->dev, "%s set to RT prio: %i", - namefmt, param.sched_priority); + /* Construct the thread name */ + va_start(args, namefmt); + len = vsnprintf(name_buf, sizeof(name_buf), namefmt, args); + va_end(args); + if (len + 1 > sizeof(name_buf)) { + dev_warn(kbdev->dev, "RT thread name truncated to %s", name_buf); } - return ret; + task = kthread_create(threadfn, thread_param, name_buf); + + if (!IS_ERR(task)) { + kbase_set_sched_rt(kbdev, task, name_buf); + } + + return task; +} + +int kbase_kthread_run_worker_rt(struct kbase_device *kbdev, + struct kthread_worker *worker, const char namefmt[], ...) +{ + struct task_struct *task; + va_list args; + char name_buf[128]; + int len; + + /* Construct the thread name */ + va_start(args, namefmt); + len = vsnprintf(name_buf, sizeof(name_buf), namefmt, args); + va_end(args); + if (len + 1 > sizeof(name_buf)) { + dev_warn(kbdev->dev, "RT thread name truncated to %s", name_buf); + } + + kthread_init_worker(worker); + + task = kthread_create(kthread_worker_fn, worker, name_buf); + + if (!IS_ERR(task)) { + worker->task = task; + kbase_set_sched_rt(kbdev, task, name_buf); + return 0; + } + + return PTR_ERR(task); +} + +void kbase_destroy_kworker_stack(struct kthread_worker *worker) +{ + struct task_struct *task; + + task = worker->task; + if (WARN_ON(!task)) + return; + + kthread_flush_worker(worker); + kthread_stop(task); + WARN_ON(!list_empty(&worker->work_list)); } /** @@ -245,6 +303,8 @@ struct task_struct *kbase_create_realtime_thread(struct kbase_device *kbdev, * * Return: Address of an object representing a simulated device file, or NULL * on failure. + * + * Note: This function always gets called in Userspace context. 
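/*
 * Illustrative usage, not part of the patch, of the refactored RT-thread
 * helpers defined above: a subsystem embeds a kthread_worker, brings it up
 * with kbase_kthread_run_worker_rt(), queues kthread_work items on it, and
 * tears it down with kbase_destroy_kworker_stack(). The example_* names and
 * the "mali_example_rt" thread name are assumptions for the sketch.
 */
static void example_work_fn(struct kthread_work *work)
{
	/* Runs on the SCHED_FIFO worker bound to the KBASE_RT_THREAD cpumask. */
}

static int example_start_rt_worker(struct kbase_device *kbdev,
				   struct kthread_worker *worker,
				   struct kthread_work *work)
{
	int err = kbase_kthread_run_worker_rt(kbdev, worker, "mali_example_rt");

	if (err)
		return err;

	kthread_init_work(work, example_work_fn);
	kthread_queue_work(worker, work);
	return 0;
}

/* ...later, on shutdown: kbase_destroy_kworker_stack(worker); */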
*/ static struct kbase_file *kbase_file_new(struct kbase_device *const kbdev, struct file *const filp) @@ -257,6 +317,17 @@ static struct kbase_file *kbase_file_new(struct kbase_device *const kbdev, kfile->kctx = NULL; kfile->api_version = 0; atomic_set(&kfile->setup_state, KBASE_FILE_NEED_VSN); + /* Store the pointer to the file table structure of current process. */ + kfile->owner = current->files; + INIT_WORK(&kfile->destroy_kctx_work, kbase_file_destroy_kctx_worker); + spin_lock_init(&kfile->lock); + kfile->fops_count = 0; + kfile->map_count = 0; + typecheck(typeof(kfile->map_count), typeof(current->mm->map_count)); +#if IS_ENABLED(CONFIG_DEBUG_FS) + init_waitqueue_head(&kfile->zero_fops_count_wait); +#endif + init_waitqueue_head(&kfile->event_queue); } return kfile; } @@ -337,18 +408,46 @@ static int kbase_file_create_kctx(struct kbase_file *kfile, base_context_create_flags flags); /** + * kbase_file_inc_fops_count_if_allowed - Increment the kfile::fops_count value if the file + * operation is allowed for the current process. + * + * @kfile: Pointer to the object representing the /dev/malixx device file instance. + * + * The function shall be called at the beginning of certain file operation methods + * implemented for @kbase_fops, like ioctl, poll, read and mmap. + * + * kbase_file_dec_fops_count() shall be called if the increment was done. + * + * Return: true if the increment was done otherwise false. + * + * Note: This function shall always be called in Userspace context. + */ +static bool kbase_file_inc_fops_count_if_allowed(struct kbase_file *const kfile) +{ + /* Disallow file operations from the other process that shares the instance + * of /dev/malixx file i.e. 'kfile' or disallow file operations if parent + * process has closed the file instance. + */ + if (unlikely(kfile->owner != current->files)) + return false; + + return kbase_file_inc_fops_count_unless_closed(kfile); +} + +/** * kbase_file_get_kctx_if_setup_complete - Get a kernel base context * pointer from a device file * * @kfile: A device file created by kbase_file_new() * - * This function returns an error code (encoded with ERR_PTR) if no context - * has been created for the given @kfile. This makes it safe to use in - * circumstances where the order of initialization cannot be enforced, but - * only if the caller checks the return value. + * This function returns NULL if no context has been created for the given @kfile. + * This makes it safe to use in circumstances where the order of initialization + * cannot be enforced, but only if the caller checks the return value. * * Return: Address of the kernel base context associated with the @kfile, or * NULL if no context exists. + * + * Note: This function shall always be called in Userspace context. */ static struct kbase_context *kbase_file_get_kctx_if_setup_complete( struct kbase_file *const kfile) @@ -362,37 +461,103 @@ static struct kbase_context *kbase_file_get_kctx_if_setup_complete( } /** - * kbase_file_delete - Destroy an object representing a device file + * kbase_file_destroy_kctx - Destroy the Kbase context created for @kfile. * * @kfile: A device file created by kbase_file_new() - * - * If any context was created for the @kfile then it is destroyed. 
*/ -static void kbase_file_delete(struct kbase_file *const kfile) +static void kbase_file_destroy_kctx(struct kbase_file *const kfile) { - struct kbase_device *kbdev = NULL; - - if (WARN_ON(!kfile)) + if (atomic_cmpxchg(&kfile->setup_state, KBASE_FILE_COMPLETE, + KBASE_FILE_DESTROY_CTX) != KBASE_FILE_COMPLETE) return; - kfile->filp->private_data = NULL; - kbdev = kfile->kbdev; - - if (atomic_read(&kfile->setup_state) == KBASE_FILE_COMPLETE) { - struct kbase_context *kctx = kfile->kctx; - #if IS_ENABLED(CONFIG_DEBUG_FS) - kbasep_mem_profile_debugfs_remove(kctx); + kbasep_mem_profile_debugfs_remove(kfile->kctx); + kbase_context_debugfs_term(kfile->kctx); #endif - kbase_context_debugfs_term(kctx); - kbase_destroy_context(kctx); + kbase_destroy_context(kfile->kctx); + dev_dbg(kfile->kbdev->dev, "Deleted kbase context"); +} + +/** + * kbase_file_destroy_kctx_worker - Work item to destroy the Kbase context. + * + * @work: Pointer to the kfile::destroy_kctx_work. + * + * The work item shall only be enqueued if the context termination could not + * be done from @kbase_flush(). + */ +static void kbase_file_destroy_kctx_worker(struct work_struct *work) +{ + struct kbase_file *kfile = + container_of(work, struct kbase_file, destroy_kctx_work); + + WARN_ON_ONCE(kfile->owner); + WARN_ON_ONCE(kfile->map_count); + WARN_ON_ONCE(kfile->fops_count); + + kbase_file_destroy_kctx(kfile); +} + +/** + * kbase_file_destroy_kctx_on_flush - Try destroy the Kbase context from the flush() + * method of @kbase_fops. + * + * @kfile: A device file created by kbase_file_new() + */ +static void kbase_file_destroy_kctx_on_flush(struct kbase_file *const kfile) +{ + bool can_destroy_context = false; - dev_dbg(kbdev->dev, "deleted base context\n"); + spin_lock(&kfile->lock); + kfile->owner = NULL; + /* To destroy the context from flush() method, unlike the release() + * method, need to synchronize manually against the other threads in + * the current process that could be operating on the /dev/malixx file. + * + * Only destroy the context if all the memory mappings on the + * /dev/malixx file instance have been closed. If there are mappings + * present then the context would be destroyed later when the last + * mapping is closed. + * Also, only destroy the context if no file operations are in progress. + */ + can_destroy_context = !kfile->map_count && !kfile->fops_count; + spin_unlock(&kfile->lock); + + if (likely(can_destroy_context)) { + WARN_ON_ONCE(work_pending(&kfile->destroy_kctx_work)); + kbase_file_destroy_kctx(kfile); } +} - kbase_release_device(kbdev); +/** + * kbase_file_delete - Destroy an object representing a device file + * + * @kfile: A device file created by kbase_file_new() + * + * If any context was created for the @kfile and is still alive, then it is destroyed. + */ +static void kbase_file_delete(struct kbase_file *const kfile) +{ + if (WARN_ON(!kfile)) + return; + + /* All the CPU mappings on the device file should have been closed */ + WARN_ON_ONCE(kfile->map_count); +#if IS_ENABLED(CONFIG_DEBUG_FS) + /* There could still be file operations due to the debugfs file (mem_view) */ + wait_event(kfile->zero_fops_count_wait, !kbase_file_fops_count(kfile)); +#else + /* There shall not be any file operations in progress on the device file */ + WARN_ON_ONCE(kfile->fops_count); +#endif + kfile->filp->private_data = NULL; + cancel_work_sync(&kfile->destroy_kctx_work); + /* Destroy the context if it wasn't done earlier from the flush() method. 
*/ + kbase_file_destroy_kctx(kfile); + kbase_release_device(kfile->kbdev); kfree(kfile); } @@ -463,6 +628,7 @@ static struct kbase_device *to_kbase_device(struct device *dev) int assign_irqs(struct kbase_device *kbdev) { + static const char *const irq_names_caps[] = { "JOB", "MMU", "GPU" }; struct platform_device *pdev; int i; @@ -470,40 +636,35 @@ int assign_irqs(struct kbase_device *kbdev) return -ENODEV; pdev = to_platform_device(kbdev->dev); - /* 3 IRQ resources */ - for (i = 0; i < 3; i++) { - struct resource irq_res; + + for (i = 0; i < ARRAY_SIZE(irq_names_caps); i++) { + struct irq_data *irqdata; int irq; - int irqtag; - irq = platform_get_irq(pdev, i); + /* We recommend using Upper case for the irq names in dts, but if + * there are devices in the world using Lower case then we should + * avoid breaking support for them. So try using names in Upper case + * first then try using Lower case names. If both attempts fail then + * we assume there is no IRQ resource specified for the GPU. + */ + irq = platform_get_irq_byname(pdev, irq_names_caps[i]); if (irq < 0) { - dev_err(kbdev->dev, "No IRQ resource at index %d\n", i); - return irq; - } + static const char *const irq_names[] = { "job", "mmu", "gpu" }; -#if IS_ENABLED(CONFIG_OF) - if (irq != of_irq_to_resource(kbdev->dev->of_node, i, &irq_res)) { - dev_err(kbdev->dev, "Failed to get irq resource at index %d\n", i); + irq = platform_get_irq_byname(pdev, irq_names[i]); + } + + if (irq < 0) { + dev_err(kbdev->dev, "No IRQ resource '%s'\n", irq_names_caps[i]); return irq; } - if (!strncasecmp(irq_res.name, "JOB", 4)) { - irqtag = JOB_IRQ_TAG; - } else if (!strncasecmp(irq_res.name, "MMU", 4)) { - irqtag = MMU_IRQ_TAG; - } else if (!strncasecmp(irq_res.name, "GPU", 4)) { - irqtag = GPU_IRQ_TAG; - } else { - dev_err(&pdev->dev, "Invalid irq res name: '%s'\n", - irq_res.name); + kbdev->irqs[i].irq = (u32)irq; + irqdata = irq_get_irq_data((unsigned int)irq); + if (likely(irqdata)) + kbdev->irqs[i].flags = irqd_get_trigger_type(irqdata); + else return -EINVAL; - } -#else - irqtag = i; -#endif /* CONFIG_OF */ - kbdev->irqs[irqtag].irq = irq; - kbdev->irqs[irqtag].flags = irq_res.flags & IRQF_TRIGGER_MASK; } return 0; @@ -539,27 +700,6 @@ void kbase_release_device(struct kbase_device *kbdev) EXPORT_SYMBOL(kbase_release_device); #if IS_ENABLED(CONFIG_DEBUG_FS) -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && \ - !(KERNEL_VERSION(4, 4, 28) <= LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE) -/* - * Older versions, before v4.6, of the kernel doesn't have - * kstrtobool_from_user(), except longterm 4.4.y which had it added in 4.4.28 - */ -static int kstrtobool_from_user(const char __user *s, size_t count, bool *res) -{ - char buf[4]; - - count = min(count, sizeof(buf) - 1); - - if (copy_from_user(buf, s, count)) - return -EFAULT; - buf[count] = '\0'; - - return strtobool(buf, res); -} -#endif - static ssize_t write_ctx_infinite_cache(struct file *f, const char __user *ubuf, size_t size, loff_t *off) { struct kbase_context *kctx = f->private_data; @@ -671,13 +811,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, kbdev = kfile->kbdev; -#if (KERNEL_VERSION(4, 6, 0) <= LINUX_VERSION_CODE) kctx = kbase_create_context(kbdev, in_compat_syscall(), - flags, kfile->api_version, kfile->filp); -#else - kctx = kbase_create_context(kbdev, is_compat_task(), - flags, kfile->api_version, kfile->filp); -#endif /* (KERNEL_VERSION(4, 6, 0) <= LINUX_VERSION_CODE) */ + flags, kfile->api_version, kfile); /* if bad flags, 
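/*
 * For reference on the assign_irqs() change above: platform_get_irq_byname()
 * resolves the three GPU interrupts from the standard "interrupt-names"
 * property in the device tree, trying the recommended upper-case names first
 * and falling back to lower-case, e.g. (illustrative fragment only):
 *
 *     interrupt-names = "JOB", "MMU", "GPU";
 */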
will stay stuck in setup mode */ if (!kctx) @@ -687,7 +822,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, kbase_ctx_flag_set(kctx, KCTX_INFINITE_CACHE); #if IS_ENABLED(CONFIG_DEBUG_FS) - snprintf(kctx_name, 64, "%d_%d", kctx->tgid, kctx->id); + if (unlikely(!scnprintf(kctx_name, 64, "%d_%d", kctx->tgid, kctx->id))) + return -ENOMEM; mutex_init(&kctx->mem_profile_lock); @@ -698,16 +834,8 @@ static int kbase_file_create_kctx(struct kbase_file *const kfile, /* we don't treat this as a fail - just warn about it */ dev_warn(kbdev->dev, "couldn't create debugfs dir for kctx\n"); } else { -#if (KERNEL_VERSION(4, 7, 0) > LINUX_VERSION_CODE) - /* prevent unprivileged use of debug file system - * in old kernel version - */ - debugfs_create_file("infinite_cache", 0600, kctx->kctx_dentry, - kctx, &kbase_infinite_cache_fops); -#else debugfs_create_file("infinite_cache", 0644, kctx->kctx_dentry, kctx, &kbase_infinite_cache_fops); -#endif debugfs_create_file("force_same_va", 0600, kctx->kctx_dentry, kctx, &kbase_force_same_va_fops); @@ -734,6 +862,11 @@ static int kbase_open(struct inode *inode, struct file *filp) if (!kbdev) return -ENODEV; +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + /* Set address space operations for page migration */ + kbase_mem_migrate_set_address_space_ops(kbdev, filp); +#endif + /* Device-wide firmware load is moved here from probing to comply with * Android GKI vendor guideline. */ @@ -765,6 +898,36 @@ static int kbase_release(struct inode *inode, struct file *filp) return 0; } +/** + * kbase_flush - Function implementing the flush() method of @kbase_fops. + * + * @filp: Pointer to the /dev/malixx device file instance. + * @id: Pointer to the file table structure of current process. + * If @filp is being shared by multiple processes then @id can differ + * from kfile::owner. + * + * This function is called everytime the copy of @filp is closed. So if 3 processes + * are sharing the @filp then this function would be called 3 times and only after + * that kbase_release() would get called. + * + * Return: 0 if successful, otherwise a negative error code. + * + * Note: This function always gets called in Userspace context when the + * file is closed. + */ +static int kbase_flush(struct file *filp, fl_owner_t id) +{ + struct kbase_file *const kfile = filp->private_data; + + /* Try to destroy the context if the flush() method has been called for the + * process that created the instance of /dev/malixx file i.e. 'kfile'. 
+ */ + if (kfile->owner == id) + kbase_file_destroy_kctx_on_flush(kfile); + + return 0; +} + static int kbase_api_set_flags(struct kbase_file *kfile, struct kbase_ioctl_set_flags *flags) { @@ -818,12 +981,21 @@ static int kbase_api_set_flags(struct kbase_file *kfile, return err; } +#if !MALI_USE_CSF static int kbase_api_apc_request(struct kbase_file *kfile, struct kbase_ioctl_apc_request *apc) { kbase_pm_apc_request(kfile->kbdev, apc->dur_usec); return 0; } +#endif + +static int kbase_api_buffer_liveness_update(struct kbase_context *kctx, + struct kbase_ioctl_buffer_liveness_update *update) +{ + /* Defer handling to platform */ + return gpu_pixel_handle_buffer_liveness_update_ioctl(kctx, update); +} #if !MALI_USE_CSF static int kbase_api_job_submit(struct kbase_context *kctx, @@ -1053,9 +1225,9 @@ static int kbase_api_get_cpu_gpu_timeinfo(struct kbase_context *kctx, union kbase_ioctl_get_cpu_gpu_timeinfo *timeinfo) { u32 flags = timeinfo->in.request_flags; - struct timespec64 ts; - u64 timestamp; - u64 cycle_cnt; + struct timespec64 ts = { 0 }; + u64 timestamp = 0; + u64 cycle_cnt = 0; kbase_pm_context_active(kctx->kbdev); @@ -1084,11 +1256,7 @@ static int kbase_api_get_cpu_gpu_timeinfo(struct kbase_context *kctx, static int kbase_api_hwcnt_set(struct kbase_context *kctx, struct kbase_ioctl_hwcnt_values *values) { - gpu_model_set_dummy_prfcnt_sample( - (u32 __user *)(uintptr_t)values->data, - values->size); - - return 0; + return gpu_model_set_dummy_prfcnt_user_sample(u64_to_user_ptr(values->data), values->size); } #endif /* CONFIG_MALI_NO_MALI */ @@ -1122,52 +1290,11 @@ static int kbase_api_get_ddk_version(struct kbase_context *kctx, return len; } -/* Defaults for legacy just-in-time memory allocator initialization - * kernel calls - */ -#define DEFAULT_MAX_JIT_ALLOCATIONS 255 -#define JIT_LEGACY_TRIM_LEVEL (0) /* No trimming */ - -static int kbase_api_mem_jit_init_10_2(struct kbase_context *kctx, - struct kbase_ioctl_mem_jit_init_10_2 *jit_init) -{ - kctx->jit_version = 1; - - /* since no phys_pages parameter, use the maximum: va_pages */ - return kbase_region_tracker_init_jit(kctx, jit_init->va_pages, - DEFAULT_MAX_JIT_ALLOCATIONS, - JIT_LEGACY_TRIM_LEVEL, BASE_MEM_GROUP_DEFAULT, - jit_init->va_pages); -} - -static int kbase_api_mem_jit_init_11_5(struct kbase_context *kctx, - struct kbase_ioctl_mem_jit_init_11_5 *jit_init) -{ - int i; - - kctx->jit_version = 2; - - for (i = 0; i < sizeof(jit_init->padding); i++) { - /* Ensure all padding bytes are 0 for potential future - * extension - */ - if (jit_init->padding[i]) - return -EINVAL; - } - - /* since no phys_pages parameter, use the maximum: va_pages */ - return kbase_region_tracker_init_jit(kctx, jit_init->va_pages, - jit_init->max_allocations, jit_init->trim_level, - jit_init->group_id, jit_init->va_pages); -} - static int kbase_api_mem_jit_init(struct kbase_context *kctx, struct kbase_ioctl_mem_jit_init *jit_init) { int i; - kctx->jit_version = 3; - for (i = 0; i < sizeof(jit_init->padding); i++) { /* Ensure all padding bytes are 0 for potential future * extension @@ -1325,7 +1452,7 @@ static int kbase_api_mem_flags_change(struct kbase_context *kctx, static int kbase_api_stream_create(struct kbase_context *kctx, struct kbase_ioctl_stream_create *stream) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) int fd, ret; /* Name must be NULL-terminated and padded with NULLs, so check last @@ -1347,7 +1474,7 @@ static int kbase_api_stream_create(struct kbase_context *kctx, static int 
kbase_api_fence_validate(struct kbase_context *kctx, struct kbase_ioctl_fence_validate *validate) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) return kbase_sync_fence_validate(validate->fd); #else return -ENOENT; @@ -1361,12 +1488,18 @@ static int kbase_api_mem_profile_add(struct kbase_context *kctx, int err; if (data->len > KBASE_MEM_PROFILE_MAX_BUF_SIZE) { - dev_err(kctx->kbdev->dev, "mem_profile_add: buffer too big\n"); + dev_err(kctx->kbdev->dev, "mem_profile_add: buffer too big"); return -EINVAL; } + if (!data->len) { + dev_err(kctx->kbdev->dev, "mem_profile_add: buffer size is 0"); + /* Should return -EINVAL, but returning -ENOMEM for backwards compat */ + return -ENOMEM; + } + buf = kmalloc(data->len, GFP_KERNEL); - if (ZERO_OR_NULL_PTR(buf)) + if (!buf) return -ENOMEM; err = copy_from_user(buf, u64_to_user_ptr(data->buffer), @@ -1406,7 +1539,7 @@ static int kbase_api_sticky_resource_map(struct kbase_context *kctx, if (ret != 0) return -EFAULT; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); for (i = 0; i < map->count; i++) { if (!kbase_sticky_resource_acquire(kctx, gpu_addr[i])) { @@ -1423,7 +1556,7 @@ static int kbase_api_sticky_resource_map(struct kbase_context *kctx, } } - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return ret; } @@ -1444,7 +1577,7 @@ static int kbase_api_sticky_resource_unmap(struct kbase_context *kctx, if (ret != 0) return -EFAULT; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); for (i = 0; i < unmap->count; i++) { if (!kbase_sticky_resource_release_force(kctx, NULL, gpu_addr[i])) { @@ -1453,7 +1586,7 @@ static int kbase_api_sticky_resource_unmap(struct kbase_context *kctx, } } - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return ret; } @@ -1518,6 +1651,7 @@ static int kbasep_cs_queue_group_create_1_6( struct kbase_context *kctx, union kbase_ioctl_cs_queue_group_create_1_6 *create) { + int ret, i; union kbase_ioctl_cs_queue_group_create new_create = { .in = { .tiler_mask = create->in.tiler_mask, @@ -1531,16 +1665,61 @@ static int kbasep_cs_queue_group_create_1_6( .compute_max = create->in.compute_max, } }; - int ret = kbase_csf_queue_group_create(kctx, &new_create); + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } + + ret = kbase_csf_queue_group_create(kctx, &new_create); create->out.group_handle = new_create.out.group_handle; create->out.group_uid = new_create.out.group_uid; return ret; } + +static int kbasep_cs_queue_group_create_1_18(struct kbase_context *kctx, + union kbase_ioctl_cs_queue_group_create_1_18 *create) +{ + int ret, i; + union kbase_ioctl_cs_queue_group_create + new_create = { .in = { + .tiler_mask = create->in.tiler_mask, + .fragment_mask = create->in.fragment_mask, + .compute_mask = create->in.compute_mask, + .cs_min = create->in.cs_min, + .priority = create->in.priority, + .tiler_max = create->in.tiler_max, + .fragment_max = create->in.fragment_max, + .compute_max = create->in.compute_max, + .csi_handlers = create->in.csi_handlers, + .dvs_buf = create->in.dvs_buf, + } }; + + for (i = 0; i < ARRAY_SIZE(create->in.padding); i++) { + if (create->in.padding[i] != 0) { + dev_warn(kctx->kbdev->dev, "Invalid padding not 0 in queue group create\n"); + return -EINVAL; + } + } + + ret = kbase_csf_queue_group_create(kctx, &new_create); + + 
create->out.group_handle = new_create.out.group_handle; + create->out.group_uid = new_create.out.group_uid; + + return ret; +} + static int kbasep_cs_queue_group_create(struct kbase_context *kctx, union kbase_ioctl_cs_queue_group_create *create) { + if (create->in.reserved != 0) { + dev_warn(kctx->kbdev->dev, "Invalid reserved field not 0 in queue group create\n"); + return -EINVAL; + } return kbase_csf_queue_group_create(kctx, create); } @@ -1573,12 +1752,31 @@ static int kbasep_kcpu_queue_enqueue(struct kbase_context *kctx, static int kbasep_cs_tiler_heap_init(struct kbase_context *kctx, union kbase_ioctl_cs_tiler_heap_init *heap_init) { + if (heap_init->in.group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) + return -EINVAL; + else + kctx->jit_group_id = heap_init->in.group_id; + + return kbase_csf_tiler_heap_init(kctx, heap_init->in.chunk_size, + heap_init->in.initial_chunks, heap_init->in.max_chunks, + heap_init->in.target_in_flight, heap_init->in.buf_desc_va, + &heap_init->out.gpu_heap_va, + &heap_init->out.first_chunk_va); +} + +static int kbasep_cs_tiler_heap_init_1_13(struct kbase_context *kctx, + union kbase_ioctl_cs_tiler_heap_init_1_13 *heap_init) +{ + if (heap_init->in.group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) + return -EINVAL; + kctx->jit_group_id = heap_init->in.group_id; return kbase_csf_tiler_heap_init(kctx, heap_init->in.chunk_size, - heap_init->in.initial_chunks, heap_init->in.max_chunks, - heap_init->in.target_in_flight, - &heap_init->out.gpu_heap_va, &heap_init->out.first_chunk_va); + heap_init->in.initial_chunks, heap_init->in.max_chunks, + heap_init->in.target_in_flight, 0, + &heap_init->out.gpu_heap_va, + &heap_init->out.first_chunk_va); } static int kbasep_cs_tiler_heap_term(struct kbase_context *kctx, @@ -1660,6 +1858,30 @@ static int kbasep_ioctl_cs_cpu_queue_dump(struct kbase_context *kctx, cpu_queue_info->size); } +static int kbase_ioctl_read_user_page(struct kbase_context *kctx, + union kbase_ioctl_read_user_page *user_page) +{ + struct kbase_device *kbdev = kctx->kbdev; + unsigned long flags; + + /* As of now, only LATEST_FLUSH is supported */ + if (unlikely(user_page->in.offset != LATEST_FLUSH)) + return -EINVAL; + + /* Validating padding that must be zero */ + if (unlikely(user_page->in.padding != 0)) + return -EINVAL; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + if (!kbdev->pm.backend.gpu_powered) + user_page->out.val_lo = POWER_DOWN_LATEST_FLUSH_VALUE; + else + user_page->out.val_lo = kbase_reg_read(kbdev, USER_REG(LATEST_FLUSH)); + user_page->out.val_hi = 0; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return 0; +} #endif /* MALI_USE_CSF */ static int kbasep_ioctl_context_priority_check(struct kbase_context *kctx, @@ -1755,9 +1977,8 @@ static int kbasep_ioctl_set_limited_core_count(struct kbase_context *kctx, return 0; } -static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +static long kbase_kfile_ioctl(struct kbase_file *kfile, unsigned int cmd, unsigned long arg) { - struct kbase_file *const kfile = filp->private_data; struct kbase_context *kctx = NULL; struct kbase_device *kbdev = kfile->kbdev; void __user *uarg = (void __user *)arg; @@ -1785,12 +2006,14 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) kfile); break; +#if !MALI_USE_CSF case KBASE_IOCTL_APC_REQUEST: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_APC_REQUEST, kbase_api_apc_request, struct kbase_ioctl_apc_request, kfile); break; +#endif case KBASE_IOCTL_KINSTR_PRFCNT_ENUM_INFO: KBASE_HANDLE_IOCTL_INOUT( @@ -1868,18 
+2091,6 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_get_ddk_version, kctx); break; - case KBASE_IOCTL_MEM_JIT_INIT_10_2: - KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT_10_2, - kbase_api_mem_jit_init_10_2, - struct kbase_ioctl_mem_jit_init_10_2, - kctx); - break; - case KBASE_IOCTL_MEM_JIT_INIT_11_5: - KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT_11_5, - kbase_api_mem_jit_init_11_5, - struct kbase_ioctl_mem_jit_init_11_5, - kctx); - break; case KBASE_IOCTL_MEM_JIT_INIT: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_MEM_JIT_INIT, kbase_api_mem_jit_init, @@ -2081,6 +2292,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) kbasep_cs_queue_group_create_1_6, union kbase_ioctl_cs_queue_group_create_1_6, kctx); break; + case KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_QUEUE_GROUP_CREATE_1_18, + kbasep_cs_queue_group_create_1_18, + union kbase_ioctl_cs_queue_group_create_1_18, kctx); + break; case KBASE_IOCTL_CS_QUEUE_GROUP_CREATE: KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_QUEUE_GROUP_CREATE, kbasep_cs_queue_group_create, @@ -2117,6 +2333,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) union kbase_ioctl_cs_tiler_heap_init, kctx); break; + case KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_CS_TILER_HEAP_INIT_1_13, + kbasep_cs_tiler_heap_init_1_13, + union kbase_ioctl_cs_tiler_heap_init_1_13, kctx); + break; case KBASE_IOCTL_CS_TILER_HEAP_TERM: KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_CS_TILER_HEAP_TERM, kbasep_cs_tiler_heap_term, @@ -2135,6 +2356,11 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_cs_cpu_queue_info, kctx); break; + /* This IOCTL will be kept for backward compatibility */ + case KBASE_IOCTL_READ_USER_PAGE: + KBASE_HANDLE_IOCTL_INOUT(KBASE_IOCTL_READ_USER_PAGE, kbase_ioctl_read_user_page, + union kbase_ioctl_read_user_page, kctx); + break; #endif /* MALI_USE_CSF */ #if MALI_UNIT_TEST case KBASE_IOCTL_TLSTREAM_STATS: @@ -2156,6 +2382,12 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) struct kbase_ioctl_set_limited_core_count, kctx); break; + case KBASE_IOCTL_BUFFER_LIVENESS_UPDATE: + KBASE_HANDLE_IOCTL_IN(KBASE_IOCTL_BUFFER_LIVENESS_UPDATE, + kbase_api_buffer_liveness_update, + struct kbase_ioctl_buffer_liveness_update, + kctx); + break; } dev_warn(kbdev->dev, "Unknown ioctl 0x%x nr:%d", cmd, _IOC_NR(cmd)); @@ -2163,20 +2395,45 @@ static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return -ENOIOCTLCMD; } +static long kbase_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + struct kbase_file *const kfile = filp->private_data; + long ioctl_ret; + + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) + return -EPERM; + + ioctl_ret = kbase_kfile_ioctl(kfile, cmd, arg); + kbase_file_dec_fops_count(kfile); + + return ioctl_ret; +} + #if MALI_USE_CSF static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; struct base_csf_notification event_data = { .type = BASE_CSF_NOTIFICATION_EVENT }; const size_t data_size = sizeof(event_data); bool read_event = false, read_error = false; + ssize_t err = 0; - if (unlikely(!kctx)) + if 
(unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { + err = -EPERM; + goto out; + } + + if (count < data_size) { + err = -ENOBUFS; + goto out; + } + if (atomic_read(&kctx->event_count)) read_event = true; else @@ -2199,28 +2456,39 @@ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, lof if (copy_to_user(buf, &event_data, data_size) != 0) { dev_warn(kctx->kbdev->dev, "Failed to copy data\n"); - return -EFAULT; + err = -EFAULT; + goto out; } if (read_event) atomic_set(&kctx->event_count, 0); - return data_size; +out: + kbase_file_dec_fops_count(kfile); + return err ? err : data_size; } #else /* MALI_USE_CSF */ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; struct base_jd_event_v2 uevent; int out_count = 0; + ssize_t err = 0; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - if (count < sizeof(uevent)) - return -ENOBUFS; + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { + err = -EPERM; + goto out; + } + + if (count < sizeof(uevent)) { + err = -ENOBUFS; + goto out; + } memset(&uevent, 0, sizeof(uevent)); @@ -2229,46 +2497,78 @@ static ssize_t kbase_read(struct file *filp, char __user *buf, size_t count, lof if (out_count > 0) goto out; - if (filp->f_flags & O_NONBLOCK) - return -EAGAIN; + if (filp->f_flags & O_NONBLOCK) { + err = -EAGAIN; + goto out; + } - if (wait_event_interruptible(kctx->event_queue, - kbase_event_pending(kctx)) != 0) - return -ERESTARTSYS; + if (wait_event_interruptible(kctx->kfile->event_queue, + kbase_event_pending(kctx)) != 0) { + err = -ERESTARTSYS; + goto out; + } } if (uevent.event_code == BASE_JD_EVENT_DRV_TERMINATED) { - if (out_count == 0) - return -EPIPE; + if (out_count == 0) { + err = -EPIPE; + goto out; + } goto out; } - if (copy_to_user(buf, &uevent, sizeof(uevent)) != 0) - return -EFAULT; + if (copy_to_user(buf, &uevent, sizeof(uevent)) != 0) { + err = -EFAULT; + goto out; + } buf += sizeof(uevent); out_count++; count -= sizeof(uevent); } while (count >= sizeof(uevent)); - out: - return out_count * sizeof(uevent); +out: + kbase_file_dec_fops_count(kfile); + return err ? 
err : (out_count * sizeof(uevent)); } #endif /* MALI_USE_CSF */ -static unsigned int kbase_poll(struct file *filp, poll_table *wait) +static __poll_t kbase_poll(struct file *filp, poll_table *wait) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + __poll_t ret = 0; - if (unlikely(!kctx)) - return POLLERR; + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLNVAL; +#else + ret = EPOLLNVAL; +#endif + return ret; + } + + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (unlikely(!kctx)) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLERR; +#else + ret = EPOLLERR; +#endif + goto out; + } - poll_wait(filp, &kctx->event_queue, wait); - if (kbase_event_pending(kctx)) - return POLLIN | POLLRDNORM; + poll_wait(filp, &kfile->event_queue, wait); + if (kbase_event_pending(kctx)) { +#if (KERNEL_VERSION(4, 19, 0) > LINUX_VERSION_CODE) + ret = POLLIN | POLLRDNORM; +#else + ret = EPOLLIN | EPOLLRDNORM; +#endif + } - return 0; +out: + kbase_file_dec_fops_count(kfile); + return ret; } void _kbase_event_wakeup(struct kbase_context *kctx, bool sync) @@ -2277,12 +2577,12 @@ void _kbase_event_wakeup(struct kbase_context *kctx, bool sync) if(sync) { dev_dbg(kctx->kbdev->dev, "Waking event queue for context %pK (sync)\n", (void *)kctx); - wake_up_interruptible_sync(&kctx->event_queue); + wake_up_interruptible_sync(&kctx->kfile->event_queue); } else { dev_dbg(kctx->kbdev->dev, "Waking event queue for context %pK (nosync)\n",(void *)kctx); - wake_up_interruptible(&kctx->event_queue); + wake_up_interruptible(&kctx->kfile->event_queue); } } @@ -2291,7 +2591,10 @@ KBASE_EXPORT_TEST_API(_kbase_event_wakeup); #if MALI_USE_CSF int kbase_event_pending(struct kbase_context *ctx) { - WARN_ON_ONCE(!ctx); + KBASE_DEBUG_ASSERT(ctx); + + if (unlikely(!ctx)) + return -EPERM; return (atomic_read(&ctx->event_count) != 0) || kbase_csf_event_error_pending(ctx) || @@ -2302,6 +2605,9 @@ int kbase_event_pending(struct kbase_context *ctx) { KBASE_DEBUG_ASSERT(ctx); + if (unlikely(!ctx)) + return -EPERM; + return (atomic_read(&ctx->event_count) != 0) || (atomic_read(&ctx->event_closed) != 0); } @@ -2312,13 +2618,20 @@ KBASE_EXPORT_TEST_API(kbase_event_pending); static int kbase_mmap(struct file *const filp, struct vm_area_struct *const vma) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + int ret; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - return kbase_context_mmap(kctx, vma); + kctx = kbase_file_get_kctx_if_setup_complete(kfile); + if (likely(kctx)) + ret = kbase_context_mmap(kctx, vma); + else + ret = -EPERM; + + kbase_file_dec_fops_count(kfile); + return ret; } static int kbase_check_flags(int flags) @@ -2337,18 +2650,26 @@ static unsigned long kbase_get_unmapped_area(struct file *const filp, const unsigned long pgoff, const unsigned long flags) { struct kbase_file *const kfile = filp->private_data; - struct kbase_context *const kctx = - kbase_file_get_kctx_if_setup_complete(kfile); + struct kbase_context *kctx; + unsigned long address; - if (unlikely(!kctx)) + if (unlikely(!kbase_file_inc_fops_count_if_allowed(kfile))) return -EPERM; - return kbase_context_get_unmapped_area(kctx, addr, len, pgoff, flags); + kctx = 
kbase_file_get_kctx_if_setup_complete(kfile); + if (likely(kctx)) + address = kbase_context_get_unmapped_area(kctx, addr, len, pgoff, flags); + else + address = -EPERM; + + kbase_file_dec_fops_count(kfile); + return address; } static const struct file_operations kbase_fops = { .owner = THIS_MODULE, .open = kbase_open, + .flush = kbase_flush, .release = kbase_release, .read = kbase_read, .poll = kbase_poll, @@ -2579,7 +2900,7 @@ static ssize_t core_mask_store(struct device *dev, struct device_attribute *attr new_core_mask[1] = new_core_mask[2] = new_core_mask[0]; #endif - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); shader_present = kbdev->gpu_props.props.raw_props.shader_present; @@ -2649,7 +2970,7 @@ static ssize_t core_mask_store(struct device *dev, struct device_attribute *attr unlock: spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); end: return err; } @@ -3271,22 +3592,22 @@ static ssize_t gpuinfo_show(struct device *dev, .name = "Mali-G510" }, { .id = GPU_ID2_PRODUCT_TVAX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, .name = "Mali-G310" }, - { .id = GPU_ID2_PRODUCT_TTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, - .name = "Mali-TTUX" }, - { .id = GPU_ID2_PRODUCT_LTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, - .name = "Mali-LTUX" }, + { .id = GPU_ID2_PRODUCT_LTIX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT, + .name = "Mali-G620" }, }; const char *product_name = "(Unknown Mali GPU)"; struct kbase_device *kbdev; u32 gpu_id; unsigned int product_id, product_id_mask; unsigned int i; + struct kbase_gpu_props *gpu_props; kbdev = to_kbase_device(dev); if (!kbdev) return -ENODEV; - gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; + gpu_props = &kbdev->gpu_props; + gpu_id = gpu_props->props.raw_props.gpu_id; product_id = gpu_id >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT; product_id_mask = GPU_ID2_PRODUCT_MODEL >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT; @@ -3300,6 +3621,47 @@ static ssize_t gpuinfo_show(struct device *dev, } } +#if MALI_USE_CSF + if ((product_id & product_id_mask) == + ((GPU_ID2_PRODUCT_TTUX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT) & product_id_mask)) { + const bool rt_supported = + GPU_FEATURES_RAY_TRACING_GET(gpu_props->props.raw_props.gpu_features); + const u8 nr_cores = gpu_props->num_cores; + + /* Mali-G715-Immortalis if 10 < number of cores with ray tracing supported. + * Mali-G715 if 10 < number of cores without ray tracing supported. + * Mali-G715 if 7 <= number of cores <= 10 regardless of ray tracing. + * Mali-G615 if number of cores < 7. + */ + if ((nr_cores > 10) && rt_supported) + product_name = "Mali-G715-Immortalis"; + else if (nr_cores >= 7) + product_name = "Mali-G715"; + + if (nr_cores < 7) { + dev_warn(kbdev->dev, "nr_cores(%u) GPU ID must be G615", nr_cores); + product_name = "Mali-G615"; + } else + dev_dbg(kbdev->dev, "GPU ID_Name: %s, nr_cores(%u)\n", product_name, + nr_cores); + } + + if ((product_id & product_id_mask) == + ((GPU_ID2_PRODUCT_TTIX >> KBASE_GPU_ID_VERSION_PRODUCT_ID_SHIFT) & product_id_mask)) { + const bool rt_supported = + GPU_FEATURES_RAY_TRACING_GET(gpu_props->props.raw_props.gpu_features); + const u8 nr_cores = gpu_props->num_cores; + + if ((nr_cores >= 10) && rt_supported) + product_name = "Mali-G720-Immortalis"; + else + product_name = (nr_cores >= 6) ? 
"Mali-G720" : "Mali-G620"; + + dev_dbg(kbdev->dev, "GPU ID_Name: %s (ID: 0x%x), nr_cores(%u)\n", product_name, + nr_cores, product_id & product_id_mask); + } +#endif /* MALI_USE_CSF */ + return scnprintf(buf, PAGE_SIZE, "%s %d cores r%dp%d 0x%04X\n", product_name, kbdev->gpu_props.num_cores, (gpu_id & GPU_ID_VERSION_MAJOR) >> KBASE_GPU_ID_VERSION_MAJOR_SHIFT, @@ -3372,6 +3734,56 @@ static ssize_t dvfs_period_show(struct device *dev, static DEVICE_ATTR_RW(dvfs_period); +int kbase_pm_gpu_freq_init(struct kbase_device *kbdev) +{ + int err; + /* Uses default reference frequency defined in below macro */ + u64 lowest_freq_khz = DEFAULT_REF_TIMEOUT_FREQ_KHZ; + + /* Only check lowest frequency in cases when OPPs are used and + * present in the device tree. + */ +#ifdef CONFIG_PM_OPP + struct dev_pm_opp *opp_ptr; + unsigned long found_freq = 0; + + /* find lowest frequency OPP */ + opp_ptr = dev_pm_opp_find_freq_ceil(kbdev->dev, &found_freq); + if (IS_ERR(opp_ptr)) { + dev_err(kbdev->dev, "No OPPs found in device tree! Scaling timeouts using %llu kHz", + (unsigned long long)lowest_freq_khz); + } else { +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE + dev_pm_opp_put(opp_ptr); /* decrease OPP refcount */ +#endif + /* convert found frequency to KHz */ + found_freq /= 1000; + + /* If lowest frequency in OPP table is still higher + * than the reference, then keep the reference frequency + * as the one to use for scaling . + */ + if (found_freq < lowest_freq_khz) + lowest_freq_khz = found_freq; + } +#else + dev_err(kbdev->dev, "No operating-points-v2 node or operating-points property in DT"); +#endif + + kbdev->lowest_gpu_freq_khz = lowest_freq_khz; + + err = kbase_device_populate_max_freq(kbdev); + if (unlikely(err < 0)) + return -1; + + dev_dbg(kbdev->dev, "Lowest frequency identified is %llu kHz", kbdev->lowest_gpu_freq_khz); + dev_dbg(kbdev->dev, + "Setting default highest frequency to %u kHz (pending devfreq initialization", + kbdev->gpu_props.props.core_props.gpu_freq_khz_max); + + return 0; +} + /** * pm_poweroff_store - Store callback for the pm_poweroff sysfs file. 
* @dev: The device with sysfs file is for @@ -3481,21 +3893,32 @@ static ssize_t reset_timeout_store(struct device *dev, { struct kbase_device *kbdev; int ret; - int reset_timeout; + u32 reset_timeout; + u32 default_reset_timeout; kbdev = to_kbase_device(dev); if (!kbdev) return -ENODEV; - ret = kstrtoint(buf, 0, &reset_timeout); - if (ret || reset_timeout <= 0) { + ret = kstrtou32(buf, 0, &reset_timeout); + if (ret || reset_timeout == 0) { dev_err(kbdev->dev, "Couldn't process reset_timeout write operation.\n" "Use format <reset_timeout_ms>\n"); return -EINVAL; } +#if MALI_USE_CSF + default_reset_timeout = kbase_get_timeout_ms(kbdev, CSF_GPU_RESET_TIMEOUT); +#else /* MALI_USE_CSF */ + default_reset_timeout = JM_DEFAULT_RESET_TIMEOUT_MS; +#endif /* !MALI_USE_CSF */ + + if (reset_timeout < default_reset_timeout) + dev_warn(kbdev->dev, "requested reset_timeout(%u) is smaller than default(%u)", + reset_timeout, default_reset_timeout); + kbdev->reset_timeout_ms = reset_timeout; - dev_dbg(kbdev->dev, "Reset timeout: %dms\n", reset_timeout); + dev_dbg(kbdev->dev, "Reset timeout: %ums\n", reset_timeout); return count; } @@ -4290,7 +4713,7 @@ static int kbase_common_reg_map(struct kbase_device *kbdev) static void kbase_common_reg_unmap(struct kbase_device * const kbdev) { } -#else /* CONFIG_MALI_NO_MALI */ +#else /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ static int kbase_common_reg_map(struct kbase_device *kbdev) { int err = 0; @@ -4326,7 +4749,7 @@ static void kbase_common_reg_unmap(struct kbase_device * const kbdev) kbdev->reg_size = 0; } } -#endif /* CONFIG_MALI_NO_MALI */ +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ int registers_map(struct kbase_device * const kbdev) { @@ -4379,8 +4802,10 @@ static bool kbase_is_pm_enabled(const struct device_node *gpu_node) const void *operating_point_node; bool is_pm_enable = false; - power_model_node = of_get_child_by_name(gpu_node, - "power_model"); + power_model_node = of_get_child_by_name(gpu_node, "power-model"); + if (!power_model_node) + power_model_node = of_get_child_by_name(gpu_node, "power_model"); + if (power_model_node) is_pm_enable = true; @@ -4401,8 +4826,9 @@ static bool kbase_is_pv_enabled(const struct device_node *gpu_node) { const void *arbiter_if_node; - arbiter_if_node = of_get_property(gpu_node, - "arbiter_if", NULL); + arbiter_if_node = of_get_property(gpu_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(gpu_node, "arbiter_if", NULL); return arbiter_if_node ? 
true : false; } @@ -4530,14 +4956,14 @@ int power_control_init(struct kbase_device *kbdev) for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { kbdev->regulators[i] = regulator_get_optional(kbdev->dev, regulator_names[i]); - if (IS_ERR_OR_NULL(kbdev->regulators[i])) { + if (IS_ERR(kbdev->regulators[i])) { err = PTR_ERR(kbdev->regulators[i]); kbdev->regulators[i] = NULL; break; } } if (err == -EPROBE_DEFER) { - while ((i > 0) && (i < BASE_MAX_NR_CLOCKS_REGULATORS)) + while (i > 0) regulator_put(kbdev->regulators[--i]); return err; } @@ -4558,7 +4984,7 @@ int power_control_init(struct kbase_device *kbdev) */ for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { kbdev->clocks[i] = of_clk_get(kbdev->dev->of_node, i); - if (IS_ERR_OR_NULL(kbdev->clocks[i])) { + if (IS_ERR(kbdev->clocks[i])) { err = PTR_ERR(kbdev->clocks[i]); kbdev->clocks[i] = NULL; break; @@ -4574,7 +5000,7 @@ int power_control_init(struct kbase_device *kbdev) } } if (err == -EPROBE_DEFER) { - while ((i > 0) && (i < BASE_MAX_NR_CLOCKS_REGULATORS)) { + while (i > 0) { clk_disable_unprepare(kbdev->clocks[--i]); clk_put(kbdev->clocks[i]); } @@ -4591,16 +5017,47 @@ int power_control_init(struct kbase_device *kbdev) */ #if defined(CONFIG_PM_OPP) #if defined(CONFIG_REGULATOR) +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + if (kbdev->nr_regulators > 0) { + kbdev->token = dev_pm_opp_set_regulators(kbdev->dev, regulator_names); + + if (kbdev->token < 0) { + err = kbdev->token; + goto regulators_probe_defer; + } + + } +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) if (kbdev->nr_regulators > 0) { - kbdev->opp_token = dev_pm_opp_set_regulators(kbdev->dev, - regulator_names); + kbdev->opp_table = dev_pm_opp_set_regulators(kbdev->dev, + regulator_names, BASE_MAX_NR_CLOCKS_REGULATORS); + + if (IS_ERR(kbdev->opp_table)) { + err = PTR_ERR(kbdev->opp_table); + goto regulators_probe_defer; + } } +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ err = dev_pm_opp_of_add_table(kbdev->dev); CSTD_UNUSED(err); #endif /* CONFIG_PM_OPP */ return 0; +#if defined(CONFIG_PM_OPP) && \ + ((KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) && defined(CONFIG_REGULATOR)) +regulators_probe_defer: + for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) { + if (kbdev->clocks[i]) { + if (__clk_is_enabled(kbdev->clocks[i])) + clk_disable_unprepare(kbdev->clocks[i]); + clk_put(kbdev->clocks[i]); + kbdev->clocks[i] = NULL; + } else + break; + } +#endif + clocks_probe_defer: #if defined(CONFIG_REGULATOR) for (i = 0; i < BASE_MAX_NR_CLOCKS_REGULATORS; i++) @@ -4617,8 +5074,13 @@ void power_control_term(struct kbase_device *kbdev) #if defined(CONFIG_PM_OPP) dev_pm_opp_of_remove_table(kbdev->dev); #if defined(CONFIG_REGULATOR) - if (kbdev->opp_token >= 0) - dev_pm_opp_put_regulators(kbdev->opp_token); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + if (kbdev->token > -EPERM) + dev_pm_opp_put_regulators(kbdev->token); +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) + if (!IS_ERR_OR_NULL(kbdev->opp_table)) + dev_pm_opp_put_regulators(kbdev->opp_table); +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ #endif /* CONFIG_PM_OPP */ @@ -4659,18 +5121,18 @@ static int type##_quirks_set(void *data, u64 val) \ kbdev = (struct kbase_device *)data; \ kbdev->hw_quirks_##type = (u32)val; \ trigger_reset(kbdev); \ - return 0;\ + return 0; \ } \ \ static int type##_quirks_get(void *data, u64 *val) \ { \ - struct kbase_device *kbdev;\ - kbdev = (struct kbase_device *)data;\ - 
*val = kbdev->hw_quirks_##type;\ - return 0;\ + struct kbase_device *kbdev; \ + kbdev = (struct kbase_device *)data; \ + *val = kbdev->hw_quirks_##type; \ + return 0; \ } \ -DEFINE_SIMPLE_ATTRIBUTE(fops_##type##_quirks, type##_quirks_get,\ - type##_quirks_set, "%llu\n") +DEFINE_DEBUGFS_ATTRIBUTE(fops_##type##_quirks, type##_quirks_get, \ + type##_quirks_set, "%llu\n") MAKE_QUIRK_ACCESSORS(sc); MAKE_QUIRK_ACCESSORS(tiler); @@ -4700,8 +5162,46 @@ static int kbase_device_debugfs_reset_write(void *data, u64 wait_for_reset) return 0; } -DEFINE_SIMPLE_ATTRIBUTE(fops_trigger_reset, - NULL, &kbase_device_debugfs_reset_write, "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(fops_trigger_reset, NULL, &kbase_device_debugfs_reset_write, "%llu\n"); + +/** + * kbase_device_debugfs_trigger_uevent_write - send a GPU uevent + * @file: File object to write to + * @ubuf: User buffer to read data from + * @count: Length of user buffer + * @ppos: Offset within file object + * + * Return: bytes read. + */ +static ssize_t kbase_device_debugfs_trigger_uevent_write(struct file *file, + const char __user *ubuf, size_t count, loff_t *ppos) +{ + struct kbase_device *kbdev = (struct kbase_device *)file->private_data; + struct gpu_uevent evt = { 0 }; + char str[8] = { 0 }; + + if (count >= sizeof(str)) + return -EINVAL; + + if (copy_from_user(str, ubuf, count)) + return -EINVAL; + + str[count] = '\0'; + + if (sscanf(str, "%u %u", &evt.type, &evt.info) != 2) + return -EINVAL; + + pixel_gpu_uevent_send(kbdev, (const struct gpu_uevent *) &evt); + + return count; +} + +static const struct file_operations fops_trigger_uevent = { + .owner = THIS_MODULE, + .open = simple_open, + .write = kbase_device_debugfs_trigger_uevent_write, + .llseek = default_llseek, +}; /** * debugfs_protected_debug_mode_read - "protected_debug_mode" debugfs read @@ -4785,57 +5285,84 @@ static const struct file_operations .release = single_release, }; -int kbase_device_debugfs_init(struct kbase_device *kbdev) +/** + * debugfs_ctx_defaults_init - Create the default configuration of new contexts in debugfs + * @kbdev: An instance of the GPU platform device, allocated from the probe method of the driver. + * Return: A pointer to the last dentry that it tried to create, whether successful or not. + * Could be NULL or encode another error value. 
+ */ +static struct dentry *debugfs_ctx_defaults_init(struct kbase_device *const kbdev) { - struct dentry *debugfs_ctx_defaults_directory; - int err; /* prevent unprivileged use of debug file system * in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif + struct dentry *dentry = debugfs_create_dir("defaults", kbdev->debugfs_ctx_directory); + struct dentry *debugfs_ctx_defaults_directory = dentry; + + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Couldn't create mali debugfs ctx defaults directory\n"); + return dentry; + } + + debugfs_create_bool("infinite_cache", mode, + debugfs_ctx_defaults_directory, + &kbdev->infinite_cache_active_default); - kbdev->mali_debugfs_directory = debugfs_create_dir(kbdev->devname, - NULL); - if (IS_ERR_OR_NULL(kbdev->mali_debugfs_directory)) { + dentry = debugfs_create_file("mem_pool_max_size", mode, debugfs_ctx_defaults_directory, + &kbdev->mem_pool_defaults.small, + &kbase_device_debugfs_mem_pool_max_size_fops); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create mem_pool_max_size debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("lp_mem_pool_max_size", mode, debugfs_ctx_defaults_directory, + &kbdev->mem_pool_defaults.large, + &kbase_device_debugfs_mem_pool_max_size_fops); + if (IS_ERR_OR_NULL(dentry)) + dev_err(kbdev->dev, "Unable to create lp_mem_pool_max_size debugfs entry\n"); + + return dentry; +} + +/** + * init_debugfs - Create device-wide debugfs directories and files for the Mali driver + * @kbdev: An instance of the GPU platform device, allocated from the probe method of the driver. + * Return: A pointer to the last dentry that it tried to create, whether successful or not. + * Could be NULL or encode another error value. 
+ */ +static struct dentry *init_debugfs(struct kbase_device *kbdev) +{ + struct dentry *dentry = debugfs_create_dir(kbdev->devname, NULL); + + kbdev->mali_debugfs_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs directory: %s\n", kbdev->devname); - err = -ENOMEM; - goto out; + return dentry; } - kbdev->debugfs_ctx_directory = debugfs_create_dir("ctx", - kbdev->mali_debugfs_directory); - if (IS_ERR_OR_NULL(kbdev->debugfs_ctx_directory)) { + dentry = debugfs_create_dir("ctx", kbdev->mali_debugfs_directory); + kbdev->debugfs_ctx_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs ctx directory\n"); - err = -ENOMEM; - goto out; + return dentry; } - kbdev->debugfs_instr_directory = debugfs_create_dir("instrumentation", - kbdev->mali_debugfs_directory); - if (IS_ERR_OR_NULL(kbdev->debugfs_instr_directory)) { + dentry = debugfs_create_dir("instrumentation", kbdev->mali_debugfs_directory); + kbdev->debugfs_instr_directory = dentry; + if (IS_ERR_OR_NULL(dentry)) { dev_err(kbdev->dev, "Couldn't create mali debugfs instrumentation directory\n"); - err = -ENOMEM; - goto out; - } - - debugfs_ctx_defaults_directory = debugfs_create_dir("defaults", - kbdev->debugfs_ctx_directory); - if (IS_ERR_OR_NULL(debugfs_ctx_defaults_directory)) { - dev_err(kbdev->dev, "Couldn't create mali debugfs ctx defaults directory\n"); - err = -ENOMEM; - goto out; + return dentry; } kbasep_regs_history_debugfs_init(kbdev); -#if !MALI_USE_CSF +#if MALI_USE_CSF + kbase_debug_csf_fault_debugfs_init(kbdev); +#else /* MALI_USE_CSF */ kbase_debug_job_fault_debugfs_init(kbdev); #endif /* !MALI_USE_CSF */ @@ -4849,41 +5376,62 @@ int kbase_device_debugfs_init(struct kbase_device *kbdev) /* fops_* variables created by invocations of macro * MAKE_QUIRK_ACCESSORS() above. 
*/ - debugfs_create_file("quirks_sc", 0644, + dentry = debugfs_create_file("quirks_sc", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_sc_quirks); - debugfs_create_file("quirks_tiler", 0644, + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_sc debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("quirks_tiler", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_tiler_quirks); - debugfs_create_file("quirks_mmu", 0644, + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_tiler debugfs entry\n"); + return dentry; + } + + dentry = debugfs_create_file("quirks_mmu", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_mmu_quirks); - debugfs_create_file("quirks_gpu", 0644, kbdev->mali_debugfs_directory, - kbdev, &fops_gpu_quirks); - - debugfs_create_bool("infinite_cache", mode, - debugfs_ctx_defaults_directory, - &kbdev->infinite_cache_active_default); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_mmu debugfs entry\n"); + return dentry; + } - debugfs_create_file("mem_pool_max_size", mode, - debugfs_ctx_defaults_directory, - &kbdev->mem_pool_defaults.small, - &kbase_device_debugfs_mem_pool_max_size_fops); + dentry = debugfs_create_file("quirks_gpu", 0644, kbdev->mali_debugfs_directory, + kbdev, &fops_gpu_quirks); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create quirks_gpu debugfs entry\n"); + return dentry; + } - debugfs_create_file("lp_mem_pool_max_size", mode, - debugfs_ctx_defaults_directory, - &kbdev->mem_pool_defaults.large, - &kbase_device_debugfs_mem_pool_max_size_fops); + dentry = debugfs_ctx_defaults_init(kbdev); + if (IS_ERR_OR_NULL(dentry)) + return dentry; if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PROTECTED_DEBUG_MODE)) { - debugfs_create_file("protected_debug_mode", 0444, + dentry = debugfs_create_file("protected_debug_mode", 0444, kbdev->mali_debugfs_directory, kbdev, &fops_protected_debug_mode); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create protected_debug_mode debugfs entry\n"); + return dentry; + } } - debugfs_create_file("reset", 0644, + dentry = debugfs_create_file("reset", 0644, kbdev->mali_debugfs_directory, kbdev, &fops_trigger_reset); + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create reset debugfs entry\n"); + return dentry; + } + + debugfs_create_file("trigger_uevent", 0644, + kbdev->mali_debugfs_directory, kbdev, + &fops_trigger_uevent); kbase_ktrace_debugfs_init(kbdev); @@ -4895,18 +5443,30 @@ int kbase_device_debugfs_init(struct kbase_device *kbdev) #endif /* CONFIG_MALI_DEVFREQ */ #if !MALI_USE_CSF - debugfs_create_file("serialize_jobs", 0644, + dentry = debugfs_create_file("serialize_jobs", 0644, kbdev->mali_debugfs_directory, kbdev, &kbasep_serialize_jobs_debugfs_fops); - + if (IS_ERR_OR_NULL(dentry)) { + dev_err(kbdev->dev, "Unable to create serialize_jobs debugfs entry\n"); + return dentry; + } + kbase_timeline_io_debugfs_init(kbdev); #endif kbase_dvfs_status_debugfs_init(kbdev); - return 0; -out: - debugfs_remove_recursive(kbdev->mali_debugfs_directory); - return err; + return dentry; +} + +int kbase_device_debugfs_init(struct kbase_device *kbdev) +{ + struct dentry *dentry = init_debugfs(kbdev); + + if (IS_ERR_OR_NULL(dentry)) { + debugfs_remove_recursive(kbdev->mali_debugfs_directory); + return IS_ERR(dentry) ? 
PTR_ERR(dentry) : -ENOMEM; + } + return 0; } void kbase_device_debugfs_term(struct kbase_device *kbdev) @@ -5098,10 +5658,11 @@ static ssize_t fw_timeout_store(struct device *dev, ret = kstrtouint(buf, 0, &fw_timeout); if (ret || fw_timeout == 0) { - dev_err(kbdev->dev, "%s\n%s\n%u", - "Couldn't process fw_timeout write operation.", - "Use format 'fw_timeout_ms', and fw_timeout_ms > 0", - FIRMWARE_PING_INTERVAL_MS); + dev_err(kbdev->dev, + "Couldn't process fw_timeout write operation.\n" + "Use format 'fw_timeout_ms', and fw_timeout_ms > 0\n" + "Default fw_timeout: %u", + kbase_get_timeout_ms(kbdev, CSF_FIRMWARE_PING_TIMEOUT)); return -EINVAL; } @@ -5171,7 +5732,10 @@ static ssize_t idle_hysteresis_time_store(struct device *dev, return -EINVAL; } - kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur); + /* In sysFs, The unit of the input value of idle_hysteresis_time is us. + * But the unit of the input parameter of this function is ns, so multiply by 1000 + */ + kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur * NSEC_PER_USEC); return count; } @@ -5198,13 +5762,221 @@ static ssize_t idle_hysteresis_time_show(struct device *dev, if (!kbdev) return -ENODEV; - dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev); + /* The unit of return value of idle_hysteresis_time_show is us, So divide by 1000.*/ + dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev) / NSEC_PER_USEC; ret = scnprintf(buf, PAGE_SIZE, "%u\n", dur); return ret; } static DEVICE_ATTR_RW(idle_hysteresis_time); + +/** + * idle_hysteresis_time_ns_store - Store callback for CSF + * idle_hysteresis_time_ns sysfs file. + * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes written to the sysfs file + * + * This function is called when the idle_hysteresis_time_ns sysfs + * file is written to. + * + * This file contains values of the idle hysteresis duration in ns. + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t idle_hysteresis_time_ns_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev; + u32 dur = 0; + + kbdev = to_kbase_device(dev); + if (!kbdev) + return -ENODEV; + + if (kstrtou32(buf, 0, &dur)) { + dev_err(kbdev->dev, "Couldn't process idle_hysteresis_time_ns write operation.\n" + "Use format <idle_hysteresis_time_ns>\n"); + return -EINVAL; + } + + kbase_csf_firmware_set_gpu_idle_hysteresis_time(kbdev, dur); + + return count; +} + +/** + * idle_hysteresis_time_ns_show - Show callback for CSF + * idle_hysteresis_time_ns sysfs entry. + * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer to receive the GPU information. + * + * This function is called to get the current idle hysteresis duration in ns. + * + * Return: The number of bytes output to @buf. + */ +static ssize_t idle_hysteresis_time_ns_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev; + ssize_t ret; + u32 dur; + + kbdev = to_kbase_device(dev); + if (!kbdev) + return -ENODEV; + + dur = kbase_csf_firmware_get_gpu_idle_hysteresis_time(kbdev); + ret = scnprintf(buf, PAGE_SIZE, "%u\n", dur); + + return ret; +} + +static DEVICE_ATTR_RW(idle_hysteresis_time_ns); + +/** + * mcu_shader_pwroff_timeout_show - Get the MCU shader Core power-off time value. 
+ * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer for the sysfs file contents + * + * Get the internally recorded MCU shader Core power-off (nominal) timeout value. + * The unit of the value is in micro-seconds. + * + * Return: The number of bytes output to @buf if the + * function succeeded. A Negative value on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 pwroff; + + if (!kbdev) + return -ENODEV; + + /* The unit of return value of the function is us, So divide by 1000.*/ + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev) / NSEC_PER_USEC; + return scnprintf(buf, PAGE_SIZE, "%u\n", pwroff); +} + +/** + * mcu_shader_pwroff_timeout_store - Set the MCU shader core power-off time value. + * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes to write to the sysfs file + * + * The duration value (unit: micro-seconds) for configuring MCU Shader Core + * timer, when the shader cores' power transitions are delegated to the + * MCU (normal operational mode) + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 dur; + + const struct kbase_pm_policy *current_policy; + bool always_on; + + if (!kbdev) + return -ENODEV; + + if (kstrtouint(buf, 0, &dur)) + return -EINVAL; + + current_policy = kbase_pm_get_policy(kbdev); + always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (dur == 0 && !always_on) + return -EINVAL; + + /* In sysFs, The unit of the input value of mcu_shader_pwroff_timeout is us. + * But the unit of the input parameter of this function is ns, so multiply by 1000 + */ + kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, dur * NSEC_PER_USEC); + + return count; +} + +static DEVICE_ATTR_RW(mcu_shader_pwroff_timeout); + +/** + * mcu_shader_pwroff_timeout_ns_show - Get the MCU shader Core power-off time value. + * + * @dev: The device this sysfs file is for. + * @attr: The attributes of the sysfs file. + * @buf: The output buffer for the sysfs file contents + * + * Get the internally recorded MCU shader Core power-off (nominal) timeout value. + * The unit of the value is in nanoseconds. + * + * Return: The number of bytes output to @buf if the + * function succeeded. A Negative value on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_ns_show(struct device *dev, struct device_attribute *attr, + char *const buf) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 pwroff; + + if (!kbdev) + return -ENODEV; + + pwroff = kbase_csf_firmware_get_mcu_core_pwroff_time(kbdev); + return scnprintf(buf, PAGE_SIZE, "%u\n", pwroff); +} + +/** + * mcu_shader_pwroff_timeout_ns_store - Set the MCU shader core power-off time value. 
+ * + * @dev: The device with sysfs file is for + * @attr: The attributes of the sysfs file + * @buf: The value written to the sysfs file + * @count: The number of bytes to write to the sysfs file + * + * The duration value (unit: nanoseconds) for configuring MCU Shader Core + * timer, when the shader cores' power transitions are delegated to the + * MCU (normal operational mode) + * + * Return: @count if the function succeeded. An error code on failure. + */ +static ssize_t mcu_shader_pwroff_timeout_ns_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev_get_drvdata(dev); + u32 dur; + + const struct kbase_pm_policy *current_policy; + bool always_on; + + if (!kbdev) + return -ENODEV; + + if (kstrtouint(buf, 0, &dur)) + return -EINVAL; + + current_policy = kbase_pm_get_policy(kbdev); + always_on = current_policy == &kbase_pm_always_on_policy_ops; + if (dur == 0 && !always_on) + return -EINVAL; + + kbase_csf_firmware_set_mcu_core_pwroff_time(kbdev, dur); + + return count; +} + +static DEVICE_ATTR_RW(mcu_shader_pwroff_timeout_ns); + #endif /* MALI_USE_CSF */ static struct attribute *kbase_scheduling_attrs[] = { @@ -5265,6 +6037,9 @@ static struct attribute *kbase_attrs[] = { &dev_attr_csg_scheduling_period.attr, &dev_attr_fw_timeout.attr, &dev_attr_idle_hysteresis_time.attr, + &dev_attr_idle_hysteresis_time_ns.attr, + &dev_attr_mcu_shader_pwroff_timeout.attr, + &dev_attr_mcu_shader_pwroff_timeout_ns.attr, #endif /* !MALI_USE_CSF */ &dev_attr_power_policy.attr, &dev_attr_core_mask.attr, @@ -5402,8 +6177,15 @@ static int kbase_platform_device_probe(struct platform_device *pdev) } kbdev->dev = &pdev->dev; - dev_set_drvdata(kbdev->dev, kbdev); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + kbdev->token = -EPERM; +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ + + dev_set_drvdata(kbdev->dev, kbdev); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_lock(&kbase_probe_mutex); +#endif err = kbase_device_init(kbdev); if (err) { @@ -5415,14 +6197,28 @@ static int kbase_platform_device_probe(struct platform_device *pdev) dev_set_drvdata(kbdev->dev, NULL); kbase_device_free(kbdev); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_unlock(&kbase_probe_mutex); +#endif } else { +#if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) + /* Since upstream is not exporting mmap_min_addr, kbase at the + * moment is unable to track possible kernel changes via sysfs. + * Flag this out in a device info message. 
+ */ + dev_info(kbdev->dev, KBASE_COMPILED_MMAP_MIN_ADDR_MSG); +#endif + dev_info(kbdev->dev, "Probed as %s\n", dev_name(kbdev->mdev.this_device)); kbase_increment_device_id(); +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_unlock(&kbase_probe_mutex); +#endif #ifdef CONFIG_MALI_ARBITER_SUPPORT - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbase_arbiter_pm_vm_event(kbdev, KBASE_VM_GPU_INITIALIZED_EVT); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #endif } @@ -5490,13 +6286,8 @@ static int kbase_device_resume(struct device *dev) #ifdef CONFIG_MALI_DEVFREQ dev_dbg(dev, "Callback %s\n", __func__); - if (kbdev->devfreq) { - mutex_lock(&kbdev->pm.lock); - if (kbdev->pm.active_count > 0) - kbase_devfreq_enqueue_work(kbdev, DEVFREQ_WORK_RESUME); - mutex_unlock(&kbdev->pm.lock); - flush_workqueue(kbdev->devfreq_queue.workq); - } + if (kbdev->devfreq) + kbase_devfreq_enqueue_work(kbdev, DEVFREQ_WORK_RESUME); #endif return 0; } @@ -5631,12 +6422,11 @@ static const struct dev_pm_ops kbase_pm_ops = { }; #if IS_ENABLED(CONFIG_OF) -static const struct of_device_id kbase_dt_ids[] = { - { .compatible = "arm,malit6xx" }, - { .compatible = "arm,mali-midgard" }, - { .compatible = "arm,mali-bifrost" }, - { /* sentinel */ } -}; +static const struct of_device_id kbase_dt_ids[] = { { .compatible = "arm,malit6xx" }, + { .compatible = "arm,mali-midgard" }, + { .compatible = "arm,mali-bifrost" }, + { .compatible = "arm,mali-valhall" }, + { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, kbase_dt_ids); #endif @@ -5644,33 +6434,36 @@ static struct platform_driver kbase_platform_driver = { .probe = kbase_platform_device_probe, .remove = kbase_platform_device_remove, .driver = { - .name = kbase_drv_name, + .name = KBASE_DRV_NAME, .pm = &kbase_pm_ops, .of_match_table = of_match_ptr(kbase_dt_ids), .probe_type = PROBE_PREFER_ASYNCHRONOUS, }, }; -/* - * The driver will not provide a shortcut to create the Mali platform device - * anymore when using Device Tree. - */ -#if IS_ENABLED(CONFIG_OF) +#if (KERNEL_VERSION(5, 3, 0) > LINUX_VERSION_CODE) && IS_ENABLED(CONFIG_OF) module_platform_driver(kbase_platform_driver); #else - static int __init kbase_driver_init(void) { int ret; +#if (KERNEL_VERSION(5, 3, 0) <= LINUX_VERSION_CODE) + mutex_init(&kbase_probe_mutex); +#endif + +#ifndef CONFIG_OF ret = kbase_platform_register(); if (ret) return ret; - +#endif ret = platform_driver_register(&kbase_platform_driver); - - if (ret) +#ifndef CONFIG_OF + if (ret) { kbase_platform_unregister(); + return ret; + } +#endif return ret; } @@ -5678,14 +6471,14 @@ static int __init kbase_driver_init(void) static void __exit kbase_driver_exit(void) { platform_driver_unregister(&kbase_platform_driver); +#ifndef CONFIG_OF kbase_platform_unregister(); +#endif } module_init(kbase_driver_init); module_exit(kbase_driver_exit); - -#endif /* CONFIG_OF */ - +#endif MODULE_LICENSE("GPL"); MODULE_IMPORT_NS(DMA_BUF); MODULE_VERSION(MALI_RELEASE_NAME " (UK version " \ diff --git a/mali_kbase/mali_kbase_cs_experimental.h b/mali_kbase/mali_kbase_cs_experimental.h index 4dc09e4..7e885ca 100644 --- a/mali_kbase/mali_kbase_cs_experimental.h +++ b/mali_kbase/mali_kbase_cs_experimental.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -30,9 +30,9 @@ */ static inline void mali_kbase_print_cs_experimental(void) { -#if MALI_INCREMENTAL_RENDERING - pr_info("mali_kbase: INCREMENTAL_RENDERING (experimental) enabled"); -#endif /* MALI_INCREMENTAL_RENDERING */ +#if MALI_INCREMENTAL_RENDERING_JM + pr_info("mali_kbase: INCREMENTAL_RENDERING_JM (experimental) enabled"); +#endif /* MALI_INCREMENTAL_RENDERING_JM */ } #endif /* _KBASE_CS_EXPERIMENTAL_H_ */ diff --git a/mali_kbase/mali_kbase_ctx_sched.c b/mali_kbase/mali_kbase_ctx_sched.c index 8026e7f..ea4f300 100644 --- a/mali_kbase/mali_kbase_ctx_sched.c +++ b/mali_kbase/mali_kbase_ctx_sched.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,9 @@ #include <mali_kbase_defs.h> #include "mali_kbase_ctx_sched.h" #include "tl/mali_kbase_tracepoints.h" -#if !MALI_USE_CSF +#if MALI_USE_CSF +#include "mali_kbase_reset_gpu.h" +#else #include <mali_kbase_hwaccess_jm.h> #endif @@ -67,6 +69,12 @@ void kbase_ctx_sched_term(struct kbase_device *kbdev) } } +void kbase_ctx_sched_init_ctx(struct kbase_context *kctx) +{ + kctx->as_nr = KBASEP_AS_NR_INVALID; + atomic_set(&kctx->refcount, 0); +} + /* kbasep_ctx_sched_find_as_for_ctx - Find a free address space * * @kbdev: The context for which to find a free address space @@ -111,7 +119,7 @@ int kbase_ctx_sched_retain_ctx(struct kbase_context *kctx) if (atomic_inc_return(&kctx->refcount) == 1) { int const free_as = kbasep_ctx_sched_find_as_for_ctx(kctx); - if (free_as != KBASEP_AS_NR_INVALID) { + if (free_as >= 0) { kbdev->as_free &= ~(1u << free_as); /* Only program the MMU if the context has not been * assigned the same address space before. @@ -152,9 +160,23 @@ void kbase_ctx_sched_retain_ctx_refcount(struct kbase_context *kctx) struct kbase_device *const kbdev = kctx->kbdev; lockdep_assert_held(&kbdev->hwaccess_lock); - WARN_ON(atomic_read(&kctx->refcount) == 0); - WARN_ON(kctx->as_nr == KBASEP_AS_NR_INVALID); - WARN_ON(kbdev->as_to_kctx[kctx->as_nr] != kctx); +#if MALI_USE_CSF + /* We expect the context to be active when this function is called, + * except for the case where a page fault is reported for it during + * the GPU reset sequence, in which case we can expect the refcount + * to be 0. 
+ */ + WARN_ON(!atomic_read(&kctx->refcount) && !kbase_reset_gpu_is_active(kbdev)); +#else + /* We expect the context to be active (and thus refcount should be non-zero) + * when this function is called + */ + WARN_ON(!atomic_read(&kctx->refcount)); +#endif + if (likely((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS))) + WARN_ON(kbdev->as_to_kctx[kctx->as_nr] != kctx); + else + WARN(true, "Invalid as_nr(%d)", kctx->as_nr); atomic_inc(&kctx->refcount); } @@ -168,16 +190,17 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx) new_ref_count = atomic_dec_return(&kctx->refcount); if (new_ref_count == 0) { - kbdev->as_free |= (1u << kctx->as_nr); - if (kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) { - KBASE_TLSTREAM_TL_KBASE_CTX_UNASSIGN_AS( - kbdev, kctx->id); - kbdev->as_to_kctx[kctx->as_nr] = NULL; - kctx->as_nr = KBASEP_AS_NR_INVALID; - kbase_ctx_flag_clear(kctx, KCTX_AS_DISABLED_ON_FAULT); + if (likely((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS))) { + kbdev->as_free |= (1u << kctx->as_nr); + if (kbase_ctx_flag(kctx, KCTX_AS_DISABLED_ON_FAULT)) { + KBASE_TLSTREAM_TL_KBASE_CTX_UNASSIGN_AS(kbdev, kctx->id); + kbdev->as_to_kctx[kctx->as_nr] = NULL; + kctx->as_nr = KBASEP_AS_NR_INVALID; + kbase_ctx_flag_clear(kctx, KCTX_AS_DISABLED_ON_FAULT); #if !MALI_USE_CSF - kbase_backend_slot_kctx_purge_locked(kbdev, kctx); + kbase_backend_slot_kctx_purge_locked(kbdev, kctx); #endif + } } } @@ -187,13 +210,14 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx) void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx) { struct kbase_device *const kbdev = kctx->kbdev; + unsigned long flags; - lockdep_assert_held(&kbdev->mmu_hw_mutex); - lockdep_assert_held(&kbdev->hwaccess_lock); + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(atomic_read(&kctx->refcount) != 0); - if (kctx->as_nr != KBASEP_AS_NR_INVALID) { + if ((kctx->as_nr >= 0) && (kctx->as_nr < BASE_MAX_NR_AS)) { if (kbdev->pm.backend.gpu_powered) kbase_mmu_disable(kctx); @@ -201,6 +225,9 @@ void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx) kbdev->as_to_kctx[kctx->as_nr] = NULL; kctx->as_nr = KBASEP_AS_NR_INVALID; } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); } void kbase_ctx_sched_restore_all_as(struct kbase_device *kbdev) @@ -212,6 +239,8 @@ void kbase_ctx_sched_restore_all_as(struct kbase_device *kbdev) WARN_ON(!kbdev->pm.backend.gpu_powered); + kbdev->mmu_unresponsive = false; + for (i = 0; i != kbdev->nr_hw_address_spaces; ++i) { struct kbase_context *kctx; @@ -264,7 +293,7 @@ struct kbase_context *kbase_ctx_sched_as_to_ctx_refcount( found_kctx = kbdev->as_to_kctx[as_nr]; - if (!WARN_ON(found_kctx == NULL)) + if (found_kctx) kbase_ctx_sched_retain_ctx_refcount(found_kctx); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -313,16 +342,14 @@ struct kbase_context *kbase_ctx_sched_as_to_ctx_nolock( bool kbase_ctx_sched_inc_refcount_nolock(struct kbase_context *kctx) { bool result = false; - int as_nr; if (WARN_ON(kctx == NULL)) return result; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - as_nr = kctx->as_nr; if (atomic_read(&kctx->refcount) > 0) { - KBASE_DEBUG_ASSERT(as_nr >= 0); + KBASE_DEBUG_ASSERT(kctx->as_nr >= 0); kbase_ctx_sched_retain_ctx_refcount(kctx); KBASE_KTRACE_ADD(kctx->kbdev, SCHED_RETAIN_CTX_NOLOCK, kctx, diff --git a/mali_kbase/mali_kbase_ctx_sched.h b/mali_kbase/mali_kbase_ctx_sched.h index f787cc3..5a8d175 100644 --- a/mali_kbase/mali_kbase_ctx_sched.h +++ 
b/mali_kbase/mali_kbase_ctx_sched.h @@ -60,6 +60,15 @@ int kbase_ctx_sched_init(struct kbase_device *kbdev); void kbase_ctx_sched_term(struct kbase_device *kbdev); /** + * kbase_ctx_sched_ctx_init - Initialize per-context data fields for scheduling + * @kctx: The context to initialize + * + * This must be called during context initialization before any other context + * scheduling functions are called on @kctx + */ +void kbase_ctx_sched_init_ctx(struct kbase_context *kctx); + +/** * kbase_ctx_sched_retain_ctx - Retain a reference to the @ref kbase_context * @kctx: The context to which to retain a reference * @@ -113,9 +122,6 @@ void kbase_ctx_sched_release_ctx(struct kbase_context *kctx); * This function should be called when a context is being destroyed. The * context must no longer have any reference. If it has been assigned an * address space before then the AS will be unprogrammed. - * - * The kbase_device::mmu_hw_mutex and kbase_device::hwaccess_lock locks must be - * held whilst calling this function. */ void kbase_ctx_sched_remove_ctx(struct kbase_context *kctx); diff --git a/mali_kbase/mali_kbase_debug.h b/mali_kbase/mali_kbase_debug.h index d9eeed8..f0c4b59 100644 --- a/mali_kbase/mali_kbase_debug.h +++ b/mali_kbase/mali_kbase_debug.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2012-2015, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2015, 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -65,7 +65,7 @@ struct kbasep_debug_assert_cb { #endif /** - * KBASEP_DEBUG_ASSERT_OUT(trace, function, ...) - (Private) system printing + * KBASEP_DEBUG_ASSERT_OUT() - (Private) system printing * function associated to the @ref KBASE_DEBUG_ASSERT_MSG event. * @trace: location in the code from where the message is printed * @function: function from where the message is printed @@ -125,7 +125,7 @@ struct kbasep_debug_assert_cb { #endif /* KBASE_DEBUG_DISABLE_ASSERTS */ /** - * KBASE_DEBUG_CODE( X ) - Executes the code inside the macro only in debug mode + * KBASE_DEBUG_CODE() - Executes the code inside the macro only in debug mode * @X: Code to compile only in debug mode. */ #ifdef CONFIG_MALI_DEBUG diff --git a/mali_kbase/mali_kbase_debug_job_fault.c b/mali_kbase/mali_kbase_debug_job_fault.c index 4f021b3..d6518b4 100644 --- a/mali_kbase/mali_kbase_debug_job_fault.c +++ b/mali_kbase/mali_kbase_debug_job_fault.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2016, 2018-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -87,8 +87,7 @@ static bool kbase_ctx_has_no_event_pending(struct kbase_context *kctx) static int wait_for_job_fault(struct kbase_device *kbdev) { -#if KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 15, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(4, 15, 0) > LINUX_VERSION_CODE int ret = wait_event_interruptible_timeout(kbdev->job_fault_wq, kbase_is_job_fault_event_pending(kbdev), msecs_to_jiffies(2000)); diff --git a/mali_kbase/mali_kbase_debug_mem_allocs.c b/mali_kbase/mali_kbase_debug_mem_allocs.c new file mode 100644 index 0000000..0592187 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_allocs.c @@ -0,0 +1,135 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/* + * Debugfs interface to dump information about GPU allocations in kctx + */ + +#include "mali_kbase_debug_mem_allocs.h" +#include "mali_kbase.h" + +#include <linux/string.h> +#include <linux/list.h> +#include <linux/file.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * debug_zone_mem_allocs_show - Show information from specific rbtree + * @zone: The memory zone to be displayed + * @sfile: The debugfs entry + * + * This function is called to show information about all the GPU allocations of a + * particular zone within GPU virtual memory space of a context. + * The information like the start virtual address and size (in bytes) is shown for + * every GPU allocation mapped in the zone. 
+ */ +static void debug_zone_mem_allocs_show(struct kbase_reg_zone *zone, struct seq_file *sfile) +{ + struct rb_node *p; + struct rb_root *rbtree = &zone->reg_rbtree; + struct kbase_va_region *reg; + const char *type_names[5] = { + "Native", + "Imported UMM", + "Imported user buf", + "Alias", + "Raw" + }; + +#define MEM_ALLOCS_HEADER \ + " VA, VA size, Commit size, Flags, Mem type\n" + seq_printf(sfile, "Zone name: %s\n:", kbase_reg_zone_get_name(zone->id)); + seq_printf(sfile, MEM_ALLOCS_HEADER); + for (p = rb_first(rbtree); p; p = rb_next(p)) { + reg = rb_entry(p, struct kbase_va_region, rblink); + if (!(reg->flags & KBASE_REG_FREE)) { + seq_printf(sfile, "%16llx, %16zx, %16zx, %8lx, %s\n", + reg->start_pfn << PAGE_SHIFT, reg->nr_pages << PAGE_SHIFT, + kbase_reg_current_backed_size(reg) << PAGE_SHIFT, + reg->flags, type_names[reg->gpu_alloc->type]); + } + } +} + +/** + * debug_ctx_mem_allocs_show - Show information about GPU allocations in a kctx + * @sfile: The debugfs entry + * @data: Data associated with the entry + * + * Return: + * 0 if successfully prints data in debugfs entry file + * -1 if it encountered an error + */ +static int debug_ctx_mem_allocs_show(struct seq_file *sfile, void *data) +{ + struct kbase_context *const kctx = sfile->private; + enum kbase_memory_zone zone_idx; + + kbase_gpu_vm_lock(kctx); + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + struct kbase_reg_zone *zone; + + zone = &kctx->reg_zone[zone_idx]; + debug_zone_mem_allocs_show(zone, sfile); + } + kbase_gpu_vm_unlock(kctx); + return 0; +} + +/* + * File operations related to debugfs entry for mem_zones + */ +static int debug_mem_allocs_open(struct inode *in, struct file *file) +{ + return single_open(file, debug_ctx_mem_allocs_show, in->i_private); +} + +static const struct file_operations kbase_debug_mem_allocs_fops = { + .owner = THIS_MODULE, + .open = debug_mem_allocs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/* + * Initialize debugfs entry for mem_allocs + */ +void kbase_debug_mem_allocs_init(struct kbase_context *const kctx) +{ + /* Caller already ensures this, but we keep the pattern for + * maintenance safety. + */ + if (WARN_ON(!kctx) || WARN_ON(IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + debugfs_create_file("mem_allocs", 0400, kctx->kctx_dentry, kctx, + &kbase_debug_mem_allocs_fops); +} +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_debug_mem_allocs_init(struct kbase_context *const kctx) +{ +} +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_allocs.h b/mali_kbase/mali_kbase_debug_mem_allocs.h new file mode 100644 index 0000000..8cf69c2 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_allocs.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
 + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _KBASE_DEBUG_MEM_ALLOCS_H +#define _KBASE_DEBUG_MEM_ALLOCS_H + +#include <mali_kbase.h> + +/** + * kbase_debug_mem_allocs_init() - Initialize the mem_allocs debugfs file + * @kctx: Pointer to kernel base context + * + * This function creates a "mem_allocs" file for a context to show information about the + * GPU allocations created for that context. + * + * The file is cleaned up by a call to debugfs_remove_recursive() deleting the + * parent directory. + */ +void kbase_debug_mem_allocs_init(struct kbase_context *kctx); + +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_view.c b/mali_kbase/mali_kbase_debug_mem_view.c index ce87a00..7086c6b 100644 --- a/mali_kbase/mali_kbase_debug_mem_view.c +++ b/mali_kbase/mali_kbase_debug_mem_view.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2013-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -189,13 +189,13 @@ static const struct seq_operations ops = { .show = debug_mem_show, }; -static int debug_mem_zone_open(struct rb_root *rbtree, - struct debug_mem_data *mem_data) +static int debug_mem_zone_open(struct kbase_reg_zone *zone, struct debug_mem_data *mem_data) { int ret = 0; struct rb_node *p; struct kbase_va_region *reg; struct debug_mem_mapping *mapping; + struct rb_root *rbtree = &zone->reg_rbtree; for (p = rb_first(rbtree); p; p = rb_next(p)) { reg = rb_entry(p, struct kbase_va_region, rblink); @@ -233,8 +233,9 @@ static int debug_mem_open(struct inode *i, struct file *file) struct kbase_context *const kctx = i->i_private; struct debug_mem_data *mem_data; int ret; + enum kbase_memory_zone idx; - if (get_file_rcu(kctx->filp) == 0) + if (!kbase_file_inc_fops_count_unless_closed(kctx->kfile)) return -ENOENT; /* Check if file was opened in write mode.
GPU memory contents @@ -263,37 +264,15 @@ static int debug_mem_open(struct inode *i, struct file *file) mem_data->column_width = kctx->mem_view_column_width; - ret = debug_mem_zone_open(&kctx->reg_rbtree_same, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_custom, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_exec, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } + for (idx = 0; idx < CONTEXT_ZONE_MAX; idx++) { + struct kbase_reg_zone *zone = &kctx->reg_zone[idx]; -#if MALI_USE_CSF - ret = debug_mem_zone_open(&kctx->reg_rbtree_exec_fixed, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; - } - - ret = debug_mem_zone_open(&kctx->reg_rbtree_fixed, mem_data); - if (ret != 0) { - kbase_gpu_vm_unlock(kctx); - goto out; + ret = debug_mem_zone_open(zone, mem_data); + if (ret != 0) { + kbase_gpu_vm_unlock(kctx); + goto out; + } } -#endif kbase_gpu_vm_unlock(kctx); @@ -316,7 +295,7 @@ out: } seq_release(i, file); open_fail: - fput(kctx->filp); + kbase_file_dec_fops_count(kctx->kfile); return ret; } @@ -346,7 +325,7 @@ static int debug_mem_release(struct inode *inode, struct file *file) kfree(mem_data); } - fput(kctx->filp); + kbase_file_dec_fops_count(kctx->kfile); return 0; } diff --git a/mali_kbase/mali_kbase_debug_mem_view.h b/mali_kbase/mali_kbase_debug_mem_view.h index d034832..cb8050d 100644 --- a/mali_kbase/mali_kbase_debug_mem_view.h +++ b/mali_kbase/mali_kbase_debug_mem_view.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2013-2015, 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2013-2015, 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ #include <mali_kbase.h> /** - * kbase_debug_mem_view_init - Initialize the mem_view sysfs file + * kbase_debug_mem_view_init - Initialize the mem_view debugfs file * @kctx: Pointer to kernel base context * * This function creates a "mem_view" file which can be used to get a view of diff --git a/mali_kbase/mali_kbase_debug_mem_zones.c b/mali_kbase/mali_kbase_debug_mem_zones.c new file mode 100644 index 0000000..115c9c3 --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_zones.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +/* + * Debugfs interface to dump information about GPU_VA memory zones + */ + +#include "mali_kbase_debug_mem_zones.h" +#include "mali_kbase.h" + +#include <linux/list.h> +#include <linux/file.h> + +#if IS_ENABLED(CONFIG_DEBUG_FS) + +/** + * debug_mem_zones_show - Show information about GPU_VA memory zones + * @sfile: The debugfs entry + * @data: Data associated with the entry + * + * This function is called to get the contents of the @c mem_zones debugfs file. + * This lists the start address and size (in pages) of each initialized memory + * zone within GPU_VA memory. + * + * Return: + * 0 if successfully prints data in debugfs entry file + * -1 if it encountered an error + */ +static int debug_mem_zones_show(struct seq_file *sfile, void *data) +{ + struct kbase_context *const kctx = sfile->private; + struct kbase_reg_zone *reg_zone; + enum kbase_memory_zone zone_idx; + + kbase_gpu_vm_lock(kctx); + + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + reg_zone = &kctx->reg_zone[zone_idx]; + + if (reg_zone->base_pfn) { + seq_printf(sfile, "%15s %u 0x%.16llx 0x%.16llx\n", + kbase_reg_zone_get_name(zone_idx), zone_idx, reg_zone->base_pfn, + reg_zone->va_size_pages); + } + } +#if MALI_USE_CSF + reg_zone = &kctx->kbdev->csf.mcu_shared_zone; + + if (reg_zone && reg_zone->base_pfn) { + seq_printf(sfile, "%15s %u 0x%.16llx 0x%.16llx\n", + kbase_reg_zone_get_name(MCU_SHARED_ZONE), MCU_SHARED_ZONE, + reg_zone->base_pfn, reg_zone->va_size_pages); + } +#endif + + kbase_gpu_vm_unlock(kctx); + return 0; +} + +/* + * File operations related to debugfs entry for mem_zones + */ +static int debug_mem_zones_open(struct inode *in, struct file *file) +{ + return single_open(file, debug_mem_zones_show, in->i_private); +} + +static const struct file_operations kbase_debug_mem_zones_fops = { + .owner = THIS_MODULE, + .open = debug_mem_zones_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +/* + * Initialize debugfs entry for mem_zones + */ +void kbase_debug_mem_zones_init(struct kbase_context *const kctx) +{ + /* Caller already ensures this, but we keep the pattern for + * maintenance safety. + */ + if (WARN_ON(!kctx) || WARN_ON(IS_ERR_OR_NULL(kctx->kctx_dentry))) + return; + + debugfs_create_file("mem_zones", 0400, kctx->kctx_dentry, kctx, + &kbase_debug_mem_zones_fops); +} +#else +/* + * Stub functions for when debugfs is disabled + */ +void kbase_debug_mem_zones_init(struct kbase_context *const kctx) +{ +} +#endif diff --git a/mali_kbase/mali_kbase_debug_mem_zones.h b/mali_kbase/mali_kbase_debug_mem_zones.h new file mode 100644 index 0000000..acf349b --- /dev/null +++ b/mali_kbase/mali_kbase_debug_mem_zones.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
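/*
 * Output format note for debug_mem_zones_show() above: each emitted line is
 * "<zone name> <zone index> <base PFN> <size in pages>", the last two fields
 * zero-padded hexadecimal. On CSF GPUs the device-wide MCU_SHARED_ZONE is
 * appended after the per-context zones. A purely hypothetical example line:
 *
 *         SAME_VA 0 0x0000000000004000 0x00000000003fc000
 */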
+ * + */ + +#ifndef _KBASE_DEBUG_MEM_ZONES_H +#define _KBASE_DEBUG_MEM_ZONES_H + +#include <mali_kbase.h> + +/** + * kbase_debug_mem_zones_init() - Initialize the mem_zones sysfs file + * @kctx: Pointer to kernel base context + * + * This function creates a "mem_zones" file which can be used to determine the + * address ranges of GPU memory zones, in the GPU Virtual-Address space. + * + * The file is cleaned up by a call to debugfs_remove_recursive() deleting the + * parent directory. + */ +void kbase_debug_mem_zones_init(struct kbase_context *kctx); + +#endif diff --git a/mali_kbase/mali_kbase_debugfs_helper.c b/mali_kbase/mali_kbase_debugfs_helper.c index fcc149b..c846491 100644 --- a/mali_kbase/mali_kbase_debugfs_helper.c +++ b/mali_kbase/mali_kbase_debugfs_helper.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software diff --git a/mali_kbase/mali_kbase_defs.h b/mali_kbase/mali_kbase_defs.h index 25e4f32..bdc3f6d 100644 --- a/mali_kbase/mali_kbase_defs.h +++ b/mali_kbase/mali_kbase_defs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,13 +35,13 @@ #include <backend/gpu/mali_kbase_instr_defs.h> #include <mali_kbase_pm.h> #include <mali_kbase_gpuprops_types.h> -#include <mali_kbase_hwcnt_watchdog_if.h> +#include <hwcnt/mali_kbase_hwcnt_watchdog_if.h> #if MALI_USE_CSF -#include <mali_kbase_hwcnt_backend_csf.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_csf.h> #else -#include <mali_kbase_hwcnt_backend_jm.h> -#include <mali_kbase_hwcnt_backend_jm_watchdog.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm.h> +#include <hwcnt/backend/mali_kbase_hwcnt_backend_jm_watchdog.h> #endif #include <protected_mode_switcher.h> @@ -53,11 +53,7 @@ #include <linux/sizes.h> #include <linux/rtmutex.h> -#if defined(CONFIG_SYNC) -#include <sync.h> -#else #include "mali_kbase_fence_defs.h" -#endif #if IS_ENABLED(CONFIG_DEBUG_FS) #include <linux/debugfs.h> @@ -154,8 +150,7 @@ /* Maximum number of pages of memory that require a permanent mapping, per * kbase_context */ -#define KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES ((32 * 1024ul * 1024ul) >> \ - PAGE_SHIFT) +#define KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES ((64 * 1024ul * 1024ul) >> PAGE_SHIFT) /* Minimum threshold period for hwcnt dumps between different hwcnt virtualizer * clients, to reduce undesired system load. * If a virtualizer client requests a dump within this threshold period after @@ -188,6 +183,60 @@ struct kbase_as; struct kbase_mmu_setup; struct kbase_kinstr_jm; +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * struct kbase_gpu_metrics - Object containing members that are used to emit + * GPU metrics tracepoints for all applications that + * created Kbase context(s) for a GPU. + * + * @active_list: List of applications that did some GPU activity in the recent work period. + * @inactive_list: List of applications that didn't do any GPU activity in the recent work period. 
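/*
 * Summary of what CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD adds (the
 * tracepoint linkage is an assumption based on the option name): the
 * device-wide lists above track which applications were GPU-active in the
 * recent work period, while the per-application kbase_gpu_metrics_ctx
 * documented next accumulates the active time that is reported through the
 * Android power/gpu_work_period tracepoint (GPU id, application UID,
 * work-period start/end, total active duration).
 */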
+ */ +struct kbase_gpu_metrics { + struct list_head active_list; + struct list_head inactive_list; +}; + +/** + * struct kbase_gpu_metrics_ctx - Object created for every application, that created + * Kbase context(s), containing members that are used + * to emit GPU metrics tracepoints for the application. + * + * @link: Links the object in kbase_device::gpu_metrics::active_list + * or kbase_device::gpu_metrics::inactive_list. + * @first_active_start_time: Records the time at which the application first became + * active in the current work period. + * @last_active_start_time: Records the time at which the application last became + * active in the current work period. + * @last_active_end_time: Records the time at which the application last became + * inactive in the current work period. + * @total_active: Tracks the time for which application has been active + * in the current work period. + * @prev_wp_active_end_time: Records the time at which the application last became + * inactive in the previous work period. + * @aid: Unique identifier for an application. + * @kctx_count: Counter to keep a track of the number of Kbase contexts + * created for an application. There may be multiple Kbase + * contexts contributing GPU activity data to a single GPU + * metrics context. + * @active_cnt: Counter that is updated every time the GPU activity starts + * and ends in the current work period for an application. + * @flags: Flags to track the state of GPU metrics context. + */ +struct kbase_gpu_metrics_ctx { + struct list_head link; + u64 first_active_start_time; + u64 last_active_start_time; + u64 last_active_end_time; + u64 total_active; + u64 prev_wp_active_end_time; + unsigned int aid; + unsigned int kctx_count; + u8 active_cnt; + u8 flags; +}; +#endif + /** * struct kbase_io_access - holds information about 1 register access * @@ -269,12 +318,25 @@ struct kbase_fault { bool protected_mode; }; +/** Maximum number of memory pages that should be allocated for the array + * of pointers to free PGDs. + * + * This number has been pre-calculated to deal with the maximum allocation + * size expressed by the default value of KBASE_MEM_ALLOC_MAX_SIZE. + * This is supposed to be enough for almost the entirety of MMU operations. + * Any size greater than KBASE_MEM_ALLOC_MAX_SIZE requires being broken down + * into multiple iterations, each dealing with at most KBASE_MEM_ALLOC_MAX_SIZE + * bytes. + * + * Please update this value if KBASE_MEM_ALLOC_MAX_SIZE changes. + */ +#define MAX_PAGES_FOR_FREE_PGDS ((size_t)9) + +/* Maximum number of pointers to free PGDs */ +#define MAX_FREE_PGDS ((PAGE_SIZE / sizeof(struct page *)) * MAX_PAGES_FOR_FREE_PGDS) + /** * struct kbase_mmu_table - object representing a set of GPU page tables - * @mmu_teardown_pages: Array containing pointers to 3 separate pages, used - * to cache the entries of top (L0) & intermediate level - * page tables (L1 & L2) to avoid repeated calls to - * kmap_atomic() during the MMU teardown. * @mmu_lock: Lock to serialize the accesses made to multi level GPU * page tables * @pgd: Physical address of the page allocated for the top @@ -286,29 +348,106 @@ struct kbase_fault { * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). * @kctx: If this set of MMU tables belongs to a context then * this is a back-reference to the context, otherwise - * it is NULL + * it is NULL. + * @scratch_mem: Scratch memory used for MMU operations, which are + * serialized by the @mmu_lock. 
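/*
 * Worked sizing example for the scratch memory documented above, assuming
 * 4 KiB pages and 8-byte pointers: PAGE_SIZE / sizeof(struct page *) = 512,
 * so MAX_FREE_PGDS = 512 * MAX_PAGES_FOR_FREE_PGDS (9) = 4608 PGD pointers,
 * i.e. 9 pages (36 KiB) for free_pgds. teardown_pages needs one PAGE_SIZE
 * buffer per copied level (MIDGARD_MMU_BOTTOMLEVEL of them), matching the
 * three pages the old mmu_teardown_pages array cached for L0-L2. As the two
 * members share a union, each kbase_mmu_table carries the larger size.
 */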
*/ struct kbase_mmu_table { - u64 *mmu_teardown_pages[MIDGARD_MMU_BOTTOMLEVEL]; struct rt_mutex mmu_lock; phys_addr_t pgd; u8 group_id; struct kbase_context *kctx; + union { + /** + * @teardown_pages: Scratch memory used for backup copies of whole + * PGD pages when tearing down levels upon + * termination of the MMU table. + */ + struct { + /** + * @levels: Array of PGD pages, large enough to copy one PGD + * for each level of the MMU table. + */ + u64 levels[MIDGARD_MMU_BOTTOMLEVEL][PAGE_SIZE / sizeof(u64)]; + } teardown_pages; + /** + * @free_pgds: Scratch memory used for insertion, update and teardown + * operations to store a temporary list of PGDs to be freed + * at the end of the operation. + */ + struct { + /** @pgds: Array of pointers to PGDs to free. */ + struct page *pgds[MAX_FREE_PGDS]; + /** @head_index: Index of first free element in the PGDs array. */ + size_t head_index; + } free_pgds; + } scratch_mem; +}; + +/** + * enum kbase_memory_zone - Kbase memory zone identifier + * @SAME_VA_ZONE: Memory zone for allocations where the GPU and CPU VA coincide. + * @CUSTOM_VA_ZONE: When operating in compatibility mode, this zone is used to + * allow 32-bit userspace (either on a 32-bit device or a + * 32-bit application on a 64-bit device) to address the entirety + * of the GPU address space. The @CUSTOM_VA_ZONE is also used + * for JIT allocations: on 64-bit systems, the zone is created + * by reducing the size of the SAME_VA zone by a user-controlled + * amount, whereas on 32-bit systems, it is created as part of + * the existing CUSTOM_VA_ZONE + * @EXEC_VA_ZONE: Memory zone used to track GPU-executable memory. The start + * and end of this zone depend on the individual platform, + * and it is initialized upon user process request. + * @EXEC_FIXED_VA_ZONE: Memory zone used to contain GPU-executable memory + * that also permits FIXED/FIXABLE allocations. + * @FIXED_VA_ZONE: Memory zone used to allocate memory at userspace-supplied + * addresses. + * @MCU_SHARED_ZONE: Memory zone created for mappings shared between the MCU + * and Kbase. Currently this is the only zone type that is + * created on a per-device, rather than a per-context + * basis. + * @MEMORY_ZONE_MAX: Sentinel value used for iterating over all the memory zone + * identifiers. + * @CONTEXT_ZONE_MAX: Sentinel value used to keep track of the last per-context + * zone for iteration. + */ +enum kbase_memory_zone { + SAME_VA_ZONE, + CUSTOM_VA_ZONE, + EXEC_VA_ZONE, +#if IS_ENABLED(MALI_USE_CSF) + EXEC_FIXED_VA_ZONE, + FIXED_VA_ZONE, + MCU_SHARED_ZONE, +#endif + MEMORY_ZONE_MAX, +#if IS_ENABLED(MALI_USE_CSF) + CONTEXT_ZONE_MAX = FIXED_VA_ZONE + 1 +#else + CONTEXT_ZONE_MAX = EXEC_VA_ZONE + 1 +#endif }; /** - * struct kbase_reg_zone - Information about GPU memory region zones + * struct kbase_reg_zone - GPU memory zone information and region tracking + * @reg_rbtree: RB tree used to track kbase memory regions. * @base_pfn: Page Frame Number in GPU virtual address space for the start of * the Zone * @va_size_pages: Size of the Zone in pages + * @id: Memory zone identifier + * @cache: Pointer to a per-device slab allocator to allow for quickly allocating + * new regions * * Track information about a zone KBASE_REG_ZONE() and related macros. * In future, this could also store the &rb_root that are currently in * &kbase_context and &kbase_csf_device. 
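/*
 * Illustrative sketch (not taken from the driver) of the iteration pattern the
 * per-context zone array enables, mirroring the mem_zones/mem_allocs debugfs
 * code above: the old reg_rbtree_same/_custom/_exec/... fields are replaced by
 * indexing kctx->reg_zone[] with the enum values up to CONTEXT_ZONE_MAX.
 */
static void example_log_zone_layout(struct kbase_context *kctx)
{
	enum kbase_memory_zone idx;

	kbase_gpu_vm_lock(kctx);
	for (idx = 0; idx < CONTEXT_ZONE_MAX; idx++) {
		struct kbase_reg_zone *zone = &kctx->reg_zone[idx];

		if (!zone->base_pfn)
			continue;

		dev_info(kctx->kbdev->dev, "%s: base_pfn=0x%llx, size=%llu pages\n",
			 kbase_reg_zone_get_name(idx), zone->base_pfn, zone->va_size_pages);
	}
	kbase_gpu_vm_unlock(kctx);
}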
*/ struct kbase_reg_zone { + struct rb_root reg_rbtree; u64 base_pfn; u64 va_size_pages; + enum kbase_memory_zone id; + struct kmem_cache *cache; }; #if MALI_USE_CSF @@ -317,6 +456,8 @@ struct kbase_reg_zone { #include "jm/mali_kbase_jm_defs.h" #endif +#include "mali_kbase_hwaccess_time.h" + static inline int kbase_as_has_bus_fault(struct kbase_as *as, struct kbase_fault *fault) { @@ -403,7 +544,15 @@ struct kbase_clk_rate_trace_manager { * Note that some code paths keep shaders/the tiler * powered whilst this is 0. * Use kbase_pm_is_active() instead to check for such cases. - * @suspending: Flag indicating suspending/suspended + * @suspending: Flag set to true when System suspend of GPU device begins and + * set to false only when System resume of GPU device starts. + * So GPU device could be in suspended state while the flag is set. + * The flag is updated with @lock held. + * @resuming: Flag set to true when System resume of GPU device starts and is set + * to false when resume ends. The flag is set to true at the same time + * when @suspending is set to false with @lock held. + * The flag is currently used only to prevent Kbase context termination + * during System resume of GPU device. * @runtime_active: Flag to track if the GPU is in runtime suspended or active * state. This ensures that runtime_put and runtime_get * functions are called in pairs. For example if runtime_get @@ -414,7 +563,7 @@ struct kbase_clk_rate_trace_manager { * This structure contains data for the power management framework. * There is one instance of this structure per device in the system. * @zero_active_count_wait: Wait queue set when active_count == 0 - * @resume_wait: system resume of GPU device. + * @resume_wait: Wait queue to wait for the System suspend/resume of GPU device. * @debug_core_mask: Bit masks identifying the available shader cores that are * specified via sysfs. One mask per job slot. * @debug_core_mask_all: Bit masks identifying the available shader cores that @@ -432,9 +581,10 @@ struct kbase_clk_rate_trace_manager { * @clk_rtm: The state of the GPU clock rate trace manager */ struct kbase_pm_device_data { - struct mutex lock; + struct rt_mutex lock; int active_count; bool suspending; + bool resuming; #if MALI_USE_CSF bool runtime_active; #endif @@ -465,36 +615,40 @@ struct kbase_pm_device_data { /** * struct kbase_mem_pool - Page based memory pool for kctx/kbdev - * @kbdev: Kbase device where memory is used - * @cur_size: Number of free pages currently in the pool (may exceed - * @max_size in some corner cases) - * @max_size: Maximum number of free pages in the pool - * @order: order = 0 refers to a pool of 4 KB pages - * order = 9 refers to a pool of 2 MB pages (2^9 * 4KB = 2 MB) - * @group_id: A memory group ID to be passed to a platform-specific - * memory group manager, if present. Immutable. - * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). - * @pool_lock: Lock protecting the pool - must be held when modifying - * @cur_size and @page_list - * @page_list: List of free pages in the pool - * @reclaim: Shrinker for kernel reclaim of free pages - * @next_pool: Pointer to next pool where pages can be allocated when this - * pool is empty. Pages will spill over to the next pool when - * this pool is full. Can be NULL if there is no next pool. 
- * @dying: true if the pool is being terminated, and any ongoing - * operations should be abandoned - * @dont_reclaim: true if the shrinker is forbidden from reclaiming memory from - * this pool, eg during a grow operation + * @kbdev: Kbase device where memory is used + * @cur_size: Number of free pages currently in the pool (may exceed + * @max_size in some corner cases) + * @max_size: Maximum number of free pages in the pool + * @order: order = 0 refers to a pool of 4 KB pages + * order = 9 refers to a pool of 2 MB pages (2^9 * 4KB = 2 MB) + * @group_id: A memory group ID to be passed to a platform-specific + * memory group manager, if present. Immutable. + * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * @pool_lock: Lock protecting the pool - must be held when modifying + * @cur_size and @page_list + * @page_list: List of free pages in the pool + * @reclaim: Shrinker for kernel reclaim of free pages + * @isolation_in_progress_cnt: Number of pages in pool undergoing page isolation. + * This is used to avoid race condition between pool termination + * and page isolation for page migration. + * @next_pool: Pointer to next pool where pages can be allocated when this + * pool is empty. Pages will spill over to the next pool when + * this pool is full. Can be NULL if there is no next pool. + * @dying: true if the pool is being terminated, and any ongoing + * operations should be abandoned + * @dont_reclaim: true if the shrinker is forbidden from reclaiming memory from + * this pool, eg during a grow operation */ struct kbase_mem_pool { struct kbase_device *kbdev; - size_t cur_size; - size_t max_size; - u8 order; - u8 group_id; - spinlock_t pool_lock; - struct list_head page_list; - struct shrinker reclaim; + size_t cur_size; + size_t max_size; + u8 order; + u8 group_id; + spinlock_t pool_lock; + struct list_head page_list; + struct shrinker reclaim; + atomic_t isolation_in_progress_cnt; struct kbase_mem_pool *next_pool; @@ -581,7 +735,7 @@ struct kbase_devfreq_opp { * @entry_set_pte: program the pte to be a valid entry to encode the physical * address of the next lower level page table and also update * the number of valid entries. - * @entry_invalidate: clear out or invalidate the pte. + * @entries_invalidate: clear out or invalidate a range of ptes. * @get_num_valid_entries: returns the number of valid entries for a specific pgd. * @set_num_valid_entries: sets the number of valid entries for a specific pgd * @flags: bitmask of MMU mode flags. Refer to KBASE_MMU_MODE_ constants. @@ -598,8 +752,8 @@ struct kbase_mmu_mode { int (*pte_is_valid)(u64 pte, int level); void (*entry_set_ate)(u64 *entry, struct tagged_addr phy, unsigned long flags, int level); - void (*entry_set_pte)(u64 *pgd, u64 vpfn, phys_addr_t phy); - void (*entry_invalidate)(u64 *entry); + void (*entry_set_pte)(u64 *entry, phys_addr_t phy); + void (*entries_invalidate)(u64 *entry, u32 count); unsigned int (*get_num_valid_entries)(u64 *pgd); void (*set_num_valid_entries)(u64 *pgd, unsigned int num_of_valid_entries); @@ -675,6 +829,33 @@ struct kbase_process { }; /** + * struct kbase_mem_migrate - Object representing an instance for managing + * page migration. + * + * @free_pages_list: List of deferred pages to free. Mostly used when page migration + * is enabled. Pages in memory pool that require migrating + * will be freed instead. However page cannot be freed + * right away as Linux will need to release the page lock. + * Therefore page will be added to this list and freed later. 
+ * @free_pages_lock: This lock should be held when adding or removing pages + * from @free_pages_list. + * @free_pages_workq: Work queue to process the work items queued to free + * pages in @free_pages_list. + * @free_pages_work: Work item to free pages in @free_pages_list. + * @inode: Pointer to inode whose address space operations are used + * for page migration purposes. + */ +struct kbase_mem_migrate { + struct list_head free_pages_list; + spinlock_t free_pages_lock; + struct workqueue_struct *free_pages_workq; + struct work_struct free_pages_work; +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + struct inode *inode; +#endif +}; + +/** * struct kbase_device - Object representing an instance of GPU platform device, * allocated from the probe method of mali driver. * @hw_quirks_sc: Configuration to be used for the shader cores as per @@ -712,6 +893,10 @@ struct kbase_process { * @opp_token: Token linked to the device OPP structure maintaining the * link to OPPs attached to a device. This is obtained * after setting regulator names for the device. + * @token: Integer replacement for opp_table in kernel versions + * 6 and greater. Value is a token id number when 0 or greater, + * and a linux errno when negative. Must be initialised + * to an non-zero value as 0 is valid token id. * @devname: string containing the name used for GPU device instance, * miscellaneous device is registered using the same name. * @id: Unique identifier for the device, indicates the number of @@ -752,12 +937,18 @@ struct kbase_process { * to the GPU device. This points to an internal memory * group manager if no platform-specific memory group * manager was retrieved through device tree. + * @mmu_unresponsive: Flag to indicate MMU is not responding. + * Set if a MMU command isn't completed within + * &kbase_device:mmu_or_gpu_cache_op_wait_time_ms. + * Clear by kbase_ctx_sched_restore_all_as() after GPU reset completes. * @as: Array of objects representing address spaces of GPU. - * @as_free: Bitpattern of free/available GPU address spaces. * @as_to_kctx: Array of pointers to struct kbase_context, having * GPU adrress spaces assigned to them. + * @as_free: Bitpattern of free/available GPU address spaces. * @mmu_mask_change: Lock to serialize the access to MMU interrupt mask * register used in the handling of Bus & Page faults. + * @pagesize_2mb: Boolean to determine whether 2MiB page sizes are + * supported and used where possible. * @gpu_props: Object containing complete information about the * configuration/properties of GPU HW device in use. * @hw_issues_mask: List of SW workarounds for HW issues @@ -803,6 +994,7 @@ struct kbase_process { * GPU reset. * @lowest_gpu_freq_khz: Lowest frequency in KHz that the GPU can run at. Used * to calculate suitable timeouts for wait operations. + * @backend_time: Kbase backend time related attributes. * @cache_clean_in_progress: Set when a cache clean has been started, and * cleared when it has finished. This prevents multiple * cache cleans being done simultaneously. @@ -909,6 +1101,10 @@ struct kbase_process { * GPU2019-3878. PM state machine is invoked after * clearing this flag and @hwaccess_lock is used to * serialize the access. + * @mmu_page_migrate_in_progress: Set before starting a MMU page migration transaction + * and cleared after the transaction completes. PM L2 state is + * prevented from entering powering up/down transitions when the + * flag is set, @hwaccess_lock is used to serialize the access. 
* @poweroff_pending: Set when power off operation for GPU is started, reset when * power on for GPU is started. * @infinite_cache_active_default: Set to enable using infinite cache for all the @@ -978,11 +1174,8 @@ struct kbase_process { * @total_gpu_pages for both native and dma-buf imported * allocations. * @job_done_worker: Worker for job_done work. - * @job_done_worker_thread: Thread for job_done work. * @event_worker: Worker for event work. - * @event_worker_thread: Thread for event work. * @apc.worker: Worker for async power control work. - * @apc.thread: Thread for async power control work. * @apc.power_on_work: Work struct for powering on the GPU. * @apc.power_off_work: Work struct for powering off the GPU. * @apc.end_ts: The latest end timestamp to power off the GPU. @@ -1002,6 +1195,16 @@ struct kbase_process { * @oom_notifier_block: notifier_block containing kernel-registered out-of- * memory handler. * @proc_sysfs_node: Sysfs directory node to store per-process stats. + * @mem_migrate: Per device object for managing page migration. + * @live_fence_metadata: Count of live fence metadata structures created by + * KCPU queue. These structures may outlive kbase module + * itself. Therefore, in such a case, a warning should be + * be produced. + * @mmu_or_gpu_cache_op_wait_time_ms: Maximum waiting time in ms for the completion of + * a cache operation via MMU_AS_CONTROL or GPU_CONTROL. + * @va_region_slab: kmem_cache (slab) for allocated kbase_va_region structures. + * @fence_signal_timeout_enabled: Global flag for whether fence signal timeout tracking + * is enabled. */ struct kbase_device { u32 hw_quirks_sc; @@ -1026,12 +1229,16 @@ struct kbase_device { #if IS_ENABLED(CONFIG_REGULATOR) struct regulator *regulators[BASE_MAX_NR_CLOCKS_REGULATORS]; unsigned int nr_regulators; - int opp_token; +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + int token; +#elif (KERNEL_VERSION(4, 10, 0) <= LINUX_VERSION_CODE) + struct opp_table *opp_table; +#endif /* (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) */ #endif /* CONFIG_REGULATOR */ char devname[DEVNAME_SIZE]; u32 id; -#if IS_ENABLED(CONFIG_MALI_NO_MALI) +#if !IS_ENABLED(CONFIG_MALI_REAL_HW) void *model; struct kmem_cache *irq_slab; struct workqueue_struct *irq_workq; @@ -1039,7 +1246,7 @@ struct kbase_device { atomic_t serving_gpu_irq; atomic_t serving_mmu_irq; spinlock_t reg_op_lock; -#endif /* CONFIG_MALI_NO_MALI */ +#endif /* !IS_ENABLED(CONFIG_MALI_REAL_HW) */ struct kbase_pm_device_data pm; struct kbase_mem_pool_group mem_pools; @@ -1048,12 +1255,15 @@ struct kbase_device { struct memory_group_manager_device *mgm_dev; + bool mmu_unresponsive; struct kbase_as as[BASE_MAX_NR_AS]; - u16 as_free; struct kbase_context *as_to_kctx[BASE_MAX_NR_AS]; + u16 as_free; spinlock_t mmu_mask_change; + bool pagesize_2mb; + struct kbase_gpu_props gpu_props; unsigned long hw_issues_mask[(BASE_HW_ISSUE_END + BITS_PER_LONG - 1) / BITS_PER_LONG]; @@ -1067,6 +1277,12 @@ struct kbase_device { s8 nr_hw_address_spaces; s8 nr_user_address_spaces; + /** + * @pbha_propagate_bits: Record of Page-Based Hardware Attribute Propagate bits to + * restore to L2_CONFIG upon GPU reset. 
+ */ + u8 pbha_propagate_bits; + #if MALI_USE_CSF struct kbase_hwcnt_backend_csf_if hwcnt_backend_csf_if_fw; #else @@ -1101,6 +1317,8 @@ struct kbase_device { u64 lowest_gpu_freq_khz; + struct kbase_backend_time backend_time; + bool cache_clean_in_progress; u32 cache_clean_queued; wait_queue_head_t cache_clean_wait; @@ -1148,7 +1366,9 @@ struct kbase_device { #endif /* CONFIG_MALI_DEVFREQ */ unsigned long previous_frequency; +#if !MALI_USE_CSF atomic_t job_fault_debug; +#endif /* !MALI_USE_CSF */ #if IS_ENABLED(CONFIG_DEBUG_FS) struct dentry *mali_debugfs_directory; @@ -1159,11 +1379,13 @@ struct kbase_device { u64 debugfs_as_read_bitmap; #endif /* CONFIG_MALI_DEBUG */ +#if !MALI_USE_CSF wait_queue_head_t job_fault_wq; wait_queue_head_t job_fault_resume_wq; struct workqueue_struct *job_fault_resume_workq; struct list_head job_fault_event_list; spinlock_t job_fault_event_lock; +#endif /* !MALI_USE_CSF */ #if !MALI_CUSTOMER_RELEASE struct { @@ -1185,13 +1407,11 @@ struct kbase_device { #if MALI_USE_CSF bool mmu_hw_operation_in_progress; #endif + bool mmu_page_migrate_in_progress; bool poweroff_pending; -#if (KERNEL_VERSION(4, 4, 0) <= LINUX_VERSION_CODE) bool infinite_cache_active_default; -#else - u32 infinite_cache_active_default; -#endif + struct kbase_mem_pool_group_config mem_pool_defaults; u32 current_gpu_coherency_mode; @@ -1240,9 +1460,7 @@ struct kbase_device { struct kbasep_js_device_data js_data; struct kthread_worker job_done_worker; - struct task_struct *job_done_worker_thread; struct kthread_worker event_worker; - struct task_struct *event_worker_thread; /* See KBASE_JS_*_PRIORITY_MODE for details. */ u32 js_ctx_scheduling_mode; @@ -1258,7 +1476,6 @@ struct kbase_device { struct { struct kthread_worker worker; - struct task_struct *thread; struct kthread_work power_on_work; struct kthread_work power_off_work; ktime_t end_ts; @@ -1292,6 +1509,24 @@ struct kbase_device { struct notifier_block oom_notifier_block; struct kobject *proc_sysfs_node; + + struct kbase_mem_migrate mem_migrate; + +#if MALI_USE_CSF && IS_ENABLED(CONFIG_SYNC_FILE) + atomic_t live_fence_metadata; +#endif + u32 mmu_or_gpu_cache_op_wait_time_ms; + struct kmem_cache *va_region_slab; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics: GPU device wide structure used for emitting GPU metrics tracepoints. + */ + struct kbase_gpu_metrics gpu_metrics; +#endif +#if MALI_USE_CSF + atomic_t fence_signal_timeout_enabled; +#endif }; /** @@ -1308,6 +1543,9 @@ struct kbase_device { * @KBASE_FILE_COMPLETE: Indicates if the setup for context has * completed, i.e. flags have been set for the * context. + * @KBASE_FILE_DESTROY_CTX: Indicates that destroying of context has begun or + * is complete. This state can only be reached after + * @KBASE_FILE_COMPLETE. * * The driver allows only limited interaction with user-space until setup * is complete. @@ -1317,7 +1555,8 @@ enum kbase_file_state { KBASE_FILE_VSN_IN_PROGRESS, KBASE_FILE_NEED_CTX, KBASE_FILE_CTX_IN_PROGRESS, - KBASE_FILE_COMPLETE + KBASE_FILE_COMPLETE, + KBASE_FILE_DESTROY_CTX }; /** @@ -1327,6 +1566,12 @@ enum kbase_file_state { * allocated from the probe method of the Mali driver. * @filp: Pointer to the struct file corresponding to device file * /dev/malixx instance, passed to the file's open method. + * @owner: Pointer to the file table structure of a process that + * created the instance of /dev/malixx device file. Set to + * NULL when that process closes the file instance. 
No more + * file operations would be allowed once set to NULL. + * It would be updated only in the Userspace context, i.e. + * when @kbase_open or @kbase_flush is called. * @kctx: Object representing an entity, among which GPU is * scheduled and which gets its own GPU address space. * Invalid until @setup_state is KBASE_FILE_COMPLETE. @@ -1335,13 +1580,44 @@ enum kbase_file_state { * @setup_state is KBASE_FILE_NEED_CTX. * @setup_state: Initialization state of the file. Values come from * the kbase_file_state enumeration. + * @destroy_kctx_work: Work item for destroying the @kctx, enqueued only when + * @fops_count and @map_count becomes zero after /dev/malixx + * file was previously closed by the @owner. + * @lock: Lock to serialize the access to members like @owner, @fops_count, + * @map_count. + * @fops_count: Counter that is incremented at the beginning of a method + * defined for @kbase_fops and is decremented at the end. + * So the counter keeps a track of the file operations in progress + * for /dev/malixx file, that are being handled by the Kbase. + * The counter is needed to defer the context termination as + * Userspace can close the /dev/malixx file and flush() method + * can get called when some other file operation is in progress. + * @map_count: Counter to keep a track of the memory mappings present on + * /dev/malixx file instance. The counter is needed to defer the + * context termination as Userspace can close the /dev/malixx + * file and flush() method can get called when mappings are still + * present. + * @zero_fops_count_wait: Waitqueue used to wait for the @fops_count to become 0. + * Currently needed only for the "mem_view" debugfs file. + * @event_queue: Wait queue used for blocking the thread, which consumes + * the base_jd_event corresponding to an atom, when there + * are no more posted events. */ struct kbase_file { struct kbase_device *kbdev; struct file *filp; + fl_owner_t owner; struct kbase_context *kctx; unsigned long api_version; atomic_t setup_state; + struct work_struct destroy_kctx_work; + spinlock_t lock; + int fops_count; + int map_count; +#if IS_ENABLED(CONFIG_DEBUG_FS) + wait_queue_head_t zero_fops_count_wait; +#endif + wait_queue_head_t event_queue; }; #if MALI_JIT_PRESSURE_LIMIT_BASE /** @@ -1374,10 +1650,6 @@ struct kbase_file { * * @KCTX_DYING: Set when the context process is in the process of being evicted. * - * @KCTX_NO_IMPLICIT_SYNC: Set when explicit Android fences are in use on this - * context, to disable use of implicit dma-buf fences. This is used to avoid - * potential synchronization deadlocks. - * * @KCTX_FORCE_SAME_VA: Set when BASE_MEM_SAME_VA should be forced on memory * allocations. For 64-bit clients it is enabled by default, and disabled by * default on 32-bit clients. Being able to clear this flag is only used for @@ -1420,7 +1692,6 @@ enum kbase_context_flags { KCTX_PRIVILEGED = 1U << 7, KCTX_SCHEDULED = 1U << 8, KCTX_DYING = 1U << 9, - KCTX_NO_IMPLICIT_SYNC = 1U << 10, KCTX_FORCE_SAME_VA = 1U << 11, KCTX_PULLED_SINCE_ACTIVE_JS0 = 1U << 12, KCTX_PULLED_SINCE_ACTIVE_JS1 = 1U << 13, @@ -1459,9 +1730,6 @@ enum kbase_context_flags { * * @KCTX_DYING: Set when the context process is in the process of being evicted. * - * @KCTX_NO_IMPLICIT_SYNC: Set when explicit Android fences are in use on this - * context, to disable use of implicit dma-buf fences. This is used to avoid - * potential synchronization deadlocks. * * @KCTX_FORCE_SAME_VA: Set when BASE_MEM_SAME_VA should be forced on memory * allocations. 
For 64-bit clients it is enabled by default, and disabled by @@ -1502,7 +1770,6 @@ enum kbase_context_flags { KCTX_PRIVILEGED = 1U << 7, KCTX_SCHEDULED = 1U << 8, KCTX_DYING = 1U << 9, - KCTX_NO_IMPLICIT_SYNC = 1U << 10, KCTX_FORCE_SAME_VA = 1U << 11, KCTX_PULLED_SINCE_ACTIVE_JS0 = 1U << 12, KCTX_PULLED_SINCE_ACTIVE_JS1 = 1U << 13, @@ -1520,8 +1787,8 @@ struct kbase_sub_alloc { /** * struct kbase_context - Kernel base context * - * @filp: Pointer to the struct file corresponding to device file - * /dev/malixx instance, passed to the file's open method. + * @kfile: Pointer to the object representing the /dev/malixx device + * file instance. * @kbdev: Pointer to the Kbase device for which the context is created. * @kctx_list_link: Node into Kbase device list of contexts. * @mmu: Structure holding details of the MMU tables for this @@ -1556,22 +1823,6 @@ struct kbase_sub_alloc { * for the allocations >= 2 MB in size. * @reg_lock: Lock used for GPU virtual address space management operations, * like adding/freeing a memory region in the address space. - * Can be converted to a rwlock ?. - * @reg_rbtree_same: RB tree of the memory regions allocated from the SAME_VA - * zone of the GPU virtual address space. Used for allocations - * having the same value for GPU & CPU virtual address. - * @reg_rbtree_custom: RB tree of the memory regions allocated from the CUSTOM_VA - * zone of the GPU virtual address space. - * @reg_rbtree_exec: RB tree of the memory regions allocated from the EXEC_VA - * zone of the GPU virtual address space. Used for GPU-executable - * allocations which don't need the SAME_VA property. - * @reg_rbtree_exec_fixed: RB tree of the memory regions allocated from the - * EXEC_FIXED_VA zone of the GPU virtual address space. Used for - * GPU-executable allocations with FIXED/FIXABLE GPU virtual - * addresses. - * @reg_rbtree_fixed: RB tree of the memory regions allocated from the FIXED_VA zone - * of the GPU virtual address space. Used for allocations with - * FIXED/FIXABLE GPU virtual addresses. * @num_fixable_allocs: A count for the number of memory allocations with the * BASE_MEM_FIXABLE property. * @num_fixed_allocs: A count for the number of memory allocations with the @@ -1588,9 +1839,6 @@ struct kbase_sub_alloc { * used in conjunction with @cookies bitmask mainly for * providing a mechansim to have the same value for CPU & * GPU virtual address. - * @event_queue: Wait queue used for blocking the thread, which consumes - * the base_jd_event corresponding to an atom, when there - * are no more posted events. * @tgid: Thread group ID of the process whose thread created * the context (by calling KBASE_IOCTL_VERSION_CHECK or * KBASE_IOCTL_SET_FLAGS, depending on the @api_version). @@ -1652,11 +1900,13 @@ struct kbase_sub_alloc { * is scheduled in and an atom is pulled from the context's per * slot runnable tree in JM GPU or GPU command queue * group is programmed on CSG slot in CSF GPU. - * @mm_update_lock: lock used for handling of special tracking page. * @process_mm: Pointer to the memory descriptor of the process which * created the context. Used for accounting the physical * pages used for GPU allocations, done for the context, - * to the memory consumed by the process. + * to the memory consumed by the process. A reference is taken + * on this descriptor for the Userspace created contexts so that + * Kbase can safely access it to update the memory usage counters. + * The reference is dropped on context termination. 
* @gpu_va_end: End address of the GPU va space (in 4KB page units) * @running_total_tiler_heap_nr_chunks: Running total of number of chunks in all * tiler heaps of the kbase context. @@ -1707,12 +1957,6 @@ struct kbase_sub_alloc { * memory allocations. * @jit_current_allocations_per_bin: Current number of in-flight just-in-time * memory allocations per bin. - * @jit_version: Version number indicating whether userspace is using - * old or new version of interface for just-in-time - * memory allocations. - * 1 -> client used KBASE_IOCTL_MEM_JIT_INIT_10_2 - * 2 -> client used KBASE_IOCTL_MEM_JIT_INIT_11_5 - * 3 -> client used KBASE_IOCTL_MEM_JIT_INIT * @jit_group_id: A memory group ID to be passed to a platform-specific * memory group manager. * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). @@ -1784,6 +2028,11 @@ struct kbase_sub_alloc { * @limited_core_mask: The mask that is applied to the affinity in case of atoms * marked with BASE_JD_REQ_LIMITED_CORE_MASK. * @platform_data: Pointer to platform specific per-context data. + * @task: Pointer to the task structure of the main thread of the process + * that created the Kbase context. It would be set only for the + * contexts created by the Userspace and not for the contexts + * created internally by the Kbase. + * @comm: Record the process name * * A kernel base context is an entity among which the GPU is scheduled. * Each context has its own GPU address space. @@ -1792,7 +2041,7 @@ struct kbase_sub_alloc { * is made on the device file. */ struct kbase_context { - struct file *filp; + struct kbase_file *kfile; struct kbase_device *kbdev; struct list_head kctx_list_link; struct kbase_mmu_table mmu; @@ -1817,17 +2066,11 @@ struct kbase_context { struct list_head mem_partials; struct mutex reg_lock; - - struct rb_root reg_rbtree_same; - struct rb_root reg_rbtree_custom; - struct rb_root reg_rbtree_exec; #if MALI_USE_CSF - struct rb_root reg_rbtree_exec_fixed; - struct rb_root reg_rbtree_fixed; atomic64_t num_fixable_allocs; atomic64_t num_fixed_allocs; #endif - struct kbase_reg_zone reg_zone[KBASE_REG_ZONE_MAX]; + struct kbase_reg_zone reg_zone[CONTEXT_ZONE_MAX]; #if MALI_USE_CSF struct kbase_csf_context csf; @@ -1851,7 +2094,6 @@ struct kbase_context { DECLARE_BITMAP(cookies, BITS_PER_LONG); struct kbase_va_region *pending_regions[BITS_PER_LONG]; - wait_queue_head_t event_queue; pid_t tgid; pid_t pid; atomic_t used_pages; @@ -1866,19 +2108,12 @@ struct kbase_context { struct list_head waiting_soft_jobs; spinlock_t waiting_soft_jobs_lock; -#ifdef CONFIG_MALI_DMA_FENCE - struct { - struct list_head waiting_resource; - struct workqueue_struct *wq; - } dma_fence; -#endif /* CONFIG_MALI_DMA_FENCE */ int as_nr; atomic_t refcount; - spinlock_t mm_update_lock; - struct mm_struct __rcu *process_mm; + struct mm_struct *process_mm; u64 gpu_va_end; #if MALI_USE_CSF u32 running_total_tiler_heap_nr_chunks; @@ -1903,7 +2138,6 @@ struct kbase_context { u8 jit_max_allocations; u8 jit_current_allocations; u8 jit_current_allocations_per_bin[256]; - u8 jit_version; u8 jit_group_id; #if MALI_JIT_PRESSURE_LIMIT_BASE u64 jit_phys_pages_limit; @@ -1939,9 +2173,19 @@ struct kbase_context { u64 limited_core_mask; -#if !MALI_USE_CSF void *platform_data; + + struct task_struct *task; + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + /** + * @gpu_metrics_ctx: Pointer to the GPU metrics context corresponding to the + * application that created the Kbase context. 
+ */ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; #endif + + char comm[TASK_COMM_LEN]; }; #ifdef CONFIG_MALI_CINSTR_GWT @@ -1970,17 +2214,15 @@ struct kbasep_gwt_list_element { * to a @kbase_context. * @ext_res_node: List head for adding the metadata to a * @kbase_context. - * @alloc: The physical memory allocation structure - * which is mapped. - * @gpu_addr: The GPU virtual address the resource is - * mapped to. + * @reg: External resource information, containing + * the corresponding VA region * @ref: Reference count. * * External resources can be mapped into multiple contexts as well as the same * context multiple times. - * As kbase_va_region itself isn't refcounted we can't attach our extra - * information to it as it could be removed under our feet leaving external - * resources pinned. + * As kbase_va_region is refcounted, we guarantee that it will be available + * for the duration of the external resource, meaning it is sufficient to use + * it to rederive any additional data, like the GPU address. * This metadata structure binds a single external resource to a single * context, ensuring that per context mapping is tracked separately so it can * be overridden when needed and abuses by the application (freeing the resource @@ -1988,8 +2230,7 @@ struct kbasep_gwt_list_element { */ struct kbase_ctx_ext_res_meta { struct list_head ext_res_node; - struct kbase_mem_phy_alloc *alloc; - u64 gpu_addr; + struct kbase_va_region *reg; u32 ref; }; @@ -2044,6 +2285,7 @@ static inline u64 kbase_get_lock_region_min_size_log2(struct kbase_gpu_props con /* Maximum number of loops polling the GPU for a cache flush before we assume it must have completed */ #define KBASE_CLEAN_CACHE_MAX_LOOPS 100000 /* Maximum number of loops polling the GPU for an AS command to complete before we assume the GPU has hung */ -#define KBASE_AS_INACTIVE_MAX_LOOPS 100000000 - +#define KBASE_AS_INACTIVE_MAX_LOOPS 100000 +/* Maximum number of loops polling the GPU PRFCNT_ACTIVE bit before we assume the GPU has hung */ +#define KBASE_PRFCNT_ACTIVE_MAX_LOOPS 100000000 #endif /* _KBASE_DEFS_H_ */ diff --git a/mali_kbase/mali_kbase_dma_fence.c b/mali_kbase/mali_kbase_dma_fence.c deleted file mode 100644 index c4129ff..0000000 --- a/mali_kbase/mali_kbase_dma_fence.c +++ /dev/null @@ -1,491 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note -/* - * - * (C) COPYRIGHT 2011-2016, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/* Include mali_kbase_dma_fence.h before checking for CONFIG_MALI_DMA_FENCE as - * it will be set there. 
- */ -#include "mali_kbase_dma_fence.h" -#include <linux/atomic.h> -#include <linux/list.h> -#include <linux/lockdep.h> -#include <linux/mutex.h> -#include <linux/version.h> -#include <linux/slab.h> -#include <linux/spinlock.h> -#include <linux/workqueue.h> -#include <linux/ww_mutex.h> -#include <mali_kbase.h> - -static void -kbase_dma_fence_work(struct work_struct *pwork); - -static void -kbase_dma_fence_waiters_add(struct kbase_jd_atom *katom) -{ - struct kbase_context *kctx = katom->kctx; - - list_add_tail(&katom->queue, &kctx->dma_fence.waiting_resource); -} - -static void -kbase_dma_fence_waiters_remove(struct kbase_jd_atom *katom) -{ - list_del(&katom->queue); -} - -static int -kbase_dma_fence_lock_reservations(struct kbase_dma_fence_resv_info *info, - struct ww_acquire_ctx *ctx) -{ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *content_res = NULL; -#else - struct dma_resv *content_res = NULL; -#endif - unsigned int content_res_idx = 0; - unsigned int r; - int err = 0; - - ww_acquire_init(ctx, &reservation_ww_class); - -retry: - for (r = 0; r < info->dma_fence_resv_count; r++) { - if (info->resv_objs[r] == content_res) { - content_res = NULL; - continue; - } - - err = ww_mutex_lock(&info->resv_objs[r]->lock, ctx); - if (err) - goto error; - } - - ww_acquire_done(ctx); - return err; - -error: - content_res_idx = r; - - /* Unlock the locked one ones */ - while (r--) - ww_mutex_unlock(&info->resv_objs[r]->lock); - - if (content_res) - ww_mutex_unlock(&content_res->lock); - - /* If we deadlock try with lock_slow and retry */ - if (err == -EDEADLK) { - content_res = info->resv_objs[content_res_idx]; - ww_mutex_lock_slow(&content_res->lock, ctx); - goto retry; - } - - /* If we are here the function failed */ - ww_acquire_fini(ctx); - return err; -} - -static void -kbase_dma_fence_unlock_reservations(struct kbase_dma_fence_resv_info *info, - struct ww_acquire_ctx *ctx) -{ - unsigned int r; - - for (r = 0; r < info->dma_fence_resv_count; r++) - ww_mutex_unlock(&info->resv_objs[r]->lock); - ww_acquire_fini(ctx); -} - - - -/** - * kbase_dma_fence_queue_work() - Queue work to handle @katom - * @katom: Pointer to atom for which to queue work - * - * Queue kbase_dma_fence_work() for @katom to clean up the fence callbacks and - * submit the atom. - */ -static void -kbase_dma_fence_queue_work(struct kbase_jd_atom *katom) -{ - struct kbase_context *kctx = katom->kctx; - bool ret; - - INIT_WORK(&katom->work, kbase_dma_fence_work); - ret = queue_work(kctx->dma_fence.wq, &katom->work); - /* Warn if work was already queued, that should not happen. */ - WARN_ON(!ret); -} - -/** - * kbase_dma_fence_cancel_atom() - Cancels waiting on an atom - * @katom: Katom to cancel - * - * Locking: katom->dma_fence.callbacks list assumes jctx.lock is held. - */ -static void -kbase_dma_fence_cancel_atom(struct kbase_jd_atom *katom) -{ - lockdep_assert_held(&katom->kctx->jctx.lock); - - /* Cancel callbacks and clean up. */ - kbase_fence_free_callbacks(katom); - - /* Mark the atom as handled in case all fences signaled just before - * canceling the callbacks and the worker was queued. - */ - kbase_fence_dep_count_set(katom, -1); - - /* Prevent job_done_nolock from being called twice on an atom when - * there is a race between job completion and cancellation. 
- */ - - if (katom->status == KBASE_JD_ATOM_STATE_QUEUED) { - /* Wait was cancelled - zap the atom */ - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (jd_done_nolock(katom, true)) - kbase_js_sched_all(katom->kctx->kbdev); - } -} - -/** - * kbase_dma_fence_work() - Worker thread called when a fence is signaled - * @pwork: work_struct containing a pointer to a katom - * - * This function will clean and mark all dependencies as satisfied - */ -static void -kbase_dma_fence_work(struct work_struct *pwork) -{ - struct kbase_jd_atom *katom; - struct kbase_jd_context *ctx; - - katom = container_of(pwork, struct kbase_jd_atom, work); - ctx = &katom->kctx->jctx; - - mutex_lock(&ctx->lock); - if (kbase_fence_dep_count_read(katom) != 0) - goto out; - - kbase_fence_dep_count_set(katom, -1); - - /* Remove atom from list of dma-fence waiting atoms. */ - kbase_dma_fence_waiters_remove(katom); - /* Cleanup callbacks. */ - kbase_fence_free_callbacks(katom); - /* - * Queue atom on GPU, unless it has already completed due to a failing - * dependency. Run jd_done_nolock() on the katom if it is completed. - */ - if (unlikely(katom->status == KBASE_JD_ATOM_STATE_COMPLETED)) - jd_done_nolock(katom, true); - else - kbase_jd_dep_clear_locked(katom); - -out: - mutex_unlock(&ctx->lock); -} - -static void -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -kbase_dma_fence_cb(struct fence *fence, struct fence_cb *cb) -#else -kbase_dma_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb) -#endif -{ - struct kbase_fence_cb *kcb = container_of(cb, - struct kbase_fence_cb, - fence_cb); - struct kbase_jd_atom *katom = kcb->katom; - - /* If the atom is zapped dep_count will be forced to a negative number - * preventing this callback from ever scheduling work. Which in turn - * would reschedule the atom. - */ - - if (kbase_fence_dep_count_dec_and_test(katom)) - kbase_dma_fence_queue_work(katom); -} - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -static int -kbase_dma_fence_add_reservation_callback(struct kbase_jd_atom *katom, - struct reservation_object *resv, - bool exclusive) -#else -static int -kbase_dma_fence_add_reservation_callback(struct kbase_jd_atom *katom, - struct dma_resv *resv, - bool exclusive) -#endif -{ -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *excl_fence = NULL; - struct fence **shared_fences = NULL; -#else - struct dma_fence *excl_fence = NULL; - struct dma_fence **shared_fences = NULL; -#endif - unsigned int shared_count = 0; - int err, i; - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - err = reservation_object_get_fences_rcu( -#elif (KERNEL_VERSION(5, 14, 0) > LINUX_VERSION_CODE) - err = dma_resv_get_fences_rcu( -#else - err = dma_resv_get_fences( -#endif - resv, - &excl_fence, - &shared_count, - &shared_fences); - if (err) - return err; - - if (excl_fence) { - err = kbase_fence_add_callback(katom, - excl_fence, - kbase_dma_fence_cb); - - /* Release our reference, taken by reservation_object_get_fences_rcu(), - * to the fence. We have set up our callback (if that was possible), - * and it's the fence's owner is responsible for singling the fence - * before allowing it to disappear. - */ - dma_fence_put(excl_fence); - - if (err) - goto out; - } - - if (exclusive) { - for (i = 0; i < shared_count; i++) { - err = kbase_fence_add_callback(katom, - shared_fences[i], - kbase_dma_fence_cb); - if (err) - goto out; - } - } - - /* Release all our references to the shared fences, taken by - * reservation_object_get_fences_rcu(). 
We have set up our callback (if - * that was possible), and it's the fence's owner is responsible for - * signaling the fence before allowing it to disappear. - */ -out: - for (i = 0; i < shared_count; i++) - dma_fence_put(shared_fences[i]); - kfree(shared_fences); - - if (err) { - /* - * On error, cancel and clean up all callbacks that was set up - * before the error. - */ - kbase_fence_free_callbacks(katom); - } - - return err; -} - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -void kbase_dma_fence_add_reservation(struct reservation_object *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive) -#else -void kbase_dma_fence_add_reservation(struct dma_resv *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive) -#endif -{ - unsigned int i; - - for (i = 0; i < info->dma_fence_resv_count; i++) { - /* Duplicate resource, ignore */ - if (info->resv_objs[i] == resv) - return; - } - - info->resv_objs[info->dma_fence_resv_count] = resv; - if (exclusive) - set_bit(info->dma_fence_resv_count, - info->dma_fence_excl_bitmap); - (info->dma_fence_resv_count)++; -} - -int kbase_dma_fence_wait(struct kbase_jd_atom *katom, - struct kbase_dma_fence_resv_info *info) -{ - int err, i; -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence; -#else - struct dma_fence *fence; -#endif - struct ww_acquire_ctx ww_ctx; - - lockdep_assert_held(&katom->kctx->jctx.lock); - - fence = kbase_fence_out_new(katom); - if (!fence) { - err = -ENOMEM; - dev_err(katom->kctx->kbdev->dev, - "Error %d creating fence.\n", err); - return err; - } - - kbase_fence_dep_count_set(katom, 1); - - err = kbase_dma_fence_lock_reservations(info, &ww_ctx); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d locking reservations.\n", err); - kbase_fence_dep_count_set(katom, -1); - kbase_fence_out_remove(katom); - return err; - } - - for (i = 0; i < info->dma_fence_resv_count; i++) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *obj = info->resv_objs[i]; -#else - struct dma_resv *obj = info->resv_objs[i]; -#endif - if (!test_bit(i, info->dma_fence_excl_bitmap)) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - err = reservation_object_reserve_shared(obj); -#else - err = dma_resv_reserve_shared(obj, 0); -#endif - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d reserving space for shared fence.\n", err); - goto end; - } - - err = kbase_dma_fence_add_reservation_callback(katom, obj, false); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d adding reservation to callback.\n", err); - goto end; - } - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - reservation_object_add_shared_fence(obj, fence); -#else - dma_resv_add_shared_fence(obj, fence); -#endif - } else { - err = kbase_dma_fence_add_reservation_callback(katom, obj, true); - if (err) { - dev_err(katom->kctx->kbdev->dev, - "Error %d adding reservation to callback.\n", err); - goto end; - } - -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - reservation_object_add_excl_fence(obj, fence); -#else - dma_resv_add_excl_fence(obj, fence); -#endif - } - } - -end: - kbase_dma_fence_unlock_reservations(info, &ww_ctx); - - if (likely(!err)) { - /* Test if the callbacks are already triggered */ - if (kbase_fence_dep_count_dec_and_test(katom)) { - kbase_fence_dep_count_set(katom, -1); - kbase_fence_free_callbacks(katom); - } else { - /* Add katom to the list of dma-buf fence waiting atoms - * only if it is still waiting. 
- */ - kbase_dma_fence_waiters_add(katom); - } - } else { - /* There was an error, cancel callbacks, set dep_count to -1 to - * indicate that the atom has been handled (the caller will - * kill it for us), signal the fence, free callbacks and the - * fence. - */ - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); - kbase_dma_fence_signal(katom); - } - - return err; -} - -void kbase_dma_fence_cancel_all_atoms(struct kbase_context *kctx) -{ - struct list_head *list = &kctx->dma_fence.waiting_resource; - - while (!list_empty(list)) { - struct kbase_jd_atom *katom; - - katom = list_first_entry(list, struct kbase_jd_atom, queue); - kbase_dma_fence_waiters_remove(katom); - kbase_dma_fence_cancel_atom(katom); - } -} - -void kbase_dma_fence_cancel_callbacks(struct kbase_jd_atom *katom) -{ - /* Cancel callbacks and clean up. */ - if (kbase_fence_free_callbacks(katom)) - kbase_dma_fence_queue_work(katom); -} - -void kbase_dma_fence_signal(struct kbase_jd_atom *katom) -{ - if (!katom->dma_fence.fence) - return; - - /* Signal the atom's fence. */ - dma_fence_signal(katom->dma_fence.fence); - - kbase_fence_out_remove(katom); - - kbase_fence_free_callbacks(katom); -} - -void kbase_dma_fence_term(struct kbase_context *kctx) -{ - destroy_workqueue(kctx->dma_fence.wq); - kctx->dma_fence.wq = NULL; -} - -int kbase_dma_fence_init(struct kbase_context *kctx) -{ - INIT_LIST_HEAD(&kctx->dma_fence.waiting_resource); - - kctx->dma_fence.wq = alloc_workqueue("mali-fence-%d", - WQ_UNBOUND, 1, kctx->pid); - if (!kctx->dma_fence.wq) - return -ENOMEM; - - return 0; -} diff --git a/mali_kbase/mali_kbase_dma_fence.h b/mali_kbase/mali_kbase_dma_fence.h deleted file mode 100644 index be69118..0000000 --- a/mali_kbase/mali_kbase_dma_fence.h +++ /dev/null @@ -1,150 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2010-2016, 2020-2022 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -#ifndef _KBASE_DMA_FENCE_H_ -#define _KBASE_DMA_FENCE_H_ - -#ifdef CONFIG_MALI_DMA_FENCE - -#include <linux/list.h> -#include <linux/version.h> -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -#include <linux/reservation.h> -#else -#include <linux/dma-resv.h> -#endif -#include <mali_kbase_fence.h> - -/* Forward declaration from mali_kbase_defs.h */ -struct kbase_jd_atom; -struct kbase_context; - -/** - * struct kbase_dma_fence_resv_info - Structure with list of reservation objects - * @resv_objs: Array of reservation objects to attach the - * new fence to. - * @dma_fence_resv_count: Number of reservation objects in the array. - * @dma_fence_excl_bitmap: Specifies which resv_obj are exclusive. - * - * This is used by some functions to pass around a collection of data about - * reservation objects. 
- */ -struct kbase_dma_fence_resv_info { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object **resv_objs; -#else - struct dma_resv **resv_objs; -#endif - unsigned int dma_fence_resv_count; - unsigned long *dma_fence_excl_bitmap; -}; - -/** - * kbase_dma_fence_add_reservation() - Adds a resv to the array of resv_objs - * @resv: Reservation object to add to the array. - * @info: Pointer to struct with current reservation info - * @exclusive: Boolean indicating if exclusive access is needed - * - * The function adds a new reservation_object to an existing array of - * reservation_objects. At the same time keeps track of which objects require - * exclusive access in dma_fence_excl_bitmap. - */ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -void kbase_dma_fence_add_reservation(struct reservation_object *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive); -#else -void kbase_dma_fence_add_reservation(struct dma_resv *resv, - struct kbase_dma_fence_resv_info *info, - bool exclusive); -#endif - -/** - * kbase_dma_fence_wait() - Creates a new fence and attaches it to the resv_objs - * @katom: Katom with the external dependency. - * @info: Pointer to struct with current reservation info - * - * Return: An error code or 0 if succeeds - */ -int kbase_dma_fence_wait(struct kbase_jd_atom *katom, - struct kbase_dma_fence_resv_info *info); - -/** - * kbase_dma_fence_cancel_ctx() - Cancel all dma-fences blocked atoms on kctx - * @kctx: Pointer to kbase context - * - * This function will cancel and clean up all katoms on @kctx that is waiting - * on dma-buf fences. - * - * Locking: jctx.lock needs to be held when calling this function. - */ -void kbase_dma_fence_cancel_all_atoms(struct kbase_context *kctx); - -/** - * kbase_dma_fence_cancel_callbacks() - Cancel only callbacks on katom - * @katom: Pointer to katom whose callbacks are to be canceled - * - * This function cancels all dma-buf fence callbacks on @katom, but does not - * cancel the katom itself. - * - * The caller is responsible for ensuring that jd_done_nolock is called on - * @katom. - * - * Locking: jctx.lock must be held when calling this function. - */ -void kbase_dma_fence_cancel_callbacks(struct kbase_jd_atom *katom); - -/** - * kbase_dma_fence_signal() - Signal katom's fence and clean up after wait - * @katom: Pointer to katom to signal and clean up - * - * This function will signal the @katom's fence, if it has one, and clean up - * the callback data from the katom's wait on earlier fences. - * - * Locking: jctx.lock must be held while calling this function. - */ -void kbase_dma_fence_signal(struct kbase_jd_atom *katom); - -/** - * kbase_dma_fence_term() - Terminate Mali dma-fence context - * @kctx: kbase context to terminate - */ -void kbase_dma_fence_term(struct kbase_context *kctx); - -/** - * kbase_dma_fence_init() - Initialize Mali dma-fence context - * @kctx: kbase context to initialize - * - * Return: 0 on success, error code otherwise. - */ -int kbase_dma_fence_init(struct kbase_context *kctx); - -#else /* !CONFIG_MALI_DMA_FENCE */ -/* Dummy functions for when dma-buf fence isn't enabled. 
*/ - -static inline int kbase_dma_fence_init(struct kbase_context *kctx) -{ - return 0; -} - -static inline void kbase_dma_fence_term(struct kbase_context *kctx) {} -#endif /* CONFIG_MALI_DMA_FENCE */ -#endif diff --git a/mali_kbase/mali_kbase_dummy_job_wa.c b/mali_kbase/mali_kbase_dummy_job_wa.c index 35934b9..c3c6046 100644 --- a/mali_kbase/mali_kbase_dummy_job_wa.c +++ b/mali_kbase/mali_kbase_dummy_job_wa.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -183,9 +183,9 @@ int kbase_dummy_job_wa_execute(struct kbase_device *kbdev, u64 cores) if (kbdev->dummy_job_wa.flags & KBASE_DUMMY_JOB_WA_FLAG_WAIT_POWERUP) { /* wait for power-ups */ - wait(kbdev, SHADER_READY_LO, (cores & U32_MAX), true); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_LO), (cores & U32_MAX), true); if (cores >> 32) - wait(kbdev, SHADER_READY_HI, (cores >> 32), true); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_HI), (cores >> 32), true); } if (kbdev->dummy_job_wa.flags & KBASE_DUMMY_JOB_WA_FLAG_SERIALIZE) { @@ -218,11 +218,11 @@ int kbase_dummy_job_wa_execute(struct kbase_device *kbdev, u64 cores) kbase_reg_write(kbdev, SHADER_PWROFF_HI, (cores >> 32)); /* wait for power off complete */ - wait(kbdev, SHADER_READY_LO, (cores & U32_MAX), false); - wait(kbdev, SHADER_PWRTRANS_LO, (cores & U32_MAX), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_LO), (cores & U32_MAX), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO), (cores & U32_MAX), false); if (cores >> 32) { - wait(kbdev, SHADER_READY_HI, (cores >> 32), false); - wait(kbdev, SHADER_PWRTRANS_HI, (cores >> 32), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_READY_HI), (cores >> 32), false); + wait(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI), (cores >> 32), false); } kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_IRQ_CLEAR), U32_MAX); } diff --git a/mali_kbase/mali_kbase_dvfs_debugfs.c b/mali_kbase/mali_kbase_dvfs_debugfs.c index 1e584de..e4cb716 100644 --- a/mali_kbase/mali_kbase_dvfs_debugfs.c +++ b/mali_kbase/mali_kbase_dvfs_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -68,11 +68,7 @@ static const struct file_operations kbasep_dvfs_utilization_debugfs_fops = { void kbase_dvfs_status_debugfs_init(struct kbase_device *kbdev) { struct dentry *file; -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif if (WARN_ON(!kbdev || IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) return; diff --git a/mali_kbase/mali_kbase_fence.c b/mali_kbase/mali_kbase_fence.c index 01557cd..b16b276 100644 --- a/mali_kbase/mali_kbase_fence.c +++ b/mali_kbase/mali_kbase_fence.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -59,95 +59,3 @@ kbase_fence_out_new(struct kbase_jd_atom *katom) return fence; } -bool -kbase_fence_free_callbacks(struct kbase_jd_atom *katom) -{ - struct kbase_fence_cb *cb, *tmp; - bool res = false; - - lockdep_assert_held(&katom->kctx->jctx.lock); - - /* Clean up and free callbacks. */ - list_for_each_entry_safe(cb, tmp, &katom->dma_fence.callbacks, node) { - bool ret; - - /* Cancel callbacks that hasn't been called yet. */ - ret = dma_fence_remove_callback(cb->fence, &cb->fence_cb); - if (ret) { - int ret; - - /* Fence had not signaled, clean up after - * canceling. - */ - ret = atomic_dec_return(&katom->dma_fence.dep_count); - - if (unlikely(ret == 0)) - res = true; - } - - /* - * Release the reference taken in - * kbase_fence_add_callback(). - */ - dma_fence_put(cb->fence); - list_del(&cb->node); - kfree(cb); - } - - return res; -} - -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -int -kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct fence *fence, - fence_func_t callback) -#else -int -kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct dma_fence *fence, - dma_fence_func_t callback) -#endif -{ - int err = 0; - struct kbase_fence_cb *kbase_fence_cb; - - if (!fence) - return -EINVAL; - - kbase_fence_cb = kmalloc(sizeof(*kbase_fence_cb), GFP_KERNEL); - if (!kbase_fence_cb) - return -ENOMEM; - - kbase_fence_cb->fence = fence; - kbase_fence_cb->katom = katom; - INIT_LIST_HEAD(&kbase_fence_cb->node); - atomic_inc(&katom->dma_fence.dep_count); - - err = dma_fence_add_callback(fence, &kbase_fence_cb->fence_cb, - callback); - if (err == -ENOENT) { - /* Fence signaled, get the completion result */ - err = dma_fence_get_status(fence); - - /* remap success completion to err code */ - if (err == 1) - err = 0; - - kfree(kbase_fence_cb); - atomic_dec(&katom->dma_fence.dep_count); - } else if (err) { - kfree(kbase_fence_cb); - atomic_dec(&katom->dma_fence.dep_count); - } else { - /* - * Get reference to fence that will be kept until callback gets - * cleaned up in kbase_fence_free_callbacks(). - */ - dma_fence_get(fence); - /* Add callback to katom's list of callbacks */ - list_add(&kbase_fence_cb->node, &katom->dma_fence.callbacks); - } - - return err; -} diff --git a/mali_kbase/mali_kbase_fence.h b/mali_kbase/mali_kbase_fence.h index 2842280..ea2ac34 100644 --- a/mali_kbase/mali_kbase_fence.h +++ b/mali_kbase/mali_kbase_fence.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2018, 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,41 +23,62 @@ #define _KBASE_FENCE_H_ /* - * mali_kbase_fence.[hc] has common fence code used by both - * - CONFIG_MALI_DMA_FENCE - implicit DMA fences - * - CONFIG_SYNC_FILE - explicit fences beginning with 4.9 kernel + * mali_kbase_fence.[hc] has fence code used only by + * - CONFIG_SYNC_FILE - explicit fences */ -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <linux/list.h> #include "mali_kbase_fence_defs.h" #include "mali_kbase.h" +#include "mali_kbase_refcount_defs.h" +#include <linux/version_compat_defs.h> +#if MALI_USE_CSF +/* Maximum number of characters in DMA fence timeline name. */ +#define MAX_TIMELINE_NAME (32) + +/** + * struct kbase_kcpu_dma_fence_meta - Metadata structure for dma fence objects containing + * information about KCPU queue. One instance per KCPU + * queue. + * + * @refcount: Atomic value to keep track of number of references to an instance. + * An instance can outlive the KCPU queue itself. + * @kbdev: Pointer to Kbase device. + * @kctx_id: Kbase context ID. + * @timeline_name: String of timeline name for associated fence object. + */ +struct kbase_kcpu_dma_fence_meta { + kbase_refcount_t refcount; + struct kbase_device *kbdev; + int kctx_id; + char timeline_name[MAX_TIMELINE_NAME]; +}; + +/** + * struct kbase_kcpu_dma_fence - Structure which extends a dma fence object to include a + * reference to metadata containing more informaiton about it. + * + * @base: Fence object itself. + * @metadata: Pointer to metadata structure. + */ +struct kbase_kcpu_dma_fence { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -extern const struct fence_ops kbase_fence_ops; + struct fence base; #else -extern const struct dma_fence_ops kbase_fence_ops; + struct dma_fence base; +#endif /* LINUX_VERSION_CODE < KERNEL_VERSION(4, 10, 0) */ + struct kbase_kcpu_dma_fence_meta *metadata; +}; #endif -/** - * struct kbase_fence_cb - Mali dma-fence callback data struct - * @fence_cb: Callback function - * @katom: Pointer to katom that is waiting on this callback - * @fence: Pointer to the fence object on which this callback is waiting - * @node: List head for linking this callback to the katom - */ -struct kbase_fence_cb { #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence_cb fence_cb; - struct fence *fence; +extern const struct fence_ops kbase_fence_ops; #else - struct dma_fence_cb fence_cb; - struct dma_fence *fence; +extern const struct dma_fence_ops kbase_fence_ops; #endif - struct kbase_jd_atom *katom; - struct list_head node; -}; /** * kbase_fence_out_new() - Creates a new output fence and puts it on the atom @@ -71,7 +92,7 @@ struct fence *kbase_fence_out_new(struct kbase_jd_atom *katom); struct dma_fence *kbase_fence_out_new(struct kbase_jd_atom *katom); #endif -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /** * kbase_fence_fence_in_set() - Assign input fence to atom * @katom: Atom to assign input fence to @@ -102,9 +123,9 @@ static inline void kbase_fence_out_remove(struct kbase_jd_atom *katom) } } -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /** - * kbase_fence_out_remove() - Removes the input fence from atom + * kbase_fence_in_remove() - Removes the input fence from atom * @katom: Atom to remove input fence for * * This will also release the reference to this fence which the atom keeps @@ -140,144 +161,92 
@@ static inline bool kbase_fence_out_is_ours(struct kbase_jd_atom *katom) static inline int kbase_fence_out_signal(struct kbase_jd_atom *katom, int status) { - if (status) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ - KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE) - fence_set_error(katom->dma_fence.fence, status); -#elif (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE) - dma_fence_set_error(katom->dma_fence.fence, status); -#else - katom->dma_fence.fence->status = status; -#endif - } + if (status) + dma_fence_set_error_helper(katom->dma_fence.fence, status); return dma_fence_signal(katom->dma_fence.fence); } +#if IS_ENABLED(CONFIG_SYNC_FILE) /** - * kbase_fence_add_callback() - Add callback on @fence to block @katom - * @katom: Pointer to katom that will be blocked by @fence - * @fence: Pointer to fence on which to set up the callback - * @callback: Pointer to function to be called when fence is signaled + * kbase_fence_in_get() - Retrieve input fence for atom. + * @katom: Atom to get input fence from * - * Caller needs to hold a reference to @fence when calling this function, and - * the caller is responsible for releasing that reference. An additional - * reference to @fence will be taken when the callback was successfully set up - * and @fence needs to be kept valid until the callback has been called and - * cleanup have been done. + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * Return: 0 on success: fence was either already signaled, or callback was - * set up. Negative error code is returned on error. + * Return: The fence, or NULL if there is no input fence for atom */ -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -int kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct fence *fence, - fence_func_t callback); -#else -int kbase_fence_add_callback(struct kbase_jd_atom *katom, - struct dma_fence *fence, - dma_fence_func_t callback); +#define kbase_fence_in_get(katom) dma_fence_get((katom)->dma_fence.fence_in) #endif /** - * kbase_fence_dep_count_set() - Set dep_count value on atom to specified value - * @katom: Atom to set dep_count for - * @val: value to set dep_count to - * - * The dep_count is available to the users of this module so that they can - * synchronize completion of the wait with cancellation and adding of more - * callbacks. For instance, a user could do the following: + * kbase_fence_out_get() - Retrieve output fence for atom. + * @katom: Atom to get output fence from * - * dep_count set to 1 - * callback #1 added, dep_count is increased to 2 - * callback #1 happens, dep_count decremented to 1 - * since dep_count > 0, no completion is done - * callback #2 is added, dep_count is increased to 2 - * dep_count decremented to 1 - * callback #2 happens, dep_count decremented to 0 - * since dep_count now is zero, completion executes + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * The dep_count can also be used to make sure that the completion only - * executes once. This is typically done by setting dep_count to -1 for the - * thread that takes on this responsibility. 
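For illustration, the dep_count scheme walked through in the comment above can be sketched in ordinary userspace C. This is not kbase code: the names are hypothetical and C11 atomics stand in for the kernel's atomic_t, but the counting idea is the same — the counter starts at 1, each registered callback adds one, and completion runs only for whoever decrements it to zero.

    #include <stdatomic.h>
    #include <stdio.h>

    struct waiter {
    	atomic_int dep_count;	/* plays the role of katom->dma_fence.dep_count */
    };

    static void complete(struct waiter *w)
    {
    	/* Mark as handled so completion cannot run a second time. */
    	atomic_store(&w->dep_count, -1);
    	printf("all dependencies satisfied\n");
    }

    static void callback_fired(struct waiter *w)
    {
    	/* The dec-and-test step: only the final decrement completes. */
    	if (atomic_fetch_sub(&w->dep_count, 1) == 1)
    		complete(w);
    }

    int main(void)
    {
    	struct waiter w;

    	atomic_init(&w.dep_count, 1);		/* baseline reference */
    	atomic_fetch_add(&w.dep_count, 1);	/* callback #1 registered */
    	atomic_fetch_add(&w.dep_count, 1);	/* callback #2 registered */

    	callback_fired(&w);			/* count: 3 -> 2, no completion */
    	callback_fired(&w);			/* count: 2 -> 1, no completion */

    	/* Registration finished: drop the baseline reference. */
    	if (atomic_fetch_sub(&w.dep_count, 1) == 1)
    		complete(&w);			/* count: 1 -> 0, completes here */

    	return 0;
    }

Setting the counter to -1 on completion is what lets cancellation paths and late callbacks detect that the atom has already been handled.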
+ * Return: The fence, or NULL if there is no output fence for atom */ -static inline void -kbase_fence_dep_count_set(struct kbase_jd_atom *katom, int val) -{ - atomic_set(&katom->dma_fence.dep_count, val); -} +#define kbase_fence_out_get(katom) dma_fence_get((katom)->dma_fence.fence) + +#endif /* !MALI_USE_CSF */ /** - * kbase_fence_dep_count_dec_and_test() - Decrements dep_count - * @katom: Atom to decrement dep_count for + * kbase_fence_get() - Retrieve fence for a KCPUQ fence command. + * @fence_info: KCPUQ fence command * - * See @kbase_fence_dep_count_set for general description about dep_count + * A ref will be taken for the fence, so use @kbase_fence_put() to release it * - * Return: true if value was decremented to zero, otherwise false + * Return: The fence, or NULL if there is no fence for KCPUQ fence command */ -static inline bool -kbase_fence_dep_count_dec_and_test(struct kbase_jd_atom *katom) +#define kbase_fence_get(fence_info) dma_fence_get((fence_info)->fence) + +#if MALI_USE_CSF +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline struct kbase_kcpu_dma_fence *kbase_kcpu_dma_fence_get(struct fence *fence) +#else +static inline struct kbase_kcpu_dma_fence *kbase_kcpu_dma_fence_get(struct dma_fence *fence) +#endif { - return atomic_dec_and_test(&katom->dma_fence.dep_count); + if (fence->ops == &kbase_fence_ops) + return (struct kbase_kcpu_dma_fence *)fence; + + return NULL; } -/** - * kbase_fence_dep_count_read() - Returns the current dep_count value - * @katom: Pointer to katom - * - * See @kbase_fence_dep_count_set for general description about dep_count - * - * Return: The current dep_count value - */ -static inline int kbase_fence_dep_count_read(struct kbase_jd_atom *katom) +static inline void kbase_kcpu_dma_fence_meta_put(struct kbase_kcpu_dma_fence_meta *metadata) { - return atomic_read(&katom->dma_fence.dep_count); + if (kbase_refcount_dec_and_test(&metadata->refcount)) { + atomic_dec(&metadata->kbdev->live_fence_metadata); + kfree(metadata); + } } -/** - * kbase_fence_free_callbacks() - Free dma-fence callbacks on a katom - * @katom: Pointer to katom - * - * This function will free all fence callbacks on the katom's list of - * callbacks. Callbacks that have not yet been called, because their fence - * hasn't yet signaled, will first be removed from the fence. - * - * Locking: katom->dma_fence.callbacks list assumes jctx.lock is held. - * - * Return: true if dep_count reached 0, otherwise false. - */ -bool kbase_fence_free_callbacks(struct kbase_jd_atom *katom); - -#if defined(CONFIG_SYNC_FILE) -/** - * kbase_fence_in_get() - Retrieve input fence for atom. - * @katom: Atom to get input fence from - * - * A ref will be taken for the fence, so use @kbase_fence_put() to release it - * - * Return: The fence, or NULL if there is no input fence for atom - */ -#define kbase_fence_in_get(katom) dma_fence_get((katom)->dma_fence.fence_in) +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline void kbase_kcpu_dma_fence_put(struct fence *fence) +#else +static inline void kbase_kcpu_dma_fence_put(struct dma_fence *fence) #endif +{ + struct kbase_kcpu_dma_fence *kcpu_fence = kbase_kcpu_dma_fence_get(fence); -/** - * kbase_fence_out_get() - Retrieve output fence for atom. 
- * @katom: Atom to get output fence from - * - * A ref will be taken for the fence, so use @kbase_fence_put() to release it - * - * Return: The fence, or NULL if there is no output fence for atom - */ -#define kbase_fence_out_get(katom) dma_fence_get((katom)->dma_fence.fence) - -#endif /* !MALI_USE_CSF */ + if (kcpu_fence) + kbase_kcpu_dma_fence_meta_put(kcpu_fence->metadata); +} +#endif /* MALI_USE_CSF */ /** * kbase_fence_put() - Releases a reference to a fence * @fence: Fence to release reference for. */ -#define kbase_fence_put(fence) dma_fence_put(fence) - +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +static inline void kbase_fence_put(struct fence *fence) +#else +static inline void kbase_fence_put(struct dma_fence *fence) +#endif +{ + dma_fence_put(fence); +} -#endif /* CONFIG_MALI_DMA_FENCE || defined(CONFIG_SYNC_FILE */ +#endif /* IS_ENABLED(CONFIG_SYNC_FILE) */ #endif /* _KBASE_FENCE_H_ */ diff --git a/mali_kbase/mali_kbase_fence_ops.c b/mali_kbase/mali_kbase_fence_ops.c index 14ddf03..f14a55e 100644 --- a/mali_kbase/mali_kbase_fence_ops.c +++ b/mali_kbase/mali_kbase_fence_ops.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,7 +21,7 @@ #include <linux/atomic.h> #include <linux/list.h> -#include <mali_kbase_fence_defs.h> +#include <mali_kbase_fence.h> #include <mali_kbase.h> static const char * @@ -31,7 +31,7 @@ kbase_fence_get_driver_name(struct fence *fence) kbase_fence_get_driver_name(struct dma_fence *fence) #endif { - return kbase_drv_name; + return KBASE_DRV_NAME; } static const char * @@ -41,7 +41,13 @@ kbase_fence_get_timeline_name(struct fence *fence) kbase_fence_get_timeline_name(struct dma_fence *fence) #endif { - return kbase_timeline_name; +#if MALI_USE_CSF + struct kbase_kcpu_dma_fence *kcpu_fence = (struct kbase_kcpu_dma_fence *)fence; + + return kcpu_fence->metadata->timeline_name; +#else + return KBASE_TIMELINE_NAME; +#endif /* MALI_USE_CSF */ } static bool @@ -62,22 +68,44 @@ kbase_fence_fence_value_str(struct dma_fence *fence, char *str, int size) #endif { #if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - snprintf(str, size, "%u", fence->seqno); + const char *format = "%u"; #else - snprintf(str, size, "%llu", fence->seqno); + const char *format = "%llu"; #endif + if (unlikely(!scnprintf(str, size, format, fence->seqno))) + pr_err("Fail to encode fence seqno to string"); } +#if MALI_USE_CSF +static void #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) -const struct fence_ops kbase_fence_ops = { - .wait = fence_default_wait, +kbase_fence_release(struct fence *fence) #else -const struct dma_fence_ops kbase_fence_ops = { - .wait = dma_fence_default_wait, +kbase_fence_release(struct dma_fence *fence) +#endif +{ + struct kbase_kcpu_dma_fence *kcpu_fence = (struct kbase_kcpu_dma_fence *)fence; + + kbase_kcpu_dma_fence_meta_put(kcpu_fence->metadata); + kfree(kcpu_fence); +} #endif - .get_driver_name = kbase_fence_get_driver_name, - .get_timeline_name = kbase_fence_get_timeline_name, - .enable_signaling = kbase_fence_enable_signaling, - .fence_value_str = kbase_fence_fence_value_str -}; +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) +extern const struct fence_ops kbase_fence_ops; /* silence checker warning */ +const struct fence_ops 
kbase_fence_ops = { .wait = fence_default_wait, +#else +extern const struct dma_fence_ops kbase_fence_ops; /* silence checker warning */ +const struct dma_fence_ops kbase_fence_ops = { .wait = dma_fence_default_wait, +#endif + .get_driver_name = kbase_fence_get_driver_name, + .get_timeline_name = kbase_fence_get_timeline_name, + .enable_signaling = kbase_fence_enable_signaling, +#if MALI_USE_CSF + .fence_value_str = kbase_fence_fence_value_str, + .release = kbase_fence_release +#else + .fence_value_str = kbase_fence_fence_value_str +#endif +}; +KBASE_EXPORT_TEST_API(kbase_fence_ops); diff --git a/mali_kbase/mali_kbase_gpu_metrics.c b/mali_kbase/mali_kbase_gpu_metrics.c new file mode 100644 index 0000000..af3a08d --- /dev/null +++ b/mali_kbase/mali_kbase_gpu_metrics.c @@ -0,0 +1,260 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include "mali_power_gpu_work_period_trace.h" +#include <mali_kbase_gpu_metrics.h> + +/** + * enum gpu_metrics_ctx_flags - Flags for the GPU metrics context + * + * @ACTIVE_INTERVAL_IN_WP: Flag set when the application first becomes active in + * the current work period. + * + * @INSIDE_ACTIVE_LIST: Flag to track if object is in kbase_device::gpu_metrics::active_list + * + * All members need to be separate bits. This enum is intended for use in a + * bitmask where multiple values get OR-ed together. 
+ */ +enum gpu_metrics_ctx_flags { + ACTIVE_INTERVAL_IN_WP = 1 << 0, + INSIDE_ACTIVE_LIST = 1 << 1, +}; + +static inline bool gpu_metrics_ctx_flag(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + return (gpu_metrics_ctx->flags & flag); +} + +static inline void gpu_metrics_ctx_flag_set(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + gpu_metrics_ctx->flags |= flag; +} + +static inline void gpu_metrics_ctx_flag_clear(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + enum gpu_metrics_ctx_flags flag) +{ + gpu_metrics_ctx->flags &= ~flag; +} + +static inline void validate_tracepoint_data(struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, + u64 start_time, u64 end_time, u64 total_active) +{ +#ifdef CONFIG_MALI_DEBUG + WARN(total_active > NSEC_PER_SEC, + "total_active %llu > 1 second for aid %u active_cnt %u", + total_active, gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(start_time >= end_time, + "start_time %llu >= end_time %llu for aid %u active_cnt %u", + start_time, end_time, gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(total_active > (end_time - start_time), + "total_active %llu > end_time %llu - start_time %llu for aid %u active_cnt %u", + total_active, end_time, start_time, + gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); + + WARN(gpu_metrics_ctx->prev_wp_active_end_time > start_time, + "prev_wp_active_end_time %llu > start_time %llu for aid %u active_cnt %u", + gpu_metrics_ctx->prev_wp_active_end_time, start_time, + gpu_metrics_ctx->aid, gpu_metrics_ctx->active_cnt); +#endif +} + +static void emit_tracepoint_for_active_gpu_metrics_ctx(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, u64 current_time) +{ + const u64 start_time = gpu_metrics_ctx->first_active_start_time; + u64 total_active = gpu_metrics_ctx->total_active; + u64 end_time; + + /* Check if the GPU activity is currently ongoing */ + if (gpu_metrics_ctx->active_cnt) { + end_time = current_time; + total_active += + end_time - gpu_metrics_ctx->last_active_start_time; + + gpu_metrics_ctx->first_active_start_time = current_time; + gpu_metrics_ctx->last_active_start_time = current_time; + } else { + end_time = gpu_metrics_ctx->last_active_end_time; + gpu_metrics_ctx_flag_clear(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP); + } + + trace_gpu_work_period(kbdev->id, gpu_metrics_ctx->aid, + start_time, end_time, total_active); + + validate_tracepoint_data(gpu_metrics_ctx, start_time, end_time, total_active); + gpu_metrics_ctx->prev_wp_active_end_time = end_time; + gpu_metrics_ctx->total_active = 0; +} + +void kbase_gpu_metrics_ctx_put(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx) +{ + WARN_ON(list_empty(&gpu_metrics_ctx->link)); + WARN_ON(!gpu_metrics_ctx->kctx_count); + + gpu_metrics_ctx->kctx_count--; + if (gpu_metrics_ctx->kctx_count) + return; + + if (gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) + emit_tracepoint_for_active_gpu_metrics_ctx(kbdev, + gpu_metrics_ctx, ktime_get_raw_ns()); + + list_del_init(&gpu_metrics_ctx->link); + kfree(gpu_metrics_ctx); +} + +struct kbase_gpu_metrics_ctx *kbase_gpu_metrics_ctx_get(struct kbase_device *kbdev, u32 aid) +{ + struct kbase_gpu_metrics *gpu_metrics = &kbdev->gpu_metrics; + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + + list_for_each_entry(gpu_metrics_ctx, &gpu_metrics->active_list, link) { + if (gpu_metrics_ctx->aid == aid) { + WARN_ON(!gpu_metrics_ctx->kctx_count); + gpu_metrics_ctx->kctx_count++; + return 
gpu_metrics_ctx; + } + } + + list_for_each_entry(gpu_metrics_ctx, &gpu_metrics->inactive_list, link) { + if (gpu_metrics_ctx->aid == aid) { + WARN_ON(!gpu_metrics_ctx->kctx_count); + gpu_metrics_ctx->kctx_count++; + return gpu_metrics_ctx; + } + } + + return NULL; +} + +void kbase_gpu_metrics_ctx_init(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, unsigned int aid) +{ + gpu_metrics_ctx->aid = aid; + gpu_metrics_ctx->total_active = 0; + gpu_metrics_ctx->kctx_count = 1; + gpu_metrics_ctx->active_cnt = 0; + gpu_metrics_ctx->prev_wp_active_end_time = 0; + gpu_metrics_ctx->flags = 0; + list_add_tail(&gpu_metrics_ctx->link, &kbdev->gpu_metrics.inactive_list); +} + +void kbase_gpu_metrics_ctx_start_activity(struct kbase_context *kctx, u64 timestamp_ns) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kctx->gpu_metrics_ctx; + + gpu_metrics_ctx->active_cnt++; + if (gpu_metrics_ctx->active_cnt == 1) + gpu_metrics_ctx->last_active_start_time = timestamp_ns; + + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) { + gpu_metrics_ctx->first_active_start_time = timestamp_ns; + gpu_metrics_ctx_flag_set(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP); + } + + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, INSIDE_ACTIVE_LIST)) { + list_move_tail(&gpu_metrics_ctx->link, &kctx->kbdev->gpu_metrics.active_list); + gpu_metrics_ctx_flag_set(gpu_metrics_ctx, INSIDE_ACTIVE_LIST); + } +} + +void kbase_gpu_metrics_ctx_end_activity(struct kbase_context *kctx, u64 timestamp_ns) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kctx->gpu_metrics_ctx; + + if (WARN_ON_ONCE(!gpu_metrics_ctx->active_cnt)) + return; + + if (--gpu_metrics_ctx->active_cnt) + return; + + if (likely(timestamp_ns > gpu_metrics_ctx->last_active_start_time)) { + gpu_metrics_ctx->last_active_end_time = timestamp_ns; + gpu_metrics_ctx->total_active += + timestamp_ns - gpu_metrics_ctx->last_active_start_time; + return; + } + + /* Due to conversion from system timestamp to CPU timestamp (which involves rounding) + * the value for start and end timestamp could come as same. + */ + if (timestamp_ns == gpu_metrics_ctx->last_active_start_time) { + gpu_metrics_ctx->last_active_end_time = timestamp_ns + 1; + gpu_metrics_ctx->total_active += 1; + return; + } + + /* The following check is to detect the situation where 'ACT=0' event was not visible to + * the Kbase even though the system timestamp value sampled by FW was less than the system + * timestamp value sampled by Kbase just before the draining of trace buffer. 
+ */ + if (gpu_metrics_ctx->last_active_start_time == gpu_metrics_ctx->first_active_start_time && + gpu_metrics_ctx->prev_wp_active_end_time == gpu_metrics_ctx->first_active_start_time) { + WARN_ON_ONCE(gpu_metrics_ctx->total_active); + gpu_metrics_ctx->last_active_end_time = + gpu_metrics_ctx->prev_wp_active_end_time + 1; + gpu_metrics_ctx->total_active = 1; + return; + } + + WARN_ON_ONCE(1); +} + +void kbase_gpu_metrics_emit_tracepoint(struct kbase_device *kbdev, u64 ts) +{ + struct kbase_gpu_metrics *gpu_metrics = &kbdev->gpu_metrics; + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, *tmp; + + list_for_each_entry_safe(gpu_metrics_ctx, tmp, &gpu_metrics->active_list, link) { + if (!gpu_metrics_ctx_flag(gpu_metrics_ctx, ACTIVE_INTERVAL_IN_WP)) { + WARN_ON(!gpu_metrics_ctx_flag(gpu_metrics_ctx, INSIDE_ACTIVE_LIST)); + WARN_ON(gpu_metrics_ctx->active_cnt); + list_move_tail(&gpu_metrics_ctx->link, &gpu_metrics->inactive_list); + gpu_metrics_ctx_flag_clear(gpu_metrics_ctx, INSIDE_ACTIVE_LIST); + continue; + } + + emit_tracepoint_for_active_gpu_metrics_ctx(kbdev, gpu_metrics_ctx, ts); + } +} + +int kbase_gpu_metrics_init(struct kbase_device *kbdev) +{ + INIT_LIST_HEAD(&kbdev->gpu_metrics.active_list); + INIT_LIST_HEAD(&kbdev->gpu_metrics.inactive_list); + + dev_info(kbdev->dev, "GPU metrics tracepoint support enabled"); + return 0; +} + +void kbase_gpu_metrics_term(struct kbase_device *kbdev) +{ + WARN_ON_ONCE(!list_empty(&kbdev->gpu_metrics.active_list)); + WARN_ON_ONCE(!list_empty(&kbdev->gpu_metrics.inactive_list)); +} + +#endif diff --git a/mali_kbase/mali_kbase_gpu_metrics.h b/mali_kbase/mali_kbase_gpu_metrics.h new file mode 100644 index 0000000..adc8816 --- /dev/null +++ b/mali_kbase/mali_kbase_gpu_metrics.h @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * DOC: GPU metrics frontend APIs + */ + +#ifndef _KBASE_GPU_METRICS_H_ +#define _KBASE_GPU_METRICS_H_ + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase.h> + +/** + * kbase_gpu_metrics_get_emit_interval() - Return the trace point emission interval. + * + * Return: The time interval in nanosecond for GPU metrics trace point emission. + */ +unsigned long kbase_gpu_metrics_get_emit_interval(void); + +/** + * kbase_gpu_metrics_ctx_put() - Decrement the Kbase context count for the GPU metrics + * context and free it if the count becomes 0. + * + * @kbdev: Pointer to the GPU device. + * @gpu_metrics_ctx: Pointer to the GPU metrics context. + * + * This function must be called when a Kbase context is destroyed. + * The function would decrement the Kbase context count for the GPU metrics context and + * free the memory if the count becomes 0. 
+ * The function would emit a power/gpu_work_period tracepoint for the GPU metrics context + * if there was some GPU activity done for it since the last tracepoint was emitted. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_put(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx); + +/** + * kbase_gpu_metrics_ctx_get() - Increment the Kbase context count for the GPU metrics + * context if it exists. + * + * @kbdev: Pointer to the GPU device. + * @aid: Unique identifier of the Application that is creating the Kbase context. + * + * This function must be called when a Kbase context is created. + * The function would increment the Kbase context count for the GPU metrics context, + * corresponding to the @aid, if it exists. + * + * Return: Pointer to the GPU metrics context corresponding to the @aid if it already + * exists otherwise NULL. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + * The caller shall allocate memory for GPU metrics context structure if the + * function returns NULL. + */ +struct kbase_gpu_metrics_ctx *kbase_gpu_metrics_ctx_get(struct kbase_device *kbdev, u32 aid); + +/** + * kbase_gpu_metrics_ctx_init() - Initialise the GPU metrics context + * + * @kbdev: Pointer to the GPU device. + * @gpu_metrics_ctx: Pointer to the GPU metrics context. + * @aid: Unique identifier of the Application for which GPU metrics + * context needs to be initialized. + * + * This function must be called when a Kbase context is created, after the call to + * kbase_gpu_metrics_ctx_get() returned NULL and memory for the GPU metrics context + * structure was allocated. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_init(struct kbase_device *kbdev, + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx, u32 aid); + +/** + * kbase_gpu_metrics_ctx_start_activity() - Report the start of some GPU activity + * for GPU metrics context. + * + * @kctx: Pointer to the Kbase context contributing data to the GPU metrics context. + * @timestamp_ns: CPU timestamp at which the GPU activity started. + * + * The provided timestamp would be later used as the "start_time_ns" for the + * power/gpu_work_period tracepoint if this is the first GPU activity for the GPU + * metrics context in the current work period. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_ctx_start_activity(struct kbase_context *kctx, u64 timestamp_ns); + +/** + * kbase_gpu_metrics_ctx_end_activity() - Report the end of some GPU activity + * for GPU metrics context. + * + * @kctx: Pointer to the Kbase context contributing data to the GPU metrics context. + * @timestamp_ns: CPU timestamp at which the GPU activity ended. + * + * The provided timestamp would be later used as the "end_time_ns" for the + * power/gpu_work_period tracepoint if this is the last GPU activity for the GPU + * metrics context in the current work period. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. 
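To illustrate the lifecycle these comments describe, a hypothetical caller (not part of the patch; the helper name and the surrounding locking are assumed) would try kbase_gpu_metrics_ctx_get() first and only allocate and initialise a new context when no other Kbase context of the same application already owns one:

    /* Illustrative sketch only - assumes the caller already serializes GPU
     * metrics calls as the comments above require.
     */
    static struct kbase_gpu_metrics_ctx *metrics_ctx_acquire(struct kbase_device *kbdev, u32 aid)
    {
    	struct kbase_gpu_metrics_ctx *gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid);

    	if (gpu_metrics_ctx)
    		return gpu_metrics_ctx;	/* existing context, kctx count already incremented */

    	gpu_metrics_ctx = kzalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL);
    	if (!gpu_metrics_ctx)
    		return NULL;

    	/* A freshly initialised context starts with a kctx count of 1. */
    	kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid);
    	return gpu_metrics_ctx;
    }

On Kbase context destruction the matching call is kbase_gpu_metrics_ctx_put(kbdev, gpu_metrics_ctx), which emits a final work-period tracepoint if there was unreported activity and frees the structure once the last context of that application drops it.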
+ */ +void kbase_gpu_metrics_ctx_end_activity(struct kbase_context *kctx, u64 timestamp_ns); + +/** + * kbase_gpu_metrics_emit_tracepoint() - Emit power/gpu_work_period tracepoint + * for active GPU metrics contexts. + * + * @kbdev: Pointer to the GPU device. + * @ts: Timestamp at which the tracepoint is being emitted. + * + * This function would loop through all the active GPU metrics contexts and emit a + * power/gpu_work_period tracepoint for them. + * The GPU metrics context that is found to be inactive since the last tracepoint + * was emitted would be moved to the inactive list. + * The current work period would be considered as over and a new work period would + * begin whenever any application does the GPU activity. + * + * Note: The caller must appropriately serialize the call to this function with the + * call to other GPU metrics functions declared in this file. + */ +void kbase_gpu_metrics_emit_tracepoint(struct kbase_device *kbdev, u64 ts); + +/** + * kbase_gpu_metrics_init() - Initialise a gpu_metrics instance for a GPU + * + * @kbdev: Pointer to the GPU device. + * + * This function is called once for each @kbdev. + * + * Return: 0 on success, or negative on failure. + */ +int kbase_gpu_metrics_init(struct kbase_device *kbdev); + +/** + * kbase_gpu_metrics_term() - Terminate a gpu_metrics instance + * + * @kbdev: Pointer to the GPU device. + */ +void kbase_gpu_metrics_term(struct kbase_device *kbdev); + +#endif +#endif /* _KBASE_GPU_METRICS_H_ */ diff --git a/mali_kbase/mali_kbase_gpuprops.c b/mali_kbase/mali_kbase_gpuprops.c index 91ef6d1..02d6bb2 100644 --- a/mali_kbase/mali_kbase_gpuprops.c +++ b/mali_kbase/mali_kbase_gpuprops.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -49,7 +49,7 @@ static void kbase_gpuprops_construct_coherent_groups( props->coherency_info.coherency = props->raw_props.mem_features; props->coherency_info.num_core_groups = hweight64(props->raw_props.l2_present); - if (props->coherency_info.coherency & GROUPS_L2_COHERENT) { + if (props->coherency_info.coherency & MEM_FEATURES_COHERENT_CORE_GROUP_MASK) { /* Group is l2 coherent */ group_present = props->raw_props.l2_present; } else { @@ -198,7 +198,6 @@ static int kbase_gpuprops_get_props(struct base_gpu_props * const gpu_props, gpu_props->raw_props.mem_features = regdump.mem_features; gpu_props->raw_props.mmu_features = regdump.mmu_features; gpu_props->raw_props.l2_features = regdump.l2_features; - gpu_props->raw_props.core_features = regdump.core_features; gpu_props->raw_props.as_present = regdump.as_present; gpu_props->raw_props.js_present = regdump.js_present; @@ -312,9 +311,6 @@ static void kbase_gpuprops_calculate_props( struct base_gpu_props * const gpu_props, struct kbase_device *kbdev) { int i; -#if !MALI_USE_CSF - u32 gpu_id; -#endif /* Populate the base_gpu_props structure */ kbase_gpuprops_update_core_props_gpu_id(gpu_props); @@ -326,9 +322,6 @@ static void kbase_gpuprops_calculate_props( totalram_pages() << PAGE_SHIFT; #endif - gpu_props->core_props.num_exec_engines = - KBASE_UBFX32(gpu_props->raw_props.core_features, 0, 4); - for (i = 0; i < BASE_GPU_NUM_TEXTURE_FEATURES_REGISTERS; i++) gpu_props->core_props.texture_features[i] = gpu_props->raw_props.texture_features[i]; @@ -367,51 +360,23 @@ static void kbase_gpuprops_calculate_props( gpu_props->thread_props.tls_alloc = gpu_props->raw_props.thread_tls_alloc; - /* MIDHARC-2364 was intended for tULx. - * Workaround for the incorrectly applied THREAD_FEATURES to tDUx. 
- */ -#if !MALI_USE_CSF - gpu_id = kbdev->gpu_props.props.raw_props.gpu_id; -#endif - #if MALI_USE_CSF - CSTD_UNUSED(gpu_id); gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 22); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 0U, 22); gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 22U, 2); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 22U, 2); gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 8); + KBASE_UBFX32(gpu_props->raw_props.thread_features, 24U, 8); gpu_props->thread_props.max_thread_group_split = 0; #else - if ((gpu_id & GPU_ID2_PRODUCT_MODEL) == GPU_ID2_PRODUCT_TDUX) { - gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 22); - gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 22U, 2); - gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 8); - gpu_props->thread_props.max_thread_group_split = 0; - } else { - gpu_props->thread_props.max_registers = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 0U, 16); - gpu_props->thread_props.max_task_queue = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 16U, 8); - gpu_props->thread_props.max_thread_group_split = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 24U, 6); - gpu_props->thread_props.impl_tech = - KBASE_UBFX32(gpu_props->raw_props.thread_features, - 30U, 2); - } + gpu_props->thread_props.max_registers = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 0U, 16); + gpu_props->thread_props.max_task_queue = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 16U, 8); + gpu_props->thread_props.max_thread_group_split = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 24U, 6); + gpu_props->thread_props.impl_tech = + KBASE_UBFX32(gpu_props->raw_props.thread_features, 30U, 2); #endif /* If values are not specified, then use defaults */ @@ -511,6 +476,21 @@ int kbase_gpuprops_set_features(struct kbase_device *kbdev) if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_THREAD_GROUP_SPLIT)) gpu_props->thread_props.max_thread_group_split = 0; + /* + * The CORE_FEATURES register has different meanings depending on GPU. + * On tGOx, bits[3:0] encode num_exec_engines. + * On CSF GPUs, bits[7:0] is an enumeration that needs to be parsed, + * instead. + * GPUs like tTIx have additional fields like LSC_SIZE that are + * otherwise reserved/RAZ on older GPUs. 
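As a concrete illustration of the non-CSF decode below, assuming KBASE_UBFX32(value, offset, size) is the usual unsigned bit-field extract, i.e. (value >> offset) & ((1u << size) - 1):

    	/* Hypothetical tGOx-style register value: bits[3:0] hold the engine count. */
    	u32 core_features = 0x00000003;
    	u32 num_exec_engines = KBASE_UBFX32(core_features, 0, 4);	/* == 3 */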
+ */ + gpu_props->raw_props.core_features = regdump.core_features; + +#if !MALI_USE_CSF + gpu_props->core_props.num_exec_engines = + KBASE_UBFX32(gpu_props->raw_props.core_features, 0, 4); +#endif + return err; } @@ -532,7 +512,7 @@ MODULE_PARM_DESC(override_l2_hash, "Override L2 hash config for testing"); static u32 l2_hash_values[ASN_HASH_COUNT] = { 0, }; -static int num_override_l2_hash_values; +static unsigned int num_override_l2_hash_values; module_param_array(l2_hash_values, uint, &num_override_l2_hash_values, 0000); MODULE_PARM_DESC(l2_hash_values, "Override L2 hash values config for testing"); @@ -586,7 +566,7 @@ kbase_read_l2_config_from_dt(struct kbase_device *const kbdev) kbdev->l2_hash_values_override = false; if (num_override_l2_hash_values) { - int i; + unsigned int i; kbdev->l2_hash_values_override = true; for (i = 0; i < num_override_l2_hash_values; i++) @@ -670,9 +650,11 @@ int kbase_gpuprops_update_l2_features(struct kbase_device *kbdev) int idx; const bool asn_he = regdump.l2_config & L2_CONFIG_ASN_HASH_ENABLE_MASK; +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) if (!asn_he && kbdev->l2_hash_values_override) dev_err(kbdev->dev, "Failed to use requested ASN_HASH, fallback to default"); +#endif for (idx = 0; idx < ASN_HASH_COUNT; idx++) dev_info(kbdev->dev, "%s ASN_HASH[%d] is [0x%08x]\n", @@ -698,94 +680,102 @@ static struct { #define PROP(name, member) \ {KBASE_GPUPROP_ ## name, offsetof(struct base_gpu_props, member), \ sizeof(((struct base_gpu_props *)0)->member)} - PROP(PRODUCT_ID, core_props.product_id), - PROP(VERSION_STATUS, core_props.version_status), - PROP(MINOR_REVISION, core_props.minor_revision), - PROP(MAJOR_REVISION, core_props.major_revision), - PROP(GPU_FREQ_KHZ_MAX, core_props.gpu_freq_khz_max), - PROP(LOG2_PROGRAM_COUNTER_SIZE, core_props.log2_program_counter_size), - PROP(TEXTURE_FEATURES_0, core_props.texture_features[0]), - PROP(TEXTURE_FEATURES_1, core_props.texture_features[1]), - PROP(TEXTURE_FEATURES_2, core_props.texture_features[2]), - PROP(TEXTURE_FEATURES_3, core_props.texture_features[3]), - PROP(GPU_AVAILABLE_MEMORY_SIZE, core_props.gpu_available_memory_size), - PROP(NUM_EXEC_ENGINES, core_props.num_exec_engines), - - PROP(L2_LOG2_LINE_SIZE, l2_props.log2_line_size), - PROP(L2_LOG2_CACHE_SIZE, l2_props.log2_cache_size), - PROP(L2_NUM_L2_SLICES, l2_props.num_l2_slices), - - PROP(TILER_BIN_SIZE_BYTES, tiler_props.bin_size_bytes), - PROP(TILER_MAX_ACTIVE_LEVELS, tiler_props.max_active_levels), - - PROP(MAX_THREADS, thread_props.max_threads), - PROP(MAX_WORKGROUP_SIZE, thread_props.max_workgroup_size), - PROP(MAX_BARRIER_SIZE, thread_props.max_barrier_size), - PROP(MAX_REGISTERS, thread_props.max_registers), - PROP(MAX_TASK_QUEUE, thread_props.max_task_queue), - PROP(MAX_THREAD_GROUP_SPLIT, thread_props.max_thread_group_split), - PROP(IMPL_TECH, thread_props.impl_tech), - PROP(TLS_ALLOC, thread_props.tls_alloc), - - PROP(RAW_SHADER_PRESENT, raw_props.shader_present), - PROP(RAW_TILER_PRESENT, raw_props.tiler_present), - PROP(RAW_L2_PRESENT, raw_props.l2_present), - PROP(RAW_STACK_PRESENT, raw_props.stack_present), - PROP(RAW_L2_FEATURES, raw_props.l2_features), - PROP(RAW_CORE_FEATURES, raw_props.core_features), - PROP(RAW_MEM_FEATURES, raw_props.mem_features), - PROP(RAW_MMU_FEATURES, raw_props.mmu_features), - PROP(RAW_AS_PRESENT, raw_props.as_present), - PROP(RAW_JS_PRESENT, raw_props.js_present), - PROP(RAW_JS_FEATURES_0, raw_props.js_features[0]), - PROP(RAW_JS_FEATURES_1, raw_props.js_features[1]), - PROP(RAW_JS_FEATURES_2, 
raw_props.js_features[2]), - PROP(RAW_JS_FEATURES_3, raw_props.js_features[3]), - PROP(RAW_JS_FEATURES_4, raw_props.js_features[4]), - PROP(RAW_JS_FEATURES_5, raw_props.js_features[5]), - PROP(RAW_JS_FEATURES_6, raw_props.js_features[6]), - PROP(RAW_JS_FEATURES_7, raw_props.js_features[7]), - PROP(RAW_JS_FEATURES_8, raw_props.js_features[8]), - PROP(RAW_JS_FEATURES_9, raw_props.js_features[9]), - PROP(RAW_JS_FEATURES_10, raw_props.js_features[10]), - PROP(RAW_JS_FEATURES_11, raw_props.js_features[11]), - PROP(RAW_JS_FEATURES_12, raw_props.js_features[12]), - PROP(RAW_JS_FEATURES_13, raw_props.js_features[13]), - PROP(RAW_JS_FEATURES_14, raw_props.js_features[14]), - PROP(RAW_JS_FEATURES_15, raw_props.js_features[15]), - PROP(RAW_TILER_FEATURES, raw_props.tiler_features), - PROP(RAW_TEXTURE_FEATURES_0, raw_props.texture_features[0]), - PROP(RAW_TEXTURE_FEATURES_1, raw_props.texture_features[1]), - PROP(RAW_TEXTURE_FEATURES_2, raw_props.texture_features[2]), - PROP(RAW_TEXTURE_FEATURES_3, raw_props.texture_features[3]), - PROP(RAW_GPU_ID, raw_props.gpu_id), - PROP(RAW_THREAD_MAX_THREADS, raw_props.thread_max_threads), - PROP(RAW_THREAD_MAX_WORKGROUP_SIZE, - raw_props.thread_max_workgroup_size), + PROP(PRODUCT_ID, core_props.product_id), + PROP(VERSION_STATUS, core_props.version_status), + PROP(MINOR_REVISION, core_props.minor_revision), + PROP(MAJOR_REVISION, core_props.major_revision), + PROP(GPU_FREQ_KHZ_MAX, core_props.gpu_freq_khz_max), + PROP(LOG2_PROGRAM_COUNTER_SIZE, core_props.log2_program_counter_size), + PROP(TEXTURE_FEATURES_0, core_props.texture_features[0]), + PROP(TEXTURE_FEATURES_1, core_props.texture_features[1]), + PROP(TEXTURE_FEATURES_2, core_props.texture_features[2]), + PROP(TEXTURE_FEATURES_3, core_props.texture_features[3]), + PROP(GPU_AVAILABLE_MEMORY_SIZE, core_props.gpu_available_memory_size), + +#if MALI_USE_CSF +#define BACKWARDS_COMPAT_PROP(name, type) \ + { \ + KBASE_GPUPROP_##name, SIZE_MAX, sizeof(type) \ + } + BACKWARDS_COMPAT_PROP(NUM_EXEC_ENGINES, u8), +#else + PROP(NUM_EXEC_ENGINES, core_props.num_exec_engines), +#endif + + PROP(L2_LOG2_LINE_SIZE, l2_props.log2_line_size), + PROP(L2_LOG2_CACHE_SIZE, l2_props.log2_cache_size), + PROP(L2_NUM_L2_SLICES, l2_props.num_l2_slices), + + PROP(TILER_BIN_SIZE_BYTES, tiler_props.bin_size_bytes), + PROP(TILER_MAX_ACTIVE_LEVELS, tiler_props.max_active_levels), + + PROP(MAX_THREADS, thread_props.max_threads), + PROP(MAX_WORKGROUP_SIZE, thread_props.max_workgroup_size), + PROP(MAX_BARRIER_SIZE, thread_props.max_barrier_size), + PROP(MAX_REGISTERS, thread_props.max_registers), + PROP(MAX_TASK_QUEUE, thread_props.max_task_queue), + PROP(MAX_THREAD_GROUP_SPLIT, thread_props.max_thread_group_split), + PROP(IMPL_TECH, thread_props.impl_tech), + PROP(TLS_ALLOC, thread_props.tls_alloc), + + PROP(RAW_SHADER_PRESENT, raw_props.shader_present), + PROP(RAW_TILER_PRESENT, raw_props.tiler_present), + PROP(RAW_L2_PRESENT, raw_props.l2_present), + PROP(RAW_STACK_PRESENT, raw_props.stack_present), + PROP(RAW_L2_FEATURES, raw_props.l2_features), + PROP(RAW_CORE_FEATURES, raw_props.core_features), + PROP(RAW_MEM_FEATURES, raw_props.mem_features), + PROP(RAW_MMU_FEATURES, raw_props.mmu_features), + PROP(RAW_AS_PRESENT, raw_props.as_present), + PROP(RAW_JS_PRESENT, raw_props.js_present), + PROP(RAW_JS_FEATURES_0, raw_props.js_features[0]), + PROP(RAW_JS_FEATURES_1, raw_props.js_features[1]), + PROP(RAW_JS_FEATURES_2, raw_props.js_features[2]), + PROP(RAW_JS_FEATURES_3, raw_props.js_features[3]), + PROP(RAW_JS_FEATURES_4, 
raw_props.js_features[4]), + PROP(RAW_JS_FEATURES_5, raw_props.js_features[5]), + PROP(RAW_JS_FEATURES_6, raw_props.js_features[6]), + PROP(RAW_JS_FEATURES_7, raw_props.js_features[7]), + PROP(RAW_JS_FEATURES_8, raw_props.js_features[8]), + PROP(RAW_JS_FEATURES_9, raw_props.js_features[9]), + PROP(RAW_JS_FEATURES_10, raw_props.js_features[10]), + PROP(RAW_JS_FEATURES_11, raw_props.js_features[11]), + PROP(RAW_JS_FEATURES_12, raw_props.js_features[12]), + PROP(RAW_JS_FEATURES_13, raw_props.js_features[13]), + PROP(RAW_JS_FEATURES_14, raw_props.js_features[14]), + PROP(RAW_JS_FEATURES_15, raw_props.js_features[15]), + PROP(RAW_TILER_FEATURES, raw_props.tiler_features), + PROP(RAW_TEXTURE_FEATURES_0, raw_props.texture_features[0]), + PROP(RAW_TEXTURE_FEATURES_1, raw_props.texture_features[1]), + PROP(RAW_TEXTURE_FEATURES_2, raw_props.texture_features[2]), + PROP(RAW_TEXTURE_FEATURES_3, raw_props.texture_features[3]), + PROP(RAW_GPU_ID, raw_props.gpu_id), + PROP(RAW_THREAD_MAX_THREADS, raw_props.thread_max_threads), + PROP(RAW_THREAD_MAX_WORKGROUP_SIZE, raw_props.thread_max_workgroup_size), PROP(RAW_THREAD_MAX_BARRIER_SIZE, raw_props.thread_max_barrier_size), - PROP(RAW_THREAD_FEATURES, raw_props.thread_features), - PROP(RAW_COHERENCY_MODE, raw_props.coherency_mode), - PROP(RAW_THREAD_TLS_ALLOC, raw_props.thread_tls_alloc), - PROP(RAW_GPU_FEATURES, raw_props.gpu_features), - PROP(COHERENCY_NUM_GROUPS, coherency_info.num_groups), - PROP(COHERENCY_NUM_CORE_GROUPS, coherency_info.num_core_groups), - PROP(COHERENCY_COHERENCY, coherency_info.coherency), - PROP(COHERENCY_GROUP_0, coherency_info.group[0].core_mask), - PROP(COHERENCY_GROUP_1, coherency_info.group[1].core_mask), - PROP(COHERENCY_GROUP_2, coherency_info.group[2].core_mask), - PROP(COHERENCY_GROUP_3, coherency_info.group[3].core_mask), - PROP(COHERENCY_GROUP_4, coherency_info.group[4].core_mask), - PROP(COHERENCY_GROUP_5, coherency_info.group[5].core_mask), - PROP(COHERENCY_GROUP_6, coherency_info.group[6].core_mask), - PROP(COHERENCY_GROUP_7, coherency_info.group[7].core_mask), - PROP(COHERENCY_GROUP_8, coherency_info.group[8].core_mask), - PROP(COHERENCY_GROUP_9, coherency_info.group[9].core_mask), - PROP(COHERENCY_GROUP_10, coherency_info.group[10].core_mask), - PROP(COHERENCY_GROUP_11, coherency_info.group[11].core_mask), - PROP(COHERENCY_GROUP_12, coherency_info.group[12].core_mask), - PROP(COHERENCY_GROUP_13, coherency_info.group[13].core_mask), - PROP(COHERENCY_GROUP_14, coherency_info.group[14].core_mask), - PROP(COHERENCY_GROUP_15, coherency_info.group[15].core_mask), + PROP(RAW_THREAD_FEATURES, raw_props.thread_features), + PROP(RAW_COHERENCY_MODE, raw_props.coherency_mode), + PROP(RAW_THREAD_TLS_ALLOC, raw_props.thread_tls_alloc), + PROP(RAW_GPU_FEATURES, raw_props.gpu_features), + PROP(COHERENCY_NUM_GROUPS, coherency_info.num_groups), + PROP(COHERENCY_NUM_CORE_GROUPS, coherency_info.num_core_groups), + PROP(COHERENCY_COHERENCY, coherency_info.coherency), + PROP(COHERENCY_GROUP_0, coherency_info.group[0].core_mask), + PROP(COHERENCY_GROUP_1, coherency_info.group[1].core_mask), + PROP(COHERENCY_GROUP_2, coherency_info.group[2].core_mask), + PROP(COHERENCY_GROUP_3, coherency_info.group[3].core_mask), + PROP(COHERENCY_GROUP_4, coherency_info.group[4].core_mask), + PROP(COHERENCY_GROUP_5, coherency_info.group[5].core_mask), + PROP(COHERENCY_GROUP_6, coherency_info.group[6].core_mask), + PROP(COHERENCY_GROUP_7, coherency_info.group[7].core_mask), + PROP(COHERENCY_GROUP_8, coherency_info.group[8].core_mask), + 
PROP(COHERENCY_GROUP_9, coherency_info.group[9].core_mask), + PROP(COHERENCY_GROUP_10, coherency_info.group[10].core_mask), + PROP(COHERENCY_GROUP_11, coherency_info.group[11].core_mask), + PROP(COHERENCY_GROUP_12, coherency_info.group[12].core_mask), + PROP(COHERENCY_GROUP_13, coherency_info.group[13].core_mask), + PROP(COHERENCY_GROUP_14, coherency_info.group[14].core_mask), + PROP(COHERENCY_GROUP_15, coherency_info.group[15].core_mask), #undef PROP }; @@ -805,7 +795,7 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) } kprops->prop_buffer_size = size; - kprops->prop_buffer = kmalloc(size, GFP_KERNEL); + kprops->prop_buffer = kzalloc(size, GFP_KERNEL); if (!kprops->prop_buffer) { kprops->prop_buffer_size = 0; @@ -822,7 +812,14 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) for (i = 0; i < count; i++) { u32 type = gpu_property_mapping[i].type; u8 type_size; - void *field = ((u8 *)props) + gpu_property_mapping[i].offset; + const size_t offset = gpu_property_mapping[i].offset; + const u64 dummy_backwards_compat_value = (u64)0; + const void *field; + + if (likely(offset < sizeof(struct base_gpu_props))) + field = ((const u8 *)props) + offset; + else + field = &dummy_backwards_compat_value; switch (gpu_property_mapping[i].size) { case 1: @@ -848,16 +845,16 @@ int kbase_gpuprops_populate_user_buffer(struct kbase_device *kbdev) switch (type_size) { case KBASE_GPUPROP_VALUE_SIZE_U8: - WRITE_U8(*((u8 *)field)); + WRITE_U8(*((const u8 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U16: - WRITE_U16(*((u16 *)field)); + WRITE_U16(*((const u16 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U32: - WRITE_U32(*((u32 *)field)); + WRITE_U32(*((const u32 *)field)); break; case KBASE_GPUPROP_VALUE_SIZE_U64: - WRITE_U64(*((u64 *)field)); + WRITE_U64(*((const u64 *)field)); break; default: /* Cannot be reached */ WARN_ON(1); diff --git a/mali_kbase/mali_kbase_gwt.c b/mali_kbase/mali_kbase_gwt.c index 16cccee..4914e24 100644 --- a/mali_kbase/mali_kbase_gwt.c +++ b/mali_kbase/mali_kbase_gwt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
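The property table above pairs each KBASE_GPUPROP_* identifier with an offsetof()/sizeof() pair into struct base_gpu_props, and the CSF-only BACKWARDS_COMPAT_PROP entries carry an out-of-range offset (SIZE_MAX) so the populate loop emits a dummy zero instead of reading past the structure. A minimal userspace sketch of that table-driven encoding follows (illustrative only, not from this patch; the demo_* names are invented, and the real loop also writes a property-ID/size header before each value):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_props {
        uint32_t product_id;
        uint8_t num_exec_engines;
};

struct demo_map {
        size_t offset;  /* offsetof() into demo_props, or SIZE_MAX if retired */
        size_t size;    /* sizeof() the field */
};

static size_t demo_encode(const struct demo_props *p, const struct demo_map *map,
                          size_t count, uint8_t *out)
{
        const uint64_t dummy = 0;       /* written for retired (SIZE_MAX) entries */
        size_t i, pos = 0;

        for (i = 0; i < count; i++) {
                const void *field = (map[i].offset < sizeof(*p)) ?
                        (const void *)((const uint8_t *)p + map[i].offset) : &dummy;

                memcpy(out + pos, field, map[i].size);
                pos += map[i].size;
        }
        return pos;
}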
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -53,17 +53,17 @@ static void kbase_gpu_gwt_setup_pages(struct kbase_context *kctx, unsigned long flag) { kbase_gpu_gwt_setup_page_permission(kctx, flag, - rb_first(&(kctx->reg_rbtree_same))); + rb_first(&kctx->reg_zone[SAME_VA_ZONE].reg_rbtree)); kbase_gpu_gwt_setup_page_permission(kctx, flag, - rb_first(&(kctx->reg_rbtree_custom))); + rb_first(&kctx->reg_zone[CUSTOM_VA_ZONE].reg_rbtree)); } int kbase_gpu_gwt_start(struct kbase_context *kctx) { - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (kctx->gwt_enabled) { - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return -EBUSY; } @@ -90,7 +90,7 @@ int kbase_gpu_gwt_start(struct kbase_context *kctx) kbase_gpu_gwt_setup_pages(kctx, ~KBASE_REG_GPU_WR); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return 0; } @@ -125,14 +125,17 @@ int kbase_gpu_gwt_stop(struct kbase_context *kctx) return 0; } - +#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE) +static int list_cmp_function(void *priv, const struct list_head *a, const struct list_head *b) +#else static int list_cmp_function(void *priv, struct list_head *a, struct list_head *b) +#endif { - struct kbasep_gwt_list_element *elementA = container_of(a, - struct kbasep_gwt_list_element, link); - struct kbasep_gwt_list_element *elementB = container_of(b, - struct kbasep_gwt_list_element, link); + const struct kbasep_gwt_list_element *elementA = + container_of(a, struct kbasep_gwt_list_element, link); + const struct kbasep_gwt_list_element *elementB = + container_of(b, struct kbasep_gwt_list_element, link); CSTD_UNUSED(priv); diff --git a/mali_kbase/mali_kbase_hw.c b/mali_kbase/mali_kbase_hw.c index 75e4aaf..b07327a 100644 --- a/mali_kbase/mali_kbase_hw.c +++ b/mali_kbase/mali_kbase_hw.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. 
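list_cmp_function() above uses the usual pattern for list_sort() comparators that must build on both sides of v5.13, where the list_head parameters became const. A minimal sketch of the same pattern with an invented element type (demo_elem) follows, illustrative only and not from this patch:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/types.h>
#include <linux/version.h>

struct demo_elem {
        struct list_head link;
        u64 key;
};

#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE)
static int demo_cmp(void *priv, const struct list_head *a, const struct list_head *b)
#else
static int demo_cmp(void *priv, struct list_head *a, struct list_head *b)
#endif
{
        const struct demo_elem *ea = container_of(a, struct demo_elem, link);
        const struct demo_elem *eb = container_of(b, struct demo_elem, link);

        /* priv is unused here; return <0, 0 or >0 as list_sort() expects */
        return (ea->key > eb->key) - (ea->key < eb->key);
}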
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -68,9 +68,6 @@ void kbase_hw_set_features_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_TBAX: features = base_hw_features_tBAx; break; - case GPU_ID2_PRODUCT_TDUX: - features = base_hw_features_tDUx; - break; case GPU_ID2_PRODUCT_TODX: case GPU_ID2_PRODUCT_LODX: features = base_hw_features_tODx; @@ -85,6 +82,10 @@ void kbase_hw_set_features_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_LTUX: features = base_hw_features_tTUx; break; + case GPU_ID2_PRODUCT_TTIX: + case GPU_ID2_PRODUCT_LTIX: + features = base_hw_features_tTIx; + break; default: features = base_hw_features_generic; break; @@ -137,8 +138,7 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( static const struct base_hw_product base_hw_products[] = { { GPU_ID2_PRODUCT_TMIX, - { { GPU_ID2_VERSION_MAKE(0, 0, 1), - base_hw_issues_tMIx_r0p0_05dev0 }, + { { GPU_ID2_VERSION_MAKE(0, 0, 1), base_hw_issues_tMIx_r0p0_05dev0 }, { GPU_ID2_VERSION_MAKE(0, 0, 2), base_hw_issues_tMIx_r0p0 }, { GPU_ID2_VERSION_MAKE(0, 1, 0), base_hw_issues_tMIx_r0p1 }, { U32_MAX /* sentinel value */, NULL } } }, @@ -208,10 +208,6 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( { GPU_ID2_VERSION_MAKE(0, 0, 2), base_hw_issues_tBAx_r0p0 }, { U32_MAX, NULL } } }, - { GPU_ID2_PRODUCT_TDUX, - { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tDUx_r0p0 }, - { U32_MAX, NULL } } }, - { GPU_ID2_PRODUCT_TODX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tODx_r0p0 }, { GPU_ID2_VERSION_MAKE(0, 0, 4), base_hw_issues_tODx_r0p0 }, @@ -232,12 +228,27 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( { GPU_ID2_PRODUCT_TTUX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTUx_r0p0 }, + { GPU_ID2_VERSION_MAKE(0, 1, 0), base_hw_issues_tTUx_r0p1 }, { GPU_ID2_VERSION_MAKE(1, 0, 0), base_hw_issues_tTUx_r1p0 }, + { GPU_ID2_VERSION_MAKE(1, 1, 0), base_hw_issues_tTUx_r1p1 }, + { GPU_ID2_VERSION_MAKE(1, 2, 0), base_hw_issues_tTUx_r1p2 }, + { GPU_ID2_VERSION_MAKE(1, 3, 0), base_hw_issues_tTUx_r1p3 }, { U32_MAX, NULL } } }, { GPU_ID2_PRODUCT_LTUX, { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTUx_r0p0 }, { GPU_ID2_VERSION_MAKE(1, 0, 0), base_hw_issues_tTUx_r1p0 }, + { GPU_ID2_VERSION_MAKE(1, 1, 0), base_hw_issues_tTUx_r1p1 }, + { GPU_ID2_VERSION_MAKE(1, 2, 0), base_hw_issues_tTUx_r1p2 }, + { GPU_ID2_VERSION_MAKE(1, 3, 0), base_hw_issues_tTUx_r1p3 }, + { U32_MAX, NULL } } }, + + { GPU_ID2_PRODUCT_TTIX, + { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTIx_r0p0 }, + { U32_MAX, NULL } } }, + + { GPU_ID2_PRODUCT_LTIX, + { { GPU_ID2_VERSION_MAKE(0, 0, 0), base_hw_issues_tTIx_r0p0 }, { U32_MAX, NULL } } }, }; @@ -294,25 +305,20 @@ static const enum base_hw_issue *kbase_hw_get_issues_for_new_id( */ issues = fallback_issues; -#if MALI_CUSTOMER_RELEASE - dev_warn(kbdev->dev, - "GPU hardware issue table may need updating:\n" -#else - dev_info(kbdev->dev, -#endif - "r%dp%d status %d is unknown; treating as r%dp%d status %d", - (gpu_id & GPU_ID2_VERSION_MAJOR) >> - GPU_ID2_VERSION_MAJOR_SHIFT, - (gpu_id & GPU_ID2_VERSION_MINOR) >> - GPU_ID2_VERSION_MINOR_SHIFT, - (gpu_id & GPU_ID2_VERSION_STATUS) >> - GPU_ID2_VERSION_STATUS_SHIFT, - (fallback_version & GPU_ID2_VERSION_MAJOR) >> - GPU_ID2_VERSION_MAJOR_SHIFT, - (fallback_version & GPU_ID2_VERSION_MINOR) >> - GPU_ID2_VERSION_MINOR_SHIFT, - (fallback_version & GPU_ID2_VERSION_STATUS) >> - GPU_ID2_VERSION_STATUS_SHIFT); 
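The product tables above terminate each version list with a U32_MAX sentinel, and the new dev_notice() messages describe falling back to the closest known version when no exact match exists. The sketch below shows the shape of such a sentinel-terminated lookup (illustrative only; the "closest earlier entry" policy is a simplifying assumption, not necessarily the driver's exact rule):

#include <stddef.h>
#include <stdint.h>

struct demo_version_map {
        uint32_t version;       /* packed major/minor/status, UINT32_MAX = sentinel */
        const char *issues;     /* stand-in for the hardware-issues list */
};

static const char *demo_lookup(const struct demo_version_map *map, uint32_t version)
{
        const struct demo_version_map *fallback = NULL;
        size_t i;

        for (i = 0; map[i].version != UINT32_MAX; i++) {
                if (map[i].version == version)
                        return map[i].issues;           /* exact match */
                if (map[i].version < version)
                        fallback = &map[i];             /* remember closest earlier entry */
        }
        return fallback ? fallback->issues : NULL;      /* fallback, or unknown GPU */
}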
+ dev_notice(kbdev->dev, "r%dp%d status %d not found in HW issues table;\n", + (gpu_id & GPU_ID2_VERSION_MAJOR) >> GPU_ID2_VERSION_MAJOR_SHIFT, + (gpu_id & GPU_ID2_VERSION_MINOR) >> GPU_ID2_VERSION_MINOR_SHIFT, + (gpu_id & GPU_ID2_VERSION_STATUS) >> + GPU_ID2_VERSION_STATUS_SHIFT); + dev_notice(kbdev->dev, "falling back to closest match: r%dp%d status %d\n", + (fallback_version & GPU_ID2_VERSION_MAJOR) >> + GPU_ID2_VERSION_MAJOR_SHIFT, + (fallback_version & GPU_ID2_VERSION_MINOR) >> + GPU_ID2_VERSION_MINOR_SHIFT, + (fallback_version & GPU_ID2_VERSION_STATUS) >> + GPU_ID2_VERSION_STATUS_SHIFT); + dev_notice(kbdev->dev, + "Execution proceeding normally with fallback match\n"); gpu_id &= ~GPU_ID2_VERSION; gpu_id |= fallback_version; @@ -338,7 +344,7 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) issues = kbase_hw_get_issues_for_new_id(kbdev); if (issues == NULL) { dev_err(kbdev->dev, - "Unknown GPU ID %x", gpu_id); + "HW product - Unknown GPU ID %x", gpu_id); return -EINVAL; } @@ -382,9 +388,6 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_TBAX: issues = base_hw_issues_model_tBAx; break; - case GPU_ID2_PRODUCT_TDUX: - issues = base_hw_issues_model_tDUx; - break; case GPU_ID2_PRODUCT_TODX: case GPU_ID2_PRODUCT_LODX: issues = base_hw_issues_model_tODx; @@ -399,10 +402,13 @@ int kbase_hw_set_issues_mask(struct kbase_device *kbdev) case GPU_ID2_PRODUCT_LTUX: issues = base_hw_issues_model_tTUx; break; - + case GPU_ID2_PRODUCT_TTIX: + case GPU_ID2_PRODUCT_LTIX: + issues = base_hw_issues_model_tTIx; + break; default: dev_err(kbdev->dev, - "Unknown GPU ID %x", gpu_id); + "HW issues - Unknown GPU ID %x", gpu_id); return -EINVAL; } } diff --git a/mali_kbase/mali_kbase_hwaccess_jm.h b/mali_kbase/mali_kbase_hwaccess_jm.h index 95d7624..ca77c19 100644 --- a/mali_kbase/mali_kbase_hwaccess_jm.h +++ b/mali_kbase/mali_kbase_hwaccess_jm.h @@ -97,8 +97,8 @@ bool kbase_backend_use_ctx(struct kbase_device *kbdev, * Return: true if context is now active, false otherwise (ie if context does * not have an address space assigned) */ -bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, - struct kbase_context *kctx, int js); +bool kbase_backend_use_ctx_sched(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js); /** * kbase_backend_release_ctx_irq - Release a context from the GPU. This will @@ -183,8 +183,7 @@ void kbase_backend_reset(struct kbase_device *kbdev, ktime_t *end_timestamp); * * Return: Atom currently at the head of slot @js, or NULL */ -struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, - int js); +struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_nr_atoms_on_slot() - Return the number of atoms currently on a @@ -194,7 +193,7 @@ struct kbase_jd_atom *kbase_backend_inspect_tail(struct kbase_device *kbdev, * * Return: Number of atoms currently on slot */ -int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js); +int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_nr_atoms_submitted() - Return the number of atoms on a slot @@ -204,7 +203,7 @@ int kbase_backend_nr_atoms_on_slot(struct kbase_device *kbdev, int js); * * Return: Number of atoms currently on slot @js that are currently on the GPU. 
*/ -int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, int js); +int kbase_backend_nr_atoms_submitted(struct kbase_device *kbdev, unsigned int js); /** * kbase_backend_ctx_count_changed() - Number of contexts ready to submit jobs @@ -233,10 +232,10 @@ void kbase_backend_timeouts_changed(struct kbase_device *kbdev); * * Return: Number of jobs that can be submitted. */ -int kbase_backend_slot_free(struct kbase_device *kbdev, int js); +int kbase_backend_slot_free(struct kbase_device *kbdev, unsigned int js); /** - * kbase_job_check_enter_disjoint - potentially leave disjoint state + * kbase_job_check_leave_disjoint - potentially leave disjoint state * @kbdev: kbase device * @target_katom: atom which is finishing * @@ -287,8 +286,8 @@ u32 kbase_backend_get_current_flush_id(struct kbase_device *kbdev); * Context: * The job slot lock must be held when calling this function. */ -void kbase_job_slot_hardstop(struct kbase_context *kctx, int js, - struct kbase_jd_atom *target_katom); +void kbase_job_slot_hardstop(struct kbase_context *kctx, unsigned int js, + struct kbase_jd_atom *target_katom); /** * kbase_gpu_atoms_submitted_any() - Inspect whether there are any atoms diff --git a/mali_kbase/mali_kbase_hwaccess_pm.h b/mali_kbase/mali_kbase_hwaccess_pm.h index 1c153c4..effb2ff 100644 --- a/mali_kbase/mali_kbase_hwaccess_pm.h +++ b/mali_kbase/mali_kbase_hwaccess_pm.h @@ -209,7 +209,7 @@ int kbase_pm_list_policies(struct kbase_device *kbdev, const struct kbase_pm_policy * const **list); /** - * kbase_protected_most_enable - Enable protected mode + * kbase_pm_protected_mode_enable() - Enable protected mode * * @kbdev: Address of the instance of a GPU platform device. * @@ -218,7 +218,7 @@ int kbase_pm_list_policies(struct kbase_device *kbdev, int kbase_pm_protected_mode_enable(struct kbase_device *kbdev); /** - * kbase_protected_mode_disable - Disable protected mode + * kbase_pm_protected_mode_disable() - Disable protected mode * * @kbdev: Address of the instance of a GPU platform device. * diff --git a/mali_kbase/mali_kbase_hwaccess_time.h b/mali_kbase/mali_kbase_hwaccess_time.h index 27e2cb7..f16348f 100644 --- a/mali_kbase/mali_kbase_hwaccess_time.h +++ b/mali_kbase/mali_kbase_hwaccess_time.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2018-2021, 2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,6 +23,56 @@ #define _KBASE_BACKEND_TIME_H_ /** + * struct kbase_backend_time - System timestamp attributes. + * + * @multiplier: Numerator of the converter's fraction. + * @divisor: Denominator of the converter's fraction. + * @offset: Converter's offset term. + * @device_scaled_timeouts: Timeouts in milliseconds that were scaled to be + * consistent with the minimum MCU frequency. This + * array caches the results of all of the conversions + * for ease of use later on. + * + * According to Generic timer spec, system timer: + * - Increments at a fixed frequency + * - Starts operating from zero + * + * Hence CPU time is a linear function of System Time. 
+ * + * CPU_ts = alpha * SYS_ts + beta + * + * Where + * - alpha = 10^9/SYS_ts_freq + * - beta is calculated by two timer samples taken at the same time: + * beta = CPU_ts_s - SYS_ts_s * alpha + * + * Since alpha is a rational number, we minimizing possible + * rounding error by simplifying the ratio. Thus alpha is stored + * as a simple `multiplier / divisor` ratio. + * + */ +struct kbase_backend_time { +#if MALI_USE_CSF + u64 multiplier; + u64 divisor; + s64 offset; +#endif + unsigned int device_scaled_timeouts[KBASE_TIMEOUT_SELECTOR_COUNT]; +}; + +#if MALI_USE_CSF +/** + * kbase_backend_time_convert_gpu_to_cpu() - Convert GPU timestamp to CPU timestamp. + * + * @kbdev: Kbase device pointer + * @gpu_ts: System timestamp value to converter. + * + * Return: The CPU timestamp. + */ +u64 __maybe_unused kbase_backend_time_convert_gpu_to_cpu(struct kbase_device *kbdev, u64 gpu_ts); +#endif + +/** * kbase_backend_get_gpu_time() - Get current GPU time * @kbdev: Device pointer * @cycle_counter: Pointer to u64 to store cycle counter in. @@ -47,7 +97,38 @@ void kbase_backend_get_gpu_time_norequest(struct kbase_device *kbdev, u64 *system_time, struct timespec64 *ts); -#endif /* _KBASE_BACKEND_TIME_H_ */ +/** + * kbase_device_set_timeout_ms - Set an unscaled device timeout in milliseconds, + * subject to the maximum timeout constraint. + * + * @kbdev: KBase device pointer. + * @selector: The specific timeout that should be scaled. + * @timeout_ms: The timeout in cycles which should be scaled. + * + * This function writes the absolute timeout in milliseconds to the table of + * precomputed device timeouts, while estabilishing an upped bound on the individual + * timeout of UINT_MAX milliseconds. + */ +void kbase_device_set_timeout_ms(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + unsigned int timeout_ms); + +/** + * kbase_device_set_timeout - Calculate the given timeout using the provided + * timeout cycles and multiplier. + * + * @kbdev: KBase device pointer. + * @selector: The specific timeout that should be scaled. + * @timeout_cycles: The timeout in cycles which should be scaled. + * @cycle_multiplier: A multiplier applied to the number of cycles, allowing + * the callsite to scale the minimum timeout based on the + * host device. + * + * This function writes the scaled timeout to the per-device table to avoid + * having to recompute the timeouts every single time that the related methods + * are called. + */ +void kbase_device_set_timeout(struct kbase_device *kbdev, enum kbase_timeout_selector selector, + u64 timeout_cycles, u32 cycle_multiplier); /** * kbase_get_timeout_ms - Choose a timeout value to get a timeout scaled @@ -70,3 +151,17 @@ unsigned int kbase_get_timeout_ms(struct kbase_device *kbdev, * Return: Snapshot of the GPU cycle count register. */ u64 kbase_backend_get_cycle_cnt(struct kbase_device *kbdev); + +/** + * kbase_backend_time_init() - Initialize system timestamp converter. + * + * @kbdev: Kbase device pointer + * + * This function should only be called after GPU is powered-up and + * L2 cached power-up has been initiated. + * + * Return: Zero on success, error code otherwise. 
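The struct kbase_backend_time comment above defines CPU_ts = alpha * SYS_ts + beta, with alpha = 10^9 / SYS_ts_freq stored as a reduced multiplier/divisor pair and beta taken from two samples captured at the same instant. A minimal userspace sketch of that arithmetic follows (illustrative only, not the driver's implementation; it assumes a non-zero timer frequency and ignores 64-bit overflow in the multiply):

#include <stdint.h>

struct demo_time_conv {
        uint64_t multiplier;    /* numerator of alpha */
        uint64_t divisor;       /* denominator of alpha */
        int64_t offset;         /* beta */
};

static uint64_t demo_gcd(uint64_t a, uint64_t b)
{
        while (b) {
                uint64_t t = a % b;
                a = b;
                b = t;
        }
        return a;
}

static void demo_time_conv_init(struct demo_time_conv *c, uint64_t timer_hz,
                                uint64_t cpu_ns_sample, uint64_t sys_ts_sample)
{
        uint64_t g = demo_gcd(1000000000ULL, timer_hz);

        c->multiplier = 1000000000ULL / g;
        c->divisor = timer_hz / g;
        /* beta = CPU_ts_s - SYS_ts_s * alpha, from the paired samples */
        c->offset = (int64_t)cpu_ns_sample -
                    (int64_t)(sys_ts_sample * c->multiplier / c->divisor);
}

static uint64_t demo_gpu_to_cpu_ns(const struct demo_time_conv *c, uint64_t gpu_ts)
{
        /* CPU_ts = alpha * SYS_ts + beta */
        return gpu_ts * c->multiplier / c->divisor + (uint64_t)c->offset;
}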
+ */ +int kbase_backend_time_init(struct kbase_device *kbdev); + +#endif /* _KBASE_BACKEND_TIME_H_ */ diff --git a/mali_kbase/mali_kbase_jd.c b/mali_kbase/mali_kbase_jd.c index 97add10..15e30db 100644 --- a/mali_kbase/mali_kbase_jd.c +++ b/mali_kbase/mali_kbase_jd.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,6 +28,11 @@ #include <linux/version.h> #include <linux/ratelimit.h> #include <linux/priority_control_manager.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/signal.h> +#else +#include <linux/signal.h> +#endif #include <mali_kbase_jm.h> #include <mali_kbase_kinstr_jm.h> @@ -35,7 +40,6 @@ #include <tl/mali_kbase_tracepoints.h> #include <mali_linux_trace.h> -#include "mali_kbase_dma_fence.h" #include <mali_kbase_cs_experimental.h> #include <mali_kbase_caps.h> @@ -82,7 +86,7 @@ static void jd_mark_atom_complete(struct kbase_jd_atom *katom) * Returns whether the JS needs a reschedule. * * Note that the caller must also check the atom status and - * if it is KBASE_JD_ATOM_STATE_COMPLETED must call jd_done_nolock + * if it is KBASE_JD_ATOM_STATE_COMPLETED must call kbase_jd_done_nolock */ static bool jd_run_atom(struct kbase_jd_atom *katom) { @@ -148,7 +152,7 @@ void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom) if (katom->status == KBASE_JD_ATOM_STATE_COMPLETED) { /* The atom has already finished */ - resched |= jd_done_nolock(katom, true); + resched |= kbase_jd_done_nolock(katom, true); } if (resched) @@ -158,15 +162,6 @@ void kbase_jd_dep_clear_locked(struct kbase_jd_atom *katom) void kbase_jd_free_external_resources(struct kbase_jd_atom *katom) { -#ifdef CONFIG_MALI_DMA_FENCE - /* Flush dma-fence workqueue to ensure that any callbacks that may have - * been queued are done before continuing. - * Any successfully completed atom would have had all it's callbacks - * completed before the atom was run, so only flush for failed atoms. 
- */ - if (katom->event_code != BASE_JD_EVENT_DONE) - flush_workqueue(katom->kctx->dma_fence.wq); -#endif /* CONFIG_MALI_DMA_FENCE */ } static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) @@ -174,10 +169,6 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) KBASE_DEBUG_ASSERT(katom); KBASE_DEBUG_ASSERT(katom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES); -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_signal(katom); -#endif /* CONFIG_MALI_DMA_FENCE */ - kbase_gpu_vm_lock(katom->kctx); /* only roll back if extres is non-NULL */ if (katom->extres) { @@ -185,13 +176,7 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) res_no = katom->nr_extres; while (res_no-- > 0) { - struct kbase_mem_phy_alloc *alloc = katom->extres[res_no].alloc; - struct kbase_va_region *reg; - - reg = kbase_region_tracker_find_region_base_address( - katom->kctx, - katom->extres[res_no].gpu_address); - kbase_unmap_external_resource(katom->kctx, reg, alloc); + kbase_unmap_external_resource(katom->kctx, katom->extres[res_no]); } kfree(katom->extres); katom->extres = NULL; @@ -207,26 +192,8 @@ static void kbase_jd_post_external_resources(struct kbase_jd_atom *katom) static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const struct base_jd_atom *user_atom) { - int err_ret_val = -EINVAL; + int err = -EINVAL; u32 res_no; -#ifdef CONFIG_MALI_DMA_FENCE - struct kbase_dma_fence_resv_info info = { - .resv_objs = NULL, - .dma_fence_resv_count = 0, - .dma_fence_excl_bitmap = NULL - }; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) - /* - * When both dma-buf fence and Android native sync is enabled, we - * disable dma-buf fence for contexts that are using Android native - * fences. - */ - const bool implicit_sync = !kbase_ctx_flag(katom->kctx, - KCTX_NO_IMPLICIT_SYNC); -#else /* CONFIG_SYNC || CONFIG_SYNC_FILE*/ - const bool implicit_sync = true; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ -#endif /* CONFIG_MALI_DMA_FENCE */ struct base_external_resource *input_extres; KBASE_DEBUG_ASSERT(katom); @@ -240,68 +207,32 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st if (!katom->extres) return -ENOMEM; - /* copy user buffer to the end of our real buffer. 
- * Make sure the struct sizes haven't changed in a way - * we don't support - */ - BUILD_BUG_ON(sizeof(*input_extres) > sizeof(*katom->extres)); - input_extres = (struct base_external_resource *) - (((unsigned char *)katom->extres) + - (sizeof(*katom->extres) - sizeof(*input_extres)) * - katom->nr_extres); + input_extres = kmalloc_array(katom->nr_extres, sizeof(*input_extres), GFP_KERNEL); + if (!input_extres) { + err = -ENOMEM; + goto failed_input_alloc; + } if (copy_from_user(input_extres, get_compat_pointer(katom->kctx, user_atom->extres_list), sizeof(*input_extres) * katom->nr_extres) != 0) { - err_ret_val = -EINVAL; - goto early_err_out; + err = -EINVAL; + goto failed_input_copy; } -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - info.resv_objs = - kmalloc_array(katom->nr_extres, -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - sizeof(struct reservation_object *), -#else - sizeof(struct dma_resv *), -#endif - GFP_KERNEL); - if (!info.resv_objs) { - err_ret_val = -ENOMEM; - goto early_err_out; - } - - info.dma_fence_excl_bitmap = - kcalloc(BITS_TO_LONGS(katom->nr_extres), - sizeof(unsigned long), GFP_KERNEL); - if (!info.dma_fence_excl_bitmap) { - err_ret_val = -ENOMEM; - goto early_err_out; - } - } -#endif /* CONFIG_MALI_DMA_FENCE */ - /* Take the processes mmap lock */ down_read(kbase_mem_get_process_mmap_lock()); /* need to keep the GPU VM locked while we set up UMM buffers */ kbase_gpu_vm_lock(katom->kctx); for (res_no = 0; res_no < katom->nr_extres; res_no++) { - struct base_external_resource *res = &input_extres[res_no]; + struct base_external_resource *user_res = &input_extres[res_no]; struct kbase_va_region *reg; - struct kbase_mem_phy_alloc *alloc; -#ifdef CONFIG_MALI_DMA_FENCE - bool exclusive; - exclusive = (res->ext_resource & BASE_EXT_RES_ACCESS_EXCLUSIVE) - ? true : false; -#endif reg = kbase_region_tracker_find_region_enclosing_address( - katom->kctx, - res->ext_resource & ~BASE_EXT_RES_ACCESS_EXCLUSIVE); + katom->kctx, user_res->ext_resource & ~BASE_EXT_RES_ACCESS_EXCLUSIVE); /* did we find a matching region object? */ - if (kbase_is_region_invalid_or_free(reg)) { + if (unlikely(kbase_is_region_invalid_or_free(reg))) { /* roll back */ goto failed_loop; } @@ -311,36 +242,11 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st katom->atom_flags |= KBASE_KATOM_FLAG_PROTECTED; } - alloc = kbase_map_external_resource(katom->kctx, reg, - current->mm); - if (!alloc) { - err_ret_val = -EINVAL; + err = kbase_map_external_resource(katom->kctx, reg, current->mm); + if (err) goto failed_loop; - } - -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync && - reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM) { -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) - struct reservation_object *resv; -#else - struct dma_resv *resv; -#endif - resv = reg->gpu_alloc->imported.umm.dma_buf->resv; - if (resv) - kbase_dma_fence_add_reservation(resv, &info, - exclusive); - } -#endif /* CONFIG_MALI_DMA_FENCE */ - /* finish with updating out array with the data we found */ - /* NOTE: It is important that this is the last thing we do (or - * at least not before the first write) as we overwrite elements - * as we loop and could be overwriting ourself, so no writes - * until the last read for an element. 
- */ - katom->extres[res_no].gpu_address = reg->start_pfn << PAGE_SHIFT; /* save the start_pfn (as an address, not pfn) to use fast lookup later */ - katom->extres[res_no].alloc = alloc; + katom->extres[res_no] = reg; } /* successfully parsed the extres array */ /* drop the vm lock now */ @@ -349,57 +255,33 @@ static int kbase_jd_pre_external_resources(struct kbase_jd_atom *katom, const st /* Release the processes mmap lock */ up_read(kbase_mem_get_process_mmap_lock()); -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - if (info.dma_fence_resv_count) { - int ret; - - ret = kbase_dma_fence_wait(katom, &info); - if (ret < 0) - goto failed_dma_fence_setup; - } - - kfree(info.resv_objs); - kfree(info.dma_fence_excl_bitmap); - } -#endif /* CONFIG_MALI_DMA_FENCE */ + /* Free the buffer holding data from userspace */ + kfree(input_extres); /* all done OK */ return 0; /* error handling section */ - -#ifdef CONFIG_MALI_DMA_FENCE -failed_dma_fence_setup: - /* Lock the processes mmap lock */ - down_read(kbase_mem_get_process_mmap_lock()); - - /* lock before we unmap */ - kbase_gpu_vm_lock(katom->kctx); -#endif - - failed_loop: - /* undo the loop work */ +failed_loop: + /* undo the loop work. We are guaranteed to have access to the VA region + * as we hold a reference to it until it's unmapped + */ while (res_no-- > 0) { - struct kbase_mem_phy_alloc *alloc = katom->extres[res_no].alloc; + struct kbase_va_region *reg = katom->extres[res_no]; - kbase_unmap_external_resource(katom->kctx, NULL, alloc); + kbase_unmap_external_resource(katom->kctx, reg); } kbase_gpu_vm_unlock(katom->kctx); /* Release the processes mmap lock */ up_read(kbase_mem_get_process_mmap_lock()); - early_err_out: +failed_input_copy: + kfree(input_extres); +failed_input_alloc: kfree(katom->extres); katom->extres = NULL; -#ifdef CONFIG_MALI_DMA_FENCE - if (implicit_sync) { - kfree(info.resv_objs); - kfree(info.dma_fence_excl_bitmap); - } -#endif - return err_ret_val; + return err; } static inline void jd_resolve_dep(struct list_head *out_list, @@ -422,10 +304,6 @@ static inline void jd_resolve_dep(struct list_head *out_list, if (katom->event_code != BASE_JD_EVENT_DONE && (dep_type != BASE_JD_DEP_TYPE_ORDER)) { -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_cancel_callbacks(dep_atom); -#endif - dep_atom->event_code = katom->event_code; KBASE_DEBUG_ASSERT(dep_atom->status != KBASE_JD_ATOM_STATE_UNUSED); @@ -439,35 +317,8 @@ static inline void jd_resolve_dep(struct list_head *out_list, (IS_GPU_ATOM(dep_atom) && !ctx_is_dying && !dep_atom->will_fail_event_code && !other_dep_atom->will_fail_event_code))) { - bool dep_satisfied = true; -#ifdef CONFIG_MALI_DMA_FENCE - int dep_count; - - dep_count = kbase_fence_dep_count_read(dep_atom); - if (likely(dep_count == -1)) { - dep_satisfied = true; - } else { - /* - * There are either still active callbacks, or - * all fences for this @dep_atom has signaled, - * but the worker that will queue the atom has - * not yet run. - * - * Wait for the fences to signal and the fence - * worker to run and handle @dep_atom. If - * @dep_atom was completed due to error on - * @katom, then the fence worker will pick up - * the complete status and error code set on - * @dep_atom above. 
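The failed_loop path above unwinds exactly the mappings that succeeded before the failure by walking res_no back down. That partial-rollback idiom is sketched below with invented demo_* names (illustrative only, not from this patch):

#include <stddef.h>

struct demo_res {
        int mapped;
};

static int demo_map_one(struct demo_res *r)
{
        r->mapped = 1;
        return 0;       /* a real implementation could fail here */
}

static void demo_unmap_one(struct demo_res *r)
{
        r->mapped = 0;
}

static int demo_map_all(struct demo_res *res, size_t n)
{
        size_t i;
        int err = 0;

        for (i = 0; i < n; i++) {
                err = demo_map_one(&res[i]);
                if (err)
                        goto undo;
        }
        return 0;

undo:
        while (i-- > 0)         /* unmap only the entries that were mapped */
                demo_unmap_one(&res[i]);
        return err;
}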
- */ - dep_satisfied = false; - } -#endif /* CONFIG_MALI_DMA_FENCE */ - - if (dep_satisfied) { - dep_atom->in_jd_list = true; - list_add_tail(&dep_atom->jd_item, out_list); - } + dep_atom->in_jd_list = true; + list_add_tail(&dep_atom->jd_item, out_list); } } } @@ -526,33 +377,8 @@ static void jd_try_submitting_deps(struct list_head *out_list, dep_atom->dep[0].atom); bool dep1_valid = is_dep_valid( dep_atom->dep[1].atom); - bool dep_satisfied = true; -#ifdef CONFIG_MALI_DMA_FENCE - int dep_count; - - dep_count = kbase_fence_dep_count_read( - dep_atom); - if (likely(dep_count == -1)) { - dep_satisfied = true; - } else { - /* - * There are either still active callbacks, or - * all fences for this @dep_atom has signaled, - * but the worker that will queue the atom has - * not yet run. - * - * Wait for the fences to signal and the fence - * worker to run and handle @dep_atom. If - * @dep_atom was completed due to error on - * @katom, then the fence worker will pick up - * the complete status and error code set on - * @dep_atom above. - */ - dep_satisfied = false; - } -#endif /* CONFIG_MALI_DMA_FENCE */ - if (dep0_valid && dep1_valid && dep_satisfied) { + if (dep0_valid && dep1_valid) { dep_atom->in_jd_list = true; list_add(&dep_atom->jd_item, out_list); } @@ -780,10 +606,13 @@ static void jd_mark_simple_gfx_frame_atoms(struct kbase_jd_atom *katom) } if (dep_fence && dep_vtx) { + unsigned long flags; dev_dbg(kbdev->dev, "Simple gfx frame: {vtx=%pK, wait=%pK}->frag=%pK\n", dep_vtx, dep_fence, katom); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); katom->atom_flags |= KBASE_KATOM_FLAG_SIMPLE_FRAME_FRAGMENT; dep_vtx->atom_flags |= KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } } @@ -796,7 +625,7 @@ static void jd_mark_simple_gfx_frame_atoms(struct kbase_jd_atom *katom) * * The caller must hold the kbase_jd_context.lock. 
*/ -bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) +bool kbase_jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) { struct kbase_context *kctx = katom->kctx; struct list_head completed_jobs; @@ -804,6 +633,8 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) bool need_to_try_schedule_context = false; int i; + lockdep_assert_held(&kctx->jctx.lock); + KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START(kctx->kbdev, katom); INIT_LIST_HEAD(&completed_jobs); @@ -855,14 +686,15 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) dev_dbg(kctx->kbdev->dev, "Simple-frame fragment atom %pK unblocked\n", node); - node->atom_flags &= - ~KBASE_KATOM_FLAG_SIMPLE_FRAME_FRAGMENT; for (i = 0; i < 2; i++) { if (node->dep[i].atom && node->dep[i].atom->atom_flags & KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF) { + unsigned long flags; + spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); node->dep[i].atom->atom_flags &= ~KBASE_KATOM_FLAG_DEFER_WHILE_POWEROFF; + spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); dev_dbg(kctx->kbdev->dev, " Undeferred atom %pK\n", node->dep[i].atom); @@ -936,7 +768,7 @@ bool jd_done_nolock(struct kbase_jd_atom *katom, bool post_immediately) return need_to_try_schedule_context; } -KBASE_EXPORT_TEST_API(jd_done_nolock); +KBASE_EXPORT_TEST_API(kbase_jd_done_nolock); #if IS_ENABLED(CONFIG_GPU_TRACEPOINTS) enum { @@ -1044,7 +876,6 @@ static bool jd_submit_atom(struct kbase_context *const kctx, katom->jobslot = user_atom->jobslot; katom->seq_nr = user_atom->seq_nr; katom->atom_flags = 0; - katom->retry_count = 0; katom->need_cache_flush_cores_retained = 0; katom->pre_dep = NULL; katom->post_dep = NULL; @@ -1078,9 +909,6 @@ static bool jd_submit_atom(struct kbase_context *const kctx, INIT_LIST_HEAD(&katom->queue); INIT_LIST_HEAD(&katom->jd_item); -#ifdef CONFIG_MALI_DMA_FENCE - kbase_fence_dep_count_set(katom, -1); -#endif /* Don't do anything if there is a mess up with dependencies. * This is done in a separate cycle to check both the dependencies at ones, otherwise @@ -1105,7 +933,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, * dependencies. */ jd_trace_atom_submit(kctx, katom, NULL); - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } } @@ -1169,7 +997,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (err >= 0) kbase_finish_soft_job(katom); } - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } katom->will_fail_event_code = katom->event_code; @@ -1195,7 +1023,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, /* Create a new atom. 
*/ jd_trace_atom_submit(kctx, katom, &katom->sched_priority); -#if !MALI_INCREMENTAL_RENDERING +#if !MALI_INCREMENTAL_RENDERING_JM /* Reject atoms for incremental rendering if not supported */ if (katom->core_req & (BASE_JD_REQ_START_RENDERPASS|BASE_JD_REQ_END_RENDERPASS)) { @@ -1203,9 +1031,9 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with unsupported core_req 0x%x\n", katom->core_req); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } -#endif /* !MALI_INCREMENTAL_RENDERING */ +#endif /* !MALI_INCREMENTAL_RENDERING_JM */ if (katom->core_req & BASE_JD_REQ_END_RENDERPASS) { WARN_ON(katom->jc != 0); @@ -1217,7 +1045,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, */ dev_err(kctx->kbdev->dev, "Rejecting atom with jc = NULL\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject atoms with an invalid device_nr */ @@ -1227,7 +1055,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with invalid device_nr %d\n", katom->device_nr); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject atoms with invalid core requirements */ @@ -1237,7 +1065,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, "Rejecting atom with invalid core requirements\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; katom->core_req &= ~BASE_JD_REQ_EVENT_COALESCE; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } /* Reject soft-job atom of certain types from accessing external resources */ @@ -1248,7 +1076,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, dev_err(kctx->kbdev->dev, "Rejecting soft-job atom accessing external resources\n"); katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } if (katom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES) { @@ -1256,7 +1084,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (kbase_jd_pre_external_resources(katom, user_atom) != 0) { /* setup failed (no access, bad resource, unknown resource types, etc.) */ katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } @@ -1267,7 +1095,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, * JIT IDs - atom is invalid. 
*/ katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ @@ -1281,13 +1109,13 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if ((katom->core_req & BASE_JD_REQ_SOFT_JOB) == 0) { if (!kbase_js_is_atom_valid(kctx->kbdev, katom)) { katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } else { /* Soft-job */ if (kbase_prepare_soft_job(katom) != 0) { katom->event_code = BASE_JD_EVENT_JOB_INVALID; - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } } @@ -1302,16 +1130,10 @@ static bool jd_submit_atom(struct kbase_context *const kctx, if (queued && !IS_GPU_ATOM(katom)) return false; -#ifdef CONFIG_MALI_DMA_FENCE - if (kbase_fence_dep_count_read(katom) != -1) - return false; - -#endif /* CONFIG_MALI_DMA_FENCE */ - if (katom->core_req & BASE_JD_REQ_SOFT_JOB) { if (kbase_process_soft_job(katom) == 0) { kbase_finish_soft_job(katom); - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } return false; } @@ -1341,7 +1163,7 @@ static bool jd_submit_atom(struct kbase_context *const kctx, } /* This is a pure dependency. Resolve it immediately */ - return jd_done_nolock(katom, true); + return kbase_jd_done_nolock(katom, true); } int kbase_jd_submit(struct kbase_context *kctx, @@ -1379,18 +1201,26 @@ int kbase_jd_submit(struct kbase_context *kctx, return -EINVAL; } + if (nr_atoms > BASE_JD_ATOM_COUNT) { + dev_dbg(kbdev->dev, "Invalid attempt to submit %u atoms at once for kctx %d_%d", + nr_atoms, kctx->tgid, kctx->id); + return -EINVAL; + } + /* All atoms submitted in this call have the same flush ID */ latest_flush = kbase_backend_get_current_flush_id(kbdev); for (i = 0; i < nr_atoms; i++) { - struct base_jd_atom user_atom; + struct base_jd_atom user_atom = { + .seq_nr = 0, + }; struct base_jd_fragment user_jc_incr; struct kbase_jd_atom *katom; if (unlikely(jd_atom_is_v2)) { if (copy_from_user(&user_atom.jc, user_addr, sizeof(struct base_jd_atom_v2)) != 0) { dev_dbg(kbdev->dev, - "Invalid atom address %p passed to job_submit\n", + "Invalid atom address %pK passed to job_submit\n", user_addr); err = -EFAULT; break; @@ -1401,7 +1231,7 @@ int kbase_jd_submit(struct kbase_context *kctx, } else { if (copy_from_user(&user_atom, user_addr, stride) != 0) { dev_dbg(kbdev->dev, - "Invalid atom address %p passed to job_submit\n", + "Invalid atom address %pK passed to job_submit\n", user_addr); err = -EFAULT; break; @@ -1507,6 +1337,12 @@ while (false) kbase_disjoint_event_potential(kbdev); rt_mutex_unlock(&jctx->lock); + if (fatal_signal_pending(current)) { + dev_dbg(kbdev->dev, "Fatal signal pending for kctx %d_%d", + kctx->tgid, kctx->id); + /* We're being killed so the result code doesn't really matter */ + return 0; + } } if (need_to_try_schedule_context) @@ -1598,8 +1434,8 @@ void kbase_jd_done_worker(struct kthread_work *data) kbasep_js_remove_job(kbdev, kctx, katom); rt_mutex_unlock(&js_kctx_info->ctx.jsctx_mutex); rt_mutex_unlock(&js_devdata->queue_mutex); - /* jd_done_nolock() requires the jsctx_mutex lock to be dropped */ - jd_done_nolock(katom, false); + /* kbase_jd_done_nolock() requires the jsctx_mutex lock to be dropped */ + kbase_jd_done_nolock(katom, false); /* katom may have been freed now, do not use! 
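The kbase_jd_submit() changes above add three guards: an upper bound on the number of atoms per call, zero-initialisation of the on-stack copy before copy_from_user(), and an early return once the submitting task has a fatal signal pending. A condensed sketch of that shape follows (illustrative only, not the driver code; the demo_* names and the 256 limit are invented):

#include <linux/errno.h>
#include <linux/sched/signal.h>
#include <linux/types.h>
#include <linux/uaccess.h>

#define DEMO_MAX_ATOMS 256u

static int demo_submit(void __user *user_addr, u32 nr_atoms, size_t stride)
{
        u32 i;

        if (nr_atoms > DEMO_MAX_ATOMS)
                return -EINVAL;

        for (i = 0; i < nr_atoms; i++) {
                u8 entry[64] = { 0 };   /* zero-init, as the patch does for user_atom */

                if (stride > sizeof(entry))
                        return -EINVAL;
                if (copy_from_user(entry, (u8 __user *)user_addr + i * stride, stride))
                        return -EFAULT;

                /* ... validate and queue the entry here ... */

                if (fatal_signal_pending(current))
                        return 0;       /* process is being killed; stop submitting */
        }
        return 0;
}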
*/ @@ -1665,7 +1501,7 @@ void kbase_jd_done_worker(struct kthread_work *data) kbase_js_sched_all(kbdev); if (!atomic_dec_return(&kctx->work_count)) { - /* If worker now idle then post all events that jd_done_nolock() + /* If worker now idle then post all events that kbase_jd_done_nolock() * has queued */ rt_mutex_lock(&jctx->lock); @@ -1711,8 +1547,10 @@ static void jd_cancel_worker(struct kthread_work *data) struct kbase_jd_context *jctx; struct kbase_context *kctx; struct kbasep_js_kctx_info *js_kctx_info; + bool need_to_try_schedule_context; bool attr_state_changed; struct kbase_device *kbdev; + CSTD_UNUSED(need_to_try_schedule_context); /* Soft jobs should never reach this function */ KBASE_DEBUG_ASSERT((katom->core_req & BASE_JD_REQ_SOFT_JOB) == 0); @@ -1738,7 +1576,13 @@ static void jd_cancel_worker(struct kthread_work *data) rt_mutex_lock(&jctx->lock); - jd_done_nolock(katom, true); + need_to_try_schedule_context = kbase_jd_done_nolock(katom, true); + /* Because we're zapping, we're not adding any more jobs to this ctx, so no need to + * schedule the context. There's also no need for the jsctx_mutex to have been taken + * around this too. + */ + KBASE_DEBUG_ASSERT(!need_to_try_schedule_context); + CSTD_UNUSED(need_to_try_schedule_context); /* katom may have been freed now, do not use! */ rt_mutex_unlock(&jctx->lock); @@ -1777,6 +1621,8 @@ void kbase_jd_done(struct kbase_jd_atom *katom, int slot_nr, kbdev = kctx->kbdev; KBASE_DEBUG_ASSERT(kbdev); + lockdep_assert_held(&kbdev->hwaccess_lock); + if (done_code & KBASE_JS_ATOM_DONE_EVICTED_FROM_NEXT) katom->event_code = BASE_JD_EVENT_REMOVED_FROM_NEXT; @@ -1854,20 +1700,8 @@ void kbase_jd_zap_context(struct kbase_context *kctx) kbase_cancel_soft_job(katom); } - -#ifdef CONFIG_MALI_DMA_FENCE - kbase_dma_fence_cancel_all_atoms(kctx); -#endif - rt_mutex_unlock(&kctx->jctx.lock); -#ifdef CONFIG_MALI_DMA_FENCE - /* Flush dma-fence workqueue to ensure that any callbacks that may have - * been queued are done before continuing. - */ - flush_workqueue(kctx->dma_fence.wq); -#endif - #if IS_ENABLED(CONFIG_DEBUG_FS) kbase_debug_job_fault_kctx_unblock(kctx); #endif @@ -1896,11 +1730,10 @@ int kbase_jd_init(struct kbase_context *kctx) kctx->jctx.atoms[i].event_code = BASE_JD_EVENT_JOB_INVALID; kctx->jctx.atoms[i].status = KBASE_JD_ATOM_STATE_UNUSED; -#if defined(CONFIG_MALI_DMA_FENCE) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) kctx->jctx.atoms[i].dma_fence.context = dma_fence_context_alloc(1); atomic_set(&kctx->jctx.atoms[i].dma_fence.seqno, 0); - INIT_LIST_HEAD(&kctx->jctx.atoms[i].dma_fence.callbacks); #endif } diff --git a/mali_kbase/mali_kbase_jd_debugfs.c b/mali_kbase/mali_kbase_jd_debugfs.c index f9b41d5..3e0a760 100644 --- a/mali_kbase/mali_kbase_jd_debugfs.c +++ b/mali_kbase/mali_kbase_jd_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,8 +24,7 @@ #include <linux/seq_file.h> #include <mali_kbase.h> #include <mali_kbase_jd_debugfs.h> -#include <mali_kbase_dma_fence.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> #endif #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> @@ -38,7 +37,7 @@ struct kbase_jd_debugfs_depinfo { static void kbase_jd_debugfs_fence_info(struct kbase_jd_atom *atom, struct seq_file *sfile) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) struct kbase_sync_fence_info info; int res; @@ -58,55 +57,7 @@ static void kbase_jd_debugfs_fence_info(struct kbase_jd_atom *atom, default: break; } -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ - -#ifdef CONFIG_MALI_DMA_FENCE - if (atom->core_req & BASE_JD_REQ_EXTERNAL_RESOURCES) { - struct kbase_fence_cb *cb; - - if (atom->dma_fence.fence) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence = atom->dma_fence.fence; -#else - struct dma_fence *fence = atom->dma_fence.fence; -#endif - - seq_printf(sfile, -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - "Sd(%u#%u: %s) ", -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - "Sd(%llu#%u: %s) ", -#else - "Sd(%llu#%llu: %s) ", -#endif - fence->context, fence->seqno, - dma_fence_is_signaled(fence) ? "signaled" : - "active"); - } - - list_for_each_entry(cb, &atom->dma_fence.callbacks, - node) { -#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) - struct fence *fence = cb->fence; -#else - struct dma_fence *fence = cb->fence; -#endif - - seq_printf(sfile, -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - "Wd(%u#%u: %s) ", -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) - "Wd(%llu#%u: %s) ", -#else - "Wd(%llu#%llu: %s) ", -#endif - fence->context, fence->seqno, - dma_fence_is_signaled(fence) ? "signaled" : - "active"); - } - } -#endif /* CONFIG_MALI_DMA_FENCE */ - +#endif /* CONFIG_SYNC_FILE */ } static void kbasep_jd_debugfs_atom_deps( @@ -164,7 +115,7 @@ static int kbasep_jd_debugfs_atoms_show(struct seq_file *sfile, void *data) BASE_UK_VERSION_MINOR); /* Print table heading */ - seq_puts(sfile, " ID, Core req, St, CR, Predeps, Start time, Additional info...\n"); + seq_puts(sfile, " ID, Core req, St, Predeps, Start time, Additional info...\n"); atoms = kctx->jctx.atoms; /* General atom states */ @@ -184,8 +135,8 @@ static int kbasep_jd_debugfs_atoms_show(struct seq_file *sfile, void *data) * it is valid */ if (ktime_to_ns(atom->start_timestamp)) - start_timestamp = ktime_to_ns( - ktime_sub(ktime_get(), atom->start_timestamp)); + start_timestamp = + ktime_to_ns(ktime_sub(ktime_get_raw(), atom->start_timestamp)); kbasep_jd_debugfs_atom_deps(deps, atom); @@ -230,11 +181,7 @@ static const struct file_operations kbasep_jd_debugfs_atoms_fops = { void kbasep_jd_debugfs_ctx_init(struct kbase_context *kctx) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif /* Caller already ensures this, but we keep the pattern for * maintenance safety. diff --git a/mali_kbase/mali_kbase_jm.c b/mali_kbase/mali_kbase_jm.c index 6cbd6f1..1ac5cd3 100644 --- a/mali_kbase/mali_kbase_jm.c +++ b/mali_kbase/mali_kbase_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. 
+ * (C) COPYRIGHT 2013-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -37,15 +37,13 @@ * * Return: true if slot can still be submitted on, false if slot is now full. */ -static bool kbase_jm_next_job(struct kbase_device *kbdev, int js, - int nr_jobs_to_submit) +static bool kbase_jm_next_job(struct kbase_device *kbdev, unsigned int js, int nr_jobs_to_submit) { struct kbase_context *kctx; int i; kctx = kbdev->hwaccess.active_kctx[js]; - dev_dbg(kbdev->dev, - "Trying to run the next %d jobs in kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "Trying to run the next %d jobs in kctx %pK (s:%u)\n", nr_jobs_to_submit, (void *)kctx, js); if (!kctx) @@ -60,7 +58,7 @@ static bool kbase_jm_next_job(struct kbase_device *kbdev, int js, kbase_backend_run_atom(kbdev, katom); } - dev_dbg(kbdev->dev, "Slot ringbuffer should now be full (s:%d)\n", js); + dev_dbg(kbdev->dev, "Slot ringbuffer should now be full (s:%u)\n", js); return false; } @@ -72,7 +70,7 @@ u32 kbase_jm_kick(struct kbase_device *kbdev, u32 js_mask) dev_dbg(kbdev->dev, "JM kick slot mask 0x%x\n", js_mask); while (js_mask) { - int js = ffs(js_mask) - 1; + unsigned int js = ffs(js_mask) - 1; int nr_jobs_to_submit = kbase_backend_slot_free(kbdev, js); if (kbase_jm_next_job(kbdev, js, nr_jobs_to_submit)) @@ -111,14 +109,14 @@ void kbase_jm_try_kick_all(struct kbase_device *kbdev) void kbase_jm_idle_ctx(struct kbase_device *kbdev, struct kbase_context *kctx) { - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); for (js = 0; js < BASE_JM_MAX_NR_SLOTS; js++) { if (kbdev->hwaccess.active_kctx[js] == kctx) { - dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%u)\n", (void *)kctx, + js); kbdev->hwaccess.active_kctx[js] = NULL; } } diff --git a/mali_kbase/mali_kbase_js.c b/mali_kbase/mali_kbase_js.c index 97af9c6..8d29f87 100644 --- a/mali_kbase/mali_kbase_js.c +++ b/mali_kbase/mali_kbase_js.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
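kbase_jm_kick() above walks the slot mask with ffs(js_mask) - 1, handling one set bit per pass. A self-contained userspace sketch of that bitmask walk follows (illustrative only; the printf() stands in for submitting jobs):

#include <stdio.h>
#include <strings.h>

static void demo_kick(unsigned int js_mask)
{
        while (js_mask) {
                unsigned int js = (unsigned int)(ffs((int)js_mask) - 1);

                printf("kick slot %u\n", js);   /* stand-in for submitting jobs */
                js_mask &= ~(1u << js);         /* clear the bit just handled */
        }
}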
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -34,7 +34,22 @@ #include "mali_kbase_jm.h" #include "mali_kbase_hwaccess_jm.h" +#include <mali_kbase_hwaccess_time.h> #include <linux/priority_control_manager.h> +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#include <mali_kbase_gpu_metrics.h> + +static unsigned long gpu_metrics_tp_emit_interval_ns = DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS; + +module_param(gpu_metrics_tp_emit_interval_ns, ulong, 0444); +MODULE_PARM_DESC(gpu_metrics_tp_emit_interval_ns, + "Time interval in nano seconds at which GPU metrics tracepoints are emitted"); + +unsigned long kbase_gpu_metrics_get_emit_interval(void) +{ + return gpu_metrics_tp_emit_interval_ns; +} +#endif /* * Private types @@ -77,8 +92,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( struct kbase_device *kbdev, struct kbase_context *kctx, struct kbasep_js_atom_retained_state *katom_retained_state); -static int kbase_js_get_slot(struct kbase_device *kbdev, - struct kbase_jd_atom *katom); +static unsigned int kbase_js_get_slot(struct kbase_device *kbdev, struct kbase_jd_atom *katom); static void kbase_js_foreach_ctx_job(struct kbase_context *kctx, kbasep_js_ctx_job_cb *callback); @@ -101,6 +115,118 @@ static int kbase_ktrace_get_ctx_refcnt(struct kbase_context *kctx) * Private functions */ +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +/** + * gpu_metrics_timer_callback() - Callback function for the GPU metrics hrtimer + * + * @timer: Pointer to the GPU metrics hrtimer + * + * This function will emit power/gpu_work_period tracepoint for all the active + * GPU metrics contexts. The timer will be restarted if needed. + * + * Return: enum value to indicate that timer should not be restarted. + */ +static enum hrtimer_restart gpu_metrics_timer_callback(struct hrtimer *timer) +{ + struct kbasep_js_device_data *js_devdata = + container_of(timer, struct kbasep_js_device_data, gpu_metrics_timer); + struct kbase_device *kbdev = + container_of(js_devdata, struct kbase_device, js_data); + unsigned long flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_emit_tracepoint(kbdev, ktime_get_raw_ns()); + WARN_ON_ONCE(!js_devdata->gpu_metrics_timer_running); + if (js_devdata->gpu_metrics_timer_needed) { + hrtimer_start(&js_devdata->gpu_metrics_timer, + HR_TIMER_DELAY_NSEC(gpu_metrics_tp_emit_interval_ns), + HRTIMER_MODE_REL); + } else + js_devdata->gpu_metrics_timer_running = false; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + return HRTIMER_NORESTART; +} + +/** + * gpu_metrics_ctx_init() - Take a reference on GPU metrics context if it exists, + * otherwise allocate and initialise one. + * + * @kctx: Pointer to the Kbase context. + * + * The GPU metrics context represents an "Application" for the purposes of GPU metrics + * reporting. There may be multiple kbase_contexts contributing data to a single GPU + * metrics context. + * This function takes a reference on GPU metrics context if it already exists + * corresponding to the Application that is creating the Kbase context, otherwise + * memory is allocated for it and initialised. + * + * Return: 0 on success, or negative on failure. 
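gpu_metrics_timer_callback() above re-arms its own hrtimer while the "needed" flag is set and always returns HRTIMER_NORESTART. A minimal sketch of that self-rearming callback pattern follows (illustrative only; the demo_* names are invented and the locking and tracepoint emission are omitted):

#include <linux/hrtimer.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/types.h>

struct demo_metrics {
        struct hrtimer timer;
        bool needed;
        u64 period_ns;
};

static enum hrtimer_restart demo_metrics_cb(struct hrtimer *timer)
{
        struct demo_metrics *m = container_of(timer, struct demo_metrics, timer);

        /* ... emit metrics here ... */

        if (m->needed)
                hrtimer_start(&m->timer, ns_to_ktime(m->period_ns), HRTIMER_MODE_REL);

        return HRTIMER_NORESTART;       /* any restart was done explicitly above */
}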
+ */ +static inline int gpu_metrics_ctx_init(struct kbase_context *kctx) +{ + struct kbase_gpu_metrics_ctx *gpu_metrics_ctx; + struct kbase_device *kbdev = kctx->kbdev; + unsigned long flags; + int ret = 0; + + const struct cred *cred = get_current_cred(); + const unsigned int aid = cred->euid.val; + + put_cred(cred); + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return 0; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kbdev->kctx_list_lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + gpu_metrics_ctx = kbase_gpu_metrics_ctx_get(kbdev, aid); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + if (!gpu_metrics_ctx) { + gpu_metrics_ctx = kmalloc(sizeof(*gpu_metrics_ctx), GFP_KERNEL); + + if (gpu_metrics_ctx) { + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_ctx_init(kbdev, gpu_metrics_ctx, aid); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } else { + dev_err(kbdev->dev, "Allocation for gpu_metrics_ctx failed"); + ret = -ENOMEM; + } + } + + kctx->gpu_metrics_ctx = gpu_metrics_ctx; + mutex_unlock(&kbdev->kctx_list_lock); + + return ret; +} + +/** + * gpu_metrics_ctx_term() - Drop a reference on a GPU metrics context and free it + * if the refcount becomes 0. + * + * @kctx: Pointer to the Kbase context. + */ +static inline void gpu_metrics_ctx_term(struct kbase_context *kctx) +{ + unsigned long flags; + + /* Return early if this is not a Userspace created context */ + if (unlikely(!kctx->kfile)) + return; + + /* Serialize against the other threads trying to create/destroy Kbase contexts. */ + mutex_lock(&kctx->kbdev->kctx_list_lock); + spin_lock_irqsave(&kctx->kbdev->hwaccess_lock, flags); + kbase_gpu_metrics_ctx_put(kctx->kbdev, kctx->gpu_metrics_ctx); + spin_unlock_irqrestore(&kctx->kbdev->hwaccess_lock, flags); + mutex_unlock(&kctx->kbdev->kctx_list_lock); +} +#endif + /** * core_reqs_from_jsn_features - Convert JSn_FEATURES to core requirements * @features: JSn_FEATURE register value @@ -151,8 +277,7 @@ static void kbase_js_sync_timers(struct kbase_device *kbdev) * * Return: true if there are no atoms to pull, false otherwise. */ -static inline bool -jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) +static inline bool jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, unsigned int js, int prio) { bool none_to_pull; struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; @@ -161,9 +286,8 @@ jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) none_to_pull = RB_EMPTY_ROOT(&rb->runnable_tree); - dev_dbg(kctx->kbdev->dev, - "Slot %d (prio %d) is %spullable in kctx %pK\n", - js, prio, none_to_pull ? "not " : "", kctx); + dev_dbg(kctx->kbdev->dev, "Slot %u (prio %d) is %spullable in kctx %pK\n", js, prio, + none_to_pull ? "not " : "", kctx); return none_to_pull; } @@ -179,8 +303,7 @@ jsctx_rb_none_to_pull_prio(struct kbase_context *kctx, int js, int prio) * Return: true if the ring buffers for all priorities have no pullable atoms, * false otherwise. */ -static inline bool -jsctx_rb_none_to_pull(struct kbase_context *kctx, int js) +static inline bool jsctx_rb_none_to_pull(struct kbase_context *kctx, unsigned int js) { int prio; @@ -212,8 +335,8 @@ jsctx_rb_none_to_pull(struct kbase_context *kctx, int js) * * The HW access lock must always be held when calling this function. 
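gpu_metrics_ctx_init() above looks up an existing per-application metrics context and allocates one only when the lookup fails, so several kbase contexts created by the same application share it. The simplified get-or-create sketch below uses invented demo_* names and a single mutex over a plain list instead of the driver's kctx_list_lock/hwaccess_lock pairing (illustrative only, not from this patch):

#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/types.h>

struct demo_app_ctx {
        u32 aid;                /* application id the context is keyed on */
        u32 refcount;
        struct demo_app_ctx *next;
};

static DEFINE_MUTEX(demo_lock);
static struct demo_app_ctx *demo_list;

static struct demo_app_ctx *demo_ctx_get_or_create(u32 aid)
{
        struct demo_app_ctx *c;

        mutex_lock(&demo_lock);
        for (c = demo_list; c; c = c->next) {
                if (c->aid == aid) {
                        c->refcount++;
                        goto out;       /* existing context, just take a reference */
                }
        }

        c = kzalloc(sizeof(*c), GFP_KERNEL);
        if (c) {
                c->aid = aid;
                c->refcount = 1;
                c->next = demo_list;
                demo_list = c;
        }
out:
        mutex_unlock(&demo_lock);
        return c;
}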
*/ -static void jsctx_queue_foreach_prio(struct kbase_context *kctx, int js, - int prio, kbasep_js_ctx_job_cb *callback) +static void jsctx_queue_foreach_prio(struct kbase_context *kctx, unsigned int js, int prio, + kbasep_js_ctx_job_cb *callback) { struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; @@ -272,7 +395,7 @@ static void jsctx_queue_foreach_prio(struct kbase_context *kctx, int js, * jsctx_queue_foreach_prio() to iterate over the queue and invoke @callback * for each entry, and remove the entry from the queue. */ -static inline void jsctx_queue_foreach(struct kbase_context *kctx, int js, +static inline void jsctx_queue_foreach(struct kbase_context *kctx, unsigned int js, kbasep_js_ctx_job_cb *callback) { int prio; @@ -293,15 +416,14 @@ static inline void jsctx_queue_foreach(struct kbase_context *kctx, int js, * * Return: Pointer to next atom in buffer, or NULL if there is no atom. */ -static inline struct kbase_jd_atom * -jsctx_rb_peek_prio(struct kbase_context *kctx, int js, int prio) +static inline struct kbase_jd_atom *jsctx_rb_peek_prio(struct kbase_context *kctx, unsigned int js, + int prio) { struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; struct rb_node *node; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - dev_dbg(kctx->kbdev->dev, - "Peeking runnable tree of kctx %pK for prio %d (s:%d)\n", + dev_dbg(kctx->kbdev->dev, "Peeking runnable tree of kctx %pK for prio %d (s:%u)\n", (void *)kctx, prio, js); node = rb_first(&rb->runnable_tree); @@ -326,8 +448,7 @@ jsctx_rb_peek_prio(struct kbase_context *kctx, int js, int prio) * * Return: Pointer to next atom in buffer, or NULL if there is no atom. */ -static inline struct kbase_jd_atom * -jsctx_rb_peek(struct kbase_context *kctx, int js) +static inline struct kbase_jd_atom *jsctx_rb_peek(struct kbase_context *kctx, unsigned int js) { int prio; @@ -358,7 +479,7 @@ static inline void jsctx_rb_pull(struct kbase_context *kctx, struct kbase_jd_atom *katom) { int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *rb = &kctx->jsctx_queue[prio][js]; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); @@ -377,14 +498,14 @@ jsctx_tree_add(struct kbase_context *kctx, struct kbase_jd_atom *katom) { struct kbase_device *kbdev = kctx->kbdev; int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; struct rb_node **new = &(queue->runnable_tree.rb_node), *parent = NULL; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Adding atom %pK to runnable tree of kctx %pK (s:%d)\n", - (void *)katom, (void *)kctx, js); + dev_dbg(kbdev->dev, "Adding atom %pK to runnable tree of kctx %pK (s:%u)\n", (void *)katom, + (void *)kctx, js); while (*new) { struct kbase_jd_atom *entry = container_of(*new, @@ -425,15 +546,11 @@ jsctx_rb_unpull(struct kbase_context *kctx, struct kbase_jd_atom *katom) jsctx_tree_add(kctx, katom); } -static bool kbase_js_ctx_pullable(struct kbase_context *kctx, - int js, - bool is_scheduled); +static bool kbase_js_ctx_pullable(struct kbase_context *kctx, unsigned int js, bool is_scheduled); static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js); + struct kbase_context *kctx, unsigned int js); static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js); + struct kbase_context *kctx, unsigned int js); typedef 
bool(katom_ordering_func)(const struct kbase_jd_atom *, const struct kbase_jd_atom *); @@ -541,6 +658,7 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) jsdd->gpu_reset_ticks_dumping = DEFAULT_JS_RESET_TICKS_DUMPING; jsdd->ctx_timeslice_ns = DEFAULT_JS_CTX_TIMESLICE_NS; atomic_set(&jsdd->soft_job_timeout_ms, DEFAULT_JS_SOFT_JOB_TIMEOUT); + jsdd->js_free_wait_time_ms = kbase_get_timeout_ms(kbdev, JM_DEFAULT_JS_FREE_TIMEOUT); dev_dbg(kbdev->dev, "JS Config Attribs: "); dev_dbg(kbdev->dev, "\tscheduling_period_ns:%u", @@ -565,6 +683,7 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) jsdd->ctx_timeslice_ns); dev_dbg(kbdev->dev, "\tsoft_job_timeout:%i", atomic_read(&jsdd->soft_job_timeout_ms)); + dev_dbg(kbdev->dev, "\tjs_free_wait_time_ms:%u", jsdd->js_free_wait_time_ms); if (!(jsdd->soft_stop_ticks < jsdd->hard_stop_ticks_ss && jsdd->hard_stop_ticks_ss < jsdd->gpu_reset_ticks_ss && @@ -609,6 +728,21 @@ int kbasep_js_devdata_init(struct kbase_device * const kbdev) } } +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + if (!gpu_metrics_tp_emit_interval_ns || (gpu_metrics_tp_emit_interval_ns > NSEC_PER_SEC)) { + dev_warn( + kbdev->dev, + "Invalid value (%lu ns) for module param gpu_metrics_tp_emit_interval_ns. Using default value: %u ns", + gpu_metrics_tp_emit_interval_ns, DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS); + gpu_metrics_tp_emit_interval_ns = DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS; + } + + hrtimer_init(&jsdd->gpu_metrics_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + jsdd->gpu_metrics_timer.function = gpu_metrics_timer_callback; + jsdd->gpu_metrics_timer_needed = false; + jsdd->gpu_metrics_timer_running = false; +#endif + return 0; } @@ -619,8 +753,9 @@ void kbasep_js_devdata_halt(struct kbase_device *kbdev) void kbasep_js_devdata_term(struct kbase_device *kbdev) { - s8 zero_ctx_attr_ref_count[KBASEP_JS_CTX_ATTR_COUNT] = { 0, }; struct kbasep_js_device_data *js_devdata = &kbdev->js_data; + s8 zero_ctx_attr_ref_count[KBASEP_JS_CTX_ATTR_COUNT] = { 0, }; + CSTD_UNUSED(js_devdata); KBASE_DEBUG_ASSERT(kbdev != NULL); @@ -632,15 +767,31 @@ void kbasep_js_devdata_term(struct kbase_device *kbdev) zero_ctx_attr_ref_count, sizeof(zero_ctx_attr_ref_count)) == 0); CSTD_UNUSED(zero_ctx_attr_ref_count); + +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + js_devdata->gpu_metrics_timer_needed = false; + hrtimer_cancel(&js_devdata->gpu_metrics_timer); +#endif } int kbasep_js_kctx_init(struct kbase_context *const kctx) { struct kbasep_js_kctx_info *js_kctx_info; int i, j; + int ret; + CSTD_UNUSED(js_kctx_info); KBASE_DEBUG_ASSERT(kctx != NULL); + CSTD_UNUSED(ret); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + ret = gpu_metrics_ctx_init(kctx); + if (ret) + return ret; +#endif + + kbase_ctx_sched_init_ctx(kctx); + for (i = 0; i < BASE_JM_MAX_NR_SLOTS; ++i) INIT_LIST_HEAD(&kctx->jctx.sched_info.ctx.ctx_list_entry[i]); @@ -679,9 +830,10 @@ void kbasep_js_kctx_term(struct kbase_context *kctx) { struct kbase_device *kbdev; struct kbasep_js_kctx_info *js_kctx_info; - int js; + unsigned int js; bool update_ctx_count = false; unsigned long flags; + CSTD_UNUSED(js_kctx_info); KBASE_DEBUG_ASSERT(kctx != NULL); @@ -717,6 +869,11 @@ void kbasep_js_kctx_term(struct kbase_context *kctx) kbase_backend_ctx_count_changed(kbdev); mutex_unlock(&kbdev->js_data.runpool_mutex); } + + kbase_ctx_sched_remove_ctx(kctx); +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) + gpu_metrics_ctx_term(kctx); +#endif } /* @@ -724,8 +881,8 @@ void kbasep_js_kctx_term(struct 
kbase_context *kctx) */ /* Should not normally use directly - use kbase_jsctx_slot_atom_pulled_dec() instead */ -static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, - int js, int sched_prio) +static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, unsigned int js, + int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; @@ -737,7 +894,7 @@ static void kbase_jsctx_slot_prio_blocked_clear(struct kbase_context *kctx, NULL, 0, js, (unsigned int)sched_prio); } -static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, int js) +static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, unsigned int js) { return atomic_read(&kctx->slot_tracking[js].atoms_pulled); } @@ -747,7 +904,7 @@ static int kbase_jsctx_slot_atoms_pulled(struct kbase_context *kctx, int js) * - that priority level is blocked * - or, any higher priority level is blocked */ -static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, int js, +static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, unsigned int js, int sched_prio) { struct kbase_jsctx_slot_tracking *slot_tracking = @@ -787,7 +944,7 @@ static bool kbase_jsctx_slot_prio_is_blocked(struct kbase_context *kctx, int js, static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, const struct kbase_jd_atom *katom) { - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; int sched_prio = katom->sched_priority; struct kbase_jsctx_slot_tracking *slot_tracking = &kctx->slot_tracking[js]; @@ -796,7 +953,7 @@ static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, lockdep_assert_held(&kctx->kbdev->hwaccess_lock); WARN(kbase_jsctx_slot_prio_is_blocked(kctx, js, sched_prio), - "Should not have pulled atoms for slot %d from a context that is blocked at priority %d or higher", + "Should not have pulled atoms for slot %u from a context that is blocked at priority %d or higher", js, sched_prio); nr_atoms_pulled = atomic_inc_return(&kctx->atoms_pulled_all_slots); @@ -825,7 +982,7 @@ static int kbase_jsctx_slot_atom_pulled_inc(struct kbase_context *kctx, static bool kbase_jsctx_slot_atom_pulled_dec(struct kbase_context *kctx, const struct kbase_jd_atom *katom) { - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; int sched_prio = katom->sched_priority; int atoms_pulled_pri; struct kbase_jsctx_slot_tracking *slot_tracking = @@ -874,14 +1031,12 @@ static bool kbase_jsctx_slot_atom_pulled_dec(struct kbase_context *kctx, * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add pullable tail kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add pullable tail kctx %pK (s:%u)\n", (void *)kctx, js); if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[js])) list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); @@ -916,14 +1071,13 @@ static bool kbase_js_ctx_list_add_pullable_nolock(struct kbase_device *kbdev, * * Return: true if caller should call kbase_backend_ctx_count_changed() */ -static bool kbase_js_ctx_list_add_pullable_head_nolock( - struct kbase_device *kbdev, struct kbase_context *kctx, int js) +static bool kbase_js_ctx_list_add_pullable_head_nolock(struct kbase_device *kbdev, + struct kbase_context *kctx, 
unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add pullable head kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add pullable head kctx %pK (s:%u)\n", (void *)kctx, js); if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[js])) list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); @@ -961,8 +1115,7 @@ static bool kbase_js_ctx_list_add_pullable_head_nolock( * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_pullable_head(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret; unsigned long flags; @@ -992,14 +1145,12 @@ static bool kbase_js_ctx_list_add_pullable_head(struct kbase_device *kbdev, * Return: true if caller should call kbase_backend_ctx_count_changed() */ static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) + struct kbase_context *kctx, unsigned int js) { bool ret = false; lockdep_assert_held(&kbdev->hwaccess_lock); - dev_dbg(kbdev->dev, "Add unpullable tail kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Add unpullable tail kctx %pK (s:%u)\n", (void *)kctx, js); list_move_tail(&kctx->jctx.sched_info.ctx.ctx_list_entry[js], &kbdev->js_data.ctx_list_unpullable[js][kctx->priority]); @@ -1034,9 +1185,8 @@ static bool kbase_js_ctx_list_add_unpullable_nolock(struct kbase_device *kbdev, * * Return: true if caller should call kbase_backend_ctx_count_changed() */ -static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { bool ret = false; @@ -1072,9 +1222,8 @@ static bool kbase_js_ctx_list_remove_nolock(struct kbase_device *kbdev, * Return: Context to use for specified slot. * NULL if no contexts present for specified slot */ -static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( - struct kbase_device *kbdev, - int js) +static struct kbase_context *kbase_js_ctx_list_pop_head_nolock(struct kbase_device *kbdev, + unsigned int js) { struct kbase_context *kctx; int i; @@ -1090,9 +1239,8 @@ static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( jctx.sched_info.ctx.ctx_list_entry[js]); list_del_init(&kctx->jctx.sched_info.ctx.ctx_list_entry[js]); - dev_dbg(kbdev->dev, - "Popped %pK from the pullable queue (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "Popped %pK from the pullable queue (s:%u)\n", (void *)kctx, + js); return kctx; } return NULL; @@ -1107,8 +1255,7 @@ static struct kbase_context *kbase_js_ctx_list_pop_head_nolock( * Return: Context to use for specified slot. 
* NULL if no contexts present for specified slot */ -static struct kbase_context *kbase_js_ctx_list_pop_head( - struct kbase_device *kbdev, int js) +static struct kbase_context *kbase_js_ctx_list_pop_head(struct kbase_device *kbdev, unsigned int js) { struct kbase_context *kctx; unsigned long flags; @@ -1132,8 +1279,7 @@ static struct kbase_context *kbase_js_ctx_list_pop_head( * Return: true if context can be pulled from on specified slot * false otherwise */ -static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, - bool is_scheduled) +static bool kbase_js_ctx_pullable(struct kbase_context *kctx, unsigned int js, bool is_scheduled) { struct kbasep_js_device_data *js_devdata; struct kbase_jd_atom *katom; @@ -1152,8 +1298,7 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, } katom = jsctx_rb_peek(kctx, js); if (!katom) { - dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%u)\n", (void *)kctx, js); return false; /* No pullable atoms */ } if (kbase_jsctx_slot_prio_is_blocked(kctx, js, katom->sched_priority)) { @@ -1161,7 +1306,7 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, kctx->kbdev, JS_SLOT_PRIO_IS_BLOCKED, kctx, katom, katom->jc, js, (unsigned int)katom->sched_priority); dev_dbg(kbdev->dev, - "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%d)\n", + "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%u)\n", (void *)kctx, katom->sched_priority, js); return false; } @@ -1182,14 +1327,14 @@ static bool kbase_js_ctx_pullable(struct kbase_context *kctx, int js, if ((katom->atom_flags & KBASE_KATOM_FLAG_FAIL_BLOCKER) && kbase_backend_nr_atoms_on_slot(kctx->kbdev, js)) { dev_dbg(kbdev->dev, - "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%d)\n", + "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%u)\n", (void *)katom, js); return false; } } - dev_dbg(kbdev->dev, "JS: Atom %pK is pullable in kctx %pK (s:%d)\n", - (void *)katom, (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: Atom %pK is pullable in kctx %pK (s:%u)\n", (void *)katom, + (void *)kctx, js); return true; } @@ -1200,7 +1345,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, struct kbase_device *kbdev = kctx->kbdev; bool ret = true; bool has_dep = false, has_x_dep = false; - int js = kbase_js_get_slot(kbdev, katom); + unsigned int js = kbase_js_get_slot(kbdev, katom); int prio = katom->sched_priority; int i; @@ -1208,7 +1353,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, struct kbase_jd_atom *dep_atom = katom->dep[i].atom; if (dep_atom) { - int dep_js = kbase_js_get_slot(kbdev, dep_atom); + unsigned int dep_js = kbase_js_get_slot(kbdev, dep_atom); int dep_prio = dep_atom->sched_priority; dev_dbg(kbdev->dev, @@ -1363,7 +1508,7 @@ static bool kbase_js_dep_validate(struct kbase_context *kctx, void kbase_js_set_ctx_priority(struct kbase_context *kctx, int new_priority) { struct kbase_device *kbdev = kctx->kbdev; - int js; + unsigned int js; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -1789,10 +1934,12 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( unsigned long flags; struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; + int kctx_as_nr = kctx->as_nr; kbasep_js_release_result release_result = 0u; bool runpool_ctx_attr_change = false; int new_ref_count; + CSTD_UNUSED(kctx_as_nr); 
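The gpu_metrics_tp_emit_interval_ns parameter and gpu_metrics_timer_callback() added earlier in this patch follow a common self-rearming hrtimer pattern: validate or clamp the module parameter, arm the timer once, then re-arm it from the callback under a lock while a "needed" flag is set, returning HRTIMER_NORESTART because the restart is done explicitly. A minimal standalone sketch of that pattern follows; every name prefixed with example_ is hypothetical and the 500 ms default is purely illustrative, not the driver's DEFAULT_GPU_METRICS_TP_EMIT_INTERVAL_NS.

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/spinlock.h>

static unsigned long example_emit_interval_ns = 500 * NSEC_PER_MSEC;
module_param(example_emit_interval_ns, ulong, 0444);
MODULE_PARM_DESC(example_emit_interval_ns, "Emit interval in nanoseconds");

static struct hrtimer example_timer;
static bool example_timer_needed;
static DEFINE_SPINLOCK(example_lock);

static enum hrtimer_restart example_timer_cb(struct hrtimer *timer)
{
	unsigned long flags;

	spin_lock_irqsave(&example_lock, flags);
	/* Emit the periodic event (e.g. a tracepoint) here. */
	if (example_timer_needed)
		hrtimer_start(timer, ns_to_ktime(example_emit_interval_ns),
			      HRTIMER_MODE_REL);
	spin_unlock_irqrestore(&example_lock, flags);

	/* The timer was re-armed by hand above, so never ask the core to restart it. */
	return HRTIMER_NORESTART;
}

static void example_timer_setup(void)
{
	/* Clamp an out-of-range module parameter back to the default, as the patch does. */
	if (!example_emit_interval_ns || example_emit_interval_ns > NSEC_PER_SEC)
		example_emit_interval_ns = 500 * NSEC_PER_MSEC;

	hrtimer_init(&example_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	example_timer.function = example_timer_cb;
	example_timer_needed = true;
	hrtimer_start(&example_timer, ns_to_ktime(example_emit_interval_ns),
		      HRTIMER_MODE_REL);
}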
KBASE_DEBUG_ASSERT(kbdev != NULL); KBASE_DEBUG_ASSERT(kctx != NULL); @@ -1809,7 +1956,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( * * Assert about out calling contract */ - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); KBASE_DEBUG_ASSERT(atomic_read(&kctx->refcount) > 0); @@ -1911,7 +2058,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( kbase_backend_release_ctx_noirq(kbdev, kctx); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); /* Note: Don't reuse kctx_as_nr now */ @@ -1934,7 +2081,7 @@ static kbasep_js_release_result kbasep_js_runpool_release_ctx_internal( katom_retained_state, runpool_ctx_attr_change); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); } return release_result; @@ -2064,9 +2211,8 @@ void kbase_js_set_timeouts(struct kbase_device *kbdev) kbase_backend_timeouts_changed(kbdev); } -static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { struct kbasep_js_device_data *js_devdata; struct kbasep_js_kctx_info *js_kctx_info; @@ -2074,7 +2220,7 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, bool kctx_suspended = false; int as_nr; - dev_dbg(kbdev->dev, "Scheduling kctx %pK (s:%d)\n", kctx, js); + dev_dbg(kbdev->dev, "Scheduling kctx %pK (s:%u)\n", kctx, js); js_devdata = &kbdev->js_data; js_kctx_info = &kctx->jctx.sched_info; @@ -2101,8 +2247,8 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, WARN_ON(as_nr == KBASEP_AS_NR_INVALID); } } - if (as_nr == KBASEP_AS_NR_INVALID) - return false; /* No address spaces currently available */ + if ((as_nr < 0) || (as_nr >= BASE_MAX_NR_AS)) + return false; /* No address space currently available */ /* * Atomic transaction on the Context and Run Pool begins @@ -2171,6 +2317,9 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, #else if (kbase_pm_is_suspending(kbdev)) { #endif + /* Cause it to leave at some later point */ + bool retained; + CSTD_UNUSED(retained); kbase_ctx_sched_inc_refcount_nolock(kctx); @@ -2205,9 +2354,8 @@ static bool kbasep_js_schedule_ctx(struct kbase_device *kbdev, return true; } -static bool kbase_js_use_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx, - int js) +static bool kbase_js_use_ctx(struct kbase_device *kbdev, struct kbase_context *kctx, + unsigned int js) { unsigned long flags; @@ -2215,9 +2363,7 @@ static bool kbase_js_use_ctx(struct kbase_device *kbdev, if (kbase_ctx_flag(kctx, KCTX_SCHEDULED) && kbase_backend_use_ctx_sched(kbdev, kctx, js)) { - - dev_dbg(kbdev->dev, - "kctx %pK already has ASID - mark as active (s:%d)\n", + dev_dbg(kbdev->dev, "kctx %pK already has ASID - mark as active (s:%u)\n", (void *)kctx, js); if (kbdev->hwaccess.active_kctx[js] != kctx) { @@ -2484,8 +2630,7 @@ bool kbase_js_is_atom_valid(struct kbase_device *kbdev, return true; } -static int kbase_js_get_slot(struct kbase_device *kbdev, - struct kbase_jd_atom *katom) +static unsigned int kbase_js_get_slot(struct kbase_device *kbdev, struct kbase_jd_atom *katom) { if (katom->core_req & BASE_JD_REQ_JOB_SLOT) return katom->jobslot; @@ -2524,11 +2669,10 @@ bool kbase_js_dep_resolved_submit(struct kbase_context *kctx, (katom->pre_dep && (katom->pre_dep->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_X_DEP_LIST))) 
{ int prio = katom->sched_priority; - int js = katom->slot_nr; + unsigned int js = katom->slot_nr; struct jsctx_queue *queue = &kctx->jsctx_queue[prio][js]; - dev_dbg(kctx->kbdev->dev, "Add atom %pK to X_DEP list (s:%d)\n", - (void *)katom, js); + dev_dbg(kctx->kbdev->dev, "Add atom %pK to X_DEP list (s:%u)\n", (void *)katom, js); list_add_tail(&katom->queue, &queue->x_dep_head); katom->atom_flags |= KBASE_KATOM_FLAG_JSCTX_IN_X_DEP_LIST; @@ -2619,8 +2763,8 @@ static void kbase_js_move_to_tree(struct kbase_jd_atom *katom) * * Context: Caller must hold the HW access lock */ -static void kbase_js_evict_deps(struct kbase_context *kctx, - struct kbase_jd_atom *katom, int js, int prio) +static void kbase_js_evict_deps(struct kbase_context *kctx, struct kbase_jd_atom *katom, + unsigned int js, int prio) { struct kbase_jd_atom *x_dep = katom->x_post_dep; struct kbase_jd_atom *next_katom = katom->post_dep; @@ -2652,7 +2796,7 @@ static void kbase_js_evict_deps(struct kbase_context *kctx, } } -struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) +struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, unsigned int js) { struct kbase_jd_atom *katom; struct kbasep_js_device_data *js_devdata; @@ -2662,8 +2806,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) KBASE_DEBUG_ASSERT(kctx); kbdev = kctx->kbdev; - dev_dbg(kbdev->dev, "JS: pulling an atom from kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: pulling an atom from kctx %pK (s:%u)\n", (void *)kctx, js); js_devdata = &kbdev->js_data; lockdep_assert_held(&kbdev->hwaccess_lock); @@ -2682,13 +2825,12 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) katom = jsctx_rb_peek(kctx, js); if (!katom) { - dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "JS: No pullable atom in kctx %pK (s:%u)\n", (void *)kctx, js); return NULL; } if (kbase_jsctx_slot_prio_is_blocked(kctx, js, katom->sched_priority)) { dev_dbg(kbdev->dev, - "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%d)\n", + "JS: kctx %pK is blocked from submitting atoms at priority %d and lower (s:%u)\n", (void *)kctx, katom->sched_priority, js); return NULL; } @@ -2722,7 +2864,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) if ((katom->atom_flags & KBASE_KATOM_FLAG_FAIL_BLOCKER) && kbase_backend_nr_atoms_on_slot(kbdev, js)) { dev_dbg(kbdev->dev, - "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%d)\n", + "JS: Atom %pK has cross-slot fail dependency and atoms on slot (s:%u)\n", (void *)katom, js); return NULL; } @@ -2745,7 +2887,7 @@ struct kbase_jd_atom *kbase_js_pull(struct kbase_context *kctx, int js) katom->ticks = 0; - dev_dbg(kbdev->dev, "JS: successfully pulled atom %pK from kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "JS: successfully pulled atom %pK from kctx %pK (s:%u)\n", (void *)katom, (void *)kctx, js); return katom; @@ -3276,6 +3418,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, int atom_slot; bool context_idle = false; int prio = katom->sched_priority; + bool slot_became_unblocked; kbdev = kctx->kbdev; atom_slot = katom->slot_nr; @@ -3298,44 +3441,37 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, mutex_lock(&js_devdata->runpool_mutex); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - if (katom->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_TREE) { - bool slot_became_unblocked; + WARN_ON(!(katom->atom_flags & 
KBASE_KATOM_FLAG_JSCTX_IN_TREE)); - dev_dbg(kbdev->dev, "Atom %pK is in runnable_tree\n", - (void *)katom); + dev_dbg(kbdev->dev, "Atom %pK is in runnable_tree\n", (void *)katom); - slot_became_unblocked = - kbase_jsctx_slot_atom_pulled_dec(kctx, katom); - context_idle = !kbase_jsctx_atoms_pulled(kctx); + slot_became_unblocked = kbase_jsctx_slot_atom_pulled_dec(kctx, katom); + context_idle = !kbase_jsctx_atoms_pulled(kctx); - if (!kbase_jsctx_atoms_pulled(kctx) && !kctx->slots_pullable) { - WARN_ON(!kbase_ctx_flag(kctx, KCTX_RUNNABLE_REF)); - kbase_ctx_flag_clear(kctx, KCTX_RUNNABLE_REF); - atomic_dec(&kbdev->js_data.nr_contexts_runnable); - timer_sync = true; - } + if (!kbase_jsctx_atoms_pulled(kctx) && !kctx->slots_pullable) { + WARN_ON(!kbase_ctx_flag(kctx, KCTX_RUNNABLE_REF)); + kbase_ctx_flag_clear(kctx, KCTX_RUNNABLE_REF); + atomic_dec(&kbdev->js_data.nr_contexts_runnable); + timer_sync = true; + } - /* If this slot has been blocked due to soft-stopped atoms, and - * all atoms have now been processed at this priority level and - * higher, then unblock the slot - */ - if (slot_became_unblocked) { - dev_dbg(kbdev->dev, - "kctx %pK is no longer blocked from submitting on slot %d at priority %d or higher\n", - (void *)kctx, atom_slot, prio); + /* If this slot has been blocked due to soft-stopped atoms, and + * all atoms have now been processed at this priority level and + * higher, then unblock the slot + */ + if (slot_became_unblocked) { + dev_dbg(kbdev->dev, + "kctx %pK is no longer blocked from submitting on slot %d at priority %d or higher\n", + (void *)kctx, atom_slot, prio); - if (kbase_js_ctx_pullable(kctx, atom_slot, true)) - timer_sync |= - kbase_js_ctx_list_add_pullable_nolock( - kbdev, kctx, atom_slot); - } + if (kbase_js_ctx_pullable(kctx, atom_slot, true)) + timer_sync |= + kbase_js_ctx_list_add_pullable_nolock(kbdev, kctx, atom_slot); } - WARN_ON(!(katom->atom_flags & KBASE_KATOM_FLAG_JSCTX_IN_TREE)); if (!kbase_jsctx_slot_atoms_pulled(kctx, atom_slot) && jsctx_rb_none_to_pull(kctx, atom_slot)) { - if (!list_empty( - &kctx->jctx.sched_info.ctx.ctx_list_entry[atom_slot])) + if (!list_empty(&kctx->jctx.sched_info.ctx.ctx_list_entry[atom_slot])) timer_sync |= kbase_js_ctx_list_remove_nolock( kctx->kbdev, kctx, atom_slot); } @@ -3348,7 +3484,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, if (!kbasep_js_is_submit_allowed(js_devdata, kctx) && !kbase_jsctx_atoms_pulled(kctx) && !kbase_ctx_flag(kctx, KCTX_DYING)) { - int js; + unsigned int js; kbasep_js_set_submit_allowed(js_devdata, kctx); @@ -3360,7 +3496,7 @@ bool kbase_js_complete_atom_wq(struct kbase_context *kctx, } } else if (katom->x_post_dep && kbasep_js_is_submit_allowed(js_devdata, kctx)) { - int js; + unsigned int js; for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { if (kbase_js_ctx_pullable(kctx, js, true)) @@ -3638,13 +3774,13 @@ done: return ret; } -void kbase_js_sched(struct kbase_device *kbdev, int js_mask) +void kbase_js_sched(struct kbase_device *kbdev, unsigned int js_mask) { struct kbasep_js_device_data *js_devdata; struct kbase_context *last_active[BASE_JM_MAX_NR_SLOTS]; bool timer_sync = false; bool ctx_waiting[BASE_JM_MAX_NR_SLOTS]; - int js; + unsigned int js; KBASE_TLSTREAM_TL_JS_SCHED_START(kbdev, 0); @@ -3690,18 +3826,15 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) if (!kctx) { js_mask &= ~(1 << js); - dev_dbg(kbdev->dev, - "No kctx on pullable list (s:%d)\n", - js); + dev_dbg(kbdev->dev, "No kctx on pullable list (s:%u)\n", js); break; } if 
(!kbase_ctx_flag(kctx, KCTX_ACTIVE)) { context_idle = true; - dev_dbg(kbdev->dev, - "kctx %pK is not active (s:%d)\n", - (void *)kctx, js); + dev_dbg(kbdev->dev, "kctx %pK is not active (s:%u)\n", (void *)kctx, + js); if (kbase_js_defer_activate_for_slot(kctx, js)) { bool ctx_count_changed; @@ -3724,8 +3857,7 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) if (kbase_pm_context_active_handle_suspend( kbdev, KBASE_PM_SUSPEND_HANDLER_DONT_INCREASE)) { - dev_dbg(kbdev->dev, - "Suspend pending (s:%d)\n", js); + dev_dbg(kbdev->dev, "Suspend pending (s:%u)\n", js); /* Suspend pending - return context to * queue and stop scheduling */ @@ -3786,16 +3918,13 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) kbase_ctx_flag_clear(kctx, KCTX_PULLED); if (!kbase_jm_kick(kbdev, 1 << js)) { - dev_dbg(kbdev->dev, - "No more jobs can be submitted (s:%d)\n", - js); + dev_dbg(kbdev->dev, "No more jobs can be submitted (s:%u)\n", js); js_mask &= ~(1 << js); } if (!kbase_ctx_flag(kctx, KCTX_PULLED)) { bool pullable; - dev_dbg(kbdev->dev, - "No atoms pulled from kctx %pK (s:%d)\n", + dev_dbg(kbdev->dev, "No atoms pulled from kctx %pK (s:%u)\n", (void *)kctx, js); pullable = kbase_js_ctx_pullable(kctx, js, @@ -3879,8 +4008,8 @@ void kbase_js_sched(struct kbase_device *kbdev, int js_mask) for (js = 0; js < BASE_JM_MAX_NR_SLOTS; js++) { if (kbdev->hwaccess.active_kctx[js] == last_active[js] && ctx_waiting[js]) { - dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%d)\n", - (void *)last_active[js], js); + dev_dbg(kbdev->dev, "Marking kctx %pK as inactive (s:%u)\n", + (void *)last_active[js], js); kbdev->hwaccess.active_kctx[js] = NULL; } } @@ -3951,7 +4080,7 @@ void kbase_js_zap_context(struct kbase_context *kctx) */ if (!kbase_ctx_flag(kctx, KCTX_SCHEDULED)) { unsigned long flags; - int js; + unsigned int js; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); for (js = 0; js < kbdev->gpu_props.num_job_slots; js++) { @@ -3990,6 +4119,8 @@ void kbase_js_zap_context(struct kbase_context *kctx) rt_mutex_unlock(&kctx->jctx.lock); } else { unsigned long flags; + bool was_retained; + CSTD_UNUSED(was_retained); /* Case c: didn't evict, but it is scheduled - it's in the Run * Pool @@ -4068,7 +4199,7 @@ static void kbase_js_foreach_ctx_job(struct kbase_context *kctx, { struct kbase_device *kbdev; unsigned long flags; - u32 js; + unsigned int js; kbdev = kctx->kbdev; @@ -4088,13 +4219,15 @@ base_jd_prio kbase_js_priority_check(struct kbase_device *kbdev, base_jd_prio pr { struct priority_control_manager_device *pcm_device = kbdev->pcm_dev; int req_priority, out_priority; - base_jd_prio out_jd_priority = priority; - if (pcm_device) { - req_priority = kbasep_js_atom_prio_to_sched_prio(priority); - out_priority = pcm_device->ops.pcm_scheduler_priority_check(pcm_device, current, req_priority); - out_jd_priority = kbasep_js_sched_prio_to_atom_prio(out_priority); - } - return out_jd_priority; + req_priority = kbasep_js_atom_prio_to_sched_prio(priority); + out_priority = req_priority; + /* Does not use pcm defined priority check if PCM not defined or if + * kbasep_js_atom_prio_to_sched_prio returns an error + * (KBASE_JS_ATOM_SCHED_PRIO_INVALID). 
+ */ + if (pcm_device && (req_priority != KBASE_JS_ATOM_SCHED_PRIO_INVALID)) + out_priority = pcm_device->ops.pcm_scheduler_priority_check(pcm_device, current, + req_priority); + return kbasep_js_sched_prio_to_atom_prio(kbdev, out_priority); } - diff --git a/mali_kbase/mali_kbase_kinstr_jm.c b/mali_kbase/mali_kbase_kinstr_jm.c index 84efbb3..ca74540 100644 --- a/mali_kbase/mali_kbase_kinstr_jm.c +++ b/mali_kbase/mali_kbase_kinstr_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -45,8 +45,14 @@ #include <linux/slab.h> #include <linux/spinlock.h> #include <linux/version.h> +#include <linux/version_compat_defs.h> #include <linux/wait.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* Define static_assert(). * * The macro was introduced in kernel 5.1. But older vendor kernels may define @@ -60,14 +66,6 @@ #define __static_assert(e, msg, ...) _Static_assert(e, msg) #endif -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE -typedef unsigned int __poll_t; -#endif - -#ifndef ENOTSUP -#define ENOTSUP EOPNOTSUPP -#endif - /* The module printing prefix */ #define PR_ "mali_kbase_kinstr_jm: " @@ -227,11 +225,8 @@ static inline bool reader_changes_is_valid_size(const size_t size) * * Return: * (0, U16_MAX] - the number of data elements allocated - * -EINVAL - a pointer was invalid - * -ENOTSUP - we do not support allocation of the context * -ERANGE - the requested memory size was invalid * -ENOMEM - could not allocate the memory - * -EADDRINUSE - the buffer memory was already allocated */ static int reader_changes_init(struct reader_changes *const changes, const size_t size) @@ -626,31 +621,34 @@ exit: * * Return: * * 0 - no data ready - * * POLLIN - state changes have been buffered - * * -EBADF - the file descriptor did not have an attached reader - * * -EINVAL - the IO control arguments were invalid + * * EPOLLIN | EPOLLRDNORM - state changes have been buffered + * * EPOLLHUP | EPOLLERR - IO control arguments were invalid or the file + * descriptor did not have an attached reader. */ static __poll_t reader_poll(struct file *const file, struct poll_table_struct *const wait) { struct reader *reader; struct reader_changes *changes; + __poll_t mask = 0; if (unlikely(!file || !wait)) - return -EINVAL; + return EPOLLHUP | EPOLLERR; reader = file->private_data; if (unlikely(!reader)) - return -EBADF; + return EPOLLHUP | EPOLLERR; changes = &reader->changes; - if (reader_changes_count(changes) >= changes->threshold) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; poll_wait(file, &reader->wait_queue, wait); - return (reader_changes_count(changes) > 0) ? 
POLLIN : 0; + if (reader_changes_count(changes) > 0) + mask |= EPOLLIN | EPOLLRDNORM; + + return mask; } /* The file operations virtual function table */ @@ -666,7 +664,7 @@ static const struct file_operations file_operations = { static const size_t kbase_kinstr_jm_readers_max = 16; /** - * kbasep_kinstr_jm_release() - Invoked when the reference count is dropped + * kbase_kinstr_jm_release() - Invoked when the reference count is dropped * @ref: the context reference count */ static void kbase_kinstr_jm_release(struct kref *const ref) @@ -737,7 +735,7 @@ static int kbase_kinstr_jm_readers_add(struct kbase_kinstr_jm *const ctx, } /** - * readers_del() - Deletes a reader from the list of readers + * kbase_kinstr_jm_readers_del() - Deletes a reader from the list of readers * @ctx: the instrumentation context * @reader: the reader to delete */ diff --git a/mali_kbase/mali_kbase_kinstr_jm.h b/mali_kbase/mali_kbase_kinstr_jm.h index 2c904e5..84fabac 100644 --- a/mali_kbase/mali_kbase_kinstr_jm.h +++ b/mali_kbase/mali_kbase_kinstr_jm.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -71,8 +71,6 @@ #else /* empty wrapper macros for userspace */ #define static_branch_unlikely(key) (1) -#define KERNEL_VERSION(a, b, c) (0) -#define LINUX_VERSION_CODE (1) #endif /* __KERNEL__ */ /* Forward declarations */ diff --git a/mali_kbase/mali_kbase_kinstr_prfcnt.c b/mali_kbase/mali_kbase_kinstr_prfcnt.c index afc008b..f0c4da7 100644 --- a/mali_kbase/mali_kbase_kinstr_prfcnt.c +++ b/mali_kbase/mali_kbase_kinstr_prfcnt.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,8 +21,8 @@ #include "mali_kbase.h" #include "mali_kbase_kinstr_prfcnt.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> #include "mali_malisw.h" #include "mali_kbase_debug.h" @@ -36,8 +36,14 @@ #include <linux/mutex.h> #include <linux/poll.h> #include <linux/slab.h> +#include <linux/version_compat_defs.h> #include <linux/workqueue.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* The minimum allowed interval between dumps, in nanoseconds * (equivalent to 10KHz) */ @@ -46,9 +52,6 @@ /* The maximum allowed buffers per client */ #define MAX_BUFFER_COUNT 32 -/* The module printing prefix */ -#define KINSTR_PRFCNT_PREFIX "mali_kbase_kinstr_prfcnt: " - /** * struct kbase_kinstr_prfcnt_context - IOCTL interface for userspace hardware * counters. @@ -87,16 +90,13 @@ struct kbase_kinstr_prfcnt_sample { /** * struct kbase_kinstr_prfcnt_sample_array - Array of sample data. - * @page_addr: Address of allocated pages. A single allocation is used + * @user_buf: Address of allocated userspace buffer. 
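The reader_poll() rework above reflects the kernel's __poll_t convention: a poll handler returns an event mask (EPOLLIN | EPOLLRDNORM when data is readable, EPOLLHUP | EPOLLERR for a dead or invalid file) rather than a negative errno, and it registers the waitqueue with poll_wait() before computing the mask so a wake-up cannot be missed. A minimal sketch of that convention, with hypothetical example_* names:

#include <linux/atomic.h>
#include <linux/fs.h>
#include <linux/poll.h>
#include <linux/wait.h>

struct example_reader {
	wait_queue_head_t wait_queue;
	atomic_t pending;	/* number of buffered items ready to read */
};

static __poll_t example_poll(struct file *filp, struct poll_table_struct *wait)
{
	struct example_reader *reader = filp->private_data;
	__poll_t mask = 0;

	if (!reader)
		return EPOLLHUP | EPOLLERR;

	/* Register the waitqueue first, then evaluate readiness. */
	poll_wait(filp, &reader->wait_queue, wait);

	if (atomic_read(&reader->pending) > 0)
		mask |= EPOLLIN | EPOLLRDNORM;

	return mask;
}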
A single allocation is used * for all Dump Buffers in the array. - * @page_order: The allocation order of the pages, the order is on a - * logarithmic scale. * @sample_count: Number of allocated samples. * @samples: Non-NULL pointer to the array of Dump Buffers. */ struct kbase_kinstr_prfcnt_sample_array { - u64 page_addr; - unsigned int page_order; + u8 *user_buf; size_t sample_count; struct kbase_kinstr_prfcnt_sample *samples; }; @@ -120,16 +120,31 @@ struct kbase_kinstr_prfcnt_client_config { }; /** - * struct kbase_kinstr_prfcnt_async - Asynchronous sampling operation to - * carry out for a kinstr_prfcnt_client. - * @dump_work: Worker for performing asynchronous counter dumps. - * @user_data: User data for asynchronous dump in progress. - * @ts_end_ns: End timestamp of most recent async dump. + * enum kbase_kinstr_prfcnt_client_init_state - A list of + * initialisation states that the + * kinstr_prfcnt client can be at + * during initialisation. Useful + * for terminating a partially + * initialised client. + * + * @KINSTR_PRFCNT_UNINITIALISED : Client is uninitialised + * @KINSTR_PRFCNT_PARSE_SETUP : Parse the setup session + * @KINSTR_PRFCNT_ENABLE_MAP : Allocate memory for enable map + * @KINSTR_PRFCNT_DUMP_BUFFER : Allocate memory for dump buffer + * @KINSTR_PRFCNT_SAMPLE_ARRAY : Allocate memory for and initialise sample array + * @KINSTR_PRFCNT_VIRTUALIZER_CLIENT : Create virtualizer client + * @KINSTR_PRFCNT_WAITQ_MUTEX : Create and initialise mutex and waitqueue + * @KINSTR_PRFCNT_INITIALISED : Client is fully initialised */ -struct kbase_kinstr_prfcnt_async { - struct work_struct dump_work; - u64 user_data; - u64 ts_end_ns; +enum kbase_kinstr_prfcnt_client_init_state { + KINSTR_PRFCNT_UNINITIALISED, + KINSTR_PRFCNT_PARSE_SETUP = KINSTR_PRFCNT_UNINITIALISED, + KINSTR_PRFCNT_ENABLE_MAP, + KINSTR_PRFCNT_DUMP_BUFFER, + KINSTR_PRFCNT_SAMPLE_ARRAY, + KINSTR_PRFCNT_VIRTUALIZER_CLIENT, + KINSTR_PRFCNT_WAITQ_MUTEX, + KINSTR_PRFCNT_INITIALISED }; /** @@ -139,9 +154,7 @@ struct kbase_kinstr_prfcnt_async { * @hvcli: Hardware counter virtualizer client. * @node: Node used to attach this client to list in * kinstr_prfcnt context. - * @cmd_sync_lock: Lock coordinating the reader interface for commands - * that need interacting with the async sample dump - * worker thread. + * @cmd_sync_lock: Lock coordinating the reader interface for commands. * @next_dump_time_ns: Time in ns when this client's next periodic dump must * occur. If 0, not a periodic client. * @dump_interval_ns: Interval between periodic dumps. If 0, not a periodic @@ -162,15 +175,10 @@ struct kbase_kinstr_prfcnt_async { * @waitq: Client's notification queue. * @sample_size: Size of the data required for one sample, in bytes. * @sample_count: Number of samples the client is able to capture. - * @sync_sample_count: Number of available spaces for synchronous samples. - * It can differ from sample_count if asynchronous - * sample requests are reserving space in the buffer. * @user_data: User data associated with the session. * This is set when the session is started and stopped. * This value is ignored for control commands that * provide another value. - * @async: Asynchronous sampling operations to carry out in this - * client's session. 
*/ struct kbase_kinstr_prfcnt_client { struct kbase_kinstr_prfcnt_context *kinstr_ctx; @@ -191,9 +199,7 @@ struct kbase_kinstr_prfcnt_client { wait_queue_head_t waitq; size_t sample_size; size_t sample_count; - atomic_t sync_sample_count; u64 user_data; - struct kbase_kinstr_prfcnt_async async; }; static struct prfcnt_enum_item kinstr_prfcnt_supported_requests[] = { @@ -226,35 +232,29 @@ static struct prfcnt_enum_item kinstr_prfcnt_supported_requests[] = { * @filp: Non-NULL pointer to file structure. * @wait: Non-NULL pointer to poll table. * - * Return: POLLIN if data can be read without blocking, 0 if data can not be - * read without blocking, else error code. + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, 0 if + * data can not be read without blocking, else EPOLLHUP | EPOLLERR. */ -#if KERNEL_VERSION(4, 16, 0) >= LINUX_VERSION_CODE -static unsigned int -kbasep_kinstr_prfcnt_hwcnt_reader_poll(struct file *filp, - struct poll_table_struct *wait) -#else static __poll_t kbasep_kinstr_prfcnt_hwcnt_reader_poll(struct file *filp, struct poll_table_struct *wait) -#endif { struct kbase_kinstr_prfcnt_client *cli; if (!filp || !wait) - return -EINVAL; + return EPOLLHUP | EPOLLERR; cli = filp->private_data; if (!cli) - return -EINVAL; + return EPOLLHUP | EPOLLERR; poll_wait(filp, &cli->waitq, wait); if (atomic_read(&cli->write_idx) != atomic_read(&cli->fetch_idx)) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; - return 0; + return (__poll_t)0; } /** @@ -392,7 +392,10 @@ kbase_hwcnt_metadata_block_type_to_prfcnt_block_type(u64 type) block_type = PRFCNT_BLOCK_TYPE_MEMORY; break; - case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_FE_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_SC_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_TILER_UNDEFINED: + case KBASE_HWCNT_GPU_V5_BLOCK_TYPE_PERF_MEMSYS_UNDEFINED: default: block_type = PRFCNT_BLOCK_TYPE_RESERVED; break; @@ -429,18 +432,23 @@ static int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *enable_map, struct kbase_hwcnt_dump_buffer *dst, struct prfcnt_metadata **block_meta_base, - u64 base_addr, u8 counter_set) + u8 *base_addr, u8 counter_set) { size_t grp, blk, blk_inst; struct prfcnt_metadata **ptr_md = block_meta_base; const struct kbase_hwcnt_metadata *metadata; + uint8_t block_idx = 0; if (!dst || !*block_meta_base) return -EINVAL; metadata = dst->metadata; kbase_hwcnt_metadata_for_each_block(metadata, grp, blk, blk_inst) { - u64 *dst_blk; + u8 *dst_blk; + + /* Block indices must be reported with no gaps. 
*/ + if (blk_inst == 0) + block_idx = 0; /* Skip unavailable or non-enabled blocks */ if (kbase_kinstr_is_block_type_reserved(metadata, grp, blk) || @@ -448,20 +456,21 @@ int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *ena !kbase_hwcnt_enable_map_block_enabled(enable_map, grp, blk, blk_inst)) continue; - dst_blk = kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); + dst_blk = (u8 *)kbase_hwcnt_dump_buffer_block_instance(dst, grp, blk, blk_inst); (*ptr_md)->hdr.item_type = PRFCNT_SAMPLE_META_TYPE_BLOCK; (*ptr_md)->hdr.item_version = PRFCNT_READER_API_VERSION; (*ptr_md)->u.block_md.block_type = kbase_hwcnt_metadata_block_type_to_prfcnt_block_type( kbase_hwcnt_metadata_block_type(metadata, grp, blk)); - (*ptr_md)->u.block_md.block_idx = (u8)blk_inst; + (*ptr_md)->u.block_md.block_idx = block_idx; (*ptr_md)->u.block_md.set = counter_set; (*ptr_md)->u.block_md.block_state = BLOCK_STATE_UNKNOWN; - (*ptr_md)->u.block_md.values_offset = (u32)((u64)(uintptr_t)dst_blk - base_addr); + (*ptr_md)->u.block_md.values_offset = (u32)(dst_blk - base_addr); /* update the buf meta data block pointer to next item */ (*ptr_md)++; + block_idx++; } return 0; @@ -504,7 +513,7 @@ static void kbasep_kinstr_prfcnt_set_sample_metadata( /* Dealing with counter blocks */ ptr_md++; if (WARN_ON(kbasep_kinstr_prfcnt_set_block_meta_items(&cli->enable_map, dump_buf, &ptr_md, - cli->sample_arr.page_addr, + cli->sample_arr.user_buf, cli->config.counter_set))) return; @@ -514,33 +523,6 @@ static void kbasep_kinstr_prfcnt_set_sample_metadata( } /** - * kbasep_kinstr_prfcnt_client_output_empty_sample() - Assemble an empty sample - * for output. - * @cli: Non-NULL pointer to a kinstr_prfcnt client. - * @buf_idx: The index to the sample array for saving the sample. - */ -static void kbasep_kinstr_prfcnt_client_output_empty_sample( - struct kbase_kinstr_prfcnt_client *cli, unsigned int buf_idx) -{ - struct kbase_hwcnt_dump_buffer *dump_buf; - struct prfcnt_metadata *ptr_md; - - if (WARN_ON(buf_idx >= cli->sample_arr.sample_count)) - return; - - dump_buf = &cli->sample_arr.samples[buf_idx].dump_buf; - ptr_md = cli->sample_arr.samples[buf_idx].sample_meta; - - kbase_hwcnt_dump_buffer_zero(dump_buf, &cli->enable_map); - - /* Use end timestamp from most recent async dump */ - ptr_md->u.sample_md.timestamp_start = cli->async.ts_end_ns; - ptr_md->u.sample_md.timestamp_end = cli->async.ts_end_ns; - - kbasep_kinstr_prfcnt_set_sample_metadata(cli, dump_buf, ptr_md); -} - -/** * kbasep_kinstr_prfcnt_client_output_sample() - Assemble a sample for output. * @cli: Non-NULL pointer to a kinstr_prfcnt client. * @buf_idx: The index to the sample array for saving the sample. @@ -589,16 +571,11 @@ static void kbasep_kinstr_prfcnt_client_output_sample( * @cli: Non-NULL pointer to a kinstr_prfcnt client. * @event_id: Event type that triggered the dump. * @user_data: User data to return to the user. - * @async_dump: Whether this is an asynchronous dump or not. - * @empty_sample: Sample block data will be 0 if this is true. * * Return: 0 on success, else error code. 
*/ -static int -kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, - enum base_hwcnt_reader_event event_id, - u64 user_data, bool async_dump, - bool empty_sample) +static int kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, + enum base_hwcnt_reader_event event_id, u64 user_data) { int ret; u64 ts_start_ns = 0; @@ -616,17 +593,11 @@ kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, /* Check if there is a place to copy HWC block into. Calculate the * number of available samples count, by taking into account the type * of dump. - * Asynchronous dumps have the ability to reserve space in the samples - * array for future dumps, unlike synchronous dumps. Because of that, - * the samples count for synchronous dumps is managed by a variable - * called sync_sample_count, that originally is defined as equal to the - * size of the whole array but later decreases every time an - * asynchronous dump request is pending and then re-increased every - * time an asynchronous dump request is completed. */ - available_samples_count = async_dump ? - cli->sample_arr.sample_count : - atomic_read(&cli->sync_sample_count); + available_samples_count = cli->sample_arr.sample_count; + WARN_ON(available_samples_count < 1); + /* Reserve one slot to store the implicit sample taken on CMD_STOP */ + available_samples_count -= 1; if (write_idx - read_idx == available_samples_count) { /* For periodic sampling, the current active dump * will be accumulated in the next sample, when @@ -642,38 +613,19 @@ kbasep_kinstr_prfcnt_client_dump(struct kbase_kinstr_prfcnt_client *cli, */ write_idx %= cli->sample_arr.sample_count; - if (!empty_sample) { - ret = kbase_hwcnt_virtualizer_client_dump( - cli->hvcli, &ts_start_ns, &ts_end_ns, &cli->tmp_buf); - /* HWC dump error, set the sample with error flag */ - if (ret) - cli->sample_flags |= SAMPLE_FLAG_ERROR; - - /* Make the sample ready and copy it to the userspace mapped buffer */ - kbasep_kinstr_prfcnt_client_output_sample( - cli, write_idx, user_data, ts_start_ns, ts_end_ns); - } else { - if (!async_dump) { - struct prfcnt_metadata *ptr_md; - /* User data will not be updated for empty samples. */ - ptr_md = cli->sample_arr.samples[write_idx].sample_meta; - ptr_md->u.sample_md.user_data = user_data; - } + ret = kbase_hwcnt_virtualizer_client_dump(cli->hvcli, &ts_start_ns, &ts_end_ns, + &cli->tmp_buf); + /* HWC dump error, set the sample with error flag */ + if (ret) + cli->sample_flags |= SAMPLE_FLAG_ERROR; - /* Make the sample ready and copy it to the userspace mapped buffer */ - kbasep_kinstr_prfcnt_client_output_empty_sample(cli, write_idx); - } + /* Make the sample ready and copy it to the userspace mapped buffer */ + kbasep_kinstr_prfcnt_client_output_sample(cli, write_idx, user_data, ts_start_ns, + ts_end_ns); /* Notify client. Make sure all changes to memory are visible. 
*/ wmb(); atomic_inc(&cli->write_idx); - if (async_dump) { - /* Remember the end timestamp of async dump for empty samples */ - if (!empty_sample) - cli->async.ts_end_ns = ts_end_ns; - - atomic_inc(&cli->sync_sample_count); - } wake_up_interruptible(&cli->waitq); /* Reset the flags for the next sample dump */ cli->sample_flags = 0; @@ -687,6 +639,9 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, { int ret; u64 tm_start, tm_end; + unsigned int write_idx; + unsigned int read_idx; + size_t available_samples_count; WARN_ON(!cli); lockdep_assert_held(&cli->cmd_sync_lock); @@ -695,6 +650,16 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, if (cli->active) return 0; + write_idx = atomic_read(&cli->write_idx); + read_idx = atomic_read(&cli->read_idx); + + /* Check whether there is space to store atleast an implicit sample + * corresponding to CMD_STOP. + */ + available_samples_count = cli->sample_count - (write_idx - read_idx); + if (!available_samples_count) + return -EBUSY; + kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &cli->config.phys_em); @@ -707,7 +672,6 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, cli->hvcli, &cli->enable_map, &tm_start, &tm_end, NULL); if (!ret) { - atomic_set(&cli->sync_sample_count, cli->sample_count); cli->active = true; cli->user_data = user_data; cli->sample_flags = 0; @@ -721,16 +685,6 @@ kbasep_kinstr_prfcnt_client_start(struct kbase_kinstr_prfcnt_client *cli, return ret; } -static int kbasep_kinstr_prfcnt_client_wait_async_done( - struct kbase_kinstr_prfcnt_client *cli) -{ - lockdep_assert_held(&cli->cmd_sync_lock); - - return wait_event_interruptible(cli->waitq, - atomic_read(&cli->sync_sample_count) == - cli->sample_count); -} - static int kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, u64 user_data) @@ -739,7 +693,7 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, u64 tm_start = 0; u64 tm_end = 0; struct kbase_hwcnt_physical_enable_map phys_em; - struct kbase_hwcnt_dump_buffer *tmp_buf = NULL; + size_t available_samples_count; unsigned int write_idx; unsigned int read_idx; @@ -750,12 +704,11 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, if (!cli->active) return -EINVAL; - /* Wait until pending async sample operation done */ - ret = kbasep_kinstr_prfcnt_client_wait_async_done(cli); - - if (ret < 0) - return -ERESTARTSYS; + mutex_lock(&cli->kinstr_ctx->lock); + /* Disable counters under the lock, so we do not race with the + * sampling thread. 
+ */ phys_em.fe_bm = 0; phys_em.tiler_bm = 0; phys_em.mmu_l2_bm = 0; @@ -763,15 +716,11 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &phys_em); - mutex_lock(&cli->kinstr_ctx->lock); - /* Check whether one has the buffer to hold the last sample */ write_idx = atomic_read(&cli->write_idx); read_idx = atomic_read(&cli->read_idx); - /* Check if there is a place to save the last stop produced sample */ - if (write_idx - read_idx < cli->sample_arr.sample_count) - tmp_buf = &cli->tmp_buf; + available_samples_count = cli->sample_count - (write_idx - read_idx); ret = kbase_hwcnt_virtualizer_client_set_counters(cli->hvcli, &cli->enable_map, @@ -781,7 +730,8 @@ kbasep_kinstr_prfcnt_client_stop(struct kbase_kinstr_prfcnt_client *cli, if (ret) cli->sample_flags |= SAMPLE_FLAG_ERROR; - if (tmp_buf) { + /* There must be a place to save the last stop produced sample */ + if (!WARN_ON(!available_samples_count)) { write_idx %= cli->sample_arr.sample_count; /* Handle the last stop sample */ kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, @@ -811,7 +761,6 @@ kbasep_kinstr_prfcnt_client_sync_dump(struct kbase_kinstr_prfcnt_client *cli, u64 user_data) { int ret; - bool empty_sample = false; lockdep_assert_held(&cli->cmd_sync_lock); @@ -819,90 +768,9 @@ kbasep_kinstr_prfcnt_client_sync_dump(struct kbase_kinstr_prfcnt_client *cli, if (!cli->active || cli->dump_interval_ns) return -EINVAL; - /* Wait until pending async sample operation done, this is required to - * satisfy the stated sample sequence following their issuing order, - * reflected by the sample start timestamp. - */ - if (atomic_read(&cli->sync_sample_count) != cli->sample_count) { - /* Return empty sample instead of performing real dump. - * As there is an async dump currently in-flight which will - * have the desired information. - */ - empty_sample = true; - ret = kbasep_kinstr_prfcnt_client_wait_async_done(cli); - - if (ret < 0) - return -ERESTARTSYS; - } - mutex_lock(&cli->kinstr_ctx->lock); - ret = kbasep_kinstr_prfcnt_client_dump(cli, - BASE_HWCNT_READER_EVENT_MANUAL, - user_data, false, empty_sample); - - mutex_unlock(&cli->kinstr_ctx->lock); - - return ret; -} - -static int -kbasep_kinstr_prfcnt_client_async_dump(struct kbase_kinstr_prfcnt_client *cli, - u64 user_data) -{ - unsigned int write_idx; - unsigned int read_idx; - unsigned int active_async_dumps; - unsigned int new_async_buf_idx; - int ret; - - lockdep_assert_held(&cli->cmd_sync_lock); - - /* If the client is not started, or not manual, the command invalid */ - if (!cli->active || cli->dump_interval_ns) - return -EINVAL; - - mutex_lock(&cli->kinstr_ctx->lock); - - write_idx = atomic_read(&cli->write_idx); - read_idx = atomic_read(&cli->read_idx); - active_async_dumps = - cli->sample_count - atomic_read(&cli->sync_sample_count); - new_async_buf_idx = write_idx + active_async_dumps; - - /* Check if there is a place to copy HWC block into. - * If successful, reserve space in the buffer for the asynchronous - * operation to make sure that it can actually take place. - * Because we reserve space for asynchronous dumps we need to take that - * in consideration here. - */ - ret = (new_async_buf_idx - read_idx == cli->sample_arr.sample_count) ? 
- -EBUSY : - 0; - - if (ret == -EBUSY) { - mutex_unlock(&cli->kinstr_ctx->lock); - return ret; - } - - if (active_async_dumps > 0) { - struct prfcnt_metadata *ptr_md; - unsigned int buf_idx = - new_async_buf_idx % cli->sample_arr.sample_count; - /* Instead of storing user_data, write it directly to future - * empty sample. - */ - ptr_md = cli->sample_arr.samples[buf_idx].sample_meta; - ptr_md->u.sample_md.user_data = user_data; - - atomic_dec(&cli->sync_sample_count); - } else { - cli->async.user_data = user_data; - atomic_dec(&cli->sync_sample_count); - - kbase_hwcnt_virtualizer_queue_work(cli->kinstr_ctx->hvirt, - &cli->async.dump_work); - } + ret = kbasep_kinstr_prfcnt_client_dump(cli, BASE_HWCNT_READER_EVENT_MANUAL, user_data); mutex_unlock(&cli->kinstr_ctx->lock); @@ -962,10 +830,6 @@ int kbasep_kinstr_prfcnt_cmd(struct kbase_kinstr_prfcnt_client *cli, ret = kbasep_kinstr_prfcnt_client_sync_dump( cli, control_cmd->user_data); break; - case PRFCNT_CONTROL_CMD_SAMPLE_ASYNC: - ret = kbasep_kinstr_prfcnt_client_async_dump( - cli, control_cmd->user_data); - break; case PRFCNT_CONTROL_CMD_DISCARD: ret = kbasep_kinstr_prfcnt_client_discard(cli); break; @@ -1017,23 +881,8 @@ kbasep_kinstr_prfcnt_get_sample(struct kbase_kinstr_prfcnt_client *cli, } read_idx %= cli->sample_arr.sample_count; - sample_offset_bytes = - (u64)(uintptr_t)cli->sample_arr.samples[read_idx].sample_meta - - (u64)(uintptr_t)cli->sample_arr.page_addr; - sample_meta = - (struct prfcnt_metadata *)cli->sample_arr.samples[read_idx] - .sample_meta; - - /* Verify that a valid sample has been dumped in the read_idx. - * There are situations where this may not be the case, - * for instance if the client is trying to get an asynchronous - * sample which has not been dumped yet. - */ - if (sample_meta->hdr.item_type != PRFCNT_SAMPLE_META_TYPE_SAMPLE || - sample_meta->hdr.item_version != PRFCNT_READER_API_VERSION) { - err = -EINVAL; - goto error_out; - } + sample_meta = cli->sample_arr.samples[read_idx].sample_meta; + sample_offset_bytes = (u8 *)sample_meta - cli->sample_arr.user_buf; sample_access->sequence = sample_meta->u.sample_md.seq; sample_access->sample_offset_bytes = sample_offset_bytes; @@ -1067,8 +916,7 @@ kbasep_kinstr_prfcnt_put_sample(struct kbase_kinstr_prfcnt_client *cli, read_idx %= cli->sample_arr.sample_count; sample_offset_bytes = - (u64)(uintptr_t)cli->sample_arr.samples[read_idx].sample_meta - - (u64)(uintptr_t)cli->sample_arr.page_addr; + (u8 *)cli->sample_arr.samples[read_idx].sample_meta - cli->sample_arr.user_buf; if (sample_access->sample_offset_bytes != sample_offset_bytes) { err = -EINVAL; @@ -1160,40 +1008,15 @@ static int kbasep_kinstr_prfcnt_hwcnt_reader_mmap(struct file *filp, struct vm_area_struct *vma) { struct kbase_kinstr_prfcnt_client *cli; - unsigned long vm_size, size, addr, pfn, offset; if (!filp || !vma) return -EINVAL; - cli = filp->private_data; + cli = filp->private_data; if (!cli) return -EINVAL; - vm_size = vma->vm_end - vma->vm_start; - - /* The mapping is allowed to span the entirety of the page allocation, - * not just the chunk where the dump buffers are allocated. - * This accommodates the corner case where the combined size of the - * dump buffers is smaller than a single page. - * This does not pose a security risk as the pages are zeroed on - * allocation, and anything out of bounds of the dump buffers is never - * written to. 
- */ - size = (1ull << cli->sample_arr.page_order) * PAGE_SIZE; - - if (vma->vm_pgoff > (size >> PAGE_SHIFT)) - return -EINVAL; - - offset = vma->vm_pgoff << PAGE_SHIFT; - - if (vm_size > size - offset) - return -EINVAL; - - addr = __pa(cli->sample_arr.page_addr + offset); - pfn = addr >> PAGE_SHIFT; - - return remap_pfn_range(vma, vma->vm_start, pfn, vm_size, - vma->vm_page_prot); + return remap_vmalloc_range(vma, cli->sample_arr.user_buf, 0); } static void kbasep_kinstr_prfcnt_sample_array_free( @@ -1202,27 +1025,51 @@ static void kbasep_kinstr_prfcnt_sample_array_free( if (!sample_arr) return; - kfree((void *)sample_arr->samples); - kfree((void *)(size_t)sample_arr->page_addr); + kfree(sample_arr->samples); + vfree(sample_arr->user_buf); memset(sample_arr, 0, sizeof(*sample_arr)); } -#if !MALI_KERNEL_TEST_API -static -#endif -void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli) +static void +kbasep_kinstr_prfcnt_client_destroy_partial(struct kbase_kinstr_prfcnt_client *cli, + enum kbase_kinstr_prfcnt_client_init_state init_state) { if (!cli) return; - kbase_hwcnt_virtualizer_client_destroy(cli->hvcli); - kbasep_kinstr_prfcnt_sample_array_free(&cli->sample_arr); - kbase_hwcnt_dump_buffer_free(&cli->tmp_buf); - kbase_hwcnt_enable_map_free(&cli->enable_map); - mutex_destroy(&cli->cmd_sync_lock); + while (init_state-- > KINSTR_PRFCNT_UNINITIALISED) { + switch (init_state) { + case KINSTR_PRFCNT_INITIALISED: + /* This shouldn't be reached */ + break; + case KINSTR_PRFCNT_WAITQ_MUTEX: + mutex_destroy(&cli->cmd_sync_lock); + break; + case KINSTR_PRFCNT_VIRTUALIZER_CLIENT: + kbase_hwcnt_virtualizer_client_destroy(cli->hvcli); + break; + case KINSTR_PRFCNT_SAMPLE_ARRAY: + kbasep_kinstr_prfcnt_sample_array_free(&cli->sample_arr); + break; + case KINSTR_PRFCNT_DUMP_BUFFER: + kbase_hwcnt_dump_buffer_free(&cli->tmp_buf); + break; + case KINSTR_PRFCNT_ENABLE_MAP: + kbase_hwcnt_enable_map_free(&cli->enable_map); + break; + case KINSTR_PRFCNT_PARSE_SETUP: + /* Nothing to do here */ + break; + } + } kfree(cli); } +void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli) +{ + kbasep_kinstr_prfcnt_client_destroy_partial(cli, KINSTR_PRFCNT_INITIALISED); +} + /** * kbasep_kinstr_prfcnt_hwcnt_reader_release() - hwcnt reader's release. * @inode: Non-NULL pointer to inode structure. @@ -1329,9 +1176,8 @@ static void kbasep_kinstr_prfcnt_dump_worker(struct work_struct *work) list_for_each_entry(pos, &kinstr_ctx->clients, node) { if (pos->active && (pos->next_dump_time_ns != 0) && (pos->next_dump_time_ns < cur_time_ns)) - kbasep_kinstr_prfcnt_client_dump( - pos, BASE_HWCNT_READER_EVENT_PERIODIC, - pos->user_data, false, false); + kbasep_kinstr_prfcnt_client_dump(pos, BASE_HWCNT_READER_EVENT_PERIODIC, + pos->user_data); } kbasep_kinstr_prfcnt_reschedule_worker(kinstr_ctx); @@ -1340,48 +1186,6 @@ static void kbasep_kinstr_prfcnt_dump_worker(struct work_struct *work) } /** - * kbasep_kinstr_prfcnt_async_dump_worker()- Dump worker for a manual client - * to take a single asynchronous - * sample. - * @work: Work structure. 
- */ -static void kbasep_kinstr_prfcnt_async_dump_worker(struct work_struct *work) -{ - struct kbase_kinstr_prfcnt_async *cli_async = - container_of(work, struct kbase_kinstr_prfcnt_async, dump_work); - struct kbase_kinstr_prfcnt_client *cli = container_of( - cli_async, struct kbase_kinstr_prfcnt_client, async); - - mutex_lock(&cli->kinstr_ctx->lock); - /* While the async operation is in flight, a sync stop might have been - * executed, for which the dump should be skipped. Further as we are - * doing an async dump, we expect that there is reserved buffer for - * this to happen. This is to avoid the rare corner case where the - * user side has issued a stop/start pair before the async work item - * get the chance to execute. - */ - if (cli->active && - (atomic_read(&cli->sync_sample_count) < cli->sample_count)) - kbasep_kinstr_prfcnt_client_dump(cli, - BASE_HWCNT_READER_EVENT_MANUAL, - cli->async.user_data, true, - false); - - /* While the async operation is in flight, more async dump requests - * may have been submitted. In this case, no more async dumps work - * will be queued. Instead space will be reserved for that dump and - * an empty sample will be return after handling the current async - * dump. - */ - while (cli->active && - (atomic_read(&cli->sync_sample_count) < cli->sample_count)) { - kbasep_kinstr_prfcnt_client_dump( - cli, BASE_HWCNT_READER_EVENT_MANUAL, 0, true, true); - } - mutex_unlock(&cli->kinstr_ctx->lock); -} - -/** * kbasep_kinstr_prfcnt_dump_timer() - Dump timer that schedules the dump worker for * execution as soon as possible. * @timer: Timer structure. @@ -1443,8 +1247,6 @@ void kbase_kinstr_prfcnt_term(struct kbase_kinstr_prfcnt_context *kinstr_ctx) if (!kinstr_ctx) return; - cancel_work_sync(&kinstr_ctx->dump_work); - /* Non-zero client count implies client leak */ if (WARN_ON(kinstr_ctx->client_count > 0)) { struct kbase_kinstr_prfcnt_client *pos, *n; @@ -1456,14 +1258,18 @@ void kbase_kinstr_prfcnt_term(struct kbase_kinstr_prfcnt_context *kinstr_ctx) } } + cancel_work_sync(&kinstr_ctx->dump_work); + WARN_ON(kinstr_ctx->client_count > 0); kfree(kinstr_ctx); } void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) { - if (WARN_ON(!kinstr_ctx)) + if (!kinstr_ctx) { + pr_warn("%s: kinstr_ctx is NULL\n", __func__); return; + } mutex_lock(&kinstr_ctx->lock); @@ -1492,8 +1298,10 @@ void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) void kbase_kinstr_prfcnt_resume(struct kbase_kinstr_prfcnt_context *kinstr_ctx) { - if (WARN_ON(!kinstr_ctx)) + if (!kinstr_ctx) { + pr_warn("%s: kinstr_ctx is NULL\n", __func__); return; + } mutex_lock(&kinstr_ctx->lock); @@ -1530,8 +1338,6 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl struct kbase_kinstr_prfcnt_sample_array *sample_arr = &cli->sample_arr; struct kbase_kinstr_prfcnt_sample *samples; size_t sample_idx; - u64 addr; - unsigned int order; size_t dump_buf_bytes; size_t clk_cnt_buf_bytes; size_t sample_meta_bytes; @@ -1554,16 +1360,13 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl if (!samples) return -ENOMEM; - order = get_order(sample_size * buffer_count); - addr = (u64)(uintptr_t)kzalloc(sample_size * buffer_count, GFP_KERNEL); + sample_arr->user_buf = vmalloc_user(sample_size * buffer_count); - if (!addr) { - kfree((void *)samples); + if (!sample_arr->user_buf) { + kfree(samples); return -ENOMEM; } - sample_arr->page_addr = addr; - sample_arr->page_order = order; sample_arr->sample_count = 
buffer_count; sample_arr->samples = samples; @@ -1577,12 +1380,11 @@ static int kbasep_kinstr_prfcnt_sample_array_alloc(struct kbase_kinstr_prfcnt_cl /* Internal layout in a sample buffer: [sample metadata, dump_buf, clk_cnt_buf]. */ samples[sample_idx].dump_buf.metadata = metadata; samples[sample_idx].sample_meta = - (struct prfcnt_metadata *)(uintptr_t)( - addr + sample_meta_offset); + (struct prfcnt_metadata *)(sample_arr->user_buf + sample_meta_offset); samples[sample_idx].dump_buf.dump_buf = - (u64 *)(uintptr_t)(addr + dump_buf_offset); + (u64 *)(sample_arr->user_buf + dump_buf_offset); samples[sample_idx].dump_buf.clk_cnt_buf = - (u64 *)(uintptr_t)(addr + clk_cnt_buf_offset); + (u64 *)(sample_arr->user_buf + clk_cnt_buf_offset); } return 0; @@ -1849,83 +1651,100 @@ int kbasep_kinstr_prfcnt_client_create(struct kbase_kinstr_prfcnt_context *kinst { int err; struct kbase_kinstr_prfcnt_client *cli; + enum kbase_kinstr_prfcnt_client_init_state init_state; - WARN_ON(!kinstr_ctx); - WARN_ON(!setup); - WARN_ON(!req_arr); + if (WARN_ON(!kinstr_ctx)) + return -EINVAL; + + if (WARN_ON(!setup)) + return -EINVAL; + + if (WARN_ON(!req_arr)) + return -EINVAL; cli = kzalloc(sizeof(*cli), GFP_KERNEL); if (!cli) return -ENOMEM; - cli->kinstr_ctx = kinstr_ctx; - err = kbasep_kinstr_prfcnt_parse_setup(kinstr_ctx, setup, &cli->config, req_arr); - - if (err < 0) - goto error; + for (init_state = KINSTR_PRFCNT_UNINITIALISED; init_state < KINSTR_PRFCNT_INITIALISED; + init_state++) { + err = 0; + switch (init_state) { + case KINSTR_PRFCNT_PARSE_SETUP: + cli->kinstr_ctx = kinstr_ctx; + err = kbasep_kinstr_prfcnt_parse_setup(kinstr_ctx, setup, &cli->config, + req_arr); - cli->config.buffer_count = MAX_BUFFER_COUNT; - cli->dump_interval_ns = cli->config.period_ns; - cli->next_dump_time_ns = 0; - cli->active = false; - atomic_set(&cli->write_idx, 0); - atomic_set(&cli->read_idx, 0); - atomic_set(&cli->fetch_idx, 0); + break; - err = kbase_hwcnt_enable_map_alloc(kinstr_ctx->metadata, - &cli->enable_map); + case KINSTR_PRFCNT_ENABLE_MAP: + cli->config.buffer_count = MAX_BUFFER_COUNT; + cli->dump_interval_ns = cli->config.period_ns; + cli->next_dump_time_ns = 0; + cli->active = false; + atomic_set(&cli->write_idx, 0); + atomic_set(&cli->read_idx, 0); + atomic_set(&cli->fetch_idx, 0); - if (err < 0) - goto error; + err = kbase_hwcnt_enable_map_alloc(kinstr_ctx->metadata, &cli->enable_map); + break; - kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, &cli->config.phys_em); + case KINSTR_PRFCNT_DUMP_BUFFER: + kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, + &cli->config.phys_em); - cli->sample_count = cli->config.buffer_count; - atomic_set(&cli->sync_sample_count, cli->sample_count); - cli->sample_size = kbasep_kinstr_prfcnt_get_sample_size(cli, kinstr_ctx->metadata); + cli->sample_count = cli->config.buffer_count; + cli->sample_size = + kbasep_kinstr_prfcnt_get_sample_size(cli, kinstr_ctx->metadata); - /* Use virtualizer's metadata to alloc tmp buffer which interacts with - * the HWC virtualizer. - */ - err = kbase_hwcnt_dump_buffer_alloc(kinstr_ctx->metadata, - &cli->tmp_buf); + /* Use virtualizer's metadata to alloc tmp buffer which interacts with + * the HWC virtualizer. 
+ */ + err = kbase_hwcnt_dump_buffer_alloc(kinstr_ctx->metadata, &cli->tmp_buf); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_SAMPLE_ARRAY: + /* Disable clock map in setup, and enable clock map when start */ + cli->enable_map.clk_enable_map = 0; - /* Disable clock map in setup, and enable clock map when start */ - cli->enable_map.clk_enable_map = 0; + /* Use metadata from virtualizer to allocate dump buffers if + * kinstr_prfcnt doesn't have the truncated metadata. + */ + err = kbasep_kinstr_prfcnt_sample_array_alloc(cli, kinstr_ctx->metadata); - /* Use metadata from virtualizer to allocate dump buffers if - * kinstr_prfcnt doesn't have the truncated metadata. - */ - err = kbasep_kinstr_prfcnt_sample_array_alloc(cli, kinstr_ctx->metadata); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_VIRTUALIZER_CLIENT: + /* Set enable map to be 0 to prevent virtualizer to init and kick the + * backend to count. + */ + kbase_hwcnt_gpu_enable_map_from_physical( + &cli->enable_map, &(struct kbase_hwcnt_physical_enable_map){ 0 }); - /* Set enable map to be 0 to prevent virtualizer to init and kick the backend to count */ - kbase_hwcnt_gpu_enable_map_from_physical(&cli->enable_map, - &(struct kbase_hwcnt_physical_enable_map){ 0 }); + err = kbase_hwcnt_virtualizer_client_create(kinstr_ctx->hvirt, + &cli->enable_map, &cli->hvcli); + break; - err = kbase_hwcnt_virtualizer_client_create( - kinstr_ctx->hvirt, &cli->enable_map, &cli->hvcli); + case KINSTR_PRFCNT_WAITQ_MUTEX: + init_waitqueue_head(&cli->waitq); + mutex_init(&cli->cmd_sync_lock); + break; - if (err < 0) - goto error; + case KINSTR_PRFCNT_INITIALISED: + /* This shouldn't be reached */ + break; + } - init_waitqueue_head(&cli->waitq); - INIT_WORK(&cli->async.dump_work, - kbasep_kinstr_prfcnt_async_dump_worker); - mutex_init(&cli->cmd_sync_lock); + if (err < 0) { + kbasep_kinstr_prfcnt_client_destroy_partial(cli, init_state); + return err; + } + } *out_vcli = cli; return 0; -error: - kbasep_kinstr_prfcnt_client_destroy(cli); - return err; } static size_t kbasep_kinstr_prfcnt_get_block_info_count( @@ -2033,7 +1852,6 @@ static int kbasep_kinstr_prfcnt_enum_info_count( struct kbase_kinstr_prfcnt_context *kinstr_ctx, struct kbase_ioctl_kinstr_prfcnt_enum_info *enum_info) { - int err = 0; uint32_t count = 0; size_t block_info_count = 0; const struct kbase_hwcnt_metadata *metadata; @@ -2054,7 +1872,7 @@ static int kbasep_kinstr_prfcnt_enum_info_count( enum_info->info_item_size = sizeof(struct prfcnt_enum_item); kinstr_ctx->info_item_count = count; - return err; + return 0; } static int kbasep_kinstr_prfcnt_enum_info_list( @@ -2148,17 +1966,18 @@ int kbase_kinstr_prfcnt_setup(struct kbase_kinstr_prfcnt_context *kinstr_ctx, union kbase_ioctl_kinstr_prfcnt_setup *setup) { int err; - unsigned int item_count; - unsigned long bytes; - struct prfcnt_request_item *req_arr; + size_t item_count; + size_t bytes; + struct prfcnt_request_item *req_arr = NULL; struct kbase_kinstr_prfcnt_client *cli = NULL; + const size_t max_bytes = 32 * sizeof(*req_arr); if (!kinstr_ctx || !setup) return -EINVAL; item_count = setup->in.request_item_count; - /* Limiting the request items to 2x of the expected: acommodating + /* Limiting the request items to 2x of the expected: accommodating * moderate duplications but rejecting excessive abuses. 
*/ if (!setup->in.requests_ptr || (item_count < 2) || (setup->in.request_item_size == 0) || @@ -2166,16 +1985,22 @@ int kbase_kinstr_prfcnt_setup(struct kbase_kinstr_prfcnt_context *kinstr_ctx, return -EINVAL; } - bytes = item_count * sizeof(*req_arr); - req_arr = kmalloc(bytes, GFP_KERNEL); + if (check_mul_overflow(item_count, sizeof(*req_arr), &bytes)) + return -EINVAL; + + /* Further limiting the max bytes to copy from userspace by setting it in the following + * fashion: a maximum of 1 mode item, 4 types of 3 sets for a total of 12 enable items, + * each currently at the size of prfcnt_request_item. + * + * Note: if more request types get added, this max limit needs to be updated. + */ + if (bytes > max_bytes) + return -EINVAL; - if (!req_arr) - return -ENOMEM; + req_arr = memdup_user(u64_to_user_ptr(setup->in.requests_ptr), bytes); - if (copy_from_user(req_arr, u64_to_user_ptr(setup->in.requests_ptr), bytes)) { - err = -EFAULT; - goto free_buf; - } + if (IS_ERR(req_arr)) + return PTR_ERR(req_arr); err = kbasep_kinstr_prfcnt_client_create(kinstr_ctx, setup, &cli, req_arr); diff --git a/mali_kbase/mali_kbase_kinstr_prfcnt.h b/mali_kbase/mali_kbase_kinstr_prfcnt.h index ec42ce0..53e9674 100644 --- a/mali_kbase/mali_kbase_kinstr_prfcnt.h +++ b/mali_kbase/mali_kbase_kinstr_prfcnt.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -26,7 +26,7 @@ #ifndef _KBASE_KINSTR_PRFCNT_H_ #define _KBASE_KINSTR_PRFCNT_H_ -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h> struct kbase_kinstr_prfcnt_context; @@ -80,7 +80,6 @@ void kbase_kinstr_prfcnt_suspend(struct kbase_kinstr_prfcnt_context *kinstr_ctx) */ void kbase_kinstr_prfcnt_resume(struct kbase_kinstr_prfcnt_context *kinstr_ctx); -#if MALI_KERNEL_TEST_API /** * kbasep_kinstr_prfcnt_get_block_info_list() - Get list of all block types * with their information. @@ -124,7 +123,7 @@ size_t kbasep_kinstr_prfcnt_get_sample_md_count(const struct kbase_hwcnt_metadat int kbasep_kinstr_prfcnt_set_block_meta_items(struct kbase_hwcnt_enable_map *enable_map, struct kbase_hwcnt_dump_buffer *dst, struct prfcnt_metadata **block_meta_base, - u64 base_addr, u8 counter_set); + u8 *base_addr, u8 counter_set); /** * kbasep_kinstr_prfcnt_client_create() - Create a kinstr_prfcnt client. @@ -158,7 +157,6 @@ int kbasep_kinstr_prfcnt_cmd(struct kbase_kinstr_prfcnt_client *cli, * @cli: kinstr_prfcnt client. Must not be attached to a kinstr_prfcnt context. */ void kbasep_kinstr_prfcnt_client_destroy(struct kbase_kinstr_prfcnt_client *cli); -#endif /* MALI_KERNEL_TEST_API */ /** * kbase_kinstr_prfcnt_enum_info - Enumerate performance counter information. diff --git a/mali_kbase/mali_kbase_linux.h b/mali_kbase/mali_kbase_linux.h index 1d8d196..e5c6f7a 100644 --- a/mali_kbase/mali_kbase_linux.h +++ b/mali_kbase/mali_kbase_linux.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2014, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2014, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -33,7 +33,7 @@ #include <linux/module.h> #include <linux/atomic.h> -#if (defined(MALI_KERNEL_TEST_API) && (1 == MALI_KERNEL_TEST_API)) +#if IS_ENABLED(MALI_KERNEL_TEST_API) #define KBASE_EXPORT_TEST_API(func) EXPORT_SYMBOL(func) #else #define KBASE_EXPORT_TEST_API(func) diff --git a/mali_kbase/mali_kbase_mem.c b/mali_kbase/mali_kbase_mem.c index 6562f01..5547bef 100644 --- a/mali_kbase/mali_kbase_mem.c +++ b/mali_kbase/mali_kbase_mem.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,6 +43,11 @@ #include <mmu/mali_kbase_mmu.h> #include <mali_kbase_config_defaults.h> #include <mali_kbase_trace_gpu_mem.h> +#include <linux/version_compat_defs.h> +#define VA_REGION_SLAB_NAME_PREFIX "va-region-slab-" +#define VA_REGION_SLAB_NAME_SIZE (DEVNAME_SIZE + sizeof(VA_REGION_SLAB_NAME_PREFIX) + 1) + +#if MALI_JIT_PRESSURE_LIMIT_BASE /* * Alignment of objects allocated by the GPU inside a just-in-time memory @@ -66,6 +71,7 @@ */ #define KBASE_GPU_ALLOCATED_OBJECT_MAX_BYTES (512u) +#endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ /* Forward declarations */ static void free_partial_locked(struct kbase_context *kctx, @@ -89,68 +95,72 @@ static size_t kbase_get_num_cpu_va_bits(struct kbase_context *kctx) #error "Unknown CPU VA width for this architecture" #endif -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) + if (kbase_ctx_compat_mode(kctx)) cpu_va_bits = 32; -#endif return cpu_va_bits; } -/* This function finds out which RB tree the given pfn from the GPU VA belongs - * to based on the memory zone the pfn refers to - */ -static struct rb_root *kbase_gpu_va_to_rbtree(struct kbase_context *kctx, - u64 gpu_pfn) +unsigned long kbase_zone_to_bits(enum kbase_memory_zone zone) { - struct rb_root *rbtree = NULL; + return ((((unsigned long)zone) & ((1 << KBASE_REG_ZONE_BITS) - 1ul)) + << KBASE_REG_ZONE_SHIFT); +} - struct kbase_reg_zone *exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); +enum kbase_memory_zone kbase_bits_to_zone(unsigned long zone_bits) +{ + return (enum kbase_memory_zone)(((zone_bits) & KBASE_REG_ZONE_MASK) + >> KBASE_REG_ZONE_SHIFT); +} +char *kbase_reg_zone_get_name(enum kbase_memory_zone zone) +{ + switch (zone) { + case SAME_VA_ZONE: + return "SAME_VA"; + case CUSTOM_VA_ZONE: + return "CUSTOM_VA"; + case EXEC_VA_ZONE: + return "EXEC_VA"; #if MALI_USE_CSF - struct kbase_reg_zone *fixed_va_zone = - kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_FIXED_VA); - - struct kbase_reg_zone *exec_fixed_va_zone = - kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_FIXED_VA); - - if (gpu_pfn >= fixed_va_zone->base_pfn) { - rbtree = &kctx->reg_rbtree_fixed; - return rbtree; - } else if (gpu_pfn >= exec_fixed_va_zone->base_pfn) { - rbtree = &kctx->reg_rbtree_exec_fixed; - return rbtree; - } + case MCU_SHARED_ZONE: + return "MCU_SHARED"; + case EXEC_FIXED_VA_ZONE: + return "EXEC_FIXED_VA"; + case FIXED_VA_ZONE: + return "FIXED_VA"; #endif - if (gpu_pfn >= exec_va_zone->base_pfn) - rbtree = &kctx->reg_rbtree_exec; - else { - u64 same_va_end; + default: + return NULL; + } +} -#if IS_ENABLED(CONFIG_64BIT) - if 
(kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif /* CONFIG_64BIT */ - same_va_end = KBASE_REG_ZONE_CUSTOM_VA_BASE; -#if IS_ENABLED(CONFIG_64BIT) - } else { - struct kbase_reg_zone *same_va_zone = - kbase_ctx_reg_zone_get(kctx, - KBASE_REG_ZONE_SAME_VA); - same_va_end = kbase_reg_zone_end_pfn(same_va_zone); - } -#endif /* CONFIG_64BIT */ +/** + * kbase_gpu_pfn_to_rbtree - find the rb-tree tracking the region with the indicated GPU + * page frame number + * @kctx: kbase context + * @gpu_pfn: GPU PFN address + * + * Context: any context. + * + * Return: reference to the rb-tree root, NULL if not found + */ +static struct rb_root *kbase_gpu_pfn_to_rbtree(struct kbase_context *kctx, u64 gpu_pfn) +{ + enum kbase_memory_zone zone_idx; + struct kbase_reg_zone *zone; - if (gpu_pfn >= same_va_end) - rbtree = &kctx->reg_rbtree_custom; - else - rbtree = &kctx->reg_rbtree_same; + for (zone_idx = 0; zone_idx < CONTEXT_ZONE_MAX; zone_idx++) { + zone = &kctx->reg_zone[zone_idx]; + if ((gpu_pfn >= zone->base_pfn) && (gpu_pfn < kbase_reg_zone_end_pfn(zone))) + return &zone->reg_rbtree; } - return rbtree; + return NULL; } /* This function inserts a region into the tree. */ -static void kbase_region_tracker_insert(struct kbase_va_region *new_reg) +void kbase_region_tracker_insert(struct kbase_va_region *new_reg) { u64 start_pfn = new_reg->start_pfn; struct rb_node **link = NULL; @@ -251,7 +261,9 @@ struct kbase_va_region *kbase_region_tracker_find_region_enclosing_address( lockdep_assert_held(&kctx->reg_lock); - rbtree = kbase_gpu_va_to_rbtree(kctx, gpu_pfn); + rbtree = kbase_gpu_pfn_to_rbtree(kctx, gpu_pfn); + if (unlikely(!rbtree)) + return NULL; return kbase_find_region_enclosing_address(rbtree, gpu_addr); } @@ -289,7 +301,9 @@ struct kbase_va_region *kbase_region_tracker_find_region_base_address( lockdep_assert_held(&kctx->reg_lock); - rbtree = kbase_gpu_va_to_rbtree(kctx, gpu_pfn); + rbtree = kbase_gpu_pfn_to_rbtree(kctx, gpu_pfn); + if (unlikely(!rbtree)) + return NULL; return kbase_find_region_base_address(rbtree, gpu_addr); } @@ -376,10 +390,12 @@ void kbase_remove_va_region(struct kbase_device *kbdev, struct kbase_va_region *reg) { struct rb_node *rbprev; + struct kbase_reg_zone *zone = container_of(reg->rbtree, struct kbase_reg_zone, reg_rbtree); struct kbase_va_region *prev = NULL; struct rb_node *rbnext; struct kbase_va_region *next = NULL; struct rb_root *reg_rbtree = NULL; + struct kbase_va_region *orig_reg = reg; int merged_front = 0; int merged_back = 0; @@ -399,8 +415,8 @@ void kbase_remove_va_region(struct kbase_device *kbdev, */ u64 prev_end_pfn = prev->start_pfn + prev->nr_pages; - WARN_ON((prev->flags & KBASE_REG_ZONE_MASK) != - (reg->flags & KBASE_REG_ZONE_MASK)); + WARN_ON((kbase_bits_to_zone(prev->flags)) != + (kbase_bits_to_zone(reg->flags))); if (!WARN_ON(reg->start_pfn < prev_end_pfn)) prev->nr_pages += reg->start_pfn - prev_end_pfn; prev->nr_pages += reg->nr_pages; @@ -421,32 +437,30 @@ void kbase_remove_va_region(struct kbase_device *kbdev, */ u64 reg_end_pfn = reg->start_pfn + reg->nr_pages; - WARN_ON((next->flags & KBASE_REG_ZONE_MASK) != - (reg->flags & KBASE_REG_ZONE_MASK)); + WARN_ON((kbase_bits_to_zone(next->flags)) != + (kbase_bits_to_zone(reg->flags))); if (!WARN_ON(next->start_pfn < reg_end_pfn)) next->nr_pages += next->start_pfn - reg_end_pfn; next->start_pfn = reg->start_pfn; next->nr_pages += reg->nr_pages; rb_erase(&(reg->rblink), reg_rbtree); merged_back = 1; - if (merged_front) { - /* We already merged with prev, free it */ - kfree(reg); - } } } - /* If we failed to 
merge then we need to add a new block */ - if (!(merged_front || merged_back)) { + if (merged_front && merged_back) { + /* We already merged with prev, free it */ + kfree(reg); + } else if (!(merged_front || merged_back)) { + /* If we failed to merge then we need to add a new block */ + /* * We didn't merge anything. Try to add a new free * placeholder, and in any case, remove the original one. */ struct kbase_va_region *free_reg; - free_reg = kbase_alloc_free_region(reg_rbtree, - reg->start_pfn, reg->nr_pages, - reg->flags & KBASE_REG_ZONE_MASK); + free_reg = kbase_alloc_free_region(zone, reg->start_pfn, reg->nr_pages); if (!free_reg) { /* In case of failure, we cannot allocate a replacement * free region, so we will be left with a 'gap' in the @@ -477,6 +491,12 @@ void kbase_remove_va_region(struct kbase_device *kbdev, rb_replace_node(&(reg->rblink), &(free_reg->rblink), reg_rbtree); } + /* This operation is always safe because the function never frees + * the region. If the region has been merged to both front and back, + * then it's the previous region that is supposed to be freed. + */ + orig_reg->start_pfn = 0; + out: return; } @@ -487,6 +507,7 @@ KBASE_EXPORT_TEST_API(kbase_remove_va_region); * kbase_insert_va_region_nolock - Insert a VA region to the list, * replacing the existing one. * + * @kbdev: The kbase device * @new_reg: The new region to insert * @at_reg: The region to replace * @start_pfn: The Page Frame Number to insert at @@ -494,10 +515,14 @@ KBASE_EXPORT_TEST_API(kbase_remove_va_region); * * Return: 0 on success, error code otherwise. */ -static int kbase_insert_va_region_nolock(struct kbase_va_region *new_reg, - struct kbase_va_region *at_reg, u64 start_pfn, size_t nr_pages) +static int kbase_insert_va_region_nolock(struct kbase_device *kbdev, + struct kbase_va_region *new_reg, + struct kbase_va_region *at_reg, u64 start_pfn, + size_t nr_pages) { struct rb_root *reg_rbtree = NULL; + struct kbase_reg_zone *zone = + container_of(at_reg->rbtree, struct kbase_reg_zone, reg_rbtree); int err = 0; reg_rbtree = at_reg->rbtree; @@ -539,10 +564,8 @@ static int kbase_insert_va_region_nolock(struct kbase_va_region *new_reg, else { struct kbase_va_region *new_front_reg; - new_front_reg = kbase_alloc_free_region(reg_rbtree, - at_reg->start_pfn, - start_pfn - at_reg->start_pfn, - at_reg->flags & KBASE_REG_ZONE_MASK); + new_front_reg = kbase_alloc_free_region(zone, at_reg->start_pfn, + start_pfn - at_reg->start_pfn); if (new_front_reg) { at_reg->nr_pages -= nr_pages + new_front_reg->nr_pages; @@ -595,9 +618,9 @@ int kbase_add_va_region(struct kbase_context *kctx, #endif if (!(reg->flags & KBASE_REG_GPU_NX) && !addr && #if MALI_USE_CSF - ((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_EXEC_FIXED_VA) && + ((kbase_bits_to_zone(reg->flags)) != EXEC_FIXED_VA_ZONE) && #endif - ((reg->flags & KBASE_REG_ZONE_MASK) != KBASE_REG_ZONE_EXEC_VA)) { + ((kbase_bits_to_zone(reg->flags)) != EXEC_VA_ZONE)) { if (cpu_va_bits > gpu_pc_bits) { align = max(align, (size_t)((1ULL << gpu_pc_bits) >> PAGE_SHIFT)); @@ -615,8 +638,7 @@ int kbase_add_va_region(struct kbase_context *kctx, * then don't retry, we're out of VA and there is * nothing which can be done about it. 
*/ - if ((reg->flags & KBASE_REG_ZONE_MASK) != - KBASE_REG_ZONE_CUSTOM_VA) + if ((kbase_bits_to_zone(reg->flags)) != CUSTOM_VA_ZONE) break; } while (kbase_jit_evict(kctx)); @@ -679,8 +701,7 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, goto exit; } - err = kbase_insert_va_region_nolock(reg, tmp, gpu_pfn, - nr_pages); + err = kbase_insert_va_region_nolock(kbdev, reg, tmp, gpu_pfn, nr_pages); if (err) { dev_warn(dev, "Failed to insert va region"); err = -ENOMEM; @@ -705,8 +726,7 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, nr_pages, align_offset, align_mask, &start_pfn); if (tmp) { - err = kbase_insert_va_region_nolock(reg, tmp, - start_pfn, nr_pages); + err = kbase_insert_va_region_nolock(kbdev, reg, tmp, start_pfn, nr_pages); if (unlikely(err)) { dev_warn(dev, "Failed to insert region: 0x%08llx start_pfn, %zu nr_pages", start_pfn, nr_pages); @@ -722,85 +742,27 @@ exit: return err; } -/* - * @brief Initialize the internal region tracker data structure. +/** + * kbase_reg_to_kctx - Obtain the kbase context tracking a VA region. + * @reg: VA region + * + * Return: + * * pointer to kbase context of the memory allocation + * * NULL if the region does not belong to a kbase context (for instance, + * if the allocation corresponds to a shared MCU region on CSF). */ -#if MALI_USE_CSF -static void kbase_region_tracker_ds_init(struct kbase_context *kctx, - struct kbase_va_region *same_va_reg, - struct kbase_va_region *custom_va_reg, - struct kbase_va_region *exec_va_reg, - struct kbase_va_region *exec_fixed_va_reg, - struct kbase_va_region *fixed_va_reg) -{ - u64 last_zone_end_pfn; - - kctx->reg_rbtree_same = RB_ROOT; - kbase_region_tracker_insert(same_va_reg); - - last_zone_end_pfn = same_va_reg->start_pfn + same_va_reg->nr_pages; - - /* Although custom_va_reg doesn't always exist, initialize - * unconditionally because of the mem_view debugfs - * implementation which relies on it being empty. - */ - kctx->reg_rbtree_custom = RB_ROOT; - kctx->reg_rbtree_exec = RB_ROOT; - - if (custom_va_reg) { - WARN_ON(custom_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(custom_va_reg); - last_zone_end_pfn = custom_va_reg->start_pfn + custom_va_reg->nr_pages; - } - - /* Initialize exec, fixed and exec_fixed. These are always - * initialized at this stage, if they will exist at all. 
- */ - kctx->reg_rbtree_fixed = RB_ROOT; - kctx->reg_rbtree_exec_fixed = RB_ROOT; - - if (exec_va_reg) { - WARN_ON(exec_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(exec_va_reg); - last_zone_end_pfn = exec_va_reg->start_pfn + exec_va_reg->nr_pages; - } - - if (exec_fixed_va_reg) { - WARN_ON(exec_fixed_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(exec_fixed_va_reg); - last_zone_end_pfn = exec_fixed_va_reg->start_pfn + exec_fixed_va_reg->nr_pages; - } - - if (fixed_va_reg) { - WARN_ON(fixed_va_reg->start_pfn < last_zone_end_pfn); - kbase_region_tracker_insert(fixed_va_reg); - last_zone_end_pfn = fixed_va_reg->start_pfn + fixed_va_reg->nr_pages; - } -} -#else -static void kbase_region_tracker_ds_init(struct kbase_context *kctx, - struct kbase_va_region *same_va_reg, - struct kbase_va_region *custom_va_reg) +static struct kbase_context *kbase_reg_to_kctx(struct kbase_va_region *reg) { - kctx->reg_rbtree_same = RB_ROOT; - kbase_region_tracker_insert(same_va_reg); + struct rb_root *rbtree = reg->rbtree; + struct kbase_reg_zone *zone = container_of(rbtree, struct kbase_reg_zone, reg_rbtree); - /* Although custom_va_reg and exec_va_reg don't always exist, - * initialize unconditionally because of the mem_view debugfs - * implementation which relies on them being empty. - * - * The difference between the two is that the EXEC_VA region - * is never initialized at this stage. - */ - kctx->reg_rbtree_custom = RB_ROOT; - kctx->reg_rbtree_exec = RB_ROOT; + if (!kbase_is_ctx_reg_zone(zone->id)) + return NULL; - if (custom_va_reg) - kbase_region_tracker_insert(custom_va_reg); + return container_of(zone - zone->id, struct kbase_context, reg_zone[0]); } -#endif /* MALI_USE_CSF */ -static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) +void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) { struct rb_node *rbnode; struct kbase_va_region *reg; @@ -810,7 +772,13 @@ static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) if (rbnode) { rb_erase(rbnode, rbtree); reg = rb_entry(rbnode, struct kbase_va_region, rblink); - WARN_ON(reg->va_refcnt != 1); + WARN_ON(kbase_refcount_read(®->va_refcnt) != 1); + if (kbase_is_page_migration_enabled()) { + struct kbase_context *kctx = kbase_reg_to_kctx(reg); + + if (kctx) + kbase_gpu_munmap(kctx, reg); + } /* Reset the start_pfn - as the rbtree is being * destroyed and we've already erased this region, there * is no further need to attempt to remove it. 
@@ -825,214 +793,261 @@ static void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree) } while (rbnode); } -void kbase_region_tracker_term(struct kbase_context *kctx) -{ - kbase_gpu_vm_lock(kctx); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_same); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_custom); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_exec); -#if MALI_USE_CSF - WARN_ON(!list_empty(&kctx->csf.event_pages_head)); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_exec_fixed); - kbase_region_tracker_erase_rbtree(&kctx->reg_rbtree_fixed); - -#endif - kbase_gpu_vm_unlock(kctx); -} - -void kbase_region_tracker_term_rbtree(struct rb_root *rbtree) -{ - kbase_region_tracker_erase_rbtree(rbtree); -} - static size_t kbase_get_same_va_bits(struct kbase_context *kctx) { return min_t(size_t, kbase_get_num_cpu_va_bits(kctx), kctx->kbdev->gpu_props.mmu.va_bits); } -int kbase_region_tracker_init(struct kbase_context *kctx) +static int kbase_reg_zone_same_va_init(struct kbase_context *kctx, u64 gpu_va_limit) { - struct kbase_va_region *same_va_reg; - struct kbase_va_region *custom_va_reg = NULL; - size_t same_va_bits = kbase_get_same_va_bits(kctx); - u64 custom_va_size = KBASE_REG_ZONE_CUSTOM_VA_SIZE; - u64 gpu_va_bits = kctx->kbdev->gpu_props.mmu.va_bits; - u64 gpu_va_limit = (1ULL << gpu_va_bits) >> PAGE_SHIFT; - u64 same_va_pages; - u64 same_va_base = 1u; int err; -#if MALI_USE_CSF - struct kbase_va_region *exec_va_reg; - struct kbase_va_region *exec_fixed_va_reg; - struct kbase_va_region *fixed_va_reg; - - u64 exec_va_base; - u64 fixed_va_end; - u64 exec_fixed_va_base; - u64 fixed_va_base; - u64 fixed_va_pages; -#endif - - /* Take the lock as kbase_free_alloced_region requires it */ - kbase_gpu_vm_lock(kctx); + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); + const size_t same_va_bits = kbase_get_same_va_bits(kctx); + const u64 base_pfn = 1u; + u64 nr_pages = (1ULL << (same_va_bits - PAGE_SHIFT)) - base_pfn; - same_va_pages = (1ULL << (same_va_bits - PAGE_SHIFT)) - same_va_base; + lockdep_assert_held(&kctx->reg_lock); #if MALI_USE_CSF - if ((same_va_base + same_va_pages) > KBASE_REG_ZONE_EXEC_VA_BASE_64) { + if ((base_pfn + nr_pages) > KBASE_REG_ZONE_EXEC_VA_BASE_64) { /* Depending on how the kernel is configured, it's possible (eg on aarch64) for * same_va_bits to reach 48 bits. Cap same_va_pages so that the same_va zone * doesn't cross into the exec_va zone. 
*/ - same_va_pages = KBASE_REG_ZONE_EXEC_VA_BASE_64 - same_va_base; + nr_pages = KBASE_REG_ZONE_EXEC_VA_BASE_64 - base_pfn; } #endif + err = kbase_reg_zone_init(kctx->kbdev, zone, SAME_VA_ZONE, base_pfn, nr_pages); + if (err) + return -ENOMEM; - /* all have SAME_VA */ - same_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_same, same_va_base, - same_va_pages, KBASE_REG_ZONE_SAME_VA); + kctx->gpu_va_end = base_pfn + nr_pages; - if (!same_va_reg) { - err = -ENOMEM; - goto fail_unlock; - } - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_SAME_VA, same_va_base, - same_va_pages); + return 0; +} -#if IS_ENABLED(CONFIG_64BIT) - /* 32-bit clients have custom VA zones */ - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif - if (gpu_va_limit <= KBASE_REG_ZONE_CUSTOM_VA_BASE) { - err = -EINVAL; - goto fail_free_same_va; - } - /* If the current size of TMEM is out of range of the - * virtual address space addressable by the MMU then - * we should shrink it to fit - */ - if ((KBASE_REG_ZONE_CUSTOM_VA_BASE + KBASE_REG_ZONE_CUSTOM_VA_SIZE) >= gpu_va_limit) - custom_va_size = gpu_va_limit - KBASE_REG_ZONE_CUSTOM_VA_BASE; +static void kbase_reg_zone_same_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); - custom_va_reg = kbase_alloc_free_region( - &kctx->reg_rbtree_custom, - KBASE_REG_ZONE_CUSTOM_VA_BASE, - custom_va_size, KBASE_REG_ZONE_CUSTOM_VA); + kbase_reg_zone_term(zone); +} - if (!custom_va_reg) { - err = -ENOMEM; - goto fail_free_same_va; - } - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_CUSTOM_VA, - KBASE_REG_ZONE_CUSTOM_VA_BASE, - custom_va_size); -#if IS_ENABLED(CONFIG_64BIT) - } else { - custom_va_size = 0; - } -#endif +static int kbase_reg_zone_custom_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + u64 nr_pages = KBASE_REG_ZONE_CUSTOM_VA_SIZE; -#if MALI_USE_CSF - /* The position of EXEC_VA depends on whether the client is 32-bit or 64-bit. */ - exec_va_base = KBASE_REG_ZONE_EXEC_VA_BASE_64; + /* If the context does not support CUSTOM_VA zones, then we don't need to + * proceed past this point, and can pretend that it was initialized properly. + * In practice, this will mean that the zone metadata structure will be zero + * initialized and not contain a valid zone ID. + */ + if (!kbase_ctx_compat_mode(kctx)) + return 0; + + if (gpu_va_limit <= KBASE_REG_ZONE_CUSTOM_VA_BASE) + return -EINVAL; - /* Similarly the end of the FIXED_VA zone also depends on whether the client - * is 32 or 64-bits. + /* If the current size of TMEM is out of range of the + * virtual address space addressable by the MMU then + * we should shrink it to fit */ - fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_64; + if ((KBASE_REG_ZONE_CUSTOM_VA_BASE + KBASE_REG_ZONE_CUSTOM_VA_SIZE) >= gpu_va_limit) + nr_pages = gpu_va_limit - KBASE_REG_ZONE_CUSTOM_VA_BASE; -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { - exec_va_base = KBASE_REG_ZONE_EXEC_VA_BASE_32; - fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_32; - } + if (kbase_reg_zone_init(kctx->kbdev, zone, CUSTOM_VA_ZONE, KBASE_REG_ZONE_CUSTOM_VA_BASE, + nr_pages)) + return -ENOMEM; + + /* On JM systems, this is the last memory zone that gets initialized, + * so the GPU VA ends right after the end of the CUSTOM_VA zone. 
On CSF, + * setting here is harmless, as the FIXED_VA initializer will overwrite + * it + */ + kctx->gpu_va_end += nr_pages; + + return 0; +} + +static void kbase_reg_zone_custom_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + + kbase_reg_zone_term(zone); +} + +static inline u64 kbase_get_exec_va_zone_base(struct kbase_context *kctx) +{ + u64 base_pfn; + +#if MALI_USE_CSF + base_pfn = KBASE_REG_ZONE_EXEC_VA_BASE_64; + if (kbase_ctx_compat_mode(kctx)) + base_pfn = KBASE_REG_ZONE_EXEC_VA_BASE_32; +#else + /* EXEC_VA zone's codepaths are slightly easier when its base_pfn is + * initially U64_MAX + */ + base_pfn = U64_MAX; #endif - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_VA, exec_va_base, - KBASE_REG_ZONE_EXEC_VA_SIZE); + return base_pfn; +} - exec_va_reg = kbase_alloc_free_region(&kctx->reg_rbtree_exec, exec_va_base, - KBASE_REG_ZONE_EXEC_VA_SIZE, KBASE_REG_ZONE_EXEC_VA); +static inline int kbase_reg_zone_exec_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); + const u64 base_pfn = kbase_get_exec_va_zone_base(kctx); + u64 nr_pages = KBASE_REG_ZONE_EXEC_VA_SIZE; - if (!exec_va_reg) { - err = -ENOMEM; - goto fail_free_custom_va; - } +#if !MALI_USE_CSF + nr_pages = 0; +#endif - exec_fixed_va_base = exec_va_base + KBASE_REG_ZONE_EXEC_VA_SIZE; + return kbase_reg_zone_init(kctx->kbdev, zone, EXEC_VA_ZONE, base_pfn, nr_pages); +} - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_FIXED_VA, exec_fixed_va_base, - KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE); +static void kbase_reg_zone_exec_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); - exec_fixed_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_exec_fixed, exec_fixed_va_base, - KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE, - KBASE_REG_ZONE_EXEC_FIXED_VA); + kbase_reg_zone_term(zone); +} - if (!exec_fixed_va_reg) { - err = -ENOMEM; - goto fail_free_exec_va; - } +#if MALI_USE_CSF +static inline u64 kbase_get_exec_fixed_va_zone_base(struct kbase_context *kctx) +{ + return kbase_get_exec_va_zone_base(kctx) + KBASE_REG_ZONE_EXEC_VA_SIZE; +} + +static int kbase_reg_zone_exec_fixed_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_FIXED_VA_ZONE); + const u64 base_pfn = kbase_get_exec_fixed_va_zone_base(kctx); + + return kbase_reg_zone_init(kctx->kbdev, zone, EXEC_FIXED_VA_ZONE, base_pfn, + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE); +} - fixed_va_base = exec_fixed_va_base + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE; - fixed_va_pages = fixed_va_end - fixed_va_base; +static void kbase_reg_zone_exec_fixed_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, EXEC_FIXED_VA_ZONE); + + WARN_ON(!list_empty(&kctx->csf.event_pages_head)); + kbase_reg_zone_term(zone); +} + +static int kbase_reg_zone_fixed_va_init(struct kbase_context *kctx, u64 gpu_va_limit) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, FIXED_VA_ZONE); + const u64 base_pfn = + kbase_get_exec_fixed_va_zone_base(kctx) + KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE; + u64 fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_64; + u64 nr_pages; + + if (kbase_ctx_compat_mode(kctx)) + fixed_va_end = KBASE_REG_ZONE_FIXED_VA_END_32; - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_FIXED_VA, fixed_va_base, fixed_va_pages); + nr_pages = fixed_va_end - base_pfn; - fixed_va_reg = 
kbase_alloc_free_region(&kctx->reg_rbtree_fixed, fixed_va_base, - fixed_va_pages, KBASE_REG_ZONE_FIXED_VA); + if (kbase_reg_zone_init(kctx->kbdev, zone, FIXED_VA_ZONE, base_pfn, nr_pages)) + return -ENOMEM; kctx->gpu_va_end = fixed_va_end; - if (!fixed_va_reg) { - err = -ENOMEM; - goto fail_free_exec_fixed_va; - } + return 0; +} - kbase_region_tracker_ds_init(kctx, same_va_reg, custom_va_reg, exec_va_reg, - exec_fixed_va_reg, fixed_va_reg); +static void kbase_reg_zone_fixed_va_term(struct kbase_context *kctx) +{ + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get(kctx, FIXED_VA_ZONE); - INIT_LIST_HEAD(&kctx->csf.event_pages_head); -#else - /* EXEC_VA zone's codepaths are slightly easier when its base_pfn is - * initially U64_MAX - */ - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_EXEC_VA, U64_MAX, 0u); - /* Other zones are 0: kbase_create_context() uses vzalloc */ + kbase_reg_zone_term(zone); +} +#endif + +typedef int kbase_memory_zone_init(struct kbase_context *kctx, u64 gpu_va_limit); +typedef void kbase_memory_zone_term(struct kbase_context *kctx); + +struct kbase_memory_zone_init_meta { + kbase_memory_zone_init *init; + kbase_memory_zone_term *term; + char *error_msg; +}; + +static const struct kbase_memory_zone_init_meta zones_init[] = { + [SAME_VA_ZONE] = { kbase_reg_zone_same_va_init, kbase_reg_zone_same_va_term, + "Could not initialize SAME_VA zone" }, + [CUSTOM_VA_ZONE] = { kbase_reg_zone_custom_va_init, kbase_reg_zone_custom_va_term, + "Could not initialize CUSTOM_VA zone" }, + [EXEC_VA_ZONE] = { kbase_reg_zone_exec_va_init, kbase_reg_zone_exec_va_term, + "Could not initialize EXEC_VA zone" }, +#if MALI_USE_CSF + [EXEC_FIXED_VA_ZONE] = { kbase_reg_zone_exec_fixed_va_init, + kbase_reg_zone_exec_fixed_va_term, + "Could not initialize EXEC_FIXED_VA zone" }, + [FIXED_VA_ZONE] = { kbase_reg_zone_fixed_va_init, kbase_reg_zone_fixed_va_term, + "Could not initialize FIXED_VA zone" }, +#endif +}; - kbase_region_tracker_ds_init(kctx, same_va_reg, custom_va_reg); - kctx->gpu_va_end = same_va_base + same_va_pages + custom_va_size; +int kbase_region_tracker_init(struct kbase_context *kctx) +{ + const u64 gpu_va_bits = kctx->kbdev->gpu_props.mmu.va_bits; + const u64 gpu_va_limit = (1ULL << gpu_va_bits) >> PAGE_SHIFT; + int err; + unsigned int i; + + /* Take the lock as kbase_free_alloced_region requires it */ + kbase_gpu_vm_lock(kctx); + + for (i = 0; i < ARRAY_SIZE(zones_init); i++) { + err = zones_init[i].init(kctx, gpu_va_limit); + if (unlikely(err)) { + dev_err(kctx->kbdev->dev, "%s, err = %d\n", zones_init[i].error_msg, err); + goto term; + } + } +#if MALI_USE_CSF + INIT_LIST_HEAD(&kctx->csf.event_pages_head); #endif kctx->jit_va = false; kbase_gpu_vm_unlock(kctx); - return 0; -#if MALI_USE_CSF -fail_free_exec_fixed_va: - kbase_free_alloced_region(exec_fixed_va_reg); -fail_free_exec_va: - kbase_free_alloced_region(exec_va_reg); -fail_free_custom_va: - if (custom_va_reg) - kbase_free_alloced_region(custom_va_reg); -#endif + return 0; +term: + while (i-- > 0) + zones_init[i].term(kctx); -fail_free_same_va: - kbase_free_alloced_region(same_va_reg); -fail_unlock: kbase_gpu_vm_unlock(kctx); return err; } +void kbase_region_tracker_term(struct kbase_context *kctx) +{ + unsigned int i; + + WARN(kctx->as_nr != KBASEP_AS_NR_INVALID, + "kctx-%d_%d must first be scheduled out to flush GPU caches+tlbs before erasing remaining regions", + kctx->tgid, kctx->id); + + kbase_gpu_vm_lock(kctx); + + for (i = 0; i < ARRAY_SIZE(zones_init); i++) + zones_init[i].term(kctx); + + kbase_gpu_vm_unlock(kctx); +} 
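The reworked region tracker above replaces the old goto-based error unwinding with a table-driven scheme: zones_init[] pairs an init and a term callback (plus an error message) per zone, kbase_region_tracker_init() walks the table in order and on failure tears down only the entries that already succeeded, and kbase_region_tracker_term() reuses the same table so setup and teardown order cannot drift apart. The standalone C sketch below illustrates that idiom outside the driver; the stage names, callbacks and the injected failure are hypothetical stand-ins for the real zone initializers, not code from this commit.

#include <stdio.h>

/* Hypothetical stand-ins for the per-zone init/term callbacks. */
typedef int  (*stage_init_fn)(void *ctx);
typedef void (*stage_term_fn)(void *ctx);

struct stage_meta {
	stage_init_fn init;
	stage_term_fn term;
	const char *error_msg;
};

static int  init_ok(void *ctx)   { (void)ctx; return 0; }
static int  init_fail(void *ctx) { (void)ctx; return -12; /* pretend -ENOMEM */ }
static void term_noop(void *ctx) { (void)ctx; }

/* Table analogous to zones_init[]: the array order defines both the init
 * order and the reverse teardown order.
 */
static const struct stage_meta stages[] = {
	{ init_ok,   term_noop, "Could not initialize stage A" },
	{ init_ok,   term_noop, "Could not initialize stage B" },
	{ init_fail, term_noop, "Could not initialize stage C" },
};

static int tracker_init(void *ctx)
{
	unsigned int i;
	int err;

	for (i = 0; i < sizeof(stages) / sizeof(stages[0]); i++) {
		err = stages[i].init(ctx);
		if (err) {
			fprintf(stderr, "%s, err = %d\n", stages[i].error_msg, err);
			/* Roll back only the stages that already succeeded. */
			while (i-- > 0)
				stages[i].term(ctx);
			return err;
		}
	}
	return 0;
}

int main(void)
{
	/* Stage C fails, so B and then A are torn down in reverse order and
	 * the error is propagated -- the same shape as the `term:` path in
	 * kbase_region_tracker_init() above.
	 */
	return tracker_init(NULL) ? 1 : 0;
}
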
+ static bool kbase_has_exec_va_zone_locked(struct kbase_context *kctx) { struct kbase_reg_zone *exec_va_zone; lockdep_assert_held(&kctx->reg_lock); - exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); + exec_va_zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); return (exec_va_zone->base_pfn != U64_MAX); } @@ -1072,16 +1087,16 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) lockdep_assert_held(&kctx->reg_lock); - for (zone_idx = 0; zone_idx < KBASE_REG_ZONE_MAX; ++zone_idx) { + for (zone_idx = 0; zone_idx < MEMORY_ZONE_MAX; zone_idx++) { struct kbase_reg_zone *zone; struct kbase_va_region *reg; u64 zone_base_addr; - unsigned long zone_bits = KBASE_REG_ZONE(zone_idx); - unsigned long reg_zone; + enum kbase_memory_zone reg_zone; - if (!kbase_is_ctx_reg_zone(zone_bits)) + if (!kbase_is_ctx_reg_zone(zone_idx)) continue; - zone = kbase_ctx_reg_zone_get(kctx, zone_bits); + + zone = kbase_ctx_reg_zone_get(kctx, zone_idx); zone_base_addr = zone->base_pfn << PAGE_SHIFT; reg = kbase_region_tracker_find_region_base_address( @@ -1089,21 +1104,21 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) if (!zone->va_size_pages) { WARN(reg, - "Should not have found a region that starts at 0x%.16llx for zone 0x%lx", - (unsigned long long)zone_base_addr, zone_bits); + "Should not have found a region that starts at 0x%.16llx for zone %s", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx)); continue; } if (WARN(!reg, - "There should always be a region that starts at 0x%.16llx for zone 0x%lx, couldn't find it", - (unsigned long long)zone_base_addr, zone_bits)) + "There should always be a region that starts at 0x%.16llx for zone %s, couldn't find it", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx))) return true; /* Safest return value */ - reg_zone = reg->flags & KBASE_REG_ZONE_MASK; - if (WARN(reg_zone != zone_bits, - "The region that starts at 0x%.16llx should be in zone 0x%lx but was found in the wrong zone 0x%lx", - (unsigned long long)zone_base_addr, zone_bits, - reg_zone)) + reg_zone = kbase_bits_to_zone(reg->flags); + if (WARN(reg_zone != zone_idx, + "The region that starts at 0x%.16llx should be in zone %s but was found in the wrong zone %s", + (unsigned long long)zone_base_addr, kbase_reg_zone_get_name(zone_idx), + kbase_reg_zone_get_name(reg_zone))) return true; /* Safest return value */ /* Unless the region is completely free, of the same size as @@ -1120,15 +1135,12 @@ static bool kbase_region_tracker_has_allocs(struct kbase_context *kctx) return false; } -#if IS_ENABLED(CONFIG_64BIT) static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, u64 jit_va_pages) { struct kbase_va_region *same_va_reg; - struct kbase_reg_zone *same_va_zone; + struct kbase_reg_zone *same_va_zone, *custom_va_zone; u64 same_va_zone_base_addr; - const unsigned long same_va_zone_bits = KBASE_REG_ZONE_SAME_VA; - struct kbase_va_region *custom_va_reg; u64 jit_va_start; lockdep_assert_held(&kctx->reg_lock); @@ -1139,14 +1151,14 @@ static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, * cause an overlap to happen with existing same VA allocations and the * custom VA zone. 
*/ - same_va_zone = kbase_ctx_reg_zone_get(kctx, same_va_zone_bits); + same_va_zone = kbase_ctx_reg_zone_get(kctx, SAME_VA_ZONE); same_va_zone_base_addr = same_va_zone->base_pfn << PAGE_SHIFT; same_va_reg = kbase_region_tracker_find_region_base_address( kctx, same_va_zone_base_addr); if (WARN(!same_va_reg, - "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone 0x%lx", - (unsigned long long)same_va_zone_base_addr, same_va_zone_bits)) + "Already found a free region at the start of every zone, but now cannot find any region for zone SAME_VA base 0x%.16llx", + (unsigned long long)same_va_zone_base_addr)) return -ENOMEM; /* kbase_region_tracker_has_allocs() in the caller has already ensured @@ -1167,28 +1179,17 @@ static int kbase_region_tracker_init_jit_64(struct kbase_context *kctx, /* * Create a custom VA zone at the end of the VA for allocations which - * JIT can use so it doesn't have to allocate VA from the kernel. - */ - custom_va_reg = - kbase_alloc_free_region(&kctx->reg_rbtree_custom, jit_va_start, - jit_va_pages, KBASE_REG_ZONE_CUSTOM_VA); - - /* - * The context will be destroyed if we fail here so no point - * reverting the change we made to same_va. + * JIT can use so it doesn't have to allocate VA from the kernel. Note + * that while the zone has already been zero-initialized during the + * region tracker initialization, we can just overwrite it. */ - if (!custom_va_reg) + custom_va_zone = kbase_ctx_reg_zone_get(kctx, CUSTOM_VA_ZONE); + if (kbase_reg_zone_init(kctx->kbdev, custom_va_zone, CUSTOM_VA_ZONE, jit_va_start, + jit_va_pages)) return -ENOMEM; - /* Since this is 64-bit, the custom zone will not have been - * initialized, so initialize it now - */ - kbase_ctx_reg_zone_init(kctx, KBASE_REG_ZONE_CUSTOM_VA, jit_va_start, - jit_va_pages); - kbase_region_tracker_insert(custom_va_reg); return 0; } -#endif int kbase_region_tracker_init_jit(struct kbase_context *kctx, u64 jit_va_pages, int max_allocations, int trim_level, int group_id, @@ -1229,10 +1230,8 @@ int kbase_region_tracker_init_jit(struct kbase_context *kctx, u64 jit_va_pages, goto exit_unlock; } -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) + if (!kbase_ctx_compat_mode(kctx)) err = kbase_region_tracker_init_jit_64(kctx, jit_va_pages); -#endif /* * Nothing to do for 32-bit clients, JIT uses the existing * custom VA zone. 
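In the 64-bit JIT path just above, the CUSTOM_VA zone slot that was left zero-initialized during region tracker setup is re-initialized in place to cover the JIT range, instead of allocating a brand-new free region as the removed code did. Below is a minimal, self-contained sketch of that kind of carve-out; the concrete page counts, the struct layout and the assumption that the JIT range is taken from the tail of the SAME_VA zone after shrinking it are illustrative only, not read out of the diff.

#include <stdio.h>
#include <stdint.h>

struct zone {
	uint64_t base_pfn;       /* first GPU page frame of the zone */
	uint64_t va_size_pages;  /* zone length in pages */
};

static uint64_t zone_end_pfn(const struct zone *z)
{
	return z->base_pfn + z->va_size_pages;
}

int main(void)
{
	/* Illustrative numbers only: a SAME_VA-like donor zone starting at PFN 1. */
	struct zone same_va = { .base_pfn = 1, .va_size_pages = 1u << 20 };
	struct zone custom_va = { 0 };          /* dormant, zero-initialized slot */
	const uint64_t jit_va_pages = 1u << 16; /* requested JIT zone size */

	/* Shrink the donor zone, then re-initialize the dormant zone to cover
	 * exactly the pages that were given up, so the two ranges never overlap.
	 */
	same_va.va_size_pages -= jit_va_pages;
	custom_va.base_pfn = zone_end_pfn(&same_va);
	custom_va.va_size_pages = jit_va_pages;

	printf("SAME_VA:   [%llu, %llu)\n",
	       (unsigned long long)same_va.base_pfn,
	       (unsigned long long)zone_end_pfn(&same_va));
	printf("CUSTOM_VA: [%llu, %llu)\n",
	       (unsigned long long)custom_va.base_pfn,
	       (unsigned long long)zone_end_pfn(&custom_va));
	return 0;
}

Keeping the donor zone's end equal to the new zone's base matters because per-zone lookups such as kbase_gpu_pfn_to_rbtree() resolve a PFN by checking which single zone's [base_pfn, end_pfn) range contains it.
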
@@ -1259,12 +1258,11 @@ exit_unlock: int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages) { #if !MALI_USE_CSF - struct kbase_va_region *exec_va_reg; struct kbase_reg_zone *exec_va_zone; struct kbase_reg_zone *target_zone; struct kbase_va_region *target_reg; u64 target_zone_base_addr; - unsigned long target_zone_bits; + enum kbase_memory_zone target_zone_id; u64 exec_va_start; int err; #endif @@ -1308,25 +1306,23 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages goto exit_unlock; } -#if IS_ENABLED(CONFIG_64BIT) - if (kbase_ctx_flag(kctx, KCTX_COMPAT)) { -#endif + if (kbase_ctx_compat_mode(kctx)) { /* 32-bit client: take from CUSTOM_VA zone */ - target_zone_bits = KBASE_REG_ZONE_CUSTOM_VA; -#if IS_ENABLED(CONFIG_64BIT) + target_zone_id = CUSTOM_VA_ZONE; } else { /* 64-bit client: take from SAME_VA zone */ - target_zone_bits = KBASE_REG_ZONE_SAME_VA; + target_zone_id = SAME_VA_ZONE; } -#endif - target_zone = kbase_ctx_reg_zone_get(kctx, target_zone_bits); + + target_zone = kbase_ctx_reg_zone_get(kctx, target_zone_id); target_zone_base_addr = target_zone->base_pfn << PAGE_SHIFT; target_reg = kbase_region_tracker_find_region_base_address( kctx, target_zone_base_addr); if (WARN(!target_reg, - "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone 0x%lx", - (unsigned long long)target_zone_base_addr, target_zone_bits)) { + "Already found a free region at the start of every zone, but now cannot find any region for zone base 0x%.16llx zone %s", + (unsigned long long)target_zone_base_addr, + kbase_reg_zone_get_name(target_zone_id))) { err = -ENOMEM; goto exit_unlock; } @@ -1345,28 +1341,14 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages /* Taken from the end of the target zone */ exec_va_start = kbase_reg_zone_end_pfn(target_zone) - exec_va_pages; - - exec_va_reg = kbase_alloc_free_region(&kctx->reg_rbtree_exec, - exec_va_start, - exec_va_pages, - KBASE_REG_ZONE_EXEC_VA); - if (!exec_va_reg) { - err = -ENOMEM; - goto exit_unlock; - } - /* Update EXEC_VA zone - * - * not using kbase_ctx_reg_zone_init() - it was already initialized - */ - exec_va_zone = kbase_ctx_reg_zone_get(kctx, KBASE_REG_ZONE_EXEC_VA); - exec_va_zone->base_pfn = exec_va_start; - exec_va_zone->va_size_pages = exec_va_pages; + exec_va_zone = kbase_ctx_reg_zone_get(kctx, EXEC_VA_ZONE); + if (kbase_reg_zone_init(kctx->kbdev, exec_va_zone, EXEC_VA_ZONE, exec_va_start, + exec_va_pages)) + return -ENOMEM; /* Update target zone and corresponding region */ target_reg->nr_pages -= exec_va_pages; target_zone->va_size_pages -= exec_va_pages; - - kbase_region_tracker_insert(exec_va_reg); err = 0; exit_unlock: @@ -1378,36 +1360,40 @@ exit_unlock: #if MALI_USE_CSF void kbase_mcu_shared_interface_region_tracker_term(struct kbase_device *kbdev) { - kbase_region_tracker_term_rbtree(&kbdev->csf.shared_reg_rbtree); + kbase_reg_zone_term(&kbdev->csf.mcu_shared_zone); } int kbase_mcu_shared_interface_region_tracker_init(struct kbase_device *kbdev) { - struct kbase_va_region *shared_reg; - u64 shared_reg_start_pfn; - u64 shared_reg_size; - - shared_reg_start_pfn = KBASE_REG_ZONE_MCU_SHARED_BASE; - shared_reg_size = KBASE_REG_ZONE_MCU_SHARED_SIZE; - - kbdev->csf.shared_reg_rbtree = RB_ROOT; - - shared_reg = kbase_alloc_free_region(&kbdev->csf.shared_reg_rbtree, - shared_reg_start_pfn, - shared_reg_size, - KBASE_REG_ZONE_MCU_SHARED); - if (!shared_reg) - return -ENOMEM; - - 
kbase_region_tracker_insert(shared_reg); - return 0; + return kbase_reg_zone_init(kbdev, &kbdev->csf.mcu_shared_zone, MCU_SHARED_ZONE, + KBASE_REG_ZONE_MCU_SHARED_BASE, MCU_SHARED_ZONE_SIZE); } #endif +static void kbasep_mem_page_size_init(struct kbase_device *kbdev) +{ +#if IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) +#if IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) + kbdev->pagesize_2mb = true; + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_LARGE_PAGE_ALLOC) != 1) { + dev_warn( + kbdev->dev, + "2MB page is enabled by force while current GPU-HW doesn't meet the requirement to do so.\n"); + } +#else /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) */ + kbdev->pagesize_2mb = false; +#endif /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC) */ +#else /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) */ + /* Set it to the default based on which GPU is present */ + kbdev->pagesize_2mb = kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_LARGE_PAGE_ALLOC); +#endif /* IS_ENABLED(CONFIG_LARGE_PAGE_ALLOC_OVERRIDE) */ +} + int kbase_mem_init(struct kbase_device *kbdev) { int err = 0; struct kbasep_mem_device *memdev; + char va_region_slab_name[VA_REGION_SLAB_NAME_SIZE]; #if IS_ENABLED(CONFIG_OF) struct device_node *mgm_node = NULL; #endif @@ -1416,6 +1402,20 @@ int kbase_mem_init(struct kbase_device *kbdev) memdev = &kbdev->memdev; + kbasep_mem_page_size_init(kbdev); + + scnprintf(va_region_slab_name, VA_REGION_SLAB_NAME_SIZE, VA_REGION_SLAB_NAME_PREFIX "%s", + kbdev->devname); + + /* Initialize slab cache for kbase_va_regions */ + kbdev->va_region_slab = + kmem_cache_create(va_region_slab_name, sizeof(struct kbase_va_region), 0, 0, NULL); + if (kbdev->va_region_slab == NULL) { + dev_err(kbdev->dev, "Failed to create va_region_slab\n"); + return -ENOMEM; + } + + kbase_mem_migrate_init(kbdev); kbase_mem_pool_group_config_set_max_size(&kbdev->mem_pool_defaults, KBASE_MEM_POOL_MAX_SIZE_KCTX); @@ -1479,8 +1479,7 @@ int kbase_mem_init(struct kbase_device *kbdev) kbase_mem_pool_group_config_set_max_size(&mem_pool_defaults, KBASE_MEM_POOL_MAX_SIZE_KBDEV); - err = kbase_mem_pool_group_init(&kbdev->mem_pools, kbdev, - &mem_pool_defaults, NULL); + err = kbase_mem_pool_group_init(&kbdev->mem_pools, kbdev, &mem_pool_defaults, NULL); } return err; @@ -1506,6 +1505,11 @@ void kbase_mem_term(struct kbase_device *kbdev) kbase_mem_pool_group_term(&kbdev->mem_pools); + kbase_mem_migrate_term(kbdev); + + kmem_cache_destroy(kbdev->va_region_slab); + kbdev->va_region_slab = NULL; + WARN_ON(kbdev->total_gpu_pages); WARN_ON(!RB_EMPTY_ROOT(&kbdev->process_root)); WARN_ON(!RB_EMPTY_ROOT(&kbdev->dma_buf_root)); @@ -1519,41 +1523,41 @@ KBASE_EXPORT_TEST_API(kbase_mem_term); /** * kbase_alloc_free_region - Allocate a free region object. * - * @rbtree: Backlink to the red-black tree of memory regions. + * @zone: CUSTOM_VA_ZONE or SAME_VA_ZONE * @start_pfn: The Page Frame Number in GPU virtual address space. * @nr_pages: The size of the region in pages. - * @zone: KBASE_REG_ZONE_CUSTOM_VA or KBASE_REG_ZONE_SAME_VA * * The allocated object is not part of any list yet, and is flagged as * KBASE_REG_FREE. No mapping is allocated yet. * - * zone is KBASE_REG_ZONE_CUSTOM_VA or KBASE_REG_ZONE_SAME_VA. - * * Return: pointer to the allocated region object on success, NULL otherwise. 
*/ -struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, - u64 start_pfn, size_t nr_pages, int zone) +struct kbase_va_region *kbase_alloc_free_region(struct kbase_reg_zone *zone, u64 start_pfn, + size_t nr_pages) { struct kbase_va_region *new_reg; - KBASE_DEBUG_ASSERT(rbtree != NULL); - - /* zone argument should only contain zone related region flags */ - KBASE_DEBUG_ASSERT((zone & ~KBASE_REG_ZONE_MASK) == 0); KBASE_DEBUG_ASSERT(nr_pages > 0); /* 64-bit address range is the max */ KBASE_DEBUG_ASSERT(start_pfn + nr_pages <= (U64_MAX / PAGE_SIZE)); - new_reg = kzalloc(sizeof(*new_reg), GFP_KERNEL); + if (WARN_ON(!zone)) + return NULL; + + if (unlikely(!zone->base_pfn || !zone->va_size_pages)) + return NULL; + + new_reg = kmem_cache_zalloc(zone->cache, GFP_KERNEL); if (!new_reg) return NULL; - new_reg->va_refcnt = 1; + kbase_refcount_set(&new_reg->va_refcnt, 1); + atomic_set(&new_reg->no_user_free_count, 0); new_reg->cpu_alloc = NULL; /* no alloc bound yet */ new_reg->gpu_alloc = NULL; /* no alloc bound yet */ - new_reg->rbtree = rbtree; - new_reg->flags = zone | KBASE_REG_FREE; + new_reg->rbtree = &zone->reg_rbtree; + new_reg->flags = kbase_zone_to_bits(zone->id) | KBASE_REG_FREE; new_reg->flags |= KBASE_REG_GROWABLE; @@ -1565,42 +1569,15 @@ struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, return new_reg; } - KBASE_EXPORT_TEST_API(kbase_alloc_free_region); -static struct kbase_context *kbase_reg_flags_to_kctx( - struct kbase_va_region *reg) +struct kbase_va_region *kbase_ctx_alloc_free_region(struct kbase_context *kctx, + enum kbase_memory_zone id, u64 start_pfn, + size_t nr_pages) { - struct kbase_context *kctx = NULL; - struct rb_root *rbtree = reg->rbtree; + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get_nolock(kctx, id); - switch (reg->flags & KBASE_REG_ZONE_MASK) { - case KBASE_REG_ZONE_CUSTOM_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_custom); - break; - case KBASE_REG_ZONE_SAME_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_same); - break; - case KBASE_REG_ZONE_EXEC_VA: - kctx = container_of(rbtree, struct kbase_context, - reg_rbtree_exec); - break; -#if MALI_USE_CSF - case KBASE_REG_ZONE_EXEC_FIXED_VA: - kctx = container_of(rbtree, struct kbase_context, reg_rbtree_exec_fixed); - break; - case KBASE_REG_ZONE_FIXED_VA: - kctx = container_of(rbtree, struct kbase_context, reg_rbtree_fixed); - break; -#endif - default: - WARN(1, "Unknown zone in region: flags=0x%lx\n", reg->flags); - break; - } - - return kctx; + return kbase_alloc_free_region(zone, start_pfn, nr_pages); } /** @@ -1614,18 +1591,18 @@ static struct kbase_context *kbase_reg_flags_to_kctx( * alloc object will be released. * It is a bug if no alloc object exists for non-free regions. 
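Editor's note: the switch from KBASE_REG_ZONE_* flag values to enum kbase_memory_zone identifiers relies on packing the zone id into the reserved bits of reg->flags (see the KBASE_REG_ZONE_SHIFT/KBASE_REG_ZONE_MASK definitions later in this patch). A minimal sketch of that round-trip under that assumed layout follows; the driver's real kbase_zone_to_bits()/kbase_bits_to_zone() helpers may differ in detail.

/* Illustrative only: pack/unpack a zone id into the region flag word,
 * assuming the KBASE_REG_ZONE_SHIFT/KBASE_REG_ZONE_MASK layout defined
 * further down in this patch.
 */
static inline unsigned long example_zone_to_bits(enum kbase_memory_zone id)
{
        return ((unsigned long)id << KBASE_REG_ZONE_SHIFT) & KBASE_REG_ZONE_MASK;
}

static inline enum kbase_memory_zone example_bits_to_zone(unsigned long flags)
{
        return (enum kbase_memory_zone)((flags & KBASE_REG_ZONE_MASK) >> KBASE_REG_ZONE_SHIFT);
}
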
* + * If region is MCU_SHARED_ZONE it is freed */ void kbase_free_alloced_region(struct kbase_va_region *reg) { #if MALI_USE_CSF - if ((reg->flags & KBASE_REG_ZONE_MASK) == - KBASE_REG_ZONE_MCU_SHARED) { + if (kbase_bits_to_zone(reg->flags) == MCU_SHARED_ZONE) { kfree(reg); return; } #endif if (!(reg->flags & KBASE_REG_FREE)) { - struct kbase_context *kctx = kbase_reg_flags_to_kctx(reg); + struct kbase_context *kctx = kbase_reg_to_kctx(reg); if (WARN_ON(!kctx)) return; @@ -1633,10 +1610,17 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) if (WARN_ON(kbase_is_region_invalid(reg))) return; - dev_dbg(kctx->kbdev->dev, "Freeing memory region %pK\n", - (void *)reg); + dev_dbg(kctx->kbdev->dev, "Freeing memory region %pK\n of zone %s", (void *)reg, + kbase_reg_zone_get_name(kbase_bits_to_zone(reg->flags))); #if MALI_USE_CSF if (reg->flags & KBASE_REG_CSF_EVENT) + /* + * This should not be reachable if called from 'mcu_shared' functions + * such as: + * kbase_csf_firmware_mcu_shared_mapping_init + * kbase_csf_firmware_mcu_shared_mapping_term + */ + kbase_unlink_event_mem_page(kctx, reg); #endif @@ -1650,8 +1634,6 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) * on the list at termination time of the region tracker. */ if (!list_empty(®->gpu_alloc->evict_node)) { - mutex_unlock(&kctx->jit_evict_lock); - /* * Unlink the physical allocation before unmaking it * evictable so that the allocation isn't grown back to @@ -1662,6 +1644,8 @@ void kbase_free_alloced_region(struct kbase_va_region *reg) if (reg->cpu_alloc != reg->gpu_alloc) reg->gpu_alloc->reg = NULL; + mutex_unlock(&kctx->jit_evict_lock); + /* * If a region has been made evictable then we must * unmake it before trying to free it. @@ -1736,41 +1720,45 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, KBASE_DEBUG_ASSERT(alloc->imported.alias.aliased); for (i = 0; i < alloc->imported.alias.nents; i++) { if (alloc->imported.alias.aliased[i].alloc) { - err = kbase_mmu_insert_pages( - kctx->kbdev, &kctx->mmu, - reg->start_pfn + (i * stride), - alloc->imported.alias.aliased[i] - .alloc->pages + - alloc->imported.alias.aliased[i] - .offset, + err = kbase_mmu_insert_aliased_pages( + kctx->kbdev, &kctx->mmu, reg->start_pfn + (i * stride), + alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset, alloc->imported.alias.aliased[i].length, - reg->flags & gwt_mask, kctx->as_nr, - group_id, mmu_sync_info); + reg->flags & gwt_mask, kctx->as_nr, group_id, mmu_sync_info, + NULL); if (err) - goto bad_insert; + goto bad_aliased_insert; /* Note: mapping count is tracked at alias * creation time */ } else { - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + i * stride, - kctx->aliasing_sink_page, + err = kbase_mmu_insert_single_aliased_page( + kctx, reg->start_pfn + i * stride, kctx->aliasing_sink_page, alloc->imported.alias.aliased[i].length, - (reg->flags & mask & gwt_mask) | attr, - group_id, mmu_sync_info); + (reg->flags & mask & gwt_mask) | attr, group_id, + mmu_sync_info); if (err) - goto bad_insert; + goto bad_aliased_insert; } } } else { - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, - kbase_get_gpu_phy_pages(reg), - kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - group_id, mmu_sync_info); + if (reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM || + reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_USER_BUF) { + err = kbase_mmu_insert_pages_skip_status_update( + kctx->kbdev, &kctx->mmu, 
reg->start_pfn, + kbase_get_gpu_phy_pages(reg), kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, group_id, mmu_sync_info, reg); + } else { + err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + kbase_get_gpu_phy_pages(reg), + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, group_id, + mmu_sync_info, reg); + } + if (err) goto bad_insert; kbase_mem_phy_alloc_gpu_mapped(alloc); @@ -1780,9 +1768,9 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, !WARN_ON(reg->nr_pages < reg->gpu_alloc->nents) && reg->gpu_alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM && reg->gpu_alloc->imported.umm.current_mapping_usage_count) { - /* For padded imported dma-buf memory, map the dummy aliasing - * page from the end of the dma-buf pages, to the end of the - * region using a read only mapping. + /* For padded imported dma-buf or user-buf memory, map the dummy + * aliasing page from the end of the imported pages, to the end of + * the region using a read only mapping. * * Only map when it's imported dma-buf memory that is currently * mapped. @@ -1790,23 +1778,31 @@ int kbase_gpu_mmap(struct kbase_context *kctx, struct kbase_va_region *reg, * Assume reg->gpu_alloc->nents is the number of actual pages * in the dma-buf memory. */ - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + reg->gpu_alloc->nents, - kctx->aliasing_sink_page, + err = kbase_mmu_insert_single_imported_page( + kctx, reg->start_pfn + reg->gpu_alloc->nents, kctx->aliasing_sink_page, reg->nr_pages - reg->gpu_alloc->nents, - (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, - KBASE_MEM_GROUP_SINK, mmu_sync_info); + (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, KBASE_MEM_GROUP_SINK, + mmu_sync_info); if (err) goto bad_insert; } return err; -bad_insert: - kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, reg->nr_pages, - kctx->as_nr); +bad_aliased_insert: + while (i-- > 0) { + struct tagged_addr *phys_alloc = NULL; + u64 const stride = alloc->imported.alias.stride; + if (alloc->imported.alias.aliased[i].alloc != NULL) + phys_alloc = alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset; + + kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + (i * stride), + phys_alloc, alloc->imported.alias.aliased[i].length, + alloc->imported.alias.aliased[i].length, kctx->as_nr); + } +bad_insert: kbase_remove_va_region(kctx->kbdev, reg); return err; @@ -1814,12 +1810,13 @@ bad_insert: KBASE_EXPORT_TEST_API(kbase_gpu_mmap); -static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, - struct kbase_mem_phy_alloc *alloc, bool writeable); +static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, struct kbase_mem_phy_alloc *alloc, + struct kbase_va_region *reg); int kbase_gpu_munmap(struct kbase_context *kctx, struct kbase_va_region *reg) { int err = 0; + struct kbase_mem_phy_alloc *alloc; if (reg->start_pfn == 0) return 0; @@ -1827,67 +1824,95 @@ int kbase_gpu_munmap(struct kbase_context *kctx, struct kbase_va_region *reg) if (!reg->gpu_alloc) return -EINVAL; + alloc = reg->gpu_alloc; + /* Tear down GPU page tables, depending on memory type. 
*/ - switch (reg->gpu_alloc->type) { + switch (alloc->type) { case KBASE_MEM_TYPE_ALIAS: { size_t i = 0; - struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; - /* Due to the way the number of valid PTEs and ATEs are tracked * currently, only the GPU virtual range that is backed & mapped - * should be passed to the kbase_mmu_teardown_pages() function, - * hence individual aliased regions needs to be unmapped - * separately. + * should be passed to the page teardown function, hence individual + * aliased regions needs to be unmapped separately. */ for (i = 0; i < alloc->imported.alias.nents; i++) { - if (alloc->imported.alias.aliased[i].alloc) { - int err_loop = kbase_mmu_teardown_pages( - kctx->kbdev, &kctx->mmu, - reg->start_pfn + - (i * - alloc->imported.alias.stride), - alloc->imported.alias.aliased[i].length, - kctx->as_nr); - if (WARN_ON_ONCE(err_loop)) - err = err_loop; - } + struct tagged_addr *phys_alloc = NULL; + int err_loop; + + if (alloc->imported.alias.aliased[i].alloc != NULL) + phys_alloc = alloc->imported.alias.aliased[i].alloc->pages + + alloc->imported.alias.aliased[i].offset; + + err_loop = kbase_mmu_teardown_pages( + kctx->kbdev, &kctx->mmu, + reg->start_pfn + (i * alloc->imported.alias.stride), + phys_alloc, alloc->imported.alias.aliased[i].length, + alloc->imported.alias.aliased[i].length, kctx->as_nr); + + if (WARN_ON_ONCE(err_loop)) + err = err_loop; } } break; - case KBASE_MEM_TYPE_IMPORTED_UMM: - err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, reg->nr_pages, kctx->as_nr); + case KBASE_MEM_TYPE_IMPORTED_UMM: { + size_t nr_phys_pages = reg->nr_pages; + size_t nr_virt_pages = reg->nr_pages; + /* If the region has import padding and falls under the threshold for + * issuing a partial GPU cache flush, we want to reduce the number of + * physical pages that get flushed. + + * This is symmetric with case of mapping the memory, which first maps + * each imported physical page to a separate virtual page, and then + * maps the single aliasing sink page to each of the virtual padding + * pages. + */ + if (reg->flags & KBASE_REG_IMPORT_PAD) + nr_phys_pages = alloc->nents + 1; + + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, + reg->start_pfn, alloc->pages, + nr_phys_pages, nr_virt_pages, + kctx->as_nr); + } break; - default: - err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn, kbase_reg_current_backed_size(reg), - kctx->as_nr); + case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { + size_t nr_reg_pages = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, + reg->start_pfn, alloc->pages, + nr_reg_pages, nr_reg_pages, + kctx->as_nr); + } + break; + default: { + size_t nr_reg_pages = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + alloc->pages, nr_reg_pages, nr_reg_pages, + kctx->as_nr); + } break; } /* Update tracking, and other cleanup, depending on memory type. */ - switch (reg->gpu_alloc->type) { + switch (alloc->type) { case KBASE_MEM_TYPE_ALIAS: /* We mark the source allocs as unmapped from the GPU when * putting reg's allocs */ break; case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { - struct kbase_alloc_import_user_buf *user_buf = - ®->gpu_alloc->imported.user_buf; - - if (user_buf->current_mapping_usage_count & PINNED_ON_IMPORT) { - user_buf->current_mapping_usage_count &= - ~PINNED_ON_IMPORT; - - /* The allocation could still have active mappings. 
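Editor's note: the IMPORTED_UMM teardown case above distinguishes physical from virtual page counts when the import was padded. A worked sketch with made-up sizes (the real counts come from alloc->nents and reg->nr_pages):

/* Illustrative only: a 10-page dma-buf imported into a 16-page padded region.
 * Physical teardown covers the real pages plus the single aliasing sink page;
 * virtual teardown still covers the whole padded GPU VA range.
 */
static void example_padded_umm_teardown_counts(void)
{
        size_t dma_buf_pages = 10;                /* alloc->nents */
        size_t va_pages = 16;                     /* reg->nr_pages */
        size_t nr_phys_pages = dma_buf_pages + 1; /* == 11, includes sink page */
        size_t nr_virt_pages = va_pages;          /* == 16, whole padded range */

        (void)nr_phys_pages;
        (void)nr_virt_pages;
}
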
*/ - if (user_buf->current_mapping_usage_count == 0) { - kbase_jd_user_buf_unmap(kctx, reg->gpu_alloc, - (reg->flags & (KBASE_REG_CPU_WR | - KBASE_REG_GPU_WR))); - } + struct kbase_alloc_import_user_buf *user_buf = &alloc->imported.user_buf; + + if (user_buf->current_mapping_usage_count & PINNED_ON_IMPORT) { + user_buf->current_mapping_usage_count &= ~PINNED_ON_IMPORT; + + /* The allocation could still have active mappings. */ + if (user_buf->current_mapping_usage_count == 0) { + kbase_jd_user_buf_unmap(kctx, alloc, reg); } } + } fallthrough; default: kbase_mem_phy_alloc_gpu_unmapped(reg->gpu_alloc); @@ -2007,7 +2032,8 @@ void kbase_sync_single(struct kbase_context *kctx, BUG_ON(!cpu_page); BUG_ON(offset + size > PAGE_SIZE); - dma_addr = kbase_dma_addr(cpu_page) + offset; + dma_addr = kbase_dma_addr_from_tagged(t_cpu_pa) + offset; + if (sync_fn == KBASE_SYNC_TO_CPU) dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, size, DMA_BIDIRECTIONAL); @@ -2018,29 +2044,30 @@ void kbase_sync_single(struct kbase_context *kctx, void *src = NULL; void *dst = NULL; struct page *gpu_page; + dma_addr_t dma_addr; if (WARN(!gpu_pa, "No GPU PA found for infinite cache op")) return; gpu_page = pfn_to_page(PFN_DOWN(gpu_pa)); + dma_addr = kbase_dma_addr_from_tagged(t_gpu_pa) + offset; if (sync_fn == KBASE_SYNC_TO_DEVICE) { - src = ((unsigned char *)kmap(cpu_page)) + offset; - dst = ((unsigned char *)kmap(gpu_page)) + offset; + src = ((unsigned char *)kbase_kmap(cpu_page)) + offset; + dst = ((unsigned char *)kbase_kmap(gpu_page)) + offset; } else if (sync_fn == KBASE_SYNC_TO_CPU) { - dma_sync_single_for_cpu(kctx->kbdev->dev, - kbase_dma_addr(gpu_page) + offset, - size, DMA_BIDIRECTIONAL); - src = ((unsigned char *)kmap(gpu_page)) + offset; - dst = ((unsigned char *)kmap(cpu_page)) + offset; + dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, size, + DMA_BIDIRECTIONAL); + src = ((unsigned char *)kbase_kmap(gpu_page)) + offset; + dst = ((unsigned char *)kbase_kmap(cpu_page)) + offset; } + memcpy(dst, src, size); - kunmap(gpu_page); - kunmap(cpu_page); + kbase_kunmap(gpu_page, src); + kbase_kunmap(cpu_page, dst); if (sync_fn == KBASE_SYNC_TO_DEVICE) - dma_sync_single_for_device(kctx->kbdev->dev, - kbase_dma_addr(gpu_page) + offset, - size, DMA_BIDIRECTIONAL); + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, size, + DMA_BIDIRECTIONAL); } } @@ -2186,29 +2213,27 @@ int kbase_mem_free_region(struct kbase_context *kctx, struct kbase_va_region *re __func__, (void *)reg, (void *)kctx); lockdep_assert_held(&kctx->reg_lock); - if (reg->flags & KBASE_REG_NO_USER_FREE) { + if (kbase_va_region_is_no_user_free(reg)) { dev_warn(kctx->kbdev->dev, "Attempt to free GPU memory whose freeing by user space is forbidden!\n"); return -EINVAL; } - /* - * Unlink the physical allocation before unmaking it evictable so - * that the allocation isn't grown back to its last backed size - * as we're going to unmap it anyway. - */ - reg->cpu_alloc->reg = NULL; - if (reg->cpu_alloc != reg->gpu_alloc) - reg->gpu_alloc->reg = NULL; - - /* - * If a region has been made evictable then we must unmake it + /* If a region has been made evictable then we must unmake it * before trying to free it. * If the memory hasn't been reclaimed it will be unmapped and freed * below, if it has been reclaimed then the operations below are no-ops. 
*/ if (reg->flags & KBASE_REG_DONT_NEED) { - KBASE_DEBUG_ASSERT(reg->cpu_alloc->type == - KBASE_MEM_TYPE_NATIVE); + WARN_ON(reg->cpu_alloc->type != KBASE_MEM_TYPE_NATIVE); + mutex_lock(&kctx->jit_evict_lock); + /* Unlink the physical allocation before unmaking it evictable so + * that the allocation isn't grown back to its last backed size + * as we're going to unmap it anyway. + */ + reg->cpu_alloc->reg = NULL; + if (reg->cpu_alloc != reg->gpu_alloc) + reg->gpu_alloc->reg = NULL; + mutex_unlock(&kctx->jit_evict_lock); kbase_mem_evictable_unmake(reg->gpu_alloc); } @@ -2219,8 +2244,8 @@ int kbase_mem_free_region(struct kbase_context *kctx, struct kbase_va_region *re } #if MALI_USE_CSF - if (((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_FIXED_VA) || - ((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_EXEC_FIXED_VA)) { + if (((kbase_bits_to_zone(reg->flags)) == FIXED_VA_ZONE) || + ((kbase_bits_to_zone(reg->flags)) == EXEC_FIXED_VA_ZONE)) { if (reg->flags & KBASE_REG_FIXED_ADDRESS) atomic64_dec(&kctx->num_fixed_allocs); else @@ -2268,7 +2293,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) __func__); return -EINVAL; } - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (gpu_addr >= BASE_MEM_COOKIE_BASE && gpu_addr < BASE_MEM_FIRST_FREE_ADDRESS) { @@ -2297,7 +2322,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) goto out_unlock; } - if ((reg->flags & KBASE_REG_ZONE_MASK) == KBASE_REG_ZONE_SAME_VA) { + if ((kbase_bits_to_zone(reg->flags)) == SAME_VA_ZONE) { /* SAME_VA must be freed through munmap */ dev_warn(kctx->kbdev->dev, "%s called on SAME_VA memory 0x%llX", __func__, gpu_addr); @@ -2308,7 +2333,7 @@ int kbase_mem_free(struct kbase_context *kctx, u64 gpu_addr) } out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return err; } @@ -2407,8 +2432,11 @@ int kbase_update_region_flags(struct kbase_context *kctx, if (flags & BASEP_MEM_PERMANENT_KERNEL_MAPPING) reg->flags |= KBASE_REG_PERMANENT_KERNEL_MAPPING; - if (flags & BASEP_MEM_NO_USER_FREE) - reg->flags |= KBASE_REG_NO_USER_FREE; + if (flags & BASEP_MEM_NO_USER_FREE) { + kbase_gpu_vm_lock(kctx); + kbase_va_region_no_user_free_inc(reg); + kbase_gpu_vm_unlock(kctx); + } if (flags & BASE_MEM_GPU_VA_SAME_4GB_PAGE) reg->flags |= KBASE_REG_GPU_VA_SAME_4GB_PAGE; @@ -2457,21 +2485,18 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * allocation is visible to the OOM killer */ kbase_process_page_usage_inc(kctx, nr_pages_requested); + kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); tp = alloc->pages + alloc->nents; -#ifdef CONFIG_MALI_2MB_ALLOC /* Check if we have enough pages requested so we can allocate a large * page (512 * 4KB = 2MB ) */ - if (nr_left >= (SZ_2M / SZ_4K)) { + if (kbdev->pagesize_2mb && nr_left >= (SZ_2M / SZ_4K)) { int nr_lp = nr_left / (SZ_2M / SZ_4K); - res = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.large[alloc->group_id], - nr_lp * (SZ_2M / SZ_4K), - tp, - true); + res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.large[alloc->group_id], + nr_lp * (SZ_2M / SZ_4K), tp, true, kctx->task); if (res > 0) { nr_left -= res; @@ -2525,7 +2550,7 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, err = kbase_mem_pool_grow( &kctx->mem_pools.large[alloc->group_id], - 1); + 1, kctx->task); if (err) break; } while (1); @@ -2566,13 +2591,11 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, } } } -no_new_partial: -#endif +no_new_partial: if (nr_left) { - 
res = kbase_mem_pool_alloc_pages( - &kctx->mem_pools.small[alloc->group_id], - nr_left, tp, false); + res = kbase_mem_pool_alloc_pages(&kctx->mem_pools.small[alloc->group_id], nr_left, + tp, false, kctx->task); if (res <= 0) goto alloc_failed; } @@ -2584,8 +2607,6 @@ no_new_partial: alloc->nents += nr_pages_requested; - kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); - done: return 0; @@ -2595,19 +2616,13 @@ alloc_failed: size_t nr_pages_to_free = nr_pages_requested - nr_left; alloc->nents += nr_pages_to_free; - - kbase_process_page_usage_inc(kctx, nr_pages_to_free); - atomic_add(nr_pages_to_free, &kctx->used_pages); - atomic_add(nr_pages_to_free, - &kctx->kbdev->memdev.used_pages); - kbase_free_phy_pages_helper(alloc, nr_pages_to_free); } - kbase_process_page_usage_dec(kctx, nr_pages_requested); - atomic_sub(nr_pages_requested, &kctx->used_pages); - atomic_sub(nr_pages_requested, - &kctx->kbdev->memdev.used_pages); + kbase_trace_gpu_mem_usage_dec(kctx->kbdev, kctx, nr_left); + kbase_process_page_usage_dec(kctx, nr_left); + atomic_sub(nr_left, &kctx->used_pages); + atomic_sub(nr_left, &kctx->kbdev->memdev.used_pages); invalid_request: return -ENOMEM; @@ -2631,18 +2646,17 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( lockdep_assert_held(&pool->pool_lock); -#if !defined(CONFIG_MALI_2MB_ALLOC) - WARN_ON(pool->order); -#endif + kctx = alloc->imported.native.kctx; + kbdev = kctx->kbdev; + + if (!kbdev->pagesize_2mb) + WARN_ON(pool->order); if (alloc->reg) { if (nr_pages_requested > alloc->reg->nr_pages - alloc->nents) goto invalid_request; } - kctx = alloc->imported.native.kctx; - kbdev = kctx->kbdev; - lockdep_assert_held(&kctx->mem_partials_lock); if (nr_pages_requested == 0) @@ -2657,12 +2671,12 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( * allocation is visible to the OOM killer */ kbase_process_page_usage_inc(kctx, nr_pages_requested); + kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); tp = alloc->pages + alloc->nents; new_pages = tp; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pool->order) { + if (kbdev->pagesize_2mb && pool->order) { int nr_lp = nr_left / (SZ_2M / SZ_4K); res = kbase_mem_pool_alloc_pages_locked(pool, @@ -2746,15 +2760,12 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( if (nr_left) goto alloc_failed; } else { -#endif res = kbase_mem_pool_alloc_pages_locked(pool, nr_left, tp); if (res <= 0) goto alloc_failed; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif KBASE_TLSTREAM_AUX_PAGESALLOC( kbdev, @@ -2763,8 +2774,6 @@ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( alloc->nents += nr_pages_requested; - kbase_trace_gpu_mem_usage_inc(kctx->kbdev, kctx, nr_pages_requested); - done: return new_pages; @@ -2775,8 +2784,7 @@ alloc_failed: struct tagged_addr *start_free = alloc->pages + alloc->nents; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pool->order) { + if (kbdev->pagesize_2mb && pool->order) { while (nr_pages_to_free) { if (is_huge_head(*start_free)) { kbase_mem_pool_free_pages_locked( @@ -2794,17 +2802,15 @@ alloc_failed: } } } else { -#endif kbase_mem_pool_free_pages_locked(pool, nr_pages_to_free, start_free, false, /* not dirty */ true); /* return to pool */ -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif } + kbase_trace_gpu_mem_usage_dec(kctx->kbdev, kctx, nr_pages_requested); kbase_process_page_usage_dec(kctx, nr_pages_requested); atomic_sub(nr_pages_requested, &kctx->used_pages); atomic_sub(nr_pages_requested, &kctx->kbdev->memdev.used_pages); @@ -3064,6 +3070,13 @@ 
KBASE_EXPORT_TEST_API(kbase_free_phy_pages_helper_locked); /** * kbase_jd_user_buf_unpin_pages - Release the pinned pages of a user buffer. * @alloc: The allocation for the imported user buffer. + * + * This must only be called when terminating an alloc, when its refcount + * (number of users) has become 0. This also ensures it is only called once all + * CPU mappings have been closed. + * + * Instead call kbase_jd_user_buf_unmap() if you need to unpin pages on active + * allocations */ static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc); #endif @@ -3194,9 +3207,32 @@ out_rollback: out_term: return -1; } - KBASE_EXPORT_TEST_API(kbase_alloc_phy_pages); +void kbase_set_phy_alloc_page_status(struct kbase_mem_phy_alloc *alloc, + enum kbase_page_status status) +{ + u32 i = 0; + + for (; i < alloc->nents; i++) { + struct tagged_addr phys = alloc->pages[i]; + struct kbase_page_metadata *page_md = kbase_page_private(as_page(phys)); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(phys) || is_partial(phys)) + continue; + + if (!page_md) + continue; + + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)status); + spin_unlock(&page_md->migrate_lock); + } +} + bool kbase_check_alloc_flags(unsigned long flags) { /* Only known input flags should be set. */ @@ -3437,30 +3473,36 @@ int kbase_check_alloc_sizes(struct kbase_context *kctx, unsigned long flags, #undef KBASE_MSG_PRE } -/** - * Acquire the per-context region list lock - * @kctx: KBase context - */ void kbase_gpu_vm_lock(struct kbase_context *kctx) { KBASE_DEBUG_ASSERT(kctx != NULL); mutex_lock(&kctx->reg_lock); } - KBASE_EXPORT_TEST_API(kbase_gpu_vm_lock); -/** - * Release the per-context region list lock - * @kctx: KBase context - */ +void kbase_gpu_vm_lock_with_pmode_sync(struct kbase_context *kctx) +{ +#if MALI_USE_CSF + down_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif + kbase_gpu_vm_lock(kctx); +} + void kbase_gpu_vm_unlock(struct kbase_context *kctx) { KBASE_DEBUG_ASSERT(kctx != NULL); mutex_unlock(&kctx->reg_lock); } - KBASE_EXPORT_TEST_API(kbase_gpu_vm_unlock); +void kbase_gpu_vm_unlock_with_pmode_sync(struct kbase_context *kctx) +{ + kbase_gpu_vm_unlock(kctx); +#if MALI_USE_CSF + up_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif +} + #if IS_ENABLED(CONFIG_DEBUG_FS) struct kbase_jit_debugfs_data { int (*func)(struct kbase_jit_debugfs_data *data); @@ -3688,12 +3730,7 @@ void kbase_jit_debugfs_init(struct kbase_context *kctx) /* prevent unprivileged use of debug file system * in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif /* Caller already ensures this, but we keep the pattern for * maintenance safety. @@ -3767,7 +3804,15 @@ static void kbase_jit_destroy_worker(struct work_struct *work) mutex_unlock(&kctx->jit_evict_lock); kbase_gpu_vm_lock(kctx); - reg->flags &= ~KBASE_REG_NO_USER_FREE; + + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. 
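Editor's note: the new lock helpers above wrap the per-context region lock with the CSF protected-mode synchronisation semaphore. A minimal usage sketch follows, assuming a caller that modifies the GPU VA layout; the function name is hypothetical.

/* Illustrative only: pairing of the pmode-aware lock helpers from this patch.
 * On CSF builds the read side of kbdev->csf.pmode_sync_sem is taken before
 * the region lock and released after it, in the opposite order.
 */
static void example_update_va_layout(struct kbase_context *kctx)
{
        kbase_gpu_vm_lock_with_pmode_sync(kctx);

        /* ... create, shrink or free kbase_va_region mappings here ... */

        kbase_gpu_vm_unlock_with_pmode_sync(kctx);
}
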
+ */ + WARN_ON(atomic_read(®->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(reg); kbase_mem_free_region(kctx, reg); kbase_gpu_vm_unlock(kctx); } while (1); @@ -3782,6 +3827,7 @@ int kbase_jit_init(struct kbase_context *kctx) INIT_WORK(&kctx->jit_work, kbase_jit_destroy_worker); #if MALI_USE_CSF + mutex_init(&kctx->csf.kcpu_queues.jit_lock); INIT_LIST_HEAD(&kctx->csf.kcpu_queues.jit_cmds_head); INIT_LIST_HEAD(&kctx->csf.kcpu_queues.jit_blocked_queues); #else /* !MALI_USE_CSF */ @@ -4020,25 +4066,18 @@ static int kbase_jit_grow(struct kbase_context *kctx, if (reg->gpu_alloc->nents >= info->commit_pages) goto done; - /* Grow the backing */ - old_size = reg->gpu_alloc->nents; - /* Allocate some more pages */ delta = info->commit_pages - reg->gpu_alloc->nents; pages_required = delta; -#ifdef CONFIG_MALI_2MB_ALLOC - if (pages_required >= (SZ_2M / SZ_4K)) { + if (kctx->kbdev->pagesize_2mb && pages_required >= (SZ_2M / SZ_4K)) { pool = &kctx->mem_pools.large[kctx->jit_group_id]; /* Round up to number of 2 MB pages required */ pages_required += ((SZ_2M / SZ_4K) - 1); pages_required /= (SZ_2M / SZ_4K); } else { -#endif pool = &kctx->mem_pools.small[kctx->jit_group_id]; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif if (reg->cpu_alloc != reg->gpu_alloc) pages_required *= 2; @@ -4059,7 +4098,7 @@ static int kbase_jit_grow(struct kbase_context *kctx, spin_unlock(&kctx->mem_partials_lock); kbase_gpu_vm_unlock(kctx); - ret = kbase_mem_pool_grow(pool, pool_delta); + ret = kbase_mem_pool_grow(pool, pool_delta, kctx->task); kbase_gpu_vm_lock(kctx); if (ret) @@ -4069,6 +4108,17 @@ static int kbase_jit_grow(struct kbase_context *kctx, kbase_mem_pool_lock(pool); } + if (reg->gpu_alloc->nents >= info->commit_pages) { + kbase_mem_pool_unlock(pool); + spin_unlock(&kctx->mem_partials_lock); + dev_info( + kctx->kbdev->dev, + "JIT alloc grown beyond the required number of initially required pages, this grow no longer needed."); + goto done; + } + + old_size = reg->gpu_alloc->nents; + delta = info->commit_pages - old_size; gpu_pages = kbase_alloc_phy_pages_helper_locked(reg->gpu_alloc, pool, delta, &prealloc_sas[0]); if (!gpu_pages) { @@ -4219,11 +4269,11 @@ static bool jit_allow_allocate(struct kbase_context *kctx, const struct base_jit_alloc_info *info, bool ignore_pressure_limit) { -#if MALI_USE_CSF - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); -#else +#if !MALI_USE_CSF lockdep_assert_held(&kctx->jctx.lock); -#endif +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ #if MALI_JIT_PRESSURE_LIMIT_BASE if (!ignore_pressure_limit && @@ -4314,25 +4364,25 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; -#if MALI_USE_CSF - lockdep_assert_held(&kctx->csf.kcpu_queues.lock); -#else +#if !MALI_USE_CSF lockdep_assert_held(&kctx->jctx.lock); -#endif +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ if (!jit_allow_allocate(kctx, info, ignore_pressure_limit)) return NULL; -#ifdef CONFIG_MALI_2MB_ALLOC - /* Preallocate memory for the sub-allocation structs */ - for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { - prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); - if (!prealloc_sas[i]) - goto end; + if (kctx->kbdev->pagesize_2mb) { + /* Preallocate memory for the sub-allocation structs */ + for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { + prealloc_sas[i] = 
kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); + if (!prealloc_sas[i]) + goto end; + } } -#endif - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); mutex_lock(&kctx->jit_evict_lock); /* @@ -4414,12 +4464,12 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, kbase_jit_done_phys_increase(kctx, needed_pages); #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); if (ret < 0) { /* * An update to an allocation from the pool failed, - * chances are slim a new allocation would fair any + * chances are slim a new allocation would fare any * better so return the allocation to the pool and * return the function with failure. */ @@ -4441,6 +4491,17 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, mutex_unlock(&kctx->jit_evict_lock); reg = NULL; goto end; + } else { + /* A suitable JIT allocation existed on the evict list, so we need + * to make sure that the NOT_MOVABLE property is cleared. + */ + if (kbase_is_page_migration_enabled()) { + kbase_gpu_vm_lock(kctx); + mutex_lock(&kctx->jit_evict_lock); + kbase_set_phy_alloc_page_status(reg->gpu_alloc, ALLOCATED_MAPPED); + mutex_unlock(&kctx->jit_evict_lock); + kbase_gpu_vm_unlock(kctx); + } } } else { /* No suitable JIT allocation was found so create a new one */ @@ -4468,7 +4529,7 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ mutex_unlock(&kctx->jit_evict_lock); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); reg = kbase_mem_alloc(kctx, info->va_pages, info->commit_pages, info->extension, &flags, &gpu_addr, mmu_sync_info); @@ -4497,6 +4558,29 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, } } + /* Similarly to tiler heap init, there is a short window of time + * where the (either recycled or newly allocated, in our case) region has + * "no user free" count incremented but is still missing the DONT_NEED flag, and + * doesn't yet have the ACTIVE_JIT_ALLOC flag either. Temporarily leaking the + * allocation is the least bad option that doesn't lead to a security issue down the + * line (it will eventually be cleaned up during context termination). + * + * We also need to call kbase_gpu_vm_lock regardless, as we're updating the region + * flags. 
+ */ + kbase_gpu_vm_lock(kctx); + if (unlikely(atomic_read(®->no_user_free_count) > 1)) { + kbase_gpu_vm_unlock(kctx); + dev_err(kctx->kbdev->dev, "JIT region has no_user_free_count > 1!\n"); + + mutex_lock(&kctx->jit_evict_lock); + list_move(®->jit_node, &kctx->jit_pool_head); + mutex_unlock(&kctx->jit_evict_lock); + + reg = NULL; + goto end; + } + trace_mali_jit_alloc(reg, info->id); kctx->jit_current_allocations++; @@ -4514,6 +4598,7 @@ struct kbase_va_region *kbase_jit_allocate(struct kbase_context *kctx, kbase_jit_report_update_pressure(kctx, reg, info->va_pages, KBASE_JIT_REPORT_ON_ALLOC_OR_FREE); #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ + kbase_gpu_vm_unlock(kctx); end: for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) @@ -4526,6 +4611,12 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) { u64 old_pages; +#if !MALI_USE_CSF + lockdep_assert_held(&kctx->jctx.lock); +#else /* MALI_USE_CSF */ + lockdep_assert_held(&kctx->csf.kcpu_queues.jit_lock); +#endif /* !MALI_USE_CSF */ + /* JIT id not immediately available here, so use 0u */ trace_mali_jit_free(reg, 0u); @@ -4540,9 +4631,9 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) u64 delta = old_pages - new_size; if (delta) { - mutex_lock(&kctx->reg_lock); + kbase_gpu_vm_lock_with_pmode_sync(kctx); kbase_mem_shrink(kctx, reg, old_pages - delta); - mutex_unlock(&kctx->reg_lock); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); } } @@ -4578,12 +4669,18 @@ void kbase_jit_free(struct kbase_context *kctx, struct kbase_va_region *reg) list_move(®->jit_node, &kctx->jit_pool_head); + /* Inactive JIT regions should be freed by the shrinker and not impacted + * by page migration. Once freed, they will enter into the page migration + * state machine via the mempools. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(reg->gpu_alloc, NOT_MOVABLE); mutex_unlock(&kctx->jit_evict_lock); } void kbase_jit_backing_lost(struct kbase_va_region *reg) { - struct kbase_context *kctx = kbase_reg_flags_to_kctx(reg); + struct kbase_context *kctx = kbase_reg_to_kctx(reg); if (WARN_ON(!kctx)) return; @@ -4624,7 +4721,14 @@ bool kbase_jit_evict(struct kbase_context *kctx) mutex_unlock(&kctx->jit_evict_lock); if (reg) { - reg->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. + */ + WARN_ON(atomic_read(®->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(reg); kbase_mem_free_region(kctx, reg); } @@ -4636,8 +4740,7 @@ void kbase_jit_term(struct kbase_context *kctx) struct kbase_va_region *walker; /* Free all allocations for this context */ - - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); mutex_lock(&kctx->jit_evict_lock); /* Free all allocations from the pool */ while (!list_empty(&kctx->jit_pool_head)) { @@ -4646,7 +4749,14 @@ void kbase_jit_term(struct kbase_context *kctx) list_del(&walker->jit_node); list_del_init(&walker->gpu_alloc->evict_node); mutex_unlock(&kctx->jit_evict_lock); - walker->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. 
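Editor's note: KBASE_REG_NO_USER_FREE has become a counted property (no_user_free_count) rather than a single flag bit. The helpers themselves are outside this hunk, so the sketch below is only an assumption of roughly how kbase_va_region_no_user_free_inc()/_dec() and kbase_va_region_is_no_user_free() could behave; the real implementations may take additional locks or perform extra checks.

/* Illustrative only: an atomic count standing in for the old NO_USER_FREE bit.
 * The real helpers live outside this hunk and may differ.
 */
static inline void example_no_user_free_inc(struct kbase_va_region *reg)
{
        atomic_inc(&reg->no_user_free_count);
}

static inline void example_no_user_free_dec(struct kbase_va_region *reg)
{
        WARN_ON(atomic_dec_return(&reg->no_user_free_count) < 0);
}

static inline bool example_is_no_user_free(struct kbase_va_region *reg)
{
        return atomic_read(&reg->no_user_free_count) > 0;
}
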
+ */ + WARN_ON(atomic_read(&walker->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(walker); kbase_mem_free_region(kctx, walker); mutex_lock(&kctx->jit_evict_lock); } @@ -4658,7 +4768,14 @@ void kbase_jit_term(struct kbase_context *kctx) list_del(&walker->jit_node); list_del_init(&walker->gpu_alloc->evict_node); mutex_unlock(&kctx->jit_evict_lock); - walker->flags &= ~KBASE_REG_NO_USER_FREE; + /* + * Incrementing the refcount is prevented on JIT regions. + * If/when this ever changes we would need to compensate + * by implementing "free on putting the last reference", + * but only for JIT regions. + */ + WARN_ON(atomic_read(&walker->no_user_free_count) > 1); + kbase_va_region_no_user_free_dec(walker); kbase_mem_free_region(kctx, walker); mutex_lock(&kctx->jit_evict_lock); } @@ -4666,7 +4783,7 @@ void kbase_jit_term(struct kbase_context *kctx) WARN_ON(kctx->jit_phys_pages_to_be_allocated); #endif mutex_unlock(&kctx->jit_evict_lock); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); /* * Flush the freeing of allocations whose backing has been freed @@ -4772,7 +4889,23 @@ void kbase_unpin_user_buf_page(struct page *page) #if MALI_USE_CSF static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc) { - if (alloc->nents) { + /* In CSF builds, we keep pages pinned until the last reference is + * released on the alloc. A refcount of 0 also means we can be sure + * that all CPU mappings have been closed on this alloc, and no more + * mappings of it will be created. + * + * Further, the WARN() below captures the restriction that this + * function will not handle anything other than the alloc termination + * path, because the caller of kbase_mem_phy_alloc_put() is not + * required to hold the kctx's reg_lock, and so we could not handle + * removing an existing CPU mapping here. + * + * Refer to this function's kernel-doc comments for alternatives for + * unpinning a User buffer. + */ + + if (alloc->nents && !WARN(kref_read(&alloc->kref) != 0, + "must only be called on terminating an allocation")) { struct page **pages = alloc->imported.user_buf.pages; long i; @@ -4780,6 +4913,8 @@ static void kbase_jd_user_buf_unpin_pages(struct kbase_mem_phy_alloc *alloc) for (i = 0; i < alloc->nents; i++) kbase_unpin_user_buf_page(pages[i]); + + alloc->nents = 0; } } #endif @@ -4795,6 +4930,8 @@ int kbase_jd_user_buf_pin_pages(struct kbase_context *kctx, long i; int write; + lockdep_assert_held(&kctx->reg_lock); + if (WARN_ON(alloc->type != KBASE_MEM_TYPE_IMPORTED_USER_BUF)) return -EINVAL; @@ -4810,18 +4947,7 @@ int kbase_jd_user_buf_pin_pages(struct kbase_context *kctx, write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE - pinned_pages = get_user_pages(NULL, mm, address, alloc->imported.user_buf.nr_pages, -#if KERNEL_VERSION(4, 4, 168) <= LINUX_VERSION_CODE && \ -KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE - write ? FOLL_WRITE : 0, pages, NULL); -#else - write, 0, pages, NULL); -#endif -#elif KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE - pinned_pages = get_user_pages_remote(NULL, mm, address, alloc->imported.user_buf.nr_pages, - write, 0, pages, NULL); -#elif KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE pinned_pages = get_user_pages_remote(NULL, mm, address, alloc->imported.user_buf.nr_pages, write ? 
FOLL_WRITE : 0, pages, NULL); #elif KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE @@ -4836,6 +4962,9 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE return pinned_pages; if (pinned_pages != alloc->imported.user_buf.nr_pages) { + /* Above code already ensures there will not have been a CPU + * mapping by ensuring alloc->nents is 0 + */ for (i = 0; i < pinned_pages; i++) kbase_unpin_user_buf_page(pages[i]); return -ENOMEM; @@ -4849,43 +4978,65 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE static int kbase_jd_user_buf_map(struct kbase_context *kctx, struct kbase_va_region *reg) { - long pinned_pages; + int err; + long pinned_pages = 0; struct kbase_mem_phy_alloc *alloc; struct page **pages; struct tagged_addr *pa; - long i; - unsigned long address; + long i, dma_mapped_pages; struct device *dev; - unsigned long offset; - unsigned long local_size; unsigned long gwt_mask = ~0; - int err = kbase_jd_user_buf_pin_pages(kctx, reg); - /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + bool write; + enum dma_data_direction dma_dir; + + /* If neither the CPU nor the GPU needs write access, use DMA_TO_DEVICE + * to avoid potentially-destructive CPU cache invalidates that could + * corruption of user data. + */ + write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); + dma_dir = write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + + lockdep_assert_held(&kctx->reg_lock); + + err = kbase_jd_user_buf_pin_pages(kctx, reg); if (err) return err; alloc = reg->gpu_alloc; pa = kbase_get_gpu_phy_pages(reg); - address = alloc->imported.user_buf.address; pinned_pages = alloc->nents; pages = alloc->imported.user_buf.pages; dev = kctx->kbdev->dev; - offset = address & ~PAGE_MASK; - local_size = alloc->imported.user_buf.size; + /* Manual CPU cache synchronization. + * + * The driver disables automatic CPU cache synchronization because the + * memory pages that enclose the imported region may also contain + * sub-regions which are not imported and that are allocated and used + * by the user process. This may be the case of memory at the beginning + * of the first page and at the end of the last page. Automatic CPU cache + * synchronization would force some operations on those memory allocations, + * unbeknown to the user process: in particular, a CPU cache invalidate + * upon unmapping would destroy the content of dirty CPU caches and cause + * the user process to lose CPU writes to the non-imported sub-regions. + * + * When the GPU claims ownership of the imported memory buffer, it shall + * commit CPU writes for the whole of all pages that enclose the imported + * region, otherwise the initial content of memory would be wrong. 
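Editor's note: the comment above motivates mapping user pages with automatic CPU cache maintenance disabled and then syncing explicitly. A minimal sketch of that pattern with the standard DMA API (kernels >= 4.10) follows; the function and parameter names are hypothetical and this is not the driver's actual helper.

#include <linux/dma-mapping.h>

/* Illustrative only: map a whole page with implicit CPU cache maintenance
 * skipped, then hand ownership to the device with an explicit sync, mirroring
 * the loop that follows in kbase_jd_user_buf_map().
 */
static int example_map_user_page(struct device *dev, struct page *page,
                                 enum dma_data_direction dir, dma_addr_t *dma_addr_out)
{
        dma_addr_t dma_addr =
                dma_map_page_attrs(dev, page, 0, PAGE_SIZE, dir, DMA_ATTR_SKIP_CPU_SYNC);

        if (dma_mapping_error(dev, dma_addr))
                return -ENOMEM;

        /* Commit dirty CPU cache lines for the whole page before the GPU
         * touches it; this replaces the sync dma_map_page() would have done.
         */
        dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dir);

        *dma_addr_out = dma_addr;
        return 0;
}
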
+ */ for (i = 0; i < pinned_pages; i++) { dma_addr_t dma_addr; - unsigned long min; - - min = MIN(PAGE_SIZE - offset, local_size); - dma_addr = dma_map_page(dev, pages[i], - offset, min, - DMA_BIDIRECTIONAL); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_addr = dma_map_page(dev, pages[i], 0, PAGE_SIZE, dma_dir); +#else + dma_addr = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE, dma_dir, + DMA_ATTR_SKIP_CPU_SYNC); +#endif err = dma_mapping_error(dev, dma_addr); if (err) goto unwind; @@ -4893,8 +5044,7 @@ static int kbase_jd_user_buf_map(struct kbase_context *kctx, alloc->imported.user_buf.dma_addrs[i] = dma_addr; pa[i] = as_tagged(page_to_phys(pages[i])); - local_size -= min; - offset = 0; + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); } #ifdef CONFIG_MALI_CINSTR_GWT @@ -4902,23 +5052,44 @@ static int kbase_jd_user_buf_map(struct kbase_context *kctx, gwt_mask = ~KBASE_REG_GPU_WR; #endif - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, - pa, kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - alloc->group_id, mmu_sync_info); + err = kbase_mmu_insert_pages_skip_status_update(kctx->kbdev, &kctx->mmu, reg->start_pfn, pa, + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, + alloc->group_id, mmu_sync_info, NULL); if (err == 0) return 0; /* fall down */ unwind: alloc->nents = 0; - while (i--) { - dma_unmap_page(kctx->kbdev->dev, - alloc->imported.user_buf.dma_addrs[i], - PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_mapped_pages = i; + /* Run the unmap loop in the same order as map loop, and perform again + * CPU cache synchronization to re-write the content of dirty CPU caches + * to memory. This is precautionary measure in case a GPU job has taken + * advantage of a partially GPU-mapped range to write and corrupt the + * content of memory, either inside or outside the imported region. + * + * Notice that this error recovery path doesn't try to be optimal and just + * flushes the entire page range. + */ + for (i = 0; i < dma_mapped_pages; i++) { + dma_addr_t dma_addr = alloc->imported.user_buf.dma_addrs[i]; + + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(dev, dma_addr, PAGE_SIZE, dma_dir); +#else + dma_unmap_page_attrs(dev, dma_addr, PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif } - while (++i < pinned_pages) { + /* The user buffer could already have been previously pinned before + * entering this function, and hence there could potentially be CPU + * mappings of it + */ + kbase_mem_shrink_cpu_mapping(kctx, reg, 0, pinned_pages); + + for (i = 0; i < pinned_pages; i++) { kbase_unpin_user_buf_page(pages[i]); pages[i] = NULL; } @@ -4926,34 +5097,165 @@ unwind: return err; } +/* user_buf_sync_read_only_page - This function handles syncing a single page that has read access, + * only, on both the CPU and * GPU, so it is ready to be unmapped. + * @kctx: kbase context + * @imported_size: the number of bytes to sync + * @dma_addr: DMA address of the bytes to be sync'd + * @offset_within_page: (unused) offset of the bytes within the page. Passed so that the calling + * signature is identical to user_buf_sync_writable_page(). + */ +static void user_buf_sync_read_only_page(struct kbase_context *kctx, unsigned long imported_size, + dma_addr_t dma_addr, unsigned long offset_within_page) +{ + /* Manual cache synchronization. 
+ * + * Writes from neither the CPU nor GPU are possible via this mapping, + * so we just sync the entire page to the device. + */ + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, imported_size, DMA_TO_DEVICE); +} + +/* user_buf_sync_writable_page - This function handles syncing a single page that has read + * and writable access, from either (or both of) the CPU and GPU, + * so it is ready to be unmapped. + * @kctx: kbase context + * @imported_size: the number of bytes to unmap + * @dma_addr: DMA address of the bytes to be unmapped + * @offset_within_page: offset of the bytes within the page. This is the offset to the subrange of + * the memory that is "imported" and so is intended for GPU access. Areas of + * the page outside of this - whilst still GPU accessible - are not intended + * for use by GPU work, and should also not be modified as the userspace CPU + * threads may be modifying them. + */ +static void user_buf_sync_writable_page(struct kbase_context *kctx, unsigned long imported_size, + dma_addr_t dma_addr, unsigned long offset_within_page) +{ + /* Manual CPU cache synchronization. + * + * When the GPU returns ownership of the buffer to the CPU, the driver + * needs to treat imported and non-imported memory differently. + * + * The first case to consider is non-imported sub-regions at the + * beginning of the first page and at the end of last page. For these + * sub-regions: CPU cache shall be committed with a clean+invalidate, + * in order to keep the last CPU write. + * + * Imported region prefers the opposite treatment: this memory has been + * legitimately mapped and used by the GPU, hence GPU writes shall be + * committed to memory, while CPU cache shall be invalidated to make + * sure that CPU reads the correct memory content. + * + * The following diagram shows the expect value of the variables + * used in this loop in the corner case of an imported region encloed + * by a single memory page: + * + * page boundary ->|---------- | <- dma_addr (initial value) + * | | + * | - - - - - | <- offset_within_page + * |XXXXXXXXXXX|\ + * |XXXXXXXXXXX| \ + * |XXXXXXXXXXX| }- imported_size + * |XXXXXXXXXXX| / + * |XXXXXXXXXXX|/ + * | - - - - - | <- offset_within_page + imported_size + * | |\ + * | | }- PAGE_SIZE - imported_size - + * | |/ offset_within_page + * | | + * page boundary ->|-----------| + * + * If the imported region is enclosed by more than one page, then + * offset_within_page = 0 for any page after the first. + */ + + /* Only for first page: handle non-imported range at the beginning. */ + if (offset_within_page > 0) { + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, offset_within_page, + DMA_BIDIRECTIONAL); + dma_addr += offset_within_page; + } + + /* For every page: handle imported range. */ + if (imported_size > 0) + dma_sync_single_for_cpu(kctx->kbdev->dev, dma_addr, imported_size, + DMA_BIDIRECTIONAL); + + /* Only for last page (that may coincide with first page): + * handle non-imported range at the end. + */ + if ((imported_size + offset_within_page) < PAGE_SIZE) { + dma_addr += imported_size; + dma_sync_single_for_device(kctx->kbdev->dev, dma_addr, + PAGE_SIZE - imported_size - offset_within_page, + DMA_BIDIRECTIONAL); + } +} + /* This function would also perform the work of unpinning pages on Job Manager * GPUs, which implies that a call to kbase_jd_user_buf_pin_pages() will NOT * have a corresponding call to kbase_jd_user_buf_unpin_pages(). 
*/ -static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, - struct kbase_mem_phy_alloc *alloc, bool writeable) +static void kbase_jd_user_buf_unmap(struct kbase_context *kctx, struct kbase_mem_phy_alloc *alloc, + struct kbase_va_region *reg) { long i; struct page **pages; - unsigned long size = alloc->imported.user_buf.size; + unsigned long offset_within_page = alloc->imported.user_buf.address & ~PAGE_MASK; + unsigned long remaining_size = alloc->imported.user_buf.size; + bool writable = (reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR)); + + lockdep_assert_held(&kctx->reg_lock); KBASE_DEBUG_ASSERT(alloc->type == KBASE_MEM_TYPE_IMPORTED_USER_BUF); pages = alloc->imported.user_buf.pages; + +#if !MALI_USE_CSF + kbase_mem_shrink_cpu_mapping(kctx, reg, 0, alloc->nents); +#endif + for (i = 0; i < alloc->imported.user_buf.nr_pages; i++) { - unsigned long local_size; + unsigned long imported_size = MIN(remaining_size, PAGE_SIZE - offset_within_page); + /* Notice: this is a temporary variable that is used for DMA sync + * operations, and that could be incremented by an offset if the + * current page contains both imported and non-imported memory + * sub-regions. + * + * It is valid to add an offset to this value, because the offset + * is always kept within the physically contiguous dma-mapped range + * and there's no need to translate to physical address to offset it. + * + * This variable is not going to be used for the actual DMA unmap + * operation, that shall always use the original DMA address of the + * whole memory page. + */ dma_addr_t dma_addr = alloc->imported.user_buf.dma_addrs[i]; + enum dma_data_direction dma_dir = writable ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; + + if (writable) + user_buf_sync_writable_page(kctx, imported_size, dma_addr, + offset_within_page); + else + user_buf_sync_read_only_page(kctx, imported_size, dma_addr, + offset_within_page); - local_size = MIN(size, PAGE_SIZE - (dma_addr & ~PAGE_MASK)); - dma_unmap_page(kctx->kbdev->dev, dma_addr, local_size, - DMA_BIDIRECTIONAL); - if (writeable) + /* Notice: use the original DMA address to unmap the whole memory page. 
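Editor's note: the unmap loop above walks the imported range one CPU page at a time, with only the first page possibly starting at a non-zero offset. A worked sketch with made-up numbers (0x2000 bytes imported at page offset 0x300), assuming the kernel's min() helper and a 4KB PAGE_SIZE:

/* Illustrative only: per-page sync sizes for a hypothetical import.
 *   page 0: offset 0x300, imported_size 0xD00
 *   page 1: offset 0x000, imported_size 0x1000
 *   page 2: offset 0x000, imported_size 0x300
 */
static void example_walk_imported_range(void)
{
        unsigned long offset_within_page = 0x300;
        unsigned long remaining_size = 0x2000;

        while (remaining_size) {
                unsigned long imported_size =
                        min(remaining_size, PAGE_SIZE - offset_within_page);

                remaining_size -= imported_size;
                offset_within_page = 0;
        }
}
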
*/ +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(kctx->kbdev->dev, alloc->imported.user_buf.dma_addrs[i], PAGE_SIZE, + dma_dir); +#else + dma_unmap_page_attrs(kctx->kbdev->dev, alloc->imported.user_buf.dma_addrs[i], + PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif + if (writable) set_page_dirty_lock(pages[i]); #if !MALI_USE_CSF kbase_unpin_user_buf_page(pages[i]); pages[i] = NULL; #endif - size -= local_size; + remaining_size -= imported_size; + offset_within_page = 0; } #if !MALI_USE_CSF alloc->nents = 0; @@ -4964,7 +5266,8 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, void *src_page, size_t *to_copy, unsigned int nr_pages, unsigned int *target_page_nr, size_t offset) { - void *target_page = kmap(dest_pages[*target_page_nr]); + void *target_page = kbase_kmap(dest_pages[*target_page_nr]); + size_t chunk = PAGE_SIZE-offset; if (!target_page) { @@ -4977,13 +5280,13 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, memcpy(target_page + offset, src_page, chunk); *to_copy -= chunk; - kunmap(dest_pages[*target_page_nr]); + kbase_kunmap(dest_pages[*target_page_nr], target_page); *target_page_nr += 1; if (*target_page_nr >= nr_pages || *to_copy == 0) return 0; - target_page = kmap(dest_pages[*target_page_nr]); + target_page = kbase_kmap(dest_pages[*target_page_nr]); if (!target_page) { pr_err("%s: kmap failure", __func__); return -ENOMEM; @@ -4995,16 +5298,16 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, memcpy(target_page, src_page + PAGE_SIZE-offset, chunk); *to_copy -= chunk; - kunmap(dest_pages[*target_page_nr]); + kbase_kunmap(dest_pages[*target_page_nr], target_page); return 0; } -struct kbase_mem_phy_alloc *kbase_map_external_resource( - struct kbase_context *kctx, struct kbase_va_region *reg, - struct mm_struct *locked_mm) +int kbase_map_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg, + struct mm_struct *locked_mm) { - int err; + int err = 0; + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; lockdep_assert_held(&kctx->reg_lock); @@ -5013,7 +5316,7 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { if ((reg->gpu_alloc->imported.user_buf.mm != locked_mm) && (!reg->gpu_alloc->nents)) - goto exit; + return -EINVAL; reg->gpu_alloc->imported.user_buf.current_mapping_usage_count++; if (reg->gpu_alloc->imported.user_buf @@ -5021,7 +5324,7 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( err = kbase_jd_user_buf_map(kctx, reg); if (err) { reg->gpu_alloc->imported.user_buf.current_mapping_usage_count--; - goto exit; + return err; } } } @@ -5029,21 +5332,30 @@ struct kbase_mem_phy_alloc *kbase_map_external_resource( case KBASE_MEM_TYPE_IMPORTED_UMM: { err = kbase_mem_umm_map(kctx, reg); if (err) - goto exit; + return err; break; } default: - goto exit; + dev_dbg(kctx->kbdev->dev, + "Invalid external resource GPU allocation type (%x) on mapping", + alloc->type); + return -EINVAL; } - return kbase_mem_phy_alloc_get(reg->gpu_alloc); -exit: - return NULL; + kbase_va_region_alloc_get(kctx, reg); + kbase_mem_phy_alloc_get(alloc); + return err; } -void kbase_unmap_external_resource(struct kbase_context *kctx, - struct kbase_va_region *reg, struct kbase_mem_phy_alloc *alloc) +void kbase_unmap_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg) { + /* gpu_alloc was used in kbase_map_external_resources, so we need to use it for the + * unmapping operation. 
+ */ + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; + + lockdep_assert_held(&kctx->reg_lock); + switch (alloc->type) { case KBASE_MEM_TYPE_IMPORTED_UMM: { kbase_mem_umm_unmap(kctx, reg, alloc); @@ -5053,28 +5365,29 @@ void kbase_unmap_external_resource(struct kbase_context *kctx, alloc->imported.user_buf.current_mapping_usage_count--; if (alloc->imported.user_buf.current_mapping_usage_count == 0) { - bool writeable = true; - - if (!kbase_is_region_invalid_or_free(reg) && - reg->gpu_alloc == alloc) - kbase_mmu_teardown_pages( - kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - kbase_reg_current_backed_size(reg), - kctx->as_nr); - - if (reg && ((reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR)) == 0)) - writeable = false; + if (!kbase_is_region_invalid_or_free(reg)) { + kbase_mmu_teardown_imported_pages( + kctx->kbdev, &kctx->mmu, reg->start_pfn, alloc->pages, + kbase_reg_current_backed_size(reg), + kbase_reg_current_backed_size(reg), kctx->as_nr); + } - kbase_jd_user_buf_unmap(kctx, alloc, writeable); + kbase_jd_user_buf_unmap(kctx, alloc, reg); + } } - } break; default: - break; + WARN(1, "Invalid external resource GPU allocation type (%x) on unmapping", + alloc->type); + return; } kbase_mem_phy_alloc_put(alloc); + kbase_va_region_alloc_put(kctx, reg); +} + +static inline u64 kbasep_get_va_gpu_addr(struct kbase_va_region *reg) +{ + return reg->start_pfn << PAGE_SHIFT; } struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( @@ -5090,7 +5403,7 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( * metadata which matches the region which is being acquired. */ list_for_each_entry(walker, &kctx->ext_res_meta_head, ext_res_node) { - if (walker->gpu_addr == gpu_addr) { + if (kbasep_get_va_gpu_addr(walker->reg) == gpu_addr) { meta = walker; meta->ref++; break; @@ -5102,8 +5415,7 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( struct kbase_va_region *reg; /* Find the region */ - reg = kbase_region_tracker_find_region_enclosing_address( - kctx, gpu_addr); + reg = kbase_region_tracker_find_region_enclosing_address(kctx, gpu_addr); if (kbase_is_region_invalid_or_free(reg)) goto failed; @@ -5111,18 +5423,18 @@ struct kbase_ctx_ext_res_meta *kbase_sticky_resource_acquire( meta = kzalloc(sizeof(*meta), GFP_KERNEL); if (!meta) goto failed; - /* * Fill in the metadata object and acquire a reference * for the physical resource. */ - meta->alloc = kbase_map_external_resource(kctx, reg, NULL); - meta->ref = 1; + meta->reg = reg; - if (!meta->alloc) + /* Map the external resource to the GPU allocation of the region + * and acquire the reference to the VA region + */ + if (kbase_map_external_resource(kctx, meta->reg, NULL)) goto fail_map; - - meta->gpu_addr = reg->start_pfn << PAGE_SHIFT; + meta->ref = 1; list_add(&meta->ext_res_node, &kctx->ext_res_meta_head); } @@ -5147,7 +5459,7 @@ find_sticky_resource_meta(struct kbase_context *kctx, u64 gpu_addr) * metadata which matches the region which is being released. */ list_for_each_entry(walker, &kctx->ext_res_meta_head, ext_res_node) - if (walker->gpu_addr == gpu_addr) + if (kbasep_get_va_gpu_addr(walker->reg) == gpu_addr) return walker; return NULL; @@ -5156,14 +5468,7 @@ find_sticky_resource_meta(struct kbase_context *kctx, u64 gpu_addr) static void release_sticky_resource_meta(struct kbase_context *kctx, struct kbase_ctx_ext_res_meta *meta) { - struct kbase_va_region *reg; - - /* Drop the physical memory reference and free the metadata. 
*/ - reg = kbase_region_tracker_find_region_enclosing_address( - kctx, - meta->gpu_addr); - - kbase_unmap_external_resource(kctx, reg, meta->alloc); + kbase_unmap_external_resource(kctx, meta->reg); list_del(&meta->ext_res_node); kfree(meta); } diff --git a/mali_kbase/mali_kbase_mem.h b/mali_kbase/mali_kbase_mem.h index 4ac4feb..1a59706 100644 --- a/mali_kbase/mali_kbase_mem.h +++ b/mali_kbase/mali_kbase_mem.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -37,6 +37,8 @@ #include "mali_kbase_defs.h" /* Required for kbase_mem_evictable_unmake */ #include "mali_kbase_mem_linux.h" +#include "mali_kbase_mem_migrate.h" +#include "mali_kbase_refcount_defs.h" static inline void kbase_process_page_usage_inc(struct kbase_context *kctx, int pages); @@ -60,6 +62,186 @@ static inline void kbase_process_page_usage_inc(struct kbase_context *kctx, #define KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_HW_ISSUE_8316 (1u << KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_LOG2_HW_ISSUE_8316) #define KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_HW_ISSUE_9630 (1u << KBASEP_TMEM_GROWABLE_BLOCKSIZE_PAGES_LOG2_HW_ISSUE_9630) +/* Free region */ +#define KBASE_REG_FREE (1ul << 0) +/* CPU write access */ +#define KBASE_REG_CPU_WR (1ul << 1) +/* GPU write access */ +#define KBASE_REG_GPU_WR (1ul << 2) +/* No eXecute flag */ +#define KBASE_REG_GPU_NX (1ul << 3) +/* Is CPU cached? */ +#define KBASE_REG_CPU_CACHED (1ul << 4) +/* Is GPU cached? + * Some components within the GPU might only be able to access memory that is + * GPU cacheable. Refer to the specific GPU implementation for more details. + */ +#define KBASE_REG_GPU_CACHED (1ul << 5) + +#define KBASE_REG_GROWABLE (1ul << 6) +/* Can grow on pf? */ +#define KBASE_REG_PF_GROW (1ul << 7) + +/* Allocation doesn't straddle the 4GB boundary in GPU virtual space */ +#define KBASE_REG_GPU_VA_SAME_4GB_PAGE (1ul << 8) + +/* inner shareable coherency */ +#define KBASE_REG_SHARE_IN (1ul << 9) +/* inner & outer shareable coherency */ +#define KBASE_REG_SHARE_BOTH (1ul << 10) + +#if MALI_USE_CSF +/* Space for 8 different zones */ +#define KBASE_REG_ZONE_BITS 3 +#else +/* Space for 4 different zones */ +#define KBASE_REG_ZONE_BITS 2 +#endif + +/* The bits 11-13 (inclusive) of the kbase_va_region flag are reserved + * for information about the zone in which it was allocated. + */ +#define KBASE_REG_ZONE_SHIFT (11ul) +#define KBASE_REG_ZONE_MASK (((1 << KBASE_REG_ZONE_BITS) - 1ul) << KBASE_REG_ZONE_SHIFT) + +#if KBASE_REG_ZONE_MAX > (1 << KBASE_REG_ZONE_BITS) +#error "Too many zones for the number of zone bits defined" +#endif + +/* GPU read access */ +#define KBASE_REG_GPU_RD (1ul << 14) +/* CPU read access */ +#define KBASE_REG_CPU_RD (1ul << 15) + +/* Index of chosen MEMATTR for this region (0..7) */ +#define KBASE_REG_MEMATTR_MASK (7ul << 16) +#define KBASE_REG_MEMATTR_INDEX(x) (((x)&7) << 16) +#define KBASE_REG_MEMATTR_VALUE(x) (((x)&KBASE_REG_MEMATTR_MASK) >> 16) + +#define KBASE_REG_PROTECTED (1ul << 19) + +/* Region belongs to a shrinker. + * + * This can either mean that it is part of the JIT/Ephemeral or tiler heap + * shrinker paths. 
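/*
 * Standalone illustration (user-space C, not part of the patch) of how the
 * region flag word defined above packs independent fields: the zone sits in
 * bits 11-13 and the MEMATTR index in bits 16-18, so both can be ORed together
 * with the boolean flags. The EX_-prefixed macros are local copies of the
 * header macros for the demo (EX_KBASE_REG_ZONE_BITS is 3, the CSF value;
 * JM GPUs use 2); main() only prints the round trip.
 */
#include <stdio.h>

#define EX_KBASE_REG_CPU_WR (1ul << 1)
#define EX_KBASE_REG_GPU_WR (1ul << 2)
#define EX_KBASE_REG_ZONE_BITS 3
#define EX_KBASE_REG_ZONE_SHIFT (11ul)
#define EX_KBASE_REG_ZONE_MASK (((1 << EX_KBASE_REG_ZONE_BITS) - 1ul) << EX_KBASE_REG_ZONE_SHIFT)
#define EX_KBASE_REG_MEMATTR_MASK (7ul << 16)
#define EX_KBASE_REG_MEMATTR_INDEX(x) (((x) & 7) << 16)
#define EX_KBASE_REG_MEMATTR_VALUE(x) (((x) & EX_KBASE_REG_MEMATTR_MASK) >> 16)

int main(void)
{
        unsigned long flags = EX_KBASE_REG_CPU_WR | EX_KBASE_REG_GPU_WR |
                              EX_KBASE_REG_MEMATTR_INDEX(5);

        printf("memattr index = %lu, zone bits = 0x%lx\n",
               EX_KBASE_REG_MEMATTR_VALUE(flags), flags & EX_KBASE_REG_ZONE_MASK);
        return 0;
}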
Should be removed only after making sure that there are + * no references remaining to it in these paths, as it may cause the physical + * backing of the region to disappear during use. + */ +#define KBASE_REG_DONT_NEED (1ul << 20) + +/* Imported buffer is padded? */ +#define KBASE_REG_IMPORT_PAD (1ul << 21) + +#if MALI_USE_CSF +/* CSF event memory */ +#define KBASE_REG_CSF_EVENT (1ul << 22) +/* Bit 23 is reserved. + * + * Do not remove, use the next unreserved bit for new flags + */ +#define KBASE_REG_RESERVED_BIT_23 (1ul << 23) +#else +/* Bit 22 is reserved. + * + * Do not remove, use the next unreserved bit for new flags + */ +#define KBASE_REG_RESERVED_BIT_22 (1ul << 22) +/* The top of the initial commit is aligned to extension pages. + * Extent must be a power of 2 + */ +#define KBASE_REG_TILER_ALIGN_TOP (1ul << 23) +#endif /* MALI_USE_CSF */ + +/* Bit 24 is currently unused and is available for use for a new flag */ + +/* Memory has permanent kernel side mapping */ +#define KBASE_REG_PERMANENT_KERNEL_MAPPING (1ul << 25) + +/* GPU VA region has been freed by the userspace, but still remains allocated + * due to the reference held by CPU mappings created on the GPU VA region. + * + * A region with this flag set has had kbase_gpu_munmap() called on it, but can + * still be looked-up in the region tracker as a non-free region. Hence must + * not create or update any more GPU mappings on such regions because they will + * not be unmapped when the region is finally destroyed. + * + * Since such regions are still present in the region tracker, new allocations + * attempted with BASE_MEM_SAME_VA might fail if their address intersects with + * a region with this flag set. + * + * In addition, this flag indicates the gpu_alloc member might no longer valid + * e.g. in infinite cache simulation. + */ +#define KBASE_REG_VA_FREED (1ul << 26) + +/* If set, the heap info address points to a u32 holding the used size in bytes; + * otherwise it points to a u64 holding the lowest address of unused memory. + */ +#define KBASE_REG_HEAP_INFO_IS_SIZE (1ul << 27) + +/* Allocation is actively used for JIT memory */ +#define KBASE_REG_ACTIVE_JIT_ALLOC (1ul << 28) + +#if MALI_USE_CSF +/* This flag only applies to allocations in the EXEC_FIXED_VA and FIXED_VA + * memory zones, and it determines whether they were created with a fixed + * GPU VA address requested by the user. + */ +#define KBASE_REG_FIXED_ADDRESS (1ul << 29) +#else +#define KBASE_REG_RESERVED_BIT_29 (1ul << 29) +#endif + +#define KBASE_REG_ZONE_CUSTOM_VA_BASE (0x100000000ULL >> PAGE_SHIFT) + +#if MALI_USE_CSF +/* only used with 32-bit clients */ +/* On a 32bit platform, custom VA should be wired from 4GB to 2^(43). + */ +#define KBASE_REG_ZONE_CUSTOM_VA_SIZE (((1ULL << 43) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) +#else +/* only used with 32-bit clients */ +/* On a 32bit platform, custom VA should be wired from 4GB to the VA limit of the + * GPU. Unfortunately, the Linux mmap() interface limits us to 2^32 pages (2^44 + * bytes, see mmap64 man page for reference). So we put the default limit to the + * maximum possible on Linux and shrink it down, if required by the GPU, during + * initialization. + */ +#define KBASE_REG_ZONE_CUSTOM_VA_SIZE (((1ULL << 44) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) +/* end 32-bit clients only */ +#endif + +/* The starting address and size of the GPU-executable zone are dynamic + * and depend on the platform and the number of pages requested by the + * user process, with an upper limit of 4 GB. 
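/*
 * Standalone illustration (user-space C, not part of the patch) of the zone
 * extents defined above, evaluated for a 4 KiB page (EX_PAGE_SHIFT == 12 is an
 * assumption standing in for PAGE_SHIFT, and the CUSTOM_VA figure is the
 * 32-bit-client CSF case that ends at 2^43).
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SHIFT 12

int main(void)
{
        uint64_t custom_va_base = 0x100000000ULL >> EX_PAGE_SHIFT;
        uint64_t custom_va_size = ((1ULL << 43) >> EX_PAGE_SHIFT) - custom_va_base;
        uint64_t exec_va_max_pages = (1ULL << 32) >> EX_PAGE_SHIFT;

        printf("CUSTOM_VA: base pfn 0x%llx, %llu pages\n",
               (unsigned long long)custom_va_base,
               (unsigned long long)custom_va_size);
        printf("EXEC_VA cap: %llu pages (4 GB)\n",
               (unsigned long long)exec_va_max_pages);
        return 0;
}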
+ */ +#define KBASE_REG_ZONE_EXEC_VA_MAX_PAGES ((1ULL << 32) >> PAGE_SHIFT) /* 4 GB */ +#define KBASE_REG_ZONE_EXEC_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES + +#if MALI_USE_CSF +#define KBASE_REG_ZONE_MCU_SHARED_BASE (0x04000000ULL >> PAGE_SHIFT) +#define MCU_SHARED_ZONE_SIZE (((0x08000000ULL) >> PAGE_SHIFT) - KBASE_REG_ZONE_MCU_SHARED_BASE) + +/* For CSF GPUs, the EXEC_VA zone is always 4GB in size, and starts at 2^47 for 64-bit + * clients, and 2^43 for 32-bit clients. + */ +#define KBASE_REG_ZONE_EXEC_VA_BASE_64 ((1ULL << 47) >> PAGE_SHIFT) +#define KBASE_REG_ZONE_EXEC_VA_BASE_32 ((1ULL << 43) >> PAGE_SHIFT) +/* Executable zone supporting FIXED/FIXABLE allocations. + * It is always 4GB in size. + */ +#define KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES + +/* Non-executable zone supporting FIXED/FIXABLE allocations. + * It extends from (2^47) up to (2^48)-1, for 64-bit userspace clients, and from + * (2^43) up to (2^44)-1 for 32-bit userspace clients. For the same reason, + * the end of the FIXED_VA zone for 64-bit clients is (2^48)-1. + */ +#define KBASE_REG_ZONE_FIXED_VA_END_64 ((1ULL << 48) >> PAGE_SHIFT) +#define KBASE_REG_ZONE_FIXED_VA_END_32 ((1ULL << 44) >> PAGE_SHIFT) + +#endif + /* * A CPU mapping */ @@ -182,6 +364,106 @@ struct kbase_mem_phy_alloc { } imported; }; +/** + * enum kbase_page_status - Status of a page used for page migration. + * + * @MEM_POOL: Stable state. Page is located in a memory pool and can safely + * be migrated. + * @ALLOCATE_IN_PROGRESS: Transitory state. A page is set to this status as + * soon as it leaves a memory pool. + * @SPILL_IN_PROGRESS: Transitory state. Corner case where pages in a memory + * pool of a dying context are being moved to the device + * memory pool. + * @NOT_MOVABLE: Stable state. Page has been allocated for an object that is + * not movable, but may return to be movable when the object + * is freed. + * @ALLOCATED_MAPPED: Stable state. Page has been allocated, mapped to GPU + * and has reference to kbase_mem_phy_alloc object. + * @PT_MAPPED: Stable state. Similar to ALLOCATED_MAPPED, but page doesn't + * reference kbase_mem_phy_alloc object. Used as a page in MMU + * page table. + * @FREE_IN_PROGRESS: Transitory state. A page is set to this status as soon as + * the driver manages to acquire a lock on the page while + * unmapping it. This status means that a memory release is + * happening and it's still not complete. + * @FREE_ISOLATED_IN_PROGRESS: Transitory state. This is a very particular corner case. + * A page is isolated while it is in ALLOCATED_MAPPED state, + * but then the driver tries to destroy the allocation. + * @FREE_PT_ISOLATED_IN_PROGRESS: Transitory state. This is a very particular corner case. + * A page is isolated while it is in PT_MAPPED state, but + * then the driver tries to destroy the allocation. + * + * Pages can only be migrated in stable states. 
+ */ +enum kbase_page_status { + MEM_POOL = 0, + ALLOCATE_IN_PROGRESS, + SPILL_IN_PROGRESS, + NOT_MOVABLE, + ALLOCATED_MAPPED, + PT_MAPPED, + FREE_IN_PROGRESS, + FREE_ISOLATED_IN_PROGRESS, + FREE_PT_ISOLATED_IN_PROGRESS, +}; + +#define PGD_VPFN_LEVEL_MASK ((u64)0x3) +#define PGD_VPFN_LEVEL_GET_LEVEL(pgd_vpfn_level) (pgd_vpfn_level & PGD_VPFN_LEVEL_MASK) +#define PGD_VPFN_LEVEL_GET_VPFN(pgd_vpfn_level) (pgd_vpfn_level & ~PGD_VPFN_LEVEL_MASK) +#define PGD_VPFN_LEVEL_SET(pgd_vpfn, level) \ + ((pgd_vpfn & ~PGD_VPFN_LEVEL_MASK) | (level & PGD_VPFN_LEVEL_MASK)) + +/** + * struct kbase_page_metadata - Metadata for each page in kbase + * + * @kbdev: Pointer to kbase device. + * @dma_addr: DMA address mapped to page. + * @migrate_lock: A spinlock to protect the private metadata. + * @data: Member in union valid based on @status. + * @status: Status to keep track if page can be migrated at any + * given moment. MSB will indicate if page is isolated. + * Protected by @migrate_lock. + * @vmap_count: Counter of kernel mappings. + * @group_id: Memory group ID obtained at the time of page allocation. + * + * Each 4KB page will have a reference to this struct in the private field. + * This will be used to keep track of information required for Linux page + * migration functionality as well as address for DMA mapping. + */ +struct kbase_page_metadata { + dma_addr_t dma_addr; + spinlock_t migrate_lock; + + union { + struct { + struct kbase_mem_pool *pool; + /* Pool could be terminated after page is isolated and therefore + * won't be able to get reference to kbase device. + */ + struct kbase_device *kbdev; + } mem_pool; + struct { + struct kbase_va_region *reg; + struct kbase_mmu_table *mmut; + u64 vpfn; + } mapped; + struct { + struct kbase_mmu_table *mmut; + u64 pgd_vpfn_level; + } pt_mapped; + struct { + struct kbase_device *kbdev; + } free_isolated; + struct { + struct kbase_device *kbdev; + } free_pt_isolated; + } data; + + u8 status; + u8 vmap_count; + u8 group_id; +}; + /* The top bit of kbase_alloc_import_user_buf::current_mapping_usage_count is * used to signify that a buffer was pinned when it was imported. Since the * reference count is limited by the number of atoms that can be submitted at @@ -204,6 +486,46 @@ enum kbase_jit_report_flags { KBASE_JIT_REPORT_ON_ALLOC_OR_FREE = (1u << 0) }; +/** + * kbase_zone_to_bits - Convert a memory zone @zone to the corresponding + * bitpattern, for ORing together with other flags. + * @zone: Memory zone + * + * Return: Bitpattern with the appropriate bits set. + */ +unsigned long kbase_zone_to_bits(enum kbase_memory_zone zone); + +/** + * kbase_bits_to_zone - Convert the bitpattern @zone_bits to the corresponding + * zone identifier + * @zone_bits: Memory allocation flag containing a zone pattern + * + * Return: Zone identifier for valid zone bitpatterns, + */ +enum kbase_memory_zone kbase_bits_to_zone(unsigned long zone_bits); + +/** + * kbase_mem_zone_get_name - Get the string name for a given memory zone + * @zone: Memory zone identifier + * + * Return: string for valid memory zone, NULL otherwise + */ +char *kbase_reg_zone_get_name(enum kbase_memory_zone zone); + +/** + * kbase_set_phy_alloc_page_status - Set the page migration status of the underlying + * physical allocation. + * @alloc: the physical allocation containing the pages whose metadata is going + * to be modified + * @status: the status the pages should end up in + * + * Note that this function does not go through all of the checking to ensure that + * proper states are set. 
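/*
 * Standalone illustration (user-space C, not part of the patch) of the
 * PGD_VPFN_LEVEL_* packing defined above: the MMU level is stashed in the two
 * low bits of the stored virtual PFN, which the driver keeps available for
 * this purpose, and is masked back out on retrieval. The EX_-prefixed macros
 * are local copies for the demo.
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PGD_VPFN_LEVEL_MASK ((uint64_t)0x3)
#define EX_PGD_VPFN_LEVEL_GET_LEVEL(v) ((v) & EX_PGD_VPFN_LEVEL_MASK)
#define EX_PGD_VPFN_LEVEL_GET_VPFN(v) ((v) & ~EX_PGD_VPFN_LEVEL_MASK)
#define EX_PGD_VPFN_LEVEL_SET(vpfn, level) \
        (((vpfn) & ~EX_PGD_VPFN_LEVEL_MASK) | ((level) & EX_PGD_VPFN_LEVEL_MASK))

int main(void)
{
        uint64_t packed = EX_PGD_VPFN_LEVEL_SET(0x4000ULL, 2);

        printf("packed 0x%llx -> vpfn 0x%llx, level %llu\n",
               (unsigned long long)packed,
               (unsigned long long)EX_PGD_VPFN_LEVEL_GET_VPFN(packed),
               (unsigned long long)EX_PGD_VPFN_LEVEL_GET_LEVEL(packed));
        return 0;
}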
Instead, it is only used when we change the allocation + * to NOT_MOVABLE or from NOT_MOVABLE to ALLOCATED_MAPPED + */ +void kbase_set_phy_alloc_page_status(struct kbase_mem_phy_alloc *alloc, + enum kbase_page_status status); + static inline void kbase_mem_phy_alloc_gpu_mapped(struct kbase_mem_phy_alloc *alloc) { KBASE_DEBUG_ASSERT(alloc); @@ -224,8 +546,9 @@ static inline void kbase_mem_phy_alloc_gpu_unmapped(struct kbase_mem_phy_alloc * } /** - * kbase_mem_phy_alloc_kernel_mapped - Increment kernel_mappings - * counter for a memory region to prevent commit and flag changes + * kbase_mem_phy_alloc_kernel_mapped - Increment kernel_mappings counter for a + * memory region to prevent commit and flag + * changes * * @alloc: Pointer to physical pages tracking object */ @@ -303,6 +626,8 @@ static inline struct kbase_mem_phy_alloc *kbase_mem_phy_alloc_put(struct kbase_m * @jit_usage_id: The last just-in-time memory usage ID for this region. * @jit_bin_id: The just-in-time memory bin this region came from. * @va_refcnt: Number of users of this region. Protected by reg_lock. + * @no_user_free_count: Number of contexts that want to prevent the region + * from being freed by userspace. * @heap_info_gpu_addr: Pointer to an object in GPU memory defining an end of * an allocated region * The object can be one of: @@ -330,200 +655,6 @@ struct kbase_va_region { size_t nr_pages; size_t initial_commit; size_t threshold_pages; - -/* Free region */ -#define KBASE_REG_FREE (1ul << 0) -/* CPU write access */ -#define KBASE_REG_CPU_WR (1ul << 1) -/* GPU write access */ -#define KBASE_REG_GPU_WR (1ul << 2) -/* No eXecute flag */ -#define KBASE_REG_GPU_NX (1ul << 3) -/* Is CPU cached? */ -#define KBASE_REG_CPU_CACHED (1ul << 4) -/* Is GPU cached? - * Some components within the GPU might only be able to access memory that is - * GPU cacheable. Refer to the specific GPU implementation for more details. - */ -#define KBASE_REG_GPU_CACHED (1ul << 5) - -#define KBASE_REG_GROWABLE (1ul << 6) -/* Can grow on pf? */ -#define KBASE_REG_PF_GROW (1ul << 7) - -/* Allocation doesn't straddle the 4GB boundary in GPU virtual space */ -#define KBASE_REG_GPU_VA_SAME_4GB_PAGE (1ul << 8) - -/* inner shareable coherency */ -#define KBASE_REG_SHARE_IN (1ul << 9) -/* inner & outer shareable coherency */ -#define KBASE_REG_SHARE_BOTH (1ul << 10) - -#if MALI_USE_CSF -/* Space for 8 different zones */ -#define KBASE_REG_ZONE_BITS 3 -#else -/* Space for 4 different zones */ -#define KBASE_REG_ZONE_BITS 2 -#endif - -#define KBASE_REG_ZONE_MASK (((1 << KBASE_REG_ZONE_BITS) - 1ul) << 11) -#define KBASE_REG_ZONE(x) (((x) & ((1 << KBASE_REG_ZONE_BITS) - 1ul)) << 11) -#define KBASE_REG_ZONE_IDX(x) (((x) & KBASE_REG_ZONE_MASK) >> 11) - -#if KBASE_REG_ZONE_MAX > (1 << KBASE_REG_ZONE_BITS) -#error "Too many zones for the number of zone bits defined" -#endif - -/* GPU read access */ -#define KBASE_REG_GPU_RD (1ul << 14) -/* CPU read access */ -#define KBASE_REG_CPU_RD (1ul << 15) - -/* Index of chosen MEMATTR for this region (0..7) */ -#define KBASE_REG_MEMATTR_MASK (7ul << 16) -#define KBASE_REG_MEMATTR_INDEX(x) (((x) & 7) << 16) -#define KBASE_REG_MEMATTR_VALUE(x) (((x) & KBASE_REG_MEMATTR_MASK) >> 16) - -#define KBASE_REG_PROTECTED (1ul << 19) - -#define KBASE_REG_DONT_NEED (1ul << 20) - -/* Imported buffer is padded? */ -#define KBASE_REG_IMPORT_PAD (1ul << 21) - -#if MALI_USE_CSF -/* CSF event memory */ -#define KBASE_REG_CSF_EVENT (1ul << 22) -#else -/* Bit 22 is reserved. 
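/*
 * Illustrative sketch (not part of the patch): per the note above,
 * kbase_set_phy_alloc_page_status() is only used for the coarse transitions
 * between NOT_MOVABLE and ALLOCATED_MAPPED, e.g. when an allocation is handed
 * to a subsystem that cannot tolerate page migration and later released.
 * example_pin_alloc/example_unpin_alloc are hypothetical helpers.
 */
static void example_pin_alloc(struct kbase_mem_phy_alloc *alloc)
{
        kbase_set_phy_alloc_page_status(alloc, NOT_MOVABLE);
}

static void example_unpin_alloc(struct kbase_mem_phy_alloc *alloc)
{
        kbase_set_phy_alloc_page_status(alloc, ALLOCATED_MAPPED);
}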
- * - * Do not remove, use the next unreserved bit for new flags - */ -#define KBASE_REG_RESERVED_BIT_22 (1ul << 22) -#endif - -#if !MALI_USE_CSF -/* The top of the initial commit is aligned to extension pages. - * Extent must be a power of 2 - */ -#define KBASE_REG_TILER_ALIGN_TOP (1ul << 23) -#else -/* Bit 23 is reserved. - * - * Do not remove, use the next unreserved bit for new flags - */ -#define KBASE_REG_RESERVED_BIT_23 (1ul << 23) -#endif /* !MALI_USE_CSF */ - -/* Whilst this flag is set the GPU allocation is not supposed to be freed by - * user space. The flag will remain set for the lifetime of JIT allocations. - */ -#define KBASE_REG_NO_USER_FREE (1ul << 24) - -/* Memory has permanent kernel side mapping */ -#define KBASE_REG_PERMANENT_KERNEL_MAPPING (1ul << 25) - -/* GPU VA region has been freed by the userspace, but still remains allocated - * due to the reference held by CPU mappings created on the GPU VA region. - * - * A region with this flag set has had kbase_gpu_munmap() called on it, but can - * still be looked-up in the region tracker as a non-free region. Hence must - * not create or update any more GPU mappings on such regions because they will - * not be unmapped when the region is finally destroyed. - * - * Since such regions are still present in the region tracker, new allocations - * attempted with BASE_MEM_SAME_VA might fail if their address intersects with - * a region with this flag set. - * - * In addition, this flag indicates the gpu_alloc member might no longer valid - * e.g. in infinite cache simulation. - */ -#define KBASE_REG_VA_FREED (1ul << 26) - -/* If set, the heap info address points to a u32 holding the used size in bytes; - * otherwise it points to a u64 holding the lowest address of unused memory. - */ -#define KBASE_REG_HEAP_INFO_IS_SIZE (1ul << 27) - -/* Allocation is actively used for JIT memory */ -#define KBASE_REG_ACTIVE_JIT_ALLOC (1ul << 28) - -#if MALI_USE_CSF -/* This flag only applies to allocations in the EXEC_FIXED_VA and FIXED_VA - * memory zones, and it determines whether they were created with a fixed - * GPU VA address requested by the user. - */ -#define KBASE_REG_FIXED_ADDRESS (1ul << 29) -#else -#define KBASE_REG_RESERVED_BIT_29 (1ul << 29) -#endif - -#define KBASE_REG_ZONE_SAME_VA KBASE_REG_ZONE(0) - -#define KBASE_REG_ZONE_CUSTOM_VA KBASE_REG_ZONE(1) -#define KBASE_REG_ZONE_CUSTOM_VA_BASE (0x100000000ULL >> PAGE_SHIFT) - -#if MALI_USE_CSF -/* only used with 32-bit clients */ -/* On a 32bit platform, custom VA should be wired from 4GB to 2^(43). - */ -#define KBASE_REG_ZONE_CUSTOM_VA_SIZE \ - (((1ULL << 43) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) -#else -/* only used with 32-bit clients */ -/* On a 32bit platform, custom VA should be wired from 4GB to the VA limit of the - * GPU. Unfortunately, the Linux mmap() interface limits us to 2^32 pages (2^44 - * bytes, see mmap64 man page for reference). So we put the default limit to the - * maximum possible on Linux and shrink it down, if required by the GPU, during - * initialization. - */ -#define KBASE_REG_ZONE_CUSTOM_VA_SIZE \ - (((1ULL << 44) >> PAGE_SHIFT) - KBASE_REG_ZONE_CUSTOM_VA_BASE) -/* end 32-bit clients only */ -#endif - -/* The starting address and size of the GPU-executable zone are dynamic - * and depend on the platform and the number of pages requested by the - * user process, with an upper limit of 4 GB. 
- */ -#define KBASE_REG_ZONE_EXEC_VA KBASE_REG_ZONE(2) -#define KBASE_REG_ZONE_EXEC_VA_MAX_PAGES ((1ULL << 32) >> PAGE_SHIFT) /* 4 GB */ - -#if MALI_USE_CSF -#define KBASE_REG_ZONE_MCU_SHARED KBASE_REG_ZONE(3) -#define KBASE_REG_ZONE_MCU_SHARED_BASE (0x04000000ULL >> PAGE_SHIFT) -#define KBASE_REG_ZONE_MCU_SHARED_SIZE (((0x08000000ULL) >> PAGE_SHIFT) - \ - KBASE_REG_ZONE_MCU_SHARED_BASE) - -/* For CSF GPUs, the EXEC_VA zone is always 4GB in size, and starts at 2^47 for 64-bit - * clients, and 2^43 for 32-bit clients. - */ -#define KBASE_REG_ZONE_EXEC_VA_BASE_64 ((1ULL << 47) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_EXEC_VA_BASE_32 ((1ULL << 43) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_EXEC_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES - -/* Executable zone supporting FIXED/FIXABLE allocations. - * It is always 4GB in size. - */ - -#define KBASE_REG_ZONE_EXEC_FIXED_VA KBASE_REG_ZONE(4) -#define KBASE_REG_ZONE_EXEC_FIXED_VA_SIZE KBASE_REG_ZONE_EXEC_VA_MAX_PAGES - -/* Non-executable zone supporting FIXED/FIXABLE allocations. - * It extends from (2^47) up to (2^48)-1, for 64-bit userspace clients, and from - * (2^43) up to (2^44)-1 for 32-bit userspace clients. - */ -#define KBASE_REG_ZONE_FIXED_VA KBASE_REG_ZONE(5) - -/* Again - 32-bit userspace cannot map addresses beyond 2^44, but 64-bit can - and so - * the end of the FIXED_VA zone for 64-bit clients is (2^48)-1. - */ -#define KBASE_REG_ZONE_FIXED_VA_END_64 ((1ULL << 48) >> PAGE_SHIFT) -#define KBASE_REG_ZONE_FIXED_VA_END_32 ((1ULL << 44) >> PAGE_SHIFT) - -#endif - unsigned long flags; size_t extension; struct kbase_mem_phy_alloc *cpu_alloc; @@ -559,24 +690,24 @@ struct kbase_va_region { size_t used_pages; #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - int va_refcnt; + kbase_refcount_t va_refcnt; + atomic_t no_user_free_count; }; /** - * kbase_is_ctx_reg_zone - determine whether a KBASE_REG_ZONE_<...> is for a - * context or for a device - * @zone_bits: A KBASE_REG_ZONE_<...> to query + * kbase_is_ctx_reg_zone - Determine whether a zone is associated with a + * context or with the device + * @zone: Zone identifier * - * Return: True if the zone for @zone_bits is a context zone, False otherwise + * Return: True if @zone is a context zone, False otherwise */ -static inline bool kbase_is_ctx_reg_zone(unsigned long zone_bits) +static inline bool kbase_is_ctx_reg_zone(enum kbase_memory_zone zone) { - WARN_ON((zone_bits & KBASE_REG_ZONE_MASK) != zone_bits); - return (zone_bits == KBASE_REG_ZONE_SAME_VA || #if MALI_USE_CSF - zone_bits == KBASE_REG_ZONE_EXEC_FIXED_VA || zone_bits == KBASE_REG_ZONE_FIXED_VA || + return !(zone == MCU_SHARED_ZONE); +#else + return true; #endif - zone_bits == KBASE_REG_ZONE_CUSTOM_VA || zone_bits == KBASE_REG_ZONE_EXEC_VA); } /* Special marker for failed JIT allocations that still must be marked as @@ -602,6 +733,23 @@ static inline bool kbase_is_region_invalid_or_free(struct kbase_va_region *reg) return (kbase_is_region_invalid(reg) || kbase_is_region_free(reg)); } +/** + * kbase_is_region_shrinkable - Check if a region is "shrinkable". + * A shrinkable regions is a region for which its backing pages (reg->gpu_alloc->pages) + * can be freed at any point, even though the kbase_va_region structure itself + * may have been refcounted. + * Regions that aren't on a shrinker, but could be shrunk at any point in future + * without warning are still considered "shrinkable" (e.g. Active JIT allocs) + * + * @reg: Pointer to region + * + * Return: true if the region is "shrinkable", false if not. 
+ */ +static inline bool kbase_is_region_shrinkable(struct kbase_va_region *reg) +{ + return (reg->flags & KBASE_REG_DONT_NEED) || (reg->flags & KBASE_REG_ACTIVE_JIT_ALLOC); +} + void kbase_remove_va_region(struct kbase_device *kbdev, struct kbase_va_region *reg); static inline void kbase_region_refcnt_free(struct kbase_device *kbdev, @@ -619,14 +767,12 @@ static inline void kbase_region_refcnt_free(struct kbase_device *kbdev, static inline struct kbase_va_region *kbase_va_region_alloc_get( struct kbase_context *kctx, struct kbase_va_region *region) { - lockdep_assert_held(&kctx->reg_lock); + WARN_ON(!kbase_refcount_read(®ion->va_refcnt)); + WARN_ON(kbase_refcount_read(®ion->va_refcnt) == INT_MAX); - WARN_ON(!region->va_refcnt); - - /* non-atomic as kctx->reg_lock is held */ dev_dbg(kctx->kbdev->dev, "va_refcnt %d before get %pK\n", - region->va_refcnt, (void *)region); - region->va_refcnt++; + kbase_refcount_read(®ion->va_refcnt), (void *)region); + kbase_refcount_inc(®ion->va_refcnt); return region; } @@ -634,21 +780,67 @@ static inline struct kbase_va_region *kbase_va_region_alloc_get( static inline struct kbase_va_region *kbase_va_region_alloc_put( struct kbase_context *kctx, struct kbase_va_region *region) { - lockdep_assert_held(&kctx->reg_lock); - - WARN_ON(region->va_refcnt <= 0); + WARN_ON(kbase_refcount_read(®ion->va_refcnt) <= 0); WARN_ON(region->flags & KBASE_REG_FREE); - /* non-atomic as kctx->reg_lock is held */ - region->va_refcnt--; - dev_dbg(kctx->kbdev->dev, "va_refcnt %d after put %pK\n", - region->va_refcnt, (void *)region); - if (!region->va_refcnt) + if (kbase_refcount_dec_and_test(®ion->va_refcnt)) kbase_region_refcnt_free(kctx->kbdev, region); + else + dev_dbg(kctx->kbdev->dev, "va_refcnt %d after put %pK\n", + kbase_refcount_read(®ion->va_refcnt), (void *)region); return NULL; } +/** + * kbase_va_region_is_no_user_free - Check if user free is forbidden for the region. + * A region that must not be freed by userspace indicates that it is owned by some other + * kbase subsystem, for example tiler heaps, JIT memory or CSF queues. + * Such regions must not be shrunk (i.e. have their backing pages freed), except by the + * current owner. + * Hence, callers cannot rely on this check alone to determine if a region might be shrunk + * by any part of kbase. Instead they should use kbase_is_region_shrinkable(). + * + * @region: Pointer to region. + * + * Return: true if userspace cannot free the region, false if userspace can free the region. + */ +static inline bool kbase_va_region_is_no_user_free(struct kbase_va_region *region) +{ + return atomic_read(®ion->no_user_free_count) > 0; +} + +/** + * kbase_va_region_no_user_free_inc - Increment "no user free" count for a region. + * Calling this function will prevent the region to be shrunk by parts of kbase that + * don't own the region (as long as the count stays above zero). Refer to + * kbase_va_region_is_no_user_free() for more information. + * + * @region: Pointer to region (not shrinkable). + * + * Return: the pointer to the region passed as argument. + */ +static inline void kbase_va_region_no_user_free_inc(struct kbase_va_region *region) +{ + WARN_ON(kbase_is_region_shrinkable(region)); + WARN_ON(atomic_read(®ion->no_user_free_count) == INT_MAX); + + /* non-atomic as kctx->reg_lock is held */ + atomic_inc(®ion->no_user_free_count); +} + +/** + * kbase_va_region_no_user_free_dec - Decrement "no user free" count for a region. + * + * @region: Pointer to region (not shrinkable). 
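/*
 * Illustrative sketch (not part of the patch): a kbase subsystem that needs a
 * region to survive a userspace free now pairs a va_refcnt get with a
 * no-user-free count increment, and releases both when it is done.
 * example_take_region/example_drop_region are hypothetical helpers; the kbase
 * calls are as declared above.
 */
static void example_take_region(struct kbase_context *kctx,
                                struct kbase_va_region *reg)
{
        kbase_va_region_alloc_get(kctx, reg);   /* keeps the region structure alive */
        kbase_va_region_no_user_free_inc(reg);  /* WARNs if the region is shrinkable */
}

static void example_drop_region(struct kbase_context *kctx,
                                struct kbase_va_region *reg)
{
        kbase_va_region_no_user_free_dec(reg);
        kbase_va_region_alloc_put(kctx, reg);   /* may free the region on the last put */
}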
+ */ +static inline void kbase_va_region_no_user_free_dec(struct kbase_va_region *region) +{ + WARN_ON(!kbase_va_region_is_no_user_free(region)); + + atomic_dec(®ion->no_user_free_count); +} + /* Common functions */ static inline struct tagged_addr *kbase_get_cpu_phy_pages( struct kbase_va_region *reg) @@ -862,12 +1054,9 @@ static inline size_t kbase_mem_pool_config_get_max_size( * * Return: 0 on success, negative -errno on error */ -int kbase_mem_pool_init(struct kbase_mem_pool *pool, - const struct kbase_mem_pool_config *config, - unsigned int order, - int group_id, - struct kbase_device *kbdev, - struct kbase_mem_pool *next_pool); +int kbase_mem_pool_init(struct kbase_mem_pool *pool, const struct kbase_mem_pool_config *config, + unsigned int order, int group_id, struct kbase_device *kbdev, + struct kbase_mem_pool *next_pool); /** * kbase_mem_pool_term - Destroy a memory pool @@ -947,6 +1136,9 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, * @pages: Pointer to array where the physical address of the allocated * pages will be stored. * @partial_allowed: If fewer pages allocated is allowed + * @page_owner: Pointer to the task that created the Kbase context for which + * the pages are being allocated. It can be NULL if the pages + * won't be associated with any Kbase context. * * Like kbase_mem_pool_alloc() but optimized for allocating many pages. * @@ -963,7 +1155,8 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, * this lock, it should use kbase_mem_pool_alloc_pages_locked() instead. */ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, - struct tagged_addr *pages, bool partial_allowed); + struct tagged_addr *pages, bool partial_allowed, + struct task_struct *page_owner); /** * kbase_mem_pool_alloc_pages_locked - Allocate pages from memory pool @@ -1075,13 +1268,17 @@ void kbase_mem_pool_set_max_size(struct kbase_mem_pool *pool, size_t max_size); * kbase_mem_pool_grow - Grow the pool * @pool: Memory pool to grow * @nr_to_grow: Number of pages to add to the pool + * @page_owner: Pointer to the task that created the Kbase context for which + * the memory pool is being grown. It can be NULL if the pages + * to be allocated won't be associated with any Kbase context. * * Adds @nr_to_grow pages to the pool. Note that this may cause the pool to * become larger than the maximum size specified. * * Return: 0 on success, -ENOMEM if unable to allocate sufficent pages */ -int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow); +int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow, + struct task_struct *page_owner); /** * kbase_mem_pool_trim - Grow or shrink the pool to a new size @@ -1115,6 +1312,16 @@ void kbase_mem_pool_mark_dying(struct kbase_mem_pool *pool); struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool); /** + * kbase_mem_pool_free_page - Free a page from a memory pool. + * @pool: Memory pool to free a page from + * @p: Page to free + * + * This will free any associated data stored for the page and release + * the page back to the kernel. 
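/*
 * Illustrative sketch (not part of the patch): kbase_mem_pool_alloc_pages()
 * and kbase_mem_pool_grow() now thread through the task that created the
 * owning context, so pool work done from kernel threads can still reference
 * the creating process; NULL is passed when the pages are not tied to any
 * context. example_alloc_for_ctx is a hypothetical helper built from the
 * declarations above.
 */
static int example_alloc_for_ctx(struct kbase_mem_pool *pool, size_t nr_4k_pages,
                                 struct tagged_addr *pages,
                                 struct task_struct *ctx_owner)
{
        /* ctx_owner may be NULL; partial allocation is not permitted here */
        return kbase_mem_pool_alloc_pages(pool, nr_4k_pages, pages, false, ctx_owner);
}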
+ */ +void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, struct page *p); + +/** * kbase_region_tracker_init - Initialize the region tracker data structure * @kctx: kbase context * @@ -1159,18 +1366,19 @@ int kbase_region_tracker_init_exec(struct kbase_context *kctx, u64 exec_va_pages void kbase_region_tracker_term(struct kbase_context *kctx); /** - * kbase_region_tracker_term_rbtree - Free memory for a region tracker + * kbase_region_tracker_erase_rbtree - Free memory for a region tracker * * @rbtree: Region tracker tree root * * This will free all the regions within the region tracker */ -void kbase_region_tracker_term_rbtree(struct rb_root *rbtree); +void kbase_region_tracker_erase_rbtree(struct rb_root *rbtree); struct kbase_va_region *kbase_region_tracker_find_region_enclosing_address( struct kbase_context *kctx, u64 gpu_addr); struct kbase_va_region *kbase_find_region_enclosing_address( struct rb_root *rbtree, u64 gpu_addr); +void kbase_region_tracker_insert(struct kbase_va_region *new_reg); /** * kbase_region_tracker_find_region_base_address - Check that a pointer is @@ -1187,8 +1395,11 @@ struct kbase_va_region *kbase_region_tracker_find_region_base_address( struct kbase_va_region *kbase_find_region_base_address(struct rb_root *rbtree, u64 gpu_addr); -struct kbase_va_region *kbase_alloc_free_region(struct rb_root *rbtree, - u64 start_pfn, size_t nr_pages, int zone); +struct kbase_va_region *kbase_alloc_free_region(struct kbase_reg_zone *zone, u64 start_pfn, + size_t nr_pages); +struct kbase_va_region *kbase_ctx_alloc_free_region(struct kbase_context *kctx, + enum kbase_memory_zone id, u64 start_pfn, + size_t nr_pages); void kbase_free_alloced_region(struct kbase_va_region *reg); int kbase_add_va_region(struct kbase_context *kctx, struct kbase_va_region *reg, u64 addr, size_t nr_pages, size_t align); @@ -1199,6 +1410,32 @@ int kbase_add_va_region_rbtree(struct kbase_device *kbdev, bool kbase_check_alloc_flags(unsigned long flags); bool kbase_check_import_flags(unsigned long flags); +static inline bool kbase_import_size_is_valid(struct kbase_device *kbdev, u64 va_pages) +{ + if (va_pages > KBASE_MEM_ALLOC_MAX_SIZE) { + dev_dbg( + kbdev->dev, + "Import attempted with va_pages==%lld larger than KBASE_MEM_ALLOC_MAX_SIZE!", + (unsigned long long)va_pages); + return false; + } + + return true; +} + +static inline bool kbase_alias_size_is_valid(struct kbase_device *kbdev, u64 va_pages) +{ + if (va_pages > KBASE_MEM_ALLOC_MAX_SIZE) { + dev_dbg( + kbdev->dev, + "Alias attempted with va_pages==%lld larger than KBASE_MEM_ALLOC_MAX_SIZE!", + (unsigned long long)va_pages); + return false; + } + + return true; +} + /** * kbase_check_alloc_sizes - check user space sizes parameters for an * allocation @@ -1233,9 +1470,75 @@ int kbase_check_alloc_sizes(struct kbase_context *kctx, unsigned long flags, int kbase_update_region_flags(struct kbase_context *kctx, struct kbase_va_region *reg, unsigned long flags); +/** + * kbase_gpu_vm_lock() - Acquire the per-context region list lock + * @kctx: KBase context + * + * Care must be taken when making an allocation whilst holding this lock, because of interaction + * with the Kernel's OoM-killer and use of this lock in &vm_operations_struct close() handlers. + * + * If this lock is taken during a syscall, and/or the allocation is 'small' then it is safe to use. + * + * If the caller is not in a syscall, and the allocation is 'large', then it must not hold this + * lock. 
+ * + * This is because the kernel OoM killer might target the process corresponding to that same kbase + * context, and attempt to call the context's close() handlers for its open VMAs. This is safe if + * the allocating caller is in a syscall, because the VMA close() handlers are delayed until all + * syscalls have finished (noting that no new syscalls can start as the remaining user threads will + * have been killed too), and so there is no possibility of contention between the thread + * allocating with this lock held, and the VMA close() handler. + * + * However, outside of a syscall (e.g. a kworker or other kthread), one of kbase's VMA close() + * handlers (kbase_cpu_vm_close()) also takes this lock, and so prevents the process from being + * killed until the caller of the function allocating memory has released this lock. On subsequent + * retries for allocating a page, the OoM killer would be re-invoked but skips over the process + * stuck in its close() handler. + * + * Also because the caller is not in a syscall, the page allocation code in the kernel is not aware + * that the allocation is being done on behalf of another process, and so does not realize that + * process has received a kill signal due to an OoM, and so will continually retry with the OoM + * killer until enough memory has been released, or until all other killable processes have been + * killed (at which point the kernel halts with a panic). + * + * However, if the allocation outside of a syscall is small enough to be satisfied by killing + * another process, then the allocation completes, the caller releases this lock, and + * kbase_cpu_vm_close() can unblock and allow the process to be killed. + * + * Hence, this is effectively a deadlock with kbase_cpu_vm_close(), except that if the memory + * allocation is small enough the deadlock can be resolved. For that reason, such a memory deadlock + * is NOT discovered with CONFIG_PROVE_LOCKING. + * + * If this may be called outside of a syscall, consider moving allocations outside of this lock, or + * use __GFP_NORETRY for such allocations (which will allow direct-reclaim attempts, but will + * prevent OoM kills to satisfy the allocation, and will just fail the allocation instead). + */ void kbase_gpu_vm_lock(struct kbase_context *kctx); + +/** + * kbase_gpu_vm_lock_with_pmode_sync() - Wrapper of kbase_gpu_vm_lock. + * @kctx: KBase context + * + * Same as kbase_gpu_vm_lock for JM GPU. + * Additionally acquire P.mode read-write semaphore for CSF GPU. + */ +void kbase_gpu_vm_lock_with_pmode_sync(struct kbase_context *kctx); + +/** + * kbase_gpu_vm_unlock() - Release the per-context region list lock + * @kctx: KBase context + */ void kbase_gpu_vm_unlock(struct kbase_context *kctx); +/** + * kbase_gpu_vm_unlock_with_pmode_sync() - Wrapper of kbase_gpu_vm_unlock. + * @kctx: KBase context + * + * Same as kbase_gpu_vm_unlock for JM GPU. + * Additionally release P.mode read-write semaphore for CSF GPU. + */ +void kbase_gpu_vm_unlock_with_pmode_sync(struct kbase_context *kctx); + int kbase_alloc_phy_pages(struct kbase_va_region *reg, size_t vsize, size_t size); /** @@ -1311,6 +1614,7 @@ void kbase_mmu_disable_as(struct kbase_device *kbdev, int as_nr); void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat); +#if defined(CONFIG_MALI_VECTOR_DUMP) /** * kbase_mmu_dump() - Dump the MMU tables to a buffer. 
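/*
 * Illustrative sketch (not part of the patch) of the guidance in the
 * kbase_gpu_vm_lock() documentation above: a kthread/kworker that must
 * allocate while holding the region lock uses __GFP_NORETRY, so a failed
 * allocation is returned to the caller instead of deadlocking against
 * kbase_cpu_vm_close() via the OoM killer. example_worker_alloc is a
 * hypothetical helper.
 */
static void *example_worker_alloc(struct kbase_context *kctx, size_t bytes)
{
        void *buf;

        kbase_gpu_vm_lock(kctx);
        /* Direct reclaim is still attempted, but the OoM killer will not be
         * invoked on this allocation's behalf; NULL is returned instead.
         */
        buf = kzalloc(bytes, GFP_KERNEL | __GFP_NORETRY);
        kbase_gpu_vm_unlock(kctx);

        return buf;
}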
* @@ -1330,6 +1634,7 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat); * (including if the @c nr_pages is too small) */ void *kbase_mmu_dump(struct kbase_context *kctx, int nr_pages); +#endif /** * kbase_sync_now - Perform cache maintenance on a memory region @@ -1449,15 +1754,21 @@ int kbasep_find_enclosing_gpu_mapping_start_and_offset( * @alloc: allocation object to add pages to * @nr_pages_requested: number of physical pages to allocate * - * Allocates \a nr_pages_requested and updates the alloc object. + * Allocates @nr_pages_requested and updates the alloc object. * - * Return: 0 if all pages have been successfully allocated. Error code otherwise + * Note: if kbase_gpu_vm_lock() is to be held around this function to ensure thread-safe updating + * of @alloc, then refer to the documentation of kbase_gpu_vm_lock() about the requirements of + * either calling during a syscall, or ensuring the allocation is small. These requirements prevent + * an effective deadlock between the kernel's OoM killer and kbase's VMA close() handlers, which + * could take kbase_gpu_vm_lock() too. * - * Note : The caller must not hold vm_lock, as this could cause a deadlock if - * the kernel OoM killer runs. If the caller must allocate pages while holding - * this lock, it should use kbase_mem_pool_alloc_pages_locked() instead. + * If the requirements of kbase_gpu_vm_lock() cannot be satisfied when calling this function, but + * @alloc must still be updated in a thread-safe way, then instead use + * kbase_alloc_phy_pages_helper_locked() and restructure callers into the sequence outlined there. * * This function cannot be used from interrupt context + * + * Return: 0 if all pages have been successfully allocated. Error code otherwise */ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, size_t nr_pages_requested); @@ -1467,17 +1778,19 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * @alloc: allocation object to add pages to * @pool: Memory pool to allocate from * @nr_pages_requested: number of physical pages to allocate - * @prealloc_sa: Information about the partial allocation if the amount - * of memory requested is not a multiple of 2MB. One - * instance of struct kbase_sub_alloc must be allocated by - * the caller iff CONFIG_MALI_2MB_ALLOC is enabled. * - * Allocates \a nr_pages_requested and updates the alloc object. This function - * does not allocate new pages from the kernel, and therefore will never trigger - * the OoM killer. Therefore, it can be run while the vm_lock is held. + * @prealloc_sa: Information about the partial allocation if the amount of memory requested + * is not a multiple of 2MB. One instance of struct kbase_sub_alloc must be + * allocated by the caller if kbdev->pagesize_2mb is enabled. * - * As new pages can not be allocated, the caller must ensure there are - * sufficient pages in the pool. Usage of this function should look like : + * Allocates @nr_pages_requested and updates the alloc object. This function does not allocate new + * pages from the kernel, and therefore will never trigger the OoM killer. Therefore, it can be + * called whilst a thread operating outside of a syscall has held the region list lock + * (kbase_gpu_vm_lock()), as it will not cause an effective deadlock with VMA close() handlers used + * by the OoM killer. + * + * As new pages can not be allocated, the caller must ensure there are sufficient pages in the + * pool. 
Usage of this function should look like : * * kbase_gpu_vm_lock(kctx); * kbase_mem_pool_lock(pool) @@ -1490,24 +1803,24 @@ int kbase_alloc_phy_pages_helper(struct kbase_mem_phy_alloc *alloc, * } * kbase_alloc_phy_pages_helper_locked(pool) * kbase_mem_pool_unlock(pool) - * Perform other processing that requires vm_lock... + * // Perform other processing that requires vm_lock... * kbase_gpu_vm_unlock(kctx); * - * This ensures that the pool can be grown to the required size and that the - * allocation can complete without another thread using the newly grown pages. + * This ensures that the pool can be grown to the required size and that the allocation can + * complete without another thread using the newly grown pages. * - * If CONFIG_MALI_2MB_ALLOC is defined and the allocation is >= 2MB, then - * @pool must be alloc->imported.native.kctx->lp_mem_pool. Otherwise it must be - * alloc->imported.native.kctx->mem_pool. - * @prealloc_sa is used to manage the non-2MB sub-allocation. It has to be - * pre-allocated because we must not sleep (due to the usage of kmalloc()) - * whilst holding pool->pool_lock. - * @prealloc_sa shall be set to NULL if it has been consumed by this function - * to indicate that the caller must not free it. + * If kbdev->pagesize_2mb is enabled and the allocation is >= 2MB, then @pool must be one of the + * pools from alloc->imported.native.kctx->mem_pools.large[]. Otherwise it must be one of the + * mempools from alloc->imported.native.kctx->mem_pools.small[]. * - * Return: Pointer to array of allocated pages. NULL on failure. + * @prealloc_sa is used to manage the non-2MB sub-allocation. It has to be pre-allocated because we + * must not sleep (due to the usage of kmalloc()) whilst holding pool->pool_lock. @prealloc_sa + * shall be set to NULL if it has been consumed by this function to indicate that the caller no + * longer owns it and should not access it further. + * + * Note: Caller must hold @pool->pool_lock * - * Note : Caller must hold pool->pool_lock + * Return: Pointer to array of allocated pages. NULL on failure. 
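/*
 * Illustrative sketch (not part of the patch) of the sequence described in the
 * comment above: the pool is grown with both locks dropped (growth may
 * allocate from the kernel), and the pages are then taken under the locks with
 * kbase_alloc_phy_pages_helper_locked(), which never allocates from the
 * kernel. example_alloc_backing is a hypothetical helper;
 * kbase_mem_pool_size() is assumed to be the usual pool-size accessor from
 * this header, and prealloc_sa follows the rules documented above.
 */
static struct tagged_addr *example_alloc_backing(struct kbase_context *kctx,
                                                 struct kbase_mem_phy_alloc *alloc,
                                                 struct kbase_mem_pool *pool,
                                                 size_t nr_pages,
                                                 struct kbase_sub_alloc **prealloc_sa)
{
        struct tagged_addr *pages;

        kbase_gpu_vm_lock(kctx);
        kbase_mem_pool_lock(pool);

        while (kbase_mem_pool_size(pool) < nr_pages) {
                kbase_mem_pool_unlock(pool);
                kbase_gpu_vm_unlock(kctx);

                /* Grow without the locks held; NULL: no specific page owner */
                if (kbase_mem_pool_grow(pool, nr_pages, NULL))
                        return NULL;

                kbase_gpu_vm_lock(kctx);
                kbase_mem_pool_lock(pool);
        }

        pages = kbase_alloc_phy_pages_helper_locked(alloc, pool, nr_pages, prealloc_sa);

        kbase_mem_pool_unlock(pool);
        /* ... other processing that requires the vm lock ... */
        kbase_gpu_vm_unlock(kctx);

        return pages;
}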
*/ struct tagged_addr *kbase_alloc_phy_pages_helper_locked( struct kbase_mem_phy_alloc *alloc, struct kbase_mem_pool *pool, @@ -1546,7 +1859,7 @@ void kbase_free_phy_pages_helper_locked(struct kbase_mem_phy_alloc *alloc, struct kbase_mem_pool *pool, struct tagged_addr *pages, size_t nr_pages_to_free); -static inline void kbase_set_dma_addr(struct page *p, dma_addr_t dma_addr) +static inline void kbase_set_dma_addr_as_priv(struct page *p, dma_addr_t dma_addr) { SetPagePrivate(p); if (sizeof(dma_addr_t) > sizeof(p->private)) { @@ -1562,7 +1875,7 @@ static inline void kbase_set_dma_addr(struct page *p, dma_addr_t dma_addr) } } -static inline dma_addr_t kbase_dma_addr(struct page *p) +static inline dma_addr_t kbase_dma_addr_as_priv(struct page *p) { if (sizeof(dma_addr_t) > sizeof(p->private)) return ((dma_addr_t)page_private(p)) << PAGE_SHIFT; @@ -1570,11 +1883,35 @@ static inline dma_addr_t kbase_dma_addr(struct page *p) return (dma_addr_t)page_private(p); } -static inline void kbase_clear_dma_addr(struct page *p) +static inline void kbase_clear_dma_addr_as_priv(struct page *p) { ClearPagePrivate(p); } +static inline struct kbase_page_metadata *kbase_page_private(struct page *p) +{ + return (struct kbase_page_metadata *)page_private(p); +} + +static inline dma_addr_t kbase_dma_addr(struct page *p) +{ + if (kbase_is_page_migration_enabled()) + return kbase_page_private(p)->dma_addr; + + return kbase_dma_addr_as_priv(p); +} + +static inline dma_addr_t kbase_dma_addr_from_tagged(struct tagged_addr tagged_pa) +{ + phys_addr_t pa = as_phys_addr_t(tagged_pa); + struct page *page = pfn_to_page(PFN_DOWN(pa)); + dma_addr_t dma_addr = (is_huge(tagged_pa) || is_partial(tagged_pa)) ? + kbase_dma_addr_as_priv(page) : + kbase_dma_addr(page); + + return dma_addr; +} + /** * kbase_flush_mmu_wqs() - Flush MMU workqueues. * @kbdev: Device pointer. @@ -1733,8 +2070,8 @@ void kbase_jit_report_update_pressure(struct kbase_context *kctx, unsigned int flags); /** - * jit_trim_necessary_pages() - calculate and trim the least pages possible to - * satisfy a new JIT allocation + * kbase_jit_trim_necessary_pages() - calculate and trim the least pages + * possible to satisfy a new JIT allocation * * @kctx: Pointer to the kbase context * @needed_pages: Number of JIT physical pages by which trimming is requested. @@ -1868,28 +2205,36 @@ bool kbase_has_exec_va_zone(struct kbase_context *kctx); /** * kbase_map_external_resource - Map an external resource to the GPU. * @kctx: kbase context. - * @reg: The region to map. + * @reg: External resource to map. * @locked_mm: The mm_struct which has been locked for this operation. * - * Return: The physical allocation which backs the region on success or NULL - * on failure. + * On successful mapping, the VA region and the gpu_alloc refcounts will be + * increased, making it safe to use and store both values directly. + * + * Return: Zero on success, or negative error code. */ -struct kbase_mem_phy_alloc *kbase_map_external_resource( - struct kbase_context *kctx, struct kbase_va_region *reg, - struct mm_struct *locked_mm); +int kbase_map_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg, + struct mm_struct *locked_mm); /** * kbase_unmap_external_resource - Unmap an external resource from the GPU. * @kctx: kbase context. - * @reg: The region to unmap or NULL if it has already been released. - * @alloc: The physical allocation being unmapped. 
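/*
 * Standalone illustration (user-space C, not part of the patch) of the trick
 * used by kbase_set_dma_addr_as_priv()/kbase_dma_addr_as_priv() above: when a
 * 64-bit DMA address must fit in a 32-bit page->private, the page-aligned
 * address is stored shifted right by PAGE_SHIFT and shifted back on read.
 * EX_PAGE_SHIFT == 12 is an assumption standing in for PAGE_SHIFT.
 */
#include <stdint.h>
#include <stdio.h>

#define EX_PAGE_SHIFT 12

int main(void)
{
        uint64_t dma_addr = 0x1c0000000ULL;             /* page-aligned, above 4 GB */
        uint32_t stored = (uint32_t)(dma_addr >> EX_PAGE_SHIFT);
        uint64_t recovered = (uint64_t)stored << EX_PAGE_SHIFT;

        printf("stored 0x%x -> recovered 0x%llx (matches: %d)\n",
               stored, (unsigned long long)recovered, recovered == dma_addr);
        return 0;
}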
+ * @reg: VA region corresponding to external resource + * + * On successful unmapping, the VA region and the gpu_alloc refcounts will + * be decreased. If the refcount reaches zero, both @reg and the corresponding + * allocation may be freed, so using them after returning from this function + * requires the caller to explicitly check their state. */ -void kbase_unmap_external_resource(struct kbase_context *kctx, - struct kbase_va_region *reg, struct kbase_mem_phy_alloc *alloc); +void kbase_unmap_external_resource(struct kbase_context *kctx, struct kbase_va_region *reg); /** * kbase_unpin_user_buf_page - Unpin a page of a user buffer. * @page: page to unpin + * + * The caller must have ensured that there are no CPU mappings for @page (as + * might be created from the struct kbase_mem_phy_alloc that tracks @page), and + * that userspace will not be able to recreate the CPU mappings again. */ void kbase_unpin_user_buf_page(struct page *page); @@ -1973,7 +2318,7 @@ static inline void kbase_mem_pool_lock(struct kbase_mem_pool *pool) } /** - * kbase_mem_pool_lock - Release a memory pool + * kbase_mem_pool_unlock - Release a memory pool * @pool: Memory pool to lock */ static inline void kbase_mem_pool_unlock(struct kbase_mem_pool *pool) @@ -2119,83 +2464,102 @@ int kbase_mem_copy_to_pinned_user_pages(struct page **dest_pages, unsigned int *target_page_nr, size_t offset); /** - * kbase_reg_zone_end_pfn - return the end Page Frame Number of @zone - * @zone: zone to query + * kbase_ctx_reg_zone_get_nolock - Get a zone from @kctx where the caller does + * not have @kctx 's region lock + * @kctx: Pointer to kbase context + * @zone: Zone identifier * - * Return: The end of the zone corresponding to @zone + * This should only be used in performance-critical paths where the code is + * resilient to a race with the zone changing, and only when the zone is tracked + * by the @kctx. + * + * Return: The zone corresponding to @zone */ -static inline u64 kbase_reg_zone_end_pfn(struct kbase_reg_zone *zone) +static inline struct kbase_reg_zone *kbase_ctx_reg_zone_get_nolock(struct kbase_context *kctx, + enum kbase_memory_zone zone) { - return zone->base_pfn + zone->va_size_pages; + WARN_ON(!kbase_is_ctx_reg_zone(zone)); + return &kctx->reg_zone[zone]; } /** - * kbase_ctx_reg_zone_init - initialize a zone in @kctx + * kbase_ctx_reg_zone_get - Get a memory zone from @kctx * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to initialize + * @zone: Zone identifier + * + * Note that the zone is not refcounted, so there is no corresponding operation to + * put the zone back. 
+ * + * Return: The zone corresponding to @zone + */ +static inline struct kbase_reg_zone *kbase_ctx_reg_zone_get(struct kbase_context *kctx, + enum kbase_memory_zone zone) +{ + lockdep_assert_held(&kctx->reg_lock); + return kbase_ctx_reg_zone_get_nolock(kctx, zone); +} + +/** + * kbase_reg_zone_init - Initialize a zone in @kctx + * @kbdev: Pointer to kbase device in order to initialize the VA region cache + * @zone: Memory zone + * @id: Memory zone identifier to facilitate lookups * @base_pfn: Page Frame Number in GPU virtual address space for the start of * the Zone * @va_size_pages: Size of the Zone in pages + * + * Return: + * * 0 on success + * * -ENOMEM on error */ -static inline void kbase_ctx_reg_zone_init(struct kbase_context *kctx, - unsigned long zone_bits, - u64 base_pfn, u64 va_size_pages) +static inline int kbase_reg_zone_init(struct kbase_device *kbdev, struct kbase_reg_zone *zone, + enum kbase_memory_zone id, u64 base_pfn, u64 va_size_pages) { - struct kbase_reg_zone *zone; + struct kbase_va_region *reg; - lockdep_assert_held(&kctx->reg_lock); - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); + *zone = (struct kbase_reg_zone){ .reg_rbtree = RB_ROOT, + .base_pfn = base_pfn, + .va_size_pages = va_size_pages, + .id = id, + .cache = kbdev->va_region_slab }; + + if (unlikely(!va_size_pages)) + return 0; + + reg = kbase_alloc_free_region(zone, base_pfn, va_size_pages); + if (unlikely(!reg)) + return -ENOMEM; + + kbase_region_tracker_insert(reg); - zone = &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; - *zone = (struct kbase_reg_zone){ - .base_pfn = base_pfn, .va_size_pages = va_size_pages, - }; + return 0; } /** - * kbase_ctx_reg_zone_get_nolock - get a zone from @kctx where the caller does - * not have @kctx 's region lock - * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to retrieve - * - * This should only be used in performance-critical paths where the code is - * resilient to a race with the zone changing. 
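/*
 * Illustrative sketch (not part of the patch): kbase_reg_zone_init() now owns
 * both the zone bookkeeping and the creation of the single free region that
 * spans it, so zone set-up and tear-down reduce to an init/term pair.
 * example_setup_custom_va is a hypothetical helper; the CUSTOM_VA_ZONE
 * identifier and the base macro are taken from the hunks above.
 */
static int example_setup_custom_va(struct kbase_context *kctx, u64 va_size_pages)
{
        int err;

        err = kbase_reg_zone_init(kctx->kbdev, &kctx->reg_zone[CUSTOM_VA_ZONE],
                                  CUSTOM_VA_ZONE, KBASE_REG_ZONE_CUSTOM_VA_BASE,
                                  va_size_pages);
        if (err)
                return err;     /* -ENOMEM: the spanning free region could not be allocated */

        /* ... use the zone ... */

        kbase_reg_zone_term(&kctx->reg_zone[CUSTOM_VA_ZONE]);
        return 0;
}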
+ * kbase_reg_zone_end_pfn - return the end Page Frame Number of @zone + * @zone: zone to query * - * Return: The zone corresponding to @zone_bits + * Return: The end of the zone corresponding to @zone */ -static inline struct kbase_reg_zone * -kbase_ctx_reg_zone_get_nolock(struct kbase_context *kctx, - unsigned long zone_bits) +static inline u64 kbase_reg_zone_end_pfn(struct kbase_reg_zone *zone) { - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); - - return &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; + return zone->base_pfn + zone->va_size_pages; } /** - * kbase_ctx_reg_zone_get - get a zone from @kctx - * @kctx: Pointer to kbase context - * @zone_bits: A KBASE_REG_ZONE_<...> to retrieve - * - * The get is not refcounted - there is no corresponding 'put' operation - * - * Return: The zone corresponding to @zone_bits + * kbase_reg_zone_term - Terminate the memory zone tracker + * @zone: Memory zone */ -static inline struct kbase_reg_zone * -kbase_ctx_reg_zone_get(struct kbase_context *kctx, unsigned long zone_bits) +static inline void kbase_reg_zone_term(struct kbase_reg_zone *zone) { - lockdep_assert_held(&kctx->reg_lock); - WARN_ON(!kbase_is_ctx_reg_zone(zone_bits)); - - return &kctx->reg_zone[KBASE_REG_ZONE_IDX(zone_bits)]; + kbase_region_tracker_erase_rbtree(&zone->reg_rbtree); } /** * kbase_mem_allow_alloc - Check if allocation of GPU memory is allowed * @kctx: Pointer to kbase context * - * Don't allow the allocation of GPU memory until user space has set up the - * tracking page (which sets kctx->process_mm) or if the ioctl has been issued + * Don't allow the allocation of GPU memory if the ioctl has been issued * from the forked child process using the mali device file fd inherited from * the parent process. * @@ -2203,13 +2567,23 @@ kbase_ctx_reg_zone_get(struct kbase_context *kctx, unsigned long zone_bits) */ static inline bool kbase_mem_allow_alloc(struct kbase_context *kctx) { - bool allow_alloc = true; - - rcu_read_lock(); - allow_alloc = (rcu_dereference(kctx->process_mm) == current->mm); - rcu_read_unlock(); + return (kctx->process_mm == current->mm); +} - return allow_alloc; +/** + * kbase_mem_mmgrab - Wrapper function to take reference on mm_struct of current process + */ +static inline void kbase_mem_mmgrab(void) +{ + /* This merely takes a reference on the memory descriptor structure + * i.e. mm_struct of current process and not on its address space and + * so won't block the freeing of address space on process exit. + */ +#if KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE + atomic_inc(¤t->mm->mm_count); +#else + mmgrab(current->mm); +#endif } /** diff --git a/mali_kbase/mali_kbase_mem_linux.c b/mali_kbase/mali_kbase_mem_linux.c index 23d55b2..d154583 100644 --- a/mali_kbase/mali_kbase_mem_linux.c +++ b/mali_kbase/mali_kbase_mem_linux.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
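/*
 * Illustrative sketch (not part of the patch): kbase_mem_mmgrab() above pins
 * only the mm_struct itself (mm_count), not the address space (mm_users), so
 * the process can still tear down its mappings on exit. The reference is
 * balanced with mmdrop(); example_pin_current_mm/example_unpin_mm are
 * hypothetical helpers.
 */
static struct mm_struct *example_pin_current_mm(void)
{
        kbase_mem_mmgrab();             /* takes a reference on current->mm */
        return current->mm;
}

static void example_unpin_mm(struct mm_struct *mm)
{
        mmdrop(mm);                     /* balances kbase_mem_mmgrab() */
}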
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,14 +31,13 @@ #include <linux/fs.h> #include <linux/version.h> #include <linux/dma-mapping.h> -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) -#include <linux/dma-attrs.h> -#endif /* LINUX_VERSION_CODE < 4.8.0 */ #include <linux/dma-buf.h> #include <linux/shrinker.h> #include <linux/cache.h> #include <linux/memory_group_manager.h> - +#include <linux/math64.h> +#include <linux/migrate.h> +#include <linux/version.h> #include <mali_kbase.h> #include <mali_kbase_mem_linux.h> #include <tl/mali_kbase_tracepoints.h> @@ -84,23 +83,34 @@ #define IR_THRESHOLD_STEPS (256u) #if MALI_USE_CSF -static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, - struct vm_area_struct *vma); -static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, - struct vm_area_struct *vma); +static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, struct vm_area_struct *vma); +static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, struct vm_area_struct *vma); #endif -static int kbase_vmap_phy_pages(struct kbase_context *kctx, - struct kbase_va_region *reg, u64 offset_bytes, size_t size, - struct kbase_vmap_struct *map); +static int kbase_vmap_phy_pages(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 offset_bytes, size_t size, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags); static void kbase_vunmap_phy_pages(struct kbase_context *kctx, struct kbase_vmap_struct *map); static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_struct *vma); -static int kbase_mem_shrink_gpu_mapping(struct kbase_context *kctx, - struct kbase_va_region *reg, - u64 new_pages, u64 old_pages); +static bool is_process_exiting(struct vm_area_struct *vma) +{ + /* PF_EXITING flag can't be reliably used here for the detection + * of process exit, as 'mm_users' counter could still be non-zero + * when all threads of the process have exited. Later when the + * thread (which took a reference on the 'mm' of process that + * exited) drops it reference, the vm_ops->close method would be + * called for all the vmas (owned by 'mm' of process that exited) + * but the PF_EXITING flag may not be neccessarily set for the + * thread at that time. + */ + if (atomic_read(&vma->vm_mm->mm_users)) + return false; + + return true; +} /* Retrieve the associated region pointer if the GPU address corresponds to * one of the event memory pages. 
The enclosing region, if found, shouldn't @@ -182,20 +192,12 @@ static int kbase_phy_alloc_mapping_init(struct kbase_context *kctx, reg->cpu_alloc->type != KBASE_MEM_TYPE_NATIVE) return -EINVAL; - if (size > (KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES - - atomic_read(&kctx->permanent_mapped_pages))) { - dev_warn(kctx->kbdev->dev, "Request for %llu more pages mem needing a permanent mapping would breach limit %lu, currently at %d pages", - (u64)size, - KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES, - atomic_read(&kctx->permanent_mapped_pages)); - return -ENOMEM; - } - kern_mapping = kzalloc(sizeof(*kern_mapping), GFP_KERNEL); if (!kern_mapping) return -ENOMEM; - err = kbase_vmap_phy_pages(kctx, reg, 0u, size_bytes, kern_mapping); + err = kbase_vmap_phy_pages(kctx, reg, 0u, size_bytes, kern_mapping, + KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING); if (err < 0) goto vmap_fail; @@ -203,7 +205,6 @@ static int kbase_phy_alloc_mapping_init(struct kbase_context *kctx, reg->flags &= ~KBASE_REG_GROWABLE; reg->cpu_alloc->permanent_map = kern_mapping; - atomic_add(size, &kctx->permanent_mapped_pages); return 0; vmap_fail: @@ -219,13 +220,6 @@ void kbase_phy_alloc_mapping_term(struct kbase_context *kctx, kfree(alloc->permanent_map); alloc->permanent_map = NULL; - - /* Mappings are only done on cpu_alloc, so don't need to worry about - * this being reduced a second time if a separate gpu_alloc is - * freed - */ - WARN_ON(alloc->nents > atomic_read(&kctx->permanent_mapped_pages)); - atomic_sub(alloc->nents, &kctx->permanent_mapped_pages); } void *kbase_phy_alloc_mapping_get(struct kbase_context *kctx, @@ -293,9 +287,8 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages u64 extension, u64 *flags, u64 *gpu_va, enum kbase_caller_mmu_sync_info mmu_sync_info) { - int zone; struct kbase_va_region *reg; - struct rb_root *rbtree; + enum kbase_memory_zone zone; struct device *dev; KBASE_DEBUG_ASSERT(kctx); @@ -365,32 +358,25 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages #endif /* find out which VA zone to use */ - if (*flags & BASE_MEM_SAME_VA) { - rbtree = &kctx->reg_rbtree_same; - zone = KBASE_REG_ZONE_SAME_VA; - } + if (*flags & BASE_MEM_SAME_VA) + zone = SAME_VA_ZONE; #if MALI_USE_CSF /* fixed va_zone always exists */ else if (*flags & (BASE_MEM_FIXED | BASE_MEM_FIXABLE)) { if (*flags & BASE_MEM_PROT_GPU_EX) { - rbtree = &kctx->reg_rbtree_exec_fixed; - zone = KBASE_REG_ZONE_EXEC_FIXED_VA; + zone = EXEC_FIXED_VA_ZONE; } else { - rbtree = &kctx->reg_rbtree_fixed; - zone = KBASE_REG_ZONE_FIXED_VA; + zone = FIXED_VA_ZONE; } } #endif else if ((*flags & BASE_MEM_PROT_GPU_EX) && kbase_has_exec_va_zone(kctx)) { - rbtree = &kctx->reg_rbtree_exec; - zone = KBASE_REG_ZONE_EXEC_VA; + zone = EXEC_VA_ZONE; } else { - rbtree = &kctx->reg_rbtree_custom; - zone = KBASE_REG_ZONE_CUSTOM_VA; + zone = CUSTOM_VA_ZONE; } - reg = kbase_alloc_free_region(rbtree, PFN_DOWN(*gpu_va), - va_pages, zone); + reg = kbase_ctx_alloc_free_region(kctx, zone, PFN_DOWN(*gpu_va), va_pages); if (!reg) { dev_err(dev, "Failed to allocate free region"); @@ -445,7 +431,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages } reg->initial_commit = commit_pages; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (reg->flags & KBASE_REG_PERMANENT_KERNEL_MAPPING) { /* Permanent kernel mappings must happen as soon as @@ -456,7 +442,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages int err = 
kbase_phy_alloc_mapping_init(kctx, reg, va_pages, commit_pages); if (err < 0) { - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_kern_mapping; } } @@ -468,7 +454,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages /* Bind to a cookie */ if (bitmap_empty(kctx->cookies, BITS_PER_LONG)) { dev_err(dev, "No cookies available for allocation!"); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_cookie; } /* return a cookie */ @@ -483,10 +469,28 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages *gpu_va = (u64) cookie; } else /* we control the VA */ { - if (kbase_gpu_mmap(kctx, reg, *gpu_va, va_pages, 1, + size_t align = 1; + + if (kctx->kbdev->pagesize_2mb) { + /* If there's enough (> 33 bits) of GPU VA space, align to 2MB + * boundaries. The similar condition is used for mapping from + * the SAME_VA zone inside kbase_context_get_unmapped_area(). + */ + if (kctx->kbdev->gpu_props.mmu.va_bits > 33) { + if (va_pages >= (SZ_2M / SZ_4K)) + align = (SZ_2M / SZ_4K); + } + if (*gpu_va) + align = 1; +#if !MALI_USE_CSF + if (reg->flags & KBASE_REG_TILER_ALIGN_TOP) + align = 1; +#endif /* !MALI_USE_CSF */ + } + if (kbase_gpu_mmap(kctx, reg, *gpu_va, va_pages, align, mmu_sync_info) != 0) { dev_warn(dev, "Failed to map memory on GPU"); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); goto no_mmap; } /* return real GPU VA */ @@ -504,7 +508,7 @@ struct kbase_va_region *kbase_mem_alloc(struct kbase_context *kctx, u64 va_pages } #endif /* MALI_JIT_PRESSURE_LIMIT_BASE */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); #if MALI_USE_CSF if (*flags & BASE_MEM_FIXABLE) @@ -623,8 +627,8 @@ int kbase_mem_query(struct kbase_context *kctx, #if MALI_USE_CSF if (KBASE_REG_CSF_EVENT & reg->flags) *out |= BASE_MEM_CSF_EVENT; - if (((KBASE_REG_ZONE_MASK & reg->flags) == KBASE_REG_ZONE_FIXED_VA) || - ((KBASE_REG_ZONE_MASK & reg->flags) == KBASE_REG_ZONE_EXEC_FIXED_VA)) { + if ((kbase_bits_to_zone(reg->flags) == FIXED_VA_ZONE) || + (kbase_bits_to_zone(reg->flags) == EXEC_FIXED_VA_ZONE)) { if (KBASE_REG_FIXED_ADDRESS & reg->flags) *out |= BASE_MEM_FIXED; else @@ -659,24 +663,33 @@ out_unlock: * @s: Shrinker * @sc: Shrinker control * - * Return: Number of pages which can be freed. + * Return: Number of pages which can be freed or SHRINK_EMPTY if no page remains. */ static unsigned long kbase_mem_evictable_reclaim_count_objects(struct shrinker *s, struct shrink_control *sc) { - struct kbase_context *kctx; + struct kbase_context *kctx = container_of(s, struct kbase_context, reclaim); + int evict_nents = atomic_read(&kctx->evict_nents); + unsigned long nr_freeable_items; - kctx = container_of(s, struct kbase_context, reclaim); - - WARN((sc->gfp_mask & __GFP_ATOMIC), - "Shrinkers cannot be called for GFP_ATOMIC allocations. Check kernel mm for problems. gfp_mask==%x\n", - sc->gfp_mask); WARN(in_atomic(), - "Shrinker called whilst in atomic context. The caller must switch to using GFP_ATOMIC or similar. gfp_mask==%x\n", + "Shrinker called in atomic context. The caller must use GFP_ATOMIC or similar, then Shrinkers must not be called. 
gfp_mask==%x\n", sc->gfp_mask); - return atomic_read(&kctx->evict_nents); + if (unlikely(evict_nents < 0)) { + dev_err(kctx->kbdev->dev, "invalid evict_nents(%d)", evict_nents); + nr_freeable_items = 0; + } else { + nr_freeable_items = evict_nents; + } + +#if KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE + if (nr_freeable_items == 0) + nr_freeable_items = SHRINK_EMPTY; +#endif + + return nr_freeable_items; } /** @@ -685,8 +698,8 @@ unsigned long kbase_mem_evictable_reclaim_count_objects(struct shrinker *s, * @s: Shrinker * @sc: Shrinker control * - * Return: Number of pages freed (can be less then requested) or -1 if the - * shrinker failed to free pages in its pool. + * Return: Number of pages freed (can be less then requested) or + * SHRINK_STOP if reclaim isn't possible. * * Note: * This function accesses region structures without taking the region lock, @@ -709,22 +722,27 @@ unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s, kctx = container_of(s, struct kbase_context, reclaim); +#if MALI_USE_CSF + if (!down_read_trylock(&kctx->kbdev->csf.pmode_sync_sem)) { + dev_warn(kctx->kbdev->dev, + "Can't shrink GPU memory when P.Mode entrance is in progress"); + return 0; + } +#endif mutex_lock(&kctx->jit_evict_lock); list_for_each_entry_safe(alloc, tmp, &kctx->evict_list, evict_node) { int err; + if (!alloc->reg) + continue; + err = kbase_mem_shrink_gpu_mapping(kctx, alloc->reg, 0, alloc->nents); - if (err != 0) { - /* - * Failed to remove GPU mapping, tell the shrinker - * to stop trying to shrink our slab even though we - * have pages in it. - */ - freed = -1; - goto out_unlock; - } + + /* Failed to remove GPU mapping, proceed to next one. */ + if (err != 0) + continue; /* * Update alloc->evicted before freeing the backing so the @@ -748,9 +766,11 @@ unsigned long kbase_mem_evictable_reclaim_scan_objects(struct shrinker *s, if (freed > sc->nr_to_scan) break; } -out_unlock: - mutex_unlock(&kctx->jit_evict_lock); + mutex_unlock(&kctx->jit_evict_lock); +#if MALI_USE_CSF + up_read(&kctx->kbdev->csf.pmode_sync_sem); +#endif return freed; } @@ -768,7 +788,11 @@ int kbase_mem_evictable_init(struct kbase_context *kctx) * struct shrinker does not define batch */ kctx->reclaim.batch = 0; - register_shrinker(&kctx->reclaim, "mali-mem-evictable"); +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(&kctx->reclaim); +#else + register_shrinker(&kctx->reclaim, "mali-mem"); +#endif return 0; } @@ -832,6 +856,9 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc) lockdep_assert_held(&kctx->reg_lock); + /* Memory is in the process of transitioning to the shrinker, and + * should ignore migration attempts + */ kbase_mem_shrink_cpu_mapping(kctx, gpu_alloc->reg, 0, gpu_alloc->nents); @@ -839,12 +866,17 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc) /* This allocation can't already be on a list. */ WARN_ON(!list_empty(&gpu_alloc->evict_node)); - /* - * Add the allocation to the eviction list, after this point the shrink + /* Add the allocation to the eviction list, after this point the shrink * can reclaim it. */ list_add(&gpu_alloc->evict_node, &kctx->evict_list); atomic_add(gpu_alloc->nents, &kctx->evict_nents); + + /* Indicate to page migration that the memory can be reclaimed by the shrinker. 
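
The reworked shrinker callbacks above follow the standard count/scan contract: the count callback reports how many pages could be freed (returning SHRINK_EMPTY on 4.19 and later kernels when nothing is cached), the scan callback returns the number of pages actually freed or SHRINK_STOP, and register_shrinker() gains a name argument from 6.0. A minimal kernel-style sketch of that contract, where my_cache_pages and the "my-cache" name are illustrative only and not part of the patch:

#include <linux/atomic.h>
#include <linux/shrinker.h>
#include <linux/version.h>

static atomic_t my_cache_pages;    /* illustrative counter of reclaimable pages */

static unsigned long my_count(struct shrinker *s, struct shrink_control *sc)
{
        unsigned long n = atomic_read(&my_cache_pages);

#if KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE
        /* Tell the VM there is nothing to scan rather than returning 0. */
        if (n == 0)
                return SHRINK_EMPTY;
#endif
        return n;
}

static unsigned long my_scan(struct shrinker *s, struct shrink_control *sc)
{
        /* Free up to sc->nr_to_scan pages here and return how many were
         * freed, or SHRINK_STOP when reclaim is not possible right now.
         */
        return SHRINK_STOP;
}

static struct shrinker my_shrinker = {
        .count_objects = my_count,
        .scan_objects = my_scan,
        .seeks = DEFAULT_SEEKS,
};

static int my_shrinker_init(void)
{
#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE
        return register_shrinker(&my_shrinker);
#else
        return register_shrinker(&my_shrinker, "my-cache");
#endif
}
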
+ */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(gpu_alloc, NOT_MOVABLE); + mutex_unlock(&kctx->jit_evict_lock); kbase_mem_evictable_mark_reclaim(gpu_alloc); @@ -896,6 +928,15 @@ bool kbase_mem_evictable_unmake(struct kbase_mem_phy_alloc *gpu_alloc) gpu_alloc->evicted, 0, mmu_sync_info); gpu_alloc->evicted = 0; + + /* Since the allocation is no longer evictable, and we ensure that + * it grows back to its pre-eviction size, we will consider the + * state of it to be ALLOCATED_MAPPED, as that is the only state + * in which a physical allocation could transition to NOT_MOVABLE + * from. + */ + if (kbase_is_page_migration_enabled()) + kbase_set_phy_alloc_page_status(gpu_alloc, ALLOCATED_MAPPED); } } @@ -941,13 +982,22 @@ int kbase_mem_flags_change(struct kbase_context *kctx, u64 gpu_addr, unsigned in /* now we can lock down the context, and find the region */ down_write(kbase_mem_get_process_mmap_lock()); - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* Validate the region */ reg = kbase_region_tracker_find_region_base_address(kctx, gpu_addr); if (kbase_is_region_invalid_or_free(reg)) goto out_unlock; + /* There is no use case to support MEM_FLAGS_CHANGE ioctl for allocations + * that have NO_USER_FREE flag set, to mark them as evictable/reclaimable. + * This would usually include JIT allocations, Tiler heap related allocations + * & GPU queue ringbuffer and none of them needs to be explicitly marked + * as evictable by Userspace. + */ + if (kbase_va_region_is_no_user_free(reg)) + goto out_unlock; + /* Is the region being transitioning between not needed and needed? */ prev_needed = (KBASE_REG_DONT_NEED & reg->flags) == KBASE_REG_DONT_NEED; new_needed = (BASE_MEM_DONT_NEED & flags) == BASE_MEM_DONT_NEED; @@ -1045,7 +1095,7 @@ int kbase_mem_flags_change(struct kbase_context *kctx, u64 gpu_addr, unsigned in reg->flags = new_flags; out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); up_write(kbase_mem_get_process_mmap_lock()); out: return ret; @@ -1101,19 +1151,7 @@ int kbase_mem_do_sync_imported(struct kbase_context *kctx, ret = 0; } #else - /* Though the below version check could be superfluous depending upon the version condition - * used for enabling KBASE_MEM_ION_SYNC_WORKAROUND, we still keep this check here to allow - * ease of modification for non-ION systems or systems where ION has been patched. 
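
With pre-4.6 kernels gone, the import sync path above collapses to the two-argument dma_buf CPU access calls. A small hedged sketch of that bracketing for an already-imported buffer, with error handling trimmed and the direction chosen for a buffer the GPU may also write:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>

static int cpu_touch_imported(struct dma_buf *buf)
{
        int err;

        /* Claim the buffer for CPU access. */
        err = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
        if (err)
                return err;

        /* ... CPU reads/writes of the buffer contents happen here ... */

        /* Hand ownership back to the device side. */
        return dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
}
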
- */ -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - dma_buf_end_cpu_access(dma_buf, - 0, dma_buf->size, - dir); - ret = 0; -#else - ret = dma_buf_end_cpu_access(dma_buf, - dir); -#endif + ret = dma_buf_end_cpu_access(dma_buf, dir); #endif /* KBASE_MEM_ION_SYNC_WORKAROUND */ break; case KBASE_SYNC_TO_CPU: @@ -1130,11 +1168,7 @@ int kbase_mem_do_sync_imported(struct kbase_context *kctx, ret = 0; } #else - ret = dma_buf_begin_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_buf->size, -#endif - dir); + ret = dma_buf_begin_cpu_access(dma_buf, dir); #endif /* KBASE_MEM_ION_SYNC_WORKAROUND */ break; } @@ -1281,11 +1315,11 @@ int kbase_mem_umm_map(struct kbase_context *kctx, gwt_mask = ~KBASE_REG_GPU_WR; #endif - err = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, - kbase_get_gpu_phy_pages(reg), - kbase_reg_current_backed_size(reg), - reg->flags & gwt_mask, kctx->as_nr, - alloc->group_id, mmu_sync_info); + err = kbase_mmu_insert_pages_skip_status_update(kctx->kbdev, &kctx->mmu, reg->start_pfn, + kbase_get_gpu_phy_pages(reg), + kbase_reg_current_backed_size(reg), + reg->flags & gwt_mask, kctx->as_nr, + alloc->group_id, mmu_sync_info, NULL); if (err) goto bad_insert; @@ -1298,11 +1332,11 @@ int kbase_mem_umm_map(struct kbase_context *kctx, * Assume alloc->nents is the number of actual pages in the * dma-buf memory. */ - err = kbase_mmu_insert_single_page( - kctx, reg->start_pfn + alloc->nents, - kctx->aliasing_sink_page, reg->nr_pages - alloc->nents, - (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, - KBASE_MEM_GROUP_SINK, mmu_sync_info); + err = kbase_mmu_insert_single_imported_page( + kctx, reg->start_pfn + alloc->nents, kctx->aliasing_sink_page, + reg->nr_pages - alloc->nents, + (reg->flags | KBASE_REG_GPU_RD) & ~KBASE_REG_GPU_WR, KBASE_MEM_GROUP_SINK, + mmu_sync_info); if (err) goto bad_pad_insert; } @@ -1310,11 +1344,8 @@ int kbase_mem_umm_map(struct kbase_context *kctx, return 0; bad_pad_insert: - kbase_mmu_teardown_pages(kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - alloc->nents, - kctx->as_nr); + kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, alloc->pages, + alloc->nents, alloc->nents, kctx->as_nr); bad_insert: kbase_mem_umm_unmap_attachment(kctx, alloc); bad_map_attachment: @@ -1342,11 +1373,9 @@ void kbase_mem_umm_unmap(struct kbase_context *kctx, if (!kbase_is_region_invalid_or_free(reg) && reg->gpu_alloc == alloc) { int err; - err = kbase_mmu_teardown_pages(kctx->kbdev, - &kctx->mmu, - reg->start_pfn, - reg->nr_pages, - kctx->as_nr); + err = kbase_mmu_teardown_imported_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn, + alloc->pages, reg->nr_pages, reg->nr_pages, + kctx->as_nr); WARN_ON(err); } @@ -1393,6 +1422,7 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, struct kbase_va_region *reg; struct dma_buf *dma_buf; struct dma_buf_attachment *dma_attachment; + enum kbase_memory_zone zone; bool shared_zone = false; bool need_sync = false; int group_id; @@ -1418,6 +1448,9 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, return NULL; } + if (!kbase_import_size_is_valid(kctx->kbdev, *va_pages)) + return NULL; + /* ignore SAME_VA */ *flags &= ~BASE_MEM_SAME_VA; @@ -1438,24 +1471,21 @@ static struct kbase_va_region *kbase_mem_from_umm(struct kbase_context *kctx, if (*flags & BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP) need_sync = true; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, 
KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* * 64-bit tasks require us to reserve VA on the CPU that we use * on the GPU. */ shared_zone = true; } -#endif if (shared_zone) { *flags |= BASE_MEM_NEED_MMAP; - reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, - 0, *va_pages, KBASE_REG_ZONE_SAME_VA); - } else { - reg = kbase_alloc_free_region(&kctx->reg_rbtree_custom, - 0, *va_pages, KBASE_REG_ZONE_CUSTOM_VA); - } + zone = SAME_VA_ZONE; + } else + zone = CUSTOM_VA_ZONE; + + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *va_pages); if (!reg) { dma_buf_detach(dma_buf, dma_attachment); @@ -1539,16 +1569,18 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( struct kbase_context *kctx, unsigned long address, unsigned long size, u64 *va_pages, u64 *flags) { - long i; + long i, dma_mapped_pages; struct kbase_va_region *reg; - struct rb_root *rbtree; long faulted_pages; - int zone = KBASE_REG_ZONE_CUSTOM_VA; + enum kbase_memory_zone zone = CUSTOM_VA_ZONE; bool shared_zone = false; u32 cache_line_alignment = kbase_get_cache_line_alignment(kctx->kbdev); struct kbase_alloc_import_user_buf *user_buf; struct page **pages = NULL; + struct tagged_addr *pa; + struct device *dev; int write; + enum dma_data_direction dma_dir; /* Flag supported only for dma-buf imported memory */ if (*flags & BASE_MEM_IMPORT_SYNC_ON_MAP_UNMAP) @@ -1585,31 +1617,29 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( /* 64-bit address range is the max */ goto bad_size; + if (!kbase_import_size_is_valid(kctx->kbdev, *va_pages)) + goto bad_size; + /* SAME_VA generally not supported with imported memory (no known use cases) */ *flags &= ~BASE_MEM_SAME_VA; if (*flags & BASE_MEM_IMPORT_SHARED) shared_zone = true; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* * 64-bit tasks require us to reserve VA on the CPU that we use * on the GPU. */ shared_zone = true; } -#endif if (shared_zone) { *flags |= BASE_MEM_NEED_MMAP; - zone = KBASE_REG_ZONE_SAME_VA; - rbtree = &kctx->reg_rbtree_same; - } else - rbtree = &kctx->reg_rbtree_custom; - - reg = kbase_alloc_free_region(rbtree, 0, *va_pages, zone); + zone = SAME_VA_ZONE; + } + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *va_pages); if (!reg) goto no_region; @@ -1634,11 +1664,7 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( user_buf->address = address; user_buf->nr_pages = *va_pages; user_buf->mm = current->mm; -#if KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE - atomic_inc(¤t->mm->mm_count); -#else - mmgrab(current->mm); -#endif + kbase_mem_mmgrab(); if (reg->gpu_alloc->properties & KBASE_MEM_PHY_ALLOC_LARGE) user_buf->pages = vmalloc(*va_pages * sizeof(struct page *)); else @@ -1663,19 +1689,9 @@ static struct kbase_va_region *kbase_mem_from_user_buffer( down_read(kbase_mem_get_process_mmap_lock()); write = reg->flags & (KBASE_REG_CPU_WR | KBASE_REG_GPU_WR); + dma_dir = write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE - faulted_pages = get_user_pages(current, current->mm, address, *va_pages, -#if KERNEL_VERSION(4, 4, 168) <= LINUX_VERSION_CODE && \ -KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE - write ? 
FOLL_WRITE : 0, pages, NULL); -#else - write, 0, pages, NULL); -#endif -#elif KERNEL_VERSION(4, 9, 0) > LINUX_VERSION_CODE - faulted_pages = get_user_pages(address, *va_pages, - write, 0, pages, NULL); -#elif KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE +#if KERNEL_VERSION(5, 9, 0) > LINUX_VERSION_CODE faulted_pages = get_user_pages(address, *va_pages, write ? FOLL_WRITE : 0, pages, NULL); #else @@ -1706,31 +1722,44 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE reg->gpu_alloc->nents = 0; reg->extension = 0; - if (pages) { - struct device *dev = kctx->kbdev->dev; - unsigned long local_size = user_buf->size; - unsigned long offset = user_buf->address & ~PAGE_MASK; - struct tagged_addr *pa = kbase_get_gpu_phy_pages(reg); + pa = kbase_get_gpu_phy_pages(reg); + dev = kctx->kbdev->dev; + if (pages) { /* Top bit signifies that this was pinned on import */ user_buf->current_mapping_usage_count |= PINNED_ON_IMPORT; + /* Manual CPU cache synchronization. + * + * The driver disables automatic CPU cache synchronization because the + * memory pages that enclose the imported region may also contain + * sub-regions which are not imported and that are allocated and used + * by the user process. This may be the case of memory at the beginning + * of the first page and at the end of the last page. Automatic CPU cache + * synchronization would force some operations on those memory allocations, + * unbeknown to the user process: in particular, a CPU cache invalidate + * upon unmapping would destroy the content of dirty CPU caches and cause + * the user process to lose CPU writes to the non-imported sub-regions. + * + * When the GPU claims ownership of the imported memory buffer, it shall + * commit CPU writes for the whole of all pages that enclose the imported + * region, otherwise the initial content of memory would be wrong. + */ for (i = 0; i < faulted_pages; i++) { dma_addr_t dma_addr; - unsigned long min; - - min = MIN(PAGE_SIZE - offset, local_size); - dma_addr = dma_map_page(dev, pages[i], - offset, min, - DMA_BIDIRECTIONAL); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_addr = dma_map_page(dev, pages[i], 0, PAGE_SIZE, dma_dir); +#else + dma_addr = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE, dma_dir, + DMA_ATTR_SKIP_CPU_SYNC); +#endif if (dma_mapping_error(dev, dma_addr)) goto unwind_dma_map; user_buf->dma_addrs[i] = dma_addr; pa[i] = as_tagged(page_to_phys(pages[i])); - local_size -= min; - offset = 0; + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); } reg->gpu_alloc->nents = faulted_pages; @@ -1739,13 +1768,29 @@ KERNEL_VERSION(4, 5, 0) > LINUX_VERSION_CODE return reg; unwind_dma_map: - while (i--) { - dma_unmap_page(kctx->kbdev->dev, - user_buf->dma_addrs[i], - PAGE_SIZE, DMA_BIDIRECTIONAL); + dma_mapped_pages = i; + /* Run the unmap loop in the same order as map loop, and perform again + * CPU cache synchronization to re-write the content of dirty CPU caches + * to memory. This precautionary measure is kept here to keep this code + * aligned with kbase_jd_user_buf_map() to allow for a potential refactor + * in the future. 
+ */ + for (i = 0; i < dma_mapped_pages; i++) { + dma_addr_t dma_addr = user_buf->dma_addrs[i]; + + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, dma_dir); +#if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) + dma_unmap_page(dev, dma_addr, PAGE_SIZE, dma_dir); +#else + dma_unmap_page_attrs(dev, dma_addr, PAGE_SIZE, dma_dir, DMA_ATTR_SKIP_CPU_SYNC); +#endif } fault_mismatch: if (pages) { + /* In this case, the region was not yet in the region tracker, + * and so there are no CPU mappings to remove before we unpin + * the page + */ for (i = 0; i < faulted_pages; i++) kbase_unpin_user_buf_page(pages[i]); } @@ -1758,7 +1803,6 @@ no_alloc_obj: no_region: bad_size: return NULL; - } @@ -1770,6 +1814,8 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, u64 gpu_va; size_t i; bool coherent; + uint64_t max_stride; + enum kbase_memory_zone zone; /* Calls to this function are inherently asynchronous, with respect to * MMU operations. @@ -1802,30 +1848,31 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, if (!nents) goto bad_nents; - if (nents > (U64_MAX / PAGE_SIZE) / stride) + max_stride = div64_u64(U64_MAX, nents); + + if (stride > max_stride) + goto bad_size; + + if ((nents * stride) > (U64_MAX / PAGE_SIZE)) /* 64-bit address range is the max */ goto bad_size; /* calculate the number of pages this alias will cover */ *num_pages = nents * stride; -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_alias_size_is_valid(kctx->kbdev, *num_pages)) + goto bad_size; + + if (!kbase_ctx_compat_mode(kctx)) { /* 64-bit tasks must MMAP anyway, but not expose this address to * clients */ + zone = SAME_VA_ZONE; *flags |= BASE_MEM_NEED_MMAP; - reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, 0, - *num_pages, - KBASE_REG_ZONE_SAME_VA); - } else { -#else - if (1) { -#endif - reg = kbase_alloc_free_region(&kctx->reg_rbtree_custom, - 0, *num_pages, - KBASE_REG_ZONE_CUSTOM_VA); - } + } else + zone = CUSTOM_VA_ZONE; + + reg = kbase_ctx_alloc_free_region(kctx, zone, 0, *num_pages); if (!reg) goto no_reg; @@ -1847,7 +1894,7 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, if (!reg->gpu_alloc->imported.alias.aliased) goto no_aliased_array; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* validate and add src handles */ for (i = 0; i < nents; i++) { @@ -1873,9 +1920,9 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, /* validate found region */ if (kbase_is_region_invalid_or_free(aliasing_reg)) goto bad_handle; /* Not found/already free */ - if (aliasing_reg->flags & KBASE_REG_DONT_NEED) + if (kbase_is_region_shrinkable(aliasing_reg)) goto bad_handle; /* Ephemeral region */ - if (aliasing_reg->flags & KBASE_REG_NO_USER_FREE) + if (kbase_va_region_is_no_user_free(aliasing_reg)) goto bad_handle; /* JIT regions can't be * aliased. 
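
The alias path above now rejects oversized requests in two steps: stride is first bounded by div64_u64(U64_MAX, nents), and only then is nents * stride compared against U64_MAX / PAGE_SIZE, so neither multiplication can wrap. A small runnable user-space sketch of the same check, assuming 4 KiB pages and a non-zero nents (the driver rejects nents == 0 earlier):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL   /* assumed 4 KiB pages */

/* Mirrors the two-step overflow check in kbase_mem_alias(); nents is
 * assumed non-zero, as the driver rejects nents == 0 before this point.
 */
static int alias_size_ok(uint64_t nents, uint64_t stride)
{
        uint64_t max_stride = UINT64_MAX / nents;   /* div64_u64(U64_MAX, nents) */

        if (stride > max_stride)
                return 0;   /* nents * stride itself would overflow */
        if (nents * stride > UINT64_MAX / PAGE_SIZE)
                return 0;   /* the byte count would overflow */
        return 1;
}

int main(void)
{
        printf("%d\n", alias_size_ok(4, 1024));                 /* 1: fits */
        printf("%d\n", alias_size_ok(1ULL << 32, 1ULL << 33));  /* 0: overflows */
        return 0;
}
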
NO_USER_FREE flag * covers the entire lifetime @@ -1930,8 +1977,7 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, } } -#if IS_ENABLED(CONFIG_64BIT) - if (!kbase_ctx_flag(kctx, KCTX_COMPAT)) { + if (!kbase_ctx_compat_mode(kctx)) { /* Bind to a cookie */ if (bitmap_empty(kctx->cookies, BITS_PER_LONG)) { dev_err(kctx->kbdev->dev, "No cookies available for allocation!"); @@ -1946,10 +1992,8 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, /* relocate to correct base */ gpu_va += PFN_DOWN(BASE_MEM_COOKIE_BASE); gpu_va <<= PAGE_SHIFT; - } else /* we control the VA */ { -#else - if (1) { -#endif + } else { + /* we control the VA */ if (kbase_gpu_mmap(kctx, reg, 0, *num_pages, 1, mmu_sync_info) != 0) { dev_warn(kctx->kbdev->dev, "Failed to map memory on GPU"); @@ -1962,20 +2006,18 @@ u64 kbase_mem_alias(struct kbase_context *kctx, u64 *flags, u64 stride, reg->flags &= ~KBASE_REG_FREE; reg->flags &= ~KBASE_REG_GROWABLE; - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return gpu_va; -#if IS_ENABLED(CONFIG_64BIT) no_cookie: -#endif no_mmap: bad_handle: /* Marking the source allocs as not being mapped on the GPU and putting * them is handled by putting reg's allocs, so no rollback of those * actions is done here. */ - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); no_aliased_array: invalid_flags: kbase_mem_phy_alloc_put(reg->cpu_alloc); @@ -2035,7 +2077,10 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, /* Remove COHERENT_SYSTEM flag if coherent mem is unavailable */ *flags &= ~BASE_MEM_COHERENT_SYSTEM; } - + if (((*flags & BASE_MEM_CACHED_CPU) == 0) && (type == BASE_MEM_IMPORT_TYPE_USER_BUFFER)) { + dev_warn(kctx->kbdev->dev, "USER_BUFFER must be CPU cached"); + goto bad_flags; + } if ((padding != 0) && (type != BASE_MEM_IMPORT_TYPE_UMM)) { dev_warn(kctx->kbdev->dev, "padding is only supported for UMM"); @@ -2083,7 +2128,7 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, if (!reg) goto no_reg; - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* mmap needed to setup VA? 
*/ if (*flags & (BASE_MEM_SAME_VA | BASE_MEM_NEED_MMAP)) { @@ -2118,13 +2163,13 @@ int kbase_mem_import(struct kbase_context *kctx, enum base_mem_import_type type, /* clear out private flags */ *flags &= ((1UL << BASE_MEM_FLAGS_NR_BITS) - 1); - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); return 0; no_gpu_va: no_cookie: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); kbase_mem_phy_alloc_put(reg->cpu_alloc); kbase_mem_phy_alloc_put(reg->gpu_alloc); kfree(reg); @@ -2149,11 +2194,9 @@ int kbase_mem_grow_gpu_mapping(struct kbase_context *kctx, /* Map the new pages into the GPU */ phy_pages = kbase_get_gpu_phy_pages(reg); - ret = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn + old_pages, - phy_pages + old_pages, delta, reg->flags, - kctx->as_nr, reg->gpu_alloc->group_id, - mmu_sync_info); + ret = kbase_mmu_insert_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + old_pages, + phy_pages + old_pages, delta, reg->flags, kctx->as_nr, + reg->gpu_alloc->group_id, mmu_sync_info, reg); return ret; } @@ -2168,33 +2211,21 @@ void kbase_mem_shrink_cpu_mapping(struct kbase_context *kctx, /* Nothing to do */ return; - unmap_mapping_range(kctx->filp->f_inode->i_mapping, + unmap_mapping_range(kctx->kfile->filp->f_inode->i_mapping, (gpu_va_start + new_pages)<<PAGE_SHIFT, (old_pages - new_pages)<<PAGE_SHIFT, 1); } -/** - * kbase_mem_shrink_gpu_mapping - Shrink the GPU mapping of an allocation - * @kctx: Context the region belongs to - * @reg: The GPU region or NULL if there isn't one - * @new_pages: The number of pages after the shrink - * @old_pages: The number of pages before the shrink - * - * Return: 0 on success, negative -errno on error - * - * Unmap the shrunk pages from the GPU mapping. Note that the size of the region - * itself is unmodified as we still need to reserve the VA, only the page tables - * will be modified by this function. 
- */ -static int kbase_mem_shrink_gpu_mapping(struct kbase_context *const kctx, - struct kbase_va_region *const reg, - u64 const new_pages, u64 const old_pages) +int kbase_mem_shrink_gpu_mapping(struct kbase_context *const kctx, + struct kbase_va_region *const reg, u64 const new_pages, + u64 const old_pages) { u64 delta = old_pages - new_pages; + struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; int ret = 0; - ret = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, - reg->start_pfn + new_pages, delta, kctx->as_nr); + ret = kbase_mmu_teardown_pages(kctx->kbdev, &kctx->mmu, reg->start_pfn + new_pages, + alloc->pages + new_pages, delta, delta, kctx->as_nr); return ret; } @@ -2221,7 +2252,7 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) } down_write(kbase_mem_get_process_mmap_lock()); - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); /* Validate the region */ reg = kbase_region_tracker_find_region_base_address(kctx, gpu_addr); @@ -2258,8 +2289,11 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) if (atomic_read(®->cpu_alloc->kernel_mappings) > 0) goto out_unlock; - /* can't grow regions which are ephemeral */ - if (reg->flags & KBASE_REG_DONT_NEED) + + if (kbase_is_region_shrinkable(reg)) + goto out_unlock; + + if (kbase_va_region_is_no_user_free(reg)) goto out_unlock; #ifdef CONFIG_MALI_MEMORY_FULLY_BACKED @@ -2322,7 +2356,7 @@ int kbase_mem_commit(struct kbase_context *kctx, u64 gpu_addr, u64 new_pages) } out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); if (read_locked) up_read(kbase_mem_get_process_mmap_lock()); else @@ -2350,6 +2384,21 @@ int kbase_mem_shrink(struct kbase_context *const kctx, return -EINVAL; delta = old_pages - new_pages; + if (kctx->kbdev->pagesize_2mb) { + struct tagged_addr *start_free = reg->gpu_alloc->pages + new_pages; + + /* Move the end of new committed range to a valid location. + * This mirrors the adjustment done inside kbase_free_phy_pages_helper(). + */ + while (delta && is_huge(*start_free) && !is_huge_head(*start_free)) { + start_free++; + new_pages++; + delta--; + } + + if (!delta) + return 0; + } /* Update the GPU mapping */ err = kbase_mem_shrink_gpu_mapping(kctx, reg, @@ -2362,18 +2411,6 @@ int kbase_mem_shrink(struct kbase_context *const kctx, kbase_free_phy_pages_helper(reg->cpu_alloc, delta); if (reg->cpu_alloc != reg->gpu_alloc) kbase_free_phy_pages_helper(reg->gpu_alloc, delta); -#ifdef CONFIG_MALI_2MB_ALLOC - if (kbase_reg_current_backed_size(reg) > new_pages) { - old_pages = new_pages; - new_pages = kbase_reg_current_backed_size(reg); - - /* Update GPU mapping. */ - err = kbase_mem_grow_gpu_mapping(kctx, reg, - new_pages, old_pages, CALLER_MMU_ASYNC); - } -#else - WARN_ON(kbase_reg_current_backed_size(reg) != new_pages); -#endif } return err; @@ -2404,55 +2441,27 @@ static void kbase_cpu_vm_close(struct vm_area_struct *vma) KBASE_DEBUG_ASSERT(map->kctx); KBASE_DEBUG_ASSERT(map->alloc); - kbase_gpu_vm_lock(map->kctx); + kbase_gpu_vm_lock_with_pmode_sync(map->kctx); if (map->free_on_close) { - KBASE_DEBUG_ASSERT((map->region->flags & KBASE_REG_ZONE_MASK) == - KBASE_REG_ZONE_SAME_VA); + KBASE_DEBUG_ASSERT(kbase_bits_to_zone(map->region->flags) == SAME_VA_ZONE); /* Avoid freeing memory on the process death which results in * GPU Page Fault. 
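
When the device uses 2MB pages, kbase_mem_shrink above nudges the new committed size forward past any 4K entries that are non-head parts of a huge page, so the shrink boundary never lands inside a 2MB allocation; if that consumes the whole delta the shrink becomes a no-op. A toy user-space model of that adjustment (a real huge page has 512 sub-pages, the array here is shortened for illustration):

#include <stdio.h>

enum pg { SMALL, HUGE_HEAD, HUGE_TAIL };

/* Model of the loop added to kbase_mem_shrink(): advance new_pages past
 * huge-page tail entries so the shrink boundary never splits a 2MB page.
 */
static unsigned long adjust_shrink(const enum pg *pages, unsigned long old_pages,
                                   unsigned long new_pages)
{
        unsigned long delta = old_pages - new_pages;

        while (delta && pages[new_pages] == HUGE_TAIL) {
                new_pages++;
                delta--;
        }
        return new_pages;
}

int main(void)
{
        /* A (shortened) huge page starts at index 0; a shrink lands inside it. */
        enum pg pages[8] = { HUGE_HEAD, HUGE_TAIL, HUGE_TAIL, HUGE_TAIL,
                             HUGE_TAIL, HUGE_TAIL, SMALL, SMALL };

        /* Request to keep 2 pages out of 8: the boundary is pushed to 6. */
        printf("new_pages=%lu\n", adjust_shrink(pages, 8, 2));
        return 0;
}
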
Memory will be freed in kbase_destroy_context */ - if (!(current->flags & PF_EXITING)) + if (!is_process_exiting(vma)) kbase_mem_free_region(map->kctx, map->region); } list_del(&map->mappings_list); kbase_va_region_alloc_put(map->kctx, map->region); - kbase_gpu_vm_unlock(map->kctx); + kbase_gpu_vm_unlock_with_pmode_sync(map->kctx); kbase_mem_phy_alloc_put(map->alloc); + kbase_file_dec_cpu_mapping_count(map->kctx->kfile); kfree(map); } -static int kbase_cpu_vm_split(struct vm_area_struct *vma, unsigned long addr) -{ - struct kbase_cpu_mapping *map = vma->vm_private_data; - - KBASE_DEBUG_ASSERT(map->kctx); - KBASE_DEBUG_ASSERT(map->count > 0); - - /* - * We should never have a map/munmap pairing on a kbase_context managed - * vma such that the munmap only unmaps a portion of the vma range. - * Should this arise, the kernel attempts to split the vma range to - * ensure that it only unmaps the requested region. To achieve this it - * attempts to split the containing vma split occurs, and this callback - * is reached. By returning -EINVAL here we inform the kernel that such - * splits are not supported so that it instead unmaps the entire region. - * Since this is indicative of a bug in the map/munmap code in the - * driver, we raise a WARN here to indicate that this invalid - * state has been reached. - */ - dev_warn(map->kctx->kbdev->dev, - "%s: vma region split requested: addr=%lx map->count=%d reg=%p reg->start_pfn=%llx reg->nr_pages=%zu", - __func__, addr, map->count, map->region, map->region->start_pfn, - map->region->nr_pages); - WARN_ON_ONCE(1); - - return -EINVAL; -} - static struct kbase_aliased *get_aliased_alloc(struct vm_area_struct *vma, struct kbase_va_region *reg, pgoff_t *start_off, @@ -2508,9 +2517,17 @@ static vm_fault_t kbase_cpu_vm_fault(struct vm_fault *vmf) KBASE_DEBUG_ASSERT(map->kctx); KBASE_DEBUG_ASSERT(map->alloc); + kbase_gpu_vm_lock(map->kctx); + + /* Reject faults for SAME_VA mapping of UMM allocations */ + if ((map->alloc->type == KBASE_MEM_TYPE_IMPORTED_UMM) && map->free_on_close) { + dev_warn(map->kctx->kbdev->dev, "Invalid CPU access to UMM memory for ctx %d_%d", + map->kctx->tgid, map->kctx->id); + goto exit; + } + map_start_pgoff = vma->vm_pgoff - map->region->start_pfn; - kbase_gpu_vm_lock(map->kctx); if (unlikely(map->region->cpu_alloc->type == KBASE_MEM_TYPE_ALIAS)) { struct kbase_aliased *aliased = get_aliased_alloc(vma, map->region, &map_start_pgoff, 1); @@ -2561,7 +2578,6 @@ exit: const struct vm_operations_struct kbase_vm_ops = { .open = kbase_cpu_vm_open, .close = kbase_cpu_vm_close, - .may_split = kbase_cpu_vm_split, .fault = kbase_cpu_vm_fault }; @@ -2626,9 +2642,9 @@ static int kbase_cpu_mmap(struct kbase_context *kctx, vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); } - if (!kaddr) { + if (!kaddr) vm_flags_set(vma, VM_PFNMAP); - } else { + else { WARN_ON(aligned_offset); /* MIXEDMAP so we can vfree the kaddr early and not track it after map time */ vm_flags_set(vma, VM_MIXEDMAP); @@ -2652,6 +2668,7 @@ static int kbase_cpu_mmap(struct kbase_context *kctx, map->alloc->properties |= KBASE_MEM_PHY_ALLOC_ACCESSED_CACHED; list_add(&map->mappings_list, &map->alloc->mappings); + kbase_file_inc_cpu_mapping_count(kctx->kfile); out: return err; @@ -2673,7 +2690,6 @@ static void kbase_free_unused_jit_allocations(struct kbase_context *kctx) while (kbase_jit_evict(kctx)) ; } -#endif static int kbase_mmu_dump_mmap(struct kbase_context *kctx, struct vm_area_struct *vma, @@ -2686,13 +2702,13 @@ static int kbase_mmu_dump_mmap(struct kbase_context *kctx, 
size_t size; int err = 0; + lockdep_assert_held(&kctx->reg_lock); + dev_dbg(kctx->kbdev->dev, "%s\n", __func__); size = (vma->vm_end - vma->vm_start); nr_pages = size >> PAGE_SHIFT; -#ifdef CONFIG_MALI_VECTOR_DUMP kbase_free_unused_jit_allocations(kctx); -#endif kaddr = kbase_mmu_dump(kctx, nr_pages); @@ -2701,8 +2717,7 @@ static int kbase_mmu_dump_mmap(struct kbase_context *kctx, goto out; } - new_reg = kbase_alloc_free_region(&kctx->reg_rbtree_same, 0, nr_pages, - KBASE_REG_ZONE_SAME_VA); + new_reg = kbase_ctx_alloc_free_region(kctx, SAME_VA_ZONE, 0, nr_pages); if (!new_reg) { err = -ENOMEM; WARN_ON(1); @@ -2740,7 +2755,7 @@ out_va_region: out: return err; } - +#endif void kbase_os_mem_map_lock(struct kbase_context *kctx) { @@ -2760,7 +2775,7 @@ static int kbasep_reg_mmap(struct kbase_context *kctx, size_t *nr_pages, size_t *aligned_offset) { - int cookie = vma->vm_pgoff - PFN_DOWN(BASE_MEM_COOKIE_BASE); + unsigned int cookie = vma->vm_pgoff - PFN_DOWN(BASE_MEM_COOKIE_BASE); struct kbase_va_region *reg; int err = 0; @@ -2801,7 +2816,6 @@ static int kbasep_reg_mmap(struct kbase_context *kctx, /* adjust down nr_pages to what we have physically */ *nr_pages = kbase_reg_current_backed_size(reg); - if (kbase_gpu_mmap(kctx, reg, vma->vm_start + *aligned_offset, reg->nr_pages, 1, mmu_sync_info) != 0) { dev_err(kctx->kbdev->dev, "%s:%d\n", __FILE__, __LINE__); @@ -2861,7 +2875,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, goto out; } - kbase_gpu_vm_lock(kctx); + kbase_gpu_vm_lock_with_pmode_sync(kctx); if (vma->vm_pgoff == PFN_DOWN(BASE_MEM_MAP_TRACKING_HANDLE)) { /* The non-mapped tracking helper page */ @@ -2881,6 +2895,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, err = -EINVAL; goto out_unlock; case PFN_DOWN(BASE_MEM_MMU_DUMP_HANDLE): +#if defined(CONFIG_MALI_VECTOR_DUMP) /* MMU dump */ err = kbase_mmu_dump_mmap(kctx, vma, ®, &kaddr); if (err != 0) @@ -2888,17 +2903,22 @@ int kbase_context_mmap(struct kbase_context *const kctx, /* free the region on munmap */ free_on_close = 1; break; +#else + /* Illegal handle for direct map */ + err = -EINVAL; + goto out_unlock; +#endif /* defined(CONFIG_MALI_VECTOR_DUMP) */ #if MALI_USE_CSF case PFN_DOWN(BASEP_MEM_CSF_USER_REG_PAGE_HANDLE): - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); err = kbase_csf_cpu_mmap_user_reg_page(kctx, vma); goto out; case PFN_DOWN(BASEP_MEM_CSF_USER_IO_PAGES_HANDLE) ... 
PFN_DOWN(BASE_MEM_COOKIE_BASE) - 1: { - kbase_gpu_vm_unlock(kctx); - mutex_lock(&kctx->csf.lock); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + rt_mutex_lock(&kctx->csf.lock); err = kbase_csf_cpu_mmap_user_io_pages(kctx, vma); - mutex_unlock(&kctx->csf.lock); + rt_mutex_unlock(&kctx->csf.lock); goto out; } #endif @@ -2975,7 +2995,7 @@ int kbase_context_mmap(struct kbase_context *const kctx, err = kbase_cpu_mmap(kctx, reg, vma, kaddr, nr_pages, aligned_offset, free_on_close); - +#if defined(CONFIG_MALI_VECTOR_DUMP) if (vma->vm_pgoff == PFN_DOWN(BASE_MEM_MMU_DUMP_HANDLE)) { /* MMU dump - userspace should now have a reference on * the pages, so we can now free the kernel mapping @@ -2994,9 +3014,9 @@ int kbase_context_mmap(struct kbase_context *const kctx, */ vma->vm_pgoff = PFN_DOWN(vma->vm_start); } - +#endif /* defined(CONFIG_MALI_VECTOR_DUMP) */ out_unlock: - kbase_gpu_vm_unlock(kctx); + kbase_gpu_vm_unlock_with_pmode_sync(kctx); out: if (err) dev_err(dev, "mmap failed %d\n", err); @@ -3036,9 +3056,108 @@ void kbase_sync_mem_regions(struct kbase_context *kctx, } } -static int kbase_vmap_phy_pages(struct kbase_context *kctx, - struct kbase_va_region *reg, u64 offset_bytes, size_t size, - struct kbase_vmap_struct *map) +/** + * kbase_vmap_phy_pages_migrate_count_increment - Increment VMAP count for + * array of physical pages + * + * @pages: Array of pages. + * @page_count: Number of pages. + * @flags: Region flags. + * + * This function is supposed to be called only if page migration support + * is enabled in the driver. + * + * The counter of kernel CPU mappings of the physical pages involved in a + * mapping operation is incremented by 1. Errors are handled by making pages + * not movable. Permanent kernel mappings will be marked as not movable, too. + */ +static void kbase_vmap_phy_pages_migrate_count_increment(struct tagged_addr *pages, + size_t page_count, unsigned long flags) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < page_count; i++) { + struct page *p = as_page(pages[i]); + struct kbase_page_metadata *page_md = kbase_page_private(p); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(pages[i]) || is_partial(pages[i])) + continue; + + spin_lock(&page_md->migrate_lock); + /* Mark permanent kernel mappings as NOT_MOVABLE because they're likely + * to stay mapped for a long time. However, keep on counting the number + * of mappings even for them: they don't represent an exception for the + * vmap_count. + * + * At the same time, errors need to be handled if a client tries to add + * too many mappings, hence a page may end up in the NOT_MOVABLE state + * anyway even if it's not a permanent kernel mapping. + */ + if (flags & KBASE_REG_PERMANENT_KERNEL_MAPPING) + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + if (page_md->vmap_count < U8_MAX) + page_md->vmap_count++; + else + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } +} + +/** + * kbase_vunmap_phy_pages_migrate_count_decrement - Decrement VMAP count for + * array of physical pages + * + * @pages: Array of pages. + * @page_count: Number of pages. + * + * This function is supposed to be called only if page migration support + * is enabled in the driver. + * + * The counter of kernel CPU mappings of the physical pages involved in a + * mapping operation is decremented by 1. 
Errors are handled by making pages + * not movable. + */ +static void kbase_vunmap_phy_pages_migrate_count_decrement(struct tagged_addr *pages, + size_t page_count) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < page_count; i++) { + struct page *p = as_page(pages[i]); + struct kbase_page_metadata *page_md = kbase_page_private(p); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(pages[i]) || is_partial(pages[i])) + continue; + + spin_lock(&page_md->migrate_lock); + /* Decrement the number of mappings for all kinds of pages, including + * pages which are NOT_MOVABLE (e.g. permanent kernel mappings). + * However, errors still need to be handled if a client tries to remove + * more mappings than created. + */ + if (page_md->vmap_count == 0) + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + else + page_md->vmap_count--; + spin_unlock(&page_md->migrate_lock); + } +} + +static int kbase_vmap_phy_pages(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 offset_bytes, size_t size, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags) { unsigned long page_index; unsigned int offset_in_page = offset_bytes & ~PAGE_MASK; @@ -3049,6 +3168,12 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, pgprot_t prot; size_t i; + if (WARN_ON(vmap_flags & ~KBASE_VMAP_INPUT_FLAGS)) + return -EINVAL; + + if (WARN_ON(kbase_is_region_invalid_or_free(reg))) + return -EINVAL; + if (!size || !map || !reg->cpu_alloc || !reg->gpu_alloc) return -EINVAL; @@ -3065,6 +3190,17 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, if (page_index + page_count > kbase_reg_current_backed_size(reg)) return -ENOMEM; + if ((vmap_flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) && + (page_count > (KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES - + atomic_read(&kctx->permanent_mapped_pages)))) { + dev_warn( + kctx->kbdev->dev, + "Request for %llu more pages mem needing a permanent mapping would breach limit %lu, currently at %d pages", + (u64)page_count, KBASE_PERMANENTLY_MAPPED_MEM_LIMIT_PAGES, + atomic_read(&kctx->permanent_mapped_pages)); + return -ENOMEM; + } + if (reg->flags & KBASE_REG_DONT_NEED) return -EINVAL; @@ -3091,6 +3227,13 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, */ cpu_addr = vmap(pages, page_count, VM_MAP, prot); + /* If page migration is enabled, increment the number of VMA mappings + * of all physical pages. In case of errors, e.g. too many mappings, + * make the page not movable to prevent trouble. 
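
The permanent-mapping limit check above, and the matching vunmap accounting further down, both size the mapping as PFN_UP(offset_in_page + size), i.e. the number of CPU pages the byte range actually touches. A small runnable example of that arithmetic, assuming 4 KiB pages:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096UL                      /* assumed 4 KiB pages */
#define PAGE_MASK  (~(PAGE_SIZE - 1))
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) / PAGE_SIZE)

int main(void)
{
        uint64_t offset_bytes = 0x3ff0;        /* 16 bytes before a page boundary */
        size_t size = 64;                      /* the range spills onto the next page */

        unsigned long offset_in_page = offset_bytes & ~PAGE_MASK;
        size_t page_count = PFN_UP(offset_in_page + size);

        /* 0xff0 + 64 bytes -> 2 pages are vmapped and accounted */
        printf("offset_in_page=%#lx page_count=%zu\n", offset_in_page, page_count);
        return 0;
}
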
+ */ + if (kbase_is_page_migration_enabled() && !kbase_mem_is_imported(reg->gpu_alloc->type)) + kbase_vmap_phy_pages_migrate_count_increment(page_array, page_count, reg->flags); + kfree(pages); if (!cpu_addr) @@ -3103,61 +3246,79 @@ static int kbase_vmap_phy_pages(struct kbase_context *kctx, map->gpu_pages = &kbase_get_gpu_phy_pages(reg)[page_index]; map->addr = (void *)((uintptr_t)cpu_addr + offset_in_page); map->size = size; - map->sync_needed = ((reg->flags & KBASE_REG_CPU_CACHED) != 0) && - !kbase_mem_is_imported(map->gpu_alloc->type); + map->flags = vmap_flags; + if ((reg->flags & KBASE_REG_CPU_CACHED) && !kbase_mem_is_imported(map->gpu_alloc->type)) + map->flags |= KBASE_VMAP_FLAG_SYNC_NEEDED; - if (map->sync_needed) + if (map->flags & KBASE_VMAP_FLAG_SYNC_NEEDED) kbase_sync_mem_regions(kctx, map, KBASE_SYNC_TO_CPU); + if (vmap_flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) + atomic_add(page_count, &kctx->permanent_mapped_pages); + kbase_mem_phy_alloc_kernel_mapped(reg->cpu_alloc); + return 0; } -void *kbase_vmap_prot(struct kbase_context *kctx, u64 gpu_addr, size_t size, - unsigned long prot_request, struct kbase_vmap_struct *map) +void *kbase_vmap_reg(struct kbase_context *kctx, struct kbase_va_region *reg, u64 gpu_addr, + size_t size, unsigned long prot_request, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags) { - struct kbase_va_region *reg; - void *addr = NULL; u64 offset_bytes; struct kbase_mem_phy_alloc *cpu_alloc; struct kbase_mem_phy_alloc *gpu_alloc; int err; - kbase_gpu_vm_lock(kctx); + lockdep_assert_held(&kctx->reg_lock); - reg = kbase_region_tracker_find_region_enclosing_address(kctx, - gpu_addr); - if (kbase_is_region_invalid_or_free(reg)) - goto out_unlock; + if (WARN_ON(kbase_is_region_invalid_or_free(reg))) + return NULL; /* check access permissions can be satisfied * Intended only for checking KBASE_REG_{CPU,GPU}_{RD,WR} */ if ((reg->flags & prot_request) != prot_request) - goto out_unlock; + return NULL; offset_bytes = gpu_addr - (reg->start_pfn << PAGE_SHIFT); cpu_alloc = kbase_mem_phy_alloc_get(reg->cpu_alloc); gpu_alloc = kbase_mem_phy_alloc_get(reg->gpu_alloc); - err = kbase_vmap_phy_pages(kctx, reg, offset_bytes, size, map); + err = kbase_vmap_phy_pages(kctx, reg, offset_bytes, size, map, vmap_flags); if (err < 0) goto fail_vmap_phy_pages; - addr = map->addr; - -out_unlock: - kbase_gpu_vm_unlock(kctx); - return addr; + return map->addr; fail_vmap_phy_pages: - kbase_gpu_vm_unlock(kctx); kbase_mem_phy_alloc_put(cpu_alloc); kbase_mem_phy_alloc_put(gpu_alloc); - return NULL; } +void *kbase_vmap_prot(struct kbase_context *kctx, u64 gpu_addr, size_t size, + unsigned long prot_request, struct kbase_vmap_struct *map) +{ + struct kbase_va_region *reg; + void *addr = NULL; + + kbase_gpu_vm_lock(kctx); + + reg = kbase_region_tracker_find_region_enclosing_address(kctx, gpu_addr); + if (kbase_is_region_invalid_or_free(reg)) + goto out_unlock; + + if (reg->gpu_alloc->type != KBASE_MEM_TYPE_NATIVE) + goto out_unlock; + + addr = kbase_vmap_reg(kctx, reg, gpu_addr, size, prot_request, map, 0u); + +out_unlock: + kbase_gpu_vm_unlock(kctx); + return addr; +} + void *kbase_vmap(struct kbase_context *kctx, u64 gpu_addr, size_t size, struct kbase_vmap_struct *map) { @@ -3178,16 +3339,34 @@ static void kbase_vunmap_phy_pages(struct kbase_context *kctx, vunmap(addr); - if (map->sync_needed) + /* If page migration is enabled, decrement the number of VMA mappings + * for all physical pages. Now is a good time to do it because references + * haven't been released yet. 
+ */ + if (kbase_is_page_migration_enabled() && !kbase_mem_is_imported(map->gpu_alloc->type)) { + const size_t page_count = PFN_UP(map->offset_in_page + map->size); + struct tagged_addr *pages_array = map->cpu_pages; + + kbase_vunmap_phy_pages_migrate_count_decrement(pages_array, page_count); + } + + if (map->flags & KBASE_VMAP_FLAG_SYNC_NEEDED) kbase_sync_mem_regions(kctx, map, KBASE_SYNC_TO_DEVICE); + if (map->flags & KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) { + size_t page_count = PFN_UP(map->offset_in_page + map->size); + + WARN_ON(page_count > atomic_read(&kctx->permanent_mapped_pages)); + atomic_sub(page_count, &kctx->permanent_mapped_pages); + } kbase_mem_phy_alloc_kernel_unmapped(map->cpu_alloc); + map->offset_in_page = 0; map->cpu_pages = NULL; map->gpu_pages = NULL; map->addr = NULL; map->size = 0; - map->sync_needed = false; + map->flags = 0; } void kbase_vunmap(struct kbase_context *kctx, struct kbase_vmap_struct *map) @@ -3200,11 +3379,14 @@ KBASE_EXPORT_TEST_API(kbase_vunmap); static void kbasep_add_mm_counter(struct mm_struct *mm, int member, long value) { -#if (KERNEL_VERSION(4, 19, 0) <= LINUX_VERSION_CODE) - /* To avoid the build breakage due to an unexported kernel symbol - * 'mm_trace_rss_stat' from later kernels, i.e. from V4.19.0 onwards, - * we inline here the equivalent of 'add_mm_counter()' from linux - * kernel V5.4.0~8. +#if (KERNEL_VERSION(6, 2, 0) <= LINUX_VERSION_CODE) + /* To avoid the build breakage due to the type change in rss_stat, + * we inline here the equivalent of 'add_mm_counter()' from linux kernel V6.2. + */ + percpu_counter_add(&mm->rss_stat[member], value); +#elif (KERNEL_VERSION(5, 5, 0) <= LINUX_VERSION_CODE) + /* To avoid the build breakage due to an unexported kernel symbol 'mm_trace_rss_stat', + * we inline here the equivalent of 'add_mm_counter()' from linux kernel V5.5. 
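
Taken together, the hunks above move the whole permanent-mapping budget into the vmap/vunmap pair: the limit is checked against kctx->permanent_mapped_pages before mapping, the counter is incremented only once the mapping exists, and the same page count is subtracted again on unmap. A minimal user-space model of that check/add/sub lifecycle; LIMIT_PAGES is illustrative and not the driver's constant, and in the driver the check and increment run with the region lock held so they cannot race:

#include <stdatomic.h>
#include <stdio.h>

#define LIMIT_PAGES 32768          /* illustrative limit, not the driver's value */

static atomic_int mapped_pages;    /* counterpart of kctx->permanent_mapped_pages */

/* Reserve page_count pages against the limit, mirroring the check made for
 * KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING before the mapping is created.
 */
static int reserve_pages(int page_count)
{
        if (page_count > LIMIT_PAGES - atomic_load(&mapped_pages))
                return -1;                     /* would breach the limit */
        atomic_fetch_add(&mapped_pages, page_count);
        return 0;
}

/* Counterpart of the subtraction done in kbase_vunmap_phy_pages(). */
static void release_pages(int page_count)
{
        atomic_fetch_sub(&mapped_pages, page_count);
}

int main(void)
{
        printf("reserve 1024: %d\n", reserve_pages(1024));   /* 0: fits */
        printf("reserve 32000: %d\n", reserve_pages(32000)); /* -1: over the limit */
        release_pages(1024);
        return 0;
}
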
*/ atomic_long_add(value, &mm->rss_stat.count[member]); #else @@ -3214,73 +3396,44 @@ static void kbasep_add_mm_counter(struct mm_struct *mm, int member, long value) void kbasep_os_process_page_usage_update(struct kbase_context *kctx, int pages) { - struct mm_struct *mm; + struct mm_struct *mm = kctx->process_mm; - rcu_read_lock(); - mm = rcu_dereference(kctx->process_mm); - if (mm) { - atomic_add(pages, &kctx->nonmapped_pages); -#ifdef SPLIT_RSS_COUNTING - kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); -#else - spin_lock(&mm->page_table_lock); - kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); - spin_unlock(&mm->page_table_lock); -#endif - } - rcu_read_unlock(); -} - -static void kbasep_os_process_page_usage_drain(struct kbase_context *kctx) -{ - int pages; - struct mm_struct *mm; - - spin_lock(&kctx->mm_update_lock); - mm = rcu_dereference_protected(kctx->process_mm, lockdep_is_held(&kctx->mm_update_lock)); - if (!mm) { - spin_unlock(&kctx->mm_update_lock); + if (unlikely(!mm)) return; - } - rcu_assign_pointer(kctx->process_mm, NULL); - spin_unlock(&kctx->mm_update_lock); - synchronize_rcu(); - - pages = atomic_xchg(&kctx->nonmapped_pages, 0); + atomic_add(pages, &kctx->nonmapped_pages); #ifdef SPLIT_RSS_COUNTING - kbasep_add_mm_counter(mm, MM_FILEPAGES, -pages); + kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); #else spin_lock(&mm->page_table_lock); - kbasep_add_mm_counter(mm, MM_FILEPAGES, -pages); + kbasep_add_mm_counter(mm, MM_FILEPAGES, pages); spin_unlock(&mm->page_table_lock); #endif } +static void kbase_special_vm_open(struct vm_area_struct *vma) +{ + struct kbase_context *kctx = vma->vm_private_data; + + kbase_file_inc_cpu_mapping_count(kctx->kfile); +} + static void kbase_special_vm_close(struct vm_area_struct *vma) { - struct kbase_context *kctx; + struct kbase_context *kctx = vma->vm_private_data; - kctx = vma->vm_private_data; - kbasep_os_process_page_usage_drain(kctx); + kbase_file_dec_cpu_mapping_count(kctx->kfile); } static const struct vm_operations_struct kbase_vm_special_ops = { + .open = kbase_special_vm_open, .close = kbase_special_vm_close, }; static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_struct *vma) { - /* check that this is the only tracking page */ - spin_lock(&kctx->mm_update_lock); - if (rcu_dereference_protected(kctx->process_mm, lockdep_is_held(&kctx->mm_update_lock))) { - spin_unlock(&kctx->mm_update_lock); - return -EFAULT; - } - - rcu_assign_pointer(kctx->process_mm, current->mm); - - spin_unlock(&kctx->mm_update_lock); + if (vma_pages(vma) != 1) + return -EINVAL; /* no real access */ vm_flags_clear(vma, VM_READ | VM_MAYREAD | VM_WRITE | VM_MAYWRITE | VM_EXEC | VM_MAYEXEC); @@ -3288,6 +3441,7 @@ static int kbase_tracking_page_setup(struct kbase_context *kctx, struct vm_area_ vma->vm_ops = &kbase_vm_special_ops; vma->vm_private_data = kctx; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; } @@ -3311,9 +3465,27 @@ static unsigned long get_queue_doorbell_pfn(struct kbase_device *kbdev, (u64)queue->doorbell_nr * CSF_HW_DOORBELL_PAGE_SIZE)); } +static int +#if (KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE || \ + KERNEL_VERSION(5, 11, 0) > LINUX_VERSION_CODE) +kbase_csf_user_io_pages_vm_mremap(struct vm_area_struct *vma) +#else +kbase_csf_user_io_pages_vm_mremap(struct vm_area_struct *vma, unsigned long flags) +#endif +{ + pr_debug("Unexpected call to mremap method for User IO pages mapping vma\n"); + return -EINVAL; +} + +static int kbase_csf_user_io_pages_vm_split(struct vm_area_struct *vma, unsigned long addr) +{ 
+ pr_debug("Unexpected call to split method for User IO pages mapping vma\n"); + return -EINVAL; +} + static void kbase_csf_user_io_pages_vm_open(struct vm_area_struct *vma) { - WARN(1, "Unexpected attempt to clone private vma\n"); + pr_debug("Unexpected call to the open method for User IO pages mapping vma\n"); vma->vm_private_data = NULL; } @@ -3324,12 +3496,16 @@ static void kbase_csf_user_io_pages_vm_close(struct vm_area_struct *vma) struct kbase_device *kbdev; int err; bool reset_prevented = false; + struct kbase_file *kfile; - if (WARN_ON(!queue)) + if (!queue) { + pr_debug("Close method called for the new User IO pages mapping vma\n"); return; + } kctx = queue->kctx; kbdev = kctx->kbdev; + kfile = kctx->kfile; err = kbase_reset_gpu_prevent_and_wait(kbdev); if (err) @@ -3340,15 +3516,16 @@ static void kbase_csf_user_io_pages_vm_close(struct vm_area_struct *vma) else reset_prevented = true; - mutex_lock(&kctx->csf.lock); - kbase_csf_queue_unbind(queue); - mutex_unlock(&kctx->csf.lock); + rt_mutex_lock(&kctx->csf.lock); + kbase_csf_queue_unbind(queue, is_process_exiting(vma)); + rt_mutex_unlock(&kctx->csf.lock); if (reset_prevented) kbase_reset_gpu_allow(kbdev); + kbase_file_dec_cpu_mapping_count(kfile); /* Now as the vma is closed, drop the reference on mali device file */ - fput(kctx->filp); + fput(kfile->filp); } #if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) @@ -3370,9 +3547,12 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) struct memory_group_manager_device *mgm_dev; /* Few sanity checks up front */ - if ((nr_pages != BASEP_QUEUE_NR_MMAP_USER_PAGES) || - (vma->vm_pgoff != queue->db_file_offset)) + if (!queue || (nr_pages != BASEP_QUEUE_NR_MMAP_USER_PAGES) || + (vma->vm_pgoff != queue->db_file_offset)) { + pr_warn("Unexpected CPU page fault on User IO pages mapping for process %s tgid %d pid %d\n", + current->comm, current->tgid, current->pid); return VM_FAULT_SIGBUS; + } kbdev = queue->kctx->kbdev; mgm_dev = kbdev->mgm_dev; @@ -3382,13 +3562,6 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) /* Always map the doorbell page as uncached */ doorbell_pgprot = pgprot_device(vma->vm_page_prot); -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - vma->vm_page_prot = doorbell_pgprot; - input_page_pgprot = doorbell_pgprot; - output_page_pgprot = doorbell_pgprot; -#else if (kbdev->system_coherency == COHERENCY_NONE) { input_page_pgprot = pgprot_writecombine(vma->vm_page_prot); output_page_pgprot = pgprot_writecombine(vma->vm_page_prot); @@ -3396,7 +3569,6 @@ static vm_fault_t kbase_csf_user_io_pages_vm_fault(struct vm_fault *vmf) input_page_pgprot = vma->vm_page_prot; output_page_pgprot = vma->vm_page_prot; } -#endif doorbell_cpu_addr = vma->vm_start; @@ -3435,6 +3607,12 @@ exit: static const struct vm_operations_struct kbase_csf_user_io_pages_vm_ops = { .open = kbase_csf_user_io_pages_vm_open, .close = kbase_csf_user_io_pages_vm_close, +#if KERNEL_VERSION(5, 11, 0) <= LINUX_VERSION_CODE + .may_split = kbase_csf_user_io_pages_vm_split, +#else + .split = kbase_csf_user_io_pages_vm_split, +#endif + .mremap = kbase_csf_user_io_pages_vm_mremap, .fault = kbase_csf_user_io_pages_vm_fault }; @@ -3500,6 +3678,7 @@ static int kbase_csf_cpu_mmap_user_io_pages(struct kbase_context *kctx, /* Also adjust the vm_pgoff */ vma->vm_pgoff = queue->db_file_offset; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; map_failed: 
@@ -3514,13 +3693,78 @@ map_failed: return err; } +/** + * kbase_csf_user_reg_vm_open - VMA open function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + * Note: + * This function isn't expected to be called. If called (i.e> mremap), + * set private_data as NULL to indicate to close() and fault() functions. + */ +static void kbase_csf_user_reg_vm_open(struct vm_area_struct *vma) +{ + pr_debug("Unexpected call to the open method for USER register mapping"); + vma->vm_private_data = NULL; +} + +/** + * kbase_csf_user_reg_vm_close - VMA close function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + */ static void kbase_csf_user_reg_vm_close(struct vm_area_struct *vma) { struct kbase_context *kctx = vma->vm_private_data; + struct kbase_device *kbdev; + struct kbase_file *kfile; + + if (unlikely(!kctx)) { + pr_debug("Close function called for the unexpected mapping"); + return; + } + + kbdev = kctx->kbdev; + kfile = kctx->kfile; + + if (unlikely(!kctx->csf.user_reg.vma)) + dev_warn(kbdev->dev, "user_reg VMA pointer unexpectedly NULL for ctx %d_%d", + kctx->tgid, kctx->id); + + mutex_lock(&kbdev->csf.reg_lock); + list_del_init(&kctx->csf.user_reg.link); + mutex_unlock(&kbdev->csf.reg_lock); - WARN_ON(!kctx->csf.user_reg_vma); + kctx->csf.user_reg.vma = NULL; - kctx->csf.user_reg_vma = NULL; + kbase_file_dec_cpu_mapping_count(kfile); + /* Now as the VMA is closed, drop the reference on mali device file */ + fput(kfile->filp); +} + +/** + * kbase_csf_user_reg_vm_mremap - VMA mremap function for the USER page + * + * @vma: Pointer to the struct containing information about + * the userspace mapping of USER page. + * + * Return: -EINVAL + * + * Note: + * User space must not attempt mremap on USER page mapping. + * This function will return an error to fail the attempt. 
+ */ +static int +#if ((KERNEL_VERSION(5, 13, 0) <= LINUX_VERSION_CODE) || \ + (KERNEL_VERSION(5, 11, 0) > LINUX_VERSION_CODE)) +kbase_csf_user_reg_vm_mremap(struct vm_area_struct *vma) +#else +kbase_csf_user_reg_vm_mremap(struct vm_area_struct *vma, unsigned long flags) +#endif +{ + pr_debug("Unexpected call to mremap method for USER page mapping vma\n"); + return -EINVAL; } #if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) @@ -3533,44 +3777,52 @@ static vm_fault_t kbase_csf_user_reg_vm_fault(struct vm_fault *vmf) struct vm_area_struct *vma = vmf->vma; #endif struct kbase_context *kctx = vma->vm_private_data; - struct kbase_device *kbdev = kctx->kbdev; - struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; - unsigned long pfn = PFN_DOWN(kbdev->reg_start + USER_BASE); + struct kbase_device *kbdev; + struct memory_group_manager_device *mgm_dev; + unsigned long pfn; size_t nr_pages = PFN_DOWN(vma->vm_end - vma->vm_start); vm_fault_t ret = VM_FAULT_SIGBUS; unsigned long flags; /* Few sanity checks up front */ - if (WARN_ON(nr_pages != 1) || - WARN_ON(vma != kctx->csf.user_reg_vma) || - WARN_ON(vma->vm_pgoff != - PFN_DOWN(BASEP_MEM_CSF_USER_REG_PAGE_HANDLE))) + + if (!kctx || (nr_pages != 1) || (vma != kctx->csf.user_reg.vma) || + (vma->vm_pgoff != kctx->csf.user_reg.file_offset)) { + pr_err("Unexpected CPU page fault on USER page mapping for process %s tgid %d pid %d\n", + current->comm, current->tgid, current->pid); return VM_FAULT_SIGBUS; + } + + kbdev = kctx->kbdev; + mgm_dev = kbdev->mgm_dev; + pfn = PFN_DOWN(kbdev->reg_start + USER_BASE); mutex_lock(&kbdev->csf.reg_lock); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - /* Don't map in the actual register page if GPU is powered down. - * Always map in the dummy page in no mali builds. + /* Dummy page will be mapped during GPU off. + * + * In no mail builds, always map in the dummy page. 
*/ -#if IS_ENABLED(CONFIG_MALI_NO_MALI) - pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.dummy_user_reg_page)); -#else - if (!kbdev->pm.backend.gpu_powered) - pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.dummy_user_reg_page)); -#endif + if (IS_ENABLED(CONFIG_MALI_NO_MALI) || !kbdev->pm.backend.gpu_powered) + pfn = PFN_DOWN(as_phys_addr_t(kbdev->csf.user_reg.dummy_page)); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + list_move_tail(&kctx->csf.user_reg.link, &kbdev->csf.user_reg.list); ret = mgm_dev->ops.mgm_vmf_insert_pfn_prot(mgm_dev, KBASE_MEM_GROUP_CSF_FW, vma, vma->vm_start, pfn, vma->vm_page_prot); + mutex_unlock(&kbdev->csf.reg_lock); return ret; } static const struct vm_operations_struct kbase_csf_user_reg_vm_ops = { + .open = kbase_csf_user_reg_vm_open, .close = kbase_csf_user_reg_vm_close, + .mremap = kbase_csf_user_reg_vm_mremap, .fault = kbase_csf_user_reg_vm_fault }; @@ -3578,9 +3830,10 @@ static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, struct vm_area_struct *vma) { size_t nr_pages = PFN_DOWN(vma->vm_end - vma->vm_start); + struct kbase_device *kbdev = kctx->kbdev; /* Few sanity checks */ - if (kctx->csf.user_reg_vma) + if (kctx->csf.user_reg.vma) return -EBUSY; if (nr_pages != 1) @@ -3599,11 +3852,25 @@ static int kbase_csf_cpu_mmap_user_reg_page(struct kbase_context *kctx, */ vm_flags_set(vma, VM_PFNMAP); - kctx->csf.user_reg_vma = vma; + kctx->csf.user_reg.vma = vma; + + mutex_lock(&kbdev->csf.reg_lock); + kctx->csf.user_reg.file_offset = kbdev->csf.user_reg.file_offset++; + mutex_unlock(&kbdev->csf.reg_lock); + /* Make VMA point to the special internal file, but don't drop the + * reference on mali device file (that would be done later when the + * VMA is closed). + */ + vma->vm_file = kctx->kbdev->csf.user_reg.filp; + get_file(vma->vm_file); + + /* Also adjust the vm_pgoff */ + vma->vm_pgoff = kctx->csf.user_reg.file_offset; vma->vm_ops = &kbase_csf_user_reg_vm_ops; vma->vm_private_data = kctx; + kbase_file_inc_cpu_mapping_count(kctx->kfile); return 0; } diff --git a/mali_kbase/mali_kbase_mem_linux.h b/mali_kbase/mali_kbase_mem_linux.h index 1f6877a..6dda44b 100644 --- a/mali_kbase/mali_kbase_mem_linux.h +++ b/mali_kbase/mali_kbase_mem_linux.h @@ -217,6 +217,26 @@ int kbase_mem_evictable_make(struct kbase_mem_phy_alloc *gpu_alloc); */ bool kbase_mem_evictable_unmake(struct kbase_mem_phy_alloc *alloc); +typedef unsigned int kbase_vmap_flag; + +/* Sync operations are needed on beginning and ending of access to kernel-mapped GPU memory. + * + * This is internal to the struct kbase_vmap_struct and should not be passed in by callers of + * kbase_vmap-related functions. + */ +#define KBASE_VMAP_FLAG_SYNC_NEEDED (((kbase_vmap_flag)1) << 0) + +/* Permanently mapped memory accounting (including enforcing limits) should be done on the + * kernel-mapped GPU memory. + * + * This should be used if the kernel mapping is going to live for a potentially long time, for + * example if it will persist after the caller has returned. 
+ */ +#define KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING (((kbase_vmap_flag)1) << 1) + +/* Set of flags that can be passed into kbase_vmap-related functions */ +#define KBASE_VMAP_INPUT_FLAGS (KBASE_VMAP_FLAG_PERMANENT_MAP_ACCOUNTING) + struct kbase_vmap_struct { off_t offset_in_page; struct kbase_mem_phy_alloc *cpu_alloc; @@ -225,9 +245,55 @@ struct kbase_vmap_struct { struct tagged_addr *gpu_pages; void *addr; size_t size; - bool sync_needed; + kbase_vmap_flag flags; }; +/** + * kbase_mem_shrink_gpu_mapping - Shrink the GPU mapping of an allocation + * @kctx: Context the region belongs to + * @reg: The GPU region or NULL if there isn't one + * @new_pages: The number of pages after the shrink + * @old_pages: The number of pages before the shrink + * + * Return: 0 on success, negative -errno on error + * + * Unmap the shrunk pages from the GPU mapping. Note that the size of the region + * itself is unmodified as we still need to reserve the VA, only the page tables + * will be modified by this function. + */ +int kbase_mem_shrink_gpu_mapping(struct kbase_context *kctx, struct kbase_va_region *reg, + u64 new_pages, u64 old_pages); + +/** + * kbase_vmap_reg - Map part of an existing region into the kernel safely, only if the requested + * access permissions are supported + * @kctx: Context @reg belongs to + * @reg: The GPU region to map part of + * @gpu_addr: Start address of VA range to map, which must be within @reg + * @size: Size of VA range, which when added to @gpu_addr must be within @reg + * @prot_request: Flags indicating how the caller will then access the memory + * @map: Structure to be given to kbase_vunmap() on freeing + * @vmap_flags: Flags of type kbase_vmap_flag + * + * Return: Kernel-accessible CPU pointer to the VA range, or NULL on error + * + * Variant of kbase_vmap_prot() that can be used given an existing region. + * + * The caller must satisfy one of the following for @reg: + * * It must have been obtained by finding it on the region tracker, and the region lock must not + * have been released in the mean time. + * * Or, it must have been refcounted with a call to kbase_va_region_alloc_get(), and the region + * lock is now held again. + * * Or, @reg has had NO_USER_FREE set at creation time or under the region lock, and the + * region lock is now held again. + * + * The acceptable @vmap_flags are those in %KBASE_VMAP_INPUT_FLAGS. + * + * Refer to kbase_vmap_prot() for more information on the operation of this function. 
+ */ +void *kbase_vmap_reg(struct kbase_context *kctx, struct kbase_va_region *reg, u64 gpu_addr, + size_t size, unsigned long prot_request, struct kbase_vmap_struct *map, + kbase_vmap_flag vmap_flags); /** * kbase_vmap_prot - Map a GPU VA range into the kernel safely, only if the @@ -439,18 +505,7 @@ u32 kbase_get_cache_line_alignment(struct kbase_device *kbdev); static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, pgprot_t pgprot) { - int err; - -#if ((KERNEL_VERSION(4, 4, 147) >= LINUX_VERSION_CODE) || \ - ((KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE) && \ - (KERNEL_VERSION(4, 5, 0) <= LINUX_VERSION_CODE))) - if (pgprot_val(pgprot) != pgprot_val(vma->vm_page_prot)) - return VM_FAULT_SIGBUS; - - err = vm_insert_pfn(vma, addr, pfn); -#else - err = vm_insert_pfn_prot(vma, addr, pfn, pgprot); -#endif + int err = vm_insert_pfn_prot(vma, addr, pfn, pgprot); if (unlikely(err == -ENOMEM)) return VM_FAULT_OOM; diff --git a/mali_kbase/mali_kbase_mem_migrate.c b/mali_kbase/mali_kbase_mem_migrate.c new file mode 100644 index 0000000..4c2cc0f --- /dev/null +++ b/mali_kbase/mali_kbase_mem_migrate.c @@ -0,0 +1,712 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * DOC: Base kernel page migration implementation. + */ +#include <linux/migrate.h> + +#include <mali_kbase.h> +#include <mali_kbase_mem_migrate.h> +#include <mmu/mali_kbase_mmu.h> + +/* Global integer used to determine if module parameter value has been + * provided and if page migration feature is enabled. + * Feature is disabled on all platforms by default. + */ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +/* If page migration support is explicitly compiled out, there should be no way to change + * this int. Its value is automatically 0 as a global. + */ +const int kbase_page_migration_enabled; +/* module_param is not called so this value cannot be changed at insmod when compiled + * without support for page migration. 
+ */ +#else +/* -1 as default, 0 when manually set as off and 1 when manually set as on */ +int kbase_page_migration_enabled = -1; +module_param(kbase_page_migration_enabled, int, 0444); +MODULE_PARM_DESC(kbase_page_migration_enabled, + "Explicitly enable or disable page migration with 1 or 0 respectively."); +#endif /* !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) */ + +KBASE_EXPORT_TEST_API(kbase_page_migration_enabled); + +bool kbase_is_page_migration_enabled(void) +{ + /* Handle uninitialised int case */ + if (kbase_page_migration_enabled < 0) + return false; + return IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) && kbase_page_migration_enabled; +} +KBASE_EXPORT_SYMBOL(kbase_is_page_migration_enabled); + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) +static const struct movable_operations movable_ops; +#endif + +bool kbase_alloc_page_metadata(struct kbase_device *kbdev, struct page *p, dma_addr_t dma_addr, + u8 group_id) +{ + struct kbase_page_metadata *page_md; + + /* A check for kbase_page_migration_enabled would help here too but it's already being + * checked in the only caller of this function. + */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return false; + + page_md = kzalloc(sizeof(struct kbase_page_metadata), GFP_KERNEL); + if (!page_md) + return false; + + SetPagePrivate(p); + set_page_private(p, (unsigned long)page_md); + page_md->dma_addr = dma_addr; + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)ALLOCATE_IN_PROGRESS); + page_md->vmap_count = 0; + page_md->group_id = group_id; + spin_lock_init(&page_md->migrate_lock); + + lock_page(p); +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(p, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + /* In some corner cases, the driver may attempt to allocate memory pages + * even before the device file is open and the mapping for address space + * operations is created. In that case, it is impossible to assign address + * space operations to memory pages: simply pretend that they are movable, + * even if they are not. + * + * The page will go through all state transitions but it will never be + * actually considered movable by the kernel. This is due to the fact that + * the page cannot be marked as NOT_MOVABLE upon creation, otherwise the + * memory pool will always refuse to add it to the pool and schedule + * a worker thread to free it later. + * + * Page metadata may seem redundant in this case, but they are not, + * because memory pools expect metadata to be present when page migration + * is enabled and because the pages may always return to memory pools and + * gain the movable property later on in their life cycle. 
+ */ + if (kbdev->mem_migrate.inode && kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(p, kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + unlock_page(p); + + return true; +} + +static void kbase_free_page_metadata(struct kbase_device *kbdev, struct page *p, u8 *group_id) +{ + struct device *const dev = kbdev->dev; + struct kbase_page_metadata *page_md; + dma_addr_t dma_addr; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + page_md = kbase_page_private(p); + if (!page_md) + return; + + if (group_id) + *group_id = page_md->group_id; + dma_addr = kbase_dma_addr(p); + dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + kfree(page_md); + set_page_private(p, 0); + ClearPagePrivate(p); +} + +#if IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +/* This function is only called when page migration + * support is not explicitly compiled out. + */ +static void kbase_free_pages_worker(struct work_struct *work) +{ + struct kbase_mem_migrate *mem_migrate = + container_of(work, struct kbase_mem_migrate, free_pages_work); + struct kbase_device *kbdev = container_of(mem_migrate, struct kbase_device, mem_migrate); + struct page *p, *tmp; + struct kbase_page_metadata *page_md; + LIST_HEAD(free_list); + + spin_lock(&mem_migrate->free_pages_lock); + list_splice_init(&mem_migrate->free_pages_list, &free_list); + spin_unlock(&mem_migrate->free_pages_lock); + list_for_each_entry_safe(p, tmp, &free_list, lru) { + u8 group_id = 0; + list_del_init(&p->lru); + + lock_page(p); + page_md = kbase_page_private(p); + if (page_md && IS_PAGE_MOVABLE(page_md->status)) { + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + } + unlock_page(p); + + kbase_free_page_metadata(kbdev, p, &group_id); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, group_id, p, 0); + } +} +#endif + +void kbase_free_page_later(struct kbase_device *kbdev, struct page *p) +{ + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + spin_lock(&mem_migrate->free_pages_lock); + list_add(&p->lru, &mem_migrate->free_pages_list); + spin_unlock(&mem_migrate->free_pages_lock); +} + +/** + * kbasep_migrate_page_pt_mapped - Migrate a memory page that is mapped + * in a PGD of kbase_mmu_table. + * + * @old_page: Existing PGD page to remove + * @new_page: Destination for migrating the existing PGD page to + * + * Replace an existing PGD page with a new page by migrating its content. More specifically: + * the new page shall replace the existing PGD page in the MMU page table. Before returning, + * the new page shall be set as movable and not isolated, while the old page shall lose + * the movable property. The meta data attached to the PGD page is transferred to the + * new (replacement) page. + * + * This function returns early with an error if called when not compiled with + * CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: 0 on migration success, or -EAGAIN for a later retry. Otherwise it's a failure + * and the migration is aborted. 
+ */ +static int kbasep_migrate_page_pt_mapped(struct page *old_page, struct page *new_page) +{ + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_context *kctx = page_md->data.pt_mapped.mmut->kctx; + struct kbase_device *kbdev = kctx->kbdev; + dma_addr_t old_dma_addr = page_md->dma_addr; + dma_addr_t new_dma_addr; + int ret; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + + /* Create a new dma map for the new page */ + new_dma_addr = dma_map_page(kbdev->dev, new_page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); + if (dma_mapping_error(kbdev->dev, new_dma_addr)) + return -ENOMEM; + + /* Lock context to protect access to the page in physical allocation. + * This blocks the CPU page fault handler from remapping pages. + * Only MCU's mmut is device wide, i.e. no corresponding kctx. + */ + kbase_gpu_vm_lock_with_pmode_sync(kctx); + + ret = kbase_mmu_migrate_page( + as_tagged(page_to_phys(old_page)), as_tagged(page_to_phys(new_page)), old_dma_addr, + new_dma_addr, PGD_VPFN_LEVEL_GET_LEVEL(page_md->data.pt_mapped.pgd_vpfn_level)); + + if (ret == 0) { + dma_unmap_page(kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + __ClearPageMovable(old_page); + ClearPagePrivate(old_page); + put_page(old_page); + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(new_page, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + if (kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(new_page, kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + SetPagePrivate(new_page); + get_page(new_page); + } else + dma_unmap_page(kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Page fault handler for CPU mapping unblocked. */ + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + return ret; +} + +/* + * kbasep_migrate_page_allocated_mapped - Migrate a memory page that is both + * allocated and mapped. + * + * @old_page: Page to remove. + * @new_page: Page to add. + * + * Replace an old page with a new page by migrating its content and all its + * CPU and GPU mappings. More specifically: the new page shall replace the + * old page in the MMU page table, as well as in the page array of the physical + * allocation, which is used to create CPU mappings. Before returning, the new + * page shall be set as movable and not isolated, while the old page shall lose + * the movable property. + * + * This function returns early with an error if called when not compiled with + * CONFIG_PAGE_MIGRATION_SUPPORT. + */ +static int kbasep_migrate_page_allocated_mapped(struct page *old_page, struct page *new_page) +{ + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_context *kctx = page_md->data.mapped.mmut->kctx; + dma_addr_t old_dma_addr, new_dma_addr; + int ret; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + old_dma_addr = page_md->dma_addr; + new_dma_addr = dma_map_page(kctx->kbdev->dev, new_page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); + if (dma_mapping_error(kctx->kbdev->dev, new_dma_addr)) + return -ENOMEM; + + /* Lock context to protect access to array of pages in physical allocation. + * This blocks the CPU page fault handler from remapping pages. + */ + kbase_gpu_vm_lock_with_pmode_sync(kctx); + + /* Unmap the old physical range. 
*/ + unmap_mapping_range(kctx->kfile->filp->f_inode->i_mapping, + page_md->data.mapped.vpfn << PAGE_SHIFT, + PAGE_SIZE, 1); + + ret = kbase_mmu_migrate_page(as_tagged(page_to_phys(old_page)), + as_tagged(page_to_phys(new_page)), old_dma_addr, new_dma_addr, + MIDGARD_MMU_BOTTOMLEVEL); + + if (ret == 0) { + dma_unmap_page(kctx->kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + SetPagePrivate(new_page); + get_page(new_page); + + /* Clear PG_movable from the old page and release reference. */ + ClearPagePrivate(old_page); + __ClearPageMovable(old_page); + put_page(old_page); + + /* Set PG_movable to the new page. */ +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) + __SetPageMovable(new_page, &movable_ops); + page_md->status = PAGE_MOVABLE_SET(page_md->status); +#else + if (kctx->kbdev->mem_migrate.inode->i_mapping) { + __SetPageMovable(new_page, kctx->kbdev->mem_migrate.inode->i_mapping); + page_md->status = PAGE_MOVABLE_SET(page_md->status); + } +#endif + } else + dma_unmap_page(kctx->kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Page fault handler for CPU mapping unblocked. */ + kbase_gpu_vm_unlock_with_pmode_sync(kctx); + + return ret; +} + +/** + * kbase_page_isolate - Isolate a page for migration. + * + * @p: Pointer of the page struct of page to isolate. + * @mode: LRU Isolation modes. + * + * Callback function for Linux to isolate a page and prepare it for migration. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: true on success, false otherwise. + */ +static bool kbase_page_isolate(struct page *p, isolate_mode_t mode) +{ + bool status_mem_pool = false; + struct kbase_mem_pool *mem_pool = NULL; + struct kbase_page_metadata *page_md = kbase_page_private(p); + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return false; + CSTD_UNUSED(mode); + + if (!page_md || !IS_PAGE_MOVABLE(page_md->status)) + return false; + + if (!spin_trylock(&page_md->migrate_lock)) + return false; + + if (WARN_ON(IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return false; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + /* Prepare to remove page from memory pool later only if pool is not + * in the process of termination. + */ + mem_pool = page_md->data.mem_pool.pool; + status_mem_pool = true; + preempt_disable(); + atomic_inc(&mem_pool->isolation_in_progress_cnt); + break; + case ALLOCATED_MAPPED: + /* Mark the page into isolated state, but only if it has no + * kernel CPU mappings + */ + if (page_md->vmap_count == 0) + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + break; + case PT_MAPPED: + /* Mark the page into isolated state. */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + break; + case SPILL_IN_PROGRESS: + case ALLOCATE_IN_PROGRESS: + case FREE_IN_PROGRESS: + break; + case NOT_MOVABLE: + /* Opportunistically clear the movable property for these pages */ + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + break; + default: + /* State should always fall in one of the previous cases! + * Also notice that FREE_ISOLATED_IN_PROGRESS or + * FREE_PT_ISOLATED_IN_PROGRESS is impossible because + * that state only applies to pages that are already isolated. + */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + } + + spin_unlock(&page_md->migrate_lock); + + /* If the page is still in the memory pool: try to remove it. 
This will fail + * if pool lock is taken which could mean page no longer exists in pool. + */ + if (status_mem_pool) { + if (!spin_trylock(&mem_pool->pool_lock)) { + atomic_dec(&mem_pool->isolation_in_progress_cnt); + preempt_enable(); + return false; + } + + spin_lock(&page_md->migrate_lock); + /* Check status again to ensure page has not been removed from memory pool. */ + if (PAGE_STATUS_GET(page_md->status) == MEM_POOL) { + page_md->status = PAGE_ISOLATE_SET(page_md->status, 1); + list_del_init(&p->lru); + mem_pool->cur_size--; + } + spin_unlock(&page_md->migrate_lock); + spin_unlock(&mem_pool->pool_lock); + atomic_dec(&mem_pool->isolation_in_progress_cnt); + preempt_enable(); + } + + return IS_PAGE_ISOLATED(page_md->status); +} + +/** + * kbase_page_migrate - Migrate content of old page to new page provided. + * + * @mapping: Pointer to address_space struct associated with pages. + * @new_page: Pointer to the page struct of new page. + * @old_page: Pointer to the page struct of old page. + * @mode: Mode to determine if migration will be synchronised. + * + * Callback function for Linux to migrate the content of the old page to the + * new page provided. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + * + * Return: 0 on success, error code otherwise. + */ +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +static int kbase_page_migrate(struct address_space *mapping, struct page *new_page, + struct page *old_page, enum migrate_mode mode) +#else +static int kbase_page_migrate(struct page *new_page, struct page *old_page, enum migrate_mode mode) +#endif +{ + int err = 0; + bool status_mem_pool = false; + bool status_free_pt_isolated_in_progress = false; + bool status_free_isolated_in_progress = false; + bool status_pt_mapped = false; + bool status_mapped = false; + bool status_not_movable = false; + struct kbase_page_metadata *page_md = kbase_page_private(old_page); + struct kbase_device *kbdev = NULL; + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + CSTD_UNUSED(mapping); +#endif + CSTD_UNUSED(mode); + + if (!kbase_is_page_migration_enabled() || !page_md || !IS_PAGE_MOVABLE(page_md->status)) + return -EINVAL; + + if (!spin_trylock(&page_md->migrate_lock)) + return -EAGAIN; + + if (WARN_ON(!IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return -EINVAL; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + status_mem_pool = true; + kbdev = page_md->data.mem_pool.kbdev; + break; + case ALLOCATED_MAPPED: + status_mapped = true; + break; + case PT_MAPPED: + status_pt_mapped = true; + break; + case FREE_ISOLATED_IN_PROGRESS: + status_free_isolated_in_progress = true; + kbdev = page_md->data.free_isolated.kbdev; + break; + case FREE_PT_ISOLATED_IN_PROGRESS: + status_free_pt_isolated_in_progress = true; + kbdev = page_md->data.free_pt_isolated.kbdev; + break; + case NOT_MOVABLE: + status_not_movable = true; + break; + default: + /* State should always fall in one of the previous cases! */ + err = -EAGAIN; + break; + } + + spin_unlock(&page_md->migrate_lock); + + if (status_mem_pool || status_free_isolated_in_progress || + status_free_pt_isolated_in_progress) { + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + kbase_free_page_metadata(kbdev, old_page, NULL); + __ClearPageMovable(old_page); + put_page(old_page); + + /* Just free new page to avoid lock contention. 
*/ + INIT_LIST_HEAD(&new_page->lru); + get_page(new_page); + set_page_private(new_page, 0); + kbase_free_page_later(kbdev, new_page); + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } else if (status_not_movable) { + err = -EINVAL; + } else if (status_mapped) { + err = kbasep_migrate_page_allocated_mapped(old_page, new_page); + } else if (status_pt_mapped) { + err = kbasep_migrate_page_pt_mapped(old_page, new_page); + } + + /* While we want to preserve the movability of pages for which we return + * EAGAIN, according to the kernel docs, movable pages for which a critical + * error is returned are called putback on, which may not be what we + * expect. + */ + if (err < 0 && err != -EAGAIN) { + __ClearPageMovable(old_page); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + } + + return err; +} + +/** + * kbase_page_putback - Return isolated page back to kbase. + * + * @p: Pointer of the page struct of page. + * + * Callback function for Linux to return isolated page back to kbase. This + * will only be called for a page that has been isolated but failed to + * migrate. This function will put back the given page to the state it was + * in before it was isolated. + * This callback is not registered if compiled without CONFIG_PAGE_MIGRATION_SUPPORT. + */ +static void kbase_page_putback(struct page *p) +{ + bool status_mem_pool = false; + bool status_free_isolated_in_progress = false; + bool status_free_pt_isolated_in_progress = false; + struct kbase_page_metadata *page_md = kbase_page_private(p); + struct kbase_device *kbdev = NULL; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + /* If we don't have page metadata, the page may not belong to the + * driver or may already have been freed, and there's nothing we can do + */ + if (!page_md) + return; + + spin_lock(&page_md->migrate_lock); + + if (WARN_ON(!IS_PAGE_ISOLATED(page_md->status))) { + spin_unlock(&page_md->migrate_lock); + return; + } + + switch (PAGE_STATUS_GET(page_md->status)) { + case MEM_POOL: + status_mem_pool = true; + kbdev = page_md->data.mem_pool.kbdev; + break; + case ALLOCATED_MAPPED: + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + case PT_MAPPED: + case NOT_MOVABLE: + /* Pages should no longer be isolated if they are in a stable state + * and used by the driver. + */ + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + break; + case FREE_ISOLATED_IN_PROGRESS: + status_free_isolated_in_progress = true; + kbdev = page_md->data.free_isolated.kbdev; + break; + case FREE_PT_ISOLATED_IN_PROGRESS: + status_free_pt_isolated_in_progress = true; + kbdev = page_md->data.free_pt_isolated.kbdev; + break; + default: + /* State should always fall in one of the previous cases! */ + break; + } + + spin_unlock(&page_md->migrate_lock); + + /* If page was in a memory pool then just free it to avoid lock contention. The + * same is also true to status_free_pt_isolated_in_progress. 
+ */ + if (status_mem_pool || status_free_isolated_in_progress || + status_free_pt_isolated_in_progress) { + __ClearPageMovable(p); + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + if (!WARN_ON_ONCE(!kbdev)) { + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + kbase_free_page_later(kbdev, p); + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } + } +} + +#if (KERNEL_VERSION(6, 0, 0) <= LINUX_VERSION_CODE) +static const struct movable_operations movable_ops = { + .isolate_page = kbase_page_isolate, + .migrate_page = kbase_page_migrate, + .putback_page = kbase_page_putback, +}; +#else +static const struct address_space_operations kbase_address_space_ops = { + .isolate_page = kbase_page_isolate, + .migratepage = kbase_page_migrate, + .putback_page = kbase_page_putback, +}; +#endif + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +void kbase_mem_migrate_set_address_space_ops(struct kbase_device *kbdev, struct file *const filp) +{ + if (!kbase_is_page_migration_enabled()) + return; + + mutex_lock(&kbdev->fw_load_lock); + + if (filp) { + filp->f_inode->i_mapping->a_ops = &kbase_address_space_ops; + + if (!kbdev->mem_migrate.inode) { + kbdev->mem_migrate.inode = filp->f_inode; + /* This reference count increment is balanced by iput() + * upon termination. + */ + atomic_inc(&filp->f_inode->i_count); + } else { + WARN_ON(kbdev->mem_migrate.inode != filp->f_inode); + } + } + + mutex_unlock(&kbdev->fw_load_lock); +} +#endif + +void kbase_mem_migrate_init(struct kbase_device *kbdev) +{ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) + /* Page migration explicitly disabled at compile time - do nothing */ + return; +#else + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + + /* Page migration support compiled in, either explicitly or + * by default, so the default behaviour is to follow the choice + * of large pages if not selected at insmod. Check insmod parameter + * integer for a negative value to see if insmod parameter was + * passed in at all (it will override the default negative value). + */ + if (kbase_page_migration_enabled < 0) + kbase_page_migration_enabled = kbdev->pagesize_2mb ? 1 : 0; + else + dev_info(kbdev->dev, "Page migration support explicitly %s at insmod.", + kbase_page_migration_enabled ? "enabled" : "disabled"); + + spin_lock_init(&mem_migrate->free_pages_lock); + INIT_LIST_HEAD(&mem_migrate->free_pages_list); + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + mem_migrate->inode = NULL; +#endif + mem_migrate->free_pages_workq = + alloc_workqueue("free_pages_workq", WQ_UNBOUND | WQ_MEM_RECLAIM, 1); + INIT_WORK(&mem_migrate->free_pages_work, kbase_free_pages_worker); +#endif +} + +void kbase_mem_migrate_term(struct kbase_device *kbdev) +{ + struct kbase_mem_migrate *mem_migrate = &kbdev->mem_migrate; + +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) + /* Page migration explicitly disabled at compile time - do nothing */ + return; +#endif + if (mem_migrate->free_pages_workq) + destroy_workqueue(mem_migrate->free_pages_workq); +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) + iput(mem_migrate->inode); +#endif +} diff --git a/mali_kbase/mali_kbase_mem_migrate.h b/mali_kbase/mali_kbase_mem_migrate.h new file mode 100644 index 0000000..e9f3fc4 --- /dev/null +++ b/mali_kbase/mali_kbase_mem_migrate.h @@ -0,0 +1,118 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022-2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +#ifndef _KBASE_MEM_MIGRATE_H +#define _KBASE_MEM_MIGRATE_H + +/** + * DOC: Base kernel page migration implementation. + */ + +#define PAGE_STATUS_MASK ((u8)0x3F) +#define PAGE_STATUS_GET(status) (status & PAGE_STATUS_MASK) +#define PAGE_STATUS_SET(status, value) ((status & ~PAGE_STATUS_MASK) | (value & PAGE_STATUS_MASK)) + +#define PAGE_ISOLATE_SHIFT (7) +#define PAGE_ISOLATE_MASK ((u8)1 << PAGE_ISOLATE_SHIFT) +#define PAGE_ISOLATE_SET(status, value) \ + ((status & ~PAGE_ISOLATE_MASK) | (value << PAGE_ISOLATE_SHIFT)) +#define IS_PAGE_ISOLATED(status) ((bool)(status & PAGE_ISOLATE_MASK)) + +#define PAGE_MOVABLE_SHIFT (6) +#define PAGE_MOVABLE_MASK ((u8)1 << PAGE_MOVABLE_SHIFT) +#define PAGE_MOVABLE_CLEAR(status) ((status) & ~PAGE_MOVABLE_MASK) +#define PAGE_MOVABLE_SET(status) (status | PAGE_MOVABLE_MASK) + +#define IS_PAGE_MOVABLE(status) ((bool)(status & PAGE_MOVABLE_MASK)) + +/* Global integer used to determine if module parameter value has been + * provided and if page migration feature is enabled. + */ +#if !IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT) +extern const int kbase_page_migration_enabled; +#else +extern int kbase_page_migration_enabled; +#endif + +/** + * kbase_alloc_page_metadata - Allocate and initialize page metadata + * @kbdev: Pointer to kbase device. + * @p: Page to assign metadata to. + * @dma_addr: DMA address mapped to paged. + * @group_id: Memory group ID associated with the entity that is + * allocating the page metadata. + * + * This will allocate memory for the page's metadata, initialize it and + * assign a reference to the page's private field. Importantly, once + * the metadata is set and ready this function will mark the page as + * movable. + * + * Return: true if successful or false otherwise. + */ +bool kbase_alloc_page_metadata(struct kbase_device *kbdev, struct page *p, dma_addr_t dma_addr, + u8 group_id); + +bool kbase_is_page_migration_enabled(void); + +/** + * kbase_free_page_later - Defer freeing of given page. + * @kbdev: Pointer to kbase device + * @p: Page to free + * + * This will add given page to a list of pages which will be freed at + * a later time. + */ +void kbase_free_page_later(struct kbase_device *kbdev, struct page *p); + +#if (KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE) +/* + * kbase_mem_migrate_set_address_space_ops - Set address space operations + * + * @kbdev: Pointer to object representing an instance of GPU platform device. + * @filp: Pointer to the struct file corresponding to device file + * /dev/malixx instance, passed to the file's open method. + * + * Assign address space operations to the given file struct @filp and + * add a reference to @kbdev. 
+ */ +void kbase_mem_migrate_set_address_space_ops(struct kbase_device *kbdev, struct file *const filp); +#endif + +/* + * kbase_mem_migrate_init - Initialise kbase page migration + * + * @kbdev: Pointer to kbase device + * + * Enables page migration by default based on GPU and setup work queue to + * defer freeing pages during page migration callbacks. + */ +void kbase_mem_migrate_init(struct kbase_device *kbdev); + +/* + * kbase_mem_migrate_term - Terminate kbase page migration + * + * @kbdev: Pointer to kbase device + * + * This will flush any work left to free pages from page migration + * and destroy workqueue associated. + */ +void kbase_mem_migrate_term(struct kbase_device *kbdev); + +#endif /* _KBASE_migrate_H */ diff --git a/mali_kbase/mali_kbase_mem_pool.c b/mali_kbase/mali_kbase_mem_pool.c index c991adf..d942ff5 100644 --- a/mali_kbase/mali_kbase_mem_pool.c +++ b/mali_kbase/mali_kbase_mem_pool.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,12 +21,18 @@ #include <mali_kbase.h> #include <linux/mm.h> +#include <linux/migrate.h> #include <linux/dma-mapping.h> #include <linux/highmem.h> #include <linux/spinlock.h> #include <linux/shrinker.h> #include <linux/atomic.h> #include <linux/version.h> +#if KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE +#include <linux/sched/signal.h> +#else +#include <linux/signal.h> +#endif #define pool_dbg(pool, format, ...) \ dev_dbg(pool->kbdev->dev, "%s-pool [%zu/%zu]: " format, \ @@ -70,6 +76,41 @@ static void kbase_mem_pool_ordered_add_array_spill_locked( struct tagged_addr *pages, struct list_head *spillover_list, bool zero, bool sync); +/** + * can_alloc_page() - Check if the current thread can allocate a physical page + * + * @pool: Pointer to the memory pool. + * @page_owner: Pointer to the task/process that created the Kbase context + * for which a page needs to be allocated. It can be NULL if + * the page won't be associated with Kbase context. + * + * This function checks if the current thread can make a request to kernel to + * allocate a physical page. If the process that created the context is exiting or + * is being killed, then there is no point in doing a page allocation. + * + * The check done by the function is particularly helpful when the system is running + * low on memory. When a page is allocated from the context of a kernel thread, OoM + * killer doesn't consider the kernel thread for killing and kernel keeps retrying + * to allocate the page as long as the OoM killer is able to kill processes. + * The check allows to quickly exit the page allocation loop once OoM + * killer has initiated the killing of @page_owner, thereby unblocking the context + * termination for @page_owner and freeing of GPU memory allocated by it. This helps + * in preventing the kernel panic and also limits the number of innocent processes + * that get killed. + * + * Return: true if the page can be allocated otherwise false. 
+ */ +static inline bool can_alloc_page(struct kbase_mem_pool *pool, struct task_struct *page_owner) +{ + if (page_owner && ((page_owner->flags & PF_EXITING) || fatal_signal_pending(page_owner))) { + dev_info(pool->kbdev->dev, "%s : Process %s/%d exiting", __func__, page_owner->comm, + task_pid_nr(page_owner)); + return false; + } + + return true; +} + static size_t kbase_mem_pool_capacity(struct kbase_mem_pool *pool) { ssize_t max_size = kbase_mem_pool_max_size(pool); @@ -88,9 +129,47 @@ static bool kbase_mem_pool_is_empty(struct kbase_mem_pool *pool) return kbase_mem_pool_size(pool) == 0; } +static bool set_pool_new_page_metadata(struct kbase_mem_pool *pool, struct page *p, + struct list_head *page_list, size_t *list_size) +{ + struct kbase_page_metadata *page_md = kbase_page_private(p); + bool not_movable = false; + + lockdep_assert_held(&pool->pool_lock); + + /* Free the page instead of adding it to the pool if it's not movable. + * Only update page status and add the page to the memory pool if + * it is not isolated. + */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + not_movable = true; + else { + spin_lock(&page_md->migrate_lock); + if (PAGE_STATUS_GET(page_md->status) == (u8)NOT_MOVABLE) { + not_movable = true; + } else if (!WARN_ON_ONCE(IS_PAGE_ISOLATED(page_md->status))) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)MEM_POOL); + page_md->data.mem_pool.pool = pool; + page_md->data.mem_pool.kbdev = pool->kbdev; + list_add(&p->lru, page_list); + (*list_size)++; + } + spin_unlock(&page_md->migrate_lock); + } + + if (not_movable) { + kbase_free_page_later(pool->kbdev, p); + pool_dbg(pool, "skipping a not movable page\n"); + } + + return not_movable; +} + static void kbase_mem_pool_add_locked(struct kbase_mem_pool *pool, struct page *p) { + bool queue_work_to_free = false; + if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_add_locked(pool, p); return; @@ -98,8 +177,19 @@ static void kbase_mem_pool_add_locked(struct kbase_mem_pool *pool, lockdep_assert_held(&pool->pool_lock); - list_add(&p->lru, &pool->page_list); - pool->cur_size++; + if (!pool->order && kbase_is_page_migration_enabled()) { + if (set_pool_new_page_metadata(pool, p, &pool->page_list, &pool->cur_size)) + queue_work_to_free = true; + } else { + list_add(&p->lru, &pool->page_list); + pool->cur_size++; + } + + if (queue_work_to_free) { + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } pool_dbg(pool, "added page\n"); } @@ -114,10 +204,28 @@ static void kbase_mem_pool_add(struct kbase_mem_pool *pool, struct page *p) static void kbase_mem_pool_add_list_locked(struct kbase_mem_pool *pool, struct list_head *page_list, size_t nr_pages) { + bool queue_work_to_free = false; + lockdep_assert_held(&pool->pool_lock); - list_splice(page_list, &pool->page_list); - pool->cur_size += nr_pages; + if (!pool->order && kbase_is_page_migration_enabled()) { + struct page *p, *tmp; + + list_for_each_entry_safe(p, tmp, page_list, lru) { + list_del_init(&p->lru); + if (set_pool_new_page_metadata(pool, p, &pool->page_list, &pool->cur_size)) + queue_work_to_free = true; + } + } else { + list_splice(page_list, &pool->page_list); + pool->cur_size += nr_pages; + } + + if (queue_work_to_free) { + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); + } pool_dbg(pool, "added %zu pages\n", nr_pages); } @@ -130,7 +238,8 @@ 
static void kbase_mem_pool_add_list(struct kbase_mem_pool *pool, kbase_mem_pool_unlock(pool); } -static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) +static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool, + enum kbase_page_status status) { struct page *p; @@ -140,6 +249,16 @@ static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) return NULL; p = list_first_entry(&pool->page_list, struct page, lru); + + if (!pool->order && kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = kbase_page_private(p); + + spin_lock(&page_md->migrate_lock); + WARN_ON(PAGE_STATUS_GET(page_md->status) != (u8)MEM_POOL); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)status); + spin_unlock(&page_md->migrate_lock); + } + list_del_init(&p->lru); pool->cur_size--; @@ -148,12 +267,13 @@ static struct page *kbase_mem_pool_remove_locked(struct kbase_mem_pool *pool) return p; } -static struct page *kbase_mem_pool_remove(struct kbase_mem_pool *pool) +static struct page *kbase_mem_pool_remove(struct kbase_mem_pool *pool, + enum kbase_page_status status) { struct page *p; kbase_mem_pool_lock(pool); - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, status); kbase_mem_pool_unlock(pool); return p; @@ -163,9 +283,9 @@ static void kbase_mem_pool_sync_page(struct kbase_mem_pool *pool, struct page *p) { struct device *dev = pool->kbdev->dev; + dma_addr_t dma_addr = pool->order ? kbase_dma_addr_as_priv(p) : kbase_dma_addr(p); - dma_sync_single_for_device(dev, kbase_dma_addr(p), - (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); + dma_sync_single_for_device(dev, dma_addr, (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); } static void kbase_mem_pool_zero_page(struct kbase_mem_pool *pool, @@ -196,7 +316,7 @@ static void kbase_mem_pool_spill(struct kbase_mem_pool *next_pool, struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) { struct page *p; - gfp_t gfp = GFP_HIGHUSER | __GFP_ZERO; + gfp_t gfp = __GFP_ZERO; struct kbase_device *const kbdev = pool->kbdev; struct device *const dev = kbdev->dev; dma_addr_t dma_addr; @@ -204,7 +324,9 @@ struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) /* don't warn on higher order failures */ if (pool->order) - gfp |= __GFP_NOWARN; + gfp |= GFP_HIGHUSER | __GFP_NOWARN; + else + gfp |= kbase_is_page_migration_enabled() ? 
GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER; p = kbdev->mgm_dev->ops.mgm_alloc_page(kbdev->mgm_dev, pool->group_id, gfp, pool->order); @@ -220,30 +342,59 @@ struct page *kbase_mem_alloc_page(struct kbase_mem_pool *pool) return NULL; } - WARN_ON(dma_addr != page_to_phys(p)); - for (i = 0; i < (1u << pool->order); i++) - kbase_set_dma_addr(p+i, dma_addr + PAGE_SIZE * i); + /* Setup page metadata for 4KB pages when page migration is enabled */ + if (!pool->order && kbase_is_page_migration_enabled()) { + INIT_LIST_HEAD(&p->lru); + if (!kbase_alloc_page_metadata(kbdev, p, dma_addr, pool->group_id)) { + dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, pool->group_id, p, + pool->order); + return NULL; + } + } else { + WARN_ON(dma_addr != page_to_phys(p)); + for (i = 0; i < (1u << pool->order); i++) + kbase_set_dma_addr_as_priv(p + i, dma_addr + PAGE_SIZE * i); + } return p; } -static void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, - struct page *p) +static void enqueue_free_pool_pages_work(struct kbase_mem_pool *pool) { - struct kbase_device *const kbdev = pool->kbdev; - struct device *const dev = kbdev->dev; - dma_addr_t dma_addr = kbase_dma_addr(p); - int i; + struct kbase_mem_migrate *mem_migrate = &pool->kbdev->mem_migrate; + + if (!pool->order && kbase_is_page_migration_enabled()) + queue_work(mem_migrate->free_pages_workq, &mem_migrate->free_pages_work); +} - dma_unmap_page(dev, dma_addr, (PAGE_SIZE << pool->order), - DMA_BIDIRECTIONAL); - for (i = 0; i < (1u << pool->order); i++) - kbase_clear_dma_addr(p+i); +void kbase_mem_pool_free_page(struct kbase_mem_pool *pool, struct page *p) +{ + struct kbase_device *kbdev; + + if (WARN_ON(!pool)) + return; + if (WARN_ON(!p)) + return; + + kbdev = pool->kbdev; - kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, - pool->group_id, p, pool->order); + if (!pool->order && kbase_is_page_migration_enabled()) { + kbase_free_page_later(kbdev, p); + pool_dbg(pool, "page to be freed to kernel later\n"); + } else { + int i; + dma_addr_t dma_addr = kbase_dma_addr_as_priv(p); + + for (i = 0; i < (1u << pool->order); i++) + kbase_clear_dma_addr_as_priv(p + i); + + dma_unmap_page(kbdev->dev, dma_addr, (PAGE_SIZE << pool->order), DMA_BIDIRECTIONAL); - pool_dbg(pool, "freed page to kernel\n"); + kbdev->mgm_dev->ops.mgm_free_page(kbdev->mgm_dev, pool->group_id, p, pool->order); + + pool_dbg(pool, "freed page to kernel\n"); + } } static size_t kbase_mem_pool_shrink_locked(struct kbase_mem_pool *pool, @@ -255,10 +406,13 @@ static size_t kbase_mem_pool_shrink_locked(struct kbase_mem_pool *pool, lockdep_assert_held(&pool->pool_lock); for (i = 0; i < nr_to_shrink && !kbase_mem_pool_is_empty(pool); i++) { - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, FREE_IN_PROGRESS); kbase_mem_pool_free_page(pool, p); } + /* Freeing of pages will be deferred when page migration is enabled. 
*/ + enqueue_free_pool_pages_work(pool); + return i; } @@ -274,8 +428,8 @@ static size_t kbase_mem_pool_shrink(struct kbase_mem_pool *pool, return nr_freed; } -int kbase_mem_pool_grow(struct kbase_mem_pool *pool, - size_t nr_to_grow) +int kbase_mem_pool_grow(struct kbase_mem_pool *pool, size_t nr_to_grow, + struct task_struct *page_owner) { struct page *p; size_t i; @@ -293,6 +447,9 @@ int kbase_mem_pool_grow(struct kbase_mem_pool *pool, } kbase_mem_pool_unlock(pool); + if (unlikely(!can_alloc_page(pool, page_owner))) + return -ENOMEM; + p = kbase_mem_alloc_page(pool); if (!p) { kbase_mem_pool_lock(pool); @@ -310,6 +467,7 @@ int kbase_mem_pool_grow(struct kbase_mem_pool *pool, return 0; } +KBASE_EXPORT_TEST_API(kbase_mem_pool_grow); void kbase_mem_pool_trim(struct kbase_mem_pool *pool, size_t new_size) { @@ -324,7 +482,7 @@ void kbase_mem_pool_trim(struct kbase_mem_pool *pool, size_t new_size) if (new_size < cur_size) kbase_mem_pool_shrink(pool, cur_size - new_size); else if (new_size > cur_size) - err = kbase_mem_pool_grow(pool, new_size - cur_size); + err = kbase_mem_pool_grow(pool, new_size - cur_size, NULL); if (err) { size_t grown_size = kbase_mem_pool_size(pool); @@ -365,6 +523,9 @@ static unsigned long kbase_mem_pool_reclaim_count_objects(struct shrinker *s, kbase_mem_pool_lock(pool); if (pool->dont_reclaim && !pool->dying) { kbase_mem_pool_unlock(pool); + /* Tell shrinker to skip reclaim + * even though freeable pages are available + */ return 0; } pool_size = kbase_mem_pool_size(pool); @@ -384,7 +545,10 @@ static unsigned long kbase_mem_pool_reclaim_scan_objects(struct shrinker *s, kbase_mem_pool_lock(pool); if (pool->dont_reclaim && !pool->dying) { kbase_mem_pool_unlock(pool); - return 0; + /* Tell shrinker that reclaim can't be made and + * do not attempt again for this reclaim context. 
+ */ + return SHRINK_STOP; } pool_dbg(pool, "reclaim scan %ld:\n", sc->nr_to_scan); @@ -398,12 +562,9 @@ static unsigned long kbase_mem_pool_reclaim_scan_objects(struct shrinker *s, return freed; } -int kbase_mem_pool_init(struct kbase_mem_pool *pool, - const struct kbase_mem_pool_config *config, - unsigned int order, - int group_id, - struct kbase_device *kbdev, - struct kbase_mem_pool *next_pool) +int kbase_mem_pool_init(struct kbase_mem_pool *pool, const struct kbase_mem_pool_config *config, + unsigned int order, int group_id, struct kbase_device *kbdev, + struct kbase_mem_pool *next_pool) { if (WARN_ON(group_id < 0) || WARN_ON(group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS)) { @@ -417,6 +578,7 @@ int kbase_mem_pool_init(struct kbase_mem_pool *pool, pool->kbdev = kbdev; pool->next_pool = next_pool; pool->dying = false; + atomic_set(&pool->isolation_in_progress_cnt, 0); spin_lock_init(&pool->pool_lock); INIT_LIST_HEAD(&pool->page_list); @@ -428,12 +590,17 @@ int kbase_mem_pool_init(struct kbase_mem_pool *pool, * struct shrinker does not define batch */ pool->reclaim.batch = 0; - register_shrinker(&pool->reclaim, "mali-mempool"); +#if KERNEL_VERSION(6, 0, 0) > LINUX_VERSION_CODE + register_shrinker(&pool->reclaim); +#else + register_shrinker(&pool->reclaim, "mali-mem-pool"); +#endif pool_dbg(pool, "initialized\n"); return 0; } +KBASE_EXPORT_TEST_API(kbase_mem_pool_init); void kbase_mem_pool_mark_dying(struct kbase_mem_pool *pool) { @@ -465,15 +632,17 @@ void kbase_mem_pool_term(struct kbase_mem_pool *pool) /* Zero pages first without holding the next_pool lock */ for (i = 0; i < nr_to_spill; i++) { - p = kbase_mem_pool_remove_locked(pool); - list_add(&p->lru, &spill_list); + p = kbase_mem_pool_remove_locked(pool, SPILL_IN_PROGRESS); + if (p) + list_add(&p->lru, &spill_list); } } while (!kbase_mem_pool_is_empty(pool)) { /* Free remaining pages to kernel */ - p = kbase_mem_pool_remove_locked(pool); - list_add(&p->lru, &free_list); + p = kbase_mem_pool_remove_locked(pool, FREE_IN_PROGRESS); + if (p) + list_add(&p->lru, &free_list); } kbase_mem_pool_unlock(pool); @@ -506,8 +675,19 @@ void kbase_mem_pool_term(struct kbase_mem_pool *pool) kbase_mem_pool_free_page(pool, p); } + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); + + /* Before returning wait to make sure there are no pages undergoing page isolation + * which will require reference to this pool. 
+ */ + if (kbase_is_page_migration_enabled()) { + while (atomic_read(&pool->isolation_in_progress_cnt)) + cpu_relax(); + } pool_dbg(pool, "terminated\n"); } +KBASE_EXPORT_TEST_API(kbase_mem_pool_term); struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) { @@ -515,7 +695,7 @@ struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) do { pool_dbg(pool, "alloc()\n"); - p = kbase_mem_pool_remove(pool); + p = kbase_mem_pool_remove(pool, ALLOCATE_IN_PROGRESS); if (p) return p; @@ -528,17 +708,10 @@ struct page *kbase_mem_pool_alloc(struct kbase_mem_pool *pool) struct page *kbase_mem_pool_alloc_locked(struct kbase_mem_pool *pool) { - struct page *p; - lockdep_assert_held(&pool->pool_lock); pool_dbg(pool, "alloc_locked()\n"); - p = kbase_mem_pool_remove_locked(pool); - - if (p) - return p; - - return NULL; + return kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); } void kbase_mem_pool_free(struct kbase_mem_pool *pool, struct page *p, @@ -565,6 +738,8 @@ void kbase_mem_pool_free(struct kbase_mem_pool *pool, struct page *p, } else { /* Free page */ kbase_mem_pool_free_page(pool, p); + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); } } @@ -589,11 +764,14 @@ void kbase_mem_pool_free_locked(struct kbase_mem_pool *pool, struct page *p, } else { /* Free page */ kbase_mem_pool_free_page(pool, p); + /* Freeing of pages will be deferred when page migration is enabled. */ + enqueue_free_pool_pages_work(pool); } } int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, - struct tagged_addr *pages, bool partial_allowed) + struct tagged_addr *pages, bool partial_allowed, + struct task_struct *page_owner) { struct page *p; size_t nr_from_pool; @@ -612,10 +790,12 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, /* Get pages from this pool */ kbase_mem_pool_lock(pool); nr_from_pool = min(nr_pages_internal, kbase_mem_pool_size(pool)); + while (nr_from_pool--) { int j; - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); + if (pool->order) { pages[i++] = as_tagged_tag(page_to_phys(p), HUGE_HEAD | HUGE_PAGE); @@ -631,8 +811,8 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, if (i != nr_4k_pages && pool->next_pool) { /* Allocate via next pool */ - err = kbase_mem_pool_alloc_pages(pool->next_pool, - nr_4k_pages - i, pages + i, partial_allowed); + err = kbase_mem_pool_alloc_pages(pool->next_pool, nr_4k_pages - i, pages + i, + partial_allowed, page_owner); if (err < 0) goto err_rollback; @@ -641,6 +821,9 @@ int kbase_mem_pool_alloc_pages(struct kbase_mem_pool *pool, size_t nr_4k_pages, } else { /* Get any remaining pages from kernel */ while (i != nr_4k_pages) { + if (unlikely(!can_alloc_page(pool, page_owner))) + goto err_rollback; + p = kbase_mem_alloc_page(pool); if (!p) { if (partial_allowed) @@ -674,6 +857,9 @@ done: err_rollback: kbase_mem_pool_free_pages(pool, i, pages, NOT_DIRTY, NOT_RECLAIMED); + dev_warn(pool->kbdev->dev, + "Failed allocation request for remaining %zu pages after obtaining %zu pages already.\n", + nr_4k_pages, i); return err; } @@ -703,7 +889,7 @@ int kbase_mem_pool_alloc_pages_locked(struct kbase_mem_pool *pool, for (i = 0; i < nr_pages_internal; i++) { int j; - p = kbase_mem_pool_remove_locked(pool); + p = kbase_mem_pool_remove_locked(pool, ALLOCATE_IN_PROGRESS); if (pool->order) { *pages++ = as_tagged_tag(page_to_phys(p), HUGE_HEAD | HUGE_PAGE); @@ -810,6 +996,7 
@@ void kbase_mem_pool_free_pages(struct kbase_mem_pool *pool, size_t nr_pages, size_t nr_to_pool; LIST_HEAD(to_pool_list); size_t i = 0; + bool pages_released = false; if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_free_pages(pool, nr_pages, pages, dirty, @@ -848,13 +1035,17 @@ void kbase_mem_pool_free_pages(struct kbase_mem_pool *pool, size_t nr_pages, pages[i] = as_tagged(0); continue; } - p = as_page(pages[i]); kbase_mem_pool_free_page(pool, p); pages[i] = as_tagged(0); + pages_released = true; } + /* Freeing of pages will be deferred when page migration is enabled. */ + if (pages_released) + enqueue_free_pool_pages_work(pool); + pool_dbg(pool, "free_pages(%zu) done\n", nr_pages); } @@ -867,6 +1058,7 @@ void kbase_mem_pool_free_pages_locked(struct kbase_mem_pool *pool, size_t nr_to_pool; LIST_HEAD(to_pool_list); size_t i = 0; + bool pages_released = false; if (mali_kbase_mem_pool_order_pages_enabled) { kbase_mem_pool_ordered_free_pages_locked(pool, nr_pages, pages, @@ -903,8 +1095,13 @@ void kbase_mem_pool_free_pages_locked(struct kbase_mem_pool *pool, kbase_mem_pool_free_page(pool, p); pages[i] = as_tagged(0); + pages_released = true; } + /* Freeing of pages will be deferred when page migration is enabled. */ + if (pages_released) + enqueue_free_pool_pages_work(pool); + pool_dbg(pool, "free_pages_locked(%zu) done\n", nr_pages); } diff --git a/mali_kbase/mali_kbase_mem_pool_debugfs.c b/mali_kbase/mali_kbase_mem_pool_debugfs.c index cfb43b0..3b1b2ba 100644 --- a/mali_kbase/mali_kbase_mem_pool_debugfs.c +++ b/mali_kbase/mali_kbase_mem_pool_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -168,13 +168,7 @@ static const struct file_operations kbase_mem_pool_debugfs_max_size_fops = { void kbase_mem_pool_debugfs_init(struct dentry *parent, struct kbase_context *kctx) { - /* prevent unprivileged use of debug file in old kernel version */ -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif debugfs_create_file("mem_pool_size", mode, parent, &kctx->mem_pools.small, &kbase_mem_pool_debugfs_fops); diff --git a/mali_kbase/mali_kbase_mem_pool_group.c b/mali_kbase/mali_kbase_mem_pool_group.c index 8d7bb4d..49c4b04 100644 --- a/mali_kbase/mali_kbase_mem_pool_group.c +++ b/mali_kbase/mali_kbase_mem_pool_group.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -43,29 +43,22 @@ void kbase_mem_pool_group_config_set_max_size( } } -int kbase_mem_pool_group_init( - struct kbase_mem_pool_group *const mem_pools, - struct kbase_device *const kbdev, - const struct kbase_mem_pool_group_config *const configs, - struct kbase_mem_pool_group *next_pools) +int kbase_mem_pool_group_init(struct kbase_mem_pool_group *const mem_pools, + struct kbase_device *const kbdev, + const struct kbase_mem_pool_group_config *const configs, + struct kbase_mem_pool_group *next_pools) { int gid, err = 0; for (gid = 0; gid < MEMORY_GROUP_MANAGER_NR_GROUPS; ++gid) { - err = kbase_mem_pool_init(&mem_pools->small[gid], - &configs->small[gid], - KBASE_MEM_POOL_4KB_PAGE_TABLE_ORDER, - gid, - kbdev, - next_pools ? &next_pools->small[gid] : NULL); + err = kbase_mem_pool_init(&mem_pools->small[gid], &configs->small[gid], + KBASE_MEM_POOL_4KB_PAGE_TABLE_ORDER, gid, kbdev, + next_pools ? &next_pools->small[gid] : NULL); if (!err) { - err = kbase_mem_pool_init(&mem_pools->large[gid], - &configs->large[gid], - KBASE_MEM_POOL_2MB_PAGE_TABLE_ORDER, - gid, - kbdev, - next_pools ? &next_pools->large[gid] : NULL); + err = kbase_mem_pool_init(&mem_pools->large[gid], &configs->large[gid], + KBASE_MEM_POOL_2MB_PAGE_TABLE_ORDER, gid, kbdev, + next_pools ? &next_pools->large[gid] : NULL); if (err) kbase_mem_pool_term(&mem_pools->small[gid]); } diff --git a/mali_kbase/mali_kbase_mem_pool_group.h b/mali_kbase/mali_kbase_mem_pool_group.h index c50ffdb..fe8ce77 100644 --- a/mali_kbase/mali_kbase_mem_pool_group.h +++ b/mali_kbase/mali_kbase_mem_pool_group.h @@ -49,8 +49,8 @@ static inline struct kbase_mem_pool *kbase_mem_pool_group_select( } /** - * kbase_mem_pool_group_config_init - Set the initial configuration for a - * set of memory pools + * kbase_mem_pool_group_config_set_max_size - Set the initial configuration for + * a set of memory pools * * @configs: Initial configuration for the set of memory pools * @max_size: Maximum number of free 4 KiB pages each pool can hold @@ -86,13 +86,12 @@ void kbase_mem_pool_group_config_set_max_size( * * Return: 0 on success, otherwise a negative error code */ -int kbase_mem_pool_group_init(struct kbase_mem_pool_group *mem_pools, - struct kbase_device *kbdev, - const struct kbase_mem_pool_group_config *configs, - struct kbase_mem_pool_group *next_pools); +int kbase_mem_pool_group_init(struct kbase_mem_pool_group *mem_pools, struct kbase_device *kbdev, + const struct kbase_mem_pool_group_config *configs, + struct kbase_mem_pool_group *next_pools); /** - * kbase_mem_pool_group_term - Mark a set of memory pools as dying + * kbase_mem_pool_group_mark_dying - Mark a set of memory pools as dying * * @mem_pools: Set of memory pools to mark * diff --git a/mali_kbase/mali_kbase_mem_profile_debugfs.c b/mali_kbase/mali_kbase_mem_profile_debugfs.c index 92ab1b8..9317023 100644 --- a/mali_kbase/mali_kbase_mem_profile_debugfs.c +++ b/mali_kbase/mali_kbase_mem_profile_debugfs.c @@ -69,11 +69,7 @@ static const struct file_operations kbasep_mem_profile_debugfs_fops = { int kbasep_mem_profile_debugfs_insert(struct kbase_context *kctx, char *data, size_t size) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) const mode_t mode = 0444; -#else - const mode_t mode = 0400; -#endif int err = 0; mutex_lock(&kctx->mem_profile_lock); diff --git a/mali_kbase/mali_kbase_native_mgm.c b/mali_kbase/mali_kbase_native_mgm.c index 
4554bee..10a7f50 100644 --- a/mali_kbase/mali_kbase_native_mgm.c +++ b/mali_kbase/mali_kbase_native_mgm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -140,6 +140,30 @@ kbase_native_mgm_update_gpu_pte(struct memory_group_manager_device *mgm_dev, return pte; } +/** + * kbase_native_mgm_pte_to_original_pte - Native method to undo changes done in + * kbase_native_mgm_update_gpu_pte() + * + * @mgm_dev: The memory group manager the request is being made through. + * @group_id: A physical memory group ID, which must be valid but is not used. + * Its valid range is 0 .. MEMORY_GROUP_MANAGER_NR_GROUPS-1. + * @mmu_level: The level of the MMU page table where the page is getting mapped. + * @pte: The prepared page table entry. + * + * This function simply returns the @pte without modification. + * + * Return: A GPU page table entry to be stored in a page table. + */ +static u64 kbase_native_mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev, + int group_id, int mmu_level, u64 pte) +{ + CSTD_UNUSED(mgm_dev); + CSTD_UNUSED(group_id); + CSTD_UNUSED(mmu_level); + + return pte; +} + struct memory_group_manager_device kbase_native_mgm_dev = { .ops = { .mgm_alloc_page = kbase_native_mgm_alloc, @@ -147,6 +171,7 @@ struct memory_group_manager_device kbase_native_mgm_dev = { .mgm_get_import_memory_id = NULL, .mgm_vmf_insert_pfn_prot = kbase_native_mgm_vmf_insert_pfn_prot, .mgm_update_gpu_pte = kbase_native_mgm_update_gpu_pte, + .mgm_pte_to_original_pte = kbase_native_mgm_pte_to_original_pte, }, .data = NULL }; diff --git a/mali_kbase/mali_kbase_pbha.c b/mali_kbase/mali_kbase_pbha.c index 90406b2..b446bd5 100644 --- a/mali_kbase/mali_kbase_pbha.c +++ b/mali_kbase/mali_kbase_pbha.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,10 @@ #include <device/mali_kbase_device.h> #include <mali_kbase.h> + +#if MALI_USE_CSF #define DTB_SET_SIZE 2 +#endif static bool read_setting_valid(unsigned int id, unsigned int read_setting) { @@ -209,31 +212,36 @@ void kbase_pbha_write_settings(struct kbase_device *kbdev) } } -int kbase_pbha_read_dtb(struct kbase_device *kbdev) +#if MALI_USE_CSF +static int kbase_pbha_read_int_id_override_property(struct kbase_device *kbdev, + const struct device_node *pbha_node) { u32 dtb_data[SYSC_ALLOC_COUNT * sizeof(u32) * DTB_SET_SIZE]; - const struct device_node *pbha_node; int sz, i; bool valid = true; - if (!kbasep_pbha_supported(kbdev)) - return 0; + sz = of_property_count_elems_of_size(pbha_node, "int-id-override", sizeof(u32)); - pbha_node = of_get_child_by_name(kbdev->dev->of_node, "pbha"); - if (!pbha_node) + if (sz == -EINVAL) { + /* There is no int-id-override field. Fallback to int_id_override instead */ + sz = of_property_count_elems_of_size(pbha_node, "int_id_override", sizeof(u32)); + } + if (sz == -EINVAL) { + /* There is no int_id_override field. This is valid - but there's nothing further + * to do here. 
+ */ return 0; - - sz = of_property_count_elems_of_size(pbha_node, "int_id_override", - sizeof(u32)); + } if (sz <= 0 || (sz % DTB_SET_SIZE != 0)) { dev_err(kbdev->dev, "Bad DTB format: pbha.int_id_override\n"); return -EINVAL; } - if (of_property_read_u32_array(pbha_node, "int_id_override", dtb_data, - sz) != 0) { - dev_err(kbdev->dev, - "Failed to read DTB pbha.int_id_override\n"); - return -EINVAL; + if (of_property_read_u32_array(pbha_node, "int-id-override", dtb_data, sz) != 0) { + /* There may be no int-id-override field. Fallback to int_id_override instead */ + if (of_property_read_u32_array(pbha_node, "int_id_override", dtb_data, sz) != 0) { + dev_err(kbdev->dev, "Failed to read DTB pbha.int_id_override\n"); + return -EINVAL; + } } for (i = 0; valid && i < sz; i = i + DTB_SET_SIZE) { @@ -256,3 +264,66 @@ int kbase_pbha_read_dtb(struct kbase_device *kbdev) } return 0; } + +static int kbase_pbha_read_propagate_bits_property(struct kbase_device *kbdev, + const struct device_node *pbha_node) +{ + u32 bits = 0; + int err; + + if (!kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) + return 0; + + err = of_property_read_u32(pbha_node, "propagate-bits", &bits); + + if (err == -EINVAL) { + err = of_property_read_u32(pbha_node, "propagate_bits", &bits); + } + + if (err < 0) { + if (err != -EINVAL) { + dev_err(kbdev->dev, + "DTB value for propagate_bits is improperly formed (err=%d)\n", + err); + return err; + } else { + /* Property does not exist */ + kbdev->pbha_propagate_bits = 0; + return 0; + } + } + + if (bits > (L2_CONFIG_PBHA_HWU_MASK >> L2_CONFIG_PBHA_HWU_SHIFT)) { + dev_err(kbdev->dev, "Bad DTB value for propagate_bits: 0x%x\n", bits); + return -EINVAL; + } + + kbdev->pbha_propagate_bits = bits; + return 0; +} +#endif /* MALI_USE_CSF */ + +int kbase_pbha_read_dtb(struct kbase_device *kbdev) +{ +#if MALI_USE_CSF + const struct device_node *pbha_node; + int err; + + if (!kbasep_pbha_supported(kbdev)) + return 0; + + pbha_node = of_get_child_by_name(kbdev->dev->of_node, "pbha"); + if (!pbha_node) + return 0; + + err = kbase_pbha_read_int_id_override_property(kbdev, pbha_node); + + if (err < 0) + return err; + + err = kbase_pbha_read_propagate_bits_property(kbdev, pbha_node); + return err; +#else + return 0; +#endif +} diff --git a/mali_kbase/mali_kbase_pbha_debugfs.c b/mali_kbase/mali_kbase_pbha_debugfs.c index 47eab63..1cc29c7 100644 --- a/mali_kbase/mali_kbase_pbha_debugfs.c +++ b/mali_kbase/mali_kbase_pbha_debugfs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,13 +20,15 @@ */ #include "mali_kbase_pbha_debugfs.h" - #include "mali_kbase_pbha.h" - #include <device/mali_kbase_device.h> #include <mali_kbase_reset_gpu.h> #include <mali_kbase.h> +#if MALI_USE_CSF +#include "backend/gpu/mali_kbase_pm_internal.h" +#endif + static int int_id_overrides_show(struct seq_file *sfile, void *data) { struct kbase_device *kbdev = sfile->private; @@ -108,6 +110,90 @@ static int int_id_overrides_open(struct inode *in, struct file *file) return single_open(file, int_id_overrides_show, in->i_private); } +#if MALI_USE_CSF +/** + * propagate_bits_show - Read PBHA bits from L2_CONFIG out to debugfs. + * + * @sfile: The debugfs entry. + * @data: Data associated with the entry. 
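(Sketch only, not part of the merged patch: the device-tree parsing above accepts both the dashed property names "int-id-override"/"propagate-bits" and the legacy underscored spellings, trying the dashed name first and falling back on -EINVAL. The hypothetical helper below, with made-up example_* names, just illustrates that same fallback pattern for a single u32 property.)

#include <linux/of.h>
#include <linux/errno.h>

/* Hypothetical helper: prefer the dashed property name, fall back to the
 * legacy underscored spelling if the dashed one is absent (-EINVAL).
 */
static int example_read_u32_with_fallback(const struct device_node *np,
                                          const char *dashed, const char *legacy,
                                          u32 *out)
{
        int err = of_property_read_u32(np, dashed, out);

        if (err == -EINVAL)     /* property not present under the dashed name */
                err = of_property_read_u32(np, legacy, out);

        return err;
}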
+ * + * Return: 0 in all cases. + */ +static int propagate_bits_show(struct seq_file *sfile, void *data) +{ + struct kbase_device *kbdev = sfile->private; + u32 l2_config_val; + + kbase_csf_scheduler_pm_active(kbdev); + kbase_pm_wait_for_l2_powered(kbdev); + l2_config_val = L2_CONFIG_PBHA_HWU_GET(kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_CONFIG))); + kbase_csf_scheduler_pm_idle(kbdev); + + seq_printf(sfile, "PBHA Propagate Bits: 0x%x\n", l2_config_val); + return 0; +} + +static int propagate_bits_open(struct inode *in, struct file *file) +{ + return single_open(file, propagate_bits_show, in->i_private); +} + +/** + * propagate_bits_write - Write input value from debugfs to PBHA bits of L2_CONFIG register. + * + * @file: Pointer to file struct of debugfs node. + * @ubuf: Pointer to user buffer with value to be written. + * @count: Size of user buffer. + * @ppos: Not used. + * + * Return: Size of buffer passed in when successful, but error code E2BIG/EINVAL otherwise. + */ +static ssize_t propagate_bits_write(struct file *file, const char __user *ubuf, size_t count, + loff_t *ppos) +{ + struct seq_file *sfile = file->private_data; + struct kbase_device *kbdev = sfile->private; + /* 32 characters should be enough for the input string in any base */ + char raw_str[32]; + unsigned long propagate_bits; + + if (count >= sizeof(raw_str)) + return -E2BIG; + if (copy_from_user(raw_str, ubuf, count)) + return -EINVAL; + raw_str[count] = '\0'; + if (kstrtoul(raw_str, 0, &propagate_bits)) + return -EINVAL; + + /* Check propagate_bits input argument does not + * exceed the maximum size of the propagate_bits mask. + */ + if (propagate_bits > (L2_CONFIG_PBHA_HWU_MASK >> L2_CONFIG_PBHA_HWU_SHIFT)) + return -EINVAL; + /* Cast to u8 is safe as check is done already to ensure size is within + * correct limits. 
+ */ + kbdev->pbha_propagate_bits = (u8)propagate_bits; + + /* GPU Reset will set new values in L2 config */ + if (kbase_prepare_to_reset_gpu(kbdev, RESET_FLAGS_NONE)) { + kbase_reset_gpu(kbdev); + kbase_reset_gpu_wait(kbdev); + } + + return count; +} + +static const struct file_operations pbha_propagate_bits_fops = { + .owner = THIS_MODULE, + .open = propagate_bits_open, + .read = seq_read, + .write = propagate_bits_write, + .llseek = seq_lseek, + .release = single_release, +}; +#endif /* MALI_USE_CSF */ + static const struct file_operations pbha_int_id_overrides_fops = { .owner = THIS_MODULE, .open = int_id_overrides_open, @@ -120,14 +206,10 @@ static const struct file_operations pbha_int_id_overrides_fops = { void kbase_pbha_debugfs_init(struct kbase_device *kbdev) { if (kbasep_pbha_supported(kbdev)) { -#if (KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE) - /* only for newer kernel version debug file system is safe */ const mode_t mode = 0644; -#else - const mode_t mode = 0600; -#endif struct dentry *debugfs_pbha_dir = debugfs_create_dir( "pbha", kbdev->mali_debugfs_directory); + if (IS_ERR_OR_NULL(debugfs_pbha_dir)) { dev_err(kbdev->dev, "Couldn't create mali debugfs page-based hardware attributes directory\n"); @@ -136,5 +218,10 @@ void kbase_pbha_debugfs_init(struct kbase_device *kbdev) debugfs_create_file("int_id_overrides", mode, debugfs_pbha_dir, kbdev, &pbha_int_id_overrides_fops); +#if MALI_USE_CSF + if (kbase_hw_has_feature(kbdev, BASE_HW_FEATURE_PBHA_HWU)) + debugfs_create_file("propagate_bits", mode, debugfs_pbha_dir, kbdev, + &pbha_propagate_bits_fops); +#endif /* MALI_USE_CSF */ } } diff --git a/mali_kbase/mali_kbase_pbha_debugfs.h b/mali_kbase/mali_kbase_pbha_debugfs.h index 3f477b4..508ecdf 100644 --- a/mali_kbase/mali_kbase_pbha_debugfs.h +++ b/mali_kbase/mali_kbase_pbha_debugfs.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ #include <mali_kbase.h> /** - * kbasep_pbha_debugfs_init - Initialize pbha debugfs directory + * kbase_pbha_debugfs_init - Initialize pbha debugfs directory * * @kbdev: Device pointer */ diff --git a/mali_kbase/mali_kbase_platform_fake.c b/mali_kbase/mali_kbase_platform_fake.c index bf525ed..265c676 100644 --- a/mali_kbase/mali_kbase_platform_fake.c +++ b/mali_kbase/mali_kbase_platform_fake.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2014, 2016-2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2014, 2016-2017, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -32,14 +32,15 @@ */ #include <mali_kbase_config.h> +#ifndef CONFIG_OF + #define PLATFORM_CONFIG_RESOURCE_COUNT 4 -#define PLATFORM_CONFIG_IRQ_RES_COUNT 3 static struct platform_device *mali_device; -#ifndef CONFIG_OF /** - * Convert data in struct kbase_io_resources struct to Linux-specific resources + * kbasep_config_parse_io_resources - Convert data in struct kbase_io_resources + * struct to Linux-specific resources * @io_resources: Input IO resource data * @linux_resources: Pointer to output array of Linux resource structures * @@ -72,14 +73,11 @@ static void kbasep_config_parse_io_resources(const struct kbase_io_resources *io linux_resources[3].end = io_resources->gpu_irq_number; linux_resources[3].flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL; } -#endif /* CONFIG_OF */ int kbase_platform_register(void) { struct kbase_platform_config *config; -#ifndef CONFIG_OF struct resource resources[PLATFORM_CONFIG_RESOURCE_COUNT]; -#endif int err; config = kbase_get_platform_config(); /* declared in midgard/mali_kbase_config.h but defined in platform folder */ @@ -92,7 +90,6 @@ int kbase_platform_register(void) if (mali_device == NULL) return -ENOMEM; -#ifndef CONFIG_OF kbasep_config_parse_io_resources(config->io_resources, resources); err = platform_device_add_resources(mali_device, resources, PLATFORM_CONFIG_RESOURCE_COUNT); if (err) { @@ -100,7 +97,6 @@ int kbase_platform_register(void) mali_device = NULL; return err; } -#endif /* CONFIG_OF */ err = platform_device_add(mali_device); if (err) { @@ -119,3 +115,5 @@ void kbase_platform_unregister(void) platform_device_unregister(mali_device); } EXPORT_SYMBOL(kbase_platform_unregister); + +#endif /* CONFIG_OF */ diff --git a/mali_kbase/mali_kbase_pm.c b/mali_kbase/mali_kbase_pm.c index de2422c..d6c559a 100644 --- a/mali_kbase/mali_kbase_pm.c +++ b/mali_kbase/mali_kbase_pm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,7 +27,7 @@ #include <gpu/mali_kbase_gpu_regmap.h> #include <mali_kbase_vinstr.h> #include <mali_kbase_kinstr_prfcnt.h> -#include <mali_kbase_hwcnt_context.h> +#include <hwcnt/mali_kbase_hwcnt_context.h> #include <mali_kbase_pm.h> #include <backend/gpu/mali_kbase_pm_internal.h> @@ -159,13 +159,13 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) */ kbase_hwcnt_context_disable(kbdev->hwcnt_gpu_ctx); - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); if (WARN_ON(kbase_pm_is_suspending(kbdev))) { - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return 0; } kbdev->pm.suspending = true; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); #ifdef CONFIG_MALI_ARBITER_SUPPORT if (kbdev->arb.arb_if) { @@ -194,9 +194,9 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) kbasep_js_suspend(kbdev); #else if (kbase_csf_scheduler_pm_suspend(kbdev)) { - mutex_lock(&kbdev->pm.lock); + rt_mutex_lock(&kbdev->pm.lock); kbdev->pm.suspending = false; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return -1; } #endif @@ -211,13 +211,31 @@ int kbase_pm_driver_suspend(struct kbase_device *kbdev) kbdev->pm.active_count == 0); dev_dbg(kbdev->dev, ">wait_event - waiting done\n"); +#if MALI_USE_CSF + /* At this point, any kbase context termination should either have run to + * completion and any further context termination can only begin after + * the system resumes. Therefore, it is now safe to skip taking the context + * list lock when traversing the context list. + */ + if (kbase_csf_kcpu_queue_halt_timers(kbdev)) { + rt_mutex_lock(&kbdev->pm.lock); + kbdev->pm.suspending = false; + rt_mutex_unlock(&kbdev->pm.lock); + return -1; + } +#endif + /* NOTE: We synchronize with anything that was just finishing a * kbase_pm_context_idle() call by locking the pm.lock below */ if (kbase_hwaccess_pm_suspend(kbdev)) { - mutex_lock(&kbdev->pm.lock); +#if MALI_USE_CSF + /* Resume the timers in case of suspend failure. */ + kbase_csf_kcpu_queue_resume_timers(kbdev); +#endif + rt_mutex_lock(&kbdev->pm.lock); kbdev->pm.suspending = false; - mutex_unlock(&kbdev->pm.lock); + rt_mutex_unlock(&kbdev->pm.lock); return -1; } @@ -262,6 +280,8 @@ void kbase_pm_driver_resume(struct kbase_device *kbdev, bool arb_gpu_start) kbasep_js_resume(kbdev); #else kbase_csf_scheduler_pm_resume(kbdev); + + kbase_csf_kcpu_queue_resume_timers(kbdev); #endif /* Matching idle call, to power off the GPU/cores if we didn't actually @@ -283,6 +303,10 @@ void kbase_pm_driver_resume(struct kbase_device *kbdev, bool arb_gpu_start) /* Resume HW counters intermediaries. 
*/ kbase_vinstr_resume(kbdev->vinstr_ctx); kbase_kinstr_prfcnt_resume(kbdev->kinstr_prfcnt_ctx); + /* System resume callback is complete */ + kbdev->pm.resuming = false; + /* Unblock the threads waiting for the completion of System suspend/resume */ + wake_up_all(&kbdev->pm.resume_wait); } int kbase_pm_suspend(struct kbase_device *kbdev) @@ -462,11 +486,11 @@ static enum hrtimer_restart kbase_pm_apc_timer_callback(struct hrtimer *timer) int kbase_pm_apc_init(struct kbase_device *kbdev) { - kthread_init_worker(&kbdev->apc.worker); - kbdev->apc.thread = kbase_create_realtime_thread(kbdev, - kthread_worker_fn, &kbdev->apc.worker, "mali_apc_thread"); - if (IS_ERR(kbdev->apc.thread)) - return PTR_ERR(kbdev->apc.thread); + int ret; + + ret = kbase_kthread_run_worker_rt(kbdev, &kbdev->apc.worker, "mali_apc_thread"); + if (ret) + return ret; /* * We initialize power off and power on work on init as they will each @@ -486,6 +510,5 @@ int kbase_pm_apc_init(struct kbase_device *kbdev) void kbase_pm_apc_term(struct kbase_device *kbdev) { hrtimer_cancel(&kbdev->apc.timer); - kthread_flush_worker(&kbdev->apc.worker); - kthread_stop(kbdev->apc.thread); + kbase_destroy_kworker_stack(&kbdev->apc.worker); } diff --git a/mali_kbase/mali_kbase_pm.h b/mali_kbase/mali_kbase_pm.h index 7252bc7..4ff3699 100644 --- a/mali_kbase/mali_kbase_pm.h +++ b/mali_kbase/mali_kbase_pm.h @@ -292,4 +292,14 @@ void kbase_pm_apc_term(struct kbase_device *kbdev); */ void kbase_pm_apc_request(struct kbase_device *kbdev, u32 dur_usec); +/** + * Print debug message indicating power state of GPU + * @kbdev: The kbase device structure for the device (must be a valid pointer) + * @timeout_msg: A message to print. + * + * Prerequisite: GPU is powered. + * Takes and releases kbdev->hwaccess_lock on CSF GPUs. + */ +void kbase_gpu_timeout_debug_message(struct kbase_device *kbdev, const char *timeout_msg); + #endif /* _KBASE_PM_H_ */ diff --git a/mali_kbase/mali_kbase_refcount_defs.h b/mali_kbase/mali_kbase_refcount_defs.h new file mode 100644 index 0000000..c517a2d --- /dev/null +++ b/mali_kbase/mali_kbase_refcount_defs.h @@ -0,0 +1,57 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#ifndef _KBASE_REFCOUNT_DEFS_H_ +#define _KBASE_REFCOUNT_DEFS_H_ + +/* + * The Refcount API is available from 4.11 onwards + * This file hides the compatibility issues with this for the rest the driver + */ + +#include <linux/version.h> +#include <linux/types.h> + +#if (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) + +#define kbase_refcount_t atomic_t +#define kbase_refcount_read(x) atomic_read(x) +#define kbase_refcount_set(x, v) atomic_set(x, v) +#define kbase_refcount_dec_and_test(x) atomic_dec_and_test(x) +#define kbase_refcount_dec(x) atomic_dec(x) +#define kbase_refcount_inc_not_zero(x) atomic_inc_not_zero(x) +#define kbase_refcount_inc(x) atomic_inc(x) + +#else + +#include <linux/refcount.h> + +#define kbase_refcount_t refcount_t +#define kbase_refcount_read(x) refcount_read(x) +#define kbase_refcount_set(x, v) refcount_set(x, v) +#define kbase_refcount_dec_and_test(x) refcount_dec_and_test(x) +#define kbase_refcount_dec(x) refcount_dec(x) +#define kbase_refcount_inc_not_zero(x) refcount_inc_not_zero(x) +#define kbase_refcount_inc(x) refcount_inc(x) + +#endif /* (KERNEL_VERSION(4, 11, 0) > LINUX_VERSION_CODE) */ + +#endif /* _KBASE_REFCOUNT_DEFS_H_ */ diff --git a/mali_kbase/mali_kbase_regs_history_debugfs.c b/mali_kbase/mali_kbase_regs_history_debugfs.c index f8dec6b..c19b4a3 100644 --- a/mali_kbase/mali_kbase_regs_history_debugfs.c +++ b/mali_kbase/mali_kbase_regs_history_debugfs.c @@ -25,6 +25,7 @@ #if defined(CONFIG_DEBUG_FS) && !IS_ENABLED(CONFIG_MALI_NO_MALI) #include <linux/debugfs.h> +#include <linux/version_compat_defs.h> /** * kbase_io_history_resize - resize the register access history buffer. @@ -158,11 +159,8 @@ static int regs_history_size_set(void *data, u64 val) return kbase_io_history_resize(h, (u16)val); } - -DEFINE_SIMPLE_ATTRIBUTE(regs_history_size_fops, - regs_history_size_get, - regs_history_size_set, - "%llu\n"); +DEFINE_DEBUGFS_ATTRIBUTE(regs_history_size_fops, regs_history_size_get, regs_history_size_set, + "%llu\n"); /** * regs_history_show - show callback for the register access history file. diff --git a/mali_kbase/mali_kbase_reset_gpu.h b/mali_kbase/mali_kbase_reset_gpu.h index ff631e9..5063b64 100644 --- a/mali_kbase/mali_kbase_reset_gpu.h +++ b/mali_kbase/mali_kbase_reset_gpu.h @@ -144,6 +144,14 @@ void kbase_reset_gpu_assert_prevented(struct kbase_device *kbdev); void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev); /** + * kbase_reset_gpu_failed - Return whether a previous GPU reset failed. + * + * @kbdev: Device pointer + * + */ +bool kbase_reset_gpu_failed(struct kbase_device *kbdev); + +/** * RESET_FLAGS_NONE - Flags for kbase_prepare_to_reset_gpu */ #define RESET_FLAGS_NONE (0U) @@ -151,6 +159,9 @@ void kbase_reset_gpu_assert_failed_or_prevented(struct kbase_device *kbdev); /* This reset should be treated as an unrecoverable error by HW counter logic */ #define RESET_FLAGS_HWC_UNRECOVERABLE_ERROR ((unsigned int)(1 << 0)) +/* pixel: Powercycle the GPU instead of attempting a soft/hard reset (only used on CSF hw). */ +#define RESET_FLAGS_FORCE_PM_HW_RESET ((unsigned int)(1 << 1)) + /** * kbase_prepare_to_reset_gpu_locked - Prepare for resetting the GPU. 
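(Illustrative sketch, not part of the patch: the new mali_kbase_refcount_defs.h above maps the kbase_refcount_* wrappers to refcount_t on v4.11+ kernels and to plain atomic_t on older ones. The struct my_obj and its helpers below are hypothetical and only show the intended usage pattern.)

#include "mali_kbase_refcount_defs.h"

struct my_obj {                         /* hypothetical refcounted object */
        kbase_refcount_t refcount;
};

static void my_obj_init(struct my_obj *obj)
{
        /* Maps to refcount_set() on v4.11+, atomic_set() on older kernels. */
        kbase_refcount_set(&obj->refcount, 1);
}

static void my_obj_get(struct my_obj *obj)
{
        kbase_refcount_inc(&obj->refcount);
}

static bool my_obj_put(struct my_obj *obj)
{
        /* True when the last reference is dropped, so the caller can free obj. */
        return kbase_refcount_dec_and_test(&obj->refcount);
}

On newer kernels this indirection picks up refcount_t's saturation and overflow checking for free, which a bare atomic_t counter does not provide.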
* @kbdev: Device pointer @@ -237,6 +248,18 @@ int kbase_reset_gpu_silent(struct kbase_device *kbdev); bool kbase_reset_gpu_is_active(struct kbase_device *kbdev); /** + * kbase_reset_gpu_not_pending - Reports if the GPU reset isn't pending + * + * @kbdev: Device pointer + * + * Note that unless appropriate locks are held when using this function, the + * state could change immediately afterwards. + * + * Return: True if the GPU reset isn't pending. + */ +bool kbase_reset_gpu_is_not_pending(struct kbase_device *kbdev); + +/** * kbase_reset_gpu_wait - Wait for a GPU reset to complete * @kbdev: Device pointer * diff --git a/mali_kbase/mali_kbase_smc.h b/mali_kbase/mali_kbase_smc.h index 91eb9ee..40a3483 100644 --- a/mali_kbase/mali_kbase_smc.h +++ b/mali_kbase/mali_kbase_smc.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2015, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -49,7 +49,7 @@ u64 kbase_invoke_smc_fid(u32 fid, u64 arg0, u64 arg1, u64 arg2); /** - * kbase_invoke_smc_fid - Perform a secure monitor call + * kbase_invoke_smc - Perform a secure monitor call * @oen: Owning Entity number (SIP, STD etc). * @function_number: The function number within the OEN. * @smc64: use SMC64 calling convention instead of SMC32. diff --git a/mali_kbase/mali_kbase_softjobs.c b/mali_kbase/mali_kbase_softjobs.c index bbb0934..31da049 100644 --- a/mali_kbase/mali_kbase_softjobs.c +++ b/mali_kbase/mali_kbase_softjobs.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -23,7 +23,7 @@ #include <linux/dma-buf.h> #include <asm/cacheflush.h> -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #include <mali_kbase_sync.h> #include <mali_kbase_fence.h> #endif @@ -41,6 +41,7 @@ #include <linux/kernel.h> #include <linux/cache.h> #include <linux/file.h> +#include <linux/version_compat_defs.h> #if !MALI_USE_CSF /** @@ -75,7 +76,7 @@ static void kbasep_add_waiting_with_timeout(struct kbase_jd_atom *katom) /* Record the start time of this atom so we could cancel it at * the right time. */ - katom->start_timestamp = ktime_get(); + katom->start_timestamp = ktime_get_raw(); /* Add the atom to the waiting list before the timer is * (re)started to make sure that it gets processed. 
@@ -206,7 +207,7 @@ static int kbase_dump_cpu_gpu_time(struct kbase_jd_atom *katom) return 0; } -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) /* Called by the explicit fence mechanism when a fence wait has completed */ void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom) { @@ -215,7 +216,7 @@ void kbase_soft_event_wait_callback(struct kbase_jd_atom *katom) rt_mutex_lock(&kctx->jctx.lock); kbasep_remove_waiting_soft_job(katom); kbase_finish_soft_job(katom); - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(kctx->kbdev); rt_mutex_unlock(&kctx->jctx.lock); } @@ -229,7 +230,7 @@ static void kbasep_soft_event_complete_job(struct kthread_work *work) int resched; rt_mutex_lock(&kctx->jctx.lock); - resched = jd_done_nolock(katom, true); + resched = kbase_jd_done_nolock(katom, true); rt_mutex_unlock(&kctx->jctx.lock); if (resched) @@ -390,7 +391,7 @@ void kbasep_soft_job_timeout_worker(struct timer_list *timer) soft_job_timeout); u32 timeout_ms = (u32)atomic_read( &kctx->kbdev->js_data.soft_job_timeout_ms); - ktime_t cur_time = ktime_get(); + ktime_t cur_time = ktime_get_raw(); bool restarting = false; unsigned long lflags; struct list_head *entry, *tmp; @@ -500,10 +501,11 @@ out: static void kbasep_soft_event_cancel_job(struct kbase_jd_atom *katom) { katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(katom->kctx->kbdev); } +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST static void kbase_debug_copy_finish(struct kbase_jd_atom *katom) { struct kbase_debug_copy_buffer *buffers = katom->softjob_data; @@ -675,8 +677,8 @@ static int kbase_debug_copy_prepare(struct kbase_jd_atom *katom) case KBASE_MEM_TYPE_IMPORTED_USER_BUF: { struct kbase_mem_phy_alloc *alloc = reg->gpu_alloc; - unsigned long nr_pages = - alloc->imported.user_buf.nr_pages; + const unsigned long nr_pages = alloc->imported.user_buf.nr_pages; + const unsigned long start = alloc->imported.user_buf.address; if (alloc->imported.user_buf.mm != current->mm) { ret = -EINVAL; @@ -688,11 +690,9 @@ static int kbase_debug_copy_prepare(struct kbase_jd_atom *katom) ret = -ENOMEM; goto out_unlock; } - - ret = get_user_pages_fast( - alloc->imported.user_buf.address, - nr_pages, 0, - buffers[i].extres_pages); + kbase_gpu_vm_unlock(katom->kctx); + ret = get_user_pages_fast(start, nr_pages, 0, buffers[i].extres_pages); + kbase_gpu_vm_lock(katom->kctx); if (ret != nr_pages) { /* Adjust number of pages, so that we only * attempt to release pages in the array that we @@ -730,7 +730,6 @@ out_cleanup: return ret; } -#endif /* !MALI_USE_CSF */ #if KERNEL_VERSION(5, 6, 0) <= LINUX_VERSION_CODE static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, @@ -753,7 +752,7 @@ static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, if (page_index == page_num) { *page = sg_page_iter_page(&sg_iter); - return kmap(*page); + return kbase_kmap(*page); } page_index++; } @@ -762,8 +761,18 @@ static void *dma_buf_kmap_page(struct kbase_mem_phy_alloc *gpu_alloc, } #endif -int kbase_mem_copy_from_extres(struct kbase_context *kctx, - struct kbase_debug_copy_buffer *buf_data) +/** + * kbase_mem_copy_from_extres() - Copy from external resources. + * + * @kctx: kbase context within which the copying is to take place. 
+ * @buf_data: Pointer to the information about external resources: + * pages pertaining to the external resource, number of + * pages to copy. + * + * Return: 0 on success, error code otherwise. + */ +static int kbase_mem_copy_from_extres(struct kbase_context *kctx, + struct kbase_debug_copy_buffer *buf_data) { unsigned int i; unsigned int target_page_nr = 0; @@ -789,14 +798,13 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, for (i = 0; i < buf_data->nr_extres_pages && target_page_nr < buf_data->nr_pages; i++) { struct page *pg = buf_data->extres_pages[i]; - void *extres_page = kmap(pg); - + void *extres_page = kbase_kmap(pg); if (extres_page) { ret = kbase_mem_copy_to_pinned_user_pages( pages, extres_page, &to_copy, buf_data->nr_pages, &target_page_nr, offset); - kunmap(pg); + kbase_kunmap(pg, extres_page); if (ret) goto out_unlock; } @@ -812,11 +820,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, dma_to_copy = min(dma_buf->size, (size_t)(buf_data->nr_extres_pages * PAGE_SIZE)); - ret = dma_buf_begin_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_to_copy, -#endif - DMA_FROM_DEVICE); + ret = dma_buf_begin_cpu_access(dma_buf, DMA_FROM_DEVICE); if (ret) goto out_unlock; @@ -835,7 +839,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, &target_page_nr, offset); #if KERNEL_VERSION(5, 6, 0) <= LINUX_VERSION_CODE - kunmap(pg); + kbase_kunmap(pg, extres_page); #else dma_buf_kunmap(dma_buf, i, extres_page); #endif @@ -843,11 +847,7 @@ int kbase_mem_copy_from_extres(struct kbase_context *kctx, break; } } - dma_buf_end_cpu_access(dma_buf, -#if KERNEL_VERSION(4, 6, 0) > LINUX_VERSION_CODE && !defined(CONFIG_CHROMEOS) - 0, dma_to_copy, -#endif - DMA_FROM_DEVICE); + dma_buf_end_cpu_access(dma_buf, DMA_FROM_DEVICE); break; } default: @@ -858,7 +858,6 @@ out_unlock: return ret; } -#if !MALI_USE_CSF static int kbase_debug_copy(struct kbase_jd_atom *katom) { struct kbase_debug_copy_buffer *buffers = katom->softjob_data; @@ -876,6 +875,7 @@ static int kbase_debug_copy(struct kbase_jd_atom *katom) return 0; } +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ #endif /* !MALI_USE_CSF */ #define KBASEP_JIT_ALLOC_GPU_ADDR_ALIGNMENT ((u32)0x7) @@ -935,26 +935,6 @@ int kbasep_jit_alloc_validate(struct kbase_context *kctx, #if !MALI_USE_CSF -/* - * Sizes of user data to copy for each just-in-time memory interface version - * - * In interface version 2 onwards this is the same as the struct size, allowing - * copying of arrays of structures from userspace. - * - * In interface version 1 the structure size was variable, and hence arrays of - * structures cannot be supported easily, and were not a feature present in - * version 1 anyway. - */ -static const size_t jit_info_copy_size_for_jit_version[] = { - /* in jit_version 1, the structure did not have any end padding, hence - * it could be a different size on 32 and 64-bit clients. 
We therefore - * do not copy past the last member - */ - [1] = offsetofend(struct base_jit_alloc_info_10_2, id), - [2] = sizeof(struct base_jit_alloc_info_11_5), - [3] = sizeof(struct base_jit_alloc_info) -}; - static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) { __user u8 *data = (__user u8 *)(uintptr_t) katom->jc; @@ -964,18 +944,18 @@ static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) u32 count; int ret; u32 i; - size_t jit_info_user_copy_size; - WARN_ON(kctx->jit_version >= - ARRAY_SIZE(jit_info_copy_size_for_jit_version)); - jit_info_user_copy_size = - jit_info_copy_size_for_jit_version[kctx->jit_version]; - WARN_ON(jit_info_user_copy_size > sizeof(*info)); + if (!kbase_mem_allow_alloc(kctx)) { + dev_dbg(kbdev->dev, "Invalid attempt to allocate JIT memory by %s/%d for ctx %d_%d", + current->comm, current->pid, kctx->tgid, kctx->id); + ret = -EINVAL; + goto fail; + } /* For backwards compatibility, and to prevent reading more than 1 jit * info struct on jit version 1 */ - if (katom->nr_extres == 0 || kctx->jit_version == 1) + if (katom->nr_extres == 0) katom->nr_extres = 1; count = katom->nr_extres; @@ -995,17 +975,11 @@ static int kbase_jit_allocate_prepare(struct kbase_jd_atom *katom) katom->softjob_data = info; - for (i = 0; i < count; i++, info++, data += jit_info_user_copy_size) { - if (copy_from_user(info, data, jit_info_user_copy_size) != 0) { + for (i = 0; i < count; i++, info++, data += sizeof(*info)) { + if (copy_from_user(info, data, sizeof(*info)) != 0) { ret = -EINVAL; goto free_info; } - /* Clear any remaining bytes when user struct is smaller than - * kernel struct. For jit version 1, this also clears the - * padding bytes - */ - memset(((u8 *)info) + jit_info_user_copy_size, 0, - sizeof(*info) - jit_info_user_copy_size); ret = kbasep_jit_alloc_validate(kctx, info); if (ret) @@ -1357,7 +1331,7 @@ static void kbasep_jit_finish_worker(struct kthread_work *work) rt_mutex_lock(&kctx->jctx.lock); kbase_finish_soft_job(katom); - resched = jd_done_nolock(katom, true); + resched = kbase_jd_done_nolock(katom, true); rt_mutex_unlock(&kctx->jctx.lock); if (resched) @@ -1486,10 +1460,11 @@ static void kbase_ext_res_process(struct kbase_jd_atom *katom, bool map) if (!kbase_sticky_resource_acquire(katom->kctx, gpu_addr)) goto failed_loop; - } else + } else { if (!kbase_sticky_resource_release_force(katom->kctx, NULL, gpu_addr)) failed = true; + } } /* @@ -1549,7 +1524,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) ret = kbase_dump_cpu_gpu_time(katom); break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: katom->event_code = kbase_sync_fence_out_trigger(katom, katom->event_code == BASE_JD_EVENT_DONE ? 
@@ -1578,6 +1553,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) case BASE_JD_REQ_SOFT_EVENT_RESET: kbasep_soft_event_update_locked(katom, BASE_JD_SOFT_EVENT_RESET); break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: { int res = kbase_debug_copy(katom); @@ -1586,6 +1562,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) katom->event_code = BASE_JD_EVENT_JOB_INVALID; break; } +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_JIT_ALLOC: ret = kbase_jit_allocate_process(katom); break; @@ -1609,7 +1586,7 @@ int kbase_process_soft_job(struct kbase_jd_atom *katom) void kbase_cancel_soft_job(struct kbase_jd_atom *katom) { switch (katom->core_req & BASE_JD_REQ_SOFT_JOB_TYPE) { -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_WAIT: kbase_sync_fence_in_cancel_wait(katom); break; @@ -1632,7 +1609,7 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) return -EINVAL; } break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: { struct base_fence fence; @@ -1683,20 +1660,9 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) fence.basep.fd); if (ret < 0) return ret; - -#ifdef CONFIG_MALI_DMA_FENCE - /* - * Set KCTX_NO_IMPLICIT_FENCE in the context the first - * time a soft fence wait job is observed. This will - * prevent the implicit dma-buf fence to conflict with - * the Android native sync fences. - */ - if (!kbase_ctx_flag(katom->kctx, KCTX_NO_IMPLICIT_SYNC)) - kbase_ctx_flag_set(katom->kctx, KCTX_NO_IMPLICIT_SYNC); -#endif /* CONFIG_MALI_DMA_FENCE */ } break; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ case BASE_JD_REQ_SOFT_JIT_ALLOC: return kbase_jit_allocate_prepare(katom); case BASE_JD_REQ_SOFT_JIT_FREE: @@ -1707,8 +1673,10 @@ int kbase_prepare_soft_job(struct kbase_jd_atom *katom) if (katom->jc == 0) return -EINVAL; break; +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: return kbase_debug_copy_prepare(katom); +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_EXT_RES_MAP: return kbase_ext_res_prepare(katom); case BASE_JD_REQ_SOFT_EXT_RES_UNMAP: @@ -1729,7 +1697,7 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom) case BASE_JD_REQ_SOFT_DUMP_CPU_GPU_TIME: /* Nothing to do */ break; -#if defined(CONFIG_SYNC) || defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) case BASE_JD_REQ_SOFT_FENCE_TRIGGER: /* If fence has not yet been signaled, do it now */ kbase_sync_fence_out_trigger(katom, katom->event_code == @@ -1739,10 +1707,12 @@ void kbase_finish_soft_job(struct kbase_jd_atom *katom) /* Release katom's reference to fence object */ kbase_sync_fence_in_remove(katom); break; -#endif /* CONFIG_SYNC || CONFIG_SYNC_FILE */ +#endif /* CONFIG_SYNC_FILE */ +#if IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST case BASE_JD_REQ_SOFT_DEBUG_COPY: kbase_debug_copy_finish(katom); break; +#endif /* IS_ENABLED(CONFIG_MALI_VECTOR_DUMP) || MALI_UNIT_TEST */ case BASE_JD_REQ_SOFT_JIT_ALLOC: kbase_jit_allocate_finish(katom); break; @@ -1793,7 +1763,7 @@ void kbase_resume_suspended_soft_jobs(struct kbase_device *kbdev) if (kbase_process_soft_job(katom_iter) == 0) { kbase_finish_soft_job(katom_iter); - resched |= jd_done_nolock(katom_iter, true); + resched |= kbase_jd_done_nolock(katom_iter, true); #ifdef 
CONFIG_MALI_ARBITER_SUPPORT atomic_dec(&kbdev->pm.gpu_users_waiting); #endif /* CONFIG_MALI_ARBITER_SUPPORT */ diff --git a/mali_kbase/mali_kbase_sync.h b/mali_kbase/mali_kbase_sync.h index e820dcc..2b466a6 100644 --- a/mali_kbase/mali_kbase_sync.h +++ b/mali_kbase/mali_kbase_sync.h @@ -30,9 +30,6 @@ #include <linux/fdtable.h> #include <linux/syscalls.h> -#if IS_ENABLED(CONFIG_SYNC) -#include <sync.h> -#endif #if IS_ENABLED(CONFIG_SYNC_FILE) #include "mali_kbase_fence_defs.h" #include <linux/sync_file.h> @@ -181,7 +178,7 @@ int kbase_sync_fence_out_info_get(struct kbase_jd_atom *katom, struct kbase_sync_fence_info *info); #endif /* !MALI_USE_CSF */ -#if defined(CONFIG_SYNC_FILE) +#if IS_ENABLED(CONFIG_SYNC_FILE) #if (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE) void kbase_sync_fence_info_get(struct fence *fence, struct kbase_sync_fence_info *info); diff --git a/mali_kbase/mali_kbase_sync_android.c b/mali_kbase/mali_kbase_sync_android.c deleted file mode 100644 index c028b1c..0000000 --- a/mali_kbase/mali_kbase_sync_android.c +++ /dev/null @@ -1,520 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note -/* - * - * (C) COPYRIGHT 2012-2017, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. - * - */ - -/* - * Code for supporting explicit Android fences (CONFIG_SYNC) - * Known to be good for kernels 4.5 and earlier. 
- * Replaced with CONFIG_SYNC_FILE for 4.9 and later kernels - * (see mali_kbase_sync_file.c) - */ - -#include <linux/sched.h> -#include <linux/fdtable.h> -#include <linux/file.h> -#include <linux/fs.h> -#include <linux/module.h> -#include <linux/anon_inodes.h> -#include <linux/version.h> -#include "sync.h" -#include <mali_kbase.h> -#include <mali_kbase_sync.h> - -struct mali_sync_timeline { - struct sync_timeline timeline; - atomic_t counter; - atomic_t signaled; -}; - -struct mali_sync_pt { - struct sync_pt pt; - int order; - int result; -}; - -static struct mali_sync_timeline *to_mali_sync_timeline( - struct sync_timeline *timeline) -{ - return container_of(timeline, struct mali_sync_timeline, timeline); -} - -static struct mali_sync_pt *to_mali_sync_pt(struct sync_pt *pt) -{ - return container_of(pt, struct mali_sync_pt, pt); -} - -static struct sync_pt *timeline_dup(struct sync_pt *pt) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_pt *new_mpt; - struct sync_pt *new_pt = sync_pt_create(sync_pt_parent(pt), - sizeof(struct mali_sync_pt)); - - if (!new_pt) - return NULL; - - new_mpt = to_mali_sync_pt(new_pt); - new_mpt->order = mpt->order; - new_mpt->result = mpt->result; - - return new_pt; -} - -static int timeline_has_signaled(struct sync_pt *pt) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_timeline *mtl = to_mali_sync_timeline( - sync_pt_parent(pt)); - int result = mpt->result; - - int diff = atomic_read(&mtl->signaled) - mpt->order; - - if (diff >= 0) - return (result < 0) ? result : 1; - - return 0; -} - -static int timeline_compare(struct sync_pt *a, struct sync_pt *b) -{ - struct mali_sync_pt *ma = container_of(a, struct mali_sync_pt, pt); - struct mali_sync_pt *mb = container_of(b, struct mali_sync_pt, pt); - - int diff = ma->order - mb->order; - - if (diff == 0) - return 0; - - return (diff < 0) ? -1 : 1; -} - -static void timeline_value_str(struct sync_timeline *timeline, char *str, - int size) -{ - struct mali_sync_timeline *mtl = to_mali_sync_timeline(timeline); - - snprintf(str, size, "%d", atomic_read(&mtl->signaled)); -} - -static void pt_value_str(struct sync_pt *pt, char *str, int size) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - - snprintf(str, size, "%d(%d)", mpt->order, mpt->result); -} - -static struct sync_timeline_ops mali_timeline_ops = { - .driver_name = "Mali", - .dup = timeline_dup, - .has_signaled = timeline_has_signaled, - .compare = timeline_compare, - .timeline_value_str = timeline_value_str, - .pt_value_str = pt_value_str, -}; - -/* Allocates a timeline for Mali - * - * One timeline should be allocated per API context. 
- */ -static struct sync_timeline *mali_sync_timeline_alloc(const char *name) -{ - struct sync_timeline *tl; - struct mali_sync_timeline *mtl; - - tl = sync_timeline_create(&mali_timeline_ops, - sizeof(struct mali_sync_timeline), name); - if (!tl) - return NULL; - - /* Set the counter in our private struct */ - mtl = to_mali_sync_timeline(tl); - atomic_set(&mtl->counter, 0); - atomic_set(&mtl->signaled, 0); - - return tl; -} - -static int kbase_stream_close(struct inode *inode, struct file *file) -{ - struct sync_timeline *tl; - - tl = (struct sync_timeline *)file->private_data; - sync_timeline_destroy(tl); - return 0; -} - -static const struct file_operations stream_fops = { - .owner = THIS_MODULE, - .release = kbase_stream_close, -}; - -int kbase_sync_fence_stream_create(const char *name, int *const out_fd) -{ - struct sync_timeline *tl; - - if (!out_fd) - return -EINVAL; - - tl = mali_sync_timeline_alloc(name); - if (!tl) - return -EINVAL; - - *out_fd = anon_inode_getfd(name, &stream_fops, tl, O_RDONLY|O_CLOEXEC); - - if (*out_fd < 0) { - sync_timeline_destroy(tl); - return -EINVAL; - } - - return 0; -} - -#if !MALI_USE_CSF -/* Allocates a sync point within the timeline. - * - * The timeline must be the one allocated by kbase_sync_timeline_alloc - * - * Sync points must be triggered in *exactly* the same order as they are - * allocated. - */ -static struct sync_pt *kbase_sync_pt_alloc(struct sync_timeline *parent) -{ - struct sync_pt *pt = sync_pt_create(parent, - sizeof(struct mali_sync_pt)); - struct mali_sync_timeline *mtl = to_mali_sync_timeline(parent); - struct mali_sync_pt *mpt; - - if (!pt) - return NULL; - - mpt = to_mali_sync_pt(pt); - mpt->order = atomic_inc_return(&mtl->counter); - mpt->result = 0; - - return pt; -} - -int kbase_sync_fence_out_create(struct kbase_jd_atom *katom, int tl_fd) -{ - struct sync_timeline *tl; - struct sync_pt *pt; - struct sync_fence *fence; - int fd; - struct file *tl_file; - - tl_file = fget(tl_fd); - if (tl_file == NULL) - return -EBADF; - - if (tl_file->f_op != &stream_fops) { - fd = -EBADF; - goto out; - } - - tl = tl_file->private_data; - - pt = kbase_sync_pt_alloc(tl); - if (!pt) { - fd = -EFAULT; - goto out; - } - - fence = sync_fence_create("mali_fence", pt); - if (!fence) { - sync_pt_free(pt); - fd = -EFAULT; - goto out; - } - - /* from here the fence owns the sync_pt */ - - /* create a fd representing the fence */ - fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC); - if (fd < 0) { - sync_fence_put(fence); - goto out; - } - - /* bind fence to the new fd */ - sync_fence_install(fence, fd); - - katom->fence = sync_fence_fdget(fd); - if (katom->fence == NULL) { - /* The only way the fence can be NULL is if userspace closed it - * for us, so we don't need to clear it up - */ - fd = -EINVAL; - goto out; - } - -out: - fput(tl_file); - - return fd; -} - -int kbase_sync_fence_in_from_fd(struct kbase_jd_atom *katom, int fd) -{ - katom->fence = sync_fence_fdget(fd); - return katom->fence ? 0 : -ENOENT; -} -#endif /* !MALI_USE_CSF */ - -int kbase_sync_fence_validate(int fd) -{ - struct sync_fence *fence; - - fence = sync_fence_fdget(fd); - if (!fence) - return -EINVAL; - - sync_fence_put(fence); - return 0; -} - -#if !MALI_USE_CSF -/* Returns true if the specified timeline is allocated by Mali */ -static int kbase_sync_timeline_is_ours(struct sync_timeline *timeline) -{ - return timeline->ops == &mali_timeline_ops; -} - -/* Signals a particular sync point - * - * Sync points must be triggered in *exactly* the same order as they are - * allocated. 
- * - * If they are signaled in the wrong order then a message will be printed in - * debug builds and otherwise attempts to signal order sync_pts will be ignored. - * - * result can be negative to indicate error, any other value is interpreted as - * success. - */ -static void kbase_sync_signal_pt(struct sync_pt *pt, int result) -{ - struct mali_sync_pt *mpt = to_mali_sync_pt(pt); - struct mali_sync_timeline *mtl = to_mali_sync_timeline( - sync_pt_parent(pt)); - int signaled; - int diff; - - mpt->result = result; - - do { - signaled = atomic_read(&mtl->signaled); - - diff = signaled - mpt->order; - - if (diff > 0) { - /* The timeline is already at or ahead of this point. - * This should not happen unless userspace has been - * signaling fences out of order, so warn but don't - * violate the sync_pt API. - * The warning is only in debug builds to prevent - * a malicious user being able to spam dmesg. - */ -#ifdef CONFIG_MALI_DEBUG - pr_err("Fences were triggered in a different order to allocation!"); -#endif /* CONFIG_MALI_DEBUG */ - return; - } - } while (atomic_cmpxchg(&mtl->signaled, - signaled, mpt->order) != signaled); -} - -enum base_jd_event_code -kbase_sync_fence_out_trigger(struct kbase_jd_atom *katom, int result) -{ - struct sync_pt *pt; - struct sync_timeline *timeline; - - if (!katom->fence) - return BASE_JD_EVENT_JOB_CANCELLED; - - if (katom->fence->num_fences != 1) { - /* Not exactly one item in the list - so it didn't (directly) - * come from us - */ - return BASE_JD_EVENT_JOB_CANCELLED; - } - - pt = container_of(katom->fence->cbs[0].sync_pt, struct sync_pt, base); - timeline = sync_pt_parent(pt); - - if (!kbase_sync_timeline_is_ours(timeline)) { - /* Fence has a sync_pt which isn't ours! */ - return BASE_JD_EVENT_JOB_CANCELLED; - } - - kbase_sync_signal_pt(pt, result); - - sync_timeline_signal(timeline); - - kbase_sync_fence_out_remove(katom); - - return (result < 0) ? BASE_JD_EVENT_JOB_CANCELLED : BASE_JD_EVENT_DONE; -} - -static inline int kbase_fence_get_status(struct sync_fence *fence) -{ - if (!fence) - return -ENOENT; - - return atomic_read(&fence->status); -} - -static void kbase_fence_wait_callback(struct sync_fence *fence, - struct sync_fence_waiter *waiter) -{ - struct kbase_jd_atom *katom = container_of(waiter, - struct kbase_jd_atom, sync_waiter); - struct kbase_context *kctx = katom->kctx; - - /* Propagate the fence status to the atom. - * If negative then cancel this atom and its dependencies. - */ - if (kbase_fence_get_status(fence) < 0) - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - - /* To prevent a potential deadlock we schedule the work onto the - * job_done_worker kthread - * - * The issue is that we may signal the timeline while holding - * kctx->jctx.lock and the callbacks are run synchronously from - * sync_timeline_signal. So we simply defer the work. 
- */ - - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); -} - -int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) -{ - int ret; - - sync_fence_waiter_init(&katom->sync_waiter, kbase_fence_wait_callback); - - ret = sync_fence_wait_async(katom->fence, &katom->sync_waiter); - - if (ret == 1) { - /* Already signaled */ - return 0; - } - - if (ret < 0) { - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - /* We should cause the dependent jobs in the bag to be failed, - * to do this we schedule the work queue to complete this job - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&katom->kctx->kbdev->job_done_worker, &katom->work); - - } - - return 1; -} - -void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) -{ - if (sync_fence_cancel_async(katom->fence, &katom->sync_waiter) != 0) { - /* The wait wasn't cancelled - leave the cleanup for - * kbase_fence_wait_callback - */ - return; - } - - /* Wait was cancelled - zap the atoms */ - katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - - kbasep_remove_waiting_soft_job(katom); - kbase_finish_soft_job(katom); - - if (jd_done_nolock(katom, true)) - kbase_js_sched_all(katom->kctx->kbdev); -} - -void kbase_sync_fence_out_remove(struct kbase_jd_atom *katom) -{ - if (katom->fence) { - sync_fence_put(katom->fence); - katom->fence = NULL; - } -} - -void kbase_sync_fence_in_remove(struct kbase_jd_atom *katom) -{ - if (katom->fence) { - sync_fence_put(katom->fence); - katom->fence = NULL; - } -} - -int kbase_sync_fence_in_info_get(struct kbase_jd_atom *katom, - struct kbase_sync_fence_info *info) -{ - u32 string_len; - - if (!katom->fence) - return -ENOENT; - - info->fence = katom->fence; - info->status = kbase_fence_get_status(katom->fence); - - string_len = strscpy(info->name, katom->fence->name, sizeof(info->name)); - string_len += sizeof(char); - /* Make sure that the source string fit into the buffer. */ - KBASE_DEBUG_ASSERT(string_len <= sizeof(info->name)); - CSTD_UNUSED(string_len); - - return 0; -} - -int kbase_sync_fence_out_info_get(struct kbase_jd_atom *katom, - struct kbase_sync_fence_info *info) -{ - u32 string_len; - - if (!katom->fence) - return -ENOENT; - - info->fence = katom->fence; - info->status = kbase_fence_get_status(katom->fence); - - string_len = strscpy(info->name, katom->fence->name, sizeof(info->name)); - string_len += sizeof(char); - /* Make sure that the source string fit into the buffer. */ - KBASE_DEBUG_ASSERT(string_len <= sizeof(info->name)); - CSTD_UNUSED(string_len); - - return 0; -} - -#ifdef CONFIG_MALI_FENCE_DEBUG -void kbase_sync_fence_in_dump(struct kbase_jd_atom *katom) -{ - /* Dump out the full state of all the Android sync fences. - * The function sync_dump() isn't exported to modules, so force - * sync_fence_wait() to time out to trigger sync_dump(). - */ - if (katom->fence) - sync_fence_wait(katom->fence, 1); -} -#endif -#endif /* !MALI_USE_CSF */ diff --git a/mali_kbase/mali_kbase_sync_file.c b/mali_kbase/mali_kbase_sync_file.c index 1462a6b..d98eba9 100644 --- a/mali_kbase/mali_kbase_sync_file.c +++ b/mali_kbase/mali_kbase_sync_file.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2012-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2012-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -21,9 +21,6 @@ /* * Code for supporting explicit Linux fences (CONFIG_SYNC_FILE) - * Introduced in kernel 4.9. - * Android explicit fences (CONFIG_SYNC) can be used for older kernels - * (see mali_kbase_sync_android.c) */ #include <linux/sched.h> @@ -101,10 +98,13 @@ int kbase_sync_fence_in_from_fd(struct kbase_jd_atom *katom, int fd) struct dma_fence *fence = sync_file_get_fence(fd); #endif + lockdep_assert_held(&katom->kctx->jctx.lock); + if (!fence) return -ENOENT; kbase_fence_fence_in_set(katom, fence); + katom->dma_fence.fence_cb_added = false; return 0; } @@ -156,36 +156,31 @@ static void kbase_fence_wait_callback(struct dma_fence *fence, struct dma_fence_cb *cb) #endif { - struct kbase_fence_cb *kcb = container_of(cb, - struct kbase_fence_cb, - fence_cb); - struct kbase_jd_atom *katom = kcb->katom; + struct kbase_jd_atom *katom = container_of(cb, struct kbase_jd_atom, + dma_fence.fence_cb); struct kbase_context *kctx = katom->kctx; /* Cancel atom if fence is erroneous */ + if (dma_fence_is_signaled(katom->dma_fence.fence_in) && #if (KERNEL_VERSION(4, 11, 0) <= LINUX_VERSION_CODE || \ (KERNEL_VERSION(4, 10, 0) > LINUX_VERSION_CODE && \ KERNEL_VERSION(4, 9, 68) <= LINUX_VERSION_CODE)) - if (dma_fence_is_signaled(kcb->fence) && kcb->fence->error < 0) + katom->dma_fence.fence_in->error < 0) #else - if (dma_fence_is_signaled(kcb->fence) && kcb->fence->status < 0) + katom->dma_fence.fence_in->status < 0) #endif katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - if (kbase_fence_dep_count_dec_and_test(katom)) { - /* We take responsibility of handling this */ - kbase_fence_dep_count_set(katom, -1); - /* To prevent a potential deadlock we schedule the work onto the - * job_done_worker kthread - * - * The issue is that we may signal the timeline while holding - * kctx->jctx.lock and the callbacks are run synchronously from - * sync_timeline_signal. So we simply defer the work. - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); - } + /* To prevent a potential deadlock we schedule the work onto the + * job_done_wq workqueue + * + * The issue is that we may signal the timeline while holding + * kctx->jctx.lock and the callbacks are run synchronously from + * sync_timeline_signal. So we simply defer the work. + */ + kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); + kthread_queue_work(&kctx->kbdev->job_done_worker, &katom->work); } int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) @@ -197,53 +192,77 @@ int kbase_sync_fence_in_wait(struct kbase_jd_atom *katom) struct dma_fence *fence; #endif - fence = kbase_fence_in_get(katom); + lockdep_assert_held(&katom->kctx->jctx.lock); + + fence = katom->dma_fence.fence_in; if (!fence) return 0; /* no input fence to wait for, good to go! */ - kbase_fence_dep_count_set(katom, 1); + err = dma_fence_add_callback(fence, &katom->dma_fence.fence_cb, + kbase_fence_wait_callback); + if (err == -ENOENT) { + int fence_status = dma_fence_get_status(fence); + + if (fence_status == 1) { + /* Fence is already signaled with no error. The completion + * for FENCE_WAIT softjob can be done right away. 
+ */ + return 0; + } - err = kbase_fence_add_callback(katom, fence, kbase_fence_wait_callback); + /* Fence shouldn't be in not signaled state */ + if (!fence_status) { + struct kbase_sync_fence_info info; - kbase_fence_put(fence); + kbase_sync_fence_in_info_get(katom, &info); - if (likely(!err)) { - /* Test if the callbacks are already triggered */ - if (kbase_fence_dep_count_dec_and_test(katom)) { - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); - return 0; /* Already signaled, good to go right now */ + dev_warn(katom->kctx->kbdev->dev, + "Unexpected status for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); } - /* Callback installed, so we just need to wait for it... */ - } else { - /* Failure */ - kbase_fence_free_callbacks(katom); - kbase_fence_dep_count_set(katom, -1); + /* If fence is signaled with an error, then the FENCE_WAIT softjob is + * considered to be failed. + */ + } + if (unlikely(err)) { + /* We should cause the dependent jobs in the bag to be failed. */ katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; - /* We should cause the dependent jobs in the bag to be failed, - * to do this we schedule the work queue to complete this job - */ - kthread_init_work(&katom->work, kbase_sync_fence_wait_worker); - kthread_queue_work(&katom->kctx->kbdev->job_done_worker, &katom->work); + /* The completion for FENCE_WAIT softjob can be done right away. */ + return 0; } - return 1; /* completion to be done later by callback/worker */ + /* Callback was successfully installed */ + katom->dma_fence.fence_cb_added = true; + + /* Completion to be done later by callback/worker */ + return 1; } void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) { - if (!kbase_fence_free_callbacks(katom)) { - /* The wait wasn't cancelled - - * leave the cleanup for kbase_fence_wait_callback - */ - return; - } + lockdep_assert_held(&katom->kctx->jctx.lock); + + if (katom->dma_fence.fence_cb_added) { + if (!dma_fence_remove_callback(katom->dma_fence.fence_in, + &katom->dma_fence.fence_cb)) { + /* The callback is already removed so leave the cleanup + * for kbase_fence_wait_callback. 
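The fence-wait rework above drops kbase's private callback list and dep_count in favour of a single struct dma_fence_cb embedded in the atom, driven directly by the kernel's dma_fence API. A minimal sketch of that pattern, outside kbase and under placeholder names (struct my_waiter, my_fence_signaled(), my_wait_on_fence() and my_cancel_wait() are illustrative, not driver symbols):

#include <linux/dma-fence.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct my_waiter {
	struct dma_fence *fence_in;
	struct dma_fence_cb fence_cb;
	bool fence_cb_added;
	struct work_struct work;	/* INIT_WORK()'d by the owner */
};

/* Called from the signaller's context: defer real work, never block here. */
static void my_fence_signaled(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	struct my_waiter *w = container_of(cb, struct my_waiter, fence_cb);

	schedule_work(&w->work);
}

/*
 * Returns 1 if completion will happen later via the callback, 0 if the fence
 * had already signalled cleanly, and a negative error otherwise.
 */
static int my_wait_on_fence(struct my_waiter *w)
{
	int err = dma_fence_add_callback(w->fence_in, &w->fence_cb,
					 my_fence_signaled);

	if (err == -ENOENT)
		return dma_fence_get_status(w->fence_in) < 0 ? -EIO : 0;
	if (err)
		return err;

	w->fence_cb_added = true;
	return 1;
}

/* True if the wait was cancelled before the callback consumed it. */
static bool my_cancel_wait(struct my_waiter *w)
{
	return w->fence_cb_added &&
	       dma_fence_remove_callback(w->fence_in, &w->fence_cb);
}

The property the patch relies on is that dma_fence_add_callback() returns -ENOENT when the fence has already signalled, and dma_fence_get_status() then distinguishes a clean signal (1) from an error (< 0); that is how the reworked kbase_sync_fence_in_wait() decides between completing the FENCE_WAIT softjob immediately and cancelling it.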
+ */ + return; + } + } else { + struct kbase_sync_fence_info info; - /* Take responsibility of completion */ - kbase_fence_dep_count_set(katom, -1); + kbase_sync_fence_in_info_get(katom, &info); + dev_warn(katom->kctx->kbdev->dev, + "Callback was not added earlier for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); + } /* Wait was cancelled - zap the atoms */ katom->event_code = BASE_JD_EVENT_JOB_CANCELLED; @@ -251,7 +270,7 @@ void kbase_sync_fence_in_cancel_wait(struct kbase_jd_atom *katom) kbasep_remove_waiting_soft_job(katom); kbase_finish_soft_job(katom); - if (jd_done_nolock(katom, true)) + if (kbase_jd_done_nolock(katom, true)) kbase_js_sched_all(katom->kctx->kbdev); } @@ -262,8 +281,29 @@ void kbase_sync_fence_out_remove(struct kbase_jd_atom *katom) void kbase_sync_fence_in_remove(struct kbase_jd_atom *katom) { - kbase_fence_free_callbacks(katom); + lockdep_assert_held(&katom->kctx->jctx.lock); + + if (katom->dma_fence.fence_cb_added) { + bool removed = dma_fence_remove_callback(katom->dma_fence.fence_in, + &katom->dma_fence.fence_cb); + + /* Here it is expected that the callback should have already been removed + * previously either by kbase_sync_fence_in_cancel_wait() or when the fence + * was signaled and kbase_sync_fence_wait_worker() was called. + */ + if (removed) { + struct kbase_sync_fence_info info; + + kbase_sync_fence_in_info_get(katom, &info); + dev_warn(katom->kctx->kbdev->dev, + "Callback was not removed earlier for fence %s of ctx:%d_%d atom:%d", + info.name, katom->kctx->tgid, katom->kctx->id, + kbase_jd_atom_id(katom->kctx, katom)); + } + } + kbase_fence_in_remove(katom); + katom->dma_fence.fence_cb_added = false; } #endif /* !MALI_USE_CSF */ @@ -277,7 +317,7 @@ void kbase_sync_fence_info_get(struct dma_fence *fence, { info->fence = fence; - /* translate into CONFIG_SYNC status: + /* Translate into the following status, with support for error handling: * < 0 : error * 0 : active * 1 : signaled @@ -298,10 +338,7 @@ void kbase_sync_fence_info_get(struct dma_fence *fence, info->status = 0; /* still active (unsignaled) */ } -#if (KERNEL_VERSION(4, 8, 0) > LINUX_VERSION_CODE) - scnprintf(info->name, sizeof(info->name), "%u#%u", - fence->context, fence->seqno); -#elif (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) +#if (KERNEL_VERSION(5, 1, 0) > LINUX_VERSION_CODE) scnprintf(info->name, sizeof(info->name), "%llu#%u", fence->context, fence->seqno); #else diff --git a/mali_kbase/mali_kbase_utility.h b/mali_kbase/mali_kbase_utility.h deleted file mode 100644 index 2dad49b..0000000 --- a/mali_kbase/mali_kbase_utility.h +++ /dev/null @@ -1,52 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * - * (C) COPYRIGHT 2012-2013, 2015, 2018, 2020-2021 ARM Limited. All rights reserved. - * - * This program is free software and is provided to you under the terms of the - * GNU General Public License version 2 as published by the Free Software - * Foundation, and any use by you of this program is subject to the terms - * of such GNU license. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, you can access it online at - * http://www.gnu.org/licenses/gpl-2.0.html. 
- * - */ - -#ifndef _KBASE_UTILITY_H -#define _KBASE_UTILITY_H - -#ifndef _KBASE_H_ -#error "Don't include this file directly, use mali_kbase.h instead" -#endif - -static inline void kbase_timer_setup(struct timer_list *timer, - void (*callback)(struct timer_list *timer)) -{ -#if KERNEL_VERSION(4, 14, 0) > LINUX_VERSION_CODE - setup_timer(timer, (void (*)(unsigned long)) callback, - (unsigned long) timer); -#else - timer_setup(timer, callback, 0); -#endif -} - -#ifndef WRITE_ONCE - #ifdef ASSIGN_ONCE - #define WRITE_ONCE(x, val) ASSIGN_ONCE(val, x) - #else - #define WRITE_ONCE(x, val) (ACCESS_ONCE(x) = (val)) - #endif -#endif - -#ifndef READ_ONCE - #define READ_ONCE(x) ACCESS_ONCE(x) -#endif - -#endif /* _KBASE_UTILITY_H */ diff --git a/mali_kbase/mali_kbase_vinstr.c b/mali_kbase/mali_kbase_vinstr.c index d7a6c98..eb6911e 100644 --- a/mali_kbase/mali_kbase_vinstr.c +++ b/mali_kbase/mali_kbase_vinstr.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2011-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,11 +20,11 @@ */ #include "mali_kbase_vinstr.h" -#include "mali_kbase_hwcnt_virtualizer.h" -#include "mali_kbase_hwcnt_types.h" +#include "hwcnt/mali_kbase_hwcnt_virtualizer.h" +#include "hwcnt/mali_kbase_hwcnt_types.h" #include <uapi/gpu/arm/midgard/mali_kbase_hwcnt_reader.h> -#include "mali_kbase_hwcnt_gpu.h" -#include "mali_kbase_hwcnt_gpu_narrow.h" +#include "hwcnt/mali_kbase_hwcnt_gpu.h" +#include "hwcnt/mali_kbase_hwcnt_gpu_narrow.h" #include <uapi/gpu/arm/midgard/mali_kbase_ioctl.h> #include "mali_malisw.h" #include "mali_kbase_debug.h" @@ -38,8 +38,14 @@ #include <linux/mutex.h> #include <linux/poll.h> #include <linux/slab.h> +#include <linux/version_compat_defs.h> #include <linux/workqueue.h> +/* Explicitly include epoll header for old kernels. Not required from 4.16. 
*/ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + /* Hwcnt reader API version */ #define HWCNT_READER_API 1 @@ -113,9 +119,7 @@ struct kbase_vinstr_client { wait_queue_head_t waitq; }; -static unsigned int kbasep_vinstr_hwcnt_reader_poll( - struct file *filp, - poll_table *wait); +static __poll_t kbasep_vinstr_hwcnt_reader_poll(struct file *filp, poll_table *wait); static long kbasep_vinstr_hwcnt_reader_ioctl( struct file *filp, @@ -453,7 +457,7 @@ static int kbasep_vinstr_client_create( errcode = -ENOMEM; vcli->dump_bufs_meta = kmalloc_array( - setup->buffer_count, sizeof(*vcli->dump_bufs_meta), GFP_KERNEL); + setup->buffer_count, sizeof(*vcli->dump_bufs_meta), GFP_KERNEL | __GFP_ZERO); if (!vcli->dump_bufs_meta) goto error; @@ -517,8 +521,6 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) if (!vctx) return; - cancel_work_sync(&vctx->dump_work); - /* Non-zero client count implies client leak */ if (WARN_ON(vctx->client_count != 0)) { struct kbase_vinstr_client *pos, *n; @@ -530,6 +532,7 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) } } + cancel_work_sync(&vctx->dump_work); kbase_hwcnt_gpu_metadata_narrow_destroy(vctx->metadata_user); WARN_ON(vctx->client_count != 0); @@ -538,8 +541,10 @@ void kbase_vinstr_term(struct kbase_vinstr_context *vctx) void kbase_vinstr_suspend(struct kbase_vinstr_context *vctx) { - if (WARN_ON(!vctx)) + if (!vctx) { + pr_warn("%s: vctx is NULL\n", __func__); return; + } mutex_lock(&vctx->lock); @@ -568,8 +573,10 @@ void kbase_vinstr_suspend(struct kbase_vinstr_context *vctx) void kbase_vinstr_resume(struct kbase_vinstr_context *vctx) { - if (WARN_ON(!vctx)) + if (!vctx) { + pr_warn("%s:vctx is NULL\n", __func__); return; + } mutex_lock(&vctx->lock); @@ -1036,26 +1043,25 @@ static long kbasep_vinstr_hwcnt_reader_ioctl( * @filp: Non-NULL pointer to file structure. * @wait: Non-NULL pointer to poll table. * - * Return: POLLIN if data can be read without blocking, 0 if data can not be - * read without blocking, else error code. + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, 0 if + * data can not be read without blocking, else EPOLLHUP | EPOLLERR. */ -static unsigned int kbasep_vinstr_hwcnt_reader_poll( - struct file *filp, - poll_table *wait) +static __poll_t kbasep_vinstr_hwcnt_reader_poll(struct file *filp, poll_table *wait) { struct kbase_vinstr_client *cli; if (!filp || !wait) - return -EINVAL; + return EPOLLHUP | EPOLLERR; cli = filp->private_data; if (!cli) - return -EINVAL; + return EPOLLHUP | EPOLLERR; poll_wait(filp, &cli->waitq, wait); if (kbasep_vinstr_hwcnt_reader_buffer_ready(cli)) - return POLLIN; - return 0; + return EPOLLIN | EPOLLRDNORM; + + return (__poll_t)0; } /** diff --git a/mali_kbase/mali_linux_trace.h b/mali_kbase/mali_linux_trace.h index 2a243dd..1293a0b 100644 --- a/mali_kbase/mali_linux_trace.h +++ b/mali_kbase/mali_linux_trace.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2011-2016, 2018-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2011-2023 ARM Limited. All rights reserved. 
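The kbasep_vinstr_hwcnt_reader_poll() conversion above follows the kernel-wide switch of poll handlers to the __poll_t type: they now return EPOLL* mask bits rather than negative errnos, hence EPOLLHUP | EPOLLERR on a bad file and EPOLLIN | EPOLLRDNORM when a buffer is ready. A stripped-down handler of the same shape, with hypothetical my_reader naming:

#include <linux/fs.h>
#include <linux/module.h>
#include <linux/poll.h>
#include <linux/wait.h>

struct my_reader {
	wait_queue_head_t waitq;
	bool buffer_ready;	/* stands in for the real ring-buffer test */
};

static __poll_t my_reader_poll(struct file *filp, poll_table *wait)
{
	struct my_reader *r = filp->private_data;

	if (!r)
		return EPOLLHUP | EPOLLERR;

	/* Register on the waitqueue before testing the condition so that a
	 * wake-up racing with this check is not lost.
	 */
	poll_wait(filp, &r->waitq, wait);

	if (READ_ONCE(r->buffer_ready))
		return EPOLLIN | EPOLLRDNORM;

	return 0;
}

static const struct file_operations my_reader_fops = {
	.owner = THIS_MODULE,
	.poll = my_reader_poll,
};

On kernels before 4.16 the EPOLL* constants live in uapi/linux/eventpoll.h, which is why the vinstr change also adds that include behind a version check.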
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -173,7 +173,7 @@ TRACE_EVENT(mali_total_alloc_pages_change, ((status) & AS_FAULTSTATUS_ACCESS_TYPE_MASK) #define KBASE_MMU_FAULT_ACCESS_SYMBOLIC_STRINGS _ENSURE_PARENTHESIS(\ {AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC, "ATOMIC" }, \ - {AS_FAULTSTATUS_ACCESS_TYPE_EX, "EXECUTE"}, \ + {AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE, "EXECUTE"}, \ {AS_FAULTSTATUS_ACCESS_TYPE_READ, "READ" }, \ {AS_FAULTSTATUS_ACCESS_TYPE_WRITE, "WRITE" }) #define KBASE_MMU_FAULT_STATUS_ACCESS_PRINT(status) \ @@ -531,6 +531,23 @@ TRACE_EVENT(mali_jit_trim, TP_printk("freed_pages=%zu", __entry->freed_pages) ); +/* trace_mali_protected_mode + * + * Trace point to indicate if GPU is in protected mode + */ +TRACE_EVENT(mali_protected_mode, + TP_PROTO(bool protm), + TP_ARGS(protm), + TP_STRUCT__entry( + __field(bool, protm) + ), + TP_fast_assign( + __entry->protm = protm; + ), + TP_printk("Protected mode: %d" , __entry->protm) +); + + #include "debug/mali_kbase_debug_linux_ktrace.h" #endif /* _TRACE_MALI_H */ diff --git a/mali_kbase/mali_malisw.h b/mali_kbase/mali_malisw.h index fc8dcbc..d9db189 100644 --- a/mali_kbase/mali_malisw.h +++ b/mali_kbase/mali_malisw.h @@ -19,7 +19,7 @@ * */ -/** +/* * Kernel-wide include for common macros and types. */ @@ -97,16 +97,12 @@ */ #define CSTD_STR2(x) CSTD_STR1(x) -/* LINUX_VERSION_CODE < 5.4 */ -#if (KERNEL_VERSION(5, 4, 0) > LINUX_VERSION_CODE) -#if defined(GCC_VERSION) && GCC_VERSION >= 70000 + #ifndef fallthrough + #define fallthrough __fallthrough + #endif /* fallthrough */ + #ifndef __fallthrough #define __fallthrough __attribute__((fallthrough)) #endif /* __fallthrough */ -#define fallthrough __fallthrough -#else -#define fallthrough CSTD_NOP(...) /* fallthrough */ -#endif /* GCC_VERSION >= 70000 */ -#endif /* KERNEL_VERSION(5, 4, 0) */ #endif /* _MALISW_H_ */ diff --git a/mali_kbase/mali_kbase_strings.c b/mali_kbase/mali_power_gpu_work_period_trace.c index 84784be..8e7bf6f 100644 --- a/mali_kbase/mali_kbase_strings.c +++ b/mali_kbase/mali_power_gpu_work_period_trace.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2016, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -19,10 +19,10 @@ * */ -#include "mali_kbase_strings.h" - -#define KBASE_DRV_NAME "mali" -#define KBASE_TIMELINE_NAME KBASE_DRV_NAME ".timeline" - -const char kbase_drv_name[] = KBASE_DRV_NAME; -const char kbase_timeline_name[] = KBASE_TIMELINE_NAME; +/* Create the trace point if not configured in kernel */ +#ifndef CONFIG_TRACE_POWER_GPU_WORK_PERIOD +#if IS_ENABLED(CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD) +#define CREATE_TRACE_POINTS +#include "mali_power_gpu_work_period_trace.h" +#endif /* CONFIG_MALI_TRACE_POWER_GPU_WORK_PERIOD */ +#endif diff --git a/mali_kbase/mali_power_gpu_work_period_trace.h b/mali_kbase/mali_power_gpu_work_period_trace.h new file mode 100644 index 0000000..46e86ad --- /dev/null +++ b/mali_kbase/mali_power_gpu_work_period_trace.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2023 ARM Limited. All rights reserved. 
+ * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +#ifndef _TRACE_POWER_GPU_WORK_PERIOD_MALI +#define _TRACE_POWER_GPU_WORK_PERIOD_MALI +#endif + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM power +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE mali_power_gpu_work_period_trace +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . + +#if !defined(_TRACE_POWER_GPU_WORK_PERIOD_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_POWER_GPU_WORK_PERIOD_H + +#include <linux/tracepoint.h> + +/** + * gpu_work_period - Reports GPU work period metrics + * + * @gpu_id: Unique GPU Identifier + * @uid: UID of an application + * @start_time_ns: Start time of a GPU work period in nanoseconds + * @end_time_ns: End time of a GPU work period in nanoseconds + * @total_active_duration_ns: Total amount of time the GPU was running GPU work for given + * UID during the GPU work period, in nanoseconds. This duration does + * not double-account parallel GPU work for the same UID. + */ +TRACE_EVENT(gpu_work_period, + + TP_PROTO( + u32 gpu_id, + u32 uid, + u64 start_time_ns, + u64 end_time_ns, + u64 total_active_duration_ns + ), + + TP_ARGS(gpu_id, uid, start_time_ns, end_time_ns, total_active_duration_ns), + + TP_STRUCT__entry( + __field(u32, gpu_id) + __field(u32, uid) + __field(u64, start_time_ns) + __field(u64, end_time_ns) + __field(u64, total_active_duration_ns) + ), + + TP_fast_assign( + __entry->gpu_id = gpu_id; + __entry->uid = uid; + __entry->start_time_ns = start_time_ns; + __entry->end_time_ns = end_time_ns; + __entry->total_active_duration_ns = total_active_duration_ns; + ), + + TP_printk("gpu_id=%u uid=%u start_time_ns=%llu end_time_ns=%llu total_active_duration_ns=%llu", + __entry->gpu_id, + __entry->uid, + __entry->start_time_ns, + __entry->end_time_ns, + __entry->total_active_duration_ns) +); + +#endif /* _TRACE_POWER_GPU_WORK_PERIOD_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c b/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c index c9ba3fc..a057d3c 100644 --- a/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c +++ b/mali_kbase/mmu/backend/mali_kbase_mmu_csf.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -88,12 +88,11 @@ static void submit_work_pagefault(struct kbase_device *kbdev, u32 as_nr, * context's address space, when the page fault occurs for * MCU's address space. 
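The new mali_power_gpu_work_period_trace.h above is a standard TRACE_EVENT header: it does nothing on its own until exactly one .c file defines CREATE_TRACE_POINTS before including it (which is what the new mali_power_gpu_work_period_trace.c does, guarded so the tracepoint is only instantiated when the kernel does not already provide power/gpu_work_period), and emitters then call the generated trace_gpu_work_period() stub. A hedged sketch of the consumer side; emit_work_period() is illustrative, not a kbase function:

/* In exactly one compilation unit: instantiate the tracepoint bodies. */
#define CREATE_TRACE_POINTS
#include "mali_power_gpu_work_period_trace.h"

/*
 * Everywhere else, include the header without CREATE_TRACE_POINTS and call
 * the generated stub; it compiles down to a static-key no-op unless the
 * power:gpu_work_period event has been enabled.
 */
static void emit_work_period(u32 gpu_id, u32 uid, u64 start_ns, u64 end_ns,
			     u64 active_ns)
{
	if (trace_gpu_work_period_enabled())
		trace_gpu_work_period(gpu_id, uid, start_ns, end_ns, active_ns);
}

Because the header sets TRACE_INCLUDE_PATH to ".", the including file's directory typically has to be added to the include path (ccflags-y += -I$(src) style) for the trace machinery to find it again.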
*/ - if (!queue_work(as->pf_wq, &as->work_pagefault)) - kbase_ctx_sched_release_ctx(kctx); - else { + if (!queue_work(as->pf_wq, &as->work_pagefault)) { dev_dbg(kbdev->dev, - "Page fault is already pending for as %u\n", - as_nr); + "Page fault is already pending for as %u", as_nr); + kbase_ctx_sched_release_ctx(kctx); + } else { atomic_inc(&kbdev->faults_pending); } } @@ -122,6 +121,8 @@ void kbase_mmu_report_mcu_as_fault_and_reset(struct kbase_device *kbdev, access_type, kbase_gpu_access_type_name(fault->status), source_id); + kbase_debug_csf_fault_notify(kbdev, NULL, DF_GPU_PAGE_FAULT); + /* Report MMU fault for all address spaces (except MCU_AS_NR) */ for (as_no = 1; as_no < kbdev->nr_hw_address_spaces; as_no++) submit_work_pagefault(kbdev, as_no, fault); @@ -145,21 +146,21 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, GPU_FAULTSTATUS_ACCESS_TYPE_SHIFT; int source_id = (status & GPU_FAULTSTATUS_SOURCE_ID_MASK) >> GPU_FAULTSTATUS_SOURCE_ID_SHIFT; - const char *addr_valid = (status & GPU_FAULTSTATUS_ADDR_VALID_FLAG) ? - "true" : "false"; + const char *addr_valid = (status & GPU_FAULTSTATUS_ADDRESS_VALID_MASK) ? "true" : "false"; int as_no = as->number; unsigned long flags; + const uintptr_t fault_addr = fault->addr; /* terminal fault, print info about the fault */ dev_err(kbdev->dev, - "GPU bus fault in AS%d at VA 0x%016llX\n" - "VA_VALID: %s\n" + "GPU bus fault in AS%d at PA %pK\n" + "PA_VALID: %s\n" "raw fault status: 0x%X\n" "exception type 0x%X: %s\n" "access type 0x%X: %s\n" "source id 0x%X\n" "pid: %d\n", - as_no, fault->addr, + as_no, (void *)fault_addr, addr_valid, status, exception_type, kbase_gpu_exception_name(exception_type), @@ -188,6 +189,7 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), GPU_COMMAND_CLEAR_FAULT); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } /* @@ -244,6 +246,8 @@ void kbase_mmu_report_fault_and_kill(struct kbase_context *kctx, spin_lock_irqsave(&kbdev->hwaccess_lock, flags); kbase_mmu_disable(kctx); kbase_ctx_flag_set(kctx, KCTX_AS_DISABLED_ON_FAULT); + kbase_debug_csf_fault_notify(kbdev, kctx, DF_GPU_PAGE_FAULT); + kbase_csf_ctx_report_page_fault_for_active_groups(kctx, fault); spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); mutex_unlock(&kbdev->mmu_hw_mutex); @@ -262,6 +266,7 @@ void kbase_mmu_report_fault_and_kill(struct kbase_context *kctx, KBASE_MMU_FAULT_TYPE_PAGE_UNEXPECTED); kbase_mmu_hw_enable_fault(kbdev, as, KBASE_MMU_FAULT_TYPE_PAGE_UNEXPECTED); + } /** @@ -363,9 +368,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* remember current mask */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - new_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + new_mask = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); /* mask interrupts for now */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); while (pf_bits) { @@ -375,11 +380,11 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) struct kbase_fault *fault = &as->pf_data; /* find faulting address */ - fault->addr = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_HI)); + fault->addr = kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_HI))); fault->addr <<= 32; - fault->addr |= kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_LO)); + fault->addr |= kbase_reg_read( + kbdev, 
MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_LO))); /* Mark the fault protected or not */ fault->protected_mode = false; @@ -388,14 +393,14 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kbase_as_fault_debugfs_new(kbdev, as_no); /* record the fault status */ - fault->status = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTSTATUS)); + fault->status = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTSTATUS))); - fault->extra_addr = kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_HI)); + fault->extra_addr = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_HI))); fault->extra_addr <<= 32; - fault->extra_addr |= kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_LO)); + fault->extra_addr |= + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_LO))); /* Mark page fault as handled */ pf_bits &= ~(1UL << as_no); @@ -427,9 +432,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* reenable interrupts */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - tmp = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + tmp = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); new_mask |= tmp; - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), new_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), new_mask); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); } @@ -465,19 +470,16 @@ static void kbase_mmu_gpu_fault_worker(struct work_struct *data) spin_lock_irqsave(&kbdev->hwaccess_lock, flags); fault = &faulting_as->gf_data; status = fault->status; - as_valid = status & GPU_FAULTSTATUS_JASID_VALID_FLAG; + as_valid = status & GPU_FAULTSTATUS_JASID_VALID_MASK; address = fault->addr; spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); dev_warn(kbdev->dev, "GPU Fault 0x%08x (%s) in AS%u at 0x%016llx\n" "ASID_VALID: %s, ADDRESS_VALID: %s\n", - status, - kbase_gpu_exception_name( - GPU_FAULTSTATUS_EXCEPTION_TYPE_GET(status)), - as_nr, address, - as_valid ? "true" : "false", - status & GPU_FAULTSTATUS_ADDR_VALID_FLAG ? "true" : "false"); + status, kbase_gpu_exception_name(GPU_FAULTSTATUS_EXCEPTION_TYPE_GET(status)), + as_nr, address, as_valid ? "true" : "false", + status & GPU_FAULTSTATUS_ADDRESS_VALID_MASK ? "true" : "false"); kctx = kbase_ctx_sched_as_to_ctx(kbdev, as_nr); kbase_csf_ctx_handle_fault(kctx, fault); @@ -547,14 +549,14 @@ void kbase_mmu_gpu_fault_interrupt(struct kbase_device *kbdev, u32 status, } KBASE_EXPORT_TEST_API(kbase_mmu_gpu_fault_interrupt); -int kbase_mmu_as_init(struct kbase_device *kbdev, int i) +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i) { kbdev->as[i].number = i; kbdev->as[i].bf_data.addr = 0ULL; kbdev->as[i].pf_data.addr = 0ULL; kbdev->as[i].gf_data.addr = 0ULL; - kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", 0, 1, i); + kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", WQ_UNBOUND, 0, i); if (!kbdev->as[i].pf_wq) return -ENOMEM; diff --git a/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c b/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c index fad5554..5c774c2 100644 --- a/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c +++ b/mali_kbase/mmu/backend/mali_kbase_mmu_jm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -63,15 +63,16 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, u32 const exception_data = (status >> 8) & 0xFFFFFF; int const as_no = as->number; unsigned long flags; + const uintptr_t fault_addr = fault->addr; /* terminal fault, print info about the fault */ dev_err(kbdev->dev, - "GPU bus fault in AS%d at VA 0x%016llX\n" + "GPU bus fault in AS%d at PA %pK\n" "raw fault status: 0x%X\n" "exception type 0x%X: %s\n" "exception data 0x%X\n" "pid: %d\n", - as_no, fault->addr, + as_no, (void *)fault_addr, status, exception_type, kbase_gpu_exception_name(exception_type), exception_data, @@ -94,6 +95,7 @@ void kbase_gpu_report_bus_fault_and_kill(struct kbase_context *kctx, KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED); kbase_mmu_hw_enable_fault(kbdev, as, KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED); + } /* @@ -320,14 +322,14 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* remember current mask */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - new_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + new_mask = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); /* mask interrupts for now */ - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), 0); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), 0); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); while (bf_bits | pf_bits) { struct kbase_as *as; - int as_no; + unsigned int as_no; struct kbase_context *kctx; struct kbase_fault *fault; @@ -353,11 +355,11 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kctx = kbase_ctx_sched_as_to_ctx_refcount(kbdev, as_no); /* find faulting address */ - fault->addr = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_HI)); + fault->addr = kbase_reg_read(kbdev, + MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_HI))); fault->addr <<= 32; - fault->addr |= kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTADDRESS_LO)); + fault->addr |= kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTADDRESS_LO))); /* Mark the fault protected or not */ fault->protected_mode = kbdev->protected_mode; @@ -370,13 +372,13 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) kbase_as_fault_debugfs_new(kbdev, as_no); /* record the fault status */ - fault->status = kbase_reg_read(kbdev, MMU_AS_REG(as_no, - AS_FAULTSTATUS)); - fault->extra_addr = kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_HI)); + fault->status = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTSTATUS))); + fault->extra_addr = + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_HI))); fault->extra_addr <<= 32; - fault->extra_addr |= kbase_reg_read(kbdev, - MMU_AS_REG(as_no, AS_FAULTEXTRA_LO)); + fault->extra_addr |= + kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_no, AS_FAULTEXTRA_LO))); if (kbase_as_has_bus_fault(as, fault)) { /* Mark bus fault as handled. 
@@ -404,9 +406,9 @@ void kbase_mmu_interrupt(struct kbase_device *kbdev, u32 irq_stat) /* reenable interrupts */ spin_lock_irqsave(&kbdev->mmu_mask_change, flags); - tmp = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)); + tmp = kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)); new_mask |= tmp; - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), new_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), new_mask); spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); dev_dbg(kbdev->dev, "Leaving %s irq_stat %u\n", @@ -422,13 +424,13 @@ int kbase_mmu_switch_to_ir(struct kbase_context *const kctx, return kbase_job_slot_softstop_start_rp(kctx, reg); } -int kbase_mmu_as_init(struct kbase_device *kbdev, int i) +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i) { kbdev->as[i].number = i; kbdev->as[i].bf_data.addr = 0ULL; kbdev->as[i].pf_data.addr = 0ULL; - kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%d", 0, 1, i); + kbdev->as[i].pf_wq = alloc_workqueue("mali_mmu%u", 0, 0, i); if (!kbdev->as[i].pf_wq) return -ENOMEM; diff --git a/mali_kbase/mmu/mali_kbase_mmu.c b/mali_kbase/mmu/mali_kbase_mmu.c index 26ddd95..d6b4eb7 100644 --- a/mali_kbase/mmu/mali_kbase_mmu.c +++ b/mali_kbase/mmu/mali_kbase_mmu.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,6 +25,7 @@ #include <linux/kernel.h> #include <linux/dma-mapping.h> +#include <linux/migrate.h> #include <mali_kbase.h> #include <gpu/mali_kbase_gpu_fault.h> #include <gpu/mali_kbase_gpu_regmap.h> @@ -45,10 +46,35 @@ #if !MALI_USE_CSF #include <mali_kbase_hwaccess_jm.h> #endif +#include <linux/version_compat_defs.h> #include <mali_kbase_trace_gpu_mem.h> #include <backend/gpu/mali_kbase_pm_internal.h> +/* Threshold used to decide whether to flush full caches or just a physical range */ +#define KBASE_PA_RANGE_THRESHOLD_NR_PAGES 20 +#define MGM_DEFAULT_PTE_GROUP (0) + +/* Macro to convert updated PDGs to flags indicating levels skip in flush */ +#define pgd_level_to_skip_flush(dirty_pgds) (~(dirty_pgds) & 0xF) + +static int mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg, bool ignore_page_migration); + +/* Small wrapper function to factor out GPU-dependent context releasing */ +static void release_ctx(struct kbase_device *kbdev, + struct kbase_context *kctx) +{ +#if MALI_USE_CSF + CSTD_UNUSED(kbdev); + kbase_ctx_sched_release_ctx_lock(kctx); +#else /* MALI_USE_CSF */ + kbasep_js_runpool_release_ctx(kbdev, kctx); +#endif /* MALI_USE_CSF */ +} + static void mmu_hw_operation_begin(struct kbase_device *kbdev) { #if !IS_ENABLED(CONFIG_MALI_NO_MALI) @@ -91,7 +117,8 @@ static void mmu_hw_operation_end(struct kbase_device *kbdev) /** * mmu_flush_cache_on_gpu_ctrl() - Check if cache flush needs to be done - * through GPU_CONTROL interface + * through GPU_CONTROL interface. + * * @kbdev: kbase device to check GPU model ID on. 
* * This function returns whether a cache flush for page table update should @@ -109,119 +136,213 @@ static bool mmu_flush_cache_on_gpu_ctrl(struct kbase_device *kbdev) } /** - * mmu_flush_invalidate_on_gpu_ctrl() - Flush and invalidate the GPU caches - * through GPU_CONTROL interface. - * @kbdev: kbase device to issue the MMU operation on. - * @as: address space to issue the MMU operation on. - * @op_param: parameters for the operation. + * mmu_flush_pa_range() - Flush physical address range * - * This wrapper function alternates AS_COMMAND_FLUSH_PT and AS_COMMAND_FLUSH_MEM - * to equivalent GPU_CONTROL command FLUSH_CACHES. - * The function first issue LOCK to MMU-AS with kbase_mmu_hw_do_operation(). - * And issues cache-flush with kbase_gpu_cache_flush_and_busy_wait() function - * then issue UNLOCK to MMU-AS with kbase_mmu_hw_do_operation(). + * @kbdev: kbase device to issue the MMU operation on. + * @phys: Starting address of the physical range to start the operation on. + * @nr_bytes: Number of bytes to work on. + * @op: Type of cache flush operation to perform. * - * Return: Zero if the operation was successful, non-zero otherwise. + * Issue a cache flush physical range command. */ -static int -mmu_flush_invalidate_on_gpu_ctrl(struct kbase_device *kbdev, - struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param) +#if MALI_USE_CSF +static void mmu_flush_pa_range(struct kbase_device *kbdev, phys_addr_t phys, size_t nr_bytes, + enum kbase_mmu_op_type op) { u32 flush_op; - int ret, ret2; - - if (WARN_ON(kbdev == NULL) || - WARN_ON(as == NULL) || - WARN_ON(op_param == NULL)) - return -EINVAL; lockdep_assert_held(&kbdev->hwaccess_lock); - lockdep_assert_held(&kbdev->mmu_hw_mutex); /* Translate operation to command */ - if (op_param->op == KBASE_MMU_OP_FLUSH_PT) { - flush_op = GPU_COMMAND_CACHE_CLN_INV_L2; - } else if (op_param->op == KBASE_MMU_OP_FLUSH_MEM) { - flush_op = GPU_COMMAND_CACHE_CLN_INV_L2_LSC; - } else { - dev_warn(kbdev->dev, "Invalid flush request (op = %d)\n", - op_param->op); - return -EINVAL; + if (op == KBASE_MMU_OP_FLUSH_PT) + flush_op = GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2; + else if (op == KBASE_MMU_OP_FLUSH_MEM) + flush_op = GPU_COMMAND_FLUSH_PA_RANGE_CLN_INV_L2_LSC; + else { + dev_warn(kbdev->dev, "Invalid flush request (op = %d)", op); + return; } - /* 1. Issue MMU_AS_CONTROL.COMMAND.LOCK operation. */ - op_param->op = KBASE_MMU_OP_LOCK; - ret = kbase_mmu_hw_do_operation(kbdev, as, op_param); - if (ret) - return ret; + if (kbase_gpu_cache_flush_pa_range_and_busy_wait(kbdev, phys, nr_bytes, flush_op)) + dev_err(kbdev->dev, "Flush for physical address range did not complete"); +} +#endif - /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_CACHES operation */ - ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, flush_op); +/** + * mmu_invalidate() - Perform an invalidate operation on MMU caches. + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. + * + * Perform an MMU invalidate operation on a particual address space + * by issuing a UNLOCK command. + */ +static void mmu_invalidate(struct kbase_device *kbdev, struct kbase_context *kctx, int as_nr, + const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; - /* 3. Issue MMU_AS_CONTROL.COMMAND.UNLOCK operation. 
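The dirty_pgds bookkeeping introduced above records, as a bitmask, which page-table levels were actually written during an update; pgd_level_to_skip_flush() then inverts the low four bits, and the result is used later in this patch to set op_param.flush_skip_levels so the MMU flush can skip untouched levels. A tiny self-contained illustration (the macro is restated only so the snippet stands alone):

#include <linux/bits.h>
#include <linux/types.h>

/* Same definition as the one added to mali_kbase_mmu.c above. */
#define pgd_level_to_skip_flush(dirty_pgds) (~(dirty_pgds) & 0xF)

/* Levels 0 and 3 were written during the update... */
static inline u64 example_skip_mask(void)
{
	const u64 dirty_pgds = BIT(0) | BIT(3);	/* 0b1001 */

	/* ...so levels 1 and 2 may be skipped: (~0b1001) & 0xF == 0b0110 */
	return pgd_level_to_skip_flush(dirty_pgds);
}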
*/ - op_param->op = KBASE_MMU_OP_UNLOCK; - ret2 = kbase_mmu_hw_do_operation(kbdev, as, op_param); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - return ret ?: ret2; + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) { + as_nr = kctx ? kctx->as_nr : as_nr; + if (kbase_mmu_hw_do_unlock(kbdev, &kbdev->as[as_nr], op_param)) + dev_err(kbdev->dev, + "Invalidate after GPU page table update did not complete"); + } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); +} + +/* Perform a flush/invalidate on a particular address space + */ +static void mmu_flush_invalidate_as(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; + + /* AS transaction begin */ + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + if (kbdev->pm.backend.gpu_ready && (kbase_mmu_hw_do_flush_locked(kbdev, as, op_param))) + dev_err(kbdev->dev, "Flush for GPU page table update did not complete"); + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); + /* AS transaction end */ } /** - * kbase_mmu_flush_invalidate() - Flush and invalidate the GPU caches. - * @kctx: The KBase context. - * @vpfn: The virtual page frame number to start the flush on. - * @nr: The number of pages to flush. - * @sync: Set if the operation should be synchronous or not. + * mmu_flush_invalidate() - Perform a flush operation on GPU caches. + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which flush + invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. * - * Issue a cache flush + invalidate to the GPU caches and invalidate the TLBs. + * This function performs the cache flush operation described by @op_param. + * The function retains a reference to the given @kctx and releases it + * after performing the flush operation. * - * If sync is not set then transactions still in flight when the flush is issued - * may use the old page tables and the data they write will not be written out - * to memory, this function returns after the flush has been issued but - * before all accesses which might effect the flushed region have completed. + * If operation is set to KBASE_MMU_OP_FLUSH_PT then this function will issue + * a cache flush + invalidate to the L2 caches and invalidate the TLBs. * - * If sync is set then accesses in the flushed region will be drained - * before data is flush and invalidated through L1, L2 and into memory, - * after which point this function will return. - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * If operation is set to KBASE_MMU_OP_FLUSH_MEM then this function will issue + * a cache flush + invalidate to the L2 and GPU Load/Store caches as well as + * invalidating the TLBs. */ -static void -kbase_mmu_flush_invalidate(struct kbase_context *kctx, u64 vpfn, size_t nr, - bool sync, - enum kbase_caller_mmu_sync_info mmu_sync_info); +static void mmu_flush_invalidate(struct kbase_device *kbdev, struct kbase_context *kctx, int as_nr, + const struct kbase_mmu_hw_op_param *op_param) +{ + bool ctx_is_in_runpool; + + /* Early out if there is nothing to do */ + if (op_param->nr == 0) + return; + + /* If no context is provided then MMU operation is performed on address + * space which does not belong to user space context. Otherwise, retain + * refcount to context provided and release after flush operation. 
+ */ + if (!kctx) { + mmu_flush_invalidate_as(kbdev, &kbdev->as[as_nr], op_param); + } else { +#if !MALI_USE_CSF + rt_mutex_lock(&kbdev->js_data.queue_mutex); + ctx_is_in_runpool = kbase_ctx_sched_inc_refcount(kctx); + rt_mutex_unlock(&kbdev->js_data.queue_mutex); +#else + ctx_is_in_runpool = kbase_ctx_sched_inc_refcount_if_as_valid(kctx); +#endif /* !MALI_USE_CSF */ + + if (ctx_is_in_runpool) { + KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); + + mmu_flush_invalidate_as(kbdev, &kbdev->as[kctx->as_nr], op_param); + + release_ctx(kbdev, kctx); + } + } +} /** - * kbase_mmu_flush_invalidate_no_ctx() - Flush and invalidate the GPU caches. - * @kbdev: Device pointer. - * @vpfn: The virtual page frame number to start the flush on. - * @nr: The number of pages to flush. - * @sync: Set if the operation should be synchronous or not. - * @as_nr: GPU address space number for which flush + invalidate is required. - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * mmu_flush_invalidate_on_gpu_ctrl() - Perform a flush operation on GPU caches via + * the GPU_CONTROL interface + * @kbdev: The Kbase device. + * @kctx: The Kbase context. + * @as_nr: GPU address space number for which flush + invalidate is required. + * @op_param: Non-NULL pointer to struct containing information about the MMU + * operation to perform. * - * This is used for MMU tables which do not belong to a user space context. + * Perform a flush/invalidate on a particular address space via the GPU_CONTROL + * interface. */ -static void kbase_mmu_flush_invalidate_no_ctx( - struct kbase_device *kbdev, u64 vpfn, size_t nr, bool sync, int as_nr, - enum kbase_caller_mmu_sync_info mmu_sync_info); +static void mmu_flush_invalidate_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_context *kctx, + int as_nr, const struct kbase_mmu_hw_op_param *op_param) +{ + unsigned long flags; + + /* AS transaction begin */ + mutex_lock(&kbdev->mmu_hw_mutex); + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) { + as_nr = kctx ? kctx->as_nr : as_nr; + if (kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, &kbdev->as[as_nr], op_param)) + dev_err(kbdev->dev, "Flush for GPU page table update did not complete"); + } + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + mutex_unlock(&kbdev->mmu_hw_mutex); +} + +static void kbase_mmu_sync_pgd_gpu(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op) +{ + kbase_mmu_flush_pa_range(kbdev, kctx, phys, size, flush_op); +} + +static void kbase_mmu_sync_pgd_cpu(struct kbase_device *kbdev, dma_addr_t handle, size_t size) +{ + /* Ensure that the GPU can read the pages from memory. + * + * pixel: b/200555454 requires this sync to happen even if the system + * is coherent. + */ + dma_sync_single_for_device(kbdev->dev, handle, size, + DMA_TO_DEVICE); +} /** * kbase_mmu_sync_pgd() - sync page directory to memory when needed. - * @kbdev: Device pointer. - * @handle: Address of DMA region. - * @size: Size of the region to sync. + * @kbdev: Device pointer. + * @kctx: Context pointer. + * @phys: Starting physical address of the destination region. + * @handle: Address of DMA region. + * @size: Size of the region to sync. + * @flush_op: MMU cache flush operation to perform on the physical address + * range, if GPU control is available. 
+ * + * This function is called whenever the association between a virtual address + * range and a physical address range changes, because a mapping is created or + * destroyed. + * One of the effects of this operation is performing an MMU cache flush + * operation only on the physical address range affected by this function, if + * GPU control is available. * * This should be called after each page directory update. */ -static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, - dma_addr_t handle, size_t size) +static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, dma_addr_t handle, size_t size, + enum kbase_mmu_op_type flush_op) { - /* In non-coherent system, ensure the GPU can read - * the pages from memory - */ - if (kbdev->system_coherency == COHERENCY_NONE) - dma_sync_single_for_device(kbdev->dev, handle, size, - DMA_TO_DEVICE); + + kbase_mmu_sync_pgd_cpu(kbdev, handle, size); + kbase_mmu_sync_pgd_gpu(kbdev, kctx, phys, size, flush_op); } /* @@ -233,35 +354,153 @@ static void kbase_mmu_sync_pgd(struct kbase_device *kbdev, * a 4kB physical page. */ -static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int group_id); - /** * kbase_mmu_update_and_free_parent_pgds() - Update number of valid entries and * free memory of the page directories * - * @kbdev: Device pointer. - * @mmut: GPU MMU page table. - * @pgds: Physical addresses of page directories to be freed. - * @vpfn: The virtual page frame number. - * @level: The level of MMU page table. + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @pgds: Physical addresses of page directories to be freed. + * @vpfn: The virtual page frame number. + * @level: The level of MMU page table. + * @flush_op: The type of MMU flush operation to perform. + * @dirty_pgds: Flags to track every level where a PGD has been updated. */ static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgds, u64 vpfn, - int level); + struct kbase_mmu_table *mmut, phys_addr_t *pgds, + u64 vpfn, int level, + enum kbase_mmu_op_type flush_op, u64 *dirty_pgds); + +static void kbase_mmu_account_freed_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) +{ + atomic_sub(1, &kbdev->memdev.used_pages); + + /* If MMU tables belong to a context then pages will have been accounted + * against it, so we must decrement the usage counts here. 
+ */ + if (mmut->kctx) { + kbase_process_page_usage_dec(mmut->kctx, 1); + atomic_sub(1, &mmut->kctx->used_pages); + } + + kbase_trace_gpu_mem_usage_dec(kbdev, mmut->kctx, 1); +} + +static bool kbase_mmu_handle_isolated_pgd_page(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, + struct page *p) +{ + struct kbase_page_metadata *page_md = kbase_page_private(p); + bool page_is_isolated = false; + + lockdep_assert_held(&mmut->mmu_lock); + + if (!kbase_is_page_migration_enabled()) + return false; + + spin_lock(&page_md->migrate_lock); + if (PAGE_STATUS_GET(page_md->status) == PT_MAPPED) { + WARN_ON_ONCE(!mmut->kctx); + if (IS_PAGE_ISOLATED(page_md->status)) { + page_md->status = PAGE_STATUS_SET(page_md->status, + FREE_PT_ISOLATED_IN_PROGRESS); + page_md->data.free_pt_isolated.kbdev = kbdev; + page_is_isolated = true; + } else { + page_md->status = + PAGE_STATUS_SET(page_md->status, FREE_IN_PROGRESS); + } + } else if ((PAGE_STATUS_GET(page_md->status) == FREE_IN_PROGRESS) || + (PAGE_STATUS_GET(page_md->status) == ALLOCATE_IN_PROGRESS)) { + /* Nothing to do - fall through */ + } else { + WARN_ON_ONCE(PAGE_STATUS_GET(page_md->status) != NOT_MOVABLE); + } + spin_unlock(&page_md->migrate_lock); + + if (unlikely(page_is_isolated)) { + /* Do the CPU cache flush and accounting here for the isolated + * PGD page, which is done inside kbase_mmu_free_pgd() for the + * PGD page that did not get isolated. + */ + dma_sync_single_for_device(kbdev->dev, kbase_dma_addr(p), PAGE_SIZE, + DMA_BIDIRECTIONAL); + kbase_mmu_account_freed_pgd(kbdev, mmut); + } + + return page_is_isolated; +} + /** * kbase_mmu_free_pgd() - Free memory of the page directory * * @kbdev: Device pointer. * @mmut: GPU MMU page table. * @pgd: Physical address of page directory to be freed. - * @dirty: Flag to indicate whether the page may be dirty in the cache. + * + * This function is supposed to be called with mmu_lock held and after + * ensuring that the GPU won't be able to access the page. */ -static void kbase_mmu_free_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - bool dirty); +static void kbase_mmu_free_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t pgd) +{ + struct page *p; + bool page_is_isolated = false; + + lockdep_assert_held(&mmut->mmu_lock); + + p = pfn_to_page(PFN_DOWN(pgd)); + page_is_isolated = kbase_mmu_handle_isolated_pgd_page(kbdev, mmut, p); + + if (likely(!page_is_isolated)) { + kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], p, true); + kbase_mmu_account_freed_pgd(kbdev, mmut); + } +} + +/** + * kbase_mmu_free_pgds_list() - Free the PGD pages present in the list + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * + * This function will call kbase_mmu_free_pgd() on each page directory page + * present in the list of free PGDs inside @mmut. + * + * The function is supposed to be called after the GPU cache and MMU TLB has + * been invalidated post the teardown loop. + * + * The mmu_lock shall be held prior to calling the function. 
+ */ +static void kbase_mmu_free_pgds_list(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) +{ + size_t i; + + lockdep_assert_held(&mmut->mmu_lock); + + for (i = 0; i < mmut->scratch_mem.free_pgds.head_index; i++) + kbase_mmu_free_pgd(kbdev, mmut, page_to_phys(mmut->scratch_mem.free_pgds.pgds[i])); + + mmut->scratch_mem.free_pgds.head_index = 0; +} + +static void kbase_mmu_add_to_free_pgds_list(struct kbase_mmu_table *mmut, struct page *p) +{ + lockdep_assert_held(&mmut->mmu_lock); + + if (WARN_ON_ONCE(mmut->scratch_mem.free_pgds.head_index > (MAX_FREE_PGDS - 1))) + return; + + mmut->scratch_mem.free_pgds.pgds[mmut->scratch_mem.free_pgds.head_index++] = p; +} + +static inline void kbase_mmu_reset_free_pgds_list(struct kbase_mmu_table *mmut) +{ + lockdep_assert_held(&mmut->mmu_lock); + + mmut->scratch_mem.free_pgds.head_index = 0; +} + /** * reg_grow_calc_extra_pages() - Calculate the number of backed pages to add to * a region on a GPU page fault @@ -289,7 +528,7 @@ static size_t reg_grow_calc_extra_pages(struct kbase_device *kbdev, if (!multiple) { dev_warn( kbdev->dev, - "VA Region 0x%llx extension was 0, allocator needs to set this properly for KBASE_REG_PF_GROW\n", + "VA Region 0x%llx extension was 0, allocator needs to set this properly for KBASE_REG_PF_GROW", ((unsigned long long)reg->start_pfn) << PAGE_SHIFT); return minimum_extra; } @@ -345,13 +584,14 @@ static size_t reg_grow_calc_extra_pages(struct kbase_device *kbdev, static void kbase_gpu_mmu_handle_write_faulting_as(struct kbase_device *kbdev, struct kbase_as *faulting_as, u64 start_pfn, size_t nr, - u32 kctx_id) + u32 kctx_id, u64 dirty_pgds) { /* Calls to this function are inherently synchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_SYNC; struct kbase_mmu_hw_op_param op_param; + int ret = 0; mutex_lock(&kbdev->mmu_hw_mutex); @@ -359,27 +599,31 @@ static void kbase_gpu_mmu_handle_write_faulting_as(struct kbase_device *kbdev, KBASE_MMU_FAULT_TYPE_PAGE); /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = start_pfn, - .nr = nr, - .op = KBASE_MMU_OP_FLUSH_PT, - .kctx_id = kctx_id, - .mmu_sync_info = mmu_sync_info, - }; + op_param.vpfn = start_pfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.kctx_id = kctx_id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { unsigned long irq_flags; spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); - mmu_flush_invalidate_on_gpu_ctrl(kbdev, faulting_as, &op_param); + op_param.flush_skip_levels = + pgd_level_to_skip_flush(dirty_pgds); + ret = kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, faulting_as, &op_param); spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); } else { mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); + ret = kbase_mmu_hw_do_flush(kbdev, faulting_as, &op_param); mmu_hw_operation_end(kbdev); } mutex_unlock(&kbdev->mmu_hw_mutex); + if (ret) + dev_err(kbdev->dev, + "Flush for GPU page fault due to write access did not complete"); + kbase_mmu_hw_enable_fault(kbdev, faulting_as, KBASE_MMU_FAULT_TYPE_PAGE); } @@ -412,8 +656,8 @@ static void kbase_gpu_mmu_handle_write_fault(struct kbase_context *kctx, struct tagged_addr *fault_phys_addr; struct kbase_fault *fault; u64 fault_pfn, pfn_offset; - int ret; int as_no; + u64 dirty_pgds = 0; as_no = faulting_as->number; kbdev = container_of(faulting_as, struct kbase_device, as[as_no]); @@ -472,12 +716,11 @@ static 
void kbase_gpu_mmu_handle_write_fault(struct kbase_context *kctx, } /* Now make this faulting page writable to GPU. */ - ret = kbase_mmu_update_pages_no_flush(kctx, fault_pfn, - fault_phys_addr, - 1, region->flags, region->gpu_alloc->group_id); + kbase_mmu_update_pages_no_flush(kbdev, &kctx->mmu, fault_pfn, fault_phys_addr, 1, + region->flags, region->gpu_alloc->group_id, &dirty_pgds); kbase_gpu_mmu_handle_write_faulting_as(kbdev, faulting_as, fault_pfn, 1, - kctx->id); + kctx->id, dirty_pgds); kbase_gpu_vm_unlock(kctx); } @@ -492,7 +735,7 @@ static void kbase_gpu_mmu_handle_permission_fault(struct kbase_context *kctx, case AS_FAULTSTATUS_ACCESS_TYPE_WRITE: kbase_gpu_mmu_handle_write_fault(kctx, faulting_as); break; - case AS_FAULTSTATUS_ACCESS_TYPE_EX: + case AS_FAULTSTATUS_ACCESS_TYPE_EXECUTE: kbase_mmu_report_fault_and_kill(kctx, faulting_as, "Execute Permission fault", fault); break; @@ -508,31 +751,68 @@ static void kbase_gpu_mmu_handle_permission_fault(struct kbase_context *kctx, } #endif -#define MAX_POOL_LEVEL 2 +/** + * estimate_pool_space_required - Determine how much a pool should be grown by to support a future + * allocation + * @pool: The memory pool to check, including its linked pools + * @pages_required: Number of 4KiB pages require for the pool to support a future allocation + * + * The value returned is accounting for the size of @pool and the size of each memory pool linked to + * @pool. Hence, the caller should use @pool and (if not already satisfied) all its linked pools to + * allocate from. + * + * Note: this is only an estimate, because even during the calculation the memory pool(s) involved + * can be updated to be larger or smaller. Hence, the result is only a guide as to whether an + * allocation could succeed, or an estimate of the correct amount to grow the pool by. The caller + * should keep attempting an allocation and then re-growing with a new value queried form this + * function until the allocation succeeds. + * + * Return: an estimate of the amount of extra 4KiB pages in @pool that are required to satisfy an + * allocation, or 0 if @pool (including its linked pools) is likely to already satisfy the + * allocation. + */ +static size_t estimate_pool_space_required(struct kbase_mem_pool *pool, const size_t pages_required) +{ + size_t pages_still_required; + + for (pages_still_required = pages_required; pool != NULL && pages_still_required; + pool = pool->next_pool) { + size_t pool_size_4k; + + kbase_mem_pool_lock(pool); + + pool_size_4k = kbase_mem_pool_size(pool) << pool->order; + if (pool_size_4k >= pages_still_required) + pages_still_required = 0; + else + pages_still_required -= pool_size_4k; + + kbase_mem_pool_unlock(pool); + } + return pages_still_required; +} /** * page_fault_try_alloc - Try to allocate memory from a context pool * @kctx: Context pointer * @region: Region to grow - * @new_pages: Number of 4 kB pages to allocate - * @pages_to_grow: Pointer to variable to store number of outstanding pages on - * failure. This can be either 4 kB or 2 MB pages, depending on - * the number of pages requested. - * @grow_2mb_pool: Pointer to variable to store which pool needs to grow - true - * for 2 MB, false for 4 kB. + * @new_pages: Number of 4 KiB pages to allocate + * @pages_to_grow: Pointer to variable to store number of outstanding pages on failure. This can be + * either 4 KiB or 2 MiB pages, depending on the number of pages requested. + * @grow_2mb_pool: Pointer to variable to store which pool needs to grow - true for 2 MiB, false for + * 4 KiB. 
* @prealloc_sas: Pointer to kbase_sub_alloc structures * - * This function will try to allocate as many pages as possible from the context - * pool, then if required will try to allocate the remaining pages from the - * device pool. + * This function will try to allocate as many pages as possible from the context pool, then if + * required will try to allocate the remaining pages from the device pool. * - * This function will not allocate any new memory beyond that is already - * present in the context or device pools. This is because it is intended to be - * called with the vm_lock held, which could cause recursive locking if the - * allocation caused the out-of-memory killer to run. + * This function will not allocate any new memory beyond that is already present in the context or + * device pools. This is because it is intended to be called whilst the thread has acquired the + * region list lock with kbase_gpu_vm_lock(), and a large enough memory allocation whilst that is + * held could invoke the OoM killer and cause an effective deadlock with kbase_cpu_vm_close(). * - * If 2 MB pages are enabled and new_pages is >= 2 MB then pages_to_grow will be - * a count of 2 MB pages, otherwise it will be a count of 4 kB pages. + * If 2 MiB pages are enabled and new_pages is >= 2 MiB then pages_to_grow will be a count of 2 MiB + * pages, otherwise it will be a count of 4 KiB pages. * * Return: true if successful, false on failure */ @@ -541,13 +821,15 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, int *pages_to_grow, bool *grow_2mb_pool, struct kbase_sub_alloc **prealloc_sas) { - struct tagged_addr *gpu_pages[MAX_POOL_LEVEL] = {NULL}; - struct tagged_addr *cpu_pages[MAX_POOL_LEVEL] = {NULL}; - size_t pages_alloced[MAX_POOL_LEVEL] = {0}; + size_t total_gpu_pages_alloced = 0; + size_t total_cpu_pages_alloced = 0; struct kbase_mem_pool *pool, *root_pool; - int pool_level = 0; bool alloc_failed = false; size_t pages_still_required; + size_t total_mempools_free_4k = 0; + + lockdep_assert_held(&kctx->reg_lock); + lockdep_assert_held(&kctx->mem_partials_lock); if (WARN_ON(region->gpu_alloc->group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS)) { @@ -556,42 +838,21 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return false; } -#ifdef CONFIG_MALI_2MB_ALLOC - if (new_pages >= (SZ_2M / SZ_4K)) { + if (kctx->kbdev->pagesize_2mb && new_pages >= (SZ_2M / SZ_4K)) { root_pool = &kctx->mem_pools.large[region->gpu_alloc->group_id]; *grow_2mb_pool = true; } else { -#endif root_pool = &kctx->mem_pools.small[region->gpu_alloc->group_id]; *grow_2mb_pool = false; -#ifdef CONFIG_MALI_2MB_ALLOC } -#endif if (region->gpu_alloc != region->cpu_alloc) new_pages *= 2; - pages_still_required = new_pages; - /* Determine how many pages are in the pools before trying to allocate. * Don't attempt to allocate & free if the allocation can't succeed. */ - for (pool = root_pool; pool != NULL; pool = pool->next_pool) { - size_t pool_size_4k; - - kbase_mem_pool_lock(pool); - - pool_size_4k = kbase_mem_pool_size(pool) << pool->order; - if (pool_size_4k >= pages_still_required) - pages_still_required = 0; - else - pages_still_required -= pool_size_4k; - - kbase_mem_pool_unlock(pool); - - if (!pages_still_required) - break; - } + pages_still_required = estimate_pool_space_required(root_pool, new_pages); if (pages_still_required) { /* Insufficient pages in pools. 
Don't try to allocate - just @@ -602,11 +863,11 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return false; } - /* Since we've dropped the pool locks, the amount of memory in the pools - * may change between the above check and the actual allocation. + /* Since we're not holding any of the mempool locks, the amount of memory in the pools may + * change between the above estimate and the actual allocation. */ - pool = root_pool; - for (pool_level = 0; pool_level < MAX_POOL_LEVEL; pool_level++) { + pages_still_required = new_pages; + for (pool = root_pool; pool != NULL && pages_still_required; pool = pool->next_pool) { size_t pool_size_4k; size_t pages_to_alloc_4k; size_t pages_to_alloc_4k_per_alloc; @@ -615,94 +876,92 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, /* Allocate as much as possible from this pool*/ pool_size_4k = kbase_mem_pool_size(pool) << pool->order; - pages_to_alloc_4k = MIN(new_pages, pool_size_4k); + total_mempools_free_4k += pool_size_4k; + pages_to_alloc_4k = MIN(pages_still_required, pool_size_4k); if (region->gpu_alloc == region->cpu_alloc) pages_to_alloc_4k_per_alloc = pages_to_alloc_4k; else pages_to_alloc_4k_per_alloc = pages_to_alloc_4k >> 1; - pages_alloced[pool_level] = pages_to_alloc_4k; if (pages_to_alloc_4k) { - gpu_pages[pool_level] = - kbase_alloc_phy_pages_helper_locked( - region->gpu_alloc, pool, - pages_to_alloc_4k_per_alloc, - &prealloc_sas[0]); + struct tagged_addr *gpu_pages = + kbase_alloc_phy_pages_helper_locked(region->gpu_alloc, pool, + pages_to_alloc_4k_per_alloc, + &prealloc_sas[0]); - if (!gpu_pages[pool_level]) { + if (!gpu_pages) alloc_failed = true; - } else if (region->gpu_alloc != region->cpu_alloc) { - cpu_pages[pool_level] = - kbase_alloc_phy_pages_helper_locked( - region->cpu_alloc, pool, - pages_to_alloc_4k_per_alloc, - &prealloc_sas[1]); - - if (!cpu_pages[pool_level]) + else + total_gpu_pages_alloced += pages_to_alloc_4k_per_alloc; + + if (!alloc_failed && region->gpu_alloc != region->cpu_alloc) { + struct tagged_addr *cpu_pages = kbase_alloc_phy_pages_helper_locked( + region->cpu_alloc, pool, pages_to_alloc_4k_per_alloc, + &prealloc_sas[1]); + + if (!cpu_pages) alloc_failed = true; + else + total_cpu_pages_alloced += pages_to_alloc_4k_per_alloc; } } kbase_mem_pool_unlock(pool); if (alloc_failed) { - WARN_ON(!new_pages); - WARN_ON(pages_to_alloc_4k >= new_pages); - WARN_ON(pages_to_alloc_4k_per_alloc >= new_pages); + WARN_ON(!pages_still_required); + WARN_ON(pages_to_alloc_4k >= pages_still_required); + WARN_ON(pages_to_alloc_4k_per_alloc >= pages_still_required); break; } - new_pages -= pages_to_alloc_4k; - - if (!new_pages) - break; - - pool = pool->next_pool; - if (!pool) - break; + pages_still_required -= pages_to_alloc_4k; } - if (new_pages) { - /* Allocation was unsuccessful */ - int max_pool_level = pool_level; - - pool = root_pool; - - /* Free memory allocated so far */ - for (pool_level = 0; pool_level <= max_pool_level; - pool_level++) { - kbase_mem_pool_lock(pool); + if (pages_still_required) { + /* Allocation was unsuccessful. 
We have dropped the mem_pool lock after allocation, + * so must in any case use kbase_free_phy_pages_helper() rather than + * kbase_free_phy_pages_helper_locked() + */ + if (total_gpu_pages_alloced > 0) + kbase_free_phy_pages_helper(region->gpu_alloc, total_gpu_pages_alloced); + if (region->gpu_alloc != region->cpu_alloc && total_cpu_pages_alloced > 0) + kbase_free_phy_pages_helper(region->cpu_alloc, total_cpu_pages_alloced); - if (region->gpu_alloc != region->cpu_alloc) { - if (pages_alloced[pool_level] && - cpu_pages[pool_level]) - kbase_free_phy_pages_helper_locked( - region->cpu_alloc, - pool, cpu_pages[pool_level], - pages_alloced[pool_level]); + if (alloc_failed) { + /* Note that in allocating from the above memory pools, we always ensure + * never to request more than is available in each pool with the pool's + * lock held. Hence failing to allocate in such situations would be unusual + * and we should cancel the growth instead (as re-growing the memory pool + * might not fix the situation) + */ + dev_warn( + kctx->kbdev->dev, + "Page allocation failure of %zu pages: managed %zu pages, mempool (inc linked pools) had %zu pages available", + new_pages, total_gpu_pages_alloced + total_cpu_pages_alloced, + total_mempools_free_4k); + *pages_to_grow = 0; + } else { + /* Tell the caller to try to grow the memory pool + * + * Freeing pages above may have spilled or returned them to the OS, so we + * have to take into account how many are still in the pool before giving a + * new estimate for growth required of the pool. We can just re-estimate a + * new value. + */ + pages_still_required = estimate_pool_space_required(root_pool, new_pages); + if (pages_still_required) { + *pages_to_grow = pages_still_required; + } else { + /* It's possible another thread could've grown the pool to be just + * big enough after we rolled back the allocation. Request at least + * one more page to ensure the caller doesn't fail the growth by + * conflating it with the alloc_failed case above + */ + *pages_to_grow = 1u; } - - if (pages_alloced[pool_level] && gpu_pages[pool_level]) - kbase_free_phy_pages_helper_locked( - region->gpu_alloc, - pool, gpu_pages[pool_level], - pages_alloced[pool_level]); - - kbase_mem_pool_unlock(pool); - - pool = pool->next_pool; } - /* - * If the allocation failed despite there being enough memory in - * the pool, then just fail. Otherwise, try to grow the memory - * pool. 
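
/*
 * [Illustrative sketch, not part of this patch] The decision taken in this
 * error path boils down to the helper below: an allocation failure while the
 * pools still reported enough pages cancels the growth (regrowing would not
 * help), otherwise the shortfall is re-estimated after the roll-back and at
 * least one page is requested so the caller does not mistake the retry for the
 * hard-failure case. Names are illustrative.
 */
#include <stddef.h>

static size_t demo_pages_to_grow(int alloc_failed, size_t shortfall_after_rollback)
{
	if (alloc_failed)
		return 0; /* cancel growth: pools had pages yet allocation failed */

	/* Another thread may have refilled the pools while they were unlocked,
	 * so the new estimate can legitimately be zero; still ask for one page.
	 */
	return shortfall_after_rollback ? shortfall_after_rollback : 1;
}
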
- */ - if (alloc_failed) - *pages_to_grow = 0; - else - *pages_to_grow = new_pages; - return false; } @@ -712,18 +971,6 @@ static bool page_fault_try_alloc(struct kbase_context *kctx, return true; } -/* Small wrapper function to factor out GPU-dependent context releasing */ -static void release_ctx(struct kbase_device *kbdev, - struct kbase_context *kctx) -{ -#if MALI_USE_CSF - CSTD_UNUSED(kbdev); - kbase_ctx_sched_release_ctx_lock(kctx); -#else /* MALI_USE_CSF */ - kbasep_js_runpool_release_ctx(kbdev, kctx); -#endif /* MALI_USE_CSF */ -} - void kbase_mmu_page_fault_worker(struct work_struct *data) { u64 fault_pfn; @@ -758,9 +1005,8 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) as_no = faulting_as->number; kbdev = container_of(faulting_as, struct kbase_device, as[as_no]); - dev_dbg(kbdev->dev, - "Entering %s %pK, fault_pfn %lld, as_no %d\n", - __func__, (void *)data, fault_pfn, as_no); + dev_dbg(kbdev->dev, "Entering %s %pK, fault_pfn %lld, as_no %d", __func__, (void *)data, + fault_pfn, as_no); /* Grab the context that was already refcounted in kbase_mmu_interrupt() * Therefore, it cannot be scheduled out of this AS until we explicitly @@ -783,8 +1029,7 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) #ifdef CONFIG_MALI_ARBITER_SUPPORT /* check if we still have GPU */ if (unlikely(kbase_is_gpu_removed(kbdev))) { - dev_dbg(kbdev->dev, - "%s: GPU has been removed\n", __func__); + dev_dbg(kbdev->dev, "%s: GPU has been removed", __func__); goto fault_done; } #endif @@ -847,20 +1092,24 @@ void kbase_mmu_page_fault_worker(struct work_struct *data) goto fault_done; } -#ifdef CONFIG_MALI_2MB_ALLOC - /* Preallocate memory for the sub-allocation structs if necessary */ - for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { - prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); - if (!prealloc_sas[i]) { - kbase_mmu_report_fault_and_kill(kctx, faulting_as, - "Failed pre-allocating memory for sub-allocations' metadata", - fault); - goto fault_done; +page_fault_retry: + if (kbdev->pagesize_2mb) { + /* Preallocate (or re-allocate) memory for the sub-allocation structs if necessary */ + for (i = 0; i != ARRAY_SIZE(prealloc_sas); ++i) { + if (!prealloc_sas[i]) { + prealloc_sas[i] = kmalloc(sizeof(*prealloc_sas[i]), GFP_KERNEL); + + if (!prealloc_sas[i]) { + kbase_mmu_report_fault_and_kill( + kctx, faulting_as, + "Failed pre-allocating memory for sub-allocations' metadata", + fault); + goto fault_done; + } + } } } -#endif /* CONFIG_MALI_2MB_ALLOC */ -page_fault_retry: /* so we have a translation fault, * let's see if it is for growable memory */ @@ -938,16 +1187,29 @@ page_fault_retry: * transaction (which should cause the other page fault to be * raised again). */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = 0, - .nr = 0, - .op = KBASE_MMU_OP_UNLOCK, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); - mmu_hw_operation_end(kbdev); + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = kctx->id; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + mmu_hw_operation_begin(kbdev); + err = kbase_mmu_hw_do_unlock_no_addr(kbdev, faulting_as, + &op_param); + mmu_hw_operation_end(kbdev); + } else { + /* Can safely skip the invalidate for all levels in case + * of duplicate page faults. 
+ */ + op_param.flush_skip_levels = 0xF; + op_param.vpfn = fault_pfn; + op_param.nr = 1; + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); + } + + if (err) { + dev_err(kbdev->dev, + "Invalidation for MMU did not complete on handling page fault @ 0x%llx", + fault->addr); + } mutex_unlock(&kbdev->mmu_hw_mutex); @@ -962,8 +1224,7 @@ page_fault_retry: /* cap to max vsize */ new_pages = min(new_pages, region->nr_pages - current_backed_size); - dev_dbg(kctx->kbdev->dev, "Allocate %zu pages on page fault\n", - new_pages); + dev_dbg(kctx->kbdev->dev, "Allocate %zu pages on page fault", new_pages); if (new_pages == 0) { struct kbase_mmu_hw_op_param op_param; @@ -975,16 +1236,29 @@ page_fault_retry: KBASE_MMU_FAULT_TYPE_PAGE); /* See comment [1] about UNLOCK usage */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = 0, - .nr = 0, - .op = KBASE_MMU_OP_UNLOCK, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - mmu_hw_operation_begin(kbdev); - kbase_mmu_hw_do_operation(kbdev, faulting_as, &op_param); - mmu_hw_operation_end(kbdev); + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = kctx->id; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + mmu_hw_operation_begin(kbdev); + err = kbase_mmu_hw_do_unlock_no_addr(kbdev, faulting_as, + &op_param); + mmu_hw_operation_end(kbdev); + } else { + /* Can safely skip the invalidate for all levels in case + * of duplicate page faults. + */ + op_param.flush_skip_levels = 0xF; + op_param.vpfn = fault_pfn; + op_param.nr = 1; + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); + } + + if (err) { + dev_err(kbdev->dev, + "Invalidation for MMU did not complete on handling page fault @ 0x%llx", + fault->addr); + } mutex_unlock(&kbdev->mmu_hw_mutex); @@ -1009,6 +1283,7 @@ page_fault_retry: spin_unlock(&kctx->mem_partials_lock); if (grown) { + u64 dirty_pgds = 0; u64 pfn_offset; struct kbase_mmu_hw_op_param op_param; @@ -1026,10 +1301,11 @@ page_fault_retry: * so the no_flush version of insert_pages is used which allows * us to unlock the MMU as we see fit. */ - err = kbase_mmu_insert_pages_no_flush(kbdev, &kctx->mmu, - region->start_pfn + pfn_offset, - &kbase_get_gpu_phy_pages(region)[pfn_offset], - new_pages, region->flags, region->gpu_alloc->group_id); + err = mmu_insert_pages_no_flush(kbdev, &kctx->mmu, region->start_pfn + pfn_offset, + &kbase_get_gpu_phy_pages(region)[pfn_offset], + new_pages, region->flags, + region->gpu_alloc->group_id, &dirty_pgds, region, + false); if (err) { kbase_free_phy_pages_helper(region->gpu_alloc, new_pages); @@ -1048,23 +1324,18 @@ page_fault_retry: (u64)new_pages); trace_mali_mmu_page_fault_grow(region, fault, new_pages); -#if MALI_INCREMENTAL_RENDERING +#if MALI_INCREMENTAL_RENDERING_JM /* Switch to incremental rendering if we have nearly run out of * memory in a JIT memory allocation. 
*/ if (region->threshold_pages && kbase_reg_current_backed_size(region) > region->threshold_pages) { - - dev_dbg(kctx->kbdev->dev, - "%zu pages exceeded IR threshold %zu\n", - new_pages + current_backed_size, - region->threshold_pages); + dev_dbg(kctx->kbdev->dev, "%zu pages exceeded IR threshold %zu", + new_pages + current_backed_size, region->threshold_pages); if (kbase_mmu_switch_to_ir(kctx, region) >= 0) { - dev_dbg(kctx->kbdev->dev, - "Get region %pK for IR\n", - (void *)region); + dev_dbg(kctx->kbdev->dev, "Get region %pK for IR", (void *)region); kbase_va_region_alloc_get(kctx, region); } } @@ -1084,25 +1355,22 @@ page_fault_retry: kbase_mmu_hw_clear_fault(kbdev, faulting_as, KBASE_MMU_FAULT_TYPE_PAGE); - /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = fault->addr >> PAGE_SHIFT, - .nr = new_pages, - .op = KBASE_MMU_OP_FLUSH_PT, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; + op_param.vpfn = region->start_pfn + pfn_offset; + op_param.nr = new_pages; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - unsigned long irq_flags; - - spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); - err = mmu_flush_invalidate_on_gpu_ctrl(kbdev, faulting_as, - &op_param); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); + /* Unlock to invalidate the TLB (and resume the MMU) */ + op_param.flush_skip_levels = + pgd_level_to_skip_flush(dirty_pgds); + err = kbase_mmu_hw_do_unlock(kbdev, faulting_as, + &op_param); } else { + /* flush L2 and unlock the VA (resumes the MMU) */ mmu_hw_operation_begin(kbdev); - err = kbase_mmu_hw_do_operation(kbdev, faulting_as, - &op_param); + err = kbase_mmu_hw_do_flush(kbdev, faulting_as, + &op_param); mmu_hw_operation_end(kbdev); } @@ -1148,6 +1416,7 @@ page_fault_retry: kbase_gpu_vm_unlock(kctx); } else { int ret = -ENOMEM; + const u8 group_id = region->gpu_alloc->group_id; kbase_gpu_vm_unlock(kctx); @@ -1155,37 +1424,31 @@ page_fault_retry: * Otherwise fail the allocation. 
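
/*
 * [Illustrative sketch, not part of this patch] The dirty_pgds bookkeeping used
 * above keeps one bit per page-table level: every level whose PGD was written
 * gets its bit set, and levels whose bit is clear can be skipped by the MMU
 * flush. The skip computation below is an assumption made for illustration;
 * pgd_level_to_skip_flush() in the driver is the authoritative mapping.
 */
#include <stdint.h>

#define DEMO_MMU_LEVELS 4 /* MIDGARD_MMU_TOPLEVEL (0) .. MIDGARD_MMU_BOTTOMLEVEL (3) */

static inline void demo_mark_level_dirty(uint64_t *dirty_pgds, unsigned int level)
{
	*dirty_pgds |= 1ULL << level;
}

static inline uint64_t demo_levels_to_skip(uint64_t dirty_pgds)
{
	/* Assumed semantics: skip exactly the levels that were never dirtied. */
	return ~dirty_pgds & ((1ULL << DEMO_MMU_LEVELS) - 1);
}
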
*/ if (pages_to_grow > 0) { -#ifdef CONFIG_MALI_2MB_ALLOC - if (grow_2mb_pool) { + if (kbdev->pagesize_2mb && grow_2mb_pool) { /* Round page requirement up to nearest 2 MB */ struct kbase_mem_pool *const lp_mem_pool = - &kctx->mem_pools.large[ - region->gpu_alloc->group_id]; + &kctx->mem_pools.large[group_id]; pages_to_grow = (pages_to_grow + ((1 << lp_mem_pool->order) - 1)) >> lp_mem_pool->order; ret = kbase_mem_pool_grow(lp_mem_pool, - pages_to_grow); + pages_to_grow, kctx->task); } else { -#endif struct kbase_mem_pool *const mem_pool = - &kctx->mem_pools.small[ - region->gpu_alloc->group_id]; + &kctx->mem_pools.small[group_id]; ret = kbase_mem_pool_grow(mem_pool, - pages_to_grow); -#ifdef CONFIG_MALI_2MB_ALLOC + pages_to_grow, kctx->task); } -#endif } if (ret < 0) { /* failed to extend, handle as a normal PF */ kbase_mmu_report_fault_and_kill(kctx, faulting_as, "Page allocation failure", fault); } else { - dev_dbg(kbdev->dev, "Try again after pool_grow\n"); + dev_dbg(kbdev->dev, "Try again after pool_grow"); goto page_fault_retry; } } @@ -1212,24 +1475,27 @@ fault_done: release_ctx(kbdev, kctx); atomic_dec(&kbdev->faults_pending); - dev_dbg(kbdev->dev, "Leaving page_fault_worker %pK\n", (void *)data); + dev_dbg(kbdev->dev, "Leaving page_fault_worker %pK", (void *)data); } static phys_addr_t kbase_mmu_alloc_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) { u64 *page; - int i; struct page *p; + phys_addr_t pgd; p = kbase_mem_pool_alloc(&kbdev->mem_pools.small[mmut->group_id]); if (!p) - return 0; + return KBASE_MMU_INVALID_PGD_ADDRESS; + + page = kbase_kmap(p); - page = kmap(p); if (page == NULL) goto alloc_free; + pgd = page_to_phys(p); + /* If the MMU tables belong to a context then account the memory usage * to that context, otherwise the MMU tables are device wide and are * only accounted to the device. @@ -1250,33 +1516,43 @@ static phys_addr_t kbase_mmu_alloc_pgd(struct kbase_device *kbdev, kbase_trace_gpu_mem_usage_inc(kbdev, mmut->kctx, 1); - for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) - kbdev->mmu_mode->entry_invalidate(&page[i]); + kbdev->mmu_mode->entries_invalidate(page, KBASE_MMU_PAGE_ENTRIES); - kbase_mmu_sync_pgd(kbdev, kbase_dma_addr(p), PAGE_SIZE); + /* As this page is newly created, therefore there is no content to + * clean or invalidate in the GPU caches. + */ + kbase_mmu_sync_pgd_cpu(kbdev, kbase_dma_addr(p), PAGE_SIZE); - kunmap(p); - return page_to_phys(p); + kbase_kunmap(p, page); + return pgd; alloc_free: kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], p, false); - return 0; + return KBASE_MMU_INVALID_PGD_ADDRESS; } -/* Given PGD PFN for level N, return PGD PFN for level N+1, allocating the - * new table from the pool if needed and possible +/** + * mmu_get_next_pgd() - Given PGD PFN for level N, return PGD PFN for level N+1 + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @pgd: Physical addresse of level N page directory. + * @vpfn: The virtual page frame number. + * @level: The level of MMU page table (N). 
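
/*
 * [Illustrative sketch, not part of this patch] The rounding applied to
 * pages_to_grow above converts a shortfall counted in 4 KiB pages into whole
 * large-pool entries using the pool order (order 9 means 512 x 4 KiB = 2 MiB
 * per entry), always rounding up: 513 small pages become 2 large pages.
 */
#include <stddef.h>

static size_t demo_round_up_to_pool_entries(size_t small_pages, unsigned int pool_order)
{
	return (small_pages + ((1u << pool_order) - 1)) >> pool_order;
}
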
+ * + * Return: + * * 0 - OK + * * -EFAULT - level N+1 PGD does not exist + * * -EINVAL - kmap() failed for level N PGD PFN */ -static int mmu_get_next_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgd, u64 vpfn, int level) +static int mmu_get_next_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t *pgd, u64 vpfn, int level) { u64 *page; phys_addr_t target_pgd; struct page *p; - KBASE_DEBUG_ASSERT(*pgd); - lockdep_assert_held(&mmut->mmu_lock); /* @@ -1287,43 +1563,92 @@ static int mmu_get_next_pgd(struct kbase_device *kbdev, vpfn &= 0x1FF; p = pfn_to_page(PFN_DOWN(*pgd)); - page = kmap(p); + page = kbase_kmap(p); if (page == NULL) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); + dev_err(kbdev->dev, "%s: kmap failure", __func__); return -EINVAL; } - target_pgd = kbdev->mmu_mode->pte_to_phy_addr(page[vpfn]); + if (!kbdev->mmu_mode->pte_is_valid(page[vpfn], level)) { + dev_dbg(kbdev->dev, "%s: invalid PTE at level %d vpfn 0x%llx", __func__, level, + vpfn); + kbase_kunmap(p, page); + return -EFAULT; + } else { + target_pgd = kbdev->mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[vpfn])); + } - if (!target_pgd) { - target_pgd = kbase_mmu_alloc_pgd(kbdev, mmut); - if (!target_pgd) { - dev_dbg(kbdev->dev, "%s: kbase_mmu_alloc_pgd failure\n", - __func__); - kunmap(p); - return -ENOMEM; - } + kbase_kunmap(p, page); + *pgd = target_pgd; - kbdev->mmu_mode->entry_set_pte(page, vpfn, target_pgd); + return 0; +} - kbase_mmu_sync_pgd(kbdev, kbase_dma_addr(p), PAGE_SIZE); - /* Rely on the caller to update the address space flags. */ +/** + * mmu_get_lowest_valid_pgd() - Find a valid PGD at or closest to in_level + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @vpfn: The virtual page frame number. + * @in_level: The level of MMU page table (N). + * @out_level: Set to the level of the lowest valid PGD found on success. + * Invalid on error. + * @out_pgd: Set to the lowest valid PGD found on success. + * Invalid on error. + * + * Does a page table walk starting from top level (L0) to in_level to find a valid PGD at or + * closest to in_level + * + * Terminology: + * Level-0 = Top-level = highest + * Level-3 = Bottom-level = lowest + * + * Return: + * * 0 - OK + * * -EINVAL - kmap() failed during page table walk. + */ +static int mmu_get_lowest_valid_pgd(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, int in_level, int *out_level, phys_addr_t *out_pgd) +{ + phys_addr_t pgd; + int l; + int err = 0; + + lockdep_assert_held(&mmut->mmu_lock); + pgd = mmut->pgd; + + for (l = MIDGARD_MMU_TOPLEVEL; l < in_level; l++) { + err = mmu_get_next_pgd(kbdev, mmut, &pgd, vpfn, l); + + /* Handle failure condition */ + if (err) { + dev_dbg(kbdev->dev, + "%s: mmu_get_next_pgd() failed to find a valid pgd at level %d", + __func__, l + 1); + break; + } } - kunmap(p); - *pgd = target_pgd; + *out_pgd = pgd; + *out_level = l; - return 0; + /* -EFAULT indicates that pgd param was valid but the next pgd entry at vpfn was invalid. + * This implies that we have found the lowest valid pgd. Reset the error code. 
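
/*
 * [Illustrative sketch, not part of this patch] The page-table walks above
 * (mmu_get_next_pgd() / mmu_get_lowest_valid_pgd()) slice the virtual page
 * frame number into 9-bit indices, one per level, with level 0 (top) taking the
 * most significant bits and level 3 (bottom) the least. A small stand-alone
 * program showing the slicing:
 */
#include <stdint.h>
#include <stdio.h>

static unsigned int demo_pgd_index(uint64_t vpfn, int level)
{
	return (unsigned int)((vpfn >> ((3 - level) * 9)) & 0x1FF);
}

int main(void)
{
	const uint64_t vpfn = 0x123456789ULL;
	int level;

	for (level = 0; level <= 3; level++)
		printf("level %d index = 0x%03x\n", level, demo_pgd_index(vpfn, level));

	return 0;
}
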
+ */ + if (err == -EFAULT) + err = 0; + + return err; } /* - * Returns the PGD for the specified level of translation + * On success, sets out_pgd to the PGD for the specified level of translation + * Returns -EFAULT if a valid PGD is not found */ -static int mmu_get_pgd_at_level(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 vpfn, - int level, - phys_addr_t *out_pgd) +static int mmu_get_pgd_at_level(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + int level, phys_addr_t *out_pgd) { phys_addr_t pgd; int l; @@ -1335,9 +1660,9 @@ static int mmu_get_pgd_at_level(struct kbase_device *kbdev, int err = mmu_get_next_pgd(kbdev, mmut, &pgd, vpfn, l); /* Handle failure condition */ if (err) { - dev_dbg(kbdev->dev, - "%s: mmu_get_next_pgd failure at level %d\n", - __func__, l); + dev_err(kbdev->dev, + "%s: mmu_get_next_pgd() failed to find a valid pgd at level %d", + __func__, l + 1); return err; } } @@ -1347,20 +1672,11 @@ static int mmu_get_pgd_at_level(struct kbase_device *kbdev, return 0; } -static int mmu_get_bottom_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 vpfn, - phys_addr_t *out_pgd) -{ - return mmu_get_pgd_at_level(kbdev, mmut, vpfn, MIDGARD_MMU_BOTTOMLEVEL, - out_pgd); -} - static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - u64 from_vpfn, u64 to_vpfn) + struct kbase_mmu_table *mmut, u64 from_vpfn, + u64 to_vpfn, u64 *dirty_pgds, + struct tagged_addr *phys, bool ignore_page_migration) { - phys_addr_t pgd; u64 vpfn = from_vpfn; struct kbase_mmu_mode const *mmu_mode; @@ -1371,9 +1687,9 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, lockdep_assert_held(&mmut->mmu_lock); mmu_mode = kbdev->mmu_mode; + kbase_mmu_reset_free_pgds_list(mmut); while (vpfn < to_vpfn) { - unsigned int i; unsigned int idx = vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - idx; unsigned int pcount = 0; @@ -1381,6 +1697,8 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, int level; u64 *page; phys_addr_t pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + phys_addr_t pgd = mmut->pgd; + struct page *p = phys_to_page(pgd); register unsigned int num_of_valid_entries; @@ -1388,17 +1706,17 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, count = left; /* need to check if this is a 2MB page or a 4kB */ - pgd = mmut->pgd; - for (level = MIDGARD_MMU_TOPLEVEL; level <= MIDGARD_MMU_BOTTOMLEVEL; level++) { idx = (vpfn >> ((3 - level) * 9)) & 0x1FF; pgds[level] = pgd; - page = kmap(phys_to_page(pgd)); + page = kbase_kmap(p); if (mmu_mode->ate_is_valid(page[idx], level)) break; /* keep the mapping */ - kunmap(phys_to_page(pgd)); - pgd = mmu_mode->pte_to_phy_addr(page[idx]); + kbase_kunmap(p, page); + pgd = mmu_mode->pte_to_phy_addr(kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[idx])); + p = phys_to_page(pgd); } switch (level) { @@ -1411,68 +1729,312 @@ static void mmu_insert_pages_failure_recovery(struct kbase_device *kbdev, pcount = count; break; default: - dev_warn(kbdev->dev, "%sNo support for ATEs at level %d\n", - __func__, level); + dev_warn(kbdev->dev, "%sNo support for ATEs at level %d", __func__, level); goto next; } + if (dirty_pgds && pcount > 0) + *dirty_pgds |= 1ULL << level; + num_of_valid_entries = mmu_mode->get_num_valid_entries(page); if (WARN_ON_ONCE(num_of_valid_entries < pcount)) num_of_valid_entries = 0; else num_of_valid_entries -= pcount; + /* Invalidate the 
entries we added */ + mmu_mode->entries_invalidate(&page[idx], pcount); + if (!num_of_valid_entries) { - kunmap(phys_to_page(pgd)); + kbase_kunmap(p, page); - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_add_to_free_pgds_list(mmut, p); - kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, - vpfn, level); + kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, vpfn, level, + KBASE_MMU_OP_NONE, dirty_pgds); vpfn += count; continue; } - /* Invalidate the entries we added */ - for (i = 0; i < pcount; i++) - mmu_mode->entry_invalidate(&page[idx + i]); - mmu_mode->set_num_valid_entries(page, num_of_valid_entries); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(phys_to_page(pgd)) + 8 * idx, - 8 * pcount); - kunmap(phys_to_page(pgd)); + /* MMU cache flush strategy is NONE because GPU cache maintenance is + * going to be done by the caller + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (idx * sizeof(u64)), + kbase_dma_addr(p) + sizeof(u64) * idx, sizeof(u64) * pcount, + KBASE_MMU_OP_NONE); + kbase_kunmap(p, page); next: vpfn += count; } + + /* If page migration is enabled: the only way to recover from failure + * is to mark all pages as not movable. It is not predictable what's + * going to happen to these pages at this stage. They might return + * movable once they are returned to a memory pool. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration && phys) { + const u64 num_pages = to_vpfn - from_vpfn + 1; + u64 i; + + for (i = 0; i < num_pages; i++) { + struct page *phys_page = as_page(phys[i]); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } + } } -/* - * Map the single page 'phys' 'nr' of times, starting at GPU PFN 'vpfn' +static void mmu_flush_invalidate_insert_pages(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, const u64 vpfn, + size_t nr, u64 dirty_pgds, + enum kbase_caller_mmu_sync_info mmu_sync_info, + bool insert_pages_failed) +{ + struct kbase_mmu_hw_op_param op_param; + int as_nr = 0; + + op_param.vpfn = vpfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_PT; + op_param.mmu_sync_info = mmu_sync_info; + op_param.kctx_id = mmut->kctx ? mmut->kctx->id : 0xFFFFFFFF; + op_param.flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds); + +#if MALI_USE_CSF + as_nr = mmut->kctx ? mmut->kctx->as_nr : MCU_AS_NR; +#else + WARN_ON(!mmut->kctx); +#endif + + /* MMU cache flush strategy depends on whether GPU control commands for + * flushing physical address ranges are supported. The new physical pages + * are not present in GPU caches therefore they don't need any cache + * maintenance, but PGDs in the page table may or may not be created anew. + * + * Operations that affect the whole GPU cache shall only be done if it's + * impossible to update physical ranges. + * + * On GPUs where flushing by physical address range is supported, + * full cache flush is done when an error occurs during + * insert_pages() to keep the error handling simpler. + */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && !insert_pages_failed) + mmu_invalidate(kbdev, mmut->kctx, as_nr, &op_param); + else + mmu_flush_invalidate(kbdev, mmut->kctx, as_nr, &op_param); +} + +/** + * update_parent_pgds() - Updates the page table from bottom level towards + * the top level to insert a new ATE + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. 
+ * @cur_level: The level of MMU page table where the ATE needs to be added. + * The bottom PGD level. + * @insert_level: The level of MMU page table where the chain of newly allocated + * PGDs needs to be linked-in/inserted. + * @insert_vpfn: The virtual page frame number for the ATE. + * @pgds_to_insert: Ptr to an array (size MIDGARD_MMU_BOTTOMLEVEL+1) that contains + * the physical addresses of newly allocated PGDs from index + * insert_level+1 to cur_level, and an existing PGD at index + * insert_level. + * + * The newly allocated PGDs are linked from the bottom level up and inserted into the PGD + * at insert_level which already exists in the MMU Page Tables. Migration status is also + * updated for all the newly allocated PGD pages. + * + * Return: + * * 0 - OK + * * -EFAULT - level N+1 PGD does not exist + * * -EINVAL - kmap() failed for level N PGD PFN + */ +static int update_parent_pgds(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + int cur_level, int insert_level, u64 insert_vpfn, + phys_addr_t *pgds_to_insert) +{ + int pgd_index; + int err = 0; + + /* Add a PTE for the new PGD page at pgd_index into the parent PGD at (pgd_index-1) + * Loop runs from the bottom-most to the top-most level so that all entries in the chain + * are valid when they are inserted into the MMU Page table via the insert_level PGD. + */ + for (pgd_index = cur_level; pgd_index > insert_level; pgd_index--) { + int parent_index = pgd_index - 1; + phys_addr_t parent_pgd = pgds_to_insert[parent_index]; + unsigned int current_valid_entries; + u64 pte; + phys_addr_t target_pgd = pgds_to_insert[pgd_index]; + u64 parent_vpfn = (insert_vpfn >> ((3 - parent_index) * 9)) & 0x1FF; + struct page *parent_page = pfn_to_page(PFN_DOWN(parent_pgd)); + u64 *parent_page_va; + + if (WARN_ON_ONCE(target_pgd == KBASE_MMU_INVALID_PGD_ADDRESS)) { + err = -EFAULT; + goto failure_recovery; + } + + parent_page_va = kbase_kmap(parent_page); + + if (unlikely(parent_page_va == NULL)) { + dev_err(kbdev->dev, "%s: kmap failure", __func__); + err = -EINVAL; + goto failure_recovery; + } + + current_valid_entries = kbdev->mmu_mode->get_num_valid_entries(parent_page_va); + + kbdev->mmu_mode->entry_set_pte(&pte, target_pgd); + parent_page_va[parent_vpfn] = kbdev->mgm_dev->ops.mgm_update_gpu_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, parent_index, pte); + kbdev->mmu_mode->set_num_valid_entries(parent_page_va, current_valid_entries + 1); + kbase_kunmap(parent_page, parent_page_va); + + if (parent_index != insert_level) { + /* Newly allocated PGDs */ + kbase_mmu_sync_pgd_cpu( + kbdev, kbase_dma_addr(parent_page) + (parent_vpfn * sizeof(u64)), + sizeof(u64)); + } else { + /* A new valid entry is added to an existing PGD. Perform the + * invalidate operation for GPU cache as it could be having a + * cacheline that contains the entry (in an invalid form). 
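
/*
 * [Illustrative sketch, not part of this patch] The ordering that
 * update_parent_pgds() relies on can be modelled with in-memory arrays: new
 * PGDs are wired together from the bottom level upwards, so every child is
 * complete before a parent entry points at it, and only the final store into
 * the PGD that already exists at insert_level publishes the whole chain to the
 * live table. The PTE encoding below is a placeholder, not the real one.
 */
#include <stdint.h>

#define DEMO_PGD_ENTRIES 512

static uint64_t demo_make_pte(const uint64_t *child_pgd)
{
	return (uint64_t)(uintptr_t)child_pgd | 1u; /* placeholder "valid" encoding */
}

static unsigned int demo_level_index(uint64_t vpfn, int level)
{
	return (unsigned int)((vpfn >> ((3 - level) * 9)) & 0x1FF);
}

static void demo_link_new_pgds(uint64_t *pgds[], int insert_level, int cur_level, uint64_t vpfn)
{
	int level;

	/* pgds[insert_level] is already reachable from the live page table;
	 * pgds[insert_level + 1 .. cur_level] are freshly allocated and private.
	 */
	for (level = cur_level; level > insert_level; level--)
		pgds[level - 1][demo_level_index(vpfn, level - 1)] = demo_make_pte(pgds[level]);
}
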
+ */ + kbase_mmu_sync_pgd( + kbdev, mmut->kctx, parent_pgd + (parent_vpfn * sizeof(u64)), + kbase_dma_addr(parent_page) + (parent_vpfn * sizeof(u64)), + sizeof(u64), KBASE_MMU_OP_FLUSH_PT); + } + + /* Update the new target_pgd page to its stable state */ + if (kbase_is_page_migration_enabled()) { + struct kbase_page_metadata *page_md = + kbase_page_private(phys_to_page(target_pgd)); + + spin_lock(&page_md->migrate_lock); + + WARN_ON_ONCE(PAGE_STATUS_GET(page_md->status) != ALLOCATE_IN_PROGRESS || + IS_PAGE_ISOLATED(page_md->status)); + + if (mmut->kctx) { + page_md->status = PAGE_STATUS_SET(page_md->status, PT_MAPPED); + page_md->data.pt_mapped.mmut = mmut; + page_md->data.pt_mapped.pgd_vpfn_level = + PGD_VPFN_LEVEL_SET(insert_vpfn, parent_index); + } else { + page_md->status = PAGE_STATUS_SET(page_md->status, NOT_MOVABLE); + } + + spin_unlock(&page_md->migrate_lock); + } + } + + return 0; + +failure_recovery: + /* Cleanup PTEs from PGDs. The Parent PGD in the loop above is just "PGD" here */ + for (; pgd_index < cur_level; pgd_index++) { + phys_addr_t pgd = pgds_to_insert[pgd_index]; + struct page *pgd_page = pfn_to_page(PFN_DOWN(pgd)); + u64 *pgd_page_va = kbase_kmap(pgd_page); + u64 vpfn = (insert_vpfn >> ((3 - pgd_index) * 9)) & 0x1FF; + + kbdev->mmu_mode->entries_invalidate(&pgd_page_va[vpfn], 1); + kbase_kunmap(pgd_page, pgd_page_va); + } + + return err; +} + +/** + * mmu_insert_alloc_pgds() - allocate memory for PGDs from level_low to + * level_high (inclusive) + * + * @kbdev: Device pointer. + * @mmut: GPU MMU page table. + * @level_low: The lower bound for the levels for which the PGD allocs are required + * @level_high: The higher bound for the levels for which the PGD allocs are required + * @new_pgds: Ptr to an array (size MIDGARD_MMU_BOTTOMLEVEL+1) to write the + * newly allocated PGD addresses to. + * + * Numerically, level_low < level_high, not to be confused with top level and + * bottom level concepts for MMU PGDs. They are only used as low and high bounds + * in an incrementing for-loop. + * + * Return: + * * 0 - OK + * * -ENOMEM - allocation failed for a PGD. 
*/ -int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr phys, size_t nr, - unsigned long flags, int const group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) +static int mmu_insert_alloc_pgds(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t *new_pgds, int level_low, int level_high) +{ + int err = 0; + int i; + + lockdep_assert_held(&mmut->mmu_lock); + + for (i = level_low; i <= level_high; i++) { + do { + new_pgds[i] = kbase_mmu_alloc_pgd(kbdev, mmut); + if (new_pgds[i] != KBASE_MMU_INVALID_PGD_ADDRESS) + break; + + rt_mutex_unlock(&mmut->mmu_lock); + err = kbase_mem_pool_grow(&kbdev->mem_pools.small[mmut->group_id], + level_high, NULL); + rt_mutex_lock(&mmut->mmu_lock); + if (err) { + dev_err(kbdev->dev, "%s: kbase_mem_pool_grow() returned error %d", + __func__, err); + + /* Free all PGDs allocated in previous successful iterations + * from (i-1) to level_low + */ + for (i = (i - 1); i >= level_low; i--) { + if (new_pgds[i] != KBASE_MMU_INVALID_PGD_ADDRESS) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[i]); + } + + return err; + } + } while (1); + } + + return 0; +} + +static int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 start_vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + bool ignore_page_migration) { phys_addr_t pgd; u64 *pgd_page; - /* In case the insert_single_page only partially completes - * we need to be able to recover - */ - bool recover_required = false; - u64 start_vpfn = vpfn; - size_t recover_count = 0; + u64 insert_vpfn = start_vpfn; size_t remain = nr; int err; struct kbase_device *kbdev; + u64 dirty_pgds = 0; + unsigned int i; + phys_addr_t new_pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + enum kbase_mmu_op_type flush_op; + struct kbase_mmu_table *mmut = &kctx->mmu; + int l, cur_level, insert_level; if (WARN_ON(kctx == NULL)) return -EINVAL; /* 64-bit address range is the max */ - KBASE_DEBUG_ASSERT(vpfn <= (U64_MAX / PAGE_SIZE)); + KBASE_DEBUG_ASSERT(start_vpfn <= (U64_MAX / PAGE_SIZE)); kbdev = kctx->kbdev; @@ -1480,77 +2042,88 @@ int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, if (nr == 0) return 0; - rt_mutex_lock(&kctx->mmu.mmu_lock); + /* If page migration is enabled, pages involved in multiple GPU mappings + * are always treated as not movable. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration) { + struct page *phys_page = as_page(phys); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + if (page_md) { + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + spin_unlock(&page_md->migrate_lock); + } + } + + rt_mutex_lock(&mmut->mmu_lock); while (remain) { - unsigned int i; - unsigned int index = vpfn & 0x1FF; - unsigned int count = KBASE_MMU_PAGE_ENTRIES - index; + unsigned int vindex = insert_vpfn & 0x1FF; + unsigned int count = KBASE_MMU_PAGE_ENTRIES - vindex; struct page *p; register unsigned int num_of_valid_entries; + bool newly_created_pgd = false; if (count > remain) count = remain; + cur_level = MIDGARD_MMU_BOTTOMLEVEL; + insert_level = cur_level; + /* - * Repeatedly calling mmu_get_bottom_pgd() is clearly + * Repeatedly calling mmu_get_lowest_valid_pgd() is clearly * suboptimal. We don't have to re-parse the whole tree * each time (just cache the l0-l2 sequence). * On the other hand, it's only a gain when we map more than * 256 pages at once (on average). Do we really care? 
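
/*
 * [Illustrative sketch, not part of this patch] mmu_insert_alloc_pgds() above
 * follows a drop-lock / grow / retry pattern: the page-table lock must not be
 * held across a pool grow that can block, so it is released, the pool is grown,
 * the lock is re-taken and the PGD allocation is retried. The tiny model below
 * uses a pthread mutex and a counter-backed "pool" purely for illustration.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int demo_pool_free; /* pages sitting in the illustrative pool */

static bool demo_try_alloc_pgd(void **out_pgd)
{
	if (!demo_pool_free)
		return false;
	demo_pool_free--;
	*out_pgd = &demo_pool_free; /* stand-in for a real page */
	return true;
}

static int demo_grow_pool(unsigned int pages)
{
	demo_pool_free += pages; /* the real grow can sleep and can fail */
	return 0;
}

/* Called with demo_table_lock held, analogous to the driver's mmu_lock. */
static int demo_alloc_pgd_with_retry(void **out_pgd)
{
	for (;;) {
		int err;

		if (demo_try_alloc_pgd(out_pgd))
			return 0;

		/* Never grow the pool while holding the table lock. */
		pthread_mutex_unlock(&demo_table_lock);
		err = demo_grow_pool(1);
		pthread_mutex_lock(&demo_table_lock);

		if (err)
			return err; /* caller frees any PGDs it already obtained */
	}
}
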
*/ - do { - err = mmu_get_bottom_pgd(kbdev, &kctx->mmu, - vpfn, &pgd); - if (err != -ENOMEM) - break; - /* Fill the memory pool with enough pages for - * the page walk to succeed - */ - rt_mutex_unlock(&kctx->mmu.mmu_lock); - err = kbase_mem_pool_grow( - &kbdev->mem_pools.small[ - kctx->mmu.group_id], - MIDGARD_MMU_BOTTOMLEVEL); - rt_mutex_lock(&kctx->mmu.mmu_lock); - } while (!err); + /* insert_level < cur_level if there's no valid PGD for cur_level and insert_vpn */ + err = mmu_get_lowest_valid_pgd(kbdev, mmut, insert_vpfn, cur_level, &insert_level, + &pgd); + if (err) { - dev_warn(kbdev->dev, "%s: mmu_get_bottom_pgd failure\n", - __func__); - if (recover_required) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - &kctx->mmu, - start_vpfn, - start_vpfn + recover_count); - } + dev_err(kbdev->dev, "%s: mmu_get_lowest_valid_pgd() returned error %d", + __func__, err); goto fail_unlock; } + /* No valid pgd at cur_level */ + if (insert_level != cur_level) { + /* Allocate new pgds for all missing levels from the required level + * down to the lowest valid pgd at insert_level + */ + err = mmu_insert_alloc_pgds(kbdev, mmut, new_pgds, (insert_level + 1), + cur_level); + if (err) + goto fail_unlock; + + newly_created_pgd = true; + + new_pgds[insert_level] = pgd; + + /* If we didn't find an existing valid pgd at cur_level, + * we've now allocated one. The ATE in the next step should + * be inserted in this newly allocated pgd. + */ + pgd = new_pgds[cur_level]; + } + p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + + pgd_page = kbase_kmap(p); if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); - if (recover_required) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - &kctx->mmu, - start_vpfn, - start_vpfn + recover_count); - } + dev_err(kbdev->dev, "%s: kmap failure", __func__); err = -ENOMEM; - goto fail_unlock; + + goto fail_unlock_free_pgds; } num_of_valid_entries = kbdev->mmu_mode->get_num_valid_entries(pgd_page); for (i = 0; i < count; i++) { - unsigned int ofs = index + i; + unsigned int ofs = vindex + i; /* Fail if the current page is a valid ATE entry */ KBASE_DEBUG_ASSERT(0 == (pgd_page[ofs] & 1UL)); @@ -1562,55 +2135,170 @@ int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, kbdev->mmu_mode->set_num_valid_entries( pgd_page, num_of_valid_entries + count); - vpfn += count; - remain -= count; + dirty_pgds |= 1ULL << (newly_created_pgd ? insert_level : MIDGARD_MMU_BOTTOMLEVEL); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (index * sizeof(u64)), - count * sizeof(u64)); - - kunmap(p); - /* We have started modifying the page table. - * If further pages need inserting and fail we need to undo what - * has already taken place + /* MMU cache flush operation here will depend on whether bottom level + * PGD is newly created or not. + * + * If bottom level PGD is newly created then no GPU cache maintenance is + * required as the PGD will not exist in GPU cache. Otherwise GPU cache + * maintenance is required for existing PGD. */ - recover_required = true; - recover_count += count; + flush_op = newly_created_pgd ? 
KBASE_MMU_OP_NONE : KBASE_MMU_OP_FLUSH_PT; + + kbase_mmu_sync_pgd(kbdev, kctx, pgd + (vindex * sizeof(u64)), + kbase_dma_addr(p) + (vindex * sizeof(u64)), count * sizeof(u64), + flush_op); + + if (newly_created_pgd) { + err = update_parent_pgds(kbdev, mmut, cur_level, insert_level, insert_vpfn, + new_pgds); + if (err) { + dev_err(kbdev->dev, "%s: update_parent_pgds() failed (%d)", + __func__, err); + + kbdev->mmu_mode->entries_invalidate(&pgd_page[vindex], count); + + kbase_kunmap(p, pgd_page); + goto fail_unlock_free_pgds; + } + } + + insert_vpfn += count; + remain -= count; + kbase_kunmap(p, pgd_page); } - rt_mutex_unlock(&kctx->mmu.mmu_lock); - kbase_mmu_flush_invalidate(kctx, start_vpfn, nr, false, mmu_sync_info); + + rt_mutex_unlock(&mmut->mmu_lock); + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, dirty_pgds, mmu_sync_info, + false); + return 0; +fail_unlock_free_pgds: + /* Free the pgds allocated by us from insert_level+1 to bottom level */ + for (l = cur_level; l > insert_level; l--) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[l]); + fail_unlock: - rt_mutex_unlock(&kctx->mmu.mmu_lock); - kbase_mmu_flush_invalidate(kctx, start_vpfn, nr, false, mmu_sync_info); + if (insert_vpfn != start_vpfn) { + /* Invalidate the pages we have partially completed */ + mmu_insert_pages_failure_recovery(kbdev, mmut, start_vpfn, insert_vpfn, &dirty_pgds, + NULL, true); + } + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, dirty_pgds, mmu_sync_info, + true); + kbase_mmu_free_pgds_list(kbdev, mmut); + rt_mutex_unlock(&mmut->mmu_lock); + return err; } -static void kbase_mmu_free_pgd(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - bool dirty) +int kbase_mmu_insert_single_imported_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info) { - struct page *p; + /* The aliasing sink page has metadata and shall be moved to NOT_MOVABLE. */ + return kbase_mmu_insert_single_page(kctx, vpfn, phys, nr, flags, group_id, mmu_sync_info, + false); +} - lockdep_assert_held(&mmut->mmu_lock); +int kbase_mmu_insert_single_aliased_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info) +{ + /* The aliasing sink page has metadata and shall be moved to NOT_MOVABLE. */ + return kbase_mmu_insert_single_page(kctx, vpfn, phys, nr, flags, group_id, mmu_sync_info, + false); +} - p = pfn_to_page(PFN_DOWN(pgd)); +static void kbase_mmu_progress_migration_on_insert(struct tagged_addr phys, + struct kbase_va_region *reg, + struct kbase_mmu_table *mmut, const u64 vpfn) +{ + struct page *phys_page = as_page(phys); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); - kbase_mem_pool_free(&kbdev->mem_pools.small[mmut->group_id], - p, dirty); + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; - atomic_sub(1, &kbdev->memdev.used_pages); + spin_lock(&page_md->migrate_lock); - /* If MMU tables belong to a context then pages will have been accounted - * against it, so we must decrement the usage counts here. + /* If no GPU va region is given: the metadata provided are + * invalid. 
+ * + * If the page is already allocated and mapped: this is + * an additional GPU mapping, probably to create a memory + * alias, which means it is no longer possible to migrate + * the page easily because tracking all the GPU mappings + * would be too costly. + * + * In any case: the page becomes not movable. It is kept + * alive, but attempts to migrate it will fail. The page + * will be freed if it is still not movable when it returns + * to a memory pool. Notice that the movable flag is not + * cleared because that would require taking the page lock. */ - if (mmut->kctx) { - kbase_process_page_usage_dec(mmut->kctx, 1); - atomic_sub(1, &mmut->kctx->used_pages); + if (!reg || PAGE_STATUS_GET(page_md->status) == (u8)ALLOCATED_MAPPED) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)NOT_MOVABLE); + } else if (PAGE_STATUS_GET(page_md->status) == (u8)ALLOCATE_IN_PROGRESS) { + page_md->status = PAGE_STATUS_SET(page_md->status, (u8)ALLOCATED_MAPPED); + page_md->data.mapped.reg = reg; + page_md->data.mapped.mmut = mmut; + page_md->data.mapped.vpfn = vpfn; } - kbase_trace_gpu_mem_usage_dec(kbdev, mmut->kctx, 1); + spin_unlock(&page_md->migrate_lock); +} + +static void kbase_mmu_progress_migration_on_teardown(struct kbase_device *kbdev, + struct tagged_addr *phys, size_t requested_nr) +{ + size_t i; + + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return; + + for (i = 0; i < requested_nr; i++) { + struct page *phys_page = as_page(phys[i]); + struct kbase_page_metadata *page_md = kbase_page_private(phys_page); + + /* Skip the 4KB page that is part of a large page, as the large page is + * excluded from the migration process. + */ + if (is_huge(phys[i]) || is_partial(phys[i])) + continue; + + if (page_md) { + u8 status; + + spin_lock(&page_md->migrate_lock); + status = PAGE_STATUS_GET(page_md->status); + + if (status == ALLOCATED_MAPPED) { + if (IS_PAGE_ISOLATED(page_md->status)) { + page_md->status = PAGE_STATUS_SET( + page_md->status, (u8)FREE_ISOLATED_IN_PROGRESS); + page_md->data.free_isolated.kbdev = kbdev; + /* At this point, we still have a reference + * to the page via its page migration metadata, + * and any page with the FREE_ISOLATED_IN_PROGRESS + * status will subsequently be freed in either + * kbase_page_migrate() or kbase_page_putback() + */ + phys[i] = as_tagged(0); + } else + page_md->status = PAGE_STATUS_SET(page_md->status, + (u8)FREE_IN_PROGRESS); + } + + spin_unlock(&page_md->migrate_lock); + } + } } u64 kbase_mmu_create_ate(struct kbase_device *const kbdev, @@ -1624,12 +2312,10 @@ u64 kbase_mmu_create_ate(struct kbase_device *const kbdev, group_id, level, entry); } -int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - const u64 start_vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, - int const group_id) +static int mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg, bool ignore_page_migration) { phys_addr_t pgd; u64 *pgd_page; @@ -1637,6 +2323,9 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, size_t remain = nr; int err; struct kbase_mmu_mode const *mmu_mode; + unsigned int i; + phys_addr_t new_pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; + int l, cur_level, insert_level; /* Note that 0 is a valid start_vpfn */ /* 64-bit address range is the max */ @@ -1651,12 +2340,12 @@ int 
kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, rt_mutex_lock(&mmut->mmu_lock); while (remain) { - unsigned int i; unsigned int vindex = insert_vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - vindex; struct page *p; - int cur_level; register unsigned int num_of_valid_entries; + bool newly_created_pgd = false; + enum kbase_mmu_op_type flush_op; if (count > remain) count = remain; @@ -1666,55 +2355,54 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, else cur_level = MIDGARD_MMU_BOTTOMLEVEL; + insert_level = cur_level; + /* - * Repeatedly calling mmu_get_pgd_at_level() is clearly + * Repeatedly calling mmu_get_lowest_valid_pgd() is clearly * suboptimal. We don't have to re-parse the whole tree * each time (just cache the l0-l2 sequence). * On the other hand, it's only a gain when we map more than * 256 pages at once (on average). Do we really care? */ - do { - err = mmu_get_pgd_at_level(kbdev, mmut, insert_vpfn, - cur_level, &pgd); - if (err != -ENOMEM) - break; - /* Fill the memory pool with enough pages for - * the page walk to succeed - */ - rt_mutex_unlock(&mmut->mmu_lock); - err = kbase_mem_pool_grow( - &kbdev->mem_pools.small[mmut->group_id], - cur_level); - rt_mutex_lock(&mmut->mmu_lock); - } while (!err); + /* insert_level < cur_level if there's no valid PGD for cur_level and insert_vpn */ + err = mmu_get_lowest_valid_pgd(kbdev, mmut, insert_vpfn, cur_level, &insert_level, + &pgd); if (err) { - dev_warn(kbdev->dev, - "%s: mmu_get_bottom_pgd failure\n", __func__); - if (insert_vpfn != start_vpfn) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - mmut, start_vpfn, insert_vpfn); - } + dev_err(kbdev->dev, "%s: mmu_get_lowest_valid_pgd() returned error %d", + __func__, err); goto fail_unlock; } + /* No valid pgd at cur_level */ + if (insert_level != cur_level) { + /* Allocate new pgds for all missing levels from the required level + * down to the lowest valid pgd at insert_level + */ + err = mmu_insert_alloc_pgds(kbdev, mmut, new_pgds, (insert_level + 1), + cur_level); + if (err) + goto fail_unlock; + + newly_created_pgd = true; + + new_pgds[insert_level] = pgd; + + /* If we didn't find an existing valid pgd at cur_level, + * we've now allocated one. The ATE in the next step should + * be inserted in this newly allocated pgd. 
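
/*
 * [Illustrative sketch, not part of this patch] Both insertion loops above
 * process a run of pages in chunks that never cross a bottom-level PGD
 * boundary: each PGD holds 512 entries, so a chunk is at most the distance from
 * the current index to the end of the PGD. A stand-alone demonstration:
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_ENTRIES 512u

int main(void)
{
	uint64_t vpfn = 500;  /* starts 12 entries before a PGD boundary */
	size_t remain = 1000; /* total pages to insert */

	while (remain) {
		unsigned int index = (unsigned int)(vpfn & 0x1FF);
		unsigned int count = DEMO_PAGE_ENTRIES - index;

		if (count > remain)
			count = (unsigned int)remain;

		printf("write %3u entries at index %3u of the PGD covering vpfn 0x%llx\n",
		       count, index, (unsigned long long)vpfn);

		vpfn += count;
		remain -= count;
	}

	return 0;
}
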
+ */ + pgd = new_pgds[cur_level]; + } + p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + pgd_page = kbase_kmap(p); + if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", - __func__); - if (insert_vpfn != start_vpfn) { - /* Invalidate the pages we have partially - * completed - */ - mmu_insert_pages_failure_recovery(kbdev, - mmut, start_vpfn, insert_vpfn); - } + dev_err(kbdev->dev, "%s: kmap failure", __func__); err = -ENOMEM; - goto fail_unlock; + + goto fail_unlock_free_pgds; } num_of_valid_entries = @@ -1722,18 +2410,8 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, if (cur_level == MIDGARD_MMU_LEVEL(2)) { int level_index = (insert_vpfn >> 9) & 0x1FF; - u64 *target = &pgd_page[level_index]; - - if (mmu_mode->pte_is_valid(*target, cur_level)) { - kbase_mmu_free_pgd( - kbdev, mmut, - kbdev->mmu_mode->pte_to_phy_addr( - *target), - false); - num_of_valid_entries--; - } - *target = kbase_mmu_create_ate(kbdev, *phys, flags, - cur_level, group_id); + pgd_page[level_index] = + kbase_mmu_create_ate(kbdev, *phys, flags, cur_level, group_id); num_of_valid_entries++; } else { @@ -1752,27 +2430,94 @@ int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, *target = kbase_mmu_create_ate(kbdev, phys[i], flags, cur_level, group_id); + + /* If page migration is enabled, this is the right time + * to update the status of the page. + */ + if (kbase_is_page_migration_enabled() && !ignore_page_migration && + !is_huge(phys[i]) && !is_partial(phys[i])) + kbase_mmu_progress_migration_on_insert(phys[i], reg, mmut, + insert_vpfn + i); } num_of_valid_entries += count; } mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + if (dirty_pgds) + *dirty_pgds |= 1ULL << (newly_created_pgd ? insert_level : cur_level); + + /* MMU cache flush operation here will depend on whether bottom level + * PGD is newly created or not. + * + * If bottom level PGD is newly created then no GPU cache maintenance is + * required as the PGD will not exist in GPU cache. Otherwise GPU cache + * maintenance is required for existing PGD. + */ + flush_op = newly_created_pgd ? KBASE_MMU_OP_NONE : KBASE_MMU_OP_FLUSH_PT; + + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (vindex * sizeof(u64)), + kbase_dma_addr(p) + (vindex * sizeof(u64)), count * sizeof(u64), + flush_op); + + if (newly_created_pgd) { + err = update_parent_pgds(kbdev, mmut, cur_level, insert_level, insert_vpfn, + new_pgds); + if (err) { + dev_err(kbdev->dev, "%s: update_parent_pgds() failed (%d)", + __func__, err); + + kbdev->mmu_mode->entries_invalidate(&pgd_page[vindex], count); + + kbase_kunmap(p, pgd_page); + goto fail_unlock_free_pgds; + } + } + phys += count; insert_vpfn += count; remain -= count; + kbase_kunmap(p, pgd_page); + } - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (vindex * sizeof(u64)), - count * sizeof(u64)); + rt_mutex_unlock(&mmut->mmu_lock); - kunmap(p); - } + return 0; - err = 0; +fail_unlock_free_pgds: + /* Free the pgds allocated by us from insert_level+1 to bottom level */ + for (l = cur_level; l > insert_level; l--) + kbase_mmu_free_pgd(kbdev, mmut, new_pgds[l]); fail_unlock: + if (insert_vpfn != start_vpfn) { + /* Invalidate the pages we have partially completed */ + mmu_insert_pages_failure_recovery(kbdev, mmut, start_vpfn, insert_vpfn, dirty_pgds, + phys, ignore_page_migration); + } + + mmu_flush_invalidate_insert_pages(kbdev, mmut, start_vpfn, nr, + dirty_pgds ? 
*dirty_pgds : 0xF, CALLER_MMU_ASYNC, true); + kbase_mmu_free_pgds_list(kbdev, mmut); rt_mutex_unlock(&mmut->mmu_lock); + + return err; +} + +int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + const u64 start_vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds, + struct kbase_va_region *reg) +{ + int err; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + err = mmu_insert_pages_no_flush(kbdev, mmut, start_vpfn, phys, nr, flags, group_id, + dirty_pgds, reg, false); + return err; } @@ -1780,31 +2525,86 @@ fail_unlock: * Map 'nr' pages pointed to by 'phys' at GPU PFN 'vpfn' for GPU address space * number 'as_nr'. */ -int kbase_mmu_insert_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int as_nr, int const group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) +int kbase_mmu_insert_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, unsigned long flags, int as_nr, + int const group_id, enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) { int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; - err = kbase_mmu_insert_pages_no_flush(kbdev, mmut, vpfn, - phys, nr, flags, group_id); + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, false); + if (err) + return err; - if (mmut->kctx) - kbase_mmu_flush_invalidate(mmut->kctx, vpfn, nr, false, - mmu_sync_info); - else - kbase_mmu_flush_invalidate_no_ctx(kbdev, vpfn, nr, false, as_nr, - mmu_sync_info); + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); - return err; + return 0; } KBASE_EXPORT_TEST_API(kbase_mmu_insert_pages); +int kbase_mmu_insert_pages_skip_status_update(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) +{ + int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + /* Imported allocations don't have metadata and therefore always ignore the + * page migration logic. + */ + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, true); + if (err) + return err; + + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); + + return 0; +} + +int kbase_mmu_insert_aliased_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int const group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg) +{ + int err; + u64 dirty_pgds = 0; + + /* Early out if there is nothing to do */ + if (nr == 0) + return 0; + + /* Memory aliases are always built on top of existing allocations, + * therefore the state of physical pages shall be updated. 
+ */ + err = mmu_insert_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, &dirty_pgds, + reg, false); + if (err) + return err; + + mmu_flush_invalidate_insert_pages(kbdev, mmut, vpfn, nr, dirty_pgds, mmu_sync_info, false); + + return 0; +} + +#if !MALI_USE_CSF /** - * kbase_mmu_flush_invalidate_noretain() - Flush and invalidate the GPU caches + * kbase_mmu_flush_noretain() - Flush and invalidate the GPU caches * without retaining the kbase context. * @kctx: The KBase context. * @vpfn: The virtual page frame number to start the flush on. @@ -1813,17 +2613,15 @@ KBASE_EXPORT_TEST_API(kbase_mmu_insert_pages); * As per kbase_mmu_flush_invalidate but doesn't retain the kctx or do any * other locking. */ -static void kbase_mmu_flush_invalidate_noretain(struct kbase_context *kctx, - u64 vpfn, size_t nr) +static void kbase_mmu_flush_noretain(struct kbase_context *kctx, u64 vpfn, size_t nr) { struct kbase_device *kbdev = kctx->kbdev; - struct kbase_mmu_hw_op_param op_param; int err; - /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + struct kbase_mmu_hw_op_param op_param; lockdep_assert_held(&kctx->kbdev->hwaccess_lock); lockdep_assert_held(&kctx->kbdev->mmu_hw_mutex); @@ -1833,154 +2631,32 @@ static void kbase_mmu_flush_invalidate_noretain(struct kbase_context *kctx, return; /* flush L2 and unlock the VA (resumes the MMU) */ - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = vpfn, - .nr = nr, - .op = KBASE_MMU_OP_FLUSH_MEM, - .kctx_id = kctx->id, - .mmu_sync_info = mmu_sync_info, - }; - + op_param.vpfn = vpfn; + op_param.nr = nr; + op_param.op = KBASE_MMU_OP_FLUSH_MEM; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - err = mmu_flush_invalidate_on_gpu_ctrl( - kbdev, &kbdev->as[kctx->as_nr], &op_param); + /* Value used to prevent skipping of any levels when flushing */ + op_param.flush_skip_levels = pgd_level_to_skip_flush(0xF); + err = kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, &kbdev->as[kctx->as_nr], + &op_param); } else { - err = kbase_mmu_hw_do_operation(kbdev, &kbdev->as[kctx->as_nr], - &op_param); + err = kbase_mmu_hw_do_flush_locked(kbdev, &kbdev->as[kctx->as_nr], + &op_param); } if (err) { /* Flush failed to complete, assume the * GPU has hung and perform a reset to recover */ - dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover\n"); + dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover"); if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) kbase_reset_gpu_locked(kbdev); } } - -/* Perform a flush/invalidate on a particular address space - */ -static void -kbase_mmu_flush_invalidate_as(struct kbase_device *kbdev, struct kbase_as *as, - u64 vpfn, size_t nr, bool sync, u32 kctx_id, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - int err; - bool gpu_powered; - unsigned long flags; - struct kbase_mmu_hw_op_param op_param; - - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - gpu_powered = kbdev->pm.backend.gpu_powered; - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - - /* GPU is off so there's no need to perform flush/invalidate. - * But even if GPU is not actually powered down, after gpu_powered flag - * was set to false, it is still safe to skip the flush/invalidate. 
- * The TLB invalidation will anyways be performed due to AS_COMMAND_UPDATE - * which is sent when address spaces are restored after gpu_powered flag - * is set to true. Flushing of L2 cache is certainly not required as L2 - * cache is definitely off if gpu_powered is false. - */ - if (!gpu_powered) - return; - - if (kbase_pm_context_active_handle_suspend(kbdev, - KBASE_PM_SUSPEND_HANDLER_DONT_REACTIVATE)) { - /* GPU has just been powered off due to system suspend. - * So again, no need to perform flush/invalidate. - */ - return; - } - - /* AS transaction begin */ - mutex_lock(&kbdev->mmu_hw_mutex); - - op_param = (struct kbase_mmu_hw_op_param){ - .vpfn = vpfn, - .nr = nr, - .kctx_id = kctx_id, - .mmu_sync_info = mmu_sync_info, - }; - - if (sync) - op_param.op = KBASE_MMU_OP_FLUSH_MEM; - else - op_param.op = KBASE_MMU_OP_FLUSH_PT; - - if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - err = mmu_flush_invalidate_on_gpu_ctrl(kbdev, as, &op_param); - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); - } else { - mmu_hw_operation_begin(kbdev); - err = kbase_mmu_hw_do_operation(kbdev, as, &op_param); - mmu_hw_operation_end(kbdev); - } - - if (err) { - /* Flush failed to complete, assume the GPU has hung and - * perform a reset to recover - */ - dev_err(kbdev->dev, "Flush for GPU page table update did not complete. Issuing GPU soft-reset to recover\n"); - - if (kbase_prepare_to_reset_gpu( - kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) - kbase_reset_gpu(kbdev); - } - - mutex_unlock(&kbdev->mmu_hw_mutex); - /* AS transaction end */ - - kbase_pm_context_idle(kbdev); -} - -static void -kbase_mmu_flush_invalidate_no_ctx(struct kbase_device *kbdev, u64 vpfn, - size_t nr, bool sync, int as_nr, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - /* Skip if there is nothing to do */ - if (nr) { - kbase_mmu_flush_invalidate_as(kbdev, &kbdev->as[as_nr], vpfn, - nr, sync, 0xFFFFFFFF, - mmu_sync_info); - } -} - -static void -kbase_mmu_flush_invalidate(struct kbase_context *kctx, u64 vpfn, size_t nr, - bool sync, - enum kbase_caller_mmu_sync_info mmu_sync_info) -{ - struct kbase_device *kbdev; - bool ctx_is_in_runpool; - - /* Early out if there is nothing to do */ - if (nr == 0) - return; - - kbdev = kctx->kbdev; -#if !MALI_USE_CSF - rt_mutex_lock(&kbdev->js_data.queue_mutex); - ctx_is_in_runpool = kbase_ctx_sched_inc_refcount(kctx); - rt_mutex_unlock(&kbdev->js_data.queue_mutex); -#else - ctx_is_in_runpool = kbase_ctx_sched_inc_refcount_if_as_valid(kctx); -#endif /* !MALI_USE_CSF */ - - if (ctx_is_in_runpool) { - KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); - - kbase_mmu_flush_invalidate_as(kbdev, &kbdev->as[kctx->as_nr], - vpfn, nr, sync, kctx->id, - mmu_sync_info); - - release_ctx(kbdev, kctx); - } -} +#endif void kbase_mmu_update(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, @@ -2002,6 +2678,88 @@ void kbase_mmu_disable_as(struct kbase_device *kbdev, int as_nr) kbdev->mmu_mode->disable_as(kbdev, as_nr); } +#if MALI_USE_CSF +void kbase_mmu_disable(struct kbase_context *kctx) +{ + /* Calls to this function are inherently asynchronous, with respect to + * MMU operations. + */ + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + struct kbase_device *kbdev = kctx->kbdev; + struct kbase_mmu_hw_op_param op_param = { 0 }; + int lock_err, flush_err; + + /* ASSERT that the context has a valid as_nr, which is only the case + * when it's scheduled in. 
+ * + * as_nr won't change because the caller has the hwaccess_lock + */ + KBASE_DEBUG_ASSERT(kctx->as_nr != KBASEP_AS_NR_INVALID); + + lockdep_assert_held(&kctx->kbdev->hwaccess_lock); + lockdep_assert_held(&kctx->kbdev->mmu_hw_mutex); + + op_param.vpfn = 0; + op_param.nr = ~0; + op_param.op = KBASE_MMU_OP_FLUSH_MEM; + op_param.kctx_id = kctx->id; + op_param.mmu_sync_info = mmu_sync_info; + +#if MALI_USE_CSF + /* 0xF value used to prevent skipping of any levels when flushing */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) + op_param.flush_skip_levels = pgd_level_to_skip_flush(0xF); +#endif + + /* lock MMU to prevent existing jobs on GPU from executing while the AS is + * not yet disabled + */ + lock_err = kbase_mmu_hw_do_lock(kbdev, &kbdev->as[kctx->as_nr], &op_param); + if (lock_err) + dev_err(kbdev->dev, "Failed to lock AS %d for ctx %d_%d", kctx->as_nr, kctx->tgid, + kctx->id); + + /* Issue the flush command only when L2 cache is in stable power on state. + * Any other state for L2 cache implies that shader cores are powered off, + * which in turn implies there is no execution happening on the GPU. + */ + if (kbdev->pm.backend.l2_state == KBASE_L2_ON) { + flush_err = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + if (flush_err) + dev_err(kbdev->dev, + "Failed to flush GPU cache when disabling AS %d for ctx %d_%d", + kctx->as_nr, kctx->tgid, kctx->id); + } + kbdev->mmu_mode->disable_as(kbdev, kctx->as_nr); + + if (!lock_err) { + /* unlock the MMU to allow it to resume */ + lock_err = + kbase_mmu_hw_do_unlock_no_addr(kbdev, &kbdev->as[kctx->as_nr], &op_param); + if (lock_err) + dev_err(kbdev->dev, "Failed to unlock AS %d for ctx %d_%d", kctx->as_nr, + kctx->tgid, kctx->id); + } + +#if !MALI_USE_CSF + /* + * JM GPUs has some L1 read only caches that need to be invalidated + * with START_FLUSH configuration. Purge the MMU disabled kctx from + * the slot_rb tracking field so such invalidation is performed when + * a new katom is executed on the affected slots. + */ + kbase_backend_slot_kctx_purge_locked(kbdev, kctx); +#endif + + /* kbase_gpu_cache_flush_and_busy_wait() will reset the GPU on timeout. Only + * reset the GPU if locking or unlocking fails. + */ + if (lock_err) + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu_locked(kbdev); +} +#else void kbase_mmu_disable(struct kbase_context *kctx) { /* ASSERT that the context has a valid as_nr, which is only the case @@ -2021,7 +2779,7 @@ void kbase_mmu_disable(struct kbase_context *kctx) * The job scheduler code will already be holding the locks and context * so just do the flush. 
*/ - kbase_mmu_flush_invalidate_noretain(kctx, 0, ~0); + kbase_mmu_flush_noretain(kctx, 0, ~0); kctx->kbdev->mmu_mode->disable_as(kctx->kbdev, kctx->as_nr); #if !MALI_USE_CSF @@ -2034,12 +2792,13 @@ void kbase_mmu_disable(struct kbase_context *kctx) kbase_backend_slot_kctx_purge_locked(kctx->kbdev, kctx); #endif } +#endif KBASE_EXPORT_TEST_API(kbase_mmu_disable); static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - phys_addr_t *pgds, u64 vpfn, - int level) + struct kbase_mmu_table *mmut, phys_addr_t *pgds, + u64 vpfn, int level, + enum kbase_mmu_op_type flush_op, u64 *dirty_pgds) { int current_level; @@ -2047,83 +2806,116 @@ static void kbase_mmu_update_and_free_parent_pgds(struct kbase_device *kbdev, for (current_level = level - 1; current_level >= MIDGARD_MMU_LEVEL(0); current_level--) { - u64 *current_page = kmap(phys_to_page(pgds[current_level])); + phys_addr_t current_pgd = pgds[current_level]; + struct page *p = phys_to_page(current_pgd); + + u64 *current_page = kbase_kmap(p); unsigned int current_valid_entries = kbdev->mmu_mode->get_num_valid_entries(current_page); + int index = (vpfn >> ((3 - current_level) * 9)) & 0x1FF; + /* We need to track every level that needs updating */ + if (dirty_pgds) + *dirty_pgds |= 1ULL << current_level; + + kbdev->mmu_mode->entries_invalidate(¤t_page[index], 1); if (current_valid_entries == 1 && current_level != MIDGARD_MMU_LEVEL(0)) { - kunmap(phys_to_page(pgds[current_level])); + kbase_kunmap(p, current_page); - kbase_mmu_free_pgd(kbdev, mmut, pgds[current_level], - true); - } else { - int index = (vpfn >> ((3 - current_level) * 9)) & 0x1FF; - - kbdev->mmu_mode->entry_invalidate(¤t_page[index]); + /* Ensure the cacheline containing the last valid entry + * of PGD is invalidated from the GPU cache, before the + * PGD page is freed. + */ + kbase_mmu_sync_pgd_gpu(kbdev, mmut->kctx, + current_pgd + (index * sizeof(u64)), + sizeof(u64), flush_op); + kbase_mmu_add_to_free_pgds_list(mmut, p); + } else { current_valid_entries--; kbdev->mmu_mode->set_num_valid_entries( current_page, current_valid_entries); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(phys_to_page( - pgds[current_level])) + - 8 * index, - 8 * 1); + kbase_kunmap(p, current_page); - kunmap(phys_to_page(pgds[current_level])); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, current_pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), sizeof(u64), + flush_op); break; } } } -/* - * We actually discard the ATE and free the page table pages if no valid entries - * exist in PGD. +/** + * mmu_flush_invalidate_teardown_pages() - Perform flush operation after unmapping pages. * - * IMPORTANT: This uses kbasep_js_runpool_release_ctx() when the context is - * currently scheduled into the runpool, and so potentially uses a lot of locks. - * These locks must be taken in the correct order with respect to others - * already held by the caller. Refer to kbasep_js_runpool_release_ctx() for more - * information. + * @kbdev: Pointer to kbase device. + * @kctx: Pointer to kbase context. + * @as_nr: Address space number, for GPU cache maintenance operations + * that happen outside a specific kbase context. + * @phys: Array of physical pages to flush. + * @phys_page_nr: Number of physical pages to flush. + * @op_param: Non-NULL pointer to struct containing information about the flush + * operation to perform. + * + * This function will do one of three things: + * 1. 
Invalidate the MMU caches, followed by a partial GPU cache flush of the + * individual pages that were unmapped if feature is supported on GPU. + * 2. Perform a full GPU cache flush through the GPU_CONTROL interface if feature is + * supported on GPU or, + * 3. Perform a full GPU cache flush through the MMU_CONTROL interface. + * + * When performing a partial GPU cache flush, the number of physical + * pages does not have to be identical to the number of virtual pages on the MMU, + * to support a single physical address flush for an aliased page. */ -int kbase_mmu_teardown_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, size_t nr, int as_nr) +static void mmu_flush_invalidate_teardown_pages(struct kbase_device *kbdev, + struct kbase_context *kctx, int as_nr, + struct tagged_addr *phys, size_t phys_page_nr, + struct kbase_mmu_hw_op_param *op_param) { - phys_addr_t pgd; - u64 start_vpfn = vpfn; - size_t requested_nr = nr; - struct kbase_mmu_mode const *mmu_mode; - int err = -EFAULT; + if (!mmu_flush_cache_on_gpu_ctrl(kbdev)) { + /* Full cache flush through the MMU_COMMAND */ + mmu_flush_invalidate(kbdev, kctx, as_nr, op_param); + } else if (op_param->op == KBASE_MMU_OP_FLUSH_MEM) { + /* Full cache flush through the GPU_CONTROL */ + mmu_flush_invalidate_on_gpu_ctrl(kbdev, kctx, as_nr, op_param); + } +#if MALI_USE_CSF + else { + /* Partial GPU cache flush with MMU cache invalidation */ + unsigned long irq_flags; + unsigned int i; + bool flush_done = false; - /* Calls to this function are inherently asynchronous, with respect to - * MMU operations. - */ - const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + mmu_invalidate(kbdev, kctx, as_nr, op_param); - if (nr == 0) { - /* early out if nothing to do */ - return 0; + for (i = 0; !flush_done && i < phys_page_nr; i++) { + spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); + if (kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) + mmu_flush_pa_range(kbdev, as_phys_addr_t(phys[i]), PAGE_SIZE, + KBASE_MMU_OP_FLUSH_MEM); + else + flush_done = true; + spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); + } } +#endif +} - if (!rt_mutex_trylock(&mmut->mmu_lock)) { - /* - * Sometimes, mmu_lock takes long time to be released. - * In that case, kswapd is stuck until it can hold - * the lock. Instead, just bail out here so kswapd - * could reclaim other pages. 
- */ - if (current_is_kswapd()) - return -EBUSY; - rt_mutex_lock(&mmut->mmu_lock); - } +static int kbase_mmu_teardown_pgd_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, size_t nr, u64 *dirty_pgds, + struct list_head *free_pgds_list, + enum kbase_mmu_op_type flush_op) +{ + struct kbase_mmu_mode const *mmu_mode = kbdev->mmu_mode; - mmu_mode = kbdev->mmu_mode; + lockdep_assert_held(&mmut->mmu_lock); + kbase_mmu_reset_free_pgds_list(mmut); while (nr) { - unsigned int i; unsigned int index = vpfn & 0x1FF; unsigned int count = KBASE_MMU_PAGE_ENTRIES - index; unsigned int pcount; @@ -2131,19 +2923,19 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, u64 *page; phys_addr_t pgds[MIDGARD_MMU_BOTTOMLEVEL + 1]; register unsigned int num_of_valid_entries; + phys_addr_t pgd = mmut->pgd; + struct page *p = phys_to_page(pgd); if (count > nr) count = nr; - /* need to check if this is a 2MB or a 4kB page */ - pgd = mmut->pgd; - + /* need to check if this is a 2MB page or a 4kB */ for (level = MIDGARD_MMU_TOPLEVEL; level <= MIDGARD_MMU_BOTTOMLEVEL; level++) { phys_addr_t next_pgd; index = (vpfn >> ((3 - level) * 9)) & 0x1FF; - page = kmap(phys_to_page(pgd)); + page = kbase_kmap(p); if (mmu_mode->ate_is_valid(page[index], level)) break; /* keep the mapping */ else if (!mmu_mode->pte_is_valid(page[index], level)) { @@ -2166,28 +2958,31 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, count = nr; goto next; } - next_pgd = mmu_mode->pte_to_phy_addr(page[index]); + next_pgd = mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, page[index])); + kbase_kunmap(p, page); pgds[level] = pgd; - kunmap(phys_to_page(pgd)); pgd = next_pgd; + p = phys_to_page(pgd); } switch (level) { case MIDGARD_MMU_LEVEL(0): case MIDGARD_MMU_LEVEL(1): - dev_warn(kbdev->dev, - "%s: No support for ATEs at level %d\n", - __func__, level); - kunmap(phys_to_page(pgd)); + dev_warn(kbdev->dev, "%s: No support for ATEs at level %d", __func__, + level); + kbase_kunmap(p, page); goto out; case MIDGARD_MMU_LEVEL(2): /* can only teardown if count >= 512 */ if (count >= 512) { pcount = 1; } else { - dev_warn(kbdev->dev, - "%s: limiting teardown as it tries to do a partial 2MB teardown, need 512, but have %d to tear down\n", - __func__, count); + dev_warn( + kbdev->dev, + "%s: limiting teardown as it tries to do a partial 2MB teardown, need 512, but have %d to tear down", + __func__, count); pcount = 0; } break; @@ -2196,68 +2991,205 @@ int kbase_mmu_teardown_pages(struct kbase_device *kbdev, pcount = count; break; default: - dev_err(kbdev->dev, - "%s: found non-mapped memory, early out\n", - __func__); + dev_err(kbdev->dev, "%s: found non-mapped memory, early out", __func__); vpfn += count; nr -= count; continue; } + if (pcount > 0) + *dirty_pgds |= 1ULL << level; + num_of_valid_entries = mmu_mode->get_num_valid_entries(page); if (WARN_ON_ONCE(num_of_valid_entries < pcount)) num_of_valid_entries = 0; else num_of_valid_entries -= pcount; + /* Invalidate the entries we added */ + mmu_mode->entries_invalidate(&page[index], pcount); + if (!num_of_valid_entries) { - kunmap(phys_to_page(pgd)); + kbase_kunmap(p, page); + + /* Ensure the cacheline(s) containing the last valid entries + * of PGD is invalidated from the GPU cache, before the + * PGD page is freed. 
+ */ + kbase_mmu_sync_pgd_gpu(kbdev, mmut->kctx, + pgd + (index * sizeof(u64)), + pcount * sizeof(u64), flush_op); - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_add_to_free_pgds_list(mmut, p); - kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, - vpfn, level); + kbase_mmu_update_and_free_parent_pgds(kbdev, mmut, pgds, vpfn, level, + flush_op, dirty_pgds); vpfn += count; nr -= count; continue; } - /* Invalidate the entries we added */ - for (i = 0; i < pcount; i++) - mmu_mode->entry_invalidate(&page[index + i]); - mmu_mode->set_num_valid_entries(page, num_of_valid_entries); - kbase_mmu_sync_pgd( - kbdev, kbase_dma_addr(phys_to_page(pgd)) + 8 * index, - 8 * pcount); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), pcount * sizeof(u64), + flush_op); next: - kunmap(phys_to_page(pgd)); - vpfn += count; - nr -= count; + kbase_kunmap(p, page); + vpfn += count; + nr -= count; } - err = 0; out: - rt_mutex_unlock(&mmut->mmu_lock); + return 0; +} - if (mmut->kctx) - kbase_mmu_flush_invalidate(mmut->kctx, start_vpfn, requested_nr, - true, mmu_sync_info); - else - kbase_mmu_flush_invalidate_no_ctx(kbdev, start_vpfn, - requested_nr, true, as_nr, - mmu_sync_info); +/** + * mmu_teardown_pages - Remove GPU virtual addresses from the MMU page table + * + * @kbdev: Pointer to kbase device. + * @mmut: Pointer to GPU MMU page table. + * @vpfn: Start page frame number of the GPU virtual pages to unmap. + * @phys: Array of physical pages currently mapped to the virtual + * pages to unmap, or NULL. This is used for GPU cache maintenance + * and page migration support. + * @nr_phys_pages: Number of physical pages to flush. + * @nr_virt_pages: Number of virtual pages whose PTEs should be destroyed. + * @as_nr: Address space number, for GPU cache maintenance operations + * that happen outside a specific kbase context. + * @ignore_page_migration: Whether page migration metadata should be ignored. + * + * We actually discard the ATE and free the page table pages if no valid entries + * exist in the PGD. + * + * IMPORTANT: This uses kbasep_js_runpool_release_ctx() when the context is + * currently scheduled into the runpool, and so potentially uses a lot of locks. + * These locks must be taken in the correct order with respect to others + * already held by the caller. Refer to kbasep_js_runpool_release_ctx() for more + * information. + * + * The @p phys pointer to physical pages is not necessary for unmapping virtual memory, + * but it is used for fine-grained GPU cache maintenance. If @p phys is NULL, + * GPU cache maintenance will be done as usual; that is, invalidating the whole GPU caches + * instead of specific physical address ranges. + * + * Return: 0 on success, otherwise an error code. + */ +static int mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr, bool ignore_page_migration) +{ + u64 start_vpfn = vpfn; + enum kbase_mmu_op_type flush_op = KBASE_MMU_OP_NONE; + struct kbase_mmu_hw_op_param op_param; + int err = -EFAULT; + u64 dirty_pgds = 0; + LIST_HEAD(free_pgds_list); + + /* Calls to this function are inherently asynchronous, with respect to + * MMU operations. + */ + const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + + /* This function performs two operations: MMU maintenance and flushing + * the caches. 
To ensure internal consistency between the caches and the + * MMU, it does not make sense to be able to flush only the physical pages + * from the cache and keep the PTE, nor does it make sense to use this + * function to remove a PTE and keep the physical pages in the cache. + * + * However, we have legitimate cases where we can try to tear down a mapping + * with zero virtual and zero physical pages, so we must have the following + * behaviour: + * - if both physical and virtual page counts are zero, return early + * - if either physical and virtual page counts are zero, return early + * - if there are fewer physical pages than virtual pages, return -EINVAL + */ + if (unlikely(nr_virt_pages == 0 || nr_phys_pages == 0)) + return 0; + + if (unlikely(nr_virt_pages < nr_phys_pages)) + return -EINVAL; + + /* MMU cache flush strategy depends on the number of pages to unmap. In both cases + * the operation is invalidate but the granularity of cache maintenance may change + * according to the situation. + * + * If GPU control command operations are present and the number of pages is "small", + * then the optimal strategy is flushing on the physical address range of the pages + * which are affected by the operation. That implies both the PGDs which are modified + * or removed from the page table and the physical pages which are freed from memory. + * + * Otherwise, there's no alternative to invalidating the whole GPU cache. + */ + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && phys && + nr_phys_pages <= KBASE_PA_RANGE_THRESHOLD_NR_PAGES) + flush_op = KBASE_MMU_OP_FLUSH_PT; + + if (!rt_mutex_trylock(&mmut->mmu_lock)) { + /* + * Sometimes, mmu_lock takes long time to be released. + * In that case, kswapd is stuck until it can hold + * the lock. Instead, just bail out here so kswapd + * could reclaim other pages. + */ + if (current_is_kswapd()) + return -EBUSY; + rt_mutex_lock(&mmut->mmu_lock); + } + + err = kbase_mmu_teardown_pgd_pages(kbdev, mmut, vpfn, nr_virt_pages, &dirty_pgds, + &free_pgds_list, flush_op); + + /* Set up MMU operation parameters. See above about MMU cache flush strategy. */ + op_param = (struct kbase_mmu_hw_op_param){ + .vpfn = start_vpfn, + .nr = nr_virt_pages, + .mmu_sync_info = mmu_sync_info, + .kctx_id = mmut->kctx ? mmut->kctx->id : 0xFFFFFFFF, + .op = (flush_op == KBASE_MMU_OP_FLUSH_PT) ? KBASE_MMU_OP_FLUSH_PT : + KBASE_MMU_OP_FLUSH_MEM, + .flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds), + }; + mmu_flush_invalidate_teardown_pages(kbdev, mmut->kctx, as_nr, phys, nr_phys_pages, + &op_param); + + /* If page migration is enabled: the status of all physical pages involved + * shall be updated, unless they are not movable. Their status shall be + * updated before releasing the lock to protect against concurrent + * requests to migrate the pages, if they have been isolated. 
+ */ + if (kbase_is_page_migration_enabled() && phys && !ignore_page_migration) + kbase_mmu_progress_migration_on_teardown(kbdev, phys, nr_phys_pages); + + kbase_mmu_free_pgds_list(kbdev, mmut); + + rt_mutex_unlock(&mmut->mmu_lock); return err; } -KBASE_EXPORT_TEST_API(kbase_mmu_teardown_pages); +int kbase_mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr) +{ + return mmu_teardown_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, as_nr, + false); +} + +int kbase_mmu_teardown_imported_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr_phys_pages, + size_t nr_virt_pages, int as_nr) +{ + return mmu_teardown_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, as_nr, + true); +} /** - * kbase_mmu_update_pages_no_flush() - Update attributes data in GPU page table entries + * kbase_mmu_update_pages_no_flush() - Update phy pages and attributes data in GPU + * page table entries * - * @kctx: Kbase context + * @kbdev: Pointer to kbase device. + * @mmut: The involved MMU table * @vpfn: Virtual PFN (Page Frame Number) of the first page to update * @phys: Pointer to the array of tagged physical addresses of the physical * pages that are pointed to by the page table entries (that need to @@ -2267,28 +3199,25 @@ KBASE_EXPORT_TEST_API(kbase_mmu_teardown_pages); * @flags: Flags * @group_id: The physical memory group in which the page was allocated. * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * @dirty_pgds: Flags to track every level where a PGD has been updated. * * This will update page table entries that already exist on the GPU based on - * the new flags that are passed (the physical pages pointed to by the page - * table entries remain unchanged). It is used as a response to the changes of - * the memory attributes. + * new flags and replace any existing phy pages that are passed (the PGD pages + * remain unchanged). It is used as a response to the changes of phys as well + * as the the memory attributes. * * The caller is responsible for validating the memory attributes. * * Return: 0 if the attributes data in page table entries were updated * successfully, otherwise an error code. 
*/ -static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int const group_id) +int kbase_mmu_update_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id, u64 *dirty_pgds) { phys_addr_t pgd; u64 *pgd_page; int err; - struct kbase_device *kbdev; - - if (WARN_ON(kctx == NULL)) - return -EINVAL; KBASE_DEBUG_ASSERT(vpfn <= (U64_MAX / PAGE_SIZE)); @@ -2296,9 +3225,7 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, if (nr == 0) return 0; - rt_mutex_lock(&kctx->mmu.mmu_lock); - - kbdev = kctx->kbdev; + rt_mutex_lock(&mmut->mmu_lock); while (nr) { unsigned int i; @@ -2314,12 +3241,12 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, if (is_huge(*phys) && (index == index_in_large_page(*phys))) cur_level = MIDGARD_MMU_LEVEL(2); - err = mmu_get_pgd_at_level(kbdev, &kctx->mmu, vpfn, cur_level, &pgd); + err = mmu_get_pgd_at_level(kbdev, mmut, vpfn, cur_level, &pgd); if (WARN_ON(err)) goto fail_unlock; p = pfn_to_page(PFN_DOWN(pgd)); - pgd_page = kmap(p); + pgd_page = kbase_kmap(p); if (!pgd_page) { dev_warn(kbdev->dev, "kmap failure on update_pages"); err = -ENOMEM; @@ -2341,9 +3268,9 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, pgd_page[level_index] = kbase_mmu_create_ate(kbdev, *target_phys, flags, MIDGARD_MMU_LEVEL(2), group_id); - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (level_index * sizeof(u64)), - sizeof(u64)); + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (level_index * sizeof(u64)), + kbase_dma_addr(p) + (level_index * sizeof(u64)), + sizeof(u64), KBASE_MMU_OP_NONE); } else { for (i = 0; i < count; i++) { #ifdef CONFIG_MALI_DEBUG @@ -2355,148 +3282,568 @@ static int kbase_mmu_update_pages_no_flush(struct kbase_context *kctx, u64 vpfn, phys[i], flags, MIDGARD_MMU_BOTTOMLEVEL, group_id); } - kbase_mmu_sync_pgd(kbdev, - kbase_dma_addr(p) + (index * sizeof(u64)), - count * sizeof(u64)); + + /* MMU cache flush strategy is NONE because GPU cache maintenance + * will be done by the caller. + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(p) + (index * sizeof(u64)), + count * sizeof(u64), KBASE_MMU_OP_NONE); } kbdev->mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + if (dirty_pgds && count > 0) + *dirty_pgds |= 1ULL << cur_level; + phys += count; vpfn += count; nr -= count; - kunmap(p); + kbase_kunmap(p, pgd_page); } - rt_mutex_unlock(&kctx->mmu.mmu_lock); + rt_mutex_unlock(&mmut->mmu_lock); return 0; fail_unlock: - rt_mutex_unlock(&kctx->mmu.mmu_lock); + rt_mutex_unlock(&mmut->mmu_lock); return err; } -int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int const group_id) +static int kbase_mmu_update_pages_common(struct kbase_device *kbdev, struct kbase_context *kctx, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int const group_id) { int err; + u64 dirty_pgds = 0; + struct kbase_mmu_table *mmut; +#if !MALI_USE_CSF + if (unlikely(kctx == NULL)) + return -EINVAL; + + mmut = &kctx->mmu; +#else + mmut = kctx ? 
&kctx->mmu : &kbdev->csf.mcu_mmu; +#endif + + err = kbase_mmu_update_pages_no_flush(kbdev, mmut, vpfn, phys, nr, flags, group_id, + &dirty_pgds); + + kbase_mmu_flush_invalidate_update_pages(kbdev, kctx, vpfn, nr, dirty_pgds); + + return err; +} + +void kbase_mmu_flush_invalidate_update_pages(struct kbase_device *kbdev, struct kbase_context *kctx, u64 vpfn, + size_t nr, u64 dirty_pgds) +{ + struct kbase_mmu_hw_op_param op_param; /* Calls to this function are inherently asynchronous, with respect to * MMU operations. */ const enum kbase_caller_mmu_sync_info mmu_sync_info = CALLER_MMU_ASYNC; + int as_nr; - err = kbase_mmu_update_pages_no_flush(kctx, vpfn, phys, nr, flags, - group_id); - kbase_mmu_flush_invalidate(kctx, vpfn, nr, true, mmu_sync_info); - return err; +#if !MALI_USE_CSF + if (unlikely(kctx == NULL)) + return; + + as_nr = kctx->as_nr; +#else + as_nr = kctx ? kctx->as_nr : MCU_AS_NR; +#endif + + op_param = (const struct kbase_mmu_hw_op_param){ + .vpfn = vpfn, + .nr = nr, + .op = KBASE_MMU_OP_FLUSH_MEM, + .kctx_id = kctx ? kctx->id : 0xFFFFFFFF, + .mmu_sync_info = mmu_sync_info, + .flush_skip_levels = pgd_level_to_skip_flush(dirty_pgds), + }; + + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) + mmu_flush_invalidate_on_gpu_ctrl(kbdev, kctx, as_nr, &op_param); + else + mmu_flush_invalidate(kbdev, kctx, as_nr, &op_param); } -static void mmu_teardown_level(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, phys_addr_t pgd, - int level) +int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id) +{ + if (unlikely(kctx == NULL)) + return -EINVAL; + + return kbase_mmu_update_pages_common(kctx->kbdev, kctx, vpfn, phys, nr, flags, group_id); +} + +#if MALI_USE_CSF +int kbase_mmu_update_csf_mcu_pages(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id) +{ + return kbase_mmu_update_pages_common(kbdev, NULL, vpfn, phys, nr, flags, group_id); +} +#endif /* MALI_USE_CSF */ + +static void mmu_page_migration_transaction_begin(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + WARN_ON_ONCE(kbdev->mmu_page_migrate_in_progress); + kbdev->mmu_page_migrate_in_progress = true; +} + +static void mmu_page_migration_transaction_end(struct kbase_device *kbdev) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + WARN_ON_ONCE(!kbdev->mmu_page_migrate_in_progress); + kbdev->mmu_page_migrate_in_progress = false; + /* Invoke the PM state machine, as the MMU page migration session + * may have deferred a transition in L2 state machine. + */ + kbase_pm_update_state(kbdev); +} + +int kbase_mmu_migrate_page(struct tagged_addr old_phys, struct tagged_addr new_phys, + dma_addr_t old_dma_addr, dma_addr_t new_dma_addr, int level) +{ + struct kbase_page_metadata *page_md = kbase_page_private(as_page(old_phys)); + struct kbase_mmu_hw_op_param op_param; + struct kbase_mmu_table *mmut = (level == MIDGARD_MMU_BOTTOMLEVEL) ? 
+ page_md->data.mapped.mmut : + page_md->data.pt_mapped.mmut; + struct kbase_device *kbdev; + phys_addr_t pgd; + u64 *old_page, *new_page, *pgd_page, *target, vpfn; + int index, check_state, ret = 0; + unsigned long hwaccess_flags = 0; + unsigned int num_of_valid_entries; + u8 vmap_count = 0; + + /* If page migration support is not compiled in, return with fault */ + if (!IS_ENABLED(CONFIG_PAGE_MIGRATION_SUPPORT)) + return -EINVAL; + /* Due to the hard binding of mmu_command_instr with kctx_id via kbase_mmu_hw_op_param, + * here we skip the no kctx case, which is only used with MCU's mmut. + */ + if (!mmut->kctx) + return -EINVAL; + + if (level > MIDGARD_MMU_BOTTOMLEVEL) + return -EINVAL; + else if (level == MIDGARD_MMU_BOTTOMLEVEL) + vpfn = page_md->data.mapped.vpfn; + else + vpfn = PGD_VPFN_LEVEL_GET_VPFN(page_md->data.pt_mapped.pgd_vpfn_level); + + kbdev = mmut->kctx->kbdev; + index = (vpfn >> ((3 - level) * 9)) & 0x1FF; + + /* Create all mappings before copying content. + * This is done as early as possible because it is the only operation that may + * fail. It is possible to do this before taking any locks because the + * pages to migrate are not going to change and even the parent PGD is not + * going to be affected by any other concurrent operation, since the page + * has been isolated before migration and therefore it cannot disappear in + * the middle of this function. + */ + old_page = kbase_kmap(as_page(old_phys)); + if (!old_page) { + dev_warn(kbdev->dev, "%s: kmap failure for old page.", __func__); + ret = -EINVAL; + goto old_page_map_error; + } + + new_page = kbase_kmap(as_page(new_phys)); + if (!new_page) { + dev_warn(kbdev->dev, "%s: kmap failure for new page.", __func__); + ret = -EINVAL; + goto new_page_map_error; + } + + /* GPU cache maintenance affects both memory content and page table, + * but at two different stages. A single virtual memory page is affected + * by the migration. + * + * Notice that the MMU maintenance is done in the following steps: + * + * 1) The MMU region is locked without performing any other operation. + * This lock must cover the entire migration process, in order to + * prevent any GPU access to the virtual page whose physical page + * is being migrated. + * 2) Immediately after locking: the MMU region content is flushed via + * GPU control while the lock is taken and without unlocking. + * The region must stay locked for the duration of the whole page + * migration procedure. + * This is necessary to make sure that pending writes to the old page + * are finalized before copying content to the new page. + * 3) Before unlocking: changes to the page table are flushed. + * Finer-grained GPU control operations are used if possible, otherwise + * the whole GPU cache shall be flushed again. + * This is necessary to make sure that the GPU accesses the new page + * after migration. + * 4) The MMU region is unlocked. + */ +#define PGD_VPFN_MASK(level) (~((((u64)1) << ((3 - level) * 9)) - 1)) + op_param.mmu_sync_info = CALLER_MMU_ASYNC; + op_param.kctx_id = mmut->kctx->id; + op_param.vpfn = vpfn & PGD_VPFN_MASK(level); + op_param.nr = 1 << ((3 - level) * 9); + op_param.op = KBASE_MMU_OP_FLUSH_PT; + /* When level is not MIDGARD_MMU_BOTTOMLEVEL, it is assumed PGD page migration */ + op_param.flush_skip_levels = (level == MIDGARD_MMU_BOTTOMLEVEL) ? 
+ pgd_level_to_skip_flush(1ULL << level) : + pgd_level_to_skip_flush(3ULL << level); + + rt_mutex_lock(&mmut->mmu_lock); + + /* The state was evaluated before entering this function, but it could + * have changed before the mmu_lock was taken. However, the state + * transitions which are possible at this point are only two, and in both + * cases it is a stable state progressing to a "free in progress" state. + * + * After taking the mmu_lock the state can no longer change: read it again + * and make sure that it hasn't changed before continuing. + */ + spin_lock(&page_md->migrate_lock); + check_state = PAGE_STATUS_GET(page_md->status); + if (level == MIDGARD_MMU_BOTTOMLEVEL) + vmap_count = page_md->vmap_count; + spin_unlock(&page_md->migrate_lock); + + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + if (check_state != ALLOCATED_MAPPED) { + dev_dbg(kbdev->dev, + "%s: state changed to %d (was %d), abort page migration", __func__, + check_state, ALLOCATED_MAPPED); + ret = -EAGAIN; + goto page_state_change_out; + } else if (vmap_count > 0) { + dev_dbg(kbdev->dev, "%s: page was multi-mapped, abort page migration", + __func__); + ret = -EAGAIN; + goto page_state_change_out; + } + } else { + if (check_state != PT_MAPPED) { + dev_dbg(kbdev->dev, + "%s: state changed to %d (was %d), abort PGD page migration", + __func__, check_state, PT_MAPPED); + WARN_ON_ONCE(check_state != FREE_PT_ISOLATED_IN_PROGRESS); + ret = -EAGAIN; + goto page_state_change_out; + } + } + + ret = mmu_get_pgd_at_level(kbdev, mmut, vpfn, level, &pgd); + if (ret) { + dev_err(kbdev->dev, "%s: failed to find PGD for old page.", __func__); + goto get_pgd_at_level_error; + } + + pgd_page = kbase_kmap(phys_to_page(pgd)); + if (!pgd_page) { + dev_warn(kbdev->dev, "%s: kmap failure for PGD page.", __func__); + ret = -EINVAL; + goto pgd_page_map_error; + } + + mutex_lock(&kbdev->mmu_hw_mutex); + + /* Lock MMU region and flush GPU cache by using GPU control, + * in order to keep MMU region locked. + */ + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + if (unlikely(!kbase_pm_l2_allow_mmu_page_migration(kbdev))) { + /* Defer the migration as L2 is in a transitional phase */ + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + mutex_unlock(&kbdev->mmu_hw_mutex); + dev_dbg(kbdev->dev, "%s: L2 in transtion, abort PGD page migration", __func__); + ret = -EAGAIN; + goto l2_state_defer_out; + } + /* Prevent transitional phases in L2 by starting the transaction */ + mmu_page_migration_transaction_begin(kbdev); + if (kbdev->pm.backend.gpu_ready && mmut->kctx->as_nr >= 0) { + int as_nr = mmut->kctx->as_nr; + struct kbase_as *as = &kbdev->as[as_nr]; + + ret = kbase_mmu_hw_do_lock(kbdev, as, &op_param); + if (!ret) { + ret = kbase_gpu_cache_flush_and_busy_wait( + kbdev, GPU_COMMAND_CACHE_CLN_INV_L2_LSC); + } + if (ret) + mmu_page_migration_transaction_end(kbdev); + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + + if (ret < 0) { + mutex_unlock(&kbdev->mmu_hw_mutex); + dev_err(kbdev->dev, "%s: failed to lock MMU region or flush GPU cache", __func__); + goto undo_mappings; + } + + /* Copy memory content. + * + * It is necessary to claim the ownership of the DMA buffer for the old + * page before performing the copy, to make sure of reading a consistent + * version of its content, before copying. 
After the copy, ownership of + * the DMA buffer for the new page is given to the GPU in order to make + * the content visible to potential GPU access that may happen as soon as + * this function releases the lock on the MMU region. + */ + dma_sync_single_for_cpu(kbdev->dev, old_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + memcpy(new_page, old_page, PAGE_SIZE); + dma_sync_single_for_device(kbdev->dev, new_dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); + + /* Remap GPU virtual page. + * + * This code rests on the assumption that page migration is only enabled + * for 4 kB pages, that necessarily live in the bottom level of the MMU + * page table. For this reason, the PGD level tells us inequivocably + * whether the page being migrated is a "content page" or another PGD + * of the page table: + * + * - Bottom level implies ATE (Address Translation Entry) + * - Any other level implies PTE (Page Table Entry) + * + * The current implementation doesn't handle the case of a level 0 PGD, + * that is: the root PGD of the page table. + */ + target = &pgd_page[index]; + + /* Certain entries of a page table page encode the count of valid entries + * present in that page. So need to save & restore the count information + * when updating the PTE/ATE to point to the new page. + */ + num_of_valid_entries = kbdev->mmu_mode->get_num_valid_entries(pgd_page); + + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + WARN_ON_ONCE((*target & 1UL) == 0); + *target = + kbase_mmu_create_ate(kbdev, new_phys, page_md->data.mapped.reg->flags, + level, page_md->data.mapped.reg->gpu_alloc->group_id); + } else { + u64 managed_pte; + +#ifdef CONFIG_MALI_DEBUG + /* The PTE should be pointing to the page being migrated */ + WARN_ON_ONCE(as_phys_addr_t(old_phys) != kbdev->mmu_mode->pte_to_phy_addr( + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, pgd_page[index]))); +#endif + kbdev->mmu_mode->entry_set_pte(&managed_pte, as_phys_addr_t(new_phys)); + *target = kbdev->mgm_dev->ops.mgm_update_gpu_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, level, managed_pte); + } + + kbdev->mmu_mode->set_num_valid_entries(pgd_page, num_of_valid_entries); + + /* This function always updates a single entry inside an existing PGD, + * therefore cache maintenance is necessary and affects a single entry. + */ + kbase_mmu_sync_pgd(kbdev, mmut->kctx, pgd + (index * sizeof(u64)), + kbase_dma_addr(phys_to_page(pgd)) + (index * sizeof(u64)), sizeof(u64), + KBASE_MMU_OP_FLUSH_PT); + + /* Unlock MMU region. + * + * Notice that GPUs which don't issue flush commands via GPU control + * still need an additional GPU cache flush here, this time only + * for the page table, because the function call above to sync PGDs + * won't have any effect on them. 
+ */ + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + if (kbdev->pm.backend.gpu_ready && mmut->kctx->as_nr >= 0) { + int as_nr = mmut->kctx->as_nr; + struct kbase_as *as = &kbdev->as[as_nr]; + + if (mmu_flush_cache_on_gpu_ctrl(kbdev)) { + ret = kbase_mmu_hw_do_unlock(kbdev, as, &op_param); + } else { + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_L2); + if (!ret) + ret = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, &op_param); + } + } + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + /* Releasing locks before checking the migration transaction error state */ + mutex_unlock(&kbdev->mmu_hw_mutex); + + spin_lock_irqsave(&kbdev->hwaccess_lock, hwaccess_flags); + /* Release the transition prevention in L2 by ending the transaction */ + mmu_page_migration_transaction_end(kbdev); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, hwaccess_flags); + + /* Checking the final migration transaction error state */ + if (ret < 0) { + dev_err(kbdev->dev, "%s: failed to unlock MMU region.", __func__); + goto undo_mappings; + } + + /* Undertaking metadata transfer, while we are holding the mmu_lock */ + spin_lock(&page_md->migrate_lock); + if (level == MIDGARD_MMU_BOTTOMLEVEL) { + size_t page_array_index = + page_md->data.mapped.vpfn - page_md->data.mapped.reg->start_pfn; + + WARN_ON(PAGE_STATUS_GET(page_md->status) != ALLOCATED_MAPPED); + + /* Replace page in array of pages of the physical allocation. */ + page_md->data.mapped.reg->gpu_alloc->pages[page_array_index] = new_phys; + } + /* Update the new page dma_addr with the transferred metadata from the old_page */ + page_md->dma_addr = new_dma_addr; + page_md->status = PAGE_ISOLATE_SET(page_md->status, 0); + spin_unlock(&page_md->migrate_lock); + set_page_private(as_page(new_phys), (unsigned long)page_md); + /* Old page metatdata pointer cleared as it now owned by the new page */ + set_page_private(as_page(old_phys), 0); + +l2_state_defer_out: + kbase_kunmap(phys_to_page(pgd), pgd_page); +pgd_page_map_error: +get_pgd_at_level_error: +page_state_change_out: + rt_mutex_unlock(&mmut->mmu_lock); + + kbase_kunmap(as_page(new_phys), new_page); +new_page_map_error: + kbase_kunmap(as_page(old_phys), old_page); +old_page_map_error: + return ret; + +undo_mappings: + /* Unlock the MMU table and undo mappings. */ + rt_mutex_unlock(&mmut->mmu_lock); + kbase_kunmap(phys_to_page(pgd), pgd_page); + kbase_kunmap(as_page(new_phys), new_page); + kbase_kunmap(as_page(old_phys), old_page); + + return ret; +} + +static void mmu_teardown_level(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + phys_addr_t pgd, unsigned int level) { - phys_addr_t target_pgd; u64 *pgd_page; int i; - struct kbase_mmu_mode const *mmu_mode; - u64 *pgd_page_buffer; + struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; + struct kbase_mmu_mode const *mmu_mode = kbdev->mmu_mode; + u64 *pgd_page_buffer = NULL; + struct page *p = phys_to_page(pgd); lockdep_assert_held(&mmut->mmu_lock); - /* Early-out. No need to kmap to check entries for L3 PGD. */ - if (level == MIDGARD_MMU_BOTTOMLEVEL) { - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + pgd_page = kbase_kmap_atomic(p); + /* kmap_atomic should NEVER fail. 
*/ + if (WARN_ON_ONCE(pgd_page == NULL)) return; + if (level < MIDGARD_MMU_BOTTOMLEVEL) { + /* Copy the page to our preallocated buffer so that we can minimize + * kmap_atomic usage + */ + pgd_page_buffer = mmut->scratch_mem.teardown_pages.levels[level]; + memcpy(pgd_page_buffer, pgd_page, PAGE_SIZE); } - pgd_page = kmap_atomic(pfn_to_page(PFN_DOWN(pgd))); - /* kmap_atomic should NEVER fail. */ - if (WARN_ON(pgd_page == NULL)) - return; - /* Copy the page to our preallocated buffer so that we can minimize - * kmap_atomic usage + /* When page migration is enabled, kbase_region_tracker_term() would ensure + * there are no pages left mapped on the GPU for a context. Hence the count + * of valid entries is expected to be zero here. */ - pgd_page_buffer = mmut->mmu_teardown_pages[level]; - memcpy(pgd_page_buffer, pgd_page, PAGE_SIZE); - kunmap_atomic(pgd_page); + if (kbase_is_page_migration_enabled() && mmut->kctx) + WARN_ON_ONCE(kbdev->mmu_mode->get_num_valid_entries(pgd_page)); + /* Invalidate page after copying */ + mmu_mode->entries_invalidate(pgd_page, KBASE_MMU_PAGE_ENTRIES); + kbase_kunmap_atomic(pgd_page); pgd_page = pgd_page_buffer; - mmu_mode = kbdev->mmu_mode; - - for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { - target_pgd = mmu_mode->pte_to_phy_addr(pgd_page[i]); - - if (target_pgd) { + if (level < MIDGARD_MMU_BOTTOMLEVEL) { + for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { if (mmu_mode->pte_is_valid(pgd_page[i], level)) { - mmu_teardown_level(kbdev, mmut, - target_pgd, - level + 1); + phys_addr_t target_pgd = mmu_mode->pte_to_phy_addr( + mgm_dev->ops.mgm_pte_to_original_pte(mgm_dev, + MGM_DEFAULT_PTE_GROUP, + level, pgd_page[i])); + + mmu_teardown_level(kbdev, mmut, target_pgd, level + 1); } } } - kbase_mmu_free_pgd(kbdev, mmut, pgd, true); + kbase_mmu_free_pgd(kbdev, mmut, pgd); +} + +static void kbase_mmu_mark_non_movable(struct page *page) +{ + struct kbase_page_metadata *page_md; + + if (!kbase_is_page_migration_enabled()) + return; + + page_md = kbase_page_private(page); + + spin_lock(&page_md->migrate_lock); + page_md->status = PAGE_STATUS_SET(page_md->status, NOT_MOVABLE); + + if (IS_PAGE_MOVABLE(page_md->status)) + page_md->status = PAGE_MOVABLE_CLEAR(page_md->status); + + spin_unlock(&page_md->migrate_lock); } int kbase_mmu_init(struct kbase_device *const kbdev, struct kbase_mmu_table *const mmut, struct kbase_context *const kctx, int const group_id) { - int level; - if (WARN_ON(group_id >= MEMORY_GROUP_MANAGER_NR_GROUPS) || WARN_ON(group_id < 0)) return -EINVAL; + compiletime_assert(KBASE_MEM_ALLOC_MAX_SIZE <= (((8ull << 30) >> PAGE_SHIFT)), + "List of free PGDs may not be large enough."); + compiletime_assert(MAX_PAGES_FOR_FREE_PGDS >= MIDGARD_MMU_BOTTOMLEVEL, + "Array of MMU levels is not large enough."); + mmut->group_id = group_id; rt_mutex_init(&mmut->mmu_lock); mmut->kctx = kctx; - mmut->pgd = 0; - - /* Preallocate MMU depth of 3 pages for mmu_teardown_level to use */ - for (level = MIDGARD_MMU_TOPLEVEL; - level < MIDGARD_MMU_BOTTOMLEVEL; level++) { - mmut->mmu_teardown_pages[level] = - kmalloc(PAGE_SIZE, GFP_KERNEL); - - if (!mmut->mmu_teardown_pages[level]) { - kbase_mmu_term(kbdev, mmut); - return -ENOMEM; - } - } + mmut->pgd = KBASE_MMU_INVALID_PGD_ADDRESS; /* We allocate pages into the kbdev memory pool, then * kbase_mmu_alloc_pgd will allocate out of that pool. This is done to * avoid allocations from the kernel happening with the lock held. 
*/ - while (!mmut->pgd) { + while (mmut->pgd == KBASE_MMU_INVALID_PGD_ADDRESS) { int err; err = kbase_mem_pool_grow( &kbdev->mem_pools.small[mmut->group_id], - MIDGARD_MMU_BOTTOMLEVEL); + MIDGARD_MMU_BOTTOMLEVEL, kctx ? kctx->task : NULL); if (err) { kbase_mmu_term(kbdev, mmut); return -ENOMEM; } - rt_mutex_lock(&mmut->mmu_lock); mmut->pgd = kbase_mmu_alloc_pgd(kbdev, mmut); - rt_mutex_unlock(&mmut->mmu_lock); } + kbase_mmu_mark_non_movable(pfn_to_page(PFN_DOWN(mmut->pgd))); return 0; } void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) { - int level; + WARN((mmut->kctx) && (mmut->kctx->as_nr != KBASEP_AS_NR_INVALID), + "kctx-%d_%d must first be scheduled out to flush GPU caches+tlbs before tearing down MMU tables", + mmut->kctx->tgid, mmut->kctx->id); - if (mmut->pgd) { + if (mmut->pgd != KBASE_MMU_INVALID_PGD_ADDRESS) { rt_mutex_lock(&mmut->mmu_lock); mmu_teardown_level(kbdev, mmut, mmut->pgd, MIDGARD_MMU_TOPLEVEL); rt_mutex_unlock(&mmut->mmu_lock); @@ -2504,20 +3851,29 @@ void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut) if (mmut->kctx) KBASE_TLSTREAM_AUX_PAGESALLOC(kbdev, mmut->kctx->id, 0); } - - for (level = MIDGARD_MMU_TOPLEVEL; - level < MIDGARD_MMU_BOTTOMLEVEL; level++) { - if (!mmut->mmu_teardown_pages[level]) - break; - kfree(mmut->mmu_teardown_pages[level]); - } } -void kbase_mmu_as_term(struct kbase_device *kbdev, int i) +void kbase_mmu_as_term(struct kbase_device *kbdev, unsigned int i) { destroy_workqueue(kbdev->as[i].pf_wq); } +void kbase_mmu_flush_pa_range(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op) +{ +#if MALI_USE_CSF + unsigned long irq_flags; + + spin_lock_irqsave(&kbdev->hwaccess_lock, irq_flags); + if (mmu_flush_cache_on_gpu_ctrl(kbdev) && (flush_op != KBASE_MMU_OP_NONE) && + kbdev->pm.backend.gpu_ready && (!kctx || kctx->as_nr >= 0)) + mmu_flush_pa_range(kbdev, phys, size, KBASE_MMU_OP_FLUSH_PT); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, irq_flags); +#endif +} + +#ifdef CONFIG_MALI_VECTOR_DUMP static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, int level, char ** const buffer, size_t *size_left) { @@ -2536,9 +3892,9 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, kbdev = kctx->kbdev; mmu_mode = kbdev->mmu_mode; - pgd_page = kmap(pfn_to_page(PFN_DOWN(pgd))); + pgd_page = kbase_kmap(pfn_to_page(PFN_DOWN(pgd))); if (!pgd_page) { - dev_warn(kbdev->dev, "%s: kmap failure\n", __func__); + dev_warn(kbdev->dev, "%s: kmap failure", __func__); return 0; } @@ -2563,13 +3919,15 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { if (mmu_mode->pte_is_valid(pgd_page[i], level)) { target_pgd = mmu_mode->pte_to_phy_addr( - pgd_page[i]); + kbdev->mgm_dev->ops.mgm_pte_to_original_pte( + kbdev->mgm_dev, MGM_DEFAULT_PTE_GROUP, + level, pgd_page[i])); dump_size = kbasep_mmu_dump_level(kctx, target_pgd, level + 1, buffer, size_left); if (!dump_size) { - kunmap(pfn_to_page(PFN_DOWN(pgd))); + kbase_kunmap(pfn_to_page(PFN_DOWN(pgd)), pgd_page); return 0; } size += dump_size; @@ -2577,7 +3935,7 @@ static size_t kbasep_mmu_dump_level(struct kbase_context *kctx, phys_addr_t pgd, } } - kunmap(pfn_to_page(PFN_DOWN(pgd))); + kbase_kunmap(pfn_to_page(PFN_DOWN(pgd)), pgd_page); return size; } @@ -2657,6 +4015,7 @@ fail_free: return NULL; } KBASE_EXPORT_TEST_API(kbase_mmu_dump); +#endif /* CONFIG_MALI_VECTOR_DUMP */ 
void kbase_mmu_bus_fault_worker(struct work_struct *data) { @@ -2689,8 +4048,7 @@ void kbase_mmu_bus_fault_worker(struct work_struct *data) #ifdef CONFIG_MALI_ARBITER_SUPPORT /* check if we still have GPU */ if (unlikely(kbase_is_gpu_removed(kbdev))) { - dev_dbg(kbdev->dev, - "%s: GPU has been removed\n", __func__); + dev_dbg(kbdev->dev, "%s: GPU has been removed", __func__); release_ctx(kbdev, kctx); atomic_dec(&kbdev->faults_pending); return; diff --git a/mali_kbase/mmu/mali_kbase_mmu.h b/mali_kbase/mmu/mali_kbase_mmu.h index 49665fb..e13e9b9 100644 --- a/mali_kbase/mmu/mali_kbase_mmu.h +++ b/mali_kbase/mmu/mali_kbase_mmu.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,17 +25,19 @@ #include <uapi/gpu/arm/midgard/mali_base_kernel.h> #define KBASE_MMU_PAGE_ENTRIES 512 +#define KBASE_MMU_INVALID_PGD_ADDRESS (~(phys_addr_t)0) struct kbase_context; struct kbase_mmu_table; +struct kbase_va_region; /** * enum kbase_caller_mmu_sync_info - MMU-synchronous caller info. * A pointer to this type is passed down from the outer-most callers in the kbase * module - where the information resides as to the synchronous / asynchronous * nature of the call flow, with respect to MMU operations. ie - does the call flow relate to - * existing GPU work does it come from requests (like ioctl) from user-space, power management, - * etc. + * existing GPU work or does it come from requests (like ioctl) from user-space, power + * management, etc. * * @CALLER_MMU_UNSET_SYNCHRONICITY: default value must be invalid to avoid accidental choice * of a 'valid' value @@ -49,6 +51,26 @@ enum kbase_caller_mmu_sync_info { }; /** + * enum kbase_mmu_op_type - enum for MMU operations + * @KBASE_MMU_OP_NONE: To help catch uninitialized struct + * @KBASE_MMU_OP_FIRST: The lower boundary of enum + * @KBASE_MMU_OP_LOCK: Lock memory region + * @KBASE_MMU_OP_UNLOCK: Unlock memory region + * @KBASE_MMU_OP_FLUSH_PT: Flush page table (CLN+INV L2 only) + * @KBASE_MMU_OP_FLUSH_MEM: Flush memory (CLN+INV L2+LSC) + * @KBASE_MMU_OP_COUNT: The upper boundary of enum + */ +enum kbase_mmu_op_type { + KBASE_MMU_OP_NONE = 0, /* Must be zero */ + KBASE_MMU_OP_FIRST, /* Must be the first non-zero op */ + KBASE_MMU_OP_LOCK = KBASE_MMU_OP_FIRST, + KBASE_MMU_OP_UNLOCK, + KBASE_MMU_OP_FLUSH_PT, + KBASE_MMU_OP_FLUSH_MEM, + KBASE_MMU_OP_COUNT /* Must be the last in enum */ +}; + +/** * kbase_mmu_as_init() - Initialising GPU address space object. * * @kbdev: The kbase device structure for the device (must be a valid pointer). @@ -59,7 +81,7 @@ enum kbase_caller_mmu_sync_info { * * Return: 0 on success and non-zero value on failure. */ -int kbase_mmu_as_init(struct kbase_device *kbdev, int i); +int kbase_mmu_as_init(struct kbase_device *kbdev, unsigned int i); /** * kbase_mmu_as_term() - Terminate address space object. @@ -70,7 +92,7 @@ int kbase_mmu_as_init(struct kbase_device *kbdev, int i); * This is called upon device termination to destroy * the address space object of the device. 
*/ -void kbase_mmu_as_term(struct kbase_device *kbdev, int i); +void kbase_mmu_as_term(struct kbase_device *kbdev, unsigned int i); /** * kbase_mmu_init - Initialise an object representing GPU page tables @@ -129,27 +151,143 @@ void kbase_mmu_term(struct kbase_device *kbdev, struct kbase_mmu_table *mmut); u64 kbase_mmu_create_ate(struct kbase_device *kbdev, struct tagged_addr phy, unsigned long flags, int level, int group_id); -int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, - const u64 start_vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int group_id); -int kbase_mmu_insert_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - struct tagged_addr *phys, size_t nr, - unsigned long flags, int as_nr, int group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info); -int kbase_mmu_insert_single_page(struct kbase_context *kctx, u64 vpfn, - struct tagged_addr phys, size_t nr, - unsigned long flags, int group_id, - enum kbase_caller_mmu_sync_info mmu_sync_info); - -int kbase_mmu_teardown_pages(struct kbase_device *kbdev, - struct kbase_mmu_table *mmut, u64 vpfn, - size_t nr, int as_nr); +int kbase_mmu_insert_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int group_id, u64 *dirty_pgds, + struct kbase_va_region *reg); +int kbase_mmu_insert_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, unsigned long flags, int as_nr, + int group_id, enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); + +/** + * kbase_mmu_insert_pages_skip_status_update - Map 'nr' pages pointed to by 'phys' + * at GPU PFN 'vpfn' for GPU address space number 'as_nr'. + * + * @kbdev: Instance of GPU platform device, allocated from the probe method. + * @mmut: GPU page tables. + * @vpfn: Start page frame number of the GPU virtual pages to map. + * @phys: Physical address of the page to be mapped. + * @nr: The number of pages to map. + * @flags: Bitmask of attributes of the GPU memory region being mapped. + * @as_nr: The GPU address space number. + * @group_id: The physical memory group in which the page was allocated. + * @mmu_sync_info: MMU-synchronous caller info. + * @reg: The region whose physical allocation is to be mapped. + * + * Similar to kbase_mmu_insert_pages() but skips updating each pages metadata + * for page migration. + * + * Return: 0 if successful, otherwise a negative error code. 
+ */ +int kbase_mmu_insert_pages_skip_status_update(struct kbase_device *kbdev, + struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); +int kbase_mmu_insert_aliased_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int as_nr, int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info, + struct kbase_va_region *reg); +int kbase_mmu_insert_single_imported_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info); +int kbase_mmu_insert_single_aliased_page(struct kbase_context *kctx, u64 vpfn, + struct tagged_addr phys, size_t nr, unsigned long flags, + int group_id, + enum kbase_caller_mmu_sync_info mmu_sync_info); + +int kbase_mmu_teardown_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, u64 vpfn, + struct tagged_addr *phys, size_t nr_phys_pages, size_t nr_virt_pages, + int as_nr); +int kbase_mmu_teardown_imported_pages(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr_phys_pages, + size_t nr_virt_pages, int as_nr); +#define kbase_mmu_teardown_firmware_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, \ + as_nr) \ + kbase_mmu_teardown_imported_pages(kbdev, mmut, vpfn, phys, nr_phys_pages, nr_virt_pages, \ + as_nr) + int kbase_mmu_update_pages(struct kbase_context *kctx, u64 vpfn, struct tagged_addr *phys, size_t nr, unsigned long flags, int const group_id); +#if MALI_USE_CSF +/** + * kbase_mmu_update_csf_mcu_pages - Update MCU mappings with changes of phys and flags + * + * @kbdev: Pointer to kbase device. + * @vpfn: Virtual PFN (Page Frame Number) of the first page to update + * @phys: Pointer to the array of tagged physical addresses of the physical + * pages that are pointed to by the page table entries (that need to + * be updated). + * @nr: Number of pages to update + * @flags: Flags + * @group_id: The physical memory group in which the page was allocated. + * Valid range is 0..(MEMORY_GROUP_MANAGER_NR_GROUPS-1). + * + * Return: 0 on success, otherwise an error code. + */ +int kbase_mmu_update_csf_mcu_pages(struct kbase_device *kbdev, u64 vpfn, struct tagged_addr *phys, + size_t nr, unsigned long flags, int const group_id); +#endif + +/** + * kbase_mmu_migrate_page - Migrate GPU mappings and content between memory pages + * + * @old_phys: Old physical page to be replaced. + * @new_phys: New physical page used to replace old physical page. + * @old_dma_addr: DMA address of the old page. + * @new_dma_addr: DMA address of the new page. + * @level: MMU page table level of the provided PGD. + * + * The page migration process is made of 2 big steps: + * + * 1) Copy the content of the old page to the new page. + * 2) Remap the virtual page, that is: replace either the ATE (if the old page + * was a regular page) or the PTE (if the old page was used as a PGD) in the + * MMU page table with the new page. + * + * During the process, the MMU region is locked to prevent GPU access to the + * virtual memory page that is being remapped. + * + * Before copying the content of the old page to the new page and while the + * MMU region is locked, a GPU cache flush is performed to make sure that + * pending GPU writes are finalized to the old page before copying. 
+ * That is necessary because otherwise there's a risk that GPU writes might + * be finalized to the old page, and not new page, after migration. + * The MMU region is unlocked only at the end of the migration operation. + * + * Return: 0 on success, otherwise an error code. + */ +int kbase_mmu_migrate_page(struct tagged_addr old_phys, struct tagged_addr new_phys, + dma_addr_t old_dma_addr, dma_addr_t new_dma_addr, int level); + +/** + * kbase_mmu_flush_pa_range() - Flush physical address range from the GPU caches + * + * @kbdev: Instance of GPU platform device, allocated from the probe method. + * @kctx: Pointer to kbase context, it can be NULL if the physical address + * range is not associated with User created context. + * @phys: Starting address of the physical range to start the operation on. + * @size: Number of bytes to work on. + * @flush_op: Type of cache flush operation to perform. + * + * Issue a cache flush physical range command. This function won't perform any + * flush if the GPU doesn't support FLUSH_PA_RANGE command. The flush would be + * performed only if the context has a JASID assigned to it. + * This function is basically a wrapper for kbase_gpu_cache_flush_pa_range_and_busy_wait(). + */ +void kbase_mmu_flush_pa_range(struct kbase_device *kbdev, struct kbase_context *kctx, + phys_addr_t phys, size_t size, + enum kbase_mmu_op_type flush_op); +void kbase_mmu_flush_invalidate_update_pages(struct kbase_device *kbdev, struct kbase_context *kctx, u64 vpfn, + size_t nr, u64 dirty_pgds); +int kbase_mmu_update_pages_no_flush(struct kbase_device *kbdev, struct kbase_mmu_table *mmut, + u64 vpfn, struct tagged_addr *phys, size_t nr, + unsigned long flags, int group_id, u64 *dirty_pgds); /** * kbase_mmu_bus_fault_interrupt - Process a bus fault interrupt. diff --git a/mali_kbase/mmu/mali_kbase_mmu_hw.h b/mali_kbase/mmu/mali_kbase_mmu_hw.h index 31658e0..49e050e 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_hw.h +++ b/mali_kbase/mmu/mali_kbase_mmu_hw.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2014-2015, 2018-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -55,32 +55,14 @@ enum kbase_mmu_fault_type { }; /** - * enum kbase_mmu_op_type - enum for MMU operations - * @KBASE_MMU_OP_NONE: To help catch uninitialized struct - * @KBASE_MMU_OP_FIRST: The lower boundary of enum - * @KBASE_MMU_OP_LOCK: Lock memory region - * @KBASE_MMU_OP_UNLOCK: Unlock memory region - * @KBASE_MMU_OP_FLUSH_PT: Flush page table (CLN+INV L2 only) - * @KBASE_MMU_OP_FLUSH_MEM: Flush memory (CLN+INV L2+LSC) - * @KBASE_MMU_OP_COUNT: The upper boundary of enum - */ -enum kbase_mmu_op_type { - KBASE_MMU_OP_NONE = 0, /* Must be zero */ - KBASE_MMU_OP_FIRST, /* Must be the first non-zero op */ - KBASE_MMU_OP_LOCK = KBASE_MMU_OP_FIRST, - KBASE_MMU_OP_UNLOCK, - KBASE_MMU_OP_FLUSH_PT, - KBASE_MMU_OP_FLUSH_MEM, - KBASE_MMU_OP_COUNT /* Must be the last in enum */ -}; - -/** - * struct kbase_mmu_hw_op_param - parameters for kbase_mmu_hw_do_operation() - * @vpfn: MMU Virtual Page Frame Number to start the operation on. - * @nr: Number of pages to work on. - * @op: Operation type (written to ASn_COMMAND). - * @kctx_id: Kernel context ID for MMU command tracepoint - * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. 
+ * struct kbase_mmu_hw_op_param - parameters for kbase_mmu_hw_do_* functions + * @vpfn: MMU Virtual Page Frame Number to start the operation on. + * @nr: Number of pages to work on. + * @op: Operation type (written to AS_COMMAND). + * @kctx_id: Kernel context ID for MMU command tracepoint. + * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + * @flush_skip_levels: Page table levels to skip flushing. (Only + * applicable if GPU supports feature) */ struct kbase_mmu_hw_op_param { u64 vpfn; @@ -88,6 +70,7 @@ struct kbase_mmu_hw_op_param { enum kbase_mmu_op_type op; u32 kctx_id; enum kbase_caller_mmu_sync_info mmu_sync_info; + u64 flush_skip_levels; }; /** @@ -102,18 +85,120 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as); /** - * kbase_mmu_hw_do_operation - Issue an operation to the MMU. - * @kbdev: kbase device to issue the MMU operation on. - * @as: address space to issue the MMU operation on. - * @op_param: parameters for the operation. + * kbase_mmu_hw_do_lock - Issue LOCK command to the MMU and program + * the LOCKADDR register. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * hwaccess_lock needs to be held when calling this function. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_unlock_no_addr - Issue UNLOCK command to the MMU without + * programming the LOCKADDR register and wait + * for it to complete before returning. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * This function should be called for GPU where GPU command is used to flush + * the cache(s) instead of MMU command. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_unlock_no_addr(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_unlock - Issue UNLOCK command to the MMU and wait for it + * to complete before returning. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +int kbase_mmu_hw_do_unlock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); +/** + * kbase_mmu_hw_do_lock - Issue a LOCK operation to the MMU. * - * Issue an operation (MMU invalidate, MMU flush, etc) on the address space that - * is associated with the provided kbase_context over the specified range + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Context: Acquires the hwaccess_lock, expects the caller to hold the mmu_hw_mutex * * Return: Zero if the operation was successful, non-zero otherwise. 
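A minimal sketch of how a caller might drive the split lock/flush/unlock interface using the parameter block documented above; the locking context and the call site are assumptions, not code taken from this patch:

/* Illustrative only: flush a range of nr pages starting at vpfn for one
 * address space, using the per-operation parameter block. Caller-side
 * locking (hwaccess_lock / mmu_hw_mutex) is elided.
 */
static int example_flush_range(struct kbase_device *kbdev, struct kbase_as *as,
                               struct kbase_context *kctx, u64 vpfn, u32 nr)
{
        const struct kbase_mmu_hw_op_param op_param = {
                .vpfn = vpfn,
                .nr = nr,
                .op = KBASE_MMU_OP_FLUSH_PT,
                .kctx_id = kctx ? kctx->id : 0xFFFFFFFF,
                .mmu_sync_info = CALLER_MMU_SYNC,
                .flush_skip_levels = 0, /* do not skip any page table levels */
        };

        /* On GPUs that flush via the GPU control interface the sequence is
         * LOCK, then a GPU cache flush, then UNLOCK, wrapped by this helper.
         */
        return kbase_mmu_hw_do_flush_on_gpu_ctrl(kbdev, as, &op_param);
}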
*/ -int kbase_mmu_hw_do_operation(struct kbase_device *kbdev, struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param); +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. This function should not be called for + * GPUs where MMU command to flush the cache(s) is deprecated. + * mmu_hw_mutex needs to be held when calling this function. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush_locked - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. This function should not be called for + * GPUs where MMU command to flush the cache(s) is deprecated. + * Both mmu_hw_mutex and hwaccess_lock need to be held when calling this + * function. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush_locked(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); + +/** + * kbase_mmu_hw_do_flush_on_gpu_ctrl - Issue a flush operation to the MMU. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to struct containing information about the MMU + * operation to perform. + * + * Issue a flush operation on the address space as per the information + * specified inside @op_param. GPU command is used to flush the cache(s) + * instead of the MMU command. + * + * Return: 0 if the operation was successful, non-zero otherwise. + */ +int kbase_mmu_hw_do_flush_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param); /** * kbase_mmu_hw_clear_fault - Clear a fault that has been previously reported by diff --git a/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c b/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c index cdf9a84..d5411bd 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c +++ b/mali_kbase/mmu/mali_kbase_mmu_hw_direct.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,15 +24,40 @@ #include <mali_kbase.h> #include <mali_kbase_ctx_sched.h> #include <mali_kbase_mem.h> +#include <mali_kbase_reset_gpu.h> #include <mmu/mali_kbase_mmu_hw.h> #include <tl/mali_kbase_tracepoints.h> +#include <linux/delay.h> + +#if MALI_USE_CSF +/** + * mmu_has_flush_skip_pgd_levels() - Check if the GPU has the feature + * AS_LOCKADDR_FLUSH_SKIP_LEVELS + * + * @gpu_props: GPU properties for the GPU instance. + * + * This function returns whether a cache flush can apply the skip flags of + * AS_LOCKADDR_FLUSH_SKIP_LEVELS. + * + * Return: True if cache flush has the said feature. + */ +static bool mmu_has_flush_skip_pgd_levels(struct kbase_gpu_props const *gpu_props) +{ + u32 const signature = + gpu_props->props.raw_props.gpu_id & (GPU_ID2_ARCH_MAJOR | GPU_ID2_ARCH_REV); + + return signature >= (u32)GPU_ID2_PRODUCT_MAKE(12, 0, 4, 0); +} +#endif /** * lock_region() - Generate lockaddr to lock memory region in MMU - * @gpu_props: GPU properties for finding the MMU lock region size - * @pfn: Starting page frame number of the region to lock - * @num_pages: Number of pages to lock. It must be greater than 0. - * @lockaddr: Address and size of memory region to lock + * + * @gpu_props: GPU properties for finding the MMU lock region size. + * @lockaddr: Address and size of memory region to lock. + * @op_param: Pointer to a struct containing the starting page frame number of + * the region to lock, the number of pages to lock and page table + * levels to skip when flushing (if supported). * * The lockaddr value is a combination of the starting address and * the size of the region that encompasses all the memory pages to lock. @@ -63,14 +88,14 @@ * * Return: 0 if success, or an error code on failure. */ -static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num_pages, - u64 *lockaddr) +static int lock_region(struct kbase_gpu_props const *gpu_props, u64 *lockaddr, + const struct kbase_mmu_hw_op_param *op_param) { - const u64 lockaddr_base = pfn << PAGE_SHIFT; - const u64 lockaddr_end = ((pfn + num_pages) << PAGE_SHIFT) - 1; + const u64 lockaddr_base = op_param->vpfn << PAGE_SHIFT; + const u64 lockaddr_end = ((op_param->vpfn + op_param->nr) << PAGE_SHIFT) - 1; u64 lockaddr_size_log2; - if (num_pages == 0) + if (op_param->nr == 0) return -EINVAL; /* The MMU lock region is a self-aligned region whose size @@ -101,7 +126,7 @@ static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num * therefore the highest bit that differs is bit #16 * and the region size (as a logarithm) is 16 + 1 = 17, i.e. 128 kB. */ - lockaddr_size_log2 = fls(lockaddr_base ^ lockaddr_end); + lockaddr_size_log2 = fls64(lockaddr_base ^ lockaddr_end); /* Cap the size against minimum and maximum values allowed. */ if (lockaddr_size_log2 > KBASE_LOCK_REGION_MAX_SIZE_LOG2) @@ -123,40 +148,69 @@ static int lock_region(struct kbase_gpu_props const *gpu_props, u64 pfn, u32 num *lockaddr = lockaddr_base & ~((1ull << lockaddr_size_log2) - 1); *lockaddr |= lockaddr_size_log2 - 1; +#if MALI_USE_CSF + if (mmu_has_flush_skip_pgd_levels(gpu_props)) + *lockaddr = + AS_LOCKADDR_FLUSH_SKIP_LEVELS_SET(*lockaddr, op_param->flush_skip_levels); +#endif + return 0; } -static int wait_ready(struct kbase_device *kbdev, - unsigned int as_nr) +/** + * wait_ready() - Wait for previously issued MMU command to complete. 
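As a stand-alone illustration of the self-aligned LOCKADDR encoding computed by the reworked lock_region() above, the following sketch mirrors the described arithmetic; the page shift and the minimum/maximum size caps are assumed values, and __builtin_clzll() stands in for the kernel's fls64():

#include <stdint.h>

/* Illustrative only: the lock region size is 2^n, where n is derived from the
 * highest bit that differs between the first and last byte addresses of the
 * range; the base is aligned down to that size and the low-order size field
 * carries (n - 1).
 */
static uint64_t example_lockaddr(uint64_t vpfn, uint32_t num_pages)
{
        const unsigned int page_shift = 12;    /* assumed 4 kB pages */
        const unsigned int min_size_log2 = 15; /* assumed lower cap */
        const unsigned int max_size_log2 = 48; /* assumed upper cap; the real
                                                * helper may reject larger sizes */
        uint64_t base, end;
        unsigned int size_log2;

        if (num_pages == 0)
                return 0; /* the real helper rejects empty ranges */

        base = vpfn << page_shift;
        end = ((vpfn + num_pages) << page_shift) - 1;
        size_log2 = 64 - __builtin_clzll(base ^ end); /* fls64(base ^ end) */

        if (size_log2 < min_size_log2)
                size_log2 = min_size_log2;
        if (size_log2 > max_size_log2)
                size_log2 = max_size_log2;

        return (base & ~((1ull << size_log2) - 1)) | (size_log2 - 1);
}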
+ * + * @kbdev: Kbase device to wait for a MMU command to complete. + * @as_nr: Address space to wait for a MMU command to complete. + * + * Reset GPU if the wait for previously issued command fails. + * + * Return: 0 on successful completion. negative error on failure. + */ +static int wait_ready(struct kbase_device *kbdev, unsigned int as_nr) { - unsigned int max_loops = KBASE_AS_INACTIVE_MAX_LOOPS; + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 mmu_as_inactive_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; - /* Wait for the MMU status to indicate there is no active command. */ - while (--max_loops && - kbase_reg_read(kbdev, MMU_AS_REG(as_nr, AS_STATUS)) & - AS_STATUS_AS_ACTIVE) { - ; - } + if (unlikely(kbdev->mmu_unresponsive)) + return -EBUSY; - if (WARN_ON_ONCE(max_loops == 0)) { - dev_err(kbdev->dev, - "AS_ACTIVE bit stuck for as %u, might be caused by slow/unstable GPU clock or possible faulty FPGA connector", - as_nr); - return -1; - } + do { + unsigned int i; - return 0; + for (i = 0; i < 1000; i++) { + /* Wait for the MMU status to indicate there is no active command */ + if (!(kbase_reg_read(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_STATUS))) & + AS_STATUS_AS_ACTIVE)) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < mmu_as_inactive_wait_time_ms); + + dev_err(kbdev->dev, + "AS_ACTIVE bit stuck for as %u. Might be caused by unstable GPU clk/pwr or faulty system", + as_nr); + kbdev->mmu_unresponsive = true; + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; } static int write_cmd(struct kbase_device *kbdev, int as_nr, u32 cmd) { - int status; - /* write AS_COMMAND when MMU is ready to accept another command */ - status = wait_ready(kbdev, as_nr); - if (status == 0) - kbase_reg_write(kbdev, MMU_AS_REG(as_nr, AS_COMMAND), cmd); - else { + const int status = wait_ready(kbdev, as_nr); + + if (likely(status == 0)) + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_COMMAND)), cmd); + else if (status == -EBUSY) { + dev_dbg(kbdev->dev, + "Skipped the wait for AS_ACTIVE bit for as %u, before sending MMU command %u", + as_nr, cmd); + } else { dev_err(kbdev->dev, "Wait for AS_ACTIVE bit failed for as %u, before sending MMU command %u", as_nr, cmd); @@ -165,6 +219,131 @@ static int write_cmd(struct kbase_device *kbdev, int as_nr, u32 cmd) return status; } +#if MALI_USE_CSF +static int wait_l2_power_trans_complete(struct kbase_device *kbdev) +{ + const ktime_t wait_loop_start = ktime_get_raw(); + const u32 pwr_trans_wait_time_ms = kbdev->mmu_or_gpu_cache_op_wait_time_ms; + s64 diff; + u64 value; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + do { + unsigned int i; + + for (i = 0; i < 1000; i++) { + value = kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_PWRTRANS_HI)); + value <<= 32; + value |= kbase_reg_read(kbdev, GPU_CONTROL_REG(L2_PWRTRANS_LO)); + + if (!value) + return 0; + } + + diff = ktime_to_ms(ktime_sub(ktime_get_raw(), wait_loop_start)); + } while (diff < pwr_trans_wait_time_ms); + + dev_warn(kbdev->dev, "L2_PWRTRANS %016llx set for too long", value); + + if (kbase_prepare_to_reset_gpu_locked(kbdev, RESET_FLAGS_NONE)) + kbase_reset_gpu_locked(kbdev); + + return -ETIMEDOUT; +} + +#if !IS_ENABLED(CONFIG_MALI_NO_MALI) +static int wait_cores_power_trans_complete(struct kbase_device *kbdev) +{ +#define WAIT_TIMEOUT 50000 /* 50ms timeout */ +#define DELAY_TIME_IN_US 1 + const int max_iterations = 
WAIT_TIMEOUT; + int loop; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + for (loop = 0; loop < max_iterations; loop++) { + u32 lo = + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO)); + u32 hi = + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)); + + if (!lo && !hi) + break; + + udelay(DELAY_TIME_IN_US); + } + + if (loop == max_iterations) { + dev_warn(kbdev->dev, "SHADER_PWRTRANS %08x%08x set for too long", + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_HI)), + kbase_reg_read(kbdev, GPU_CONTROL_REG(SHADER_PWRTRANS_LO))); + return -ETIMEDOUT; + } + + return 0; +} + +/** + * apply_hw_issue_GPU2019_3901_wa - Apply WA for the HW issue GPU2019_3901 + * + * @kbdev: Kbase device to issue the MMU operation on. + * @mmu_cmd: Pointer to the variable contain the value of MMU command + * that needs to be sent to flush the L2 cache and do an + * implicit unlock. + * @as_nr: Address space number for which MMU command needs to be + * sent. + * + * This function ensures that the flush of LSC is not missed for the pages that + * were unmapped from the GPU, due to the power down transition of shader cores. + * + * Return: 0 if the WA was successfully applied, non-zero otherwise. + */ +static int apply_hw_issue_GPU2019_3901_wa(struct kbase_device *kbdev, u32 *mmu_cmd, + unsigned int as_nr) +{ + int ret = 0; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Check if L2 is OFF. The cores also must be OFF if L2 is not up, so + * the workaround can be safely skipped. + */ + if (kbdev->pm.backend.l2_state != KBASE_L2_OFF) { + if (unlikely(*mmu_cmd != AS_COMMAND_FLUSH_MEM)) { + dev_warn(kbdev->dev, "Unexpected MMU command(%u) received", *mmu_cmd); + return -EINVAL; + } + + /* Wait for the LOCK MMU command to complete, issued by the caller */ + ret = wait_ready(kbdev, as_nr); + if (unlikely(ret)) + return ret; + + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, + GPU_COMMAND_CACHE_CLN_INV_LSC); + if (unlikely(ret)) + return ret; + + ret = wait_cores_power_trans_complete(kbdev); + if (unlikely(ret)) { + if (kbase_prepare_to_reset_gpu_locked(kbdev, + RESET_FLAGS_HWC_UNRECOVERABLE_ERROR)) + kbase_reset_gpu_locked(kbdev); + return ret; + } + + /* As LSC is guaranteed to have been flushed we can use FLUSH_PT + * MMU command to only flush the L2. 
+ */ + *mmu_cmd = AS_COMMAND_FLUSH_PT; + } + + return ret; +} +#endif /* !IS_ENABLED(CONFIG_MALI_NO_MALI) */ +#endif /* MALI_USE_CSF */ + void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) { struct kbase_mmu_setup *current_setup = &as->current_setup; @@ -195,19 +374,18 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) transcfg = (transcfg | AS_TRANSCFG_PTW_SH_OS); } - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSCFG_LO), - transcfg); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSCFG_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSCFG_LO)), transcfg); + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSCFG_HI)), (transcfg >> 32) & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSTAB_LO), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSTAB_LO)), current_setup->transtab & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_TRANSTAB_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_TRANSTAB_HI)), (current_setup->transtab >> 32) & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_MEMATTR_LO), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_MEMATTR_LO)), current_setup->memattr & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, MMU_AS_REG(as->number, AS_MEMATTR_HI), + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_MEMATTR_HI)), (current_setup->memattr >> 32) & 0xFFFFFFFFUL); KBASE_TLSTREAM_TL_ATTRIB_AS_CONFIG(kbdev, as, @@ -222,93 +400,302 @@ void kbase_mmu_hw_configure(struct kbase_device *kbdev, struct kbase_as *as) #endif } -int kbase_mmu_hw_do_operation(struct kbase_device *kbdev, struct kbase_as *as, - struct kbase_mmu_hw_op_param *op_param) +/** + * mmu_command_instr - Record an MMU command for instrumentation purposes. + * + * @kbdev: Kbase device used to issue MMU operation on. + * @kctx_id: Kernel context ID for MMU command tracepoint. + * @cmd: Command issued to the MMU. + * @lock_addr: Address of memory region locked for the operation. + * @mmu_sync_info: Indicates whether this call is synchronous wrt MMU ops. + */ +static void mmu_command_instr(struct kbase_device *kbdev, u32 kctx_id, u32 cmd, u64 lock_addr, + enum kbase_caller_mmu_sync_info mmu_sync_info) +{ + u64 lock_addr_base = AS_LOCKADDR_LOCKADDR_BASE_GET(lock_addr); + u32 lock_addr_size = AS_LOCKADDR_LOCKADDR_SIZE_GET(lock_addr); + + bool is_mmu_synchronous = (mmu_sync_info == CALLER_MMU_SYNC); + + KBASE_TLSTREAM_AUX_MMU_COMMAND(kbdev, kctx_id, cmd, is_mmu_synchronous, lock_addr_base, + lock_addr_size); +} + +/* Helper function to program the LOCKADDR register before LOCK/UNLOCK command + * is issued. + */ +static int mmu_hw_set_lock_addr(struct kbase_device *kbdev, int as_nr, u64 *lock_addr, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret; + + ret = lock_region(&kbdev->gpu_props, lock_addr, op_param); + + if (!ret) { + /* Set the region that needs to be updated */ + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_LOCKADDR_LO)), + *lock_addr & 0xFFFFFFFFUL); + kbase_reg_write(kbdev, MMU_STAGE1_REG(MMU_AS_REG(as_nr, AS_LOCKADDR_HI)), + (*lock_addr >> 32) & 0xFFFFFFFFUL); + } + return ret; +} + +/** + * mmu_hw_do_lock_no_wait - Issue LOCK command to the MMU and return without + * waiting for it's completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. 
+ * @lock_addr: Address of memory region locked for this operation. + * @op_param: Pointer to a struct containing information about the MMU operation. + * + * Return: 0 if issuing the command was successful, otherwise an error code. + */ +static int mmu_hw_do_lock_no_wait(struct kbase_device *kbdev, struct kbase_as *as, u64 *lock_addr, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret; + + ret = mmu_hw_set_lock_addr(kbdev, as->number, lock_addr, op_param); + + if (likely(!ret)) + ret = write_cmd(kbdev, as->number, AS_COMMAND_LOCK); + + return ret; +} + +/** + * mmu_hw_do_lock - Issue LOCK command to the MMU and wait for its completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to a struct containing information about the MMU operation. + * + * Return: 0 if issuing the LOCK command was successful, otherwise an error code. + */ +static int mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) { int ret; u64 lock_addr = 0x0; - if (WARN_ON(kbdev == NULL) || - WARN_ON(as == NULL) || - WARN_ON(op_param == NULL)) + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) return -EINVAL; - lockdep_assert_held(&kbdev->mmu_hw_mutex); + ret = mmu_hw_do_lock_no_wait(kbdev, as, &lock_addr, op_param); + + if (!ret) + ret = wait_ready(kbdev, as->number); + + if (!ret) + mmu_command_instr(kbdev, op_param->kctx_id, AS_COMMAND_LOCK, lock_addr, + op_param->mmu_sync_info); + else + dev_err(kbdev->dev, "AS_ACTIVE bit stuck after sending UNLOCK command"); - if (op_param->op == KBASE_MMU_OP_UNLOCK) { - /* Unlock doesn't require a lock first */ - ret = write_cmd(kbdev, as->number, AS_COMMAND_UNLOCK); + return ret; +} + +int kbase_mmu_hw_do_lock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); - /* Wait for UNLOCK command to complete */ + return mmu_hw_do_lock(kbdev, as, op_param); +} + +int kbase_mmu_hw_do_unlock_no_addr(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret = 0; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + ret = write_cmd(kbdev, as->number, AS_COMMAND_UNLOCK); + + /* Wait for UNLOCK command to complete */ + if (likely(!ret)) ret = wait_ready(kbdev, as->number); - if (!ret) { - /* read MMU_AS_CONTROL.LOCKADDR register */ - lock_addr |= (u64)kbase_reg_read(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_HI)) << 32; - lock_addr |= (u64)kbase_reg_read(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_LO)); + if (likely(!ret)) { + u64 lock_addr = 0x0; + /* read MMU_AS_CONTROL.LOCKADDR register */ + lock_addr |= (u64)kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_LOCKADDR_HI))) + << 32; + lock_addr |= (u64)kbase_reg_read( + kbdev, MMU_STAGE1_REG(MMU_AS_REG(as->number, AS_LOCKADDR_LO))); + + mmu_command_instr(kbdev, op_param->kctx_id, AS_COMMAND_UNLOCK, + lock_addr, op_param->mmu_sync_info); + } + + return ret; +} + +int kbase_mmu_hw_do_unlock(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret = 0; + u64 lock_addr = 0x0; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + ret = mmu_hw_set_lock_addr(kbdev, as->number, &lock_addr, op_param); + + if (!ret) + ret = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, + op_param); + + return ret; +} + +/** + * mmu_hw_do_flush - Flush MMU and 
wait for its completion. + * + * @kbdev: Kbase device to issue the MMU operation on. + * @as: Address space to issue the MMU operation on. + * @op_param: Pointer to a struct containing information about the MMU operation. + * @hwaccess_locked: Flag to indicate if the lock has been held. + * + * Return: 0 if flushing MMU was successful, otherwise an error code. + */ +static int mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param, bool hwaccess_locked) +{ + int ret; + u64 lock_addr = 0x0; + u32 mmu_cmd = AS_COMMAND_FLUSH_MEM; + const enum kbase_mmu_op_type flush_op = op_param->op; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + /* MMU operations can be either FLUSH_PT or FLUSH_MEM, anything else at + * this point would be unexpected. + */ + if (flush_op != KBASE_MMU_OP_FLUSH_PT && flush_op != KBASE_MMU_OP_FLUSH_MEM) { + dev_err(kbdev->dev, "Unexpected flush operation received"); + return -EINVAL; + } + + lockdep_assert_held(&kbdev->mmu_hw_mutex); + + if (flush_op == KBASE_MMU_OP_FLUSH_PT) + mmu_cmd = AS_COMMAND_FLUSH_PT; + + /* Lock the region that needs to be updated */ + ret = mmu_hw_do_lock_no_wait(kbdev, as, &lock_addr, op_param); + if (ret) + return ret; + +#if MALI_USE_CSF && !IS_ENABLED(CONFIG_MALI_NO_MALI) + /* WA for the BASE_HW_ISSUE_GPU2019_3901. */ + if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_GPU2019_3901) && + mmu_cmd == AS_COMMAND_FLUSH_MEM) { + if (!hwaccess_locked) { + unsigned long flags = 0; + + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + ret = apply_hw_issue_GPU2019_3901_wa(kbdev, &mmu_cmd, as->number); + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + } else { + ret = apply_hw_issue_GPU2019_3901_wa(kbdev, &mmu_cmd, as->number); } - } else if (op_param->op >= KBASE_MMU_OP_FIRST && - op_param->op < KBASE_MMU_OP_COUNT) { - ret = lock_region(&kbdev->gpu_props, op_param->vpfn, op_param->nr, &lock_addr); - - if (!ret) { - /* Lock the region that needs to be updated */ - kbase_reg_write(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_LO), - lock_addr & 0xFFFFFFFFUL); - kbase_reg_write(kbdev, - MMU_AS_REG(as->number, AS_LOCKADDR_HI), - (lock_addr >> 32) & 0xFFFFFFFFUL); - write_cmd(kbdev, as->number, AS_COMMAND_LOCK); - - /* Translate and send operation to HW */ - switch (op_param->op) { - case KBASE_MMU_OP_FLUSH_PT: - write_cmd(kbdev, as->number, - AS_COMMAND_FLUSH_PT); - break; - case KBASE_MMU_OP_FLUSH_MEM: - write_cmd(kbdev, as->number, - AS_COMMAND_FLUSH_MEM); - break; - case KBASE_MMU_OP_LOCK: - /* No further operation. */ - break; - default: - dev_warn(kbdev->dev, - "Unsupported MMU operation (op=%d).\n", - op_param->op); - return -EINVAL; - }; - - /* Wait for the command to complete */ - ret = wait_ready(kbdev, as->number); + + if (ret) { + dev_warn( + kbdev->dev, + "Failed to apply WA for HW issue when doing MMU flush op on VA range %llx-%llx for AS %u", + op_param->vpfn << PAGE_SHIFT, + ((op_param->vpfn + op_param->nr) << PAGE_SHIFT) - 1, as->number); + /* Continue with the MMU flush operation */ } - } else { - /* Code should not reach here. 
*/ - dev_warn(kbdev->dev, "Invalid mmu operation (op=%d).\n", - op_param->op); + } +#endif + + ret = write_cmd(kbdev, as->number, mmu_cmd); + + /* Wait for the command to complete */ + if (likely(!ret)) + ret = wait_ready(kbdev, as->number); + + if (likely(!ret)) { + mmu_command_instr(kbdev, op_param->kctx_id, mmu_cmd, lock_addr, + op_param->mmu_sync_info); +#if MALI_USE_CSF + if (flush_op == KBASE_MMU_OP_FLUSH_MEM && + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa && + kbdev->pm.backend.l2_state == KBASE_L2_PEND_OFF) + ret = wait_l2_power_trans_complete(kbdev); +#endif + } + + return ret; +} + +int kbase_mmu_hw_do_flush_locked(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + return mmu_hw_do_flush(kbdev, as, op_param, true); +} + +int kbase_mmu_hw_do_flush(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + return mmu_hw_do_flush(kbdev, as, op_param, false); +} + +int kbase_mmu_hw_do_flush_on_gpu_ctrl(struct kbase_device *kbdev, struct kbase_as *as, + const struct kbase_mmu_hw_op_param *op_param) +{ + int ret, ret2; + u32 gpu_cmd = GPU_COMMAND_CACHE_CLN_INV_L2_LSC; + const enum kbase_mmu_op_type flush_op = op_param->op; + + if (WARN_ON(kbdev == NULL) || WARN_ON(as == NULL)) + return -EINVAL; + + /* MMU operations can be either FLUSH_PT or FLUSH_MEM, anything else at + * this point would be unexpected. + */ + if (flush_op != KBASE_MMU_OP_FLUSH_PT && flush_op != KBASE_MMU_OP_FLUSH_MEM) { + dev_err(kbdev->dev, "Unexpected flush operation received"); return -EINVAL; } - /* MMU command instrumentation */ - if (!ret) { - u64 lock_addr_base = AS_LOCKADDR_LOCKADDR_BASE_GET(lock_addr); - u32 lock_addr_size = AS_LOCKADDR_LOCKADDR_SIZE_GET(lock_addr); + lockdep_assert_held(&kbdev->hwaccess_lock); + lockdep_assert_held(&kbdev->mmu_hw_mutex); + + if (flush_op == KBASE_MMU_OP_FLUSH_PT) + gpu_cmd = GPU_COMMAND_CACHE_CLN_INV_L2; + + /* 1. Issue MMU_AS_CONTROL.COMMAND.LOCK operation. */ + ret = mmu_hw_do_lock(kbdev, as, op_param); + if (ret) + return ret; - bool is_mmu_synchronous = false; + /* 2. Issue GPU_CONTROL.COMMAND.FLUSH_CACHES operation */ + ret = kbase_gpu_cache_flush_and_busy_wait(kbdev, gpu_cmd); - if (op_param->mmu_sync_info == CALLER_MMU_SYNC) - is_mmu_synchronous = true; + /* 3. Issue MMU_AS_CONTROL.COMMAND.UNLOCK operation. 
*/ + ret2 = kbase_mmu_hw_do_unlock_no_addr(kbdev, as, op_param); - KBASE_TLSTREAM_AUX_MMU_COMMAND(kbdev, op_param->kctx_id, - op_param->op, is_mmu_synchronous, - lock_addr_base, lock_addr_size); +#if MALI_USE_CSF + if (!ret && !ret2) { + if (flush_op == KBASE_MMU_OP_FLUSH_MEM && + kbdev->pm.backend.apply_hw_issue_TITANHW_2938_wa && + kbdev->pm.backend.l2_state == KBASE_L2_PEND_OFF) + ret = wait_l2_power_trans_complete(kbdev); } +#endif - return ret; + return ret ?: ret2; } void kbase_mmu_hw_clear_fault(struct kbase_device *kbdev, struct kbase_as *as, @@ -333,7 +720,7 @@ void kbase_mmu_hw_clear_fault(struct kbase_device *kbdev, struct kbase_as *as, type == KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED) pf_bf_mask |= MMU_BUS_ERROR(as->number); #endif - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_CLEAR), pf_bf_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_CLEAR), pf_bf_mask); unlock: spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); @@ -357,15 +744,15 @@ void kbase_mmu_hw_enable_fault(struct kbase_device *kbdev, struct kbase_as *as, if (kbdev->irq_reset_flush) goto unlock; - irq_mask = kbase_reg_read(kbdev, MMU_REG(MMU_IRQ_MASK)) | - MMU_PAGE_FAULT(as->number); + irq_mask = + kbase_reg_read(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK)) | MMU_PAGE_FAULT(as->number); #if !MALI_USE_CSF if (type == KBASE_MMU_FAULT_TYPE_BUS || type == KBASE_MMU_FAULT_TYPE_BUS_UNEXPECTED) irq_mask |= MMU_BUS_ERROR(as->number); #endif - kbase_reg_write(kbdev, MMU_REG(MMU_IRQ_MASK), irq_mask); + kbase_reg_write(kbdev, MMU_CONTROL_REG(MMU_IRQ_MASK), irq_mask); unlock: spin_unlock_irqrestore(&kbdev->mmu_mask_change, flags); diff --git a/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c b/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c index c061099..f2c6274 100644 --- a/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c +++ b/mali_kbase/mmu/mali_kbase_mmu_mode_aarch64.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2014, 2016-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2014, 2016-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -35,10 +35,8 @@ #define ENTRY_IS_INVAL 2ULL #define ENTRY_IS_PTE 3ULL -#define ENTRY_ATTR_BITS (7ULL << 2) /* bits 4:2 */ #define ENTRY_ACCESS_RW (1ULL << 6) /* bits 6:7 */ #define ENTRY_ACCESS_RO (3ULL << 6) -#define ENTRY_SHARE_BITS (3ULL << 8) /* bits 9:8 */ #define ENTRY_ACCESS_BIT (1ULL << 10) #define ENTRY_NX_BIT (1ULL << 54) @@ -189,35 +187,31 @@ static void set_num_valid_entries(u64 *pgd, unsigned int num_of_valid_entries) << UNUSED_BIT_POSITION_IN_PAGE_DESCRIPTOR); } -static void entry_set_pte(u64 *pgd, u64 vpfn, phys_addr_t phy) +static void entry_set_pte(u64 *entry, phys_addr_t phy) { - unsigned int nr_entries = get_num_valid_entries(pgd); - - page_table_entry_set(&pgd[vpfn], (phy & PAGE_MASK) | ENTRY_ACCESS_BIT | - ENTRY_IS_PTE); - - set_num_valid_entries(pgd, nr_entries + 1); + page_table_entry_set(entry, (phy & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_PTE); } -static void entry_invalidate(u64 *entry) +static void entries_invalidate(u64 *entry, u32 count) { - page_table_entry_set(entry, ENTRY_IS_INVAL); + u32 i; + + for (i = 0; i < count; i++) + page_table_entry_set(entry + i, ENTRY_IS_INVAL); } -static const struct kbase_mmu_mode aarch64_mode = { - .update = mmu_update, - .get_as_setup = kbase_mmu_get_as_setup, - .disable_as = mmu_disable_as, - .pte_to_phy_addr = pte_to_phy_addr, - .ate_is_valid = ate_is_valid, - .pte_is_valid = pte_is_valid, - .entry_set_ate = entry_set_ate, - .entry_set_pte = entry_set_pte, - .entry_invalidate = entry_invalidate, - .get_num_valid_entries = get_num_valid_entries, - .set_num_valid_entries = set_num_valid_entries, - .flags = KBASE_MMU_MODE_HAS_NON_CACHEABLE -}; +static const struct kbase_mmu_mode aarch64_mode = { .update = mmu_update, + .get_as_setup = kbase_mmu_get_as_setup, + .disable_as = mmu_disable_as, + .pte_to_phy_addr = pte_to_phy_addr, + .ate_is_valid = ate_is_valid, + .pte_is_valid = pte_is_valid, + .entry_set_ate = entry_set_ate, + .entry_set_pte = entry_set_pte, + .entries_invalidate = entries_invalidate, + .get_num_valid_entries = get_num_valid_entries, + .set_num_valid_entries = set_num_valid_entries, + .flags = KBASE_MMU_MODE_HAS_NON_CACHEABLE }; struct kbase_mmu_mode const *kbase_mmu_mode_get_aarch64(void) { diff --git a/mali_kbase/platform/Kconfig b/mali_kbase/platform/Kconfig index de4203c..b190e26 100644 --- a/mali_kbase/platform/Kconfig +++ b/mali_kbase/platform/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2012-2013, 2017, 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2012-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -20,7 +20,7 @@ # Add your platform specific Kconfig file here # -# "drivers/gpu/arm/midgard/platform/xxx/Kconfig" +# "$(MALI_KCONFIG_EXT_PREFIX)drivers/gpu/arm/midgard/platform/xxx/Kconfig" # # Where xxx is the platform name is the name set in MALI_PLATFORM_NAME # diff --git a/mali_kbase/platform/devicetree/Kbuild b/mali_kbase/platform/devicetree/Kbuild index 5eeccfa..995c4cd 100644 --- a/mali_kbase/platform/devicetree/Kbuild +++ b/mali_kbase/platform/devicetree/Kbuild @@ -20,6 +20,5 @@ mali_kbase-y += \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_devicetree.o \ - platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_platform.o \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_runtime_pm.o \ platform/$(MALI_PLATFORM_DIR)/mali_kbase_clk_rate_trace.o diff --git a/mali_kbase/platform/devicetree/mali_kbase_config_platform.h b/mali_kbase/platform/devicetree/mali_kbase_config_platform.h index 743885f..584a721 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_config_platform.h +++ b/mali_kbase/platform/devicetree/mali_kbase_config_platform.h @@ -33,13 +33,12 @@ * Attached value: pointer to @ref kbase_platform_funcs_conf * Default value: See @ref kbase_platform_funcs_conf */ -#define PLATFORM_FUNCS (&platform_funcs) +#define PLATFORM_FUNCS (NULL) #define CLK_RATE_TRACE_OPS (&clk_rate_trace_ops) extern struct kbase_pm_callback_conf pm_callbacks; extern struct kbase_clk_rate_trace_op_conf clk_rate_trace_ops; -extern struct kbase_platform_funcs_conf platform_funcs; /** * AUTO_SUSPEND_DELAY - Autosuspend delay * diff --git a/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c b/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c index 3881d28..a019229 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c +++ b/mali_kbase/platform/devicetree/mali_kbase_runtime_pm.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -50,7 +50,6 @@ static void enable_gpu_power_control(struct kbase_device *kbdev) } } - static void disable_gpu_power_control(struct kbase_device *kbdev) { unsigned int i; @@ -82,8 +81,7 @@ static int pm_callback_power_on(struct kbase_device *kbdev) int error; unsigned long flags; - dev_dbg(kbdev->dev, "%s %p\n", __func__, - (void *)kbdev->dev->pm_domain); + dev_dbg(kbdev->dev, "%s %pK\n", __func__, (void *)kbdev->dev->pm_domain); spin_lock_irqsave(&kbdev->hwaccess_lock, flags); WARN_ON(kbdev->pm.backend.gpu_powered); @@ -99,9 +97,8 @@ static int pm_callback_power_on(struct kbase_device *kbdev) #else spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); +#ifdef KBASE_PM_RUNTIME error = pm_runtime_get_sync(kbdev->dev); - enable_gpu_power_control(kbdev); - if (error == 1) { /* * Let core know that the chip has not been @@ -109,8 +106,11 @@ static int pm_callback_power_on(struct kbase_device *kbdev) */ ret = 0; } - dev_dbg(kbdev->dev, "pm_runtime_get_sync returned %d\n", error); +#else + enable_gpu_power_control(kbdev); +#endif /* KBASE_PM_RUNTIME */ + #endif /* MALI_USE_CSF */ return ret; @@ -126,7 +126,9 @@ static void pm_callback_power_off(struct kbase_device *kbdev) WARN_ON(kbdev->pm.backend.gpu_powered); #if MALI_USE_CSF if (likely(kbdev->csf.firmware_inited)) { +#ifdef CONFIG_MALI_DEBUG WARN_ON(kbase_csf_scheduler_get_nr_active_csgs(kbdev)); +#endif WARN_ON(kbdev->pm.backend.mcu_state != KBASE_MCU_OFF); } spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); @@ -241,7 +243,9 @@ static int pm_callback_runtime_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); +#if !MALI_USE_CSF enable_gpu_power_control(kbdev); +#endif return 0; } @@ -249,7 +253,9 @@ static void pm_callback_runtime_off(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); +#if !MALI_USE_CSF disable_gpu_power_control(kbdev); +#endif } static void pm_callback_resume(struct kbase_device *kbdev) @@ -264,6 +270,17 @@ static void pm_callback_suspend(struct kbase_device *kbdev) pm_callback_runtime_off(kbdev); } +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void pm_callback_sc_rails_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are on"); +} + +static void pm_callback_sc_rails_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are off"); +} +#endif struct kbase_pm_callback_conf pm_callbacks = { .power_on_callback = pm_callback_power_on, @@ -289,6 +306,9 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_runtime_gpu_idle_callback = NULL, .power_runtime_gpu_active_callback = NULL, #endif -}; - +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = pm_callback_sc_rails_on, + .power_off_sc_rails_callback = pm_callback_sc_rails_off, +#endif +}; diff --git a/mali_kbase/platform/meson/Kbuild b/mali_kbase/platform/meson/Kbuild new file mode 100644 index 0000000..3f55378 --- /dev/null +++ b/mali_kbase/platform/meson/Kbuild @@ -0,0 +1,23 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# (C) COPYRIGHT 2012-2017, 2019-2021 ARM Limited. All rights reserved. +# +# This program is free software and is provided to you under the terms of the +# GNU General Public License version 2 as published by the Free Software +# Foundation, and any use by you of this program is subject to the terms +# of such GNU license. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, you can access it online at +# http://www.gnu.org/licenses/gpl-2.0.html. +# +# + +mali_kbase-y += \ + platform/$(MALI_PLATFORM_DIR)/mali_kbase_config_meson.o \ + platform/$(MALI_PLATFORM_DIR)/mali_kbase_runtime_pm.o diff --git a/mali_kbase/platform/devicetree/mali_kbase_config_platform.c b/mali_kbase/platform/meson/mali_kbase_config_meson.c index 2eebed0..c999a52 100644 --- a/mali_kbase/platform/devicetree/mali_kbase_config_platform.c +++ b/mali_kbase/platform/meson/mali_kbase_config_meson.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2021-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015, 2017, 2019, 2021, 2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -20,24 +20,34 @@ */ #include <mali_kbase.h> -#include <mali_kbase_defs.h> #include <mali_kbase_config.h> -#include "mali_kbase_config_platform.h" -#include <device/mali_kbase_device.h> -#include <mali_kbase_hwaccess_time.h> -#include <gpu/mali_kbase_gpu_regmap.h> +#include <backend/gpu/mali_kbase_pm_internal.h> -#include <linux/kthread.h> -#include <linux/timer.h> -#include <linux/jiffies.h> -#include <linux/wait.h> -#include <linux/delay.h> -#include <linux/gcd.h> -#include <asm/arch_timer.h> +static struct kbase_platform_config dummy_platform_config; -struct kbase_platform_funcs_conf platform_funcs = { - .platform_init_func = NULL, - .platform_term_func = NULL, - .platform_late_init_func = NULL, - .platform_late_term_func = NULL, -}; +struct kbase_platform_config *kbase_get_platform_config(void) +{ + return &dummy_platform_config; +} + +#ifndef CONFIG_OF +int kbase_platform_register(void) +{ + return 0; +} + +void kbase_platform_unregister(void) +{ +} +#endif + +#ifdef CONFIG_MALI_MIDGARD_DVFS +#if MALI_USE_CSF +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation) +#else +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation, u32 util_gl_share, u32 util_cl_share[2]) +#endif +{ + return 1; +} +#endif /* CONFIG_MALI_MIDGARD_DVFS */ diff --git a/mali_kbase/platform/meson/mali_kbase_config_platform.h b/mali_kbase/platform/meson/mali_kbase_config_platform.h new file mode 100644 index 0000000..866a7de --- /dev/null +++ b/mali_kbase/platform/meson/mali_kbase_config_platform.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2014-2017, 2019-2023 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ + +/** + * POWER_MANAGEMENT_CALLBACKS - Power management configuration + * + * Attached value: pointer to @ref kbase_pm_callback_conf + * Default value: See @ref kbase_pm_callback_conf + */ +#define POWER_MANAGEMENT_CALLBACKS (&pm_callbacks) + +/** + * PLATFORM_FUNCS - Platform specific configuration functions + * + * Attached value: pointer to @ref kbase_platform_funcs_conf + * Default value: See @ref kbase_platform_funcs_conf + */ +#define PLATFORM_FUNCS (NULL) + +extern struct kbase_pm_callback_conf pm_callbacks; + +/** + * AUTO_SUSPEND_DELAY - Autosuspend delay + * + * The delay time (in milliseconds) to be used for autosuspend + */ +#define AUTO_SUSPEND_DELAY (100) diff --git a/mali_kbase/platform/meson/mali_kbase_runtime_pm.c b/mali_kbase/platform/meson/mali_kbase_runtime_pm.c new file mode 100644 index 0000000..a9b380c --- /dev/null +++ b/mali_kbase/platform/meson/mali_kbase_runtime_pm.c @@ -0,0 +1,290 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2015, 2017-2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ + +#include <mali_kbase.h> +#include <mali_kbase_defs.h> +#include <device/mali_kbase_device.h> + +#include <linux/pm_runtime.h> +#include <linux/reset.h> +#include <linux/clk.h> +#include <linux/clk-provider.h> +#include <linux/delay.h> +#include <linux/regulator/consumer.h> + +#include "mali_kbase_config_platform.h" + + +static struct reset_control **resets; +static int nr_resets; + +static int resets_init(struct kbase_device *kbdev) +{ + struct device_node *np; + int i; + int err = 0; + + np = kbdev->dev->of_node; + + nr_resets = of_count_phandle_with_args(np, "resets", "#reset-cells"); + if (nr_resets <= 0) { + dev_err(kbdev->dev, "Failed to get GPU resets from dtb\n"); + return nr_resets; + } + + resets = devm_kcalloc(kbdev->dev, nr_resets, sizeof(*resets), + GFP_KERNEL); + if (!resets) + return -ENOMEM; + + for (i = 0; i < nr_resets; ++i) { + resets[i] = devm_reset_control_get_exclusive_by_index( + kbdev->dev, i); + if (IS_ERR(resets[i])) { + err = PTR_ERR(resets[i]); + nr_resets = i; + break; + } + } + + return err; +} + +static int pm_callback_soft_reset(struct kbase_device *kbdev) +{ + int ret, i; + + if (!resets) { + ret = resets_init(kbdev); + if (ret) + return ret; + } + + for (i = 0; i < nr_resets; ++i) + reset_control_assert(resets[i]); + + udelay(10); + + for (i = 0; i < nr_resets; ++i) + reset_control_deassert(resets[i]); + + udelay(10); + + /* Override Power Management Settings, values from manufacturer's defaults */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(PWR_KEY), 0x2968A819); + kbase_reg_write(kbdev, GPU_CONTROL_REG(PWR_OVERRIDE1), + 0xfff | (0x20 << 16)); + + /* + * RESET_COMPLETED interrupt will be raised, so continue with + * the normal soft reset procedure + */ + return 0; +} + +static void enable_gpu_power_control(struct kbase_device *kbdev) +{ + unsigned int i; + +#if defined(CONFIG_REGULATOR) + for (i = 0; i < kbdev->nr_regulators; i++) { + if (WARN_ON(kbdev->regulators[i] == NULL)) + ; + else if (!regulator_is_enabled(kbdev->regulators[i])) + WARN_ON(regulator_enable(kbdev->regulators[i])); + } +#endif + + for (i = 0; i < kbdev->nr_clocks; i++) { + if (WARN_ON(kbdev->clocks[i] == NULL)) + ; + else if (!__clk_is_enabled(kbdev->clocks[i])) + WARN_ON(clk_prepare_enable(kbdev->clocks[i])); + } +} + +static void disable_gpu_power_control(struct kbase_device *kbdev) +{ + unsigned int i; + + for (i = 0; i < kbdev->nr_clocks; i++) { + if (WARN_ON(kbdev->clocks[i] == NULL)) + ; + else if (__clk_is_enabled(kbdev->clocks[i])) { + clk_disable_unprepare(kbdev->clocks[i]); + WARN_ON(__clk_is_enabled(kbdev->clocks[i])); + } + } + +#if defined(CONFIG_REGULATOR) + for (i = 0; i < kbdev->nr_regulators; i++) { + if (WARN_ON(kbdev->regulators[i] == NULL)) + ; + else if (regulator_is_enabled(kbdev->regulators[i])) + WARN_ON(regulator_disable(kbdev->regulators[i])); + } +#endif +} + +static int pm_callback_power_on(struct kbase_device *kbdev) +{ + int ret = 1; /* Assume GPU has been powered off */ + int error; + + dev_dbg(kbdev->dev, "%s %pK\n", __func__, (void *)kbdev->dev->pm_domain); + +#ifdef KBASE_PM_RUNTIME + error = pm_runtime_get_sync(kbdev->dev); + if (error == 1) { + /* + * Let core know that the chip has not been + * powered off, so we can save on re-initialization. 
+ */ + ret = 0; + } + dev_dbg(kbdev->dev, "pm_runtime_get_sync returned %d\n", error); +#else + enable_gpu_power_control(kbdev); +#endif + + return ret; +} + +static void pm_callback_power_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + +#ifdef KBASE_PM_RUNTIME + pm_runtime_mark_last_busy(kbdev->dev); + pm_runtime_put_autosuspend(kbdev->dev); +#else + /* Power down the GPU immediately as runtime PM is disabled */ + disable_gpu_power_control(kbdev); +#endif +} + +#ifdef KBASE_PM_RUNTIME +static int kbase_device_runtime_init(struct kbase_device *kbdev) +{ + int ret = 0; + + dev_dbg(kbdev->dev, "%s\n", __func__); + + pm_runtime_set_autosuspend_delay(kbdev->dev, AUTO_SUSPEND_DELAY); + pm_runtime_use_autosuspend(kbdev->dev); + + pm_runtime_set_active(kbdev->dev); + pm_runtime_enable(kbdev->dev); + + if (!pm_runtime_enabled(kbdev->dev)) { + dev_warn(kbdev->dev, "pm_runtime not enabled"); + ret = -EINVAL; + } else if (atomic_read(&kbdev->dev->power.usage_count)) { + dev_warn(kbdev->dev, "%s: Device runtime usage count unexpectedly non zero %d", + __func__, atomic_read(&kbdev->dev->power.usage_count)); + ret = -EINVAL; + } + + return ret; +} + +static void kbase_device_runtime_disable(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + if (atomic_read(&kbdev->dev->power.usage_count)) + dev_warn(kbdev->dev, "%s: Device runtime usage count unexpectedly non zero %d", + __func__, atomic_read(&kbdev->dev->power.usage_count)); + + pm_runtime_disable(kbdev->dev); +} +#endif /* KBASE_PM_RUNTIME */ + +static int pm_callback_runtime_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + enable_gpu_power_control(kbdev); + return 0; +} + +static void pm_callback_runtime_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "%s\n", __func__); + + disable_gpu_power_control(kbdev); +} + +static void pm_callback_resume(struct kbase_device *kbdev) +{ + int ret = pm_callback_runtime_on(kbdev); + + WARN_ON(ret); +} + +static void pm_callback_suspend(struct kbase_device *kbdev) +{ + pm_callback_runtime_off(kbdev); +} + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +static void pm_callback_sc_rails_on(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are on"); +} + +static void pm_callback_sc_rails_off(struct kbase_device *kbdev) +{ + dev_dbg(kbdev->dev, "SC rails are off"); +} +#endif + +struct kbase_pm_callback_conf pm_callbacks = { + .power_on_callback = pm_callback_power_on, + .power_off_callback = pm_callback_power_off, + .power_suspend_callback = pm_callback_suspend, + .power_resume_callback = pm_callback_resume, + .soft_reset_callback = pm_callback_soft_reset, +#ifdef KBASE_PM_RUNTIME + .power_runtime_init_callback = kbase_device_runtime_init, + .power_runtime_term_callback = kbase_device_runtime_disable, + .power_runtime_on_callback = pm_callback_runtime_on, + .power_runtime_off_callback = pm_callback_runtime_off, +#else /* KBASE_PM_RUNTIME */ + .power_runtime_init_callback = NULL, + .power_runtime_term_callback = NULL, + .power_runtime_on_callback = NULL, + .power_runtime_off_callback = NULL, +#endif /* KBASE_PM_RUNTIME */ + +#if MALI_USE_CSF && defined(KBASE_PM_RUNTIME) + .power_runtime_gpu_idle_callback = pm_callback_runtime_gpu_idle, + .power_runtime_gpu_active_callback = pm_callback_runtime_gpu_active, +#else + .power_runtime_gpu_idle_callback = NULL, + .power_runtime_gpu_active_callback = NULL, +#endif + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = pm_callback_sc_rails_on, + 
.power_off_sc_rails_callback = pm_callback_sc_rails_off, +#endif +}; diff --git a/mali_kbase/platform/pixel/Kbuild b/mali_kbase/platform/pixel/Kbuild index 1d368c9..b80c87b 100644 --- a/mali_kbase/platform/pixel/Kbuild +++ b/mali_kbase/platform/pixel/Kbuild @@ -21,7 +21,9 @@ mali_kbase-y += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu.o \ - platform/$(MALI_PLATFORM_DIR)/pixel_gpu_power.o + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_power.o \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_uevent.o \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_itmon.o mali_kbase-$(CONFIG_MALI_MIDGARD_DVFS) += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu_dvfs.o \ @@ -34,3 +36,14 @@ mali_kbase-$(CONFIG_MALI_PIXEL_GPU_QOS) += \ mali_kbase-$(CONFIG_MALI_PIXEL_GPU_THERMAL) += \ platform/$(MALI_PLATFORM_DIR)/pixel_gpu_tmu.o + +ifneq ($(filter -DCONFIG_MALI_PIXEL_GPU_SSCD, $(ccflags-y)),) +mali_kbase-y += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_sscd.o +endif + +mali_kbase-$(CONFIG_MALI_PIXEL_GPU_SLC) += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_slc.o + +mali_kbase-$(CONFIG_MALI_CSF_SUPPORT) += \ + platform/$(MALI_PLATFORM_DIR)/pixel_gpu_debug.o diff --git a/mali_kbase/platform/pixel/mali_kbase_config_platform.h b/mali_kbase/platform/pixel/mali_kbase_config_platform.h index 87df05d..57cec12 100644 --- a/mali_kbase/platform/pixel/mali_kbase_config_platform.h +++ b/mali_kbase/platform/pixel/mali_kbase_config_platform.h @@ -45,7 +45,10 @@ * Attached value: pointer to @ref kbase_clk_rate_trace_op_conf * Default value: See @ref kbase_clk_rate_trace_op_conf */ +#ifdef CONFIG_MALI_MIDGARD_DVFS #define CLK_RATE_TRACE_OPS (&pixel_clk_rate_trace_ops) +extern struct kbase_clk_rate_trace_op_conf pixel_clk_rate_trace_ops; +#endif /** * Platform specific configuration functions @@ -56,7 +59,6 @@ #define PLATFORM_FUNCS (&platform_funcs) extern struct kbase_pm_callback_conf pm_callbacks; -extern struct kbase_clk_rate_trace_op_conf pixel_clk_rate_trace_ops; extern struct kbase_platform_funcs_conf platform_funcs; #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING @@ -65,13 +67,6 @@ extern struct protected_mode_ops pixel_protected_ops; #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ /** - * Autosuspend delay - * - * The delay time (in milliseconds) to be used for autosuspend - */ -#define AUTO_SUSPEND_DELAY (100) - -/** * DVFS Utilization evaluation period * * The amount of time (in milliseconds) between sucessive measurements of the @@ -86,8 +81,16 @@ extern struct protected_mode_ops pixel_protected_ops; #include <linux/workqueue.h> #endif /* CONFIG_MALI_MIDGARD_DVFS */ +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) +#include <linux/atomic.h> +#include <linux/notifier.h> +#include <linux/workqueue.h> +#endif /* IS_ENABLED(CONFIG_EXYNOS_ITMON) */ + /* SOC level includes */ +#if IS_ENABLED(CONFIG_GOOGLE_BCL) #include <soc/google/bcl.h> +#endif #if IS_ENABLED(CONFIG_EXYNOS_PD) #include <soc/google/exynos-pd.h> #endif @@ -102,8 +105,10 @@ extern struct protected_mode_ops pixel_protected_ops; #include "pixel_gpu_dvfs.h" #endif /* CONFIG_MALI_MIDGARD_DVFS */ +#include "pixel_gpu_uevent.h" + /* All port specific fields go here */ -#define OF_DATA_NUM_MAX 128 +#define OF_DATA_NUM_MAX 140 #define CPU_FREQ_MAX INT_MAX enum gpu_power_state { @@ -116,14 +121,15 @@ enum gpu_power_state { * The power state can thus be defined as the highest-level domain that * is currently powered on. * - * GLOBAL: The frontend (JM, CSF), including registers. - * COREGROUP: The L2 and AXI interface, Tiler, and MMU. - * STACKS: The shader cores. 
+ * GLOBAL: JM, CSF: The frontend (JM, CSF), including registers. + * CSF: The L2 and AXI interface, Tiler, and MMU. + * STACKS: JM, CSF: The shader cores. + * JM: The L2 and AXI interface, Tiler, and MMU. */ GPU_POWER_LEVEL_OFF = 0, GPU_POWER_LEVEL_GLOBAL = 1, - GPU_POWER_LEVEL_COREGROUP = 2, - GPU_POWER_LEVEL_STACKS = 3, + GPU_POWER_LEVEL_STACKS = 2, + GPU_POWER_LEVEL_NUM }; /** @@ -231,7 +237,9 @@ struct gpu_dvfs_metrics_uid_stats; * @pm.domain: The power domain the GPU is in. * @pm.status_reg_offset: Register offset to the G3D status in the PMU. Set via DT. * @pm.status_local_power_mask: Mask to extract power status of the GPU. Set via DT. - * @pm.autosuspend_delay: Delay (in ms) before PM runtime should trigger auto suspend. + * @pm.use_autosuspend: Use autosuspend on the TOP domain if true, sync suspend if false. + * @pm.autosuspend_delay: Delay (in ms) before PM runtime should trigger auto suspend on TOP + * domain if use_autosuspend is true. * @pm.bcl_dev: Pointer to the Battery Current Limiter device. * * @tz_protection_enabled: Storing the secure rendering state of the GPU. Access to this is @@ -271,9 +279,9 @@ struct gpu_dvfs_metrics_uid_stats; * @dvfs.metrics.last_power_state: The GPU's power state when the DVFS metric logic was last run. * @dvfs.metrics.last_level: The GPU's level when the DVFS metric logic was last run. * @dvfs.metrics.transtab: Pointer to the DVFS transition table. - * @dvfs.metrics.js_uid_stats: An array of pointers to the per-UID stats blocks currently - * resident in each of the GPU's job slots. Access is controlled by - * the hwaccess lock. + * @dvfs.metrics.work_uid_stats: An array of pointers to the per-UID stats blocks currently + * resident in each of the GPU's job slots, or CSG slots. + * Access is controlled by the dvfs.metrics.lock. * @dvfs.metrics.uid_stats_list: List head pointer to the linked list of per-UID stats blocks. * Modification to the linked list itself (not its elements) is * protected by the kctx_list lock. @@ -293,6 +301,16 @@ struct gpu_dvfs_metrics_uid_stats; * @dvfs.qos.bts.enabled: Stores whether Bus Traffic Shaping (BTS) is currently enabled * @dvfs.qos.bts.threshold: The G3D shader stack clock at which BTS will be enabled. Set via DT. * @dvfs.qos.bts.scenario: The index of the BTS scenario to be used. Set via DT. + * + * @slc.lock: Synchronize updates to the SLC partition accounting variables. + * @slc.demand: The total demand for SLC space, an aggregation of each kctx's demand. + * @slc.usage: The total amount of SLC space used, an aggregation of each kctx's usage. + * + * @itmon.wq: A workqueue for ITMON page table search. + * @itmon.work: The work item for the above. + * @itmon.nb: The ITMON notifier block. + * @itmon.pa: The faulting physical address. + * @itmon.active: Active count, non-zero while a search is active. 
*/ struct pixel_context { struct kbase_device *kbdev; @@ -303,10 +321,10 @@ struct pixel_context { struct device *domain_devs[GPU_PM_DOMAIN_COUNT]; struct device_link *domain_links[GPU_PM_DOMAIN_COUNT]; - struct exynos_pm_domain *domain; unsigned int status_reg_offset; unsigned int status_local_power_mask; + bool use_autosuspend; unsigned int autosuspend_delay; #ifdef CONFIG_MALI_MIDGARD_DVFS struct gpu_dvfs_opp_metrics power_off_metrics; @@ -315,6 +333,10 @@ struct pixel_context { #if IS_ENABLED(CONFIG_GOOGLE_BCL) struct bcl_device *bcl_dev; #endif + struct pixel_rail_state_log *rail_state_log; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + bool ifpo_enabled; +#endif } pm; #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING @@ -328,17 +350,21 @@ struct pixel_context { struct workqueue_struct *control_wq; struct work_struct control_work; atomic_t util; +#if !MALI_USE_CSF atomic_t util_gl; atomic_t util_cl; +#endif struct workqueue_struct *clockdown_wq; struct delayed_work clockdown_work; unsigned int clockdown_hysteresis; + bool updates_enabled; struct gpu_dvfs_clk clks[GPU_DVFS_CLK_COUNT]; struct gpu_dvfs_opp *table; int table_size; + int step_up_val; int level; int level_target; int level_max; @@ -354,11 +380,16 @@ struct pixel_context { } governor; struct { + spinlock_t lock; u64 last_time; bool last_power_state; int last_level; int *transtab; - struct gpu_dvfs_metrics_uid_stats *js_uid_stats[BASE_JM_MAX_NR_SLOTS]; +#if !MALI_USE_CSF + struct gpu_dvfs_metrics_uid_stats *work_uid_stats[BASE_JM_MAX_NR_SLOTS * SLOT_RB_SIZE]; +#else + struct gpu_dvfs_metrics_uid_stats *work_uid_stats[MAX_SUPPORTED_CSGS]; +#endif /* !MALI_USE_CSF */ struct list_head uid_stats_list; } metrics; @@ -389,6 +420,38 @@ struct pixel_context { #endif /* CONFIG_MALI_PIXEL_GPU_THERMAL */ } dvfs; #endif /* CONFIG_MALI_MIDGARD_DVFS */ + + struct { + struct mutex lock; + u64 demand; + u64 usage; + } slc; + +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + struct { + struct workqueue_struct *wq; + struct work_struct work; + struct notifier_block nb; + phys_addr_t pa; + atomic_t active; + } itmon; +#endif +}; + +/** + * struct pixel_platform_data - Per kbase_context Pixel specific platform data + * + * @stats: Tracks the dvfs metrics for the UID associated with this context + * + * @slc.peak_demand: The parent context's maximum demand for SLC space + * @slc.peak_usage: The parent context's maximum use of SLC space + */ +struct pixel_platform_data { + struct gpu_dvfs_metrics_uid_stats* stats; + struct { + u64 peak_demand; + u64 peak_usage; + } slc; }; #endif /* _KBASE_CONFIG_PLATFORM_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu.c b/mali_kbase/platform/pixel/pixel_gpu.c index 940f125..3e8977c 100644 --- a/mali_kbase/platform/pixel/pixel_gpu.c +++ b/mali_kbase/platform/pixel/pixel_gpu.c @@ -21,10 +21,15 @@ #ifdef CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING #include <device/mali_kbase_device_internal.h> #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ +#if MALI_USE_CSF +#include <csf/mali_kbase_csf_firmware_cfg.h> +#endif /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" +#include "pixel_gpu_sscd.h" +#include "pixel_gpu_slc.h" #define CREATE_TRACE_POINTS #include "pixel_gpu_trace.h" @@ -35,6 +40,10 @@ */ #define GPU_SMC_TZPC_OK 0 +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +#define HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME "Host controls SC rails" +#endif + /** * pixel_gpu_secure_mode_enable() - Enables secure mode for the GPU * @@ -118,6 +127,123 @@ struct protected_mode_ops 
pixel_protected_ops = { #endif /* CONFIG_MALI_PIXEL_GPU_SECURE_RENDERING */ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS +/** + * gpu_pixel_enable_host_ctrl_sc_rails() - Enable the config in FW to support host based + * control of SC power rails + * + * Look for the config entry that enables support in FW for the Host based + * control of shader core power rails and set it before the initial boot + * or reload of firmware. + * + * @kbdev: Kbase device structure + * + * Return: 0 if successful, negative error code on failure + */ +static int gpu_pixel_enable_host_ctrl_sc_rails(struct kbase_device *kbdev) +{ + u32 addr; + int ec = kbase_csf_firmware_cfg_find_config_address( + kbdev, HOST_CONTROLS_SC_RAILS_CFG_ENTRY_NAME, &addr); + + if (!ec) { + kbase_csf_update_firmware_memory(kbdev, addr, 1); + } + + return ec; +} +#endif + +static int gpu_fw_cfg_init(struct kbase_device *kbdev) { + int ec = 0; + +#if MALI_USE_CSF +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + ec = gpu_pixel_enable_host_ctrl_sc_rails(kbdev); + if (ec) + dev_warn(kbdev->dev, "pixel: failed to enable SC rail host-control"); +#endif + if (gpu_sscd_fw_log_init(kbdev, 0)) { + dev_warn(kbdev->dev, "pixel: failed to enable FW log"); + } +#endif + + return ec; +} + +/** + * gpu_pixel_kctx_init() - Called when a kernel context is created + * + * @kctx: The &struct kbase_context that is being initialized + * + * This function is called when the GPU driver is initializing a new kernel context. + * + * Return: Returns 0 on success, or an error code on failure. + */ +static int gpu_pixel_kctx_init(struct kbase_context *kctx) +{ + struct kbase_device* kbdev = kctx->kbdev; + int err; + + kctx->platform_data = kzalloc(sizeof(struct pixel_platform_data), GFP_KERNEL); + if (kctx->platform_data == NULL) { + dev_err(kbdev->dev, "pixel: failed to alloc platform_data for kctx"); + err = -ENOMEM; + goto done; + } + + err = gpu_dvfs_kctx_init(kctx); + if (err) { + dev_err(kbdev->dev, "pixel: DVFS kctx init failed\n"); + goto done; + } + + err = gpu_slc_kctx_init(kctx); + if (err) { + dev_err(kbdev->dev, "pixel: SLC kctx init failed\n"); + goto done; + } + +done: + return err; +} + +/** + * gpu_pixel_kctx_term() - Called when a kernel context is terminated + * + * @kctx: The &struct kbase_context that is being terminated + */ +static void gpu_pixel_kctx_term(struct kbase_context *kctx) +{ + gpu_slc_kctx_term(kctx); + gpu_dvfs_kctx_term(kctx); + + kfree(kctx->platform_data); + kctx->platform_data = NULL; +} + +static const struct kbase_device_init dev_init[] = { + { gpu_pm_init, gpu_pm_term, "PM init failed" }, +#ifdef CONFIG_MALI_MIDGARD_DVFS + { gpu_dvfs_init, gpu_dvfs_term, "DVFS init failed" }, +#endif + { gpu_sysfs_init, gpu_sysfs_term, "sysfs init failed" }, + { gpu_sscd_init, gpu_sscd_term, "SSCD init failed" }, + { gpu_slc_init, gpu_slc_term, "SLC init failed" }, +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + { gpu_itmon_init, gpu_itmon_term, "ITMON notifier init failed" }, +#endif +}; + +static void gpu_pixel_term_partial(struct kbase_device *kbdev, + unsigned int i) +{ + while (i-- > 0) { + if (dev_init[i].term) + dev_init[i].term(kbdev); + } +} + /** * gpu_pixel_init() - Initializes the Pixel integration for the Mali GPU. 
* @@ -127,8 +253,8 @@ struct protected_mode_ops pixel_protected_ops = { */ static int gpu_pixel_init(struct kbase_device *kbdev) { - int ret; - + int ret = 0; + unsigned int i; struct pixel_context *pc; pc = kzalloc(sizeof(struct pixel_context), GFP_KERNEL); @@ -141,26 +267,22 @@ static int gpu_pixel_init(struct kbase_device *kbdev) kbdev->platform_context = pc; pc->kbdev = kbdev; - ret = gpu_pm_init(kbdev); - if (ret) { - dev_err(kbdev->dev, "power management init failed\n"); - goto done; - } - -#ifdef CONFIG_MALI_MIDGARD_DVFS - ret = gpu_dvfs_init(kbdev); - if (ret) { - dev_err(kbdev->dev, "DVFS init failed\n"); - goto done; + for (i = 0; i < ARRAY_SIZE(dev_init); i++) { + if (dev_init[i].init) { + ret = dev_init[i].init(kbdev); + if (ret) { + dev_err(kbdev->dev, "%s error = %d\n", + dev_init[i].err_mes, ret); + break; + } + } } -#endif /* CONFIG_MALI_MIDGARD_DVFS */ - ret = gpu_sysfs_init(kbdev); if (ret) { - dev_err(kbdev->dev, "sysfs init failed\n"); - goto done; + gpu_pixel_term_partial(kbdev, i); + kbdev->platform_context = NULL; + kfree(pc); } - ret = 0; done: return ret; @@ -175,10 +297,7 @@ static void gpu_pixel_term(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; - gpu_sysfs_term(kbdev); - gpu_dvfs_term(kbdev); - gpu_pm_term(kbdev); - + gpu_pixel_term_partial(kbdev, ARRAY_SIZE(dev_init)); kbdev->platform_context = NULL; kfree(pc); } @@ -186,8 +305,10 @@ static void gpu_pixel_term(struct kbase_device *kbdev) struct kbase_platform_funcs_conf platform_funcs = { .platform_init_func = &gpu_pixel_init, .platform_term_func = &gpu_pixel_term, - .platform_handler_context_init_func = &gpu_dvfs_kctx_init, - .platform_handler_context_term_func = &gpu_dvfs_kctx_term, - .platform_handler_atom_submit_func = &gpu_dvfs_metrics_job_start, - .platform_handler_atom_complete_func = &gpu_dvfs_metrics_job_end, + .platform_handler_context_init_func = &gpu_pixel_kctx_init, + .platform_handler_context_term_func = &gpu_pixel_kctx_term, + .platform_handler_work_begin_func = &gpu_dvfs_metrics_work_begin, + .platform_handler_work_end_func = &gpu_dvfs_metrics_work_end, + .platform_fw_cfg_init_func = &gpu_fw_cfg_init, + .platform_handler_core_dump_func = &gpu_sscd_dump, }; diff --git a/mali_kbase/platform/pixel/pixel_gpu_control.h b/mali_kbase/platform/pixel/pixel_gpu_control.h index 5b4e184..51b3063 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_control.h +++ b/mali_kbase/platform/pixel/pixel_gpu_control.h @@ -12,19 +12,44 @@ bool gpu_pm_get_power_state(struct kbase_device *kbdev); int gpu_pm_init(struct kbase_device *kbdev); void gpu_pm_term(struct kbase_device *kbdev); +void* gpu_pm_get_rail_state_log(struct kbase_device *kbdev); +unsigned int gpu_pm_get_rail_state_log_size(struct kbase_device *kbdev); /* DVFS */ void gpu_dvfs_event_power_on(struct kbase_device *kbdev); void gpu_dvfs_event_power_off(struct kbase_device *kbdev); + +#ifdef CONFIG_MALI_MIDGARD_DVFS int gpu_dvfs_init(struct kbase_device *kbdev); void gpu_dvfs_term(struct kbase_device *kbdev); +void gpu_dvfs_disable_updates(struct kbase_device *kbdev); +void gpu_dvfs_enable_updates(struct kbase_device *kbdev); +#else +static int __maybe_unused gpu_dvfs_init(struct kbase_device *kbdev) { return 0; } +static void __maybe_unused gpu_dvfs_term(struct kbase_device *kbdev) {} +static void __maybe_unused gpu_dvfs_disable_updates(struct kbase_device *kbdev) {} +static void __maybe_unused gpu_dvfs_enable_updates(struct kbase_device *kbdev) {} +#endif /* sysfs */ +#ifdef CONFIG_MALI_MIDGARD_DVFS int 
gpu_sysfs_init(struct kbase_device *kbdev); void gpu_sysfs_term(struct kbase_device *kbdev); +#else +static int __maybe_unused gpu_sysfs_init(struct kbase_device *kbdev) { return 0; } +static void __maybe_unused gpu_sysfs_term(struct kbase_device *kbdev) {} +#endif /* Kernel context callbacks */ +#ifdef CONFIG_MALI_MIDGARD_DVFS int gpu_dvfs_kctx_init(struct kbase_context *kctx); void gpu_dvfs_kctx_term(struct kbase_context *kctx); +#endif + +/* ITMON notifier */ +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) +int gpu_itmon_init(struct kbase_device *kbdev); +void gpu_itmon_term(struct kbase_device *kbdev); +#endif #endif /* _PIXEL_GPU_CONTROL_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_debug.c b/mali_kbase/platform/pixel/pixel_gpu_debug.c new file mode 100644 index 0000000..f08f0b0 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_debug.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> +#include <device/mali_kbase_device.h> + +/* Pixel integration includes */ +#include "pixel_gpu_debug.h" + +#define GPU_DBG_LO 0x00000FE8 +#define PIXEL_STACK_PDC_ADDR 0x000770DB +#define PIXEL_CG_PDC_ADDR 0x000760DB +#define PIXEL_SC_PDC_ADDR 0x000740DB +#define GPU_PDC_ADDR(offset, val) ((offset) + ((val) << 8)) +#define GPU_DBG_ACTIVE_BIT (1 << 31) +#define GPU_DBG_ACTIVE_MAX_LOOPS 1000000 +#define GPU_DBG_INVALID (~0U) + +static bool gpu_debug_check_dbg_active(struct kbase_device *kbdev) +{ + int i = 0; + u32 val; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Wait for the active bit to drop, indicating the DBG command completed */ + do { + val = kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_STATUS)); + } while ((val & GPU_DBG_ACTIVE_BIT) && i++ < GPU_DBG_ACTIVE_MAX_LOOPS); + + if (val & GPU_DBG_ACTIVE_BIT) { + dev_err(kbdev->dev, "Timed out waiting for GPU DBG command to complete"); + return false; + } + + dev_dbg(kbdev->dev, "Waited for %d iterations before GPU DBG command completed", i); + + return true; +} + +static u32 gpu_debug_read_pdc(struct kbase_device *kbdev, u32 pdc_offset) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + /* Write the debug command */ + kbase_reg_write(kbdev, GPU_CONTROL_REG(GPU_COMMAND), pdc_offset); + /* Wait for the debug command to complete */ + if (!gpu_debug_check_dbg_active(kbdev)) + return GPU_DBG_INVALID; + + /* Read the result */ + return kbase_reg_read(kbdev, GPU_CONTROL_REG(GPU_DBG_LO)); +} + +static void gpu_debug_read_sparse_pdcs(struct kbase_device *kbdev, u32 *out, u64 available, + u64 offset, u64 logical_max) +{ + int sparse_idx, logical_idx = 0; + + for (sparse_idx = 0; sparse_idx < BITS_PER_TYPE(u64) && logical_idx < logical_max; ++sparse_idx) { + /* Skip if we don't have this core in our configuration */ + if (!(available & BIT_ULL(sparse_idx))) + continue; + + /* GPU debug command expects the sparse core index */ + out[logical_idx] = gpu_debug_read_pdc(kbdev, GPU_PDC_ADDR(offset, sparse_idx)); + + ++logical_idx; + } +} + +void gpu_debug_read_pdc_status(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *status) +{ + struct gpu_raw_gpu_props *raw_props; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + status->meta = (struct pixel_gpu_pdc_status_metadata) { + .magic = "pdcs", + .version = 2, + }; + + /* If there's no external power we skip the register read/writes, + * We know all the PDC signals will be 0 in this case + */ + if (!kbdev->pm.backend.gpu_powered) { + memset(&status->state, 0, 
sizeof(status->state)); + return; + } + + raw_props = &kbdev->gpu_props.props.raw_props; + + status->state.core_group = gpu_debug_read_pdc(kbdev, PIXEL_CG_PDC_ADDR); + gpu_debug_read_sparse_pdcs(kbdev, status->state.shader_cores, raw_props->shader_present, + PIXEL_SC_PDC_ADDR, PIXEL_MALI_SC_COUNT); + gpu_debug_read_sparse_pdcs(kbdev, status->state.stacks, raw_props->stack_present, + PIXEL_STACK_PDC_ADDR, PIXEL_MALI_STACK_COUNT); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_debug.h b/mali_kbase/platform/pixel/pixel_gpu_debug.h new file mode 100644 index 0000000..c4bcd4a --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_debug.h @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2022 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +#ifndef _PIXEL_GPU_DEBUG_H_ +#define _PIXEL_GPU_DEBUG_H_ + +/* This is currently only supported for Odin */ +#define PIXEL_MALI_SC_COUNT 0x7 +#define PIXEL_MALI_STACK_COUNT 0x3 + +/** + * enum pixel_gpu_pdc_state - PDC internal state + */ +enum pixel_gpu_pdc_state { + PIXEL_GPU_PDC_STATE_POWER_OFF, + PIXEL_GPU_PDC_STATE_UP_POWER, + PIXEL_GPU_PDC_STATE_UP_ISOLATE, + PIXEL_GPU_PDC_STATE_UP_RESET, + PIXEL_GPU_PDC_STATE_UP_CLOCK, + PIXEL_GPU_PDC_STATE_UP_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_UP_RESP, + PIXEL_GPU_PDC_STATE_UNUSED7, + PIXEL_GPU_PDC_STATE_UNUSED8, + PIXEL_GPU_PDC_STATE_POWER_ON, + PIXEL_GPU_PDC_STATE_DOWN_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_DOWN_CLOCK, + PIXEL_GPU_PDC_STATE_DOWN_RESET, + PIXEL_GPU_PDC_STATE_DOWN_ISOLATE, + PIXEL_GPU_PDC_STATE_DOWN_POWER, + PIXEL_GPU_PDC_STATE_DOWN_RESP, + PIXEL_GPU_PDC_STATE_FAST_FUNC_ISOLATE, + PIXEL_GPU_PDC_STATE_FAST_CLOCK, + PIXEL_GPU_PDC_STATE_FAST_RESET, + PIXEL_GPU_PDC_STATE_FAST_RESP, + PIXEL_GPU_PDC_STATE_FAST_WAIT, + PIXEL_GPU_PDC_STATE_UNUSED11, + PIXEL_GPU_PDC_STATE_UNUSED12, + PIXEL_GPU_PDC_STATE_UNUSED13, + PIXEL_GPU_PDC_STATE_UNUSED14, + PIXEL_GPU_PDC_STATE_UNUSED15, + PIXEL_GPU_PDC_STATE_UNUSED16, + PIXEL_GPU_PDC_STATE_UNUSED17, + PIXEL_GPU_PDC_STATE_UNUSED1A, + PIXEL_GPU_PDC_STATE_UNUSED1B, + PIXEL_GPU_PDC_STATE_UNUSED1F, +}; + +/** + * struct pixel_gpu_pdc_status_bits - PDC status layout + * + * @state: PDC state, see enum pixel_gpu_pdc_state for details + * @func_iso_n: Functional isolation request + * @func_iso_ack_n Functional isolation complete + * @pwrup: Power up request + * @pwrup_ack Power up request acknowledged by PDC + * @reset_n Reset request + * @reset_ack_n Reset request acknowledged by PDC + * @isolate_n Physical isolation enable request + * @isolate_ack_n Physical isolation enable request has been acknowledged by PDC + * @clken Clock enable request + * @clken_ack Clock enable request acknowledged from internal gating + * @power_is_on PDC thinks power domain is fully on + * @power_is_off PDC thinks power domain is fully off + * @_reserved Undocumented + **/ +struct pixel_gpu_pdc_status_bits { + uint32_t state : 5; + uint32_t func_iso_n : 1; + uint32_t func_iso_ack_n : 1; + uint32_t pwrup : 1; + uint32_t pwrup_ack : 1; + uint32_t reset_n : 1; + uint32_t reset_ack_n : 1; + uint32_t isolate_n : 1; + uint32_t isolate_ack_n : 1; + uint32_t clken : 1; + uint32_t clken_ack : 1; + uint32_t power_is_on : 1; + uint32_t power_is_off : 1; + uint32_t _reserved : 15; +}; +_Static_assert(sizeof(struct pixel_gpu_pdc_status_bits) == sizeof(uint32_t), + "Incorrect pixel_gpu_pdc_status_bits size"); + +/** + * struct pixel_gpu_pdc_status_metadata - Info about the PDC status format + * + * @magic: Always 'pdcs', helps find the log in memory dumps + * 
@version: Updated whenever the binary layout changes + * @_reserved: Bytes reserved for future use + **/ +struct pixel_gpu_pdc_status_metadata { + char magic[4]; + uint8_t version; + char _reserved[11]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_gpu_pdc_status_metadata) == 16, + "Incorrect pixel_gpu_pdc_status_metadata size"); + +/** + * struct pixel_gpu_pdc_status - FW view of PDC state + * + * @meta: Info about the status format + * @core_group: Core group PDC state + * @shader_cores: Shader core PDC state + **/ +struct pixel_gpu_pdc_status { + struct pixel_gpu_pdc_status_metadata meta; + struct { + uint32_t core_group; + uint32_t shader_cores[PIXEL_MALI_SC_COUNT]; + uint32_t stacks[PIXEL_MALI_STACK_COUNT]; + } state; +} __attribute__((packed)); + +#if MALI_USE_CSF +void gpu_debug_read_pdc_status(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *status); +#else +static void __maybe_unused gpu_debug_read_pdc_status(struct kbase_device *kbdev, + struct pixel_gpu_pdc_status *status) +{ + (void)kbdev, (void)status; +} +#endif + +#endif /* _PIXEL_GPU_DEBUG_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs.c index ae6f496..f758867 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs.c @@ -26,12 +26,33 @@ #include "pixel_gpu_dvfs.h" #include "pixel_gpu_trace.h" -#define DVFS_TABLE_ROW_MAX (12) +#define DVFS_TABLE_ROW_MAX (14) +#define DVFS_TABLES_MAX (2) static struct gpu_dvfs_opp gpu_dvfs_table[DVFS_TABLE_ROW_MAX]; /* DVFS event handling code */ /** + * gpu_dvfs_set_freq() - Request a frequency change for a GPU domain + * + * @kbdev: &struct kbase_device for the GPU. + * @domain: The GPU domain that shall have its frequency changed. + * @level: The frequency level to set the GPU domain to. + * + * Context: Expects the caller to hold the domain access lock + * + * Return: See cal_dfs_set_rate + */ +static int gpu_dvfs_set_freq(struct kbase_device *kbdev, enum gpu_dvfs_clk_index domain, int level) +{ + struct pixel_context *pc = kbdev->platform_context; + + lockdep_assert_held(&pc->pm.domain->access_lock); + + return cal_dfs_set_rate(pc->dvfs.clks[domain].cal_id, pc->dvfs.table[level].clk[domain]); +} + +/** * gpu_dvfs_set_new_level() - Updates the GPU operating point. * * @kbdev: The &struct kbase_device for the GPU. @@ -43,7 +64,6 @@ static struct gpu_dvfs_opp gpu_dvfs_table[DVFS_TABLE_ROW_MAX]; static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) { struct pixel_context *pc = kbdev->platform_context; - int c; lockdep_assert_held(&pc->dvfs.lock); @@ -55,8 +75,17 @@ static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) mutex_lock(&pc->pm.domain->access_lock); - for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) - cal_dfs_set_rate(pc->dvfs.clks[c].cal_id, pc->dvfs.table[next_level].clk[c]); + /* We must enforce the CLK_G3DL2 >= CLK_G3D constraint. + * When clocking down we must set G3D CLK first to avoid violating the constraint.
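	 * As an illustration (the frequencies below are hypothetical, not taken
	 * from any table): dropping from a 700/700 MHz operating point to
	 * 400/400 MHz lowers CLK_G3D (shaders) first, so the constraint holds as
	 * 700 >= 400 and then 400 >= 400; raising the level back up sets
	 * CLK_G3DL2 (top level) first, giving 700 >= 400 and then 700 >= 700,
	 * whereas the opposite order would momentarily leave CLK_G3DL2 below
	 * CLK_G3D.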
+ */ + if (next_level > pc->dvfs.level) { + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_SHADERS, next_level); + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_TOP_LEVEL, next_level); + } else { + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_TOP_LEVEL, next_level); + gpu_dvfs_set_freq(kbdev, GPU_DVFS_CLK_SHADERS, next_level); + } + mutex_unlock(&pc->pm.domain->access_lock); @@ -82,6 +111,8 @@ static int gpu_dvfs_set_new_level(struct kbase_device *kbdev, int next_level) * taking into account the priority levels of each level lock. It ensures that votes on minimum and * maximum levels originating from different level lock types are supported. * + * Context: Expects the caller to hold the DVFS lock + * * Note: This is the only function that should write to &level_scaling_max or &level_scaling_min. */ static void gpu_dvfs_process_level_locks(struct kbase_device *kbdev) @@ -267,6 +298,7 @@ static void gpu_dvfs_clockdown_worker(struct work_struct *data) static inline void gpu_dvfs_set_level_locks_from_util(struct kbase_device *kbdev, struct gpu_dvfs_utlization *util_stats) { +#if !MALI_USE_CSF struct pixel_context *pc = kbdev->platform_context; bool cl_lock_set = (pc->dvfs.level_locks[GPU_DVFS_LEVEL_LOCK_COMPUTE].level_min != -1 || pc->dvfs.level_locks[GPU_DVFS_LEVEL_LOCK_COMPUTE].level_max != -1); @@ -277,6 +309,7 @@ static inline void gpu_dvfs_set_level_locks_from_util(struct kbase_device *kbdev pc->dvfs.level_scaling_compute_min, -1); else if (util_stats->util_cl == 0 && cl_lock_set) gpu_dvfs_reset_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_COMPUTE); +#endif /* !MALI_USE_CSF */ } /** @@ -299,10 +332,12 @@ void gpu_dvfs_select_level(struct kbase_device *kbdev) struct pixel_context *pc = kbdev->platform_context; struct gpu_dvfs_utlization util_stats; - if (gpu_pm_get_power_state(kbdev)) { + if (pc->dvfs.updates_enabled && gpu_pm_get_power_state(kbdev)) { util_stats.util = atomic_read(&pc->dvfs.util); +#if !MALI_USE_CSF util_stats.util_gl = atomic_read(&pc->dvfs.util_gl); util_stats.util_cl = atomic_read(&pc->dvfs.util_cl); +#endif gpu_dvfs_set_level_locks_from_util(kbdev, &util_stats); @@ -325,6 +360,41 @@ void gpu_dvfs_select_level(struct kbase_device *kbdev) } } +#ifdef CONFIG_MALI_MIDGARD_DVFS +/** + * gpu_dvfs_disable_updates() - Ensure DVFS updates are disabled + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Ensure that no DVFS updates will occur after this call completes. + */ +void gpu_dvfs_disable_updates(struct kbase_device *kbdev) { + struct pixel_context *pc = kbdev->platform_context; + + mutex_lock(&pc->dvfs.lock); + pc->dvfs.updates_enabled = false; + mutex_unlock(&pc->dvfs.lock); + + flush_workqueue(pc->dvfs.control_wq); +} + +/** + * gpu_dvfs_enable_updates() - Ensure DVFS updates are enabled + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Ensure that DVFS updates will occur after this call completes, undoing the effect of + * gpu_dvfs_disable_updates(). + */ +void gpu_dvfs_enable_updates(struct kbase_device *kbdev) { + struct pixel_context *pc = kbdev->platform_context; + + mutex_lock(&pc->dvfs.lock); + pc->dvfs.updates_enabled = true; + mutex_unlock(&pc->dvfs.lock); +} +#endif + /** * gpu_dvfs_control_worker() - The workqueue worker that changes DVFS on utilization change. * * @@ -341,6 +411,34 @@ static void gpu_dvfs_control_worker(struct work_struct *data) mutex_unlock(&pc->dvfs.lock); } +#if MALI_USE_CSF +/** + * kbase_platform_dvfs_event() - Callback from Mali driver to report updated utilization metrics. + * + * @kbdev: The &struct kbase_device for the GPU.
+ * @utilisation: The calculated utilization as measured by the core Mali driver's metrics system. + * + * This is the function that bridges the core Mali driver and the Pixel integration code. As this call is + * made in interrupt context, it is swiftly handed off to a workqueue for further processing. + * + * Context: Interrupt context. + * + * Return: Returns 1 to signal success as specified in mali_kbase_pm_internal.h. + */ +int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation) +{ + struct pixel_context *pc = kbdev->platform_context; + int proc = raw_smp_processor_id(); + + /* TODO (b/187175695): Report this data via a custom ftrace event instead */ + trace_clock_set_rate("gpu_util", utilisation, proc); + + atomic_set(&pc->dvfs.util, utilisation); + queue_work(pc->dvfs.control_wq, &pc->dvfs.control_work); + + return 1; +} +#else /* MALI_USE_CSF */ /** * kbase_platform_dvfs_event() - Callback from Mali driver to report updated utilization metrics. * * @@ -374,6 +472,7 @@ int kbase_platform_dvfs_event(struct kbase_device *kbdev, u32 utilisation, return 1; } +#endif /* Initialization code */ @@ -405,39 +504,39 @@ static int find_voltage_for_freq(struct kbase_device *kbdev, unsigned int clock, } /** - * gpu_dvfs_update_asv_table() - Populate the GPU's DVFS table from DT. + * validate_and_parse_dvfs_table() - Validate and populate the GPU's DVFS table from DT. * * @kbdev: The &struct kbase_device for the GPU. + * @dvfs_table_num: DVFS table number to be validated and parsed. * - * This function reads data out of the GPU's device tree entry and uses it to populate - * &gpu_dvfs_table. For each entry in the DVFS table, it makes calls to determine voltages from ECT. - * It also checks for any level locks specified in the devicetree and ensures that the effective - * scaling range is set up. + * This function reads data out of the GPU's device tree entry, validates it, and + * uses it to populate &gpu_dvfs_table. For each entry in the DVFS table, it makes + * calls to determine voltages from ECT. It also checks for any level locks specified + * in the devicetree and ensures that the effective scaling range is set up. * - * This function will fail if the required data is not present in the GPU's device tree entry. + * This function will fail if the particular DVFS table's operating points do not + * match the ECT table for the device. * - * Return: Returns the size of the DVFS table on success, -EINVAL on failure. + * Return: Returns the number of operating points in the DVFS table on success, -EINVAL on failure.
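 *
 * As an illustration only (the values below are made up, and real tables may
 * carry additional columns), a device tree fragment of this shape describes
 * two operating points, where the first five columns of each row are the
 * top-level clock, shader clock, util_min, util_max and hysteresis:
 *
 *   gpu_dvfs_table_size_v2 = <2 5>;
 *   gpu_dvfs_table_v2 = <800000 800000 60 95 3
 *                        700000 700000 60 90 3>;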
*/ -static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) +static int validate_and_parse_dvfs_table(struct kbase_device *kbdev, int dvfs_table_num) { - struct pixel_context *pc = kbdev->platform_context; - struct device_node *np = kbdev->dev->of_node; + char table_name[64]; + char table_size_name[64]; int i, idx, c; - int of_data_int_array[OF_DATA_NUM_MAX]; int dvfs_table_row_num = 0, dvfs_table_col_num = 0; int dvfs_table_size = 0; - - struct dvfs_rate_volt vf_map[GPU_DVFS_CLK_COUNT][16]; - int level_count[GPU_DVFS_CLK_COUNT]; - int scaling_level_max = -1, scaling_level_min = -1; int scaling_freq_max_devicetree = INT_MAX; int scaling_freq_min_devicetree = 0; int scaling_freq_min_compute = 0; + int level_count[GPU_DVFS_CLK_COUNT]; + struct dvfs_rate_volt vf_map[GPU_DVFS_CLK_COUNT][16]; - bool use_asv_v1 = false; + struct device_node *np = kbdev->dev->of_node; + struct pixel_context *pc = kbdev->platform_context; /* Get frequency -> voltage mapping */ for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { @@ -448,22 +547,9 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) } } - /* We detect which ASV table the GPU is running by checking which - * operating points are available from ECT. We check for 202MHz on the - * GPU shader cores as this is only available in ASV v0.3 and later. - */ - if (find_voltage_for_freq(kbdev, 202000, NULL, vf_map[GPU_DVFS_CLK_SHADERS], - level_count[GPU_DVFS_CLK_SHADERS])) - use_asv_v1 = true; - - /* Get size of DVFS table data from device tree */ - if (use_asv_v1) { - if (of_property_read_u32_array(np, "gpu_dvfs_table_size_v1", of_data_int_array, 2)) - goto err; - } else { - if (of_property_read_u32_array(np, "gpu_dvfs_table_size_v2", of_data_int_array, 2)) - goto err; - } + sprintf(table_size_name, "gpu_dvfs_table_size_v%d", dvfs_table_num); + if (of_property_read_u32_array(np, table_size_name, of_data_int_array, 2)) + goto err; dvfs_table_row_num = of_data_int_array[0]; dvfs_table_col_num = of_data_int_array[1]; @@ -471,27 +557,39 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) if (dvfs_table_row_num > DVFS_TABLE_ROW_MAX) { dev_err(kbdev->dev, - "DVFS table has %d rows but only up to %d are supported\n", - dvfs_table_row_num, DVFS_TABLE_ROW_MAX); + "DVFS table %d has %d rows but only up to %d are supported", + dvfs_table_num, dvfs_table_row_num, DVFS_TABLE_ROW_MAX); goto err; } if (dvfs_table_size > OF_DATA_NUM_MAX) { - dev_err(kbdev->dev, "DVFS table is too big\n"); + dev_err(kbdev->dev, "DVFS table %d is too big", dvfs_table_num); goto err; } - - if (use_asv_v1) - of_property_read_u32_array(np, "gpu_dvfs_table_v1", - of_data_int_array, dvfs_table_size); - else - of_property_read_u32_array(np, "gpu_dvfs_table_v2", - of_data_int_array, dvfs_table_size); + sprintf(table_name, "gpu_dvfs_table_v%d", dvfs_table_num); + if (of_property_read_u32_array(np, table_name, of_data_int_array, dvfs_table_size)) + goto err; of_property_read_u32(np, "gpu_dvfs_max_freq", &scaling_freq_max_devicetree); of_property_read_u32(np, "gpu_dvfs_min_freq", &scaling_freq_min_devicetree); of_property_read_u32(np, "gpu_dvfs_min_freq_compute", &scaling_freq_min_compute); + /* Check if there is a voltage mapping for each frequency in the ECT table */ + for (i = 0; i < dvfs_table_row_num; i++) { + idx = i * dvfs_table_col_num; + + /* Get and validate voltages from cal-if */ + for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { + if (find_voltage_for_freq(kbdev, of_data_int_array[idx + c], + NULL, vf_map[c], level_count[c])) { + dev_dbg(kbdev->dev, + "Failed to 
find voltage for clock %u frequency %u in gpu_dvfs_table_v%d\n", + c, of_data_int_array[idx + c], dvfs_table_num); + goto err; + } + } + } + /* Process DVFS table data from device tree and store it in OPP table */ for (i = 0; i < dvfs_table_row_num; i++) { idx = i * dvfs_table_col_num; @@ -500,6 +598,11 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) gpu_dvfs_table[i].clk[GPU_DVFS_CLK_TOP_LEVEL] = of_data_int_array[idx + 0]; gpu_dvfs_table[i].clk[GPU_DVFS_CLK_SHADERS] = of_data_int_array[idx + 1]; + for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { + find_voltage_for_freq(kbdev, gpu_dvfs_table[i].clk[c], + &(gpu_dvfs_table[i].vol[c]), vf_map[c], level_count[c]); + } + gpu_dvfs_table[i].util_min = of_data_int_array[idx + 2]; gpu_dvfs_table[i].util_max = of_data_int_array[idx + 3]; gpu_dvfs_table[i].hysteresis = of_data_int_array[idx + 4]; @@ -524,19 +627,6 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) if (gpu_dvfs_table[i].clk[GPU_DVFS_CLK_SHADERS] >= scaling_freq_min_compute) pc->dvfs.level_scaling_compute_min = i; - - /* Get and validate voltages from cal-if */ - for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { - if (find_voltage_for_freq(kbdev, gpu_dvfs_table[i].clk[c], - &(gpu_dvfs_table[i].vol[c]), - vf_map[c], level_count[c])) { - dev_err(kbdev->dev, - "Failed to find voltage for clock %u frequency %u\n", - c, gpu_dvfs_table[i].clk[c]); - goto err; - } - } - } pc->dvfs.level_max = 0; @@ -547,11 +637,43 @@ static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) return dvfs_table_row_num; err: - dev_err(kbdev->dev, "failed to set GPU ASV table\n"); return -EINVAL; } /** + * gpu_dvfs_update_asv_table() - Populate the GPU's DVFS table from DT. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * This function iterates through the list of DVFS tables available in the device tree + * and calls validate_and_parse_dvfs_table() to select the valid one for the device. + * + * This function will fail if the required data is not present in the GPU's device tree entry. + * + * Context: Expects the caller to hold the DVFS lock + * + * Return: Returns the number of operating points in the DVFS table on success, -EINVAL on failure.
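 *
 * Tables are tried from the highest numbered variant downwards (with
 * DVFS_TABLES_MAX of 2, gpu_dvfs_table_v2 is tried before gpu_dvfs_table_v1),
 * and the first table whose every operating point has a matching voltage in
 * ECT is the one that is used.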
+ */ +static int gpu_dvfs_update_asv_table(struct kbase_device *kbdev) +{ + int dvfs_table_idx, dvfs_table_row_num; + struct pixel_context *pc = kbdev->platform_context; + + lockdep_assert_held(&pc->dvfs.lock); + + for (dvfs_table_idx = DVFS_TABLES_MAX; dvfs_table_idx > 0; dvfs_table_idx--) { + dvfs_table_row_num = validate_and_parse_dvfs_table(kbdev, dvfs_table_idx); + if (dvfs_table_row_num > 0) + break; + } + if (dvfs_table_row_num <= 0) { + dev_err(kbdev->dev, "failed to set GPU DVFS table"); + } + + return dvfs_table_row_num; +} + +/** * gpu_dvfs_set_initial_level() - Set the initial GPU clocks * * @kbdev: The &struct kbase_device for the GPU @@ -576,17 +698,17 @@ static int gpu_dvfs_set_initial_level(struct kbase_device *kbdev) mutex_lock(&pc->pm.domain->access_lock); for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { - ret = cal_dfs_set_rate(pc->dvfs.clks[c].cal_id, pc->dvfs.table[level].clk[c]); + ret = gpu_dvfs_set_freq(kbdev, c, level); if (ret) { dev_err(kbdev->dev, "Failed to set boot frequency %d on clock index %d (err: %d)\n", pc->dvfs.table[level].clk[c], c, ret); - goto done; + break; } } -done: mutex_unlock(&pc->pm.domain->access_lock); + return ret; } @@ -615,6 +737,8 @@ int gpu_dvfs_init(struct kbase_device *kbdev) pc->dvfs.level_locks[i].level_max = -1; } + pc->dvfs.updates_enabled = true; + /* Get data from DT */ if (of_property_read_u32(np, "gpu0_cmu_cal_id", &pc->dvfs.clks[GPU_DVFS_CLK_TOP_LEVEL].cal_id) || @@ -643,6 +767,12 @@ int gpu_dvfs_init(struct kbase_device *kbdev) goto done; } + /* Setup dvfs step up value */ + if (of_property_read_u32(np, "gpu_dvfs_step_up_val", &pc->dvfs.step_up_val)) { + ret = -EINVAL; + goto done; + } + /* Initialize power down hysteresis */ if (of_property_read_u32(np, "gpu_dvfs_clockdown_hysteresis", &pc->dvfs.clockdown_hysteresis)) { diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs.h b/mali_kbase/platform/pixel/pixel_gpu_dvfs.h index d133693..c1f1587 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs.h +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs.h @@ -142,27 +142,99 @@ void gpu_dvfs_governor_term(struct kbase_device *kbdev); * @active_kctx_count: Count of active kernel contexts operating under this UID. Should only be * accessed while holding the kctx_list lock. * @uid: The UID for this stats block. - * @atoms_in_flight: The number of atoms currently executing on the GPU from this UID. Should only - * be accessed while holding the hwaccess lock. + * @active_work_count: Count of currently executing units of work on the GPU from this UID. Should + * only be accessed while holding the hwaccess lock if using a job manager GPU, + * CSF GPUs require holding the csf.scheduler.lock. * @period_start: The time (in nanoseconds) that the current active period for this UID began. - * Should only be accessed while holding the hwaccess lock. + * Should only be accessed while holding the hwaccess lock if using a job + * manager GPU, CSF GPUs require holding the csf.scheduler.lock. * @tis_stats: &struct gpu_dvfs_opp_metrics block storing time in state data for this UID. - * Should only be accessed while holding the hwaccess lock. + * Should only be accessed while holding the hwaccess lock if using a job + * manager GPU, CSF GPUs require holding the csf.scheduler.lock. 
 */ struct gpu_dvfs_metrics_uid_stats { struct list_head uid_list_link; int active_kctx_count; kuid_t uid; - int atoms_in_flight; + int active_work_count; u64 period_start; struct gpu_dvfs_opp_metrics *tis_stats; }; +/** + * gpu_dvfs_metrics_update() - Updates GPU metrics on level or power change. + * + * @kbdev: The &struct kbase_device for the GPU. + * @old_level: The level that the GPU has just moved from. Can be the same as &new_level. + * @new_level: The level that the GPU has just moved to. Can be the same as &old_level. This + * parameter is ignored if &power_state is false. + * @power_state: The current power state of the GPU. Can be the same as the current power state. + * + * This function should be called (1) right after a change in power state of the GPU, or (2) just + * after changing the level of a powered on GPU. It will update the metrics for each of the GPU + * DVFS level metrics and the power metrics as appropriate. + * + * Context: Expects the caller to hold the dvfs.lock & dvfs.metrics.lock. + */ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_level, bool power_state); -void gpu_dvfs_metrics_job_start(struct kbase_jd_atom *atom); -void gpu_dvfs_metrics_job_end(struct kbase_jd_atom *atom); + +/** + * gpu_dvfs_metrics_work_begin() - Notification of when a unit of work starts on + * the GPU + * + * @param: + * - If job manager GPU: The &struct kbase_jd_atom that has just been submitted to the GPU. + * - If CSF GPU: The &struct kbase_queue_group that has just been submitted to the GPU. + * + * For job manager GPUs: + * This function is called when an atom is submitted to the GPU by way of writing to the + * JSn_HEAD_NEXTn register. + * + * For CSF GPUs: + * This function is called when a group resident in a CSG slot starts executing. + * + * Context: Acquires the dvfs.metrics.lock. May be in IRQ context + */ +void gpu_dvfs_metrics_work_begin(void *param); + +/** + * gpu_dvfs_metrics_work_end() - Notification of when a unit of work stops + * running on the GPU + * + * @param: + * - If job manager GPU: The &struct kbase_jd_atom that has just stopped running on the GPU + * - If CSF GPU: The &struct kbase_queue_group that has just stopped running on the GPU + * + * This function is called when a unit of work is no longer running on the GPU, + * either due to successful completion, failure, preemption, or GPU reset. + * + * For job manager GPUs, a unit of work refers to an atom. + * + * For CSF GPUs, it refers to a group resident in a CSG slot, and so this + * function is called when that CSG slot completes or suspends execution of + * the group. + * + * Context: Acquires the dvfs.metrics.lock. May be in IRQ context + */ +void gpu_dvfs_metrics_work_end(void *param); + +/** + * gpu_dvfs_metrics_init() - Initializes DVFS metrics. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. Takes and releases the DVFS lock. + * + * Return: On success, returns 0 otherwise returns an error code. + */ int gpu_dvfs_metrics_init(struct kbase_device *kbdev); + +/** + * gpu_dvfs_metrics_term() - Terminates DVFS metrics + * + * @kbdev: The &struct kbase_device for the GPU.
+ */ void gpu_dvfs_metrics_term(struct kbase_device *kbdev); /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c index b817aff..28d4073 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs_governor.c @@ -7,11 +7,13 @@ /* Mali core includes */ #include <mali_kbase.h> +#include <trace/events/power.h> /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "pixel_gpu_trace.h" /** * gpu_dvfs_governor_basic() - The evaluation function for &GPU_DVFS_GOVERNOR_BASIC. @@ -96,15 +98,16 @@ static int gpu_dvfs_governor_quickstep(struct kbase_device *kbdev, int level_max = pc->dvfs.level_max; int level_min = pc->dvfs.level_min; int util = util_stats->util; + int step_up = pc->dvfs.step_up_val; lockdep_assert_held(&pc->dvfs.lock); if ((level > level_max) && (util > tbl[level].util_max)) { /* We need to clock up. */ - if (level >= 2 && (util > (100 + tbl[level].util_max) / 2)) { - dev_dbg(kbdev->dev, "DVFS +2: %d -> %d (u: %d / %d)\n", - level, level - 2, util, tbl[level].util_max); - level -= 2; + if (level >= step_up && (util > (100 + tbl[level].util_max) / 2)) { + dev_dbg(kbdev->dev, "DVFS +%d: %d -> %d (u: %d / %d)\n", + step_up, level, level - step_up, util, tbl[level].util_max); + level -= step_up; pc->dvfs.governor.delay = tbl[level].hysteresis / 2; } else { dev_dbg(kbdev->dev, "DVFS +1: %d -> %d (u: %d / %d)\n", @@ -164,11 +167,24 @@ int gpu_dvfs_governor_get_next_level(struct kbase_device *kbdev, struct gpu_dvfs_utlization *util_stats) { struct pixel_context *pc = kbdev->platform_context; - int level; + int level, ret; lockdep_assert_held(&pc->dvfs.lock); level = governors[pc->dvfs.governor.curr].evaluate(kbdev, util_stats); - return clamp(level, pc->dvfs.level_scaling_max, pc->dvfs.level_scaling_min); + if (level != pc->dvfs.level) { + trace_clock_set_rate("gpu_gov_rec", pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS], + raw_smp_processor_id()); + } + + ret = clamp(level, pc->dvfs.level_scaling_max, pc->dvfs.level_scaling_min); + if (ret != level) { + trace_gpu_gov_rec_violate(pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[ret].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[pc->dvfs.level_scaling_min].clk[GPU_DVFS_CLK_SHADERS], + pc->dvfs.table[pc->dvfs.level_scaling_max].clk[GPU_DVFS_CLK_SHADERS]); + } + + return ret; } /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c b/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c index 5d3da59..c7c2b81 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c +++ b/mali_kbase/platform/pixel/pixel_gpu_dvfs_metrics.c @@ -25,6 +25,7 @@ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "mali_power_gpu_frequency_trace.h" static void *enumerate_gpu_clk(struct kbase_device *kbdev, unsigned int index) { @@ -85,7 +86,6 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev struct pixel_context *pc = kbdev->platform_context; struct kbase_gpu_clk_notifier_data nd; int c; - int proc = raw_smp_processor_id(); int clks[GPU_DVFS_CLK_COUNT]; for (c = 0; c < GPU_DVFS_CLK_COUNT; c++) { @@ -103,9 +103,8 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev } - /* TODO: Remove reporting clocks this way when we transition to Perfetto */ - trace_clock_set_rate("gpu0", clks[GPU_DVFS_CLK_TOP_LEVEL], proc); - 
trace_clock_set_rate("gpu1", clks[GPU_DVFS_CLK_SHADERS], proc); + trace_gpu_frequency(clks[GPU_DVFS_CLK_TOP_LEVEL], 0); + trace_gpu_frequency(clks[GPU_DVFS_CLK_SHADERS], 1); } /** @@ -114,61 +113,44 @@ static void gpu_dvfs_metrics_trace_clock(struct kbase_device *kbdev, int old_lev * @kbdev: The &struct kbase_device for the GPU. * @event_time: The time of the clock change event in nanoseconds. * - * Called when the operating point is changing so that the per-UID time in state data for in-flight - * atoms can be updated. Note that this function need only be called when the operating point is - * changing _and_ the GPU is powered on. This is because no atoms will be in-flight when the GPU is - * powered down. + * Called when the operating point is changing so that the per-UID time in state + * data for active work can be updated. Note that this function need only be + * called when the operating point is changing _and_ the GPU is powered on. + * This is because no work will be active when the GPU is powered down. * - * Context: Called in process context, invokes an IRQ context and takes the per-UID metrics spin - * lock. + * Context: Called in process context. Requires the dvfs.lock & dvfs.metrics.lock to be held. */ static void gpu_dvfs_metrics_uid_level_change(struct kbase_device *kbdev, u64 event_time) { struct pixel_context *pc = kbdev->platform_context; struct gpu_dvfs_metrics_uid_stats *stats; - unsigned long flags; int i; + int const nr_slots = ARRAY_SIZE(pc->dvfs.metrics.work_uid_stats); lockdep_assert_held(&pc->dvfs.lock); + lockdep_assert_held(&pc->dvfs.metrics.lock); - spin_lock_irqsave(&kbdev->hwaccess_lock, flags); - - for (i = 0; i < BASE_JM_MAX_NR_SLOTS; i++) { - stats = pc->dvfs.metrics.js_uid_stats[i]; + for (i = 0; i < nr_slots; i++) { + stats = pc->dvfs.metrics.work_uid_stats[i]; if (stats && stats->period_start != event_time) { - WARN_ON(stats->period_start == 0); + WARN_ON_ONCE(stats->period_start == 0); stats->tis_stats[pc->dvfs.level].time_total += (event_time - stats->period_start); stats->period_start = event_time; } } - - spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -/** - * gpu_dvfs_metrics_update() - Updates GPU metrics on level or power change. - * - * @kbdev: The &struct kbase_device for the GPU. - * @old_level: The level that the GPU has just moved from. Can be the same as &new_level. - * @new_level: The level that the GPU has just moved to. Can be the same as &old_level. This - * parameter is ignored if &power_state is false. - * @power_state: The current power state of the GPU. Can be the same as the current power state. - * - * This function should be called (1) right after a change in power state of the GPU, or (2) just - * after changing the level of a powered on GPU. It will update the metrics for each of the GPU - * DVFS level metrics and the power metrics as appropriate. - * - * Context: Expects the caller to hold the DVFS lock. 
- */ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_level, bool power_state) { struct pixel_context *pc = kbdev->platform_context; const u64 prev = pc->dvfs.metrics.last_time; u64 curr = ktime_get_ns(); + unsigned long flags; lockdep_assert_held(&pc->dvfs.lock); + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); if (pc->dvfs.metrics.last_power_state) { if (power_state) { @@ -210,74 +192,125 @@ void gpu_dvfs_metrics_update(struct kbase_device *kbdev, int old_level, int new_ pc->dvfs.metrics.last_power_state = power_state; pc->dvfs.metrics.last_time = curr; pc->dvfs.metrics.last_level = new_level; + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); gpu_dvfs_metrics_trace_clock(kbdev, old_level, new_level, power_state); } -/** - * gpu_dvfs_metrics_job_start() - Notification of when an atom starts on the GPU - * - * @atom: The &struct kbase_jd_atom that has just been submitted to the GPU. - * - * This function is called when an atom is submitted to the GPU by way of writing to the - * JSn_HEAD_NEXTn register. - * - * Context: May be in IRQ context, assumes that the hwaccess lock is held, and in turn takes and - * releases the metrics UID spin lock. - */ -void gpu_dvfs_metrics_job_start(struct kbase_jd_atom *atom) +void gpu_dvfs_metrics_work_begin(void* param) { - struct kbase_device *kbdev = atom->kctx->kbdev; - struct pixel_context *pc = kbdev->platform_context; - struct gpu_dvfs_metrics_uid_stats *stats = atom->kctx->platform_data; - int js = atom->slot_nr; +#if !MALI_USE_CSF + struct kbase_jd_atom* unit = param; + const int slot = unit->slot_nr; +#else + struct kbase_queue_group* unit = param; + const int slot = unit->csg_nr; +#endif + struct kbase_context* kctx = unit->kctx; + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context* pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats* uid_stats = pd->stats; + struct gpu_dvfs_metrics_uid_stats** work_stats = &pc->dvfs.metrics.work_uid_stats[slot]; + const u64 curr = ktime_get_ns(); + unsigned long flags; + + dev_dbg(kbdev->dev, "work_begin, slot: %d, uid: %d", slot, uid_stats->uid.val); + + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); - lockdep_assert_held(&kbdev->hwaccess_lock); +#if !MALI_USE_CSF + /* + * JM slots can have 2 Atoms submitted per slot, with different UIDs + * Use the secondary slot if the first is occupied + */ + if (*work_stats != NULL) { + work_stats = &pc->dvfs.metrics.work_uid_stats[slot + BASE_JM_MAX_NR_SLOTS]; + } +#endif + + /* Nothing should be mapped to this slot */ + WARN_ON_ONCE(*work_stats != NULL); - if (stats->atoms_in_flight == 0) { - /* This is the start of a new period */ - WARN_ON(stats->period_start != 0); - stats->period_start = ktime_get_ns(); + /* + * First new work associated with this UID, start tracking the per UID + * time now + */ + if (uid_stats->active_work_count == 0) + { + /* + * This is the start of a new period, the start time shouldn't have + * been set or should have been cleared. 
+ */ + WARN_ON_ONCE(uid_stats->period_start != 0); + uid_stats->period_start = curr; } + ++uid_stats->active_work_count; + + /* Link the UID stats to the stream slot */ + *work_stats = uid_stats; - stats->atoms_in_flight++; - pc->dvfs.metrics.js_uid_stats[js] = stats; + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); } -/** - * gpu_dvfs_metrics_job_end() - Notification of when an atom stops running on the GPU - * - * @atom: The &struct kbase_jd_atom that has just stopped running on the GPU - * - * This function is called when an atom is no longer running on the GPU, either due to successful - * completion, failure, preemption, or GPU reset. - * - * Context: May be in IRQ context, assumes that the hwaccess lock is held, and in turn takes and - * releases the metrics UID spin lock. - */ -void gpu_dvfs_metrics_job_end(struct kbase_jd_atom *atom) +void gpu_dvfs_metrics_work_end(void *param) { - struct kbase_device *kbdev = atom->kctx->kbdev; - struct pixel_context *pc = kbdev->platform_context; - struct gpu_dvfs_metrics_uid_stats *stats = atom->kctx->platform_data; - int js = atom->slot_nr; - u64 curr = ktime_get_ns(); +#if !MALI_USE_CSF + struct kbase_jd_atom* unit = param; + const int slot = unit->slot_nr; +#else + struct kbase_queue_group* unit = param; + const int slot = unit->csg_nr; +#endif + struct kbase_context* kctx = unit->kctx; + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context* pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats* uid_stats = pd->stats; + struct gpu_dvfs_metrics_uid_stats** work_stats = &pc->dvfs.metrics.work_uid_stats[slot]; + const u64 curr = ktime_get_ns(); + unsigned long flags; - lockdep_assert_held(&kbdev->hwaccess_lock); + dev_dbg(kbdev->dev, "work_end, slot: %d, uid: %d", slot, uid_stats->uid.val); - WARN_ON(stats->period_start == 0); - WARN_ON(stats->atoms_in_flight == 0); + spin_lock_irqsave(&pc->dvfs.metrics.lock, flags); - stats->atoms_in_flight--; - stats->tis_stats[pc->dvfs.level].time_total += (curr - stats->period_start); +#if !MALI_USE_CSF + /* + * JM slots can have 2 Atoms submitted per slot, with different UIDs + * If the primary slot is not for this uid, then check the secondary slot + */ + if (*work_stats != uid_stats) { + work_stats = &pc->dvfs.metrics.work_uid_stats[slot + BASE_JM_MAX_NR_SLOTS]; + } +#endif - if (stats->atoms_in_flight == 0) - /* This is the end of a period */ - stats->period_start = 0; - else - stats->period_start = curr; + /* We should have something mapped to this slot */ + WARN_ON_ONCE(*work_stats == NULL); + /* Should be the same stats */ + WARN_ON_ONCE(uid_stats != *work_stats); + /* Forgot to init the start time? */ + WARN_ON_ONCE(uid_stats->period_start == 0); + /* No jobs so how could have something have completed? */ + if (!WARN_ON_ONCE(uid_stats->active_work_count == 0)) + --uid_stats->active_work_count; + /* + * We could only update this when the work count equals zero, and + * avoid updating the period_start often. However we get more timely + * updates this way. + */ + uid_stats->tis_stats[pc->dvfs.level].time_total += (curr - uid_stats->period_start); + + /* + * Reset the period start time when there is no work associated with + * this UID, or update it to prevent double counting. + */ + uid_stats->period_start = uid_stats->active_work_count == 0 ? 
0 : curr; - pc->dvfs.metrics.js_uid_stats[js] = NULL; + /* Unlink the UID stats from the slot stats */ + *work_stats = NULL; + + spin_unlock_irqrestore(&pc->dvfs.metrics.lock, flags); } /** @@ -345,6 +378,7 @@ int gpu_dvfs_kctx_init(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *pd = kctx->platform_data; struct task_struct *task; kuid_t uid; @@ -397,7 +431,7 @@ int gpu_dvfs_kctx_init(struct kbase_context *kctx) stats->active_kctx_count++; /* Store a direct link in the kctx */ - kctx->platform_data = stats; + pd->stats = stats; done: mutex_unlock(&kbdev->kctx_list_lock); @@ -405,7 +439,7 @@ done: } /** - * gpu_dvfs_kctx_init() - Called when a kernel context is terminated + * gpu_dvfs_kctx_term() - Called when a kernel context is terminated * * @kctx: The &struct kbase_context that is being terminated * @@ -415,7 +449,8 @@ done: void gpu_dvfs_kctx_term(struct kbase_context *kctx) { struct kbase_device *kbdev = kctx->kbdev; - struct gpu_dvfs_metrics_uid_stats *stats = kctx->platform_data; + struct pixel_platform_data *pd = kctx->platform_data; + struct gpu_dvfs_metrics_uid_stats *stats = pd->stats; unsigned long flags; spin_lock_irqsave(&kbdev->hwaccess_lock, flags); @@ -424,21 +459,13 @@ void gpu_dvfs_kctx_term(struct kbase_context *kctx) spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); } -/** - * gpu_dvfs_metrics_init() - Initializes DVFS metrics. - * - * @kbdev: The &struct kbase_device for the GPU. - * - * Context: Process context. Takes and releases the DVFS lock. - * - * Return: On success, returns 0 otherwise returns an error code. - */ int gpu_dvfs_metrics_init(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; int c; mutex_lock(&pc->dvfs.lock); + spin_lock_init(&pc->dvfs.metrics.lock); pc->dvfs.metrics.last_time = ktime_get_ns(); pc->dvfs.metrics.last_power_state = gpu_pm_get_power_state(kbdev); @@ -460,14 +487,11 @@ int gpu_dvfs_metrics_init(struct kbase_device *kbdev) /* Initialize per-UID metrics */ INIT_LIST_HEAD(&pc->dvfs.metrics.uid_stats_list); + memset(pc->dvfs.metrics.work_uid_stats, 0, sizeof(pc->dvfs.metrics.work_uid_stats)); + return 0; } -/** - * gpu_dvfs_metrics_term() - Terminates DVFS metrics - * - * @kbdev: The &struct kbase_device for the GPU. - */ void gpu_dvfs_metrics_term(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; diff --git a/mali_kbase/platform/pixel/pixel_gpu_itmon.c b/mali_kbase/platform/pixel/pixel_gpu_itmon.c new file mode 100644 index 0000000..7dbce37 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_itmon.c @@ -0,0 +1,383 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. + * + * This platform component registers an ITMON notifier callback which filters + * fabric fault reports where the GPU is identified as the initiator of the + * transaction. + * + * When such a fault occurs, it searches for the faulting physical address in + * the GPU page tables of all GPU contexts. If the physical address appears in + * a page table, the context and corresponding virtual address are logged. + * + * Otherwise, a message is logged indicating that the physical address does not + * appear in any GPU page table. 
+ */ + +#if IS_ENABLED(CONFIG_EXYNOS_ITMON) + +/* Linux includes */ +#include <linux/of.h> + +/* SOC includes */ +#include <soc/google/exynos-itmon.h> + +/* Mali core includes */ +#include <mali_kbase.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include "pixel_gpu_control.h" + + +/* GPU page tables may use more physical address bits than the bus, to encode + * other information. We'll need to mask those away to match with bus + * addresses. + */ +#define PHYSICAL_ADDRESS_BITS 36 +#define PHYSICAL_ADDRESS_MASK ((1ULL << (PHYSICAL_ADDRESS_BITS)) - 1) + +/* Convert KBASE_MMU_PAGE_ENTRIES to number of bits */ +#define KBASE_MMU_PAGE_ENTRIES_LOG2 const_ilog2(KBASE_MMU_PAGE_ENTRIES) + + +/** + * pixel_gpu_itmon_search_pgd() - Search a page directory page. + * + * @mmu_mode: The &struct kbase_mmu_mode PTE accessor functions. + * @level: The level of the page directory. + * @pa_pgd: The physical address of the page directory page. + * @pa_search: The physical address to search for. + * @va_prefix: The virtual address prefix above this level. + * + * Return: The virtual address mapped to the physical address, or zero. + */ +static u64 pixel_gpu_itmon_search_pgd(struct kbase_mmu_mode const *mmu_mode, + int level, phys_addr_t pa_pgd, phys_addr_t pa_search, u64 va_prefix) +{ + u64 va_found = 0; + int i; + + /* Map the page */ + const u64 *entry = kmap(pfn_to_page(PFN_DOWN(pa_pgd))); + if (!entry) + return 0; + + /* Shift the VA prefix left to make room for this new level */ + va_prefix <<= KBASE_MMU_PAGE_ENTRIES_LOG2; + + /* For each entry in the page directory */ + for (i = 0; i < KBASE_MMU_PAGE_ENTRIES; i++) { + + /* Is this a PTE, an ATE, or invalid? */ + if (mmu_mode->pte_is_valid(entry[i], level)) { + + /* PTE: Get the physical address of the next level PGD */ + phys_addr_t pa_next = mmu_mode->pte_to_phy_addr(entry[i]) + & PHYSICAL_ADDRESS_MASK; + + /* Recurse into it */ + if (pa_next) { + va_found = pixel_gpu_itmon_search_pgd(mmu_mode, level + 1, + pa_next, pa_search, va_prefix); + if (va_found) + break; + } + + } else if (mmu_mode->ate_is_valid(entry[i], level)) { + + /* ATE: Get the page (or block) physical address */ + phys_addr_t pa_start = mmu_mode->pte_to_phy_addr(entry[i]) + & PHYSICAL_ADDRESS_MASK; + + if (pa_start) { + /* Get the size of the block: + * this may be larger than a page, depending on level. + * A competent compiler will hoist this out of the loop. + */ + int remaining_levels = MIDGARD_MMU_BOTTOMLEVEL - level; + size_t block_size = PAGE_SIZE << + (KBASE_MMU_PAGE_ENTRIES_LOG2 * remaining_levels); + + /* Test if the block contains the PA we are searching for */ + if ((pa_search >= pa_start) && + (pa_search < (pa_start + block_size))) { + + /* Combine translated and non-translated address bits */ + va_found = (va_prefix * block_size) + + (pa_search % block_size); + break; + } + } + } + + /* Advance the virtual address prefix with each entry */ + va_prefix++; + } + + kunmap(pfn_to_page(PFN_DOWN(pa_pgd))); + + return va_found; +} + +/** + * pixel_gpu_itmon_search_page_table() - Search a page table for a PA. + * + * @kbdev: The &struct kbase_device. + * @table: The &struct kbase_mmu_table to search. + * @pa: The physical address to search for. + * + * Return: The virtual address mapped to the physical address, or zero. 
+ */ +static u64 pixel_gpu_itmon_search_page_table(struct kbase_device *kbdev, + struct kbase_mmu_table* table, phys_addr_t pa) +{ + u64 va; + + rt_mutex_lock(&table->mmu_lock); + va = pixel_gpu_itmon_search_pgd(kbdev->mmu_mode, MIDGARD_MMU_TOPLEVEL, + table->pgd, pa, 0); + rt_mutex_unlock(&table->mmu_lock); + + return va; +} + +/** + * pixel_gpu_itmon_search_context() - Search the page tables of a context. + * + * @pc: The &struct pixel_context. + * @kctx: The &struct kbase_context to search. + * + * Return: True if the faulting physical address was found. + */ +static bool pixel_gpu_itmon_search_context(struct pixel_context *pc, + struct kbase_context *kctx) +{ + u64 va = pixel_gpu_itmon_search_page_table(pc->kbdev, &kctx->mmu, + pc->itmon.pa); + + /* If a mapping was found */ + if (va) { + /* Get the task from the context */ + struct pid *pid_struct; + struct task_struct *task; + + rcu_read_lock(); + pid_struct = find_get_pid(kctx->pid); + task = pid_task(pid_struct, PIDTYPE_PID); + + /* And report it */ + dev_err(pc->kbdev->dev, + "ITMON: Faulting physical address 0x%llX appears in page table of " + "task %s (pid %u), mapped from virtual address 0x%llx (as %d)\n", + pc->itmon.pa, task ? task->comm : "[null task]", kctx->pid, va, + kctx->as_nr); + + put_pid(pid_struct); + rcu_read_unlock(); + + return true; + } + + return false; +} + +#if MALI_USE_CSF +/** + * pixel_gpu_itmon_search_csffw() - Search the CSF MCU page table. + * + * @pc: The &struct pixel_context. + * + * Return: True if the faulting physical address was found. + */ +static bool pixel_gpu_itmon_search_csffw(struct pixel_context *pc) +{ + struct kbase_device *kbdev = pc->kbdev; + + u64 va = pixel_gpu_itmon_search_page_table(kbdev, &kbdev->csf.mcu_mmu, + pc->itmon.pa); + + /* If a mapping was found */ + if (va) { + dev_err(kbdev->dev, + "ITMON: Faulting physical address 0x%llX appears in CSF MCU page " + "table, mapped from virtual address 0x%llx (as 0)\n", + pc->itmon.pa, va); + return true; + } + + return false; +} +#endif /* MALI_USE_CSF */ + +/** + * pixel_gpu_itmon_worker() - ITMON fault worker. + * + * Required to be able to lock mutexes while searching page tables. + * + * @data: The &struct work_struct. + */ +static void pixel_gpu_itmon_worker(struct work_struct *data) +{ + /* Recover the pixel_context */ + struct pixel_context *pc = container_of(data, struct pixel_context, + itmon.work); + + struct kbase_device *kbdev = pc->kbdev; + struct kbase_context *kctx; + bool found = false; + + /* Log that the work has started */ + dev_err(kbdev->dev, + "ITMON: Searching for physical address 0x%llX across all GPU page " + "tables...\n", pc->itmon.pa); + + /* Search the CSF MCU page table first */ +#if MALI_USE_CSF + found |= pixel_gpu_itmon_search_csffw(pc); +#endif + + mutex_lock(&kbdev->kctx_list_lock); + + /* Enumerate all contexts and search their page tables */ + list_for_each_entry(kctx, &kbdev->kctx_list, kctx_list_link) { + found |= pixel_gpu_itmon_search_context(pc, kctx); + } + + mutex_unlock(&kbdev->kctx_list_lock); + + /* For completeness, log that we did not find the fault address anywhere */ + if (!found) { + dev_err(kbdev->dev, + "ITMON: Faulting physical address 0x%llX NOT PRESENT in any GPU " + "page table - GPU would not have initiated this access\n", + pc->itmon.pa); + } + + /* Let the ITMON ISR know that we're done and it can continue */ + atomic_dec(&pc->itmon.active); +} + +/** + * pixel_gpu_itmon_notifier() - Handle an ITMON fault report. 
+ * + * @nb: The &struct notifier_block inside &struct pixel_context. + * @action: Unused. + * @nb_data: The ITMON report. + * + * Return: NOTIFY_OK to continue calling other notifier blocks. + */ +static int pixel_gpu_itmon_notifier(struct notifier_block *nb, + unsigned long action, void *nb_data) +{ + /* Recover the pixel_context */ + struct pixel_context *pc = container_of(nb, struct pixel_context, itmon.nb); + + /* Get details of the ITMON report */ + struct itmon_notifier *itmon_info = nb_data; + + /* Filter out non-GPU ports */ + if ((!itmon_info->port) || + (strncmp(itmon_info->port, "GPU", 3) && + strncmp(itmon_info->port, "G3D", 3))) + return NOTIFY_OK; + + /* Immediately acknowledge that this fault matched our filter */ + dev_err(pc->kbdev->dev, + "Detected relevant ITMON fault report from %s to 0x%llX, " + "enqueueing work...\n", itmon_info->port, (u64)itmon_info->target_addr); + + /* Make sure we have finished processing previous work */ + if (atomic_fetch_inc(&pc->itmon.active) != 0) { + atomic_dec(&pc->itmon.active); + dev_err(pc->kbdev->dev, "Previous work not yet finished, skipping\n"); + return NOTIFY_OK; + } + + /* Save the PA to search for */ + pc->itmon.pa = itmon_info->target_addr; + + /* Access to GPU page tables is protected by a mutex, which we cannot lock + * here in an atomic context. Queue work to another CPU to do the search. + */ + queue_work(pc->itmon.wq, &pc->itmon.work); + + /* (Try to) busy-wait for that work to complete, before we ramdump */ + { + u64 start = ktime_get_ns(); + + while (atomic_read(&pc->itmon.active) > 0) { + + if ((ktime_get_ns() - start) < (NSEC_PER_SEC / 2)) { + udelay(10000); + } else { + dev_err(pc->kbdev->dev, + "Timed out waiting for ITMON work, this is not an error\n"); + break; + } + } + } + + return NOTIFY_OK; +} + +/** + * gpu_itmon_init() - Initialize ITMON notifier callback. + * + * @kbdev: The &struct kbase_device. + * + * Return: An error code, or 0 on success. + */ +int gpu_itmon_init(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* The additional diagnostic information offered by this callback is only + * useful if it can be collected as part of a ramdump. Ramdumps are + * disabled in "user" builds, so query the build variant and skip + * initialization if that is the case. + */ + struct device_node *dpm = of_find_node_by_name(NULL, "dpm"); + const char *variant = NULL; + if ((!dpm) || of_property_read_string(dpm, "variant", &variant) || + (!strcmp(variant, "user"))) + return 0; + + /* Create a workqueue that can run on any CPU with high priority, so that + * it can run while we (try to) wait for it in the ITMON interrupt. + */ + pc->itmon.wq = alloc_workqueue("mali_itmon_wq", WQ_UNBOUND | WQ_HIGHPRI, 1); + if (!pc->itmon.wq) + return -ENOMEM; + INIT_WORK(&pc->itmon.work, pixel_gpu_itmon_worker); + + /* Then register our ITMON notifier callback */ + pc->itmon.nb.notifier_call = pixel_gpu_itmon_notifier; + itmon_notifier_chain_register(&pc->itmon.nb); + + return 0; +} + +/** + * gpu_itmon_term() - Terminate ITMON notifier callback. + * + * @kbdev: The &struct kbase_device. 
+ */ +void gpu_itmon_term(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + if (pc->itmon.wq) { + /* Unregister our ITMON notifier callback first */ + itmon_notifier_chain_unregister(&pc->itmon.nb); + + /* Then it's safe to destroy the workqueue */ + destroy_workqueue(pc->itmon.wq); + pc->itmon.wq = NULL; + } +} + +/* Depend on ITMON driver */ +MODULE_SOFTDEP("pre: itmon"); + +#endif /* IS_ENABLED(CONFIG_EXYNOS_ITMON) */
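The address reconstruction performed by pixel_gpu_itmon_search_pgd() above comes down to a little arithmetic: the walk accumulates the entry index taken at each level into va_prefix, and once a valid ATE covering the searched physical address is found, the virtual address is that prefix scaled by the level's block size plus the untranslated low bits. The sketch below restates that math as standalone C; it assumes 4 KiB pages, 512 entries per directory and a 4-level table (which is what KBASE_MMU_PAGE_ENTRIES_LOG2 and MIDGARD_MMU_BOTTOMLEVEL imply), and the macro and function names are illustrative, not the driver's.

#include <stdint.h>

#define EXAMPLE_PAGE_SIZE       4096ULL
#define ENTRIES_PER_LEVEL_LOG2  9     /* 512 entries per page directory */
#define BOTTOM_LEVEL            3     /* levels 0..3, like MIDGARD_MMU_BOTTOMLEVEL */

/* Size of the region covered by one entry at the given level. */
static uint64_t block_size(int level)
{
        return EXAMPLE_PAGE_SIZE << (ENTRIES_PER_LEVEL_LOG2 * (BOTTOM_LEVEL - level));
}

/* Rebuild a virtual address from the indices walked so far (va_prefix), the
 * level at which a valid ATE was found, and the searched physical address pa
 * that lies inside that block.
 */
static uint64_t rebuild_va(uint64_t va_prefix, int level, uint64_t pa)
{
        return va_prefix * block_size(level) + (pa % block_size(level));
}

At the bottom level a block is a single 4 KiB page, so the reconstructed VA is simply the walked page index times the page size plus the page offset; at higher levels the same formula handles block mappings that cover more than one page.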
\ No newline at end of file diff --git a/mali_kbase/platform/pixel/pixel_gpu_power.c b/mali_kbase/platform/pixel/pixel_gpu_power.c index 33ea438..7b28f9e 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_power.c +++ b/mali_kbase/platform/pixel/pixel_gpu_power.c @@ -15,10 +15,12 @@ /* SOC includes */ #if IS_ENABLED(CONFIG_EXYNOS_PMU_IF) #include <soc/google/exynos-pmu-if.h> +#include <soc/google/exynos-pd.h> #endif #if IS_ENABLED(CONFIG_CAL_IF) #include <soc/google/cal-if.h> #endif +#include <linux/soc/samsung/exynos-smc.h> /* Mali core includes */ #include <mali_kbase.h> @@ -27,6 +29,7 @@ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_trace.h" +#include <trace/events/power.h> /* * GPU_PM_DOMAIN_NAMES - names for GPU power domains. @@ -40,28 +43,217 @@ static const char * const GPU_PM_DOMAIN_NAMES[GPU_PM_DOMAIN_COUNT] = { }; /** - * gpu_pm_power_on_cores() - Powers on the GPU shader cores. + * struct pixel_rail_transition - Represents a power rail state transition + * + * @begin_timestamp: Time-stamp from when the transition began + * @end_timestamp: Time-stamp from when the transition completed + * @from: Rail state at the start of the transition + * @to: Rail state at the end of the transition + **/ +struct pixel_rail_transition { + ktime_t begin_timestamp; + ktime_t end_timestamp; + uint8_t from; + uint8_t to; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_rail_transition) == 18, + "Incorrect pixel_rail_transition size"); +_Static_assert(GPU_POWER_LEVEL_NUM < ((uint8_t)(~0U)), "gpu_power_state must fit in one byte"); + +#define PIXEL_RAIL_LOG_MAX (PAGE_SIZE / sizeof(struct pixel_rail_transition)) + +/** + * struct pixel_rail_state_metadata - Info about the rail transition log + * + * @magic: Always 'pprs', helps find the log in memory dumps + * @version: Updated whenever the binary layout changes + * @log_address: The memory address of the power rail state log + * @log_offset: The offset of the power rail state log within an SSCD + * @log_length: Number of used bytes in the power rail state log ring buffer. 
+ * The length will be <= (FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT) + * @last_entry: The last entry index, used to find the start and end of the ring buffer + * @log_entry_stride: The stride in bytes between entries within the log + * @_reserved: Bytes reserved for future use + **/ +struct pixel_rail_state_metadata { + char magic[4]; + uint8_t version; + uint64_t log_address; + uint32_t log_offset; + uint32_t log_length; + uint32_t last_entry; + uint8_t log_entry_stride; + char _reserved[6]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_rail_state_metadata) == 32, + "Incorrect pixel_rail_state_metadata size"); + + +/** + * struct pixel_rail_state_log - Log containing a record of power rail state transitions + * + * @meta: Info about the log + * @log_rb: The actual log + **/ +struct pixel_rail_state_log { + struct pixel_rail_state_metadata meta; + struct pixel_rail_transition log_rb[PIXEL_RAIL_LOG_MAX]; +} __attribute__((packed)); + +/** + * gpu_pm_rail_state_log_last_entry() - Get a handle to the last logged rail transition + * + * @log: The &struct pixel_rail_state_log containing all logged transitions + * + * Context: Process context + * + * Return: Most recent log entry + */ +static struct pixel_rail_transition * +gpu_pm_rail_state_log_last_entry(struct pixel_rail_state_log *log) +{ + return &log->log_rb[log->meta.last_entry]; +} + +/** + * gpu_pm_rail_state_start_transition_lock() - Mark the start of a power rail transition + * + * @pc: The &struct pixel_context for the GPU + * + * Mark the beginning of a power rail transition. This function starts a critical section + * by holding the pm.lock, and creates a new log entry to record the transition. + * + * Context: Process context, acquires pc->pm.lock and does not release it + */ +static void gpu_pm_rail_state_start_transition_lock(struct pixel_context *pc) +{ + struct pixel_rail_state_log *log; + struct pixel_rail_transition *entry; + + mutex_lock(&pc->pm.lock); + + log = pc->pm.rail_state_log; + log->meta.last_entry = (log->meta.last_entry + 1) % PIXEL_RAIL_LOG_MAX; + log->meta.log_length = max(log->meta.last_entry, log->meta.log_length); + entry = gpu_pm_rail_state_log_last_entry(log); + + /* Clear to prevent leaking an old event */ + memset(entry, 0, sizeof(struct pixel_rail_transition)); + + entry->from = (uint8_t)pc->pm.state; + entry->begin_timestamp = ktime_get_ns(); +} + +/** + * gpu_pm_rail_state_end_transition_unlock() - Mark the end of a power rail transition + * + * @pc: The &struct pixel_context for the GPU + * + * Mark the end of a power rail transition. This function ends a critical section + * by releasing the pm.lock, and completes the partial event log entry added when + * the transition began. + * + * Context: Process context, expects pc->pm.lock to be held, releases pc->pm.lock + */ +static void gpu_pm_rail_state_end_transition_unlock(struct pixel_context *pc) +{ + struct pixel_rail_transition *entry; + + lockdep_assert_held(&pc->pm.lock); + + entry = gpu_pm_rail_state_log_last_entry(pc->pm.rail_state_log); + + entry->end_timestamp = ktime_get_ns(); + entry->to = (uint8_t)pc->pm.state; + trace_gpu_power_state(entry->end_timestamp - entry->begin_timestamp, entry->from, entry->to); + + mutex_unlock(&pc->pm.lock); +} + +/** + * gpu_pm_get_rail_state_log() - Obtain a handle to the rail state log * * @kbdev: The &struct kbase_device for the GPU. * - * Powers on the CORES domain and issues trace points and events. Also powers on TOP and cancels - * any pending suspend operations on it. 
+ * Context: Process context * - * Context: Process context. Takes and releases PM lock. + * Return: Opaque handle to rail state log + */ +void* gpu_pm_get_rail_state_log(struct kbase_device *kbdev) +{ + return ((struct pixel_context *)kbdev->platform_context)->pm.rail_state_log; +} + + +/** + * gpu_pm_get_rail_state_log_size() - Size in bytes of the rail state log * - * Return: If GPU state has been lost, 1 is returned. Otherwise 0 is returned. + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context + * + * Return: Size in bytes of the rail state log, for dumping purposes + */ +unsigned int gpu_pm_get_rail_state_log_size(struct kbase_device *kbdev) +{ + return sizeof(struct pixel_rail_state_log); +} + +/** + * gpu_pm_rail_state_log_init() - Allocate and initialize the power rail state transition log + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context + * + * Return: Owning pointer to allocated rail state log + */ +static struct pixel_rail_state_log* gpu_pm_rail_state_log_init(struct kbase_device *kbdev) +{ + struct pixel_rail_state_log* log = kzalloc(sizeof(struct pixel_rail_state_log), GFP_KERNEL); + + if (log == NULL) { + dev_err(kbdev->dev, "Failed to allocated pm_rail_state_log"); + return log; + } + + log->meta = (struct pixel_rail_state_metadata) { + .magic = "pprs", + .version = 1, + .log_address = (uint64_t)log->log_rb, + .log_offset = offsetof(struct pixel_rail_state_log, log_rb), + .log_length = 0, + .last_entry = 0, + .log_entry_stride = (uint8_t)sizeof(struct pixel_rail_transition), + }; + + return log; +} + +/** + * gpu_pm_rail_state_log_term() - Free the rail state transition log + * + * @log: The &struct pixel_rail_state_log to destroy + * + * Context: Process context */ -static int gpu_pm_power_on_cores(struct kbase_device *kbdev) +static void gpu_pm_rail_state_log_term(struct pixel_rail_state_log *log) +{ + kfree(log); +} + +/** + * gpu_pm_power_on_top_nolock() - See gpu_pm_power_on_top + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static int gpu_pm_power_on_top_nolock(struct kbase_device *kbdev) { int ret; struct pixel_context *pc = kbdev->platform_context; - u64 start_ns = ktime_get_ns(); - - mutex_lock(&pc->pm.lock); pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); - /* * We determine whether GPU state was lost by detecting whether the GPU state reached * GPU_POWER_LEVEL_OFF before we entered this function. The GPU state is set to be @@ -75,61 +267,114 @@ static int gpu_pm_power_on_cores(struct kbase_device *kbdev) */ ret = (pc->pm.state == GPU_POWER_LEVEL_OFF); - trace_gpu_power_state(ktime_get_ns() - start_ns, - GPU_POWER_LEVEL_GLOBAL, GPU_POWER_LEVEL_STACKS); + gpu_dvfs_enable_updates(kbdev); #ifdef CONFIG_MALI_MIDGARD_DVFS + kbase_pm_metrics_start(kbdev); gpu_dvfs_event_power_on(kbdev); #endif - #if IS_ENABLED(CONFIG_GOOGLE_BCL) + if (!pc->pm.bcl_dev) + pc->pm.bcl_dev = google_retrieve_bcl_handle(); if (pc->pm.bcl_dev) google_init_gpu_ratio(pc->pm.bcl_dev); #endif - pc->pm.state = GPU_POWER_LEVEL_STACKS; +#if !IS_ENABLED(CONFIG_SOC_GS101) + if (exynos_smc(SMC_PROTECTION_SET, 0, PROT_G3D, SMC_PROTECTION_ENABLE) != 0) { + dev_err(kbdev->dev, "Couldn't enable protected mode after GPU power-on"); + } +#endif - mutex_unlock(&pc->pm.lock); + pc->pm.state = GPU_POWER_LEVEL_STACKS; return ret; } /** - * gpu_pm_power_off_cores() - Powers off the GPU shader cores. 
+ * gpu_pm_power_on_top() - Powers on the GPU global domains and shader cores. * * @kbdev: The &struct kbase_device for the GPU. * - * Powers off the CORES domain and issues trace points and events. Also marks the TOP domain for - * delayed suspend. Complete power down of all GPU domains will only occur after this delayed - * suspend, and the kernel notifies of this change via the &gpu_pm_callback_power_runtime_suspend - * callback. + * Powers on the CORES domain and issues trace points and events. Also powers on TOP and cancels + * any pending suspend operations on it. * - * Note: If the we have already performed these operations without an intervening call to - * &gpu_pm_power_on_cores, then we take no action. + * Context: Process context. Takes and releases PM lock. * - * Context: Process context. Takes and releases the PM lock. + * Return: If GPU state has been lost, 1 is returned. Otherwise 0 is returned. */ -static void gpu_pm_power_off_cores(struct kbase_device *kbdev) +static int gpu_pm_power_on_top(struct kbase_device *kbdev) { + int ret; struct pixel_context *pc = kbdev->platform_context; - u64 start_ns = ktime_get_ns(); - mutex_lock(&pc->pm.lock); + gpu_pm_rail_state_start_transition_lock(pc); + ret = gpu_pm_power_on_top_nolock(kbdev); + gpu_pm_rail_state_end_transition_unlock(pc); + + return ret; +} + +/** + * gpu_pm_power_off_top_nolock() - See gpu_pm_power_off_top + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static void gpu_pm_power_off_top_nolock(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; - if (pc->pm.state > GPU_POWER_LEVEL_GLOBAL) { + if (pc->pm.state == GPU_POWER_LEVEL_STACKS) { pm_runtime_put_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); pc->pm.state = GPU_POWER_LEVEL_GLOBAL; + } + + if (pc->pm.state == GPU_POWER_LEVEL_GLOBAL) { +#if !IS_ENABLED(CONFIG_SOC_GS101) + if (exynos_smc(SMC_PROTECTION_SET, 0, PROT_G3D, SMC_PROTECTION_DISABLE) != 0) { + dev_err(kbdev->dev, "Couldn't disable protected mode before GPU power-off"); + } +#endif - pm_runtime_mark_last_busy(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); - pm_runtime_put_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + gpu_dvfs_disable_updates(kbdev); + + if (pc->pm.use_autosuspend) { + pm_runtime_mark_last_busy(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + pm_runtime_put_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } else { + pm_runtime_put_sync_suspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } + pc->pm.state = GPU_POWER_LEVEL_OFF; - trace_gpu_power_state(ktime_get_ns() - start_ns, - GPU_POWER_LEVEL_STACKS, GPU_POWER_LEVEL_GLOBAL); #ifdef CONFIG_MALI_MIDGARD_DVFS gpu_dvfs_event_power_off(kbdev); + kbase_pm_metrics_stop(kbdev); #endif + } +} - mutex_unlock(&pc->pm.lock); +/** + * gpu_pm_power_off_top() - Instruct GPU to transition to OFF. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Powers off the CORES domain if they are on. Marks the TOP domain for delayed + * suspend. The complete power down of all GPU domains will only occur after + * this delayed suspend, and the kernel notifies of this change via the + * &gpu_pm_callback_power_runtime_suspend callback. + * + * Note: If the we have already performed these operations without an intervening call to + * &gpu_pm_power_on_top, then we take no action. + * + * Context: Process context. Takes and releases the PM lock. 
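The rail state log introduced above is meant to be recovered from a ramdump or SSCD rather than read by the kernel, so the metadata fields (the 'pprs' magic, version, log_offset, log_entry_stride and last_entry) are what an offline tool would key off. The following is a hypothetical host-side decoder sketched under the assumption that the raw bytes of struct pixel_rail_state_log have already been extracted from a dump; the struct mirrors the version-1 packed layout above but is not part of the driver.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#pragma pack(push, 1)
struct rail_transition {                /* 18 bytes, mirrors pixel_rail_transition */
        uint64_t begin_ns, end_ns;      /* timestamps in nanoseconds */
        uint8_t from, to;               /* power states */
};
struct rail_log_meta {                  /* 32 bytes, mirrors pixel_rail_state_metadata */
        char magic[4];
        uint8_t version;
        uint64_t log_address;
        uint32_t log_offset, log_length, last_entry;
        uint8_t stride;
        char reserved[6];
};
#pragma pack(pop)

/* Print the most recent transition recorded in an extracted log blob. */
static void print_last_rail_transition(const uint8_t *blob)
{
        const struct rail_log_meta *meta = (const void *)blob;
        const uint8_t *entries;
        const struct rail_transition *last;

        if (memcmp(meta->magic, "pprs", 4) != 0 || meta->version != 1)
                return;

        entries = blob + meta->log_offset;
        last = (const void *)(entries + (size_t)meta->last_entry * meta->stride);
        printf("last transition: %u -> %u, %llu ns\n",
               (unsigned)last->from, (unsigned)last->to,
               (unsigned long long)(last->end_ns - last->begin_ns));
}

Walking the whole ring buffer works the same way: entries sit log_entry_stride bytes apart starting at log_offset, and last_entry marks where the newest record was written before the index wraps.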
+ */ +static void gpu_pm_power_off_top(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + gpu_pm_rail_state_start_transition_lock(pc); + gpu_pm_power_off_top_nolock(kbdev); + gpu_pm_rail_state_end_transition_unlock(pc); } /** @@ -152,7 +397,7 @@ static int gpu_pm_callback_power_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - return gpu_pm_power_on_cores(kbdev); + return gpu_pm_power_on_top(kbdev); } /** @@ -170,7 +415,7 @@ static void gpu_pm_callback_power_off(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - gpu_pm_power_off_cores(kbdev); + gpu_pm_power_off_top(kbdev); } /** @@ -204,117 +449,168 @@ static void gpu_pm_callback_power_suspend(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - gpu_pm_power_off_cores(kbdev); + gpu_pm_power_off_top(kbdev); } -#ifdef KBASE_PM_RUNTIME +#if IS_ENABLED(KBASE_PM_RUNTIME) /** - * gpu_pm_callback_power_runtime_suspend() - Called when a TOP domain is going to runtime suspend + * gpu_pm_callback_power_runtime_init() - Initialize runtime power management. * - * @dev: The device that is going to runtime suspend + * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made when @dev is about to enter runtime suspend. In our case, this occurs when - * the TOP domain of GPU is about to enter runtime suspend. At this point we take the opportunity - * to store that state will be lost and disable DVFS metrics gathering. + * This callback is made by the core Mali driver at the point where runtime power management is + * being initialized early on in the probe of the Mali device. * - * Note: This function doesn't take the PM lock prior to updating GPU state as it doesn't explicitly - * attempt to update GPU power domain state. The caller of this function (or another function - * further up the callstack) will hold &power.lock for the TOP domain's &struct device and - * that is sufficient for ensuring serialization of the GPU power state. + * We enable autosuspend for the TOP domain so that after the autosuspend delay, the core Mali + * driver knows to disable the collection of GPU utilization data used for DVFS purposes. * - * Return: Always returns 0. + * Return: Returns 0 on success, or an error code on failure. */ -static int gpu_pm_callback_power_runtime_suspend(struct device *dev) +static int gpu_pm_callback_power_runtime_init(struct kbase_device *kbdev) { - struct kbase_device *kbdev = dev_get_drvdata(dev); struct pixel_context *pc = kbdev->platform_context; dev_dbg(kbdev->dev, "%s\n", __func__); - WARN_ON(pc->pm.state > GPU_POWER_LEVEL_GLOBAL); - pc->pm.state = GPU_POWER_LEVEL_OFF; + if (!pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]) || + !pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES])) { + dev_warn(kbdev->dev, "pm_runtime not enabled\n"); + return -ENOSYS; + } -#ifdef CONFIG_MALI_MIDGARD_DVFS - kbase_pm_metrics_stop(kbdev); -#endif + if (pc->pm.use_autosuspend) { + pm_runtime_set_autosuspend_delay(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP], + pc->pm.autosuspend_delay); + pm_runtime_use_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + } return 0; } /** - * gpu_pm_callback_power_runtime_resume() - Called when a TOP domain is going to runtime resume - * - * @dev: The device that is going to runtime suspend + * kbase_device_runtime_term() - Initialize runtime power management. * - * This callback is made when @dev is about to runtime resume. 
In our case, this occurs when - * the TOP domain of GPU is about to runtime resume. We use this callback to enable DVFS metrics - * gathering. + * @kbdev: The &struct kbase_device for the GPU. * - * Return: Always returns 0. + * This callback is made via the core Mali driver at the point where runtime power management needs + * to be de-initialized. Currently this only happens if the device probe fails at a point after + * which runtime power management has been initialized. */ -static int gpu_pm_callback_power_runtime_resume(struct device *dev) +static void gpu_pm_callback_power_runtime_term(struct kbase_device *kbdev) { -#ifdef CONFIG_MALI_MIDGARD_DVFS - struct kbase_device *kbdev = dev_get_drvdata(dev); + struct pixel_context *pc = kbdev->platform_context; - kbase_pm_metrics_start(kbdev); -#endif - return 0; + dev_dbg(kbdev->dev, "%s\n", __func__); + + pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); } +#endif /* IS_ENABLED(KBASE_PM_RUNTIME) */ + + +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS /** - * gpu_pm_callback_power_runtime_init() - Initialize runtime power management. + * gpu_pm_power_on_cores() - Powers on the GPU shader cores for + * CONFIG_MALI_HOST_CONTROLS_SC_RAILS integrations. * * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made by the core Mali driver at the point where runtime power management is - * being initialized early on in the probe of the Mali device. - * - * We enable autosuspend for the TOP domain so that after the autosuspend delay, the core Mali - * driver knows to disable the collection of GPU utilization data used for DVFS purposes. + * Powers on the CORES domain for CONFIG_MALI_HOST_CONTROLS_SC_RAILS + * integrations. Afterwards shaders must be powered and may be used by GPU. * - * Return: Returns 0 on success, or an error code on failure. + * Context: Process context. Takes and releases PM lock. */ -static int gpu_pm_callback_power_runtime_init(struct kbase_device *kbdev) -{ +static void gpu_pm_power_on_cores(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; - dev_dbg(kbdev->dev, "%s\n", __func__); + gpu_pm_rail_state_start_transition_lock(pc); - pm_runtime_set_autosuspend_delay(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP], - pc->pm.autosuspend_delay); - pm_runtime_use_autosuspend(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + if (pc->pm.state == GPU_POWER_LEVEL_GLOBAL && pc->pm.ifpo_enabled) { + pm_runtime_get_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pc->pm.state = GPU_POWER_LEVEL_STACKS; - if (!pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]) || - !pm_runtime_enabled(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES])) { - dev_warn(kbdev->dev, "pm_runtime not enabled\n"); - return -ENOSYS; +#ifdef CONFIG_MALI_MIDGARD_DVFS + gpu_dvfs_event_power_on(kbdev); +#endif } - return 0; + gpu_pm_rail_state_end_transition_unlock(pc); } /** - * kbase_device_runtime_term() - Initialize runtime power management. + * gpu_pm_power_off_cores() - Powers off the GPU shader cores for + * CONFIG_MALI_HOST_CONTROLS_SC_RAILS integrations. * * @kbdev: The &struct kbase_device for the GPU. * - * This callback is made via the core Mali driver at the point where runtime power management needs - * to be de-initialized. Currently this only happens if the device probe fails at a point after - * which runtime power management has been initialized. + * Powers off the CORES domain for CONFIG_MALI_HOST_CONTROLS_SC_RAILS + * integrations. 
Afterwards shaders are not powered and may not be used by GPU. + * + * Context: Process context. Takes and releases PM lock. */ -static void gpu_pm_callback_power_runtime_term(struct kbase_device *kbdev) -{ +static void gpu_pm_power_off_cores(struct kbase_device *kbdev) { struct pixel_context *pc = kbdev->platform_context; + gpu_pm_rail_state_start_transition_lock(pc); + + if (pc->pm.state == GPU_POWER_LEVEL_STACKS && pc->pm.ifpo_enabled) { + pm_runtime_put_sync(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); + pc->pm.state = GPU_POWER_LEVEL_GLOBAL; + +#ifdef CONFIG_MALI_MIDGARD_DVFS + gpu_dvfs_event_power_off(kbdev); +#endif + } + + gpu_pm_rail_state_end_transition_unlock(pc); +} + +/** + * gpu_pm_callback_power_sc_rails_on() - Called by GPU when shaders are needed. + * + * @kbdev: The device that needs its shaders powered on. + * + * This callback is made when @dev needs shader cores powered on integrations + * using CONFIG_MALI_HOST_CONTROLS_SC_RAILS. + */ +static void gpu_pm_callback_power_sc_rails_on(struct kbase_device *kbdev) { dev_dbg(kbdev->dev, "%s\n", __func__); - pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_CORES]); - pm_runtime_disable(pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]); + gpu_pm_power_on_cores(kbdev); } -#endif /* KBASE_PM_RUNTIME */ +/** + * gpu_pm_callback_power_sc_rails_off() - Called by GPU when shaders are idle. + * + * @kbdev: The device that needs its shaders powered on. + * + * This callback is made when @dev coud have its shader cores powered off on + * integrations using CONFIG_MALI_HOST_CONTROLS_SC_RAILS. + */ +static void gpu_pm_callback_power_sc_rails_off(struct kbase_device *kbdev) { + dev_dbg(kbdev->dev, "%s\n", __func__); + + gpu_pm_power_off_cores(kbdev); +} +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ + +static void gpu_pm_hw_reset(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* Ensure the power cycle happens inside one critical section */ + gpu_pm_rail_state_start_transition_lock(pc); + + dev_warn(kbdev->dev, "pixel: performing GPU hardware reset"); + + gpu_pm_power_off_top_nolock(kbdev); + /* GPU state loss is intended */ + (void)gpu_pm_power_on_top_nolock(kbdev); + + gpu_pm_rail_state_end_transition_unlock(pc); +} /* * struct pm_callbacks - Callbacks for linking to core Mali KMD power management @@ -350,7 +646,7 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_on_callback = gpu_pm_callback_power_on, .power_suspend_callback = gpu_pm_callback_power_suspend, .power_resume_callback = NULL, -#ifdef KBASE_PM_RUNTIME +#if IS_ENABLED(KBASE_PM_RUNTIME) .power_runtime_init_callback = gpu_pm_callback_power_runtime_init, .power_runtime_term_callback = gpu_pm_callback_power_runtime_term, .power_runtime_off_callback = NULL, @@ -363,39 +659,16 @@ struct kbase_pm_callback_conf pm_callbacks = { .power_runtime_on_callback = NULL, .power_runtime_idle_callback = NULL, #endif /* KBASE_PM_RUNTIME */ - .soft_reset_callback = NULL + .soft_reset_callback = NULL, + .hardware_reset_callback = gpu_pm_hw_reset, +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + .power_on_sc_rails_callback = gpu_pm_callback_power_sc_rails_on, + .power_off_sc_rails_callback = gpu_pm_callback_power_sc_rails_off, +#endif /* CONFIG_MALI_HOST_CONTROLS_SC_RAILS */ }; /** - * gpu_pm_get_pm_cores_domain() - Find the GPU's power domain. - * - * @g3d_genpd_name: A string containing the name of the power domain - * - * Searches through the available power domains in device tree for one that - * matched @g3d_genpd_name and returns it if found. 
- * - * Return: A pointer to the power domain if found, NULL otherwise. - */ -static struct exynos_pm_domain *gpu_pm_get_pm_cores_domain(const char *g3d_genpd_name) -{ - struct device_node *np; - struct platform_device *pdev; - struct exynos_pm_domain *pd; - - for_each_compatible_node(np, NULL, "samsung,exynos-pd") { - if (of_device_is_available(np)) { - pdev = of_find_device_by_node(np); - pd = (struct exynos_pm_domain *)platform_get_drvdata(pdev); - if (strcmp(g3d_genpd_name, (const char *)(pd->genpd.name)) == 0) - return pd; - } - } - - return NULL; -} - -/** - * gpu_pm_get_power_state() - Returns the current power state of a GPU. + * gpu_pm_get_power_state() - Returns the current power state of the GPU. * * @kbdev: The &struct kbase_device for the GPU. * @@ -472,20 +745,17 @@ int gpu_pm_init(struct kbase_device *kbdev) } } - /* - * We set up runtime pm callbacks specifically for the TOP domain. This is so that when we - * use autosupend it will only affect the TOP domain and not CORES as we control the power - * state of CORES directly. - */ - pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]->pm_domain->ops.runtime_suspend = - &gpu_pm_callback_power_runtime_suspend; - pc->pm.domain_devs[GPU_PM_DOMAIN_TOP]->pm_domain->ops.runtime_resume = - &gpu_pm_callback_power_runtime_resume; +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + pc->pm.ifpo_enabled = true; +#endif if (of_property_read_u32(np, "gpu_pm_autosuspend_delay", &pc->pm.autosuspend_delay)) { - pc->pm.autosuspend_delay = AUTO_SUSPEND_DELAY; - dev_info(kbdev->dev, "autosuspend delay not set in DT, using default of %dms\n", - AUTO_SUSPEND_DELAY); + pc->pm.use_autosuspend = false; + pc->pm.autosuspend_delay = 0; + dev_info(kbdev->dev, "using synchronous suspend for TOP domain\n"); + } else { + pc->pm.use_autosuspend = true; + dev_info(kbdev->dev, "autosuspend delay set to %ims for TOP domain\n", pc->pm.autosuspend_delay); } if (of_property_read_u32(np, "gpu_pmu_status_reg_offset", &pc->pm.status_reg_offset)) { @@ -507,14 +777,19 @@ int gpu_pm_init(struct kbase_device *kbdev) goto error; } - pc->pm.domain = gpu_pm_get_pm_cores_domain(g3d_power_domain_name); - if (pc->pm.domain == NULL) + pc->pm.domain = exynos_pd_lookup_name(g3d_power_domain_name); + if (pc->pm.domain == NULL) { + dev_err(kbdev->dev, "Failed to find GPU power domain '%s'\n", + g3d_power_domain_name); return -ENODEV; + } #if IS_ENABLED(CONFIG_GOOGLE_BCL) pc->pm.bcl_dev = google_retrieve_bcl_handle(); #endif + pc->pm.rail_state_log = gpu_pm_rail_state_log_init(kbdev); + return 0; error: @@ -535,6 +810,8 @@ void gpu_pm_term(struct kbase_device *kbdev) struct pixel_context *pc = kbdev->platform_context; int i; + gpu_pm_rail_state_log_term(pc->pm.rail_state_log); + for (i = 0; i < GPU_PM_DOMAIN_COUNT; i++) { if (pc->pm.domain_devs[i]) { if (pc->pm.domain_links[i]) diff --git a/mali_kbase/platform/pixel/pixel_gpu_slc.c b/mali_kbase/platform/pixel/pixel_gpu_slc.c new file mode 100644 index 0000000..94409d2 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_slc.c @@ -0,0 +1,462 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> + +/* UAPI includes */ +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_gpu_common_slc.h> +/* Back-door mali_pixel include */ +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include "pixel_gpu_slc.h" + +struct dirty_region { + u64 first_vpfn; + u64 last_vpfn; + u64 dirty_pgds; +}; + +/** + * struct gpu_slc_liveness_update_info - Buffer info, and live ranges + * + * @buffer_va: Array of buffer base virtual addresses + * @buffer_sizes: Array of buffer sizes + * @buffer_count: Number of elements in the va and sizes buffers + * @live_ranges: Array of &struct kbase_pixel_gpu_slc_liveness_mark denoting live ranges for + * each buffer + * @live_ranges_count: Number of elements in the live ranges buffer + */ +struct gpu_slc_liveness_update_info { + u64* buffer_va; + u64* buffer_sizes; + u64 buffer_count; + struct kbase_pixel_gpu_slc_liveness_mark* live_ranges; + u64 live_ranges_count; +}; + +/** + * gpu_slc_lock_as - Lock the current process address space + * + * @kctx: The &struct kbase_context + */ +static void gpu_slc_lock_as(struct kbase_context *kctx) +{ + down_write(kbase_mem_get_process_mmap_lock()); + kbase_gpu_vm_lock(kctx); +} + +/** + * gpu_slc_unlock_as - Unlock the current process address space + * + * @kctx: The &struct kbase_context + */ +static void gpu_slc_unlock_as(struct kbase_context *kctx) +{ + kbase_gpu_vm_unlock(kctx); + up_write(kbase_mem_get_process_mmap_lock()); +} + +/** + * gpu_slc_in_group - Check whether the region is SLC cacheable + * + * @reg: The gpu memory region to check for an SLC cacheable memory group. + */ +static bool gpu_slc_in_group(struct kbase_va_region* reg) +{ + return reg->gpu_alloc->group_id == MGM_SLC_GROUP_ID; +} + +/** + * gpu_slc_get_region - Find the gpu memory region from a virtual address + * + * @kctx: The &struct kbase_context + * @va: The base gpu virtual address of the region + * + * Return: On success, returns a valid memory region. On failure NULL is returned. 
+ */ +static struct kbase_va_region* gpu_slc_get_region(struct kbase_context *kctx, u64 va) +{ + struct kbase_va_region *reg; + + if (!va) + goto invalid; + + if ((va & ~PAGE_MASK) && (va >= PAGE_SIZE)) + goto invalid; + + /* Find the region that the virtual address belongs to */ + reg = kbase_region_tracker_find_region_base_address(kctx, va); + + /* Validate the region */ + if (kbase_is_region_invalid_or_free(reg)) + goto invalid; + + return reg; + +invalid: + dev_dbg(kctx->kbdev->dev, "pixel: failed to find valid region for gpu_va: %llu", va); + return NULL; +} + +/** + * gpu_slc_migrate_region - Add PBHA that will make the pages SLC cacheable + * + * @kctx: The &struct kbase_context + * @reg: The gpu memory region migrate to an SLC cacheable memory group + * @dirty_reg: The &struct dirty_region containing the extent of the dirty page table entries + */ +static void gpu_slc_migrate_region(struct kbase_context *kctx, struct kbase_va_region *reg, struct dirty_region *dirty_reg) +{ + int err; + u64 vpfn; + size_t page_nr; + + KBASE_DEBUG_ASSERT(kctx); + KBASE_DEBUG_ASSERT(reg); + + vpfn = reg->start_pfn; + page_nr = kbase_reg_current_backed_size(reg); + + err = kbase_mmu_update_pages_no_flush(kctx->kbdev, &kctx->mmu, vpfn, + kbase_get_gpu_phy_pages(reg), + page_nr, + reg->flags, + MGM_SLC_GROUP_ID, + &dirty_reg->dirty_pgds); + + /* Track the dirty region */ + dirty_reg->first_vpfn = min(dirty_reg->first_vpfn, vpfn); + dirty_reg->last_vpfn = max(dirty_reg->last_vpfn, vpfn + page_nr); + + if (err) + dev_warn(kctx->kbdev->dev, "pixel: failed to move region to SLC: %d", err); + else + /* If everything is good, then set the new group on the region. */ + reg->gpu_alloc->group_id = MGM_SLC_GROUP_ID; +} + +/** + * gpu_slc_flush_dirty_region - Perform an MMU flush for a dirty page region + * + * @kctx: The &struct kbase_context + * @dirty_reg: The &struct dirty_region containing the extent of the dirty page table entries + */ +static void gpu_slc_flush_dirty_region(struct kbase_context *kctx, struct dirty_region *dirty_reg) +{ + size_t const dirty_page_nr = + (dirty_reg->last_vpfn - min(dirty_reg->first_vpfn, dirty_reg->last_vpfn)); + + if (!dirty_page_nr) + return; + + kbase_mmu_flush_invalidate_update_pages( + kctx->kbdev, kctx, dirty_reg->first_vpfn, dirty_page_nr, dirty_reg->dirty_pgds); +} + +/** + * gpu_slc_resize_partition - Attempt to resize the GPU's SLC partition to meet demand. + * + * @kbdev: The &struct kbase_device for the GPU. + */ +static void gpu_slc_resize_partition(struct kbase_device* kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + /* Request that the mgm select an SLC partition that fits our demand */ + pixel_mgm_resize_group_to_fit(kbdev->mgm_dev, MGM_SLC_GROUP_ID, pc->slc.demand); + + dev_dbg(kbdev->dev, "pixel: resized GPU SLC partition to meet demand: %llu", pc->slc.demand); +} + +/** + * gpu_slc_get_partition_size - Query the current size of the GPU's SLC partition. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Returns the size of the GPU's SLC partition. + */ +static u64 gpu_slc_get_partition_size(struct kbase_device* kbdev) +{ + u64 const partition_size = pixel_mgm_query_group_size(kbdev->mgm_dev, MGM_SLC_GROUP_ID); + + dev_dbg(kbdev->dev, "pixel: GPU SLC partition partition size: %llu", partition_size); + + return partition_size; +} + +/** + * gpu_slc_liveness_update - Respond to a liveness update by trying to put the new buffers into free + * SLC space, and resizing the partition to meet demand. 
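The demand figure this function maintains is a high-water mark: scanning the begin/end marks in order, it is the largest sum of buffer sizes that are live at the same time, and that peak (not the total of all buffers) is what the SLC partition is resized to. A self-contained sketch of just that computation is shown below; the mark/enum names are illustrative stand-ins for kbase_pixel_gpu_slc_liveness_mark, and the usage side (which only counts buffers actually migrated into the SLC group) is omitted.

#include <stdint.h>

enum mark_type { RANGE_BEGIN, RANGE_END };
struct mark {
        enum mark_type type;
        uint32_t index;         /* which buffer this mark refers to */
};

/* Peak demand: the high-water mark of the sizes of simultaneously live
 * buffers, scanning the liveness marks in submission order. The driver
 * additionally WARNs if the running sum is non-zero at the end, which
 * indicates a missing end marker.
 */
static uint64_t peak_demand(const struct mark *marks, int n_marks,
                            const uint64_t *sizes)
{
        uint64_t current = 0, peak = 0;
        int i;

        for (i = 0; i < n_marks; i++) {
                if (marks[i].type == RANGE_BEGIN) {
                        current += sizes[marks[i].index];
                        if (current > peak)
                                peak = current;
                } else {
                        current -= sizes[marks[i].index];
                }
        }
        return peak;
}

For example, two 32 MiB buffers whose lifetimes never overlap produce a peak demand of 32 MiB, while overlapping lifetimes produce 64 MiB, which is why the partition is sized from the peak rather than from the sum of all buffers in the update.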
+ * + * @kctx: The &struct kbase_context corresponding to a user space context which sent the liveness + * update + * @info: See struct gpu_slc_liveness_update_info + */ +static void gpu_slc_liveness_update(struct kbase_context* kctx, + struct gpu_slc_liveness_update_info* info) +{ + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *kctx_pd = kctx->platform_data; + struct dirty_region dirty_reg = { + .first_vpfn = U64_MAX, + .last_vpfn = 0, + .dirty_pgds = 0, + }; + u64 current_usage = 0; + u64 current_demand = 0; + u64 free_space; + int i; + + /* Lock the process address space before modifying ATE's */ + gpu_slc_lock_as(kctx); + + /* Synchronize updates to the partition size and usage */ + mutex_lock(&pc->slc.lock); + + dev_dbg(kbdev->dev, "pixel: buffer liveness update received"); + + /* Remove the usage and demand from the previous liveness update */ + pc->slc.demand -= kctx_pd->slc.peak_demand; + pc->slc.usage -= kctx_pd->slc.peak_usage; + kctx_pd->slc.peak_demand = 0; + kctx_pd->slc.peak_usage = 0; + + /* Calculate the remaining free space in the SLC partition (floored at 0) */ + free_space = gpu_slc_get_partition_size(kbdev); + free_space -= min(free_space, pc->slc.usage); + + for (i = 0; i < info->live_ranges_count; ++i) + { + struct kbase_va_region *reg; + u64 size; + u64 va; + u32 index = info->live_ranges[i].index; + + if (unlikely(index >= info->buffer_count)) + continue; + + size = info->buffer_sizes[index]; + va = info->buffer_va[index]; + + reg = gpu_slc_get_region(kctx, va); + if(!reg) + continue; + + switch (info->live_ranges[i].type) + { + case KBASE_PIXEL_GPU_LIVE_RANGE_BEGIN: + /* Update demand as though there's no size limit */ + current_demand += size; + kctx_pd->slc.peak_demand = max(kctx_pd->slc.peak_demand, current_demand); + + /* Check whether there's free space in the partition to store the buffer */ + if (free_space >= current_usage + size) + gpu_slc_migrate_region(kctx, reg, &dirty_reg); + + /* This may be true, even if the space calculation above returned false, + * as a previous call to this function may have migrated the region. + * In such a scenario, the current_usage may exceed the available free_space + * and we will be oversubscribed to the SLC partition. + * We could migrate the region back to the non-SLC group, but this would + * require an SLC flush, so for now we do nothing. 
+ */ + if (gpu_slc_in_group(reg)) { + current_usage += size; + kctx_pd->slc.peak_usage = max(kctx_pd->slc.peak_usage, current_usage); + } + break; + case KBASE_PIXEL_GPU_LIVE_RANGE_END: + current_demand -= size; + if (gpu_slc_in_group(reg)) + current_usage -= size; + break; + } + } + /* Perform single page table flush */ + gpu_slc_flush_dirty_region(kctx, &dirty_reg); + + /* Indicates a missing live range end marker */ + WARN_ON_ONCE(current_demand != 0 || current_usage != 0); + + /* Update the total usage and demand */ + pc->slc.demand += kctx_pd->slc.peak_demand; + pc->slc.usage += kctx_pd->slc.peak_usage; + + dev_dbg(kbdev->dev, + "pixel: kctx_%d, peak_demand: %llu, peak_usage: %llu", + kctx->id, + kctx_pd->slc.peak_demand, + kctx_pd->slc.peak_usage); + dev_dbg(kbdev->dev, "pixel: kbdev, demand: %llu, usage: %llu", pc->slc.demand, pc->slc.usage); + + /* Trigger partition resize based on the new demand */ + gpu_slc_resize_partition(kctx->kbdev); + + mutex_unlock(&pc->slc.lock); + gpu_slc_unlock_as(kctx); +} + +/** + * gpu_pixel_handle_buffer_liveness_update_ioctl() - See gpu_slc_liveness_update + * + * @kctx: The &struct kbase_context corresponding to a user space context which sent the liveness + * update + * @update: See struct kbase_ioctl_buffer_liveness_update + * + * Context: Process context. Takes and releases the GPU power domain lock. Expects the caller to + * hold the DVFS lock. + */ +int gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update) +{ + int err = -EINVAL; + struct gpu_slc_liveness_update_info info; + u64* buff = NULL; + u64 total_buff_size; + + /* Compute the sizes of the user space arrays that we need to copy */ + u64 const buffer_info_size = sizeof(u64) * update->buffer_count; + u64 const live_ranges_size = + sizeof(struct kbase_pixel_gpu_slc_liveness_mark) * update->live_ranges_count; + + /* Guard against overflows and empty sizes */ + if (!buffer_info_size || !live_ranges_size) + goto done; + if (U64_MAX / sizeof(u64) < update->buffer_count) + goto done; + if (U64_MAX / sizeof(struct kbase_pixel_gpu_slc_liveness_mark) < update->live_ranges_count) + goto done; + /* Guard against nullptr */ + if (!update->live_ranges_address || !update->buffer_va_address || !update->buffer_sizes_address) + goto done; + /* Calculate the total buffer size required and detect overflows */ + if ((U64_MAX - live_ranges_size) / 2 < buffer_info_size) + goto done; + + total_buff_size = buffer_info_size * 2 + live_ranges_size; + + /* Allocate the memory we require to copy from user space */ + buff = kmalloc(total_buff_size, GFP_KERNEL); + if (buff == NULL) { + dev_err(kctx->kbdev->dev, "pixel: failed to allocate buffer for liveness update"); + err = -ENOMEM; + goto done; + } + + /* Set up the info struct by pointing into the allocation. 
All 8 byte aligned */ + info = (struct gpu_slc_liveness_update_info){ + .buffer_va = buff, + .buffer_sizes = buff + update->buffer_count, + .buffer_count = update->buffer_count, + .live_ranges = (struct kbase_pixel_gpu_slc_liveness_mark*)(buff + update->buffer_count * 2), + .live_ranges_count = update->live_ranges_count, + }; + + /* Copy the data from user space */ + err = + copy_from_user(info.live_ranges, u64_to_user_ptr(update->live_ranges_address), live_ranges_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy live ranges"); + err = -EFAULT; + goto done; + } + + err = copy_from_user( + info.buffer_sizes, u64_to_user_ptr(update->buffer_sizes_address), buffer_info_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy buffer sizes"); + err = -EFAULT; + goto done; + } + + err = copy_from_user(info.buffer_va, u64_to_user_ptr(update->buffer_va_address), buffer_info_size); + if (err) { + dev_err(kctx->kbdev->dev, "pixel: failed to copy buffer addresses"); + err = -EFAULT; + goto done; + } + + /* Execute an slc update */ + gpu_slc_liveness_update(kctx, &info); + +done: + kfree(buff); + + return err; +} + +/** + * gpu_slc_kctx_init() - Called when a kernel context is created + * + * @kctx: The &struct kbase_context that is being initialized + * + * This function is called when the GPU driver is initializing a new kernel context. This event is + * used to set up data structures that will be used to track this context's usage of the SLC. + * + * Return: Returns 0 on success, or an error code on failure. + */ +int gpu_slc_kctx_init(struct kbase_context *kctx) +{ + (void)kctx; + return 0; +} + +/** + * gpu_slc_kctx_term() - Called when a kernel context is terminated + * + * @kctx: The &struct kbase_context that is being terminated + * + * Free up SLC space used by the buffers that this context owns. + */ +void gpu_slc_kctx_term(struct kbase_context *kctx) +{ + struct kbase_device* kbdev = kctx->kbdev; + struct pixel_context *pc = kbdev->platform_context; + struct pixel_platform_data *kctx_pd = kctx->platform_data; + + mutex_lock(&pc->slc.lock); + + /* Deduct the usage and demand, freeing that SLC space for the next update */ + pc->slc.demand -= kctx_pd->slc.peak_demand; + pc->slc.usage -= kctx_pd->slc.peak_usage; + + /* Trigger partition resize based on the new demand */ + gpu_slc_resize_partition(kctx->kbdev); + + mutex_unlock(&pc->slc.lock); +} + + +/** + * gpu_slc_init - Initialize the SLC partition for the GPU + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Return: On success, returns 0. On failure an error code is returned. + */ +int gpu_slc_init(struct kbase_device *kbdev) +{ + struct pixel_context *pc = kbdev->platform_context; + + mutex_init(&pc->slc.lock); + + return 0; +} + +/** + * gpu_slc_term() - Terminates the Pixel GPU SLC partition. + * + * @kbdev: The &struct kbase_device for the GPU. + */ +void gpu_slc_term(struct kbase_device *kbdev) +{ + (void)kbdev; +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_slc.h b/mali_kbase/platform/pixel/pixel_gpu_slc.h new file mode 100644 index 0000000..29b4eb3 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_slc.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2022-2023 Google LLC. 
+ * + * Author: Jack Diver <diverj@google.com> + */ +#ifndef _PIXEL_GPU_SLC_H_ +#define _PIXEL_GPU_SLC_H_ + +#ifdef CONFIG_MALI_PIXEL_GPU_SLC +int gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update); + +int gpu_slc_init(struct kbase_device *kbdev); + +void gpu_slc_term(struct kbase_device *kbdev); + +int gpu_slc_kctx_init(struct kbase_context *kctx); + +void gpu_slc_kctx_term(struct kbase_context *kctx); +#else +static int __maybe_unused gpu_pixel_handle_buffer_liveness_update_ioctl(struct kbase_context* kctx, + struct kbase_ioctl_buffer_liveness_update* update) +{ + return (void)kctx, (void)update, 0; +} + +int __maybe_unused gpu_slc_init(struct kbase_device *kbdev) { return (void)kbdev, 0; } + +void __maybe_unused gpu_slc_term(struct kbase_device *kbdev) { (void)kbdev; } + +static int __maybe_unused gpu_slc_kctx_init(struct kbase_context *kctx) { return (void)kctx, 0; } + +static void __maybe_unused gpu_slc_kctx_term(struct kbase_context* kctx) { (void)kctx; } +#endif /* CONFIG_MALI_PIXEL_GPU_SLC */ + +#endif /* _PIXEL_GPU_SLC_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_sscd.c b/mali_kbase/platform/pixel/pixel_gpu_sscd.c new file mode 100644 index 0000000..b374b00 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_sscd.c @@ -0,0 +1,720 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2021 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +/* Mali core includes */ +#include <mali_kbase.h> +#include <csf/mali_kbase_csf_trace_buffer.h> +#include <csf/mali_kbase_csf_firmware.h> +#include <csf/mali_kbase_csf_firmware_cfg.h> +#include <csf/mali_kbase_csf_firmware_core_dump.h> + +/* Pixel integration includes */ +#include "mali_kbase_config_platform.h" +#include <mali_kbase_reset_gpu.h> +#include "pixel_gpu_sscd.h" +#include "pixel_gpu_debug.h" +#include "pixel_gpu_control.h" +#include <linux/platform_data/sscoredump.h> +#include <linux/platform_device.h> + +/*************************************************************************************************** + * This feature is a WIP, and is pending Firmware + core KMD support for: * + * - Dumping FW private memory * + * - Suspending the MCU * + * - Dumping MCU registers * + **************************************************************************************************/ + +static void sscd_release(struct device *dev) +{ + (void)dev; +} + +static struct sscd_platform_data sscd_pdata; +const static struct platform_device sscd_dev_init = { .name = "mali", + .driver_override = SSCD_NAME, + .id = -1, + .dev = { + .platform_data = &sscd_pdata, + .release = sscd_release, + } }; +static struct platform_device sscd_dev; + +enum +{ + MCU_REGISTERS = 0x1, + GPU_REGISTERS = 0x2, + PRIVATE_MEM = 0x3, + SHARED_MEM = 0x4, + FW_TRACE = 0x5, + PM_EVENT_LOG = 0x6, + POWER_RAIL_LOG = 0x7, + PDC_STATUS = 0x8, + KTRACE = 0x9, + CONTEXTS = 0xA, + FW_CORE_DUMP = 0xB, + NUM_SEGMENTS +} sscd_segs; + +static void get_pm_event_log(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (seg->addr == NULL) + return; + + if (kbase_pm_copy_event_log(kbdev, seg->addr, seg->size)) { + dev_warn(kbdev->dev, "pixel: failed to report PM event log"); + } +} + +/** + * struct pixel_fw_trace_metadata - Info about the FW trace log + * + * @magic: Always 'pfwt', helps find the log in memory dumps + * @trace_address: The memory address of the FW trace log + * @trace_length: Number of used bytes in the trace ring 
buffer. + * The length will be <= (FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT) + * @version: Updated whenever the binary layout changes + * @_reserved: Bytes reserved for future use + **/ +struct pixel_fw_trace_metadata { + char magic[4]; + uint64_t trace_address; + uint32_t trace_length; + uint8_t version; + char _reserved[31]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_fw_trace_metadata) == 48, + "Incorrect pixel_fw_trace_metadata size"); + +/** + * struct pixel_fw_trace - The FW trace and associated meta data + * + * @meta: Info about the trace log + * @trace_log: The actual trace log + **/ +struct pixel_fw_trace { + struct pixel_fw_trace_metadata meta; + char trace_log[FW_TRACE_BUF_NR_PAGES << PAGE_SHIFT]; +}; + +static void get_fw_trace(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + struct firmware_trace_buffer *tb; + struct pixel_fw_trace *fw_trace; + + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (seg->addr == NULL) + return; + + fw_trace = seg->addr; + + /* Write the default meta data */ + fw_trace->meta = (struct pixel_fw_trace_metadata) { + .magic = "pfwt", + .trace_address = 0, + .trace_length = 0, + .version = 1, + }; + + tb = kbase_csf_firmware_get_trace_buffer(kbdev, KBASE_CSFFW_LOG_BUF_NAME); + + if (tb == NULL) { + dev_err(kbdev->dev, "pixel: failed to open firmware trace buffer"); + return; + } + + /* Write the trace log */ + fw_trace->meta.trace_address = (uint64_t)tb; + fw_trace->meta.trace_length = kbase_csf_firmware_trace_buffer_read_data( + tb, fw_trace->trace_log, sizeof(fw_trace->trace_log)); + + return; +} + +/** + * struct pixel_ktrace_metadata - Info about the ktrace log + * + * @magic: Always 'ktra', helps find the log in memory dumps + * @trace_address: The memory address of the ktrace log + * @trace_start: Start of the ktrace ringbuffer + * @trace_end: End of the ktrace ringbuffer + * @version_major: Ktrace major version. + * @version_minor: Ktrace minor version. 
+ * @_reserved: Bytes reserved for future use + **/ +struct pixel_ktrace_metadata { + char magic[4]; + uint64_t trace_address; + uint32_t trace_start; + uint32_t trace_end; + uint8_t version_major; + uint8_t version_minor; + char _reserved[28]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_ktrace_metadata) == 50, + "Incorrect pixel_ktrace_metadata size"); + +struct pixel_ktrace { + struct pixel_ktrace_metadata meta; +#if KBASE_KTRACE_TARGET_RBUF + struct kbase_ktrace_msg trace_log[KBASE_KTRACE_SIZE]; +#endif +}; +static void get_ktrace(struct kbase_device *kbdev, + struct sscd_segment *seg) +{ + struct pixel_ktrace *ktrace = seg->addr; +#if KBASE_KTRACE_TARGET_RBUF + unsigned long flags; + u32 entries_copied = 0; +#endif + + if (seg->addr == NULL) + return; + + ktrace->meta = (struct pixel_ktrace_metadata) { .magic = "ktra" }; +#if KBASE_KTRACE_TARGET_RBUF + lockdep_assert_held(&kbdev->hwaccess_lock); + spin_lock_irqsave(&kbdev->ktrace.lock, flags); + ktrace->meta.trace_address = (uint64_t)kbdev->ktrace.rbuf; + ktrace->meta.trace_start = kbdev->ktrace.first_out; + ktrace->meta.trace_end = kbdev->ktrace.next_in; + ktrace->meta.version_major = KBASE_KTRACE_VERSION_MAJOR; + ktrace->meta.version_minor = KBASE_KTRACE_VERSION_MINOR; + + entries_copied = kbasep_ktrace_copy(kbdev, seg->addr, KBASE_KTRACE_SIZE); + if (entries_copied != KBASE_KTRACE_SIZE) + dev_warn(kbdev->dev, "only copied %i of %i ktrace entries", + entries_copied, KBASE_KTRACE_SIZE); + spin_unlock_irqrestore(&kbdev->ktrace.lock, flags); + + KBASE_KTRACE_RBUF_DUMP(kbdev); +#else + dev_warn(kbdev->dev, "ktrace information not present"); +#endif +} + +#if MALI_USE_CSF +/** + * enum pixel_context_state - a coarse platform independent state for a context. + * + * @PIXEL_CONTEXT_ACTIVE: The context is running (in some capacity) on GPU. + * @PIXEL_CONTEXT_RUNNABLE: The context is runnable, but not running on GPU. + * @PIXEL_CONTEXT_INACTIVE: The context is not acive. + */ +enum pixel_context_state { + PIXEL_CONTEXT_ACTIVE = 0, + PIXEL_CONTEXT_RUNNABLE, + PIXEL_CONTEXT_INACTIVE +}; + +/** + * struct pixel_context_metadata - metadata for context information. + * + * @magic: always "c@tx" + * @version: version marker. + * @platform: unique id for platform reporting context. + * @_reserved: reserved. + */ +struct pixel_context_metadata { + char magic[4]; + u8 version; + u32 platform; + char _reserved[27]; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_context_metadata) == 36, + "Incorrect pixel_context_metadata size"); + +/** + * struct pixel_context_snapshot_entry - platform independent context record for + * crash reports. + * @id: The context id. + * @pid: The PID that owns this context. + * @tgid: The TGID that owns this context. + * @context_state: The coarse state for a context. + * @priority: The priority of this context. + * @gpu_slot: The handle that the context may have representing the + * resource granted to run on the GPU. + * @platform_state: The platform-dependendant state, if any. + * @time_in_state: The amount of time in ms that this context has been + * in @platform_state. + */ +struct pixel_context_snapshot_entry { + u32 id; + u32 pid; + u32 tgid; + u8 context_state; + u32 priority; + u32 gpu_slot; + u32 platform_state; + u64 time_in_state; +} __attribute__((packed)); +_Static_assert(sizeof(struct pixel_context_snapshot_entry) == 33, + "Incorrect pixel_context_metadata size"); + +/** + * struct pixel_context_snapshot - list of platform independent context info. 
+ * + * List of contexts of interest during SSCD generation time. + * + * @meta: The metadata for the segment. + * @num_contexts: The number of contexts in the list. + * @contexts: The context information. + */ +struct pixel_context_snapshot { + struct pixel_context_metadata meta; + u32 num_contexts; + struct pixel_context_snapshot_entry contexts[]; +} __attribute__((packed)); + +static int pixel_context_snapshot_init(struct kbase_device *kbdev, + struct sscd_segment* segment, + size_t num_entries) { + segment->size = sizeof(struct pixel_context_snapshot) + + num_entries * sizeof(struct pixel_context_snapshot_entry); + segment->addr = kzalloc(segment->size, GFP_KERNEL); + if (segment->addr == NULL) { + segment->size = 0; + dev_err(kbdev->dev, + "pixel: failed to allocate context snapshot buffer"); + return -ENOMEM; + } + return 0; +} + +static void pixel_context_snapshot_term(struct sscd_segment* segment) { + if (segment && segment->addr) { + kfree(segment->addr); + segment->size = 0; + segment->addr = NULL; + } +} + +/* get_and_init_contexts - fill the CONTEXT segment + * + * If this function returns 0, the caller is responsible for freeing segment->addr. + * + * @kbdev: kbase_device + * @segment: the CONTEXT segment for report + * + * Return: 0 on success. + */ +static int get_and_init_contexts(struct kbase_device *kbdev, + struct sscd_segment *segment) +{ + u32 csg_nr; + u32 num_csg = kbdev->csf.global_iface.group_num; + struct pixel_context_snapshot *context_snapshot; + struct kbase_csf_scheduler *const scheduler = &kbdev->csf.scheduler; + size_t num_entries; + size_t entry_idx; + int rc; + + if (!rt_mutex_trylock(&kbdev->csf.scheduler.lock)) { + dev_warn(kbdev->dev, "could not lock scheduler during dump."); + return -EBUSY; + } + + num_entries = bitmap_weight(scheduler->csg_inuse_bitmap, num_csg); + rc = pixel_context_snapshot_init(kbdev, segment, num_entries); + if (rc) { + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + return rc; + } + context_snapshot = segment->addr; + context_snapshot->num_contexts = num_entries; + + context_snapshot->meta = (struct pixel_context_metadata) { + .magic = "c@tx", + .platform = kbdev->gpu_props.props.raw_props.gpu_id, + .version = 1, + }; + + entry_idx = 0; + for_each_set_bit(csg_nr, scheduler->csg_inuse_bitmap, num_csg) { + struct kbase_csf_csg_slot *slot = + &kbdev->csf.scheduler.csg_slots[csg_nr]; + struct pixel_context_snapshot_entry *entry = + &context_snapshot->contexts[entry_idx++]; + entry->context_state = PIXEL_CONTEXT_ACTIVE; + entry->gpu_slot = csg_nr; + entry->platform_state = atomic_read(&slot->state); + entry->priority = slot->priority; + entry->time_in_state = (jiffies - slot->trigger_jiffies) / HZ; + if (slot->resident_group) { + entry->id = slot->resident_group->handle; + entry->pid = slot->resident_group->kctx->pid; + entry->tgid = slot->resident_group->kctx->tgid; + } + } + + rt_mutex_unlock(&kbdev->csf.scheduler.lock); + return 0; +} +#endif + +struct pixel_fw_core_dump { + char magic[4]; + u32 reserved; + char git_sha[BUILD_INFO_GIT_SHA_LEN]; + char core_dump[]; +}; + +static void get_and_init_fw_core_dump(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + const size_t core_dump_size = get_fw_core_dump_size(kbdev); + + int i; + struct pixel_fw_core_dump *fw_core_dump; + struct kbase_csf_firmware_interface *interface; + struct page *page; + u32 *p; + size_t size; + size_t write_size; + + if (core_dump_size == -1) + { + dev_err(kbdev->dev, "pixel: failed to get firmware core dump size"); + } + + seg->size = sizeof(struct 
pixel_fw_core_dump) + core_dump_size; + seg->addr = kzalloc(seg->size, GFP_KERNEL); + + if (seg->addr == NULL) { + seg->size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for firmware core dump buffer"); + return; + } + + fw_core_dump = (struct pixel_fw_core_dump *) seg->addr; + + strncpy(fw_core_dump->magic, "fwcd", 4); + memcpy(fw_core_dump->git_sha, fw_git_sha, BUILD_INFO_GIT_SHA_LEN); + + // Dumping ELF header + { + struct fw_core_dump_data private = {.kbdev = kbdev}; + struct seq_file m = {.private = &private, .buf = fw_core_dump->core_dump, .size = core_dump_size}; + fw_core_dump_write_elf_header(&m); + size = m.count; + if (unlikely(m.count >= m.size)) + dev_warn(kbdev->dev, "firmware core dump header may be larger than buffer size"); + } + + // Dumping pages + list_for_each_entry(interface, &kbdev->csf.firmware_interfaces, node) { + /* Skip memory sections that cannot be read or are protected. */ + if ((interface->flags & CSF_FIRMWARE_ENTRY_PROTECTED) || + (interface->flags & CSF_FIRMWARE_ENTRY_READ) == 0) + continue; + + for(i = 0; i < interface->num_pages; i++) + { + page = as_page(interface->phys[i]); + write_size = size < core_dump_size ? min(core_dump_size - size, (size_t) FW_PAGE_SIZE) : 0; + if (write_size) + { + p = kmap_atomic(page); + memcpy(fw_core_dump->core_dump + size, p, write_size); + kunmap_atomic(p); + } + size += FW_PAGE_SIZE; + + if (size < FW_PAGE_SIZE) + break; + } + } + + if (unlikely(size != core_dump_size)) + { + dev_err(kbdev->dev, "firmware core dump size and buffer size are different"); + kfree(seg->addr); + seg->addr = NULL; + seg->size = 0; + } + + return; +} +/* + * Stub pending FW support + */ +static void get_fw_private_memory(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void get_fw_shared_memory(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void get_fw_registers(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} + +/* + * Stub pending FW support + */ +static void get_gpu_registers(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + (void)kbdev; + (void)seg; +} +/* + * Stub pending FW support + */ +static void flush_caches(struct kbase_device *kbdev) +{ + (void)kbdev; +} +/* + * Stub pending FW support + */ +static void suspend_mcu(struct kbase_device *kbdev) +{ + (void)kbdev; +} + +static void get_rail_state_log(struct kbase_device *kbdev, struct sscd_segment *seg) +{ + lockdep_assert_held(&((struct pixel_context*)kbdev->platform_context)->pm.lock); + + seg->addr = gpu_pm_get_rail_state_log(kbdev); + seg->size = gpu_pm_get_rail_state_log_size(kbdev); +} + +static void get_pdc_state(struct kbase_device *kbdev, struct pixel_gpu_pdc_status *pdc_status, + struct sscd_segment *seg) +{ + lockdep_assert_held(&kbdev->hwaccess_lock); + + if (pdc_status == NULL) { + dev_err(kbdev->dev, "pixel: failed to read PDC status, no storage"); + return; + } + gpu_debug_read_pdc_status(kbdev, pdc_status); + seg->addr = pdc_status; + seg->size = sizeof(*pdc_status); +} + +static int segments_init(struct kbase_device *kbdev, struct sscd_segment* segments) +{ + /* Zero init everything for safety */ + memset(segments, 0, sizeof(struct sscd_segment) * NUM_SEGMENTS); + + segments[PM_EVENT_LOG].size = kbase_pm_max_event_log_size(kbdev); + segments[PM_EVENT_LOG].addr = kzalloc(segments[PM_EVENT_LOG].size, GFP_KERNEL); + + if 
(!segments[PM_EVENT_LOG].addr) { + segments[PM_EVENT_LOG].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for PM event log"); + return -ENOMEM; + } + + segments[FW_TRACE].size = sizeof(struct pixel_fw_trace); + segments[FW_TRACE].addr = kzalloc(sizeof(struct pixel_fw_trace), GFP_KERNEL); + + if (segments[FW_TRACE].addr == NULL) { + segments[FW_TRACE].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for firmware trace description"); + return -ENOMEM; + } + + segments[KTRACE].size = sizeof(struct pixel_ktrace); + segments[KTRACE].addr = kzalloc(sizeof(struct pixel_ktrace), GFP_KERNEL); + if (segments[KTRACE].addr == NULL) { + segments[KTRACE].size = 0; + dev_err(kbdev->dev, "pixel: failed to allocate for ktrace buffer"); + return -ENOMEM; + } + + return 0; +} + +static void segments_term(struct kbase_device *kbdev, struct sscd_segment* segments) +{ + (void)kbdev; + + kfree(segments[FW_TRACE].addr); + kfree(segments[PM_EVENT_LOG].addr); + kfree(segments[KTRACE].addr); +#if MALI_USE_CSF + pixel_context_snapshot_term(segments); +#endif + /* Null out the pointers */ + memset(segments, 0, sizeof(struct sscd_segment) * NUM_SEGMENTS); +} + +#define GPU_HANG_SSCD_TIMEOUT_MS (300000) /* 300s */ + +/** + * gpu_sscd_dump() - Initiates and reports a subsystem core-dump of the GPU. + * + * @kbdev: The &struct kbase_device for the GPU. + * @reason: A null terminated string containing a dump reason + * + * Context: Process context. + */ +void gpu_sscd_dump(struct kbase_device *kbdev, const char* reason) +{ + struct sscd_segment segs[NUM_SEGMENTS]; + struct sscd_platform_data *pdata = dev_get_platdata(&sscd_dev.dev); + struct pixel_context *pc = kbdev->platform_context; + int ec = 0; + unsigned long flags, current_ts = jiffies; + struct pixel_gpu_pdc_status pdc_status; + static unsigned long last_hang_sscd_ts; +#if MALI_USE_CSF + int fwcd_err; +#endif + + if (!strcmp(reason, "GPU hang")) { + /* GPU hang - avoid multiple coredumps for the same hang until + * GPU_HANG_SSCD_TIMEOUT_MS passes and GPU reset shows no failure. 
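+ * For example (illustrative reading of the check below): the first hang always produces a dump; a later hang is dumped again only once GPU_HANG_SSCD_TIMEOUT_MS (300 s) has elapsed since the previous dump and kbase_reset_gpu_failed() reports no failed reset, otherwise it is skipped.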
+ */ + if (!last_hang_sscd_ts || (time_after(current_ts, + last_hang_sscd_ts + msecs_to_jiffies(GPU_HANG_SSCD_TIMEOUT_MS)) && + !kbase_reset_gpu_failed(kbdev))) { + last_hang_sscd_ts = current_ts; + } else { + dev_info(kbdev->dev, "pixel: skipping mali subsystem core dump"); + return; + } + } + + dev_info(kbdev->dev, "pixel: mali subsystem core dump in progress"); + /* No point in proceeding if we can't report the dumped data */ + if (!pdata->sscd_report) { + dev_warn(kbdev->dev, "pixel: failed to report core dump, sscd_report was NULL"); + return; + } + +#if MALI_USE_CSF + fwcd_err = fw_core_dump_create(kbdev); + if (fwcd_err) + dev_err(kbdev->dev, "pixel: failed to create firmware core dump (%d)", fwcd_err); +#endif + + ec = segments_init(kbdev, segs); + if (ec != 0) { + dev_err(kbdev->dev, + "pixel: failed to init core dump segments (%d), partial dump in progress", ec); + } + + /* We don't want anything messing with the HW while we dump */ + spin_lock_irqsave(&kbdev->hwaccess_lock, flags); + + /* Read the FW view of GPU PDC state, we get this early */ + get_pdc_state(kbdev, &pdc_status, &segs[PDC_STATUS]); + + /* Suspend the MCU to prevent it from overwriting the data we want to dump */ + suspend_mcu(kbdev); + + /* Flush the cache so our memory page reads contain up to date values */ + flush_caches(kbdev); + + /* Read out the updated FW private memory pages */ + get_fw_private_memory(kbdev, &segs[PRIVATE_MEM]); + + /* Read out the updated memory shared between host and firmware */ + get_fw_shared_memory(kbdev, &segs[SHARED_MEM]); + + get_fw_registers(kbdev, &segs[MCU_REGISTERS]); + get_gpu_registers(kbdev, &segs[GPU_REGISTERS]); + + get_fw_trace(kbdev, &segs[FW_TRACE]); + + get_pm_event_log(kbdev, &segs[PM_EVENT_LOG]); + + get_ktrace(kbdev, &segs[KTRACE]); + +#if MALI_USE_CSF + ec = get_and_init_contexts(kbdev, &segs[CONTEXTS]); + if (ec) { + dev_err(kbdev->dev, + "could not collect active contexts: rc: %i", ec); + } + + if (!fwcd_err) + get_and_init_fw_core_dump(kbdev, &segs[FW_CORE_DUMP]); +#endif + + spin_unlock_irqrestore(&kbdev->hwaccess_lock, flags); + + /* Acquire the pm lock to prevent modifications to the rail state log */ + mutex_lock(&pc->pm.lock); + + get_rail_state_log(kbdev, &segs[POWER_RAIL_LOG]); + + /* Report the core dump and generate an ELF header for it */ + pdata->sscd_report(&sscd_dev, segs, NUM_SEGMENTS, SSCD_FLAGS_ELFARM64HDR, reason); + + /* Must be held until the dump completes, as the log is referenced rather than copied */ + mutex_unlock(&pc->pm.lock); + + segments_term(kbdev, segs); +} + +/** + * gpu_sscd_fw_log_init() - Sets the FW log verbosity. + * + * @kbdev: The &struct kbase_device for the GPU. + * @level: The log verbosity. + * + * Context: Process context. + * + * Return: On success returns 0, otherwise returns an error code. + */ +int gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level) +{ + u32 addr; + int ec = kbase_csf_firmware_cfg_find_config_address(kbdev, "Log verbosity", &addr); + + if (!ec) { + /* Update the FW log verbosity in FW memory */ + kbase_csf_update_firmware_memory(kbdev, addr, level); + } + + return ec; +} + +/** + * gpu_sscd_init() - Registers the SSCD platform device. + * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. + * + * Return: On success returns 0, otherwise returns an error code. + */ +int gpu_sscd_init(struct kbase_device *kbdev) +{ + sscd_dev = sscd_dev_init; + return platform_device_register(&sscd_dev); +} + +/** + * gpu_sscd_term() - Unregisters the SSCD platform device.
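+ * + * Counterpart to gpu_sscd_init(); releases the SSCD platform device that gpu_sscd_init() registered.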
+ * + * @kbdev: The &struct kbase_device for the GPU. + * + * Context: Process context. + */ +void gpu_sscd_term(struct kbase_device *kbdev) +{ + platform_device_unregister(&sscd_dev); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_sscd.h b/mali_kbase/platform/pixel/pixel_gpu_sscd.h new file mode 100644 index 0000000..68f7a0b --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_sscd.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2021 Google LLC. + * + * Author: Jack Diver <diverj@google.com> + */ + +#ifndef _PIXEL_GPU_SSCD_H_ +#define _PIXEL_GPU_SSCD_H_ + +#include <mali_kbase.h> + +#ifdef CONFIG_MALI_PIXEL_GPU_SSCD +int gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level); + +int gpu_sscd_init(struct kbase_device *kbdev); + +void gpu_sscd_term(struct kbase_device *kbdev); + +void gpu_sscd_dump(struct kbase_device *kbdev, const char* reason); +#else +static int __maybe_unused gpu_sscd_fw_log_init(struct kbase_device *kbdev, u32 level) +{ + return (void)kbdev, (void)level, 0; +} + +static int __maybe_unused gpu_sscd_init(struct kbase_device *kbdev) { return (void)kbdev, 0; } + +static void __maybe_unused gpu_sscd_term(struct kbase_device *kbdev) { (void)kbdev; } + +static void __maybe_unused gpu_sscd_dump(struct kbase_device *kbdev, const char* reason) +{ + (void)kbdev, (void)reason; +} +#endif /* CONFIG_MALI_PIXEL_GPU_SSCD */ + +#endif /* _PIXEL_GPU_SSCD_H_ */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_sysfs.c b/mali_kbase/platform/pixel/pixel_gpu_sysfs.c index e856039..f6164f9 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_sysfs.c +++ b/mali_kbase/platform/pixel/pixel_gpu_sysfs.c @@ -7,11 +7,13 @@ /* Mali core includes */ #include <mali_kbase.h> +#include <trace/events/power.h> /* Pixel integration includes */ #include "mali_kbase_config_platform.h" #include "pixel_gpu_control.h" #include "pixel_gpu_dvfs.h" +#include "pixel_gpu_sscd.h" static const char *gpu_dvfs_level_lock_names[GPU_DVFS_LEVEL_LOCK_COUNT] = { "devicetree", @@ -315,12 +317,25 @@ static ssize_t uid_time_in_state_h_show(struct device *dev, struct device_attrib return ret; } +static ssize_t trigger_core_dump_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct kbase_device *kbdev = dev->driver_data; + + (void)attr, (void)buf; + + gpu_sscd_dump(kbdev, "Manual core dump"); + + return count; +} + DEVICE_ATTR_RO(utilization); DEVICE_ATTR_RO(clock_info); DEVICE_ATTR_RO(dvfs_table); DEVICE_ATTR_RO(power_stats); DEVICE_ATTR_RO(uid_time_in_state); DEVICE_ATTR_RO(uid_time_in_state_h); +DEVICE_ATTR_WO(trigger_core_dump); /* devfreq-like attributes */ @@ -431,6 +446,8 @@ static ssize_t hint_max_freq_store(struct device *dev, struct device_attribute * if (level < 0) return -EINVAL; + trace_clock_set_rate("gpu_hint_max", clock, raw_smp_processor_id()); + mutex_lock(&pc->dvfs.lock); gpu_dvfs_update_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_HINT, -1, level); gpu_dvfs_select_level(kbdev); @@ -475,6 +492,8 @@ static ssize_t hint_min_freq_store(struct device *dev, struct device_attribute * if (level < 0) return -EINVAL; + trace_clock_set_rate("gpu_hint_min", clock, raw_smp_processor_id()); + mutex_lock(&pc->dvfs.lock); gpu_dvfs_update_level_lock(kbdev, GPU_DVFS_LEVEL_LOCK_HINT, level, -1); gpu_dvfs_select_level(kbdev); @@ -676,6 +695,57 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr, return ret; } +static ssize_t ifpo_show(struct device *dev, struct device_attribute *attr, char *buf) +{ +#ifdef 
CONFIG_MALI_HOST_CONTROLS_SC_RAILS + struct kbase_device *kbdev = dev->driver_data; + struct pixel_context *pc = kbdev->platform_context; + ssize_t ret = 0; + + if (!pc) + return -ENODEV; + + mutex_lock(&pc->pm.lock); + ret = scnprintf(buf, PAGE_SIZE, "%d\n", pc->pm.ifpo_enabled); + mutex_unlock(&pc->pm.lock); + return ret; +#else + return -ENOTSUPP; +#endif +} + +static ssize_t ifpo_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ +#ifdef CONFIG_MALI_HOST_CONTROLS_SC_RAILS + int ret; + bool enabled; + struct kbase_device *kbdev = dev->driver_data; + struct pixel_context *pc = kbdev->platform_context; + if (!pc) + return -ENODEV; + + ret = strtobool(buf, &enabled); + if (ret) + return -EINVAL; + + mutex_lock(&kbdev->csf.scheduler.lock); + + if (!enabled) { + turn_on_sc_power_rails(kbdev); + } + + mutex_lock(&pc->pm.lock); + pc->pm.ifpo_enabled = enabled; + mutex_unlock(&pc->pm.lock); + mutex_unlock(&kbdev->csf.scheduler.lock); + + return count; +#else + return -ENOTSUPP; +#endif +} + /* Define devfreq-like attributes */ DEVICE_ATTR_RO(available_frequencies); @@ -691,6 +761,7 @@ DEVICE_ATTR_RO(time_in_state); DEVICE_ATTR_RO(trans_stat); DEVICE_ATTR_RO(available_governors); DEVICE_ATTR_RW(governor); +DEVICE_ATTR_RW(ifpo); /* Initialization code */ @@ -722,7 +793,9 @@ static struct { { "time_in_state", &dev_attr_time_in_state }, { "trans_stat", &dev_attr_trans_stat }, { "available_governors", &dev_attr_available_governors }, - { "governor", &dev_attr_governor } + { "governor", &dev_attr_governor }, + { "trigger_core_dump", &dev_attr_trigger_core_dump }, + { "ifpo", &dev_attr_ifpo } }; /** diff --git a/mali_kbase/platform/pixel/pixel_gpu_tmu.c b/mali_kbase/platform/pixel/pixel_gpu_tmu.c index a7b064b..dd49236 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_tmu.c +++ b/mali_kbase/platform/pixel/pixel_gpu_tmu.c @@ -207,7 +207,7 @@ static int gpu_tmu_notifier(struct notifier_block *notifier, unsigned long event return NOTIFY_BAD; } dev_info(kbdev->dev, - "%s: GPU_THROTTLING event received limiting GPU clock to %d kHz\n", + "%s: Adjusting GPU clock to %d kHz for thermal constraints (this is normal)\n", __func__, pc->dvfs.table[level].clk[GPU_DVFS_CLK_SHADERS]); break; default: diff --git a/mali_kbase/platform/pixel/pixel_gpu_trace.h b/mali_kbase/platform/pixel/pixel_gpu_trace.h index 775adde..6c30f1b 100644 --- a/mali_kbase/platform/pixel/pixel_gpu_trace.h +++ b/mali_kbase/platform/pixel/pixel_gpu_trace.h @@ -22,7 +22,6 @@ #define GPU_POWER_STATE_SYMBOLIC_STRINGS \ {GPU_POWER_LEVEL_STACKS, "STACKS"}, \ - {GPU_POWER_LEVEL_COREGROUP, "COREGROUP"}, \ {GPU_POWER_LEVEL_GLOBAL, "GLOBAL"}, \ {GPU_POWER_LEVEL_OFF, "OFF"} @@ -46,6 +45,30 @@ TRACE_EVENT(gpu_power_state, ) ); +TRACE_EVENT(gpu_gov_rec_violate, + TP_PROTO(unsigned int recfreq, unsigned int retfreq, + unsigned int minlvfreq, unsigned int maxlvfreq), + TP_ARGS(recfreq, retfreq, minlvfreq, maxlvfreq), + TP_STRUCT__entry( + __field(unsigned int, recfreq) + __field(unsigned int, retfreq) + __field(unsigned int, minlvfreq) + __field(unsigned int, maxlvfreq) + ), + TP_fast_assign( + __entry->recfreq = recfreq; + __entry->retfreq = retfreq; + __entry->minlvfreq = minlvfreq; + __entry->maxlvfreq = maxlvfreq; + ), + TP_printk("rec=%u ret=%u min=%u max=%u", + __entry->recfreq, + __entry->retfreq, + __entry->minlvfreq, + __entry->maxlvfreq + ) +); + #endif /* _TRACE_PIXEL_GPU_H */ /* This part must be outside protection */ diff --git a/mali_kbase/platform/pixel/pixel_gpu_uevent.c 
b/mali_kbase/platform/pixel/pixel_gpu_uevent.c new file mode 100644 index 0000000..a1db47c --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_uevent.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. + * + * Author: Varad Gautam <varadgautam@google.com> + */ + +#include <linux/spinlock.h> +#include "pixel_gpu_uevent.h" + +#define GPU_UEVENT_TIMEOUT_MS (1200000U) /* 20min */ + +static struct gpu_uevent_ctx { + unsigned long last_uevent_ts[GPU_UEVENT_TYPE_MAX]; + spinlock_t lock; +} gpu_uevent_ctx = { + .last_uevent_ts = {0}, + .lock = __SPIN_LOCK_UNLOCKED(gpu_uevent_ctx.lock) +}; + +static bool gpu_uevent_check_valid(const struct gpu_uevent *evt) +{ + switch (evt->type) { + case GPU_UEVENT_TYPE_KMD_ERROR: + switch (evt->info) { + case GPU_UEVENT_INFO_CSG_REQ_STATUS_UPDATE: + case GPU_UEVENT_INFO_CSG_SUSPEND: + case GPU_UEVENT_INFO_CSG_SLOTS_SUSPEND: + case GPU_UEVENT_INFO_CSG_GROUP_SUSPEND: + case GPU_UEVENT_INFO_CSG_EP_CFG: + case GPU_UEVENT_INFO_CSG_SLOTS_START: + case GPU_UEVENT_INFO_GROUP_TERM: + case GPU_UEVENT_INFO_QUEUE_START: + case GPU_UEVENT_INFO_QUEUE_STOP: + case GPU_UEVENT_INFO_QUEUE_STOP_ACK: + case GPU_UEVENT_INFO_CSG_SLOT_READY: + case GPU_UEVENT_INFO_L2_PM_TIMEOUT: + case GPU_UEVENT_INFO_PM_TIMEOUT: + return true; + default: + break; + } + break; + case GPU_UEVENT_TYPE_GPU_RESET: + switch (evt->info) { + case GPU_UEVENT_INFO_CSF_RESET_OK: + case GPU_UEVENT_INFO_CSF_RESET_FAILED: + return true; + default: + break; + } + break; + default: + break; + } + + return false; +} + +void pixel_gpu_uevent_send(struct kbase_device *kbdev, const struct gpu_uevent *evt) +{ + enum uevent_env_idx { + ENV_IDX_TYPE, + ENV_IDX_INFO, + ENV_IDX_NULL, + ENV_IDX_MAX + }; + char *env[ENV_IDX_MAX] = {0}; + unsigned long flags, current_ts = jiffies; + bool suppress_uevent = false; + + if (!gpu_uevent_check_valid(evt)) { + dev_err(kbdev->dev, "unrecognized uevent type=%u info=%u", evt->type, evt->info); + return; + } + + env[ENV_IDX_TYPE] = (char *) gpu_uevent_type_str(evt->type); + env[ENV_IDX_INFO] = (char *) gpu_uevent_info_str(evt->info); + env[ENV_IDX_NULL] = NULL; + + spin_lock_irqsave(&gpu_uevent_ctx.lock, flags); + + if (time_after(current_ts, gpu_uevent_ctx.last_uevent_ts[evt->type] + + msecs_to_jiffies(GPU_UEVENT_TIMEOUT_MS))) { + gpu_uevent_ctx.last_uevent_ts[evt->type] = current_ts; + } else { + suppress_uevent = true; + } + + spin_unlock_irqrestore(&gpu_uevent_ctx.lock, flags); + + if (!suppress_uevent) + kobject_uevent_env(&kbdev->dev->kobj, KOBJ_CHANGE, env); +} diff --git a/mali_kbase/platform/pixel/pixel_gpu_uevent.h b/mali_kbase/platform/pixel/pixel_gpu_uevent.h new file mode 100644 index 0000000..1fe3c50 --- /dev/null +++ b/mali_kbase/platform/pixel/pixel_gpu_uevent.h @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2023 Google LLC. 
+ * + * Author: Varad Gautam <varadgautam@google.com> + */ + +#ifndef _PIXEL_GPU_UEVENT_H_ +#define _PIXEL_GPU_UEVENT_H_ + +#include <mali_kbase.h> + +#define GPU_UEVENT_TYPE_LIST \ + GPU_UEVENT_TYPE(NONE) \ + GPU_UEVENT_TYPE(KMD_ERROR) \ + GPU_UEVENT_TYPE(GPU_RESET) \ + GPU_UEVENT_TYPE(MAX) + +#define GPU_UEVENT_TYPE(type) GPU_UEVENT_TYPE_##type, +enum gpu_uevent_type { + GPU_UEVENT_TYPE_LIST +}; + +#undef GPU_UEVENT_TYPE +#define GPU_UEVENT_TYPE(type) "GPU_UEVENT_TYPE="#type, +static inline const char *gpu_uevent_type_str(enum gpu_uevent_type type) { + static const char * const gpu_uevent_types[] = { + GPU_UEVENT_TYPE_LIST + }; + return gpu_uevent_types[type]; +} +#undef GPU_UEVENT_TYPE + +#define GPU_UEVENT_INFO_LIST \ + GPU_UEVENT_INFO(NONE) \ + GPU_UEVENT_INFO(CSG_REQ_STATUS_UPDATE) \ + GPU_UEVENT_INFO(CSG_SUSPEND) \ + GPU_UEVENT_INFO(CSG_SLOTS_SUSPEND) \ + GPU_UEVENT_INFO(CSG_GROUP_SUSPEND) \ + GPU_UEVENT_INFO(CSG_EP_CFG) \ + GPU_UEVENT_INFO(CSG_SLOTS_START) \ + GPU_UEVENT_INFO(GROUP_TERM) \ + GPU_UEVENT_INFO(QUEUE_START) \ + GPU_UEVENT_INFO(QUEUE_STOP) \ + GPU_UEVENT_INFO(QUEUE_STOP_ACK) \ + GPU_UEVENT_INFO(CSG_SLOT_READY) \ + GPU_UEVENT_INFO(L2_PM_TIMEOUT) \ + GPU_UEVENT_INFO(PM_TIMEOUT) \ + GPU_UEVENT_INFO(CSF_RESET_OK) \ + GPU_UEVENT_INFO(CSF_RESET_FAILED) \ + GPU_UEVENT_INFO(MAX) + +#define GPU_UEVENT_INFO(info) GPU_UEVENT_INFO_##info, +enum gpu_uevent_info { + GPU_UEVENT_INFO_LIST +}; +#undef GPU_UEVENT_INFO +#define GPU_UEVENT_INFO(info) "GPU_UEVENT_INFO="#info, +static inline const char *gpu_uevent_info_str(enum gpu_uevent_info info) { + static const char * const gpu_uevent_infos[] = { + GPU_UEVENT_INFO_LIST + }; + return gpu_uevent_infos[info]; +} +#undef GPU_UEVENT_INFO + +struct gpu_uevent { + enum gpu_uevent_type type; + enum gpu_uevent_info info; +}; + +void pixel_gpu_uevent_send(struct kbase_device *kbdev, const struct gpu_uevent *evt); + +#endif /* _PIXEL_GPU_UEVENT_H_ */ diff --git a/mali_kbase/tests/Kbuild b/mali_kbase/tests/Kbuild index ee3de7b..72ca70a 100644 --- a/mali_kbase/tests/Kbuild +++ b/mali_kbase/tests/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2017-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -17,6 +17,7 @@ # http://www.gnu.org/licenses/gpl-2.0.html. # # +src:=$(if $(patsubst /%,,$(src)),$(srctree)/$(src),$(src)) ccflags-y += -I$(src)/include \ -I$(src) @@ -27,4 +28,6 @@ subdir-ccflags-y += -I$(src)/include \ obj-$(CONFIG_MALI_KUTF) += kutf/ obj-$(CONFIG_MALI_KUTF_IRQ_TEST) += mali_kutf_irq_test/ obj-$(CONFIG_MALI_KUTF_CLK_RATE_TRACE) += mali_kutf_clk_rate_trace/kernel/ +obj-$(CONFIG_MALI_KUTF_MGM_INTEGRATION) += mali_kutf_mgm_integration_test/ + diff --git a/mali_kbase/tests/Kconfig b/mali_kbase/tests/Kconfig index a86e1ce..f100901 100644 --- a/mali_kbase/tests/Kconfig +++ b/mali_kbase/tests/Kconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2017, 2020-2023 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -52,6 +52,19 @@ config MALI_KUTF_CLK_RATE_TRACE Modules: - mali_kutf_clk_rate_trace_test_portal.ko +config MALI_KUTF_MGM_INTEGRATION_TEST + bool "Build Mali KUTF MGM integration test module" + depends on MALI_KUTF + default y + help + This option will build the MGM integration test module. + It can test the implementation of PTE translation for specific + group ids. + + Modules: + - mali_kutf_mgm_integration_test.ko + + comment "Enable MALI_DEBUG for KUTF modules support" depends on MALI_MIDGARD && !MALI_DEBUG && MALI_KUTF diff --git a/mali_kbase/tests/Mconfig b/mali_kbase/tests/Mconfig index 167facd..aa09274 100644 --- a/mali_kbase/tests/Mconfig +++ b/mali_kbase/tests/Mconfig @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2018-2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2018-2023 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -26,8 +26,8 @@ menuconfig MALI_KUTF This option will build the Mali testing framework modules. Modules: - - kutf.ko - - kutf_test.ko + - kutf.ko + - kutf_test.ko config MALI_KUTF_IRQ_TEST bool "Build Mali KUTF IRQ test module" @@ -38,7 +38,7 @@ config MALI_KUTF_IRQ_TEST It can determine the latency of the Mali GPU IRQ on your system. Modules: - - mali_kutf_irq_test.ko + - mali_kutf_irq_test.ko config MALI_KUTF_CLK_RATE_TRACE bool "Build Mali KUTF Clock rate trace test module" @@ -50,12 +50,25 @@ config MALI_KUTF_CLK_RATE_TRACE basic trace test in the system. Modules: - - mali_kutf_clk_rate_trace_test_portal.ko + - mali_kutf_clk_rate_trace_test_portal.ko + +config MALI_KUTF_MGM_INTEGRATION_TEST + bool "Build Mali KUTF MGM integration test module" + depends on MALI_KUTF + default y + help + This option will build the MGM integration test module. + It can test the implementation of PTE translation for specific + group ids. + + Modules: + - mali_kutf_mgm_integration_test.ko + # Enable MALI_DEBUG for KUTF modules support config UNIT_TEST_KERNEL_MODULES - bool - default y if UNIT_TEST_CODE && BACKEND_KERNEL - default n + bool + default y if UNIT_TEST_CODE && BACKEND_KERNEL + default n diff --git a/mali_kbase/tests/build.bp b/mali_kbase/tests/build.bp index 9d6137d..5581ba9 100644 --- a/mali_kbase/tests/build.bp +++ b/mali_kbase/tests/build.bp @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2021-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -25,7 +25,7 @@ bob_defaults { "include", "./../../", "./../", - "./" + "./", ], } @@ -38,3 +38,9 @@ bob_defaults { kbuild_options: ["CONFIG_UNIT_TEST_KERNEL_MODULES=y"], }, } + +bob_defaults { + name: "kernel_unit_tests", + add_to_alias: ["unit_tests"], + srcs: [".*_unit_test/"], +} diff --git a/mali_kbase/tests/include/kutf/kutf_helpers.h b/mali_kbase/tests/include/kutf/kutf_helpers.h index c4c713c..3f68efa 100644 --- a/mali_kbase/tests/include/kutf/kutf_helpers.h +++ b/mali_kbase/tests/include/kutf/kutf_helpers.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,6 +31,7 @@ */ #include <kutf/kutf_suite.h> +#include <linux/device.h> /** * kutf_helper_pending_input() - Check any pending lines sent by user space @@ -81,4 +82,28 @@ int kutf_helper_input_enqueue(struct kutf_context *context, */ void kutf_helper_input_enqueue_end_of_data(struct kutf_context *context); +/** + * kutf_helper_ignore_dmesg() - Write message in dmesg to instruct parser + * to ignore errors, until the counterpart + * is written to dmesg to stop ignoring errors. + * @dev: Device pointer to write to dmesg using. + * + * This function writes "Start ignoring dmesg warnings" to dmesg, which + * the parser will read and not log any errors. Only to be used in cases where + * we expect an error to be produced in dmesg but that we do not want to be + * flagged as an error. + */ +void kutf_helper_ignore_dmesg(struct device *dev); + +/** + * kutf_helper_stop_ignoring_dmesg() - Write message in dmesg to instruct parser + * to stop ignoring errors. + * @dev: Device pointer to write to dmesg using. + * + * This function writes "Stop ignoring dmesg warnings" to dmesg, which + * the parser will read and continue to log any errors. Counterpart to + * kutf_helper_ignore_dmesg(). + */ +void kutf_helper_stop_ignoring_dmesg(struct device *dev); + #endif /* _KERNEL_UTF_HELPERS_H_ */ diff --git a/mali_kbase/tests/kutf/Kbuild b/mali_kbase/tests/kutf/Kbuild index c4790bc..3b3bc4c 100644 --- a/mali_kbase/tests/kutf/Kbuild +++ b/mali_kbase/tests/kutf/Kbuild @@ -19,9 +19,9 @@ # ifeq ($(CONFIG_MALI_KUTF),y) -obj-m += kutf.o +obj-m += mali_kutf.o -kutf-y := \ +mali_kutf-y := \ kutf_mem.o \ kutf_resultset.o \ kutf_suite.o \ diff --git a/mali_kbase/tests/kutf/kutf_helpers.c b/mali_kbase/tests/kutf/kutf_helpers.c index d207d1c..4273619 100644 --- a/mali_kbase/tests/kutf/kutf_helpers.c +++ b/mali_kbase/tests/kutf/kutf_helpers.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -127,3 +127,15 @@ void kutf_helper_input_enqueue_end_of_data(struct kutf_context *context) { kutf_helper_input_enqueue(context, NULL, 0); } + +void kutf_helper_ignore_dmesg(struct device *dev) +{ + dev_info(dev, "KUTF: Start ignoring dmesg warnings\n"); +} +EXPORT_SYMBOL(kutf_helper_ignore_dmesg); + +void kutf_helper_stop_ignoring_dmesg(struct device *dev) +{ + dev_info(dev, "KUTF: Stop ignoring dmesg warnings\n"); +} +EXPORT_SYMBOL(kutf_helper_stop_ignoring_dmesg); diff --git a/mali_kbase/tests/kutf/kutf_helpers_user.c b/mali_kbase/tests/kutf/kutf_helpers_user.c index f88e138..c4e2943 100644 --- a/mali_kbase/tests/kutf/kutf_helpers_user.c +++ b/mali_kbase/tests/kutf/kutf_helpers_user.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -28,7 +28,7 @@ #include <linux/slab.h> #include <linux/export.h> -const char *valtype_names[] = { +static const char *const valtype_names[] = { "INVALID", "U64", "STR", diff --git a/mali_kbase/tests/kutf/kutf_suite.c b/mali_kbase/tests/kutf/kutf_suite.c index 91065b5..4468066 100644 --- a/mali_kbase/tests/kutf/kutf_suite.c +++ b/mali_kbase/tests/kutf/kutf_suite.c @@ -106,22 +106,16 @@ struct kutf_convert_table { enum kutf_result_status result; }; -struct kutf_convert_table kutf_convert[] = { -#define ADD_UTF_RESULT(_name) \ -{ \ - #_name, \ - _name, \ -}, -ADD_UTF_RESULT(KUTF_RESULT_BENCHMARK) -ADD_UTF_RESULT(KUTF_RESULT_SKIP) -ADD_UTF_RESULT(KUTF_RESULT_UNKNOWN) -ADD_UTF_RESULT(KUTF_RESULT_PASS) -ADD_UTF_RESULT(KUTF_RESULT_DEBUG) -ADD_UTF_RESULT(KUTF_RESULT_INFO) -ADD_UTF_RESULT(KUTF_RESULT_WARN) -ADD_UTF_RESULT(KUTF_RESULT_FAIL) -ADD_UTF_RESULT(KUTF_RESULT_FATAL) -ADD_UTF_RESULT(KUTF_RESULT_ABORT) +static const struct kutf_convert_table kutf_convert[] = { +#define ADD_UTF_RESULT(_name) \ + { \ +#_name, _name, \ + } + ADD_UTF_RESULT(KUTF_RESULT_BENCHMARK), ADD_UTF_RESULT(KUTF_RESULT_SKIP), + ADD_UTF_RESULT(KUTF_RESULT_UNKNOWN), ADD_UTF_RESULT(KUTF_RESULT_PASS), + ADD_UTF_RESULT(KUTF_RESULT_DEBUG), ADD_UTF_RESULT(KUTF_RESULT_INFO), + ADD_UTF_RESULT(KUTF_RESULT_WARN), ADD_UTF_RESULT(KUTF_RESULT_FAIL), + ADD_UTF_RESULT(KUTF_RESULT_FATAL), ADD_UTF_RESULT(KUTF_RESULT_ABORT), }; #define UTF_CONVERT_SIZE (ARRAY_SIZE(kutf_convert)) @@ -191,8 +185,7 @@ static void kutf_set_expected_result(struct kutf_context *context, * * Return: 1 if test result was successfully converted to string, 0 otherwise */ -static int kutf_result_to_string(char **result_str, - enum kutf_result_status result) +static int kutf_result_to_string(const char **result_str, enum kutf_result_status result) { int i; int ret = 0; @@ -382,7 +375,7 @@ static ssize_t kutf_debugfs_run_read(struct file *file, char __user *buf, struct kutf_result *res; unsigned long bytes_not_copied; ssize_t bytes_copied = 0; - char *kutf_str_ptr = NULL; + const char *kutf_str_ptr = NULL; size_t kutf_str_len = 0; size_t message_len = 0; char separator = ':'; @@ -599,11 +592,7 @@ static int create_fixture_variant(struct kutf_test_function *test_func, goto fail_file; } -#if KERNEL_VERSION(4, 7, 0) <= LINUX_VERSION_CODE tmp = debugfs_create_file_unsafe( 
-#else - tmp = debugfs_create_file( -#endif "run", 0600, test_fix->dir, test_fix, &kutf_debugfs_run_ops); diff --git a/mali_kbase/tests/kutf/kutf_utils.c b/mali_kbase/tests/kutf/kutf_utils.c index 2ae1510..21f5fad 100644 --- a/mali_kbase/tests/kutf/kutf_utils.c +++ b/mali_kbase/tests/kutf/kutf_utils.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2014, 2017, 2020-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2014, 2017, 2020-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -31,7 +31,7 @@ static char tmp_buffer[KUTF_MAX_DSPRINTF_LEN]; -DEFINE_MUTEX(buffer_lock); +static DEFINE_MUTEX(buffer_lock); const char *kutf_dsprintf(struct kutf_mempool *pool, const char *fmt, ...) diff --git a/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c b/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c index 935f8ca..8b86fb0 100644 --- a/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c +++ b/mali_kbase/tests/mali_kutf_clk_rate_trace/kernel/mali_kutf_clk_rate_trace_test.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2020-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2020-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -46,7 +46,7 @@ #define MINOR_FOR_FIRST_KBASE_DEV (-1) /* KUTF test application pointer for this test */ -struct kutf_application *kutf_app; +static struct kutf_application *kutf_app; enum portal_server_state { PORTAL_STATE_NO_CLK, @@ -113,7 +113,7 @@ struct kbasep_cmd_name_pair { const char *name; }; -struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { +static const struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { { PORTAL_CMD_GET_PLATFORM, GET_PLATFORM }, { PORTAL_CMD_GET_CLK_RATE_MGR, GET_CLK_RATE_MGR }, { PORTAL_CMD_GET_CLK_RATE_TRACE, GET_CLK_RATE_TRACE }, @@ -128,7 +128,7 @@ struct kbasep_cmd_name_pair kbasep_portal_cmd_name_map[] = { * this pointer is engaged, new requests for create fixture will fail * hence limiting the use of the portal at any time to a singleton. 
*/ -struct kutf_clk_rate_trace_fixture_data *g_ptr_portal_data; +static struct kutf_clk_rate_trace_fixture_data *g_ptr_portal_data; #define PORTAL_MSG_LEN (KUTF_MAX_LINE_LENGTH - MAX_REPLY_NAME_LEN) static char portal_msg_buf[PORTAL_MSG_LEN]; @@ -442,8 +442,9 @@ static const char *kutf_clk_trace_do_get_platform( #if defined(CONFIG_MALI_ARBITER_SUPPORT) && defined(CONFIG_OF) struct kutf_clk_rate_trace_fixture_data *data = context->fixture; - arbiter_if_node = - of_get_property(data->kbdev->dev->of_node, "arbiter_if", NULL); + arbiter_if_node = of_get_property(data->kbdev->dev->of_node, "arbiter-if", NULL); + if (!arbiter_if_node) + arbiter_if_node = of_get_property(data->kbdev->dev->of_node, "arbiter_if", NULL); #endif if (arbiter_if_node) { power_node = of_find_compatible_node(NULL, NULL, @@ -825,14 +826,14 @@ static void *mali_kutf_clk_rate_trace_create_fixture( if (!data) return NULL; - *data = (const struct kutf_clk_rate_trace_fixture_data) { 0 }; + memset(data, 0, sizeof(*data)); pr_debug("Hooking up the test portal to kbdev clk rate trace\n"); spin_lock(&kbdev->pm.clk_rtm.lock); if (g_ptr_portal_data != NULL) { pr_warn("Test portal is already in use, run aborted\n"); - kutf_test_fail(context, "Portal allows single session only"); spin_unlock(&kbdev->pm.clk_rtm.lock); + kutf_test_fail(context, "Portal allows single session only"); return NULL; } @@ -909,7 +910,7 @@ static int __init mali_kutf_clk_rate_trace_test_module_init(void) { struct kutf_suite *suite; unsigned int filters; - union kutf_callback_data suite_data = { 0 }; + union kutf_callback_data suite_data = { NULL }; pr_debug("Creating app\n"); diff --git a/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c b/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c index 5824a4c..f2a014d 100644 --- a/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c +++ b/mali_kbase/tests/mali_kutf_irq_test/mali_kutf_irq_test_main.c @@ -40,7 +40,7 @@ */ /* KUTF test application pointer for this test */ -struct kutf_application *irq_app; +static struct kutf_application *irq_app; /** * struct kutf_irq_fixture_data - test fixture used by the test functions. @@ -51,8 +51,6 @@ struct kutf_irq_fixture_data { struct kbase_device *kbdev; }; -#define SEC_TO_NANO(s) ((s)*1000000000LL) - /* ID for the GPU IRQ */ #define GPU_IRQ_HANDLER 2 @@ -212,6 +210,11 @@ static void mali_kutf_irq_latency(struct kutf_context *context) average_time += irq_time - start_time; udelay(10); + /* Sleep for a ms, every 10000 iterations, to avoid misleading warning + * of CPU softlockup when all GPU IRQs keep going to the same CPU. + */ + if (!(i % 10000)) + msleep(1); } /* Go back to default handler */ diff --git a/mali_kbase/arbitration/ptm/Kconfig b/mali_kbase/tests/mali_kutf_mgm_integration_test/Kbuild index 074ebd5..e9bff98 100644 --- a/mali_kbase/arbitration/ptm/Kconfig +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/Kbuild @@ -1,6 +1,6 @@ -# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note OR MIT +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. 
# # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software @@ -18,11 +18,8 @@ # # -config MALI_PARTITION_MANAGER - tristate "Enable compilation of partition manager modules" - depends on MALI_ARBITRATION - default n - help - This option enables the compilation of the partition manager - modules used to configure the Mali-G78AE GPU. +ifeq ($(CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST),y) +obj-m += mali_kutf_mgm_integration_test.o +mali_kutf_mgm_integration_test-y := mali_kutf_mgm_integration_test_main.o +endif diff --git a/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp b/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp new file mode 100644 index 0000000..8b995f8 --- /dev/null +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/build.bp @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + */ +bob_kernel_module { + name: "mali_kutf_mgm_integration_test", + defaults: [ + "mali_kbase_shared_config_defaults", + "kernel_test_configs", + "kernel_test_includes", + ], + srcs: [ + "Kbuild", + "mali_kutf_mgm_integration_test_main.c", + ], + extra_symbols: [ + "mali_kbase", + "kutf", + ], + enabled: false, + mali_kutf_mgm_integration_test: { + kbuild_options: ["CONFIG_MALI_KUTF_MGM_INTEGRATION_TEST=y"], + enabled: true, + }, +} diff --git a/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c b/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c new file mode 100644 index 0000000..5a42bd6 --- /dev/null +++ b/mali_kbase/tests/mali_kutf_mgm_integration_test/mali_kutf_mgm_integration_test_main.c @@ -0,0 +1,210 @@ +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +/* + * + * (C) COPYRIGHT 2022 ARM Limited. All rights reserved. + * + * This program is free software and is provided to you under the terms of the + * GNU General Public License version 2 as published by the Free Software + * Foundation, and any use by you of this program is subject to the terms + * of such GNU license. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. 
+ * + */ +#include <linux/module.h> +#include "mali_kbase.h" +#include <kutf/kutf_suite.h> +#include <kutf/kutf_utils.h> +#include <kutf/kutf_helpers.h> +#include <kutf/kutf_helpers_user.h> + +#define MINOR_FOR_FIRST_KBASE_DEV (-1) + +#define BASE_MEM_GROUP_COUNT (16) +#define PA_MAX ((1ULL << 48) - 1) +#define PA_START_BIT 12 +#define ENTRY_ACCESS_BIT (1ULL << 10) + +#define ENTRY_IS_ATE_L3 3ULL +#define ENTRY_IS_ATE_L02 1ULL + +#define MGM_INTEGRATION_SUITE_NAME "mgm_integration" +#define MGM_INTEGRATION_PTE_TRANSLATION "pte_translation" + +static char msg_buf[KUTF_MAX_LINE_LENGTH]; + +/* KUTF test application pointer for this test */ +struct kutf_application *mgm_app; + +/** + * struct kutf_mgm_fixture_data - test fixture used by test functions + * @kbdev: kbase device for the GPU. + * @group_id: Memory group ID to test based on fixture index. + */ +struct kutf_mgm_fixture_data { + struct kbase_device *kbdev; + int group_id; +}; + +/** + * mali_kutf_mgm_pte_translation_test() - Tests forward and reverse translation + * of PTE by the MGM module + * @context: KUTF context within which to perform the test. + * + * This test creates PTEs with physical addresses in the range + * 0x0000-0xFFFFFFFFF000 and tests that mgm_update_gpu_pte() returns a different + * PTE and mgm_pte_to_original_pte() returns the original PTE. This is tested + * at MMU level 2 and 3 as mgm_update_gpu_pte() is called for ATEs only. + * + * This test is run for a specific group_id depending on the fixture_id. + */ +static void mali_kutf_mgm_pte_translation_test(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data = context->fixture; + struct kbase_device *kbdev = data->kbdev; + struct memory_group_manager_device *mgm_dev = kbdev->mgm_dev; + u64 addr; + + for (addr = 1 << (PA_START_BIT - 1); addr <= PA_MAX; addr <<= 1) { + /* Mask 1 << 11 by ~0xFFF to get 0x0000 at first iteration */ + phys_addr_t pa = addr; + u8 mmu_level; + + /* Test MMU level 3 and 2 (2MB pages) only */ + for (mmu_level = MIDGARD_MMU_LEVEL(2); mmu_level <= MIDGARD_MMU_LEVEL(3); + mmu_level++) { + u64 translated_pte; + u64 returned_pte; + u64 original_pte; + + if (mmu_level == MIDGARD_MMU_LEVEL(3)) + original_pte = + (pa & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_ATE_L3; + else + original_pte = + (pa & PAGE_MASK) | ENTRY_ACCESS_BIT | ENTRY_IS_ATE_L02; + + dev_dbg(kbdev->dev, "Testing group_id=%u, mmu_level=%u, pte=0x%llx\n", + data->group_id, mmu_level, original_pte); + + translated_pte = mgm_dev->ops.mgm_update_gpu_pte(mgm_dev, data->group_id, + mmu_level, original_pte); + if (translated_pte == original_pte) { + snprintf( + msg_buf, sizeof(msg_buf), + "PTE unchanged. translated_pte (0x%llx) == original_pte (0x%llx) for mmu_level=%u, group_id=%d", + translated_pte, original_pte, mmu_level, data->group_id); + kutf_test_fail(context, msg_buf); + return; + } + + returned_pte = mgm_dev->ops.mgm_pte_to_original_pte( + mgm_dev, data->group_id, mmu_level, translated_pte); + dev_dbg(kbdev->dev, "\treturned_pte=%llx\n", returned_pte); + + if (returned_pte != original_pte) { + snprintf( + msg_buf, sizeof(msg_buf), + "Original PTE not returned. 
returned_pte (0x%llx) != original_pte (0x%llx) for mmu_level=%u, group_id=%d", + returned_pte, original_pte, mmu_level, data->group_id); + kutf_test_fail(context, msg_buf); + return; + } + } + } + snprintf(msg_buf, sizeof(msg_buf), "Translation passed for group_id=%d", data->group_id); + kutf_test_pass(context, msg_buf); +} + +/** + * mali_kutf_mgm_integration_create_fixture() - Creates the fixture data + * required for all tests in the mgm integration suite. + * @context: KUTF context. + * + * Return: Fixture data created on success or NULL on failure + */ +static void *mali_kutf_mgm_integration_create_fixture(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data; + struct kbase_device *kbdev; + + pr_debug("Finding kbase device\n"); + kbdev = kbase_find_device(MINOR_FOR_FIRST_KBASE_DEV); + if (kbdev == NULL) { + kutf_test_fail(context, "Failed to find kbase device"); + return NULL; + } + pr_debug("Creating fixture\n"); + + data = kutf_mempool_alloc(&context->fixture_pool, sizeof(struct kutf_mgm_fixture_data)); + if (!data) + return NULL; + data->kbdev = kbdev; + data->group_id = context->fixture_index; + + pr_debug("Fixture created\n"); + return data; +} + +/** + * mali_kutf_mgm_integration_remove_fixture() - Destroy fixture data previously + * created by mali_kutf_mgm_integration_create_fixture. + * @context: KUTF context. + */ +static void mali_kutf_mgm_integration_remove_fixture(struct kutf_context *context) +{ + struct kutf_mgm_fixture_data *data = context->fixture; + struct kbase_device *kbdev = data->kbdev; + + kbase_release_device(kbdev); +} + +/** + * mali_kutf_mgm_integration_test_main_init() - Module entry point for this test. + * + * Return: 0 on success, error code on failure. + */ +static int __init mali_kutf_mgm_integration_test_main_init(void) +{ + struct kutf_suite *suite; + + mgm_app = kutf_create_application("mgm"); + + if (mgm_app == NULL) { + pr_warn("Creation of mgm KUTF app failed!\n"); + return -ENOMEM; + } + suite = kutf_create_suite(mgm_app, MGM_INTEGRATION_SUITE_NAME, BASE_MEM_GROUP_COUNT, + mali_kutf_mgm_integration_create_fixture, + mali_kutf_mgm_integration_remove_fixture); + if (suite == NULL) { + pr_warn("Creation of %s suite failed!\n", MGM_INTEGRATION_SUITE_NAME); + kutf_destroy_application(mgm_app); + return -ENOMEM; + } + kutf_add_test(suite, 0x0, MGM_INTEGRATION_PTE_TRANSLATION, + mali_kutf_mgm_pte_translation_test); + return 0; +} + +/** + * mali_kutf_mgm_integration_test_main_exit() - Module exit point for this test. + */ +static void __exit mali_kutf_mgm_integration_test_main_exit(void) +{ + kutf_destroy_application(mgm_app); +} + +module_init(mali_kutf_mgm_integration_test_main_init); +module_exit(mali_kutf_mgm_integration_test_main_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("ARM Ltd."); +MODULE_VERSION("1.0"); diff --git a/mali_kbase/thirdparty/mali_kbase_mmap.c b/mali_kbase/thirdparty/mali_kbase_mmap.c index 1e636b9..20f7496 100644 --- a/mali_kbase/thirdparty/mali_kbase_mmap.c +++ b/mali_kbase/thirdparty/mali_kbase_mmap.c @@ -303,8 +303,7 @@ unsigned long kbase_context_get_unmapped_area(struct kbase_context *const kctx, * is no free region at the address found originally by too large a * same_va_end_addr here, and will fail the allocation gracefully. 
*/ - struct kbase_reg_zone *zone = - kbase_ctx_reg_zone_get_nolock(kctx, KBASE_REG_ZONE_SAME_VA); + struct kbase_reg_zone *zone = kbase_ctx_reg_zone_get_nolock(kctx, SAME_VA_ZONE); u64 same_va_end_addr = kbase_reg_zone_end_pfn(zone) << PAGE_SHIFT; #if (KERNEL_VERSION(6, 1, 0) <= LINUX_VERSION_CODE) const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags); @@ -386,7 +385,7 @@ unsigned long kbase_context_get_unmapped_area(struct kbase_context *const kctx, #ifndef CONFIG_64BIT } else { return current->mm->get_unmapped_area( - kctx->filp, addr, len, pgoff, flags); + kctx->kfile->filp, addr, len, pgoff, flags); #endif } diff --git a/mali_kbase/tl/Kbuild b/mali_kbase/tl/Kbuild index 4344850..1ecf3e4 100644 --- a/mali_kbase/tl/Kbuild +++ b/mali_kbase/tl/Kbuild @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note # -# (C) COPYRIGHT 2021 ARM Limited. All rights reserved. +# (C) COPYRIGHT 2022 ARM Limited. All rights reserved. # # This program is free software and is provided to you under the terms of the # GNU General Public License version 2 as published by the Free Software diff --git a/mali_kbase/tl/backend/mali_kbase_timeline_csf.c b/mali_kbase/tl/backend/mali_kbase_timeline_csf.c index a6062f1..e96e05b 100644 --- a/mali_kbase/tl/backend/mali_kbase_timeline_csf.c +++ b/mali_kbase/tl/backend/mali_kbase_timeline_csf.c @@ -84,7 +84,7 @@ void kbase_create_timeline_objects(struct kbase_device *kbdev) * stream tracepoints are emitted to ensure we don't change the * scheduler until after then */ - mutex_lock(&kbdev->csf.scheduler.lock); + rt_mutex_lock(&kbdev->csf.scheduler.lock); for (slot_i = 0; slot_i < kbdev->csf.global_iface.group_num; slot_i++) { @@ -105,7 +105,7 @@ void kbase_create_timeline_objects(struct kbase_device *kbdev) */ kbase_timeline_streams_body_reset(timeline); - mutex_unlock(&kbdev->csf.scheduler.lock); + rt_mutex_unlock(&kbdev->csf.scheduler.lock); /* For each context in the device... */ list_for_each_entry(kctx, &timeline->tl_kctx_list, tl_kctx_list_node) { diff --git a/mali_kbase/tl/mali_kbase_timeline.c b/mali_kbase/tl/mali_kbase_timeline.c index d656c03..20356d6 100644 --- a/mali_kbase/tl/mali_kbase_timeline.c +++ b/mali_kbase/tl/mali_kbase_timeline.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,9 +24,6 @@ #include "mali_kbase_tracepoints.h" #include <mali_kbase.h> -#include <mali_kbase_jm.h> - -#include <linux/anon_inodes.h> #include <linux/atomic.h> #include <linux/file.h> #include <linux/mutex.h> @@ -35,7 +32,7 @@ #include <linux/stringify.h> #include <linux/timer.h> #include <linux/wait.h> - +#include <linux/delay.h> /* The period of autoflush checker execution in milliseconds. 
*/ #define AUTOFLUSH_INTERVAL 1000 /* ms */ @@ -184,90 +181,109 @@ static void kbase_tlstream_current_devfreq_target(struct kbase_device *kbdev) } #endif /* CONFIG_MALI_DEVFREQ */ -int kbase_timeline_io_acquire(struct kbase_device *kbdev, u32 flags) +int kbase_timeline_acquire(struct kbase_device *kbdev, u32 flags) { - int ret = 0; + int err = 0; u32 timeline_flags = TLSTREAM_ENABLED | flags; - struct kbase_timeline *timeline = kbdev->timeline; + struct kbase_timeline *timeline; + int rcode; + + if (WARN_ON(!kbdev) || WARN_ON(flags & ~BASE_TLSTREAM_FLAGS_MASK)) + return -EINVAL; - if (!atomic_cmpxchg(timeline->timeline_flags, 0, timeline_flags)) { - int rcode; + timeline = kbdev->timeline; + if (WARN_ON(!timeline)) + return -EFAULT; + + if (atomic_cmpxchg(timeline->timeline_flags, 0, timeline_flags)) + return -EBUSY; #if MALI_USE_CSF - if (flags & BASE_TLSTREAM_ENABLE_CSFFW_TRACEPOINTS) { - ret = kbase_csf_tl_reader_start( - &timeline->csf_tl_reader, kbdev); - if (ret) { - atomic_set(timeline->timeline_flags, 0); - return ret; - } - } -#endif - ret = anon_inode_getfd( - "[mali_tlstream]", - &kbasep_tlstream_fops, - timeline, - O_RDONLY | O_CLOEXEC); - if (ret < 0) { + if (flags & BASE_TLSTREAM_ENABLE_CSFFW_TRACEPOINTS) { + err = kbase_csf_tl_reader_start(&timeline->csf_tl_reader, kbdev); + if (err) { atomic_set(timeline->timeline_flags, 0); -#if MALI_USE_CSF - kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); -#endif - return ret; + return err; } + } +#endif - /* Reset and initialize header streams. */ - kbase_tlstream_reset( - &timeline->streams[TL_STREAM_TYPE_OBJ_SUMMARY]); + /* Reset and initialize header streams. */ + kbase_tlstream_reset(&timeline->streams[TL_STREAM_TYPE_OBJ_SUMMARY]); - timeline->obj_header_btc = obj_desc_header_size; - timeline->aux_header_btc = aux_desc_header_size; + timeline->obj_header_btc = obj_desc_header_size; + timeline->aux_header_btc = aux_desc_header_size; #if !MALI_USE_CSF - /* If job dumping is enabled, readjust the software event's - * timeout as the default value of 3 seconds is often - * insufficient. - */ - if (flags & BASE_TLSTREAM_JOB_DUMPING_ENABLED) { - dev_info(kbdev->dev, - "Job dumping is enabled, readjusting the software event's timeout\n"); - atomic_set(&kbdev->js_data.soft_job_timeout_ms, - 1800000); - } + /* If job dumping is enabled, readjust the software event's + * timeout as the default value of 3 seconds is often + * insufficient. + */ + if (flags & BASE_TLSTREAM_JOB_DUMPING_ENABLED) { + dev_info(kbdev->dev, + "Job dumping is enabled, readjusting the software event's timeout\n"); + atomic_set(&kbdev->js_data.soft_job_timeout_ms, 1800000); + } #endif /* !MALI_USE_CSF */ - /* Summary stream was cleared during acquire. - * Create static timeline objects that will be - * read by client. - */ - kbase_create_timeline_objects(kbdev); + /* Summary stream was cleared during acquire. + * Create static timeline objects that will be + * read by client. + */ + kbase_create_timeline_objects(kbdev); #ifdef CONFIG_MALI_DEVFREQ - /* Devfreq target tracepoints are only fired when the target - * changes, so we won't know the current target unless we - * send it now. - */ - kbase_tlstream_current_devfreq_target(kbdev); + /* Devfreq target tracepoints are only fired when the target + * changes, so we won't know the current target unless we + * send it now. + */ + kbase_tlstream_current_devfreq_target(kbdev); #endif /* CONFIG_MALI_DEVFREQ */ - /* Start the autoflush timer. 
- * We must do this after creating timeline objects to ensure we - * don't auto-flush the streams which will be reset during the - * summarization process. - */ - atomic_set(&timeline->autoflush_timer_active, 1); - rcode = mod_timer(&timeline->autoflush_timer, - jiffies + - msecs_to_jiffies(AUTOFLUSH_INTERVAL)); - CSTD_UNUSED(rcode); - } else { - ret = -EBUSY; - } + /* Start the autoflush timer. + * We must do this after creating timeline objects to ensure we + * don't auto-flush the streams which will be reset during the + * summarization process. + */ + atomic_set(&timeline->autoflush_timer_active, 1); + rcode = mod_timer(&timeline->autoflush_timer, + jiffies + msecs_to_jiffies(AUTOFLUSH_INTERVAL)); + CSTD_UNUSED(rcode); + + timeline->last_acquire_time = ktime_get_raw(); + + return err; +} + +void kbase_timeline_release(struct kbase_timeline *timeline) +{ + ktime_t elapsed_time; + s64 elapsed_time_ms, time_to_sleep; + + if (WARN_ON(!timeline) || WARN_ON(!atomic_read(timeline->timeline_flags))) + return; + + /* Get the amount of time passed since the timeline was acquired and ensure + * we sleep for long enough such that it has been at least + * TIMELINE_HYSTERESIS_TIMEOUT_MS amount of time between acquire and release. + * This prevents userspace from spamming acquire and release too quickly. + */ + elapsed_time = ktime_sub(ktime_get_raw(), timeline->last_acquire_time); + elapsed_time_ms = ktime_to_ms(elapsed_time); + time_to_sleep = (elapsed_time_ms < 0 ? TIMELINE_HYSTERESIS_TIMEOUT_MS : + TIMELINE_HYSTERESIS_TIMEOUT_MS - elapsed_time_ms); + if (time_to_sleep > 0) + msleep_interruptible(time_to_sleep); - if (ret >= 0) - timeline->last_acquire_time = ktime_get(); +#if MALI_USE_CSF + kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); +#endif - return ret; + /* Stop autoflush timer before releasing access to streams. */ + atomic_set(&timeline->autoflush_timer_active, 0); + del_timer_sync(&timeline->autoflush_timer); + + atomic_set(timeline->timeline_flags, 0); } int kbase_timeline_streams_flush(struct kbase_timeline *timeline) @@ -275,11 +291,17 @@ int kbase_timeline_streams_flush(struct kbase_timeline *timeline) enum tl_stream_type stype; bool has_bytes = false; size_t nbytes = 0; + + if (WARN_ON(!timeline)) + return -EINVAL; + #if MALI_USE_CSF - int ret = kbase_csf_tl_reader_flush_buffer(&timeline->csf_tl_reader); + { + int ret = kbase_csf_tl_reader_flush_buffer(&timeline->csf_tl_reader); - if (ret > 0) - has_bytes = true; + if (ret > 0) + has_bytes = true; + } #endif for (stype = 0; stype < TL_STREAM_TYPE_COUNT; stype++) { diff --git a/mali_kbase/tl/mali_kbase_timeline.h b/mali_kbase/tl/mali_kbase_timeline.h index 96a4b18..62be6c6 100644 --- a/mali_kbase/tl/mali_kbase_timeline.h +++ b/mali_kbase/tl/mali_kbase_timeline.h @@ -117,4 +117,12 @@ void kbase_timeline_post_kbase_context_destroy(struct kbase_context *kctx); void kbase_timeline_stats(struct kbase_timeline *timeline, u32 *bytes_collected, u32 *bytes_generated); #endif /* MALI_UNIT_TEST */ +/** + * kbase_timeline_io_debugfs_init - Add a debugfs entry for reading timeline stream data + * + * @kbdev: An instance of the GPU platform device, allocated from the probe + * method of the driver. 
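kbase_timeline_release() above enforces a minimum spacing between acquire and release: if less than TIMELINE_HYSTERESIS_TIMEOUT_MS has passed since the acquire timestamp (now taken with ktime_get_raw()), it sleeps for the remainder before tearing the streams down. A small sketch of that computation; the 500 ms constant is an assumed value for illustration, not the driver's actual timeout.

    #include <linux/ktime.h>
    #include <linux/delay.h>

    #define DEMO_HYSTERESIS_MS 500  /* assumed value, purely illustrative */

    static ktime_t demo_last_acquire;

    static void demo_release_with_hysteresis(void)
    {
            s64 elapsed_ms = ktime_to_ms(ktime_sub(ktime_get_raw(), demo_last_acquire));
            s64 to_sleep = (elapsed_ms < 0) ? DEMO_HYSTERESIS_MS :
                                              DEMO_HYSTERESIS_MS - elapsed_ms;

            /* Sleep only if the client released "too quickly" after acquiring. */
            if (to_sleep > 0)
                    msleep_interruptible(to_sleep);

            /* ... actual stream teardown would follow here ... */
    }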
+ */ +void kbase_timeline_io_debugfs_init(struct kbase_device *kbdev); + #endif /* _KBASE_TIMELINE_H */ diff --git a/mali_kbase/tl/mali_kbase_timeline_io.c b/mali_kbase/tl/mali_kbase_timeline_io.c index 3391e75..ae57006 100644 --- a/mali_kbase/tl/mali_kbase_timeline_io.c +++ b/mali_kbase/tl/mali_kbase_timeline_io.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -24,26 +24,74 @@ #include "mali_kbase_tracepoints.h" #include "mali_kbase_timeline.h" -#include <linux/delay.h> +#include <device/mali_kbase_device.h> + #include <linux/poll.h> +#include <linux/version_compat_defs.h> +#include <linux/anon_inodes.h> + +/* Explicitly include epoll header for old kernels. Not required from 4.16. */ +#if KERNEL_VERSION(4, 16, 0) > LINUX_VERSION_CODE +#include <uapi/linux/eventpoll.h> +#endif + +static int kbase_unprivileged_global_profiling; + +/** + * kbase_unprivileged_global_profiling_set - set permissions for unprivileged processes + * + * @val: String containing value to set. Only strings representing positive + * integers are accepted as valid; any non-positive integer (including 0) + * is rejected. + * @kp: Module parameter associated with this method. + * + * This method can only be used to enable permissions for unprivileged processes, + * if they are disabled: for this reason, the only values which are accepted are + * strings representing positive integers. Since it's impossible to disable + * permissions once they're set, any integer which is non-positive is rejected, + * including 0. + * + * Return: 0 if success, otherwise error code. + */ +static int kbase_unprivileged_global_profiling_set(const char *val, const struct kernel_param *kp) +{ + int new_val; + int ret = kstrtoint(val, 0, &new_val); + + if (ret == 0) { + if (new_val < 1) + return -EINVAL; + + kbase_unprivileged_global_profiling = 1; + } + + return ret; +} + +static const struct kernel_param_ops kbase_global_unprivileged_profiling_ops = { + .get = param_get_int, + .set = kbase_unprivileged_global_profiling_set, +}; + +module_param_cb(kbase_unprivileged_global_profiling, &kbase_global_unprivileged_profiling_ops, + &kbase_unprivileged_global_profiling, 0600); /* The timeline stream file operations functions. */ static ssize_t kbasep_timeline_io_read(struct file *filp, char __user *buffer, size_t size, loff_t *f_pos); -static unsigned int kbasep_timeline_io_poll(struct file *filp, - poll_table *wait); +static __poll_t kbasep_timeline_io_poll(struct file *filp, poll_table *wait); static int kbasep_timeline_io_release(struct inode *inode, struct file *filp); static int kbasep_timeline_io_fsync(struct file *filp, loff_t start, loff_t end, int datasync); -/* The timeline stream file operations structure. 
*/ -const struct file_operations kbasep_tlstream_fops = { - .owner = THIS_MODULE, - .release = kbasep_timeline_io_release, - .read = kbasep_timeline_io_read, - .poll = kbasep_timeline_io_poll, - .fsync = kbasep_timeline_io_fsync, -}; +static bool timeline_is_permitted(void) +{ +#if KERNEL_VERSION(5, 8, 0) <= LINUX_VERSION_CODE + return kbase_unprivileged_global_profiling || perfmon_capable(); +#else + return kbase_unprivileged_global_profiling || capable(CAP_SYS_ADMIN); +#endif +} /** * kbasep_timeline_io_packet_pending - check timeline streams for pending @@ -290,9 +338,10 @@ static ssize_t kbasep_timeline_io_read(struct file *filp, char __user *buffer, * @filp: Pointer to file structure * @wait: Pointer to poll table * - * Return: POLLIN if data can be read without blocking, otherwise zero + * Return: EPOLLIN | EPOLLRDNORM if data can be read without blocking, + * otherwise zero, or EPOLLHUP | EPOLLERR on error. */ -static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) +static __poll_t kbasep_timeline_io_poll(struct file *filp, poll_table *wait) { struct kbase_tlstream *stream; unsigned int rb_idx; @@ -302,20 +351,94 @@ static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) KBASE_DEBUG_ASSERT(wait); if (WARN_ON(!filp->private_data)) - return -EFAULT; + return EPOLLHUP | EPOLLERR; timeline = (struct kbase_timeline *)filp->private_data; /* If there are header bytes to copy, read will not block */ if (kbasep_timeline_has_header_data(timeline)) - return POLLIN; + return EPOLLIN | EPOLLRDNORM; poll_wait(filp, &timeline->event_queue, wait); if (kbasep_timeline_io_packet_pending(timeline, &stream, &rb_idx)) - return POLLIN; - return 0; + return EPOLLIN | EPOLLRDNORM; + + return (__poll_t)0; +} + +int kbase_timeline_io_acquire(struct kbase_device *kbdev, u32 flags) +{ + /* The timeline stream file operations structure. 
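The new kbase_unprivileged_global_profiling parameter is effectively write-once: its setter only accepts strings that parse to a positive integer, and once set it cannot be cleared at runtime; timeline_is_permitted() then allows either that parameter or a capability check (perfmon_capable() on >= 5.8 kernels, CAP_SYS_ADMIN otherwise). A hedged sketch of the same module_param_cb() pattern for a generic enable-only flag; the "demo_" names are illustrative.

    #include <linux/module.h>
    #include <linux/moduleparam.h>
    #include <linux/kernel.h>

    static int demo_unprivileged_access;

    /* Accept only positive integers; the flag can be enabled but never cleared. */
    static int demo_access_set(const char *val, const struct kernel_param *kp)
    {
            int new_val;
            int ret = kstrtoint(val, 0, &new_val);

            if (ret)
                    return ret;
            if (new_val < 1)
                    return -EINVAL;

            demo_unprivileged_access = 1;
            return 0;
    }

    static const struct kernel_param_ops demo_access_ops = {
            .get = param_get_int,
            .set = demo_access_set,
    };

    module_param_cb(demo_unprivileged_access, &demo_access_ops,
                    &demo_unprivileged_access, 0600);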
*/ + static const struct file_operations kbasep_tlstream_fops = { + .owner = THIS_MODULE, + .release = kbasep_timeline_io_release, + .read = kbasep_timeline_io_read, + .poll = kbasep_timeline_io_poll, + .fsync = kbasep_timeline_io_fsync, + }; + int err; + + if (!timeline_is_permitted()) + return -EPERM; + + if (WARN_ON(!kbdev) || (flags & ~BASE_TLSTREAM_FLAGS_MASK)) + return -EINVAL; + + err = kbase_timeline_acquire(kbdev, flags); + if (err) + return err; + + err = anon_inode_getfd("[mali_tlstream]", &kbasep_tlstream_fops, kbdev->timeline, + O_RDONLY | O_CLOEXEC); + if (err < 0) + kbase_timeline_release(kbdev->timeline); + + return err; } +#if IS_ENABLED(CONFIG_DEBUG_FS) +static int kbasep_timeline_io_open(struct inode *in, struct file *file) +{ + struct kbase_device *const kbdev = in->i_private; + + if (WARN_ON(!kbdev)) + return -EFAULT; + + file->private_data = kbdev->timeline; + return kbase_timeline_acquire(kbdev, BASE_TLSTREAM_FLAGS_MASK & + ~BASE_TLSTREAM_JOB_DUMPING_ENABLED); +} + +void kbase_timeline_io_debugfs_init(struct kbase_device *const kbdev) +{ + static const struct file_operations kbasep_tlstream_debugfs_fops = { + .owner = THIS_MODULE, + .open = kbasep_timeline_io_open, + .release = kbasep_timeline_io_release, + .read = kbasep_timeline_io_read, + .poll = kbasep_timeline_io_poll, + .fsync = kbasep_timeline_io_fsync, + }; + struct dentry *file; + + if (WARN_ON(!kbdev) || WARN_ON(IS_ERR_OR_NULL(kbdev->mali_debugfs_directory))) + return; + + file = debugfs_create_file("tlstream", 0400, kbdev->mali_debugfs_directory, kbdev, + &kbasep_tlstream_debugfs_fops); + + if (IS_ERR_OR_NULL(file)) + dev_warn(kbdev->dev, "Unable to create timeline debugfs entry"); +} +#else +/* + * Stub function for when debugfs is disabled + */ +void kbase_timeline_io_debugfs_init(struct kbase_device *const kbdev) +{ +} +#endif + /** * kbasep_timeline_io_release - release timeline stream descriptor * @inode: Pointer to inode structure @@ -325,55 +448,18 @@ static unsigned int kbasep_timeline_io_poll(struct file *filp, poll_table *wait) */ static int kbasep_timeline_io_release(struct inode *inode, struct file *filp) { - struct kbase_timeline *timeline; - ktime_t elapsed_time; - s64 elapsed_time_ms, time_to_sleep; - - KBASE_DEBUG_ASSERT(inode); - KBASE_DEBUG_ASSERT(filp); - KBASE_DEBUG_ASSERT(filp->private_data); - CSTD_UNUSED(inode); - timeline = (struct kbase_timeline *)filp->private_data; - - /* Get the amount of time passed since the timeline was acquired and ensure - * we sleep for long enough such that it has been at least - * TIMELINE_HYSTERESIS_TIMEOUT_MS amount of time between acquire and release. - * This prevents userspace from spamming acquire and release too quickly. - */ - elapsed_time = ktime_sub(ktime_get(), timeline->last_acquire_time); - elapsed_time_ms = ktime_to_ms(elapsed_time); - time_to_sleep = MIN(TIMELINE_HYSTERESIS_TIMEOUT_MS, - TIMELINE_HYSTERESIS_TIMEOUT_MS - elapsed_time_ms); - if (time_to_sleep > 0) - msleep(time_to_sleep); - -#if MALI_USE_CSF - kbase_csf_tl_reader_stop(&timeline->csf_tl_reader); -#endif - - /* Stop autoflush timer before releasing access to streams. 
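kbase_timeline_io_acquire() now separates policy from plumbing: it checks permissions, calls kbase_timeline_acquire(), and only then wraps the timeline in an anonymous-inode file descriptor, undoing the acquire if fd creation fails; the debugfs entry reuses the same read/poll/release ops behind an open() hook. A rough sketch of the anon-fd half of that pattern, using placeholder names and assuming the demo helpers exist elsewhere.

    #include <linux/anon_inodes.h>
    #include <linux/fcntl.h>
    #include <linux/fs.h>

    /* demo_acquire()/demo_release() and demo_fops are assumed to be defined elsewhere. */
    extern int demo_acquire(void *res);
    extern void demo_release(void *res);
    extern const struct file_operations demo_fops;

    static int demo_open_fd(void *res)
    {
            int fd;
            int err = demo_acquire(res);

            if (err)
                    return err;

            /* On success this installs and returns an fd; file->private_data = res. */
            fd = anon_inode_getfd("[demo_stream]", &demo_fops, res,
                                  O_RDONLY | O_CLOEXEC);
            if (fd < 0)
                    demo_release(res);  /* undo the acquire if no fd could be handed out */

            return fd;
    }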
*/ - atomic_set(&timeline->autoflush_timer_active, 0); - del_timer_sync(&timeline->autoflush_timer); - - atomic_set(timeline->timeline_flags, 0); + kbase_timeline_release(filp->private_data); return 0; } static int kbasep_timeline_io_fsync(struct file *filp, loff_t start, loff_t end, int datasync) { - struct kbase_timeline *timeline; - CSTD_UNUSED(start); CSTD_UNUSED(end); CSTD_UNUSED(datasync); - if (WARN_ON(!filp->private_data)) - return -EFAULT; - - timeline = (struct kbase_timeline *)filp->private_data; - - return kbase_timeline_streams_flush(timeline); + return kbase_timeline_streams_flush(filp->private_data); } diff --git a/mali_kbase/tl/mali_kbase_timeline_priv.h b/mali_kbase/tl/mali_kbase_timeline_priv.h index bf2c385..de30bcc 100644 --- a/mali_kbase/tl/mali_kbase_timeline_priv.h +++ b/mali_kbase/tl/mali_kbase_timeline_priv.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2019-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2019-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -51,7 +51,7 @@ * @event_queue: Timeline stream event queue * @bytes_collected: Number of bytes read by user * @timeline_flags: Zero, if timeline is disabled. Timeline stream flags - * otherwise. See kbase_timeline_io_acquire(). + * otherwise. See kbase_timeline_acquire(). * @obj_header_btc: Remaining bytes to copy for the object stream header * @aux_header_btc: Remaining bytes to copy for the aux stream header * @last_acquire_time: The time at which timeline was last acquired. @@ -77,8 +77,27 @@ struct kbase_timeline { #endif }; -extern const struct file_operations kbasep_tlstream_fops; - void kbase_create_timeline_objects(struct kbase_device *kbdev); +/** + * kbase_timeline_acquire - acquire timeline for a userspace client. + * @kbdev: An instance of the GPU platform device, allocated from the probe + * method of the driver. + * @flags: Timeline stream flags + * + * Each timeline instance can be acquired by only one userspace client at a time. + * + * Return: Zero on success, error number on failure (e.g. if already acquired). + */ +int kbase_timeline_acquire(struct kbase_device *kbdev, u32 flags); + +/** + * kbase_timeline_release - release timeline for a userspace client. + * @timeline: Timeline instance to be stopped. It must be previously acquired + * with kbase_timeline_acquire(). + * + * Releasing the timeline instance allows it to be acquired by another userspace client. + */ +void kbase_timeline_release(struct kbase_timeline *timeline); + #endif /* _KBASE_TIMELINE_PRIV_H */ diff --git a/mali_kbase/tl/mali_kbase_tlstream.h b/mali_kbase/tl/mali_kbase_tlstream.h index 6660cf5..c142849 100644 --- a/mali_kbase/tl/mali_kbase_tlstream.h +++ b/mali_kbase/tl/mali_kbase_tlstream.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2015-2021 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2015-2022 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -27,17 +27,13 @@ #include <linux/wait.h> /* The maximum size of a single packet used by timeline. */ -#define PACKET_SIZE 4096 /* bytes */ +#define PACKET_SIZE 4096 /* bytes */ /* The number of packets used by one timeline stream. 
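With PACKET_COUNT raised to an unconditional 128 (previously 32, or 64 when job/vector dumping was configured) and PACKET_SIZE still 4096 bytes, each timeline stream's ring buffer grows from 128 KiB (or 256 KiB) to 512 KiB. A compile-time restatement of that arithmetic, under the assumption that these two constants match the header above:

    #include <linux/build_bug.h>

    #define DEMO_PACKET_SIZE  4096u  /* bytes, as in mali_kbase_tlstream.h */
    #define DEMO_PACKET_COUNT 128u

    /* 128 packets * 4096 bytes = 524288 bytes = 512 KiB per stream. */
    static_assert(DEMO_PACKET_SIZE * DEMO_PACKET_COUNT == 512u * 1024u);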
*/ -#if defined(CONFIG_MALI_JOB_DUMP) || defined(CONFIG_MALI_VECTOR_DUMP) - #define PACKET_COUNT 64 -#else - #define PACKET_COUNT 32 -#endif +#define PACKET_COUNT 128 /* The maximum expected length of string in tracepoint descriptor. */ -#define STRLEN_MAX 64 /* bytes */ +#define STRLEN_MAX 64 /* bytes */ /** * struct kbase_tlstream - timeline stream structure diff --git a/mali_kbase/tl/mali_kbase_tracepoints.c b/mali_kbase/tl/mali_kbase_tracepoints.c index 6aae4e0..f62c755 100644 --- a/mali_kbase/tl/mali_kbase_tracepoints.c +++ b/mali_kbase/tl/mali_kbase_tracepoints.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. * * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -84,9 +84,12 @@ enum tl_msg_id_obj { KBASE_TL_ATTRIB_ATOM_PRIORITIZED, KBASE_TL_ATTRIB_ATOM_JIT, KBASE_TL_KBASE_NEW_DEVICE, + KBASE_TL_KBASE_GPUCMDQUEUE_KICK, KBASE_TL_KBASE_DEVICE_PROGRAM_CSG, KBASE_TL_KBASE_DEVICE_DEPROGRAM_CSG, - KBASE_TL_KBASE_DEVICE_HALT_CSG, + KBASE_TL_KBASE_DEVICE_HALTING_CSG, + KBASE_TL_KBASE_DEVICE_SUSPEND_CSG, + KBASE_TL_KBASE_DEVICE_CSG_IDLE, KBASE_TL_KBASE_NEW_CTX, KBASE_TL_KBASE_DEL_CTX, KBASE_TL_KBASE_CTX_ASSIGN_AS, @@ -97,17 +100,19 @@ enum tl_msg_id_obj { KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_FENCE_WAIT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT, KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE, - KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, - KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_ITEM_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_ALLOC, KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_FREE, KBASE_TL_KBASE_ARRAY_ITEM_KCPUQUEUE_ENQUEUE_JIT_FREE, KBASE_TL_KBASE_ARRAY_END_KCPUQUEUE_ENQUEUE_JIT_FREE, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, + KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_WAIT_START, @@ -115,6 +120,9 @@ enum tl_msg_id_obj { KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END, + KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_END, KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_UNMAP_IMPORT_START, @@ -305,11 +313,11 @@ enum tl_msg_id_obj { "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_NO_LOCK_START, \ - "Within function jd_done_nolock", \ + "Within function kbase_jd_done_nolock", \ "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_NO_LOCK_END, \ - "Within function jd_done_nolock - end", \ + "Within function kbase_jd_done_nolock - end", \ "@p", \ "atom") \ TRACEPOINT_DESC(KBASE_TL_JD_DONE_START, \ @@ -352,16 +360,28 @@ enum tl_msg_id_obj { "New KBase Device", \ "@IIIIIII", \ 
"kbase_device_id,kbase_device_gpu_core_count,kbase_device_max_num_csgs,kbase_device_as_count,kbase_device_sb_entry_count,kbase_device_has_cross_stream_sync,kbase_device_supports_gpu_sleep") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_GPUCMDQUEUE_KICK, \ + "Kernel receives a request to process new GPU queue instructions", \ + "@IL", \ + "kernel_ctx_id,buffer_gpu_addr") \ TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_PROGRAM_CSG, \ "CSG is programmed to a slot", \ "@IIIII", \ - "kbase_device_id,kernel_ctx_id,gpu_cmdq_grp_handle,kbase_device_csg_slot_index,kbase_device_csg_slot_resumed") \ + "kbase_device_id,kernel_ctx_id,gpu_cmdq_grp_handle,kbase_device_csg_slot_index,kbase_device_csg_slot_resuming") \ TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_DEPROGRAM_CSG, \ "CSG is deprogrammed from a slot", \ "@II", \ "kbase_device_id,kbase_device_csg_slot_index") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_HALT_CSG, \ - "CSG is halted", \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_HALTING_CSG, \ + "CSG is halting", \ + "@III", \ + "kbase_device_id,kbase_device_csg_slot_index,kbase_device_csg_slot_suspending") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_SUSPEND_CSG, \ + "CSG is suspended", \ + "@II", \ + "kbase_device_id,kbase_device_csg_slot_index") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_DEVICE_CSG_IDLE, \ + "KBase device is notified that CSG is idle.", \ "@II", \ "kbase_device_id,kbase_device_csg_slot_index") \ TRACEPOINT_DESC(KBASE_TL_KBASE_NEW_CTX, \ @@ -399,11 +419,19 @@ enum tl_msg_id_obj { TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT, \ "KCPU Queue enqueues Wait on Cross Queue Sync Object", \ "@pLII", \ - "kcpu_queue,cqs_obj_gpu_addr,cqs_obj_compare_value,cqs_obj_inherit_error") \ + "kcpu_queue,cqs_obj_gpu_addr,compare_value,inherit_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET, \ "KCPU Queue enqueues Set on Cross Queue Sync Object", \ "@pL", \ "kcpu_queue,cqs_obj_gpu_addr") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION, \ + "KCPU Queue enqueues Wait Operation on Cross Queue Sync Object", \ + "@pLLIII", \ + "kcpu_queue,cqs_obj_gpu_addr,compare_value,condition,data_type,inherit_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION, \ + "KCPU Queue enqueues Set Operation on Cross Queue Sync Object", \ + "@pLLII", \ + "kcpu_queue,cqs_obj_gpu_addr,value,operation,data_type") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT, \ "KCPU Queue enqueues Map Import", \ "@pL", \ @@ -416,14 +444,6 @@ enum tl_msg_id_obj { "KCPU Queue enqueues Unmap Import ignoring reference count", \ "@pL", \ "kcpu_queue,map_import_buf_gpu_addr") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, \ - "KCPU Queue enqueues Error Barrier", \ - "@p", \ - "kcpu_queue") \ - TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, \ - "KCPU Queue enqueues Group Suspend", \ - "@ppI", \ - "kcpu_queue,group_suspend_buf,gpu_cmdq_grp_handle") \ TRACEPOINT_DESC(KBASE_TL_KBASE_ARRAY_BEGIN_KCPUQUEUE_ENQUEUE_JIT_ALLOC, \ "Begin array of KCPU Queue enqueues JIT Alloc", \ "@p", \ @@ -448,6 +468,14 @@ enum tl_msg_id_obj { "End array of KCPU Queue enqueues JIT Free", \ "@p", \ "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER, \ + "KCPU Queue enqueues Error Barrier", \ + "@p", \ + "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND, \ + "KCPU Queue enqueues Group Suspend", \ + "@ppI", \ + "kcpu_queue,group_suspend_buf,gpu_cmdq_grp_handle") \ 
TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START, \ "KCPU Queue starts a Signal on Fence", \ "@p", \ @@ -465,15 +493,27 @@ enum tl_msg_id_obj { "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START, \ - "KCPU Queue starts a Wait on an array of Cross Queue Sync Objects", \ + "KCPU Queue starts a Wait on Cross Queue Sync Object", \ "@p", \ "kcpu_queue") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END, \ - "KCPU Queue ends a Wait on an array of Cross Queue Sync Objects", \ + "KCPU Queue ends a Wait on Cross Queue Sync Object", \ "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET, \ - "KCPU Queue executes a Set on an array of Cross Queue Sync Objects", \ + "KCPU Queue executes a Set on Cross Queue Sync Object", \ + "@pI", \ + "kcpu_queue,execute_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START, \ + "KCPU Queue starts a Wait Operation on Cross Queue Sync Object", \ + "@p", \ + "kcpu_queue") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END, \ + "KCPU Queue ends a Wait Operation on Cross Queue Sync Object", \ + "@pI", \ + "kcpu_queue,execute_error") \ + TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION, \ + "KCPU Queue executes a Set Operation on Cross Queue Sync Object", \ "@pI", \ "kcpu_queue,execute_error") \ TRACEPOINT_DESC(KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START, \ @@ -2092,13 +2132,40 @@ void __kbase_tlstream_tl_kbase_new_device( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_gpucmdqueue_kick( + struct kbase_tlstream *stream, + u32 kernel_ctx_id, + u64 buffer_gpu_addr +) +{ + const u32 msg_id = KBASE_TL_KBASE_GPUCMDQUEUE_KICK; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kernel_ctx_id) + + sizeof(buffer_gpu_addr) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kernel_ctx_id, sizeof(kernel_ctx_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &buffer_gpu_addr, sizeof(buffer_gpu_addr)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_device_program_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kernel_ctx_id, u32 gpu_cmdq_grp_handle, u32 kbase_device_csg_slot_index, - u32 kbase_device_csg_slot_resumed + u32 kbase_device_csg_slot_resuming ) { const u32 msg_id = KBASE_TL_KBASE_DEVICE_PROGRAM_CSG; @@ -2107,7 +2174,7 @@ void __kbase_tlstream_tl_kbase_device_program_csg( + sizeof(kernel_ctx_id) + sizeof(gpu_cmdq_grp_handle) + sizeof(kbase_device_csg_slot_index) - + sizeof(kbase_device_csg_slot_resumed) + + sizeof(kbase_device_csg_slot_resuming) ; char *buffer; unsigned long acq_flags; @@ -2126,7 +2193,7 @@ void __kbase_tlstream_tl_kbase_device_program_csg( pos = kbasep_serialize_bytes(buffer, pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); pos = kbasep_serialize_bytes(buffer, - pos, &kbase_device_csg_slot_resumed, sizeof(kbase_device_csg_slot_resumed)); + pos, &kbase_device_csg_slot_resuming, sizeof(kbase_device_csg_slot_resuming)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2158,13 +2225,71 @@ void __kbase_tlstream_tl_kbase_device_deprogram_csg( 
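Each emitter in mali_kbase_tracepoints.c follows the same shape: compute msg_size from sizeof() of the message ID, a 64-bit timestamp and every argument, reserve that many bytes with kbase_tlstream_msgbuf_acquire(), serialize field by field, then release. A condensed sketch of that layout logic using stand-in helpers; demo_put() and demo_emit_queue_kick() are placeholders, not driver functions.

    #include <linux/string.h>
    #include <linux/types.h>

    static size_t demo_put(char *buf, size_t pos, const void *val, size_t len)
    {
            memcpy(buf + pos, val, len);    /* mirrors kbasep_serialize_bytes() */
            return pos + len;
    }

    static size_t demo_emit_queue_kick(char *buf, u32 msg_id, u64 timestamp,
                                       u32 kernel_ctx_id, u64 buffer_gpu_addr)
    {
            /* msg_id + timestamp + each argument, in the same order msg_size adds them. */
            size_t pos = 0;

            pos = demo_put(buf, pos, &msg_id, sizeof(msg_id));
            pos = demo_put(buf, pos, &timestamp, sizeof(timestamp));
            pos = demo_put(buf, pos, &kernel_ctx_id, sizeof(kernel_ctx_id));
            pos = demo_put(buf, pos, &buffer_gpu_addr, sizeof(buffer_gpu_addr));

            /* pos == sizeof(u32) + sizeof(u64) + sizeof(u32) + sizeof(u64) */
            return pos;
    }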
kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_device_halt_csg( +void __kbase_tlstream_tl_kbase_device_halting_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index, + u32 kbase_device_csg_slot_suspending +) +{ + const u32 msg_id = KBASE_TL_KBASE_DEVICE_HALTING_CSG; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kbase_device_id) + + sizeof(kbase_device_csg_slot_index) + + sizeof(kbase_device_csg_slot_suspending) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_id, sizeof(kbase_device_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_suspending, sizeof(kbase_device_csg_slot_suspending)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_device_suspend_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kbase_device_csg_slot_index ) { - const u32 msg_id = KBASE_TL_KBASE_DEVICE_HALT_CSG; + const u32 msg_id = KBASE_TL_KBASE_DEVICE_SUSPEND_CSG; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kbase_device_id) + + sizeof(kbase_device_csg_slot_index) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_id, sizeof(kbase_device_id)); + pos = kbasep_serialize_bytes(buffer, + pos, &kbase_device_csg_slot_index, sizeof(kbase_device_csg_slot_index)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_device_csg_idle( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index +) +{ + const u32 msg_id = KBASE_TL_KBASE_DEVICE_CSG_IDLE; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kbase_device_id) + sizeof(kbase_device_csg_slot_index) @@ -2401,16 +2526,16 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( struct kbase_tlstream *stream, const void *kcpu_queue, u64 cqs_obj_gpu_addr, - u32 cqs_obj_compare_value, - u32 cqs_obj_inherit_error + u32 compare_value, + u32 inherit_error ) { const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + sizeof(cqs_obj_gpu_addr) - + sizeof(cqs_obj_compare_value) - + sizeof(cqs_obj_inherit_error) + + sizeof(compare_value) + + sizeof(inherit_error) ; char *buffer; unsigned long acq_flags; @@ -2425,9 +2550,9 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( pos = kbasep_serialize_bytes(buffer, pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); pos = kbasep_serialize_bytes(buffer, - pos, &cqs_obj_compare_value, sizeof(cqs_obj_compare_value)); + pos, &compare_value, sizeof(compare_value)); pos = kbasep_serialize_bytes(buffer, - pos, &cqs_obj_inherit_error, sizeof(cqs_obj_inherit_error)); + pos, &inherit_error, sizeof(inherit_error)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2459,16 +2584,24 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( 
kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 compare_value, + u32 condition, + u32 data_type, + u32 inherit_error ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(map_import_buf_gpu_addr) + + sizeof(cqs_obj_gpu_addr) + + sizeof(compare_value) + + sizeof(condition) + + sizeof(data_type) + + sizeof(inherit_error) ; char *buffer; unsigned long acq_flags; @@ -2481,21 +2614,35 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); + pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); + pos = kbasep_serialize_bytes(buffer, + pos, &compare_value, sizeof(compare_value)); + pos = kbasep_serialize_bytes(buffer, + pos, &condition, sizeof(condition)); + pos = kbasep_serialize_bytes(buffer, + pos, &data_type, sizeof(data_type)); + pos = kbasep_serialize_bytes(buffer, + pos, &inherit_error, sizeof(inherit_error)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 value, + u32 operation, + u32 data_type ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(map_import_buf_gpu_addr) + + sizeof(cqs_obj_gpu_addr) + + sizeof(value) + + sizeof(operation) + + sizeof(data_type) ; char *buffer; unsigned long acq_flags; @@ -2508,18 +2655,24 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); + pos, &cqs_obj_gpu_addr, sizeof(cqs_obj_gpu_addr)); + pos = kbasep_serialize_bytes(buffer, + pos, &value, sizeof(value)); + pos = kbasep_serialize_bytes(buffer, + pos, &operation, sizeof(operation)); + pos = kbasep_serialize_bytes(buffer, + pos, &data_type, sizeof(data_type)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( struct kbase_tlstream *stream, const void *kcpu_queue, u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + sizeof(map_import_buf_gpu_addr) @@ -2540,14 +2693,16 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( struct kbase_tlstream *stream, - const void 
*kcpu_queue + const void *kcpu_queue, + u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) + + sizeof(map_import_buf_gpu_addr) ; char *buffer; unsigned long acq_flags; @@ -2559,22 +2714,22 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( pos = kbasep_serialize_timestamp(buffer, pos); pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); kbase_tlstream_msgbuf_release(stream, acq_flags); } -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( struct kbase_tlstream *stream, const void *kcpu_queue, - const void *group_suspend_buf, - u32 gpu_cmdq_grp_handle + u64 map_import_buf_gpu_addr ) { - const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND; + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE; const size_t msg_size = sizeof(msg_id) + sizeof(u64) + sizeof(kcpu_queue) - + sizeof(group_suspend_buf) - + sizeof(gpu_cmdq_grp_handle) + + sizeof(map_import_buf_gpu_addr) ; char *buffer; unsigned long acq_flags; @@ -2587,9 +2742,7 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( pos = kbasep_serialize_bytes(buffer, pos, &kcpu_queue, sizeof(kcpu_queue)); pos = kbasep_serialize_bytes(buffer, - pos, &group_suspend_buf, sizeof(group_suspend_buf)); - pos = kbasep_serialize_bytes(buffer, - pos, &gpu_cmdq_grp_handle, sizeof(gpu_cmdq_grp_handle)); + pos, &map_import_buf_gpu_addr, sizeof(map_import_buf_gpu_addr)); kbase_tlstream_msgbuf_release(stream, acq_flags); } @@ -2772,6 +2925,60 @@ void __kbase_tlstream_tl_kbase_array_end_kcpuqueue_enqueue_jit_free( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( + struct kbase_tlstream *stream, + const void *kcpu_queue +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( + struct kbase_tlstream *stream, + const void *kcpu_queue, + const void *group_suspend_buf, + u32 gpu_cmdq_grp_handle +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(group_suspend_buf) + + sizeof(gpu_cmdq_grp_handle) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &group_suspend_buf, sizeof(group_suspend_buf)); + pos = kbasep_serialize_bytes(buffer, + pos, 
&gpu_cmdq_grp_handle, sizeof(gpu_cmdq_grp_handle)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_fence_signal_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -2949,6 +3156,83 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set( kbase_tlstream_msgbuf_release(stream, acq_flags); } +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( + struct kbase_tlstream *stream, + const void *kcpu_queue +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(execute_error) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &execute_error, sizeof(execute_error)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +) +{ + const u32 msg_id = KBASE_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION; + const size_t msg_size = sizeof(msg_id) + sizeof(u64) + + sizeof(kcpu_queue) + + sizeof(execute_error) + ; + char *buffer; + unsigned long acq_flags; + size_t pos = 0; + + buffer = kbase_tlstream_msgbuf_acquire(stream, msg_size, &acq_flags); + + pos = kbasep_serialize_bytes(buffer, pos, &msg_id, sizeof(msg_id)); + pos = kbasep_serialize_timestamp(buffer, pos); + pos = kbasep_serialize_bytes(buffer, + pos, &kcpu_queue, sizeof(kcpu_queue)); + pos = kbasep_serialize_bytes(buffer, + pos, &execute_error, sizeof(execute_error)); + + kbase_tlstream_msgbuf_release(stream, acq_flags); +} + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_map_import_start( struct kbase_tlstream *stream, const void *kcpu_queue diff --git a/mali_kbase/tl/mali_kbase_tracepoints.h b/mali_kbase/tl/mali_kbase_tracepoints.h index b15fe6a..f1f4761 100644 --- a/mali_kbase/tl/mali_kbase_tracepoints.h +++ b/mali_kbase/tl/mali_kbase_tracepoints.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ /* * - * (C) COPYRIGHT 2010-2022 ARM Limited. All rights reserved. + * (C) COPYRIGHT 2010-2023 ARM Limited. All rights reserved. 
* * This program is free software and is provided to you under the terms of the * GNU General Public License version 2 as published by the Free Software @@ -77,7 +77,7 @@ extern const size_t aux_desc_header_size; #define TL_JS_EVENT_STOP GATOR_JOB_SLOT_STOP #define TL_JS_EVENT_SOFT_STOP GATOR_JOB_SLOT_SOFT_STOPPED -#define TLSTREAM_ENABLED (1 << 31) +#define TLSTREAM_ENABLED (1u << 31) void __kbase_tlstream_tl_new_ctx( struct kbase_tlstream *stream, @@ -396,13 +396,19 @@ void __kbase_tlstream_tl_kbase_new_device( u32 kbase_device_supports_gpu_sleep ); +void __kbase_tlstream_tl_kbase_gpucmdqueue_kick( + struct kbase_tlstream *stream, + u32 kernel_ctx_id, + u64 buffer_gpu_addr +); + void __kbase_tlstream_tl_kbase_device_program_csg( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kernel_ctx_id, u32 gpu_cmdq_grp_handle, u32 kbase_device_csg_slot_index, - u32 kbase_device_csg_slot_resumed + u32 kbase_device_csg_slot_resuming ); void __kbase_tlstream_tl_kbase_device_deprogram_csg( @@ -411,7 +417,20 @@ void __kbase_tlstream_tl_kbase_device_deprogram_csg( u32 kbase_device_csg_slot_index ); -void __kbase_tlstream_tl_kbase_device_halt_csg( +void __kbase_tlstream_tl_kbase_device_halting_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index, + u32 kbase_device_csg_slot_suspending +); + +void __kbase_tlstream_tl_kbase_device_suspend_csg( + struct kbase_tlstream *stream, + u32 kbase_device_id, + u32 kbase_device_csg_slot_index +); + +void __kbase_tlstream_tl_kbase_device_csg_idle( struct kbase_tlstream *stream, u32 kbase_device_id, u32 kbase_device_csg_slot_index @@ -468,8 +487,8 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait( struct kbase_tlstream *stream, const void *kcpu_queue, u64 cqs_obj_gpu_addr, - u32 cqs_obj_compare_value, - u32 cqs_obj_inherit_error + u32 compare_value, + u32 inherit_error ); void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( @@ -478,34 +497,41 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set( u64 cqs_obj_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 compare_value, + u32 condition, + u32 data_type, + u32 inherit_error ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( struct kbase_tlstream *stream, const void *kcpu_queue, - u64 map_import_buf_gpu_addr + u64 cqs_obj_gpu_addr, + u64 value, + u32 operation, + u32 data_type ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( struct kbase_tlstream *stream, const void *kcpu_queue, u64 map_import_buf_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( struct kbase_tlstream *stream, - const void *kcpu_queue + const void *kcpu_queue, + u64 map_import_buf_gpu_addr ); -void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( struct kbase_tlstream *stream, const void *kcpu_queue, - const void *group_suspend_buf, - u32 gpu_cmdq_grp_handle + u64 map_import_buf_gpu_addr ); void __kbase_tlstream_tl_kbase_array_begin_kcpuqueue_enqueue_jit_alloc( @@ -548,6 +574,18 @@ void 
__kbase_tlstream_tl_kbase_array_end_kcpuqueue_enqueue_jit_free( const void *kcpu_queue ); +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( + struct kbase_tlstream *stream, + const void *kcpu_queue +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( + struct kbase_tlstream *stream, + const void *kcpu_queue, + const void *group_suspend_buf, + u32 gpu_cmdq_grp_handle +); + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_fence_signal_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -587,6 +625,23 @@ void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set( u32 execute_error ); +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( + struct kbase_tlstream *stream, + const void *kcpu_queue +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +); + +void __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( + struct kbase_tlstream *stream, + const void *kcpu_queue, + u32 execute_error +); + void __kbase_tlstream_tl_kbase_kcpuqueue_execute_map_import_start( struct kbase_tlstream *stream, const void *kcpu_queue @@ -1686,7 +1741,7 @@ struct kbase_tlstream; } while (0) /** - * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START - Within function jd_done_nolock + * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_START - Within function kbase_jd_done_nolock * * @kbdev: Kbase device * @atom: Atom identifier @@ -1705,7 +1760,7 @@ struct kbase_tlstream; } while (0) /** - * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_END - Within function jd_done_nolock - end + * KBASE_TLSTREAM_TL_JD_DONE_NO_LOCK_END - Within function kbase_jd_done_nolock - end * * @kbdev: Kbase device * @atom: Atom identifier @@ -1982,6 +2037,37 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK - Kernel receives a request to process new GPU queue instructions + * + * @kbdev: Kbase device + * @kernel_ctx_id: Unique ID for the KBase Context + * @buffer_gpu_addr: Address of the GPU queue's command buffer + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK( \ + kbdev, \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_gpucmdqueue_kick( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_GPUCMDQUEUE_KICK( \ + kbdev, \ + kernel_ctx_id, \ + buffer_gpu_addr \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_DEVICE_PROGRAM_CSG - CSG is programmed to a slot * * @kbdev: Kbase device @@ -1989,7 +2075,7 @@ struct kbase_tlstream; * @kernel_ctx_id: Unique ID for the KBase Context * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed - * @kbase_device_csg_slot_resumed: Whether the csg is being resumed + * @kbase_device_csg_slot_resuming: Whether the csg is being resumed */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_DEVICE_PROGRAM_CSG( \ @@ -1998,7 +2084,7 @@ struct kbase_tlstream; kernel_ctx_id, \ gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ @@ -2009,7 +2095,7 @@ struct kbase_tlstream; kernel_ctx_id, \ 
gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ); \ } while (0) #else @@ -2019,7 +2105,7 @@ struct kbase_tlstream; kernel_ctx_id, \ gpu_cmdq_grp_handle, \ kbase_device_csg_slot_index, \ - kbase_device_csg_slot_resumed \ + kbase_device_csg_slot_resuming \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2029,7 +2115,7 @@ struct kbase_tlstream; * * @kbdev: Kbase device * @kbase_device_id: The ID of the physical hardware - * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being deprogrammed */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_DEVICE_DEPROGRAM_CSG( \ @@ -2056,14 +2142,80 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG - CSG is halted + * KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG - CSG is halting * * @kbdev: Kbase device * @kbase_device_id: The ID of the physical hardware - * @kbase_device_csg_slot_index: The index of the slot in the scheduler being programmed + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being halted + * @kbase_device_csg_slot_suspending: Whether the csg is being suspended + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_device_halting_csg( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALTING_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index, \ + kbase_device_csg_slot_suspending \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG - CSG is suspended + * + * @kbdev: Kbase device + * @kbase_device_id: The ID of the physical hardware + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG is being suspended + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_device_suspend_csg( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG( \ + kbdev, \ + kbase_device_id, \ + kbase_device_csg_slot_index \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE - KBase device is notified that CSG is idle. 
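Every tracepoint macro in this header carries the same gate: on CSF builds it reads kbdev->timeline_flags and dispatches to the emitter only when the CSF tracepoint bit is set, while on non-CSF builds the whole macro collapses to do { } while (0), so call sites need no #ifdefs. A hypothetical call site, assuming this header is included and the device ID and slot index are passed in by the caller:

    /* Hypothetical call site; compiles to nothing when MALI_USE_CSF is 0. */
    static void demo_on_csg_suspended(struct kbase_device *kbdev, u32 dev_id, u32 slot)
    {
            KBASE_TLSTREAM_TL_KBASE_DEVICE_SUSPEND_CSG(kbdev, dev_id, slot);
    }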
+ * + * @kbdev: Kbase device + * @kbase_device_id: The ID of the physical hardware + * @kbase_device_csg_slot_index: The index of the slot in the scheduler whose CSG for which we are receiving an idle notification */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG( \ +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( \ kbdev, \ kbase_device_id, \ kbase_device_csg_slot_index \ @@ -2071,14 +2223,14 @@ struct kbase_tlstream; do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_device_halt_csg( \ + __kbase_tlstream_tl_kbase_device_csg_idle( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kbase_device_id, \ kbase_device_csg_slot_index \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_DEVICE_HALT_CSG( \ +#define KBASE_TLSTREAM_TL_KBASE_DEVICE_CSG_IDLE( \ kbdev, \ kbase_device_id, \ kbase_device_csg_slot_index \ @@ -2336,16 +2488,16 @@ struct kbase_tlstream; * @kbdev: Kbase device * @kcpu_queue: KCPU queue * @cqs_obj_gpu_addr: CQS Object GPU pointer - * @cqs_obj_compare_value: Semaphore value that should be exceeded for the WAIT to pass - * @cqs_obj_inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue + * @compare_value: Semaphore value that should be exceeded for the WAIT to pass + * @inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue */ #if MALI_USE_CSF #define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT( \ kbdev, \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ @@ -2354,8 +2506,8 @@ struct kbase_tlstream; __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ); \ } while (0) #else @@ -2363,8 +2515,8 @@ struct kbase_tlstream; kbdev, \ kcpu_queue, \ cqs_obj_gpu_addr, \ - cqs_obj_compare_value, \ - cqs_obj_inherit_error \ + compare_value, \ + inherit_error \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2401,76 +2553,104 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT - KCPU Queue enqueues Map Import + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION - KCPU Queue enqueues Wait Operation on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @map_import_buf_gpu_addr: Map import buffer GPU pointer + * @cqs_obj_gpu_addr: CQS Object GPU pointer + * @compare_value: Value that should be compared to semaphore value for the WAIT to pass + * @condition: Condition for unblocking WAITs on Timeline Cross Queue Sync Object (e.g. 
greater than, less or equal) + * @data_type: Data type of a CQS Object's value + * @inherit_error: Flag which indicates if the CQS object error state should be inherited by the queue */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_wait_operation( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_WAIT_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + compare_value, \ + condition, \ + data_type, \ + inherit_error \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT - KCPU Queue enqueues Unmap Import + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION - KCPU Queue enqueues Set Operation on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @map_import_buf_gpu_addr: Map import buffer GPU pointer + * @cqs_obj_gpu_addr: CQS Object GPU pointer + * @value: Value that will be set or added to semaphore + * @operation: Operation type performed on semaphore value (SET or ADD) + * @data_type: Data type of a CQS Object's value */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_cqs_set_operation( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_CQS_SET_OPERATION( \ kbdev, \ kcpu_queue, \ - map_import_buf_gpu_addr \ + cqs_obj_gpu_addr, \ + value, \ + operation, \ + data_type \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE - KCPU Queue enqueues Unmap Import ignoring reference count + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT - KCPU Queue enqueues Map Import * * @kbdev: Kbase device * @kcpu_queue: KCPU queue * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ kbdev, \ kcpu_queue, \ map_import_buf_gpu_addr \ @@ -2478,14 +2658,14 @@ struct kbase_tlstream; do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( \ + 
__kbase_tlstream_tl_kbase_kcpuqueue_enqueue_map_import( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_MAP_IMPORT( \ kbdev, \ kcpu_queue, \ map_import_buf_gpu_addr \ @@ -2494,63 +2674,63 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER - KCPU Queue enqueues Error Barrier + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT - KCPU Queue enqueues Unmap Import * * @kbdev: Kbase device * @kcpu_queue: KCPU queue + * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ kbdev, \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import( \ __TL_DISPATCH_STREAM(kbdev, obj), \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT( \ kbdev, \ - kcpu_queue \ + kcpu_queue, \ + map_import_buf_gpu_addr \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND - KCPU Queue enqueues Group Suspend + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE - KCPU Queue enqueues Unmap Import ignoring reference count * * @kbdev: Kbase device * @kcpu_queue: KCPU queue - * @group_suspend_buf: Pointer to the suspend buffer structure - * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace + * @map_import_buf_gpu_addr: Map import buffer GPU pointer */ #if MALI_USE_CSF -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ kbdev, \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ) \ do { \ int enabled = atomic_read(&kbdev->timeline_flags); \ if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ - __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_unmap_import_force( \ __TL_DISPATCH_STREAM(kbdev, obj), \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ); \ } while (0) #else -#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_UNMAP_IMPORT_FORCE( \ kbdev, \ kcpu_queue, \ - group_suspend_buf, \ - gpu_cmdq_grp_handle \ + map_import_buf_gpu_addr \ ) \ do { } while (0) #endif /* MALI_USE_CSF */ @@ -2758,6 +2938,68 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER - KCPU Queue enqueues Error Barrier + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ + kbdev, \ + kcpu_queue \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_error_barrier( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ 
+ kcpu_queue \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_ERROR_BARRIER( \ + kbdev, \ + kcpu_queue \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND - KCPU Queue enqueues Group Suspend + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @group_suspend_buf: Pointer to the suspend buffer structure + * @gpu_cmdq_grp_handle: GPU Command Queue Group handle which will match userspace + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ + kbdev, \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_enqueue_group_suspend( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_ENQUEUE_GROUP_SUSPEND( \ + kbdev, \ + kcpu_queue, \ + group_suspend_buf, \ + gpu_cmdq_grp_handle \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_FENCE_SIGNAL_START - KCPU Queue starts a Signal on Fence * * @kbdev: Kbase device @@ -2874,7 +3116,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START - KCPU Queue starts a Wait on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_START - KCPU Queue starts a Wait on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2901,7 +3143,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END - KCPU Queue ends a Wait on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_END - KCPU Queue ends a Wait on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2932,7 +3174,7 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** - * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET - KCPU Queue executes a Set on an array of Cross Queue Sync Objects + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET - KCPU Queue executes a Set on Cross Queue Sync Object * * @kbdev: Kbase device * @kcpu_queue: KCPU queue @@ -2963,6 +3205,95 @@ struct kbase_tlstream; #endif /* MALI_USE_CSF */ /** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START - KCPU Queue starts a Wait Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( \ + kbdev, \ + kcpu_queue \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_start( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_START( \ + kbdev, \ + kcpu_queue \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END - KCPU Queue ends a Wait Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @execute_error: Non-zero error code if KCPU Queue item completed with error, else zero + */ +#if MALI_USE_CSF +#define 
KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_wait_operation_end( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + execute_error \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_WAIT_OPERATION_END( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** + * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION - KCPU Queue executes a Set Operation on Cross Queue Sync Object + * + * @kbdev: Kbase device + * @kcpu_queue: KCPU queue + * @execute_error: Non-zero error code if KCPU Queue item completed with error, else zero + */ +#if MALI_USE_CSF +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { \ + int enabled = atomic_read(&kbdev->timeline_flags); \ + if (enabled & BASE_TLSTREAM_ENABLE_CSF_TRACEPOINTS) \ + __kbase_tlstream_tl_kbase_kcpuqueue_execute_cqs_set_operation( \ + __TL_DISPATCH_STREAM(kbdev, obj), \ + kcpu_queue, \ + execute_error \ + ); \ + } while (0) +#else +#define KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_CQS_SET_OPERATION( \ + kbdev, \ + kcpu_queue, \ + execute_error \ + ) \ + do { } while (0) +#endif /* MALI_USE_CSF */ + +/** * KBASE_TLSTREAM_TL_KBASE_KCPUQUEUE_EXECUTE_MAP_IMPORT_START - KCPU Queue starts a Map Import * * @kbdev: Kbase device diff --git a/mali_pixel/BUILD.bazel b/mali_pixel/BUILD.bazel index 21f1633..11b066e 100644 --- a/mali_pixel/BUILD.bazel +++ b/mali_pixel/BUILD.bazel @@ -17,6 +17,7 @@ kernel_module( ], kernel_build = "//private/google-modules/soc/gs:gs_kernel_build", visibility = [ + "//private/google-modules/gpu/mali_kbase:__pkg__", "//private/google-modules/soc/gs:__pkg__", ], deps = [ diff --git a/mali_pixel/Kbuild b/mali_pixel/Kbuild index 87e432a..4b519a9 100644 --- a/mali_pixel/Kbuild +++ b/mali_pixel/Kbuild @@ -23,21 +23,38 @@ src:=$(if $(patsubst /%,,$(src)),$(srctree)/$(src),$(src)) CONFIG_MALI_MEMORY_GROUP_MANAGER ?= m CONFIG_MALI_PRIORITY_CONTROL_MANAGER ?= m +CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR ?= m +CONFIG_MALI_PIXEL_STATS ?= m +CONFIG_MALI_PIXEL_GPU_SLC=y -DEFINES += \ - -DCONFIG_MALI_MEMORY_GROUP_MANAGER=$(CONFIG_MALI_MEMORY_GROUP_MANAGER) \ - -DCONFIG_MALI_PRIORITY_CONTROL_MANAGER=$(CONFIG_MALI_PRIORITY_CONTROL_MANAGER) +mali_pixel-objs := -# Use our defines when compiling, and include mali platform module headers -ccflags-y += $(DEFINES) -I$(src)/../common/include +ifeq ($(CONFIG_MALI_PIXEL_STATS),m) + DEFINES += -DCONFIG_MALI_PIXEL_STATS + mali_pixel-objs += mali_pixel_stats.o +endif -mali_pixel-objs := ifeq ($(CONFIG_MALI_MEMORY_GROUP_MANAGER),m) + DEFINES += -DCONFIG_MALI_MEMORY_GROUP_MANAGER mali_pixel-objs += memory_group_manager.o endif ifeq ($(CONFIG_MALI_PRIORITY_CONTROL_MANAGER),m) + DEFINES += -DCONFIG_MALI_PRIORITY_CONTROL_MANAGER mali_pixel-objs += priority_control_manager.o endif +ifeq ($(CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR),m) + DEFINES += -DCONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + mali_pixel-objs += protected_memory_allocator.o +endif +ifeq ($(CONFIG_MALI_PIXEL_GPU_SLC),y) + DEFINES += -DCONFIG_MALI_PIXEL_GPU_SLC +endif + +# Use our defines when compiling, and include mali platform module headers +ccflags-y += \ + $(DEFINES) \ + -I$(src)/../common/include \ + -I$(srctree)/include/linux # Add kernel 
module target if any of our config options is enabled ifneq ($(mali_pixel-objs),) diff --git a/mali_pixel/Kconfig b/mali_pixel/Kconfig index 2406990..10ab093 100644 --- a/mali_pixel/Kconfig +++ b/mali_pixel/Kconfig @@ -25,8 +25,22 @@ config MALI_MEMORY_GROUP_MANAGER for allocation and release of pages for memory pools managed by Mali GPU device drivers. +config MALI_MEMORY_GROUP_MANAGER_DEBUG_FS + depends on MALI_MEMORY_GROUP_MANAGER && DEBUG_FS + bool "Enable Mali memory group manager debugfs nodes" + default n + help + Enables support for memory group manager debugfs nodes + config MALI_PRIORITY_CONTROL_MANAGER tristate "MALI_PRIORITY_CONTROL_MANAGER" help This option enables an implementation of a priority control manager for determining the target GPU scheduling priority of a process. + +config MALI_PROTECTED_MEMORY_ALLOCATOR + tristate "MALI_PROTECTED_MEMORY_ALLOCATOR" + help + This option enables an implementation of a protected memory allocator + for allocation and release of pages of protected memory for use by + Mali GPU device drivers. diff --git a/mali_pixel/Makefile b/mali_pixel/Makefile index 2bff7de..7b09188 100644 --- a/mali_pixel/Makefile +++ b/mali_pixel/Makefile @@ -8,6 +8,9 @@ M ?= $(shell pwd) KBUILD_OPTIONS += CONFIG_MALI_MEMORY_GROUP_MANAGER=m KBUILD_OPTIONS += CONFIG_MALI_PRIORITY_CONTROL_MANAGER=m +KBUILD_OPTIONS += CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR=m +KBUILD_OPTIONS += CONFIG_MALI_PIXEL_STATS=m +KBUILD_OPTIONS += CONFIG_MALI_PIXEL_GPU_SLC=y KBUILD_OPTIONS += $(KBUILD_EXTRA) # Extra config if any diff --git a/mali_pixel/mali_pixel_mod.c b/mali_pixel/mali_pixel_mod.c index 1fd4865..75b6c87 100644 --- a/mali_pixel/mali_pixel_mod.c +++ b/mali_pixel/mali_pixel_mod.c @@ -8,37 +8,17 @@ MODULE_DESCRIPTION("Pixel platform integration for GPU"); MODULE_IMPORT_NS(DMA_BUF); MODULE_AUTHOR("<sidaths@google.com>"); MODULE_VERSION("1.0"); -MODULE_SOFTDEP("pre: pixel_stat_sysfs"); MODULE_SOFTDEP("pre: slc_pmon"); MODULE_SOFTDEP("pre: slc_dummy"); MODULE_SOFTDEP("pre: slc_acpm"); -extern struct kobject *pixel_stat_kobj; - -struct kobject *pixel_stat_gpu_kobj; - -static int mali_pixel_init_pixel_stats(void) -{ - struct kobject *pixel_stat = pixel_stat_kobj; - - WARN_ON(pixel_stat_kobj == NULL); - - pixel_stat_gpu_kobj = kobject_create_and_add("gpu", pixel_stat); - if (!pixel_stat_gpu_kobj) - return -ENOMEM; - - return 0; -} - static int __init mali_pixel_init(void) { int ret = 0; - /* The Pixel Stats Sysfs module needs to be loaded first */ - if (pixel_stat_kobj == NULL) - return -EPROBE_DEFER; - +#ifdef CONFIG_MALI_PIXEL_STATS ret = mali_pixel_init_pixel_stats(); +#endif if (ret) goto fail_pixel_stats; @@ -50,13 +30,23 @@ static int __init mali_pixel_init(void) #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER ret = platform_driver_register(&priority_control_manager_driver); -#else #endif if (ret) goto fail_pcm; +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + ret = platform_driver_register(&protected_memory_allocator_driver); +#endif + if (ret) + goto fail_pma; + goto exit; +fail_pma: +#ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER + platform_driver_unregister(&priority_control_manager_driver); +#endif + fail_pcm: #ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER platform_driver_unregister(&memory_group_manager_driver); @@ -74,6 +64,9 @@ module_init(mali_pixel_init); static void __exit mali_pixel_exit(void) { +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR + platform_driver_unregister(&protected_memory_allocator_driver); +#endif #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER 
platform_driver_unregister(&priority_control_manager_driver); #endif diff --git a/mali_pixel/mali_pixel_mod.h b/mali_pixel/mali_pixel_mod.h index 0f5f0d3..3a43c9e 100644 --- a/mali_pixel/mali_pixel_mod.h +++ b/mali_pixel/mali_pixel_mod.h @@ -8,4 +8,12 @@ extern struct platform_driver memory_group_manager_driver; #ifdef CONFIG_MALI_PRIORITY_CONTROL_MANAGER extern struct platform_driver priority_control_manager_driver; -#endif
\ No newline at end of file +#endif + +#ifdef CONFIG_MALI_PROTECTED_MEMORY_ALLOCATOR +extern struct platform_driver protected_memory_allocator_driver; +#endif + +#ifdef CONFIG_MALI_PIXEL_STATS +extern int mali_pixel_init_pixel_stats(void); +#endif diff --git a/mali_pixel/mali_pixel_stats.c b/mali_pixel/mali_pixel_stats.c new file mode 100644 index 0000000..dba388e --- /dev/null +++ b/mali_pixel/mali_pixel_stats.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "mali_pixel_mod.h" +#include <linux/module.h> + +MODULE_SOFTDEP("pre: pixel_stat_sysfs"); + +extern struct kobject *pixel_stat_kobj; + +struct kobject *pixel_stat_gpu_kobj; + +int mali_pixel_init_pixel_stats(void) +{ + struct kobject *pixel_stat = pixel_stat_kobj; + + if (pixel_stat_kobj == NULL) + return -EPROBE_DEFER; + + pixel_stat_gpu_kobj = kobject_create_and_add("gpu", pixel_stat); + if (!pixel_stat_gpu_kobj) + return -ENOMEM; + + return 0; +} diff --git a/mali_pixel/memory_group_manager.c b/mali_pixel/memory_group_manager.c index 5c98a5d..0cde4e0 100644 --- a/mali_pixel/memory_group_manager.c +++ b/mali_pixel/memory_group_manager.c @@ -8,7 +8,7 @@ */ #include <linux/atomic.h> -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS #include <linux/debugfs.h> #endif #include <linux/fs.h> @@ -19,27 +19,41 @@ #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/version.h> +#include <linux/limits.h> #include <linux/memory_group_manager.h> #include <soc/google/pt.h> +#include <uapi/gpu/arm/midgard/platform/pixel/pixel_memory_group_manager.h> + + +#define ORDER_SMALL_PAGE 0 +#define ORDER_LARGE_PAGE 9 + +/* Borr does not have "real" PBHA support. However, since we only use a 36-bit PA on the bus, + * AxADDR[39:36] is wired up to the GPU AxUSER[PBHA] field seen by the rest of the system. + * Those AxADDR bits come from [39:36] in the page descriptor. + * + * Odin and Turse have "real" PBHA support using a dedicated output signal and page descriptor field. + * The AxUSER[PBHA] field is driven by the GPU's PBHA signal, and AxADDR[39:36] is dropped. + * The page descriptor PBHA field is [62:59]. + * + * We could write to both of these locations, as each SoC only reads from its respective PBHA + * location with the other being ignored or dropped. + * + * b/148988078 contains confirmation of the above description. + */ +#if IS_ENABLED(CONFIG_SOC_GS101) #define PBHA_BIT_POS (36) +#else +#define PBHA_BIT_POS (59) +#endif #define PBHA_BIT_MASK (0xf) #define MGM_PBHA_DEFAULT 0 -#define GROUP_ID_TO_PT_IDX(x) ((x)-1) -/* The Mali driver requires that allocations made on one of the groups - * are not treated specially. - */ -#define MGM_RESERVED_GROUP_ID 0 - -/* Imported memory is handled by the allocator of the memory, and the Mali - * DDK will request a group_id for such memory via mgm_get_import_memory_id(). - * We specify which group we want to use for this here. - */ -#define MGM_IMPORTED_MEMORY_GROUP_ID (MEMORY_GROUP_MANAGER_NR_GROUPS - 1) +#define MGM_SENTINEL_PT_SIZE U64_MAX #define INVALID_GROUP_ID(group_id) \ (WARN_ON((group_id) < 0) || \ @@ -68,8 +82,12 @@ static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, * @lp_size: The number of allocated large(2MB) pages * @insert_pfn: The number of calls to map pages for CPU access. * @update_gpu_pte: The number of calls to update GPU page table entries. 
- * @ptid: The partition ID for this group + * @ptid: The active partition ID for this group * @pbha: The PBHA bits assigned to this group, + * @base_pt: The base partition ID available to this group. + * @pt_num: The number of partitions available to this group. + * @active_pt_idx: The relative index for the partition backing the group. + * Different from the absolute ptid. * @state: The lifecycle state of the partition associated with this group * This structure allows page allocation information to be displayed via * debugfs. Display is organized per group with small and large sized pages. @@ -77,11 +95,17 @@ static inline vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, struct mgm_group { atomic_t size; atomic_t lp_size; +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_t insert_pfn; atomic_t update_gpu_pte; +#endif ptid_t ptid; ptpbha_t pbha; + + u32 base_pt; + u32 pt_num; + u32 active_pt_idx; enum { MGM_GROUP_STATE_NEW = 0, MGM_GROUP_STATE_ENABLED = 10, @@ -91,10 +115,23 @@ struct mgm_group { }; /** + * struct partition_stats - Structure for tracking sizing of a partition + * + * @capacity: The total capacity of each partition + * @size: The current size of each partition + */ +struct partition_stats { + u64 capacity; + atomic64_t size; +}; + +/** * struct mgm_groups - Structure for groups of memory group manager * * @groups: To keep track of the number of allocated pages of all groups * @ngroups: Number of groups actually used + * @npartitions: Number of partitions used by all groups combined + * @pt_stats: The sizing info for each partition * @dev: device attached * @pt_handle: Link to SLC partition data * @kobj: &sruct kobject used for linking to pixel_stats_sysfs node @@ -106,10 +143,12 @@ struct mgm_group { struct mgm_groups { struct mgm_group groups[MEMORY_GROUP_MANAGER_NR_GROUPS]; size_t ngroups; + size_t npartitions; + struct partition_stats *pt_stats; struct device *dev; struct pt_handle *pt_handle; struct kobject kobj; -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS struct dentry *mgm_debugfs_root; #endif }; @@ -118,7 +157,7 @@ struct mgm_groups { * DebugFS */ -#ifdef CONFIG_DEBUG_FS +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS static int mgm_debugfs_state_get(void *data, u64 *val) { @@ -249,15 +288,14 @@ static int mgm_debugfs_init(struct mgm_groups *mgm_data) return 0; } -#endif /* CONFIG_DEBUG_FS */ +#endif /* CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS */ /* * Pixel Stats sysfs */ -extern struct kobject *pixel_stat_gpu_kobj; +#ifdef CONFIG_MALI_PIXEL_STATS -#define ORDER_SMALL_PAGE 0 -#define ORDER_LARGE_PAGE 9 +extern struct kobject *pixel_stat_gpu_kobj; #define MGM_ATTR_RO(_name) \ static struct kobj_attribute _name##_attr = __ATTR_RO(_name) @@ -343,41 +381,81 @@ static void mgm_sysfs_term(struct mgm_groups *data) kobject_put(&data->kobj); } +#else /* CONFIG_MALI_PIXEL_STATS */ + +static int mgm_sysfs_init(struct mgm_groups *data) +{ + return 0; +} + +static void mgm_sysfs_term(struct mgm_groups *data) +{} + +#endif /* CONFIG_MALI_PIXEL_STATS */ + +static int group_pt_id(struct mgm_groups *data, enum pixel_mgm_group_id group_id, int pt_index) +{ + struct mgm_group *group = &data->groups[group_id]; + if (WARN_ON_ONCE(pt_index >= group->pt_num)) + return 0; + + return group->base_pt + pt_index; +} + +static int group_active_pt_id(struct mgm_groups *data, enum pixel_mgm_group_id group_id) +{ + return group_pt_id(data, group_id, data->groups[group_id].active_pt_idx); +} + static atomic64_t total_gpu_pages = ATOMIC64_INIT(0); 
-static void update_size(struct memory_group_manager_device *mgm_dev, int - group_id, int order, bool alloc) +static atomic_t* get_size_counter(struct memory_group_manager_device* mgm_dev, int group_id, int order) { - static DEFINE_RATELIMIT_STATE(gpu_alloc_rs, 10*HZ, 1); + static atomic_t err_atomic; struct mgm_groups *data = mgm_dev->data; switch (order) { case ORDER_SMALL_PAGE: - if (alloc) { - atomic_inc(&data->groups[group_id].size); - atomic64_inc(&total_gpu_pages); - } else { - WARN_ON(atomic_read(&data->groups[group_id].size) == 0); - atomic_dec(&data->groups[group_id].size); - atomic64_dec(&total_gpu_pages); - } - break; - + return &data->groups[group_id].size; case ORDER_LARGE_PAGE: - if (alloc) { - atomic_inc(&data->groups[group_id].lp_size); - atomic64_add(1 << ORDER_LARGE_PAGE, &total_gpu_pages); - } else { - WARN_ON(atomic_read( - &data->groups[group_id].lp_size) == 0); - atomic_dec(&data->groups[group_id].lp_size); - atomic64_sub(1 << ORDER_LARGE_PAGE, &total_gpu_pages); - } - break; - + return &data->groups[group_id].lp_size; default: dev_err(data->dev, "Unknown order(%d)\n", order); - break; + return &err_atomic; + } +} + +static void update_size(struct memory_group_manager_device *mgm_dev, int + group_id, int order, bool alloc) +{ + static DEFINE_RATELIMIT_STATE(gpu_alloc_rs, 10*HZ, 1); + atomic_t* size = get_size_counter(mgm_dev, group_id, order); + + if (alloc) { + atomic_inc(size); + atomic64_add(1 << order, &total_gpu_pages); + } else { + if (atomic_dec_return(size) < 0) { + /* b/289501175 + * Pages are often 'migrated' to the SLC group, which needs special + * accounting. + * + * TODO: Remove after SLC MGM decoupling b/290354607 + */ + if (!WARN_ON(group_id != MGM_SLC_GROUP_ID)) { + /* Undo the dec, and instead decrement the reserved group counter. + * This is still making the assumption that the migration came from + * the reserved group. Currently this is always true, however it + * might not be in future. It would be invasive and costly to track + * where every page came from, so instead this will be fixed as part + * of the b/290354607 effort. + */ + atomic_inc(size); + update_size(mgm_dev, MGM_RESERVED_GROUP_ID, order, alloc); + return; + } + } + atomic64_sub(1 << order, &total_gpu_pages); } if (atomic64_read(&total_gpu_pages) >= (4 << (30 - PAGE_SHIFT)) && @@ -385,6 +463,185 @@ static void update_size(struct memory_group_manager_device *mgm_dev, int pr_warn("total_gpu_pages %lld\n", atomic64_read(&total_gpu_pages)); } +static void pt_size_invalidate(struct mgm_groups* data, int pt_idx) +{ + /* Set the size to a known sentinel value so that we can later detect an update */ + atomic64_set(&data->pt_stats[pt_idx].size, MGM_SENTINEL_PT_SIZE); +} + +static void pt_size_init(struct mgm_groups* data, int pt_idx, size_t size) +{ + /* The resize callback may have already been executed, which would have set + * the correct size. Only update the size if this has not happened. + * We can tell that no resize took place if the size is still a sentinel. 
+ */ + atomic64_cmpxchg(&data->pt_stats[pt_idx].size, MGM_SENTINEL_PT_SIZE, size); +} + +static void validate_ptid(struct mgm_groups* data, enum pixel_mgm_group_id group_id, int ptid) +{ + if (ptid == -EINVAL) + dev_err(data->dev, "Failed to get partition for group: %d\n", group_id); + else + dev_info(data->dev, "pt_client_mutate returned ptid=%d for group=%d", ptid, group_id); +} + +static void update_group(struct mgm_groups* data, + enum pixel_mgm_group_id group_id, + int ptid, + int relative_pt_idx) +{ + int const abs_pt_idx = group_pt_id(data, group_id, relative_pt_idx); + int const pbha = pt_pbha(data->dev->of_node, abs_pt_idx); + + if (pbha == PT_PBHA_INVALID) + dev_err(data->dev, "Failed to get PBHA for group: %d\n", group_id); + else + dev_info(data->dev, "pt_pbha returned PBHA=%d for group=%d", pbha, group_id); + + data->groups[group_id].ptid = ptid; + data->groups[group_id].pbha = pbha; + data->groups[group_id].state = MGM_GROUP_STATE_ENABLED; + data->groups[group_id].active_pt_idx = relative_pt_idx; +} + +static void disable_partition(struct mgm_groups* data, enum pixel_mgm_group_id group_id) +{ + int const active_idx = group_active_pt_id(data, group_id); + + /* Skip if not already enabled */ + if (data->groups[group_id].state != MGM_GROUP_STATE_ENABLED) + return; + + pt_client_disable_no_free(data->pt_handle, active_idx); + data->groups[group_id].state = MGM_GROUP_STATE_DISABLED_NOT_FREED; + + pt_size_invalidate(data, active_idx); + pt_size_init(data, active_idx, 0); +} + +static void enable_partition(struct mgm_groups* data, enum pixel_mgm_group_id group_id) +{ + int ptid; + size_t size = 0; + int const active_idx = group_active_pt_id(data, group_id); + + /* Skip if already enabled */ + if (data->groups[group_id].state == MGM_GROUP_STATE_ENABLED) + return; + + pt_size_invalidate(data, active_idx); + + ptid = pt_client_enable_size(data->pt_handle, active_idx, &size); + + validate_ptid(data, group_id, ptid); + + update_group(data, group_id, ptid, data->groups[group_id].active_pt_idx); + + pt_size_init(data, active_idx, size); +} + +static void set_group_partition(struct mgm_groups* data, + enum pixel_mgm_group_id group_id, + int new_pt_index) +{ + int ptid; + size_t size = 0; + int const active_idx = group_active_pt_id(data, group_id); + int const new_idx = group_pt_id(data, group_id, new_pt_index); + + /* Early out if no changes are needed */ + if (new_idx == active_idx) + return; + + pt_size_invalidate(data, new_idx); + + ptid = pt_client_mutate_size(data->pt_handle, active_idx, new_idx, &size); + + validate_ptid(data, group_id, ptid); + + update_group(data, group_id, ptid, new_pt_index); + + pt_size_init(data, new_idx, size); + /* Reset old partition size */ + atomic64_set(&data->pt_stats[active_idx].size, data->pt_stats[active_idx].capacity); +} + +u64 pixel_mgm_query_group_size(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id) +{ + struct mgm_groups *data; + struct mgm_group *group; + u64 size = 0; + + /* Early out if the group doesn't exist */ + if (INVALID_GROUP_ID(group_id)) + goto done; + + data = mgm_dev->data; + group = &data->groups[group_id]; + + /* Early out if the group has no partitions */ + if (group->pt_num == 0) + goto done; + + size = atomic64_read(&data->pt_stats[group_active_pt_id(data, group_id)].size); + +done: + return size; +} +EXPORT_SYMBOL(pixel_mgm_query_group_size); + +void pixel_mgm_resize_group_to_fit(struct memory_group_manager_device* mgm_dev, + enum pixel_mgm_group_id group_id, + u64 demand) +{ + struct mgm_groups 
*data; + struct mgm_group *group; + s64 diff, cur_size, min_diff = S64_MAX; + int pt_idx; + + /* Early out if the group doesn't exist */ + if (INVALID_GROUP_ID(group_id)) + goto done; + + data = mgm_dev->data; + group = &data->groups[group_id]; + + /* Early out if the group has no partitions */ + if (group->pt_num == 0) + goto done; + + /* We can disable the partition if there's no demand */ + if (demand == 0) + { + disable_partition(data, group_id); + goto done; + } + + /* Calculate best partition to use, by finding the nearest capacity */ + for (pt_idx = 0; pt_idx < group->pt_num; ++pt_idx) + { + cur_size = data->pt_stats[group_pt_id(data, group_id, pt_idx)].capacity; + diff = abs(demand - cur_size); + + if (diff > min_diff) + break; + + min_diff = diff; + } + + /* Ensure the partition is enabled before trying to mutate it */ + enable_partition(data, group_id); + set_group_partition(data, group_id, pt_idx - 1); + +done: + dev_dbg(data->dev, "%s: resized memory_group_%d for demand: %lldB", __func__, group_id, demand); + + return; +} +EXPORT_SYMBOL(pixel_mgm_resize_group_to_fit); + static struct page *mgm_alloc_page( struct memory_group_manager_device *mgm_dev, int group_id, gfp_t gfp_mask, unsigned int order) @@ -400,7 +657,7 @@ static struct page *mgm_alloc_page( return NULL; if (WARN_ON_ONCE((group_id != MGM_RESERVED_GROUP_ID) && - (GROUP_ID_TO_PT_IDX(group_id) >= data->ngroups))) + (group_active_pt_id(data, group_id) >= data->npartitions))) return NULL; /* We don't expect to be allocting pages into the group used for @@ -413,38 +670,9 @@ static struct page *mgm_alloc_page( * ensure that we have enabled the relevant partitions for it. */ if (group_id != MGM_RESERVED_GROUP_ID) { - int ptid, pbha; switch (data->groups[group_id].state) { case MGM_GROUP_STATE_NEW: - ptid = pt_client_enable(data->pt_handle, - GROUP_ID_TO_PT_IDX(group_id)); - if (ptid == -EINVAL) { - dev_err(data->dev, - "Failed to get partition for group: " - "%d\n", group_id); - } else { - dev_info(data->dev, - "pt_client_enable returned ptid=%d for" - " group=%d", - ptid, group_id); - } - - pbha = pt_pbha(data->dev->of_node, - GROUP_ID_TO_PT_IDX(group_id)); - if (pbha == PT_PBHA_INVALID) { - dev_err(data->dev, - "Failed to get PBHA for group: %d\n", - group_id); - } else { - dev_info(data->dev, - "pt_pbha returned PBHA=%d for group=%d", - pbha, group_id); - } - - data->groups[group_id].ptid = ptid; - data->groups[group_id].pbha = pbha; - data->groups[group_id].state = MGM_GROUP_STATE_ENABLED; - + enable_partition(data, group_id); break; case MGM_GROUP_STATE_ENABLED: case MGM_GROUP_STATE_DISABLED_NOT_FREED: @@ -534,7 +762,7 @@ static u64 mgm_update_gpu_pte( switch (group_id) { case MGM_RESERVED_GROUP_ID: - case MGM_IMPORTED_MEMORY_GROUP_ID: + case MGM_IMPORTED_MEMORY_GROUP_ID: /* The reserved group doesn't set PBHA bits */ /* TODO: Determine what to do with imported memory */ break; @@ -558,7 +786,35 @@ static u64 mgm_update_gpu_pte( } } +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_inc(&data->groups[group_id].update_gpu_pte); +#endif + + return pte; +} + +static u64 mgm_pte_to_original_pte(struct memory_group_manager_device *mgm_dev, int group_id, + int mmu_level, u64 pte) +{ + struct mgm_groups *const data = mgm_dev->data; + u64 old_pte; + + if (INVALID_GROUP_ID(group_id)) + return pte; + + switch (group_id) { + case MGM_RESERVED_GROUP_ID: + case MGM_IMPORTED_MEMORY_GROUP_ID: + /* The reserved group doesn't set PBHA bits */ + /* TODO: Determine what to do with imported memory */ + break; + default: + /* All 
other groups will have PBHA bits, so clear them */ + old_pte = pte; + pte &= ~((u64)PBHA_BIT_MASK << PBHA_BIT_POS); + dev_dbg(data->dev, "%s: group_id=%d pte=0x%llx -> 0x%llx\n", __func__, group_id, + old_pte, pte); + } return pte; } @@ -582,57 +838,105 @@ static vm_fault_t mgm_vmf_insert_pfn_prot( fault = vmf_insert_pfn_prot(vma, addr, pfn, prot); - if (fault == VM_FAULT_NOPAGE) - atomic_inc(&data->groups[group_id].insert_pfn); - else + if (fault != VM_FAULT_NOPAGE) dev_err(data->dev, "vmf_insert_pfn_prot failed\n"); +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS + else + atomic_inc(&data->groups[group_id].insert_pfn); +#endif return fault; } static void mgm_resize_callback(void *data, int id, size_t size_allocated) { - /* Currently we don't do anything on partition resize */ struct mgm_groups *const mgm_data = (struct mgm_groups *)data; - dev_dbg(mgm_data->dev, "Resize callback called, size_allocated: %zu\n", - size_allocated); + dev_dbg(mgm_data->dev, "Resize callback called, size_allocated: %zu\n", size_allocated); + /* Update the partition size for the group */ + atomic64_set(&mgm_data->pt_stats[id].size, size_allocated); } static int mgm_initialize_data(struct mgm_groups *mgm_data) { int i, ret; - const int ngroups = of_property_count_strings(mgm_data->dev->of_node, "pt_id"); + /* +1 to include the required default group */ + const int ngroups = of_property_count_strings(mgm_data->dev->of_node, "groups") + 1; if (WARN_ON(ngroups < 0) || WARN_ON(ngroups > MEMORY_GROUP_MANAGER_NR_GROUPS)) { mgm_data->ngroups = 0; } else { mgm_data->ngroups = ngroups; } + mgm_data->npartitions = of_property_count_strings(mgm_data->dev->of_node, "pt_id"); + + mgm_data->pt_stats = kzalloc(mgm_data->npartitions * sizeof(struct partition_stats), GFP_KERNEL); + if (mgm_data->pt_stats == NULL) { + dev_err(mgm_data->dev, "failed to allocate space for pt_stats"); + ret = -ENOMEM; + goto out_err; + } + + for (i = 0; i < mgm_data->npartitions; i++) { + struct partition_stats* stats; + u32 capacity_kb; + ret = of_property_read_u32_index(mgm_data->dev->of_node, "pt_size", i, &capacity_kb); + if (ret) { + dev_err(mgm_data->dev, "failed to read pt_size[%d]", i); + continue; + } + + stats = &mgm_data->pt_stats[i]; + // Convert from KB to bytes + stats->capacity = (u64)capacity_kb << 10; + atomic64_set(&stats->size, stats->capacity); + } for (i = 0; i < MEMORY_GROUP_MANAGER_NR_GROUPS; i++) { atomic_set(&mgm_data->groups[i].size, 0); atomic_set(&mgm_data->groups[i].lp_size, 0); +#ifdef CONFIG_MALI_MEMORY_GROUP_MANAGER_DEBUG_FS atomic_set(&mgm_data->groups[i].insert_pfn, 0); atomic_set(&mgm_data->groups[i].update_gpu_pte, 0); +#endif mgm_data->groups[i].pbha = MGM_PBHA_DEFAULT; + mgm_data->groups[i].base_pt = 0; + mgm_data->groups[i].pt_num = 0; + mgm_data->groups[i].active_pt_idx = 0; mgm_data->groups[i].state = MGM_GROUP_STATE_NEW; } + /* Discover the partitions belonging to each memory group, skipping the reserved group */ + for (i = 1; i < mgm_data->ngroups; i++) { + /* Device tree has no description for the reserved group */ + int const dt_idx = i - 1; + + int err = of_property_read_u32_index( + mgm_data->dev->of_node, "group_base_pt", dt_idx, &mgm_data->groups[i].base_pt); + if (err) { + dev_warn(mgm_data->dev, "failed to read base pt index for group %d", i); + continue; + } + + err = of_property_read_u32_index( + mgm_data->dev->of_node, "group_pt_num", dt_idx, &mgm_data->groups[i].pt_num); + if (err) + dev_warn(mgm_data->dev, "failed to read pt number for group %d", i); + } + /* * Initialize SLC partitions. 
We don't enable partitions until * we actually allocate memory to the corresponding memory * group */ - mgm_data->pt_handle = pt_client_register( - mgm_data->dev->of_node, - (void *)mgm_data, &mgm_resize_callback); + mgm_data->pt_handle = + pt_client_register(mgm_data->dev->of_node, (void*)mgm_data, &mgm_resize_callback); if (IS_ERR(mgm_data->pt_handle)) { ret = PTR_ERR(mgm_data->pt_handle); dev_err(mgm_data->dev, "pt_client_register returned %d\n", ret); - return ret; + goto out_err; } /* We don't use PBHA bits for the reserved memory group, and so @@ -640,13 +944,26 @@ static int mgm_initialize_data(struct mgm_groups *mgm_data) */ mgm_data->groups[MGM_RESERVED_GROUP_ID].state = MGM_GROUP_STATE_ENABLED; - ret = mgm_debugfs_init(mgm_data); - if (ret) - goto out; + if ((ret = mgm_debugfs_init(mgm_data))) + goto out_err; - ret = mgm_sysfs_init(mgm_data); + if ((ret = mgm_sysfs_init(mgm_data))) + goto out_err; + +#ifdef CONFIG_MALI_PIXEL_GPU_SLC + /* We enable the SLC partition by default to support dynamic SLC caching. + * Enabling will initialize the partition, by querying the pbha and assigning a ptid. + * We then immediately disable the partition, effectively resizing the group to zero, + * whilst still retaining other properties such as pbha. + */ + enable_partition(mgm_data, MGM_SLC_GROUP_ID); + disable_partition(mgm_data, MGM_SLC_GROUP_ID); +#endif -out: + return ret; + +out_err: + kfree(mgm_data->pt_stats); return ret; } @@ -677,8 +994,10 @@ static void mgm_term_data(struct mgm_groups *data) break; case MGM_GROUP_STATE_ENABLED: + pt_client_disable(data->pt_handle, group_active_pt_id(data, i)); + break; case MGM_GROUP_STATE_DISABLED_NOT_FREED: - pt_client_free(data->pt_handle, group->ptid); + pt_client_free(data->pt_handle, group_active_pt_id(data, i)); break; default: @@ -704,12 +1023,14 @@ static int memory_group_manager_probe(struct platform_device *pdev) return -ENOMEM; mgm_dev->owner = THIS_MODULE; - mgm_dev->ops.mgm_alloc_page = mgm_alloc_page; - mgm_dev->ops.mgm_free_page = mgm_free_page; - mgm_dev->ops.mgm_get_import_memory_id = - mgm_get_import_memory_id; - mgm_dev->ops.mgm_vmf_insert_pfn_prot = mgm_vmf_insert_pfn_prot; - mgm_dev->ops.mgm_update_gpu_pte = mgm_update_gpu_pte; + mgm_dev->ops = (struct memory_group_manager_ops){ + .mgm_alloc_page = mgm_alloc_page, + .mgm_free_page = mgm_free_page, + .mgm_get_import_memory_id = mgm_get_import_memory_id, + .mgm_update_gpu_pte = mgm_update_gpu_pte, + .mgm_pte_to_original_pte = mgm_pte_to_original_pte, + .mgm_vmf_insert_pfn_prot = mgm_vmf_insert_pfn_prot, + }; mgm_data = kzalloc(sizeof(*mgm_data), GFP_KERNEL); if (!mgm_data) { diff --git a/mali_pixel/protected_memory_allocator.c b/mali_pixel/protected_memory_allocator.c new file mode 100644 index 0000000..25b5bde --- /dev/null +++ b/mali_pixel/protected_memory_allocator.c @@ -0,0 +1,580 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2021 Google LLC. + * + * Protected memory allocator driver for allocation and release of pages of + * protected memory for use by Mali GPU device drivers. 
+ */ + +#include <linux/dma-buf.h> +#include <linux/dma-heap.h> +#include <linux/of.h> +#include <linux/module.h> +#include <linux/platform_device.h> +#include <linux/protected_memory_allocator.h> +#include <linux/slab.h> + +#define MALI_PMA_DMA_HEAP_NAME "vframe-secure" +#define MALI_PMA_SLAB_SIZE (1 << 16) +#define MALI_PMA_SLAB_BLOCK_SIZE (PAGE_SIZE) +#define MALI_PMA_SLAB_BLOCK_COUNT \ + (MALI_PMA_SLAB_SIZE / MALI_PMA_SLAB_BLOCK_SIZE) +#define MALI_PMA_MAX_ALLOC_SIZE (MALI_PMA_SLAB_SIZE) + +/** + * struct mali_pma_dev - Structure for managing a Mali protected memory + * allocator device. + * + * @pma_dev: The base protected memory allocator device. + * @dev: The device for which to allocate protected memory. + * @dma_heap: The DMA buffer heap from which to allocate protected memory. + * @slab_list: List of allocated slabs of protected memory. + * @slab_mutex: Mutex used to serialize access to the slab list. + */ +struct mali_pma_dev { + struct protected_memory_allocator_device pma_dev; + struct device *dev; + struct dma_heap *dma_heap; + struct list_head slab_list; + struct mutex slab_mutex; +}; + +/** + * struct mali_protected_memory_allocation - Structure for tracking a Mali + * protected memory allocation. + * + * @pma: The base protected memory allocation record. + * @slab: Protected memory slab used for allocation. + * @first_block_index: Index of first memory block allocated from the slab. + * @block_count: Count of the number of blocks allocated from the slab. + */ +struct mali_protected_memory_allocation { + struct protected_memory_allocation pma; + struct mali_pma_slab *slab; + int first_block_index; + int block_count; +}; + +/** + * struct mali_pma_slab - Structure for managing a slab of Mali protected + * memory. + * + * @list_entry: Entry in slab list. + * @base: Physical base address of slab memory. + * @dma_buf: The DMA buffer allocated for the slab . A reference to the DMA + * buffer is held by this pointer. + * @dma_attachment: The DMA buffer device attachment. + * @dma_sg_table: The DMA buffer scatter/gather table. + * @allocated_block_map: Bit map of allocated blocks in the slab. 
+ */ +struct mali_pma_slab { + struct list_head list_entry; + phys_addr_t base; + struct dma_buf *dma_buf; + struct dma_buf_attachment *dma_attachment; + struct sg_table *dma_sg_table; + uint64_t allocated_block_map; +}; +static_assert(8 * sizeof(((struct mali_pma_slab *) 0)->allocated_block_map) >= + MALI_PMA_SLAB_BLOCK_COUNT); + +static struct protected_memory_allocation *mali_pma_alloc_page( + struct protected_memory_allocator_device *pma_dev, + unsigned int order); + +static phys_addr_t mali_pma_get_phys_addr( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma); + +static void mali_pma_free_page( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma); + +static bool mali_pma_slab_alloc( + struct mali_pma_dev* mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma, size_t size); + +static void mali_pma_slab_dealloc( + struct mali_pma_dev* mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma); + +static bool mali_pma_slab_find_available( + struct mali_pma_dev* mali_pma_dev, size_t size, + struct mali_pma_slab** p_slab, int* p_block_index); + +static struct mali_pma_slab* mali_pma_slab_add( + struct mali_pma_dev* mali_pma_dev); + +static void mali_pma_slab_remove( + struct mali_pma_dev* mali_pma_dev, struct mali_pma_slab* slab); + +static int protected_memory_allocator_probe(struct platform_device *pdev); + +static int protected_memory_allocator_remove(struct platform_device *pdev); + +/** + * mali_pma_alloc_page - Allocate protected memory pages + * + * @pma_dev: The protected memory allocator the request is being made + * through. + * @order: How many pages to allocate, as a base-2 logarithm. + * + * Return: Pointer to allocated memory, or NULL if allocation failed. + */ +static struct protected_memory_allocation *mali_pma_alloc_page( + struct protected_memory_allocator_device *pma_dev, + unsigned int order) { + struct mali_pma_dev *mali_pma_dev; + struct protected_memory_allocation* pma = NULL; + struct mali_protected_memory_allocation *mali_pma; + size_t alloc_size; + bool succeeded = false; + + /* Get the Mali protected memory allocator device record. */ + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + + /* Check requested size against the maximum size. */ + alloc_size = 1 << (PAGE_SHIFT + order); + if (alloc_size > MALI_PMA_MAX_ALLOC_SIZE) { + dev_err(mali_pma_dev->dev, + "Protected memory allocation size %zu too big\n", + alloc_size); + goto out; + } + + /* Allocate a Mali protected memory allocation record. */ + mali_pma = devm_kzalloc( + mali_pma_dev->dev, sizeof(*mali_pma), GFP_KERNEL); + if (!mali_pma) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a Mali protected memory allocation " + "record\n"); + goto out; + } + pma = &(mali_pma->pma); + pma->order = order; + + /* Allocate Mali protected memory from a slab. */ + if (!mali_pma_slab_alloc(mali_pma_dev, mali_pma, alloc_size)) { + dev_err(mali_pma_dev->dev, + "Failed to allocate Mali protected memory.\n"); + goto out; + } + + /* Mark the allocation as successful. */ + succeeded = true; + +out: + /* Clean up on error. */ + if (!succeeded) { + if (pma) { + mali_pma_free_page(pma_dev, pma); + pma = NULL; + } + } + + return pma; +} + +/** + * mali_pma_get_phys_addr - Get the physical address of the protected memory + * allocation + * + * @pma_dev: The protected memory allocator the request is being made + * through. 
+ * @pma: The protected memory allocation whose physical address + * shall be retrieved + * + * Return: The physical address of the given allocation. + */ +static phys_addr_t mali_pma_get_phys_addr( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma) { + return pma->pa; +} + +/** + * mali_pma_free_page - Free a page of memory + * + * @pma_dev: The protected memory allocator the request is being made + * through. + * @pma: The protected memory allocation to free. + */ +static void mali_pma_free_page( + struct protected_memory_allocator_device *pma_dev, + struct protected_memory_allocation *pma) { + struct mali_pma_dev *mali_pma_dev; + struct mali_protected_memory_allocation *mali_pma; + + /* + * Get the Mali protected memory allocator device record and allocation + * record. + */ + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + mali_pma = + container_of(pma, struct mali_protected_memory_allocation, pma); + + /* Deallocate Mali protected memory from the slab. */ + mali_pma_slab_dealloc(mali_pma_dev, mali_pma); + + /* Deallocate the Mali protected memory allocation record. */ + devm_kfree(mali_pma_dev->dev, mali_pma); +} + +/** + * mali_pma_slab_alloc - Allocate protected memory from a slab + * + * @mali_pma_dev: Mali protected memory allocator device. + * @mali_pma: Mali protected memory allocation record to hold the slab memory. + * @size: Size in bytes of memory to allocate. + * + * Return: True if memory was successfully allocated. + */ +static bool mali_pma_slab_alloc( + struct mali_pma_dev *mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma, size_t size) { + struct mali_pma_slab *slab; + int start_block; + int block_count; + bool succeeded = false; + + /* Lock the slab list. */ + mutex_lock(&(mali_pma_dev->slab_mutex)); + + /* + * Try finding an existing slab from which to allocate. If none are + * available, add a new slab and allocate from it. + */ + if (!mali_pma_slab_find_available( + mali_pma_dev, size, &slab, &start_block)) { + slab = mali_pma_slab_add(mali_pma_dev); + if (!slab) { + goto out; + } + start_block = 0; + } + + /* Allocate a contiguous set of blocks from the slab. */ + block_count = DIV_ROUND_UP(size, MALI_PMA_SLAB_BLOCK_SIZE); + bitmap_set((unsigned long *) &(slab->allocated_block_map), + start_block, block_count); + + /* + * Use the allocated slab memory for the Mali protected memory + * allocation. + */ + mali_pma->pma.pa = + slab->base + (start_block * MALI_PMA_SLAB_BLOCK_SIZE); + mali_pma->slab = slab; + mali_pma->first_block_index = start_block; + mali_pma->block_count = block_count; + + /* Mark the allocation as successful. */ + succeeded = true; + +out: + /* Unlock the slab list. */ + mutex_unlock(&(mali_pma_dev->slab_mutex)); + + return succeeded; +} + +/** + * mali_pma_slab_dealloc - Deallocate protected memory from a slab + * + * @mali_pma_dev: Mali protected memory allocator device. + * @mali_pma: Mali protected memory allocation record holding slab memory to + * deallocate. + */ +static void mali_pma_slab_dealloc( + struct mali_pma_dev *mali_pma_dev, + struct mali_protected_memory_allocation *mali_pma) { + struct mali_pma_slab *slab; + + /* Lock the slab list. */ + mutex_lock(&(mali_pma_dev->slab_mutex)); + + /* Get the slab. */ + slab = mali_pma->slab; + + /* Deallocate the slab. */ + if (slab != NULL) { + /* Deallocate all the blocks in the slab. 
*/ + bitmap_clear((unsigned long *) &(slab->allocated_block_map), + mali_pma->first_block_index, + mali_pma->block_count); + + /* If no slab blocks remain allocated, remove the slab. */ + if (bitmap_empty( + (unsigned long *) &(slab->allocated_block_map), + MALI_PMA_SLAB_BLOCK_COUNT)) { + mali_pma_slab_remove(mali_pma_dev, slab); + } + } + + /* Unlock the slab list. */ + mutex_unlock(&(mali_pma_dev->slab_mutex)); +} + +/** + * mali_pma_slab_find_available - Find a slab with available memory + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * @size: Size in bytes of requested memory. + * @p_slab: Returned slab with requested memory available. + * @p_block_index: Returned starting block index of available memory. + * + * Return: True if a slab was found with the requested memory available. + */ +static bool mali_pma_slab_find_available( + struct mali_pma_dev *mali_pma_dev, size_t size, + struct mali_pma_slab **p_slab, int *p_block_index) { + struct mali_pma_slab *slab; + int block_count; + int start_block; + bool found = false; + + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Search slabs for a contiguous set of blocks of the requested size. */ + block_count = DIV_ROUND_UP(size, MALI_PMA_SLAB_BLOCK_SIZE); + list_for_each_entry(slab, &(mali_pma_dev->slab_list), list_entry) { + start_block = bitmap_find_next_zero_area_off( + (unsigned long *) &(slab->allocated_block_map), + MALI_PMA_SLAB_BLOCK_COUNT, 0, block_count, 0, 0); + if (start_block < MALI_PMA_SLAB_BLOCK_COUNT) { + found = true; + break; + } + } + + /* Return results if found. */ + if (found) { + *p_slab = slab; + *p_block_index = start_block; + } + + return found; +} + +/** + * mali_pma_slab_add - Allocate and add a new slab + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * + * Return: Newly added slab. + */ +static struct mali_pma_slab *mali_pma_slab_add( + struct mali_pma_dev *mali_pma_dev) { + struct mali_pma_slab *slab = NULL; + struct dma_buf *dma_buf; + struct dma_buf_attachment *dma_attachment; + struct sg_table *dma_sg_table; + bool succeeded = false; + + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Allocate and initialize a Mali protected memory slab record. */ + slab = devm_kzalloc(mali_pma_dev->dev, sizeof(*slab), GFP_KERNEL); + if (!slab) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a Mali protected memory slab.\n"); + goto out; + } + INIT_LIST_HEAD(&(slab->list_entry)); + + /* Allocate a DMA buffer. */ + dma_buf = dma_heap_buffer_alloc( + mali_pma_dev->dma_heap, MALI_PMA_SLAB_SIZE, O_RDWR, 0); + if (IS_ERR(dma_buf)) { + dev_err(mali_pma_dev->dev, + "Failed to allocate a DMA buffer of size %d\n", + MALI_PMA_SLAB_SIZE); + goto out; + } + slab->dma_buf = dma_buf; + + /* Attach the device to the DMA buffer. */ + dma_attachment = dma_buf_attach(dma_buf, mali_pma_dev->dev); + if (IS_ERR(dma_attachment)) { + dev_err(mali_pma_dev->dev, + "Failed to attach the device to the DMA buffer\n"); + goto out; + } + slab->dma_attachment = dma_attachment; + + /* Map the DMA buffer into the attached device address space. 
*/ + dma_sg_table = + dma_buf_map_attachment(dma_attachment, DMA_BIDIRECTIONAL); + if (IS_ERR(dma_sg_table)) { + dev_err(mali_pma_dev->dev, "Failed to map the DMA buffer\n"); + goto out; + } + slab->dma_sg_table = dma_sg_table; + slab->base = page_to_phys(sg_page(dma_sg_table->sgl)); + + /* Add the slab to the slab list. */ + list_add(&(slab->list_entry), &(mali_pma_dev->slab_list)); + + /* Mark that the slab was successfully added. */ + succeeded = true; + +out: + /* Clean up on failure. */ + if (!succeeded && (slab != NULL)) { + mali_pma_slab_remove(mali_pma_dev, slab); + slab = NULL; + } + + return slab; +} + +/** + * mali_pma_slab_remove - Remove and deallocate a slab + * + * Must be called with the slab list mutex locked. + * + * @mali_pma_dev: Mali protected memory allocator device. + * @slab: Slab to remove and deallocate. + */ +static void mali_pma_slab_remove( + struct mali_pma_dev *mali_pma_dev, struct mali_pma_slab *slab) { + /* Ensure the slab list mutex is locked. */ + lockdep_assert_held(&(mali_pma_dev->slab_mutex)); + + /* Free the Mali protected memory slab allocation. */ + if (slab->dma_sg_table) { + dma_buf_unmap_attachment( + slab->dma_attachment, + slab->dma_sg_table, DMA_BIDIRECTIONAL); + } + if (slab->dma_attachment) { + dma_buf_detach(slab->dma_buf, slab->dma_attachment); + } + if (slab->dma_buf) { + dma_buf_put(slab->dma_buf); + } + + /* Remove the slab from the slab list. */ + list_del(&(slab->list_entry)); + + /* Deallocate the Mali protected memory slab record. */ + devm_kfree(mali_pma_dev->dev, slab); +} + +/** + * protected_memory_allocator_probe - Probe the protected memory allocator + * device + * + * @pdev: The platform device to probe. + */ +static int protected_memory_allocator_probe(struct platform_device *pdev) +{ + struct dma_heap *pma_heap; + struct mali_pma_dev *mali_pma_dev; + struct protected_memory_allocator_device *pma_dev; + int ret = 0; + + /* Try locating a PMA heap, defer if not present (yet). */ + pma_heap = dma_heap_find(MALI_PMA_DMA_HEAP_NAME); + if (!pma_heap) { + dev_warn(&(pdev->dev), + "Failed to find \"%s\" DMA buffer heap. Deferring.\n", + MALI_PMA_DMA_HEAP_NAME); + ret = -EPROBE_DEFER; + goto out; + } + + /* Create a Mali protected memory allocator device record. */ + mali_pma_dev = kzalloc(sizeof(*mali_pma_dev), GFP_KERNEL); + if (!mali_pma_dev) { + dev_err(&(pdev->dev), + "Failed to create a Mali protected memory allocator " + "device record\n"); + dma_heap_put(pma_heap); + ret = -ENOMEM; + goto out; + } + pma_dev = &(mali_pma_dev->pma_dev); + platform_set_drvdata(pdev, pma_dev); + + /* Initialize the slab list. */ + INIT_LIST_HEAD(&(mali_pma_dev->slab_list)); + mutex_init(&(mali_pma_dev->slab_mutex)); + + /* Configure the Mali protected memory allocator. */ + mali_pma_dev->dev = &(pdev->dev); + pma_dev->owner = THIS_MODULE; + pma_dev->ops.pma_alloc_page = mali_pma_alloc_page; + pma_dev->ops.pma_get_phys_addr = mali_pma_get_phys_addr; + pma_dev->ops.pma_free_page = mali_pma_free_page; + + /* Assign the DMA buffer heap. */ + mali_pma_dev->dma_heap = pma_heap; + + /* Log that the protected memory allocator was successfully probed. */ + dev_info(&(pdev->dev), + "Protected memory allocator probed successfully\n"); + +out: + return ret; +} + +/** + * protected_memory_allocator_remove - Remove the protected memory allocator + * device + * + * @pdev: The protected memory allocator platform device to remove. 
+ */ +static int protected_memory_allocator_remove(struct platform_device *pdev) +{ + struct protected_memory_allocator_device *pma_dev; + struct mali_pma_dev *mali_pma_dev; + + /* Get the Mali protected memory allocator device record. */ + pma_dev = platform_get_drvdata(pdev); + if (!pma_dev) { + return 0; + } + mali_pma_dev = container_of(pma_dev, struct mali_pma_dev, pma_dev); + + /* Warn if there are any outstanding protected memory slabs. */ + if (!list_empty(&(mali_pma_dev->slab_list))) { + dev_warn(&(pdev->dev), + "Some protected memory has been left allocated\n"); + } + + /* Release the DMA buffer heap. */ + if (mali_pma_dev->dma_heap) { + dma_heap_put(mali_pma_dev->dma_heap); + } + + /* Free the Mali protected memory allocator device record. */ + kfree(mali_pma_dev); + + return 0; +} + +static const struct of_device_id protected_memory_allocator_dt_ids[] = { + { .compatible = "arm,protected-memory-allocator" }, + { /* sentinel */ } +}; +MODULE_DEVICE_TABLE(of, protected_memory_allocator_dt_ids); + +struct platform_driver protected_memory_allocator_driver = { + .probe = protected_memory_allocator_probe, + .remove = protected_memory_allocator_remove, + .driver = { + .name = "mali-pma", + .owner = THIS_MODULE, + .of_match_table = of_match_ptr(protected_memory_allocator_dt_ids), + .suppress_bind_attrs = true, + } +}; + |
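Editor's note: the memory_group_manager.c hunks above (mgm_update_gpu_pte() and the new mgm_pte_to_original_pte()) revolve around packing a 4-bit PBHA value into the upper bits of a GPU page table entry and stripping it back out. The snippet below is a minimal, standalone userspace sketch of that bit manipulation only, written to match the PBHA_BIT_POS / PBHA_BIT_MASK layout described in the diff (36 on gs101 "Borr", 59 on Odin/Turse); the helper names, the test harness, and the example PBHA and PTE values are illustrative and are not part of the patch.

    #include <stdint.h>
    #include <stdio.h>

    #define PBHA_BIT_POS  59          /* 36 when CONFIG_SOC_GS101 is set, per the diff */
    #define PBHA_BIT_MASK 0xfULL      /* PBHA is a 4-bit field */

    /* Mirror of what mgm_update_gpu_pte() does for non-reserved groups:
     * clear any stale PBHA bits, then OR in the group's PBHA value. */
    static uint64_t pte_set_pbha(uint64_t pte, uint64_t pbha)
    {
            pte &= ~(PBHA_BIT_MASK << PBHA_BIT_POS);
            return pte | ((pbha & PBHA_BIT_MASK) << PBHA_BIT_POS);
    }

    /* Mirror of mgm_pte_to_original_pte(): drop the PBHA field so the
     * original PTE value is recovered. */
    static uint64_t pte_clear_pbha(uint64_t pte)
    {
            return pte & ~(PBHA_BIT_MASK << PBHA_BIT_POS);
    }

    int main(void)
    {
            uint64_t pte = 0x0000000012345000ULL;   /* example PTE, illustrative only */
            uint64_t tagged = pte_set_pbha(pte, 0x3);

            printf("tagged   = 0x%016llx\n", (unsigned long long)tagged);
            printf("restored = 0x%016llx\n", (unsigned long long)pte_clear_pbha(tagged));
            return 0;
    }

As the comment in the diff explains, the two positions exist because gs101 derives AxUSER[PBHA] from AxADDR[39:36] of the page descriptor, while later SoCs carry PBHA in a dedicated descriptor field at bits [62:59]; only the position differs, the encode/strip logic is the same.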