Diffstat (limited to 'docs/firmware-design.md')
-rw-r--r-- | docs/firmware-design.md | 1807 |
1 files changed, 0 insertions, 1807 deletions
diff --git a/docs/firmware-design.md b/docs/firmware-design.md deleted file mode 100644 index 72525bd0..00000000 --- a/docs/firmware-design.md +++ /dev/null @@ -1,1807 +0,0 @@ -ARM Trusted Firmware Design -=========================== - -Contents : - -1. [Introduction](#1--introduction) -2. [Cold boot](#2--cold-boot) -3. [EL3 runtime services framework](#3--el3-runtime-services-framework) -4. [Power State Coordination Interface](#4--power-state-coordination-interface) -5. [Secure-EL1 Payloads and Dispatchers](#5--secure-el1-payloads-and-dispatchers) -6. [Crash Reporting in BL3-1](#6--crash-reporting-in-bl3-1) -7. [Guidelines for Reset Handlers](#7--guidelines-for-reset-handlers) -8. [CPU specific operations framework](#8--cpu-specific-operations-framework) -9. [Memory layout of BL images](#9-memory-layout-of-bl-images) -10. [Firmware Image Package (FIP)](#10--firmware-image-package-fip) -11. [Use of coherent memory in Trusted Firmware](#11--use-of-coherent-memory-in-trusted-firmware) -12. [Code Structure](#12--code-structure) -13. [References](#13--references) - - -1. Introduction ----------------- - -The ARM Trusted Firmware implements a subset of the Trusted Board Boot -Requirements (TBBR) Platform Design Document (PDD) [1] for ARM reference -platforms. The TBB sequence starts when the platform is powered on and runs up -to the stage where it hands-off control to firmware running in the normal -world in DRAM. This is the cold boot path. - -The ARM Trusted Firmware also implements the Power State Coordination Interface -([PSCI]) PDD [2] as a runtime service. PSCI is the interface from normal world -software to firmware implementing power management use-cases (for example, -secondary CPU boot, hotplug and idle). Normal world software can access ARM -Trusted Firmware runtime services via the ARM SMC (Secure Monitor Call) -instruction. The SMC instruction must be used as mandated by the [SMC Calling -Convention PDD][SMCCC] [3]. 
- -The ARM Trusted Firmware implements a framework for configuring and managing -interrupts generated in either security state. The details of the interrupt -management framework and its design can be found in [ARM Trusted -Firmware Interrupt Management Design guide][INTRG] [4]. - -2. Cold boot -------------- - -The cold boot path starts when the platform is physically turned on. One of -the CPUs released from reset is chosen as the primary CPU, and the remaining -CPUs are considered secondary CPUs. The primary CPU is chosen through -platform-specific means. The cold boot path is mainly executed by the primary -CPU, other than essential CPU initialization executed by all CPUs. The -secondary CPUs are kept in a safe platform-specific state until the primary -CPU has performed enough initialization to boot them. - -The cold boot path in this implementation of the ARM Trusted Firmware is divided -into five steps (in order of execution): - -* Boot Loader stage 1 (BL1) _AP Trusted ROM_ -* Boot Loader stage 2 (BL2) _Trusted Boot Firmware_ -* Boot Loader stage 3-1 (BL3-1) _EL3 Runtime Firmware_ -* Boot Loader stage 3-2 (BL3-2) _Secure-EL1 Payload_ (optional) -* Boot Loader stage 3-3 (BL3-3) _Non-trusted Firmware_ - -ARM development platforms (Fixed Virtual Platforms (FVPs) and Juno) implement a -combination of the following types of memory regions. Each bootloader stage uses -one or more of these memory regions. - -* Regions accessible from both non-secure and secure states. For example, - non-trusted SRAM, ROM and DRAM. -* Regions accessible from only the secure state. For example, trusted SRAM and - ROM. The FVPs also implement the trusted DRAM which is statically - configured. Additionally, the Base FVPs and Juno development platform - configure the TrustZone Controller (TZC) to create a region in the DRAM - which is accessible only from the secure state. 
- - -The sections below provide the following details: - -* initialization and execution of the first three stages during cold boot -* specification of the BL3-1 entrypoint requirements for use by alternative - Trusted Boot Firmware in place of the provided BL1 and BL2 -* changes in BL3-1 behavior when using the `RESET_TO_BL31` option which - allows BL3-1 to run without BL1 and BL2 - - -### BL1 - -This stage begins execution from the platform's reset vector at EL3. The reset -address is platform dependent but it is usually located in a Trusted ROM area. -The BL1 data section is copied to trusted SRAM at runtime. - -On the ARM FVP port, BL1 code starts execution from the reset vector at address -`0x00000000` (trusted ROM). The BL1 data section is copied to the start of -trusted SRAM at address `0x04000000`. - -On the Juno ARM development platform port, BL1 code starts execution at -`0x0BEC0000` (FLASH). The BL1 data section is copied to trusted SRAM at address -`0x04001000`. - -The functionality implemented by this stage is as follows. - -#### Determination of boot path - -Whenever a CPU is released from reset, BL1 needs to distinguish between a warm -boot and a cold boot. This is done using platform-specific mechanisms (see the -`platform_get_entrypoint()` function in the [Porting Guide]). In the case of a -warm boot, a CPU is expected to continue execution from a separate -entrypoint. In the case of a cold boot, the secondary CPUs are placed in a safe -platform-specific state (see the `plat_secondary_cold_boot_setup()` function in -the [Porting Guide]) while the primary CPU executes the remaining cold boot path -as described in the following sections. - -#### Architectural initialization - -BL1 performs minimal architectural initialization as follows. - -* Exception vectors - - BL1 sets up simple exception vectors for both synchronous and asynchronous - exceptions. 
The default behavior upon receiving an exception is to populate - a status code in the general purpose register `X0` and call the - `plat_report_exception()` function (see the [Porting Guide]). The status - code is one of: - - 0x0 : Synchronous exception from Current EL with SP_EL0 - 0x1 : IRQ exception from Current EL with SP_EL0 - 0x2 : FIQ exception from Current EL with SP_EL0 - 0x3 : System Error exception from Current EL with SP_EL0 - 0x4 : Synchronous exception from Current EL with SP_ELx - 0x5 : IRQ exception from Current EL with SP_ELx - 0x6 : FIQ exception from Current EL with SP_ELx - 0x7 : System Error exception from Current EL with SP_ELx - 0x8 : Synchronous exception from Lower EL using aarch64 - 0x9 : IRQ exception from Lower EL using aarch64 - 0xa : FIQ exception from Lower EL using aarch64 - 0xb : System Error exception from Lower EL using aarch64 - 0xc : Synchronous exception from Lower EL using aarch32 - 0xd : IRQ exception from Lower EL using aarch32 - 0xe : FIQ exception from Lower EL using aarch32 - 0xf : System Error exception from Lower EL using aarch32 - - The `plat_report_exception()` implementation on the ARM FVP port programs - the Versatile Express System LED register in the following format to - indicate the occurrence of an unexpected exception: - - SYS_LED[0] - Security state (Secure=0/Non-Secure=1) - SYS_LED[2:1] - Exception Level (EL3=0x3, EL2=0x2, EL1=0x1, EL0=0x0) - SYS_LED[7:3] - Exception Class (Sync/Async & origin). This is the value - of the status code - - A write to the LED register is reflected in the System LEDs (S6LED0..7) in the - CLCD window of the FVP. - - BL1 does not expect to receive any exceptions other than the SMC exception. - For the latter, BL1 installs a simple stub. The stub expects to receive - only a single type of SMC (determined by its function ID in the general - purpose register `X0`). This SMC is raised by BL2 to make BL1 pass control - to BL3-1 (loaded by BL2) at EL3. 
Any other SMC leads to an assertion - failure. - -* CPU initialization - - BL1 calls the `reset_handler()` function which in turn calls the CPU - specific reset handler function (see the section: "CPU specific operations - framework"). - -* MMU setup - - BL1 sets up EL3 memory translation by creating page tables to cover the - first 4GB of physical address space. This covers all the memories and - peripherals needed by BL1. - -* Control register setup - - `SCTLR_EL3`. Instruction cache is enabled by setting the `SCTLR_EL3.I` - bit. Alignment and stack alignment checking are enabled by setting the - `SCTLR_EL3.A` and `SCTLR_EL3.SA` bits. Exception endianness is set to - little-endian by clearing the `SCTLR_EL3.EE` bit. - - - `SCR_EL3`. The register width of the next lower exception level is set to - AArch64 by setting the `SCR_EL3.RW` bit. - - - `CPTR_EL3`. Accesses to the `CPACR_EL1` register from EL1 or EL2, or the - `CPTR_EL2` register from EL2 are configured to not trap to EL3 by - clearing the `CPTR_EL3.TCPAC` bit. Access to the trace functionality is - configured not to trap to EL3 by clearing the `CPTR_EL3.TTA` bit. - Instructions that access the registers associated with Floating Point - and Advanced SIMD execution are configured to not trap to EL3 by - clearing the `CPTR_EL3.TFP` bit. - -#### Platform initialization - -BL1 enables issuing of snoop and DVM (Distributed Virtual Memory) requests from -the CCI-400 slave interface corresponding to the cluster that includes the -primary CPU. BL1 also initializes UART0 (PL011 console), which enables access to -the `printf` family of functions in BL1. - -#### BL2 image load and execution - -BL1 execution continues as follows: - -1. BL1 determines the amount of free trusted SRAM memory available by - calculating the extent of its own data section, which also resides in - trusted SRAM. BL1 loads a BL2 raw binary image from platform storage, at a - platform-specific base address. 
If the BL2 image file is not present or if - there is not enough free trusted SRAM the following error message is - printed: - - "Failed to load boot loader stage 2 (BL2) firmware." - - If the load is successful, BL1 updates the limits of the remaining free - trusted SRAM. It also populates information about the amount of trusted - SRAM used by the BL2 image. The exact load location of the image is - provided as a base address in the platform header. Further description of - the memory layout can be found later in this document. - -2. BL1 prints the following string from the primary CPU to indicate successful - execution of the BL1 stage: - - "Booting trusted firmware boot loader stage 1" - -3. BL1 passes control to the BL2 image at Secure EL1, starting from its load - address. - -4. BL1 also passes information about the amount of trusted SRAM used and - available for use. This information is populated at a platform-specific - memory address. - - -### BL2 - -BL1 loads and passes control to BL2 at Secure-EL1. BL2 is linked against and -loaded at a platform-specific base address (more information can be found later -in this document). The functionality implemented by BL2 is as follows. - -#### Architectural initialization - -BL2 performs minimal architectural initialization required for subsequent -stages of the ARM Trusted Firmware and normal world software. It sets up -Secure EL1 memory translation by creating page tables to address the first 4GB -of the physical address space in a similar way to BL1. EL1 and EL0 are given -access to Floating Point & Advanced SIMD registers by clearing the `CPACR.FPEN` -bits. - -#### Platform initialization - -BL2 copies the information regarding the trusted SRAM populated by BL1 using a -platform-specific mechanism. It calculates the limits of DRAM (main memory) -to determine whether there is enough space to load the BL3-3 image. A platform -defined base address is used to specify the load address for the BL3-1 image. 
-It also defines the extents of memory available for use by the BL3-2 image. -BL2 also initializes UART0 (PL011 console), which enables access to the -`printf` family of functions in BL2. Platform security is initialized to allow -access to controlled components. The storage abstraction layer is initialized -which is used to load further bootloader images. - -#### BL3-0 (System Control Processor Firmware) image load - -Some systems have a separate System Control Processor (SCP) for power, clock, -reset and system control. BL2 loads the optional BL3-0 image from platform -storage into a platform-specific region of secure memory. The subsequent -handling of BL3-0 is platform specific. For example, on the Juno ARM development -platform port the image is transferred into SCP memory using the SCPI protocol -after being loaded in the trusted SRAM memory at address `0x04009000`. The SCP -executes BL3-0 and signals to the Application Processor (AP) for BL2 execution -to continue. - -#### BL3-1 (EL3 Runtime Firmware) image load - -BL2 loads the BL3-1 image from platform storage into a platform-specific address -in trusted SRAM. If there is not enough memory to load the image or image is -missing it leads to an assertion failure. If the BL3-1 image loads successfully, -BL2 updates the amount of trusted SRAM used and available for use by BL3-1. -This information is populated at a platform-specific memory address. - -#### BL3-2 (Secure-EL1 Payload) image load - -BL2 loads the optional BL3-2 image from platform storage into a platform- -specific region of secure memory. The image executes in the secure world. BL2 -relies on BL3-1 to pass control to the BL3-2 image, if present. Hence, BL2 -populates a platform-specific area of memory with the entrypoint/load-address -of the BL3-2 image. 
The value of the Saved Processor Status Register (`SPSR`) -for entry into BL3-2 is not determined by BL2, it is initialized by the -Secure-EL1 Payload Dispatcher (see later) within BL3-1, which is responsible for -managing interaction with BL3-2. This information is passed to BL3-1. - -#### BL3-3 (Non-trusted Firmware) image load - -BL2 loads the BL3-3 image (e.g. UEFI or other test or boot software) from -platform storage into non-secure memory as defined by the platform. - -BL2 relies on BL3-1 to pass control to BL3-3 once secure state initialization is -complete. Hence, BL2 populates a platform-specific area of memory with the -entrypoint and Saved Program Status Register (`SPSR`) of the normal world -software image. The entrypoint is the load address of the BL3-3 image. The -`SPSR` is determined as specified in Section 5.13 of the [PSCI PDD] [PSCI]. This -information is passed to BL3-1. - -#### BL3-1 (EL3 Runtime Firmware) execution - -BL2 execution continues as follows: - -1. BL2 passes control back to BL1 by raising an SMC, providing BL1 with the - BL3-1 entrypoint. The exception is handled by the SMC exception handler - installed by BL1. - -2. BL1 turns off the MMU and flushes the caches. It clears the - `SCTLR_EL3.M/I/C` bits, flushes the data cache to the point of coherency - and invalidates the TLBs. - -3. BL1 passes control to BL3-1 at the specified entrypoint at EL3. - - -### BL3-1 - -The image for this stage is loaded by BL2 and BL1 passes control to BL3-1 at -EL3. BL3-1 executes solely in trusted SRAM. BL3-1 is linked against and -loaded at a platform-specific base address (more information can be found later -in this document). The functionality implemented by BL3-1 is as follows. - -#### Architectural initialization - -Currently, BL3-1 performs a similar architectural initialization to BL1 as -far as system register settings are concerned. 
Since BL1 code resides in ROM, -architectural initialization in BL3-1 allows override of any previous -initialization done by BL1. BL3-1 creates page tables to address the first -4GB of physical address space and initializes the MMU accordingly. It initializes -a buffer of frequently used pointers, called per-CPU pointer cache, in memory for -faster access. Currently the per-CPU pointer cache contains only the pointer -to crash stack. It then replaces the exception vectors populated by BL1 with its -own. BL3-1 exception vectors implement more elaborate support for -handling SMCs since this is the only mechanism to access the runtime services -implemented by BL3-1 (PSCI for example). BL3-1 checks each SMC for validity as -specified by the [SMC calling convention PDD][SMCCC] before passing control to -the required SMC handler routine. BL3-1 programs the `CNTFRQ_EL0` register with -the clock frequency of the system counter, which is provided by the platform. - -#### Platform initialization - -BL3-1 performs detailed platform initialization, which enables normal world -software to function correctly. It also retrieves entrypoint information for -the BL3-3 image loaded by BL2 from the platform defined memory address populated -by BL2. BL3-1 also initializes UART0 (PL011 console), which enables -access to the `printf` family of functions in BL3-1. It enables the system -level implementation of the generic timer through the memory mapped interface. - -* GICv2 initialization: - - - Enable group0 interrupts in the GIC CPU interface. - - Configure group0 interrupts to be asserted as FIQs. - - Disable the legacy interrupt bypass mechanism. - - Configure the priority mask register to allow interrupts of all - priorities to be signaled to the CPU interface. - - Mark SGIs 8-15, the secure physical timer interrupt (#29) and the - trusted watchdog interrupt (#56) as group0 (secure). - - Target the trusted watchdog interrupt to CPU0. 
- - Enable these group0 interrupts in the GIC distributor. - - Configure all other interrupts as group1 (non-secure). - - Enable signaling of group0 interrupts in the GIC distributor. - -* GICv3 initialization: - - If a GICv3 implementation is available in the platform, BL3-1 initializes - the GICv3 in GICv2 emulation mode with settings as described for GICv2 - above. - -* Power management initialization: - - BL3-1 implements a state machine to track CPU and cluster state. The state - can be one of `OFF`, `ON_PENDING`, `SUSPEND` or `ON`. All secondary CPUs are - initially in the `OFF` state. The cluster that the primary CPU belongs to is - `ON`; any other cluster is `OFF`. BL3-1 initializes the data structures that - implement the state machine, including the locks that protect them. BL3-1 - accesses the state of a CPU or cluster immediately after reset and before - the data cache is enabled in the warm boot path. It is not currently - possible to use 'exclusive' based spinlocks, therefore BL3-1 uses locks - based on Lamport's Bakery algorithm instead. BL3-1 allocates these locks in - device memory by default. - -* Runtime services initialization: - - The runtime service framework and its initialization is described in the - "EL3 runtime services framework" section below. - - Details about the PSCI service are provided in the "Power State Coordination - Interface" section below. - -* BL3-2 (Secure-EL1 Payload) image initialization - - If a BL3-2 image is present then there must be a matching Secure-EL1 Payload - Dispatcher (SPD) service (see later for details). During initialization - that service must register a function to carry out initialization of BL3-2 - once the runtime services are fully initialized. BL3-1 invokes such a - registered function to initialize BL3-2 before running BL3-3. - - Details on BL3-2 initialization and the SPD's role are described in the - "Secure-EL1 Payloads and Dispatchers" section below. 
- -* BL3-3 (Non-trusted Firmware) execution - - BL3-1 initializes the EL2 or EL1 processor context for normal-world cold - boot, ensuring that no secure state information finds its way into the - non-secure execution state. BL3-1 uses the entrypoint information provided - by BL2 to jump to the Non-trusted firmware image (BL3-3) at the highest - available Exception Level (EL2 if available, otherwise EL1). - - -### Using alternative Trusted Boot Firmware in place of BL1 and BL2 - -Some platforms have existing implementations of Trusted Boot Firmware that -would like to use ARM Trusted Firmware BL3-1 for the EL3 Runtime Firmware. To -enable this firmware architecture it is important to provide a fully documented -and stable interface between the Trusted Boot Firmware and BL3-1. - -Future changes to the BL3-1 interface will be done in a backwards compatible -way, and this enables these firmware components to be independently enhanced/ -updated to develop and exploit new functionality. - -#### Required CPU state when calling `bl31_entrypoint()` during cold boot - -This function must only be called by the primary CPU, if this is called by any -other CPU the firmware will abort. - -On entry to this function the calling primary CPU must be executing in AArch64 -EL3, little-endian data access, and all interrupt sources masked: - - PSTATE.EL = 3 - PSTATE.RW = 1 - PSTATE.DAIF = 0xf - SCTLR_EL3.EE = 0 - -X0 and X1 can be used to pass information from the Trusted Boot Firmware to the -platform code in BL3-1: - - X0 : Reserved for common Trusted Firmware information - X1 : Platform specific information - -BL3-1 zero-init sections (e.g. `.bss`) should not contain valid data on entry, -these will be zero filled prior to invoking platform setup code. - -##### Use of the X0 and X1 parameters - -The parameters are platform specific and passed from `bl31_entrypoint()` to -`bl31_early_platform_setup()`. The value of these parameters is never directly -used by the common BL3-1 code. 
- -The convention is that `X0` conveys information regarding the BL3-1, BL3-2 and -BL3-3 images from the Trusted Boot firmware and `X1` can be used for other -platform specific purpose. This convention allows platforms which use ARM -Trusted Firmware's BL1 and BL2 images to transfer additional platform specific -information from Secure Boot without conflicting with future evolution of the -Trusted Firmware using `X0` to pass a `bl31_params` structure. - -BL3-1 common and SPD initialization code depends on image and entrypoint -information about BL3-3 and BL3-2, which is provided via BL3-1 platform APIs. -This information is required until the start of execution of BL3-3. This -information can be provided in a platform defined manner, e.g. compiled into -the platform code in BL3-1, or provided in a platform defined memory location -by the Trusted Boot firmware, or passed from the Trusted Boot Firmware via the -Cold boot Initialization parameters. This data may need to be cleaned out of -the CPU caches if it is provided by an earlier boot stage and then accessed by -BL3-1 platform code before the caches are enabled. - -ARM Trusted Firmware's BL2 implementation passes a `bl31_params` structure in -`X0` and the FVP port interprets this in the BL3-1 platform code. - -##### MMU, Data caches & Coherency - -BL3-1 does not depend on the enabled state of the MMU, data caches or -interconnect coherency on entry to `bl31_entrypoint()`. If these are disabled -on entry, these should be enabled during `bl31_plat_arch_setup()`. - -##### Data structures used in the BL3-1 cold boot interface - -These structures are designed to support compatibility and independent -evolution of the structures and the firmware images. 
For example, a version of -BL3-1 that can interpret the BL3-x image information from different versions of -BL2, a platform that uses an extended `entry_point_info` structure to convey -additional register information to BL3-1, or an ELF image loader that can convey -more details about the firmware images. - -To support these scenarios the structures are versioned and sized, which enables -BL3-1 to detect which information is present and respond appropriately. The -`param_header` is defined to capture this information: - - typedef struct param_header { - uint8_t type; /* type of the structure */ - uint8_t version; /* version of this structure */ - uint16_t size; /* size of this structure in bytes */ - uint32_t attr; /* attributes: unused bits SBZ */ - } param_header_t; - -The structures using this format are `entry_point_info`, `image_info` and -`bl31_params`. The code that allocates and populates these structures must set -the header fields appropriately, and the `SET_PARAM_HEAD()` macro is defined -to simplify this action. - -#### Required CPU state for BL3-1 Warm boot initialization - -When requesting a CPU power-on, or suspending a running CPU, ARM Trusted -Firmware provides the platform power management code with a Warm boot -initialization entry-point, to be invoked by the CPU immediately after the -reset handler. On entry to the Warm boot initialization function the calling -CPU must be in AArch64 EL3, little-endian data access and all interrupt sources -masked: - - PSTATE.EL = 3 - PSTATE.RW = 1 - PSTATE.DAIF = 0xf - SCTLR_EL3.EE = 0 - -The PSCI implementation will initialize the processor state and ensure that the -platform power management code is then invoked as required to initialize all -necessary system, cluster and CPU resources. 
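The versioned and sized header described in the cold boot interface section above can be illustrated with a short sketch. Only the `param_header_t` layout is taken from the text. The `demo_` names, the type and version values, and the acceptance rule (matching type, version and size at least what the consumer expects) are assumptions made for illustration and are not the firmware's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header layout as given in the text. */
typedef struct param_header {
    uint8_t  type;     /* type of the structure */
    uint8_t  version;  /* version of this structure */
    uint16_t size;     /* size of this structure in bytes */
    uint32_t attr;     /* attributes: unused bits SBZ */
} param_header_t;

/* Hypothetical type and version values, for illustration only. */
#define DEMO_PARAM_EP   0x01u
#define DEMO_VERSION_1  0x01u

/* Hypothetical structure that carries the header as its first member. */
typedef struct demo_ep_info {
    param_header_t h;
    uint64_t pc;
    uint32_t spsr;
} demo_ep_info_t;

/* Producer side: fill in the header fields in one place. */
static inline void demo_set_head(param_header_t *h, uint8_t type,
                                 uint8_t version, uint16_t size,
                                 uint32_t attr)
{
    assert(h != NULL);
    h->type = type;
    h->version = version;
    h->size = size;
    h->attr = attr;
}

/*
 * Consumer side: accept a structure only if it has the expected type and
 * is at least as new and at least as large as the consumer requires.
 * Returns 1 when the header is acceptable, 0 otherwise.
 */
static inline int demo_head_ok(const param_header_t *h, uint8_t type,
                               uint8_t min_version, uint16_t min_size)
{
    return h != NULL && h->type == type &&
           h->version >= min_version && h->size >= min_size;
}
```

A producer would call `demo_set_head(&ep.h, DEMO_PARAM_EP, DEMO_VERSION_1, sizeof(ep), 0)` before handing the structure over, and a consumer would call `demo_head_ok()` before reading any field beyond the header.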
- - -### Using BL3-1 as the CPU reset vector - -On some platforms the runtime firmware (BL3-x images) for the application -processors are loaded by trusted firmware running on a secure system processor -on the SoC, rather than by BL1 and BL2 running on the primary application -processor. For this type of SoC it is desirable for the application processor -to always reset to BL3-1 which eliminates the need for BL1 and BL2. - -ARM Trusted Firmware provides a build-time option `RESET_TO_BL31` that includes -some additional logic in the BL3-1 entrypoint to support this use case. - -In this configuration, the platform's Trusted Boot Firmware must ensure that -BL3-1 is loaded to its runtime address, which must match the CPU's RVBAR reset -vector address, before the application processor is powered on. Additionally, -platform software is responsible for loading the other BL3-x images required and -providing entry point information for them to BL3-1. Loading these images might -be done by the Trusted Boot Firmware or by platform code in BL3-1. - -The ARM FVP port supports the `RESET_TO_BL31` configuration, in which case the -`bl31.bin` image must be loaded to its run address in Trusted SRAM and all CPU -reset vectors be changed from the default `0x0` to this run address. See the -[User Guide] for details of running the FVP models in this way. - -This configuration requires some additions and changes in the BL3-1 -functionality: - -#### Determination of boot path - -In this configuration, BL3-1 uses the same reset framework and code as the one -described for BL1 above. On a warm boot a CPU is directed to the PSCI -implementation via a platform defined mechanism. On a cold boot, the platform -must place any secondary CPUs into a safe state while the primary CPU executes -a modified BL3-1 initialization, as described below. 
- -#### Architectural initialization - -As the first image to execute in this configuration BL3-1 must ensure that -interconnect coherency is enabled (if required) before enabling the MMU. - -#### Platform initialization - -In this configuration, when the CPU resets to BL3-1 there are no parameters -that can be passed in registers by previous boot stages. Instead, the platform -code in BL3-1 needs to know, or be able to determine, the location of the BL3-2 -(if required) and BL3-3 images and provide this information in response to the -`bl31_plat_get_next_image_ep_info()` function. - -As the first image to execute in this configuration BL3-1 must also ensure that -any security initialisation, for example programming a TrustZone address space -controller, is carried out during early platform initialisation. - - -3. EL3 runtime services framework ----------------------------------- - -Software executing in the non-secure state and in the secure state at exception -levels lower than EL3 will request runtime services using the Secure Monitor -Call (SMC) instruction. These requests will follow the convention described in -the SMC Calling Convention PDD ([SMCCC]). The [SMCCC] assigns function -identifiers to each SMC request and describes how arguments are passed and -returned. - -The EL3 runtime services framework enables the development of services by -different providers that can be easily integrated into final product firmware. -The following sections describe the framework which facilitates the -registration, initialization and use of runtime services in EL3 Runtime -Firmware (BL3-1). - -The design of the runtime services depends heavily on the concepts and -definitions described in the [SMCCC], in particular SMC Function IDs, Owning -Entity Numbers (OEN), Fast and Standard calls, and the SMC32 and SMC64 calling -conventions. Please refer to that document for more detailed explanation of -these terms. 
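The terms listed above map onto fixed bit fields of the 32-bit SMC Function ID. The sketch below decodes them, assuming the field positions defined by the SMC Calling Convention: bit 31 for fast/standard call, bit 30 for SMC64/SMC32, bits 29:24 for the OEN and bits 15:0 for the function number. The macro and function names are illustrative and do not come from the firmware sources.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions of an SMC Function ID (names are illustrative). */
#define FID_TYPE_SHIFT  31u      /* 1 = fast call, 0 = standard call */
#define FID_CC_SHIFT    30u      /* 1 = SMC64, 0 = SMC32 */
#define FID_OEN_SHIFT   24u      /* Owning Entity Number, 6 bits */
#define FID_OEN_MASK    0x3fu
#define FID_NUM_MASK    0xffffu  /* function number, 16 bits */

static inline unsigned int fid_is_fast(uint32_t fid)
{
    return (fid >> FID_TYPE_SHIFT) & 1u;
}

static inline unsigned int fid_is_smc64(uint32_t fid)
{
    return (fid >> FID_CC_SHIFT) & 1u;
}

static inline unsigned int fid_oen(uint32_t fid)
{
    return (fid >> FID_OEN_SHIFT) & FID_OEN_MASK;
}

static inline unsigned int fid_num(uint32_t fid)
{
    return fid & FID_NUM_MASK;
}

/* Compose a Function ID from its fields; the other bits are left as zero. */
static inline uint32_t fid_make(unsigned int fast, unsigned int smc64,
                                unsigned int oen, unsigned int num)
{
    assert(oen <= FID_OEN_MASK && num <= FID_NUM_MASK);
    return ((uint32_t)(fast & 1u) << FID_TYPE_SHIFT) |
           ((uint32_t)(smc64 & 1u) << FID_CC_SHIFT) |
           ((uint32_t)oen << FID_OEN_SHIFT) |
           (uint32_t)num;
}
```

For instance, the PSCI `CPU_ON` SMC64 identifier `0xC4000003` decodes as a fast call using the SMC64 convention, with OEN 4 (standard service calls) and function number 3.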
- -The following runtime services are expected to be implemented first. They have -not all been instantiated in the current implementation. - -1. Standard service calls - - This service is for management of the entire system. The Power State - Coordination Interface ([PSCI]) is the first set of standard service calls - defined by ARM (see PSCI section later). - - NOTE: Currently this service is called PSCI since there are no other - defined standard service calls. - -2. Secure-EL1 Payload Dispatcher service - - If a system runs a Trusted OS or other Secure-EL1 Payload (SP) then - it also requires a _Secure Monitor_ at EL3 to switch the EL1 processor - context between the normal world (EL1/EL2) and trusted world (Secure-EL1). - The Secure Monitor will make these world switches in response to SMCs. The - [SMCCC] provides for such SMCs with the Trusted OS Call and Trusted - Application Call OEN ranges. - - The interface between the EL3 Runtime Firmware and the Secure-EL1 Payload is - not defined by the [SMCCC] or any other standard. As a result, each - Secure-EL1 Payload requires a specific Secure Monitor that runs as a runtime - service - within ARM Trusted Firmware this service is referred to as the - Secure-EL1 Payload Dispatcher (SPD). - - ARM Trusted Firmware provides a Test Secure-EL1 Payload (TSP) and its - associated Dispatcher (TSPD). Details of SPD design and TSP/TSPD operation - are described in the "Secure-EL1 Payloads and Dispatchers" section below. - -3. CPU implementation service - - This service will provide an interface to CPU implementation specific - services for a given platform e.g. access to processor errata workarounds. - This service is currently unimplemented. - -Additional services for ARM Architecture, SiP and OEM calls can be implemented. -Each implemented service handles a range of SMC function identifiers as -described in the [SMCCC]. 
- - -### Registration - -A runtime service is registered using the `DECLARE_RT_SVC()` macro, specifying -the name of the service, the range of OENs covered, the type of service and -initialization and call handler functions. This macro instantiates a `const -struct rt_svc_desc` for the service with these details (see `runtime_svc.h`). -This structure is allocated in a special ELF section `rt_svc_descs`, enabling -the framework to find all service descriptors included into BL3-1. - -The specific service for a SMC Function is selected based on the OEN and call -type of the Function ID, and the framework uses that information in the service -descriptor to identify the handler for the SMC Call. - -The service descriptors do not include information to identify the precise set -of SMC function identifiers supported by this service implementation, the -security state from which such calls are valid nor the capability to support -64-bit and/or 32-bit callers (using SMC32 or SMC64). Responding appropriately -to these aspects of a SMC call is the responsibility of the service -implementation, the framework is focused on integration of services from -different providers and minimizing the time taken by the framework before the -service handler is invoked. - -Details of the parameters, requirements and behavior of the initialization and -call handling functions are provided in the following sections. - - -### Initialization - -`runtime_svc_init()` in `runtime_svc.c` initializes the runtime services -framework running on the primary CPU during cold boot as part of the BL3-1 -initialization. This happens prior to initializing a Trusted OS and running -Normal world boot firmware that might in turn use these services. -Initialization involves validating each of the declared runtime service -descriptors, calling the service initialization function and populating the -index used for runtime lookup of the service. 
- -The BL3-1 linker script collects all of the declared service descriptors into a -single array and defines symbols that allow the framework to locate and traverse -the array, and determine its size. - -The framework does basic validation of each descriptor to halt firmware -initialization if service declaration errors are detected. The framework does -not check descriptors for the following error conditions, and may behave in an -unpredictable manner under such scenarios: - -1. Overlapping OEN ranges -2. Multiple descriptors for the same range of OENs and `call_type` -3. Incorrect range of owning entity numbers for a given `call_type` - -Once validated, the service `init()` callback is invoked. This function carries -out any essential EL3 initialization before servicing requests. The `init()` -function is only invoked on the primary CPU during cold boot. If the service -uses per-CPU data this must either be initialized for all CPUs during this call, -or be done lazily when a CPU first issues an SMC call to that service. If -`init()` returns anything other than `0`, this is treated as an initialization -error and the service is ignored: this does not cause the firmware to halt. - -The OEN and call type fields present in the SMC Function ID cover a total of -128 distinct services, but in practice a single descriptor can cover a range of -OENs, e.g. SMCs to call a Trusted OS function. To optimize the lookup of a -service handler, the framework uses an array of 128 indices that map every -distinct OEN/call-type combination either to one of the declared services or to -a value indicating that the service is not handled. This `rt_svc_descs_indices[]` array is -populated for all of the OENs covered by a service after the service `init()` -function has reported success. So a service that fails to initialize will never -have its `handle()` function invoked.
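The index population and lookup described above can be sketched in plain C. This is a simplified illustration, not the framework code: the descriptor type, the `NO_SERVICE` marker, the sample descriptors and the helper names (`unique_oen()`, `rt_svc_init()`, `lookup_service()`) are all hypothetical. Only the bit positions (bit[31] for the call type, bits[29:24] for the OEN) and the 128-entry index come from the text.

```c
#include <stdint.h>

#define MAX_RT_SVCS  128    /* 2 call types x 64 OENs */
#define NO_SERVICE   0xFFu  /* hypothetical "not handled" marker */

/* Simplified, hypothetical stand-in for a runtime service descriptor. */
typedef struct {
    uint8_t start_oen;
    uint8_t end_oen;
    uint8_t call_type;      /* 0 = standard call, 1 = fast call */
    int32_t (*init)(void);
} svc_desc_t;

static int32_t init_ok(void)   { return 0; }
static int32_t init_fail(void) { return 1; }

/* Illustrative descriptors; the OEN ranges are examples only. */
static const svc_desc_t descs[] = {
    { 4,  4,  1, init_ok   },   /* index 0: standard service, fast calls */
    { 50, 63, 1, init_ok   },   /* index 1: Trusted OS range, fast calls */
    { 2,  2,  1, init_fail },   /* index 2: service whose init() fails   */
};

static uint8_t indices[MAX_RT_SVCS];

/* Combine the call type (1 bit) and OEN (6 bits) into a 7-bit index. */
static unsigned int unique_oen(unsigned int oen, unsigned int call_type)
{
    return ((call_type & 0x1u) << 6) | (oen & 0x3Fu);
}

/* Mark every slot unhandled, then map the OENs of each service that
 * initialized successfully. A failing init() leaves its slots unmapped. */
static void rt_svc_init(void)
{
    for (unsigned int i = 0; i < MAX_RT_SVCS; i++)
        indices[i] = NO_SERVICE;

    for (unsigned int i = 0; i < sizeof(descs) / sizeof(descs[0]); i++) {
        if (descs[i].init() != 0)
            continue;
        for (unsigned int oen = descs[i].start_oen;
             oen <= descs[i].end_oen; oen++)
            indices[unique_oen(oen, descs[i].call_type)] = (uint8_t)i;
    }
}

/* Return the descriptor index for an SMC Function ID, or -1 if no
 * service handles it. */
static int lookup_service(uint32_t fid)
{
    uint8_t idx = indices[unique_oen((fid >> 24) & 0x3Fu, fid >> 31)];

    return (idx == NO_SERVICE) ? -1 : (int)idx;
}
```

After `rt_svc_init()`, a fast call with OEN 4 such as `0x84000000` resolves to descriptor 0, while `0x82000000` resolves to no service because the `init()` of its descriptor failed.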
- -The following figure shows how the `rt_svc_descs_indices[]` index maps the SMC -Function ID call type and OEN onto a specific service handler in the -`rt_svc_descs[]` array. - -![Image 1](diagrams/rt-svc-descs-layout.png?raw=true) - - -### Handling an SMC - -When the EL3 runtime services framework receives a Secure Monitor Call, the SMC -Function ID is passed in W0 from the lower exception level (as per the -[SMCCC]). If the calling register width is AArch32, it is invalid to invoke an -SMC Function which indicates the SMC64 calling convention: such calls are -ignored and return the Unknown SMC Function Identifier result code `0xFFFFFFFF` -in R0/X0. - -Bit[31] (fast/standard call) and bits[29:24] (owning entity number) of the SMC -Function ID are combined to index into the `rt_svc_descs_indices[]` array. The -resulting value might indicate a service that has no handler; in this case the -framework will also report an Unknown SMC Function ID. Otherwise, the value is -used as a further index into the `rt_svc_descs[]` array to locate the required -service and handler. - -The service's `handle()` callback is provided with five of the SMC parameters -directly; the others are saved into memory for retrieval (if needed) by the -handler. The handler is also provided with an opaque `handle` for use with the -supporting library for parameter retrieval, setting return values and context -manipulation; and with `flags` indicating the security state of the caller. The -framework finally sets up the execution stack for the handler, and invokes the -service's `handle()` function. - -On return from the handler the result registers are populated in X0-X3 before -restoring the stack and CPU state and returning from the original SMC. - - -4. Power State Coordination Interface --------------------------------------- - -TODO: Provide design walkthrough of PSCI implementation. - -The PSCI v1.0 specification categorizes APIs as optional and mandatory.
All the -mandatory APIs in PSCI v1.0 and all the APIs in PSCI v0.2 draft specification -[Power State Coordination Interface PDD] [PSCI] are implemented. The table lists -the PSCI v1.0 APIs and their support in generic code. - -An API implementation might have a dependency on platform code e.g. CPU_SUSPEND -requires the platform to export a part of the implementation. Hence the level -of support of the mandatory APIs depends upon the support exported by the -platform port as well. The Juno and FVP (all variants) platforms export all the -required support. - -| PSCI v1.0 API |Supported| Comments | -|:----------------------|:--------|:------------------------------------------| -|`PSCI_VERSION` | Yes | The version returned is 1.0 | -|`CPU_SUSPEND` | Yes* | The original `power_state` format is used | -|`CPU_OFF` | Yes* | | -|`CPU_ON` | Yes* | | -|`AFFINITY_INFO` | Yes | | -|`MIGRATE` | Yes** | | -|`MIGRATE_INFO_TYPE` | Yes** | | -|`MIGRATE_INFO_CPU` | Yes** | | -|`SYSTEM_OFF` | Yes* | | -|`SYSTEM_RESET` | Yes* | | -|`PSCI_FEATURES` | Yes | | -|`CPU_FREEZE` | No | | -|`CPU_DEFAULT_SUSPEND` | No | | -|`CPU_HW_STATE` | No | | -|`SYSTEM_SUSPEND` | Yes* | | -|`PSCI_SET_SUSPEND_MODE`| No | | -|`PSCI_STAT_RESIDENCY` | No | | -|`PSCI_STAT_COUNT` | No | | - -*Note : These PSCI APIs require platform power management hooks to be -registered with the generic PSCI code to be supported. - -**Note : These PSCI APIs require appropriate Secure Payload Dispatcher -hooks to be registered with the generic PSCI code to be supported. - - -5. Secure-EL1 Payloads and Dispatchers ---------------------------------------- - -On a production system that includes a Trusted OS running in Secure-EL1/EL0, -the Trusted OS is coupled with a companion runtime service in the BL3-1 -firmware. This service is responsible for the initialisation of the Trusted -OS and all communications with it. The Trusted OS is the BL3-2 stage of the -boot flow in ARM Trusted Firmware. 
The firmware will attempt to locate, load -and execute a BL3-2 image. - -ARM Trusted Firmware uses a more general term for the BL3-2 software that runs -at Secure-EL1 - the _Secure-EL1 Payload_ - as it is not always a Trusted OS. - -The ARM Trusted Firmware provides a Test Secure-EL1 Payload (TSP) and a Test -Secure-EL1 Payload Dispatcher (TSPD) service as an example of how a Trusted OS -is supported on a production system using the Runtime Services Framework. On -such a system, the Test BL3-2 image and service are replaced by the Trusted OS -and its dispatcher service. The ARM Trusted Firmware build system expects that -the dispatcher will define the build flag `NEED_BL32` to enable it to include -the BL3-2 in the build either as a binary or to compile from source depending -on whether the `BL32` build option is specified or not. - -The TSP runs in Secure-EL1. It is designed to demonstrate synchronous -communication with the normal-world software running in EL1/EL2. Communication -is initiated by the normal-world software - -* either directly through a Fast SMC (as defined in the [SMCCC]) - -* or indirectly through a [PSCI] SMC. The [PSCI] implementation in turn - informs the TSPD about the requested power management operation. This allows - the TSP to prepare for or respond to the power state change - -The TSPD service is responsible for: - -* Initializing the TSP - -* Routing requests and responses between the secure and the non-secure - states during the two types of communications just described - -### Initializing a BL3-2 Image - -The Secure-EL1 Payload Dispatcher (SPD) service is responsible for initializing -the BL3-2 image. It needs access to the information passed by BL2 to BL3-1 to do -so. This is provided by: - - entry_point_info_t *bl31_plat_get_next_image_ep_info(uint32_t); - -which returns a reference to the `entry_point_info` structure corresponding to -the image which will be run in the specified security state.
The SPD uses this -API to get entry point information for the SECURE image, BL3-2. - -In the absence of a BL3-2 image, BL3-1 passes control to the normal world -bootloader image (BL3-3). When the BL3-2 image is present, it is typical -that the SPD wants control to be passed to BL3-2 first and then later to BL3-3. - -To do this the SPD has to register a BL3-2 initialization function during -initialization of the SPD service. The BL3-2 initialization function has this -prototype: - - int32_t init(); - -and is registered using the `bl31_register_bl32_init()` function. - -Trusted Firmware supports two approaches for the SPD to pass control to BL3-2 -before returning through EL3 and running the non-trusted firmware (BL3-3): - -1. In the BL3-2 setup function, use `bl31_set_next_image_type()` to - request that the exit from `bl31_main()` is to the BL3-2 entrypoint in - Secure-EL1. BL3-1 will exit to BL3-2 using the asynchronous method by - calling `bl31_prepare_next_image_entry()` and `el3_exit()`. - - When the BL3-2 has completed initialization at Secure-EL1, it returns to - BL3-1 by issuing an SMC, using a Function ID allocated to the SPD. On - receipt of this SMC, the SPD service handler should switch the CPU context - from trusted to normal world and use the `bl31_set_next_image_type()` and - `bl31_prepare_next_image_entry()` functions to set up the initial return to - the normal world firmware BL3-3. On return from the handler the framework - will exit to EL2 and run BL3-3. - -2. The BL3-2 setup function registers an initialization function using - `bl31_register_bl32_init()` which provides a SPD-defined mechanism to - invoke a 'world-switch synchronous call' to Secure-EL1 to run the BL3-2 - entrypoint. - NOTE: The Test SPD service included with the Trusted Firmware provides one - implementation of such a mechanism.
- - On completion BL3-2 returns control to BL3-1 via a SMC, and on receipt the - SPD service handler invokes the synchronous call return mechanism to return - to the BL3-2 initialization function. On return from this function, - `bl31_main()` will set up the return to the normal world firmware BL3-3 and - continue the boot process in the normal world. - - -6. Crash Reporting in BL3-1 ----------------------------- - -The BL3-1 implements a scheme for reporting the processor state when an unhandled -exception is encountered. The reporting mechanism attempts to preserve all the -register contents and report them via the default serial output. The general purpose -registers, EL3, Secure EL1 and some EL2 state registers are reported. - -A dedicated per-CPU crash stack is maintained by BL3-1 and this is retrieved via -the per-CPU pointer cache. The implementation attempts to minimise the memory -required for this feature. The file `crash_reporting.S` contains the -implementation for crash reporting. - -A sample crash output is shown below.
- - x0 :0x000000004F00007C - x1 :0x0000000007FFFFFF - x2 :0x0000000004014D50 - x3 :0x0000000000000000 - x4 :0x0000000088007998 - x5 :0x00000000001343AC - x6 :0x0000000000000016 - x7 :0x00000000000B8A38 - x8 :0x00000000001343AC - x9 :0x00000000000101A8 - x10 :0x0000000000000002 - x11 :0x000000000000011C - x12 :0x00000000FEFDC644 - x13 :0x00000000FED93FFC - x14 :0x0000000000247950 - x15 :0x00000000000007A2 - x16 :0x00000000000007A4 - x17 :0x0000000000247950 - x18 :0x0000000000000000 - x19 :0x00000000FFFFFFFF - x20 :0x0000000004014D50 - x21 :0x000000000400A38C - x22 :0x0000000000247950 - x23 :0x0000000000000010 - x24 :0x0000000000000024 - x25 :0x00000000FEFDC868 - x26 :0x00000000FEFDC86A - x27 :0x00000000019EDEDC - x28 :0x000000000A7CFDAA - x29 :0x0000000004010780 - x30 :0x000000000400F004 - scr_el3 :0x0000000000000D3D - sctlr_el3 :0x0000000000C8181F - cptr_el3 :0x0000000000000000 - tcr_el3 :0x0000000080803520 - daif :0x00000000000003C0 - mair_el3 :0x00000000000004FF - spsr_el3 :0x00000000800003CC - elr_el3 :0x000000000400C0CC - ttbr0_el3 :0x00000000040172A0 - esr_el3 :0x0000000096000210 - sp_el3 :0x0000000004014D50 - far_el3 :0x000000004F00007C - spsr_el1 :0x0000000000000000 - elr_el1 :0x0000000000000000 - spsr_abt :0x0000000000000000 - spsr_und :0x0000000000000000 - spsr_irq :0x0000000000000000 - spsr_fiq :0x0000000000000000 - sctlr_el1 :0x0000000030C81807 - actlr_el1 :0x0000000000000000 - cpacr_el1 :0x0000000000300000 - csselr_el1 :0x0000000000000002 - sp_el1 :0x0000000004028800 - esr_el1 :0x0000000000000000 - ttbr0_el1 :0x000000000402C200 - ttbr1_el1 :0x0000000000000000 - mair_el1 :0x00000000000004FF - amair_el1 :0x0000000000000000 - tcr_el1 :0x0000000000003520 - tpidr_el1 :0x0000000000000000 - tpidr_el0 :0x0000000000000000 - tpidrro_el0 :0x0000000000000000 - dacr32_el2 :0x0000000000000000 - ifsr32_el2 :0x0000000000000000 - par_el1 :0x0000000000000000 - far_el1 :0x0000000000000000 - afsr0_el1 :0x0000000000000000 - afsr1_el1 :0x0000000000000000 - contextidr_el1 
:0x0000000000000000 - vbar_el1 :0x0000000004027000 - cntp_ctl_el0 :0x0000000000000000 - cntp_cval_el0 :0x0000000000000000 - cntv_ctl_el0 :0x0000000000000000 - cntv_cval_el0 :0x0000000000000000 - cntkctl_el1 :0x0000000000000000 - fpexc32_el2 :0x0000000004000700 - sp_el0 :0x0000000004010780 - -7. Guidelines for Reset Handlers ---------------------------------- - -Trusted Firmware implements a framework that allows CPU and platform ports to -perform actions immediately after a CPU is released from reset in both the cold -and warm boot paths. This is done by calling the `reset_handler()` function in -both the BL1 and BL3-1 images. It in turn calls the platform and CPU specific -reset handling functions. - -Details for implementing a CPU specific reset handler can be found in -Section 8. Details for implementing a platform specific reset handler can be -found in the [Porting Guide] (see the `plat_reset_handler()` function). - -When adding functionality to a reset handler, the following points should be -kept in mind. - -1. The first reset handler in the system exists either in a ROM image - (e.g. BL1), or BL3-1 if `RESET_TO_BL31` is true. This may be detected at - compile time using the constant `FIRST_RESET_HANDLER_CALL`. - -2. When considering ROM images, it's important to consider non TF-based ROMs - and ROMs based on previous versions of the TF code. - -3. If the functionality should be applied to a ROM and there is no possibility - of a ROM being used that does not apply the functionality (or equivalent), - then the functionality should be applied within a `#if - FIRST_RESET_HANDLER_CALL` block. - -4. If the functionality should execute in BL3-1 in order to override or - supplement a ROM version of the functionality, then the functionality - should be applied in the `#else` part of a `#if FIRST_RESET_HANDLER_CALL` - block. - -5.
If the functionality should be applied to a ROM but there is a possibility - of ROMs being used that do not apply the functionality, then the - functionality should be applied outside of a `FIRST_RESET_HANDLER_CALL` - block, so that BL3-1 has an opportunity to apply the functionality instead. - In this case, additional code may be needed to cope with different ROMs - that do or do not apply the functionality. - - -8. CPU specific operations framework ------------------------------ - -Certain aspects of the ARMv8 architecture are implementation defined, -that is, certain behaviours are not architecturally defined, but must be defined -and documented by individual processor implementations. The ARM Trusted -Firmware implements a framework which categorises the common implementation -defined behaviours and allows a processor to export its implementation of that -behaviour. The categories are: - -1. Processor specific reset sequence. - -2. Processor specific power down sequences. - -3. Processor specific register dumping as a part of crash reporting. - -Each of the above categories fulfils a different requirement. - -1. allows any processor specific initialization before the caches and MMU - are turned on, like implementation of errata workarounds, entry into - the intra-cluster coherency domain etc. - -2. allows each processor to implement the power down sequence mandated in - its Technical Reference Manual (TRM). - -3. allows a processor to provide additional information to the developer - in the event of a crash, for example Cortex-A53 has registers which - can expose the data cache contents. - -Please note that only 2. is mandated by the TRM. - -The CPU specific operations framework scales to accommodate a large number of -different CPUs during power down and reset handling. The platform can specify -any CPU optimization it wants to enable for each CPU. 
It can also specify -the CPU errata workarounds to be applied for each CPU type during reset -handling by defining CPU errata compile time macros. Details on these macros -can be found in the [cpu-specific-build-macros.md][CPUBM] file. - -The CPU specific operations framework depends on the `cpu_ops` structure which -needs to be exported for each type of CPU in the platform. It is defined in -`include/lib/cpus/aarch64/cpu_macros.S` and has the following fields : `midr`, -`reset_func()`, `core_pwr_dwn()`, `cluster_pwr_dwn()` and `cpu_reg_dump()`. - -The CPU specific files in `lib/cpus` export a `cpu_ops` data structure with -suitable handlers for that CPU. For example, `lib/cpus/cortex_a53.S` exports -the `cpu_ops` for Cortex-A53 CPU. According to the platform configuration, -these CPU specific files must be included in the build by the platform -makefile. The generic CPU specific operations framework code exists in -`lib/cpus/aarch64/cpu_helpers.S`. - -### CPU specific Reset Handling - -After a reset, the state of the CPU when it calls the generic reset handler is: -MMU turned off, both instruction and data caches turned off and not part -of any coherency domain. - -The BL entrypoint code first invokes the `plat_reset_handler()` to allow -the platform to perform any system initialization required and any system -errata workarounds that need to be applied. The `get_cpu_ops_ptr()` reads -the current CPU midr, finds the matching `cpu_ops` entry in the `cpu_ops` -array and returns it. Note that only the part number and implementer fields -in midr are used to find the matching `cpu_ops` entry. The `reset_func()` in -the returned `cpu_ops` is then invoked which executes the required reset -handling for that CPU and also any errata workarounds enabled by the platform. -This function must preserve the values of general purpose registers x20 to x29. - -Refer to Section "Guidelines for Reset Handlers" for general guidelines -regarding placement of code in a reset handler.
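The midr matching described above can be illustrated with a short C sketch. The framework itself implements this lookup in assembly; the C types, the table contents and the `find_cpu_ops()` helper here are hypothetical and only show the masking idea. The mask keeps the implementer field (bits[31:24]) and the part number field (bits[15:4]) of the midr and ignores the variant, architecture and revision fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep implementer (bits[31:24]) and part number (bits[15:4]) only. */
#define MIDR_MATCH_MASK 0xFF00FFF0u

/* Hypothetical C view of a cpu_ops entry; handlers omitted for brevity. */
typedef struct {
    uint32_t midr;
    const char *name;
} cpu_ops_t;

/* Illustrative table: ARM implementer (0x41), part numbers 0xD03 and 0xD07. */
static const cpu_ops_t cpu_ops_table[] = {
    { 0x410FD030u, "cortex_a53" },
    { 0x410FD070u, "cortex_a57" },
};

/* Return the entry whose implementer and part number match the given
 * midr value, or NULL if the CPU type is not in the table. */
static const cpu_ops_t *find_cpu_ops(uint32_t midr)
{
    size_t n = sizeof(cpu_ops_table) / sizeof(cpu_ops_table[0]);

    for (size_t i = 0; i < n; i++) {
        uint32_t entry = cpu_ops_table[i].midr & MIDR_MATCH_MASK;

        if ((midr & MIDR_MATCH_MASK) == entry)
            return &cpu_ops_table[i];
    }
    return NULL;
}
```

Because the variant and revision fields are masked out, a midr of `0x410FD034` and one of `0x410FD030` select the same entry, so a single `cpu_ops` structure serves every revision of a given CPU type.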
- -### CPU specific power down sequence - -During the BL3-1 initialization sequence, the pointer to the matching `cpu_ops` -entry is stored in per-CPU data by `init_cpu_ops()` so that it can be quickly -retrieved during power down sequences. - -The PSCI service, upon receiving a power down request, determines the highest -affinity level at which to execute power down sequence for a particular CPU and -invokes the corresponding 'prepare' power down handler in the CPU specific -operations framework. For example, when a CPU executes a power down for affinity -level 0, the `prepare_core_pwr_dwn()` retrieves the `cpu_ops` pointer from the -per-CPU data and the corresponding `core_pwr_dwn()` is invoked. Similarly when -a CPU executes power down at affinity level 1, the `prepare_cluster_pwr_dwn()` -retrieves the `cpu_ops` pointer and the corresponding `cluster_pwr_dwn()` is -invoked. - -At runtime the platform hooks for power down are invoked by the PSCI service to -perform platform specific operations during a power down sequence, for example -turning off CCI coherency during a cluster power down. - -### CPU specific register reporting during crash - -If the crash reporting is enabled in BL3-1, when a crash occurs, the crash -reporting framework calls `do_cpu_reg_dump` which retrieves the matching -`cpu_ops` using `get_cpu_ops_ptr()` function. The `cpu_reg_dump()` in -`cpu_ops` is invoked, which then returns the CPU specific register values to -be reported and a pointer to the ASCII list of register names in a format -expected by the crash reporting framework. - - -9. Memory layout of BL images ------------------------------ - -Each bootloader image can be divided in 2 parts: - - * the static contents of the image. These are data actually stored in the - binary on the disk. In the ELF terminology, they are called `PROGBITS` - sections; - - * the run-time contents of the image. These are data that don't occupy any - space in the binary on the disk. 
The ELF binary just contains some - metadata indicating where these data will be stored at run-time and the - corresponding sections need to be allocated and initialized at run-time. - In the ELF terminology, they are called `NOBITS` sections. - -All PROGBITS sections are grouped together at the beginning of the image, -followed by all NOBITS sections. This is true for all Trusted Firmware images -and it is governed by the linker scripts. This ensures that the raw binary -images are as small as possible. If a NOBITS section would sneak in between -PROGBITS sections then the resulting binary file would contain a bunch of zero -bytes at the location of this NOBITS section, making the image unnecessarily -bigger. Smaller images allow faster loading from the FIP to the main memory. - -### Linker scripts and symbols - -Each bootloader stage image layout is described by its own linker script. The -linker scripts export some symbols into the program symbol table. Their values -correspond to particular addresses. The trusted firmware code can refer to these -symbols to figure out the image memory layout. - -Linker symbols follow the following naming convention in the trusted firmware. - -* `__<SECTION>_START__` - - Start address of a given section named `<SECTION>`. - -* `__<SECTION>_END__` - - End address of a given section named `<SECTION>`. If there is an alignment - constraint on the section's end address then `__<SECTION>_END__` corresponds - to the end address of the section's actual contents, rounded up to the right - boundary. Refer to the value of `__<SECTION>_UNALIGNED_END__` to know the - actual end address of the section's contents. - -* `__<SECTION>_UNALIGNED_END__` - - End address of a given section named `<SECTION>` without any padding or - rounding up due to some alignment constraint. - -* `__<SECTION>_SIZE__` - - Size (in bytes) of a given section named `<SECTION>`. 
If there is an - alignment constraint on the section's end address then `__<SECTION>_SIZE__` - corresponds to the size of the section's actual contents, rounded up to the - right boundary. In other words, `__<SECTION>_SIZE__ = __<SECTION>_END__ - - __<SECTION>_START__`. Refer to the value of `__<SECTION>_UNALIGNED_SIZE__` - to know the actual size of the section's contents. - -* `__<SECTION>_UNALIGNED_SIZE__` - - Size (in bytes) of a given section named `<SECTION>` without any padding or - rounding up due to some alignment constraint. In other words, - `__<SECTION>_UNALIGNED_SIZE__ = __<SECTION>_UNALIGNED_END__ - - __<SECTION>_START__`. - -Some of the linker symbols are mandatory as the trusted firmware code relies on -them to be defined. They are listed in the following subsections. Some of them -must be provided for each bootloader stage and some are specific to a given -bootloader stage. - -The linker scripts define some extra, optional symbols. They are not actually -used by any code but they help in understanding the bootloader images' memory -layout as they are easy to spot in the link map files. - -#### Common linker symbols - -Early setup code needs to know the extents of the BSS section to zero-initialise -it before executing any C code. The following linker symbols are defined for -this purpose: - -* `__BSS_START__` This address must be aligned on a 16-byte boundary. -* `__BSS_SIZE__` - -Similarly, the coherent memory section (if enabled) must be zero-initialised. -Also, the MMU setup code needs to know the extents of this section to set the -right memory attributes for it. The following linker symbols are defined for -this purpose: - -* `__COHERENT_RAM_START__` This address must be aligned on a page-size boundary. -* `__COHERENT_RAM_END__` This address must be aligned on a page-size boundary.
-* `__COHERENT_RAM_UNALIGNED_SIZE__` - -#### BL1's linker symbols - -BL1's early setup code needs to know the extents of the .data section to -relocate it from ROM to RAM before executing any C code. The following linker -symbols are defined for this purpose: - -* `__DATA_ROM_START__` This address must be aligned on a 16-byte boundary. -* `__DATA_RAM_START__` This address must be aligned on a 16-byte boundary. -* `__DATA_SIZE__` - -BL1's platform setup code needs to know the extents of its read-write data -region to figure out its memory layout. The following linker symbols are defined -for this purpose: - -* `__BL1_RAM_START__` This is the start address of BL1 RW data. -* `__BL1_RAM_END__` This is the end address of BL1 RW data. - -#### BL2's, BL3-1's and TSP's linker symbols - -BL2, BL3-1 and TSP need to know the extents of their read-only section to set -the right memory attributes for this memory region in their MMU setup code. The -following linker symbols are defined for this purpose: - -* `__RO_START__` -* `__RO_END__` - -### How to choose the right base addresses for each bootloader stage image - -There is currently no support for dynamic image loading in the Trusted Firmware. -This means that all bootloader images need to be linked against their ultimate -runtime locations and the base addresses of each image must be chosen carefully -such that images don't overlap each other in an undesired way. As the code -grows, the base addresses might need adjustments to cope with the new memory -layout. - -The memory layout is completely specific to the platform and so there is no -general recipe for choosing the right base addresses for each bootloader image. -However, there are tools to aid in understanding the memory layout. These are -the link map files: `build/<platform>/<build-type>/bl<x>/bl<x>.map`, with `<x>` -being the stage bootloader. They provide a detailed view of the memory usage of -each image. 
Among other useful information, they provide the end address of -each image. - -* `bl1.map` link map file provides `__BL1_RAM_END__` address. -* `bl2.map` link map file provides `__BL2_END__` address. -* `bl31.map` link map file provides `__BL31_END__` address. -* `bl32.map` link map file provides `__BL32_END__` address. - -For each bootloader image, the platform code must provide its start address -as well as a limit address that it must not overstep. The latter is used in the -linker scripts to check that the image doesn't grow past that address. If that -happens, the linker will issue a message similar to the following: - - aarch64-none-elf-ld: BLx has exceeded its limit. - -Additionally, if the platform memory layout implies some image overlaying like -on FVP, BL3-1 and TSP need to know the limit address that their PROGBITS -sections must not overstep. The platform code must provide those. - - -#### Memory layout on ARM FVPs - -The following list describes the memory layout on the FVP: - -* A 4KB page of shared memory is used to store the entrypoint mailboxes - and the parameters passed between bootloaders. The shared memory is located - at the base of the Trusted SRAM. The amount of Trusted SRAM available to - load the bootloader images will be reduced by the size of the shared memory. - -* BL1 is originally sitting in the Trusted ROM at address `0x0`. Its - read-write data are relocated at the top of the Trusted SRAM at runtime. - -* BL3-1 is loaded at the top of the Trusted SRAM, such that its NOBITS - sections will overwrite BL1 R/W data. - -* BL2 is loaded below BL3-1. - -* BL3-2 can be loaded in one of the following locations: - - * Trusted SRAM - * Trusted DRAM - * Secure region of DRAM (top 16MB of DRAM configured by the TrustZone - controller) - -When BL3-2 is loaded into Trusted SRAM, its NOBITS sections are allowed to -overlay BL2. This memory layout is designed to give the BL3-2 image as much -memory as possible when it is loaded into Trusted SRAM. 
- -The location of the BL3-2 image will result in different memory maps. This is -illustrated in the following diagrams using the TSP as an example. - -**TSP in Trusted SRAM (default option):** - - Trusted SRAM - 0x04040000 +----------+ loaded by BL2 ------------------ - | BL1 (rw) | <<<<<<<<<<<<< | BL3-1 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | | <<<<<<<<<<<<< | BL3-1 PROGBITS | - |----------| ------------------ - | BL2 | <<<<<<<<<<<<< | BL3-2 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | | <<<<<<<<<<<<< | BL3-2 PROGBITS | - 0x04001000 +----------+ ------------------ - | Shared | - 0x04000000 +----------+ - - Trusted ROM - 0x04000000 +----------+ - | BL1 (ro) | - 0x00000000 +----------+ - - -**TSP in Trusted DRAM:** - - Trusted DRAM - 0x08000000 +----------+ - | BL3-2 | - 0x06000000 +----------+ - - Trusted SRAM - 0x04040000 +----------+ loaded by BL2 ------------------ - | BL1 (rw) | <<<<<<<<<<<<< | BL3-1 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | | <<<<<<<<<<<<< | BL3-1 PROGBITS | - |----------| ------------------ - | BL2 | - |----------| - | | - 0x04001000 +----------+ - | Shared | - 0x04000000 +----------+ - - Trusted ROM - 0x04000000 +----------+ - | BL1 (ro) | - 0x00000000 +----------+ - -**TSP in the TZC-Secured DRAM:** - - DRAM - 0xffffffff +----------+ - | BL3-2 | (secure) - 0xff000000 +----------+ - | | - : : (non-secure) - | | - 0x80000000 +----------+ - - Trusted SRAM - 0x04040000 +----------+ loaded by BL2 ------------------ - | BL1 (rw) | <<<<<<<<<<<<< | BL3-1 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | | <<<<<<<<<<<<< | BL3-1 PROGBITS | - |----------| ------------------ - | BL2 | - |----------| - | | - 0x04001000 +----------+ - | Shared | - 0x04000000 +----------+ - - Trusted ROM - 0x04000000 +----------+ - | BL1 (ro) | - 0x00000000 +----------+ - -Moving the TSP image out of the Trusted SRAM doesn't change the memory layout -of the other boot loader images in Trusted SRAM. 
- - -#### Memory layout on Juno ARM development platform - -The following list describes the memory layout on Juno: - -* Trusted SRAM at 0x04000000 contains the MHU page, BL1 r/w section, BL2 - image, BL3-1 image and, optionally, the BL3-2 image. - -* The MHU 4 KB page is used as communication channel between SCP and AP. It - also contains the entrypoint mailboxes for the AP. Mailboxes are stored in - the first 128 bytes of the MHU page. - -* BL1 resides in flash memory at address `0x0BEC0000`. Its read-write data - section is relocated to the top of the Trusted SRAM at runtime. - -* BL3-1 is loaded at the top of the Trusted SRAM, such that its NOBITS - sections will overwrite BL1 R/W data. This implies that BL1 global variables - will remain valid only until execution reaches the BL3-1 entry point during - a cold boot. - -* BL2 is loaded below BL3-1. - -* BL3-0 is loaded temporarily into the BL3-1 memory region and transferred to - the SCP before being overwritten by BL3-1. - -* The BL3-2 image is optional and can be loaded into one of these two - locations: Trusted SRAM (right after the MHU page) or DRAM (14 MB starting - at 0xFF000000 and secured by the TrustZone controller). When loaded into - Trusted SRAM, its NOBITS sections are allowed to overlap BL2. - -The location of the BL3-2 image results in different memory -maps, illustrated by the following diagrams.
- -**BL3-2 in Trusted SRAM (default option):** - - Flash0 - 0x0C000000 +----------+ - : : - 0x0BED0000 |----------| - | BL1 (ro) | - 0x0BEC0000 |----------| - : : - 0x08000000 +----------+ BL3-1 is loaded - after BL3-0 has - Trusted SRAM been sent to SCP - 0x04040000 +----------+ loaded by BL2 ------------------ - | BL1 (rw) | <<<<<<<<<<<<< | BL3-1 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | BL3-0 | <<<<<<<<<<<<< | BL3-1 PROGBITS | - |----------| ------------------ - | BL2 | <<<<<<<<<<<<< | BL3-2 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | | <<<<<<<<<<<<< | BL3-2 PROGBITS | - 0x04001000 +----------+ ------------------ - | MHU | - 0x04000000 +----------+ - - -**BL3-2 in the secure region of DRAM:** - - DRAM - 0xFFE00000 +----------+ - | BL3-2 | (secure) - 0xFF000000 |----------| - | | - : : (non-secure) - | | - 0x80000000 +----------+ - - Flash0 - 0x0C000000 +----------+ - : : - 0x0BED0000 |----------| - | BL1 (ro) | - 0x0BEC0000 |----------| - : : - 0x08000000 +----------+ BL3-1 is loaded - after BL3-0 has - Trusted SRAM been sent to SCP - 0x04040000 +----------+ loaded by BL2 ------------------ - | BL1 (rw) | <<<<<<<<<<<<< | BL3-1 NOBITS | - |----------| <<<<<<<<<<<<< |----------------| - | BL3-0 | <<<<<<<<<<<<< | BL3-1 PROGBITS | - |----------| ------------------ - | BL2 | - |----------| - | | - 0x04001000 +----------+ - | MHU | - 0x04000000 +----------+ - -Loading the BL3-2 image in DRAM doesn't change the memory layout of the other -images in Trusted SRAM. - - -10. Firmware Image Package (FIP) ---------------------------------- - -Using a Firmware Image Package (FIP) allows for packing bootloader images (and -potentially other payloads) into a single archive that can be loaded by the ARM -Trusted Firmware from non-volatile platform storage. A driver to load images -from a FIP has been added to the storage layer and allows a package to be read -from supported platform storage. 
-A tool to create Firmware Image Packages is
-also provided and described below.
-
-### Firmware Image Package layout
-
-The FIP layout consists of a table of contents (ToC) followed by payload data.
-The ToC itself has a header followed by one or more table entries. The ToC is
-terminated by an end marker entry. Each ToC entry describes some payload data
-that has been appended to the end of the binary package. With the information
-provided in the ToC entry the corresponding payload data can be retrieved.
-
-    ------------------
-    | ToC Header     |
-    |----------------|
-    | ToC Entry 0    |
-    |----------------|
-    | ToC Entry 1    |
-    |----------------|
-    | ToC End Marker |
-    |----------------|
-    |                |
-    |     Data 0     |
-    |                |
-    |----------------|
-    |                |
-    |     Data 1     |
-    |                |
-    ------------------
-
-The ToC header and entry formats are described in the header file
-`include/firmware_image_package.h`. This file is used by both the tool and the
-ARM Trusted Firmware.
-
-The ToC header has the following fields:
-
-*   `name`: The name of the ToC. This is currently used to validate the header.
-*   `serial_number`: A non-zero number provided by the creation tool.
-*   `flags`: Flags associated with this data. None are yet defined.
-
-A ToC entry has the following fields:
-
-*   `uuid`: All files are referred to by a pre-defined Universally Unique
-    IDentifier [UUID]. The UUIDs are defined in
-    `include/firmware_image_package.h`. The platform translates the requested
-    image name into the corresponding UUID when accessing the package.
-*   `offset_address`: The offset address at which the corresponding payload data
-    can be found. The offset is calculated from the ToC base address.
-*   `size`: The size of the corresponding payload data in bytes.
-*   `flags`: Flags associated with this entry. None are yet defined.
-
-### Firmware Image Package creation tool
-
-The FIP creation tool can be used to pack specified images into a binary package
-that can be loaded by the ARM Trusted Firmware from platform storage.
-The tool
-currently only supports packing bootloader images. Additional image definitions
-can be added to the tool as required.
-
-The tool can be found in `tools/fip_create`.
-
-### Loading from a Firmware Image Package (FIP)
-
-The Firmware Image Package (FIP) driver can load images from a binary package on
-non-volatile platform storage. For the FVPs this is currently NOR FLASH.
-
-Bootloader images are loaded according to the platform policy as specified in
-`plat/<platform>/plat_io_storage.c`. For the FVPs this means the platform will
-attempt to load images from a Firmware Image Package located at the start of NOR
-FLASH0.
-
-Currently the FVP's policy only allows loading of a known set of images. The
-platform policy can be modified to allow additional images.
-
-
-11.  Use of coherent memory in Trusted Firmware
------------------------------------------------
-
-There might be loss of coherency when physical memory with mismatched
-shareability, cacheability and memory attributes is accessed by multiple CPUs
-(refer to section B2.9 of [ARM ARM] for more details). This possibility occurs
-in Trusted Firmware during power up/down sequences when coherency, MMU and
-caches are turned on/off incrementally.
-
-Trusted Firmware defines coherent memory as a region of memory with Device
-nGnRE attributes in the translation tables. The translation granule size in
-Trusted Firmware is 4KB. This is the smallest possible size of the coherent
-memory region.
-
-By default, all data structures which are susceptible to accesses with
-mismatched attributes from various CPUs are allocated in a coherent memory
-region (refer to section 2.1 of [Porting Guide]). Accesses to the coherent
-memory region are Outer Shareable and non-cacheable, and they use the Device
-nGnRE attributes when the MMU is turned on. Hence, at the
-expense of at least an extra page of memory, Trusted Firmware is able to work
-around coherency issues due to mismatched memory attributes.
-
-The alternative to the above approach is to allocate the susceptible data
-structures in Normal WriteBack WriteAllocate Inner shareable memory. This
-approach requires the data structures to be designed so that it is possible to
-work around the issue of mismatched memory attributes by performing software
-cache maintenance on them.
-
-### Disabling the use of coherent memory in Trusted Firmware
-
-It might be desirable to avoid the cost of allocating coherent memory on
-platforms which are memory constrained. Trusted Firmware enables inclusion of
-coherent memory in firmware images through the build flag `USE_COHERENT_MEM`.
-This flag is enabled by default. It can be disabled to choose the second
-approach described above.
-
-The sections below analyze the data structures allocated in the coherent memory
-region and the changes required to allocate them in normal memory.
-
-### PSCI Affinity map nodes
-
-The `psci_aff_map` data structure stores the hierarchical node information for
-each affinity level in the system including the PSCI states associated with them.
-By default, this data structure is allocated in the coherent memory region in
-the Trusted Firmware because it can be accessed by multiple CPUs, either with
-their caches enabled or disabled.
-
-    typedef struct aff_map_node {
-        unsigned long mpidr;
-        unsigned char ref_count;
-        unsigned char state;
-        unsigned char level;
-    #if USE_COHERENT_MEM
-        bakery_lock_t lock;
-    #else
-        unsigned char aff_map_index;
-    #endif
-    } aff_map_node_t;
-
-In order to move this data structure to normal memory, the use of each of its
-fields must be analyzed. Fields like `mpidr` and `level` are only written once
-during cold boot. Hence removing them from coherent memory involves only doing
-a clean and invalidate of the cache lines after these fields are written.
-
-The fields `state` and `ref_count` can be concurrently accessed by multiple
-CPUs in different cache states.
-A Lamport's Bakery lock is used to ensure mutually
-exclusive access to these fields. As a result, it is possible to move these fields out
-of coherent memory by performing software cache maintenance on them. The field
-`lock` is the bakery lock data structure when `USE_COHERENT_MEM` is enabled.
-The `aff_map_index` is used to identify the bakery lock when `USE_COHERENT_MEM`
-is disabled.
-
-### Bakery lock data
-
-The bakery lock data structure `bakery_lock_t` is allocated in coherent memory
-and is accessed by multiple CPUs with mismatched attributes. `bakery_lock_t` is
-defined as follows:
-
-    typedef struct bakery_lock {
-        int owner;
-        volatile char entering[BAKERY_LOCK_MAX_CPUS];
-        volatile unsigned number[BAKERY_LOCK_MAX_CPUS];
-    } bakery_lock_t;
-
-It is a characteristic of Lamport's Bakery algorithm that the volatile per-CPU
-fields can be read by all CPUs but only written to by the owning CPU.
-
-Depending upon the data cache line size, the per-CPU fields of the
-`bakery_lock_t` structure for multiple CPUs may exist on a single cache line.
-These per-CPU fields can be read and written during lock contention by multiple
-CPUs with mismatched memory attributes. Since these fields are a part of the
-lock implementation, they do not have access to any other locking primitive to
-safeguard against the resulting coherency issues. As a result, simple software
-cache maintenance is not enough to allow them to be allocated in normal memory.
-Consider the following example.
-
-CPU0 updates its per-CPU field with data cache enabled. This write updates a
-local cache line which contains a copy of the fields for other CPUs as well. Now
-CPU1 updates its per-CPU field of the `bakery_lock_t` structure with data cache
-disabled. CPU1 then issues a DCIVAC operation to invalidate any stale copies of
-its field in any other cache line in the system. This operation will invalidate
-the update made by CPU0 as well.
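
The CPU0/CPU1 sequence above can be walked through with a small toy model. This is only a sketch under simplifying assumptions: it models a single cache line shared by two per-CPU fields and a write-back cache on CPU0 only, and all names in it (`toy_system_t`, `cpu0_cached_write` and so on) are hypothetical and not part of Trusted Firmware.

```c
#include <stdbool.h>

#define NUM_CPUS 2

/* One cache line that holds the per-CPU lock fields of both CPUs. */
typedef struct {
    unsigned char field[NUM_CPUS];
} line_t;

/* Toy system: main memory plus CPU0's private copy of the line. */
typedef struct {
    line_t memory;      /* contents of main memory */
    line_t cpu0_cache;  /* CPU0's cached copy of the line */
    bool   cpu0_valid;  /* true while CPU0 holds the line in its cache */
} toy_system_t;

void toy_init(toy_system_t *s)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        s->memory.field[i] = 0;
        s->cpu0_cache.field[i] = 0;
    }
    s->cpu0_valid = false;
}

/* CPU0 writes its field with the data cache enabled (write-back). */
void cpu0_cached_write(toy_system_t *s, unsigned char value)
{
    if (!s->cpu0_valid) {
        s->cpu0_cache = s->memory;  /* line fill also copies CPU1's field */
        s->cpu0_valid = true;
    }
    s->cpu0_cache.field[0] = value; /* update stays in the cache for now */
}

/* CPU1 writes its field with the data cache disabled: straight to memory. */
void cpu1_uncached_write(toy_system_t *s, unsigned char value)
{
    s->memory.field[1] = value;
}

/* CPU1 invalidates the line everywhere, modelling the DCIVAC step. */
void invalidate_line(toy_system_t *s)
{
    s->cpu0_valid = false;          /* dirty data is dropped, not written back */
}

/* Value of a field as seen by a CPU reading main memory directly. */
unsigned char memory_read(const toy_system_t *s, int cpu)
{
    return s->memory.field[cpu];
}
```

Calling `cpu0_cached_write`, then `cpu1_uncached_write`, then `invalidate_line` on a freshly initialised system leaves CPU1's field set in memory, while CPU0's field still reads as zero: the update made by CPU0 only ever existed in the cache line that was discarded.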
-
-To use bakery locks when `USE_COHERENT_MEM` is disabled, the lock data structure
-has been redesigned. The changes utilise the characteristic of Lamport's Bakery
-algorithm mentioned earlier. The per-CPU fields of the new lock structure are
-aligned such that they are allocated on separate cache lines. The per-CPU data
-framework in Trusted Firmware is used to achieve this. This enables software to
-perform software cache maintenance on the lock data structure without running
-into coherency issues associated with mismatched attributes.
-
-The per-CPU data framework enables consolidation of data structures on the
-fewest cache lines possible. This saves memory as compared to the scenario where
-each data structure is separately aligned to the cache line boundary to achieve
-the same effect.
-
-The bakery lock data structure `bakery_info_t` is defined for use when
-`USE_COHERENT_MEM` is disabled as follows:
-
-    typedef struct bakery_info {
-        /*
-         * The lock_data is a bit-field of 2 members:
-         * Bit[0]       : choosing. This field is set when the CPU is
-         *                choosing its bakery number.
-         * Bits[1 - 15] : number. This is the bakery number allocated.
-         */
-         volatile uint16_t lock_data;
-    } bakery_info_t;
-
-The `bakery_info_t` represents a single per-CPU field of one lock and
-the combination of corresponding `bakery_info_t` structures for all CPUs in the
-system represents the complete bakery lock. It is embedded in the per-CPU
-data framework `cpu_data` as shown below:
-
-      CPU0 cpu_data
-    ------------------
-    | ....           |
-    |----------------|
-    | `bakery_info_t`| <-- Lock_0 per-CPU field
-    |    Lock_0      |     for CPU0
-    |----------------|
-    | `bakery_info_t`| <-- Lock_1 per-CPU field
-    |    Lock_1      |     for CPU0
-    |----------------|
-    | ....           |
-    |----------------|
-    | `bakery_info_t`| <-- Lock_N per-CPU field
-    |    Lock_N      |     for CPU0
-    ------------------
-
-
-      CPU1 cpu_data
-    ------------------
-    | ....           |
-    |----------------|
-    | `bakery_info_t`| <-- Lock_0 per-CPU field
-    |    Lock_0      |     for CPU1
-    |----------------|
-    | `bakery_info_t`| <-- Lock_1 per-CPU field
-    |    Lock_1      |     for CPU1
-    |----------------|
-    | ....           |
-    |----------------|
-    | `bakery_info_t`| <-- Lock_N per-CPU field
-    |    Lock_N      |     for CPU1
-    ------------------
-
-Consider a system of 2 CPUs with 'N' bakery locks as shown above. For an
-operation on Lock_N, the corresponding `bakery_info_t` entries in the `cpu_data`
-of both CPU0 and CPU1 need to be fetched and appropriate cache operations need
-to be performed for each access.
-
-For multiple bakery locks, an array of `bakery_info_t` is declared in `cpu_data`
-and each lock is given an `id` to identify it in the array.
-
-### Non Functional Impact of removing coherent memory
-
-Removal of the coherent memory region leads to the additional software overhead
-of performing cache maintenance for the affected data structures. However, since
-the memory where the data structures are allocated is cacheable, the overhead is
-mostly mitigated by an increase in performance.
-
-There is however a performance impact for bakery locks, due to:
-*   Additional cache maintenance operations, and
-*   Multiple cache line reads for each lock operation, since the bakery locks
-    for each CPU are distributed across different cache lines.
-
-The implementation has been optimized to minimize this additional overhead.
-Measurements indicate that when bakery locks are allocated in Normal memory, the
-minimum latency of acquiring a lock is on average 3-4 microseconds, whereas
-in Device memory it is 2 microseconds. The measurements were done on the
-Juno ARM development platform.
-
-As mentioned earlier, almost a page of memory can be saved by disabling
-`USE_COHERENT_MEM`. Each platform needs to consider these trade-offs to decide
-whether coherent memory should be used.
-If a platform disables
-`USE_COHERENT_MEM` and needs to use bakery locks in the porting layer, it should
-reserve memory in `cpu_data` by defining the macro `PLAT_PCPU_DATA_SIZE` (see
-the [Porting Guide]). Refer to the reference platform code for examples.
-
-
-12.  Code Structure
--------------------
-
-Trusted Firmware code is logically divided between the three boot loader
-stages mentioned in the previous sections. The code is also divided into the
-following categories (present as directories in the source code):
-
-*   **Architecture specific.** This could be AArch32 or AArch64.
-*   **Platform specific.** Choice of architecture specific code depends upon
-    the platform.
-*   **Common code.** This is platform and architecture agnostic code.
-*   **Library code.** This code comprises functionality commonly used by all
-    other code.
-*   **Stage specific.** Code specific to a boot stage.
-*   **Drivers.**
-*   **Services.** EL3 runtime services, e.g. PSCI or SPD. Specific SPD services
-    reside in the `services/spd` directory (e.g. `services/spd/tspd`).
-
-Each boot loader stage uses code from one or more of the above mentioned
-categories. Based upon the above, the code layout looks like this:
-
-    Directory    Used by BL1?    Used by BL2?    Used by BL3-1?
-    bl1          Yes             No              No
-    bl2          No              Yes             No
-    bl31         No              No              Yes
-    arch         Yes             Yes             Yes
-    plat         Yes             Yes             Yes
-    drivers      Yes             No              Yes
-    common       Yes             Yes             Yes
-    lib          Yes             Yes             Yes
-    services     No              No              Yes
-
-The build system provides a non-configurable build option `IMAGE_BLx` for each
-boot loader stage (where x = BL stage). For example, for BL1, `IMAGE_BL1` will be
-defined by the build system. This enables the Trusted Firmware to compile
-certain code only for specific boot loader stages.
-
-All assembler files have the `.S` extension. The linker source files for each
-boot stage have the extension `.ld.S`. These are processed by GCC to create the
-linker scripts which have the extension `.ld`.
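
As an illustration of the `IMAGE_BLx` build option described above, the following sketch shows how code might be compiled only for a specific stage. Only `IMAGE_BL1` is named in the text; `IMAGE_BL2` and `IMAGE_BL31` are assumed here by following the same pattern, and `image_stage_name` is a hypothetical helper, not a Trusted Firmware function.

```c
/*
 * The build system normally defines exactly one IMAGE_BLx macro per image.
 * Defaulting to IMAGE_BL1 here is an assumption made only so that this
 * sketch compiles on its own.
 */
#if !defined(IMAGE_BL1) && !defined(IMAGE_BL2) && !defined(IMAGE_BL31)
#define IMAGE_BL1 1
#endif

/* Hypothetical helper: name of the stage this file was built for. */
const char *image_stage_name(void)
{
#if defined(IMAGE_BL1)
    return "BL1";       /* compiled only into the BL1 image */
#elif defined(IMAGE_BL2)
    return "BL2";       /* compiled only into the BL2 image */
#else
    return "BL3-1";     /* assumed macro name IMAGE_BL31 */
#endif
}
```

The preprocessor discards the branches for the other stages, so stage-specific code never reaches an image that does not need it.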
-
-FDTs provide a description of the hardware platform and are used by the Linux
-kernel at boot time. These can be found in the `fdts` directory.
-
-
-13.  References
----------------
-
-1.  Trusted Board Boot Requirements CLIENT PDD (ARM DEN 0006B-5). Available
-    under NDA through your ARM account representative.
-
-2.  [Power State Coordination Interface PDD (ARM DEN 0022B.b)][PSCI].
-
-3.  [SMC Calling Convention PDD (ARM DEN 0028A)][SMCCC].
-
-4.  [ARM Trusted Firmware Interrupt Management Design guide][INTRG].
-
-- - - - - - - - - - - - - - - - - - - - - - - - - -
-
-_Copyright (c) 2013-2014, ARM Limited and Contributors. All rights reserved._
-
-[ARM ARM]:          http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.e/index.html "ARMv8-A Reference Manual (ARM DDI0487A.E)"
-[PSCI]:             http://infocenter.arm.com/help/topic/com.arm.doc.den0022c/DEN0022C_Power_State_Coordination_Interface.pdf "Power State Coordination Interface PDD (ARM DEN 0022C)"
-[SMCCC]:            http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html "SMC Calling Convention PDD (ARM DEN 0028A)"
-[UUID]:             https://tools.ietf.org/rfc/rfc4122.txt "A Universally Unique IDentifier (UUID) URN Namespace"
-[User Guide]:       ./user-guide.md
-[Porting Guide]:    ./porting-guide.md
-[INTRG]:            ./interrupt-framework-design.md
-[CPUBM]:            ./cpu-specific-build-macros.md