nuttx/boards/arm64/qemu/qemu-a53/README.txt

README.txt
==========

This board configuration will use QEMU to emulate a generic Cortex-A53
hardware platform and provides support for these devices:

 - GICv3 interrupt controller
 - ARM Generic Timer
 - PL011 UART controller

Contents
========
  - Getting Started
  - Status
  - Platform Features
  - Debugging with QEMU
  - FPU Support and Performance
  - SMP Support
  - References

Getting Started
===============

1. Compile Toolchain
  1.1 Host environment
     GNU/Linux: Ubuntu 18.04 or greater
  1.2 Download and Install
     $ wget https://developer.arm.com/-/media/Files/downloads/gnu/11.2-2022.02/binrel/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar.xz
     $ xz -d gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar.xz
     $ tar xf gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar

     Put gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin/ to your host PATH environment variable, like:
     $ export PATH=$PATH:/opt/software/arm/linaro-toolchain/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin
     check the toolchain:
     $ aarch64-none-elf-gcc -v

2. Install QEMU
   In Ubuntu 18.04(or greater), install qemu:
   $ sudo apt-get install qemu-system-arm qemu-efi-aarch64 qemu-utils
   And make sure install is properly:
   $ qemu-system-aarch64 --help

3. Configuring and running
  3.1 Single Core
   Configuring NuttX and compile:
   $ ./tools/configure.sh -l qemu-a53:nsh
   $ make
   Running with qemu
   $ qemu-system-aarch64 -cpu cortex-a53 -nographic \
     -machine virt,virtualization=on,gic-version=3 \
     -net none -chardev stdio,id=con,mux=on -serial chardev:con \
     -mon chardev=con,mode=readline -kernel ./nuttx

  3.2 SMP
   Configuring NuttX and compile:
   $ ./tools/configure.sh -l qemu-a53:nsh_smp
   $ make
   Runing with qemu
   $ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
      -machine virt,virtualization=on,gic-version=3 \
      -net none -chardev stdio,id=con,mux=on -serial chardev:con \
      -mon chardev=con,mode=readline -kernel ./nuttx

   Note:
   1. Make sure the aarch64-none-elf toolchain install PATH has been added to environment variable
   2. To quit QEMU, type Ctrl + X
   3. Nuttx default core number is 4, and Changing CONFIG_SMP_NCPUS > 4 and setting qemu command
     option -smp will boot more core. For qemu, core limit is 32.

Status
======

2022-07-01:

1. It's very stranger to see that signal testing of ostest is PASSED at Physical Ubuntu PC
   rather than an Ubuntu at VMWare. For Physical Ubuntu PC, I have run the ostest
   for 10 times at least but never see the crash again, but it's almost crashed every time
   running the ostest at Virtual Ubuntu in VMWare
   I check the fail point. It's seem at signal routine to access another CPU's task context reg
   will get a NULL pointer, but I watch the task context with GDB, everything is OK.
   So maybe this is a SMP cache synchronize issue? But I have done cache synchronize
   operation at thread switch and how to explain why the crash not happening at
   Physical Ubuntu PC?
   So maybe this is a qemu issue at VMWare. I am planning to run
   the arm64 to real hardware platform like IMX8 and will check the issue again

2022-06-12:

1. SMP is support at QEMU. Add psci interface, armv8 cache operation(data cache)
 and smccc support. The system can run into nsh shell, SMP test is PASSED, but
 ostest crash at signal testing


2022-05-22:
  Arm64 support version for NuttX is Ready, These Features supported:
1.Cotex-a53 single core support: With the supporting of GICv3,
  Arch timer, PL101 UART, The system can run into nsh shell.
  Running ostest seem PASSED.

2.qemu-a53 board configuration support: qemu-a53 board can configuring
  and compiling, And runing with qemu-system-aarch64
  at Ubuntu 18.04.
3.FPU support for armv8-a: FPU context switching in NEON/floating-point
  TRAP was supported.  FPU registers saving at vfork and independent
  FPU context for signal routine was considered but more testing
  needs to be do.

Platform Features
=================

The following hardware features are supported:
+--------------+------------+----------------------+
| Interface    | Controller | Driver/Component     |
+==============+============+======================+
| GIC          | on-chip    | interrupt controller |
+--------------+------------+----------------------+
| PL011 UART   | on-chip    | serial port          |
+--------------+------------+----------------------+
| ARM TIMER    | on-chip    | system clock         |
+--------------+------------+----------------------+

The kernel currently does not support other hardware features on this
qemu platform.


Debugging with QEMU
===================

The nuttx ELF image can be debugged with QEMU.

1. To debug the nuttx (ELF) with symbols, make sure the following change have
   applied to defconfig.

+CONFIG_DEBUG_SYMBOLS=y

2. Run QEMU(at shell terminal 1)

   Single Core
   $ qemu-system-aarch64 -cpu cortex-a53 -nographic -machine virt,virtualization=on,gic-version=3 \
     -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline \
     -kernel ./nuttx -S -s
   SMP
   $ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 \
     -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline \
     -kernel ./nuttx -S -s
   

3. Run gdb with TUI, connect to QEMU, load nuttx and continue (at shell terminal 2)

   $ aarch64-none-elf-gdb -tui --eval-command='target remote localhost:1234' nuttx
   (gdb) set debug aarch64
   (gdb) c
   Continuing.
   ^C
   Program received signal SIGINT, Interrupt.
   arch_cpu_idle () at common/arm64_cpu_idle.S:37
   (gdb)
   (gdb) where
   #0  arch_cpu_idle () at common/arm64_cpu_idle.S:37
   #1  0x00000000402823ec in nx_start () at init/nx_start.c:742
   #2  0x0000000040280148 in arm64_boot_primary_c_routine () at common/arm64_boot.c:184
   #3  0x00000000402a5bf8 in switch_el () at common/arm64_head.S:201
   (gdb)

   SMP Case
   Thread 1 received signal SIGINT, Interrupt.
    arch_cpu_idle () at common/arm64_cpu_idle.S:37
   (gdb) info threads
    Id   Target Id                  Frame
  * 1    Thread 1 (CPU#0 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
    2    Thread 2 (CPU#1 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
    3    Thread 3 (CPU#2 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
    4    Thread 4 (CPU#3 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
   (gdb)

   Note:
   1. it will make your debugging more easier in source level if you setting 
      CONFIG_DEBUG_FULLOPT=n. but there is a risk of stack overflow when the 
      option is disabled. Just enlarging your stack size will avoid the 
      issue (eg. enlarging CONFIG_DEFAULT_TASK_STACKSIZE)
   2. TODO: ARMv8-A Supporting for tools/nuttx-gdbinit


FPU Support and Performance
===========================
  I was using FPU trap to handle FPU context switch. For threads accessing
the FPU (FPU instructions or registers), a trap will happen at this thread,
the FPU context will be saved/restore for the thread at the trap handler.
  It will improve performance for thread switch since it's not to save/restore
the FPU context (almost 512 bytes) at the thread switch anymore. But some issue
need to be considered:

1. Floating point argument passing issue
    In many cases, the FPU trap is triggered by va_start() that copies
the content of FP registers used for floating point argument passing
into the va_list object in case there were actual float arguments from
the caller. But In practice this is almost never the case. 
Seeing the save_count/restore_count at the g_cpu_fpu_ctx, which will
be increase when saving/restoring FPU context. After runing ostest,
we can see the count with GDB:
   
(gdb) p g_cpu_fpu_ctx
  $1 = {{fpu_owner = 0x0, idle_thread = 0x402b3110 <g_idletcb>,
      save_count = 1293, restore_count = 2226, switch_count = 4713,
      exe_depth_count = 0}}
(gdb)
     
    adding -mgeneral-regs-only option will make compiler not use the FPU
register, we can use the following patch to syslog:

diff --git a/libs/libc/syslog/Make.defs b/libs/libc/syslog/Make.defs
index c58fb45512..acac6febaa
--- a/libs/libc/syslog/Make.defs
+++ b/libs/libc/syslog/Make.defs
@@ -26,3 +26,4 @@ CSRCS += lib_syslog.c lib_setlogmask.c

 DEPPATH += --dep-path syslog
 VPATH += :syslog
+syslog/lib_syslog.c_CFLAGS += -mgeneral-regs-only
   
    With the option to make NuttX and booting. After runing ostest, see 
the count with GDB again:

(gdb) p g_cpu_fpu_ctx
$1 = {{fpu_owner = 0x0, idle_thread = 0x402b3110 <g_idletcb>, save_count = 141,
    restore_count = 170, switch_count = 4715, exe_depth_count = 0}}
(gdb)

    it's only 141/170 for saving/restoring FPU context, which is 1293/2226 before
add this compile option. Almost all of FPU accessing switch is argument passing
at the syslog.
    I cannot commit the patch for NuttX mainline because it's very special case
since ostest is using syslog for lots of information printing. but this is 
a clue for FPU performance analysis. va_list object is using for many C code to
handle argument passing, but if it's not passing floating point argument indeed.
Add the option to your code maybe increase FPU performance

2. FPU trap at IRQ handler
    it's probably need to handle FPU trap at IRQ routine. Exception_depth is
handling for this case, it will inc/dec at enter/leave exception. If the
exception_depth > 1, that means an exception occurring when another exception
is executing, the present implement is to switch FPU context to idle thread,
it will handle most case for calling printf-like rountine at IRQ routine.
    But in fact, this case will make uncertainty interrupt processing time sine
it's uncertainty for trap exception handling. It would be best to add
"-mgeneral-regs-only" option to compile the IRQ code avoiding accessing FP
register.
    if it's necessarily for the exception routine to use FPU, calling function to
save/restore FPU context directly maybe become a solution. Linux kernel introduce
kernel_neon_begin/kernel_neon_end function for this case. Similar function will
be add to NuttX if this issue need to be handle.

SMP Support
===========
1. Booting
   Primary core call sequence
     arm64_start
       ->arm64_boot_primary_c_routine
         ->arm64_chip_boot
           ->set init TBBR and Enable MMU
         ->nx_start
           ->OS component initialize
             ->Initialize GIC: GICD and Primary core GICR
           ->nx_smp_start
             for every CPU core
             ->up_cpu_start
               ->arm64_start_cpu(call PCSI to boot CPU)
               ->waiting for every core to boot 
           ->nx_bringup

   Secondary Core call sequence
     arm64_start
       ->arm64_boot_secondary_c_routine
         ->Enable MMU
         ->Initialize GIC: Secondary core GICR
         ->Notify Primary core booting is Ready
         ->nx_idle_trampoline
         
2. interrupt

SGI
   SGI_CPU_PAUSE: for core pause request, for every core

PPI
   ARM_ARCH_TIMER_IRQ: timer interrupt, handle by primary Core

SPI
   CONFIG_QEMU_UART_IRQ: serial driver interrupt, handle by primary Core

3. Timer
 The origin design for ARMv8-A timer is assigned private timer to
 every PE(CPU core), the ARM_ARCH_TIMER_IRQ is a PPI so it's
 should be enabled at every core.

 But for NuttX, it's design only for primary core to handle timer
 interrupt and call nxsched_process_timer at timer tick mode. 
 So we need only enable timer for primary core

 IMX6 use GPT which is a SPI rather than generic timer to handle
 timer interrupt

References
===========

1. (ID050815) ARM® Cortex®-A Series - Programmer’s Guide for ARMv8-A
2. (ID020222) Arm® Architecture Reference Manual - for A profile architecture
3. (ARM062-948681440-3280) Armv8-A Instruction Set Architecture
4. AArch64 Exception and Interrupt Handling
5. AArch64 Programmer's Guides Generic Timer
6. Arm Generic Interrupt Controller v3 and v4 Overview
7. Arm® Generic Interrupt Controller Architecture Specification GIC architecture version 3 and version 4
8. (DEN0022D.b) Arm Power State Coordination Interface Platform Design Document
-												arch: arm64: ARMv8-A support for NuttX

N/A

Summary:

Arm64 support for NuttX, Features supported:

1. Cortex-a53 single core and SMP support: it's can run into nsh shell at
   qemu virt machine.

2. qemu-a53 board configuration support: it's only for evaluate propose

3. FPU support for armv8-a: FPU context switching at NEON/floating-point
  TRAP is supported.

4. psci interface, armv8 cache operation(data cache) and smccc support.

5. fix mass code style issue, thank for @xiaoxiang781216, @hartmannathan @pkarashchenko

Please refer to boards/arm64/qemu/qemu-a53/README.txt for detail

Note:
1. GCC MACOS issue
The GCC 11.2 toolchain for MACOS may get crash while compiling
float operation function, the following link describe the issue
and give analyse at the issue:

https://bugs.linaro.org/show_bug.cgi?id=5825

it's seem GCC give a wrong instruction at certain machine which
without architecture features

the new toolchain is not available still, so just disable the MACOS
cibuild check at present

Signed-off-by: qinwei1 <qinwei1@xiaomi.com>

											
										
										
											2022-06-18 13:26:10 +02:00
+								README.txt
 								==========
 								This board configuration will use QEMU to emulate a generic Cortex-A53
 								hardware platform and provides support for these devices:
 								 - GICv3 interrupt controller
 								 - ARM Generic Timer
 								 - PL011 UART controller
 								Contents
 								========
 								  - Getting Started
 								  - Status
 								  - Platform Features
 								  - Debugging with QEMU
 								  - FPU Support and Performance
 								  - SMP Support
 								  - References
 								Getting Started
 								===============
 . Compile Toolchain
 .1 Host environment
 								     GNU/Linux: Ubuntu 18.04 or greater
 .2 Download and Install
 								     $ wget https://developer.arm.com/-/media/Files/downloads/gnu/11.2-2022.02/binrel/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar.xz
 								     $ xz -d gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar.xz
 								     $ tar xf gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf.tar
 								     Put gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin/ to your host PATH environment variable, like:
 								     $ export PATH=$PATH:/opt/software/arm/linaro-toolchain/gcc-arm-11.2-2022.02-x86_64-aarch64-none-elf/bin
 								     check the toolchain:
 								     $ aarch64-none-elf-gcc -v
 . Install QEMU
 								   In Ubuntu 18.04(or greater), install qemu:
 								   $ sudo apt-get install qemu-system-arm qemu-efi-aarch64 qemu-utils
 								   And make sure install is properly:
 								   $ qemu-system-aarch64 --help
 . Configuring and running
 .1 Single Core
 								   Configuring NuttX and compile:
 								   $ ./tools/configure.sh -l qemu-a53:nsh
 								   $ make
 								   Running with qemu
 								   $ qemu-system-aarch64 -cpu cortex-a53 -nographic \
 								     -machine virt,virtualization=on,gic-version=3 \
 								     -net none -chardev stdio,id=con,mux=on -serial chardev:con \
 								     -mon chardev=con,mode=readline -kernel ./nuttx
 .2 SMP
 								   Configuring NuttX and compile:
 								   $ ./tools/configure.sh -l qemu-a53:nsh_smp
 								   $ make
 								   Runing with qemu
 								   $ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
 								      -machine virt,virtualization=on,gic-version=3 \
 								      -net none -chardev stdio,id=con,mux=on -serial chardev:con \
 								      -mon chardev=con,mode=readline -kernel ./nuttx
 								   Note:
 . Make sure the aarch64-none-elf toolchain install PATH has been added to environment variable
 . To quit QEMU, type Ctrl + X
 . Nuttx default core number is 4, and Changing CONFIG_SMP_NCPUS > 4 and setting qemu command
 								     option -smp will boot more core. For qemu, core limit is 32.
 								Status
 								======
 -07-01:
 . It's very stranger to see that signal testing of ostest is PASSED at Physical Ubuntu PC
 								   rather than an Ubuntu at VMWare. For Physical Ubuntu PC, I have run the ostest
 								   for 10 times at least but never see the crash again, but it's almost crashed every time
 								   running the ostest at Virtual Ubuntu in VMWare
 								   I check the fail point. It's seem at signal routine to access another CPU's task context reg
 								   will get a NULL pointer, but I watch the task context with GDB, everything is OK.
 								   So maybe this is a SMP cache synchronize issue? But I have done cache synchronize
 								   operation at thread switch and how to explain why the crash not happening at
 								   Physical Ubuntu PC?
 								   So maybe this is a qemu issue at VMWare. I am planning to run
 								   the arm64 to real hardware platform like IMX8 and will check the issue again
 -06-12:
 . SMP is support at QEMU. Add psci interface, armv8 cache operation(data cache)
 								 and smccc support. The system can run into nsh shell, SMP test is PASSED, but
 								 ostest crash at signal testing
 -05-22:
 								  Arm64 support version for NuttX is Ready, These Features supported:
 .Cotex-a53 single core support: With the supporting of GICv3,
 								  Arch timer, PL101 UART, The system can run into nsh shell.
 								  Running ostest seem PASSED.
 .qemu-a53 board configuration support: qemu-a53 board can configuring
 								  and compiling, And runing with qemu-system-aarch64
 								  at Ubuntu 18.04.
 .FPU support for armv8-a: FPU context switching in NEON/floating-point
 								  TRAP was supported.  FPU registers saving at vfork and independent
 								  FPU context for signal routine was considered but more testing
 								  needs to be do.
 								Platform Features
 								=================
 								The following hardware features are supported:
 								+--------------+------------+----------------------+
 								| Interface    | Controller | Driver/Component     |
 								+==============+============+======================+
 								| GIC          | on-chip    | interrupt controller |
 								+--------------+------------+----------------------+
 								| PL011 UART   | on-chip    | serial port          |
 								+--------------+------------+----------------------+
 								| ARM TIMER    | on-chip    | system clock         |
 								+--------------+------------+----------------------+
 								The kernel currently does not support other hardware features on this
 								qemu platform.
 								Debugging with QEMU
 								===================
 								The nuttx ELF image can be debugged with QEMU.
 . To debug the nuttx (ELF) with symbols, make sure the following change have
 								   applied to defconfig.
 								+CONFIG_DEBUG_SYMBOLS=y
 . Run QEMU(at shell terminal 1)
 								   Single Core
 								   $ qemu-system-aarch64 -cpu cortex-a53 -nographic -machine virt,virtualization=on,gic-version=3 \
 								     -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline \
 								     -kernel ./nuttx -S -s
 								   SMP
 								   $ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 \
 								     -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline \
 								     -kernel ./nuttx -S -s
 . Run gdb with TUI, connect to QEMU, load nuttx and continue (at shell terminal 2)
 								   $ aarch64-none-elf-gdb -tui --eval-command='target remote localhost:1234' nuttx
 								   (gdb) set debug aarch64
 								   (gdb) c
 								   Continuing.
 								   ^C
 								   Program received signal SIGINT, Interrupt.
 								   arch_cpu_idle () at common/arm64_cpu_idle.S:37
 								   (gdb)
 								   (gdb) where
 								   #0  arch_cpu_idle () at common/arm64_cpu_idle.S:37
 								   #1  0x00000000402823ec in nx_start () at init/nx_start.c:742
 								   #2  0x0000000040280148 in arm64_boot_primary_c_routine () at common/arm64_boot.c:184
 								   #3  0x00000000402a5bf8 in switch_el () at common/arm64_head.S:201
 								   (gdb)
 								   SMP Case
 								   Thread 1 received signal SIGINT, Interrupt.
 								    arch_cpu_idle () at common/arm64_cpu_idle.S:37
 								   (gdb) info threads
 								    Id   Target Id                  Frame
 								  * 1    Thread 1 (CPU#0 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
 Thread 2 (CPU#1 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
 Thread 3 (CPU#2 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
 Thread 4 (CPU#3 [halted ]) arch_cpu_idle () at common/arm64_cpu_idle.S:37
 								   (gdb)
 								   Note:
 . it will make your debugging more easier in source level if you setting
 								      CONFIG_DEBUG_FULLOPT=n. but there is a risk of stack overflow when the
 								      option is disabled. Just enlarging your stack size will avoid the
 								      issue (eg. enlarging CONFIG_DEFAULT_TASK_STACKSIZE)
 . TODO: ARMv8-A Supporting for tools/nuttx-gdbinit
 								FPU Support and Performance
 								===========================
 								  I was using FPU trap to handle FPU context switch. For threads accessing
 								the FPU (FPU instructions or registers), a trap will happen at this thread,
 								the FPU context will be saved/restore for the thread at the trap handler.
 								  It will improve performance for thread switch since it's not to save/restore
 								the FPU context (almost 512 bytes) at the thread switch anymore. But some issue
 								need to be considered:
 . Floating point argument passing issue
 								    In many cases, the FPU trap is triggered by va_start() that copies
 								the content of FP registers used for floating point argument passing
 								into the va_list object in case there were actual float arguments from
 								the caller. But In practice this is almost never the case.
 								Seeing the save_count/restore_count at the g_cpu_fpu_ctx, which will
 								be increase when saving/restoring FPU context. After runing ostest,
 								we can see the count with GDB:
 								(gdb) p g_cpu_fpu_ctx
 								  $1 = {{fpu_owner = 0x0, idle_thread = 0x402b3110 <g_idletcb>,
 								      save_count = 1293, restore_count = 2226, switch_count = 4713,
 								      exe_depth_count = 0}}
 								(gdb)
 								    adding -mgeneral-regs-only option will make compiler not use the FPU
 								register, we can use the following patch to syslog:
 								diff --git a/libs/libc/syslog/Make.defs b/libs/libc/syslog/Make.defs
 								index c58fb45512..acac6febaa
 								--- a/libs/libc/syslog/Make.defs
 								+++ b/libs/libc/syslog/Make.defs
@@ -26,3 +26,4 @@ CSRCS += lib_syslog.c lib_setlogmask.c
 								 DEPPATH += --dep-path syslog
 								 VPATH += :syslog
 								+syslog/lib_syslog.c_CFLAGS += -mgeneral-regs-only
 								    With the option to make NuttX and booting. After runing ostest, see
 								the count with GDB again:
 								(gdb) p g_cpu_fpu_ctx
 								$1 = {{fpu_owner = 0x0, idle_thread = 0x402b3110 <g_idletcb>, save_count = 141,
 								    restore_count = 170, switch_count = 4715, exe_depth_count = 0}}
 								(gdb)
 								    it's only 141/170 for saving/restoring FPU context, which is 1293/2226 before
 								add this compile option. Almost all of FPU accessing switch is argument passing
 								at the syslog.
 								    I cannot commit the patch for NuttX mainline because it's very special case
 								since ostest is using syslog for lots of information printing. but this is
 								a clue for FPU performance analysis. va_list object is using for many C code to
 								handle argument passing, but if it's not passing floating point argument indeed.
 								Add the option to your code maybe increase FPU performance
 . FPU trap at IRQ handler
 								    it's probably need to handle FPU trap at IRQ routine. Exception_depth is
 								handling for this case, it will inc/dec at enter/leave exception. If the
 								exception_depth > 1, that means an exception occurring when another exception
 								is executing, the present implement is to switch FPU context to idle thread,
 								it will handle most case for calling printf-like rountine at IRQ routine.
 								    But in fact, this case will make uncertainty interrupt processing time sine
 								it's uncertainty for trap exception handling. It would be best to add
 								"-mgeneral-regs-only" option to compile the IRQ code avoiding accessing FP
 								register.
 								    if it's necessarily for the exception routine to use FPU, calling function to
 								save/restore FPU context directly maybe become a solution. Linux kernel introduce
 								kernel_neon_begin/kernel_neon_end function for this case. Similar function will
 								be add to NuttX if this issue need to be handle.
 								SMP Support
 								===========
 . Booting
 								   Primary core call sequence
 								     arm64_start
 								       ->arm64_boot_primary_c_routine
 								         ->arm64_chip_boot
 								           ->set init TBBR and Enable MMU
 								         ->nx_start
 								           ->OS component initialize
 								             ->Initialize GIC: GICD and Primary core GICR
 								           ->nx_smp_start
 								             for every CPU core
 								             ->up_cpu_start
 								               ->arm64_start_cpu(call PCSI to boot CPU)
 								               ->waiting for every core to boot
 								           ->nx_bringup
 								   Secondary Core call sequence
 								     arm64_start
 								       ->arm64_boot_secondary_c_routine
 								         ->Enable MMU
 								         ->Initialize GIC: Secondary core GICR
 								         ->Notify Primary core booting is Ready
 								         ->nx_idle_trampoline
 . interrupt
 								SGI
 								   SGI_CPU_PAUSE: for core pause request, for every core
 								PPI
 								   ARM_ARCH_TIMER_IRQ: timer interrupt, handle by primary Core
 								SPI
 								   CONFIG_QEMU_UART_IRQ: serial driver interrupt, handle by primary Core
 . Timer
 								 The origin design for ARMv8-A timer is assigned private timer to
 								 every PE(CPU core), the ARM_ARCH_TIMER_IRQ is a PPI so it's
 								 should be enabled at every core.
 								 But for NuttX, it's design only for primary core to handle timer
 								 interrupt and call nxsched_process_timer at timer tick mode.
 								 So we need only enable timer for primary core
 								 IMX6 use GPT which is a SPI rather than generic timer to handle
 								 timer interrupt
 								References
 								===========
 . (ID050815) ARM® Cortex®-A Series - Programmer’s Guide for ARMv8-A
 . (ID020222) Arm® Architecture Reference Manual - for A profile architecture
 . (ARM062-948681440-3280) Armv8-A Instruction Set Architecture
 . AArch64 Exception and Interrupt Handling
 . AArch64 Programmer's Guides Generic Timer
 . Arm Generic Interrupt Controller v3 and v4 Overview
 . Arm® Generic Interrupt Controller Architecture Specification GIC architecture version 3 and version 4
 . (DEN0022D.b) Arm Power State Coordination Interface Platform Design Document