Add registration function instrumentation API,
which can achieve instrumentation of entering and
exiting functions through the compiler's functionality.
We can use CONFIG_ARCH_INSTRUMENT_ALL to add instrumentation for all
source, or add '-finstrument-functions' to CFLAGS for Part of the
source.
Notice:
1. use CONFIG_ARCH_INSTRUMENT_ALL must mark _start or entry noinstrument_function,
becuase bss not set.
2. Make sure your callbacks are not instrumented recursively.
use instrument_register to register entry function and exit function.
They will be called by the instrumented function
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
The divider should be rounded to the next full integer to ensure that
the resulting SPI frequency is <= target frequency, i.e. the SPI is
not overclocked.
After this, RISC-V fully supports the kmap interface.
Due to the current design limitations of having only a single L2 table
per process, the kernel kmap area cannot be mapped via any user page
directory, as they do not contain the page tables to address that range.
So a "kernel address environment" is added, which can do the mapping. The
mapping is reflected to every process as only the root page directory (L1)
is copied to users, which means every change to L2 / L3 tables will be
seen by every user.
Replace DEBUGASSERTs with sanity checks. DEBUGASSERT()s are
not necessarily enabled at all, thus risking the functionality
especially in that case. Remove PANICs as well.
Don't enable the ihc irq too early. If enabled, and the master
is already up, the irq is being issued so that the system gets
stuck or is severely slowed down. Master may be already up if
this NuttX hart only is rebooted, for example.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
Version 1.3.1 is the latest tagged version as of November
the 21st, 2023. This patch prepares the required changes
to make v1.3.1 work.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
Some APIs are implemented both in common code and CHIP-specific code,
and the link needs to be based on the implementation in CHIP, so move
NUTTX_CHIP_ABS_DIR before common src.
Signed-off-by: zhanghongyu <zhanghongyu@xiaomi.com>
Connecting the static page tables to each other was done with the page
table virtual address (riscv_pgvaddr) when the page table physical address
is needed.
I can never remember whether the static page table list contains the
table's physical or kernel virtual address.. Add the fact as a comment
there.
Also add the limitations that come from this static page table approach
for Sv32.
This adds option to do PMP configuration via mpfs_board_pmp_setup instead
of just opening up everything. In this case, it is up to the specific
board to implement the PMP configuration in whichever way it sees fit.
Previously, GPIO interrupts were not correctly mapped to the peripheral base register responsible for the interrupt.
Change the IRQ number calculation so the interrupts work correctly on all GPIO peripheral bases.
For TOR: Any size and 4-byte aligned address is required
For NA4: Only size 4 and 4-byte aligned address is good
For NAPOT: Minimum size is 8 bytes, minimum base alignment is 8 bytes,
and size must be power-of-two aligned with base
This commit simplifies these checks and removes all the nonsense added
by a misunderstanding of how the MPFS / Polarfire SoC's PMP works.
These options are just wrong and a result of misunderstanding of the
Polarfire SoC spec. There are no feature limitations in the CPU PMP
implementation -> remove any configuration options added.
Fix case where NULL is de-referenced via tx/rx buffer or descriptor. Only
1 queue is currently set up for each, so the indices 1,2,3 are not valid
and should not be handled.
The BIT macro is widely used in NuttX,
and to achieve a unified strategy,
we have placed the implementation of the BIT macro
in bits.h to simplify code implementation.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
All kernel memory is mapped paddr=vaddr, so it is trivial to give mapping
for kernel memory. Only interesting region should be kernel RAM, so omit
kernel ROM and don't allow re-mapping it.
The SHM physically backed memory does not belong to the user process,
but the page table containing the mapping does -> delete the page table
memory regardless.
This is a collection of tweaks / optimizations to the driver to limit
CPU usage as well as interrupt processing times.
The changes are as follows:
- setfrequency is now no-op if the frequency does not change. Accessing
MPFS_SPI_CONTROL requires synchronization to the FIC domain, which
takes unnecessary time if nothing changes
- load/unload FIFO loops optimized so !buffer, priv->nbits and i==last are
only tested once (instead of for every word written in loop).
- Disable the RX interrupt only once (again, FIC domain access is slow)
- In case a spurious MPFS_SPI_DATA_RX interrupt arrives, just wipe the
whole RX FIFO, instead of trying to read it byte-by-byte
PMPCFG_A_TOR region may have zero size. The pmp configuration
currently fails for zero-sized TOR. This patch bypasses such a
restriction.
Also replace log2ceil with LOG2_CEIL from lib/math32.h.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
Follow other risc-v based chips, and fix:
```
chip/rv32m1_irq.c: In function 'up_irqinitialize':
Error: chip/rv32m1_irq.c:98:3: error: array subscript -2048 is outside array bounds of 'uint8_t[2147483647]' {aka 'unsigned char[2147483647]'} [-Werror=array-bounds]
98 | riscv_stack_color(g_intstacktop - intstack_size, intstack_size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /github/workspace/sources/nuttx/arch/risc-v/src/common/riscv_internal.h:40,
from chip/rv32m1_irq.c:36:
/github/workspace/sources/nuttx/arch/risc-v/src/common/riscv_common_memorymap.h:72:16: note: at offset -2048 into object 'g_intstacktop' of size [0, 2147483647]
72 | EXTERN uint8_t g_intstacktop[]; /* Initial top of interrupt stack */
| ^~~~~~~~~~~~~
cc1: all warnings being treated as errors
```
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
The MPFS eMMC DMA has some requirements that are only fulfilled by
enabling separate DMA access buffers (FAT DMA buffers) and by forcing
indirect access to the media via FAT_FORCE_INDIRECT.
Why? Direct access to user buffers violates two things:
- Buffer alignment is not ensured
- Buffers are user memory (problematic in BUILD_KERNEL)
- There are occasional extra STOPs being sent due to an IP bug when using an
FPGA based I2C. Add a flag "inflight" to mask out extra STOP interrupts when
using the FPGA based implementation
- There are no MPFS_I2C_ST_STOP_SENT irq's "initally". It is just already
either success or still in progress
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
This adds initialization of the ksz9477 switch when used instead of
a PHY, directly connected to SGMII
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Add a function to read PolarFire's serial number from system controller, and use the first five digits as device's mac address
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Workaround to avoid deadlock situation: The RX shall not try to wait for complete
frame in case there is RX errors detected.
In case mpfs_receive is called, it keeps waiting for complete frame and
also keeps the net_lock locked. In the mean while, the TX may run out of free
descriptors, but can not get net_lock mutex lock to be able to release used
descriptors. If there are no free TX descs it disables RX interrupts because
it may require to send response to the received frame.
So, TX side keeps RX interrupts disabled due to lack of free descriptors and
RX blocks TX to release those descs by stubbornly waiting for complete frame.
RPMSG is associated with the use of HPWORK / LPWORK queues.
After sending a message to the remote end (Linux), the system
waits for an ack before proceeding. Unfortunately this may
take sometimes more time than one would expect. Ack waiting is
also unnecessary: nothing is done with that information. Even
worse, the net_lock() is also held during the blocked time so
it blocks other network stacks that are unrelated to this.
Also reorganize the mpfs_opensbi_*.S so that the trap
handler is easily relocated in the linker .ld file without
the need to relocate the utils.S. This makes it easier to
separate the files into own segments. The trap file should be
located in the zero device.
Moreover, provide support for simultaneous ACK and message
present handling capabilities in both directions. There are
times when both bits are set but only other is being handled.
In the end, the maximum throughput of the RPMSG bus increases
easily 10-20% or even more.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
SiFive document: "ECC Error Handling Guide" states:
"Any SRAM block or cache memory containing ECC functionality needs to be
initialized prior to use. ECC will correct defective bits based on memory
contents, so if memory is not first initialized to a known state, then ECC
will not operate as expected. It is recommended to use a DMA, if available,
to write the entire SRAM or cache to zeros prior to enabling ECC reporting.
If no DMA is present, use store instructions issued from the processor."
Clean the cache at this early stage so no ECC errors will be flooding later.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
Check that the base address and region size are properly aligned with
relation to each other.
With NAPOT encoding the area base and size are not arbitrary, as when
the size increases the amount of bits available for encoding the base
address decreases.
Implement the previously empty mpfs_ddr_rand with adapted "seiran128" code
from https://github.com/andanteyk/prng-seiran
This implements a non-secure prng, which is minimal in size. The DDR training
doesn't need cryptographically secure prng, and linking in the NuttX crypto
would increase the code size significantly for bootloaders.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Also move the DDRC clock enablement and reset to mpfs_init_ddr. This doesn't
change the functionality, but is the cleaner place for it.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Especially the write calibration must bail out if the memory test timeouts,
otherwise the device will get stuck in running the memory test in sequence,
and it will always timeout.
Negative error value was also not properly returned from mpfs_mtc_test.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
It doesn't make sense to try to auto-determine write latency, it may pass with too low value.
Keep the existing implementation if the write latency has been set to minimum
value, otherwise just set it.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Calculate how long an I2C transation will take in microseconds, and use
this as the timeout for mpfs_i2c_sem_waitdone.
The reason for doing this is not to keep an i2c bus reserved for the full
1 second timeout, if e.g. a sensor is not on the bus / is faulty and
non-responsive. Reading the other sensors will be blocked for a relatively
long time (1 second) in this case. This fixes such behavior.
1 page might not be enough, if the task has a bigger stack. Best effort
is to allocate the default amount, however this won't work will all
tasks either.
Currently TX_FIFO_SIZE is not altered in mpfs_ep_set_fifo_size(),
but all paths (RX and TX) change MPFS_USB_RX_FIFO_SIZE only.
Fix the TX_FIFO_SIZE setup.
Signed-off-by: Eero Nurkkala <eero.nurkkala@offcode.fi>
This adds a config flag to remove manual bclksclk training if one wants
to just use the controller's own training.
Manual addcmd training depends on the manual bclksclk training, so this
also adds this dependency in Kconfig.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Decreasing the value may increase DQ/DQS window size. Keep the default value
(1) for the existing board configurations.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Adds a platform specific implementation for tickless schedular operation. This includes:
- Tickless operation for vexriscv cores.
- Tickless operation for vexriscv-smp cores.
- Ticked operation for vexriscv-smp cores.
Ticked operation for vexriscv core has been refactored.
Additional default configuration added to demonstrate operation.
Both tickless and ticked options use Litex timer0 for scheduling intervals. This is significantly faster than interfaceing with the risc-v mtimer through opensbi.
Adding the CONFIG_ARCH_PERF_EVENTS configuration to enable
hardware performance counting,solve the problem that some platform
hardware counting support is not perfect, you can choose to use
software interface.
This is configured using CONFIG_ARCH_PERF_EVENTS, so weak_functions
are removed to prevent confusion
To use hardware performance counting, must:
1. Configure CONFIG_ARCH_PERF_EVENTS, default selection
2. Call up_perf_init for initialization
Signed-off-by: wangming9 <wangming9@xiaomi.com>
1. virtio devics/drivers match and probe/remote mechanism;
2. virtio mmio transport layer based on OpenAmp (Compatible with both
virtio mmio version 1 and 2);
3. virtio-serial driver based on new virtio framework;
4. virtio-rng driver based on new virtio framework;
5. virtio-net driver based on new virtio framework
(IOB Offload implementation);
6. virtio-blk driver based on new virtio framework;
7. Remove the old virtio mmio framework, the old framework only
support mmio transport layer, and the new framwork support
more transport layer and this commit has implemented all the
old virtio drivers;
8. Refresh the the qemu-arm64 and qemu-riscv virtio related
configs, and update its README.txt;
New virtio-net driver has better performance
Compared with previous virtio-mmio-net:
| | master/-c | master/-s | this/-c | this/-s |
| :--------------------: | :-------: | :-------: | :-----: | :-----: |
| qemu-armv8a:netnsh | 539Mbps | 524Mbps | 906Mbps | 715Mbps |
| qemu-armv8a:netnsh_smp | 401Mbps | 437Mbps | 583Mbps | 505Mbps |
| rv-virt:netnsh | 487Mbps | 512Mbps | 760Mbps | 634Mbps |
| rv-virt:netnsh_smp | 387Mbps | 455Mbps | 447Mbps | 502Mbps |
| rv-virt:netnsh64 | 602Mbps | 595Mbps | 881Mbps | 769Mbps |
| rv-virt:netnsh64_smp | 414Mbps | 515Mbps | 491Mbps | 525Mbps |
| rv-virt:knetnsh64 | 515Mbps | 457Mbps | 606Mbps | 540Mbps |
| rv-virt:knetnsh64_smp | 308Mbps | 389Mbps | 415Mbps | 474Mbps |
Note: Both CONFIG_IOB_NBUFFERS=64, using iperf command, all in Mbits/sec
Tested in QEMU 7.2.2
Signed-off-by: wangbowen6 <wangbowen6@xiaomi.com>
Signed-off-by: Zhe Weng <wengzhe@xiaomi.com>
VELAPLATFO-12536
This provides the initial hooks for Flattened Device Tree support
with QEMU RV. It also provides a new procfs file that exposes the
fdt to userspace much like the /sys/firmware/fdt endpoint in Linux.
See https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-ofw
Nodes in the fdt are not yet usable by the OS.
Signed-off-by: Brennan Ashton <bashton@brennanashton.com>
Signed-off-by: liaoao <liaoao@xiaomi.com>