RP2040 SPI RAM performance is very poor for most workloads #5

Open
6 tasks
asumagic opened this issue May 27, 2022 · 3 comments

Comments

@asumagic
Collaborator

asumagic commented May 27, 2022

There are several reasons for this:

  • The page cache is 1-way (direct-mapped) and keeps no statistics. We could track which pages get swapped in and out the most and switch to an n-way cache that makes use of this.
  • The page cache size and mapping could be chosen in a much smarter way. Need to think about how region flushes etc. interact with this.
  • We could transmit cached pages to RAM in the background using DMA. NOTE: It seems like the CSn behavior is pretty crappy with SPI and I haven't managed to get it to behave properly. Moving to QSPI over PIO would solve this issue at the same time, if we're careful to design it to be DMA-friendly.
  • We use SPI, not QSPI. QSPI would require PIO; need to investigate which PIO resources are available for this.
  • It is not clear why, but page rx doesn't work beyond 25 MHz even though we use the supposedly 125 MHz-compatible command. Using a PCB might make this more bearable.
  • The page size is 1 KiB. This reduces the number of hard faults we encounter, but it also means we transfer much more data than good caching would need. However, reducing the transfer size causes issues (probably on page tx, as page rx seems fine). It's possible that raising CE soon enough would prevent this from happening. As CE is not currently tied to SPI, it may simply be raised too late.
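
To illustrate the first bullet, here is a minimal sketch of a 2-way set-associative page cache with LRU replacement and hit/miss/write-back counters. It is not yocto-8's code: `PageCache`, `CacheStats`, `kNumSets` and `kWays` are hypothetical names, the sizes are arbitrary, and the actual page rx/tx over SPI is only marked by comments.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not yocto-8 code. Sizes are arbitrary for illustration.
constexpr std::uint32_t kPageSize = 1024;  // 1 KiB pages, as in the issue
constexpr std::size_t kNumSets = 4;        // assumed number of sets
constexpr std::size_t kWays = 2;           // assumed associativity

struct CacheLine {
    bool valid = false;
    bool dirty = false;
    std::uint32_t page = 0;      // external RAM page number held by this line
    std::uint32_t last_use = 0;  // tick of the most recent access (for LRU)
};

struct CacheStats {
    std::uint32_t hits = 0;
    std::uint32_t misses = 0;
    std::uint32_t writebacks = 0;
};

class PageCache {
public:
    // Looks up the page containing `addr` and returns true on a hit.
    // On a miss, an empty or least recently used line of the set is replaced.
    bool access(std::uint32_t addr, bool is_write) {
        const std::uint32_t page = addr / kPageSize;
        auto& set = lines_[page % kNumSets];
        ++tick_;

        CacheLine* victim = &set[0];
        for (CacheLine& line : set) {
            if (line.valid && line.page == page) {
                line.last_use = tick_;
                line.dirty = line.dirty || is_write;
                ++stats_.hits;
                return true;
            }
            // Prefer an empty line as the victim, otherwise the oldest one.
            if (victim->valid &&
                (!line.valid || line.last_use < victim->last_use)) {
                victim = &line;
            }
        }

        ++stats_.misses;
        if (victim->valid && victim->dirty) {
            ++stats_.writebacks;  // a real driver would do the page tx here
        }
        // A real driver would do the page rx here.
        victim->valid = true;
        victim->dirty = is_write;
        victim->page = page;
        victim->last_use = tick_;
        return false;
    }

    const CacheStats& stats() const { return stats_; }

private:
    std::array<std::array<CacheLine, kWays>, kNumSets> lines_{};
    CacheStats stats_;
    std::uint32_t tick_ = 0;
};
```

With these sizes, pages 0 and 4 map to the same set. Alternating between them would evict on every access in a 1-way cache, while the 2-way version keeps both resident. The counters are the kind of statistics that could later inform the cache size and mapping choices.
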
@aroesz98

Is there any chance that the yocto-8 project author will create a standalone library to handle SPI flash, but in C instead of C++? It would be the first open-source library of its kind available to everyone. It could be very helpful to a lot of people and could open the door to more advanced projects, like using QSPI RAM via PIO for LCD double buffering, or compressing/decompressing large amounts of data using minilzo or unzipLIB. We could use it as RAM for uClinux too. I'm currently working on a QSPI driver using 2 state machines, but so far I only get dual SPI and have some problems with that. I think I can make it work, but it will take some time. Regards ;)

@asumagic
Collaborator Author

asumagic commented Nov 16, 2022

@aroesz98 :

The entire code for this lives at https://github.com/yocto-8/yocto-8/tree/main/src/arch/pico/extmem
Specifically, the code for the actual SPI RAM interfacing is in spiram.cpp/.hpp.

... But this SPI RAM driver is very primitive (you can get away with ~100 LoC), and that code is essentially useless if you want to implement QSPI RAM using PIO, other than as a reference alongside the datasheet for these PSRAM chips.
FYI, I'm not even sure how good it is as a reference, because I had unsolved corruption issues when trying to drive the chip at a higher frequency. I believe that was most likely a hardware issue in my setup, though.

For now, I don't have much time or interest in implementing this unfortunately :(
With the license being MIT, anyone can pull it out and port it to C if they'd like. If you only need the cursed RAM fault handling mechanism, you don't really even need to interface with this code at all. As long as it's compiled into your binary, it'll override the fault handler.
It's not super clean but it should be well decoupled from the rest of y8's Pico platform code.

@asumagic
Collaborator Author

asumagic commented Nov 19, 2022

I also should have mentioned that I have a WIP "paging" implementation that abuses the MPU, over at the mpu-abuse branch (now in main). For certain use cases, it has the potential to be significantly faster. But it does some questionable things and will run very poorly when executing code from flash.

That, of course, is only relevant if what you want is to transparently use external RAM as if it were regular memory. If you need to do predictable bulk copies, such as with your double buffering example, then doing the reads explicitly from code is going to be much faster and much less hacky.

Performance is still depressingly bad if you're doing anything that remotely resembles random access :)
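
As a rough back-of-the-envelope estimate of why random access hurts so much compared to bulk copies, the sketch below uses only the figures from this thread (1 KiB pages, 25 MHz SPI clock). The 4-byte command and address header and all function names are assumptions for illustration. Chip select gaps, dummy cycles and fault handling overhead are ignored, so real numbers will be worse.

```cpp
constexpr double kClockHz = 25e6;  // SPI clock that worked reliably in the issue
constexpr int kPageBytes = 1024;   // page size from the issue
constexpr int kHeaderBytes = 4;    // ASSUMPTION: 1 command byte + 3 address bytes

// Time in microseconds to move one page, with `lanes` data bits per clock
// (1 for plain SPI, 4 for QSPI). Simplification: the header is assumed to
// use the same number of lanes as the payload.
constexpr double page_transfer_us(int lanes) {
    return (kHeaderBytes + kPageBytes) * 8.0 / lanes / kClockHz * 1e6;
}

// Worst case for a single access that misses: write back a dirty page,
// then fetch the new one.
constexpr double dirty_miss_us(int lanes) {
    return 2.0 * page_transfer_us(lanes);
}

// Effective throughput in bytes per second if every 4-byte access causes
// a clean page miss (a fully random access pattern).
constexpr double random_word_bytes_per_s(int lanes) {
    return 4.0 / (page_transfer_us(lanes) * 1e-6);
}

// Throughput in bytes per second when whole pages are copied sequentially.
constexpr double bulk_bytes_per_s(int lanes) {
    return kPageBytes / (page_transfer_us(lanes) * 1e-6);
}
```

Under these assumptions, a clean page miss on plain SPI costs roughly 329 µs and a dirty one roughly 658 µs. A workload where every 4-byte access misses tops out around 12 KB/s, while sequential bulk copies can approach 3.1 MB/s, which is a factor of 256 (the page size divided by the word size).
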
