Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects #2714

MarkBaker · 2022-03-26T19:01:46Z

This is:

- [ ] a bugfix
- [ ] a new feature
- [X] refactoring

Checklist:

Changes are covered by unit tests
Code style is respected
Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
CHANGELOG.md contains a short summary of the change
Documentation is updated as necessary

Why this change is needed?

Experiment to replace the large "unstructured" array of Excel function signature definitions with a structured collection of Value Objects to see what affect it has on memory usage and speed.

This isn't a perfect implementation yet: but it's enough to take a look at the memory usage and speed of the new solution.

Hopefully, this approach will also make Excel function definitions/implementations easier to maintain in future, and potentially allow new benefits such as defining the Excel version where they became available, and also if Microsoft have deprecated them.

MarkBaker · 2022-03-27T12:16:46Z

@oleibman - Thought you might like to take a look at this, replacing the large array of Excel function signatures in Calculation with a collection of Value Objects that's lazy-loaded as functions are referenced. Entries in the collection; and properties in the VOs can be accessed either as array elements or as object properties.

It's still not 100% yet, there's still some work I need to do on it; but it does seem to show both memory and speed improvements on the test suite (despite the one test that loads every single VO in the collection)... at least there's lower memory usage for PHP 7.4 and PHP 8.0; but all versions runs in the CI tests seem faster. How that will translate into real-world end-user performance, I don't know; but it should still be better.

…n signature definitions with a structured collection of Value Objects to see what affect it has on memory usage and speed. This isn't a perfect implementation yet: but it's enough to take a look at the memory usage and speed.

oleibman · 2022-03-29T19:18:14Z

@MarkBaker, I will certainly take a look. However, I will be offline for personal reasons for about a week, so will probably not be able to get back to you before late next week.

oleibman · 2022-04-07T14:27:46Z

@MarkBaker, as usual I am learning so much from reviewing this. It will take a while for me to put it all together. For now, I do have one question. You have moved the initialization of the large array phpSpreadsheetFunctions out of Calculation/Calculation and into Calculation/Engine/ExcelFunctions. But, even in the latter, you still have a large array, simpler than the original but still large. Might it be possible to just initialize that to an empty array (or possibly just the false entries if any) and populate it entirely on demand?

MarkBaker · 2022-04-07T19:08:04Z

Might it be possible to just initialize that to an empty array (or possibly just the false entries if any) and populate it entirely on demand?

Also possible: The new array is a lot smaller than the original array; but it does still use a not insignificant amount of memory.... it took me a while to get this working "as is", but a version of the ExcelFunctions that tests for in_array(), and if not then tests if there is a signature definition using class_exists(), loading it into the array if there is... I had thought about it at the time, but ran out of weekend to experiment, and haven't had the time to revisit it since.
The other variant there is that class_exists() is faster when called without loading the class; but as we'd need to load and instantiate it anyway if class_exists() returned a true, I'm not sure that would make any noticeable difference.

I'll have a go at that on a separate branch next week, so that we can compare memory/speed results for both variations.

Given that only a small number of Excel functions are typically used in any given spreadsheet; but that those few functions are often used repeatedly, it did seem a good target for memory optimisation; though I was a bit concerned about the performance cost of the additional "load on demand"... that pattern of real world demand should see both memory and performance benefits, even if our test suite doesn't fully reflect that. Though I was pleased that there didn't seem any real performance overhead with the change, and the memory saving is visible in the unit test runs, even though there are tests there that would load every function definition.

MarkBaker · 2022-04-07T19:19:36Z

Just as a warning: the biggest issue that I had with this was the large number of additional classes slowing PHPStorm's re-indexing, and making it slower overall... I doubt that would be a problem once I switch to my new laptop, with more memory and a faster processor, but it is a maintenance overhead.
That might be a result of 500 classes in a single directory; and I did try breaking them into subdirectories; but that created extra run-time overhead, and seemed to have no benefit in PHPStorm at all; so I left them all in the one folder.

MarkBaker · 2022-04-07T19:36:08Z

And if I do start from an initial empty array, I'll also have to play around a bit with the logic that builds the list of implemented functions when we do a release; because that uses the array keys as its base data; but I'm sure I can work out an alternative reading through the Engine/Functions folder to handle that. Slower release isn't real problem if we're improving run-time performance.

- TEXTBEFORE – Returns text that’s before delimiting characters - TEXTAFTER – Returns text that’s after delimiting character - TEXTSPLIT – Splits text into rows or columns using delimiters - VSTACK – Stacks arrays vertically - HSTACK – Stacks arrays horizontally - TOROW – Returns the array as one row - TOCOL – Returns the array as one column - WRAPROWS – Wraps a row array into a 2D array - WRAPCOLS – Wraps a column array into a 2D array - TAKE – Returns rows or columns from array start or end - DROP – Drops rows or columns from array start or end - CHOOSEROWS – Returns the specified rows from an array - CHOOSECOLS – Returns the specified columns from an array - EXPAND – Expands an array to the specified dimensions

MarkBaker force-pushed the Experiment-Maintaining-Excel-Function-List-as-an-ArrayObject branch 4 times, most recently from 3dff948 to e6bbb7a Compare March 27, 2022 11:45

MarkBaker force-pushed the Experiment-Maintaining-Excel-Function-List-as-an-ArrayObject branch from e6bbb7a to 3878e32 Compare March 27, 2022 19:02

MarkBaker added the memory label Apr 2, 2022

MarkBaker added calculation engine experiment labels Apr 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects #2714

Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects #2714

MarkBaker commented Mar 26, 2022 •

edited

MarkBaker commented Mar 27, 2022

oleibman commented Mar 29, 2022

oleibman commented Apr 7, 2022

MarkBaker commented Apr 7, 2022

MarkBaker commented Apr 7, 2022 •

edited

MarkBaker commented Apr 7, 2022 •

edited

Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects #2714

Are you sure you want to change the base?

Experiment to replace the large array of Excel function definitions with a FlyWeight Pattern collection of Value Objects #2714

Conversation

MarkBaker commented Mar 26, 2022 • edited

Why this change is needed?

MarkBaker commented Mar 27, 2022

oleibman commented Mar 29, 2022

oleibman commented Apr 7, 2022

MarkBaker commented Apr 7, 2022

MarkBaker commented Apr 7, 2022 • edited

MarkBaker commented Apr 7, 2022 • edited

MarkBaker commented Mar 26, 2022 •

edited

MarkBaker commented Apr 7, 2022 •

edited

MarkBaker commented Apr 7, 2022 •

edited