Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Lua array detection and order preservation when using toJSON #3369

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Pieter-Dewachter
Copy link
Contributor

User Inder00 on the development Discord server found an interesting bug when using toJSON in combination with Lua arrays that have explicitely defined indices. Reproducing this issue is very straightforward and was also confirmed by NanoBob.

local array = { [1] = "1", [2] = "2", [3] = "3", [4] = "4" }
outputChatBox(toJSON(array))

The above piece of code will output:
[ { "1": "1", "2": "2", "4": "4", "3": "3" } ]

We can see that the order of the array was not preserved and the produced JSON is actually an object rather than an array.

Now consider the following piece of code:

local array = { "1", "2", "3", "4" }
outputChatBox(toJSON(array))

You would expect the same output right? Nope! Now we actually get the correct JSON array with the correct order preserved:
[ [ "1", "2", "3", "4" ] ]

When looking into the code, MTA actually assumes that the order is preserved when reading an array from Lua in C++. This is however not guaranteed (Lua uses a hash table internally) as proven by the above examples. My pull request does a two things to properly mitigate this issue:

  • The detection of a Lua array is improved by allowing out-of-order indices. An array in Lua is defined as a table with numerical indices that start at 1 and increment by 1 continuously without gaps in between. This is now properly reflected in the code, without an assumtion of the order in which the keys are presented to C++.
  • toJSON will now preserve the order of a Lua array correctly, by sorting its elements based on the numerical key before constructing the actual JSON array.

Copy link
Contributor

@TracerDS TracerDS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good job, but please dont use using namespace std features (implicit scope).

Also lets not use default unsigned and signed values as they are platform dependent

Client/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Client/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Client/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Client/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
Server/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
@FileEX
Copy link
Contributor

FileEX commented Apr 17, 2024

Won't this break backwards compatibility?

@MegadreamsBE
Copy link
Member

Won't this break backwards compatibility?

Could always make it a flag for a while and warn scripters that the old method will no longer work down the line.

@Pieter-Dewachter
Copy link
Contributor Author

Won't this break backwards compatibility?

Could always make it a flag for a while and warn scripters that the old method will no longer work down the line.

Personally, I consider this to be an actual bug that needs fixing. Backwards compatibily shouldn't really be an issue due to the undefined behaviour (order of a hash table) that is causing the issue in the first place. When scripting (either in MTA Lua or something else), both cases (correct and incorrect behaviour) have to be handled anyway so that should not be a problem.

Currently there is an inconsistency between toJSON and fromJSONwhere you will not get the original table back if you provide it the table given in the first example. The "keys" (actually indici in this case) will be converted to a string, which causes Lua to handle the table completely different (# operator, ipairs vs pairs, ...).

That's of course my personal opinion, I'm by no means entitled to such decisions.

Copy link
Member

@AlexTMjugador AlexTMjugador left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end result of this PR looks fine to me, although I'm slightly concerned about the need for an auxiliary sorted numeric key vector for reliably detecting arrays. Does the Lua API expose some other way to detect if a table is indeed an array, hopefully one that's more efficient?

About backwards compatibility, I think we should explore exactly what conditions make tables be unordered vs. ordered. If a script author wrote a table with the more verbose { [1] = "1", [2] = "2", [3] = "3", [4] = "4" } syntax and, due to some Lua implementation detail, was lead to believe that such verbose syntax is an approved way to let tables be serialized as JSON objects, I could imagine this change causing some real pain, even if the new behavior is more correct. On the other hand, if that explicit syntax could also happen to generate ordered JSON arrays a significant fraction of the time, I'd be inclined to think that the old behavior was broken beyond repair and we shouldn't really cater for it, as it could never be relied upon in the first place.

Client/mods/deathmatch/logic/lua/CLuaArguments.cpp Outdated Show resolved Hide resolved
@Pieter-Dewachter
Copy link
Contributor Author

The end result of this PR looks fine to me, although I'm slightly concerned about the need for an auxiliary sorted numeric key vector for reliably detecting arrays. Does the Lua API expose some other way to detect if a table is indeed an array, hopefully one that's more efficient?

About backwards compatibility, I think we should explore exactly what conditions make tables be unordered vs. ordered. If a script author wrote a table with the more verbose { [1] = "1", [2] = "2", [3] = "3", [4] = "4" } syntax and, due to some Lua implementation detail, was lead to believe that such verbose syntax is an approved way to let tables be serialized as JSON objects, I could imagine this change causing some real pain, even if the new behavior is more correct. On the other hand, if that explicit syntax could also happen to generate ordered JSON arrays a significant fraction of the time, I'd be inclined to think that the old behavior was broken beyond repair and we shouldn't really cater for it, as it could never be relied upon in the first place.

The interesting thing is that it seems to trigger only in certain cases as you feared already. For example this table will already serialize to an array correctly before my changes: { [1] = "1", [2] = "2", [3] = "3" }. So the behaviour seems to be quite broken already unfortunately.

@AlexTMjugador
Copy link
Member

AlexTMjugador commented May 3, 2024

So the behaviour seems to be quite broken already unfortunately.

Right, that's useful to know. Calling fromJSON of a toJSON result should deserialize to the same Lua value no matter if the array or object representation is chosen, right? If so, then this change won't introduce a backwards-incompatible change with how MTA parses its own JSON data, which I think it's all we should cater for, as separate applications never were able to exchange arrays reliably anyway (programmers of correct and smart separate applications probably designed them to allow both representations).

@Pieter-Dewachter
Copy link
Contributor Author

So the behaviour seems to be quite broken already unfortunately.

Right, that's useful to know. Calling fromJSON of a toJSON result should deserialize to the same Lua value no matter if the array or object representation is chosen, right? If so, then this change won't introduce a backwards-incompatible change with how MTA parses its own JSON data, which I think it's all we should cater for, as separate applications never were able to exchange arrays reliably anyway (programmers of correct and smart separate applications probably designed them to allow both representations).

Before my changes, certain Lua arrays (those represented as an object in JSON, which seems to be "random") would actually deserialize to a Lua table containing the keys as a string rather than a number. After my change, this is no longer the case for any Lua array. This means calling fromJSON of a toJSON becomes fully deterministic* unlike before.

  • Array-like Lua tables with gaps in them will still deserialize to a Lua table with string keys, that's a limitation of the JSON specification as far as I know.

@TheNormalnij
Copy link
Contributor

  • Array-like Lua tables with gaps in them will still deserialize to a Lua table with string keys, that's a limitation of the JSON specification as far as I know.

This behavior is specific to the current implementation. It's technically impossible to implement a universal Lua -> JSON serializer in one function for an unknown table.
If you need a specific JSON structure, you might use object-oriented JSON builders.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants