XML mangles order of heterogeneous elements #1983

Open
mrnoname1000 opened this issue Mar 19, 2024 · 2 comments

@mrnoname1000

mrnoname1000 commented Mar 19, 2024

Describe the bug
yq tries to convert XML documents to mappings wherever possible; however, this heuristic appears to be broken. Non-consecutive elements with the same tag name are grouped together, which changes the element order of the original document.

Version of yq: 4.42.1
Operating system: mac
Installed via: homebrew

Input XML
input.xml

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
	<key>AKLastIDMSEnvironment</key>
	<integer>0</integer>
	<key>AKLastLocale</key>
	<string>en_US</string>
	<key>AppleAntiAliasingThreshold</key>
	<integer>4</integer>
</dict>
</plist>

Command

yq -px -ox < input.xml > output.xml

Actual behavior

output.xml

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
  <dict>
    <key>AKLastIDMSEnvironment</key>
    <key>AKLastLocale</key>
    <key>AppleAntiAliasingThreshold</key>
    <integer>0</integer>
    <integer>4</integer>
    <string>en_US</string>
  </dict>
</plist>

Expected behavior

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
  <dict>
    <key>AKLastIDMSEnvironment</key>
    <integer>0</integer>
    <key>AKLastLocale</key>
    <string>en_US</string>
    <key>AppleAntiAliasingThreshold</key>
    <integer>4</integer>
  </dict>
</plist>
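A deliberately naive decoder reproduces the same reordering. The Python sketch below is hypothetical and is not yq's implementation. It only assumes that child elements are collected into a dict keyed by tag name, with repeated tags promoted to lists. Once that dict is written back out, every value for a given tag is emitted together, so the key/value interleaving of the plist is lost. `to_mapping` and `from_mapping` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified decoder (NOT yq's code): group children by tag name.
def to_mapping(elem):
    result = {}
    for child in elem:
        value = to_mapping(child) if len(child) else child.text
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second occurrence of a tag: promote the stored value to a list.
            result[child.tag] = [result[child.tag], value]
    return result

# Re-encode: all values for one key are written together, in dict insertion order.
def from_mapping(tag, mapping):
    elem = ET.Element(tag)
    for key, value in mapping.items():
        for v in (value if isinstance(value, list) else [value]):
            if isinstance(v, dict):
                elem.append(from_mapping(key, v))
            else:
                ET.SubElement(elem, key).text = v
    return elem

DICT_XML = """<dict>
  <key>AKLastIDMSEnvironment</key>
  <integer>0</integer>
  <key>AKLastLocale</key>
  <string>en_US</string>
  <key>AppleAntiAliasingThreshold</key>
  <integer>4</integer>
</dict>"""

mapping = to_mapping(ET.fromstring(DICT_XML))
roundtrip = from_mapping("dict", mapping)
print([child.tag for child in roundtrip])
# ['key', 'key', 'key', 'integer', 'integer', 'string']
```

The printed tag order matches the "Actual behavior" output: the mapping remembers which values belong to each tag, but not where each value sat relative to the other tags.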

Additional context
In my opinion, trying to represent XML as key/value pairs is an anti-pattern. Any XML element can contain any other element or node, in any order and any number of times. Since a YAML mapping cannot hold duplicate keys, using arrays to represent every sequence of child nodes would be more consistent and correct, if a little clunky:

- +p_xml: version="1.0" encoding="UTF-8"
- plist:
  - +@version: "1.0"
  - dict:
    - key: AKLastIDMSEnvironment
    - integer: "0"
    - key: AKLastLocale
    - string: en_US
    - key: AppleAntiAliasingThreshold
    - integer: "4"
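A decoder for the all-arrays layout proposed above could look like the following minimal Python sketch. It is an illustration, not an existing yq feature. It reuses the `+@` attribute prefix from the YAML example and assumes a hypothetical `+content` key for text inside an element that also has attributes or children. The `+p_xml` declaration entry is omitted because `xml.etree.ElementTree` does not expose the XML declaration.

```python
import xml.etree.ElementTree as ET

ATTR_PREFIX = "+@"        # attribute prefix, as in the YAML example above
CONTENT_KEY = "+content"  # assumed key for text that sits next to attributes/children

def to_ordered(elem):
    """Map an element to {tag: value}, keeping document order.

    A bare leaf becomes {tag: text}; anything else becomes {tag: [...]}, a list
    of single-key dicts for attributes, leading text, and children, in order.
    Tail text (text following a child element) is ignored in this sketch.
    """
    attrs = [{ATTR_PREFIX + name: value} for name, value in elem.attrib.items()]
    children = [to_ordered(child) for child in elem]
    text = (elem.text or "").strip()
    if not attrs and not children:
        return {elem.tag: text}
    content = [{CONTENT_KEY: text}] if text else []
    return {elem.tag: attrs + content + children}

PLIST_XML = """<plist version="1.0">
<dict>
  <key>AKLastIDMSEnvironment</key>
  <integer>0</integer>
  <key>AKLastLocale</key>
  <string>en_US</string>
  <key>AppleAntiAliasingThreshold</key>
  <integer>4</integer>
</dict>
</plist>"""

ordered = to_ordered(ET.fromstring(PLIST_XML))
print(ordered)
```

Printing `ordered` gives `{'plist': [{'+@version': '1.0'}, {'dict': [{'key': 'AKLastIDMSEnvironment'}, {'integer': '0'}, ...]}]}`, the same interleaving as the YAML above. Because every child becomes its own list entry, repeated tags never collide.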
@mikefarah
Owner

Hmm, yeah, I think you're right: the only way to handle scenarios like yours would be to represent everything as an array. This would be more correct, but less usable, and I don't think most people are looking for that structure when they convert XML to YAML.

It would also mean that you couldn't really convert ordinary YAML/JSON to XML without complex work to rearrange the data into that sequence-of-key/value-pairs format.
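To illustrate the rearrangement this would require, here is a hypothetical Python helper (not part of yq) that turns an ordinary mapping, as loaded from YAML or JSON, into the sequence-of-single-key-mappings layout. A list under a key becomes repeated entries with that key, and an array with no key to name it is rejected, since there is no element name to emit for it. `to_pairs` is an illustrative name.

```python
def to_pairs(data):
    """Convert plain mappings/lists/scalars into the ordered single-key layout."""
    if isinstance(data, list):
        # An array only gets an element name from the key it sits under.
        raise ValueError("unnamed array has no XML element name")
    if not isinstance(data, dict):
        return data  # scalars are kept as-is
    pairs = []
    for key, value in data.items():
        if isinstance(value, list):
            # Each list item becomes a repeated entry with the same key.
            for item in value:
                pairs.append({key: to_pairs(item)})
        else:
            pairs.append({key: to_pairs(value)})
    return pairs

doc = {"dict": {"key": ["A", "B"], "integer": ["0", "4"]}}
print(to_pairs(doc))
# [{'dict': [{'key': 'A'}, {'key': 'B'}, {'integer': '0'}, {'integer': '4'}]}]
```

Note that the plain mapping carries no information about how the `key` and `integer` entries should interleave, so the output is grouped by key; any interleaved order would have to be supplied by whoever builds the data.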

A decoding flag could be added to parse/encode XML to/from that format, which would allow the existing behavior to continue (which I think, despite being less accurate, is what most people would intuitively expect).

I'd be interested to know how often this case comes up.

@mrnoname1000
Author

I think the existing heuristic needs improvement, but an alternate dialect would be more convenient than a --xml- flag. Since lists are harder to use, -pX instead of -px would be a good shorthand, but just renaming -pxml to -pXML doesn't really signify the intent.

Conversion to/from this format could be handled by a pair of built-in functions. There's also the unfortunate case of nested (unnamed) arrays, which XML has no concept of.
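As a sketch of the encoding half of such a pair, the hypothetical Python function below (not a yq built-in) rebuilds XML from the ordered layout with `xml.etree.ElementTree`. It assumes the `+@` attribute prefix from the issue's YAML and a hypothetical `+content` key for element text. It also shows where the nested-array problem surfaces: a bare list inside an element's list has no tag, so the encoder can only reject it.

```python
import xml.etree.ElementTree as ET

ATTR_PREFIX = "+@"        # attribute prefix, as in the issue's YAML example
CONTENT_KEY = "+content"  # assumed key for element text

def from_ordered(node):
    """Build an Element from a single-key mapping {tag: value} in the ordered layout."""
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError(f"expected a single-key mapping, got {node!r}")
    [(tag, value)] = node.items()
    elem = ET.Element(tag)
    if isinstance(value, dict):
        raise ValueError(f"<{tag}> must hold a scalar or a list, got a mapping")
    if not isinstance(value, list):
        # Scalar value: element text.
        elem.text = None if value is None else str(value)
        return elem
    for item in value:
        if isinstance(item, list):
            # An unnamed nested array: XML has no element to represent it.
            raise ValueError(f"unnamed nested array inside <{tag}> has no XML equivalent")
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"expected a single-key mapping inside <{tag}>, got {item!r}")
        [(key, child)] = item.items()
        if key.startswith(ATTR_PREFIX):
            elem.set(key[len(ATTR_PREFIX):], str(child))
        elif key == CONTENT_KEY:
            elem.text = str(child)
        else:
            elem.append(from_ordered(item))  # children are appended in list order
    return elem

ordered = {"plist": [{"+@version": "1.0"}, {"dict": [
    {"key": "AKLastIDMSEnvironment"}, {"integer": "0"},
    {"key": "AKLastLocale"}, {"string": "en_US"},
    {"key": "AppleAntiAliasingThreshold"}, {"integer": "4"},
]}]}
root = from_ordered(ordered)
print(ET.tostring(root, encoding="unicode"))
```

Because children are appended in list order, the `<key>`/`<integer>`/`<string>` interleaving of the original plist survives the round trip.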
