Investigate Running all Example code for validity #824

Open · 4 tasks · ntindle opened this issue Aug 17, 2021 · 11 comments
Labels: Discussion (This is open for a discussion.), General

Comments

ntindle (Member) commented Aug 17, 2021

Feature Request

Add support for running tests and verification for all code in the AAA.

Description

Using globbing and a bit of comparison magic, we could associate each code file extension with a specific interpreter/compiler/transpiler/etc., and then run the code.

Once run, we could compare the outputs of every implementation of the same algorithm for validity and stability.

This would make it easier to add new examples in an existing language, as it would require far less domain knowledge of each individual language.

It would have the side effect of complicating implementations in esoteric languages, since we would want to add each one to our supported comparison set.
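
A rough sketch of that first step (in Python; the extension-to-command mapping and the contents/*/code/ layout used here are illustrative assumptions, not a final design):

# Sketch: collect every code file under contents/*/code/ and group it by extension.
# The RUNNERS mapping is hypothetical and far from complete; compiled languages
# (C, Rust, ...) would additionally need a build step before they can be run.
from pathlib import Path

RUNNERS = {
    ".py": ["python3"],
    ".jl": ["julia"],
    ".js": ["node"],
}

def collect_examples(root="contents"):
    examples = {}
    for path in Path(root).glob("*/code/*/*"):
        if path.suffix in RUNNERS:
            examples.setdefault(path.suffix, []).append(path)
    return examples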

Additional context

This will probably be pretty complicated and need custom tooling.

For Algorithm Archive Developers

  • This feature can be added to the Master Overview (if it cannot be, explain why in a comment below -- lack of technical expertise, not relevant to the scope of this project, too ambitious)
  • There is a timeline for when this feature can be implemented
  • The feature has been added to the Master Overview
  • The feature has been implemented (Please link the PR)
leios (Member) commented Aug 17, 2021

A really important point here is that we need some consistent way to run all of the tests in every language (#691). In particular, certain languages (like Julia / Rust) should come with all the necessary files to load dependencies.

ntindle (Member, Author) commented Aug 19, 2021

Thinking about the architecture for this...

I'm imagining a scenario where we iterate over every algorithm in contents/ and run a script that will extract every language out of the code/ folder.

Each language will then be passed to a factory or builder that will create an object that we can execute and get results from. (Not sure if this is the perfect pattern)

Adding support for a language would require implementing the interface in the factory and returning that object, or adding support to the builder.

All of the other solutions I've thought of have been much less elegant than this.

Still need to think about whether this is the right design pattern, and I would appreciate insight from some real architects.
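
A very rough sketch of that factory idea (Python; every class and function name here is purely illustrative):

import subprocess
from abc import ABC, abstractmethod

class ExampleRunner(ABC):
    @abstractmethod
    def run(self, path) -> str:
        """Execute one example file and return whatever it printed."""

class PythonRunner(ExampleRunner):
    def run(self, path) -> str:
        result = subprocess.run(["python3", str(path)],
                                capture_output=True, text=True, check=True)
        return result.stdout

def runner_for(extension):
    # Factory: map a file extension to the object that knows how to run it.
    runners = {".py": PythonRunner()}
    if extension not in runners:
        raise NotImplementedError("no runner registered for " + extension)
    return runners[extension]

Adding a language would then mean writing one more ExampleRunner subclass and registering it here.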

stormofice (Contributor) commented:
I think the main part would be creating a standardized testing format, as most of the examples currently don't have any tests associated with them (and if they do, the formats vary a lot).

If the output followed a specific format, it would only come down to running the example with its appropriate interpreter/compiler/..., getting the results from standard output or a file, and validating them.

For non-compiled languages this should be quite easy, as long as all the necessary libraries are provided.
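
For instance, the whole check for an interpreted language could be as small as this sketch (Python; the stored expected-output file is an assumed convention that does not exist yet):

import subprocess

def check_example(cmd, expected_file):
    # Run one example and compare its stdout to a stored expected output.
    actual = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    with open(expected_file) as f:
        return actual == f.read()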

leios (Member) commented Aug 23, 2021

This is a good point. We need a list of all the implementations without tests (I think almost all of them have at least a simple integration test, but we might be missing a few), and a list of all implementations with tests that are hard to check between languages.

For the output... Most of the time, we are outputting some .dat file; however, sometimes these files are specifically meant to be plotted to make an image that looks a certain way. This is hard to test properly. I think another big problem here is that I had the "bright" idea to use rand() a lot to test some algorithms, and even if we set seeds, they are not universal across languages.

So maybe we should make sure that each test creates data files that are "close enough" to each other? I mean, we can check "close enough" with an is_approx() function in most languages, so this seems doable?
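
A sketch of that "close enough" check on two .dat files (Python; the tolerances are arbitrary placeholders):

from math import isclose

def dat_files_close(path_a, path_b, rel_tol=1e-6, abs_tol=1e-9):
    # Compare two whitespace-separated .dat files value by value.
    with open(path_a) as a, open(path_b) as b:
        values_a = [float(x) for x in a.read().split()]
        values_b = [float(x) for x in b.read().split()]
    if len(values_a) != len(values_b):
        return False
    return all(isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(values_a, values_b))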

stormofice (Contributor) commented:
I just did a quick check and noticed that 16/22 Julia files do write to standard output (mostly older ones), but changing them to write to a .dat file instead should not be a problem.

For chapters where the output does not consist only of many data points, the format should also be specified in some way.
For example, in the Gaussian Elimination chapter it would be necessary that every implementation prints its output matrices the same way (e.g. [11, 12, 13], [21, 22, 23], ...) without additional text like Starting Gaussian elimination (or similar).

This could be circumvented, however, by defining that lines starting with # (or any other character) should not be used as data for comparison.

I agree that outputting to .dat files and comparing them approximately would work best here.

ntindle (Member, Author) commented Aug 24, 2021

I would also like to state that outputting to stdout is probably a valid option; we could use shell redirection to write it to a file.

I think we should investigate how not having any output from the programs in the AAA would affect future plans and ideas (such as showing the output on the site).

stormofice (Contributor) commented:
Now that I think about it, I would prefer outputting to stdout.

In most languages it is easier to output directly to the console than to a file (which leads to more concise code), and complementary informational outputs such as Starting ... make more sense.

This would also make it easier to enable showing the output on the AAA directly in the future.

If outputting to stdout were the standard, it would just come down to creating the expected outputs, then running the examples and comparing against them. This should not be that hard for the more popular languages with 10 or more submissions.

For reference, this is an overview of how many submissions there are per language [Click]

I believe that targeting the heavily used ones would probably be fine as a first step.

ntindle (Member, Author) commented Aug 25, 2021

sidebar: how did you generate the graph?

stormofice (Contributor) commented Aug 25, 2021

I executed the following in the contents/ directory:
find . -type f | sed -n 's/..*\.//p' | sort | uniq -c | sort -r
This creates a list of <number of occurrences> <file extension> pairs; I then filtered out the non-programming ones (e.g. png, svg, ...) and renamed some of them to look friendlier.

stormofice (Contributor) commented:
After some discussion on Discord, this is what came out of it:

  • Standardizing code output is important for automatic testing
  • Labeled outputs are still important for human readability (as the AAA is for humans to read)
  • The current suggestion for standardized output would look like the following:
[#] Calculating stuff
31

Lines starting with [#] would be ignored when comparing the results

  • Parsing different kinds of formatted output will still be necessary, as printing arrays/matrices/etc. in language-idiomatic ways will produce differently formatted outputs (for example, [x1, x2, x3] vs x1 x2 x3); see the sketch below.
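
A sketch of how a checker could handle that format (Python; the regex-based normalization is only one possible way to paper over the bracket/comma differences):

import re

def parse_output(text):
    # Drop "[#]" label lines, then pull every numeric token out of what remains,
    # so "[x1, x2, x3]" and "x1 x2 x3" both reduce to the same list of floats.
    numbers = []
    for line in text.splitlines():
        if line.startswith("[#]"):
            continue
        for token in re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", line):
            numbers.append(float(token))
    return numbers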

stormofice (Contributor) commented:
As per #864, we decided to use [#]\n instead of \n for more consistent formatting, which is easy to understand for both humans and machines.

Amaras mentioned this issue Dec 8, 2021