Rules & Scoring
Rules for System Track
Principle
The scenario of the System Track aims at reproducing, as faithfully as possible, a setting in which the performance of a given system is measured against a problem encoding (and instances thereof) of unknown nature and complexity, where the problem encoding is generally "declarative" rather than "tricky".
Rules
- The System Track is open to general-purpose ASP systems able to parse a fixed input format. The input language of choice for the 4th ASP Competition is ASP-Core 2.0 (see File and Language Formats).
- The competition is run over a selection of problems. For each problem, a corresponding, fixed encoding, together with a set of benchmark instances, is chosen by the organizers (see the Official Problem Suite);
- Each participating system will be run with a fixed default setting on each problem and instance thereof.
Syntactic special-purpose solving techniques are strictly forbidden. Among syntactic solving techniques we classify switching internal solver options depending on:
- command-line file names;
- predicate and variable names;
- "signature" techniques, aimed at recognizing a particular benchmark problem, such as counting the number of rules, constraints, predicates and atoms in a given encoding.
Semantic recognition of the program structure, instead, is allowed (and encouraged). Among the allowed semantic recognition techniques, we classify:
- Recognition of the class the program encoding belongs to (e.g., stratified, positive, etc.) and the consequent activation of dedicated evaluation techniques (an illustrative sketch of such a check is given after this list).
- Recognition of general rule and program structures (e.g., common un-stratified even and odd cycles, common join patterns within a rule body, etc.), provided these techniques are general and not peculiar to a given problem selected for the competition.

In order to discourage syntactic special-purpose techniques, the competition committee holds the right to introduce in the competition evaluation platform syntactic means for scrambling program encodings, such as, e.g., random renaming of files, predicates and variables. Furthermore, the committee retains, in general, the right to replace official program encodings with syntactically changed versions.
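By contrast, the kind of semantic recognition permitted above can be made concrete with a small example. The following Python sketch is purely illustrative (the rule representation is invented for the example and is not any official format): it checks whether a program is stratified by computing the least level mapping over its predicates, where a rule forces its head to a level at least that of every positive body predicate and strictly above every negated body predicate.

def is_stratified(rules):
    """rules: iterable of (head_pred, positive_body_preds, negative_body_preds)."""
    preds = set()
    for head, pos, neg in rules:
        preds.add(head)
        preds.update(pos)
        preds.update(neg)
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # Level of the head must be >= every positive body predicate
            # and strictly greater than every negated body predicate.
            needed = max([stratum[p] for p in pos] +
                         [stratum[p] + 1 for p in neg] + [0])
            if needed > stratum[head]:
                if needed > len(preds):
                    # Levels of a stratifiable program never exceed the number
                    # of predicates; divergence reveals a cycle through negation.
                    return False
                stratum[head] = needed
                changed = True
    return True

# p(X) :- q(X), not r(X).    r(X) :- s(X).        -> stratified
print(is_stratified([("p", {"q"}, {"r"}), ("r", {"s"}, set())]))    # True
# a :- not b.    b :- not a.                      -> cycle through negation
print(is_stratified([("a", set(), {"b"}), ("b", set(), {"a"})]))    # False

A system applying this kind of check (as opposed to, say, matching file or predicate names) stays within the rules, since the detection depends only on the logical structure of the encoding.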
Rules for the Model & Solve Track
Principle
The scenario of the Model & Solve Track aims at reproducing, as faithfully as possible, a setting in which a system's team faces a problem specification and is asked, in a limited time range, to provide a tailored solution, optimized both in terms of problem encoding and evaluation strategy.
Rules
- The competition organizers select a number of problem specifications, together with a set of test instances, the latter expressed in a common instance input format (basically, a set of facts in standard syntax).
- For each problem, teams are allowed to submit a specific solver (or a solving script) and a problem encoding.
- Solutions must be objectively based on a declarative specification system at their core.
Reproducibility of Competition results
The committee will not disclose submitted material until the end of the competition, although participants are free to share their own work at any moment. In order to allow transparency and reproducibility of the competition results, the participants agree that any material (system binaries, scripts, problem encodings) submitted by participants will be made public after the competition.
On request of competitors, material will be published under an explicit 'limited usage license' (e.g. usage of binaries for academic purposes only), and can be properly watermarked (e.g. scripts and systems' banners can include explicit mention of the Fourth ASP Competition).
Scoring
The scoring system adopted in each category of the competition refines the one used in the Third ASP Competition (which in turn extended the scoring system of the first and second competitions, mainly based on a weighted sum of the number of instances solved within the given time bound). In particular, the scoring system balances the following factors:
- Problems having more instances shall not have a bigger scoring range. Thus, for a given problem P, a normalized score shall be awarded, obtained by averaging the score awarded to each instance.
- Non-sound solvers, and encodings, are strongly discouraged. Thus, if system S outputs an incorrect answer for instance I of some problem P, this shall invalidate all the score achieved for problem P.
- A system managing to solve a given problem instance sets a clear gap over systems that are not able to. Thus, for each instance I of problem P, a flat reward shall be given if I is solved in the allotted time.
- A system is generally perceived as clearly faster if its solving time stays orders of magnitude below the maximum allowed time. Thus, similarly to SAT Competition scoring, a logarithmically weighted bonus is awarded to faster systems.
- In the case of optimization problems, scoring should also depend on the quality of the provided solution. Thus, points are awarded for finding better solutions, taking into account that small improvements in solution quality are usually obtained at the price of strong computational effort: the bonus for a better-quality solution is therefore given on an exponential weighting basis.
In general, 100 points can be earned for each benchmark problem. The final score of a solver will then consist of the sum of its scores over all benchmarks.
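The exact formulas are defined in the Scoring Details document referenced below. Purely as an illustrative sketch (with made-up weights, not the official ones, and omitting the exponential optimization-quality bonus), the factors above could be combined along the following lines:

import math

def instance_score(runtime, timeout, solved, flat=50.0, bonus=50.0):
    """Illustrative per-instance score: flat reward plus a logarithmic time bonus."""
    if not solved or runtime >= timeout:
        return 0.0
    # The bonus approaches `bonus` for very fast runs and 0 close to the timeout.
    return flat + bonus * (1 - math.log(runtime + 1) / math.log(timeout + 1))

def problem_score(results, timeout):
    """results: one (runtime, solved, correct) triple per instance of the problem."""
    if any(solved and not correct for _, solved, correct in results):
        return 0.0   # an incorrect answer voids the score of the whole problem
    return sum(instance_score(r, timeout, s) for r, s, _ in results) / len(results)

# Two instances solved (one fast, one slow), one timed out: at most 100 points overall.
print(problem_score([(10, True, True), (500, True, True), (600, False, False)], 600))

Averaging over the instances keeps every problem within the same 0-100 range regardless of how many instances it has, which is exactly the normalization requirement stated above.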
For detailed information about scoring and the instance selection process, see Scoring Details.
Instance Selection process
See above.
Global Ranking
The global ranking is computed by awarding each participant a score that is the sum of its scores over all problems. The system with the highest score is proclaimed the winner of the corresponding track.
In the remote case that a track ends in a tie, it is awarded ex-aequo.
Systems making use of randomization
In order to guarantee reproducibility of the competition outcomes, systems making use of randomization are required to be configured so that they work with a fixed random seed.
Dispute resolution
The competition committee holds the right to enforce the fair respect of all the above regulations, to settle possible disputes and to inflict appropriate penalties and/or disqualifications to infringing participants.
Hardware
The competition will be run on several Ubuntu Server 12.04 LTS x86-64 machines featuring two AMD Opteron Magny-Cours 6176 SE CPUs (total of 24 cores) running at 2.3 GHz with 128 GiB of physical RAM.
The servers are equipped with the C/C++ compiler GCC 4.6.3. Recall that dynamic libraries needed at runtime will have to be reachable according to the above regulations regarding path accessibility.
CPU limits
To accommodate multi-core evaluations, runs are classified into sequential and parallel runs. Sequential runs will be evaluated in a single-core Linux control group (see https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt), while parallel runs are limited to a six-core control group; the six cores form one NUMA node, to prevent memory access overhead. For both kinds of runs, only memory with the lowest distance to the run's NUMA node will be used.
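The competition's actual evaluation harness is not described on this page. Purely as a rough sketch of the mechanism, assuming a cgroup-v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and sufficient privileges (both assumptions, not stated rules), confining a run to a set of cores and a single NUMA node could look as follows:

import os

CPUSET_ROOT = "/sys/fs/cgroup/cpuset"   # assumed cgroup-v1 mount point

def make_cpuset(name, cpus, mems):
    """Create a cpuset control group restricted to the given cores and NUMA node."""
    path = os.path.join(CPUSET_ROOT, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "cpuset.cpus"), "w") as f:
        f.write(cpus)          # e.g. "0" for a sequential run, "0-5" for a parallel run
    with open(os.path.join(path, "cpuset.mems"), "w") as f:
        f.write(mems)          # the NUMA node local to those cores, e.g. "0"
    return path

def run_in_cpuset(path, argv):
    """Fork, move the child into the cpuset, then exec the solver command."""
    pid = os.fork()
    if pid == 0:
        with open(os.path.join(path, "tasks"), "w") as f:
            f.write(str(os.getpid()))
        os.execvp(argv[0], argv)
    os.waitpid(pid, 0)

# Example: a sequential run confined to core 0 on NUMA node 0.
# run_in_cpuset(make_cpuset("seq_run", "0", "0"), ["./solver", "instance.asp"])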
Memory limits
Each process spawned by participants' solvers will have access to the Linux process memory space (without swap) of the particular sequential or parallel control group. The memory reserved for each control group is constrained to 6 GiB (1 GiB = 1 gibibyte = 2^30 bytes).
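Again purely as a rough sketch (assuming a cgroup-v1 memory hierarchy mounted at /sys/fs/cgroup/memory with swap accounting enabled, neither of which is stated on this page), the 6 GiB cap without additional swap could be imposed along these lines:

import os

GIB = 2 ** 30
MEMORY_ROOT = "/sys/fs/cgroup/memory"   # assumed cgroup-v1 mount point

def make_memory_group(name, limit_bytes=6 * GIB):
    """Create a memory control group capped at limit_bytes, with no extra swap."""
    path = os.path.join(MEMORY_ROOT, name)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    # Same limit for memory+swap, so a run cannot spill over into swap space.
    with open(os.path.join(path, "memory.memsw.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    return path

# Processes are then attached by writing their PID to the group's "tasks" file,
# exactly as in the cpuset sketch above.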