Attachment 'aspcomp2014-rules-and-scoring.txt'
Download 1 = 5th ASPCOMP - Rules and Scoring =
2
3 The System competition aims at reproducing, as faithfully as possible, a
4 setting in which performances of solving systems are evaluated on problem
5 encodings (and instances) of different nature and complexity.
6
7
8 == Rules ==
9
10 The System competition is open to any general-purpose solving system able to
11 parse the ASP-Core-2 input format. The input language of choice for the 5th
12 ASP Competition is ASP-Core 2.01c (see File and Language Formats).
13
14 The competition is run over a selection of problems. Corresponding encodings
15 along with instance data are chosen by the Organizing Committee (see
16 Official Problem Suite). Participant systems will be run in a uniform
17 setting on each problem and instance thereof.
18
19 Problem-specific solving approaches, breaking the uniform setting of the
20 System competition, are not admitted. For instance, specific techniques must
21 not be based on predicate or variable names, whereas algorithm selection
22 based on, e.g., structural features is admissible.
23
24 Participants are required to describe their evaluation procedures and
25 declare the set of benchmarks they support at submission time. The
26 Organizing committee takes the right to analyze any submission and verify
27 its congruence, possibly modifying the set of benchmarks supported by the
28 system before its execution.
29
30
31 == Scoring ==
32
33 The scoring system adopted in each category aims at simplifying the ones
34 used in the 3rd and 4th ASP Competitions (which, in turn, extended previous
35 scoring systems). In particular, the scoring system balances the following
36 factors:
37 - Problems shall be weighted equally, although the number of instances per
38 problem may differ. Thus, scores are normalized in order to give equal
39 relevance to each problem.
40 - If a system outputs an incorrect answer to some instance of a problem
41 (see below for further details), this shall invalidate its score for the
42 problem, even in case all other instances are correctly solved.
43 - In case of optimization problems, scoring should mainly be based on
44 solution quality.
45
46 In general, 100 points can be earned for each benchmark problem. The final
47 score of a solving system will hence consist of the sum of scores over all
48 problems.
49
50 Detailed information about scoring are provided in the following.
51
52
53 === Scoring Details ===
54
55 For Search and Query problems, the score of a solver S on a problem P with N
56 instances is
57
58 S(P) = N_S * 100 / N
59
60 where N_S is the number of instances solved within the time and memory
61 limits (see below for further details).
62
63 For Optimization problems, solvers are ranked by solution quality. If M is
64 the number of participant systems, the score of a solver S for an instance I
65 of a problem P with N instances is
66
67 S(P,I) = M_S(I) * 100 / (M * N)
68
69 where M_S(I) is
70 - 0 if S did neither provide a solution, nor report unsatisfiability, OR
71 - the number of participant solvers that did not provide any strictly
72 better solution than S, where a confirmed optimum solution is considered
73 strictly better than an unconfirmed one, otherwise.
74
75 The score S(P) of a solver S for problem P is the sum of scores S(P,I) over
76 all N instances I of P. Note that, as with Search and Query problems, S(P)
77 can range from 0 to 100.
78
79
80 === Global Ranking ===
81
82 The global ranking in each track, and overall, is obtained by awarding each
83 participant system the sum of its scores over all problems; systems are
84 ranked by their sums, in decreasing order. In case of equal sum of scores,
85 the sum of run-times, including maximum time for unsolved instances, is
86 taken into account as a tie-breaker in favor of the system whose run-time is
87 smaller.
88
89
90 == Detection of Incorrect Answers ==
91
92 Each benchmark domain P is equipped with a checker program C_P that takes
93 as input both an instance I and a corresponding witness solution A, and is
94 such that C_P(A,I) = "true" in case A is a valid witness for I w.r.t. P.
95 Suppose that a system S is faulty for an instance I of a problem P; then,
96 there are two possible ways to detect incorrect behavior, and subsequently
97 disqualify the system S:
98 - S produces an answer A, and A is not a correct solution. This scenario is
99 detected by checking the output of C_P(A,I).
100 - S answers that the instance I is unsatisfiable, but I actually has some
101 witness. In this case, it is checked whether a second system S' produced
102 a solution A' for which C_P(A',I) is true.
103
104 Note that cases of general failure (e.g. "out of memory" errors or other
105 abrupt system failures) do not imply disqualification on a given benchmark.
106
107
108 === Optimization problems ===
109
110 Concerning optimization problems, checkers produce also the cost of the
111 (last) witness. This latter value is considered when computing scores and
112 for assessing answers of systems.
113
114 If a system marks its witness as optimal for an instance of an optimization
115 problem, and no other system finds a better witness, this is pragmatically
116 assumed to be an optimal solution. In general, the cost of best witnesses
117 found is taken as the "imperfect optimum". If a system S marks its witness A
118 as optimal for an instance I of a problem P and the cost of A turns out to
119 be different from the imperfect optimum for I, then S is disqualified on P.
120
121
122 == Instance Selection process ==
123
124 For each domain, instances will be randomly selected from the sets available
125 from past competitions. They will be uniformly randomly chosen in order
126 to ensure a reasonably fair selection over the whole domain.
127
128
129 == Systems making use of randomization ==
130
131 In order to guarantee reproducibility of competition outcomes, systems
132 making usage of randomization are required to run with an a-priori fixed
133 random seed.
134
135
136 == Dispute resolution ==
137
138 The Organizing Committee holds the right to enforce the fair respect of all
139 the above regulations, to settle possible disputes, and to inflict
140 appropriate penalties and/or disqualifications to any infringing
141 participant.
142
143
144 == Hardware, CPU and memory limits ==
145
146 The competition will run on a Debian Linux server (64bit kernel) featuring
147 Intel Xeon X5365 Processors with 4096KB of cache and 16GB of RAM. Time and
148 memory for each run of a solver on a problem instance are limited to 600
149 seconds of wall-clock time and 6GB RAM. Parallel solvers will be granted the
150 whole machine (8 processors), but memory is still limited to 6GB.
Attached Files
You are not allowed to attach a file to this page.