= 5th ASPCOMP - Rules and Scoring =

The System competition aims at reproducing, as faithfully as possible, a
setting in which the performance of solving systems is evaluated on problem
encodings (and instances) of different nature and complexity.


== Rules ==

The System competition is open to any general-purpose solving system able to
parse the ASP-Core-2 input format. The input language of choice for the 5th
ASP Competition is ASP-Core 2.01c (see File and Language Formats).

The competition is run over a selection of problems. Corresponding encodings
along with instance data are chosen by the Organizing Committee (see
Official Problem Suite). Participant systems will be run in a uniform
setting on each problem and instance thereof.

Problem-specific solving approaches, which break the uniform setting of the
System competition, are not admitted. For instance, specific techniques must
not be based on predicate or variable names, whereas algorithm selection
based on, e.g., structural features is admissible.

Participants are required to describe their evaluation procedures and
declare the set of benchmarks they support at submission time. The
Organizing Committee reserves the right to analyze any submission and verify
its compliance, possibly modifying the set of benchmarks supported by the
system before its execution.


== Scoring ==

The scoring system adopted in each category aims at simplifying the scoring
systems used in the 3rd and 4th ASP Competitions (which, in turn, extended
previous ones). In particular, the scoring system balances the following
factors:
 - Problems shall be weighted equally, although the number of instances per
   problem may differ. Thus, scores are normalized in order to give equal
   relevance to each problem.
 - If a system outputs an incorrect answer to some instance of a problem
   (see below for further details), this shall invalidate its score for the
   problem, even if all other instances are correctly solved.
 - In case of optimization problems, scoring should mainly be based on
   solution quality.

In general, 100 points can be earned for each benchmark problem. The final
score of a solving system hence consists of the sum of its scores over all
problems.

Detailed information about scoring is provided in the following.


=== Scoring Details ===

For Search and Query problems, the score of a solver S on a problem P with N
instances is

    S(P) = N_S * 100 / N

where N_S is the number of instances solved within the time and memory
limits (see below for further details).
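
For illustration, the following Python sketch computes this score from
per-instance outcomes; the function and variable names are ours, not part
of the rules:

    # Search/Query score: 100 points per problem, split evenly over its
    # N instances; an instance counts if solved within the resource limits.
    def search_query_score(solved):
        n = len(solved)                      # N: instances of the problem
        n_s = sum(1 for ok in solved if ok)  # N_S: instances solved
        return n_s * 100.0 / n

    # Example: 12 of 20 instances solved yields a score of 60.
    assert search_query_score([True] * 12 + [False] * 8) == 60.0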

For Optimization problems, solvers are ranked by solution quality. If M is
the number of participant systems, the score of a solver S for an instance I
of a problem P with N instances is

    S(P,I) = M_S(I) * 100 / (M * N)

where M_S(I) is
 - 0, if S neither provided a solution nor reported unsatisfiability, or
 - the number of participant solvers that did not provide any strictly
   better solution than S, where a confirmed optimum solution is considered
   strictly better than an unconfirmed one, otherwise.

The score S(P) of a solver S for problem P is the sum of scores S(P,I) over
all N instances I of P. Note that, as with Search and Query problems, S(P)
can range from 0 to 100.
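
A sketch of this instance score in Python, assuming a minimization problem:
each result is either None (no solution and no unsatisfiability report) or
a pair (cost, confirmed_optimum), and all names are ours:

    def strictly_better(a, b):
        # True if result a is strictly better than result b: lower cost
        # wins; at equal cost, a confirmed optimum beats an unconfirmed one.
        if a is None:
            return False
        if b is None:
            return True
        return a[0] < b[0] or (a[0] == b[0] and a[1] and not b[1])

    def opt_instance_score(results, s, n):
        m = len(results)        # M: number of participant systems
        mine = results[s]
        if mine is None:
            return 0.0          # neither a solution nor an unsat report
        # M_S(I): solvers that did not provide a strictly better solution.
        m_s = sum(1 for r in results.values() if not strictly_better(r, mine))
        return m_s * 100.0 / (m * n)

    # Example: on one instance of a 10-instance problem, "a" confirms the
    # optimum cost 5, "b" matches it unconfirmed, "c" gives no answer.
    results = {"a": (5, True), "b": (5, False), "c": None}
    print(opt_instance_score(results, "a", 10))  # 10.0 (M_S = 3)
    print(opt_instance_score(results, "b", 10))  # about 6.67 (M_S = 2)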


=== Global Ranking ===

The global ranking in each track, and overall, is obtained by awarding each
participant system the sum of its scores over all problems; systems are
ranked by their sums, in decreasing order. In case of equal sums, the total
run-time, counting the maximum time for each unsolved instance, serves as a
tie-breaker in favor of the system with the smaller run-time.
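
A minimal sketch of this ranking, assuming each system's total score and
total run-time have already been computed (names and values illustrative):

    def global_ranking(systems):
        # systems maps a name to (total_score, total_runtime); sort by
        # score descending, then by run-time ascending as tie-breaker.
        return sorted(systems, key=lambda s: (-systems[s][0], systems[s][1]))

    ranking = global_ranking({"sysA": (430.0, 9500.0),
                              "sysB": (430.0, 8700.0),
                              "sysC": (415.0, 6200.0)})
    print(ranking)  # ['sysB', 'sysA', 'sysC']: tie broken by run-time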


== Detection of Incorrect Answers ==

Each benchmark domain P is equipped with a checker program C_P that takes
as input both an instance I and a corresponding witness solution A, and is
such that C_P(A,I) = "true" in case A is a valid witness for I w.r.t. P.
Suppose that a system S is faulty for an instance I of a problem P; then,
there are two possible ways to detect incorrect behavior, and subsequently
disqualify the system S:
 - S produces an answer A, and A is not a correct solution. This scenario is
   detected by checking the output of C_P(A,I).
 - S answers that the instance I is unsatisfiable, but I actually has some
   witness. In this case, it is checked whether a second system S' produced
   a solution A' for which C_P(A',I) is true.

Note that cases of general failure (e.g., "out of memory" errors or other
abrupt system failures) do not imply disqualification on a given benchmark.
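
The following sketch outlines both detection rules, with a hypothetical
function check(A, I) standing in for the checker C_P; all names are ours:

    def is_disqualified(answer, instance, check, other_answers):
        if answer == "UNSAT":
            # Case 2: an unsatisfiability claim is refuted if some other
            # system produced a witness that the checker confirms.
            return any(check(a, instance) for a in other_answers)
        # Case 1: a produced witness must itself pass the checker.
        return not check(answer, instance)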


=== Optimization problems ===

For optimization problems, checkers also produce the cost of the (last)
witness. This value is taken into account when computing scores and when
assessing the answers of systems.

If a system marks its witness as optimal for an instance of an optimization
problem, and no other system finds a better witness, this is pragmatically
assumed to be an optimal solution. In general, the cost of the best witness
found is taken as the "imperfect optimum". If a system S marks its witness A
as optimal for an instance I of a problem P and the cost of A turns out to
be different from the imperfect optimum for I, then S is disqualified on P.
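
A sketch of this disqualification rule for a single instance, assuming a
minimization problem; the representation of claims is ours:

    def disqualified_on_optimum(claims):
        # claims: per-system pairs (cost, marked_optimal); the best cost
        # seen overall is taken as the "imperfect optimum".
        imperfect_optimum = min(cost for cost, _ in claims)
        return [i for i, (cost, marked) in enumerate(claims)
                if marked and cost != imperfect_optimum]

    # Example: system 1 claims optimality at cost 7, but cost 5 was found.
    print(disqualified_on_optimum([(5, False), (7, True)]))  # [1]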


== Instance Selection process ==

For each domain, instances will be randomly selected from the sets available
from past competitions. They will be chosen uniformly at random in order to
ensure a reasonably fair selection over the whole domain.
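
A minimal sketch of such a selection, with a fixed seed so that the draw is
reproducible (seed and names are illustrative):

    import random

    def select_instances(available, k, seed=2014):
        # Draw k instances uniformly at random from those available.
        rng = random.Random(seed)
        return rng.sample(available, k)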


== Systems making use of randomization ==

In order to guarantee the reproducibility of competition outcomes, systems
making use of randomization are required to run with an a priori fixed
random seed.


== Dispute resolution ==

The Organizing Committee reserves the right to enforce compliance with all
of the above regulations, to settle possible disputes, and to impose
appropriate penalties and/or disqualifications on any infringing
participant.


== Hardware, CPU and memory limits ==

The competition will run on a Debian Linux server (64-bit kernel) featuring
Intel Xeon X5365 processors with 4096 KB of cache and 16 GB of RAM. Time and
memory for each run of a solver on a problem instance are limited to 600
seconds of wall-clock time and 6 GB of RAM. Parallel solvers will be granted
the whole machine (8 processors), but memory is still limited to 6 GB.
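
Not part of the rules, but for illustration, one possible way to impose
such limits on Linux when launching a solver from Python:

    import resource
    import subprocess

    MEM_BYTES = 6 * 1024 ** 3   # 6 GB memory limit
    TIMEOUT = 600               # 600 s wall-clock limit

    def run_limited(cmd):
        # Cap the child's address space before exec; subprocess.run raises
        # TimeoutExpired if the wall-clock limit is exceeded.
        def limit_memory():
            resource.setrlimit(resource.RLIMIT_AS, (MEM_BYTES, MEM_BYTES))
        return subprocess.run(cmd, preexec_fn=limit_memory, timeout=TIMEOUT)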
