Test Fest Results

The results of the test fest for Project 6 are now available, although we have decided not to give you bonus points for the test fest on this project, as discussed below.

In the chart, the rows correspond to the sets of test cases submitted by each team, and the columns correspond to the player and test harness submitted by each team. The row and column labeled "matthias" are a player and test cases that we wrote.

In each cell, you will see four labeled numbers:

Okay:
How many test cases from this suite the player passed.
Fail:
How many test cases from this suite the player failed — that is, it completed the test case but produced the wrong result.
IFTC:
(Ill-Formed Test Cases) how many test cases were not valid, either because they were invalid XML, or because they did not match the data definition in the project specification.
----:
The total of the three categories above.

To obtain the output of our test fest administrator, including detailed error messages for failed and ill-formed test cases, click on the cell in question.

We've removed those test cases that were obviously equivalent to others submitted by the same team. Simply changing the plane names doesn't make a test case distinct.

Additionally, to participate in the test fest, you have to submit both a working (i.e., compilable) test harness and test cases. If your test harness failed to compile, your test cases won't appear in these results.

Interpreting the Results

Interpret these results as follows:

When you look at how your test cases did, pay special attention to how Matthias's player did on them. If his player failed one of your test cases, look for a problem in the test case.

Common Mistakes

I saw several common mistakes in ill-formed test cases:

  1. Invalid XML. Don't include anything that's not in the project's test-case specification, not even after the final </CASE> tag.
  2. Invalid aircraft. The name and tag that you specify in an <AIRCRAFT> element must name a valid aircraft in the game. Similarly, the tag in a <KEEPEM> must be within the correct range.
  3. Lots of folks reversed the two slsts in the <CASE> element. The first is those squadrons that Axis squadrons may attack; the second is those squadrons that Allied squadrons may attack. (Getting this backwards results in a contract failure from our test case administrator, so we count the test case as ill-formed.)
  4. Don't duplicate cards inside the hand. The only duplication that's possible is to put the same card on the deck and the top of the stack, to force your player to draw the same card. Other duplications are invalid (as well as being unrealistic and therefore useless test cases).
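
Some of these mistakes can be caught mechanically before submitting. The following is a minimal sketch in Python, not the test fest administrator's actual validation. It assumes that the first <SQUADRON> child of <CASE> holds the player's hand, and the function name check_test_case is hypothetical. It covers only mistakes 1 and 4 (invalid XML and duplicated cards in the hand).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_test_case(text):
    """Return a list of problems found in one test case (empty if none found)."""
    try:
        # fromstring rejects malformed XML, including junk after </CASE>.
        case = ET.fromstring(text)
    except ET.ParseError as err:
        return [f"invalid XML: {err}"]
    if case.tag != "CASE":
        return [f"root element is <{case.tag}>, expected <CASE>"]

    problems = []
    hand = case.find("SQUADRON")  # assumption: first <SQUADRON> is the hand
    if hand is None:
        return ["no <SQUADRON> child in <CASE>"]
    # A card is identified by its element name plus its attributes.
    cards = Counter((c.tag, tuple(sorted(c.attrib.items()))) for c in hand)
    for (tag, attrs), n in cards.items():
        if n > 1:
            problems.append(f"duplicate card in hand: <{tag}> {dict(attrs)} x{n}")
    return problems
```

An empty list means only that these two checks found nothing. Aircraft names, tag ranges, and the order of the two lists still have to be checked against the project specification.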

Why This Didn't Count

Contrary to what we originally stated, we decided not to credit bugs toward your grade, because too many of the test cases didn't account for the player's randomness. As an example, consider the following:

<CASE>
  <SQUADRON>
    <AIRCRAFT NAME="Bell P-39D" TAG="1" />
    <AIRCRAFT NAME="Bell P-39D" TAG="2" />
    <AIRCRAFT NAME="Curtiss P-40E" TAG="1" />
    <AIRCRAFT NAME="Curtiss P-40E" TAG="2" />
    <AIRCRAFT NAME="Messerschmitt ME-110" TAG="1" />
    <KEEPEM TAG="1" />
  </SQUADRON>
  <FALSE />
  <STACK>
    <VICTORY />
  </STACK>
  <LIST>
    <SQUADRON>
      <AIRCRAFT NAME="Boeing B-17E" TAG="1" />
      <AIRCRAFT NAME="Boeing B-17E" TAG="2" />
      <AIRCRAFT NAME="Boeing B-17E" TAG="3" />
    </SQUADRON>
  </LIST>
  <LIST>
  </LIST>
  <RET>
    <AIRCRAFT NAME="Bell P-39D" TAG="1" />
    <LIST>
    </LIST>
    <ATTACK>
      <SQUADRON>
        <AIRCRAFT NAME="Messerschmitt ME-110" TAG="1" />
        <KEEPEM TAG="1" />
        <VICTORY />
      </SQUADRON>
      <SQUADRON>
        <AIRCRAFT NAME="Boeing B-17E" TAG="1" />
        <AIRCRAFT NAME="Boeing B-17E" TAG="2" />
        <AIRCRAFT NAME="Boeing B-17E" TAG="3" />
      </SQUADRON>
    </ATTACK>
  </RET>
</CASE>
      

This test case is valid XML and a well-formed test case according to the project specification, and it avoids the problems in the previous section. Given the specified cards, the player should indeed form a squadron with the ME-110 and shoot down the B-17Es. After this, however, the player must discard a card. There are four possible cards to discard, and the strategy specified in Project 5 allows any of the four. So if your player passed this test case, it was only by sheer luck. And if your player failed this test case only because it discarded a different card, it should really count as having passed.
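
One way a harness could tolerate this kind of randomness is to compare results while accepting any legal discard. Here is a rough sketch in Python, not part of the project's harness. It assumes that the first child of <RET> is the discarded card and that the test author supplies the set of acceptable discards. The names card_key, canon, and same_result_modulo_discard are hypothetical.

```python
import xml.etree.ElementTree as ET


def card_key(elem):
    """Identify a card by element name and attributes."""
    return (elem.tag, tuple(sorted(elem.attrib.items())))


def canon(elem):
    """Canonical nested-tuple form of an element, ignoring whitespace text."""
    return (card_key(elem), tuple(canon(child) for child in elem))


def same_result_modulo_discard(expected_ret, actual_ret, allowed_discards):
    """Compare two <RET> strings, accepting any discard in allowed_discards."""
    expected = ET.fromstring(expected_ret)
    actual = ET.fromstring(actual_ret)
    if len(expected) == 0 or len(actual) == 0:
        return False
    # Assumption: the first child of <RET> is the discarded card.
    if card_key(actual[0]) not in allowed_discards:
        return False
    # Everything after the discard must match exactly, in order.
    return [canon(c) for c in expected[1:]] == [canon(c) for c in actual[1:]]
```

For the test case above, allowed_discards would hold the four cards left in the hand: both Bell P-39Ds and both Curtiss P-40Es.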

In principle, this is just a bad test case, and we should have thrown it (and similar test cases) out before running the test fest. Unfortunately, detecting such test cases would require us to inspect each one by hand, which is impractical, since you submitted almost 300 (distinct) test cases.

The Actual Test Cases

To allow you to debug, here are the actual test cases. They're available in three formats: a link to the directory containing the various files, a tarball containing the directory, and a ZIP file containing the directory.