A Study of Exception-handling in Test Suites

by Benoit Cornu and Martin Monperrus



Analyzing try-catch usage during test suite execution gives a new perspective on how exception handling is specified in test suites. This is what we present in this section: we move away from resilience analysis and propose three new types of test cases.

The classical way of analyzing the execution of test suites is to separate passing "green test cases" from failing "red test cases" (those colors refer to the graphical display of JUnit, where passing tests are shown in green and failing tests in red). This distinction does not consider the specification of exception handling. Beyond green and red test cases, our results indicate that one can characterize the test cases in three categories: the pink, blue, and white test cases. These three new types of test cases form a partition of the passing test cases.

The "pink test cases" are those test cases in which no exception at all is thrown or caught. The pink test cases specify the nominal usage of the software under test, i.e., the functioning of the system according to plan, under standard input and environment. Note that a pink test case can still execute a try block (but never a catch block, by definition).
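As an illustration, here is a minimal sketch of a pink test case (the division method is a hypothetical placeholder for application code, not taken from our dataset). Note how the try block in the application code executes, while the catch block is never reached:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PinkTestExample {

  // Hypothetical application code under test.
  static int division(int a, int b) {
    try {
      return a / b;   // nominal path: the try block executes...
    } catch (ArithmeticException e) {
      return 0;       // ...but this catch block is never reached in a pink test
    }
  }

  // A "pink" test case: nominal input, no exception is thrown or caught anywhere.
  @Test
  public void testNominalDivision() {
    assertEquals(5, division(15, 3));
  }
}
```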


Conceptually, there is an envelope that defines all possible correct states of an application. We call it the "state correctness envelope". This envelope is the boundary between correct and incorrect runtime states. Specifying the state correctness envelope can be achieved by writing test cases that simulate incorrect states and then assert that an exception of the expected type is thrown.

The "blue test cases" are those test cases which assert the presence of an exception under incorrect input (such as, for instance, "division(15,0)"). A blue test case sets up an incorrect state and then verifies that an exception is thrown. This is illustrated in the listing below, where two test cases expect exceptions using two different testing patterns in Java: the first one uses an annotation of the JUnit testing framework; the other one uses a try-catch that is specific to testing, with a fail statement.
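The following sketch illustrates both patterns with JUnit 4 (the division method is again a hypothetical placeholder; "division(15,0)" is assumed to throw an ArithmeticException):

```java
import static org.junit.Assert.fail;
import org.junit.Test;

public class BlueTestExamples {

  // Hypothetical application code: throws ArithmeticException when b == 0.
  static int division(int a, int b) {
    return a / b;
  }

  // Pattern #1: the "expected" parameter of JUnit's @Test annotation.
  // The test passes only if an ArithmeticException bubbles up to JUnit.
  @Test(expected = ArithmeticException.class)
  public void testDivisionByZeroAnnotation() {
    division(15, 0);
  }

  // Pattern #2: a try-catch that is specific to testing, with a fail statement.
  // If no exception is thrown, fail() turns the test red.
  @Test
  public void testDivisionByZeroTryCatch() {
    try {
      division(15, 0);
      fail("an ArithmeticException was expected");
    } catch (ArithmeticException expected) {
      // expected: the incorrect input was correctly rejected
    }
  }
}
```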

We propose to call "white test cases" those that specify the required exception-handling capabilities of the system under test. This specification is done by 1) simulating the occurrence of an exception, 2) making sure that the exception is caught in application code, and 3) asserting that the system is in a correct state afterwards (with an appropriate assertion). If a test case still passes after the execution of a catch block in the application under test, it means that the recovery code in the catch block has successfully repaired the state of the program. In other words, the "white test cases" are those test cases that do not expect an exception (they are standard passing functional test cases) but throw and catch at least one exception in the application code. Contrary to blue tests, they do not expect exceptions in the test case code (however, by definition, at least one exception is thrown, but it is only used internally).
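For illustration, here is a minimal sketch of a white test case (the Config class and its readTimeout method are hypothetical, invented for this example): the application code throws and catches a NumberFormatException internally, and the passing assertion shows that the catch block repaired the state.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class WhiteTestExample {

  // Hypothetical application code: parses a timeout, falls back on error.
  static class Config {
    static final int DEFAULT_TIMEOUT = 30;

    static int readTimeout(String raw) {
      try {
        return Integer.parseInt(raw);   // throws NumberFormatException on bad input
      } catch (NumberFormatException e) {
        return DEFAULT_TIMEOUT;         // recovery code: repair the state
      }
    }
  }

  // A "white" test case: a standard passing functional test, yet a
  // NumberFormatException is thrown and caught inside the application code.
  // The exception never bubbles up to the test method.
  @Test
  public void testFallbackOnMalformedInput() {
    assertEquals(Config.DEFAULT_TIMEOUT, Config.readTimeout("not-a-number"));
  }
}
```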

We take a dataset of 9 well-tested open-source Java applications and measure the proportion of blue, white, and pink test cases.

The proportion of pink test cases is the proportion of test cases that never throw or catch any exception. Not surprisingly, in our dataset, it varies between 65% and 81% of test cases. This shows that test suites mainly specify the nominal usage and environment.

To measure the white test cases (those which specify error handling), we have set up the following experiment: we run all test cases, trace the thrown exceptions, and log those exceptions that do not bubble up to the test methods. A test with at least one thrown exception and no bubbling one is considered white.
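To make the decision rule concrete, here is a minimal, hypothetical sketch of such a tracer. The names (ExceptionTracer, onThrow) are ours, and we assume a separate instrumentation step that reports every throw site of the application under test; that step is not shown, and our actual tooling may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the classification logic only. A separate instrumentation step
// (e.g., bytecode rewriting of the application under test) is assumed to
// call onThrow() at every throw site.
public class ExceptionTracer {

  private static final List<Throwable> thrown = new ArrayList<>();

  /** Called by the (assumed) instrumentation whenever the application throws. */
  public static void onThrow(Throwable t) {
    thrown.add(t);
  }

  /** Must be called before each test case. */
  public static void reset() {
    thrown.clear();
  }

  /**
   * Classifies a finished test case.
   * @param bubbled an exception that bubbled up to the test method, or null
   */
  public static String classify(Throwable bubbled) {
    if (thrown.isEmpty()) {
      return "pink";  // no exception thrown at all during the test
    }
    if (bubbled == null) {
      return "white"; // exceptions thrown, but all caught in application code
    }
    return "blue";    // at least one exception bubbled up to the test
  }
}
```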

The first table gives the number and proportion of white test cases. The second column recalls the number of test cases. The third column gives the number of white test cases. The fourth column indicates the number of exceptions involved in those white test cases. We can see that all test suites contain white test cases.

Our results show that exception handling is often specified, as shown by the presence of a reasonable number of white test cases. This error-handling specification is valuable: it gives data points for analyzing the actual behavior of error handling with respect to specified inputs.

As a side note, we find it interesting that among the exceptions occurring in white test cases, some might never have been planned. As a thought experiment, let us imagine browsing the 1106 exceptions thrown in white test cases with a developer of Apache commons-lang. For each exception, one would ask her whether she expects this exception to be thrown and to be caught. It is imaginable that, for one exception, she might answer "Cool, this catch block also works in this case!". This thought experiment illustrates the idea of intentional versus accidental error handling. Intentional error handling results from conscious reflection and planning. Accidental error handling results from error-handling code that works outside its initially anticipated scope. It may be the case that test suites exploit accidental error handling.

The second table presents the proportion of blue test cases (those which expect exceptions under incorrect input). The first and second columns give, respectively, the name of the application under analysis and the number of test cases in the corresponding test suite. The third column gives the absolute and relative proportion of blue test cases. The fourth column gives the number of expected exceptions (exceptions bubbling up to the test case).

One can see that between 5% and 19% of test cases expect exceptions. By construction, those blue test cases use at least one exception, but they may use more than one. Indeed, when comparing the third and fourth columns (number of blue tests versus number of expected exceptions), one sees that some test cases expect many more than one exception. For instance, one of Apache Shindig's test methods expects 100 exceptions. Note that for many projects under study, the number of blue test cases is equal to the number of expected exceptions. This indicates the presence of a testing design rule: one expected exception per test. The first pattern of the listing above, the expected-exception annotation provided by JUnit, facilitates the respect of this design rule.

Overall, our results show that there exists a specification of the state correctness envelope. The assertions of blue test cases specify both when an exception should be thrown and the type of the expected exception. The number of specified exceptions is an approximation of the number of incorrect states anticipated and specified by the developers.
