GNU bug report logs - #41824
Dejagnu's unknown proc aborts testsuite run when triggered in test-case

Reported by: Tom de Vries <tdevries <at> suse.de>

Date: Fri, 12 Jun 2020 08:36:01 UTC

Severity: normal

Done: Jacob Bachmeyer <jcb62281 <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41824 in the body.
You can then email your comments to 41824 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Fri, 12 Jun 2020 08:36:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tom de Vries <tdevries <at> suse.de>:
New bug report received and forwarded. Copy sent to bug-dejagnu <at> gnu.org. (Fri, 12 Jun 2020 08:36:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tom de Vries <tdevries <at> suse.de>
To: bug-dejagnu <at> gnu.org
Subject: Dejagnu's unknown proc aborts testsuite run when triggered in
 test-case
Date: Fri, 12 Jun 2020 10:35:43 +0200

Hi,

the gdb testsuite uses the dejagnu framework.

If I add a call to foobar at the end of a test-case in that testsuite,
and I do a run of the testsuite, then the test-case aborts, and
subsequently the testsuite run aborts, that is, no further tests are run.

While it is as expected that the test-case aborts, it's not desirable
that the testsuite run aborts.

We've installed a hack in ${tool}_init/${tool}_finish to workaround this
in the gdb testsuite (
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26783bce15adc0316f9167a216519cebcf1ccac3
).

It would preferable though if this problem was addressed in dejagnu,
such that we can eventually remove the hack.

A possible solution to this problem could be to make the exiting on
error (which is done in dejagnu's version of proc unknown in
framework.exp) optional.

Thanks,
- Tom

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 16 Jun 2020 04:06:01 GMT) Full text and rfc822 format available.

Message #8 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Tom de Vries <tdevries <at> suse.de>
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Mon, 15 Jun 2020 23:05:33 -0500

Tom de Vries wrote:
> Hi,
>
> the gdb testsuite uses the dejagnu framework.
>   

Apologies for the delay in responding; I took a few days to think about 
this issue because it requires a careful balance, as silently continuing 
after aborting a testcase could cause errors to go unnoticed in large 
testsuites.

> If I add a call to foobar at the end of a test-case in that testsuite,
> and I do a run of the testsuite, then the test-case aborts, and
> subsequently the testsuite run aborts, that is, no further tests are run.
>
> While it is as expected that the test-case aborts, it's not desirable
> that the testsuite run aborts.
>   

The main problem I have with this is that calling an unknown procedure 
is presumably a very serious error.  Aborting the entire run is a good 
way to ensure that the problem is noticed, but perhaps with a testsuite 
as large (and long-running) as the GDB testsuite has become, continuing 
may also be reasonable to salvage at least some results.

Where in the GDB testsuite is this a problem?  When is it appropriate to 
continue after such a serious error?  How to ensure that the error is 
not buried in the logs amidst the normal output from other tests and is 
actually noticed?

> We've installed a hack in ${tool}_init/${tool}_finish to workaround this
> in the gdb testsuite (
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26783bce15adc0316f9167a216519cebcf1ccac3
> ).
>
> It would preferable though if this problem was addressed in dejagnu,
> such that we can eventually remove the hack.
>   

That hack (if not reverted entirely) needs to be made conditional on the 
actual existence of ::tcl_unknown as soon as possible -- the existence 
of ::tcl_unknown (kept to allow Tcl autoloading to work) is very much an 
internal implementation detail, and it *will* be moved out of the global 
namespace and eventually *will* cease to exist entirely in the 
interpreters that run testcases.  Really, tcl_unknown is not supposed to 
be there and relying on it introduces a latent bug into the GDB testsuite.

I am sorry, and I will seek to work with you to fix these problems with 
a minimum of breakage, but I have to put my foot down on this:  I 
*cannot* treat every internal symbol as long-term stable API.  That hack 
will *not* be supported long-term.  Any GDB releases that include it 
*will* break with some as-yet-undetermined future DejaGnu release.  I 
have to draw a line somewhere or I may as well declare DejaGnu 
unmaintained and frozen at 1.6.2 forever.  I have no inclination to do 
the latter.

> A possible solution to this problem could be to make the exiting on
> error (which is done in dejagnu's version of proc unknown in
> framework.exp) optional.
>   

This is obviously a good solution, but leads to the obvious question of 
how (command-line option?  (Make normally aborts on error, but has a 
--keep-going option.)  configuration variable?) that should be 
determined, or if exit-on-unknown-procedure should ever be a default.  
Could simply reporting an UNRESOLVED test (indicating that the testcase 
script aborted due to calling an undefined command) be a better option?  
It would show up in the summary report at the end.

At first, I was planning to schedule this for the 1.7 series, but I 
checked and runtest.exp:runtest already catches Tcl errors, so fixing 
this does not require a significant change to the normal control flow.  
Would simply reporting an "UNRESOLVED:  testcase $file aborted on 
unknown command '$what'" result, then propagating the Tcl error in the 
::unknown proc be a workable solution?

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 16 Jun 2020 15:55:02 GMT) Full text and rfc822 format available.

Message #11 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Tom de Vries <tdevries <at> suse.de>
To: jcb62281 <at> gmail.com
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 16 Jun 2020 17:53:54 +0200

[Message part 1 (text/plain, inline)]

On 6/16/20 6:05 AM, Jacob Bachmeyer wrote:
> Tom de Vries wrote:
>> Hi,
>>
>> the gdb testsuite uses the dejagnu framework.
>>   
> 
> Apologies for the delay in responding; I took a few days to think about
> this issue because it requires a careful balance, as silently continuing
> after aborting a testcase could cause errors to go unnoticed in large
> testsuites.
> 
>> If I add a call to foobar at the end of a test-case in that testsuite,
>> and I do a run of the testsuite, then the test-case aborts, and
>> subsequently the testsuite run aborts, that is, no further tests are run.
>>
>> While it is as expected that the test-case aborts, it's not desirable
>> that the testsuite run aborts.
>>   
> 
> The main problem I have with this is that calling an unknown procedure
> is presumably a very serious error.  Aborting the entire run is a good
> way to ensure that the problem is noticed, but perhaps with a testsuite
> as large (and long-running) as the GDB testsuite has become, continuing
> may also be reasonable to salvage at least some results.
> 
> Where in the GDB testsuite is this a problem?  When is it appropriate to
> continue after such a serious error?

It's not a problem in a specific test-case, but a generic problem. If
somebody makes a typo in a test-case and checks it in, there's no need
to abort the test run.

> How to ensure that the error is
> not buried in the logs amidst the normal output from other tests and is
> actually noticed?
> 

I monitor ERROR and WARNING in my test runs, so I didn't think of this,
but indeed, those are not tracked in the summary, so that could be a
problem, agreed.

>> We've installed a hack in ${tool}_init/${tool}_finish to workaround this
>> in the gdb testsuite (
>> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26783bce15adc0316f9167a216519cebcf1ccac3
>>
>> ).
>>
>> It would preferable though if this problem was addressed in dejagnu,
>> such that we can eventually remove the hack.
>>   
> 
> That hack (if not reverted entirely) needs to be made conditional on the
> actual existence of ::tcl_unknown as soon as possible -- the existence
> of ::tcl_unknown (kept to allow Tcl autoloading to work) is very much an
> internal implementation detail, and it *will* be moved out of the global
> namespace and eventually *will* cease to exist entirely in the
> interpreters that run testcases.  Really, tcl_unknown is not supposed to
> be there and relying on it introduces a latent bug into the GDB testsuite.
> 
> I am sorry, and I will seek to work with you to fix these problems with
> a minimum of breakage, but I have to put my foot down on this:  I
> *cannot* treat every internal symbol as long-term stable API.  That hack
> will *not* be supported long-term.  Any GDB releases that include it
> *will* break with some as-yet-undetermined future DejaGnu release.  I
> have to draw a line somewhere or I may as well declare DejaGnu
> unmaintained and frozen at 1.6.2 forever.  I have no inclination to do
> the latter.
> 

Ack, makes sense.

>> A possible solution to this problem could be to make the exiting on
>> error (which is done in dejagnu's version of proc unknown in
>> framework.exp) optional.
>>   
> 
> This is obviously a good solution, but leads to the obvious question of
> how (command-line option?  (Make normally aborts on error, but has a
> --keep-going option.)  configuration variable?) that should be
> determined, or if exit-on-unknown-procedure should ever be a default. 
> Could simply reporting an UNRESOLVED test (indicating that the testcase
> script aborted due to calling an undefined command) be a better option? 
> It would show up in the summary report at the end.
> 
> At first, I was planning to schedule this for the 1.7 series, but I
> checked and runtest.exp:runtest already catches Tcl errors, so fixing
> this does not require a significant change to the normal control flow. 
> Would simply reporting an "UNRESOLVED:  testcase $file aborted on
> unknown command '$what'" result, then propagating the Tcl error in the
> ::unknown proc be a workable solution?

I think so, yes.  I've tried that out in attached gdb patch, which also
solves the reliance on ::tcl_unknown.

Then when running into the unknown command in the test-case, we have in
gdb.sum:
...
Running
/home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.ada/O2_float_param.exp
...
PASS: gdb.ada/O2_float_param.exp: compilation foo.adb
PASS: gdb.ada/O2_float_param.exp: frame
UNRESOLVED: gdb.ada/O2_float_param.exp: testcase aborted on unknown
command 'foo'
ERROR: tcl error sourcing
/home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.ada/O2_float_param.exp.
ERROR: invalid command name "foo"
    while executing
"::gdb_unknown foo"
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 ::gdb_unknown $args"
    (procedure "::unknown" line 4)
    invoked from within
"foo"
    (file
"/home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.ada/O2_float_param.exp"
line 33)
    invoked from within
"source
/home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.ada/O2_float_param.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source
/home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.ada/O2_float_param.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === gdb Summary ===

# of expected passes            2
# of unresolved testcases       1
...

I get similar results if I disable the hack in the gdb testsuite, and
use attached runtest.exp demonstrator patch.

Note btw that in both cases I didn't put $file in the unresolved
message, because that's already present in pf_prefix, but that might not
always be true.

Thanks,
- Tom

[0001-fix.patch (text/x-patch, inline)]

fix

---
 gdb/testsuite/lib/gdb.exp | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index f502eb157d..5cb9e7907c 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -5177,6 +5177,13 @@ proc gdb_cleanup_globals {} {
     }
 }
 
+# Create gdb_unknown, a copy tcl's ::unknown.
+set temp [interp create]
+set old_args [interp eval $temp "info args ::unknown"]
+set old_body [interp eval $temp "info body ::unknown"]
+interp delete $temp
+eval proc gdb_unknown {$old_args} {$old_body}
+
 proc gdb_init { test_file_name } {
     # Reset the timeout value to the default.  This way, any testcase
     # that changes the timeout value without resetting it cannot affect
@@ -5285,12 +5292,13 @@ proc gdb_init { test_file_name } {
 
     # Dejagnu overrides proc unknown.  The dejagnu version may trigger in a
     # test-case but abort the entire test run.  To fix this, we install a
-    # local version here, which reverts dejagnu's override, and restore
-    # dejagnu's version in gdb_finish.
+    # local version here, which instead calls unresolved, and then propagates the
+    # error to tcl's unknown. In gdb_finish, we restore dejagnu's version.
     rename ::unknown ::dejagnu_unknown
     proc unknown { args } {
-	# Dejagnu saves the original version in ::tcl_unknown, use it.
-	return [uplevel 1 ::tcl_unknown $args]
+	# Restore ::unknown.
+	unresolved "testcase aborted on unknown command '$args'"
+	return [uplevel 1 ::gdb_unknown $args]
     }
 
     return $res

[runtest-exp.patch (text/x-patch, inline)]

--- runtest.exp.orig	2020-06-16 17:36:15.038208916 +0200
+++ runtest.exp	2020-06-16 17:38:19.189251033 +0200
@@ -1431,6 +1431,11 @@
     return $found
 }
 
+proc test_case_unknown { args } {
+    unresolved "testcase aborted on unknown command '$args'"
+    return [uplevel 1 ::tcl_unknown $args]
+}
+
 #
 # Source the testcase in TEST_FILE_NAME.
 #
@@ -1457,6 +1462,9 @@
 	    }
 	}
 
+	rename ::unknown dejagnu_unknown
+	rename ::test_case_unknown ::unknown
+	
 	if { [catch "uplevel #0 source $test_file_name"] == 1 } {
 	    # If we have a Tcl error, propagate the exit status so
 	    # that 'make' (if it invokes runtest) notices the error.
@@ -1476,6 +1484,9 @@
 	    }
 	}
 
+	rename ::unknown ::test_case_unknown 
+	rename dejagnu_unknown ::unknown 
+
 	if {[info exists tool]} {
 	    if { [info procs "${tool}_finish"] != "" } {
 		${tool}_finish

Owner recorded as jcb62281 <at> gmail.com. Request was from Jacob Bachmeyer <jcb62281 <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 17 Jun 2020 02:27:05 GMT) Full text and rfc822 format available.

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Wed, 17 Jun 2020 03:27:02 GMT) Full text and rfc822 format available.

Message #16 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Tom de Vries <tdevries <at> suse.de>
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 16 Jun 2020 22:26:24 -0500

Tom de Vries wrote:
> On 6/16/20 6:05 AM, Jacob Bachmeyer wrote:
>   
>> Tom de Vries wrote:
>>     
>>> If I add a call to foobar at the end of a test-case in that testsuite,
>>> and I do a run of the testsuite, then the test-case aborts, and
>>> subsequently the testsuite run aborts, that is, no further tests are run.
>>>
>>> While it is as expected that the test-case aborts, it's not desirable
>>> that the testsuite run aborts.
>>>   
>>>       
>> The main problem I have with this is that calling an unknown procedure
>> is presumably a very serious error.  Aborting the entire run is a good
>> way to ensure that the problem is noticed, but perhaps with a testsuite
>> as large (and long-running) as the GDB testsuite has become, continuing
>> may also be reasonable to salvage at least some results.
>>
>> Where in the GDB testsuite is this a problem?  When is it appropriate to
>> continue after such a serious error?
>>     
>
> It's not a problem in a specific test-case, but a generic problem. If
> somebody makes a typo in a test-case and checks it in, there's no need
> to abort the test run.
>   

So the root cause problem is people committing changes to the GDB 
testsuite without actually testing their own changes first, at all?

>> How to ensure that the error is
>> not buried in the logs amidst the normal output from other tests and is
>> actually noticed?
>>     
>
> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
> but indeed, those are not tracked in the summary, so that could be a
> problem, agreed.
>   

Blowing up the entire test run is a good way to ensure that problems 
(like typos in testcases) are noticed, but admittedly, that does not 
help when people fail to even run the tests before checking in their 
typos and then other developers' test runs blow up.  Murphy's Law says 
the offenders then carry on to check in more typos in other code without 
testing their own work there either...

>>> A possible solution to this problem could be to make the exiting on
>>> error (which is done in dejagnu's version of proc unknown in
>>> framework.exp) optional.
>>>   
>>>       
>> This is obviously a good solution, but leads to the obvious question of
>> how (command-line option?  (Make normally aborts on error, but has a
>> --keep-going option.)  configuration variable?) that should be
>> determined, or if exit-on-unknown-procedure should ever be a default. 
>> Could simply reporting an UNRESOLVED test (indicating that the testcase
>> script aborted due to calling an undefined command) be a better option? 
>> It would show up in the summary report at the end.
>>
>> At first, I was planning to schedule this for the 1.7 series, but I
>> checked and runtest.exp:runtest already catches Tcl errors, so fixing
>> this does not require a significant change to the normal control flow. 
>> Would simply reporting an "UNRESOLVED:  testcase $file aborted on
>> unknown command '$what'" result, then propagating the Tcl error in the
>> ::unknown proc be a workable solution?
>>     
>
> I think so, yes.  I've tried that out in attached gdb patch, which also
> solves the reliance on ::tcl_unknown.
>   

That patch is subtly wrong in other ways.  First, it assumes that 
::unknown is a procedure and not a command; there is no guarantee of 
this in the Tcl manual.  Second, it will fail if run inside a safe 
interpreter where the default "unknown" does not exist.  Third, it 
breaks Tcl's default autoloading because it adds an UNRESOLVED result 
regardless of the result of Tcl's autoload search.

The best solution is to just revert the hacks and wait for this to be 
solved upstream, where it is expected to land for the 1.6.3 release.

> [...]
> Note btw that in both cases I didn't put $file in the unresolved
> message, because that's already present in pf_prefix, but that might not
> always be true.
>   
The use of pf_prefix appears to be a feature of the GDB testsuite, so 
the upstream implementation of this feature will not be able to rely on it.

I am hesitant to alter default behavior for 1.6.3, so this will probably 
be added as a "--keep-going" option to runtest.  I should be able to 
land this for 1.6.3; it is simple enough to do and we are already 
getting some other new command-line options in 1.6.3 anyway that were 
needed to fix DejaGnu's own testsuite.  As of yet, there is no 
(supported) way planned to set the --keep-going option from an init 
file, only on the actual command-line.  This may change in the future; 
if it does, there will be some general API or just a special parse of 
~/.dejagnurc looking for command-line options.  These are planned for 
1.7 at the earliest.

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Wed, 17 Jun 2020 14:14:02 GMT) Full text and rfc822 format available.

Message #19 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Tom de Vries <tdevries <at> suse.de>
To: jcb62281 <at> gmail.com
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Wed, 17 Jun 2020 16:13:17 +0200

On 6/17/20 5:26 AM, Jacob Bachmeyer wrote:
> Tom de Vries wrote:
>> On 6/16/20 6:05 AM, Jacob Bachmeyer wrote:
>>  
>>> Tom de Vries wrote:
>>>    
>>>> If I add a call to foobar at the end of a test-case in that testsuite,
>>>> and I do a run of the testsuite, then the test-case aborts, and
>>>> subsequently the testsuite run aborts, that is, no further tests are
>>>> run.
>>>>
>>>> While it is as expected that the test-case aborts, it's not desirable
>>>> that the testsuite run aborts.
>>>>         
>>> The main problem I have with this is that calling an unknown procedure
>>> is presumably a very serious error.  Aborting the entire run is a good
>>> way to ensure that the problem is noticed, but perhaps with a testsuite
>>> as large (and long-running) as the GDB testsuite has become, continuing
>>> may also be reasonable to salvage at least some results.
>>>
>>> Where in the GDB testsuite is this a problem?  When is it appropriate to
>>> continue after such a serious error?
>>>     
>>
>> It's not a problem in a specific test-case, but a generic problem. If
>> somebody makes a typo in a test-case and checks it in, there's no need
>> to abort the test run.
>>   
> 
> So the root cause problem is people committing changes to the GDB
> testsuite without actually testing their own changes first, at all?
> 
>>> How to ensure that the error is
>>> not buried in the logs amidst the normal output from other tests and is
>>> actually noticed?
>>>     
>>
>> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
>> but indeed, those are not tracked in the summary, so that could be a
>> problem, agreed.
>>   
> 
> Blowing up the entire test run is a good way to ensure that problems
> (like typos in testcases) are noticed, but admittedly, that does not
> help when people fail to even run the tests before checking in their
> typos and then other developers' test runs blow up.  Murphy's Law says
> the offenders then carry on to check in more typos in other code without
> testing their own work there either...
> 
>>>> A possible solution to this problem could be to make the exiting on
>>>> error (which is done in dejagnu's version of proc unknown in
>>>> framework.exp) optional.
>>>>         
>>> This is obviously a good solution, but leads to the obvious question of
>>> how (command-line option?  (Make normally aborts on error, but has a
>>> --keep-going option.)  configuration variable?) that should be
>>> determined, or if exit-on-unknown-procedure should ever be a default.
>>> Could simply reporting an UNRESOLVED test (indicating that the testcase
>>> script aborted due to calling an undefined command) be a better
>>> option? It would show up in the summary report at the end.
>>>
>>> At first, I was planning to schedule this for the 1.7 series, but I
>>> checked and runtest.exp:runtest already catches Tcl errors, so fixing
>>> this does not require a significant change to the normal control
>>> flow. Would simply reporting an "UNRESOLVED:  testcase $file aborted on
>>> unknown command '$what'" result, then propagating the Tcl error in the
>>> ::unknown proc be a workable solution?
>>>     
>>
>> I think so, yes.  I've tried that out in attached gdb patch, which also
>> solves the reliance on ::tcl_unknown.
>>   
> 
> That patch is subtly wrong in other ways.  First, it assumes that
> ::unknown is a procedure and not a command; there is no guarantee of
> this in the Tcl manual.  Second, it will fail if run inside a safe
> interpreter where the default "unknown" does not exist.  Third, it
> breaks Tcl's default autoloading because it adds an UNRESOLVED result
> regardless of the result of Tcl's autoload search.
> 

Your last remark here got me thinking about dejagnu's unknown proc, and
I've filed two bugs:
- bug#41914: Propagate return value of auto-loaded command
- bug#41918: Propagate error value of auto-loaded command

> The best solution is to just revert the hacks and wait for this to be
> solved upstream, where it is expected to land for the 1.6.3 release.
> 
>> [...]
>> Note btw that in both cases I didn't put $file in the unresolved
>> message, because that's already present in pf_prefix, but that might not
>> always be true.
>>   
> The use of pf_prefix appears to be a feature of the GDB testsuite, so
> the upstream implementation of this feature will not be able to rely on it.
> 
> 
> I am hesitant to alter default behavior for 1.6.3, so this will probably
> be added as a "--keep-going" option to runtest.>  I should be able to
> land this for 1.6.3; it is simple enough to do and we are already
> getting some other new command-line options in 1.6.3 anyway that were
> needed to fix DejaGnu's own testsuite.  As of yet, there is no
> (supported) way planned to set the --keep-going option from an init
> file, only on the actual command-line.  This may change in the future;
> if it does, there will be some general API or just a special parse of
> ~/.dejagnurc looking for command-line options.  These are planned for
> 1.7 at the earliest.

I guess it would be convenient if we could switch this on somehow from
lib/${tool}.exp.

Thanks,
- Tom

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Wed, 17 Jun 2020 23:20:02 GMT) Full text and rfc822 format available.

Message #22 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Tom de Vries <tdevries <at> suse.de>
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Wed, 17 Jun 2020 18:18:52 -0500

[Message part 1 (text/plain, inline)]

Tom de Vries wrote:
> On 6/17/20 5:26 AM, Jacob Bachmeyer wrote:
>   
>> Tom de Vries wrote:
>>     
>>> On 6/16/20 6:05 AM, Jacob Bachmeyer wrote:
>>>  
>>>       
>>>> Tom de Vries wrote:
>>>>         
>>>>> A possible solution to this problem could be to make the exiting on
>>>>> error (which is done in dejagnu's version of proc unknown in
>>>>> framework.exp) optional.
>>>>>         
>>>>>           
>>>> This is obviously a good solution, but leads to the obvious question of
>>>> how (command-line option?  (Make normally aborts on error, but has a
>>>> --keep-going option.)  configuration variable?) that should be
>>>> determined, or if exit-on-unknown-procedure should ever be a default.
>>>> Could simply reporting an UNRESOLVED test (indicating that the testcase
>>>> script aborted due to calling an undefined command) be a better
>>>> option? It would show up in the summary report at the end.
>>>>
>>>> At first, I was planning to schedule this for the 1.7 series, but I
>>>> checked and runtest.exp:runtest already catches Tcl errors, so fixing
>>>> this does not require a significant change to the normal control
>>>> flow. Would simply reporting an "UNRESOLVED:  testcase $file aborted on
>>>> unknown command '$what'" result, then propagating the Tcl error in the
>>>> ::unknown proc be a workable solution?
>>>>     
>>>>         
>>> I think so, yes.  I've tried that out in attached gdb patch, which also
>>> solves the reliance on ::tcl_unknown.
>>>   
>>>       
>> That patch is subtly wrong in other ways.  First, it assumes that
>> ::unknown is a procedure and not a command; there is no guarantee of
>> this in the Tcl manual.  Second, it will fail if run inside a safe
>> interpreter where the default "unknown" does not exist.  Third, it
>> breaks Tcl's default autoloading because it adds an UNRESOLVED result
>> regardless of the result of Tcl's autoload search.
>>     
>
> Your last remark here got me thinking about dejagnu's unknown proc, and
> I've filed two bugs:
> - bug#41914: Propagate return value of auto-loaded command
> - bug#41918: Propagate error value of auto-loaded command
>   

I believe both of these were fixed while working on this issue in the 
attached patch.  Apologies for the delay; I wanted tests for the new 
feature.

>> The best solution is to just revert the hacks and wait for this to be
>> solved upstream, where it is expected to land for the 1.6.3 release.
>>
>>     
>>> [...]
>>> Note btw that in both cases I didn't put $file in the unresolved
>>> message, because that's already present in pf_prefix, but that might not
>>> always be true.
>>>   
>>>       
>> The use of pf_prefix appears to be a feature of the GDB testsuite, so
>> the upstream implementation of this feature will not be able to rely on it.
>>
>>
>> I am hesitant to alter default behavior for 1.6.3, so this will probably
>> be added as a "--keep-going" option to runtest.>  I should be able to
>> land this for 1.6.3; it is simple enough to do and we are already
>> getting some other new command-line options in 1.6.3 anyway that were
>> needed to fix DejaGnu's own testsuite.  As of yet, there is no
>> (supported) way planned to set the --keep-going option from an init
>> file, only on the actual command-line.  This may change in the future;
>> if it does, there will be some general API or just a special parse of
>> ~/.dejagnurc looking for command-line options.  These are planned for
>> 1.7 at the earliest.
>>     
>
> I guess it would be convenient if we could switch this on somehow from
> lib/${tool}.exp.

The problem here is that this detects an invalidity in the testsuite.  
Allowing testsuites to select --keep_going is asking (Murphy's Law 
again) for people to turn it on and then ignore the errors from broken 
test scripts "because it works" -- never mind that it is obviously 
broken, but an extra UNRESOLVED or three can still be easily ignored.  
The entire test run stopping cold unless a special command-line option 
is given that is specifically documented to suppress abort-on-error is 
much harder to explain away or even try to justify as acceptable.


-- Jacob

[0001-Allow-testing-to-continue-after-an-undefined-command.patch (text/plain, inline)]

From c5b21f1f1cfaabf1431010c314aadcc0b7b708f0 Mon Sep 17 00:00:00 2001
From: Jacob Bachmeyer <jcb62281+dev <at> gmail.com>
Date: Wed, 17 Jun 2020 18:08:57 -0500
Subject: [PATCH] Allow testing to continue after an undefined command is called

---
 ChangeLog                                          |   26 +++++++
 Makefile.am                                        |    2 +-
 Makefile.in                                        |    2 +-
 NEWS                                               |    2 +
 doc/dejagnu.texi                                   |    5 +
 doc/runtest.1                                      |    3 +
 lib/framework.exp                                  |   24 +++++-
 runtest.exp                                        |   12 +++
 testsuite/runtest.main/abort.exp                   |   78 ++++++++++++++++++++
 .../abort/testsuite/abort.test/abort-undef.exp     |   25 ++++++
 .../abort/testsuite/abort.test/simple.exp          |   21 +++++
 11 files changed, 195 insertions(+), 5 deletions(-)
 create mode 100644 testsuite/runtest.main/abort.exp
 create mode 100644 testsuite/runtest.main/abort/testsuite/abort.test/abort-undef.exp
 create mode 100644 testsuite/runtest.main/abort/testsuite/abort.test/simple.exp

diff --git a/ChangeLog b/ChangeLog
index 69a80ef..2351101 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2020-06-17  Jacob Bachmeyer  <jcb62281+dev <at> gmail.com>
+
+	PR 41824
+
+	* NEWS: Add item for --keep_going option.
+
+	* Makefile.am (CLEANFILES): Add abort-init.exp to list.
+
+	* doc/dejagnu.texi (Invoking runtest): Document new --keep_going
+	command line option.
+	* doc/runtest.1: Likewise.
+
+	* lib/framework.exp (unknown): Report an UNRESOLVED result if an
+	unknown command is invoked.  Avoid exiting and propagate the error
+	from Tcl's "unknown" procedure if --keep_going was
+	specified. Brace procedure argument list.
+	* runtest.exp (dejagnu::opt): New namespace.
+	Add option --keep_going to continue running tests after a test
+	script aborts due to calling an undefined command.
+
+	* testsuite/runtest.main/abort.exp: New file.
+	* testsuite/runtest.main/abort/testsuite/abort.test/abort-undef.exp:
+	New file.
+	* testsuite/runtest.main/abort/testsuite/abort.test/simple.exp:
+	New file.
+
 2020-06-02  Jacob Bachmeyer  <jcb62281+dev <at> gmail.com>
 
 	PR 41647
diff --git a/Makefile.am b/Makefile.am
index 6c48c20..9a2ba81 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -26,7 +26,7 @@ EXTRA_DIST = ChangeLog-1992 MAINTAINERS dejagnu runtest \
 	$(commands_DATA) $(TESTSUITE_FILES) $(TEXINFO_TEX)\
 	$(CONTRIB)
 
-CLEANFILES = options-init.exp stats-init.exp
+CLEANFILES = abort-init.exp options-init.exp stats-init.exp
 
 clean-local: clean-local-check
 .PHONY: clean-local-check
diff --git a/Makefile.in b/Makefile.in
index 1cd211c..bc9d1e5 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -381,7 +381,7 @@ EXTRA_DIST = ChangeLog-1992 MAINTAINERS dejagnu runtest \
 	$(commands_DATA) $(TESTSUITE_FILES) $(TEXINFO_TEX)\
 	$(CONTRIB)
 
-CLEANFILES = options-init.exp stats-init.exp
+CLEANFILES = abort-init.exp options-init.exp stats-init.exp
 bin_SCRIPTS = dejagnu runtest
 include_HEADERS = dejagnu.h
 pkgdata_DATA = \
diff --git a/NEWS b/NEWS
index 209e9c4..4354422 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,8 @@ Changes since 1.6.2:
    should use this proc. The 'is_remote' proc is deprecated.
 2. runtest now accepts --local_init and --global_init options to override
    the default of reading "site.exp".  See the manual for details.
+X. runtest now accepts a --keep_going option to continue with other test
+   scripts after a test script invokes an undefined command.
 3. A utility procedure relative_filename has been added.  This procedure
    computes a relative file name to a given destination from a given base.
 4. The utility procedure 'grep' now accepts a '-n' option that
diff --git a/doc/dejagnu.texi b/doc/dejagnu.texi
index bb386e8..cd8c4e9 100644
--- a/doc/dejagnu.texi
+++ b/doc/dejagnu.texi
@@ -644,6 +644,11 @@ The host board to use.
 @item @code{--ignore [tests(s)] }
 The name(s) of specific tests to ignore.
 
+@item @code{--keep_going}
+Continue testing when test scripts encounter recoverable fatal errors.
+Such errors always cause the offending test script to abort
+immediately, but DejaGnu may be able to continue with the next script.
+
 @item @code{--local_init [name]}
 Use @emph{name} as the testsuite local init file instead of
 @file{site.exp} in the current directory and in @emph{objdir}.  The
diff --git a/doc/runtest.1 b/doc/runtest.1
index d043ee7..e8913b1 100644
--- a/doc/runtest.1
+++ b/doc/runtest.1
@@ -45,6 +45,9 @@ The host board definition to use.
 .BI --ignore \ test1.exp\ test2.exp\ ...
 Do not run the specified tests.
 .TP
+.B --keep_going
+Do not abort test run if a test script encounters a fatal error.
+.TP
 .BI --local_init \ NAME
 The NAME to use for the testsuite local init file in both the current
 directory and objdir.
diff --git a/lib/framework.exp b/lib/framework.exp
index e6ce197..53333ad 100644
--- a/lib/framework.exp
+++ b/lib/framework.exp
@@ -257,21 +257,39 @@ proc isnative { } {
 # This allows Tcl package autoloading to work in the modern age.
 
 rename ::unknown ::tcl_unknown
-proc unknown args {
-    if {[catch {uplevel 1 ::tcl_unknown $args} msg]} {
+proc unknown { args } {
+    set code [catch {uplevel 1 ::tcl_unknown $args} msg]
+    if { $code != 0 } {
 	global errorCode
 	global errorInfo
 	global exit_status
 
+	set ret_cmd [list return -code $code]
+
 	clone_output "ERROR: (DejaGnu) proc \"$args\" does not exist."
 	if {[info exists errorCode]} {
+	    lappend ret_cmd -errorcode $errorCode
 	    send_error "The error code is $errorCode\n"
 	}
 	if {[info exists errorInfo]} {
+	    # omitting errorInfo from the propagated error makes this code
+	    # invisible with the backtrace pointing directly to the problem
 	    send_error "The info on the error is:\n$errorInfo\n"
 	}
 	set exit_status 2
-	log_and_exit
+
+	set unresolved_msg "testcase '[uplevel info script]' aborted"
+	append unresolved_msg " at call to unknown command '$args'"
+	unresolved $unresolved_msg
+
+	lappend ret_cmd $msg
+	if { $::dejagnu::opt::keep_going } {
+	    eval $ret_cmd
+	} else {
+	    log_and_exit
+	}
+    } else {
+	return $msg
     }
 }
 
diff --git a/runtest.exp b/runtest.exp
index ba49e73..028ad5b 100644
--- a/runtest.exp
+++ b/runtest.exp
@@ -97,6 +97,13 @@ set global_init_file	site.exp	;# global init file name
 set testsuitedir	"testsuite"	;# top-level testsuite source directory
 set testbuilddir	"testsuite"	;# top-level testsuite object directory
 
+#
+# These are used for internal command-line flags.
+#
+namespace eval ::dejagnu::opt {
+    variable keep_going	0	;# continue after a fatal error in testcase?
+}
+
 # Various ccache versions provide incorrect debug info such as ignoring
 # different current directory, breaking GDB testsuite.
 set env(CCACHE_DISABLE) 1
@@ -372,6 +379,7 @@ proc usage { } {
     send_user "\t--host \[triplet\]\tThe canonical triplet of the host machine\n"
     send_user "\t--host_board \[name\]\tThe host board to use\n"
     send_user "\t--ignore \[name(s)\]\tThe names of specific tests to ignore\n"
+    send_user "\t--keep_going\t\tContinue testing even if a script aborts\n"
     send_user "\t--local_init \[name\]\tThe file to load for local configuration\n"
     send_user "\t--log_dialog\t\t\Emit Expect output on stdout\n"
     send_user "\t--mail \[name(s)\]\tWhom to mail the results to\n"
@@ -1201,6 +1209,10 @@ for { set i 0 } { $i < $argc } { incr i } {
 	    continue
 	}
 
+	"--k*" {			# (--keep_going) reduce fatal errors
+	    set ::dejagnu::opt::keep_going 1
+	}
+
 	"--m*" {			# (--mail) mail the output
 	    set mailing_list $optarg
 	    set mail_logs 1
diff --git a/testsuite/runtest.main/abort.exp b/testsuite/runtest.main/abort.exp
new file mode 100644
index 0000000..c5f7014
--- /dev/null
+++ b/testsuite/runtest.main/abort.exp
@@ -0,0 +1,78 @@
+# Copyright (C) 1995-2016, 2018, 2020 Free Software Foundation, Inc.
+#
+# This file is part of DejaGnu.
+#
+# DejaGnu is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# DejaGnu is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with DejaGnu; if not, write to the Free Software Foundation,
+# Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+
+# This file tests handling of fatal errors in testcases.
+# The way we do this is to recursively invoke ourselves on a small testsuite
+# and analyze the results.
+
+load_lib util-defs.exp
+
+if {![info exists tmpdir]} {
+    set tmpdir [testsuite file -object -top tmpdir]
+}
+
+set fd [open abort-init.exp w]
+puts $fd "set srcdir [testsuite file -source -test abort]"
+puts $fd "set objdir [testsuite file -object -test abort]"
+puts $fd "set tmpdir $tmpdir"
+close $fd
+
+if {![file isdirectory $tmpdir]} {
+    catch "file mkdir $tmpdir"
+}
+
+if {![file isdirectory [testsuite file -object -test abort]]} {
+    catch {file mkdir [testsuite file -object -test abort]}
+}
+
+set tests {
+    { "run only simple test"
+	"simple.exp"
+	"PASS: simple test.*\
+	*expected passes\[ \t\]+1\n" }
+    { "abort on undefined command"
+	"abort-undef.exp"
+	"PASS: running abort-undef.exp.*\
+	*UNRESOLVED: .* aborted at call to unknown command.*\
+	*expected passes\[ \t\]+1\n.*unresolved testcases\[ \t\]+1\n" }
+    { "stop at abort without --keep_going"
+	"abort-undef.exp simple.exp"
+	"PASS: running abort-undef.exp.*\
+	*UNRESOLVED: .* aborted at call to unknown command.*\
+	*expected passes\[ \t\]+1\n.*unresolved testcases\[ \t\]+1\n" }
+    { "continue after abort with --keep_going"
+	"--keep_going abort-undef.exp simple.exp"
+	"PASS: running abort-undef.exp.*\
+	*UNRESOLVED: .* aborted at call to unknown command.*\
+	*PASS: simple test.*\
+	*expected passes\[ \t\]+2\n.*unresolved testcases\[ \t\]+1\n" }
+}
+
+foreach t $tests {
+    if [util_test $RUNTEST \
+	    "--local_init abort-init.exp\
+		--outdir $tmpdir -a [lindex $t 1]" \
+	    "" \
+	    [lindex $t 2]] {
+	fail [lindex $t 0]
+    } else {
+	pass [lindex $t 0]
+    }
+}
+
+file delete -force $tmpdir
diff --git a/testsuite/runtest.main/abort/testsuite/abort.test/abort-undef.exp b/testsuite/runtest.main/abort/testsuite/abort.test/abort-undef.exp
new file mode 100644
index 0000000..e5f4803
--- /dev/null
+++ b/testsuite/runtest.main/abort/testsuite/abort.test/abort-undef.exp
@@ -0,0 +1,25 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of DejaGnu.
+#
+# DejaGnu is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# DejaGnu is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with DejaGnu; if not, write to the Free Software Foundation,
+# Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+
+# Invoke an undefined command, causing a fatal error.
+
+pass "running abort-undef.exp"
+
+bogus_command 1 2 3 4
+
+fail "script did not abort"
diff --git a/testsuite/runtest.main/abort/testsuite/abort.test/simple.exp b/testsuite/runtest.main/abort/testsuite/abort.test/simple.exp
new file mode 100644
index 0000000..93a03e7
--- /dev/null
+++ b/testsuite/runtest.main/abort/testsuite/abort.test/simple.exp
@@ -0,0 +1,21 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of DejaGnu.
+#
+# DejaGnu is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# DejaGnu is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with DejaGnu; if not, write to the Free Software Foundation,
+# Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
+
+# Return a passing result
+
+pass "simple test"
-- 
1.7.4.1

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Tue, 23 Jun 2020 18:46:02 GMT) Full text and rfc822 format available.

Message #25 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com, Tom de Vries <tdevries <at> suse.de>
Cc: 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 23 Jun 2020 19:45:04 +0100

On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>
>> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
>> but indeed, those are not tracked in the summary, so that could be a
>> problem, agreed.
>>   
> 
> Blowing up the entire test run is a good way to ensure that problems (like typos in testcases) are noticed, but admittedly, that does not help when people fail to even run the tests before checking in their typos and then other developers' test runs blow up.  Murphy's Law says the offenders then carry on to check in more typos in other code without testing their own work there either...

There's conditional execution in the gdb testsuite, where tests may
run different code paths depending on the target.  Typos can happen in
code paths that the developer isn't able to exercise.  You can
imagine someone missing something when doing some large refactoring
in the testsuite.

I'd think that tracking ERROR and WARNING in the summary would sort
out this "test run blew up but no one noticed" issue.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 23 Jun 2020 23:38:01 GMT) Full text and rfc822 format available.

Message #28 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 23 Jun 2020 18:36:50 -0500

Pedro Alves wrote:
> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>   
>>> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
>>> but indeed, those are not tracked in the summary, so that could be a
>>> problem, agreed.
>>>   
>>>       
>> Blowing up the entire test run is a good way to ensure that problems (like typos in testcases) are noticed, but admittedly, that does not help when people fail to even run the tests before checking in their typos and then other developers' test runs blow up.  Murphy's Law says the offenders then carry on to check in more typos in other code without testing their own work there either...
>>     
>
> There's conditional execution in the gdb testsuite, where tests may
> run different code paths depending on the target.  Typos can happen in
> code paths that the developer isn't able to exercise.  You can
> imagine someone missing something when doing some large refactoring
> in the testsuite.
>   

This is a good point and is far more reasonable than checking in code 
without testing it at all.  That sounds like an architectural issue with 
the GDB testsuite, but could also be a legitimate problem if the 
per-target handling really is different for some targets, especially if 
simulators for those targets are unavailable or insufficient for the tests.

> I'd think that tracking ERROR and WARNING in the summary would sort
> out this "test run blew up but no one noticed" issue.

The problem is that the DejaGnu internals often generate ERROR and 
WARNING "directly" without using the procedures that update counters.  
The solution I am using in the current PR41824 patch (which has been 
rolled into the PR41918 patch) generates an extra UNRESOLVED result when 
a Tcl error is caught in runtest, which both appears in the summary and 
clearly indicates that the test run did not produce valid results.

I am currently considering also adding a 
'--no-keep-going'/'--no_keep_going' option and making '--keep-going' the 
default for the rest of the 1.6 series.  Aborting on uncaught Tcl errors 
will be the default in 1.7, but I am trying to decide whether changing 
it in 1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl 
errors under the proverbial rug) or introducing an environment change 
(needing '--keep-going' to reach the end of some testsuites that 
"worked" before).  This is further complicated by the fact that any 
testsuite needing that option to reach the end means that, technically, 
that testsuite is invalid -- it *did* throw a Tcl error during testing.

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Wed, 24 Jun 2020 09:41:01 GMT) Full text and rfc822 format available.

Message #31 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Wed, 24 Jun 2020 10:40:45 +0100

On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>  
>>>> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
>>>> but indeed, those are not tracked in the summary, so that could be a
>>>> problem, agreed.
>>>>         
>>> Blowing up the entire test run is a good way to ensure that problems (like typos in testcases) are noticed, but admittedly, that does not help when people fail to even run the tests before checking in their typos and then other developers' test runs blow up.  Murphy's Law says the offenders then carry on to check in more typos in other code without testing their own work there either...
>>>     
>>
>> There's conditional execution in the gdb testsuite, where tests may
>> run different code paths depending on the target.  Typos can happen in
>> code paths that the developer isn't able to exercise.  You can
>> imagine someone missing something when doing some large refactoring
>> in the testsuite.
>>   
> 
> This is a good point and is far more reasonable than checking in code without testing it at all.  That sounds like an architectural issue with the GDB testsuite, but could also be a legitimate problem if the per-target handling really is different for some targets, especially if simulators for those targets are unavailable or insufficient for the tests.

It's not an architectural issue -- simply different targets support
a different set of features, support the same features but differently,
run different OSs or even no OS at all, are built with different
compilers, support a different set of languages, etc., etc.

> 
>> I'd think that tracking ERROR and WARNING in the summary would sort
>> out this "test run blew up but no one noticed" issue.
> 
> The problem is that the DejaGnu internals often generate ERROR and WARNING "directly" without using the procedures that update counters.  

I understand that that's what DejaGnu does currently, but couldn't it be changed?
Would it not be desirable?

The solution I am using in the current PR41824 patch (which has been rolled into the PR41918 patch) generates an extra UNRESOLVED result when a Tcl error is caught in runtest, which both appears in the summary and clearly indicates that the test run did not produce valid results.
> 
> I am currently considering also adding a '--no-keep-going'/'--no_keep_going' option and making '--keep-going' the default for the rest of the 1.6 series.  Aborting on uncaught Tcl errors will be the default in 1.7, but I am trying to decide whether changing it in 1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors under the proverbial rug) or introducing an environment change (needing '--keep-going' to reach the end of some testsuites that "worked" before).  This is further complicated by the fact that any testsuite needing that option to reach the end means that, technically, that testsuite is invalid -- it *did* throw a Tcl error during testing.
It seems your designing this under the assumption that if a testsuite
run "blows up", it's not a big deal to just fix it and rerun the testsuite.

What that misses is that in some environments, particularly with slow embedded
system boards, running the full testsuite can take _days_.   This is all
compounded by the fact that the testsuite must be run in a large set of
different modes and boards to cover everything for a single target.  Even on
a modern GNU/Linux system, running all combinations of modes for full coverage
can take hours.

Having the testrun stop midway because of some specific testcase
blowing up can waste a lot of time.

The fact that the testsuite can take a while to fully run results in automated
testing systems, like GDB's buildbot, struggling to keep up with testing
all the commits that land in git.  On some buildbot setups, it's just not
possible to test all commits.  So if some testcase blows up the whole
test run, the next time you run the testsuite, it'll likely be that a
large number of unrelated commits went into the sources.  Losing a large
chunk of the testsuite results because of some small typo in one
testcase is thus undesirable, because noticing the typo in the first
place can take a while, and also you may be stuck with a "last good run"
for many of the testcases that gets older and older until
the typo is finally fixed.

Then there's the issue about testsuite order.  DejaGnu runs testcases
in alphabetical filename order, in sequence.  If the testrun blows up in
the first testcase run, say, a.exp, the whole testrun stops before b.exp,
etc. are run.  If the testcase that blows up is the last one, say named
zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
the sense that it can no longer blow up the whole test run.  A
testcase's name should not be significant like that.

On the same vein, consider parallel mode.  This is not native to DejaGnu,
but both GDB and GCC have parallel modes, where their makefiles set up a
make check such that they spawn a number of runtest processes running in
parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
composed of exactly 32 testcases.  And further imagine that you run it
under "make check -j32".  That is, you'll end up with 32 runtest instances,
each running exactly one testcase.  Now, if one of the testcases blows up, all
the other 31 will still run.  Again, this shows that testcase name, and
testrun order should not be important.  There is no real reason that a
sequential run in that scenario ends up with different results (testcase a.exp
blew up, so no result for the other tests) compared to a parallel
run (all tests ran successfully but test a.exp).  Instead, I advocate
that the testresults summary should indicate that a problem happened,
so that "blow up event" isn't lost.  But do let the whole testsuite run.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Thu, 25 Jun 2020 02:45:02 GMT) Full text and rfc822 format available.

Message #34 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Wed, 24 Jun 2020 21:43:56 -0500

Pedro Alves wrote:
> On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>>       
>>> I'd think that tracking ERROR and WARNING in the summary would sort
>>> out this "test run blew up but no one noticed" issue.
>>>       
>> The problem is that the DejaGnu internals often generate ERROR and WARNING "directly" without using the procedures that update counters.  
>>     
>
> I understand that that's what DejaGnu does currently, but couldn't it be changed?
> Would it not be desirable?
>   

It could be changed, but those are much larger changes to the code and 
overall behavior than simply adding an UNRESOLVED result to report the 
failure, and if I understand correctly, the presence of UNRESOLVED 
results marks the test results as invalid or partially-valid at best, 
which is correct if a test script failed with a Tcl error.

Also, the current warning and error counters are not global, but are 
reset after each test result.  Exceeding thresholds for warnings or 
errors causes the next test result to be changed to UNRESOLVED before 
the counters are reset.

>> I am currently considering also adding a '--no-keep-going'/'--no_keep_going' option and making '--keep-going' the default for the rest of the 1.6 series.  Aborting on uncaught Tcl errors will be the default in 1.7, but I am trying to decide whether changing it in 1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors under the proverbial rug) or introducing an environment change (needing '--keep-going' to reach the end of some testsuites that "worked" before).  This is further complicated by the fact that any testsuite needing that option to reach the end means that, technically, that testsuite is invalid -- it *did* throw a Tcl error during testing.
>>     
> It seems your designing this under the assumption that if a testsuite
> run "blows up", it's not a big deal to just fix it and rerun the testsuite.
>   

For most packages, this is correct.  For GDB, perhaps it is a bigger 
problem.

On the other hand, "blowing up" at least indicates clearly that 
something is wrong and prevents "whistling past the graveyard" where 
tests are not actually running, but everyone thinks the situation is fine.

> What that misses is that in some environments, particularly with slow embedded
> system boards, running the full testsuite can take _days_.   This is all
> compounded by the fact that the testsuite must be run in a large set of
> different modes and boards to cover everything for a single target.  Even on
> a modern GNU/Linux system, running all combinations of modes for full coverage
> can take hours.
>
> Having the testrun stop midway because of some specific testcase
> blowing up can waste a lot of time.
>
> The fact that the testsuite can take a while to fully run results in automated
> testing systems, like GDB's buildbot, struggling to keep up with testing
> all the commits that land in git.  On some buildbot setups, it's just not
> possible to test all commits.  So if some testcase blows up the whole
> test run, the next time you run the testsuite, it'll likely be that a
> large number of unrelated commits went into the sources.  Losing a large
> chunk of the testsuite results because of some small typo in one
> testcase is thus undesirable, because noticing the typo in the first
> place can take a while, and also you may be stuck with a "last good run"
> for many of the testcases that gets older and older until
> the typo is finally fixed.
>   

I presume buildbot can be easily configured to pass a '--keep_going' 
option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?

Still, needing a new option to get old behavior is changing the 
environment interface, so I will make --keep_going the default, at least 
for the rest of 1.6.  That default is likely to change for 1.7.  For 
automated systems like buildbot, passing '--keep_going' should be 
perfectly reasonable, which is why the option exists.  (While the 
documented options will be changed to use hyphens to align with the rest 
of the GNU system in 1.7, the current option names will continue to be 
accepted, since the aliases are obvious.)

This has been pushed on the temporary PR41918 branch because the fix for 
PR41824 has been rolled into the fix for PR41918, as the bugs are 
closely related.

> Then there's the issue about testsuite order.  DejaGnu runs testcases
> in alphabetical filename order, in sequence.  If the testrun blows up in
> the first testcase run, say, a.exp, the whole testrun stops before b.exp,
> etc. are run.  If the testcase that blows up is the last one, say named
> zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
> a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
> the sense that it can no longer blow up the whole test run.  A
> testcase's name should not be significant like that.
>
> On the same vein, consider parallel mode.  This is not native to DejaGnu,
> but both GDB and GCC have parallel modes, where their makefiles set up a
> make check such that they spawn a number of runtest processes running in
> parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
> composed of exactly 32 testcases.  And further imagine that you run it
> under "make check -j32".  That is, you'll end up with 32 runtest instances,
> each running exactly one testcase.  Now, if one of the testcases blows up, all
> the other 31 will still run.  Again, this shows that testcase name, and
> testrun order should not be important.  There is no real reason that a
> sequential run in that scenario ends up with different results (testcase a.exp
> blew up, so no result for the other tests) compared to a parallel
> run (all tests ran successfully but test a.exp).  Instead, I advocate
> that the testresults summary should indicate that a problem happened,
> so that "blow up event" isn't lost.  But do let the whole testsuite run.
>   

There are plans to eventually implement a native parallel testing mode.  
You are correct that the order in which testcases are run should be 
insignificant, and for valid testsuites that do not crash, it is 
insignificant -- all tests are run.  The problems start when a testcase 
*crashes* and aborts with a Tcl error.

The entire reason for stopping the testsuite immediately when a Tcl 
error occurs is that it is undeniable that something is *very* *wrong* 
when the testsuite stops early.  Software testing is hard enough when 
you know your testsuite is good, but it is far worse when you have bugs 
in the testsuite masking bugs in the actual program because a test 
script is crashing and part of the program that is supposed to be tested 
is not being exercised.

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Thu, 25 Jun 2020 16:02:01 GMT) Full text and rfc822 format available.

Message #37 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Thu, 25 Jun 2020 17:01:10 +0100

On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>>>       I'd think that tracking ERROR and WARNING in the summary would sort
>>>> out this "test run blew up but no one noticed" issue.
>>>>       
>>> The problem is that the DejaGnu internals often generate ERROR and WARNING "directly" without using the procedures that update counters.      
>>
>> I understand that that's what DejaGnu does currently, but couldn't it be changed?
>> Would it not be desirable?
>>   
> 
> It could be changed, but those are much larger changes to the code and overall behavior than simply adding an UNRESOLVED result to report the failure, and if I understand correctly, the presence of UNRESOLVED results marks the test results as invalid or partially-valid at best, which is correct if a test script failed with a Tcl error.
> 
> Also, the current warning and error counters are not global, but are reset after each test result.  Exceeding thresholds for warnings or errors causes the next test result to be changed to UNRESOLVED before the counters are reset.
> 
>>> I am currently considering also adding a '--no-keep-going'/'--no_keep_going' option and making '--keep-going' the default for the rest of the 1.6 series.  Aborting on uncaught Tcl errors will be the default in 1.7, but I am trying to decide whether changing it in 1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors under the proverbial rug) or introducing an environment change (needing '--keep-going' to reach the end of some testsuites that "worked" before).  This is further complicated by the fact that any testsuite needing that option to reach the end means that, technically, that testsuite is invalid -- it *did* throw a Tcl error during testing.
>>>     
>> It seems your designing this under the assumption that if a testsuite
>> run "blows up", it's not a big deal to just fix it and rerun the testsuite.
>>   
> 
> For most packages, this is correct.  For GDB, perhaps it is a bigger problem.
> 
> On the other hand, "blowing up" at least indicates clearly that something is wrong and prevents "whistling past the graveyard" where tests are not actually running, but everyone thinks the situation is fine.
> 
>> What that misses is that in some environments, particularly with slow embedded
>> system boards, running the full testsuite can take _days_.   This is all
>> compounded by the fact that the testsuite must be run in a large set of
>> different modes and boards to cover everything for a single target.  Even on
>> a modern GNU/Linux system, running all combinations of modes for full coverage
>> can take hours.
>>
>> Having the testrun stop midway because of some specific testcase
>> blowing up can waste a lot of time.
>>
>> The fact that the testsuite can take a while to fully run results in automated
>> testing systems, like GDB's buildbot, struggling to keep up with testing
>> all the commits that land in git.  On some buildbot setups, it's just not
>> possible to test all commits.  So if some testcase blows up the whole
>> test run, the next time you run the testsuite, it'll likely be that a
>> large number of unrelated commits went into the sources.  Losing a large
>> chunk of the testsuite results because of some small typo in one
>> testcase is thus undesirable, because noticing the typo in the first
>> place can take a while, and also you may be stuck with a "last good run"
>> for many of the testcases that gets older and older until
>> the typo is finally fixed.
>>   
> 
> I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
> 

What's more likely is we'll tweak GDB's Makefile so that make check
enables that option automatically.  If GCC does the same, which I would
encourage them to, since they also run the testsuite against systems
that can take hours/days to complete, then I'll get to wonder who the
default is for.  :-)

We'll need to somehow detect that the option is supported, though, so
it's of course doable, just will require a configure check or some little
bit of shell to probe the option or something of the sort.

> Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6.  That default is likely to change for 1.7.  For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists.  (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)
> 

Currently with DejaGnu only the case of calling an unknown function stops
the test run AFAIK.  Any other tcl error like calling a proc with the wrong
number of arguments, or treating a non-array variable as an array, etc. is
caught by the catch in the runtest proc in runtest.exp, and the testrun
continues.  That to me clearly indicates that the original intention was
to catch errors and continue.

That also gives the tool's $(tool)_finish proc a chance to tear down
correctly.

It is just that the "unknown" case wasn't thought of.  So I argue that not
making unknown proc calls abort the run is a bug fix, and making DejaGnu
abort the run for other errors by default is a behavior change that no real
testsuite built around DejaGnu is asking for.

> This has been pushed on the temporary PR41918 branch because the fix for PR41824 has been rolled into the fix for PR41918, as the bugs are closely related.
> 
>> Then there's the issue about testsuite order.  DejaGnu runs testcases
>> in alphabetical filename order, in sequence.  If the testrun blows up in
>> the first testcase run, say, a.exp, the whole testrun stops before b.exp,
>> etc. are run.  If the testcase that blows up is the last one, say named
>> zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
>> a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
>> the sense that it can no longer blow up the whole test run.  A
>> testcase's name should not be significant like that.
>>
>> On the same vein, consider parallel mode.  This is not native to DejaGnu,
>> but both GDB and GCC have parallel modes, where their makefiles set up a
>> make check such that they spawn a number of runtest processes running in
>> parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
>> composed of exactly 32 testcases.  And further imagine that you run it
>> under "make check -j32".  That is, you'll end up with 32 runtest instances,
>> each running exactly one testcase.  Now, if one of the testcases blows up, all
>> the other 31 will still run.  Again, this shows that testcase name, and
>> testrun order should not be important.  There is no real reason that a
>> sequential run in that scenario ends up with different results (testcase a.exp
>> blew up, so no result for the other tests) compared to a parallel
>> run (all tests ran successfully but test a.exp).  Instead, I advocate
>> that the testresults summary should indicate that a problem happened,
>> so that "blow up event" isn't lost.  But do let the whole testsuite run.
>>   
> 
> There are plans to eventually implement a native parallel testing mode.  You are correct that the order in which testcases are run should be insignificant, and for valid testsuites that do not crash, it is insignificant -- all tests are run.
> The problems start when a testcase *crashes* and aborts with a Tcl error.  

It's hopefully obvious that I'm aware of the latter.

> 
> The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early.  Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.

I think it's safe to say that we all understand the general principle,
but IMO that applies more to continuing the same testcase (if somehow that
were possible, like "unknown" suppressing the error), rather than continuing
with a different testcase.

Continuing the testrun when one testcase errors out does not mask any
bug in the program that is supposed to be tested.  Why would it?  The
testcase aborts, it doesn't continue.  The remaining testcases start
afresh.

Anyway, given the option, it won't be bad, just a little annoying.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Thu, 25 Jun 2020 22:41:01 GMT) Full text and rfc822 format available.

Message #40 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Thu, 25 Jun 2020 17:39:58 -0500

Pedro Alves wrote:
> On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> What that misses is that in some environments, particularly with slow embedded
>>> system boards, running the full testsuite can take _days_.   This is all
>>> compounded by the fact that the testsuite must be run in a large set of
>>> different modes and boards to cover everything for a single target.  Even on
>>> a modern GNU/Linux system, running all combinations of modes for full coverage
>>> can take hours.
>>>
>>> Having the testrun stop midway because of some specific testcase
>>> blowing up can waste a lot of time.
>>>
>>> The fact that the testsuite can take a while to fully run results in automated
>>> testing systems, like GDB's buildbot, struggling to keep up with testing
>>> all the commits that land in git.  On some buildbot setups, it's just not
>>> possible to test all commits.  So if some testcase blows up the whole
>>> test run, the next time you run the testsuite, it'll likely be that a
>>> large number of unrelated commits went into the sources.  Losing a large
>>> chunk of the testsuite results because of some small typo in one
>>> testcase is thus undesirable, because noticing the typo in the first
>>> place can take a while, and also you may be stuck with a "last good run"
>>> for many of the testcases that gets older and older until
>>> the typo is finally fixed.
>>>   
>>>       
>> I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
>>     
>
> What's more likely is we'll tweak GDB's Makefile so that make check
> enables that option automatically.  If GCC does the same, which I would
> encourage them to, since they also run the testsuite against systems
> that can take hours/days to complete, then I'll get to wonder who the
> default is for.  :-)
>   

The default is for developers working on testsuites -- and for testing 
release candidates and releases because if your testcases crash in a 
release, you have problems.  :-)

Are you suggesting withdrawing the --keep-going/--no-keep-going options 
and always generating an UNRESOLVED result, resuming with the next test 
script, and possibly storing a list of aborted test scripts somewhere to 
repeat at the very end of a test run with a big warning about test cases 
that crashed?

>> Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6.  That default is likely to change for 1.7.  For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists.  (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)
>>     
>
> Currently with DejaGnu only the case of calling an unknown function stops
> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
> number of arguments, or treating a non-array variable as an array, etc. is
> caught by the catch in the runtest proc in runtest.exp, and the testrun
> continues.  That to me clearly indicates that the original intention was
> to catch errors and continue.
>
> That also gives the tool's $(tool)_finish proc a chance to tear down
> correctly.
>
> It is just that the "unknown" case wasn't thought of.  So I argue that not
> making unknown proc calls abort the run is a bug fix, and making DejaGnu
> abort the run for other errors by default is a behavior change that no real
> testsuite built around DejaGnu is asking for.
>   

As it was explained to me, it is the other way around -- the catch in 
runtest was overlooked when ::unknown was added to catch bugs by 
aborting if an undefined procedure is called.

Is there a ${tool}_finish or do you mean ${tool}_exit?  Either way, you 
make a good point about running cleanup procedures when aborting a test run.

>> The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early.  Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.
>>     
>
> I think it's safe to say that we all understand the general principle,
> but IMO that applies more to continuing the same testcase (if somehow that
> were possible, like "unknown" suppressing the error), rather than continuing
> with a different testcase.
>
> Continuing the testrun when one testcase errors out does not mask any
> bug in the program that is supposed to be tested.  Why would it?  The
> testcase aborts, it doesn't continue.  The remaining testcases start
> afresh.
>   

Consider a hypothetical case where GDB has exec.exp (which tests 
starting a debugged process) and attach.exp (which tests attaching to a 
process).  Unknown to the GDB developers, the "attach" command does not 
work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl 
error before reaching the point where "attach" is issued to the inferior 
GDB.  If DejaGnu stops immediately, it is obvious that the testsuite is 
broken, the bug in the testsuite is fixed, and the bug in GDB is found.  
If DejaGnu sweeps the Tcl error under the proverbial rug (as previous 
versions did), the bug in the testsuite effectively hides the bug in GDB 
, because the developers think the "attach" command is being exercised, 
but those tests are being (almost) silently skipped.

Now a bug that obvious will be caught very quickly by users (likely the 
GDB developers themselves) but obscure regressions are for more 
insidious, especially in large testsuites such as those for GCC or GDB.  
If a regression test is being skipped with an easily-missed error, that 
regression can slip in unnoticed.

Would simply turning aborted test scripts into UNRESOLVED results be 
enough to get them fixed quickly?

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Thu, 25 Jun 2020 23:56:02 GMT) Full text and rfc822 format available.

Message #43 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Rob Savoye <rob <at> senecass.com>
To: Pedro Alves <palves <at> redhat.com>, jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Thu, 25 Jun 2020 17:53:57 -0600

On 6/25/20 10:01 AM, Pedro Alves wrote:

>> It could be changed, but those are much larger changes to the code and overall behavior than simply adding an UNRESOLVED result to report the failure, and if I understand correctly, the presence of UNRESOLVED results marks the test results as invalid or partially-valid at best, which is correct if a test script failed with a Tcl error.

  Adding an additional call makes it harder to diff the .sum files. Not
a huge big deal, but still... UNRESOLVED says a human has to look at
this test case output and decide, the actual error should be in the .log
file. UNTESTED means something went wrong with the test, and the test is
invalid, which is probably what you'd want as the test state for a Tcl
error.

> Currently with DejaGnu only the case of calling an unknown function stops
> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
> number of arguments, or treating a non-array variable as an array, etc. is
> caught by the catch in the runtest proc in runtest.exp, and the testrun
> continues.  That to me clearly indicates that the original intention was
> to catch errors and continue.

  The original intent was to catch some errors, nd ignore others. :-)
Catch is used for errors that can be ignored or worked around. Unknown
was basically to catch programming errors, since as noted, there are
many paths though the code as you change options or environments. They'd
get lost in the output unless it aborted.

> It is just that the "unknown" case wasn't thought of.  So I argue that not
> making unknown proc calls abort the run is a bug fix, and making DejaGnu
> abort the run for other errors by default is a behavior change that no real
> testsuite built around DejaGnu is asking for.

  I think the default should be to abort on a unknown Tcl error, and
have an option to ignore them and continue. As a person that has run
many long test runs, I agree I don't want some errors to abort the test
run. But to be honest, a dependable test run should mean no bugs in the
test framework. A buggy test framework means you can trust the test results.

  Aborting was for use during debugging of test framework code itself,
so supposedly you should should never have an Tcl error during a test
run. There are times when even during debugging you'd want it to
continue, and then collect all the bugs later to fix in batch mode.

	- rob -

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 01:54:02 GMT) Full text and rfc822 format available.

Message #46 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Rob Savoye <rob <at> senecass.com>
Cc: Pedro Alves <palves <at> redhat.com>, Tom de Vries <tdevries <at> suse.de>,
 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Thu, 25 Jun 2020 20:53:46 -0500

Rob Savoye wrote:
> On 6/25/20 10:01 AM, Pedro Alves wrote:
>   
>   Adding an additional call makes it harder to diff the .sum files. Not
> a huge big deal, but still... UNRESOLVED says a human has to look at
> this test case output and decide, the actual error should be in the .log
> file. UNTESTED means something went wrong with the test, and the test is
> invalid, which is probably what you'd want as the test state for a Tcl
> error.
>   

The original rationale for using UNRESOLVED here was that DejaGnu 
converts a test result to UNRESOLVED if too many errors or warnings were 
produced.  The DejaGnu manual indicates (section "A POSIX Compliant Test 
Framework") that UNRESOLVED is correct for a test where execution was 
interrupted or was set up incorrectly.  UNTESTED is specifically listed 
as a placeholder for an as-yet-unwritten testcase.  A "typical" GDB 
testsuite run has almost a hundred UNTESTED results but (without this 
patch) zero UNRESOLVED results.

To me, this seems that UNRESOLVED is correct here, or that the manual 
has an error.

>> Currently with DejaGnu only the case of calling an unknown function stops
>> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
>> number of arguments, or treating a non-array variable as an array, etc. is
>> caught by the catch in the runtest proc in runtest.exp, and the testrun
>> continues.  That to me clearly indicates that the original intention was
>> to catch errors and continue.
>>     
>
>   The original intent was to catch some errors, nd ignore others. :-)
> Catch is used for errors that can be ignored or worked around. Unknown
> was basically to catch programming errors, since as noted, there are
> many paths though the code as you change options or environments. They'd
> get lost in the output unless it aborted.
>   

So continuing with the next test script could be reasonable if the error 
(or at least an indication that the error occurred) is "replayed" at the 
end of the test run?

>> It is just that the "unknown" case wasn't thought of.  So I argue that not
>> making unknown proc calls abort the run is a bug fix, and making DejaGnu
>> abort the run for other errors by default is a behavior change that no real
>> testsuite built around DejaGnu is asking for.
>>     
>
>   I think the default should be to abort on a unknown Tcl error, and
> have an option to ignore them and continue. As a person that has run
> many long test runs, I agree I don't want some errors to abort the test
> run. But to be honest, a dependable test run should mean no bugs in the
> test framework. A buggy test framework means you can trust the test results.
>
>   Aborting was for use during debugging of test framework code itself,
> so supposedly you should should never have an Tcl error during a test
> run. There are times when even during debugging you'd want it to
> continue, and then collect all the bugs later to fix in batch mode.
>   

This succinctly states the original motivation for having the 
--keep-going option, but having "make check" always pass --keep-going 
defeats the purpose.


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 09:54:02 GMT) Full text and rfc822 format available.

Message #49 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Fri, 26 Jun 2020 10:53:34 +0100

On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> What that misses is that in some environments, particularly with slow embedded
>>>> system boards, running the full testsuite can take _days_.   This is all
>>>> compounded by the fact that the testsuite must be run in a large set of
>>>> different modes and boards to cover everything for a single target.  Even on
>>>> a modern GNU/Linux system, running all combinations of modes for full coverage
>>>> can take hours.
>>>>
>>>> Having the testrun stop midway because of some specific testcase
>>>> blowing up can waste a lot of time.
>>>>
>>>> The fact that the testsuite can take a while to fully run results in automated
>>>> testing systems, like GDB's buildbot, struggling to keep up with testing
>>>> all the commits that land in git.  On some buildbot setups, it's just not
>>>> possible to test all commits.  So if some testcase blows up the whole
>>>> test run, the next time you run the testsuite, it'll likely be that a
>>>> large number of unrelated commits went into the sources.  Losing a large
>>>> chunk of the testsuite results because of some small typo in one
>>>> testcase is thus undesirable, because noticing the typo in the first
>>>> place can take a while, and also you may be stuck with a "last good run"
>>>> for many of the testcases that gets older and older until
>>>> the typo is finally fixed.
>>>>         
>>> I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
>>>     
>>
>> What's more likely is we'll tweak GDB's Makefile so that make check
>> enables that option automatically.  If GCC does the same, which I would
>> encourage them to, since they also run the testsuite against systems
>> that can take hours/days to complete, then I'll get to wonder who the
>> default is for.  :-)
>>   
> 
> The default is for developers working on testsuites -- and for testing release candidates and releases because if your testcases crash in a release, you have problems.  :-)
> 
> Are you suggesting withdrawing the --keep-going/--no-keep-going options

Yes, but if you would like to add it, it's OK with me, though then
you can say that I'm more discussing the default.  But really I'm more
worried about making sure that the people maintaining the GDB
testsuite (me included), and the DejaGnu people understand each
other's needs better.

> and always generating an UNRESOLVED result, 
> resuming with the next test script, and possibly storing a list of aborted test scripts somewhere 

Yes (with --keep-going if you insist on having the option), but note that
I suggested storing the list of ERRORs and WARNINGs.  

> to repeat at the very end of a test run with a big warning about test cases that crashed?

I did not suggest to repeat anything.

> 
>>> Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6.  That default is likely to change for 1.7.  For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists.  (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)
>>>     
>>
>> Currently with DejaGnu only the case of calling an unknown function stops
>> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
>> number of arguments, or treating a non-array variable as an array, etc. is
>> caught by the catch in the runtest proc in runtest.exp, and the testrun
>> continues.  That to me clearly indicates that the original intention was
>> to catch errors and continue.
>>
>> That also gives the tool's $(tool)_finish proc a chance to tear down
>> correctly.
>>
>> It is just that the "unknown" case wasn't thought of.  So I argue that not
>> making unknown proc calls abort the run is a bug fix, and making DejaGnu
>> abort the run for other errors by default is a behavior change that no real
>> testsuite built around DejaGnu is asking for.
>>   
> 
> As it was explained to me, it is the other way around -- the catch in runtest was overlooked when ::unknown was added to catch bugs by aborting if an undefined procedure is called.
> 
> Is there a ${tool}_finish or do you mean ${tool}_exit?  Either way, you make a good point about running cleanup procedures when aborting a test run.

Look at proc runtest in runtest.exp, it calls ${tool}_init, 
sources the testcase under catch (using ERROR:, so here you could
run a tally of such caught errors), and then calls ${tool}_finish.

> 
>>> The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early.  Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.
>>>     
>>
>> I think it's safe to say that we all understand the general principle,
>> but IMO that applies more to continuing the same testcase (if somehow that
>> were possible, like "unknown" suppressing the error), rather than continuing
>> with a different testcase.
>>
>> Continuing the testrun when one testcase errors out does not mask any
>> bug in the program that is supposed to be tested.  Why would it?  The
>> testcase aborts, it doesn't continue.  The remaining testcases start
>> afresh.
>>   
> 
> Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp (which tests attaching to a process).  Unknown to the GDB developers, the "attach" command does not work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point where "attach" is issued to the inferior GDB.  If DejaGnu stops immediately, it is obvious that the testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu sweeps the Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the bug in GDB , because the developers think the "attach" command is being exercised, but those tests are being (almost) silently skipped.

First, that scenario doesn't normally cause a Tcl error.  If GDB trips
an internal error, the GDB testsuite catches it with a FAIL.  If GDB just dies
(whether it dumps core or not) the GDB testsuite catches that
with UNRESOLVED.

Also, even if those weren't already considered, the developer won't think the
testcase is just passing correctly, because being a good GDB developer,
he knows he needs to diff the new gdb.sum file against a previous run's, and
sees that some PASSes are missing, and some ERROR: tcl error showed up
in their stead.  Nobody should rely on just looking at the summary.
Everyone _must_ diff between a previous run and the current run.  Now some
people may use test result diffing scripts that forget to look
at ERROR: lines in .sum.  Those would be buggy, but then again, if
those ERRORs were accounted for in the DejaGnu summary, they would be
harder to miss.

And again, people run the GDB testsuite in parallel mode, so your
assumption that the testsuite would stop immediately anyway doesn't
hold.  One of the parallel runtest invocations bombs
out, while the others keep running, so by the time you get
back the prompt, other testcases already executed, and you don't
see the ERROR in the current terminal anymore.  You'll need to
look at the testsuite .sum .log files anyway.  Only, you will
find out that one testcase bombed out, 50000 tests ran, but 1000 others
didn't run because of that one testcase.  And now you're missing
those 1000 results, which maybe included those which you were
actually interested in, because you were doing a builbot try run
across a number of systems.

Now, when running the testsuite against slow embedded boards, then
you won't normally be using the parallel mode, the board just can't
support multiple parallel connections.  In that case, you set the
testsuite running, and go on your merry way for a couple hours or 20.
When you come back, maybe the next day, you notice that the testsuite
actually stopped running 30 mins in, because someone changed
the testsuite upstream just before you rebased in way that caused a
tcl error for you in some testcase.  Oops.  You just lost a day.  Now
you wish you had remembered to type --keep-going, but it's not
the default, and nobody ever writes Tcl bugs, so you didn't think
of it before.

That's why I think --keep-going should be the default.  When
you're developing a new testcase, you'll run that testcase in 
isolation a number of times.  It's very hard to miss any
silly error then -- there's no need to optimize for this use
case by default, IMO.

> 
> Now a bug that obvious will be caught very quickly by users (likely the GDB developers themselves) but obscure regressions are for more insidious, especially in large testsuites such as those for GCC or GDB.  If a regression test is being skipped with an easily-missed error, that regression can slip in unnoticed.
> 
> Would simply turning aborted test scripts into UNRESOLVED results be enough to get them fixed quickly?

Yes.  (Though I still think a separate count for ERROR (and WARNING)
would be better.)

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 13:42:02 GMT) Full text and rfc822 format available.

Message #52 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <pedro <at> palves.net>
To: Pedro Alves <palves <at> redhat.com>, jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Fri, 26 Jun 2020 11:08:11 +0100

On 6/26/20 10:53 AM, Pedro Alves wrote:
> 
>> to repeat at the very end of a test run with a big warning about test cases that crashed?
> I did not suggest to repeat anything.
> 

I misunderstood you here -- I somehow thought that that by "repeat" you meant
re-running the testcase.

Still, I would not suggest to repeat the error.  Let me clarify -- with this:

~~~~
diff --git c/gdb/testsuite/gdb.base/break.exp w/gdb/testsuite/gdb.base/break.exp
index 8c7ce42d805..056252c0677 100644
--- c/gdb/testsuite/gdb.base/break.exp
+++ w/gdb/testsuite/gdb.base/break.exp
@@ -15,6 +15,9 @@

 # This file was written by Rob Savoye. (rob <at> cygnus.com)

+set a 1
+set a(1) 1
+
~~~~

You get, if you run two testcases, and the other one manages to run
first:

$ make check TESTS="*/break.exp */schedlock.exp" 
make[1]: Entering directory '/home/pedro/brno/pedro/gdb/mygit/build/gdb/testsuite'
Nothing to be done for all...
make check-single
make[2]: Entering directory '/home/pedro/brno/pedro/gdb/mygit/build/gdb/testsuite'
rootme=`pwd`; export rootme; srcdir=/home/pedro/gdb/mygit/src/gdb/testsuite ; export srcdir ; EXPECT=`if [ "${READ1}" != "" ] ; then echo ${rootme}/expect-read1; elif [ -f ${rootme}/../../expect/expect ] ; then echo ${rootme}/../../expect/expect ; else echo expect ; fi` ; export EXPECT ; EXEEXT= ; export EXEEXT ;    LD_LIBRARY_PATH=$rootme/../../expect:$rootme/../../libstdc++:$rootme/../../tk/unix:$rootme/../../tcl/unix:$rootme/../../bfd:$rootme/../../opcodes:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; if [ -f ${rootme}/../../expect/expect ] ; then TCL_LIBRARY=${srcdir}/../../tcl/library ; export TCL_LIBRARY ; fi ; runtest --status  gdb.base/break.exp gdb.threads/schedlock.exp 
NOTE: Dejagnu's default_target_compile is missing support for Go, using local override
NOTE: Dejagnu's default_target_compile is missing support for Rust, using local override
Test run by pedro on Fri Jun 26 11:03:38 2020
Native configuration is x86_64-pc-linux-gnu

                === gdb tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/pedro/gdb/mygit/src/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/schedlock.exp ...
Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/break.exp ...
ERROR: tcl error sourcing /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/break.exp.
ERROR: can't set "a(1)": variable isn't array
    while executing
"set a(1) 1"
    (file "/home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/break.exp" line 19)
    invoked from within
"source /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/break.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.base/break.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === gdb Summary ===

# of expected passes            208
/home/pedro/brno/pedro/gdb/mygit/build/gdb/gdb version  10.0.50.20200625-git -nw -nx -data-directory /home/pedro/brno/pedro/gdb/mygit/build/gdb/testsuite/../data-directory 

make[2]: *** [Makefile:206: check-single] Error 2
make[2]: Leaving directory '/home/pedro/brno/pedro/gdb/mygit/build/gdb/testsuite'
make[1]: *** [Makefile:159: check] Error 2
make[1]: Leaving directory '/home/pedro/brno/pedro/gdb/mygit/build/gdb/testsuite'
make: *** [Makefile:1628: check] Error 2

~~~~

Now, you already have an indication that something went wrong
because make existed with error.  But people may miss that.
The issue for me is that the "gdb Summary" tally did not
mention the aborted testcase.  I would like to see something
like this instead:

 # of expected passes            208
 # of nasty OMG FIX! tcl errors    1

:-)

Your UNRESOLVED is of course better than the status quo.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 15:32:01 GMT) Full text and rfc822 format available.

Message #55 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Rob Savoye <rob <at> senecass.com>
To: Pedro Alves <palves <at> redhat.com>, jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Fri, 26 Jun 2020 09:30:22 -0600

On 6/26/20 3:53 AM, Pedro Alves wrote:

> you can say that I'm more discussing the default.  But really I'm more
> worried about making sure that the people maintaining the GDB
> testsuite (me included), and the DejaGnu people understand each
> other's needs better.

  As the author of the GDB testsuite, I understand quite well how it
needs to work, or at least used to. Testing GDB was the #1 requirement
that lead to creating DejaGnu. Both have evolved considerably over the
years, so working together on improving the current implementation is a
good idea.

  What I get out of this discussion, which is a good one, is how DejaGnu
should handle errors. Unknown is one case, but there others. Whether
something is a warning, error, UNRESOLVED, UNTESTED, UNSUPPORTED, etc...
is often up to the developer. Maybe what needs to be improved is better
tracking of issues in a test run.

> at ERROR: lines in .sum.  Those would be buggy, but then again, if
> those ERRORs were accounted for in the DejaGnu summary, they would be
> harder to miss.

  Right, one reason I started on database support with better analysis
tools. Diffing text files has limited functionality.

> That's why I think --keep-going should be the default.  When
> you're developing a new testcase, you'll run that testcase in 
> isolation a number of times.  It's very hard to miss any
> silly error then -- there's no need to optimize for this use
> case by default, IMO.

  I personally don't consider development or debugging done till I have
done a full testrun native and cross to catch the errors running a test
in isolation misses. Yes, I know this can take days, but I prefer that
to assuming a validation testrun is not going to have problems.

  I'll note that there is a $HOME/.dejagnurc, where you can set the
default to whatever you want. :-)

	- rob-

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 22:38:02 GMT) Full text and rfc822 format available.

Message #58 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Fri, 26 Jun 2020 17:37:23 -0500

Pedro Alves wrote:
> On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
>>>  
>>>       
>>>> Pedro Alves wrote:
>>>>    
>>>>         
>>>>> [...]
>>>>>           
>>>> I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
>>>>     
>>>>         
>>> What's more likely is we'll tweak GDB's Makefile so that make check
>>> enables that option automatically.  If GCC does the same, which I would
>>> encourage them to, since they also run the testsuite against systems
>>> that can take hours/days to complete, then I'll get to wonder who the
>>> default is for.  :-)
>>>   
>>>       
>> The default is for developers working on testsuites -- and for testing release candidates and releases because if your testcases crash in a release, you have problems.  :-)
>>
>> Are you suggesting withdrawing the --keep-going/--no-keep-going options
>>     
>
> Yes, but if you would like to add it, it's OK with me, though then
> you can say that I'm more discussing the default.  But really I'm more
> worried about making sure that the people maintaining the GDB
> testsuite (me included), and the DejaGnu people understand each
> other's needs better.
>   

My position here stems from the need for reliable testing, and part of 
that is that crashes in the testsuite scripts need to be made very 
obvious, where DejaGnu had previously swept them under the proverbial 
rug.  Sure, there would be a message in the log, but that is easily 
missed in a testsuite as large as GDB's and nowhere was a need to grep 
the log for those messages documented.

>> and always generating an UNRESOLVED result, 
>> resuming with the next test script, and possibly storing a list of aborted test scripts somewhere 
>>     
>
> Yes (with --keep-going if you insist on having the option), but note that
> I suggested storing the list of ERRORs and WARNINGs.
>   

Arguably, a crashed test script is a distinct category from the ERRORs 
and WARNINGs that test scripts can report when something does not go as 
expected.  The former is a problem with the testsuite *itself*, while 
the latter indicate problems detected *by* the testsuite.

>>>> Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6.  That default is likely to change for 1.7.  For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists.  (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)
>>>>     
>>>>         
>>> Currently with DejaGnu only the case of calling an unknown function stops
>>> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
>>> number of arguments, or treating a non-array variable as an array, etc. is
>>> caught by the catch in the runtest proc in runtest.exp, and the testrun
>>> continues.  That to me clearly indicates that the original intention was
>>> to catch errors and continue.
>>>
>>> That also gives the tool's $(tool)_finish proc a chance to tear down
>>> correctly.
>>>
>>> It is just that the "unknown" case wasn't thought of.  So I argue that not
>>> making unknown proc calls abort the run is a bug fix, and making DejaGnu
>>> abort the run for other errors by default is a behavior change that no real
>>> testsuite built around DejaGnu is asking for.
>>>   
>>>       
>> As it was explained to me, it is the other way around -- the catch in runtest was overlooked when ::unknown was added to catch bugs by aborting if an undefined procedure is called.
>>
>> Is there a ${tool}_finish or do you mean ${tool}_exit?  Either way, you make a good point about running cleanup procedures when aborting a test run.
>>     
>
> Look at proc runtest in runtest.exp, it calls ${tool}_init, 
> sources the testcase under catch (using ERROR:, so here you could
> run a tally of such caught errors), and then calls ${tool}_finish.
>   

Well, sure enough, there are ${tool}_init and ${tool}_finish procedures 
called "around" each script if they exist... and no documentation... 
another item added to my local TODO list for improving the manual.

>>>> The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early.  Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.
>>>>     
>>>>         
>>> I think it's safe to say that we all understand the general principle,
>>> but IMO that applies more to continuing the same testcase (if somehow that
>>> were possible, like "unknown" suppressing the error), rather than continuing
>>> with a different testcase.
>>>
>>> Continuing the testrun when one testcase errors out does not mask any
>>> bug in the program that is supposed to be tested.  Why would it?  The
>>> testcase aborts, it doesn't continue.  The remaining testcases start
>>> afresh.
>>>   
>>>       
>> Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp (which tests attaching to a process).  Unknown to the GDB developers, the "attach" command does not work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point where "attach" is issued to the inferior GDB.  If DejaGnu stops immediately, it is obvious that the testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu sweeps the Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the bug in GDB , because the developers think the "attach" command is being exercised, but those tests are being (almost) silently skipped.
>>     
>
> First, that scenario doesn't normally cause a Tcl error.  If GDB trips
> an internal error, the GDB testsuite catches it with a FAIL.  If GDB just dies
> (whether it dumps core or not) the GDB testsuite catches that
> with UNRESOLVED.
>   

In the hypothetical scenario, attach.exp crashes with a Tcl error 
*before* issuing the "attach" command to the GDB-under-test.  Previous 
releases of DejaGnu mention a Tcl error in the log and carry on.  With 
this patch, DejaGnu catches that attach.exp crashed and itself inserts 
an UNRESOLVED result.

I also take this as more support for using the UNRESOLVED status here:  
the test script crashing is little different from GDB exiting unexpectedly.

> Also, even if those weren't already considered, the developer won't think the
> testcase is just passing correctly, because being a good GDB developer,
> he knows he needs to diff the new gdb.sum file against a previous run's, and
> sees that some PASSes are missing, and some ERROR: tcl error showed up
> in their stead.  Nobody should rely on just looking at the summary.
> Everyone _must_ diff between a previous run and the current run.  Now some
> people may use test result diffing scripts that forget to look
> at ERROR: lines in .sum.  Those would be buggy, but then again, if
> those ERRORs were accounted for in the DejaGnu summary, they would be
> harder to miss.
>   

If this is the case, then what is the problem with aborting the test run 
upon error?  "Good GDB developers" will always diff the .sum files, 
notice that the expected new PASS results are not there, and fix their 
code before checking it in, so they will never have a problem with 
DejaGnu aborting upon encountering a Tcl error.  :-)

> And again, people run the GDB testsuite in parallel mode, so your
> assumption that the testsuite would stop immediately anyway doesn't
> hold.  One of the parallel runtest invocations bombs
> out, while the others keep running, so by the time you get
> back the prompt, other testcases already executed, and you don't
> see the ERROR in the current terminal anymore.

This is a fundamental problem with slicing up the testsuite external to 
DejaGnu and running multiple instances of runtest -- output that is 
supposed to be "last at the end of the run" can be hidden away by other 
ongoing subset runs.

>   You'll need to
> look at the testsuite .sum .log files anyway.  Only, you will
> find out that one testcase bombed out, 50000 tests ran, but 1000 others
> didn't run because of that one testcase.  And now you're missing
> those 1000 results, which maybe included those which you were
> actually interested in, because you were doing a builbot try run
> across a number of systems.
>   

This also applies on a smaller scale for which we have no workaround:  
gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did 
not run.  The only real solution is for the test scripts to avoid 
causing Tcl errors in the first place; anything else is papering over 
the problem with duct tape.

> Now, when running the testsuite against slow embedded boards, then
> you won't normally be using the parallel mode, the board just can't
> support multiple parallel connections.  In that case, you set the
> testsuite running, and go on your merry way for a couple hours or 20.
> When you come back, maybe the next day, you notice that the testsuite
> actually stopped running 30 mins in, because someone changed
> the testsuite upstream just before you rebased in way that caused a
> tcl error for you in some testcase.  Oops.  You just lost a day.

This is more of an annoyance, and yes, I have come back the next day to 
find that a long-running command failed shortly after being started, so 
I know this irritation.

>   Now
> you wish you had remembered to type --keep-going, but it's not
> the default, and nobody ever writes Tcl bugs, so you didn't think
> of it before.
>   

Note that the snapshot of GDB I have been using for testing this patch 
does have such a Tcl bug in its testsuite.  I did not make any 
particular effort to choose this snapshot, it was just the current state 
of GDB master when I started.  I have not even looked at *which* commit 
it is.  Thinking "nobody ever writes Tcl bugs" is what causes these 
problems in the first place.

I would expect that, had DejaGnu always aborted immediately upon 
catching Tcl errors from testsuites, that issue would have been found 
and fixed quickly.

> That's why I think --keep-going should be the default.  When
> you're developing a new testcase, you'll run that testcase in 
> isolation a number of times.  It's very hard to miss any
> silly error then -- there's no need to optimize for this use
> case by default, IMO.
>
>   
>> Now a bug that obvious will be caught very quickly by users (likely the GDB developers themselves) but obscure regressions are for more insidious, especially in large testsuites such as those for GCC or GDB.  If a regression test is being skipped with an easily-missed error, that regression can slip in unnoticed.
>>
>> Would simply turning aborted test scripts into UNRESOLVED results be enough to get them fixed quickly?
>>     
>
> Yes.  (Though I still think a separate count for ERROR (and WARNING)
> would be better.)

I think we need the UNRESOLVED results for at least a credible 
best-effort at claiming POSIX compliance, but additionally tracking how 
many UNRESOLVED results are due to script crashes would not be difficult.


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Fri, 26 Jun 2020 22:59:02 GMT) Full text and rfc822 format available.

Message #61 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <pedro <at> palves.net>
Cc: Pedro Alves <palves <at> redhat.com>, Tom de Vries <tdevries <at> suse.de>,
 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Fri, 26 Jun 2020 17:58:09 -0500

Pedro Alves wrote:
> On 6/26/20 10:53 AM, Pedro Alves wrote:
>   
>>> to repeat at the very end of a test run with a big warning about test cases that crashed?
>>>       
>> I did not suggest to repeat anything.
>>     
>
> I misunderstood you here -- I somehow thought that that by "repeat" you meant
> re-running the testcase.
>   

No, repeating output of at least the name of the file, and possibly the 
entire error dump.

> Still, I would not suggest to repeat the error.  Let me clarify -- with this:
>
> [...]
>
> Now, you already have an indication that something went wrong
> because make existed with error.  But people may miss that.
>   

They will easily miss that -- DejaGnu exits with a failure status if any 
tests produced "surprising" results:  FAIL, XPASS, or UNRESOLVED.  The 
GDB testsuite typically has multiple FAIL results, so exiting with an 
error code is routine.

> The issue for me is that the "gdb Summary" tally did not
> mention the aborted testcase.  I would like to see something
> like this instead:
>
>  # of expected passes            208
>  # of nasty OMG FIX! tcl errors    1
>
> :-)
>
> Your UNRESOLVED is of course better than the status quo.
>   

UNRESOLVED is a step in the right direction, but I agree that more is 
needed.  We are somewhat limited by POSIX, which does not define a 
separate result type for "test case crashed" and seems to roll that into 
UNRESOLVED, although POSIX appears to require a testsuite to have a 
strictly-defined set of tests and that a run must produce results for 
exactly all of them, which requires information DejaGnu does not 
normally have about the running testsuite.  So, while UNRESOLVED is 
needed, we could also maintain a separate count of 
UNRESOLVED-due-to-crash, or simply store away the error 
message/errorCode/errorInfo tuples and emit them a second time at the 
end of the run, also easy in Tcl 8.

I am leaning towards this latter solution precisely because it would be 
loud and obnoxious, while still maintaining a professional tone.  
(Listing a count of "nasty OMG FIX! >:-( tcl errors", while certainly 
expressing the proper intent, probably should not find its way into a 
DejaGnu release, nor should an ASCII-art C'thulu reminding the user that 
broken testsuites drive programmers to madness, complete with 
randomly-chosen "your remaining sanity" score...  :-)  )  Repeating 
error dumps at the end of the run also fits well into an XML log model, 
where the collected errors could go into their own structure at the end 
of the file, without complicating the tag model used for the main test 
results.  Simple importers would still get the inserted UNRESOLVED 
result, while more thorough or specialized tools could analyze the errors.

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 11:41:02 GMT) Full text and rfc822 format available.

Message #64 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com, Pedro Alves <pedro <at> palves.net>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:40:10 +0100

On 6/26/20 11:58 PM, Jacob Bachmeyer wrote:

>> The issue for me is that the "gdb Summary" tally did not
>> mention the aborted testcase.  I would like to see something
>> like this instead:
>>
>>  # of expected passes            208
>>  # of nasty OMG FIX! tcl errors    1
>>
>> :-)
>>
>> Your UNRESOLVED is of course better than the status quo.
>>   
> 
> UNRESOLVED is a step in the right direction, but I agree that more is needed.  We are somewhat limited by POSIX, which does not define a separate result type for "test case crashed" and seems to roll that into UNRESOLVED, although POSIX appears to require a testsuite to have a strictly-defined set of tests and that a run must produce results for exactly all of them, which requires information DejaGnu does not normally have about the running testsuite.  So, while UNRESOLVED is needed, we could also maintain a separate count of UNRESOLVED-due-to-crash, or simply store away the error message/errorCode/errorInfo tuples and emit them a second time at the end of the run, also easy in Tcl 8.

I'm not sure how much is POSIX compliance important these days.  I would hope that that
would not limit DejaGnu's development and limit its potential to be more usable.  Even
if POSIX compliance is important, I would think that that could be put behind
a --posixly_correct option, and/or POSIXLY_CORRECT environment variable, like other
GNU tools do:

 https://www.gnu.org/prep/standards/html_node/Non_002dGNU-Standards.html

Or conversely, we could have a --gnu mode, though I would say that a GNU tool
should behave in GNU mode by default.

I really don't buy that adding different result types is an issue, as
hinted earlier.  We've actually done it in GDB, for a loooong while.  We
have distinct XFAIL vs KFAIL output messages show up in the .sum file (though
those aren't distinguished in the summary file, doing so would be convenient,
hence I see that as a GDB testsuite bug).

And just very recently, we've added PATH and DUPLICATE output messages,
and those do show up in the summary:

# of expected passes            89593
# of unexpected failures        547
# of expected failures          146
# of known failures             104
# of untested testcases         20
# of unresolved testcases       5
# of unsupported tests          97
# of paths in test names        1      <<<<<
# of duplicate test names       370    <<<<<

I would think that even these two would be useful to have in
DejaGnu proper for all testsuites.

v1/rationale     - https://sourceware.org/pipermail/gdb-patches/2020-April/167847.html
v3/final version - https://sourceware.org/pipermail/gdb-patches/2020-April/168105.html

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 11:41:02 GMT) Full text and rfc822 format available.

Message #67 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:40:20 +0100

On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> On 6/25/20 3:43 AM, Jacob Bachmeyer wrote:
>>>>  
>>>>      
>>>>> Pedro Alves wrote:
>>>>>           
>>>>>> [...]
>>>>>>           
>>>>> I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?
>>>>>             
>>>> What's more likely is we'll tweak GDB's Makefile so that make check
>>>> enables that option automatically.  If GCC does the same, which I would
>>>> encourage them to, since they also run the testsuite against systems
>>>> that can take hours/days to complete, then I'll get to wonder who the
>>>> default is for.  :-)
>>>>         
>>> The default is for developers working on testsuites -- and for testing release candidates and releases because if your testcases crash in a release, you have problems.  :-)
>>>
>>> Are you suggesting withdrawing the --keep-going/--no-keep-going options
>>>     
>>
>> Yes, but if you would like to add it, it's OK with me, though then
>> you can say that I'm more discussing the default.  But really I'm more
>> worried about making sure that the people maintaining the GDB
>> testsuite (me included), and the DejaGnu people understand each
>> other's needs better.
>>   
> 
> My position here stems from the need for reliable testing, and part of that is that crashes in the testsuite scripts need to be made very obvious, where DejaGnu had previously swept them under the proverbial rug.  Sure, there would be a message in the log, but that is easily missed in a testsuite as large as GDB's and nowhere was a need to grep the log for those messages documented.

Sure, but we don't need to drastically change DejaGnu's behavior to address that.

> 
>>>>> Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6.  That default is likely to change for 1.7.  For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists.  (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)
>>>>>             
>>>> Currently with DejaGnu only the case of calling an unknown function stops
>>>> the test run AFAIK.  Any other tcl error like calling a proc with the wrong
>>>> number of arguments, or treating a non-array variable as an array, etc. is
>>>> caught by the catch in the runtest proc in runtest.exp, and the testrun
>>>> continues.  That to me clearly indicates that the original intention was
>>>> to catch errors and continue.
>>>>
>>>> That also gives the tool's $(tool)_finish proc a chance to tear down
>>>> correctly.
>>>>
>>>> It is just that the "unknown" case wasn't thought of.  So I argue that not
>>>> making unknown proc calls abort the run is a bug fix, and making DejaGnu
>>>> abort the run for other errors by default is a behavior change that no real
>>>> testsuite built around DejaGnu is asking for.
>>>>         
>>> As it was explained to me, it is the other way around -- the catch in runtest was overlooked when ::unknown was added to catch bugs by aborting if an undefined procedure is called.
>>>
>>> Is there a ${tool}_finish or do you mean ${tool}_exit?  Either way, you make a good point about running cleanup procedures when aborting a test run.
>>>     
>>
>> Look at proc runtest in runtest.exp, it calls ${tool}_init, sources the testcase under catch (using ERROR:, so here you could
>> run a tally of such caught errors), and then calls ${tool}_finish.
>>   
> 
> Well, sure enough, there are ${tool}_init and ${tool}_finish procedures called "around" each script if they exist... and no documentation... another item added to my local TODO list for improving the manual.
> 
>>>>> The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early.  Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.

It's actually deniable that there's something *very* *wrong*.  For one, all tests
run under the same runtest process, and are sourced in the global namespace.
So it is easy for a global variable to leak from one testcase to another.  That's
usually harmless, except when you have

 set foo 1

in testcase a.exp, and then b.exp does:

 set foo(1) 1

Whoops, TCL doesn't like that, so it aborts the testcase.

Could this be solved by putting each testcase in its own address space?
Yes.  Do people actually do that?  No -- it's a lot of make work and the
problem is rare.  Is that reuse global as array really revealing such a MAJOR
issue with the GDB testcase that it warrants stopping the whole run?  No...

I would argue that instead it is the DejaGnu framework that should make
it so that each testcase is run under as much isolation as possible.
So I call that a design issue in DejaGnu itself.  Whether that could
be fixed by something like sourcing each testcase under its own
Tcl slave interpreter, I'm not sure.

Another example of a Tcl bug that sometimes we trip on.  Say we have a
testcase that needs to extract some address from the inferior
in its setup phase, something like this:

 set test "get address"
 gdb_test_multiple "x /i \$pc" $test {
     -re "($hex).*$gdb_prompt $" {
         set addr $expect_out(1,string)
         pass $test
     }
 }
 
 if {$addr == 0} {
   # The above failed, so no use continuing.
   return
 }

Now, gdb_test_multiple is a wrapper around expect, that internally catches
GDB crashes, internal GDB errors, timeouts and other things, so the bug above
is often missed.  The bug is of course that when e.g., a timeout occurs, $addr
is left uninitialized, so you get a Tcl error referencing a nonexisting variable.

But the thing is, we already issued a FAIL, and we want to bail out of
the testcase anyway.  Crashing the whole testrun for a bug that typically
happens in already-failing code is too strict.  And these are the
code paths that are not exercised in a normal run, so are 
easy to miss, and are just impossible to have full coverage for.

>>>>>             
>>>> I think it's safe to say that we all understand the general principle,
>>>> but IMO that applies more to continuing the same testcase (if somehow that
>>>> were possible, like "unknown" suppressing the error), rather than continuing
>>>> with a different testcase.
>>>>
>>>> Continuing the testrun when one testcase errors out does not mask any
>>>> bug in the program that is supposed to be tested.  Why would it?  The
>>>> testcase aborts, it doesn't continue.  The remaining testcases start
>>>> afresh.
>>>>         
>>> Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp (which tests attaching to a process).  Unknown to the GDB developers, the "attach" command does not work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point where "attach" is issued to the inferior GDB.  If DejaGnu stops immediately, it is obvious that the testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu sweeps the Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the bug in GDB , because the developers think the "attach" command is being exercised, but those tests are being (almost) silently skipped.
>>>     
>>
>> First, that scenario doesn't normally cause a Tcl error.  If GDB trips
>> an internal error, the GDB testsuite catches it with a FAIL.  If GDB just dies
>> (whether it dumps core or not) the GDB testsuite catches that
>> with UNRESOLVED.
>>   
> 
> In the hypothetical scenario, attach.exp crashes with a Tcl error *before* issuing the "attach" command to the GDB-under-test.  Previous releases of DejaGnu mention a Tcl error in the log and carry on.  With this patch, DejaGnu catches that attach.exp crashed and itself inserts an UNRESOLVED result.
> 
> I also take this as more support for using the UNRESOLVED status here:  the test script crashing is little different from GDB exiting unexpectedly.

Well, I'm actually not sure why doesn't GDB FAIL in the crash case instead of
UNRESOLVED.  To me a GDB internal error or the GDB process crashing doesn't
look very different.  Both require pretty much the same level of attention
from a developer.  But that decision predates me.  Who knows, maybe GDB
internal error support was only added after the testsuite was originally
written, and thus the crash handling predates the internal error handling.

> 
>> Also, even if those weren't already considered, the developer won't think the
>> testcase is just passing correctly, because being a good GDB developer,
>> he knows he needs to diff the new gdb.sum file against a previous run's, and
>> sees that some PASSes are missing, and some ERROR: tcl error showed up
>> in their stead.  Nobody should rely on just looking at the summary.
>> Everyone _must_ diff between a previous run and the current run.  Now some
>> people may use test result diffing scripts that forget to look
>> at ERROR: lines in .sum.  Those would be buggy, but then again, if
>> those ERRORs were accounted for in the DejaGnu summary, they would be
>> harder to miss.
>>   
> 
> If this is the case, then what is the problem with aborting the test run upon error?  "Good GDB developers" will always diff the .sum files, notice that the expected new PASS results are not there, and fix their code before checking it in, so they will never have a problem with DejaGnu aborting upon encountering a Tcl error.  :-)

As I've explained, the problem is that one testcase prevented a large number
of others from running, potentially substantially all of the testsuite.  So you've
now wasted a lot of time and maybe prevented other people from using the
target system meanwhile.  All because of needless interference between
testcases.

> 
>> And again, people run the GDB testsuite in parallel mode, so your
>> assumption that the testsuite would stop immediately anyway doesn't
>> hold.  One of the parallel runtest invocations bombs
>> out, while the others keep running, so by the time you get
>> back the prompt, other testcases already executed, and you don't
>> see the ERROR in the current terminal anymore.
> 
> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs.

That's like arguing against parallelization, and saying that everyone
should run the testsuite in serial mode, and be OK with it taking 2 hours
instead of 30 mins to complete a run...

The parallel mode aggregates the summaries from the different runtest
invocations as a single summary at the end,

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD

so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
doesn't contain those either.

> 
>>   You'll need to
>> look at the testsuite .sum .log files anyway.  Only, you will
>> find out that one testcase bombed out, 50000 tests ran, but 1000 others
>> didn't run because of that one testcase.  And now you're missing
>> those 1000 results, which maybe included those which you were
>> actually interested in, because you were doing a builbot try run
>> across a number of systems.
>>   
> 
> This also applies on a smaller scale for which we have no workaround:  gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did not run.  The only real solution is for the test scripts to avoid causing Tcl errors in the first place; anything else is papering over the problem with duct tape.

That's not the same thing.  The remaining /N/ tests in the file depend
on the previous /M/ tests.  E.g.,

 gdb_test "step"
 gdb_test "print foo"

If the "step" fails, the "print foo" test is meaningless.

However, there should be absolutely no dependency between foo.exp and
bar.exp.  It should be possible to run those two different testcases
in any order.  Thus, it should be irrelevant to foo.exp whether bar.exp
bombs out of not.  Consequently, there's no real reason to not let
foo.exp run regardless of what happens to bar.exp.

> 
>> Now, when running the testsuite against slow embedded boards, then
>> you won't normally be using the parallel mode, the board just can't
>> support multiple parallel connections.  In that case, you set the
>> testsuite running, and go on your merry way for a couple hours or 20.
>> When you come back, maybe the next day, you notice that the testsuite
>> actually stopped running 30 mins in, because someone changed
>> the testsuite upstream just before you rebased in way that caused a
>> tcl error for you in some testcase.  Oops.  You just lost a day.
> 
> This is more of an annoyance, and yes, I have come back the next day to find that a long-running command failed shortly after being started, so I know this irritation.

There you go.

> 
>>   Now
>> you wish you had remembered to type --keep-going, but it's not
>> the default, and nobody ever writes Tcl bugs, so you didn't think
>> of it before.
>>   
> 
> Note that the snapshot of GDB I have been using for testing this patch does have such a Tcl bug in its testsuite.  I did not make any particular effort to choose this snapshot, it was just the current state of GDB master when I started.  I have not even looked at *which* commit it is. 

Can you tell me what the error looks like?

> Thinking "nobody ever writes Tcl bugs" is what causes these problems in the first place.

I don't know what you mean by this.  What I meant is that people likely won't
remember to use --keep-going when invoking runtest interactively.  Why force
people to go through the pain of learning the option?

> 
> I would expect that, had DejaGnu always aborted immediately upon catching Tcl errors from testsuites, that issue would have been found and fixed quickly.

As I've explained, that "aborted immediately" is just impossible by design with
parallel mode.  You could approximate it perhaps, by making make abort all the
parallel runtests when one of the runtests bombs out.  It won't be immediately.
And I'd of course think that adding such feature wouldn't help anyone in reality.

And with your "quickly", you're assuming that a Tcl errors typically happen in the
normal code paths.  If it happens on some timeout path as I explained, then it may be
that it's not that easy to trigger the error, unless you happen to run the
testsuite on a "lucky" system with a "wrong" combination of environment
(compiler, glibc, kernel, etc.).

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 11:45:02 GMT) Full text and rfc822 format available.

Message #70 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:44:22 +0100

On 6/27/20 12:40 PM, Pedro Alves wrote:
> 
> Could this be solved by putting each testcase in its own address space?

I meant to say namespace instead.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 12:52:01 GMT) Full text and rfc822 format available.

Message #73 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Rob Savoye <rob <at> senecass.com>
To: jcb62281 <at> gmail.com
Cc: Pedro Alves <palves <at> redhat.com>, Tom de Vries <tdevries <at> suse.de>,
 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 06:49:30 -0600

On 6/25/20 7:53 PM, Jacob Bachmeyer wrote:

> The original rationale for using UNRESOLVED here was that DejaGnu
> converts a test result to UNRESOLVED if too many errors or warnings were
> produced.  The DejaGnu manual indicates (section "A POSIX Compliant Test
> Framework") that UNRESOLVED is correct for a test where execution was
> interrupted or was set up incorrectly.  UNTESTED is specifically listed
> as a placeholder for an as-yet-unwritten testcase.  A "typical" GDB
> testsuite run has almost a hundred UNTESTED results but (without this
> patch) zero UNRESOLVED results.
> 
> To me, this seems that UNRESOLVED is correct here, or that the manual
> has an error.

  So having written both, there is always the chance I've interpreted
things differently. If the test is interrupted, or has errors/warnings,
like timing problems for example, then it's UNRESOLVED. But a bug in Tcl
code enough to trigger unknown is UNTESTED, as the test never really
ran. I'd go with which definition the toolchain teams prefer, as this
effects validation testruns. I could go either way.

  I don't know if POSIX 1003.3 is still considered a standard, that was
a long time ago. The testsuites were primarily focused on XOpen and
POSIX conformance tests, not toolchains. I was
on the standards committee, so used them as a model cause they seemed a
good standard. But the details of the interpretation can now be whatever
we want it to be. We're probably the only ones left running TET
compliant testsuites anyway. :-)

  Anyway, whether a test run aborts on an error or not is not defined,
so it's up to us to decide.

	- rob -

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 13:16:02 GMT) Full text and rfc822 format available.

Message #76 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com, Pedro Alves <pedro <at> palves.net>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 14:15:20 +0100

On 6/27/20 12:40 PM, Pedro Alves wrote:
> We
> have distinct XFAIL vs KFAIL output messages show up in the .sum file (though
> those aren't distinguished in the summary file, doing so would be convenient,
> hence I see that as a GDB testsuite bug).
> 
Sorry, somehow I got confused and thought KFAIL was GDB specific
because AFAIK GCC doesn't use it.  DejaGnu does show the KFAILs separated
in the summary.  I don't why I got confused, I've actually read the kfail code
in DejaGnu before.  Oh well.  The point still stands with DUPLICATE and PATH
output results.  And KFAIL isn't POSIX either, I think?

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 17:22:02 GMT) Full text and rfc822 format available.

Message #79 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:21:34 -0500

Pedro Alves wrote:
> On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
>>>  
>>>       
>>>> Pedro Alves wrote:
>>>>    
>>>>         
>
> [...]
>
> Could this be solved by putting each testcase in its own address space?
> Yes.  Do people actually do that?  No -- it's a lot of make work and the
> problem is rare.  Is that reuse global as array really revealing such a MAJOR
> issue with the GDB testcase that it warrants stopping the whole run?  No...
>
> I would argue that instead it is the DejaGnu framework that should make
> it so that each testcase is run under as much isolation as possible.
> So I call that a design issue in DejaGnu itself.  Whether that could
> be fixed by something like sourcing each testcase under its own
> Tcl slave interpreter, I'm not sure.
>   

This is the current plan -- to eventually run test scripts in 
independent slave interpreters, which enforces the isolation necessary 
at the Tcl level for future native parallel testing and also prevents 
test scripts from accidentally clobbering globals that are important to 
DejaGnu, some of which have very generic names.

> Another example of a Tcl bug that sometimes we trip on.  Say we have a
> testcase that needs to extract some address from the inferior
> in its setup phase, something like this:
>
>  set test "get address"
>  gdb_test_multiple "x /i \$pc" $test {
>      -re "($hex).*$gdb_prompt $" {
>          set addr $expect_out(1,string)
>          pass $test
>      }
>  }
>  
>  if {$addr == 0} {
>    # The above failed, so no use continuing.
>    return
>  }
>
> Now, gdb_test_multiple is a wrapper around expect, that internally catches
> GDB crashes, internal GDB errors, timeouts and other things, so the bug above
> is often missed.  The bug is of course that when e.g., a timeout occurs, $addr
> is left uninitialized, so you get a Tcl error referencing a nonexisting variable.
>
> But the thing is, we already issued a FAIL, and we want to bail out of
> the testcase anyway.  Crashing the whole testrun for a bug that typically
> happens in already-failing code is too strict.  And these are the
> code paths that are not exercised in a normal run, so are 
> easy to miss, and are just impossible to have full coverage for.
>   

Fine; I will loosen that, but DejaGnu has no way to know that it is 
bailing out of a failing testcase and that there are no more tests from 
that script, so an UNRESOLVED result complaining about the error will 
still be inserted.

           

>>>>> I think it's safe to say that we all understand the general principle,
>>>>> but IMO that applies more to continuing the same testcase (if somehow that
>>>>> were possible, like "unknown" suppressing the error), rather than continuing
>>>>> with a different testcase.
>>>>>
>>>>> Continuing the testrun when one testcase errors out does not mask any
>>>>> bug in the program that is supposed to be tested.  Why would it?  The
>>>>> testcase aborts, it doesn't continue.  The remaining testcases start
>>>>> afresh.
>>>>>         
>>>>>           
>>>> Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp (which tests attaching to a process).  Unknown to the GDB developers, the "attach" command does not work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point where "attach" is issued to the inferior GDB.  If DejaGnu stops immediately, it is obvious that the testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu sweeps the Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the bug in GDB , because the developers think the "attach" command is being exercised, but those tests are being (almost) silently skipped.
>>>>     
>>>>         
>>> First, that scenario doesn't normally cause a Tcl error.  If GDB trips
>>> an internal error, the GDB testsuite catches it with a FAIL.  If GDB just dies
>>> (whether it dumps core or not) the GDB testsuite catches that
>>> with UNRESOLVED.
>>>   
>>>       
>> In the hypothetical scenario, attach.exp crashes with a Tcl error *before* issuing the "attach" command to the GDB-under-test.  Previous releases of DejaGnu mention a Tcl error in the log and carry on.  With this patch, DejaGnu catches that attach.exp crashed and itself inserts an UNRESOLVED result.
>>
>> I also take this as more support for using the UNRESOLVED status here:  the test script crashing is little different from GDB exiting unexpectedly.
>>     
>
> Well, I'm actually not sure why doesn't GDB FAIL in the crash case instead of
> UNRESOLVED.  To me a GDB internal error or the GDB process crashing doesn't
> look very different.  Both require pretty much the same level of attention
> from a developer.  But that decision predates me.  Who knows, maybe GDB
> internal error support was only added after the testsuite was originally
> written, and thus the crash handling predates the internal error handling.
>   

Considering that testing GDB was one of the motivations for writing 
DejaGnu and how old DejaGnu is, I suspect that the testsuite had to 
handle GDB crashes before GDB had support for reporting internal errors.

In some sense, UNRESOLVED is more serious than FAIL; the GDB testsuite 
has about a hundred FAIL results but typically no UNRESOLVED results.

>>> And again, people run the GDB testsuite in parallel mode, so your
>>> assumption that the testsuite would stop immediately anyway doesn't
>>> hold.  One of the parallel runtest invocations bombs
>>> out, while the others keep running, so by the time you get
>>> back the prompt, other testcases already executed, and you don't
>>> see the ERROR in the current terminal anymore.
>>>       
>> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs.
>>     
>
> That's like arguing against parallelization, and saying that everyone
> should run the testsuite in serial mode, and be OK with it taking 2 hours
> instead of 30 mins to complete a run...
>   

It is a problem with parallelization, just like aborting a test run is a 
problem when running parallel tests.

> The parallel mode aggregates the summaries from the different runtest
> invocations as a single summary at the end,
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD
>
> so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
> ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
> doesn't contain those either.
>   

The ERRORs will be printed a second time, so all the script needs to do 
is collect text that appears after the summary and starts with 
"ERROR:".  There seems to also be some confusion between Tcl errors and 
the ERROR messages produce by DejaGnu's perror procedure (or its 
internal equivalents).  Would it be better to report Tcl errors with 
"CRASH" instead of "ERROR"?

>>>   You'll need to
>>> look at the testsuite .sum .log files anyway.  Only, you will
>>> find out that one testcase bombed out, 50000 tests ran, but 1000 others
>>> didn't run because of that one testcase.  And now you're missing
>>> those 1000 results, which maybe included those which you were
>>> actually interested in, because you were doing a builbot try run
>>> across a number of systems.
>>>   
>>>       
>> This also applies on a smaller scale for which we have no workaround:  gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did not run.  The only real solution is for the test scripts to avoid causing Tcl errors in the first place; anything else is papering over the problem with duct tape.
>>     
>
> That's not the same thing.  The remaining /N/ tests in the file depend
> on the previous /M/ tests.  E.g.,
>
>  gdb_test "step"
>  gdb_test "print foo"
>
> If the "step" fails, the "print foo" test is meaningless.
>
> However, there should be absolutely no dependency between foo.exp and
> bar.exp.  It should be possible to run those two different testcases
> in any order.  Thus, it should be irrelevant to foo.exp whether bar.exp
> bombs out of not.  Consequently, there's no real reason to not let
> foo.exp run regardless of what happens to bar.exp.
>   

This will need to be documented... another item for the local TODO list.

>>>   Now
>>> you wish you had remembered to type --keep-going, but it's not
>>> the default, and nobody ever writes Tcl bugs, so you didn't think
>>> of it before.
>>>   
>>>       
>> Note that the snapshot of GDB I have been using for testing this patch does have such a Tcl bug in its testsuite.  I did not make any particular effort to choose this snapshot, it was just the current state of GDB master when I started.  I have not even looked at *which* commit it is. 
>>     
>
> Can you tell me what the error looks like?
>   

With commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb and GDB built 
without Python support:
ERROR: tcl error sourcing 
/home/jacob/arena/gdb-check/build-queue-test/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-format-string.exp.
ERROR: invoked "continue" outside of a loop


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 17:30:02 GMT) Full text and rfc822 format available.

Message #82 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Rob Savoye <rob <at> senecass.com>
Cc: Pedro Alves <palves <at> redhat.com>, Tom de Vries <tdevries <at> suse.de>,
 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:29:45 -0500

Rob Savoye wrote:
> On 6/25/20 7:53 PM, Jacob Bachmeyer wrote:
>   
>> The original rationale for using UNRESOLVED here was that DejaGnu
>> converts a test result to UNRESOLVED if too many errors or warnings were
>> produced.  The DejaGnu manual indicates (section "A POSIX Compliant Test
>> Framework") that UNRESOLVED is correct for a test where execution was
>> interrupted or was set up incorrectly.  UNTESTED is specifically listed
>> as a placeholder for an as-yet-unwritten testcase.  A "typical" GDB
>> testsuite run has almost a hundred UNTESTED results but (without this
>> patch) zero UNRESOLVED results.
>>
>> To me, this seems that UNRESOLVED is correct here, or that the manual
>> has an error.
>>     
>
>   So having written both, there is always the chance I've interpreted
> things differently. If the test is interrupted, or has errors/warnings,
> like timing problems for example, then it's UNRESOLVED. But a bug in Tcl
> code enough to trigger unknown is UNTESTED, as the test never really
> ran. I'd go with which definition the toolchain teams prefer, as this
> effects validation testruns. I could go either way.
>   

I see UNTESTED as a placeholder for a yet-to-be-written test; DejaGnu 
does not produce an error exit status after reporting UNTESTED.

Once Tcl has begun to execute "source $test_file_name", I see the test 
as running from the framework's perspective, so an error exit should be 
UNRESOLVED:  we do not know the result that would have been produced had 
the Tcl error not occurred.

>   I don't know if POSIX 1003.3 is still considered a standard, that was
> a long time ago. The testsuites were primarily focused on XOpen and
> POSIX conformance tests, not toolchains. I was
> on the standards committee, so used them as a model cause they seemed a
> good standard. But the details of the interpretation can now be whatever
> we want it to be. We're probably the only ones left running TET
> compliant testsuites anyway. :-)
>   

POSIX 1003.3 appears to have been withdrawn.

>   Anyway, whether a test run aborts on an error or not is not defined,
> so it's up to us to decide.

Apparently, GDB has baked an assumption that the test run will not 
abort, ever, into their testsuite, and we had never documented 
circumstances that cause DejaGnu to abort, so I cannot really answer 
that with "your assumption is wrong".


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 17:43:02 GMT) Full text and rfc822 format available.

Message #85 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: 41824 <at> debbugs.gnu.org, Tom de Vries <tdevries <at> suse.de>,
 Pedro Alves <pedro <at> palves.net>
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 12:42:05 -0500

Pedro Alves wrote:
> On 6/26/20 11:58 PM, Jacob Bachmeyer wrote:
>
>   
>>> The issue for me is that the "gdb Summary" tally did not
>>> mention the aborted testcase.  I would like to see something
>>> like this instead:
>>>
>>>  # of expected passes            208
>>>  # of nasty OMG FIX! tcl errors    1
>>>
>>> :-)
>>>
>>> Your UNRESOLVED is of course better than the status quo.
>>>   
>>>       
>> UNRESOLVED is a step in the right direction, but I agree that more is needed.  We are somewhat limited by POSIX, which does not define a separate result type for "test case crashed" and seems to roll that into UNRESOLVED, although POSIX appears to require a testsuite to have a strictly-defined set of tests and that a run must produce results for exactly all of them, which requires information DejaGnu does not normally have about the running testsuite.  So, while UNRESOLVED is needed, we could also maintain a separate count of UNRESOLVED-due-to-crash, or simply store away the error message/errorCode/errorInfo tuples and emit them a second time at the end of the run, also easy in Tcl 8.
>>     
>
> I'm not sure how much is POSIX compliance important these days.  I would hope that that
> would not limit DejaGnu's development and limit its potential to be more usable.  

We already have extensions; POSIX only allowed 
PASS/FAIL/UNTESTED/UNRESOLVED/UNSUPPORTED.  [KX]{PASS,FAIL} are GNU 
extensions.

> Even
> if POSIX compliance is important, I would think that that could be put behind
> a --posixly_correct option, and/or POSIXLY_CORRECT environment variable, like other
> GNU tools do:
>
>  https://www.gnu.org/prep/standards/html_node/Non_002dGNU-Standards.html
>
> Or conversely, we could have a --gnu mode, though I would say that a GNU tool
> should behave in GNU mode by default.
>   

POSIX compliance in DejaGnu depended on the testsuite being run.  Any 
results generated by the framework itself need to be in the POSIX set, 
though testsuites can use extensions, as GDB has done.  Actually adding 
more test result codes is dubious, as that will interact badly with XML 
output and efficient database processing, both of which can reasonably 
enumerate the nine result codes.  ({,K,X}{PASS,FAIL}, 
UN{TEST,RESOLV,SUPPORT}ED)

> I really don't buy that adding different result types is an issue, as
> hinted earlier.  We've actually done it in GDB, for a loooong while.  We
> have distinct XFAIL vs KFAIL output messages show up in the .sum file (though
> those aren't distinguished in the summary file, doing so would be convenient,
> hence I see that as a GDB testsuite bug).
>
> And just very recently, we've added PATH and DUPLICATE output messages,
> and those do show up in the summary:
>
> # of expected passes            89593
> # of unexpected failures        547
> # of expected failures          146
> # of known failures             104
> # of untested testcases         20
> # of unresolved testcases       5
> # of unsupported tests          97
> # of paths in test names        1      <<<<<
> # of duplicate test names       370    <<<<<
>
> I would think that even these two would be useful to have in
> DejaGnu proper for all testsuites.
>
> v1/rationale     - https://sourceware.org/pipermail/gdb-patches/2020-April/167847.html
> v3/final version - https://sourceware.org/pipermail/gdb-patches/2020-April/168105.html
>   

I will look at the exact details later, but this sounds like it is 
probably a monkeypatch that will stop working in some future version 
when the procedures you are overriding disappear into an internal 
namespace, so we need to be thinking about what would be needed to 
provide extensible summary line sets, just as we need some way to extend 
language support in default_target_compile without overriding the entire 
procedure.


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Sat, 27 Jun 2020 17:54:01 GMT) Full text and rfc822 format available.

Message #88 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Sat, 27 Jun 2020 18:53:19 +0100

On 6/27/20 6:21 PM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:

>>> Note that the snapshot of GDB I have been using for testing this patch does have such a Tcl bug in its testsuite.  I did not make any particular effort to choose this snapshot, it was just the current state of GDB master when I started.  I have not even looked at *which* commit it is.     
>>
>> Can you tell me what the error looks like?
>>   
> 
> With commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb and GDB built without Python support:
> ERROR: tcl error sourcing /home/jacob/arena/gdb-check/build-queue-test/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-format-string.exp.
> ERROR: invoked "continue" outside of a loop

Thanks.  It's already been fixed:

commit 05e682e3be7e3d9d63ec358dcf8943fd200545cb
Author:     Sandra Loosemore <sandra <at> codesourcery.com>
AuthorDate: Wed Jun 17 13:43:32 2020 -0700
Commit:     Sandra Loosemore <sandra <at> codesourcery.com>
CommitDate: Wed Jun 17 13:43:32 2020 -0700

    Fix TCL error in gdb.python/py-format-string.exp.
    
    2020-06-17 Sandra Loosemore <sandra <at> codesourcery.com>
    
            gdb/testsuite/
            * gdb.python/py-format-string.exp: Move test for python support
            earlier, out of function body.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 30 Jun 2020 04:41:01 GMT) Full text and rfc822 format available.

Message #91 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Mon, 29 Jun 2020 23:40:43 -0500

Pedro Alves wrote:
> On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> And again, people run the GDB testsuite in parallel mode, so your
>>> assumption that the testsuite would stop immediately anyway doesn't
>>> hold.  One of the parallel runtest invocations bombs
>>> out, while the others keep running, so by the time you get
>>> back the prompt, other testcases already executed, and you don't
>>> see the ERROR in the current terminal anymore.
>>>       
>> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs.
>>     
>
> That's like arguing against parallelization, and saying that everyone
> should run the testsuite in serial mode, and be OK with it taking 2 hours
> instead of 30 mins to complete a run...
>
> The parallel mode aggregates the summaries from the different runtest
> invocations as a single summary at the end,
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD
>
> so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
> ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
> doesn't contain those either.
>   

The current version of the fix for this issue (commit 
61dc0cafad8845b3c668940ed2e574bd503d410f on temporary branch PR41918) 
generates separator lines containing at least 10 hyphens around each 
error when repeating the errors after the summary.  The leading 
separator line currently begins with "ERROR: " due to limitations of the 
internal interfaces, but plans are to eventually fix that and collapse 
the separator lines between adjacent error reports if multiple Tcl 
errors occur in a run.  Each error report begins with "ERROR: in 
testcase " and ends with a Tcl backtrace and a line of at least 10 
hyphens.  Is this something the GDB results aggregation tool can be 
extended to handle?

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Tue, 30 Jun 2020 12:27:02 GMT) Full text and rfc822 format available.

Message #94 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 30 Jun 2020 13:26:25 +0100

On 6/30/20 5:40 AM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
>>  
>>> Pedro Alves wrote:
>>>    
>>>> And again, people run the GDB testsuite in parallel mode, so your
>>>> assumption that the testsuite would stop immediately anyway doesn't
>>>> hold.  One of the parallel runtest invocations bombs
>>>> out, while the others keep running, so by the time you get
>>>> back the prompt, other testcases already executed, and you don't
>>>> see the ERROR in the current terminal anymore.
>>>>       
>>> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs.
>>>     
>>
>> That's like arguing against parallelization, and saying that everyone
>> should run the testsuite in serial mode, and be OK with it taking 2 hours
>> instead of 30 mins to complete a run...
>>
>> The parallel mode aggregates the summaries from the different runtest
>> invocations as a single summary at the end,
>>
>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD
>>
>> so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
>> ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
>> doesn't contain those either.
>>   
> 
> The current version of the fix for this issue (commit 61dc0cafad8845b3c668940ed2e574bd503d410f on temporary branch PR41918) generates separator lines containing at least 10 hyphens around each error when repeating the errors after the summary.  The leading separator line currently begins with "ERROR: " due to limitations of the internal interfaces, but plans are to eventually fix that and collapse the separator lines between adjacent error reports if multiple Tcl errors occur in a run.  Each error report begins with "ERROR: in testcase " and ends with a Tcl backtrace and a line of at least 10 hyphens.  Is this something the GDB results aggregation tool can be extended to handle?

Your guess is as good as mine, since I never touched that code.

Thanks,
Pedro Alves

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 30 Jun 2020 23:50:02 GMT) Full text and rfc822 format available.

Message #97 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Tue, 30 Jun 2020 18:49:11 -0500

Pedro Alves wrote:
> On 6/30/20 5:40 AM, Jacob Bachmeyer wrote:
>   
>> Pedro Alves wrote:
>>     
>>> On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
>>>  
>>>       
>>>> Pedro Alves wrote:
>>>>    
>>>>         
>>>>> And again, people run the GDB testsuite in parallel mode, so your
>>>>> assumption that the testsuite would stop immediately anyway doesn't
>>>>> hold.  One of the parallel runtest invocations bombs
>>>>> out, while the others keep running, so by the time you get
>>>>> back the prompt, other testcases already executed, and you don't
>>>>> see the ERROR in the current terminal anymore.
>>>>>       
>>>>>           
>>>> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs.
>>>>     
>>>>         
>>> That's like arguing against parallelization, and saying that everyone
>>> should run the testsuite in serial mode, and be OK with it taking 2 hours
>>> instead of 30 mins to complete a run...
>>>
>>> The parallel mode aggregates the summaries from the different runtest
>>> invocations as a single summary at the end,
>>>
>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD
>>>
>>> so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
>>> ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
>>> doesn't contain those either.
>>>   
>>>       
>> The current version of the fix for this issue (commit 61dc0cafad8845b3c668940ed2e574bd503d410f on temporary branch PR41918) generates separator lines containing at least 10 hyphens around each error when repeating the errors after the summary.  The leading separator line currently begins with "ERROR: " due to limitations of the internal interfaces, but plans are to eventually fix that and collapse the separator lines between adjacent error reports if multiple Tcl errors occur in a run.  Each error report begins with "ERROR: in testcase " and ends with a Tcl backtrace and a line of at least 10 hyphens.  Is this something the GDB results aggregation tool can be extended to handle?
>>     
>
> Your guess is as good as mine, since I never touched that code.
>   

After briefly looking through that script, I expect that it will not be 
particularly difficult, although I also note that the script counts the 
error messages generated by DejaGnu's ::unknown procedure, so 
lib/framework.exp:unknown cannot be withdrawn as useless until 1.7.


-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org:
bug#41824; Package dejagnu. (Tue, 30 Jun 2020 23:56:01 GMT) Full text and rfc822 format available.

Message #100 received at 41824 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: 41824 <at> debbugs.gnu.org, 41918 <at> debbugs.gnu.org
Subject: Branch PR41918 containing fixes for these issues is into final testing
Date: Tue, 30 Jun 2020 18:54:51 -0500

The temporary branch PR41918 (which earlier absorbed the PR41824 
temporary branch) is now merged onto the testing queue ahead of master.  
After these patches pass full testing, master will be fast-forwarded to 
include them.

-- Jacob

Reply sent to jcb62281 <at> gmail.com:
You have taken responsibility. (Wed, 08 Jul 2020 23:54:02 GMT) Full text and rfc822 format available.

Notification sent to Tom de Vries <tdevries <at> suse.de>:
bug acknowledged by developer. (Wed, 08 Jul 2020 23:54:02 GMT) Full text and rfc822 format available.

Message #105 received at 41824-done <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Pedro Alves <palves <at> redhat.com>
Cc: Tom de Vries <tdevries <at> suse.de>, 41824-done <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Wed, 08 Jul 2020 18:52:56 -0500

The bugfix branch has landed on master and will be included in the 
upcoming 1.6.3 release.

-- Jacob

Information forwarded to bug-dejagnu <at> gnu.org, jcb62281 <at> gmail.com:
bug#41824; Package dejagnu. (Thu, 09 Jul 2020 12:18:01 GMT) Full text and rfc822 format available.

Message #108 received at 41824-done <at> debbugs.gnu.org (full text, mbox):

From: Pedro Alves <palves <at> redhat.com>
To: jcb62281 <at> gmail.com
Cc: Tom de Vries <tdevries <at> suse.de>, 41824-done <at> debbugs.gnu.org
Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when
 triggered in test-case
Date: Thu, 9 Jul 2020 13:17:06 +0100

On 7/9/20 12:52 AM, Jacob Bachmeyer wrote:
> The bugfix branch has landed on master and will be included in the upcoming 1.6.3 release.

Thanks for all the work, and for the patience.

Pedro Alves

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 07 Aug 2020 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 343 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #41824 Dejagnu's unknown proc aborts testsuite run when triggered in test-case

GNU bug report logs - #41824
Dejagnu's unknown proc aborts testsuite run when triggered in test-case