GNU bug report logs - #9236
Fwd: Join

Package: coreutils;

Reported by: "David Gast" <dgast <at> csulb.edu>

Date: Thu, 4 Aug 2011 04:16:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9236 in the body.
You can then email your comments to 9236 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9236; Package coreutils. (Thu, 04 Aug 2011 04:16:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "David Gast" <dgast <at> csulb.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 04 Aug 2011 04:16:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "David Gast" <dgast <at> csulb.edu>
To: bug-coreutils <at> gnu.org
Subject: Fwd: Join
Date: Wed, 03 Aug 2011 19:48:06 -0700
[Message part 1 (text/plain, inline)]
Oops, I hit the wrong button ...

   cat > /tmp/x <<!
   b
   a
   !
   ln /tmp/x /tmp/y
   sort -c /tmp/x
   join --check-order /tmp/x /tmp/y
   # Note: The two files do not have to be the same.

Output is

   sort: /tmp/x:2: disorder: a
   join: file 1 is not in sorted order

Thanks for your consideration.




  --- the forwarded message follows ---
[Message part 2 (message/rfc822, inline)]
From: "David Gast" <dgast <at> csulb.edu>
To: bug-coreutils <at> gnu.org
Subject: Join
Date: Wed, 03 Aug 2011 19:43:31 -0700
When there is disorder, could you please provide the line
number like the command and option
   sort -c
does?  Note: join seems to report disorder in file 2 only
if there is no disorder in file 1.

You can try the following code



Thanks
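
As a rough illustration of the request (this is not code from coreutils), the sketch below scans an in-memory array of lines and returns the 1-based number of the first line that sorts before its predecessor, which is the extra piece of information `sort -c` puts in its diagnostic. The name `first_disorder` and the use of plain `strcmp` on whole lines are assumptions made for this example; join itself compares only the join field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the 1-based number of the first line that compares less than
   the line before it, or 0 if the lines are in non-decreasing order.
   Hypothetical helper for illustration only; whole lines are compared
   with strcmp, i.e. in byte order.  */
static size_t
first_disorder (const char *const *lines, size_t n_lines)
{
  assert (lines != NULL || n_lines == 0);
  for (size_t i = 1; i < n_lines; i++)
    if (strcmp (lines[i - 1], lines[i]) > 0)
      return i + 1;
  return 0;
}
```

For the two-line input "b", "a" from the report, this returns 2, matching the line number in the `sort -c` output quoted above.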


Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9236; Package coreutils. (Thu, 04 Aug 2011 14:53:02 GMT) Full text and rfc822 format available.

Message #8 received at 9236 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: David Gast <dgast <at> csulb.edu>
Cc: 9236 <at> debbugs.gnu.org
Subject: Re: bug#9236: Fwd: Join
Date: Thu, 04 Aug 2011 08:51:24 -0600
merge 9235 9236
thanks

On 08/03/2011 08:48 PM, David Gast wrote:
> Oops, I hit the wrong button ...
>
> cat > /tmp/x <<!
> b
> a
> !
> ln /tmp/x /tmp/y
> sort -c /tmp/x
> join --check-order /tmp/x /tmp/y
> # Note: The two files do not have to be the same.
>
> Output is
>
> sort: /tmp/x:2: disorder: a
> join: file 1 is not in sorted order

This sounds like a reasonable idea!  Would you like to contribute the patch?

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9236; Package coreutils. (Thu, 04 Aug 2011 17:50:02 GMT) Full text and rfc822 format available.

Message #11 received at 9236 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9236 <at> debbugs.gnu.org, David Gast <dgast <at> csulb.edu>
Subject: Re: bug#9236: Fwd: Join
Date: Thu, 04 Aug 2011 19:48:20 +0200
Eric Blake wrote:
> merge 9235 9236
> thanks
>
> On 08/03/2011 08:48 PM, David Gast wrote:
>> Oops, I hit the wrong button ...
>>
>> cat > /tmp/x <<!
>> b
>> a
>> !
>> ln /tmp/x /tmp/y
>> sort -c /tmp/x
>> join --check-order /tmp/x /tmp/y
>> # Note: The two files do not have to be the same.
>>
>> Output is
>>
>> sort: /tmp/x:2: disorder: a
>> join: file 1 is not in sorted order
>
> This sounds like a reasonable idea!  Would you like to contribute the patch?

I started looking at this, and among other things saw
a diagnostic that mentioned "file 1", which would do
much better to mention the actual file name, so embarked.
Here's a preliminary patch (not even a decent ChangeLog entry
and the join test still needs to be updated):

    $ printf '%s\n' b a c > in
    $ ./join --check-order in in
    ./join: in:2: is not sorted: a
    [Exit 1]


From adf709ba6a8d934e8f90cafada824221e1c6eb18 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 4 Aug 2011 19:31:50 +0200
Subject: [PATCH] join: FIXME: check: print both file name and line number

---
 src/join.c |   29 +++++++++++++++++++----------
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/join.c b/src/join.c
index 99d918f..368d0db 100644
--- a/src/join.c
+++ b/src/join.c
@@ -89,6 +89,12 @@ struct seq
 /* The previous line read from each file. */
 static struct line *prevline[2] = {NULL, NULL};

+/* The number of lines read from each file. */
+static uintmax_t line_no[2] = {0, 0};
+
+/* The input file names.  */
+static char *g_names[2];
+
 /* This provides an extra line buffer for each file.  We need these if we
    try to read two consecutive lines into the same buffer, since we don't
    want to overwrite the previous buffer before we check order. */
@@ -386,7 +392,10 @@ check_order (const struct line *prev,
             {
               error ((check_input_order == CHECK_ORDER_ENABLED
                       ? EXIT_FAILURE : 0),
-                     0, _("file %d is not in sorted order"), whatfile);
+                     0, _("%s:%ju: is not sorted: %.*s"),
+                     g_names[whatfile], line_no[whatfile],
+                     current->buf.length-1, /* FIXME should be int */
+                     current->buf.buffer);

               /* If we get to here, the message was just a warning, but we
                  want only to issue it once. */
@@ -436,6 +445,7 @@ get_line (FILE *fp, struct line **linep, int which)
       freeline (line);
       return false;
     }
+  ++line_no[which];

   xfields (line);

@@ -980,7 +990,6 @@ main (int argc, char **argv)
   int prev_optc_status = MUST_BE_OPERAND;
   int operand_status[2];
   int joption_count[2] = { 0, 0 };
-  char *names[2];
   FILE *fp1, *fp2;
   int optc;
   int nfiles = 0;
@@ -1100,7 +1109,7 @@ main (int argc, char **argv)
           break;

         case 1:		/* Non-option argument.  */
-          add_file_name (optarg, names, operand_status, joption_count,
+          add_file_name (optarg, g_names, operand_status, joption_count,
                          &nfiles, &prev_optc_status, &optc_status);
           break;

@@ -1122,7 +1131,7 @@ main (int argc, char **argv)
   /* Process any operands after "--".  */
   prev_optc_status = MUST_BE_OPERAND;
   while (optind < argc)
-    add_file_name (argv[optind++], names, operand_status, joption_count,
+    add_file_name (argv[optind++], g_names, operand_status, joption_count,
                    &nfiles, &prev_optc_status, &optc_status);

   if (nfiles != 2)
@@ -1148,20 +1157,20 @@ main (int argc, char **argv)
   if (join_field_2 == SIZE_MAX)
     join_field_2 = 0;

-  fp1 = STREQ (names[0], "-") ? stdin : fopen (names[0], "r");
+  fp1 = STREQ (g_names[0], "-") ? stdin : fopen (g_names[0], "r");
   if (!fp1)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
-  fp2 = STREQ (names[1], "-") ? stdin : fopen (names[1], "r");
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
+  fp2 = STREQ (g_names[1], "-") ? stdin : fopen (g_names[1], "r");
   if (!fp2)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);
   if (fp1 == fp2)
     error (EXIT_FAILURE, errno, _("both files cannot be standard input"));
   join (fp1, fp2);

   if (fclose (fp1) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
   if (fclose (fp2) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);

   if (issued_disorder_warning[0] || issued_disorder_warning[1])
     exit (EXIT_FAILURE);
--
1.7.4.4
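
A note for readers comparing this preliminary patch with the revised one later in the thread: here `g_names` and `line_no` are indexed directly with `whatfile` and `which`, whereas the revised patch uses `whatfile - 1` and `which - 1`, consistent with a file number that is 1 or 2 and two-element arrays indexed from 0. The sketch below shows that mapping in isolation. The struct and function names are hypothetical stand-ins, not identifiers from join.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for the patch's globals; the struct and
   function names here are hypothetical, not part of join.c.  */
struct input_state
{
  const char *names[2];   /* input file names, index 0 and 1 */
  uintmax_t line_no[2];   /* lines read so far from each file */
};

/* Record that one more line was read from file WHICH (1 or 2).  */
static void
count_line (struct input_state *st, int which)
{
  assert (which == 1 || which == 2);
  ++st->line_no[which - 1];
}

/* Format a "NAME:LINE: is not sorted: TEXT" message for file WHICH
   (1 or 2) into BUF, which has room for SIZE bytes.  Return the
   value returned by snprintf.  */
static int
format_disorder (char *buf, size_t size, const struct input_state *st,
                 int which, const char *text)
{
  assert (which == 1 || which == 2);
  return snprintf (buf, size, "%s:%ju: is not sorted: %s",
                   st->names[which - 1], st->line_no[which - 1], text);
}
```

Asserting on the file number at each entry point is one way to catch an index that was not converted; it is a choice made for this sketch, not something the patch does.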




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9236; Package coreutils. (Sat, 06 Aug 2011 19:42:01 GMT) Full text and rfc822 format available.

Message #14 received at 9236 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9236 <at> debbugs.gnu.org, David Gast <dgast <at> csulb.edu>
Subject: Re: bug#9236: Fwd: Join
Date: Sat, 06 Aug 2011 21:40:09 +0200
Jim Meyering wrote:
...
> I started looking at this, and among other things saw
> a diagnostic that mentioned "file 1", which would do
> much better to mention the actual file name, so embarked.
> Here's a preliminary patch (not even a decent ChangeLog entry
> and the join test still needs to be updated):
>
>     $ printf '%s\n' b a c > in
>     $ ./join --check-order in in
>     ./join: in:2: is not sorted: a
>     [Exit 1]
>
> Subject: [PATCH] join: FIXME: check: print both file name and line number
>
> ---
>  src/join.c |   29 +++++++++++++++++++----------
>  1 files changed, 19 insertions(+), 10 deletions(-)

Here's a much better patch.

From 2e4ca5100dcc3229e9937c48aed3dc475bb507ea Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 4 Aug 2011 19:31:50 +0200
Subject: [PATCH] join: with --check-order print offending file name, line
 number and data

* src/join (g_names): New global (was main's "names").
(main): Update all uses of "names".
(line_no[2]): New globals.
(get_line): Increment after reading each line.
(check_order): Print the standard "file name:line_no: " prefix
as well as the offending line when reporting disorder.
Here is a sample old/new comparison:
  -join: file 1 is not in sorted order
  +join: in:4: is not sorted: contents-of-line-4
* tests/misc/join: Change the two affected tests to expect
the new diagnostic.
Add new tests for more coverage: mismatch in file 2,
two diagnostics, zero-length out-of-order line.
* NEWS (Improvements): Mention it.
---
 NEWS            |    3 +++
 src/join.c      |   43 ++++++++++++++++++++++++++++++-------------
 tests/misc/join |   20 ++++++++++++++++++--
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 2e48497..6e24f5c 100644
--- a/NEWS
+++ b/NEWS
@@ -66,6 +66,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   df now supports disk partitions larger than 4 TiB on MacOS X 10.5
   or newer and on AIX 5.2 or newer.

+  join --check-order now prints "join: FILE:LINE_NUMBER: bad_line" for an
+  unsorted input, rather than e.g., "join: file 1 is not in sorted order".
+
   shuf outputs small subsets of large permutations much more efficiently.
   For example `shuf -i1-$((2**32-1)) -n2` no longer exhausts memory.

diff --git a/src/join.c b/src/join.c
index 99d918f..694fb55 100644
--- a/src/join.c
+++ b/src/join.c
@@ -86,9 +86,15 @@ struct seq
     struct line **lines;
   };

-/* The previous line read from each file. */
+/* The previous line read from each file.  */
 static struct line *prevline[2] = {NULL, NULL};

+/* The number of lines read from each file.  */
+static uintmax_t line_no[2] = {0, 0};
+
+/* The input file names.  */
+static char *g_names[2];
+
 /* This provides an extra line buffer for each file.  We need these if we
    try to read two consecutive lines into the same buffer, since we don't
    want to overwrite the previous buffer before we check order. */
@@ -384,12 +390,23 @@ check_order (const struct line *prev,
           size_t join_field = whatfile == 1 ? join_field_1 : join_field_2;
           if (keycmp (prev, current, join_field, join_field) > 0)
             {
+              /* Exclude any trailing newline. */
+              size_t len = current->buf.length;
+              if (0 < len && current->buf.buffer[len - 1] == '\n')
+                --len;
+
+              /* If the offending line is longer than INT_MAX, output
+                 only the first INT_MAX bytes in this diagnostic.  */
+              len = MIN (INT_MAX, len);
+
               error ((check_input_order == CHECK_ORDER_ENABLED
                       ? EXIT_FAILURE : 0),
-                     0, _("file %d is not in sorted order"), whatfile);
+                     0, _("%s:%ju: is not sorted: %.*s"),
+                     g_names[whatfile - 1], line_no[whatfile - 1],
+                     (int) len, current->buf.buffer);

-              /* If we get to here, the message was just a warning, but we
-                 want only to issue it once. */
+              /* If we get to here, the message was merely a warning.
+                 Arrange to issue it only once per file.  */
               issued_disorder_warning[whatfile-1] = true;
             }
         }
@@ -436,6 +453,7 @@ get_line (FILE *fp, struct line **linep, int which)
       freeline (line);
       return false;
     }
+  ++line_no[which - 1];

   xfields (line);

@@ -980,7 +998,6 @@ main (int argc, char **argv)
   int prev_optc_status = MUST_BE_OPERAND;
   int operand_status[2];
   int joption_count[2] = { 0, 0 };
-  char *names[2];
   FILE *fp1, *fp2;
   int optc;
   int nfiles = 0;
@@ -1100,7 +1117,7 @@ main (int argc, char **argv)
           break;

         case 1:		/* Non-option argument.  */
-          add_file_name (optarg, names, operand_status, joption_count,
+          add_file_name (optarg, g_names, operand_status, joption_count,
                          &nfiles, &prev_optc_status, &optc_status);
           break;

@@ -1122,7 +1139,7 @@ main (int argc, char **argv)
   /* Process any operands after "--".  */
   prev_optc_status = MUST_BE_OPERAND;
   while (optind < argc)
-    add_file_name (argv[optind++], names, operand_status, joption_count,
+    add_file_name (argv[optind++], g_names, operand_status, joption_count,
                    &nfiles, &prev_optc_status, &optc_status);

   if (nfiles != 2)
@@ -1148,20 +1165,20 @@ main (int argc, char **argv)
   if (join_field_2 == SIZE_MAX)
     join_field_2 = 0;

-  fp1 = STREQ (names[0], "-") ? stdin : fopen (names[0], "r");
+  fp1 = STREQ (g_names[0], "-") ? stdin : fopen (g_names[0], "r");
   if (!fp1)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
-  fp2 = STREQ (names[1], "-") ? stdin : fopen (names[1], "r");
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
+  fp2 = STREQ (g_names[1], "-") ? stdin : fopen (g_names[1], "r");
   if (!fp2)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);
   if (fp1 == fp2)
     error (EXIT_FAILURE, errno, _("both files cannot be standard input"));
   join (fp1, fp2);

   if (fclose (fp1) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
   if (fclose (fp2) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);

   if (issued_disorder_warning[0] || issued_disorder_warning[1])
     exit (EXIT_FAILURE);
diff --git a/tests/misc/join b/tests/misc/join
index eae3f18..d6528da 100755
--- a/tests/misc/join
+++ b/tests/misc/join
@@ -196,7 +196,23 @@ my @tv = (
 # With check, both inputs out of order (in fact, in reverse order)
 ['chkodr-5', '--check-order',
  [" b 1\n a 2\n", " b Y\n a Z\n"], "", 1,
- "$prog: file 1 is not in sorted order\n"],
+ "$prog: chkodr-5.1:2: is not sorted:  a 2\n"],
+
+# Similar, but with only file 2 not sorted.
+['chkodr-5b', '--check-order',
+ [" a 2\n b 1\n", " b Y\n a Z\n"], "", 1,
+ "$prog: chkodr-5b.2:2: is not sorted:  a Z\n"],
+
+# Similar, but with the offending line having length 0 (excluding newline).
+['chkodr-5c', '--check-order',
+ [" a 2\n b 1\n", " b Y\n\n"], "", 1,
+ "$prog: chkodr-5c.2:2: is not sorted: \n"],
+
+# Similar, but elicit a warning for each input file (without --check-order).
+['chkodr-5d', '',
+ ["a\nx\n\n", "b\ny\n\n"], "", 1,
+ "$prog: chkodr-5d.1:3: is not sorted: \n" .
+ "$prog: chkodr-5d.2:3: is not sorted: \n"],

 # Without order check, both inputs out of order and some lines
 # unpairable.  This is NOT supported by the GNU extension.  All that
@@ -229,7 +245,7 @@ my @tv = (
 # actual data out-of-order. This join should fail.
 ['header-3', '--header --check-order',
  ["ID Name\n2 B\n1 A\n", "ID Color\n2 blue\n"], "ID Name Color\n", 1,
- "$prog: file 1 is not in sorted order\n"],
+ "$prog: header-3.1:3: is not sorted: 1 A\n"],

 # '--header' with specific output format '-o'.
 # output header line should respect the requested format
--
1.7.6.351.gb35ac
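
The length handling in `check_order` exists because the `%.*s` conversion takes its precision as an `int`, while the buffer length is a `size_t` that may include a trailing newline. A minimal sketch of that computation on its own follows. The helper name `diag_len` is hypothetical; the logic is meant to mirror the patch above, not replace it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return the number of bytes of BUF (LENGTH bytes long, not necessarily
   NUL-terminated) to show in a diagnostic: drop one trailing newline,
   then clamp to INT_MAX so the value fits the int that "%.*s" expects.
   Hypothetical helper name, used only for this illustration.  */
static int
diag_len (const char *buf, size_t length)
{
  assert (buf != NULL || length == 0);
  if (0 < length && buf[length - 1] == '\n')
    --length;
  if ((size_t) INT_MAX < length)
    length = INT_MAX;
  return (int) length;
}
```

A caller would then write something like `printf ("%.*s\n", diag_len (buf, len), buf)`. An empty line ("\n" alone) yields a precision of 0, which is the case the chkodr-5c test exercises.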




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9236; Package coreutils. (Sat, 06 Aug 2011 20:44:01 GMT) Full text and rfc822 format available.

Message #17 received at 9236 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9236 <at> debbugs.gnu.org, David Gast <dgast <at> csulb.edu>
Subject: Re: bug#9236: Fwd: Join
Date: Sat, 06 Aug 2011 22:42:37 +0200
Jim Meyering wrote:

> Jim Meyering wrote:
> ...
>> I started looking at this, and among other things saw
>> a diagnostic that mentioned "file 1", which would do
>> much better to mention the actual file name, so embarked.
>> Here's a preliminary patch (not even a decent ChangeLog entry
>> and the join test still needs to be updated):
>>
>>     $ printf '%s\n' b a c > in
>>     $ ./join --check-order in in
>>     ./join: in:2: is not sorted: a
>>     [Exit 1]
>>
>> Subject: [PATCH] join: FIXME: check: print both file name and line number
>>
>> ---
>>  src/join.c |   29 +++++++++++++++++++----------
>>  1 files changed, 19 insertions(+), 10 deletions(-)
>
> Here's a much better patch.
>
> From 2e4ca5100dcc3229e9937c48aed3dc475bb507ea Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Thu, 4 Aug 2011 19:31:50 +0200
> Subject: [PATCH] join: with --check-order print offending file name, line
>  number and data
>
> * src/join (g_names): New global (was main's "names").
> (main): Update all uses of "names".
> (line_no[2]): New globals.
> (get_line): Increment after reading each line.
> (check_order): Print the standard "file name:line_no: " prefix
> as well as the offending line when reporting disorder.
> Here is a sample old/new comparison:
>   -join: file 1 is not in sorted order
>   +join: in:4: is not sorted: contents-of-line-4
> * tests/misc/join: Change the two affected tests to expect
> the new diagnostic.
> Add new tests for more coverage: mismatch in file 2,
> two diagnostics, zero-length out-of-order line.
> * NEWS (Improvements): Mention it.

Nearly forgot.
While coding, I considered the case of an
offending line with no trailing newline,
but hadn't tested it.  Just folded this in:

diff --git a/tests/misc/join b/tests/misc/join
index d6528da..a892a10 100755
--- a/tests/misc/join
+++ b/tests/misc/join
@@ -214,6 +214,12 @@ my @tv = (
  "$prog: chkodr-5d.1:3: is not sorted: \n" .
  "$prog: chkodr-5d.2:3: is not sorted: \n"],

+# Similar, but make it so each offending line has no newline.
+['chkodr-5e', '',
+ ["a\nx\no", "b\ny\np"], "", 1,
+ "$prog: chkodr-5e.1:3: is not sorted: o\n" .
+ "$prog: chkodr-5e.2:3: is not sorted: p\n"],
+
 # Without order check, both inputs out of order and some lines
 # unpairable.  This is NOT supported by the GNU extension.  All that
 # we really care about for this test is that the return status is




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Sat, 06 Aug 2011 21:03:01 GMT) Full text and rfc822 format available.

Notification sent to "David Gast" <dgast <at> csulb.edu>:
bug acknowledged by developer. (Sat, 06 Aug 2011 21:03:01 GMT) Full text and rfc822 format available.

Message #22 received at 9236-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9236-done <at> debbugs.gnu.org, David Gast <dgast <at> csulb.edu>
Subject: Re: bug#9236: Fwd: Join
Date: Sat, 06 Aug 2011 23:01:18 +0200
Jim Meyering wrote:
> Jim Meyering wrote:
>> Jim Meyering wrote:
>> ...
>>> I started looking at this, and among other things saw
>>> a diagnostic that mentioned "file 1", which would do
>>> much better to mention the actual file name, so embarked.
>>> Here's a preliminary patch (not even a decent ChangeLog entry
>>> and the join test still needs to be updated):
>>>
>>>     $ printf '%s\n' b a c > in
>>>     $ ./join --check-order in in
>>>     ./join: in:2: is not sorted: a
>>>     [Exit 1]
>>>
>>> Subject: [PATCH] join: FIXME: check: print both file name and line number
>>>
>>> ---
>>>  src/join.c |   29 +++++++++++++++++++----------
>>>  1 files changed, 19 insertions(+), 10 deletions(-)
>>
>> Here's a much better patch.
>>
>> From 2e4ca5100dcc3229e9937c48aed3dc475bb507ea Mon Sep 17 00:00:00 2001
>> From: Jim Meyering <meyering <at> redhat.com>
>> Date: Thu, 4 Aug 2011 19:31:50 +0200
>> Subject: [PATCH] join: with --check-order print offending file name, line
>>  number and data
>>
>> * src/join (g_names): New global (was main's "names").
...
>> * NEWS (Improvements): Mention it.

Also nearly forgot to mention in the log that David Gast
suggested this change.  For the record, I expect to push this
tomorrow or Monday:

From a0a3f339f72f4ca3ecc348ee4416c3c1e0f4765f Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 4 Aug 2011 19:31:50 +0200
Subject: [PATCH] join: with --check-order print offending file name, line
 number and data

* src/join (g_names): New global (was main's "names").
(main): Update all uses of "names".
(line_no[2]): New globals.
(get_line): Increment after reading each line.
(check_order): Print the standard "file name:line_no: " prefix
as well as the offending line when reporting disorder.
Here is a sample old/new comparison:
  -join: file 1 is not in sorted order
  +join: in:4: is not sorted: contents-of-line-4
* tests/misc/join: Change the two affected tests to expect
the new diagnostic.
Add new tests for more coverage: mismatch in file 2,
two diagnostics, zero-length out-of-order line.
* NEWS (Improvements): Mention it.
Suggested by David Gast in http://debbugs.gnu.org/9236
---
 NEWS            |    3 +++
 src/join.c      |   43 ++++++++++++++++++++++++++++++-------------
 tests/misc/join |   26 ++++++++++++++++++++++++--
 3 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 2e48497..6e24f5c 100644
--- a/NEWS
+++ b/NEWS
@@ -66,6 +66,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   df now supports disk partitions larger than 4 TiB on MacOS X 10.5
   or newer and on AIX 5.2 or newer.

+  join --check-order now prints "join: FILE:LINE_NUMBER: bad_line" for an
+  unsorted input, rather than e.g., "join: file 1 is not in sorted order".
+
   shuf outputs small subsets of large permutations much more efficiently.
   For example `shuf -i1-$((2**32-1)) -n2` no longer exhausts memory.

diff --git a/src/join.c b/src/join.c
index 99d918f..694fb55 100644
--- a/src/join.c
+++ b/src/join.c
@@ -86,9 +86,15 @@ struct seq
     struct line **lines;
   };

-/* The previous line read from each file. */
+/* The previous line read from each file.  */
 static struct line *prevline[2] = {NULL, NULL};

+/* The number of lines read from each file.  */
+static uintmax_t line_no[2] = {0, 0};
+
+/* The input file names.  */
+static char *g_names[2];
+
 /* This provides an extra line buffer for each file.  We need these if we
    try to read two consecutive lines into the same buffer, since we don't
    want to overwrite the previous buffer before we check order. */
@@ -384,12 +390,23 @@ check_order (const struct line *prev,
           size_t join_field = whatfile == 1 ? join_field_1 : join_field_2;
           if (keycmp (prev, current, join_field, join_field) > 0)
             {
+              /* Exclude any trailing newline. */
+              size_t len = current->buf.length;
+              if (0 < len && current->buf.buffer[len - 1] == '\n')
+                --len;
+
+              /* If the offending line is longer than INT_MAX, output
+                 only the first INT_MAX bytes in this diagnostic.  */
+              len = MIN (INT_MAX, len);
+
               error ((check_input_order == CHECK_ORDER_ENABLED
                       ? EXIT_FAILURE : 0),
-                     0, _("file %d is not in sorted order"), whatfile);
+                     0, _("%s:%ju: is not sorted: %.*s"),
+                     g_names[whatfile - 1], line_no[whatfile - 1],
+                     (int) len, current->buf.buffer);

-              /* If we get to here, the message was just a warning, but we
-                 want only to issue it once. */
+              /* If we get to here, the message was merely a warning.
+                 Arrange to issue it only once per file.  */
               issued_disorder_warning[whatfile-1] = true;
             }
         }
@@ -436,6 +453,7 @@ get_line (FILE *fp, struct line **linep, int which)
       freeline (line);
       return false;
     }
+  ++line_no[which - 1];

   xfields (line);

@@ -980,7 +998,6 @@ main (int argc, char **argv)
   int prev_optc_status = MUST_BE_OPERAND;
   int operand_status[2];
   int joption_count[2] = { 0, 0 };
-  char *names[2];
   FILE *fp1, *fp2;
   int optc;
   int nfiles = 0;
@@ -1100,7 +1117,7 @@ main (int argc, char **argv)
           break;

         case 1:		/* Non-option argument.  */
-          add_file_name (optarg, names, operand_status, joption_count,
+          add_file_name (optarg, g_names, operand_status, joption_count,
                          &nfiles, &prev_optc_status, &optc_status);
           break;

@@ -1122,7 +1139,7 @@ main (int argc, char **argv)
   /* Process any operands after "--".  */
   prev_optc_status = MUST_BE_OPERAND;
   while (optind < argc)
-    add_file_name (argv[optind++], names, operand_status, joption_count,
+    add_file_name (argv[optind++], g_names, operand_status, joption_count,
                    &nfiles, &prev_optc_status, &optc_status);

   if (nfiles != 2)
@@ -1148,20 +1165,20 @@ main (int argc, char **argv)
   if (join_field_2 == SIZE_MAX)
     join_field_2 = 0;

-  fp1 = STREQ (names[0], "-") ? stdin : fopen (names[0], "r");
+  fp1 = STREQ (g_names[0], "-") ? stdin : fopen (g_names[0], "r");
   if (!fp1)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
-  fp2 = STREQ (names[1], "-") ? stdin : fopen (names[1], "r");
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
+  fp2 = STREQ (g_names[1], "-") ? stdin : fopen (g_names[1], "r");
   if (!fp2)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);
   if (fp1 == fp2)
     error (EXIT_FAILURE, errno, _("both files cannot be standard input"));
   join (fp1, fp2);

   if (fclose (fp1) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[0]);
+    error (EXIT_FAILURE, errno, "%s", g_names[0]);
   if (fclose (fp2) != 0)
-    error (EXIT_FAILURE, errno, "%s", names[1]);
+    error (EXIT_FAILURE, errno, "%s", g_names[1]);

   if (issued_disorder_warning[0] || issued_disorder_warning[1])
     exit (EXIT_FAILURE);
diff --git a/tests/misc/join b/tests/misc/join
index eae3f18..a892a10 100755
--- a/tests/misc/join
+++ b/tests/misc/join
@@ -196,7 +196,29 @@ my @tv = (
 # With check, both inputs out of order (in fact, in reverse order)
 ['chkodr-5', '--check-order',
  [" b 1\n a 2\n", " b Y\n a Z\n"], "", 1,
- "$prog: file 1 is not in sorted order\n"],
+ "$prog: chkodr-5.1:2: is not sorted:  a 2\n"],
+
+# Similar, but with only file 2 not sorted.
+['chkodr-5b', '--check-order',
+ [" a 2\n b 1\n", " b Y\n a Z\n"], "", 1,
+ "$prog: chkodr-5b.2:2: is not sorted:  a Z\n"],
+
+# Similar, but with the offending line having length 0 (excluding newline).
+['chkodr-5c', '--check-order',
+ [" a 2\n b 1\n", " b Y\n\n"], "", 1,
+ "$prog: chkodr-5c.2:2: is not sorted: \n"],
+
+# Similar, but elicit a warning for each input file (without --check-order).
+['chkodr-5d', '',
+ ["a\nx\n\n", "b\ny\n\n"], "", 1,
+ "$prog: chkodr-5d.1:3: is not sorted: \n" .
+ "$prog: chkodr-5d.2:3: is not sorted: \n"],
+
+# Similar, but make it so each offending line has no newline.
+['chkodr-5e', '',
+ ["a\nx\no", "b\ny\np"], "", 1,
+ "$prog: chkodr-5e.1:3: is not sorted: o\n" .
+ "$prog: chkodr-5e.2:3: is not sorted: p\n"],

 # Without order check, both inputs out of order and some lines
 # unpairable.  This is NOT supported by the GNU extension.  All that
@@ -229,7 +251,7 @@ my @tv = (
 # actual data out-of-order. This join should fail.
 ['header-3', '--header --check-order',
  ["ID Name\n2 B\n1 A\n", "ID Color\n2 blue\n"], "ID Name Color\n", 1,
- "$prog: file 1 is not in sorted order\n"],
+ "$prog: header-3.1:3: is not sorted: 1 A\n"],

 # '--header' with specific output format '-o'.
 # output header line should respect the requested format
--
1.7.6.351.gb35ac
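
The reporting policy visible in this patch can be summarized as follows: with --check-order the first disorder message is fatal, while otherwise the message is a warning issued at most once per file, and `main` exits with failure at the end if any warning was issued. The sketch below is a simplified model of that policy, going by the patch comments and the chkodr-5d and chkodr-5e tests. All names are hypothetical, it counts messages rather than printing them, and it leaves out the question of when join decides to check order at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the reporting policy; all names are hypothetical.  */
struct order_report
{
  bool strict;        /* true models --check-order */
  bool warned[2];     /* a disorder message was issued for file 1, 2 */
  int n_messages;     /* how many messages were issued in total */
  bool stopped;       /* true once a strict failure ended processing */
};

/* Note a disorder in file WHICH (1 or 2).  In strict mode the first
   message is fatal; otherwise at most one message is issued per file.  */
static void
note_disorder (struct order_report *r, int which)
{
  assert (which == 1 || which == 2);
  if (r->stopped || r->warned[which - 1])
    return;
  r->n_messages++;
  r->warned[which - 1] = true;
  if (r->strict)
    r->stopped = true;
}

/* Exit status the run would end with: 1 if any disorder was reported,
   0 otherwise.  */
static int
final_status (const struct order_report *r)
{
  return (r->warned[0] || r->warned[1]) ? 1 : 0;
}
```

In this model, a non-strict run with disorder in both inputs produces two messages and status 1, which is what the chkodr-5d test expects from join.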




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 04 Sep 2011 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 85 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.