GNU logs - #19835, boring messages


Message sent to bug-diffutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#19835: RFC: diff: skip initial columns before comparing
Resent-From: Dave Gordon <david.s.gordon@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-diffutils@HIDDEN
Resent-Date: Wed, 11 Feb 2015 16:51:01 +0000
Resent-Message-ID: <handler.19835.B.14236734087968 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 19835
X-GNU-PR-Package: diffutils
X-GNU-PR-Keywords: 
To: 19835 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-diffutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.14236734087968
          (code B ref -1); Wed, 11 Feb 2015 16:51:01 +0000
Received: (at submit) by debbugs.gnu.org; 11 Feb 2015 16:50:08 +0000
Received: from localhost ([127.0.0.1]:40294 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1YLaUI-00024P-Je
	for submit <at> debbugs.gnu.org; Wed, 11 Feb 2015 11:50:08 -0500
Received: from eggs.gnu.org ([208.118.235.92]:57927)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYME-00075g-9r
 for submit <at> debbugs.gnu.org; Wed, 11 Feb 2015 09:33:40 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYM7-0002YZ-Jx
 for submit <at> debbugs.gnu.org; Wed, 11 Feb 2015 09:33:33 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:36182)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYM7-0002YH-0l
 for submit <at> debbugs.gnu.org; Wed, 11 Feb 2015 09:33:31 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59644)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYM5-0002OK-KT
 for bug-diffutils@HIDDEN; Wed, 11 Feb 2015 09:33:30 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYM2-0002XL-DB
 for bug-diffutils@HIDDEN; Wed, 11 Feb 2015 09:33:29 -0500
Received: from mga11.intel.com ([192.55.52.93]:40945)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <david.s.gordon@HIDDEN>) id 1YLYM2-0002Wu-69
 for bug-diffutils@HIDDEN; Wed, 11 Feb 2015 09:33:26 -0500
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga102.fm.intel.com with ESMTP; 11 Feb 2015 06:33:22 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,558,1418112000"; d="scan'208";a="453142417"
Received: from dsgordon-linux.isw.intel.com (HELO [10.102.226.149])
 ([10.102.226.149])
 by FMSMGA003.fm.intel.com with ESMTP; 11 Feb 2015 06:18:44 -0800
Message-ID: <54DB6831.7090307@HIDDEN>
Date: Wed, 11 Feb 2015 14:33:21 +0000
From: Dave Gordon <david.s.gordon@HIDDEN>
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way,
 Swindon SN3 1RJ
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Mailman-Approved-At: Wed, 11 Feb 2015 11:50:05 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.0 (----)

When comparing certain types of files, notably timestamped logfiles
such as the output of dmesg(1), it's necessary to ignore the initial
characters on each line, otherwise every line is different. In the
simplest case, this can be done by applying 'cut(1)' to each input;
but then, important information about when the difference(s) occurred is
lost, and it can be difficult to find the relevant lines in the original
files, especially if they are highly repetitive (as logfiles often are).
What is needed in this situation is to ignore the timestamps for
purposes of comparison, but then include them in any lines copied to
the output.

So this patch adds a new option (long form only) "--ignore-initial=N" to
ignore the first N characters of each line. This is done by skipping the
first N characters of each line in find_and_hash_each_line(), and
likewise lines_differ(). The hashing or comparison of the remaining part
of the line then proceeds as usual.
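
As a rough illustration of the skip-then-hash step described above (a
stand-alone sketch, not the code in the patch): the helper name
hash_line_after_skip() and the multiply-by-31 hash are assumptions made
for this example only; diffutils uses its own hash function. Lines are
assumed to be newline-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: hash a newline-terminated line while ignoring
   its first SKIP bytes.  The hash function itself is a placeholder.  */
static unsigned long
hash_line_after_skip (char const *line, size_t skip)
{
  unsigned long h = 0;
  char const *p = line;

  assert (line != NULL);

  /* Step over the ignored prefix.  If the line ends inside the prefix,
     there is nothing left to hash, so the hash stays 0.  */
  while (skip--)
    if (*p++ == '\n')
      return h;

  /* Hash the remainder of the line up to, but excluding, the newline.  */
  for (; *p != '\n'; p++)
    h = h * 31 + (unsigned char) *p;

  return h;
}
```

Two lines that differ only within the first SKIP bytes get the same
hash here. A line that ends inside the prefix and a line consisting of
the prefix alone both hash to 0, so the hash only groups candidates;
the final decision is left to the line comparison.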

One subtle point: if both of the lines have fewer than N characters, the
lines are considered equal iff they have the same length. Usually, the
type of file you would use this option with will have a fixed-format
prefix (which is the part to be ignored), and a line missing this prefix
is generally an indication of a formatting error. So a line with the
prefix but no further content should NOT match an empty line or a line
with a truncated prefix; but we still want two empty lines to match each
other.

For example, with --ignore-initial=10:

These lines match:
[22:47:25] hello
[23:17:24] hello

These lines don't match:
[22:47:25] hello
[23:17:24]

Nor do these:
[22:47:]
[23:17:24]

But these do:
[NOCLOCK]
[CLKFAIL]
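
The four cases above can be checked with a small stand-alone sketch of
the comparison rule. It is modelled on the loop this patch adds to
lines_differ(), but the function name lines_differ_after_skip() and the
plain byte-for-byte comparison of the remainder are assumptions for
illustration; the real function also handles the white-space and case
options. Both arguments are assumed to be newline-terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if two newline-terminated lines
   differ once their first SKIP bytes are ignored.  */
static bool
lines_differ_after_skip (char const *s1, char const *s2, size_t skip)
{
  assert (s1 != NULL && s2 != NULL);

  /* Step over the ignored prefix of both lines in lockstep.  If either
     line ends inside the prefix, the lines match only when both end at
     the same offset, i.e. when they have the same length.  */
  while (skip--)
    {
      unsigned char c1 = *s1++;
      unsigned char c2 = *s2++;

      if (c1 == '\n' || c2 == '\n')
        return c1 != c2;
    }

  /* Compare the remainders byte for byte up to the newline.  */
  for (;;)
    {
      unsigned char c1 = *s1++;
      unsigned char c2 = *s2++;

      if (c1 != c2)
        return true;
      if (c1 == '\n')
        return false;
    }
}
```

With skip set to 10, this returns false for the first and last pairs
above and true for the two pairs in between.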

Hope this looks useful!
.Dave.

-----------------------
diff --git a/src/diff.c b/src/diff.c
index 50d0365..eccce21 100644
--- a/src/diff.c
+++ b/src/diff.c
@@ -121,6 +121,7 @@ enum
   NO_IGNORE_FILE_NAME_CASE_OPTION,
   NORMAL_OPTION,
   SDIFF_MERGE_ASSIST_OPTION,
+  SKIP_INITIAL_OPTION,
   STRIP_TRAILING_CR_OPTION,
   SUPPRESS_BLANK_EMPTY_OPTION,
   SUPPRESS_COMMON_LINES_OPTION,
@@ -173,6 +174,7 @@ static struct option const longopts[] =
   {"ignore-blank-lines", 0, 0, 'B'},
   {"ignore-case", 0, 0, 'i'},
   {"ignore-file-name-case", 0, 0, IGNORE_FILE_NAME_CASE_OPTION},
+  {"ignore-initial", 1, 0, SKIP_INITIAL_OPTION},
   {"ignore-matching-lines", 1, 0, 'I'},
   {"ignore-space-change", 0, 0, 'b'},
   {"ignore-tab-expansion", 0, 0, 'E'},
@@ -580,6 +582,18 @@ main (int argc, char **argv)
          sdiff_merge_assist = true;
          break;

+       case SKIP_INITIAL_OPTION:
+         numval = strtoumax (optarg, &numend, 10);
+         if (! (0 < numval && numval <= SIZE_MAX) || *numend)
+           try_help ("invalid initial skip '%s'", optarg);
+         if (initial_skip != numval)
+           {
+             if (initial_skip)
+               fatal ("conflicting initial skip options");
+             initial_skip = numval;
+           }
+         break;
+
        case STRIP_TRAILING_CR_OPTION:
          strip_trailing_cr = true;
          break;
@@ -724,7 +738,8 @@ main (int argc, char **argv)
   files_can_be_treated_as_binary =
     (brief & binary
      & ~ (ignore_blank_lines | ignore_case | strip_trailing_cr
-         | (ignore_regexp_list.regexps || ignore_white_space)));
+         | (ignore_regexp_list.regexps || ignore_white_space
+                 || initial_skip)));

   switch_string = option_list (argv + 1, optind - 1);

@@ -895,6 +910,7 @@ static char const * const option_help_msgid[] = {
   N_("-w, --ignore-all-space          ignore all white space"),
   N_("-B, --ignore-blank-lines        ignore changes where lines are all blank"),
   N_("-I, --ignore-matching-lines=RE  ignore changes where all lines match RE"),
+  N_("    --ignore-initial=SKIP       ignore the initial SKIP characters of each line"),
   "",
   N_("-a, --text                      treat all files as text"),
   N_("    --strip-trailing-cr         strip trailing carriage return on input"),
diff --git a/src/diff.h b/src/diff.h
index e9f0471..b638a3f 100644
--- a/src/diff.h
+++ b/src/diff.h
@@ -125,6 +125,9 @@ XTERN enum DIFF_white_space ignore_white_space;
 /* Ignore changes that affect only blank lines (-B).  */

+/* Skip this many initial characters on each line (--ignore-initial).  */
+XTERN size_t initial_skip;
+
 /* Files can be compared byte-by-byte, as if they were binary.
    This depends on various options.  */
 XTERN bool files_can_be_treated_as_binary;
diff --git a/src/io.c b/src/io.c
index 463ee35..7e15996 100644
--- a/src/io.c
+++ b/src/io.c
@@ -232,13 +232,18 @@ find_and_hash_each_line (struct file_data *current)
   bool diff_length_compare_anyway =
     ig_white_space != IGNORE_NO_WHITE_SPACE;
   bool same_length_diff_contents_compare_anyway =
-    diff_length_compare_anyway | ig_case;
+    diff_length_compare_anyway | ig_case || initial_skip != 0;

   while (p < suffix_begin)
     {
       char const *ip = p;
       hash_value h = 0;
       unsigned char c;
+      size_t skip = initial_skip;
+
+      while (skip--)
+       if ((c = *p++) == '\n')
+         goto hashing_done;

       /* Hash this line until we find a newline.  */
       switch (ig_white_space)
diff --git a/src/util.c b/src/util.c
index 016057d..0acba06 100644
--- a/src/util.c
+++ b/src/util.c
@@ -413,6 +413,16 @@ lines_differ (char const *s1, char const *s2)
   register char const *t1 = s1;
   register char const *t2 = s2;
   size_t column = 0;
+  size_t skip = initial_skip;
+
+  while (skip--)
+    {
+      register unsigned char c1 = *t1++;
+      register unsigned char c2 = *t2++;
+
+      if (c1 == '\n' || c2 == '\n')
+       return c1 != c2;
+    }

   while (1)
     {





Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Dave Gordon <david.s.gordon@HIDDEN>
Subject: bug#19835: Acknowledgement (RFC: diff: skip initial columns
 before comparing)
Message-ID: <handler.19835.B.14236734087968.ack <at> debbugs.gnu.org>
References: <54DB6831.7090307@HIDDEN>
X-Gnu-PR-Message: ack 19835
X-Gnu-PR-Package: diffutils
Reply-To: 19835 <at> debbugs.gnu.org
Date: Wed, 11 Feb 2015 16:51:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-diffutils@HIDDEN

If you wish to submit further information on this problem, please
send it to 19835 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
19835: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19835
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems



Last modified: Mon, 25 Nov 2019 12:00:02 UTC
