From: Jelle Licht <jlicht@HIDDEN>
To: Maxim Cournoyer <maxim.cournoyer@HIDDEN>, Simon South <simon@HIDDEN>
Cc: 61851 <at> debbugs.gnu.org
Subject: Re: bug#61851: [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Thu, 16 Mar 2023 21:38:15 +0100

Hey folks,

Maxim Cournoyer <maxim.cournoyer@HIDDEN> writes:

> Hello,
>
> Simon South <simon@HIDDEN> writes:
>
>> Maxim Cournoyer <maxim.cournoyer@HIDDEN> writes:
>>> Would you be so kind as to open an issue with upstream about the
>>> misleading doc?
>>
>> I would've submitted a patch already were the project not using GitHub.
>> I don't have a GitHub account and don't intend to get one.
>>
>> Would anyone else be willing to open an issue on this?
>
> No problem; see: https://github.com/tesseract-ocr/tesseract/issues/4025.

So it seems the issue was confirmed. In addition, there seem to be some
inconsistencies between build systems with regard to how the data dir is
interpreted by tesseract:
https://github.com/tesseract-ocr/tesseract/issues/4026

I think it makes sense for us to apply [a version of] Simon's patch. QA
also seems to show green lights, ignoring the unrelated recent
openmpi-related failures.

WDYT?

- Jelle

guix-patches@HIDDEN
:bug#61851
; Package guix-patches
.
From: Maxim Cournoyer <maxim.cournoyer@HIDDEN>
To: Simon South <simon@HIDDEN>
Cc: Jelle Licht <jlicht@HIDDEN>, 61851 <at> debbugs.gnu.org
Subject: Re: bug#61851: [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 16:41:25 -0500

Hello,

Simon South <simon@HIDDEN> writes:

> Maxim Cournoyer <maxim.cournoyer@HIDDEN> writes:
>> Would you be so kind as to open an issue with upstream about the
>> misleading doc?
>
> I would've submitted a patch already were the project not using GitHub.
> I don't have a GitHub account and don't intend to get one.
>
> Would anyone else be willing to open an issue on this?

No problem; see: https://github.com/tesseract-ocr/tesseract/issues/4025.

-- 
Thanks,
Maxim

From: Simon South <simon@HIDDEN>
To: Maxim Cournoyer <maxim.cournoyer@HIDDEN>
Cc: Jelle Licht <jlicht@HIDDEN>, 61851 <at> debbugs.gnu.org
Subject: Re: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 11:40:12 -0500

Maxim Cournoyer <maxim.cournoyer@HIDDEN> writes:
> Would you be so kind as to open an issue with upstream about the
> misleading doc?

I would've submitted a patch already were the project not using GitHub.
I don't have a GitHub account and don't intend to get one.

Would anyone else be willing to open an issue on this?

-- 
Simon South
simon@HIDDEN

From: Maxim Cournoyer <maxim.cournoyer@HIDDEN>
To: Simon South <simon@HIDDEN>
Cc: Jelle Licht <jlicht@HIDDEN>, 61851 <at> debbugs.gnu.org
Subject: Re: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 10:35:31 -0500

Hi Simon,

Simon South <simon@HIDDEN> writes:

> Jelle Licht <jlicht@HIDDEN> writes:
>> Cunningham's law strikes again :)
>
> Ha, interesting. That one's new to me.
>
>> This makes me believe the current situation was a deliberate choice...
>
> Yes, it was, and I realize now I didn't provide much in the way of
> rationale in my previous email. So here's the background information
> for anyone interested:
>
> Tesseract normally expects to find its data files in /usr/share/tessdata
> and subfolders thereof. We'd like to use Guix's native-search-paths
> functionality to pull together data from (for instance) multiple
> language-specific data packages, and Tesseract conveniently honours a
> TESSDATA_PREFIX environment variable that specifies its data folder's
> location, so it seems we are all set.
>
> What should TESSDATA_PREFIX be set to? Tesseract's documentation[0]
> says
>
>   TESSDATA_PREFIX environment variable should be set to the parent
>   directory of “tessdata” directory.
>
> So "share" then, presumably, to have the data files located at
> "share/tessdata". The man page[1] seems to confirm this:
>
>   To use a non-standard language pack named foo.traineddata, set the
>   TESSDATA_PREFIX environment variable so the file can be found at
>   TESSDATA_PREFIX/tessdata/foo.traineddata...
>
> This creates a problem, though, since defining a native-search-path of
> just "share" will pull in files from virtually every single Guix
> package.
>
> The solution then is to introduce an intermediate folder,
> "tesseract-ocr", that sidesteps this problem, and to configure Tesseract
> appropriately at build time so it installs its data files to
> "share/tesseract-ocr/tessdata" instead. This is why the existing code
> was written the way it was and what the comment you pointed out is
> referring to.
>
> However there's a problem with this, too: Patching Makefile.am the way
> the code does results in only some of Tesseract's data files being
> placed in "share/tesseract-ocr/tessdata"; you can see in the package
> output there is still a "share/tessdata" folder that contains
> Tesseract's config files. Since these aren't also placed beneath
> "share/tesseract-ocr/tessdata" Tesseract can't find them at runtime.
>
> The solution to this seems to be to remove this phase and instead use
> the "--datadir" configure flag to specify the desired data-folder path.
> Doing this results in all of Tesseract's data files being installed
> beneath "share/tesseract-ocr/tessdata" and the resulting package works
> as you'd expect.
>
> However the problem with this is... none of it is necessary in the first
> place! It turns out Tesseract's documentation is simply WRONG and the
> program actually expects TESSDATA_PREFIX to contain the complete path to
> the "tessdata" data folder, not the path of the folder directly above
> it. So Tesseract can be built as-is, the native-search-path can be
> safely defined as "share/tessdata", and everything just works.
>
> This is what the patch I passed on yesterday does.

Thanks for explaining, that makes sense!

Would you be so kind as to open an issue with upstream about the
misleading doc? That'd complete it and avoid any confusion in the
future.

-- 
Thanks,
Maxim

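
The split layout described in the quoted text (trained data in one folder, config files in another) can be checked for mechanically. The following is only a sketch: `check_tessdata` is a hypothetical helper name, and the `configs/hocr` location is an assumption based on upstream's usual tessdata layout, not something stated in this thread.

```shell
#!/bin/sh
# check_tessdata DIR: succeed only if DIR holds both the English
# trained-data file and the "hocr" config file.
# Hypothetical helper; "configs/hocr" is an assumed location.
check_tessdata() {
    dir=$1
    [ -f "$dir/eng.traineddata" ] && [ -f "$dir/configs/hocr" ]
}

# Demo against a throwaway directory that mimics a complete layout.
demo=$(mktemp -d)
mkdir -p "$demo/configs"
: > "$demo/eng.traineddata"
: > "$demo/configs/hocr"
if check_tessdata "$demo"; then
    echo "complete"
else
    echo "incomplete"
fi
rm -rf "$demo"
```

Pointing the helper at the directory named by TESSDATA_PREFIX in a real profile would show whether both kinds of file ended up in the same place.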
From: Simon South <simon@HIDDEN>
To: Jelle Licht <jlicht@HIDDEN>
Cc: 61851 <at> debbugs.gnu.org, Maxim Cournoyer <maxim.cournoyer@HIDDEN>
Subject: Re: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 10:00:36 -0500

Jelle Licht <jlicht@HIDDEN> writes:
> Cunningham's law strikes again :)

Ha, interesting. That one's new to me.

> This makes me believe the current situation was a deliberate choice...

Yes, it was, and I realize now I didn't provide much in the way of
rationale in my previous email. So here's the background information
for anyone interested:

Tesseract normally expects to find its data files in /usr/share/tessdata
and subfolders thereof. We'd like to use Guix's native-search-paths
functionality to pull together data from (for instance) multiple
language-specific data packages, and Tesseract conveniently honours a
TESSDATA_PREFIX environment variable that specifies its data folder's
location, so it seems we are all set.

What should TESSDATA_PREFIX be set to? Tesseract's documentation[0]
says

  TESSDATA_PREFIX environment variable should be set to the parent
  directory of “tessdata” directory.

So "share" then, presumably, to have the data files located at
"share/tessdata". The man page[1] seems to confirm this:

  To use a non-standard language pack named foo.traineddata, set the
  TESSDATA_PREFIX environment variable so the file can be found at
  TESSDATA_PREFIX/tessdata/foo.traineddata...

This creates a problem, though, since defining a native-search-path of
just "share" will pull in files from virtually every single Guix
package.

The solution then is to introduce an intermediate folder,
"tesseract-ocr", that sidesteps this problem, and to configure Tesseract
appropriately at build time so it installs its data files to
"share/tesseract-ocr/tessdata" instead. This is why the existing code
was written the way it was and what the comment you pointed out is
referring to.

However there's a problem with this, too: Patching Makefile.am the way
the code does results in only some of Tesseract's data files being
placed in "share/tesseract-ocr/tessdata"; you can see in the package
output there is still a "share/tessdata" folder that contains
Tesseract's config files. Since these aren't also placed beneath
"share/tesseract-ocr/tessdata" Tesseract can't find them at runtime.

The solution to this seems to be to remove this phase and instead use
the "--datadir" configure flag to specify the desired data-folder path.
Doing this results in all of Tesseract's data files being installed
beneath "share/tesseract-ocr/tessdata" and the resulting package works
as you'd expect.

However the problem with this is... none of it is necessary in the first
place! It turns out Tesseract's documentation is simply WRONG and the
program actually expects TESSDATA_PREFIX to contain the complete path to
the "tessdata" data folder, not the path of the folder directly above
it. So Tesseract can be built as-is, the native-search-path can be
safely defined as "share/tessdata", and everything just works.

This is what the patch I passed on yesterday does.

-- 
Simon South
simon@HIDDEN

[0] https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image
[1] https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc

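
To make the two readings of TESSDATA_PREFIX concrete, here is a small sketch. The function names and the example profile path are made up for illustration; the first function follows the documentation quoted above, the second follows the behaviour reported in this message.

```shell
#!/bin/sh
# Hypothetical helpers, named here only for illustration.

# Reading taken from the quoted documentation:
# TESSDATA_PREFIX is the parent of the "tessdata" directory.
documented_path() {
    printf '%s/tessdata/%s.traineddata\n' "$1" "$2"
}

# Reading reported in this message:
# TESSDATA_PREFIX is the "tessdata" directory itself.
observed_path() {
    printf '%s/%s.traineddata\n' "$1" "$2"
}

# Example with a made-up profile directory.
profile=/tmp/example-profile
documented_path "$profile/share" eng
observed_path "$profile/share/tessdata" eng
```

Both calls name the same file, but only the second lets the search path entry be "share/tessdata" rather than the overly broad "share".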
From: Jelle Licht <jlicht@HIDDEN>
To: Simon South <simon@HIDDEN>
Cc: 61851 <at> debbugs.gnu.org, Maxim Cournoyer <maxim.cournoyer@HIDDEN>
Subject: Re: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Tue, 28 Feb 2023 01:31:40 +0100

Hi Simon,

Simon South <simon@HIDDEN> writes:

> Jelle,
>
> Respectfully, and speaking only as an interested observer, I think this
> may not be the right fix.

Cunningham's law strikes again :) [1].

>
> Guix's Tesseract is indeed missing its config files, causing (among
> other things) the examples in the online documentation[0] to not work,
> e.g.:
>
>   ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
>   read_params_file: Can't open hocr
>   The (quick) [brown] {fox} jumps!
>   Over the $43,456.78 <lazy> #90 dog
>   (...)
>
> But the root issue appears to be a misconfiguration of the
> TESSDATA_PREFIX search path in the tessdata-ocr package, which causes
> Tesseract's own config files to be installed in a folder other than the
> one it's configured to search.
>
> Fixing this places Tesseract's config files and the trained-data files
> together beneath /usr/share/tessdata, allowing Tesseract to work as
> expected:
>
>   ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
>   <?xml version="1.0" encoding="UTF-8"?>
>   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>   (...)

I will believe you without any doubt, but there's this spooky comment
left in the tesseract-ocr 'adjust-TESSDATA_PREFIX-macro phase:

--8<---------------cut here---------------start------------->8---
;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
;; specific search-path than '/share' can be specified. The
;; build system uses CPPFLAGS for itself, so we can't simply set
;; a make flag.
--8<---------------cut here---------------end--------------->8---

This makes me believe the current situation was a deliberate choice, but
I personally don't understand what the original problem was/is.

> This approach has the advantage of keeping the
> tesseract-ocr-tessdata-fast package "pure" and focused only on
> trained-data files, which will be important for the patch I'm working on
> that will split it into multiple packages, one for each language and
> script, to allow greater flexibility.
>
> I'll respond to this email with a draft (!) patch to tesseract-ocr that
> should achieve the same result as yours, making the config files
> available for use. Does this also fix the problem for you? If so,
> would you consider submitting this change instead?

It seems to work for my stuff! I'm bringing Maxim to weigh in on this,
as they are the (un?)lucky expert according to my git-foo.

Thanks for paying attention!

- Jelle

[1] https://meta.wikimedia.org/wiki/Cunningham%27s_Law

Received: (at 61851) by debbugs.gnu.org; 27 Feb 2023 22:48:15 +0000
From: Simon South <simon@HIDDEN>
To: jlicht@HIDDEN
Cc: 61851 <at> debbugs.gnu.org
Subject: [PATCH] gnu: tesseract-ocr: Use standard TESSDATA_PREFIX.
Date: Mon, 27 Feb 2023 17:48:04 -0500
Message-Id: <20230227224804.21743-1-simon@HIDDEN>
In-Reply-To: <878rgik9uo.fsf@HIDDEN>
References: <878rgik9uo.fsf@HIDDEN>

---
 gnu/packages/ocr.scm | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gnu/packages/ocr.scm b/gnu/packages/ocr.scm
index c1cd4f061e..fc069b83e3 100644
--- a/gnu/packages/ocr.scm
+++ b/gnu/packages/ocr.scm
@@ -88,7 +88,7 @@ (define-public tesseract-ocr-tessdata-fast
         (base32
          "1m310cpb87xx8l8q7jy9fvzf6a0m8rm0dmjpbiwhc2mi6w4gn084"))))
     (build-system copy-build-system)
-    (arguments (list #:install-plan #~'(("." "share/tesseract-ocr/tessdata"))
+    (arguments (list #:install-plan #~'(("." "share/tessdata"))
                      #:phases #~(modify-phases %standard-phases
                                   (add-after 'unpack 'delete-broken-links
                                     (lambda _
@@ -131,15 +131,6 @@ (define-public tesseract-ocr
               (substitute* "configure.ac"
                 (("AC_SUBST\\(\\[XML_CATALOG_FILES])")
                  ""))))
-          (add-after 'unpack 'adjust-TESSDATA_PREFIX-macro
-            (lambda _
-              ;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
-              ;; specific search-path than '/share' can be specified.  The
-              ;; build system uses CPPFLAGS for itself, so we can't simply set
-              ;; a make flag.
-              (substitute* "Makefile.am"
-                (("-DTESSDATA_PREFIX='\"@datadir@\"'")
-                 "-DTESSDATA_PREFIX='\"@datadir@/tesseract-ocr\"'"))))
           (add-after 'build 'build-training
             (lambda* (#:key parallel-build? #:allow-other-keys)
               (define n (if parallel-build? (number->string
@@ -155,7 +146,7 @@ (define n (if parallel-build? (number->string
             ;; extended via TESSDATA_PREFIX.
             (lambda* (#:key native-inputs inputs #:allow-other-keys)
               (define eng.traineddata
-                "/share/tesseract-ocr/tessdata/eng.traineddata")
+                "/share/tessdata/eng.traineddata")
               (install-file (search-input-file (or native-inputs inputs)
                                                eng.traineddata)
                             (dirname (string-append #$output
@@ -183,7 +174,7 @@ (define eng.traineddata
      (list leptonica))
     (native-search-paths (list (search-path-specification
                                 (variable "TESSDATA_PREFIX")
-                                (files (list "share/tesseract-ocr/tessdata"))
+                                (files (list "share/tessdata"))
                                 (separator #f)))) ;single value
     (home-page "https://github.com/tesseract-ocr/tesseract")
     (synopsis "Optical character recognition engine")
-- 
2.39.1
Received: (at 61851) by debbugs.gnu.org; 27 Feb 2023 22:43:54 +0000
From: Simon South <simon@HIDDEN>
To: jlicht@HIDDEN
Cc: 61851 <at> debbugs.gnu.org
Subject: Re: [bug#61851] [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Mon, 27 Feb 2023 17:43:43 -0500
Message-ID: <878rgik9uo.fsf@HIDDEN>
In-Reply-To: <fed85bc978d9469832e5aaad737a8816d5f49fa7.1677531307.git.jlicht@HIDDEN> (jlicht@HIDDEN's message of "Mon, 27 Feb 2023 21:55:16 +0100")
References: <fed85bc978d9469832e5aaad737a8816d5f49fa7.1677531307.git.jlicht@HIDDEN>

Jelle,

Respectfully, and speaking only as an interested observer, I think this
may not be the right fix.

Guix's Tesseract is indeed missing its config files, causing (among
other things) the examples in the online documentation[0] to not work,
e.g.:

  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  read_params_file: Can't open hocr
  The (quick) [brown] {fox} jumps!
  Over the $43,456.78 <lazy> #90 dog
  (...)

But the root issue appears to be a misconfiguration of the
TESSDATA_PREFIX search path in the tesseract-ocr package, which causes
Tesseract's own config files to be installed in a folder other than the
one it's configured to search.

Fixing this places Tesseract's config files and the trained-data files
together beneath /usr/share/tessdata, allowing Tesseract to work as
expected:

  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  (...)

This approach has the advantage of keeping the
tesseract-ocr-tessdata-fast package "pure" and focused only on
trained-data files, which will be important for the patch I'm working on
that will split it into multiple packages, one for each language and
script, to allow greater flexibility.

I'll respond to this email with a draft (!) patch to tesseract-ocr that
should achieve the same result as yours, making the config files
available for use.  Does this also fix the problem for you?  If so,
would you consider submitting this change instead?

-- 
Simon South
simon@HIDDEN

[0] https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html
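As a rough aid for reproducing the report above, the following sketch checks whether a named config file (such as hocr) can be found beneath a given data directory. The function name find_tess_config and the candidate subdirectories (configs/, then tessconfigs/) are assumptions made for illustration; they are not taken from the thread and may not match Tesseract's actual lookup order.

```shell
#!/bin/sh
# find_tess_config PREFIX NAME  (hypothetical helper, for illustration only)
# Print the first existing candidate path for config NAME under PREFIX.
# The candidate subdirectories below are an assumption, not Tesseract's
# documented search order.
find_tess_config() {
    prefix=$1
    name=$2
    for sub in configs tessconfigs; do
        candidate="$prefix/$sub/$name"
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing matched: report on stderr and signal failure to the caller.
    printf 'no config named %s under %s\n' "$name" "$prefix" >&2
    return 1
}
```

Running something like find_tess_config "$TESSDATA_PREFIX" hocr inside the affected environment would then show whether the directory named by the search path actually holds the config file the command line asks for.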
Received: (at submit) by debbugs.gnu.org; 27 Feb 2023 20:55:32 +0000
From: jlicht@HIDDEN
To: guix-patches@HIDDEN
Cc: Jelle Licht <jlicht@HIDDEN>
Subject: [PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Date: Mon, 27 Feb 2023 21:55:16 +0100
Message-Id: <fed85bc978d9469832e5aaad737a8816d5f49fa7.1677531307.git.jlicht@HIDDEN>

From: Jelle Licht <jlicht@HIDDEN>

* gnu/packages/ocr.scm (tesseract-ocr-tessdata-fast)[source]: Add
recursive? flag.  Adjust hash accordingly.
[arguments]<#:phases>: Remove unneeded workaround.
---
 gnu/packages/ocr.scm | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/gnu/packages/ocr.scm b/gnu/packages/ocr.scm
index c1cd4f061e..e07d40bda4 100644
--- a/gnu/packages/ocr.scm
+++ b/gnu/packages/ocr.scm
@@ -82,18 +82,14 @@ (define-public tesseract-ocr-tessdata-fast
        (method git-fetch)
        (uri (git-reference
              (url "https://github.com/tesseract-ocr/tessdata_fast")
+             (recursive? #t)            ; for tessconfigs
              (commit version)))
        (file-name (git-file-name name version))
        (sha256
         (base32
-         "1m310cpb87xx8l8q7jy9fvzf6a0m8rm0dmjpbiwhc2mi6w4gn084"))))
+         "1hqdsy3zdy5b9l641fvhnawkw6wpb8nkvjql78q8g47js8109mhm"))))
     (build-system copy-build-system)
-    (arguments (list #:install-plan #~'(("." "share/tesseract-ocr/tessdata"))
-                     #:phases #~(modify-phases %standard-phases
-                                  (add-after 'unpack 'delete-broken-links
-                                    (lambda _
-                                      (delete-file "configs")
-                                      (delete-file "pdf.ttf"))))))
+    (arguments (list #:install-plan #~'(("." "share/tesseract-ocr/tessdata"))))
     (home-page "https://github.com/tesseract-ocr/tessdata_fast")
     (synopsis "Fast integer versions of trained LSTM models")
     (description "This repository contains fast integer versions of trained
-- 
2.39.1
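The patch above drops a phase named 'delete-broken-links once the checkout is fetched recursively. Judging only by that phase name, the deleted entries (configs and pdf.ttf) were links whose targets were absent from a non-recursive checkout. A minimal sketch for spotting such dangling links in an unpacked source directory follows; the function name list_broken_links is hypothetical, and the check only looks at entries directly inside the given directory.

```shell
#!/bin/sh
# list_broken_links DIR  (hypothetical helper, for illustration only)
# Print every symbolic link directly inside DIR whose target does not exist.
list_broken_links() {
    dir=$1
    for entry in "$dir"/*; do
        # -L: the entry itself is a symlink; ! -e: following it leads nowhere.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

Comparing the output for a plain checkout with that of a recursive one would be one way to confirm that the workaround phase is no longer needed.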