GNU bug report logs -
#56386
[PATCH] gnu: Add mecab.
Reported by: Julien Lepiller <julien <at> lepiller.eu>
Date: Mon, 4 Jul 2022 19:11:02 UTC
Severity: normal
Tags: patch
Done: Julien Lepiller <julien <at> lepiller.eu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 56386 in the body.
You can then email your comments to 56386 AT debbugs.gnu.org in the normal way.
Report forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Mon, 04 Jul 2022 19:11:02 GMT)
Acknowledgement sent to Julien Lepiller <julien <at> lepiller.eu>: New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Mon, 04 Jul 2022 19:11:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi Guix!
This small series adds mecab and two dictionaries. MeCab is a
morphological analysis engine. I'm not sure what that previous sentence
means (:p) but I use it as a segmenter for Japanese in one of my
projects. In fact, the two patches that follow add two dictionary
sources. You need one of them in the same profile as mecab for it to be
useful (with no dictionaries, it segfaults).
Information forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Mon, 04 Jul 2022 19:43:01 GMT)
Message #8 received at 56386 <at> debbugs.gnu.org:
* gnu/packages/language.scm (mecab-ipadic): New variable.
---
gnu/packages/language.scm | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 3ffe115b51..63654c544b 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -970,3 +970,30 @@ (define-public mecab
collaboration between the Kyoto university and Nippon Telegraph and Telephone
Corporation. The engine is independent of any language, dictionary or corpus.")
(license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
+
+(define-public mecab-ipadic
+ (package
+ (name "mecab-ipadic")
+ (version "2.7.0")
+ (source (package-source mecab))
+ (build-system gnu-build-system)
+ (arguments
+ `(#:configure-flags
+ (list (string-append "--with-dicdir=" (assoc-ref %outputs "out")
+ "/lib/mecab/dic")
+ "--with-charset=utf8")
+ #:phases
+ (modify-phases %standard-phases
+ (add-after 'unpack 'chdir
+ (lambda _
+ (chdir "mecab-ipadic")))
+ (add-before 'configure 'set-mecab-dir
+ (lambda* (#:key outputs #:allow-other-keys)
+ (setenv "MECAB_DICDIR" (string-append (assoc-ref outputs "out")
+ "/lib/mecab/dic")))))))
+ (native-inputs (list mecab)); for mecab-config
+ (home-page "https://taku910.github.io/mecab")
+ (synopsis "Dictionary data for MeCab")
+ (description "This package contains dictionary data derived from
+ipadic for use with MeCab.")
+ (license (license:non-copyleft "mecab-ipadic/COPYING"))))
--
2.36.1
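The `set-mecab-dir` phase above exports MECAB_DICDIR before `configure`. This lets the `mecab-config` from the `mecab` package report the dictionary's own output directory. The `mecab-config.in` substitution in the mecab patch of this series (Message #11) prints `$MECAB_DICDIR` when it is set and non-empty. Otherwise it prints the compiled-in `@libdir@/mecab/dic`. The following is a minimal C++ sketch of that fallback rule, for illustration only. `resolve_dicdir` is a hypothetical helper, not part of MeCab.

```cpp
#include <string>

// Hypothetical helper mirroring the shell fragment substituted into
// mecab-config.in. `[ -z "$MECAB_DICDIR" ]` is true both when the variable
// is unset and when it is empty, so both cases fall back to the default.
std::string resolve_dicdir(const char* mecab_dicdir, const std::string& libdir) {
  if (mecab_dicdir == nullptr || mecab_dicdir[0] == '\0') {
    return libdir + "/mecab/dic";  // compiled-in @libdir@/mecab/dic
  }
  return std::string(mecab_dicdir);  // value exported by set-mecab-dir
}
```

In a real program the first argument would come from `std::getenv("MECAB_DICDIR")`. Taking it as a parameter keeps the sketch easy to test.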
Information forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Mon, 04 Jul 2022 19:43:02 GMT)
Message #11 received at 56386 <at> debbugs.gnu.org:
* gnu/packages/language.scm (mecab): New variable.
* gnu/packages/patches/mecab-variable-param.patch: New file.
* gnu/local.mk (dist_patch_DATA): Add it.
---
gnu/local.mk | 1 +
gnu/packages/language.scm | 51 ++++++++++++++++++-
.../patches/mecab-variable-param.patch | 30 +++++++++++
3 files changed, 81 insertions(+), 1 deletion(-)
create mode 100644 gnu/packages/patches/mecab-variable-param.patch
diff --git a/gnu/local.mk b/gnu/local.mk
index faad6cc6b2..87fe75082c 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -1490,6 +1490,7 @@ dist_patch_DATA = \
%D%/packages/patches/libmemcached-build-with-gcc7.patch \
%D%/packages/patches/libmhash-hmac-fix-uaf.patch \
%D%/packages/patches/libsigrokdecode-python3.9-fix.patch \
+ %D%/packages/patches/mecab-variable-param.patch \
%D%/packages/patches/mercurial-hg-extension-path.patch \
%D%/packages/patches/mesa-opencl-all-targets.patch \
%D%/packages/patches/mesa-skip-tests.patch \
diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 61c9e682ed..3ffe115b51 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -4,7 +4,7 @@
;;; Copyright © 2018 Nikita <nikita <at> n0.is>
;;; Copyright © 2019 Alex Vong <alexvong1995 <at> gmail.com>
;;; Copyright © 2020 Ricardo Wurmus <rekado <at> elephly.net>
-;;; Copyright © 2020 Julien Lepiller <julien <at> lepiller.eu>
+;;; Copyright © 2020, 2022 Julien Lepiller <julien <at> lepiller.eu>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -921,3 +921,52 @@ (define-public praat
analysis (pitch, formant, intensity, ...), speech synthesis, labelling, segmenting
and manipulation.")
(license license:gpl2+)))
+
+(define-public mecab
+ (package
+ (name "mecab")
+ (version "0.996")
+ (source (origin
+ (method git-fetch)
+ (uri (git-reference
+ (url "https://github.com/taku910/mecab")
+ ;; latest commit
+ (commit "046fa78b2ed56fbd4fac312040f6d62fc1bc31e3")))
+ (file-name (git-file-name name version))
+ (sha256
+ (base32
+ "1hdv7rgn8j0ym9gsbigydwrbxa8cx2fb0qngg1ya15vvbw0lk4aa"))
+ (patches
+ (search-patches
+ "mecab-variable-param.patch"))))
+ (build-system gnu-build-system)
+ (native-search-paths
+ (list (search-path-specification
+ (variable "MECAB_DICDIR")
+ (separator #f)
+ (files '("lib/mecab/dic")))))
+ (arguments
+ `(#:phases
+ (modify-phases %standard-phases
+ (add-after 'unpack 'chdir
+ (lambda _
+ (chdir "mecab")))
+ (add-before 'build 'add-mecab-dicdir-variable
+ (lambda _
+ (substitute* "mecabrc.in"
+ (("dicdir = .*")
+ "dicdir = $MECAB_DICDIR"))
+ (substitute* "mecab-config.in"
+ (("echo @libdir@/mecab/dic")
+ "if [ -z \"$MECAB_DICDIR\" ]; then
+ echo @libdir@/mecab/dic
+else
+ echo \"$MECAB_DICDIR\"
+fi")))))))
+ (inputs (list libiconv))
+ (home-page "https://taku910.github.io/mecab")
+ (synopsis "Morphological analysis engine for texts")
+ (description "MeCab is a morphological analysis engine developed as a
+collaboration between the Kyoto university and Nippon Telegraph and Telephone
+Corporation. The engine is independent of any language, dictionary or corpus.")
+ (license (list license:gpl2+ license:lgpl2.1+ license:bsd-3))))
diff --git a/gnu/packages/patches/mecab-variable-param.patch b/gnu/packages/patches/mecab-variable-param.patch
new file mode 100644
index 0000000000..4457cf3f44
--- /dev/null
+++ b/gnu/packages/patches/mecab-variable-param.patch
@@ -0,0 +1,30 @@
+From 2396e90056706ef897acab3aaa081289c7336483 Mon Sep 17 00:00:00 2001
+From: LEPILLER Julien <julien.lepiller <at> irisa.fr>
+Date: Fri, 19 Apr 2019 11:48:39 +0200
+Subject: [PATCH] Allow variable parameters
+
+---
+ mecab/src/param.cpp | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/mecab/src/param.cpp b/mecab/src/param.cpp
+index 65328a2..006b1b5 100644
+--- a/mecab/src/param.cpp
++++ b/mecab/src/param.cpp
+@@ -79,8 +79,12 @@ bool Param::load(const char *filename) {
+ size_t s1, s2;
+ for (s1 = pos+1; s1 < line.size() && isspace(line[s1]); s1++);
+ for (s2 = pos-1; static_cast<long>(s2) >= 0 && isspace(line[s2]); s2--);
+- const std::string value = line.substr(s1, line.size() - s1);
++ std::string value = line.substr(s1, line.size() - s1);
+ const std::string key = line.substr(0, s2 + 1);
++
++ if(value.find('$') == 0) {
++ value = std::getenv(value.substr(1).c_str());
++ }
+ set<std::string>(key.c_str(), value, false);
+ }
+
+--
+2.20.1
+
--
2.36.1
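The patched `Param::load()` handles a `mecabrc` value that starts with `$` by substituting the named environment variable. This is how `dicdir = $MECAB_DICDIR` picks up the directory that the `native-search-paths` field exports. With `(separator #f)`, that is a single directory rather than a list. As written, the patch assigns the result of `std::getenv` straight to a `std::string`. When the variable is unset, `getenv` returns a null pointer, and that assignment is undefined behavior. Below is a sketch of the same expansion with a null check. `expand_param_value` and its `fallback` parameter are hypothetical and not part of MeCab.

```cpp
#include <cstdlib>
#include <string>

// Sketch of the "$VAR" expansion that mecab-variable-param.patch adds to
// Param::load(), plus a guard the patch lacks: std::getenv returns nullptr
// for an unset variable, and building a std::string from nullptr is UB.
// Values that do not start with '$' are returned unchanged; an unset
// variable expands to the caller-supplied `fallback` (hypothetical).
std::string expand_param_value(const std::string& value,
                               const std::string& fallback) {
  if (value.empty() || value[0] != '$') {
    return value;
  }
  const char* env = std::getenv(value.substr(1).c_str());
  return env != nullptr ? std::string(env) : fallback;
}
```

Falling back to a default is only one option. Keeping the literal `$VAR` text or reporting an error would be equally defensible choices.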
Information forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Mon, 04 Jul 2022 19:43:02 GMT)
Message #14 received at 56386 <at> debbugs.gnu.org:
* gnu/packages/language.scm (mecab-unidic): New variable.
---
gnu/packages/language.scm | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/gnu/packages/language.scm b/gnu/packages/language.scm
index 63654c544b..f97b982cb9 100644
--- a/gnu/packages/language.scm
+++ b/gnu/packages/language.scm
@@ -27,6 +27,7 @@ (define-module (gnu packages language)
#:use-module (gnu packages autotools)
#:use-module (gnu packages audio)
#:use-module (gnu packages base)
+ #:use-module (gnu packages compression)
#:use-module (gnu packages docbook)
#:use-module (gnu packages emacs)
#:use-module (gnu packages freedesktop)
@@ -57,6 +58,7 @@ (define-module (gnu packages language)
#:use-module (gnu packages xorg)
#:use-module (guix packages)
#:use-module (guix build-system cmake)
+ #:use-module (guix build-system copy)
#:use-module (guix build-system glib-or-gtk)
#:use-module (guix build-system gnu)
#:use-module (guix build-system perl)
@@ -997,3 +999,27 @@ (define-public mecab-ipadic
(description "This package contains dictionary data derived from
ipadic for use with MeCab.")
(license (license:non-copyleft "mecab-ipadic/COPYING"))))
+
+(define-public mecab-unidic
+ (package
+ (name "mecab-unidic")
+ (version "3.1.0")
+ (source (origin
+ (method url-fetch)
+ (uri (string-append "https://clrd.ninjal.ac.jp/unidic_archive/cwj/"
+ version "/unidic-cwj-" version ".zip"))
+ (sha256
+ (base32
+ "1z132p2q3bgchiw529j2d7dari21kn0fhkgrj3vcl0ncg2m521il"))))
+ (build-system copy-build-system)
+ (arguments
+ `(#:install-plan
+ '(("." "lib/mecab/dic"
+ #:include-regexp ("\\.bin$" "\\.def$" "\\.dic$" "dicrc")))))
+ (native-inputs (list unzip))
+ (home-page "https://clrd.ninjal.ac.jp/unidic/en/")
+ (synopsis "Dictionary data for MeCab")
+ (description "UniDic for morphological analysis is a dictionary for
+analysis with the morphological analyser MeCab, where the short units exported
+from the database are used as entries (heading terms).")
+ (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))
--
2.36.1
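The `mecab-unidic` install plan copies only the compiled dictionary files out of the unpacked archive into `lib/mecab/dic`. Below is a rough C++ approximation of that filter. It assumes copy-build-system keeps a file when any `#:include-regexp` pattern matches somewhere in its name (an unanchored search). It is not copy-build-system's actual code, and the file names in the test are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Approximates mecab-unidic's #:include-regexp filter: keep a file when
// any pattern is found anywhere in its name, preserving input order.
std::vector<std::string> select_dictionary_files(
    const std::vector<std::string>& files) {
  static const std::vector<std::regex> patterns = {
      std::regex(R"(\.bin$)"), std::regex(R"(\.def$)"),
      std::regex(R"(\.dic$)"), std::regex("dicrc")};
  std::vector<std::string> kept;
  for (const auto& file : files) {
    for (const auto& pattern : patterns) {
      if (std::regex_search(file, pattern)) {
        kept.push_back(file);
        break;  // one matching pattern is enough
      }
    }
  }
  return kept;
}
```

Note that `"dicrc"` is not anchored, unlike the three suffix patterns. Any file whose name merely contains that string would also be kept.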
Information forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Sun, 17 Jul 2022 19:34:02 GMT)
Message #17 received at 56386 <at> debbugs.gnu.org:
Hi,
Julien Lepiller <julien <at> lepiller.eu> wrote:
> + (synopsis "Dictionary data for MeCab")
> + (description "UniDic for morphological analysis is a dictionary for
> +analysis with the morphological analyser MeCab, where the short units exported
> +from the database are used as entries (heading terms).")
> + (license (list license:gpl2+ license:lgpl2.1 license:bsd-3))))
Maybe add a comment stating whether this is triple-licensed (at the
user’s choice) or if that means that there are files under each of
these.
Otherwise the whole series LGTM!
Ludo’.
Information forwarded to guix-patches <at> gnu.org: bug#56386; Package guix-patches. (Thu, 30 Mar 2023 22:44:02 GMT)
Message #20 received at 56386 <at> debbugs.gnu.org:
On 2022-07-04 20:09, Julien Lepiller wrote:
> Hi Guix!
>
> This small series adds mecab and two dictionaries. MeCab is a
> morphological analysis engine. I'm not sure what that previous sentence
> means (:p) but I use it as a segmenter for Japanese in one of my
> projects. In fact, the two patches that follow add two dictionary
> sources. You need one of them in the same profile as mecab for it to be
> useful (with no dictionaries, it segfaults).
>
>
>
Any updates regarding this?
Cheers,
Bruno
Reply sent to Julien Lepiller <julien <at> lepiller.eu>: You have taken responsibility. (Sat, 01 Apr 2023 14:44:02 GMT)
Notification sent to Julien Lepiller <julien <at> lepiller.eu>: bug acknowledged by developer. (Sat, 01 Apr 2023 14:44:02 GMT)
Message #25 received at 56386-done <at> debbugs.gnu.org:
On Thu, 30 Mar 2023 23:43:22 +0100,
Bruno Victal <mirai <at> makinata.eu> wrote:
> On 2022-07-04 20:09, Julien Lepiller wrote:
> > Hi Guix!
> >
> > This small series adds mecab and two dictionaries. MeCab is a
> > morphological analysis engine. I'm not sure what that previous
> > sentence means (:p) but I use it as a segmenter for Japanese in one
> > of my projects. In fact, the two patches that follow add two
> > dictionary sources. You need one of them in the same profile as
> > mecab for it to be useful (with no dictionaries, it segfaults).
> >
> >
> >
>
> Any updates regarding this?
>
>
> Cheers,
> Bruno
I had forgotten about this. It's a triple license (at the user's
choice), so I added a comment. Pushed to master as
3ab24ba216ce91210b93ec61554b3343fbc3aaab to
4483296da3e2e1424d12d92d0f56fb428765ca43.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 30 Apr 2023 11:24:06 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.