GNU bug report logs - #73106
[PATCH 00/10] Add python-tokenizers.


Package: guix-patches

Reported by: Nicolas Graves <ngraves <at> ngraves.fr>

Date: Sat, 7 Sep 2024 16:33:02 UTC

Severity: normal

Tags: patch

Done: Ricardo Wurmus <rekado <at> elephly.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 73106 in the body.
You can then email your comments to 73106 AT debbugs.gnu.org in the normal way.


Report forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:33:03 GMT)

Acknowledgement sent to Nicolas Graves <ngraves <at> ngraves.fr>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 07 Sep 2024 16:33:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: guix-patches <at> gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 00/10] Add python-tokenizers.
Date: Sat,  7 Sep 2024 18:21:12 +0200
This patch series adds the package python-tokenizers, which is a
prerequisite for packaging python-transformers.

Nicolas Graves (10):
  gnu: Add rust-esaxx-rs-0.1.
  gnu: Add rust-spm-precompiled-0.1.
  gnu: Add rust-macro-rules-attribute-proc-macro-0.2.
  gnu: Add rust-macro-rules-attribute-0.2.
  gnu: Add rust-hf-hub-0.3.
  gnu: Add rust-monostate-impl-0.1.
  gnu: Add rust-monostate-0.1.
  gnu: Add rust-tokenizers.
  gnu: Add rust-numpy-0.21.
  gnu: Add python-tokenizers.

 gnu/packages/crates-io.scm        | 133 +++++++++++++++
 gnu/packages/machine-learning.scm | 266 ++++++++++++++++++++++++++++++
 2 files changed, 399 insertions(+)

-- 
2.45.2
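
As a rough illustration of what the series enables, here is a minimal manifest sketch. It assumes a Guix checkout with these ten patches applied; the file name manifest.scm is hypothetical and does not come from the series.

```scheme
;; manifest.scm (hypothetical file name, not part of the patch series).
;; Assumes python-tokenizers is available, i.e. the series has been applied.
(specifications->manifest
 (list "python"               ; interpreter used to import the bindings
       "python-tokenizers"))  ; package added by PATCH 10/10
```

Under those assumptions, something like `guix shell -m manifest.scm -- python3 -c "import tokenizers"` could serve as a quick smoke test of the Python bindings.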





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:02 GMT)

Message #8 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 01/10] gnu: Add rust-esaxx-rs-0.1.
Date: Sat,  7 Sep 2024 18:56:07 +0200
* gnu/packages/machine-learning.scm (rust-esaxx-rs-0.1): New variable.

Change-Id: I38a666dd5b9f20dc721e0a28ad718ff5f227b708
---
 gnu/packages/machine-learning.scm | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 12be1d7bf6..4385603a4a 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5580,6 +5580,26 @@ (define-public python-torchfile
 Python.")
     (license license:bsd-3)))
 
+(define-public rust-esaxx-rs-0.1
+  (package
+    (name "rust-esaxx-rs")
+    (version "0.1.10")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "esaxx-rs" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "1rm6vm5yr7s3n5ly7k9x9j6ra5p2l2ld151gnaya8x03qcwf05yq"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs (("rust-cc" ,rust-cc-1))))
+    (home-page "https://github.com/Narsil/esaxx-rs")
+    (synopsis "Wrapper for sentencepiece's esaxxx library")
+    (description
+     "This package provides a wrapper around sentencepiece's esaxxx library.")
+    (license license:asl2.0)))
+
 (define-public python-hmmlearn
   (package
     (name "python-hmmlearn")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:02 GMT)

Message #11 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 02/10] gnu: Add rust-spm-precompiled-0.1.
Date: Sat,  7 Sep 2024 18:56:08 +0200
* gnu/packages/machine-learning.scm (rust-spm-precompiled-0.1): New variable.

Change-Id: I622c1a875e10041703ef0a32e7c35074f534276b
---
 gnu/packages/machine-learning.scm | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 4385603a4a..d3f76ebeba 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5600,6 +5600,33 @@ (define-public rust-esaxx-rs-0.1
      "This package provides a wrapper around sentencepiece's esaxxx library.")
     (license license:asl2.0)))
 
+(define-public rust-spm-precompiled-0.1
+  (package
+    (name "rust-spm-precompiled")
+    (version "0.1.4")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "spm_precompiled" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "09pkdk2abr8xf4pb9kq3rk80dgziq6vzfk7aywv3diik82f6jlaq"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs
+       (("rust-base64" ,rust-base64-0.13)
+        ("rust-nom" ,rust-nom-7)
+        ("rust-serde" ,rust-serde-1)
+        ("rust-unicode-segmentation" ,rust-unicode-segmentation-1))))
+    (home-page "https://github.com/huggingface/spm_precompiled")
+    (synopsis "Emulate sentencepiece's DoubleArray")
+    (description
+     "This crate aims to emulate
+@url{https://github.com/google/sentencepiece,sentencepiece}
+Dart::@code{DoubleArray} struct and its Normalizer.  This crate is highly
+specialized and not intended for general use.")
+    (license license:asl2.0)))
+
 (define-public python-hmmlearn
   (package
     (name "python-hmmlearn")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)

Message #14 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 03/10] gnu: Add rust-macro-rules-attribute-proc-macro-0.2.
Date: Sat,  7 Sep 2024 18:56:09 +0200
* gnu/packages/crates-io.scm (rust-macro-rules-attribute-proc-macro-0.2): New variable.

Change-Id: I1fab6de81c897643cae52e733bd06bb00ea1bd7f
---
 gnu/packages/crates-io.scm | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 36ecbe4430..d04f8723fd 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -41076,6 +41076,27 @@ (define-public rust-macaddr-1
     (description "This pakcage provides MAC address types.")
     (license (list license:asl2.0 license:expat))))
 
+(define-public rust-macro-rules-attribute-proc-macro-0.2
+  (package
+    (name "rust-macro-rules-attribute-proc-macro")
+    (version "0.2.0")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "macro_rules_attribute-proc_macro" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "0s45j4zm0a5d041g3vcbanvr76p331dfjb7gw9qdmh0w8mnqbpdq"))))
+    (build-system cargo-build-system)
+    (home-page
+     "https://github.com/danielhenrymantilla/macro_rules_attribute-rs")
+    (synopsis "Use declarative macros in Rust")
+    (description
+     "This package provides the ability to use Rust declarative macros as
+proc_macro attributes or derives.  This package provides implementation
+details to @code{rust-macro-rules-attribute}.")
+    (license license:expat)))
+
 (define-public rust-macrotest-1
   (package
     (name "rust-macrotest")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)

Message #17 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 04/10] gnu: Add rust-macro-rules-attribute-0.2.
Date: Sat,  7 Sep 2024 18:56:10 +0200
* gnu/packages/crates-io.scm (rust-macro-rules-attribute-0.2): New variable.

Change-Id: I62c9ba35a8a9f71f05f0f3c5307d7abe11f408c8
---
 gnu/packages/crates-io.scm | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index d04f8723fd..658721b123 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -41097,6 +41097,34 @@ (define-public rust-macro-rules-attribute-proc-macro-0.2
 details to @code{rust-macro-rules-attribute}.")
     (license license:expat)))
 
+(define-public rust-macro-rules-attribute-0.2
+  (package
+    (name "rust-macro-rules-attribute")
+    (version "0.2.0")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "macro_rules_attribute" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "04waa4qm28adwnxsxhx9135ki68mwkikr6m5pi5xhcy0gcgjg0la"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs
+       (("rust-macro-rules-attribute-proc-macro"
+         ,rust-macro-rules-attribute-proc-macro-0.2)
+        ("rust-paste" ,rust-paste-1))
+       #:cargo-development-inputs
+       (("rust-once-cell" ,rust-once-cell-1)
+        ("rust-pin-project-lite" ,rust-pin-project-lite-0.2)
+        ("rust-serde" ,rust-serde-1))))
+    (home-page "https://crates.io/crates/macro_rules_attribute")
+    (synopsis "Use declarative macros in Rust")
+    (description
+     "This package provides the ability to use Rust declarative macros as
+proc_macro attributes or derives.")
+    (license license:expat)))
+
 (define-public rust-macrotest-1
   (package
     (name "rust-macrotest")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)

Message #20 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 05/10] gnu: Add rust-hf-hub-0.3.
Date: Sat,  7 Sep 2024 18:56:11 +0200
* gnu/packages/machine-learning.scm (rust-hf-hub-0.3): New variable.

Change-Id: I9e64c316dde8094e6142785af8549556953513e0
---
 gnu/packages/machine-learning.scm | 48 +++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index d3f76ebeba..27d7f0526b 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -78,7 +78,10 @@ (define-module (gnu packages machine-learning)
   #:use-module (gnu packages cmake)
   #:use-module (gnu packages cpp)
   #:use-module (gnu packages cran)
+  #:use-module (gnu packages crates-crypto)
   #:use-module (gnu packages crates-io)
+  #:use-module (gnu packages crates-tls)
+  #:use-module (gnu packages crates-web)
   #:use-module (gnu packages databases)
   #:use-module (gnu packages dejagnu)
   #:use-module (gnu packages documentation)
@@ -5627,6 +5630,51 @@ (define-public rust-spm-precompiled-0.1
 specialized and not intended for general use.")
     (license license:asl2.0)))
 
+(define-public rust-hf-hub-0.3
+  (package
+    (name "rust-hf-hub")
+    (version "0.3.2")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "hf-hub" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "0cnpivy9fn62lm1fw85kmg3ryvrx8drq63c96vq94gabawshcy1b"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:tests? #f  ; require network connection
+       #:cargo-inputs
+       (("rust-dirs" ,rust-dirs-5)
+        ("rust-futures" ,rust-futures-0.3)
+        ("rust-indicatif" ,rust-indicatif-0.17)
+        ("rust-log" ,rust-log-0.4)
+        ("rust-native-tls" ,rust-native-tls-0.2)
+        ("rust-num-cpus" ,rust-num-cpus-1)
+        ("rust-rand" ,rust-rand-0.8)
+        ("rust-reqwest" ,rust-reqwest-0.11)
+        ("rust-serde" ,rust-serde-1)
+        ("rust-serde-json" ,rust-serde-json-1)
+        ("rust-thiserror" ,rust-thiserror-1)
+        ("rust-tokio" ,rust-tokio-1)
+        ("rust-ureq" ,rust-ureq-2))
+       #:cargo-development-inputs
+       (("rust-hex-literal" ,rust-hex-literal-0.4)
+        ("rust-sha2" ,rust-sha2-0.10)
+        ("rust-tokio-test" ,rust-tokio-test-0.4))))
+    (native-inputs
+     (list pkg-config))
+    (inputs
+     (list openssl))
+    (home-page "https://github.com/huggingface/hf-hub")
+    (synopsis "Interact with HuggingFace in Rust")
+    (description
+     "This crate aims to ease interaction with
+@url{https://huggingface.co/,huggingface}.  It aims to be compatible with
+@url{https://github.com/huggingface/huggingface_hub/,huggingface_hub}
+python package, but only implements a smaller subset of functions.")
+    (license license:asl2.0)))
+
 (define-public python-hmmlearn
   (package
     (name "python-hmmlearn")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:04 GMT)

Message #23 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 07/10] gnu: Add rust-monostate-0.1.
Date: Sat,  7 Sep 2024 18:56:13 +0200
* gnu/packages/crates-io.scm (rust-monostate-0.1): New variable.

Change-Id: I53f1ebfaf98e785eedeb3293f211bffa6f44bc76
---
 gnu/packages/crates-io.scm | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 28ff81c801..7a8f090fd9 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -43741,6 +43741,32 @@ (define-public rust-monostate-impl-0.1
      "This package provides implementation details of the monostate crate.")
     (license (list license:expat license:asl2.0))))
 
+(define-public rust-monostate-0.1
+  (package
+    (name "rust-monostate")
+    (version "0.1.11")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "monostate" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "0xchz8cs990g7g5f8jjybjnyi9xnhykiq44gl97p5rbh3hgjm347"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs
+       (("rust-monostate-impl" ,rust-monostate-impl-0.1)
+        ("rust-serde" ,rust-serde-1))
+       #:cargo-development-inputs
+       (("rust-serde" ,rust-serde-1)
+        ("rust-serde-json" ,rust-serde-json-1))))
+    (home-page "https://github.com/dtolnay/monostate")
+    (synopsis "Type that deserializes only from one specific value")
+    (description
+     "This package provides a Rust type that deserializes only from one
+specific value.")
+    (license (list license:expat license:asl2.0))))
+
 (define-public rust-more-asserts-0.3
   (package
     (name "rust-more-asserts")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:04 GMT)

Message #26 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 06/10] gnu: Add rust-monostate-impl-0.1.
Date: Sat,  7 Sep 2024 18:56:12 +0200
* gnu/packages/crates-io.scm (rust-monostate-impl-0.1): New variable.

Change-Id: Ica72fb8bce3589ed1ee5b08c3d96dcc24aaee279
---
 gnu/packages/crates-io.scm | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 658721b123..28ff81c801 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -43718,6 +43718,29 @@ (define-public rust-modifier-0.1
       "Chaining APIs for both self -> Self and &mut self methods.")
     (license license:expat)))
 
+(define-public rust-monostate-impl-0.1
+  (package
+    (name "rust-monostate-impl")
+    (version "0.1.11")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "monostate-impl" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "1km6kc6yxvpsxciaj02zar8cx1sq142s6jn6saqn77h7165dd1pn"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs
+       (("rust-proc-macro2" ,rust-proc-macro2-1)
+        ("rust-quote" ,rust-quote-1)
+        ("rust-syn" ,rust-syn-2))))
+    (home-page "https://github.com/dtolnay/monostate")
+    (synopsis "Implementation detail of the monostate crate")
+    (description
+     "This package provides implementation details of the monostate crate.")
+    (license (list license:expat license:asl2.0))))
+
 (define-public rust-more-asserts-0.3
   (package
     (name "rust-more-asserts")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:05 GMT)

Message #29 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 08/10] gnu: Add rust-tokenizers.
Date: Sat,  7 Sep 2024 18:56:14 +0200
* gnu/packages/machine-learning.scm (rust-tokenizers): New variable.

Change-Id: I3189a2d826f072f65ad053d77eb39be39775f1c2
---
 gnu/packages/machine-learning.scm | 60 +++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 27d7f0526b..3b601f6c91 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5675,6 +5675,66 @@ (define-public rust-hf-hub-0.3
 python package, but only implements a smaller subset of functions.")
     (license license:asl2.0)))
 
+(define-public rust-tokenizers
+  (package
+    (name "rust-tokenizers")
+    (version "0.19.1")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "tokenizers" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "1zg6ffpllygijb5bh227m9p4lrhf0pjkysky68kddwrsvp8zl075"))
+       (modules '((guix build utils)))
+       (snippet
+        #~(substitute* "Cargo.toml"
+            (("0.1.12") ; rust-monostate requires a rust-syn-2 update
+             "0.1.11")
+            (("version = \"6.4\"")  ; rust-onig
+             "version = \"6.1.1\"")))))
+    (build-system cargo-build-system)
+    (arguments
+     (list
+      #:tests? #f  ; tests are relying on missing data.
+      #:cargo-inputs
+      `(("rust-aho-corasick" ,rust-aho-corasick-1)
+        ("rust-derive-builder" ,rust-derive-builder-0.20)
+        ("rust-esaxx-rs" ,rust-esaxx-rs-0.1)
+        ("rust-fancy-regex" ,rust-fancy-regex-0.13)
+        ("rust-getrandom" ,rust-getrandom-0.2)
+        ("rust-hf-hub" ,rust-hf-hub-0.3)
+        ("rust-indicatif" ,rust-indicatif-0.17)
+        ("rust-itertools" ,rust-itertools-0.12)
+        ("rust-lazy-static" ,rust-lazy-static-1)
+        ("rust-log" ,rust-log-0.4)
+        ("rust-macro-rules-attribute" ,rust-macro-rules-attribute-0.2)
+        ("rust-monostate" ,rust-monostate-0.1)
+        ("rust-onig" ,rust-onig-6)
+        ("rust-paste" ,rust-paste-1)
+        ("rust-rand" ,rust-rand-0.8)
+        ("rust-rayon" ,rust-rayon-1)
+        ("rust-rayon-cond" ,rust-rayon-cond-0.3)
+        ("rust-regex" ,rust-regex-1)
+        ("rust-regex-syntax" ,rust-regex-syntax-0.8)
+        ("rust-serde" ,rust-serde-1)
+        ("rust-serde-json" ,rust-serde-json-1)
+        ("rust-spm-precompiled" ,rust-spm-precompiled-0.1)
+        ("rust-thiserror" ,rust-thiserror-1)
+        ("rust-unicode-normalization-alignments" ,rust-unicode-normalization-alignments-0.1)
+        ("rust-unicode-segmentation" ,rust-unicode-segmentation-1)
+        ("rust-unicode-categories" ,rust-unicode-categories-0.1))
+      #:cargo-development-inputs
+      `(("rust-assert-approx-eq" ,rust-assert-approx-eq-1)
+        ("rust-criterion" ,rust-criterion-0.5)
+        ("rust-tempfile" ,rust-tempfile-3))))
+    (home-page "https://github.com/huggingface/tokenizers")
+    (synopsis "Implementation of various popular tokenizers")
+    (description
+     "This package provides a Rust implementation of today's most used
+tokenizers, with a focus on performance and versatility.")
+    (license license:asl2.0)))
+
 (define-public python-hmmlearn
   (package
     (name "python-hmmlearn")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:05 GMT)

Message #32 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 09/10] gnu: Add rust-numpy-0.21.
Date: Sat,  7 Sep 2024 18:56:15 +0200
* gnu/packages/crates-io.scm (rust-numpy-0.21): New variable.

Change-Id: Idae5915f3cefa47c16c4bf9a5679f55621e35da7
---
 gnu/packages/crates-io.scm | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 7a8f090fd9..ba5cb75d2c 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -48734,6 +48734,41 @@ (define-public rust-number-prefix-0.3
 giga, kibi.")
     (license license:expat)))
 
+(define-public rust-numpy-0.21
+  (package
+    (name "rust-numpy")
+    (version "0.21.0")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (crate-uri "numpy" version))
+       (file-name (string-append name "-" version ".tar.gz"))
+       (sha256
+        (base32 "1x1p5x7lwfc5nsccwj98sln5vx3g3n8sbgm5fmfmy5rpr8rhf5zc"))))
+    (build-system cargo-build-system)
+    (arguments
+     `(#:cargo-inputs
+       (("rust-half" ,rust-half-2)
+        ("rust-libc" ,rust-libc-0.2)
+        ("rust-nalgebra" ,rust-nalgebra-0.32)
+        ("rust-ndarray" ,rust-ndarray-0.13)
+        ("rust-num-complex" ,rust-num-complex-0.2)
+        ("rust-num-integer" ,rust-num-integer-0.1)
+        ("rust-num-traits" ,rust-num-traits-0.2)
+        ("rust-pyo3" ,rust-pyo3-0.21)
+        ("rust-rustc-hash" ,rust-rustc-hash-1))
+       #:cargo-development-inputs
+       (("rust-nalgebra" ,rust-nalgebra-0.32)
+        ("rust-pyo3" ,rust-pyo3-0.21))))
+    (native-inputs (list python-minimal
+                         (@ (gnu packages python-xyz) python-numpy)))
+    (home-page "https://github.com/PyO3/rust-numpy")
+    (synopsis "Rust bindings for the NumPy C-API")
+    (description
+     "This package provides @code{PyO3-based} Rust bindings of the
+@code{NumPy} C-API.")
+    (license license:bsd-2)))
+
 (define-public rust-numtoa-0.2
   (package
     (name "rust-numtoa")
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:06 GMT)

Message #35 received at 73106 <at> debbugs.gnu.org:

From: Nicolas Graves <ngraves <at> ngraves.fr>
To: 73106 <at> debbugs.gnu.org
Cc: ngraves <at> ngraves.fr
Subject: [PATCH 10/10] gnu: Add python-tokenizers.
Date: Sat,  7 Sep 2024 18:56:16 +0200
* gnu/packages/machine-learning.scm (python-tokenizers): New variable.

Change-Id: I5db95172255dc4635c2a417f3b7252454eea27d7
---
 gnu/packages/machine-learning.scm | 111 ++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 3b601f6c91..412499d424 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5735,6 +5735,117 @@ (define-public rust-tokenizers
 tokenizers, with a focus on performance and versatility.")
     (license license:asl2.0)))
 
+(define-public python-tokenizers
+  (package
+    (name "python-tokenizers")
+    (version "0.19.1")
+    (source
+     (origin
+       (method url-fetch)
+       (uri (pypi-uri "tokenizers" version))
+       (sha256
+        (base32 "1qw8mjp0q9w7j1raq1rvcbfw38000kbqpwscf9mvxzfh1rlfcngf"))
+       (modules '((guix build utils)
+                  (ice-9 ftw)))
+       (snippet
+        #~(begin  ;; Only keeping bindings.
+            (for-each (lambda (file)
+                        (unless (member file '("." ".." "bindings" "PKG-INFO"))
+                          (delete-file-recursively file)))
+                      (scandir "."))
+            (for-each (lambda (file)
+                        (unless (member file '("." ".."))
+                          (rename-file (string-append "bindings/python/" file) file)))
+                      (scandir "bindings/python"))
+            (delete-file-recursively ".cargo")))))
+    (build-system cargo-build-system)
+    (arguments
+     (list
+      #:cargo-test-flags ''("--no-default-features")
+      #:imported-modules `(,@%cargo-build-system-modules
+                           ,@%pyproject-build-system-modules)
+      #:modules '((guix build cargo-build-system)
+                  ((guix build pyproject-build-system) #:prefix py:)
+                  (guix build utils)
+                  (ice-9 regex)
+                  (ice-9 textual-ports))
+      #:phases
+      #~(modify-phases %standard-phases
+          (add-after 'unpack-rust-crates 'inject-tokenizers
+            (lambda _
+              (substitute* "Cargo.toml"
+                (("\\[dependencies\\]")
+                 (format #f "
+[dev-dependencies]
+tempfile = ~s
+pyo3 = { version = ~s, features = [\"auto-initialize\"] }
+
+[dependencies]
+tokenizers = ~s"
+                         #$(package-version rust-tempfile-3)
+                         #$(package-version rust-pyo3-0.21)
+                         #$(package-version rust-tokenizers))))
+              (let ((file-path "Cargo.toml"))
+                (call-with-input-file file-path
+                  (lambda (port)
+                    (let* ((content (get-string-all port))
+                           (top-match (string-match
+                                       "\\[dependencies.tokenizers" content)))
+                      (call-with-output-file file-path
+                        (lambda (out)
+                          (format out "~a" (match:prefix top-match))))))))))
+          (add-after 'patch-cargo-checksums 'loosen-requirements
+            (lambda _
+              (substitute* "Cargo.toml"
+                (("version = \"6.4\"")
+                 (format #f "version = ~s"
+                         #$(package-version rust-onig-6))))))
+          (add-after 'check 'python-check
+            (lambda _
+              (copy-file "target/release/libtokenizers.so"
+                         "py_src/tokenizers/tokenizers.so")
+              (invoke "python3"
+                      "-c" (format #f
+                                   "import sys; sys.path.append(\"~a/py_src\")"
+                                   (getcwd))
+                      "-m" "pytest"
+                      "-s" "-v" "./tests/")))
+          (add-after 'install 'install-python
+            (lambda _
+              (let* ((pversion #$(version-major+minor (package-version python)))
+                     (lib (string-append #$output "/lib/python" pversion
+                                         "/site-packages/"))
+                     (info (string-append lib "tokenizers-"
+                                        #$(package-version this-package)
+                                        ".dist-info")))
+                (mkdir-p info)
+                (copy-file "PKG-INFO" (string-append info "/METADATA"))
+                (copy-recursively
+                 "py_src/tokenizers"
+                 (string-append lib "tokenizers"))))))
+      #:cargo-inputs
+      `(("rust-rayon" ,rust-rayon-1)
+        ("rust-serde" ,rust-serde-1)
+        ("rust-serde-json" ,rust-serde-json-1)
+        ("rust-libc" ,rust-libc-0.2)
+        ("rust-env-logger" ,rust-env-logger-0.11)
+        ("rust-pyo3" ,rust-pyo3-0.21)
+        ("rust-numpy" ,rust-numpy-0.21)
+        ("rust-ndarray" ,rust-ndarray-0.15)
+        ("rust-onig" ,rust-onig-6)
+        ("rust-itertools" ,rust-itertools-0.12)
+        ("rust-tokenizers" ,rust-tokenizers))
+      #:cargo-development-inputs
+      `(("rust-tempfile" ,rust-tempfile-3))))
+    (native-inputs
+     (list python-minimal python-pytest))
+    (home-page "https://huggingface.co/docs/tokenizers")
+    (synopsis "Implementation of various popular tokenizers")
+    (description
+     "This package provides bindings to a Rust implementation of the most used
+tokenizers, @code{rust-tokenizers}.")
+    (license license:asl2.0)))
+
 (define-public python-hmmlearn
   (package
     (name "python-hmmlearn")
-- 
2.45.2





Added blocking bug(s) 73094. Request was from Nicolas Graves <ngraves <at> ngraves.fr> to control <at> debbugs.gnu.org. (Sat, 07 Sep 2024 17:08:02 GMT)

Added indication that bug 73106 blocks 73109. Request was from Nicolas Graves <ngraves <at> ngraves.fr> to control <at> debbugs.gnu.org. (Sat, 07 Sep 2024 17:09:01 GMT)

Reply sent to Ricardo Wurmus <rekado <at> elephly.net>:
You have taken responsibility. (Mon, 07 Apr 2025 14:55:02 GMT)

Notification sent to Nicolas Graves <ngraves <at> ngraves.fr>:
bug acknowledged by developer. (Mon, 07 Apr 2025 14:55:02 GMT)

Message #44 received at 73106-done <at> debbugs.gnu.org:

From: Ricardo Wurmus <rekado <at> elephly.net>
To: 73106-done <at> debbugs.gnu.org
Subject: [PATCH 00/10] Add python-tokenizers.
Date: Mon, 07 Apr 2025 16:54:22 +0200
Rebased, adjusted, and pushed with commit 
6483fdee51a79db09f4645b34f4ebb24f31816b3.
Thank you for your patience!

-- 
Ricardo




Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 May 2025 11:24:07 GMT)

This bug report was last modified 60 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.