GNU bug report logs - #73106
[PATCH 00/10] Add python-tokenizers.
Reported by: Nicolas Graves <ngraves <at> ngraves.fr>
Date: Sat, 7 Sep 2024 16:33:02 UTC
Severity: normal
Tags: patch
Done: Ricardo Wurmus <rekado <at> elephly.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 73106 in the body.
You can then email your comments to 73106 AT debbugs.gnu.org in the normal way.
Report forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:33:03 GMT)
Acknowledgement sent to Nicolas Graves <ngraves <at> ngraves.fr>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 07 Sep 2024 16:33:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
This patch series adds the package python-tokenizers, which is a
prerequisite for packaging python-transformers.
Nicolas Graves (10):
gnu: Add rust-esaxx-rs-0.1.
gnu: Add rust-spm-precompiled-0.1.
gnu: Add rust-macro-rules-attribute-proc-macro-0.2.
gnu: Add rust-macro-rules-attribute-0.2.
gnu: Add rust-hf-hub-0.3.
gnu: Add rust-monostate-impl-0.1.
gnu: Add rust-monostate-0.1.
gnu: Add rust-tokenizers.
gnu: Add rust-numpy-0.21.
gnu: Add python-tokenizers.
gnu/packages/crates-io.scm | 133 +++++++++++++++
gnu/packages/machine-learning.scm | 266 ++++++++++++++++++++++++++++++
2 files changed, 399 insertions(+)
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:02 GMT)
Message #8 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/machine-learning.scm (rust-esaxx-rs-0.1): New variable.
Change-Id: I38a666dd5b9f20dc721e0a28ad718ff5f227b708
---
gnu/packages/machine-learning.scm | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 12be1d7bf6..4385603a4a 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5580,6 +5580,26 @@ (define-public python-torchfile
Python.")
(license license:bsd-3)))
+(define-public rust-esaxx-rs-0.1
+ (package
+ (name "rust-esaxx-rs")
+ (version "0.1.10")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "esaxx-rs" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "1rm6vm5yr7s3n5ly7k9x9j6ra5p2l2ld151gnaya8x03qcwf05yq"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs (("rust-cc" ,rust-cc-1))))
+ (home-page "https://github.com/Narsil/esaxx-rs")
+ (synopsis "Wrapper for sentencepiece's esaxxx library")
+ (description
+ "This package provides a wrapper around sentencepiece's esaxxx library.")
+ (license license:asl2.0)))
+
(define-public python-hmmlearn
(package
(name "python-hmmlearn")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:02 GMT)
Message #11 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/machine-learning.scm (rust-spm-precompiled-0.1): New variable.
Change-Id: I622c1a875e10041703ef0a32e7c35074f534276b
---
gnu/packages/machine-learning.scm | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 4385603a4a..d3f76ebeba 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5600,6 +5600,33 @@ (define-public rust-esaxx-rs-0.1
"This package provides a wrapper around sentencepiece's esaxxx library.")
(license license:asl2.0)))
+(define-public rust-spm-precompiled-0.1
+ (package
+ (name "rust-spm-precompiled")
+ (version "0.1.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "spm_precompiled" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "09pkdk2abr8xf4pb9kq3rk80dgziq6vzfk7aywv3diik82f6jlaq"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs
+ (("rust-base64" ,rust-base64-0.13)
+ ("rust-nom" ,rust-nom-7)
+ ("rust-serde" ,rust-serde-1)
+ ("rust-unicode-segmentation" ,rust-unicode-segmentation-1))))
+ (home-page "https://github.com/huggingface/spm_precompiled")
+ (synopsis "Emulate sentencepiece's DoubleArray")
+ (description
+ "This crate aims to emulate
+@url{https://github.com/google/sentencepiece,sentencepiece}
+Dart::@code{DoubleArray} struct and its Normalizer. This crate is highly
+specialized and not intended for general use.")
+ (license license:asl2.0)))
+
(define-public python-hmmlearn
(package
(name "python-hmmlearn")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)
Message #14 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/crates-io.scm (rust-macro-rules-attribute-proc-macro-0.2): New variable.
Change-Id: I1fab6de81c897643cae52e733bd06bb00ea1bd7f
---
gnu/packages/crates-io.scm | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 36ecbe4430..d04f8723fd 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -41076,6 +41076,27 @@ (define-public rust-macaddr-1
(description "This pakcage provides MAC address types.")
(license (list license:asl2.0 license:expat))))
+(define-public rust-macro-rules-attribute-proc-macro-0.2
+ (package
+ (name "rust-macro-rules-attribute-proc-macro")
+ (version "0.2.0")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "macro_rules_attribute-proc_macro" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "0s45j4zm0a5d041g3vcbanvr76p331dfjb7gw9qdmh0w8mnqbpdq"))))
+ (build-system cargo-build-system)
+ (home-page
+ "https://github.com/danielhenrymantilla/macro_rules_attribute-rs")
+ (synopsis "Use declarative macros in Rust")
+ (description
+ "This package provides the ability to use Rust declarative macros as
+proc_macro attributes or derives. This package provides implementation
+details to @code{rust-macro-rules-attribute}.")
+ (license license:expat)))
+
(define-public rust-macrotest-1
(package
(name "rust-macrotest")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)
Message #17 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/crates-io.scm (rust-macro-rules-attribute-0.2): New variable.
Change-Id: I62c9ba35a8a9f71f05f0f3c5307d7abe11f408c8
---
gnu/packages/crates-io.scm | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index d04f8723fd..658721b123 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -41097,6 +41097,34 @@ (define-public rust-macro-rules-attribute-proc-macro-0.2
details to @code{rust-macro-rules-attribute}.")
(license license:expat)))
+(define-public rust-macro-rules-attribute-0.2
+ (package
+ (name "rust-macro-rules-attribute")
+ (version "0.2.0")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "macro_rules_attribute" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "04waa4qm28adwnxsxhx9135ki68mwkikr6m5pi5xhcy0gcgjg0la"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs
+ (("rust-macro-rules-attribute-proc-macro"
+ ,rust-macro-rules-attribute-proc-macro-0.2)
+ ("rust-paste" ,rust-paste-1))
+ #:cargo-development-inputs
+ (("rust-once-cell" ,rust-once-cell-1)
+ ("rust-pin-project-lite" ,rust-pin-project-lite-0.2)
+ ("rust-serde" ,rust-serde-1))))
+ (home-page "https://crates.io/crates/macro_rules_attribute")
+ (synopsis "Use declarative macros in Rust")
+ (description
+ "This package provides the ability to use Rust declarative macros as
+proc_macro attributes or derives.")
+ (license license:expat)))
+
(define-public rust-macrotest-1
(package
(name "rust-macrotest")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:03 GMT)
Message #20 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/machine-learning.scm (rust-hf-hub-0.3): New variable.
Change-Id: I9e64c316dde8094e6142785af8549556953513e0
---
gnu/packages/machine-learning.scm | 48 +++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index d3f76ebeba..27d7f0526b 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -78,7 +78,10 @@ (define-module (gnu packages machine-learning)
#:use-module (gnu packages cmake)
#:use-module (gnu packages cpp)
#:use-module (gnu packages cran)
+ #:use-module (gnu packages crates-crypto)
#:use-module (gnu packages crates-io)
+ #:use-module (gnu packages crates-tls)
+ #:use-module (gnu packages crates-web)
#:use-module (gnu packages databases)
#:use-module (gnu packages dejagnu)
#:use-module (gnu packages documentation)
@@ -5627,6 +5630,51 @@ (define-public rust-spm-precompiled-0.1
specialized and not intended for general use.")
(license license:asl2.0)))
+(define-public rust-hf-hub-0.3
+ (package
+ (name "rust-hf-hub")
+ (version "0.3.2")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "hf-hub" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "0cnpivy9fn62lm1fw85kmg3ryvrx8drq63c96vq94gabawshcy1b"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:tests? #f ; require network connection
+ #:cargo-inputs
+ (("rust-dirs" ,rust-dirs-5)
+ ("rust-futures" ,rust-futures-0.3)
+ ("rust-indicatif" ,rust-indicatif-0.17)
+ ("rust-log" ,rust-log-0.4)
+ ("rust-native-tls" ,rust-native-tls-0.2)
+ ("rust-num-cpus" ,rust-num-cpus-1)
+ ("rust-rand" ,rust-rand-0.8)
+ ("rust-reqwest" ,rust-reqwest-0.11)
+ ("rust-serde" ,rust-serde-1)
+ ("rust-serde-json" ,rust-serde-json-1)
+ ("rust-thiserror" ,rust-thiserror-1)
+ ("rust-tokio" ,rust-tokio-1)
+ ("rust-ureq" ,rust-ureq-2))
+ #:cargo-development-inputs
+ (("rust-hex-literal" ,rust-hex-literal-0.4)
+ ("rust-sha2" ,rust-sha2-0.10)
+ ("rust-tokio-test" ,rust-tokio-test-0.4))))
+ (native-inputs
+ (list pkg-config))
+ (inputs
+ (list openssl))
+ (home-page "https://github.com/huggingface/hf-hub")
+ (synopsis "Interact with HuggingFace in Rust")
+ (description
+ "This crate aims to ease the interaction with
+@url{https://huggingface.co/,huggingface}. It aims to be compatible with
+@url{https://github.com/huggingface/huggingface_hub/,huggingface_hub}
+python package, but only implements a smaller subset of functions.")
+ (license license:asl2.0)))
+
(define-public python-hmmlearn
(package
(name "python-hmmlearn")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:04 GMT)
Message #23 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/crates-io.scm (rust-monostate-0.1): New variable.
Change-Id: I53f1ebfaf98e785eedeb3293f211bffa6f44bc76
---
gnu/packages/crates-io.scm | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 28ff81c801..7a8f090fd9 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -43741,6 +43741,32 @@ (define-public rust-monostate-impl-0.1
"This package provides Implementation detail of the monostate crate.")
(license (list license:expat license:asl2.0))))
+(define-public rust-monostate-0.1
+ (package
+ (name "rust-monostate")
+ (version "0.1.11")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "monostate" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "0xchz8cs990g7g5f8jjybjnyi9xnhykiq44gl97p5rbh3hgjm347"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs
+ (("rust-monostate-impl" ,rust-monostate-impl-0.1)
+ ("rust-serde" ,rust-serde-1))
+ #:cargo-development-inputs
+ (("rust-serde" ,rust-serde-1)
+ ("rust-serde-json" ,rust-serde-json-1))))
+ (home-page "https://github.com/dtolnay/monostate")
+ (synopsis "Type that deserializes only from one specific value")
+ (description
+ "This package provides a Rust type that deserializes only from one
+specific value.")
+ (license (list license:expat license:asl2.0))))
+
(define-public rust-more-asserts-0.3
(package
(name "rust-more-asserts")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:04 GMT)
Message #26 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/crates-io.scm (rust-monostate-impl-0.1): New variable.
Change-Id: Ica72fb8bce3589ed1ee5b08c3d96dcc24aaee279
---
gnu/packages/crates-io.scm | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 658721b123..28ff81c801 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -43718,6 +43718,29 @@ (define-public rust-modifier-0.1
"Chaining APIs for both self -> Self and &mut self methods.")
(license license:expat)))
+(define-public rust-monostate-impl-0.1
+ (package
+ (name "rust-monostate-impl")
+ (version "0.1.11")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "monostate-impl" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "1km6kc6yxvpsxciaj02zar8cx1sq142s6jn6saqn77h7165dd1pn"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs
+ (("rust-proc-macro2" ,rust-proc-macro2-1)
+ ("rust-quote" ,rust-quote-1)
+ ("rust-syn" ,rust-syn-2))))
+ (home-page "https://github.com/dtolnay/monostate")
+ (synopsis "Implementation detail of the monostate crate")
+ (description
+ "This package provides Implementation detail of the monostate crate.")
+ (license (list license:expat license:asl2.0))))
+
(define-public rust-more-asserts-0.3
(package
(name "rust-more-asserts")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:05 GMT)
Message #29 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/machine-learning.scm (rust-tokenizers): New variable.
Change-Id: I3189a2d826f072f65ad053d77eb39be39775f1c2
---
gnu/packages/machine-learning.scm | 60 +++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 27d7f0526b..3b601f6c91 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5675,6 +5675,66 @@ (define-public rust-hf-hub-0.3
python package, but only implements a smaller subset of functions.")
(license license:asl2.0)))
+(define-public rust-tokenizers
+ (package
+ (name "rust-tokenizers")
+ (version "0.19.1")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "tokenizers" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "1zg6ffpllygijb5bh227m9p4lrhf0pjkysky68kddwrsvp8zl075"))
+ (modules '((guix build utils)))
+ (snippet
+ #~(substitute* "Cargo.toml"
+ (("0.1.12") ; rust-monostate requires a rust-syn-2 update
+ "0.1.11")
+ (("version = \"6.4\"") ; rust-onig
+ "version = \"6.1.1\"")))))
+ (build-system cargo-build-system)
+ (arguments
+ (list
+ #:tests? #f ; tests are relying on missing data.
+ #:cargo-inputs
+ `(("rust-aho-corasick" ,rust-aho-corasick-1)
+ ("rust-derive-builder" ,rust-derive-builder-0.20)
+ ("rust-esaxx-rs" ,rust-esaxx-rs-0.1)
+ ("rust-fancy-regex" ,rust-fancy-regex-0.13)
+ ("rust-getrandom" ,rust-getrandom-0.2)
+ ("rust-hf-hub" ,rust-hf-hub-0.3)
+ ("rust-indicatif" ,rust-indicatif-0.17)
+ ("rust-itertools" ,rust-itertools-0.12)
+ ("rust-lazy-static" ,rust-lazy-static-1)
+ ("rust-log" ,rust-log-0.4)
+ ("rust-macro-rules-attribute" ,rust-macro-rules-attribute-0.2)
+ ("rust-monostate" ,rust-monostate-0.1)
+ ("rust-onig" ,rust-onig-6)
+ ("rust-paste" ,rust-paste-1)
+ ("rust-rand" ,rust-rand-0.8)
+ ("rust-rayon" ,rust-rayon-1)
+ ("rust-rayon-cond" ,rust-rayon-cond-0.3)
+ ("rust-regex" ,rust-regex-1)
+ ("rust-regex-syntax" ,rust-regex-syntax-0.8)
+ ("rust-serde" ,rust-serde-1)
+ ("rust-serde-json" ,rust-serde-json-1)
+ ("rust-spm-precompiled" ,rust-spm-precompiled-0.1)
+ ("rust-thiserror" ,rust-thiserror-1)
+ ("rust-unicode-normalization-alignments" ,rust-unicode-normalization-alignments-0.1)
+ ("rust-unicode-segmentation" ,rust-unicode-segmentation-1)
+ ("rust-unicode-categories" ,rust-unicode-categories-0.1))
+ #:cargo-development-inputs
+ `(("rust-assert-approx-eq" ,rust-assert-approx-eq-1)
+ ("rust-criterion" ,rust-criterion-0.5)
+ ("rust-tempfile" ,rust-tempfile-3))))
+ (home-page "https://github.com/huggingface/tokenizers")
+ (synopsis "Implementation of various popular tokenizers")
+ (description
+ "This package provides a Rust implementation of today's most used
+tokenizers, with a focus on performances and versatility.")
+ (license license:asl2.0)))
+
(define-public python-hmmlearn
(package
(name "python-hmmlearn")
--
2.45.2
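A note on the source snippet in this patch: `substitute*` treats its patterns as POSIX regular expressions and replaces only the matched substring, so the unanchored pattern `"0.1.12"` would also rewrite part of a longer version string such as `0.1.123`. The standalone Guile sketch below illustrates that behavior with `regexp-substitute/global` from `(ice-9 regex)`. The sample `Cargo.toml` lines are hypothetical, and the quoted, dot-escaped pattern is just one possible way to tighten the match, not what the patch does.

```scheme
(use-modules (ice-9 regex))

;; Replace every match of PATTERN in LINE with REPLACEMENT, roughly what
;; substitute* does for each line of a file.
(define (rewrite line pattern replacement)
  (regexp-substitute/global #f pattern line 'pre replacement 'post))

;; Hypothetical Cargo.toml lines, for illustration only.
(define exact "version = \"0.1.12\"")
(define longer "version = \"0.1.123\"")

;; Unanchored pattern, as in the snippet: "." matches any character and
;; the match can be a prefix of a longer version.
(display (rewrite longer "0.1.12" "0.1.11")) (newline)
;; prints: version = "0.1.113"

;; Quoted, dot-escaped pattern: only the exact version is rewritten.
(display (rewrite exact "\"0\\.1\\.12\"" "\"0.1.11\"")) (newline)
;; prints: version = "0.1.11"
(display (rewrite longer "\"0\\.1\\.12\"" "\"0.1.11\"")) (newline)
;; prints: version = "0.1.123"
```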
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:05 GMT)
Message #32 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/crates-io.scm (rust-numpy-0.21): New variable.
Change-Id: Idae5915f3cefa47c16c4bf9a5679f55621e35da7
---
gnu/packages/crates-io.scm | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/gnu/packages/crates-io.scm b/gnu/packages/crates-io.scm
index 7a8f090fd9..ba5cb75d2c 100644
--- a/gnu/packages/crates-io.scm
+++ b/gnu/packages/crates-io.scm
@@ -48734,6 +48734,41 @@ (define-public rust-number-prefix-0.3
giga, kibi.")
(license license:expat)))
+(define-public rust-numpy-0.21
+ (package
+ (name "rust-numpy")
+ (version "0.21.0")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (crate-uri "numpy" version))
+ (file-name (string-append name "-" version ".tar.gz"))
+ (sha256
+ (base32 "1x1p5x7lwfc5nsccwj98sln5vx3g3n8sbgm5fmfmy5rpr8rhf5zc"))))
+ (build-system cargo-build-system)
+ (arguments
+ `(#:cargo-inputs
+ (("rust-half" ,rust-half-2)
+ ("rust-libc" ,rust-libc-0.2)
+ ("rust-nalgebra" ,rust-nalgebra-0.32)
+ ("rust-ndarray" ,rust-ndarray-0.13)
+ ("rust-num-complex" ,rust-num-complex-0.2)
+ ("rust-num-integer" ,rust-num-integer-0.1)
+ ("rust-num-traits" ,rust-num-traits-0.2)
+ ("rust-pyo3" ,rust-pyo3-0.21)
+ ("rust-rustc-hash" ,rust-rustc-hash-1))
+ #:cargo-development-inputs
+ (("rust-nalgebra" ,rust-nalgebra-0.32)
+ ("rust-pyo3" ,rust-pyo3-0.21))))
+ (native-inputs (list python-minimal
+ (@ (gnu packages python-xyz) python-numpy)))
+ (home-page "https://github.com/PyO3/rust-numpy")
+ (synopsis "Rust bindings for the NumPy C-API")
+ (description
+ "This package provides @code{PyO3-based} Rust bindings of the
+@code{NumPy} C-API.")
+ (license license:bsd-2)))
+
(define-public rust-numtoa-0.2
(package
(name "rust-numtoa")
--
2.45.2
Information forwarded to guix-patches <at> gnu.org:
bug#73106; Package guix-patches. (Sat, 07 Sep 2024 16:57:06 GMT)
Message #35 received at 73106 <at> debbugs.gnu.org:
* gnu/packages/machine-learning.scm (python-tokenizers): New variable.
Change-Id: I5db95172255dc4635c2a417f3b7252454eea27d7
---
gnu/packages/machine-learning.scm | 111 ++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/gnu/packages/machine-learning.scm b/gnu/packages/machine-learning.scm
index 3b601f6c91..412499d424 100644
--- a/gnu/packages/machine-learning.scm
+++ b/gnu/packages/machine-learning.scm
@@ -5735,6 +5735,117 @@ (define-public rust-tokenizers
tokenizers, with a focus on performances and versatility.")
(license license:asl2.0)))
+(define-public python-tokenizers
+ (package
+ (name "python-tokenizers")
+ (version "0.19.1")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (pypi-uri "tokenizers" version))
+ (sha256
+ (base32 "1qw8mjp0q9w7j1raq1rvcbfw38000kbqpwscf9mvxzfh1rlfcngf"))
+ (modules '((guix build utils)
+ (ice-9 ftw)))
+ (snippet
+ #~(begin ;; Only keeping bindings.
+ (for-each (lambda (file)
+ (unless (member file '("." ".." "bindings" "PKG-INFO"))
+ (delete-file-recursively file)))
+ (scandir "."))
+ (for-each (lambda (file)
+ (unless (member file '("." ".."))
+ (rename-file (string-append "bindings/python/" file) file)))
+ (scandir "bindings/python"))
+ (delete-file-recursively ".cargo")))))
+ (build-system cargo-build-system)
+ (arguments
+ (list
+ #:cargo-test-flags ''("--no-default-features")
+ #:imported-modules `(,@%cargo-build-system-modules
+ ,@%pyproject-build-system-modules)
+ #:modules '((guix build cargo-build-system)
+ ((guix build pyproject-build-system) #:prefix py:)
+ (guix build utils)
+ (ice-9 regex)
+ (ice-9 textual-ports))
+ #:phases
+ #~(modify-phases %standard-phases
+ (add-after 'unpack-rust-crates 'inject-tokenizers
+ (lambda _
+ (substitute* "Cargo.toml"
+ (("\\[dependencies\\]")
+ (format #f "
+[dev-dependencies]
+tempfile = ~s
+pyo3 = { version = ~s, features = [\"auto-initialize\"] }
+
+[dependencies]
+tokenizers = ~s"
+ #$(package-version rust-tempfile-3)
+ #$(package-version rust-pyo3-0.21)
+ #$(package-version rust-tokenizers))))
+ (let ((file-path "Cargo.toml"))
+ (call-with-input-file file-path
+ (lambda (port)
+ (let* ((content (get-string-all port))
+ (top-match (string-match
+ "\\[dependencies.tokenizers" content)))
+ (call-with-output-file file-path
+ (lambda (out)
+ (format out "~a" (match:prefix top-match))))))))))
+ (add-after 'patch-cargo-checksums 'loosen-requirements
+ (lambda _
+ (substitute* "Cargo.toml"
+ (("version = \"6.4\"")
+ (format #f "version = ~s"
+ #$(package-version rust-onig-6))))))
+ (add-after 'check 'python-check
+ (lambda _
+ (copy-file "target/release/libtokenizers.so"
+ "py_src/tokenizers/tokenizers.so")
+ (setenv "PYTHONPATH"
+ (string-append (getcwd) "/py_src:"
+ (or (getenv "GUIX_PYTHONPATH") "")))
+ (invoke "python3"
+ "-m" "pytest"
+ "-s" "-v" "./tests/")))
+ (add-after 'install 'install-python
+ (lambda _
+ (let* ((pversion #$(version-major+minor (package-version python)))
+ (lib (string-append #$output "/lib/python" pversion
+ "/site-packages/"))
+ (info (string-append lib "tokenizers-"
+ #$(package-version this-package)
+ ".dist-info")))
+ (mkdir-p info)
+ (copy-file "PKG-INFO" (string-append info "/METADATA"))
+ (copy-recursively
+ "py_src/tokenizers"
+ (string-append lib "tokenizers"))))))
+ #:cargo-inputs
+ `(("rust-rayon" ,rust-rayon-1)
+ ("rust-serde" ,rust-serde-1)
+ ("rust-serde-json" ,rust-serde-json-1)
+ ("rust-libc" ,rust-libc-0.2)
+ ("rust-env-logger" ,rust-env-logger-0.11)
+ ("rust-pyo3" ,rust-pyo3-0.21)
+ ("rust-numpy" ,rust-numpy-0.21)
+ ("rust-ndarray" ,rust-ndarray-0.15)
+ ("rust-onig" ,rust-onig-6)
+ ("rust-itertools" ,rust-itertools-0.12)
+ ("rust-tokenizers" ,rust-tokenizers))
+ #:cargo-development-inputs
+ `(("rust-tempfile" ,rust-tempfile-3))))
+ (native-inputs
+ (list python-minimal python-pytest))
+ (home-page "https://huggingface.co/docs/tokenizers")
+ (synopsis "Implementation of various popular tokenizers")
+ (description
+ "This package provides bindings to a Rust implementation of the most used
+tokenizers, @code{rust-tokenizers}.")
+ (license license:asl2.0)))
+
(define-public python-hmmlearn
(package
(name "python-hmmlearn")
--
2.45.2
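The `inject-tokenizers` phase in this patch first rewrites the `[dependencies]` header of the bindings' `Cargo.toml` so that it also declares `tokenizers` at the packaged version, and then truncates the file just before the first `[dependencies.tokenizers` header, presumably to drop the sdist's own declaration of that dependency. The Guile sketch below isolates only the truncation step on an in-memory string. The sample manifest is hypothetical, the helper name `drop-tokenizers-table` is illustrative, and unlike the phase (where `match:prefix` would fail on a `#f` match) it returns the input unchanged when the header is absent.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: return MANIFEST cut just before the first
;; "[dependencies.tokenizers" table header, or MANIFEST unchanged if
;; there is no such header.
(define (drop-tokenizers-table manifest)
  (let ((m (string-match "\\[dependencies\\.tokenizers" manifest)))
    (if m
        (match:prefix m)
        manifest)))

;; Hypothetical manifest, for illustration only.
(define sample
  (string-append
   "[dependencies]\n"
   "tokenizers = \"0.19.1\"\n"
   "rayon = \"1.10\"\n"
   "\n"
   "[dependencies.tokenizers]\n"
   "path = \"../../tokenizers\"\n"))

(display (drop-tokenizers-table sample))
;; prints only the [dependencies] table; the [dependencies.tokenizers]
;; table and everything after it are dropped.
```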
Added blocking bug(s) 73094
Request was from Nicolas Graves <ngraves <at> ngraves.fr> to control <at> debbugs.gnu.org. (Sat, 07 Sep 2024 17:08:02 GMT)
Added indication that bug 73106 blocks 73109
Request was from Nicolas Graves <ngraves <at> ngraves.fr> to control <at> debbugs.gnu.org. (Sat, 07 Sep 2024 17:09:01 GMT)
Reply sent to Ricardo Wurmus <rekado <at> elephly.net>:
You have taken responsibility. (Mon, 07 Apr 2025 14:55:02 GMT)
Notification sent to Nicolas Graves <ngraves <at> ngraves.fr>:
bug acknowledged by developer. (Mon, 07 Apr 2025 14:55:02 GMT)
Message #44 received at 73106-done <at> debbugs.gnu.org:
Rebased, adjusted, and pushed with commit
6483fdee51a79db09f4645b34f4ebb24f31816b3.
Thank you for your patience!
--
Ricardo
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 May 2025 11:24:07 GMT)