GNU bug report logs - #77833: Xapian cache/search proof of concept
To reply to this bug, email your comments to 77833 AT debbugs.gnu.org.
Report forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Tue, 15 Apr 2025 21:16:03 GMT)
Acknowledgement sent to Noé Lopez <noe <at> xn--no-cja.eu>: New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Tue, 15 Apr 2025 21:16:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Arun, Ludo, and everyone,
I just stumbled upon Arun’s suggestion in #39258 of writing a guix xsearch
extension that searches with Xapian, so I gave it a try tonight; here’s the
proof of concept.
Search takes less than a second (mostly loading Guile modules; the actual
search is instantaneous) and building the cache takes ~20 seconds.
The whole thing was very easy to make following the guile-xapian
example, and it shows: it is only 71 lines of code.
So, what do you think? Is it an idea worth pursuing?
[guix-xsearch.scm (text/plain, inline)]
(define-module (guix-xsearch)
  #:use-module ((gnu packages) #:select (fold-packages
                                         find-packages-by-name))
  #:use-module ((guix build utils) #:select (package-name->name+version))
  #:use-module (guix packages)
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-11)
  #:use-module (xapian xapian)
  #:use-module (statprof))

(define %database-path "guix-xsearch.xapian")

(define (index)
  (call-with-writable-database
   %database-path
   (lambda (database)
     (fold-packages
      (lambda (package _)
        (let* ((name (package-name package))
               (version (package-version package))
               (description (package-description package))
               (synopsis (package-synopsis package))
               (name+version (string-append name "@" version))
               (idterm (string-append "Q" name+version))
               (document (make-document #:data name+version
                                        #:terms `((,idterm . 0))))
               (term-generator (make-term-generator #:stem (make-stem "en")
                                                    #:document document)))
          ;; Index title and description with a suitable
          ;; prefix. This is used to allow for searching separate
          ;; fields as in name:openttd, description:leather,
          ;; etc.
          ;; Disabled for performance.
          (index-text! term-generator name #:prefix "S")
          (index-text! term-generator synopsis #:prefix "B")
          (index-text! term-generator description #:prefix "XD")
          ;; Index title and description without prefixes for
          ;; general search.
          (index-text! term-generator name)
          (increase-termpos! term-generator)
          (index-text! term-generator synopsis)
          (index-text! term-generator description)
          ;; Add the document to the database. The unique idterm
          ;; ensures each object ends up in the database only once
          ;; no matter how many times we run the indexer.
          (replace-document! database idterm document)
          #nil))
      #nil))))

(define (search query-string)
  (call-with-database
   %database-path
   (lambda (database)
     (let* ((query (parse-query query-string
                                #:stemmer (make-stem "en")
                                #:prefixes '(("name" . "S")
                                             ("synopsis" . "B")
                                             ("description" . "XD"))))
            (mset (enquire-mset (enquire database query)
                                #:maximum-items 10)))
       (mset-fold
        (lambda (item _)
          (let* ((name+version (string-split (document-data (mset-item-document item))
                                             #\@))
                 ;; FIXME: Use a more precise way to restore the
                 ;; package, like the package cache does.
                 (packages (find-packages-by-name
                            (car name+version)
                            (cadr name+version))))
            (for-each
             (lambda (package)
               (format #t "~a: #~3,'0d ~a~%"
                       (mset-item-rank item)
                       (mset-item-docid item)
                       package))
             packages))
          #nil)
        #nil
        mset)))))

(match (command-line)
  ((_ "index")
   (index))
  ((_ "search" query)
   (search query))
  ((program . _)
   (format (current-error-port) "Usage: ~a index | search <query>~%" program)))
[signature.asc (application/pgp-signature, inline)]
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 16:34:10 GMT)
Message #8 received at 77833 <at> debbugs.gnu.org (full text, mbox):
Hi Noé,
On Tue, 15 Apr 2025 at 23:14, Noé Lopez via Guix-patches via <guix-patches <at> gnu.org> wrote:
> So, what do you think? Is it an idea worth pursuing?
Yes. See [1] for a description of what seems to me worth
pursuing. :-)
Well, I’ve started something in that direction… but it’s much faster and
easier to look for an email indexed with notmuch (Xapian) than to browse
my files on various machines. ;-)
I think the best is to open a repository for your extension; maybe on
Codeberg. Then, we could iterate on that. WDYT?
Cheers,
simon
1: Extension for improving Guix search?
Simon Tournier <zimon.toutoune <at> gmail.com>
Wed, 22 May 2024 12:15:09 +0200
id:87msoi40gi.fsf <at> gmail.com
https://lists.gnu.org/archive/html/guix-devel/2024-05
https://yhetil.org/guix/87msoi40gi.fsf <at> gmail.com
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 17:17:07 GMT)
Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi Noé,
That's fabulous! Thank you for taking this up. :-) Please consider
making this its own project with its own repo, and advertising on
guix-devel and elsewhere.
> (match (command-line)
> ((_ "index")
> (index))
> ((_ "search" query)
> (search query))
If there's some way to hide the indexing step from the user, that would
be good. For example, one way could be to build the index the first time
search is run. You'll have to figure out some way to make this a smooth
user experience.
Remember that the user may jump back and forth between multiple guix
profiles. So, multiple indices may be necessary.
Regards,
Arun
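One way to read the two suggestions above (build the index the first time search runs, and keep one index per profile) is sketched below. This is only a sketch under assumptions: the names `database-path` and `ensure-index!`, the `guix-xsearch/` cache layout, and the idea of keying on the store item name of the resolved profile are hypothetical and do not come from the proof of concept. `build-index!` stands for whatever indexing procedure ends up being used.

```scheme
;; Hypothetical helpers; none of these names come from guix-xsearch itself.

(define (database-path cache-directory profile)
  "Return the index location for PROFILE, the resolved store file name of a
Guix profile, under CACHE-DIRECTORY."
  (string-append cache-directory "/guix-xsearch/"
                 (basename profile) ".xapian"))

(define (ensure-index! path build-index!)
  "Call BUILD-INDEX! on PATH only when nothing exists there yet, then
return PATH."
  (unless (file-exists? path)
    (build-index! path))
  path)
```

A caller would presumably resolve a profile such as ~/.config/guix/current with `canonicalize-path` first, so that each profile generation maps to its own index and switching back to an older one reuses the index already built for it.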
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 20:04:01 GMT)
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
Noé Lopez <noe <at> noé.eu> writes:
> Search takes less than a second (mostly loading Guile modules; the actual
> search is instantaneous) and building the cache takes ~20 seconds.
>
> The whole thing was very easy to make following the guile-xapian
> example, and it shows: it is only 71 lines of code.
I forgot the outcome of discussions with Simon back when they worked on
it, but now I wonder: would it be reasonable to have it in ‘guix search’
proper, with indexing happening on first ‘guix search’ for a given
channel set?
I suppose 20s is on an SSD with a warm cache; would be nice to check on
spinning disks (on an SSD current ‘guix search’ is fast enough IMO).
One thing is that I’m not confident about the use of SWIG in
Guile-Xapian (I used SWIG back when it was fashionable and it didn’t do
a great job), I’d rather not have Guix depend on it.
Ludo’.
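Indexing "for a given channel set" needs some key that identifies that set. A minimal sketch follows, assuming each channel is represented as a (name . commit) pair; the procedure name `channel-set-key` and the key format are made up for illustration, and how the pairs would be obtained from Guix is left out.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper, not part of Guix or guix-xsearch.
(define (channel-set-key channels)
  "Return a string identifying CHANNELS, a list of (NAME . COMMIT) pairs with
NAME a symbol and COMMIT a string, independently of the order of the list."
  (string-join
   (map (match-lambda
          ((name . commit)
           (string-append (symbol->string name) "@" commit)))
        ;; Sort by channel name so the key does not depend on list order.
        (sort channels
              (lambda (a b)
                (string<? (symbol->string (car a))
                          (symbol->string (car b))))))
   "+"))
```

An index stored under such a key could be reused for as long as the channel set is unchanged; any pull that moves a channel to a new commit yields a new key and therefore a fresh index.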
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 20:29:02 GMT)
Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi Ludo,
> I forgot the outcome of discussions with Simon back when they worked on
> it, but now I wonder: would it be reasonable to have it in ‘guix search’
> proper, with indexing happening on first ‘guix search’ for a given
> channel set?
We might also consider building the xapian index along with the package
cache. But, that does come with the cost of slowing down profile builds
slightly.
> One thing is that I’m not confident about the use of SWIG in
> Guile-Xapian (I used SWIG back when it was fashionable and it didn’t do
> a great job), I’d rather not have Guix depend on it.
I'm not a big fan of SWIG either. But, the main reason guile-xapian uses
SWIG is because that's what upstream
https://github.com/xapian/xapian/tree/master/xapian-bindings provides us
with. I'm not saying it is the best situation, but it is what it is, I
guess.
Regards,
Arun
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 21:01:02 GMT)
Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):
Arun Isaac <arunisaac <at> systemreboot.net> writes:
>> I forgot the outcome of discussions with Simon back when they worked on
>> it, but now I wonder: would it be reasonable to have it in ‘guix search’
>> proper, with indexing happening on first ‘guix search’ for a given
>> channel set?
>
> We might also consider building the xapian index along with the package
> cache. But, that does come with the cost of slowing down profile builds
> slightly.
More than slightly I’m afraid, even if it’s “just” 20s. :-)
Ludo’.
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 21:39:03 GMT)
Message #23 received at 77833 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Simon Tournier <zimon.toutoune <at> gmail.com> writes:
> Hi Noé,
>
> On Tue, 15 Apr 2025 at 23:14, Noé Lopez via Guix-patches via <guix-patches <at> gnu.org> wrote:
>
>> So, what do you think? Is it an idea worth pursuing?
>
> Yes. See [1] for a description of what seems to me worth
> pursuing. :-)
>
> Well, I’ve started something in that direction… but it’s much faster and
> easier to look for an email indexed with notmuch (Xapian) than to browse
> my files on various machines. ;-)
>
> I think the best is to open a repository for your extension; maybe on
> Codeberg. Then, we could iterate on that. WDYT?
>
Will do, thanks!
> Cheers,
> simon
>
> 1: Extension for improving Guix search?
> Simon Tournier <zimon.toutoune <at> gmail.com>
> Wed, 22 May 2024 12:15:09 +0200
> id:87msoi40gi.fsf <at> gmail.com
> https://lists.gnu.org/archive/html/guix-devel/2024-05
> https://yhetil.org/guix/87msoi40gi.fsf <at> gmail.com
There seem to have been a lot of attempts at this; do you know why they
did not work? What is there to learn from these previous attempts?
Good day,
Noé
[signature.asc (application/pgp-signature, inline)]
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 16 Apr 2025 21:47:02 GMT)
Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Arun Isaac <arunisaac <at> systemreboot.net> writes:
> Hi Noé,
>
> That's fabulous! Thank you for taking this up. :-) Please consider
> making this its own project with its own repo, and advertising on
> guix-devel and elsewhere.
>
I’m amazed by the interest :) Will do when I have a working extension.
>> (match (command-line)
>> ((_ "index")
>> (index))
>> ((_ "search" query)
>> (search query))
>
> If there's some way to hide the indexing step from the user, that would
> be good. For example, one way could be to build the index the first time
> search is run. You'll have to figure out some way to make this a smooth
> user experience.
>
I could schedule an index in the background and run the normal search 🤔
> Remember that the user may jump back and forth between multiple guix
> profiles. So, multiple indices may be necessary.
Right, I’ll think about that.
>
> Regards,
> Arun
Good day,
Noé
[signature.asc (application/pgp-signature, inline)]
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Thu, 17 Apr 2025 11:12:03 GMT)
Message #29 received at 77833 <at> debbugs.gnu.org (full text, mbox):
Hi Noé,
On Wed, 16 Apr 2025 at 23:38, Noé Lopez via Guix-patches via <guix-patches <at> gnu.org> wrote:
> There seem to have been a lot of attempts at this; do you know why they
> did not work?
Why? On my side, procrastination coupled with other fish to fry. Or maybe
we were just waiting for you. :-)
Somehow, I wanted to clean up how to write Guix extensions first. Recent
discussions with Nicolas (Graves) seem a very good occasion to resume all that!
Thank you for pushing forward.
Cheers,
simon
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Thu, 17 Apr 2025 11:13:02 GMT)
Message #32 received at 77833 <at> debbugs.gnu.org (full text, mbox):
Hi,
On Wed, 16 Apr 2025 at 18:12, Arun Isaac <arunisaac <at> systemreboot.net> wrote:
> If there's some way to hide the indexing step from the user, that would
> be good. For example, one way could be to build the index the first time
> search is run. You'll have to figure out some way to make this a smooth
> user experience.
Well, the design isn’t obvious to me: that’s a typical cache
invalidation problem and, as we all know, there are two hard problems:
naming things and cache invalidation. ;-)
Somehow, the best would be to have the ability to hook “guix pull” (or
“guix time-machine”) with this indexing step. Currently, I do not think
it is doable, is it? But that could be helpful: trigger some actions
(hooks) registered by some Guix extensions. Well, it seems out of
scope at first. :-)
> Remember that the user may jump back and forth between multiple guix
> profiles. So, multiple indices may be necessary.
Yes. Today, I’m more interested in being able to search in all the
history than in having faster search. :-) I mean, I see faster search as
a collateral “damage” of searching in all the Guix history
revisions. ;-)
Somehow, what I started^W failed – because I’m a procrastinator ;-) –
long ago was to be able to substitute some Xapian database.
For example, the extension was named “guix chase”, IIRC, and I wanted to
have “guix chase pull” which updates some local Xapian database. Then,
I could run “guix chase search hello” and find various versions of
’hello’ with some associated Guix revisions. Bah, in the middle I
fell into a black hole. Whatever.
Today, searching across the Guix history is really annoying. Somehow, I
do:
$ git -C ~/src/guix/guix log --format="%h %s" | grep 'gnu: bowtie:'
a47a90b900 gnu: bowtie: Remove reference to %outputs.
f336cc4fe7 gnu: bowtie: Replace invalid characters.
e5a26a1f02 gnu: bowtie: Remove trailing #T.
2ec601580b gnu: bowtie: Use TBB 2020.
21c837405a gnu: bowtie: Update to 2.3.4.3.
06e372360e gnu: bowtie: Use 'modify-phases'.
d6e63cf31c gnu: bowtie: Update to 2.3.2.
2642231b39 gnu: bowtie: Update to 2.2.9.
0047d26a22 gnu: bowtie: Update to 2.2.6.
241e122193 gnu: bowtie: fix build errors
which is not super handy. Well, that was somehow one idea behind
Magali’s Outreachy internship implementing “guix git log”: make it
a bit more handy.
Anyway.
Thanks Noé for working on that. Let me know where the Git
repository is. :-)
Cheers,
simon
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Thu, 17 Apr 2025 11:14:04 GMT)
Message #35 received at 77833 <at> debbugs.gnu.org (full text, mbox):
Hi Ludo,
On Wed, 16 Apr 2025 at 22:01, Ludovic Courtès <ludo <at> gnu.org> wrote:
> I forgot the outcome of discussions with Simon back when they worked on
> it, but now I wonder: would it be reasonable to have it in ‘guix search’
> proper, with indexing happening on first ‘guix search’ for a given
> channel set?
Adding Xapian as a dependency of Guix would be unwise, IMHO.
I would prefer we take the other direction: having a light core Guix (at
least as light as possible :-)) and adding some extensions around it,
instead of adding more and more to what comes by default with “guix
pull”.
Cheers,
simon
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Fri, 18 Apr 2025 22:09:02 GMT)
Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi everyone!
I have finished cooking, so here is Guile Xsearch 0.1!
It is available at https://codeberg.org/Baleine/guix-xsearch. After
doing a local clone, you can use it like this:
- make env
- guix xsearch --index
- guix xsearch <search>
Please try it out and tell me what you think! Comments on the code are
also welcome :)
Good day,
Noé
[signature.asc (application/pgp-signature, inline)]
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Sat, 19 Apr 2025 09:33:03 GMT)
Message #41 received at 77833 <at> debbugs.gnu.org (full text, mbox):
Noé Lopez <noe <at> noé.eu> writes:
> I have finished cooking, so here is Guile Xsearch 0.1!
Wo0t!
> It is available at https://codeberg.org/Baleine/guix-xsearch. After
> doing a local clone, you can use it like this:
If you make it a channel, people should be able to add it to their
channel list and then ‘guix xsearch’ will work out of the box (see
commit 60c41183d9c47fb25270fe810d03c0785406faad).
Ludo’.
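If the repository were turned into a channel, adding it might look like the following ~/.config/guix/channels.scm. This is a sketch: the channel name `guix-xsearch` is an assumption, and a `branch` field may be needed depending on the repository's default branch.

```scheme
;; ~/.config/guix/channels.scm
;; Assumes the repository has been set up as a channel; the channel
;; name below is chosen for illustration only.
(cons (channel
        (name 'guix-xsearch)
        (url "https://codeberg.org/Baleine/guix-xsearch"))
      %default-channels)
```

After the next “guix pull”, the extension would then presumably be picked up alongside the default channels.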
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 23 Apr 2025 16:04:04 GMT)
Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi Noé,
> - make env
> - guix xsearch --index
> - guix xsearch <search>
>
> Please try it out and tell me what you think! Comments on the code are
> also welcome :)
Lovely!
I think your comment at
https://codeberg.org/Baleine/guix-xsearch/src/commit/1a246259d9241a3ca27adf807169187d941810d6/src/guix-xsearch/search.scm#L35
is a good idea. It will make the search even more blazing fast.
Regards,
Arun
Information forwarded to guix-patches <at> gnu.org: bug#77833; Package guix-patches. (Wed, 23 Apr 2025 16:04:07 GMT)
Message #47 received at 77833 <at> debbugs.gnu.org (full text, mbox):
> If you make it a channel, people should be able to add it to their
> channel list and then ‘guix xsearch’ will work out of the box (see
> commit 60c41183d9c47fb25270fe810d03c0785406faad).
Good idea. +1 to that.
This bug report was last modified today.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.