'Ripgrep-all' lets you search for strings in image files and databases just like 'grep'



The essential command for searching for strings on the Linux command line is 'grep'. However, grep's weakness is that it cannot search for text inside formats such as video files and PDF files. 'Ripgrep-all (rga)' is a command that overcomes this weakness and can search video file metadata, database records, and even text in image files.

GitHub - phiresky/ripgrep-all: rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

https://github.com/phiresky/ripgrep-all

rga can be used not only on Linux but also on Windows and macOS. This time I will try rga on Ubuntu 20.04. First, run the following command to install the packages it depends on.

[code] sudo apt install ripgrep pandoc poppler-utils ffmpeg [/code]



Download the rga binary from GitHub and place it in a directory listed in the shell's 'PATH' environment variable. The latest version of rga at the time of writing is 0.9.6.

[code] wget -O- 'https://github.com/phiresky/ripgrep-all/releases/download/v0.9.6/ripgrep_all-v0.9.6-x86_64-unknown-linux-musl.tar.gz' | tar zxvf -
sudo mv ripgrep_all-v0.9.6-x86_64-unknown-linux-musl/rga* /usr/local/bin [/code]



You can use it just like normal grep by running 'rga <search keyword> <search target>'. I compressed 'gigazine.txt', which contains the text 'gigazine GIGAZINE Gigazine', into ZIP and GZIP formats and searched them with rga ...



As shown, it recursively searches the contents of the compressed files.
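
The compressed-file test can be reproduced roughly as follows. This is only a minimal sketch: the temporary directory and the archive file names are assumptions, and the 'zip' and 'rga' steps are skipped if those commands are not installed.

```shell
# Throwaway working directory (an assumption, not from the article).
workdir=$(mktemp -d)

# The test file described in the article.
printf 'gigazine GIGAZINE Gigazine\n' > "$workdir/gigazine.txt"

# GZIP copy; -c writes to stdout, so the original file is kept.
gzip -c "$workdir/gigazine.txt" > "$workdir/gigazine.txt.gz"

# ZIP copy; -j stores the file without its directory path.
if command -v zip >/dev/null 2>&1; then
    zip -q -j "$workdir/gigazine.zip" "$workdir/gigazine.txt"
fi

# Search the whole directory, including the compressed copies.
if command -v rga >/dev/null 2>&1; then
    rga gigazine "$workdir" || echo "no match found"
fi
```

Pointing rga at the directory rather than at a single file lets one command cover the plain text file and both compressed copies.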



rga is used the same way as normal grep, but its behavior differs slightly. For example, grep with no options is case-sensitive, but rga is not. rga's backend, 'ripgrep', is case-sensitive by default just like grep, so this behavior is specific to rga.
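
The difference is easy to check with the same test text. The sketch below counts grep's matches with and without '-i', then runs the same lowercase pattern through rga if it is installed. The temporary file is an assumption for illustration.

```shell
# Temporary sample file with the three spellings used in the article.
sample=$(mktemp)
printf 'gigazine GIGAZINE Gigazine\n' > "$sample"

# grep is case-sensitive by default: only the lowercase word matches.
grep_default=$(grep -o 'gigazine' "$sample" | wc -l | tr -d ' ')

# With -i, grep ignores case and matches all three words.
grep_ignore=$(grep -o -i 'gigazine' "$sample" | wc -l | tr -d ' ')

echo "grep: $grep_default, grep -i: $grep_ignore"

# For comparison, the same lowercase pattern with rga (skipped if not installed).
if command -v rga >/dev/null 2>&1; then
    rga -o 'gigazine' "$sample" || true
fi
```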



To try searching video metadata, which normal grep does not support, I prepared a video file with the title 'GIGAZINE Momotaro'.



When I searched with rga, the search results were displayed properly.
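
If you do not have a suitable video at hand, a test file can be generated with ffmpeg, which is already installed as a dependency. This is a sketch under assumptions: the file name 'momotaro.mkv', the one-second black clip, and the Matroska container are all illustrative choices, not the file used in the article.

```shell
if command -v ffmpeg >/dev/null 2>&1 && command -v ffprobe >/dev/null 2>&1; then
    video_dir=$(mktemp -d)

    # One second of black video, tagged with the title from the article.
    ffmpeg -loglevel error -y -f lavfi -i color=c=black:s=320x240:d=1 \
        -metadata title='GIGAZINE Momotaro' "$video_dir/momotaro.mkv"

    # Read the title back from the container metadata.
    video_title=$(ffprobe -v error -show_entries format_tags=title \
        -of default=noprint_wrappers=1:nokey=1 "$video_dir/momotaro.mkv")
    echo "$video_title"

    # Search the metadata with rga (skipped if not installed).
    if command -v rga >/dev/null 2>&1; then
        rga Momotaro "$video_dir" || true
    fi
fi
```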



EPUB files used for e-books can also be searched without problems.



I was also able to find records in the SQLite3 database.
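
A small database for this kind of test can be created with the 'sqlite3' command-line tool. Note that this tool is an assumption: it is not among the packages installed earlier (on Ubuntu it comes from the 'sqlite3' package). The table name, column names, and file name below are hypothetical.

```shell
if command -v sqlite3 >/dev/null 2>&1; then
    db_dir=$(mktemp -d)

    # Hypothetical table holding a single record to search for.
    sqlite3 "$db_dir/sample.db" \
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
         INSERT INTO articles (title) VALUES ('GIGAZINE Momotaro');"

    # Confirm the record is stored.
    db_title=$(sqlite3 "$db_dir/sample.db" "SELECT title FROM articles;")
    echo "$db_title"

    # Search the database records with rga (skipped if not installed).
    if command -v rga >/dev/null 2>&1; then
        rga GIGAZINE "$db_dir" || true
    fi
fi
```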



The formats rga can search can be checked with the 'rga --rga-list-adapters' command. The formats enabled at the time of writing are as follows.

・ Video files (.mkv, .mp4, .avi)
・ E-books (.epub, .odt, .docx, .fb2, .ipynb)
・ PDF files
・ Compressed files (.zip, .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst)
・ Archive (tar)
・ SQLite3


OCR is disabled by default, but if you install the 'tesseract-ocr' package and use the '--rga-adapters=+pdfpages,tesseract' option, you can also search for strings in images such as JPEG and PNG. If you search the following PNG file with rga ...



It automatically recognizes the character string in the PNG file and displays the result.



Although it behaves a little differently from normal grep, rga proved to be a powerful command that works on a wide range of files.

in Review,   Software, Posted by darkhorse_log