
We had an old CanoScan 5600F scanner lying around and wanted it to scan to a file share at the press of a button.

With a Raspberry Pi connected to the scanner via USB, scanbd, and a small script, we were able to achieve this.

  graph LR
  A[Press Scan button] --> B[Scan to Image]
  B --> C[Convert Image to PDF]
  C --> D[Run OCR]
  D --> E[Output Document]

Required Packages

We need to install scanbd (Scanner Button Daemon) to react to button presses. Additionally, we need the SANE packages to detect the scanner.

sudo apt install sane sane-utils scanbd

Configuration

Copy the sane configuration to the scanbd configuration.

cp -r /etc/sane.d/* /etc/scanbd/sane.d/

Modify /etc/sane.d/dll.conf so that net is the only backend left uncommented.

# genesys
net
# canon
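
A minimal sketch of how this edit could be scripted, assuming a dll.conf with one backend name per line. The function name only_net_backend is hypothetical and not part of SANE or scanbd. It reads the file on stdin and prints the modified version, so nothing is changed in place.

```shell
# Hypothetical helper: comment out every backend except "net".
# Reads a dll.conf on stdin and writes the result to stdout.
only_net_backend() {
  awk '
    /^[[:space:]]*$/ { print; next }          # keep blank lines
    /^[[:space:]]*#/ { print; next }          # keep existing comments
    $1 == "net"      { print; seen = 1; next } # keep an enabled net line
                     { print "# " $0 }        # disable any other backend
    END { if (!seen) print "net" }            # make sure net is enabled
  '
}
```

For example, only_net_backend < /etc/sane.d/dll.conf prints the result for review before the original file is replaced.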

Test if the scanner is detected

root@scanner:/opt/insaned# SANE_CONFIG_DIR=/etc/scanbd scanimage -L
device 'genesys:libusb:001:004' is a Canon CanoScan 5600F flatbed scanner
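
If the device name is needed in a script, it could be pulled out of the scanimage -L output roughly like this. first_device is a hypothetical helper, and the pattern assumes the "device ... is a ..." line format shown above.

```shell
# Hypothetical helper: print the name of the first device found in
# "scanimage -L" style output read from stdin.
first_device() {
  # Accept either a backtick or a straight quote before the device name.
  sed -n "s/^device [\`']\([^']*\)' is .*/\1/p" | head -n 1
}
```

With the output above, SANE_CONFIG_DIR=/etc/scanbd scanimage -L | first_device would print genesys:libusb:001:004.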

Start & enable the scanbd service

sudo systemctl enable --now scanbd

Edit the button configuration

/etc/scanbd/scanbd.conf

The scan action runs the script. Both the script path and the script's contents can be adjusted.

action scan {
        filter = "^scan.*"
        numerical-trigger {
               from-value = 1
               to-value   = 0
               }
        desc   = "Scan to file"
        script = "/usr/local/bin/scan-to-share"
       }

At the bottom of the file, include the device configuration:

# devices
# each device can have actions and functions, you can disable not relevant devices
include(scanner.d/canon.conf)

Debugging

systemctl stop scanbd
SANE_CONFIG_DIR=/etc/scanbd scanbd -f

More verbose:

systemctl stop scanbd
SANE_CONFIG_DIR=/etc/scanbd scanbd -f -d7

Scan script

Create the script at /usr/local/bin/scan-to-share, the path referenced by the scan action, and make it executable.

#!/usr/bin/env bash
set -xeo pipefail

log_file="/var/scans/scan.log"
echo "Starting script" | tee -a "$log_file"

# Set the image scanning parameters
resolution=300
file_ending=jpg
format=jpeg
mode=color

file_data=$(date +'%Y_%m_%d-%H_%M_%S')
filename="$file_data.$file_ending"
temp_path="/tmp/$filename"
dest_path="/var/scans/scanned/$file_data.pdf"

echo "Destination path \"$dest_path\"" | tee -a "$log_file"
echo "Starting scan with resolution $resolution, format $format & mode $mode" | tee -a "$log_file"

export SANE_CONFIG_DIR=/etc/scanbd
scanimage --format "$format" --resolution="$resolution" --mode "$mode" -v -p > "$temp_path"
img2pdf "$temp_path" -o "$dest_path"
rm "$temp_path"
chmod 777 "$dest_path"
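
Because the OCR job picks up every *.pdf in /var/scans/scanned, it may be worth making sure a document only appears there once it is complete. One possible approach is to write the file under a name that does not end in .pdf and rename it afterwards. publish_file is a hypothetical helper and not part of the original script.

```shell
# Hypothetical helper: copy a finished file into place under a temporary
# name, then rename it to its final name.
publish_file() {
  # $1: finished source file, $2: final destination path
  cp -- "$1" "$2.part" &&   # ".part" does not match the *.pdf glob
  mv -- "$2.part" "$2"      # rename happens inside the destination directory
}
```

In the scan script this could look like letting img2pdf write to a path under /tmp first and then calling publish_file with that path and "$dest_path".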

OCR Script

We want to separate the scanning script from the OCR script because scanbd waits for the scan script to finish before it can run the next one. While the script is running, the scanner is blocked.

Create a file at /usr/local/bin/scan-ocr.

#!/usr/bin/env bash
set -xeo pipefail

log_file="/var/scans/ocr.log"
local_scans_dir="/var/scans/scanned"
local_ocr_dir="/var/scans/ocr"
tesseract_language="deu"

if [ ! -d "$local_scans_dir" ]; then
    echo "Error: Local scans directory $local_scans_dir does not exist."
    exit 1
fi

if [ ! -d "$local_ocr_dir" ]; then
    echo "Error: Local OCR directory $local_ocr_dir does not exist."
    exit 1
fi

ls -la "$local_scans_dir"

# Without nullglob, an empty directory would leave the literal pattern in $file
shopt -s nullglob
for file in "$local_scans_dir"/*.pdf; do
  name=$(basename "$file")
  new_path="$local_ocr_dir/$name"
  if ! [ -f "$new_path" ]; then
    echo "Starting OCR on $file to $new_path" | tee -a "$log_file"
    ocrmypdf -l "$tesseract_language" --force-ocr "$file" "$new_path" && rm "$file"
  fi
done
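
To see which documents the loop above would process without running OCR, a dry-run helper along these lines could be used. This is only a sketch. pending_scans is a hypothetical name, and it takes the two directories as arguments instead of reading the variables from the script.

```shell
# Hypothetical dry-run helper: print every PDF in the scans directory that
# has no file of the same name in the OCR directory yet.
# The function body runs in a subshell, so its variables stay local.
pending_scans() (
  scans_dir="$1"
  ocr_dir="$2"
  for file in "$scans_dir"/*.pdf; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$file" ] || continue
    [ -f "$ocr_dir/$(basename "$file")" ] || printf '%s\n' "$file"
  done
)
```

For the directories used here, the call would be pending_scans /var/scans/scanned /var/scans/ocr.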

systemd Service and Timer

To run the OCR script periodically, we can use a systemd timer.

Service

Create a new service file at /etc/systemd/system/scan-ocr.service

[Unit]
Description=OCR for Scans

[Service]
Type=simple
ExecStart=/usr/local/bin/scan-ocr

Timer / Cron

Create a new timer file at /etc/systemd/system/scan-ocr.timer:

[Unit]
Description=Runs the OCR every minute

[Timer]
OnBootSec=1min
OnUnitActiveSec=1min

[Install]
WantedBy=timers.target

Enable & Start the timer

sudo systemctl daemon-reload
sudo systemctl enable --now scan-ocr.timer

Verify

sudo systemctl status scan-ocr.timer
sudo systemctl status scan-ocr.service

To run the OCR service manually, use sudo systemctl start scan-ocr.service. A running job can be stopped with sudo systemctl stop scan-ocr.service.

Access the logs with journalctl -u scan-ocr