While running participants, we use multi-page logs (e.g. arrival time, researchers present, start time of each run of each task). After a session is over, we scan these and save them on the server for easy reference later. However, with the most recent round of computer upgrades, the scanner software has lost some functionality. Specifically, saving multiple images to a single PDF no longer works.
Opening multiple image files is a pain, so I wanted a quick way to combine the images into a single PDF. My solution was to use the img2pdf package and some Python 3 code.
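The core of img2pdf is its convert() function, which takes a list of image paths and returns the bytes of a PDF with one page per image. As a minimal sketch (the file names here are just placeholders):

import img2pdf

# Combine two placeholder jpegs into a single two-page PDF.
# img2pdf.convert() returns the PDF as bytes, so we just write them out.
with open('combined.pdf', 'wb') as out:
    out.write(img2pdf.convert(['page_1.jpeg', 'page_2.jpeg']))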
The script I came up with does 4 things:
- Read in all files from the directory
- Parse out the subject ID from the file names
- Check if a merged PDF already exists
- If it doesn't, merge the images into a PDF
It's not perfect (e.g. it's hard-coded to match the naming convention we've used), but it is quick to use and saves time, so it passes my test of a successful script.
import img2pdf
from os.path import isfile, join
from os import listdir
# Scan the search directory for jpeg files
search_dir = 'z:/projects/research/Political Moralization/EEG Notes/'
j_files = [f for f in listdir(search_dir) if isfile(join(search_dir, f))]
# Keep only jpegs
j_files = [f for f in j_files if f.endswith('.jpeg')]
# exclude second page " 1" files
j_files = [f for f in j_files if " " not in f]
j_files.sort()
# Iterate over files:
# Extract file name, check if PDF exists, if not, create it
# ASSERT: First five characters define subject ID
for f in j_files:
    sub_name = f[:5]
    pdf_file = join(search_dir, sub_name + '.pdf')
    if isfile(pdf_file):
        print('%s already exists.' % pdf_file)
    else:
        img1 = join(search_dir, sub_name + '.jpeg')
        img2 = join(search_dir, sub_name + ' 1.jpeg')
        with open(pdf_file, 'wb') as out:
            out.write(img2pdf.convert([img1, img2]))
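If a subject ever ends up with more than two pages of notes, the same approach generalizes: glob for every jpeg that starts with the subject ID and hand the whole list to img2pdf. A rough sketch of that variant, still assuming the first five characters of each file name are the subject ID and the same scanner naming convention:

import img2pdf
from glob import glob
from os import listdir
from os.path import isfile, join

search_dir = 'z:/projects/research/Political Moralization/EEG Notes/'

# Collect unique subject IDs from the first five characters of each jpeg name
subjects = {f[:5] for f in listdir(search_dir) if f.endswith('.jpeg')}

for sub_name in sorted(subjects):
    pdf_file = join(search_dir, sub_name + '.pdf')
    if isfile(pdf_file):
        continue
    # Grab every page for this subject, however many there are.
    # The scanner's naming ("12345.jpeg", "12345 1.jpeg", ...) means a plain
    # alphabetical sort would put the numbered pages first, so sort by name
    # length first to keep the unnumbered first page at the front.
    pages = sorted(glob(join(search_dir, sub_name + '*.jpeg')),
                   key=lambda p: (len(p), p))
    with open(pdf_file, 'wb') as out:
        out.write(img2pdf.convert(pages))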