Friday, June 30, 2017

Count number of pages for each PDF in a folder

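This is just a note about a script that may be useful to you. It walks a folder (including its subfolders), uses pdftk to read the number of pages in each PDF, prints one "path,pages" line per file, and finally prints the total page count. It needs pdftk installed and on your PATH, and it takes the folder as its only argument.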

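import os
import subprocess
import sys

# Collect every PDF under the folder given as the first argument (recursively).
# The extension check is case-insensitive, so ".PDF" files are included too.
matches = []
for root, _, filenames in os.walk(sys.argv[1]):
    for filename in filenames:
        if filename.lower().endswith(".pdf"):
            matches.append(os.path.join(root, filename))

count = 0
for mat in matches:
    # Pass the arguments as a list (no shell), so paths with spaces or quotes
    # work, and read pdftk's output directly instead of via a temp file.
    try:
        out = subprocess.check_output(["pdftk", mat, "dump_data"],
                                      universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        continue  # pdftk missing, or it could not read this file: skip it
    for line in out.splitlines():
        if line.startswith("NumberOfPages:"):
            pages = int(line.split(":", 1)[1].strip())
            print(mat + "," + str(pages))
            count += pages
            break

print(count)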
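
If you want to check the parsing logic without running pdftk on real files, the step that pulls the page count out of the pdftk dump_data output can be moved into a small function. This is only a sketch: the names parse_page_count and total_pages are my own, and it assumes pdftk reports the count as a line of the form "NumberOfPages: N", which is the same line the script looks for.

```python
from typing import Iterable, Optional, Tuple


def parse_page_count(dump_data: str) -> Optional[int]:
    """Return the page count from `pdftk FILE dump_data` output, or None if absent."""
    for line in dump_data.splitlines():
        key, sep, value = line.partition(":")
        # Match on the key itself, so a bookmark title that happens to
        # contain "NumberOfPages" is not mistaken for the real field.
        if sep and key.strip() == "NumberOfPages":
            try:
                return int(value.strip())
            except ValueError:
                return None  # malformed value: treat it like a missing field
    return None


def total_pages(results: Iterable[Tuple[str, str]]) -> int:
    """Sum page counts over (path, dump_data) pairs, printing 'path,pages' for each."""
    count = 0
    for path, dump_data in results:
        pages = parse_page_count(dump_data)
        if pages is None:
            continue  # like the script: skip files without a page count
        print(path + "," + str(pages))
        count += pages
    return count


if __name__ == "__main__":
    # Hypothetical sample output, trimmed to a few lines.
    sample = "InfoBegin\nInfoKey: Title\nInfoValue: Report\nNumberOfPages: 12\n"
    print(parse_page_count(sample))  # prints 12
    print(total_pages([("a.pdf", sample), ("b.pdf", "NumberOfPages: 3\n"), ("bad.pdf", "")]))
    # prints a.pdf,12 then b.pdf,3 then 15
```

The script itself is run with the folder as its argument, for example: python count_pages.py ~/papers (count_pages.py is just whatever name you saved it under).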


Have a great weekend! :)
