admin

Processing PDF documents with Python
2018/06/29


This post mainly uses PyPDF2, a pure-Python open-source library. Install it as follows:

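pip install "PyPDF2<3.0"

Pinning the version matters: the camelCase API used throughout this post (PdfFileReader, getNumPages, addPage and so on) was deprecated and then removed in PyPDF2 3.0, and development has since moved to the pypdf package, which uses snake_case names such as PdfReader, PdfWriter and len(reader.pages).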

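PyPDF2 mainly provides four classes: PdfFileWriter, PdfFileReader, PdfFileMerger and PageObject. PdfFileReader opens and reads an existing PDF, PdfFileWriter assembles and writes a new one, PdfFileMerger concatenates whole documents, and PageObject represents a single page that can be rotated or merged with another page.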

Without further ado, here is the demo source, with comments inline:

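#!/usr/bin/env python
# -*- coding: utf-8 -*-

# File Name: PyPDF2_1.py
# Author: Wang Junjie
# Created Time: 2018-06-29
# Uses the legacy camelCase API, which requires PyPDF2 < 3.0

import PyPDF2

# unlock.pdf must have at least 202 pages, because pages 201 and 202
# (indices 200 and 201) are used below. For the demo, the same file also
# supplies the watermark page; point `watermark` at any other PDF to stamp
# its first page instead. Both files stay open until output.write(),
# because the copied pages are still read from them at that point.
reader = PyPDF2.PdfFileReader(open("unlock.pdf", "rb"))
watermark = PyPDF2.PdfFileReader(open("unlock.pdf", "rb"))

# get the number of pages in the PDF
page_num = reader.getNumPages()
print(page_num)
# get the document information dictionary (title, author, producer, ...)
pdf_info = reader.getDocumentInfo()
print(pdf_info)

output = PyPDF2.PdfFileWriter()
# protect the output file with the user password '123456'
output.encrypt('123456')

# add every 10th page of the first 100 pages (indices 0, 10, ..., 90)
for i in range(0, 100, 10):
    output.addPage(reader.getPage(i))

# rotate page 201 (index 200) clockwise by 180 degrees
page = reader.getPage(200)
page.rotateClockwise(180)
output.addPage(page)

# overlay (stamp) the first page of `watermark` onto page 202 (index 201)
page = reader.getPage(201)
page.mergePage(watermark.getPage(0))
output.addPage(page)

# write the result to output.pdf; `with` closes the file afterwards
with open("output.pdf", "wb") as outputStream:
    output.write(outputStream)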

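Running the script prints the page count and the document information dictionary of unlock.pdf, then writes the selected, rotated and stamped pages to the password-protected output.pdf.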

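Note that getNumPages and similar methods run into trouble when the PDF requires a password to open: PyPDF2 raises a PdfReadError because the file has not been decrypted. If you know the password, call reader.decrypt(passwd) before reading; if you don't, remove the protection with an online PDF editor and open the decrypted file again.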

 

Last modification: March 13th, 2019 at 07:06 pm