Version 1.27.0, 2018-08-07
--------------------------

- NOTE: Active maintenance shifts to PyPDF4

- No functional changes: just migration to PyPDF4


Version 1.26.0, 2016-05-18
--------------------------

- NOTE: Active maintenance on PyPDF2 is resuming after a hiatus

- Fixed a bug where image resources were incorrectly
  overwritten when merging pages

- Added dictionary for JavaScript actions to the root (louib)

- Added unit tests for the JS functionality (louib)

- Add more Python 3 compatibility when reading inline images (im2703
  and VyacheslavHashov)

- Return NullObject instead of raising error when failing to resolve
  object (ctate)

- Don't output warning for non-zeroed xref table when strict=False
  (BenRussert)

- Remove extraneous zeroes from output formatting (speedplane)

- Fix bug where reading an inline image would cut off prematurely
  in certain cases (speedplane)


Patch 1.25.1, 2015-07-20
------------------------

- Fix bug when parsing inline images. Occurred when merging
  certain pages with inline images

- Fixed type error when creating outlines by utilizing the
  isString() test


Version 1.25, 2015-07-07
------------------------

BUGFIXES:

- Added Python 3 algorithm for ASCII85Decode. Fixes issue when
  reading reportlab-generated files with Py 3 (jerickbixly)

- Recognize more escape sequences which would otherwise throw an
  exception (manuelzs, robertsoakes)

- Fixed overflow error in generic.py. Occurred
  when reading a too-large int in Python 2 (by Raja Jamwal)

- Allow access to files which were encrypted with an empty
  password. Previously threw a "File has not been decrypted"
  exception (Elena Williams)

- Do not attempt to decode an empty data stream. Previously
  would cause an error in decode algorithms (vladir)

- Fixed some type issues specific to Py 2 or Py 3

- Fix issue when stream data begins with whitespace (soloma83)

- Recognize abbreviated filter names (AlmightyOatmeal and
  Matthew Weiss)

- Copy decryption key from PdfFileReader to PdfFileMerger.
  Allows usage of PdfFileMerger with encrypted files (twolfson)

- Fixed bug which occurred when a NameObject is present at end
  of a file stream. Threw a "Stream has ended unexpectedly"
  exception (speedplane)

FEATURES:

- Initial work on a test suite; to be expanded in future.
  Tests and Resources directory added, README updated (robertsoakes)

- Added document cloning methods to PdfFileWriter:
  appendPagesFromReader, cloneReaderDocumentRoot, and
  cloneDocumentFromReader. See official documentation (robertsoakes)

- Added method for writing to form fields: updatePageFormFieldValues.
  This will be enhanced in the future. See official documentation
  (robertsoakes)

- New addAttachment method. See documentation. Support for adding
  and extracting embedded files to be enhanced in the future
  (moshekaplan)

- Added methods to get page number of given PageObject or
  Destination: getPageNumber and getDestinationPageNumber.
  See documentation (mozbugbox)

OTHER ENHANCEMENTS:

- Enhanced type handling (Brent Amrhein)

- Enhanced exception handling in NameObject (sbywater)

- Enhanced extractText method output (peircej)

- Better exception handling

- Enhanced regex usage in NameObject class (speedplane)


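The abbreviated filter names mentioned under BUGFIXES for 1.25 are the
short forms that the PDF specification permits for inline images. A minimal
sketch of how such names might be expanded follows. The helper name
expand_filter_name is hypothetical and this is not PyPDF2's own code.

```python
# Abbreviations the PDF specification permits for inline-image filters.
FILTER_ABBREVIATIONS = {
    "/AHx": "/ASCIIHexDecode",
    "/A85": "/ASCII85Decode",
    "/LZW": "/LZWDecode",
    "/Fl": "/FlateDecode",
    "/RL": "/RunLengthDecode",
    "/CCF": "/CCITTFaxDecode",
    "/DCT": "/DCTDecode",
}


def expand_filter_name(name):
    """Hypothetical helper: return the full filter name.

    Names that are not abbreviations pass through unchanged.
    """
    return FILTER_ABBREVIATIONS.get(name, name)
```

Normalizing names once, before dispatching to a decoder, keeps the decoders
themselves unaware of the short forms.

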
Version 1.24, 2014-12-31
------------------------

- Bugfixes for reading files in Python 3 (by Anthony Tuininga and
  pqqp)

- Appropriate errors are now raised instead of infinite loops (by
  naure and Cyrus Vafadari)

- Bugfix for parsing number tokens with leading spaces (by Maxim
  Kamenkov)

- Don't crash on bad /Outlines reference (by eshellman)

- Conform tabs/spaces and blank lines to PEP 8 standards

- Utilize the readUntilRegex method when reading Number Objects
  (by Brendan Jurd)

- More bugfixes for Python 3 and clearer exception handling

- Fixed encoding issue in merger (with eshellman)

- Created separate folder for scripts


Version 1.23, 2014-08-11
------------------------

- Documentation now available at http://pythonhosted.org//PyPDF2

- Bugfix in pagerange.py for when __init__.__doc__ has no value (by
  Vladir Cruz)

- Fix typos in OutlinesObject().add() (by shilluc)

- Re-added a missing return statement in a utils.py method

- Corrected viewing mode names (by Jason Scheirer)

- New PdfFileWriter method: addJS() (by vfigueiro)

- New bookmark features: color, boldness, italics, and page fit
  (by Joshua Arnott)

- New PdfFileReader method: getFields(). Used to extract field
  information from PDFs with interactive forms. See documentation
  for details

- Converted README file to markdown format (by Stephen Bussard)

- Several improvements to overall performance and efficiency
  (by mozbugbox)

- Fixed a bug where geospatial information was not scaling along with
  its page

- Fixed a type issue and a Python 3 issue in the decryption algorithms
  (with Francisco Vieira and koba-ninkigumi)

- Fixed a bug causing an infinite loop in the ASCII 85 decoding
  algorithm (by madmaardigan)

- Annotations (links, comment windows, etc.) are now preserved when
  pages are merged together

- Used the Destination class in addLink() and addBookmark() so that
  the page fit option could be properly customized


Version 1.22, 2014-05-29
------------------------

- Added .DS_Store to .gitignore (for Mac users) (by Steve Witham)

- Removed __init__() implementation in NameObject (by Steve Witham)

- Fixed bug (inf. loop) when merging pages in Python 3 (by commx)

- Corrected error when calculating height in scaleTo()

- Removed unnecessary code from DictionaryObject (by Georges Dubus)

- Fixed bug where an exception was thrown upon reading a NULL string
  (by speedplane)

- Allow string literals (non-unicode strings in Python 2) to be passed
  to PdfFileReader

- Allow ConvertFunctionsToVirtualList to be indexed with slices and
  longs (in Python 2) (by Matt Gilson)

- Major improvements and bugfixes to addLink() method (see documentation
  in source code) (by Henry Keiter)

- General code clean-up and improvements (with Steve Witham and Henry Keiter)

- Fixed bug that caused crash when comments are present at end of
  dictionary


Version 1.21, 2014-04-21
------------------------

- Fix for when /Type isn't present in the Pages dictionary (by Rob1080)

- More tolerance for extra whitespace in Indirect Objects

- Improved Exception handling

- Fixed error in getHeight() method (by Simon Kaempflein)

- Implement use of utils.string_type to resolve Py2-3 compatibility issues

- Prevent exception for multiple definitions in a dictionary (with carlosfunk)
  (only when strict = False)

- Fixed errors when parsing a slice using pdfcat on command line (by
  Steve Witham)

- Tolerance for EOF markers within 1024 bytes of the actual end of the
  file (with David Wolever)

- Added overwriteWarnings parameter to PdfFileReader constructor. If False,
  PyPDF2 will NOT overwrite methods from Python's warnings.py module with
  a custom implementation.

- Fix NumberObject and NameObject constructors for compatibility with PyPy
  (Rüdiger Jungbeck, Xavier Dupré, shezadkhan137, Steven Witham)

- Utilize utils.Str in pdf.py and pagerange.py to resolve type issues (by
  egbutter)

- Improvements in implementing StringIO for Python 2 and BytesIO for
  Python 3 (by Xavier Dupré)

- Added \x00 to Whitespaces, defined utils.WHITESPACES to clarify code (by
  Maxim Kamenkov)

- Bugfix for merging 3 or more resources with the same name (by lucky-user)

- Improvements to Xref parsing algorithm (by speedplane)


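The EOF tolerance described for 1.21 amounts to searching a window at the
end of the file instead of requiring %%EOF to be the very last bytes. A
minimal sketch of that idea, assuming a seekable binary stream. The function
name find_eof_marker and its window parameter are illustrative and are not
taken from PyPDF2.

```python
import io


def find_eof_marker(stream, window=1024):
    """Return the absolute offset of the last %%EOF found within the final
    `window` bytes of a seekable binary stream, or -1 if there is none."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    start = max(0, size - window)
    stream.seek(start)
    tail = stream.read(size - start)
    pos = tail.rfind(b"%%EOF")
    return -1 if pos == -1 else start + pos
```

A reader built this way can still parse files that carry trailing padding or
junk after the marker, while rejecting files where the marker is missing
from the tail altogether.

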
Version 1.20, 2014-01-27
------------------------

- Official Python 3+ support (with contributions from TWAC and cgammans).
  Support for Python versions 2.6 and 2.7 will be maintained

- Command line concatenation (see pdfcat in sample code) (by Steve Witham)

- New FAQ; link included in README

- Allow more (although unnecessary) escape sequences

- Prevent exception when reading a null object in decoding parameters

- Corrected error in reading destination types (added a slash since they
  are name objects)

- Corrected TypeError in scaleTo() method

- addBookmark() method in PdfFileMerger now returns bookmark (so nested
  bookmarks can be created)

- Additions to Sample Code and Sample PDFs

- Changes to allow 2up script to work (see sample code) (by Dylan McNamee)

- Changes to metadata encoding (by Chris Hiestand)

- New methods for links: addLink() (by Enrico Lambertini) and removeLinks()

- Bugfix to handle nested bookmarks correctly (by Jamie Lentin)

- New methods removeImages() and removeText() available for PdfFileWriter
  (by Tien Haï)

- Exception handling for illegal characters in Name Objects


Version 1.19, 2013-10-08
------------------------

BUGFIXES:

- Removed pop in sweepIndirectReferences to prevent infinite loop
  (provided by ian-su-sirca)

- Fixed bug caused by whitespace when parsing PDFs generated by AutoCad

- Fixed a bug caused by reading a 'null' ASCII value in a dictionary
  object (primarily in PDFs generated by AutoCad).

FEATURES:

- Added new folders for PyPDF2 sample code and example PDFs; see README
  for each folder

- Added a method for debugging purposes to show current location while
  parsing

- Ability to create custom metadata (by jamma313)

- Ability to access and customize document layout and view mode
  (by Joshua Arnott)

OTHER:

- Added and corrected some documentation

- Added some more warnings and exception messages

- Removed old test/debugging code

UPCOMING:

- More bugfixes (we have received many problematic PDFs via email and
  will work through them)

- Documentation - It's time for PyPDF2 to get its own documentation
  since it has grown much since the original pyPdf

- A FAQ to answer common questions


Version 1.18, 2013-08-19
------------------------

- Fixed a bug where older versions of objects were incorrectly added to the
  cache, resulting in outdated or missing pages, images, and other objects
  (from speedplane)

- Fixed a bug in parsing the xref table where new xref values were
  overwritten; also cleaned up code (from speedplane)

- New method mergeRotatedAroundPointPage which merges a page while rotating
  it around a point (from speedplane)

- Updated Destination syntax to respect PDF 1.6 specifications (from
  jamma313)

- Prevented infinite loop when a PdfFileReader object was instantiated
  with an empty file (from Jerome Nexedi)

Other Changes:

- Downloads now available via PyPI
  https://pypi.python.org/pypi?:action=display&name=PyPDF2

- Installation through pip library is fixed


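Rotating a page around a point, as mergeRotatedAroundPointPage in 1.18 does,
comes down to one PDF transformation matrix that combines a translation to
the origin, a rotation, and a translation back. The following is a sketch of
that arithmetic only, assuming a counter-clockwise angle in degrees. The
function names are hypothetical and the code is not taken from PyPDF2.

```python
import math


def rotation_about_point(degrees, px, py):
    """Return the PDF matrix [a, b, c, d, e, f] for a counter-clockwise
    rotation by `degrees` about the point (px, py)."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # e and f are chosen so that the pivot (px, py) maps onto itself.
    e = px - px * cos_t + py * sin_t
    f = py - px * sin_t - py * cos_t
    return [cos_t, sin_t, -sin_t, cos_t, e, f]


def apply_matrix(matrix, x, y):
    """Map (x, y) through the matrix: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)
```

The pivot itself is left unchanged by the resulting matrix, which is a quick
way to sanity-check the e and f terms.

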
Version 1.17, 2013-07-25
------------------------

- Removed one (from pdf.py) of the two Destination classes. Both
  classes had the same name, but were slightly different in content,
  causing some errors. (from Janne Vanhala)

- Corrected and Expanded README file to demonstrate PdfFileMerger

- Added filter for LZW encoded streams (from Michal Horejsek)

- PyPDF2 issue tracker enabled on GitHub to allow community
  discussion and collaboration


Versions -1.16, -2013-06-30
---------------------------

- Note: This ChangeLog has not been kept up-to-date for a while.
  Hopefully we can keep better track of it from now on. Some of the
  changes listed here come from previous versions 1.14 and 1.15; they
  were only vaguely defined. With the new _version.py file we should
  have more structured and better documented versioning from now on.

- Defined PyPDF2.__version__

- Fixed encrypt() method (from Martijn The)

- Improved error handling on PDFs with truncated streams (from cecilkorik)

- Python 3 support (from kushal-kumaran)

- Fixed example code in README (from Jeremy Bethmont)

- Fixed a bug caused by DecimalError Exception (from Adam Morris)

- Many other bug fixes and features by:

    jeansch
    Anton Vlasenko
    Joseph Walton
    Jan Oliver Oelerich
    Fabian Henze
    And any others I missed.
    Thanks for contributing!


Version 1.13, 2010-12-04
------------------------

- Fixed a typo in code for reading a "\b" escape character in strings.

- Improved __repr__ in FloatObject.

- Fixed a bug in reading octal escape sequences in strings.

- Added getWidth and getHeight methods to the RectangleObject class.

- Fixed compatibility warnings with Python 2.4 and 2.5.

- Added addBlankPage and insertBlankPage methods on PdfFileWriter class.

- Fixed a bug with circular references in page's object trees (typically
  annotations) that prevented correctly writing out a copy of those pages.

- New merge page functions allow application of a transformation matrix.

- To all patch contributors: I did a poor job of keeping this ChangeLog
  up-to-date for this release, so I am missing attributions here for any
  changes you submitted. Sorry! I'll do better in the future.


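The "\b" and octal escape fixes in 1.13 both concern PDF literal strings,
where a backslash introduces either a single-character escape or an octal
code of one to three digits. A minimal sketch of such a decoder follows,
operating on the bytes between the enclosing parentheses. The function name
unescape_pdf_string is hypothetical and this is not the library's
implementation.

```python
SIMPLE_ESCAPES = {
    ord("n"): 0x0A,
    ord("r"): 0x0D,
    ord("t"): 0x09,
    ord("b"): 0x08,
    ord("f"): 0x0C,
    ord("("): ord("("),
    ord(")"): ord(")"),
    ord("\\"): ord("\\"),
}
OCTAL_DIGITS = b"01234567"


def unescape_pdf_string(raw):
    """Decode backslash escapes in the body of a PDF literal string."""
    out = bytearray()
    i = 0
    while i < len(raw):
        byte = raw[i]
        i += 1
        if byte != ord("\\") or i >= len(raw):
            out.append(byte)
            continue
        nxt = raw[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt in OCTAL_DIGITS:
            # One to three octal digits; high-order overflow is discarded.
            digits = bytearray()
            while i < len(raw) and len(digits) < 3 and raw[i] in OCTAL_DIGITS:
                digits.append(raw[i])
                i += 1
            out.append(int(digits.decode("ascii"), 8) & 0xFF)
        elif nxt in b"\r\n":
            # Backslash before a line break: the break is not part of the string.
            i += 1
            if nxt == 0x0D and i < len(raw) and raw[i] == 0x0A:
                i += 1
        else:
            # Unknown escape: drop the backslash and keep the character.
            out.append(nxt)
            i += 1
    return bytes(out)
```

Capping the octal run at three digits matters: in "\0053" the escape is
"\005" and the trailing "3" is an ordinary character.

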
Version 1.12, 2008-09-02
------------------------

- Added support for XMP metadata.

- Fix reading files with xref streams with multiple /Index values.

- Fix extracting content streams that use graphics operators longer than 2
  characters. Affects merging PDF files.


Version 1.11, 2008-05-09
------------------------

- Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
  or FloatObject values.

- PDF compatibility fixes.

- Fix to read object xref stream in correct order.

- Fix for comments inside content streams.


Version 1.10, 2007-10-04
------------------------

- Text strings from PDF files are returned as Unicode string objects when
  pyPdf determines that they can be decoded (as UTF-16 strings, or as
  PDFDocEncoding strings). Unicode objects are also written out when
  necessary. This means that string objects in pyPdf can be either
  generic.ByteStringObject instances, or generic.TextStringObject instances.

- The extractText method now returns a unicode string object.

- All document information properties now return unicode string objects. In
  the event that a document provides docinfo properties that are not decoded by
  pyPdf, the raw byte strings can be accessed with an "_raw" property (i.e.,
  title_raw rather than title)

- generic.DictionaryObject instances have been enhanced to be easier to use.
  Values coming out of dictionary objects will automatically be de-referenced
  (.getObject will be called on them), unless accessed by the new "raw_get"
  method. DictionaryObjects can now only contain PdfObject instances (as keys
  and values), making it easier to debug where non-PdfObject values (which
  cannot be written out) are entering dictionaries.

- Support for reading named destinations and outlines in PDF files. Original
  patch by Ashish Kulkarni.

- Stream compatibility reading enhancements for malformed PDF files.

- Cross reference table reading enhancements for malformed PDF files.

- Encryption documentation.

- Replace some "assert" statements with error raising.

- Minor optimizations to FlateDecode algorithm increase speed when using PNG
  predictors.


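The DictionaryObject behavior described for 1.10 (automatic de-referencing
on access, with raw_get as the escape hatch) can be illustrated with a small
stand-alone sketch. The classes below are hypothetical stand-ins written for
this illustration. Only the raw_get name comes from the entry above.

```python
class IndirectReference:
    """Hypothetical stand-in for a PDF indirect reference."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        return self._target


class DerefDict(dict):
    """Dictionary whose item access resolves indirect references."""

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, IndirectReference):
            return value.get_object()
        return value

    def raw_get(self, key):
        """Return the stored value without dereferencing it."""
        return dict.__getitem__(self, key)
```

With this shape, d["/Title"] yields the referenced object while
d.raw_get("/Title") yields the reference itself, which is what a writer
needs when it has to preserve the indirection.

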
Version 1.9, 2006-12-15
-----------------------

- Fix several serious bugs introduced in version 1.8, caused by a failure to
  run through our PDF test suite before releasing that version.

- Fix bug in NullObject reading and writing.


Version 1.8, 2006-12-14
-----------------------

- Add support for decryption with the standard PDF security handler. This
  allows for decrypting PDF files given the proper user or owner password.

- Add support for encryption with the standard PDF security handler.

- Add new pythondoc documentation.

- Fix bug in ASCII85 decode that occurs when whitespace exists inside the
  two terminating characters of the stream.


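The ASCII85 fix in 1.8 deals with whitespace falling between the "~" and
">" that terminate the data. One way to sidestep the problem is to strip
whitespace before decoding. The following is a sketch using Python 3's
standard base64 module rather than the library's own decoder. The function
name decode_ascii85 is illustrative.

```python
import base64

PDF_WHITESPACE = b" \t\n\r\x0b\x0c\x00"


def decode_ascii85(data):
    """Decode ASCII85 data terminated by '~>', ignoring whitespace anywhere,
    including between the two terminating characters."""
    cleaned = bytes(b for b in data if b not in PDF_WHITESPACE)
    if not cleaned.startswith(b"<~"):
        # PDF streams usually omit the leading '<~'; add it for a85decode.
        cleaned = b"<~" + cleaned
    return base64.a85decode(cleaned, adobe=True)
```

Because "~" is not a valid ASCII85 digit, prepending "<~" when it is absent
cannot be confused with encoded content.

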
Version 1.7, 2006-12-10
-----------------------

- Fix a bug when using a single page object in two PdfFileWriter objects.

- Adjust PyPDF to be tolerant of whitespace characters that don't belong
  in a stream object.

- Add documentInfo property to PdfFileReader.

- Add numPages property to PdfFileReader.

- Add pages property to PdfFileReader.

- Add extractText function to PdfFileReader.


Version 1.6, 2006-06-06
-----------------------

- Add basic support for comments in PDF files. This allows us to read some
  ReportLab PDFs that could not be read before.

- Add "auto-repair" for finding xref table at slightly bad locations.

- New StreamObject backend, cleaner and more powerful. Allows the use of
  stream filters more easily, including compressed streams.

- Add a graphics state push/pop around page merges. Improves quality of
  page merges when one page's content stream leaves the graphics state
  in an abnormal condition.

- Add PageObject.compressContentStreams function, which filters all content
  streams and compresses them. This will reduce the size of PDF pages,
  especially after they have been decompressed in a mergePage
  operation.

- Support inline images in PDF content streams.

- Add support for using .NET framework compression when zlib is not
  available. This does not make pyPdf compatible with IronPython, but it
  is a first step.

- Add support for reading the document information dictionary, and extracting
  title, author, subject, producer and creator tags.

- Add patch to support NullObject and multiple xref streams, from Bradley
  Lawrence.


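The graphics state push/pop added in 1.6 corresponds to the PDF operators
q (save graphics state) and Q (restore graphics state). A minimal sketch of
wrapping content streams this way before concatenating them follows. The
function names are hypothetical and the byte-level handling is simplified
compared with a real merge.

```python
def isolate_graphics_state(content):
    """Wrap a content stream in q ... Q so that graphics-state changes made
    inside it do not leak into whatever is drawn next."""
    return b"q\n" + content + b"\nQ\n"


def merge_content(base, overlay):
    """Concatenate two content streams, each inside its own q/Q pair."""
    return isolate_graphics_state(base) + isolate_graphics_state(overlay)
```

If the first stream changes the stroke color or the transformation matrix
and never resets it, the closing Q discards that change before the second
stream starts drawing.

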
Version 1.5, 2006-01-28
-----------------------

- Fix a bug where merging pages did not work in "no-rename" cases when the
  second page has an array of content streams.

- Remove some debugging output that should not have been present.


Version 1.4, 2006-01-27
-----------------------

- Add capability to merge pages from multiple PDF files into a single page
  using the PageObject.mergePage function. See example code (README or web
  site) for more information.

- Add ability to modify a page's MediaBox, CropBox, BleedBox, TrimBox, and
  ArtBox properties through PageObject. See example code (README or web site)
  for more information.

- Refactor pdf.py into multiple files: generic.py (contains objects like
  NameObject, DictionaryObject), filters.py (contains filter code),
  utils.py (various). This does not affect importing PdfFileReader
  or PdfFileWriter.

- Add new decoding functions for standard PDF filters ASCIIHexDecode and
  ASCII85Decode.

- Change url and download_url to refer to new pybrary.net web site.


Version 1.3, 2006-01-23
-----------------------

- Fix new bug introduced in 1.2 where PDF files with \r line endings did not
  work properly anymore. A new test suite developed with various PDF files
  should prevent regression bugs from now on.

- Fix a bug where inheriting attributes from page nodes did not work.


Version 1.2, 2006-01-23
-----------------------

- Improved support for files with CRLF-based line endings, fixing a common
  reported problem stating "assertion error: assert line == "%%EOF"".

- Software author/maintainer is now officially a proud married person, which
  is sure to result in better software... somehow.


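The CRLF problem fixed in 1.2, and the \r regression it introduced, both
come from assuming one particular line ending when reading the trailer. A
sketch of a line-ending-agnostic way to pick out the final line follows,
relying on the fact that bytes.splitlines() treats LF, CR and CRLF alike.
The function name last_nonblank_line is illustrative only.

```python
def last_nonblank_line(tail):
    """Return the last non-empty line of `tail`, whichever of LF, CR or
    CRLF the file uses as its line ending."""
    lines = [line.strip() for line in tail.splitlines()]
    lines = [line for line in lines if line]
    return lines[-1] if lines else b""
```

Comparing the result against b"%%EOF" then works the same way for files
written on any platform.

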
Version 1.1, 2006-01-18
-----------------------

- Add capability to rotate pages.

- Improved PDF reading support to properly manage inherited attributes from
  /Type=/Pages nodes. This means that page groups that are rotated or have
  different media boxes or whatever will now work properly.

- Added PDF 1.5 support. Namely cross-reference streams and object streams.
  This release can mangle Adobe's PDFReference16.pdf successfully.


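Inherited attributes, as handled from 1.1 onward, mean that a page without
its own /Rotate or /MediaBox takes the value from the nearest ancestor
/Pages node that defines one. A minimal sketch of that lookup follows, using
plain dictionaries in place of PDF objects. The function names are
hypothetical, and the list of keys reflects the attributes the PDF
specification marks as inheritable.

```python
INHERITABLE = ("/Resources", "/MediaBox", "/CropBox", "/Rotate")


def inherited(node, key):
    """Look up `key` on a page node, falling back to its /Parent chain."""
    seen = set()
    while node is not None and id(node) not in seen:
        seen.add(id(node))  # guard against circular /Parent links
        if key in node:
            return node[key]
        node = node.get("/Parent")
    return None


def effective_attributes(page):
    """Resolve every inheritable attribute for one page."""
    return {key: inherited(page, key) for key in INHERITABLE}
```

The nearest definition wins, so a rotation set on an intermediate group
overrides one set at the root of the page tree.

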
Version 1.0, 2006-01-17
-----------------------

- First distutils-capable true public release. Supports a wide variety of PDF
  files that I found sitting around on my system.

- Does not support some PDF 1.5 features, such as object streams,
  cross-reference streams.