The output will be a CSV containing info about every character, line, and rectangle in the PDF. The json format returns more information; it includes PDF-level and page-level metadata, plus dictionary-nested attributes. Pages are given as a space-delimited, 1-indexed list of pages or hyphenated page ranges. For the types option, choices are char, rect, line, curve, image, annot, et cetera; it defaults to all available. Another option takes a JSON-formatted string.

Invalid metadata values are treated as a warning by default. If that is not intended, pass strict_metadata=True to the open method, and pdfplumber.open will raise an exception if it is unable to parse the metadata.

The top-level pdfplumber.PDF class represents a single PDF and has two main properties. The first, metadata, is a dictionary of metadata key/value pairs, drawn from the PDF's Info trailers. It typically includes "CreationDate," "ModDate," "Producer," et cetera. The second, pages, is a list containing one pdfplumber.Page instance per page loaded. The class also has a method for releasing memory. By default, Page objects cache their layout and object information to avoid having to reprocess it. When parsing large PDFs, however, these cached properties can require a lot of memory. You can use this method to flush the cache and release the memory.

The pdfplumber.Page class is at the core of pdfplumber. Most things you'll do with pdfplumber will revolve around this class. Its properties include the sequential page number, starting with 1 for the first page, 2 for the second, and so on. It also has object properties such as images. Each of these properties is a list, and each list contains one dictionary for each such object embedded on the page.

crop(bounding_box, relative=False, strict=True) returns a version of the page cropped to the bounding box, which should be expressed as a 4-tuple with the values (x0, top, x1, bottom). Cropped pages retain objects that fall at least partly within the bounding box. If an object falls only partly within the box, its dimensions are sliced to fit the bounding box. If relative=True, the bounding box is calculated as an offset from the top-left of the page's bounding box, rather than as an absolute position.
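The cropping behavior described above can be sketched in plain Python. This illustrates the geometry only and is not pdfplumber's own implementation. The helper names to_absolute and clip_object are hypothetical, and the object dictionaries with x0, top, x1, bottom, width, and height keys are an assumption made for the example.

```python
from typing import Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom)


def to_absolute(bbox: BBox, page_bbox: BBox) -> BBox:
    """Hypothetical helper: treat bbox as an offset from the page's top-left corner."""
    x0, top, x1, bottom = bbox
    page_x0, page_top = page_bbox[0], page_bbox[1]
    return (x0 + page_x0, top + page_top, x1 + page_x0, bottom + page_top)


def clip_object(obj: Dict[str, float], bbox: BBox) -> Optional[Dict[str, float]]:
    """Hypothetical helper: slice an object to bbox, or return None if it lies outside."""
    x0 = max(obj["x0"], bbox[0])
    top = max(obj["top"], bbox[1])
    x1 = min(obj["x1"], bbox[2])
    bottom = min(obj["bottom"], bbox[3])
    # In this sketch, an object that only touches the box edge still counts as overlapping.
    if x0 > x1 or top > bottom:
        return None
    return {**obj, "x0": x0, "top": top, "x1": x1, "bottom": bottom,
            "width": x1 - x0, "height": bottom - top}


# A page whose own bounding box does not start at (0, 0).
page_bbox = (10, 20, 110, 220)

# A relative box covering the top-left 50 x 50 region of the page.
crop_box = to_absolute((0, 0, 50, 50), page_bbox)

# A rectangle that falls only partly within the crop box.
rect = {"x0": 40, "top": 60, "x1": 90, "bottom": 100, "width": 50, "height": 40}
clipped = clip_object(rect, crop_box)

# A rectangle entirely outside the crop box.
outside = clip_object(
    {"x0": 200, "top": 200, "x1": 210, "bottom": 210, "width": 10, "height": 10},
    crop_box,
)
```

Here the relative box (0, 0, 50, 50) is shifted by the page's top-left corner before any clipping happens, so the same tuple selects the same visual region regardless of where the page's bounding box starts.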