Convert PDF to Excel in Python

Overview

This article explains how to convert PDF to Excel formats using Python. It covers the following topics.

Format: XLS

Format: XLSX

Format: Excel

Format: CSV

Format: ODS

PDF to EXCEL conversion via Python

Aspose.PDF for Python via .NET supports converting PDF files to Excel, CSV, and ODS formats.

Aspose.PDF for Python via .NET is a PDF manipulation component that can render a PDF file as an Excel workbook (XLSX file). During this conversion, each page of the PDF file is converted to a separate Excel worksheet.

The following code snippets show how to convert a PDF file to XLS or XLSX format with Aspose.PDF for Python via .NET.

Steps: Convert PDF to XLS in Python

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions and set its format to ExcelSaveOptions.ExcelFormat.XML_SPREAD_SHEET2003.
  3. Save the document in XLS format, using the .xls extension, by calling the save() method and passing it the ExcelSaveOptions instance.

    import aspose.pdf as ap

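    # DIR_INPUT and DIR_OUTPUT are assumed to be defined elsewhere
    # as directory strings that end with a path separator
    input_pdf = DIR_INPUT + "sample.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_xls.xls"
    # Open PDF document
    document = ap.Document(input_pdf)

    # Select the XML Spreadsheet 2003 format for the .xls output
    save_option = ap.ExcelSaveOptions()
    save_option.format = ap.ExcelSaveOptions.ExcelFormat.XML_SPREAD_SHEET2003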

    # Save the file into MS Excel format
    document.save(output_pdf, save_option)
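
The snippets in this article join `DIR_INPUT` and `DIR_OUTPUT` to file names with `+`, so both are expected to be directory strings that end with a path separator. The article does not show how they are defined. The following is only one possible setup: the helper name `make_dirs` and the folder names `input` and `output` are hypothetical, and the temporary base folder is used purely for illustration.

```python
import os
import tempfile


def make_dirs(base):
    """Create input/output folders under base and return them as strings.

    Hypothetical helper: each returned path ends with a separator so that
    DIR_INPUT + "sample.pdf" yields a valid file path.
    """
    dir_input = os.path.join(base, "input", "")
    dir_output = os.path.join(base, "output", "")
    os.makedirs(dir_input, exist_ok=True)
    os.makedirs(dir_output, exist_ok=True)
    return dir_input, dir_output


# A temporary folder is used here only for illustration;
# point this at your own project data instead.
DIR_INPUT, DIR_OUTPUT = make_dirs(tempfile.mkdtemp())

input_pdf = DIR_INPUT + "sample.pdf"
output_pdf = DIR_OUTPUT + "convert_pdf_to_xls.xls"
```

Joining with an empty last component is what adds the trailing separator, which keeps the string concatenation used throughout the article working on any platform.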

Steps: Convert PDF to XLSX in Python

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions.
  3. Save the document in XLSX format, using the .xlsx extension, by calling the save() method and passing it the ExcelSaveOptions instance.

    import aspose.pdf as ap

    input_pdf = DIR_INPUT + "sample.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_xlsx.xlsx"
    # Open PDF document
    document = ap.Document(input_pdf)

    # The format is left at its default for XLSX output
    save_option = ap.ExcelSaveOptions()

    # Save the file into MS Excel format
    document.save(output_pdf, save_option)

Convert PDF to XLS and Control the Blank First Column

When a PDF is converted to XLS format, a blank column is added to the output file as the first column. The insert_blank_column_at_first property of the ExcelSaveOptions class (InsertBlankColumnAtFirst in the .NET API) controls this column. Its default value is true.


    import aspose.pdf as ap

    input_pdf = DIR_INPUT + "sample.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_xlsx_with_control_column.xls"
    # Open PDF document
    document = ap.Document(input_pdf)

    save_option = ap.ExcelSaveOptions()
    save_option.format = ap.ExcelSaveOptions.ExcelFormat.XML_SPREAD_SHEET2003
    save_option.insert_blank_column_at_first = True

    # Save the file into MS Excel format
    document.save(output_pdf, save_option)
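
The snippet above sets the option explicitly; to leave the blank column out, the same property would presumably be set to `False`. When several options are set at once, a misspelled property name is easy to miss. The sketch below shows one way to guard against that. The helper `apply_options` and the stand-in class `FakeSaveOptions` are hypothetical and not part of Aspose.PDF; the sketch also assumes that the real `ExcelSaveOptions` object exposes its properties to `hasattr`.

```python
def apply_options(save_option, **settings):
    """Set each keyword on save_option, rejecting names the object lacks."""
    for name, value in settings.items():
        if not hasattr(save_option, name):
            raise AttributeError(f"unknown save option: {name}")
        setattr(save_option, name, value)
    return save_option


class FakeSaveOptions:
    """Stand-in used only for this demonstration; not part of Aspose.PDF."""
    insert_blank_column_at_first = True
    minimize_the_number_of_worksheets = False


options = apply_options(FakeSaveOptions(), insert_blank_column_at_first=False)
print(options.insert_blank_column_at_first)  # False
```

With the library installed, the call would look like `apply_options(ap.ExcelSaveOptions(), insert_blank_column_at_first=False)`.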

Convert PDF to Single Excel Worksheet

When a PDF file with many pages is exported to XLS, each page is exported to a separate sheet in the Excel file, because the minimize_the_number_of_worksheets property (MinimizeTheNumberOfWorksheets in the .NET API) is false by default. To export all pages to a single sheet in the output Excel file, set this property to True.

Steps: Convert PDF to XLS or XLSX Single Worksheet in Python

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions with minimize_the_number_of_worksheets = True.
  3. Save the document in XLS or XLSX format with a single worksheet by calling the save() method and passing it the ExcelSaveOptions instance.

    import aspose.pdf as ap

    input_pdf = DIR_INPUT + "many_pages.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_xlsx_single_excel_worksheet.xls"
    # Open PDF document
    document = ap.Document(input_pdf)

    save_option = ap.ExcelSaveOptions()
    save_option.format = ap.ExcelSaveOptions.ExcelFormat.XML_SPREAD_SHEET2003
    save_option.minimize_the_number_of_worksheets = True

    # Save the file into MS Excel format
    document.save(output_pdf, save_option)
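
To check how many worksheets a conversion produced, the output can be inspected without opening Excel. The sketch below applies only to .xlsx output, which is a ZIP archive; it does not apply to the XML Spreadsheet 2003 file saved in the snippet above. It assumes the workbook part sits at the usual location `xl/workbook.xml`, and the function name `count_worksheets` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML workbook parts in the Office Open XML format.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def count_worksheets(xlsx):
    """Return the number of <sheet> entries listed in xl/workbook.xml.

    xlsx may be a file path or a binary file-like object.
    Assumes the workbook part is stored at the usual location.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return len(root.findall("main:sheets/main:sheet", NS))


# Build a tiny in-memory archive as stand-in data for the demonstration.
workbook_xml = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets><sheet name="Sheet1" sheetId="1"/>'
    '<sheet name="Sheet2" sheetId="2"/></sheets>'
    '</workbook>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", workbook_xml)

print(count_worksheets(buffer))  # 2
```

For a real conversion result, pass the path of the saved .xlsx file instead of the in-memory buffer.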

Convert to other spreadsheet formats

Convert to CSV

Conversion to CSV format is performed in the same way as above. All you need to do is set the appropriate format.

Steps: Convert PDF to CSV in Python

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions with format = ExcelSaveOptions.ExcelFormat.CSV.
  3. Save the document in CSV format by calling the save() method and passing it the ExcelSaveOptions instance.

    import aspose.pdf as ap

    input_pdf = DIR_INPUT + "sample.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_csv.csv"
    # Open PDF document
    document = ap.Document(input_pdf)

    save_option = ap.ExcelSaveOptions()
    save_option.format = ap.ExcelSaveOptions.ExcelFormat.CSV

    # Save the file
    document.save(output_pdf, save_option)
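
Once the CSV file is written, it can be read back with the Python standard library to inspect the extracted rows. The sketch below is an illustration under stated assumptions: it assumes a comma delimiter and, for real files, UTF-8 encoding, neither of which is confirmed by this article. The function name `read_rows` and the sample data are hypothetical.

```python
import csv
import io


def read_rows(stream):
    """Return the non-empty rows of a CSV text stream as lists of strings."""
    reader = csv.reader(stream)  # assumes a comma delimiter
    return [row for row in reader if any(cell.strip() for cell in row)]


# Stand-in data for illustration; with a real file use
# open(output_pdf, newline="", encoding="utf-8") instead of StringIO.
sample = io.StringIO("Name,Qty\r\n,\r\nWidget,3\r\n")
rows = read_rows(sample)
print(rows)  # [['Name', 'Qty'], ['Widget', '3']]
```

Rows that contain only empty cells are skipped, which helps when the layout of the PDF page leaves blank lines in the output.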

Convert to ODS

Conversion to ODS format is performed in the same way as for the other formats.

Steps: Convert PDF to ODS in Python

  1. Create an instance of the Document object with the source PDF document.
  2. Create an instance of ExcelSaveOptions with format = ExcelSaveOptions.ExcelFormat.ODS.
  3. Save the document in ODS format by calling the save() method and passing it the ExcelSaveOptions instance.


    import aspose.pdf as ap
    
    input_pdf = DIR_INPUT + "sample.pdf"
    output_pdf = DIR_OUTPUT + "convert_pdf_to_ods.ods"
    # Open PDF document
    document = ap.Document(input_pdf)

    save_option = ap.ExcelSaveOptions()
    save_option.format = ap.ExcelSaveOptions.ExcelFormat.ODS

    # Save the file
    document.save(output_pdf, save_option)
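
Since all of the conversions above differ only in the format value and the file extension, they can be folded into one function. The sketch below is one possible arrangement, not an official API: `FORMAT_BY_EXTENSION`, `format_name_for`, and `convert_pdf` are hypothetical names. It uses only the `ExcelFormat` members shown in this article and, following the XLSX example, leaves the format at its default for .xlsx output.

```python
import os

# Extension -> ExcelFormat member name, using only names shown in this article.
# None means "leave ExcelSaveOptions at its default", as the XLSX example does.
FORMAT_BY_EXTENSION = {
    ".xls": "XML_SPREAD_SHEET2003",
    ".xlsx": None,
    ".csv": "CSV",
    ".ods": "ODS",
}


def format_name_for(output_path):
    """Return the ExcelFormat member name for output_path, or None for the default."""
    extension = os.path.splitext(output_path)[1].lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"unsupported spreadsheet extension: {extension!r}")
    return FORMAT_BY_EXTENSION[extension]


def convert_pdf(input_pdf, output_path):
    """Convert input_pdf to the spreadsheet format implied by output_path."""
    # Imported lazily so the mapping above is usable without the library.
    import aspose.pdf as ap

    format_name = format_name_for(output_path)
    document = ap.Document(input_pdf)
    save_option = ap.ExcelSaveOptions()
    if format_name is not None:
        save_option.format = getattr(ap.ExcelSaveOptions.ExcelFormat, format_name)
    document.save(output_path, save_option)


print(format_name_for("report.ods"))  # ODS
```

A call such as `convert_pdf(DIR_INPUT + "sample.pdf", DIR_OUTPUT + "sample.csv")` would then select the CSV format from the extension alone. The extension is validated before the PDF is opened, so an unsupported target fails early.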
