Natural Resource Governance Institute

PDF Table Extractor

Topics: Open data
Stakeholders: Civil society actors

Use the application here.

Making extractives data as open and accessible as possible means finding existing data and putting it to use in analyses and visualizations. More often than not, this data is published in PDF reports.

PDFs are not an ideal format for publishing data. Tables in PDFs are difficult to convert into a machine-readable format for use in a spreadsheet application such as Microsoft Excel, and copying and pasting them usually fails to preserve their rows and columns.

For this reason, over the course of a large data collection project, NRGI data staff members developed an application that simplifies the process of extracting a table from a PDF. This tool is now available online.

The application builds on the open-source software Tabula, which does the heavy lifting of identifying tables in the PDF and extracting them into tabular format. Unlike Tabula, which must be downloaded and run locally, the application works entirely in the web browser, so nothing needs to be installed.

The application is designed around the common challenges of table scraping, such as the need to check extracted values against the source to ensure accuracy. Because the PDF is displayed in the application window alongside a fully editable spreadsheet of the extracted data, this vital step is far more convenient. Users can also scrape tables from multiple pages in a single click and then download them as a CSV file.
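The last step described above, stacking tables scraped from several pages and saving them as one CSV file, can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the application's own code: it assumes Tabula (or a similar extractor) has already returned each page's table as a list of rows of strings, and that every page repeats the same header row. The names `combine_page_tables` and `to_csv` and the sample rows are illustrative only.

```python
import csv
import io
from typing import List

Row = List[str]


def combine_page_tables(pages: List[List[Row]], repeated_header: bool = True) -> List[Row]:
    """Stack tables extracted from consecutive PDF pages into one table.

    Assumption (hypothetical): each page's table is a list of rows, and when
    repeated_header is True every page starts with the same header row.
    """
    combined: List[Row] = []
    for i, table in enumerate(pages):
        if not table:
            continue  # skip pages where no table was detected
        if repeated_header and combined:
            # Refuse to merge silently if a page's header does not match.
            if table[0] != combined[0]:
                raise ValueError(f"header on page {i + 1} differs from the first table")
            table = table[1:]  # drop the duplicate header row
        combined.extend(table)
    return combined


def to_csv(rows: List[Row]) -> str:
    """Serialize rows as CSV; cells containing commas are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rows, standing in for tables scraped from two PDF pages.
    pages = [
        [["Company", "Payment (USD)"], ["Company A", "1,200,000"]],
        [["Company", "Payment (USD)"], ["Company B, Ltd.", "850,000"]],
    ]
    print(to_csv(combine_page_tables(pages)))
```

Values such as `1,200,000` contain commas, so the `csv` module wraps them in quotes; writing through a CSV library is safer than joining cells with commas by hand. The header check raises an error instead of merging mismatched tables, in keeping with the emphasis on verifying accuracy.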

This application is built on open-source technology, and all of its code is available in the GitHub repo. Suggestions can be made there or by emailing [email protected].

This application was developed with the help of Publish What You Pay - Canada, Kate Vang at ONE, and numerous NRGI colleagues. It would not have been possible without the open-source contributions of the Tabula and rOpenSci teams.