Extract text from a PDF

From Computer Science Wiki
Revision as of 14:17, 18 September 2018 by Mr. MacKenty
This is a problem set for you to work through [1]

This is a problem set. Some of these are easy, others are far more difficult. The purposes of these problem sets are:

  1. to build your skill applying computational thinking to a problem
  2. to assess your knowledge and skills of different programming practices


What is this problem set trying to do

  1. This is tricky.
  2. PDFs are a ubiquitous file format.
  3. They are famously difficult to get text from.
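One reason extraction is awkward: a PDF stores positioned characters rather than paragraphs, so extracted text often arrives with hard line breaks and words hyphenated across lines. The snippet below is a minimal sketch of tidying such text. The function name `tidy_extracted_text` and the sample string are made up for illustration, and real PDFs may need different rules.

```python
import re

def tidy_extracted_text(raw):
    """Clean up text that was extracted from a PDF (illustrative helper)."""
    # Rejoin words split by a hyphen at a line end: "extrac-\ntion" -> "extraction".
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse any run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", joined).strip()

# A hand-written sample standing in for real extracted text.
sample = "PDF text extrac-\ntion  often looks\nlike   this."
print(tidy_extracted_text(sample))  # PDF text extraction often looks like this.
```

Note that this also removes the hyphen from genuinely hyphenated words that happen to break at a line end, so treat it as a starting point.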


The Problem

Extract specific text from a PDF. Start here:

  1. From a terminal (inside Visual Studio Code or iTerm), run: pip3 install PyPDF2
  2. Find some silly PDF to use (um, one with text).
  3. Use this code to get started:
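import PyPDF2

# Open in binary mode; the with-block closes the file when done.
with open('IBCompSciGuide.pdf', 'rb') as pdf_file:
    # PdfReader replaced PdfFileReader, which PyPDF2 3.0 removed.
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # Print the text extracted from each page, in order.
    for page in pdf_reader.pages:
        print(page.extract_text())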
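To get from "all the text" to "specific text", one option is to search each page's text line by line. The sketch below is one possible approach, not the expected solution. The helper names `find_matches` and `read_page_texts` and the sample strings are made up for illustration, and `read_page_texts` assumes a recent PyPDF2 in which the reader class is `PdfReader`.

```python
import re

def find_matches(page_texts, pattern):
    """Return (page_number, line) pairs for lines matching the regex pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    matches = []
    for page_number, text in enumerate(page_texts, start=1):
        for line in text.splitlines():
            if compiled.search(line):
                matches.append((page_number, line.strip()))
    return matches

def read_page_texts(path):
    """Read every page of the PDF at path into a list of strings."""
    import PyPDF2  # imported here so find_matches works without PyPDF2 installed
    with open(path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # Fall back to "" in case a page yields no text (e.g. a scanned image).
        return [page.extract_text() or "" for page in reader.pages]

# Demo on hand-written strings standing in for extracted pages.
sample_pages = ["Aims\nAssessment objectives", "Internal assessment\nGlossary"]
print(find_matches(sample_pages, r"assessment"))
# [(1, 'Assessment objectives'), (2, 'Internal assessment')]
```

With a real file, the call would look like find_matches(read_page_texts('IBCompSciGuide.pdf'), r"assessment"). Keeping the search separate from the PDF reading means the search can be tested on plain strings.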


Unit Tests

  • User Input: Name: Bill
  • Expected output: Hello Bill
  • User Input: Name: TJ
  • Expected output: An administrator! Hello TJ
  • User Input: Name: 123
  • Expected output: Hello 123

Hacker edition

In the hacker version:

  • Your program should test for valid user input. It should only accept strings as user input.

THIS PART ISN'T DONE YET

How you will be assessed

Your solution will be graded along the following axes:


Scope

  • To what extent does your code implement the features required by our specification?
  • To what extent is there evidence of effort?

Correctness

  • To what extent did your code meet specifications?
  • To what extent did your code meet unit tests?
  • To what extent is your code free of bugs?

Design

  • To what extent is your code written well (i.e. clearly, efficiently, elegantly, and/or logically)?
  • To what extent is your code eliminating repetition?
  • To what extent is your code using functions appropriately?

Style

  • To what extent is your code readable?
  • To what extent is your code commented?
  • To what extent are your variables well named?
  • To what extent do you adhere to the style guide?

References

A possible solution

Look at one possible solution only AFTER you have tried and failed!

not yet!