A compression problem

From Computer Science Wiki
This a problem set for you to work through [1]

This is a problem set. Some of these are easy, others are far more difficult. The purpose of these problems sets are:

  1. to build your skill applying computational thinking to a problem
  2. to assess your knowledge and skills of different programming practices

What is this problem set trying to do[edit]

Please review what a pythonic set is.

The Problem[edit]

I wrote a crawler that visits web pages[2], stores a few keywords in a database, and follows links to other web pages. I noticed that my crawler was wasting a lot of time visiting the same pages over and over, so I made a set, visited, where I'm storing URLs I've already visited. Now the crawler only visits a URL if it hasn't already been visited.

Thing is, the crawler is running on my old desktop computer in my parents' basement (where I totally don't live anymore), and it keeps running out of memory because visited is getting so huge.

How can I trim down the amount of space taken up by visited?

sites_ive_indexed = ('http://www.google.com','https://www.apple.com','https://mackenty.org','https://www.linode.com','https://allegro.pl','https://computersciencewiki.org')

How you will be assessed[edit]

Your solution will be graded using the following axis:


  • To what extent does your code implement the features required by our specification?
  • To what extent is there evidence of effort?


  • To what extent did your code meet specifications?
  • To what extent did your code meet unit tests?
  • To what extent is your code free of bugs?


  • To what extent is your code written well (i.e. clearly, efficiently, elegantly, and/or logically)?
  • To what extent is your code eliminating repetition?
  • To what extent is your code using functions appropriately?


  • To what extent is your code readable?
  • To what extent is your code commented?
  • To what extent are your variables well named?
  • To what extent do you adhere to style guide?


A possible solution[edit]

Click the expand link to see one possible solution, but NOT before you have tried and failed!

not yet!