A compression problem

From Computer Science Wiki
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.
This a problem set for you to work through [1]

This is a problem set. Some of these are easy, others are far more difficult. The purpose of these problems sets are:

  1. to build your skill applying computational thinking to a problem
  2. to assess your knowledge and skills of different programming practices


What is this problem set trying to do[edit]

Please review what a pythonic set is.

The Problem[edit]

I wrote a crawler that visits web pages[2], stores a few keywords in a database, and follows links to other web pages. I noticed that my crawler was wasting a lot of time visiting the same pages over and over, so I made a set, visited, where I'm storing URLs I've already visited. Now the crawler only visits a URL if it hasn't already been visited.

Thing is, the crawler is running on my old desktop computer in my parents' basement (where I totally don't live anymore), and it keeps running out of memory because visited is getting so huge.

How can I trim down the amount of space taken up by visited?

sites_ive_indexed = ('http://www.google.com','https://www.apple.com','https://mackenty.org','https://www.linode.com','https://allegro.pl','https://computersciencewiki.org')

How you will be assessed[edit]

Your solution will be graded using the following axis:


Scope

  • To what extent does your code implement the features required by our specification?
  • To what extent is there evidence of effort?

Correctness

  • To what extent did your code meet specifications?
  • To what extent did your code meet unit tests?
  • To what extent is your code free of bugs?

Design

  • To what extent is your code written well (i.e. clearly, efficiently, elegantly, and/or logically)?
  • To what extent is your code eliminating repetition?
  • To what extent is your code using functions appropriately?

Style

  • To what extent is your code readable?
  • To what extent is your code commented?
  • To what extent are your variables well named?
  • To what extent do you adhere to style guide?

References[edit]

A possible solution[edit]

Click the expand link to see one possible solution, but NOT before you have tried and failed!

not yet!