Bei Zhou Project -- IS1320, Spring 2003

Project #7 Building a web crawler

Last Updated May 24, 2003

Summary

"Using existing toolkits in Java, build a web crawler that downloads documents and saves ones you've specified as relevant. Learn about, explain and respect robot exclusion statements on sites. The focus for this project will be on downloading images. May use Java2D to deal with the images, once downloaded. The goal is to do a simple emulation of the Google image search system."
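Respecting robot exclusion statements could look roughly like the following minimal sketch. It assumes the crawler has already fetched a site's /robots.txt into a string. The class name RobotsRules and the agent name "ImageBot" are hypothetical and are not taken from the project's WebCrawler.java. The sketch handles only User-agent and Disallow lines with plain prefix matching, as in the original robots exclusion convention. It ignores Allow lines, wildcards and Crawl-delay. It also simply merges the rules of a group naming this crawler with those of the "*" group, instead of letting the more specific group take precedence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a minimal robots.txt checker (sketch only, not the project's code).
final class RobotsRules {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    private RobotsRules() {}

    /** Collects Disallow prefixes from groups naming agentName or "*". */
    static RobotsRules parse(String robotsTxt, String agentName) {
        RobotsRules rules = new RobotsRules();
        boolean groupApplies = false;   // does the current group cover our agent?
        boolean inAgentLines = false;   // are we still reading a run of User-agent lines?
        for (String rawLine : robotsTxt.split("\r?\n")) {
            // Drop comments ("# ...") and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;               // blank or malformed line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equalsIgnoreCase("User-agent")) {
                boolean matches = value.equals("*") || value.equalsIgnoreCase(agentName);
                // Consecutive User-agent lines share one set of rules.
                groupApplies = inAgentLines ? (groupApplies || matches) : matches;
                inAgentLines = true;
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equalsIgnoreCase("Disallow") && groupApplies && !value.isEmpty()) {
                    rules.disallowedPrefixes.add(value);
                }
            }
        }
        return rules;
    }

    /** True unless the URL path starts with one of the collected Disallow prefixes. */
    boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

Before requesting a URL such as http://host/path, the crawler would parse that host's robots.txt once, cache the result, and skip the URL whenever isAllowed(path) returns false.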
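Because the focus is on downloading images, the crawler has to pull image URLs out of every page it fetches. One minimal sketch of that step, assuming each page arrives as an HTML string together with its own URL, uses a regular expression over img tags and java.net.URI.resolve to turn relative src values into absolute URLs. The class name ImageLinkExtractor is hypothetical. A regular expression is a deliberate simplification. A sturdier crawler would more likely use a real HTML parser, such as the callback parser in javax.swing.text.html (HTMLEditorKit.ParserCallback), which ships with the JDK.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds <img src="..."> targets in a page (regex sketch, not a full HTML parser).
final class ImageLinkExtractor {
    // <img ...whitespace src = "..." or '...'>; group 1 is the quote, group 2 the URL text.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private ImageLinkExtractor() {}

    /** Returns absolute image URLs, resolving relative src values against the page's own URL. */
    static List<URI> extract(String html, URI pageUrl) {
        List<URI> images = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(2).trim();
            try {
                images.add(pageUrl.resolve(src));
            } catch (IllegalArgumentException e) {
                // Skip src values that are not legal URI syntax (e.g. unescaped spaces).
            }
        }
        return images;
    }
}
```

Each returned URI can then be checked against the robots rules and downloaded. Once saved, the image can be loaded as a BufferedImage with javax.imageio.ImageIO.read and handled with Java2D.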

Week 1. Wed. March 26
Link to the report
Week 2. Mon. March 31
Link to the report
Week 3. Mon. April 7. R1
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport1.html
Week 4. Mon. April 14. R2
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport2.html
Week 5. Mon. April 21. R3
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport3.html
Week 6. Mon. April 28. R4
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport4.html
Week 7. Mon. May 5. R5
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport5.html
Week 8. Mon. May 12. R6
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/WebCrawler.java
Week 9. Mon. May 19. R7
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport7.html
Week 10. Mon. May 26. R8
http://www.ccs.neu.edu/home/beihz/is1320sp2003/pub/weeklyreport8.html