The dangers of webcrawled datasets

Bell, G.B. (2010) The dangers of webcrawled datasets. First Monday, 15 (2).


This article highlights legal, ethical and scientific problems that arise from using large experimental datasets gathered from the Internet, in particular image datasets. Such datasets are currently used in research on topics such as information forensics and image processing. The article strongly recommends against webcrawling as a means of generating experimental datasets, and proposes safer alternatives.

Publication Type: Journal Article
Murdoch Affiliation: School of Information Technology
Publisher: University of Illinois
Copyright: Creative Commons Attribution 2.5 UK: Scotland License
Notes: “The dangers of Webcrawled datasets” by Graeme Bell is licensed under a Creative Commons Attribution 2.5 UK: Scotland License. Permissions beyond the scope of this license may be available at