The dangers of webcrawled datasets

Bell, G.B. (2010) The dangers of webcrawled datasets. First Monday, 15 (2).


    This article highlights legal, ethical and scientific problems arising from the use of large experimental datasets gathered from the Internet — in particular, image datasets. Such datasets are currently used within research into topics such as information forensics and image processing. This paper strongly recommends against Webcrawling as a means for generating experimental datasets, and proposes safer alternatives.

    Publication Type: Journal Article
    Murdoch Affiliation: School of Information Technology
    Publisher: University of Illinois
    Copyright: Creative Commons Attribution 2.5 UK: Scotland License
    Notes: “The dangers of Webcrawled datasets” by Graeme Bell is licensed under a Creative Commons Attribution 2.5 UK: Scotland License. Permissions beyond the scope of this license may be available at