Murdoch University Research Repository

A scalable framework for integrated social data mining

Meneghello, James (2017) A scalable framework for integrated social data mining. PhD thesis, Murdoch University.


Social Networking Sites (SNS) are ubiquitous within modern society, forming communications networks that span cultural and geographical boundaries. The information posted to these sites provides useful insights into individuals, and can also support further analysis of the surrounding environment. Three main challenges limit the use of this information in applications: the quantity of data is often unmanageable, a significant amount of data is unavailable because no generic interfaces exist for accessing it, and multiple disparate social data sources are difficult to integrate.

The overall aim of the research described in this thesis is to advance the field of data science and improve the accessibility of social data in analytical applications, in both academic and commercial settings. This aim has been addressed with three primary contributions: new algorithms to efficiently locate and collect relevant social data, new methods of performing unsupervised data extraction from generic social sites, and the development and subsequent empirical evaluation of a framework to facilitate the collection, integration, storage and presentation of social data for use in applications.

The first contribution presented a search query optimisation algorithm designed to reduce the amount of noise resulting from social data collection by learning from collected content and iteratively building new query keyword sets. The algorithm was empirically evaluated, and the results indicated that it provides significantly more data than existing search tools while minimising the proportion of noise in the collected data.
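The idea of learning from collected content to build new keyword sets can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify; it swaps in a simple document-frequency heuristic. The names `expand_keywords`, `toy_search`, `POSTS`, `min_df` and `top_k` are all hypothetical, and the in-memory corpus stands in for a real search interface.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "at"}

# Toy corpus standing in for a social search service (assumption).
POSTS = [
    "Bushfire warning issued for Perth hills",
    "Perth hills bushfire evacuation under way",
    "Evacuation centre opens after bushfire",
    "Football final in Perth tonight",
]


def tokenize(text):
    """Lowercase the text and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def toy_search(keywords):
    """Return every post that contains at least one keyword."""
    return [p for p in POSTS if keywords & set(tokenize(p))]


def expand_keywords(search, seed, rounds=2, top_k=2, min_df=2):
    """Iteratively grow a keyword set from the content the current set retrieves."""
    keywords = set(seed)
    for _ in range(rounds):
        posts = search(keywords)
        # Count in how many retrieved posts each unseen term appears.
        counts = Counter(
            term for post in posts for term in set(tokenize(post)) if term not in keywords
        )
        # Keep terms seen in at least min_df posts; sort for deterministic ties.
        candidates = sorted(
            (t for t, n in counts.items() if n >= min_df),
            key=lambda t: (-counts[t], t),
        )
        if not candidates:
            break
        keywords.update(candidates[:top_k])
    return keywords
```

Calling `expand_keywords(toy_search, {"bushfire"}, rounds=1)` grows the seed with the terms that co-occur most often in the retrieved posts. A production version would also need a relevance check on each candidate term, since frequent but off-topic terms would otherwise add noise rather than reduce it.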

The second contribution aimed to improve access to social data on Web 2.0 sites that offer no existing interface to that data. It presented an algorithm designed to extract social data from sites without any a priori knowledge of their design or page layout. Its efficacy was empirically evaluated against a testbed consisting of popular news and current affairs websites. Results indicated that the algorithm was very effective at unsupervised retrieval of social data.
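One common way to extract records without knowing a page's layout is to look for repeated sibling structures in the markup. The sketch below illustrates that general technique using only the Python standard library; it is an assumption for illustration, not the method evaluated in the thesis. `Node`, `TreeBuilder`, `extract_records` and `min_repeats` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.cls = dict(attrs).get("class") or ""
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def signature(self):
        return (self.tag, self.cls)


class TreeBuilder(HTMLParser):
    """Build a minimal element tree from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if any.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def text_of(node):
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def extract_records(html, min_repeats=3):
    """Return the text of the repeated sibling group carrying the most text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_score = [], 0
    for parent in iter_nodes(builder.root):
        groups = {}
        for child in parent.children:
            if isinstance(child, Node):
                groups.setdefault(child.signature(), []).append(child)
        for group in groups.values():
            score = sum(len(text_of(n)) for n in group)
            if len(group) >= min_repeats and score > best_score:
                best, best_score = group, score
    return [text_of(n) for n in best]


SAMPLE = """
<html><body>
<ul class="nav"><li><a href="/">Home</a></li><li><a href="/news">News</a></li></ul>
<div id="story"><p>Article text.</p></div>
<div class="thread">
<div class="c"><b>ann</b><p>First comment</p></div>
<div class="c"><b>bob</b><p>Second<br>comment</p></div>
<div class="c"><b>cy</b><p>Third comment</p></div>
</div>
</body></html>
"""
```

On `SAMPLE`, `extract_records` picks out the three comment blocks because they are the only siblings that share a tag and class at least `min_repeats` times. The heuristic is deliberately crude: real pages contain long navigation lists and nested replies, which would need further scoring to separate from user-generated content.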

The third major contribution presented a framework that integrated the previous two contributions and was designed to streamline the use of social data in academic and commercial applications. The generic, component-based design was evaluated in real-world scenarios and determined to provide a full social collection and analytics workflow in an extensible and scalable manner.
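A component-based design of this kind can be pictured as pluggable collectors feeding a shared store. The following sketch is an assumption made for illustration only and does not reflect the actual architecture of the thesis framework; `Post`, `Pipeline`, `register`, `run` and `present` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class Post:
    """A source-independent record for one item of social data."""
    source: str
    author: str
    text: str


# A collector is any callable that yields posts from one source.
Collector = Callable[[], Iterable[Post]]


@dataclass
class Pipeline:
    collectors: list[Collector] = field(default_factory=list)
    store: list[Post] = field(default_factory=list)

    def register(self, collector: Collector) -> None:
        """Plug in a new source without changing the rest of the pipeline."""
        self.collectors.append(collector)

    def run(self) -> int:
        """Collect from every source, integrate into one store, skip duplicates."""
        seen = set(self.store)
        for collect in self.collectors:
            for post in collect():
                if post not in seen:
                    seen.add(post)
                    self.store.append(post)
        return len(self.store)

    def present(self) -> dict[str, int]:
        """Summarise the stored posts per source."""
        counts: dict[str, int] = {}
        for post in self.store:
            counts[post.source] = counts.get(post.source, 0) + 1
        return counts
```

Because each collector only has to return `Post` objects, a new social source can be added by registering one more callable, which is the sense in which such a design is extensible.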

This research has theoretical and practical implications for the use of social data in analytical research and commercial applications. It extends the data extraction field to include user-generated content, provides new avenues for performing semi-intelligent social data sourcing, and significantly improves the accessibility of social data.

Item Type: Thesis (PhD)
Murdoch Affiliation(s): School of Engineering and Information Technology
Supervisor(s): Thompson, Nik; Wong, Kevin; Lee, Kevin