Constructing a Synthetic Longitudinal Health Dataset for Data Mining

Date
2012
Authors
Ghassem Pour, S
Maeder, Anthony
Jorm, L
Publisher
IARIA
Rights
Copyright (c) IARIA, 2012.
Rights Holder
IARIA
Abstract
The traditional approach to epidemiological research is to analyse data in an explicit statistical fashion, attempting to answer a question or test a hypothesis. However, increasing experience in the application of data mining and exploratory data analysis methods suggests that valuable information can be obtained from large datasets using these less constrained approaches. Available data mining techniques, such as clustering, have mainly been applied to cross-sectional point-in-time data. However, health datasets often include repeated observations for individuals and so researchers are interested in following their health trajectories. This requires methods for analysis of multiple-points-over-time or longitudinal data. Here, we describe an approach to construct a synthetic longitudinal version of a major population health dataset in which clusters merge and split over time, to investigate the utility of clustering for discovering time sequence based patterns.
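As a rough illustration of the idea in the abstract, the sketch below generates a small synthetic longitudinal dataset in which two groups merge and one group splits across observation waves. It is not the authors' construction method and does not use the population health dataset the paper refers to. The group labels, centre values, noise level, and function names (`cluster_centres`, `make_longitudinal`, `group_mean`) are all assumptions chosen for illustration.

```python
import random


def cluster_centres(t, n_waves):
    """Hypothetical centre of each group at wave t (assumes n_waves >= 2).

    Groups 'a' and 'b' start apart and merge; 'c1' and 'c2' start together and split.
    """
    frac = t / (n_waves - 1)  # 0.0 at the first wave, 1.0 at the last
    return {
        "a": 2.0 * (1.0 - frac),    # drifts from 2 towards 0
        "b": -2.0 * (1.0 - frac),   # drifts from -2 towards 0
        "c1": 10.0 + 2.0 * frac,    # drifts from 10 up to 12
        "c2": 10.0 - 2.0 * frac,    # drifts from 10 down to 8
    }


def make_longitudinal(n_per_group=50, n_waves=5, noise_sd=0.3, seed=0):
    """Return one record per individual per wave, with Gaussian noise around the group centre."""
    rng = random.Random(seed)
    records = []
    person_id = 0
    for group in ("a", "b", "c1", "c2"):
        for _ in range(n_per_group):
            for t in range(n_waves):
                centre = cluster_centres(t, n_waves)[group]
                records.append({
                    "id": person_id,
                    "group": group,
                    "wave": t,
                    "value": rng.gauss(centre, noise_sd),
                })
            person_id += 1
    return records


def group_mean(records, group, wave):
    """Mean observed value for one group at one wave."""
    vals = [r["value"] for r in records if r["group"] == group and r["wave"] == wave]
    return sum(vals) / len(vals)


data = make_longitudinal()
```

Because each individual keeps the same `id` across waves, a clustering method applied wave by wave can be checked against the known `group` labels to see whether it recovers the planted merge and split.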
Description
Published version reproduced here with permission from the publisher.
Keywords
cluster analysis, synthetic data
Citation
Ghassempour, S., Maeder, A.J. and Jorm, L. (2012). Constructing a Synthetic Longitudinal Health Dataset for Data Mining. In The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications, Saint Gilles, Reunion, Feb 2012, pp. 86-90. Wilmington, Delaware, US: IARIA.