Workshop on Data Derivation and Provenance
Provenance and data derivation are important to many aspects of scientific computation. In molecular biology, where data is repeatedly copied, corrected, and transformed as it passes through numerous genomic databases, understanding where data has come from and how it arrived in the user's database is crucial to the trust a scientist will put in that data, yet this information is seldom captured properly. In astronomy, useful results may have been obtained by filtering, transforming, and analyzing some base data with a complex assemblage of programs, yet we lack good tools for recording how these programs were connected and the context in which they were run.
The importance of provenance goes well beyond verification; it is closely related to archiving and annotation, which are also important for scientific data. Moreover, it can aid data discovery. Knowing the provenance of a data item may help the biologist make connections with other useful data. The astronomer may want to understand a derivation in order to repeat it with modified parameters, and being able to describe a derivation may help a researcher discover whether a particular kind of analysis has already been performed.
Posted by ghbrett at May 01, 2003 10:55 AM