A vast amount of data is now generated in Extensible Markup Language (XML). However, applications in some domains need to store and manipulate uncertain information, e.g., when sensor inputs are noisy or the data to be stored is inherently uncertain. Another major change in applications and web data is the increasing use of ontologies to describe the semantics of data, i.e., the semantic relationships between the terms in the databases. Because such information is usually absent from traditional databases, there is a tremendous opportunity to ask new kinds of queries that could not be handled in the past. This poses new challenges in how to manipulate and maintain these new kinds of database systems. In this dissertation, we will see how we can (i) incorporate and manipulate uncertainty in databases, and (ii) efficiently compute aggregates and maintain views on ontology datab
Contents: Managing Uncertainty and Ontologies in Databases
1 Introduction
1.1 New Challenges in XML Databases
1.2 Uncertainty in XML
1.3 Ontologies in XML
1.4 Contributions
1.5 Organization
2 Probabilistic XML Model, Algebra and Aggregation
2.1 Motivating Examples
2.1.1 A Bibliographical Application
2.1.2 A Surveillance Application
2.2 Probabilistic Semistructured Data Model
2.2.1 Semistructured Data Model
2.2.2 The PXML Probabilistic Data Model
2.3 Semantics
2.4 Probabilistic Point Queries
2.5 Probabilistic Semistructured Algebra
2.5.1 Projection
2.5.2 Selection
2.5.3 Cartesian Product
2.6 Probabilistic Aggregate Operators
2.6.1 Aggregates on semistructured instances
2.6.2 Possible-worlds aggregates on probabilistic instances
2.6.3 Expected aggregates on probabilistic instances
2.7 Probabilistic Aggregate Algorithms
2.7.1 SP Algorithm
2.7.2 Complexity of SP Algorithm
2.7.3 Pruning
2.7.4 SE Algorithm
2.7.5 Complexity of SE Algorithm
2.8 PXML Experiments
2.8.1 Experimental Design
2.8.2 Performance results of algebra experiments
2.8.3 Performance results of aggregate experiments
2.9 Summary
3 Probabilistic Interval XML Model
3.1 Interval Probabilities
3.2 The PIXML Data Model
3.3 PIXML: Declarative Semantics
3.3.1 Connections between Local and Global Semantics
3.3.2 Satisfaction
3.4 PIXML Queries: Syntax
3.4.1 Single-Instance Queries
3.4.2 Multiple-Instance Queries
3.5 PIXML Queries: r-Answers
3.5.1 r-Answers to SISO queries
3.5.2 r-Answers to SIMO/MIMO queries
3.6 PIXML Queries: Operational Semantics
3.6.1 Algorithm to solve SISO queries
3.6.2 Algorithm to solve SIMO queries
3.6.3 Algorithm to solve IMIMO queries
3.6.4 Algorithm to solve DMIMO queries
3.7 Summary
4 Maintaining RDF Databases
4.1 Overview of RDF and RDQL
4.1.1 RDF Model
4.1.2 RDQL, Views and Graph Patterns
4.2 Maintenance Algorithms
4.2.1 Insertion Maintenance Algorithms
4.2.2 Deletion Maintenance Algorithm
4.2.3 Triple Modification Algorithm
4.2.4 Resource Modification Algorithm
4.3 RDF Aggregates
4.3.1 Algorithm to Compute Aggregates
4.4 Aggregation Maintenance Algorithms
4.4.1 Insertion
4.4.2 Deletion
4.4.3 Triple Modification…