Which type of constraint ensures the quality of information in a relational database?

Temporal Referential Integrity

Tom Johnston, in Bitemporal Data, 2014

Referential integrity is a well-understood relational constraint that applies to conventional tables. Temporal referential integrity is an extension of conventional referential integrity. However, as we will see later, all referential integrity is in fact temporal: conventional referential integrity is temporal referential integrity restricted to tables that permit only one row, at any time, to represent one referent.


URL: https://www.sciencedirect.com/science/article/pii/B9780124080676000103

Data Quality

William McKnight, in Information Management, 2014

Referential Integrity

Referential integrity (RI) refers to the integrity of reference between data in related tables. For example, product identifiers in sales records must also be found in the product tables. RI is essential in preventing users from navigating to a "dead end." Many transactional systems use database-enforced RI. Many other systems, such as data warehouses, do not, because of the performance and ordering restrictions it places on data loading. But RI is still important and must be enforced somehow. In the target marketing system, for example, incorrect RI may mean sales of an undefined product, to an undefined customer, or both, which effectively removes those sales from the analysis.
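To make the last point concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The product, customer and sales tables, their columns and their rows are all hypothetical. No foreign key is declared, which mimics a warehouse that does not enforce RI. An inner join, of the kind analysis queries often use, silently drops the sale whose product is undefined. Left joins with an IS NULL test surface that sale instead.

```python
import sqlite3

def build_demo() -> sqlite3.Connection:
    """Create hypothetical tables with no FK constraints, as in a non-enforcing warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        -- No REFERENCES clauses: the database does not enforce RI here.
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,
                            product_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO customer VALUES (10, 'Acme');
        INSERT INTO sales VALUES (100, 1, 10, 50.0),
                                 (101, 2, 10, 30.0),
                                 (102, 99, 10, 20.0);  -- product 99 is undefined
    """)
    return conn

def revenue_joined(conn) -> float:
    # Inner joins: the sale of the undefined product silently drops out of the total.
    return conn.execute("""
        SELECT COALESCE(SUM(s.amount), 0) FROM sales s
        JOIN product  p ON p.product_id  = s.product_id
        JOIN customer c ON c.customer_id = s.customer_id
    """).fetchone()[0]

def orphan_sales(conn) -> list:
    # Left joins keep every sale; a NULL on the parent side marks an orphan.
    rows = conn.execute("""
        SELECT s.sale_id FROM sales s
        LEFT JOIN product  p ON p.product_id  = s.product_id
        LEFT JOIN customer c ON c.customer_id = s.customer_id
        WHERE p.product_id IS NULL OR c.customer_id IS NULL
        ORDER BY s.sale_id
    """).fetchall()
    return [r[0] for r in rows]

conn = build_demo()
total_all = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total_all, revenue_joined(conn), orphan_sales(conn))  # → 100.0 80.0 [102]
```

The gap between the two totals is exactly the amount attached to the orphan sale.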


URL: https://www.sciencedirect.com/science/article/pii/B9780124080560000047

Why We Test and What Tests to Run

Ralph Hughes MA, PMP, CSM, in Agile Data Warehousing for the Enterprise, 2016

Referential Integrity Test

A referential integrity test asserts that all the foreign keys in a given column link to a correct record in the parent table. Such a test will be moot for a target database that has referential integrity constraints in effect because the database engine will ensure that this assertion is true each time it loads a record into the data warehouse. Many teams turn off referential integrity constraints, however, so that the data will load more quickly. They trust the logic of the data transform modules to achieve referential integrity. Any time the team designs ETL to implement an important business rule, it is wise to test that it has been implemented correctly. If referential integrity constraints in the database are turned off, then the project’s QA effort should plan on explicitly testing that the foreign keys resolve without error.
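A test of this kind can be a small assertion that runs after each load. The sketch below assumes an SQLite target and uses hypothetical table and column names. It counts child rows whose non-null foreign key has no matching parent key, and it fails if that count is not zero. NULL keys are skipped because SQL foreign keys accept them. The identifiers are formatted into the SQL text, so they should come from trusted test configuration and never from user input.

```python
import sqlite3

def count_unresolved_fks(conn, child, fk_col, parent, pk_col) -> int:
    """Count child rows whose non-null foreign key matches no parent key."""
    # Identifiers cannot be bound as parameters, so only pass trusted, fixed names.
    sql = f"""
        SELECT COUNT(*) FROM {child} c
        WHERE c.{fk_col} IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM {parent} p WHERE p.{pk_col} = c.{fk_col})
    """
    return conn.execute(sql).fetchone()[0]

def assert_fk_resolves(conn, child, fk_col, parent, pk_col) -> None:
    """Raise AssertionError if any foreign key in child.fk_col fails to resolve."""
    bad = count_unresolved_fks(conn, child, fk_col, parent, pk_col)
    if bad:
        raise AssertionError(
            f"{bad} row(s) in {child}.{fk_col} do not resolve to {parent}.{pk_col}")

# A target loaded with referential integrity constraints turned off (none declared).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales  (sale_id INTEGER PRIMARY KEY, product_key INTEGER);
    INSERT INTO dim_product VALUES (1), (2);
    INSERT INTO fact_sales VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
""")
print(count_unresolved_fks(conn, "fact_sales", "product_key", "dim_product", "product_key"))  # → 1
```

Calling `assert_fk_resolves` in the QA suite turns a nonzero count into a failed test.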


URL: https://www.sciencedirect.com/science/article/pii/B9780123964649000163

Facets of the DQAF Measurement Types

Laura Sebastian-Coleman, in Measuring Data Quality for Ongoing Improvement, 2013

Business Concerns

Referential integrity is defined as the degree to which data in two or more tables related through a foreign key relationship is complete. Referential integrity is often explained through parent/child table relationships. Within a relational database, all values that are present on a child table should also be present on its parent table (the child inherits values from the parent). If a child table has values that are not on its parent table, it does not have referential integrity with the parent table. These records represent parentless children, or “orphans.” There are many reasons why a value on a child table might be absent from a parent table. A lack of data entry controls can allow values to be entered without any reference to their validity. Two systems processing similar data may not be synchronized. Or there may be differences in the update schedules of parent tables and child tables within one database.

Referential integrity is most often illustrated in relation to reference data. If a value on a fact or core table (which is a child to a reference table) is not present in a reference table, then the two will not have referential integrity. However, referential integrity also applies to the relationship between fact tables, such as between header and detail records. For example, if someone who purchases goods from a business does not have a record on the customer table but does have a record in the order table, then the record in the order table will be an orphan.

The purpose of this periodic assessment is to test the level of referential integrity in critical fields throughout a database. It mitigates the risks associated with having incomplete relationships between related datasets. Summarized results from the assessment can be used as a way to characterize the overall quality of data in a warehouse or other database.
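Such an assessment can be sketched as a loop over the foreign key relationships judged critical. For each one it reports the number of child rows checked, the number of orphans and the percentage that resolve. The relationship list, the table names, the rows and the SQLite target are assumptions made for illustration. The order/customer and header/detail pairs echo the examples above.

```python
import sqlite3

# Hypothetical (child table, fk column, parent table, pk column) pairs to assess.
RELATIONSHIPS = [
    ("orders", "customer_id", "customer", "customer_id"),
    ("order_detail", "order_id", "orders", "order_id"),
]

def assess(conn, relationships):
    """Return (relationship, rows checked, orphans, percent resolved) for each pair."""
    report = []
    for child, fk, parent, pk in relationships:
        checked, orphans = conn.execute(f"""
            SELECT COUNT(*),
                   SUM(CASE WHEN p.{pk} IS NULL THEN 1 ELSE 0 END)
            FROM {child} c LEFT JOIN {parent} p ON p.{pk} = c.{fk}
            WHERE c.{fk} IS NOT NULL
        """).fetchone()
        orphans = orphans or 0  # SUM over zero rows is NULL
        pct = 100.0 if checked == 0 else round(100.0 * (checked - orphans) / checked, 1)
        report.append((f"{child}.{fk} -> {parent}.{pk}", checked, orphans, pct))
    return report

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_detail (detail_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3);          -- customer 3 missing
    INSERT INTO order_detail VALUES (1, 100), (2, 100), (3, 101), (4, 105);  -- order 105 missing
""")
for row in assess(conn, RELATIONSHIPS):
    print(row)
```

Each row of the report can be stored with a run date, so the level of referential integrity can be tracked over time.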


URL: https://www.sciencedirect.com/science/article/pii/B9780123970336000171

Foundational Data Modeling

Rick Sherman, in Business Intelligence Guidebook, 2015

Referential Integrity

Referential integrity (RI) is a term used with relational databases to describe the integrity of the business relationships represented in the schema. It ensures that relationships between tables remain consistent. As it applies to a logical data model, RI encompasses the rules that determine the actions that are taken when parent or child rows are inserted, updated, or deleted in a database. None of these actions—inserting, updating or deleting—can violate the relationships that have been defined.

RI includes enforcing relationship cardinality rules (e.g., one-to-many, many-to-one, etc.) and the various types of relationship models (e.g., identifying, nonidentifying optional, nonidentifying mandatory, and many-to-many). There are two primary approaches to implementing RI:

1.

Leverage database functionality, such as foreign key constraints and triggers. With this approach, the database itself enforces referential integrity whenever data is inserted, updated, or deleted.

2.

Develop data integration processes such as ETL code. With this approach, the data integration processes will examine the data that is being inserted, updated, or deleted; determine what the relationship rules are for processing; and then take the appropriate action. This is the best practice.
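The first approach can be illustrated with SQLite through Python's sqlite3 module. SQLite enforces FOREIGN KEY clauses only after `PRAGMA foreign_keys = ON` has been issued on the connection. Once it has, an insert that would create an orphan is rejected with `sqlite3.IntegrityError`. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1);
""")

def try_insert_order(order_id: int, customer_id: int) -> bool:
    """Return True if the database accepts the row, False if RI rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_order(100, 1))   # → True: customer 1 exists
print(try_insert_order(101, 42))  # → False: there is no customer 42
```

No custom checking code is needed, but the parent row has to be present before the child arrives. That ordering requirement is the drawback discussed next.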

The first approach appears to be the most straightforward because it uses prebuilt database functionality without the need for any custom coding. If the relationships are properly implemented in the database, it is guaranteed that referential integrity will be enforced. There are two significant drawbacks to this approach, however:

First, database performance can be adversely affected when using this approach.

Second, and most importantly, database inserts, updates, and deletions can only occur if all the data related to the relationships is available at the same time. Although this condition is met in transaction processing, it is often not the case in BI applications where data is being gathered from many different source systems that have different update frequencies and operate under different business rules. This means that if you use the first method, many BI-related database updates might not be completed, which then results in significant amounts of data never being updated or made available for analysis. Under these conditions, it is best to choose the second approach of developing code.
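The second approach can be sketched as ETL logic that checks each incoming fact against the dimension keys already loaded. Rather than failing the whole load, it holds rows whose parent has not arrived yet and retries them on a later run. This is only one possible rule set. The hold-and-retry policy, the class name `FactLoader` and the in-memory structures are illustrative assumptions, not the author's design.

```python
from dataclasses import dataclass, field

@dataclass
class FactLoader:
    """Hypothetical ETL step that enforces RI in code rather than in the database."""
    known_products: set = field(default_factory=set)
    loaded: list = field(default_factory=list)  # facts accepted into the target
    held: list = field(default_factory=list)    # facts waiting for their parent row

    def load_dimension(self, product_ids):
        self.known_products.update(product_ids)

    def load_facts(self, facts):
        """Accept facts whose product is known; hold the rest for a later run."""
        pending = self.held + list(facts)  # retry previously held rows first
        self.held = []
        for fact in pending:
            if fact["product_id"] in self.known_products:
                self.loaded.append(fact)
            else:
                self.held.append(fact)

loader = FactLoader()
loader.load_dimension({1, 2})
loader.load_facts([{"sale_id": 100, "product_id": 1},
                   {"sale_id": 101, "product_id": 3}])  # product 3 not loaded yet
print(len(loader.loaded), len(loader.held))  # → 1 1
loader.load_dimension({3})  # a slower source system delivers product 3 later
loader.load_facts([])
print(len(loader.loaded), len(loader.held))  # → 2 0
```

Because the rule lives in code, it can encode business decisions a plain foreign key cannot. Examples include how long to hold a row, or whether to assign it to an "unknown" dimension member instead.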

For these reasons, transactional applications commonly enforce referential integrity by using the first method of relying on the database. In BI implementations, the best practice has been the second method of developing data integration code that can systematically process data under agreed-upon rules to enforce referential integrity. An emerging best practice has been to define database referential integrity and then to turn it off during loads if ETL-enforced RI is needed to load data.
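This practice of declaring RI but switching it off during loads can be sketched in SQLite. The sketch assumes SQLite as the target, hypothetical table names, and a batch in which fact rows arrive before their dimension rows. Foreign keys are disabled with `PRAGMA foreign_keys = OFF`, which has no effect inside an open transaction, hence the commit first. The data is then loaded in arrival order. Afterward, `PRAGMA foreign_key_check` lists every row that still violates a declared constraint, and enforcement is switched back on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES dim_item(item_id)
    );
""")

def load_batch(facts, items):
    """Load rows in arrival order with FK checks suspended, then verify them."""
    conn.commit()                              # the pragma is a no-op inside a transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    with conn:
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", facts)  # facts arrive first
        conn.executemany("INSERT INTO dim_item VALUES (?)", items)       # dimensions later
    # Each violation row is (child table, rowid, parent table, fk index).
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    conn.execute("PRAGMA foreign_keys = ON")   # restore enforcement for normal use
    return violations

problems = load_batch(facts=[(1, 10), (2, 11), (3, 12)], items=[(10,), (11,)])
print(problems)
```

Sale 3 refers to item 12, which never arrived, so it is the only row reported. The declared constraint still documents the relationship and protects later single-row changes.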

Figure 8.13 shows a fact table for store sales—Tbl_Fact_Store_Sales. Four entities surround Tbl_Fact_Store_Sales with the following relationships:


FIGURE 8.13. Referential integrity.

One-to-many relationship with the item entity, Tbl_Dim_Item, which is the product being sold at the store.

One-to-many relationship with the date entity, Tbl_Dim_Date, which is when the product was sold at the store.

One-to-many relationship with the customer entity, Tbl_Dim_Customer, which is the customer who bought the product.

Within Tbl_Fact_Store_Sales are three foreign keys (SK_Item_ID, SK_Date_ID, and SK_Customer_ID) that together define the primary key of the fact table. Each of the three parent entities can exist on its own. But you cannot have a store sale without selling a product to a customer at a particular point in time, and that is the rule referential integrity enforces here. A store sale cannot be recorded unless all three foreign keys are present and refer to existing parent rows.

Additionally, the example shows the buyer (Tbl_Dim_Buyer), which also has a one-to-many relationship with the fact table but is optional in this case. There could be a buyer who actually bought the product from another company, or the purchase could have been in another business transaction. So, this is an optional relationship that is not enforced through referential integrity.
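The relationships in Figure 8.13 could be declared roughly as below. This is a sketch in SQLite DDL run through Python's sqlite3 module. The table names and the three SK_ key columns come from the text. SK_Buyer_ID, Sales_Amount and the data types are assumptions. The three mandatory keys are NOT NULL foreign keys that together form the primary key. The optional buyer key is nullable and, following the text, is not declared as a foreign key at all.

```python
import sqlite3

DDL = """
CREATE TABLE Tbl_Dim_Item     (SK_Item_ID     INTEGER PRIMARY KEY);
CREATE TABLE Tbl_Dim_Date     (SK_Date_ID     INTEGER PRIMARY KEY);
CREATE TABLE Tbl_Dim_Customer (SK_Customer_ID INTEGER PRIMARY KEY);
CREATE TABLE Tbl_Dim_Buyer    (SK_Buyer_ID    INTEGER PRIMARY KEY);

CREATE TABLE Tbl_Fact_Store_Sales (
    SK_Item_ID     INTEGER NOT NULL REFERENCES Tbl_Dim_Item(SK_Item_ID),
    SK_Date_ID     INTEGER NOT NULL REFERENCES Tbl_Dim_Date(SK_Date_ID),
    SK_Customer_ID INTEGER NOT NULL REFERENCES Tbl_Dim_Customer(SK_Customer_ID),
    SK_Buyer_ID    INTEGER,   -- hypothetical name; optional, deliberately no FK
    Sales_Amount   REAL,      -- hypothetical measure
    PRIMARY KEY (SK_Item_ID, SK_Date_ID, SK_Customer_ID)
);
"""

def make_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
    conn.executescript(DDL)
    conn.executescript("""
        INSERT INTO Tbl_Dim_Item VALUES (1);
        INSERT INTO Tbl_Dim_Date VALUES (20150101);
        INSERT INTO Tbl_Dim_Customer VALUES (7);
    """)
    return conn

def accepts(conn, row) -> bool:
    """Try to insert one sale; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Tbl_Fact_Store_Sales VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_schema()
print(accepts(conn, (1, 20150101, 7, None, 9.99)))  # → True: all three parents exist, no buyer
print(accepts(conn, (1, 20150101, 8, None, 9.99)))  # → False: customer 8 does not exist
```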


URL: https://www.sciencedirect.com/science/article/pii/B9780124114616000083

Programming Tools and Technologies in SQL Server

In Designing SQL Server 2000 Databases, 2001

Referential Integrity Enhancements

Referential integrity guarantees that tables containing related information maintain their relationships. For example, in the Northwind database, the Orders table contains the shipping information for an order, and the Order Details table lists the products ordered. You cannot delete a row in the Orders table until you delete all the related rows in the Order Details table. In earlier versions of SQL Server, deleting an order without first deleting its order detail rows required extra code. You could use a delete trigger on the Orders table to delete the related rows in the Order Details table, or you could use a stored procedure. Cascading updates and deletes provide an efficient alternative that is easier to implement.
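A cascading delete can be sketched with Python's sqlite3 module. SQLite stands in for SQL Server here, and the two tables are simplified, hypothetical versions of Northwind's Orders and Order Details, with the shipping columns omitted and made-up rows. The `ON DELETE CASCADE` clause is standard SQL. Deleting an order then removes its detail rows without any trigger or stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to honor REFERENCES clauses
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY);
    CREATE TABLE "Order Details" (
        OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID) ON DELETE CASCADE,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER,
        PRIMARY KEY (OrderID, ProductID)
    );
    INSERT INTO Orders VALUES (1), (2);
    INSERT INTO "Order Details" VALUES (1, 11, 12), (1, 42, 10), (2, 14, 9);
""")

def detail_count(order_id: int) -> int:
    return conn.execute('SELECT COUNT(*) FROM "Order Details" WHERE OrderID = ?',
                        (order_id,)).fetchone()[0]

print(detail_count(1))  # → 2
with conn:
    conn.execute("DELETE FROM Orders WHERE OrderID = ?", (1,))  # one statement, no trigger
print(detail_count(1))  # → 0: the cascade removed the detail rows
```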


URL: https://www.sciencedirect.com/science/article/pii/B9781928994190500166

MOTIVATION

David C. Hay, in Data Model Patterns, 2006

Foreign Keys

Just as entity classes are related to each other through relationships, tables in a relational database are logically related to each other through foreign keys. A foreign key is the reference by one or more columns in one table to one or more columns in another table in order to represent a relationship. Specifically, as shown in Figure 7-29, when an entity-relationship model is converted to a database design, each one-to-many relationship is converted to a set of new columns in the table corresponding to the “many” side of the relationship (the child table). This set of columns constitutes a foreign key that is a reference to a unique key on a different (or, occasionally, the same) table. That is, a foreign key consists of columns that refer to the columns that constitute a unique key of the table on the “one” side of the relationship (the parent table).

In Figure 7-29, then, each table may be constrained by one or more foreign keys. Each foreign key, then, must be composed of one or more foreign key elements, each of which is the use of a specified column and a reference to another column. This second column must then also be used as a unique key element in the primary key referred to by the foreign key in question.

For example, as shown in Figure 7-27, “Ord_order_number” is a column in the line items table that is used as a foreign key element. It in turn is a reference to the column “Order number”, used as a unique key element in the primary key on the orders table. This foreign key represents a constraint on the line_items table, in that a row usually cannot be created if it does not have a value in the foreign key column (“ord_order_number”) that matches a value for the primary key (“order_number”) in a parent table. Whether it can or not depends on the referential integrity constraint that is an attribute of the foreign key.

A referential integrity constraint defines the extent to which a table is constrained by a foreign key. The attribute “Referential integrity constraint” may have one of the following values.

Restricted: Deletion of an occurrence of a parent table may not take place if the occurrence is related through foreign keys to occurrences of the child table.

Cascade delete: Deletion of an occurrence of a parent table causes deletion of all related occurrences of the child table.

Nullify: Deletion of an occurrence of a parent table may leave occurrences of the child table without parents. In this case, there is no constraint, and child occurrences may be created without specifying a parent.
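In SQL, these three values correspond roughly to the referential actions `ON DELETE RESTRICT`, `ON DELETE CASCADE` and `ON DELETE SET NULL`. The sketch below uses SQLite through Python's sqlite3 module and simplified orders and line_items tables modeled on the example above. It builds the child table with each option in turn and reports what happens when the parent order is deleted. SET NULL works only when the foreign key column is nullable, which matches the "Nullify" case, where child rows may exist without a parent.

```python
import sqlite3

def delete_parent(action: str):
    """Build orders/line_items with the given ON DELETE action, delete the order,
    and return the remaining line items' ord_order_number values, or 'rejected'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # `action` comes only from the fixed tuple below, so formatting it in is safe.
    conn.executescript(f"""
        CREATE TABLE orders (order_number INTEGER PRIMARY KEY);
        CREATE TABLE line_items (
            line_id INTEGER PRIMARY KEY,
            ord_order_number INTEGER REFERENCES orders(order_number) ON DELETE {action}
        );
        INSERT INTO orders VALUES (1);
        INSERT INTO line_items VALUES (10, 1), (11, 1);
    """)
    try:
        with conn:
            conn.execute("DELETE FROM orders WHERE order_number = 1")
    except sqlite3.IntegrityError:
        return "rejected"  # Restricted: the parent cannot be deleted while children exist
    return [r[0] for r in conn.execute(
        "SELECT ord_order_number FROM line_items ORDER BY line_id")]

for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, delete_parent(action))
```

RESTRICT refuses the delete, CASCADE leaves no line items, and SET NULL keeps both line items with a null foreign key.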

Business Rules

1.

A foreign key element is a reference to a column that is part of a table. The same foreign key element is part of a foreign key that is a reference to a primary key that is on the same table.

2.

A foreign key element is the use of a column that is part of a table. The same foreign key element is part of a foreign key that is a constraint on the same table.

3.

If the attribute “Referential integrity constraint” of an occurrence of foreign key has the value “Restricted”, then occurrences of the table organized around the primary key referred to by the foreign key may not be deleted if occurrences exist of the table constrained by the foreign key.

4.

If the attribute “Referential integrity constraint” of an occurrence of foreign key has the value “Cascade delete”, and if an occurrence of the table organized around the primary key that it is a reference to is deleted, then each occurrence of the table that is constrained by the same foreign key must also be deleted.

5.

If the attribute “Referential integrity constraint” of an occurrence of foreign key has the value “Nullify”, then occurrences of the table organized around the primary key that it is a reference to may be deleted. If this happens, the value of the column used as a foreign key element is set to null.


URL: https://www.sciencedirect.com/science/article/pii/B9780120887989500085

Temporal Transactions on Multiple Tables

Tom Johnston, Randall Weis, in Managing Time in Relational Databases, 2010

In either an RI or a TRI relationship between a managed object representing a policy and one representing a client, a client may exist without a related policy, but a policy cannot exist without a related client. As far as RI is concerned, the DBMS enforces these “mays” and “cannots” on the managed objects that are rows, in accordance with rules declared to it as constraints in DDL statements. As far as TRI is concerned, the AVF (Asserted Versioning Framework) enforces them on the managed objects that are versions and episodes, in accordance with rules declared to it as entries in metadata tables.
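The temporal rule for the policy/client example can be sketched as an interval check. A policy version satisfies TRI if its effective period lies wholly within one episode of the related client, where an episode is a maximal continuous run of client versions. This is a simplified illustration and not the AVF's actual mechanism. It assumes half-open [start, end) periods written as yyyymmdd integers, and the function names are hypothetical.

```python
from typing import List, Tuple

Period = Tuple[int, int]  # half-open [start, end), e.g. dates as yyyymmdd integers

def episodes(versions: List[Period]) -> List[Period]:
    """Merge a parent's version periods into maximal continuous episodes."""
    merged: List[Period] = []
    for start, end in sorted(versions):
        if merged and start <= merged[-1][1]:  # contiguous with or overlapping the last one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def satisfies_tri(child: Period, parent_versions: List[Period]) -> bool:
    """True if the child's period lies wholly inside a single parent episode."""
    c_start, c_end = child
    return any(e_start <= c_start and c_end <= e_end
               for e_start, e_end in episodes(parent_versions))

client_versions = [(20100101, 20100601), (20100601, 20110101)]  # one continuous episode
print(satisfies_tri((20100301, 20100901), client_versions))  # → True
print(satisfies_tri((20100301, 20110301), client_versions))  # → False: outlives the client
```

With only one current row per client, as in a conventional table, the check reduces to ordinary RI: the parent row either exists or it does not.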


URL: https://www.sciencedirect.com/science/article/pii/B978012375041900011X

Initial Data Assessment

Laura Sebastian-Coleman, in Measuring Data Quality for Ongoing Improvement, 2013

Incomplete Referential Relationships

The term referential integrity is used to describe a specific relationship between tables. Each table in a database must have a primary key, the set of columns that define what constitutes a unique row of data. In relational databases, tables often also contain foreign keys. A foreign key is a column (or set of columns) whose values refer to data in another table, where those same values form the primary key. For example, medical claims records should always contain a diagnosis code. The meaning associated with standard diagnosis codes will be defined on a separate reference or dimension table. On the claim table, the diagnosis code column is a foreign key; on the diagnosis code table, the diagnosis code is the primary key. In this situation, the diagnosis code table is said to be the parent of the medical claim table, and the claim table is its child. Records in a child table that do not have corresponding references in the parent table are called orphan records. If all records on the child table do have references on the parent table, then referential integrity exists between the two tables. This relationship is sometimes called an inheritance rule.

Foreign key relationships always represent an expectation about the data. It is expected that any value present in the foreign key column (diagnosis code column on the claim table) will be present in the table for which it is the primary key (the diagnosis code reference table). If the value is not present on the parent table, then the expectation has not been met. The cause of the problem may be simple. Reference data may be incomplete or the relationship may be mislabeled or the column may be populated with an invalid value. Whatever the cause of a particular problem, profiling allows you to observe that there is a difference between what is expected and what is present in the data.

Referential relationships have been implied in the preceding discussion about validity, and to a large degree, the two concepts overlap. There is one difference between them, however: referential integrity is usually described as a specific relationship between tables in a database, whereas validity as a concept is not dependent on particular data structures. Valid domains are not always defined within a table. They can be represented as ranges, as well as through listings that do not exist in table form. Another difference is that we usually think of validity in relation to codified data. Referential integrity governs relationships in addition to those between reference and core (or dimension and fact) data. So in the medical claims example above, in addition to the relationship between the claim data and the diagnosis code reference table, there is a relationship between tables referring to other details of the claim process: specifically, between a claim header record and the associated service records, as well as between claim data and medical provider data, member data, policy holder data, and health plan data.

Referential relationships speak first to data completeness. The presence or absence of orphan records can be used as a measure of completeness. These relationships also depend on data consistency: for them to work at all, the same information must be represented in the same form across tables.

When profiling data, you should be looking for instances where references are missing from parent tables (orphan records). Such a situation goes against the expectation defined by the foreign key relationship and indicates that you are missing data. You should also look for childless parents. Such a situation is more difficult to read and does not always indicate a problem. Its meaning depends on the expectations specific to the data. It may indicate that you do not have a complete set of child records. The expected, generic relationship between products and sales illustrates these relationships. Sales data is a child of product data. No products should be represented in sales data that do not exist on a product table. There should be no orphan product references on sales records. However, it is possible to have products that never actually get sold. There may be some childless parents among the product records.
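A profiling pass that looks in both directions can be sketched with two anti-join queries. The product and sales tables, their columns and their rows are hypothetical, and SQLite is used only for convenience. Orphans are sales whose product is absent. Childless parents are products with no sales, which, as noted above, may be perfectly legitimate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE sales   (sale_id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO product VALUES (1), (2), (3);
    INSERT INTO sales VALUES (100, 1), (101, 1), (102, 4);
""")

# Orphans: child rows whose foreign key has no parent (violates the expectation).
orphans = [r[0] for r in conn.execute("""
    SELECT s.sale_id FROM sales s
    WHERE NOT EXISTS (SELECT 1 FROM product p WHERE p.product_id = s.product_id)
    ORDER BY s.sale_id""")]

# Childless parents: parent rows never referenced (a problem only in some contexts).
childless = [r[0] for r in conn.execute("""
    SELECT p.product_id FROM product p
    WHERE NOT EXISTS (SELECT 1 FROM sales s WHERE s.product_id = p.product_id)
    ORDER BY p.product_id""")]

print("orphans:", orphans)      # → orphans: [102]
print("childless:", childless)  # → childless: [2, 3]
```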

During initial assessment, it can be challenging to characterize these relationships, especially if you are working with a limited set of sample data. Sample data may imply you have a problem when you do not have one in your production data (parent records may be missing from the sample but present in production). Or it may not contain evidence of a problem that exists in production data (the sample may have full referential integrity while production data does not). It is still critical to assess these relationships and to bring findings and questions to the data source. The results may tell you that the sample itself is not adequate for the work you need to do. Or they may show you that these relationships do not inhere in the data in the way they should based on known rules.


URL: https://www.sciencedirect.com/science/article/pii/B9780123970336000079

Table Operations

Joe Celko, in Joe Celko's SQL for Smarties (Fourth Edition), 2011

Redundant Duplicates in a Table

Redundant duplicates are unneeded copies of a row in a table. You most often get them because you did not put a UNIQUE constraint on the table and then you inserted the same data twice. Removing the extra copies from a table in SQL is much harder than you would think. In fact, if the rows are exact duplicates, you cannot do it with a simple DELETE FROM statement. Removing redundant duplicates involves saving one of them while deleting the other(s). But if SQL has no way to tell them apart, it will delete all rows that were qualified by the WHERE clause. Another problem is that the deletion of a row from a base table can trigger referential actions, which can have unwanted side effects.

For example, if there is a referential integrity constraint that says a deletion in Table1 will cascade and delete matching rows in Table2, removing redundant duplicates from T1 can leave me with no matching rows in T2. Yet I still have a referential integrity rule that says there must be at least one match in T2 for the single row I preserved in T1. SQL allows constraints to be deferrable or nondeferrable, so you might be able to suspend the referential actions that the transaction below would cause:

BEGIN
-- assumes WorkingTable already exists with the same columns as MessedUpTable
INSERT INTO WorkingTable --use DISTINCT to kill duplicates
SELECT DISTINCT * FROM MessedUpTable;
DELETE FROM MessedUpTable; --clean out messed-up table
INSERT INTO MessedUpTable --put working table into it
SELECT * FROM WorkingTable;
DROP TABLE WorkingTable; --get rid of working table
END;
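Some products offer a vendor-specific way to tell exact duplicates apart. SQLite, for example, gives every ordinary table a hidden rowid, so a single DELETE can keep one copy of each duplicate group. This is an SQLite extension, not standard SQL, and the table and columns below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MessedUpTable (a INTEGER, b TEXT);  -- no UNIQUE constraint
    INSERT INTO MessedUpTable VALUES (1, 'x'), (1, 'x'), (1, 'x'), (2, 'y');
""")

with conn:
    # Keep the lowest rowid in each group of identical rows and delete the rest.
    conn.execute("""
        DELETE FROM MessedUpTable
        WHERE rowid NOT IN (SELECT MIN(rowid) FROM MessedUpTable GROUP BY a, b)
    """)

rows = conn.execute("SELECT a, b FROM MessedUpTable ORDER BY a").fetchall()
print(rows)  # → [(1, 'x'), (2, 'y')]
```

Adding a UNIQUE constraint on (a, b) afterward stops the duplicates from coming back. If other tables have cascading references, the caution about referential actions above still applies to this DELETE.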


URL: https://www.sciencedirect.com/science/article/pii/B9780123820228000156

What constraints are used to define relations in a relational database?

Integrity constraints. In a relational database, integrity constraints are rules that the database enforces to keep data valid and consistent. They support the consistency property of ACID (atomicity, consistency, isolation, durability). Non-relational databases typically offer fewer or weaker built-in integrity constraints.

Which constraints can be used to ensure database integrity?

The most common types of constraints include:
UNIQUE constraints, which ensure that all values in a column are distinct.
NOT NULL constraints, which ensure that a column does not accept null values.
FOREIGN KEY constraints, which ensure that every value in a column matches a primary (or unique) key value in the referenced table.
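Each of these three constraints can be seen rejecting a bad row in a short SQLite sketch run through Python's sqlite3 module. The table names and rows are hypothetical. SQLite needs `PRAGMA foreign_keys = ON` before it enforces FOREIGN KEY clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        email   TEXT UNIQUE,                             -- UNIQUE
        name    TEXT NOT NULL,                           -- NOT NULL
        dept_id INTEGER REFERENCES department(dept_id)   -- FOREIGN KEY
    );
    INSERT INTO department VALUES (1);
    INSERT INTO employee VALUES (1, 'ann@example.com', 'Ann', 1);
""")

def try_insert(row) -> str:
    """Insert one employee row; return 'ok' or the constraint error message."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

print(try_insert((2, 'ann@example.com', 'Bob', 1)))  # duplicate email violates UNIQUE
print(try_insert((3, 'cy@example.com', None, 1)))    # missing name violates NOT NULL
print(try_insert((4, 'di@example.com', 'Di', 99)))   # department 99 violates FOREIGN KEY
```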

Which type of constraint ensures the correct functioning of primary keys in a relational database?

A UNIQUE constraint ensures that every value in a column is different. A PRIMARY KEY constraint adds NOT NULL to that guarantee, so each row/record in a database table is uniquely identified by its primary key.

What are some common database constraints?

DEFAULT Constraint − Provides a default value for a column when none is specified.
UNIQUE Constraint − Ensures that all values in a column are different.
PRIMARY Key − Uniquely identifies each row/record in a database table.
FOREIGN Key − References the primary (or unique) key of a row in another table, so every non-null value must match an existing row there.
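The DEFAULT and PRIMARY KEY entries can be seen in a similar SQLite sketch. A column left out of an INSERT takes its default value, and a second row with an existing primary key value is rejected. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,           -- uniquely identifies each row
        status     TEXT NOT NULL DEFAULT 'open'   -- used when no value is supplied
    )
""")
with conn:
    conn.execute("INSERT INTO account (account_id) VALUES (1)")  # status omitted

print(conn.execute("SELECT status FROM account WHERE account_id = 1").fetchone()[0])  # → open

try:
    with conn:
        conn.execute("INSERT INTO account (account_id, status) VALUES (1, 'closed')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # the duplicate primary key value is refused
```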