{"id":1679,"date":"2022-10-11T11:00:39","date_gmt":"2022-10-11T11:00:39","guid":{"rendered":"https:\/\/blog.amt.in\/?p=1679"},"modified":"2022-10-11T11:00:39","modified_gmt":"2022-10-11T11:00:39","slug":"introduction-to-data-integrity","status":"publish","type":"post","link":"https:\/\/blog.amt.in\/index.php\/2022\/10\/11\/introduction-to-data-integrity\/","title":{"rendered":"Introduction to Data Integrity"},"content":{"rendered":"<p>Data integrity is the maintenance and assurance of the accuracy and consistency of data over its entire life cycle, and is a critical aspect of the design, implementation and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context, even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity. Data integrity is the opposite of data corruption. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities) and, upon later retrieval, ensure the data is the same as it was when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.<\/p>\n<p>Any unintended change to data as the result of a storage, retrieval or processing operation, whether caused by malicious intent, unexpected hardware failure, or human error, is a failure of data integrity. If the change is the result of unauthorized access, it may also be a failure of data security.
Depending on the data involved, this could manifest as anything from something as benign as a single pixel in an image appearing a different color than was originally recorded, to the loss of vacation pictures or a business-critical database, to the catastrophic loss of human life in a life-critical system.<\/p>\n<h3><span id=\"Physical_integrity\" class=\"mw-headline\">Physical integrity:<\/span><\/h3>\n<p>Physical integrity deals with challenges associated with correctly storing and fetching the data itself. Challenges with physical integrity may include electromechanical faults, design flaws, material fatigue, corrosion, power outages, natural disasters, acts of war and terrorism, and other special environmental hazards such as ionizing radiation, extreme temperatures, pressures and g-forces. Ensuring physical integrity includes methods such as redundant hardware, an uninterruptible power supply, certain types of RAID arrays, radiation-hardened chips, error-correcting memory, use of a clustered file system, using file systems that employ block-level checksums such as ZFS, storage arrays that compute parity calculations such as exclusive or (XOR) or use a cryptographic hash function, and even having a watchdog timer on critical subsystems.<\/p>\n<p>Physical integrity often makes extensive use of error-detecting algorithms known as error-correcting codes. Human-induced data integrity errors are often detected through the use of simpler checks and algorithms, such as the Damm algorithm or Luhn algorithm. These are used to maintain data integrity after manual transcription from one computer system to another by a human intermediary (e.g. credit card or bank routing numbers).
Computer-induced transcription errors can be detected through hash functions.<\/p>\n<p>In production systems, these techniques are used together to ensure various degrees of data integrity. For example, a computer file system may be configured on a fault-tolerant RAID array, but might not provide block-level checksums to detect and prevent silent data corruption. As another example, a database management system might be compliant with the ACID properties, but the RAID controller or hard disk drive&#8217;s internal write cache might not be.<\/p>\n<h3><span id=\"Logical_integrity\" class=\"mw-headline\">Logical integrity:<\/span><\/h3>\n<p>This type of integrity is concerned with the correctness or rationality of a piece of data, given a particular context. This includes topics such as referential integrity and entity integrity in a relational database, or correctly ignoring impossible sensor data in robotic systems. These concerns involve ensuring that the data &#8220;makes sense&#8221; given its environment. Challenges include software bugs, design flaws, and human errors. Common methods of ensuring logical integrity include check constraints, foreign key constraints, program assertions, and other run-time sanity checks.<\/p>\n<p>Physical and logical integrity share many common challenges, such as human errors and design flaws, and both must appropriately deal with concurrent requests to record and retrieve data, which is a subject in its own right.<\/p>\n<p>Data integrity also covers guidelines for data retention, specifying or guaranteeing the length of time data can be retained in a particular database.
To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement could cause errors in the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) causes less erroneous data to enter the system. Strict enforcement of data integrity rules results in lower error rates and in time saved troubleshooting and tracing erroneous data and the errors it causes in algorithms.<\/p>\n<p>Data integrity also includes rules defining the relations a piece of data can have to other pieces of data, such as a Customer record being allowed to link to purchased Products, but not to unrelated data such as Corporate Assets. Data integrity often includes checks and correction for invalid data, based on a fixed schema or a predefined set of rules. An example is textual data entered where a date-time value is required. Rules for data derivation are also applicable, specifying how a data value is derived based on algorithm, contributors and conditions. They also specify the conditions under which the data value could be re-derived.<\/p>\n<p>Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.<\/p>\n<ul>\n<li><i>Entity integrity<\/i> concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.<\/li>\n<li><i>Referential integrity<\/i> concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states.
The usual state of affairs is that the foreign-key value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be <a title=\"Null (SQL)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Null_(SQL)\">null<\/a>. In this case, we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.<\/li>\n<li><i>Domain integrity<\/i> specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.<\/li>\n<li><i>User-defined integrity<\/i> refers to a set of rules specified by a user, which do not belong to the entity, domain and referential integrity categories.<\/li>\n<\/ul>\n<p>The above is a brief introduction to data integrity.
Watch this space for more updates on the latest trends in technology.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data integrity is the maintenance of,<\/p>\n","protected":false},"author":1,"featured_media":1681,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[613,918,7],"tags":[615,917,18],"class_list":["post-1679","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-integrity","category-data-validation","category-techtrends","tag-data-integrity","tag-data-validation","tag-technology"],"_links":{"self":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1679","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/comments?post=1679"}],"version-history":[{"count":1,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1679\/revisions"}],"predecessor-version":[{"id":1680,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/posts\/1679\/revisions\/1680"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/media\/1681"}],"wp:attachment":[{"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/media?parent=1679"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/categories?post=1679"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.amt.in\/index.php\/wp-json\/wp\/v2\/tags?post=1679"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}