Entity information life cycle for big data : master data management and information integration / John R. Talburt, Yinle Zhou.
Material type: Text
Publisher: Amsterdam ; Boston : Morgan Kaufmann, an imprint of Elsevier, [2015]
Copyright date: ©2015
Description: xviii, 235 pages : illustrations ; 24 cm
Content type: text
Media type: unmediated
Carrier type: volume
ISBN:
- 9780128005378 (paperback)
- 0128005378 (paperback)
DDC classification: 025 23
LOC classification: HD30.215 .T353 2015
Item type | Current library | Shelving location | Call number | URL | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|---|---|
E-Books | IUT Library | Virtual (E-Books) | | Download | Available | | | |
Includes bibliographical references (pages 219-225) and index.
Machine generated contents note: ch. 1 The Value Proposition for MDM and Big Data -- Definition and Components of MDM -- Master Data as a Category of Data -- Master Data Management -- Entity Resolution -- Entity Identity Information Management -- The Business Case for MDM -- Customer Satisfaction and Entity-Based Data Integration -- Better Service -- Reducing the Cost of Poor Data Quality -- MDM as Part of Data Governance -- Better Security -- Measuring Success -- Dimensions of MDM -- Multi-Domain MDM -- Hierarchical MDM -- Multi-Channel MDM -- Multi-Cultural MDM -- The Challenge of Big Data -- What Is Big Data? -- The Value-Added Proposition of Big Data -- Challenges of Big Data -- MDM and Big Data -- The N-Squared Problem -- Concluding Remarks -- ch. 2 Entity Identity Information and the CSRUD Life Cycle Model -- Entities and Entity References -- The Unique Reference Assumption -- The Problem of Entity Reference Resolution -- The Fundamental Law of Entity Resolution.
Note continued: Internal vs. External View of Identity -- Managing Entity Identity Information -- Entity Identity Integrity -- The Need for Persistent Identifiers -- Entity Identity Information Life Cycle Management Models -- POSMAD Model -- The Loshin Model -- The CSRUD Model -- Concluding Remarks -- ch. 3 A Deep Dive into the Capture Phase -- An Overview of the Capture Phase -- Building the Foundation -- Understanding the Data -- Data Preparation -- Selecting Identity Attributes -- Attribute Uniqueness -- Attribute Entropy -- Attribute Weight -- Assessing ER Results -- Truth Sets -- Benchmarking -- Problem Sets -- The Intersection Matrix -- Measurements of ER Outcomes -- Talburt-Wang Index -- Other Proposed Measures -- Data Matching Strategies -- Attribute-Level Matching -- Reference-Level Matching -- Boolean Rules -- Scoring Rule -- Hybrid Rules -- Cluster-Level Matching -- Implementing the Capture Process -- Concluding Remarks.
Note continued: ch. 4 Store and Share -- Entity Identity Structures -- Entity Identity Information Management Strategies -- Bring-Your-Own-Identifier MDM -- Once-and-Done MDM -- Dedicated MDM Systems -- The Survivor Record Strategy -- Attribute-Based and Record-Based EIS -- ER Algorithms and EIS -- The Identity Knowledge Base -- Storing versus Sharing -- MDM Architectures -- External Reference Architecture -- Registry Architecture -- Reconciliation Engine -- Transaction Hub -- Concluding Remarks -- ch. 5 Update and Dispose Phases -- Ongoing Data Stewardship -- Data Stewardship -- The Automated Update Process -- Clerical Review Indicators -- Pair-Level Review Indicators -- Cluster-Level Review Indicators -- The Manual Update Process -- Asserted Resolution -- Correction Assertions -- Confirmation Assertions -- EIS Visualization Tools -- Assertion Management -- Search Mode -- Negative Resolution Review Mode -- Positive Resolution Review Mode.
Note continued: Managing Entity Identifiers -- The Problem of Association Information Latency -- Models for Identifier Change Management -- Concluding Remarks -- ch. 6 Resolve and Retrieve Phase -- Identity Resolution -- Identity Resolution -- Identity Resolution Access Modes -- Batch Identity Resolution -- Interactive Identity Resolution -- Identity Resolution API -- Confidence Scores -- Depth and Degree of Match -- Match Context -- Confidence Score Model -- Concluding Remarks -- ch. 7 Theoretical Foundations -- The Fellegi-Sunter Theory of Record Linkage -- The Context and Constraints of Record Linkage -- The Fellegi-Sunter Matching Rule -- The Fundamental Fellegi-Sunter Theorem -- Attribute Level Weights and the Scoring Rule -- Frequency-Based Weights and the Scoring Rule -- The Stanford Entity Resolution Framework -- Abstraction of Match and Merge Operations -- The Entity Resolution of a Set of References -- Consistent ER -- The R-Swoosh Algorithm.
Note continued: Entity Identity Information Management -- EIIM and Fellegi-Sunter -- EIIM and the SERF -- Concluding Remarks -- ch. 8 The Nuts and Bolts of Entity Resolution -- The ER Checklist -- Deterministic or Probabilistic? -- Calculating the Weights -- Cluster-to-Cluster Classification -- The Unique Reference Assumption and Transitive Closure -- Selecting an Appropriate Algorithm -- The One-Pass Algorithm -- Concluding Remarks -- ch. 9 Blocking -- Blocking -- Two Causes of Accuracy Loss -- Blocking as Prematching -- Blocking by Match Key -- Match Key and Match Rule Alignment -- The Problem of Similarity Functions -- Dynamic Blocking versus Preresolution Blocking -- Preresolution Blocking with Multiple Match Keys -- Blocking Precision and Recall -- Match Key Blocking for Boolean Rules -- Match Key Blocking for Scoring Rules -- Concluding Remarks -- ch. 10 CSRUD for Big Data -- Large-Scale ER for MDM -- Large-Scale ER with Single Match Key Blocking.
Note continued: The Transitive Closure Problem -- Distributed, Multiple-Index, Record-Based Resolution -- Transitive Closure as a Graph Problem -- References and Match Keys as a Graph -- An Iterative, Nonrecursive Algorithm for Transitive Closure -- Bootstrap Phase: Initial Closure by Match Key Values -- Iteration Phase: Successive Closure by Reference Identifier -- Deduplication Phase: Final Output of Components -- Example of Hadoop Implementation -- ER Using the Null Rule -- The Capture Phase and IKB -- The Identity Update Problem -- Persistent Entity Identifiers -- The Large Component and Big Entity Problems -- Postresolution Transitive Closure -- Incremental Transitive Closure -- The Big Entity Problem -- Identity Capture and Update for Attribute-Based Resolution -- Concluding Remarks -- ch. 11 ISO Data Quality Standards for Master Data -- Background -- Data Quality versus Information Quality -- Relevance to MDM -- Goals and Scope of the ISO 8000-110 Standard.
Note continued: Unambiguous and Portable Data -- The Scope of ISO 8000-110 -- Motivational Example -- Four Major Components of the ISO 8000-110 Standard -- pt. 1 General Requirements -- pt. 2 Syntax of the Message -- pt. 3 Semantic Encoding -- pt. 4 Conformance to Data Specifications -- Simple and Strong Compliance with ISO 8000-110 -- ISO 22745 Industrial Systems and Integration -- Beyond ISO 8000-110 -- pt. 120 Provenance -- pt. 130 Accuracy -- pt. 140 Completeness.
Entity Information Life Cycle for Big Data walks you through the ins and outs of managing entity information so you can successfully achieve master data management (MDM) in the era of big data. This book explains big data's impact on MDM and the critical role of an entity information management system (EIMS) in successful MDM. Expert authors Dr. John R. Talburt and Dr. Yinle Zhou give a thorough background in the principles of managing the entity information life cycle and offer practical tips and techniques for implementing an EIMS, strategies for exploiting distributed processing to handle big data for EIMS, and examples from real applications. Additional material on the theory of entity identity information management (EIIM) and on methods for assessing and evaluating EIMS performance also makes this book appropriate for use as a textbook in courses on entity and identity management, data management, customer relationship management (CRM), and related topics.