Data Cleaning: A Practical Perspective


  • English
  • Paperback
  • 9781608456772
  • 30 September 2013
  • 85 pages

Summary


Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors tend to creep into the data for a variety of reasons, including mistakes during input data collection and mistakes made while merging data collected independently across different databases. These errors often result in erroneous reports downstream of the warehouse and can negatively affect business decisions. One of the critical challenges in maintaining a large data warehouse is therefore ensuring that the quality of its data remains high. The process of maintaining high data quality is commonly referred to as data cleaning.

In this book, we first discuss the goals of data cleaning. These goals are often not well defined and can call for different solutions in different scenarios. To clarify them, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves developing customizable operators that can be used as building blocks for common solutions. This is similar to relational algebra in query processing, where a basic set of operators can be combined to build complex queries. Finally, we discuss the development of custom scripts that combine the basic data cleaning operators with relational operators to implement effective solutions for data cleaning tasks.
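The operator-centric idea described above can be illustrated with a minimal Python sketch. It is not taken from the book: the operator names (`normalize`, `deduplicate`), the `compose` helper, and the sample records are all hypothetical, chosen only to show how small, customizable operators might be chained the way relational operators are combined into a query.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical types for this sketch: a record is a flat dict of strings,
# and an operator maps a list of records to a new list of records.
Record = Dict[str, str]
Operator = Callable[[List[Record]], List[Record]]


def normalize(field: str) -> Operator:
    """Build an operator that lowercases one field and collapses its whitespace."""
    def op(records: List[Record]) -> List[Record]:
        # Build new dicts so the input records are left unchanged.
        return [{**r, field: " ".join(r[field].lower().split())} for r in records]
    return op


def deduplicate(key_fields: Iterable[str]) -> Operator:
    """Build an operator that keeps the first record seen for each key."""
    keys = tuple(key_fields)

    def op(records: List[Record]) -> List[Record]:
        seen = set()
        unique = []
        for r in records:
            key = tuple(r[f] for f in keys)
            if key not in seen:
                seen.add(key)
                unique.append(r)
        return unique
    return op


def compose(*operators: Operator) -> Operator:
    """Chain operators left to right into a single operator."""
    def pipeline(records: List[Record]) -> List[Record]:
        for operator in operators:
            records = operator(records)
        return records
    return pipeline


# A small "script" assembled from the building blocks above.
clean_customers = compose(
    normalize("name"),
    normalize("city"),
    deduplicate(["name", "city"]),
)

raw = [
    {"name": "  Alice  Smith", "city": "Seattle"},
    {"name": "alice smith", "city": "SEATTLE "},
    {"name": "Bob Jones", "city": "Portland"},
]
cleaned = clean_customers(raw)
```

Because every operator has the same input and output shape, operators can be reordered or swapped without changing the rest of the pipeline, which is the property the analogy to relational algebra relies on.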

Product specifications

Contents

Language
en
Binding
Paperback
Original release date
30 September 2013
Number of pages
85
Illustrations
Illustrated

Contributors

Main author
Venkatesh Ganti
Second author
Anish Das Sarma

Other characteristics

Extra large print
No
Product width
191 mm
Product height
6 mm
Product length
235 mm
Textbook
Yes
Package width
191 mm
Package height
235 mm
Package length
4 mm
Package weight
165 g

EAN

EAN
9781608456772

You can find this item in

Book, ebook or audiobook?
Book
Language
English
Textbook or general
Textbooks