The terms “data cleansing” and “data scrubbing” are interchangeable;
both involve detecting and correcting (or removing) corrupt or inaccurate records from a database. Data cleansing services can
transform and combine different data, remove inaccuracies, standardize common values, remove redundancy, parse values and cleanse
corrupt data to create consistent, reliable information.
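As an illustration only, the cleansing steps named above (standardizing common values, parsing values, removing corrupt records and removing redundancy) might be sketched as follows. The field names, the `STATE_ALIASES` table and the sample records are all hypothetical, and the 10-digit phone rule is an assumption made for the example rather than a general standard.

```python
import re

# Hypothetical lookup table used to standardize common values (an assumption).
STATE_ALIASES = {
    "calif.": "CA", "california": "CA", "ca": "CA",
    "n.y.": "NY", "new york": "NY", "ny": "NY",
}

def parse_phone(raw):
    """Parse a phone value into 10 digits; return None if it looks corrupt."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 10 else None

def cleanse(records):
    """Standardize values, drop uncorrectable records, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize whitespace and capitalization in the name.
        name = " ".join(rec.get("name", "").split()).title()
        # Map spelling variants of a state onto one standard code.
        state = STATE_ALIASES.get(rec.get("state", "").strip().lower())
        phone = parse_phone(rec.get("phone"))
        if not name or state is None or phone is None:
            continue  # corrupt record that cannot be corrected: remove it
        key = (name, state, phone)
        if key in seen:
            continue  # redundant entry: already kept an identical record
        seen.add(key)
        cleaned.append({"name": name, "state": state, "phone": phone})
    return cleaned

# Made-up sample data for demonstration.
raw_records = [
    {"name": "  alice   smith ", "state": "Calif.", "phone": "(415) 555-0100"},
    {"name": "Alice Smith", "state": "CA", "phone": "415-555-0100"},
    {"name": "Bob Jones", "state": "N.Y.", "phone": "555-01"},
]
clean_records = cleanse(raw_records)
```

In this sketch the first two records collapse into one after standardization, and the third is dropped because its phone number cannot be parsed. A production cleansing service would more likely try to repair or quarantine such records than silently discard them.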
Data integration
Data integration is the process of combining data from different sources and providing the user with a unified
view of the data. Data cleansing supplements this process.
During the process of data integration, data from multiple sources
are combined into a single data set. Redundant data entries are identified for consolidation or elimination.
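A minimal sketch of this combining step, assuming each source yields records as dictionaries that share a `city` field: the `integrate` function, the field names and the sample values are hypothetical and chosen only to show redundant entries being consolidated into one record.

```python
def integrate(*sources):
    """Combine records from several sources into one data set keyed by city."""
    combined = {}
    for source in sources:
        for rec in source:
            # Normalize the key so "Springfield" and "springfield " match.
            key = rec["city"].strip().lower()
            merged = combined.setdefault(key, {})
            for field, value in rec.items():
                # Consolidate redundant entries; the earliest source wins on conflicts.
                merged.setdefault(field, value)
    return list(combined.values())

# Made-up sample data for demonstration.
weather = [{"city": "Springfield", "avg_temp_c": 12.5}]
census = [
    {"city": "springfield ", "population": 116000},
    {"city": "Shelbyville", "population": 42000},
]
unified = integrate(weather, census)
```

Letting the earliest source win on conflicting fields is just one possible consolidation policy; a real integration pipeline would usually make that rule explicit per field.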
Data integration is essential to business intelligence because it connects the information needed to make strategic decisions across asset types,
provides quick and convenient access to data, improves quality and comprehensiveness of data, promotes consistency and reduces the
cost of data collection, storage and processing. An organization will benefit most from enterprise business intelligence when
it helps users generate concise information from multiple data sources.
Consider a web application where a user can query a variety of information about cities such as crime statistics, weather, hotels,
demographics, etc. Traditionally, all of this information would have to reside in a single database with a single schema. Information of this breadth,
however, is difficult and expensive for a single enterprise to collect. Even if the resources exist to gather the data, the result would likely
duplicate data already held in existing crime databases, weather websites, and census data.
A data integration solution addresses this problem by treating these external resources as materialized views over a virtual mediated schema.
In practice, application developers first construct a schema that best models the kinds of answers their users want.
This virtual schema is called the mediated schema. Next, they design
"wrappers" or adapters for each data source, such as the crime database and weather website. These adapters simply transform the local
query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution.
When an application-user queries the mediated schema, the data integration solution transforms this query into appropriate queries
over the respective data sources. Finally, the results of these queries are combined into the answer to the user's query.
A convenience of this solution is that new sources can be added by simply constructing an adapter for them. This contrasts with ETL systems or a
single-database solution, where the entire new dataset must be manually integrated into the system.
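The mediated-schema approach described above can be sketched roughly as follows. This is a simplified illustration rather than a real integration product: `CRIME_DB` and `WEATHER_SITE` are hypothetical in-memory stand-ins for the external sources, the adapter functions and the `Mediator` class are invented names, and the mediated schema is assumed to be a flat record with `city`, `crime_rate` and `avg_temp_c` fields.

```python
# Hypothetical stand-ins for external sources; a real system would query
# a remote database or website instead of these in-memory structures.
CRIME_DB = {"springfield": {"incidents_per_1000": 31.2}}
WEATHER_SITE = [("Springfield", "12.5C")]

def crime_adapter(city):
    """Translate a mediated-schema query into a crime-database lookup."""
    row = CRIME_DB.get(city.lower())
    return {"crime_rate": row["incidents_per_1000"]} if row else {}

def weather_adapter(city):
    """Translate the query for the weather source and parse its local format."""
    for name, temp in WEATHER_SITE:
        if name.lower() == city.lower():
            return {"avg_temp_c": float(temp.rstrip("C"))}
    return {}

class Mediator:
    """Answers queries over the mediated schema by delegating to adapters."""

    def __init__(self):
        self.adapters = []

    def register(self, adapter):
        # Adding a new source only requires registering one more adapter.
        self.adapters.append(adapter)

    def query(self, city):
        answer = {"city": city}
        for adapter in self.adapters:
            # Each adapter returns its results already in mediated-schema form.
            answer.update(adapter(city))
        return answer

mediator = Mediator()
mediator.register(crime_adapter)
mediator.register(weather_adapter)
result = mediator.query("Springfield")
```

Under these assumptions, supporting a hotel listing would mean writing one more adapter function and passing it to `register`; nothing already registered needs to change.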
Data mining uncovers patterns in data. This process can be carried out using descriptive statistics, data
summarization, and/or predictive techniques. These patterns play a critical role in decision making. Using data mining, companies and organizations
can increase the profitability of their businesses by uncovering opportunities and detecting potential risks. Data mining lies at the
core of business intelligence.
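As a very small illustration of the descriptive-statistics side of data mining, the sketch below summarizes a series and flags values that deviate strongly from the mean, which is one simple way to surface a potential risk or opportunity. The `daily_orders` figures are made up, the function names are hypothetical, and the two-standard-deviation threshold is an arbitrary assumption.

```python
import statistics

def summarize(values):
    """Return a basic descriptive summary of a numeric series."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }

def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up sample data for demonstration.
daily_orders = [100, 102, 98, 101, 99, 100, 160]
summary = summarize(daily_orders)
unusual = flag_outliers(daily_orders)
```

Real data mining work typically goes well beyond this, for example with clustering or predictive models, but the same pattern of summarizing first and then looking for deviations often serves as a starting point.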