Reference *
Microsoft Excel and VBA
Excel is a powerful spreadsheet application that allows you to store, manipulate, analyze, and visualize data. It features an intuitive interface and capable calculation and graphing tools, which have made Excel one of the most popular
microcomputer applications to date. It is overwhelmingly the dominant spreadsheet application on the platforms it supports and has
been so since version 5 in 1993 and its bundling as part of Microsoft Office.
Excel includes Visual Basic for Applications
(VBA), a programming language based on Visual Basic that adds the ability to automate tasks in Excel and to provide user-defined
functions (UDFs) for use in worksheets. VBA is a powerful addition to the application which, in later versions, includes a fully featured
integrated development environment (IDE). Macro recording can produce VBA code replicating user actions, thus allowing simple automation
of regular tasks. VBA allows the creation of forms and in-worksheet controls to communicate with the user. The language supports use
(but not creation) of ActiveX (COM) DLLs; later versions add support for class modules, allowing the use of basic object-oriented
programming (OOP) techniques.
Data Mining
Data mining is the process of automatically searching large volumes of data for patterns. It is usually used
by businesses and other organizations, but is increasingly used in the sciences to extract information from the enormous data sets
generated by modern experimentation.
Although data mining is a relatively new term, the technology is not. Companies have long
used powerful computers to sift through volumes of data, such as supermarket scanner data, and produce market research reports.
Continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy
and usefulness of analysis.
Data mining identifies trends within data that go beyond simple analysis. Through the use of sophisticated
algorithms, users have the ability to identify key attributes of business processes and target opportunities.
The term data mining
is often applied to two separate processes: knowledge discovery and prediction. Knowledge discovery provides explicit
information that has a readable form and can be understood by a user. Forecasting, or predictive modeling, provides predictions of
future events and may be transparent and readable in some approaches (e.g. rule-based systems) and opaque in others, such as neural
networks. Moreover, some data mining systems, such as neural networks, are inherently geared towards prediction rather than knowledge
discovery.
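As an illustration of knowledge discovery that yields readable output, the sketch below counts which items tend to appear together in shopping baskets and reports simple "if A then B" rules with their support and confidence. This pair-counting approach is an assumption chosen for illustration, not a method named above, and the basket data and the function name pair_rules are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanner data: each set is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence: of the baskets with the antecedent, the share that also has the consequent.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Each rule can be read directly by a user, which is the sense in which such output is transparent; a neural network trained on the same baskets would offer predictions without this kind of explicit statement.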
Forecasting Analysis
Forecasting is the process of estimation in unknown situations. Prediction is a similar but more general term, and usually refers to estimation of time-series, cross-sectional, or longitudinal data. Forecasting is commonly used in discussion of time-series data.
Time series methods use historical data as the basis for estimating future outcomes.
• Moving average
• Exponential smoothing
• Extrapolation
• Linear prediction
• Trend estimation
• Growth curve
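The first two methods in the list can be sketched in a few lines. The example below is a minimal illustration, assuming a short hypothetical sales series and one-step-ahead forecasts; the function names, the window size, and the smoothing factor alpha are choices made for the example only.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    Recent observations get more weight; alpha close to 1 reacts quickly,
    alpha close to 0 smooths heavily.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical historical data
print(moving_average_forecast(sales, window=2))
print(exponential_smoothing_forecast(sales, alpha=0.5))
```

Both forecasts lag behind the steadily rising series, which is why trend estimation or extrapolation is usually preferred when the history shows a clear trend.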
Some forecasting methods use the assumption that it is possible to identify the underlying factors that might influence the variable
that is being forecasted. For example, sales of umbrellas might be associated with weather conditions. If the causes are understood,
projections of the influencing variables can be made and used in the forecast.
• Regression analysis using linear regression or non-linear regression
• Autoregressive moving average (ARMA)
• Autoregressive integrated moving average (ARIMA) e.g. Box-Jenkins
• Econometrics
In statistics, regression analysis is the process used to estimate the parameter values of a function, in which the function predicts the value of
a response variable in terms of the values of other variables. Many methods have been developed to fit functions, and these methods
typically depend on the type of function being used.
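For the simplest case, a straight line fitted by ordinary least squares, the parameter estimation can be sketched as follows. The umbrella figures are hypothetical and deliberately lie on an exact line so the result is easy to check; fit_line is a name chosen for the example.

```python
def fit_line(xs, ys):
    """Estimate intercept and slope of y = intercept + slope * x by least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: rainy days per month and umbrellas sold.
rainy_days = [2, 4, 6, 8]
umbrella_sales = [25, 45, 65, 85]

intercept, slope = fit_line(rainy_days, umbrella_sales)
forecast = intercept + slope * 10  # projected sales for a month with 10 rainy days
print(intercept, slope, forecast)
```

The forecast depends on a projection of the influencing variable (here, the expected number of rainy days), which is the point made in the paragraph on causal methods.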
An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series. The model is generally referred to as an ARIMA(p,d,q) model, where p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model, respectively.
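The "integrated" order d corresponds to differencing: an ARIMA(p,d,q) model is an ARMA(p,q) model applied to the series after it has been differenced d times. The sketch below shows only that differencing step and how forecasts of differences are turned back into levels; estimating the autoregressive and moving average coefficients is left out, and the series and function names are hypothetical.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' part of ARIMA)."""
    result = list(series)
    for _ in range(d):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


def undifference(last_value, diff_forecasts):
    """Invert one level of differencing by cumulatively adding forecast steps."""
    levels = []
    level = last_value
    for step in diff_forecasts:
        level += step
        levels.append(level)
    return levels


series = [3, 5, 8, 12, 17]  # hypothetical series with a curved trend

once = difference(series, d=1)   # removes a constant level
twice = difference(series, d=2)  # also removes a linear trend in the differences
print(once, twice)

# If the first differences are forecast to continue as 6 and 7,
# the implied forecasts for the original series follow by summing.
print(undifference(series[-1], [6, 7]))
```

Each round of differencing shortens the series by one observation, so d is normally kept small.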
Database Marketing
Database marketing is a form of direct marketing using databases of customers or potential customers to generate
personalized communications in order to promote a product or service for marketing purposes. The method of communication can be any
addressable medium, as in direct marketing.
The distinction between direct and database marketing stems primarily from the attention
paid to the analysis of data. Database marketing emphasizes the use of statistical techniques to develop models of customer behavior,
which are then used to select customers for communications. As a consequence, database marketers also tend to be heavy users of data
warehouses, because having a greater amount of data about customers increases the likelihood that a more accurate model can be built.
Data Cleansing
Data cleansing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have
been caused by different data dictionary definitions of similar entities in different stores, by user
entry errors, or by corruption in transmission or storage.
Preprocessing the data also helps ensure that it is unambiguous,
correct, and complete.
The actual process of data cleansing may involve removing typos or validating and correcting values against
a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid ZIP code) or fuzzy
(such as correcting records that partially match existing, known records).
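The difference between strict and fuzzy validation can be sketched as follows. This is a minimal illustration under stated assumptions: the strict check only tests the US ZIP or ZIP+4 format (it does not confirm that the code exists), the list of known cities is hypothetical, and the similarity cutoff of 0.8 is an arbitrary choice.

```python
import difflib
import re

# Strict rule: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

# Hypothetical reference list of known entities.
KNOWN_CITIES = ["Chicago", "Houston", "Phoenix"]


def has_valid_zip(record):
    """Strict validation: reject any record whose ZIP field is not well formed."""
    return bool(ZIP_PATTERN.fullmatch(record.get("zip") or ""))


def correct_city(name, known=KNOWN_CITIES, cutoff=0.8):
    """Fuzzy validation: return the closest known city, or None if nothing is close."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


records = [
    {"city": "Chcago", "zip": "60601"},
    {"city": "Houston", "zip": "7700"},
]

# Keep only records that pass the strict check, then repair near-miss city names.
cleaned = [
    {**r, "city": correct_city(r["city"]) or r["city"]}
    for r in records
    if has_valid_zip(r)
]
print(cleaned)
```

A lower cutoff corrects more typos but raises the risk of matching a record to the wrong entity, so the threshold is usually tuned against a sample of known errors.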
Data cleansing is synonymous with the less frequently used
term data scrubbing. Data cleansing differs from data validation in that validation is performed at entry time and almost invariably means
that data is rejected from the system at entry, whereas cleansing is performed on batches of data.
Data Integration
Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. This important problem emerges in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories). Data integration appears with increasing frequency as the volume of existing data and the need to share it explode. It has been the focus of extensive theoretical work, and numerous open problems remain to be solved. In practice, data integration is frequently called Enterprise Information Integration.
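A very small instance of a unified view can be sketched as follows, assuming two hypothetical sources (a CRM export and a billing export) that describe the same customers under different field names. The field names, the choice of e-mail address as the matching key, and the function name unify are all assumptions made for the example; real integration problems rarely have such a clean shared key.

```python
def unify(crm_rows, billing_rows):
    """Merge two sources into one view keyed on a normalised e-mail address."""
    unified = {}
    for row in crm_rows:
        key = row["email"].strip().lower()
        unified[key] = {"email": key, "name": row["full_name"], "balance": None}
    for row in billing_rows:
        key = row["Email"].strip().lower()  # same attribute, different field name
        entry = unified.setdefault(key, {"email": key, "name": None, "balance": None})
        entry["balance"] = row["amount_due"]
    return sorted(unified.values(), key=lambda r: r["email"])


crm = [
    {"email": "Ann@Example.com", "full_name": "Ann Lee"},
    {"email": "bob@example.com", "full_name": "Bob Ray"},
]
billing = [
    {"Email": "ann@example.com ", "amount_due": 120.0},
    {"Email": "cat@example.com", "amount_due": 30.0},
]

view = unify(crm, billing)
for customer in view:
    print(customer)
```

Customers known to only one source still appear in the view, with the missing attributes left empty rather than guessed.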
* Information obtained from en.wikipedia.org.