• Blog Home
  • Tech Talk
    • Best Practices
    • Java
    • .NET
    • Mobile
    • UI/ UX
    • Systems Engineering
    • Quality Assurance
  • ClubM

Sign in

  • Mazarin Corporate Site »
Mazarin Blog
stay connected
Join us on Facebbook! Follow Us on Twitter! Subscribe to our RSS Feed!
Feb
15
2017
.NET // Tech Talk

Data Mining using SQL Server Analysis Server

Author Article by Shamaran Satkunanathan    Comments No Comments

Data mining is the process of discovering actionable information from large sets of data. It helps organizations analyze incredible amounts of data in order to detect common patterns or learn new things.

In today’s world, software applications are moving from traditional information systems to Business Intelligent systems. The growing of data and information have within data need to raise to develop new kind of applications for the organizations. To address this, data mining solutions have become an integral part of many software solutions.

Data mining is the process of discovering actionable information from large sets of data.  It helps organizations analyze incredible amounts of data in order to detect common patterns or learn new things. It uses mathematical analysis to derive patterns and trends from existing data. However existing data need to be organized via ETL (Extract, Transform, Loading) process before applying the mining technique on them. This is because typically these patterns cannot be discovered by traditional data exploration methods since the relationships are too complex or because there is too much data.

There are many data mining techniques available to analyze data and drive the useful knowledge and patterns from those. These techniques are ranged from extremely complex to basic. Each technique serves a slightly different purpose or goal. Here are few example approaches to data mining.

Clustering   
Cluster detection is a type of pattern recognition that is used to detect patterns within large data sets. It’s a bit like arranging a large amount of information into categories using patterns which emerge during data analysis.

Classification
Classification Analysis is a systematic process for obtaining important and relevant information about data, and metadata – data about data. The classification analysis helps identifying to which set of categories different types of data belong. Classification analysis is closely linked to cluster analysis as the classification can be used to cluster data.

Regression
Regression is a technique that aims to predict future outcomes using large sets of existing variables. This is used to predict future user engagement, customer retention and even property prices.

Anomaly or Outlier Detection
Anomaly detection refers to the search for data items in a dataset that do not match a projected pattern or expected behaviour. Anomalies are also called outliers, exceptions, surprises or contaminants and they often provide critical and actionable information.

Association Rule Learning
Association rule learning enables the discovery of interesting relations between different variables in large databases. Association rule learning uncovers hidden patterns in the data that can be used to identify variables within the data and the co-occurrences of different variables that appear with the greatest frequencies.

 

Microsoft SQL Sever Data tools
There are many data mining tools available to apply the data and predict. Here we are going to talk about Microsoft SQL Server Data tools and how its features support to build & deploy a mining model.

The Microsoft SQL Server Data tools includes SQL server relational databases, Azure SQL databases, Integration Services packages, Analysis Services data models and reporting Services. The analysis server contains data mining algorithms and query tools that make it easy to build a comprehensive solution for a variety of projects. SQL Server Management Studio, contains tools for browsing models and managing data mining objects.

Developing a mining application generally takes following steps

  1. Defining the problem
  2. Preparing Data
  3. Exploring Data
  4. Building Models
  5. Exploring and Validating Models
  6. Deploying and updating Models

Following diagram explains the cyclical flow of creating a data mining model. Each step in the process needs to be repeated many times in order to create a decent model.

Deploying and updating Models

Image source: https://i-msdn.sec.s-msft.com/dynimg/IC125015.jpeg

 

Defining the Problem
First step is to determine the scope of the problem and analyzing the business requirements to defining specific goals for the data mining project. Here we might need to conduct a data availability study.

Preparing Data
In this step, we are working with a very huge data set and cannot examine every transaction for data quality; therefore, we might need data profiling and automated data cleansing and filtering tools, such as Microsoft SQL Server Master Data Services or SQL Server Data Quality Services to explore the data and find the inconsistencies.

Exploring Data
In this step, we can use tools such as Master Data Services to canvass available sources of data and define their availability for data mining. We can use tools such as SQL Server Data Quality Services, or the Data Profiler in Integration Services, to analyze the distribution of data and repair issues such as incorrect or missing data.

Building Models
In this step, we state the columns of data which we want to use by creating a mining structure. When we process the mining structure, Analysis Services produces aggregates and other statistical information that can be used for analysis. This information can be used by any mining model that is based on the structure.

Processing a model is called as training. Applying a specific mathematical algorithm to the data in the structure is the process of training. By using training, we can extract patterns. The patterns that we find in the training process depend on the following three points

  • Selection of training data,
  • The algorithm we chose,
  • How we have configured the algorithm.

We can also use parameters to adjust each algorithm and apply filters to the training data to use just a subset of the data. After data is passed through the model, the mining model object holds summaries and patterns that can be queried or used for prediction.

Exploring and Validating Models
By using Analysis Services we can distinct data into training and testing a dataset so that we can precisely assess the performance of all models on the same data. We use the training dataset to build the model and the testing dataset to test the accuracy of the model by creating prediction queries.

We can explore the trends and patterns that the algorithms discover by using the viewers in Data Mining Designer in SQL Server Data Tools. We can also test how well the models create predictions by using tools in the designer such as the lift chart and classification matrix. To verify whether the model is specific to our data or might be used to make inferences on the general population, we can use the statistical technique called cross-validation to automatically create subsets of the data and test the model against each subset.

Deploying and Updating Models
Here we can use the use the models to create predictions, which we can then use to make business decisions. SQL Server provides the DMX language that we can use to create prediction queries, and Prediction Query Builder to help you build the queries.

 

References

  • http://charc-concepts.org/the-benefits-of-data-mining/
  • https://msdn.microsoft.com/en-us/library/bb522607.aspx
  • https://msdn.microsoft.com/en-us/library/bb510516.aspx

 

Authors

  • Shamaran Satkunanathan
  • Amila Basnayaka

 

Related Post

Moving towards Microservices – Challenges and Common Pitfalls
5 Principles: How To Use Lean Startup Towards A Successful Project
Mazarin Christmas Celebrations 2014
Mazarin Foodies 2014
Mazarin Aurudu Ulela 2015
Learn Cucumber Test Automation with Ruby Core Framework
Mazarin Kite Masters embellished Colombo Galle Face Sky
MS SQL Server BI (Business Intelligence)
Tags: BI, Data Mining, Mazarin, Microsoft SQL Server Data Tools, SQL Server Analysis Server
Did you enjoy reading this article? Share it! Share on Facebook Tweet this! Bookmark on Delicious StumbleUpon Digg This!

Related Posts

  • Serverless Architecture with AWS Lambda
  • Let’s move to NoSQL Databases with MongoDB – Mazarin
  • Without Redux and with Redux application state behaviorProductive Development With React Redux
  • Elements of CultureCompany Culture
avatar

About the Author: Satkunanathan Shamaran

Leave a comment

Click here to cancel reply.

CAPTCHA
Refresh

*

Follow Us on Twitter!

Related Post

Sass and LESS: An Introduction to CSS Preprocessor...
Azure Functions – Learn more about it
Firebase – Mobile Application Development &#...
Serverless Architecture with AWS Lambda
Let’s move to NoSQL Databases with MongoDB &...
Productive Development With React Redux
Beginners’ Guide to CSS (CSS for dummies)
Company Culture
What is Docker ? Getting Started with Docker
Hybrid Mobile App Development with Ionic and Angul...
Test Automation of Mobile Applications using Appiu...
What Power BI Can Do – Major Benefits
Learn Cucumber Test Automation with Ruby Core Fram...
How to Succeed With Designing Scalable Web Apps
Importance of Big Data and Managing Data with Elas...
An Introduction to Node.js – Kickstarter
MS SQL Server BI (Business Intelligence)
How To Start Cloud Computing with AWS
What is NFC – The Ultimate Guide
5 Principles: How To Use Lean Startup Towards A Su...
Avatars by Sterling Adventures

Team Mazarin

A team of individuals dedicated to share common goals and vision of the company. Mazarin's endowed team consists of Managers, Software Engineers, User Interface Engineers, Business Analysts, Finance and Administration. We are a blend of quality people. We strive to maintain the open culture and work in close association. The way we work enables everyone to contribute while feeling contented sharing opinions and ideas to deliver the best software solutions.

Read More

Mazarin © 2023. All Rights Reserved.