Course Orientation for Big Data Mining
Tamkang University
Big Data Mining
1042DM01
MI4 (M2244) (3094)
Tue, 3, 4 (10:10-12:00) (B216)
Min-Yuh Day (戴敏育)
Assistant Professor
Dept. of Information Management, Tamkang University
http://mail.tku.edu.tw/myday/
2016-02-16
1
Tamkang University, Academic Year 104, Second Semester
Course Teaching Plan
Spring 2016 (2016.02 - 2016.06)
• Course title: Big Data Mining
• Instructor: Min-Yuh Day (戴敏育)
• Class: Information Management, 4th year, Section P (TLMXB4P) (M2244) (3094)
• Course type: Elective, one semester, 2 credits
• Time: Tue 3, 4 (10:10-12:00)
• Classroom: B216
2
Course Introduction
• This course introduces the fundamental concepts and
application technologies of big data mining.
• Topics include
– Big Data Mining
– Fundamental Big Data:
MapReduce Paradigm, Hadoop and Spark Ecosystem
– Association Analysis
– Classification and Prediction
– Cluster Analysis
– Data Mining Using SAS Enterprise Miner (SAS EM)
– Case Study and Implementation of Big Data Mining
– Deep Learning with Google TensorFlow
4
Objective
• Understand and apply the fundamental
concepts and technology of big data mining.
5
Syllabus
Week  Date        Subject/Topics
1     2016/02/16  Course Orientation for Big Data Mining
2     2016/02/23  Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem
3     2016/03/01  Association Analysis
4     2016/03/08  Classification and Prediction
5     2016/03/15  Cluster Analysis
6     2016/03/22  Case Study 1 (Cluster Analysis: K-Means using SAS EM)
7     2016/03/29  Case Study 2 (Association Analysis using SAS EM)
6
Syllabus
Week  Date        Subject/Topics
8     2016/04/05  Off-campus study
9     2016/04/12  Midterm Project Presentation
10    2016/04/19  Midterm Exam Week
11    2016/04/26  Case Study 3 (Decision Tree, Model Evaluation using SAS EM)
12    2016/05/03  Case Study 4 (Regression Analysis, Artificial Neural Network using SAS EM)
13    2016/05/10  Deep Learning with Google TensorFlow
14    2016/05/17  Final Project Presentation
15    2016/05/24  Final Exam (graduating class)
7
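Teaching Methods and Evaluation Methods
• Teaching methods
– Lecture, discussion, appreciation, simulation, hands-on practice, problem solving
• Evaluation methods
– Written tests, hands-on practice, reports, class performance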
8
Textbooks and References
• Textbooks
– Lecture slides
– 資料採礦運用:以SAS Enterprise Miner為工具 (Data Mining Applications: Using SAS Enterprise Miner as the Tool),
李淑娟, 2015, SAS (賽仕電腦軟體)
• Reference books
– Big Data, Data Mining, and Machine Learning: Value Creation for
Business Leaders and Practitioners, Jared Dean, Wiley, 2014
– Data Science for Business: What You Need to Know about Data
Mining and Data-Analytic Thinking, Foster Provost and Tom Fawcett,
O'Reilly, 2013
– Applied Analytics Using SAS Enterprise Miner, Jim Georges, Jeff
Thompson and Chip Wells, SAS, 2010
– Data Mining: Concepts and Techniques, Third Edition, Jiawei Han,
Micheline Kamber and Jian Pei, Morgan Kaufmann, 2011
9
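Assignments and Semester Grading
• Number of assignments
– 3
• Semester grade calculation
– Midterm evaluation: 30%
– Final evaluation: 30%
– Other (class participation and performance in report discussions): 40%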
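The grade weighting on this slide is a simple weighted sum. The sketch below only illustrates the arithmetic: the weights come from the slide, while the function name, the 0-100 scale, and the sample scores are assumptions made for this example.

```python
# Weights taken from the grading slide (30% / 30% / 40%).
WEIGHTS = {"midterm": 0.30, "final": 0.30, "other": 0.40}


def semester_grade(scores):
    """Return the weighted semester grade for component scores (assumed 0-100)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly: " + ", ".join(WEIGHTS))
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


# Hypothetical scores, invented for illustration only.
example = {"midterm": 80, "final": 90, "other": 85}
print(round(semester_grade(example), 2))
```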
10
Team Term Project
• Term Project Topics
– Big Data mining
– Web and Text mining
– Business Intelligence
– Big Data Analytics
– Social Computing
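• Teams of 3-4 students
– Submit the team roster at the end of class on Tuesday, 2016/02/23
– The class representative collects and coordinates the team rosters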
11
2016/02/23
Fundamental Big Data:
MapReduce Paradigm,
Hadoop and Spark Ecosystem
12
2016/05/10
Deep Learning with Google TensorFlow
13
Big Data Analytics and Data Mining
14
Stephan Kudyba (2014),
Big Data, Mining, and Analytics:
Components of Strategic Decision Making, Auerbach Publications
Source: http://www.amazon.com/gp/product/1466568704
15
Architecture of Big Data Analytics
Big Data Sources
* Internal
* External
* Multiple formats
* Multiple locations
* Multiple applications
Big Data Transformation
* Middleware (Raw Data, Transformed Data)
* Extract, Transform, Load
* Data Warehouse
* Traditional Format: CSV, Tables
Big Data Platforms & Tools
* Hadoop, MapReduce, Pig, Hive, Jaql, Zookeeper, Hbase, Cassandra, Oozie, Avro, Mahout, Others
Big Data Analytics Applications
* Queries
* Reports
* OLAP
* Data Mining
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
16
Architecture of Big Data Analytics
(Same diagram as the previous slide, with an added "Data Mining" callout.)
Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
17
Social Big Data Mining
(Hiroshi Ishikawa, 2015)
Source: http://www.amazon.com/Social-Data-Mining-Hiroshi-Ishikawa/dp/149871093X
18
Architecture for Social Big Data Mining
(Hiroshi Ishikawa, 2015)
Enabling Technologies
• Integrated analysis model
• Natural Language Processing
• Information Extraction
• Anomaly Detection
• Discovery of relationships among heterogeneous data
• Large-scale visualization
• Parallel distributed processing
Analysts
• Model Construction
• Explanation by Model
• Construction and confirmation of individual hypothesis
• Description and execution of application-specific task
Conceptual Layer: Integrated analysis
Logical Layer: Data Mining, Multivariate analysis, Application specific task
Physical Layer: Software, Hardware, Social Data
Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press
19
Business Intelligence (BI) Infrastructure
Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.
20
Data Warehouse
Data Mining and Business Intelligence
Layers, top to bottom (potential to support business decisions increases toward the top):
• Decision Making
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA
Source: Jiawei Han and Micheline Kamber (2006), Data Mining: Concepts and Techniques, Second Edition, Elsevier
21
The Evolution of BI Capabilities
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
22
Data Mining
Source: http://www.amazon.com/Data-Mining-Concepts-Techniques-Management/dp/0123814790
23
郝沛毅, 李御璽, 黃嘉彥 (translators and editors), 資料探勘
(Chinese edition of Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3/e),
高立圖書, 2014
Source: http://www.books.com.tw/products/0010646676
24
Data Mining at the Intersection of Many Disciplines
DATA MINING sits at the intersection of:
• Statistics
• Artificial Intelligence
• Pattern Recognition
• Mathematical Modeling
• Machine Learning
• Databases
• Management Science & Information Systems
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
25
Knowledge Discovery (KDD) Process
Data mining: core of knowledge discovery process
Steps, from source data to result:
1. Databases
2. Data Cleaning and Data Integration
3. Data Warehouse
4. Selection
5. Task-relevant Data
6. Data Mining
7. Pattern Evaluation
Source: Han & Kamber (2006)
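The KDD stages named on this slide can be read as a chain of small functions. The following is a minimal sketch under stated assumptions: the records, field names, and the frequency-count "mining" step are invented for illustration and are not taken from Han & Kamber.

```python
from collections import Counter

# Two hypothetical source "databases" (records invented for illustration).
store_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
store_b = [
    {"customer": "c1", "item": "bread"},
    {"customer": "c3", "item": "milk"},
    {"customer": "c4", "item": "milk"},
]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: merge the cleaned sources into one 'warehouse' list."""
    return [record for source in sources for record in source]


def select(warehouse, field):
    """Selection: keep only the task-relevant attribute."""
    return [record[field] for record in warehouse]


def mine(values):
    """Data mining: here, simply count how often each value occurs."""
    return Counter(values)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a minimum count."""
    return {p: n for p, n in patterns.items() if n >= min_count}


warehouse = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(select(warehouse, "item")), min_count=2)
print(patterns)
```

A real mining step would replace the counter with an algorithm such as a decision tree or a clustering method; the surrounding stages stay the same.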
26
A Taxonomy for Data Mining Tasks
Data Mining | Learning Method | Popular Algorithms
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
– Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
– Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
– Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
– Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
– Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
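The Apriori algorithm appears several times in this taxonomy. As a rough illustration of its level-wise idea (count candidates, keep the frequent ones, join them into larger candidates, prune any candidate with an infrequent subset), here is a small pure-Python sketch. The function name, the basket data, and the use of an absolute support count are assumptions for this example, not material from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Association rules would then be derived from these frequent itemsets by comparing support counts; that step is omitted here.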
27
Source: http://www.amazon.com/Data-Mining-Machine-Learning-Practitioners/dp/1118618041
28
Deep Learning
Intelligence from Big Data
Source: https://www.vlab.org/events/deep-learning/
29
Source: http://www.amazon.com/Big-Data-Analytics-Turning-Money/dp/1118147596
30
Source: http://www.amazon.com/Big-Data-Revolution-Transform-Mayer-Schonberger/dp/B00D81X2YE
31
Source: https://www.thalesgroup.com/en/worldwide/big-data/big-data-big-analytics-visual-analytics-what-does-it-all-mean
32
Big Data with Hadoop Architecture
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
33
Big Data with Hadoop Architecture
Logical Architecture
Processing: MapReduce
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
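To make the MapReduce processing model concrete, the sketch below imitates its three phases (map, shuffle, reduce) in plain Python on a word-count task. It runs in a single process and does not use Hadoop; the function names and the input strings are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, then shuffle, then reduce each key."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())


# Hypothetical input splits, invented for illustration.
splits = ["big data mining", "Big Data analytics", "data mining"]
print(word_count(splits))
```

In a real cluster the map and reduce calls run in parallel on different nodes, and the framework performs the shuffle over the network.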
34
Big Data with Hadoop Architecture
Logical Architecture
Storage: HDFS
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
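HDFS stores a file as fixed-size blocks and keeps several replicas of each block. The arithmetic can be sketched as follows; the 128 MB block size and replication factor of 3 are assumed here as commonly used defaults and are not stated on the slide, and the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: a commonly used HDFS default block size
REPLICATION = 3      # assumption: a commonly used HDFS default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies as much space as the data it holds.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


print(hdfs_footprint(1000))
```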
35
Big Data with Hadoop Architecture
Process Flow
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
36
Big Data with Hadoop Architecture
Hadoop Cluster
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
37
Traditional ETL Architecture
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
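As a small-scale illustration of the extract, transform, load pattern, the sketch below parses raw CSV text, cleans it, and loads it into an in-memory SQLite table that stands in for a data warehouse. The column names, the sample rows, and the cleaning rules are all invented for this example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; column names and values are invented.
RAW_CSV = """order_id,amount,country
1, 19.90 ,tw
2,,us
3,5.00,TW
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize types and letter case."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["country"].strip().upper())
        for r in rows
        if r["amount"].strip()
    ]


def load(rows, connection):
    """Load: write the cleaned rows into a warehouse table."""
    connection.execute(
        "CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
totals = warehouse.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country"
).fetchall()
print(totals)
```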
38
Offload ETL with Hadoop
(Big Data Architecture)
Source: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
39
Big Data Solution
Source: http://www.newera-technologies.com/big-data-solution.html
40
HDP
A Complete Enterprise Hadoop Data Platform
Source: http://hortonworks.com/hdp/
41
Spark and Hadoop
Source: http://spark.apache.org/
42
Spark Ecosystem
Source: http://spark.apache.org/
43
Python for Big Data Analytics
(The column on the left is the 2015 ranking; the column on the right is the 2014 ranking for comparison.)
Source: http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
44
Source: http://www.kdnuggets.com/2015/05/poll-r-rapidminer-python-big-data-spark.html
45
Yves Hilpisch,
Python for Finance: Analyze Big Financial Data,
O'Reilly, 2014
Source: http://www.amazon.com/Python-Finance-Analyze-Financial-Data/dp/1491945281
46
Business Insights with Social Analytics
47
Analyzing the Social Web:
Social Network Analysis
48
Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann
Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311
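One of the simplest measures in social network analysis is degree centrality: a node's number of connections divided by the maximum possible, n - 1. The sketch below computes it for an undirected friendship graph; the function name and the edge list are invented for illustration and are not taken from Golbeck's book.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical friendship ties, invented for illustration.
friendships = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(friendships)
for person in sorted(centrality, key=centrality.get, reverse=True):
    print(person, round(centrality[person], 2))
```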
49
Mining the Social Web:
Analyzing Data from Facebook, Twitter,
LinkedIn, and Other Social Media Sites
Source: http://www.amazon.com/Mining-Social-Web-Analyzing-Facebook/dp/1449388345
50
Web Mining Success Stories
• Amazon.com, Ask.com, Scholastic.com, …
• Website Optimization Ecosystem
– Customer Interaction on the Web
– Analysis of Interactions: Web Analytics, Voice of Customer, Customer Experience Management
– Knowledge about the Holistic View of the Customer
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
51
Business Intelligence Trends
1. Agile Information Management (IM)
2. Cloud Business Intelligence (BI)
3. Mobile Business Intelligence (BI)
4. Analytics
5. Big Data
Source: http://www.businessspectator.com.au/article/2013/1/22/technology/five-business-intelligence-trends-2013
52
Business Intelligence Trends:
Computing and Service
• Cloud Computing and Service
• Mobile Computing and Service
• Social Computing and Service
53
Business Intelligence and Analytics
• Business Intelligence 2.0 (BI 2.0)
– Web Intelligence
– Web Analytics
– Web 2.0
– Social Networking and Microblogging sites
• Data Trends
– Big Data
• Platform Technology Trends
– Cloud computing platform
Source: Lim, E. P., Chen, H., & Chen, G. (2013). Business Intelligence and Analytics: Research Directions.
ACM Transactions on Management Information Systems (TMIS), 3(4), 17
54
Business Intelligence and Analytics:
Research Directions
1. Big Data Analytics
– Data analytics using Hadoop / MapReduce
framework
2. Text Analytics
– From Information Extraction to Question Answering
– From Sentiment Analysis to Opinion Mining
3. Network Analysis
– Link mining
– Community Detection
– Social Recommendation
Source: Lim, E. P., Chen, H., & Chen, G. (2013). Business Intelligence and Analytics: Research Directions.
ACM Transactions on Management Information Systems (TMIS), 3(4), 17
55
Source: Davenport, T. H., & Patil, D. J. (2012). Data Scientist. Harvard business review
56
The 5th SAS Big Data Data Scientist Competition
Text Analytics and Digital Marketing Contest
http://saschampion.com.tw/
58
Summary
• This course introduces the fundamental concepts and
application technologies of big data mining.
• Topics include
– Big Data Mining
– Fundamental Big Data:
MapReduce Paradigm, Hadoop and Spark Ecosystem
– Association Analysis
– Classification and Prediction
– Cluster Analysis
– Data Mining Using SAS Enterprise Miner (SAS EM)
– Case Study and Implementation of Big Data Mining
– Deep Learning with Google TensorFlow
59
Contact Information
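Min-Yuh Day, Ph.D. (戴敏育 博士)
Assistant Professor
Dept. of Information Management, Tamkang University
Tel: 02-26215656 #2846
Fax: 02-26209737
Office: B929
Address: No. 151, Yingzhuan Rd., Tamsui Dist., New Taipei City 25137, Taiwan
Email: [email protected]
Website: http://mail.tku.edu.tw/myday/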
60