The 2nd edition of the book (v2.1)
The following is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.
Each chapter is accompanied by a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.
Chapter | Title | Book | Slides | Videos |
---|---|---|---|---|
Preface and Table of Contents | | | | |
Chapter 1 | Data Mining | | PPT | |
Chapter 2 | Map-Reduce and the New Software Stack | | PPT | 1, 2, 3, 4, 5, 6, 7, 8 |
Chapter 3 | Finding Similar Items | | PPT | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 |
Chapter 4 | Mining Data Streams | | Part 1, Part 2: PDF, PPT, PPT | 1, 2, 3, 4, 5 |
Chapter 5 | Link Analysis | | Part 1, Part 2: PDF, PPT, PPT | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 |
Chapter 6 | Frequent Itemsets | | PPT | 1, 2, 3, 4 |
Chapter 7 | Clustering | | PPT | 1, 2, 3, 4, 5 |
Chapter 8 | Advertising on the Web | | PPT | 1, 2, 3, 4 |
Chapter 9 | Recommendation Systems | | Part 1, Part 2: PDF, PPT, PPT | 1, 2, 3, 4, 5 |
Chapter 10 | Mining Social-Network Graphs | | Part 1, Part 2: PDF, PPT, PPT | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
Chapter 11 | Dimensionality Reduction | | PPT | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
Chapter 12 | Large-Scale Machine Learning | | Part 1, Part 2: PDF, PPT, PPT | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 |
Index | | | | |
Errata | | HTML | | |
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Source: Mining of Massive Datasets