Mining of Massive Datasets

The 2nd edition of the book (v2.1)

The following is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.

Each chapter is also accompanied by a set of lecture slides that we use to teach the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.

 

Chapter Title Book Slides Videos
Preface and Table of Contents PDF
Chapter 1 Data Mining PDF PDF PPT
Chapter 2 Map-Reduce and the New Software Stack PDF PDF PPT 1 2 3 4 5 6 7 8
Chapter 3 Finding Similar Items PDF PDF PPT 1 2 3 4 5 6 7 8 9 10 11 12 13
Chapter 4 Mining Data Streams PDF Part 1: PDF PPT; Part 2: PDF PPT 1 2 3 4 5
Chapter 5 Link Analysis PDF Part 1: PDF PPT; Part 2: PDF PPT 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Chapter 6 Frequent Itemsets PDF PDF PPT 1 2 3 4
Chapter 7 Clustering PDF PDF PPT 1 2 3 4 5
Chapter 8 Advertising on the Web PDF PDF PPT 1 2 3 4
Chapter 9 Recommendation Systems PDF Part 1: PDF PPT; Part 2: PDF PPT 1 2 3 4 5
Chapter 10 Mining Social-Network Graphs PDF Part 1: PDF PPT; Part 2: PDF PPT 1 2 3 4 5 6 7 8 9 10 11 12
Chapter 11 Dimensionality Reduction PDF PDF PPT 1 2 3 4 5 6 7 8 9 10 11 12
Chapter 12 Large-Scale Machine Learning PDF Part 1: PDF PPT; Part 2: PDF PPT 1 2 3 4 5 6 7 8 9 10 11 12
Index PDF
Errata HTML

 

Download the latest version of the book as a single big PDF file (511 pages, 3 MB).

Source: Mining of Massive Datasets