Spark: the definitive guide: big data processing made simple

Chambers, Bill; Zaharia, Matei

Paperback, Book. English.
Published Sebastopol, CA: O'Reilly, 2018

Available at all branches.

This item is not reservable because:

  • There are no reservable copies for this title. Please contact a member of library staff for further information.
  • National College Library – Four available in Main Lending 006.31 and Desk Reserve 006.31

    Barcode Shelfmark Loan type Status
    39006010685040 Main Lending 006.31 Main Lending Available
    39006010685115 Main Lending 006.31 Main Lending Available
    39006010685123 Main Lending 006.31 Main Lending Available
    39006010685131 Desk Reserve 006.31 Desk Reserve Available
    39006010685065 Main Lending 006.31 Main Lending Due back 2nd January 2020
    39006010685099 Main Lending 006.31 Main Lending Due back 2nd January 2020
    39006010685107 Main Lending 006.31 Main Lending Due back 2nd January 2020
    39006010685081 Main Lending 006.31 Main Lending Due back 2nd January 2020
    39006010685057 Main Lending 006.31 Main Lending Due back 4th January 2020
    39006010685073 Main Lending 006.31 Main Lending Due back 6th January 2020

Details

Statement of responsibility: Bill Chambers and Matei Zaharia
ISBN: 1491912219, 9781491912218
Note: Includes QR code.
Note: Includes index.
Physical Description: xxvi, 576 pages : illustrations ; 24 cm
Subject: Machine learning.; Spark (Electronic resource : Apache Software Foundation); Electronic data processing.; Data mining Computer programs.

Contents

  1. Part I: Gentle overview of big data and Spark
  2. What is Apache Spark?
  3. A gentle introduction to Spark
  4. A tour of Spark's toolset
  5. Part II: Structured APIs - DataFrames, SQL, and datasets
  6. Structured API overview
  7. Basic structured operations
  8. Working with different types of data
  9. Aggregations
  10. Joins
  11. Data sources
  12. Spark SQL
  13. Datasets
  14. Part III: Low-level APIs
  15. Resilient distributed datasets (RDDs)
  16. Advanced RDDs
  17. Distributed shared variables
  18. Part IV: Production Applications
  19. How Spark runs on a cluster
  20. Developing Spark applications
  21. Deploying Spark
  22. Monitoring and debugging
  23. Performance tuning
  24. Part V: Streaming
  25. Stream processing fundamentals
  26. Structured streaming basics
  27. Event-time and stateful processing
  28. Structured streaming in production
  29. Part VI: Advanced analytics and machine learning
  30. Advanced analytics and machine learning overview
  31. Preprocessing and feature engineering
  32. Classification
  33. Regression
  34. Recommendation
  35. Unsupervised learning
  36. Graph analytics
  37. Deep learning
  38. Part VII: Ecosystem
  39. Language specifics: Python (PySpark) and R (SparkR and sparklyr)
  40. Ecosystem and community