Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark
- Author:
- Russell Jurney
- Pages:
- 352
- Available formats:
- ePub, Mobi
Ebook description: Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark
Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.
Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application and effect meaningful change in your organization.
- Build value from your data in a series of agile sprints, using the data-value pyramid
- Extract features for statistical models from a single dataset
- Visualize data with charts, and expose different aspects through interactive reports
- Use historical data to predict the future via classification and regression
- Translate predictions into actions
- Get feedback from users after each sprint to keep your project on track
You can read the ebook "Agile Data Science 2.0. Building Full-Stack Data Analytics Applications with Spark" on:
- Inkbook, Kindle, Pocketbook, Onyx Boox, and other e-readers
- Windows, macOS, and other systems
- Windows, Android, iOS, and HarmonyOS systems
- any devices and applications that support the PDF, EPub, and Mobi formats
Ebook details
- Ebook ISBN:
- 978-14-919-6006-6, 9781491960066
- Ebook publication date:
- 2017-06-07 The ebook publication date is often the day the title went on sale and may not match the publication date of the print book. Additional information can be found in the free sample. If in doubt, contact sklep@ebookpoint.pl.
- Publication language:
- English
- ePub file size:
- 9.8MB
- Mobi file size:
- 9.8MB
Ebook table of contents
- Preface
- Agile Data Science Mailing List
- Data Syndrome, Product Analytics Consultancy
- Live Training
- Who This Book Is For
- How This Book Is Organized
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Safari
- How to Contact Us
- I. Setup
- 1. Theory
- Introduction
- Definition
- Methodology as Tweet
- Agile Data Science Manifesto
- Iterate, iterate, iterate
- Ship intermediate output
- Prototype experiments over implementing tasks
- Integrate the tyrannical opinion of data
- Climb up and down the data-value pyramid
- Discover and pursue the critical path to a killer product
- Get meta
- Synthesis
- The Problem with the Waterfall
- Research Versus Application Development
- The Problem with Agile Software
- Eventual Quality: Financing Technical Debt
- The Pull of the Waterfall
- The Data Science Process
- Setting Expectations
- Data Science Team Roles
- Recognizing the Opportunity and the Problem
- Adapting to Change
- Harnessing the power of generalists
- Leveraging agile platforms
- Sharing intermediate results
- Notes on Process
- Code Review and Pair Programming
- Agile Environments: Engineering Productivity
- Collaboration space
- Private space
- Personal space
- Realizing Ideas with Large-Format Printing
- 2. Agile Tools
- Scalability = Simplicity
- Agile Data Science Data Processing
- Local Environment Setup
- System Requirements
- Setting Up Vagrant
- Downloading the Data
- EC2 Environment Setup
- Downloading the Data
- Getting and Running the Code
- Getting the Code
- Running the Code
- Jupyter Notebooks
- Touring the Toolset
- Agile Stack Requirements
- Python 3
- Anaconda and Miniconda
- Jupyter notebooks
- Serializing Events with JSON Lines and Parquet
- JSON for Python
- Collecting Data
- Data Processing with Spark
- Hadoop required
- Processing data with Spark
- Publishing Data with MongoDB
- Booting Mongo
- Pushing data to MongoDB from PySpark
- Searching Data with Elasticsearch
- Elasticsearch and PySpark
- Making PySpark data searchable
- Searching our data
- Python and Elasticsearch with pyelasticsearch
- Distributed Streams with Apache Kafka
- Starting up Kafka
- Topics, console producer, and console consumer
- Realtime versus batch computing with Spark
- Kafka in Python with kafka-python
- Processing Streams with PySpark Streaming
- Machine Learning with scikit-learn and Spark MLlib
- Why scikit-learn as well as Spark MLlib?
- Scheduling with Apache Airflow (Incubating)
- Installing Airflow
- Preparing a script for use with Airflow
- Conditionally initializing PySpark
- Parameterizing scripts on the command line
- Creating an Airflow DAG in Python
- Complete scripts for Airflow
- Testing a task in Airflow
- Running a DAG in Airflow
- Backfilling data in Airflow
- The power of Airflow
- Reflecting on Our Workflow
- Lightweight Web Applications
- Python and Flask
- Flask echo microservice
- Python and Mongo with pymongo
- Displaying executives in Flask
- Presenting Our Data
- Booting Bootstrap
- Visualizing data with D3.js
- Conclusion
- 3. Data
- Air Travel Data
- Flight On-Time Performance Data
- OpenFlights Database
- Weather Data
- Data Processing in Agile Data Science
- Structured Versus Semistructured Data
- SQL Versus NoSQL
- SQL
- NoSQL and Dataflow Programming
- Spark: SQL + NoSQL
- Schemas in NoSQL
- Data Serialization
- Extracting and Exposing Features in Evolving Schemas
- Conclusion
- II. Climbing the Pyramid
- 4. Collecting and Displaying Records
- Putting It All Together
- Collecting and Serializing Flight Data
- Processing and Publishing Flight Records
- Publishing Flight Records to MongoDB
- Presenting Flight Records in a Browser
- Serving Flights with Flask and pymongo
- Rendering HTML5 with Jinja2
- Agile Checkpoint
- Listing Flights
- Listing Flights with MongoDB
- Paginating Data
- Reinventing the wheel?
- Serving paginated data
- Prototyping back from HTML
- Searching for Flights
- Creating Our Index
- Publishing Flights to Elasticsearch
- Searching Flights on the Web
- Conclusion
- 5. Visualizing Data with Charts and Tables
- Chart Quality: Iteration Is Essential
- Scaling a Database in the Publish/Decorate Model
- First Order Form
- Second Order Form
- Third Order Form
- Choosing a Form
- Exploring Seasonality
- Querying and Presenting Flight Volume
- Iterating on our first chart
- Extracting Metal (Airplanes [Entities])
- Extracting Tail Numbers
- Data processing: batch or realtime?
- Grouping and sorting data in Spark
- Publishing airplanes with Mongo
- Serving airplanes with Flask
- Ensuring database performance with indexes
- Linking back in to our new entity
- Information architecture
- Assessing Our Airplanes
- Data Enrichment
- Reverse Engineering a Web Form
- Gathering Tail Numbers
- Automating Form Submission
- Extracting Data from HTML
- Evaluating Enriched Data
- Conclusion
- 6. Exploring Data with Reports
- Extracting Airlines (Entities)
- Defining Airlines as Groups of Airplanes Using PySpark
- Querying Airline Data in Mongo
- Building an Airline Page in Flask
- Linking Back to Our Airline Page
- Creating an All Airlines Home Page
- Curating Ontologies of Semi-structured Data
- Improving Airlines
- Adding Names to Carrier Codes
- Incorporating Wikipedia Content
- Publishing Enriched Airlines to Mongo
- Enriched Airlines on the Web
- Investigating Airplanes (Entities)
- SQL Subqueries Versus Dataflow Programming
- Dataflow Programming Without Subqueries
- Subqueries in Spark SQL
- Creating an Airplanes Home Page
- Adding Search to the Airplanes Page
- Code versus configuration
- Configuring a search widget
- Building an Elasticsearch query programmatically
- Creating a Manufacturers Bar Chart
- Iterating on the Manufacturers Bar Chart
- Entity Resolution: Another Chart Iteration
- Entity resolution in 30 seconds
- Resolving manufacturers in PySpark
- Updating our chart
- Boeing versus Airbus revisited
- Cleanliness: Benefits of entity resolution
- Conclusion
- 7. Making Predictions
- The Role of Predictions
- Predict What?
- Introduction to Predictive Analytics
- Making Predictions
- Features
- Regression
- Classification
- Exploring Flight Delays
- Extracting Features with PySpark
- Building a Regression with scikit-learn
- Loading Our Data
- Sampling Our Data
- Vectorizing Our Results
- Preparing Our Training Data
- Vectorizing Our Features
- Sparse Versus Dense Matrices
- Preparing an Experiment
- Training Our Model
- Testing Our Model
- Conclusion
- Building a Classifier with Spark MLlib
- Loading Our Training Data with a Specified Schema
- Addressing Nulls
- Replacing FlightNum with Route
- Bucketizing a Continuous Variable for Classification
- Determining arrival delay buckets
- Iterative visualization with histograms
- Bucket quest conclusion
- Bucketizing with a DataFrame UDF
- Bucketizing with pyspark.ml.feature.Bucketizer
- Feature Vectorization with pyspark.ml.feature
- Vectorizing categorical columns with Spark ML
- Vectorizing continuous variables and indexes with Spark ML
- Classification with Spark ML
- Test/train split with DataFrames
- Creating and fitting a model
- Evaluating a model
- Conclusion
- Conclusion
- 8. Deploying Predictive Systems
- Deploying a scikit-learn Application as a Web Service
- Saving and Loading scikit-learn Models
- Saving and loading objects using pickle
- Saving and loading models using sklearn.externals.joblib
- Groundwork for Serving Predictions
- Creating Our Flight Delay Regression API
- Filling in the predict_utils API
- Testing Our API
- Pulling Our API into Our Product
- Deploying Spark ML Applications in Batch with Airflow
- Gathering Training Data in Production
- Training, Storing, and Loading Spark ML Models
- Creating Prediction Requests in Mongo
- Feeding Mongo recommendation tasks from a Flask API
- A frontend for generating prediction requests
- Making a prediction request
- Fetching Prediction Requests from MongoDB
- Making Predictions in a Batch with Spark ML
- Loading Spark ML models in PySpark
- Making predictions with Spark ML
- Storing Predictions in MongoDB
- Displaying Batch Prediction Results in Our Web Application
- Automating Our Workflow with Apache Airflow (Incubating)
- Setting up Airflow
- Creating a DAG for creating our model
- Creating a DAG for operating our model
- Using Airflow to manage and execute DAGs and tasks
- Linking our Airflow script to the Airflow DAGs directory
- Executing our Airflow setup script
- Querying Airflow from the command line
- Testing tasks in Airflow
- Testing DAGs in Airflow
- Monitoring tasks in the Airflow web interface
- Conclusion
- Deploying Spark ML via Spark Streaming
- Gathering Training Data in Production
- Training, Storing, and Loading Spark ML Models
- Sending Prediction Requests to Kafka
- Setting up Kafka
- Start Zookeeper
- Start the Kafka server
- Create a topic
- Verify our new prediction request topic
- Feeding Kafka recommendation tasks from a Flask API
- A frontend for generating prediction requests
- Polling requests and LinkedIn InMaps
- A controller for the page
- An API controller for serving prediction responses
- Creating a template with a polling form
- Making a prediction request
- Making Predictions in Spark Streaming
- Testing the Entire System
- Overall system summary
- Rubber meets road
- Paydirt!
- Conclusion
- 9. Improving Predictions
- Fixing Our Prediction Problem
- When to Improve Predictions
- Improving Prediction Performance
- Experimental Adhesion Method: See What Sticks
- Establishing Rigorous Metrics for Experiments
- Defining our classification metrics
- Feature importance
- Implementing a more rigorous experiment
- Comparing experiments to determine improvements
- Inspecting changes in feature importance
- Conclusion
- Time of Day as a Feature
- Incorporating Airplane Data
- Extracting Airplane Features
- Incorporating Airplane Features into Our Classifier Model
- Incorporating Flight Time
- Conclusion
- A. Manual Installation
- Installing Hadoop
- Installing Spark
- Installing MongoDB
- Installing the MongoDB Java Driver
- Installing mongo-hadoop
- Building mongo-hadoop
- Installing pymongo_spark
- Installing Elasticsearch
- Installing Elasticsearch for Hadoop
- Setting Up Our Spark Environment
- Installing Kafka
- Installing scikit-learn
- Installing Zeppelin
- Index