Apache Oozie. The Workflow Scheduler for Hadoop
- Pages:
- 272
- Available formats:
- ePub, Mobi
Ebook description: Apache Oozie. The Workflow Scheduler for Hadoop
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.
Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.
- Install and configure an Oozie server, and get an overview of basic concepts
- Journey through the world of writing and configuring workflows
- Learn how the Oozie coordinator schedules and executes workflows based on triggers
- Understand how Oozie manages data dependencies
- Use Oozie bundles to package several coordinator apps into a data pipeline
- Learn about security features and shared library management
- Implement custom extensions and write your own EL functions and actions
- Debug workflows and manage Oozie’s operational details
You can read the ebook "Apache Oozie. The Workflow Scheduler for Hadoop" on:
- Inkbook, Kindle, Pocketbook, Onyx Boox, and other e-readers
- Windows, macOS, and other systems
- Windows, Android, iOS, and HarmonyOS systems
- any devices and applications that support the PDF, EPub, and Mobi formats
Ebook details
- Ebook ISBN:
- 978-14-493-6975-0, 9781449369750
- Ebook release date:
- 2015-05-12 (often the date the title went on sale; it may differ from the release date of the print edition)
- Publication language:
- English
- ePub file size:
- 6.2MB
- Mobi file size:
- 6.2MB
Table of contents
- Foreword
- Preface
- Contents of This Book
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Introduction to Oozie
- Big Data Processing
- A Recurrent Problem
- A Common Solution: Oozie
- Oozie's role in the Hadoop Ecosystem
- What exactly is Oozie?
- The name Oozie
- A Simple Oozie Job
- Oozie Releases
- Timeline and status of the releases
- Compatibility
- Some Oozie Usage Numbers
- 2. Oozie Concepts
- Oozie Applications
- Oozie Workflows
- Workflow use case
- Oozie Coordinators
- Coordinator use case
- Oozie Bundles
- Bundle use case
- Parameters, Variables, and Functions
- Application Deployment Model
- Oozie Architecture
- 3. Setting Up Oozie
- Oozie Deployment
- Basic Installations
- Requirements
- Build Oozie
- Install Oozie Server
- Hadoop Cluster
- Hadoop installation
- Configuring Hadoop for Oozie
- Start and Verify the Oozie Server
- Advanced Oozie Installations
- Configuring Kerberos Security
- DB Setup
- MySQL configuration
- Oracle configuration
- Shared Library Installation
- Sharelib since version 4.1.0
- Oozie Client Installations
- 4. Oozie Workflow Actions
- Workflow
- Actions
- Action Execution Model
- Action Definition
- Action Types
- MapReduce Action
- Streaming
- Pipes
- MapReduce example
- Streaming example
- Java Action
- Java example
- Pig Action
- Pig example
- FS Action
- Filesystem example
- Sub-Workflow Action
- Hive Action
- Hive example
- DistCp Action
- DistCp example
- Email Action
- Shell Action
- Shell example
- SSH Action
- Sqoop Action
- Sqoop example
- Synchronous Versus Asynchronous Actions
- 5. Workflow Applications
- Outline of a Basic Workflow
- Control Nodes
- <start> and <end>
- <fork> and <join>
- <decision>
- <kill>
- <OK> and <ERROR>
- Job Configuration
- Global Configuration
- Job XML
- Inline Configuration
- Launcher Configuration
- Parameterization
- EL Variables
- EL constants and system-defined variables
- Hadoop counters
- EL Functions
- String timestamp()
- String wf:id()
- String wf:errorCode(String node)
- boolean fs:fileSize(String path)
- EL Expressions
- The job.properties File
- Command-Line Option
- The config-default.xml File
- The <parameters> Section
- Configuration and Parameterization Examples
- Lifecycle of a Workflow
- Action States
- 6. Oozie Coordinator
- Coordinator Concept
- Triggering Mechanism
- Time Trigger
- Data Availability Trigger
- Coordinator Application and Job
- Coordinator Action
- Our First Coordinator Job
- Coordinator Submission
- Oozie Web Interface for Coordinator Jobs
- Coordinator Job Lifecycle
- Coordinator Action Lifecycle
- Parameterization of the Coordinator
- EL Functions for Frequency
- Day-Based Frequency
- Month-Based Frequency
- Execution Controls
- An Improved Coordinator
- 7. Data Trigger Coordinator
- Expressing Data Dependency
- Dataset
- Defining a dataset
- Timelines: coordinator versus dataset
- input-events
- output-events
- Example: Rollup
- Parameterization of Dataset Instances
- current(n)
- latest(n)
- Comparison of current() and latest()
- Parameter Passing to Workflow
- dataIn(eventName)
- dataOut(eventName)
- nominalTime()
- actualTime()
- dateOffset(baseTimeStamp, skipInstance, timeUnit)
- formatTime(timeStamp, formatString)
- A Complete Coordinator Application
- 8. Oozie Bundles
- Bundle Basics
- Bundle Definition
- Why Do We Need Bundles?
- Bundle Specification
- Execution Controls
- Bundle State Transitions
- 9. Advanced Topics
- Managing Libraries in Oozie
- Origin of JARs in Oozie
- Design Challenges
- Managing Action JARs
- How to get the JARs?
- Installing sharelib
- Overriding/upgrading existing JARs
- Supporting multiple versions
- Supporting the User's JAR
- JAR Precedence in classpath
- Oozie Security
- Oozie Security Overview
- Oozie to Hadoop
- Configuring Hadoop services
- Setting up Keytab and Principal
- Configuring the Oozie server
- Oozie Client to Server
- Oozie Server Security
- Configuring the Oozie Server
- Oozie client
- Proxy user in Oozie
- Supporting Custom Credentials
- Supporting New API in MapReduce Action
- Supporting Uber JAR
- Cron Scheduling
- A Simple Cron-Based Coordinator
- Oozie Cron Specification
- Allowed values
- Special characters
- Nonstandard special characters
- Emulate Asynchronous Data Processing
- HCatalog-Based Data Dependency
- 10. Developer Topics
- Developing Custom EL Functions
- Requirements for a New EL Function
- Implementing a New EL Function
- Writing a new EL function
- Deploy the new EL function
- Using the new function
- Supporting Custom Action Types
- Creating a Custom Synchronous Action
- Writing an ActionExecutor
- Writing the XML schema
- Deploying the new action type
- Using the new action type
- Overriding an Asynchronous Action Type
- Implementing the New ActionMain Class
- Testing the New Main Class
- Creating a New Asynchronous Action
- Writing an Asynchronous Action Executor
- Writing the ActionMain Class
- Writing Action's Schema
- Deploying the New Action Type
- Using the New Action Type
- 11. Oozie Operations
- Oozie CLI Tool
- CLI Subcommands
- Useful CLI Commands
- The validate subcommand
- The job subcommand
- The jobs subcommand
- More subcommands
- Oozie REST API
- Oozie Java Client
- The oozie-site.xml File
- The Oozie Purge Service
- Job Monitoring
- JMS-Based Monitoring
- Installation and configuration
- Consuming JMS messages
- Oozie Instrumentation and Metrics
- Reprocessing
- Workflow Reprocessing
- Coordinator Reprocessing
- Bundle Reprocessing
- Server Tuning
- JVM Tuning
- Service Settings
- The CallableQueueService
- The RecoveryService
- Oozie High Availability
- Debugging in Oozie
- Oozie Logs
- Developing and Testing Oozie Applications
- Application Deployment Tips
- Common Errors and Debugging
- MiniOozie and LocalOozie
- The Competition
- Index
Customer ratings and reviews: Apache Oozie. The Workflow Scheduler for Hadoop, by Mohammad Kamrul Islam and Aravind Srinivasan (0). Reviews are verified against the order history of the account that posts the review. The user may have received points for publishing a review, which entitle them to a discount under the Points Program.