
The operation of the plugin is relatively easy. By using the REST API of the Pentaho PDI Carte service, Apache Airflow is able to launch and monitor both Jobs and Transformations in a fully integrated way. This means that from Airflow we can see the status of these tasks, how long they have taken, and their logs. The plugin requires a running installation of the Carte service.
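To make the launch-and-monitor idea more concrete, here is a minimal sketch of the kind of requests involved. It is not the plugin's actual implementation. It assumes Carte's `executeJob` and `jobStatus` REST endpoints; the host, port, repository name and job path are hypothetical placeholders, and the sample response is hand-written and trimmed to a single field. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: Carte is reachable at this host and port; adjust to your setup.
CARTE_BASE_URL = "http://localhost:8081"


def build_carte_url(endpoint, **query):
    """Build a Carte REST URL such as /kettle/jobStatus/?name=...&xml=Y."""
    return f"{CARTE_BASE_URL}/kettle/{endpoint}/?{urlencode(query)}"


def parse_status(xml_text):
    """Extract the human-readable status from a Carte status response."""
    root = ET.fromstring(xml_text)
    return root.findtext("status_desc")


# URL that would start a job stored in a repository (names are hypothetical).
run_url = build_carte_url(
    "executeJob", rep="my_repo", job="/home/bi/daily_load", level="Basic"
)

# URL that would poll the status of that job as XML.
status_url = build_carte_url("jobStatus", name="daily_load", xml="Y")

# Hand-written sample response, trimmed to the fields used here.
sample_response = (
    "<jobstatus>"
    "<jobname>daily_load</jobname>"
    "<status_desc>Finished</status_desc>"
    "</jobstatus>"
)

print(run_url)
print(status_url)
print(parse_status(sample_response))
```

In practice an orchestrator would issue the first request once and then poll the second until the status is no longer running, which is the status and log information that ends up visible in Airflow.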

The airflow-pentaho-plugin integrates these two platforms to allow Data Engineers to orchestrate everything in an elegant way and, above all, to make our lives easier. Below we explain how it works and how you can install it in your work environment and start taking advantage of the benefits it offers.

Schedule, orchestrate and monitor your Kettle tasks from Airflow with this Pentaho plugin.

At Damavis we know the importance of data processing. Extracting, cleaning, transforming, aggregating, loading or cross-referencing multiple data sources allows our clients to obtain insights or build predictive models using Machine Learning. In short, to obtain value from their data.

But maintaining, executing and monitoring all these ETLs and aggregations can become a complicated task, especially when there are many dependencies between these processes. Pentaho Data Integration is a tool primarily designed for ETL (Extract, Transform, Load) processes without the need for programming skills. Many of our customers already use this tool for their Business Intelligence processes. The reasons for choosing it may be many, but the most notable are that it is open source, very complete and easy to use, and that it supports traditional workloads as well as Big Data or IoT.

The problem is that the open-source version of Pentaho PDI lacks a task orchestrator: we cannot establish relationships between different Transformations or Jobs, nor can we schedule those tasks to run at a certain time or in an order that respects their dependencies. This is where Apache Airflow and the plugin developed by Damavis come into play.

What is Apache Airflow? Apache Airflow is a platform for authoring, scheduling and centrally monitoring batch data workflows. It solves problems such as executing tasks with a scheduler, managing retries in case of errors, custom error handling, establishing dependency relationships between tasks to optimise the execution time of the entire pipeline, and much more.
