How to Automate Apache NiFi Data Flow Deployments in the Public Cloud

With the latest release of Cloudera DataFlow for the Public Cloud (CDF-PC) we added new CLI capabilities that allow you to automate data flow deployments, making it easier than ever before to incorporate Apache NiFi flow deployments into your CI/CD pipelines. This blog post walks you through the data flow development lifecycle and how you can use APIs in CDP Public Cloud to fully automate your flow deployments.

Understanding the data flow development lifecycle

Like any other software application, NiFi data flows go through a development, testing and production phase. While key NiFi features like visual flow design and interactive data exploration are front and center during the development phase, operational features like resource management, auto-scaling and performance monitoring become important once a data flow has been deployed in production and business functions depend on it.

CDF-PC, the first cloud-native runtime for Apache NiFi data flows, is focused on operationalizing NiFi data flows in production by providing resource isolation, auto-scaling and detailed KPI monitoring for flow deployments.

At the same time, Flow Management for Cloudera Data Hub provides a traditional NiFi experience focused on visual flow design and interactive data exploration. Together, Flow Management for Data Hub and Cloudera DataFlow for the Public Cloud provide all the capabilities you need to support the entire data flow development lifecycle from development to production.

Figure 1: Develop your data flows using Flow Management for Data Hub and operationalize them using Cloudera DataFlow for the Public Cloud (CDF-PC)

Developing data flows with version control

As Figure 1 shows, Flow Management for Data Hub provides an ideal environment that allows developers to quickly iterate on their data flows until they’re ready to be deployed in production. Every Flow Management cluster comes preinstalled with NiFi Registry, making it easy for developers to version control their data flows.

Note: While version controlling data flows isn’t required for manually exporting data flows from the NiFi canvas, it’s a prerequisite for automating data flow export using the NiFi Registry API.

To start version controlling a data flow, simply right-click the process group you want to version, then select Version and Start version control.

Figure 2: Starting version control stores process groups in the NiFi Registry and makes them available via the NiFi Registry API

In the next window, use the Bucket option to associate your data flow with a specific project or team and specify a Flow Name. Optionally, you can also provide a Flow Description and Version Comments.

Figure 3: When you start version control you can select a Bucket and provide a name for your flow definition

Once your data flow version has been saved to the NiFi Registry, you’ll notice a green tick appearing on your NiFi process group, indicating that the process group is up to date and represents the latest version stored in the NiFi Registry.

Figure 4: The green tick indicates that this process group is using the latest version of the flow definition

Changing your data flow logic in the NiFi canvas introduces local changes that aren’t yet synchronized to the NiFi Registry. Right-click the process group, then select Version and Commit local changes to create a new version that includes your recent changes.

Figure 5: A grey star indicates that local changes need to be committed to the NiFi Registry, resulting in a new version of the data flow

Note: If you are planning to export your data flows from the development environment using the NiFi Registry API, make sure that any local changes you want to include have been committed back to the Registry.

Now that you are familiar with versioning your data flows in your development environment, let’s look at how you can export these versions and deploy them using CDF-PC.

Exporting data flows from Flow Management for Data Hub

Apache NiFi 1.11 introduced a new Download flow definition capability which exports the data flow logic of a process group. The export includes any controller services that exist in the selected process group as well as parameter contexts that have been assigned to the selected process group.

Figure 6: Exporting data flows using the “Download flow definition” capability in the NiFi canvas works even when you are not versioning your process groups

To manually export a flow definition from the NiFi canvas, right-click the process group you want to export and select Download flow definition to obtain the flow definition in JSON format. This method exports the current process group from NiFi, including any local changes that might not have been committed to the NiFi Registry yet. Since this operation doesn’t rely on the NiFi Registry, you can download flow definitions without versioning your data flows.

Exporting data flows using the NiFi Registry API

Downloading flow definitions right from the NiFi canvas is easy, but it requires a manual action. One way to automate this process is to use the NiFi Registry API directly, which lets you programmatically export any version of your data flow that has been stored in the Registry.

Note: To use the NiFi Registry approach you have to version your data flows as explained in the previous section.

In CDP Public Cloud, endpoints like the NiFi Registry API are protected and exposed through a central Apache Knox proxy. To obtain the NiFi Registry API endpoint, navigate to your Flow Management Data Hub cluster and select the Endpoints tab.

Figure 7: Flow Management cluster endpoints exposed through Knox

Copy the NiFi Registry REST URL and use it as the base URL to construct your REST calls. Refer to the Apache NiFi Registry REST API documentation for all available API calls. Because you want to export the latest version of your data flow from the Registry, the endpoint you need to use is /buckets/{bucketId}/flows/{flowId}/versions/latest.

After obtaining the Registry REST URL and the API endpoint, you need to obtain the bucketId and flowId to construct the full API path. To do this, navigate to your Flow Management Data Hub cluster and click the NiFi Registry icon, which logs you in to the NiFi Registry UI.

Figure 8: Navigating to the NiFi Registry UI

In the NiFi Registry UI, find the flow definition that you want to export by looking for the flow name that you provided when you started versioning your process group. Expand the corresponding entry and copy the BUCKET IDENTIFIER and the FLOW IDENTIFIER.

Figure 9: Obtaining the bucketId and flowId from NiFi Registry

Using the NiFi Registry REST URL as well as the bucket and flow identifiers, you can now construct the final URL by appending /buckets/{bucketId}/flows/{flowId}/versions/latest to the REST URL, with both placeholders replaced by the identifiers you copied.
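As a minimal sketch of how the final URL is assembled from the NiFi Registry REST URL and the two identifiers, a small shell helper could look like the following. The function name registry_latest_url is hypothetical and only used for illustration; the base URL is whatever you copied from the Endpoints tab.

```shell
#!/usr/bin/env bash
# Sketch: build the "latest version" URL for a flow stored in NiFi Registry.
# registry_latest_url is a hypothetical helper name, not part of any CLI.

# Usage: registry_latest_url BASE_URL BUCKET_ID FLOW_ID
registry_latest_url() {
  local base_url="${1%/}"   # drop a single trailing slash, if present
  local bucket_id="$2"
  local flow_id="$3"
  printf '%s/buckets/%s/flows/%s/versions/latest\n' \
    "$base_url" "$bucket_id" "$flow_id"
}
```

Called with your Registry REST URL, a bucket identifier and a flow identifier, the helper prints the URL to pass to your REST client.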

Because the NiFi Registry API is exposed through a Knox proxy, you need to authenticate your REST API call using a CDP workload user and password. You can use your personal CDP workload user or a machine user for this purpose, as long as the EnvironmentUser role has been assigned to that user for the CDP environment hosting your Flow Management cluster.

To add the EnvironmentUser role, navigate to your CDP environment, select “Manage Access” from the Actions menu and assign the EnvironmentUser role to the CDP workload user you want to use.

Figure 10: Assigning the EnvironmentUser role to a CDP workload user

In CDP Public Cloud, access to versioned NiFi data flows in the NiFi Registry is controlled by Apache Ranger. The CDP workload user that you are planning to use to call the NiFi Registry REST API needs to be allowed access to the flow definition that you want to export. For example, to allow the nifi-kafka-ingest user access to the bucket caea6227-2bde-452f-a325-3eac0424868f, you need to create a corresponding policy in Ranger:

Figure 11: This Ranger policy allows your previously created machine user to access the NiFi Registry bucket that stores the flow definition you want to export.

Now that you have set up your CDP workload user, ensured that it can access the flow definition in the Registry, and obtained all the necessary IDs, you can go ahead and export your flow definition from the NiFi Registry.

Let’s combine the endpoint URL information you collected earlier with the bucket and flow identifiers and CDP workload user details to construct your final REST API call. The response will be the flow definition in JSON format, and you can choose to save it to a file using the redirect operator >

curl -u CDP_WORKLOAD_USER:CDP_WORKLOAD_USER_PASSWORD NIFI_REGISTRY_REST_URL/buckets/BUCKET_ID/flows/FLOW_ID/versions/latest > /home/youruser/myflowdefinition.json

Note: If you are running the command on one of the NiFi instances, replace “gateway” with “management0” in the endpoint URL to ensure the Registry endpoint can be reached.
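For scripted use, the export call can be wrapped so that HTTP errors fail the script instead of silently writing an error page to the output file. The sketch below is one possible shape, not the only one. The function names are hypothetical, and the credentials are assumed to be provided in the environment variables CDP_WORKLOAD_USER and CDP_WORKLOAD_USER_PASSWORD. The to_internal_url helper applies the “gateway” to “management0” substitution from the note above to the first occurrence in the URL.

```shell
#!/usr/bin/env bash
# Sketch: export a flow definition with curl and fail loudly on HTTP errors.
# Function names are hypothetical; credentials are assumed to be set in
# CDP_WORKLOAD_USER and CDP_WORKLOAD_USER_PASSWORD.

# Rewrite the first "gateway" in the URL to "management0", for use when the
# command runs on one of the NiFi instances.
to_internal_url() {
  printf '%s\n' "${1/gateway/management0}"
}

# Usage: export_latest_flow URL OUTPUT_FILE
export_latest_flow() {
  local url="$1" out_file="$2"
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl --fail --silent --show-error \
    --user "${CDP_WORKLOAD_USER}:${CDP_WORKLOAD_USER_PASSWORD}" \
    --output "$out_file" \
    "$url"
}
```

A CI job could then call export_latest_flow with the full Registry URL and stop the pipeline when the function returns a non-zero status.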

Note: In this example we are using curl to invoke the Registry REST endpoint. If you are using Python, check out nipyapi, which already provides Python wrappers for the NiFi and NiFi Registry API endpoints.

Note: To automate exporting data flows even further you can use NiFi Registry Hooks, which allow you to execute a script when a certain action in the Registry is triggered. You could set up a Registry hook that automatically exports the flow definition and uploads it to the CDF-PC Flow Catalog every time a new version is created.
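A hook script along those lines might look like the sketch below. It rests on several assumptions that you should verify against the NiFi Registry administration guide and your own setup: that the script hook passes the event type, bucket ID, flow ID and version as the first four positional arguments, that the event for a new version is named CREATE_FLOW_VERSION, and that REGISTRY_URL, FLOW_CRN and the workload credentials are available in the environment. All function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical NiFi Registry hook script (a sketch, not a tested integration).
# ASSUMED positional arguments: $1 event type, $2 bucket id, $3 flow id,
# $4 version. ASSUMED environment: REGISTRY_URL, FLOW_CRN,
# CDP_WORKLOAD_USER, CDP_WORKLOAD_USER_PASSWORD.

is_new_version_event() {
  [ "$1" = "CREATE_FLOW_VERSION" ]
}

handle_event() {
  local event="$1" bucket_id="$2" flow_id="$3" version="$4"
  # Ignore every event except the creation of a new flow version.
  is_new_version_event "$event" || return 0

  local out status=0
  out="$(mktemp)" || return 1

  # Export the specific version from the Registry, then add it to the
  # existing flow definition in the CDF-PC Flow Catalog.
  curl --fail --silent --show-error \
      --user "${CDP_WORKLOAD_USER}:${CDP_WORKLOAD_USER_PASSWORD}" \
      --output "$out" \
      "${REGISTRY_URL%/}/buckets/${bucket_id}/flows/${flow_id}/versions/${version}" \
    && cdp df import-flow-definition-version \
      --file "$out" \
      --flow-crn "$FLOW_CRN" \
      --comments "Registry version ${version}" \
    || status=$?

  rm -f "$out"
  return "$status"
}

# Only act when the script is invoked with arguments, as a hook would do.
if [ "$#" -gt 0 ]; then
  handle_event "$@"
fi
```

The early return keeps the script cheap for all other Registry events, and the temporary file is removed whether or not the upload succeeds.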

Exporting data flows using the NiFi CLI

You can also use the NiFi CLI to export flow definitions from the Registry. The NiFi CLI is part of the NiFi toolkit, which is installed on every NiFi node in your Flow Management cluster.

To use the NiFi CLI, establish an SSH connection to any NiFi node and log in with your CDP workload user name. Start the NiFi CLI by executing the following command:


In addition to the flow identifier, NiFi Registry REST endpoint and CDP workload user credentials, this approach also requires you to explicitly specify a truststore configuration to establish a secure connection. While the truststore location (/hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks) and the truststore type (JKS) are the same on every Flow Management cluster, the truststore password is unique for each cluster and needs to be obtained from /etc/hadoop/conf/ssl-client.xml.
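If you want to script the password lookup, a small awk-based helper is one option. This sketch assumes the file uses the usual Hadoop configuration layout, with each property written as a name element followed by a value element, and that the password is stored under the property name ssl.client.truststore.password. Check both assumptions against the file on your cluster. The function name read_hadoop_property is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read one property value from a Hadoop-style XML configuration file.
# ASSUMES <name>...</name> and <value>...</value> each sit on a single line.

# Usage: read_hadoop_property FILE PROPERTY_NAME
read_hadoop_property() {
  local file="$1" name="$2"
  awk -v name="$name" '
    # Remember that we have seen the property we are looking for.
    index($0, "<name>" name "</name>") { found = 1 }
    # Print the text between <value> and </value>, then stop.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
    # Property closed without a value: stop looking inside it.
    found && /<\/property>/ { found = 0 }
  ' "$file"
}
```

For example, TRUSTSTORE_PASSWORD="$(read_hadoop_property /etc/hadoop/conf/ssl-client.xml ssl.client.truststore.password)" would make the value available to later commands without copying it by hand.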

With the Registry REST endpoint, CDP workload user credentials, flow identifier and truststore information you can now construct the full registry export-flow-version command:

registry export-flow-version --baseUrl NIFI_REGISTRY_URL --flowIdentifier 45f308ce-9dc2-4ac7-9ff2-153d714b52dd --basicAuthUsername CDP_WORKLOAD_USER --basicAuthPassword CDP_WORKLOAD_USER_PASSWORD --truststore /hadoopfs/fs4/working-dir/cm-auto-global_truststore.jks --truststorePasswd TRUSTSTORE_PASSWORD --truststoreType jks --outputType json --outputFile /home/youruser/myflowdefinition.json

The command returns the flow definition in JSON format and writes it to the location specified using --outputFile.

Note: If you are running the NiFi toolkit on one of the NiFi instances, replace “gateway” with “management0” to ensure the Registry endpoint can be reached.

Importing data flows into CDF for the Public Cloud

Now that you have exported the flow definition from the Flow Management development environment, you need to import it into CDF-PC’s central Flow Catalog before you can create deployments.

Most of the actions that you can perform in CDF-PC’s UI can also be automated using the CDP CLI. Before you can start using the CDP CLI to upload your flow definition to the Flow Catalog, you need to download and configure it correctly.

Note: CDF-PC CLI commands are currently only available in the CDP Beta CLI. Use these instructions to install and configure the Beta CLI.

Once you have set up the CDP CLI, you can explore all available CDF-PC commands simply by running cdp df.

The command for importing flow definitions into the catalog is df import-flow-definition. It requires you to specify the path to the flow definition you want to upload and provide a name for it in the catalog.

cdp df import-flow-definition --file myflowdefinition.json --name MyFlowDefinition --description "This is my first uploaded Flow Definition" --comments "Version 1"

You have now successfully imported your flow definition and can explore it in the Flow Catalog.

Figure 12: The flow definition has been imported successfully to the catalog

If you want to upload new versions of this flow definition, use the import-flow-definition-version command. It requires you to specify the CRN of the existing flow definition in the catalog as well as the new flow definition JSON file that you want to upload as a new version.

To get the flow definition CRN, navigate to the catalog, select your flow definition and copy the CRN. Use the CRN to construct the final import-flow-definition-version command:

cdp df import-flow-definition-version --file myflowdefinition_v2.json --flow-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition --comments "Version 2 with fixes for processing data"

After successful execution, you will see a second version of the flow definition in the catalog.

Figure 13: A new version has been created for the imported flow definition
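In a pipeline you usually do not want to decide by hand which of the two import commands to run. The sketch below picks import-flow-definition when no CRN is known yet and import-flow-definition-version otherwise. It uses only the flags shown in the commands above. The function name import_flow and the DRY_RUN switch are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: choose between the first import and a new-version import.
# import_flow and DRY_RUN are hypothetical; the cdp flags are the ones
# used in the commands shown earlier in this post.

# Usage: import_flow FILE NAME [FLOW_CRN]
import_flow() {
  local file="$1" name="$2" flow_crn="${3:-}"
  local -a cmd
  if [ -n "$flow_crn" ]; then
    # The flow already exists in the catalog: add a new version to it.
    cmd=(cdp df import-flow-definition-version --file "$file" --flow-crn "$flow_crn")
  else
    # First upload: create the flow definition under the given name.
    cmd=(cdp df import-flow-definition --file "$file" --name "$name")
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running the function with DRY_RUN=1 first lets you review the exact command before anything is uploaded to the catalog.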

Deploying data flows with CDF for the Public Cloud

After importing your flow definition into the catalog you can use the create-deployment command to automate flow deployments.

To create a flow deployment in CDF-PC, you have to provide the flow definition CRN from the Flow Catalog, any parameter values the flow might require, any KPIs you want to set up, as well as deployment configurations like the NiFi node size or whether the deployment should automatically scale up and down.

The easiest way to construct the full create-deployment command is to walk through the Deployment Wizard once and use the View CLI Command feature in the Review step to generate the corresponding CLI command and the required parameter and KPI files.

Figure 14: The Review step in the Deployment Wizard creates parameter and KPI property files and constructs the final create-deployment command

If your flow deployment contains flow parameters and KPIs, download the Flow Deployment Parameters JSON and Flow Deployment KPIs JSON files. These files define all parameters and their values as well as the KPIs that you defined in the wizard.

Note: Values for parameters marked as sensitive will not be included in the generated parameters file. Update the parameter value after downloading the file.

With these two files downloaded, all you have left to do is copy the CLI command from the wizard and adjust the parameter-groups and kpis file paths before you hit enter and programmatically create your first flow deployment.

cdp df create-deployment \
  --service-crn crn:cdp:df:us-west-1:9d74eee4-1cad-45d7-b645-7ccf9edbb73d:service:e7aef078-aa34-44eb-8bb7-79e89a734911 \
  --flow-version-crn crn:cdp:df:us-west-1:558bc1d2-8867-4357-8524-311d51259233:flow:MyFlowDefinition/v.2 \
  --deployment-name "MyFirstDeployment" \
  --cluster-size-name EXTRA_SMALL \
  --auto-scale-min-nodes 1 \
  --auto-scale-max-nodes 3 \
  --parameter-groups file://PATH_TO_UPDATE/flow-parameter-groups.json \
  --kpis file://PATH_TO_UPDATE/flow-kpis.json

After issuing the create-deployment command, you can navigate to the Dashboard in CDF-PC and watch the deployment process. Once the deployment has been created successfully you can manage it using both the UI and the CLI.
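When the create-deployment call runs unattended, it helps to check the downloaded parameter and KPI files before invoking the CLI. The wrapper below is a sketch under that assumption. It reuses the flags and sizing values from the command above. The command generated by the Deployment Wizard remains the authoritative source for the flags your deployment needs. The function name create_deployment is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate input files, then call cdp df create-deployment.
# create_deployment is a hypothetical wrapper; flags and sizing values are
# copied from the command shown in this post.

# Usage: create_deployment SERVICE_CRN FLOW_VERSION_CRN NAME PARAMS_FILE KPIS_FILE
create_deployment() {
  local service_crn="$1" flow_version_crn="$2" name="$3"
  local params_file="$4" kpis_file="$5"
  local f

  # Fail early if either JSON file is missing or unreadable.
  for f in "$params_file" "$kpis_file"; do
    if [ ! -r "$f" ]; then
      echo "missing input file: $f" >&2
      return 2
    fi
  done

  cdp df create-deployment \
    --service-crn "$service_crn" \
    --flow-version-crn "$flow_version_crn" \
    --deployment-name "$name" \
    --cluster-size-name EXTRA_SMALL \
    --auto-scale-min-nodes 1 \
    --auto-scale-max-nodes 3 \
    --parameter-groups "file://${params_file}" \
    --kpis "file://${kpis_file}"
}
```

Returning a distinct exit code for missing files makes it easy for a CI job to tell a preparation problem apart from a failure reported by the CLI itself.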


Automating flow deployments with a single command is a key feature of CDF-PC and helps you focus on data flow development, deployment and monitoring instead of worrying about creating infrastructure and setting up complex CI/CD pipelines. Going forward we will continue to improve the CDF-PC CLI capabilities to further optimize the flow development lifecycle. Take the CDF-PC Product Tour and learn more about CDF-PC in the documentation.


