Creating a custom processor in NiFi and sending to PublishKafka - java

Can someone provide me with a simple example of setting up a flowfile in a custom NiFi processor so the payload can be sent out through the PublishKafka processor?
I have a legacy messaging protocol that I wrote a custom processor for. It's a pretty simple structure, just a MessageID (String) and the MessageBody (byte[]). My custom processor handles the input fine, with the messages being received. I'm now attempting to put this data into a flowfile so it can be sent on to the PublishKafka processor, but I've had trouble finding any resources online on how to do this. Here's my current code snippet of the relevant portion:
try {
    this.getLogger().info("[INFO - ListenMW] - Message Received: "
            + data.getMsgID().toString() + " Size: " + data.getMsgData().length);
    this.currentSession.adjustCounter("MW Counter", 1, true);

    // Set up the flowfile to transfer
    FlowFile flowfile = this.currentSession.create();
    flowfile = this.currentSession.putAttribute(flowfile, "key", data.getMsgID().toString());
    flowfile = this.currentSession.putAttribute(flowfile, "value", new String(data.getMsgData(), StandardCharsets.UTF_8));
    this.currentSession.transfer(flowfile, SUCCESS);
} catch (Exception e) {
    this.getLogger().error("[INFO - ListenMW] - " + e.getMessage());
    this.currentSession.adjustCounter("MW Failure", 1, true);
}
I've been unable to determine what attribute(s) to use for the msgID and msgData, so I created my own for now. I saw one post where someone recommended building your own JSON structure and sending that through as your payload, but again, through which attribute would you send that so it gets mapped properly to the Kafka message? I'm pretty new to Kafka and have only experimented with rudimentary test cases to this point, so forgive my ignorance for any wrong assumptions.
Thanks for any guidance! I'm using Kafka 2.0.1 and the PublishKafka_2_0 processor.

Based on what you've shared, it looks like the main reason you're not getting anything published to Kafka is that you're not actually writing anything to the flowfile contents. For a reference point, here is a copy of the javadocs for NiFi (also, here are the processor docs). What you should be doing is something like this:
flowFile = session.write(flowFile, outStream -> {
    outStream.write("some string here".getBytes(StandardCharsets.UTF_8));
});
I use PublishKafkaRecord, but the PublishKafka processors are pretty similar conceptually. You can set the key for the message the way you're doing it there, but you need to set the value by writing it to the flowfile body.
Without knowing your broader use case here, it looks like you can do what you need to do with ExecuteScript. See this as a starting point for ExecuteScript with multiple scripting language references.
If you need further help, we have multiple options here for you.
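Putting the pieces together, a minimal sketch of the relevant part of the processor might look like this. This is a sketch only, not a drop-in implementation: `data`, `SUCCESS`, and the `currentSession` field are taken from the question, and it assumes the default behaviour of PublishKafka_2_0, where the message key comes from the flowfile's `kafka.key` attribute when the processor's Kafka Key property is unset, and the message value comes from the flowfile content (check the processor's usage docs for your exact version):

```
FlowFile flowfile = this.currentSession.create();

// PublishKafka takes the message key from an attribute (kafka.key by default)...
flowfile = this.currentSession.putAttribute(flowfile, "kafka.key", data.getMsgID().toString());

// ...and the message value from the flowfile *content*, so write the body bytes there
final byte[] body = data.getMsgData();
flowfile = this.currentSession.write(flowfile, outStream -> outStream.write(body));

this.currentSession.transfer(flowfile, SUCCESS);
```

The "value" attribute from your original snippet isn't needed at all; attributes become Kafka headers or metadata at best, never the message body.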

Related

Spring Cloud Function - Form Data/Multipart File?

I am creating a Spring Cloud Function that I want to give two inputs, an id and a multipart file (CSV file), but I am having trouble.
If I choose to send a POST with a multipart file, the function won't recognise it and gives an error like Failed to determine input for function call with parameters:
The function bean currently looks like this:
@Bean
public Function<MultipartFile, String> uploadWatchlist() {
    return body -> {
        try {
            return service.convert(body);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    };
}
I have tried using something more akin to Spring MVC, like a request entity object, but no luck.
The backup I have (other than Python, haha) will be using the binary data POST, so it will just be a string containing the contents of the file. That does work, but requires me to append the id to each row of the CSV, which is a bit messy.
There are other solutions, but I'm trying to get this working since Java lambdas are what we want to use as first choice.
The infrastructure is meant to fix up a manual file upload/verification process that is tedious at the moment, and looks like: postman -> load balancer -> lambda -> ecs
The postman/load balancer part will be replaced in future. Ideally I'd have the lambda sorted in Java, taking in a file and an id.
Thanks for any help :)

How to read output file for collecting stats (post) processing

Summary
I need to build a set of statistics during a Camel in-modify-out server process, and emit those statistics as one object (a single JSON log line).
Those statistics need to include:
input file metrics (size/chars/bytes and other, file-section specific measures)
processing time statistics (start/end/duration of processing time, start/end/duration of metrics gathering time)
output file metrics (same as input file metrics, and will be different numbers, output file being changed)
The output file metrics are the problem, as I can't access the file until it's written to disk, and it's not written to disk until 'process'ing finishes.
Background
A log4j implementation is being used for service logging, but after some tinkering we realised it really doesn't suit the requirement here, as it would output multi-line JSON and embed the JSON into a single top-level field. We need varying top-level fields, depending on the file processed.
The server is expected to deal with multiple file operations asynchronously, and the files vary in size (from tiny to fairly immense, which is one reason we need to iterate on stats and measures before we start to tune or review).
Current State
Input file and processing time stats are working OK, and I'm using the following technique to get them:
Inside the 'process' override method of "MyProcessor" I create a new instance of my JsonLogWriter class (shortened pseudocode with ellipses):
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
...

@Component
public class MyProcessor implements Processor {
    ...
    @Override
    public void process(Exchange exchange) throws Exception {
        ...
        JsonLogWriter jlw = new JsonLogWriter();
        jlw.logfilePath = jsonLogFilePath;
        jlw.inputFilePath = inFilePath;
        jlw.outputfilePath = outFilePath;
        ...
        jlw.metricsInputFile(); // gathers metrics using inputFilePath - OK
        ...
        // input file is processed/changed and returned as an InputStream:
        InputStream result = myEngine.readAndUpdate(inFilePath);
        // ... get timings
        jlw.write();
    }
}
From this you can see that JsonLogWriter has:
properties for the file paths (input file, output file, log output)
a set of methods to populate data
a method to emit the data to a file (once ready)
Once I have populated all the JSON objects in the class, I call the write() method; the class pulls all the JSON objects together and the stats all arrive in a log file (in a single line of JSON) - OK.
Error - no output file (yet)
If I use the metricsOutputFile method, however:
InputStream result = myEngine.readAndUpdate(inFilePath);
// ... get timings
jlw.metricsOutputFile(); // using outputfilePath
jlw.write();
}
... the JsonLogWriter fails because the file doesn't exist yet:
java.nio.file.NoSuchFileException: aroute\output\a_long_guid_filename
When debugging, I can't see any part of the exchange or result objects that I might pipe into a file read/statistics-gathering process.
Will this require more Camel routes to solve? What might be an alternative approach where I can get all the stats from the input and output files and keep them in one object/line of JSON?
(Very happy to receive constructive criticism - as in "why is your Java so heavy-handed" - and yes, it may well be. I am prototyping solutions at this stage, so this isn't production code, nor do I profess a deep understanding of Java internals - I can usually get stuff working, though.)
Use one route and two processors: one for writing the file and the next for reading it, so one finishes writing before the other starts reading.
Alternatively, you can use two routes: one that writes the file (to:file) and another that listens for and reads the file (from:file).
You can check the common EIP patterns, which will solve most of these questions, here:
https://www.enterpriseintegrationpatterns.com/patterns/messaging/
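Whichever routing option you pick, the core of the fix is the same: finish writing the output to disk first, then gather the output-file metrics from the file itself. Here is a minimal stdlib sketch of that ordering, outside Camel (the class name `OutputMetrics` and method `writeThenMeasure` are illustrative, not Camel API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class OutputMetrics {
    // Persist the processed stream first, then measure the real file on disk.
    static long writeThenMeasure(InputStream result, Path outputPath) throws IOException {
        Files.copy(result, outputPath, StandardCopyOption.REPLACE_EXISTING); // finish writing...
        return Files.size(outputPath); // ...then gather output-file metrics
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("camel-out", ".txt");
        InputStream result = new ByteArrayInputStream("processed content".getBytes());
        System.out.println(writeThenMeasure(result, out)); // size of the written file, in bytes
        Files.delete(out);
    }
}
```

In the two-processor version, the first processor would do the `Files.copy` step and the second would do the metrics gathering, guaranteeing the file exists before it is measured.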

Best way to send a JSon object from Servlet

The question may seem simple, but I haven't found a good answer yet. I need to send a JSON structure (built with an unspecified library I'm currently developing) from a Servlet to a remote page.
I'm interested in the best way to send the structure.
I mean, in my Servlet, inside the doPost() event, how should I manage the send?
I was thinking about 2 scenarios:
try (PrintWriter out = response.getWriter()) {
    out.print(myJSon.toString()); // <- recursive function that overrides
                                  //    toString() and returns the entire JSon
                                  //    structure
} (...)
or
try (OutputStream os = response.getOutputStream()) {
    myJSon.write(os, StandardCharsets.UTF_8); // <- function that recursively
                                              //    writes chunks of my JSon structure
                                              //    into a BufferedWriter created inside
                                              //    the root write function, forcing
                                              //    UTF-8 encoding
} (...)
Or something different, if there's a better approach.
Note that the JSON structure contains an array of objects with long text fields (descriptions with more than 1000 characters), so it can be quite memory-consuming.
As for why I'm not using standard JSON libraries: I don't know them, and I don't know if I can trust them yet. I also don't know if I will be able to install them on the production server.
Thanks for your answers.
From your question I see multiple points to address:
How to send your JSON
What JSON library you can use
How to use the library in production
How to send your JSon
From your code, this seems to be an HTTP response rather than a POST on your Servlet, so what you need to know is how to send a JSON string as an HTTP response's body.
Do you use a framework for your web server, or are you handling everything manually? If you use a framework, it usually does this for you: just pass the JSON string.
If you're doing it manually:
try (PrintWriter pw = response.getWriter()) {
pw.write(myJson.toString());
}
or
try (OutputStream os = response.getOutputStream()) {
os.write(myJson.toString().getBytes(StandardCharsets.UTF_8));
}
Both are valid, see Writer or OutputStream?
Your JSON's size shouldn't matter given what you're saying; it's just text, so it won't be big enough to matter.
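To illustrate why both options are equivalent, here is a standalone sketch using only java.io (no servlet API; the class name `WriterVsStream` is illustrative). Both paths produce exactly the same bytes as long as the charset matches:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterVsStream {
    // Encode the same string through a character Writer and through a raw
    // byte stream, and check that the resulting bytes are identical.
    static boolean sameBytes(String json) throws IOException {
        // Option 1: character Writer (a servlet container picks the charset
        // from setContentType/setCharacterEncoding; here we fix it to UTF-8)
        ByteArrayOutputStream viaWriter = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(viaWriter, StandardCharsets.UTF_8)) {
            w.write(json);
        }
        // Option 2: raw OutputStream, encoding explicitly
        byte[] viaStream = json.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(viaWriter.toByteArray(), viaStream);
    }

    public static void main(String[] args) throws IOException {
        // Non-ASCII characters make any charset mismatch visible
        System.out.println(sameBytes("{\"name\":\"résumé\"}"));
    }
}
```

The practical difference is only convenience: use the Writer when you already have a String, and the OutputStream when your library can stream bytes directly (which avoids building the whole String in memory).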
What libraries can you use
There are a lot of JSON libraries for Java, mainly:
Jackson
GSon
json-io
Genson
Go for the one you prefer; there will be extensive documentation and resources all over Google.
How to use in production
If you are not sure you can install dependencies on the production server, you can always create an uber-jar (see @Premraj's answer).
Basically, you bundle the dependency inside your jar.
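For example, assuming a Maven build (the plugin version shown is illustrative), the Shade plugin produces a single jar containing your classes plus all dependencies:

```
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.4.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Running mvn package then yields a self-contained jar you can deploy without installing the JSON library separately on the server.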
Using Gson is a good way to send JSON:
Gson gson = new Gson();
String jsonData = gson.toJson(student);
try (PrintWriter out = response.getWriter()) {
    out.println(jsonData);
}
For details, see json response from servlet in java.

How to read external JSON file from JMeter

Is there a way (any JMeter plugin) by which the JMeter script can read all the contents (String) from an external text file?
I have a utility in Java which uses the Jackson ObjectMapper to convert an ArrayList to a string and puts it into a text file on the desktop. The file has the JSON info that I need to send in the JMeter POST body.
I tried using ${__FileToString()}, but it was unable to deserialize the instance of java.util.ArrayList. It was also not reading all the values properly.
I am looking for something like the CSV reader, where I just give the file location. I need all the JSON info present in the file, extracted and assigned to the POST body.
Thanks for your help!
If your question is about how to deserialize an ArrayList in JMeter and dynamically build the request body, you can use e.g. a Beanshell PreProcessor for it.
Add a Beanshell PreProcessor as a child of your request
Put the following code into the PreProcessor's "Script" area:
FileInputStream in = new FileInputStream("/path/to/your/serialized/file.ser");
ObjectInput oin = new ObjectInputStream(in);
ArrayList list = (ArrayList) oin.readObject();
oin.close();
in.close();
for (int i = 0; i < list.size(); i++) {
sampler.addArgument("param" + i, list.get(i).toString());
}
The code will read the file as an ArrayList, iterate through it, and add request parameters like:
param0=foo
param1=bar
etc.
This is the closest answer I'm able to provide; if you need more exact advice, please elaborate on your question. In the meantime, I recommend you get familiar with the How to use BeanShell: JMeter's favorite built-in component guide to learn about scripting in JMeter and what pre-defined variables like "sampler" in the above code snippet mean.
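To try the read-side pattern above end-to-end outside JMeter, here is a self-contained sketch in plain Java (stdlib only; the class and file names are illustrative). The write side mimics what a producing utility would do with Java serialization; note this only works if the file really contains a Java-serialized ArrayList, not Jackson's JSON text output:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

public class SerializedListDemo {
    // Read side: the same ObjectInputStream pattern as the PreProcessor script
    static ArrayList<?> readList(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream oin = new ObjectInputStream(new FileInputStream(f))) {
            return (ArrayList<?>) oin.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("list", ".ser");

        // Write side: only valid if the producing utility used Java
        // serialization (writeObject), not JSON text
        ArrayList<String> original = new ArrayList<>();
        original.add("foo");
        original.add("bar");
        try (ObjectOutputStream oout = new ObjectOutputStream(new FileOutputStream(f))) {
            oout.writeObject(original);
        }

        ArrayList<?> list = readList(f);
        for (int i = 0; i < list.size(); i++) {
            System.out.println("param" + i + "=" + list.get(i)); // param0=foo, param1=bar
        }
        f.delete();
    }
}
```

If the file actually contains JSON text (as the Jackson-based utility suggests), ${__FileToString()} into the request body is the right direction instead, and the deserialization error indicates the file contents and the consumer's expectations don't match.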

Unidentified MAPI property returned by Apache POI

I was digging into the Apache POI API, trying out which properties it fetches out of an MSG file.
I parsed the MSG file using POIFSChunkParser.
Here is the code:
try
{
    InputStream is = new FileInputStream("C:\\path\\email.msg");
    POIFSFileSystem poifs = new POIFSFileSystem(is);
    POIFSChunkParser poifscprsr = new POIFSChunkParser();
    ChunkGroup[] chkgrps = poifscprsr.parse(poifs);
    for (ChunkGroup chunkgrp : chkgrps)
    {
        for (Chunk chunk : chunkgrp.getChunks())
        {
            System.out.println(chunk.getEntryName() + " ("
                    + chunk.getChunkId() + ") " + chunk);
        }
    }
}
catch (FileNotFoundException fnfe)
{
    System.out.println(fnfe.getMessage());
}
catch (IOException ioe)
{
    System.out.println(ioe.getMessage());
}
In the output it listed all accessible properties of the MSG. One of them looked like this:
__substg1.0_800A001F (32778) 04
I tried to find the significance of the property with hex 800A here. (The subnodes of this topic list the properties.)
Q1. However, I didn't find a property corresponding to hex 800A. So what should I infer?
Also, I have some other, somewhat related questions:
Q2. Does Apache POI expose all properties through MAPIMessage (I tried exploring all the methods of MAPIMessage too and started thinking it does not)?
Q3. If not, is there any other way to access all MAPI properties in Java, with or without Apache POI?
First up, be a little wary of using the very low-level HSMF classes if you're not following the Apache POI dev list. There have been some updates to HSMF fairly recently to start adding support for fixed-length properties, and more are needed. Generally the high-level classes will have a pretty stable API (even with the scratchpad warnings), while the lower-level ones can (and sometimes do) change as new support gets added. If you're not on the dev list, this might be a shock...
Next up: working out what stuff is. This is where the HSMF dev tools come in. The simple TypesLister will let you check all the types that POI knows about (slightly more than it supports), while HSMFDump will do its best to decode the file for you. If your chunk is of any kind of known type, between those two you can hopefully work out what it is and what it contains.
Finally: getting all properties. As alluded to above, Apache POI used to support only variable-length properties in .msg files. That has partly been corrected, with some fixed-length support in there too, but more work is needed. Volunteers are welcomed on the dev list! MAPIMessage will give you all the common bits, but it will also give you access to the different chunk groups. (A given message will be spread across a few different chunks, such as the main one, recipient ones, attachment ones, etc.) From there, you can get all the properties, along with the PropertiesChunk, which gives access to the fixed-length properties.
