slf4j logger info format - java

I'm trying to generate a 'nice' log message without using escape characters or an XML pattern, etc.
What I'd like to output is something along the lines of:
ABAS : value
B : value
Cse : value
I've achieved this using \t, but I figure there must be a cleaner way. I've looked at .info(), which takes arguments and uses {} as a way of inserting the values, but I can't seem to find out how to add the line breaks or tabbing.
So far I have:
logger.info("ABAS : {} \nB : {} \nCse : {}", ...) and so on.
Thanks for any help.

SLF4J is the logging frontend and is only intended to pass the log level, message, and so on to the backend, most likely Logback in your case. You shouldn't format your messages in the frontend expecting any particular layout in the actual log output, because that layout can be configured more or less freely by whatever backend is used. Indentation across independent lines in particular doesn't work, because you can't know how lines start, whether your logger names are part of each line, where the message is placed within a line, and so on. Just look at the Logback configuration and everything that is possible there: as the programmer issuing the log message, how would you know which of those possibilities is in use at runtime in any given environment of your software? You simply can't, and therefore shouldn't assume too much.
So what you want is simply not possible, and besides embedding tabs or newlines there is nothing to format log messages with in SLF4J, for good reason. You can't count on your tabs either, because how they are presented to someone looking at your log file depends entirely on the text editor or viewer they use. It may even convert tabs to spaces, or show them with a width of 1 or 10 or anything else.
Log statements spanning multiple lines can be considered bad practice in general.
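If the goal is simply readable key/value output, one option consistent with this advice is to keep layout out of the message entirely and emit one parameterized statement per value, letting the backend's pattern handle timestamps, alignment and line breaks. A minimal sketch (the logger name and the values are placeholders):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ValueLogging {

    private static final Logger logger = LoggerFactory.getLogger(ValueLogging.class);

    public static void main(String[] args) {
        // One parameterized statement per value: no tabs or newlines embedded in the
        // message, so the backend's pattern stays in full control of the layout.
        logger.info("ABAS : {}", 42);
        logger.info("B    : {}", "value");
        logger.info("Cse  : {}", 3.14);
    }
}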

Related

Defining a manual Split algorithm for File Input

I'm new to Spark and the Hadoop ecosystem and already fell in love with it.
Right now, I'm trying to port an existing Java application over to Spark.
This Java application is structured the following way:
Read the file(s) one by one with a BufferedReader and a custom Parser class that does some heavy computing on the input data. The input files are 1 GB up to a maximum of 2.5 GB each.
Store data in memory (in a HashMap<String, TreeMap<DateTime, List<DataObjectInterface>>>)
Write out the in-memory data store as JSON. These JSON files are smaller in size.
I wrote a Scala application that processes my files with a single worker, but that is obviously not the most performance I can get out of Spark.
Now to my problem with porting this over to Spark:
The input files are line-based. I usually have one message per line. However, some messages depend on preceding lines to form an actual valid message in the Parser. For example, it could happen that I get data in the following order in an input file:
{timestamp}#0x033#{data_bytes} \n
{timestamp}#0x034#{data_bytes} \n
{timestamp}#0x035#{data_bytes} \n
{timestamp}#0x0FE#{data_bytes}\n
{timestamp}#0x036#{data_bytes} \n
To form an actual message out of the "composition message" 0x036, the parser also needs the lines from messages 0x033, 0x034 and 0x035. Other messages can also appear in between this set of needed messages. Most messages, though, can be parsed by reading a single line.
Now finally my question:
How do I get Spark to split my file correctly for my purposes? The files cannot be split "randomly"; they must be split in a way that ensures all my messages can be parsed and that the Parser will not wait for input that it will never get. This means that each composition message (a message that depends on preceding lines) needs to be contained in one split.
I guess there are several ways to achieve a correct output, but I'll throw some ideas I had into this post as well:
Define a manual split algorithm for the file input that checks that the last few lines of a split do not contain the start of a "big" message [0x033, 0x034, 0x035].
Let Spark split the file however it wants, but also add a fixed number of lines (let's say 50; that will certainly do the job) from the previous split to the next split. Duplicate data will be handled correctly by the Parser class and will not introduce any issues.
The second way might be easier; however, I have no clue how to implement it in Spark. Can someone point me in the right direction?
Thanks in advance!
I saw your comment on my blog post at http://blog.ae.be/ingesting-data-spark-using-custom-hadoop-fileinputformat/ and decided to give my input here.
First of all, I'm not entirely sure what you're trying to do, so help me out here: your file contains lines with the 0x033, 0x034, 0x035 and 0x036 records, so Spark will process them separately, while these lines actually need to be processed together?
If this is the case, you shouldn't interpret it as a "corrupt split". As you can read in the blog post, Spark splits files into records that it can process separately. By default it does this by splitting records on newlines. In your case, however, your "record" is actually spread over multiple lines. So yes, you can use a custom FileInputFormat. I'm not sure that will be the easiest solution, however.
You can try to solve this using a custom FileInputFormat that does the following: instead of handing out the file line by line like the default FileInputFormat does, you parse the file and keep track of the records encountered (0x033, 0x034, etc.). In the meantime you may filter out records like 0x0FE (not sure if you want to use them elsewhere). The result is that Spark gets all these physical records as one logical record.
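A rough Java skeleton of that idea, assuming the Hadoop mapreduce API, could look like the following. It deliberately marks files as non-splittable so a composition group can never be torn apart by a split boundary (trading intra-file parallelism for simplicity), and isCompleteMessage() is only a placeholder for your message-ID logic, so treat this as a sketch rather than a drop-in implementation:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MessageInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Simplification: one split per file, so a composition message is never cut in half.
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new MessageRecordReader();
    }

    public static class MessageRecordReader extends RecordReader<LongWritable, Text> {

        private final LineRecordReader lines = new LineRecordReader();
        private final LongWritable key = new LongWritable();
        private final Text value = new Text();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
            lines.initialize(split, context);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            StringBuilder record = new StringBuilder();
            while (lines.nextKeyValue()) {
                record.append(lines.getCurrentValue().toString()).append('\n');
                // A standalone message is complete immediately; a composition group
                // only once its final record (e.g. 0x036) has been buffered.
                if (isCompleteMessage(record)) {
                    key.set(lines.getCurrentKey().get());
                    value.set(record.toString());
                    return true;
                }
            }
            return false; // end of file; a trailing partial group is dropped in this sketch
        }

        private boolean isCompleteMessage(StringBuilder buffered) {
            return true; // placeholder: replace with checks on the message IDs (0x033..0x036)
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public Text getCurrentValue() { return value; }
        @Override public float getProgress() throws IOException { return lines.getProgress(); }
        @Override public void close() throws IOException { lines.close(); }
    }
}

From Spark you would then read the files with something like sc.newAPIHadoopFile(inputPath, MessageInputFormat.class, LongWritable.class, Text.class, new Configuration()) and hand each Text value to the existing Parser.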
On the other hand, it might be easier to read the file line by line and map the records using a functional key (e.g. [object 33, 0x033], [object 33, 0x034], ...). This way you can combine these lines using the key you chose.
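As a sketch of that second approach with the Spark Java API (messageGroupKey() and parseGroup() are hypothetical stand-ins for your own key derivation and your existing Parser):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class MessageGrouping {

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "message-grouping");
        JavaRDD<String> lines = sc.textFile("hdfs:///data/messages"); // placeholder path

        // Key every physical line by the logical message it belongs to,
        // then pull the lines of each logical message back together.
        JavaPairRDD<String, Iterable<String>> grouped = lines
                .mapToPair(line -> new Tuple2<String, String>(messageGroupKey(line), line))
                .groupByKey();

        // Note: groupByKey does not guarantee line order within a group, so the
        // parser may need to re-sort the lines (e.g. by timestamp) before assembling.
        JavaRDD<String> parsed = grouped.map(group -> parseGroup(group._1(), group._2()));
        parsed.saveAsTextFile("hdfs:///data/messages-json"); // placeholder output path

        sc.stop();
    }

    private static String messageGroupKey(String line) {
        return line; // placeholder: derive e.g. "object 33" from the message ID in the line
    }

    private static String parseGroup(String key, Iterable<String> groupLines) {
        return key; // placeholder: hand the lines to the existing Parser and emit JSON
    }
}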
There are certainly other options. Whichever you choose depends on your use case.

Passing obtained values to other templates

I have a main template that captures a string:
#(captured: String)
.... other templating stuff
I have a sub template that wants to utilize #captured:
.... somewhere in this templating stuff we have:
#subTemplate(#captured) <- wants to use #captured
I try this and I get nothing but errors. I'm sure this MUST be possible, so what am I doing wrong? I'm sorry if this question is simple; I just don't know how to succinctly phrase it for Google.
You need to remove the leading # symbol on captured when it is being passed in as a variable.
e.g
#subTemplate(#captured) --> #subTemplate(captured)
The reason is that # is a special symbol that tells Play the template engine is about to do some computation, rather than just outputting HTML. In the case above, by calling the sub-template you have already started a computation (i.e. used the # symbol), so you do not use it again inside the parentheses, because the compiler is already in computation mode.
This was exactly the same in the Play 1.x template engine.
Remove the leading 'at' in #captured. For some odd reason, Play didn't want to pick up on this and make it work until now. Seeing if I can reproduce the problem.

Apache Commons Logging - New Line Characters

We are using apache commons logging in our application for all our logging requirements.
A funny use case popped up this week that I had a question about.
We started receiving FIX messages from clients where a particular field in the FIX message is populated based on the values in a free-form textarea in an application that our client has. The client is allowed to enter any text they want, including special characters, new lines, etc.
We log the FIX message we receive, but when we receive a FIX message that includes this tag with newline characters in it, only the part of the FIX message up until the newline character is logged. Is there any way to tell the logging framework to ignore newline characters and log the entire string whether or not it contains newlines?
Are you sure the message isn't being logged on a new line? We do a similar thing and newlines are logged without any extra configuration. Are you grepping for these lines in the log to view them? I ask because the continuation will show on a new line, and therefore not in the output of grep, unless you add the '-A 5' flag (or however many lines you want to see after the match) to your grep statement to see the lines after the matching one.

Mask Passwords with Logback?

We currently generically log all XML documents coming in and going out of our system, and some of them contain passwords in the clear. We would like to be able to configure the logback logger/appender that is doing this to do some pattern matching or similar and if it detects a password is present to replace it (with asterisks most likely). Note we don't want to filter out the log entry, we want to mask a portion of it. I would appreciate advice on how this would be done with logback. Thanks.
Logback version 0.9.27 introduced replacement capability. Replacements support regular expressions. For example, if the logged message was "userid=alice, pswd='my secret'", and the output pattern was
"%d [%t] %logger - %msg%n",
you just modify the pattern to
"%d [%t] %logger - %replace(%msg){"pswd='.*'", "pswd='xxx'"}%n"
Note that the above makes use of option quoting.
The previous log message would be output as "userid=alice, pswd='xxx'"
For blazing performance, you could also mark the log statement as CONFIDENTIAL and instruct %replace to perform the replacement only for log statements marked as CONFIDENTIAL. For example:
Marker confidential = MarkerFactory.getMarker("CONFIDENTIAL");
logger.info(confidential, "userid={}, password='{}'", userid, password);
Unfortunately, the current version of logback does not yet support conditional replacements (based on markers or otherwise). However, you could easily write your own replacement code by extending ReplacingCompositeConverter. Shout on the logback-user mailing list if you need further assistance.
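As a rough sketch of that last suggestion (assuming logback 1.2.x, where ILoggingEvent still exposes getMarker(); the class name and conversion word below are made up), a subclass could apply the replacement only when the CONFIDENTIAL marker is present:

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.pattern.ReplacingCompositeConverter;
import org.slf4j.Marker;

public class ConfidentialReplacingConverter extends ReplacingCompositeConverter<ILoggingEvent> {

    @Override
    protected String transform(ILoggingEvent event, String in) {
        Marker marker = event.getMarker(); // logback 1.2.x API; newer versions use getMarkerList()
        if (marker != null && marker.contains("CONFIDENTIAL")) {
            return super.transform(event, in); // apply the regex/replacement given in the pattern options
        }
        return in; // leave non-confidential messages untouched
    }
}

It would then be registered in logback.xml with a <conversionRule conversionWord="maskReplace" converterClass="..."/> pointing at the fully qualified name of the class above, and used in the pattern as %maskReplace(%msg){"pswd='.*'", "pswd='xxx'"} in place of %replace.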
I believe masking is an aspect of your business, not an aspect of any technology or logging system. There are situations where passwords, national identity numbers, etc. should be masked even when storing them in the DB, for legal reasons. You should be able to mask the XML before giving it to the logger.
One way to do it is to run the XML through an XSLT transform that does the masking and then give the result to the logger.
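For illustration, a minimal JAXP sketch of that idea; the <password> element name and the mask text are assumptions to be adapted to the real documents:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PasswordMasker {

    // Identity transform plus one template that replaces the text content of a
    // (hypothetical) <password> element with asterisks.
    private static final String MASK_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "  <xsl:template match='@*|node()'>"
          + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "  </xsl:template>"
          + "  <xsl:template match='password/text()'>****</xsl:template>"
          + "</xsl:stylesheet>";

    public static String mask(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(MASK_XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}

The logging call then becomes something like logger.info(mask(xmlMessage)), so the appender only ever sees the masked document.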
If you don't want to do this, then Logback has filter support, which is one option (though not the right one, in my opinion).
But understand that any generic, out-of-the-box solution you try to find at the logging-infrastructure level is going to be suboptimal, as every log message will have to be checked for masking.

What does the org.apache.xmlbeans.XmlException with a message of "Unexpected element: CDATA" mean?

I'm trying to parse and load an XML document; however, I'm getting this exception when I call the parse method on the class that extends XmlObject. Unfortunately, it gives me no idea which element is unexpected, which is my problem.
I am not able to share the code for this, but I can try to provide more information if necessary.
Since you can't share code or input data, you might consider the following approach. It's a very common dichotomic approach to diagnosis, I'm afraid, and indeed you may already have started down this path...
Try to reduce the size of the input XML by removing parts of it, ensuring that the underlying XML document remains well formed and possibly valid (if validity is required in your parser's setup). If you maintain validity, this may require altering [a copy of] the schema (DTD or other), as mandatory elements might be removed during the cut-and-try approach... BTW, the error message seems to hint more at a validation issue than at a basic well-formedness issue.
Unless you have a particular hunch as to the area that triggers the parser's complaint, typically remove (or re-add, once things start working) about half of what was previously cut or re-added.
You may also start by trying a mostly empty file, to assert that the parser works at all... Here again the idea is to "divide and conquer": is the issue in the XML input or in the parser? (Remember that there could be two issues, one in the input and one in the parser, and that such issues could even be unrelated...)
Sorry to belabor basic diagnostic techniques, which you may well be fluent with...
You should check the arguments you are passing to the parse() method,
i.e. whether you are directly passing a string to parse, or a File, or an InputStream, and use the overload that matches (File/InputStream/String).
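In other words, make sure the overload matches what you hand in: with XMLBeans, XmlObject.Factory.parse(String) treats the string as the XML text itself, not as a file path, while files and streams go through their own overloads. A small illustration (the file path is a placeholder):

import java.io.File;
import java.io.IOException;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;

public class ParseExamples {

    public static void main(String[] args) throws XmlException, IOException {
        // The String overload parses the argument as XML markup...
        XmlObject fromText = XmlObject.Factory.parse("<root><child/></root>");

        // ...while a document on disk must go through the File (or InputStream) overload.
        XmlObject fromFile = XmlObject.Factory.parse(new File("/path/to/document.xml"));

        System.out.println(fromText.xmlText());
        System.out.println(fromFile.xmlText());
    }
}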
The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.
The problem occurs within the third-party PiccoloLexer library that XMLBeans relies on. It was fixed in revision 959082 but has not been applied to the xbean 2.5 jar.
XMLBeans - Problem with XML files if length is exactly 8193 bytes
Issue reported in the XMLBeans JIRA
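If upgrading past the PiccoloLexer fix isn't an option, one possible (untested) workaround based on the behaviour described above is to read the document yourself and pad it when it happens to hit the problematic length. This assumes a UTF-8 encoded file and that trailing whitespace after the root element is acceptable:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlObject;

public class PiccoloWorkaround {

    public static XmlObject parseAvoidingLengthBug(String path) throws IOException, XmlException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        String xml = new String(bytes, StandardCharsets.UTF_8);
        if (bytes.length == 8193) {
            // Exactly 8193 bytes triggers the lexer bug; one extra character avoids it,
            // and trailing whitespace after the root element keeps the document well formed.
            xml = xml + "\n";
        }
        return XmlObject.Factory.parse(xml);
    }
}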
