Only print lines from bottom up if there is data - java

How would you go about solving the following problem?
I have a PDF file with these cells:
addressLine1
addressLine2
addressLine3
addressLine4
addressLine5
cityStateZip
All of them have getters.
Sometimes, all fields have data and sometimes they don't.
To make it pretty, I want them grouped together, i.e.:
1261 Graeber St (address4)
Bldg 2313 Rm 24 (address5)
Pensacola FL 32508 (cityStateZip)
The layout needs to account for some of these address lines being blank. For example, if addressLine1 is the only address line with data:
1261 Graeber St (address5)
Pensacola FL 32508 (cityStateZip)
Here, since address2, address3, and address4 are blank, address1 has moved into the PDF cell address5.
My code right now prints:
1261 Graeber St (address1)
(address2)
(address3)
(address4)
(address5)
Pensacola FL 32508 (cityStateZip)
And here is the code:
FdfInput.SetValue("addressLine1", getAddressLine1() );
FdfInput.SetValue("addressLine2", getAddressLine2() );
FdfInput.SetValue("addressLine3", getAddressLine3() );
FdfInput.SetValue("addressLine4", getAddressLine4() );
FdfInput.SetValue("addressLine5", getAddressLine5() );
FdfInput.SetValue("addressLine6", getCityStateZip() );
The picture on the left is how it looks right now; I want it to look like the picture on the right.
Is this a good candidate for LinkedList.insertLast() ?

This:
if (!getAddressLine1().isEmpty())
    FdfInput.SetValue("addressLine1", getAddressLine1());
if (!getAddressLine2().isEmpty())
    FdfInput.SetValue("addressLine2", getAddressLine2());
if (!getAddressLine3().isEmpty())
    FdfInput.SetValue("addressLine3", getAddressLine3());
if (!getAddressLine4().isEmpty())
    FdfInput.SetValue("addressLine4", getAddressLine4());
if (!getAddressLine5().isEmpty())
    FdfInput.SetValue("addressLine5", getAddressLine5());
if (!getCityStateZip().isEmpty())
    FdfInput.SetValue("cityStateZip", getCityStateZip());
In other words, if there is data to add to the line, do so, otherwise, skip it entirely. For example, let's say all of the fields are empty besides address3, address5, and cityStateZip.
The output will not look like this:
(blank)
(blank)
addressLine3
(blank)
addressLine5
cityStateZip
Instead, it will look like:
addressLine3
addressLine5
cityStateZip

I solved it by storing the strings in an ArrayList, bottom line first, and counting the field name down from addressLine6:
List<String> addrLines = new ArrayList<String>();
if (!getCityStateZip().isEmpty())
    addrLines.add(getCityStateZip());
if (!getAddressLine5().isEmpty())
    addrLines.add(getAddressLine5());
if (!getAddressLine4().isEmpty())
    addrLines.add(getAddressLine4());
if (!getAddressLine3().isEmpty())
    addrLines.add(getAddressLine3());
if (!getAddressLine2().isEmpty())
    addrLines.add(getAddressLine2());
if (!getAddressLine1().isEmpty())
    addrLines.add(getAddressLine1());
// addrLines.get(0) is the bottom line, so it goes into addressLine6, the next one into addressLine5, and so on.
for (int i = addrLines.size(); i > 0; --i)
{
    int line = addrLines.size() - i;
    String field = String.format("addressLine%d", 6 - line);
    FdfInput.SetValue(field, addrLines.get(line));
}
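The same bottom-up idea can be sketched without the repeated if statements. This is only an illustration under assumptions: the class and method names are made up, a Map stands in for the FdfInput.SetValue calls, and the bottom cell is assumed to be named addressLine6 as in the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AddressBottomUp {
    // Maps the non-blank lines onto the LAST field names (ending at addressLine<slots>),
    // keeping their top-to-bottom order. The returned map is a stand-in for
    // FdfInput.SetValue(field, value) calls; it is not part of any PDF library.
    static Map<String, String> layout(List<String> linesTopToBottom, int slots) {
        List<String> nonEmpty = new ArrayList<>();
        for (String s : linesTopToBottom) {
            if (s != null && !s.trim().isEmpty()) {
                nonEmpty.add(s);
            }
        }
        if (nonEmpty.size() > slots) {
            throw new IllegalArgumentException("more lines than fields");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        int firstSlot = slots - nonEmpty.size() + 1; // first field that receives data
        for (int i = 0; i < nonEmpty.size(); i++) {
            fields.put("addressLine" + (firstSlot + i), nonEmpty.get(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1261 Graeber St", "", "", "", "", "Pensacola FL 32508");
        // prints {addressLine5=1261 Graeber St, addressLine6=Pensacola FL 32508}
        System.out.println(layout(lines, 6));
    }
}
```

Each entry of the returned map would then be passed to FdfInput.SetValue, and the fields that are not in the map simply stay blank.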


How to read a Delimited Text file in Java?

We have the following layout for a SEQ file received over SFTP:
TSID ,D4 ; TEST ID # (PRIMARY)
TSNAME,A15 ; TEST NAME COMMON (ALTERNATE)
TSRNAM ,A15 ; PORT NAME
TSRELO ,A5 ; TEST REPEAT LOW VALUE
TSREHI ,A5 ; TEST REPEAT HIGH VALUE
TSSSRQ ,D2 ; SAMPLE SIZE REQ.
TSCTYP ,D2 ; CONTAINER TYPE
TSSUOM,A6 ; SAMPLE UNIT OF MEAS
TSINDX ,D4 ; WIDE REPORTING INDEX (ALTERNATE)
TSWKLF ,D2 ; WORKLIST FORMAT
TSMCCD,A8 ; MEDICARE CODE + MODIFIER 1 (ALTERNATE)
TSTADY ,D3 ; RESULT TURN-AROUND TIME IN DAYS
TSENOR ,A1 ; TEST HAS EXPANDED NORMALS Y/N
TSSRPT ,A1 ; ELIGIBLE FOR STATE NOTIFICATION REPORT Y/N
TSPLAB ,D2 ; SENDOUT LAB
The content of the file is plain text like:
0001MONTH COMPOSITE 12319909110940 MONTH COMPOSITE
0002MONTHLY CAPD 12319909120944 MONTHLY CAPD
0003CAPD MONTHLY LS 123199100110021004100510081010101210151016101811620944105917931794 CAPD MONTHLY LS
0004CCPD MONTHLY LS 12319910011002100410051007100810101012101510161018116209400942105917931794 CCPD MONTHLY LS
0005HD MONTHLY LS 1231991001100210041005100710081010101210151016101809400942105917931794 HD MONTHLY LS
Is there any built-in Java package (or third-party Java library) that can read such a delimited file (.SEQ) and assign each value to a POJO directly, using some sort of converters?
For example:
public class Ra {
    @SomethingLength(0, 4)
    private String tsId;
    @SomethingLength(4, 15)
    private String tsName;
}
(Note: we are using Apache Camel here, but I think Camel may be complicated compared to a simple library?)
You can use camel-bindy with fixed-length records (https://camel.apache.org/components/latest/bindy-dataformat.html#_4_fixedlengthrecord)
So your class will look like this:
@FixedLengthRecord(length = 15, paddingChar = ' ')
public class Fastbox {
    @DataField(pos = 1, length = 4, align = "L")
    private String tsId;
    @DataField(pos = 2, length = 11, align = "L")
    private String tsName;
}
With unmarshal() you can then convert the file to Java objects.
More details are in the link above.
Hope it helps!
After much consideration, I will use
http://fixedformat4j.ancientprogramming.com/usage/index.html
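For comparison, the first two fields can also be read without any library by slicing each line at fixed offsets. This is a minimal sketch, assuming TSID occupies the first 4 characters and TSNAME the next 15, as the layout in the question suggests; the class names are invented for illustration and are not part of Camel or fixedformat4j.

```java
class FixedWidthSketch {
    // Simple holder for the first two fields of the layout (TSID, TSNAME).
    static final class TestRecord {
        final String tsId;
        final String tsName;

        TestRecord(String tsId, String tsName) {
            this.tsId = tsId;
            this.tsName = tsName;
        }
    }

    // Offsets are assumptions derived from the layout: TSID = D4, TSNAME = A15.
    static final int ID_END = 4;
    static final int NAME_END = ID_END + 15;

    static TestRecord parse(String line) {
        if (line.length() < NAME_END) {
            throw new IllegalArgumentException("line shorter than " + NAME_END + " characters");
        }
        String id = line.substring(0, ID_END);
        String name = line.substring(ID_END, NAME_END).trim(); // drop the space padding
        return new TestRecord(id, name);
    }

    public static void main(String[] args) {
        TestRecord r = parse("0002MONTHLY CAPD   12319909120944");
        System.out.println(r.tsId + " | " + r.tsName);
    }
}
```

A library becomes more attractive once all fifteen fields and their type conversions have to be handled, which is where annotation-driven mappers save the repetitive offset arithmetic.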

regex \\p{So} is not filtering BLACK CIRCLE FOR RECORD

My question is about the Unicode character 'BLACK CIRCLE FOR RECORD',
which is defined in the Miscellaneous Technical block and the Symbol, Other [So] category (Ref).
This code is not working:
String registered= "President⏺";
System.out.println(registered.replaceAll("\\p{So}",""));
I get President⏺
The BLACK CIRCLE FOR RECORD is not being filtered by the \\p{So} regex!
Thanks
Knowing that the code point of ⏺ is U+23FA, we can list all characters in the \p{So} (Other_Symbol) category:
for (char ch = Character.MIN_VALUE; ch < Character.MAX_VALUE; ch++) {
    if (Character.OTHER_SYMBOL == Character.getType(ch)) {
        String s = String.format("\\u%04x", (int) ch);
        System.out.println(s);
    }
}
We'll see:
...
\u23f0
\u23f1
\u23f2
\u23f3
\u2400
...
It's clear that code points \u23f4 through \u23ff are not included, although they should be according to UnicodeData.txt. You can still match BLACK CIRCLE FOR RECORD, which correctly falls in the Miscellaneous Technical block, with \p{InMiscellaneous_Technical} in Java.
You are seeing a bug.
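One possible workaround, sketched here, is to add the missing code point to the character class explicitly, so the result does not depend on how the running JDK categorizes U+23FA. The class and method names are made up for illustration; \x{23FA} is the java.util.regex escape for a code point given in hexadecimal.

```java
class StripSymbolsSketch {
    // Removes everything in category So, plus U+23FA (BLACK CIRCLE FOR RECORD),
    // which is listed explicitly in case the JDK does not report it as So.
    static String stripSymbols(String s) {
        return s.replaceAll("[\\p{So}\\x{23FA}]", "");
    }

    public static void main(String[] args) {
        // "\u23FA" is the ⏺ character, written as an escape to avoid source-encoding issues.
        System.out.println(stripSymbols("President\u23FA")); // prints President
    }
}
```

If more characters from that range show up in the input, matching the whole block with \p{InMiscellaneous_Technical}, as mentioned above, may be the simpler choice.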
I wrote this code to figure out the character's Unicode category name:
Map<Byte, String> unicodeCategories = new HashMap<>();
unicodeCategories.put(Character.COMBINING_SPACING_MARK, "Mc");
unicodeCategories.put(Character.CONNECTOR_PUNCTUATION, "Pc");
unicodeCategories.put(Character.CONTROL, "Cc");
unicodeCategories.put(Character.CURRENCY_SYMBOL, "Sc");
unicodeCategories.put(Character.DASH_PUNCTUATION, "Pd");
unicodeCategories.put(Character.DECIMAL_DIGIT_NUMBER, "Nd");
unicodeCategories.put(Character.ENCLOSING_MARK, "Me");
unicodeCategories.put(Character.END_PUNCTUATION, "Pe");
unicodeCategories.put(Character.FINAL_QUOTE_PUNCTUATION, "Pf");
unicodeCategories.put(Character.FORMAT, "Cf");
unicodeCategories.put(Character.INITIAL_QUOTE_PUNCTUATION, "Pi");
unicodeCategories.put(Character.LETTER_NUMBER, "Nl");
unicodeCategories.put(Character.LINE_SEPARATOR, "Zl");
unicodeCategories.put(Character.LOWERCASE_LETTER, "Ll");
unicodeCategories.put(Character.MATH_SYMBOL, "Sm");
unicodeCategories.put(Character.MODIFIER_LETTER, "Lm");
unicodeCategories.put(Character.MODIFIER_SYMBOL, "Sk");
unicodeCategories.put(Character.NON_SPACING_MARK, "Mn");
unicodeCategories.put(Character.OTHER_LETTER, "Lo");
unicodeCategories.put(Character.OTHER_NUMBER, "No");
unicodeCategories.put(Character.OTHER_PUNCTUATION, "Po");
unicodeCategories.put(Character.OTHER_SYMBOL, "So");
unicodeCategories.put(Character.PARAGRAPH_SEPARATOR, "Zp");
unicodeCategories.put(Character.PRIVATE_USE, "Co");
unicodeCategories.put(Character.SPACE_SEPARATOR, "Zs");
unicodeCategories.put(Character.START_PUNCTUATION, "Ps");
unicodeCategories.put(Character.SURROGATE, "Cs");
unicodeCategories.put(Character.TITLECASE_LETTER, "Lt");
unicodeCategories.put(Character.UNASSIGNED, "Cn");
unicodeCategories.put(Character.UPPERCASE_LETTER, "Lu");
char registered = '⏺';
int code = (int) registered;
System.out.println("character's general category name = "
    + unicodeCategories.get((byte) Character.getType(code)));
I get character's general category name = Cn
System.out.println(String.valueOf(registered).replaceAll("\\p{Cn}", ""));
I get an empty string.
So '⏺' is in the Character.UNASSIGNED category, not the Character.OTHER_SYMBOL category, in this Java implementation!
System.out.println("Unicode name of the specified character = " + Character.getName(code));
Character.getName(code) returns null because the code point is unassigned.

Parsing a Tab Separated File

I'm attempting to parse a TSV file from IMDB:
$hutter Battle of the Sexes (2017) (as $hutter Boy) [Bobby Riggs Fan] <10>
NVTION: The Star Nation Rapumentary (2016) (as $hutter Boy) [Himself] <1>
Secret in Their Eyes (2015) (uncredited) [2002 Dodger Fan]
Steve Jobs (2015) (uncredited) [1988 Opera House Patron]
Straight Outta Compton (2015) (uncredited) [Club Patron/Dopeman]
$lim, Bee Moe Fatherhood 101 (2013) (as Brandon Moore) [Himself - President, Passages]
For Thy Love 2 (2009) [Thug 1]
Night of the Jackals (2009) (V) [Trooth]
"Idle Talk" (2013) (as Brandon Moore) [Himself]
"Idle Times" (2012) {(#1.1)} (as Brandon Moore) [Detective Ryan Turner]
As you can see, some lines start with a tab and some do not. I want a map with the actor's name as the key and a list of movies as the value. The actor's name is separated from the movie listing by one or more tabs.
My code:
while ((line = reader.readLine()) != null) {
    Matcher matcher = headerPattern.matcher(line);
    boolean headerMatchFound = matcher.matches();
    if (headerMatchFound) {
        Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");
        String newline;
        reader.readLine();
        while ((newline = reader.readLine()) != null) {
            String[] fullLine = null;
            String actor;
            String title;
            Pattern startsWithTab = Pattern.compile("^\t.*");
            Matcher tab = startsWithTab.matcher(newline);
            boolean tabStartMatcher = tab.matches();
            if (!tabStartMatcher) {
                fullLine = newline.split("\t.*");
                System.out.println("Actor: " + fullLine[0] +
                        "Movie: " + fullLine[1]);
            }//this line will have code to match lines that start with tabs.
        }
    }
}
The way I've done this only works for a few lines before I get an ArrayIndexOutOfBoundsException. How can I parse the lines and split them into at most two strings if they contain one or more tabs?
There are subtleties in parsing tab/comma-delimited data files having to do with quoting and escaping.
To save yourself a lot of work, frustration and headaches, you really should consider using one of the existing CSV parsing libraries such as OpenCSV or Apache Commons CSV.
Posted as an answer instead of a comment because the OP has not stated a reason for reinventing the wheel and there are some tasks that really have been "solved" once and for all.
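If a library is not an option, the split itself can be sketched with the two-argument form of String.split, which caps the number of pieces. In the question's code the pattern "\t.*" consumes the tab and everything after it, so only one element comes back and fullLine[1] does not exist. The sketch below assumes that a line starting with a tab continues the previous actor's list; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ActorListSketch {
    // Builds actor -> titles. A line that starts with a tab is treated as another
    // title for the most recent actor (an assumption about the file layout).
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> byActor = new LinkedHashMap<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank separator lines
            }
            if (line.startsWith("\t")) {
                if (currentActor != null) {
                    byActor.get(currentActor).add(line.trim());
                }
                continue;
            }
            // Limit 2: split once on the first run of tabs, keep the rest intact.
            String[] parts = line.split("\t+", 2);
            currentActor = parts[0];
            List<String> titles = byActor.computeIfAbsent(currentActor, k -> new ArrayList<>());
            if (parts.length == 2) {
                titles.add(parts[1].trim());
            }
        }
        return byActor;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "$hutter\t\tBattle of the Sexes (2017)  (as $hutter Boy)",
                "\t\t\tSteve Jobs (2015)  (uncredited)",
                "$lim, Bee Moe\tFatherhood 101 (2013)",
                "\t\t\tFor Thy Love 2 (2009)  [Thug 1]");
        parse(sample).forEach((actor, titles) -> System.out.println(actor + " -> " + titles));
    }
}
```

Checking parts.length before reading the second element is what avoids the out-of-bounds exception on lines that contain no tab at all.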

Need to filter, parse and sort multiple log files

I have a need to collect a subset of info from log files that reside on one-to-many log file servers. I have the following java code that does the initial data collection/filtering:
public String getLogServerInfo(String userName, String password, String hostNames, String id) throws Exception {
    int timeout = 5;
    String results = "";
    String[] hostNameArray = hostNames.split("\\s*,\\s*");
    for (String hostName : hostNameArray) {
        SSHClient ssh = new SSHClient();
        ssh.addHostKeyVerifier(new PromiscuousVerifier());
        try {
            Utils.writeStdOut("Parsing server: " + hostName);
            ssh.connect(hostName);
            ssh.authPassword(userName, password);
            Session s = ssh.startSession();
            try {
                String sh1 = "cat /logs/en/event/event*.log | grep \"" + id + "\" | grep TYPE=ERROR";
                Command cmd = s.exec(sh1);
                results += IOUtils.readFully(cmd.getInputStream()).toString();
                cmd.join(timeout, TimeUnit.SECONDS);
                Utils.writeStdOut("\n** exit status: " + cmd.getExitStatus());
            } finally {
                s.close();
            }
        } finally {
            ssh.disconnect();
            ssh.close();
        }
    }
    return results;
}
The results string variable looks something like this:
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, HOST=server1, APPLICATION=app1, FUNCTION=function1, STATUS=null, GUID=null, etc. etc.
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, HOST=server1, APPLICATION=app1, FUNCTION=function1, STATUS=null, GUID=null, etc. etc.
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, HOST=server2, APPLICATION=app1, FUNCTION=function2, STATUS=null, GUID=null, etc. etc.
I need to accomplish the following:
What do I need to do to be able to sort the results by TIMESTAMP? They are unsorted right now, because I am enumerating one to many files and appending the results to the end of a string.
I only want a subset of "columns" returned, such as TYPE, TIMESTAMP, FUNCTION. I thought I could regex it in the grep, but maybe arrays would be better?
Results are simply being printed to console/report, as this is only printed for failed tests, and is there for troubleshooting purposes only.
I took the list of output that you provided and put it in a file, named test.txt, making sure that each "TYPE=ERROR etc. etc" was in a new line (I guess it's the same in your output, but it isn't clear).
Then I used cat test.txt | cut -d',' -f1,2,5 | sort -k2 to do what you want.
cut -d',' -f1,2,5 basically splits by comma and only reports tokens number 1,2,5 (TYPE,TIMESTAMP,FUNCTION). If you want more, you can add more numbers depending on what token you want
sort -k2 sorts according to the 2nd column (TIMESTAMP)
The output I get is:
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, FUNCTION=function2
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, FUNCTION=function1
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, FUNCTION=function1
So what you should try and do, is to further pipe your command with |cut -d',' -f1,2,5 | sort -k2
I hope it helps.
After working on this some more, I found that one of the key/value pairs allows commas in its values, so cut will not work. Here is the finished product:
My grep command stays the same, collecting data from all servers:
String sh1 = "cat /logs/en/event/event*.log | grep \"" + id + "\" | grep TYPE=ERROR";
Command cmd = s.exec(sh1);
results += IOUtils.readFully(cmd.getInputStream()).toString();
Put the string into an array, so I can process it line by line:
String lines[] = results.split("\r?\n");
I then used regex to get the data I needed, repeating the code below for each line in the array, and for as many columns as needed. It's a bit of a hack; I probably could have done it better by simply replacing the comma in the offending key/value pair, then using split() with a comma as the delimiter, then looping over the fields I want.
lines2[i] = "";
Pattern p = Pattern.compile("TYPE=(.*?), APPLICATION=.*");
Matcher m = p.matcher(lines[i]);
if (m.find()) {
    lines2[i] += ("TYPE=" + m.group(1));
}
Finally, this sorts by timestamp, since it is the second column and every line starts with the same TYPE=ERROR prefix:
Arrays.sort(lines2);
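Note that Arrays.sort compares the lines as plain text, so with a 12-hour clock a 01:00 PM entry would sort before a 07:14 AM entry. A possible refinement, sketched below, parses the timestamp and sorts on the parsed value. The pattern assumes month/day/year order and an AM/PM marker exactly as in the sample lines; the class and method names are invented for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogSortSketch {
    // Captures e.g. "10/03/2015 07:14:31 253 AM" from a log line.
    private static final Pattern TS = Pattern.compile(
            "TIMESTAMP=(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2} \\d{3} [AP]M)");
    // Assumed layout: month/day/year, 12-hour clock, milliseconds, AM/PM.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss SSS a", Locale.ENGLISH);

    // Lines without a recognizable timestamp sort first.
    static LocalDateTime timestampOf(String line) {
        Matcher m = TS.matcher(line);
        return m.find() ? LocalDateTime.parse(m.group(1), FMT) : LocalDateTime.MIN;
    }

    static List<String> sortByTimestamp(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(LogSortSketch::timestampOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, HOST=server1, FUNCTION=function1",
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, HOST=server1, FUNCTION=function1",
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, HOST=server2, FUNCTION=function2");
        sortByTimestamp(lines).forEach(System.out::println);
    }
}
```

Sorting on the parsed value also keeps working if the column order changes, since the comparator no longer depends on where TIMESTAMP sits in the line.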

Weird BufferedReader behavior for a huge file

I am getting a very weird error. My program reads a CSV file.
Whenever it comes to this line:
"275081";"cernusco astreet, milan, italy";NULL
I get an error:
In the debugger, I see that the BufferedReader read only
"275081";"cernusco as
That is only part of the line, but it should read the whole line.
What bugs me the most is that when I simply remove that line from the CSV file, the bug disappears and the program runs without any problem. I could remove the line, since it may be bad input, but I want to understand why I am having this problem.
For better understanding, I will include a part of my code here:
reader = new BufferedReader(new FileReader(userFile));
reader.readLine(); // skip first line
while ((line = reader.readLine()) != null) {
    String[] fields = line.split("\";\"");
    int id = Integer.parseInt(stripPunctionMark(fields[0]));
    String location = fields[1];
    if (location.contains("\";")) { // When there is no age. The data is represented as "location";NULL. We cannot split for ";" here. So check for "; and split.
        location = location.split("\";")[0];
        System.out.printf("Added %d at %s\n", id, location);
        people.put(id, new Person(id, location));
        numberOfPeople++;
    }
    else {
        int age = Integer.parseInt(stripPunctionMark(fields[2]));
        people.put(id, new Person(id, location, age));
        System.out.printf("Added %d at: %s age: %d \n", id, location, age);
        numberOfPeople++;
    }
}
You can find the CSV file here; below is a short excerpt of the part where I encountered the error:
"275078";"el paso, texas, usa";"62"
"275079";"istanbul, eurasia, turkey";"26"
"275080";"madrid, n/a, spain";"29"
"275081";"cernusco astreet, milan, italy";NULL
"275082";"hacienda heights, california, usa";"16"
"275083";"cedar rapids, iowa, usa";"22"
This has nothing whatsoever to do with BufferedReader. It doesn't even appear in the stack trace.
It has to do with your failure to check the result and length of the array returned by String.split(). Instead you are just assuming the input is well-formed, with at least three columns in each row, and you have no defences if it isn't.
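A defensive version of the parsing step might look like the sketch below: it checks the number of fields before indexing and reports malformed lines instead of throwing. It assumes that no value contains a semicolon, which holds for the excerpt in the question; the Person class and method names here are simplified stand-ins, not the asker's actual classes.

```java
import java.util.Optional;

class PersonLineParser {
    // Simplified stand-in for the question's Person; age is null when the file says NULL.
    static final class Person {
        final int id;
        final String location;
        final Integer age;

        Person(int id, String location, Integer age) {
            this.id = id;
            this.location = location;
            this.age = age;
        }
    }

    // Strips one pair of surrounding double quotes, if present.
    static String unquote(String s) {
        boolean quoted = s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
        return quoted ? s.substring(1, s.length() - 1) : s;
    }

    // Returns Optional.empty() for any line that does not have exactly three usable fields.
    static Optional<Person> parse(String line) {
        String[] fields = line.split(";");
        if (fields.length != 3) {
            return Optional.empty(); // truncated or otherwise malformed line
        }
        try {
            int id = Integer.parseInt(unquote(fields[0]));
            String location = unquote(fields[1]);
            Integer age = "NULL".equals(fields[2]) ? null : Integer.valueOf(unquote(fields[2]));
            return Optional.of(new Person(id, location, age));
        } catch (NumberFormatException e) {
            return Optional.empty(); // id or age was not numeric
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("\"275081\";\"cernusco astreet, milan, italy\";NULL").isPresent());
        System.out.println(parse("\"275081\";\"cernusco as").isPresent());
    }
}
```

Logging the rejected lines together with their line numbers would also show exactly which input the reader handed over, which helps pin down why the line appears truncated.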
