I want to extract a piece of information from a log file. The pattern I am searching for is the prompt (the node name) followed by the command. I want to extract the output of each command and compare the outputs. Consider the following sample output:
NodeName > command1
this is the sample output
NodeName > command2
this is the sample output
I have tried the following code.
public static void searchcommand( String strLineString)
{
String searchFor = "NodeName > command1"; // must match the prompt exactly as it appears in the log
String endStr = "NodeName";
String op="";
int end=0;
int len = searchFor.length();
int result = 0;
if (len > 0) {
int start = strLineString.indexOf(searchFor);
while(start!=-1){
end = strLineString.indexOf(endStr,start+len);
if(end!=-1){
op=strLineString.substring(start, end);
}else{
op=strLineString.substring(start, strLineString.length());
}
String[] arr = op.split("%%%%%%%");
for (String z : arr) {
System.out.println(z);
}
start = strLineString.indexOf(searchFor,start+len);
}
}
}
The issue is that the code is too slow to extract the data. Is there any other way to do so?
EDIT 1
It's a log file, which I have read into a string in the above code.
My suggestion..
public static void main(String[] args) {
String log = "NodeName > command1 \n" + "this is the sample output \n"
+ "NodeName > command2 \n" + "this is the sample output";
String lines[] = log.split("\\r?\\n");
boolean record = false;
String statements = "";
for (int j = 0; j < lines.length; j++) {
String line = lines[j];
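if(line.startsWith("NodeName")){
if(record){
//process your statement
System.out.println(statements);
}
record = true; // keep recording after every prompt line
statements = ""; // Reset statement
continue;
}
if(record){
statements += line;
}
}
if(record){
// process the output of the last command as well
System.out.println(statements);
}
}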
Here is my suggestion:
Use a regular expression. Here is one:
final String input = " NodeName > command1\n" +
"\n" +
" this is the sample output1 \n" +
"\n" +
" NodeName > command2 \n" +
"\n" +
" this is the sample output2";
final String regex = ".*?NodeName > command(\\d)(.*?)(?=NodeName|\\z)";
final Matcher matcher = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
while(matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2).trim());
}
Output:
1
this is the sample output1
2
this is the sample output2
So, to break down the regex:
First, it skips all characters until it finds the first "NodeName > command", followed by a digit. We want to keep this digit, to know which command created the output. Next, we grab all the following characters until we find (using a lookahead) another NodeName, or the end of the input.
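Since the question also asks to compare the outputs, here is a minimal sketch that builds on the same regex idea and collects each command's output into a map. The class name CommandOutputs and the method outputsByCommand are made up for illustration, and it assumes the prompt is literally "NodeName >" and that a command name contains no whitespace. If the same command occurs twice, the later output overwrites the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandOutputs {
    // Compiled once and reused. Assumption: the prompt is literally "NodeName >".
    private static final Pattern BLOCK = Pattern.compile(
            "NodeName\\s*>\\s*(\\S+)(.*?)(?=NodeName\\s*>|\\z)", Pattern.DOTALL);

    // Maps each command name to the trimmed text that follows it in the log.
    public static Map<String, String> outputsByCommand(String log) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(log);
        while (m.find()) {
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = "NodeName > command1\nthis is the sample output\n"
                + "NodeName > command2\nthis is the sample output\n";
        Map<String, String> outputs = outputsByCommand(log);
        // Compare the outputs of two commands
        System.out.println(outputs.get("command1").equals(outputs.get("command2")));
    }
}
```

With the sample log from the question both outputs are identical, so this prints true.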
I would like to count countSign, countX and countI in the same loop instead of creating three different loops. Is there an easy way to do that?
public class Absence {
private static File file = new File("/Users/naplo.txt");
private static File file_out = new File("/Users/naplo_out.txt");
private static BufferedReader br = null;
private static BufferedWriter bw = null;
public static void main(String[] args) throws IOException {
int countSign = 0;
int countX = 0;
int countI = 0;
String sign = "#";
String absenceX = "X";
String absenceI = "I";
try {
br = new BufferedReader(new FileReader(file));
bw = new BufferedWriter(new FileWriter(file_out));
String st;
while ((st = br.readLine()) != null) {
for (String element : st.split(" ")) {
if (element.matches(sign)) {
countSign++;
continue;
}
if (element.matches(absenceX)) {
countX++;
continue;
}
if (element.matches(absenceI)) {
countI++;
}
}
}
System.out.println("2. exerc.: There are " + countSign + " rows in the file with that sign.");
System.out.println("3. exerc.: There are " + countX + " with sick note, and " + countI + " without sick note!");
} catch (FileNotFoundException ex) {
Logger.getLogger(Absence.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
text file example:
# 03 26
Jujuba Ibolya IXXXXXX
Maracuja Kolos XXXXXXX
I think you meant using fewer than 3 if statements. You can actually do it with no ifs.
In your for loop write this:
countSign += (element.matches(sign)) ? 1 : 0;
countX += (element.matches(absenceX)) ? 1 : 0;
countI += (element.matches(absenceI)) ? 1 : 0;
Both answers check if the word (element) matches all regular expressions while this can (and should, if you ask me) be avoided since a word can match only one regex. I am referring to the continue part your original code has, which is good since you do not have to do any further checks.
So, I am leaving here one way to do it with Java 8 Streams in "one liner".
But let's assume the following regular expressions:
String absenceX = "X*";
String absenceI = "I.*";
and one more (for the sake of the example):
String onlyNumbers = "[0-9]*";
In order to have some matches on them.
The text is as you gave it.
public class Test {
public static void main(String[] args) throws IOException {
File desktop = new File(System.getProperty("user.home"), "Desktop");
File txtFile = new File(desktop, "test.txt");
String sign = "#";
String absenceX = "X*";
String absenceI = "I.*";
String onlyNumbers = "[0-9]*";
List<String> regexes = Arrays.asList(sign, absenceX, absenceI, onlyNumbers);
List<String> lines = Files.readAllLines(txtFile.toPath());
//@formatter:off
Map<String, Long> result = lines.stream()
.flatMap(line-> Stream.of(line.split(" "))) //map these lines to words
.map(word -> regexes.stream().filter(word::matches).findFirst()) //find the first regex this word matches
.filter(Optional::isPresent) //If it matches no regex, it will be ignored
.collect(Collectors.groupingBy(Optional::get, Collectors.counting())); //collect
System.out.println(result);
}
}
The result:
{X*=1, #=1, I.*=2, [0-9]*=2}
X*=1 came from word: XXXXXXX
#=1 came from word: #
I.*=2 came from words: IXXXXXX and Ibolya
[0-9]*=2 came from words: 03 and 26
Ignore the fact I load all lines in memory.
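If loading every line into memory does matter, a minimal sketch of the same pipeline on top of BufferedReader.lines(), which reads lazily, could look like this. The class name LazyRegexCount and the method count are made up for illustration, and the file path is assumed to arrive as the first program argument; without an argument the sketch falls back to the sample text from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyRegexCount {
    // Counts, per regex, the words whose first matching regex it is.
    // Lines are pulled from the reader one at a time instead of being loaded up front.
    public static Map<String, Long> count(BufferedReader reader, List<String> regexes) {
        return reader.lines()
                .flatMap(line -> Stream.of(line.split(" ")))
                .map(word -> regexes.stream().filter(word::matches).findFirst())
                .filter(Optional::isPresent)
                .collect(Collectors.groupingBy(Optional::get, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        String sample = "# 03 26\nJujuba Ibolya IXXXXXX\nMaracuja Kolos XXXXXXX\n";
        // Assumption: args[0], if given, is the path of the text file to read.
        try (BufferedReader reader = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]))
                : new BufferedReader(new StringReader(sample))) {
            System.out.println(count(reader, Arrays.asList("#", "X*", "I.*", "[0-9]*")));
        }
    }
}
```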
So I got it working with the following lines. It had escaped my attention that every character needs to be separated from the others. Your ternary operator suggestion is also nice, so I will use it.
String myString;
while ((myString = br.readLine()) != null) {
String newString = myString.replaceAll("", " ").trim();
for (String element : newString.split(" ")) {
countSign += (element.matches(sign)) ? 1 : 0;
countX += (element.matches(absenceX)) ? 1 : 0;
countI += (element.matches(absenceI)) ? 1 : 0;
}
}
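Because every character has to be looked at individually anyway, the replaceAll/split/matches round trip can be replaced by a single pass over the characters of each line. This is only a sketch: the class name CharCounts and the method countMarks are made up, and, like the loop above, it also counts an 'I' or 'X' that occurs inside a name such as "Ibolya".

```java
public class CharCounts {
    // Returns {number of '#', number of 'X', number of 'I'} found in one line.
    public static int[] countMarks(String line) {
        int[] counts = new int[3];
        for (char c : line.toCharArray()) {
            switch (c) {
                case '#': counts[0]++; break;
                case 'X': counts[1]++; break;
                case 'I': counts[2]++; break;
                default: break; // any other character is ignored
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countMarks("Jujuba Ibolya IXXXXXX");
        System.out.println("#: " + counts[0] + ", X: " + counts[1] + ", I: " + counts[2]);
    }
}
```

In the reading loop the three totals would then be increased by the values returned for each line.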
I have the following:
String input = "5*";
char opr;
int data;
I would like to store the 5 in data and the * in opr.
Any idea how to do it?
As far as I know, if the string contained only an integer I could simply parse it, but in this case the string contains both an int and a char.
Input:
String input = "5*";
I want the output to be like this:
this.opr = *
this.data = 5
A simple way to achieve that (if you don't want to use Regex) is to do something like this:
String temp ="";
// read every char in the input String
for(char c: input.toCharArray()){
// if it's a digit
if(Character.isDigit(c)){
temp +=c; // append it
}
else{ // at the end parse the temp String
data = Integer.parseInt(temp);
opr = c;
break;
}
}
//test
System.out.println("Input: " + input
+ "\t Data: " + data
+ "\t Opr: " + opr);
Test
Input: 5* Data: 5 Opr: *
Input: 123* Data: 123 Opr: *
Under the dubious assumption that the issue is parsing a String in the form of number followed by op, this code will achieve the desired result.
public static void main(String[] args)
{
final String input = "5*";
int data = -1;
String op = "";
Pattern pat = Pattern.compile("([\\d]+)[\\s]*(.*)");
Matcher m = pat.matcher(input);
if (m.matches() && m.groupCount() == 2) {
data = Integer.parseInt(m.group(1));
op = m.group(2).trim();
}
System.out.printf("%2d with op %s%n", data, op);
}
Output:
5 with op *
String input = "5*";
// Note: this only works for a single-digit number followed by the operator
char[] splitted = input.toCharArray();
char opr = splitted[1];
int data = Character.getNumericValue(splitted[0]);
You may also try this with the three different inputs below. It prints 0 for invalid data and a blank for an invalid operator.
String input = "5*";
//String input = "109/";
//String input = "109b3*";
int data = 0;
char opr = ' ';
int len = input.length();
if ( len > 1 ) {
String data_s = input.substring(0, len - 1);
try {
data = Integer.valueOf(data_s).intValue();
} catch(Exception e) {
}
opr = input.substring(len - 1). charAt(0);
if ( opr != '+' && opr != '-' && opr != '*' && opr != '/' ) {
opr = ' ';
}
}
System.out.println("Input:" + input + " Data:" + data + " Opr:" + opr);
I am trying to read a PDF file in Java using iText. My PDF file contains some calculation results. Each line holds an element and its two calculation results, and they are not in a table. My PDF file looks like this:
I. Result X 12.551.734,75 9.284.925,26
. A. Result Y 8.583.482,18 416.187,03
. 1. result z 83.708,72 91.220,23
. 3. result a 8.499.773,46 324.966,80
. B. Result B 0,00 199.942,00
. 4. result c 0,00 199.942,00
. C. Result D 780.316,81 5.376.366,65
. 1. result e 66.041,73 3.962.399,52
. 2. result f 685.579,00 1.367.086,66
What I am trying to do is parse the string and its values. I couldn't find a proper way, so I tried the code below. The problem with this logic is that for the line:
. 1. result z 8.583.482,18 416.187,03
it prints just "." for the string, then 1 and the first number. I can't get the whole ". 1. result z" part as the string followed by its values, because the code prints as soon as it sees a numeric token and skips the rest.
int page = 1;
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
strategy = parser.processContent(page, new LocationTextExtractionStrategy());
Scanner scanner = new Scanner(strategy.getResultantText());
...
for (int j = page; j <= reader.getNumberOfPages(); j++) {
while (scanner.hasNextLine()) {
String nextToken = scanner.nextLine();
String rName = "";
StringTokenizer tok = new StringTokenizer(nextToken);
while (tok.hasMoreTokens()) {
String nToken = tok.nextToken();
try {
number = fmt.parse(nToken);
System.out.println(rName);
System.out.println(number);
while (tok.hasMoreTokens()) {
try {
nToken = tok.nextToken();
number = fmt.parse(nToken);
System.out.println(number);
} catch (ParseException e) {
if(rName.isEmpty()){
rName = nToken;
}else{
rName = rName + " " + nToken;
}
}
}
break;
} catch (ParseException e) {
if(rName.isEmpty()){
rName = nToken;
}else{
rName = rName + " " + nToken;
}
}
}
}
strategy = parser.processContent(++page, new LocationTextExtractionStrategy());
scanner = new Scanner(strategy.getResultantText());
}
How can I get these strings and their values correctly? Is there a better way to do this? I don't think this solution is good enough.
Thank you for all the detail you provided. Typically you'd use a regular expression to parse complicated lines. Though sometimes programmatic parsing is a bit easier to follow. Rather than using the StringTokenizer to split the line, perhaps try:
String line = scanner.nextLine();
String[] tokens = line.split("\\s+");
String value1 = tokens[tokens.length-2];
String value2 = tokens[tokens.length-1];
String rowTitle = line.substring(0, line.indexOf(value1)).trim();
System.out.print(rowTitle + "\t");
System.out.print(value1 + "\t");
System.out.println(value2);
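For comparison, the regular-expression route mentioned above could be sketched as follows. The class name ResultLineParser and its methods are made up for illustration. The sketch assumes every data line ends with exactly two amounts written with dots as grouping separators and a comma as the decimal separator, which is why it parses them with Locale.GERMANY; a different locale may be the right one for the actual document.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultLineParser {
    // Row title, then two amounts such as 12.551.734,75 at the end of the line.
    private static final Pattern ROW = Pattern.compile(
            "^(.*?)\\s+(\\d[\\d.]*,\\d+)\\s+(\\d[\\d.]*,\\d+)\\s*$");

    // Returns {title, firstAmount, secondAmount}, or null if the line has another shape.
    public static String[] split(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2), m.group(3) };
    }

    // Converts an amount like "83.708,72" to a double.
    // Assumption: German-style number formatting.
    public static double toNumber(String amount) {
        try {
            return NumberFormat.getInstance(Locale.GERMANY).parse(amount).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + amount, e);
        }
    }

    public static void main(String[] args) {
        String[] parts = split(". 1. result z 83.708,72 91.220,23");
        System.out.println(parts[0] + "\t" + toNumber(parts[1]) + "\t" + toNumber(parts[2]));
    }
}
```

Because the title group is reluctant and the two amounts are anchored to the end of the line, digits inside the title such as the "1." in ". 1. result z" stay part of the title.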
I'm trying to add a count number for matching words, like this:
Match word: "Text"
Input: Text Text Text TextText ExampleText
Output: Text1 Text2 Text3 Text4Text5 ExampleText6
I have tried this:
String text = "Text Text Text TextText ExampleText";
String match = "Text";
int i = 0;
while(text.indexOf(match)!=-1) {
text = text.replaceFirst(match, match + i++);
}
This doesn't work: it loops forever, because the match stays in the string after every replacement, so indexOf never returns -1.
What would you suggest me to do?
Is there a better way doing this?
Here is one with a StringBuilder but no need to split:
public static String replaceWithNumbers( String text, String match ) {
int matchLength = match.length();
StringBuilder sb = new StringBuilder( text );
int index = 0;
int i = 1;
while ( ( index = sb.indexOf( match, index )) != -1 ) {
String iStr = String.valueOf(i++);
sb.insert( index + matchLength, iStr );
// Continue searching from the end of the inserted text
index += matchLength + iStr.length();
}
return sb.toString();
}
First create a StringBuffer (result), then split the source text on the match word.
The split yields an array of empty strings and the remaining text between the occurrences of "Text".
Then check each element with isEmpty() and, depending on the result, append either just the numbered match word or the element followed by the numbered match word.
String text = "Text Text Text TextText ExampleText";
String match = "Text";
StringBuffer result = new StringBuffer();
String[] split = text.split(match);
for(int i=0;i<split.length;){
if(split[i].isEmpty())
result.append(match+ ++i);
else
result.append(split[i]+match+ ++i);
}
System.out.println("Result is =>"+result);
O/P
Result is => Text1 Text2 Text3 Text4Text5 ExampleText6
Try this solution; it is tested:
String text = "Text Text Text TextText Example";
String match = "Text";
String lastWord=text.substring(text.length() -match.length());
boolean lastChar=(lastWord.equals(match));
String[] splitter=text.split(match);
StringBuilder sb = new StringBuilder();
for(int i=0;i<splitter.length;i++)
{
if(i!=splitter.length-1)
splitter[i]=splitter[i]+match+Integer.toString(i);
else
splitter[i]=(lastChar)?splitter[i]+match+Integer.toString(i):splitter[i];
sb.append(splitter[i]);
if (i != splitter.length - 1) {
sb.append("");
}
}
String joined = sb.toString();
System.out.print(joined+"\n");
One possible solution could be
String text = "Text Text Text TextText ExampleText";
String match = "Text";
StringBuilder sb = new StringBuilder(text);
int occurence = 1;
int offset = 0;
while ((offset = sb.indexOf(match, offset)) != -1) {
// fixed this after comment from @RealSkeptic
String insertOccurence = Integer.toString(occurence);
sb.insert(offset + match.length(), insertOccurence);
offset += match.length() + insertOccurence.length();
occurence++;
}
System.out.println("result: " + sb.toString());
This will work for you :
public static void main(String[] args) {
String s = "Text Text Text TextText ExampleText";
int count=0;
while(s.contains("Text")){
s=s.replaceFirst("Text", "*"+ ++count); // replace each occurrence of "Text" with some place holder which is not in your main String.
}
s=s.replace("*","Text");
System.out.println(s);
}
O/P:
Text1 Text2 Text3 Text4Text5 ExampleText6
I refactored @DeveloperH's code to this:
public class Snippet {
public static void main(String[] args) {
String matchWord = "Text";
String input = "Text Text Text TextText ExampleText";
String output = addNumbersToMatchingWords(matchWord, input);
System.out.print(output);
}
private static String addNumbersToMatchingWords(String matchWord, String input) {
String[] inputsParts = input.split(matchWord);
StringBuilder outputBuilder = new StringBuilder();
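int i = 1; // numbering starts at 1, as in the expected output
for (String inputPart : inputsParts) {
// split() removed the match word, so put it back followed by its number
// (this relies on the input ending with the match word)
outputBuilder.append(inputPart);
outputBuilder.append(matchWord);
outputBuilder.append(i);
i++;
}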
return outputBuilder.toString();
}
}
We can solve this with a StringBuilder, which provides the simplest way to insert characters into a string. Here is the code:
String text = "Text Text Text TextText ExampleText";
String match = "Text";
StringBuilder sb = new StringBuilder(text);
int beginIndex = 0, i =0;
int matchLength = match.length();
while((beginIndex = sb.indexOf(match, beginIndex))!=-1) {
i++;
sb.insert(beginIndex+matchLength, i);
beginIndex++;
}
System.out.println(sb.toString());
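Another option, not used in the answers above, is to let the regex Matcher do the bookkeeping with appendReplacement and appendTail. This is a minimal sketch; the class name NumberMatches and the method numberMatches are made up for illustration. Pattern.quote and Matcher.quoteReplacement are there so that a match word containing regex or replacement metacharacters is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberMatches {
    // Appends a running number (1, 2, 3, ...) after every occurrence of match.
    public static String numberMatches(String text, String match) {
        Matcher m = Pattern.compile(Pattern.quote(match)).matcher(text);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            // Copies the text before the match, then the match plus its number
            m.appendReplacement(out, Matcher.quoteReplacement(match + count));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberMatches("Text Text Text TextText ExampleText", "Text"));
        // prints: Text1 Text2 Text3 Text4Text5 ExampleText6
    }
}
```

The matcher always continues after the previous match, so the numbers that were already inserted are never searched again and the loop terminates.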
Text file (the first three lines are simple to read; the next three lines start with p):
ThreadSize:2
ExistingRange:1-1000
NewRange:5000-10000
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
p:25 - CrossPromoEditItemRule Data:New UserLogged:false Attribute:1 Attribute:10107 Attribute:10108
p:20 - CrossPromoManageRules Data:Previous UserLogged:true Attribute:1 Attribute:10107 Attribute:10108
Below is the code I wrote to parse the above file; after parsing, I set the corresponding values through the setters. I would like to know whether this code can be improved, in terms of parsing or anything else, for example by using a regex. My main goal is to parse the file and set the corresponding values. Any feedback or suggestions will be highly appreciated.
private List<Command> commands;
private static int noOfThreads = 3;
private static int startRange = 1;
private static int endRange = 1000;
private static int newStartRange = 5000;
private static int newEndRange = 10000;
private BufferedReader br = null;
private String sCurrentLine = null;
private int distributeRange = 100;
private List<String> values = new ArrayList<String>();
private String commandName;
private static String data;
private static boolean userLogged;
private static List<Integer> attributeID = new ArrayList<Integer>();
try {
// Initialize the system
commands = new LinkedList<Command>();
br = new BufferedReader(new FileReader("S:\\Testing\\Test1.txt"));
while ((sCurrentLine = br.readLine()) != null) {
if(sCurrentLine.contains("ThreadSize")) {
noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
} else if(sCurrentLine.contains("ExistingRange")) {
startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else if(sCurrentLine.contains("NewRange")) {
newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else {
allLines.add(Arrays.asList(sCurrentLine.split("\\s+")));
String key = sCurrentLine.split("-")[0].split(":")[1].trim();
String value = sCurrentLine.split("-")[1].trim();
values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
for(String s : values) {
if(s.contains("Data:")) {
data = s.split(":")[1];
} else if(s.contains("UserLogged:")) {
userLogged = Boolean.parseBoolean(s.split(":")[1]);
} else if(s.contains("Attribute:")) {
attributeID.add(Integer.parseInt(s.split(":")[1]));
} else {
commandName = s;
}
}
Command command = new Command();
command.setName(commandName);
command.setExecutionPercentage(Double.parseDouble(key));
command.setAttributeID(attributeID);
command.setDataCriteria(data);
command.setUserLogging(userLogged);
commands.add(command);
}
}
} catch(Exception e) {
System.out.println(e);
}
I think you should know what exactly you're expecting while using RegEx. http://java.sun.com/developer/technicalArticles/releases/1.4regex/ should be helpful.
To answer a comment:
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
To parse the line above with a regex (with Attribute: occurring exactly 3 times):
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060";
Matcher m = Pattern
.compile(
"p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(.*?)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)")
.matcher(parseLine);
if(m.find()) {
int p = Integer.parseInt(m.group(1));
String method = m.group(2);
String data = m.group(3);
boolean userLogged = Boolean.valueOf(m.group(4));
int at1 = Integer.parseInt(m.group(5));
int at2 = Integer.parseInt(m.group(6));
int at3 = Integer.parseInt(m.group(7));
System.out.println(p + " " + method + " " + data + " " + userLogged + " " + at1 + " " + at2 + " "
+ at3);
}
EDIT: looking at your comment, you can still use a regex:
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true "
+ "Attribute:1 Attribute:16 Attribute:2060";
Matcher m = Pattern.compile("p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(.*?)").matcher(
parseLine);
if(m.find()) {
for(int i = 0; i < m.groupCount(); ++i) {
System.out.println(m.group(i + 1));
}
}
Matcher m2 = Pattern.compile("Attribute:(\\d+)").matcher(parseLine);
while(m2.find()) {
System.out.println("Attribute matched: " + m2.group(1));
}
But that only works if the text "Attribute:" does not occur before the "real" attributes (for example as part of the method name after p).
You can use the Scanner class. It has some helper methods for reading text files.
I would turn this inside out. Presently you are:
Scanning the line for a keyword: the entire line if it isn't found, which is the usual case as you have a number of keywords to process and they won't all be present on every line.
Scanning the entire line again for ':' and splitting it on all occurrences
Mostly parsing the part after ':' as an integer, or occasionally as a range.
So several complete scans of each line. Unless the file has zillions of lines this isn't a concern in itself but it demonstrates that you have got the processing back to front.
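One way to read "inside out" is to split each line once at the first ':' and then dispatch on the key, instead of searching the whole line for every keyword in turn. The following is only a sketch of that reading, not necessarily what the answer goes on to propose. The class name HeaderLineParser and the method describe are made up, and the method returns a description string purely to keep the example short.

```java
public class HeaderLineParser {
    // Splits the line once at the first ':' and dispatches on the key in front of it.
    public static String describe(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return "unrecognised: " + line;
        }
        String key = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        switch (key) {
            case "ThreadSize":
                return "threads=" + Integer.parseInt(value);
            case "ExistingRange":
            case "NewRange": {
                // Ranges look like 1-1000: split once at the dash
                int dash = value.indexOf('-');
                int start = Integer.parseInt(value.substring(0, dash));
                int end = Integer.parseInt(value.substring(dash + 1));
                return key + "=" + start + ".." + end;
            }
            case "p":
                // The rest of a command line would be parsed further from here
                return "command line: " + value;
            default:
                return "unrecognised: " + line;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("ThreadSize:2"));
        System.out.println(describe("ExistingRange:1-1000"));
        System.out.println(describe("p:55 - AutoRefreshStoreCategories Data:Previous"));
    }
}
```

Each line is scanned once for the colon, and only the value part is examined again.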