Java Regex And XML - java

I've been working on a weekend project, a simple, lightweight XML parser, just for fun and to learn more about regexes. I've been able to get the data in attributes and elements, but I'm having a hard time separating tags. This is what I have:
CharSequence inputStr = "<a>test</a>abc<b1>test2</b1>abc1";
String patternStr = openTag+"(.*?)"+closeTag;
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
StringBuffer buf = new StringBuffer();
boolean found = false;
while ((found = matcher.find())) {
String replaceStr = matcher.group();
matcher.appendReplacement(buf, "found tag (" + replaceStr + ")");
}
matcher.appendTail(buf);
String result = buf.toString();
System.out.println(result);
Output: found tag (<a>test</a>abc<b1>test2</b1>)abc1
I need the 'found tag' to end at each tag, not span the whole group. Is there any way to do that? Thanks.

You can try something like the following to get it working the way you need:
int count = matcher.groupCount();
for(int i=0;i<count;i++)
{
String replaceStr = matcher.group(i);
matcher.appendReplacement(buf, "found tag (" + replaceStr + ")");
}
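Another way to make each 'found tag' stop at its own closing tag is to tie the closing tag to the opening one with a back-reference. This is only a minimal sketch: the question does not show openTag and closeTag, so the pattern below (tags without attributes, no nesting of same-named tags) and the names TagMarker and markTags are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMarker {
    // Assumed pattern: an opening tag without attributes, lazily matched content,
    // and a closing tag whose name must equal the opening one (back-reference \1).
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)>(.*?)</\\1>");

    static String markTags(CharSequence input) {
        Matcher matcher = ELEMENT.matcher(input);
        StringBuffer buf = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps any '$' or '\' in the matched text literal
            String replacement = Matcher.quoteReplacement("found tag (" + matcher.group() + ")");
            matcher.appendReplacement(buf, replacement);
        }
        matcher.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        // prints: found tag (<a>test</a>)abcfound tag (<b1>test2</b1>)abc1
        System.out.println(markTags("<a>test</a>abc<b1>test2</b1>abc1"));
    }
}
```

Because the content group is lazy and the closing tag must repeat the opening name, each match ends at the first matching close tag instead of running on to the last one.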

Related

examples on how to parse string in java

I have the following string and need to extract the '20000' from it.
"where f_id = '20000' and (flag is true or flag is null)"
Any suggestions on the best way to do this?
Here's more code to help understand:
List<ReportDto> reportDtoList = new ArrayList<ReportDto>();
for (Report report : reportList) {
List<ReportDetailsDto> ReportDetailsDtoList = new ArrayList<ReportDetailsDto>();
ReportDto reportDto = new ReportDto();
reportDto.setReportId(report.getReportId());
reportDto.setReportName(report.getName());
Pattern p = Pattern.compile("=\\s'[0-9]+'");
String whereClause = report.getWhereClause();
Matcher m = p.matcher(whereClause);
I'm not sure what to do after this.
You can use this regex to extract a single non-negative integer from your String:
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println(m.group());
}
Or, if you want to keep the single quotes:
Pattern p = Pattern.compile("['0-9]+");
The next pattern matches '=', a single whitespace character, the opening quote and the digits. The substring(3) call then drops the first three characters ('=', the space and the quote), so only the number is printed. If the pattern matches, you know there is a number after an '='.
Pattern p = Pattern.compile("=\\s'[0-9]+");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println(m.group().substring(3));
}
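If you would rather not count characters for substring(3), a capturing group can pick out the digits directly. A minimal sketch, assuming the id is always written as = '<digits>' (the names WhereClauseId and extractId are made up for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhereClauseId {
    // Assumption: the id appears as = '<digits>' with optional whitespace after '='.
    private static final Pattern QUOTED_ID = Pattern.compile("=\\s*'([0-9]+)'");

    /** Returns the digits between the quotes, or null when there is no quoted number. */
    static String extractId(String whereClause) {
        Matcher m = QUOTED_ID.matcher(whereClause);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints: 20000
        System.out.println(extractId("where f_id = '20000' and (flag is true or flag is null)"));
    }
}
```

group(1) holds only what the parentheses matched, so the '=', the whitespace and the quotes never end up in the result.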
EDIT
Based on the code you added, this is how it would look:
List<ReportDto> reportDtoList = new ArrayList<ReportDto>();
Pattern p = Pattern.compile("=\\s'[0-9]+");
for (Report report : reportList) {
List<ReportDetailsDto> ReportDetailsDtoList = new ArrayList<ReportDetailsDto>();
ReportDto reportDto = new ReportDto();
reportDto.setReportId(report.getReportId());
reportDto.setReportName(report.getName());
String whereClause = report.getWhereClause();
Matcher m = p.matcher(whereClause);
if (m.find()) {
String foundThis = m.group().substring(3);
// do something with foundThis
} else {
// didn't find a number or =
}
}
Try this:
Pattern p = Pattern.compile("-?\\d+");
String s = "your string here";
Matcher m = p.matcher(s);
List<String> extracted = new ArrayList<String>();
while (m.find()) {
extracted.add(m.group());
}
For floats and negatives:
Pattern p = Pattern.compile("(-?\\d+)(\\.\\d+)?");
String s = "where f_id = '20000' 3.2 and (flag is true or flag is null)";
Matcher m = p.matcher(s);
List<String> extracted = new ArrayList<String>();
while (m.find()) {
extracted.add(m.group());
}
for (String g : extracted)
System.out.println(g);
prints out
20000
3.2

how to match by regex "to" in url? [duplicate]

This question already has an answer here:
Extract parameters and their values from query string in Java
(1 answer)
Closed 7 years ago.
I have this url
http://host.com/routingRequest?returnJSON=true&timeout=60000&to=s%3A73746647+d%3Afalse+f%3A-1.0+x%3A-74.454383+y%3A40.843021+r%3A-1.0+cd%3A-1.0+fn%3A-1+tn%3A-1+bd%3Atrue+st%3ACampus%7EDr&returnGeometries=true&nPaths=1&returnClientIds=true&returnInstructions=true
&hour=12+00&from=s%3A-1+d%3Afalse+f%3A-1.0+x%3A-74.241765+y%3A40.830182+r%3A-1.0+cd%3A-1.0+fn%3A56481485+tn%3A26459042+bd%3Afalse+st%3AClaremont%7EAve&sameResultType=true&type=HISTORIC_TIME
and I am trying to extract
to = -74.454383, 40.843021
from = -74.241765, 40.830182
hour = 12+00
with this code:
String patternString = "(x%3A) (.+?) (y%3A) (.+?) (r%3A)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(freshResponse.regression_requestUrl);
H4 h4 = new H4().appendText("Response ID: " + id);
Ul ul = new Ul();
Li li1 = new Li();
Li li2 = new Li();
if (matcher.find()) {
li1.appendText("From: " + matcher.group(1) + ", " + matcher.group(2));
}
if (matcher.find()) {
li2.appendText("To: " + matcher.group(1) + ", " + matcher.group(2));
}
patternString = "(&hour=) (.+?) (&from=)";
pattern = Pattern.compile(patternString);
matcher = pattern.matcher(freshResponse.regression_requestUrl);
Li li3 = new Li();
if (matcher.find()) {
li3.appendText("At: " + matcher.group(1));
}
But I get no matches. What am I missing?
Could I have done this more easily without regex?
// Collect the query parameters into a map keyed by parameter name
Map<String, String> map = new HashMap<String, String>();
String url = "http://host.com/routingRequest?returnJSON=true&timeout=60000&to=s%3A73746647+d%3Afalse+f%3A-1.0+x%3A-74.454383+y%3A40.843021+r%3A-1.0+cd%3A-1.0+fn%3A-1+tn%3A-1+bd%3Atrue+st%3ACampus%7EDr&returnGeometries=true&nPaths=1&returnClientIds=true&returnInstructions=true"
+ "&hour=12+00&from=s%3A-1+d%3Afalse+f%3A-1.0+x%3A-74.241765+y%3A40.830182+r%3A-1.0+cd%3A-1.0+fn%3A56481485+tn%3A26459042+bd%3Afalse+st%3AClaremont%7EAve&sameResultType=true&type=HISTORIC_TIME";
List<NameValuePair> params = URLEncodedUtils.parse(new URI(url), "UTF-8");
for (NameValuePair param : params) {
map.put(param.getName(), param.getValue());
}
You need Apache HttpClient to get the NameValuePair and URLEncodedUtils classes.
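As for why the regex in the question finds nothing: the spaces inside the pattern are literal, so the matcher looks for "x%3A" followed by a space, while in the URL the fields are separated by '+'. Also, group(1) in that pattern is the "x%3A" marker itself, not the value. The following is just a sketch with the JDK regex classes; the class and method names are invented for the example and the URL in main is a shortened version of the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RouteUrlParser {
    // No spaces in the pattern; the '+' separators are escaped so they match literally.
    private static final Pattern COORDS = Pattern.compile("x%3A(.+?)\\+y%3A(.+?)\\+r%3A");
    private static final Pattern HOUR = Pattern.compile("[?&]hour=([^&]*)");

    /** Returns one "x, y" string per coordinate pair, in the order they occur in the URL. */
    static List<String> coordinates(String url) {
        List<String> result = new ArrayList<String>();
        Matcher m = COORDS.matcher(url);
        while (m.find()) {
            result.add(m.group(1) + ", " + m.group(2));
        }
        return result;
    }

    /** Returns the raw (still URL-encoded) value of the hour parameter, or null if absent. */
    static String hour(String url) {
        Matcher m = HOUR.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://host.com/routingRequest?to=s%3A73746647+x%3A-74.454383+y%3A40.843021+r%3A-1.0"
                + "&hour=12+00&from=s%3A-1+x%3A-74.241765+y%3A40.830182+r%3A-1.0&type=HISTORIC_TIME";
        System.out.println(coordinates(url)); // first entry belongs to "to", second to "from"
        System.out.println(hour(url));
    }
}
```

Note that in this URL the "to" parameter comes before "from", so the first match is the destination, not the origin.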

How to replace space with hyphen?

I want to replace each space with "-". How can I do that?
Suppose my code is
StringBuffer tokenstr=new StringBuffer();
tokenstr.append("technician education systems of the Cabinet has approved the following");
I want output
"technician-education-systems-of-the-Cabinet-has-approved-the-following"
thanks
Like this,
StringBuffer tokenstr = new StringBuffer();
tokenstr.append("technician education systems of the Cabinet has approved the following");
System.out.println(tokenstr.toString().replaceAll(" ", "-"));
Or like this, which turns each run of whitespace into a single hyphen:
System.out.println(tokenstr.toString().replaceAll("\\s+", "-"));
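The two calls above only differ when the text contains tabs or several spaces in a row. A small sketch to show the difference (HyphenDemo and its method names are hypothetical):

```java
class HyphenDemo {
    /** Replaces every single space character with a hyphen. */
    static String hyphenateEachSpace(String s) {
        return s.replace(' ', '-');
    }

    /** Replaces each run of whitespace (spaces, tabs, newlines) with one hyphen. */
    static String hyphenateWhitespaceRuns(String s) {
        return s.replaceAll("\\s+", "-");
    }

    public static void main(String[] args) {
        String text = "a  b\tc"; // two spaces, then a tab
        System.out.println(hyphenateEachSpace(text));      // a--b<TAB>c
        System.out.println(hyphenateWhitespaceRuns(text)); // a-b-c
    }
}
```

For the sentence in the question, which has single spaces only, both give the same result.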
Do like this
StringBuffer tokenstr=new StringBuffer();
tokenstr.append("technician education systems of the Cabinet has approved the following".replace(" ", "-"));
System.out.print(tokenstr);
You may try this :
//First store your value in string object and replace space with "-" before appending it to StringBuffer.
String str = "technician education systems of the Cabinet has approved the following";
str = str.replaceAll(" ", "-");
StringBuffer tokenstr=new StringBuffer();
tokenstr.append(str);
System.out.println(tokenstr);
You need to write a custom replaceAll method: find the index of each occurrence of the source substring and replace it with the destination string.
Please find a code snippet by Jon Skeet
If you don't want to jump back and forth between the StringBuffer and String classes, you can do this:
StringBuffer tokenstr = new StringBuffer();
tokenstr.append("technician education systems of the Cabinet has approved the following");
int idx = 0;
// The assignment needs its own parentheses; otherwise >= is evaluated first and the code does not compile
while ((idx = tokenstr.indexOf(" ", idx)) >= 0) {
tokenstr.replace(idx, idx + 1, "-");
}
If you already have the StringBuffer object, you can iterate over it and replace each space character:
for (int index = 0; index < tokenstr.length(); index++) {
if (tokenstr.charAt(index) == ' ') {
tokenstr.setCharAt(index, '-');
}
}
Or convert it to a String and then replace, as below:
String value = tokenstr.toString().replaceAll(" ", "-");
/* You can use the method below: pass in your String and get back a String with whitespace replaced by hyphens. Note that it also lower-cases the input. */
private static String replaceSpaceWithHypn(String str) {
if (str != null && str.trim().length() > 0) {
str = str.toLowerCase();
// \\s+ already covers single whitespace characters, so one pass is enough
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(str);
str = matcher.replaceAll("-");
}
return str;
}

Extracting certain pattern from log using Java

I want to extract information from a log file. The pattern I am looking for is the node-name prompt followed by the command. I want to extract the output of each command and compare the outputs. Consider the following sample output:
NodeName > command1
this is the sample output
NodeName > command2
this is the sample output
I have tried the following code.
public static void searchcommand( String strLineString)
{
String searchFor = "Nodename> command1";
String endStr = "Nodename";
String op="";
int end=0;
int len = searchFor.length();
int result = 0;
if (len > 0) {
int start = strLineString.indexOf(searchFor);
while(start!=-1){
end = strLineString.indexOf(endStr,start+len);
if(end!=-1){
op=strLineString.substring(start, end);
}else{
op=strLineString.substring(start, strLineString.length());
}
String[] arr = op.split("%%%%%%%");
for (String z : arr) {
System.out.println(z);
}
start = strLineString.indexOf(searchFor,start+len);
}
}
}
The issue is that the code is too slow to extract the data. Is there any other way to do so?
EDIT 1
It's a log file, which I have read into a string in the code above.
My suggestion:
public static void main(String[] args) {
String log = "NodeName > command1 \n" + "this is the sample output \n"
+ "NodeName > command2 \n" + "this is the sample output";
String lines[] = log.split("\\r?\\n");
boolean record = false;
String statements = "";
for (int j = 0; j < lines.length; j++) {
String line = lines[j];
if(line.startsWith("NodeName")){
if(record){
//process the output collected for the previous command
System.out.println(statements);
}
record = true; // every prompt line starts a new block
statements = ""; // Reset statement
continue;
}
if(record){
statements += line;
}
}
if(record){
//process the output of the last command
System.out.println(statements);
}
}
Here is my suggestion:
Use a regular expression. Here is one:
final String input = " NodeName > command1\n" +
"\n" +
" this is the sample output1 \n" +
"\n" +
" NodeName > command2 \n" +
"\n" +
" this is the sample output2";
final String regex = ".*?NodeName > command(\\d)(.*?)(?=NodeName|\\z)";
final Matcher matcher = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
while(matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2).trim());
}
Output:
1
this is the sample output1
2
this is the sample output2
So, to break down the regex:
First, it skips all characters until it finds the first "NodeName > command" followed by a digit. We keep this digit so that we know which command created the output. Next, we grab all the following characters until we (using a lookahead) find another NodeName or the end of the input.
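Since the goal in the question is to compare the outputs, the same lookahead idea can be used to collect them into a map first. This is only a sketch under the assumption that the prompt always looks like "NodeName > <command>" and that the word NodeName never occurs inside the output itself; CommandOutputs and byCommand are names invented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandOutputs {
    // Group 1: the command name after the prompt; group 2: everything up to the
    // next prompt or the end of the input (lookahead, so the next prompt is not consumed).
    private static final Pattern BLOCK =
            Pattern.compile("NodeName > (\\S+)(.*?)(?=NodeName|\\z)", Pattern.DOTALL);

    /** Maps each command name to its trimmed output, in the order of appearance. */
    static Map<String, String> byCommand(String log) {
        Map<String, String> outputs = new LinkedHashMap<String, String>();
        Matcher m = BLOCK.matcher(log);
        while (m.find()) {
            outputs.put(m.group(1), m.group(2).trim());
        }
        return outputs;
    }

    public static void main(String[] args) {
        String log = "NodeName > command1\nthis is the sample output1\n"
                + "NodeName > command2\nthis is the sample output2";
        Map<String, String> outputs = byCommand(log);
        System.out.println(outputs);
        System.out.println(outputs.get("command1").equals(outputs.get("command2")));
    }
}
```

If the same command appears more than once in the log, the later output overwrites the earlier one in this sketch, so a Map of lists may fit better in that case.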

Improving the code that parses a Text File

Text file (the first three lines are simple to read; the next three lines start with p):
ThreadSize:2
ExistingRange:1-1000
NewRange:5000-10000
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
p:25 - CrossPromoEditItemRule Data:New UserLogged:false Attribute:1 Attribute:10107 Attribute:10108
p:20 - CrossPromoManageRules Data:Previous UserLogged:true Attribute:1 Attribute:10107 Attribute:10108
Below is the code I wrote to parse the above file; after parsing, I set the corresponding values using the setters. Can this code be improved, in terms of parsing or otherwise, for example by using regex? My main goal is to parse the file and set the corresponding values. Any feedback or suggestions will be highly appreciated.
private List<Command> commands;
private static int noOfThreads = 3;
private static int startRange = 1;
private static int endRange = 1000;
private static int newStartRange = 5000;
private static int newEndRange = 10000;
private BufferedReader br = null;
private String sCurrentLine = null;
private int distributeRange = 100;
private List<String> values = new ArrayList<String>();
private String commandName;
private static String data;
private static boolean userLogged;
private static List<Integer> attributeID = new ArrayList<Integer>();
try {
// Initialize the system
commands = new LinkedList<Command>();
br = new BufferedReader(new FileReader("S:\\Testing\\Test1.txt"));
while ((sCurrentLine = br.readLine()) != null) {
if(sCurrentLine.contains("ThreadSize")) {
noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
} else if(sCurrentLine.contains("ExistingRange")) {
startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else if(sCurrentLine.contains("NewRange")) {
newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else {
allLines.add(Arrays.asList(sCurrentLine.split("\\s+")));
String key = sCurrentLine.split("-")[0].split(":")[1].trim();
String value = sCurrentLine.split("-")[1].trim();
values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
for(String s : values) {
if(s.contains("Data:")) {
data = s.split(":")[1];
} else if(s.contains("UserLogged:")) {
userLogged = Boolean.parseBoolean(s.split(":")[1]);
} else if(s.contains("Attribute:")) {
attributeID.add(Integer.parseInt(s.split(":")[1]));
} else {
commandName = s;
}
}
Command command = new Command();
command.setName(commandName);
command.setExecutionPercentage(Double.parseDouble(key));
command.setAttributeID(attributeID);
command.setDataCriteria(data);
command.setUserLogging(userLogged);
commands.add(command);
}
}
} catch(Exception e) {
System.out.println(e);
}
I think you should know exactly what input you're expecting before using regex. http://java.sun.com/developer/technicalArticles/releases/1.4regex/ should be helpful.
To answer a comment:
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
To parse the line above with a regex (assuming exactly three Attribute: entries):
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060";
Matcher m = Pattern
.compile(
"p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(.*?)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)")
.matcher(parseLine);
if(m.find()) {
int p = Integer.parseInt(m.group(1));
String method = m.group(2);
String data = m.group(3);
boolean userLogged = Boolean.valueOf(m.group(4));
int at1 = Integer.parseInt(m.group(5));
int at2 = Integer.parseInt(m.group(6));
int at3 = Integer.parseInt(m.group(7));
System.out.println(p + " " + method + " " + data + " " + userLogged + " " + at1 + " " + at2 + " "
+ at3);
}
EDIT: Looking at your comment, you can still use regex:
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true "
+ "Attribute:1 Attribute:16 Attribute:2060";
// UserLogged uses \\S+ because a lazy (.*?) at the very end of a pattern matches the empty string
Matcher m = Pattern.compile("p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(\\S+)").matcher(
parseLine);
if(m.find()) {
for(int i = 0; i < m.groupCount(); ++i) {
System.out.println(m.group(i + 1));
}
}
Matcher m2 = Pattern.compile("Attribute:(\\d+)").matcher(parseLine);
while(m2.find()) {
System.out.println("Attribute matched: " + m2.group(1));
}
But this only works if "Attribute:" does not appear before the "real" attributes (for example, as part of the method name after p).
You can use the Scanner class. It has some helper methods for reading text files.
I would turn this inside out. Presently you are:
Scanning the line for a keyword. The scan covers the entire line whenever the keyword isn't found, which is the usual case, since you have a number of keywords to process and they won't all be present on every line.
Scanning the entire line again for ':' and splitting it on all occurrences.
Mostly parsing the part after ':' as an integer, or occasionally as a range.
So each line gets several complete scans. Unless the file has zillions of lines this isn't a concern in itself, but it demonstrates that you have got the processing back to front.
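One way to read "inside out" is to split each token once at its first ':' and then decide what to do based on the key, instead of searching the line for every keyword. What follows is a rough sketch of that idea, not a drop-in replacement for the code in the question; ConfigLineParser and its methods are hypothetical names. It also builds a fresh attribute list per line, whereas the question's code adds to one static attributeID list that every Command then shares.

```java
import java.util.ArrayList;
import java.util.List;

class ConfigLineParser {
    /** Splits "key:value" at the first ':' only; returns null when there is no ':'. */
    static String[] keyValue(String token) {
        int colon = token.indexOf(':');
        if (colon < 0) {
            return null;
        }
        return new String[] { token.substring(0, colon), token.substring(colon + 1) };
    }

    /** Parses a "low-high" value such as "5000-10000" into {low, high}. */
    static int[] range(String value) {
        int dash = value.indexOf('-');
        return new int[] {
            Integer.parseInt(value.substring(0, dash)),
            Integer.parseInt(value.substring(dash + 1))
        };
    }

    /** Collects the Attribute ids of one "p:" line into a new list. */
    static List<Integer> attributes(String line) {
        List<Integer> ids = new ArrayList<Integer>();
        for (String token : line.split("\\s+")) {
            String[] kv = keyValue(token);
            if (kv != null && kv[0].equals("Attribute")) {
                ids.add(Integer.parseInt(kv[1]));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] kv = keyValue("NewRange:5000-10000");
        int[] r = range(kv[1]);
        System.out.println(kv[0] + " -> " + r[0] + " to " + r[1]);
        System.out.println(attributes(
            "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060"));
    }
}
```

Each token is examined once, and the key decides how the value is parsed, which is the order of processing the answer above argues for.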
