How do I stop regex after finding "Message: "? - java

I'm splitting the body of a JSON message with the regex ":|\n" and storing the values in an array. How can I stop the regex from splitting the text once it reaches "Message: "?
In the JSON body, each section is separated by a new line, so the body looks similar to this:
{"body": "Name: Alfred Alonso\nCompany: null\nEmail: 123#abc.com\nPhone Number: 123-456-9999\nProject Type: Existing\nContact by: Email\nTime Frame: within 1 month\nMessage: Hello,\nThis is my message.\nThank You,\nJohn Doe"}
The code below works perfectly when the user doesn't create a new line within the message, so the entire message gets stored as one array value.
Thank you to anyone that can help me fix this!
String[] messArr = body.split(":|\n");
for (int i = 0; i < messArr.length; i++)
messArr[i] = messArr[i].trim();
if ("xxx".equals(eventSourceARN)) {
name = messArr[1];
String[] temp;
String delimiter = " ";
temp = name.split(delimiter);
name = temp[0];
String lastName = temp[1];
company = messArr[3];
email = messArr[5];
phoneNumber = messArr[7];
projectType = messArr[9];
contactBy = messArr[11];
timeFrame = messArr[13];
message = messArr[15];
I would like
messArr[14] = "Message"
messArr[15] = "Hello, This is my message. Thank you, John Doe"
This is what I get
[..., Message, Hello,, This is my message., Thank You, John Doe].
messArr[14] = "Message"
messArr[15] = "Hello,"
messArr[16] = "This is my message."
messArr[17] = "Thank You,"
messArr[18] = "John Doe"

Instead of using split, you can use a find loop, e.g.
Pattern p = Pattern.compile("([^:\\v]+): |((?<=Message: )(?s:.*)|(?!$).*)\\R?");
List<String> result = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
result.add(m.start(1) != -1 ? m.group(1) : m.group(2));
Test
String input = "Name: Alfred Alonso\n" +
"Company: null\n" +
"Email: 123#abc.com\n" +
"Phone Number: 123-456-9999\n" +
"Project Type: Existing\n" +
"Contact by: Email\n" +
"Time Frame: within 1 month\n" +
"Message: Hello,\n" +
"This is my message.\n" +
"Thank You,\n" +
"John Doe";
Pattern p = Pattern.compile("([^:\\v]+): |((?<=Message: )(?s:.*)|(?!$).*)\\R?");
List<String> result = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
result.add(m.start(1) != -1 ? m.group(1) : m.group(2));
for (int i = 0; i < result.size(); i++)
System.out.println("result[" + i + "]: " + result.get(i));
Output
result[0]: Name
result[1]: Alfred Alonso
result[2]: Company
result[3]: null
result[4]: Email
result[5]: 123#abc.com
result[6]: Phone Number
result[7]: 123-456-9999
result[8]: Project Type
result[9]: Existing
result[10]: Contact by
result[11]: Email
result[12]: Time Frame
result[13]: within 1 month
result[14]: Message
result[15]: Hello,
This is my message.
Thank You,
John Doe
Explanation
Match one of:
( Start capture #1
[^:\v]+ Match one or more characters that are not a : or a linebreak
) End capture #1
: Match, but don't capture, a : and a space (which SO is hiding here)
| or:
( Start capture #2
Match one of:
(?<=Message: )(?s:.*) Rest of input, i.e. all text including linebreaks, if the text is immediately preceded by "Message: "
| or:
(?!$) Don't match if we're already at end-of-input
.* Match 0 or more characters up to end-of-line, excluding the EOL
) End capture #2
\\R? Match, but don't capture, an optional linebreak. This doesn't apply to Message text, and is optional in case there is no Message text and no linebreak after last value
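If the lookbehind feels heavy, a plainer alternative (not the pattern above, only a sketch) is to cut the body at the first "Message: " with indexOf, split just the part before it, and append the message text untouched. The class and method names here are made up for illustration, and the sketch assumes that no earlier field value contains the text "Message: ".

```java
import java.util.ArrayList;
import java.util.List;

class SplitBeforeMessage {
    // Hypothetical helper: splits the header fields, keeps the message text whole.
    static List<String> parse(String body) {
        String marker = "Message: ";
        int at = body.indexOf(marker);
        // Only the text before the marker is split on ':' and newlines.
        String head = (at == -1) ? body : body.substring(0, at);

        List<String> parts = new ArrayList<>();
        for (String piece : head.split(":|\n")) {
            parts.add(piece.trim());
        }
        if (at != -1) {
            parts.add("Message");
            // Everything after the marker is kept as one value, line breaks included.
            parts.add(body.substring(at + marker.length()));
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "Name: Alfred Alonso\nCompany: null\n"
                + "Message: Hello,\nThis is my message.\nThank You,\nJohn Doe";
        List<String> parts = parse(body);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("parts[" + i + "]: " + parts.get(i));
        }
    }
}
```

The message keeps its line breaks; if a single-line value is wanted, calling replace('\n', ' ') on the last element is one option.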

If you want to, you could do exactly what you are doing and then put things together later. As you are trimming, notice where it says Message, then know that the Message is in the next slot and beyond. Then put it back together.
int messagePosition = -1;
for (int i = 0; i < messArr.length; i++){
messArr[i] = messArr[i].trim();
if (i>0 && messArr[i-1].equals("Message")){
messagePosition =i;
}
}
if (messagePosition > -1){
for (int i=messagePosition+1; i <messArr.length; i++){
messArr[messagePosition]=messArr[messagePosition]+" "+messArr[i];
}
}
One downside is that because arrays are fixed size, you need to act as if there is nothing beyond the messagePosition. So any calculations with length will be misleading. If for some reason you are worried you will look in the slots beyond, you could add messArr[i]=""; to the second for loop after the concatenation step.
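To avoid the leftover slots altogether, the merged result can be copied into a shorter array. The following is a minimal sketch of that idea, assuming the same ":|\n" split as in the question; the class and method names are hypothetical.

```java
import java.util.Arrays;

class MergeMessageTail {
    // Joins everything after the "Message" label into one slot and drops the rest.
    static String[] mergeMessage(String[] parts) {
        int labelIndex = -1;
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
            if (labelIndex == -1 && parts[i].equals("Message")) {
                labelIndex = i; // remember the first "Message" label only
            }
        }
        if (labelIndex == -1 || labelIndex + 1 >= parts.length) {
            return parts; // no label found, or nothing follows it
        }
        String message = String.join(" ",
                Arrays.copyOfRange(parts, labelIndex + 1, parts.length));
        // Keep everything up to the label plus one slot for the joined message.
        String[] result = Arrays.copyOf(parts, labelIndex + 2);
        result[labelIndex + 1] = message;
        return result;
    }

    public static void main(String[] args) {
        String body = "Name: Alfred Alonso\nTime Frame: within 1 month\n"
                + "Message: Hello,\nThis is my message.\nThank You,\nJohn Doe";
        System.out.println(Arrays.toString(mergeMessage(body.split(":|\n"))));
    }
}
```

With this version, length on the returned array is accurate again. A colon inside the message text is still lost, because the original split has already consumed it.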

Related

Regex: starts with messages and string between parent message curly brace

I want to extract every top-level message: starting at the keyword message, capture everything up to the closing curly brace of that parent message. With the pattern below, I am not getting the whole parent body.
String data = "syntax = \"proto3\";\r\n" +
"package grpc;\r\n" +
"\r\n" +
"import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n" +
"import \"google/api/annotations.proto\";\r\n" +
"import \"google/protobuf/wrappers.proto\";\r\n" +
"import \"protoc-gen-swagger/options/annotations.proto\";\r\n" +
"\r\n" +
"message Acc {\r\n" +
" message AccErr {\r\n" +
" enum Enum {\r\n" +
" UNKNOWN = 0;\r\n" +
" CASH = 1;\r\n" +
" }\r\n" +
" }\r\n" +
" string account_id = 1;\r\n" +
" string name = 3;\r\n" +
" string account_type = 4;\r\n" +
"}\r\n" +
"\r\n" +
"message Name {\r\n" +
" string firstname = 1;\r\n" +
" string lastname = 2;\r\n" +
"}";
List<String> allMessages = new ArrayList<>();
Pattern pattern = Pattern.compile("message[^\\}]*\\}");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
String str = matcher.group();
allMessages.add(str);
System.out.println(str);
}
}
I am expecting response like below in my array list of string with size 2.
allMessage.get(0) should be:
message Acc {
message AccErr {
enum Enum {
UNKNOWN = 0;
CASH = 1;
}
}
string account_id = 1;
string name = 3;
string account_type = 4;
}
and allMessage.get(1) should be:
message Name {
string firstname = 1;
string lastname = 2;
}
First remove the input before the first line that starts with "message", then split on newlines followed by "message" (the newlines are part of the split match, so the blank lines between parent messages are consumed):
String[] messages = data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)");
See live demo.
If you actually need a List<String>, pass that result to Arrays.asList():
List<String> messageList = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)"));
The first regex matches everything from the start up to, but not including, the first line that starts with message, which is replaced with a blank (i.e. deleted). Breaking it down:
(?sm) turns on flags s, which makes dot also match newlines, and m, which makes ^ and $ match start and end of each line
\\A means the very start of input
.*? .* means any quantity of any character (including newline as per the s flag being set), but adding ? makes this reluctant, so it matches as few characters as possible while still matching
(?=^message) is a look ahead and means the following characters are a start of a line then "message"
See regex101 live demo for a thorough explanation.
The split regex matches one or more line break sequences when they are followed by "message":
\\R+ means one or more line break sequences (all OS variants)
(?=message) is a look ahead and means the following characters are "message"
See regex101 live demo for a thorough explanation.
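As a quick check of the two steps, here is a small self-contained sketch on a shortened version of the input. It uses the line-anchored lookahead from the breakdown above; the class and method names are hypothetical, and it assumes nested messages are indented so that only parent messages start a line.

```java
import java.util.Arrays;
import java.util.List;

class SplitProtoMessages {
    static List<String> topLevelMessages(String data) {
        // Drop everything before the first line that starts with "message" ...
        String trimmed = data.replaceAll("(?sm)\\A.*?(?=^message)", "");
        // ... then cut at line breaks that are immediately followed by "message".
        return Arrays.asList(trimmed.split("\\R+(?=message)"));
    }

    public static void main(String[] args) {
        String data = "syntax = \"proto3\";\r\n"
                + "package grpc;\r\n"
                + "\r\n"
                + "message Acc {\r\n"
                + "  message AccErr {\r\n"
                + "  }\r\n"
                + "  string account_id = 1;\r\n"
                + "}\r\n"
                + "\r\n"
                + "message Name {\r\n"
                + "  string firstname = 1;\r\n"
                + "}";
        for (String message : topLevelMessages(data)) {
            System.out.println("---");
            System.out.println(message);
        }
    }
}
```

If a nested message were written without indentation, the split would cut there too, so this approach depends on the file being formatted consistently.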
Try this for your regex. It anchors on message being the start of a line, and uses a positive lookahead to find the next message or the end of messages.
Pattern.compile("(?s)\r\n(message.*?)(?=(\r\n)+message|$)")
// or
Pattern.compile("(?s)\r?\n(message.*?)(?=(\r?\n)+message|$)")
No splitting, parsing, or managing nested braces either :)
https://regex101.com/r/Wa2xxx/1
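If the input is not reliably formatted, counting braces by hand is a different technique from the regexes above that does handle nesting. This is only a sketch: the names are hypothetical, it treats a line starting with "message " at depth 0 as the start of a parent message, and it does not account for braces inside string literals or comments.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelMessages {
    // Returns each top-level "message ... { ... }" block by tracking brace depth.
    static List<String> extract(String source) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;   // how many '{' are currently open
        int start = -1;  // index where the current parent message began, or -1
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            boolean atLineStart = i == 0 || source.charAt(i - 1) == '\n';
            if (depth == 0 && start == -1 && atLineStart
                    && source.startsWith("message ", i)) {
                start = i;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth > 0) {
                    depth--;
                }
                if (depth == 0 && start != -1) {
                    // The brace that closes the parent message ends the block.
                    blocks.add(source.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String data = "package grpc;\r\n\r\n"
                + "message Acc {\r\n  message AccErr {\r\n    enum Enum {\r\n    }\r\n  }\r\n"
                + "  string account_id = 1;\r\n}\r\n\r\n"
                + "message Name {\r\n  string firstname = 1;\r\n}";
        extract(data).forEach(System.out::println);
    }
}
```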

Length of String within tags in java

We need to find the length of the tag names within the tags in java
{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}
so the length of Student tag is 7 and that of subject tag is 7 and that of marks is 5.
I am trying to split the tags and then find the length of each string within the tag.
But the code I am trying gives me only the first tag name and not others.
Can you please help me on this?
I am very new to java. Please let me know if this is a very silly question.
Code part:
System.out.println(
getParenthesesContent("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
public static String getParenthesesContent(String str) {
return str.substring(str.indexOf('{')+1,str.indexOf('}'));
}
You can use Patterns with this regex \\{([a-zA-Z]*)\\} :
String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
Matcher matcher = Pattern.compile("\\{([a-zA-Z]*)\\}").matcher(text);
while (matcher.find()) {
System.out.println(
String.format(
"tag name = %s, Length = %d ",
matcher.group(1),
matcher.group(1).length()
)
);
}
Outputs
tag name = Student, Length = 7
tag name = Subject, Length = 7
tag name = Marks, Length = 5
You might want to give a try to another regex:
String s = "{Abc}{Defg}100{Hij}100{/Klmopr}{/Stuvw}"; // just a sample String
Pattern p = Pattern.compile("\\{\\W*(\\w++)\\W*\\}");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + ", length: " + m.group(1).length());
}
Output you get:
Abc, length: 3
Defg, length: 4
Hij, length: 3
Klmopr, length: 6
Stuvw, length: 5
If you need to use charAt() to walk over the input String, you might want to consider using something like this (I made some explanations in the comments to the code):
String s = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
ArrayList<String> tags = new ArrayList<>();
for(int i = 0; i < s.length(); i++) {
StringBuilder sb = new StringBuilder(); // Collects the letters of the tag name; StringBuilder.append() is more efficient than "+=" on a String
if(s.charAt(i) == '{') { // If start of tag is found...
while(!(Character.isLetter(s.charAt(i)))) { // Skip characters that are not letters
i++;
}
while(Character.isLetter(s.charAt(i))) { // Append String with letters that are found
sb.append(s.charAt(i));
i++;
}
if(!(tags.contains(sb.toString()))) { // Add final String to ArrayList only if it not contained here yet
tags.add(sb.toString());
}
}
}
for(String tag : tags) { // Printing Strings contained in ArrayList and their length
System.out.println(tag + ", length: " + tag.length());
}
Output you get:
Student, length: 7
Subject, length: 7
Marks, length: 5
Yes, use a regular expression: find the pattern and apply it.

Parsing a Log File to Display Data from Multiple Lines Using Regular Expressions

So I'm trying to parse a bit of code here to get message text from a log file. I'll explain as I go. Here's the code:
// Print to interactions
try
{
// assigns the input file to a filereader object
BufferedReader infile = new BufferedReader(new FileReader(log));
sc = new Scanner(log);
while(sc.hasNext())
{
String line=sc.nextLine();
if(line.contains("LANTALK")){
Document doc = Jsoup.parse(line);
Element idto = doc.select("MBXTO").first();
Element msg = doc.select("MSGTEXT").first();
System.out.println(" to " + idto.text() + " " +
msg.text());
System.out.println();
} // End of if
} // End of while
try
{
// Print to output file
sc = new Scanner (log);
while(sc.hasNext())
{
String line=sc.nextLine();
if(line.contains("LANTALK")){
Document doc = Jsoup.parse(line);
Element idto = doc.select("MBXTO").first();
Element msg = doc.select("MSGTEXT").first();
outFile.println(" to " + idto.text() + " " +
msg.text());
outFile.println();
outFile.println();
} // End of if
} // End of while
} // end of try
I'm getting input from a log file, here's a sample of what it looks like and the lines that I'm filtering out:
08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>
<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>
</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he
gave me a credit card number</MSGTEXT>
08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]
08:25:20.945 [+] [T:000FF4] [S:1:1:1124:5607:5] LANMSGTYPESTOPPED [0/2 | 0]
08:25:21.327 [+] [T:000FE8] [S:1:1:1124:5607:5] LANMSGTYPESTARTED [0/2 | 0]
So far, I've been able to filter the line that contains the message (LANMSG). And from that, I've been able to get the id number of the recipient (MBXTO). But the next line contains the sender's id, which I need to pull out and display. ([S:1:1:1124:SENDERID:5]). How should I do this? Below is a copy of the output I'm getting:
to 5760 and I talked to him and he gave me a credit card number
And here's what I need to get:
SENDERID to 5760 and I talked to him and he gave me a credit card number
Any help you guys could give me on this would be great. I'm just not sure how to go about getting the information I need.
Your question isn't clear enough, and it seems you haven't used regex in this code... remember to specify what you have tried before asking.
Anyways the regex you're searching for is:
(\d{2}:\d{2}:\d{2}\.\d{3})\s\[D\].+<MBXID>(\d+)<\/MBXID><MBXTO>(\d+)<\/MBXTO>.+<MSGTEXT>(.+)<\/MSGTEXT>
Working example in Regex101
It should capture:
$1: 08:25:20.740
$2: 1124
$3: 5760
$4: and I talked to him and he
gave me a credit card number (Note that it also captures the \n, or newline, character).
(Also, you'll use matcher.group(number) instead of $number in Java).
And then you can use these substitution (group reference) terms to get your formatted output.
E.g.: $1 [$2] to [$3] $4
Should return:
08:25:20.740 [1124] to [5760] and I talked to him and he
gave me a credit card number
Remember, when you're going to implement regex in your Java code, you must escape all the backslashes (\), for this reason, this regex looks bigger:
Pattern pattern = Pattern.compile("(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s\\[D\\].+<MBXID>(\\d+)<\\/MBXID><MBXTO>(\\d+)<\\/MBXTO>.+<MSGTEXT>(.+)<\\/MSGTEXT>", Pattern.MULTILINE | Pattern.DOTALL);
// DOTALL makes the '.' term also match the newlines in the input. MULTILINE only changes how ^ and $ behave, which this pattern does not use, so it is optional here.
Matcher matcher = pattern.matcher(input);
while (matcher.find()){
String output = matcher.group(1) + " [" + matcher.group(2) + "] to [" + matcher.group(3) + "] " + matcher.group(4);
System.out.println(output);
}
And for your second problem Oh, you have edited and erased it already. . . But I'll still answer:
You can parse the $2 and $3 and make them return an integer:
int id1 = Integer.parseInt(matcher.group(2));
int id2 = Integer.parseInt(matcher.group(3));
This way you can create a method to return a name for these IDs. e.g.: UserUtil.getName(int id)
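The sender ID asked about in the question sits on the log line after the message. Going by the question's description ([S:1:1:1124:SENDERID:5]), it is assumed here to be the fourth number inside the [S:...] tag. The sketch below, with hypothetical class and method names, extends the same regex idea with reluctant quantifiers so that each message is paired with the nearest following [S:...] tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LanMsgFormatter {
    // group 1 = recipient (MBXTO), group 2 = message text, group 3 = assumed sender ID
    private static final Pattern ENTRY = Pattern.compile(
            "<MBXTO>(\\d+)</MBXTO>.*?<MSGTEXT>(.*?)</MSGTEXT>.*?\\[S:\\d+:\\d+:\\d+:(\\d+):\\d+\\]",
            Pattern.DOTALL);

    static List<String> format(String log) {
        List<String> lines = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            // Collapse line breaks inside the message text into single spaces.
            String text = m.group(2).replaceAll("\\s*\\R\\s*", " ");
            lines.add(m.group(3) + " to " + m.group(1) + " " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        String log = "08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>\n"
                + "<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT>"
                + "<MSGTEXT>and I talked to him and he\ngave me a credit card number</MSGTEXT>\n"
                + "08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]\n";
        format(log).forEach(System.out::println);
        // prints: 5607 to 5760 and I talked to him and he gave me a credit card number
    }
}
```

Because the whole log is matched as one string, the file has to be read completely first rather than line by line with Scanner.nextLine().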

How to split a text that contains String and Int and store into an ArrayList?

So basically I have a java program that reads a txt file using BufferedReader.
The text file contains all the information about the movies. The first column is the code, the second is the title and the third is the rating.
e.g
1 Titanic 44
34 Avengers 60
2 The Lord of the Rings 100
So 1 is the code, Titanic is the title and 44 is the rating etc.
My problem is that I have made a class Movie(int code, String title, int rating)
and I want to store all the informations in there but I can't figure out how to split the text. split(" ") doesn't seem like it would handle the case where a title has embedded spaces (e.g The Lord of the Rings).
What I really need is the ability to strip off the first and last fields based on space as a separator, and treat all other interior spaces as part of the title, not as separators.
You are correct, the split you already tried is inappropriate. Based on the pattern you are showing, the delimiters seem to be the first and the last space in each line, and everything between them is the title. I recommend finding the index of these spaces and using substring().
Eg:
String text = "2 The Lord of the Rings 100";
int firstSpace = text.indexOf(' ');
int lastSpace = text.lastIndexOf(' ');
String codeText = text.substring(0, firstSpace);
String title = text.substring(firstSpace + 1, lastSpace); // + 1 skips the delimiter space itself
String ratingText = text.substring(lastSpace + 1);
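To get from these substrings to the Movie objects in the question, the numeric fields still have to be parsed. Here is a minimal sketch; the Movie class is a hypothetical stand-in for the asker's Movie(int code, String title, int rating), and the sample lines are hard-coded instead of being read with BufferedReader.

```java
import java.util.ArrayList;
import java.util.List;

class MovieParser {
    // Hypothetical stand-in for the asker's Movie class.
    static final class Movie {
        final int code;
        final String title;
        final int rating;

        Movie(int code, String title, int rating) {
            this.code = code;
            this.title = title;
            this.rating = rating;
        }
    }

    // Splits on the first and last space only; everything between is the title.
    static Movie parseLine(String line) {
        String text = line.trim();
        int firstSpace = text.indexOf(' ');
        int lastSpace = text.lastIndexOf(' ');
        if (firstSpace == -1 || firstSpace == lastSpace) {
            throw new IllegalArgumentException("Expected <code> <title> <rating>: " + line);
        }
        int code = Integer.parseInt(text.substring(0, firstSpace));
        String title = text.substring(firstSpace + 1, lastSpace).trim();
        int rating = Integer.parseInt(text.substring(lastSpace + 1));
        return new Movie(code, title, rating);
    }

    public static void main(String[] args) {
        String[] lines = {"1 Titanic 44", "34 Avengers 60", "2 The Lord of the Rings 100"};
        List<Movie> movies = new ArrayList<>();
        for (String line : lines) {
            movies.add(parseLine(line));
        }
        for (Movie m : movies) {
            System.out.println(m.code + " | " + m.title + " | " + m.rating);
        }
    }
}
```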
You can still use split(" "), where line is the text of one line from the file:
String[] foo = line.split(" ");
The first element in foo will be the first integer. Convert that to an integer type. Then step through the remaining elements and append them into one string, except for the last element in foo, which will be your last integer and you can convert that to an integer type.
You can try this. As you mentioned, split(" ") alone isn't enough because some titles (e.g. The Lord of the Rings) contain spaces, so split on whitespace and rebuild the title from the middle elements.
String str = "2 The Lord of the Rings 100";
String[] arr = str.split("\\s+");
List<String> list = Arrays.asList(arr);
Get the data from arr
String code = arr[0];
String rate = arr[arr.length-1];
String title = "";
for (int i = 1; i < arr.length-1; i++) {
title += arr[i]+" ";
}
Run the code
System.out.println("code = " + code);
System.out.println("rate = " + rate);
System.out.println("title = " + title);
And it is the result:
code = 2
rate = 100
title = The Lord of the Rings
May this help you...
See the Pattern and Matcher classes, with the pattern (\d*)\s(.*)\s(\d*)
EDIT: Example
@Test
public void testRegExp(){
String text = "2 The Lord of the Rings 100";
String patternString = "(\\d*)\\s(.*)\\s(\\d*)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println("CODE : " + matcher.group(1));
System.out.println("TITLE : " + matcher.group(2));
System.out.println("RATE : " + matcher.group(3));
}
}
// Note: indexing the split result like this only works when the title is a single word
String line = "1 Titanic 44";
int code = Integer.valueOf(line.split(" ")[0]);
String title = line.split(" ")[1];
int rank = Integer.valueOf(line.split(" ")[2]);

Java split string (check given parameters)

For a Java IRC client I have a login function. If you type "!LOGIN user pass" it will log you in.
Right now, if a user types one space too many or gives only one parameter instead of "user" + "pass", it will crash the program due to the way I am splitting the line.
I'm having trouble finding a solution so I can check whether string user or string pass != null.
Any suggestions would be very much appreciated!
if (line.contains("!LOGIN")){ //!LOGIN username password
String[] parts = line.split(" ");
String user = parts[4];
String pass = parts[5];
}
In general it is recommended to verify your input before parsing it, or to test if the parsing worked.
In this case you are splitting a string on spaces, which gives you no guarantee about how many parts you get.
The minimum you should do is test if you have enough chunks as expected:
String[] parts = line.split(" ");
if (parts.length >= 6) { // parts[5] is read below, so at least 6 elements are needed
// your usual logic
String user = parts[4];
String pass = parts[5];
}
But it's generally better to create a pattern that (strictly) defines the acceptable input. You first validate that the input provided matches the expected pattern. (where in your pattern you decide how lenient you want to be)
something like:
public class TestPattern {
public static String[] inputTest = new String[] {
"!LOGIN user pass",
"!LOGIN user pass ",
"!LOGIN user pass",
"!LOGIN user pass",
" !LOGIN user pass",
" !LOGIN user pass "
};
public static void main(String[] argv) {
// ^ = start of line
// \\s* = 0 or more spaces
// \\s+ = 1 or more spaces
// (\\w+) = group 1 containing 1 or more word-characters (a-zA-Z etc)
// $ = end of line
Pattern pattern = Pattern.compile("^\\s*!LOGIN\\s+(\\w+)\\s+(\\w+)\\s*$");
for (String input : inputTest) {
Matcher matcher = pattern.matcher(input);
if (!matcher.find()) {
System.out.println("input didn't match login: " + input);
continue;
}
String username = matcher.group(1);
String password = matcher.group(2);
System.out.println("username[ " + username + " ], password[ " + password + " ]");
}
}
}
You can test this also with bad input like:
public static String[] inputFailed = new String[] {
"",
"! LOGIN user pass",
"!LOGINX user pass",
"!LOGIN user pass other",
"!LOGIN userpass"
};
if (line.contains("!LOGIN")){ //!LOGIN username password
String[] parts = line.split("\\s+");
String user = parts.length > 4 ? parts[4] : "";
String pass = parts.length > 5 ? parts[5] : "";
}
Use the regex as described in the comments above, then check the size of the array.
