Regex Help for a Command - java

I need to parse this command:
direct print conference <style>:<First Name Last Name>,<First Name Last Name>,<First Name Last Name>,<title>,<conference series name>,<location>,<year>
An example command would be:
direct print conference ieee:Sergey Brin,Lawrence Page,,The Anatomy of a Large-Scale Hypertextual Web Search Engine,WWW,Brisbane Australia,1998
My main problem is that a name (First Name Last Name) can be empty. How do I express that in a regex?
For a name I always use ([a-zA-Z]+) ([a-zA-Z]+), but how do I allow that slot to be empty?
direct print conference ([a-zA-Z]+):([a-zA-Z]+) ([a-zA-Z]+),([a-zA-Z]+) ([a-zA-Z]+),([a-zA-Z]+) ([a-zA-Z]+),([^,]+),([a-zA-Z]+),([a-zA-Z ]+),([0-9]+)
That is my regex for the case where no name is empty. How can I make a name optional, like this:
([a-zA-Z]+) ([a-zA-Z]+) OR EMPTY ?
I hope you can help me.

Since your input is basically a CSV with a special prefix, you don't need a regex here, just String.split():
String input = "direct print conference ieee:Sergey Brin,Lawrence Page,,The Anatomy of a Large-Scale Hypertextual Web Search Engine,WWW,Brisbane Australia,1998";
String[] parts = input.split(":");
String[] values = parts[1].split(",");
for (int i = 0; i < values.length; i++) {
    System.out.println(values[i]);
}
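If a regex is still preferred, the "OR EMPTY" part of the question can be expressed by wrapping each name in an optional non-capturing group. The following is only a minimal sketch, not the answerer's method: the class name ConferenceCommand and the choice to capture each full name as a single group are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConferenceCommand {
    // One author slot: either "First Last" or nothing at all.
    private static final String NAME = "((?:[a-zA-Z]+ [a-zA-Z]+)?)";

    private static final Pattern COMMAND = Pattern.compile(
            "direct print conference ([a-zA-Z]+):"
            + NAME + "," + NAME + "," + NAME + ","
            + "([^,]+),([a-zA-Z]+),([a-zA-Z ]+),([0-9]+)");

    /** Returns the eight captured fields, or null if the command does not match. */
    static String[] parse(String command) {
        Matcher m = COMMAND.matcher(command);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // an empty author slot yields ""
        }
        return fields;
    }
}
```

With the example command from the question, the third author slot comes back as an empty string while the other fields are filled in.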

Related

Is there a way to find a word by searching for only part of it?

I need to take a phrase that contains a specific word, then if it does have that word even if it's part of another word, to print the entire word out.
I know how to find the word "apple", but I can't figure out how to find the word "appletree".
So far, I have some code that finds the word apple and prints that out.
String phrase = "She's sitting under an appletree!";
if (phrase.contains("apple")) {
    System.out.println("apple");
} else {
    System.out.println("none");
}
How do I print "appletree"?
Use regex for a 1-liner:
String target = phrase.replaceAll(".*?(\\w*apple\\w*).*", "$1");
This works by matching (and thus replacing) the entire input while capturing the target, then using a backreference ($1) to the captured group so that only the target is returned.
The word in which apple appears is matched using \w* (i.e. any number of word chars) on either side of apple. The leading .*? is a reluctant quantifier, so it consumes as few characters as possible before the target; a greedy .* would consume everything right up to apple itself and return only apple for words like dapple.
Test code:
String phrase = "She's sitting under an appletree!";
String target = phrase.replaceAll(".*?(\\w*apple\\w*).*", "$1");
System.out.println(target);
Output:
appletree
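One thing to keep in mind with the one-liner: when the phrase contains no match at all, replaceAll returns the input unchanged. A variant based on Matcher.find() makes the "not found" case explicit. This is a sketch under assumptions: the class and method names are made up for illustration, and Optional is just one way to signal a missing result.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    /** Returns the first run of word characters that contains the fragment, if any. */
    static Optional<String> firstWordContaining(String phrase, String fragment) {
        // Pattern.quote keeps regex metacharacters in the fragment literal.
        Pattern p = Pattern.compile("\\w*" + Pattern.quote(fragment) + "\\w*");
        Matcher m = p.matcher(phrase);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

Calling WordFinder.firstWordContaining(phrase, "apple").orElse("none") then mirrors the if/else from the question.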
You could use a Scanner to read the phrase. Call scanner.next() to read each whitespace-separated token (here, each word) into a String variable s, test it with if (s.contains("apple")), and print it with System.out.println(s).
Hope this helps!
Robin.
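The Scanner approach described above could look roughly like this. It is a sketch only: the class and method names are hypothetical, and the Scanner reads from the phrase itself rather than from System.in so the example is self-contained.

```java
import java.util.Scanner;

class ScannerSearch {
    /** Returns the first whitespace-separated token containing the fragment, or null. */
    static String firstTokenContaining(String phrase, String fragment) {
        try (Scanner scanner = new Scanner(phrase)) {
            while (scanner.hasNext()) {
                String s = scanner.next(); // next token, split on whitespace
                if (s.contains(fragment)) {
                    return s;
                }
            }
        }
        return null; // no token contained the fragment
    }
}
```

Because tokens are split on whitespace only, punctuation stays attached: for the phrase in the question this returns "appletree!" rather than "appletree".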
Without using regex, you could simply split the sentence into words, loop through them, and check whether each one contains the requested word (old-school style):
String[] arr = phrase.split("\\s+");
for (String word : arr) {
    if (word.contains("apple")) return word;
}

Use substring to retain middle value in string

String s = "jstm - John Stuart Mill";
String aFriendlyAssigneeName = s.substring(s.lastIndexOf('-')+1);
I'm currently able to remove "jstm - " from "jstm - John Stuart Mill", but I'm not sure how to now remove everything after John.
All data will be in the format initials - First Middle Last. Basically I just want to strip everything except First.
How can I accomplish this? Perhaps by removing everything after the third whitespace...
I'd just use this, should be fast enough, and quite short:
String aFriendlyAssigneeName = s.split(" ")[2];
(Splits the string at the spaces in it, and takes the third member of the array, which should be the first name if they're all in that format.)
This should work:
String s = "jstm - John Stuart Mill";
String rest = s.substring(s.lastIndexOf('-') + 1).trim(); // "John Stuart Mill"
String aFriendlyAssigneeName = rest.substring(0, rest.indexOf(' ')); // "John"
After you have removed the initials (and the leading space), the first name ends at the first blank.
You are looking for the following method:
s.substring(startIndex, endIndex);
It takes a begin and an end index, which makes it easy to extract the middle of any String.
You can then find the end index with a bit of (I dare say) magic:
endIndex = s.indexOf(" ", startIndex)
where startIndex is
s.lastIndexOf('-') + 2
(the + 2 skips both the dash and the space after it, so the search starts at the first letter of the name).
Alternatively
If substring is not a hard requirement, try
String[] words = s.split(" ");
This returns an array of all values separated by a space.
You can then just select the word by its index (in this case words[2]).
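Putting the start and end index together, a minimal sketch might look like this. It assumes every input follows the "initials - First Middle Last" format from the question (a dash followed by exactly one space); the class and method names are made up for illustration.

```java
class FirstName {
    /** Extracts the first name from "initials - First Middle Last". */
    static String extract(String s) {
        int start = s.lastIndexOf('-') + 2; // skip the dash and the space after it
        int end = s.indexOf(' ', start);    // first space after the first name
        // If there is no further space, the first name runs to the end of the string.
        return end == -1 ? s.substring(start) : s.substring(start, end);
    }
}
```

For "jstm - John Stuart Mill" this yields "John".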
Why not take the substring up to the first space in the string you get after removing the initials?
aFriendlyAssigneeName = aFriendlyAssigneeName.trim(); // drop the space left over after the dash
aFriendlyAssigneeName = aFriendlyAssigneeName.substring(0, aFriendlyAssigneeName.indexOf(' '));
In my opinion this is a job for a regex: .* - (\w+)? .*
final String value = "jstm - John Stuart Mill";
final Matcher matcher = Pattern.compile(".* - (\\w+)? .*").matcher(value);
if (matcher.matches()) { // group(1) is only available after a successful match
    System.out.println(matcher.group(1)); // John
}
Regex versus substring, in my opinion:
Pros:
It is clearer about what you expect as input and what you intend to capture.
It is easily modified or extended if the input changes or you want to capture some other part.
Cons:
Regexes can look more cryptic to someone who is not used to them.
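To illustrate the "easily extended" point, here is a sketch that names each part of the input with named capturing groups. The group names, the class name, and the stricter pattern are assumptions for illustration, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssigneeParser {
    // initials, a literal " - ", the first name, then optionally the remaining names
    private static final Pattern FORMAT =
            Pattern.compile("(?<initials>\\w+) - (?<first>\\w+)(?: (?<rest>.*))?");

    /** Returns the first name, or null if the value is not in the expected format. */
    static String firstName(String value) {
        Matcher m = FORMAT.matcher(value);
        return m.matches() ? m.group("first") : null;
    }
}
```

Capturing a different part later only means reading another group, for example m.group("initials").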

Matching and sorting a Bukkit ChatColor expression

I'm splitting up a String by spaces and then checking each piece if it contains a code (&a, &l, etc). If it matches, I have to grab the codes that are beside each other and then order them alphanumerically (0, 1, 2... a, b, c...).
Here is what I tried so far:
String message = "&l&aCheckpoint&m&6doreime";
String[] parts = message.split(" "); // This may not be needed for the example, but I'm only using one word for simplicity here
List<String> orderedMessage = new ArrayList<>();
Pattern pattern = Pattern.compile("((?:&|\u00a7)[0-9A-FK-ORa-fk-or])(.*?)"); // Completely matches the entire pattern, not what i want
for (String part : parts) {
    if (pattern.matcher(part).matches()) {
        List<String> orderedParts = new ArrayList<>();
        // what do i do?
    }
}
I need to change the pattern value so it matches groups like this:
Match: &l&aCheckpoint
Groups that I need: [&l, &a, Checkpoint]
Match: &m&6doreime
Groups that I need: [&m, &6, doreime]
How can I match each Match shown above and split it into the 3 groups (that is, split off each code section (&[0-9A-FK-ORa-fk-or]) and the remaining text up to the next code section)?
Info: For anyone who is wondering why: when you submit color/format coded text to Minecraft, colors have to come first, or the format codes ([k-or]) are ignored because of how Minecraft has parsed color codes since 1.5. Sorting the codes and rebuilding the message means it no longer relies on users or developers getting the order right.
You can get what you are after by using a slightly more complicated regex:
(((?:&|§)[0-9A-FK-ORa-fk-or])+)([^&]*)
Breaking it down, there are two important capturing groups:
(((?:&|§)[0-9A-FK-ORa-fk-or])+)
The first captures one or more code sections, each an & (or §) followed by a code character.
([^&]*)
The second grabs any number of non-& characters, which gets you the remainder of that section. (This is slightly different behavior from the regex you provided; things get more complicated if & is a legal character in the text.)
Putting that regex to use with a Matcher, you can do the following:
String input = "&l&aCheckpoint&m&6doreime";
Pattern pattern = Pattern.compile("(((?:&|§)[0-9A-FK-ORa-fk-or])+)([^&]*)");
Matcher patternMatcher = pattern.matcher(input);
while (patternMatcher.find()) {
    String[] codes = patternMatcher.group(1).split("(?=&)");
    String rest = patternMatcher.group(3);
}
Which will loop twice, giving you
codes = ["&l", "&a"]
rest = "Checkpoint"
on the first loop and the following on the second
codes = ["&m", "&6"]
rest = "doreime"
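The question also asks for the codes to be ordered and the message rebuilt, which the loop above stops short of. A possible sketch follows. The class and method names are hypothetical, the comparison is case-insensitive by assumption, and unlike the regex above this variant also treats § as a code prefix when splitting and when collecting the remaining text.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColorCodeSorter {
    // Group 1: a run of adjacent codes. Group 2: the text up to the next & or §.
    private static final Pattern SECTION =
            Pattern.compile("((?:[&\u00a7][0-9A-FK-ORa-fk-or])+)([^&\u00a7]*)");

    /** Sorts each run of adjacent codes (0-9 before a-z) and rebuilds the message. */
    static String reorder(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = SECTION.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start()); // keep any text not covered by a match
            String[] codes = m.group(1).split("(?=[&\u00a7])");
            // Compare by the code character only, ignoring case.
            Arrays.sort(codes,
                    Comparator.comparing((String c) -> Character.toLowerCase(c.charAt(1))));
            for (String code : codes) {
                out.append(code);
            }
            out.append(m.group(2));
            last = m.end();
        }
        out.append(input.substring(last));
        return out.toString();
    }
}
```

Because digits sort before letters and a-f sort before k-r, color codes end up ahead of format codes within each run: "&l&aCheckpoint&m&6doreime" becomes "&a&lCheckpoint&6&mdoreime".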

Tokenizer skipping blank values before the split - Java

I used StringTokenizer to split a text file whose lines are comma-separated, like so:
FIRST NAME, MIDDLE NAME, LAST NAME
harry, rob, park
tom,,do
while (File.hasNext())
{
StringTokenizer strTokenizer = new StringTokenizer(File.nextLine(), ",");
while (strTokenizer.hasMoreTokens())
{
That is how I split each line.
The problem is that when a value is missing (for example, Tom has no middle name), the tokenizer skips the empty field between the two commas and registers the next value as the middle name.
How do I get it to register a blank instead?
Based on Davio's answer, you can detect the blank and replace it with your own String:
String[] result = File.nextLine().split(",");
for (int x = 0; x < result.length; x++) {
    if (result[x].isEmpty()) {
        System.out.println("Undefined");
    } else {
        System.out.println(result[x]);
    }
}
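One caveat with split(","): it keeps empty fields in the middle of a line but silently drops empty fields at the end, so a line such as "tom,do," yields only two elements. Passing a negative limit keeps them. A small sketch, with the class and method names assumed for illustration:

```java
class CsvLine {
    /** Splits a comma-separated line, keeping empty fields and trimming each value. */
    static String[] fields(String line) {
        // A negative limit tells split() to keep trailing empty strings.
        String[] parts = line.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }
}
```

With this, both "tom,,do" and "tom,do," produce three fields, one of which is the empty string.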
Use String.split instead.
"tom,,do".split(",") will yield an array with 3 elements: tom, (blank) and do
It seems to me you have two options:
Either double the comma in the file, as suggested by ToYonos, or
count the tokens with the countTokens() method before you assign the values to the variables; if there are only 2 tokens, the person doesn't have a middle name.
From the Javadoc of StringTokenizer:
StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split method of String
or the java.util.regex package instead.
So you don't need StringTokenizer here.
An example with String.split:
String name = "name, ,last name";
String[] nameParts = name.split(",");
for (String part : nameParts) {
    System.out.println("> " + part);
}
This produces the following output:
> name
>
> last name

Complex File content search in java

I have a file whose first line is 'HREC ZZ INCOK4 ZZ BEOINDIANEX ICES1_5P CHCAE02 71484 20131104 1230'. I need to get the 8th word (words are separated by spaces), which could be CHCAE02 or CHCAI02, and run some checks on it. How can I achieve this in Java? Please help, it is urgent. The full file content is shown below.
HREC ZZ INCOK4 ZZ BEOINDIANEX ICES1_5P CHCAE0271484201311041230
INCOK4104112013CHA Not Registered;IEC Not Registered;Invalid Bank Code;Authorised Dealer Code of IEC Not Found;Country of Destination can not be India;Wrong Port of destination:INCOK4;Wrong Port of destination:INCOK4;Wrong RITC Code For Inv./Item No:1/1;
TREC71484
There are several ways to fetch the 8th column.
Using String.split(String regex):
String word = row.split("\\W+")[7];
If the column matches a certain pattern, such as a fixed number of digits, you can match on that instead:
String regex = "[0-9]{5}"; // matches exactly five digits
Try String.split(regex)
String words[] = line.split(" ");
String eightWord = words[7];
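A slightly more defensive sketch of the same idea is shown below. The class and method names are hypothetical, and the position is passed in as a parameter rather than hard-coded, because in the sample line as quoted in the question CHCAE02 sits at zero-based index 6 once the line is split on whitespace.

```java
class RecordHeader {
    /** Returns the whitespace-separated field at a zero-based index, or null if absent. */
    static String fieldAt(String line, int index) {
        String[] words = line.trim().split("\\s+"); // tolerate runs of spaces
        return (index >= 0 && index < words.length) ? words[index] : null;
    }

    /** True if the word is one of the two codes mentioned in the question. */
    static boolean isKnownCode(String word) {
        return "CHCAE02".equals(word) || "CHCAI02".equals(word);
    }
}
```

Reading the first line of the file (for example with BufferedReader.readLine()) and passing it to fieldAt then gives the word to check with isKnownCode.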
