I'm trying to find out if there are any methods in Java which would help me achieve the following.
I want to pass a method a parameter like below
"(hi|hello) my name is (Bob|Robert). Today is a (good|great|wonderful) day."
I want the method to select one of the words inside each pair of parentheses, separated by '|', and return the full string with one word randomly selected from each group. Does Java have any methods for this, or would I have to code this myself using character-by-character checks in loops?
You can parse it with regexes.
The regex would be \(\w+(\|\w+)*\); in the replacement you just split the argument on the '|' and return a random word.
Something like
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.*;
public final class Replacer {
    //aText: "(hi|hello) my name is (Bob|Robert). Today is a (good|great|wonderful) day."
    //returns, for example: "hello my name is Bob. Today is a wonderful day."
    public static String getEditedText(String aText){
        StringBuffer result = new StringBuffer();
        Matcher matcher = fINITIAL_A.matcher(aText);
        while ( matcher.find() ) {
            matcher.appendReplacement(result, getReplacement(matcher));
        }
        matcher.appendTail(result);
        return result.toString();
    }
    // matches a parenthesized group of words separated by '|'; group 1 is the part inside the parentheses
    private static final Pattern fINITIAL_A = Pattern.compile(
        "\\((\\w+(\\|\\w+)*)\\)",
        Pattern.CASE_INSENSITIVE
    );
    //aMatcher.group(1): "hi|hello"
    //words: ["hi", "hello"]
    //returns: "hi" or "hello", chosen at random
    private static String getReplacement(Matcher aMatcher){
        // split takes a regex, so the '|' has to be escaped
        String[] words = aMatcher.group(1).split("\\|");
        int index = ThreadLocalRandom.current().nextInt(words.length);
        return words[index];
    }
}
(Note that this code is written just to illustrate the idea.)
Maybe this helps:
Pass the three strings "hi|hello", "Bob|Robert" and "good|great|wonderful" as arguments to the method.
Inside the method, split each string into an array,
e.g. String[] firstStringArray = thatString.split("\\|"); (the '|' must be escaped because split takes a regex), and do the same for the other two.
Then pick a random element from each array.
As far as I know, Java doesn't have any method that does this directly.
You have to write the code for it yourself or use a regex.
I don't think Java has anything that will do what you want directly. Personally, instead of doing things based on regexps or characters, I would make a method something like:
String madLib(Set<String> greetings, Set<String> names, Set<String> dispositions)
{
// pick randomly from each of the sets and insert into your background string
}
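For what it's worth, here is a minimal sketch of how that method could be filled in. It assumes List parameters instead of Set (a Set has no index to pick from), and the pick helper and the MadLib class name are made up for illustration:

```java
import java.util.List;
import java.util.Random;

final class MadLib {
    private static final Random RANDOM = new Random();

    // Picks one element of a non-empty list uniformly at random (hypothetical helper).
    static String pick(List<String> options) {
        return options.get(RANDOM.nextInt(options.size()));
    }

    // Fills the fixed sentence from the question with one random choice per slot.
    static String madLib(List<String> greetings, List<String> names, List<String> dispositions) {
        return pick(greetings) + " my name is " + pick(names)
                + ". Today is a " + pick(dispositions) + " day.";
    }
}
```

The background sentence is hard-coded here, which is the trade-off of this approach compared with the regex one: it is simpler, but every new sentence needs its own method.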
There is no direct support for this. And you should ideally not try a low level solution.
You should search for 'random sentence generator'. The way you are writing
`(Hi|Hello)`
etc. is called a grammar. You have to write a parser for the grammar. Again there are many solutions for writing parsers. There are standard ways to specify grammar. Look for BNF.
The parser and generator problems have been solved many times over, and the interesting part of your problem will be writing the grammar.
Java does not provide any ready-made method for this. You can either use a regex as described by Penartur or write your own Java method to split the strings and pick random words. The StringTokenizer class can help if you follow the second approach.
For part of my Java assignment I'm required to select all records that have a certain area code. I have custom objects within an ArrayList, like ArrayList<Foo>.
Each object has a String phoneNumber variable. They are formatted like "(555) 555-5555"
My goal is to search through each custom object in the ArrayList<Foo> (call it listOfFoos) and place the objects with area code "616" in a temporaryListOfFoos ArrayList<Foo>.
I have looked into tokenizers, but was unable to get the syntax correct. I feel like what I need to do is similar to this post (Ignore parentheses with string tokenizer?), but since I'm only trying to retrieve the first 3 digits (and I don't care about the remaining 7), this really didn't give me exactly what I was looking for.
What I did as a temporary work-around, was...
for (int i = 0; i<listOfFoos.size();i++){
if (listOfFoos.get(i).getPhoneNumber().contains("616")){
tempListOfFoos.add(listOfFoos.get(i));
}
}
This worked for our current dataset, however, if there was a 616 anywhere else in the phone numbers [like "(555) 616-5555"] it obviously wouldn't work properly.
If anyone could give me advice on how to retrieve only the first 3 digits, while ignoring the parentheses, I would greatly appreciate it.
You have two options:
Use value.startsWith("(616)") or,
Use regular expressions with this pattern (written as a Java string literal): "^\\(616\\).*"
The first option will be a lot quicker.
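Applied to the loop from the question, the first option might look roughly like this. Foo below is only a stand-in for the asker's class, and AreaCodeFilter and withAreaCode are names invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the question's Foo class.
final class Foo {
    private final String phoneNumber;
    Foo(String phoneNumber) { this.phoneNumber = phoneNumber; }
    String getPhoneNumber() { return phoneNumber; }
}

final class AreaCodeFilter {
    // Returns the objects whose phone number starts with "(<areaCode>)".
    static List<Foo> withAreaCode(List<Foo> listOfFoos, String areaCode) {
        String prefix = "(" + areaCode + ")";
        List<Foo> tempListOfFoos = new ArrayList<Foo>();
        for (Foo foo : listOfFoos) {
            if (foo.getPhoneNumber().startsWith(prefix)) {
                tempListOfFoos.add(foo);
            }
        }
        return tempListOfFoos;
    }
}
```

Because the check is anchored at the start of the string, "(555) 616-5555" is no longer picked up.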
areaCode = number.substring(number.indexOf('(') + 1, number.indexOf(')')).trim() should do the job for you, given the formatting of phone numbers you have.
Or if you don't have any extraneous spaces, just use areaCode = number.substring(1, 4).
I think what you need is a capturing group. Have a look at the Groups and capturing section in this document.
Once you are done matching the input with a pattern (for example "\\((\\d+)\\) \\d+-\\d+"), you can get the number in the parentheses using a matcher (an object of java.util.regex.Matcher) with matcher.group(1).
You could use a regular expression as shown below. The pattern will ensure the entire phone number conforms to your pattern ((XXX) XXX-XXXX) plus grabs the number within the parentheses.
int areaCodeToSearch = 555;
String pattern = String.format("\\((%d)\\) \\d{3}-\\d{4}", areaCodeToSearch);
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(phoneNumber);
if (m.matches()) {
String areaCode = m.group(1);
// ...
}
Whether you choose to use a regular expression versus a simple String lookup (as mentioned in other answers) will depend on how bothered you are about the format of the entire string.
I'm trying to replace several different characters with different values. For example, if I have: #love hate then what I would like to get back is %23love%20hate
Is it something to do with groups? I tried to understand groups, but I really didn't get it.
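You can try to do this:
String encodedstring = URLEncoder.encode("#love hate","UTF-8");
This gives %23love+hate: URLEncoder produces form encoding, which writes a space as + rather than %20. If you need %20, append .replace("+", "%20") to the call. To reverse it, do this:
String loveHate = URLDecoder.decode(encodedstring, "UTF-8");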
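Put together as a small helper, this could look like the sketch below. PercentEncoding and its method names are made up; the only library calls are URLEncoder.encode and URLDecoder.decode with an explicit charset name:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class PercentEncoding {
    // Form-encodes the text, then rewrites "+" (URLEncoder's encoding of a space) as "%20".
    // A literal '+' in the input is already encoded as %2B, so the replace is safe.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }

    // URLDecoder understands "%20" as well as "+", so decoding needs no extra step.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this, PercentEncoding.encode("#love hate") returns %23love%20hate and decode turns it back into #love hate.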
You don't need RegEx to replace single characters. RegEx is overkill for such purposes. You can simply use the plain replace method of the String class in a loop, for each character that you want to replace.
String output = input.replace("#", "%23");
output = output.replace(" ", "%20");
How many such characters do you want to get replaced?
If you are trying to encode a URL as UTF-8 or some other encoding, using existing classes will be much easier,
e.g. from the commons-httpclient project:
URIUtil.encodeWithinQuery(input,"UTF-8");
No, you will need multiple replaces. Another option is to use group to find the next occurrence of one of several strings, inspect what the string is and replace appropriately, perhaps using a map.
I think what you want to achieve is a kind of URL encoding rather than pure replacement.
See some of the answers on this SO thread, especially the one with 7 votes, which may be more interesting for you.
HTTP URL Address Encoding in Java
As Mat said, the best way to solve this problem is with URLEncoder. However, if you insist on using regex, then see the sample code in the documentation for java.util.regex.Matcher.appendReplacement:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
Within the loop, you can use m.group() to see what substring matched and then do a custom substitution based on that. This technique can be used for replacing ${variables} by looking them up in a map, etc.
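A rough sketch of that map-driven variant follows. MapReplacer is a made-up name; the keys are quoted with Pattern.quote so that characters like '#' or '$' are matched literally:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MapReplacer {
    // Replaces every occurrence of a map key in the input with the mapped value.
    static String replaceAll(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input;
        }
        // Build an alternation of the quoted keys, e.g. \Q#\E|\Q \E
        StringBuilder alternation = new StringBuilder();
        for (String key : replacements.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the value from being treated specially
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

If one key is a prefix of another, the alternation tries keys in the map's iteration order, so an ordered map such as LinkedHashMap with longer keys first is the safer choice.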
Hey, I've been trying to figure out why this regular expression isn't matching correctly.
List l_operators = Arrays.asList(Pattern.compile("(\\d+)").split(rtString.trim()));
The input string is "12+22+3"
The output I get is -- [,+,+]
There's a match at the beginning of the list which shouldn't be there? I really can't see it and I could use some insight. Thanks.
Well, technically, there is an empty string in front of the first delimiter (first sequence of digits). If you had, say a line of CSV, such as abc,def,ghi and another one ,jkl,mno you would clearly want to know that the first value in the second string was the empty string. Thus the behaviour is desirable in most cases.
For your particular case, you need to deal with it manually, or refine your regular expression somehow. Like this for instance:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(rtString);
if (m.find()) {
List l_operators = Arrays.asList(p.split(rtString.substring(m.end()).trim()));
// ...
}
Ideally, however, you should be using a parser for this type of string. You can't, for instance, deal with parentheses in expressions using just regular expressions.
That's the behavior of split in Java. You just have to take it (and deal with it) or use another library to split the string. I personally try to avoid split from Java.
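If you would rather stay with the JDK, "dealing with it" can be as simple as dropping the empty pieces after the split. A small sketch, with Operators as a made-up class name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class Operators {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Splits on runs of digits and drops the empty strings that split() produces.
    static List<String> operators(String expression) {
        List<String> result = new ArrayList<String>();
        for (String piece : DIGITS.split(expression.trim())) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }
}
```

For "12+22+3" this returns the two "+" operators without the leading empty string.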
An example of one alternative is to look at Splitter from Google Guava.
Try Guava's Splitter.
Splitter.onPattern("\\d+").omitEmptyStrings().split(rtString)
Imagine this string:
if(editorPart instanceof ITextEditor){
ITextEditor editor = (ITextEditor)editorPart;
selection = (ITextSelection) editor.getSelectionProvider().getSelection();
}else if( editorPart instanceof MultiPageEditorPart){
//this would be the case for the XML editor
selection = (ITextSelection) editorPart.getEditorSite().getSelectionProvider().getSelection();
}
I can see, visually, that the "common" start in each of these lines is two tab characters. Is there a regular expression that would replace -- only at the beginning of each line (including the first and last line), this common start, such that after the regex I'd end up with that same string, only essentially un-indented?
I can't simply search for "two tabs" in this case because there might be two tabs elsewhere in the text but not at the start of a line.
I've implemented this functionality with a different method but thought it'd be a fun regex challenge, if it's possible at all
The ^ symbol in a regular expression matches the beginning of a line when multiline mode (the m flag) is on. So:
s/^\t\t//gm
would remove two tabs at the beginning of each line.
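In Java the same idea can be written with the inline (?m) flag, which turns on multiline mode so that ^ matches at every line start. A minimal sketch (Unindent is a made-up class name):

```java
final class Unindent {
    // Removes exactly two tab characters from the start of every line.
    static String removeTwoTabs(String text) {
        return text.replaceAll("(?m)^\t\t", "");
    }
}
```

Tabs elsewhere in a line are left alone, because ^ can only match right after a line terminator or at the very start of the input.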
In general (i.e. if you want to match an arbitrary prefix, not necessarily two tabs), there may or may not be a way. It depends on which regular expression engine you're using. I would imagine that maybe something roughly like this might work:
\A^(.+).*?$(?:^\1.*?$)+\z
Note that I've probably screwed up the regex syntax; just think of it as regex pseudocode of sorts (\A is beginning of string, ^ is beginning of line, $ is end of line, \z is end of string).
But this really isn't a job I would do with a regular expression. A simple character-by-character parser seems much better suited.
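To give an idea of what the non-regex route could look like, here is a sketch that finds the longest run of leading spaces and tabs shared by all lines and cuts it off. The class and method names are invented, and it assumes lines are separated by "\n" and that a blank line counts as having no indent:

```java
final class CommonIndent {
    // Returns the longest run of leading spaces/tabs shared by all lines.
    static String commonIndent(String[] lines) {
        String prefix = null;
        for (String line : lines) {
            int end = 0;
            while (end < line.length() && (line.charAt(end) == ' ' || line.charAt(end) == '\t')) {
                end++;
            }
            String indent = line.substring(0, end);
            if (prefix == null) {
                prefix = indent;
            } else {
                // shrink the candidate prefix to the part shared with this line's indent
                int i = 0;
                while (i < prefix.length() && i < indent.length()
                        && prefix.charAt(i) == indent.charAt(i)) {
                    i++;
                }
                prefix = prefix.substring(0, i);
            }
        }
        return prefix == null ? "" : prefix;
    }

    // Strips the common indent from every line of the text.
    static String unindent(String text) {
        String[] lines = text.split("\n", -1);
        int cut = commonIndent(lines).length();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(lines[i].substring(cut));
        }
        return out.toString();
    }
}
```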
Not in one regex. You need to make two passes: matches() to find the longest common prefix, then replaceAll() to remove it. Here's my best solution:
import java.util.regex.*;
public class Test
{
public static void main(String[] args) throws Exception
{
String target =
"\t\tif(editorPart instanceof ITextEditor){\n"
+ "\t\t\tITextEditor editor = (ITextEditor)editorPart;\n"
+ "\t\t\tselection = (ITextSelection) fee.fie().fum();\n"
+ "\t\t}else if( editorPart instanceof MultiPageEditorPart){\n"
+ "\t\t\t//this would be the case for the XML editor\n"
+ "\t\t\tselection = (ITextSelection) fee.fie().foe().fum();\n"
+ "\t\t}";
System.out.printf("%n%s%n", target);
Pattern p = Pattern.compile("^(\\s+).*+(?:\n\\1.*+)*+");
Matcher m = p.matcher(target);
if (m.matches())
{
String indent = m.group(1);
String result = target.replaceAll("(?m)^" + indent, "");
System.out.printf("%n%s%n", result);
}
}
}
Of course, this assumes (as Jonathan Leffler hinted at in his comment to your question) that the target string is not part of a larger string, and you're only removing whitespace. Without those assumptions the task becomes a lot more complex.
It's absolutely possible. As everyone points out, I'd never inflict this on a real project, though.
My answer, if you're curious, is here. I tried writing it in perl, but it doesn't support variable-length lookbehinds.
EDIT: Fixed it! The linked code now works. If you'd like hints, just comment -- I don't want to give it away if you want to solve it yourself, though.
For parsing player commands, I've most often used the split method to split a string by delimiters and then figured out the rest with a series of ifs or switches. What are some different ways of parsing strings in Java?
I really like regular expressions. As long as the command strings are fairly simple, a few regexes can do what would otherwise take a few pages of manual parsing code.
I would suggest you check out http://www.regular-expressions.info for a good intro to regexes, as well as specific examples for Java.
I assume you're trying to make the command interface as forgiving as possible. If this is the case, I suggest you use an algorithm similar to this:
Read in the string
Split the string into tokens
Use a dictionary to convert synonyms to a common form
For example, convert "hit", "punch", "strike", and "kick" all to "hit"
Perform actions on an unordered, inclusive base
Unordered - "punch the monkey in the face" is the same thing as "the face in the monkey punch"
Inclusive - If the command is supposed to be "punch the monkey in the face" and they supply "punch monkey", you should check how many commands this matches. If it matches only one command, do that action. It might even be a good idea to have command priorities, so that even if there were several matches, it would perform the top-priority action.
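The steps above can be sketched roughly as follows. The synonym table and the command list are invented for the example, and command priorities are left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CommandMatcher {
    // Hypothetical synonym dictionary: every key is rewritten to its common form.
    private static final Map<String, String> SYNONYMS = new HashMap<String, String>();
    static {
        SYNONYMS.put("punch", "hit");
        SYNONYMS.put("strike", "hit");
        SYNONYMS.put("kick", "hit");
    }

    // Hypothetical command list, each written in its common form.
    private static final List<String> COMMANDS = Arrays.asList(
            "hit the monkey in the face",
            "hit the monkey in the stomach",
            "feed the monkey");

    // Lower-cases, splits on whitespace and maps synonyms to their common form.
    static Set<String> normalize(String text) {
        Set<String> tokens = new HashSet<String>();
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String common = SYNONYMS.get(token);
            tokens.add(common != null ? common : token);
        }
        return tokens;
    }

    // Returns every known command that contains all of the input's tokens, in any order.
    static List<String> matches(String input) {
        Set<String> wanted = normalize(input);
        List<String> result = new ArrayList<String>();
        for (String command : COMMANDS) {
            if (normalize(command).containsAll(wanted)) {
                result.add(command);
            }
        }
        return result;
    }
}
```

Here "the face in the monkey punch" matches exactly one command, while "punch monkey" matches two, which is the case where priorities (or asking the player) would come in.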
Parsing manually is a lot of fun... at the beginning:)
In practice, if commands aren't very sophisticated you can treat them the same way as those used in command-line interpreters. There's a list of libraries that you can use: http://java-source.net/open-source/command-line. I think you can start with Apache Commons CLI or args4j (uses annotations). They are well documented and really simple to use. They handle parsing automatically, and the only thing you need to do is read particular fields in an object.
If you have more sophisticated commands, then maybe creating a formal grammar would be a better idea. There is a very good library with graphical editor, debugger and interpreter for grammars. It's called ANTLR (and the editor ANTLRWorks) and it's free:) There are also some example grammars and tutorials.
I would look at Java migrations of Zork, and lean towards a simple Natural Language Processor (driven either by tokenizing or regex) such as the following (from this link):
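// requires: import java.util.Vector;
// The tokenizing loop is reconstructed; parts of the original listing were lost.
public static boolean simpleNLP( String inputline, String keywords[])
{
    int i;
    int to, from;
    boolean status = false;
    Vector<String> lexed = new Vector<String>();
    inputline = inputline.trim();
    if( inputline.length() < 1 ) return false; // check for blank and empty lines
    // split the line into space-separated tokens
    from = 0;
    to = 0;
    while( to >= 0 )
    {
        to = inputline.indexOf(' ', from);
        if( to > 0 ){
            lexed.addElement(inputline.substring(from, to));
            from = to;
            // skip any run of spaces before the next token
            while( from < inputline.length() && inputline.charAt(from) == ' ' ) from++;
        } else {
            lexed.addElement(inputline.substring(from)); // last token
        }
    }
    // the keywords must all appear among the tokens, in order
    to = 0;
    for( i = 0; i < lexed.size(); i++ )
    {
        if( lexed.elementAt(i).equalsIgnoreCase(keywords[to]) )
        {
            to++;
            if( to >= keywords.length ) { status = true; break; }
        }
    }
    return status;
}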
...
Anything which gives a programmer a reason to look at Zork again is good in my book, just watch out for Grues.
...
Sun itself recommends staying away from StringTokenizer and using the String.split method instead.
You'll also want to look at the Pattern class.
Another vote for ANTLR/ANTLRWorks. If you create two versions of the file, one with the Java code for actually executing the commands, and one without (with just the grammar), then you have an executable specification of the language, which is great for testing, a boon for documentation, and a big timesaver if you ever decide to port it.
If this is to parse command lines I would suggest using Commons Cli.
The Apache Commons CLI library provides an API for processing command line interfaces.
Try JavaCC a parser generator for Java.
It has a lot of features for interpreting languages, and it's well supported on Eclipse.
#CodingTheWheel Here's your code, cleaned up a bit and run through Eclipse (Ctrl+Shift+F), then inserted back here :)
Including the four spaces in front of each line.
public static boolean simpleNLP(String inputline, String keywords[]) {
if (inputline.length() < 1)
return false;
List<String> lexed = new ArrayList<String>();
for (String ele : inputline.split(" ")) {
lexed.add(ele);
}
boolean status = false;
int to = 0; // index of the next keyword to look for
for (int i = 0; i < lexed.size(); i++) {
String s = lexed.get(i);
if (s.equalsIgnoreCase(keywords[to])) {
to++;
if (to >= keywords.length) {
status = true;
break;
}
}
}
return status;
}
A simple string tokenizer on spaces should work, but there are really many ways you could do this.
Here is an example using a tokenizer:
String command = "kick person";
StringTokenizer tokens = new StringTokenizer(command);
String action = null;
if (tokens.hasMoreTokens()) {
action = tokens.nextToken();
}
if (action != null) {
doCommand(action, tokens);
}
Then tokens can be further used for the arguments. This all assumes no spaces are used in the arguments... so you might want to roll your own simple parsing mechanism (like getting the first whitespace and using text before as the action, or using a regular expression if you don't mind the speed hit), just abstract it out so it can be used anywhere.
When the separator String for the command is always the same String or char (like ";"), I recommend you use the StringTokenizer class:
StringTokenizer
but when the separator varies or is complex, I recommend you use regular expressions, which can be used by the String class itself (the split method) since 1.4. It uses the Pattern class from the java.util.regex package:
Pattern
If the language is dead simple like just
VERB NOUN
then splitting by hand works well.
If it's more complex, you should really look into a tool like ANTLR or JavaCC.
I've got a tutorial on ANTLR (v2) at http://javadude.com/articles/antlrtut which will give you an idea of how it works.
JCommander seems quite good, although I have yet to test it.
If your text contains delimiters, you can use the split method.
If the text is irregular, meaning it mixes different formats, you should use regular expressions.
The split method splits a string into an array of substrings around matches of the given regular expression.
It comes in two forms: split(String regex) and split(String regex, int limit). split(String regex) simply calls split(String regex, int limit) with a limit of 0. So what do limit > 0 and limit < 0 mean?
According to the JDK documentation: when limit > 0, the pattern is applied at most limit - 1 times, so the array has at most limit entries, and the last entry holds everything after the last matched delimiter;
limit < 0 means the pattern is applied as many times as possible and the array can have any length;
limit = 0 also applies the pattern as many times as possible, but trailing empty strings are discarded.
StringTokenizer is a legacy class kept for compatibility reasons, so we should prefer the split method of the String class.
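A quick way to see the three cases side by side is to print the result of splitting the same string with different limits. SplitLimitDemo and show are made-up names; Arrays.toString is used only so that empty strings stay visible:

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Formats the result of splitting on "," with the given limit.
    static String show(String text, int limit) {
        return Arrays.toString(text.split(",", limit));
    }
}
```

For the input "a,b,c,,": show(text, 2) gives [a, b,c,,], show(text, 0) gives [a, b, c], and show(text, -1) gives [a, b, c, , ], where the last two entries are the trailing empty strings.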