Trouble splitting string with split(regex) in Java

I want to split a number of strings similar to name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST] into only these tokens:
john
20
toledo
seattle
[2/8/12 15:48:01:837 MST]
I'm doing this
String delims = "(name|id|dest|from|date_time)?[:,\\s]+";
String line = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] lineTokens = line.split(delims, 5);
for (String t : lineTokens)
{
// for debugging
System.out.println (t);
// other processing I want to do
}
but every even element in lineTokens turns out to be either empty or just whitespace. Each odd element in lineTokens is what I want, i.e. lineTokens[0] is "", lineTokens[1] is "john", lineTokens[2] is "", lineTokens[3] is "20", etc. Can anyone explain what I'm doing wrong?

The problem is that your regex is not matching , id: as a whole, it is matching , as one and then id: as a 2nd match. Between these two matches you have an empty string. You need to modify it to match the whole thing. Something like this:
String delims = "(, )?(name|id|dest|from|date_time)?[:\\s]+";
http://ideone.com/Qgs8y
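A minimal sketch of that corrected delimiter, assuming the five field names are fixed and every line starts with "name: ". The class and method names (DelimiterDemo, values) are made up for illustration. Because the line begins with a delimiter, split still returns an empty first element, and the limit has to be 6 rather than 5 so that the date is split off but not broken up at its own colons and spaces:

```java
import java.util.Arrays;
import java.util.List;

class DelimiterDemo {
    // Optional ", " + optional field name + one or more colons/whitespace = ONE delimiter.
    private static final String DELIMS = "(, )?(name|id|dest|from|date_time)?[:\\s]+";

    // Hypothetical helper: returns the five values of one line.
    static List<String> values(String line) {
        // Limit 6 = leading empty string + 5 values. The pattern is applied at most
        // 5 times, so the last element (the date) keeps its ':' and ' ' characters.
        String[] tokens = line.split(DELIMS, 6);
        // Drop the empty string that precedes the leading "name: " delimiter.
        return Arrays.asList(tokens).subList(1, tokens.length);
    }

    public static void main(String[] args) {
        String line = "name: john, id: 20, dest: toledo, from: seattle, "
                + "date_time: [2/8/12 15:48:01:837 MST]";
        values(line).forEach(System.out::println);
    }
}
```

With the original limit of 5, the last element would be "seattle, date_time: [2/8/12 15:48:01:837 MST]", because the empty leading string already uses up one of the five slots.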

Why not use a less complicated regex solution?
String str = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] expr = str.split(", ");
for(String e : expr)
System.out.println(e.split(": ")[1]);
Output:
john
20
toledo
seattle
[2/8/12 15:48:01:837 MST]

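I made some changes to your code: the field name is no longer optional, the separating comma is matched as part of the delimiter, and the limit argument to split is gone:
String delims = "(,\\s*)?(name|id|dest|from|date_time):\\s*";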
String line = "name: john, id: 20, dest: toledo, from: seattle, date_time: [2/8/12 15:48:01:837 MST]";
String[] lineTokens = line.split(delims);
for (String t : lineTokens)
{
// for debugging
System.out.println (t);
// other processing I want to do
}
Also, you should ignore the first element of lineTokens: it is the empty string that precedes the leading "name: " delimiter at the start of the line.

Related

How add quotes in a JSON string, using Java, when the value is a date

I'm having difficulty with a scenario in which I need to read a JSON object, in Java, that has no double quotes around the keys or the values, like the example below:
"{id: 267107086801, productCode: 02-671070868, lastUpdate: 2018-07-15, lastUpdateTimestamp: 2018-07-15 01:49:58, user: {pf: {document: 123456789, name: Luis Fernando}, address: {street: Rua Pref. Josu00e9 Alves Lima,number:37}, payment: [{sequential: 1, id: CREDIT_CARD, value: 188, installments: 9}]}"
I was able to add the double quotes in the fields using the code below, with replaceAll and the Gson library:
String jsonString = gson.toJson(obj);
jsonString = jsonString.replaceAll("([\\w]+)[ ]*:", "\"$1\":"); // to quote the key before the colon
jsonString = jsonString.replaceAll(":[ ]*([\\w#\\.]+)", ": \"$1\""); // to quote the value after the colon; add special characters as needed to the character class in the regex
jsonString = jsonString.replaceAll(":[ ]*\"([\\d]+)\"", ": $1"); // to un-quote decimal value
jsonString = jsonString.replaceAll("\"true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"false\"", "false"); // to un-quote boolean
However, fields with dates are being broken down erroneously, for example:
"{"id" : 267107086801,"productCode" : 02-671070868,"lastUpdate" : 2018-07-15,"lastUpdateTimestamp" : 2018-07-15 "01" : 49 : 58,"user" :{"pf":{"document" : 123456789, "name" : "Luis" Fernando},"address" :{"street" : "Rua"Pref.Josu00e9AlvesLima,"number" : 37},"payment" : [{"sequential" : 1,"id" : "CREDIT_CARD","value" : 188,"installments" : 9}]}"
Also, strings with spaces are wrong as well. How could I correct this logic? What am I doing wrong? Thanks in advance.
String incorrectJson = "{id: 267107086801, productCode: 02-671070868,"
+ " lastUpdate: 2018-07-15, lastUpdateTimestamp: 2018-07-15 01:49:58,"
+ " user: {pf: {document: 123456789, name: Luis Fernando},"
+ " address: {street: Rua Pref. Josu00e9 Alves Lima,number:37},"
+ " payment: [{sequential: 1, id: CREDIT_CARD, value: 188, installments: 9}]}";
String correctJson = incorrectJson.replaceAll("(?<=: ?)(?![ \\{\\[])(.+?)(?=,|})", "\"$1\"");
System.out.println(correctJson);
Output:
{id: "267107086801", productCode: "02-671070868", lastUpdate:
"2018-07-15", lastUpdateTimestamp: "2018-07-15 01:49:58", user: {pf:
{document: "123456789", name: "Luis Fernando"}, address: {street: "Rua
Pref. Josu00e9 Alves Lima",number:"37"}, payment: [{sequential: "1",
id: "CREDIT_CARD", value: "188", installments: "9"}]}
One downside of non-trivial regular expressions is they can be hard to read. The one I use here matches each literal value (but not values that are objects or arrays). I am using colons, commas and curly braces to guide the matching so I don’t need to care what is inside each string value, it may be any characters (except comma or right curly brace). The parts mean:
(?<=: ?): there’s a colon and optionally a blank before the value (lookbehind)
(?![ \\{\\[]) the value does not start with a blank, curly brace or square bracket (negative lookahead; blank because we don’t want a blank between the colon and the value to be taken as part of the value)
(.+?): the value consists of at least one character, as few as possible (reluctant quantifier; or regex would try to take the rest of the string)
(?=,|}): after the value comes either a comma or a right curly brace (positive lookahead).
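The four parts can be tried out on a smaller input. This is only a sketch: the class name, method name and sample string are made up for illustration, and the closing brace in the lookahead is escaped here (\\}), which standard Java treats the same as the unescaped form.

```java
class QuoteValuesDemo {
    // lookbehind:          ':' and optionally a blank come before the value
    // negative lookahead:  the value does not start with a blank, '{' or '['
    // reluctant group:     the value itself, as short as possible
    // positive lookahead:  the value is followed by ',' or '}'
    private static final String VALUE = "(?<=: ?)(?![ \\{\\[])(.+?)(?=,|\\})";

    // Hypothetical helper: wraps every literal value in double quotes.
    static String quoteValues(String text) {
        return text.replaceAll(VALUE, "\"$1\"");
    }

    public static void main(String[] args) {
        // prints {a: "1", b: "x y", c: {d: "2018-07-15 01:49:58"}}
        System.out.println(quoteValues("{a: 1, b: x y, c: {d: 2018-07-15 01:49:58}}"));
    }
}
```

The colons inside the timestamp are not treated as new values because they are consumed as part of the match that starts right after "d: ".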
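Strict JSON does require double quotes around member names, so you will most likely want to quote those as well. The following variant quotes each name whose value is a literal; names whose value is an object or array (user, pf, address, payment) are still left unquoted, as the output shows: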
String correctJson = incorrectJson.replaceAll(
"(?<=\\{|, ?)([a-zA-Z]+?): ?(?![ \\{\\[])(.+?)(?=,|})", "\"$1\": \"$2\"");
{"id": "267107086801", "productCode": "02-671070868", "lastUpdate":
"2018-07-15", "lastUpdateTimestamp": "2018-07-15 01:49:58", user: {pf:
{"document": "123456789", "name": "Luis Fernando"}, address:
{"street": "Rua Pref. Josu00e9 Alves Lima","number": "37"}, payment:
[{"sequential": "1", "id": "CREDIT_CARD", "value": "188",
"installments": "9"}]}
The following code also handles single quotes in the JSON string and keys that contain digits:
jsonString = jsonString.replaceAll(" :", ":"); // to strip the space after a key
jsonString = jsonString.replaceAll(": ,", ":,");
jsonString = jsonString.replaceAll("(?<=: ?)(?![ \\{\\[])(.+?)(?=,|})", "\"$1\"");
jsonString = jsonString.replaceAll("(?<=\\{|, ?)([a-zA-Z0-9]+?)(?=:)", "\"$1\"");
jsonString = jsonString.replaceAll("\"true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"false\"", "false"); // to un-quote boolean
jsonString = jsonString.replaceAll("\"null\"", "null"); // to un-quote null
jsonString = jsonString.replaceAll(":\",", ":\"\" ,"); // to remove unnecessary double quotes
jsonString = jsonString.replaceAll("true\"", "true"); // to un-quote boolean
jsonString = jsonString.replaceAll("'\",", "',"); // to handle single quote within json string
jsonString = jsonString.replaceAll("'},", "'}\","); // to put double quote after string ending with single quote

Splitting a string into an array then splitting the array again

I have this string:
fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862
I want to use a loop to split this string into an array at each comma (,), for example:
[0]fname lname
[1]GTA V: 120 : 00000
[2]Minecraft : 20 : 10
[3]Assassin’s Creed IV : 90 : 800
[4]Payday 2 : 190 : 2001
[5]Wolfenstein TNO : 25 : 80
[6]FarCry 4 : 55 : 862
Then I want to use another loop to split this further at each colon (:) into another array, for example:
[0]fname lname
[1]GTA V
[2]120
[3]00000
[4]Minecraft
[5]20
[6]10
....
Is there a better way of doing this?
Currently I have:
List<String> lines = new ArrayList<String>();
while (scan.hasNextLine())
{
lines.add(scan.nextLine());
}
//converts the list array to string array
String[] scanarray = lines.toArray(new String[0]);
//converts the string array into one large string
String str_array = Arrays.toString(scanarray);
String[] arraysplit;
arraysplit = str_array.split("\\s*:\\s*");
for (int i=0; i<arraysplit.length; i++)
{
arraysplit[i] = arraysplit[i].trim();
// ^^^^^^^^^^^^ has values with spaces
System.out.println(scanarray[i]);
}
EDIT:
Currently my program creates 3 identical arrays, containing the example you can see in the second block of code above.
You can use the split method of the String class with multiple delimiters:
public static void main(String[] args) {
String myOriginalString = " fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862";
// | is the regex OR operator
String[] splited = myOriginalString.split(",|:");
for(String s : splited)
System.out.println(s.trim());
}
You can achieve what you are looking for with a regex: list every separator you want to split on in the pattern you pass to the String split method.
I tried the code below locally and it gives pretty much what you are looking for.
public class StackSol1 {
public static void main(String[] args) {
String str = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 :80, FarCry 4 : 55 : 862";
String delimiters = "\\s*[,:]\\s*"; // a comma or colon plus any surrounding whitespace; spaces inside names such as "GTA V" are kept
// analyzing the string
String[] tokensVal = str.split(delimiters);
// prints the number of tokens
System.out.println("Count of tokens = " + tokensVal.length);
String finalStr="";
for (String token : tokensVal) {
finalStr = finalStr+"\n"+token;
}
System.out.println(finalStr);
}
}
How about using split with regex? e.g.
String aa = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862";
String [] a = aa.split("[,:]");
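If you want to keep the two steps from the question (first split at commas, then split each piece at colons), a minimal sketch could look like the following. The class and method names are made up for illustration; trim() removes the blanks left around each field.

```java
import java.util.ArrayList;
import java.util.List;

class TwoLevelSplit {
    // Hypothetical helper: splits at ',' first, then splits each record at ':'.
    static List<String> flatten(String line) {
        List<String> fields = new ArrayList<>();
        for (String record : line.split(",")) {      // first level: records
            for (String field : record.split(":")) { // second level: fields of one record
                fields.add(field.trim());            // drop surrounding blanks
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10";
        // prints fname lname, GTA V, 120, 00000, Minecraft, 20, 10 (one per line)
        flatten(line).forEach(System.out::println);
    }
}
```

Keeping the outer loop makes it easy to build one array per game instead of a flat list, should that be needed later.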

Java regex to split along words, punctuation, and whitespace, and keep all in an array

I am trying to split a sentence into a group of strings. I want to keep all words, punctuation and whitespace in an array.
For example:
"Hello! My name is John Doe."
Would be split into:
["Hello", "!", " ", "My", " ", "name", " ", "is", " ", "John", " ", "Doe"]
I currently have the following line of code breaking my sentence:
String[] fragments = sentence.split("(?<!^)\\b");
However, this is running into an error where it counts a punctuation mark followed by a whitespace as a single string. How do I modify my regex to account for this?
You can try the following regular expression:
(?<=\b|[^\p{L}])
"Hello! My name is John Doe.".split("(?<=\\b|[^\\p{L}])", 0)
// ⇒ ["Hello", "!", " ", "My", " ", "name", " ", "is", " ", "John", " ", "Doe", "."]
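An alternative sketch that sidesteps zero-width splits altogether: instead of splitting between tokens, match the tokens themselves with Matcher.find(). This is a different technique from the lookbehind split above, and the class and method names are illustrative. It assumes that a word is a run of letters or digits and that every other character, including each whitespace character, is a token of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceTokens {
    // A run of letters/digits, OR one whitespace character, OR one other character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|\\s|[^\\p{L}\\p{N}\\s]");

    // Hypothetical helper: returns words, punctuation and whitespace in order.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, !,  , My,  , name,  , is,  , John,  , Doe, .]
        System.out.println(tokenize("Hello! My name is John Doe."));
    }
}
```

Because the three alternatives together cover every possible character, nothing from the input is lost, and joining the tokens reproduces the original sentence.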

How to get numbers from string using regex? [closed]

I have this string
[23,22,17][17,2][23][3,29][][10,43,6][7][32,17,6][][][23,49,12][14,40,15][34,41,32][4,7,19][9,27][17][31,36,45][][32][40,27,25]
which I obtained from JSON and saved into an ArrayList like this:
ArrayList<?> listAdress=(ArrayList<?>)jobj.get("adress");
I want to take only the numbers and save the numbers inside each pair of brackets into a separate vector, like this:
v[]={23,22,17}
v[]={17,2}
I tried to get only the numbers, but I don't know how to take the numbers up to the next ].
Does anyone know how to do this?
Here is the regex you'll need for your problem:
(\d*,*)*
And here is the Java method that gets the arrays of numbers:
public static List<String []> getNumberArrays (String toBeProcessed){
List<String[]> listOfArrays = new ArrayList<String[]>();
Pattern p = Pattern.compile("(\\d*,*)*");
Matcher m = p.matcher(toBeProcessed);
while(m.find()){
String[] a ;
a =m.group(0).split(",");
// skip matches with fewer than two numbers (this drops empty matches, but also single-number groups such as [23])
if(a.length>=2)
listOfArrays.add(a);
}
return listOfArrays;
}
Test code :
String x = "[23,22,17][17,2][23][3,29][][10,43,6][7][32,17,6][][][23,49,12][14,40,15][34,41,32][4,7,19][9,27][17][31,36,45][][32][40,27,25]" ;
List<String[]> listOfArrays = new ArrayList<String[]>();
listOfArrays = getNumberArrays(x);
for(String[] a :listOfArrays){
System.out.println(Arrays.toString(a));
}
Output :
[23, 22, 17]
[17, 2]
[3, 29]
[10, 43, 6]
[32, 17, 6]
[23, 49, 12]
[14, 40, 15]
[34, 41, 32]
[4, 7, 19]
[9, 27]
[31, 36, 45]
[40, 27, 25]
What about this:
public static void main(String[] args) {
String testStr = "[23,22,17][17,2][23][3,29][][10,43,6][7][32,17,6][][][23,49,12][14,40,15][34,41,32][4,7,19][9,27][17][31,36,45][][32][40,27,25]";
ArrayList<String[]> result = new ArrayList<>();
String[] resTmp = testStr.substring(1, testStr.length() - 1).split("\\]\\["); // Strip the outer brackets, then split the input into vectors
for (String vecDef: resTmp) // Then split each vector into a String[]
result.add(vecDef.split(","));
for (String[] s : result) { // result = ArrayList with an element for each vector
for (String ss : s) // Each element is an array of Strings each being a number
System.out.print(ss + " ");
System.out.println();
}
}
I know you asked for a Regex but I'm not sure it's the only or the best way to go for such a simple parsing.
Here is a quick (and not so safe) piece of code:
public class HelloWorld{
public static void main(String []args){
String input = "[23,22,17][17,2][23][3,29][][10,43,6][7][32,17,6][][][23,49,12][14,40,15][34,41,32][4,7,19][9,27][17][31,36,45][][32][40,27,25]";
input = input.substring(1, input.length()-1);
String[] vectors = input.split("\\]\\[");
for(String vector : vectors)
{
System.out.println(String.format("\"%s\"", vector));
}
}
}
Output:
"23,22,17"
"17,2"
"23"
"3,29"
""
"10,43,6"
"7"
"32,17,6"
""
""
"23,49,12"
"14,40,15"
"34,41,32"
"4,7,19"
"9,27"
"17"
"31,36,45"
""
"32"
"40,27,25"
The thing is: you have to make sure that the string provided as an input is always well formatted (beginning with a [, ending with a ], and made of segments beginning with [ and ending with ]). Yet it's almost the same story with regular expressions (invalid input = no outputs, or partial outputs).
Once you have your strings with numbers separated by commas, the rest of the job is easy (you can split again and then parse to Integers).
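That last step (split again, then parse to integers) could be sketched as follows. The class and method names are made up for illustration, and the code assumes well-formed input that starts with [ and ends with ], as discussed above. An empty pair of brackets becomes an empty int array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BracketGroups {
    // Hypothetical helper: "[23,22,17][17,2][]" -> {23,22,17}, {17,2}, {}
    static List<int[]> parseGroups(String input) {
        List<int[]> groups = new ArrayList<>();
        if (input.length() < 2) {
            return groups; // nothing to parse
        }
        // Strip the outer brackets, then split at every "][" boundary.
        // The limit -1 keeps trailing empty groups such as a final "[]".
        String inner = input.substring(1, input.length() - 1);
        for (String group : inner.split("\\]\\[", -1)) {
            if (group.isEmpty()) {
                groups.add(new int[0]);
                continue;
            }
            String[] parts = group.split(",");
            int[] numbers = new int[parts.length];
            for (int i = 0; i < parts.length; i++) {
                numbers[i] = Integer.parseInt(parts[i].trim());
            }
            groups.add(numbers);
        }
        return groups;
    }

    public static void main(String[] args) {
        for (int[] v : parseGroups("[23,22,17][17,2][23][]")) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```

Integer.parseInt throws a NumberFormatException on malformed numbers, so invalid input fails loudly instead of producing partial output.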
public void importarCorreos() throws Exception{
@SuppressWarnings("deprecation")
ClientRequest cr = new ClientRequest("http://di002.edv.uniovi.es/~delacal/tew/1415/practica02/servicio_correos.php");
@SuppressWarnings("deprecation")
String result = cr.get(String.class).getEntity(String.class);
CorreosService service = Factories.services.createCorreosService();
// Print the whole JSON stream received, as a string.
System.out.println(result);
// Parse the JSON text and convert it to SIMPLE-JSON format
Object obj=JSONValue.parse(result);
JSONArray correos = (JSONArray)obj;
ListIterator li = correos.listIterator();
while(li.hasNext()){
JSONObject jobj =(JSONObject) li.next();
Correo c = new Correo();
c.setFechaHora( Long.parseLong(jobj.get("fechahora").toString()));
c.setAsunto(jobj.get("asunto").toString());
c.setCuerpo(jobj.get("cuerpo").toString());
c.setCarpeta( Integer.parseInt(jobj.get("carpeta").toString()));
c.setLogin_user(usuario.getLogin());
ArrayList<?> listaDestinatarios=(ArrayList<?>)jobj.get("destinatarios");
service.saveCorreo(c);
}
}
This is my function. I obtain a JSON document with mails from this URL and create a new mail from its fields. One of the fields of the Mails class is mail_contacts, where the addresses of each contact should be saved as a vector such as [1,2,3]; these are the ids of the addresses.
So how can I get the numbers inside each [ ] and save them into the mail_contacts field, which is an array?
I would like to save it like this:
c.setMailAdress(Here i want an array with the numbers from each [])
@ulix
OK, this gives the output that I want:
00:53:20,413 INFO [stdout] (default task-6) 23 22 17
00:53:20,414 INFO [stdout] (default task-6) 17 2
00:53:20,414 INFO [stdout] (default task-6) 23
00:53:20,416 INFO [stdout] (default task-6) 3 29
00:53:20,416 INFO [stdout] (default task-6)
00:53:20,417 INFO [stdout] (default task-6) 10 43 6
But I want to save each group from the string into an array of int, like int v[]={23,22,17}.

Picking apart a string and replacing it

I have been picking my brain lately and can't seem to figure out how to pull the "text" from this string and replace the found pattern with those word(s).
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
sb is the string that contains a few occurrences of these patterns that start with [{ and end with }].
[{ md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}}]
gets returned as
md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}
Notice the lack of [{ and }]. I manage to find the above pattern, but how would I find the words set and Books and then replace the originally found pattern with only those words? I can check whether the match contains a " via
while (matcher.find()) {
matcher.group(1).contains("\"");
but I really just need some ideas about how to go about doing this.
Is this what you are looking for (answer based on your first comment)?
its actually fairly large.. but goes along the lines of "hello my name is, etc, etc, etc, [{ md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "set", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, l : "Books", v : "ETBO"}}] , some more text here, and some more" -> the [{ }] parts should be replaced with the text inside of them in this case set, books, etbo... resulting in a final string of "hello my name is, etc, etc, etc, set set Books ETBO , some more text here, and some more"
// text from your comment
String sb = "hello my name is, etc, etc, etc, [{ md : "
+ "{o : \"set\", et : _LU.et.v.v }, d : {t : "
+ "_LU.el.searchtype, l : _LU[_LU.el.searchtype].nfts.l, "
+ "v : _LU[_LU.el.searchtype].nfts.v}}, { md : {o : "
+ "\"set\", et : _LU.et.v.v }, d : {t : _LU.el.topicgroup, "
+ "l : \"Books\", v : \"ETBO\"}}] , "
+ "some more text here, and some more";
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
// pattern that finds words between quotes
Pattern searchWordsInQuotes = Pattern.compile("\"(.+?)\"");
// here I will collect the quoted words found between [{ and }],
// separated by one space
StringBuilder words = new StringBuilder();
// buffer used while replacing each [{ xxx }] part with the words found in xxx
StringBuffer output = new StringBuffer();
while (matcher.find()) { // looking for [{ xxx }]
words.setLength(0); // clear the words collected for the previous match
// now I search for words in quotes inside [{ xxx }]
Matcher m = searchWordsInQuotes.matcher(matcher.group());
while (m.find())
words.append(m.group(1)).append(" ");
// trim removes the last space; quoteReplacement keeps any $ or \ in the words literal
matcher.appendReplacement(output, Matcher.quoteReplacement(words.toString().trim()));
}
//we also need to append last part of String that wasn't used in matcher
matcher.appendTail(output);
System.out.println(output);
Output:
hello my name is, etc, etc, etc, set set Books ETBO , some more text here, and some more
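On Java 9 or later, the same two-level replacement can be sketched more compactly with Matcher.replaceAll(Function) and Matcher.results(). The class and method names below are made up for illustration; the two patterns are the ones used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWords {
    private static final Pattern BLOCK = Pattern.compile("\\[\\{(.+?)\\}\\]");
    private static final Pattern QUOTED = Pattern.compile("\"(.+?)\"");

    // Hypothetical helper: replaces each [{ ... }] block with the quoted words inside it.
    static String collapse(String text) {
        return BLOCK.matcher(text).replaceAll(block -> {
            // Collect every quoted word of this block, separated by single spaces.
            String words = QUOTED.matcher(block.group(1))
                    .results()
                    .map(m -> m.group(1))
                    .collect(Collectors.joining(" "));
            // Keep '$' and '\' in the words from being read as group references.
            return Matcher.quoteReplacement(words);
        });
    }

    public static void main(String[] args) {
        // prints: a set Books b
        System.out.println(collapse("a [{ x : \"set\", y : {l : \"Books\"} }] b"));
    }
}
```

A block without any quoted words is replaced by an empty string in this sketch.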
OK, I think you need to do this in three passes, first time matching the section between the [{ }], and the second time going through the match doing the replace, and the third time replacing that match with the string you got from the second pass.
You already have a pattern for the first match, and you'd just use it again for the third match, when you replace it with the result of the second pass.
For the second pass, you're going to need to replaceAll on the first match. Something like this:
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcher = searchPattern.matcher(sb);
while ( matcher.find() )
{
// second pass: keep only the quoted words of this match, separated by spaces
String words = matcher.group(1).replaceAll("[^\"]*\"([^\"]*)\"[^\"]*", "$1 ").trim();
// third pass: replaceFirst() returns a new string, so assign it and rebuild the matcher
sb = matcher.replaceFirst(Matcher.quoteReplacement(words));
matcher = searchPattern.matcher(sb);
}
The first pass is done by matcher.find(). The second is the replaceAll() on matcher.group(1), which keeps only the quoted words. The third is matcher.replaceFirst(), which replaces the first occurrence of [{ }] in the string; since we always work from the beginning, that is the occurrence we just found. Note that replaceFirst() resets the matcher and returns a new string rather than modifying sb, so the result has to be assigned back to sb and a fresh matcher created for the updated string. The loop ends once no [{ }] section is left.
I would point out that this is not particularly efficient. I think that you would be better off doing more of this manually rather than with regular expressions.
LATEST REVISION
An Example on how to loop over a string with multiple boundaries and replacing at each level
public static String replace(CharSequence rawText, String oldWord, String newWord, String regex) {
Pattern patt = Pattern.compile(regex);
Matcher m = patt.matcher(rawText);
StringBuffer sb = new StringBuffer(rawText.length());
while (m.find()) {
String text = m.group(1);
if(oldWord == null || oldWord.isEmpty()) {
m.appendReplacement(sb, Matcher.quoteReplacement(newWord));
} else {
if(text.matches(oldWord)) {
m.appendReplacement(sb, Matcher.quoteReplacement(newWord));
}
}
}
m.appendTail(sb);
return sb.toString();
}
public static void main(String[] args) throws Exception {
String rawText = "[{MY NAME IS \"NAME\"}]";
rawText += " bla bla bla [{I LIVE IN \"SOME RANDOM CITY\" WHERE THE PIZZA IS GREAT!}]";
rawText += " bla bla etc etc [{I LOVE \"A HOBBY\"}]";
System.out.println(rawText);
Pattern searchPattern = Pattern.compile("\\[\\{(.+?)\\}\\]");
Matcher matcherBoundary = searchPattern.matcher(rawText);
List<String> replacement = new ArrayList<String>();
replacement.add("BOB");
replacement.add("LOS ANGELES");
replacement.add("PUPPIES");
int counter = 0;
while (matcherBoundary.find()) {
String result = Test.replace(matcherBoundary.group(1), null, replacement.get(counter), "\"([^\"]*)\"");
System.out.println(result);
counter++;
}
}
The output I get is:
**Raw Text**
[{MY NAME IS "NAME"}] bla bla bla [{I LIVE IN "SOME RANDOM CITY" WHERE THE PIZZA IS GREAT!}] bla bla etc etc [{I LOVE "A HOBBY"}]
**In Every Loop**
MY NAME IS BOB
I LIVE IN LOS ANGELES WHERE THE PIZZA IS GREAT!
I LOVE PUPPIES
