String split fails due to unwanted delimiter


This is the string I need to split so that I can put it into a map as key-value pairs:
"jti":"4ef61081-e2e0-40e4-a9ad-8f2bf33f8923","exp":1525357546,"nbf":0,"iat":1525271146,"iss":"https://dev.open-sunbird.org/auth/realms/sunbird","aud":"admin-cli"
I tried with
String[] parts = body.split(":|,");
The problem with this approach is the ":" in the https link. The output looks like this:
--"jti"--"4ef61081-e2e0-40e4-a9ad-8f2bf33f8923"
--"exp"--1525357546
--"nbf"--0
--"iat"--1525271146
--"iss"--"https
--//dev.open-sunbird.org/auth/realms/sunbird"--"aud"
Any lead on the exact regex to solve the issue would be appreciated. (Off the top of my head, we could check that every split word either both starts and ends with " or does neither. But I feel that is a naive approach, even if it can be done.)

No need to get fancy with regex. There are a couple options.
These are clearly the claims (attributes) of a JWT. Use a library to parse the JWT instead of parsing the string this way.
Just split first by commas, and then by the FIRST colon. That should give you what you want without trying to respect the position of the quotes.
It's JSON, so use a JSON parser.
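The "split by commas, then by the first colon" suggestion can be sketched roughly as follows. This is only an illustration: the class and method names (ClaimSplitter, toMap, stripQuotes) are made up for this example, and it assumes that no value contains a comma, which holds for the sample string but not for JSON in general.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class ClaimSplitter {

    // Splits "key":value pairs on commas, then on the FIRST colon only,
    // so colons inside a value (such as "https://...") are left alone.
    static Map<String, String> toMap(String body) {
        Map<String, String> claims = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2); // limit 2 = split at the first colon only
            if (kv.length == 2) {
                claims.put(stripQuotes(kv[0]), stripQuotes(kv[1]));
            }
        }
        return claims;
    }

    // Removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String s) {
        s = s.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

With the string from the question, ClaimSplitter.toMap(body).get("iss") should give the full https URL. If values can contain commas or nested objects, a JSON or JWT library remains the safer route.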

Related

JSON broken when double quotes comes inside the key/value

Sample data:
{"630":{"TotalLength":"33-3/8" - 36-3/4""},"631":{"Length":"34 37 7/8"}}
We are facing a double-quote issue in a JSON response. How can we replace the double quotes that appear inside a key or value with " \" "? Java is the development platform.

This answer is assuming that you are not in control of creating this JSON-like string. If you can control that part, then you should be escaping properly there itself.
In this case, since parsing systematically is not an option as it's not a valid JSON yet, all I could suggest is to go through the various strings and see if you can find a pattern on which you can apply some logic and escape all the "s which prevent the string from being a valid JSON.
Here is probably a way to start:
All of the "s that need to be there for the string to be valid JSON are adjacent to one or more of the characters {, :, ,, and }, with or without spaces between the " and those JSON characters.
So, if you scan the JSON-like string in Java and look at every ", you leave it as it is when it sits next to any of the above characters (with or without spaces in between). If not, you replace that " with a \".
Note that the above method may or may not work depending on the data in question. What I mean to convey is an approach that you may find useful if there's absolutely no way for the string to be escaped during its creation, and if these strings follow a strict pattern with respect to the unescaped "s.
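A minimal sketch of that heuristic, under stated assumptions: a " counts as structural when the nearest non-space character before it is {, : or , or the nearest non-space character after it is :, , or }. The names QuoteEscaper, escapeInnerQuotes and isStructural are invented for this example. It does not handle arrays, quotes that are already escaped, or values that themselves contain sequences like ", so treat it as a starting point only.

```java
// Hypothetical helper illustrating the heuristic; not a general JSON repair tool.
class QuoteEscaper {

    // Escapes every double quote that does not look like a JSON delimiter.
    static String escapeInnerQuotes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' && !isStructural(s, i)) {
                out.append("\\\""); // emit backslash + quote
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // A quote is treated as structural if the nearest non-space neighbour
    // before it is one of { : ,  or the nearest one after it is one of : , }
    static boolean isStructural(String s, int i) {
        int p = i - 1;
        while (p >= 0 && s.charAt(p) == ' ') p--;
        int n = i + 1;
        while (n < s.length() && s.charAt(n) == ' ') n++;
        boolean opens = p >= 0 && "{:,".indexOf(s.charAt(p)) >= 0;
        boolean closes = n < s.length() && ":,}".indexOf(s.charAt(n)) >= 0;
        return opens || closes;
    }
}
```

On the sample data, only the two quotes after 33-3/8 and 36-3/4 are escaped; all the others sit next to a JSON delimiter and are left alone.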

is it possible to use replaceAll() with wildcards

Good morning. I realize there are a ton of questions out there regarding replace() and replaceAll(), but I haven't seen this one.
What I'm looking to do is parse a string (which contains valid HTML, to a point). After I see the second instance of <p> in the string, I want to remove everything that starts with & and ends with ; until I see the next </p>.
To do the second part I was hoping to use something along the lines of s.replaceAll("&*;","")
That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;

You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find text within the <p> tags.
For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.
The "wildcard" in regular expressions that you want is the .* expression. Using your example:
String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);
This outputs This String. This is because the . represents any character, and the * means "this character 0 or more times." So .* basically means "any number of characters." However, feeding it:
"This &escape;String &anotherescape;Extended"
will probably not do what you want, and it will output This Extended. To fix this, you specify exactly what you want to look for instead of the . character. This is done using [^;], which means "any character that's not a semicolon":
String removed = ampStr.replaceAll("&[^;]*;", "");
This has performance benefits over &.*?; for non-matching strings, so I highly recommend using this version, especially since not all HTML files will contain a &abc; token and the &.*?; version can become a serious performance bottleneck as a result.
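To make the difference concrete, here is a small comparison of the greedy pattern and the negated character class on the two-entity input from above. The class name EntityRegexDemo and its method names are only for illustration.

```java
// Illustrative comparison; the class and method names are made up.
class EntityRegexDemo {

    // Greedy: .* runs to the LAST semicolon in the string.
    static String greedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    // Negated class: each match stops at the first semicolon after the &.
    static String negated(String s) {
        return s.replaceAll("&[^;]*;", "");
    }
}
```

For "This &escape;String &anotherescape;Extended", greedy(...) yields "This Extended", while negated(...) yields "This String Extended".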

The expression you want is:
s.replaceAll("&.*?;","");
But do you really want to be parsing HTML this way? You may be better off using an XML parser.
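If a real parser is not an option, the "second <p> until the next </p>" part of the question could be approximated with plain string searching, roughly as below. This is a sketch under strong assumptions: the tags appear literally as <p> and </p> (no attributes, no different casing), and the name EntityStripper is hypothetical.

```java
// Hypothetical helper; assumes literal "<p>" and "</p>" tags with no attributes.
class EntityStripper {

    // Removes &...; sequences only between the second <p> and the next </p>.
    static String stripEntitiesInSecondParagraph(String html) {
        int first = html.indexOf("<p>");
        if (first < 0) return html;
        int second = html.indexOf("<p>", first + 3);
        if (second < 0) return html;
        int end = html.indexOf("</p>", second);
        if (end < 0) end = html.length(); // no closing tag: go to end of input

        String before = html.substring(0, second);
        String target = html.substring(second, end);
        String after = html.substring(end);
        return before + target.replaceAll("&[^;]*;", "") + after;
    }
}
```

Entities in the first and third paragraphs are left untouched; only the segment between the second <p> and its </p> is rewritten.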

Java Splitting a String

I have this string
G234101,Non-Essential,ATPases,Respiration chain complexes,"Auxotrophies, carbon and",PS00017,2,IONIC HOMEOSTASIS,mitochondria.
That I have been trying to split in Java. The file is comma-delimited, but some of the strings have commas within them and I don't want them to get split up. Currently, in the above example,
"Auxotrophies, carbon and"
is getting split into two strings.
Any suggestions on how best to split this up by commas? Not all of the strings have the " ", for example the following string:
G234103,Essential,Protein Kinases,?,Cell cycle defects,PS00479,2,CELLULAR COMMUNICATION/SIGNAL TRANSDUCTION,cytoplasm.

http://opencsv.sourceforge.net/
But if you really do need to reinvent the wheel (homework), you need to use a more complicated regular expression than just "what,ever".split(","). It's not simple though. And you might be better off creating your own custom Lexer. http://en.wikipedia.org/wiki/Lexical_analysis
This isn't too hard in your case. As you process your text character by character you just need to keep track of opening and closing quotes to decide when to ignore commas and when to act on them.
Also see StreamTokenizer for a built-in configurable Lexer - you should be able to use this to meet your requirements.
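The character-by-character idea can be sketched as below. It is a simplified illustration, not a full CSV parser: it assumes quotes only ever wrap a whole field, does not handle doubled quotes ("") as an escape, and drops the quote characters from the result. The name QuoteAwareSplitter is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal splitter; a library such as opencsv covers far more cases.
class QuoteAwareSplitter {

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;           // toggle state; the quote itself is dropped
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);              // includes commas inside quotes
            }
        }
        fields.add(current.toString());         // last field has no trailing comma
        return fields;
    }
}
```

For the first line in the question this produces nine fields, with "Auxotrophies, carbon and" kept together as the fifth.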

I would think that this would be a multi-step process. First, find all the commas inside quotes in your original string and replace each with something like {comma}. You can do this with some regex. Then split the new string on the comma symbol (,). Finally, go through your list and replace each {comma} with the comma symbol (,).
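One possible rendering of those three steps, assuming the placeholder text {comma} never occurs in the real data and that quoted sections contain no embedded quotes. PlaceholderSplitter is a hypothetical name. Unlike a character scanner that drops quotes, this version keeps the surrounding quotes in the field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the mask / split / unmask steps.
class PlaceholderSplitter {
    private static final String PLACEHOLDER = "{comma}";
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String[] split(String line) {
        // Step 1: hide commas that sit inside a quoted section.
        Matcher m = QUOTED.matcher(line);
        StringBuffer masked = new StringBuffer();
        while (m.find()) {
            String hidden = m.group().replace(",", PLACEHOLDER);
            m.appendReplacement(masked, Matcher.quoteReplacement(hidden));
        }
        m.appendTail(masked);

        // Step 2: split on the remaining commas (-1 keeps trailing empty fields).
        String[] parts = masked.toString().split(",", -1);

        // Step 3: put the hidden commas back.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace(PLACEHOLDER, ",");
        }
        return parts;
    }
}
```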

Java Regex - exclude empty tags from xml

Let's say I have three XML strings:
String logToSearch = "<abc><number>123456789012</number></abc>";
String logToSearch2 = "<abc><number xsi:type=\"soapenc:string\" /></abc>";
String logToSearch3 = "<abc><number /></abc>";
I need a pattern which finds the number tag if the tag contains value, i.e. the match should be found only in the logToSearch.
I'm not saying I'm looking for the number itself, but rather that the matcher.find method should return true only for the first string.
For now I have this:
Pattern pattern = Pattern.compile("<(" + patternString + ").*?>",
Pattern.CASE_INSENSITIVE);
where patternString is simply "number". I tried "<(" + patternString + ")[^/>].*?>" instead, but it didn't work because in [^/>] each character is treated separately.
Thanks

This is absolutely the wrong way to parse XML. In fact, if you need more than just the basic example given here, there's provably no way to solve the more complex cases with regex.
Use an easy XML parser, like XOM. Now, using xpath, query for the elements and filter those without data. I can only imagine that this question is a precursor to future headaches unless you modify your approach right now.
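As an illustration of the parser-plus-XPath route, here is a sketch that uses the JDK's built-in DOM and XPath APIs rather than XOM (a substitution made only to keep the example dependency-free). The class name NonEmptyTagFinder is hypothetical. The sketch relies on the default, non-namespace-aware parser so that the unbound xsi: prefix in the second string does not cause a parse error; that is an assumption about the input, not a recommendation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK's javax.xml APIs.
class NonEmptyTagFinder {

    // Returns true if any element named tagName has non-blank text content.
    static boolean hasNonEmptyTag(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String expr = "boolean(//" + tagName + "[normalize-space(.) != ''])";
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            // Malformed input or XPath errors surface as an unchecked exception.
            throw new IllegalStateException("Could not evaluate XML", e);
        }
    }
}
```

Note that XPath name tests are case-sensitive, unlike the CASE_INSENSITIVE regex in the question.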

So a search for "<number[^/>]*>" would find the opening tag. If you want to be sure it isn't empty, try "<number[^/>]*>[^<]" or "<number[^/>]*>[0-9]"
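A quick check of the second pattern against the three strings from the question might look like this. NumberTagRegex is an invented name, and the caveat stands that an attribute value containing / (a URL, for instance) would make [^/>]* stop early and miss a tag that does have a value.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the suggested pattern.
class NumberTagRegex {
    // Opening <number ...> tag with no '/' before the '>', followed by a
    // character that is not the start of another tag.
    private static final Pattern NON_EMPTY =
            Pattern.compile("<number[^/>]*>[^<]", Pattern.CASE_INSENSITIVE);

    static boolean hasValue(String log) {
        return NON_EMPTY.matcher(log).find();
    }
}
```

Of the three sample strings, only logToSearch produces a match; the two self-closing forms fail because the / appears before the closing >.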

Java Inner Text (getTextContents()) Problem

I'm trying to do some parsing in Java and I'm using Cobra HTML Parser to get the HTML into a DOM then I'm using XPath to get the nodes I want. When I get down to the desired level I call node.getTextContents(), but this gives me a string like
"\n\n\nValue\n-\nValue\n\n\n"
Is there a built-in way to get rid of the line breaks? I would like to apply a regex like
(?:\s*([^-]+)\s*-\s*([^-]+)\s*)
on the inner text and would really prefer not to have to deal with the possible different white space symbols in between the text.
Example Input:
Value
-
Value
Thanks

You can use String.replaceAll().
String trimmed = original_string.replaceAll("\n", "");
The first argument is a regular expression: you could replace all contiguous blocks of whitespace in the original string with replaceAll("\\s+", "") for instance.

I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:
String s = node.getTextContent().replaceAll("\\s","");
If you just want to get rid of the leading/trailing whitespace, use trim().
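Removing all whitespace also deletes spaces inside the values themselves. If that matters, one alternative is to collapse each run of whitespace to a single space and then trim, as in this small sketch (WhitespaceNormalizer is a hypothetical name):

```java
// Hypothetical helper; collapses whitespace instead of deleting it.
class WhitespaceNormalizer {

    // Turns any run of spaces, tabs or line breaks into one space,
    // then strips the leading and trailing space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

For the string in the question, "\n\n\nValue\n-\nValue\n\n\n", this gives "Value - Value", which can then be split on " - " or matched with a simpler regex.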
