Split with multiple delimiters not working - java

For some reason my multi-delimiter split is not working. I hope it's just a syntax error.
This works, but I also want to split when it finds the end date:
String dateList[] = text.split("(?="+StartDate+")");
But this does not. Am I missing something?
String dateList[] = text.split("[(?="+StartDate+")(?="+EndDate+")]");

You cannot use "lookarounds" inside a custom character class: they would simply be interpreted as literal characters of the class (and the pattern may not even compile if a malformed range is detected, e.g. with a dangling - character).
Use the | operator to alternate between StartDate and EndDate.
Something like:
String dateList[] = text.split("(?="+StartDate+"|"+EndDate+")");
Notes
You also may want to invoke Pattern.quote on your start and end date values, in case they contain reserved characters.
The Java naming convention for variables is camelBack (startDate), not CamelCase (StartDate).
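To make the two notes above concrete, here is a minimal sketch that combines the alternation inside the lookahead with Pattern.quote. The class name DateSplitDemo, the method name splitBeforeMarkers and the sample markers are made up for illustration; they are not from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DateSplitDemo {
    // Splits the text just before every occurrence of either marker.
    // Pattern.quote makes regex metacharacters in the markers literal.
    static String[] splitBeforeMarkers(String text, String startDate, String endDate) {
        String regex = "(?=" + Pattern.quote(startDate) + "|" + Pattern.quote(endDate) + ")";
        return text.split(regex);
    }

    public static void main(String[] args) {
        // Hypothetical markers; the real start/end date values come from your data.
        String[] parts = splitBeforeMarkers("x Start: a End: b", "Start:", "End:");
        System.out.println(Arrays.toString(parts));
    }
}
```

Because the lookahead is zero-width, each marker stays at the beginning of the chunk it introduces instead of being consumed by the split.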

Related

Conversion of String into Map in Java With Some Special Characters

I am trying to convert a String into a Java Map, but I am getting the following exception:
com.google.gson.stream.MalformedJsonException
This is the string that I am trying to Map.
String myString = "{name=Nikhil Gupta,age=23,location=234Niwas#res=34}";
Map innerMap = new Gson().fromJson(myString,Map.class);
I understood the main problem here that because of these special characters I am getting this error.
If I remove those spaces and special characters then it will work fine.
Is there any way to do this without removing those spaces and special characters?
The approach used so far.
Wrapped those strings with special characters inside a single quote.
String myString = "{name='Nikhil Gupta',age='23',location='234Niwas#res=34'}";
But this is something that I don't want to use in a production environment as it will not work with nested structures.
Is there some genuine way to approach this in java?
I understood the main problem here that because of these special characters I am getting this error.
No, it's not because of "special characters" (whatever that means exactly).
{name=Nikhil Gupta,age=23,location=234Niwas#res=34}
The string you're trying to parse is simply not in JSON format, but in some other format that superficially resembles JSON. Your fixes by enclosing values in single quotes still don't make it JSON.
If it were valid JSON, it would look like this:
{"name":"Nikhil Gupta","age":23,"location":"234Niwas#res=34"}
Notable differences with your original:
Keys must be enclosed in double quotes
String values must be enclosed in double quotes (numeric values need not be)
Key and value must be separated by a colon : instead of an equals sign =
Ways to solve this:
Use actual JSON format; see json.org for the specification
If you can't make it real JSON and you must absolutely use the format you are using, then you need to write your own parser for this custom non-JSON format
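If you do end up writing your own parser, a minimal sketch for the flat case could look like the following. It rests on assumptions that are not stated in the question: values never contain a comma, there is no nesting, and every value is kept as a String. The class name KeyValueParser is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Parses "{k1=v1,k2=v2}" into an insertion-ordered map.
    // Assumption: values contain no commas and the structure is not nested.
    static Map<String, String> parse(String input) {
        String body = input.trim();
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> result = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            // Split on the FIRST '=' only, so a value such as "234Niwas#res=34" stays intact.
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + pair);
            }
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("{name=Nikhil Gupta,age=23,location=234Niwas#res=34}"));
    }
}
```

Nested structures or values containing commas would need a real tokenizer, which is one more reason to switch to actual JSON if you can.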

How to validate String in Java by matches?

To validate String in Java I can use String.matches(). I would like to validate a simple string "*.txt" where "*" means anything. Input e.g. test.txt is correct, but test.tt is not correct, because of ".tt". I tried to use matches("[*].txt"), but it doesn't work. How can I improve this matches? Thanks.
Do not use code you don't understand!
For your simple problem you could totally avoid using a regular expression and just use
yourString.endsWith(".txt")
and if you want to perform this comparison case insensitive (i.e. allow ".TXT" or ".tXt") use
yourString.toLowerCase().endsWith(".txt")
If you want to learn more about regular expressions in Java, I'd recommend a tutorial. For example this one.
You may try this for txt files:
"file.txt".matches("^.*[.]txt$")
Basically ^ means the start of your string. .* means match anything greedy, hence as much as you can get to make the expression match. And [.] means match the dot character. The suffix txt is just the txt text itself. And finally $ is the anchor for the end of the string, which ensures that the string does not contain anything more.
Use .+ rather than .*: it matches one or more of any character, so it rejects inputs that consist only of .txt
matches(".+[.]txt")
FYI: [.] simply matches with the dot character.
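To compare the suggestions above side by side, here is a small sketch. The class and method names are made up for illustration; the expressions are the ones from the answers (with Locale.ROOT added to toLowerCase as a precaution).

```java
import java.util.Locale;

class TxtNameCheck {
    // Case-insensitive suffix check, no regex involved.
    static boolean endsWithTxt(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    // Regex check: at least one character, then a literal dot, then "txt".
    // String.matches always tests the whole string, so no ^ or $ is needed.
    static boolean matchesTxt(String name) {
        return name.matches(".+[.]txt");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"test.txt", "test.tt", ".txt", "TEST.TXT"}) {
            System.out.println(name + " endsWith=" + endsWithTxt(name) + " matches=" + matchesTxt(name));
        }
    }
}
```

Note that the two checks are not equivalent: the endsWith version accepts a bare ".txt" and upper-case extensions, while the regex version rejects both.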

Java Regex to find a string which starts with SDPCDR_

What regular expression should I use to match a string which starts with SDPCDR,
contains a date in the format 20120826, and ends with .asn?
An example string is SDPCDR_delsdp3a_6091_20120826-042451.asn
This would work:
^SDPCDR\w+(\d{8})-\w+\.asn$
"^SDPCDR.*\\d{8}.*\\.asn$"
Pretty generous on the date part, but the string is probably specific enough already to avoid false matches. If you're looking for a substring rather than trying to match the entire string, instead use
"SDPCDR.*?\\d{8}.*?\\.asn"
"SDPCDR_[a-z0-9_]*[0-9]{8}-[0-9]+\\.asn"
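As a quick way to try the whole-string pattern and the substring pattern quoted earlier, here is a small sketch. The class and method names are made up for illustration; only the two pattern strings come from the answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SdpcdrMatchDemo {
    // Whole-string pattern: starts with SDPCDR, has 8 consecutive digits, ends with .asn
    static final Pattern WHOLE = Pattern.compile("^SDPCDR.*\\d{8}.*\\.asn$");
    // Substring pattern: lazy quantifiers keep the match as short as possible.
    static final Pattern PART = Pattern.compile("SDPCDR.*?\\d{8}.*?\\.asn");

    static boolean isCdrFileName(String name) {
        return WHOLE.matcher(name).matches();
    }

    // Returns the first file name found inside a longer text, or null if there is none.
    static String findCdrFileName(String text) {
        Matcher m = PART.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String example = "SDPCDR_delsdp3a_6091_20120826-042451.asn";
        System.out.println(isCdrFileName(example));
        System.out.println(findCdrFileName("see " + example + " for details"));
    }
}
```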

How to build a regular expression to get the value between two single quotes, and if there is no single quote, extract between commas

The problem I face:
- I have an input string, a SQL statement, that I need to parse.
- I need to extract the values to be inserted, based on the column names specified.
- I can extract a value that is wrapped in two single quotes, but:
-- What about values that have no single quotes around them (like an integer or double)?
-- What if the value itself already contains single quotes (like 'James''s dictionary')?
Below is the sample input string:
INSERT INTO LJS1_DX (base, doc, key1, key2, no, sq, eq, ln, en, date, line)
VALUES ('GET','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary')
The Java code I have below matches only values between two single quotes:
Pattern p = Pattern.compile("'.*?'");
String columnValues = "'GET0','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary'";
Matcher m = p.matcher(columnValues); // get a matcher object
while (m.find()) {
    // log each quoted value that was found
    logger.trace(m.group());
}
I would appreciate any guidance or sample code for this question.
Thank you!!
I agree with gnibbler that this is a job for a csv parser.
A regex that works on your example would be
'(?:''|[^'])*'|[^',\s]+
which looks challenging to debug and maintain, doesn't it?
Explanation:
'        # First alternative: match an "opening" '
(?:      # followed by either...
  ''     # two ' in a row (escaped ')
|        # or...
  [^']   # any character that is not a '
)*       # zero or more times,
'        # then match a "closing" '
|        # or (second alternative):
[^',\s]+ # match any run of characters except ', comma or whitespace
It also works if there is whitespace around the values/commas (and will leave that out of the match).
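For reference, this is roughly how the regex explained above could be applied in Java. It is only a sketch: the class name SqlValueExtractor is made up, and stripping the outer quotes and turning '' back into ' is an extra step I added, not something the regex does by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlValueExtractor {
    // Quoted value (with '' as an escaped quote) OR a run of non-quote, non-comma, non-space characters.
    private static final Pattern VALUE = Pattern.compile("'(?:''|[^'])*'|[^',\\s]+");

    static List<String> extract(String columnValues) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(columnValues);
        while (m.find()) {
            String token = m.group();
            if (token.startsWith("'")) {
                // Added step: drop the surrounding quotes and unescape doubled quotes.
                token = token.substring(1, token.length() - 1).replace("''", "'");
            }
            values.add(token);
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "'GET','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary'";
        for (String value : extract(input)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

This still only handles the VALUES list of a simple statement; anything beyond that is where a proper parser pays off.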
Regexes are not really suitable for this. You will always find cases that fail.
A CSV parser such as opencsv is probably a better option.
In general, when you need to parse complex languages, regexps are not the best tool - there's too much context to make sense of. So, if reading XML use an XML parser, if reading C code, use a C language parser and if reading SQL ...
There's a Java SQL parser here, I would use something like this.
For other languages it may be best to use a "YACC"-like parser. For example JACK
Instead, you can get all the values by taking the substring after the VALUES keyword, and get the column names the same way. You will then have two comma-separated strings, which can be converted to arrays: one for the names and one for the values. You can then check which column has which value.
Hope this helps.
I think Tim had the right idea; it just needs to be implemented more efficiently. Here's a much more efficient version:
'[^']*+(?:''[^']*+)*+'|[^',\s]++
It uses Friedl's "unrolled loop" technique to avoid excessive reliance on alternations that match one or two characters at a time (I think that's what did you in, Tim), plus possessive quantifiers throughout.
Regular expressions are not easy to use with this (but everything is possible).
I would suggest parsing it yourself, or use a library to do the parsing. By writing the parser yourself you are certain that it works exactly as you need it to.

Use RegExp to replace XML tags with whitespaces (in the length of the tags)

I need to strip all xml tags from an xml document, but keep the space the tags occupy, so that the textual content stays at the same offsets as in the xml. This needs to be done in Java, and I thought RegExp would be the way to go, but I have found no simple way to get the length of the tags that match my regular expression.
Basically what I want is this:
Pattern p = Pattern.compile("<[^>]+>[^<]*]+>");
Matcher m = p.matcher(stringWithXMLContent);
String strippedContent = m.replaceAll("THIS IS A STRING OF WHITESPACES IN THE LENGTH OF THE MATCHED TAG");
Hope somebody can help me to do this in a simple way!
Since < and > characters always surround starting and ending tags in XML, this may be simpler with a straightforward state machine. Simply loop over all characters (in some writeable form - not stored in a string), and if you encounter a <, switch on the "replacement mode" and start replacing all characters with spaces until you encounter a >. (Be sure to replace both the initial < and the closing >.)
If you care about layout, you may wish to avoid replacing tab characters and/or newline characters. If all you care about is overall string length, that obviously won't matter.
Edit: If you want to support comments, processing instructions and/or CDATA sections, you'll need to explicitly recognize these too; also, attribute values unfortunately can include > as well; all this means a full-fledged implementation will be more complex than you'd like.
A regular transducer would be perfect for this task; but unfortunately those aren't exactly commonly found in class libraries...
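A minimal sketch of the state machine described above, assuming the simple case only: no comments, processing instructions or CDATA sections, and no > inside attribute values. The class name TagBlanker is made up for illustration. Newlines and tabs inside tags are left alone so the layout survives.

```java
class TagBlanker {
    // Replaces every character of every tag (including < and >) with a space,
    // so the remaining text keeps its original offsets.
    static String blankTags(String xml) {
        char[] chars = xml.toCharArray(); // writeable copy of the input
        boolean inTag = false;
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c == '<') {
                inTag = true; // switch on replacement mode
            }
            boolean closesTag = inTag && c == '>';
            if (inTag && c != '\n' && c != '\r' && c != '\t') {
                chars[i] = ' ';
            }
            if (closesTag) {
                inTag = false; // the > itself has already been blanked
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("[" + blankTags("<a>hi</a>") + "]");
    }
}
```

The output always has the same length as the input, which is the property the question asks for.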
Pattern p = Pattern.compile("<[^>]+>[^<]*]+>");
In the spirit of You Can't Parse XML With Regexp, you do know that's not an adequate pattern for arbitrary XML, right? (It's perfectly valid to have a > character in an attribute value, for example, not to mention other non-tag constructs.)
I have found no simple way to get the length of the tags that match my regular expression.
Instead of using replaceAll, repeatedly call find on the Matcher. You can then read start/end to get the indexes to replace, or use the appendReplacement method on a buffer. eg.
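StringBuffer b = new StringBuffer();
while (m.find()) {
    // one space per character of the matched tag keeps the text offsets unchanged
    String spaces = StringUtils.repeat(" ", m.end() - m.start());
    m.appendReplacement(b, spaces);
}
m.appendTail(b); // copy whatever follows the last match
stringWithXMLContent = b.toString();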
(StringUtils comes from Apache Commons. For more background and library-free alternatives see this question.)
Why not use an XML pull parser and simply echo everything that you want to keep as you encounter it, e.g. character content? Whenever you reach a start or end tag, work out its length from the name of the element plus any attributes that it has, and write the appropriate number of spaces.
The SAX API also has callbacks for ignorable whitespace. So you can also echo all whitespace that occurs in your document.
Maybe m.start() and m.end() can help.
m.start() => "The index of the first character matched"
m.end() => "The offset after the last character matched"
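(m.end() - m.start()) - 2 gives the number of characters between the angle brackets; m.end() - m.start() is the full length of the tag, i.e. how many spaces you need to blank out the whole tag.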
string.replaceAll("(</?[a-zA-Z]{1}>)*", "")
You can also try this. It searches for <, then / zero or one times, followed by exactly one letter (lower or upper case), followed by >, with * allowing multiple occurrences of this pattern.
:)
