Java string matching expression String Array - java

I am using Java, and I would like your opinion on how to write better code for the following task.
I have the following String value:
String testStr = "INCLUDES(ABC) EXCLUDES(ABC) EXCLUDES(ABC) INCLUDES(ABC) INCLUDES(ABC)";
I want to combine all INCLUDES statements into one INCLUDES and all EXCLUDES statements into one EXCLUDES, so the result should look like the following:
INCLUDES(ABC, ABC, ABC) EXCLUDES(ABC, ABC)

I would break the initial string into new strings using this class:
http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
and put them in an array.
Then start a new string, use the tokenizer to break out the part within the parentheses (you can set it to use ( and ) as delimiters), loop through the array, and concatenate the parts into the new string.
Be aware, though, that any stray spaces (like INCLUDES( abc )) will mess it up.

This seems like a reasonable approach:
Split testStr using the StringUtils.split method from Apache Commons Lang; use " " or null as the separator.
Create a Map<String, List<String>>. I will refer to it as theMap.
For each string in the returned array, do the following:
Split the string using "()" as the separator characters.
The returned array should have 2 elements. The first element (index 0) is the key for theMap, and the second element (index 1) is the value to add to that key's list.
Once you have processed the whole array returned from splitting testStr, build the new string: for each key in theMap, append the key followed by the elements of its list.
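The map-based steps above can be sketched roughly as follows. This is only a minimal sketch: it swaps StringUtils.split for a java.util.regex pattern so that it runs without Apache Commons Lang, and the names ClauseMerger, CLAUSE and merge are made up for illustration. A LinkedHashMap is assumed so that keywords come out in the order they first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClauseMerger {
    // KEYWORD(value): group 1 is the keyword, group 2 the text inside the parentheses.
    private static final Pattern CLAUSE = Pattern.compile("(\\w+)\\(([^)]*)\\)");

    // Hypothetical helper: groups the values by keyword and rebuilds one clause per keyword.
    static String merge(String input) {
        Map<String, List<String>> theMap = new LinkedHashMap<>();
        Matcher m = CLAUSE.matcher(input);
        while (m.find()) {
            theMap.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2).trim());
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : theMap.entrySet()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(e.getKey())
               .append('(')
               .append(String.join(", ", e.getValue()))
               .append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints INCLUDES(ABC, AC, AB) EXCLUDES(C, ABC)
        System.out.println(merge("INCLUDES(ABC) EXCLUDES(C) EXCLUDES(ABC) INCLUDES(AC) INCLUDES(AB)"));
    }
}
```

Because the pattern trims what it finds inside the parentheses, an input such as INCLUDES( abc ) should not trip it up the way a plain split on spaces would.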

I wrote a piece of code for this, but I don't know whether it's good or not.
Given your format, you can split testStr on " "; each resulting piece will look like this: INCLUDES(ABC)
Check whether the piece contains INCLUDES or EXCLUDES,
then take the text between ( and ).
Like this:
String testStr = "INCLUDES(ABC) EXCLUDES(C) EXCLUDES(ABC) INCLUDES(AC) INCLUDES(AB)";
String[] s = testStr.split(" ");
// Collect the values of each keyword, separated by ", "
StringBuilder includes = new StringBuilder();
StringBuilder excludes = new StringBuilder();
for (int i = 0; i < s.length; i++) {
    if (s[i].contains("INCLUDES")) {
        if (includes.length() > 0) includes.append(", ");
        // append the text between the parentheses
        includes.append(s[i].substring(s[i].indexOf("(") + 1, s[i].indexOf(")")));
    }
    else if (s[i].contains("EXCLUDES")) {
        if (excludes.length() > 0) excludes.append(", ");
        excludes.append(s[i].substring(s[i].indexOf("(") + 1, s[i].indexOf(")")));
    }
}
System.out.println("INCLUDES(" + includes + ")");
System.out.println("EXCLUDES(" + excludes + ")");

I have written a small utility; the result is as follows:
if text = "EXCLUDES(ABC) EXCLUDES(ABC) INCLUDES(BMG) INCLUDES(EFG) INCLUDES(IJK)";
output = EXCLUDES(ABC) EXCLUDES(ABC) INCLUDES(BMG & EFG & IJK)
My Java code follows; please take a look, and if anyone can improve it, feel free.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils; // public Apache Commons Lang class instead of the JDK-internal com.sun.xml.internal one
/**
* Created by IntelliJ IDEA.
* User: fkhan
* Date: Aug 31, 2012
* Time: 1:36:45 PM
* To change this template use File | Settings | File Templates.
*/
public class TestClass {
public static void main(String args[]) throws Exception {
//String text = "INCLUDES(ABC) EXCLUDES(ABC) EXCLUDES(ABC) INCLUDES(EFG) INCLUDES(IJK)";
String text = "EXCLUDES(ABC) EXCLUDES(ABC) INCLUDES(BMG) INCLUDES(EFG) INCLUDES(IJK)";
List<String> matchedList = findMatchPhrase("INCLUDES", text);
String query = combinePhrase(text, "INCLUDES", matchedList);
System.out.println(query);
}
/**
* This method takes the query and combines multiple phrases with "&"
* @param expression
* @param keyword
* @param matchedItemList
* @return
*/
private static String combinePhrase(String expression, String keyword, List<String> matchedItemList) {
//if only one phrase found return value
if(matchedItemList.isEmpty() || matchedItemList.size() ==1){
return expression;
}
//do not remove first match
String matchedItem = null;
for (int index = 1; index < matchedItemList.size(); index++) {
matchedItem = matchedItemList.get(index);
//remove match items from string other then first match
expression = expression.replace(matchedItem, "");
}
StringBuffer textBuffer = new StringBuffer(expression);
//combine other matched strings in first matched item
StringBuffer combineStrBuf = new StringBuffer();
if (matchedItemList.size() > 1) {
for (int index = 1; index < matchedItemList.size(); index++) {
String str = matchedItemList.get(index);
combineStrBuf.append((parseValue(keyword, str)));
combineStrBuf.append(" & ");
}
combineStrBuf.delete(combineStrBuf.lastIndexOf(" & "), combineStrBuf.length());
}
// Inject created phrase into first phrase
//append in existing phrase
return injectInPhrase(keyword, textBuffer, combineStrBuf.toString());
}
/**
*
* @param keyword
* @param textBuffer
* @param injectStr
*/
private static String injectInPhrase(String keyword, StringBuffer textBuffer, String injectStr) {
Matcher matcher = getMatcher(textBuffer.toString());
while (matcher.find()) {
String subStr = matcher.group();
if (subStr.startsWith(keyword)) {
textBuffer.insert(matcher.end()-1, " & ".concat(injectStr));
break;
}
}
return textBuffer.toString();
}
/**
* @param expression
* @param keyword
* @return
*/
private static String parseValue(String keyword, String expression) {
String parsStr = "";
if (expression.indexOf(keyword) > -1) {
parsStr = expression.replace(keyword, "").replace("(", "").replace(")", "");
}
return parsStr;
}
/**
* This method creates a matcher object
* and returns it for further processing
* @param expression
* @return
*/
private static Matcher getMatcher(String expression){
String patternString = "(\\w+)\\((.*?)\\)";
Pattern pattern = Pattern.compile(patternString);
return pattern.matcher(expression);
}
/**
* This method finds the items matched by keyword
* and returns them as a list
* @param keyword
* @param expression
* @return
*/
private static List<String> findMatchPhrase(String keyword, String expression) {
List<String> matchList = new ArrayList<String>(3);
keyword = StringUtils.capitalize(keyword);
Matcher matcher = getMatcher(expression);
while (matcher.find()) {
String subStr = matcher.group();
if (subStr.startsWith(keyword)) {
matchList.add(subStr);
}
}
return matchList;
}
}

Related

How to properly import libphonenumber into a Java project?

I'm trying to use the PhoneNumberMatcher from the libphonenumber library. After adding the jar file to my project and setting the build path, I was able to import classes into my project:
import com.google.i18n.phonenumbers.*;
Inside the lib, there is a class named PhoneNumberMatcher.class. I've been trying to reach it, but this class name isn't included in the suggestions I normally get when I press Ctrl + Space.
If I insist and type the class name, Eclipse underlines the name and shows the message "The type PhoneNumberMatcher is not visible".
I recently noticed that the class has a small blue flag icon in the project explorer.
It's not the only one with such a blue flag, so I tried the other classes and found that none of the classes with this blue flag are accessible. That's why I think these classes are probably private, or for internal use of the lib.
I'm trying to create a tool to extract phone numbers from a text, and I read that this lib is made exactly for that.
How do I use the PhoneNumberMatcher class in my Java project, please?
CharSequence text = "Call me at +1 425 882-8080 for details.";
String country = "US";
PhoneNumberUtil util = PhoneNumberUtil.getInstance();
// Find the first phone number match:
PhoneNumberMatch m = util.findNumbers(text, country).iterator().next();
// rawString() contains the phone number as it appears in the text.
"+1 425 882-8080".equals(m.rawString());
// start() and end() define the range of the matched subsequence.
CharSequence subsequence = text.subSequence(m.start(), m.end());
"+1 425 882-8080".contentEquals(subsequence);
// number() returns the same result as PhoneNumberUtil.parse()
// invoked on rawString().
util.parse(m.rawString(), country).equals(m.number());
https://javadoc.io/doc/com.googlecode.libphonenumber/libphonenumber
Thank you for your answer, Chana.
I actually was able to use the library, but then I realized that it was too complicated for me to use, so I wrote my own code to extract German phone numbers, IBANs, postcodes and money amounts and then classify them:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class NumberExtractorAndClassifier {
public static final String REGEXPhoneNumbers = "(0|0049\\s?|\\+49\\s?|\\(\\+49\\)\\s?|\\(0\\)\\s?){1}([0-9]{2,4})([ \\-\\/]?[0-9]{1,10})+";
public static final String REGEXIbanNumbers = "([A-Z][ -]?){2}([0-9]([ -]?)*){12,30}";
public static final String REGEXAmounts = "(\\d+([.,]?)\\d{0,}([.,]?)\\d*(\\s*)(\\$|€|EUR|USD|Euro|Dollar))|((\\$|€|EUR|USD|Euro|Dollar)(\\s*)\\d+([.,]?\\d{0,}([.,]?)\\d*))";
public static final String REGEXPostCode = "\\b((?:0[1-46-9]\\d{3})|(?:[1-357-9]\\d{4})|(?:[4][0-24-9]\\d{3})|(?:[6][013-9]\\d{3}))\\b";
public static String TextToAnalyze;
public static String CopyOfText = "";
public static int ExaminatedIndex = 0;
public static List<String> ExtractedPhoneNumbers = new ArrayList<String>();
public static List<String> ExtractedIbanNumbers = new ArrayList<String>();
public static List<String> ExtractedAmounts = new ArrayList<String>();
public static List<String> ExtractedPostCodes = new ArrayList<String>();
public static final String EMPTY_STRING = "";
/**
* @brief Constructor: initializes the needed variables and calls the different methods in order to find and classify the numbers
*
* @param Text the input text that needs to be analyzed
*/
public NumberExtractorAndClassifier(String Text) {
TextToAnalyze = Text; //- This variable is going to have our complete text
CopyOfText = Text; //- This variable is going to have the missing text to analyze
//- We extract the amounts first in order to do not confuse them later with a phone number, IBAN or post-code
ExtractedAmounts = ExtractAmounts();
for (String Amount : ExtractedAmounts)
{
//- We cut them out of the text in order to do not confuse them later with a IBAN, phone number or post-code
String safeToUseInReplaceAllString = Pattern.quote(Amount);
CopyOfText = CopyOfText.replaceAll(safeToUseInReplaceAllString, "");
System.out.println("Found amount -------> " + Amount);
}
//- We extract the IBAN secondly in order to do not confuse them later with a phone number or post-code
ExtractedIbanNumbers = ExtracIbanNumbers();
for (String Iban : ExtractedIbanNumbers)
{
//- We cut them out of the text in order to do not confuse them later with a phone number, or post-code
String safeToUseInReplaceAllString = Pattern.quote(Iban);
CopyOfText = CopyOfText.replaceAll(safeToUseInReplaceAllString, "");
System.out.println("Found IBAN ---------> " + Iban);
}
//- We extract the phone numbers thirdly in order to do not confuse them later with a post-code
ExtractedPhoneNumbers = ExtractPhoneNumbers();
for( String number : ExtractedPhoneNumbers )
{
//- We cut them out of the text in order to do not confuse them later with a post-code
String safeToUseInReplaceAllString = Pattern.quote(number);
CopyOfText = CopyOfText.replaceAll(safeToUseInReplaceAllString, "");
System.out.println("Found number -------> " + number);
}
ExtractedPostCodes = ExtractPostCodes();
for( String PostCode : ExtractedPostCodes)
{
System.out.println("Found post code ----> " + PostCode);
}
}
/**
* @brief Method extracts phone numbers out of the text with help of REGEXPhoneNumbers
*
* @return List of strings with all the found numbers.
*/
public static List<String> ExtractPhoneNumbers(){
//Initializing our variables
List<String> FoundNumbers = new ArrayList<String>();
boolean LineContainsNumber = true;
Pattern pattern = Pattern.compile(REGEXPhoneNumbers);
Matcher matcher = pattern.matcher(CopyOfText);
while (LineContainsNumber) {
if (matcher.find()) {
String NumberFoundByTheMatcher = matcher.group(0);
FoundNumbers.add(NumberFoundByTheMatcher);
}
else{LineContainsNumber = false;}
}
return FoundNumbers;
}
/**
* @brief Method extracts IBAN numbers out of the text with help of REGEXIbanNumbers
*
* @return List of strings with all the found IBAN numbers.
*/
public static List<String> ExtracIbanNumbers(){
//Initializing our variables
List<String> FoundIbans = new ArrayList<String>();
boolean LineContainsIban = true;
Pattern pattern = Pattern.compile(REGEXIbanNumbers);
Matcher matcher = pattern.matcher(CopyOfText);
while (LineContainsIban) {
if (matcher.find()) {
String NumberFoundByTheMatcher = matcher.group(0);
FoundIbans.add(NumberFoundByTheMatcher);
}
else{LineContainsIban = false;}
}
return FoundIbans;
}
/**
* @brief Method extracts amounts out of the text with help of REGEXAmounts
*
* @return List of strings with all the found amounts.
*/
public static List<String> ExtractAmounts(){
//Initializing our variables
List<String> FoundAmounts = new ArrayList<String>();
boolean LineContainsAmount = true;
Pattern pattern = Pattern.compile(REGEXAmounts);
Matcher matcher = pattern.matcher(CopyOfText);
while (LineContainsAmount) {
if (matcher.find()) {
String NumberFoundByTheMatcher = matcher.group(0);
FoundAmounts.add(NumberFoundByTheMatcher);
}
else{LineContainsAmount = false;}
}
return FoundAmounts;
}
/**
* @brief Method extracts post codes out of the text with help of REGEXPostCode
*
* @return List of strings with all the found post codes.
*/
public static List<String> ExtractPostCodes(){
List<String> FoundPostCodes = new ArrayList<String>();
boolean LineContainsPostCode = true;
Pattern pattern = Pattern.compile(REGEXPostCode);
Matcher matcher = pattern.matcher(CopyOfText);
while(LineContainsPostCode) {
if(matcher.find()) {
String PostCodeFoundByMatcher = matcher.group(0);
FoundPostCodes.add(PostCodeFoundByMatcher);
}
else {
LineContainsPostCode = false;
}
}
return FoundPostCodes;
}
}
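The four Extract... methods above differ only in the regex they compile, so they could probably share one helper. A minimal sketch of that idea, assuming made-up names (RegexFindAll, findAll) that are not part of the code above; the post-code pattern in main is copied from REGEXPostCode, and the sample sentence is invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFindAll {
    // Hypothetical helper: returns every substring of text that matches regex, in order.
    static List<String> findAll(String regex, CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        // find() returns false once no further match exists, so no extra flag is needed
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String postCode = "\\b((?:0[1-46-9]\\d{3})|(?:[1-357-9]\\d{4})|(?:[4][0-24-9]\\d{3})|(?:[6][013-9]\\d{3}))\\b";
        for (String code : findAll(postCode, "Offices in 10115 Berlin and 80331 Munich")) {
            System.out.println("Found post code ----> " + code);
        }
    }
}
```

Passing the text in as a parameter, rather than reading a static field such as CopyOfText, would also make each extraction step easier to test on its own.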

Java Extracting Text Between Tags and Attributes

I am trying to extract text between particular tags and attributes. For now, I have only tried tags. I am reading a ".gexf" file, which contains XML data, and saving its contents as a string. Then I try to extract the text between the "nodes" tags. Here is my code so far:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
private static String filePath = "src/babel.gexf";
public String readFile(String filePath) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(filePath));
try {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
return sb.toString();
} finally {
br.close();
}
}
public void getNodesContent(String content) throws IOException {
final Pattern pattern = Pattern.compile("<nodes>(\\w+)</nodes>", Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
public static void main(String [] args) throws IOException {
Main m = new Main();
String result = m.readFile(filePath);
m.getNodesContent(result);
}
}
With the code above, I don't get any result. When I try it with a sample string like "My string", I do get a result. Here is a link to the gexf file (it is too long to post, so I had to upload it):
https://files.fm/u/qag5ykrx
I don't think placing the entire file contents into a single string is such a great idea, but I suppose that depends on the amount of content within the file. If it's a lot of content, then I would read it in a little differently. It would have been nice to see a fictitious example of what the file contains.
I suppose you can try this little method. At its heart it uses a regular expression (RegEx) along with Pattern/Matcher to retrieve the desired substring from between tags.
It is important to read the docs that come with the method:
/**
* This method will retrieve a string contained between string tags. You
* specify what the starting and ending tags are within the startTag and
* endTag parameters. It is you who determines what the start and end tags
* are to be which can be any strings.<br><br>
*
* @param inputString (String) Any string to process.<br>
*
* @param startTag (String) The Start Tag String or String. Data content retrieved
* will be directly after this tag.<br><br>
*
* The supplied Start Tag criteria can contain a single special wildcard tag
* (~*~) providing you also place something like the closing chevron (>)
* for an HTML tag after the wildcard tag, for example:<pre>
*
* If we have a string which looks like this:
* {@code
* "<p style=\"padding-left:40px;\">Hello</p>"
* }
* (Note: to pass double quote marks in a string they must be escaped)
*
* and we want to use this method to extract the word "Hello" from between the
* two HTML tags then your Start Tag can be supplied as "<p~*~>" and of course
* your End Tag can be "</p>". The "<p~*~>" would be the same as supplying
* "<p style=\"padding-left:40px;\">". Anything between the characters <p and
* the supplied close chevron (>) is taken into consideration. This allows for
* contents extraction regardless of what HTML attributes are attached to the
* tag. The use of a wildcard tag (~*~) is also allowed in a supplied End
* Tag.</pre><br>
*
* The wildcard is used as a special tag so that strings that actually
* contain asterisks (*) can be processed as regular asterisks.<br>
*
* @param endTag (String) The End Tag or String. Data content retrieval will
* end just before this Tag is reached.<br>
*
* The supplied End Tag criteria can contain a single special wildcard tag
* (~*~) providing you also place something like the closing chevron (>)
* for an HTML tag after the wildcard tag, for example:<pre>
*
* If we have a string which looks like this:
* {@code
* "<p style=\"padding-left:40px;\">Hello</p>"
* }
* (Note: to pass double quote marks in a string they must be escaped)
*
* and we want to use this method to extract the word "Hello" from between the
* two HTML tags then your Start Tag can be supplied as "<p style=\"padding-left:40px;\">"
* and your End Tag can be "</~*~>". The "</~*~>" would be the same as supplying
* "</p>". Anything between the characters </ and the supplied close chevron (>)
* is taken into consideration. This allows for contents extraction regardless of what the
* HTML tag might be. The use of a wildcard tag (~*~) is also allowed in a supplied Start Tag.</pre><br>
*
* The wildcard is used as a special tag so that strings that actually
* contain asterisks (*) can be processed as regular asterisks.<br>
*
* @param trimFoundData (Optional - Boolean - Default is true) By default
* all retrieved data is trimmed of leading and trailing white-spaces. If
* you do not want this then supply false to this optional parameter.
*
* @return (1D String Array) If there is more than one pair of Start and End
* Tags contained within the supplied input String then each set is placed
* into the Array separately.<br>
*
* @throws IllegalArgumentException if any supplied method String argument
* is null or empty ("").
*/
public static String[] getBetweenTags(String inputString, String startTag,
String endTag, boolean... trimFoundData) {
if (inputString == null || inputString.equals("") || startTag == null ||
startTag.equals("") || endTag == null || endTag.equals("")) {
throw new IllegalArgumentException("\ngetBetweenTags() Method Error! - "
+ "A supplied method argument contains Null (\"\")!\n"
+ "Supplied Method Arguments:\n"
+ "==========================\n"
+ "inputString = \"" + inputString + "\"\n"
+ "startTag = \"" + startTag + "\"\n"
+ "endTag = \"" + endTag + "\"\n");
}
List<String> list = new ArrayList<>();
boolean trimFound = true;
if (trimFoundData.length > 0) {
trimFound = trimFoundData[0];
}
Matcher matcher;
if (startTag.contains("~*~") || endTag.contains("~*~")) {
startTag = startTag.replace("~*~", ".*?");
endTag = endTag.replace("~*~", ".*?");
Pattern pattern = Pattern.compile("(?iu)" + startTag + "(.*?)" + endTag);
matcher = pattern.matcher(inputString);
} else {
String regexString = Pattern.quote(startTag) + "(?s)(.*?)" + Pattern.quote(endTag);
Pattern pattern = Pattern.compile("(?iu)" + regexString);
matcher = pattern.matcher(inputString);
}
while (matcher.find()) {
String match = matcher.group(1);
if (trimFound) {
match = match.trim();
}
list.add(match);
}
return list.toArray(new String[list.size()]);
}
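As for why the pattern in the question finds nothing: (\\w+) only matches letters, digits and underscores, so it cannot span the whitespace, line breaks and nested markup that sit between the nodes tags, and Pattern.MULTILINE only changes how ^ and $ behave. A minimal sketch of a pattern that does span several lines, assuming made-up names (NodesContentSketch, between) and an invented two-line sample instead of the real gexf file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodesContentSketch {
    // Hypothetical helper: returns the trimmed text between <tagName ...> and </tagName>.
    static List<String> between(String content, String tagName) {
        String name = Pattern.quote(tagName);
        // DOTALL lets '.' match line breaks; (.*?) is reluctant so each tag pair is matched separately.
        Pattern pattern = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + ">", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(content);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        // Invented sample, not taken from the uploaded file
        String sample = "<graph>\n<nodes>\n<node id=\"1\"/>\n</nodes>\n</graph>";
        System.out.println(between(sample, "nodes")); // prints [<node id="1"/>]
    }
}
```

For anything beyond a quick extraction, an XML parser is usually the safer choice, since a regex like this does not understand nesting of the same tag or comments.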
Without a sample of the file I can only suggest so much. What I can tell you, however, is that you can get the substring of that text using a tag search loop. Here is an example:
String s = "<a>test</a><b>list</b><a>class</a>";
char[] chars = s.toCharArray(); // convert once instead of on every access
int start = 0, end = 0;
for (int i = 0; i + 2 < chars.length; i++) {
    // opening tag <a>
    if (chars[i] == '<' && chars[i + 1] == 'a' && chars[i + 2] == '>') {
        start = i + 3; // first character after the opening tag
        // start searching right after the opening tag so short or empty content is not skipped
        for (int j = start; j + 3 < chars.length; j++) {
            // closing tag </a>
            if (chars[j] == '<' && chars[j + 1] == '/' && chars[j + 2] == 'a' && chars[j + 3] == '>') {
                end = j;
                System.out.println(s.substring(start, end));
                break;
            }
        }
    }
}
The above code searches string s for the opening a tag, starts where it found it, and continues until it finds the closing a tag. It then uses those two positions to create a substring, which is the text between the two tags. You can stack as many of these tag searches as you want. Here is an example of a search for two tags:
String s = "<a>test</a><b>list</b><a>class</a>";
char[] chars = s.toCharArray(); // convert once instead of on every access
int start = 0, end = 0;
for (int i = 0; i + 2 < chars.length; i++) {
    // opening tag: <a> or <b>
    if (chars[i] == '<' && (chars[i + 1] == 'a' || chars[i + 1] == 'b') && chars[i + 2] == '>') {
        start = i + 3; // first character after the opening tag
        for (int j = start; j + 3 < chars.length; j++) {
            // closing tag: </a> or </b>
            if (chars[j] == '<' && chars[j + 1] == '/'
                    && (chars[j + 2] == 'a' || chars[j + 2] == 'b') && chars[j + 3] == '>') {
                end = j;
                System.out.println(s.substring(start, end));
                break;
            }
        }
    }
}
The only difference is that I've added clauses to the if statements to also get the text between b tags. This system is extremely versatile, and I think you'll find plenty of uses for it.
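The same tag search can probably be written more compactly with String.indexOf, which does the character scanning for you. A minimal sketch, assuming made-up names (TagSearchSketch, textBetween) and reusing the sample string from above:

```java
import java.util.ArrayList;
import java.util.List;

class TagSearchSketch {
    // Hypothetical helper: collects every substring found between open and close.
    static List<String> textBetween(String s, String open, String close) {
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = s.indexOf(open, from);
            if (start < 0) {
                break; // no further opening tag
            }
            start += open.length(); // first character after the opening tag
            int end = s.indexOf(close, start);
            if (end < 0) {
                break; // opening tag without a closing tag
            }
            found.add(s.substring(start, end));
            from = end + close.length(); // continue after the closing tag
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "<a>test</a><b>list</b><a>class</a>";
        System.out.println(textBetween(s, "<a>", "</a>")); // prints [test, class]
        System.out.println(textBetween(s, "<b>", "</b>")); // prints [list]
    }
}
```

Unlike the loops above, this takes the tags as parameters, so searching for another tag is one more call rather than another clause in the if statements.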

Need to extract data from CSV file

In my file I have the data below; everything is a string.
Input
"abcd","12345","success,1234,out",,"hai"
The output should be like below
Column 1: "abcd"
Column 2: "12345"
Column 3: "success,1234,out"
Column 4: null
Column 5: "hai"
We need to use the comma as the delimiter; the null value comes without double quotes.
Could you please help me find a regular expression to parse this data?
You could try a tool like CSVReader from OpenCSV: https://sourceforge.net/projects/opencsv/
You can even configure a CSVParser (used by the reader) to output null under several conditions. From the docs:
/**
* Denotes what field contents will cause the parser to return null: EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER (default)
*/
public static final CSVReaderNullFieldIndicator DEFAULT_NULL_FIELD_INDICATOR = NEITHER;
You can use this Regular Expression
"([^"]*)"
DEMO: https://regex101.com/r/WpgU9W/1
Match 1
Group 1. 1-5 `abcd`
Match 2
Group 1. 8-13 `12345`
Match 3
Group 1. 16-32 `success,1234,out`
Match 4
Group 1. 36-39 `hai`
Using the ("[^"]+")|(?<=,)(,) regex you can find either quoted strings ("[^"]+"), which should be taken as is, or commas preceded by commas, which denote null field values. All you need to do now is iterate through the matches, check which of the two capture groups is defined, and output accordingly:
String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
Pattern pattern = Pattern.compile("(\"[^\"]+\")|(?<=,)(,)");
Matcher matcher = pattern.matcher(input);
int col = 1;
while (matcher.find()) {
if (matcher.group(1) != null) {
System.out.println("Column " + col + ": " + matcher.group(1));
col++;
} else if (matcher.group(2) != null) {
System.out.println("Column " + col + ": null");
col++;
}
}
Demo: https://ideone.com/QmCzPE
Step #1:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(,,)";
final String string = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"\n"
+ "\"abcd\",\"12345\",\"success,1234,out\",\"null\",\"hai\"";
final String subst = ",\"null\",";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
Original Text:
"abcd","12345","success,1234,out",,"hai"
Transformation: (with null)
"abcd","12345","success,1234,out","null","hai"
Step #2: (use REGEXP)
"([^"]*)"
Result:
abcd
12345
success,1234,out
null
hai
Credits:
Emmanuel Guiton [https://stackoverflow.com/users/7226842/emmanuel-guiton] REGEXP
You can also use the replaceAll function:
final String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
System.out.println(input);
String[] strings = input
.replaceAll(",,", ",\"\",")
.replaceAll(",,", ",\"\",") // in case there is more than one null in a row
.replaceAll("\",\"", "\";\"")
.replaceAll("\"\"", "")
.split(";");
for (String string : strings) {
String output = string;
if (output.isEmpty()) {
output = null;
}
System.out.println(output);
}
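For comparison, the same line can also be parsed without a regex by walking over the characters and tracking whether the current position is inside quotes. This is only a minimal sketch with made-up names (QuotedCsvLine, parse); it returns the values without their surrounding quotes, treats an unquoted empty field as null, and does not handle escaped quotes ("") inside a field, which a library such as OpenCSV would.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvLine {
    // Hypothetical helper: splits one line on commas that are outside double quotes.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;  // true while between an opening and a closing quote
        boolean wasQuoted = false; // true if the current field contained quotes at all
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                wasQuoted = true;
            } else if (c == ',' && !inQuotes) {
                // end of field: an unquoted, empty field becomes null
                fields.add(wasQuoted || current.length() > 0 ? current.toString() : null);
                current.setLength(0);
                wasQuoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(wasQuoted || current.length() > 0 ? current.toString() : null);
        return fields;
    }

    public static void main(String[] args) {
        List<String> columns = parse("\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"");
        for (int i = 0; i < columns.size(); i++) {
            System.out.println("Column " + (i + 1) + ": " + columns.get(i));
        }
    }
}
```

With the sample line from the question this yields five columns, the fourth being null and the third keeping its embedded commas.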

Regex for converting xml tags key to camel case

I am looking for the simplest method in Java that takes an XML string and converts all tags (not their contents) to camel case, such that
<HeaderFirst>
<HeaderTwo>
<ID>id1</ID>
<TimeStamp>2016-11-04T02:46:34Z</TimeStamp>
<DetailedDescription>
<![CDATA[di]]>
</DetailedDescription>
</HeaderTwo>
</HeaderFirst>
will be converted to
<headerFirst>
<headerTwo>
<id>id1</id>
<timeStamp>2016-11-04T02:46:34Z</timeStamp>
<detailedDescription>
<![CDATA[di]]>
</detailedDescription>
</headerTwo>
</headerFirst>
Try something like this:
public void tagToCamelCase(String input){
char[] inputArray = input.toCharArray();
for (int i = 0; i < inputArray.length-2; i++){
if (inputArray[i] == '<'){
if(inputArray[i+1]!= '/')
inputArray[i+1] = Character.toLowerCase(inputArray[i+1]);
else
inputArray[i+2] = Character.toLowerCase(inputArray[i+2]);
}
}
System.out.println(new String(inputArray));
}
Note: the tag ID will become iD and not id. Hope this helps.
Here is a solution based on splitting the string on the ">" character and then processing the tokens in three different cases: CDATA, open tag, and close tag.
The following code should work (see the program output below). There is, however, a problem with the tag "ID": how do we know that its camel case should be "id" and not "iD"? This needs a dictionary to capture that knowledge. So the following routine convert() has two modes, with useDictionary being true or false. See if the following solution satisfies your requirement.
To use the "useDictionary" mode you also need to maintain a proper dictionary (the HashMap called "dict" in the program; right now it has only one entry, saying that "ID" should be camel-cased to "id"). Note that the dictionary can be built up incrementally: you only need to add the special cases (e.g. the camel case of "ID" is "id", not "iD").
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CamelCase {
private static Map<String, String> dict = new HashMap<>();
static {
dict.put("ID", "id");
}
public static void main(String[] args) {
String input = "<HeaderFirst> "
+ "\n <HeaderTwo>"
+ "\n <ID>id1</ID>"
+ "\n <TimeStamp>2016-11-04T02:46:34Z</TimeStamp>"
+ "\n <DetailedDescription>"
+ "\n <![CDATA[di]]>"
+ "\n </DetailedDescription>"
+ "\n </HeaderTwo> "
+ "\n</HeaderFirst>";
System.out.println("===== output without using a dictionary =====");
System.out.println(convert(input, false /* useDictionary */));
System.out.println("===== output using a dictionary =====");
System.out.println(convert(input, true /* useDictionary */));
}
private static String convert(String input, boolean useDictionary) {
String splitter = ">";
String[] tokens = input.split(splitter);
StringBuilder sb = new StringBuilder();
Pattern cdataPattern = Pattern.compile("([^<]*)<!\\[CDATA\\[([^\\]]*)\\]\\]");
Pattern oTagPattern = Pattern.compile("([^<]*)<(\\w+)");
Pattern cTagPattern = Pattern.compile("([^<]*)</(\\w+)");
String prefix;
String tag;
String newTag;
for (String token : tokens) {
Matcher cdataMatcher = cdataPattern.matcher(token);
Matcher oTagMatcher = oTagPattern.matcher(token);
Matcher cTagMatcher = cTagPattern.matcher(token);
if (cdataMatcher.find()) { // CDATA - do not change
sb.append(token);
} else if (oTagMatcher.find()) {// open tag - change first char to lower case
prefix = oTagMatcher.group(1);
tag = oTagMatcher.group(2);
newTag = camelCaseOneTag(tag, useDictionary);
sb.append(prefix + "<" + newTag);
} else if (cTagMatcher.find()) {// close tag - change first char to lower case
prefix = cTagMatcher.group(1);
tag = cTagMatcher.group(2);
newTag = camelCaseOneTag(tag, useDictionary);
sb.append(prefix + "</" + newTag); // keep the slash of the closing tag
}
sb.append(splitter);
}
return sb.toString();
}
private static String camelCaseOneTag(String tag, boolean useDictionary) {
String newTag;
if (useDictionary
&& dict.containsKey(tag)) {
newTag = dict.get(tag);
} else {
newTag = tag.substring(0, 1).toLowerCase()
+ tag.substring(1);
}
return newTag;
}
}
The output of this program is this:
===== output without using a dictionary =====
<headerFirst>
<headerTwo>
<iD>id1</iD>
<timeStamp>2016-11-04T02:46:34Z</timeStamp>
<detailedDescription>
<![CDATA[di]]>
</detailedDescription>
</headerTwo>
</headerFirst>
===== output using a dictionary =====
<headerFirst>
<headerTwo>
<id>id1</id>
<timeStamp>2016-11-04T02:46:34Z</timeStamp>
<detailedDescription>
<![CDATA[di]]>
</detailedDescription>
</headerTwo>
</headerFirst>
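Since the question's title asks for a regex, here is a minimal sketch of that route using Matcher.appendReplacement. The names TagCamelCaseSketch, TAG_START and lowerFirstLetterOfTags are made up for illustration. Like the first answer, it only lower-cases the first letter of each tag name (so ID becomes iD), and it assumes that CDATA sections do not themselves contain text that looks like a tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCamelCaseSketch {
    // "<" or "</" followed by a letter; "<![CDATA[" and "<?xml" do not match
    // because "!" and "?" are not letters.
    private static final Pattern TAG_START = Pattern.compile("<(/?)(\\p{Alpha})");

    // Hypothetical helper: lower-cases the first letter of every opening and closing tag name.
    static String lowerFirstLetterOfTags(String xml) {
        Matcher m = TAG_START.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "<" + m.group(1) + m.group(2).toLowerCase();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<HeaderTwo><ID>id1</ID><![CDATA[di]]></HeaderTwo>";
        // prints <headerTwo><iD>id1</iD><![CDATA[di]]></headerTwo>
        System.out.println(lowerFirstLetterOfTags(input));
    }
}
```

Special cases such as ID to id would still need a lookup like the dict map in the answer above.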

Parse the string

Can anyone suggest how to parse the string below?
Added Active10000000044: {activityId=Active1, schedule=1 22 * * 0, isEnabled=Y, type=global, runAtHost=null}
I want to extract the Active10000000044 part so I can use it in the next step.
If you want the part to the right of the ":" then you can use
String str = "Added Active10000000044: {activityId=Active1, schedule=1 22 * * 0, isEnabled=Y, type=global, runAtHost=null}:";
System.out.println(str.split(":")[1]);
The left can be found using
System.out.println(str.split(":")[0]);
It could be as simple as:
String result = str.replaceFirst("Added ", "").replaceFirst(":.*", "");
depending on whether you've given us the full suite of test data :-)
If you want the second word regardless of the first, you could try:
String result = str.replaceFirst("[^ ]+ +", "").replaceFirst(":.*", "");
Both suggestions rely on the first word not being preceded by spaces and on the white space consisting of actual space characters. Any deviation from that will require some slight tweaks.
Try this,
String str = "Added Active10000000044: {activityId=Active1, schedule=1 22 * * 0, isEnabled=Y, type=global, runAtHost=null}:";
String[] parts = str.split(":");
String part1 = parts[0]; // value "Added Active10000000044"
String[] SetU_need = part1.split(" ");
String u_need = SetU_need[1]; // value "Active10000000044"
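The two splits can also be folded into a single regular expression with a capture group. A minimal sketch, assuming the line always starts with the word Added followed by the id and a colon; the names AddedIdExtractor, ADDED and extractId are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddedIdExtractor {
    // "Added", whitespace, then the id: everything up to the first colon or whitespace.
    private static final Pattern ADDED = Pattern.compile("^\\s*Added\\s+([^:\\s]+):");

    // Hypothetical helper: returns the id, or null if the line does not have the expected shape.
    static String extractId(String line) {
        Matcher m = ADDED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Added Active10000000044: {activityId=Active1, schedule=1 22 * * 0, isEnabled=Y, type=global, runAtHost=null}";
        System.out.println(extractId(str)); // prints Active10000000044
    }
}
```

Returning null for an unexpected line avoids the ArrayIndexOutOfBoundsException that parts[0].split(" ")[1] would throw if the line had no space in it.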
Try this.
The Splitter class is from the Google Guava library.
String text = "Added Active10000000044: {activityId=Active1, schedule=1 22 * * 0, isEnabled=Y, type=global, runAtHost=null}:";
int indexOfOpenBrace = text.indexOf("{");
int indexOfCloseBrace = text.indexOf("}");
String valuesAsText = text.substring(indexOfOpenBrace+1, indexOfCloseBrace);
List<String> splitToList = Splitter.on(",").trimResults().omitEmptyStrings().splitToList(valuesAsText); // trimResults() drops the space after each comma
Map<String, String> map = new HashMap<>();
for (String keyValues : splitToList) {
List<String> splitToKeyAndValues = Splitter.on("=").omitEmptyStrings().splitToList(keyValues);
map.put(splitToKeyAndValues.get(0), splitToKeyAndValues.get(1));
}
Set<String> keySet = map.keySet();
for (String key : keySet) {
System.out.println(key+":"+map.get(key));
}
Output
activityId:Active1
schedule:1 22 * * 0
type:global
runAtHost:null
isEnabled:Y
