Regex to replace the string at a particular position - java

MSH|^~\&|RAD|MCH|SOARCLIN|MCH|201309281506||ORU^R01|RMS|P|2.4
PID|0001|_MISSING_|059805^a~059805^a~059805^a||RENNER^KATHRYN^
In a string like the one above, I need to replace a field based on the count of | (pipe) characters.
For example:
in the MSH line, I want to replace "MCH" after the 3rd pipe (|)
with "ABC".
input : MSH|^~\&|RAD|MCH|SOARCLIN|MCH|201309281506||ORU^R01|RMS|P|2.4
output : MSH|^~\&|RAD|MCH|SOARCLIN|ABC|201309281506||ORU^R01|RMS|P|2.4

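// requires: import java.util.regex.Pattern;
static String repSection( String s, String del, int count, String rep ){
// limit -1 keeps trailing empty fields, so join() rebuilds the line exactly
String[] toks = s.split( Pattern.quote( del ), -1 );
toks[count] = rep; // the field that follows the count-th delimiter
return String.join( del, toks );
}
Call:
String line = "MSH|^~\\&|RAD|MCH|SOARCLIN|MCH|201309281506||ORU^R01|RMS|P|2.4";
String result = repSection( line, "|", 3, "ABC" );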
It relies on counting alone; it doesn't matter what is between the 3rd and 4th pipe characters.
I prefer this to some fancy and difficult to maintain regex.
s = s.replaceAll( "^((?:[^|]*\\|){3})[^|]*", "$1ABC" ); // group 1 already ends with the 3rd pipe
Again, this doesn't care what is between 3rd and 4th pipe symbol.
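The two approaches above can be checked side by side in a self-contained sketch; the class and method names (ReplaceField, replaceBySplit, replaceByRegex) are made up for illustration. It also makes one detail visible: the field after the 3rd pipe is the first MCH, while the output shown in the question changes the second MCH, which follows the 5th pipe, so index 5 is what reproduces that output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceField {
    // Split-based: fields[n] is the field that follows the n-th delimiter.
    static String replaceBySplit(String line, String del, int n, String rep) {
        String[] fields = line.split(Pattern.quote(del), -1); // -1 keeps trailing empty fields
        fields[n] = rep;
        return String.join(del, fields);
    }

    // Regex-based: group 1 is the first n fields with their pipes; [^|]* is the field replaced.
    static String replaceByRegex(String line, int n, String rep) {
        return line.replaceFirst("^((?:[^|]*\\|){" + n + "})[^|]*",
                "$1" + Matcher.quoteReplacement(rep));
    }

    public static void main(String[] args) {
        String msh = "MSH|^~\\&|RAD|MCH|SOARCLIN|MCH|201309281506||ORU^R01|RMS|P|2.4";
        System.out.println(replaceBySplit(msh, "|", 3, "ABC")); // first MCH replaced
        System.out.println(replaceByRegex(msh, 5, "ABC"));      // second MCH replaced, as in the question's output
    }
}
```

Matcher.quoteReplacement is there so that a replacement containing $ or \ is inserted literally instead of being read as a group reference.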

Related

Apache StrTokenizer: How to Escape a Quote and a Comma in a String Literal

I have this code to parse some CSV, on the understanding that doubling the quote character escapes it within a quoted string (as the Apache docs say):
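// requires java.util.ArrayList, java.util.List and Apache Commons StrTokenizer (org.apache.commons.lang3.text.StrTokenizer)
private void test() {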
char quote = '\'';
char delim = ',';
// should be split into [comma, comma], [quote ', comma]
String inputListValues = "'comma, comma', 'quote '', comma'";
StrTokenizer st = new StrTokenizer(inputListValues, delim, quote);
List<String> vals = new ArrayList<String>();
while (st.hasNext()) {
vals.add(st.nextToken().trim());
}
System.out.println(vals);
// should be split into [quote ', comma], [comma, comma]
String inputListValues2 = "'quote '', comma', 'comma, comma'";
StrTokenizer st2 = new StrTokenizer(inputListValues2, delim, quote);
List<String> vals2 = new ArrayList<String>();
while (st2.hasNext()) {
vals2.add(st2.nextToken().trim());
}
System.out.println(vals2);
}
The output (as shown in the debugger) is:
vals:
[0] "comma, comma"
[1] "'quote ''"
[2] "comma'"
vals2:
[0] "quote ', comma"
[1] "'comma"
[2] "comma'"
I expect two items to be parsed in each case: [quote ', comma] and [comma, comma].
If it didn't work at all, that would be one thing, but changing the order of the values seems to change the parsing behavior.
Does anyone have any idea? I'm on the verge of just using another library or regex.
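It's because I started using this with a CSV parser in mind, but StrTokenizer is not one. The docs say:
"a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
So spaces are part of the token. A quote only starts a quoted section when it is the first character of a token, and in the first input the second value begins with a space, so its quotes are treated as ordinary characters; that is why changing the order changed the result. I then used setTrimmerMatcher, since for a trimmer matcher: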
These characters are trimmed off on each side of the delimiter until the token or quote is found.
The code ended up as:
StrTokenizer st = new StrTokenizer(toTokenize, DELIM_CHAR, QUOTE_CHAR);
// by default this is a STRING matching, not csv parser, so spaces count as part of the token
// i.e. "a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
// thus we set the trimmer matcher, which "are trimmed off on each side of the delimiter until the token or quote is found."
st.setTrimmerMatcher(StrMatcher.trimMatcher());
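If adding Apache Commons is not an option, the same rules can be sketched with the JDK alone. This is a hand-rolled tokenizer, not StrTokenizer's implementation, and the names are hypothetical: a delimiter outside quotes ends a token, a quote opens or closes a quoted section even after leading spaces, a doubled quote inside a quoted section becomes one literal quote, and each finished token is trimmed with String.trim(). Unlike StrTokenizer, that final trim would also strip spaces at the edges of quoted text.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplit {
    // Minimal stdlib sketch, NOT Apache StrTokenizer.
    static List<String> split(String s, char delim, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder tok = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    tok.append(c);
                } else if (i + 1 < s.length() && s.charAt(i + 1) == quote) {
                    tok.append(quote); // doubled quote -> one literal quote
                    i++;               // skip the second quote
                } else {
                    inQuotes = false;  // closing quote
                }
            } else if (c == delim) {
                tokens.add(tok.toString().trim()); // unquoted delimiter ends the token
                tok.setLength(0);
            } else if (c == quote) {
                inQuotes = true;       // opening quote, even after leading spaces
            } else {
                tok.append(c);
            }
        }
        tokens.add(tok.toString().trim());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("'comma, comma', 'quote '', comma'", ',', '\''));
        System.out.println(split("'quote '', comma', 'comma, comma'", ',', '\''));
    }
}
```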

Convert String="one,two,three" to String='one','two','three'

I need to convert my String value "one,two,three" to 'one','two','three'.
I have the code below:
String input = "One,two,three";
I need to send these input values to a query in Hibernate,
so I need to send them as 'one','two','three' in a single string. Please suggest an easy way to do it.
You can do the following:
String result = input.replace(",", "','").replaceAll("^(.*)$", "'$1'");
input.replace(",", "','") replaces each , with ',' so at this step your string will look like:
One','two','three
Next, we use a regex to surround the string with single quotes, so it now looks like:
'One','two','three'
which is what you want.
Regex explanation: we capture the whole string, then replace it with itself surrounded by single quotes. The ^ and $ anchors ensure there is exactly one match; with a bare (.*), replaceAll also matches the empty string at the end of the input and appends an extra ''.
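The anchors matter because Java's replaceAll keeps searching after a match: a bare (.*) first matches the whole string and then matches the empty string at its end, so the quotes get added twice. A small sketch comparing both forms (class and method names are illustrative):

```java
public class QuoteList {
    // Unanchored: (.*) matches the whole string, then the empty string at the end.
    static String unanchored(String input) {
        return input.replace(",", "','").replaceAll("(.*)", "'$1'");
    }

    // Anchored: without MULTILINE, ^ only matches at the start, so there is one match.
    static String anchored(String input) {
        return input.replace(",", "','").replaceAll("^(.*)$", "'$1'");
    }

    public static void main(String[] args) {
        System.out.println(unanchored("One,two,three")); // 'One','two','three'''
        System.out.println(anchored("One,two,three"));   // 'One','two','three'
    }
}
```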
References:
String#replaceAll
String#replace
Regex tutorial
Using a regex:
input.replaceAll("(\\w+)", "'$1'")
In a regex, \w matches a word character, so (\\w+) captures each word and the replacement wraps it in single quotes.
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum
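A caveat with the \\w+ version: it quotes every run of word characters, so it only works when the values themselves contain no spaces or punctuation. A quick check, using a made-up value "New York,Boston" alongside the question's input:

```java
public class WordQuote {
    // Wraps every run of word characters ([a-zA-Z_0-9]) in single quotes.
    static String quoteWords(String input) {
        return input.replaceAll("(\\w+)", "'$1'");
    }

    public static void main(String[] args) {
        System.out.println(quoteWords("One,two,three"));   // 'One','two','three'
        System.out.println(quoteWords("New York,Boston")); // 'New' 'York','Boston'
    }
}
```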
String[] split = input.split(",");
StringBuilder sb = new StringBuilder();
for (int i = 0 ; i < split.length ; i++ ) {
sb.append("'").append(split[i]).append("'");
if ( i != split.length-1 ) {
sb.append(",");
}
}
String result = sb.toString();
Not smart, but the easiest.
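Since Java 8, java.util.StringJoiner can do the separator bookkeeping that the loop above does by hand: the delimiter "','" goes between items, and the prefix and suffix "'" wrap the whole result. A minimal sketch (the class name is illustrative):

```java
import java.util.StringJoiner;

public class JoinQuoted {
    static String quoteAll(String input) {
        // delimiter "','" between parts, "'" before the first and after the last
        StringJoiner joiner = new StringJoiner("','", "'", "'");
        for (String part : input.split(",")) {
            joiner.add(part);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteAll("One,two,three")); // 'One','two','three'
    }
}
```

The one-liner "'" + String.join("','", input.split(",")) + "'" produces the same string.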

Java String tokens

I have a string:
String user_name = "id=123 user=aron name=aron app=application";
and a list that contains {user,cuser,suser}.
I have to get the user part from the string, so I have code like this:
List<String> userName = Config.getConfig().getList(Configuration.ATT_CEF_USER_NAME);
String result = null;
for (String param : user_name.split("\\s", 0)) {
for (String user : userName) {
String userParam = user.concat("=.*");
if (param.matches(userParam)) {
result = param.split("=")[1];
}
}
}
The problem is that if the user value in the string contains spaces, it does not work.
For example:
String user_name = "id=123 user=aron nicols name=aron app=application";
Here user has the value aron nicols, which contains a space. How can I write code that gets me the exact user value, i.e. aron nicols?
If you want to split only on spaces that come right before tokens which have = right after them, such as user=..., then you can add a look-ahead condition like
split("\\s(?=\\S*=)")
This regex will split on:
\\s a space
(?=\\S*=) that is followed by zero or more (*) non-space (\\S) characters ending with =. The look-ahead (?=...) is a zero-length match, so the part it checks is not included in the match and split will not remove it; only the space itself is removed.
Demo:
String user_name = "id=123 user=aron nicols name=aron app=application";
for (String s : user_name.split("\\s(?=\\S*=)"))
System.out.println(s);
output:
id=123
user=aron nicols
name=aron
app=application
From your comment on the other answer, it seems that an = escaped with \ shouldn't be treated as the separator between key and value but as part of the value. In that case you can add a negative look-behind to check that there is no \ before =: placing (?<!\\\\) right before = requires that = is not preceded by \.
BTW, to create a regex that matches \ we need to write it as \\, but in a Java string literal we also need to escape each \, which is why we end up with \\\\.
So you can use
split("\\s(?=\\S*(?<!\\\\)=)")
Demo:
String user_name = "user=Dist\\=Name1, xyz src=activedirectorydomain ip=10.1.77.24";
for (String s : user_name.split("\\s(?=\\S*(?<!\\\\)=)"))
System.out.println(s);
output:
user=Dist\=Name1, xyz
src=activedirectorydomain
ip=10.1.77.24
Do it like this.
First, split the input string using this regex:
" +(?=\\w+(?<!\\\\)=)"
This will give you 4 name=value tokens like this:
id=123
user=aron nicols
name=aron
app=application
Now you can just split on = to get your name and value parts.
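Putting the two steps together for the original question: split with the look-ahead, then split each token on the first unescaped = (limit 2, so any further = stays in the value), and look up the keys from the user list. This is a sketch; userKeys is a hypothetical stand-in for the list returned by Config.getConfig().getList(...), and the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueTokens {
    // Split on whitespace only when the next run of non-space characters ends with an unescaped '='.
    private static final String PAIR_SEPARATOR = "\\s(?=\\S*(?<!\\\\)=)";
    // The first '=' that is not preceded by a backslash separates key and value.
    private static final String KEY_VALUE_SEPARATOR = "(?<!\\\\)=";

    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String token : line.split(PAIR_SEPARATOR)) {
            String[] kv = token.split(KEY_VALUE_SEPARATOR, 2); // limit 2: later '=' stay in the value
            pairs.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return pairs;
    }

    // Returns the value of the first key from the list that is present, or null.
    static String findValue(Map<String, String> pairs, List<String> keys) {
        for (String key : keys) {
            if (pairs.containsKey(key)) {
                return pairs.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> userKeys = Arrays.asList("user", "cuser", "suser"); // hypothetical stand-in for the config list
        Map<String, String> pairs = parse("id=123 user=aron nicols name=aron app=application");
        System.out.println(findValue(pairs, userKeys)); // aron nicols
    }
}
```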
CODE FISH, this simple regex captures the user in Group 1: user=\s*(.*?)\s+name=
It will capture "Aron", "Aron Nichols", "Aron Nichols The Benevolent", and so on.
It relies on the knowledge that name= always follows user=.
However, if you're not sure that the token following user is name, you can use this:
user=\s*(.*?)(?=$|\s+\w+=)
Here is how to use the second expression (for the first, just change the string in Pattern.compile):
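// requires java.util.regex.Matcher, java.util.regex.Pattern, java.util.regex.PatternSyntaxException
String subjectString = "id=123 user=aron nicols name=aron app=application";
String resultString = null;
try {
Pattern regex = Pattern.compile("user=\\s*(.*?)(?=$|\\s+\\w+=)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group(1); // "aron nicols"
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}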

How to remove special characters from input text

I want to remove all special characters from input text as well as some restricted words.
Whatever I want to remove will come dynamically: the user decides what needs to be excluded, which is why I did not hard-code a regex. restricted_words_list (see my code) will be loaded from the database; for demonstration purposes I kept the words in a static String array to check whether my code works properly.
// requires java.util.HashSet, java.util.Set, java.util.regex.Matcher, java.util.regex.Pattern
public class TestKeyword {
private static final String[] restricted_words_list={"#","of","an","^","#","<",">","(",")"};
private static final Pattern restrictedReplacer;
private static Set<String> restrictedWords = null;
static {
StringBuilder strb= new StringBuilder();
for(String str:restricted_words_list){
strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
}
strb.setLength(strb.length()-1);
restrictedReplacer = Pattern.compile(strb.toString(),Pattern.CASE_INSENSITIVE);
strb = new StringBuilder();
}
public static void main(String[] args)
{
String inputText = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
System.out.println("inputText : " + inputText);
String modifiedText = restrictedWordCheck(inputText);
System.out.println("Modified Text : " + modifiedText);
}
public static String restrictedWordCheck(String input){
Matcher m = restrictedReplacer.matcher(input);
StringBuffer strb = new StringBuffer(input.length());//ensuring capacity
while(m.find()){
if(restrictedWords==null)restrictedWords = new HashSet<String>();
restrictedWords.add(m.group()); //m.group() returns what was matched
m.appendReplacement(strb,""); //this writes out what came in between matching words
for(int i=m.start();i<m.end();i++)
strb.append("");
}
m.appendTail(strb);
return strb.toString();
}
}
The output is:
inputText : abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D
Modified Text : abcd abc# cbda ssef jjj the gg wh&at gggg ss%ss ### (()) DhD
Here the words of and an are removed, but only some of the special characters, not all of those I specified in restricted_words_list.
Now I have a better solution:
String inputText = title;// assigning input
List<String> restricted_words_list = catalogueService.getWordStopper(); // get all stop words from the database dynamically (getWordStopper() runs a query and returns the list of words)
String finalResult = "";
List<String> stopperCleanText = new ArrayList<String>();
String[] afterTextSplit = inputText.split("\\s"); // split and add to list
for (int i = 0; i < afterTextSplit.length; i++) {
stopperCleanText.add(afterTextSplit[i]); // adding to list
}
stopperCleanText.removeAll(restricted_words_list); // remove all word stopper
for (String addToString : stopperCleanText)
{
finalResult += addToString+";"; // add semicolon to cleaned text
}
return finalResult;
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
Parameters:
regex - the regular expression to which this string is to be matched
replacement - the string to be substituted for each match
So you just need to pass an empty String as the replacement parameter.
You should change your loop
for(String str:restricted_words_list){
strb.append("\\b").append(Pattern.quote(str)).append("\\b|");
}
to this:
for(String str:restricted_words_list){
strb.append("\\b*").append(Pattern.quote(str)).append("\\b*|");
}
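Because \\b matches only at a boundary between a word character and a non-word character. Since # is itself a non-word character, \\b# matches only when a word character comes before the # and #\\b only when one comes after it; in abc# nothing follows the #, so it is not replaced. Adding * (zero or more) to each \\b makes the boundary optional, so abc# is matched as well. Note that this also drops the boundary check for entries like of and an, so they would be removed from inside other words too (for example from often).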
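Another way to build the pattern from a dynamic list, offered as a sketch rather than what the question or answers use: keep \b only for entries made of word characters, so of is not cut out of often, and match symbol entries anywhere, since \b next to a symbol requires a word character on the other side. The names (RestrictedWords, build, clean) are illustrative, and the final whitespace cleanup is an extra step the original code does not do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RestrictedWords {
    // Word entries (only \w characters) get \b on both sides; symbol entries are quoted and matched anywhere.
    static Pattern build(List<String> restricted) {
        String alternation = restricted.stream()
                .distinct()
                .map(w -> w.matches("\\w+") ? "\\b" + Pattern.quote(w) + "\\b" : Pattern.quote(w))
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    static String clean(String input, List<String> restricted) {
        String removed = build(restricted).matcher(input).replaceAll("");
        return removed.replaceAll("\\s{2,}", " ").trim(); // tidy the gaps left by removed words
    }

    public static void main(String[] args) {
        List<String> restricted = Arrays.asList("#", "of", "an", "^", "#", "<", ">", "(", ")");
        String input = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
        System.out.println(clean(input, restricted)); // abcd abc cbda ssef jjj the gg wh&at gggg ss%ss DhD
    }
}
```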
You might consider using a regex directly to replace those special characters with an empty string ''. Check out the question "Java; String replace (using regular expressions)?" and this tutorial: http://www.vogella.com/articles/JavaRegularExpressions/article.html
You can also do it like this:
String inputText = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg<g ss%ss ### (()) D^h^D";
String regx = "[^a-zA-Z0-9 ]"; // anything that is not a letter, digit or space
String textWithoutSpecialChar = inputText.replaceAll(regx, "");
System.out.println("Without Special Char:" + textWithoutSpecialChar);
String yourSetofString = "\\b(of|an)\\b"; // your restricted words, matched as whole words
String op = textWithoutSpecialChar.replaceAll(yourSetofString, "");
System.out.println("output : " + op);
o/p :
Without Special Char:abcd abc cbda ssef of jjj the gg an what gggg ssss   DhD
output : abcd abc cbda ssef  jjj the gg  what gggg ssss   DhD
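// requires java.util.regex.Pattern
String s = "abcd abc# cbda ssef of jjj t#he g^g an wh&at ggg (blah) and | then";
// " of " and " an " include the spaces, so removing them also joins the neighboring words
String[] words = new String[]{ " of ", "|", "(", " an ", "#", "#", "&", "^", ")" };
StringBuilder sb = new StringBuilder();
for( String w : words ) {
sb.append( Pattern.quote( w ) ).append( "|" ); // quote every entry so |, (, ^ etc. are literal
}
sb.setLength( sb.length() - 1 ); // drop the trailing "|", otherwise the regex ends with an empty alternative
System.out.println( s.replaceAll( sb.toString(), "" ) );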

Regex for extracting a substring

I want to extract the string from the input string with "/" removed from the beginning and the end (if present).
For example:
Input String : /abcd
Output String : abcd
Input String : /abcd/
Output String : abcd
Input String : abcd/
Output String : abcd
Input String : abcd
Output String : abcd
Input String : //abcd/
Output String : /abcd
public static void main(String[] args) {
String abcd1 = "/abcd/";
String abcd2 = "/abcd";
String abcd3 = "abcd/";
String abcd4 = "abcd";
System.out.println(abcd1.replaceAll("(^/)?(/$)?", ""));
System.out.println(abcd2.replaceAll("(^/)?(/$)?", ""));
System.out.println(abcd3.replaceAll("(^/)?(/$)?", ""));
System.out.println(abcd4.replaceAll("(^/)?(/$)?", ""));
}
Will work.
(^/)? matches zero or one '/' at the beginning of the string, and (/$)? matches zero or one '/' at the end of the string.
Make the regex "(^/*)?(/*$)?" to support matching multiple '/' (this strips all leading and trailing slashes, so //abcd/ would become abcd rather than /abcd):
String abcd5 = "//abcd///";
System.out.println(abcd5.replaceAll("(^/*)?(/*$)?", "")); // prints abcd
One more option: use ^\/|\/$ as the regex and replace it with an empty string.
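The question's last example (//abcd/ becoming /abcd) is the one that distinguishes these patterns: both (^/)?(/$)? and ^/|/$ remove at most one / at each end. This sketch checks both against all five examples from the question; in a Java string literal the \/ of the last pattern can simply be written as /. The class and method names are illustrative.

```java
public class SlashTrim {
    // One optional '/' at the start and one at the end.
    static String optionalGroups(String s) {
        return s.replaceAll("(^/)?(/$)?", "");
    }

    // Same effect with an alternation: a '/' anchored at either end.
    static String alternation(String s) {
        return s.replaceAll("^/|/$", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"/abcd", "/abcd/", "abcd/", "abcd", "//abcd/"};
        for (String in : inputs) {
            System.out.println(in + " -> " + optionalGroups(in) + " / " + alternation(in));
        }
    }
}
```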
Method without regex:
String input = "/hello world/";
int from = input.startsWith("/") ? 1 : 0;
// the length check keeps an input of just "/" from giving substring(1, 0)
int to = input.endsWith("/") && input.length() > from ? input.length() - 1 : input.length();
String output = input.substring(from, to);
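You can try
String original = "/abc/";
original = original.replaceAll("/", ""); // Strings are immutable, so keep the returned value
Then call trim to remove surrounding whitespace:
original = original.trim();
Note that this removes every /, not just those at the ends, so //abcd/ becomes abcd.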
This one seems to work:
/?([a-zA-Z]+)/?
Explanation:
/? : zero or one '/'
([a-zA-Z]+) : capture one or more alphabetic characters
/? : zero or one '/'
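To use the pattern in Java, read group 1 from a Matcher. The sketch below uses matches(), so the whole input has to fit the pattern; as a consequence, inputs like the question's //abcd/ (two leading slashes) or values containing spaces or digits do not match, and the method returns null. Class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureMiddle {
    // Optional '/', then one or more letters captured as group 1, then optional '/'.
    private static final Pattern SLASHED_WORD = Pattern.compile("/?([a-zA-Z]+)/?");

    // Returns the captured letters if the whole input matches, otherwise null.
    static String extract(String input) {
        Matcher m = SLASHED_WORD.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String in : new String[] {"/abcd", "/abcd/", "abcd/", "abcd", "//abcd/"}) {
            System.out.println(in + " -> " + extract(in));
        }
    }
}
```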
