Extract the first 3 columns from a pipe-delimited string of 7 columns using regex in Java
Example:
String:
10|name|city||date|0|9013
I only want the columns up to city (3 columns):
Expected output:
10|name|city
In other words, I want to keep a given number of columns, delimited by |, using a regex.
Thank you.
I would use a simple regex pattern with the split method. There is probably a more elegant way to handle the pipes in the resulting string, but this should give you an idea. Note that the loop assumes the input has at least three columns. Good luck!
public static void main(String[] args) {
String str = "10|name|city||date|0|9013";
// split the string whenever we see a pipe
String[] arrOfStr = str.split("\\|");
StringBuilder sb = new StringBuilder();
// loop through the array we generated and format our output
// we only want the first three elements so loop accordingly
for (int i = 0; i < 3; i++) {
sb.append(arrOfStr[i]+"|");
}
// remove the trailing pipe
sb.setLength(sb.length() - 1);
System.out.println(sb.toString());
}
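Since the question asks for a regex, here is a minimal sketch that matches the leading columns directly instead of splitting and rejoining. The class name FirstColumns and the method firstColumns are made up for illustration, and the sketch assumes n is at least 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
final class FirstColumns {

    // Returns the first n pipe-delimited columns of line (assumes n >= 1).
    // If the line has fewer than n columns, the whole line is returned.
    static String firstColumns(String line, int n) {
        // one column, followed by at most n-1 further "|column" groups
        Pattern p = Pattern.compile("^[^|]*(?:\\|[^|]*){0," + (n - 1) + "}");
        Matcher m = p.matcher(line);
        // the pattern can match the empty string, so find() always succeeds
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // prints 10|name|city
        System.out.println(firstColumns("10|name|city||date|0|9013", 3));
    }
}
```

Because the quantifier is greedy, the match takes as many columns as are available, up to n, and there is no trailing pipe to remove afterwards.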
Related
I want to split a sentence into separate tokens, but I don't want to discard punctuation that is part of a token. For example, "didn't" should stay as "didn't". Punctuation at the end of a word that is not followed by a letter should be removed, so "you?" should become "you". The same applies at the beginning: "?you" should become "you".
String str = "..Hello ?don't #$you %know?";
String[] strArray = new String[10];
strArray = str.split("[^A-za-z]+[\\s]|[\\s]");
//strArray[strArray.length-1]
for(int i = 0; i < strArray.length; i++) {
System.out.println(strArray[i] + i);
}
This should just print out:
hello0
don't1
you2
know3
Rather than splitting, it is easier to use find to collect all the tokens you want with this regex:
[a-zA-Z]+(['][a-zA-Z]+)?
This regex allows a single ' sandwiched between letters. To allow other such characters, add them to the character set [']. As written, the group may occur at most once; to allow it multiple times, change the ? at the end to a * (zero or more times).
Here is your modified Java code:
List<String> tokenList = new ArrayList<String>();
String str = "..Hello ?don't #$you %know?";
Pattern p = Pattern.compile("[a-zA-Z]+(['][a-zA-Z]+)?");
Matcher m = p.matcher(str);
while (m.find()) {
tokenList.add(m.group());
}
String[] strArray = tokenList.toArray(new String[tokenList.size()]);
for (int i = 0; i < strArray.length; i++) {
System.out.println(strArray[i] + i);
}
Prints,
Hello0
don't1
you2
know3
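The remark about extending the character set and replacing ? with * can be sketched as follows. This is only an illustration: the choice of the hyphen as a second inner character, and the names WordTokens and tokenize, are assumptions rather than part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the token regex, for illustration only.
final class WordTokens {

    // letters, then zero or more groups of (' or -) followed by letters
    private static final Pattern TOKEN =
            Pattern.compile("[a-zA-Z]+(?:['-][a-zA-Z]+)*");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, don't, you, know]
        System.out.println(tokenize("..Hello ?don't #$you %know?"));
        // prints [rock'n'roll, is, well-known]
        System.out.println(tokenize("rock'n'roll is well-known"));
    }
}
```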
However, if you insist on using the split method, you can use this regex to split the values:
[^a-zA-Z]*\\s+[^a-zA-Z]*|[^a-zA-Z']+
This splits the string either on one or more whitespace characters, optionally surrounded by non-letter characters, or on a sequence of one or more characters that are neither letters nor single quotes. Here is the sample Java code using split:
String str = ".. Hello ?don't #$you %know?";
String[] strArray = Arrays.stream(str.split("[^a-zA-Z]*\\s+[^a-zA-Z]*|[^a-zA-Z']+")).filter(x -> x.length()>0).toArray(String[]::new);
for (int i = 0; i < strArray.length; i++) {
System.out.println(strArray[i] + i);
}
Prints,
Hello0
don't1
you2
know3
Note that I used the filter method on the stream to drop zero-length tokens, because split may produce a zero-length token at the start of the array.
I have a list of titles that I want to save as a String:
- title1
- title2
- title|3
Now, I want to save this as a single line String delimited by |, which would mean it ends up like this: title1|title2|title|3.
But now, when I split the String:
String input = "title1|title2|title|3";
String[] splittedInput = input.split("\\|");
splittedInput will be the following array: {"title1", "title2", "title", "3"}.
Obviously, this is not what I want. I want the third entry of the array to be title|3.
Now my question: how do I correctly escape the | in the titles so that when I split the String I end up with the correct array of three titles, instead of 4?
@Gábor Bakos
Running this code snippet:
String input = "title1|title2|title\\|3";
String[] split = input.split("(?<!\\\\)\\|");
for (int i = 0; i < split.length; i++) {
split[i] = split[i].replace("\\\\(?=\\|)", "");
}
System.out.println(Arrays.toString(split));
I get this output: [title1, title2, title\|3]. What am I doing wrong?
You can use any character as the escape character. For example, with \:
"title1|title2|title\\|3".split("(?<!\\\\)\\|").map(_.replaceAll("\\\\(?=\\|)", "")) //Scala syntax
Resulting:
Array(title1, title2, title|3)
The final mapping is required to remove the escape character as well.
(?<!\\\\) is a negative look-behind (the | must not be preceded by a backslash), and (?=\\|) is a look-ahead so that only a backslash escaping a | is removed.
Well, if you use a delimiter-separated format (like CSV or TSV), the chosen separator must never be left unescaped in the data.
You could simply escape your data (for example, title1|title2|title\|3) and then split on (?<!\\)\| (a | that is not preceded by a backslash, using a negative lookbehind).
In Java, this gives:
public static void main(String[] args) {
// prints out [title1, title2, title|3, title|4]
System.out.println(parsePipeSeparated("title1|title2|title\\|3|title\\|4"));
}
private static List<String> parsePipeSeparated(String input) {
return Stream.of(input.split("(?<!\\\\)\\|"))
.map(escapedText -> escapedText.replace("\\|", "|"))
.collect(Collectors.toList());
}
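One limitation of the lookbehind approach is that any | preceded by a backslash counts as escaped, so a title that ends in a backslash cannot be represented. A minimal sketch of a round trip that escapes the backslash as well and parses character by character is shown below. The names PipeCodec, join and split are hypothetical and do not come from the answers above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration: escapes both '\' and '|'.
final class PipeCodec {

    // Joins fields with '|', prefixing every '\' and '|' inside a field with '\'.
    static String join(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            for (char c : fields.get(i).toCharArray()) {
                if (c == '\\' || c == '|') {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Splits on unescaped '|' and removes the escape characters in one pass.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {
                current.append(c);      // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;         // next character is literal
            } else if (c == '|') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = join(Arrays.asList("title1", "title2", "title|3"));
        System.out.println(line);           // prints title1|title2|title\|3
        System.out.println(split(line));    // prints [title1, title2, title|3]
    }
}
```

Note that an empty list and a list holding one empty string both encode to the empty string, so this sketch cannot tell those two cases apart.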
Use another separator, for instance "title1,title2,title|3" instead of "title1|title2|title|3", and then call split(","). This only works if the new separator can never appear in a title.
I want to split a String on a delimiter.
Example String:
String str="ABCD/12346567899887455422DEFG/15479897445698742322141PQRS/141455798951";
Now I want strings such as ABCD/12346567899887455422 and DEFG/15479897445698742322141. That is, I want:
- only 4 characters before /
- any number of characters (numbers and letters) after /
Update:
The only time I need the previous 4 characters is after a delimiter is shown, as the string may contain letters or numbers...
My code attempt:
public class StringReq {
public static void main(String[] args) {
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
testSplitStrings(str);
}
public static void testSplitStrings(String path) {
System.out.println("splitting of sprint starts \n");
String[] codeDesc = path.split("/");
String[] codeVal = new String[codeDesc.length];
for (int i = 0; i < codeDesc.length; i++) {
codeVal[i] = codeDesc[i].substring(codeDesc[i].length() - 4,
codeDesc[i].length());
System.out.println("line" + i + "==> " + codeDesc[i] + "\n");
}
for (int i = 0; i < codeVal.length - 1; i++) {
System.out.println(codeVal[i]);
}
System.out.println("splitting of sprint ends");
}
}
You say that digits and letters can appear after /, but your example contains no letters that should be part of the result after /.
Based on that, you can simply split at the positions that have a digit before them and an A-Z character after them.
To do so, split with a regex that uses the look-around mechanism, like str.split("(?<=[0-9])(?=[A-Z])")
Demo:
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
for (String s : str.split("(?<=[0-9])(?=[A-Z])"))
System.out.println(s);
Output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
If letters can actually appear in the second part (after /), you can split at the positions that are followed by four letters and a /, like split("(?=[A-Z]{4}/)"). This assumes at least Java 8; on older versions you need to exclude a split at the start of the string manually, for instance by adding (?!^) or (?<=.) at the start of your regex.
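A minimal sketch of the four-letters-then-slash split follows, using a made-up input whose values contain letters. The names CodeSplitter and splitCodes are hypothetical, and the sketch assumes the codes are always four uppercase letters.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class CodeSplitter {

    static String[] splitCodes(String input) {
        // split before every "four uppercase letters followed by /";
        // (?!^) excludes a split at the very start on pre-Java-8 runtimes
        return input.split("(?!^)(?=[A-Z]{4}/)");
    }

    public static void main(String[] args) {
        // prints [ABCD/12XY34, DEFG/ab9, PQRS/141]
        System.out.println(Arrays.toString(splitCodes("ABCD/12XY34DEFG/ab9PQRS/141")));
    }
}
```

Uppercase letters inside a value do not cause extra splits, because the lookahead only succeeds where exactly four letters are immediately followed by /.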
You can use a regex with Pattern and Matcher:
Pattern pattern = Pattern.compile("[A-Z]{4}/[0-9]*");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
Instead of:
String[] codeDesc = path.split("/");
Just use this regex (4 characters before / and any characters after):
String[] codeDesc = path.split("(?=.{4}/)(?<=.)");
Even simpler using \d:
path.split("(?=[A-Za-z])(?<=\\d)");
EDIT:
Added a condition requiring 4 letters of either case:
path.split("(?=[A-Za-z]{4})(?<=\\d)");
output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
It is still unclear whether this is the author's expected result.
I want to split a string in Java like the one below. The normal split function splits the string but loses the split characters:
String str = "123{456]789[012*";
I want to split the string on the {, [, ], and * characters without losing them. I mean I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
Normally, the split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?
You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
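The difference between the two expressions on a block of consecutive delimiters can be checked with a small sketch. The class and method names are hypothetical; the regexes are the two given above.

```java
import java.util.Arrays;

// Hypothetical demo class for illustration only.
final class KeepDelimiters {

    // look-behind plus look-ahead: one split at the end of a delimiter block
    static String[] splitAfterBlock(String s) {
        return s.split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
    }

    // look-behind only: one split after every single delimiter
    static String[] splitAfterEach(String s) {
        return s.split("(?<=[{\\[\\]*])");
    }

    public static void main(String[] args) {
        String input = "123{456][789[012*";
        // prints [123{, 456][, 789[, 012*]
        System.out.println(Arrays.toString(splitAfterBlock(input)));
        // prints [123{, 456], [, 789[, 012*]
        System.out.println(Arrays.toString(splitAfterEach(input)));
    }
}
```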
Using a positive lookbehind:
(?<={|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]
I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
str.substring(0,4), str.substring(4,8), str.substring(8,12),
str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]
You can use a Pattern and a Matcher to find each splitting character and the index just after it.
public static List<String> split(String string, String splitRegex) {
    List<String> result = new ArrayList<String>();
    Pattern p = Pattern.compile(splitRegex);
    Matcher m = p.matcher(string);
    int index = 0;
    while (index < string.length()) {
        if (m.find()) {
            // keep the matched delimiter at the end of the current part
            int splitIndex = m.end();
            result.add(string.substring(index, splitIndex));
            index = splitIndex;
        } else {
            // no more delimiters: add the remainder and stop looping
            result.add(string.substring(index));
            break;
        }
    }
    return result;
}
Example code:
public static void main(String[] args) {
System.out.println(split("123{456]789[012*","\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]
I want to separate a string with special characters for example:
String s = ",?hello=glu()Stop/<><$#!gluglufazoperu";
I use the split function to obtain the normal characters:
hello
glu
Stop
gluglufazoperu
I have a problem: when I use split, it puts whitespace at the beginning of the string array. Does anyone know how to remove it?
Here is my code example:
public class Main {
public static void main(String[] args) {
String s = ",?hello=glu()Stop/<><$#!gluglufazoperu";
String f[] = s.split("[^\\w \\s]+");
int i= 0;
while(i < f.length){
System.out.println(f[i]);
i++;
}
}
}
This is the output:
(whitespace)
hello
glu
Stop
gluglufazoperu
There is an empty line because the input starts with the separator ",?", so split returns an empty string "" as the first element.
Your while loop then calls System.out.println(""), which prints an empty line.
If you only want to print the non-empty strings, replace your System.out.println with
if(!"".equals(f[i])){
System.out.println(f[i]);
}
As an aside, a little tip: take a look at the Oracle tutorial on the for loop.
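It is not whitespace but an empty string, because the leading ",?" is a separator too. An empty string can also appear at the end if you call split with a negative limit; the one-argument split drops trailing empty strings.
You might simply skip an empty string.
You might instead remove those from the array, which is costly, as it makes a copy.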
if (f.length > 1) {
    // drop a leading empty string
    if (f[0].isEmpty()) {
        f = Arrays.copyOfRange(f, 1, f.length);
    }
    if (f.length > 1) {
        // drop a trailing empty string
        if (f[f.length - 1].isEmpty()) {
            f = Arrays.copyOfRange(f, 0, f.length - 1);
        }
    }
}
Remarks:
String[] f = ... is more conventional in Java.
\\s already matches the space character, so the literal space in the character class is redundant. Also, do you really want to keep spaces?
You could simply replace the longest prefix of the string that matches your regular expression with the empty string before splitting:
final String regex = "[^\\w \\s]+";
String f[] = s.replaceAll("^"+regex, "")
.split(regex);
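Another way to avoid the empty first element is to match the tokens themselves rather than the separators. The sketch below uses [\w\s]+, the complement of the separator class in the question; the names NonEmptyTokens and tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class NonEmptyTokens {

    // a token is a run of word characters or whitespace,
    // i.e. everything the separator class [^\w\s] does not match
    private static final Pattern TOKEN = Pattern.compile("[\\w\\s]+");

    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group());   // a match is never empty because of the +
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hello, glu, Stop, gluglufazoperu]
        System.out.println(tokens(",?hello=glu()Stop/<><$#!gluglufazoperu"));
    }
}
```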