How to split a java string based on newline character - java

I have a string in Java defined as below:
String numbers = null;
for (int i= 0; i < contactNumberList.size();i++)
{
numbers = contactNumberList.get(i) + "\n" + numbers;
}
where contactNumberList contains four items: 9891, 3432, 5432, 9890.
So after the loop I have the numbers string as:
9890\n5432\n3432\n9891\nnull
Then I passed the above string through the following APIs:
String toUnicodeEncoded = StringEscapeUtils.escapeJava(numbers);
toUnicodeEncoded = StringEscapeUtils.escapeXml10(toUnicodeEncoded);
Now, when I print the string toUnicodeEncoded character by character as below:
for (int i =0;i<toUnicodeEncoded.length();i++)
{
Logger.log("char at "+i + " = "+(toUnicodeEncoded.charAt(i)));
}
it gives:
char at 0 = 9
char at 1 = 8
char at 2 = 9
char at 3 = 0
char at 4 = \
char at 5 = n
and so on.
My point is that "\n" became two characters, \ and n.
Now I want to split the string toUnicodeEncoded on "\n" using the following API:
String lines[] = toUnicodeEncoded.split("\\n");
But it is not able to split it, because "\n" has now become \ and n. How do I split the toUnicodeEncoded string by "\n", the newline character?
Basically, I want the output as:
9890
5432
3432
9891
null
i.e. all four numbers. How do I do it?

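Splitting on "\n" works as long as the string still contains real newline characters; for such input it is better to use System.getProperty("line.separator") instead of a hard-coded \n. After escapeJava, however, each newline has become the two characters \ and n, so the regex has to match a literal backslash followed by n, which is written "\\\\n" in Java source:
String s="9890\n5432\n3432\n9891\nnull";
s = StringEscapeUtils.escapeJava(s);
s = StringEscapeUtils.escapeXml10(s);
for (String number : s.split("\\\\n")) {
System.out.println(number);
}
Result: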
9890
5432
3432
9891
null

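This should do the trick for a string that still contains real line breaks; it splits on any run of \r and \n characters:
String[] lines = numbers.split("[\\r\\n]+");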

Thanks, everybody, for replying. I got it working with the following approach, which splits on the literal two-character sequence \ and n:
String pattern = Pattern.quote("\\" + "n");
String lines[] = toUnicodeEncoded.split(pattern);
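To make the difference between the two cases concrete, here is a minimal sketch that does not depend on StringEscapeUtils. The class and method names are made up for illustration, and the escaped input is written out by hand as an assumption of what escapeJava produces for this data (backslash followed by n in place of each newline).

class EscapedNewlineSplit {
    // Splits on the literal two-character sequence: backslash followed by 'n'.
    static String[] splitEscaped(String escaped) {
        return escaped.split(java.util.regex.Pattern.quote("\\n"));
    }

    // Splits on real line breaks (LF or CRLF).
    static String[] splitLines(String raw) {
        return raw.split("\\r?\\n");
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the escaped string: "\\n" here is backslash + n.
        String escaped = "9890\\n5432\\n3432\\n9891\\nnull";
        for (String part : splitEscaped(escaped)) {
            System.out.println(part);
        }
        // The same call on a string with real newlines finds nothing to split on.
        System.out.println(splitEscaped("9890\n5432").length);
    }
}

If the split can be done before the escaping step, splitLines on the raw string avoids the problem altogether.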

Related

Removing whitespaces at the beginning of the string with Regex gives null Java

I would like to get groups from a string that is loaded from txt file. This file looks something like this (notice the space at the beginning of file):
as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655
The first part of each record, up to the first comma, can contain digits and letters; the second and third parts contain only digits. After each | the structure repeats.
First, I load the txt file into a string:
String readFile3 = readFromTxtFile("/resources/file.txt");
Then I remove all whitespace with a regex:
String no_whitespace = readFile3.replaceAll("\\s+", "");
After that I try to get the groups:
Pattern p = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*", Pattern.MULTILINE);
Matcher m = p.matcher(no_whitespace);
int lastMatchPos = 0;
while (m.find()) {
System.out.println(m.group());
lastMatchPos = m.end();
}
if (lastMatchPos != no_whitespace.length())
System.out.println("Invalid string!");
Now I would like to remove the "," from each group and assign every value to its own variable, but I am getting these groups (notice the null):
nullas431431af,87546,3214
5a341fafaf,3365,54465
6adrT43,5678,5655
What am I doing wrong? Even when I physically remove the space from the beginning of the txt file, the same result occurs.
Is there an easier way to get the groups from this string with a regex and assign each part, separated by ",", to its own variable?
You can split on | surrounded by optional whitespace and then split each resulting item on , surrounded by optional whitespace:
String str = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String[] items = str.split("\\s*\\|\\s*");
List<String[]> res = new ArrayList<>();
for(String i : items) {
String[] parts = i.split("\\s*,\\s*");
res.add(parts);
System.out.println(parts[0] + " - " + parts[1] + " - " + parts[2]);
}
This prints:
as431431af - 87546 - 3214
5a341fafaf - 3365 - 54465
6adrT43 - 5678 - 5655
The results are in the res list.
Note that
\s* - matches zero or more whitespaces
\| - matches a pipe char
The pattern that you tried uses only the optional quantifier *, so it could also match a string consisting of nothing but commas.
You also don't need Pattern.MULTILINE, as there are no anchors in the pattern.
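As a quick illustration of that point, this small sketch compares the original pattern with a + variant of it. The class and constant names are hypothetical; only the two regexes come from the discussion.

class QuantifierDemo {
    // The pattern from the question: every part is optional.
    static final java.util.regex.Pattern STAR =
            java.util.regex.Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*");
    // Same shape, but every part needs at least one character.
    static final java.util.regex.Pattern PLUS =
            java.util.regex.Pattern.compile("[a-zA-Z0-9]+,\\d+,\\d+");

    public static void main(String[] args) {
        System.out.println(STAR.matcher(",,").matches());                // true
        System.out.println(PLUS.matcher(",,").matches());                // false
        System.out.println(PLUS.matcher("6adrT43,5678,5655").matches()); // true
    }
}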
You can use three capture groups with + as the quantifier to match one or more occurrences, and after each record either match a pipe | or assert the end of the string with $:
([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\||$)
For example
String readFile3 = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String no_whitespace = readFile3.replaceAll("\\s+", "");
Pattern p = Pattern.compile("([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\\||$)");
Matcher matcher = p.matcher(no_whitespace);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
System.out.println("--------------------------------");
}
Output
as431431af
87546
3214
--------------------------------
5a341fafaf
3365
54465
--------------------------------
6adrT43
5678
5655
--------------------------------

Length of String within tags in java

We need to find the length of the tag names within the tags in Java:
{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}
So the length of the Student tag is 7, that of the Subject tag is 7, and that of the Marks tag is 5.
I am trying to split the tags and then find the length of each string within a tag.
But the code I am trying gives me only the first tag name and not the others.
Can you please help me with this?
I am very new to Java. Please let me know if this is a very silly question.
Code part:
System.out.println(
getParenthesesContent("{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}"));
public static String getParenthesesContent(String str) {
return str.substring(str.indexOf('{')+1,str.indexOf('}'));
}
You can use Pattern and Matcher with the regex \\{([a-zA-Z]*)\\}:
String text = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
Matcher matcher = Pattern.compile("\\{([a-zA-Z]*)\\}").matcher(text);
while (matcher.find()) {
System.out.println(
String.format(
"tag name = %s, Length = %d ",
matcher.group(1),
matcher.group(1).length()
)
);
}
Outputs
tag name = Student, Length = 7
tag name = Subject, Length = 7
tag name = Marks, Length = 5
You might want to give a try to another regex:
String s = "{Abc}{Defg}100{Hij}100{/Klmopr}{/Stuvw}"; // just a sample String
Pattern p = Pattern.compile("\\{\\W*(\\w++)\\W*\\}");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + ", length: " + m.group(1).length());
}
Output you get:
Abc, length: 3
Defg, length: 4
Hij, length: 3
Klmopr, length: 6
Stuvw, length: 5
If you need to use charAt() to walk over the input String, you might want to consider using something like this (I made some explanations in the comments to the code):
String s = "{Student}{Subject}{Marks}100{/Marks}{/Subject}{/Student}";
ArrayList<String> tags = new ArrayList<>();
for(int i = 0; i < s.length(); i++) {
StringBuilder sb = new StringBuilder(); // Collects the letters of the current tag name (more efficient than "+=" on a String)
if(s.charAt(i) == '{') { // If start of tag is found...
while(i < s.length() && !(Character.isLetter(s.charAt(i)))) { // Skip characters that are not letters, without running past the end
i++;
}
while(i < s.length() && Character.isLetter(s.charAt(i))) { // Append the letters that are found
sb.append(s.charAt(i));
i++;
}
if(!(tags.contains(sb.toString()))) { // Add final String to ArrayList only if it not contained here yet
tags.add(sb.toString());
}
}
}
for(String tag : tags) { // Printing Strings contained in ArrayList and their length
System.out.println(tag + ", length: " + tag.length());
}
Output you get:
Student, length: 7
Subject, length: 7
Marks, length: 5
Yes, use a regular expression: find the pattern and apply it.

Java code to process special characters that need to be replaced by other special characters

I am writing Java code to process a string received from a mainframe that contains special characters that need to be replaced by other special characters. My search characters are §ÄÖÜäüßö#[\]~{¦} and the replacement characters are #[\]{}~¦§ÄÖÜßäöü, so if the string has a { in it, I need to replace it with ä. An example of my input is "0.201322.05.2017LM-R{der Dopp".
My code currently is:
String repChar = "§ÄÖÜäüßö#[\\\\]~{¦}#[\\\\]{}~¦§ÄÖÜßäöü";
// Split String and Convert
String repCharin = repChar.substring(0, repChar.length()/2-1);
String repCharout = repChar.substring(repChar.length()/2, repChar.length()-1);
String strblob = new String(utf8ContentIn);
// Convert
for (int j=0; j < repCharin.length();j++) {
strblob = strblob.replace(repCharin.substring(j, 1), repCharout.substring(j, 1));
}
byte [] utf8Content = strblob.getBytes();
But it generates the following error
java.lang.StringIndexOutOfBoundsException at
java.lang.String.substring(String.java:1240)
The \\ sequences are escaped characters; I only need a single \.
The code
String utf8ContentIn = "0.201322.05.2017LM-R{der Dopp";
String repChar = "§ÄÖÜäüßö#[\\]~{¦}#[\\]{}~¦§ÄÖÜßäöü";
// Split String and Convert
String repCharin = repChar.substring(0, repChar.length() / 2);
String repCharout = repChar.substring(repChar.length() / 2, repChar.length());
String strblob = new String(utf8ContentIn);
String output = strblob.chars().mapToObj(c -> {
char ch = (char) c;
int index = repCharin.indexOf(c);
if (index != -1) {
ch = repCharout.charAt(index);
}
return String.valueOf(ch);
}).collect(Collectors.joining());
System.out.println(output);
will print "0.201322.05.2017LM-Räder Dopp", as you expect. Besides the incorrect indexes used when splitting repChar, the problem is that you should iterate over the input string rather than over your replacement characters. Otherwise you can run into a situation where you replace Ä with [ and later treat that [ as a special character again, replacing it a second time with Ä.
Also, a single backslash is escaped with a single backslash, so to get \ you need to write \\.
Hope it helps!
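To show the double-replacement pitfall in isolation, here is a small sketch with a cut-down, hypothetical two-character table (Ä maps to [ and [ maps to Ä, written with a Unicode escape so the source stays ASCII). The class and method names are made up for illustration.

class SwapDemo {
    // Hypothetical mini table: \u00C4 (A-umlaut) -> '[' and '[' -> \u00C4.
    static final String FROM = "\u00C4[";
    static final String TO = "[\u00C4";

    // Replaces one table entry at a time over the whole string,
    // so a later pass can hit characters produced by an earlier pass.
    static String sequential(String input) {
        String out = input;
        for (int i = 0; i < FROM.length(); i++) {
            out = out.replace(FROM.charAt(i), TO.charAt(i));
        }
        return out;
    }

    // Walks the input once and translates each character exactly once.
    static String singlePass(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int idx = FROM.indexOf(c);
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "\u00C4[";
        // sequential turns both characters into A-umlaut; singlePass swaps them.
        System.out.println(sequential(input).equals("\u00C4\u00C4")); // true
        System.out.println(singlePass(input).equals("[\u00C4"));      // true
    }
}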

replaceFirst for character "`"

First time here. I'm trying to write a program that takes a string input from the user and encodes it using the replaceFirst method. All letters and symbols, with the exception of "`" (grave accent), encode and decode properly.
e.g. When I input
`12
I am supposed to get 28AABB as my encryption, but instead, it gives me BB8AA2
import javax.swing.JOptionPane;
public class CryptoString {
public static void main(String[] args) {
String input = JOptionPane.showInputDialog(null, "Enter the string to be encrypted");
JOptionPane.showMessageDialog(null, "The message " + input + " was encrypted to be " + encrypt(input));
}
public static String encrypt(String s) {
String encryptThis = s.toLowerCase();
String encryptThistemp = encryptThis; // unmodified copy that the loop reads from
int encryptThislength = encryptThis.length();
for (int i = 0; i < encryptThislength; ++i) {
String test = encryptThistemp.substring(i, i + 1);
//Took out all code with regard to all cases OTHER than "`" "1" and "2"
//All other cases would have followed the same format, except with a different string replacement argument.
if (test.equals("`")) {
encryptThis = encryptThis.replaceFirst("`", "28");
}
else if (test.equals("1")) {
encryptThis = encryptThis.replaceFirst("1", "AA");
}
else if (test.equals("2")) {
encryptThis = encryptThis.replaceFirst("2", "BB");
}
}
return encryptThis;
}
}
I've tried putting escape characters in front of the grave accent, however, it is still not encoding it properly.
Take a look at what your program does in each loop iteration (the characters are read from the unmodified copy encryptThistemp, while the replacements are applied to encryptThis):
i=0
we read the character at position 0, which is `, so
we replace the first ` with 28, making `12 -> 2812
i=1
we read the character at position 1, which is 1, so
we replace the first 1 with AA, making 2812 -> 28AA2
i=2
we read the character at position 2, which is 2, so
we replace the first 2 with BB, but the first 2 is now the one that belongs to 28, making 28AA2 -> BB8AA2
Try using appendReplacement from the Matcher class in the java.util.regex package instead, like this:
public static String encrypt(String s) {
Map<String, String> replacementMap = new HashMap<>();
replacementMap.put("`", "28");
replacementMap.put("1", "AA");
replacementMap.put("2", "BB");
Pattern p = Pattern.compile("[`12]"); //regex that will match ` or 1 or 2
Matcher m = p.matcher(s);
StringBuffer sb = new StringBuffer();
while (m.find()){//we found one of `, 1, 2
m.appendReplacement(sb, replacementMap.get(m.group()));
}
m.appendTail(sb);
return sb.toString();
}
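If you would rather avoid regex here, a single pass over the characters gives the same result for this case. This is only a sketch under assumptions: the class name and the CODES table are made up, only the three mappings from the question are included, and passing unmapped characters through unchanged is my assumption about the desired behaviour.

class SinglePassEncoder {
    // Hypothetical lookup table holding just the three cases shown in the question.
    private static final java.util.Map<Character, String> CODES = new java.util.HashMap<>();
    static {
        CODES.put('`', "28");
        CODES.put('1', "AA");
        CODES.put('2', "BB");
    }

    // Encodes each input character exactly once, so output that has already
    // been written is never searched or replaced again.
    static String encrypt(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toLowerCase().toCharArray()) {
            String code = CODES.get(c);
            out.append(code != null ? code : String.valueOf(c)); // assumption: unmapped characters pass through
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("`12")); // prints 28AABB
    }
}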
encryptThistemp.substring(i, i + 1): note that in Java the second parameter of substring is the end index (exclusive), not a length, so this call always yields exactly one character. The substring call is therefore not the problem; the replaceFirst calls are, because they search the partly encoded string from its start.

removing space before new line in java

I have a space before a new line in a string and can't remove it (in Java).
I have tried the following, but nothing works:
strToFix = strToFix.trim();
strToFix = strToFix.replace(" \n", "");
strToFix = strToFix.replaceAll("\\s\\n", "");
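strToFix = strToFix.replaceAll("[ \t]+(\r\n?|\n)", "$1");
replaceAll takes a regular expression as its first argument. The [ \t]+ matches one or more spaces or tabs. The (\r\n?|\n) matches a line break and captures it as group 1, which the replacement $1 puts back. Strings are immutable, so the result has to be assigned.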
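Here is a minimal sketch of that call wrapped in a helper so it can be tried on a sample string. The class name, method name, and sample input are made up for illustration.

class TrailingSpaceDemo {
    // Removes spaces and tabs that sit directly before a line break,
    // keeping the line break itself (LF, CR, or CRLF) via group 1.
    static String stripBeforeNewline(String s) {
        return s.replaceAll("[ \t]+(\r\n?|\n)", "$1");
    }

    public static void main(String[] args) {
        String fixed = stripBeforeNewline("one \ntwo\t \r\nthree");
        System.out.println(fixed.equals("one\ntwo\r\nthree")); // true
    }
}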
Try this:
strToFix = strToFix.replaceAll(" \\n", "\n");
\ is a special character in regex, so you need to escape it with another \.
I believe with this one you should try this instead:
strToFix = strToFix.replaceAll(" \\n", "\n");
Edit:
I forgot the escape in my original answer. James.Xu in his answer reminded me.
Are you sure?
String s1 = "hi ";
System.out.println("|" + s1.trim() + "|");
String s2 = "hi \n";
System.out.println("|" + s2.trim() + "|");
prints
|hi|
|hi|
Are you sure it is a space that you're trying to remove? You should print the string's bytes and see whether the value in question is actually 32 (decimal), which is 20 in hexadecimal.
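One way to check is to dump the code points of the string. The sketch below uses a made-up class and method name, and the non-breaking space U+00A0 in the sample is only an assumed example of a character that looks like a space but is not one; trim() strips only characters up to U+0020, so it would leave that character in place.

class CodePointDump {
    // Returns the code points of s as space-separated U+XXXX values.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // An ordinary space shows up as U+0020.
        System.out.println(describe("hi \n"));
        // A non-breaking space shows up as U+00A0 instead.
        System.out.println(describe("hi\u00A0\n"));
    }
}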
trim() seems to do what you're asking on my system. Here's the code I used; maybe you want to try it on your system:
public class so5488527 {
public static void main(String [] args)
{
String testString1 = "abc \n";
String testString2 = "def \n";
String testString3 = "ghi \n";
String testString4 = "jkl \n";
testString3 = testString3.trim();
System.out.println(testString1);
System.out.println(testString2.trim());
System.out.println(testString3);
System.out.println(testString4.trim());
}
}
