String operations: Split - java

I have a string consisting of file separators; e.g. "Bancs\Bancs_CP_P&MB.xml".
I want to separate the string based on "\".
Here's what I'm trying:
public class Stringoperations {
public static void main(String[] args) {
try {
String fileSeparator = System.getProperty("file.separator");
String TestCaseName = "Bancs" + fileSeparator + "Bancs_CP_P&MB.xml";
String[] tukde = TestCaseName.split("\\");
System.out.println(tukde[0]);
System.out.println(tukde[1]);
} catch (Exception e) {
e.getMessage();
} finally {
System.out.println("here");
}
}
}
But this is not working.

First, add e.printStackTrace(); or something similar to your catch block so you can see what is actually wrong:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
\
^
at java.util.regex.Pattern.error(Pattern.java:1924)
A backslash in a Java string literal lets you place special characters into a string:
String withTab = "a\tb";
prints as "a", a tab, then "b". To get a literal backslash in a Java string you need to escape it:
String withBackslash = "a\\b";
That is what you did in the split invocation: you passed a string containing a single backslash. Since String.split() evaluates the passed string as a regular expression (see the Javadoc for String.split()), that lone backslash is compiled as a regex. A backslash has a special meaning in regular expressions and cannot appear alone (see the Javadoc for Pattern). If you want to match a literal backslash, you need to escape it again at the regex level:
String[] tukde = TestCaseName.split("\\\\");
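The two escaping layers described above can be checked directly. This is a minimal sketch; the class name BackslashSplitDemo and the helper splitOnBackslash are made up for illustration, and the path is hard-coded with a backslash so the result does not depend on the operating system.

```java
import java.util.Arrays;

public class BackslashSplitDemo {
    // Hypothetical helper: splits on a literal backslash.
    // The Java literal below holds two backslash characters at runtime,
    // which the regex engine reads as one escaped (literal) backslash.
    static String[] splitOnBackslash(String path) {
        return path.split("\\\\");
    }

    public static void main(String[] args) {
        // This literal contains exactly one backslash at runtime.
        String path = "Bancs\\Bancs_CP_P&MB.xml";

        System.out.println("\\".length());   // 1: one backslash after literal escaping
        System.out.println("\\\\".length()); // 2: the regex that matches one backslash
        System.out.println(Arrays.toString(splitOnBackslash(path)));
    }
}
```

Passing the one-character string to split() instead is what triggers the PatternSyntaxException shown in the stack trace.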

First, putting that code into IntelliJ IDEA causes it to fuss at me with an illegal escape sequence. You have to escape the escape, so you'd be using \\\\ as valid backslash escape syntax.
Second, you should be splitting on fileSeparator, not a hard-coded backslash. The separator varies from system to system (e.g. I'm on Linux Mint, and my separators are all forward slashes). Because split() takes a regex and the Windows separator is a backslash, quote it first:
String[] tukde = TestCaseName.split(Pattern.quote(fileSeparator));
As a further note, no checked exceptions can be thrown here (only runtime ones), and blindly catching all exceptions isn't good practice.

Try this code:
public static void main(String[] args) {
try
{
String fileSeparator = System.getProperty("file.separator");
String TestCaseName = "Bancs"+fileSeparator+"Bancs_CP_P&MB.xml";
String[] tukde = TestCaseName.split("\\\\");
System.out.println(tukde[0]);
System.out.println(tukde[1]);
}catch(Exception e)
{
e.getMessage();
}
finally
{
System.out.println("here");
}
}
Output (on Windows, where the file separator is a backslash):
Bancs
Bancs_CP_P&MB.xml
here

What platform or OS are you working on? Maybe your default file separator is not "\". Try this:
public class Stringoperations {
public static void main(String[] args) {
try {
String fileSeparator = System.getProperty("file.separator");
String TestCaseName = "Bancs" + fileSeparator + "Bancs_CP_P&MB.xml";
// Uses the default separator returned by System.getProperty, quoted so a backslash is not parsed as regex syntax
String[] tukde = TestCaseName.split(java.util.regex.Pattern.quote(fileSeparator));
System.out.println(tukde[0]);
System.out.println(tukde[1]);
} catch (Exception e) {
e.getMessage();
} finally {
System.out.println("here");
}
}
}
EDIT:
Also, as Makoto points out, if you are bent on using a literal backslash to split, you need to pass "\\\\" and not "\\".

Try this:
String pattern = Pattern.quote(System.getProperty("file.separator"));
String[] splittedFileName = fileName.split(pattern);
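As a fuller sketch of the Pattern.quote approach: the helper below takes the separator as a parameter, so the same code can be tried with a backslash, a forward slash, or any other regex metacharacter. The class name SeparatorSplitDemo and the helper splitOnSeparator are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class SeparatorSplitDemo {
    // Hypothetical helper: Pattern.quote wraps the separator so that split()
    // treats every character in it literally, whatever the platform returns.
    static String[] splitOnSeparator(String fileName, String separator) {
        return fileName.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String sep = System.getProperty("file.separator");
        String fileName = "Bancs" + sep + "Bancs_CP_P&MB.xml";
        for (String part : splitOnSeparator(fileName, sep)) {
            System.out.println(part);
        }
    }
}
```

Quoting costs nothing on systems where the separator is "/", and it avoids the PatternSyntaxException on systems where it is a backslash.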

public class Stringoperations {
public static void main(String[] args) {
try {
String fileSeparator = System.getProperty("file.separator");
String TestCaseName = "Bancs" + fileSeparator + "Bancs_CP_P&MB.xml";
String[] tukde = TestCaseName.split("\\\\");
System.out.println(tukde[0]);
System.out.println(tukde[1]);
} catch (Exception e) {
e.getMessage();
} finally {
System.out.println("here");
}
}
}
try this

Related

How to replaces white spaces from String loaded through properties

I am loading a property from a file. The property contains a Windows path, and I need to normalize it to create a usable path. The problem is that I can't replace "\".
Here is my test class:
public class PathUtil {
public static String normalizeEscapeChars(String source) {
String result = source;
result = result.replace("\b", "/b");
result = result.replace("\f", "/f");
result = result.replace("\n", "/n");
result = result.replace("\r", "/r");
result = result.replace("\t", "/t");
result = result.replace("\\", "/");
result = result.replace("\"", "/\"");
result = result.replace("\'", "/'");
return result;
}
public static void main(String[] args) {
try(FileInputStream input = new FileInputStream("C:\\Users\\Rakieta\\Desktop\\aaa.properties")) {
Properties prop = new Properties();
prop.load(input);
System.out.println(PathUtil.normalizeEscapeChars(prop.getProperty("aaa")));
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Here is the property file:
aaa=Intermix\koza , intermix\trace
Actual output is :
Intermixkoza , intermix/trace
Needed output is :
Intermix/koza , intermix/trace
Any suggestions?

When I copied your code, my IDE reported that \k is not a valid escape sequence, so I removed this whole line:
result = result.replace("\k", "/k");
// I have not seen that escape sequence (correct me if I am wrong)
And my output was
aaa=Intermix/koza , intermix/trace
Or you can try what Connor suggested, that is:
result = result.replace("\\k", "/k");
// This replaces \k with /k in Intermix\koza, so it is effectively hard-coded.
which also gives the same result.

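The backslash is already interpreted by the java.util.Properties class when it loads the file: it treats a backslash as an escape character, so \k becomes k and \t becomes a tab.
To bypass this, you can extend Properties and add a load method that doubles every backslash before the parent class parses the input:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.Scanner;
public class PropertiesEx extends Properties {
// Overload for FileInputStream: doubles each backslash so that Properties keeps it literally.
public void load(FileInputStream fis) throws IOException {
// Properties.load(InputStream) assumes ISO-8859-1, so read and write with that charset.
Scanner in = new Scanner(fis, "ISO-8859-1");
ByteArrayOutputStream out = new ByteArrayOutputStream();
while (in.hasNextLine()) {
out.write(in.nextLine().replace("\\", "\\\\").getBytes(StandardCharsets.ISO_8859_1));
out.write('\n');
}
InputStream is = new ByteArrayInputStream(out.toByteArray());
super.load(is);
}
}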
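To see the effect without touching the file system, the same idea can be tried on an in-memory string. This is only a sketch: the class name PropertiesBackslashDemo and the helpers loadValue and loadPath are invented for the example, and it assumes every backslash in the input is meant literally (so line continuations in the properties text would not survive).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesBackslashDemo {
    // Hypothetical helper: parses properties text and returns the value for key
    // exactly as java.util.Properties interpreted it.
    static String loadValue(String propertiesText, String key) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prop.getProperty(key);
    }

    // Hypothetical helper: doubles every backslash before parsing so that
    // Properties keeps each one, then normalizes the separators to '/'.
    static String loadPath(String propertiesText, String key) {
        String escaped = propertiesText.replace("\\", "\\\\");
        return loadValue(escaped, key).replace('\\', '/');
    }

    public static void main(String[] args) {
        // Same content as the property file in the question.
        String fileContent = "aaa=Intermix\\koza , intermix\\trace";

        // Unmodified load: the backslash before k is dropped, the one before t yields a tab.
        System.out.println(loadValue(fileContent, "aaa"));

        // Pre-escaped load: prints Intermix/koza , intermix/trace
        System.out.println(loadPath(fileContent, "aaa"));
    }
}
```

If the properties file is under your control, writing the paths with forward slashes or doubled backslashes in the file itself avoids the need for any preprocessing.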

Use a double backslash \\ to escape a backslash in Java.

Set with duplicates java - import from file - java

I have a small project.
The project imports the txt file to String (coding similar to CSV - contains semicolons = ";").
In the next steps, the String is changed to ArrayList.
Then, using Predicate, I remove elements that do not interest me.
At the end I convert the ArrayList to a TreeSet to remove duplicates.
Unfortunately, there is a problem here because duplicates still occur.
I opened the file in Notepad++ and changed the encoding to ANSI to check for stray characters.
Everything looks fine, yet the duplicates are still there.
Uploaded input file - https://drive.google.com/open?id=1OqIKUTvMwK3FPzNvutLu-GYpvocUsSgu
Any idea?
public class OpenSCV {
private static final String SAMPLE_CSV_FILE_PATH = "/Downloads/all.txt";
public static void main(String[] args) throws IOException {
File file = new File(SAMPLE_CSV_FILE_PATH);
String str = FileUtils.readFileToString(file, "utf-8");
str = str.trim();
String str2 = str.replace("\n", ";").replace("\"", "" ).replace("\n\n",";").replace("\\*www.*\\","")
.replace("\u0000","").replace(",",";").replace(" ","").replaceAll(";{2,}",";");
List<String> lista1 = new ArrayList<>(Arrays.asList((str2.split(";"))));
Predicate<String> predicate = s -> !(s.contains("#"));
Set<String> removeDuplicates = new TreeSet<>(lista1);
removeDuplicates.removeIf(predicate);
String fileName2 = "/Downloads/allMails.txt";
try ( BufferedWriter bw =
new BufferedWriter (new FileWriter (fileName2)) )
{
for (String line : removeDuplicates) {
bw.write (line + "\n");
}
bw.close ();
} catch (IOException e) {
e.printStackTrace ();
}
}
}

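Before calling str.replace, you can try str.trim() to remove leading and trailing whitespace and control characters. Note that trim() only touches the ends of the string, not characters in the middle.
str = str.trim();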
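One possible cause, offered only as a guess since the input file is not shown here, is an invisible character inside individual entries, for example a carriage return left over from Windows line endings or a byte-order mark. Such entries look identical on screen but are different strings to a TreeSet. The sketch below cleans each entry rather than the whole text; the class name DedupDemo, the helpers normalize and dedup, and the sample addresses are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class DedupDemo {
    // Hypothetical helper: strips characters that are invisible in most editors.
    static String normalize(String s) {
        return s.replace("\r", "")      // carriage return from CRLF line endings
                .replace("\uFEFF", "")  // byte-order mark
                .trim();
    }

    // Hypothetical helper: splits on semicolons and newlines, cleans each
    // entry, and lets the TreeSet drop the duplicates.
    static Set<String> dedup(String text) {
        Set<String> result = new TreeSet<>();
        for (String field : text.split("[;\\n]")) {
            String cleaned = normalize(field);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample: the first entry ends with a hidden carriage return.
        String text = "a@x.org\r\nb@x.org\na@x.org";

        Set<String> naive = new TreeSet<>(Arrays.asList(text.split("\n")));
        System.out.println(naive.size());       // 3: the hidden character keeps a duplicate
        System.out.println(dedup(text).size()); // 2
    }
}
```

Printing each element's length(), or its characters as integers, is a quick way to confirm whether two entries that look the same really are.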

Using trim() in Java to remove parts of an output

I have some code I wrote that outputs a batch file output to a jTextArea. Currently the batch file outputs an active directory query for the computer name, but there is a bunch of stuff that outputs as well that I want to be removed from the output from the variable String trimmedLine. Currently it's still outputting everything else and I can't figure out how to get only the computer name to appear.
Output: "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET"
I want the output to instead just show only this:
FDCD111304
Can anyone show me how to fix my code to only output the computer name and nothing else?
btnPingComputer.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent arg0) {
String line;
BufferedWriter bw = null;
BufferedWriter writer =null;
try {
writer = new BufferedWriter(new FileWriter(tempFile));
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String lineToRemove = "OU=Workstations";
String s = null;
Process p = null;
try {
p = Runtime.getRuntime().exec("c:\\computerQuery.bat");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
StringBuffer sbuffer = new StringBuffer(); // new trial
BufferedReader in = new BufferedReader(new InputStreamReader(p
.getInputStream()));
try {
while ((line = in.readLine()) != null) {
System.out.println(line);
textArea.append(line);
textArea.append(String.format(" %s%n", line));
sbuffer.append(line + "\n");
s = sbuffer.toString();
String trimmedLine = line.trim();
if(trimmedLine.equals(lineToRemove)) continue;
writer.write(line + System.getProperty("line.separator"));
}
fw.write("commandResult is " + s);
String input = "CN=FDCD511304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(.*?)\\=(.*?)\\,");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group().substring(3, m.group().length() - 1);
System.out.println(currentVar); //store or do whatever you want
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} finally
{
try {
fw.close();
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
try {
in.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});

You could also use javax.naming.ldap.LdapName when dealing with distinguished names. It also handles escaping which is tricky with regex alone (i.e. cn=foo\,bar,dc=fl,dc=net is a perfectly valid DN)
String dn = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
LdapName ldapName = new LdapName(dn);
String commonName = (String) ldapName.getRdn(ldapName.size() - 1).getValue();
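The LdapName constructor throws the checked InvalidNameException, so a complete version needs a try/catch. The sketch below also looks the RDN up by type instead of by position, which is a design choice of this example rather than something the answer prescribes; the class name LdapNameDemo and the helper commonName are hypothetical.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

public class LdapNameDemo {
    // Hypothetical helper: returns the value of the first RDN of type CN that
    // getRdns() yields, or null if the name has none.
    static String commonName(String dn) {
        try {
            for (Rdn rdn : new LdapName(dn).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return String.valueOf(rdn.getValue());
                }
            }
            return null;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid distinguished name: " + dn, e);
        }
    }

    public static void main(String[] args) {
        String dn = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
        System.out.println(commonName(dn)); // FDCD111304
    }
}
```

In the question's loop this could be called on each line read from the batch file, with the result appended to the text area instead of the raw line.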

Well I would personally use the split() function to first get the parts split up and then parse out again. So my (probably unprofessional and buggy code) would be
String args[] = line.split(",");
String args2[] = args[0].split("=");
String computerName = args2[1];
And that would be where this is:
while ((line = in.readLine()) != null) {
System.out.println(line);
String trimmedLine = line.trim();
if (trimmedLine.equals(lineToRemove))
continue;
writer.write(line
+ System.getProperty("line.separator"));
textArea.append(trimmedLine);
textArea.append(String.format(" %s%n", line));
}

You can use a different regular expression and Matcher.matches() to find only the value you're looking for:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("(?:.*,)?CN=([^,]+).*");
Matcher matcher = pattern.matcher(str);
if(matcher.matches()) {
System.out.println(matcher.group(1));
} else {
System.out.println("No value for CN found");
}
FDCD111304
That regular expression will find the value for CN regardless of where in the string it is. The first group is to discard anything in front of CN= (we use a group starting with ?: here to indicate that the contents of the group should not be kept), then we match CN=, then the value, which may not contain a comma and then the rest of the string (which we don't care about).
You can also use a different regex and Matcher.find() to get both the keys and values and choose which keys to act on:
String str = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
Pattern pattern = Pattern.compile("([^=]+)=([^,]+),?");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String key = matcher.group(1);
String value = matcher.group(2);
if("CN".equals(key) || "DC".equals(key)) {
System.out.printf("%s: %s%n", key, value);
}
}
CN: FDCD111304
DC: FL
DC: NET

Try using substring to chop off the parts you don't require, creating a new string.

There are a few options. The simplest and dumbest:
str.substring(str.indexOf("=") + 1, str.indexOf(","))
The second, more flexible approach is to build a HashMap, which will be helpful in the future for reading other values.
Edit: Second method
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.HashMap;
public class HelloWorld{
public static void main(String []args){
String input = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
// Each match is one key=value pair; the value runs up to the next comma or the end of the input.
Pattern pattern = Pattern.compile("([^=,]+)=([^,]*)");
Matcher m = pattern.matcher(input);
while(m.find()) {
String currentVar = m.group(); // the whole pair, e.g. "CN=FDCD111304"
System.out.println(currentVar); //store or do whatever you want
}
}
}
This one will print every pair, like CN=FDCD111304; you can split each on '=' and store it in a key/value container like a HashMap, or just in a list.
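A sketch of the key/value container idea follows. Because keys such as OU and DC repeat, it maps each key to a list of values; that structure, the class name DnToMapDemo, and the helper parse are choices made for this example. It also assumes values contain no escaped commas; for those, LdapName is the safer tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DnToMapDemo {
    // Hypothetical helper: collects every key=value pair, keeping the order
    // in which keys first appear and all values for repeated keys.
    static Map<String, List<String>> parse(String dn) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile("([^=,]+)=([^,]*)").matcher(dn);
        while (m.find()) {
            String key = m.group(1).trim();
            String value = m.group(2).trim();
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "CN=FDCD111304,OU=Workstations,OU=SIM,OU=Accounts,DC=FL,DC=NET";
        Map<String, List<String>> parts = parse(input);
        System.out.println(parts.get("CN").get(0)); // FDCD111304
        System.out.println(parts);
    }
}
```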

Replacing \\u by \u in java string

I have a string which contains normal text and Unicode in between, for example "abc\ue415abc".
I want to replace all occurrences of \\u with \u. How can I achieve this?
I used the following code but it's not working properly.
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
int cp = Integer.parseInt(m.group(1), 16);
m.appendReplacement(buf, "");
buf.appendCodePoint(cp);
} catch (NumberFormatException e) {
}
}
m.appendTail(buf);
s = buf.toString();
Please help. Thanks in advance.

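From the API reference: http://developer.android.com/reference/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence)
You can use
public String replace (CharSequence target, CharSequence replacement)
In Java source, two backslashes followed by u must be written as "\\\\u" and one backslash followed by u as "\\u"; a bare "\u" is an incomplete Unicode escape and does not compile.
string = string.replace("\\\\u", "\\u");
or
String replacedString = string.replace("\\\\u", "\\u");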

Your initial string doesn't, in fact, have any double backslashes.
String s = "aaa\\u2022bbb\\u2014ccc";
yields a string that contains aaa\u2022bbb\u2014ccc, as \\ is just java string-literal escaping for \.
If you want unicode characters: (StackOverflow21028089.java)
import java.util.regex.*;
class StackOverflow21028089 {
public static void main(String[] args) {
String s = "aaa\\u2022bbb\\u2014ccc";
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
while (m.find()) {
try {
// see example:
// http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#appendReplacement%28java.lang.StringBuffer,%20java.lang.String%29
int cp = Integer.parseInt(m.group(1), 16);
char[] chars = Character.toChars(cp);
String rep = new String(chars);
System.err.printf("Found %d which means '%s'\n", cp, rep);
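// quoteReplacement keeps a decoded '$' or backslash from being read as replacement syntax
m.appendReplacement(buf, Matcher.quoteReplacement(rep));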
} catch (NumberFormatException e) {
System.err.println("Confused: " + e);
}
}
m.appendTail(buf);
s = buf.toString();
System.out.println(s);
}
}
=>
Found 8226 which means '•'
Found 8212 which means '—'
aaa•bbb—ccc
If you want aaa\u2022bbb\u2014ccc, that's what you started with. If you meant to start with a string literal with aaa\\u2022bbb\\u2014ccc, that's this:
String s = "aaa\\\\u2022bbb\\\\u2014ccc";
and converting it to the one with single slashes can be as simple as @Overv's code:
s = s.replaceAll("\\\\u", "\\u");
though since backslash has a special meaning in regex patterns and replacements (see Matcher's docs) (in addition to java parsing), this should probably be:
s = s.replaceAll("\\\\\\\\u", "\\\\u");
=>
aaa\u2022bbb\u2014ccc
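The two variants can be compared side by side. String.replace treats both arguments literally, so only Java string-literal escaping applies; replaceAll adds the regex layer on top, which is why it needs twice as many backslashes. This is a sketch with invented names (UnescapeBackslashDemo, collapseWithReplace, collapseWithReplaceAll).

```java
public class UnescapeBackslashDemo {
    // Literal replacement: two backslashes plus 'u' become one backslash plus 'u'.
    static String collapseWithReplace(String s) {
        return s.replace("\\\\u", "\\u");
    }

    // Regex replacement: each literal backslash must be escaped once more,
    // both in the pattern and in the replacement string.
    static String collapseWithReplaceAll(String s) {
        return s.replaceAll("\\\\\\\\u", "\\\\u");
    }

    public static void main(String[] args) {
        // At runtime this string contains two backslashes before each 'u'.
        String s = "aaa\\\\u2022bbb\\\\u2014ccc";
        System.out.println(s);
        System.out.println(collapseWithReplace(s));
        System.out.println(collapseWithReplaceAll(s));
    }
}
```

Both calls print the same line, with a single backslash before each u. When no pattern matching is needed, the literal replace is the easier one to read.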

Try this:
s = s.replace("\\\\u", "\\u");

There is a contains method and a replace method in String. Strings are immutable, so the result of replace has to be assigned back. That being said:
String hello = "hgjgu\\udfgyud\\\\ushddsjn\\hsdfds\\\\ubjn";
if(hello.contains("\\\\u"))
hello = hello.replace("\\\\u","\\u");
System.out.println(hello);
It will print: hgjgu\udfgyud\ushddsjn\hsdfds\ubjn

Regex working in the test program but not in the WebSphinx crawler

Here is my code for Regex matching which worked for a webpage:
public class RegexTestHarness {
public static void main(String[] args) {
File aFile = new File("/home/darshan/Desktop/test.txt");
FileInputStream inFile = null;
try {
inFile = new FileInputStream(aFile);
} catch (FileNotFoundException e) {
e.printStackTrace(System.err);
System.exit(1);
}
BufferedInputStream in = new BufferedInputStream(inFile);
DataInputStream data = new DataInputStream(in);
String string = new String();
try {
while (data.read() != -1) {
string += data.readLine();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Pattern pattern = Pattern
.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(string);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
But the same code doesn't work in the crawler code for which I'm testing the regex. My crawler code is (I'm using WebSphinx):
// Our own Crawler class extends the WebSphinx Crawler
public class MyCrawler extends Crawler {
MyCrawler() {
super(); // Do what the parent crawler would do
}
// We could choose not to visit a link based on certain circumstances
// For now we always visit the link
public boolean shouldVisit(Link l) {
// String host = l.getHost();
return false; // always visit a link
}
// What to do when we visit the page
public void visit(Page page) {
System.out.println("Visiting: " + page.getTitle());
String content = page.getContent();
System.out.println(content);
Pattern pattern = Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
This is my code for running the crawler:
public class WebSphinxTest {
public static void main(String[] args) throws MalformedURLException, InterruptedException {
System.out.println("Testing Websphinx. . .");
// Make an instance of own our crawler
Crawler crawler = new MyCrawler();
// Create a "Link" object and set it as the crawler's root
Link link = new Link("http://justeat.in/restaurant/spices/5633/indian-tandoor-chinese-and-seafood/sarjapur-road/bangalore");
crawler.setRoot(link);
// Start running the crawler!
System.out.println("Starting crawler. . .");
crawler.run(); // Blocking function, could implement a thread, etc.
}
}
A little detail about the crawler code. shouldvisit(Link link) filters whether to visit a link or not. visit(Page page) decides what to do when we get the page.
In the above example, test.txt and content contain the same string.

In your RegexTestHarness you're reading lines from a file and concatenating them without line breaks, after which you do your matching (readLine() returns the contents of the line without the line break!).
The input to your MyCrawler class, on the other hand, probably does contain line break characters. And since the regex meta-char . by default does not match line break chars, the pattern doesn't match in MyCrawler.
To fix this, prepend (?s) to all your patterns that contain a . meta char. So:
Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
would become:
Pattern.compile("(?s)<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
The DOT-ALL flag, (?s), will cause the . to match any character, including line break chars.
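The difference is easy to reproduce with a small string that contains line breaks. This is a minimal sketch: the class name DotAllDemo, the helper firstTitle, and the sample markup are made up for the example, while the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: returns the first captured title, or null if the
    // pattern does not match. The dotAll switch adds the (?s) prefix.
    static String firstTitle(String html, boolean dotAll) {
        String regex = (dotAll ? "(?s)" : "")
                + "<div class=\"rest_title\">.*?<h1>(.*?)</h1>";
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Invented sample with a line break between the div and the h1.
        String html = "<div class=\"rest_title\">\n  <h1>Spices</h1>\n</div>";

        System.out.println(firstTitle(html, false)); // null: '.' stops at the line break
        System.out.println(firstTitle(html, true));  // Spices
    }
}
```

Passing Pattern.DOTALL as the second argument to Pattern.compile has the same effect as the (?s) prefix.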
