Regarding Java String Manipulation - java

After the split command, items[1] holds the string "MO""RET". I then call replaceAll on this string, and it removes all the double quotes.
But I want it to be stored as MO"RET. How do I do that? In the CSV file that I process with the split command, double quotes within the contents of a text field are doubled (example: "This account is a ""large"" one"). So I want to keep one of the two quotes when a quote is repeated in the middle of the string, and drop the enclosing quotes at the ends if they are present. How can I do it?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this, recordType has MORET, but I want it to have MO"RET.

Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
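// Quote the separator so that characters such as '|' or '.' are not treated as regex syntax.
fields.add(fieldBuilder.toString().replaceAll(Pattern.quote(String.valueOf(separator)) + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());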
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there is a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
You can, however, also grab any 3rd-party Java CSV API.

How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );
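Note that this one-liner collapses the doubled quote but leaves the enclosing pair in place, so "MO""RET" becomes "MO"RET". A minimal sketch of doing both steps, assuming a field is either fully enclosed in quotes or not quoted at all (the class and method names here are made up for illustration, they are not from the question):

```java
// Hypothetical helper; CsvFieldUnquoter and unquote are illustrative names.
class CsvFieldUnquoter {

    /** Strips one pair of enclosing quotes, then turns each doubled quote into a single one. */
    static String unquote(String field) {
        String s = field;
        // Only strip when the field both starts and ends with a quote.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        // replace() takes literal text, so no regex escaping is needed.
        return s.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\""));                            // MO"RET
        System.out.println(unquote("\"This account is a \"\"large\"\" one\"")); // This account is a "large" one
        System.out.println(unquote("plain"));                                    // plain
    }
}
```

Stripping the outer pair first matters: if the doubled quotes were collapsed first, a field such as """" (a single quote character) could no longer be told apart from an empty quoted field.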

I would suggest using replace instead of replaceAll.
replaceAll treats its first argument as a regular expression, whereas replace takes it literally.
The requirement is to replace two consecutive quotes with a single quote:
String recordType = items[1].replace( "\"\"", "\"" );
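To see the difference between replace and replaceAll, run the code below. replace treats both arguments as literal text, while replaceAll reads the first as a regex and interprets $ in the second as a group reference: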
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
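The same difference shows up with any regex metacharacter. Here is a small sketch using a dot instead of a dollar sign (the class and method names are only for illustration):

```java
import java.util.regex.Pattern;

// Illustrative class; not part of the original answer.
class ReplaceVsReplaceAll {

    /** Literal replacement: only real dots are replaced. */
    static String literal(String s) {
        return s.replace(".", "-");
    }

    /** Regex replacement: "." matches any character, so every character is replaced. */
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    /** Regex replacement with the search text quoted, which behaves like the literal version. */
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));     // a-b-c
        System.out.println(regex("a.b.c"));       // -----
        System.out.println(quotedRegex("a.b.c")); // a-b-c
    }
}
```

When the text to find is fixed, replace is the simpler choice; Pattern.quote is only needed when a literal has to be embedded in a larger regex.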

Here you can use the regular expression.
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement removes a quote at the beginning of a word.
The second statement removes a quote at the end of a word.

Related

Result separation with regex

I wrote a parser that reads a file line by line and parses it with a regex statement. (Regex below)
case "countries":
pattern = "\\\"(.+?)\\\"(\\s+)?(\\((.+?)\\))?(\\s+)?(\\{(.+?)\\(\\#(.+?)\\)\\})?(\\s+)?(.+)";
substitution = "$1, $4, $7, $8, $10";
break;
This outputs a list with all the groups I want, with each group separated by a comma (through result.split(",")).
Now let's say I don't want to use a comma but a | or a * instead. Changing the comma to any other string doesn't seem to change anything. What am I missing?
try (CSVWriter csvWriter = new CSVWriter(new FileWriter(myLocalPath + "CSV/" + choice.toLowerCase() + ".csv")))
{
Pattern r = Pattern.compile(pattern);
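String nextLine;
// Read each line exactly once; calling readLine() in both the condition and the body skips every other line.
while ((nextLine = br.readLine()) != null)
{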
Matcher matcher = r.matcher(nextLine);
String result = matcher.replaceAll(substitution);
String[] line = result.split("lorem");
csvWriter.writeNext(line, false);
}
}catch(Exception e){
System.out.println(e);
System.out.println("Parsing done!");
}
It seems what you're missing is Pattern.quote. The argument of split is a regex, so if the delimiter must be read literally, quote it:
String[] line = result.split(Pattern.quote("..."));
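As a rough illustration of why this matters for | in particular, here is a small sketch (the class name, method name and sample data are invented for the example):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative class; not taken from the question's parser.
class LiteralSplit {

    /** Splits on the delimiter taken literally, whatever regex metacharacters it contains. */
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String row = "Belgium|BE|Europe";

        // Unquoted, "|" is an alternation of two empty patterns, so it matches
        // between every pair of characters instead of at the pipes.
        System.out.println(row.split("|").length);

        // Quoted, the pipe is an ordinary character.
        System.out.println(Arrays.toString(splitLiteral(row, "|"))); // [Belgium, BE, Europe]
        System.out.println(Arrays.toString(splitLiteral("a*b*c", "*"))); // [a, b, c]
    }
}
```

An unquoted "*" is worse still: it is not a valid regex on its own, so split("*") throws a PatternSyntaxException.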

Java: Remove a semicolon in a string only if its after a word and quotation mark

Here's my problem:
I need to remove a semicolon in a String but this String comes from a semicolon separated file in excel.
I need to replace a semicolon only if there's a quotation mark after the word.
ie:
data1;data2;"This is a duck;";data3;"Here's another duck";
needs to be replaced by:
data1;data2;"This is a duck";data3;"Here's another duck";
What is the best way to do this ?
Edit: Here's what I tried:
String line = myLine;
line.replaceAll(("\\w*;"),$1);
but I can't make it work, and I don't think it's the best way to do it. I also tried
line = line.replaceAll(";\"", "\"");
But that doesn't work because it turns
data1;data2;"This is a duck;";data3;"Here's another duck";
into
data1;data2"This is a duck";data3"Here's another duck";
If you only want a regex:
public static void main (String[] args) throws java.lang.Exception
{
Pattern p = Pattern.compile ( "\"(.*);\"");
String input1 = "\"This is duck;\"";
String input2 = "This is duck;";
Matcher m = p.matcher(input1);
if ( m.find() )
{
input1 = m.group(1);
System.out.println( "Modified input1 is : " + input1 );
}
else
{
System.out.println( "input1 is not modified" );
}
m = p.matcher(input2);
if ( m.find() )
{
input2 = m.group(1);
System.out.println( "Modified input2 is : " + input2 );
}
else
{
System.out.println( "input2 is not modified" );
}
}
Output :
Modified input1 is : This is duck
input2 is not modified
Probably not the best but easiest way:
String str = "\"This is a duck;\"";
str = str.replace(";\"", "\"");
You should not use regex for that. You should use a csv parser and writer.
For example, here's how you can do it with OpenCSV:
CSVReader reader = new CSVReader(new FileReader("myCsv.csv"),';');
CSVWriter writer = new CSVWriter(new FileWriter("corrected.csv"), ';');
String[] lineTokens;
while ((lineTokens = reader.readNext()) != null) {
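for (int i = 0; i < lineTokens.length; i++) {
// Strings are immutable, so the result of replace() must be stored back.
lineTokens[i] = lineTokens[i].replace(";", "");
}
writer.writeNext(lineTokens);
}
reader.close();
writer.close();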
If you need to do an operation on a csv, use the right tool. Help yourself and use a csv parser.
Use regex with positive lookahead assertion
/;(?=")/g
Example :
String pattern = ";(?=\")";
String updated = STRING.replaceAll(pattern, "");
or
String pattern = ";\"";
String updated = STRING.replaceAll(pattern, "\"");
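One caveat: on the full sample line from the question, ;(?=") also matches the separator in front of an opening quote (data2;"This...), which is the result the asker already ruled out. A possible refinement, sketched below, is to require that the quote is itself followed by a separator or the end of the line, so only a semicolon before a closing quote matches. This is only a sketch (the class and method names are invented), and it can still misfire on unusual fields, for example a quoted field that contains nothing but a semicolon, so a CSV parser remains the safer route.

```java
// Illustrative class; the regex is a tightened variant of the lookahead above.
class ClosingQuoteSemicolon {

    // A ';' whose next character is a quote that closes a field,
    // i.e. the quote is followed by ';' or by the end of the input.
    private static final String BEFORE_CLOSING_QUOTE = ";(?=\"(?:;|$))";

    static String strip(String line) {
        return line.replaceAll(BEFORE_CLOSING_QUOTE, "");
    }

    public static void main(String[] args) {
        String line = "data1;data2;\"This is a duck;\";data3;\"Here's another duck\";";
        System.out.println(strip(line));
        // data1;data2;"This is a duck";data3;"Here's another duck";
    }
}
```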

Deal with PatternSyntaxException and scanning texts

I want to find names in a collection of text documents from a huge list of about 1 million names. I'm making a Pattern from the names of the list first:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += name.replace("\"", "") + "|";
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException, because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring those few names:
if (name.contains("\"")) {
// ignore this name
}
That didn't work properly, and it is also messy, because you have to escape everything manually and rerun it many times.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I don't find any matches in the text documents anymore, even when I also use quote on them. How can I solve this issue?
I agree with the comment by @dragon66: you should not quote the pipe "|". Your code would then look like the code below, using Pattern.quote() on each name only:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += Pattern.quote(name.replace("\"", "")) + "|"; //line changed
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
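// Drop the trailing "|"; otherwise the pattern ends with an empty alternative that matches everywhere.
if (combined.endsWith("|")) combined = combined.substring(0, combined.length() - 1);
Pattern all = Pattern.compile(combined);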
I also suggest checking whether your problem needs optimization: building the pattern with a mutable StringBuilder instead of concatenating onto String combined = ""; avoids creating unnecessary new strings inside the loop.
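Another way to put the pieces together is to quote each name and join the results with "|". Joining with a separator, rather than appending "|" after every name, never leaves a trailing "|" behind. A minimal sketch, assuming the names have already been read into a list (the class name, method name and sample names are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative class; reading names.tsv is left out to keep the sketch short.
class NamePattern {

    /** Builds one alternation in which every name is matched literally. */
    static Pattern compile(List<String> names) {
        String alternation = names.stream()
                .filter(name -> !name.isEmpty()) // an empty name would match at every position
                .map(Pattern::quote)             // quote each name, but not the "|" between them
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    /** Counts how many matches the pattern finds in the text. */
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern all = compile(Arrays.asList("C++", "A.B"));
        // "C++" is found literally; "AxB" is not, because the dot in "A.B" is quoted.
        System.out.println(countMatches(all, "I like C++ and AxB")); // 1
    }
}
```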
guilhermerama presented the bug fix for your code.
I will add some performance improvements. As I pointed out, Java's regex library does not scale well and is even slower when used for searching.
But one can do better with multi-string search algorithms. For example, by using StringsAndChars String Search:
//setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines , CREATE, WRITE, TRUNCATE_EXISTING);
// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined.add(name);
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) takes less than 1 millisecond on my computer. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
The setup needs the following code:
private static Iterable<String> createLines() {
List<String> list = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
list.add(i + "\t" + name(i));
}
return list;
}
private static String name(int i) {
String s = String.valueOf(i);
while (s.length() < 5) {
s = '0' + s;
}
return s;
}

Buffered Reader find specific line separator char then read that line

My program needs to read from a multi-line .ini file. I've got it to the point where it reads every line that starts with a # and prints it, but I only want to record the value after the = sign. Here's what the file should look like:
#music=true
#Volume=100
#Full-Screen=false
#Update=true
This is what I want it to print:
true
100
false
true
This is the code I'm currently using:
@SuppressWarnings("resource")
public void getSettings() {
try {
BufferedReader br = new BufferedReader(new FileReader(new File("FileIO Plug-Ins/Game/game.ini")));
String input = "";
String output = "";
while ((input = br.readLine()) != null) {
String temp = input.trim();
temp = temp.replaceAll("#", "");
temp = temp.replaceAll("[*=]", "");
output += temp + "\n";
}
System.out.println(output);
}catch (IOException ex) {}
}
I'm not sure if replaceAll("[*=]", ""); truly means anything at all or if it's just searching for all of those chars. Any help is appreciated!
Try the following:
if (temp.startsWith("#")){
String[] splitted = temp.split("=");
output += splitted[1] + "\n";
}
Explanation:
To process only the lines that start with the desired character, use the String#startsWith method. Once you have the string to extract the value from, String#split splits the text around the delimiter you pass as the method argument (it is interpreted as a regex). So in your case, the text before the = character will be at index 0 of the array, and the text you want to print will be at index 1.
Also note that if your file contains many lines starting with #, it would be wise not to concatenate strings, but to use StringBuilder / StringBuffer to build the output.
Hope it helps.
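On the original question: "[*=]" is a character class that matches a literal * or =, so that replaceAll only deletes those characters and leaves the key in place. Also, splitted[1] throws an ArrayIndexOutOfBoundsException for a line with no = or with nothing after it. A small sketch of a variant that uses indexOf instead, with the sample lines inlined rather than read from game.ini (the class and method names are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative class; file reading is omitted so the sketch stays self-contained.
class IniValues {

    /** Returns the text after the first '=' for lines starting with '#', or null for other lines. */
    static String valueOf(String line) {
        String trimmed = line.trim();
        int eq = trimmed.indexOf('=');
        if (!trimmed.startsWith("#") || eq < 0) {
            return null; // not a setting line, or no '=' present
        }
        return trimmed.substring(eq + 1);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#music=true", "#Volume=100", "#Full-Screen=false", "#Update=true");
        StringBuilder output = new StringBuilder();
        for (String line : lines) {
            String value = valueOf(line);
            if (value != null) {
                output.append(value).append('\n');
            }
        }
        System.out.print(output); // prints true, 100, false, true on separate lines
    }
}
```

Using the first = as the cut point also keeps values that contain an = themselves intact.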
It is better to use a StringBuffer instead of += with a String, as shown below. I have also declared the variables outside the loop rather than inside it. It's the best practice as far as I know.
StringBuffer outputBuffer = new StringBuffer();
String[] fields;
String temp;
while((input = br.readLine()) != null)
{
temp = input.trim();
if(temp.startsWith("#"))
{
fields = temp.split("=");
outputBuffer.append(fields[1] + "\n");
}
}

How to add delimiters from the StringTokenizer to a separate string?

I am inputting a string and I want to add the delimiters in that string to a different string, and I was wondering how you would do that. This is the code I have at the moment.
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!@#$%^&*()~`;/?");
while (tokenizer.hasMoreTokens()){
//add delimiters to string here
}
Any help would be greatly appreciated(:
If you want StringTokenizer to return the delimiters it parses, you need to pass true as the third constructor argument (returnDelims), as shown here:
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!@#$%^&*()~`;/?", true);
But if you are searching only for delimiters, I don't think this is the right approach.
I don't think StringTokenizer is good for this task, try
StringBuilder sb = new StringBuilder();
for(char c : input.toCharArray()) {
if ("'.,><-=[]{}+!@#$%^&*()~`;/?".indexOf(c) >= 0) {
sb.append(c);
}
}
I'm guessing you want to extract all the delimiters from the string and process them:
String allTokens = "'.,><-=[]{}+!@#$%^&*()~`;/?";
StringTokenizer tokenizer = new StringTokenizer(input, allTokens, true);
while(tokenizer.hasMoreTokens()) {
String nextToken = tokenizer.nextToken();
if(nextToken.length()==1 && allTokens.contains(nextToken)) {
//this token is a delimiter
//append to string or whatever you want to do with the delimiter
processDelimiter(nextToken);
}
}
Create a processDelimiter method in which you add the delimiter to a different string or perform any action you want.
This would even take care of repeated use of delimiters:
String input = "adfhkla.asijdf.';.akjsdhfkjsda";
String compDelims = "'.,><-=[]{}+!@#$%^&*()~`;/?";
String delimsUsed = "";
for (char a : compDelims.toCharArray()) {
// Use >= 0 so that a delimiter at the very start of the input is also found.
if (input.indexOf(a) >= 0 && delimsUsed.indexOf(a) == -1) {
delimsUsed += a;
}
}
System.out.println("The delims used are " + delimsUsed);
