The string "MO""RET" is stored in items[1] after the split command. I then call replaceAll on this string, which removes all the double quotes.
But I want it to be stored as MO"RET. How do I do that? In the CSV file I process with split, double quotes inside a text field are doubled (example: "This account is a ""large"" one"). So I want to keep one of the two quotes when they are doubled in the middle of the string and drop the enclosing quotes if present. How can I do it?
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
items[1] has "MO""RET"
String recordType = items[1].replaceAll("\"","");
After this, recordType is MORET; I want it to be MO"RET.
Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:
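import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {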
BufferedReader reader = null;
List<List<String>> csv = new ArrayList<List<String>>();
try {
reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
for (String record; (record = reader.readLine()) != null;) {
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List<String> fields = new ArrayList<String>();
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);
if (c == '"') {
quoted = !quoted;
}
if ((!quoted && c == separator) || i + 1 == record.length()) {
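// Quote the separator so that regex metacharacters such as | or . are matched literally.
fields.add(fieldBuilder.toString().replaceAll(java.util.regex.Pattern.quote(String.valueOf(separator)) + "$", "")
.replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());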
fieldBuilder = new StringBuilder();
}
if (c == separator && i + 1 == record.length()) {
fields.add("");
}
}
csv.add(fields);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return csv;
}
Yes, there's a little regex involved, but it only trims the trailing separator and the surrounding quotes from a single field.
You can however also grab any 3rd party Java CSV API.
How about:
String recordType = items[1].replaceAll( "\"\"", "\"" );
I'd suggest using replace instead of replaceAll.
replaceAll treats its first argument as a regex.
The requirement is to replace two consecutive quotes with one quote:
String recordType = items[1].replace( "\"\"", "\"" );
To see the difference between replace and replaceAll, run the code below:
recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
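A minimal sketch of that difference, using a hypothetical class name ReplaceDemo. replace treats both arguments literally, while replaceAll reads the first argument as a regex and the second as a replacement template in which $ introduces a group reference. Pattern.quote and Matcher.quoteReplacement are the standard way to make replaceAll behave literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    // Literal replacement: "$$" means two dollar characters.
    static String literal(String s) {
        return s.replace("$$", "$");
    }

    // Regex replacement made literal by quoting both the pattern and the replacement.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("$$"), Matcher.quoteReplacement("$"));
    }

    public static void main(String[] args) {
        System.out.println(literal("a$$b"));     // prints a$b
        System.out.println(quotedRegex("a$$b")); // prints a$b
        try {
            // Unquoted, "$$" is two end-of-input anchors and "$" is an incomplete group reference.
            System.out.println("a$$b".replaceAll("$$", "$"));
        } catch (RuntimeException e) {
            System.out.println("replaceAll failed: " + e);
        }
    }
}
```

For the doubled-quote case neither string contains regex metacharacters, so both methods give the same result there; replace simply avoids the regex machinery.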
Alternatively, you can use a regular expression:
recordType = items[1].replaceAll( "\\B\"", "" );
recordType = recordType.replaceAll( "\"\\B", "" );
The first statement removes every quote that is not preceded by a word character, such as the opening quote of the field.
The second statement removes every quote that is not followed by a word character, such as the closing quote of the field.
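The word-boundary approach depends on which characters sit next to the quotes, so doubled quotes next to a space (as in the "large" example from the question) may be removed entirely. A sketch that avoids regex altogether, assuming the field has already been split out and using a hypothetical helper named unquote: strip one pair of enclosing quotes if present, then collapse each doubled quote into a single one.

```java
final class CsvField {
    // Removes one pair of enclosing quotes, if present, then turns each "" into ".
    static String unquote(String field) {
        String s = field;
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\""));                          // prints MO"RET
        System.out.println(unquote("\"This account is a \"\"large\"\" one\"")); // prints This account is a "large" one
    }
}
```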
I wrote a parser that reads a file line by line and parses it with a regex statement. (Regex below)
case "countries":
pattern = "\\\"(.+?)\\\"(\\s+)?(\\((.+?)\\))?(\\s+)?(\\{(.+?)\\(\\#(.+?)\\)\\})?(\\s+)?(.+)";
substitution = "$1, $4, $7, $8, $10";
break;
This outputs a string with all the groups I want, separated by commas (which I then split with result.split(",")).
Now let's say I don't want to use a comma but a | or a * instead. Changing the comma to any other string doesn't seem to change anything. What am I missing?
try (CSVWriter csvWriter = new CSVWriter(new FileWriter(myLocalPath + "CSV/" + choice.toLowerCase() + ".csv")))
{
Pattern r = Pattern.compile(pattern);
while (br.readLine() != null)
{
String nextLine = br.readLine();
Matcher matcher = r.matcher(nextLine);
String result = matcher.replaceAll(substitution);
String[] line = result.split("lorem");
csvWriter.writeNext(line, false);
}
}catch(Exception e){
System.out.println(e);
System.out.println("Parsing done!");
}
It seems what you're missing is Pattern.quote. The argument to split is a regex, so if it must be read literally (as with | or *), quote it:
String[] line = result.split(Pattern.quote("..."));
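A small sketch of why the quoting matters, with a hypothetical helper splitLiteral and made-up sample data. An unquoted "|" is the regex alternation operator with two empty alternatives, so it matches the empty string between characters instead of the pipe itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitDemo {
    // Splits on the delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String result = "Title|1999|Pilot"; // sample data, not from the original file
        // Unquoted: the regex "|" matches empty strings, so the text is cut between characters.
        System.out.println(Arrays.toString(result.split("|")));
        // Quoted: the pipe is a plain delimiter.
        System.out.println(Arrays.toString(splitLiteral(result, "|"))); // prints [Title, 1999, Pilot]
    }
}
```

The separator in the substitution string has to change as well (for example "$1|$4|$7"), otherwise the result still contains commas and no pipes to split on.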
Here's my problem:
I need to remove a semicolon in a String but this String comes from a semicolon separated file in excel.
I need to replace a semicolon only if there's a quotation mark after the word.
ie:
data1;data2;"This is a duck;";data3;"Here's another duck";
needs to be replaced by:
data1;data2;"This is a duck";data3;"Here's another duck";
What is the best way to do this?
Edit: Here's what I tried:
String line = myLine;
line.replaceAll(("\\w*;"),$1);
but I can't make it work, and I don't think it's the best way to do it. I also tried
line = line.replaceall(";\"", "\"");
But that doesn't work, because it turns
data1;data2;"This is a duck;";data3;"Here's another duck";
into
data1;data2"This is a duck";data3"Here's another duck";
If you only want a regex:
public static void main (String[] args) throws java.lang.Exception
{
Pattern p = Pattern.compile ( "\"(.*);\"");
String input1 = "\"This is duck;\"";
String input2 = "This is duck;";
Matcher m = p.matcher(input1);
if ( m.find() )
{
input1 = m.group(1);
System.out.println( "Modified input1 is : " + input1 );
}
else
{
System.out.println( "input1 is not modified" );
}
m = p.matcher(input2);
if ( m.find() )
{
input2 = m.group(1);
System.out.println( "Modified input2 is : " + input2 );
}
else
{
System.out.println( "input2 is not modified" );
}
}
Output :
Modified input1 is : This is duck
input2 is not modified
Probably not the best but easiest way:
String str = "\"This is a duck;\"";
str = str.replace(";\"", "\"");
You should not use regex for that. You should use a csv parser and writer.
For example, here's how you can do it with OpenCSV:
CSVReader reader = new CSVReader(new FileReader("myCsv.csv"),';');
CSVWriter writer = new CSVWriter(new FileWriter("corrected.csv"), ';');
String[] lineTokens;
while ((lineTokens = reader.readNext()) != null) {
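// Strings are immutable, so the result of replace must be stored back into the array.
for (int i = 0; i < lineTokens.length; i++) {
lineTokens[i] = lineTokens[i].replace(";", "");
}
writer.writeNext(lineTokens);
}
reader.close();
writer.close();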
If you need to do an operation on a csv, use the right tool. Help yourself and use a csv parser.
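Use a regex with a positive lookahead assertion. The lookahead should require a closing quote, that is, a quote followed by the separator or the end of the line; a plain ;(?=") would also match the separator in front of an opening quote.
/;(?="(;|$))/g
Example:
String pattern = ";(?=\"(;|$))";
String updated = STRING.replaceAll(pattern, "");
or
String pattern = ";\"(;|$)";
String updated = STRING.replaceAll(pattern, "\"$1");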
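A runnable sketch of the lookahead idea applied to the sample line from the question. Note that a semicolon directly before an opening quote is also "followed by a quote", so this version additionally requires that the quote closes the field (it is followed by the separator or the end of the line). The class and method names are hypothetical, and the sketch assumes no quoted field consists of a lone semicolon.

```java
final class SemicolonFix {
    // Removes a semicolon only when the next character is a quote that ends the field,
    // i.e. the quote is followed by the separator or by the end of the line.
    static String dropTrailingSemicolon(String line) {
        return line.replaceAll(";(?=\"(?:;|$))", "");
    }

    public static void main(String[] args) {
        String line = "data1;data2;\"This is a duck;\";data3;\"Here's another duck\";";
        // prints data1;data2;"This is a duck";data3;"Here's another duck";
        System.out.println(dropTrailingSemicolon(line));
    }
}
```

A CSV parser remains the more robust choice for real files; the regex only covers this one pattern.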
I want to find names in a collection of text documents from a huge list of about 1 million names. I'm making a Pattern from the names of the list first:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += name.replace("\"", "") + "|";
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring those few names:
if(name.contains("\"")){
//ignore this name
}
That didn't work properly, and it is also messy, because you have to handle every special character manually and rerun the program many times.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I don't find any matches in the text documents anymore, even when I also use quote on them. How can I solve this issue?
I agree with @dragon66's comment: you should not quote the pipe "|". Using Pattern.quote() on each name instead, your code would look like this:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += Pattern.quote(name.replace("\"", "")) + "|"; //line changed
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
I also suggest checking whether your problem needs optimization: replacing String combined = ""; with a StringBuilder (which is mutable) avoids creating unnecessary new strings inside the loop.
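One way to combine both suggestions, sketched under the assumption of Java 8+ and with a hypothetical NamePattern class and sample names: quote each name individually and join the results with "|". Joining also avoids the trailing "|" that the loop leaves at the end of combined, which adds an empty alternative that matches at every position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NamePattern {
    // Builds one alternation in which every name is matched literally.
    static Pattern compile(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)             // escape +, *, ( and other metacharacters per name
                .collect(Collectors.joining("|")); // the pipes themselves stay unquoted
        return Pattern.compile(alternation);
    }

    // Returns every match of any name in the text, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern all = compile(Arrays.asList("C++", "A+B", "Anna")); // sample names
        System.out.println(findAll(all, "Anna likes C++ and A+B.")); // prints [Anna, C++, A+B]
    }
}
```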
guilhermerama presented the bug fix for your code.
I will add some performance improvements. As I pointed out, Java's regex library does not scale well and is even slower when used for searching.
But one can do better with multi-string search algorithms, for example by using StringsAndChars String Search:
//setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines , CREATE, WRITE, TRUNCATE_EXISTING);
// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined.add(name);
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) takes less than 1 millisecond on my computer. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
The setup needs the following code:
private static Iterable<String> createLines() {
List<String> list = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
list.add(i + "\t" + name(i));
}
return list;
}
private static String name(int i) {
String s = String.valueOf(i);
while (s.length() < 5) {
s = '0' + s;
}
return s;
}
My program needs to read from a multi-line .ini file. I've got it to the point where it reads every line that starts with a # and prints it, but I only want to record the value after the = sign. Here's what the file looks like:
#music=true
#Volume=100
#Full-Screen=false
#Update=true
This is what I want it to print:
true
100
false
true
This is the code I'm currently using:
@SuppressWarnings("resource")
public void getSettings() {
try {
BufferedReader br = new BufferedReader(new FileReader(new File("FileIO Plug-Ins/Game/game.ini")));
String input = "";
String output = "";
while ((input = br.readLine()) != null) {
String temp = input.trim();
temp = temp.replaceAll("#", "");
temp = temp.replaceAll("[*=]", "");
output += temp + "\n";
}
System.out.println(output);
}catch (IOException ex) {}
}
I'm not sure whether replaceAll("[*=]", ""); means anything at all or whether it just searches for any of those chars. Any help is appreciated!
Try the following:
if (temp.startsWith("#")){
String[] splitted = temp.split("=");
output += splitted[1] + "\n";
}
Explanation:
To process only the lines starting with the desired character, use the String#startsWith method. Once you have the string to extract the value from, String#split splits it around the delimiter you pass as the argument (which is treated as a regex). So in your case, the text before the = character will be at index 0 of the array and the text you want to print at index 1.
Also note that if your file contains many lines starting with #, it is wise not to concatenate strings with +=, but to use a StringBuilder / StringBuffer instead.
Hope it helps.
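A self-contained sketch of the same idea, with a hypothetical IniValues class that takes the lines as a list instead of reading the file. Two small additions that are assumptions on my part: split is called with a limit of 2 so that a value containing = stays intact, and lines without = are skipped instead of causing an ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IniValues {
    // Returns the text after the first '=' of every line that starts with '#'.
    static List<String> values(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("#")) {
                continue; // not a settings line
            }
            String[] parts = trimmed.split("=", 2); // at most two parts: key and value
            if (parts.length == 2) {
                out.add(parts[1].trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("#music=true", "#Volume=100", "#Full-Screen=false", "#Update=true");
        for (String value : values(lines)) {
            System.out.println(value); // prints true, 100, false, true on separate lines
        }
    }
}
```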
Better to use a StringBuffer instead of += with a String, as shown below (a StringBuilder works the same way and is usually preferred when only one thread is involved). I have also declared the variables outside the loop, which is good practice as far as I know.
StringBuffer outputBuffer = new StringBuffer();
String[] fields;
String temp;
while((input = br.readLine()) != null)
{
temp = input.trim();
if(temp.startsWith("#"))
{
fields = temp.split("=");
outputBuffer.append(fields[1] + "\n");
}
}
I am reading a string and I want to add the delimiters found in that string to a different string. How would I do that? This is the code I have at the moment.
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!@#$%^&*()~`;/?");
while (tokenizer.hasMoreTokens()){
//add delimiters to string here
}
Any help would be greatly appreciated(:
If you want StringTokenizer to return the delimiters as tokens too, pass true as the third constructor argument (returnDelims):
StringTokenizer tokenizer = new StringTokenizer(input, "'.,><-=[]{}+!@#$%^&*()~`;/?", true);
But if you are searching only for delimiters, I don't think this is the right approach.
I don't think StringTokenizer is good for this task, try
StringBuilder sb = new StringBuilder();
for(char c : input.toCharArray()) {
if ("'.,><-=[]{}+!@#$%^&*()~`;/?".indexOf(c) >= 0) {
sb.append(c);
}
}
I'm guessing you want to extract all the delimiters from the string and process them
String allTokens = "'.,><-=[]{}+!@#$%^&*()~`;/?";
StringTokenizer tokenizer = new StringTokenizer(input, allTokens, true);
while(tokenizer.hasMoreTokens()) {
String nextToken = tokenizer.nextToken();
if(nextToken.length()==1 && allTokens.contains(nextToken)) {
//this token is a delimiter
//append to string or whatever you want to do with the delimiter
processDelimiter(nextToken);
}
}
Create a processDelimiter method in which you add the delimiter to a different string or perform any action you want.
This also takes care of delimiters that are used repeatedly:
String input = "adfhkla.asijdf.';.akjsdhfkjsda";
String compDelims = "'.,><-=[]{}+!@#$%^&*()~`;/?";
String delimsUsed = "";
for (char a : compDelims.toCharArray()) {
// indexOf returns 0 for a match at the first character, so test for >= 0
if (input.indexOf(a) >= 0 && delimsUsed.indexOf(a) == -1) {
delimsUsed += a;
}
}
System.out.println("The delims used are " + delimsUsed);