Replace quotes in String - java

I have to replace every comma that sits between double quotes with a dot.
I'm trying to do that with Java's replace and replaceAll methods, but I still haven't worked out a solution.
Can someone help me?
EDIT:
I have to manually parse a CSV file into objects. So I'm trying to split each input line, but one number has a comma inside, so I'm getting more fields from the split than I need.
Example: I have to split this string.
"""LASER MEDIA SOCIETA' COOPERATIVA""",CNF146010,FM (S),PIAZZA UMBERTO I - PISTICCI,MT,40N2323,16E3328,383,,"99,1",CITY RADIO,"H: --V: 32 dBW",0.0
Notice that I have "99,1", and the ,, before it is also causing me trouble.
Scanner var = new Scanner(new BufferedReader(new FileReader ("t1.csv")));
ArrayList<Catasto> obj = new ArrayList();
String data = var.nextLine();
String data2 = null;
String full = null;
int j = 0;
while (var.hasNextLine()) {
data = var.nextLine();
data2 = var.nextLine();
full = data + data2;
//full = full.replaceAll("\"*[,]*\"", "."); attempt 1
System.out.println(full);
ArrayList<String> parts = new ArrayList();
String[] parti = full.split(",");
//for (int i = 0; i<parti.length; i++) { this is because I'm trying to change empty string with a null
//if (parti[i] == " ") in order to solve this error: java.lang.NumberFormatException: For input string: ""
// parti[i] = null;
//}
for (int i = 0; i<12; i++) {
parts.add(parti[i]);
}
Catasto foo = new Catasto(parts);
obj.add(foo);
}
var.close();
EDIT 2:
I have solved the problem of the comma between the double quotes. But I don't know why I get the error: java.lang.NumberFormatException: For input string: ""

You're going to struggle to do it with a single replaceAll or replace, as you need to determine pairs of quotes. Your best bet is to match pairs of quotes and then use replaceAll on each match to change the commas to full stops.
String input = "\"One,Two,There\",\"Four,Five,Six\"";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(input);
StringBuffer sb = new StringBuffer();
while(m.find()) {
// quoteReplacement keeps any '$' or '\' in the matched text literal
m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replaceAll(",", ".")));
}
m.appendTail(sb);
String output = sb.toString(); // "One.Two.There","Four.Five.Six"
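For the CSV line in the question, an alternative to rewriting the commas is to split only on commas that sit outside quotes, and to treat empty fields explicitly, since an empty field passed to a number parser is what raises NumberFormatException: For input string: "". The following is a minimal sketch under those assumptions; QuotedCsvSplit, splitLine and parseDecimal are hypothetical names that do not appear in the original code, and a CSV library remains the safer choice for real files.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedCsvSplit {

    // Splits one CSV line on commas that are outside double quotes.
    // A doubled quote ("") inside a quoted field becomes one literal quote.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field, possibly empty
        return fields;
    }

    // Parses a number that may use a decimal comma; returns null for an
    // empty field instead of throwing NumberFormatException.
    static Double parseDecimal(String field) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        return Double.valueOf(trimmed.replace(',', '.'));
    }

    public static void main(String[] args) {
        String line = "\"\"\"LASER MEDIA\"\"\",383,,\"99,1\",CITY RADIO";
        List<String> fields = splitLine(line);
        System.out.println(fields.size() + " fields: " + fields);
        System.out.println("empty field -> " + parseDecimal(fields.get(2)));
        System.out.println("decimal comma -> " + parseDecimal(fields.get(3)));
    }
}
```

Because the empty field between the two adjacent commas is kept as an empty string, the caller can decide what a missing value means before any parsing happens.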

Related

How CSV parsing can be utilized - JAVA

I am given a file that will read the following:
"String",int,int
"String",int,int
"String",int,int
...
Given an unknown number of lines, a while (scanner.hasNextLine()) loop handles however many entries there are. My goal is to take these three pieces of data and store them in a Node. I am using the method BinaryTree.addNode(String, int, int) for this. My issue arises when I try to read in the data. I am removing the commas from each line and then re-reading the data using the following:
Scanner firstpass = new Scanner(file);
String input = firstpass.nextLine().replaceAll(",", "");
Scanner secondpass = new Scanner(input);
String variable1 = secondpass.next();
int variable2 = secondpass.nextInt();
int variable3 = secondpass.nextInt();
This, however, is a very ineffective way of going about it.
UPDATED
The compile errors can be fixed with the following:
try {
Scanner scanner1 = new Scanner(file);
while (scanner1.hasNextLine()) {
String inventory = scanner1.nextLine().replaceAll(",", " ");
Scanner scanner2 = new Scanner(inventory);
while (scanner2.hasNext()){
String i = scanner2.next();
System.out.print(i);
}
scanner2.close();
}
scanner1.close();
}
catch (FileNotFoundException ex) {
ex.printStackTrace();
}
which gives me the output:
"String"intint"String"intint"String"intint...
So I know I am on the right track. However, any spaces within the "String" value are removed, so the output is "SomeString" instead of "Some String". I also still don't know how to remove the "" from the strings.
The format you've shown matches the CSV (Comma-Separated Values) format, so your best option is to use a CSV parser, e.g. Apache Commons CSV.
If you don't want to add a third-party library, you could use a regular expression to parse each line.
Reading lines from a file should not be done with a Scanner. Use a BufferedReader instead. See Scanner vs. BufferedReader.
try (BufferedReader in = new BufferedReader(new FileReader(file))) {
Pattern p = Pattern.compile("\"(.*?)\",(-?\\d+),(-?\\d+)");
for (String line; (line = in.readLine()) != null; ) {
Matcher m = p.matcher(line);
if (! m.matches())
throw new IOException("Invalid line: " + line);
String value1 = m.group(1);
int value2 = Integer.parseInt(m.group(2));
int value3 = Integer.parseInt(m.group(3));
// use values here
}
} catch (IOException | NumberFormatException ex) {
ex.printStackTrace();
}
Note that this will not work if the string contains escaped characters, e.g. if it contains embedded double-quotes. For that, you should use a parser library.
The code above will correctly handle embedded spaces and commas.
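The claim about embedded spaces and commas can be checked without a file. The sketch below applies the same pattern to a single in-memory line; LineRegexDemo and parse are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineRegexDemo {

    // Same pattern as in the answer: a quoted string followed by two integers.
    private static final Pattern LINE =
            Pattern.compile("\"(.*?)\",(-?\\d+),(-?\\d+)");

    // Returns the three captured values, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // The quoted part contains both a space and a comma.
        System.out.println(Arrays.toString(parse("\"Some, String\",12,-3")));
        // A line without the surrounding quotes is rejected.
        System.out.println(Arrays.toString(parse("Some String,12,-3")));
    }
}
```

Since matches() requires the whole line to fit the pattern, the lazy group keeps expanding past an embedded comma until the two trailing integers line up.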
Instead of using
String input = firstpass.nextLine().replaceAll(",", "");
Scanner secondpass = new Scanner(input);
String variable1 = secondpass.next();
int variable2 = secondpass.nextInt();
int variable3 = secondpass.nextInt();
use the following approach:
String line = firstpass.nextLine();
String[] temp = line.split(",");
String variable1 = temp[0];
int variable2 = Integer.parseInt(temp[1]);
int variable3 = Integer.parseInt(temp[2]);

Java using \034 as delimiter in a string

I am trying to use the '\034' field separator character as a delimiter in a string.
The issue is that when I hardcode "\034"+opField and write it to a file, it works, but if the "\034" character is read from a file, the output is written as the literal string "col1\034col2".
I tried using StringBuilder, but it escapes the \034 to "\\034".
I am using the following code to read the character from the file:
try (BufferedReader br = new BufferedReader(new FileReader(fConfig))){
int lc = 1;
for(String line;(line = br.readLine())!=null;){
String[] rowList = line.split(delim);
int row_len = rowList.length;
if (row_len<2){
System.out.println("Incorrect dictionary file row:"+fConfig.getAbsolutePath()+"\nNot enough values found at row:"+line);
}else{
String key = rowList[0];
String value = rowList[1];
dictKV.put(key, value);
}
lc++;
}
}catch(Exception e){
throw e;
}
Any help is welcome...
[update]: The same thing happens with the '\t' character: if hardcoded it works fine, but if read from a file it gets appended as literal characters: "col0\tcol1"
if(colAl.toLowerCase().contains(" as ")){
String temp = colAl.replaceAll("[ ]+as[ ]+"," | ");
ArrayList<String> tempA = this.brittle_delim(temp,'|');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}else {
ArrayList<String> tempA = this.brittle_delim(colAl,' ');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}
if(i==0){
sb.append(colAl);
headerCols+=colAl.trim();
}else{
headerCols+= this.output_field_delim + colAl;
sb.append(this.output_field_delim);
sb.append(colAl);
}
}
}
System.out.println("SB Header Cols:"+sb.toString());
System.out.println("Header Cols:"+headerCols);
Output:
SB Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
In the above code if I do the following I am getting correct results:
headerCols+= "\034"+ colAl;
output:
SPRN_CO_IDFISC_YR_MTH_DSPLY_CDCST_OBJ_CDPRFT_CTR_CDLEGL_CO_CDHEAD_CT_TYPE_IDFIN_OWN_CDFUNC_AREA_CDHEAD_CT_NR
The FS characters are there even if they are getting removed here.
You should provide an example demonstrating your problem, not just incomplete code snippets.
The following runnable snippet does what you explained.
// create a file one line
byte[] bytes = "foo bar".getBytes(StandardCharsets.ISO_8859_1);
String fileName = "/tmp/foobar";
Files.write(Paths.get(fileName), bytes);
String headerCols = "";
String outputFieldDelim = "\034";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
// read the line from the file and split by blank character
String[] cols = br.readLine().split(" ");
// concatenate the values with "\034"
// but ... for your code ...
// don't concatenate String objects in a loop like below
// use a StringBuilder or StringJoiner instead
headerCols += outputFieldDelim + cols[0];
headerCols += outputFieldDelim + cols[1];
}
// output with the "\034" character
System.out.println(headerCols);
I found my solution, and the right words for my question, here:
How to unescape string literals in Java
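The underlying point is that "\034" in Java source is turned into a single character by the compiler, while the same five characters read from a file stay a backslash followed by three digits. A minimal sketch of unescaping such a value after reading it is shown below, assuming only octal escapes and a few common letter escapes are needed; DelimiterUnescape and unescape are hypothetical names, and the octal handling is simplified compared with the Java language rules. On Java 15 or later, String.translateEscapes() covers these cases as well.

```java
final class DelimiterUnescape {

    // Converts backslash escapes found in text read from a file into the
    // characters they stand for. Handles \t, \n, \r, \\ and octal escapes
    // of up to three digits such as \034. Unknown escapes are kept as-is.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character or trailing backslash
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            if (next >= '0' && next <= '7') {
                int end = i + 1;
                int value = 0;
                // read at most three octal digits
                while (end < s.length() && end < i + 4
                        && s.charAt(end) >= '0' && s.charAt(end) <= '7') {
                    value = value * 8 + (s.charAt(end) - '0');
                    end++;
                }
                out.append((char) value);
                i = end;
            } else {
                switch (next) {
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case '\\': out.append('\\'); break;
                    default: out.append(c).append(next); // leave unknown escapes untouched
                }
                i += 2;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fromFile = "\\034"; // four characters, as they would arrive from a config file
        String delimiter = unescape(fromFile);
        System.out.println("length before: " + fromFile.length());
        System.out.println("length after: " + delimiter.length());
        System.out.println("code point: " + (int) delimiter.charAt(0));
    }
}
```

Applying this once to the configured delimiter, right after it is read, leaves the rest of the concatenation code unchanged.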

Detect an array from text file

I have some files containing text from which I have to extract some information, including a 2-dimensional double array (it may sometimes be missing, hence the "if" clause).
This is the way the file is formatted:
Name=fileName
groups={ group1=groupName group2=groupName minAge= maxAge= ages=[[18.0,21.0,14.7],[17.3,13.0,12.0]] }
I am using java.nio.file.Files, java.nio.file.Path and java.io.BufferedReader to read these files, but I am having problems converting the Strings representing the arrays to real Java arrays:
Path p = Paths.get(filename);
try(BufferedReader br = Files.newBufferedReader(p)) {
String line = br.readLine();
String fileName = line.split("=")[1];
line = br.readLine();
String[] arr = line.split("=");
String group1 = arr[2].split(" ")[0];
String group2 = arr[3].split(" ")[0];
Integer minAge = Integer.parseInt(arr[4].split(" ")[0]);
Integer maxAge = Integer.parseInt(arr[5].split(" ")[0]);
double[][] ag = null;
if (line.contains("ages")) {
String age = arr[6].trim().replace("}", "").replace("[[", "").replace("]]", "").trim();
String[] arrAge = age.split(",");
//don't know what to do here from now on, since the number of arrays inside
//the first one may vary from 1 to 2 (e.g I might find: [[3.0, 4.0]] or [[3.0, 7.0],[4.0,5.0]])
//this is what I was trying to do
ag = new double[1][arrAge.length];
for (int i = 0; i < arrAge.length; i++)
ag[0][i] = Double.parseDouble(arrAge[i]);
}
}
catch (Exception e) {
e.printStackTrace();
}
Is there any way to detect the array in the text without doing what I am trying to do in my code, or any way to extract a correct 2-dimensional array from a file formatted that way?
One more question: is there a way to print a 2-dimensional array like that? If yes, how? (With Arrays.toString I only get something like this: [[D@69222c14])
You can use regex to extract the two dimensional array from any string.
String groups="{ group1=groupName group2=groupName minAge= maxAge= ages=[[18.0,21.0,14.7],[17.3,13.0,12.0]] }";
String pattern = "(\\[\\[.+\\]\\])";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(groups);
if (m.find( ))
System.out.println(m.group());
The Output for the above code is:
[[18.0,21.0,14.7],[17.3,13.0,12.0]]
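Once the bracketed text has been extracted, it still has to be turned into a double[][] and printed. The sketch below is one way to do both, assuming every inner array is written as [x,y,...] with no nesting beyond two levels; AgesParser and parse are hypothetical names. Arrays.deepToString is the standard call for printing nested arrays, which Arrays.toString does not handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AgesParser {

    private static final Pattern OUTER = Pattern.compile("\\[\\[.+\\]\\]");
    private static final Pattern ROW = Pattern.compile("\\[([^\\[\\]]+)\\]");

    // Extracts the first [[...]] block from the text and converts it to a
    // 2-dimensional array. Returns an empty array when no block is present.
    static double[][] parse(String text) {
        Matcher outer = OUTER.matcher(text);
        if (!outer.find()) {
            return new double[0][];
        }
        List<double[]> rows = new ArrayList<>();
        Matcher row = ROW.matcher(outer.group());
        while (row.find()) {
            String[] cells = row.group(1).split(",");
            double[] values = new double[cells.length];
            for (int k = 0; k < cells.length; k++) {
                values[k] = Double.parseDouble(cells[k].trim());
            }
            rows.add(values);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        String groups = "{ group1=groupName group2=groupName minAge= maxAge= "
                + "ages=[[18.0,21.0,14.7],[17.3,13.0,12.0]] }";
        double[][] ages = parse(groups);
        // deepToString prints the contents of nested arrays
        System.out.println(Arrays.deepToString(ages));
    }
}
```

The number of inner arrays is discovered while matching, so one row and two rows go through the same code path.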

Deal with PatternSyntaxException and scanning texts

I want to find names in a collection of text documents from a huge list of about 1 million names. I'm making a Pattern from the names of the list first:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += name.replace("\"", "") + "|";
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring those few names:
if(name.contains("\"")){
//ignore this name
}
That didn't work properly, and it is also messy: you have to escape everything manually, run it many times, and waste your time.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I don't find any matches in the text documents anymore, even when I also use quote on them. How can I solve this issue?
I agree with the comment of @dragon66: you should not quote the pipe "|". So your code, using Pattern.quote(), would look like this:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
String combined = "";
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined += Pattern.quote(name.replace("\"", "")) + "|"; //line changed
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
I also suggest checking whether your problem domain needs optimization: replacing String combined = ""; with a mutable StringBuilder avoids creating unnecessary new strings inside the loop.
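One detail the loop above leaves open is the trailing "|": an alternation that ends with a pipe also has an empty alternative, which matches the empty string at every position. A minimal sketch that quotes each name and joins them without a trailing separator is shown below; NamePattern and build are hypothetical names, and the list of names stands in for the values read from names.tsv.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamePattern {

    // Builds one alternation pattern from literal names. Each name is
    // quoted on its own so that the separating pipes keep their meaning.
    static Pattern build(List<String> names) {
        StringJoiner alternation = new StringJoiner("|");
        for (String name : names) {
            String cleaned = name.replace("\"", "");
            if (!cleaned.isEmpty()) {
                alternation.add(Pattern.quote(cleaned));
            }
        }
        return Pattern.compile(alternation.toString());
    }

    public static void main(String[] args) {
        Pattern all = build(Arrays.asList("C++", "a|b", "\"Ann\""));
        Matcher m = all.matcher("I like C++ and Ann, not a or b");
        while (m.find()) {
            System.out.println(m.group() + " at " + m.start());
        }
    }
}
```

StringJoiner only places the separator between elements, so no post-processing of the combined string is needed.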
guilhermerama presented the bugfix to your code.
I will add some performance improvements. As I pointed out, Java's regex library does not scale and is even slower when used for searching.
But one can do better with multi-string search algorithms, for example by using StringsAndChars String Search:
//setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines , CREATE, WRITE, TRUNCATE_EXISTING);
// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine();// skip first line (header)
while (dataRow != null) {
String[] dataArray = dataRow.split("\t");
String name = dataArray[1];
combined.add(name);
dataRow = TSVFile.readLine(); // Read next line of data.
}
TSVFile.close();
// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) takes less than 1 millisecond on my computer. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
The setup needs the following code:
private static Iterable<String> createLines() {
List<String> list = new ArrayList<>();
for (int i = 0; i < 100000; i++) {
list.add(i + "\t" + name(i));
}
return list;
}
private static String name(int i) {
String s = String.valueOf(i);
while (s.length() < 5) {
s = '0' + s;
}
return s;
}

How to replace all special characters with another character in java?

I want to replace all 'special characters' with one specific character in Java.
For example, 'cash&carry' will become 'cash+carry', and 'cash$carry' will also become 'cash+carry'.
I have a sample CSV file, shown below.
The CSV headers are 'What' and 'Where'.
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate&barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book store,los+angeles+metro
garment,los+angeles+metro
"cash,carry",los+angeles+metro
cash&carry,los+angeles+metro
cash carry,los+angeles+metro
The expected output
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate+barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book+store,los+angeles+metro
garment,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
The sample code is as follows
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}
All the special characters, all the spaces and the double quotes should be removed or replaced as in the desired output.
I am using value.replaceAll("[^A-Za-z0-9 ]", "+"), but it is not working.
Error
cash
carry"
Any help is appreciated. I am new to regex.
You need to:
replace all commas within quotes with +
replace non-whitelist characters (and you need to add commas to your whitelist) with +
remove double quotes
Try this:
line = line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
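The effect of this one-liner can be checked against a few rows of the sample file. The sketch below wraps the same expression in a helper; SpecialCharCleaner and clean are hypothetical names introduced for the illustration. The lookahead treats a comma as being inside quotes when an odd number of double quotes follows it on the line.

```java
final class SpecialCharCleaner {

    // Replaces every character outside the whitelist, and every comma that
    // sits inside double quotes, with '+', then drops the double quotes.
    static String clean(String line) {
        return line
                .replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+")
                .replace("\"", "");
    }

    public static void main(String[] args) {
        String[] rows = {
            "What,Where",
            "crate&barrel,los+angeles+metro",
            "book store,los+angeles+metro",
            "\"cash,carry\",los+angeles+metro"
        };
        for (String row : rows) {
            System.out.println(clean(row));
        }
    }
}
```

The comma that separates the two columns is left alone in every row, so the cleaned line can still be split on commas afterwards.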
I think your regex is pretty close. Add an exception for commas as well, get rid of the space, and you are good.
BufferedReader r = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = r.readLine()) != null)
{
String replaced = line.replace("\"", "");
replaced = replaced.replaceAll("[^A-Za-z0-9,]", "+");
System.out.println(replaced);
}
Of course, Strings are immutable in Java. Keep that in mind. replaceAll() returns a new String and does not modify the original instance.
You first need to find each quoted section and replace the , inside it with +. Next you can just use replaceAll("[^A-Za-z0-9,]", "+") to replace every character that is neither alphanumeric nor , with +. Your code for that can use the
Pattern p = Pattern.compile("\"([^\"]*)\"");
pattern to locate the quoted sections, and appendReplacement and appendTail from the Matcher class to replace each one found with its new version.
So in short your code can look something like
Scanner scanner = new Scanner(new File(csvfile));
Pattern p = Pattern.compile("\"([^\"]*)\"");
StringBuffer sb = new StringBuffer();
while(scanner.hasNextLine()){
String line = scanner.nextLine();
Matcher m = p.matcher(line);
while (m.find()){//find quoted sections
//and replace each with its content, with `,` replaced by `+`
//BTW group(1) holds the quoted part without the `"` marks;
//quoteReplacement keeps any `$` or `\` in it literal
m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).replace(',', '+')));
}
m.appendTail(sb);//we need to also add rest of unmatched data to buffer
//now we can just normally replace special characters with +
String result = sb.toString().replaceAll("[^A-Za-z0-9,]", "+");
//after the job is done we can use the result, so let's print it
System.out.println(result);
//let's not forget to reset the buffer for the next line
sb.delete(0, sb.length());
}
Answer to the question
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
line =line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}
