Regex for csv file separated by spaces and optional quotes [duplicate] - java

This question already has answers here:
Python csv string to array
(10 answers)
Closed 3 years ago.
I have a csv file that is in this format:
22/09/2011 15:15:11 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 096 123456_110922_151511_000001M.jpg 123456 "DBx 4U02" 428008 100 95 "AB123CD"
22/09/2011 15:15:16 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 087 123456_110922_151516_000002M.jpg 123456 "DBx 4U02" 428008 100 95 "EF456GH"
22/09/2011 15:16:30 "AT45 - Km 2 +300 Foo " "PL - 0460" 70 079 123456_110922_151630_000005M.jpg 123456 "DBx 4U02" 428008 200 96 "LM789NP"
And I need a regex to split each value correctly, for example the first line would be:
22/09/2011
15:15:11
"AT45 - Km 2 +300 Foo "
"PL - 0460"
70 096 123456_110922_151511_000001M.jpg
123456
"DBx 4U02"
428008
100
95
"AB123CD"
I have found this regex: ([^,"]+|"([^"]|)*"), but it doesn't do the job quite well.
Can somebody give me a good hint?

This kind of tasks are better handled with CSV parser. One of them is http://opencsv.sourceforge.net/ which allows us to specify your own separator (and many other things).
String csv =
"22/09/2011 15:15:11 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 096 123456_110922_151511_000001M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"AB123CD\"\n" +
"22/09/2011 15:15:16 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 087 123456_110922_151516_000002M.jpg 123456 \"DBx 4U02\" 428008 100 95 \"EF456GH\"\n" +
"22/09/2011 15:16:30 \"AT45 - Km 2 +300 Foo \" \"PL - 0460\" 70 079 123456_110922_151630_000005M.jpg 123456 \"DBx 4U02\" 428008 200 96 \"LM789NP\"";
CSVParser parser = new CSVParserBuilder().withSeparator(' ').build();
CSVReader reader = new CSVReaderBuilder(new StringReader(csv))
.withCSVParser(parser)
.build();
for (String[] row : reader){
for (String str : row){
System.out.println(str);
}
System.out.println("----");
}
Output (at least its beginning):
22/09/2011
15:15:11
AT45 - Km 2 +300 Foo
PL - 0460
70
096
123456_110922_151511_000001M.jpg
123456
DBx 4U02
428008
100
95
AB123CD
----

Related

How to get only characters from a file in java [duplicate]

This question already has answers here:
extract data column-wise from text file using Java
(2 answers)
Closed 4 years ago.
I have a file txt. This is the file:
Team P W L D F A Pts
1. Arsenal 38 26 9 3 79 - 36 87
2. Liverpool 38 24 8 6 67 - 30 80
3. Manchester_U 38 24 5 9 87 - 45 77
4. Newcastle 38 21 8 9 74 - 52 71
5. Leeds 38 18 12 8 53 - 37 66
6. Chelsea 38 17 13 8 66 - 38 64
7. West_Ham 38 15 8 15 48 - 57 53
8. Aston_Villa 38 12 14 12 46 - 47 50
9. Tottenham 38 14 8 16 49 - 53 50
How can I get only the name of teams? I tried to use the regex in the following way but don't work:
FileReader f;
f=new FileReader("file.txt");
BufferedReader b;
b=new BufferedReader(f);
s=b.readLine();
String[] name = s.split("\\w+");
for(int i=0;i<name.length;i++)
System.out.println(name[i]);
How do I solve? Thanks to everyone in advance!
FileReader f;
f=new FileReader("file.txt");
BufferedReader b;
b=new BufferedReader(f);
while(s=b.readLine()!=null){
Matcher name=Pattern.compile("(?<=\\d\\.\\s)\\S+").matcher(s);
if(name.find())
System.out.println(name.group());
}
here the regex (?<=\\d\\.\\s)\\S+ will match only the name after the serial no. Regex
If you want to read line by line and your file has structure as you presented. These code enable you to get clubs names.
File f = new File("file.txt");
Scanner sc = new Scanner(f);
sc.nextLine();
while (sc.hasNextLine()) {
String[] name = sc.nextLine().split("\\s+");
System.out.println(name[1]);
}
try to use replaceAll, find all word characters (a-zA-Z_) and replace them all with empty. this gives team name.
s=b.readLine();
s.replaceAll("[^a-zA-Z_]+","");
System.out.println(s);
Your string s is one line:
1. Arsenal 38 26 9 3 79 - 36 87
All you need to do is split by space and get second entry:
s.split(" ")[1]
RegEx is overkill here. Do this for each line and add the name to a list at each step.

Finding substring from a string using regex java

I have a String:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
I want to get the path (in this case D:\\workdir\\PV 81\\config\\sum81pv.pwf) from this string. This path is an argument of a command option -sn or -n, so this path always appears after these options.
The path may or may not contain whitespaces, which needs to be handled.
public class TestClass {
public static void main(String[] args) {
String path;
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
path = s.replaceAll(".*(-sn|-n) \"?([^ ]*)?", "$2");
System.out.println("Path: " + path);
}
}
Current output: Path: D:\workdir\PV 81\config\sum81pv.pwf -C 5000
Expected output: Path: D:\workdir\PV 81\config\sum81pv.pwf
Below Answers working fine for the earlier case.
i need a regex which return `*.pwf` path if the option is `-sn, -n, -s, -s -n, or without -s or -n.`
But if I have below case then what would be the regex to find password file.
String s1 = msqllab91 0 0 1 50 50 60 /mti/root/bin/msqlora -n "tmp/my.pwf" -s
String s2 = msqllab92 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.pwf
String s3 = msqllab93 0 0 1 50 50 60 msqlora -s -n "/mti/root/my.pwf" -C 10000
String s4 = msqllab94 0 0 1 50 50 60 msqlora.exe -sn /mti/root/my.pwf
String s5 = msqllab95 0 0 1 50 50 60 msqlora.exe -sn "/mti/root"/my.pwf
String s6 = msqllab96 0 0 1 50 50 60 msqlora.exe -sn"/mti/root"/my.pwf
String s7 = msqllab97 0 0 1 50 50 60 "/mti/root/bin/msqlora" -s -n /mti/root/my.pwf -s
String s8 = msqllab98 0 0 1 50 50 60 /mti/root/bin/msqlora -s
String s9 = msqllab99 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.NOTpwf -s -n /mti/root/my.pwf
String s10 = msqllab90 0 0 1 50 50 60 /mti/root/bin/msqlora -sn /mti/root/my.NOTpwf -sn /mti/root/my.pwf
String s11 = msqllab901 0 0 1 50 50 60 /mti/root/bin/msqlora
String s12 = msqllab902 0 0 1 50 50 60 /mti/root/msqlora-n NOTmy.pwf
String s13 = msqllab903 0 0 1 50 50 60 /mti/root/msqlora-n.exe NOTmy.pwf
i need a regex which return *.pwf path if the option is -sn, -n, -s, -s -n, or without -s or -n.
path contains *.pwf file extension only not NOTpwf or any other extension and code should all work except the last two because it is an invalid command.
Note: I already asked this type of question but didn't get anything working as per my requirement. (How to get specific substring with option vale using java)
You can use:
path = s.replaceFirst(".*\\s-s?n\\s*(.+?)(?:\\s-.*|$)", "$1");
//=> D:\workdir\PV 81\config\sum81pv.pwf
Code Demo
RegEx Demo
Try this
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
int l=s.indexOf("-sn");
int l1=s.indexOf("-C");
System.out.println(s.substring(l+4,l1-2));
You can also use : [A-Z]:.*\.\w+
Demo and Explaination
Rather than using complex regexps for replacing, I'd rather suggest a simpler one for matching:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
Pattern pattern = Pattern.compile("\\s-s?n\\s*(.*?)\\s*-C\\s+\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
// => D:\workdir\PV 81\config\sum81pv.pwf
See the IDEONE Demo
If the -C <NUMBER> is optional at the end, wrap with an optional group -> (?:\\s*-C\\s+\\d+)?$.
Pattern details:
\\s - a whitespace
-s?n - a -sn or -n (as s? matches an optional s)
\\s* - 0+ whitespaces
(.*?) - Group 1 matching any 0+ chars other than a newline
\\s* - ibid
-C - a literal -C
\\s+ - 1+ whitespaces
\\d+ - 1 or more digits
$ - end of string.

Different data when printing and writing to file

There is a stream of data which is sent from server. I need to store this byte stream into a file. The problem is the data which I output to console and the one which I store in a file are different. Seems like there is a change in format of data when I stored in a file.
Here is the program:
try
{
System.out.println("My Address is "+serverSocket.getLocalSocketAddress());
Socket server = serverSocket.accept(); // return a new socket
System.out.println("Connected to client "+server.getRemoteSocketAddress());
inputStream = server.getInputStream();
in = new DataInputStream(inputStream);
out = new FileOutputStream("output.txt");
ArrayList<Byte> bytes = new ArrayList<Byte>();
int curi;
byte cur;
byte[] curBytes = null;
int length = 0;
System.out.println("Before while loop");
while((curi = in.read())!=-1 && count!=500)
{
System.out.println(count+" Reading some data");
//out.write(curi);
cur = (byte)curi;
bytes.add(cur);
curBytes = getPrimativeArray(bytes);
String curBytesString = new String(curBytes, "UTF-8");
count++;
}
int i=0;
for(byte b : bytes)
{
System.out.print(b+" ");
curBytes[i] = b;
i++;
}
out.write(curBytes);
server.close();
}
catch(IOException e)
{
e.printStackTrace();
}
What I print using System.out.print(b+" "); and the one I store in curBytes[] are the same thing. But when I compare the console and file output, they are different.
Console output: 0 0 113 -100 -126 -54 0 32 14 1 0 0 1 -58 60 54 0 3 63 -2 85 74 -81 -88 0 9 1 24 85 74 -81 -48 0 13 65 -113 85 74 -81 -88 0 12 125 -126 85 74 -81 -88 0 13 21 97 85 74 -81 -88 0 13 31 19 85 74 -81 -48 0 13 42 24 0 6 0 0 0 0 0 0 0 0 0 0 32 0 7 -100 0 -5 6 -128 0 -56 29 -127 23 112 -1 -1 0 0 64 0 1 -121 28 115 105 112 58 43 49 52 50 50 50 48 57 57 57 49 53 64 111 110 101 46 97 116 116 46 110 101 116 28 115 105 112 58 43 49 52 50 50 50 48 57 57 57 54 53 64 111 110 101 46 97 116 116 46 110 101 116 37 50 57 54 53 45 49 53 48 53 48 54 50 51 50 55 48 50 45 50 48 53 48 54 54 50 55 54 54 64 48 48 55 56 48 48 49 49 16 32 1 5 6 64 0 0 0 32 16 0 0 0 120 0 17 16 32 1 24 -112 16 1 46 2 0 0 0 0 0 0 0 6 1 -113 0 4 0 33 -64 -42 0 91 5 8 0 9 0 -56 0 0 0 15 3 85 74 -81 -88 0 12 -120 -28 8 0 9 0 -56 0 0 0 15 3 85 74 -81 -88 0 12 -44 -39 8 0 4 0 -56 0 0 1 11 3 85 74 -81 -88 0 9 1 24 8 0 5 0 0 0 0 0 0 3 85 74 -81 -88 0 13 31 19 8 0 1 0 -56 0 0 0 6 3 85 74 -81 -48 0 13 42 24 -64 34 4 24 9 89 83 73 80 47 50 46 48 47 84 67 80 32 91 50 48 48 49 58 53 48 54 58 52 48 48 48 58 48 58 50 48 49 48 58 48 58 55 56 58 49 49 93 58 49 51 55 48 59 98 114 97 110 99 104 61 122 57 104 71 52 98 75 50 57 48 45 48 48 55 56 48 48 49 49 45 48 48 48 102 45 52 52 49 57 55 49 52 48 51 3 85 74 -81 -88 0 12 -120 -28 127 83 73 80 47 50 46 48 47 84 67 80 32 91 50 48 48 49 58 53 48 54 58 52 48 48 48 58 48 58 50 48 49 48 58 48 58 55 56 58 49 49 93 58 49 51 55 48 59 114 101 99 101 105 118 101 100 61 50 48 48 49
File Output: ^#^#q<9c><82>Ê^# ^N^A^#^#^AÆ<6^#^C?þUJ¯¨^# ^A^XUJ¯Ð^#^MA<8f>UJ¯¨^#^L}<82>UJ¯¨^#^M^UaUJ¯¨^#^M^_^SUJ¯Ð^#^M*^X^#^F^#^#^#^#^#^#^#^#^#^# ^#^G<9c>^#û^F<80>^#È^]<81>^Wpÿÿ^#^##^#^A<87>^\sip:+14222099915#one.att.net^\sip:+14222099965#one.att.net%2965-150506232702-2050662766#00780011^P ^A^E^F#^#^#^# ^P^#^#^#x^#^Q^P ^A^X<90>^P^A.^B^#^#^#^#^#^#^#^F^A<8f>^#^D^#!ÀÖ^#[^E^H^# ^#È^#^#^#^O^CUJ¯¨^#^L<88>ä^H^# ^#È^#^#^#^O^CUJ¯¨^#^LÔÙ^H^#^D^#È^#^#^A^K^CUJ¯¨^# ^A^X^H^#^E^#^#^#^#^#^#^CUJ¯¨^#^M^_^S^H^#^A^#È^#^#^#^F^CUJ¯Ð^#^M*^XÀ"^D^X YSIP/2.0/TCP [2001:506:4000:0:2010:0:78:11]:1370;branch=z9hG4bK290-00780011-000f-441971403^CUJ¯¨^#^L<88>ä^?SIP/2.0/TCP [2001:506:4000:0:2010:0:78:11]:1370;received=2001
Please let me know at what step I'm making a mistake.
The other answers here tell you to use a PrintWriter or a FileWriter instead of the FileOutputStream but I'm fairly sure that this is not what you want.
Your problem is that you're writing raw bytes to a file and then reading it back as characters and comparing that to byte values represented as characters and then printed with System.out.
Let's take a look at what happens when you print a byte with the value 65 (or 01000001 in binary).
When you use System.out.print you will invoke PrintStream.print(int) with the integer value of 65 which will in turn print the characters 6 and 5 to the terminal.
When you use out.write you will invoke FileOutputStream.write(byte[]) which will write the bits 01000001 to the file.
Later, when you check the contents of the file your tool will try to interpret this byte as a character and it will most likely use the ASCII encoding to do so (even if you're using Unicode as your default encoding this is likely what will happen since Unicode is a superset of ASCII). This results in the character A being printed.
If you want to view the output file in a way similar to what you've printed with System.out.print you can use the following command on linux:
$ hexdump -e '/1 "%i "' <file>
Example:
$ cat /etc/issue
Ubuntu 12.04.5 LTS \n \l
$ hexdump -e '/1 "%i "' /etc/issue
85 98 117 110 116 117 32 49 50 46 48 52 46 53 32 76 84 83 32 92 110 32
92 108 10 *
My first answer was wrong, so I am editing this because I made the assumption that you could write out a string to the FileOutputStream, but I don't think that is the case. FileOutputStream is only used for byte streams, so you've got to stick to that format when writing out to the file.
If you hold the data in a buffer[array], and then write those bytes out to a file that you have created using the output stream, it should work. I found this document that might be helpful.
The main idea is that somewhere in your code, the byte array isn't getting written to the file correctly. Perhaps its just a matter of adding the close() method.
out.close();
server.close();
reading and writing files in java
Here is the section I found useful.
import java.io.*;
public class Test {
public static void main(String [] args) {
// The name of the file to create.
String fileName = "temp.txt";
try {
// Put some bytes in a buffer so we can
// write them. Usually this would be
// image data or something. Or it might
// be unicode text.
String bytes = "Hello theren";
byte[] buffer = bytes.getBytes();
FileOutputStream outputStream =
new FileOutputStream(fileName);
// write() writes as many bytes from the buffer
// as the length of the buffer. You can also
// use
// write(buffer, offset, length)
// if you want to write a specific number of
// bytes, or only part of the buffer.
outputStream.write(buffer);
// Always close files.
outputStream.close();
System.out.println("Wrote " + buffer.length +
" bytes");
}
catch(IOException ex) {
System.out.println(
"Error writing file '"
+ fileName + "'");
// Or we could just do this:
// ex.printStackTrace();
}
}
}
The console (System.out) is a PrintWriter, while the file output is a FileOutputStream.
The basic difference between Stream and Writer: Streams are supposed to manipulate "raw data", like numbers taken directly from binary format, while writers are used to manipulate "human-readable data", transforming all the data you write.
For example, the 6 int is different from the 6 character. When you use a stream, you write directly the int, while with a writer, the data wrriten is transformed into the character.
Then, if you want your file output to be the same as your console output, do not use FileOutputStream, but instead, use FileWriter, and it's method write(String).
How to make this work:
1 - replace out = new FileOutputStream("output.txt"); by out = new FileWriter("output.txt");
2 - replace out.write(curBytes); by:
for (byte b : curBytes) {
out.write(b + " ");
}
I would suggest you use IOUtils.copy and use a BufferedReader
to wrap your InputStream.
The output stream should obviously be FileOutputStream
I hope this helps

How to get unique values in jList?

I have an excel database with some values like
PlotID Sl_no Species Gbh_cm
P1 1 Cassia fistula 13
P1 2 Lagerstroemia parviflora 41
P1 3 Lagerstroemia parviflora 59
P2 84 Shorea robusta 97
P2 85 Shorea robusta 101
P2 86 Shorea robusta 103
P2 87 Terminalia tomentosa 227
P2 88 Terminalia tomentosa 83
P2 89 Terminalia tomentosa 226
P2 90 Terminalia tomentosa 88
I want to display distinct values of species in my jList.When I run my code I get distinct values of all species except Terminalia tomentosa which is the last species of my excel database. This species come two times.
My code is:
sql1="select distinct Species from [Data$] where Plot_ID = ? ";
pst1=con.prepareStatement(sql1);
pst1.setString(1, tmp);
//pst1.setString(2, tmp);
rs5=pst1.executeQuery();
while(rs5.next())
{
species=rs5.getString("Species");
System.out.println(species);
dlm1.addElement(species);
}
listSPECIES.setModel(dlm1);

Comparing lines in a file

I am trying to compare File 1 and File 2.
File 1:
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 1
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 1
File 2:
5.1 0.11 0.32 1.6 0.028 12 90 0.99008 3.57 0.52 12.2 6 -1
7.3 0.25 0.39 6.4 0.034 8 84 0.9942 3.18 0.46 11.5 5 1
6.9 0.38 0.25 9.8 0.04 28 191 0.9971 3.28 0.61 9.2 5 -1
7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
In both files the last element in each line is class label.
I am comparing if the class labels are equal.
ie compare the classlabel of
line1:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
with
line2:7.3 0.28 0.36 12.7 0.04 38 140 0.998 3.3 0.79 9.6 6 1
Matches.
compare
line1:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 1
with
line2:7.4 0.33 0.26 15.6 0.049 67 210 0.99907 3.06 0.68 9.5 5 -1
Not matches
Updated
What I did is
String line1;
String line2;
int notequalcnt = 0;
while((line1 = bfpart.readLine())!=null){
found = false;
while((line2 = bfin.readLine())!=null){
if(line1.equals(line2)){
found = true;
break;
}
else{
System.out.println("not equal");
notequalcnt++;
}
}
}
But I am getting every one as not equal.
Am I doing anything wrong.
After the first iteration itself, line2 becomes null. So, the loop will not execute again... Declare line2 buffer after the first while loop. Use this code:
public class CompareFile {
public static void main(String args[]) throws IOException{
String line1;
String line2;
boolean found;
int notequalcnt =0;
BufferedReader bfpart = new BufferedReader(new FileReader("file1.txt"));
while((line1 = bfpart.readLine())!=null){
found = false;
BufferedReader bfin = new BufferedReader(new FileReader("file2.txt"));
while((line2 = bfin.readLine())!=null){
System.out.println("line1"+line1);
System.out.println("line2"+line1);
if(line1.equals(line2)){
System.out.println("equal");
found = true;
break;
}
else{
System.out.println("not equal");
}
}
bfin.close();
if(found==false)
notequalcnt++;
}
bfpart.close();
}
}
You're comparing every line from file 1 with every line from file 2, and you are printing "not equal" every time any one of them doesn't match.
If file 2 has 6 lines, and you are looking for a given line from file 1 (say it's also in file 2), then 5 of the lines from file 2 won't match, and "not equal" will be output 5 times.
Your current implementation says "if any lines in file 2 don't match, it's not a match", but what you really mean is "if any lines in file 2 do match, it is a match". So your logic (pseudocode) should be more like this:
for each line in file 1 {
found = false
reset file 2 to beginning
for each line in file 2
if line 1 equals line 2
found = true, break.
if found
"found!"
else
"not found!"
}
Also you describe this as comparing "nth line of file 1 with nth line of file 2", but that's not actually what your implementation does. Your implementation is actually comparing the first line of file 1 with every line of file 2 then stopping, because you've already consumed every line of file 2 in that inner loop.
Your code has a lot of problems, and you probably need to sit back and work out your logic on paper first.
If the target is to compare and find the matching lines. Convert the file contents to an arraylist and compare the values.
Scanner s = new Scanner(new File("file1.txt"));
ArrayList<String> file1_list = new ArrayList<String>();
while (s.hasNext()){
file1_list .add(s.next());
}
s.close();
s = new Scanner(new File("file2.txt"));
ArrayList<String> file2_list = new ArrayList<String>();
while (s.hasNext()){
file2_list .add(s.next());
}
s.close();
for(String line1 : file1_list ){
if(file2_list.contains(line1)){
// found the line
}else{
// NOt found the line
}
}
Check Apache file Utils o compare files.
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html

Categories