Splitting up file in Java Scanner using Regex

I have the following data, which I want to split:
1111|AAA|DDDD|CCC00021|RR13|600999922|101111287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000
2222|BBB|DDDD|CCC00031|RR15|600911122|101000287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000
3333|AAA|DDDD|CCC11021|RR01|600955522|101122287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000
Treating each line as one record, I need to store each element
to get an output of:
1111
AAA
DDDD
CCC00021
RR13
600999922
101111287
0
0
2011-06-20 15:38:31.549000
2011-06-30 08:57:20.114000
Next line
2222
BBB
DDDD
CCC00031
RR15
600911122
101000287
0
0
2011-06-20 15:38:31.549000
2011-06-30 08:57:20.114000
Next Line
3333
AAA
DDDD
CCC11021
RR01
600955522
101122287
0
0
2011-06-20 15:38:31.549000
2011-06-30 08:57:20.114000
I am using the Scanner class.

Code:
String var = "1111|AAA|DDDD|CCC00021|RR13|600999922|101111287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000 2222|BBB|DDDD|CCC00031|RR15|600911122|101000287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000 3333|AAA|DDDD|CCC11021|RR01|600955522|101122287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000";
// "|" is a regex metacharacter, so it is escaped to split on the literal pipe
for (String x : var.split("\\|")) {
    System.out.println(x);
}
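
One possible way to get the per-line output described above is to let Scanner hand over one line at a time and split each line on the literal pipe. This is only a minimal sketch: the class name PipeRecords and the method parse are made up for illustration, and it assumes every record sits on its own line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class PipeRecords {
    // Hypothetical helper: reads the input line by line and splits each
    // non-blank line on the literal '|' character.
    static List<List<String>> parse(Scanner scanner) {
        List<List<String>> records = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // The pipe must be escaped because it means "or" in a regex.
            // The limit -1 keeps trailing empty fields instead of dropping them.
            records.add(Arrays.asList(line.split("\\|", -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        String data =
            "1111|AAA|DDDD|CCC00021|RR13|600999922|101111287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000\n"
          + "2222|BBB|DDDD|CCC00031|RR15|600911122|101000287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000\n"
          + "3333|AAA|DDDD|CCC11021|RR01|600955522|101122287|0|0|2011-06-20 15:38:31.549000|2011-06-30 08:57:20.114000\n";
        List<List<String>> records = parse(new Scanner(data));
        for (int i = 0; i < records.size(); i++) {
            if (i > 0) {
                System.out.println("Next line");
            }
            for (String field : records.get(i)) {
                System.out.println(field);
            }
        }
    }
}
```

To read from a file instead, the same method could be given a Scanner built from a File, since only the source of the lines changes.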

Related

Java: Hex-String split by Hex-Byte with regex

I would like to split a hex string in Java using split(), a regex, or any other convenient method.
I want to split it at a specific byte value, given in hex.
Example:
String hex = "1234567E1237E6787E4321";
String del = "7E"; // ~
hex.split( ????? );
Result should be:
123456
1237E678
4321
If I use hex.split( del ); I get:
123456
123
678
4321
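
The plain split(del) fails because "7E" also matches across a byte boundary (the 7 of 37 followed by the E of E6). One way around this, sketched below without a regex, is to walk the string two characters at a time and compare whole bytes only. The class name HexSplit and the method splitOnByte are hypothetical, and the sketch assumes the input has an even number of hex digits (a trailing odd digit would be dropped).

```java
import java.util.ArrayList;
import java.util.List;

public class HexSplit {
    // Hypothetical helper: splits a hex string at every byte-aligned
    // occurrence of the two-digit delimiter.
    static List<String> splitOnByte(String hex, String del) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Step through the string one byte (two hex digits) at a time.
        for (int i = 0; i + 2 <= hex.length(); i += 2) {
            String b = hex.substring(i, i + 2);
            if (b.equalsIgnoreCase(del)) {
                // Delimiter byte found: close the current part.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(b);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        // prints [123456, 1237E678, 4321]
        System.out.println(splitOnByte("1234567E1237E6787E4321", "7E"));
    }
}
```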

Splitting a string into an array then splitting the array again

I have this string:
fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862
I want to use a loop to split this string into an array at each comma (,), for example:
[0]fname lname
[1]GTA V: 120 : 00000
[2]Minecraft : 20 : 10
[3]Assassin’s Creed IV : 90 : 800
[4]Payday 2 : 190 : 2001
[5]Wolfenstein TNO : 25 : 80
[6]FarCry 4 : 55 : 862
Then I want to use another loop to split each element further at the colon (:) into another array, for example:
[0]fname lname
[1]GTA V
[2]120
[3]00000
[4]Minecraft
[5]20
[6]10
....
Is there a better way of doing this?
Currently I have:
List<String> lines = new ArrayList<String>();
while (scan.hasNextLine())
{
    lines.add(scan.nextLine());
}
//converts the list array to string array
String[] scanarray = lines.toArray(new String[0]);
//converts the string array into one large string
String str_array = Arrays.toString(scanarray);
String[] arraysplit;
arraysplit = str_array.split("\\s*:\\s*");
for (int i=0; i<arraysplit.length; i++)
{
    arraysplit[i] = arraysplit[i].trim();
    // ^^^^^^^^^^^^ has values with spaces
    System.out.println(scanarray[i]);
}
EDIT:
Currently my program creates 3 identical arrays, containing the example you can see in the second block of code above.

You can use the split method of the String class with multiple delimiters:
public static void main(String[] args) {
    String myOriginalString = " fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862";
    // | is the regex OR operator
    String[] splited = myOriginalString.split(",|:");
    // trim() removes the whitespace left around each element
    for (String s : splited)
        System.out.println(s.trim());
}

You can do this with a regex: put every delimiter you want to split on, together with its surrounding whitespace, into the pattern passed to the String split method.
The code below splits on commas and colons and absorbs the whitespace around them, so names such as "GTA V" stay in one piece:
public class StackSol1 {
    public static void main(String[] args) {
        String str = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 :80, FarCry 4 : 55 : 862";
        // a comma or a colon, together with any whitespace around it
        String delimiters = "\\s*[,:]\\s*";
        // analyzing the string
        String[] tokensVal = str.split(delimiters);
        // prints the number of tokens
        System.out.println("Count of tokens = " + tokensVal.length);
        // prints one token per line
        System.out.println(String.join("\n", tokensVal));
    }
}

How about using split with a regex character class? For example:
String aa = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10, Assassin’s Creed IV : 90 : 800, Payday 2 : 190 : 2001 ,Wolfenstein TNO : 25 : 80, FarCry 4 : 55 : 862";
// the elements keep their surrounding whitespace; call trim() on each if needed
String[] a = aa.split("[,:]");
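
If the grouping per comma-separated entry matters (the question asks to split at commas first and then at colons), a nested result may be easier to work with than one flat array. The following is a rough sketch under that assumption; the class name NestedSplit and the method splitTwice are invented for this example, and the sample string is shortened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedSplit {
    // Hypothetical helper: first splits on commas, then splits each entry
    // on colons. Whitespace around both delimiters is absorbed by the regex.
    static List<List<String>> splitTwice(String input) {
        List<List<String>> result = new ArrayList<>();
        for (String entry : input.trim().split("\\s*,\\s*")) {
            result.add(Arrays.asList(entry.split("\\s*:\\s*")));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "fname lname, GTA V: 120 : 00000, Minecraft : 20 : 10";
        // prints [[fname lname], [GTA V, 120, 00000], [Minecraft, 20, 10]]
        System.out.println(splitTwice(s));
    }
}
```

Each inner list then holds one game with its two numbers, and the first inner list holds only the name.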

Finding substring from a string using regex java

I have a String:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
I want to get the path (in this case D:\\workdir\\PV 81\\config\\sum81pv.pwf) from this string. This path is an argument of a command option -sn or -n, so this path always appears after these options.
The path may or may not contain whitespaces, which needs to be handled.
public class TestClass {
    public static void main(String[] args) {
        String path;
        String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
        path = s.replaceAll(".*(-sn|-n) \"?([^ ]*)?", "$2");
        System.out.println("Path: " + path);
    }
}
Current output: Path: D:\workdir\PV 81\config\sum81pv.pwf -C 5000
Expected output: Path: D:\workdir\PV 81\config\sum81pv.pwf
The answers below work fine for the earlier case.
I need a regex that returns the *.pwf path whether the option is -sn, -n, -s, -s -n, or there is no -s or -n at all.
But for the cases below, what regex would find the password file?
String s1 = "msqllab91 0 0 1 50 50 60 /mti/root/bin/msqlora -n \"tmp/my.pwf\" -s";
String s2 = "msqllab92 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.pwf";
String s3 = "msqllab93 0 0 1 50 50 60 msqlora -s -n \"/mti/root/my.pwf\" -C 10000";
String s4 = "msqllab94 0 0 1 50 50 60 msqlora.exe -sn /mti/root/my.pwf";
String s5 = "msqllab95 0 0 1 50 50 60 msqlora.exe -sn \"/mti/root\"/my.pwf";
String s6 = "msqllab96 0 0 1 50 50 60 msqlora.exe -sn\"/mti/root\"/my.pwf";
String s7 = "msqllab97 0 0 1 50 50 60 \"/mti/root/bin/msqlora\" -s -n /mti/root/my.pwf -s";
String s8 = "msqllab98 0 0 1 50 50 60 /mti/root/bin/msqlora -s";
String s9 = "msqllab99 0 0 1 50 50 60 /mti/root/bin/msqlora -s -n /mti/root/my.NOTpwf -s -n /mti/root/my.pwf";
String s10 = "msqllab90 0 0 1 50 50 60 /mti/root/bin/msqlora -sn /mti/root/my.NOTpwf -sn /mti/root/my.pwf";
String s11 = "msqllab901 0 0 1 50 50 60 /mti/root/bin/msqlora";
String s12 = "msqllab902 0 0 1 50 50 60 /mti/root/msqlora-n NOTmy.pwf";
String s13 = "msqllab903 0 0 1 50 50 60 /mti/root/msqlora-n.exe NOTmy.pwf";
The path must have the .pwf file extension only, not NOTpwf or any other extension. The code should work for all of these cases except the last two, which are invalid commands.
Note: I already asked this type of question but didn't get anything working as per my requirement. (How to get specific substring with option vale using java)

You can use:
path = s.replaceFirst(".*\\s-s?n\\s*(.+?)(?:\\s-.*|$)", "$1");
//=> D:\workdir\PV 81\config\sum81pv.pwf

Try this:
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
int l = s.indexOf("-sn");
int l1 = s.indexOf("-C");
// the path starts after "-sn " and ends just before the space that precedes "-C"
System.out.println(s.substring(l + 4, l1 - 1));

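You can also use the regex [A-Z]:.*\.\w+, which matches from the drive letter through the last file extension in the string. It only applies to Windows-style paths that start with an uppercase drive letter.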

Rather than using a complex regex for replacing, I'd suggest a simpler one for matching:
// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "msqlsum81pv 0 0 25 25 25 2 -sn D:\\workdir\\PV 81\\config\\sum81pv.pwf -C 5000";
Pattern pattern = Pattern.compile("\\s-s?n\\s*(.*?)\\s*-C\\s+\\d+$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
// => D:\workdir\PV 81\config\sum81pv.pwf
If the -C <NUMBER> part is optional at the end, wrap it in an optional non-capturing group: (?:\\s*-C\\s+\\d+)?$.
Pattern details:
\\s - a whitespace
-s?n - a -sn or -n (as s? matches an optional s)
\\s* - 0+ whitespaces
(.*?) - Group 1 matching any 0+ chars other than a newline
\\s* - 0+ whitespaces
-C - a literal -C
\\s+ - 1+ whitespaces
\\d+ - 1 or more digits
$ - end of string.
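
For the follow-up cases in the question (s1 to s13), a single replace call gets awkward because of the quoted segments and the repeated options. One possible approach, shown here only as a sketch, is to find every -n or -sn option that is preceded by whitespace, take the argument that follows, strip the quotes, and keep the first argument that ends in .pwf. The class name PwfPath and the method extract are hypothetical. The sketch also assumes that any whitespace inside a path is enclosed in double quotes, so it does not cover the unquoted "PV 81" path from the first example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PwfPath {
    // -n or -sn preceded by whitespace, optional whitespace, then one
    // argument made of quoted segments and/or non-space, non-quote characters.
    private static final Pattern OPTION =
        Pattern.compile("\\s-s?n\\s*((?:\"[^\"]*\"|[^\\s\"])+)");

    // Hypothetical helper: returns the first option argument that ends in
    // ".pwf" once its quotes are removed, or null if there is none.
    static String extract(String command) {
        Matcher m = OPTION.matcher(command);
        while (m.find()) {
            String candidate = m.group(1).replace("\"", "");
            if (candidate.endsWith(".pwf")) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String s5 = "msqllab95 0 0 1 50 50 60 msqlora.exe -sn \"/mti/root\"/my.pwf";
        String s12 = "msqllab902 0 0 1 50 50 60 /mti/root/msqlora-n NOTmy.pwf";
        System.out.println(extract(s5));  // prints /mti/root/my.pwf
        System.out.println(extract(s12)); // prints null
    }
}
```

Because the option must be preceded by whitespace, the -n inside msqlora-n is not treated as an option, which is why the two invalid commands yield null.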

Transferring data structure from R to Java

I have an R script that does some computation. The last step of the computation is a kernel density estimate: http://www.inside-r.org/packages/cran/kerdiest/docs/kde
I now, in R, need to convert the result of calling kde into a string, or save it into a file, such that I can read and "unmarshal" it from a Java program.
What is the best format to use for the exchange and what R and Java libraries can read / write that format?
The structure is not ridiculously complex, but also not trivial:
> str(tmp)
List of 8
$ x : num [1:1398, 1:3] 1.035 0.902 0.679 0.826 1.243 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ eval.points:'data.frame': 1398 obs. of 3 variables:
..$ Rb ppm: num [1:1398] 1.035 0.902 0.679 0.826 1.243 ...
..$ Sb ppm: num [1:1398] -2.58 -2.6 -2.48 -2.44 -2.53 ...
..$ Cr ppm: num [1:1398] 4.56 4.44 4.3 4.26 4.49 ...
$ estimate : Named num [1:1398] 0.1572 0.0897 0.0311 0.0434 0.099 ...
..- attr(*, "names")= chr [1:1398] "1" "2" "3" "4" ...
$ H : num [1:3, 1:3] 0.02395 0.00927 -0.014 0.00927 0.06868 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:3] "Rb ppm" "Sb ppm" "Cr ppm"
$ w : num [1:1398] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"

RJSONIO seems to do the job. It seems quite verbose, however.

Why doesn't ANTLR recognise this rule the way I expect?

I'm using ANTLR to replace an existing (small) parser I currently have. Here is a snippet of the file I am trying to parse:
Lurker 915236167 10 2 Bk cc b b 1000 70 200 Jc Qs
Lurker 915236237 10 1 Bc kf - - 1130 10 0
Lurker 915236302 10 10 c c rc b 1120 110 305 6d Kd
Lurker 915236381 10 9 c f - - 1315 20 0
Lurker 915236425 10 8 cc f - - 1295 30 0
Here is Shared.g:
lexer grammar Shared;
NICK
: LETTER (LETTER | NUMBER | SPECIAL)*
;
fragment
LETTER
: 'A'..'Z'
| 'a'..'z'
| '_'
;
NUMBER
: ('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9')+
;
fragment
SPECIAL
: ('-'|'^'|'{'|'}'|'|'|'['|']'|'`'|'\\')
;
WS
: ( ' '
| '\t'
| '\r'
| '\n'
)+
;
And Pdb.g:
grammar Pdb;
import Shared;
@header {
    import java.util.ArrayList;
    import java.sql.Connection;
}
@members {
    private Connection conn;
    private StringBuilder currentExpr = new StringBuilder(500);
    ArrayList<String> players = new ArrayList<String>(10);
    public void setConn(Connection conn) {
        this.conn = conn;
    }
}
pdb
: line+
;
line
@after{
currentExpr.append("execute player_handplan(");
currentExpr.append($nick.text);
currentExpr.append(", to_timestamp(");
currentExpr.append(Integer.parseInt($timestamp.text));
currentExpr.append("), ");
currentExpr.append(Integer.parseInt($n_players.text));
currentExpr.append(", ");
currentExpr.append(Integer.parseInt($position.text));
currentExpr.append(", ");
currentExpr.append($action_p.text);
currentExpr.append(", ");
currentExpr.append($action_f.text);
currentExpr.append(", ");
currentExpr.append($action_t.text);
currentExpr.append(", ");
currentExpr.append($action_r.text);
currentExpr.append(", ");
currentExpr.append(Integer.parseInt($bankroll.text));
currentExpr.append(", ");
currentExpr.append(Integer.parseInt($total_action.text));
currentExpr.append(", ");
currentExpr.append(Integer.parseInt($amount_won.text));
currentExpr.append(", ");
currentExpr.append("CARDS");
currentExpr.append(");");
System.out.println(currentExpr.toString());
currentExpr = new StringBuilder(500);
}
: nick=NICK WS
timestamp=NUMBER WS
n_players=NUMBER WS
position=NUMBER WS
action_p=action WS
action_f=action WS
action_t=action WS
action_r=action WS
bankroll=NUMBER WS
total_action=NUMBER WS
amount_won=NUMBER WS
(NICK WS NICK WS)? // ignore this
;
action
: '-'
| ('B'|'f'|'k'|'b'|'c'|'r'|'A'|'Q'|'K')+
;
My problem is, when I run the parser, I get the following error:
cal@lambda:~/src/DecisionTrees/grammar/output$ cat example | java Test
line 1:26 no viable alternative at input 'Bk'
line 1:30 no viable alternative at input 'cc'
execute player_handplan(Lurker, to_timestamp(915236167), 10, 2, null, null, b, b, 1000, 70, 200, CARDS);
Why won't my grammar accept "Bk", even though it will accept "b"? I feel like there is something obvious I am overlooking. Thanks in advance

Why not use {$channel=HIDDEN;} in the WS rule and leave WS out of the line rule?
That way, at least you won't get into trouble by accidentally putting in one WS too many.
And if an action can have at most 2 characters, maybe this will help:
action
: '-'
| ('B'|'f'|'k'|'b'|'c'|'r'|'A'|'Q'|'K')('B'|'f'|'k'|'b'|'c'|'r'|'A'|'Q'|'K')?
;
