How can I split a String into an array with the following delimiters in Java?

I have a line in an input file.
It is arranged as follows (example):
(space)MOV(space)A,(space)(space)#20
When the program reads this line, I plan to split() the string and add the parts to an array. I use the following code for this:
while ((nline = bufreader.readLine()) != null)
{
    String[] array = nline.split("[ ,]");
    // ...
}
In other words, the string is split on two delimiters: (space) and (comma). So I expect my array to have a length of 3, but in practice I get 6.
So, as I understand it, the computer creates the array {"(space)", "MOV", "(space)", "A", "(space)", "(space)", "#20"}. However, I need this array: {"MOV", "A", "#20"}.
How can I get this? Or how can I split the string according to the delimiters mentioned above? (I suppose that nline.split("[ ,]") is not correct.)

I put all the explanations in comments on the relevant lines; have a look at this:
import java.io.*;
import java.util.Arrays;

String nline;
BufferedReader bufreader = new BufferedReader(new FileReader(new File("nameOfYourFile")));
while ((nline = bufreader.readLine()) != null) {
    String trimmed = nline.trim(); // remove leading and trailing spaces
    // System.out.println(trimmed); Output from this line: >>MOV A,  #20<< (">>" and "<<" only show where it begins and ends)
    String[] splitted = trimmed.split("[ ,]+"); // split on one or more consecutive ' ' or ',' characters, so ",  " counts as one delimiter (no '|' needed: inside [...] it would match a literal pipe)
    System.out.println(Arrays.toString(splitted)); // Output: [MOV, A, #20]
}
bufreader.close();
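To see where the six elements come from, the sketch below (class and method names are illustrative, not from the question) runs both patterns on the example line " MOV A,  #20". The extra elements are empty strings, not spaces: split("[ ,]") yields one empty string before the leading space and two more between the comma and the two spaces that follow it.

```java
import java.util.Arrays;

class DelimiterSplitDemo {
    // Every single ' ' or ',' is a delimiter, so a leading delimiter and
    // adjacent delimiters produce empty strings in the result.
    static String[] naiveSplit(String line) {
        return line.split("[ ,]");
    }

    // Trim first so no leading empty string appears, then treat any run of
    // spaces and commas as a single delimiter.
    static String[] tokens(String line) {
        return line.trim().split("[ ,]+");
    }

    public static void main(String[] args) {
        String line = " MOV A,  #20"; // space, MOV, space, A, comma, two spaces, #20
        System.out.println(Arrays.toString(naiveSplit(line))); // [, MOV, A, , , #20]
        System.out.println(Arrays.toString(tokens(line)));     // [MOV, A, #20]
    }
}
```

Trimming first matters: without trim(), a line that starts with a delimiter still produces a leading empty string even with "[ ,]+".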

Related

How do I access every single word in a file in Java?

I am trying to store every single word in this file in an array so I can apply my own language implementation to it. I have already applied split, but when I put the string into the variable parts, parts[0] displays the whole file instead of only one word, while parts[1] gives an error:
java.lang.ArrayIndexOutOfBoundsException: 2
How do I access every single word in this file?
String[] parts = line.split("\\s+");
System.out.print(parts[0] + '\n');
The file test.snol contains:
SNOL
INTO num IS 5
INTO res IS MULT num num
INTO res IS MULT res res
INTO res IS MOD res num
PRINT num
PRINT res
LONS
If you are using Java 8, you can do this in a single statement:
String[] words = Files.lines(Paths.get(PATH))
.flatMap(line -> Arrays.stream(line.split(" ")))
.toArray(String[]::new);
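Files.lines keeps the file open until the stream is closed (its Javadoc recommends a try-with-resources statement), and split(" ") yields empty strings wherever two spaces meet. A hedged variant of the one-liner that addresses both is sketched below; it reads from a Reader via BufferedReader.lines() so it can run on an in-memory string, and the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WordStreamDemo {
    // Collects every whitespace-separated word from the source into one array.
    // With a real file, wrap Files.lines(Paths.get(PATH)) in try-with-resources the same way.
    static String[] words(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                    .filter(word -> !word.isEmpty()) // a blank line splits into a single ""
                    .toArray(String[]::new);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "SNOL\nINTO num IS 5\nPRINT num\nLONS\n";
        System.out.println(Arrays.toString(words(new StringReader(sample))));
        // [SNOL, INTO, num, IS, 5, PRINT, num, LONS]
    }
}
```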
Alternatively, if you want each line as a String[] (giving you a List<String[]>), you can use:
List<String[]> lines = Files.lines(Paths.get(PATH))
        .map(e -> e.split(" "))
        .collect(Collectors.toList());
The regular-expression token for whitespace is \s. Your code uses a forward slash (/) instead of a backslash (\). A forward slash has no special meaning in a regex, so your code is trying to match two forward slashes followed by one or more s characters.
In Java, regular expressions are passed as strings, so the backslash must be escaped with a second backslash (unlike a forward slash, which needs no special handling). Your regular expression should read "\\s+", which matches one or more whitespace characters.
Your call to split should then return an array with each word from the line as a different element.
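A quick way to see the difference described above is to run both patterns on one line of test.snol. This is an illustrative sketch; the class and method names are invented. With the forward-slash pattern nothing matches, so split returns a one-element array and any access to parts[1] throws ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

class WhitespaceRegexDemo {
    // "//s+" means a literal "//" followed by one or more 's' characters,
    // so it never matches ordinary text and split returns the whole line.
    static String[] splitWithSlashes(String line) {
        return line.split("//s+");
    }

    // "\\s+" in Java source is the regex \s+ : one or more whitespace characters.
    static String[] splitOnWhitespace(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "INTO num IS 5";
        System.out.println(Arrays.toString(splitWithSlashes(line)));  // [INTO num IS 5]
        System.out.println(Arrays.toString(splitOnWhitespace(line))); // [INTO, num, IS, 5]
    }
}
```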
If you are reading your file line by line, you can access every word with code like this:
try (BufferedReader reader = new BufferedReader(new FileReader("D:\\test.snol"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] words = line.split("\\s+");
        for (String word : words) {
            System.out.println(word);
        }
    }
} // try-with-resources closes the reader

How can I parse a "key1:value1, value, key2:value3" string into ArrayLists?

I have a string
String line = "abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq";
I want to parse this string into two lists:
ArrayList<String> tags;
ArrayList<String> values;
where the tags are the words before the colon (in my example: abc, def, ijk and mno). That is, I want:
tags = Arrays.asList("abc", "def", "mno", "ijk");
values = Arrays.asList("xyz uvw", "ghi", "rst", "efg, abc opq");
Note that the values can have spaces and commas in them and are not just one word.
Since your values can contain commas, you need to split only where a new key starts.
A key is defined as a word followed by a :.
So your split pattern will be ", (?=[a-zA-Z]+:)"
This matches a comma and a space, but only when they are followed by letters and a colon; the (?=...) lookahead checks for those characters without consuming them.
In other words, it looks ahead for a key and splits before it, leaving the key intact. This gives you an array of key:value pairs.
Then you can split each pair on : to get the keys and values.
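String str = "Your string";
List<String> keysArray = new ArrayList<>();
List<String> valuesArray = new ArrayList<>();
String[] keyValuePairs = str.split(", (?=[a-zA-Z]+:)"); // [a-zA-Z], not [a-zA-z]: A-z also covers [ \ ] ^ _ `
for (String keyValuePair : keyValuePairs) {
    String[] keyvalue = keyValuePair.split(":", 2); // split on the first ':' only
    keysArray.add(keyvalue[0]);
    valuesArray.add(keyvalue[1]);
}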
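Put together as a self-contained sketch (the parse method and its list parameters are illustrative stand-ins for keysArray and valuesArray), the lookahead approach looks like this. It splits each pair with a limit of 2 so a value that itself contains ':' stays whole; it still assumes every piece contains at least one ':'.

```java
import java.util.ArrayList;
import java.util.List;

class KeyValueLookaheadDemo {
    // Splits only at ", " that is immediately followed by letters and a colon
    // (the lookahead inspects the next key without consuming it), then splits
    // each pair on its first ':'.
    static void parse(String str, List<String> keys, List<String> values) {
        for (String keyValuePair : str.split(", (?=[a-zA-Z]+:)")) {
            String[] keyValue = keyValuePair.split(":", 2);
            keys.add(keyValue[0]);
            values.add(keyValue[1]);
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        parse("abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq", keys, values);
        System.out.println(keys);   // [abc, def, mno, ijk]
        System.out.println(values); // [xyz uvw, ghi, rst, efg, abc opq]
    }
}
```

The last value is "efg, abc opq": the ", " before "abc opq" is not followed by a colon, so the lookahead does not split there.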
I would go with a regex. I am not sure how to do this in Java, but in Python it would be:
(\w+):([ ,\w]+)(,|$)
Tested on pythex with input abc:xy z uvw, def:g,hi, mno:rst. The result is:
Match 1
1. abc
2. xy z uvw
3. ,
Match 2
1. def
2. g,hi
3. ,
Match 3
1. mno
2. rst
3. Empty
So for each match you get the key in group 1 and the value in group 2. The separator is stored in group 3.
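The same pattern should carry over to Java's java.util.regex. The sketch below assumes, as in the examples, that values only contain word characters, spaces and commas; backslashes are doubled because the regex is written as a Java string literal, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegexDemo {
    // Group 1 = key, group 2 = value (may contain spaces and commas),
    // group 3 = the separating comma, or empty at the end of the input.
    private static final Pattern PAIR = Pattern.compile("(\\w+):([ ,\\w]+)(,|$)");

    static void parse(String line, List<String> tags, List<String> values) {
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            tags.add(m.group(1));
            values.add(m.group(2));
        }
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>();
        List<String> values = new ArrayList<>();
        parse("abc:xyz uvw, def:ghi, mno:rst, ijk:efg, abc opq", tags, values);
        System.out.println(tags);   // [abc, def, mno, ijk]
        System.out.println(values); // [xyz uvw, ghi, rst, efg, abc opq]
    }
}
```

Because [ ,\w] does not admit characters such as '.', '-' or ':', a value like "1.5" would not be captured correctly; widen the character class if your data needs it.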
First, obtain your string from the file:
List<String> tags = new ArrayList<String>();
List<String> values = new ArrayList<String>();
String lineThatWasRead = ...
Then we split it on commas to obtain the pairs, and split each pair on ":":
List<String> separatedStringList = Arrays.asList(lineThatWasRead.split(","));
for (String rawStr : separatedStringList)
{
    String str = rawStr.trim(); // drop the space that follows each comma
    //Since it is possible values also contain commas, we have to check
    //if the current string is a new pair of tag:value or just adding to the previous value
    if (!str.contains(":"))
    {
        if (values.size() > 0)
        {
            values.set(values.size() - 1, values.get(values.size() - 1) + ", " + str);
        }
        continue; //And go to next str since this one will not have new keys
    }
    String[] keyValArray = str.split(":", 2); // split on the first ':' only
    String key = keyValArray[0];
    String val = keyValArray[1];
    tags.add(key);
    values.add(val);
}
Note that you are not forced to use lists; I just like the flexibility they give. Some might argue that a String[] would perform better for the first split on ','.
Get your line as a string, then split it:
String line = ...; // your code here
String[] stuff = line.split(":"); // this splits your line on the ":" symbol
String key = stuff[0];   // this is your key
String value = stuff[1]; // this is your value

Transferring each element in a text file into an array

I have written this method to read file.txt and transfer its contents into an ArrayList.
My problem is that I don't want to transfer a whole line into one string; I want each element on the line as a separate string.
public ArrayList<String> readData() throws IOException {
FileReader pp=new FileReader(filename);
BufferedReader nn=new BufferedReader(pp);
ArrayList<String> data=new ArrayList<String>();
String line;
while((line=nn.readLine()) != null){
data.add(line);
}
xoxo.close();
return data;
}
Is it possible?
What about reading the lines, but splitting each line into single words?
while ((line = nn.readLine()) != null) {
    for (String word : line.split(" ")) {
        data.add(word); // add each word, not the whole line
    }
}
The method split(" ") in this example splits the line at each space character " " and puts the single words into an array.
If the words in the file are separated by another character (a comma, for example), you can use that in split() too:
line.split(",");
If I may, here is a somewhat easier way to read a text file:
Scanner scanner = new Scanner(new File(filename)); // new Scanner(filename) with a String would scan the name itself, not the file
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
for (String word : line.split(" ")) {
data.add(word);
}
}
Well, not easier, but shorter :)
And one last piece of advice: if you give your variables more readable names, like bufferedReader instead of nn, pp and xoxo, you may have fewer problems as the code grows more complex later on.
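Since Scanner splits on whitespace by default, the inner split() loop can be dropped when all you want is words. The sketch below (names are illustrative) scans a String so it runs without a file; a String argument to the Scanner constructor is treated as the text to scan, which is also why it has to be wrapped in new File(...) to read a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWordsDemo {
    // The default delimiter is whitespace (spaces, tabs, line breaks),
    // so next() already returns one word at a time.
    static List<String> readWords(Scanner scanner) {
        List<String> data = new ArrayList<>();
        while (scanner.hasNext()) {
            data.add(scanner.next());
        }
        return data;
    }

    public static void main(String[] args) {
        // For a real file: new Scanner(new File(filename))
        try (Scanner scanner = new Scanner("first line\nsecond  line")) {
            System.out.println(readWords(scanner)); // [first, line, second, line]
        }
    }
}
```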
Use the split method of String.
String line = "This is line";
String[] a = line.split("\\s"); // \\s is the regex for a single whitespace character
a[0] = This
a[1] = is
a[2] = line
If by 'element' you mean each character, then changing
line = nn.readLine()
to
int c = nn.read()
gives you one character at a time: read() returns a single character as an int, or -1 at the end of the stream. Note that read() does not stop at spaces, so if by element you mean each word, you have to collect the characters yourself until you hit a space, or simply read each line and split that string up using any of the various functions Java provides.

regular expression for extracting some data from a text file

I have a text with sentences in this format:
sentence 1 This is a sentence.
t-extraction 1 This is a sentence
s-extraction 1 This_DT is_V a_DT sentence_N
sentence 2 ...
As you can see, the lines are separated by line breaks (Enter). The words sentence, t-extraction and s-extraction repeat. The numbers are sentence numbers 1, 2, ... The phrases are separated by Tab characters, for example in the first line: sentence(Tab)1(Tab)This is a sentence.
or in the second line: t-extraction(Tab)1(Tab)This(Tab)is(Tab)a sentence.
I need to map some of this information into an SQL table, so I have to extract it.
I need the first and second sentences (without the word sentence in the first lines, and without t-extraction and the numbers in the second lines). Each Tab-separated part will be mapped to a field in SQL (for example 1 in one column, This is a sentence in one column, This (from the second lines) in one column, and likewise is and a sentence).
What is your suggestion? Thanks in advance.
You could use String.split().
The regex you could use is [^A-Za-z_]+ or [ \t]+
Using the split method on String is probably the key to this. The split command breaks a string into parts where the regex matches, returning an array of Strings of the parts between the matches.
You want to match on tab (written \t in the regex). You also want to process three lines as a unit; the code below shows one way of doing this (it depends on the file being well formatted).
Of course, you want to use a reader created from your file, not from a string.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;

public class Test {
    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader("/my/file.data"))) {
            String line;
            // each record spans three lines: sentence, t-extraction, s-extraction
            for (int i = 0; (line = reader.readLine()) != null; i++) {
                String[] parts = line.split("\t"); // tab-separated fields
                if (i % 3 == 0) {
                    System.out.printf("sentence ==> %s%n", Arrays.toString(parts));
                } else if (i % 3 == 1) {
                    System.out.printf("t-sentence ==> %s%n", Arrays.toString(parts));
                } else {
                    System.out.printf("s-sentence ==> %s%n", Arrays.toString(parts));
                }
            }
        }
    }
}
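To get at the individual columns, one option is to split each line on tabs and drop the leading label. The sketch below (the helper name is hypothetical) shows only that step; mapping the fields to SQL columns is left out.

```java
import java.util.Arrays;

class TabFieldsDemo {
    // Splits a tab-separated line and drops the leading label
    // ("sentence", "t-extraction", "s-extraction"), keeping the sentence
    // number and the remaining fields.
    static String[] fieldsWithoutLabel(String line) {
        String[] parts = line.split("\t");
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    public static void main(String[] args) {
        String tLine = "t-extraction\t1\tThis\tis\ta sentence";
        System.out.println(Arrays.toString(fieldsWithoutLabel(tLine))); // [1, This, is, a sentence]
    }
}
```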

Java regex, delete content to the left of comma

I have a string with a bunch of numbers separated by "," in the following form:
1.2223232323232323,74.00
I want them in a String[], but I only need the number to the right of the comma (74.00). The list has about 10,000 different lines like the one above. Right now I'm using String.split(","), which gives me:
System.out.println(String[1]) =
1.2223232323232323
74.00
Why does it not split into two different indexes? I thought split would produce this:
System.out.println(String[1]) = 1.2223232323232323
System.out.println(String[2]) = 74.00
But String[] array = string.split(",") produces one element with both values separated by a newline.
And I only need 74.00. I assume I need to use a regex, which is kind of Greek to me. Could someone help me out? :)
If it's in a file:
Scanner sc = new Scanner(new File("..."));
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's all one giant string, separated by new-lines:
String oneGiantString = "1.22,74.00\n1.22,74.00\n1.22,74.00";
Scanner sc = new Scanner(oneGiantString);
sc.useDelimiter("(\r?\n)?.*?,");
while (sc.hasNext())
System.out.println(sc.next());
If it's just a single string for each:
String line = "1.2223232323232323,74.00";
System.out.println(line.replaceFirst(".*?,", ""));
Regex explanation:
(\r?\n)? means an optional line break (\n, optionally preceded by \r).
. means a wildcard.
.*? means 0 or more wildcards (*? instead of * means non-greedy, or lazy, matching: it stops at the first comma it reaches instead of the last one).
, means, well, ..., a comma.
Using split for a line read from a file, or for a single string:
String line = "1.2223232323232323,74.00";
String value = line.split(",")[1];
Using split for one giant string (this also needs the regex, but I'd prefer Scanner, which doesn't need all that memory):
String line = "1.22,74.00\n1.22,74.00\n1.22,74.00";
String[] array = line.split("(\r?\n)?.*?,");
for (int i = 1; i < array.length; i++) // the first element is empty
System.out.println(array[i]);
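For the 10,000-line file in the question, reading line by line sidesteps the newline confusion, because BufferedReader.readLine() strips the line terminator before split() sees the text. A sketch, assuming every non-blank line has the form left,right (names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RightOfCommaDemo {
    // Keeps only the text after the first comma on each line.
    // Assumes every non-blank line contains a comma; blank lines are skipped.
    static List<String> rightValues(Reader source) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                result.add(line.split(",", 2)[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "1.2223232323232323,74.00\n1.22,74.00\n";
        System.out.println(rightValues(new StringReader(sample))); // [74.00, 74.00]
    }
}
```

With a real file, pass new FileReader(path) instead of the StringReader.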
Just try:
String[] parts = "1.2223232323232323,74.00".split(",");
String value = parts[1]; // your 74.00
String[] strings = "1.2223232323232323,74.00".split(",");
