Java File Parsing - Go word by word - java
I have a file with the following content:
Sample.txt:
Hi my name is john
and I am an engineer. How are you
The output I want is an ArrayList of strings like [Hi,my,name,is,john,and,I,am,an,engineer,.,How,are,you]
The standard Java function parses the file line by line, so I would get an array containing the lines. I am unsure which approach I should use to get the output above.
Any help is appreciated.
.nextLine() will get one whole line but .next() will go word by word
You could check out using the Scanner class with the .next() method.
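A minimal sketch of the Scanner approach, assuming whitespace-separated tokens are acceptable. The class and method names are hypothetical. Note that next() keeps punctuation attached to the preceding word ("engineer." stays one token), so it does not split out the "." the question asks for.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: collects whitespace-separated tokens with Scanner.next().
final class ScannerWords {
    static List<String> readWords(Readable source) {
        List<String> words = new ArrayList<>();
        // try-with-resources closes the scanner (and the underlying source)
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                words.add(scanner.next()); // next() returns one token at a time
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "Hi my name is john\nand I am an engineer. How are you";
        // prints [Hi, my, name, is, john, and, I, am, an, engineer., How, are, you]
        System.out.println(readWords(new StringReader(sample)));
    }
}
```

To read from a file instead, a java.io.FileReader can be passed as the source.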
This will read the file and collect all words into a list of strings.
Edit: Updated to handle punctuation and the like as distinct words:
try {
List<String> words = Files.lines(Paths.get("/path/to/sample.txt"))
.map(line -> line.split("\\b"))
.flatMap(Arrays::stream)
.filter(w -> !w.trim().isEmpty())
.collect(Collectors.toList());
return words;
} catch (IOException e) {
// handle error
}
If you are getting the strings as whole lines but just want the words, you could call .split(" ") on each line, as this returns an array containing the individual words with no spaces. If you want to do this while reading the file, you could use something like the following:
public ArrayList<String> readWords(File file) throws IOException {
    ArrayList<String> words = new ArrayList<String>();
    String cLine;
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        while ((cLine = reader.readLine()) != null) {
            // split each line on single spaces and keep every piece
            for (String word : cLine.split(" ")) {
                words.add(word);
            }
        }
    }
    return words;
}
which would return an ArrayList<String> containing all of the individual words in the file.
Hope this helps.
Related
How to collect CSV row as array of strings using simpleflatmapper
I'm trying to collect each CSV row as an array of strings using simpleflatmapper:
try (Reader in = Files.newBufferedReader("path")) {
    return org.simpleflatmapper.csv.CsvParser
        // .mapTo(String[].class)
        .stream(in)
        // .parallel()
        // .flatMap(Arrays::stream)
        .map(line -> {return new ArrayList<>(Arrays.asList(line));})
        // .map(Arrays::asList)
        .collect(Collectors.toList());
} catch (Exception e) {
    e.printStackTrace();
}
When I debug, line is a String[], but it holds the entire row as one element instead of many strings (one per cell). How can I get the array of cells?
The CSV file is nothing special. Example:
a\t b\t 1\t 2
x\t y\t 3\t 4
The issue, as I see it, is in .map(line -> {return new ArrayList<>(Arrays.asList(line));}): line contains one string value that is the whole line (with tabs, spaces, ...) instead of many strings (one per cell).
The result I want is a List<List<String>> (a list of lines), where each line is a List<String> (a list of cells). The current result is a list of lines (rows) where each row is one whole string.
Since the file is CSV, you don't need an external library. You can simply read it the way you would read a text file, like this:
Scanner scanner = new Scanner(new File("MyFile.csv"));
while (scanner.hasNextLine()) {
    // use "\t" instead of "," if the cells are tab-separated, as in the sample above
    myArray = scanner.nextLine().split(",");
}
I have found the solution:
return org.simpleflatmapper.csv.CsvParser
    .separator('\t') //<-- solution
    .stream(in)
    .map(Arrays::asList)
    .collect(Collectors.toList());
Thanks all!
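For comparison, a plain-Java sketch that yields the same List<List<String>> shape without a CSV library. It assumes simple tab-separated cells with no quoting or escaping (a real CSV parser handles those cases); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: splits each tab-separated line into a list of trimmed cells.
final class TabRows {
    static List<List<String>> parse(List<String> lines) {
        return lines.stream()
                // limit -1 keeps trailing empty cells instead of dropping them
                .map(line -> Arrays.stream(line.split("\t", -1))
                        .map(String::trim)
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
    }
}
```

The lines themselves could come from Files.readAllLines(path) or from any other line source.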
Grouping of words from a text file to Arraylist on the basis of length
public class JavaApplication13 {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        BufferedReader br;
        String strLine;
        ArrayList<String> arr = new ArrayList<>();
        HashMap<Integer, ArrayList<String>> hm = new HashMap<>();
        try {
            br = new BufferedReader(new FileReader("words.txt"));
            while ((strLine = br.readLine()) != null) {
                arr.add(strLine);
            }
        } catch (FileNotFoundException e) {
            System.err.println("Unable to find the file: fileName");
        } catch (IOException e) {
            System.err.println("Unable to read the file: fileName");
        }
        ArrayList<Integer> lengths = new ArrayList<>(); // list to keep lengths information
        System.out.println("Total Words: " + arr.size()); // total words read from file
        int i = 0;
        while (i < arr.size()) // this loop iterates over all the words read from words.txt
        {
            boolean already = false;
            String s = arr.get(i);
            // the following for loop checks whether that length is already in the lengths list
            for (int x = 0; x < lengths.size(); x++) {
                if (s.length() == lengths.get(x))
                    already = true;
            }
            // already == true means we already have an ArrayList for the current string length in our map
            if (already == true) {
                hm.get(s.length()).add(s); // add the string to hm (the HashMap) according to its length
            } else {
                hm.put(s.length(), new ArrayList<>()); // create a new entry in hm, then add the string with the new length
                hm.get(s.length()).add(s);
                lengths.add(s.length());
            }
            i++;
        }
        // now print the whole map
        for (int q = 0; q < hm.size(); q++) {
            System.out.println(hm.get(q));
        }
    }
}
Is this approach right?
Explanation: load all the words into an ArrayList, then iterate through each index, check the length of the word, and add it to an ArrayList of strings of that length. These ArrayLists are stored in a HashMap keyed by the length of the words they contain.
Firstly, your code only works for files that contain one word per line, because you process whole lines as words. To make your code more general, split each line into words:
String[] words = strLine.split("\\s+");
Secondly, you don't need any temporary data structures. You can add your words to the map right after you read a line from the file. The arr and lengths lists are useless here, as they hold no logic beyond temporary storage. You use the lengths list only to record which lengths have already been added to the hm map. The same can be achieved by calling hm.containsKey(s.length()).
An additional comment on your code:
for(int x=0;x<lengths.size();x++) { if(s.length()==lengths.get(x)) already=true; }
When you have a loop like this, where you only need to find out whether some condition holds for any element, you don't need to keep looping once the condition has been met. Use the break keyword inside your if statement to terminate the loop. Note the braces: without them, break would run unconditionally and end the loop after the first iteration.
for(int x=0;x<lengths.size();x++) {
    if(s.length()==lengths.get(x)) {
        already=true;
        break; // this will terminate the loop after setting the flag to true
    }
}
But as I already mentioned, you don't need this loop at all. It is shown for educational purposes only.
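The two suggestions above (split each line into words, fill the map directly) could be sketched as follows. This is an illustration, not the original poster's code: the class and method names are hypothetical, computeIfAbsent is used as a shorthand for the containsKey check, and a TreeMap is chosen only so the keys print in ascending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups the words of each line by their length.
final class WordsByLength {
    static Map<Integer, List<String>> group(List<String> lines) {
        Map<Integer, List<String>> byLength = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token
                }
                // creates the list on first use of a length, then appends the word
                byLength.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
            }
        }
        return byLength;
    }
}
```

Printing the returned map directly, or iterating over its entrySet(), avoids indexing it with 0..size()-1, which only works when the word lengths happen to form that exact range.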
Your approach is long, confusing, and hard to debug, and from what I see it's not good performance-wise (check out the contains method). Check this:
String[] words = {"a", "ab", "ad", "abc", "af", "b", "dsadsa", "c", "ghh", "po"};
Map<Integer, List<String>> groupByLength = Arrays.stream(words).collect(Collectors.groupingBy(String::length));
System.out.println(groupByLength);
This is just an example, but you get the point. I have an array of words, and I use Java 8 streams to group them in a map by length (exactly what you're trying to do). You get the stream, then collect it to a map, grouping by word length, so every one-letter word goes into a list under key 1, and so on. You can use the same approach, but since your words are in a list, call .stream() on the list instead of Arrays.stream().
How to only get the lines you want from an arraylist depending on how they start, IN JAVA
I have a very long string containing GPS data, but the content itself is not important. What I need to do is separate the string, which is in an ArrayList (as one big string), into multiple pieces. The tricky part is that the string is made up of multiple 'GPS sentences' and I only require two types of these sentences. The types I need start with $GPGSV and $GPGGA. Basically I need to dump ONLY THESE sentences into another ArrayList while leaving all the rest behind. The new ArrayList must be in line-by-line form so that each sentence is followed by a new line. Each sentence also ends in one white space, which could be helpful when splitting up.
The ArrayList data is shown below, as printed from the ArrayList:
[$GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A, $GPRMC,151018.000,A,5225.9627,N,00401.1624,W,0.11,104.71,210214,,*14, $GPGGA,151019.000,5225.9627,N,00401.1624,W,1,09,1.0,38.9,M,51.1,M,,0000*72, $GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A, $GPGSV,3,1,12,26,80,302,44,09,55,063,40,05,53,191,39,08,51,059,37*79, $GPGSV,3,2,12,28,43,112,34,15,40,284,42,21,18,305,33,07,18,057,27*7E, $GPGSV,3,3,12,10,05,153,,24,05,234,38,18,05,318,22,19,05,035,*79, $GPRMC,151019.000,A,5225.9627,N,00401.1624,W,0.10,105.97,210214,,*1D, $GPGGA,151020.000,5225.9627,N,00401.1624,W,1,09,1.0,38.9,M,51.1,M,,0000*78, $GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A, $GPRMC,151020.000,A,5225.9627,N,00401.1624,W,0.12,105.18,210214,,*12, $GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A, $GPRMC,151021.000,A,5225.9626,N,00401.1624,W,0.11,99.26,210214,,*28, $GPGGA,151022.000,5225.9626,N,00401.1623,W,1,09,1.0,38.9,M,51.1,M,,0000*7C, $GPGSA,A,3,28,09,26,15,08,05,21,24,07,,,,1.6,1.0,1.3*3A, $GPRMC,151022.000,A,5225.9626,N,00401.1623,W,0.11,109.69,210214,,*1F,
The data continues up to 2000 sentences. Any help would be great. Thanks
EDITS ------
Looking back at what I have, it may be best if I just read in the lines (as the file is formatted to be one sentence per line) which start with either the GSV or the GGA tag. In the buffered reader section of the method, how could I go about doing that? Here is some of my code:
try {
    File gpsioFile = new File(gpsFile);
    FileReader file = new FileReader(gpsFile);
    BufferedReader buffer = new BufferedReader(file);
    StringBuffer stringbuff = new StringBuffer();
    String ans;
    while ((ans = buffer.readLine()) != null) {
        gps.add(ans);
        stringbuff.append(ans);
        stringbuff.append("\n");
    }
} catch (Exception e) {
    e.printStackTrace();
}
From this, could I get an ArrayList with just the GGA and GSV sentences/lines, in the same order as they appear in the file? Thanks
OK, I'd start by splitting your string into individual sentences with split(). Splitting on "," would break each sentence into its fields, so split on the whitespace that ends each sentence instead (here data stands for your one big string):
String[] split = data.split("\\s+");
You can also use "\n" as the split delimiter if the sentences are separated by line breaks. This will give you an array over which you can iterate:
List<String> filtered = new ArrayList<String>();
for (String item : split) {
    if (item.startsWith("$GPGSV") || item.startsWith("$GPGGA")) {
        filtered.add(item);
    }
}
filtered is then a new list with the items you want to keep. This approach works with JDK 6+. In JDK 8, this kind of problem can be solved more elegantly with the stream API.
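A sketch of what the stream-based version might look like on JDK 8+. The class and method names are hypothetical, and the input is assumed to already be a list with one sentence per element (for a file with one sentence per line, Files.lines(path) could supply the stream instead).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only the sentences that start with one of the given tags.
final class SentenceFilter {
    static List<String> keepByPrefix(List<String> sentences, String... prefixes) {
        return sentences.stream()
                .map(String::trim) // drop the whitespace that ends each sentence
                .filter(s -> Arrays.stream(prefixes).anyMatch(s::startsWith))
                .collect(Collectors.toList()); // encounter order is preserved
    }
}
```

A call such as SentenceFilter.keepByPrefix(gps, "$GPGSV", "$GPGGA") would return the matching sentences in their original order.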
My understanding is that you've got an ArrayList with a single String element. That String is a comma-separated list of values. So step one is to extract the string and split it into its constituent parts. Once you've done that, you can process each item in turn.
private static List<List<String>> splitData(final ArrayList<String> data) {
    final List<List<String>> filteredData = new ArrayList<List<String>>();
    String fullText = data.get(0);
    String[] splitData = fullText.split(",");
    List<String> currentList = null;
    for (int i = 0; i < splitData.length; i++) {
        final String next = splitData[i];
        if (startTags.contains(next)) {
            if (interestingStartTags.contains(next)) {
                currentList = new ArrayList<String>();
                filteredData.add(currentList);
            } else {
                currentList = null;
            }
        }
        if (currentList != null) {
            currentList.add(next);
        }
    }
    return filteredData;
}
The two static Set<String> fields (startTags and interestingStartTags) provide the set of all 'GPS sentence' start tags and the set of the ones you're interested in. The splitData method uses startTags to determine whether it has reached the start of a new sentence. If the new tag is also interesting, a new list is created and added to the List<List<String>>. It is this list of lists that is returned.
If you don't know all of the strings you want to use as a start tag, you could use next.startsWith("$GP") or similar.
Reading the file
Looking at the updated question of how to read the file, you could remove the StringBuffer and instead simply add each line you read to an ArrayList. The code below skips any lines that do not start with one of the two tags you are interested in. The order of the lines within lineList will match the order in which they are found in the file.
FileReader file = new FileReader(gpsFile);
BufferedReader buffer = new BufferedReader(file);
String ans;
ArrayList<String> lineList = new ArrayList<String>();
while ((ans = buffer.readLine()) != null) {
    // the tags match the sample data in the question ($GPGSV and $GPGGA)
    if (ans.startsWith("$GPGSV") || ans.startsWith("$GPGGA")) {
        lineList.add(ans);
    }
}
How to make a method for searching from array and print entire arraybox?
I could use some help with part of the code I am working on. I made a method which I think transforms every line of my .txt file into a separate element of an array. However, I now want to be able to search the elements and make the program print the entire matching element. For example, one of the lines reads:
Crow, M, Kansas, june2012
I think I was able to make it into an array. Now I want to be able to search for "crow" and get every element containing that word printed, along with the rest of the String in the element. The code I have so far:
System.out.println("Her kan du soke etter registrerigner etter fugletype");
try {
    Scanner sc = new Scanner(new File("fugler.txt"));
    List<String> lines = new ArrayList<String>();
    while (sc.hasNextLine()) {
        lines.add(sc.nextLine());
    }
    String[] arr = lines.toArray(new String[lines.size]);
} catch (Exception e) {
}
As others have already pointed out, you don't need to put your lines into an array since you already have them in an ArrayList. If you want to "search" lines and only print certain ones, you could use contains:
try {
    Scanner sc = new Scanner(new File("fugler.txt"));
    List<String> lines = new ArrayList<String>();
    while (sc.hasNextLine()) {
        lines.add(sc.nextLine());
    }
    for (String line : lines) {
        if (line.contains("yourSearchString")) {
            System.out.println(line);
        }
    }
} catch (Exception e) {
}
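One caveat: String.contains is case-sensitive, so searching for "crow" would not match a line that starts with "Crow". A minimal sketch of a case-insensitive variant, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: returns every line that contains the term, ignoring case.
final class LineSearch {
    static List<String> find(List<String> lines, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // compare lower-cased copies; the original line is what gets returned
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Each returned element is the full original line, so printing the hits shows the rest of the record alongside the matched word.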
First of all, you don't need to put the lines in an array. You already have them in a list. You could print them as they come in:
while (sc.hasNextLine()) {
    String currentLine = sc.nextLine();
    System.out.println(currentLine);
    lines.add(currentLine);
}
Or, you could just print all the lines in your list:
for (String line : lines) {
    System.out.println(line);
}
In addition to the other problems: if you want to have an array, note that lines.size in your toArray call does not compile, because size is a method, not a field. Write it like this:
String[] arr = lines.toArray(new String[lines.size()]);
The array you pass in is the one toArray populates if it is big enough to hold all the elements. If it is too small (for example new String[0]), toArray allocates a new array of the same type and the right size, so both forms work.
If you want to search for a line with some exact value, you can use the original ArrayList<String>:
// returns the index of the element, i.e. its zero-based line number, or -1 if it is not found
int index = lines.indexOf(line);
To print them all, just loop through them all:
for (String l : lines) {
    System.out.println(l);
}
how to retrieve a line with spaces with comma as separator?
I have these records in my game.txt file:
menard,menard mabunga,0
francis,francis mabunga,0
angelica,francis mabunga,1
I access the file and store it in an ArrayList using this code:
Scanner s = new Scanner(getResources().openRawResource(R.raw.game));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNext()){
    list.add(s.next());
}
s.close();
And I use this function to load a random line:
public static String randomLine(ArrayList list) {
    return (String) list.get(new Random().nextInt(list.size()));
}
When I try to print the result of the randomLine function using System.out.println(randomLine(list)); the output is only mabunga,0. How can I retrieve a whole line that contains spaces and uses commas as separators?
You are reading words instead of lines. Use nextLine() instead of next(), and check hasNextLine() to match. So, this should be your code:
Scanner s = new Scanner(getResources().openRawResource(R.raw.game));
ArrayList<String> list = new ArrayList<String>();
while (s.hasNextLine()){
    list.add(s.nextLine()); // Change here!!
}
s.close();
I prefer BufferedReader over Scanner if the goal is to read lines.
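You should instantiate Random only once and store a reference to it. Creating a new Random on every call is wasteful, and on old JVMs that seed the generator from the clock, instances created in quick succession could return the same value from nextInt().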
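A minimal sketch of that suggestion, keeping one shared Random in a static field. The class name and the empty-list check are illustrative additions, not part of the original code.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper: picks a random element using a single shared generator.
final class RandomLines {
    private static final Random RANDOM = new Random(); // created once, reused for every call

    static String randomLine(List<String> lines) {
        if (lines.isEmpty()) {
            // nextInt(0) would throw, so fail with a clearer message
            throw new IllegalArgumentException("no lines to choose from");
        }
        return lines.get(RANDOM.nextInt(lines.size()));
    }
}
```

Typing the parameter as List<String> also removes the need for the (String) cast in the original method.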
Use a StringTokenizer (or a StreamTokenizer if you want to separate the fields while reading the file) with the separator set to ",". Then you should be fine. In the randomLine function, you can use the following piece of code instead of yours:
public static String randomLine(ArrayList list) {
    return (String) list.get((int) (Math.random() * list.size()));
}