Searching for strings in a text file

I was programming in Python, but now I want to write the same code in Java. Can you help me, please? This is the code I was working on:
import random
import re
a = "y"
while a == "y":
    i = input('Search: ')
    b = i.lower()
    word2 = ""
    for letter in b:
        lista = []
        with open(r'd:\lista.txt', 'r') as inF:
            for item in inF:
                if item.startswith(letter):
                    lista.append(item)
        word = random.choice(lista)
        word2 = word2 + word
    print(word2)
    a = input("Again? ")
Now I want to do the same in Java, but I'm not really sure how. It's not that easy, and I'm just a beginner. So far I found some code that searches a text file, but I'm stuck.
This is the Java code. It finds the position of the word. I've been trying to modify it, but I haven't gotten the results I'm looking for.
import java.io.*;
import java.util.Scanner;
class test {
    public static void main(String[] args){
        Scanner input = new Scanner(System.in);
        System.out.println("Search: ");
        String searchText = input.nextLine();
        String fileName = "lista.txt";
        StringBuilder sb = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(new FileReader(fileName));
            while (reader.ready()) {
                sb.append(reader.readLine());
            }
        }
        catch(IOException ex) {
            ex.printStackTrace();
        }
        String fileText = sb.toString();
        System.out.println("Position in file : " + fileText.indexOf(searchText));
    }
}
What I want is to search a text file containing a list of words, but only show the items that begin with the letters of the string I'm searching for. For example, I have the string "urgent" and the text file contains:
baby
redman
love
urban
gentleman
game
elephant
night
todd
So the output would be "urban" + "redman" + "gentleman" + ..., one word per letter, until it reaches the end of the string.

Let's assume you've already tokenized the file's contents, so you have an array of Strings, each containing a single word. That is what you get from the reader if the file has one word per line, which is what your Python code assumes.
String[] haystack = {"baby", "redman", "love", "urban", "gentleman", "game",
"elephant", "night", "todd"};
Now, to search for a needle, you can simply compare the first character of each word in your haystack with every character of the needle:
String needle = "urgent";
for (String s : haystack) {
    for (int i = 0; i < needle.length(); ++i) {
        if (s.charAt(0) == needle.charAt(i)) {
            System.out.println(s);
            break;
        }
    }
}
This solution runs in O(|needle| * |haystack|).
To improve it at the cost of a little extra memory, we can precompute a hash set of the allowed first characters:
// needs: import java.util.HashSet; import java.util.Set;
String needle = "urgent";
Set<Character> lookup = new HashSet<Character>();
for (int i = 0; i < needle.length(); ++i) {
    lookup.add(needle.charAt(i));
}
for (String s : haystack) {
    if (lookup.contains(s.charAt(0))) {
        System.out.println(s);
    }
}
The second solution runs in O(|needle| + |haystack|).
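Putting the pieces above together, here is a minimal self-contained sketch of the set-based version as a method that returns the matches instead of printing them. The class and method names are hypothetical, and unlike the snippets above it lowercases the needle and skips empty words. Note that the matches come out in haystack order, not in the order of the needle's letters, so on its own this does not reproduce "urban" + "redman" + "gentleman".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstLetterSearch {
    // Hypothetical helper: returns every haystack word whose first character
    // appears anywhere in the needle, in haystack order.
    static List<String> wordsStartingWithNeedleLetters(String[] haystack, String needle) {
        // One pass over the needle builds the lookup set
        Set<Character> lookup = new HashSet<>();
        for (char c : needle.toLowerCase().toCharArray()) {
            lookup.add(c);
        }
        // One pass over the haystack; each contains() is expected O(1)
        List<String> matches = new ArrayList<>();
        for (String s : haystack) {
            if (!s.isEmpty() && lookup.contains(s.charAt(0))) {
                matches.add(s);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] haystack = {"baby", "redman", "love", "urban", "gentleman", "game",
                "elephant", "night", "todd"};
        // prints [redman, urban, gentleman, game, elephant, night, todd]
        System.out.println(wordsStartingWithNeedleLetters(haystack, "urgent"));
    }
}
```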

This works if your list of words isn't too large. If it is large, you could adapt this so that you stream over the file multiple times, collecting the words you need.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
public class Test {
    public static void main(String[] args) {
        Map<Character, List<String>> map = new HashMap<Character, List<String>>();
        File file = new File("./lista.txt");
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(file));
            String line = null;
            while ((line = reader.readLine()) != null) {
                // assumes words are space separated with no
                // quotes or commas
                String[] tokens = line.split(" ");
                for(String word : tokens) {
                    if(word.length() == 0) continue;
                    // might as well avoid case issues
                    word = word.toLowerCase();
                    Character firstLetter = Character.valueOf(word.charAt(0));
                    List<String> wordsThatStartWith = map.get(firstLetter);
                    if(wordsThatStartWith == null) {
                        wordsThatStartWith = new ArrayList<String>();
                        map.put(firstLetter, wordsThatStartWith);
                    }
                    wordsThatStartWith.add(word);
                }
            }
            Random rand = new Random();
            String test = "urgent";
            List<String> words = new ArrayList<String>();
            for (int i = 0; i < test.length(); i++) {
                Character key = Character.valueOf(test.charAt(i));
                List<String> wordsThatStartWith = map.get(key);
                if(wordsThatStartWith != null){
                    String randomWord = wordsThatStartWith.get(rand.nextInt(wordsThatStartWith.size()));
                    words.add(randomWord);
                } else {
                    // text file didn't contain any words that start
                    // with this letter, need to handle
                }
            }
            for(String w : words) {
                System.out.println(w);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if(reader != null) {
                try {
                    reader.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
This assumes the content of lista.txt looks like
baby redman love urban gentleman game elephant night todd
And the output will look something like
urban
redman
gentleman
elephant
night
todd
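On Java 8 or later, `Map.computeIfAbsent` can replace the get/null-check/put sequence used above to build the index. The sketch below is an adaptation rather than the answer's code: it assumes one word per line (the layout your Python code reads), takes the first word for each letter instead of a random one so the result is repeatable, and silently skips letters that have no matching word. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LetterIndex {
    // Groups words by lowercase first letter; assumes one word per line.
    static Map<Character, List<String>> buildIndex(BufferedReader reader) throws IOException {
        Map<Character, List<String>> index = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String word = line.trim().toLowerCase();
            if (word.isEmpty()) continue;
            // Create the list the first time a letter is seen, then append
            index.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
        }
        return index;
    }

    // For each letter of the search text, in order, takes the first indexed word.
    static List<String> pickWords(Map<Character, List<String>> index, String search) {
        List<String> picked = new ArrayList<>();
        for (char c : search.toLowerCase().toCharArray()) {
            List<String> candidates = index.get(c);
            if (candidates != null) {
                picked.add(candidates.get(0));
            }
        }
        return picked;
    }

    public static void main(String[] args) throws IOException {
        String file = "baby\nredman\nlove\nurban\ngentleman\ngame\nelephant\nnight\ntodd\n";
        // In real use: new BufferedReader(new FileReader("lista.txt"))
        Map<Character, List<String>> index = buildIndex(new BufferedReader(new StringReader(file)));
        // prints [urban, redman, gentleman, elephant, night, todd]
        System.out.println(pickWords(index, "urgent"));
    }
}
```

To get the random behaviour of `random.choice`, replace `candidates.get(0)` with `candidates.get(rand.nextInt(candidates.size()))` for some `java.util.Random rand`.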

Related

Print out the total number of letters - Java

I want to print out the total number of letters (not including whitespace characters) of all the Latin names in the data file. Duplicate letters must be counted. This is what I have done so far:
List<Person> peopleFile = new ArrayList<>();
int numberOfLetters = 0;
try {
    BufferedReader br = new BufferedReader(new FileReader("people_data.txt"));
    String fileRead = br.readLine();
    while (fileRead != null) {
        String[] tokenSize = fileRead.split(":");
        String commonName = tokenSize[0];
        String latinName = tokenSize[1];
        Person personObj = new Person(commonName, latinName);
        peopleFile.add(personObj);
        fileRead = br.readLine();
        // Iterating each word
        for (String s: tokenSize) {
            // Updating the numberOfLetters
            numberOfLetters += s.length();
        }
    }
    br.close();
}
catch (FileNotFoundException e) {
    System.out.println("file not found");
}
catch (IOException ex) {
    System.out.println("An error has occurred: " + ex.getMessage());
}
System.out.print("Total number of letters in all Latin names = ");
System.out.println(numberOfLetters);
The problem is that it prints the number of letters in the whole file; I just want it to print the number of letters in the Latin names.
The text file:
David Lee:Cephaloscyllium ventriosum
Max Steel:Galeocerdo cuvier
Jimmy Park:Sphyrna mokarren

What you're doing wrong is that you add up the length of every token on the line, including the common name, even though you have already split the line into its two names. You can use this method to count the letters of any String or sentence:
public static int countLetter(String name) {
    int count = 0;
    if(name != null && !name.isEmpty()) {
        /* This regular expression splits the String at each run of
         * non-alphabetic characters, so the name is split
         * into its individual words */
        String[] tokens = name.split("[^a-zA-Z]+");
        for(String token : tokens) {
            count += token.length();
        }
    }
    return count;
}
And replace these lines
/* Note: here you are iterating over both names on each line */
for (String s: tokenSize) {
    // Updating the numberOfLetters
    numberOfLetters += s.length();
}
with this
numberOfLetters += countLetter(latinName);
Does that make sense? I hope that solves your problem.
NB: you can experiment with this regex in an online regex tester.
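If you would rather avoid the regular expression, a character-by-character count gives the same total for these names. This is a sketch, assuming that "letters" means characters accepted by `Character.isLetter` and that every line has the form `common name:latin name`; the class and method names are hypothetical.

```java
public class LatinLetterCount {
    // Counts characters for which Character.isLetter is true; spaces and punctuation are skipped.
    static int countLetters(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Only the part after the colon (the Latin name) is counted.
    static int countLatinLetters(String[] lines) {
        int total = 0;
        for (String line : lines) {
            String[] parts = line.split(":");
            if (parts.length < 2) continue; // skip lines without a Latin name
            total += countLetters(parts[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        String[] lines = {
            "David Lee:Cephaloscyllium ventriosum",
            "Max Steel:Galeocerdo cuvier",
            "Jimmy Park:Sphyrna mokarren"
        };
        // prints 56 (25 + 16 + 15)
        System.out.println("Total number of letters in all Latin names = " + countLatinLetters(lines));
    }
}
```

In your loop, the equivalent change is `numberOfLetters += countLetters(latinName);` in place of the loop over `tokenSize`.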

Get rid of all the whitespace before summing the lengths:
s=s.replaceAll("[ \n\t]+","");
numberOfLetters += s.length();

Calculating the frequency of strings as they get stored in a nested hashmap

I want to write code that stores strings in a hashmap as they are read from text files.
I have written the code below and it runs without errors, but the frequency of each string combination never changes; it is always 1.
I am asking for help making sure that if a string combination appears more than once in the text files, its frequency increases accordingly.
This is my code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
public class NgramBetaC {
    static String[] hashmapWord = null;
    public static Map<String,Map<String, Integer>> bigrams = new HashMap<>();
    public static void main(String[] args) {
        //prompt user input
        Scanner input = new Scanner(System.in);
        //read words from collected corpus; a number of .txt files
        File directory = new File("Corpus4");
        File[] listOfFiles = directory.listFiles(); //To read from all listed files in the "directory"
        //String bWord[] = null;
        int lineNumber = 0;
        String line;
        String files;
        String delimiters = "[\\s+,?!:;.]";
        int wordTracker = 0;
        //reading from a list of text files
        for (File file : listOfFiles) {
            if (file.isFile()) {
                files = file.getName();
                try {
                    if (files.endsWith(".txt") || files.endsWith(".TXT")) { //ensures a file being read is a text file
                        BufferedReader br = new BufferedReader(new FileReader(file));
                        while ((line = br.readLine()) != null) {
                            line = line.toLowerCase();
                            hashmapWord = line.split(delimiters);
                            for(int s = 0; s < hashmapWord.length - 2; s++){
                                String read = hashmapWord[s];
                                String read1 = hashmapWord[s + 1];
                                final String read2 = hashmapWord[s + 2];
                                String readBigrams = read + " " + read1;
                                final Integer count = null;
                                //bigrams.put(readBigrams, new HashMap() {{ put (read2, (count == null)? 1 : count + 1);}});
                                bigrams.put(readBigrams, new HashMap<String, Integer>());
                                bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count+1);
                            } br.close();
                        }
                    }
                } catch (NullPointerException | IOException e) {
                    e.printStackTrace();
                    System.out.println("Unable to read files: " + e);
                }
            }
        }
    }
}
The lines contained in the text files are:
1.i would like some ice cream.
2.i would like to be in dubai this december.
3.i love to eat pasta.
4.i love to prepare pasta myself.
5.who will be coming to see me today?
The output I get when printing the contents of the hashmap is:
{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=1}, see me={today=1}}
Please help! Some string combinations are not appearing at all.
The output I expect when reading from the files is:
{coming to={see=1}, would like={to=1}, in dubai={this=1}, prepare pasta={myself=1}, to eat={pasta=1}, like to={be=1}, to prepare={pasta=1}, will be={coming=1}, love to={prepare=1}, some ice={cream=1}, be in={dubai=1}, be coming={to=1}, dubai this={december=1}, to be={in=1}, i love={to=1}, to see={me=1}, who will={be=1}, like some={ice=1}, i would={like=2}, see me={today=1}, love to {eat=1}, would like {some=1}, i would {love=1}, would love {to=1}}

Update the existing inner map instead of overwriting the original content.
Replace
bigrams.put(readBigrams, new HashMap<String, Integer>());
bigrams.get(readBigrams).put(read2, (count == null) ? 1 : count+1);
With
Map<String, Integer> counter = bigrams.get(readBigrams);
if (null == counter) {
    counter = new HashMap<String, Integer>();
    bigrams.put(readBigrams, counter);
}
// named "existing" so it does not clash with the "count" already declared in the loop
Integer existing = counter.get(read2);
counter.put(read2, existing == null ? 1 : existing + 1);
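On Java 8 or later, the same update can be written with `Map.computeIfAbsent` and `Map.merge`. This is a self-contained sketch, not a drop-in replacement for the code in the question: it takes the lines as a list instead of reading the Corpus4 directory, and it assumes the delimiter pattern `[\s,?!:;.]+` (quantifier outside the character class) so that a run of delimiters counts as a single split. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrigramCounter {
    // Maps "w1 w2" -> (w3 -> number of times w3 followed "w1 w2")
    static Map<String, Map<String, Integer>> count(List<String> lines) {
        Map<String, Map<String, Integer>> bigrams = new HashMap<>();
        for (String line : lines) {
            String[] words = line.toLowerCase().trim().split("[\\s,?!:;.]+");
            for (int s = 0; s + 2 < words.length; s++) {
                String key = words[s] + " " + words[s + 1];
                // Reuse the inner map if this key was seen before, otherwise create it
                Map<String, Integer> counter = bigrams.computeIfAbsent(key, k -> new HashMap<>());
                // Store 1 for a new third word, or add 1 to its existing count
                counter.merge(words[s + 2], 1, Integer::sum);
            }
        }
        return bigrams;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "i would like some ice cream.",
            "i would like to be in dubai this december.",
            "i love to eat pasta.",
            "i love to prepare pasta myself.",
            "who will be coming to see me today?");
        Map<String, Map<String, Integer>> bigrams = count(lines);
        System.out.println(bigrams.get("i would")); // {like=2}
        System.out.println(bigrams.get("would like"));
    }
}
```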

Read a text file and store numbers in different arrays in java

I am currently writing my thesis, and in that context I need to develop a metaheuristic in Java. However, I am facing a problem when trying to read and store the data.
My file is a text file with around 150 lines. An example of the problem is line 5, which contains three integers: 30, 38 and 1. I would like to store these as integers named L, T and S respectively, and the same goes for many of the other lines.
Does anyone know how to do that? If needed, I can send you the .txt file.
By the way, this is what I've tried so far:
Main.java:
import java.io.IOException;
import java.io.FileWriter;
import java.io.BufferedWriter;
public class MAIN {
    public static void main(String[] args) throws IOException {
        Test.readDoc("TAP_T38L30C4F2S12_03.txt");
    }
}
Test.java:
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class Test {
    private static ArrayList<Integer> integerList = new ArrayList<Integer>();
    public static Map<String, ArrayList<Integer>> data = new HashMap<String, ArrayList<Integer>>();
    public static String aKey;
    public static void readDoc(String File) {
        try{
            FileReader fr = new FileReader("TAP_T38L30C4F2S12_03.txt");
            BufferedReader br = new BufferedReader(fr);
            while(true) {
                String line = br.readLine();
                if (line == null)
                    break;
                else if (line.matches("\\#\\s[a-zA-Z]")){
                    String key = line.split("\\t")[1];
                    line = br.readLine();
                    data.put(key, computeLine(line));
                }
                else if (line.matches("\\\\\\#\\s(\\|[a-zA-Z]\\|,?\\s?)+")){
                    String[] keys = line.split("\\t");
                    line = br.readLine();
                    ArrayList<Integer> results = computeLine(line);
                    for (int i=0; i<keys.length; i++){
                        aKey = aKey.replace("|", "");
                        // data.put(aKey, results.get(i));
                        data.put(aKey, results);
                    }
                }
                System.out.println(data);
            }
        } catch(Exception ex) {
            ex.printStackTrace(); }
    }
    private static ArrayList<Integer> computeLine (String line){
        String[] splitted = line.split("\\t");
        for (String s : splitted) {
            integerList.add(Integer.parseInt(s));
        }
        return integerList;
    }
}
An example of the data is shown here:
\# TAP instance
\# Note that the sequence of the data is important!
\#
\# |L|, |T|, |S|
30 38 1
\#
\# v
8213 9319 10187 12144 8206 ...
\#
\# w
7027 9652 9956 13973 6661 14751 ...
\#
\# b
1 1 1 1 1 ...
\#
\# c
1399 1563 1303 1303 2019 ...
\#
\# continues

The following code works with the sample data you gave.
In short:
Create a field to store your data. I chose a TreeMap so you can map a letter to a list of Integers, but you can use another collection.
Read the file line by line using BufferedReader#readLine().
Then process each group of lines according to your data. Here I use regular expressions to match a given line and then remove everything that is not data. See String#split() and String#matches().
But first of all, start by reading some good beginner books about Java and object-oriented design.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
public class ReadAndParse {
    public Map<String, ArrayList<Integer>> data = new TreeMap<String, ArrayList<Integer>>();
    public ReadAndParse() {
        try {
            FileReader fr = new FileReader("test.txt");
            BufferedReader br = new BufferedReader(fr);
            while(true) {
                String line = br.readLine();
                if (line == null) break;
                else if (line.matches("\\\\#\\s[a-zA-Z]")){
                    String key = line.split("\\s")[1];
                    line = br.readLine();
                    ArrayList<Integer> value= computeLine(line);
                    System.out.println("putting key : " + key + " value : " + value);
                    data.put(key, value);
                }
                else if (line.matches("\\\\\\#\\s(\\|[a-zA-Z]\\|,?\\s?)+")){
                    String[] keys = line.split("\\s");
                    line = br.readLine();
                    ArrayList<Integer> results = computeLine(line);
                    for (int i=1; i<keys.length; i++){
                        keys[i] = keys[i].replace("|", "");
                        keys[i] = keys[i].replace(",", "");
                        System.out.println("putting key : " + keys[i] + " value : " + results.get(i-1));
                        ArrayList<Integer> value= new ArrayList<Integer>();
                        value.add(results.get(i-1));
                        data.put(keys[i],value);
                    }
                }
            }
            br.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        // print the data
        for (Entry<String, ArrayList<Integer>> entry : data.entrySet()){
            System.out.println("variable : " + entry.getKey()+" value : "+ entry.getValue() );
        }
    }
    // the compute line function
    private ArrayList<Integer> computeLine(String line){
        ArrayList<Integer> integerList = new ArrayList<>();
        String[] splitted = line.split("\\s+");
        for (String s : splitted) {
            System.out.println("Compute Line : "+s);
            integerList.add(Integer.parseInt(s));
        }
        return integerList;
    }
    // and the main function to call it all
    public static void main(String[] args) {
        new ReadAndParse();
    }
}
Here is some sample output I got after parsing your file:
variable : L value : [30]
variable : S value : [1]
variable : T value : [38]
variable : b value : [1, 1, 1, 1, 1]
variable : c value : [1399, 1563, 1303, 1303, 2019]
variable : v value : [8213, 9319, 10187, 12144, 8206]
variable : w value : [7027, 9652, 9956, 13973, 6661, 14751]
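If you want to test the parsing without a file on disk, one option is to have it take a `BufferedReader`, so the same code works with a `FileReader` or a `StringReader`. This is a rewritten sketch rather than the answer's code: the class and method names are hypothetical, it accepts both `#` and the `\#` form shown in the sample, and it assumes value lines contain only integers (a literal `...` as in the sample would make `Integer.parseInt` throw `NumberFormatException`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TapParser {
    // Reads "# v" or "# |L|, |T|, |S|" header lines, each followed by one line of integers.
    static Map<String, List<Integer>> parse(BufferedReader br) throws IOException {
        Map<String, List<Integer>> data = new TreeMap<>();
        String line;
        while ((line = br.readLine()) != null) {
            // Accept both "#" and the escaped "\#" seen in the sample
            String header = line.startsWith("\\") ? line.substring(1) : line;
            if (!header.startsWith("#")) {
                continue;
            }
            String body = header.substring(1).trim();
            if (body.matches("(\\|[a-zA-Z]\\|,?\\s*)+")) {
                // "|L|, |T|, |S|": one value per name, in order
                String[] names = body.replace("|", "").split("[,\\s]+");
                List<Integer> values = parseInts(br.readLine());
                for (int i = 0; i < names.length && i < values.size(); i++) {
                    data.put(names[i], Collections.singletonList(values.get(i)));
                }
            } else if (body.matches("[a-zA-Z]")) {
                // "v": the whole next line belongs to this single name
                data.put(body, parseInts(br.readLine()));
            }
            // other comment lines ("# TAP instance", "#") are ignored
        }
        return data;
    }

    // Splits on runs of whitespace (spaces or tabs) and parses each token.
    static List<Integer> parseInts(String line) {
        List<Integer> result = new ArrayList<>();
        if (line == null || line.trim().isEmpty()) {
            return result;
        }
        for (String token : line.trim().split("\\s+")) {
            result.add(Integer.parseInt(token));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\\# TAP instance\n\\# |L|, |T|, |S|\n30 38 1\n\\#\n\\# v\n8213 9319 10187\n";
        // In real use: new BufferedReader(new FileReader("TAP_T38L30C4F2S12_03.txt"))
        Map<String, List<Integer>> data = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(data); // {L=[30], S=[1], T=[38], v=[8213, 9319, 10187]}
    }
}
```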

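I think I've got something.
EDIT:
I've changed my approach.
You'll need these imports:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
Then, inside a method that declares throws IOException:
BufferedReader reader = new BufferedReader(new FileReader("TAP_T38L30C4F2S12_03.txt"));
String line = null;
for (int i = 0; i < 5; i++) { // brings you to the fifth line
    line = reader.readLine();
}
reader.close();
// split on runs of whitespace (spaces or tabs), then convert each part to an int
String[] parts = line.trim().split("\\s+");
int L = Integer.parseInt(parts[0]);
int T = Integer.parseInt(parts[1]);
int S = Integer.parseInt(parts[2]);
int[] arr = new int[3];
arr[0] = L;
arr[1] = T;
arr[2] = S;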

Issue Reading from a file and using a 2D array to sort the data

I'm making a province sorter. The requirement is that I must leave the main class as is and write a private class called Munge. I've been at this for hours and have changed my code hundreds of times. Basically, it reads from a text file that looks like this:
Hamilton, Ontario
Toronto, Ontario
Edmonton, Alberta
Red Deer, Alberta
St John's, Newfoundland
and the output needs to look like this:
Alberta; Edmonton, Red Deer
Ontario; Hamilton, Toronto
Newfoundland; St John's
My main class cannot be changed and looks like this:
public class Lab5 {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        if(args.length < 2) {
            System.err.println("Usage: java -jar lab5.jar infile outfile");
            System.exit(99);
        }
        Munge dataSorter = new Munge(args[0], args[1]);
        dataSorter.openFiles();
        dataSorter.readRecords();
        dataSorter.writeRecords();
        dataSorter.closeFiles();
    }
}
And the Munge class I've made looks like this:
package lab5;
import java.io.File;
import java.util.Scanner;
import java.util.Formatter;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Munge
{
    private String inFileName, outFileName;
    private Scanner inFile;
    private Formatter outFile;
    private int line = 0;
    private String[] data;
    public Munge(String inFileName, String outFileName)
    {
        this.inFileName = inFileName;
        this.outFileName = outFileName;
        data = new String[100];
    }
    public void openFiles()
    {
        try
        {
            inFile = new Scanner(new File(inFileName));
            File file = new File("input.txt");
            SortedMap<String, List<String>> map = new TreeMap<String, List<String>>();
            Scanner scanner = new Scanner(file).useDelimiter("\\n");
            while (scanner.hasNext()) {
                String newline = scanner.next();
                if (newline.contains(",")) {
                    String[] parts = newline.split(",");
                    String city = parts[0].trim();
                    String province = parts[1].trim();
                    List<String> cities = map.get(province);
                    if (cities == null) {
                        cities = new ArrayList<String>();
                        map.put(province, cities);
                    }
                    if (!cities.contains(city)) {
                        cities.add(city);
                    }
                }
            }
            for (String province : map.keySet()) {
                StringBuilder sb = new StringBuilder();
                sb.append(province).append(": ");
                List<String> cities = map.get(province);
                for (String city : cities) {
                    sb.append(city).append(", ");
                }
                sb.delete(sb.length() - 2, sb.length());
                String output = sb.toString();
                System.out.println(output);
            }
        }
        catch(FileNotFoundException exception)
        {
            System.err.println("File not found.");
            System.exit(1);
        }
        catch(SecurityException exception)
        {
            System.err.println("You do not have access to this file.");
            System.exit(1);
        }
        try
        {
            outFile = new Formatter(outFileName);
        }
        catch(FileNotFoundException exception)
        {
            System.err.println("File not found.");
            System.exit(1);
        }
        catch(SecurityException exception)
        {
            System.err.println("You do not have access to this file.");
            System.exit(1);
        }
    }
    public void readRecords()
    {
        while(inFile.hasNext())
        {
            data[line] = inFile.nextLine();
            System.out.println(data[line]);
            line++;
        }
    }
    public void writeRecords()
    {
        for(int i = 0; i < line; i++)
        {
            String tokens[] = data[i].split(", ");
            Arrays.sort(tokens);
            for(int j = 0; j < tokens.length; j++)
                outFile.format("%s\r\n", tokens[j]);
        }
    }
    public void closeFiles()
    {
        if(inFile != null)
            inFile.close();
        if(outFile != null)
            outFile.close();
    }
}
You'll have to excuse my brackets; they're formatted correctly in NetBeans, but I had to move the bottom ones over to keep them in the code block.

As I think this is homework, I'll avoid giving you a solution, but here are some hints about what to do.
Each line you read has the form City, Province. So the first thing you need to do is split the string into two parts: the first part is the city and the second is the province. You need a collection for each province, and you store each city in the collection for its province.
Once you have that, sort the names of the provinces you found and iterate through them. For each province, sort its cities, then output the province name followed by each city name.
Useful classes include HashMap, TreeMap, List and Collections (which has sort methods).
Hope that helps you get further; otherwise, try to be more specific about where you are stuck.
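The grouping step from these hints could look like the sketch below, assuming each line is exactly `City, Province`. It is deliberately not a full Munge class (file handling is left out), and the class and method names are hypothetical. A `TreeMap` keeps the provinces sorted and `Collections.sort` orders each city list, so provinces come out alphabetically (Alberta, Newfoundland, Ontario), which differs slightly from the order in the question's example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProvinceGrouper {
    // Turns "City, Province" records into "Province; City1, City2" lines.
    static List<String> group(List<String> records) {
        Map<String, List<String>> byProvince = new TreeMap<>(); // keys kept in sorted order
        for (String record : records) {
            String[] parts = record.split(",", 2); // split on the first comma only
            if (parts.length < 2) continue;        // skip lines without a comma
            String city = parts[0].trim();
            String province = parts[1].trim();
            byProvince.computeIfAbsent(province, k -> new ArrayList<>()).add(city);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byProvince.entrySet()) {
            List<String> cities = entry.getValue();
            Collections.sort(cities);
            output.add(entry.getKey() + "; " + String.join(", ", cities));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Hamilton, Ontario", "Toronto, Ontario",
                "Edmonton, Alberta", "Red Deer, Alberta", "St John's, Newfoundland");
        for (String line : group(records)) {
            System.out.println(line);
        }
    }
}
```

In Munge, `readRecords` could fill such a map and `writeRecords` could pass each resulting line to `outFile.format("%s%n", line)`.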

importing a csv file into a java swing table

I have a CSV file of all the stock quotes on the NYSE. The first column is the symbol and the second column is the name of the company.
I have a search box and a table made in NetBeans using the Java Swing library.
Right now, when I enter a name in the box, it returns the correct number of rows. For instance, if I search GOOG it returns only 2 rows (one row for the GOOG symbol and one for the match in the full company name). However, the data within the rows is wrong: it just prints the first row of the CSV file over and over. Here is the code that gets executed when clicking the search button:
package my.Stock;
import java.util.ArrayList;
import java.util.Scanner;
import java.io.BufferedReader;
import java.util.StringTokenizer;
import java.io.FileReader;
import java.io.*;
public class searchy {
    public static void doSearch(String s){
        javax.swing.JTable resTable = StockUI.stockUI.getResultTable();
        javax.swing.table.DefaultTableModel dtm =
            (javax.swing.table.DefaultTableModel) resTable.getModel();
        while (dtm.getRowCount()> 0 ) dtm.removeRow(0);
        String sym = s.trim().toUpperCase();
        try {
            //csv file containing data
            String strFile = "companylist.csv";
            //create BufferedReader to read csv file
            BufferedReader br = new BufferedReader( new FileReader(strFile));
            String strLine = "";
            StringTokenizer st = null;
            int lineNumber = 0, tokenNumber = 0;
            //create arraylist
            ArrayList<String> arrayList = new ArrayList<String>();
            //read comma separated file line by line
            while( (strLine = br.readLine()) != null){
                lineNumber++;
                //break comma separated line using ","
                st = new StringTokenizer(strLine, ",");
                while(st.hasMoreTokens()){
                    //display csv values
                    tokenNumber++;
                    arrayList.add(st.nextToken());
                    //System.out.println("Line # " + lineNumber + ": "+ st.nextToken() + " " + st.nextToken());
                } //end small while
                //reset token number
                tokenNumber = 0;
            } //end big while loop
            //send csv to an array
            Object[] elements = arrayList.toArray();
            /*
            for(int i=0; i < elements.length ; i++) {
                System.out.println(elements[i]);
            } */
            Scanner input = new Scanner(System.in);
            System.out.print("Enter Ticker symbol");
            //String sym = input.next().toUpperCase(); //convert to uppercase to match csv
            int j=0;
            for(int i=0; i < elements.length ; i++) {
                if (((String) elements[i]).contains(sym)){
                    //System.out.println(elements[i]);
                    dtm.addRow(elements);
                    j++;
                    if (j==25) break; //only show this many results
                }
            }
        }
        catch(Exception e){
            System.out.println("Exception while reading csv file: " + e);
        }
    }
}
I understand why this is happening, but I am not sure how to make it add the correct rows, since I can't use dtm.addRow(elements[i]).
Any help is greatly appreciated.
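One way out of the `dtm.addRow(elements[i])` problem is to stop flattening every token into a single list and keep each CSV line as its own `String[]`; `DefaultTableModel.addRow(Object[])` accepts such an array directly. The sketch below shows only the reading-into-rows and filtering steps, without Swing, so it can run on its own. The names `toRows` and `matchingRows` are hypothetical, the plain `split(",")` does not handle quoted commas, and unlike the loop in the question it adds each matching line once even if several of its columns match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowFilter {
    // Keeps every CSV line as its own row instead of one flat list of tokens.
    static List<String[]> toRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(","));
        }
        return rows;
    }

    // Returns at most `limit` rows in which any column contains `sym`.
    static List<String[]> matchingRows(List<String[]> rows, String sym, int limit) {
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            if (result.size() >= limit) {
                break;
            }
            for (String cell : row) {
                if (cell.contains(sym)) {
                    result.add(row); // add the whole row, not the matching cell
                    break;           // one match is enough for this row
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in "symbol,name" form
        List<String> lines = Arrays.asList("GOOG,Google Inc.", "AAPL,Apple Inc.");
        for (String[] row : matchingRows(toRows(lines), "GOOG", 25)) {
            // In the Swing code this would be: dtm.addRow(row);
            System.out.println(Arrays.toString(row)); // [GOOG, Google Inc.]
        }
    }
}
```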

Try CSVManager.

I collect CSV data for stocks from Yahoo, and, oddly enough, every now and then they mess it up by using a company name with a comma in it, e.g., "Dolby, Inc.". Of course, that throws off the parsing of the CSV file. I don't know whether this is your problem.
John Doner
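When a field contains a comma, CSV files usually wrap it in double quotes (as in "Dolby, Inc."), and a plain `StringTokenizer` or `split(",")` will break it apart. A minimal quote-aware splitter is sketched below for illustration only: it handles quoted fields and doubled quotes (`""`) inside them, but not line breaks inside quotes, so a dedicated CSV library is the safer choice for real data. The class name and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLine {
    // Splits one CSV line on commas that are outside double quotes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is an escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // end of the current field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = split("ABC,\"Dolby, Inc.\",NYSE");
        System.out.println(fields.size() + " fields, name = " + fields.get(1)); // 3 fields, name = Dolby, Inc.
    }
}
```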

package recommendation.event.test;
import com.csvreader.CsvReader;
public class ReadCSV {
    public static void main (String [] args){
        try {
            CsvReader products = new CsvReader("resources/Event Recommendation Engine Challenge/data/test.csv");
            products.readHeaders();
            while (products.readRecord())
            {
                String user = products.get("user");
                String event = products.get("event");
                String invited = products.get("invited");
                String timestamp = products.get("timestamp");
                System.out.println(user + " : " + event+" : "+invited+" : "+timestamp);
            }
            products.close();
        } catch (Exception e) {
            // report the problem instead of silently ignoring it
            e.printStackTrace();
        }
    }
}
