Regular expression, reading character by character (java)


Sorry for my English. I read a file character by character, and I need to output the lines that match our expression, in this case ([a-zA-Z0-9]){0,}. This is how I did it: whenever a line break occurs, the line read so far is checked against the regular expression. Why doesn't it work?
UPD code:
public static void main(String[] args) {
    int numbLine = 1;
    boolean finwWordLine = false;
    char[] barray = new char[1024];
    StringBuilder stringB = new StringBuilder();
    Pattern p = Pattern.compile("([a-zA-Z0-9]){0,}");
    // try-with-resources closes the reader, so no explicit reader.close() is needed
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("text.txt")))) {
        int value;
        while ((value = reader.read(barray, 0, barray.length)) != -1) {
            for (int i = 0; i < value; i++) {
                stringB.append(barray[i]);
                // a line break ends the current line: check the buffered text against the pattern
                if (barray[i] == '\n') {
                    Matcher m = p.matcher(stringB.toString());
                    if (m.matches()) {
                        finwWordLine = true;
                    }
                    if (finwWordLine) {
                        System.out.println(numbLine + ": " + stringB.toString());
                    }
                    stringB.delete(0, stringB.length());
                    finwWordLine = false;
                    numbLine++;
                }
            }
        }
    } catch (Exception e) {
        System.out.println("Error : " + e);
    }
}
UPD text.txt
one line
two line
three line
asd asdas das dasd asd
asdasdasd
five line
one
asdasd asd asd asd

Try this regex instead:
[a-zA-Z0-9]*
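Note that [a-zA-Z0-9]* and ([a-zA-Z0-9]){0,} accept exactly the same strings, so swapping one for the other should not change the result on its own. One likely cause is that matches() must match the entire input, and in the question's loop the buffer still contains the '\n' (plus a '\r' for Windows line endings) when it is tested, so no line can ever match; most lines in text.txt also contain spaces, which the character class excludes. A minimal sketch, assuming the goal is to print the lines made up only of letters and digits: BufferedReader.readLine() strips the line terminator before the match. The names LineFilter and matchingLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LineFilter {
    // Whole-line pattern: only ASCII letters and digits (an empty line also matches).
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]*");

    // Returns "lineNumber: line" for every line that consists only of letters and digits.
    static List<String> matchingLines(Reader source) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 1;
            // readLine() drops "\n" and "\r\n", so the terminator never reaches matches()
            while ((line = reader.readLine()) != null) {
                if (ALNUM.matcher(line).matches()) {
                    result.add(lineNumber + ": " + line);
                }
                lineNumber++;
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = "one line\nasdasdasd\r\nfive line\none\n";
        for (String hit : matchingLines(new StringReader(text))) {
            System.out.println(hit);
        }
    }
}
```

For the real file, pass new FileReader("text.txt") (or an InputStreamReader with an explicit charset) instead of the StringReader.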

Related

Problem parsing the first item of array from a csv file read from inputstream

The file is read from a CSV file: OK.
The string is split on the semicolon: OK.
Parsing the first string of the array throws a NumberFormatException: error.
Things I've already tried:
Integer.valueOf(str);
Integer.parseInt(str);
pp = new Integer(str);
Could someone help with this issue? Thanks.
public static void main(String[] args) throws FileNotFoundException, IOException {
    File inputF = new File("C:\\chatse\\Estatistica\\dados.csv");
    InputStream inputFS = new FileInputStream(inputF);
    BufferedReader in = new BufferedReader(new InputStreamReader(inputFS));
    String line;
    String readFromCsv = "";
    while ((line = in.readLine()) != null) {
        readFromCsv += line + ";";
    }
    in.close();
    String read = readFromCsv.trim();
    String[] n = read.split(";");
    ArrayList<String> lista = new ArrayList<>();
    for (String st : n) {
        lista.add(st);
    }
    ArrayList<Integer> nlistNum = new ArrayList<>();
    for (String data : lista) {
        // also tried Integer.valueOf(data) and new Integer(data): all three fail on the first element
        int pp = Integer.parseInt(data);
        nlistNum.add(pp);
    }
}

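The issue was that the first element of my array started with an invisible character with code 65279 (U+FEFF, the byte order mark), which is not ASCII. I figured this out by splitting that element into a char array and checking its length. My workaround keeps only the digit characters: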
n = readFromCsv.split(";");
for (String st : n) {
    char[] charr = st.toCharArray();
    StringBuilder sb = new StringBuilder();
    for (char c : charr) {
        int asci = (int) c;
        // keep the character only if it is in the range '0' (48) to '9' (57)
        if (asci >= 48 && asci <= 57) {
            sb.append(c);
        }
    }
    String str = sb.toString();
    int novoInt = Integer.valueOf(str);
    nlistNum.add(novoInt);
}
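Filtering out every non-digit also silently drops minus signs, so another option is to remove only the byte order mark. A UTF-8 file saved "with BOM" starts with U+FEFF (65279), Java's UTF-8 decoder does not strip it, and String.trim() does not remove it either. A minimal sketch, assuming the file is UTF-8 and holds only integers separated by semicolons and line breaks; the names CsvInts, stripBom and parseInts are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class CsvInts {
    private static final char BOM = '\uFEFF'; // 65279

    // Removes a leading byte order mark, if present.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    // Parses every semicolon-separated integer from the reader.
    static List<Integer> parseInts(Reader source) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            boolean firstLine = true;
            while ((line = in.readLine()) != null) {
                if (firstLine) {
                    line = stripBom(line); // only the very start of the file can carry a BOM
                    firstLine = false;
                }
                for (String field : line.split(";")) {
                    String trimmed = field.trim();
                    if (!trimmed.isEmpty()) {
                        numbers.add(Integer.parseInt(trimmed)); // keeps signs such as "-3"
                    }
                }
            }
        }
        return numbers;
    }
}
```

With the question's file this would be called as CsvInts.parseInts(new InputStreamReader(new FileInputStream(inputF), StandardCharsets.UTF_8)).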

BufferedReader (scanner)

This is what I have so far. I want to know how many times a given word occurs in a .txt document. I'm trying to use BufferedReader but haven't managed it well. I guess there is an easier way to solve this, but I don't know it.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

// Estonian names: TekstiAnalüsaator = text analyser, rida = line
public class TekstiAnalüsaator {
    public static void main(String[] args) throws Exception {
        InputStream baidid = new FileInputStream("test.txt");
        InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
        BufferedReader puhverdab = new BufferedReader(tekst);
        String rida = puhverdab.readLine();
        while (rida != null) {
            System.out.println("Reading: " + rida);
            rida = puhverdab.readLine();
        }
        puhverdab.close();
    }
}
I want to search for words using this structure: first the file, then the word I need to find, and it should return how many times this word occurs in the file.
// sõneEsinemisteArv is Estonian for "number of occurrences of the string"
TekstiAnalüsaator analüsaator = new TekstiAnalüsaator("kiri.txt");
int esinemisteArv = analüsaator.sõneEsinemisteArv("kala");
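The structure asked for above (a constructor that takes the file name, plus a method that returns the number of occurrences of a word) could be sketched as follows. WordCounter and countOccurrences are hypothetical English stand-ins for TekstiAnalüsaator and sõneEsinemisteArv; the sketch assumes a UTF-8 file and counts whole-word, case-sensitive matches.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCounter {
    private final List<String> lines;

    // Reads the whole file once, in the constructor.
    WordCounter(String fileName) {
        try {
            lines = Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts whole-word occurrences; Pattern.quote makes the word literal.
    int countOccurrences(String word) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        int count = 0;
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                count++;
            }
        }
        return count;
    }
}
```

Usage mirrors the desired call: int esinemisteArv = new WordCounter("kiri.txt").countOccurrences("kala");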

Please see the code example below. This should solve the issue you are facing.
import java.io.*;

public class CountWords {
    public static void main(String[] args) throws IOException {
        System.out.println(count("Test.java", "static"));
    }

    // Note: this counts the LINES that contain wordToSearch, not every occurrence.
    public static int count(String filename, String wordToSearch) throws IOException {
        int tokencount = 0;
        // try-with-resources closes the reader when the loop ends
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String s;
            while ((s = br.readLine()) != null) {
                if (s.contains(wordToSearch)) {
                    tokencount++;
                }
            }
        }
        return tokencount;
    }
}

It is a bit of a tricky question, because counting words in a string is not a simple task. Your approach to reading the file line by line is fine, so the remaining problem is how to count the word matches.
For example, you can do a simple check for matches like this:
public static int getCountOFWordsInLine(String line, String test) {
    int count = 0;
    int index = 0;
    // move one character past each hit, so overlapping matches are counted too
    while ((index = line.indexOf(test, index)) != -1) {
        count++;
        index++;
    }
    return count;
}
The problem with that approach is that if your word is "test" and your string is "Next word matches asdfatestsdf", it will count "asdfatestsdf" as a match. So you can try a more advanced regex:
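public static int getCountOFWordsInLine(String line, String word) {
    int count = 0;
    // Pattern.quote matches the word literally, even if it contains characters like '.' or '+'
    Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    Matcher matcher = pattern.matcher(line);
    while (matcher.find()) {
        count++;
    }
    return count;
}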
It checks for the word surrounded by \b, which is a word boundary.
It still won't find the word if it starts with an uppercase letter, though. If you want the search to be case-insensitive, you can modify the previous method to convert everything to lowercase before searching. But it depends on your definition of a word.
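Instead of lowercasing both strings, the pattern itself can be made case-insensitive with the Pattern.CASE_INSENSITIVE flag (adding Pattern.UNICODE_CASE extends this beyond ASCII letters). A minimal sketch; the names CaseInsensitiveCount and countIgnoringCase are hypothetical, and Pattern.quote keeps characters such as '.' in the search word literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveCount {
    // Counts whole-word matches of `word` in `line`, ignoring case.
    static int countIgnoringCase(String line, String word) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Test", "test" and "TEST" count; "testing" and "asdfatestsdf" do not
        System.out.println(countIgnoringCase("Test test TEST testing asdfatestsdf", "test")); // 3
    }
}
```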
The whole program then becomes:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MainClass {
    public static void main(String[] args) {
        String word = "test";
        int count = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader puhverdab = new BufferedReader(
                new InputStreamReader(new FileInputStream("c:\\test.txt"), StandardCharsets.UTF_8))) {
            String rida = puhverdab.readLine();
            while (rida != null) {
                System.out.println("Reading: " + rida);
                count += getCountOFWordsInLine(rida, word);
                rida = puhverdab.readLine();
            }
            System.out.println("count:" + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static int getCountOFWordsInLine(String line, String test) {
        int count = 0;
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(test) + "\\b");
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}

import java.io.*;
import java.util.regex.*;

public class TA
{
    public static void main(String[] args) throws Exception
    {
        InputStream baidid = new FileInputStream("test.txt");
        InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
        BufferedReader puhverdab = new BufferedReader(tekst);
        String rida;
        String word = args[0]; // search word passed via command line
        int count1 = 0, count2 = 0, count3 = 0, count4 = 0;
        Pattern P1 = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Pattern P2 = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        while ((rida = puhverdab.readLine()) != null)
        {
            System.out.println("Reading: " + rida);
            // Version 1: counts lines containing [word]
            if (rida.contains(word)) count1++;
            // Version 2: counts every instance of [word], even inside other words
            int pos = 0;
            while ((pos = rida.indexOf(word, pos)) != -1) { count2++; pos++; }
            // Version 3: counts [word] only between word boundaries (\b)
            Matcher m1 = P1.matcher(rida);
            while (m1.find()) count3++;
            // Version 4: same as version 3, but case-insensitive
            Matcher m2 = P2.matcher(rida);
            while (m2.find()) count4++;
        }
        System.out.println("Found exactly " + count1 + " line(s) containing word: \"" + word + "\"");
        System.out.println("Found word \"" + word + "\" exactly " + count2 + " time(s)");
        System.out.println("Found word \"" + word + "\" as a whole word " + count3 + " time(s).");
        System.out.println("Found, case insensitive search, word \"" + word + "\" as a whole word " + count4 + " time(s).");
        puhverdab.close();
    }
}

This reads line-by-line as you've already done, splits a line by whitespace to obtain individual words, and checks each word for a match.
int countWords(String filename, String word) throws Exception {
    InputStream inputStream = new FileInputStream(filename);
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
    BufferedReader reader = new BufferedReader(inputStreamReader);
    int count = 0;
    String line = reader.readLine();
    while (line != null) {
        // split on runs of whitespace to get the individual words
        String[] words = line.split("\\s+");
        for (String w : words) {
            if (w.equals(word)) {
                count++;
            }
        }
        line = reader.readLine();
    }
    reader.close();
    return count;
}

Replace letters with *

I'm making a hangman-style game.
I have the random word now. How do I replace the letters of the word with asterisks (*) so that, when the program starts, the word is shown as a row of *?
I assume that when someone inputs a letter for the hangman game, you get the index of that character in the word and then replace the corresponding *.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;

public class JavaApplication10 {
    public static String[] wordArray = new String[1];
    public static String file_dir = "Animals.txt";
    public static String selectedWord = "";
    public static char[] wordCharacter = new char[1];
    Scanner sc = new Scanner(System.in);

    public static void main(String[] args) throws IOException {
        wordArray = get_word(file_dir);
        selectedWord = select_word(wordArray);
        System.out.println(selectedWord);
    }

    // Reads every line of the file into an array (one word per line).
    public static String[] get_word(String file_dir) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file_dir))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines.toArray(new String[lines.size()]);
    }

    // nextInt(n) already returns 0..n-1, so no "- 1" or Math.abs is needed
    public static String select_word(String[] wordArray) {
        Random rand = new Random();
        return wordArray[rand.nextInt(wordArray.length)];
    }
}
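For the asterisk part of the question, one approach is to keep a char[] of '*' the same length as the word and, for each guess, uncover every position where the guessed letter occurs (a letter can appear more than once, so a single indexOf would only find the first). A minimal sketch; the names HangmanMask, hide and reveal are hypothetical, and the case-insensitive comparison is an assumption.

```java
import java.util.Arrays;

class HangmanMask {
    // Starts fully hidden: one '*' per letter of the word.
    static char[] hide(String word) {
        char[] mask = new char[word.length()];
        Arrays.fill(mask, '*');
        return mask;
    }

    // Reveals every occurrence of `guess`; returns true if at least one was found.
    static boolean reveal(String word, char[] mask, char guess) {
        boolean found = false;
        for (int i = 0; i < word.length(); i++) {
            if (Character.toLowerCase(word.charAt(i)) == Character.toLowerCase(guess)) {
                mask[i] = word.charAt(i); // keep the word's own capitalisation
                found = true;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String word = "Giraffe";
        char[] mask = hide(word);
        System.out.println(new String(mask)); // *******
        reveal(word, mask, 'f');
        System.out.println(new String(mask)); // ****ff*
    }
}
```

In the question's program, hide(selectedWord) would give the initial display, and reveal(...) would be called with each letter read from the Scanner.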

If you know how many lines there are, you could use java.util.Random with a specific range to pick a line at random.
Then you could read the file line by line until you reach that random line, and print it.
// Open the file
FileInputStream fstream = new FileInputStream("testfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
int counter = 0;
// You need to set randomValue (a 1-based line number) using Random, as suggested
// While-loop -> read the file line by line until the end of file,
// stopping as soon as the required line has been printed
while (counter != randomValue && (strLine = br.readLine()) != null) {
    counter++;
    if (counter == randomValue) {
        // Print the content on the console
        System.out.println(strLine + "\n");
    }
}
// Close the input stream
br.close();

Assuming Java 8:
// Loading ...
Random random = new Random();
List<String> animals = Files.readAllLines(Paths.get(path)); // path = location of the word file
// ...
// When using
String randomAnimal = animals.get(random.nextInt(animals.size()));

Answer to your first question:
First, get the total number of lines.
Then generate a random number between 1 and that total.
Finally, get the required word.
try {
    // 1. Count the lines by counting '\n' bytes
    int count = 0;
    boolean endsWithNewline = false;
    try (InputStream is = new BufferedInputStream(new FileInputStream("D:\\test.txt"))) {
        byte[] c = new byte[1024];
        int readChars;
        while ((readChars = is.read(c)) != -1) {
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            endsWithNewline = readChars > 0 && c[readChars - 1] == '\n';
        }
    }
    // A final line without a trailing '\n' still counts as a line
    int noOfLines = endsWithNewline ? count : count + 1;
    System.out.println(noOfLines);

    // 2. Pick a random line number from 1 to noOfLines (nextInt alone gives 0..noOfLines-1)
    Random random = new Random();
    int randomInt = random.nextInt(noOfLines) + 1;

    // 3. Read up to that line; try-with-resources closes the reader
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader("D:\\test.txt"))) {
        String line;
        int counter = 1;
        while ((line = bufferedReader.readLine()) != null) {
            if (counter == randomInt) {
                System.out.println(line); // This is the word you want
                break;
            }
            counter++;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

Procedure to bold Strings in a text file

So, I'm working on a procedure that takes as input a text file called orders, which specifies the number of words to bold and which words must be bolded. I've managed to do it for one word, but when I try with two words, the output gets doubled. For example:
Input:
2
Ophelia
him
Output:
ACT I
ACT I
SCENE I. Elsinore. A platform before the castle.
SCENE I. Elsinore. A platform before the castle.
FRANCISCO at his post. Enter to him BERNARDO
FRANCISCO at his post. Enter to *him* BERNARDO
Here's my code; can anyone help me? PS: ignore the boolean.
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out) throws IOException
{
    String linha = in.readLine();
    boolean encontrou = false;
    String[] palavras = new String[Integer.parseInt(orders.readLine())];
    for (int i = 0; i < palavras.length; i++)
    {
        palavras[i] = orders.readLine();
    }
    while (linha != null)
    {
        StringBuilder str = new StringBuilder(linha);
        for (int i = 0; i < palavras.length && !encontrou; i++)
        {
            if (linha.toLowerCase().indexOf(palavras[i]) != -1)
            {
                str.insert((linha.toLowerCase().indexOf(palavras[i])), bold);
                str.insert((linha.toLowerCase().indexOf(palavras[i])) + palavras[i].length() + 1, bold);
                out.write(str.toString());
                out.newLine();
            }
            else
            {
                out.write(linha);
                out.newLine();
            }
        }
        linha = in.readLine();
    }
}

This merits a regular expression replace of WORD-BOUNDARY + ALTERNATIVES + WORD-BOUNDARY.
// Read the words to be bolded: first their number, then one word per line.
String[] palavras = new String[Integer.parseInt(orders.readLine())];
for (int i = 0; i < palavras.length; i++) {
    palavras[i] = orders.readLine();
}
// We make a regular expression Pattern,
// like "\\b(him|her|it)\\b" where \\b is a word boundary.
// This prevents mangling "shimmer".
StringBuilder regex = new StringBuilder("\\b(");
for (int i = 0; i < palavras.length; i++) {
    if (i != 0) {
        regex.append('|');
    }
    regex.append(Pattern.quote(palavras[i]));
}
regex.append(")\\b");
Pattern pattern = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
boolean encontrou = false;
String linha = in.readLine(); // Read first line.
while (linha != null) {
    Matcher m = pattern.matcher(linha);
    String linha2 = m.replaceAll("*$1*");
    if (!linha2.equals(linha)) {
        encontrou = true; // Found a replacement.
    }
    out.write(linha2);
    out.newLine();
    linha = in.readLine(); // Read next line.
}
A replaceAll (instead of replaceFirst) then replaces all occurrences.
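The same alternation idea, packaged as a self-contained helper that can be tested on a single line. The names Bolder, wordsPattern and boldWords are hypothetical; the marker character is passed in like the question's bold parameter, and Matcher.quoteReplacement guards against markers such as '$' that are special in replacement strings.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Bolder {
    // Builds \b(w1|w2|...)\b with each word quoted, matched case-insensitively.
    static Pattern wordsPattern(List<String> words) {
        StringBuilder regex = new StringBuilder("\\b(");
        for (int i = 0; i < words.size(); i++) {
            if (i != 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(words.get(i)));
        }
        regex.append(")\\b");
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Wraps every match in the marker; $1 keeps the original capitalisation.
    static String boldWords(String line, Pattern pattern, char marker) {
        String m = Matcher.quoteReplacement(String.valueOf(marker));
        return pattern.matcher(line).replaceAll(m + "$1" + m);
    }
}
```

For example, with the words Ophelia and him, boldWords("FRANCISCO at his post. Enter to him BERNARDO", pattern, '*') should give "FRANCISCO at his post. Enter to *him* BERNARDO", leaving "his" alone.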

It's writing each line out twice because you write out the line (linha) on every iteration of the inner for loop, that is, once for every word in the lookup list.
Move the out.write() statements outside the loop and you should be fine.
Note that this will only find one match per word in each line. If you need to find more than one, the code is a little more complicated: you need a while loop instead of your if test, or you could use replaceAll() with a regular expression built from palavras[i]. Preserving the capitalisation of the original there is not simple, but it is possible.
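The while-loop variant mentioned above, which marks every occurrence of one word rather than only the first, might look like the sketch below. BoldLoop and boldAll are hypothetical names; the search is case-insensitive like the question's toLowerCase() check, it searches the already-marked text and resumes after each inserted pair of markers, and it does not check word boundaries, so "him" inside "shimmer" is also marked.

```java
class BoldLoop {
    // Inserts `bold` before and after every case-insensitive occurrence of `word`.
    static String boldAll(String line, String word, char bold) {
        if (word.isEmpty()) {
            return line; // avoid an endless loop on an empty search word
        }
        StringBuilder str = new StringBuilder(line);
        String target = word.toLowerCase();
        int from = 0;
        int idx;
        while ((idx = str.toString().toLowerCase().indexOf(target, from)) != -1) {
            str.insert(idx, bold);                     // opening marker
            str.insert(idx + word.length() + 1, bold); // closing marker, after the shifted word
            from = idx + word.length() + 2;            // continue after both markers
        }
        return str.toString();
    }
}
```

For example, boldAll("him and HIM, shimmer", "him", '*') gives "*him* and *HIM*, s*him*mer".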
Fixed version
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out)
        throws IOException
{
    String linha = in.readLine();
    boolean encontrou = false;
    String[] palavras = new String[Integer.parseInt(orders.readLine())];
    for (int i = 0; i < palavras.length; i++)
    {
        // lowercase once, so "Ophelia" can match the lowercased line below
        palavras[i] = orders.readLine().toLowerCase();
    }
    while (linha != null)
    {
        StringBuilder str = new StringBuilder(linha);
        for (int i = 0; i < palavras.length && !encontrou; i++)
        {
            // search the text built so far, so markers inserted for earlier words shift the index correctly
            int idx = str.toString().toLowerCase().indexOf(palavras[i]);
            if (idx != -1)
            {
                str.insert(idx, bold);
                str.insert(idx + palavras[i].length() + 1, bold);
            }
        }
        // write each line once, after all words have been processed
        out.write(str.toString());
        out.newLine();
        linha = in.readLine();
    }
}
With replaceAll
static void bold(char bold, BufferedReader orders, BufferedReader in, BufferedWriter out)
        throws IOException
{
    String linha = in.readLine();
    boolean encontrou = false;
    String[] palavras = new String[Integer.parseInt(orders.readLine())];
    for (int i = 0; i < palavras.length; i++)
    {
        palavras[i] = orders.readLine();
    }
    while (linha != null)
    {
        for (int i = 0; i < palavras.length && !encontrou; i++)
        {
            // Pattern.quote keeps regex characters in the word literal
            String regEx = "\\b(" + Pattern.quote(palavras[i]) + ")\\b";
            linha = linha.replaceAll(regEx, bold + "$1" + bold);
        }
        out.write(linha);
        out.newLine();
        linha = in.readLine();
    }
}
P.S. I've left the found boolean (encontrou) in, although it is not doing anything at the moment.

Counting the number of characters from a text file

I currently have the following code:
public class Count {
public static void countChar() throws FileNotFoundException {
Scanner scannerFile = null;
try {
scannerFile = new Scanner(new File("file"));
} catch (FileNotFoundException e) {
}
int starNumber = 0; // number of *'s
while (scannerFile.hasNext()) {
String character = scannerFile.next();
int index =0;
char star = '*';
while(index<character.length()) {
if(character.charAt(index)==star){
starNumber++;
}
index++;
}
System.out.println(starNumber);
}
}
I'm trying to find out how many times a * occurs in a text file. For example, given a text file containing
Hi * My * name *
the method should return 3.
Currently, with the above example, the method prints:
0
1
1
2
2
3
Thanks in advance.

Use Apache commons-io to read the file into a String
String org.apache.commons.io.FileUtils.readFileToString(File file);
And then, use Apache commons-lang to count the matches of *:
int org.apache.commons.lang.StringUtils.countMatches(String str, String sub)
Result:
int count = StringUtils.countMatches(FileUtils.readFileToString(file), "*");
http://commons.apache.org/io/
http://commons.apache.org/lang/

Everything in your method works fine, except that you print the running count once per token (Scanner.next() returns whitespace-separated tokens, not lines):
while (scannerFile.hasNext()) {
    String character = scannerFile.next();
    int index = 0;
    char star = '*';
    while (index < character.length()) {
        if (character.charAt(index) == star) {
            starNumber++;
        }
        index++;
    }
    /* PRINTS the result for each token!!! */
    System.out.println(starNumber);
}
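A corrected version could keep the question's Scanner-based loop but return the total once, after every token has been seen. This is a sketch under the assumption that the caller passes in the file (the original hard-codes "file"); the class name StarCount is hypothetical, try-with-resources closes the Scanner, and a missing file is reported to the caller instead of being swallowed.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class StarCount {
    // Returns the number of '*' characters in the file.
    static int countChar(File file) throws FileNotFoundException {
        int starNumber = 0;
        try (Scanner scannerFile = new Scanner(file)) {
            while (scannerFile.hasNext()) {
                String token = scannerFile.next(); // whitespace-separated token
                for (int index = 0; index < token.length(); index++) {
                    if (token.charAt(index) == '*') {
                        starNumber++;
                    }
                }
            }
        }
        return starNumber; // reported once, after the whole file has been read
    }

    public static void main(String[] args) throws FileNotFoundException {
        System.out.println(countChar(new File("file")));
    }
}
```

For the example file containing "Hi * My * name *", this prints a single 3.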

int countStars(String fileName) throws IOException {
    int n = 0;
    // read() returns the number of chars read, or -1 at end of file (it does not return a boolean)
    try (FileReader fileReader = new FileReader(fileName)) {
        char[] cbuf = new char[1];
        while (fileReader.read(cbuf) != -1) {
            if (cbuf[0] == '*') {
                n++;
            }
        }
    }
    return n;
}

I would stick to the Java libraries at this point, and then use other libraries (such as the Commons libraries) as you become more familiar with the core Java API. This is off the top of my head and might need to be tweaked to run.
StringBuilder sb = new StringBuilder();
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
String s = br.readLine();
while (s != null)
{
    sb.append(s);
    s = br.readLine();
}
br.close(); // this closes the underlying reader so no need for fr.close()
String fileAsStr = sb.toString();
int count = 0;
int idx = fileAsStr.indexOf('*');
while (idx > -1)
{
    count++;
    idx = fileAsStr.indexOf('*', idx + 1);
}
