parsing a text file using a java scanner - java

I am trying to create a method that parses a text file and returns a string that is the URL after the colon. The text file looks as follows (it is for a bot):
keyword:url
keyword,keyword:url
So each line consists of one keyword and a URL, or multiple keywords and a URL.
Could anyone give me a bit of direction on how to do this? I believe I need to use a Scanner, but I couldn't find anything from anyone wanting to do something similar.
Thank you.
Edit: my attempt using the suggestions below. It doesn't quite work. Any help would be appreciated.
public static void main(String[] args) throws IOException {
String sCurrentLine = "";
String key = "hello";
BufferedReader reader = new BufferedReader(
new FileReader(("sites.txt")));
Scanner s = new Scanner(sCurrentLine);
while ((sCurrentLine = reader.readLine()) != null) {
System.out.println(sCurrentLine);
if(sCurrentLine.contains(key)){
System.out.println(s.findInLine("http"));
}
}
}
output:
hello,there:http://www.facebook.com
null
whats,up:http:/google.com
sites.txt:
hello,there:http://www.facebook.com
whats,up:http:/google.com

You should read the file line by line with a BufferedReader, as you are doing. I would then recommend parsing each line using a regex.
The pattern
(?<=:)http://[^\\s]++
will do the trick. This pattern says:
http://
followed by one or more non-space characters [^\\s]++
and preceded by a colon (?<=:)
Here is a simple example using a String to proxy your file:
public static void main(String[] args) throws Exception {
final String file = "hello,there:http://www.facebook.com\n"
+ "whats,up:http://google.com";
final Pattern pattern = Pattern.compile("(?<=:)http://[^\\s]++");
final Matcher m = pattern.matcher("");
try (final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(file.getBytes("UTF-8"))))) {
String line;
while ((line = bufferedReader.readLine()) != null) {
m.reset(line);
while (m.find()) {
System.out.println(m.group());
}
}
}
}
Output:
http://www.facebook.com
http://google.com

Use a BufferedReader; for text parsing you can use regular expressions.

You should use the split method:
String[] strCollection = yourScannedStr.split(":", 2);
String extractedUrl = strCollection[1];
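To tie this back to the question, here is a minimal sketch of a keyword lookup built on split. The class name UrlLookup, the method findUrl and the in-memory list of lines are assumptions for illustration only; in practice the lines would come from reading sites.txt.

```java
import java.util.Arrays;
import java.util.List;

class UrlLookup {
    // Returns the URL of the first line whose keyword list contains key, or null if none does.
    static String findUrl(List<String> lines, String key) {
        for (String line : lines) {
            // Limit 2 splits on the first colon only, so "http://..." stays in one piece.
            String[] parts = line.split(":", 2);
            if (parts.length < 2) {
                continue; // no colon on this line, skip it
            }
            // The part before the colon is a comma-separated keyword list.
            if (Arrays.asList(parts[0].split(",")).contains(key)) {
                return parts[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hello,there:http://www.facebook.com",
                "whats,up:http://google.com");
        System.out.println(findUrl(lines, "hello")); // prints http://www.facebook.com
    }
}
```

Comparing whole keywords avoids the false positives that line.contains(key) would give, for example when the key also appears inside the URL.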

Reading a .txt file using Scanner class in Java
http://www.tutorialspoint.com/java/java_string_substring.htm
That should help you.

Related

How to load a text file to a string variable in java

I'm pretty new to programming, and I can't find a good explanation of how to load a txt file into a String variable in Java using Eclipse.
So far, from what I have been able to understand, I am supposed to use the StdIn class, and I know that the txt file needs to be located in my Eclipse workspace (outside the source folder), but I don't know exactly what I need to write in the code to load the given file into the variable.
I could really use some help with this.
Although I'm not a Java expert, I'm pretty sure this is the information you're looking for. It looks like this:
static String readFile(String path, Charset encoding)
throws IOException
{
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
Basically all languages provide you with some methods to read from the file system you're in. Hope that does it for you!
Good luck with your project!
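In case it helps, here is a sketch of how that method might be called. The class name ReadFileDemo and the temporary file are assumptions made only so the example runs on its own; with a real file you would pass its path instead. A relative path is resolved against the working directory of the run, which in Eclipse is normally the project folder.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadFileDemo {
    // Same method as above: read all bytes, then decode them with the given charset.
    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway file so the demo does not depend on an existing one.
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        String content = readFile(tmp.toString(), StandardCharsets.UTF_8);
        System.out.print(content);

        Files.delete(tmp);
    }
}
```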
To read a file and store it in a String, you can accumulate the contents in either a String or a StringBuilder:
Create a BufferedReader around a FileReader constructed with the name of the file.
Use a StringBuilder to append every line that is read.
When reading has finished, store the result in the String data.
public static void main(String[] args) {
String data = "";
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader("filename"))) {
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
data = sb.toString();
} catch (Exception e) {
e.printStackTrace();
}
}

Need help to read specific line in a log which ends with specific word

I need help with Java code that reads a log file and prints all lines that end with the word START.
My file contains:
test 1 START
test2 XYZ
test 3 ABC
test 2 START
It should print:
test 1 START
test 2 START
I tried the code below, but it prints START only.
public class Findlog{
public static void main(String[] args) throws IOException {
BufferedReader r = new BufferedReader(new FileReader("myfile"));
Pattern patt = Pattern.compile(".*START$");
// For each line of input, try matching in it.
String line;
while ((line = r.readLine()) != null) {
// For each match in the line, extract and print it.
Matcher m = patt.matcher(line);
while (m.find()) {
// Simplest method:
// System.out.println(m.group(0));
// Get the starting position of the text
System.out.println(line.substring(line));
}
}
A line.endsWith("START") check is good enough. You do not need regular expressions here.
I think you already found your solution.
Anyway, a regex that should work is:
".*START$"
which says: take everything (.*) that is followed by START, where START is at the end of the line ($).
The full version of your code should look like this:
public class Findlog{
public static void main(String[] args) throws IOException {
BufferedReader r = new BufferedReader(new FileReader("myfile"));
String line;
while ((line = r.readLine()) != null) {
// Print every line that ends with START.
if(line.endsWith("START"))
{
System.out.println(line);
}
}
r.close();
}
}
If you want to ignore case, the code should look like this:
public class Findlog{
public static void main(String[] args) throws IOException {
BufferedReader r = new BufferedReader(new FileReader("myfile"));
String line;
while ((line = r.readLine()) != null) {
// Compare against a lower-cased copy so that "start", "Start", etc. also match.
if(line.toLowerCase().endsWith("start"))
{
System.out.println(line);
}
}
r.close();
}
}
Only the if condition changes.
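As a variation on the case-insensitive check, String.regionMatches can compare the tail of the line without creating a lower-cased copy of every line. This is only a sketch; the class and method name EndsWithIgnoreCase / endsWithIgnoreCase are made up for the example, and the sample lines stand in for the log file.

```java
class EndsWithIgnoreCase {
    // True if line ends with suffix, ignoring upper/lower case differences.
    static boolean endsWithIgnoreCase(String line, String suffix) {
        int offset = line.length() - suffix.length();
        // regionMatches(true, ...) compares the last suffix.length() characters in place.
        return offset >= 0 && line.regionMatches(true, offset, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        String[] lines = { "test 1 START", "test2 XYZ", "test 3 ABC", "test 2 start" };
        for (String line : lines) {
            if (endsWithIgnoreCase(line, "START")) {
                System.out.println(line); // prints the first and the last sample line
            }
        }
    }
}
```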

Find a string in a very large formatted text file in java

Here is the thing:
I have a really big text file and it has a format like this:
0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
...
...
And I need to find if a particular pattern exists in the file, say
0007476|whatever|00313865000|whatever
All I need is a boolean saying yes or no.
Now what I have done is to read the file line by line and do a regular expression matching:
Pattern pattern = Pattern.compile(regex);
Scanner scanner = new Scanner(new File(fileName));
String line;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (pattern.matcher(line).matches()) {
scanner.close();
return true;
}
}
and the regex has a form of
"0007476\|\d{12}\|0031386500.*
This method works, but it takes usually 15 seconds to search for a string that is far from the start line. Is there a faster way to achieve that? Thanks
The Java String class has a contains method which returns a boolean. If the strings you are searching for are fixed, this is a lot faster than a regular expression:
if (string.contains("0007476|") && string.contains("|00313865000|")) {
// whatever
}
Hope that helped, if not, leave a comment.
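A slightly stricter variant of the same idea, as a sketch: contains("0007476|") would also match that text in the middle of a line, so the version below pins the first field with startsWith and checks the third field right after the second pipe. The class name FixedFieldSearch, the method containsRecord and its parameters are assumptions for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FixedFieldSearch {
    // True if some line has `first` as its first field and `third` as its third field.
    static boolean containsRecord(Path file, String first, String third) throws IOException {
        String prefix = first + "|";
        String needle = "|" + third + "|";
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (String line; (line = br.readLine()) != null; ) {
                if (!line.startsWith(prefix)) {
                    continue; // first field differs, no need to look further
                }
                // Position of the pipe that ends the second field (-1 if there is none).
                int secondPipe = line.indexOf('|', prefix.length());
                // startsWith returns false for a negative offset, so -1 is handled.
                if (line.startsWith(needle, secondPipe)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A call such as containsRecord(path, "0007476", "00313865000") would then give the yes/no answer without compiling or running a regex.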
I assume that you need the Scanner because the file is too big to read into a single String instead?
If that is not the case, you can probably use a regular expression that finds the match directly. Depending on whether or not you care about the specific text at the start of the line, you can use something along the lines of:
"(?m)^0007476\|\d{12}\|0031386500.*$"
If you do need to break it up into smaller chunks because of memory usage, I would suggest not reading line by line (since the lines are rather short) but processing bigger chunks using something like a BufferedReader instead.
I fiddled around a bit with a 1.25GB file and the following is about 2.5 times faster than your implementation:
private static boolean matches() throws IOException {
// Backslashes must be doubled inside a Java string literal.
String regex = "(?m)^0007476\\|\\d{12}\\|0031386500.*$";
Pattern pattern = Pattern.compile(regex);
try(BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
for(String lines; (lines = readLines(br, 10000)) != null; ) {
if (pattern.matcher(lines).find()) {
return true;
}
}
}
return false;
}
private static String readLines(BufferedReader br, int amount) throws IOException {
StringBuilder builder = new StringBuilder();
int lineCounter = 0;
// Check the counter first, so a line is never read and then dropped at a chunk boundary.
for(String line; lineCounter < amount && (line = br.readLine()) != null; lineCounter++ ) {
builder.append(line).append(System.lineSeparator());
}
return lineCounter > 0 ? builder.toString() : null;
}

Regular expression illegal character in Java

I've been looking through the Internet and, after a big headache, cannot find why this regular expression is wrong:
"\"\w*&&[\p{Punct}]\"["+sepChar+"]\"\w*&&[\p{Punct}]\""
I'm trying to read a master data file with the following pattern (quotes included):
"TEXTVALUE":"TEXTVALUE":"TEXTVALUE"
and split each line with the regular expression above.
So, for example:
"Hello:John":"Hello:World":"Hello:Mark"
will be split into:
{"Hello:John", "Hello:World", "Hello:Mark"}
The backslash is the escape character in Java string literals. You need to use two backslashes \\ to include a single backslash in the regex.
Try:
"\"\\w*&&[\\p{Punct}]\"["+sepChar+"]\"\\w*&&[\\p{Punct}]\""
Ok.
Thanks to #kevin-bowersox for the help.
It seems that Oracle has done a great job improving Java with version 7.
With this code:
File file = new File(someFile);
// BufferedReader wraps a Reader, so the File goes through a FileReader first.
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while((line = br.readLine()) != null){
//todo
}
If your file has been formatted with a constant pattern, for example:
"TEXTVALUE":"TEXTVALUE":"TEXTVALUE"
It reads:
"TEXTVALUE-->TEXTVALUE-->TEXTVALUE"
where '-->' stands for tabs ('\t')
So, at the end, my solution is:
public ArrayList<String[]> getSplittedTextFromFile(String filePath) throws FileNotFoundException, IOException{
ArrayList<String[]> ret = null;
if (!filePath.isEmpty()){
File input = new File(filePath);
// BufferedReader wraps a Reader, so the File goes through a FileReader first.
try (BufferedReader br = new BufferedReader(new FileReader(input))) {
String line = null;
while((line = br.readLine()) != null){
String[] aSplit = line.split("\\t");
if (ret == null)
ret = new ArrayList<>();
ret.add(aSplit);
}//while
}//try
}//fi
return ret;
}//fnc
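For lines that still have the quoted, colon-separated form from the question, the split may not need the \w and \p{Punct} classes at all. The following is a sketch of an alternative, not the pattern from the question: it splits only where the separator sits between a closing and an opening quote, using lookarounds, and Pattern.quote so that the separator is taken literally. The names QuotedSplit and splitQuoted are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedSplit {
    // Splits on sepChar only when it is directly preceded and followed by a double quote.
    static String[] splitQuoted(String line, String sepChar) {
        // (?<=\") and (?=\") are lookarounds: they require the quotes but do not consume them.
        String regex = "(?<=\")" + Pattern.quote(sepChar) + "(?=\")";
        return line.split(regex);
    }

    public static void main(String[] args) {
        String line = "\"Hello:John\":\"Hello:World\":\"Hello:Mark\"";
        // prints ["Hello:John", "Hello:World", "Hello:Mark"]
        System.out.println(Arrays.toString(splitQuoted(line, ":")));
    }
}
```

The colons inside the values are left alone because they are not surrounded by quotes on both sides.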

in java, how to print entire line in the file when string match found

I have a text file called "Sample.text" that contains multiple lines. I need to search this file for a particular string. If the string is found, I need to print the entire line that contains it. The search string is in the middle of the line. I am also using a StringBuffer to append the text after reading it from the file. The text file is very large, so I don't want to iterate line by line. How can I do this?
You could do it with FileUtils from Apache Commons IO
Small sample:
StringBuffer myStringBuffer = new StringBuffer();
List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
for (String line : lines) {
if (line.contains("something")) {
myStringBuffer.append(line);
}
}
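Note that FileUtils.readLines loads the whole file into memory, which may be a problem for a very large file. As an alternative sketch using only the standard library (Java 8 or later), Files.lines reads the file lazily, so only the matching lines are kept. It still looks at every line, which is hard to avoid when whole lines have to be printed. The class name GrepLines and the method matchingLines are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GrepLines {
    // Returns every line of the file that contains needle, in file order.
    static List<String> matchingLines(Path file, String needle) throws IOException {
        // The stream holds the file open, so it is closed with try-with-resources.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(needle))
                        .collect(Collectors.toList());
        }
    }
}
```

Each returned line can then be printed or appended to the StringBuffer as in the sample above.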
We can also use a regex for string or pattern matching from a file.
Sample code:
import java.util.regex.*;
import java.io.*;
/**
* Print all the strings that match a given pattern from a file.
*/
public class ReaderIter {
public static void main(String[] args) throws IOException {
// The RE pattern
Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
// A FileReader (see the I/O chapter)
BufferedReader r = new BufferedReader(new FileReader("file.txt"));
// For each line of input, try matching in it.
String line;
while ((line = r.readLine()) != null) {
// For each match in the line, extract and print it.
Matcher m = patt.matcher(line);
while (m.find()) {
// Simplest method:
// System.out.println(m.group(0));
// Get the starting position of the text
int start = m.start(0);
// Get ending position
int end = m.end(0);
// Print whatever matched.
// Use CharacterIterator.substring(offset, end);
System.out.println(line.substring(start, end));
}
}
}
}
