Java search string for certain wordgroups - java

I have this code, which works fine and gives me the source code of a website:
package Quellenpackage;
import java.net.URL;
import java.io.*;
public class Quellcode {
/**
* @param args
*/
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
final String meineURL = "http://www.onvista.de/aktien/technische-kennzahlen/Aareal-Bank-Aktie-DE0005408116";
URL url = new URL(meineURL);
InputStreamReader isr = new InputStreamReader(url.openConnection().getInputStream());
BufferedReader br = new BufferedReader(isr);
// read complete content
String line ="";
String code ="";
while((line = br.readLine()) != null)
{
code += line + "\r\n";
}
// close Reader
br.close();
isr.close();
// give out page content
System.out.println(code);
}
}
At the moment this prints the whole source of the page, but I only want a certain part.
That part lies exactly between the two markers "Start:" and "ende". So I need something that searches the code for "Start:" and then outputs everything up to "ende".
I have no idea whether this is possible, let alone how to do it. I really hope you can help me.

You can use the String.split() method (see the documentation). It separates a String into several Strings at every place where the given parameter, which is treated as a regular expression, is found in the string.
String[] parts = code.split("Start:"); // two parts: before "Start:" and after "Start:"
parts = parts[1].split("ende"); // two parts: before "ende" and after "ende"
String result = parts[0]; // here is the result
Problem with more than one "ende":
If "ende" occurs more than once in the code, this will not work, because the text is cut at the first "ende". In that case you have to add the other parts back in, together with the "ende" that split() removed:
String[] parts = code.split("Start:");
parts = parts[1].split("ende", -1); // limit -1 keeps trailing empty strings
String result = parts[0];
//special case:
//add every split substring back with its "ende", but NOT the last one
for (int i = 1;i<parts.length-1;i++){
result+="ende"+parts[i];
}
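As an alternative to split(), the same extraction can be sketched with indexOf and substring, which avoids regular expressions altogether. This is only a minimal sketch: the class name BetweenMarkers, the method between and the sample string are made up for illustration, and it assumes you want everything from the first "Start:" up to the last "ende".

```java
public class BetweenMarkers {
    // Returns the text between the first occurrence of start and the last
    // occurrence of end, or null if a marker is missing or end comes first.
    public static String between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from == -1) {
            return null; // start marker not found
        }
        from += start.length(); // skip the marker itself
        int to = text.lastIndexOf(end);
        if (to < from) {
            return null; // end marker missing or located before the start marker
        }
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page source (made-up content).
        String code = "<p>Start: 12,5 ende</p>";
        System.out.println(between(code, "Start:", "ende"));
    }
}
```

If you would rather stop at the first "ende" after the start marker, text.indexOf(end, from) can be used in place of lastIndexOf.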

Related

java string StringTokenizer doesn't recognize token after "//"?

I am writing code that should print only the comments in a Java file. It works when I have a comment like this:
// a comment
but when I have a comment like this:
// /* cdcdf
it does not print "/* cdcdf"; it only prints a blank line.
Does anyone know why this happens?
Here is my code:
package printC;
import java.io.*;
import java.util.StringTokenizer;
import java.lang.String ;
public class PrintComments {
public static void main(String[] args) {
try {
String line;
BufferedReader br = new BufferedReader(new FileReader(args[0]));
while ((line = br.readLine()) != null) {
if (line.contains("//") ) {
StringTokenizer st1 = new StringTokenizer(line, "//");
if(!(line.startsWith("//"))) {
st1.nextToken();
}
System.out.println(st1.nextToken());
}
}
}catch (Exception e) {
System.out.println(e);
}
}
}
You can simplify the code by just looking for the first position of the //. indexOf works fine for this. You don't need to tokenize as you really just want everything after a certain position (or text), you don't need to split the line into multiple pieces.
If you find the // (that is, indexOf returns something other than -1, which is its "not found" value), you use substring to print only the characters starting at that position.
This minimal example should do what you want:
import java.io.*;
public class PrintComments {
public static void main(String[] args) throws IOException {
String line; // comment
BufferedReader br = new BufferedReader(new FileReader(args[0]));
while ((line = br.readLine()) != null) {
int commentStart = line.indexOf("//");
if (commentStart != -1) {
System.out.println(line.substring(commentStart));
}
} // /* that's it
}
}
If you don't want to print the //, just add 2 to commentStart.
Note that this primitive approach to parsing for comments is very brittle. If you run the program on its own source, it will happily report //"); as well, for the line of the indexOf. Any serious attempt to find comments needs to properly parse the source code.
Edit: If you want to look for other comments marked by /* and */ as well, do the same thing for the opening comment, then look for the closing comment at the end of the line. This will find a /* comment */ when all of the comment is on a single line. When it sees the opening /* it looks whether the line ends with a closing */ and if so, uses substring again to only pick the parts between the comment markers.
import java.io.*;
public class PrintComments {
public static void main(String[] args) throws IOException {
String line; // comment
BufferedReader br = new BufferedReader(new FileReader(args[0]));
while ((line = br.readLine()) != null) {
int commentStart;
String comment = null;
commentStart = line.indexOf("//");
if (commentStart != -1) {
comment = line.substring(commentStart + 2);
}
commentStart = line.indexOf("/*");
if (commentStart != -1) {
comment = line.substring(commentStart + 2);
if (comment.endsWith("*/")) {
comment = comment.substring(0, comment.length() - 2);
}
}
if (comment != null) {
System.out.println(comment);
}
} // /* that's it
/* test */
}
}
To extend this to comments that span multiple lines, you need to remember whether you're inside a multi-line comment and, if you are, keep printing lines and checking for the closing */.
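One way the multi-line idea could look is sketched below. It is not the answer's own code: the class name BlockComments and the method extract are hypothetical, it only handles /* ... */ comments, and like the examples above it does not recognise comment markers inside string literals. A boolean flag remembers whether the previous line ended inside a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockComments {
    // Collects the text found between "/*" and "*/", one fragment per line,
    // so a comment spanning three lines yields up to three fragments.
    public static List<String> extract(List<String> lines) {
        List<String> comments = new ArrayList<>();
        boolean inComment = false; // true while we are between "/*" and "*/"
        for (String line : lines) {
            int pos = 0;
            while (pos < line.length()) {
                if (!inComment) {
                    int open = line.indexOf("/*", pos);
                    if (open == -1) {
                        break; // no further comment starts on this line
                    }
                    inComment = true;
                    pos = open + 2;
                } else {
                    int close = line.indexOf("*/", pos);
                    if (close == -1) {
                        comments.add(line.substring(pos)); // comment continues on the next line
                        break;
                    }
                    comments.add(line.substring(pos, close));
                    inComment = false;
                    pos = close + 2;
                }
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        // Made-up sample input instead of a file, to keep the sketch self-contained.
        List<String> source = Arrays.asList(
                "int a; /* first",
                "second */ int b;",
                "/* one line */");
        for (String fragment : extract(source)) {
            System.out.println(fragment);
        }
    }
}
```

To run it on a real file, the lines could be read with the same BufferedReader loop as in the examples above and passed to extract.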
StringTokenizer takes a set of delimiter characters, not a single string delimiter, so it is splitting on the '/' char. It also never returns empty tokens: for "// /* cdcdf" the first token is the single space between the "//" and the "/*", which is why you see a blank line.
If you just want the rest of the line after the "//", you could use:
if(line.startsWith("//")) {
line = line.substring(2);
}
In addition to @jtahlborn's answer: you can check all of the tokens by iterating over them:
e.g:
...
StringTokenizer st1 = new StringTokenizer(line, "//");
while (st1.hasMoreTokens()){
System.out.println("token found:" + st1.nextToken());
}
...
If you are reading line by line, the StringTokenizer doesn't do much in your code. Try changing the body of the if like this:
if(line.trim().startsWith("//")){//true only if the line starts with //, i.e. a comment line
//Do stuff with the line
String cleanLine = line.trim().replace("//"," ");//replaces every // in the line with a space
String strippedLine = line.trim().substring(2);//removes only the first //
}
Note: try to always use trim() to remove all blank spaces at the beginning and end of the string.
To split the line on // use:
line.split("//")
For a more general-purpose approach, check out:
Java - regular expression finding comments in code
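To make the difference between the two splitting approaches mentioned in this thread concrete, here is a small sketch that runs both on the line from the question. The class name TokenizerVsSplit and the helper tokens are invented for this example. StringTokenizer treats "//" as the set of delimiter characters {'/'}, while String.split treats "//" as a regular expression matching the two-character sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // Returns every token StringTokenizer produces for the given delimiter characters.
    public static List<String> tokens(String line, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delimiters);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "// /* cdcdf";

        // Splits on every single '/', so the "/*" is torn apart as well.
        for (String token : tokens(line, "//")) {
            System.out.println("tokenizer: [" + token + "]");
        }

        // Splits only on the exact sequence "//"; the part before it is an empty string.
        for (String part : Arrays.asList(line.split("//"))) {
            System.out.println("split:     [" + part + "]");
        }
    }
}
```

The brackets in the output are only there to make whitespace-only and empty pieces visible.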

Putting a text file into an ArrayList, but if word exist it skips it

I'm struggling a bit here. I'm trying to add each word from a text file to an ArrayList, and every time the reader comes across a word that is already in the list, it should skip it. (Does that make sense?)
I don't even know where to start. I think I need one loop that adds the words from the text file to the ArrayList and one that checks whether the word is already in the list. Any ideas?
PS: I just started with Java.
This is what I've done so far; I don't even know if I'm on the right path:
public String findWord(){
int text = 0;
int i = 0;
while sc.hasNextLine()){
wordArray[i] = sc.nextLine();
}
if wordArray[i].contains() {
}
i++;
}
A List (an ArrayList or otherwise) is not the best data structure to use; a Set is better. In pseudo code:
define a Set
for each word
if adding to the set returns false, skip it
else do whatever you want to do with the (first time encountered) word
The add() method of Set returns true if the set changed as a result of the call, which only happens if the word isn't already in the set, because sets disallow duplicates.
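The pseudo code above could be turned into Java roughly as follows. This is a sketch under assumptions: the class name UniqueWords and the method uniqueWords are made up, words are taken to be whitespace-separated tokens, and the Scanner reads from a sample string rather than a file. Passing a File to the Scanner constructor instead would read from disk.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class UniqueWords {
    // Returns each word once, in the order in which it was first seen.
    public static List<String> uniqueWords(Scanner in) {
        Set<String> seen = new HashSet<>();     // remembers which words we already met
        List<String> words = new ArrayList<>(); // keeps first occurrences in order
        while (in.hasNext()) {
            String word = in.next();
            if (seen.add(word)) { // add() returns false if the word was already present
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample text standing in for the contents of the text file.
        Scanner in = new Scanner("Hi how are you\nI am Hardi\nWho are you");
        System.out.println(uniqueWords(in));
        in.close();
    }
}
```

The separate list is only there because the question asks for an ArrayList; if insertion order is all that matters, a LinkedHashSet on its own would do.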
I once made a similar program; it read through a text file and counted how many times each word came up.
I'd start by importing a Scanner as well as the file classes (these imports need to be at the top of the Java class):
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.File;
import java.io.PrintStream;
import java.util.Scanner;
Then you can create the file object as well as a Scanner reading from this file; make sure to adjust the path to the file accordingly. The new PrintStream is not necessary, but when dealing with a large amount of data I don't like to flood the console.
public static void main(String[] args) throws FileNotFoundException {
File file=new File("E:/Youtube analytics/input/input.txt");
Scanner scanner = new Scanner(file); //will read from the file above
PrintStream out = new PrintStream(new FileOutputStream("E:/Youtube analytics/output/output.txt"));
System.setOut(out);
}
After this you can use scanner.next() to get the next word, so you would write something like this:
String[] array=new String[MaxAmountOfWords];//this will make an array; MaxAmountOfWords is a constant you have to define
int numberOfWords=0;
String currentWord="";
while(scanner.hasNext()){
currentWord=scanner.next();
if(isNotInArray(currentWord))//isNotInArray is a helper you have to write yourself
{
array[numberOfWords]=currentWord;
numberOfWords++;//only count words that were actually stored
}
}
If you don't understand any of this or need further guidance to progress, let me know. It is hard to help you if we don't know exactly where you are at...
You can try this:
public List<String> getAllWords(String filePath) throws IOException {
String line;
List<String> allWords = new ArrayList<String>();
BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)));
//read each line of the file
while((line = reader.readLine()) != null) {
//get each word in the line; \\W+ splits on runs of non-word characters
for(String word: line.split("\\W+")) {
//validate if the current word is not empty
if(!word.isEmpty())
if(!allWords.contains(word))
allWords.add(word);
}
}
reader.close();
return allWords;
}
Best solution is to use a Set. But if you still want to use a List, here goes:
Suppose the file has the following data:
Hi how are you
I am Hardi
Who are you
Code will be:
List<String> list = new ArrayList<>();
// Get the file.
FileInputStream fis = new FileInputStream("C:/Users/hdinesh/Desktop/samples.txt");
//Construct BufferedReader from InputStreamReader
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String line = null;
// Loop through each line in the file
while ((line = br.readLine()) != null) {
// Split the line on single spaces to get the words
String[] strArray = line.split("[ ]");
for (int i = 0; i< strArray.length; i++) {
if (!list.contains(strArray[i])) {
list.add(strArray[i]);
}
}
}
br.close();
System.out.println(list.toString());
If your text file has sentences with special characters, you will have to write a regex for that.

Copy part of line from 1 text to 2nd text file

I am working on a project where I need to copy part of each line from one text file to another. In the first file the fields are separated by the delimiter --# (which I chose myself).
Each line has 4 parts separated by 3 delimiters, and I want only the first two parts. Example:
Hello--#StackOverflow--#BMWCar--#Bye.
I just want to fetch the first 2 parts, i.e.
Hello--#StackOverflow
from every line of the first text file and write them to the second text file. I have tried everything and could not get it to work. Please help me out. :)
I am a little late, but the code below will work as well:
String str = "Hello--#StackOverflow--#BMWCar--#Bye.";
String strResult = str.split("(?<!\\G\\w+)(?:--#)")[0];
System.out.println(strResult);
\G matches at the end of the previous match; (?<!regex) is a negative lookbehind.
[Update]
In your case, can we use the code below? The solution is based on the file you provided.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class Test{
public static void main (String[] args) throws IOException
{
BufferedReader in = new BufferedReader(new FileReader("abc.txt"));
String line;
while((line = in.readLine()) != null)
{
System.out.println(line.substring(0,line.lastIndexOf("--#")));
}
in.close();
}
}
For each line you read, do the following:
String[] sp = line.split("--#",3); // limit 3: at most two splits, the rest of the line stays in sp[2]
String result = sp[0]+"--#"+sp[1];
If you're expecting names with dashes, it seems to me the best thing to use is String.split:
String test = "Hel-lo--#StackOverflow--#BMWCar--#Bye";
String[] sp = test.split("--#");
for(String name : Arrays.copyOfRange(sp, 0, 2))
{
System.out.println(name);
}
First, to get all the lines in the text file, use a while loop. As you read from one file, write to the second file. Split each line on the delimiter.
String inLine = in.nextLine();
String[] parts = inLine.split("--#");
String toWrite = parts[0] + "--#" + parts[1]; // keep the delimiter between the two parts
outLine.write(toWrite);
Should be easy enough to figure out.
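Putting the pieces from the answers above together, a complete file-to-file copy might look like the sketch below. The class name CopyFirstTwoParts, the helper firstTwoParts and the use of command-line arguments for the two file paths are assumptions made for this example. It uses indexOf instead of split, so a line with fewer than three fields is simply copied as it is.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyFirstTwoParts {
    static final String SEP = "--#";

    // Returns the first two fields joined by the separator.
    // Lines with at most two fields are returned unchanged.
    public static String firstTwoParts(String line) {
        int first = line.indexOf(SEP);
        if (first == -1) {
            return line; // no separator at all
        }
        int second = line.indexOf(SEP, first + SEP.length());
        return second == -1 ? line : line.substring(0, second);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java CopyFirstTwoParts <input file> <output file>");
            return;
        }
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(firstTwoParts(line));
                out.newLine();
            }
        }
    }
}
```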

How to split string by new lines in JAVA?

I want to split a string by new lines in Java. I am using the following regex:
str.split("\\r|\\n|\\r\\n");
But it still does not split the string by new lines.
Input -
0
0
0
0
The output is String[] array = {"0000"}, but I want String[] array = {"0","0","0","0"}.
I have read various solutions on Stack Overflow, but nothing works for me.
The code is:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.DecimalFormat;
public class Input {
public static void main(String[] args) {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line;
String text = "";
try {
while((line=br.readLine())!=null){
text = text + line;
}
} catch (IOException e) {
e.printStackTrace();
}
String [] textarray = text.trim().split("[\\r\\n]+");
for(int j=0;j<textarray.length;j++)
System.out.println(textarray[j]);
// System.out.print("");
// for(int i=((textarray.length)-1);i>=0;i--){
// long k = Long.valueOf(textarray[i]).longValue();
// System.out.println(k);
//// double sqrt = Math.sqrt(k);
//// double value = Double.parseDouble(new DecimalFormat("##.####").format(sqrt));
//// System.out.println(value);
////
//// }
}
}
When you call br.readLine(), the newline characters are stripped from the end of the string. So if you type 0 + ENTER four times, you are trying to split the string "0000".
You would be better to read items in from stdin and store them in an expandable data structure, such as a List<String>. No need to split things if you've already read them separately.
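A minimal sketch of that suggestion follows. The names ReadLines and readLines are invented for this example, and main reads from a StringReader holding the sample input so that it runs without waiting for keyboard input; wrapping System.in in an InputStreamReader instead would read from stdin as in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Reads every line from the source into a list; readLine() already strips
    // the line terminators, so no splitting is needed afterwards.
    public static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader br = new BufferedReader(source);
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample input from the question; use new InputStreamReader(System.in) for real stdin.
        List<String> lines = readLines(new StringReader("0\n0\n0\n0"));
        for (String item : lines) {
            System.out.println(item);
        }
    }
}
```

If an array is really needed later, lines.toArray(new String[0]) converts the list.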
Updated Answer:
If you are reading the input stream from the keyboard, the \n may not be put into the data correctly. In that case, you may want to choose a new sentinel value.
Original Answer:
I believe you need to create a sentinel value. So if \n is your sentinel value, you could do something like this:
Load the inputstream into a string variable
Go character by character through the string variable, checking whether \n is in the input (you could use a for loop and substring(i, i + 2))
If it is found, then you could add it to an array

Incorrect Output from CSV File

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.ArrayList;
/**
* Write a description of class ReadInCsv here.
*
* @author (Kevin Knapp)
* @version (10-10-2013)
*/
public class ReadInCsv
{
public static void main(String[] args)
{
String csvName = "Countries.csv";
File csvFile = new File(csvName);
ArrayList<String> nameList = new ArrayList<>();
ArrayList<String> popList = new ArrayList<>();
ArrayList<String> areaList = new ArrayList<>();
ArrayList<String> gdpList = new ArrayList<>();
ArrayList<String> litRateList = new ArrayList<>();
try
{
Scanner in = new Scanner(csvFile).useDelimiter(",");
while (in.hasNext())
{
String name = in.next();
nameList.add(name);
String pop = in.next();
popList.add(pop);
String area = in.next();
areaList.add(area);
String gdp = in.next();
gdpList.add(gdp);
String litRate = in.next();
litRateList.add(litRate);
}
in.close();
System.out.println(nameList);
System.out.println(popList);
System.out.println(areaList);
System.out.println(gdpList);
System.out.println(litRateList);
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
}
}
So I'm trying to read from a CSV file, and as it goes through, it should add each scanned value to an ArrayList (I'm going to reference each element of these lists later). But my output shows that as soon as it reads something and adds it, it skips to the next line before reading the next string. I need it to read straight across, not diagonally.
I'm sure I'm just missing something very simple. I began learning Java about a week ago. Thanks for the help.
A big problem with this method is that the only delimiter is the comma, so unless each line in the file ends in a comma, the line break is not treated as a delimiter and the last field of one line is read together with the first field of the next. A better way is to read each line in, split on commas, and add the values to the ArrayLists one line at a time:
Scanner in = new Scanner(csvFile);
while (in.hasNextLine()) {
String[] fields = in.nextLine().split(",");
if (fields.length == 5) {
nameList.add(fields[0]);
popList.add(fields[1]);
areaList.add(fields[2]);
gdpList.add(fields[3]);
litRateList.add(fields[4]);
} else {
// Bad line...do what you want to show error here
}
}
An even better way is to use a Java library dedicated to reading CSV files. A quick Google search should turn up some good ones.
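To see the delimiter problem from the question in isolation, the sketch below feeds a tiny two-line CSV string to a Scanner, once with "," as the only delimiter and once with a pattern that also accepts line breaks. The class name DelimiterDemo, the helper tokens and the sample data are made up for illustration; this is not a replacement for a real CSV parser, since it knows nothing about quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Returns every token a Scanner produces for the given delimiter pattern.
    public static List<String> tokens(String csv, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(csv).useDelimiter(delimiterRegex);
        while (in.hasNext()) {
            result.add(in.next());
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        String csv = "A,1\nB,2\n"; // made-up data: two lines with two fields each

        // Comma only: the line break stays inside a token, so "1" and "B" are glued together.
        System.out.println(tokens(csv, ",").size() + " tokens with \",\"");

        // Comma or line break: every field becomes its own token.
        System.out.println(tokens(csv, ",|\\r?\\n").size() + " tokens with \",|\\r?\\n\"");
    }
}
```

With the comma-only delimiter the second token contains the end of the first line and the start of the second, which is exactly the "diagonal" reading described in the question.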
