I'm writing an application which has a method that downloads a text file from my server. The text file contains ~1,000 proxy IPs, and the download happens every 10 minutes. I need to find the most efficient way of doing this.
Currently I have a method in a class called Connection which returns the bytes of whatever I want to retrieve. So if I request the text file from the server using that method, I get it back as bytes. My other method creates a very long string from these bytes. Afterwards, I split the long string into an array using System.lineSeparator(). Here is the code:
public static void fetchProxies(String url) {
    Connection c = new Connection();
    List<Proxy> tempProxy = new ArrayList<Proxy>();
    ByteArrayOutputStream baos = c.requestBytes(url);
    String line = new String(baos.toByteArray());
    String[] split = line.split(System.lineSeparator());
    //more code to come but the above works fine.
}
This currently works, but I know that it isn't the most efficient way.
My Problem
Instead of turning the bytes into a very long string, what is the most efficient way of turning the bytes into my IPs so I can add each individual IP to an ArrayList and then return the ArrayList full of IPs?
The most efficient and logical way would be to create a BufferedReader wrapping an InputStreamReader wrapping the InputStream of the URL connection. You would then use readLine() on the BufferedReader until it returns null, and append each line read to the list of IP addresses:
List<String> ipList = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), theAppropriateEncoding))) {
    String line;
    while ((line = reader.readLine()) != null) {
        ipList.add(line);
    }
}
Note that this probably won't change the performance of the method much, though, because most of the time is spent waiting for bytes to arrive from the remote host, which is considerably slower than building and splitting a String in memory.
The split method of String isn't the fastest way to separate all the IPs. There are other libraries that do this in a more optimized way.
Read this: http://demeranville.com/battle-of-the-tokenizers-delimited-text-parser-performance/
It has a very nice timing comparison of 7 different ways to split a String.
For example, the Splitter class from the Guava library returns an Iterable, and with Guava you can also convert the result to a List:
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
...
public static void fetchProxies(String url) {
    Connection c = new Connection();
    List<Proxy> tempProxy = new ArrayList<Proxy>();
    ByteArrayOutputStream baos = c.requestBytes(url);
    String line = new String(baos.toByteArray());
    // Splitter.split() returns an Iterable<String>, not an Iterator
    Iterable<String> parts =
        Splitter.on(System.getProperty("line.separator")).split(line);
    List<String> myList = Lists.newArrayList(parts);
    // do something with the List...
}
Related
I am writing an application for Android devices which has to communicate with another device via Bluetooth (the other device is just a board, like a Raspberry Pi, with Bluetooth attached).
This other device accepts a list of commands, which are single bytes like 'A', 'X', 'C', etc. When you send a command it always returns some response, which is something like 'OK', 'ERROR4' or some other data like '0000000000000001230120000000'.
I have implemented most of the commands in my application and they work fine. But I have an issue with the last command, which returns 1748 bytes. Most of the time I use this method to get the response, and it works fine:
private String send(byte [] bytes) {
    if(bluetoothService == null)
        return null;
    if(bluetoothSocket == null)
        return null;
    String line = "";
    Log.wtf("BYTESARR", Arrays.toString(bytes));
    try {
        OutputStream outputStream = bluetoothSocket.getOutputStream();
        InputStream inputStream = bluetoothSocket.getInputStream();
        outputStream.write(bytes);
        outputStream.flush();
        BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
        line = r.readLine();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return line;
}
But for the command whose response is 1748 bytes, the method above freezes my app. So I implemented a second method only for that command, which you can see below.
OutputStream outputStream = bluetoothSocket.getOutputStream();
InputStream inputStream = bluetoothSocket.getInputStream();
outputStream.write(bytes);
outputStream.flush();
char [] b = new char[1540];
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
int from = 0;
for(int i = 0; i < EXCEPTIONS_NUMBER; i++){
    int asdf = r.read(b, from, 50); // from and len are related to B not R
    Log.wtf("READ", asdf + " <");
    from += asdf;
}
line = new String(b);
Log.wtf("LINE", line);
But the problem with that method is that my response looks like this:
...0000000000000������������...
Less than half of this response is 0s (zeros), which is what I was expecting, but the second part of the response is those strange question marks (which I believe should be zeros), and I do not know why.
Question 1) Why does the first method make my app freeze when I am reading a response of 1748 bytes?
Question 2) Why does the second method give me those strange question marks instead of zeros?
EDIT [Solved]
I found the solution. So basically I have to use ByteArrayInputStream like this:
byte [] b = new byte[1538];
ByteArrayInputStream bais = new ByteArrayInputStream(b);
inputStream.read(b);
And that allows me to read all the bytes into the b array, and allows me to use bais if needed. I am not sure yet why this solution works, but I am glad it does. If someone understands it and can explain, that would be great. If not, I will update the post when I find out why it works, if the post is not closed/deleted.
When you call r.readLine(), the method will not return until one of two conditions is met: the stream returns a line separator character, or the stream signals that it has no more data. If neither of those things happens, your app will "freeze" indefinitely.
When you create an array of characters with new char[1540], it's filled with "NUL" characters: a non-printing control character with ASCII/Unicode code 0 (don't confuse it with the "null" object reference in Java). While you read from the stream, you replace some of the content of the array with the characters you read, but some of the original NULs are left in place, and these make it into the string you create later.
To fix, create the string using the portion of the array that you've written to:
line = new String(b, 0, from);
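A minimal sketch of both points, assuming the expected response length is known in advance. The names FixedLengthRead and readChars are hypothetical, and a ByteArrayInputStream stands in for the Bluetooth socket's input stream. The loop keeps reading until the buffer is full or read() reports end of stream with -1, and the String is built only from the part of the array that was actually filled:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class FixedLengthRead {

    // Reads up to 'expected' chars and returns only what was actually read.
    static String readChars(Reader reader, int expected) throws IOException {
        char[] buffer = new char[expected];   // starts out filled with NUL (code 0)
        int filled = 0;
        while (filled < expected) {
            int n = reader.read(buffer, filled, expected - filled);
            if (n == -1) {
                break;                        // end of stream: stop instead of adding -1
            }
            filled += n;
        }
        // Use only the filled prefix so leftover NULs never reach the String.
        return new String(buffer, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the device response (assumption: ASCII digits).
        byte[] fake = "000123".getBytes(StandardCharsets.US_ASCII);
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fake), StandardCharsets.US_ASCII);

        String response = readChars(reader, 10);
        System.out.println(response.length());                 // 6
        System.out.println(new String(new char[10]).length()); // 10, all NULs
    }
}
```

On a real socket the same loop would still block if the device sends fewer characters than expected and keeps the connection open, so the expected length has to match what the device actually sends.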
Say we have a file like so:
one
two
three
(but this file got encrypted)
My crypto method returns the whole file in memory, as a byte[] type.
I know byte arrays don't have a concept of "lines"; that's something a Scanner (for example) could have.
I would like to traverse each line, convert it to a string and perform my operation on it, but I don't know how to:
Find lines in a byte array
Slice the original byte array to "lines" (I would convert those slices to String, to send to my other methods)
Correctly traverse a byte array, where each iteration is a new "line"
Also: do I need to consider the different OS the file might have been composed in? I know that there is some difference between new lines in Windows and Linux and I don't want my method to work only with one format.
Edit: Following some tips from answers here, I was able to write some code that gets the job done. I still wonder if this code is worthy of keeping or I am doing something that can fail in the future:
byte[] decryptedBytes = doMyCrypto(fileName, accessKey);
ByteArrayInputStream byteArrInStrm = new ByteArrayInputStream(decryptedBytes);
InputStreamReader inStrmReader = new InputStreamReader(byteArrInStrm);
BufferedReader buffReader = new BufferedReader(inStrmReader);
String delimRegex = ",";
String line;
String[] values = null;
while ((line = buffReader.readLine()) != null) {
    values = line.split(delimRegex);
    if (Objects.equals(values[0], tableKey)) {
        return values;
    }
}
System.out.println(String.format("No entry with key %s in %s", tableKey, fileName));
return values;
In particular, I was advised to explicitly set the encoding but I was unable to see exactly where?
If you want to stream this, I'd suggest:
Create a ByteArrayInputStream to wrap your array
Wrap that in an InputStreamReader to convert binary data to text - I suggest you explicitly specify the text encoding being used
Create a BufferedReader around that to read a line at a time
Then you can just use:
String line;
while ((line = bufferedReader.readLine()) != null)
{
    // Do something with the line
}
BufferedReader handles line breaks from all operating systems.
So something like this:
byte[] data = ...;
ByteArrayInputStream stream = new ByteArrayInputStream(data);
InputStreamReader streamReader = new InputStreamReader(stream, StandardCharsets.UTF_8);
BufferedReader bufferedReader = new BufferedReader(streamReader);
String line;
while ((line = bufferedReader.readLine()) != null)
{
    System.out.println(line);
}
Note that in general you'd want to use try-with-resources blocks for the streams and readers - but it doesn't matter in this case, because it's just in memory.
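As a quick check of the line-break claim, here is a small self-contained sketch. The class and method names (ByteArrayLines, lines) are made up for illustration, and UTF-8 is an assumed encoding; use whatever encoding the file was written in. The input mixes Windows (\r\n), Unix (\n) and old Mac (\r) line endings:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteArrayLines {

    // Decodes the bytes with the given charset and returns one String per line.
    static List<String> lines(byte[] data, Charset charset) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);   // readLine() strips the line terminator
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Three different line terminators in one buffer.
        byte[] data = "one\r\ntwo\nthree\rfour".getBytes(StandardCharsets.UTF_8);
        System.out.println(lines(data, StandardCharsets.UTF_8));
        // prints [one, two, three, four]
    }
}
```

The charset argument of the InputStreamReader constructor is the place where the encoding is set explicitly.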
As Scott states, I would like to see what you came up with so we can help you alter it to fit your needs.
Regarding your last comment about the OS: if you want to support multiple file types, you should consider writing several functions that support those different file extensions. As far as I know, you do need to specify which file and what type of file you are reading in your code.
I am reading from a stream using a BufferedReader and an InputStreamReader to build one long string from everything the reader returns. It gets to over 100,000 lines and then throws a 500 error (call failed on the server). I am not sure what the problem is. Is there anything faster than this method? It works when the lines are in the thousands, but I am working with large data sets.
BufferedReader in = new BufferedReader(new InputStreamReader(newConnect.getInputStream()));
String inputLine;
String xmlObject = "";
StringBuffer str = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
    str.append(inputLine);
    str.toString();
}
in.close();
Thanks in advance
to create one long string that gets created from the readers.
Are you by any chance doing this to create your "long string"?
String string;
while(...)
    string+=whateverComesFromTheSocket;
If yes, then change it to
StringBuilder str = new StringBuilder(); //Edit:Just changed StringBuffer to StringBuilder
while(...)
    str.append(whateverComesFromTheSocket);
String string = str.toString();
String objects are immutable, so when you do string+="something", a new block of memory is allocated and the old contents plus "something" are copied into it. This is a costly operation, and running it once per line for over 100,000 lines is an extremely bad thing to do.
StringBuffer and StringBuilder are String's mutable brothers, and StringBuilder, being unsynchronized, is more efficient than StringBuffer.
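A minimal sketch of that pattern, with a StringReader standing in for the connection's stream. The names ReadAll and readAll are hypothetical. Since readLine() strips line terminators, the sketch appends '\n' after every line so the result is not collapsed into a single line, and toString() is called once, after the loop:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadAll {

    // Reads everything from 'source' into one String, one '\n' per line.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');  // put back the newline readLine() removed
            }
        }
        return sb.toString();                  // single conversion after the loop
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for newConnect.getInputStream() wrapped in an InputStreamReader.
        String text = readAll(new StringReader("<a>\r\n<b/>\r\n</a>"));
        System.out.print(text);
    }
}
```

Note that this normalizes all line endings to '\n' and adds one after the last line, which may or may not matter for the data being read.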
readLine() can read at about 90 MB/s; it's what you are doing with the data you read that is slow. BTW, readLine() removes newlines, so the approach you are using is flawed, as it will turn everything into one line.
Rather than re-inventing the wheel, I would suggest you try FileUtils.readFileToString() from Apache Commons IO (or IOUtils.toString() when the source is a stream rather than a file).
This will read the content as a String without discarding newlines, efficiently.
I have a problem reading the last n lines from a URL. How can I do that? I have url.openStream(), but there is no constructor for RandomAccessFile which takes a stream as input. Can somebody help me? Is there maybe already a library for this? (I know how to implement it with RandomAccessFile when I have a file, but how do I turn a stream into a file?)
Open the URL stream as per usual.
Wrap the returned InputStream in a BufferedReader so you can read it line by line.
Maintain a LinkedList into which you will save the lines.
After reading each line from the BufferedReader:
Add the line to the list.
If the size of the list is greater than "n" then call LinkedList#removeFirst().
Once you have read all lines from the stream the list will contain the last "n" lines.
For example (untested, just for demonstration):
// BufferedReader needs a Reader, so wrap the InputStream in an InputStreamReader
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<String>();
String line = null;
while ((line = in.readLine()) != null) {
    lines.add(line);
    if (lines.size() > nLines) {
        lines.removeFirst();
    }
}
// Now "lines" has the last "n" lines of the stream.
Sorry. You're going to have to do this one yourself. But don't worry because it's pretty simple.
You just need to keep track of the last n lines you have encountered since you started reading from the UrlStream. Might I suggest using a Queue?
Basically you could do something like
public String[] readLastNLines(final URL url, final int n) throws IOException{
    final Queue<String> q = new LinkedList<String>();
    final BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
    String line=null;
    while ((line = br.readLine())!=null)
    {
        q.add(line);
        if (q.size()>n) q.remove();
    }
    return q.toArray(new String[q.size()]);
}
readLastNLines returns an array containing the last n lines read from url.
Unfortunately, you cannot use a RandomAccessFile with a stream from the Internet because streams are, by definition, not random access.
I am using Java + Selenium 1 to test a web application.
I have to read through a text file line by line using BufferedReader.readLine() and compare the data that was found to another String.
Is there a way to assign each line to a unique string? I think it would be something like this:
FileInputStream fstream = new FileInputStream("C:\\write.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String[] strArray = null;
int p=0;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
    strArray[p] = strLine;
    assertTrue(strArray[p].equals(someString));
    p=p+1;
}
The problem with this is that you don't know how many lines there are, so you can't size your array correctly. Use a List<String> instead.
In order of decreasing importance,
You don't need to store the Strings in an array at all, as pointed out by Perception.
You don't know how many lines there are, so as pointed out by Qwerky, if you do need to store them you should use a resizeable collection like ArrayList.
DataInputStream is not needed: you can just wrap your FileInputStream directly in an InputStreamReader.
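The three points above can be sketched roughly as follows. The names LineCheck, readLines and allLinesEqual are hypothetical, UTF-8 is an assumed encoding, and the demo writes its own temporary file rather than using C:\write.txt. readLines shows the resizable-list variant; allLinesEqual shows the variant that stores nothing and compares each line as it is read:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LineCheck {

    // FileInputStream wrapped directly in an InputStreamReader, no DataInputStream.
    private static BufferedReader open(String path) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
    }

    // Variant 1: collect the lines in a resizable list (size not known up front).
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = open(path)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Variant 2: no storage at all, stop at the first line that differs.
    static boolean allLinesEqual(String path, String expected) throws IOException {
        try (BufferedReader br = open(path)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.equals(expected)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("write", ".txt");
        Files.write(tmp, Arrays.asList("someString", "someString"));
        System.out.println(readLines(tmp.toString()).size());
        System.out.println(allLinesEqual(tmp.toString(), "someString"));
        Files.delete(tmp);
    }
}
```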
You may want to try something like:
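import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public final static String someString = "someString";
public boolean isMyFileOk(String filename) throws FileNotFoundException {
    // new Scanner(String) would scan the file name itself, so pass a File
    Scanner sc = new Scanner(new File(filename));
    boolean fileOk = true;
    while(sc.hasNextLine() && fileOk){
        String line = sc.nextLine();
        fileOk = isMyLineOk(line);
    }
    sc.close();
    return fileOk;
}
public boolean isMyLineOk(String line){
    return line.equals(someString);
}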
The Scanner class is usually a great class for reading files :)
And as suggested, you can check one line at a time instead of loading them all into memory before processing them. This may not be an issue if your file is relatively small, but you had better keep your code scalable, especially when doing the exact same thing :)