How can I optimize this String search/replace method? - java

I am implementing my own web server. The following method searches for server side includes and builds the html page appropriately.
public String getSSI(String content) throws IOException {
String beginString = "<!--#INCLUDE VIRTUAL=\"";
String endString = "\"-->";
int beginIndex = content.indexOf(beginString);
while (beginIndex != -1) {
int endIndex = content.indexOf(endString, beginIndex);
String includePath = content.substring(beginIndex+beginString.length(), endIndex);
File includeFile = new File(BASE_DIR+includePath);
byte[] bytes = new byte[(int) includeFile.length()];
FileInputStream in = new FileInputStream(includeFile);
in.read(bytes);
in.close();
String includeContent = new String(bytes);
includeContent = getSSI(includeContent);
content = content.replaceAll(beginString+includePath+endString, includeContent);
beginIndex = content.indexOf(beginString);
}
return content;
}
I know StringBuilder is faster than String, but is that all I can do to optimize this? The original data is read into a byte array and converted into a String, at which point it is passed into this method, and the output is converted back into a byte array and sent to the client.

I don't know how significant of an impact this will have, but instead of reading into a byte array and converting to a String, you can use the IOUtils toString(InputStream) method to read directly into a String. Likewise, you can write the String directly to an OutputStream.
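A rough sketch of what that could look like with Commons IO (the stream variable names here are placeholders, not from the original code):
String content = IOUtils.toString(socketIn, StandardCharsets.UTF_8);
String page = getSSI(content);
IOUtils.write(page, socketOut, StandardCharsets.UTF_8);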

Sending string as byte array from C# to Java via socket

I am trying the following:
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes:
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
mClientSocket.GetStream().Write(headerBytes, 0, headerBytes.Length);
//write text:
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
int headerSize = in.read();
int bytesRead = 0;
char[] input = new char[headerSize];
while (bytesRead < headerSize)
{
bytesRead += in.read(input, bytesRead, headerSize - bytesRead);
}
String resString = new String(input);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}
The string size equals 9. That's correct on both sides. But when I am reading the string itself on the Java side, the data looks wrong. The char buffer (the 'input' variable) content looks like this:
'', '', '', 'H', 'e', 'l', 'l', 'o', ' '
I tried to change the endianness by reversing the byte array. I also tried changing the string encoding format between ASCII and UTF-8. I still feel like it relates to an endianness problem, but I cannot figure out how to solve it. I know I can use other types of writers to write text data to the stream, but I am trying to use raw byte arrays for the sake of learning.
These
byte[] headerBytes = BitConverter.GetBytes(stringToSend.Length);
are 4 bytes. And they aren't character data so it makes no sense to read them with a BufferedReader. Just read the bytes directly.
InputStream in = clientSocket.getInputStream(); // read raw bytes, not characters through a BufferedReader
byte[] headerBytes = new byte[4];
// shortcut, make sure 4 bytes were actually read
in.read(headerBytes);
Now extract your text's length and allocate enough space for it
int length = ByteBuffer.wrap(headerBytes).getInt();
byte[] textBytes = new byte[length];
Then read the text
int remaining = length;
int offset = 0;
while (remaining > 0) {
int count = in.read(textBytes, offset, remaining);
if (-1 == count) {
// deal with it
break;
}
remaining -= count;
offset += count;
}
Now decode it as UTF-8
String text = new String(textBytes, StandardCharsets.UTF_8);
and you are done.
Endianness will have to match for those first 4 bytes. One way of ensuring that is to use "network order" (big-endian). So:
C# Client
byte[] headerBytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(stringToSend.Length));
Java Server
int length = ByteBuffer.wrap(headerBytes).order(ByteOrder.BIG_ENDIAN).getInt();
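As a side note, if both sides stick to network order, the Java side could also read the header with a DataInputStream, which always reads a 4-byte int as big-endian (a sketch, reusing the same clientSocket as above):
DataInputStream din = new DataInputStream(clientSocket.getInputStream());
int length = din.readInt(); // reads 4 bytes in big-endian ("network") order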
At first glance it appears you have a problem with your indexes.
Your C# code is sending an integer converted to 4 bytes.
But your Java code is only reading a single byte as the length of the string.
The next 3 bytes sent from C# are going to be the three zero bytes from your string length.
Your Java code is reading those 3 zero bytes and converting them to empty characters, which become the first 3 empty characters of your input[] array.
C# Client:
string stringToSend = "Hello man";
BinaryWriter writer = new BinaryWriter(mClientSocket.GetStream(),Encoding.UTF8);
//write number of bytes: the original code sent a 4-byte integer here. Optionally, if your string is longer than 255 bytes, you'll need to send a larger header, perhaps an integer converted to 4 bytes.
byte[] textBytes = System.Text.Encoding.UTF8.GetBytes(stringToSend);
mClientSocket.GetStream().WriteByte((byte)textBytes.Length);
//write the text: the entire buffer
writer.Write(textBytes, 0, textBytes.Length);
Java Server:
Charset utf8 = Charset.forName("UTF-8");
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(), utf8));
while (true) {
//we read header first
// original code was sending an integer as 4 bytes but was only reading a single char here.
int headerSize = in.read();// read a single byte from the input
int bytesRead = 0;
char[] input = new char[headerSize];
// no need for a while statement here:
bytesRead = in.read(input, 0, headerSize);
// if you are going to use a while statement, then in each loop
// you should be processing the input, because it will get overwritten on the next read.
String resString = new String(input, 0, bytesRead);
System.out.println(resString);
if (resString.equals("!$$$")) {
break;
}
}

Base64 Failed to Decode String to Byte Array

I tried to decode a string to byte array using Base64. But it returned null. Here is the code:
LZW lzw = new LZW();
String enkripEmbedFileString = Base64.encode(byteFile);
List<Short> compressed = lzw.compress(enkripEmbedFileString);
String kompress = "";
Iterator<Short> compressIterator = compressed.iterator();
while (compressIterator.hasNext()) {
String sch = compressIterator.next().toString();
int in = Integer.parseInt(sch);
char ch = (char) in;
kompress = kompress + ch;
}
byteFile = Base64.decode(kompress);
I use the "byteFile" variable, assigned on the last line above, in code that comes after this block, and it throws a NullPointerException.
I have checked the "kompress" variable and it's not null. It contains a string.
All you need to know is that with this code I compress a string with LZW, which takes a String parameter and returns a List<Short>. Then I convert the List<Short> to a String with the loop you can see.
The problem is: why does Base64 fail to convert the String to byte[] after that String has been modified with LZW?
Whereas if I decompress the String first and then pass the decompressed String to Base64 to be converted to byte[], there is no problem. It works. Here is the code which works:
//LZW Compress
LZW lzw = new LZW();
String enkripEmbedFileString = Base64.encode(byteFile);
List<Short> compressed = lzw.compress(enkripEmbedFileString);
String kompress = "";
Iterator<Short> compressIterator = compressed.iterator();
while (compressIterator.hasNext()) {
String sch = compressIterator.next().toString();
int in = Integer.parseInt(sch);
char ch = (char) in;
kompress = kompress + ch;
}
//Decompress
List<Short> kompressback = back(kompress);
String decompressed = decompress(kompressback);
byteFile = Base64.decode(decompressed);
Please give me an explanation. Where is my mistake?
Base64 decode can be applied only to strings that contain Base64 encoded data. Since you encode and then compress, the result is not Base64. You proved it yourself when you saw that uncompressing the data first allowed you to then decode the Base64 string.
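In other words, the steps have to be undone in reverse order: Base64-encode then LZW-compress on the way in, LZW-decompress then Base64-decode on the way out. A sketch using the question's own helpers (back and decompress are the methods from the working code above; listToString is a hypothetical stand-in for the conversion loop):
String base64Text = Base64.encode(byteFile);
String kompress = listToString(lzw.compress(base64Text)); // the while loop from the question
// later, reverse the steps in the opposite order:
String base64Again = decompress(back(kompress));
byte[] original = Base64.decode(base64Again);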

Java: Reading file in two parts - partly as String and partly as byte[]

I have a file which is split into two parts by "\n\n" - the first part is a fairly short String and the second is a byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
final InputStreamReader isr = new InputStreamReader(fis);
final BufferedReader reader = new BufferedReader(isr);
String line;
// reading until \n\n
while (!(line = reader.readLine()).trim().isEmpty()){
// processing the line
}
// copying the rest of the byte array
result = IOUtils.toByteArray(reader);
reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
The reason your contents are broken is because the IOUtils.toByteArray(...) function reads your data as a string in the default character encoding, i.e. it converts the 8-bit binary values into text characters using whatever logic your default encoding prescribes. This usually leads to many of the binary values getting corrupted.
Depending on how exactly the charset is implemented, there is a slight chance that this might work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
ISO-8859-1 uses only a single byte per character. Not all character values are defined, but many implementations will pass them anyways. Maybe you're lucky with it.
But a much cleaner solution would be to read the String at the beginning as binary data first and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and then converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the whole file into a byte array, find the \n\n position and split the array into the line and the remaining bytes
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
if (a[i] == '\n' && a[i + 1] == '\n') {
line = new String(a, 0, i);
int len = a.length - i - 2;
result = new byte[len];
// start copying after both '\n' bytes of the separator
System.arraycopy(a, i + 2, result, 0, len);
break;
}
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
ByteBuffer buffer = ByteBuffer.allocate(64);
boolean wasLast = false;
String headerValue = null, headerKey = null;
byte[] result = null;
while (true) {
byte current = (byte) fis.read();
if (current == '\n') {
if (wasLast) {
// this is \n\n
break;
} else {
// just a new line in header
wasLast = true;
headerValue = new String(buffer.array(), 0, buffer.position());
buffer.clear();
}
} else if (current == '\t') {
// headerKey\theaderValue\n
headerKey = new String(buffer.array(), 0, buffer.position());
buffer.clear();
} else {
buffer.put(current);
wasLast = false;
}
}
// reading the rest
result = IOUtils.toByteArray(fis);
}

GZIP decompress string and byte conversion

I have a problem in code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public void test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
}
This code throws java.io.IOException: unknown format (magic number ef1f) at the line
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that the bytes get "spoiled" by the conversions new String(b1) and byte b[] = str.getBytes(): after the round trip through the String there are more bytes than before. If I avoid the conversion to a String and work with the bytes directly, everything works. Sorry for my English.
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int count;
while ((count = decompressed.inflate(result)) != 0)
buffer.write(result, 0, count);
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I use this function to decompress the server response. Thanks for the help.
You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
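For example, with the JDK's java.util.Base64 (a standard-library alternative to the library linked above; compressedBytes is assumed to be the output of a compress method that returns byte[]), the round trip looks like this:
String transportable = Base64.getEncoder().encodeToString(compressedBytes);
byte[] restored = Base64.getDecoder().decode(transportable);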
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Furthermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
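A minimal sketch of that shape (this is not the original poster's code; it uses GZIPOutputStream/GZIPInputStream from java.util.zip and assumes UTF-8 is acceptable for the text):
private static byte[] compress(String str) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
        gz.write(str.getBytes(StandardCharsets.UTF_8));
    }
    return bos.toByteArray();
}
private static String deCompress(byte[] compressed) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPInputStream gs = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
        byte[] tempBytes = new byte[6000];
        int numBytesRead;
        while ((numBytesRead = gs.read(tempBytes)) != -1) {
            baos.write(tempBytes, 0, numBytesRead);
        }
    }
    return new String(baos.toByteArray(), StandardCharsets.UTF_8);
}
Note that the checked IOException is declared rather than caught and swallowed, in line with the exception-handling advice in the earlier answer.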
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). What I changed was to use InflaterInputStream directly on the input stream returned by my HTTP connection. (My app was retrieving a large JSON of strings.)
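A rough sketch of that approach (connection is assumed to be an already opened HttpURLConnection, and IOUtils is the same Commons IO helper mentioned earlier on this page):
try (InputStream in = new InflaterInputStream(connection.getInputStream())) {
    String json = IOUtils.toString(in, StandardCharsets.UTF_8);
    // work with the decompressed JSON string here
}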

UTF-8 byte[] to String

Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and converting each one?
public String openFileToString(byte[] _bytes)
{
String file_string = "";
for(int i = 0; i < _bytes.length; i++)
{
file_string += (char)_bytes[i];
}
return file_string;
}
Look at the constructor for String
String str = new String(bytes, StandardCharsets.UTF_8);
And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:
String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
Java String class has a built-in constructor for converting a byte array to a string.
byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
String value = new String(byteArray, "UTF-8");
To convert utf-8 data, you can't assume a 1-1 correspondence between bytes and characters.
Try this:
String file_string = new String(bytes, "UTF-8");
(Bah. I see I'm way too slow in hitting the Post Your Answer button.)
To read an entire file as a String, do something like this:
public String openFileToString(String fileName) throws IOException
{
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
try {
InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
StringBuilder contents = new StringBuilder();
char[] buff = new char[4096];
int len = rdr.read(buff);
while (len >= 0) {
contents.append(buff, 0, len);
len = rdr.read(buff);
}
return contents.toString();
} finally {
try {
is.close();
} catch (Exception e) {
// log error in closing the file
}
}
}
You can use the String(byte[] bytes) constructor for that. See this link for details.
EDIT You also have to consider your platform's default charset as per the Java doc:
Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
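The practical consequence of that quote: the one-argument constructor can silently produce different results on different machines, so for data you know is UTF-8 prefer the explicit overload:
String risky = new String(bytes); // decodes with whatever the platform default happens to be
String safe = new String(bytes, StandardCharsets.UTF_8); // always decodes as UTF-8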
You could use the methods described in this question (especially since you start off with an InputStream): Read/convert an InputStream to a String
In particular, if you don't want to rely on external libraries, you can try this answer, which reads the InputStream via an InputStreamReader into a char[] buffer and appends it into a StringBuilder.
Knowing that you are dealing with a UTF-8 byte array, you'll definitely want to use the String constructor that accepts a charset name. Otherwise you may leave yourself open to some charset encoding based security vulnerabilities. Note that it throws UnsupportedEncodingException which you'll have to handle. Something like this:
public String openFileToString(byte[] _bytes) {
String file_string;
try {
file_string = new String(_bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
// this should never happen because "UTF-8" is hard-coded.
throw new IllegalStateException(e);
}
return file_string;
}
Here's a simplified function that will read in bytes and create a string. It assumes you probably already know what encoding the file is in (and otherwise defaults).
static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";
public static String readFileToString(String filePath, String encoding) throws IOException {
if (encoding == null || encoding.length() == 0)
encoding = DEFAULT_ENCODING;
ByteArrayOutputStream content = new ByteArrayOutputStream();
FileInputStream fis = new FileInputStream(new File(filePath));
byte[] buffer = new byte[BUFF_SIZE];
int bytesRead = 0;
while ((bytesRead = fis.read(buffer)) != -1)
content.write(buffer, 0, bytesRead);
fis.close();
// decode once at the end so multi-byte characters are never split across buffer boundaries
return content.toString(encoding);
}
String has a constructor that takes byte[] and a charset name as parameters :)
This also involves iterating, but it is much better than concatenating strings, as string concatenation is very costly.
public String openFileToString(byte[] _bytes)
{
StringBuilder s = new StringBuilder(_bytes.length);
for(int i = 0; i < _bytes.length; i++)
{
s.append((char)_bytes[i]);
}
return s.toString();
}
Why not get what you are looking for from the get go and read a string from the file instead of an array of bytes? Something like:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));
then readLine from in until it's done.
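A sketch of that read loop (the variable names are illustrative; note that readLine strips line terminators, so one is re-added here):
StringBuilder contents = new StringBuilder();
String line;
while ((line = in.readLine()) != null) {
    contents.append(line).append('\n');
}
String fileAsString = contents.toString();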
I do it this way:
String strIn = new String(_bytes, 0, numBytes);
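Note that this overload decodes with the platform default charset; for UTF-8 input the explicit form avoids surprises:
String strIn = new String(_bytes, 0, numBytes, StandardCharsets.UTF_8);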
