streams to strings: merging multiple files into a single string - java

I've got two text files that I want to grab as a stream and convert to a string. Ultimately, I want the two separate files to merge.
So far, I've got
//get the input stream of the files.
InputStream is =
cts.getClass().getResourceAsStream("/files/myfile.txt");
// convert the stream to string
System.out.println(cts.convertStreamToString(is));
getResourceAsStream doesn't take multiple strings as arguments. So what do I need to do? Separately convert them and merge together?
Can anyone show me a simple way to do that?

It sounds like you want to concatenate streams. You can use a SequenceInputStream to create a single stream from multiple streams. Then read the data from this single stream and use it as you need.
Here's an example:
String encoding = "UTF-8"; /* You need to know the right character encoding. */
InputStream s1 = ..., s2 = ..., s3 = ...;
Enumeration<InputStream> streams =
Collections.enumeration(Arrays.asList(s1, s2, s3));
Reader r = new InputStreamReader(new SequenceInputStream(streams), encoding);
char[] buf = new char[2048];
StringBuilder str = new StringBuilder();
while (true) {
int n = r.read(buf);
if (n < 0)
break;
str.append(buf, 0, n);
}
r.close();
String contents = str.toString();

You can utilize commons-io which has the ability to read a Stream into a String
http://commons.apache.org/io/api-release/org/apache/commons/io/IOUtils.html#toString%28java.io.InputStream%29

Offhand, I can think of a couple of ways:
Create a StringBuilder, then convert each stream to a string and append it to the StringBuilder.
Or, create an in-memory output stream (in Java, a ByteArrayOutputStream), copy each input stream into it, then convert the result to a string.

Create a loop that for each file loads the text into a StringBuilder. Then once each file's data is appended, call toString() on the builder.
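The loop described above can be sketched roughly as follows. This is only an illustration: the class name StreamMerger and the method merge are made up, and it assumes all inputs share one known character encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper, not part of any library.
final class StreamMerger {
    // Reads each stream in order and appends its decoded text to one builder.
    static String merge(List<? extends InputStream> streams, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[2048];
        for (InputStream in : streams) {
            // try-with-resources closes each stream once it has been fully read
            try (Reader r = new InputStreamReader(in, cs)) {
                int n;
                while ((n = r.read(buf)) != -1) {
                    sb.append(buf, 0, n);
                }
            }
        }
        return sb.toString();
    }
}
```

With resources as in the question, the list would hold the results of getResourceAsStream for each file, in the order the contents should appear in the merged string.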

Related

Best way to populate a user defined object using the values of string array

I am reading two different CSV files and populating two different objects from them. I split each line of the CSV file with a regex (the regex is different for the two files) and populate the object from the resulting array, as shown below:
public static <T> List<T> readCsv(String filePath, String type) {
List<T> list = new ArrayList<T>();
try {
File file = new File(filePath);
FileInputStream fileInputStream = new FileInputStream(file);
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
list = bufferedReader.lines().skip(1).map(line -> {
T obj = null;
String[] data = null;
if (type.equalsIgnoreCase("Student")) {
data = line.split(",");
ABC abc = new ABC();
abc.setName(data[0]);
abc.setRollNo(data[1]);
abc.setMobileNo(data[2]);
obj = (T)abc;
} else if (type.equalsIgnoreCase("Employee")) {
data = line.split("\\|");
XYZ xyz = new XYZ();
xyz.setName(Integer.parseInt(data[0]));
xyz.setCity(data[1]);
xyz.setEmployer(data[2]);
xyz.setDesignation(data[3]);
obj = (T)xyz;
}
return obj;
}).collect(Collectors.toList());
} catch (Exception e) {
// the exception is silently ignored here
}
return list;
}
csv files are as below:
i. csv file to populate ABC object:
Name,rollNo,mobileNo
Test1,1000,8888888888
Test2,1001,9999999990
ii. csv file to populate XYZ object
Name|City|Employer|Designation
Test1|City1|Emp1|SSE
Test2|City2|Emp2|
The issue is that data can be missing for any of the above columns in the CSV file, as shown in the last line of the second file. In that case, I get an ArrayIndexOutOfBoundsException.
Can anyone let me know what is the best way to populate the object using the data of the string array?
Thanks in advance.
In addition to the other mistakes that were pointed out to you in the comments, your actual problem is that line.split("\\|") is equivalent to line.split("\\|", 0), which discards trailing empty strings. Call line.split("\\|", -1) instead and it will work.
The problem appears to be that one or more of the last values on any given CSV line may be empty. In that case, you run into the fact that String.split(String) suppresses trailing empty strings.
Supposing that you can rely on all the fields in fact being present, even if empty, you can simply use the two-arg form of split():
data = line.split(",", -1);
You can find details in that method's API docs.
If you cannot be confident that the fields will be present at all, then you can force them to be by adding delimiters to the end of the input string:
data = (line + ",,").split(",", -1);
Since you only use the first few values, any extra trailing values introduced by the extra delimiters would be ignored.
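A small sketch of the difference between the two forms of split(), using the last line of the second CSV file from the question. The class name SplitDemo and the method fields are made up for illustration.

```java
// Hypothetical demo class, not part of the asker's code.
final class SplitDemo {
    // Splits on '|' and keeps trailing empty fields (limit -1).
    static String[] fields(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String line = "Test2|City2|Emp2|";
        System.out.println(line.split("\\|").length); // 3: trailing empty field dropped
        System.out.println(fields(line).length);      // 4: trailing empty field kept
    }
}
```

With the four-element array, data[3] is an empty string rather than an out-of-bounds access.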

Read faster a file & convert it into HEX

I need to read a file that is in ASCII and convert it into hex before applying some functions (searching for a specific character).
To do this, I read the file, convert it to hex and write it into a new file. Then I open my new hex file and apply my functions.
My issue is that it takes way too much time to read and convert it (approx. 8 seconds for a 9 MB file).
My reading method is :
public static void convertToHex2(PrintStream out, File file) throws IOException {
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
int value = 0;
StringBuilder sbHex = new StringBuilder();
StringBuilder sbResult = new StringBuilder();
while ((value = bis.read()) != -1) {
sbHex.append(String.format("%02X ", value));
}
sbResult.append(sbHex);
out.print(sbResult);
bis.close();
}
Do you have any suggestions to make it faster ?
Did you measure what your actual bottleneck is? You seem to read a very small amount of data in each loop iteration and process it every time. You might as well read larger chunks of data and process those, e.g. using DataInputStream or whatever. That way you would benefit more from the optimized reads of your OS, file system, their caches etc.
Additionally, you fill sbHex and append that to sbResult, to print that somewhere. Looks like an unnecessary copy to me, because sbResult will always be empty in your case and with sbHex you already have a StringBuilder for your PrintStream.
Try this:
static String[] xx = new String[256];
static {
for( int i = 0; i < 256; ++i ){
xx[i] = String.format("%02X ", i);
}
}
and use it:
sbHex.append(xx[value]);
Formatting is a heavy operation: it does not only do the conversion, it also has to parse the format string each time.
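A rough sketch that combines both suggestions above (reading in larger chunks and using the precomputed table). It assumes the whole result fits in memory; the class name HexDump, the method toHex and the 8 KB buffer size are arbitrary choices for illustration, and no timing is claimed.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not the asker's original method.
final class HexDump {
    // Precomputed "XX " text for every byte value, so no formatting happens in the loop.
    private static final String[] HEX = new String[256];
    static {
        for (int i = 0; i < 256; i++) {
            HEX[i] = String.format("%02X ", i);
        }
    }

    // Reads the stream in 8 KB chunks and appends the hex form of each byte.
    static String toHex(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                sb.append(HEX[buf[i] & 0xFF]); // mask to get an unsigned index
            }
        }
        return sb.toString();
    }
}
```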

How do I get a FileInputStream from FileItem in java?

I am trying to avoid FileItem's getInputStream(), because it gives me the wrong encoding; for that reason I need a FileInputStream instead. Is there any way to get a FileInputStream without using this method? Or can I transform my FileItem into a File?
if (this.strEncoding != null && !this.strEncoding.isEmpty()) {
br = new BufferedReader(new InputStreamReader(clsFile.getInputStream(), this.strEncoding));
}
else {
// br = ?????
}
You can try
FileItem#getString(encoding)
Returns the contents of the file item as a String, using the specified encoding.
You can use the write method here.
File file = new File("/path/to/file");
fileItem.write(file);
An InputStream is binary data, bytes. It must be converted to text by giving the encoding of those bytes.
Java internally uses Unicode to represent text in all scripts. For text it uses String/char/Reader/Writer.
For binary data, byte[], InputStream, OutputStream.
So you could use a bridging class, like InputStreamReader:
String encoding = "UTF-8"; // Or "Windows-1252" ...
BufferedReader in = new BufferedReader(
new InputStreamReader(fileItem.getInputStream(),
encoding));
Or if you read the bytes:
String s = new String(bytes, encoding);
The encoding is often an optional parameter (there then exists an overloaded method without the encoding, which falls back to a default).
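To illustrate why the encoding matters, here is a small sketch that decodes the same bytes with two different encodings. The class name EncodingDemo and the method firstLine are made up; any InputStream, such as the one returned by fileItem.getInputStream(), could be passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class EncodingDemo {
    // Reads the first line of the stream, interpreting the bytes in the given encoding.
    static String firstLine(InputStream in, Charset cs) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, cs))) {
            return br.readLine();
        }
    }
}
```

For example, the two bytes 0xC3 0xA9 decode to the single character "é" as UTF-8, but to the two characters "Ã©" as ISO-8859-1. The stream itself is the same in both cases; only the interpretation differs.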

How to chain multiple different InputStreams into one InputStream

I'm wondering if there is any idiomatic way to chain multiple InputStreams into one continual InputStream in Java (or Scala).
What I need it for is to parse flat files that I load over the network from an FTP-Server. What I want to do is to take file[1..N], open up streams and then combine them into one stream. So when file1 comes to an end, I want to start reading from file2 and so on, until I reach the end of fileN.
I need to read these files in a specific order. The data comes from a legacy system that produces files in batches, so data in one file depends on data in another, but I would like to handle them as one continual stream to simplify my domain logic interface.
I searched around and found PipedInputStream, but I'm not positive that is what I need. An example would be helpful.
It's right there in JDK! Quoting JavaDoc of SequenceInputStream:
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
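You want to concatenate an arbitrary number of InputStreams, while the two-argument constructor of SequenceInputStream accepts only two (there is also a constructor that takes an Enumeration of streams). But since SequenceInputStream is also an InputStream, you can apply it recursively (nest them):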
new SequenceInputStream(
new SequenceInputStream(
new SequenceInputStream(file1, file2),
file3
),
file4
);
...you get the idea.
See also
How do you merge two input streams in Java? (dup?)
This is done using SequenceInputStream, which is straightforward in Java, as Tomasz Nurkiewicz's answer shows. I had to do this repeatedly in a project recently, so I added some Scala-y goodness via the "pimp my library" pattern.
object StreamUtils {
implicit def toRichInputStream(str: InputStream) = new RichInputStream(str)
class RichInputStream(str: InputStream) {
// a bunch of other handy Stream functionality, deleted
def ++(str2: InputStream): InputStream = new SequenceInputStream(str, str2)
}
}
With that, I can do stream sequencing as follows
val mergedStream = stream1++stream2++stream3
or even
val streamList = //some arbitrary-length list of streams, non-empty
val mergedStream = streamList.reduceLeft(_++_)
Another solution: first create a list of input streams and then create the sequence of input streams:
List<InputStream> iss = Files.list(Paths.get("/your/path"))
.filter(Files::isRegularFile)
.map(f -> {
try {
return new FileInputStream(f.toString());
} catch (Exception e) {
throw new RuntimeException(e);
}
}).collect(Collectors.toList());
InputStream merged = new SequenceInputStream(Collections.enumeration(iss));
Here is a more elegant solution using Vector. This example is for Android specifically, but the Vector approach works in any Java code.
AssetManager am = getAssets();
Vector<InputStream> v = new Vector<>(Constant.PAGES);
for (int i = 0; i < Constant.PAGES; i++) {
String fileName = "file" + i + ".txt";
InputStream is = am.open(fileName);
v.add(is);
}
Enumeration<InputStream> e = v.elements();
SequenceInputStream sis = new SequenceInputStream(e);
InputStreamReader isr = new InputStreamReader(sis);
Scanner scanner = new Scanner(isr); // or use bufferedReader
Here's a simple Scala version that concatenates an Iterator[InputStream]:
import java.io.{InputStream, SequenceInputStream}
import scala.collection.JavaConverters._
def concatInputStreams(streams: Iterator[InputStream]): InputStream =
new SequenceInputStream(streams.asJavaEnumeration)

Convert Stream to String Java/Groovy

I stole this snippet off the web. But it looks to be limited to 4096 bytes and is quite ugly IMO. Anyone know of a better approach? I'm actually using Groovy btw...
String streamToString(InputStream input) {
StringBuffer out = new StringBuffer();
byte[] b = new byte[4096];
for (int n; (n = input.read(b)) != -1;) {
out.append(new String(b, 0, n));
}
return out.toString();
}
EDIT:
I found a better solution in Groovy:
InputStream exportTemplateStream = getClass().getClassLoader().getResourceAsStream("export.template")
assert exportTemplateStream: "[export.template stream] resource not found"
String exportTemplate = exportTemplateStream.text
Some good and fast answers. However, I think the best one is that Groovy has added a "getText" method to InputStream, so all I had to do was stream.text. And good call on the 4096 comment.
For Groovy
filePath = ... //< a FilePath object
stream = filePath.read() //< InputStream object
// Specify the encoding, and get the String object
//content = stream.getText("UTF-16")
content = stream.getText("UTF-8")
The InputStream class reference
Without an encoding argument, getText() uses the current system default encoding (e.g. "UTF-8").
Try IOUtils from Apache Commons:
String s = IOUtils.toString(inputStream, "UTF-8");
It's reading the input in chunks of 4096 bytes (4 KB), but the size of the actual string is not limited, as it keeps reading more and appending it to the StringBuffer.
You can do it fairly easily using the Scanner class:
String streamToString(InputStream input) {
Scanner s = new Scanner(input);
StringBuilder builder = new StringBuilder();
while (s.hasNextLine()) {
builder.append(s.nextLine() +"\n");
}
return builder.toString();
}
That snippet has a bug: if the input uses a multi-byte character encoding, there's a good chance that a single character will span two reads (and not be convertible). It also has the semi-bug that it relies on the platform's default encoding.
Instead, use Jakarta Commons IO. In particular, the version of IOUtils.toString() that takes an InputStream and applies an encoding to it.
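If adding Commons IO is not an option, the same fix can be sketched with the JDK alone by letting an InputStreamReader do the decoding, since it carries incomplete byte sequences over between reads. The class and method names below are made up for illustration; readChunkwise reproduces the pattern from the question, with a configurable buffer size so the split-character problem is easy to trigger.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
final class StreamText {
    // Decodes the whole stream with an explicit charset. The reader buffers
    // incomplete byte sequences, so a character is never split across reads.
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = new InputStreamReader(in, cs)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // The pattern from the question: each byte chunk is decoded on its own,
    // so a multi-byte character that straddles two chunks gets corrupted.
    static String readChunkwise(InputStream in, Charset cs, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] b = new byte[chunkSize];
        for (int n; (n = in.read(b)) != -1;) {
            sb.append(new String(b, 0, n, cs));
        }
        return sb.toString();
    }
}
```

With a 4096-byte buffer the corruption only shows up when a character happens to fall on a chunk boundary, which is why the bug tends to go unnoticed with short or ASCII-only input.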
For future readers who have similar problems, please note that both IOUtils from Apache and Groovy's InputStream.getText() method require the stream to complete or be closed before returning. If you are working with a persistent stream, you will need to deal with the "ugly" example that Phil originally posted, or work with non-blocking IO.
You can try something similar to this
new FileInputStream( new File("c:/tmp/file.txt") ).eachLine { println it }
