I have a Java program that uploads files from local disk to MinIO. The file size is around 900 MB. When I execute the program I get:
java.lang.OutOfMemoryError: Java heap space
I tried increasing the heap size both in eclipse.ini and under Run --> Configurations --> Project to -Xms4096M -Xmx8192M.
After increasing the heap size, when I executed the program I received:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
How can I upload large files to MinIO using Java?
This is what my Java program looks like:
StringBuilder stringBuilder = new StringBuilder();
File[] files = new File(path).listFiles();
showFiles(files);
System.out.println(pathList);
ListIterator<String> itr = pathList.listIterator();
while (itr.hasNext()) {
    String relativePath = itr.next();
    if (relativePath != null) {
        String absolutePath = path + relativePath.replaceFirst("minio_files", "");
        System.out.println(absolutePath);
        System.out.println(relativePath);
        File f = new File(absolutePath);
        BufferedReader reader = new BufferedReader(new FileReader(f));
        String line = null;
        String ls = System.getProperty("line.separator");
        while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
            stringBuilder.append(ls);
        }
        if (stringBuilder.length() != 0) {
            // delete the last new line separator
            stringBuilder.deleteCharAt(stringBuilder.length() - 1);
        }
        reader.close();
        // Create an InputStream for object upload.
        ByteArrayInputStream bais = new ByteArrayInputStream(stringBuilder.toString().getBytes("UTF-8"));
Do you absolutely need to remove a trailing line separator from your text file?
If it is not absolutely required, you could let the MinIO client libraries handle the upload transparently:
String absolutePath=path+(relativePath).replaceFirst("minio_files", "");
File f =new File(absolutePath);
minio.putObject("bucketName", f.getName(), absolutePath);
According to the MinIO docs this allows uploads of up to 5 GB. It is easier to implement and faster than any other solution.
If you absolutely need to remove a trailing line separator, you should at least pre-size the StringBuilder (and use the correct code to remove the trailing line separator):
File f = new File(absolutePath);
stringBuilder.ensureCapacity((int) f.length() + 2);
try (BufferedReader reader = new BufferedReader(new FileReader(f))) {
    String line;
    String ls = System.getProperty("line.separator");
    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line);
        stringBuilder.append(ls);
    }
    if (stringBuilder.length() != 0) {
        // delete the last new line separator
        stringBuilder.setLength(stringBuilder.length() - ls.length());
    }
}
Please be aware that this code can never upload files larger than about 2 GB:
arrays in Java cannot be larger than roughly Integer.MAX_VALUE elements (the exact limit is JVM-dependent),
therefore a StringBuilder cannot be used to build strings with more than roughly Integer.MAX_VALUE characters,
transforming the string into a UTF-8 encoded byte array cannot produce a byte array longer than roughly Integer.MAX_VALUE bytes,
and since UTF-8 is a multibyte encoding, converting a string that is already close to that character limit into a byte array might not be possible at all.
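If you must strip the trailing separator but want to avoid holding the whole file in memory, one option is to stream the adjusted content to a temporary file and then upload that file with the same path-based putObject overload shown above. This is only a rough sketch; it reuses the absolutePath and minio variables and the "bucketName" placeholder from the snippets above:
// Copy the file line by line to a temporary file, writing a separator
// only when another line follows, so the trailing separator is dropped.
File source = new File(absolutePath);
File temp = Files.createTempFile("upload-", ".tmp").toFile();
String ls = System.getProperty("line.separator");
try (BufferedReader reader = new BufferedReader(new FileReader(source));
     BufferedWriter writer = new BufferedWriter(new FileWriter(temp))) {
    String line = reader.readLine();
    while (line != null) {
        writer.write(line);
        String next = reader.readLine();
        if (next != null) {
            writer.write(ls);
        }
        line = next;
    }
}
// Upload by file path, as in the snippet above, so the client streams from disk.
minio.putObject("bucketName", source.getName(), temp.getAbsolutePath());
temp.delete();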
Is there any "already-implemented" (not manual) way to replace all occurrences of a single byte array/string inside a byte array? I have a case where I need to create a byte array containing platform-dependent text (Linux: line feed, Windows: carriage return + line feed). I know such a task can be implemented manually, but I am looking for an out-of-the-box solution. Note that these byte arrays are large and the solution needs to be performant in my case. Also note that I am processing a large number of these byte arrays.
My current approach:
var byteArray = resourceLoader.getResource("classpath:File.txt").getInputStream().readAllBytes();
byteArray = new String(byteArray)
.replaceAll((schemeModel.getOsType() == SystemTypes.LINUX) ? "\r\n" : "\n",
(schemeModel.getOsType() == SystemTypes.LINUX) ? "\n" : "\r\n"
).getBytes(StandardCharsets.UTF_8);
This approach is not performant because it creates new Strings and uses regex to find the occurrences. I know that a manual implementation would have to look at sequences of bytes because of the Windows encoding. A manual implementation would therefore also require reallocation (where needed).
Apache Commons Lang contains ArrayUtils, which has the method byte[] removeAllOccurrences(byte[] array, byte element). Is there any third-party library with a similar method for replacing ALL occurrences of a byte array/string inside a byte array?
Edit: As #saka1029 mentioned in the comments, my approach doesn't work for the Windows OS type. Because of this bug I need to stick with regexes like the following:
(schemeModel.getOsType() == SystemTypes.LINUX) ? "\\r\\n" : "[?:^\\r]\\n",
(schemeModel.getOsType() == SystemTypes.LINUX) ? "\n" : "\r\n")
This way, for the Windows case, only occurrences of '\n' without a preceding '\r' are found and replaced with '\r\n' (the regex is adjusted to match the group at '\n' rather than at [^\r]\n directly, otherwise the last letter of the line would be captured as well). Such a workflow cannot be implemented using conventional methods, which invalidates this question.
If you’re reading text, you should treat it as text, not as bytes. Use a BufferedReader to read the lines one by one, and insert your own newline sequences.
String newline = schemeModel.getOsType() == SystemTypes.LINUX ? "\n" : "\r\n";
OutputStream out = /* ... */;
try (Writer writer = new BufferedWriter(
         new OutputStreamWriter(out, StandardCharsets.UTF_8));
     BufferedReader reader = new BufferedReader(
         new InputStreamReader(
             resourceLoader.getResource("classpath:File.txt").getInputStream(),
             StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.write(newline);
    }
}
No byte array needed, and you are using only a small amount of memory—the amount needed to hold the largest line encountered. (I rarely see text with a line longer than one kilobyte, but even one megabyte would be a pretty small memory requirement.)
If you are “fixing” zip entries, the OutputStream can be a ZipOutputStream pointing to a new ZipEntry:
String newline = schemeModel.getOsType() == SystemTypes.LINUX ? "\n" : "\r\n";
ZipInputStream oldZip = /* ... */;
ZipOutputStream newZip = /* ... */;
ZipEntry entry;
while ((entry = oldZip.getNextEntry()) != null) {
    newZip.putNextEntry(entry);
    // We only want to fix line endings in text files.
    if (!entry.getName().matches(".*\\." +
            "(?i:txt|x?html?|xml|json|[ch]|cpp|cs|py|java|properties|jsp)")) {
        oldZip.transferTo(newZip);
        continue;
    }
    Writer writer = new BufferedWriter(
        new OutputStreamWriter(newZip, StandardCharsets.UTF_8));
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(oldZip, StandardCharsets.UTF_8));
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.write(newline);
    }
    writer.flush();
}
Some notes:
Are you deliberately ignoring Macs (and other operating systems which are neither Windows nor Linux)? You should assume \n for everything except Windows. That is, schemeModel.getOsType() == SystemTypes.WINDOWS ? "\r\n" : "\n"
Your code contains new String(byteArray) which assumes the bytes of your resource use the default Charset of the system on which your program is running. I suspect this is not what you intended; I have added StandardCharsets.UTF_8 to the construction of the InputStreamReader to address this. If you really meant to read the bytes using the default Charset, you can remove that second constructor argument.
I have several XML files (gigabytes in size) that need to be converted to JSON. I am easily able to convert small files (kilobytes in size) using the JSON library (org.json - https://mvnrepository.com/artifact/org.json/json/20180813).
Here's the code that I am using:
static String line = "", str = "";
BufferedReader br = new BufferedReader(new FileReader(link));
FileWriter fw = new FileWriter(outputlink);
JSONObject jsondata = null;
while ((line = br.readLine()) != null) {
    str += line;
}
jsondata = XML.toJSONObject(str);
But the large files (even the ones under 100 MB) take too long to process, and the larger ones throw java.lang.OutOfMemoryError: Java heap space. So, how can I optimize the code to process large files (or what other approach/library could I use)?
UPDATE
I have updated the code so that I write the XML into JSON segment by segment.
My XML :
<PubmedArticleSet>
<PubmedArticle>
</PubmedArticle>
<PubmedArticle>
</PubmedArticle>
...
</PubmedArticleSet>
So I am ignoring the root node <PubmedArticleSet> (I will add it back later), converting each <PubmedArticle> </PubmedArticle> element to JSON, and writing it out one at a time:
br = new BufferedReader(new FileReader(link));
fw = new FileWriter(outputlink, true);
StringBuilder str = new StringBuilder();
br.readLine(); // to skip the first three lines and the root
br.readLine();
br.readLine();
while ((line = br.readLine()) != null) {
    JSONObject jsondata = null;
    str.append(line);
    System.out.println(str);
    if (line.trim().equals("</PubmedArticle>")) { // split here
        jsondata = XML.toJSONObject(str.toString());
        String jsonPrettyPrintString = jsondata.toString(PRETTY_PRINT_INDENT_FACTOR);
        fw.append(jsonPrettyPrintString.toString());
        System.out.println("One done"); // one section done
        str = new StringBuilder();
    }
}
fw.close();
I am no longer getting the heap error, but the processing still takes hours for files in the ~300 MB range. Kindly provide any suggestions to speed up this process.
This statement is the main thing killing your performance:
str+=line;
It causes the allocation, copying, and deallocation of numerous String objects.
You need to use a StringBuilder:
StringBuilder builder = new StringBuilder();
while ( ... ) {
    builder.append(line);
}
It may also help (to a lesser extent) to read the file in larger chunks and not line by line.
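For illustration, here is a minimal sketch of reading in larger chunks with a char buffer instead of line by line; it assumes the same link variable as the question's code and collects the text into a StringBuilder as before:
// Read the file in 64 KB chunks instead of line by line.
StringBuilder builder = new StringBuilder();
try (Reader reader = new BufferedReader(new FileReader(link))) {
    char[] buffer = new char[64 * 1024];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        builder.append(buffer, 0, read);   // append only the chars actually read
    }
}
String str = builder.toString();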
Reading a large file is a time-consuming IO operation. Try using a library to handle it for you, for example Apache Commons IO:
File xmlFile= new File("D:\\path\\file.xml");
String xmlStr= FileUtils.readFileToString(xmlFile, "UTF-8");
JSONObject xmlJson = XML.toJSONObject(xmlStr);
Our application needs to read a file with a single line, and that single line contains a large amount of data. What we do is read the line from the file, store it in a String, tokenize the string on "-", and store the tokens in a list. Some entries from that list then have to be checked.
The method is as follows:
public boolean checkMessage(String filename) {
    boolean retBool = true;
    LinkedList tokenList;
    int size;
    String line = "";
    try {
        File file = new File(filename);
        FileInputStream fs = new FileInputStream(file);
        InputStreamReader is = new InputStreamReader(fs);
        BufferedReader br = new BufferedReader(is);
        while ((line = br.readLine()) != null) {
            line = line.trim();
            tokenList = tokenizeString(line, "-");
            if (tokenList == null) {
                retBool = false;
                resultMsg = "Error in File.java";
            }
            if (retBool) {
                retBool = checkMessagePart(tokenList);
            }
        }
    }
The error occurs at the line while ((line = br.readLine()) != null). The error is:
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at java.io.BufferedReader.readLine(BufferedReader.java:363)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
Actually, increasing the heap size didn't work. The size of the file being read is more than 1 GB. I also tried to read it as chunks of bytes, but adding the read data to a StringBuilder or a list again produces the memory error.
If the problem is that you cannot read the file into a String, then don't do it. Read it token by token using some other method. The easy option is a Scanner with the right delimiter ("-" in your case). If you find its performance lacking, you could resort to implementing your own version of BufferedReader in which the "lines" are split by that character instead of the usual line terminators.
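For illustration, a minimal sketch of the Scanner approach. Here checkToken is a hypothetical per-token variant of the question's checkMessagePart (which takes a whole token list), and resultMsg is the question's own field:
boolean retBool = true;
try (Scanner scanner = new Scanner(new File(filename)).useDelimiter("-")) {
    // Each call to next() returns one "-"-separated token; the whole
    // line is never held in memory at once.
    while (scanner.hasNext() && retBool) {
        String token = scanner.next().trim();
        retBool = checkToken(token);
    }
} catch (FileNotFoundException e) {
    retBool = false;
    resultMsg = "Error in File.java: " + e.getMessage();
}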
I'm trying to parse a very large file (~1.2 GB). Some lines of the file are bigger than the maximum allowed String size.
FileReader fileReader = new FileReader(filePath);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while ((line = bufferedReader.readLine()) != null) {
//Do something
}
bufferedReader.close();
Error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
at java.lang.StringBuffer.append(StringBuffer.java:369)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at sax.parser.PrettyPrintXML.format(PrettyPrintXML.java:30)
line 30 :
while ((line = bufferedReader.readLine()) != null) {
Can anyone suggest an alternative approach for this case?
You are using readLine() on a file that doesn't have lines, so it tries to read the entire file as a single line. This does not scale.
Solution: don't. Read a chunk at a time, or maybe even a character at a time: whatever is dictated by the unstated structure of your file.
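For illustration, a sketch of reading fixed-size chunks and processing them as they arrive, so the whole line never has to fit in memory; processChunk is a hypothetical placeholder for whatever parsing the file's structure requires:
try (Reader reader = new BufferedReader(new FileReader(filePath))) {
    char[] buffer = new char[8192];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        // Hand each chunk to the parser instead of accumulating it in a String.
        processChunk(buffer, read);
    }
}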
I believe the maximum String length is 2^31 - 1 [2,147,483,647] characters, and a 1.2 GB text file (assuming it is a text file) holds about 1,200,000,000 characters. Why do you need to read all the data? What are you using it for? Can you split the file into several files, or read and parse it as smaller strings? More information is needed.
You can use Apache Commons IO:
https://commons.apache.org/proper/commons-io/description.html
example:
InputStream in = new URL( "http://commons.apache.org" ).openStream();
try {
System.out.println( IOUtils.toString( in ) );
} finally {
IOUtils.closeQuietly(in);
}
I have a big file (about 30 MB), and here is the code I use to read data from it:
BufferedReader br = new BufferedReader(new FileReader(file));
try {
String line = br.readLine();
while (line != null) {
sb.append(line).append("\n");
line = br.readLine();
}
Then I need to split the content I read, so I use
String[] inst = sb.toString().split("GO");
The problem is that sometimes the sub-string exceeds the maximum String length and I can't get all the data into the string. How can I avoid this?
Thanks
Use Scanner s = new Scanner(input).useDelimiter("GO"); and call s.next() to get one section at a time.
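For illustration, a minimal sketch of that approach using the file variable from the question; each next() call returns one "GO"-separated section without ever building the full string:
try (Scanner s = new Scanner(new BufferedReader(new FileReader(file))).useDelimiter("GO")) {
    while (s.hasNext()) {
        String section = s.next();   // one "GO"-separated section at a time
        // Process each section here instead of collecting everything into one String.
        System.out.println(section.length());
    }
}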
WHY PART: The erroneous result may be the outcome of a non-contiguous heap segment, as the CMS collector doesn't defragment memory.
(This does not answer the how-to-solve part, though.)
You may opt to load the string piecewise, i.e. using substring.