I need to modify a file. We've already written a reasonably complex component to build sets of indexes describing where interesting things are in this file, but now I need to edit this file using that set of indexes and that's proving difficult.
Specifically, my dream API is something like this:
//if you'll let me use Kotlin for a second, assume we have a simple tuple class
data class IdentifiedCharacterSubsequence(val indexOfFirstChar: Int, val existingContent: String)
//given these two structures
List<IdentifiedCharacterSubsequence> interestingSpotsInFile = scanFileAsPerExistingBusinessLogic(file, businessObjects);
Map<IdentifiedCharacterSubsequence, String> newContentByPreviousContentsLocation = generateNewValues(interestingSpotsInFile, moreBusinessObjects);
//I want something like this:
try (MutableFile mutableFile = new com.maybeGoogle.orApache.MutableFile(file)) {
    for (IdentifiedCharacterSubsequence seqToReplace : interestingSpotsInFile) {
        String newContent = newContentByPreviousContentsLocation.get(seqToReplace);
        mutableFile.replace(seqToReplace.getIndexOfFirstChar(), seqToReplace.getExistingContent().length(), newContent);
        //very similar to the StringBuilder interface
        //'enqueues' data changes in memory, doesn't actually modify the file until the flush call...
    }
    mutableFile.flush();
    // ...at which point a single write pass is made.
    // assumption: changes will affect many small regions of text (instead of large portions of text)
    // -> buffering makes sense
}
Some notes:
I can't use RandomAccessFile because my changes are not in-place (the length of newContent may be longer or shorter than that of seqToReplace.existingContent).
The files are often many megabytes in size, so simply reading the whole thing into memory and modifying it as an array is not appropriate.
Does something like this exist, or am I reduced to writing my own implementation using BufferedWriters and the like? It seems like such an obvious evolution from io.Streams for a language that typically emphasizes index-based behaviour heavily, but I can't find an existing implementation.
Lastly: I have very little domain experience with files and encoding schemes, so I have made no effort to address the 'two-index' character described in questions like "Java charAt used with characters that have two code units". Any help on this front is much appreciated. Is this perhaps why I'm having trouble finding an implementation like this? Because indexes in UTF-8 encoded files are so pesky and bug-prone?
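One way to get the behaviour asked for above without holding the whole file in memory is a single streaming pass: sort the edits by offset, copy the unchanged text through a Reader/Writer pair into a temporary file, and move the temporary file over the original at the end. This is a minimal sketch, not an existing library: `StreamingReplacer` and `Edit` are hypothetical names, and it assumes Java 16+ (for the record), UTF-8 content and non-overlapping edits. Because offsets are counted in Java `char`s after decoding, they match `String.charAt`/`length()` indexing (UTF-16 code units, so a surrogate pair counts as two), provided the scanning component decoded the file the same way; raw byte offsets would not line up with them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StreamingReplacer {
    /** Hypothetical edit: replace `length` chars starting at char offset `start` with `replacement`. */
    public record Edit(long start, int length, String replacement) {}

    /** Applies all edits in one pass, writing to a temp file that then replaces the original. */
    public static void apply(Path file, List<Edit> edits) throws IOException {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong(Edit::start));
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "edit", ".tmp");
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            long pos = 0; // chars consumed from the original so far
            for (Edit e : sorted) {
                if (e.start() < pos) {
                    throw new IllegalArgumentException("overlapping edit at offset " + e.start());
                }
                copy(in, out, e.start() - pos); // unchanged text before this edit
                copy(in, null, e.length());     // consume (discard) the old content
                out.write(e.replacement());
                pos = e.start() + e.length();
            }
            in.transferTo(out); // the rest of the file is unchanged
        } catch (IOException | RuntimeException ex) {
            Files.deleteIfExists(tmp); // streams are already closed here
            throw ex;
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Reads exactly n chars from in, writing them to out unless out is null. */
    private static void copy(Reader in, Writer out, long n) throws IOException {
        char[] buf = new char[8192];
        while (n > 0) {
            int read = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (read < 0) throw new EOFException("edit extends past end of file");
            if (out != null) out.write(buf, 0, read);
            n -= read;
        }
    }
}
```

In terms of the dream API, `replace(...)` would only collect `Edit`s into a list and `flush()` would call `apply`, so memory use is bounded by the copy buffer and the edit list rather than by the file size.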
I have a file with several hundred stopwords. I want to be able to check whether the file has been modified, for example by a user, or even whether it is corrupted.
The way I am thinking of doing it currently is by checking whether the number of lines is correct. I could also check whether the total number of characters is the expected one, or even load the whole stopword list into memory and check that every single word is in the file. All three of the ways I thought of seem inefficient and/or bad, so I thought I'd ask whether there is a better way of doing it.
What I am thinking of implementing:
private static final int WORD_COUNT = 354;
public static boolean stopwordsCorrupted(File file) {
int numOfLines = countLines(file);
return WORD_COUNT != numOfLines;
}
Check out this: http://en.wikipedia.org/wiki/Checksum. A checksum applies a hash function to the file's contents, so you can check that no alterations have been made.
Here you also have an example of how to use it.
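A minimal sketch of the checksum approach using the JDK's `MessageDigest`: compute a SHA-256 digest of the known-good file once, store the hex string somewhere the user cannot easily change, and compare at startup. `StopwordsCheck` and both method names are hypothetical. Reading the whole file at once is fine here because a stopword list is small; any change to the content, including an edited or reordered line, changes the digest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StopwordsCheck {
    /** Returns the lowercase hex SHA-256 digest of the file's bytes. */
    static String sha256Hex(Path file) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // every Java SE platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    /** True if the file no longer matches the digest recorded for the known-good version. */
    static boolean stopwordsCorrupted(Path file, String expectedSha256) throws IOException {
        return !expectedSha256.equalsIgnoreCase(sha256Hex(file));
    }
}
```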
Java WatchService API might be helpful for your problem.
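`WatchService` (in `java.nio.file`) reports changes to files in a registered directory, which covers the "modified by a user" case while the program is running; it cannot tell you what happened while the program was not running, so it complements a checksum rather than replacing it. A minimal sketch; `StopwordsWatcher` and its methods are hypothetical, and on some platforms the JDK implements the service by polling, so events may arrive several seconds late.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

public class StopwordsWatcher {
    /** Registers the file's directory for create/modify events. */
    static WatchService watchDirectoryOf(Path file) throws IOException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        file.toAbsolutePath().getParent().register(watcher,
                StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
        return watcher;
    }

    /** Waits up to timeoutSeconds for an event on the given file; false on timeout. */
    static boolean waitForModification(WatchService watcher, Path file, long timeoutSeconds)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;
            WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) return false; // timed out
            boolean hit = false;
            for (WatchEvent<?> event : key.pollEvents()) {
                // context() is the file name relative to the watched directory
                if (file.getFileName().equals(event.context())) hit = true;
            }
            key.reset(); // re-arm the key so further events are delivered
            if (hit) return true;
        }
    }
}
```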
For a school assignment, I need to create a simulation of memory accesses. First I need to read one or more trace files, each containing a memory address for each access. Example:
0 F001CBAD
2 EEECA89F
0 EBC17910
...
Here the first integer indicates a read/write etc., followed by the hex memory address. With this data, I am supposed to run a simulation. So my idea was to parse this data into an ArrayList<Trace> (for now I am using Java), with Trace being a simple class containing the memory address and the access type (just a String and an integer). After that I plan to loop through these array lists to process them.
The problem is that even while parsing, it runs out of heap space. Each trace file is ~200MB, and I have up to 8, meaning a minimum of ~1.6 GB of data I am trying to "cache". What baffles me is that I am only parsing 1 file and Java is using 2GB according to my task manager...
What is a better way of doing this?
A code snippet can be found at Code Review
The answer I gave on Code Review is the same one you should use here...
But, because duplication appears to be OK, I'll duplicate the answer here.
The issue is almost certainly in the structure of your Trace class and its memory efficiency. You should ensure that instrType and hexAddress are stored as memory-efficient structures. The instrType appears to be an int, which is good, but make sure that it is declared as an int in the Trace class.
The more likely problem is the size of the hexAddress String. You may not realise it, but Strings are notorious for 'leaking' memory. In this case, you have a line and you think you are just getting the hexString from it... but in reality, the hexString contains the entire line... yeah, really. For example, look at the following code:
import java.util.StringTokenizer;

public class SToken {
    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("99 bottles of beer");
        int instrType = Integer.parseInt(tokenizer.nextToken());
        String hexAddr = tokenizer.nextToken();
        System.out.println(instrType + hexAddr);
    }
}
Now set a breakpoint in your IDE (I use Eclipse), run it, and you will see that hexAddr contains a char[] array for the entire line, with an offset of 3 and a count of 7.
Because of the way String.substring and other constructs work, they can consume huge amounts of memory for short strings (in theory that memory is shared with other strings, though). As a consequence, you are essentially storing the entire file in memory! (This sharing applies to Java versions before 7u6; since then substring copies the characters into a new array, but storing the address as a number is still far more compact.)
At a minimum, you should change your code to:
hexAddr = new String(tokenizer.nextToken().toCharArray());
But even better would be:
long hexAddr = parseHexAddress(tokenizer.nextToken());
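`parseHexAddress` is not defined in the answer; a minimal version can rely on `Long.parseLong(s, 16)`, which covers the full 32-bit range of addresses such as `F001CBAD` (an `int` would overflow). The hypothetical `CompactTrace` below keeps each access as two primitives instead of an object holding a String; its field names follow the answer, but the class itself is an assumption about how the question's Trace could look.

```java
import java.util.StringTokenizer;

public class CompactTrace {
    final int instrType;   // access type from the first column (read/write etc.)
    final long hexAddress; // numeric address, so no String (or shared char[]) is kept alive

    CompactTrace(int instrType, long hexAddress) {
        this.instrType = instrType;
        this.hexAddress = hexAddress;
    }

    /** Parses a hex address such as "F001CBAD" into a long. */
    static long parseHexAddress(String hex) {
        return Long.parseLong(hex, 16);
    }

    /** Parses one line of the form "<type> <hexAddress>". */
    static CompactTrace parseLine(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        int instrType = Integer.parseInt(tokenizer.nextToken());
        long hexAddr = parseHexAddress(tokenizer.nextToken());
        return new CompactTrace(instrType, hexAddr);
    }
}
```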
Like rolfl, I answered your question on Code Review. The biggest issue, to me, is reading everything into memory first and then processing it. You need to read a fixed amount, process that, and repeat until finished.
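A minimal sketch of the read-a-chunk, process, repeat approach: stream the file with a `BufferedReader`, fill fixed-size primitive arrays, hand each full batch to a processor, and reuse the arrays for the next batch. `ChunkedTraceReader`, `TraceBatchProcessor` and the chunk size are hypothetical; if the simulation only needs to see each access once, in order, processing every line as it is read works just as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class ChunkedTraceReader {
    /** Hypothetical callback: receives types[0..count) and addresses[0..count). */
    interface TraceBatchProcessor {
        void process(int[] types, long[] addresses, int count);
    }

    /** Reads "<type> <hexAddress>" lines in batches of chunkSize; returns the total number of traces. */
    static long readInChunks(BufferedReader in, int chunkSize, TraceBatchProcessor processor)
            throws IOException {
        int[] types = new int[chunkSize];
        long[] addresses = new long[chunkSize];
        int count = 0;
        long total = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) continue; // skip empty lines
            StringTokenizer tok = new StringTokenizer(line);
            types[count] = Integer.parseInt(tok.nextToken());
            addresses[count] = Long.parseLong(tok.nextToken(), 16);
            count++;
            if (count == chunkSize) { // batch full: process it, then reuse the arrays
                processor.process(types, addresses, count);
                total += count;
                count = 0;
            }
        }
        if (count > 0) { // final partial batch
            processor.process(types, addresses, count);
            total += count;
        }
        return total;
    }
}
```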
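Try using the class java.nio.ByteBuffer instead of java.util.ArrayList<Trace>. It should also reduce memory usage.
import java.nio.ByteBuffer;

class TraceList {
    // each record: 1 byte operation type + 4 bytes address = 5 bytes
    private static final int RECORD_SIZE = Byte.BYTES + Integer.BYTES;
    private final ByteBuffer buffer;
    private int size = 0;
    public TraceList(int capacity) {
        //allocate a byte buffer large enough for `capacity` records
        buffer = ByteBuffer.allocate(capacity * RECORD_SIZE);
    }
    public void put(byte operationType, int address) {
        //append data to the byte buffer (throws BufferOverflowException when full)
        buffer.put(operationType).putInt(address);
        size++;
    }
    public Trace get(int index) {
        //get data from the byte buffer by index, using absolute reads
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        int offset = index * RECORD_SIZE;
        byte type = buffer.get(offset); //read type
        int address = buffer.getInt(offset + Byte.BYTES); //read address (Integer.toUnsignedLong recovers values >= 0x80000000)
        return new Trace(type, address);
    }
    public int size() {
        return size;
    }
}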
I have a piece of logging- and tracing-related code, which is called often throughout the code, especially when tracing is switched on. StringBuilder is used to build a String. The Strings have a reasonable maximum length, I suppose on the order of hundreds of chars.
Question: Is there an existing library to do something like this:
// in reality, StringBuilder is final,
// would have to create delegated version instead,
// which is quite a big class because of all the append() overloads
public class SmarterBuilder extends StringBuilder {
    private final AtomicInteger capRef;
    SmarterBuilder(AtomicInteger capRef) {
        // super(...) must be the first statement, so the capacity is read inline
        super(capRef.get());
        // optionally save memory at the expense of worst-case resizes:
        // super(capRef.get() * 3 / 4);
        this.capRef = capRef;
    }
    public void syncCap() {
        // call when string is fully built
        int cap;
        do {
            cap = capRef.get();
            if (cap >= length()) break;
        } while (!capRef.compareAndSet(cap, length()));
    }
}
To take advantage of this, my logging-related class would have a shared capRef variable with suitable scope.
(Bonus Question: I'm curious, is it possible to do syncCap() without looping?)
Motivation: I know the default capacity of StringBuilder is always too small. I could (and currently do) throw in an ad-hoc initial capacity value of 100, which still results in a resize in some cases, but not always. However, I do not like magic numbers in the source code, and this feature is a case of "optimize once, use in every project".
Make sure you do performance measurements to confirm that you really are getting some benefit for the extra work.
As an alternative to a StringBuilder-like class, consider a StringBuilderFactory. It could provide two static methods, one to get a StringBuilder, and the other to be called when you finish building a string. You could pass it a StringBuilder as argument, and it would record the length. The getStringBuilder method would use statistics recorded by the other method to choose the initial size.
There are two ways you could avoid looping in syncCap:
Synchronize.
Ignore failures.
The argument for ignoring failures in this situation is that you only need a random sampling of the actual lengths. If another thread is updating at the same time you are getting an up-to-date view of the string lengths anyway.
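A minimal sketch of the StringBuilderFactory idea combined with the "ignore failures" option: `done` makes a single `compareAndSet` attempt and simply drops the sample if another thread won the race. The class and method names are hypothetical, and the statistic used here is just the largest length seen. If you would rather not lose samples, `AtomicInteger.accumulateAndGet(len, Math::max)` (Java 8+) also removes the loop from your own code, since the JDK retries internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class StringBuilderFactory {
    private static final int DEFAULT_CAPACITY = 16; // StringBuilder's own default
    private static final AtomicInteger maxSeen = new AtomicInteger(DEFAULT_CAPACITY);

    private StringBuilderFactory() {}

    /** Returns a builder sized to the largest final length recorded so far. */
    public static StringBuilder get() {
        return new StringBuilder(maxSeen.get());
    }

    /** Records sb's final length with one CAS attempt; a lost race just skips this sample. */
    public static void done(StringBuilder sb) {
        int len = sb.length();
        int cap = maxSeen.get();
        if (len > cap) {
            maxSeen.compareAndSet(cap, len); // single attempt, no loop
        }
    }

    /** No loop in user code and no lost samples: the JDK retries internally. */
    public static void doneExact(StringBuilder sb) {
        maxSeen.accumulateAndGet(sb.length(), Math::max);
    }

    static int capacityHint() {
        return maxSeen.get();
    }
}
```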
You could store the length of each string in a statistics array. Run your app, and at shutdown take the 90% quantile of your string lengths (sort all the length values, and take the value at array position sortedStrings.size() * 0.9).
That way you have an initial StringBuilder size that 90% of your strings will fit in.
Update
The value could be hard-coded (like Java does with the value 10 in ArrayList), read from a config file, or calculated automatically in a test phase. But the quantile calculation is not free, so it's best to run your project for some time, measure the 90% quantile on the fly inside the SmarterBuilder, output it from time to time, and later change the property file to use the value.
That way you would get optimal results for each project.
Or, if you go one step further, let your smart builder update that value in the config file from time to time.
But all this is not worth the effort; you would do it only for data that has millions of entries, like digital road maps, etc.
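The 90% quantile calculation described in this answer can be sketched as follows: copy the recorded lengths, sort them, and take the nearest-rank percentile, i.e. the smallest recorded length that at least 90% of all lengths do not exceed. The rank is computed with integer arithmetic, a slight variation on `size() * 0.9`, which can land one element higher; `LengthStats` and `percentile` are hypothetical names.

```java
import java.util.Arrays;

public class LengthStats {
    /**
     * Nearest-rank percentile: the smallest recorded length such that at least
     * `percent`% of all recorded lengths are <= it.
     */
    static int percentile(int[] lengths, int percent) {
        if (lengths.length == 0) throw new IllegalArgumentException("no samples");
        int[] sorted = lengths.clone();
        Arrays.sort(sorted);
        // rank = ceil(percent / 100 * n), computed in long to avoid overflow; index is rank - 1
        int rank = (int) (((long) sorted.length * percent + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```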
Is there any maximum size for code in Java? I wrote a function with more than 10,000 lines. Actually, each line assigns a value to an array variable.
arts_bag[10792]="newyorkartworld";
arts_bag[10793]="leningradschool";
arts_bag[10794]="mailart";
arts_bag[10795]="artspan";
arts_bag[10796]="watercolor";
arts_bag[10797]="sculptures";
arts_bag[10798]="stonesculpture";
And while compiling, I get this error: code too large
How do I overcome this?
A single method in a Java class may be at most 64KB of bytecode.
But you should clean this up!
Use a .properties file to store this data, and load it via java.util.Properties.
You can do this by placing the .properties file on your classpath and using:
Properties properties = new Properties();
try (InputStream inputStream = getClass().getResourceAsStream("yourfile.properties")) {
    properties.load(inputStream);
}
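For data shaped like the question's, one option is a properties file whose keys are the array indexes (for example a line `10792=newyorkartworld`), copied into an array at startup. A minimal sketch; the file format and the `loadArtsBag` name are assumptions, and it reads from any `Reader` so the data can come from a classpath resource or a plain file.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class ArtsBagLoader {
    /** Loads "index=value" entries into an array sized to the largest index + 1. */
    static String[] loadArtsBag(Reader source) throws IOException {
        Properties properties = new Properties();
        properties.load(source);
        int max = -1;
        for (String key : properties.stringPropertyNames()) {
            max = Math.max(max, Integer.parseInt(key.trim()));
        }
        String[] artsBag = new String[max + 1]; // indexes with no entry stay null
        for (String key : properties.stringPropertyNames()) {
            artsBag[Integer.parseInt(key.trim())] = properties.getProperty(key);
        }
        return artsBag;
    }
}
```

From the classpath, the `Reader` could be `new InputStreamReader(getClass().getResourceAsStream("yourfile.properties"), StandardCharsets.UTF_8)`. Since the values are now data rather than 10,000 assignment statements, the 64KB method limit no longer applies to them.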
There is a 64K bytecode size limit on a method.
Having said that, I have to agree with Richard; why do you need a method that large? Given the example in the OP, a properties file should suffice ... or even a database if required.
According to the Java Virtual Machine specification, the code of a method must be smaller than 65536 bytes:
The value of the code_length item gives the number of bytes in the code array for this method.
The value of code_length must be greater than zero (as the code array must not be empty) and less than 65536.
code_length defines the size of the code[] attribute which contains the actual bytecode of a method:
The code array gives the actual bytes of Java Virtual Machine code that implement the method.
This seems a bit like madness. Can you not initialize the array by reading the values from a text file, or some other data source?
This error occurs when a single function contains too much code.
To solve it, split that function into multiple functions, like:
//Too large code function
private void mySingleFunction(){
.
.
2000 lines of code
}
//To solve the problem
private void mySingleFunction_1(){
.
.
500 lines of code
}
private void mySingleFunction_2(){
.
.
500 lines of code
}
private void mySingleFunction_3(){
.
.
500 lines of code
}
private void mySingleFunction_4(){
.
.
500 lines of code
}
private void MySingleFunction(){
mySingleFunction_1();
mySingleFunction_2();
mySingleFunction_3();
mySingleFunction_4();
}
Try to refactor your code. There is a limit on the size of a method in Java.
As mentioned in other answers, there is a 64KB bytecode limit for a method (it comes from the JVM class-file format, so it is not specific to Sun's Java compiler).
To me it would make more sense to break that method up into more methods, each assigning certain related items to the array (it might make more sense to use an ArrayList for this).
For example:
public void addArrayItems(List<String> list)
{
    addSculptureItems(list);
    ...
}
public void addSculptureItems(List<String> list)
{
    list.add("sculptures");
    list.add("stonesculpture");
}
Alternatively, if the items are fixed, you could load them from a static resource such as a properties file.
I have run into this problem myself. The solution that worked for me was to refactor and shrink the method to more manageable pieces. Like you, I am dealing with a nearly 10K line method. However, with the use of static variables as well as smaller modular functions, the problem was resolved.
It seems there should be a better workaround, but as of Java 8 there is none...
I came to this question because I was trying to solve a similar problem. I wanted to hard code a graph that had 1600 elements into a 2D integer array for performance reasons. I was solving a problem on a leetcode style website and loading the graph data from a file was not an option. The entire graph exceeded the 64K maximum so I could not do a single static run of assignments. I split the assignments across several static methods each below the limit and then called each method one by one.
private static int[][] G = new int[1601][];
static {
assignFirst250();
assignSecond250();
assignSecond500();
assignThird500();
}
private static void assignFirst250() {
G[1] = new int[]{3,8,15,24,35,48,63,80,99,120,143,168,195,224,255,288,323,360,399,440,483,528,575,624,675,728,783,840,899,960,1023,1088,1155,1224,1295,1368,1443,1520,1599};
G[2] = new int[]{2,7,14,23,34,47,62,79,98,119,142,167,194,223,254,287,322,359,398,439,482,527,574,623,674,727,782,839,898,959,1022,1087,1154,1223,1294,1367,1442,1519,1598};
// ... (remaining rows and methods omitted)
}
You can add another method to make room for your code; you might have a method that holds a large amount of data. Try dividing your methods: I had the same issue in my Java Android code and fixed it by creating an additional method for the same data. The issue was gone after I did that.
I have an enum that makes the .java file over 500KB in size. Eclipse can build it for some reason; the Eclipse-exported Ant build.xml cannot. I'm looking into this and will update this post.
This is caused by putting all the code in a single method.
Solution: split it into several smaller methods, and the error will go away.
As there is a size limit for methods and you don't want to redesign your code at this moment, maybe you can split the array into 4-5 parts and put them into different methods. When you need the array, call all the methods in sequence. You may maintain a counter as well to know how many indexes you have filled.
OK, maybe this answer is too late, but I think this way is better than the others.
For example, say we have 1000 rows of data in code.
Break them up:
private void rows500() {
    //you should write rows 1-500 here
}
private void rows1000() {
    //you should write rows 501-1000 here
}
For better performance, put an "if" in your code:
if (count < 500) {
    rows500();
} else {
    rows1000();
}
I hope this code helps you.