How to insert, replace and delete regions of a file, in Java?


I'm looking for an implemented class (in Java) which handles insertion, replacement and deletion of text for an existing file, just like StringBuilder does.
The reason I do not want to use a StringBuilder is that I want to avoid having all the file content in memory.
The purpose is to apply patches to a file which contains code in any programming language.
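For the purely textual part of the question (insert, replace and delete without holding the whole file in memory), a minimal sketch is to stream the original into a temporary file, apply the edits on the way, and then move the temporary file over the original. FilePatcher and Edit are hypothetical names, not an existing library class. The sketch assumes offsets are char offsets into the original UTF-8 text and that the edits do not overlap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class FilePatcher {

    /** Replaces length chars at offset; length 0 is an insert, empty text is a delete. */
    public static final class Edit {
        final long offset;
        final long length;
        final String text;

        public Edit(long offset, long length, String text) {
            this.offset = offset;
            this.length = length;
            this.text = text;
        }
    }

    /** Applies non-overlapping edits (offsets refer to the ORIGINAL text) via a temp file. */
    public static void apply(Path file, List<Edit> edits) throws IOException {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingLong(e -> e.offset));
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "patch", ".tmp");
        try {
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                long pos = 0; // chars of the original consumed so far
                for (Edit e : sorted) {
                    if (e.offset < pos) {
                        throw new IllegalArgumentException("overlapping edit at " + e.offset);
                    }
                    pos += copy(in, out, e.offset - pos); // unchanged text before the edit
                    out.write(e.text);                    // inserted or replacement text
                    pos += skip(in, e.length);            // drop the replaced/deleted chars
                }
                copy(in, out, Long.MAX_VALUE);            // rest of the file
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | RuntimeException ex) {
            Files.deleteIfExists(tmp); // do not leave the half-written copy behind
            throw ex;
        }
    }

    // Copies up to n chars through a fixed-size buffer; stops early at end of file.
    private static long copy(Reader in, Writer out, long n) throws IOException {
        char[] buf = new char[8192];
        long done = 0;
        while (done < n) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n - done));
            if (r < 0) {
                break;
            }
            out.write(buf, 0, r);
            done += r;
        }
        return done;
    }

    // Skips up to n chars; stops early at end of file.
    private static long skip(Reader in, long n) throws IOException {
        long done = 0;
        while (done < n) {
            long s = in.skip(n - done);
            if (s <= 0) {
                break;
            }
            done += s;
        }
        return done;
    }
}
```

Only one buffer of characters is in memory at a time. Note that this is exactly the kind of blind, text-level patching the answer below warns about: it knows nothing about the syntax of the file it edits.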

Generally, a class should do one thing, and do it well.
That means that it is unlikely you'll find a single class that reads the code, turns it into an internal representation (parse tree), detects the issue at hand, alters the internal representation, and writes the internal representation back out to disk.
With that in mind, there are a number of projects that you might be able to extend to add in your desired functionality.
Checkstyle parses Java code with the intent of reporting stylistic errors. To do this it must read the code, turn it into an internal representation, and detect (formatting) issues. It might be a good starting point, depending on your goals.
PMD is a static code analysis tool. For it to find issues in Java source code, it must: read the code, turn it into an internal representation, and detect (structural) issues.
Note that neither of these tools does everything you wish, but they are close. All you will have to do is construct a "fixer" that runs on the parsed tree, fixing the detected problem. Then you will need to find out whether the tool provides an "outputter" that reconstructs the text of the code from the internal parsed (tree) representation, and use it to generate the desired text, which you then save to disk.
If the tree-to-text module doesn't exist, you might have to write it.
Source code is subject to rules, and while you might feel that you don't need these extra steps, your code will have a much longer useful lifetime than it would if you skipped them. Simply pasting in a line of code might not make sense with unexpected input. For example, assuming you add the @Override annotation to a Java method, this pseudocode will fail:
currentLine = next();
if (currentLine.detectMethod() && isAnOverride(currentLine.getMethod())) {
    code.insertBefore(currentLine, "@Override");
}
Because someone will feed your code this
public
void
myMethod(
String one,
String two,
String three)
{
System.out.println("Haha! I broke you!");
}
possibly leading to
public
void
@Override
myMethod(
String one,
String two,
String three)
{
System.out.println("Haha! I broke you!");
}
And you can say "Well, nobody should do that!" But, if the language permits it, then you'll be at odds with the language itself.
If you don't believe me, a line-by-line processor would not detect "public" as a method, nor "void" as a method, but would detect "myMethod(" as the beginning of a (misidentified) package private multi-line method.
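To make the contrast concrete, here is a minimal sketch of the parse-tree approach for the example above, using the JDK's Compiler Tree API (com.sun.source.*, shipped in the jdk.compiler module, so a JDK is needed at run time) rather than Checkstyle or PMD. MethodLocator is a hypothetical name. It only parses (no type checking) and finds methods by name; deciding whether a method really is an override is left out.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public final class MethodLocator {

    /** Offsets of the first token (modifiers included) of every method with the given name. */
    public static List<Long> methodStarts(String source, String name) {
        // Keep the source in memory; the URI only needs to end in ".java".
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        // getSystemJavaCompiler() returns null on a bare JRE.
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<Long> starts = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) { // parse only, no type checking
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        if (method.getName().contentEquals(name)) {
                            starts.add(positions.getStartPosition(unit, method));
                        }
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return starts;
    }

    /** Inserts text in front of the first method with the given name, if there is one. */
    public static String insertBeforeMethod(String source, String name, String text) {
        List<Long> starts = methodStarts(source, name);
        if (starts.isEmpty()) {
            return source;
        }
        int at = Math.toIntExact(starts.get(0));
        return source.substring(0, at) + text + source.substring(at);
    }
}
```

Because the offset comes from the parser, the method is found however its header is split across lines, and nothing is reported for the lines that contain only "public" or "void".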

Related

How can I override default failure message of a test in google.Truth?

I am writing a test that asserts that a document does not contain a specific String. When the test fails, it prints the 'actual' value in the form
expected not to contain a match for: my_regex
but was : a huge document that is unreadable
The document is very long. It would be preferable not to print it and just print the name of the document. I tried assertWithMessage(), but it only adds a message rather than replacing the default one.

Sorry, we've considered providing this feature occasionally but not pulled the trigger.
For starters, it would often make the assertion statement longer than writing the check yourself. Compare:
assertThat(doc.matches("(?s).*the-regex.*")).isTrue();
assertThat(doc).displayedAs("the doc").containsMatch("the-regex");
(To be fair, there are cases in which it's not so easy to write the check yourself.)
And anyway, much of the goal of Truth is to produce informative failure messages. In cases in which people have good reasons to leave that information out, they can fall back to isTrue() assertions.
(To be fair again, the isTrue() failure produces basically no useful message, whereas you'd like to have "expected not to contain a match for: my_regex." You can of course add it back with assertWithMessage, as you've said, but now your assertion statement is getting long again, and you have to repeat "my_regex" if you want it in the message.)
(Plus, it's nice to be able to always write the assertion in the idiomatic Truth way, rather than switching to a non-idiomatic one when you want to override the message.)
As noted in all the parentheticals above, though, this feature would have its uses. The "real" concerns are mainly:
API size. Consider also that some people want to omit different parts of the message, so they might desire more than one method.
People may call this method by mistake, accidentally throwing information away.
There's a related feature request here, which is for Truth to truncate values after a certain length. We've actually gotten feedback complaining about cases in which we do truncate, so there's a balance we need to strike here :) But it seems reasonable for us to provide some kind of configurable limit, perhaps based on a system property. I invite you to file an issue (and another for the "override default failure message" thing, if you'd like, even if I suspect we won't do it), though I should warn you that the next quarter or two are probably not going to see a lot of Truth development.
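For the "write the check yourself" route mentioned above, a minimal plain-Java sketch (no Truth involved) could look like the following. DocAssertions and assertDoesNotContainMatch are hypothetical names; the message imitates Truth's wording but names the document instead of printing it.

```java
import java.util.regex.Pattern;

public final class DocAssertions {
    private DocAssertions() {}

    /** Fails with a short message that names the document but never prints its contents. */
    public static void assertDoesNotContainMatch(String docName, String doc, String regex) {
        // find() searches anywhere in the text, so no ".*" padding or DOTALL flag is needed.
        if (Pattern.compile(regex).matcher(doc).find()) {
            throw new AssertionError(
                    "expected " + docName + " not to contain a match for: " + regex);
        }
    }
}
```

The trade-off is the one described above: the message is short, but you lose Truth's idiomatic assertion style.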

Actually, I forgot: Contrary to what I said in my other answer, there's actually something of a way to do this: Extend StringSubject to override the string representation, and use your custom subject:
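import static com.google.common.truth.Truth.assertAbout;
import com.google.common.truth.FailureMetadata;
import com.google.common.truth.StringSubject;
import com.google.common.truth.Subject;
public static StringSubject assertThatAbbreviatedString(String actual) {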
return assertAbout(abbreviatedStrings()).that(actual);
}
public static Subject.Factory<StringSubject, String> abbreviatedStrings() {
return AbbreviatedStringSubject::new;
}
private static final class AbbreviatedStringSubject extends StringSubject {
AbbreviatedStringSubject(FailureMetadata metadata, String actual) {
super(metadata, actual);
}
@Override
protected String actualCustomStringRepresentation() {
return "<actual value omitted>";
// [Edit: Or maybe you can extract the title from the doc and return that?]
}
}
That enables you to write:
assertThatAbbreviatedString("abcdefghijklmnopqrstuvwyz").containsMatch("foo");
And the output is something like:
expected to contain a match for: foo
but was : <actual value omitted>
If you want to be able to plug in a specific name, rather than <actual value omitted>, the simplest thing is probably to use assertWithMessage(...).about(...).that(...), which you can again wrap in a helper method. (If assertWithMessage is a poor fit for some reason, there's at least one other approach I could get into.)

Is there a way to place a mark in bytecode?

What I am trying to do: I want to have a pre-compiled Java bytecode file and be able to place a "mark" in some places. Later I want to analyze this file using ASM and replace the mark with some code. How can I implement this? Currently I am trying to do it by inserting invocations of an empty static method, but I still feel like I am doing something wrong. Is there a better way to do this?
P.S. More generally, I want to have some precompiled class template, for example:
public class Main {
public static void main(String... args) {
System.out.println("Program starts!");
//I want to insert code here
System.out.println("Bye!");
}
}

There is no Java statement without a predefined meaning, except perhaps the empty statement (a lone ;), which doesn't produce any code that you could find in the bytecode. There are annotations, but these can only be used to mark another code fragment, not to create a stand-alone statement within your code.
So you have to choose a statement and assign it the meaning of a mark in your template code, and your solution of using an invocation of a dedicated empty method is a perfect candidate for such a mark. Since its new meaning does not rely on the kind of statement but on the target method, which resides in a class whose name is distinguishable from all other classes, there is no conflict between your mark and other statements.
But you should consider that the framing class code is rather trivial compared to the code you will generate when implementing a compiler for any non-trivial language. In most cases, the logic of patching the generated code into existing code will exceed the complexity of just generating a complete class file.
If you really have large pieces of unchanging code you should consider placing them into their own classes and generate classes using or extending them. This simplifies the code generation and avoids code duplication (the same reason why these techniques are used in manually written code).
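A minimal sketch of that suggestion, applied to the template from the question: the unchanging framing code lives in a hand-written base class, and the generator only has to produce a subclass with one method. ProgramTemplate and body() are hypothetical names; this is the template method pattern, shown as an alternative to patching marks, not as an ASM technique.

```java
public abstract class ProgramTemplate {

    // Unchanging framing code, written and compiled once by hand.
    public final void run() {
        System.out.println("Program starts!");
        body(); // the only part a code generator has to produce
        System.out.println("Bye!");
    }

    /** Implemented by each generated subclass. */
    protected abstract void body();
}
```

A generated class then only needs to contain something like "class Generated extends ProgramTemplate { protected void body() { ... } }", which is far less bytecode to emit than a full copy of the framing code.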

How do I ensure the format for saving and parsing string representations of Objects correlates properly?

I am making a small boardgame program which needs to persist the state of the board to a file, and later read from the file and re-create the board.
I am delegating this functionality to the class shown below. I would like to implement it such that the save format of a square of the board, along with its coordinates, is captured in the SQUARE_FORMAT constant, and the regex for reading that same information is captured in the LOAD_REGEX constant. The two should correlate in code and also be easy to match up visually (by that I mean that a person should be able to see clearly that they refer to the same data).
Is there an idiom or pattern for doing this in Java code?
public class BoardPersistenceUtility {
private final String SQUARE_SAVE_FORMAT = "";
private final String LOAD_REGEX = "";
public void save(PrintWriter writer, Board board) {
}
public Board load(BufferedReader reader) {
// Implement
return null;
}
}
Update 1:
On reading my question again, I guess it might be a bit unclear what exactly I am looking for. I am specifically looking for the right way to represent SQUARE_SAVE_FORMAT so that it clearly correlates with the regex LOAD_REGEX.
SQUARE_SAVE_FORMAT would ideally be a String which uses special characters/variables that will be replaced with actual values and the result will be saved to a file. LOAD_REGEX is the corresponding regex that will be used to read contents from the file. The regex will use capturing groups so I can re-create the original object from the values I get from the capturing groups.
My question is, what are the idioms around creating such pairs of Strings - one of them a format string to be used for saving data, and the other a regex to be used while reading that data.
Update 2:
On thinking a bit more, I think I have been able to clarify my question a bit better.
If you look at both the Strings, SQUARE_SAVE_FORMAT is a format string which will be used in String.format() to create the text for a square on the board, which will be saved in the file. The constant SQUARE_LOAD_REGEX is a regex which will be used to read the line and capture relevant parts into named groups, so I can re-create the original object. (sorry if my regex is slightly incorrect... I quickly wrote something, but I need to refresh some regex principles to ensure that this is indeed what I need)
If you look at both these Strings visually, it is difficult to correlate them. Perhaps that is because a Java format String has no named variables. The best we can do is an explicit argument index, such as %1$d for the first argument.
I would like to understand if there is any idiom or pattern to represent such pairs of Strings, where one is used for formatting some data to text and the other is used to read the same text and parse its parts.
public class BoardPersistenceUtility {
private final String SQUARE_SAVE_FORMAT =
"%d,%d:%b-%s";
private final String SQUARE_LOAD_REGEX =
"^(?<row>\\d+),(?<col>\\d+):(?<mine>true|false)-(?<status>\\w+)$";
public void save(PrintWriter writer, Board board) {
}
public Board load(BufferedReader reader) {
// Implement
return null;
}
}

Note: you call SQUARE_SAVE_FORMAT and LOAD_REGEX "constants" which they are not, as you haven't declared them static final. It's better to keep terminology clear :-)
The simplest way to link these two is to define a class which encloses both as (final) fields. If you plan to define multiple such pairs of information, you can define multiple instances of the class, one for each type of format.
If you really want to keep these as constants, it may be best to define the enclosing class as an enum. Note that Java enums may contain methods too, so you may choose to implement the save/load logic as Strategies in the enum instances themselves, and call these polymorphically, which may help simplify your code.
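A minimal sketch of the enum idea, assuming the "row,col:mine-status" line format from the question: each constant holds its save format and load regex side by side, together with the code that uses them. SquareFormat, V1 and the nested Square class are hypothetical names (Square stands in for the question's board square), and the regex is a corrected version of the one in the question. Further formats would become further constants; if they need different parsing logic, each constant can get its own body, as the Strategy suggestion above describes.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum SquareFormat {
    // Save format and load regex written next to each other, field for field.
    V1("%d,%d:%b-%s",
       "(?<row>\\d+),(?<col>\\d+):(?<mine>true|false)-(?<status>\\w+)");

    private final String saveFormat;
    private final Pattern loadPattern;

    SquareFormat(String saveFormat, String loadRegex) {
        this.saveFormat = saveFormat;
        this.loadPattern = Pattern.compile(loadRegex);
    }

    /** Formats one square as a line of text (Locale.ROOT keeps the digits ASCII). */
    public String save(Square s) {
        return String.format(Locale.ROOT, saveFormat, s.row, s.col, s.mine, s.status);
    }

    /** Parses a line written by save(); rejects anything else. */
    public Square load(String line) {
        Matcher m = loadPattern.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a " + this + " square: " + line);
        }
        return new Square(Integer.parseInt(m.group("row")), Integer.parseInt(m.group("col")),
                Boolean.parseBoolean(m.group("mine")), m.group("status"));
    }

    /** Hypothetical stand-in for the question's board square. */
    public static final class Square {
        public final int row;
        public final int col;
        public final boolean mine;
        public final String status;

        public Square(int row, int col, boolean mine, String status) {
            this.row = row;
            this.col = col;
            this.mine = mine;
            this.status = status;
        }
    }
}
```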

I'm still not sure what you mean, but I need formatting, so I'm posting an answer instead of a comment.
First of all, the names are almost completely unrelated; relate them somehow.
SQUARE_DATA_STORE
SQUARE_DATA_REGEX
Second, there's no point in differentiating the "style" of the saved data if there's only a single BoardPersistenceUtility--if there were multiple formats then that information would be captured in a persistence utility subclass, like SquareFormatPersister or something.
Third, according to your text, one string is where the data will actually be stored. The other is a regular expression. The two will, in this case, never be "visually similar"--regular expressions of any complexity will never (much) look like the strings they can represent. (In this case, we have no clue, because we don't know what the board data can look like, of course.)
If your code is so non-self-explanatory that the reader can't figure out from your comments and your code that the two fields are related, something has gone horribly wrong. I'm having a hard time imagining that this code is so overwhelmingly complex that their relationship cannot be trivially communicated.
Edit after update
The answer is still no.
You could use a templating mechanism to provide names for the fields, similar to those used in your regex. This might also make the code a bit more self-explanatory as you'd fill the template context with named values (like "row" or "col").
You could use a real parser/generator, but the complexity there is a bit too much.
You could use a DSL (internal using Groovy, JRuby, JavaScript, etc. or external, which brings us back to parsing) and write chunks of the code that way.
IMO you're over-thinking this and over-estimating its complexity. Except possibly for the templating solution (which IMO is likely over-engineering for this level of difficulty anyway), you'd be far better off writing one or two sentences of comments, which should be more than enough to relate the "fields" of the load and save formats.
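For comparison, the templating option could be sketched as follows: a single template with named placeholders generates both the save side and the load regex, so the two cannot drift apart. FieldTemplate is a hypothetical helper written for this illustration, not a library class; values come back as strings and still need converting. Its size also illustrates the over-engineering concern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one template such as "{row},{col}:{mine}-{status}" drives save and load. */
public final class FieldTemplate {
    // Placeholder names double as regex group names, so they must be letters/digits.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([a-zA-Z][a-zA-Z0-9]*)\\}");

    private final String template;
    private final Pattern loadPattern;

    /** fieldRegexes maps each placeholder name to the regex its saved value must match. */
    public FieldTemplate(String template, Map<String, String> fieldRegexes) {
        this.template = template;
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(template.substring(last, m.start()))); // literal separator
            String name = m.group(1);
            String fieldRegex = fieldRegexes.get(name);
            if (fieldRegex == null) {
                throw new IllegalArgumentException("no regex for {" + name + "}");
            }
            regex.append("(?<").append(name).append('>').append(fieldRegex).append(')');
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        this.loadPattern = Pattern.compile(regex.toString());
    }

    /** Saving: replaces each {name} with the value stored under that name. */
    public String save(Map<String, ?> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = String.valueOf(values.get(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Loading: matches a saved line and returns each field's text, in template order. */
    public Map<String, String> load(String line) {
        Matcher m = loadPattern.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not fit template: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher names = PLACEHOLDER.matcher(template);
        while (names.find()) {
            fields.put(names.group(1), m.group(names.group(1)));
        }
        return fields;
    }
}
```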

Put comments in your code to explain that they're related, how they're related, what they're used for, and that if one is changed, the other should be modified accordingly.
Implement a unit test to make sure that a saved board can be loaded.
Make sure that your build and release process runs the unit tests, and fails if one of them doesn't pass.

Best choice? Edit bytecode (asm) or edit java file before compiling

Goal
Detecting where comparisons between and copies of variables are made
Inject code near the line where the operation has happened
The purpose of the code: every time the class is run, increment a counter
General purpose: count the amount of comparisons and copies made after execution with certain parameters
2 options
Note: I always have a .java file to begin with
1) Edit java file
Find comparisons with regex and inject pieces of code near the line
And then compile the class (My application uses JavaCompiler)
2) Use ASM bytecode engineering
Also detect where the events I want to track happen, and inject pieces into the bytecode
And then use the (already compiled but modified) class
My Question
What is the best/cleanest way? Is there a better way to do this?

If you go for the Java route, you don't want to use regexes; you want a real Java parser. So that may influence your decision. Mind, the Oracle JDK includes one as part of the internal classes that implement the Java compiler, so you don't actually have to write one yourself if you don't want to. But decoding the Oracle AST is not a 5-minute task either. And, of course, using it is not portable, if that's important.
If you go the ASM route, the bytecode will initially be easier to analyze, since its semantics are a lot simpler. Whether that simpler analysis outweighs the unfamiliarity, in terms of net time to a working solution, is hard to say. In the end, in terms of generated code, neither is "better".
There is an apparent simplicity in just looking at generated Java source code and "knowing" that What You See Is What You Get, versus doing primitive dumps of class files for debugging, etc. But all that apparent simplicity is there because of your existing comfort with the Java language. Once you spend some time dredging through bytecode, that, too, will become comfortable. It's just a question of whether it's worth the time to you to get there in the first place.
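As a rough idea of what "a real Java parser" buys you, here is a minimal sketch that uses the JDK's Compiler Tree API (com.sun.source.*, in the jdk.compiler module, so it needs a JDK at run time) to report the line of every comparison operator. ComparisonFinder is a hypothetical name. It only reports locations and does not inject code; copies could be found the same way by also overriding visitAssignment.

```java
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public final class ComparisonFinder {
    private static final Set<Tree.Kind> COMPARISONS = EnumSet.of(
            Tree.Kind.EQUAL_TO, Tree.Kind.NOT_EQUAL_TO, Tree.Kind.LESS_THAN,
            Tree.Kind.LESS_THAN_EQUAL, Tree.Kind.GREATER_THAN, Tree.Kind.GREATER_THAN_EQUAL);

    /** Returns the 1-based line number of every comparison operator in the source. */
    public static List<Long> comparisonLines(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<Long> lines = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitBinary(BinaryTree node, Void unused) {
                        if (COMPARISONS.contains(node.getKind())) {
                            long pos = positions.getStartPosition(unit, node);
                            lines.add(unit.getLineMap().getLineNumber(pos));
                        }
                        return super.visitBinary(node, unused); // keep scanning the operands
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```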

Generally it all depends on how comfortable you are with either option and how critical performance is. The bytecode manipulation will be much faster and somewhat simpler, but you'll have to understand how bytecode works and how to use the ASM framework.
Intercepting variable access is probably one of the simplest use cases for ASM. You could find a few more complex scenarios in this AOSD'07 paper.
Here is simplified code for intercepting variable access:
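ClassReader cr = new ClassReader(originalBytes); // originalBytes: the class file to instrument
ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_MAXS);
cr.accept(new ClassVisitor(Opcodes.ASM9, cw) {
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM9, mv) {
            @Override
            public void visitVarInsn(int opcode, int var) {
                if (opcode == Opcodes.ALOAD) { // loading an object reference from a local variable
                    // ... insert method call here
                }
                super.visitVarInsn(opcode, var); // keep the original instruction
            }
        };
    }
}, 0);
byte[] modified = cw.toByteArray();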

If it were me, I'd probably use the ASM option.
If you need a tutorial on ASM, I stumbled upon a user-written one that may help.

How do I generate the source code to create an object I'm debugging?

Typical scenario for me:
The legacy code I work on has a bug that only a client in production is having
I attach a debugger and figure out how to reproduce the issue on their system given their input. But, I don't know why the error is happening yet.
Now I want to write an automated test on my local system to try and reproduce then fix the bug
That last step is really hard. The input can be very complex and contain a lot of data. Creating the input by hand (e.g.: P p = new P(); p.setX("x"); p.setY("x"); imagine doing this 1000 times to create the object) is very tedious and error-prone. In fact, you may notice there's a typo in the example I just gave.
Is there an automated way to take a field from a break point in my debugger and generate source code that would create that object, populated the same way?
The only thing I've come up with is to serialize this input (using XStream, for example). I can save that to a file and read it back in in an automated test. This has a major problem: if the class changes in certain ways (e.g. a field/getter/setter is renamed), I won't be able to deserialize the object anymore. In other words, the tests are extremely fragile.

Java standard serialisation is well known to be not very useful when objects change their version (content, naming of fields). It's fine for quick demo projects.
Better suited to your needs is the approach where your objects support your own custom (binary) serialisation:
This is not difficult: use a DataOutputStream to write out all fields of an object. But now introduce versioning, by first writing out a versionId. Objects that have only one version write out versionId 1. That way, when you later have to change your objects (remove fields, add fields), you can raise the version number.
Such an ICustomSerializable will then, in its readObject() method, first read the version number from the input stream and, depending on the versionId, call readVersionV1() or, e.g., readVersionV2().
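import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
public interface ICustomSerializable {
    void writeObject(DataOutputStream dos) throws IOException;
    Object readObject(DataInputStream dis) throws IOException;
}
public class Foo implements ICustomSerializable {
    public static final int VERSION_V1 = 1, VERSION_V2 = 2;
    public static final int CURRENT_VERSION = VERSION_V2;
    private int version = CURRENT_VERSION;
    private int fooNumber;
    private double fooDouble;
    private String fooLabel = ""; // e.g. a field introduced in V2
    public void writeObject(DataOutputStream dos) throws IOException {
        dos.writeInt(this.version); // the version id always comes first
        if (version == VERSION_V1) {
            writeVersionV1(dos);
        } else if (version == VERSION_V2) {
            writeVersionV2(dos);
        } else {
            throw new IllegalStateException("unknown version: " + this.version);
        }
    }
    public Object readObject(DataInputStream dis) throws IOException {
        this.version = dis.readInt(); // pick the layout the data was written with
        if (version == VERSION_V1) {
            readVersionV1(dis);
        } else if (version == VERSION_V2) {
            readVersionV2(dis);
        } else {
            throw new IllegalStateException("unknown version: " + this.version);
        }
        return this;
    }
    private void writeVersionV1(DataOutputStream dos) throws IOException {
        dos.writeInt(this.fooNumber);
        dos.writeDouble(this.fooDouble);
    }
    private void writeVersionV2(DataOutputStream dos) throws IOException {
        writeVersionV1(dos); // V2 = the V1 fields plus the new one
        dos.writeUTF(this.fooLabel);
    }
    private void readVersionV1(DataInputStream dis) throws IOException {
        this.fooNumber = dis.readInt();
        this.fooDouble = dis.readDouble();
    }
    private void readVersionV2(DataInputStream dis) throws IOException {
        readVersionV1(dis);
        this.fooLabel = dis.readUTF();
    }
}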
Getters and setters are also needed, and new objects must start at CURRENT_VERSION (e.g. via a constructor or a field initialiser).
This kind of serialisation is safe under refactoring as long as you also change or add the appropriate read and write versions. For complex objects that use classes from external libraries not under your control, it can be more work, but strings and lists are easily serialized.

I think what you want to do is store the "state", and then restore that in your test to ensure the bug stays fixed.
Short answer: AFAIK there is no such general code generation tool, but as long as several constraints are kept, writing such a tool is little work.
Long Comment:
There are constraints under which that can work. If everything is just beans with getters and setters for all the fields you need, then generating code for this is not so difficult. And yes, that would be safe against renaming if you refactor the generated code along with the normal code. If setters are missing, this approach will not work. And that is only one example of why this is not a general solution.
Refactoring can also, for example, move fields to other classes. How do you then introduce the values for the other fields of that class? How can you later know whether the altered saved state still reflects the critical data? Or worse, imagine the refactoring gives the same field a different meaning than before.
The nature of the bug itself is also a constraint. Imagine, for example, that the bug happened because a field/method had a particular name. If a refactoring now changes the name, the bug will no longer appear, regardless of your state.
Those are just arbitrary examples that may have nothing at all to do with your real-life cases. But this is a case-by-case decision, not a general strategy. Anyway, if you know that your code, the bug and your refactorings are all well-behaved enough for this, then making such a tool takes less than a day, probably much less.
With XStream you would partially get this as well, but you would have to change the XML yourself. If you used, for example, db4o, you would have to tell it that this and that field now has this and that name.
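Under the all-beans constraint just described, a minimal sketch of such a generator could walk the object's read/write properties with java.beans.Introspector and print one setter call per property. BeanSourceGenerator and the Sample bean are hypothetical names written for this illustration. It only handles null, strings, ints, longs, booleans and nested beans with public no-arg constructors; collections, arrays and cyclic object graphs are not handled.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

public final class BeanSourceGenerator {
    private final StringBuilder out = new StringBuilder();
    private int counter = 0;

    /** Returns Java statements that rebuild the bean; variable v0 holds the result. */
    public static String generate(Object bean) {
        BeanSourceGenerator g = new BeanSourceGenerator();
        try {
            g.emit(bean);
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalArgumentException("cannot introspect " + bean, e);
        }
        return g.out.toString();
    }

    // Emits "Type vN = new Type();" plus one setter call per read/write property.
    private String emit(Object bean) throws ReflectiveOperationException, IntrospectionException {
        Class<?> type = bean.getClass();
        String typeName = type.getCanonicalName(); // assumes a named class with a public no-arg constructor
        String var = "v" + counter++;
        out.append(typeName).append(' ').append(var)
           .append(" = new ").append(typeName).append("();\n");
        for (PropertyDescriptor p : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
            Method getter = p.getReadMethod();
            Method setter = p.getWriteMethod();
            if (getter == null || setter == null) {
                continue; // no setter: this approach cannot restore the property
            }
            Object value = getter.invoke(bean);
            String expr = literal(value);
            if (expr == null) {
                expr = emit(value); // anything else is treated as a nested bean
            }
            out.append(var).append('.').append(setter.getName())
               .append('(').append(expr).append(");\n");
        }
        return var;
    }

    // Java literal for simple values, or null if the value must be built as a bean.
    private static String literal(Object v) {
        if (v == null) return "null";
        if (v instanceof String) {
            String s = (String) v;
            return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + '"';
        }
        if (v instanceof Integer || v instanceof Boolean) return v.toString();
        if (v instanceof Long) return v + "L";
        return null;
    }

    /** Hypothetical example bean, only here to make the sketch runnable. */
    public static class Sample {
        private String name;
        private int count;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
    }
}
```

The generated statements are plain Java, so an IDE rename refactoring updates them along with the rest of the code, which is the property the answer is after.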
