The world of Minecraft modding has made me curious about the differences between the mechanisms that Java and C/C++ libraries use to allow their methods/functions to be invoked externally.
My understanding is that Minecraft modding came about because Java can be decompiled and reflected over, making it possible to reverse engineer the classes and methods of a library and invoke them. I believe the Java class file format includes quite a lot of metadata about the structure of classes, allowing code to be used in ways other than intended.
There are obfuscation tools around that try to make Java harder to reverse engineer, but overall it seems quite difficult to prevent.
I don't have the depth of knowledge in C/C++ to know to what degree the same can be done there.
C/C++ code, by contrast, is compiled to native machine code ahead of time. The end result is a binary of machine code specific to that platform. C/C++ has the notion of exporting functions so that they can be called from outside the library or executable. Some libraries also have an entry point.
Typically, when connecting to external functions, there is a header file listing which functions the library makes available to code against.
I would assume there needs to be a mechanism that maps an exposed function to its address within the library's or executable's machine code, so that function calls land in the right place.
Typically, connecting function calls to their addresses is the job of the linker, but the linker still needs to somehow know where to find these functions.
This makes me wonder whether it is fundamentally possible to invoke non-exported functions. If so, would this require the ability to locate their address and understand their parameter format?
As I understand it, function calls in C/C++ typically pass parameters in registers for simple functions, or on the stack for functions with more or larger arguments.
I don't know whether the practice of invoking non-public APIs in native code is common, or whether the inherent difficulty of doing so makes native code fairly safe from this kind of use.
First of all, there are tools (of varying quality and capabilities) to reverse engineer compiled machine code back to the original language [or another language, for that matter]. The biggest problem when doing this is that in languages such as C and C++, the names of members in a structure are not preserved, and structures often become "flat", so what was originally:
struct user
{
    std::string name;
    int age;
    int score;
};
will become:
struct s0
{
    char *f0;
    char *f1;
    int f2;
    int f3;
};
[Note of course that std::string may be implemented in a dozen different ways, and the "two pointers" is just one plausible variant]
Of course, if there is a header file describing how the library works, you can use the data structures in it to get better type information. Likewise, if there is debug information in the file, it can be used to recover data structures and variable names much more accurately. But someone who wants to keep these things private will (most often) not ship the code with debug symbols, and will publish only the parts actually necessary to call the public functionality.
But if you understand how these are used [or read some code that, for example, displays a "user"], you can figure out which field is the name, which is the age, and which is the score.
Understanding what is an array and what is a set of separate fields can also be difficult. Which is it:
struct
{
    int x, y, z;
};
or
int arr[3];
Several years ago, I started on a patience card game (similar to "Solitaire"). To do that, I needed a way to display cards on the screen. So I thought, "Well, there's one for the existing Solitaire on Windows; I bet I can figure out how to use that" - and indeed, I did. I could draw the Queen of Clubs or the Two of Spades, as I wished. I never finished the actual game-play part, but I certainly managed to load the card-drawing functionality from a non-public shared library. Not rocket science by any means (there are people who do this for commercial games with thousands of functions and really complex data structures - this had two or three functions that you needed to call), but I didn't spend much time on it either - a couple of hours, if I remember right, from coming up with the idea to having something that "works".
But for the second part of your question: plugin interfaces (such as filter plugins for Photoshop, or transitions in video editors) are very often implemented as "shared libraries" (aka "dynamic link libraries", DLLs).
There are functions in the OS to load a shared library into memory and to query it for functions by their name. The interfaces of these functions are (typically) pre-defined, so a function-pointer prototype in a header file can be used to form the actual call.
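Since the question comes from the Java side, it may help to see that Java's own foreign-function API (java.lang.foreign, finalized in JDK 22) exposes exactly this two-step pattern - load a library, then look up a function by name (the analogue of dlopen/dlsym on POSIX, or LoadLibrary/GetProcAddress on Windows). A minimal sketch, assuming a Linux system where the C math library is available as "libm.so.6":

import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class LibraryLookupDemo {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // Step 1: load the shared library (the name is platform-specific).
        SymbolLookup libm = SymbolLookup.libraryLookup("libm.so.6", Arena.global());
        // Step 2: query for an exported function by name.
        MemorySegment cosAddr = libm.find("cos").orElseThrow();
        // The caller must still supply the signature: double cos(double).
        MethodHandle cos = linker.downcallHandle(cosAddr,
                FunctionDescriptor.of(ValueLayout.JAVA_DOUBLE, ValueLayout.JAVA_DOUBLE));
        System.out.println((double) cos.invokeExact(0.0)); // prints 1.0
    }
}

Note that the lookup only gives you an address; the FunctionDescriptor plays the role of the header file's prototype, which is exactly the information a non-public function would deny you.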
As long as the shared library and the application code are compiled against the same ABI (application binary interface), everything should work out when it comes to how arguments are passed from the caller to the function - it's not like the compiler just randomly uses whatever register it fancies; the parameters are passed in a well-defined order, and which register is used for what is defined by the ABI specification for a given processor architecture. [It gets more complicated still if you have to know the contents of data structures and there are different versions of such structures - say, for example, someone has a std::string that contains two pointers (start and end), and for whatever reason the design is changed to one pointer and a length - both the application code and the shared library need to be compiled with the same version of std::string, or bad things will happen!]
Non-public API functions CAN be called, but they aren't discoverable by querying for a function by name - you'd have to figure out their location some other way, for example by knowing that "this function starts 132 bytes on from the function XYZ" - and of course, you wouldn't have the function prototype either.
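To make that concrete, here is a deliberately fragile sketch of the idea, using the same Java API as above. Everything specific in it - the library name "libfoo.so", the exported symbol "known_fn", the 132-byte offset, and the guessed int(int) prototype - is a made-up placeholder; get any of them wrong and the process will likely crash:

import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public class HiddenFunctionCall {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        SymbolLookup lib = SymbolLookup.libraryLookup("libfoo.so", Arena.global());
        // Anchor on a symbol that IS exported...
        MemorySegment known = lib.find("known_fn").orElseThrow();
        // ...and guess that the hidden function starts 132 bytes past it.
        MemorySegment hidden = MemorySegment.ofAddress(known.address() + 132);
        // The prototype is also a guess: int hidden_fn(int).
        MethodHandle h = linker.downcallHandle(hidden,
                FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));
        int result = (int) h.invokeExact(42); // undefined behaviour if any guess is wrong
    }
}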
There is of course the added complication that, while Java bytecode is portable across many different processor architectures, machine code only works on a defined set of processors - code for x86 works on Intel and AMD processors (and maybe a few others), code for ARM processors works on chips implementing the ARM instruction set, and so on. You have to compile the C or C++ code for the given processor.
I'm a Java developer but I've recently begun learning Angular2/Typescript. I've worked with Angular 1.x before so I'm not a complete noob :)
While working through a POC with a RESTful Spring Boot back end and an Angular 2 front end, I noticed myself duplicating model objects on both sides a lot, e.g.
Java Object
public class Car {
    private Double numSeats;
    private Double numDoors;
    .....
}
Now, in the interest of TypeScript being strongly typed, I'd create a similar object within my front-end project:
export interface PersonalDetailsVO {
    numSeats : number;
    numDoors : number;
}
I'm duplicating the work and constantly violating the DRY (Don't Repeat Yourself) principle here.
I'm wondering if there is a better way of going about this. I was thinking about code generation tools like jSweet, but I'm interested to hear if anyone else has come across the same issue and how they approached it.
There are two schools of thought on whether this is a violation of the DRY principle. If you're really, really sure that there's a natural mapping you would always apply to bind json in each language, then you could say that it is duplicate work; which is (at least part of) the thinking behind IDL-type languages in technologies like CORBA (but I'm showing my age).
OTOH, maybe each system (the server, the client, an alternate client if anyone were to write one) should be free to independently define the internal representation of objects that best suits that system (given its language, what it plans to do, etc.).
In your example, the typescript certainly doesn't contain all of the information needed to define the Java "equivalent". ('number' could map to a lot of things; and the typescript says nothing about access modifiers...) Of course you can narrow that down by adopting conventions, but my point is it's not self-evident that there'd be a 1-to-1 mapping.
Maybe one language handles references more gracefully than another. Maybe one can't deal with circular references but the other can. Maybe one has reason to prefer a more flat view of the object. Maybe a lot of things.
All of that said, it certainly is true that if you modify the json structure of an object, and you're maintaining each system's internal representation independently, then you likely have to make code changes in multiple places to accommodate that single underlying change. And pragmatically, if that can be avoided it's a good thing.
So if you can come up with a code generator that processes the more expressive language's representation to create a representation for the less expressive language, and maybe at least use that by default, you may find it's not a bad thing for your project.
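As a hedged illustration of what such a generator might look like in miniature, here is a reflection-based sketch in Java that emits a TypeScript interface from a Java class. The class name TsInterfaceGenerator and the naive type mapping are invented for this example; real tools (jSweet among them) handle far more cases:

import java.lang.reflect.Field;

public class TsInterfaceGenerator {

    // Deliberately naive Java-to-TypeScript type mapping.
    static String tsType(Class<?> t) {
        if (t == String.class) return "string";
        if (t == boolean.class || t == Boolean.class) return "boolean";
        if (t.isPrimitive() || Number.class.isAssignableFrom(t)) return "number";
        return "any"; // fallback for everything this sketch doesn't map
    }

    public static String generate(Class<?> clazz) {
        StringBuilder sb = new StringBuilder("export interface ")
                .append(clazz.getSimpleName()).append(" {\n");
        for (Field f : clazz.getDeclaredFields()) {
            sb.append("    ").append(f.getName())
              .append(": ").append(tsType(f.getType())).append(";\n");
        }
        return sb.append("}\n").toString();
    }

    static class Car { // the Car class from the question
        private Double numSeats;
        private Double numDoors;
    }

    public static void main(String[] args) {
        System.out.println(generate(Car.class));
        // export interface Car {
        //     numSeats: number;
        //     numDoors: number;
        // }
    }
}

As the answer notes, the interesting design work is in the corner cases this sketch ignores: access modifiers, nullability, references, and collections.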
One function may run on the CPU, another on the GPU, but both do the same job.
Sure, you want to use the GPU solution (assuming it's faster), but it is not available under, for example, older OpenGL versions.
Instead of programming a check (if available then use "this" else "that") at each call site, you may want to just call one function through a reference bound to that call.
To go further into optimization, imagine 4 solutions:
CPU + optimized for small pictures
CPU + optimized for big pictures
GPU + optimized for small pictures
GPU + optimized for big pictures
Now, not only do you (as a programmer) have to eliminate two possibilities depending on the old/new OpenGL version, you also have to choose one of the two remaining possibilities depending on usage.
Some calls only ever have small or big pictures as function parameters, but in other places in your code you need to choose which function to call depending on the picture parameter's values.
- For 4x4-pixel pictures or small lookup tables, even the CPU solution could be the fastest (lower overhead)
One solution could be to make a function from which code paths split and lead to optimized functions.
This works within the same package, but not for different packages providing the same function (example: standard library vs. driver library/hooks).
Another solution could be to write yet another package which wraps the packages used and chooses the function optimized for a certain task.
Yet another - even uglier - solution could be to update each function call by hand.
But the solution I am searching for uses a function reference for each call, given to the function at program-loading time, depending on the hardware or software environment.
It should even be able to change when dependency-libraries load or unload.
(For example: a new version of the other library is installed and the old one uninstalled while your program is running - or while it is waiting during execution of another thread on this CPU core.)
The program shouldn't care whether there is just one function or several under a given name. It should care about which is the fastest to execute.
Example:
package Pictures; //has averageRedValue( byte[height][width][RGB] )
package Images; //has averageRedValue( byte[height][width][RGB] ) too
If they both give the same result, why should the programmer care which one is used?
He wants the fastest solution, or an option read from a settings file.
And the end user wants a simple option to choose the same functions as were used at a past date - which calls for version control and rollback features.
Please tell me if you have seen a solution, or have an idea of where to look.
This is all rather confused, but for the normal case, the solution is trivial: Write an interface containing the needed methods and make sure that only the best implementation gets loaded.
The JIT determines that there's just one such implementation (Class Hierarchy Analysis) and calls the proper method directly.
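A minimal sketch of that advice, reusing the averageRedValue example from the question (the selection logic is a placeholder; a real version would probe the GPU/OpenGL environment once at startup, and would only ever load the winning implementation so that class hierarchy analysis sees a single target):

// Call sites depend only on the interface.
interface RedValueAverager {
    double averageRedValue(byte[][][] picture); // byte[height][width][RGB]
}

final class CpuAverager implements RedValueAverager {
    public double averageRedValue(byte[][][] picture) {
        long sum = 0, pixels = 0;
        for (byte[][] row : picture)
            for (byte[] rgb : row) { sum += rgb[0] & 0xFF; pixels++; }
        return pixels == 0 ? 0.0 : (double) sum / pixels;
    }
}

class Averagers {
    // Placeholder: decide once at startup. If CpuAverager is the only
    // implementation ever loaded, the JIT devirtualizes calls made
    // through the interface into direct calls.
    static RedValueAverager pick() {
        return new CpuAverager();
    }
}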
It should even be able to change when dependency-libraries load or unload.
Java can't do this efficiently. Whenever a second implementation gets loaded, the optimized code must be thrown away and the methods recompiled. The conditional branch is still pretty cheap; with more implementations loaded it gets slower.
There's no way to unload a class without using some classloader magic.
What do you need it for?
How does Java find sine and cosine? I'm working on a game - a simple platformer, something like Super Mario or Castlevania. I attempted to make a method that would rotate an image for me and then resize the JLabel to fit that image. I found an algorithm that worked and was able to accomplish my goal. However, all I did was copy and paste the algorithm; anyone can do that. I want to understand the math behind it. So far I have figured everything out except one part: the sin and cos methods in the Math class. They work and I can use them, but I have no idea how Java gets its numbers.
It would seem there is more than one way to solve this problem; for now, I'm interested in how Java does it. I looked into the Taylor series, but I'm not sure that is how Java does it. If Java does use the Taylor series, I would like to know how that algorithm can be right all the time (I am aware that it is an approximation). I've also heard of the CORDIC algorithm, but I don't know as much about it as I do about the Taylor series, which I have programmed in Java even though I don't understand it. If CORDIC is how it's done, I would like to know how that algorithm is always right. It also seems possible that the Java methods are system-dependent, meaning that the algorithm or code used differs from system to system. If the methods are system-dependent, then I would like to know how Windows gets sine and cosine. However, if it is the CPU itself that computes the answer, I would like to know what algorithm it is using (I run an AMD Turion II Dual-Core Mobile M520 2.29GHz).
I have looked at the source code of the Math class, and it points to the StrictMath class. However, the StrictMath class only has a comment inside it, no code. I have noticed, though, that the method uses the keyword native. A quick Google search suggests that this keyword enables Java to work with other languages and systems, supporting the idea that the methods are system-dependent. I have looked at the Java API for the StrictMath class (http://docs.oracle.com/javase/7/docs/api/java/lang/StrictMath.html), and it mentions something called fdlibm. The link is broken, but I was able to Google it (http://www.netlib.org/fdlibm/).
It seems to be some sort of package written in C. While I know Java, I have never learned C, so I have been having trouble deciphering it. I started looking up some info about the C language in the hope of getting to the bottom of this, but it's a slow process. Of course, even if I did know C, I still don't know which C file Java is using. There seem to be different versions of the C methods for different systems, and I can't tell which one is being used. The API suggests it is the "IEEE 754 core function" version (residing in a file whose name begins with the letter e). But I see no sin method in the e files. I have found one that starts with a k, which I think is short for kernel, and another that starts with an s, which I think is short for standard. The only e files I found that look similar to sin are e_sinh.c and e_asin.c, which I think are different math functions. And that's the story of my quest to find the Java algorithms for sine and cosine.
Somewhere along the line, an algorithm is being called upon to get these numbers, and I want to know what it is and why it works (there is no way Java just gets these numbers out of thin air).
The JDK is not obligated to compute sine and cosine on its own, only to provide you with an interface to some implementation via Math. So the simple answer to your question is: It doesn't; it asks something else to do it, and that something else is platform/JDK/JVM dependent.
All JDKs that I know of pass the burden off to some native code. In your case, you came across a reference to fdlibm, and you'll just have to suck it up and learn to read that code if you want to see the actual implementation there.
Some JVMs can optimize this. I believe HotSpot has the ability to spot Math.cos(), etc. calls and throw in a hardware instruction on systems where it is available, but do not quote me on that.
From the documentation for Math:
By default many of the Math methods simply call the equivalent method in StrictMath for their implementation. Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of Math methods. Such higher-performance implementations still must conform to the specification for Math.
The documentation for StrictMath actually mentions fdlibm (it places the constraint on StrictMath that all functions must produce the same results that fdlibm produces):
To help ensure portability of Java programs, the definitions of some of the numeric functions in this package require that they produce the same results as certain published algorithms. These algorithms are available from the well-known network library netlib as the package "Freely Distributable Math Library," fdlibm. These algorithms, which are written in the C programming language, are then to be understood as executed with all floating-point operations following the rules of Java floating-point arithmetic.
Note, however, that Math is not required to defer to StrictMath. Use StrictMath explicitly in your code if you want to guarantee consistent results across all platforms. Note also that this implies that code generators (e.g. HotSpot) are not given the freedom to optimize StrictMath calls to hardware calls unless the hardware would produce exactly the same results as fdlibm.
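A quick way to see the distinction from code (whether the two lines print identical bits depends on your platform and JVM):

public class TrigDemo {
    public static void main(String[] args) {
        double x = 1.0e10; // a large argument, where implementations tend to diverge
        System.out.println(Math.sin(x));       // free to use an intrinsic/hardware path
        System.out.println(StrictMath.sin(x)); // must match fdlibm bit-for-bit
    }
}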
In any case, again, Java doesn't have to implement these on its own (it usually doesn't), and this question doesn't have a definitive answer. It depends on the platform, the JDK, and in some cases, the JVM.
As for general computational techniques, there are many; here is a potentially good starting point. C implementations are generally easy to come by. You'll have to search through hardware datasheets and documentation if you want to find out more about the hardware options available on a specific platform (if Java is even using them on that platform).
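Since the question mentions the Taylor series, here is a minimal Java sketch of that technique. It is illustrative only: real implementations such as fdlibm first reduce the argument to a small range and then evaluate a carefully tuned polynomial, because a raw Taylor sum centred at 0 loses accuracy for large arguments:

public class TaylorSin {
    // sin(x) ≈ x - x^3/3! + x^5/5! - ... , truncated after `terms` terms.
    static double taylorSin(double x, int terms) {
        double term = x, sum = x;
        for (int n = 1; n < terms; n++) {
            term *= -x * x / ((2 * n) * (2 * n + 1)); // next odd-power term
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(taylorSin(1.0, 10)); // ~0.8414709848078965
        System.out.println(Math.sin(1.0));      // reference value
    }
}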
Working on a simple tic-tac-toe game in Java.
I have a class named GameHelpers. This class should contain useful methods for the game. The game happens in another class.
A method in GameHelpers is resetGame(). This method is supposed to set the text on all 9 buttons (the tic-tac-toe board) to blank, re-enable them, and set a variable to 1.
This is its code:
public class GameHelpers {
    public void resetGame() {
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                buttons[i][j].setEnabled(true);
                buttons[i][j].setText("");
                count = 1;
            }
        }
    }
}
buttons is a 2D array of JButtons inside the main class of the game, TicTacToe.
This method was previously inside the main class of the game, TicTacToe. But now that it's in a different class, it can't reach the buttons in the TicTacToe class and manipulate them.
I created get and set methods in TicTacToe, but how do I activate them from GameHelpers?
How can I make the method in GameHelpers work?
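For illustration, one common way to wire this up is to hand GameHelpers a reference to the TicTacToe instance it should manipulate. This is only a sketch: getButtons and setCount stand in for whatever getters/setters you actually created in TicTacToe.

import javax.swing.JButton;

public class GameHelpers {
    private final TicTacToe game; // the game instance whose buttons we reset

    public GameHelpers(TicTacToe game) {
        this.game = game;
    }

    public void resetGame() {
        JButton[][] buttons = game.getButtons(); // hypothetical getter
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                buttons[i][j].setEnabled(true);
                buttons[i][j].setText("");
            }
        }
        game.setCount(1); // hypothetical setter
    }
}

TicTacToe would then construct the helper with new GameHelpers(this) and call resetGame() on it.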
You may refer to Java to EXE - Why, When, When Not and How.
Drawbacks:
Disk footprint. Java bytecode was designed for compactness, so it sits at a much higher level than a typical CPU instruction set. Expect an executable produced by an AOT compiler to be 2-4 times larger than the original jar file.
Dynamic applications. Classes that the application loads dynamically at runtime may be unavailable to the application developer. These can be third-party plug-ins, dynamic proxies, other classes generated at runtime, and so on. So the runtime system has to include a Java bytecode interpreter and/or a JIT compiler. Moreover, in the general case only classes that are loaded by either the system or the application classloader may be precompiled to native code, so applications that use custom classloaders extensively may only be partially precompiled.
Hardware-specific optimizations. A JIT compiler has a potential advantage over AOT compilers in that it can select code generation patterns according to the actual hardware on which the application is executing. For instance, it may use Intel MMX/SSE/SSE2 extensions to speed up floating-point calculations. An AOT compiler must either produce code for the lowest common denominator or apply versioning to the most CPU-intensive methods, which results in a further code size increase.
Speaking from personal experience, if there's a bug that's only present in the EXE version, you might end up spending a lot of time trying to track it down and decide it's just easier to give users the other version instead.
It sounds to me like you're really trying to make things easier on your users overall. I don't know what your budget is, but perhaps something like install4j would be a better solution.
I'm working on a Scala-based script language (internal DSL) that allows users to define multiple data transformations functions in a Scala script file. Since the application of these functions could take several hours I would like to cache the results in a database.
Users are allowed to change the definition of the transformation functions and also to add new functions. However, when the user restarts the application with a slightly modified script, I would like to execute only those functions that have been changed or added. The question is how to detect those changes. For simplicity, let us assume that the user can only adapt the script file, so that any reference to something not defined in this script can be assumed to be unchanged.
In this case what's the best practice for detecting changes to such user-defined functions?
Until now I have thought about:
parsing the script file and calculating fingerprints based on the source code of the function definitions
getting the bytecode of each function at runtime and building fingerprints based on this data
applying the functions to some test data and calculating fingerprints on the results
However, all three approaches have their pitfalls.
Writing a parser for Scala to extract the function definitions could be quite some work, especially if you want to detect changes that indirectly affect the behaviour of your functions (e.g. if your function calls another (changed) function defined in the script).
Bytecode analysis could be another option, but I have never worked with those libraries. Thus I have no idea whether they can solve my problem or how they deal with Java's dynamic binding.
The approach with example data is definitely the simplest one, but has the drawback that different user-defined functions could be accidentally mapped to the same fingerprint if they return the same results for my test data.
Does someone have experience with one of these "solutions", or can you suggest a better one?
The second option doesn't look difficult. For example, with the Javassist library, obtaining the bytecode of a method is as simple as:
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.bytecode.CodeAttribute;

CtClass c = ClassPool.getDefault().get(className); // throws NotFoundException if absent
for (CtMethod m : c.getDeclaredMethods()) {        // note: getDeclaredMethods(), plural
    CodeAttribute ca = m.getMethodInfo().getCodeAttribute();
    if (ca != null) { // null for native and abstract methods, which have no bytecode
        byte[] byteCode = ca.getCode();
        ...
    }
}
So, as long as you assume that the results of your methods depend only on the code of those methods, it's pretty straightforward.
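From there, a fingerprint could be as simple as a hash of those bytes; a minimal sketch using the standard JDK MessageDigest:

import java.security.MessageDigest;

public class Fingerprints {
    // Derive a stable, printable fingerprint from a method's bytecode.
    public static String fingerprint(byte[] byteCode) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(byteCode);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}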
UPDATE:
On the other hand, since your methods are written in Scala, they probably contain some closures, so parts of their code reside in anonymous classes, and you may need to trace the usage of these classes somehow.