What does asm instruction 'ta' & '0xfffffffc' mean? - java

See the following code from jdk8/openjdk/hotspot/src/os_cpu/linux_sparc/vm/linux_sparc.s:
_flush_reg_windows:
ta 0x03
retl
mov %fp, %o0
What does the code above mean?
I can't understand what ta means.
Also, why is there code after retl? Nothing should come after a return, right?
In the instruction mov %fp, %o0, what does %o0 mean? Does %fp refer to the frame pointer (FP) register?
Plus, another small question:
mov %eax, 0xfffffffc(%ebp)
What does 0xfffffffc here mean?

If I understand your question correctly, you are not aware that different processor architectures (such as x86, ARM, SPARC, MIPS and PowerPC) use completely different instruction sets.
For this reason, assembly code (and the individual instructions) for these processors looks completely different.
The first fragment is obviously SPARC assembly code. SPARC processors were used in larger Sun (now Oracle) workstations. The way these processors work is very different from the x86 processors used in a PC.
Registers are named %o0-%o7, %g0-%g7, %l0-%l7 and %i0-%i7 on such processors. %fp is just a "special name" (an alias) for one of these registers (%fp = %i6), and yes, it is the frame pointer. %o0 is output register 0; in the SPARC calling convention it holds a function's return value, so mov %fp, %o0 returns the frame pointer to the caller.
Just like MIPS processors, SPARC CPUs have "delay slots": a jump, call or return instruction takes effect with a delay of one instruction, so the instruction immediately following it is executed before control actually transfers.
That is why there is an instruction after the "retl" instruction: the mov %fp, %o0 in the delay slot still runs as part of the return.
The "ta" instruction is similar to the "int" instruction on x86 CPUs: It executes an operating system interrupt.
As "Michael" already set 0xFFFFFFFC is a hexadecimal number; in this case -4.
"mov %eax,0xfffffffc(%ebp)" is code for x86 CPUs. The instruction will load a value that is stored at the address ebp-4 into the register eax.


Why can I fill the stack more on successive calls to recursive method [duplicate]

This question already has answers here:
Why is the max recursion depth I can reach non-deterministic?
(4 answers)
Closed 5 years ago.
A simple class for demonstration purposes:
public class Main {
    private static int counter = 0;

    public static void main(String[] args) {
        try {
            f();
        } catch (StackOverflowError e) {
            System.out.println(counter);
        }
    }

    private static void f() {
        counter++;
        f();
    }
}
I executed the above program 5 times, the results are:
22025
22117
15234
21993
21430
Why are the results different each time?
I tried setting the max stack size (for example -Xss256k). The results were then a bit more consistent but again not equal each time.
Java version:
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
EDIT
When JIT is disabled (-Djava.compiler=NONE) I always get the same number (11907).
This makes sense, as JIT optimizations probably affect the size of the stack frames, and the work done by the JIT certainly varies between executions.
Nevertheless, I think it would be beneficial to confirm this theory with references to documentation on the topic and/or concrete examples of JIT work in this specific case that leads to frame-size changes.
The observed variance is caused by background JIT compilation.
Here is how the process works:
Method f() starts execution in interpreter.
After a number of invocations (around 250) the method is scheduled for compilation.
The compiler thread works in parallel to the application thread. Meanwhile the method continues execution in interpreter.
As soon as the compiler thread finishes compilation, the method entry point is replaced, so the next call to f() will invoke the compiled version of the method.
There is basically a race between the application thread and the JIT compiler thread. The interpreter may perform a different number of calls before the compiled version of the method is ready. At the end there is a mix of interpreted and compiled frames.
No wonder the compiled frame layout differs from the interpreted one. Compiled frames are usually smaller; they don't need to store all the execution context on the stack (method reference, constant pool reference, profiler data, all arguments, expression variables etc.).
Furthermore, there are even more race possibilities with Tiered Compilation (the default since JDK 8). There can be a combination of three types of frames: interpreter, C1 and C2 (see below).
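If you want to watch this race yourself (my own suggestion, not part of the original answer), run with compilation logging enabled; the moment at which Main::f gets compiled shifts from run to run:
$ java -XX:+PrintCompilation Main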
Let's run a few experiments to support the theory.
Pure interpreted mode. No JIT compilation.
No races => stable results.
$ java -Xint Main
11895
11895
11895
Disable background compilation. JIT is ON, but is synchronized with the application thread.
No races again, but the number of calls is now higher due to compiled frames.
$ java -XX:-BackgroundCompilation Main
23462
23462
23462
Compile everything with C1 before execution. Unlike the previous case, there will be no interpreted frames on the stack, so the number will be a bit higher.
$ java -Xcomp -XX:TieredStopAtLevel=1 Main
23720
23720
23720
Now compile everything with C2 before execution. This will produce the most optimized code with the smallest frame. The number of calls will be the highest.
$ java -Xcomp -XX:-TieredCompilation Main
59300
59300
59300
Since the default stack size is 1M, this should mean the frame now is only 16 bytes long. Is it?
$ java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=print,Main.f Main
0x00000000025ab460: mov %eax,-0x6000(%rsp) ; StackOverflow check
0x00000000025ab467: push %rbp ; frame link
0x00000000025ab468: sub $0x10,%rsp
0x00000000025ab46c: movabs $0xd7726ef0,%r10 ; r10 = Main.class
0x00000000025ab476: addl $0x2,0x68(%r10) ; Main.counter += 2
0x00000000025ab47b: callq 0x00000000023c6620 ; invokestatic f()
0x00000000025ab480: add $0x10,%rsp
0x00000000025ab484: pop %rbp ; pop frame
0x00000000025ab485: test %eax,-0x23bb48b(%rip) ; safepoint poll
0x00000000025ab48b: retq
In fact, the frame here is 32 bytes, but JIT has inlined one level of recursion.
Finally, let's look at the mixed stack trace. In order to get it, we'll crash the JVM on StackOverflowError (an option available in debug builds).
$ java -XX:AbortVMOnException=java.lang.StackOverflowError Main
The crash dump hs_err_pid.log contains a detailed stack trace, where we can find interpreted frames at the bottom, C1 frames in the middle, and C2 frames on top.
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 164 C2 Main.f()V (12 bytes) # 0x00007f21251a5958 [0x00007f21251a5900+0x0000000000000058]
J 164 C2 Main.f()V (12 bytes) # 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
// ... repeated 19787 times ...
J 164 C2 Main.f()V (12 bytes) # 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
J 163 C1 Main.f()V (12 bytes) # 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
J 163 C1 Main.f()V (12 bytes) # 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
// ... repeated 1866 times ...
J 163 C1 Main.f()V (12 bytes) # 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
j Main.f()V+8
j Main.f()V+8
// ... repeated 1839 times ...
j Main.f()V+8
j Main.main([Ljava/lang/String;)V+0
v ~StubRoutines::call_stub
First of all, the following has not been researched. I have not "deep dived" the OpenJDK source code to validate any of the following, and I don't have access to any inside knowledge.
I tried to validate your results by running your test on my machine:
$ java -version
openjdk version "1.8.0_71"
OpenJDK Runtime Environment (build 1.8.0_71-b15)
OpenJDK 64-Bit Server VM (build 25.71-b15, mixed mode)
I get the "count" varying over a range of ~250. (Not as much as you are seeing)
First some background. A thread stack in a typical Java implementation is a contiguous region of memory that is allocated before the thread is started, and that is never grown or moved. A stack overflow happens when the JVM tries to create a stack frame to make a method call, and the frame goes beyond the limits of the memory region. The test could be done by testing the SP explicitly, but my understanding is that it is normally implemented using a clever trick with the memory page settings.
When a stack region is allocated, the JVM makes a syscall to tell the OS to mark a "red zone" page at the end of the stack region read-only or non-accessible. When a thread makes a call that overflows the stack, it accesses memory in the "red zone" which triggers a memory fault. The OS tells the JVM via a "signal", and the JVM's signal handler maps it to a StackOverflowError that is "thrown" on the thread's stack.
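As an aside (my own addition, not from the original answer), on a Unix-like shell you can see the default thread stack size the JVM picked for your platform (the value is in KB) with:
$ java -XX:+PrintFlagsFinal -version | grep ThreadStackSize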
So here are a couple of possible explanations for the variability:
The granularity of hardware-based memory protection is the page boundary. So if the thread stack has been allocated using malloc, the start of the region is not going to be page aligned. Therefore the distance from the start of the stack frame to the first word of the "red zone" (which is page aligned) is going to be variable.
The "main" stack is potentially special, because that region may be used while the JVM is bootstrapping. That might lead to some "stuff" being left on the stack from before main was called. (This is not convincing ... and I'm not convinced.)
Having said this, the "large" variability that you are seeing is baffling. Page sizes are too small to explain a difference of ~7000 in the counts.
UPDATE
When JIT is disabled (-Djava.compiler=NONE) I always get the same number (11907).
Interesting. Among other things, that could cause stack limit checking to be done differently.
This makes sense as JIT optimizations are probably affecting the size of stack frames and the work done by JIT definitely has to vary between the executions.
Plausible. The size of the stack frame could well be different after the f() method has been JIT compiled. Assuming f() was JIT compiled at some point, your stack will have a mixture of "old" and "new" frames. If the JIT compilation occurred at different points, then the ratio will be different ... and hence the count will be different when you hit the limit.
Nevertheless, I think it would be beneficial if this theory is confirmed with references to some documentation about the topic and/or concrete examples of work done by JIT in this specific example that leads to frame size changes.
Little chance of that, I'm afraid ... unless you are prepared to PAY someone to do a few days' research for you.
1) No such (public) reference documentation exists, AFAIK. At least, I've never been able to find a definitive source for this kind of thing ... apart from deep diving the source code.
2) Looking at the JIT compiled code tells you nothing of how the bytecode interpreter handled things before the code was JIT compiled. So you won't be able to see if the frame size has changed.
The exact functioning of the Java stack is undocumented, but it depends entirely on the memory allocated to that thread.
Try using the Thread constructor that takes a stack size and see whether the count becomes constant. I have not tried it myself, so please share the results.
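For reference, a minimal sketch of that suggestion (my own illustration; the constructor is Thread(ThreadGroup, Runnable, String, long stackSize), and the JVM is free to treat the requested size as a hint):
public class FixedStackMain {
    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            try {
                f();
            } catch (StackOverflowError e) {
                System.out.println(counter);
            }
        };
        // Request a 256 KB stack for the recursing thread; the value is only a hint.
        Thread t = new Thread(null, task, "recursion", 256 * 1024);
        t.start();
        t.join();
    }

    private static void f() {
        counter++;
        f();
    }
}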

Investigating a Java bug regarding String.valueOf(float)

In Java, is it possible that String.valueOf(float) would format a float differently depending on the operating system the code runs on, the Java version, and/or the operating system's locale?
For example, would the float 4.5 ever be formatted as "4,5" instead of "4.5"?
String.valueOf(float) calls Float.toString().
Float.toString() calls the internal sun.misc.FloatingDecimal.toJavaFormatString(float).
The resulting string will never contain the ',' character, because the '.' (ASCII 46) is hard-coded inside BinaryToASCIIBuffer.getChars(char[]).
You can see this if you decompile the sun.misc.FloatingDecimal class (Java 8 JDK in my case) or look at the (similar) implementation in OpenJDK.
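To see the difference between the locale-independent Float.toString() path and the locale-sensitive formatting APIs, here is a small check (my own illustration, not from the original answer):
import java.util.Locale;

public class FloatFormatDemo {
    public static void main(String[] args) {
        Locale.setDefault(Locale.GERMANY);                // a locale whose decimal separator is ','
        System.out.println(String.valueOf(4.5f));         // always "4.5"
        System.out.println(String.format("%.1f", 4.5f));  // "4,5" - String.format uses the default locale
    }
}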

JVM: how are exit codes of the java executable defined?

I'm looking for a definition of possible exit codes of the java executable:
(How) can I tell whether the exit code is of the executed Java process or of the VM itself?
Example: On Windows, java -badoption returns 1; java Main with Main being a valid class may return 1 as well.
Are there any VM options I could use to make exit codes more meaningful? E.g. to distinguish between both types of exit codes?
If I know that the exit code is not from my Java process (which only returns 0), what do non-zero exit codes mean?
On Windows, I'm frequently seeing -1 and 1. As these are reported through an automated bug-reporting facility, I can't see any error messages. I only have the exit code and need to interpret it.
Are the exit codes platform-dependent?
Meaning of Exit Codes (in general)
ISO/IEC 9899 (Programming Languages: C), ISO/IEC 9945 (IEEE 1003, POSIX)
There is nothing which is specifically defined for Java / JVM. The meaning of these exit codes is defined by ISO/IEC 9899 (Programming Languages: C, for example 7.20.4.3 The exit function of ISO/IEC 9899:TC3), ISO/IEC 9945 (IEEE 1003, POSIX) and similar specifications, and it's always like this:
0 means success
Any other value means failure
This is used by shell environments (sh, bash, cmd.exe, make, etc.) to determine whether a program exited successfully (0) or there was an error (not 0).
It is strongly recommended to use only 0 for success. Programs which use values other than 0 for success cause headaches for the authors of shell scripts, Makefiles and the like.
It's up to the individual program to define additional semantics of different failure values. However, I think -1 is not such a good idea. The best thing would be to use EXIT_SUCCESS and EXIT_FAILURE, however, these definitions from <stdlib.h> have no corresponding counterparts in Java. ISO/IEC 9899:2011 does not describe how EXIT_FAILURE and EXIT_SUCCESS should be defined. However, it defines 0 as having the same semantics as EXIT_SUCCESS, and I do not know of any system which does something else than the following, which therefore is the best thing a Java programmer can assume:
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
So, I'd use 0 for success, 1 for failure, and values distinct from 1 if it is important to differentiate between different types of failure. A good example is the grep command. From its man page (GNU grep):
EXIT STATUS
The exit status is 0 if selected lines are found, and 1 if not found. If an error occurred the exit status is 2. (Note: POSIX error handling code should check for '2' or greater.)
I'd not use -1 because it sets all bits, and depending on the environment, -1 might accidentally convey additional bogus information, like claiming that signals were raised and the like. Actually, to be portable, the exit code should be within the range [0..127], unless you really really really know what you're doing.
For example, on most POSIX systems, the exit code will be truncated to 8 bits, and the semantics of an exit code above 127 is that the exit was caused by a signal.
BSD
BSD has tried to supply a few more exit codes, which are available in /usr/include/sysexits.h (man page). They are like this:
64: command line usage error
65: data format error
66: cannot open input
67: addressee unknown
68: host name unknown
69: service unavailable
70: internal software error
71: system error (i.e. can't fork)
72: critical OS file missing
73: can't create (user) output file
74: input/output error
75: temp failure; user is invited to retry
76: remote error in protocol
77: permission denied
78: configuration error
In the absence of any other meaningful standard, I think the best thing we can do is use them.
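For what it's worth, here is a minimal sketch of how that maps onto a Java main method (my own illustration; the constant names are made up, only 0-for-success is universally defined, and 64 is the sysexits.h "command line usage error" value listed above):
public class ExitCodes {
    static final int EXIT_OK = 0;       // success
    static final int EXIT_FAILURE = 1;  // generic failure
    static final int EXIT_USAGE = 64;   // sysexits.h: command line usage error

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ExitCodes <file>");
            System.exit(EXIT_USAGE);
        }
        // ... do the real work, calling System.exit(EXIT_FAILURE) on error ...
        System.exit(EXIT_OK);
    }
}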
Interpretation of JVM-generated exit values
JVM on POSIX
On UNIX, if the process was terminated by a signal, the exit value (which in most shells can be queried with $?) is 128 plus the signal number, so you can recover the signal by subtracting 128 (equivalently, by masking with 0x7F).
SIGTERM -> VM: 143 SIGTERM
SIGSEGV -> VM: 134 SIGABRT (because the VM handles SIGSEGV to write the hs_err file and then calls abort()).
Comparison to "normal" programs:
SIGSEGV -> prog: 139 SIGSEGV
This behavior is not explicitly specified for the JVM, it is just the normal expected behavior of programs in a POSIX environment. Even the SIGABRT instead of SIGSEGV is more or less expected, given that the JVM wants to handle SIGSEGV on its own, writing a more specific crash dump than a normal core file.
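A hedged sketch (my own, not from the original answer) of decoding the exit value of a child JVM process from Java on a POSIX system; "SomeMain" is a placeholder class name:
import java.io.IOException;

public class ExitValueDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("java", "SomeMain").inheritIO().start();
        int code = p.waitFor();
        if (code == 0) {
            System.out.println("success");
        } else if (code > 128) {
            // POSIX shell convention: 128 + signal number
            System.out.println("likely terminated by signal " + (code - 128));
        } else {
            System.out.println("failed with exit code " + code);
        }
    }
}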
JVM on Windows
TODO

Compiler written in Java: Peephole optimizer implementation

I'm writing a compiler for a subset of Pascal. The compiler produces machine instructions for a made-up machine. I want to write a peephole optimizer for this machine language, but I'm having trouble substituting some of the more complicated patterns.
Peephole optimizer specification
I've researched several different approaches to writing a peephole optimizer, and I've settled on a back-end approach:
The Encoder makes a call to an emit() function every time a machine instruction is to be generated.
emit(Instruction currentInstr) checks a table of peephole optimizations:
If the current instruction matches the tail of a pattern:
Check previously emitted instructions for matching
If all instructions matched the pattern, apply the optimization, modifying the tail end of the code store
If no optimization was found, emit the instruction as usual
Current design approach
The method is easy enough; it's the implementation I'm having trouble with. In my compiler, machine instructions are stored in an Instruction class. I wrote an InstructionMatch class that stores regular expressions meant to match each component of a machine instruction. Its equals(Instruction instr) method returns true if the patterns match some machine instruction instr.
However, I can't manage to fully apply the rules I have. First off, I feel that given my current approach, I'll end up with a mess of needless objects. Given that a complete list of peephole optimizations can number around 400 patterns, this will get out of hand fast. Furthermore, I can't actually get more difficult substitutions working with this approach (see "My question").
Alternate approaches
One paper I've read folds previous instructions into one long string, using regular expressions to match and substitute, and converting the string back to machine instructions. This seemed like a bad approach to me; please correct me if I'm wrong.
Example patterns, pattern syntax
x: JUMP x+1; x+1: JUMP y --> x: JUMP y
LOADL x; LOADL y; add --> LOADL x+y
LOADA d[r]; STOREI (n) --> STORE (n) d[r]
Note that each of these example patterns is just a human-readable representation of the following machine instruction template:
op_code register n d
(n usually indicates the number of words, and d an address displacement). The syntax x: <instr> indicates that the instruction is stored at address x in the code store.
So, the instruction LOADL 17 is equivalent to the full machine instruction 5 0 0 17 when the LOADL opcode is 5 (n and r are unused in this instruction)
My question
So, given that background, my question is this: How do I effectively match and replace patterns when I need to include parts of previous instructions as variables in my replacement? For example, I can simply replace all instances of LOADL 1; add with the increment machine instruction - I don't need any part of the previous instructions to do this. But I'm at a loss of how to effectively use the 'x' and 'y' values of my second example in the substitution pattern.
edit: I should mention that each field of an Instruction class is just an integer (as is normal for machine instructions). Any use of 'x' or 'y' in the pattern table is a variable to stand in for any integer value.
An easy way to do this is to implement your peephole optimizer as a finite state machine.
We assume you have a raw code generator that generates instructions but does not emit them, and an emit routine that sends actual code to the object stream.
The state machine captures instructions that your code generator produces, and remembers sequences of 0 or more generated instructions by transitioning between states. A state thus implicitly remembers a (short) sequence of generated but un-emitted instructions; it also has to remember the key parameters of the instructions it has captured, such as a register name, a constant value, and/or addressing modes and abstract target memory locations. A special start state remembers the empty string of instructions. At any moment, you need to be able to emit the unemitted instructions ("flush"); if you do this all the time, your peephole generator captures the next instruction and then emits it, doing no useful work.
To do useful work, we want the machine to capture as long a sequence as possible. Since there are typically many kinds of machine instructions, as a practical matter you can't remember too many in a row or the state machine will become enormous. But it is practical to remember the last two or three for the most common machine instructions (load, add, cmp, branch, store). The size of the machine will really be determined by the length of the longest peephole optimization we care to do, but if that length is P, the entire machine need not be P states deep.
Each state has transitions to a next state based on the "next" instruction I produced by your code generator. Imagine a state represents the capture of N instructions.
The transition choices are:
flush the leftmost k (zero or more) instructions that this state represents, and transition to the next state, which represents the remaining N-k instructions plus the newly captured instruction I (N-k+1 instructions in total);
flush the leftmost k instructions this state represents, transition to the state that represents the remaining N-k instructions, and reprocess instruction I;
flush the state completely, and emit instruction I too. [You can actually do this on just the start state.]
When flushing the k instructions, what actually gets emitted is the peephole-optimized version of those k. You can compute anything you want when emitting such instructions. You also need to remember to "shift" the parameters for the remaining instructions appropriately.
This is all pretty easily implemented with a peephole optimizer state variable, and a case statement at each point where your code generator produces its next instruction. The case statement updates the peephole optimizer state and implements the transition operations.
Assume our machine is an augmented stack machine with the following instructions:
PUSHVAR x
PUSHK i
ADD
POPVAR x
MOVE x,k
The raw code generator, however, produces only pure stack machine instructions; in particular it never emits the MOVE instruction. We want the peephole optimizer to introduce it.
The peephole cases we care about are:
PUSHK i, PUSHK j, ADD ==> PUSHK i+j
PUSHK i, POPVAR x ==> MOVE x,i
Our state variables are:
PEEPHOLESTATE (an enum symbol, initialized to EMPTY)
FIRSTCONSTANT (an int)
SECONDCONSTANT (an int)
Our case statements:
GeneratePUSHK:
switch (PEEPHOLESTATE) {
EMPTY: PEEPHOLESTATE=PUSHK;
FIRSTCONSTANT=K;
break;
PUSHK: PEEPHOLESTATE=PUSHKPUSHK;
SECONDCONSTANT=K;
break;
PUSHKPUSHK:
#IF consumeEmitLoadK // flush state, transition and consume generated instruction
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
SECONDCONSTANT=K;
PEEPHOLESTATE=PUSHKPUSHK;
break;
#ELSE // flush state, transition, and reprocess generated instruction
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
PEEPHOLESTATE=PUSHK;
goto GeneratePUSHK; // Java can't do this, but other languages can.
#ENDIF
}
GenerateADD:
switch (PEEPHOLESTATE) {
EMPTY: emit(ADD);
break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
emit(ADD);
PEEPHOLESTATE=EMPTY;
break;
PUSHKPUSHK:
PEEPHOLESTATE=PUSHK;
FIRSTCONSTANT+=SECONDCONSTANT;
break;
}
GeneratePOPX:
switch (PEEPHOLESTATE) {
EMPTY: emit(POP,X);
break;
PUSHK: emit(MOV,X,FIRSTCONSTANT);
PEEPHOLESTATE=EMPTY;
break;
PUSHKPUSHK:
emit(MOV,X,SECONDCONSTANT);
PEEPHOLESTATE=PUSHK;
break;
}
GeneratePUSHVARX:
switch (PEEPHOLESTATE) {
EMPTY: emit(PUSHVAR,X);
break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
PEEPHOLESTATE=EMPTY;
goto GeneratePUSHVARX;
PUSHKPUSHK:
PEEPHOLESTATE=PUSHK;
emit(PUSHK,FIRSTCONSTANT);
FIRSTCONSTANT=SECONDCONSTANT;
goto GeneratePUSHVARX;
}
The #IF shows two different styles of transitions, one that consumes the generated instruction and one that does not; either works for this example. When you end up with a few hundred of these case statements, you'll find both types handy, with the "don't consume" version helping you keep your code smaller.
We need a routine to flush the peephole optimizer:
flush() {
switch (PEEPHOLESTATE) {
EMPTY: break;
PUSHK: emit(PUSHK,FIRSTCONSTANT);
break;
PUSHKPUSHK:
emit(PUSHK,FIRSTCONSTANT);
emit(PUSHK,SECONDCONSTANT);
break;
}
PEEPHOLESTATE=EMPTY;
return; }
It is interesting to consider what this peephole optimizer does with the following generated code:
PUSHK 1
PUSHK 2
ADD
PUSHK 5
POPVAR X
POPVAR Y
What this whole FSA scheme does is hide your pattern matching in the state transitions, and the response to matched patterns in the cases. You can code this by hand, and it is fast and relatively easy to code and debug. But when the number of cases gets large, you don't want to build such a state machine by hand. You can write a tool to generate this state machine for you; good background for this would be FLEX or LALR parser state machine generation. I'm not going to explain this here :-}
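To connect this back to the original Java setting, here is a minimal Java sketch of the same state machine (my own illustration, not part of the original answer; the class and method names are made up, and emit() just collects strings so the example is self-contained):
import java.util.ArrayList;
import java.util.List;

class Peephole {
    enum State { EMPTY, PUSHK, PUSHK_PUSHK }

    private State state = State.EMPTY;
    private int first, second;                  // captured constants
    private final List<String> out = new ArrayList<>();

    void pushK(int k) {
        switch (state) {
            case EMPTY:       first = k; state = State.PUSHK; break;
            case PUSHK:       second = k; state = State.PUSHK_PUSHK; break;
            case PUSHK_PUSHK: emit("PUSHK " + first);          // flush the oldest capture
                              first = second; second = k; break;
        }
    }

    void add() {
        switch (state) {
            case EMPTY:       emit("ADD"); break;
            case PUSHK:       emit("PUSHK " + first); emit("ADD"); state = State.EMPTY; break;
            case PUSHK_PUSHK: first += second; state = State.PUSHK; break;   // PUSHK i; PUSHK j; ADD -> PUSHK i+j
        }
    }

    void popVar(String x) {
        switch (state) {
            case EMPTY:       emit("POPVAR " + x); break;
            case PUSHK:       emit("MOVE " + x + "," + first); state = State.EMPTY; break;   // PUSHK i; POPVAR x -> MOVE x,i
            case PUSHK_PUSHK: emit("MOVE " + x + "," + second); state = State.PUSHK; break;
        }
    }

    void flush() {
        if (state == State.PUSHK) emit("PUSHK " + first);
        if (state == State.PUSHK_PUSHK) { emit("PUSHK " + first); emit("PUSHK " + second); }
        state = State.EMPTY;
    }

    private void emit(String instr) { out.add(instr); }

    List<String> result() { flush(); return out; }
}
Feeding it the generated sequence above (PUSHK 1; PUSHK 2; ADD; PUSHK 5; POPVAR X; POPVAR Y) leaves just MOVE X,5 and MOVE Y,3 in the output, assuming I have transcribed the transitions faithfully.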

The code of method specialStateTransition(int, IntStream) is exceeding the 65535 bytes

I have a pretty big grammar that I don't want to break into multiple smaller grammars, but the generated lexer file produces the following compiler error:
The code of method specialStateTransition(int, IntStream) is exceeding the 65535 bytes
I am using ANTLR-3.2. Please tell me how to remove this compiler error.
Thanks
Preeti
Method specialStateTransition is not always generated. It may be related to some tokens that share common prefixes with other tokens.
See this question/answer for a case where specialStateTransition completely vanished by reformulating just one such token.
I had the same problem recently and managed to fix it by changing the options for the ANTLR code generation tool:
java org.antlr.Tool -Xmaxinlinedfastates [a number less than 60] grammar.g
Using this option forces the code generator to create a table of DFA states rather than many nested if statements.
You can't: you will have to refactor your code. The limit is inherent to Java class files.
From Section 4.10 (Limitations of the Java Virtual Machine) of the VM specification:
The amount of code per non-native, non-abstract method is limited to 65536 bytes by the sizes of the indices in the exception_table of the Code attribute (§4.7.3), in the LineNumberTable attribute (§4.7.8), and in the LocalVariableTable attribute (§4.7.9).
