Related
I'm doing an inter-procedrual analysis project in Java at the moment and I'm looking into using an IFDS solver to compute the control flow graph of a program. I'm finding it hard to follow the maths involved in the description of the IFDS framework and graph reachability. I've read in several places that its not possible to compute the points-to sets of a program using this solver as "pointer analysis is known to be a non-distributive problem." [1] Other sources have said that this is often specifically with regard to 'strong updates', which from what I can gather are field write statements.
I think I can basically follow how the solver computes edges and works out the dataflow facts. But I don't quite follow what this: f(A ∪ B) = f(A) ∪ f(B) means in practical terms as a definition of a distributive function, and therefore what it means to say that points-to analysis deals with non-distributive functions.
The linked source [1] gives an example specific to field write statements:
A a = new A();
A b = a;
A c = new C();
b.f = c;
It claims that in order to reason about the assignment to b.f one must also take into account all aliases of the base b. I can follow this. But what I don't understand is what are the properties of this action that make it non-distributive.
A similar (I think) example from [2]:
x = y.n
Where before the statement there are points-to edges y-->obj1 and obj1.n-->obj2 (where obj1 and 2 are heap objects). They claim
it is not possible to correctly deduce that the edge x-->obj2 should be generated after the statement if we consider each input edge independently. The flow function for this statement is a function of the points-to graph as a whole and cannot be decomposed into independent functions of each edge and then merged to get a correct result.
I think I almost understand what, at least the first, example is saying but that I am not getting the concept of distributive functions which is blocking me getting the full picture. Can anyone explain what a distributive or non-distributive function is on a practical basis with regards to pointer analysis, without using set theory which I am having difficulty following?
[1] http://karimali.ca/resources/pubs/conf/ecoop/SpaethNAB16.pdf
[2] http://dl.acm.org/citation.cfm?doid=2487568.2487569 (paywall, sorry)
The distributiveness of a flow function is defined as: f(a Π b) = f(a) Π f(b), with Π being the merge function. In IFDS, Π is defined as the set union ∪.
What this means is that it doesn't matter whether or not you apply the merge function before or after the flow function, you will get the same result in the end.
In a traditional data-flow analysis, you go through the statements of your CFG and propagate sets of data-flow facts. So with a flow function f, for each statement, you compute f(in, stmt) = out, with in and out the sets of information you want to keep (e.g.: for an in-set {(a, allocA), (b, allocA)} -denoting that the allocation site of objects a and b is allocA, and the statement "b.f = new X();" -which we will name allocX, you would likely get the out-set {(a, allocA), (b, allocA), (a.f, allocX), (b.f, allocX)} because a and b are aliased).
IFDS explodes the in-set into its individual data-flow facts. So for each fact, instead of running your flow-function once with your entire in-set, you run it on each element of the in-set: ∀ d ∈ in, f(d, stmt) = out_d. The framework then merges all out_d together into the final out-set.
The issue here is that for each flow function, you don't have access to the entire in-set, meaning that for the example we presented above, running the flow-function f((a, allocA)) on the statement would yield a first out-set {(a, allocA)}, f((b, allocA)) would yield a second out-set {(b, allocA)}, and f(0) would yield a third out-set {(0), (b.f, allocX)}.
So the global out-set after you merge the results would be {(a, allocA), (b, allocA), (b.f, allocX)}. We are missing the fact {(a.f, allocX)} because when running the flow function f(0), we only know that the in-fact is 0 and that the statement is "b.f = new X();". Because we don't know that a and b refer to the allocation site allocA, we don't know that they are aliased, and we therefore cannot know that a.f should also point to allocX after the statement.
IFDS runs on the assumption of distributiveness: merging the out-sets after running the flow-function should yield the same results as merging the in-sets before running the flow-function.
In other words, if you need to combine information from multiple elements on the in-set to create a certain data-flow fact in your out-set, then you are not distributive, and should not express your problem in IFDS (unless you do something to handle those combination cases, like the authors of the paper you refer to as [1] did).
I usually use python/php but currently writing an Android app so playing with java at the moment.
I can't figure out why the following piece of my code doesn't work.
It's meant to take a long representing the time the last check occurred (last_check) and add a predefined number of minutes to it (CHECK_INTERVAL) which is currently set to 1 minute.
Log.i(this.toString(), "Last check: " + Long.toString(last_check));
long add_to_check = CHECK_INTERVAL * 60 * 1000;
long next = last_check + add_to_check;
Log.i(this.toString(), "Add to check: " + Long.toString(add_to_check));
Log.i(this.toString(), "Next check: " + Long.toString(next));
scheduleNextRun(next);
My expectation is that the first log will show the last_check time, the second log will show 60000 and the third log will show the sum of those two.
However, I am getting the same value for the first log and the third log - it's like my addition isn't working at all but I can't figure out why.
I thought I might have had issues with log vs int but have tried adding L to one of the variables and it doesn't make a difference.
FYI the second log is showing 60000 as expected.
thanks
Aaron
Is CHECK_INTERVAL 0? You wrote that "the second log is showing 60000 as expected" but perhaps CHECK_INTERVAL was 0 the first time this code ran, then initialized, then 1 on a later iteration when you're looking at that part of the log.
Are you initializing CHECK_INTERVAL to something non-zero but after this code runs? Do you have an initialization bug?
This problem will be easy to solve if you step through the code in the debugger and watch the results. If you're not using a debugger, do yourself a huge favor and get Android Studio (a wonderful tool built on IntelliJ IDEA) or Eclipse + ADT (a good tool).
Java initialization is defined in a way that's predictable, portable, and useful, but it can still be tricky. E.g. given
class Foo {
static final int A2 = A1 * 1000;
static final int A1 = 60;
}
The JVM first initializes all variables to default values, then runs the initialization expressions in order. Since A1 is 0 when A2's initializer runs, A2 will end up 0.
See Java Puzzlers for more subtle cases such as when one class's initialization code refers to a second class, causing the second class's initialization code to run, which then refers to values in the first class which haven't yet been initialized beyond their default values (0 and null). A class runs its initialization code on first demand, but nothing guarantees that it finishes initializing before those values are used.
Another tricky case happens when one class, C1, refers to a static final value from a second class, C2.A, then you edit the initialization code for A without recompiling class C1. Java has precise rules about when to cache such constants in the first class's .class file, but they aren't the ideal rules, and the compiler doesn't notice that it needs to recompile C1 for this!
BTW 1: If CHECK_INTERVAL is an int, the expression CHECK_INTERVAL * 60 * 1000 will compute an int value and wrap around within a 32-bit signed range. Still, 1 * 60 * 1000 will easily fit in an int.
BTW 2: The first arg to Log.i() is a tag. It's OK to pass in this.toString() [or toString() for short] but the idea is to pass in a constant tag like the current class name that you can use for log filtering.
[Added] Quick intro to Eclipse debugging
In the source code editor, double-click in the left margin to set a breakpoint. Then use the menus or toolbar to "debug as" a Java application rather than "run as". Eclipse will go into its "Debug perspective" (arrangement of views).
https://www.google.com/search?q=eclipse+debugger finds nice tutorials with step-by-step pictures (I checked the first 3; IBM's is the most concise and introduces more features) and videos. The Eclipse docs are good but harder to navigate.
It's all slicker in Android Studio.
The Title is self explanatory. This was an interview question. In java, List is an interface. So it should be initialized by some collection.
I feel that this is a tricky question to confuse. Am I correct or not? How to answer this question?
Assuming you don't have a copy of the original List, and the randomizing algorithm is truly random, then no, you cannot restore the original List.
The explanation is far more important on this type of question than the answer. To be able to explain it fully, you need to describe it using the mathematical definitions of Function and Map (not the Java class definitions).
A Function is a Map of Elements in one Domain to another Domain. In our example, the first domain is the "order" in the first list, and the second domain is the "order" in the second list. Any way that can get from the first domain to the second domain, where each element in the first domain only goes to one of the elements in the second domain is a Function.
What they want is to know if there is an Inverse Function, or a corresponding function that can "back map" the elements from the second domain to the elements in the first domain. Some functions (squaring a number, or F(x) = x*x ) cannot be reversed because one element in the second domain might map back to multiple (or none) elements in the first domain. In the squaring a number example
F(x) = x * x
F(3) = 9 or ( 3 -> 9)
F(12) = 144 or ( 12 -> 144)
F(-11) = 121 or (-11 -> 121)
F(-3) = 9 or ( -3 -> 9)
attempting the inverse function, we need a function where
9 maps to 3
144 maps to 12
121 maps to -11
9 maps to -3
Since 9 must map to 3 and -3, and a Map must have only one destination for every origin, constructing an inverse function of x*x is not possible; that's why mathematicians fudge with the square root operator and say (plus or minus).
Going back to our randomized list. If you know that the map is truly random, then you know that the output value is truly independent of the input value. Thus if you attempted to create the inverse function, you would run into the delimma. Knowledge that the function is random tells you that the input cannot be calculated from the output, so even though you "know" the function, you cannot make any assumptions about the input even if you have the output.
Unless, it is pseudo-random (just appears to be random) and you can gather enough information to reverse the now-not-truly random function.
If you have not kept some external order information (this includes things like JVM trickery with ghost copies), and the items are not implicitly ordered, you cannot recover the original ordering.
When information is lost, it is lost. If the structure of the list is the only place recording the order you want, and you disturb that order, it's gone for good.
There's a user's view, and there's internals. There's the question as understood and the question as can be interpreted.
The user's view is that list items are blocks of memory, and that the pointer to the next item is a set of (4?8? they keep changing the numbers:) bytes inside this memory. So when the list is randomized and the pointer to the next item is changed, that area of memory is overriden and can't be recovered.
The question as understood is that you are given a list after it had been randomized.
Internals - I'm not a Java or an OS guy, but you should look into situations where the manner in which the process is executed differs from the naive view: Maybe Java randomizes lists by copying all the cells, so the old list is still kept in memory somewhere? Maybe it keeps backup values of pointers? Maybe the pointers are kept at an external table, separate from the list, and can be reconstructed? Maybe. Internals.
Understanding - Who says you haven't got an access to the list before it was randomized? You could have just printed it out! Or maybe you have a trace of the execution? Or who said you're using Java's built it list? Maybe you are using your own version controlled list? Or maybe you're using your own reversable-randomize method?
Edwin Buck's answer is great but it all depends what the interviewer was looking for.
I'm looking for Java solution but any general answer is also OK.
Vector/ArrayList is O(1) for append and retrieve, but O(n) for prepend.
LinkedList (in Java implemented as doubly-linked-list) is O(1) for append and prepend, but O(n) for retrieval.
Deque (ArrayDeque) is O(1) for everything above but cannot retrieve element at arbitrary index.
In my mind a data structure that satisfy the requirement above has 2 growable list inside (one for prepend and one for append) and also stores an offset to determine where to get the element during retrieval.
You're looking for a double-ended queue. This is implemented the way you want in the C++ STL, which is you can index into it, but not in Java, as you noted. You could conceivably roll your own from standard components by using two arrays and storing where "zero" is. This could be wasteful of memory if you end up moving a long way from zero, but if you get too far you can rebase and allow the deque to crawl into a new array.
A more elegant solution that doesn't really require so much fanciness in managing two arrays is to impose a circular array onto a pre-allocated array. This would require implementing push_front, push_back, and the resizing of the array behind it, but the conditions for resizing and such would be much cleaner.
A deque (double-ended queue) may be implemented to provide all these operations in O(1) time, although not all implementations do. I've never used Java's ArrayDeque, so I thought you were joking about it not supporting random access, but you're absolutely right — as a "pure" deque, it only allows for easy access at the ends. I can see why, but that sure is annoying...
To me, the ideal way to implement an exceedingly fast deque is to use a circular buffer, especially since you are only interested in adding removing at the front and back. I'm not immediately aware of one in Java, but I've written one in Objective-C as part of an open-source framework. You're welcome to use the code, either as-is or as a pattern for implementing your own.
Here is a WebSVN portal to the code and the related documentation. The real meat is in the CHAbstractCircularBufferCollection.m file — look for the appendObject: and prependObject: methods. There is even a custom enumerator ("iterator" in Java) defined as well. The essential circular buffer logic is fairly trivial, and is captured in these 3 centralized #define macros:
#define transformIndex(index) ((headIndex + index) % arrayCapacity)
#define incrementIndex(index) (index = (index + 1) % arrayCapacity)
#define decrementIndex(index) (index = ((index) ? index : arrayCapacity) - 1)
As you can see in the objectAtIndex: method, all you do to access the Nth element in a deque is array[transformIndex(N)]. Note that I make tailIndex always point to one slot beyond the last stored element, so if headIndex == tailIndex, the array is full, or empty if the size is 0.
Hope that helps. My apologies for posting non-Java code, but the question author did say general answers were acceptable.
If you treat append to a Vector/ArrayList as O(1) - which it really isn't, but might be close enough in practice -
(EDIT - to clarify - append may be amortized constant time, that is - on average, the addition would be O(1), but might be quite a bit worse on spikes. Depending on context and the exact constants involved, this behavior can be deadly).
(This isn't Java, but some made-up language...).
One vector that will be called "Forward".
A second vector that will be called "Backwards".
When asked to append -
Forward.Append().
When asked to prepend -
Backwards.Append().
When asked to query -
if ( Index < Backwards.Size() )
{
return Backwards[ Backwards.Size() - Index - 1 ]
}
else
{
return Forward[ Index - Backwards.Size() ]
}
(and also check for the index being out of bounds).
Your idea might work. If those are the only operations you need to support, then two Vectors are all you need (call them Head and Tail). To prepend, you append to head, and to append, you append to tail. To access an element, if the index is less than head.Length, then return head[head.Length-1-index], otherwise return tail[index-head.Length]. All of these operations are clearly O(1).
Here is a data structure that supports O(1) append, prepend, first, last and size. We can easily add other methods from AbstractList<A> such as delete and update
import java.util.ArrayList;
public class FastAppendArrayList<A> {
private ArrayList<A> appends = new ArrayList<A>();
private ArrayList<A> prepends = new ArrayList<A>();
public void append(A element) {
appends.add(element);
}
public void prepend(A element) {
prepends.add(element);
}
public A get(int index) {
int i = prepends.size() - index;
return i >= 0 ? prepends.get(i) : appends.get(index + prepends.size());
}
public int size() {
return prepends.size() + appends.size();
}
public A first() {
return prepends.isEmpty() ? appends.get(0) : prepends.get(prepends.size());
}
public A last() {
return appends.isEmpty() ? prepends.get(0) : appends.get(prepends.size());
}
What you want is a double-ended queue (deque) like the STL has, since Java's ArrayDeque lacks get() for some reason. There were some good suggestions and links to implementations here:
Java equivalent of std::deque?
I am tasked with creating a small program that can read in the definition of a FSM from input, read some strings from input and determine if those strings are accepted by the FSM based on the definition. I need to write this in either C, C++ or Java. I've scoured the net for ideas on how to get started, but the best I could find was a Wikipedia article on Automata-based programming. The C example provided seems to be using an enumerated list to define the states, that's fine if the states are hard coded in advance. Again, I need to be able to actually read the number of states and the definition of what each state is supposed to do. Any suggestions are appreciated.
UPDATE:
I can make the alphabet small (e.g. { a b }) and adopt other conventions such as the
start state is always state 0. I'm allowed to impose reasonable restrictions on the number of
states, e.g. no more than 10.
Question summary:
How do I implement an FSA?
First, get a list of all the states (N of them), and a list of all the symbols (M of them). Then there are 2 ways to go, interpretation or code-generation:
Interpretation. Make an NxM matrix, where each element of the matrix is filled in with the corresponding destination state number, or -1 if there is none. Then just have an initial state variable and start processing input. If you get to state -1, you fail. If you run out of input symbols without getting to the success state, you fail. Otherwise you succeed.
Code generation. Print out a program in C or your favorite compiler language. It should have an integer state variable initialized to the start state. It should have a for loop over the input characters, containing a switch on the state variable. You should have one case per state, and at each case, have a switch statement on the current character that changes the state variable.
If you want something even faster than 2, and that is sure to get you flunked (!), get rid of the state variable and instead use goto :-) If you flunk, you can comfort yourself in the knowledge that that's what compilers do.
P.S. You could get your F changed to an A if you recognize loops etc. in the state diagram and print out corresponding while and if statements, rather than using goto.
One non-hardcoded way to represent an automaton is as a transition matrix, which allows to represent for each current state, and each input character, what the next state is.
You haven't actually asked a question. You'll get more and better help if you have a specific question for a specific task (but still give the overall goal). The question should be narrow in scope (e.g. not "How can I implement an FSA?").
As for how to represent an FSA (which seems to be what you're having difficulties with), read on.
Start by considering the definition of an FSM: it's an alphabet ∑, a set of states S, a start state s0, a set of accept states A and a transition function δ from a state and a symbol to a state. You have to be able to determine these properties from the input. Any states not reachable by the transition function can be dropped to produce an equivalent FSM. The minimal set of states and alphabet are thus implicit in the transition function; you could make your FSM easier to use (and harder to implement, but not much harder) by not requiring either ∑ or S in the input.
You don't need to use the same representation for states that the input uses. You could use unsigned integers for your internal representation, as long as you have a map from integers to strings and strings to integers so you can convert between the internal representation and external representation. This way, your transition function can be stored as an array, so the transition step can be performed in constant time.
A simpler approach would be to use the external representation as your internal representation. With this option, the transition function would be stored as a map from strings and symbols to strings. The transition step would probably be O(log(|S|+|∑|)), given the performance of most map data structures. If symbols are represented as integers (e.g. chars), the transition function could be represented as a map from strings to an array of strings, giving O(log(|S|)) performance.
Yet another optionmodeled after the graph view of an FSM, is to create a class for states. A state has a name (the external representation). States are responsible for transitions; send a symbol to a state and get back another state.
class State {
property name;
State& transition(Symbol s);
void setTransition(Symbol s, State& to);
}
Store the set of states as a map from names to states.
There you go, three different places to start, each with a different way to represent states.
Stop thinking about everything at once. Do one thing at a time
- come with language of state machine
- come with language for stimulus
- create sample file of one state machine in language
- create sample file of stimulus
- come with class for state
- come with class for transition
- come with class for state machine as set of states and transitions
- add method to handle violation to state class
- code a little parser for language
- code another parser for language
- initial state
- some output thing like WriteLn here and there
- main method
- compile
- run
- debug
- done
The way the OpenFst toolkit does it is: A FSM has a vector of states, each of which has a vector of arcs. Each arc has an input (and output) label, a target state ID and a weight. You could take a look at the code. Maybe it will inspire you.
If you're using an object-oriented language like Java or C++, I'd recommend that you start with objects. Before you worry about file formats and the like, get a good object model for a finite state automata and how it behaves. How will you represent states, transitions, events, etc.? Will your FSA be a Composite? Once you have that sort of thing working you can get the file formats right. Anything will do: XML, text, etc.