I have 3 points [x0 y0], [x1 y1], [x2 y2] with the strict conditions x0<x1<x2, y0<y1<y2. All these points lie on some exponential function y=ae^(bx)+c, and I need to find a, b, c. The system of 3 equations cannot be solved in closed form, so I need to approximate the solution. Is there some math library in Java that will help me solve this problem? I found something similar in Mathcad
https://help.ptc.com/mathcad/en/index.html#page/PTC_Mathcad_Help/exponential_regression.html but nothing comparable in Java.
Alternatively: how can I solve a system of 3 equations in 3 unknowns approximately?
ae^(bx_0)+c=y_0
ae^(bx_1)+c=y_1
ae^(bx_2)+c=y_2
You have to solve a system of non-linear equations, for which only an approximate solution is possible, but that can be done using the multivariate Newton-Raphson method.
The algorithm is, quite frankly, a notational pain but you can go through it here -
http://fourier.eng.hmc.edu/e176/lectures/NM/node21.html.
What happens, essentially, is that the function's derivatives lead you from an initial point (which you guess as a possible root) towards an 'equilibrium'.
If you are not willing to write the code yourself this repo can give you a starter of sorts - https://github.com/prasser/newtonraphson.
But AFAIK, no ready-made Java library exists for this exact purpose. You can use Wolfram's Mathematica or MATLAB/Octave for ready-made solvers, though.
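If you do end up coding it yourself, below is a minimal, self-contained sketch of that iteration applied to exactly your 3x3 system, in plain Java. The sample data points and the initial guess are placeholders, and Newton-Raphson is sensitive to the starting point, so no convergence guarantee is implied.

public class ExpFit {
    static final double[] X = {0, 1, 2};  // replace with your x0 < x1 < x2
    static final double[] Y = {1, 2, 5};  // replace with your y0 < y1 < y2

    public static void main(String[] args) {
        double a = 1, b = 1, c = 0;       // initial guess (placeholder)
        for (int iter = 0; iter < 50; iter++) {
            double[] f = new double[3];
            double[][] jac = new double[3][3];
            for (int i = 0; i < 3; i++) {
                double e = Math.exp(b * X[i]);
                f[i] = a * e + c - Y[i];  // residual of equation i
                jac[i][0] = e;            // d f_i / d a
                jac[i][1] = a * X[i] * e; // d f_i / d b
                jac[i][2] = 1;            // d f_i / d c
            }
            double[] d = solve3(jac, new double[]{-f[0], -f[1], -f[2]});
            a += d[0]; b += d[1]; c += d[2];
            if (Math.abs(d[0]) + Math.abs(d[1]) + Math.abs(d[2]) < 1e-12) break;
        }
        System.out.printf("a=%f b=%f c=%f%n", a, b, c);
    }

    // Gaussian elimination with partial pivoting for a 3x3 system.
    static double[] solve3(double[][] m, double[] rhs) {
        for (int col = 0; col < 3; col++) {
            int p = col;
            for (int r = col + 1; r < 3; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[p][col])) p = r;
            double[] rowTmp = m[col]; m[col] = m[p]; m[p] = rowTmp;
            double t = rhs[col]; rhs[col] = rhs[p]; rhs[p] = t;
            for (int r = col + 1; r < 3; r++) {
                double k = m[r][col] / m[col][col];
                for (int cc = col; cc < 3; cc++) m[r][cc] -= k * m[col][cc];
                rhs[r] -= k * rhs[col];
            }
        }
        double[] x = new double[3];
        for (int r = 2; r >= 0; r--) {
            double s = rhs[r];
            for (int cc = r + 1; cc < 3; cc++) s -= m[r][cc] * x[cc];
            x[r] = s / m[r][r];
        }
        return x;
    }
}

Each step evaluates the residuals f_i = a*e^(b*x_i) + c - y_i and the 3x3 Jacobian, then solves J*delta = -f for the update.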
That said, here are a few other (more involved) things you can look into:
https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
https://www1.fpl.fs.fed.us/optimization.html
http://icl.cs.utk.edu/f2j/
http://optalgtoolkit.sourceforge.net/
http://scribblethink.org/Computer/Javanumeric/index.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
Hope this helps!
I am trying to create a class that takes a string input containing pseudocode and computes its worst-case runtime complexity. I will use regex to split each line, analyze each line's worst case, and add up the per-line complexities (based on the big-O rules) to give a final worst-case runtime. The pseudocode will follow a few rules for declaration, initialization, and operations on data structures; this is something I can control. How should I go about designing the class, considering the rules of iterative and recursive analysis?
Any help in C++ or Java is appreciated. Thanks in advance.
#include <string>

class PseudocodeAnalyzer
{
public:
    std::string inputCode;
    std::string performIterativeAnalysis(std::string line);    // loops -> polynomial factors
    std::string performRecursiveAnalysis(std::string line);    // recursion -> recurrence relations
    std::string analyzeTotalComplexity(std::string inputCode); // combine per-line results
};
An example of an iterative algorithm: check whether every number in a grid is odd:
1. Array A = Array[N][N]
2. for i in 1 to N
3. for j in 1 to N
4. if A[i][j] % 2 == 0
5. return false
6. endif
7. endloop
8. endloop
Worst-case Time-Complexity: O(n*n)
The concept: "I wish to write a program that analyses pseudocode in order to print out the algorithmic complexity of the algorithm it describes" is mathematically impossible!
Let me try to explain why that is, or how you get around the inevitability that you cannot write this.
Your pseudocode has certain capabilities. You call it pseudocode, but given that you are now trying to parse it, it's still a 'real' language where terms have real meaning. This language is capable of expressing algorithms.
So, which algorithms can it express? Presumably 'all of them'. There is a concept called a 'Turing machine': you can prove that anything a computer can do, a Turing machine can also do, and Turing machines are very simple things. Therefore, if you have some simplistic computer and you can use it to emulate a Turing machine, you can use it to emulate a complete computer. This is how, in fundamental informatics, you prove that a certain CPU or system is capable of computing all the stuff some other CPU or system can compute: use it to emulate a Turing machine, thus proving you can run it all. Any system that can be used to emulate a Turing machine is called 'Turing complete'.
Then we get to something very interesting: if your pseudocode can be used to express anything a real computer can do, then your pseudocode can be used to 'write'... your very own pseudocode checker!
So let's say we do just that, and stick the pseudocode that describes your pseudocode checker in a function we shall call pseudocodechecker. It takes as its argument a string containing some pseudocode, and returns a string such as O(n^2).
You can then write this program in pseudocode:
1. if pseudocodechecker(this-very-program) == O(n^2)
2. If True runSomeAlgorithmThatIsO(1)
3. If False runSomeAlgorithmThatIsO(n^2)
And this is self-defeating: we have 'programmed' a paradox. It's like "This statement is a lie", or "the set of all sets that do not contain themselves". If it is false it is true, and if it is true it is false. [Insert GIF of exploding computer here].
Thus, we have mathematically proved that what you want is impossible, unless one of the following is true:
A. Your pseudocode-based checker is incorrect. As in, it will flat out give a wrong answer sometimes, thus resolving the paradox: if you feed your program a paradox, it gives a wrong answer. But how useful is such an app? An app whose answers you know may be incorrect?
B. Your pseudocode-based checker is incomplete: the official definition of your pseudocode language is so limited that you cannot even write a Turing machine in it.
That last one seems like a nice solution, but it is quite drastic: it pretty much means that your algorithms can only loop over constant ranges. They cannot loop until a condition is true, for example. Another nice solution appears to be: the program is capable of realizing that an answer cannot be given, and will then report 'no answer available'. Unfortunately, with some more work, you can show that such a system can still be used to construct a paradox.
The answer by @rzwitserloot and the ones given in the link are correct. Let me just add that it is possible to compute an approximation both to the halting problem and to finding the time complexity of a piece of code (written in a Turing-complete language!). (Compare that to the existence of automated theorem provers for arithmetic and other second-order logics, which are undecidable!) A tool that under-approximates the complexity problem would output the correct time complexity for some inputs, and "don't know" for other inputs.
Indeed, the whole wide field of code analyzers, often built into the IDEs we use every day, more often than not under-approximates decision problems that are uncomputable, e.g. reachability, nullability or value analyses.
If you really want to write such a tool, the basic idea is to identify heuristics, i.e., common patterns for which a solution is known, such as various patterns of nested for-loops with only very basic arithmetic operations manipulating the indices, or simple recursive functions where the recurrence relation can be spotted straight away. It would actually not be too hard (though definitely not easy!) to write a tool that could solve most of the toy problems (such as the one you posted) that are given as homework to students, and that are often posted as questions here on SO, since they follow a rather small number of patterns.
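To illustrate how little is needed for the toy cases, here is a deliberately naive heuristic sketch in Java: it recognizes only the "for ... to N" / "endloop" pattern from your example and reports O(n^d) for the deepest loop nesting d, treating everything else as O(1). All class and pattern names are made up for illustration.

import java.util.regex.Pattern;

public class LoopDepthHeuristic {
    // Matches optionally numbered lines like "2. for i in 1 to N".
    private static final Pattern FOR = Pattern.compile("^\\s*\\d*\\.?\\s*for\\b.*\\bto\\s+N\\s*$");
    private static final Pattern END = Pattern.compile("^\\s*\\d*\\.?\\s*endloop\\s*$");

    public static String analyze(String pseudocode) {
        int depth = 0, maxDepth = 0;
        for (String line : pseudocode.split("\\R")) {
            if (FOR.matcher(line).matches()) { depth++; maxDepth = Math.max(maxDepth, depth); }
            else if (END.matcher(line).matches()) depth--;
        }
        return maxDepth == 0 ? "O(1)" : "O(n^" + maxDepth + ")";
    }

    public static void main(String[] args) {
        String grid = "for i in 1 to N\n for j in 1 to N\n  if A[i][j] % 2 == 0\n   return false\n  endif\n endloop\nendloop";
        System.out.println(analyze(grid)); // prints O(n^2)
    }
}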
If you wish to go beyond simple heuristics, the main theoretical concept underlying more powerful code analyzers is abstract interpretation. Applied to your use case, this would mean developing a mapping from code constructs in your language to code constructs in a different language (or simpler code constructs in the same language) for which it is easier to compute the time complexity. This mapping would have to conform to some constraints; in particular, the mapped constructs have to have the same or worse time complexity as the original code. Mapping a piece of code to a recurrence relation would be an example of abstract interpretation, and so is replacing a line of code with something like "O(1)". So the task is just to formalize some of the things we do in our heads anyway when analyzing the time complexity of code.
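As a toy illustration of such a mapping, here is a miniature 'cost domain' in Java: each construct is interpreted as a polynomial degree in n instead of being executed, sequencing takes the maximum degree, and wrapping in a loop over n adds one. This domain is of course far too weak for real code; it only shows the shape of the idea.

final class Cost {
    final int degree;                         // represents O(n^degree)
    Cost(int degree) { this.degree = degree; }

    static final Cost CONSTANT = new Cost(0); // a basic statement: O(1)

    Cost thenDo(Cost other) {                 // sequential composition
        return new Cost(Math.max(degree, other.degree));
    }
    Cost insideLoopOverN() {                  // wrap in "for i in 1 to N"
        return new Cost(degree + 1);
    }
    @Override public String toString() {
        return degree == 0 ? "O(1)" : "O(n^" + degree + ")";
    }
}

class CostDemo {
    public static void main(String[] args) {
        // The odd-number grid check: a constant-time body in two nested loops.
        Cost body = Cost.CONSTANT;           // if/return: O(1)
        Cost whole = body.insideLoopOverN()  // inner loop: O(n)
                         .insideLoopOverN(); // outer loop: O(n^2)
        System.out.println(whole);           // prints O(n^2)
    }
}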
I have a system of nonlinear dynamics which I wish to solve to optimality. I know how to do this in MATLAB, but I want to implement it in Java, and for some reason I'm lost on how to do it there.
What I have is following:
z(t) which returns states in a dynamic system.
z(t) = [state1(t),...,state10(t)]
The rate of change of this dynamic system is given by:
z'(t) = f(z(t),u(t),d(t)) = [dstate1(t)/dt,...,dstate10(t)/dt]
where u(t) and d(t) are external variables whose values I know.
In addition I have a function, let's denote it g(t), which is defined from a state variable:
g(t) = state4(t)/c1
where c1 is some constant.
Now I wish to solve the following unconstrained nonlinear system numerically:
g(t) - c2 = 0
f(z(t),u(t),0)= 0
where c2 is some constant. The above system can be seen as a simple f'(x) = 0 problem consisting of 11 equations and 1 unknown, and if I were to solve this in MATLAB I would do the following:
[output] = fsolve(@myDerivatives, someInitialGuess);
I am aware of the fact that Java doesn't come with any built-in solvers. So as I see it, there are two options for solving the above problem:
Option 1: Do it myself: I could use a numerical method, e.g. Gauss-Newton or similar, to solve this system of nonlinear equations. However, I will start by using a Java toolbox first, and then move to a numerical method afterwards.
Option 2: Solvers (e.g. commons optim): this is the solution I would like to look into. I have been looking into this toolbox; however, I have failed to find an exact example of how to actually use the MultivariateFunction evaluator and the numerical optimizer, so below is my best guess at the wiring. Does any of you have experience doing this?
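A minimal sketch, assuming commons-math3 on the classpath: the root-finding problem F(z) = 0 is recast as minimizing ||F(z)||^2 with the derivative-free Nelder-Mead SimplexOptimizer. The two-equation residuals() below is just a stand-in for my real eleven equations, and the thresholds are guesses.

import org.apache.commons.math3.analysis.MultivariateFunction;
import org.apache.commons.math3.optim.InitialGuess;
import org.apache.commons.math3.optim.MaxEval;
import org.apache.commons.math3.optim.PointValuePair;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;
import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunction;
import org.apache.commons.math3.optim.nonlinear.scalar.noderiv.NelderMeadSimplex;
import org.apache.commons.math3.optim.nonlinear.scalar.noderiv.SimplexOptimizer;

public class FsolveLikeDemo {
    // Stand-in for the 11 residuals g(t)-c2 and f(z,u,0).
    static double[] residuals(double[] z) {
        return new double[]{ z[0] * z[0] - 2, z[0] * z[1] - 1 };
    }

    public static void main(String[] args) {
        // Objective: sum of squared residuals; zero exactly at a root.
        MultivariateFunction sumOfSquares = point -> {
            double s = 0;
            for (double r : residuals(point)) s += r * r;
            return s;
        };
        SimplexOptimizer optimizer = new SimplexOptimizer(1e-10, 1e-12);
        PointValuePair result = optimizer.optimize(
                new MaxEval(10000),
                new ObjectiveFunction(sumOfSquares),
                GoalType.MINIMIZE,
                new InitialGuess(new double[]{1.0, 1.0}), // someInitialGuess
                new NelderMeadSimplex(2));                // dimension of z
        System.out.println("z ~ " + java.util.Arrays.toString(result.getPoint()));
    }
}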
Please let me know if you have any ideas or suggestions for solving this problem.
Thanks!
Please compare the standard problem forms below against what your original problem looks like:
A global optimization problem
minimize f(y)
is solved by looking for solutions of the derivatives system
0=grad f(y) or 0=df/dy (partial derivatives)
(the gradient is the column vector containing all partial derivatives), that is, you are computing the "flat" or horizontal points of f(y).
For optimization under constraints
minimize f(y,u) such that g(y,u)=0
one builds the Lagrangian functional
L(y,p,u) = f(y,u)+p*g(y,u) (scalar product)
and then compute the flat points of that system, that is
g(y,u)=0, dL/dy(y,p,u)=0, dL/du(y,p,u)=0
After that, as also in the global optimization case, you have to determine the type of the flat point: maximum, minimum or saddle point.
Optimal control problems have the structure (one of several equivalent variants)
minimize integral(0,T) f(t,y(t),u(t)) dt
such that y'(t)=g(t,y(t),u(t)), y(0)=y0 and h(T,y(T))=0
To solve it, one considers the Hamiltonian
H(t,y,p,u)=f(t,y,u)-p*g(t,y,u)
and obtains the transformed problem
y' = -dH/dp = g, (partial derivatives, gradient)
p' = dH/dy,
with boundary conditions
y(0)=y0, p(T) = something involving dh/dy(T,y(T))
u(t) realizes the minimum of v -> H(t,y(t),p(t),v)
I'm looking for a Java library to solve this problem:
We know X is sparse (most of its entries are zero), so X can be recovered by solving this:
variable X;
minimize(norm(X,1)+norm(A*X - Y,2));
This is MATLAB code; matrix A and vector Y are known, and I want the best X.
I saw JOptimizer, but I couldn't use it (it doesn't have good documentation or examples).
What you need is a reasonably good LP solver.
Possible Java LP Solver Options
Apache Commons (Math) SimplexSolver (a small usage sketch follows this list).
See this blog post.
If you have access to CPLEX (not-free), its Java API would work great.
Also, you can look into SuanShu, a Java numerical and statistical library
lpSolve has a Java wrapper which can do the job.
Finally, JOptimizer is indeed a good option. Not sure if you looked at this example.
Hope at least one of those helps.
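For the Apache Commons option, here is a minimal sketch of the SimplexSolver API, assuming commons-math3. The L1 objective first needs the classic split X = P - Q with P, Q >= 0, so that |x_i| becomes p_i + q_i; note that the 2-norm residual term in your objective is not linear, so with a pure LP you would instead enforce A*X = Y exactly (basis pursuit) or bound the residual row by row. The toy instance below minimizes |x1| + |x2| subject to x1 + 2*x2 = 3, with variables ordered [p1, p2, q1, q2].

import java.util.Arrays;
import java.util.List;
import org.apache.commons.math3.optim.MaxIter;
import org.apache.commons.math3.optim.PointValuePair;
import org.apache.commons.math3.optim.linear.LinearConstraint;
import org.apache.commons.math3.optim.linear.LinearConstraintSet;
import org.apache.commons.math3.optim.linear.LinearObjectiveFunction;
import org.apache.commons.math3.optim.linear.NonNegativeConstraint;
import org.apache.commons.math3.optim.linear.Relationship;
import org.apache.commons.math3.optim.linear.SimplexSolver;
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType;

public class L1AsLpDemo {
    public static void main(String[] args) {
        // Objective: p1 + p2 + q1 + q2, i.e. |x1| + |x2| after the split.
        LinearObjectiveFunction objective =
                new LinearObjectiveFunction(new double[]{1, 1, 1, 1}, 0);
        // Constraint: (p1 - q1) + 2*(p2 - q2) = 3, i.e. x1 + 2*x2 = 3.
        List<LinearConstraint> constraints = Arrays.asList(
                new LinearConstraint(new double[]{1, 2, -1, -2}, Relationship.EQ, 3));
        PointValuePair sol = new SimplexSolver().optimize(
                new MaxIter(100),
                objective,
                new LinearConstraintSet(constraints),
                GoalType.MINIMIZE,
                new NonNegativeConstraint(true));
        double x1 = sol.getPoint()[0] - sol.getPoint()[2];
        double x2 = sol.getPoint()[1] - sol.getPoint()[3];
        System.out.println("x = [" + x1 + ", " + x2 + "]"); // expect [0.0, 1.5]
    }
}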
As far as I can tell, you're trying to solve a binary integer program for feasibility
Ax = b, x in {0,1}.
I'm not completely sure, but it seems that you might be interested in the optimization problem
min 1'*x
s.t. Ax = b, x in {0,1}
where 1 is a vector of 1's of the same dimension as x.
The feasibility problem may be in practice much easier than the optimization problem - it all depends on a particular A and b.
If you can get a license of either CPLEX or Gurobi (if you're an academic), these are excellent integer programming solvers with good Java API's. If you don't have access to these, lpsolve may be a good option.
As far as I can tell, JOptimizer will not solve your problem since your variables are integers (although I have never used JOptimizer).
To solve convex optimization problems in Java, you can use the following library: https://github.com/erikerlandson/gibbous
I want to make a program that, when given a formula, can manipulate the formula to make any value (or, in the case of simultaneous formulas, a common value) the subject of the formula.
For example if given:
a + b = c
d + b = c
The program should therefore say:
b = c - a, d = c - b etc.
I'm not sure whether Java can do this automatically when I give the original formula as input. I am not really interested in solving the equations and getting the value of each variable; I am just interested in returning a manipulated formula.
Please let me know whether I need to write an algorithm for this, and if so, how I would go about doing it. Also, if you have any helpful links, please post them.
Regards
Take a look at JavaCC. It's a little daunting at first, but it's the right tool for something like this. Plus, there are already examples of what you are trying to achieve.
Not sure what exactly you are after, but this problem in its general form is hard. Very hard.
In fact, given a set of "formulas" (axioms) and deduction rules (mathematical equivalence operations), we cannot decide whether a given formula is correct or not. This problem is actually undecidable.
This issue was first addressed by Hilbert as the Entscheidungsproblem.
I read a book called Fluid Concepts and Creative Analogies by Douglas Hofstadter that talked about this sort of algebraic manipulation: automatically rewriting equations in other ways, attempting to join equations to other equations in an infinite (yet restricted) number of ways, given rules. It was an attempt to prove as-yet-unproven theorems by brute force.
http://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_Analogies
Douglas Hofstadter's Numbo program attempts to do what you want. He doesn't give you the source, only describes how it works in detail.
It sounds like you want a program to do what high-school students do when they solve algebraic problems: move from a position where you know something, modifying it and combining it with other equations, to prove something previously unknown. It takes strong artificial intelligence to do this in general. The part of your brain that does this is the neocortex, which does science, and its operating principle is as of yet not understood.
If you want something that will do what college students do when they manipulate equations in calculus, you'll have to build a fairly strong artificial intelligence.
http://en.wikipedia.org/wiki/Neocortex
When we can do whole-brain emulation of a human neocortex, I will post the answer here.
Yes, you need to write some algorithm to do this kind of computer algebra. At a minimum you need (a tiny sketch follows this list):
a parser to interpret the input
an algebra model to relate the parsed operands ('a', 'b', ...) and operators ('+', '=')
implementations of whatever rules are needed to support the manipulations you wish to do
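To make that concrete, here is a deliberately tiny sketch in Java covering all three pieces for the restricted "a + b = c" shape from your question: the "parser" splits on '+' and '=', the model is just the two sides, and the single rewrite rule moves every other left-hand term to the right as a subtraction. All names are made up for illustration.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Rearranger {
    // Solve an equation of the form "a + b + ... = c" for one left-hand variable.
    public static String solveFor(String equation, String target) {
        String[] sides = equation.split("=");
        List<String> left =
                new ArrayList<>(Arrays.asList(sides[0].trim().split("\\s*\\+\\s*")));
        String right = sides[1].trim();
        if (!left.remove(target))
            throw new IllegalArgumentException(target + " is not on the left-hand side");
        StringBuilder sb = new StringBuilder(target + " = " + right);
        for (String term : left) sb.append(" - ").append(term); // inverse of '+'
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(solveFor("a + b = c", "b")); // b = c - a
        System.out.println(solveFor("d + b = c", "d")); // d = c - b
    }
}

A real tool would replace the string splitting with a proper parse tree (this is where JavaCC, mentioned in another answer, comes in) and the single rule with a rule set covering '-', '*', '/' and so on.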
What would be the best approach to comparing two hexadecimal file signatures against each other for similarities?
More specifically, I would like to take the hexadecimal representation of an .exe file and compare it against a series of virus signatures. For this approach I plan to break the file's hex representation into individual groups of N chars (e.g. 10 hex chars) and do the same with the virus signature. I am aiming to perform some sort of heuristic and therefore statistically check whether this exe file has X% similarity to a known virus signature.
The simplest, and likely very wrong, way I thought of doing this is to compare exe[n, n-1] against virus[n, n-1], where each element in the array is a sub-array, and therefore exe1[0,9] against virus1[0,9]. Each subset would be graded statistically.
As you can see, this would involve a massive number of comparisons and hence be very, very slow. So I thought to ask whether you can think of a better approach to such a comparison, for example one combining different data structures.
This is for a project I am doing for my BSc, where I am trying to develop an algorithm to detect polymorphic malware. This is only one part of the whole system; the other is based on genetic algorithms to evolve the static virus signature. Any advice, comments, or general information such as resources are very welcome.
Definition: Polymorphic malware (virus, worm, ...) maintains the same functionality and payload as its "original" version while having an apparently different structure (variants). It achieves that by code obfuscation, thus altering its hex signature. Some of the techniques used for polymorphism are: format alteration (inserting/removing blanks), variable renaming, statement rearrangement, junk code addition, statement replacement (x=1 changes to x=y/5 where y=5), and swapping of control statements. Much like the flu virus mutates so that vaccination is not effective, polymorphic malware mutates to avoid detection.
Update: After the advice you gave me about what to read, I did that, but it somewhat confused me more. I found several distance algorithms that can apply to my problem, such as:
Longest common subsequence
Levenshtein algorithm
Needleman–Wunsch algorithm
Smith–Waterman algorithm
Boyer–Moore algorithm
Aho–Corasick algorithm
But now I don't know which to use; they all seem to do the same thing in different ways. I will continue to do research so that I can understand each one better, but in the meantime, could you give me your opinion on which might be more suitable, so that I can give it priority during my research and study it more deeply?
Update 2: I ended up using an amalgamation of the LCSubsequence, LCSubstring and Levenshtein Distance. Thank you all for the suggestions.
There is a copy of the finished paper on GitHub
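(For reference, since Update 2 mentions Levenshtein distance: it is the textbook dynamic program below, here applied to hex chunks, with a similarity percentage derived as 1 - distance / max(length). The chunk values are purely illustrative.)

public class Levenshtein {
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // edits from ""
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;     // reuse the two rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String exeChunk = "4D5A90000300"; // illustrative hex chunks only
        String sigChunk = "4D5A80000300";
        int d = distance(exeChunk, sigChunk);
        double similarity = 1.0 - (double) d / Math.max(exeChunk.length(), sigChunk.length());
        System.out.printf("distance=%d, similarity=%.0f%%%n", d, similarity * 100);
    }
}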
For algorithms like these, I suggest you look into the bioinformatics area. There is a similar problem setting there, in that you have large files (genome sequences) in which you are looking for certain signatures (genes, well-known short base sequences, etc.).
Also, with polymorphic malware in mind, this field should offer you a lot, because in biology it seems similarly difficult to get exact matches. (Unfortunately, I am not aware of appropriate approximate searching/matching algorithms to point you to.)
One example from this direction would be to adapt something like the Aho–Corasick algorithm in order to search for several malware signatures at the same time.
Similarly, algorithms like the Boyer–Moore algorithm give you fantastic search runtimes, especially for longer sequences (an average case of O(N/M) for a text of size N in which you look for a pattern of size M, i.e. sublinear search times).
A number of papers have been published on finding near-duplicate documents in a large corpus of documents, in the context of web search. I think you will find them useful. For example, see
this presentation.
There has been a serious amount of research recently into automating the detection of duplicate bug reports in bug repositories. This is essentially the same problem you are facing. The difference is that you are using binary data. They are similar problems because you will be looking for strings that have the same basic pattern, even though the patterns may have some slight differences. A straight-up distance algorithm probably won't serve you well here.
This paper gives a good summary of the problem as well as some approaches in its citations that have been tried.
ftp://ftp.computer.org/press/outgoing/proceedings/Patrick/apsec10/data/4266a366.pdf
As somebody has pointed out, similarity with known strings and the bioinformatics problem might help. Longest common substring is very brittle, meaning that one difference can halve the length of such a string. You need a form of string alignment, but one more efficient than Smith–Waterman. I would look at programs such as BLAST, BLAT or MUMMER3 to see if they can fit your needs.
Remember that the default parameters for these programs are based on a biology application (how much to penalize an insertion or a substitution, for instance), so you should probably re-estimate the parameters based on your application domain, possibly based on a training set. This is a known problem, because even in biology different applications require different parameters (based, for instance, on the evolutionary distance of the two genomes being compared). It is also possible, though, that one of these algorithms might produce usable results even at the defaults. Best of all would be to have a generative model of how viruses change, which could guide you to an optimal choice of distance and comparison algorithm.