Generate an MD5 hash with six leading zeros - java

I'm doing some practice coding problems in Java from Advent of Code and came across one that asked me to find the smallest six-digit integer which, combined with the leading string iwrupvqb, produces an MD5 hash that starts with five zeros.
I found the answer to this part using the Apache DigestUtils.md5Hex function: I just brute forced through 100000-999999, combining each number with iwrupvqb, until I got an MD5 that started with five zeros.
The answer came out to be iwrupvqb346386 creating the hash:
0000045c5e2b3911eb937d9d8c574f09
Now it's asking me to find one with six leading zeros. I've been reading pages and pages about how the MD5 algorithm works, inverting MD5, etc., but I can't seem to express the problem as an equation that would tell me how the hash is determined by the characters used.
I even let the loop run for 30 minutes to an hour to see if it got any hits outside the six-digit range (because apparently no six-digit integer combined with this text phrase produces six leading zeros).
I don't really know anything about hexadecimal, so at this point I'm just taking shots in the dark, and guessing number combos all night isn't really my thing. I don't necessarily need to solve this problem for anything besides practice, but I am curious to know more about what's going on here. (And yes, I am aware that MD5 is compromised and I wouldn't ever use it in production.)

This problem can only be solved by brute-forcing. That is exactly how "proof-of-work" in Bitcoin works, for example. The only way to speed it up is to optimize each step in your calculation. Bitcoin miners have moved to specialized hardware because of this. They don't do anything "special" or "clever", they just calculate hashes very, very fast.
You can only optimize the code and throw more/better hardware at it. A cluster of compute nodes would also work well here, since the problem lends itself to parallel processing (again, see Bitcoin mining pools).
If you have a multi-core CPU, an easy win is to use one thread per core. That should give a roughly linear speedup (which may still not be fast enough).
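For example, a minimal sketch of a parallel brute-force search (assuming Apache Commons Codec is on the classpath, as in your DigestUtils approach; the upper bound of the range is an arbitrary guess):

import org.apache.commons.codec.digest.DigestUtils;
import java.util.stream.LongStream;

public class SixZeroSearch {
    public static void main(String[] args) {
        String secret = "iwrupvqb";
        // Each candidate is hashed independently, so the work splits cleanly
        // across cores. findFirst() on the ordered range keeps the smallest hit.
        long answer = LongStream.rangeClosed(1, 100_000_000L)
                .parallel()
                .filter(i -> DigestUtils.md5Hex(secret + i).startsWith("000000"))
                .findFirst()
                .getAsLong();
        System.out.println(answer + " -> " + DigestUtils.md5Hex(secret + answer));
    }
}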

I'm also working through the Advent of Code, but I'm using PowerShell. I solved Puzzle 2 of Day 4 with basically the same code that I used to solve Puzzle 1; the only change was to the WHILE condition to check for six leading zeroes. It did take a lot longer to run than Puzzle 1. I just kicked it off and went to bed after I saw this post, and got my answer when I woke up. My code is posted on GitHub: Advent of Code Day 4 Puzzle 2 solution with PowerShell.

Related

Cyclomatic Complexity in Intellij

I was working on an assignment today that basically asked us to write a Java program that checks whether the HTML syntax in a text file is valid. It's a pretty simple assignment and I did it very quickly, but in doing it so quickly I made it very convoluted (lots of loops and if statements). I know I can make it a lot simpler, and I will before turning it in, but amid my procrastination I started downloading plugins and seeing what information they could give me.
I downloaded two in particular that I'm curious about - CodeMetrics and MetricsReloaded. I was wondering what exactly the numbers they generate correspond to. I saw one semi-similar post and read it along with the linked articles, but I'm still having some trouble understanding a couple of things: namely, what the first two columns (CogC and ev(G)) mean, and some more clarification on the other two (iv(G) and v(G)).
MetricsReloaded Method Metrics:
MetricsReloaded Class Metrics:
The previous numbers are from MetricsReloaded, but this other plugin, CodeMetrics, which also calculates cyclomatic complexity, gives slightly different numbers. I was wondering how these numbers correlate and whether someone could give a brief general explanation of all this.
CodeMetrics Analysis Results:
My final question is about time complexity. My understanding of cyclomatic complexity is that it is the number of possible paths of execution, determined by the number of conditionals and how they are nested. It doesn't seem like it would, but does this correlate in any way to time complexity? And if so, is there a conversion between them that can easily be done? If not, is there a way in either of these plugins (or any other for IntelliJ) to automate time complexity calculations?
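For instance, if my understanding is right, a small hypothetical method like this one (not from my assignment) would have v(G) = 3, since there is one base path plus one for each of the two if statements:

public class CyclomaticExample {
    // v(G) = 3: one base path plus one per decision point
    static String classify(int n) {
        if (n < 0) {
            return "negative";
        }
        if (n == 0) {
            return "zero";
        }
        return "positive";
    }

    public static void main(String[] args) {
        System.out.println(classify(-5)); // prints "negative"
    }
}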

Ambiguity in a CodeForces Problem - usage of HashSet Vs LinkedHashSet

I was solving a Codeforces problem yesterday. The problem's URL is this
I will just explain the question in short below.
Given a binary string, divide it into a minimum number of subsequences in such a way that each character of the string belongs to exactly one subsequence and each subsequence looks like "010101..." or "101010..." (i.e. the subsequence should not contain two adjacent zeros or ones).
Now, for this problem, I had submitted a solution yesterday during the contest. This is the solution. It was accepted provisionally, but on the final test cases it got a Time Limit Exceeded status.
So today I submitted another solution, and this one passed all the cases.
In the first solution I used HashSet, and in the second one I used LinkedHashSet. I want to know: why didn't HashSet clear all the cases? Does this mean I should use LinkedHashSet whenever I need a Set implementation? I saw this article and found that HashSet performs better than LinkedHashSet. So why doesn't my code work here?
This question would probably get more replies on Codeforces, but I'll answer it here anyways.
After a contest ends, Codeforces allows other users to "hack" solutions by writing custom inputs to run on other users' programs. If the defending user's program runs slowly on the custom input, the status of their code submission will change from "Accepted" to "Time Limit Exceeded".
The reason why your code, specifically, changed from "Accepted" to "Time Limit Exceeded" is that somebody created an "anti-hash test" (a test on which your hash function results in many collisions) on which your program ran slower than usual. If you're interested in how such tests are generated, you can find several posts on Codeforces, like this one: https://codeforces.com/blog/entry/60442.
As linked by @Photon, there's a post on Codeforces explaining why you should avoid using Java's HashSet and HashMap: https://codeforces.com/blog/entry/4876, which is essentially due to anti-hash tests. In some instances, adding the extra log(n) factor from a balanced BST may not be so bad (by using TreeSet or TreeMap). In many cases, an extra log(n) factor won't make your code time out, and it gives you protection from anti-hash tests.
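For example, swapping in a tree-based set is usually a one-line change (a minimal sketch, not the original submissions; the variable name is made up):

import java.util.Set;
import java.util.TreeSet;

public class AntiHashSafe {
    public static void main(String[] args) {
        // TreeSet (a red-black tree) guarantees O(log n) add/contains in the
        // worst case, regardless of the values inserted, so an adversarial
        // anti-hash input cannot degrade it the way it can a HashSet.
        Set<Integer> availableSubsequences = new TreeSet<>();
        for (int i = 0; i < 5; i++) {
            availableSubsequences.add(i);
        }
        System.out.println(availableSubsequences.contains(3)); // true
    }
}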
How do you determine whether your algorithm is fast enough to add the log(n) factor? I guess this comes with some experience, but most people suggest performing some sort of calculation. Most online judges (including Codeforces) show the time that your program is allowed to run on a particular problem (usually somewhere between one and four seconds), and you can use 10^9 constant-time operations per second as a rule of thumb when performing calculations.
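For instance, as a rough illustrative calculation: with n = 2*10^5 elements and a 2-second limit, an O(n log n) solution performs on the order of 2*10^5 * 18 ≈ 3.6*10^6 basic operations, far below the ~2*10^9 budget, so the extra log factor from a TreeSet is easily affordable; an O(n^2) approach at roughly 4*10^10 operations clearly is not.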

Neural network to solve a card problem

The problem: I have 10 cards with values 1 to 10. I have to arrange the cards so that adding 5 of the cards gives me 36 and the product of the remaining 5 cards gives me 360.
I successfully wrote a GA to solve this card problem in Java. Now I am thinking of solving the same problem with a neural network. Is it possible to solve this with a NN? What approach should I take?
This problem is hard to solve directly with a Neural Network. Neural Networks are not going to have a concept of sum or product, so they won't be able to tell the difference between a valid and invalid solution directly.
If you created enough examples and labelled them, then the neural network might be able to learn to tell the "good" and "bad" arrangements apart just by memorising them all. But it would be a very inefficient and inaccurate way of doing this, and it would be somewhat pointless - you'd have to have a separate program that knew how to solve the problem in order to create the data to train the neural network.
P.S. I think you are a bit lucky that you managed to get the GA to work as well - I suspect it only worked because the problem is small enough for the GA to try most of the possible solutions in the vicinity of the answer(s) and hence it stumbles upon a correct answer by chance before too long.
To follow up on @mikera's comments on why neural networks (NNs) might not be best for this task, it is useful to consider how NNs are usually used.
A NN is usually used in a supervised learning task. That is, the implementer provides many examples of input and the correct output that goes with that input. The NN then finds a general function which captures the provided input/output pairs and hopefully captures many other previously unseen input/output pairs as well.
In your problem you are solving a particular optimization, so there isn't much training to be done. There is just one right answer (or a handful of them). So NNs aren't really designed for such problems.
Note that a NN's lack of a built-in concept of sum or product doesn't necessarily hurt it. You could create your own input layer with sum and product features so that the NN can learn directly from them. But in this problem that won't help very much.
Note also that your problem is so small that even a naive enumeration of all 10! = 3,628,800 orderings of the cards should finish in a few seconds at most.
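For example, a minimal sketch of such an enumeration (here over the 2^10 bitmask splits of the cards rather than full permutations, since only which pile each card lands in matters):

public class CardSplit {
    public static void main(String[] args) {
        // Try every way to assign each card 1..10 to the "sum" pile (bit set)
        // or the "product" pile (bit clear), keeping only the 5/5 splits.
        for (int mask = 0; mask < (1 << 10); mask++) {
            if (Integer.bitCount(mask) != 5) continue;
            int sum = 0;
            long product = 1;
            for (int card = 1; card <= 10; card++) {
                if ((mask & (1 << (card - 1))) != 0) {
                    sum += card;
                } else {
                    product *= card;
                }
            }
            if (sum == 36 && product == 360) {
                System.out.println("sum-pile bitmask: " + Integer.toBinaryString(mask));
            }
        }
    }
}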

Time for A* algorithm to solve a 8 tile sliding puzzle

Just wondering if anyone could help me out with some code that I'm currently working on for uni. It's a sliding tile puzzle that I'm coding, and I've implemented an A* algorithm with a Manhattan distance heuristic. At the moment the time for it to solve the puzzle can range from a few hundred milliseconds up to about 12 seconds for some configurations. What I was wanting to know is if this range in time is what I should be expecting?
I've never really done any AI before and I'm having to learn this on the fly, so any help would be appreciated.
What I was wanting to know is if this range in time is what I should be expecting?
That's a little hard to figure out just from the information you've provided. It would help if you could describe how you implemented A*, or if you profiled your application and needed help with specific areas that were slow.
One thing to note that'd probably speed up your average solution time: Half of the starting positions of any n-tile puzzle can never lead to a solution, so you can immediately exclude certain configurations very quickly. For example, you cannot solve an 8-tile puzzle that looks like this:
1 2 3
4 5 6
8 7 .
To see why, note that because the blank space has to wind up back where it started, the overall number of "up" moves must equal the number of "down" moves, and likewise for "left" and "right". That means the overall number of moves must be even.
Each move, however, is a single transposition (the blank swaps places with one tile), so an even number of moves can only produce an even permutation of the tiles. But the board shown is just one transposition (7 and 8 swapped) away from the solved state, with the blank already in its home position - an odd permutation. So this puzzle can't be solved. (If we made two transpositions, it would be solvable again.)
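In code, this parity check amounts to counting inversions; a hedged sketch for the 3x3 case (assuming the standard goal 1-8 with the blank last):

public class Solvability {
    // For a board of odd width, a state is solvable iff the number of
    // inversions among the tiles (ignoring the blank, here 0) is even.
    static boolean isSolvable(int[] tiles) { // row-major order, 0 = blank
        int inversions = 0;
        for (int i = 0; i < tiles.length; i++) {
            for (int j = i + 1; j < tiles.length; j++) {
                if (tiles[i] != 0 && tiles[j] != 0 && tiles[i] > tiles[j]) {
                    inversions++;
                }
            }
        }
        return inversions % 2 == 0;
    }

    public static void main(String[] args) {
        // The board above (1 2 3 / 4 5 6 / 8 7 .) has one inversion, so it is unsolvable.
        System.out.println(isSolvable(new int[] {1, 2, 3, 4, 5, 6, 8, 7, 0})); // false
    }
}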
As you should know, you can't expect any general running time. It always depends on the code itself, in particular how deep your implementation walks down the search tree and whether your code can take advantage of processor features.
For debugging I would save or print out (but this takes time!) which level of the tree you are currently in.
Also remember that the weights are very important. E.g., with this final state:

1 2 3
4   6
7 8 9

changing

2 1 3
4   6
7 8 9

into it is much more expensive than changing

1   3
4 2 6
7 8 9
I hope that helps.
Obviously, this depends not only on your hardware, but on your implementation.
It's not a good measure of performance, though: What you want to do is determine the effective branching factor of your heuristic, vs the actual branching factor of some other non-heuristic approach.
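For reference, the effective branching factor b* is the value satisfying N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d, where N is the number of nodes A* generated and d is the solution depth; a heuristic whose b* stays close to 1 across many instances is doing its job well.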
I don't want to say too much more, since this is a homework problem, but if memory serves, Russell and Norvig cover this in the context of the sliding puzzle itself... chapter three, perhaps? (My copy of R&N is not at hand.)

Help with Project Euler #200? [duplicate]

This question already has an answer here: Need help solving Project Euler problem 200
Closed 13 years ago.
Similar to this question: Project Euler Problem 200.
I wrote up a brute force solution in Java that takes several hours to run, and produced the first 500+ sqube numbers, which I thought should be enough. However, none of the answers from 190 to 210 seems to be the correct answer.
I'm wondering what I'm doing wrong here and how I could optimize this. Could the problem lie in BigInteger.isProbablePrime()?
I'm not sure if Stack Overflow is the best place to ask this, but I seem to be stuck. I've included my code and the generated data.
I'd really appreciate it if someone would give me some hints or pointers.
Edit: I've run the program again simply with using the first 500,000 prime numbers; took a day to run but produced the correct answer.
I'm a Project Euler administrator. Please do not post information that can spoil the problem for others, particularly code and answers, even half-functioning code. Please edit your question accordingly. EDIT: Thank you for doing so!
It's not unusual for solvers to use the web to search for information on solving a problem, and it would take away some fun if they stumbled upon such a spoiler. (Yes, I know there are sites with lots of ready-made solutions, but at least they're generally for the lower-numbered, easier problems.)
We have forums for discussing difficulties with problems and getting hints, which are aggressively edited for spoilers.
Aren't you supposed to think of a clever solution which doesn't take a day or even an hour to run? :D
I think the problem is isProbablePrime, which doesn't guarantee that a number is prime. It only says that the number it found is prime with a certain probability.
You should use an algorithm that is certain it has found a prime.
The first answer is incorrect because isProbablePrime isn't always correct (hence the "Probable"). It's slow, in part, because you are using BigInteger. All the values involved will fit in a long, so why not use a long?
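For example, a deterministic check for long values could be as simple as trial division (a sketch; fine for values up to around 10^12 or so, while larger ranges would want a deterministic Miller-Rabin instead):

public class PrimeCheck {
    // Deterministic primality test by trial division up to sqrt(n).
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d * d <= n; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(1_000_000_007L)); // true
    }
}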
There may be some simple refactorings you could do that would save some time. It seems that the bulk of the time is spent in the nested for loop, which is O(n^2). You could reduce the complexity by not nesting loops. Another problem is that you are finding more potential results than are required: you only need to find 200. The reason you need to find more is that you are not generating the potential results in numeric order. The TreeSet does keep the results in order, but the algorithm would be faster if it could stop as soon as the 200th result was found.
