How can I simulate erroneous bit transmission through a wire? - java

So I have the following homework, but I don't understand exactly what the process is. Has anyone seen this question before, or does anyone actually understand what the logic should be? I don't want code, I know how to program, but I don't exactly know what to do here.
Consider a wire across which data is transmitted bit-by-bit. Occasionally, a bit or a group of
consecutive bits is transmitted incorrectly. If the previous bit was transmitted correctly, the
probability that the current bit is transmitted incorrectly is 0.1. If the previous bit was
transmitted incorrectly, the probability that the current bit is also transmitted incorrectly
is 0.3. Write a program called BitError.java that simulates the transmission of one million
bits and prints out the percentage of bits transmitted incorrectly.
(Hint: According to theory, the expected answer is 12.5%.)

You test whether an event with a given probability happens as follows:
Generate a uniform random number between 0 and 1
If the number generated is less than the probability of the event happening, then the event happened
Your code should look something like this
// Assume the bit before the first one was transmitted correctly
bool bPreviousWasWrong = false;
Loop 1 million times
    double r = RandUnif(0,1) // get a random number between 0 and 1
    if bPreviousWasWrong then
        // if the previous bit was wrong, the current bit is wrong with prob 0.3
        if (r < 0.3) then
            Set bPreviousWasWrong to true
            increment number of wrong bits
        else
            Set bPreviousWasWrong to false
            increment number of correct bits
        end if
    else
        // if the previous bit was correct, the current bit is wrong with prob 0.1
        if (r < 0.1) then
            Set bPreviousWasWrong to true
            increment number of wrong bits
        else
            Set bPreviousWasWrong to false
            increment number of correct bits
        end if
    end if
end loop
Display results when done

They want you to write a simulator. You make a loop which does one million iterations, each iteration representing the transmission of one bit. Every time you decide randomly if the bit gets transmitted correctly or incorrectly, based on the two rules, and keep count.
At the end, your simulation will tell you how many bits were transmitted correctly (which should apparently be close to 87.5%).
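A minimal sketch of that simulation in Java (the class name BitError comes from the assignment; the rest is just one possible way to write it):
import java.util.Random;

public class BitError {
    public static void main(String[] args) {
        Random random = new Random();
        int wrongBits = 0;
        boolean previousWasWrong = false; // treat the bit before the first as correct

        for (int i = 0; i < 1_000_000; i++) {
            // the error probability depends on whether the previous bit was wrong
            double errorProbability = previousWasWrong ? 0.3 : 0.1;
            boolean currentIsWrong = random.nextDouble() < errorProbability;
            if (currentIsWrong) {
                wrongBits++;
            }
            previousWasWrong = currentIsWrong;
        }

        System.out.printf("Incorrect: %.2f%%%n", 100.0 * wrongBits / 1_000_000);
    }
}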

OK, so you need to "transmit" the data bit by bit and apply the appropriate probability on each iteration.
Let's treat the first bit as if the bit before it had been transmitted correctly. This means the first bit is transmitted correctly with probability 0.9.
Next iteration:
If bit 1 was transmitted correctly, use probability 0.9 for a correct transmission; otherwise use 0.7.

That's why it's called homework... it's designed so that you don't exactly know what to do.
The problem relates to recursion and iteration. Google: Recursion. Given the current state (whether or not the previous bit was transmitted correctly) you can calculate the probability that the current bit is transmitted correctly. After that, it's simple probability (e.g. multiplication) to get 12.5%. You may even be able to do it without looping through all the bits, depending on how much statistics you know.
At the end you should know all about recursion. That's what the assignment is really about. What is your base case (i.e. the first bit) and what is your recursive step (i.e. each bit thereafter)? Once you understand that, writing the Java should be easy.
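For reference, the 12.5% in the hint can be checked by hand: if p is the long-run fraction of incorrect bits, then p = 0.1 * (1 - p) + 0.3 * p, which gives 0.8p = 0.1, so p = 0.125, i.e. 12.5%.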

Related

Generating random integers uniformly in log space

I want to generate random integers which are uniformly distributed in log space. That is, the log of the values will be uniformly distributed.
A normal uniformly distributed unsigned int will have 75% of its magnitudes above 1 billion, and something like 99.98% above 1 million, so small values are underrepresented. A uniform value from log space would have the same number of values in the range 4-8, as 256-512, for example.
Ignoring negative values for now, one way I can think of is something like:
Random r = new Random();
return (int)Math.pow(2, r.nextDouble() * 31);
That should generate a log-uniformly distributed 31-bit value. It's not going to be fast though, with a pow() operation in there, and introducing floating-point values to generate integers is a bit of a smell. Furthermore, a lot of the range of double is lost by Random.nextDouble(), and it is not clear to me whether this code can even generate all 2^31-1 positive integer values.
Better solutions welcome.
There are two similar solutions below which both involve filling the integer with random bits, then shifting a random number of bits to the right. Something like:
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
This has two types of bias:
Step-wise bias
This produces sort of a stepwise log distributed value, not a smooth one. In particular, the right shift by a random value in [0,31], means there are 31 equally probable "sizes" of integers, and every value in that range is equally probable. Since there are 2^N values in range N, the values in one range are twice as probable as the ones in the next - so you get log behavior between the ranges, but the ranges themselves are flat.
I don't know of an easy way to get rid of this bias.
Top bit bias
A second form of bias occurs because the MSB is not always 1 (e.g., even a shift amount of 10 doesn't necessarily produce a 31-10=21 bit value), so there is an additional distortion. In effect, the ranges overlap. The value 1 is not just present (with p(1)=0.5) for a shift amount of 30, but also for shifts of 29 (p(1)=0.25), 28 (p(1)=0.125), and so on. That effect cancels out for smaller values (i.e., if you look at shift amounts of 30 and 29 only, 1 seems like it is 3x more likely than 2, rather than the predicted value of 2x, but once you look at more values it converges). It doesn't cancel out for large values, however, which is why you see the 20:32207 bucket be smaller than the others in #sprinter's answer.
I think this form of bias can pretty easily be removed by forcing the top bit of the 31-bit value to one before shifting, so something like:
(r.nextInt(0x40000000) | 0x40000000) >> r.nextInt(31)
This has a couple of other tweaks: it uses a max of 2^30 for the rand, which is faster (nextInt(int) has a special case for powers of 2), and we don't need the random draw to set the second-from-MSB bit anyway, since we force it to 1. This also eliminates a microscopic additional source of bias: Integer.MAX_VALUE could never be generated by nextInt(Integer.MAX_VALUE), so one value was missing from full representation.
It shifts by [0,31) bits, so you never get zero. If you want zeros too, change that to shift by [0,32) bits and you'll get zeros equal in frequency to ones (technically not log-distributed anymore, but useful in many cases). Another approach is to subtract one from the final value to get zeros (at the cost of never getting Integer.MAX_VALUE).
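Wrapped up as a method (the method name here is mine, purely for illustration), the suggestion above might look like:
// r is a java.util.Random; returns a roughly log-uniform positive int in [1, Integer.MAX_VALUE]
static int nextLogUniform(Random r) {
    int value = r.nextInt(0x40000000) | 0x40000000; // 31-bit value with its top bit forced to 1
    return value >> r.nextInt(31);                  // shift right by 0..30 bits
}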
Incorrect answer provided for information only. This does not satisfy OP's requirements for the reasons given in the question.
int number = rand.nextInt(Integer.MAX_VALUE) >> rand.nextInt(Integer.SIZE);
My informal test of that seems to indicate there is the expected skew. I generated 1M numbers this way and had the following distribution of the log (ignoring zeros)
0:46819
1:47045
2:40663
3:44001
4:45306
5:43802
6:46447
7:43355
8:47366
9:42747
10:46387
11:43899
12:45179
13:45496
14:44431
15:46751
16:43055
17:47127
18:41243
19:41837
20:32207
21:11965

Double as close to 0 as possible?

I need a value as close to 0 as possible. I need to be able to divide by this value, but it should be effectively 0.
Does Java provide an easy way of generating a double with only the least significant bit set? Or do I have to calculate it myself?
//EDIT: A little background information, because someone requested it. I know that my solution is not a particularly clean one, but here you are:
I am writing a program for homework. It calculates the resistance of a circuit consisting of multiple resistors in parallel and serial circuits.
It is a 2nd year programming class. Our teacher still designs classes for us, we need to implement them according to his design.
Parallel circuits involve calculating 1/resistance, therefore my program prohibits the creation of resistors with 0 Ohm. Physics tells you that this is impossible anyway (there is always at least a tiny resistance in every metal).
However, the example circuit we should use to test the program contains a 0 Ohm resistor. It is placed in a serial circuit, but resistors do not know where they are (the teacher designed it that way), so I cannot change my program to allow resistors with 0 Ohm resistance in serial circuits only.
Two solutions:
Allow 0 Ohm resistors in any case - if division by 0 occurs, well, bad luck
Set the resistor not to 0, but to a resistance one can neglect.
Neither is very good; the first did not seem right to me, and neither did the second, but I had to decide.
It was more or less a random choice, and it exposed this problem. I could not let it go without solving it, so switching to the first option was not possible anymore ;-)
Use Double.MIN_VALUE:
A constant holding the smallest positive nonzero value of type double, 2^-1074. It is equal to the hexadecimal floating-point literal 0x0.0000000000001P-1022 and also equal to Double.longBitsToDouble(0x1L).
If you would like to divide by "zero" you can actually just use Double.POSITIVE_INFINITY as the result.
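Be aware of what actually dividing by Double.MIN_VALUE does, though; a quick illustrative check (the printed values reflect my understanding of the IEEE 754 double range):
double epsilon = Double.MIN_VALUE;   // 4.9E-324, the smallest positive double
System.out.println(1.0 / epsilon);   // Infinity: the true quotient (~2E323) exceeds Double.MAX_VALUE
System.out.println(1.0 / 0.0);       // Infinity as well; floating-point division by zero does not throw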

Java - normalize and denormalize nominal attributes in neural networks

Hi, I am building a simple multilayer network which is trained using back propagation. My problem at the moment is that some attributes in my dataset are nominal (non-numeric) and I have to normalize them. I wanted to know what the best approach is. I was thinking along the lines of counting how many distinct values there are for each attribute and assigning each an evenly spaced number between 0 and 1. For example, suppose one of my attributes had values A to E; would the following be suitable?
A = 0
B = 0.25
C = 0.5
D = 0.75
E = 1
The second part to my question is denormalizing the output to get it back to a nominal value. Would I first do the same as above to each distinct output attribute value in the dataset in order to get a numerical representation? Also after I get an output from the network, do I just see which number it is closer to? For example if I got 0.435 as an output and my output attribute values were assigned like this:
x = 0
y = 0.5
z = 1
Do I just find the nearest value to the output (0.435) which is y (0.5)?
You can only do what you are proposing if the variables are ordinal and not nominal, and even then it is a somewhat arbitrary decision. Before I suggest a solution, a note on terminology:
Nominal vs ordinal variables
Suppose A, B, etc. stand for colours. These are the values of a nominal variable and cannot be ordered in a meaningful way. You can't say red is greater than yellow. Therefore, you should not be assigning numbers to nominal variables.
Now suppose A, B, C, etc. stand for garment sizes, e.g. small, medium, large, etc. Even though we are not measuring these sizes on an absolute scale (i.e. we don't say that small corresponds to a chest circumference of 40), it is clear that small < medium < large. With that in mind, it is still somewhat arbitrary whether you set small=1, medium=2, large=3, or small=2, medium=4, large=8.
One-of-N encoding
A better way to go about this is to use the so-called one-out-of-N encoding. If you have 5 distinct values, you need five input units, each of which can take the value 1 or 0. Continuing with my garments example, size extra small can be encoded as 10000, small as 01000, medium as 00100, etc.
A similar principle applies to the outputs of the network. If we treat garment size as an output instead of an input, when the network outputs the vector [0.01 -0.01 0.5 0.0001 -.0002], you interpret that as size medium.
In reply to your comment on #Daan's post: if you have 5 inputs, one of which takes 20 possible discrete values, you will need 24 input nodes. You might want to normalise the values of your 4 continuous inputs to the range [0, 1], because they may end up dominating your discrete variable.
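A minimal sketch of encoding and decoding with one-out-of-N, assuming the category values are known up front (all names here are mine, purely for illustration):
String[] sizes = {"XS", "S", "M", "L", "XL"};

// Encode a nominal value as a vector containing a single 1.
static double[] encode(String value, String[] categories) {
    double[] vector = new double[categories.length];
    for (int i = 0; i < categories.length; i++) {
        if (categories[i].equals(value)) {
            vector[i] = 1.0;
        }
    }
    return vector;
}

// Decode a network output by taking the index with the largest activation.
static String decode(double[] output, String[] categories) {
    int best = 0;
    for (int i = 1; i < output.length; i++) {
        if (output[i] > output[best]) {
            best = i;
        }
    }
    return categories[best];
}
With the garment example, encode("M", sizes) gives [0, 0, 1, 0, 0], and decode applied to [0.01, -0.01, 0.5, 0.0001, -0.0002] returns "M".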
It really depends on the meaning of the attributes you're trying to normalize, and the functions used inside your NN. For example, if your attributes are non-linear, or if you're using a non-linear activation function, then linear normalization might not end up doing what you want it to do.
If the ranges of attribute values are relatively small, splitting the input and output into sets of binary inputs and outputs will probably be simpler and more accurate.
EDIT:
If the NN was able to accurately perform its function, one of the outputs will be significantly higher than the others. If not, you might have a problem, depending on when you see inaccurate results.
Inaccurate results during early training are expected. They should become less and less common as you perform more training iterations. If they don't, your NN might not be appropriate for the task you're trying to perform. This could be simply a matter of increasing the size and/or number of hidden layers. Or it could be a more fundamental problem, requiring knowledge of what you're trying to do.
If you've successfully trained your NN but are seeing inaccuracies when processing real-world data sets, then your training sets were likely not representative enough.
In all of these cases, there's a strong likelihood that your NN did something entirely different than what you wanted it to do. So at this point, simply selecting the highest output is as good a guess as any. But there's absolutely no guarantee that it'll be a better guess.

Normalized Iteration Count does not work. What am I doing wrong?

As you can see from the title, I'm busy programming a little program for visualizing fractals in Java. Anybody who deals with fractals will come to the point where they search for a solution to get rid of these stupid "bands" that appear when you just colour a pixel by the number of iterations it took to escape.
So I searched for a more advanced colouring algorithm, finding the "normalized iteration count". The formula I'm using is:
float loc = (float) 1 - Math.log(Math.log(c.abs())) / Math.log(2);
Everybody on the Internet is so happy about this algorithm, everybody uses it, everybody gets great results. Except me. I thought this algorithm should provide a float between 0 and 1, but that doesn't happen. I did some calculations and came to the conclusion that this algorithm only works for c.abs() >= Math.E && c.abs() <= Math.exp(2) (that is, Math.E * Math.E).
In numbers this means, my input into this equation has to be between about 2.718 and 7.389.
But a complex number c is considered to tend towards infinity when its magnitude gets greater than 2. Yet for any input smaller than Math.E, I get a value greater than one, and for any number greater than Math.exp(2), it gets negative. That is the case when a complex number escapes really fast.
So please tell me: what am I doing wrong? I'm desperate.
Thanks.
EDIT:
I was wrong: the code I posted is correct, I just
1. used it the wrong way and so it didn't provide the right output.
2. had to set the bailout value of the mandelbrot/julia algorithm to 10, otherwise I would've got stupid bands again.
Problem solved!
As you've already discovered, you need to increase the bailout radius before smoothing will look right.
Two is the minimum length that a coordinate can have such that when you square it and add the initial value, it cannot result in a smaller length. If the previous length was 2.0, and you squared it, you'd have a length of 4.0 (pointing in whichever direction), and the most that any value of c could reduce that by is 2.0 (by pointing in precisely the opposite direction). If c were larger than that then it would start to escape right away.
Now, to estimate the fractional part of the number of iterations we look at the final |z|. If z had simply been squared and c not added to it, then it would have a length between 2.0 and 4.0 (the new value must be larger than 2.0 to bail out, and the old value must have been less than 2.0 to have not bailed out earlier).
Without c, taking |z|'s proportional position between 2 and 4 gives us a fractional part of the number of iterations. If |z| is close to 4 then the previous length must have been close to 2, so it was already close to bailing out in the previous iteration and the smoothed result should be close to the previous iteration count to represent that. If it's close to 2, then the previous iteration was further from bailing out, and so the smoothed result should be closer to the new iteration count.
Unfortunately c messes that up. The larger c is, the larger the potential error is in that simple relationship. Even if the old length was nearly at 2.0, it might have landed such that c's influence made it look like it must have been smaller.
Increasing the bailout mitigates the effect of adding c. If the bailout is 64 then the resulting length will be between 64 and 4096, and c's maximum offset of 2 has a proportionally much smaller impact on the result.
You have left out the iteration value, try this:
float loc = <iteration_value> + (float) 1 - Math.log(Math.log(c.abs())) / Math.log(2);
The iteration_value is the number of iterations which yielded c in the formula.
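Putting that together, a minimal sketch of the escape loop with the smoothed value (this assumes a Complex class with abs(), multiply() and add() methods, as the c.abs() in the question suggests; the bailout radius is a parameter, e.g. the 10 mentioned in the edit above):
// Returns the smoothed (fractional) iteration count for one point c.
static double smoothIterations(Complex c, int maxIterations, double bailout) {
    Complex z = new Complex(0, 0);
    int n = 0;
    while (n < maxIterations && z.abs() < bailout) {
        z = z.multiply(z).add(c); // z = z^2 + c
        n++;
    }
    if (n == maxIterations) {
        return n; // the point never escaped, treat it as inside the set
    }
    // iteration count plus the fractional correction from the formula above
    return n + 1 - Math.log(Math.log(z.abs())) / Math.log(2);
}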

How does java handle integer overflow and underflow?

I know this is an old question, asked many times, but I am not able to find any satisfactory answer for it, hence asking again.
Can someone explain what exactly happens in case of integer overflow and underflow?
I have heard about some 'lower order bytes' which handle this; can someone explain what that is?
Thanks!
You could imagine that you have only 2 bit positions and you are counting up (adding 1 each time):
00
01
10
11
100
But the last one gets cut down to "00" again. So there is your "overflow". You're back at 00. Now depending on what the bits mean, this can mean several things, but most of the time this means you are going from the highest value to the lowest. (11 to 00)
Mark Peters adds a good point in the comments: even without overflowing all the bits you'll have a problem, because the first bit is used as the sign bit, so you'll go from the highest value to the lowest without losing that bit. You could say that bit is 'separate' from the others.
Java wraps the number around to the minimum or maximum integer (depending on whether it is overflow or underflow).
So:
System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
System.out.println(Integer.MIN_VALUE - 1 == Integer.MAX_VALUE);
prints true twice.
It basically handles them without throwing an exception: the 2's complement arithmetic is performed without concern for overflow or underflow, and the result is whatever those mechanics produce (well-defined, but arithmetically incorrect).
This means that the bits which over or underflow are simply chopped, and that Integer.MIN_VALUE - 1 typically returns Integer.MAX_VALUE.
As far as "lower order bytes" being a workaround, they really aren't. What is happening when you use Java bytes to do the arithmetic is that they get expanded into ints, the arithmetic is generally performed on the ints, and the end result is likely to be completely contained in the returned it as it has far more storage capacity than the starting bytes.
Another way to think of how Java handles overflow/underflow is to picture an analog clock. You can move it forward an hour at a time, but eventually the hours will start over again. You can wind the clock backward, but once you go past the start you are at the end again.
