Printing very big BigIntegers - java

I'm trying to figure out the following issue related to BigIntegers in Java 7 x64. I am attempting to calculate a number to an extremely high power. Code is below, followed by a description of the problem.
import java.math.BigInteger;
public class main {
public static void main(String[] args) {
// Demo calculation; Desired calculation: BigInteger("4096").pow(800*600)
BigInteger images = new BigInteger("2").pow(15544);
System.out.println(
"The number of possible 16 bpc color 800x600 images is: "
+ images.toString());
}
}
I am encountering issues printing the result of this operation. When this code executes it prints the message but not the value of images.toString().
To isolate the problem I started calculating powers of two instead of the desired calculation listed in the comment on that line. On the two systems I have tested this on, 2^15544 is the smallest calculation that triggers the problem; 2^15543 works fine.
I'm nowhere close to hitting the memory limit on the host systems and I don't believe that I am even close to the VM limit (at any rate running with the VM arguments -Xmx1024M -Xms1024M has no effect).
After poking around the internet looking for answers I have come to suspect that I am hitting a limit in either BigInteger or String related to the maximum size of an array (Integer.MAX_VALUE) that those types use for internal data storage. If the problem is in String I think it would be possible to extend BigInteger and write a print method that spews out a few chars at a time until the entire BigInteger is printed, but I rather suspect that the problem lies elsewhere.
Thank you for taking the time to read my question.

The problem is a bug in the Console view in Eclipse.
On my setup, Eclipse (Helios and Juno) can't show a single line longer than 4095 characters without CRLF. The maximum length can vary depending on your font choice - see below.
Therefore, even the following code will show the problem - there's no need for a BigInteger.
StringBuilder str = new StringBuilder();
for (int i = 0; i < 4096; i++) {
str.append('?');
}
System.out.println(str);
That said, the string is actually printed in the console - you can for instance copy it out of it. It is just not shown.
As a workaround, you can enable the Fixed width console setting in the Console preferences; the string will appear immediately.
The corresponding bugs on Eclipse's bugzilla are:
Display problem in console when a line reaches 4096 characters
Texteditor can't show a line with more than 4095 chars. Limit at 4096 chars.
Long lines are not displayed by editor
According to those, it's a Windows/GTK bug and Eclipse's developers can't do anything about it.
The bug is related to the length of the text in pixels; use a smaller font and you will be able to get more characters in the text before it breaks.
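The question's idea of printing a few characters at a time also sidesteps the display limit, because no single line gets long enough to trigger it. A minimal sketch, assuming a wrap width of 80 columns (the class name ChunkedPrint, the wrap helper and the width are illustrative choices, not anything from Eclipse or the JDK):

```java
import java.math.BigInteger;

class ChunkedPrint {
    // Breaks a long string into lines of at most `width` characters.
    static String wrap(String text, int width) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < text.length(); start += width) {
            int end = Math.min(start + width, text.length());
            out.append(text, start, end).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BigInteger images = new BigInteger("2").pow(15544);
        System.out.println("The number of possible images is:");
        // 80 columns is an arbitrary choice; anything well below the limit works.
        System.out.print(wrap(images.toString(), 80));
    }
}
```

The digits are unchanged; only line breaks are added, so the output has to be re-joined if it is copied elsewhere.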

Related

java Program runtime is too fast? Issue with Memory

So I am running some simulations that require some sample datasets. For the sake of simplicity I am using this http://loremipsum.sourceforge.net/ Lorem Ipsum generator. I am setting a test parameter called DATASIZE that sets the amount of words or paragraphs this generator creates. I am using this generated data to create an "input" and "output" hash. The output data will use a slightly different hash. For example,
String input = hash(new LoremIpsum().getWords(DATASIZE))
String output = hash(new LoremIpsum().getWords(DATASIZE-2))
My question is, does Java keep the first data set in memory and then slightly modify it to quickly produce output? Maybe I was just pessimistic about the runtime, but it seems very small: virtually zero according to System.currentTimeMillis(). Could it be the jar?
I also noticed something odd with my output. I am creating several objects that store this input and output hash. On some of these that I generate, for some reason the runtime is 16. Otherwise it is 0. Something with memory or just shoddy code?
It uses StringBuilder. So answer to your question is NO. There is no reuse/cache in getWords(..). - https://sourceforge.net/p/loremipsum/code/HEAD/tree/trunk/src/main/java/de/svenjacobs/loremipsum/LoremIpsum.java
Having said that, if you give a really large number, say 1000000, then you may see a difference. I checked using my latest all-powerful MacBook Pro:
public static void main(String[] args) {
LoremIpsum loremipsum = new LoremIpsum();
long start;
int number = 100000;
for(int i=0;i<5;i++) {
start = System.currentTimeMillis();
loremipsum.getWords(number);
System.out.println("getWords():" +(System.currentTimeMillis()-start));
}
}
Output in ms
getWords():11
getWords():7
getWords():5
getWords():4
getWords():4
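The readings of exactly 0 or 16 in the question are consistent with the coarse granularity of System.currentTimeMillis(), whose Javadoc notes that the resolution depends on the operating system and can be tens of milliseconds. A sketch of the same measurement with System.nanoTime(), which is meant for elapsed-time measurement; the workload below is a stand-in (an assumption) so the example runs without the LoremIpsum jar:

```java
class NanoTiming {
    // Returns the elapsed time of one call to work.run(), in nanoseconds.
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with loremipsum.getWords(number) in real use.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append("lorem ");
            }
        };
        for (int i = 0; i < 5; i++) {
            System.out.printf("run %d: %.3f ms%n", i, timeNanos(work) / 1_000_000.0);
        }
    }
}
```

Later iterations often come out faster than the first, probably because of JIT warm-up, so a single timing of a short operation says little either way.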

How can I determine the width of a Unicode character

A friend and I are programming our own console in Java, but we have problems aligning the lines correctly, because the width of Unicode characters cannot be determined exactly. As a result, not only the line containing the Unicode characters but also the following lines are shifted.
Is there a way to determine the width of Unicode characters?
Screenshots of the problem can be found below.
This is how it should look: https://abload.de/img/richtigslkmg.jpeg
This is an example in Terminal: https://abload.de/img/terminal7dj5o.jpeg
This is an example in PowerShell: https://abload.de/img/powershelln7je0.jpeg
This is an example in Visual Studio Code: https://abload.de/img/visualstudiocode4xkuo.jpeg
This is an example in Putty: https://abload.de/img/putty0ujsk.png
EDIT:
I am sorry that the question was unclear.
It is about the display width, in the example I try to determine the display length to have each line the same length.
The function real_length is to calculate/determine and return the display width.
Here is the example code:
public static void main(String[] args) {
String[] tests = {
"Peter",
"SHGAMI",
"Marcel №1",
"💏",
"👨‍❤️‍👨",
"👩‍❤️‍💋‍👩",
"👨‍👩‍👦"
};
for(String test : tests) test(test);
}
public static void test(String text) {
int max = 20;
for(int i = 0; i < max;i++) System.out.print("#");
System.out.println();
System.out.print(text);
int length = real_length(text);
for(int i = 0; i < max - length;i++) System.out.print("#");
System.out.println();
}
public static int real_length(String text) {
return text.length();
}
Unfortunately there is no easy solution to your deceptively simple question, for several reasons:
The width of the characters being rendered on the console might (and probably will) vary, based on the font being used. So the code would need to determine, or assume, the target font in order to calculate widths.
System.out is just a PrintStream that does not know or care about fonts and character width, so any solution has to be independent of that.
Even if you could determine the font being used on the console, and you had a way to determine the width of each character you were trying to render in that specific font, how would that help you? Knowing the variation in widths might conceivably allow you to cleverly tweak the lines being rendered so that they were aligned, but it's just as likely that it wouldn't be practicable.
A potential solution is to leave your code as it stands, and use a monospaced font on the console that println() is writing to, but there are still some major problems with that approach. First, you need to identify a font that is monospaced, but will also support all of the characters you want to render. This can be problematic when including emojis. Second, even if you identify such a font, you may find that all the glyphs for that font are not monospaced! Such a font will ensure that (say) a lowercase i and an uppercase W have the same width, but you can't also make that assumption for emojis, and you can't even assume that the "monospaced" emojis will all have the same non-standard width! Third, the font you identify (if it exists at all) would have to be available in your target environments (your PowerShell, your friend's PuTTY shell, etc.). That is not a major obstacle, but it is one more thing to worry about.
You may find that the rendered text varies by operating system. Your output may look aligned in a Linux terminal window, but that same output, using the same font, might be misaligned in a PowerShell window.
Given all that, a better approach might be to use Swing or JavaFX, where you have finer control over the output being rendered. Even if you are unfamiliar with those technologies, it wouldn't take too long to get something working, just by tweaking some sample code obtained through a search. And even allowing for the learning curve, it would still take less time than coming up with a robust solution for aligning arbitrary characters written to an arbitrary console, because that is a hard problem to solve.
Notes:
Your real_length() method merely returns the number of char values (UTF-16 code units) in the supplied Java String. That relates to its internal representation, and has no direct correlation with the width of the rendered characters, which is determined by the font being used.
See Emoji exceed monospace character width, breaking column alignment #100730 where Microsoft have declined to address the issue for VS Code.
For SO question Java: how to align UTF Miscellaneous Symbols in plain text, see this answer which solved a similar but simpler problem, but only for the Command Prompt window on Windows.
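To see what String#length actually measures for the question's sample data, the following sketch prints both the char count and the code point count. The strings are built from code point numbers (U+1F48F for the kiss emoji; man, ZWJ, woman, ZWJ, boy for the family emoji) so the source file's encoding does not matter; the class and helper names are illustrative.

```java
class LengthDemo {
    // Builds a string from code points, avoiding emoji literals in the source.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String kiss = fromCodePoints(0x1F48F);
        String family = fromCodePoints(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F466);
        for (String s : new String[] {"Peter", kiss, family}) {
            System.out.println("chars=" + s.length()
                    + ", codePoints=" + s.codePointCount(0, s.length()));
        }
    }
}
```

This reports 5 and 5 for Peter, 2 and 1 for the kiss emoji, and 8 and 5 for the family emoji. Neither number is a display width: the family emoji is usually drawn as a single glyph.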
tl;dr
Use code points rather than char. Avoid calling String#length.
input
+
"#".repeat( targetLength - input.codePoints().toArray().length )
Details
Your Question neglected to show any code. So I can only guess what you are doing and what might be the problem.
Avoid char
I am guessing that your goal is to append a certain number of NUMBER SIGN characters as needed to make a fixed-length row of text.
I am guessing the problem is that you are using the legacy char type, or its wrapper class Character. The char type has been essentially broken since Java 2. As a 16-bit value, char is physically incapable of representing most characters.
Use code point numbers
Instead, use code point integer numbers when working with individual characters. A code point is the number permanently assigned to each of the over 140,000 characters defined in Unicode.
A variety of code point related methods have been added to various classes in Java 5+: String, StringBuilder, Character, etc.
Here we use String#codePoints to get an IntStream of code points, one element for each character in the source. And we use StringBuilder#appendCodePoint to collect the code points for our final result string.
final int targetLength = 10;
final int fillerCodePoint = "#".codePointAt( 0 ); // Annoying zero-based index counting.
String input = "😷🤠🤡";
int[] codePoints = input.codePoints().toArray();
StringBuilder stringBuilder = new StringBuilder();
for ( int index = 0 ; index < targetLength ; index++ )
{
if ( index < codePoints.length )
{
stringBuilder.appendCodePoint( codePoints[ index ] );
} else
{
stringBuilder.appendCodePoint( fillerCodePoint );
}
}
Or, shorten that for loop with the use of a ternary operator.
for ( int index = 0 ; index < targetLength ; index++ )
{
int codePoint = ( index < codePoints.length ) ? codePoints[ index ] : fillerCodePoint;
stringBuilder.appendCodePoint( codePoint );
}
Report result.
System.out.println( Arrays.toString( codePoints ) );
String output = stringBuilder.toString();
System.out.println( "output = " + output );
[128567, 129312, 129313]
output = 😷🤠🤡#######
There is likely a clever way to write that code more briefly with streams and lambdas, but I cannot think of one at the moment.
And, one could cleverly use the String#repeat method in Java 11+.
String output = input + "#".repeat( targetLength - input.codePoints().toArray().length ) ;
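One caveat with the String#repeat one-liner: repeat throws IllegalArgumentException for a negative count, which happens as soon as the input has more code points than the target length. A small guarded variant, as a sketch (the class name PadDemo and the helper padRight are made up for illustration; Java 11+ is assumed):

```java
class PadDemo {
    // Pads input on the right with filler until it has targetLength code points.
    // Inputs that are already long enough are returned unchanged.
    static String padRight(String input, int targetLength, String filler) {
        int count = input.codePointCount(0, input.length());
        int missing = Math.max(0, targetLength - count); // repeat() rejects negative counts
        return input + filler.repeat(missing);
    }

    public static void main(String[] args) {
        // Three emoji built from code points: U+1F637, U+1F920, U+1F921.
        String input = new String(new int[] {0x1F637, 0x1F920, 0x1F921}, 0, 3);
        System.out.println(padRight(input, 10, "#"));
        System.out.println(padRight("a string longer than ten", 10, "#"));
    }
}
```

This still counts code points, not rendered columns, so it carries the same alignment limits as the code above.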
Note: This answer is distinct and qualitatively different from my earlier one (which I still stand by).
There is a simple way for a Java application (i.e. one not using a graphical user interface) to obtain the width of a String being rendered in a given font with a given font size. It requires the use of some awt classes which are supported even in a non-AWT environment. Here's a demo using the data provided in the question:
package fixedwidth;
import java.awt.Canvas;
import java.awt.Font;
import java.awt.FontMetrics;
public class FixedWidth {
static String[] tests = {
"Peter", "SHGAMI", "Marcel №1", "💏", "👨‍❤️‍👨", "👩‍❤️‍💋‍👩", "👨‍👩‍👦"
};
static Font smallFont = new Font("Monospaced", Font.PLAIN, 10);
static Font bigFont = new Font("Monospaced", Font.BOLD, 24);
/**
* This code is based on an answer by SO user Lonzak.
* See SO Answer https://stackoverflow.com/a/18123024/2985643
*/
public static void main(String[] args) {
FontMetrics fm1 = new Canvas().getFontMetrics(FixedWidth.smallFont);
FixedWidth.demo(tests, fm1);
FontMetrics fm2 = new Canvas().getFontMetrics(FixedWidth.bigFont);
FixedWidth.demo(tests, fm2);
}
static void demo(String[] tests, FontMetrics fm) {
Font f = fm.getFont();
System.out.println("\nFont name:" + f.getName() + ", font size:" +
f.getSize() + ", font style:" + f.getStyle());
for (String test : tests) {
int width = fm.stringWidth(test);
System.out.println("width=" + width + ", data=" + test);
}
}
}
The code above is based on this old answer by user Lonzak to the question Java - FontMetrics without Graphics. Those AWT classes allow you to create a Font with defined characteristics (i.e. name, size, style), and then use a FontMetrics instance to obtain the width of an arbitrary String when using that font.
Here is the output from running the code shown above:
Font name:Monospaced, font size:10, font style:0
width=30, data=Peter
width=60, data=SHGAMI
width=59, data=Marcel №1
width=10, data=💏
width=30, data=👨‍❤️‍👨
width=40, data=👩‍❤️‍💋‍👩
width=30, data=👨‍👩‍👦
Font name:Monospaced, font size:24, font style:1
width=70, data=Peter
width=149, data=SHGAMI
width=140, data=Marcel №1
width=25, data=💏
width=73, data=👨‍❤️‍👨
width=98, data=👩‍❤️‍💋‍👩
width=74, data=👨‍👩‍👦
Notes:
The first set of results shows the widths of the sample data in the question when using plain Monospaced 10 point font. The second set of results shows the widths of those same strings when using bold Monospaced 24 point font.
The widths don't look correct for some of the emojis, but that is because when the source code and output results are pasted into SO some emoji representations are changed, presumably because of the different font being used in the browser. (I was using Monospaced for both the source and the output.) Here's a screen shot of the original output, showing that the widths at least look plausible:
Even though the widths are being calculated and rendered for a fixed width font (Monospaced), it's clear that the width of the emojis cannot be predicted from the widths of normal keyboard characters.
Sounds like you're looking for a Java implementation of the POSIX wcwidth and wcswidth functions, which implement the rules defined in Unicode Technical Report #11 (which exclusively focuses on display widths for Unicode codepoints when rendered to fixed width devices - terminals and the like). The only such Java implementation that I'm aware of is in the JLine3 library, which is a lot of code to bring in for just this one class, but that may be your best bet.
Note however that that code appears to be incomplete. Unicode codepoint 0x26AA (⚪️), for example, is reported as having a width of 1 by the JLine3 code, but on every platform I've tested on (including here in the StackOverflow editor, which is a fixed width "device") that codepoint is displayed over two columns.
Good luck - this stuff is a lot more complex than it looks. The JVM's unfortunate UCS-2 history (not Sun's fault - it was bad timing wrt the Unicode standard) only makes matters worse, and as others have said here, avoid the char and Character data types like the plague - they do not work the way you expect, and the instant code that uses those types encounters data including codepoints from the Unicode supplemental planes, it is almost certain to function incorrectly (unless the author has been especially careful - do you feel lucky? 😉).
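To give a feel for what a wcwidth-style function does, here is a deliberately crude sketch. It is not the JLine3 code and not a complete implementation of the Unicode width data: the ranges are a hand-picked subset chosen for illustration, and the class and method names are assumptions. It treats combining marks, format characters (including the zero-width joiner) and control characters as zero columns, a few East Asian and emoji blocks as two columns, and everything else as one.

```java
class ApproxWidth {
    // Very rough column width of a single code point (illustrative ranges only).
    static int codePointWidth(int cp) {
        int type = Character.getType(cp);
        if (type == Character.NON_SPACING_MARK || type == Character.ENCLOSING_MARK
                || type == Character.FORMAT || type == Character.CONTROL) {
            return 0; // combining marks, ZWJ, variation selectors, control codes
        }
        boolean wide = (cp >= 0x1100 && cp <= 0x115F)   // Hangul Jamo initial consonants
                || (cp >= 0x2E80 && cp <= 0xA4CF)       // CJK radicals through Yi
                || (cp >= 0xAC00 && cp <= 0xD7A3)       // Hangul syllables
                || (cp >= 0xF900 && cp <= 0xFAFF)       // CJK compatibility ideographs
                || (cp >= 0xFF00 && cp <= 0xFF60)       // fullwidth forms
                || (cp >= 0x1F300 && cp <= 0x1F64F)     // pictographs and emoticons
                || (cp >= 0x1F900 && cp <= 0x1F9FF);    // supplemental symbols
        return wide ? 2 : 1;
    }

    // Sums the per-code-point widths; ZWJ sequences are therefore over-counted.
    static int displayWidth(String text) {
        return text.codePoints().map(ApproxWidth::codePointWidth).sum();
    }

    public static void main(String[] args) {
        System.out.println(displayWidth("Peter"));
        System.out.println(displayWidth("\u65E5\u672C")); // two CJK ideographs
    }
}
```

Because each emoji in a joined sequence is counted separately, a terminal that draws the sequence as one glyph will still disagree with this number, which is exactly the kind of gap described above.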

Reading large files for a simulation (Java crashes with out of heap space)

For a school assignment, I need to create a Simulation for memory accesses. First I need to read 1 or more trace files. Each contains memory addresses for each access. Example:
0 F001CBAD
2 EEECA89F
0 EBC17910
...
Where the first integer indicates a read/write etc. then the hex memory address follows. With this data, I am supposed to run a simulation. So the idea I had was parse these data into an ArrayList<Trace> (for now I am using Java) with trace being a simple class containing the memory address and the access type (just a String and an integer). After which I plan to loop through these array lists to process them.
The problem is that even at the parsing stage, it is running out of heap space. Each trace file is ~200MB and I have up to 8, meaning a minimum of ~1.6 GB of data that I am trying to "cache". What baffles me is that I am only parsing 1 file and Java is using 2GB according to my task manager ...
What is a better way of doing this?
A code snippet can be found at Code Review
The answer I gave on codereview is the same one you should use here .....
But, because duplication appears to be OK, I'll duplicate the answer here.
The issue is almost certainly in the structure of your Trace class, and its memory efficiency. You should ensure that the instrType and hexAddress are stored as memory-efficient structures. The instrType appears to be an int, which is good, but just make sure that it is declared as an int in the Trace class.
The more likely problem is the size of the hexAddress String. You may not realise it but Strings are notorious for 'leaking' memory. In this case, you have a line and you think you are just getting the hexString from it... but in reality, the hexString contains the entire line.... yeah, really. For example, look at the following code:
public class SToken {
public static void main(String[] args) {
StringTokenizer tokenizer = new StringTokenizer("99 bottles of beer");
int instrType = Integer.parseInt(tokenizer.nextToken());
String hexAddr = tokenizer.nextToken();
System.out.println(instrType + hexAddr);
}
}
Now, set a break-point in (I use eclipse) your IDE, and then run it, and you will see that hexAddr contains a char[] array for the entire line, and it has an offset of 3 and a count of 7.
Because of the way that String.substring and similar constructs work (in JDK versions before 7u6, a substring shares the char[] array of the string it was cut from), short strings can keep huge amounts of memory alive... (in theory that memory is shared with other strings, though). As a consequence, you are essentially storing the entire file in memory!
At a minimum, you should change your code to:
hexAddr = new String(tokenizer.nextToken().toCharArray());
But even better would be:
long hexAddr = parseHexAddress(tokenizer.nextToken());
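The parseHexAddress helper is not shown in the answer. One plausible way to write it, as a sketch, is Long.parseLong with radix 16. A long is used because addresses such as F001CBAD are above Integer.MAX_VALUE, so Integer.parseInt(token, 16) would throw a NumberFormatException.

```java
class HexAddress {
    // Hypothetical helper: parses an 8-digit hex address into a long.
    static long parseHexAddress(String token) {
        return Long.parseLong(token, 16);
    }

    public static void main(String[] args) {
        long address = parseHexAddress("F001CBAD");
        System.out.println(Long.toHexString(address));
    }
}
```

Storing a primitive instead of a String removes the per-address String object and its character array entirely.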
Like rolfl I answered your question in the code review. The biggest issue, to me, is the reading everything into memory first and then processing. You need to read a fixed amount, process that, and repeat until finished.
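A rough sketch of that read-and-process-as-you-go approach: each line is parsed and handed to the processing step immediately, so nothing accumulates in a list. Counting accesses per type stands in for the real simulation step, and the class and method names are assumptions for illustration. A StringReader is used so the example runs on its own; for a real trace file, Files.newBufferedReader could supply the Reader.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

class StreamingTrace {
    // Reads "<type> <hexAddress>" lines one at a time and counts accesses per type.
    static long[] countByType(Reader source, int typeCount) {
        long[] counts = new long[typeCount];
        new BufferedReader(source).lines().forEach(rawLine -> {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                return; // skip blank lines
            }
            int space = line.indexOf(' ');
            int type = Integer.parseInt(line.substring(0, space));
            long address = Long.parseLong(line.substring(space + 1).trim(), 16);
            // A real simulator would feed (type, address) to its model here.
            counts[type]++;
        });
        return counts;
    }

    public static void main(String[] args) {
        String sample = "0 F001CBAD\n2 EEECA89F\n0 EBC17910\n";
        long[] counts = countByType(new StringReader(sample), 3);
        for (int type = 0; type < counts.length; type++) {
            System.out.println("type " + type + ": " + counts[type]);
        }
    }
}
```

With this shape the heap use stays roughly constant regardless of file size, as long as the simulation itself does not need to look back at earlier accesses.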
Try using the class java.nio.ByteBuffer instead of java.util.ArrayList<Trace>. It should also reduce the memory usage.
class TraceList {
private ByteBuffer buffer;
public TraceList(){
//allocate byte buffer
}
public void put(byte operationType, int address) {
//put data to byte buffer
}
public Trace get(int index) {
//get data from byte buffer by index
byte type = ...//read type
int address = ...//read address
return new Trace(type, address);
}
}
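One way the outline above could be filled in, as a sketch under stated assumptions: each record takes 5 bytes (1 byte for the access type, 4 bytes for the address), the capacity is fixed up front, and the accessors return primitives rather than building Trace objects. The class name PackedTraceList and its methods are illustrative.

```java
import java.nio.ByteBuffer;

class PackedTraceList {
    private static final int RECORD_BYTES = 5; // 1 byte type + 4 bytes address
    private final ByteBuffer buffer;
    private int size;

    PackedTraceList(int capacity) {
        buffer = ByteBuffer.allocate(capacity * RECORD_BYTES);
    }

    // Appends one record; throws BufferOverflowException once capacity is exceeded.
    void put(byte type, int address) {
        buffer.put(type).putInt(address); // relative puts advance the position
        size++;
    }

    byte typeAt(int index) {
        return buffer.get(index * RECORD_BYTES);
    }

    // Widened to long so addresses above 0x7FFFFFFF read back as unsigned values.
    long addressAt(int index) {
        return Integer.toUnsignedLong(buffer.getInt(index * RECORD_BYTES + 1));
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        PackedTraceList list = new PackedTraceList(2);
        list.put((byte) 0, Integer.parseUnsignedInt("F001CBAD", 16));
        list.put((byte) 2, Integer.parseUnsignedInt("EEECA89F", 16));
        System.out.println(list.typeAt(1) + " " + Long.toHexString(list.addressAt(1)));
    }
}
```

Compared with one object per access, this avoids per-object headers and references, at the cost of a fixed capacity and manual index arithmetic.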

How to center a print statement text?

So I was working on my java project and in one part of the program I'm printing out text
The text is displayed on the left side
However I wanted it be displayed in the middle
How may I accomplish this?
Is this a newbie question?
Example:
public static void main(String[] args)
{
System.out.println("Hello");
}
VERY QUICK answer
You can use the JavaCurses library to do fun things on the console. Read on; the details are below.
Before you do though let's answer your entire question in some context
It is a newbie question :) but it's a valid question. So some hints for you:
The first question is, how wide is the terminal? (It's counted in number of characters.) Old terminals had fixed dimensions of 80 characters and 25 lines.
So as a first step start with the assumption that it's 80 characters wide.
How would you center a string on an 80 character wide terminal screen?
Do you need to worry about the length of the string? How do you position something horizontally? Do you add spaces? Is there a format string you can come up with?
Once you've written a program such that you can give it any string that will display properly on those assumptions (that terminal is 80 characters wide) you can now start worrying about what happens if you are connected to a terminal which is more or less than 80 characters? Or whether or not you are even connected to a terminal. For example if you are not does it make sense to "prettify" your code? probably not.
So question is how do you get all this information?
What you are asking for is the ability to treat the console as a smart teletype (tty) terminal with character-based control capabilities. Teletype terminals of old could do a lot of fun things.
Some history
Teletype terminals were complicated things, a legacy of there being lots of terminal manufacturers (IBM, DEC, etc.) ... These teletype terminals were developed to solve lots of problems, like being able to display content remotely from mainframes and minicomputers.
There were a bunch of terminal standards vt100, vt200, vt220, ansi, that came about at various points in terminal development history and hundreds of proprietary ones along the way.
These terminals could do positioning of cursors and windowing and colors, highlight text, underline etc. but not everyone could do everything. However this was done using "control" characters. ctrl-l is clear screen on ansi and vt terminals, but it may be page feed on something else.
If you wrote a program specific to one terminal it would make no sense elsewhere. So the need to make that simple caused a couple of abstraction libraries to be developed that would hide away the hideousness.
The first one is called termcap (terminal-capabilities) library, circa 1978, which provided a generic way to deal with terminals on UNIX systems. It could tell a running program of the available capabilities of the terminal (for example the ability to change text color) or to position cursor at a location, or to clear itself etc, and the program would then modify its behavior accordingly.
The second library is called curses, circa 1985 (??) it was developed as part of the BSD system and was used to write games ... One of the most popular versions of this library is the GNU curses library (previously known as ncurses).
On VMS I believe the library is called SMG$ (screen management library).
On with the answer
Anyhow, you can use one of these libraries in Java to determine whether or not you are working on a proper terminal. There is a library called JavaCurses on SourceForge that provides this capability to Java programs. This will be an exercise in learning how to integrate a new library into your programs and should be exciting.
JavaCurses provides terminal programming capability on both Unix and Windows environments. It will be a fun exercise for you to see if you can use it to play with.
advanced exercise
Another exercise would be to use that same library to see if you can create a program that displays nicely on a terminal and also writes out to a text file without the terminal codes.
If you have any issues, post away, I'll help as you go along.
If you have a definite line length, apache commons StringUtils.center will easily do the job. However, you have to add that library. javadoc
Java print statements to the console can't be centered as there is no maximum width to a line.
If your console is limited to, for example, 80 chars, you could write a special logger that would pad the string with spaces.
If your string was longer than 80 chars then you would have to cut the string and print the remainder on the next line. Also, if someone else was using your app with a console of a different width (especially a smaller one), it would look weird.
So basically, no, there is no easy way to center the output...
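If an assumed width is acceptable, the padding idea is short to write. A minimal sketch, assuming an 80-column console (the class name CenterDemo and the helper center are illustrative; the real console width is not known to the program):

```java
class CenterDemo {
    // Returns text preceded by enough spaces to centre it in `width` columns.
    static String center(String text, int width) {
        int left = (width - text.length()) / 2;
        if (left <= 0) {
            return text; // too long to centre: return it unchanged
        }
        // A positive field width right-justifies, which yields the leading spaces.
        return String.format("%" + (left + text.length()) + "s", text);
    }

    public static void main(String[] args) {
        System.out.println(center("Hello", 80)); // 80 columns is an assumption
    }
}
```

No trailing spaces are added, since they would be invisible on the console anyway.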
You could do something like:
public static void main(String[] args) {
String h = "Hello";
System.out.println(String.format("%20s", h));
}
This approach right-justifies the string in a field of the given width. In this case the field is 20 characters wide, so Hello is preceded by 15 spaces. The spaces precede Hello because the integer between % and s is positive; with a negative integer (%-20s) the spaces would be trailing.
Just mess with the integer between % and s until you get the desired result.
As with a lot of programming questions, don't reinvent the wheel!
Apache has a nice library, Commons Lang (package org.apache.commons.lang3), that comes with a StringUtils class:
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html
The padding and centering methods are what you want:
int w = 20;
System.out.println(StringUtils.rightPad("+", w - 1, "-") + "+");
System.out.println(StringUtils.center(StringUtils.center("output", w - 2), w, "|"));
System.out.println(StringUtils.rightPad("+", w - 1, "-") + "+");
will give you:
+------------------+
|      output      |
+------------------+
You can't. You are writing to the console which does not have a width so the center is undefined.
If you know the size and don't want to use an external library you could do something like this:
static void printer(String str, int size) {
int left = (size - str.length()) / 2;
int right = size - left - str.length();
String repeatedChar = "-";
StringBuffer buff = new StringBuffer();
for (int i = 0; i < left; i++) {
buff.append(repeatedChar);
}
buff.append(str);
for (int i = 0; i < right; i++) {
buff.append(repeatedChar);
}
// to see the end (and debug) if using spaces as repeatedChar
//buff.append("$");
System.out.println(buff.toString());
}
// testing:
printer("string", 30);
// output:
// ------------string------------
If you call it with an odd number for the size variable, then it would be with one - more to the right. And you can change the repeatedChar to be a space.
Edit
If you want to print just one char and you know the size, you could do it with the default System.out.printf like so:
int size = 10;
int left = size/2;
int right = size - left;
String format = "%" + left + "c%-" + right + "c";
// would produce: "%5c%-5c"
System.out.printf(format,' ', '#');
// output: "     #    " (without the quotes)
The %-5c aligns the # character to the left of the 5-character field assigned to it.

Java console bug under windows

The following Code
System.out.println("Start");
String s = "";
//936 * 5 = 4680 characters
for (int i = 0; i < 937; i++){
s += "1234 ";
}
System.out.println(s);
System.out.println("End");
produces an empty line between "Start" and "End" on the java console under windows, but works as expected when running MacOS or Linux. Same applies when writing to a file instead of using sysout. I've tried multiple windows machines. It doesn't matter whether I execute the method through eclipse or via cmd.
When you change "1234 " to "1234," or "12g4 " or when the number of runs is more/less than 936, it works as expected with all OS.
Can anybody confirm this/is there a known bug concerning this issue?
I can reproduce this as well, under Windows 7. It looks like a limitation due to the OS in SWT, and it seems to have been around for a very long time (2002). It's marked as WONTFIX. See GC#drawString, drawText don't render more than 10923 characters per line correctly. So this is a known bug.
The workaround is to go to Window->Preferences->Run/Debug->Console and set the Fixed width console to something like 4000 chars. This will wrap your lines after 4000 characters, which is a pain, but at least you'll get all of your output.
I have tried Galileo (3.5), Helios (3.6) and Indigo (3.7), and all exhibit the behaviour, but weirdly, Galileo & Helios have a limit = 818 (4090 chars) and Indigo = 936 (4680 chars), as the OP said. The 4090 makes me think of an OS limit (the next would be 4090 + 5 + crlf, > 4096), which matches the bugs raised in Eclipse/SWT. I can't explain why there is a difference in the number of characters accepted. I can only suggest that it's something in the OS.
There are a number of duplicate bugs raised in Eclipse:
Bug 19850 - Large string printed in Console overstrikes/disappears depending on length
Bug 44866 - Truncate long strings in variables view
Bug 104588 - Unreadable console output under certain conditions
Everything seems to have been a consequence of: Bug 11601 - console hangs while displaying long strings without crlf
