I have a question regarding displaying a suit symbol (ASCII) (heart, diamond, spade, and club) to the terminal window when running a Java program. I currently use JCreator V3 LE. The JDK I use is 1.8.0_172.
In the past, I used the syntax:
Character.toString((char) 3)
Character.toString((char) 4)
Character.toString((char) 5)
Character.toString((char) 6)
Now, it displays a box with a ? in it, as if the character cannot be found. Is there another way to do this, or has this character been eliminated from the window?
Thanks.
As with all text transfers, your Java program (writer) and terminal (reader) need to be on the same page, or rather the same "code page" (character encoding). You said ASCII, but ASCII doesn't support the characters you want to use. You are probably thinking of CP437 from MS-DOS and Windows. (MS-DOS didn't have an ASCII code page; Windows got one late in life, for the sake of completeness. ASCII is only used in very specialized contexts.)
If you want to take Java's character transcoding out of the equation, you can write bytes to the output stream. Then whatever they mean to the terminal, it will decode them to characters.
// for illustration purposes only; I would not invest in code like this.
System.out.flush();
System.out.write(0x03);
System.out.flush();
To actually see them, the terminal needs a font that includes the decoded character. A white box, or a box with a question mark, indicates that the font doesn't have it. A question mark, or a question mark in a black diamond, indicates that the bytes don't mean anything in the terminal's character encoding.
To check your terminal's character encoding, run chcp (Windows) or locale (most other OSes).
As @VGR stated in the comments, GUIs are simpler. This is because they avoid the concept of creating a byte stream of text in a particular character encoding and just use the windowing system's facility for drawing text. (This comes at the cost of not being able to pipe the output to another program or redirect it to a file, which is a key feature of CUI programs.)
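If you would rather let Java do the encoding, here is a minimal sketch that uses the Unicode suit characters instead of the CP437 byte values 3 to 6. The class name SuitSymbols and the helper utf8Hex are made up for illustration, and the sketch assumes the terminal decodes UTF-8 and has a font with these glyphs; neither is guaranteed.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class SuitSymbols {
    // Heart, diamond, club, spade: the characters CP437 shows for bytes 0x03..0x06.
    static final String SUITS = "\u2665\u2666\u2663\u2660";

    // Hex dump of the UTF-8 bytes of a string, handy for comparing with a hex editor.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the terminal decodes UTF-8 (on Windows, for example, after "chcp 65001").
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(SUITS);
        out.println(utf8Hex("\u2665")); // the bytes actually sent for the heart
    }
}
```

If the symbols still come out as boxes, the bytes arrived and were decoded but the font lacks the glyphs; if they come out as question marks or mojibake, the terminal's encoding does not match.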
Related
I'm using these lines of code:
String string = "Some usefull information − don't know what happens with my output";
System.out.println(string);
String str2verify = driver.findElement(By.xpath("//someWellFormXpath")).getText();
Assert.assertEquals(str2verify , "Some usefull information − don't know what happens with my output");
And this is what I get in my console, so comparing the strings with equals doesn't work.
Output
Some usefull information ? don't know what happens with my output
expected [Some usefull information ? don't know what happens with my outputS] but found [Some usefull information − don't know what happens with my output]
java.lang.AssertionError: expected [Some usefull information ? don't know what happens with my outputS] but found [Some usefull information − don't know what happens with my output]
This is the process:
You write some text. In an editor. That is showing strings to you.
You save your file. Files are bytes, not characters, so your editor applies a charset encoding to do this. Which one? Your editor will know; you didn't mention which editor you use, so I can't tell you.
Javac reads your file. Files are bytes, but javac needs characters, so javac applies a charset encoding to do this. Which one? "The platform default", unless you use the -encoding parameter, or the tool that calls javac for you has a way to tell it which -encoding parameter to use.
Javac emits class files. These are byte based so this doesn't require encoding.
Your java JVM runs your class file. As part of running, a string is printed to standard out.
System.out refers to 'standard out'. These things are, on pretty much every OS, a stream of bytes. Meaning, when you send strings there, the JVM first encodes your string using some charset encoding, then it goes to standard out.
Something is connected to the other end of standard out and sees these bytes. It converts the bytes back to a string, also using some encoding.
The characters are sent to the font rendering engine on your OS. Even if the character 'survived' all those conversions back and forth, it is possible your font doesn't have a glyph for it. The intent is clearly for that character to be an emdash (a dash that is as long as the letter 'm' - the standard 'minus' character is an ndash, not the same thing; that one is a bit shorter).
Count em up - that's like 6 conversions. They all need to be using the same charset encoding. So, check that your editor and javac agree on what charset encoding your source file is in. Then, check that the console thing that is showing the string is in agreement with standard out (which should be 'platform default', whatever that might be), then, check if the font you use has emdash.
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
Then write to ps, not System.out - that's how you can explicitly force some charset to be used when writing to output.
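To see where the '?' comes from, here is a small sketch of one encode/decode hop in the chain. The class and method names are hypothetical. It assumes the dash in the question is U+2212 MINUS SIGN, and it uses ISO-8859-1 as a stand-in for cp1252 only because every JVM is guaranteed to ship it; the effect is the same for any charset that lacks the character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Encode with a charset and decode with the same one, like a writer/reader pair.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable characters are replaced with '?'
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "information \u2212 output"; // assumed: U+2212 MINUS SIGN
        // Can the charset represent the character at all?
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u2212'));
        // Does the text survive the hop unchanged?
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text));
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

Once the character has been replaced by '?' in any one hop, no later step can recover it, which is why every step has to agree.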
It turns out that the em dash doesn't have a representation in the cp-1252 charset encoding, so in the end I had to change all the files in my project to UTF-8 to be able to save this character.
It was a pain in the brain this encoding issue.
Thanks for all the suggestions friends.
As far as I know, at the end of every file, especially text files, there is a hex code for the EOF or NULL character. And when we write a program that reads the contents of a text file, we keep calling the read function until we receive that EOF hex code.
My question: I downloaded some tools to see a hex view of a text file, but I can't see any hex code for EOF (End Of File/NULL) or EOT (End Of Text).
ASCII/Hex code tables :
This is output of Hex viewer tools:
Note : My input file is a text file that its content is "Where is hex code of "EOF"?"
Appreciate your time and consideration.
There is no such thing as an EOF character. The operating system knows exactly how many bytes a file contains (this is stored alongside other metadata like permissions, creation date, and the name), and hence can tell programs that try to read the eleventh byte of a ten byte file: You've reached the end of file, there are no more bytes to read.
In fact, the "EOF" value returned for example by C functions like getchar is explicitly an int value outside the range of a byte, so it cannot possibly be stored in a file!
Sometimes, certain file formats insist on adding NUL terminators (probably because that's how strings are usually stored in C), though usually these delimit multiple records in a single file, not the file as a whole. And such decoration usually disqualifies a file from being considered a "text file".
ASCII codes like ETX and NUL date back to the days of teletypewriters and friends. NUL is used in C for in-memory strings, but this has no bearing on file systems.
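The same idea in Java, as a rough sketch (EofDemo and countBytes are names invented for this example, and an in-memory stream stands in for a file): read() hands back byte values until the data runs out and then returns -1, which is outside the byte range and so can't be something stored in the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EofDemo {
    // Reads until read() reports end of stream and returns how many bytes were seen.
    static int countBytes(InputStream in) {
        int count = 0;
        try {
            // read() returns 0..255 for data and -1 once nothing is left.
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] content = "Where is hex code of \"EOF\"?".getBytes(StandardCharsets.US_ASCII);
        // Both numbers are the same: nothing extra is appended to mark the end.
        System.out.println(content.length);
        System.out.println(countBytes(new ByteArrayInputStream(content)));
    }
}
```

Bytes such as 0x00, 0x04 or 0x1A inside the data are counted like any other byte; the loop only stops when the stream itself has nothing more to give.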
There was - a long long time ago - an End Of File marker but it hasn't been used in files for many years.
You can demonstrate a distant echo of it on windows using:
C:\>copy con junk.txt
Hello
Hello again
- Press <Ctrl> and <z>
C:\>dump junk.txt
junk.txt:
00000000 4865 6c6c 6f0d 0a48 656c 6c6f 2061 6761 Hello..Hello aga
00000010 696e 0d0a in..
C:\>
Note the use of Ctrl-Z as an EOT marker.
However, notice also that the Ctrl-Z does not appear in the file any more - it used to appear as a 0x1a but only on some operating systems and even then not consistently.
Use of ETX (0x03) stopped even before those dim and distant times.
There is no such thing as EOF. EOF is just a value returned by file reading functions to tell you the file pointer reached the end of the file.
The EOT byte (0x04) is used to this day by unix tty terminals to indicate end of input. You type it with a Ctrl + D (ie. ^D) to end input to shells or any other program reading from stdin.
However, as others have pointed out, this is distinct from EOF, which is a condition rather than a piece of data per se.
There once were even different EOF characters (for different operating systems). I haven't seen one in a long time. (Typically files were stored in blocks of 128 bytes.) They were a pain to code around, like BOMs are nowadays.
Instead there is still an int read() that normally delivers a byte value, but for EOF delivers -1.
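The NUL character is a string terminator in C. In Java you can have a NUL character in the middle of a string. To cooperate with C, the "modified UTF-8" that Java generates in some places (for example DataOutputStream.writeUTF and JNI) uses a multi-byte encoding both for Unicode characters > 127 and for NUL. Standard UTF-8, as produced by String.getBytes, still encodes NUL as a single zero byte.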
(Some of this is probably known already.)
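A quick way to see the NUL handling for yourself, sketched with a hypothetical class NulEncoding: compare the bytes from String.getBytes with the bytes DataOutputStream.writeUTF produces for the same one-character string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NulEncoding {
    // Bytes written by DataOutputStream.writeUTF: a 2-byte length, then modified UTF-8.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String nul = "\0";
        // Standard UTF-8: one zero byte.
        System.out.println(Arrays.toString(nul.getBytes(StandardCharsets.UTF_8)));
        // writeUTF: length prefix followed by the two-byte form 0xC0 0x80.
        System.out.println(Arrays.toString(modifiedUtf8(nul)));
    }
}
```

Java prints bytes as signed values, so 0xC0 and 0x80 show up as negative numbers in the second line.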
In the 7bit Wintel world it is 0x1A or chr(26).
It is still commonly found in older text files and archives and is still produced by some file transmission protocols. In particular text files downloaded from BBS systems were commonly terminated with this character.
There are other such sentinel values for older systems, and like EOL (CR, LF, CR+LF) they need to be anticipated from time to time.
It can be a source of annoyance to see it still being used, on the same level as return(0) for instance.
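If you do have to anticipate it, a minimal sketch might look like this. LegacyEofMarker and stripTrailingSub are hypothetical names, and the sketch assumes the marker only matters as trailing padding; a 0x1A in the middle of the data is left alone.

```java
import java.util.Arrays;

class LegacyEofMarker {
    static final byte SUB = 0x1A; // Ctrl-Z, chr(26)

    // Returns the content without trailing 0x1A bytes; all other bytes are kept.
    static byte[] stripTrailingSub(byte[] content) {
        int end = content.length;
        while (end > 0 && content[end - 1] == SUB) {
            end--;
        }
        return Arrays.copyOf(content, end);
    }

    public static void main(String[] args) {
        byte[] legacy = {'H', 'i', '\r', '\n', SUB, SUB};
        System.out.println(legacy.length + " bytes before, "
                + stripTrailingSub(legacy).length + " bytes after");
    }
}
```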
What I want as an end result is this
System.out.println("This is the not equal to sign\n≠");
to appear (when run) as
This is the not equal to sign
≠
not to appear as
This is the not equal to sign
?
Is there any way to do this? I tried using windows character map, copied the symbol here, and in my code, but after changing encoding to UTF-8 and inserting it, it comes up as ? when run...
What can be done? Thanks in advance for answers to this utterly simple question
Set character encoding to UTF-8, pass this vm argument, if your text editor already uses UTF-8 or supports this character
-Dfile.encoding=UTF-8
As @Tobias Brandt says, you could use: \u2260
And btw, @Crozin is also right about your console configuration
Like this
System.out.println("This is the not equal to sign \n\u2260");
There are five potential issues here:
1) In which charset encoding are you saving (from your editor) your Java source?
2) Which charset encoding does the Java compiler assume?
3) Which charset is your console?
4) Are you using some terminal with translation?
5) Does your console font include that particular character?
To get issues 1-2 right, you should use UTF-8 for both (editor and javac settings) or, more robustly, specify the Unicode character with an escape in pure ASCII text (Frakcool's answer).
For issue 3, try -Dfile.encoding=UTF-8 or see this answer. Issues 4-5 are outside the scope of your Java program. If you are unsure, just redirect the output to a file and look at it with a hex editor.
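For issue 3 you can also ask the JVM what it thinks. This is only a sketch (ConsoleCharsetCheck and canEncode are made-up names): it reports the default charset the JVM picked up and whether that charset has any byte sequence for the not-equal sign. It can't tell you what the console itself decodes with, or whether the font has the glyph.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleCharsetCheck {
    // True if the charset has a byte sequence for the character.
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char notEqual = '\u2260';
        Charset platform = Charset.defaultCharset();
        System.out.println("default charset: " + platform);
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("default can encode U+2260:  " + canEncode(platform, notEqual));
        System.out.println("US-ASCII can encode U+2260: "
                + canEncode(StandardCharsets.US_ASCII, notEqual));
    }
}
```

If the default charset cannot encode the character, the '?' is produced inside the JVM before the console ever sees anything.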
When you save the Java file, make sure it is saved in the same charset as the one it was opened with.
In my Eclipse, when I save a file with special chars (such as \u2260) it asks me what charset I want to use.
Open your file in the terminal and inspect the content of the file.
Make sure it is the same char as the one in the editor you are using.
It seems that after Eclipse asked me if I want to change to UTF-8, it worked, only after I posted this.
Sorry for wasting your time
I am having some trouble with encoding this string into barcode symbology - Code 128.
Text to encode:
1021448642241082212700794828592311
I am using the universal encoder from idautomation.com:
https://www.bcgen.com/fontencoder/
I get the following output for the encoded text for Code 128:
Í*5LvJ8*r5;ÂoP<[7+.Î
However, in ";Âo" the character between the semi-colon and o (let us call it special A) - is not part of the extended character set used in Code128. (See the Latin Supplements at https://www.fonts2u.com/code-128.font)
Yet the same string shows a valid barcode at
https://www.bcgen.com/linear-barcode-creator.html
How?
If I use the output with the Special A on a webpage with a font face for barcodes, the special A character does not show up as the barcode (and that seems correct since the special A is not part of the character set).
What gives? Please help.
I am using the IDAutomation utility to encode the string to 128c symbology. If you can share code to do the encoding (in Java/Python/C/Perl) that would help too.
There are multiple fonts for Code128 that may use different characters to represent the barcode symbols. Make sure the font and the encoding logic match each other.
I used this one http://www.jtbarton.com/Barcodes/Code128.aspx (there is also sample code how to encode it on the site, but you have to translate it from VB). The font works for all three encodings (A, B and C).
Sorry, this is very late.
When you are dealing with the encoding of code 128, in any subset, it's a good idea to think of that coding in terms of numbers, not characters. At this level, when you have shifts, code-changes, checksums and stuff, intermixed with the data, the whole concept of "character" is lost.
However, this is what is happening:
The semicolon in the output corresponds to "27"
The lowercase o corresponds to "79" and the P to "48"
The "A with Macron" corresponds to your "00" sequence. This is why you should be dealing with numbers, not characters, at this level of encoding.
How would you expect it to show a character with a code of 00? That would be a space or NUL, neither of which is particularly visible.
Your software has simply rendered it the best way it can, which is to make the character 'visible' by adding 0x80 to it. If you look at charmap, you will see that code 0x80 is indeed A with macron.
The rest (indeed all) of your encoded string looks correct for a setc-encodation.
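Since the question asked for code, here is a rough sketch of the "think in numbers" approach for set C. Code128SetC and its methods are hypothetical names. It only produces the symbol values (start C = 105, one value per digit pair, the mod-103 check value, stop = 106); turning those values into characters depends entirely on the barcode font you use, so that step is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class Code128SetC {
    static final int START_C = 105; // Code 128 start symbol for set C
    static final int STOP = 106;    // Code 128 stop symbol

    // Splits an even-length digit string into set C values 0..99, one per digit pair.
    static List<Integer> dataValues(String digits) {
        if (digits.length() % 2 != 0 || !digits.matches("\\d*")) {
            throw new IllegalArgumentException("set C needs an even number of digits");
        }
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < digits.length(); i += 2) {
            values.add(Integer.parseInt(digits.substring(i, i + 2)));
        }
        return values;
    }

    // Check value: (start value + sum of position * value) mod 103, positions from 1.
    static int checksum(List<Integer> values) {
        int sum = START_C;
        for (int i = 0; i < values.size(); i++) {
            sum += (i + 1) * values.get(i);
        }
        return sum % 103;
    }

    public static void main(String[] args) {
        List<Integer> values = dataValues("1021448642241082212700794828592311");
        System.out.println(START_C + " " + values + " " + checksum(values) + " " + STOP);
    }
}
```

For the string in the question, the pairs in the middle are 27, 00, 79, 48, which is the ";Âo" stretch of the font output followed by P; the 00 pair is the one the font has to map to some visible character.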
What's the best way to insert statistics symbols in a JLabel's text? For example, the x-bar? I tried assigning the text field the following with no success:
<html>x̄
Html codes will not work in Java. However, you can use the unicode escape in Java Strings.
For example:
JLabel label = new JLabel("x\u0304"); // x followed by the combining macron
Also, here is a cool website for taking Unicode text and turning it into Java String literals.
Well, that's completely malformed HTML, probably even for Swing (I think you would need the </html> at the end for it to work). But I would try to never go down that road if you can help it, as Swing's HTML support has many drawbacks and bugs.
You can probably simply insert the appropriate character directly, either directly in the source code if you're using Unicode or with the appropriate Unicode escape:
"x\u0304"
This should work, actually. But it depends on font support and some fonts are pretty bad in positioning combining characters. But short of drawing it yourself it should be your best option.
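A minimal sketch of that in a window, assuming the default Swing font has the combining macron (XBarLabel and the label text are just for illustration):

```java
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

class XBarLabel {
    // "x" followed by U+0304 COMBINING MACRON; written as an escape so the
    // encoding of the source file does not matter.
    static final String X_BAR = "x\u0304";

    public static void main(String[] args) {
        // Build the UI on the event dispatch thread, as Swing expects.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("x-bar");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JLabel("Sample mean: " + X_BAR));
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

If the bar sits off to one side or shows as a box, that is the font problem mentioned above; trying a different font on the label is the usual next step.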
You can obtain x̄ by appending \u0304 to the x character.
In your case, you must generate following string
"x\u0304"
The character \u0304 alone is only an overscore or overline character. It is a special diacritic character in the Unicode table. You can use it in combination with another normal character to obtain a composite character.
You can find more information on Diacritics characters at following location //en.wikipedia.org/wiki/Combining_character
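To check how such a combination is stored, here is a small sketch using java.text.Normalizer (the class name CombiningMarks is made up). Some base-plus-mark pairs have a single precomposed code point and some, like x with a macron, don't, so the latter always stay as two chars and rely on the font to stack them.

```java
import java.text.Normalizer;

class CombiningMarks {
    // Composes base + combining mark into one code point where Unicode defines one.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String xBar = "x\u0304";  // no precomposed form, stays two chars
        String aRing = "A\u030A"; // has a precomposed form, U+00C5
        System.out.println(xBar.length() + " -> " + compose(xBar).length());
        System.out.println(aRing.length() + " -> " + compose(aRing).length());
    }
}
```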
You can also use
\u0305 to have a longer bar
\u0307 to have a dot above the previous character (speed in mechanics)
\u0308 to have 2 dots above the previous character (acceleration in mechanics)
\u030A to have a circle above the previous character
Example
"[x\u0305]" -> [x̅]
"[z\u0307]" -> [ż]
"[t\u0308]" -> [ẗ]
"[u\u0308]" -> [ü]
"[A\u030A]" -> [Å]
In the example, only the x with the long bar is not displayed correctly in HTML, because the Chrome browser has a bug (I think).
If you copy and paste the characters into MS Word, you will see all of them displayed correctly.
To type a new overlined character in MS Word, type the character and immediately afterwards press Alt & 0 772, where 0772 is the base 10 representation of the diacritic overline character \u0304 (hex 304 = decimal 772).