Java display invalid characters?

I have written a program that performs standard AES S-box encryption. My problem is that when I encrypt the message, the result is supposed to be written to a JTextArea, but it just shows a bunch of little square boxes, and when I try to save it to a text document I just get a bunch of question marks in the text file. How can I make it display the encrypted text? Or can I have it written automatically to a text document without producing a bunch of question marks?
I think that I have to use UTF-8 text encoding, but I have no idea how to do that.

Your text is encrypted as binary data. While encrypted it is not in any character set and cannot be rendered as text. If you want a way to view it, you could Base64 encode the encrypted data.
See: http://en.wikipedia.org/wiki/Base64

The output of the algorithm will not be valid text in the general case.
If you need to manipulate it as text, you can encode it in Base64, which uses only valid ASCII characters.
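A minimal sketch of the Base64 suggestion, assuming Java 8 or later (for java.util.Base64) and that your AES code hands you the result as a byte[]. The class and method names are made up for illustration.

```java
import java.util.Base64;

// Hypothetical helper; the names are illustrative only.
class CipherTextCodec {
    // Turn raw cipher bytes into printable ASCII that is safe for a
    // JTextArea or a plain text file.
    static String toText(byte[] cipherBytes) {
        return Base64.getEncoder().encodeToString(cipherBytes);
    }

    // Recover the exact cipher bytes again before decrypting.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

You would then call something like textArea.setText(CipherTextCodec.toText(encrypted)) and write the same string to the file. Do not build a String directly from the cipher bytes, because that is where the boxes and question marks come from.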

Related

Display special characters using entity or hex values

I am trying to display ŵ in my JSF page but am unable to do so. The text with special characters is read from a properties file, but on my application screen it becomes something else. I tried to use entity values without success. For example, if the original text is:
ŵyhsne klqdw dwql
then after replacing the character with entity or hex values:
&wcirc ;yhsne klqdw dwql
the page displays it literally, as is.

I can only guess at your question. Please edit it and improve it.
If you are displaying on the web, you should use &wcirc; (note: without spaces), but this also requires a font on the client side that supports the character.
If the string is in your code, replace the character with \u0175.
But probably the best way is to use ŵ directly, whether in code, on the web, or in any file, and to make sure that such files (or sources) are interpreted as UTF-8 and that the pages you deliver are UTF-8. If you are not using UTF-8, check in a similar way that you are consistently using the correct encoding.
And sending a character doesn't mean it can be displayed. There is always the possibility that a font will not have all "special" characters in it.
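For the properties-file part, here is a minimal sketch of reading the file explicitly as UTF-8. Properties.load(InputStream) assumes ISO 8859-1, so a UTF-8 ŵ gets mangled there, while Properties.load(Reader) uses whatever charset the reader decodes. The class name is hypothetical, and whether your JSF setup loads its bundles this way is an assumption you would need to check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper; the names are illustrative only.
class Utf8Properties {
    // Decode the stream as UTF-8 before Properties parses it.
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

If the file has to stay in ISO 8859-1, writing the character as \u0175 inside the properties file also works, because Properties understands that escape.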

Facing issues on extracting text from pdf file using java

I am not able to extract the text from a PDF that uses fonts with a custom encoding, which can be identified via File -> Properties -> Font in Adobe Reader.
One of the fonts is listed as:
C0EX02Q0_22
Type: Type 3
Encoding: Custom
Actual Font: C0EX02Q0_22
Actual Font type: Type 3
Please let me know whether there is any way to extract the text content from such PDF files.
Currently I am using PDFText2HTML from pdf util.
I get values like 'ÁÙÅ#ÅÕãÉ' when extracting from such PDF files.
Sample PDF: tesis completa.pdf
In this PDF you can see that the fonts used have a custom encoding, e.g. T3Font_1 (see File -> Properties -> Font in Adobe Reader).
Since I could not upload my own PDF, I have provided a sample one that has the same issue.

Extraction as described in the standard
The PDF specification ISO 32000-1 describes in section 9.10 Extraction of Text Content how text extraction can be done if the PDF provides the required information and does so correctly.
This algorithm, though, works only in a few page ranges of the document (namely the summaries, the content lists, the thank-yous, and the section Publicación 7); in the other ranges it results in gibberish, e.g. 8QLYHUVLWDWGH/OHLGD instead of Universitat de Lleida. Looking at the PDF objects in question makes clear that the required information is missing (there is no ToUnicode map, and while the Encoding is based on WinAnsiEncoding, all positions in use are mapped via Differences to non-standard names).
Trying to extract the text using copy & paste from Adobe Reader also returns that gibberish. This generally is a sign that generic extraction is not possible.
A work-around
Inspecting the PDF objects and the outputs of the generic text extraction attempt, though, gives rise to the idea that the actual encoding for the text extracted as gibberish is the same for all fonts used, and that it is some ASCII-based encoding shifted by a constant: Adding 'U' - '8' to each character of the extracted 8QLYHUVLWDWGH/OHLGD results in Universitat de Lleida. Adding the same constant to the chars from text extracted elsewhere in the document also results in correct text as long as the text only uses ASCII characters.
Characters outside the ASCII range are not mapped correctly by that simple method, but they also always seem to be extracted as the same wrong character, e.g. the glyph 'ó' always is extracted as 'y'.
Thus, you can extract the text from that document (and similarly created ones) by first extracting the text using the standard algorithm and then, in the gibberish sections (which probably can be identified by font name), replacing each character: add 'U' - '8' for small values and replace according to some mapping for higher values.
As you mentioned Java in your question, I have run your document through iText and PDFBox text extraction with and without shifting by 'U' - '8', and the results look promising. I assume other general-purpose Java PDF libraries will also work.
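The post-processing step of this work-around can be sketched as follows. This is only an illustration, not the code used for the test runs above: the class name is made up, the mapping table contains just the one pair mentioned ('y' to 'ó') and would need to be filled in by inspecting the document, and leaving whitespace untouched and shifting only when the result stays within printable ASCII are my assumptions. Map.of needs Java 9 or later.

```java
import java.util.Map;

// Hypothetical post-processor for text extracted from the affected fonts.
class ShiftedTextFixer {
    // Constant offset observed in the sample: 'U' - '8' == 29.
    static final int SHIFT = 'U' - '8';

    // Non-ASCII glyphs need an explicit table. Only this one pair is known
    // from the sample; any further entries are for you to determine.
    static final Map<Character, Character> HIGH = Map.of('y', '\u00F3'); // 'ó'

    static String fix(String extracted) {
        StringBuilder sb = new StringBuilder(extracted.length());
        for (char c : extracted.toCharArray()) {
            Character mapped = HIGH.get(c);
            if (mapped != null) {
                sb.append(mapped.charValue());      // table lookup for higher values
            } else if (Character.isWhitespace(c)) {
                sb.append(c);                       // assumption: whitespace comes from the extractor
            } else if (c + SHIFT <= 0x7E) {
                sb.append((char) (c + SHIFT));      // plain ASCII: add the constant
            } else {
                sb.append(c);                       // unknown: leave as extracted
            }
        }
        return sb.toString();
    }
}
```

Applied to the gibberish string exactly as quoted above, which contains no space characters, fix("8QLYHUVLWDWGH/OHLGD") gives UniversitatdeLleida. You would feed it only the text chunks that come from the affected fonts.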
Another work-around
Instead of creating custom extraction routines, you can try to fix the PDFs in question by adding ToUnicode map entries to the fonts in question. After that normal text extraction programs should be able to properly extract the contents.

Encrypting and Decrypting a text file in java maintaining first line as readable

I need to encrypt and decrypt a text file. For encryption and decryption I may use the DES or AES algorithms. I have code to encrypt and decrypt a text file, but the problem is that the first line of the file must remain understandable after encryption. Using AES and DES, I get an unreadable format after encryption. I need to be able to read the first line of the file after encryption. Please help me. Thanks in advance.

Why not add a user-readable magic number to the beginning of the file, and again after you're done with the text block? Something like this:
MagiKrypt
This file has been encrypted with MagiKrypt, and you will need the program at (URL HERE) to decrypt it.
MagiKrypt\x00\x01\x02\x03
(AES data here)
EOF
This way your program would easily be able to tell where is text, and where is AES data, and the user would be able to read the first part of the file. It would still be a mess if they open it in a text editor, but at least they'd see the intro block.
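A rough sketch of that layout, simplified to "readable intro, marker, AES bytes" and working on byte arrays in memory. The class and method names are hypothetical, the closing EOF line is left out, and the readable intro must not itself contain the full marker sequence.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical container format: readable intro, marker, then cipher bytes.
class ReadableHeaderFile {
    // The magic word followed by four non-text bytes, as in the layout above.
    static final byte[] MARKER =
            "MagiKrypt\0\1\2\3".getBytes(StandardCharsets.ISO_8859_1);

    static byte[] pack(String readableIntro, byte[] cipherBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] intro = (readableIntro + "\n").getBytes(StandardCharsets.UTF_8);
        out.write(intro, 0, intro.length);
        out.write(MARKER, 0, MARKER.length);
        out.write(cipherBytes, 0, cipherBytes.length);
        return out.toByteArray();
    }

    // Index of the first occurrence of the marker in the file.
    static int markerIndex(byte[] file) {
        outer:
        for (int i = 0; i + MARKER.length <= file.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (file[i + j] != MARKER[j]) continue outer;
            }
            return i;
        }
        throw new IllegalArgumentException("marker not found");
    }

    // Everything before the marker is the human-readable part.
    static String readIntro(byte[] file) {
        return new String(file, 0, markerIndex(file), StandardCharsets.UTF_8).trim();
    }

    // Everything after the marker is the encrypted data.
    static byte[] readPayload(byte[] file) {
        return Arrays.copyOfRange(file, markerIndex(file) + MARKER.length, file.length);
    }
}
```

Because the search stops at the first marker, the encrypted part may contain any byte values, including the marker bytes themselves.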

Encryption produces bytes, not human readable characters. To make your bytes human readable, you need to convert them to a different format. I would suggest Base64 as a common way to do this.
After you have encrypted your file, convert as much of it as you need to Base64 and display the Base64 part. It will not make any sense, but it won't contain anything too weird.

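It's better to keep the first line unencrypted, or you should create rules of your own and encrypt the first line along with all the other text. It's better to use the SHA-1 algorithm for better encryption (note, however, that SHA-1 is a one-way hash, so data processed with it cannot be decrypted again). Other than SHA-1, another choice is BTE encryption.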

Convert image to textual form(compressed) and back from text to image in android

Hello, I want to convert an image to a compressed textual form, send it to another Android user as an SMS, and then convert that text back into the image. I tried Base64 encoding, but it is of no use because its output is very long, so it would be hard to send that much text as an SMS. Is there any other way to convert an image to text, or any method to compress the text? Please help; I am working on the Android emulator. Thanks in advance.
Regards, Hitesh

Base64 should be fine, because it converts 3 bytes of data to 4 characters of text. You can't decrease the size of your data with a textual representation, but you can try compressing the data first and then converting it to text. Maybe that will help.
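A minimal sketch of "compress, then convert to text", assuming java.util.zip and java.util.Base64 are available (on older Android versions you would use android.util.Base64 instead). The class name is made up. Keep in mind that JPEG or PNG data is already compressed, so deflating it again will usually gain very little.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper; the names are illustrative only.
class CompressedText {
    // Deflate the bytes, then Base64-encode the compressed result.
    static String encode(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(buffer)) {
            deflate.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse: Base64-decode, then inflate back to the original bytes.
    static byte[] decode(String text) {
        byte[] compressed = Base64.getDecoder().decode(text);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InflaterInputStream inflate =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = inflate.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```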

Base64 encodes the data in base 64, i.e., using 64 different characters. According to the Wikipedia page on SMS an SMS can contain any 7-bit character (that is, you have 128 characters at your disposal!)
Here's the best way to do it:
Compress the image using a compression method of your preference. (jpg for instance)
Find out the sequence of bits that the jpg-data has.
Write these bits as 7-bit characters. (Take special care when padding the last 7 bits.)
Write this sequence of characters as an SMS.
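The bit-repacking part of the steps above can be sketched like this. It only turns bytes into 7-bit values (0..127) and back, padding the last value with zero bits. The class name is hypothetical, and mapping those values onto characters that actually survive your SMS path is not covered here; that part is an assumption you would have to verify.

```java
// Hypothetical codec: repacks 8-bit bytes into 7-bit values and back.
class SeptetCodec {
    // Split the bit stream of data into 7-bit values, zero-padding the last one.
    static byte[] toSeptets(byte[] data) {
        byte[] out = new byte[(data.length * 8 + 6) / 7];
        int acc = 0, bits = 0, j = 0;
        for (byte b : data) {
            acc = (acc << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 7) {
                bits -= 7;
                out[j++] = (byte) ((acc >> bits) & 0x7F);
            }
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
        if (bits > 0) {
            out[j] = (byte) ((acc << (7 - bits)) & 0x7F); // pad the tail with zeros
        }
        return out;
    }

    // Rebuild the original bytes; the padding bits at the end are dropped.
    static byte[] fromSeptets(byte[] septets) {
        byte[] out = new byte[septets.length * 7 / 8];
        int acc = 0, bits = 0, j = 0;
        for (byte s : septets) {
            acc = (acc << 7) | (s & 0x7F);
            bits += 7;
            if (bits >= 8 && j < out.length) {
                bits -= 8;
                out[j++] = (byte) (acc >> bits);
                acc &= (1 << bits) - 1;
            }
        }
        return out;
    }
}
```

Every 7 bytes of data become 8 values this way, compared with 3 bytes becoming 4 characters in Base64.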

Russian text is not coming in SMS

I try to send a Russian text from my web application, but when the text is sent to a mobile, it is displayed as "?????????". I have tried UTF-8 and all other possible values of charset for Russian text.
Does anyone have a solution for this?

SMS text by default uses a special 7-bit character set; alternatively, Unicode UCS-2 can be used. Either way, you will need to either encode your text properly before sending or use a gateway that does the encoding for you.
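As an illustration of encoding the text yourself, here is a small sketch that turns a string into hex-encoded UCS-2. For characters in the Basic Multilingual Plane, which includes Cyrillic, UTF-16BE produces the same bytes as UCS-2. Whether your gateway expects this hex form, and which parameter tells it the message is Unicode, is gateway-specific and purely an assumption here; the class name is made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class SmsUcs2 {
    // Encode text as big-endian 16-bit code units, written as uppercase hex.
    static String toUcs2Hex(String text) {
        StringBuilder hex = new StringBuilder(text.length() * 4);
        for (byte b : text.getBytes(StandardCharsets.UTF_16BE)) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }
}
```

For "Привет" this gives 041F04400438043204350442. A single UCS-2 message holds 70 characters instead of the 160 of the 7-bit alphabet.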

You can transliterate it, i.e. "Привет" => "Privet".
Also check the encoding you use. UTF-8 is the most common for international characters these days.

There are several ways to send an SMS: one is as Unicode text, another is as ASCII. Unicode has a larger space requirement, so the maximum message length will be smaller.
Make sure you are sending the text in Unicode format. The SMS gateway API should have documentation on this.
