Converting ASCII characters to UTF-8 within a Java String [duplicate]

In Java, I want to convert this:
https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type
To this:
https://mywebsite/docs/english/site/mybook.do?request_type
This is what I have so far:
class StringUTF
{
public static void main(String[] args)
{
try{
String url =
"https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do" +
"%3Frequest_type%3D%26type%3Dprivate";
System.out.println(url+"Hello World!------->" +
new String(url.getBytes("UTF-8"),"ASCII"));
}
catch(Exception E){
}
}
}
But it doesn't work right. What are these %3A and %2F formats called and how do I convert them?

This does not have anything to do with character encodings such as UTF-8 or ASCII. The string you have there is URL encoded. This kind of encoding is something entirely different than character encoding.
Try something like this:
try {
String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
// not going to happen - value came from JDK's own StandardCharsets
}
Java 10 added direct support for Charset to the API, meaning there's no need to catch UnsupportedEncodingException:
String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8);
Note that a character encoding (such as UTF-8 or ASCII) is what determines the mapping of characters to raw bytes. For a good intro to character encodings, see this article.
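As a minimal sketch of the decoding described above, applied to the URL from the question: the class name UrlDecodeExample and the helper decode are made up for illustration, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class UrlDecodeExample {
    // Decodes application/x-www-form-urlencoded text as UTF-8.
    // Uses the Charset overload, which needs Java 10+.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do"
                + "%3Frequest_type%3D%26type%3Dprivate";
        // prints https://mywebsite/docs/english/site/mybook.do?request_type=&type=private
        System.out.println(decode(url));
    }
}
```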

The string you've got is in application/x-www-form-urlencoded encoding.
Use URLDecoder to convert it to Java String.
URLDecoder.decode( url, "UTF-8" );

This has been answered before (although this question was first!):
"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."
As URL class documentation states:
The recommended way to manage the encoding and decoding of URLs is to
use URI, and to convert between these two classes using toURI() and
URI.toURL().
The URLEncoder and URLDecoder classes can also be used, but only for
HTML form encoding, which is not the same as the encoding scheme
defined in RFC2396.
Basically:
String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
System.out.println(new java.net.URI(url).getPath());
will give you:
https://mywebsite/docs/english/site/mybook.do?request_type
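One place where the two approaches visibly differ is the plus sign: form decoding turns + into a space, while URI leaves it alone. The sketch below shows this on a made-up input; the class and method names are hypothetical, and the Charset overload of URLDecoder.decode assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class PlusHandlingExample {
    // Form decoding: %XX escapes are decoded and '+' becomes a space.
    static String viaUrlDecoder(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // URI decoding: %XX escapes in the path are decoded, '+' is kept as-is.
    static String viaUri(String s) throws URISyntaxException {
        return new URI(s).getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        String encoded = "a+b%20c";
        System.out.println(viaUrlDecoder(encoded)); // prints "a b c"
        System.out.println(viaUri(encoded));        // prints "a+b c"
    }
}
```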

%3A and %2F are URL-encoded characters. Use this Java code to convert them back into : and /
String decoded = java.net.URLDecoder.decode(url, "UTF-8");

public String decodeString(String URL)
{
String urlString="";
try {
urlString = URLDecoder.decode(URL,"UTF-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
}
return urlString;
}

I use Apache Commons Codec:
String decodedUrl = new URLCodec().decode(url);
The default charset is UTF-8. Note that decode throws a checked DecoderException.

try {
String result = URLDecoder.decode(urlString, "UTF-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

import java.io.UnsupportedEncodingException;
import java.net.URISyntaxException;
public class URLDecoding {
String decoded = "";
public String decodeMethod(String url) throws UnsupportedEncodingException
{
decoded = java.net.URLDecoder.decode(url, "UTF-8");
return decoded;
//"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."
}
public String getPathMethod(String url) throws URISyntaxException
{
decoded = new java.net.URI(url).getPath();
return decoded;
}
public static void main(String[] args) throws UnsupportedEncodingException, URISyntaxException
{
System.out.println(" Here is your Decoded url with decode method : "+ new URLDecoding().decodeMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type"));
System.out.println("Here is your Decoded url with getPath method : "+ new URLDecoding().getPathMethod("https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest"));
}
}
You can select your method wisely :)

If the decoded value is expected to be an integer, you also have to catch NumberFormatException.
try {
Integer result = Integer.valueOf(URLDecoder.decode(urlNumber, "UTF-8"));
} catch (NumberFormatException | UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

Using java.net.URI class:
public String getDecodedURL(String encodedUrl) {
try {
URI uri = new URI(encodedUrl);
return uri.getScheme() + ":" + uri.getSchemeSpecificPart();
} catch (Exception e) {
return "";
}
}
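Please note that the exception handling could be better, but that is not very relevant for this example. Also note that this assumes the scheme itself is not encoded (for example https://mywebsite/docs%2Fenglish); for a fully encoded string like the one in the question, getScheme() returns null.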

I had this problem too and came here looking for an answer. The code from the accepted answer didn't work for me, but decoding twice did, so I'm sharing the following line of code in case it helps.
URLDecoder.decode(URLDecoder.decode(url, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
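Decoding twice only helps when the value was encoded twice to begin with, which is presumably what happened in this case. A small sketch of that situation follows, with a made-up doubly encoded input and hypothetical class and method names; it assumes Java 10 or later for the Charset overload.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DoubleDecodeExample {
    // Applies form decoding twice, for input that was encoded twice.
    static String decodeTwice(String s) {
        String once = URLDecoder.decode(s, StandardCharsets.UTF_8);
        return URLDecoder.decode(once, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%253A" is "%3A" with its '%' encoded again as "%25".
        String doublyEncoded = "https%253A%252F%252Fmywebsite";
        // After one pass: https%3A%2F%2Fmywebsite
        System.out.println(URLDecoder.decode(doublyEncoded, StandardCharsets.UTF_8));
        // After two passes: https://mywebsite
        System.out.println(decodeTwice(doublyEncoded));
    }
}
```

On a string that was encoded only once, the second pass is not harmless: a literal + or % that the first pass produced would be decoded again.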

Related

How to convert a Java string into a ByteString without re-escaping the string

I'm trying to convert an already-escaped Java string into a ByteString, without re-escaping the string.
My method looks like this:
import com.google.api.HttpBody;
import com.google.protobuf.ByteString;
protected HttpBody getHttpBody(String contentType, String data) {
System.out.println("=======d==:" + data);
HttpBody.Builder bodyBuilder = HttpBody.newBuilder();
bodyBuilder.setContentType(contentType);
try {
bodyBuilder.setData(ByteString.copyFrom(data.getBytes()));
} catch (Exception e) {}
HttpBody body = bodyBuilder.build();
System.out.println("=======d2==:" + body);
return body;
}
The input "data" looks like this:
=======d==:------WebKitFormBoundaryeyaPdl6AduTufZV4\r\nContent-Disposition: form-data; name=\"content\"; filename=\"473892_CRUST_grammars.zip\"\r\nContent-Type: application/x-zip-compressed\r\n\r\nPK\003\004\024\000\000\000\b\000\342:nT\222{\260q;\001\000\000\022\002\000\000\021\000\000\000en-US/CRUST.grxmleQOO\2030\024?\217OQ\353\201\023+h4s\026\226\005\227x\232\211l1\272L\322\2617\326\004ZR\212\250\237\336\256\262\211YO\257\357\375\376\275\226N>\313\002}\200\252\271\024\241\033\f}\027\201\310\344\226\213<ty\362\344\215F7w^\340N\"\207\346\212\225%S\310P\306\0053\000\f\302[&\370\304\307\206\217\221\222R\2078~^&\v|\300\212:\304{\255\2531!m\333\016\333\353\241T9\271\362\375\200\370\267\244\023\305\221\203\272CK\320\f\tVB\210\353\226+\310R!U\311\n\376\r\251\226i\245\344\206mx\3015\207\032\243L\n\r\3028\006\230D\2163\240\252)\000\361\355)B\235\311\312HU\315\246\340\231\361\031\f\250\024\340\311\335\237%2=\256\241\214,Y\301\0165\212\207\3702\333\003\324_F\227j\226G\364b\025?L\027\323\225\0256\372\217\263Y\362j/\357\261E\242X5\265\036\317\337\322\376l>{A\3759\276_\257#J\254$\261\266\207L\344\030\312\224\207\024\377w\351\222\364\"\233\207:[\303v\255\"\372%\240\243A\037p\356D\217\337\0209?PK\001\002\024\000\024\000\000\000\b\000\342:nT\222{\260q;\001\000\000\022\002\000\000\021\000\000\000\000\000\000\000\001\000 \000\000\000\000\000\000\000en-US/CRUST.grxmlPK\005\006\000\000\000\000\001\000\001\000?\000\000\000j\001\000\000\000\000\r\n------WebKitFormBoundaryeyaPdl6AduTufZV4--\r\n
While the data after being converted to ByteString looks like this:
=======d2==:content_type: "multipart/form-data; boundary=----WebKitFormBoundaryeyaPdl6AduTufZV4"
data: "------WebKitFormBoundaryeyaPdl6AduTufZV4\\r\\nContent-Disposition: form-data; name=\\\"content\\\"; filename=\\\"473892_CRUST_grammars.zip\\\"\\r\\nContent-Type: application/x-zip-compressed\\r\\n\\r\\nPK\\003\\004\\024\\000\\000\\000\\b\\000\\342:nT\\222{\\260q;\\001\\000\\000\\022\\002\\000\\000\\021\\000\\000\\000en-US/CRUST.grxmleQOO\\2030\\024?\\217OQ\\353\\201\\023+h4s\\026\\226\\005\\227x\\232\\211l1\\272L\\322\\2617\\326\\004ZR\\212\\250\\237\\336\\256\\262\\211YO\\257\\357\\375\\376\\275\\226N>\\313\\002}\\200\\252\\271\\024\\241\\033\\f}\\027\\201\\310\\344\\226\\213<ty\\362\\344\\215F7w^\\340N\\\"\\207\\346\\212\\225%S\\310P\\306\\0053\\000\\f\\302[&\\370\\304\\307\\206\\217\\221\\222R\\2078~^&\\v|\\300\\212:\\304{\\255\\2531!m\\333\\016\\333\\353\\241T9\\271\\362\\375\\200\\370\\267\\244\\023\\305\\221\\203\\272CK\\320\\f\\tVB\\210\\353\\226+\\310R!U\\311\\n\\376\\r\\251\\226i\\245\\344\\206mx\\3015\\207\\032\\243L\\n\\r\\3028\\006\\230D\\2163\\240\\252)\\000\\361\\355)B\\235\\311\\312HU\\315\\246\\340\\231\\361\\031\\f\\250\\024\\340\\311\\335\\237%2=\\256\\241\\214,Y\\301\\0165\\212\\207\\3702\\333\\003\\324_F\\227j\\226G\\364b\\025?L\\027\\323\\225\\0256\\372\\217\\263Y\\362j/\\357\\261E\\242X5\\265\\036\\317\\337\\322\\376l>{A\\3759\\276_\\257#J\\254$\\261\\266\\207L\\344\\030\\312\\224\\207\\024\\377w\\351\\222\\364\\\"\\233\\207:[\\303v\\255\\\"\\372%\\240\\243A\\037p\\356D\\217\\337\\0209?PK\\001\\002\\024\\000\\024\\000\\000\\000\\b\\000\\342:nT\\222{\\260q;\\001\\000\\000\\022\\002\\000\\000\\021\\000\\000\\000\\000\\000\\000\\000\\001\\000 \\000\\000\\000\\000\\000\\000\\000en-US/CRUST.grxmlPK\\005\\006\\000\\000\\000\\000\\001\\000\\001\\000?\\000\\000\\000j\\001\\000\\000\\000\\000\\r\\n------WebKitFormBoundaryeyaPdl6AduTufZV4--\\r\\n"
What I want is for the 2 data fields to look the same. As you can see after the conversion to ByteString, the data content gets escaped again (which I don't want).
I've tried a couple of copyFrom / copyFromUtf8 approaches, providing an encoding, and even tried to unescape the input using StringEscapeUtils before passing it into copyFrom, but that messes up the message structure.
This message is a multipart form data payload which contains a zip file with a text file in it.
I need this in order to mock a request for some integration tests.
Any idea how I can achieve this ?
Thanks

OK, for anyone who is interested: I took a different approach to solve my problem. Basically, I serialized the whole HttpBody object and dumped it into a file, and then in my tests I deserialized the content of that file.
That worked.
protected HttpBody deserializeHttpBody(String file) {
try {
InputStream in = this.getClass().getClassLoader().getResourceAsStream("data/" + file);
ObjectInputStream objectInputStream = new ObjectInputStream(in);
HttpBody obj = (HttpBody) objectInputStream.readObject();
objectInputStream.close();
return obj;
} catch (Exception e) {
System.out.println("==============Error==" + e);
}
return null;
}
private void serialize (com.google.api.HttpBody body) {
try {
FileOutputStream fileOutputStream
= new FileOutputStream("/tmp/upload"+System.currentTimeMillis()+".txt");
ObjectOutputStream objectOutputStream
= new ObjectOutputStream(fileOutputStream);
objectOutputStream.writeObject(body);
objectOutputStream.flush();
objectOutputStream.close();
System.out.println("==============Saved==");
} catch (Exception e) {
System.out.println("==============Error==" + e);
}
}

Inject Hexadecimal Java

I am having problems with the code I wrote. I want to write hexadecimal bytes into a certain file, but when compiling I get an error message saying that some symbols are missing:
String str = "rw";
String str2 = "test.so";
try {
raf = new RandomAccessFile(str2, str);
write(0x1234, "0100A0E3");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
public void write(int i, String str) {
try {
raf.seek((long) i);
raf.write(Hex2b(str));
} catch (IOException e) {
e.printStackTrace();
}
}
Could anyone tell me which symbols are missing? I'm new to programming.

Hex2b isn't a standard Java method, and it isn't defined anywhere in the code you posted, so the compiler can't find it. Info on how to convert a string such as "0100A0E3" into bytes, so that you can pass it as an argument to raf.write, can be found at this SO answer.
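For illustration, here is one way such a helper could look. This is only a sketch and not the code from the linked answer; the names HexBytes and hexToBytes are made up, and it assumes the input contains only hex digits with no separators or 0x prefix.

```java
final class HexBytes {
    // Converts a hex string such as "0100A0E3" into the bytes it spells out.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = hexToBytes("0100A0E3");
        System.out.println(bytes.length); // prints 4
    }
}
```

With a helper like this in the same class, the call in the question would become raf.write(hexToBytes(str)).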

How to resolve Russian language encoding in java?

I am constructing a String with the value ФФХЧЯЯЯЯэшЩтЯ. The string value is in Russian.
String russian=new String("ФФХЧЯЯЯЯэшЩтЯ");
Printing the string gives the following:
ФФХЧЯЯЯЯ�?шЩтЯ
So the character э is not converted correctly.
I have tried many encodings, including UTF-8, ISO-8859-1, ISO-8859-2, and ISO-8859-3:
public void setter(String attachment) {
try {
filename=new String(filename.getBytes(),"UTF-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
this.filename= filename;
}

arabic string in JSONObject is not readable [duplicate]

URL encoding encodes my String twice

I have a JSON string to encode:
String strMappingList = "[{\"Id\": \"67\",\"AccessType\": \"2\"},{\"Id\": \"1111\",\"AccessType\": \"2\"},{\"Id\": \"1166\",\"AccessType\": \"2\"}]";
When I URL-encode it, strMappingList ends up encoded twice:
try {
String str = URLEncoder.encode(strMappingList, "utf-8");
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}

Try loading the string from your strings resource file. If you get the value in a response from the server, you don't need strings.xml and can use the value directly.
In strings.xml:
<string name="urls">[{"Id": "67","AccessType": "2"},{"Id": "1111","AccessType": "2"},{"Id": "1166","AccessType": "2"}]</string>
Code:
String strMappingList = getResources().getString(R.string.urls);
try {
String str = URLEncoder.encode(strMappingList, "UTF-8");
System.out.println("Strings"+str);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
Output (encoded a single time):
%5B%7BId%3A+67%2CAccessType%3A+2%7D%2C%7BId%3A+1111%2CAccessType%3A+2%7D%2C%7BId%3A+1166%2CAccessType%3A+2%7D%5D
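One common way a string ends up encoded twice is that an already-encoded value is passed through a second encoder, for example by an HTTP client that encodes parameters itself. That cause is only an assumption here, since the question doesn't show where the second encoding happens. The sketch below shows what double encoding looks like so it can be recognized: every % from the first pass becomes %25. The class and method names are made up, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DoubleEncodeExample {
    // Form-encodes a string as UTF-8 (Charset overload, Java 10+).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"Id\": \"67\"}";

        String once = encode(json);
        // prints %7B%22Id%22%3A+%2267%22%7D
        System.out.println(once);

        String twice = encode(once);
        // prints %257B%2522Id%2522%253A%2B%252267%2522%257D
        System.out.println(twice);
    }
}
```

If the output contains %25 sequences like the second line, the value was encoded again somewhere after the URLEncoder.encode call.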
