Convert string into utf-8 in php

There is a script written in Java that I am trying to convert to PHP, but I'm not getting the same output.
How can I get the same result in PHP as in Java?
Script written in Java
String key = "hghBGJH/gjhgRGB+rfr4654FeVw12y86GHJGHbnhgnh+J+15F56H";
byte[] kSecret = ("AWS4" + key).getBytes("UTF8");
Output: [B@167cf4d
Script written in PHP
$secret_key = "hghBGJH/gjhgRGB+rfr4654FeVw12y86GHJGHbnhgnh+J+15F56H";
utf8_encode("AWS4".$secret_key);
Output: AWS4hghBGJH/gjhgRGB+rfr4654FeVw12y86GHJGHbnhgnh+J+15F56H

The result [B@167cf4d that you are getting comes from a toString() call on the byte array. The [B means the value is a byte array, and 167cf4d is the array's identity hash code in hexadecimal (often described as its memory location, although the JVM does not guarantee that). It changes from run to run, so you simply cannot get the PHP script to give you the same result. What you need to do is fix the printing on the Java side, but we don't know which API you're using to print it, or where. Is it a web application, a local application, or something else?
edit:
Since you're using it in a Java web-application, there are two likely scenarios. The code is either in a Servlet, or in a JSP. If it's in a servlet, you have to set your output to UTF-8, then you can simply print the string like so:
response.setCharacterEncoding("UTF-8");
PrintWriter out = response.getWriter();
out.write("AWS4");
out.write(key);
If it's in a JSP, that makes it a bit more inconvenient, because you have to make sure you don't leave whitespace such as newlines before the printing begins. That's not a problem for regular HTML pages, but it is a problem if you need output that is precisely formatted to the last byte. The response and out objects are readily available there. You will likely have some <%@page tags with attributes, so you have to make sure each one opens exactly where the previous one closes. A newline at the end of the file could also skew the result, so in general JSP is not recommended for whitespace-sensitive output.

You can't print a byte array to System.out without converting it to a string first. toString() is called implicitly when you do something like:
System.out.println("Weird output: " + byteArray);
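This will output the garbage you mentioned above. Instead, create a new String from the bytes and name the charset explicitly, because new String(byteArray) on its own uses the platform default charset:
System.out.println("UTF-8: " + new String(byteArray, java.nio.charset.StandardCharsets.UTF_8));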

In Java you get the object reference of the byte array, not its value. Try:
new String(kSecret, "UTF-8");
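To make the difference concrete, here is a minimal sketch of the three ways a byte array can end up printed. The class name ByteArrayPrinting and the placeholder key are made up for illustration; only the standard library is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteArrayPrinting {
    // Encode as UTF-8. StandardCharsets avoids the checked
    // UnsupportedEncodingException that getBytes("UTF8") declares.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "exampleKey"; // placeholder, not a real secret
        byte[] kSecret = toUtf8("AWS4" + key);

        System.out.println(kSecret);                  // type tag plus hash, e.g. [B@167cf4d
        System.out.println(Arrays.toString(kSecret)); // the individual byte values
        System.out.println(fromUtf8(kSecret));        // AWS4exampleKey
    }
}
```

For a key made of ASCII characters like the one in the question, the UTF-8 bytes are the same bytes PHP already holds in its string, so the PHP side most likely needs no conversion at all.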

Related

Expected equal JSON strings are not equal when compared using assertEquals

I wrote a JUnit test for a web service client, which submits a JSON document to the service.
I saved the "correct" JSON document to a file, and after the test runs I compare it with the actual result.
They do not match, although the lines look identical:
org.junit.ComparisonFailure:
Expected :{"Callback":null,"Data":
{"MarketCode":"ISEM",,............"Price":2.99}]}]}]}]}}
Actual :{"Callback":null,"Data":
{"MarketCode":"ISEM",,............"Price":2.99}]}]}]}]}}
The lines are very long, about 4K characters, so I cut most of the content here, but their lengths are identical. I compared the length() values in the debugger, and I also trim both strings before comparing, to remove any invisible or whitespace characters at the end that a text editor might insert implicitly.
Also, the test passes when executed in isolation, but it fails when I run it as part of a bigger suite.
There are no global/static variables, so memory being overwritten should not be an issue.
I'm mocking the web service client to extract the request string, like this:
StringBuilder pd = new StringBuilder();
doAnswer((invocation) -> {
String postDocument = ((String)invocation.getArguments()[0]).trim();
pd.append(postDocument);
return null;
}).when(client).doPost(anyString(), anyObject());
client is a mocked class.
Then I compare trimmed versions of the strings, but it doesn't help:
String expectedSubmit = TestUtils.readXmlFromFile("strategyexecution\\ireland_bm_strategy_override_expected.json").trim();
assertEquals(expectedSubmit, pd.toString().trim());

I found the answer myself :-)
The problem is with the JSON specification itself.
JSON does not guarantee the order of the members inside an object; an object is basically an unordered set of name/value pairs (only arrays are ordered).
So the content can be serialized in a different order from one run to the next, and two produced JSON documents should not be compared as strings.
I deserialized both into Java objects, and the object comparison works!
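A rough illustration of why comparing parsed objects works where comparing text fails. This sketch assumes each JSON object has already been parsed into a java.util.Map (which is what most JSON libraries can produce); the class JsonOrderDemo and the helper object(...) are hypothetical, and no real JSON parser is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonOrderDemo {
    // Build a map that remembers insertion order, standing in for a parsed JSON object.
    static Map<String, Object> object(Object... keysAndValues) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            m.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> expected = object("MarketCode", "ISEM", "Price", 2.99);
        Map<String, Object> actual = object("Price", 2.99, "MarketCode", "ISEM");

        // The text differs because the members were written in a different order.
        System.out.println(expected.toString().equals(actual.toString())); // false
        // Map equality compares the entries and ignores their order.
        System.out.println(expected.equals(actual)); // true
    }
}
```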

Same old issue as we had with XML. For XML there is XMLUnit, which compares XML documents semantically. For JSON I'd try a similar tool, like JsonUnit. JSONassert looks promising too.

thrift character encoding, perl to java

I have a complex situation that I'm trying to deal with involving character encoding.
I have a Perl program which communicates with a Java endpoint via Thrift; the Java side then uses the data to make a request to a legacy PHP service. It's ugly, but it is part of a migration plan, so it needs to work for a short while.
In Perl a Thrift object is created where some of the fields are JSON-encoded strings.
The problem is that when Perl makes the request to Java, one of the strings is as follows (this is from Data::Dumper; it is subsequently JSON-encoded and added to the Thrift object):
'offer_message' => "<<>>
&&
\x{c3}\x{82}\x{c2}\x{a9}©
<script>alert(\"XSS\");</script>
https://url.com/imghp?hl=uk",
However, when this data is received on the Java side, the sequence \x{c3}\x{82}\x{c2}\x{a9} has been converted, so in Java we receive the following:
<<>>\\n&&\\nÃ�Â�Ã�Â©©\\n<script>alert(\"XSS\");</script>\\nhttps://www.google.com.ua/imghp?hl=uk
The problem is that if I pass the second string to the legacy PHP program, it fails; if I pass the string taken from the dump of the Perl hash, it succeeds. So my assumption is that I need to convert the received string to another encoding (correct me if I'm wrong, I'm not sure that this is the right solution).
I've tried taking the parameters received in Java and converting them to every encoding I can think of, but it doesn't work. For example:
byte[] utf8 = templateParams.getBytes("UTF8");
normallisedTemplateParams = new String(utf8, "UTF8");
I've been varying the encoding schemes in the hope I find something that works.
What is the correct way to solve this? For a short time this messy solution is my only option while other re-engineering is happening.

The problem was in the end difficult to diagnose but simple to resolve. It turned out that the package I was using to convert in Java was using a default encoding of UTF-16. I had to modify the package and force it to use UTF-8. After that, everything worked.
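For anyone debugging something similar, here is a small sketch of how this kind of garbling arises and how it can be undone. It assumes the wrong charset was ISO-8859-1, which is only an illustration (the actual culprit above was a library default); the class MojibakeDemo is made up. It also shows why encoding and decoding with the same charset, as in the question's attempt, cannot change anything.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulate the bug: UTF-8 bytes decoded with the wrong charset.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo it: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00A9"; // copyright sign, one character
        String garbled = misdecode(original);

        System.out.println(garbled.length());                 // 2
        System.out.println(repair(garbled).equals(original)); // true

        // Encoding and decoding with the same charset is a no-op.
        String same = new String(garbled.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(same.equals(garbled));             // true
    }
}
```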

how to write hexadecimal values to a binary file

I'm currently trying to build a save editor for a video game. I figured out how to write to the binary file with an output stream rather than a writer, but I'm running into a problem. I'm trying to overwrite certain hexadecimal values, but every time I try I end up replacing the whole file. There's probably an easy explanation for this. I also wanted advice on how to replace the hex values: converting them (e.g. 5acd) from a string only gives me the byte data for the string's characters. Here's what I'm doing:
String textToWrite = inputField.getText();
byte[] charsToWrite = textToWrite.getBytes();
FileOutputStream out = new FileOutputStream(theFile);
out.write(charsToWrite, 23, charsToWrite.length);

Use a RandomAccessFile. It has the methods you are looking for. A FileOutputStream will only let you overwrite the whole file or append to it. However, note that, as Murali VP alluded to, this only lets you perform direct (byte-for-byte) replacements, not removal or insertion of bytes.

Converting from Hex String to Byte Array (which is essentially what you need) - see this SO post for what you need.
HTH
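Putting the two answers together, a minimal sketch might look like this. The class HexPatch, its method names, the temporary file, and the 32-byte size are placeholders for illustration; offset 23 is taken from the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HexPatch {
    // Turn a hex string such as "5acd" into the bytes {0x5a, (byte) 0xcd}.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string needs an even number of digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Overwrite bytes in place at the given offset; the rest of the file is untouched.
    static void patch(Path file, long offset, byte[] data) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path save = Files.createTempFile("save", ".bin"); // stand-in for the real save file
        Files.write(save, new byte[32]);                   // 32 zero bytes
        patch(save, 23, hexToBytes("5acd"));

        byte[] after = Files.readAllBytes(save);
        System.out.println(after.length);                  // 32
        System.out.printf("%02x%02x%n", after[23] & 0xff, after[24] & 0xff); // 5acd
        Files.delete(save);
    }
}
```

The file keeps its length because seek plus write replaces bytes where they are, whereas opening a FileOutputStream on the same path truncates the file first.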

Convert String to model or Statement in jena using Java program?

I made a program for RDF using Jena in Java. I have to return the result as a string, and then in another function I have to take that string and convert it to either a Model or a Statement. Is that possible? If so, how do I do it? Could someone help me with some sample code?
Thanks in advance.

If the RDF you want to serialize is less than your complete model, then create a temporary memory model and copy into it the statements you want to write. Use Model.write to convert those statements to a string (in RDF/XML, Turtle or N-Triples format). When you want to load a string containing RDF, create a java.io.StringReader object containing your string and pass that to the Model.read method.

It may be important to note that, according to the latest JavaDoc, the two Model.read() methods that take a Reader as a parameter both say "Using this method is often a mistake." I do not know why the JavaDoc says that, but it does. An alternative that I am using is to pass in an InputStream, as shown (where 'is' is the InputStream):
// read(InputStream in, String base, String lang)...
memModel.read(is, null,"TTL");
If you need to turn a String into an InputStream first, you can use:
InputStream is = new ByteArrayInputStream( str.getBytes() );
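One caveat: str.getBytes() with no argument uses the platform default charset, so it may be safer to name the charset explicitly. A small standard-library sketch follows; the class StringStreams and the sample triple are made up, and Jena itself is not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StringStreams {
    // Wrap the string's UTF-8 bytes in an InputStream.
    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    // Read the whole stream back as UTF-8 text (readAllBytes needs Java 9 or later).
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String turtle = "<http://example.org/s> <http://example.org/p> \"caf\u00E9\" .";
        InputStream is = toStream(turtle); // this is what would be handed to a reader of RDF
        System.out.println(readAll(is).equals(turtle)); // true
    }
}
```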

Why can I only convert a Representation to a string once in RESTlet?

So,
I'm trying to convert a Representation to a String or a StringWriter either using the getText() or write() method. It seems I can only call this method once successfully on a Representation... If I call the method again, it returns null or empty string on the second call. Why is this? I'd expect it to return the same thing every time:
public void SomeMethod(Representation rep)
{
String repAsString = rep.getText(); // returns valid text for example: <someXml>Hello WOrld</someXml>
String repAsString2 = rep.getText(); // returns null... wtf?
}
If I'm "doing it wrong" then I'd be open to any suggestions as to how I can get to that data.

The javadocs explain this:
"The content of a representation can be retrieved several times if there is a stable and accessible source, like a local file or a string. When the representation is obtained via a temporary source like a network socket, its content can only be retrieved once."
So presumably it's being read directly from the network or something similar.
You can check this by calling isTransient(). If you need to be able to read it multiple times, presumably you should convert it to a string and then create a new Representation from that string.

It's because in general the Representation doesn't actually get read in from the InputStream until you ask for it with getText(), and once you've asked for it, all the bytes have been read and converted into the String.
This is the natural implementation for efficiency: rather than creating a potentially very large String and then converting that String into something useful (a JSON object, a DOM tree, or whatever), you write your converter to operate on the InputStream instead, avoiding the costs of making and reading that huge String.
So for example if you have a large XML file being PUT into a web service, you can feed the InputStream right into a SAX parser.
(As @John notes, a StringRepresentation wraps a String, and so can be read multiple times. But you must be reading a Request's representation, which is most likely an InputRepresentation.)
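The same behaviour can be seen with a plain java.io stream, which may make it less surprising. This is only an analogy using the standard library, not Restlet's own classes; the class OneShotRead is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class OneShotRead {
    // Drain whatever is left in the stream and decode it as UTF-8.
    static String drain(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream(
                "<someXml>Hello World</someXml>".getBytes(StandardCharsets.UTF_8));

        String first = drain(body);  // consumes the stream
        String second = drain(body); // nothing left to read

        System.out.println(first);            // <someXml>Hello World</someXml>
        System.out.println(second.isEmpty()); // true

        // Keep the first result in a variable and reuse that instead of reading again.
        String reused = first;
        System.out.println(reused.equals(first)); // true
    }
}
```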
