Split character is not recognized - java

While looking for a reliable character for splitting strings, I found an earlier post about using "((char)007)" as a split character, so I decided to use it in a request/response project I'm building.
But when I send data with "((char)007)" between the parts that need to be separated, the data arrives at the other end of the socket like this instead: "teq□weq□1231□21231".
So splitting this data does not work at the moment. Any ideas about why this happens, how I might fix it, or what else I could use for splitting would be appreciated. Thanks.

If you are printing control characters (such as BEL), your console may not display them properly.
In any case, consider sending a structure instead, such as a serialized object (be careful when deserializing user-supplied content) or perhaps JSON. Any structure with a standardized format will serve you better in the long term than arbitrary splitting on a magic character.
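To check whether the delimiter itself survives, as opposed to only being drawn oddly by the console, a small round trip can help. The sketch below is a minimal illustration, not the asker's code: the class and method names are hypothetical, and it assumes the delimiter is the single character with code 7.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BellDelimiterDemo {
    // BEL (character code 7) is a non-printing control character.
    static final char BEL = (char) 7;

    // Joins the parts with a single BEL character between them.
    static String join(String... parts) {
        return String.join(String.valueOf(BEL), parts);
    }

    // Splits on BEL; Pattern.quote keeps the delimiter literal,
    // and the -1 limit preserves trailing empty fields.
    static String[] split(String message) {
        return message.split(Pattern.quote(String.valueOf(BEL)), -1);
    }

    public static void main(String[] args) {
        String message = join("teq", "weq", "1231", "21231");
        // prints [teq, weq, 1231, 21231]
        System.out.println(Arrays.toString(split(message)));
    }
}
```

If a split like this works locally but fails on received data, the delimiter was probably changed in transit, for example by sending the literal text "((char)007)" or by a charset mismatch, rather than by the console rendering.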

Related

How would you go about separating strings in a file if they may contain any arbitrary characters?

I'm currently writing a Java program which involves goals. It's basically a to-do list. Each goal has a few strings, such as name, description etc. I can save and load these goals to a file. My issue was separating the strings - I couldn't think of a character that couldn't be in the string itself. I ended up prefixing each string with its length and then a colon.
I'm sure there is something in the Java API that will handle this, like ObjectOutputStream. I'm curious about the 'general case', though. This must be an issue for any program that saves and loads strings from a file without being able to assume anything about the string. Is there a better way to go about this?
There are a couple of ways to handle your case, e.g.:
Encoding your strings with something like Base64
Applying a well-defined format, e.g. JSON or CSV
There are plenty of tools to support you, including:
Apache Commons Codec for Base64 encoding/decoding
Jackson for JSON serializing/deserializing
opencsv for CSV serializing/deserializing
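As a rough sketch of the Base64 option: each field is encoded separately, so a character outside the Base64 alphabet can serve as a safe separator. This example uses the JDK's own java.util.Base64 (Java 8+) instead of Apache Commons Codec, and the class name, method names and choice of a comma as separator are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

class Base64Fields {
    // Encodes every field as Base64 and joins them with commas.
    // A comma never appears in standard Base64 output, so it is a safe separator.
    static String encode(List<String> fields) {
        List<String> encoded = new ArrayList<>();
        for (String field : fields) {
            byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
            encoded.add(Base64.getEncoder().encodeToString(bytes));
        }
        return String.join(",", encoded);
    }

    // Reverses encode(). Note: an empty line decodes to one empty field,
    // so an empty list and a single empty string are indistinguishable.
    static List<String> decode(String line) {
        List<String> fields = new ArrayList<>();
        for (String part : line.split(",", -1)) {
            byte[] bytes = Base64.getDecoder().decode(part);
            fields.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return fields;
    }
}
```

The encoded line contains no newlines either, so one goal per line would work with this scheme. The trade-off is that the file is no longer human-readable.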

split JSON and string in android

My HTTP request responds with a combination of a string and JSON, something like this:
null{"username:name","email:email"}
I need only the JSON part.
I tried parsing it directly as a JSON object, which of course did not work. I then tried splitting it with serverResponse.split("{"), but Android does not allow this because "{" on its own is not a valid pattern. Any suggestions on how I can achieve this?
String.split uses regular expressions, and since '{' is a special character in regular expressions, you should escape it like this: serverResponse.split("\\{").
It would be better to change the server side, but you can also just use split. The only thing you need to do is escape your {.
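// split() drops the delimiter, so the brace is added back; the limit of 2 keeps nested braces intact
String json = "{" + serverResponse.split("\\{", 2)[1];
It is a bad idea and bad practice to split JSON. If the response changes on the server side one day, the split may pick the wrong part of your JSON object.
I recommend that you PARSE it, even if it is simple and small.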
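One way to isolate the JSON part without regular expressions is to locate the first opening brace and keep everything from there on. The sketch below assumes the prefix never contains "{"; the class and method names are hypothetical, and the result should still be handed to a real JSON parser.

```java
class JsonPayloadExtractor {
    // Returns the response from the first '{' onward, keeping the brace itself.
    static String extractJson(String response) {
        int start = response.indexOf('{');
        if (start < 0) {
            throw new IllegalArgumentException("no JSON object found in response");
        }
        return response.substring(start);
    }

    public static void main(String[] args) {
        // A well-formed variant of the response shape from the question.
        String response = "null{\"username\":\"name\",\"email\":\"email\"}";
        // prints {"username":"name","email":"email"}
        System.out.println(extractJson(response));
    }
}
```

Unlike splitting on the brace, this keeps the opening "{" and any nested objects, so the output can be parsed as it stands.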

Replacing Java unicode encodings with actual characters

When I make web queries, I get accented characters back as escape sequences in strings, such as "\u00f3", but I need to replace each one with the actual character, like "ó", before making another query.
How would I find these cases without actually looking for each one, one by one?
It seems you're handling JSON formatted data.
Use any of the many freely available JSON libraries to handle this (and other parsing issues) for you instead of trying to do it manually.
The one from JSON.org is pretty widely used, but there are surely others that work just as well.
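For illustration only, this is roughly what a JSON parser does with such escapes. The sketch assumes every backslash-u sequence in the input is a genuine escape; it does not handle escaped backslashes or the other JSON escapes, which is one reason a library is the safer choice. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape sequence with the character it denotes.
    static String unescape(String input) {
        Matcher matcher = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement guards against '$' or '\' in the replacement text.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling UnicodeUnescaper.unescape on the six-character sequence from the question yields the single character "ó".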

Parsing of data structure in a plain text file

How would you parse in Java a structure, similar to this
\\Header (name)\\\
1JohnRide 2MarySwanson
1 password1
2 password2
\\\1 block of data name\\\
1.ABCD
2.FEGH
3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc
Suppose it comes from a text buffer (plain file).
Each line of text is terminated by "\n". Spaces separate the words.
The structure is more or less defined, though it can sometimes be ambiguous: the
number of fields in each line of information may differ, a block of data may
sometimes be missing, and the number of lines in each block may vary as well.
The question is how to do this most effectively.
The first solution that comes to mind is to use regular expressions.
But are there other solutions? Problem-oriented ones? Maybe a Java library that has already been written?
Check out UTAH: https://github.com/sonalake/utah-parser
It's a tool that's pretty good at parsing this kind of semi-structured text.
As no one has recommended any other library, my suggestion would be: use regular expressions.
From what you have posted it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first char of a token to see if it is a digit (in which case the part of the token after the digit(s) will be the data, if there is any).
This sounds like a homework problem so I'm going to try to answer it in such a way to help guide you (not give the final solution).
First, you need to consider each object of data you're reading. Is it a number then a text field? A number then 3 text fields? Variable numbers and text fields?
After that you need to determine what you're going to use to delimit each field and each object. For example, in many files you'll see something like a semi-colon between the fields and a new line for the end of the object. From what you said it sounds like yours is different.
If an object can go across multiple lines you'll need to bear that in mind (don't stop partway through an object).
Hopefully that helps. If you research this and you're still having problems post the code you've got so far and some sample data and I'll help you to solve your problems (I'll teach you to fish....not give you fish :-) ).
If the fields are fixed length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write yourself a state machine which knows what kind of line to expect next, given what it's already seen. Once you have each line as a string, then you just need to split the data appropriately.
E.g., the password can be gotten from your password line like this:
// The password is everything after the first space on the line.
final int pos = line.indexOf(' ');
String passwd = line.substring(pos + 1);
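The BufferedReader approach described above might be sketched as a very small state machine that only tracks which block it is currently in. It assumes that any line starting with a backslash is a block header and that every other non-empty line belongs to the most recent header. The class and method names are hypothetical, and splitting the individual data lines is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockParser {
    // Groups data lines under the header that precedes them, in file order.
    static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null; // state: lines of the block being read
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("\\")) {
                // Header line: remove the backslashes to get the block name.
                String name = line.replace("\\", "").trim();
                current = new ArrayList<>();
                blocks.put(name, current);
            } else if (current != null) {
                current.add(line);
            }
            // Lines before the first header are ignored in this sketch.
        }
        return blocks;
    }

    // Convenience wrapper for text that is already in memory.
    static Map<String, List<String>> parseText(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Blocks that are missing from the file simply never appear in the map, and blocks of any length are handled by the same loop, which matches the variability described in the question.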

Java unreadable strings

I have written a Java socket listener which listens on port 80. It gathers the data that arrives on that port and stores it in a temporary string, which is then used for further operations (type conversions and so on). The basic problem is that the data has parts that are unreadable (like # [ Qô — z ‡ ). When I store it in a string and print the string, only the readable parts are printed, which is understandable. What puzzles me is that when I print the length of the string, I get only the length of the readable part. So I want to know whether storing unreadable parts in a string is an acceptable way to enable further operations on them. If not, I would also like some pointers as to how I could store such incoming data.
Regards
p1nG
Something does not make sense here. If you are storing the "unreadable" part of the data in the String, it will be reflected in the length of the String.
i want to know if my approach of storing unreadable string parts in a string is acceptable to enable further operations on them. If not, I would also like some pointers as to how I could store such incoming data.
It depends on why the data is unreadable.
One possibility is that the remote system is sending data in some unexpected character set or encoding. For example, if it is sending Latin-1 and you are expecting UTF-8 (or vice versa), some sections of the text may be unreadable. The solution is to figure out what character set and encoding the remote system is sending, and to use the correct Java charset name when converting the bytes to Java characters.
Another possibility is that some of the data is binary data. If so, you should separate the text from the binary data, based on the application protocol used by the remote system.
Finally, the unreadable stuff might be caused by line noise or such like. If that's the case, you should probably leave it intact.
An alternative approach is to use a byte array (or something similar) rather than a String to hold the data. The problem with trying to convert bytes to characters when you are not sure of the character set and encoding is that the conversion may be lossy. By storing the raw bytes, your application at least has the possibility of getting it right later ... when you figure out what the correct conversion is.
You can store the data in a java.nio.ByteBuffer to avoid all the string wackiness.
If it's truly text being sent in some wide character encoding, you'll want to convert the byte buffer into a string using the appropriate character set with the handy java.nio.charset.Charset.decode.
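A minimal sketch of keeping the raw bytes and decoding them only once the charset is known. The sample bytes and the class name are made up for illustration; the calls are the standard java.nio ones mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RawBytesDemo {
    // Decodes raw bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return charset.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) {
        // "café" as sent by a hypothetical Latin-1 peer: the last byte is 0xE9.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};

        // Decoding with the sender's charset recovers the text.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));

        // Decoding with the wrong charset is lossy: the final byte is not
        // valid UTF-8 on its own, so the decoder substitutes a replacement.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

Because the byte array is left untouched, the decoding can be retried with a different charset later, which is not possible once a lossy conversion to String has been done.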
