How to compare two char* variables - java

Suppose we have the following function (in C):
const char *bitap_search(const char *text, const char *pattern)
My question is: how can I compare text and pattern if they are char*? This looks like a substring-search problem, but I am a bit confused. Can I compare individual characters with code like this?
if (text[i] == pattern[i])
I am interested in implementing this algorithm in Java:
http://en.wikipedia.org/wiki/Bitap_algorithm
How do I implement the following line in Java?
R = malloc((k+1) * sizeof *R);
Please help me translate this code to Java.
So we have two strings. Is text something like
"i like computer it is very important"
and pattern something like " computer it is very"?
Can anybody explain what we use in Java instead of char*?

I'm not sure what exactly you are asking, but if you mean to find pattern in text, then strstr(text, pattern). Or if you mean to just compare text and pattern, then strcmp(text, pattern) (note that it returns 0 when text and pattern are equal).
Edit based on discussion in comments: If you mean to ask how to implement the indexing of individual characters in Java, then substitute (in Java) text.charAt(i) for the C text[i]. In C the chars in strings can be indexed directly like an array, in Java one needs to call the correct method in String.
Edit 2: The C code const char * can be replaced in Java with String.
In C malloc is used to allocate memory; in this case it allocates room in the array R for m+1 elements. So, the BIT *R can be removed and R = malloc((m+1) * sizeof *R); replaced with boolean[] R = new boolean[m + 1];. When assigning values into the array R substitute true for 1 and false for 0.
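Putting these substitutions together, a Java translation of the exact-match routine might look like the sketch below. It assumes the array-based version of the function from the linked Wikipedia article. The class name BitapSketch and the choice to return an index (or -1) instead of a pointer are illustrative assumptions, not part of the original C code.

```java
final class BitapSketch {
    // Returns the index of the first occurrence of pattern in text, or -1 if absent.
    static int bitapSearch(String text, String pattern) {
        int m = pattern.length();
        if (m == 0) {
            return 0; // the C version returns text itself for an empty pattern
        }

        // r[k] is true when the first k pattern characters match the text ending at i.
        boolean[] r = new boolean[m + 1];
        r[0] = true; // the empty prefix always matches; the rest default to false

        for (int i = 0; i < text.length(); i++) {
            // Go from high k to low k so r[k - 1] still holds the previous step's value.
            for (int k = m; k >= 1; k--) {
                r[k] = r[k - 1] && text.charAt(i) == pattern.charAt(k - 1);
            }
            if (r[m]) {
                return i - m + 1; // start index of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "i like computer it is very important";
        String pattern = " computer it is very";
        System.out.println(bitapSearch(text, pattern)); // prints 6
    }
}
```

For everyday use, text.indexOf(pattern) gives the same answer; the sketch is only meant to show how text[i], malloc and the 1/0 values map onto Java.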

I think you are confused about the difference between char and char *. In C there is no built-in string type. Strings are represented as null-terminated character arrays, meaning that the last character of the string must be \0. So char is a single character, while char * is a pointer to an array of characters, i.e. a string. That means it is perfectly fine to write if (text[i] == pattern[i]).

You might try these:
Google-diff-match-patch says that it has a Java implementation of Bitap.
Also, it appears that crosswire has an implementation.
Finally, there seems to be a package called String Search, whose title is "High-performance pattern matching algorithms in Java".

You probably need strcmp() or strstr().

You should use strncmp(). The syntax is something like:
int strncmp( const char *str1, const char *str2, size_t count );
It is a safer way of comparing strings because it examines at most count characters, but of course you need to know their lengths, or at least the length of the shorter one.
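Since the question is about Java, a rough counterpart to strncmp(str1, str2, count) == 0 is String.regionMatches. The sketch below is only an approximation: the helper name firstNEqual is made up for illustration, and unlike strncmp(), regionMatches returns false when count runs past the end of either string instead of stopping at a terminator.

```java
final class PrefixCompare {
    // True when the first n chars of a and b are equal (hypothetical helper).
    static boolean firstNEqual(String a, String b, int n) {
        return a.regionMatches(0, b, 0, n);
    }

    public static void main(String[] args) {
        System.out.println(firstNEqual("computer", "compute it", 7)); // prints true
        System.out.println(firstNEqual("computer", "compute it", 8)); // prints false
        System.out.println(firstNEqual("ab", "ab", 5));               // prints false
    }
}
```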

Related

Quote marks when concatenating string with a single character

What kind of quote marks should I choose for a single character when I concatenate it with a string?
String s1="string";
Should I use
String s2=s1+'c';
Or
String s2=s1+"c";
?
You can use both! Give it a try!
"Why?" you ask. The magic here is the + operator.
When + is used with strings, it automatically turns the other operand into a string! That's why it can be used with 'c', a character literal. It can also be used with "c" because of course, "c" is a string literal.
Not only that, you can even add integers to a string:
String s2=s1+1;
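Here is a small sketch of that behaviour. The class and method names are only illustrative. It also shows one case to watch for: the conversion only happens once a string is involved, so adding two chars first is plain integer addition.

```java
final class ConcatDemo {
    // String + char: the char is converted to a one-character string.
    static String appendChar(String s, char c) {
        return s + c;
    }

    // String + int: the int is converted to its decimal text.
    static String appendInt(String s, int n) {
        return s + n;
    }

    // char + char is evaluated first as int addition, then converted.
    static String charsFirst(String s) {
        return 'a' + 'b' + s;
    }

    public static void main(String[] args) {
        String s1 = "string";
        System.out.println(appendChar(s1, 'c').equals(s1 + "c")); // prints true
        System.out.println(appendInt(s1, 1));                     // prints string1
        System.out.println(charsFirst(s1));                       // prints 195string
    }
}
```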
You can use it in two different ways: String s2=s1+'c'; and
char x = 'c';
String s2 = s1 + x;
Both approaches are OK, but if you go into the details, then perhaps
String s2=s1+'c';
will take a little less memory than the second way, because a char is just two bytes while a String requires 8+ bytes. But I don't think such nuances matter in most cases, and I'm not even sure the difference exists at all, because the JVM may optimize it away.
Java automatically does the conversion for you, so it doesn't really matter, but I'd personally just use a string (double quotes), because I prefer to minimize the 'automatic stuff' that happens when I can prevent it.
Also, if you ever decide that 'c' should be 'csomething', you'll have to change it to double quotes anyway.
But I suppose I'm just nitpicking...
Those are two different types of casting: implicit casting and explicit casting.
String s2=s1+'c';
This is an implicit cast, which means that the magic is done by the compiler (no overhead).
String s2=s1+"c";
This is an explicit cast, because "c" is converted to an object like this:
Object o = "c";
String s2 = (String) o;
This means that the conversion must be checked for null pointers, which creates overhead.
Therefore, while both ways work, I prefer casting from a character ('c') because that creates less overhead!
source: http://www.javaworld.com/article/2076555/build-ci-sdlc/java-performance-programming--part-2--the-cost-of-casting.html

Null termination in strings

Yes, I did check other threads and I have come to a conclusion. I just want you to confirm it so that I don't have any misconceptions.
Java String objects are not null terminated.
C++ std::string objects are also not null-terminated.
C strings, or C-style strings if you will (arrays of characters), are the only strings that are null-terminated.
Correct or Incorrect?
C-strings are 0-terminated strings. You aren't forced to use them in C though.
Both C++ std::string and Java strings are counted strings, which means they store their length.
But since C++11, C++ std::strings are also followed by a 0, which makes them 0-terminated if (as is often the case) they don't contain any embedded 0, for better interoperability with 0-terminated-string APIs.
All of those are in themselves correct, but a bit of petty pedantry: C-style strings are not unique to C; there are other places where such things occur (most commonly in various forms of assembler code, and since C was originally designed to be "slightly above assembler", this is no surprise).
And in C++11, std::string is guaranteed to have a NUL terminator after the last actual string character (at least if you call c_str(), but in the implementations I've looked at, it's stored there on creation/update). It is still valid to store NUL characters inside the string if you wish.
None of the statements is wrong, but each needs more clarification of the specifics of the language it mentions.
That is correct. C++ std::string and Java String both hold private fields indicating the length of the string. A NUL terminator is not needed.
The std::string method c_str returns the string as a NUL-terminated char array for use when a NUL terminator is required, e.g. by C string functions such as strlen.
I don't know about the Java part, but in C++11 std::strings are NUL-terminated (besides storing the chars count), i.e. &s[0] returns the same string as s.c_str() (which is NUL-terminated, as a raw C-style string).
See this answer for more details.
The question you need to ask is why a C string should be null-terminated.
The answer is that the string manipulation functions need to know the exact length of the string. Since strings in C are just arrays of characters, nothing records the size of the array, so the functions need something to help determine it: the null character at the end.
In Java, by contrast, strings are instances of the String class, which has a length field, so there is no need for null termination.
The same thing applies to strings in C++.
Almost correct.
C strings are not just arrays of characters. They are null-terminated arrays of characters.
So if you have an array of characters, it's not a C-string yet, it's just an ordinary array of characters. It has to have a terminating null character to be a valid C-style string.
Additionally, an std::string must also be null-terminated (since C++11). (But it still has a private variable holding the length of the string.)
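On the Java side, the "counted string" point can be checked with a short sketch: a \0 character is just another char inside a String and does not end it. The class and method names below are only illustrative.

```java
final class CountedStringDemo {
    // A Java String with a NUL character in the middle.
    static String withEmbeddedNul() {
        return "ab\0cd";
    }

    public static void main(String[] args) {
        String s = withEmbeddedNul();
        System.out.println(s.length());      // prints 5: the \0 is counted like any other char
        System.out.println(s.indexOf('\0')); // prints 2
        System.out.println(s.endsWith("cd")); // prints true: the text after \0 is still there
    }
}
```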

java unicode value of char

When I do Collections.sort(List), it sorts based on String's compareTo() logic, which compares the two strings char by char.
List<String> file1 = new ArrayList<String>();
file1.add("1,7,zz");
file1.add("11,2,xx");
file1.add("331,5,yy");
Collections.sort(file1);
My understanding is that a char holds a Unicode value. I want to know the Unicode values of chars such as , (comma). How can I do that? Is there a URL that lists their numeric values?
My understanding is that a char holds a Unicode value. I want to know the Unicode values of chars such as , (comma).
Well there's an implicit conversion from char to int, which you can easily print out:
int value = ',';
System.out.println(value); // Prints 44
This is the UTF-16 code unit for the char. (As fge notes, a char in Java is a UTF-16 code unit, not a Unicode character. There are Unicode code points greater than 65535, which are represented as two UTF-16 code units.)
Is there a URL that lists their numeric values?
Yes - for more information about Unicode, go to the Unicode web site.
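To tie this back to the sort in the question, here is a small sketch showing why "1,7,zz" sorts before "11,2,xx": at the first position where they differ, ',' (44) is smaller than '1' (49). The class name and the sorted helper are only illustrative, and the input order is shuffled on purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortOrderDemo {
    // Returns a sorted copy, using String's natural (compareTo) order.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println((int) ','); // prints 44
        System.out.println((int) '1'); // prints 49

        // compareTo returns the difference of the first differing chars: 44 - 49.
        System.out.println("1,7,zz".compareTo("11,2,xx")); // prints -5

        System.out.println(sorted(Arrays.asList("331,5,yy", "11,2,xx", "1,7,zz")));
        // prints [1,7,zz, 11,2,xx, 331,5,yy]
    }
}
```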
Uhm no, char is not a "unicode value" (and the word to use is Unicode code point).
A char is a code unit in the UTF-16 encoding. And it so happens that in Unicode's Basic Multilingual Plane (ie, Unicode code points ranging from U+0000 to U+FFFF, for code points defined in this range), yes, there is a 1-to-1 mapping between char and Unicode.
In order to know the numeric value of a code point you can just do:
System.out.println((int) myString.charAt(0));
But this IS NOT THE CASE for code points outside the BMP. For these, one code point translates to two chars. See Character.toChars(). And more generally, all static methods in Character relating to code points. There are quite a few!
This also means that String's .length() is actually misleading, since it returns the number of chars, not the number of graphemes.
Demonstration with one Unicode emoticon (the first in that page):
System.out.println(new String(Character.toChars(0x1f600)).length());
prints 2. Whereas:
final String s = new String(Character.toChars(0x1f600));
System.out.println(s.codePointCount(0, s.length()));
prints 1.
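As a slightly fuller sketch of the same point, the snippet below looks at what the two chars actually are. The class and helper names are only illustrative; the Character and String methods are the standard ones.

```java
final class CodePointDemo {
    // Builds a String from a single Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);

        System.out.println(s.length());                             // prints 2 (UTF-16 code units)
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // prints true
        System.out.println(s.codePointAt(0) == 0x1F600);            // prints true
        System.out.println(s.codePointCount(0, s.length()));        // prints 1
    }
}
```

So charAt(0) on such a string gives half of a surrogate pair, while codePointAt(0) gives the full code point.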

difference between Strings in C++ and Java

In C++ I can do something like this...
string s = "abc";
char c = s[i]; // works fine...
But in Java, if I try doing the same, it throws an error. Why?
In Java, to achieve the above, I have to do:
s.toCharArray();
How is the implementation of Strings in C++ different from that in Java?
In Java, to achieve the above, I have to do:
s.toCharArray();
Not really. You can use charAt instead:
char c = s.charAt(i);
Basically, C++ allows user-defined operators - Java doesn't. So the String class doesn't expose any sort of "indexing" operator; that only exists for arrays, and a String isn't an array. (It's usually implemented using an array, but that's a different matter.)
EDIT: As noted in comments, the + operator is special-cased for strings - right in the language specification. The same could have been done for [], but it isn't - and as it's not in the language specification, and Java doesn't support overloaded operators, it can't be performed in library code. (For example, you can't give custom behaviour to + for any other class.)
The difference is that C++ has operator overloading, and uses it to access the string contents.
They both store the string characters in such a way as you cannot change them.
The reason that it is possible to write
string s = "abc";
char c = s[i];
in C++ is that the string class has overloaded the indexing operator (say [] operator) which allows programmers to access characters of a string object the same way that they access an element of an array, despite the fact that a string object is not an array.
Java, on the other hand, does not allow operator overloading of any kind (the only exception is the + operator that is overloaded for strings) and hence, the indexing operator is not and can not be overloaded for String objects. In Java, to access a character of a string, you need to use accessor member methods, such as charAt. You can also invoke the toCharArray method of the String class, which returns to you an array of the characters of the string object and you can use the indexing operator with this returned value:
char c = s.toCharArray()[i];
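A quick sketch comparing the two approaches (the class and method names are only illustrative). One thing worth knowing is that toCharArray returns a newly allocated copy, so writing to the array does not change the string, and calling it inside a loop copies the whole string each time.

```java
final class CharAccessDemo {
    // Reads one char directly from the string.
    static char viaCharAt(String s, int i) {
        return s.charAt(i);
    }

    // Reads one char through a freshly copied array.
    static char viaArray(String s, int i) {
        return s.toCharArray()[i];
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(viaCharAt(s, 1) == viaArray(s, 1)); // prints true

        char[] copy = s.toCharArray();
        copy[0] = 'x';                        // changes only the copy
        System.out.println(s);                // prints abc
        System.out.println(new String(copy)); // prints xbc
    }
}
```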
See the method String#charAt
Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing.
If the char value specified by the index is a surrogate, the surrogate value is returned.
public char charAt(int index)
In C++, strings are already treated as arrays of characters,
but in Java, String is a built-in class.
It is different from an array of characters.
In C++, a string is typically just an array of (or a pointer to) chars, terminated with a NUL (\0) character. You can process a string by indexing, just as you would process any array.
But in Java, strings are not arrays. Java strings are objects of type java.lang.String, so you cannot process them by indexing.

I am having trouble creating a 16bit char in java

How can I create a variable character that can hold a four byte value?
I am trying to write a program to encrypt messages in Java, for fun. I figured out how to use RSA, and managed to write a program that will encrypt a message and save it to a .txt file.
For example, if "Quiet" is entered, the outcome will be "041891090280". I wrote my code so that the number always has a length that is a multiple of six. So I thought that I could convert the numbers into characters. The first three digits are "041", so I could convert that into ")".
However, I am having trouble creating a char from a number greater than 255. I have looked around online and found a few examples, but I can't figure out how to implement them. I created a new method just to test them.
int a = 256;
char b = (char) a;
char c = 0xD836;
char[] cc = Character.toChars(0x1D50A);
System.out.println(b);
System.out.println(c);
System.out.println(cc);
The program outputs
?
?
?
I am only getting two bytes. I read that Java uses Unicode which should go up to 65535 which is four bytes. I am using eclipse if that makes a difference.
I apologize for the noob question.
And thanks in advance.
edit
I am sorry, I think I gave too much information and ended up being confusing.
What I want to do is store a string of numbers as a string of Unicode characters. The only way I know how to do that is to break the number string into pieces small enough to fit into a character, then add the characters one by one to a new string. But I don't know how to add a variable Unicode character to a string.
All chars are 16-bit already. Values from 0 to 65535 need only 16 bits, since 2^16 = 65536.
Note: not all char values are valid characters; in particular, 0xD800 to 0xDFFF are surrogates, used in pairs to encode code points beyond 65535 (U+FFFF).
If you want to be able to store all possible 16-bit values I suggest you use short instead. You can store the same values but it may be less confusing to use.
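If you do want to go the char route described in the edit, a minimal sketch could look like this. It assumes the digit string is split into groups of three (as in the "041" example) and that its length is a multiple of three; the class and method names are made up for illustration. Values 0 to 999 stay well below the surrogate range, so every group maps to a single valid char.

```java
import java.util.Locale;

final class DigitPacking {
    // Packs each group of three decimal digits (000..999) into one char.
    // Assumes digits.length() is a multiple of three.
    static String pack(String digits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digits.length(); i += 3) {
            int value = Integer.parseInt(digits.substring(i, i + 3));
            sb.append((char) value); // appending a char variable works like any other char
        }
        return sb.toString();
    }

    // Reverses pack: each char becomes three zero-padded digits again.
    static String unpack(String packed) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < packed.length(); i++) {
            sb.append(String.format(Locale.ROOT, "%03d", (int) packed.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String packed = pack("041891090280");
        System.out.println(packed.length());  // prints 4
        System.out.println(packed.charAt(0)); // prints )
        System.out.println(unpack(packed));   // prints 041891090280
    }
}
```

Whether chars above 255 show up correctly when printed depends on the console's encoding, which is separate from whether the char holds the right value.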
