Implementation of substring in C# and which algorithm does it use? - java

I have a question; I'm doing some research on programming languages.
The research is about the efficiency of the substring functions in C# and Java.
For example: does C# use a brute-force approach, or does it implement the Boyer-Moore algorithm like a good boy?
I need the source code for this. I already found it for Java (which uses a brute-force implementation in the indexOf() method, for those who wonder).
Does anyone have an idea how I can retrieve the source code for methods like these in C#?
I have Visual Studio installed on my laptop but I can't find any source code...
Your help would be much appreciated!

Microsoft has published the complete framework source code, including comments. You will find the actual implementation over here on referencesource. For Substring, it comes down to some unsafe code:
[System.Security.SecurityCritical]  // auto-generated
unsafe string InternalSubString(int startIndex, int length) {
    Contract.Assert( startIndex >= 0 && startIndex <= this.Length, "StartIndex is out of range!");
    Contract.Assert( length >= 0 && startIndex <= this.Length - length, "length is out of range!");
    String result = FastAllocateString(length);
    fixed(char* dest = &result.m_firstChar)
        fixed(char* src = &this.m_firstChar) {
            wstrcpy(dest, src + startIndex, length);
        }
    return result;
}
As you can see, they are using wstrcpy which probably is as fast as it gets.
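For comparison with the Java side of the question, the same idea (a bounds check followed by one bulk copy) can be sketched in Java. This is only an illustrative analogue, not the .NET or JDK implementation; the class name SubstringSketch and the (startIndex, length) signature are assumptions chosen to mirror the C# method above.

```java
class SubstringSketch {
    // Copies `length` chars starting at `startIndex` into a new String.
    static String substring(String s, int startIndex, int length) {
        // Same range conditions as the two Contract.Assert calls above.
        if (startIndex < 0 || length < 0 || startIndex > s.length() - length) {
            throw new IndexOutOfBoundsException("startIndex or length is out of range");
        }
        char[] dest = new char[length];
        // One bulk copy of the requested range; no searching is involved.
        s.getChars(startIndex, startIndex + length, dest, 0);
        return new String(dest);
    }

    public static void main(String[] args) {
        System.out.println(substring("hello world", 6, 5)); // prints "world"
    }
}
```

The cost is proportional to the length of the copied range, which is a separate matter from the search algorithm behind indexOf().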

Related

efficient way to compare multiple int values to 0

I'm trying to compare three lengths to 0 and was wondering if there was a more efficient/cleaner way than repeating "!= 0".
public static boolean isTriangle(int lengthA, int lengthB, int lengthC) {
    if (lengthA != 0 && lengthB != 0 && lengthC != 0) { //is there a shorter/cleaner way to write this?
        //do a thing
    }
    return false;
}
You can use the IntStream and allMatch
if(IntStream.of(lengthA,lengthB,lengthC).allMatch(i->i!=0)) {
// do a thing
}
or by using noneMatch also
IntStream.of(lengthA,lengthB,lengthC).noneMatch(i->i==0)
Another way to do this is with a utility method:
public static boolean isNotZero(int val) {
    return val != 0;
}
Now simplify the if condition:
if (isNotZero(lengthA) && isNotZero(lengthB) && isNotZero(lengthC)) {
You're asking three things; you're asking for code that is
shorter
more efficient
cleaner
I have an alternative for you:
if ((lengthA & lengthB & lengthC) != 0)
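Be careful, though: it is not equivalent to your old code. Bitwise AND is non-zero only when all three values share at least one set bit, so 1, 2 and 4 are all non-zero yet (1 & 2 & 4) == 0.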
it is shorter.
it's potentially more efficient but only a good microbenchmark can confirm. Even if it is, that shouldn't guide you. Because of point 3, you should only consider it if it shows up as a bottleneck in your app using a performance analyzer tool, which is very, very unlikely.
however it's not cleaner. And with that I mean that it will make your code harder to read and understand, and anyone maintaining your code will now have to think about bit manipulation.
Most likely the same will go for any proposed alternative.
What you were doing in your original code is good enough; it's what people understand, and it's most likely the cleanest code you can write for the purpose.
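Whether the bitwise-AND shortcut really matches the original condition is easy to check. The sketch below compares the two on a couple of inputs; the class and method names are made up for illustration.

```java
class BitwiseAndCheck {
    // The original condition from the question.
    static boolean allNonZero(int a, int b, int c) {
        return a != 0 && b != 0 && c != 0;
    }

    // The bitwise-AND shortcut.
    static boolean bitwiseAnd(int a, int b, int c) {
        return (a & b & c) != 0;
    }

    public static void main(String[] args) {
        // 3, 5 and 7 all have the lowest bit set, so both checks agree.
        System.out.println(allNonZero(3, 5, 7) + " " + bitwiseAnd(3, 5, 7)); // true true
        // 1, 2 and 4 are all non-zero but share no bit, so the shortcut disagrees.
        System.out.println(allNonZero(1, 2, 4) + " " + bitwiseAnd(1, 2, 4)); // true false
    }
}
```

The two only agree when the values happen to share a set bit, which is another reason to keep the explicit comparisons.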
Best answer (IMHO): "Don't even try to second-guess the optimizing compiler." Just specify the source-code in the way that most accurately specifies (to your fellow humans, nothing more ...) what you want the computer to do. Don't presume that the actual sequence of machine instructions that is actually given to the hardware actually corresponds to this. "It's magic. Really."
For Java language, your code is good. There's no better way to do this in Java.

JAVA string how can i implement the length method

My roommate's teacher gave them an assignment to implement the string length method in Java.
We have thought of two ways:
Check the element,and when get the out of bounds exception,it means the end of string,we catch this exception,then we can get the length.
Every time a string is pass to calculate the length,we add the special character to the end of it,it can be '\0',or "A",etc..
But we all think that although these two ways may finish the assignment, they are bad (or at least using an exception like this is a bad habit); it's not cool.
We have googled it, but didn't find what we want.
Something like this?
int i = 0;
for (char ch : string.toCharArray()) {
i++;
}
The pseudo-code you probably want is:
counter = 0
for(Character c in string) {
counter = counter + 1
}
This requires you to find a way to turn a Java String into an array of characters.
Likely the teacher is trying to make his or her students think, and will be satisfied with creative solutions that solve the problem.
None of these solutions would be used in the real world, because we have the String.length() method. But the creative, problem-solving process you're learning would be used in real development.
"1. Check the element,and when get the out of bounds exception,it means the end of string,we catch this exception,then we can get the length."
Here, you're causing an exception to be thrown in the normal case. A common style guideline is for exceptions to be thrown only in exceptional cases. Compared to normal flow of control, throwing an exception can be more expensive and more difficult to follow by humans.
That said, this one of your ideas has a potential advantage for very long strings. All of the posted answers so far run in linear time and space. The time and/or additional space they take to execute is proportional to the length of the string. With this approach, you could implement an O(log n) search for the length of the string.
Linear or not, it's possible that the teacher would find this approach acceptable for its creativity. Avoid if the teacher has communicated the idea that exceptions are only for exceptional cases.
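The O(log n) idea mentioned above could be sketched as follows: probe indexes with charAt, double the probe until it falls outside the string, then binary-search the boundary. This is only a sketch under the assignment's constraint of not calling length(); the class and helper names are invented for illustration.

```java
class LengthBySearch {
    // True when index i is inside the string, found out only via charAt.
    static boolean inBounds(String s, int i) {
        try {
            s.charAt(i);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    static int length(String s) {
        if (!inBounds(s, 0)) {
            return 0; // empty string
        }
        int lo = 0; // always a valid index
        int hi = 1; // candidate for the first invalid index
        // Gallop: double hi until it lands outside the string.
        while (inBounds(s, hi)) {
            lo = hi;
            // Cap the doubling so hi never overflows; Integer.MAX_VALUE is never a valid index.
            hi = (hi > Integer.MAX_VALUE / 2) ? Integer.MAX_VALUE : hi * 2;
        }
        // Binary search: lo is in bounds, hi is out of bounds.
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (inBounds(s, mid)) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi; // first invalid index equals the length
    }

    public static void main(String[] args) {
        System.out.println(length("foo")); // prints 3
    }
}
```

Both phases take a number of probes proportional to log n, but roughly half of those probes throw an exception, so for short strings the plain counting loop is likely to be faster.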
"2. Every time a string is pass to calculate the length,we add the special character to the end of it,it can be '\0',or "A",etc.."
This idea has a flaw. What happens if the string contains your special character?
EDIT
A simple implementation would be to get a copy of the underlying char array with String.toCharArray(), then simply take its length. Unlike your ideas, this is not an in-place approach - making the copy requires additional space in memory.
String s = "foo";
int length = s.toCharArray().length;
Try this
public static int Length(String str) {
    // Append a sentinel; note that this stops early if str already contains '\0'.
    str = str + '\0';
    int count = 0;
    for (int i = 0; str.charAt(i) != '\0'; i++) {
        count++;
    }
    return count;
}
What about:
"your string".toCharArray().length

Splitting a string into n-length chunks in Java [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Split string to equal length substrings in Java
Given the following utility method I have:
/**
 * Splits string <tt>s</tt> into chunks of size <tt>chunkSize</tt>
 *
 * @param s the string to split; must not be null
 * @param chunkSize number of chars in each chunk; must be greater than 0
 * @return The original string in chunks
 */
public static List<String> splitInChunks(String s, int chunkSize) {
    Preconditions.checkArgument(chunkSize > 0);
    List<String> result = Lists.newArrayList();
    int length = s.length();
    for (int i = 0; i < length; i += chunkSize) {
        result.add(s.substring(i, Math.min(length, i + chunkSize)));
    }
    return result;
}
1) Is there an equivalent method in any common Java library (such as Apache Commons, Google Guava) so I could throw it away from my codebase? Couldn't find with a quick look. Whether it returns an array or a List of Strings doesn't really matter.
(Obviously I wouldn't add dependency to some huge framework just for this, but feel free to mention any common lib; maybe I use it already.)
2) If not, is there some simpler and cleaner way to do this in Java? Or a way that is strikingly more performant? (If you suggest a regex-based solution, please also consider cleanness in the sense of readability for non regex experts... :-)
Edit: this qualifies as a duplicate of the question "Split string to equal length substrings in Java" because of this Guava solution which perfectly answers my question!
You can do this with Guava's Splitter:
Splitter.fixedLength(chunkSize).split(s)
...which returns an Iterable<String>.
Some more examples in this answer.
Mostly a duplicate of Split Java String in chunks of 1024 bytes where the idea of turning it into a stream and reading N bytes at a time would seem to meet your need?
Here is a way of doing it with regex (which seems a bit of a sledgehammer for this particular nut)
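A regex-based version might look like the sketch below. It is one possible approach using only java.util.regex, not necessarily the one originally posted; the class name RegexChunks is an assumption, and the method name simply mirrors the question's utility.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexChunks {
    static List<String> splitInChunks(String s, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be greater than 0");
        }
        // Greedily match between 1 and chunkSize characters; DOTALL lets '.' match newlines too.
        Pattern chunk = Pattern.compile(".{1," + chunkSize + "}", Pattern.DOTALL);
        Matcher matcher = chunk.matcher(s);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitInChunks("abcdefghij", 4)); // prints [abcd, efgh, ij]
    }
}
```

It does the same job as the substring loop in the question, with the chunking logic hidden in the pattern, which is arguably harder to read for people who are not regex experts.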

Java Binary search

Trying to perform a binary search on a sorted array of Book objects.
It's not working well: it returns the correct results for some of the objects, but not all.
I went through the loop on paper and it seems that a number can get missed out due to rounding #.5 upwards.
Any ideas how to make this work?
Book found = null;
/*
 * Search at the center of the collection. If the reference is less than that,
 * search in the upper half of the collection, else, search in the lower half.
 * Loop until found else return null.
 */
int top = numberOfBooks()-1;
int bottom = 0;
int middle;
while (bottom <= top && found == null){
    middle = (bottom + top)/2;
    if (givenRef.compareTo(bookCollection.get(middle).getReference()) == 0) {
        found = bookCollection.get(middle);
    } else if (givenRef.compareTo(bookCollection.get(middle).getReference()) < 0){
        bottom = middle + 1;
    } else if (givenRef.compareTo(bookCollection.get(middle).getReference()) > 0){
        top = middle - 1;
    }
}
return found;
A couple suggestions for you:
there's no need to keep a Book variable. In your loop, just return the book when it's found, and at the end return null. And you can also remove the boolean check for the variable in the while condition.
the middle variable can be scoped inside the loop, no need to have it live longer.
you're doing bookCollection.get(middle).getReference() three times. Consider creating a variable and then using it.
the middle = (bottom + top)/2 is a classic mistake in binary search implementation algorithms. Even Joshua Bloch, who wrote the Java Collection classes, made that error (see this interesting blog post about it). Instead, use (bottom+top) >>> 1, to avoid integer overflow for very large values (you probably wouldn't encounter this error, but it's for the principle).
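The overflow in the last point only shows up with very large indexes, so it is easy to miss. A small sketch with made-up index values (class and method names are illustrative):

```java
class MidpointOverflow {
    // The form used in the question: the sum can exceed Integer.MAX_VALUE and wrap around.
    static int naiveMid(int bottom, int top) {
        return (bottom + top) / 2;
    }

    // The unsigned shift treats the wrapped sum as an unsigned value, so the midpoint is still right.
    static int safeMid(int bottom, int top) {
        return (bottom + top) >>> 1;
    }

    public static void main(String[] args) {
        int bottom = 1_500_000_000;
        int top = 2_000_000_000;
        System.out.println(naiveMid(bottom, top)); // prints -397483648
        System.out.println(safeMid(bottom, top));  // prints 1750000000
    }
}
```

An equivalent alternative is bottom + (top - bottom) / 2, which never forms the oversized sum in the first place.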
As for your actual problem statement, rounding would be downwards (integer division), not upwards. To troubleshoot the problem:
are you sure the numberOfBooks() method corresponds to the length of your collection?
are you sure the compareTo() method works as expected for the types you are using (in your code example we do not know what the getReference() return type is)
are you sure your collection is properly sorted according to getReference()?
and finally, are you sure that using givenRef.compareTo(bookCollection.get(middle).getReference()) < 0 is correct? In standard binary search implementations it would be reversed, e.g. bookCollection.get(middle).getReference().compareTo(givenRef) < 0. This might be what donroby mentions, not sure.
In any case, the way to find the error would be to try out different values and see for which the output is correct and for which it isn't, and thus infer what the problem is. You can also use your debugger to help you step through the algorithm, rather than using pencil and paper if you have to run many tests. Even better, as donroby said, write a unit test.
What about Collections.binarySearch()?
All of JRL's suggestions are right, but the actual failure is that your compares are reversed.
I didn't see this immediately myself, but replicating your code into a function (using strings instead of Books), writing some simple JUnit tests and then running them in the debugger made it really obvious.
Write unit tests!
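A version with the comparisons the right way round, using strings instead of Books as described above, might look like this sketch. The class name, the List<String> parameter and the -1 "not found" result are assumptions for illustration, and the list must already be sorted in natural String order.

```java
import java.util.Arrays;
import java.util.List;

class ReferenceSearch {
    // Returns the index of givenRef in sortedRefs, or -1 when it is absent.
    static int indexOf(List<String> sortedRefs, String givenRef) {
        int bottom = 0;
        int top = sortedRefs.size() - 1;
        while (bottom <= top) {
            int middle = (bottom + top) >>> 1;
            int cmp = givenRef.compareTo(sortedRefs.get(middle));
            if (cmp == 0) {
                return middle;
            } else if (cmp < 0) {
                top = middle - 1;    // givenRef sorts before the middle: keep the lower half
            } else {
                bottom = middle + 1; // givenRef sorts after the middle: keep the upper half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("A1", "B2", "C3", "D4", "E5");
        System.out.println(indexOf(refs, "D4")); // prints 3
        System.out.println(indexOf(refs, "Z9")); // prints -1
    }
}
```

Searching for every element of a small sorted list, plus a few values that are not in it, is enough of a unit test to expose reversed comparisons.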
I found the problem.
It turns out I was binary searching my bookCollection ArrayList, and NOT the new sorted array I had created - sortedLib.
Silly mistake at my end, but thanks for the input and suggestions!

Good choice for a lightweight checksum algorithm?

I find myself needing to generate a checksum for a string of data, for consistency purposes. The broad idea is that the client can regenerate the checksum based on the payload it receives and thus detect any corruption that took place in transit. I am vaguely aware that there are all kinds of mathematical principles behind this kind of thing, and that it's very easy for subtle errors to make the whole algorithm ineffective if you try to roll it yourself.
So I'm looking for advice on a hashing/checksum algorithm with the following criteria:
It will be generated by Javascript, so needs to be relatively light computationally.
The validation will be done by Java (though I cannot see this actually being an issue).
It will take textual input (URL-encoded Unicode, which I believe is ASCII) of a moderate length; typically around 200-300 characters and in all cases below 2000.
The output should be ASCII text as well, and the shorter it can be the better.
I'm primarily interested in something lightweight rather than getting the absolute smallest potential for collisions possible. Would I be naive to imagine that an eight-character hash would be suitable for this? I should also clarify that it's not the end of the world if corruption isn't picked up at the validation stage (and I do realise that this will not be 100% reliable), though the rest of my code is markedly less efficient for every corrupt entry that slips through.
Edit - thanks to all who contributed. I went with the Adler32 option; given that it is natively supported in Java, extremely easy to implement in Javascript, fast to calculate at both ends and has an 8-character (hex) output, it was exactly right for my requirements.
(Note that I realise that the network transport is unlikely to be responsible for any corruption errors and won't be folding my arms on this issue just yet; however adding the checksum validation removes one point of failure and means we can focus on other areas should this reoccur.)
CRC32 is not too hard to implement in any language; it is good enough to detect simple data corruption and, when implemented well, it is very fast. However, you can also try Adler32, which is almost as good as CRC32 but even easier to implement (and about equally fast).
Adler32 in the Wikipedia
CRC32 JavaScript implementation sample
Both of these are available in Java right out of the box, as java.util.zip.CRC32 and java.util.zip.Adler32.
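On the Java side, validation with the built-in classes could look roughly like this. It is a minimal sketch that assumes the payload is plain ASCII, as described in the question; the class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

class BuiltInChecksums {
    // Returns the checksum of the text as 8 lowercase hex characters.
    static String hex(Checksum checksum, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        checksum.reset();
        checksum.update(bytes, 0, bytes.length);
        return String.format("%08x", checksum.getValue());
    }

    public static void main(String[] args) {
        System.out.println(hex(new Adler32(), "Wikipedia")); // prints 11e60398
        System.out.println(hex(new CRC32(), "123456789"));   // prints cbf43926
    }
}
```

Either result fits in eight hex characters, which matches the "short ASCII output" requirement.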
Are you aware that both TCP and UDP (and IP, and Ethernet, and...) already provide checksum protection to data in transit?
Unless you're doing something really weird, if you're seeing corruption, something is very wrong. I suggest starting with a memory tester.
Also, you receive strong data integrity protection if you use SSL/TLS.
Javascript implementation of MD4, MD5 and SHA1. BSD license.
Other people have mentioned CRC32 already, but here's a link to the W3C implementation of CRC-32 for PNG, as one of the few well-known, reputable sites with a reference CRC implementation.
(A few years back I tried to find a well-known site with a CRC algorithm or at least one that cited the source for its algorithm, & was almost tearing my hair out until I found the PNG page.)
[UPDATE 30/5/2013: The link to the old JS CRC32 implementation died, so I've now linked to a different one.]
Google CRC32: fast, and much lighter weight than MD5 et al. There is a Javascript implementation here.
In my search for a JavaScript implementation of a good checksum algorithm I came across this question. Andrzej Doyle rightfully chose Adler32 as the checksum, as it is indeed easy to implement and has some excellent properties. DroidOS then provided an actual implementation in JavaScript, which demonstrated the simplicity.
However, the algorithm can be further improved upon as detailed in the Wikipedia page and as implemented below. The trick is that you need not determine the modulo in each step. Rather, you can defer this to the end. This considerably increases the speed of the implementation, up to 6x faster on Chrome and Safari. In addition, this optimisation does not affect the readability of the code, making it a win-win. As such, it definitely fits in well with the original question as to having an algorithm / implementation that is computationally light.
function adler32(data) {
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    var len = data.length;
    for (var i = 0; i < len; i++) {
        a += data.charCodeAt(i);
        b += a;
    }
    a %= MOD_ADLER;
    b %= MOD_ADLER;
    // >>> 0 keeps the result unsigned, so it matches the value Java's Adler32.getValue() returns
    return ((b << 16) | a) >>> 0;
}
edit: imaya created a jsperf comparison a while back showing the difference in speed when running the simple version, as detailed by DroidOS, compared to an optimised version that defers the modulo operation. I have added the above implementation under the name full-length to the jsperf page showing that the above implementation is about 25% faster than the one from imaya and about 570% faster than the simple implementation (tests run on Chrome 30): http://jsperf.com/adler-32-simple-vs-optimized/6
edit2: please don't forget that, when working on large files, you will eventually hit the limit of your JavaScript implementation in terms of the a and b variables. As such, when working with a large data source, you should perform intermediate modulo operations as to ensure that you do not exceed the maximum value of the integer that you can reliably store.
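The intermediate-modulo idea from edit2 can be sketched as follows, written in Java so it can be checked against the built-in java.util.zip.Adler32. The block size of 5552 is the value zlib uses for 32-bit accumulators; with the long accumulators used here any moderate block size would be safe, so treat the constant and the class name as illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

class ChunkedAdler32 {
    static final int MOD_ADLER = 65521;
    // Bytes summed between two modulo reductions (zlib's value for 32-bit sums).
    static final int BLOCK = 5552;

    static long adler32(byte[] data) {
        long a = 1, b = 0;
        int i = 0;
        while (i < data.length) {
            int end = Math.min(data.length, i + BLOCK);
            // Inner loop has no modulo, as in the deferred version above.
            for (; i < end; i++) {
                a += data[i] & 0xff;
                b += a;
            }
            // Reduce once per block so the sums never grow without bound.
            a %= MOD_ADLER;
            b %= MOD_ADLER;
        }
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);
        System.out.println(adler32(data) == reference.getValue()); // prints true
    }
}
```

For the 200 to 2000 character payloads in the question a single reduction at the end is enough; the blocks only matter for much larger inputs.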
Use SHA-1 JS implementation. It's not as slow as you think (Firefox 3.0 on Core 2 Duo 2.4Ghz hashes over 100KB per second).
Here's a relatively simple one I've 'invented' - there's no mathematical research behind it but it's extremely fast and works in practice. I've also included the Java equivalent that tests the algorithm and shows that there's less than 1 in 10,000,000 chance of failure (it takes a minute or two to run).
JavaScript
function getCrc(s) {
    var result = 0;
    for(var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);
        result = (result << 1) ^ c;
    }
    return result;
}
Java
package test;
import java.util.*;
public class SimpleCrc {
    public static void main(String[] args) {
        final Random randomGenerator = new Random();
        int lastCrc = -1;
        int dupes = 0;
        for(int i = 0; i < 10000000; i++) {
            final StringBuilder sb = new StringBuilder();
            for(int j = 0; j < 1000; j++) {
                final char c = (char)(randomGenerator.nextInt(128 - 32) + 32);
                sb.append(c);
            }
            final int crc = crc(sb.toString());
            if(lastCrc == crc) {
                dupes++;
            }
            lastCrc = crc;
        }
        System.out.println("Dupes: " + dupes);
    }
    public static int crc(String string) {
        int result = 0;
        for(final char c : string.toCharArray()) {
            result = (result << 1) ^ c;
        }
        return result;
    }
}
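One property of this shift-and-XOR scheme is worth checking before relying on it. The running value is a 32-bit int that is shifted left once per character, so the bits of any character followed by 32 or more further characters are shifted out entirely, and a change that far from the end of the string cannot affect the result. The sketch below reuses the crc method; the class name and the filler helper are made up for illustration.

```java
class ShiftXorWeakness {
    // Same scheme as the crc method above.
    static int crc(String string) {
        int result = 0;
        for (final char c : string.toCharArray()) {
            result = (result << 1) ^ c;
        }
        return result;
    }

    // Hypothetical helper: a string of n copies of 'x'.
    static String filler(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String padding = filler(60);
        // Strings that differ only in the first character get the same checksum.
        System.out.println(crc("A" + padding) == crc("B" + padding)); // prints true
        // A difference in the last character is detected.
        System.out.println(crc(padding + "A") == crc(padding + "B")); // prints false
    }
}
```

The random-string test above compares whole strings that differ everywhere, so it would not reveal this; for 200 to 2000 character payloads Adler32 or CRC32 avoid the problem.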
This is a rather old thread but I suspect it is still viewed quite often, so: if all you need is a short but reliable piece of code to generate a checksum, the Adler-32 algorithm has to be your choice. Here is the JavaScript code:
function adler32(data)
{
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    for (var i = 0;i < data.length;i++)
    {
        a = (a + data.charCodeAt(i)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    // >>> 0 keeps the result unsigned, so it matches the value Java's Adler32.getValue() returns
    var adler = (a | (b << 16)) >>> 0;
    return adler;
}
The corresponding fiddle demonstrating the algorithm in action is here.
