Thread-safe String split - Java

I ran a split test comparing StringTokenizer with String's indexOf() followed by substring(). From the results I learned that the second approach is faster than StringTokenizer. But my question is: which one is better for thread safety?
String split using StringTokenizer:
String sample = null;
StringTokenizer tokens = new StringTokenizer("FF#Mukil", "#");
if (tokens.hasMoreTokens())
{
    sample = tokens.nextToken();
    System.out.println(sample);
}
String split using String's method:
String sample= "FF#Mukil";
int pos=sample.indexOf('#');
System.out.println(sample.substring(0,pos));
In these examples, sample is a global variable.
Thanks in advance
Edited:
Previously, the first example looked like this:
StringTokenizer tokens = new StringTokenizer("FF#Mukil", "#");
if (tokens.hasMoreTokens())
{
    String sample = tokens.nextToken();
    System.out.println(sample);
}
I have changed it to the following:
String sample = null;
StringTokenizer tokens = new StringTokenizer("FF#Mukil", "#");
if (tokens.hasMoreTokens())
{
    sample = tokens.nextToken();
    System.out.println(sample);
}
Edited: added the Java source file.
// FileName=MySplitUtil.java
import javax.servlet.http.HttpServletRequest;
import java.util.StringTokenizer;

public class MySplitUtil
{
    public static void setUsername_locale(HttpServletRequest request)
    {
        String value = (String) request.getAttribute("userLocale"); // userLocale's sample value looks like ffmukil$en_US
        splite1(value, request);
        splite2(value, request);
    }

    private static void splite1(String value, HttpServletRequest request)
    {
        StringTokenizer tokens = new StringTokenizer(value, "$");
        String username = null;
        String locale = null;
        if (tokens.hasMoreTokens())
        {
            username = tokens.nextToken();
            locale = tokens.nextToken();
        }
        request.setAttribute("usrName", username);
        request.setAttribute("usr_Locale", locale);
    }

    private static void splite2(String value, HttpServletRequest request)
    {
        int pos = value.indexOf('$');
        String username = value.substring(0, pos);
        String locale = value.substring(pos + 1);
        request.setAttribute("usrName", username);
        request.setAttribute("usr_Locale", locale);
    }
}

Since you only use local variables, both versions are perfectly safe, whatever you do with them...
To run into thread safety issues, you need to (improperly) share something across threads. Nothing can be shared in your example.
More generally, Strings are immutable and therefore thread safe. So the second version, assuming that the sample variable is shared, would be thread safe.
EDIT
Your latest edit confirms 2 things:
your class is stateless (no instance variables)
you only use local variables
So both methods are thread safe and you should choose the simpler and/or more efficient one.

In Java, Strings are immutable; nothing can affect the original string.
That immutability makes Strings thread safe: the same String can be accessed by several StringTokenizers or substring() calls at the same time without any problems.
However, a problem could come from the StringTokenizer class itself, which should not be shared between threads. But, as pointed out by others, in your code you are only using local variables, so you won't have thread-safety issues.
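For illustration, here is a minimal sketch (the class and thread names are made up) of two threads safely reading the same shared String, each doing its own local index/substring work:
public class SharedStringDemo {
    // Immutable, so safe to share across threads without synchronization
    private static final String SHARED = "FF#Mukil";

    public static void main(String[] args) {
        Runnable task = new Runnable() {
            @Override
            public void run() {
                // Local variables: each thread works with its own copies
                int pos = SHARED.indexOf('#');
                String prefix = SHARED.substring(0, pos);
                System.out.println(Thread.currentThread().getName() + ": " + prefix);
            }
        };
        new Thread(task, "reader-1").start();
        new Thread(task, "reader-2").start();
    }
}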

According to the Javadoc: "A StringTokenizer object internally maintains a current position within the string to be tokenized. Some operations advance this current position past the characters processed."
So it's not threadsafe. Why would you want to share it anyway? Just make it a local variable.
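A small sketch of the difference (the class name is made up): as a field the tokenizer is shared mutable state, while as a local variable each call gets its own position state.
import java.util.StringTokenizer;

public class TokenizerScope {

    // Anti-pattern: a shared StringTokenizer field. Its internal "current
    // position" is mutated by nextToken(), so concurrent callers can
    // interleave and see inconsistent results.
    private final StringTokenizer shared = new StringTokenizer("FF#Mukil", "#");

    public String unsafeFirstToken() {
        return shared.hasMoreTokens() ? shared.nextToken() : null;
    }

    // Safe: a local StringTokenizer exists only for the current call,
    // so each thread gets its own instance and its own position state.
    public String safeFirstToken(String input) {
        StringTokenizer tokens = new StringTokenizer(input, "#");
        return tokens.hasMoreTokens() ? tokens.nextToken() : null;
    }
}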

Related

How to extract a String from a changing template in Java?

I have a question regarding best practices considering Java regular expressions/Strings manipulation.
I have a changing String template, let's say this time it looks like this:
/get/{id}/person
I have another String that matches this pattern, e.g.:
/get/1234ewq/person
Keep in mind that the pattern could change anytime, slashes could disappear etc.
I would like to extract the difference between the two of them i.e. the result of the processing would be 1234ewq.
I know I could iterate over them char by char and compare, but if possible I wanted to find a smarter approach using regular expressions.
What would be the best Java approach?
Thank you.
To answer your question with a regex approach, I built a small example class which should hint at a direction you could take (see below).
The problem with this approach is that you dynamically create a regular expression that depends on your template strings. This means that you have to somehow verify that your templates do not interfere with the regex compilation and matching process itself.
Also, at the moment, if you use the same placeholder multiple times within a template, the resulting HashMap only contains the value for the last occurrence of that placeholder.
Normally this is the expected behaviour, but it depends on how you fill your templates.
For template processing in general you could have a look at the mustache library.
Also, as Uli Sotschok mentioned, you would probably be better off using something like google-diff-match-patch.
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringExtractionFromTemplate {

    public static void main(String[] args) {
        String template = "/get/{id}/person";
        String filledTemplate = "/get/1234ewq/person";
        System.out.println(diffTemplateInsertion(template, filledTemplate).get("id"));
    }

    private static HashMap<String, String> diffTemplateInsertion(String template, String filledTemplate) {
        //language=RegExp
        String placeHolderPattern = "\\{(.+)}";
        HashMap<String, String> templateTranslation = new HashMap<>();

        // Turn "/get/{id}/person" into "/get/(.+)/person"
        String regexedTemplate = template.replaceAll(placeHolderPattern, "(.+)");
        Pattern pattern = Pattern.compile(regexedTemplate);

        Matcher templateMatcher = pattern.matcher(template);
        Matcher filledTemplateMatcher = pattern.matcher(filledTemplate);

        while (templateMatcher.find() && filledTemplateMatcher.find()) {
            if (templateMatcher.groupCount() == filledTemplateMatcher.groupCount()) {
                for (int i = 1; i <= templateMatcher.groupCount(); i++) {
                    // Strip the braces from the placeholder name and map it to the filled-in value
                    templateTranslation.put(
                            templateMatcher.group(i).replaceAll(placeHolderPattern, "$1"),
                            filledTemplateMatcher.group(i)
                    );
                }
            }
        }
        return templateTranslation;
    }
}
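One way to address the escaping caveat mentioned above would be to quote the literal parts of the template before compiling the pattern. A rough sketch (the class and method names are just illustrative):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTemplateExample {

    // Turns "/get/{id}/person" into "\Q/get/\E(.+)\Q/person\E" so that
    // literal template characters cannot be misread as regex syntax.
    static Pattern templateToPattern(String template) {
        Matcher m = Pattern.compile("\\{[^}]+}").matcher(template);
        StringBuffer sb = new StringBuffer();
        int last = 0;
        while (m.find()) {
            sb.append(Pattern.quote(template.substring(last, m.start())));
            sb.append("(.+)");
            last = m.end();
        }
        sb.append(Pattern.quote(template.substring(last)));
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Matcher m = templateToPattern("/get/{id}/person").matcher("/get/1234ewq/person");
        if (m.matches()) {
            System.out.println(m.group(1)); // prints 1234ewq
        }
    }
}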

Java StringBuffer method

I am learning Java this summer at college in the US. I am new to String methods.
I am trying to understand how to use the StringBuffer class.
I tried to assign ssl = statusBuffer in the else-if section, but it causes an error; what is the appropriate way to fix it?
Thank you so much.
public class test {
    public static void main(String[] args) {
        String s = "123-45-6789";
        String ssl;
        int slength = s.length();
        if (slength == 11)
        {
            ssl = s;
        }
        else if (slength == 9) {
            StringBuffer statusBuffer = new StringBuffer(s);
            statusBuffer.insert(3, "-");
            statusBuffer.insert(6, "-");
            ssl = statusBuffer; // ------ This part is causing the error
        }
        System.out.println(ssl);
    }
}
ssl is a String. statusBuffer is a StringBuffer.
You need
ssl = statusBuffer.toString();
StringBuffer is not a string. You have to call statusBuffer.toString() to actually get the String from your StringBuffer.
In Java, Strings cannot be modified. If you create a String, it is always that String; if you "modify" a String, a new String is created. A StringBuffer is basically a modifiable String, which can be used for performance reasons when assembling big texts. It is also thread safe. For more information, see the Javadoc.
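Putting the answers together, a corrected sketch of the program (keeping the asker's names, but using a 9-digit input so the StringBuffer branch actually runs) could look like this:
public class test {
    public static void main(String[] args) {
        String s = "123456789";            // 9 digits, so the else-if branch runs
        String ssl = null;                 // initialize so every path assigns a value
        int slength = s.length();
        if (slength == 11) {
            ssl = s;                       // already formatted
        } else if (slength == 9) {         // note: == for comparison, not =
            StringBuffer statusBuffer = new StringBuffer(s);
            statusBuffer.insert(3, "-");   // 123-456789
            statusBuffer.insert(6, "-");   // 123-45-6789
            ssl = statusBuffer.toString(); // convert the StringBuffer to a String
        }
        System.out.println(ssl);           // prints 123-45-6789
    }
}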

Scanner.findInLine() leaks memory massively

I'm running a simple Scanner to parse a string; however, I've discovered that if it's called often enough I get OutOfMemory errors. This code is called as part of the constructor of an object that is built repeatedly for an array of strings:
Edit: here's the constructor for more info; not much more happens with the Scanner outside of the try-catch.
public Header(String headerText) {
    char[] charArr = headerText.toCharArray();

    // Check that all characters are printable characters
    if (charArr.length > 0 && !commonMethods.isPrint(charArr)) {
        throw new IllegalArgumentException(headerText);
    }

    // Check for header suffix
    Scanner sc = new Scanner(headerText);
    MatchResult res;
    try {
        sc.findInLine("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
        res = sc.match();
    } finally {
        sc.close();
    }

    if (res.group(1) == null || res.group(1).isEmpty()) {
        throw new IllegalArgumentException("Missing header keyword found"); // Empty header to store
    } else {
        mnemonic = res.group(1).toLowerCase(); // Store header
    }

    if (res.group(2) == null || res.group(2).isEmpty()) {
        suffix = -1;
    } else {
        try {
            suffix = Integer.parseInt(res.group(2)); // Store suffix if it exists
        } catch (NumberFormatException e) {
            throw new NumberFormatException(headerText);
        }
    }

    if (res.group(3) == null || res.group(3).isEmpty()) {
        isQuery = false;
    } else {
        if (res.group(3).equals("?")) {
            isQuery = true;
        } else {
            throw new IllegalArgumentException(headerText);
        }
    }

    // If command was of the form *ABC, reject suffixes and prefixes
    if (mnemonic.contains("*") && suffix != -1) {
        throw new IllegalArgumentException(headerText);
    }
}
A profiler memory snapshot shows the read(Char) method of Scanner.findInLine() allocating massive amounts of memory during operation as I scan through a few hundred thousand strings; after a few seconds it has already allocated over 38 MB.
I would think that calling close() on the scanner after using it in the constructor would flag the old object to be cleared by the GC, but somehow it remains and the read method accumulates gigabytes of data before filling the heap.
Can anybody point me in the right direction?
You haven't posted all your code, but given that you are scanning for the same regex repeatedly, it would be much more efficient to compile a static Pattern beforehand and use this for the scanner's find:
static Pattern p = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
and in the constructor:
sc.findInLine(p);
This may or may not be the source of the OOM issue, but it will definitely make your parsing a bit faster.
Related: java.util.regex - importance of Pattern.compile()?
Update: after you posted more of your code, I see some other issues. If you're calling this constructor repeatedly, it means you are probably tokenizing or breaking up the input beforehand. Why create a new Scanner to parse each line? They are expensive; you should be using the same Scanner to parse the entire file, if possible. Using one Scanner with a precompiled Pattern will be much faster than what you are doing now, which is creating a new Scanner and a new Pattern for each line you are parsing.
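As a rough sketch of that suggestion (assuming the input is available as a Readable, and eliding the actual Header construction), one Scanner plus one precompiled Pattern can cover the whole input:
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class HeaderScanning {

    // Compiled once and reused for every line
    private static final Pattern HEADER_PATTERN =
            Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");

    public static void scanAll(Readable source) {
        Scanner sc = new Scanner(source);
        try {
            while (sc.hasNextLine()) {
                if (sc.findInLine(HEADER_PATTERN) != null) {
                    MatchResult res = sc.match();
                    // hand res.group(1), res.group(2), res.group(3) to the Header logic
                    System.out.println(res.group(1));
                }
                if (sc.hasNextLine()) {
                    sc.nextLine(); // move past the remainder of the current line
                }
            }
        } finally {
            sc.close();
        }
    }
}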
The strings that are filling up your memory were created in findInLine(). Therefore, the repeated Pattern creation is not the problem.
Without knowing what the rest of the code does, my guess would be that one of the groups you get out of the matcher is being kept in a field of your object. Then that string would have been allocated in findInLine(), as you see here, but the fact that it is being retained would be due to your code.
Edit:
Here's your problem:
mnemonic = res.group(1).toLowerCase();
What you might not realize is that toLowerCase() returns this if there are no uppercase letters in the string. Also, group(int) returns a substring(), which creates a new string backed by the same char[] as the full string. So, mnemonic actually contains the char[] for the entire line.
The fix would just be:
mnemonic = new String(res.group(1).toLowerCase());
I think your code snippet is not complete. I believe you are calling scanner.findInLine() in a loop. Anyway, try calling scanner.reset(). I hope this solves your problem.
The JVM apparently does not have time to garbage collect, possibly because it's using the same code (the constructor) repeatedly to create multiple instances of the same class. The JVM may not do anything about GC until something changes on the runtime stack -- and in this case that's not happening. I've been warned in the past about doing "too much" in a constructor, as some of the memory management behaviors are not quite the same as when other methods are being called.
Your problem is that you are scanning through a couple hundred thousand strings and you are passing the pattern in as a string, so you have a new pattern object for every single iteration of the loop. You can pull the pattern out of the loop, like so:
Pattern toMatch = Pattern.compile("(\\D*[a-zA-Z]+)(\\d*)(\\D*)");
Scanner sc = new Scanner(headerText);
MatchResult res;
try {
    sc.findInLine(toMatch);
    res = sc.match();
} finally {
    sc.close();
}
Then you will only be passing the object reference to toMatch instead of having the overhead of creating a new pattern object for every attempt at a match. This will fix your leak.
Well, I've found the source of the problem: it wasn't the Scanner exactly, but the list holding the objects doing the scanning in the constructor.
The problem was an overrun of a list holding references to the objects doing the parsing: strings arrived faster than they could be processed, so the list grew and grew until there was no more RAM. Bounding this list to a maximum size now prevents the parser from overloading the memory; I'll be adding some synchronization between the parser and the data source to avoid this overrun in the future.
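For what it's worth, a minimal sketch of that bounding idea using a blocking queue (the capacity and names below are placeholders, not the actual code):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandoff {

    // Fixed capacity: when the parser falls behind, put() blocks the
    // producer instead of letting the backlog grow without limit.
    private final BlockingQueue<String> pending = new ArrayBlockingQueue<String>(10000);

    public void offerLine(String line) throws InterruptedException {
        pending.put(line); // blocks while the queue is full
    }

    public String takeLine() throws InterruptedException {
        return pending.take(); // blocks while the queue is empty
    }
}
The put() call applies back-pressure to the producer, which is essentially the synchronization between the parser and the data source mentioned above.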
Thank you all for your suggestions. I've already made some changes performance-wise regarding the Scanner, and thanks to @RobI for pointing me to jvisualvm, which allowed me to trace back the exact culprits holding the references. The memory dump wasn't showing the reference linking.

Using reserved words in an enum / switch statement, best workaround?

I am writing a flat file parser that reads token/value pairs using a Scanner. The files being read contain the token "class". The token is later used in a switch statement, and uses the (pre Java 7) valueOf(token) Java idiom to produce an enum value. (I am using Java6 for compatibility with GWT.) As a workaround, I am using uppercase values in the enum, and valueOf(token.toUpperCase()).
public enum ParseTags {
    CODE, CLASS, INSTRUCTOR, HOURS;
}
// . . .
token = scanner.next();
value = scanner.next();
switch (ParseTags.valueOf(token.toUpperCase())) {
    case CODE:
        entry.setCode(value);
        break;
    case CLASS:
        entry.setClass(value);
        break;
Because this is being compiled into JavaScript, I want to avoid the extra toUpperCase() operation on each iteration; I'm not sure what the performance will be on the target platform. Is there a more graceful way to represent reserved words in an enumeration? This would be handled well by Java 7's switch on String, but again, I am confined to the Java 6 SDK.
What you're doing right now is the preferred way to do it. I would be extraordinarily shocked if the toUpperCase were a bottleneck.
That said, I might consider something like
enum ParseTags {
    CODE {
        public void set(Entry entry, String value) {
            entry.setCode(value);
        }
    },
    ...;

    public abstract void set(Entry entry, String value);
}
so you can do
ParseTags.valueOf(token.toUpperCase()).set(entry, value);
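Filling in the ellipsis, a complete sketch of that pattern could look like the following; the Entry interface here is just a stand-in for the asker's entry type, and the INSTRUCTOR/HOURS setters are assumptions:
public enum ParseTags {
    // Each constant supplies its own behaviour, so the caller needs no switch
    CODE       { public void set(Entry e, String v) { e.setCode(v); } },
    CLASS      { public void set(Entry e, String v) { e.setClass(v); } },
    INSTRUCTOR { public void set(Entry e, String v) { e.setInstructor(v); } }, // assumed setter
    HOURS      { public void set(Entry e, String v) { e.setHours(v); } };      // assumed setter

    public abstract void set(Entry entry, String value);

    // Hypothetical stand-in for the asker's entry type
    public interface Entry {
        void setCode(String value);
        void setClass(String value);
        void setInstructor(String value);
        void setHours(String value);
    }
}
Dispatch then stays exactly as above: ParseTags.valueOf(token.toUpperCase()).set(entry, value);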

Select object dynamically

Here's the situation :
I have three StringBuffer objects whose names all end in List, and I have a method with a String parameter:
gameList = new StringBuffer();
appsList = new StringBuffer();
movieList = new StringBuffer();

public void fetchData(String category) {
    URL url = null;
    BufferedReader input;
    gameList.delete(0, gameList.length());
Is there a way to do something like the following:
public void fetchData(String category) {
    URL url = null;
    BufferedReader input;
    "category"List.delete(0, gameList.length());
so I can choose which of the lists to use based on the String parameter?
I suggest you create a HashMap<String, StringBuffer> and use that:
Map<String, StringBuffer> map = new HashMap<String, StringBuffer>();
map.put("game", new StringBuffer());
map.put("apps", new StringBuffer());
map.put("movie", new StringBuffer());
...
public void fetchData(String category) {
    StringBuffer buffer = map.get(category);
    if (buffer == null) {
        // No such category. Throw an exception?
    } else {
        // Do whatever you need to
    }
}
If the lists are fields of your object - yes, using reflection:
Field field = getClass().getDeclaredField(category + "List");
StringBuffer result = (StringBuffer) field.get(this);
But generally you should avoid reflection. And if your objects are fixed - i.e. they don't change, simply use an if-clause.
The logically simplest way taking your question as given would just be:
StringBuffer which;
if (category.equals("game"))
    which = gameList;
else if (category.equals("apps"))
    which = appsList;
else if (category.equals("movie"))
    which = movieList;
else
    ... some kind of error handling ...
which.delete(0, which.length());
As Jon Skeet noted, if the list is big or dynamic you probably want to use a map rather than an if/else/if.
That said, I'd encourage you to use an integer constant or an enum rather than a String. Like:
enum ListType { GAME, APP, MOVIE }
void deleteList(ListType category)
{
    if (category == ListType.GAME)
        ... etc ...
In this simple example, if this is all you'd ever do with it, it wouldn't matter much. But I'm working on a system now that uses String tokens for this sort of thing all over the place, and it creates a lot of problems.
Suppose you call the function and by mistake you pass in "app" instead of "apps", or "Game" instead of "game". Or maybe you're thinking you added handling for "song" yesterday but in fact you went to lunch instead. This will successfully compile, and you won't have any clue that there's a problem until run-time. If the program does not throw an error on an invalid value but instead takes some default action, you could have a bug that's difficult to track down. But with an enum, if you mis-spell the name or try to use one that isn't defined, the compiler will immediately alert you to the error.
Suppose that some functions take special action for some of these options but not others. Like you find yourself writing
if (category.equals("app"))
    getSpaceRequirements();
and that sort of thing. Then someone reading the program sees a reference to "app" here, a reference to "game" 20 lines later, etc. It could be difficult to determine what all the possible values are. Any given function might not explicitly reference them all. But with an enum, they're all neatly in one place.
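To make that concrete, here is a sketch combining the enum suggestion with the map idea from the earlier answer (the class and field names are made up):
import java.util.EnumMap;
import java.util.Map;

public class ListHolder {

    enum ListType { GAME, APPS, MOVIE }

    // One buffer per category; EnumMap keys are checked by the compiler
    private final Map<ListType, StringBuffer> lists =
            new EnumMap<ListType, StringBuffer>(ListType.class);

    public ListHolder() {
        for (ListType type : ListType.values()) {
            lists.put(type, new StringBuffer());
        }
    }

    public void deleteList(ListType category) {
        StringBuffer buffer = lists.get(category);
        buffer.delete(0, buffer.length());
    }
}
A call like holder.deleteList(ListType.GAME) with a misspelled constant simply won't compile, which is exactly the advantage described above.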
You could use a switch statement (note that switch on a String requires Java 7):
StringBuffer buffer = null;
switch (category) {
    case "game":  buffer = gameList;  break;
    case "apps":  buffer = appsList;  break;
    case "movie": buffer = movieList; break;
    default: return;
}
