It takes too much time to operate on BLOB - java

I have to read a BLOB column which contains only text. It used to work quite efficiently (reading 100k BLOBs in 3 minutes), but it is taking an awful amount of time in a different environment, although the hardware is the same.
Here's my code :-
while (rs.next()) {
is = rs.getBinaryStream(3);
while ((len = is.read(buffer)) != -1) {
baos.write(buffer, 0, len);
}
is.close();
blobByte = baos.toByteArray();
baos.close();
String blob = new String(blobByte);
String msisdn = rs.getString(2);
blobData = blob.split("\\|");
//some operations
}
I took jstack at intervals of 5 seconds and found application always in this line :-
blobData = blob.split("\\|");
And sometimes in :-
new String(blobByte);
My java options :-
-ms10g -mx12g -XX:NewSize=1g -XX:MaxNewSize=1g
Is some part of my code un-optimized? Or is there a significantly efficient way to read BLOB?

You get an InputStream for a BLOB to be able to avoid having the entire BLOB data in memory. But then, you do the exact opposite:
You use a ByteArrayOutputStream to transfer the whole data into a byte[] array. Note that the data even exists twice in memory, once inside ByteArrayOutputStream’s own buffer, then in the copy created and returned by baos.toByteArray().
Then, you convert the entire array into a potentially humongous String via new String(blobByte), incurring a third copy of the entire data (including the charset conversion).
split("\\|") will run over the entire String, creating a substring for each sequence between the delimiters, which implies another copy of the entire data (minus the delimiter characters). By then, you have four copies of the entire data in memory; depending on the source’s buffering, it might be five. Additionally, an array containing references to all these substrings is created and populated.
Not all copy operations can be avoided. But we can avoid having the entire data in memory:
try(Scanner s = new Scanner(is).useDelimiter("\\|")) {
while(s.hasNext()) {
String next = s.next();
System.out.println(next);// replace with actual processing
}
}
When you are able to process items individually, not keeping a reference to the previous item(s), these strings may get garbage collected, with a minor collection in the best case.
Even when a String[] array with all elements is required for your processing, which makes one copy of the entire data (in form of individual strings) unavoidable, you can avoid all the other copies:
try(Scanner s = new Scanner(is).useDelimiter("\\|")) {
List<String> list = new ArrayList<>();
while(s.hasNext()) list.add(s.next());
System.out.println(list);// replace with actual processing as List
String[] array = list.toArray(new String[0]); // when an array really is required
}
Starting with Java 9, you can use
try(Scanner s = new Scanner(is).useDelimiter("\\|")) {
List<String> list = s.tokens().collect(Collectors.toList());
System.out.println(list); // replace with actual processing as List
}
or
try(Scanner s = new Scanner(is).useDelimiter("\\|")) {
String[] array = s.tokens().toArray(String[]::new);
System.out.println(Arrays.toString(array)); // replace with actual processing
}
But processing the elements individually, without holding all of them in memory, is the preferred way.
Another possible optimization is to avoid multiple (internal) Pattern.compile("\\|") calls by doing it once yourself and passing the prepared Pattern instead of the "\\|" string to the useDelimiter method.
Note that all of these examples use the system’s default charset encoding, just like your original code. Since the default charset of the environment running your code is not necessarily the same as the one used by the database, you should be explicit, i.e. use new Scanner(is, charset), just like you should have used new String(blobByte, charset) in your original code, instead of new String(blobByte).
Or you use a CLOB in the first place.
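The last two points (a delimiter Pattern compiled once and an explicit charset) can be combined in a minimal sketch like the following. The class name BlobTokenizer, the method readTokens, and the choice of UTF-8 are assumptions for illustration only; use the charset the BLOB text was actually written with.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class BlobTokenizer {
    // Compiled once and shared by every call, instead of once per BLOB.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Reads pipe-delimited tokens from the stream with an explicit charset.
    // UTF-8 is an assumption here, not something the original code states.
    static List<String> readTokens(InputStream is) {
        List<String> tokens = new ArrayList<>();
        try (Scanner s = new Scanner(is, StandardCharsets.UTF_8.name()).useDelimiter(PIPE)) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Stand-in for rs.getBinaryStream(3) so the sketch runs without a database.
        InputStream is = new ByteArrayInputStream("a|b|c".getBytes(StandardCharsets.UTF_8));
        System.out.println(readTokens(is));
    }
}
```

Inside the while (rs.next()) loop, the call would presumably become readTokens(rs.getBinaryStream(3)); closing the Scanner also closes the underlying stream.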


Java Reference while adding elements to a list

first of all thanks for the help.
I'm aware of the Reference passing mechanism of java and I need to read one million of lines (a word + a_list_of_integers each line) from a text file and put them in some structures that are class attributes, one hashmap and two arraylist.
The problem is that with the code below, written to save memory by reusing the list "termine_frequenza", when I try to get an element from the "frequency" arraylist or the "dictionaryMapTD" hashmap, the list that is returned is always the last list that I added.
Moving the declaration of the ArrayList "termine_frequenza" inside the while loop obviously solves the problem, but then I get a predictable "GC overhead limit exceeded" error because of the many allocations (I tried to increase the heap or disable the limit, but the GC saturates the CPU trying to free memory).
The question is simple: how can I save memory and at the same time have a correct reading? Thanks.
//Class attributes
private HashMap<String, ArrayList> dictionaryMapTD;
private ArrayList<String> words;
private ArrayList<ArrayList> frequency;
//This is the code of a method of the class that reads from a file
br = new BufferedReader(new FileReader("dictionary.txt"));
s = br.readLine();
String[] splitted;
ArrayList<Integer> termine_frequenza = new ArrayList<>();
while(s!=null)
{
termine_frequenza.clear();
splitted = s.split(" ");
words.add(splitted[0]);
for (int i = 1; i < splitted.length; i++)
{
termine_frequenza.add(Integer.valueOf(splitted[i]));
}
frequency.add(termine_frequenza);
dictionaryMapTD.put(splitted[0], termine_frequenza);
s = br.readLine();
}
//END
Change the Xms/Xmx parameters in your eclipse.ini file.
I set them to -Xms256m -Xmx7024m for 3000000 records.
If that has no effect, then try to modify those parameters for the application itself.
In Eclipse, go to
Run Configurations->Arguments->VM Arguments
for your application and put
-Xms256m
-Xmx7024m
Then in your code move
termine_frequenza = new ArrayList<>();
inside while and remove
termine_frequenza.clear();
The GC should not complain.
In my case it runs for 7000000 records.
Let me know if it helps
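To illustrate why each line needs its own container, here is a minimal sketch that allocates a fresh collection per line, so every map entry owns its own data instead of sharing one cleared-and-refilled list. It swaps ArrayList<Integer> for int[] to avoid boxing every number; that swap is a suggestion of this sketch, not part of the answer above, and the class name FrequencyLoader is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyLoader {
    // Parses lines of the form "word n1 n2 n3 ..." into word -> frequencies.
    static Map<String, int[]> parse(List<String> lines) {
        Map<String, int[]> result = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ");
            // A fresh array per line: nothing is shared between entries.
            int[] frequencies = new int[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                frequencies[i - 1] = Integer.parseInt(parts[i]);
            }
            result.put(parts[0], frequencies);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> map = parse(Arrays.asList("foo 1 2", "bar 3"));
        System.out.println(Arrays.toString(map.get("foo")));
        System.out.println(Arrays.toString(map.get("bar")));
    }
}
```

The same array can be stored in both the map and a parallel list without being copied, since only the reference is stored.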

How to Sort Numeric Array in JMeter Beanshell

I am new to using Beanshell/Java in my JMeter scripts. I have following code in my JMeter Beanshell Processor.
int count = Integer.parseInt(vars.get("student_id_RegEx_matchNr"));
String delimiter = ",";
StringBuffer sb = new StringBuffer();
for(int i=1;i<=25;i++) {
sb.append(vars.get("student_id_RegEx_" + i));
if (i == count){
break; //to eliminate comma after the array
}else {
sb.append(delimiter);
}
}
vars.putObject("myUnsortedVar",sb.toString());
I get following as output when I run script:
myUnsortedVar=5,6,2,3,1,4
I want it to be sorted numerically like this and also stored in a new variable named "sortedVar".
1,2,3,4,5,6
What code can I use to sort this and also store in a new variable so I can use the sorted array in coming JMeter requests. Thanks for help.
Taking sb.toString() = "5,6,2,3,1,4":
Use String.split() to convert from String to String[].
Use Arrays.sort() to sort the array in place (it returns void).
Use Arrays.toString() to convert from String[] to String.
String[] sortedArray = sb.toString().split(",");
Arrays.sort(sortedArray); // sorts in place; String elements are compared lexicographically
vars.putObject("mySortedVar", Arrays.toString(sortedArray));
I suppose that in Beanshell you can use the same code as in Java. Once you fill the StringBuffer, there is no easy way to sort its contents. Therefore I would first store the contents in an intermediate ArrayList<String> (or, even better, ArrayList<Integer> if you always get numbers), then sort it using Collections.sort, and then use another for loop to put the list's contents into the StringBuffer using the comma delimiter.
You could do something like this (note that it sorts individual characters, so it only works while every value is a single digit, and the result keeps the square brackets added by Arrays.toString):
char [] responseCharArray = vars.get("myUnsortedVar").toCharArray();
Arrays.sort(responseCharArray);
String mySortedString = Arrays.toString(responseCharArray);
vars.put("mySortedVar", mySortedString.replaceAll("\\,\\,","").replaceAll(" ",""));
See How to use BeanShell: JMeter's favorite built-in component guide for more information on Beanshell scripting in JMeter
As OndreJM suggested, you need to change your approach. Instead of storing values in StringBuffer, store them in ArrayList and then use Collections.sort to sort it. Following code should work for you.
// create an ArrayList
ArrayList strList = new ArrayList();
for (int i=0;i<25; i++){
strList.add(vars.get("student_id_RegEx_" + String.valueOf(i+1)));
}
// sort this ArrayList
Collections.sort(strList);
// use StringBuilder to build String from ArrayList
StringBuilder builder = new StringBuilder();
for (String id: strList){
builder.append(id);
builder.append(",");
}
builder.deleteCharAt(builder.length()-1);
// finally put in variable using JMeter built in 'vars.put'
// do not use vars.putObject, as you can not send object as request parameter
vars.put("sortedVar", builder.toString());
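Since the question asks for a numeric sort, and sorting String values orders them lexicographically (so "10" comes before "9"), here is a minimal plain-Java sketch that parses the values to int before sorting. The class and method names are hypothetical, and it has not been verified inside a JMeter Beanshell processor; it deliberately avoids generics and lambdas so that it should be easy to adapt.

```java
import java.util.Arrays;

class NumericSort {
    // Takes "5,6,2,3,1,4" and returns the values sorted numerically, comma-joined.
    static String sortCsv(String csv) {
        String[] parts = csv.split(",");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i].trim());
        }
        Arrays.sort(numbers); // numeric order, in place
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numbers.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(numbers[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortCsv("5,6,2,3,1,4"));
    }
}
```

In the JMeter script, the result would presumably be stored with vars.put("sortedVar", ...) so it can be used as a request parameter.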

Read data from multiple files and apply business logic

Hi all please help me achieve this scenario where I have multiple files like aaa.txt, bbb.txt, ccc.txt with data as
aaa.txt:
100110,StringA,22
200110,StringB,2
300110,StringC, 12
400110,StringD,34
500110,StringE,423
bbb.txt as:
100110,StringA,20.1
200110,StringB,2.1
300110,StringC, 12.2
400110,StringD,3.2
500110,StringE,42.1
and ccc.txt as:
100110,StringA,2.1
200110,StringB,2.1
300110,StringC, 11
400110,StringD,3.2
500110,StringE,4.1
Now I have to read all the three files (huge files) and report the result as
100110: (22, 20.1,2.1).
Issue is with the size of files and how to achieve this in optimized way.
I assume you have some sort of code to handle reading the files line by line, so I'll pseudocode a scanner that can keep pulling lines.
The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.
HashMap<String, String[]> map = new HashMap<>();
while (aaa.hasNextLine()) {
String[] lineContents = aaa.nextLine().split(",");
String[] array = new String[3];
array[0] = lineContents[2].trim();
map.put(lineContents[0], array);
}
while (bbb.hasNextLine()) {
String[] lineContents = bbb.nextLine().split(",");
String[] array = map.get(lineContents[0]);
if (array != null) {
array[1] = lineContents[2].trim();
// array is the same object that is already stored in the map, so no put is needed
} else {
array = new String[3];
array[1] = lineContents[2].trim();
map.put(lineContents[0], array);
}
}
// same for c, with a new index of 2
To add synchronicity, you would probably use one of these maps.
Then you'd create 3 threads that just read and put.
Unless you are doing a lot of processing on loading these files, or are reading a lot of smaller files, it might work better as a sequential operation.
If your files are all ordered, simply maintain an array of Scanners pointing to your files, read the lines one by one, and write the result to an output file as you go.
Doing so, you will only keep in memory as many lines as the number of files. It is both time and memory efficient.
If your files are not ordered, you can use the sort command to sort them.
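The ordered-files approach can be sketched as below. This is a minimal sketch under strong assumptions: every file lists exactly the same keys in the same order, fields are comma-separated, and the value is the third field. The class name LockstepMerge is hypothetical, and a real version would write each merged line straight to the output file rather than collecting the lines in a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LockstepMerge {
    // Reads one line from every reader per round and merges the third fields.
    static List<String> merge(List<BufferedReader> readers) {
        List<String> merged = new ArrayList<>();
        if (readers.isEmpty()) {
            return merged;
        }
        try {
            while (true) {
                String key = null;
                StringBuilder values = new StringBuilder();
                for (BufferedReader reader : readers) {
                    String line = reader.readLine();
                    if (line == null) {
                        return merged; // stop as soon as any file is exhausted
                    }
                    String[] fields = line.split(",");
                    if (key == null) {
                        key = fields[0];
                    } else if (!key.equals(fields[0])) {
                        throw new IllegalStateException("Files are not aligned at key " + key);
                    }
                    if (values.length() > 0) {
                        values.append(", ");
                    }
                    values.append(fields[2].trim());
                }
                merged.add(key + ": (" + values + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // StringReaders stand in for aaa.txt, bbb.txt and ccc.txt.
        List<BufferedReader> readers = Arrays.asList(
                new BufferedReader(new StringReader("100110,StringA,22\n300110,StringC, 12")),
                new BufferedReader(new StringReader("100110,StringA,20.1\n300110,StringC, 12.2")),
                new BufferedReader(new StringReader("100110,StringA,2.1\n300110,StringC, 11")));
        for (String line : merge(readers)) {
            System.out.println(line);
        }
    }
}
```

Only one line per file is held in memory at a time, which is what makes this approach suitable for huge inputs.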

Getting rid of excess while statement

Could anybody have a look at this snippet of code and tell me if there is a way to amalgamate the two while statements into one?
public static void main(String[] args) throws IOException
{
BufferedReader fileInput;
fileInput = new BufferedReader(new FileReader("information.txt"));
int countOfClients = 0;
while (fileInput.ready())
{
fileInput.readLine();
countOfClients ++;
}
int totalClients = countOfClients ;
Client[] clientDetails = new Client[totalClients];
int clientNumber = 0;
while (fileInput.ready())
{
String currentLineOfText = fileInput.readLine();
String clientName = currentLineOfText.substring(0, 19);
String gender = currentLineOfText.substring(20,21);
char clientGender = gender.charAt(0);
int clientAge = Integer.parseInt(currentLineOfText.substring(22,24));
String clientInterests = currentLineOfText.substring(25);
clientDetails[clientNumber] = new Client(clientName, clientGender, clientAge, clientInterests);
clientNumber++;
}
The first while statement is reading all the lines in the text, so it knows how many elements in the object array it needs.
The array clientDetails of class Client[] is then created.
The second while statement populates that array.
Can I avoid using two while statements?
Note: This is for an assignment and I have to use arrays.
As they're all saying, use an ArrayList to store the items.
If memory is an issue, you can use ArrayList.toArray() to trim it down to the bare bones.
If efficiency is an issue, you probably shouldn't be reading from a file in the first place.
You could use an ArrayList instead of an array and simply use:
list.add(new Client(...));
If you really need an array, you can always call:
Client[] array = list.toArray(new Client[0]);
Why create an array ? Why not have one while loop that creates an ArrayList and then (if you need an array) extract the resultant array from that using ArrayList.toArray() ?
You can avoid two while loops by changing Client[] to an ArrayList.
Example:
List<Client> clientDetails = new ArrayList<Client>();
int clientNumber = 0;
while (fileInput.ready())
{
String currentLineOfText = fileInput.readLine();
String clientName = currentLineOfText.substring(0, 19);
String gender = currentLineOfText.substring(20,21);
char clientGender = gender.charAt(0);
int clientAge = Integer.parseInt(currentLineOfText.substring(22,24));
String clientInterests = currentLineOfText.substring(25);
clientDetails.add( new Client(clientName, clientGender, clientAge, clientInterests));
}
Note: Hand edited, there may be syntax errors.
If you really can't use the pre-written ArrayList class, you could always effectively re-implement it (or at least the relevant bits of it) yourself.
The key technique is to take a guess at the size of the array you might need, define an array that size, and, if you find it is too small, create a bigger array and copy all the existing values from the old to the new array, before continuing in the space that is left over.
At the other end of the loop, you might be in for yet another step, and shrink the array again (by declaring a smaller array and copying values over) so you have no empty spaces left.
Or, as recommended by all the other answers, just use an ArrayList, which already does exactly this for you...
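The grow-and-trim technique described above can be sketched as follows. This is a minimal illustration with String elements rather than the Client class from the question; the class name GrowableArray, the initial capacity of 2, and the doubling factor are arbitrary choices for the sketch.

```java
import java.util.Arrays;
import java.util.Iterator;

class GrowableArray {
    // Collects all items into an array using a single pass over the input.
    static String[] collect(Iterator<String> items) {
        String[] buffer = new String[2]; // initial guess at the size
        int size = 0;
        while (items.hasNext()) {
            if (size == buffer.length) {
                // Too small: allocate a bigger array and copy the old values over.
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            buffer[size++] = items.next();
        }
        // Shrink to the exact size so no empty slots are left at the end.
        return Arrays.copyOf(buffer, size);
    }

    public static void main(String[] args) {
        String[] result = collect(Arrays.asList("a", "b", "c").iterator());
        System.out.println(Arrays.toString(result));
    }
}
```

With a BufferedReader, the loop condition would instead be a readLine() call that is checked against null, which also avoids reading the file twice.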

Working with a List of Lists in Java

I'm trying to read a CSV file into a list of lists (of strings), pass it around for getting some data from a database, build a new list of lists of new data, then pass that list of lists so it can be written to a new CSV file. I've looked all over, and I can't seem to find an example on how to do it.
I'd rather not use simple arrays since the files will vary in size and I won't know what to use for the dimensions of the arrays. I have no issues dealing with the files. I'm just not sure how to deal with the list of lists.
Most of the examples I've found will create multi-dimensional arrays or perform actions inside the loop that's reading the data from the file. I know I can do that, but I want to write object-oriented code. If you could provide some example code or point me to a reference, that would be great.
ArrayList<ArrayList<String>> listOLists = new ArrayList<ArrayList<String>>();
ArrayList<String> singleList = new ArrayList<String>();
singleList.add("hello");
singleList.add("world");
listOLists.add(singleList);
ArrayList<String> anotherList = new ArrayList<String>();
anotherList.add("this is another list");
listOLists.add(anotherList);
Here's an example that reads a list of CSV strings into a list of lists and then loops through that list of lists and prints the CSV strings back out to the console.
import java.util.ArrayList;
import java.util.List;
public class ListExample
{
public static void main(final String[] args)
{
//sample CSV strings...pretend they came from a file
String[] csvStrings = new String[] {
"abc,def,ghi,jkl,mno",
"pqr,stu,vwx,yz",
"123,345,678,90"
};
List<List<String>> csvList = new ArrayList<List<String>>();
//pretend you're looping through lines in a file here
for(String line : csvStrings)
{
String[] linePieces = line.split(",");
List<String> csvPieces = new ArrayList<String>(linePieces.length);
for(String piece : linePieces)
{
csvPieces.add(piece);
}
csvList.add(csvPieces);
}
//write the CSV back out to the console
for(List<String> csv : csvList)
{
//dumb logic to place the commas correctly
if(!csv.isEmpty())
{
System.out.print(csv.get(0));
for(int i=1; i < csv.size(); i++)
{
System.out.print("," + csv.get(i));
}
}
System.out.print("\n");
}
}
}
Pretty straightforward I think. Just a couple points to notice:
I recommend using "List" instead of "ArrayList" on the left side when creating list objects. It's better to pass around the interface "List" because then if later you need to change to using something like Vector (e.g. you now need synchronized lists), you only need to change the line with the "new" statement. No matter what implementation of list you use, e.g. Vector or ArrayList, you still always just pass around List<String>.
In the ArrayList constructor, you can leave the list empty and it will default to a certain size and then grow dynamically as needed. But if you know how big your list might be, you can sometimes save some performance. For instance, if you knew there were always going to be 500 lines in your file, then you could do:
List<List<String>> csvList = new ArrayList<List<String>>(500);
That way you would never waste processing time waiting for your list to grow dynamically. This is why I pass "linePieces.length" to the constructor. Not usually a big deal, but helpful sometimes.
Hope that helps!
If you really want to handle CSV files properly in Java, it's not a good idea to implement a CSV reader/writer yourself. Check out the library below.
http://opencsv.sourceforge.net/
When your CSV document includes double-quotes or newlines, you will face difficulties.
To learn the object-oriented approach, it helps to look at other implementations (in Java) first. Also, I don't think managing one row in a List is a good approach: CSV doesn't allow rows to have different numbers of columns.
The example provided by @tster shows how to create a list of lists. I will provide an example for iterating over such a list.
Iterator<List<String>> iter = listOlist.iterator();
while(iter.hasNext()){
Iterator<String> siter = iter.next().iterator();
while(siter.hasNext()){
String s = siter.next();
System.out.println(s);
}
}
Something like this would work for reading:
String filename = "something.csv";
BufferedReader input = null;
List<List<String>> csvData = new ArrayList<List<String>>();
try
{
input = new BufferedReader(new FileReader(filename));
String line = null;
while (( line = input.readLine()) != null)
{
String[] data = line.split(",");
csvData.add(Arrays.asList(data));
}
}
catch (Exception ex)
{
ex.printStackTrace();
}
finally
{
if(input != null)
{
input.close();
}
}
I'd second what xrath said - you're better off using an existing library to handle reading / writing CSV.
If you do plan on rolling your own framework, I'd also suggest not using List<List<String>> as your implementation - you'd probably be better off implementing CSVDocument and CSVRow classes (that may internally use a List<CSVRow> or List<String> respectively), though for users, only expose an immutable List or an array.
Simply using List<List<String>> leaves too many unchecked edge cases and relying on implementation details - like, are headers stored separately from the data? or are they in the first row of the List<List<String>>? What if I want to access data by column header from the row rather than by index?
What happens when you call things like:
// reads CSV data, 5 rows, 5 columns
List<List<String>> csvData = readCSVData();
csvData.get(1).add("extraDataAfterColumn");
// now row 1 has a value in (nonexistent) column 6
csvData.get(2).remove(3);
// values in columns 4 and 5 moved to columns 3 and 4,
// attempting to access column 5 now throws an IndexOutOfBoundsException.
You could attempt to validate all this when writing out the CSV file, and this may work in some cases... but in others, you'll be alerting the user of an exception far away from where the erroneous change was made, resulting in difficult debugging.
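One way to picture the CSVDocument/CSVRow idea is the minimal sketch below. The class names, constructors, and methods are hypothetical, made up for illustration rather than taken from any library. The document validates the column count once, on construction, and only exposes unmodifiable views, so the two problematic calls above would fail immediately with an UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CsvDocument {
    private final List<String> header;
    private final List<CsvRow> rows;

    CsvDocument(List<String> header, List<List<String>> data) {
        this.header = Collections.unmodifiableList(new ArrayList<>(header));
        List<CsvRow> copy = new ArrayList<>(data.size());
        for (List<String> cells : data) {
            // Reject ragged rows up front, close to where the mistake was made.
            if (cells.size() != header.size()) {
                throw new IllegalArgumentException("Expected " + header.size() + " columns");
            }
            copy.add(new CsvRow(cells));
        }
        this.rows = Collections.unmodifiableList(copy);
    }

    // Access by column header instead of by index.
    String get(int row, String columnName) {
        int column = header.indexOf(columnName);
        if (column < 0) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return rows.get(row).get(column);
    }

    List<CsvRow> rows() {
        return rows;
    }

    public static void main(String[] args) {
        CsvDocument doc = new CsvDocument(
                Arrays.asList("id", "name"),
                Arrays.asList(Arrays.asList("1", "a"), Arrays.asList("2", "b")));
        System.out.println(doc.get(1, "name"));
    }
}

final class CsvRow {
    private final List<String> cells;

    CsvRow(List<String> cells) {
        // Defensive copy wrapped in an unmodifiable view.
        this.cells = Collections.unmodifiableList(new ArrayList<>(cells));
    }

    String get(int column) {
        return cells.get(column);
    }

    List<String> cells() {
        return cells;
    }
}
```

Keeping the header separate from the rows also settles the question of where the headers live, at the cost of a little more code than a bare List<List<String>>.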
public class TEst {
public static void main(String[] args) {
List<Integer> ls=new ArrayList<>();
ls.add(1);
ls.add(2);
List<Integer> ls1=new ArrayList<>();
ls1.add(3);
ls1.add(4);
List<List<Integer>> ls2=new ArrayList<>();
ls2.add(ls);
ls2.add(ls1);
List<List<List<Integer>>> ls3=new ArrayList<>();
ls3.add(ls2);
methodRecursion(ls3);
}
private static void methodRecursion(List ls3) {
for(Object ls4:ls3)
{
if(ls4 instanceof List)
{
methodRecursion((List)ls4);
}else {
System.out.print(ls4);
}
}
}
}
Also, this is an example of how to print a List of Lists using the enhanced for loop:
public static void main(String[] args){
int[] a={1,3, 7, 8, 3, 9, 2, 4, 10};
List<List<Integer>> triplets;
triplets=sumOfThreeNaive(a, 13);
for (List<Integer> list : triplets){
for (int triplet: list){
System.out.print(triplet+" ");
}
System.out.println();
}
}
