I am extracting values from a tab-separated text file into a list in Groovy, but I keep running into an ArrayIndexOutOfBoundsException.
Code
println("Reading File Contents")
def fullArray = new String[31721][4]
def availableArray = new String[1386][2]
def filteredFullArray = new String[1386][5]
String fileContents = new File('beliefs.txt').text
String availableContents = new File('available.txt').text
def count = 0
fileContents.eachLine { line ->
    String[] str
    str = line.split('\t')
    def subCount = 0
    for (subCount; subCount < str.length; subCount++) {
        fullArray[count][subCount] = str[subCount]
    }
    count++
}
beliefs.txt
1 Azerbaijan hasOfficialLanguage Azerbaijani_language
2 Augustus hasChild Julia_the_Elder
3 Arthur_Aikin isCitizenOf England
4 Arthur_Aikin diedIn London
5 Alexander_III_of_Russia isMarriedTo Maria_Feodorovna__Dagmar_of_Denmark_
6 Alexander_III_of_Russia hasChild Nicholas_II_of_Russia
7 Alexander_III_of_Russia hasChild Grand_Duke_Michael_Alexandrovich_of_Russia
8 Alexander_III_of_Russia hasChild Grand_Duchess_Olga_Alexandrovna_of_Russia
9 Alexander_III_of_Russia hasChild Grand_Duke_Alexander_Alexandrovich_of_Russia
10 Alexander_III_of_Russia hasChild Grand_Duke_George_Alexandrovich_of_Russia
...
...
...
31719 Minqi_Li isKnownFor Chinese_New_Left
31720 Henry_Bates_Grubb isKnownFor Mount_Hope_Estate
31721 Thomas_Kuhn isKnownFor Paradigm_shift
Running this gives me the following error.
Caught: java.lang.ArrayIndexOutOfBoundsException: 4
java.lang.ArrayIndexOutOfBoundsException: 4
at extractBeliefs$_run_closure1.doCall(extractBeliefs.groovy:19)
at extractBeliefs.run(extractBeliefs.groovy:12)
I understand what generally causes this exception. But as far as I can tell my array never exceeds its last index, and the stack trace also points at the line fileContents.eachLine { line ->, so I am unable to find where this is going wrong.
Any suggestions in this regard will be highly appreciated.
Your initial error is coming from this line (19):
fullArray[count][subCount] = str[subCount]
Line 12 appears in the trace only because the exception propagates out of the closure. An index of 4 means some line split into more than four fields, most likely because it contains an extra tab. For debugging purposes, print each line to the console before you load it into the array; that will show which line triggers the error.
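One way to locate the offending line, sketched here in Python rather than Groovy for brevity: report every line whose tab-separated field count differs from the expected four. The name find_bad_lines, the expected parameter, and the second sample row are illustrative and not taken from the original script.

```python
def find_bad_lines(lines, expected=4, sep="\t"):
    """Return (line_number, field_count, text) for lines whose field count differs from `expected`."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        fields = text.split(sep)
        if len(fields) != expected:
            bad.append((number, len(fields), text))
    return bad


# In-memory sample; the second row carries a hypothetical extra tab.
sample = [
    "1\tAzerbaijan\thasOfficialLanguage\tAzerbaijani_language\n",
    "2\tAugustus\thasChild\tJulia\tthe_Elder\n",
]
for number, count, text in find_bad_lines(sample):
    print(f"line {number}: {count} fields: {text!r}")
```

To check the real file, pass an open file object instead of the sample list. Note that Python's str.split keeps trailing empty fields while Java's String.split drops them, so a line that merely ends in a tab is flagged here but would not overflow the Groovy array.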
Try splitting on whitespace
str = line.split('\\s+')
instead of
str = line.split('\t')
A better way is to collapse every run of spaces or tabs into a single space first, and then split on that single space.
line = line.replaceAll("\\s+", " ")
str = line.split(' ')
I have two files that contain data like this:
File 1 contains:
/begin MENT AE0DAQ0O41 ""
ECU_ADDRESS 0x8111DSCC
ECU_ADDRESS_EXTENSION 0x0
/begin IF_DATA CAN_EXT
120
LINK_MAP "AE0DAQ0O41" 0x8111DSCC 0x0 0 0x2 1 0x2F 0x1
DISPLAY 0 0 655
/end IF_DATA
SYMBOL_LINK "AE0DAQ0O41" 0
/end MENT
File 2 contains:
name value line keyword
.data 80008114+000005 AE0DAQ0O43
.data 80008116+000005 AE0DAQ0O41
.data 80008118+000005 EA0DAQ0O45
.data 8000811a+000005 AE0DAF0O89
What we need to do is take a keyword from File 1, such as AE0DAQ0O41, and search for it in File 2.
File 2 has a value in front of the keyword, here 80008116. That value has to replace the address in
ECU_ADDRESS 0x8111DSCC and in LINK_MAP "AE0DAQ0O41" 0x8111DSCC 0x0 0 0x2 1 0x2F 0x1 (0x8111DSCC becomes 0x80008116), and the result has to be saved to File 1.
File 1 should then look like this:
/begin MENT AE0DAQ0O41 ""
ECU_ADDRESS 0x80008116
ECU_ADDRESS_EXTENSION 0x0
/begin IF_DATA CAN_EXT
120
LINK_MAP "AE0DAQ0O41" 0x80008116 0x0 0 0x2 1 0x2F 0x1
DISPLAY 0 0 655
/end IF_DATA
SYMBOL_LINK "AE0DAQ0O41" 0
/end MENT
How do we do that, given that File 1 contains many blocks like this?
Thanks in advance!
If you treat File 2 as a tab-separated value file, you can read File 1 line by line and compare the keyword in File 1 with each line in File 2.
When you find a match, write the updated line to a new file.
Quick and dirty solution:
(Assuming that the inputs are both text files...)
The code creates a dictionary by mining the second file.
The first file is processed line by line and written to the output file after the required modifications.
This is certainly not the best way to go about it.
If you know the exact format of the files, you can optimize the code to run a lot faster.
beg, ecu, lnk = '/begin', 'ECU_ADDRESS', 'LINK_MAP'
keyVal = dict()
# Build a keyword -> address map from the second file.
with open('file2.txt') as f2:
    for line in f2:
        b = line.split()
        if len(b) < 2:
            continue  # skip blank or malformed lines
        newK, newV = b[-1], b[-2].split('+')[0]
        keyVal[newK] = newV
# Copy the first file to the output, patching the address fields.
with open('file1.txt') as f1, open('output.txt', 'w') as fout:
    value = None
    for line in f1:
        a = line.rstrip('\n').split(' ')
        loc = 0
        if beg in a and 'MENT' in a:
            keyword = a[a.index(beg) + 2]
            # None when the keyword has no entry in file 2
            value = '0x' + keyVal[keyword] if keyword in keyVal else None
        elif ecu in a:
            loc = a.index(ecu) + 1
        elif lnk in a:
            loc = a.index(lnk) + 2
        if loc != 0 and value is not None and loc < len(a):
            a[loc] = value
        fout.write(' '.join(a) + '\n')
I have split a Wikipedia XML dump into many small parts of 1M each and tried to clean them (after the dump had already been cleaned by someone else's program).
I get an out-of-memory error which I don't know how to solve. Can anyone enlighten me?
I get the following error message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:212)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:235)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:252)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:292)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:645)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:454)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1541)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1256)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1237)
at qa.main.ja.Indexing$$anonfun$5$$anonfun$apply$4.apply(SearchDocument.scala:234)
at qa.main.ja.Indexing$$anonfun$5$$anonfun$apply$4.apply(SearchDocument.scala:224)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:750)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at qa.main.ja.Indexing$$anonfun$5.apply(SearchDocument.scala:224)
at qa.main.ja.Indexing$$anonfun$5.apply(SearchDocument.scala:220)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
Where line 234 is as follows:
writer.addDocument(document)
It is adding some documents to Lucene
and where line 224 is as follows:
for (doc <- target_xml \\ "doc") yield {
It is the first line of a for loop for adding various elements as fields in the index.
Is it a code problem, setting problem or hardware problem?
EDIT
Hi, this is my for loop:
(for (knowledgeFile <- knowledgeFiles) yield {
  System.err.println(s"processing file: ${knowledgeFile}")
  val target_xml=XML.loadString(" <file>"+cleanFile(knowledgeFile).mkString+"</file>")
  for (doc <- target_xml \\ "doc") yield {
    val id = (doc \ "#id").text
    val title = (doc \ "#title").text
    val text = doc.text
    val document = new Document()
    document.add(new StringField("id", id, Store.YES))
    document.add(new TextField("text", new StringReader(title + text)))
    writer.addDocument(document)
    val xml_doc = <page><title>{ title }</title><text>{ text }</text></page>
    id -> xml_doc
  }
}).flatten.toArray
The inner loop iterates over every doc element; the outer loop iterates over every file. Is the nested for the source of the problem?
Below is the cleanFile function for reference:
def cleanFile(fileName:String):Array[String] = {
  val tagRe = """<\/?doc.*?>""".r
  val lines = Source.fromFile(fileName).getLines.toArray
  val outLines = new Array[String](lines.length)
  for ((line,lineNo) <- lines.zipWithIndex) yield {
    if (tagRe.findFirstIn(line)!=None)
    {
      outLines(lineNo) = line
    }
    else
    {
      outLines(lineNo) = StringEscapeUtils.escapeXml11(line)
    }
  }
  outLines
}
Thanks again
It looks like you need to increase the heap size with the -Xmx JVM argument (the flag is case-sensitive), for example -Xmx2g.
I have a problem. I open my file, read a line, and then extract information from the line with StringTokenizer.
My code works for one line, but when I try to read another I get an error. Any help?
Here is my code:
try{
line = reader.readLine();
while(line!=null){
StringTokenizer st = new StringTokenizer(line,"\t");
timer=st.nextToken("\t");
int Itimer=Integer.parseInt(timer);
// System.out.println(Itimer);
what_to_do=st.nextToken("\t");
// System.out.print(what_to_do);
flightnumber=st.nextToken();
int Iflightnumber=Integer.parseInt(flightnumber);
// System.out.print(Iflightnumber);
departure=st.nextToken("\t");
// System.out.print(departure);
flighttime=st.nextToken("\t");
int Iflighttime=Integer.parseInt(flighttime);
// System.out.print(Iflighttime);
Key=new KeyFlight(Iflightnumber,Iflighttime);
flight=new Flight(Key,true);
if(what_to_do.equals("insert")){
// System.out.print("worked");
if(departure.equals("D")){
result=true;
}else{result=false;}
flight.setdeparture(result); // I could have created a new Flight, but to economize I used a setter
EV.insert(flight);
// System.out.println("worked again");
}else if(what_to_do.equals("cancel")){
EV.remove(Key);
}
else if(what_to_do.equals("update")){
EV.UpdateKey(flight, Key);
}
line=reader.readLine();
And these are the errors:
Exception in thread "main" java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(Unknown Source)
at java.util.StringTokenizer.nextToken(Unknown Source)
at FlightSchedule.loadandStoreFile(FlightSchedule.java:54)
at FlightSchedule.main(FlightSchedule.java:13)
When I replaced the last reader.readLine() with line=null, it worked.
The rest of the code is OK; it is a StringTokenizer problem.
Example of my txt format: 0 insert 370 D 425
The problem could be that you are telling StringTokenizer to look for a tab ("\t"), while the gap between your fields may be plain spaces rather than a tab. Try line.split("\\s+") instead.
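The whitespace-splitting idea can be sketched as follows, in Python for brevity. It also checks the field count first, because a line with fewer than five tokens (for example a blank final line) makes StringTokenizer.nextToken throw the same NoSuchElementException. The function name parse_flight_line and the five-field layout are assumptions inferred from the sample line in the question.

```python
def parse_flight_line(line):
    """Split one schedule line on any whitespace; return None for blank lines."""
    fields = line.split()  # no argument: splits on runs of spaces and tabs
    if not fields:
        return None  # blank line, e.g. a trailing newline at the end of the file
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    timer, action, flight_number, departure, flight_time = fields
    return int(timer), action, int(flight_number), departure, int(flight_time)


print(parse_flight_line("0 insert 370 D 425"))
```

Raising on a short line makes the bad input visible right away instead of failing somewhere in the middle of tokenizing.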
A file contains the following:
HPWAMain.exe 3876 Console 1 8,112 K
hpqwmiex.exe 3900 Services 0 6,256 K
WmiPrvSE.exe 3924 Services 0 8,576 K
jusched.exe 3960 Console 1 5,128 K
DivXUpdate.exe 3044 Console 1 16,160 K
WiFiMsg.exe 3984 Console 1 6,404 K
HpqToaster.exe 2236 Console 1 7,188 K
wmpnscfg.exe 3784 Console 1 6,536 K
wmpnetwk.exe 3732 Services 0 11,196 K
skypePM.exe 2040 Console 1 25,960 K
I want to get the process ID of the skypePM.exe. How is this possible in Java?
Any help is appreciated.
Algorithm
Open the file.
In a loop, read a line of text.
If the line of text starts with skypePM.exe then extract the number.
Repeat looping until all lines have been read from the file.
Close the file.
Implementation
import java.io.*;
public class T {
  public static void main( String args[] ) throws Exception {
    BufferedReader br = new BufferedReader(
      new InputStreamReader(
        new FileInputStream( "tasklist.txt" ) ) );
    String line;
    while( (line = br.readLine()) != null ) {
      if( line.startsWith( "skypePM.exe" ) ) {
        line = line.substring( "skypePM.exe".length() );
        int taskId = Integer.parseInt( (line.trim().split( " " ))[0] );
        System.out.println( "Task Id: " + taskId );
      }
    }
    br.close();
  }
}
Alternate Implementation
If you have Cygwin and related tools installed, you could use:
cat tasklist.txt | grep skypePM.exe | awk '{ print $2; }'
To find the process ID of the application skypePM.exe:
Open the file.
Read the lines one by one.
Find the line that begins with skypePM.exe.
In that line, skip the whitespace after the process name and parse the number that follows.
That number is the process ID.
It is all string operations.
Remember that this only works as long as the format of the file does not change after you write the code.
If you really want to parse the output, you may need a different strategy. If your output file really is the result of a tasklist execution, then it should have some column headers at the top of it like:
Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
I would use these, in particular the set of equal signs with spaces, to break any subsequent strings using a fixed-width column strategy. This way, you could have more flexibility in parsing the output if needed (i.e. maybe someone is looking for java.exe or wjava.exe). Do keep in mind the last column may not be padded with spaces all the way to the end.
I will say, in the strictest sense, the existing answers should work for just getting the PID.
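The fixed-width strategy described above can be sketched like this, in Python for brevity: the row of equal signs defines the column boundaries, and every later row is sliced at those offsets. The helper names (column_spans, parse_tasklist, find_pid) and the column widths in the sample are assumptions for illustration. The case-insensitive name match is also a choice of this sketch.

```python
import re


def column_spans(separator_row):
    """Return (start, end) offsets for each run of '=' in the separator row."""
    return [match.span() for match in re.finditer(r"=+", separator_row)]


def parse_tasklist(lines):
    """Yield a list of stripped column values for each data row below the '===' row."""
    spans = None
    for line in lines:
        line = line.rstrip("\n")
        if spans is None:
            if line.startswith("="):
                spans = column_spans(line)
            continue  # skip the header and the separator itself
        if line.strip():
            # Slicing past the end of a short line is safe, so an unpadded last column still works.
            yield [line[start:end].strip() for start, end in spans]


def find_pid(lines, image_name):
    """Return the PID (second column) of the first row whose image name matches, else None."""
    for row in parse_tasklist(lines):
        if row[0].lower() == image_name.lower():
            return int(row[1])
    return None


# Sample built with assumed column widths so the alignment is exact.
ROW = "{:<25} {:>8} {:<16} {:>11} {:>12}"
sample = [
    ROW.format("Image Name", "PID", "Session Name", "Session#", "Mem Usage"),
    " ".join("=" * width for width in (25, 8, 16, 11, 12)),
    ROW.format("wmpnetwk.exe", 3732, "Services", 0, "11,196 K"),
    ROW.format("skypePM.exe", 2040, "Console", 1, "25,960 K"),
]
print(find_pid(sample, "skypePM.exe"))
```

Because the offsets come from the file itself, the same code keeps working if the column widths differ between systems, as long as the separator row is present.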
Java is not the most convenient tool for this; a shell or other scripting language would make it much easier. That said, Jawk is an implementation of awk in Java, and I think it may help you.
I run a small online gaming community and deal with a database of accounts.
The setup is this:
Folder named Accounts
Inside the Accounts directory there are 200,000+ text files, organized by player name. Browsing this folder manually is a pain because of the RAM needed to open it and search the files. I find this very inconvenient.
I access this directory to send password reminders or for highscores on who has been playing the longest.
Here is an example of an account file. This file is named Falcon.txt
[ACCOUNT]
character-username = Falcon
character-password = falconpassword
[INFO]
character-coordx = 3252
character-coordy = 3432
character-active = yes
character-ismember = 1
character-messages = 5
character-lastconnection = [removed]
character-lastlogin = 2009-11-29
character-energy = 100
character-gametime = 193
character-gamecount = 183
[EQUIPMENT]
character-equip = 0 4724 0
character-equip = 1 1052 0
character-equip = 2 6585 0
character-equip = 3 4151 0
character-equip = 4 4720 0
character-equip = 5 1215 0
character-equip = 6 -1 0
character-equip = 7 4722 0
character-equip = 8 -1 0
character-equip = 9 775 0
character-equip = 10 1837 0
character-equip = 11 -1 0
character-equip = 12 6735 0
character-equip = 13 -1 0
[APPEARANCE]
character-look = 0 1
character-look = 1 1
character-look = 2 2
character-look = 3 3
character-look = 4 5
character-look = 5 2
[STATS]
character-skill = 0 1 0
character-skill = 1 1 0
character-skill = 2 1 0
character-skill = 3 1 0
character-skill = 4 1 0
character-skill = 5 1 0
character-skill = 6 1 0
character-skill = 7 1 0
character-skill = 8 1 0
character-skill = 9 1 0
character-skill = 10 1 0
character-skill = 11 1 0
character-skill = 12 1 0
character-skill = 13 1 0
character-skill = 14 1 0
character-skill = 15 1 0
character-skill = 16 1 0
character-skill = 17 1 0
character-skill = 18 1 0
character-skill = 19 1 0
character-skill = 20 1 0
[ITEMS]
[BANK]
[FRIENDS]
[IGNORES]
[END]
There is a huge database of these files, and I search through the files in the directory for values.
By values I mean item IDs or IP addresses, which I use to find and track other accounts.
However, I have run into a new problem that has stalled my development.
As you can see in the file the lines are organized by tabs.
character-equip = 0 4724 1
If I put the value 4724 in my search application, I want it to print out the value 1 tab to the right of the found search result. I want it to print out the value for the found results only, not extra results.
So the search could look like this:
1 "Enter item to find:"
2 "Enter item to find: 4724"
3 "Account Falcon.txt has 1!"
press any key to continue...
Or if there was more quantity of that equipped item
character-equip = 5 1239 102
1. "Enter item to find:"
2. "Enter item to find: 1239"
3. "Account Falcon2.txt has 102!"
press any key to continue...
I simply want to input an item ID and have the program display the value that follows it. The white space is a tab. So far, the only way I have gotten any result is to put a tab on each side of the search term. So to find item ID 1239, I type this on the command line:
Enter item to find:<tab>1239<tab>
It then searches for that and displays the accounts that contain the item. However, I still have to open each account individually to find the quantity of that item. I want the search results to display the quantity of the item when the ID is found. If the matched value is itself a quantity, so that there is nothing one tab further over, I want the search to either skip it or report zero.
This is what I mean.
character-equip = 0 1024 1239
Enter item to find: 1239
If the search hits this account, I want the results to show zero, because there is no value one tab to the right of the match. So it would display null or zero:
Account Falcon3.txt has null!
or
Account Falcon3.txt has 0!
I've attempted to do this but I am unsure how to achieve this.
Here is my code.
import java.io.*;
import java.util.*;
public class ItemDatabase {
public static void main(String args[]) {
System.out.print("Enter item to find: ");
Scanner sc = new Scanner(System.in);
find(sc.nextLine());
}
public static void find(String delim) {
File dir = new File("accounts");
if (dir.exists()) {
String read;
try {
File files[] = dir.listFiles();
for (int i = 0; i < files.length; i++) {
File loaded = files[i];
if (loaded.getName().endsWith(".txt")) {
BufferedReader in = new BufferedReader(new FileReader(loaded));
StringBuffer load = new StringBuffer();
while ((read = in.readLine()) != null) {
load.append(read + "\n");
}
String delimiter[] = new String(load).split(delim);
if(delimiter.length > 1) {
System.out.println("Account " + loaded.getName() + " has " + delimiter[1] + "!");
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
} else {
System.out.println("error: dir wasn't found!");
}
}
}
Thanks guys I hope you can help me.
This is simply crying out for a database. If your entire back end is running in a single java process I'd recommend going with something like Apache Derby or H2.
If you'd rather not move to a database, and the main problem is listing all the entries in a single directory, could you split it into a hierarchy? Your Falcon account could then be located in F/FA/Falcon.txt, and each directory would contain a more manageable number of files.
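A minimal sketch of the directory layout suggested above, in Python: the first one and two characters of the upper-cased player name pick the subdirectories. The function name sharded_path and the root folder name are illustrative.

```python
from pathlib import PurePosixPath


def sharded_path(player_name, root="Accounts"):
    """Map 'Falcon' to Accounts/F/FA/Falcon.txt."""
    key = player_name.upper()
    return PurePosixPath(root, key[:1], key[:2], f"{player_name}.txt")


print(sharded_path("Falcon"))
```

Upper-casing the key keeps "falcon" and "Falcon" in the same directory. Whether player names are case-sensitive is not stated in the question, so treat that as an assumption.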
Aside from the need for a database, you could probably implement your solution more intuitively and easily using commandline utilities such as find, grep, etc. or a text-processing language such as perl.
Maybe (with GNU grep, where -P enables Perl-style patterns so that \t matches a tab)
grep -P '[0-9]+\t[0-9]+\t1239' Falcon3.txt
would return
character-equip = 0 1024 1239
You could then easily see that the value is 0 for that item.
I cannot emphasize enough the need to not write a Java program to do this - you won't do as good a job as the authors of the standard shell utilities. Let's face it, the fact that you are asking this question indicates that you are a newb! :) (We are all newbs depending on the topic).
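For comparison, the matching rule the question asks for can be sketched in a few lines of Python: match the search term against the item-ID column only, then report the column to its right. The field layout (slot, item ID, quantity) is inferred from the examples in the question. The function name item_quantity is illustrative, and summing across several matching lines is a choice of this sketch.

```python
def item_quantity(lines, item_id, key="character-equip"):
    """Return the summed quantity of `item_id` over `key` lines, or 0 if the item is absent."""
    total = 0
    for line in lines:
        name, _, value = line.partition("=")
        if name.strip() != key:
            continue
        fields = value.split()  # slot, item ID, quantity
        if len(fields) == 3 and fields[1] == str(item_id):
            total += int(fields[2])
    return total


account = [
    "[EQUIPMENT]",
    "character-equip = 0\t4724\t1",
    "character-equip = 5\t1239\t102",
    "character-equip = 0\t1024\t1239",  # 1239 is a quantity here, not an item ID
]
print(item_quantity(account, 1239))
```

Because only the second field is compared, a value that appears as a quantity never counts as a hit, and an account without the item reports 0.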