OK, so I have 4 different .dat files in Java, but they all contain plain text. I'm wondering which collection is best to use and how to combine them together.
Here are the 4 .dat files
The scores.dat consists of PERSONA_ID|GAME_ID|WIN
The personas.dat consists of ID|PLAYER_ID|GAMERTAG|PLATFORM
The players.dat consists of ID|FIRST_NAME|LAST_NAME|EMAIL|BIRTHDATE
The games.dat consists of ID|NAME|PRODUCER
Here is some other information that might be useful:
scores.dat PERSONA_ID = personas.dat ID
scores.dat GAME_ID = games.dat ID
You can read each file and append the contents of the files to a single String and then write the combined String out to a new file. You can use StringBuilder for better performance if you are going to add many files in this way.
You don't need any data structure if you only want to append the contents of files. If you have any other requirement for 'merging' the files, you will need to use appropriate data structures.
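For example, here is a minimal sketch of the append approach (the input file names come from the question; the output name and everything else are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch of the append approach; the output file name is an assumption.
public class CombineFiles
{
    public static void main(String[] args) throws IOException
    {
        String[] inputs = { "scores.dat", "personas.dat", "players.dat", "games.dat" };
        StringBuilder combined = new StringBuilder();
        for (String name : inputs)
        {
            // append every line of the current file, keeping line breaks
            for (String line : Files.readAllLines(Paths.get(name)))
            {
                combined.append(line).append(System.lineSeparator());
            }
        }
        // write the combined contents out to a single new file
        Files.write(Paths.get("combined.dat"), combined.toString().getBytes());
    }
}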
Related
I have two large txt files, around 150 MB each. I want to read some data from each line of file1 and scan through all the lines of file2 until I find the matching data. If the matching data is not found, I want to output that line to another file.
I want the program to use as little memory as possible. Time is not a constraint.
Edit 1
I have tried a couple of options.
Option 1: I read file2 using BufferedReader, Scanner, and Apache Commons FileUtils.lineIterator. I loaded the data of file2 into a HashMap by reading each line, then read the data from file1 one line at a time and compared it with the data in the HashMap. If it didn't match, I wrote the line to file3.
Option 2: I read file2 n times, once for every record in file1, using the three readers mentioned above. After every pass I had to close the file and read it again. I am wondering what's the best way. Is there any other option I can look into?
I have to make some assumptions about the file.
I am going to assume the lines are long, and you want the lines that are not the same in the 2 files.
I would read the files 4 times (2 times per file).
Of course, that's not as efficient as reading them only 2 times (1 time per file), but reading them only 2 times means lots of memory is used.
Pseudo code for 1st read of each file:
Map<MyComparableByteArray, Long> digestMap = new HashMap<>();
try (BufferedReader br = ...)
{
    long lineNr = 0;
    String line;
    while ((line = br.readLine()) != null)
    {
        // store each line's digest together with its line number
        digestMap.put(createDigest(line), lineNr++);
    }
}
If the digests are different/unique, I know that the line does not occur in the other file.
If the digests are the same, we will need to check the lines and actually compare them to make sure that they are really the same - this can occur during the second read.
Now, what is also important is that we need to be careful about which digest we choose.
If we choose a short digest (e.g. MD5), we might run into lots of collisions, but it is appropriate for files with short lines; we will need to handle the collisions separately (i.e. convert the map to a Map<digest, List> structure).
If we choose a long digest (e.g. SHA-512), we won't run into many collisions (though it's still safer to handle them as mentioned above), BUT we won't save as much memory unless the file lines are very long.
So the general technique is:
Read each file and generate hashes.
Compare the hashes to mark the lines that need to be compared.
Read each file again and generate the output. Recheck all collisions found by the hashes in this step.
By the way, MyComparableByteArray is a custom wrapper around a byte[] that enables it to be used as a HashMap key (i.e. by implementing the equals() and hashCode() methods); see the sketch after this list. A raw byte[] cannot be used as a key, as it doesn't override equals() and hashCode(). There are 2 ways to handle this:
a custom wrapper, as I've mentioned - this will be more efficient than the alternative.
convert it to a String using Base64. This will make memory usage around 2.5x worse than option 1, but it does not need the custom code.
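Here is a minimal sketch of such a wrapper, together with a hypothetical createDigest() helper for the pseudo code above (using MD5 via java.security.MessageDigest is my assumption; swap in SHA-512 as discussed):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of the byte[] wrapper described above: equals() and hashCode()
// delegate to Arrays so the digest bytes compare by content, not identity.
final class MyComparableByteArray
{
    private final byte[] bytes;

    MyComparableByteArray(byte[] bytes)
    {
        this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof MyComparableByteArray
                && Arrays.equals(bytes, ((MyComparableByteArray) o).bytes);
    }

    @Override
    public int hashCode()
    {
        return Arrays.hashCode(bytes);
    }

    // Hypothetical helper for the pseudo code above: hash one line with MD5.
    static MyComparableByteArray createDigest(String line) throws NoSuchAlgorithmException
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        return new MyComparableByteArray(md.digest(line.getBytes()));
    }
}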
I have read a bit about multidimensional arrays. Would it make sense to solve this problem using such a data structure in Java, or how should I proceed?
Problem
I have a text file containing records which contain multiple lines. One record is anything between <SUBBEGIN and <SUBEND.
The lines in a record follow no predefined order and may be absent from a record. In the input file (see below) I am only interested in the MSISDN, CB, CF, and ODBIC fields.
For each of these fields I would like to apply regular expressions to extract the value to the right of the equals sign.
The output file would be a comma-separated file containing these values; for example, for
MSISDN=431234567893 the value 431234567893 is written to the output file
Error checking:
NoMSISDNnofound when no MSISDN is found in a record
noCFUALLPROVNONE when no CFU-ALL-PROV-NONE is found in a record
Search and replace operations
CFU-ALL-PROV-NONE should be replaced by CFU-ALL-PROV-1/1/1
CFU-TS10-ACT-914369223311 should be replaced by CFU-TS10-ACT-1/1/0/4369223311
Output for first record
431234567893,BAOC-ALL-PROV,BOIC-ALL-PROV,BOICEXHC-ALL-PROV,BICROAM-ALL-PROV,CFU-ALL-PROV-1/1/1,CFB-ALL-PROV-1/1/1,CFNRY-ALL-PROV-1/1/1,CFNRY-ALL-PROV-1/1/1,CFU-TS10-ACT-1/1/1/4369223311,BAIC,BAOC
Input file
<BEGINFILE>
<SUBBEGIN
IMSI=11111111111111;
MSISDN=431234567893;
CB=BAOC-ALL-PROV;
CB=BOIC-ALL-PROV;
CB=BOICEXHC-ALL-PROV;
CB=BICROAM-ALL-PROV;
IMEISV=4565676567576576;
CW=CW-ALL-PROV;
CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFU-TS10-ACT-914369223311-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-
YES-YES-NO;
ODBIC=BAIC;
ODBOC=BAOC;
ODBROAM=ODBOHC;
ODBPRC=ENTER;
ODBPRC=INFO;
ODBPLMN=NONE;
ODBPOS=NOBPOS-BOTH;
ODBECT=OdbAllECT;
ODBDECT=YES;
ODBMECT=YES;
ODBPREMSMS=YES;
ODBADULTSMS=YES;
<SUBEND
<SUBBEGIN
IMSI=11111111111133;
MSISDN=431234567899;
CB=BAOC-ALL-PROV;
CB=BOIC-ALL-PROV;
CB=BOICEXHC-ALL-PROV;
CB=BICROAM-ALL-PROV;
CW=CW-ALL-PROV;
CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO+-NO-NO;
CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFU-TS10-ACT-914369223311-YES-NO-NONE-YES-65535-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFD-TS10-REG-91430000000-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-YES-YES-NO;
ODBIC=BICCROSSDOMESTIC;
ODBOC=BAOC;
ODBROAM=ODBOH;
ODBPRC=INFO;
ODBPLMN=PLMN1
ODBPLMN=PLMN3;
ODBPOS=NOBPOS-BOTH;
ODBECT=OdbAllECT;
ODBDECT=YES;
ODBMECT=YES;
ODBPREMSMS=NO;
ODBADULTSMS=YES;
<SUBEND
From what I understand, you are simply reading a text file, processing it, and maybe replacing some words. You therefore do not need a data structure to store the words in. Instead, you can simply read the file line by line, pass each line through a bunch of if statements (maybe with a couple of booleans to check whether the specific parameters you are searching for have been found), and then write the lines you want to a new file.
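A rough sketch of that approach might look like this (the file names are assumptions, and only the first of the question's replacement rules is shown):

import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the line-by-line approach; file names are assumptions
// and only one of the question's replacement rules is applied.
public class RecordParser
{
    public static void main(String[] args) throws IOException
    {
        try (BufferedReader in = new BufferedReader(new FileReader("input.dat"));
             PrintWriter out = new PrintWriter(new FileWriter("output.csv")))
        {
            List<String> values = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null)
            {
                line = line.trim();
                if (line.startsWith("<SUBEND"))
                {
                    // end of one record: write a comma-separated row
                    if (!values.isEmpty())
                    {
                        out.println(String.join(",", values));
                    }
                    values.clear();
                }
                else if (line.matches("(MSISDN|CB|CF|ODBIC)=.*"))
                {
                    // keep only the value to the right of the equals sign
                    String value = line.substring(line.indexOf('=') + 1).replace(";", "");
                    value = value.replace("CFU-ALL-PROV-NONE", "CFU-ALL-PROV-1/1/1");
                    values.add(value);
                }
            }
        }
    }
}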
Dealing with big files to feed data into machine learning algorithms, I did it by reading all of the file contents into a variable and then using the String.split("delimiter") method (available since Java 1.4) to break the contents into a one-dimensional array, where each cell holds the info before the delimiter.
First, read the file via a Scanner or your own way of doing it (let content be the variable with your info), and then break it up with
content.split("<SUBEND");
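For example, a minimal sketch of that idea (the file name is an assumption):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SplitRecords
{
    public static void main(String[] args) throws IOException
    {
        // read the whole file into one String (the file name is an assumption)
        String content = new String(Files.readAllBytes(Paths.get("input.dat")));
        // break it into records; each entry holds one <SUBBEGIN ... <SUBEND block
        String[] records = content.split("<SUBEND");
        System.out.println(records.length + " records found");
    }
}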
user
u1
Khulbe
Sharma
gupta
These are the entries in the CSV file. I want to know if there is a built-in Java function that can directly give me the number of lines in a CSV file. In this case, it would be 5 lines.
A built-in way would be to open the file and count how many lines there are. In Java 8, for example, it would look like this:
final Path path = Paths.get(ClassLoader.getSystemResource("your.csv").toURI());
return Files.lines(path).skip(1L).count(); // skip(1L) to ignore the titles
CSV has nothing to do with it. All you need is a line count. This can be accomplished in two lines of code with a LineNumberReader (once the reader is constructed):
LineNumberReader lineNumberReader = new LineNumberReader(new FileReader("your.csv"));
lineNumberReader.skip(Long.MAX_VALUE); // consume the whole file
int lines = lineNumberReader.getLineNumber();
Super CSV. Have a close look at its example CSV file and the output.
You can keep a count variable while using Super CSV to find the number of lines. The only thing you have to do is add the jar to your classpath and then access its methods.
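For example, a sketch of counting rows with Super CSV's CsvListReader (the file name is an assumption):

import java.io.FileReader;
import java.io.IOException;

import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

// Sketch of counting lines with Super CSV; the file name is an assumption.
public class CsvLineCount
{
    public static void main(String[] args) throws IOException
    {
        int count = 0;
        try (ICsvListReader reader =
                new CsvListReader(new FileReader("your.csv"), CsvPreference.STANDARD_PREFERENCE))
        {
            // read() returns one row at a time, or null at end of file
            while (reader.read() != null)
            {
                count++;
            }
        }
        System.out.println(count + " lines");
    }
}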
This is my first post here. I'm excited to finally take part.
I'm working on a project where I'm parsing obscure file types. I need to be able to parse Word documents (which I've already done), .sbs, .day, .cmp, and more. All of these types can be opened simply with Notepad and displayed.
Since I'm so new to this stuff, is there a way I can use some generic library (or two) to open all of these up? And if so what library would it be?
What's a best practice in this sort of circumstance?
Thanks!
You could use the Apache Commons IO library. The FileUtils class has several methods that receive the file path and optionally the file encoding.
If you just want to read text files and save their contents to a String variable:
java.io.File file = new java.io.File("C:\\dir\\file.cmp");
String allWordAndLines = org.apache.commons.io.FileUtils.readFileToString(file);
If you want each line separately, stored in a collection:
java.util.List<String> lines = org.apache.commons.io.FileUtils.readLines(file);
for(String line : lines) {
// do something with line
}
To specify the encoding, you need to add another parameter:
org.apache.commons.io.FileUtils.readFileToString(file, "UTF-8");
org.apache.commons.io.FileUtils.readLines(file, "Cp1252");
Java includes several classes for reading files; see more at http://docs.oracle.com/javase/tutorial/essential/io/index.html
I hope this helps if you are only looking to have your text file available in memory.
I'm having some issues with a program I am trying to write. Basically, I need to take an ArrayList I have and export its information to a text file. I have tried several different solutions from Google, but none have given me any success.
Basically I have two ArrayLists: one of graduates and one of undergraduates. I simply need to take the information associated with these ArrayLists and put it into one text file.
I'll later need to do the opposite and import the .txt file back into ArrayLists, but I can figure that out later.
Any suggestions?
If you need to write the data in a specific format, you could use a PrintWriter to write the data to a file in whatever manner you wish. The problem with this is that you will then have to figure out a way in which you will then re-read the text file and populate the data.
On the other hand, you could use XStream(tutorial here) to write your files as XML. This will provide you with a human readable text file (as above) however, it will be much easier to re-read the text file when populating the data.
Lastly, you could use the ObjectOutputStream to write the data and the ObjectInputStream to re-read it back. Note however, that this method does not yield a human readable text file. Also, your classes will need to implement the Serializable interface.
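For instance, a minimal sketch of the PrintWriter approach, assuming both lists hold Strings (the file name and sample data are placeholders):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the PrintWriter approach; names and data are placeholders.
public class ExportLists
{
    public static void main(String[] args) throws IOException
    {
        List<String> grads = Arrays.asList("Alice", "Bob");       // hypothetical data
        List<String> undergrads = Arrays.asList("Carol", "Dave"); // hypothetical data

        try (PrintWriter out = new PrintWriter(new FileWriter("students.txt")))
        {
            // write one entry per line, graduates first
            for (String g : grads)
            {
                out.println(g);
            }
            for (String u : undergrads)
            {
                out.println(u);
            }
        }
    }
}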
Here's a solution using Apache commons-io library:
//Put all data into one big list, prepended with size of first list
List<String> allData = new ArrayList<String>(1+grads.size()+undergrads.size());
allData.add(String.valueOf(grads.size()));
allData.addAll(grads);
allData.addAll(undergrads);
FileUtils.writeLines(new File("list.txt"), allData);
To read the data back:
List<String> allData = FileUtils.readLines(new File("list.txt"));
int gradsSize = Integer.parseInt(allData.get(0));
List<String> grads = allData.subList(1, gradsSize+1);
List<String> undergrads = allData.subList(1+gradsSize, allData.size());