Parse a Comma Delimited File using JAVA - java

I want to read this file:
http://www.somehost.com/products/, A0,D1,L0,T0
http://www.somehost.com/news/rel, A1,D0,L1,T0
http://istor.somehost.com, A0, D1, L0, T0
I have a list of URLs that I want to compare with the URLs in this file. Suppose a URL from my list starts with one of the URLs in the file. Then I want to move along that line and check A and D. If A is 0, we do not crawl that URL; if A is 1, we crawl it and go on to check L and T. If L is 1, we extract links only, and vice versa. The same goes for T, which is 0 or 1: we extract text only if T is 0.
Any suggestions on how I can do this?

I've used Java CSV and it's pretty easy. See the code examples as well. However (summarizing what @Hovercraft Full Of Eels said), if your data is not overly complicated, Java's String.split() should work fine.
After parsing your data, you can read the values and decide what to do from there. Your description of what you need to do is practically an outline of a method with an if ... else if ... else structure, so start from that.
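As a rough illustration of that if ... else if ... else outline, here is a minimal sketch using String.split(). The class and method names (CrawlRules, addLine, decide) and the returned labels are hypothetical. It only branches on A and L; handling of D and T is left out because the question does not pin down their meaning precisely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrawlRules {
    // One line of the file: a URL prefix plus flags such as A=0, D=1, L=0, T=0.
    static class Rule {
        final String prefix;
        final Map<Character, Integer> flags = new HashMap<>();
        Rule(String prefix) { this.prefix = prefix; }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses a line like "http://www.somehost.com/news/rel, A1,D0,L1,T0".
    public void addLine(String line) {
        String[] parts = line.split(",");
        Rule rule = new Rule(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            String field = parts[i].trim();          // e.g. "A0"
            if (field.length() < 2) continue;        // ignore empty or malformed fields
            rule.flags.put(field.charAt(0), Integer.parseInt(field.substring(1)));
        }
        rules.add(rule);
    }

    // Returns a label for the first rule whose prefix matches the URL.
    // Assumption: a missing flag is treated as 0.
    public String decide(String url) {
        for (Rule rule : rules) {
            if (!url.startsWith(rule.prefix)) continue;
            if (rule.flags.getOrDefault('A', 0) == 0) {
                return "SKIP";                       // A is 0: do not crawl
            } else if (rule.flags.getOrDefault('L', 0) == 1) {
                return "LINKS_ONLY";                 // A is 1 and L is 1
            } else {
                return "CRAWL";                      // A is 1 and L is 0
            }
        }
        return "NO_RULE";                            // no prefix in the file matched
    }
}
```

Each line of the file would be passed to addLine once, and decide would then be called for every URL in the list.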

Related

How to get a value in a file with coordinates in Java

My program needs to read files that have different data structures and a variable separator.
In my properties-file you can set the separator and put coordinates for values of different variables:
separator = ;
variable1 = 1,7
variable2 = 2,42
I would like to have a way where I can access a column and a line with some kind of coordinates.
I'm thinking of a syntax like this:
file.get(1,7,";")
(Which would give you the value of the 1st line and 7th column with the specific separator)
Does someone know a library or a code snippet that does exactly this?
Using String.split() :
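import java.util.regex.Pattern;
public String get(File file, int lineNumber, int column, String separator) {
    // Reading the file up to lineNumber is omitted here;
    // suppose the requested line is already in a String named "line".
    // split() takes a regular expression, so quote the separator to match it literally.
    return line.split(Pattern.quote(separator))[column - 1];
}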
You can use OpenCSV or SuperCSV, for example. I'm not aware of any library that does your 'coordinates' lookup, but it's as simple as reading the CSV with the given separator into a list of lists and then calling the following (list indexes are zero-based, so this is the 1st line and 7th column):
csv.get(0).get(6)
This seems to be simple file processing. You should first process the file:
1. Create an ArrayList<ArrayList<String>> named processedFile.
2. Read every line and split it using line.split(separator).
3. Store the resulting array in processedFile at the current index.
4. Increase the index with every line.
Once processedFile is ready, you can simply use processedFile.get(row).get(column). Once the file has been processed, every further lookup is O(1). Hints are enough; try writing the code yourself and you will learn more.
PS: Guard against NullPointerException and IndexOutOfBoundsException where required, for example for rows or columns that do not exist.
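For reference, the list-of-lists idea can be sketched as follows. The class name DelimitedTable and its methods are hypothetical, and the coordinates are assumed to be 1-based as in the question. The sketch does not handle quoted fields that contain the separator; a CSV library would be the better choice for that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedTable {
    private final List<List<String>> rows = new ArrayList<>();

    public DelimitedTable(List<String> lines, String separator) {
        // split() expects a regex, so quote the separator to match it literally.
        String regex = Pattern.quote(separator);
        for (String line : lines) {
            // A limit of -1 keeps trailing empty columns.
            rows.add(Arrays.asList(line.split(regex, -1)));
        }
    }

    // Reads the whole file into memory; fine for small and medium files.
    public static DelimitedTable load(Path file, String separator) throws IOException {
        return new DelimitedTable(Files.readAllLines(file), separator);
    }

    // 1-based line and column, matching the coordinates in the properties file.
    public String get(int line, int column) {
        return rows.get(line - 1).get(column - 1);
    }
}
```

With this in place, DelimitedTable.load(path, ";").get(1, 7) would return the value in the 1st line and 7th column, and it throws IndexOutOfBoundsException if that cell does not exist.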

How to mask a specific value without knowing exact key, within a JSON string, in Java

I am receiving JSON in the form of a string and need to mask one piece of information in it. The JSON structure and key names are always different, but the value's pattern is recognizable. The question is: what is an efficient way to traverse the String/JSONObject to mask that piece of data?
I've tried turning the String into a JSONObject, traversing every embedded JSONObject/JSONArray, detecting the pattern, and replacing the original value with its masked version. But this seems very time-consuming when logging this information to the console.
For reference, the value's pattern is a 9-digit (Long) number.
The structure always varies, from "{"key1":[{"innerKey1":123456789}]}" to "{"key1":"value1", "key2":{"innerKey1":123456789}}"
Sample result: "{"key1":[{"innerKey1":"XXXXXX789"}]}"
If the JSON is always provided as a compact single-line string, you could just find the value in the string and replace it, or get more elaborate and use a regular expression to find a match such as "innerKey1":123456789 and replace that.
If this is just for logging purposes, you can even implement this as a filter. Depending on your logging framework, it might even be configurable instead of having to be coded.
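A minimal sketch of the regular-expression approach, assuming the value to mask is an unquoted number of exactly nine digits that follows a colon, comma or opening bracket. The class name JsonMasker is hypothetical. Because it works on the raw string, it could also rewrite text inside a string value that happens to look like ",123456789,", so treat it as a starting point only.

```java
import java.util.regex.Pattern;

public class JsonMasker {
    // Group 1: the character before the number (":", "[" or ",") plus optional spaces.
    // Then six digits to hide and group 2, the last three digits to keep.
    // The lookahead requires the number to end there, so longer numbers are not touched.
    private static final Pattern NINE_DIGITS =
            Pattern.compile("([:\\[,]\\s*)\\d{6}(\\d{3})(?=\\s*[,\\]}])");

    public static String mask(String json) {
        // The masked value is emitted as a JSON string, as in the sample result.
        return NINE_DIGITS.matcher(json).replaceAll("$1\"XXXXXX$2\"");
    }
}
```

The pattern is compiled once and reused, which keeps the cost per log line low compared with parsing and re-serializing the JSON.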

convert an hashMap.toString back to hashmap

I have a log line from my logs like:
{Contact={attributes={type=Contact}, Id=003, Email=xxx@xxx.com,, Account={attributes={type=Account}, Name=NBC, LLC}}, fromAccount=true}
This was logged using HashMap.toString()
I need to convert it back to a HashMap.
I tried ObjectMapper and similar tools and searched on Google, but I could not find a solution.
Please advise how to do this.
You could use SnakeYAML, which looks like this and creates a Map by default, but the problem you have is the non-standard field syntax, such as
, Name=NBC, LLC
The first , is a separator, the second is part of the field. The ,, is also a bit odd. Is there a , at the end of a value?
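If a hand-written parser is acceptable, the ambiguity can be worked around with a heuristic. The sketch below is not a general solution. It assumes that an entry separator is always ", " followed by text that looks like "key=", that keys contain no commas, braces or equals signs, and that values contain no braces. The class name MapStringParser is hypothetical, and all leaf values come back as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class MapStringParser {
    // Heuristic: a new entry starts at ", " followed by "key=".
    private static final Pattern NEXT_KEY = Pattern.compile(", [^,{}=]+=");

    private final String text;
    private int pos = 0;

    private MapStringParser(String text) { this.text = text; }

    public static Map<String, Object> parse(String text) {
        return new MapStringParser(text.trim()).readMap();
    }

    private Map<String, Object> readMap() {
        Map<String, Object> map = new LinkedHashMap<>();
        pos++;                                        // skip '{'
        while (text.charAt(pos) != '}') {
            int eq = text.indexOf('=', pos);
            String key = text.substring(pos, eq);
            pos = eq + 1;
            // A value is either a nested map or a plain string.
            Object value = text.charAt(pos) == '{' ? readMap() : readScalar();
            map.put(key, value);
            if (text.startsWith(", ", pos)) pos += 2; // skip the entry separator
        }
        pos++;                                        // skip '}'
        return map;
    }

    private String readScalar() {
        int start = pos;
        // Read until the enclosing map closes or the next "key=" begins.
        while (text.charAt(pos) != '}'
                && !NEXT_KEY.matcher(text).region(pos, text.length()).lookingAt()) {
            pos++;
        }
        return text.substring(start, pos);
    }
}
```

Under these assumptions a comma that is not followed by "key=" stays part of the value, so Name is read as "NBC, LLC" and a trailing comma at the end of a value is kept. A value that itself contains ", something=" would be split incorrectly.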

how to parse a file

Alright, I have an assignment and I don't know how to parse the file. Is StringTokenizer my best option?
The file has commas, newlines and spaces. S is the starting state, the lowercase a is the input, and the uppercase A is the next state. Should I parse the file into separate variables and run it through a switch statement to simulate a state machine?
This is the file
‘Ends in a
2
S, a, A
S, b, S
A, a, A
A, b, S
F: A
aba
bbaabba
bbabab
aaaab
b
a
Thank you so much, because I just can't seem to get started...
My biggest question is: how can I parse the file?
Like any other text file. There are literally millions of examples on how to do this on the web.
I would look for examples using the Scanner class.
I am not very good at parsing files, especially in this situation.
With practice it will get easier. Doing this assignment will help.
Should I use delimiters?
The file has delimiters, so I don't know why you wouldn't.
comma and newline?
Your file has commas, newlines and spaces.
and put the states into an array and the inputs (a, b) into a second array?
Java is an object-oriented programming language. Perhaps using collections like Map and your own objects is a better choice.
Should I check for digits, isalpha?
I would just assume the file is formatted correctly and read numbers when you expect to have a number and strings when you expect to have a word/token.
lower case and uppercase alpha?
Not sure if this is a consideration.
I am thinking I need a switch and a couple of cases to handle the state transitions?
If your states were hard-coded in Java, I would say yes. However, your states are being read from a text file and stored in a data structure. In this case it's simpler not to use switches.
Can someone explain how I should go about handling this file so I can process it?
Read it, store the data in a structure, process the inputs.
I am also confused about how to handle the F: A line in that file.
This is information you need to record to determine when your DFA stops.
Java is an object-oriented language so build a series of classes that reflect the real world.
Example:
What do you have, and what do these things need to be able to do?
DFA
  has a series of states
  needs to be able to accept/reject input strings
State
  has a collection of inputs to look for and states to transition to based on input
  needs to be able to check for a token and transition to a new state
These kinds of questions govern how you should lay out your classes (members and methods). So you should make a DFA class, and it should have a method: public boolean process(String input).
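One possible shape for such a class is sketched below. To stay short it stores the transitions in a Map instead of a separate State class. The names Dfa and fromLines are hypothetical. The sketch assumes that the first two lines of the file (the description and the number) can be skipped, that "S" is the start state as stated in the question, and that F: lists the accepting states.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Dfa {
    // transitions.get(state).get(symbol) is the next state.
    private final Map<String, Map<Character, String>> transitions = new HashMap<>();
    private final Set<String> acceptStates = new HashSet<>();
    private final String startState;

    public Dfa(String startState) { this.startState = startState; }

    public void addTransition(String from, char symbol, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    public void addAcceptState(String state) { acceptStates.add(state); }

    // Runs the input through the machine; true if it ends in an accepting state.
    public boolean process(String input) {
        String state = startState;
        for (char c : input.toCharArray()) {
            Map<Character, String> row = transitions.get(state);
            String next = (row == null) ? null : row.get(c);
            if (next == null) return false;           // no transition defined: reject
            state = next;
        }
        return acceptStates.contains(state);
    }

    // Builds a Dfa from the file's lines and collects the test strings into inputsOut.
    public static Dfa fromLines(List<String> lines, List<String> inputsOut) {
        Dfa dfa = new Dfa("S");                       // assumption: S is the start state
        for (String raw : lines.subList(2, lines.size())) {   // skip the two header lines
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\s*,\\s*");
            if (line.startsWith("F:")) {
                for (String s : line.substring(2).trim().split("[,\\s]+")) {
                    dfa.addAcceptState(s);
                }
            } else if (parts.length == 3) {           // e.g. "S, a, A"
                dfa.addTransition(parts[0], parts[1].charAt(0), parts[2]);
            } else {
                inputsOut.add(line);                  // everything else is an input string
            }
        }
        return dfa;
    }
}
```

With the file from the question, process would be called once per input string, for example on "aba" and "bbabab". No switch statement is needed because the transitions live in the map.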

How can I search a string in a very big file with a specific format in java? [duplicate]

Closed 12 years ago.
Possible Duplicate:
do searching in a very big ARPA file in a very short time in java
my file's format:
\data\
ngram 1=19
ngram 2=234
ngram 3=1013
\1-grams:
-1.7132 puluh -3.8008
-1.9782 satu -3.8368
\2-grams:
-1.5403 dalam dua -1.0560
-3.1626 dalam ini 0.0000
\3-grams:
-1.8726 itu dan tiga
-1.9654 itu dan untuk
\end\
As you can see I have a number of lines in ngram 1,2 and 3. There is no need to read the whole file. If an input string is a one-word string, the program can just search in \1-grams: part. If an input string is a two-word string, the program can just search in \2-grams: part and so on. At last if the program finds the input string in the file, it has to return two numbers which are located at the left and right sides of the string. Also, I have to say that each part of the file has been sorted. I am sure that I do not have to read the file completely, and using the index file can not solve my problem. These ways take a lot of time, and my lecturer said that searching has to be done in less than 1 minute for such a big file. I think the best thing is to find a way to jump to a specific line not byte of the file, but I do not know how I can do it. It will be great if someone can help me to solve my problem.
My file is almost 800MB. I have found that using BufferedReader is a good way to read a file very fast, but when I read such a big file and put it in an array line by line, it takes more than 30 minutes.
How big is your file? A minute is a very long time. I would suggest using a BufferedReader for efficiency (and also for its readLine method).
If that really takes too long, two approaches come to mind that don't use indexes:
Force every line in the file to be the same length. Then you can jump to a specific line by calculating its start. If you don't know the line number you need, then at least you can use this to efficiently do a binary search of the entire file.
Jump to an arbitrary position and read forward until you get to a line that starts with a \. That will tell you whether you've found the right part or whether you need to jump forward from there or backward from the arbitrary position that you jumped to. This can also be used to create a binary search strategy for the data you need. It relies on the \ being a reliable indicator of the start of a part.
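The jump-and-read-forward idea can be turned into a binary search over byte offsets with RandomAccessFile. The sketch below searches one part of the file only. It assumes the byte range [lo, hi) of that part is already known, that lo is the start of a line, that the part contains no blank lines, and that its lines are sorted by their words in String.compareTo order. The names SortedFileSearch, keyOf and search are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class SortedFileSearch {
    // Extracts the n words after the leading number, e.g. "dalam dua" for n = 2.
    static String keyOf(String line, int n) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < n + 1) return "";
        return String.join(" ", Arrays.copyOfRange(tokens, 1, n + 1));
    }

    // Returns the full line whose words equal key, or null if there is none.
    static String search(RandomAccessFile file, long lo, long hi, String key, int n)
            throws IOException {
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (mid == lo) {
                file.seek(lo);                 // lo is always the start of a line
            } else {
                file.seek(mid - 1);
                file.readLine();               // skip to the first line starting at or after mid
            }
            long start = file.getFilePointer();
            String line = (start < hi) ? file.readLine() : null;
            if (line == null) {                // no line starts in [mid, hi)
                hi = mid;
                continue;
            }
            int cmp = keyOf(line, n).compareTo(key);
            if (cmp == 0) return line;
            if (cmp < 0) {
                lo = file.getFilePointer();    // continue after this line
            } else {
                hi = mid;                      // the match, if any, starts before mid
            }
        }
        return null;
    }
}
```

Only a logarithmic number of lines is read, so the whole file never has to be loaded. The two numbers the question asks for can then be taken from the first and last tokens of the returned line. Note that RandomAccessFile.readLine decodes bytes as single characters, which is fine for ASCII data like the sample.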
