Parsing CSV-like data for use with JDBC

How would I generate the output in Java with regex, split, or tokenizing? I need to remove the spaces around words and the commas between the words. I want to store this output and then jdbc will parse this data into the tables of my MySQL database.

Since this is homework, I'm only going to give you HINTS. You then need to go away and try to code this yourself:
> How would I generate the output
The simple way is to use String concatenation and System.out.println(...);
> ... if I have to do in java with Regex or split or tokenizer?
Any of them will work. Some ways are simpler than others. You figure out which. (It is important that you figure this out for yourself!)
> I need to remove the " "around the words
If you need to do this after splitting, look up String.trim().
> ... and , between the words.
You should already have done this with whatever it is you used to split the line into parts.
> I want to store this output and then jdbc will parse this data into the tables of my database on mysql.
Do you need to do this? Does the homework question expect you to do this?
If you do, check your lecture notes / textbook on how to use JDBC and SQL from Java. (I'm sure they won't have thrown you this question without giving you lots of material about this.)
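For readers who are not doing this as homework, here is a minimal sketch of the split-then-trim hint. The question's real sample input is not shown, so the input line, the class name CsvLineSplitter and the method name splitAndTrim are all assumptions made for illustration; it also assumes plain comma-separated fields with no quoted commas.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSplitter {
    // Splits one line on commas and trims the whitespace around each field.
    // Assumption: fields never contain commas themselves (no quoting rules).
    static List<String> splitAndTrim(String line) {
        List<String> fields = new ArrayList<>();
        // The -1 limit keeps trailing empty fields instead of dropping them.
        for (String part : line.split(",", -1)) {
            fields.add(part.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical input line; the original question's data was not shown.
        System.out.println(splitAndTrim("  alpha , beta,gamma  "));
    }
}
```

Real CSV with quoted fields needs more than split(); a dedicated CSV library is the safer choice there.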

Related

Regex for IP and string

I'm using this online regex test site.
Here is the regex I'm using:
\{"ip":"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$","iphone":"admin/ios","dev":\{"action":"CUS","from":"REG","CUSA":"ADVERT"\}\}
And I'm trying to match it against:
{"ip":"192.168.50.5","iphone":"admin/ios","dev":{"action":"CUS","from":"REG","CUSA":"ADVERT"}}
When I run the test, it doesn't match. I need it to match on the site above for validation reasons.

A different perspective: it already seems pretty hard to come up with a regex that works for you in the first place. What does this tell you about how hard it will be to maintain this regex in the future, and maybe extend it?
What I am saying is: regexes are a good tool, but sometimes overrated. This looks like a string in JSON format. Wouldn't it be better to just treat it as that and use a garden-variety JSON parser instead of trying to build your own regex?
You see, which will be more robust over time: your home-baked regex, or some standard library that millions of people are using?
One place to read about JSON parsers would be this question here.

This will be enough for your case:
"ip":"(\d+)\.(\d+)\.(\d+)\.(\d+)"
Edit:
Regex is not meant for structured data processing; most of the time you just need a solution that works. When the sample data changes and no longer matches, you update the regex to match it again.
Since you want to get the four numbers inside a pair of quotes after a key called "ip", this regex will do it.
If you want something else, please provide more context. Thanks!
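In Java, the short pattern can be tried out roughly like this. It is only a sketch: the class name IpExtractor and the method name extractOctets are made up for illustration, and the dots are written as \. so they match only a literal dot. Note that \d+ does not range-check the octets, and that the $ after the fourth octet group in the question's pattern asserts end of input, which is very likely why that pattern never matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpExtractor {
    // Matches "ip":"a.b.c.d" and captures the four numeric parts.
    private static final Pattern IP_FIELD =
            Pattern.compile("\"ip\":\"(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\"");

    // Returns the four octets as strings, or null if no "ip" field is found.
    static String[] extractOctets(String json) {
        Matcher m = IP_FIELD.matcher(json);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String sample = "{\"ip\":\"192.168.50.5\",\"iphone\":\"admin/ios\"}";
        System.out.println(String.join(".", extractOctets(sample)));
    }
}
```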

Is there a tool to take a block of text and turn it into Java StringBuffer code?

I have several blocks of text that I need to be able to paste inline in my code for some unit tests. It would make the code difficult to read if they were externalized, so is there some web tool where I can paste in my text and have it generate the code for a StringBuffer that preserves its formatting? Or even a String; I'm not that picky at this point.
It seems like a code generator like this must exist somewhere on the web. I tried to Google one, but I have yet to come up with a set of search terms that doesn't fill my results with Java examples and documentation.
I suppose I could write one myself, but I'm in a bit of a time crunch and would rather not duplicate effort.

If I understood it correctly, any text editor that supports regexps should make this an easy task. For instance, in Notepad++, just replace ^(.+)$ with "\1"+, then copy the result into the code, remove the last + and add String s = to the beginning :)
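If you would rather do the same thing in code, a small generator could look something like the sketch below. The class name StringLiteralGenerator and the method name toJavaLiteral are hypothetical. Unlike the editor replacement, it also escapes backslashes and double quotes and keeps the line breaks as \n.

```java
class StringLiteralGenerator {
    // Turns a block of text into Java source for a concatenated String literal.
    static String toJavaLiteral(String text) {
        // Accept both Unix and Windows line endings; -1 keeps trailing empty lines.
        String[] lines = text.split("\r?\n", -1);
        StringBuilder out = new StringBuilder("String s =\n");
        for (int i = 0; i < lines.length; i++) {
            // Escape backslashes first, then double quotes.
            String escaped = lines[i].replace("\\", "\\\\").replace("\"", "\\\"");
            boolean last = (i == lines.length - 1);
            out.append("    \"").append(escaped);
            if (!last) {
                out.append("\\n"); // keep the original line break inside the literal
            }
            out.append("\"").append(last ? ";" : " +").append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toJavaLiteral("first line\nsecond \"quoted\" line"));
    }
}
```

On Java 15 or later, a text block (""" ... """) lets you paste such text directly, which makes a generator like this unnecessary.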

If you want to externalize it, use a properties file or something like that to read the text.
If you are looking for a simple tool to break up your text into concatenated strings joined together by a StringBuffer, most modern IDEs will do it for you automatically. Here's how:
Copy the block of text in the IDE
Surround it in double quotes and assign to a String type variable. (This step may not be required)
Enter carriage returns wherever you want to wrap the text to the next line, and the IDE will automatically break the literal, close and reopen the double quotes, and join the pieces with +
The compiler folds a concatenation of string literals such as "addas" + "addasfdas" into a single String constant at compile time, so splitting the text this way has no runtime cost.

The SQuirreL SQL client has a function called "convert to string buffer"; it works nicely.

How to escape special characters used in SQL query?

Is there a Java library for escaping special characters in a string that is going to be inserted into an SQL query?
I keep writing code to escape various things, but I keep finding some new issue that trips me up. So a library that takes care of all or most of the possibilities would be very handy.
EDIT: I am using MySQL (if that makes any difference).

Well... JDBC. Pass the strings as parameters (with a PreparedStatement), and don't append them to the query string.
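A minimal sketch of what that looks like with the standard java.sql API. The table name users, its columns, the class name ParameterizedInsert and the method name insertUser are all hypothetical, and the Connection is assumed to come from your existing MySQL setup.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ParameterizedInsert {
    // Hypothetical table and columns, used only for illustration.
    static final String INSERT_SQL = "INSERT INTO users (name, comment) VALUES (?, ?)";

    // Binds the values as parameters, so no manual escaping of quotes
    // or other special characters is needed.
    static int insertUser(Connection conn, String name, String comment) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, name);    // first ? placeholder
            ps.setString(2, comment); // second ? placeholder
            return ps.executeUpdate(); // number of rows inserted
        }
    }
}
```

The values never become part of the SQL text, so a name like O'Brien needs no special handling.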

A little more research points me to this:
http://devwar.blogspot.com/2010/06/how-to-escape-special-characters-in.html
It suggests using org.apache.commons.lang.StringEscapeUtils. I will try this out.

I know this is an old thread, but Commons Lang 2.x has a method called StringEscapeUtils.escapeSql(String) (it was removed in Commons Lang 3.0). Also, using a prepared statement automatically escapes the offending SQL characters.

Sentence Auto-Complete with Java

Let's say I have about 1000 sentences that I want to offer as suggestions when the user is typing into a field.
I was thinking about running an in-memory Lucene search and then feeding the results into the suggestion set.
The triggers for running the searches would be a space character and leaving the input field.
I intend to use this with GWT, so the client will just be getting the results from the server.
I don't want to do what Google is doing, where they complete each word and then make suggestions on each set of keywords. I just want to check the keywords and make suggestions based on that, sort of like when I'm typing the title for a question here on Stack Overflow.
Has anyone done something like this before? Is there already a library I could use?

I was working on a similar solution. This paper, titled Effective Phrase Prediction, was quite helpful for me. You will have to prioritize the suggestions as well.

If you've only got 1000 sentences, you probably don't need a powerful indexer like Lucene. I'm not sure whether you want to do "complete the sentence" suggestions or "suggest other queries that have the same keywords" suggestions. Here are solutions to both:
Assuming that you want to complete the sentence input by the user, then you could put all of your strings into a SortedSet, and use the tailSet method to get a list of strings that are "greater" than the input string (since the string comparator considers a longer string A that starts with string B to be "greater" than B). Then, iterate over the top few entries of the set returned by tailSet to create a set of strings where the first inputString.length() characters match the input string. You can stop iterating as soon as the first inputString.length() characters don't match the input string.
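A rough sketch of the tailSet idea, using a TreeSet as the SortedSet. The class name PrefixCompleter and its methods are made up for illustration, and everything is lower-cased on the way in, which is an assumption about what your application wants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class PrefixCompleter {
    private final TreeSet<String> sentences = new TreeSet<>();

    void add(String sentence) {
        sentences.add(sentence.toLowerCase(Locale.ROOT));
    }

    // Returns up to `limit` stored sentences that start with the given prefix.
    List<String> complete(String prefix, int limit) {
        String key = prefix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // tailSet(key) is a sorted view of every entry >= key, so all
        // sentences sharing the prefix come first and are contiguous.
        for (String candidate : sentences.tailSet(key)) {
            if (!candidate.startsWith(key) || matches.size() >= limit) {
                break; // past the block of matching entries, or enough results
            }
            matches.add(candidate);
        }
        return matches;
    }
}
```

Because the loop stops at the first non-matching entry, the work per lookup is the tree search plus the number of suggestions returned.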
If you want to do keyword suggestions instead of "complete the sentence" suggestions, then the overhead depends on how long your sentences are, and how many unique words there are in the sentences. If this set is small enough, you'll be able to get away with a HashMap<String,Set<String>> that maps keywords to the sentences that contain them. Then you can handle multiword queries by intersecting the sets.
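The keyword variant might be sketched like this. KeywordSuggester, add, suggest and tokenize are hypothetical names, and splitting words on non-word characters is an assumption about what counts as a keyword.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class KeywordSuggester {
    // Maps each lower-cased keyword to the sentences that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    void add(String sentence) {
        for (String word : tokenize(sentence)) {
            index.computeIfAbsent(word, k -> new HashSet<>()).add(sentence);
        }
    }

    // Returns the sentences that contain every keyword in the query.
    Set<String> suggest(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> hits = index.getOrDefault(word, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits); // copy so the index is not modified
            } else {
                result.retainAll(hits); // set intersection
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}
```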
In both cases, I'd probably convert all strings to lower case first (assuming that's appropriate in your application). I don't think either solution would scale to hundreds of thousands of suggestions either. Do either of those do what you want? Happy to provide code if you'd like it.

Parsing of data structure in a plain text file

How would you parse a structure similar to this in Java?
\\Header (name)\\\
1JohnRide 2MarySwanson
1 password1
2 password2
\\\1 block of data name\\\
1.ABCD
2.FEGH
3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc
Suppose it comes from a text buffer (a plain file).
Each line of text is terminated by "\n". Spaces separate the words.
The structure is more or less defined. There may sometimes be ambiguity, though: the
number of fields in each line of information may differ, a block of data may
sometimes be missing, and the number of lines in each block may vary as well.
The question is how to do this most effectively.
The first solution that comes to mind is to use regular expressions.
But are there other solutions? Something problem-oriented? Maybe some Java library that has already been written?

Check out UTAH: https://github.com/sonalake/utah-parser
It's a tool that's pretty good at parsing this kind of semi-structured text.

As no one has recommended a library, my suggestion would be: use regex.

From what you have posted it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first char of a token to see if it is a digit (in which case the part of the token after the digit(s) will be the data, if there is any).
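A minimal sketch of that idea with a Scanner, assuming tokens such as 1JohnRide where a run of digits is followed directly by the data. The class name TokenSplitter and the method name numberedTokens are made up for illustration, and tokens that do not start with a digit are simply skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenSplitter {
    // Splits each whitespace-delimited token that starts with digits into
    // { number, remaining data }. Other tokens are ignored in this sketch.
    static List<String[]> numberedTokens(String text) {
        List<String[]> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                int i = 0;
                // Count the leading digits of the token.
                while (i < token.length() && Character.isDigit(token.charAt(i))) {
                    i++;
                }
                if (i > 0) {
                    result.add(new String[] { token.substring(0, i), token.substring(i) });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] pair : numberedTokens("1JohnRide 2MarySwanson")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```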

This sounds like a homework problem, so I'm going to try to answer it in a way that helps guide you (not give the final solution).
First, you need to consider each object of data you're reading. Is it a number then a text field? A number then 3 text fields? Variable numbers and text fields?
After that you need to determine what you're going to use to delimit each field and each object. For example, in many files you'll see something like a semi-colon between the fields and a new line for the end of the object. From what you said it sounds like yours is different.
If an object can go across multiple lines you'll need to bear that in mind (don't stop partway through an object).
Hopefully that helps. If you research this and you're still having problems post the code you've got so far and some sample data and I'll help you to solve your problems (I'll teach you to fish....not give you fish :-) ).

If the fields are fixed length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write yourself a state machine which knows what kind of line to expect next, given what it's already seen. Once you have each line as a string, then you just need to split the data appropriately.
E.g., the password can be taken from a password line such as "1 password1" like this:
// Everything after the first space is the password.
final int pos = line.indexOf(' ');
final String passwd = line.substring(pos + 1);
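The line-based state machine could be sketched roughly as follows. It assumes that a header line is any line starting with a backslash and that every other non-empty line belongs to the most recent header; the class name BlockParser and both method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockParser {
    // Groups data lines under the most recent header line, in file order.
    static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null; // state: the block currently being filled
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("\\")) {
                // Header line: strip the backslashes and start a new block.
                String name = line.replace("\\", "").trim();
                current = new ArrayList<>();
                blocks.put(name, current);
            } else if (current != null) {
                current.add(line);
            }
            // Data lines before the first header are ignored in this sketch.
        }
        return blocks;
    }

    // Convenience wrapper for text that is already in memory.
    static Map<String, List<String>> parseString(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each block's lines can then be split further with whatever rule fits that block, for example the indexOf approach for the password lines.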
