Reading and writing text in specific locations inside a .txt file

I am very confused by the number of different read and write methods in Java and couldn't find one that fits my needs.
I have a text file that looks like this:
CharName{
Sheet{
Vitality{
Short=LP,Name=Life,Base=1000,Equip=0,Buff=0;
Short=LPR,Name=Life regen per sec,Base=50,Equip=0,Buff=0;
}
Magic{
Short=MP,Name=Mana,Base=100,Equip=0,Buff=0;
Short=MPR,Name=Mana regen per sec,Base=5,Equip=0,Buff=0;
}
}
}
It's actually a lot bigger, but that much is sufficient to show what I mean.
I want to make a function that basically looks something like getInfoFromCharFile(sectionName, categoryName, rowNum, tagName).
Example: getInfoFromCharFile(Sheet, Vitality, 2, Base) should give me 50. I also want the same for writing: saveInfoToCharFile(sectionName, categoryName, rowNum, tagName, newValue).
On top of that, when one of those files gets read, I want an easy way of 'loading it in' (reading everything systematically), something that goes like: find Sheet; find Vitality; get everything between the next { and }; split at ;; find Magic; get everything between the next { and }; and so on and so forth.
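A minimal sketch of those find-and-split steps using java.util.regex, under stated assumptions: category blocks contain no nested braces, each category name occurs only once in the file (the enclosing section is not checked), and values contain no ',', ';' or '='. CategoryRegex, findCategory and parseRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryRegex {
    // One tag=value pair; a value runs until the next ',' or ';'.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^,;]*)");

    // Returns the text between "<category>{" and the next "}", or null.
    // Assumption: only innermost blocks (no nested braces) can be found this way,
    // and the category name is unique in the file.
    static String findCategory(String text, String category) {
        Pattern block = Pattern.compile("\\b" + Pattern.quote(category) + "\\s*\\{([^{}]*)\\}");
        Matcher m = block.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Splits a category body at ';' and each row into ordered tag -> value pairs.
    static List<Map<String, String>> parseRows(String body) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String row : body.split(";")) {
            if (row.trim().isEmpty()) continue; // whitespace after the last ';'
            Map<String, String> tags = new LinkedHashMap<>();
            Matcher m = PAIR.matcher(row);
            while (m.find()) tags.put(m.group(1), m.group(2).trim());
            rows.add(tags);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "CharName{\nSheet{\nVitality{\n"
                + "Short=LP,Name=Life,Base=1000,Equip=0,Buff=0;\n"
                + "Short=LPR,Name=Life regen per sec,Base=50,Equip=0,Buff=0;\n"
                + "}\n}\n}\n";
        List<Map<String, String>> rows = parseRows(findCategory(text, "Vitality"));
        System.out.println(rows.get(1).get("Base")); // row 2 (1-based): prints 50
    }
}
```

Because the regex only looks for the category name, this cannot tell Sheet/Magic from a Magic block in another section; a structure-aware parser is needed for that.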
I have an idea how to do the splitting and processing, but how can I get it to 'load' everything between the braces?
Note: I did read, watch and Google a lot about this, but couldn't find anything that gave me an idea of how to solve my problem. I've been stuck on this problem for weeks now! I used to just use properties files, but that got really ugly really fast.
I would show you the code that 'loads' things in that old solution, but the last time I had to edit it, it literally took me hours to understand it again. So it's not very good code, I guess, and that's why I want to replace it with something like this.

I see two ways to do this:
The hard way:
You insist on using your custom format. In that case, you will probably have to use regular expressions to read your files, be prepared to handle malformed input, etc.
You can use Matcher or Scanner to do the matching. If you feel adventurous, you can do state-based parsing, by reading the chars one by one.
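A sketch of the state-based approach applied to the question's format, assuming exactly three levels (character, section, category) and no braces, semicolons, commas or equals signs inside names or values. CharFile, getInfo and setInfo are hypothetical names standing in for getInfoFromCharFile/saveInfoToCharFile; rows are 1-based, so getInfo("Sheet", "Vitality", 2, "Base") returns "50".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CharFile {
    private String root = ""; // outermost block name, e.g. "CharName"
    // section -> category -> rows; each row is an ordered tag -> value map
    private final Map<String, Map<String, List<Map<String, String>>>> data = new LinkedHashMap<>();

    // Reads one char at a time: '{' opens a block named by the text before it,
    // '}' closes the innermost block, ';' ends a row inside a category.
    public static CharFile parse(String text) {
        CharFile file = new CharFile();
        List<String> path = new ArrayList<>(); // open blocks, outermost first
        StringBuilder buf = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '{') {
                path.add(buf.toString().trim());
                if (path.size() > 3) throw new IllegalArgumentException("nested too deeply");
                if (path.size() == 1) file.root = path.get(0);
                if (path.size() == 3) file.rowsOf(path.get(1), path.get(2)); // keep empty categories
                buf.setLength(0);
            } else if (c == '}') {
                if (path.isEmpty()) throw new IllegalArgumentException("unbalanced '}'");
                if (!buf.toString().trim().isEmpty()) throw new IllegalArgumentException("row without ';'");
                path.remove(path.size() - 1);
                buf.setLength(0);
            } else if (c == ';') {
                if (path.size() != 3) throw new IllegalArgumentException("row outside a category");
                file.rowsOf(path.get(1), path.get(2)).add(parseRow(buf.toString()));
                buf.setLength(0);
            } else {
                buf.append(c);
            }
        }
        if (!path.isEmpty()) throw new IllegalArgumentException("missing '}'");
        return file;
    }

    // "Short=LP,Name=Life,..." -> {Short=LP, Name=Life, ...}
    private static Map<String, String> parseRow(String row) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String pair : row.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) tags.put(kv[0].trim(), kv[1].trim());
        }
        return tags;
    }

    private List<Map<String, String>> rowsOf(String section, String category) {
        return data.computeIfAbsent(section, k -> new LinkedHashMap<>())
                   .computeIfAbsent(category, k -> new ArrayList<>());
    }

    // The row with the given 1-based number, or null if it does not exist.
    private Map<String, String> row(String section, String category, int rowNum) {
        Map<String, List<Map<String, String>>> cats = data.get(section);
        List<Map<String, String>> rows = cats == null ? null : cats.get(category);
        return rows == null || rowNum < 1 || rowNum > rows.size() ? null : rows.get(rowNum - 1);
    }

    public String getInfo(String section, String category, int rowNum, String tag) {
        Map<String, String> r = row(section, category, rowNum);
        return r == null ? null : r.get(tag);
    }

    public void setInfo(String section, String category, int rowNum, String tag, String value) {
        Map<String, String> r = row(section, category, rowNum);
        if (r == null) throw new IllegalArgumentException("no such row");
        r.put(tag, value);
    }

    // Writes everything back in the same brace format, one item per line.
    public String toText() {
        StringBuilder out = new StringBuilder(root).append("{\n");
        for (Map.Entry<String, Map<String, List<Map<String, String>>>> s : data.entrySet()) {
            out.append(s.getKey()).append("{\n");
            for (Map.Entry<String, List<Map<String, String>>> c : s.getValue().entrySet()) {
                out.append(c.getKey()).append("{\n");
                for (Map<String, String> row : c.getValue()) {
                    List<String> pairs = new ArrayList<>();
                    row.forEach((k, v) -> pairs.add(k + "=" + v));
                    out.append(String.join(",", pairs)).append(";\n");
                }
                out.append("}\n");
            }
            out.append("}\n");
        }
        return out.append("}\n").toString();
    }
}
```

For real files, something like CharFile.parse(Files.readString(path)) for loading and Files.writeString(path, file.toText()) for saving should work (both methods need Java 11+). Rewriting the whole file is much simpler than patching one value in place, because the new value can be longer or shorter than the old one; the trade-off is that any custom indentation in the file is not preserved.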
I actually did something like that a while ago, and maybe you can take inspiration from the source... or even use it; last time I checked, it worked (especially the JONReader could be useful).
The easy way:
Use a standard format like XML, JSON (your format is already quite close to that anyway!), or even some CSV stuff. There are a lot of solid parsers for those out there, and it might become as easy as "charname.getMagic()", without worrying about the details.
If you decide to use JSON, which seems the most fitting choice for your approach, I suggest having a look at minimal-json or Jackson, depending on how you want to work with it.

Related

deeplearning4j generate response to input

I have recently been trying to learn DL4J but have run into some issues. They have an example of a neural network generating Shakespeare-like text based off an input character, but I can't seem to find anything that would indicate a possible way of creating a response to an input statement.
I would like to use an input string such as "Hello" and have the network generate a response of varying length depending on the input. I would like to know whether this is possible using an LSTM, and to get a pointer in the right direction, as I have no idea where to even start.
We actually have plenty of documentation on this. This page gives you a layout of what an RNN looks like:
http://deeplearning4j.org/usingrnns
The model you would be looking at there is character-level; in general, though, what you want is question answering. You may want to look at an architecture like this one: https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf
If you are completely new to NLP, I would look at this class:
https://www.youtube.com/playlist?list=PLhVhwi0Pz282aSA2uZX4jR3SkF3BKyMOK
It covers question answering as well.

Insert pictures to .docx in java

I have seen countless posts about how to insert pictures or images into Word documents using Apache POI. There is a major problem with this: I cannot say where exactly to put them. Of course you can put them into a certain segment using the alignment option, but this seems a little bit lackluster to me. What I would like to have is something along the lines of
insert(xcoord, ycoord, width, length, pictureName);
but I have not seen this being done anywhere. Is there already an elegant approach to implementing this?
Because another issue with this is the following: yes, of course I could take the Microsoft XML code and parse this XML with the wanted data, but that is kind of a pain in the arse and not efficient at all. So yeah, that's that.
Greetings, newbie

Possible alternatives or solution to reading and writing large objects with Gson java

I'm attempting to read and write an object through Gson. Early in the project this was completely viable and worked great, but as I wrote more data for that object, I eventually ran into something along the lines of this:
I can't seem to grab the full stack trace, since it overflows my console within milliseconds, but I've pastebinned everything my console could grab: http://pastebin.com/v36d5qua
If there is a solution to this, or possibly just a better API for this purpose, I would really appreciate some advice.
Current usage: http://pastebin.com/2Yk2v0Tm
GsonUtil.save(player, Player.class, new File("./resources/players/"+player.getId()+".json"));
P.S. I'm new to Java and to this site in general; if I have misleading tags, title, etc., please let me know.
Don't use Gson. It's slow, it's buggy, and it's inconvenient to use. Just use org.json: http://theoryapp.com/parse-json-in-java/

Is it possible to do this type of search in Java

I am stuck on a project at work that I do not think is really possible and I am wondering if someone can confirm my belief that it isn't possible or at least give me new options to look at.
We are doing a project for a client that involves a mass download of files from a server (easily done with ftp4j and a document name list), but now we need to sort through the data from the server. The client does work in contracts and wants us to pull out relevant information such as: Licensor, Licensee, Product, Agreement date, termination date, royalties, restrictions.
Since the documents are completely unstandardized, is that even possible to do? I can imagine loading in the files and searching them, but I would have no idea how to pull information such as the licensor and the restrictions on the agreement out of a paragraph. These are not hashes but just long contracts. Even if I were to search for 'Licensor', it would come up in the document multiple times. The documents aren't even in a consistent file format: some are PDF, some are text, some are HTML, and I've even seen some that were as bad as a scanned image in a PDF.
My boss keeps pushing me to work on this project, but I feel as if I am out of options. I primarily do web and mobile, so big data is really not my strong area. Does this sound possible to do in a reasonable amount of time? (We're talking about 1000 documents at the very minimum.) I have been working on this in Java.
I'll do my best to give you some information, as this is not my area of expertise. I would highly consider writing a script that identifies the type of file you are dealing with, and then calls the appropriate parsing methods to handle what you are looking for.
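A small Java sketch of the "identify the file type, then dispatch" step, assuming the type can be guessed from the file extension alone (which cannot tell a text PDF from a scanned one that needs OCR). ContractRouter, DocType and classify are hypothetical names, and the actual extraction for each type is left out.

```java
import java.util.Locale;

public class ContractRouter {
    // Hypothetical categories; each would be handed to a different extractor
    // (a PDF library, an HTML parser, or an OCR API).
    enum DocType { PDF, HTML, TEXT, UNKNOWN }

    // Guesses the type from the extension only, case-insensitively.
    static DocType classify(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        switch (ext) {
            case "pdf":
                return DocType.PDF;
            case "htm":
            case "html":
                return DocType.HTML;
            case "txt":
                return DocType.TEXT;
            default:
                return DocType.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        for (String f : new String[] {"deal-2014.PDF", "terms.html", "notes.txt", "scan.tiff"}) {
            System.out.println(f + " -> " + classify(f));
        }
    }
}
```

Files that come back as UNKNOWN (such as image scans) are a natural queue for manual review or an OCR step.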
Since you are dealing with big data, Python could be pretty useful. JavaScript would be my next choice.
If your overall code is written in Java, it should be very portable and flexible no matter which one you choose. Using a regex or a specific string search would be a good way to approach this.
If you are concerned only with "Licensor" followed by a name, you could identify the format of that particular instance and search for similar text using a regex you create. This can be extended to the other fields you need to search for.
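A minimal sketch of that regex idea in Java, assuming the contracts label the party as "Licensor:" with the name following on the same line; real contracts will need one pattern per observed layout. LicensorFinder and findLicensors are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LicensorFinder {
    // Assumed layout: "Licensor", a colon, then the name, which runs until
    // the end of the line, a comma or a semicolon.
    private static final Pattern LICENSOR = Pattern.compile("Licensor\\s*:\\s*([^\\r\\n,;]+)");

    // Returns every name found; a plain mention such as "the Licensor shall ..."
    // has no colon after the label and is therefore skipped.
    static List<String> findLicensors(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = LICENSOR.matcher(text);
        while (m.find()) names.add(m.group(1).trim());
        return names;
    }

    public static void main(String[] args) {
        String contract = "LICENSE AGREEMENT\n"
                + "Licensor: Acme Corp\n"
                + "Licensee: Example Ltd\n"
                + "The Licensor shall deliver the Product within 30 days.\n";
        System.out.println(findLicensors(contract)); // prints [Acme Corp]
    }
}
```

Requiring the colon is what separates the defining occurrence from the many later mentions of "Licensor" in the body text.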
For getting text from an image, try the APIs on these pages:
How to read images using Java API?
Scanned Image to Readable Text
For text from a PDF:
https://www.idrsolutions.com/how-to-search-a-pdf-file-for-text/
Also, once the text has been extracted from a PDF, you should most likely be able to search through it using a regex. That would be my method of attack, or possibly using String.split() and building up a string buffer that you append the relevant pieces to.
For text from HTML doc:
Here is a cool HTML parser library: http://jericho.htmlparser.net/docs/index.html
A resource that teaches how to remove HTML tags and get the good stuff: http://www.rgagnon.com/javadetails/java-0424.html
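If you want to avoid a library for simple pages, a naive regex-based tag stripper can be sketched as follows. It is fragile (for example with '>' inside attribute values or malformed markup), which is why a real parser such as the Jericho library above is the safer choice; TagStripper and stripTags are hypothetical names, and this is not necessarily the technique the linked resource uses.

```java
public class TagStripper {
    // Naive: removes <script>/<style> blocks, then any <...> tag, decodes a few
    // common entities and collapses whitespace. Not a real HTML parser.
    static String stripTags(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>", " ") // drop script/style bodies
                .replaceAll("<[^>]*>", " ")                                   // drop remaining tags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");                                       // decode &amp; last
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style></head>"
                + "<body><p>Licensor: Acme &amp; Sons</p></body></html>";
        System.out.println(stripTags(html)); // prints Licensor: Acme & Sons
    }
}
```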
If you need anything else, let me know. I'll do my best to find it!
Apache Tika can extract plain text from almost any commonly used file format.
But with the situation you describe, you would still need to analyze the text, as in "natural language recognition". That's a field where, despite the advances that have been made (by dedicated research teams spending many person-years!), computers still fail pretty badly (heck, even humans fail at it sometimes).
With the number of documents you mentioned (thousands), hire a temp worker and have them sorted/tagged by human brain power. It will be cheaper, and you will have fewer misclassifications.
You can use Tika for text extraction. If there is a fixed pattern, you can extract information using regex or XPath queries. Another solution is to use Solr, as shown in this video. You don't need Solr, but watch the video to get the idea.

Search for commented-out code across files in Eclipse

Is there a quick way to find all the commented-out code across Java files in Eclipse?
Any option in Search, perhaps, or any add-on that can do this?
It should be able to find only code which is commented out, but not ordinary comments.
In Eclipse, I just do a file search with the regular expression checkbox turned on:
(/\*.*;.*\*/)|(//.*;)
It will find semicolons in
// These;
and /* these; */
Works for me.
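Outside Eclipse, the same regular expression can be run with java.util.regex over a source tree. This sketch checks each line separately (as '.' does not cross line breaks) and assumes UTF-8 source files, since Files.readAllLines throws on other encodings; the class name and the command-line argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CommentedCodeSearch {
    // The regex from the Eclipse search: a one-line /* ... */ or // comment containing ';'.
    static final Pattern COMMENTED_CODE = Pattern.compile("(/\\*.*;.*\\*/)|(//.*;)");

    // 1-based numbers of the lines that contain a match.
    static List<Integer> matchingLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (COMMENTED_CODE.matcher(lines.get(i)).find()) hits.add(i + 1);
        }
        return hits;
    }

    // Usage: java CommentedCodeSearch <source-root>   (defaults to the current directory)
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        for (Path file : files) {
            for (int n : matchingLines(Files.readAllLines(file))) {
                System.out.println(file + ":" + n);
            }
        }
    }
}
```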
Sonar can do it: http://www.sonarsource.org/commented-out-code-eradication-with-sonar/
You can mark your own commented code with a task tag. You can create your own task tags in Eclipse.
From the menu, go to Window -> Preferences. In the Preferences dialog, go to General -> Editors -> Structured Text Editors -> Task Tags.
Add an appropriate task tag, like COMMENTED. Set the priority to Low.
Then, any code you comment out, you can mark with the COMMENTED task tag. A list of these task tags, along with their locations, appears in the Tasks view.
Jorn said:
I think [the OP] wants to find code that is commented out, not code that has a comment.
If the intention is to find commented out code, then I don't think it is possible in general. The problem is that it is impossible to distinguish between comments that were written as code or pseudo-code, and code that is commented out. Making that distinction requires human intelligence.
Now, IDEs typically have a "toggle comments" function that comments out code in a particular way. It would be feasible to write a tool / plugin that matches the style produced by a particular IDE. But that's probably not good enough, especially since reformatting the code typically gets rid of the characteristics that made the commented-out code recognizable.
If the problem is to find commented-out code, what is needed is a way to find comments, and a way to decide if a comment might contain code.
A simple way to do this is to search for comments that contain code-like things. I'd be tempted to hunt for comments containing a ";" character (or some other rare indicator such as "="); it will be pretty hard to have any interesting commented code that doesn't contain this, and in my experience people rarely write ordinary comments that contain it. A regexp search for this should be pretty straightforward, even if it picks up a few additional false positives (e.g. // in a string literal).
A more sophisticated way to accomplish this is to use a Java lexer or parser. If you have a lexer that returns comments as tokens (not all of them do; Java compilers aren't interested in comments), then you can simply scan the lexemes for comments and do the semicolon check I described above. You won't get any false-positive hits for comment-like things in string literals with this approach.
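The lexer idea can be sketched with a tiny hand-written scanner instead of a real Java lexer: it skips string and char literals, collects comment bodies, and applies the ';' heuristic. It is an assumption-laden sketch (no text blocks, Unicode escapes or other corner cases of the Java lexical grammar), and CommentLexer, comments and likelyCommentedCode are hypothetical names. Unlike the line regex, it does not flag a string such as "http://example.com;".

```java
import java.util.ArrayList;
import java.util.List;

public class CommentLexer {
    // Walks the source once, skipping string/char literals and
    // collecting the text of every // and /* */ comment.
    static List<String> comments(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {              // string or char literal
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++;   // skip the escaped char
                    i++;
                }
                i++;                                  // closing quote
            } else if (src.startsWith("//", i)) {     // line comment
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.add(src.substring(i + 2, end));
                i = end;
            } else if (src.startsWith("/*", i)) {     // block comment
                int end = src.indexOf("*/", i + 2);
                if (end < 0) end = n;                 // unterminated: take the rest
                out.add(src.substring(i + 2, end));
                i = end + 2;
            } else {
                i++;
            }
        }
        return out;
    }

    // The heuristic from the answer: a comment containing ';' probably holds code.
    static List<String> likelyCommentedCode(String src) {
        List<String> hits = new ArrayList<>();
        for (String comment : comments(src)) {
            if (comment.indexOf(';') >= 0) hits.add(comment.trim());
        }
        return hits;
    }

    public static void main(String[] args) {
        String src = "String url = \"http://example.com;\";\n"
                + "// foo();\n"
                + "/* A plain note. */\n";
        System.out.println(likelyCommentedCode(src)); // prints [foo();]
    }
}
```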
If you have a re-engineering parser that captures comments as part of the AST (such as our SD Java Front End), you can mechanically scan the parse tree for comments, feed the comment text back to the parser to see if the content is code-like, and report any that pass that test modulo some size-dependent error rate (10 errors in 15 characters implies "really is a comment"). The "code-like" test requires the re-engineering parser to be willing to recognize any substring of the (Java) language.
Our DMS Software Reengineering Toolkit, which underlies the Java Front End, can actually do that using access to the grammar buried in the front end, as it is willing to start a parse for any language (non)terminal; the question then becomes "can you find a sequence of (non)terminals that consumes the string?".
The lexer and parser approaches are small and big sledgehammers respectively. If OP is going to do this just once, he can stick to the manual regex search. If the problem is to vet the code base repeatedly (needed in big organizations), he'd want a tool that can be run on regular basis.
You can do a search in Eclipse.
All you need to search for is /* and //
However, you will only find the files which contain that expression, and not the actual content, which I believe is what you are after.
However, if you are using Linux, you can easily get all the comments with a one-liner.
