Best way to output Stanford NLP results [closed] - java

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Hi folks: I'm using the Stanford CoreNLP software to process hundreds of letters by different people (each about 10KB). After I get the output, I need to further process it and add information at the level of tokens, sentences, and letters. I'm quite new to NLP and was wondering what the typical or best way would be to output the pipeline results from Stanford CoreNLP to permit further processing?
I'm guessing the typical approach would be to output to XML. If I do, I estimate that will take about a GB of disk space, and I wonder, then, how quick and easy it would be to load that much XML back into Java for further processing and adding of information?
An alternative might be to have CoreNLP serialize the annotation objects it produces and load those back for processing. An advantage: not having to figure out how to convert a sentence parse string back into a tree for further processing. A disadvantage: annotation objects contain a lot of different types of objects I'm still quite rough on manipulating and the documentation on these in Stanford CoreNLP seems slim to me.

It really depends on what you want to do afterwards. Serialization is probably the most straightforward and fastest approach; the downside is that you need to understand the CoreNLP data structures.
If you want to read the output from another language, or load it into your own data structures, save it as XML.
I would go with the first option.
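To make the serialization option concrete, here is a minimal sketch of a save/load round trip with plain Java serialization. LetterRecord is a hypothetical stand-in for one processed letter, not a CoreNLP class; in real code you would pass the annotation object itself, assuming it implements java.io.Serializable. CoreNLP may also offer its own serializers, so check its documentation before settling on this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class AnnotationStore {
    // Hypothetical stand-in for one processed letter (not part of CoreNLP).
    static final class LetterRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String author;
        final List<String> tokens;

        LetterRecord(String author, List<String> tokens) {
            this.author = author;
            this.tokens = new ArrayList<>(tokens);
        }
    }

    // Serialize any Serializable object into a byte array.
    static byte[] save(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize object", e);
        }
        return bytes.toByteArray();
    }

    // Read the object back; the caller casts it to the expected type.
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize object", e);
        }
    }
}
```

For hundreds of letters you would write to a FileOutputStream (one file per letter, for instance) instead of a byte array; the round trip is otherwise the same.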

Related

Java versus XML&Lua for storing voxel/block types [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
So, I want it to be very easy to create all the entities of my game, and for other people to come in and do the same. I was thinking I could just let the users/myself create an XML sheet that stores all the properties of each block (like a Terraria or Minecraft voxel) and add Lua scripts that are referenced in the XML for additional functionality of any of the blocks.
I'm starting to think it would just be easier to let the user create a JAR file full of classes, one for each block. That JAR file could then easily be used to get all the blocks. It would be interesting to reference all the blocks by a block ID without storing all the classes by ID. Or I could give each class a static ID. But that's not important.
Okay, so my short question is: what are the pros and cons of storing all the different types of blocks as classes versus in an XML sheet with Lua for additional functionality?
UPDATE: It looks like I'll be using pure Lua! Looks like an interesting and effective way to do it!
A limitation of the JAR approach is that your data would need to be compiled before it got used. With XML/Lua the data gets read/interpreted at runtime.
A third option that you did not mention is using plain Lua tables instead of XML. This lets you load the data with a simple "require", "dofile" or similar, without also needing an XML library.
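To illustrate the "data read at runtime" side of the trade-off from the Java side, here is a rough sketch of a registry that looks blocks up by ID from a plain text description instead of compiled classes. The key format ("<id>.name", "<id>.solid") and the BlockRegistry and BlockType names are made up for this example; a real game would more likely read XML or Lua as discussed above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class BlockRegistry {
    // Plain data holder for one block type (hypothetical fields).
    static final class BlockType {
        final int id;
        final String name;
        final boolean solid;

        BlockType(int id, String name, boolean solid) {
            this.id = id;
            this.name = name;
            this.solid = solid;
        }
    }

    private final Map<Integer, BlockType> byId = new HashMap<>();

    // Parse text in java.util.Properties syntax, e.g. "1.name=stone" and "1.solid=true".
    static BlockRegistry parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        BlockRegistry registry = new BlockRegistry();
        for (String key : props.stringPropertyNames()) {
            if (!key.endsWith(".name")) {
                continue; // other keys are looked up per block below
            }
            int id = Integer.parseInt(key.substring(0, key.length() - ".name".length()));
            String name = props.getProperty(key);
            // Blocks default to solid when the property is missing.
            boolean solid = Boolean.parseBoolean(props.getProperty(id + ".solid", "true"));
            registry.byId.put(id, new BlockType(id, name, solid));
        }
        return registry;
    }

    // Returns the block type for the given ID, or null if it is unknown.
    BlockType get(int id) {
        return byId.get(id);
    }
}
```

Because the data is only parsed at startup, users can add or change blocks without recompiling anything, which is the limitation of the JAR approach mentioned above.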

Turn HTML into XML and parse it -- Android Apps [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have been learning how to build Android apps this summer. I am currently trying to work on XML parsing, which falls under Java in this case. I have a few questions that are mostly conceptual and one specific one.
First, most of the examples I have seen use pages that are already in XML. Can I take a page in regular HTML format, have the program turn it into XML, and then parse it? Or is that what is normally done anyway?
Secondly, I could use a little explanation of how the parser actually works and how it saves the data, so that I know how to use it (extract it from whatever it is saved in) when the parsing is done.
So for my specific example I am trying to work with some weather data from the NWS. My program will take the data from this page, and after some user input take you to a page like this, which sometimes will have various alerts. I want to select certain ones. This is what I could use help with. I haven't really coded anything for that yet because I don't know what I am doing.
If I need to clarify or rephrase anything in here, I am happy to; just let me know. I am trying to be a good contributor on here!
Yes, you can parse HTML, and many parsers are available. There is a question about it here, Parse HTML in Android, and an answer about parsing HTML here: https://stackoverflow.com/a/7114346/826657
It is a bad idea, though: HTML tag names rarely describe the data they hold, so you end up writing a lot of code that searches attributes to find a specific piece of data. Prefer XML whenever it is available; it saves a lot of code and time.
Here is a post from CodingHorror which argues that, in general, parsing HTML is a bad idea.
http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
Here is something which explains parsing an XML document using XML PullParser http://www.ibm.com/developerworks/library/x-android/
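To give a feel for how a pull parser works and where the data ends up, here is a small sketch. Android's XmlPullParser is not part of the standard JDK, so this uses the JDK's StAX XMLStreamReader instead, which follows the same pull model: you ask for the next event and decide what to keep. The element name "title" and the AlertTitles class are assumptions for illustration, not the actual NWS feed layout.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class AlertTitles {
    // Collect the text of every <title> element in the document.
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // pull the next parse event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    result.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return result;
    }
}
```

The parser itself stores nothing: the data lives wherever your loop puts it, here a plain List<String>. Selecting only certain alerts is then a matter of adding a condition before result.add(...).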

Which is the fastest way of converting an Object to a stream of Bytes in Java? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have an object which I want to convert into a stream of bytes and then to operate on it. I don't want to serialise the object, but just to convert it. I have read this article, where Java Unsafe class is used and the conversion is very fast. However, is there any other fast solution for this?
Fast conversion is possible. You can use the GSON library to turn the object into a JSON string, then use that string as your requirements dictate. Hope this helps.
There are a number of libraries in development that do what you suggest. I believe all of them are discussed on this forum, which may also have many other topics of interest to you: https://groups.google.com/forum/#!forum/mechanical-sympathy
In short, you can do it using Unsafe, or a library which uses it. In fact, I have one of my own, but it too is still in development.
For the effort involved, this will only make much of a difference if you have many GB of data. At that point, the reduced GC times and the reduced size of the heap are the main advantages, more so than saving a single dereference.
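If Unsafe feels like too much, a middle ground that still avoids Java serialization is to write the fields into a java.nio.ByteBuffer by hand. This is only a sketch under the assumption that the object has a small, fixed set of primitive fields; Reading and PointCodec are hypothetical names, and this is not the Unsafe-based technique from the article.

```java
import java.nio.ByteBuffer;

final class PointCodec {
    // Hypothetical object with two primitive fields.
    static final class Reading {
        final long timestamp;
        final double value;

        Reading(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Fixed encoded size: one long followed by one double.
    static final int SIZE = Long.BYTES + Double.BYTES;

    // Write the fields in a fixed order (big-endian, the ByteBuffer default).
    static byte[] toBytes(Reading r) {
        return ByteBuffer.allocate(SIZE)
                .putLong(r.timestamp)
                .putDouble(r.value)
                .array();
    }

    // Read the fields back in the same order they were written.
    static Reading fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long timestamp = buf.getLong();
        double value = buf.getDouble();
        return new Reading(timestamp, value);
    }
}
```

The layout is entirely up to you, so anything that reads these bytes has to agree on the field order and sizes.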

My own code vs library [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
This is kind of an unusual question for developers, but for some reason I want to post it here and hope to get an adequate answer.
Here is a simple example:
I wrote a Java function that calculates the distance between two geo points. The function is no more than 50 lines of code. I decided to download source code from IBM that does the same thing, but when I opened it I saw that it looks very complicated and is almost a thousand lines of code.
What kind of people write such source code? Are they just very good programmers? Should I use their source code or my own?
I have noticed this kind of thing many times, and from time to time I start to wonder whether it is just me who does not know how to program properly, or whether I am wrong.
Do you guys have the same kind of feeling when you browse through other people's source code?
The code you found, does it do the exact same calculation? Perhaps it takes into account some edge cases you didn't think of, or uses an algorithm that has better numerical stability, lower asymptotic complexity, or is written to take advantage of branch prediction or CPU caches. Or it could be just over-engineered.
Remember the saying: "For every complex problem there is a solution that is simple, elegant, and wrong." If you are dealing with numerical software, even the most basic problems like adding a bunch of numbers can turn out to be surprisingly complex.
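To show how even "adding a bunch of numbers" can go wrong, here is a small sketch comparing a naive loop with Kahan (compensated) summation, one well-known way to reduce floating-point rounding error. The class and method names are just for this example.

```java
final class Summation {
    // Straightforward left-to-right sum; small terms can be lost entirely
    // when they are added to a much larger running total.
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Kahan summation: carry the rounding error of each addition
    // forward in a separate compensation term.
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0; // running estimate of the lost low-order bits
        for (double v : values) {
            double y = v - compensation;    // apply the correction to the next term
            double t = sum + y;             // low-order bits of y may be lost here
            compensation = (t - sum) - y;   // recover what was lost
            sum = t;
        }
        return sum;
    }
}
```

With an input such as 1.0 followed by ten values of 1e-16, the naive loop never moves off 1.0 because each small term is below the rounding threshold, while the compensated version accumulates them. The longer version is not over-engineered; it just handles a case the short one silently gets wrong.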

java peephole optimization beginner compilers [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
As part of a group project I'm writing a compiler for a simplified language. As one of the optional features I thought I'd add a peephole optimizer that goes over the Intel assembly code output by the codegen and optimizes it.
Our compiler is written in Java, and it seems like it's going to be a lot of work to create this peephole optimizer using the Java I've learned so far. Is there some sort of tool I should be using to make this possible? Pattern matching on strings doesn't sound like a good approach in Java.
thanks
Peephole optimization ought to be done on a binary representation of the parse tree, not on text intended as input to the assembler.
Hard to say without looking at the design of your compiler, but typically you would have an intermediate step between generating code and emitting it. For example, the output of the code generation phase could be a linked list of instructions, where each instruction object stores the kind of instruction, any arguments, label/branch destinations, and so on. Then each pattern would inspect the current node and its immediate successors (e.g. if (curr.isMov() && curr.next.isPush() && ...)) and modify the list accordingly. Your peephole optimizer then starts with the codegen output, runs each pattern on it, and does this over and over until the list stops changing. Finally, you have a separate phase which just takes this list of instructions and outputs the actual assembly.
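The approach described above can be sketched roughly as follows. The Instr class and the two patterns (drop "mov r, r", drop a "push r" immediately followed by "pop r") are made up for illustration, and the sketch uses an ArrayList rather than a linked list for brevity. It also ignores labels and branch targets, which a real optimizer must check before removing or merging instructions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Peephole {
    // One instruction: an opcode plus its operands (hypothetical representation).
    static final class Instr {
        final String op;
        final List<String> args;

        Instr(String op, String... args) {
            this.op = op;
            this.args = Arrays.asList(args);
        }

        @Override
        public String toString() {
            return args.isEmpty() ? op : op + " " + String.join(", ", args);
        }
    }

    // Run every pattern once over the list; report whether anything changed.
    static boolean pass(List<Instr> code) {
        boolean changed = false;
        int i = 0;
        while (i < code.size()) {
            Instr cur = code.get(i);
            Instr next = i + 1 < code.size() ? code.get(i + 1) : null;

            // Pattern 1: "mov r, r" has no effect, so drop it.
            if (cur.op.equals("mov") && cur.args.size() == 2
                    && cur.args.get(0).equals(cur.args.get(1))) {
                code.remove(i);
                changed = true;
                continue; // re-examine the instruction now at index i
            }
            // Pattern 2: "push r" directly followed by "pop r" cancels out.
            if (next != null && cur.op.equals("push") && next.op.equals("pop")
                    && cur.args.equals(next.args)) {
                code.remove(i + 1);
                code.remove(i);
                changed = true;
                continue;
            }
            i++;
        }
        return changed;
    }

    // Repeat passes until the list stops changing, then return the result.
    static List<Instr> optimize(List<Instr> input) {
        List<Instr> code = new ArrayList<>(input);
        while (pass(code)) {
            // keep going: one rewrite may expose another
        }
        return code;
    }
}
```

Emitting the assembly is then a separate step that walks the optimized list and prints each instruction, for example via toString().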
I definitely wouldn't use strings for this. You might look at lex/yacc, and its ilk (e.g. Jack is one for Java, although I haven't used it) to generate the AST of the assembly, then run optimizations on the AST, and write out the assembly again … but you do realise this is a hard thing to do, right? :-)
