Writing a LaTeXParser in Java, conceptually attacking [closed] - java

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Alright, so I have to write a LaTeX parser in Java. I'm going to be taking in a file much like the one below and checking it for validity and errors. I'm not really looking for help or code, but more for a conceptual understanding of how to attack the problem. I am going to be using Stacks to store the blocks and make sure everything is nested properly. So my question to you is, how should I handle it?
For example, should I begin by pushing each "\begin{_}" onto a stack and then popping it at its corresponding "\end{}"? I was also considering a String-based switch/case system that, when particular strings are found, performs the necessary actions on my stack.
Or maybe two Stacks that cancel each other out: all the \begins in one and all the \ends in the other, and when their {__} names match up, I start popping them off.
So yeah, just wondering what the bright minds of SOF had to say about how I should be thinking about this problem and how to deal with it. Thanks for your input!
\documentclass{article}
\usepackage{amsmath, amssymb, amsthm}
\begin{document}
{\Large \begin{center} Homework Problems \end{center}}\begin{itemize}\item\end{itemize}
\begin{enumerate}
\item Prove: For all sets $A$ and $B$, $(A - B) \cup
(A \cap B) = A$.
\begin{proof}
\begin{align}
& (A - B) \cup (A \cap B) && \\
& = (A \cap B^c) \cup (A \cap B) && \text{by
Alternate Definition of Set Difference} \\
& = A \cap (B^c \cup B) && \text{by Distributive Law} \\
& = A \cap (B \cup B^c) && \text{by Commutative Law} \\
& = A \cap U && \text{by Union with the Complement Law} \\
& = A && \text{by Intersection with $U$ Law}
\end{align}
\end{proof}
\item If $n = 4k + 3$, does 8 divide $n^2 - 1$?
\begin{proof}
Let $n = 4k + 3$ for some integer $k$. Then
\begin{align}
n^2 - 1 & = (4k + 3)^2 - 1 \\
& = 16k^2 + 24k + 9 - 1 \\
& = 16k^2 + 24k + 8 \\
& = 8(2k^2 + 3k + 1) \text{,}
\end{align}
which is certainly divisible by 8.
\end{proof}
\end{enumerate}
\end{document}
EDIT: Lol, I think everyone is overthinking this way too much. I am not looking for anything that recognizes and compiles code, or actually performs the actions of the LaTeX language on this file. I simply want to be able to write up a text file like the one above, have my program open it, read it, and say "hey! this would work because every block that begins also ends!" or "hey, there's an error on line 10!" Nothing more, nothing less. Just a simple validator/error checker that uses Stacks to hold the blocks and pops them when the end is found, and so on. Again, I AM NOT LOOKING FOR CODE OR HANDOUTS! All I would like is some good ideas and methods for attacking this problem, maybe some pseudocode structuring at best!
For example, I was thinking of having this all contained in one class, in my main, with a Stack that would hold all of the Strings in the file of the form " \begin{_} "; when I find the corresponding " \end{} ", I pop it and check it off a list or something. If every beginning block has been popped by the end of my run through the file, I have a valid .txt file.

Trying to roll your own parser is a big task. There are a number of parser generators that take some of the busy work out of it. ANTLR is a popular one for Java.
One of the first things you're going to need to do is find out what kind of language LaTeX is. More complicated languages like C++ can't be parsed with the same kinds of parsers that you can use for a more regular language like Forth.
The following Jules Bean post leads me to think that LaTeX is harder to parse than most programming languages.
I'm pretty sure it's not an LALR language. It's context dependent and is capable of modifying its own syntax. I think it is probably technically impossible to parse without actually executing the macros. I.e. you need a TeX state machine to parse it in full generality.
'well-behaved' LaTeX is probably LALR, though.
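For the much narrower check described in the question's edit (every \begin{...} has a matching \end{...}), a full parser is not needed. The following is only a rough sketch of the single-stack idea, not a LaTeX parser: the class name BlockChecker and its check method are made up for illustration, and it assumes no comments, verbatim blocks, or macros that redefine \begin and \end.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks only that \begin{...} and \end{...} pair up.
final class BlockChecker {
    // Matches \begin{name} or \end{name}; group 1 is the keyword, group 2 the name.
    private static final Pattern BLOCK =
            Pattern.compile("\\\\(begin|end)\\{([^}]*)\\}");

    /** Returns one message per problem; an empty list means every block is closed. */
    static List<String> check(List<String> lines) {
        List<String> errors = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();       // names of open environments
        Deque<Integer> openedAt = new ArrayDeque<>();  // line numbers, kept in step with `open`

        for (int i = 0; i < lines.size(); i++) {
            int lineNo = i + 1;
            Matcher m = BLOCK.matcher(lines.get(i));
            while (m.find()) {                         // several blocks may share a line
                String name = m.group(2);
                if (m.group(1).equals("begin")) {
                    open.push(name);
                    openedAt.push(lineNo);
                } else if (open.isEmpty()) {
                    errors.add("line " + lineNo + ": \\end{" + name
                            + "} has no matching \\begin");
                } else if (!open.peek().equals(name)) {
                    // Report the mismatch but leave the stack alone.
                    errors.add("line " + lineNo + ": expected \\end{" + open.peek()
                            + "} but found \\end{" + name + "}");
                } else {
                    open.pop();
                    openedAt.pop();
                }
            }
        }
        // Anything still on the stack was opened but never closed.
        while (!open.isEmpty()) {
            errors.add("line " + openedAt.pop() + ": \\begin{" + open.pop()
                    + "} is never closed");
        }
        return errors;
    }
}
```

One stack is enough because environments must close in the reverse order they were opened; a second stack of \end names would lose exactly that ordering information. Storing the line number next to each name is what makes the "error on line 10" message possible.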


Java CSV parser with unescaped quotes [closed]

Closed 9 years ago.
I have a CSV file that has some quoting issues:
"Albanese Confectionery","157137","ALBANESE BULK ASST. MINI WILD FRUIT WORMS 2" 4/5LB",9,90,0,0,0,.53,"21",50137,"3441851137","5 lb",1,4,4,$6.7,$6.7,$26.8
SuperCSV is choking on these fruit worms (pun intended). I know that the 2" should probably be 2"", but it's not. LibreOffice actually parses this correctly (which surprises me). I was thinking of just writing my own little parser but other rows have commas inside the string:
"Albanese Confectionery","157230","ALBANESE BULK JET FIGHTERS,ASSORTED 4/5 B",9,90,0,0,0,.53,"21",50230,"3441851230","5 lb",1,4,4,$6.7,$6.7,$26.8
Does anyone know of a Java library that will handle crazy stuff like this? Or should I try all the available ones? Or am I better off hacking this out myself?
The right solution is to find the person who generated the data and beat them over the head with a keyboard until they fix the problem on their end.
Once you've exhausted that route, you could try some of the other CSV parsers on the market, I've used OpenCSV with success in the past.
Even if OpenCSV won't solve the problem out of the box, the code is fairly easy to read and available under an Apache license, so it might be possible to modify the algorithm to work with your wonky data, and probably easier than starting from scratch.
Surprising even myself here, but I think I would hack it myself. I mean, you only need to read the lines and generate the tokens by splitting on quotes/commas, whichever you want. That way you can adjust the logic the way it suits you. It's not very hard. The file seems broken enough that going through existing solutions looks like more work.
One point though - if LibreOffice already parses it correctly, couldn't you just save the file from there, thus generating a file that is more reasonable. However, if you think LibreOffice might be guessing, just write the tokenizer yourself.
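As a rough illustration of the hand-written route, here is a minimal sketch of a lenient line splitter. It is not taken from any library: the name LenientCsv is made up, the delimiter and quote characters are hard-coded to , and ", and it relies on the same guess as the data seems to need, namely that a quote only closes a field when it is followed by a delimiter or the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled splitter for one CSV line with sloppy quoting.
final class LenientCsv {
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean fieldStart = true;   // true until the first character of a field is consumed

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                boolean last = i + 1 == line.length();
                if (c == '"' && (last || line.charAt(i + 1) == ',')) {
                    inQuotes = false;              // a real closing quote
                } else if (c == '"' && line.charAt(i + 1) == '"') {
                    field.append('"');             // properly escaped quote ("")
                    i++;                           // skip the second quote
                } else {
                    field.append(c);               // text, commas, and stray quotes such as 2"
                }
            } else if (c == '"' && fieldStart) {
                inQuotes = true;                   // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(field.toString());      // end of the current field
                field.setLength(0);
                fieldStart = true;
                continue;
            } else {
                field.append(c);                   // unquoted field content
            }
            fieldStart = false;
        }
        fields.add(field.toString());              // the last field has no trailing comma
        return fields;
    }
}
```

The heuristic still fails on a stray quote that happens to sit directly before a comma inside a field, so it is a workaround for this particular file rather than a general fix.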
+1 for the 'choking on fruit worms' pun - I nearly choked on my coffee reading that :)
If you really can't get that CSV fixed, then you could just supply your own Tokenizer (Super CSV is very flexible like that!).
You'd normally write your own readColumns() implementation, but it's quicker to extend the default Tokenizer and override the readLine() method to intercept the String (and fix the unescaped quotes) before it's tokenized.
I've made an assumption here that any quotes not next to a delimiter or at the start/end of the line should be escaped. It's far from perfect, but it works for your sample input. You can implement this however you like - it was too early in the morning for me to use a regex :)
This way you don't have to modify Super CSV at all (it just plugs in), so you get all of the other features like cell processors and bean mapping as well.
package org.supercsv;
import java.io.IOException;
import java.io.Reader;
import org.supercsv.io.Tokenizer;
import org.supercsv.prefs.CsvPreference;
public class FruitWormTokenizer extends Tokenizer {
public FruitWormTokenizer(Reader reader, CsvPreference preferences) {
super(reader, preferences);
}
@Override
protected String readLine() throws IOException {
final String line = super.readLine();
if (line == null) {
return null;
}
final char quote = (char) getPreferences().getQuoteChar();
final char delimiter = (char) getPreferences().getDelimiterChar();
// escape all quotes not next to a delimiter (or start/end of line)
final StringBuilder b = new StringBuilder(line);
for (int i = b.length() - 1; i >= 0; i--) {
if (quote == b.charAt(i)) {
final boolean validCharBefore = i - 1 < 0
|| b.charAt(i - 1) == delimiter;
final boolean validCharAfter = i + 1 == b.length()
|| b.charAt(i + 1) == delimiter;
if (!(validCharBefore || validCharAfter)) {
// escape that quote!
b.insert(i, quote);
}
}
}
return b.toString();
}
}
You can just supply this Tokenizer to the constructor of your CsvReader.

java: A long list of conditions , what to do? [closed]

Closed 10 years ago.
I need suggestion for the right approach to apply conditions in Java.
I have 100 conditions based on which I have to change value of a String variable that would be displayed to the user.
an example condition: a<5 && (b>0 && c>8) && d>9 || x!=4
More conditions are there but variables are the same more or less.
I am doing this right now:
if(condition1)
else if(condition2)
else if(condition3)
...
A switch case alternative would obviously be there nested within if-else's i.e.
if(condition1)
switch(x)
{
case y:
blah-blah
}
else if(condition2)
switch(x)
{
case y:
blah-blah
}
else if(condition3)
...
But I am looking for a more elegant solution, such as an interface with polymorphic implementations. What could I do to cut down the lines of code, or what would be the right approach?
---Edit---
I actually need this on an Android device, but it's more of a Java construct question.
This is a small snapshot of the conditions that I have. More will be added if a few pass/fail, which would obviously require more if-else's, with or without nesting. In that case, would the processing get slow?
As of now I am storing the messages in a separate class as static String variables, so when a condition is true I pick the matching static variable from that class and display it. Is that a reasonable way to store the resulting messages?
Depending on the number of conditional inputs, you might be able to use a look-up table, or even a HashMap, by encoding all inputs, or even the results of some relatively simple compound conditions, in a single value:
int key = 0;
key |= a?(1):0;
key |= b?(1<<1):0;
key |= (c.size() > 1)?(1<<2):0;
...
String result = table[key]; // Or result = map.get(key);
This paradigm has the added advantage of constant time (O(1)) complexity, which may be important on some occasions. Depending on the complexity of the conditions, you might even have fewer branches in the code path on average, as opposed to full-blown if-then-else spaghetti code, which might lead to performance improvements.
We might be able to help you more if you added more context to your question. Where are the condition inputs coming from? What are they like?
And the more important question: What is the actual problem that you are trying to solve?
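To make the bit-encoding idea concrete, here is a small self-contained sketch. The three conditions (a < 5, b > 0, c > 8) are borrowed from the example condition in the question; the class name ConditionTable, the bit names, and the message texts are invented for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table keyed by a bit mask of evaluated conditions.
final class ConditionTable {
    private static final int A_SMALL    = 1;       // bit 0: a < 5
    private static final int B_POSITIVE = 1 << 1;  // bit 1: b > 0
    private static final int C_LARGE    = 1 << 2;  // bit 2: c > 8

    private static final Map<Integer, String> MESSAGES = new HashMap<>();
    static {
        MESSAGES.put(0, "nothing holds");
        MESSAGES.put(A_SMALL, "only a is small");
        MESSAGES.put(A_SMALL | B_POSITIVE | C_LARGE, "all three hold");
    }

    /** Evaluates each condition once and packs the results into one int. */
    static int key(int a, int b, int c) {
        int key = 0;
        if (a < 5) key |= A_SMALL;
        if (b > 0) key |= B_POSITIVE;
        if (c > 8) key |= C_LARGE;
        return key;
    }

    /** One map lookup replaces the if/else-if chain. */
    static String message(int a, int b, int c) {
        return MESSAGES.getOrDefault(key(a, b, c), "no message for this combination");
    }
}
```

With n independent conditions there are 2^n possible keys, so a map with a default entry tends to be more practical than a dense array once n grows beyond a handful.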
There are a lot of possibilities to this. Without knowing much about your domain, I would create something like (you can think of better names :P)
public interface UserFriendlyMessageBuilder {
boolean meetCondition(FooObjectWithArguments args);
String transform(String rawMessage);
}
In this way, you can create a Set of UserFriendlyMessageBuilder and just iterate through them for the first that meets the condition to transform your raw message.
public class MessageProcessor {
private final Set<UserFriendlyMessageBuilder> messageBuilders;
public MessageProcessor(Set<UserFriendlyMessageBuilder> messageBuilders) {
this.messageBuilders = messageBuilders;
}
public String get(FooObjectWithArguments args, String rawMsg) {
for (UserFriendlyMessageBuilder msgBuilder : messageBuilders) {
if (msgBuilder.meetCondition(args)) {
return msgBuilder.transform(rawMsg);
}
}
return rawMsg;
}
}
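One detail worth spelling out: "the first that meets the condition" is only well defined if the collection has a stable iteration order, which a plain HashSet does not promise. Below is a minimal sketch of the same idea using a List and lambdas instead of one class per rule. The names OrderedRules, Inputs, and Rule are placeholders, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical variant of the processor that keeps rules in a fixed order.
final class OrderedRules {
    /** Placeholder for whatever values the conditions depend on. */
    static final class Inputs {
        final int a;
        final int x;
        Inputs(int a, int x) { this.a = a; this.x = x; }
    }

    /** A condition paired with the message transformation it triggers. */
    static final class Rule {
        final Predicate<Inputs> condition;
        final UnaryOperator<String> transform;
        Rule(Predicate<Inputs> condition, UnaryOperator<String> transform) {
            this.condition = condition;
            this.transform = transform;
        }
    }

    private final List<Rule> rules;

    OrderedRules(List<Rule> rules) {
        this.rules = new ArrayList<>(rules);   // defensive copy, order preserved
    }

    /** Applies the first rule whose condition holds; falls back to the raw message. */
    String apply(Inputs inputs, String rawMessage) {
        for (Rule rule : rules) {
            if (rule.condition.test(inputs)) {
                return rule.transform.apply(rawMessage);
            }
        }
        return rawMessage;
    }
}
```

A rule is then declared in one line, for example new OrderedRules.Rule(in -> in.a < 5, msg -> msg + " (a is small)"), and its priority is simply its position in the list.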
It seems to me that you have given very little weight to designing the product in modules,
which is the main reason to use an OOP language.
E.g., if you have 100 conditions and you are able to split them into 4 modules, then theoretically you need 26 conditions to choose anything.
This is an additional possibility that may be worth considering.
Take each comparison, and calculate its truth, then look the resulting boolean[] up in a truth table. There is a lot of existing work on simplifying truth tables that you could apply. I have a truth table simplification applet I wrote many years ago. You may find its source code useful.
The cost of this is doing all the comparisons, or at least the ones that are needed to evaluate the expression using the simplified truth table. The advantage is an organized system for managing a complicated combination of conditions.
Even if you do not use a truth table directly in the code, consider writing and simplifying one as a way of organizing your code.

shall brackets be used for one line conditional statements? [closed]

Closed 10 years ago.
Apart from readability, are there any differences in performance or compile time when a single-line loop / conditional statement is written with and without brackets?
For example, are there any differences between following:
if (a > 10)
a = 0;
and
if (a > 10)
{
a = 0;
}
?
Of course there is no difference in performance. But there is a difference in the possibility of introducing errors:
if (a>10)
a=0;
If somebody later extends the code and writes
if (a>10)
a=0;
printf ("a was reset\n");
This will always be printed because of the missing braces. Some people insist that you always use braces to avoid this kind of error.
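The same pitfall can be reproduced in Java. The sketch below is only an illustration with made-up names (BraceDemo and its two methods); a counter stands in for the printf call so the effect can be checked rather than read off the console.

```java
// Hypothetical demo: the second statement looks like part of the if, but is not.
final class BraceDemo {
    /** Returns how many times the "a was reset" message would be logged. */
    static int logsWithoutBraces(int a) {
        int logs = 0;
        if (a > 10)
            a = 0;
            logs++;          // indented like the body, yet runs on every call
        return logs;
    }

    /** Same logic with braces: the log only happens when a was actually reset. */
    static int logsWithBraces(int a) {
        int logs = 0;
        if (a > 10) {
            a = 0;
            logs++;
        }
        return logs;
    }
}
```

Calling both methods with a value of 5 shows the difference: the version without braces counts a log even though nothing was reset.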
Contrary to several answers, there is a finite but negligible performance difference at compile time. There is zero difference of any kind at runtime.
No, there is no difference, the compiler will strip out non-meaningful braces, line-breaks etc.
The compile time will be marginally different, but so marginally that you have already lost far more time reading this answer than you will get back in compile speed. As compute power increases, this cost goes down yet further, but the cost of reducing readability does not.
In short, do what is readable, it makes no useful difference in any other sense.
A machine code does not contain such braces. After compilation, there is no more {}. Use the most readable form.
Well, there is of course no difference between them at runtime.
But you should certainly use the second form for the sake of maintenance of your code.
The reason is this: suppose that in the future you need to add more lines to your if-else block. If your old code uses the first form, you have to add the braces before adding the new code, which you don't need to do in the second case.
So it is far easier to add code to the second form later than to the first.
Also, with the first form you are prone to typing errors, such as a semicolon after your if, like this:
if (a > 0);
System.out.println("Hello");
So, you can see that your Hello will always get printed. These errors are easy to avoid if you have curly braces attached to your if.
It depends on the rest of the coding guidelines. I don't see any
problem dropping the braces if the opening brace is always on a line
by itself. If the opening brace is at the end of the if line,
however, I find it too easy to overlook when adding to the contents. So
I'd go for either:
if ( a > 10 ) {
a = 0;
}
regardless of the number of lines, or:
if ( a > 10 )
{
// several statements...
}
with:
if ( a > 10 )
a = 0;
when there is just one statement. The important thing, however, is that
all of the code be consistent. If you're working on an existing code
base which uses several different styles, I'd always use braces in new
code, since you can't count on the code style to ensure that if they
were there, they'd be in a highly visible location.

How far should best practices like avoiding magic numbers go? [closed]

Closed 11 years ago.
In the near future we may have a rule imposed on us under which we cannot have any hard-coded numbers in our Java source code. All hard-coded numbers must be declared as final variables.
Even though this sounds great in theory, it is really hard/tedious to implement in code, especially legacy code. Should it really be considered "best practice" to declare the numbers in the following code snippets as final variables?
//creating excel
cellnum = 0;
//Declaring variables.
Object[] result = new Object[2];
//adding dash to ssn
return ssn.substring(1, 3)+"-"+ssn.substring(3, 5)+"-"+ssn.substring(5, 9);
Above are just some of the examples I could think of, but in these (and others) where would you as a developer say enough is enough?
I wanted to make this question a community wiki but couldn't see how...?
Definitely no. Literal constants have their places, especially low constants such as 0, 1, 2, ...
I don't think anyone would think
double[] pair = new double[PAIR_COUNT];
makes more sense than
double[] pair = new double[2];
I'd say use final variables if
...it increases readability,
...the value may change (and is used in multiple places), or
...it serves as documentation
A related side note: As always with coding standards / conventions: very few (if any) rules should be followed strictly.
Replacing numbers by constants makes sense if the number carries a meaning that is not inherently obvious by looking at its value alone.
For instance,
productType = 221; // BAD: the number needs to be looked up somewhere to understand its meaning
productType = PRODUCT_TYPE_CONSUMABLE; // GOOD: the constant is self-describing
On the other hand,
int initialCount = 0; // GOOD: in this context zero really means zero
int initialCount = ZERO; // BAD: the number value is clear, and there's no need to add a self-referencing constant name if there's no other meaning
Generally speaking, if a literal has a special meaning, it should be given a unique name rather than assuming things. I'm not sure why it is "practically" hard/tedious to do the same.
Object[] result = new Object[2]; => seems like a good candidate for using a Pair class
cellnum = 0; => cellnum = FIRST_COLUMN; esp since you might end up using an API which treats 1 as the starting index or maybe you want to process an excel in which columns start from 2.
return ssn.substring(1, 3)+"-"+ssn.substring(3, 5)+"-"+ssn.substring(5, 9) => If you have code like this littered throughout your codebase, you have bigger problems. If this code exists in a single location and is shielded by a sane API, I don't really see a problem here.
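A sketch of what "a single location shielded by a sane API" could look like for the SSN case. The class name Ssn and its format method are made up, and the code assumes the input is exactly nine digits with zero-based offsets (note that the snippet in the question starts its first substring at 1).

```java
// Hypothetical single home for the SSN layout, so the literals appear once.
final class Ssn {
    /** Formats nine digits as AAA-GG-SSSS; rejects anything else. */
    static String format(String ssn) {
        if (ssn == null || !ssn.matches("\\d{9}")) {
            throw new IllegalArgumentException("expected exactly nine digits");
        }
        // The offsets below are the 3-2-4 layout itself; a comment explains
        // them better than four named constants would.
        return ssn.substring(0, 3) + "-" + ssn.substring(3, 5) + "-" + ssn.substring(5, 9);
    }
}
```

Callers then write Ssn.format(value), and a rule against magic numbers only has to grant one exception in one method.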
I've seen folks consider 0 and 1 accepted exceptions.
The idea is that you want to document why you have two Objects as above for example.
I agree with you about the dashes in SSN. The comment describes it better than 4 named constants.
In general, I like the idea of no magic numbers, but as with every rule, there are pragmatics involved. Legacy code brings its own issues. Bringing old code up to date this way is a lot of work with little payoff in terms of changed behavior. I would consider doing it in an evolutionary fashion: when you have to edit an old file, bring it up to date.
It really depends on the context, doesn't it? If there are numbers in the code that do not indicate why they exist, then naming them makes the code more readable. If you see the number 3.14 in code, is it PI? Is there any way to tell, or is that just a coincidence? Naming it PI will clear up the mystery.
In your example, why is cellnum = 2? Why not 10? Or 20? That should be named something, say INITIAL_CELL or MAX_CELL, especially if this same number, meaning the same thing, appears again in the code.
Depends if it needs to be changed. Or for that matter, it can be changed.
If you only need 2 objects (say, for a pair like aioobe mentioned) then that isn't a magic number, it's the correct number. If it's for a variable tuple that, at this moment, is 2, then you probably should abstract it out into a constant.

Is Scala a Functional Programming Language? [closed]

Closed 11 years ago.
I learned programming with Java, then tried to learn one programming language per year: the second was C++, then Python. When it came time to learn the next one, I looked for something new and chose Scala, because it is compatible with Java and could be a transition from OOP to functional programming.
It was cool, learning new paradigms, a new style and a new way of thinking. It was a great experience just to read about elegant Scala concepts, and much better to code in Scala.
While reading a lot of articles, I came across this one criticizing Scala:
Scala is not a functional programming language. It is a statically typed object oriented language with closures.
After reading this article some doubts came to me. I really like Scala and was starting to write more in it, but does Scala fit the definition of functional programming? Is that article telling the truth or just misleading readers? Must I learn Haskell or some other functional programming language to really experience FP?
UPDATE: Expecting rational answers with good examples, without causing disputes.
Scala does not force you to write in a functional style. This is perfectly valid Scala:
var i = 1
while (i < 10) {
println("I like side effects, this is number "+i)
i += 1
}
case class C(var i: Int, var set: Boolean = false)
def setMe(c: C) = { if (!c.set) { c.set = true; c.i += 1 }; c }
setMe(C(5))
So in this sense, horrors, Scala is not functional! Side effects galore, mutable state--everything you can do in Java you can do in Scala.
Nonetheless, Scala permits you to code in functional style, and makes your life easier (than in Java) in a number of ways:
There are first-class functions
There is an immutable collections library
Tail recursion is supported (to the extent that the JVM can manage)
Pattern matching is supported
(etc.)
This looks somewhat more functional:
for (i <- 1 to 10) println("Sometimes side effects are a necessary evil; this is number"+i)
case class C(i: Int, set: Boolean = false)
def setIt(c: C, f: Int=>Int) = C(f(c.i), true)
setIt(C(5), _+1)
It's worth noting that the author of that particular article seems to have a very poor understanding of Scala; pretty much every example that looks ugly in his hands is unnecessarily ugly. For example, he writes
def x(a: Int, b: Int) = a + b
def y = Function.curried(x _)(1)
But it's not that bad, if you pay attention to what you're doing:
def x(a: Int)(b: Int) = a + b
val y = x(1) _
Anyway, the bottom line is that Scala is not a pure functional programming language, and as such, its syntax is not always ideal for functional programming since there are other considerations at play. It does have virtually all of the standard features that one expects from a functional programming language, however.
Scala is a multi-paradigm programming language designed to integrate features of object-oriented programming and functional programming.
I couldn't say it any better and that's all there is to say except for pointless arguments.
My personal litmus test for a functional language is Church numerals.
Scheme example:
(define (thrice f)
(lambda (x)
(f (f (f x)))))
((thrice 1+) 0)
=> 3
(1+ is a Scheme function that adds 1 to its argument. thrice takes a function f and returns a function that composes f with itself three times. So (thrice 1+) adds three to its argument.)
((thrice (thrice 1+)) 0)
=> 9
(Since (thrice 1+) is a function that adds three, taking the thrice of that gives a function that adds nine.)
And my favorite:
(((thrice thrice) 1+) 0)
=> 27
(Reasoning left as an exercise for the reader. This last example is the most important.)
If you cannot write this example in your language without horrible contortions, then I say it is not a functional language (example: C/C++).
If you can write this example in your language, but it looks very unnatural, then I say your language "supports functional programming" but is not really a functional language (example: Perl).
If this example ports neatly to your language and actually looks not too different from how you use it day to day, then it's a functional language.
I do not know Scala. Anybody want to tell me where it fits? :-)
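For comparison, here is one way the same test might be ported to Java (8 or later), as a sketch only. The class name Church and the helper methods are invented for this example; readers can judge for themselves which of the three buckets above the result falls into.

```java
import java.util.function.UnaryOperator;

// Hypothetical Java port of the Scheme `thrice` example.
final class Church {
    /** Returns a function that applies f three times. */
    static <T> UnaryOperator<T> thrice(UnaryOperator<T> f) {
        return x -> f.apply(f.apply(f.apply(x)));
    }

    /** Stand-in for Scheme's 1+. */
    static final UnaryOperator<Integer> INC = n -> n + 1;

    /** ((thrice 1+) 0) */
    static int three() {
        return thrice(INC).apply(0);
    }

    /** ((thrice (thrice 1+)) 0) */
    static int nine() {
        return thrice(thrice(INC)).apply(0);
    }

    /** (((thrice thrice) 1+) 0) */
    static int twentySeven() {
        // thrice itself, viewed as a function from int-functions to int-functions.
        UnaryOperator<UnaryOperator<Integer>> thriceOnInts = Church::thrice;
        return thrice(thriceOnInts).apply(INC).apply(0);
    }
}
```

The last case needs an explicitly typed intermediate variable because a generic method cannot be passed to itself without first fixing its type parameter, which is the kind of friction the litmus test is meant to expose.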
