sample editor - same as stackOverflow - java

I want to create an editor and store formatted text in a database. I just want a sample editor that supports functions like the Stack Overflow editor's:
_sfdfdgdfgfg_ : for underlined text
/sfdfdgdfgfg/ : for bolded text
I will use a function to replace the first _ with <u> and the second with </u> (and likewise <b> and </b> for /).
So my question is: how can I detect the opening and the closing _ and / when they are nested?
For example :
/dsdfdfffddf _ dsdsdssd_/ ffdfddsgds /dfdfehgiuyhbf/ ....
I will use this editor in a Java application.

So what you want is a Java version of Markdown.
Here's what Google finds:
http://code.google.com/p/markdownj/

It will not make you happy, but you should probably take the time to learn to write parsers (the Dragon Book is good for that). The thing with parsing tasks is that they are easy if you know how to do them and nearly impossible if you don't.
I would write a tokenizer that can recognize tokens like <start_underline, "_"> and <end_underline, "_"> for the format indicators you want to use in your editor and one for everything else. Results could look like this:
Text: Hello _world_, /how are you?/
Tokens: <text, "Hello ">, <start_underline, "_">, <text, "world">, <end_underline, "_">, <text, ", ">, <start_bold, "/">, <text, "how are you?">, <end_bold, "/">
Start and end can be tracked fairly easily with boolean variables, because it makes no sense to nest a format inside itself. That's why I would already do that tracking in the tokenizer.
After that I would write a parser class that takes these tokens and configures the output to a textarea accordingly.
As you can see, this is really just an application of the divide-and-conquer principle. The task "How do I do everything I want with my text?" is split into 3 parts:
According to a useful structure, what is this string about? (Answered by the tokenizer)
How do I handle a specific text part x, for all possible x? (Answered by the parser)
How do I represent the parser's interpretation of this string? (Answered by JTextPane or the like)
Neither the tokenizer nor the parser needs to be a separate class. Because the context is not complicated, they can just be methods in a subclass of the text area type you prefer.
I don't think more detailed advice would help; the next step is an implementation, which you will probably want to do yourself. Don't hesitate to ask if you fail to find a good solution to one specific part, though.
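For readers who still want a starting point, here is a minimal sketch of the tokenizer and parser described above. It assumes that _ maps to <u> and / maps to <b>; the names InlineMarkup, Token, tokenize and toHtml are made up for illustration. It does not escape HTML, and it treats every _ or / as a marker, so URLs and file paths would be mangled.

```java
import java.util.ArrayList;
import java.util.List;

final class InlineMarkup {
    enum Type { TEXT, START_UNDERLINE, END_UNDERLINE, START_BOLD, END_BOLD }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Tokenizer: one boolean per format remembers whether the next marker opens or closes it.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        boolean underlineOpen = false;
        boolean boldOpen = false;
        for (char c : input.toCharArray()) {
            if (c != '_' && c != '/') {
                text.append(c);
                continue;
            }
            // A marker ends the current run of plain text.
            if (text.length() > 0) {
                tokens.add(new Token(Type.TEXT, text.toString()));
                text.setLength(0);
            }
            if (c == '_') {
                tokens.add(new Token(underlineOpen ? Type.END_UNDERLINE : Type.START_UNDERLINE, "_"));
                underlineOpen = !underlineOpen;
            } else {
                tokens.add(new Token(boldOpen ? Type.END_BOLD : Type.START_BOLD, "/"));
                boldOpen = !boldOpen;
            }
        }
        if (text.length() > 0) {
            tokens.add(new Token(Type.TEXT, text.toString()));
        }
        return tokens;
    }

    // "Parser": maps each token to the HTML it stands for.
    static String toHtml(List<Token> tokens) {
        StringBuilder html = new StringBuilder();
        for (Token t : tokens) {
            switch (t.type) {
                case START_UNDERLINE: html.append("<u>"); break;
                case END_UNDERLINE:   html.append("</u>"); break;
                case START_BOLD:      html.append("<b>"); break;
                case END_BOLD:        html.append("</b>"); break;
                default:              html.append(t.text);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml(tokenize("Hello _world_, /how are you?/")));
    }
}
```

For the example text this prints Hello <u>world</u>, <b>how are you?</b>. Because each format has its own flag, an input such as /a _b_/ c also comes out properly nested. Unclosed markers are not detected in this sketch.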

You can look at the stackoverflow.com page source and try to integrate its editor. I guess it should work:
https://blog.stackoverflow.com/2008/12/reverse-engineering-the-wmd-editor/

This example shows how to use MarkdownJ:
First, make sure that MarkdownJ is included as a class library in your Java application.
Second, use this code to invoke MarkdownJ:
MarkdownProcessor m = new MarkdownProcessor();
String html = m.markdown("this is *a sample text*");
System.out.print("<html>"+html+"</html>");

Related

Regex for IP and string

I'm using this online regex test site.
Here is the regex I'm using:
\{"ip":"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$","iphone":"admin/ios","dev":\{"action":"CUS","from":"REG","CUSA":"ADVERT"\}\}
And I'm trying to match it against:
{"ip":"192.168.50.5","iphone":"admin/ios","dev":{"action":"CUS","from":"REG","CUSA":"ADVERT"}}
When I run the test, it doesn't match. I need it to match on the site above for validation reasons.
A different perspective: it already seems pretty hard to come up with a regex that works for you in the first place. What does this tell you about how hard it will be to maintain this regex in the future, and maybe extend it?
What I am saying is: regexes are a good tool, but sometimes overrated. This looks like a string in JSON format. Wouldn't it be better to just treat it as such, and use a garden-variety JSON parser instead of trying to build your own regex?
Which will be more robust over time: your home-baked regex, or a standard library that millions of people are using?
One place to read about JSON parsers would be this question here.
This will be enough for your context.
"ip":"(\d+)\.(\d+)\.(\d+)\.(\d+)"
Edit:
Regex is not meant for processing structured data, but most of the time you just need a solution that works. When the sample data changes and no longer matches, you update the regex string to match it again.
Since you want to get four numbers inside a quote pair after a key called "ip", this regex will definitely do it.
If you want something else, please provide more context. Thanks!
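For what it's worth, the original pattern appears to fail because of the $ placed right after the fourth octet: $ only matches at the end of the input, yet the pattern goes on to require ","iphone":... after it. The sketch below checks this in Java with java.util.regex; the class and constant names are made up for illustration, and the online tester in the question may use a different regex flavour.

```java
import java.util.regex.Pattern;

final class IpJsonRegexCheck {
    static final String SAMPLE =
        "{\"ip\":\"192.168.50.5\",\"iphone\":\"admin/ios\","
        + "\"dev\":{\"action\":\"CUS\",\"from\":\"REG\",\"CUSA\":\"ADVERT\"}}";

    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    private static final String HEAD =
        "\\{\"ip\":\"" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET;
    private static final String TAIL =
        "\",\"iphone\":\"admin/ios\","
        + "\"dev\":\\{\"action\":\"CUS\",\"from\":\"REG\",\"CUSA\":\"ADVERT\"\\}\\}";

    // The pattern from the question: '$' sits between the last octet and the closing quote.
    static final Pattern WITH_ANCHOR = Pattern.compile(HEAD + "$" + TAIL);
    // The same pattern with the stray '$' removed.
    static final Pattern WITHOUT_ANCHOR = Pattern.compile(HEAD + TAIL);

    public static void main(String[] args) {
        System.out.println(WITH_ANCHOR.matcher(SAMPLE).matches());    // false
        System.out.println(WITHOUT_ANCHOR.matcher(SAMPLE).matches()); // true
    }
}
```

This only explains the failed match; it does not change the advice above that a JSON parser is the more robust tool for this kind of input.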

How can I parse this language in Java?

Sorry for the vague title, I didn't know how to describe the problem with just one line.
Basically I'm trying to build a simple parser (manually) for a language with a syntax similar to XML, like this:
<my_language check="somestring">
*strings here*
</my_language>
Here, *strings here* means that anything could appear inside the tags (most likely code from another language).
An example of complete code could be something like this:
<my_language check="House">
House myHouse = new House();
house.setAdress("somewhere");
</my_language>
<my_language check="House/Garage">
Garage myGarage = new Garage();
garage.setCar("some car");
</my_language>
The sense of the language is not really relevant right now. What I need is a way to parse this, using a recursive descent parser (made with just a syntax analyzer and a lexical analyzer).
The grammar for the syntax analyzer is not really a problem... what I'm struggling to make is the lexical analyzer that finds the tokens I need.
I recently made another parser similar to this for a language more similar to XML, and I used a StreamTokenizer for the lexical analyzer. In this case, though, I don't know how I can use it.
With a StreamTokenizer I could easily split parts like <my_language check="House"> into tokens, but then I would need to take the code inside the tags as a whole (leaving the format intact), and I don't know how I can do that. Basically I would need to take the whole code block instead of going word by word, but I know that the StreamTokenizer won't let me do that.
So, what approach should I use?
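One possible approach, sketched here under assumptions and not as a definitive answer: let the lexer switch to a "raw" mode after the opening tag and take everything up to the closing tag as a single token. The sketch assumes the opening tag is written exactly as <my_language check="..."> with one attribute, and that the body never contains the literal text </my_language>. The names BlockLexer, Block and lex are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockLexer {
    private static final String OPEN = "<my_language check=\"";
    private static final String CLOSE = "</my_language>";

    // One opening tag plus the untouched text between the tags.
    static final class Block {
        final String check;
        final String body;
        Block(String check, String body) { this.check = check; this.body = body; }
    }

    static List<Block> lex(String source) {
        List<Block> blocks = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = source.indexOf(OPEN, pos);
            if (open < 0) {
                break; // no more blocks
            }
            int checkStart = open + OPEN.length();
            int checkEnd = source.indexOf("\">", checkStart);
            if (checkEnd < 0) {
                throw new IllegalArgumentException("Unterminated opening tag at index " + open);
            }
            // Raw mode: everything up to the closing tag is one token, formatting intact.
            int bodyStart = checkEnd + 2;
            int bodyEnd = source.indexOf(CLOSE, bodyStart);
            if (bodyEnd < 0) {
                throw new IllegalArgumentException("Missing closing tag for block at index " + open);
            }
            blocks.add(new Block(source.substring(checkStart, checkEnd),
                                 source.substring(bodyStart, bodyEnd)));
            pos = bodyEnd + CLOSE.length();
        }
        return blocks;
    }
}
```

The body is returned exactly as written, including line breaks, so it can be handed to whatever processes the embedded code. A recursive descent parser could then consume these blocks instead of word-level tokens.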

Java library for cleaning up user-entered title to make it show up in a URL?

I am doing a web application. I would like to have a SEO-friendly link such as the following:
http://somesite.org/user-entered-title
The above user-entered-title is extracted from user-created records that have a field called title.
I am wondering whether there is any Java library for cleaning up such user-entered text (remove spaces, for example) before displaying it in a URL.
My target text is something such as "stackoverflow-is-great" after cleanup from user-entered "stackoverflow is great".
I am able to write code to replace spaces in a string with dashes, but I am not sure what other rules/ideas/best practices are out there for making text part of a URL.
Please note that user-entered-title may be in different languages, not just English.
Thanks for any input and pointers!
Regards.
What you want is to "slugify" the phrase into a URL so that it is SEO-friendly.
When I had that problem, I ended up using a solution from maddemcode.com. Below you'll find an adapted version of its code.
The trick is to use the JDK's Normalizer class properly, plus a little additional cleanup. The usage is simple:
// casingchange-aeiouaeiou-takesexcess-spaces
System.out.println(slugify("CaSiNgChAnGe áéíóúâêîôû takesexcess spaces "));
// these-are-good-special-characters-sic
System.out.println(slugify("These are good Special Characters šíč"));
// some-exceptions-123-aeiou
System.out.println(slugify(" some exceptions ¥123 ã~e~iõ~u!##$%¨&*() "));
// gonna-accomplish-yadda
System.out.println(slugify("gonna accomplish, yadda, 완수하다, 소양양)이 있는 "));
Function code:
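import java.text.Normalizer;
import java.util.Locale;

public static String slugify(String input) {
    // NFD splits accented letters into base letter + accent, so the accent can be dropped as non-ASCII
    return Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "")
            .replaceAll("[^ \\w]", "").trim()
            .replaceAll("\\s+", "-").toLowerCase(Locale.ENGLISH);
}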
In the source page (http://maddemcode.com/java/seo-friendly-urls-using-slugify-in-java/) you can take a look at where this comes from. The small snippet above, though, works the same.
As you can see, there are some exceptional characters that aren't converted. To my knowledge, everyone who translates them uses some kind of map, like Django's urlify (see example map here). If you need them, I believe your best bet is to make one.
It seems you want to URL-encode a string. It's possible in core Java, without using external libraries. URLEncoder is the class you need.
Languages other than English shouldn't be a problem as the class allows you to specify the character encoding, which takes care of special characters like accents, etc.
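A minimal sketch of what URLEncoder produces, for comparison with the slug approach. URLEncoder targets the application/x-www-form-urlencoded format, so spaces become + and non-ASCII characters become percent-escapes; it does not produce dashes. The class and method names UrlEncodeDemo and encode are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodeDemo {
    // Encodes a title as UTF-8 form data; the name of the charset is passed explicitly.
    static String encode(String title) {
        try {
            return URLEncoder.encode(title, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("stackoverflow is great")); // stackoverflow+is+great
        System.out.println(encode("caf\u00e9"));              // caf%C3%A9
    }
}
```

The result is safe to embed in a URL, but it is not the "stackoverflow-is-great" form asked for, so some replacement of spaces with dashes would still be needed beforehand.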

Regex exclusion behavior

Ok, so I know this question has been asked in different forms several times, but I am having trouble with specific syntax. I have a large string which contains html snippets. I need to find every link tag that does not already have a target= attribute (so that I can add one as needed).
^((?!target).)* will give me the text leading up to 'target', and <a.+?>[\w\W]+?</a> will give me a link, but that's where I'm stuck. An example:
<a href="http://www.someSite.com>Link</a> (This should be a match)
Link (this should not be a match).
Any suggestions? Using DOM or XPATH are not really options since this snippet is not well-formed html.
1. You are being wilfully evil by trying to parse HTML with regexes. Don't.
2. That said, you are being extra evil by trying to do everything in one regex. There is no need for that; it makes your code regex-engine-dependent, unreadable, and quite possibly slow. Instead, simply match tags and then check your first-stage hits again with the trivial regex /target=/. Of course, that character string might occur elsewhere in an HTML tag, but see (1): you have already thrown good practice out of the window, so why not at least make things un-obfuscated so everyone can see what you're doing?
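The two-stage idea could look roughly like this in Java. This is a sketch only: the class and method names are made up, the first-stage pattern is an assumption (the answer does not give one), and it assumes no > appears inside attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoStageLinkFilter {
    // Stage 1: any opening <a ...> tag; \b keeps tags such as <abbr> out.
    private static final Pattern OPEN_A =
        Pattern.compile("<a\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Stage 2: the trivial check from the answer.
    private static final Pattern HAS_TARGET =
        Pattern.compile("target=", Pattern.CASE_INSENSITIVE);

    static List<String> tagsWithoutTarget(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = OPEN_A.matcher(html);
        while (m.find()) {
            String tag = m.group();
            // Keep only first-stage hits that fail the second-stage check.
            if (!HAS_TARGET.matcher(tag).find()) {
                result.add(tag);
            }
        }
        return result;
    }
}
```

Each stage stays readable on its own, and the second check can be tightened later without touching the tag matcher.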
If you insist on doing it with Regex a pattern such as this should help...
<a(?![^>]*target=) [^>]*>.*?</a>
It's by no means 100% perfect: technically speaking, a tag can contain a > in places other than the end, so it won't work for all HTML tags.
NB. I work with PHP, you may have to make slight syntax adjustments for Java.
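In Java the pattern itself can be used unchanged, since it contains nothing that needs escaping in a string literal. The sketch below is one way to apply it; the class and method names are hypothetical, and the DOTALL flag is an addition of mine so that link text spanning several lines is still matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadLinkFinder {
    // The negative lookahead rejects an <a ...> tag if "target=" occurs before its closing '>'.
    private static final Pattern LINK_WITHOUT_TARGET =
        Pattern.compile("<a(?![^>]*target=) [^>]*>.*?</a>", Pattern.DOTALL);

    static List<String> linksWithoutTarget(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK_WITHOUT_TARGET.matcher(html);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }
}
```

The same caveat applies as for the PHP version: a > inside an attribute value will confuse the pattern.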
You could try a negative lookahead like this:
<a(?!.*?target.*?).*?>[\w\W]+?</a>
I didn't test this and spent about a minute writing it, but for your specific example if you can do it on the client-side, try this via the DOM:
var links = document.getElementsByTagName("a");
for (var linkIndex = 0; linkIndex < links.length; linkIndex++) {
    var link = links[linkIndex];
    // Only touch real links (with an href) that have no target yet
    if (link.href && !link.target) {
        link.target = "someTarget";
        // or link.setAttribute("target", "someTarget");
    }
}

Parsing of data structure in a plain text file

How would you parse in Java a structure, similar to this
\\Header (name)\\\
1JohnRide 2MarySwanson
1 password1
2 password2
\\\1 block of data name\\\
1.ABCD
2.FEGH
3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc
Suppose, it comes from a text buffer (plain file).
Each line of text is terminated by "\n". Spaces separate the words.
The structure is more or less defined. There may sometimes be ambiguity, though, because the number of fields in each line may differ, a block of data may be missing, and the number of lines in each block may vary as well.
The question is how to do it most effectively?
The first solution that comes to mind is to use regular expressions.
But are there other solutions? Problem-oriented? Maybe some java library already written?
Check out UTAH: https://github.com/sonalake/utah-parser
It's a tool that's pretty good at parsing this kind of semi-structured text.
As no one recommended any library, my suggestion would be: use regex.
From what you have posted it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first char of a token to see if it is a digit (in which case the part of the token after the digit(s) will be the data, if there is any).
This sounds like a homework problem so I'm going to try to answer it in such a way to help guide you (not give the final solution).
First, you need to consider each object of data you're reading. Is it a number then a text field? A number then 3 text fields? Variable numbers and text fields?
After that you need to determine what you're going to use to delimit each field and each object. For example, in many files you'll see something like a semi-colon between the fields and a new line for the end of the object. From what you said it sounds like yours is different.
If an object can go across multiple lines you'll need to bear that in mind (don't stop partway through an object).
Hopefully that helps. If you research this and you're still having problems post the code you've got so far and some sample data and I'll help you to solve your problems (I'll teach you to fish....not give you fish :-) ).
If the fields are fixed length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write yourself a state machine which knows what kind of line to expect next, given what it's already seen. Once you have each line as a string, then you just need to split the data appropriately.
E.g., the password can be gotten from your password line like this:
final int pos = line.indexOf(' ');
String passwd = line.substring(pos+1, line.length());
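The BufferedReader plus state machine idea might be sketched as follows. The only state kept here is "which block are we in"; it assumes that any line starting with two backslashes is a block header and that every other non-empty line belongs to the most recent header. The names BlockFileParser and parse are made up, and splitting the individual lines into fields is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BlockFileParser {
    // Groups data lines under the header line that precedes them, in file order.
    static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        List<String> current = null; // state: lines of the block being read, if any
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("\\\\")) {
                // Header line: strip the backslashes to get the block name.
                String name = line.replace("\\", "").trim();
                current = new ArrayList<>();
                blocks.put(name, current);
            } else if (current != null) {
                current.add(line);
            }
            // Lines before the first header are ignored in this sketch.
        }
        return blocks;
    }

    // Convenience overload for text that is already in memory.
    static Map<String, List<String>> parse(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A missing block simply never shows up in the map, and blocks with a varying number of lines need no special handling. Each stored line can then be split as shown for the password line above.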
