Spanish language encoding issue with Java Properties class load method

I am trying to load Spanish text for internationalization using the load method of the Java Properties class, so that I can pass it to the frontend.
I encode the text as UTF-8, but the accented characters still do not come out correctly.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Properties;
public class Message extends Properties {
public static void main(String[] args) throws IOException {
String spanish = "label=Sí";
Message messages = new Message();
messages.load(new ByteArrayInputStream(spanish.getBytes(Charset.forName("UTF-8"))));
System.out.println(messages.get("label"));
}
}
When I run the above code, I get the text "SÃ-". How can I get "Sí" back?

Properties.load(InputStream) always decodes its input as ISO-8859-1, so UTF-8 bytes are misread. Encoding the string as ISO-8859-1 gives the expected result:
messages.load(new ByteArrayInputStream(spanish.getBytes(Charset.forName("ISO-8859-1"))));
More about ISO-8859-1 can be found at this link:
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
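The ISO-8859-1 approach only covers characters that exist in that charset. A possible alternative, sketched here as an assumption rather than as part of the original answer, is to keep the data in UTF-8 and pass a Reader to Properties.load, which decodes the characters before the properties are parsed. The class name Utf8PropertiesDemo and the helper loadUtf8 are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the original question.
class Utf8PropertiesDemo {

    // Decodes the bytes as UTF-8 through a Reader, then lets Properties parse the characters.
    static Properties loadUtf8(byte[] utf8Bytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            // Reading from an in-memory array should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The accented i is written as a Unicode escape so the source file's own encoding does not matter.
        String spanish = "label=S\u00ed";
        Properties messages = loadUtf8(spanish.getBytes(StandardCharsets.UTF_8));
        System.out.println(messages.getProperty("label"));
    }
}
```

Whether the accent then shows correctly on the console still depends on the console's own encoding; the value held in the Properties object is the intended string either way.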

Related

jsoup Not Catching Full Webpage

Trying to make a super simple bit of code using jsoup to see if a webpage contains a specific word (checks to see the availability of a class to take in-residence). jsoup seems to be catching lots of the webpage specified, but won't capture all of it - specifically the area I'm interested in. Is there a reason for this? Am I doing something wrong?
import org.jsoup.Jsoup;
import java.io.IOException;
import org.jsoup.nodes.Document;
public class main {
public static void main(String[] args) throws IOException{
String html = Jsoup.connect("https://www.afit.edu/CE/Course_Desc.cfm?p=WENG%20481").maxBodySize(0).get().html();
System.out.print(html);
if (html.contains("Resident")) {
System.out.print("\nAVAILABLE!");
}
}
}

Stanford Parser - use german model jar

I want to use the Stanford Parser within CoreNLP.
I already got this example working:
http://stanfordnlp.github.io/CoreNLP/simple.html
But I need the German model, so I downloaded "stanford-german-2016-01-19-models.jar".
How can I set up this jar file for use?
I only found:
LexicalizedParser lp = LexicalizedParser.loadModel("englishPCFG.ser.gz");
but I have a jar with the German models, not a ...ser.gz file.
Can anybody help?

Here is some sample code for parsing a German sentence:
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.simple.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;
import edu.stanford.nlp.util.StringUtils;
import java.util.*;
public class SimpleGermanExample {
public static void main(String[] args) {
String sampleGermanText = "...";
Annotation germanAnnotation = new Annotation(sampleGermanText);
Properties germanProperties = StringUtils.argsToProperties(
new String[]{"-props", "StanfordCoreNLP-german.properties"});
StanfordCoreNLP pipeline = new StanfordCoreNLP(germanProperties);
pipeline.annotate(germanAnnotation);
for (CoreMap sentence : germanAnnotation.get(CoreAnnotations.SentencesAnnotation.class)) {
Tree sentenceTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
System.out.println(sentenceTree);
}
}
}
Make sure you download the full toolkit to use this sample code.
http://stanfordnlp.github.io/CoreNLP/
Also make sure you have the German models jar in your CLASSPATH. The code above looks through all the jars in your CLASSPATH and will find the properties file inside the German jar.

First of all: this works, thank you!
But I don't need this complex approach with all these annotators. That's why I wanted to start with the simple CoreNLP API. This is my code:
import edu.stanford.nlp.simple.*;
import java.util.*;
public class Main {
public static void main(String[] args) {
Sentence sent = new Sentence("Lucy is in the sky with diamonds.");
List<String> posTags = sent.posTags();
List<String> words = sent.words();
for (int i = 0; i < posTags.size(); i++) {
System.out.println(words.get(i)+" "+posTags.get(i));
}
}
}
How can I get the German properties file to work with this example?
Or the other way round: how do I get only the words with their POS tags in your example?

The German equivalent of the English example is the following:
LexicalizedParser lp = LexicalizedParser.loadModel("germanPCFG.ser.gz");
Extract the latest stanford-german-corenlp-2018-10-05-models.jar file and you will find it inside the folder: stanford-german-corenlp-2018-10-05-models\edu\stanford\nlp\models\lexparser

Why am I not able to read HTML content from a website in a file?

I have written a Java program that should read the HTML content of any website using the Scanner class, with the URL taken from the program arguments (String[] args). I get no output, only an exception.
The code is below.
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.Scanner;
public class ReadWebsite
{
public static void main(String[] args) throws Exception
{
URL oracle = new URL(args[0]);
Scanner s=new Scanner(oracle.openStream());
while (s.hasNextLine())
{
System.out.println(s.nextLine());
}
s.close();
}
}
Output shown:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at oodlesTech.ReadWebsite.main(ReadWebsite.java:15)

If you are running from Eclipse, you have to pass the arguments.
Right-click the program -> Run As -> Run Configurations -> Arguments -> Program arguments. In this tab, enter the actual URL, which will be passed as args[0] to your main method.

You are not passing the argument to your Java program, so args is empty and args[0] throws ArrayIndexOutOfBoundsException.
For testing you can either hard-code it, e.g. URL oracle = new URL("http://www.google.com");, or pass the URL as a command-line argument when you run the program.
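A small guard around args avoids the exception altogether. The following is a minimal sketch, assuming a fallback URL is acceptable while testing; the class name ArgsCheckDemo and the helper firstArgOrDefault are invented for this illustration and do not appear in the question.

```java
// Hypothetical demo class showing a defensive check before reading args[0].
class ArgsCheckDemo {

    // Returns the first command-line argument, or the fallback when none was supplied.
    static String firstArgOrDefault(String[] args, String fallback) {
        return args.length > 0 ? args[0] : fallback;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("No URL given on the command line; using the default.");
        }
        // The default below is the example URL from the answer above.
        String url = firstArgOrDefault(args, "http://www.google.com");
        System.out.println("Would read from: " + url);
    }
}
```

In the original program the returned string would then be passed to new URL(...) in place of args[0].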

Run a String through Java using Pig

I have a UDF jar that takes a String as input through Pig. The UDF works fine from Pig with a hard-coded string, for example with this command:
B = foreach f generate URL_UDF.mathUDF('stack.overflow');
This gives me the output I expect.
Now I am trying to read data from a text file and use my UDF with it: I load a file and want to pass a column of the loaded data to the UDF.
LoadData = load 'data.csv' using PigStorage(',');
f = foreach LoadData generate $0 as col0, $1 as chararray
$1 is the column I need and, from reading about data types (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Data+Types), a chararray should be used.
I then tried the following command
B = foreach f generate URL_UDF.mathUDF($1);
to pass the data into the jar, which fails with
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
If anybody has a solution to this, that would be great.
The Java code I am running is as follows:
package URL_UDF;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class mathUDF extends EvalFunc<String> {
public String exec(Tuple arg0) throws IOException {
// TODO Auto-generated method stub
try{
String urlToCheck = (String) arg0.get(0);
return urlToCheck;
}catch (Exception e) {
// Throwing an exception will cause the task to fail.
throw new IOException("Something bad happened!", e);
}
}
}
Thanks

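Without a schema, PigStorage loads every field as a bytearray, which is why the UDF receives a DataByteArray. You can specify the schema with LOAD as follows: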
LoadData = load 'data.csv' using PigStorage(',') AS (col0: chararray, col1:chararray);
and pass col1 to the UDF.
Or
B = foreach LoadData generate (chararray)$1 AS col1:chararray;
Actually, this is a bug (PIG-2315) in Pig which will be fixed in 0.12.1. The AS clause in foreach does not work as one would expect.

Error: Could not find or load main class- Novice

Hi, I am a novice in Java. I have been getting a file-not-found exception even though the file exists at the very location I specified in the path.
Initially the problem was the file not being found. However, after performing a clean and re-run, I now get this error:
Error: Could not find or load main class main.main
import Message.*;
import java.util.*;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
public class main{
public static void main(String[] args) {
Message msg=new Message("bob","alice","request","Data####");
MPasser passerObj=new MPasser("C:\\Workspace\\config.txt","process1");
}
}
Also, the MPasser constructor contains the following relevant code:
public MPasser(String file_name, String someVariable) {
InputStream input;
try {
input =new RandomAccessFile(file_name,"r");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
Yaml yaml = new Yaml();
Map<String, String> Object = (Map<String, String>) yaml.load(input);
}
Sorry, I have edited my initial question to make it clearer.

On this line:
input = RandomAccessFile("C:\Workspace\conf.txt",'r');
you need to escape the backslashes (and also use new and a String mode "r"):
input = new RandomAccessFile("C:\\Workspace\\conf.txt", "r");

"C:\Workspace\conf.txt"
In a Java string literal a backslash starts an escape sequence, and \W and \c are not valid escapes, so this will not compile. You probably meant:
"C:\\Workspace\\conf.txt"
You also appear to call it config.txt in one snippet and conf.txt in the other.

Make sure the java process has permissions to read the file.

You have to escape the backslashes, and you also need the new keyword:
input = new RandomAccessFile("C:\\Workspace\\conf.txt", "r");
Also, why do you have two different file names, conf.txt and config.txt?
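The escaping rule from the answers above can be checked with a short, self-contained sketch. The class name WindowsPathDemo and its two helper methods are made up for this illustration; the path is the one from the question's main method. The forward-slash variant is offered as an assumption that may suit some setups, since the Java file APIs on Windows accept it as well.

```java
// Hypothetical demo class illustrating backslash escaping in Java string literals.
class WindowsPathDemo {

    // Each "\\" in the source becomes ONE backslash in the resulting string.
    static String escapedPath() {
        return "C:\\Workspace\\config.txt";
    }

    // Forward slashes need no escaping in a string literal.
    static String forwardSlashPath() {
        return "C:/Workspace/config.txt";
    }

    public static void main(String[] args) {
        System.out.println(escapedPath());      // prints C:\Workspace\config.txt
        System.out.println(forwardSlashPath()); // prints C:/Workspace/config.txt
    }
}
```

Either string can be passed as the first argument of new RandomAccessFile(path, "r").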
