How to extract the data from a list? - java

I'm developing an Android application and I want to recognize hashtags, mentions and links. I had Objective-C code that does what I want, so I asked how to port it, and now I have this code:
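import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
String input = /* text from edit text */;
String[] words = input.split("\\s");
// urls is never initialized, so "..." + urls yields "null"
// and urls.add(...) would throw a NullPointerException
List<URL> urls = null;
for (String s : words) {
    try {
        urls.add(new URL(s));
    } catch (MalformedURLException e) {
        // not a URL
    }
}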
Now I want to put these URLs in a tweet. I have already written the code that posts it, and the tweet is built from a string. How do I put the data from the list into the string?
// I tried this:
String tweet = "Using my app" + urls;
But the tweet shows "Using my appnull".
How can I reuse this code to recognize hashtags and mentions?
I think it would mean changing input.split("\\s") to something like "#\\s" or "@\\s".

You could just use a library that already does what you're trying to do:
https://github.com/twitter/twitter-text-java
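If you'd rather not add a dependency, here is a minimal sketch in plain Java that fixes the two problems in the question's code: the list is initialized before anything is added to it, and its contents are joined into the tweet explicitly instead of concatenating the List object itself. Hashtags and mentions are detected with a simple prefix check on each whitespace-separated word; this is a deliberate simplification (it does not strip trailing punctuation such as "#java," and is much less thorough than twitter-text). The class name TweetTokens and the sample input are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class TweetTokens {
    // Collects every whitespace-separated word that parses as a URL.
    static List<URL> extractUrls(String input) {
        List<URL> urls = new ArrayList<>(); // initialized, so add() cannot throw NullPointerException
        for (String word : input.split("\\s+")) {
            try {
                urls.add(new URL(word));
            } catch (MalformedURLException e) {
                // not a URL, skip it
            }
        }
        return urls;
    }

    // Collects words starting with the prefix: "#" for hashtags, "@" for mentions.
    static List<String> extractWithPrefix(String input, String prefix) {
        List<String> result = new ArrayList<>();
        for (String word : input.split("\\s+")) {
            if (word.startsWith(prefix) && word.length() > prefix.length()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Check http://example.com #java @someone"; // hypothetical EditText content
        List<String> parts = new ArrayList<>();
        for (URL url : extractUrls(input)) {
            parts.add(url.toString());
        }
        // Join the list elements explicitly instead of appending the List object
        String tweet = "Using my app " + String.join(" ", parts);
        System.out.println(tweet);
        System.out.println(extractWithPrefix(input, "#"));
        System.out.println(extractWithPrefix(input, "@"));
    }
}
```

String.join needs Java 8, which on Android means API level 26 or later; on older Android versions android.text.TextUtils.join(" ", parts) does the same job.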

Related

Why is JSoup printing a question mark?

I have some code that reads a page from gutenberg.org. Almost every character comes through correctly, but a few do not, even though they display fine in the browser.
package nl.atticworks.gutenberg;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class Gutenberg {
    private static final String GET_URL = "http://www.gutenberg.org/browse/languages/nl";
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect(GET_URL).get();
            Elements data = doc.select("div.pgdbbylanguage");
            for (Element d : data) {
                // Every element in this block, in document order
                Elements children = d.select("*");
                for (Element child : children) {
                    if (child.tagName().equals("ul")) {
                        // The element just before the <ul> in document order is the author entry
                        Element author = children.get(children.indexOf(child) - 1);
                        String a1 = author.select("a:last-child").text();
                        if (a1.startsWith("Kara")) {
                            System.out.println(a1);
                            Elements titles = child.select("li.pgdbetext a");
                            for (Element title : titles) {
                                System.out.println("\t" + title.text());
                            }
                        }
                    }
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace(); // do something more useful here
        }
    }
}
The string a1 prints "Karadži?, Vuk Stefanovi?, 1787-1864" but should print "Karadžić, Vuk Stefanović, 1787-1864".
I'm fairly sure the encoding is fine (UTF-8), but the c with acute (ć) isn't shown correctly.
Browsers show the correct character, but my program doesn't. Why?
Regards,
Hans
Since you haven't said where you are running your program, it is hard to give a definitive answer, but there is basically nothing wrong with your code. Jsoup is not responsible for the display problem; the console you are printing to is.
If you set your console (or IDE) to the UTF-8 encoding, it should display correctly.
I tried this code in my own IntelliJ IDEA, and the output was exactly what you expected.
So I'm confident the console encoding is the problem.
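One way to rule out the platform's default encoding, assuming the console itself can display UTF-8, is to print through a PrintStream that encodes explicitly as UTF-8. The class name Utf8Console and the helper name utf8PrintStream are hypothetical, and the string uses Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps an OutputStream in an auto-flushing PrintStream that encodes text as UTF-8,
    // regardless of the platform's default charset.
    static PrintStream utf8PrintStream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8PrintStream(new FileOutputStream(FileDescriptor.out));
        out.println("Karad\u017Ei\u0107, Vuk Stefanovi\u0107, 1787-1864");
    }
}
```

Launching the JVM with -Dfile.encoding=UTF-8 is another commonly suggested option, although how it affects System.out has varied between JDK versions.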

Renaming video files with a Java program

I have a folder containing many videos that I'd like to rename, and I can't think of a convenient way to do it. The naming convention is "SeasonX, EpisodeY: Episode name", or "SXEY:Name" for short.
An example: S01E01:JavaCode
That would be Season 1, Episode 1, an episode called JavaCode.
I wrote something that can change the file names, but because it's a TV show I need a different, unique name for every episode.
Here's the code:
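import java.io.File;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class BatchFileRenamer {
    public static void main(String[] args) {
        File folder = new File("C:\\Users\\Tony\\Videos\\New folder");
        String name = "name";
        File[] files = folder.listFiles();
        if (files == null) {
            return; // folder does not exist or is not a directory
        }
        Arrays.sort(files); // listFiles() returns files in no guaranteed order
        // Matches the file extension: the last dot and everything after it
        Pattern p = Pattern.compile("\\.[^.]*$");
        for (int i = 0; i < files.length; i++) {
            Matcher m = p.matcher(files[i].getName());
            System.out.println(files[i].getName());
            String extension = m.find() ? m.group() : "";
            // Episodes are numbered from 1 and zero-padded: S01E01, S01E02, ...
            String newName = String.format("%s S01E%02d%s", name, i + 1, extension);
            if (!files[i].renameTo(new File(folder, newName))) {
                System.out.println("Could not rename " + files[i].getName());
            }
        }
    }
}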
I thought about creating an array of the episode names, but that's as much work as renaming the files manually in Windows. A downloadable text file listing the episode names for each TV show would be useful.
Anyway, any suggestions would be greatly appreciated!
I think the best way to do this would be to use the Open Movie Database (OMDb) API. It returns a REST response that includes the list of episodes for each season of a show. (Example request).
You can then use Gson or another JSON parser to deserialize the list of episodes.
Here is a Gist with some sample code. (There is probably a better getter method, but you get the point.)
The code fetches the information from the example request above through the API, then uses Gson to deserialize it into a basic POJO, the Episodes class from Episodes.java:
// download holds the JSON text returned by the API request
Gson gson = new Gson();
Episodes episodes = gson.fromJson(download, Episodes.class);
System.out.println(episodes);
You can then use this information to create the individual file names for the video files.
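Whatever the source of the episode titles, the renaming step needs one file name per episode. Below is a sketch assuming the titles arrive as a list in episode order (the titles here are placeholders, not real API output, and the class name EpisodeNames is hypothetical). Note that Windows does not allow the characters < > : " / \ | ? * in file names, so the "SXEY:Name" form cannot be used verbatim there; this sketch drops those characters and separates the episode code from the title with a space instead.

```java
import java.util.ArrayList;
import java.util.List;

public class EpisodeNames {
    // Characters that Windows does not allow in file names
    private static final String INVALID = "[<>:\"/\\\\|?*]";

    // Builds e.g. "S01E02 Pilot.mkv" from a season number, a 1-based episode number,
    // the episode title and the original file extension.
    static String fileName(int season, int episode, String title, String extension) {
        String safeTitle = title.replaceAll(INVALID, "").trim();
        return String.format("S%02dE%02d %s%s", season, episode, safeTitle, extension);
    }

    public static void main(String[] args) {
        // Placeholder titles; in practice they would come from the API response
        List<String> titles = List.of("Pilot", "Part 2: The Return");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            names.add(fileName(1, i + 1, titles.get(i), ".mkv"));
        }
        names.forEach(System.out::println);
    }
}
```

Pairing these names with the sorted file list from the renaming loop gives each episode its own unique name.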

jsoup: How to extract correct data from this website

I am trying to extract data from a Spanish dictionary using jsoup. Essentially, the user enters the words he wants defined as command-line arguments, and the program returns a formatted list of definitions. Here is what I have done so far:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class Main {
    public static void main(String[] args) {
        String[] urls = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            urls[i] = "http://www.diccionarios.com/detalle.php?palabra="
                    + args[i]
                    + "&Buscar.x=0&Buscar.y=0&Buscar=submit&dicc_100=on&dicc_100=on";
            try {
                Document doc = Jsoup.connect(urls[i]).get();
                Elements htmly = doc.getElementsByTag("html");
                String untokenized = htmly.text();
                System.out.println(untokenized);
            } catch (Exception e) {
                System.out.println("EXCEPTION: Word is probably not in this dictionary.");
            }
        }
    }
}
The urls array contains the correct URLs for the pages that hold the definitions.
What I expect to get back is what you would see if you went to the try.jsoup website and entered (for example) http://www.diccionarios.com/detalle.php?palabra=libro&Buscar.x=0&Buscar.y=0&Buscar=submit&dicc_100=on&dicc_100=on
as the link and typed html as the CSS query. I need that data so I can tokenize the definition out of it.
So my question is: which method should I use to get the same data that the try.jsoup website shows? Thanks a lot!
Edit: This is about interpreting the data from the URL. The result I want (in this example) is "Conjunto de hojas escritas unidas o cosidas por uno de sus lados y cubiertas por tapas de cartón u otro material.", which is the definition on the website. However, I noticed that on the try.jsoup website, entering html in the CSS Query box returns a huge block of text. I assumed the following two lines of code would capture that text and save it as a string:
Elements htmly = doc.getElementsByTag("html");
String untokenized = htmly.text();
However, when I print untokenized, the output is instead: "Usuario Clave ¿Olvidaste tu clave? Condiciones Privacidad Versión completa © 2011 Larousse Editorial, SL." So my question is, how do I obtain the string data for the huge block of text shown on the try.jsoup website?
EDIT: I followed the advice in this question: Jsoup - CSS Query selector issue (?), and it worked great.

Quote retweets Twitter4j

I have a problem with twitter4j when I get the home timeline using this code:
try {
    ResponseList<Status> tweets = twitter.getHomeTimeline();
    for (Status s : tweets) {
        Tweet temp = new Tweet(new URL(s.getUser().getProfileImageURL()), s.getUser().getName(), "@" + s.getUser().getScreenName(), s.getText());
        tweetsPanel.add(temp);
    }
} catch (TwitterException | MalformedURLException e) {
    e.printStackTrace();
}
(Tweet is a local class.) Everything works, except that retweets in the timeline are displayed like quote retweets:
RT @SOMEONE : the tweet.
I want them displayed like on the website, as normal retweets.
In twitter4j, retweets are shown in the format
RT @user : tweet
because that is the actual form a retweet takes as text. If you post a tweet in this same format on Twitter, Twitter itself will parse it as a normal retweet.
The only way to remove it is to parse the text and strip the first part manually.
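Try something like this:
String tweetText = "";
String text = s.getText();
String[] splitted = text.split(":");
if (text.startsWith("RT @") && splitted.length > 1) {
    // Skip splitted[0] ("RT @user") and put back the ":" that split() removed
    for (int i = 1; i < splitted.length - 1; i++) {
        tweetText += splitted[i] + ":";
    }
    tweetText += splitted[splitted.length - 1];
} else {
    tweetText = text; // not a retweet, keep the text unchanged
}
return tweetText;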
Starting the loop at i = 1 skips the first piece, which contains "RT @user". Appending splitted[i] + ":" puts back any other ":" in the tweet that split() would otherwise remove. Since you don't want to introduce a ":" that wasn't there, the last piece of splitted is appended outside the loop, without the ":".
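As an alternative to splitting and re-joining, a single anchored regular expression can remove the prefix while leaving tweets that are not retweets untouched. This sketch assumes retweet texts start with "RT @screenname" followed by a colon, as in the question; the class and method names are made up. Separately, twitter4j's Status interface has isRetweet() and getRetweetedStatus(), which may let you read the original tweet directly instead of parsing text; check that your twitter4j version provides them.

```java
import java.util.regex.Pattern;

public class RetweetText {
    // Matches a leading "RT @screenname:" plus any whitespace around the colon
    private static final Pattern RT_PREFIX = Pattern.compile("^RT @\\w+\\s*:\\s*");

    // Returns the text without the retweet prefix; other text is returned unchanged
    static String stripRetweetPrefix(String text) {
        return RT_PREFIX.matcher(text).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripRetweetPrefix("RT @someone: the tweet: with a colon"));
        System.out.println(stripRetweetPrefix("Note: not a retweet"));
    }
}
```

Because the pattern is anchored with ^, colons later in the tweet are never touched, so there is nothing to put back afterwards.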

How to get specific lines in a text file

I am trying to read the contents of a text file. The idea is to find the first line with the 'title :' keyword, keep reading, find the next 'title :' line, and repeat until the whole file has been read. I want to store the results in a database. Other ideas for doing this are welcome as well. Thanks.
This is the text file I am trying to read from.
title : Mothers Day
mattiebelle : YEA! A movie that grabbed me from beginning to end! Love to come across this kind of movie. A must see for all! Enjoy!
title : Pregnant in Heels
CuittePie : I CAN'T WATCH ANY OF THEES. :#
title : The Flintstones
Row_Sweet_Girl : Nice one to watch
title : Barter Kings
dragon3476 : Barter Kings - Season 1 Episode 4 - Rock and a Hard Place Air Date: 19/06/2012 Summary:Traders barter for a car and a pool table.
I think the easiest way is to use FileUtils from Apache Commons IO, like this:
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.commons.io.FileUtils;
public class ReadFileLines {
    public static void main(final String[] args) throws IOException {
        // Read the whole file into memory, one String per line
        List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
        for (String line : lines) {
            if (line.startsWith("title : ")) {
                System.out.println(line); // store it in the database here
            }
        }
    }
}
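If you also need the comment lines under each title (for example, to store each title together with its comments in the database), here is a sketch using only the JDK's java.nio.file.Files. It assumes every title line starts with exactly "title : " as in the sample file and that titles are unique; a repeated title would replace the earlier entry in the map. The class name TitleGrouper is hypothetical, and the path is the same placeholder used in the answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TitleGrouper {
    private static final String TITLE_PREFIX = "title : ";

    // Groups each "title : X" line with the non-blank lines that follow it,
    // preserving the order in which titles appear in the file.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> byTitle = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            if (line.startsWith(TITLE_PREFIX)) {
                String title = line.substring(TITLE_PREFIX.length()).trim();
                current = new ArrayList<>();
                byTitle.put(title, current);
            } else if (current != null && !line.trim().isEmpty()) {
                current.add(line); // lines before the first title are ignored
            }
        }
        return byTitle;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/tmp/myFile.txt"), StandardCharsets.UTF_8);
        for (Map.Entry<String, List<String>> entry : group(lines).entrySet()) {
            // Each entry could become one database row: the title plus its comments
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```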
