Unlimited tweet search using Java

I need to retrieve all tweets (and I do mean all) up to a given date, or between two dates.
The code I wrote returns tweets, but only for today. I implemented paging, but it doesn't help: I do get multiple pages and the data is not redundant, but it is still limited to the current day. I only get around 600-700 tweets, and hasNext() returns false after 6-7 pages.
I'm fairly new to this API and don't know much about the framework, so forgive me if I sound naive.
Here's the code:
Query search = new Query(searchKeyWord);
QueryResult results;
search.setCount(100);
//search.setMaxId(-1);
search.setSince("2013-01-01");
search.lang("en");
// search.setUntil("2013-05-01");
int i = 0;
//TwitterFactory.getSingleton().search(search);//
do {
    i++;
    System.out.println("Page " + i);
    results = tweety.search(search);
    for (Status stats : results.getTweets()) {
        // Flatten newlines so each tweet stays on one output line.
        String text = stats.getText().replace("\n", " ");
        writer.append(stats.getUser().getScreenName() + ";" + text + ";" + stats.getCreatedAt() + ";" + "\n");
    }
    // nextQuery() returns null once there are no more pages.
    search = results.nextQuery();
} while (search != null);
The requirement is text mining on a large amount of data, so the more tweets retrieved the better. Of course I will be restricting the since and until dates. But if I set the dates to an older time interval, tweets are still retrieved only for the last day of that interval.
Am I doing something wrong? Is there something I need to add or change to get all the tweets? I'm aware of rate limits. Are they the reason I receive only limited data?
Thanks in advance.

You should use both the Search API and the Streaming API. I am also working on data mining with Twitter data, and I implemented two separate apps to collect tweets; you can do the same. The Streaming API needs only one Twitter account for the token and authentication. For the Search API, however, you should have more accounts. If you have more questions, let me know.
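On the paging part of the question: the do/while loop with nextQuery() walks backward through older and older results until the search index has nothing left. Going by Twitter's documentation for the standard Search API, that index only covers roughly the last week, however far back since is set. Below is a minimal, library-free sketch of that backward paging. TweetSource and MaxIdPager are hypothetical names used for illustration; they stand in for the real client and are not part of Twitter4J.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search client (an assumption, not a Twitter4J type):
// returns up to `count` tweet IDs that are <= maxId, newest first.
interface TweetSource {
    List<Long> fetch(long maxId, int count);
}

class MaxIdPager {
    // Collects every ID the source will return by lowering maxId after each page.
    static List<Long> collectAll(TweetSource source, int count) {
        List<Long> all = new ArrayList<>();
        long maxId = Long.MAX_VALUE;
        while (true) {
            List<Long> page = source.fetch(maxId, count);
            if (page.isEmpty()) {
                break; // the source has nothing older to offer
            }
            all.addAll(page);
            long oldest = page.get(page.size() - 1);
            maxId = oldest - 1; // maxId is inclusive, so step past the oldest ID seen
        }
        return all;
    }
}
```

The loop ends when a page comes back empty, which is the same condition as nextQuery() returning null; getting more history would need a different data source, not a different loop.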

Related

Best way to gather large volume of tweets?

So I am currently trying to gather tweets from a specific location and then analyse what is going on in that location from the tweets gathered. My task basically involves a lot of data mining.
The main problem I have come across, however, is gathering enough tweets to make a judgement.
I have been using the Twitter Streaming API, but it only gives 1% of all tweets, which is far from enough. I mined 100,000 tweets and very few were in English, let alone related to the location I was looking for.
I have also noticed that Twitter rate limits how often you can call a method via their API. How do sites like trendsmap.com work? Are they somehow accessing a larger data set?
Edit: OK, so I have tried the geolocation feature in the Twitter4J API. It turns out the rate limits can be avoided if you are careful with your implementation. However, the number of people who actually have geolocation turned on when tweeting is very low, so the results do not represent people in that area. I seem to be getting the same tweets every single time. Twitter does offer a search operator, "near", which works great on their website. However, they have not included this functionality in their API as far as I can tell.
If you are searching using the Twitter API you can restrict your searches to a specific geolocation using the geocode option.
You can use result_type=recent to ensure you're only getting the most recent tweets.
The maximum count - that is, number of tweets per request - is 100.
The current limit on number of search requests per hour is 450.
So, that's a maximum of 45,000 tweets per hour - is that enough for you?
tl;dr - use the most restrictive set of search parameters to limit the results to those you actually need.
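As a rough sketch of the numbers and parameters above, the snippet below computes the tweet ceiling for a rate-limit window and assembles a geocode-restricted query string. The parameter names (q, geocode, result_type, count) are the ones the Search API uses, and geocode takes the form latitude,longitude,radius with a km or mi suffix. The class name GeoSearchPlan, the keyword, and the coordinates are illustrative assumptions only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Twitter4J.
class GeoSearchPlan {
    // Upper bound on tweets per rate-limit window: one full page per allowed request.
    static int maxTweets(int tweetsPerRequest, int requestsPerWindow) {
        return tweetsPerRequest * requestsPerWindow;
    }

    // Builds the query string for a geocode-restricted search of recent tweets.
    static String queryString(String keyword, double lat, double lon, int radiusKm, int count) {
        String geocode = lat + "," + lon + "," + radiusKm + "km";
        return "q=" + encode(keyword)
                + "&geocode=" + encode(geocode)
                + "&result_type=recent"
                + "&count=" + count;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

With the figures from this answer, GeoSearchPlan.maxTweets(100, 450) gives the 45,000 ceiling; the real yield will be lower whenever a page comes back less than full.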

Getting All Tweets From a Country Within A Time Period at Java

I am working on a project in which I need to get all tweets posted from a country within a certain time period. I will do data mining on them afterwards (examining how many positive thoughts are expressed about a certain pupil, etc.). I want to use Java as the programming language, but I don't know how to start this project. I did a search and I know that there are:
Twitter's Search API
Twitter's Streaming API
Twitter4J a twitter API for Java
Something interesting outside of Java: http://dev.datasift.com/discussions/category/csdl-language
Where can I start in order to get all tweets from a country (or, if possible, from a given state) within a time period? Some examples work like this: you give a username and they return the tweets if it is a public profile. I don't have a list of all public profiles. Do I need to handle that problem, and how?
Any ideas?
If you are going to use Java, Twitter4J is your best shot.
But you will have to choose a strategy for retrieving the tweets that you want.
You can either get the data from Twitter itself or get it from a data provider that has full Firehose access. DataSift and Gnip are providers with full access to the Firehose. If you want to use a data provider, DataSift is the way to go because of its own query language, which is pretty cool.
If you retrieve the data yourself:
First, if you want to get tweets in real time, you need to use the Twitter Streaming API, and Twitter4J makes it really easy to use. Unfortunately, the Streaming API doesn't support country or language filtering. You can listen on the Streaming API for the search queries you have registered.
Your second option is the Search API. Twitter4J also makes using the Search API pretty easy. The Search API supports many more filtering options, but there isn't any way to filter tweets by country. Filtering tweets by language instead is a much more useful way to do that, e.g. filtering tweets that are en, fr and so on.
Hope this helps.
You want to use the Search API. However, the API doesn't allow searching by country, only by geocode.
In Twitter4J you can get the location like this:
tweet.getUser().getLocation()
But this returns whatever the user typed into the location field of their profile.

How to get around the Twitter 3200 status limit? [duplicate]

With https://dev.twitter.com/docs/api/1/get/statuses/user_timeline I can get the 3,200 most recent tweets. However, certain sites like http://www.mytweet16.com/ seem to bypass the limit, and browsing through the API documentation I could not find anything.
How do they do it, or is there another API that doesn't have the limit?
You can use the Twitter search page to bypass the 3,200 limit. However, you have to scroll down many times in the search results page. For example, I searched for tweets from @beyinsiz_adam. This is the link to the search results:
https://twitter.com/search?q=from%3Abeyinsiz_adam&src=typd&f=realtime
Now, in order to scroll down many times, you can use the following JavaScript code.
// Scroll to the bottom once per second so the page keeps loading older tweets.
var myVar = setInterval(function () { myTimer(); }, 1000);
function myTimer() {
    window.scrollTo(0, document.body.scrollHeight);
}
// Run clearInterval(myVar) to stop scrolling.
Just run it in the Firebug console and wait a while for all the tweets to load.
The only way to see more is to start saving them before the user's tweet count hits 3200. Services which show more than 3200 tweets have saved them in their own dbs. There's currently no way to get more than that through any Twitter API.
http://www.quora.com/Is-there-a-way-to-get-more-than-3200-tweets-from-a-twitter-user-using-Twitters-API-or-scraping
https://dev.twitter.com/discussions/276
Note from that second link: "…the 3,200 limit is for browsing the timeline only. Tweets can always be requested by their ID using the GET statuses/show/:id method."
I've been in this (Twitter) industry for a long time and have witnessed lots of changes in the Twitter API and documentation. I would like to clarify one thing. There is no way to get past the 3,200 tweet limit. Twitter doesn't provide this data even in its new premium API.
The only way someone can get past this limit is by saving the tweets of an individual Twitter user.
There are tools available which claim to have a wide database and provide more than 3,200 tweets. A few that I know of are followersanalysis.com and keyhole.co.
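The "save them in your own database" approach described in the answers above comes down to merging each timeline fetch into a store keyed by tweet ID, so that repeated fetches only add what is new. A minimal in-memory sketch, assuming tweets arrive as ID-to-text pairs; TweetArchive is a hypothetical class for illustration, and a real service would presumably persist to a database rather than a map.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory archive; stands in for the service's own database.
class TweetArchive {
    // Tweet ID -> text; a TreeMap keeps the archive sorted by ID.
    private final Map<Long, String> tweets = new TreeMap<>();

    // Merges one timeline fetch and returns how many tweets were new.
    // Assumes tweet texts are non-null.
    int merge(Map<Long, String> fetched) {
        int added = 0;
        for (Map.Entry<Long, String> entry : fetched.entrySet()) {
            if (tweets.putIfAbsent(entry.getKey(), entry.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    int size() {
        return tweets.size();
    }
}
```

Run periodically against the user's timeline, the archive keeps growing even after older tweets fall out of the 3,200-tweet window.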
You can use a tool I wrote that bypasses the limit.
It saves the Tweets in a JSON format.
https://github.com/pauldotknopf/twitter-dump
You can use the Python library snscrape to do it. Or you can use the ExportData tool to get all tweets for the user, which returns preprocessed CSV and spreadsheet files. The first option is free, but gives less information and requires more manual work.

Building network graph from twitter users by subject

I'm trying to construct a social network graph of twitter users who have mentioned a particular topic. My strategy to do this goes roughly like this:
1. Query Twitter for a topic. Collect the first 100 tweets that come up and add those users to the graph.
2. For each user:
   a. Retrieve friends and followers.
   b. Query each friend/follower for the topic. If they turn up a result (meaning they've discussed the topic), add them to the graph.
3. For each user that was added to the graph, return to step 2 until the desired search depth is reached.
My problem is two-fold. First of all, this approach quickly exceeds my search API rate limit. Even with a search depth of 2, it's quite likely that I'll find people with 100+ friends/followers and I am unable to query them all before hitting the rate limit.
Secondly, this all takes quite a while. The Twitter API is not fast. In the hypothetical event that I was not rate limited, I could submit the requests asynchronously, but I can't help wondering if there is a more efficient way.
I've tried aggregating the requests into one query per search depth:
topic AND from:name1 OR from:name2 .... OR from:namei
This basically explodes. I get a connection reset error from the Twitter API. If I copy the query into the Twitter web page, it just sits for a while and then says "loading tweets seems to be taking awhile."
I also emailed api@twitter.com to ask for suggestions / an access increase, but no response so far.
If anyone has any suggestions on how to go about gathering this type of information through the twitter API, I would very much appreciate it. I am currently using twitter4j and java.
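One possible reason the aggregated query falls over is its sheer length. A hedged sketch of splitting the from: terms across several shorter queries is below. FromQueryBatcher is a hypothetical helper, the parenthesised grouping is an assumption about how the search endpoint parses operators, and maxLength is left as a parameter (Twitter's documentation mentions a 500-character query limit, but treat that figure as something to verify).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Twitter4J.
class FromQueryBatcher {
    // Splits users into queries of the form: topic (from:a OR from:b ...)
    // so that no query exceeds maxLength characters, unless a single
    // user's term is already too long on its own.
    static List<String> build(String topic, List<String> users, int maxLength) {
        List<String> queries = new ArrayList<>();
        String clause = "";
        for (String user : users) {
            String term = "from:" + user;
            String candidate = clause.isEmpty() ? term : clause + " OR " + term;
            if (!clause.isEmpty() && wrap(topic, candidate).length() > maxLength) {
                // Adding this user would overflow: close the current query, start a new one.
                queries.add(wrap(topic, clause));
                clause = term;
            } else {
                clause = candidate;
            }
        }
        if (!clause.isEmpty()) {
            queries.add(wrap(topic, clause));
        }
        return queries;
    }

    private static String wrap(String topic, String clause) {
        return topic + " (" + clause + ")";
    }
}
```

Each batch is still a separate request, so this trades one oversized query for several that count against the rate limit; it does not remove the limit itself.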
Have you tried just using a filtered stream for a topic, and building the graph using mentions and retweets? This is quite indirect, and will still be slow, but won't hit any rate limits.
See http://truthy.indiana.edu/ and http://cnets.indiana.edu/groups/nan/truthy
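The graph-building half of that suggestion can be sketched without any API calls: every tweet from the filtered stream contributes edges from its author to the users it mentions or retweets. MentionGraph below is a hypothetical class for illustration. With Twitter4J the inputs would presumably come from Status.getUser(), getUserMentionEntities() and getRetweetedStatus(), but that wiring is an assumption and is left out here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical directed graph: author -> users the author mentioned or retweeted.
class MentionGraph {
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    // Records one tweet; self-mentions are ignored, duplicate edges collapse.
    void addTweet(String author, List<String> mentionedUsers) {
        Set<String> targets = edges.computeIfAbsent(author, k -> new LinkedHashSet<>());
        for (String user : mentionedUsers) {
            if (!user.equals(author)) {
                targets.add(user);
            }
        }
    }

    // Users this author has pointed at; empty if the author is unknown.
    Set<String> neighbours(String author) {
        return edges.getOrDefault(author, Collections.<String>emptySet());
    }

    // Total number of distinct directed edges.
    int edgeCount() {
        int total = 0;
        for (Set<String> targets : edges.values()) {
            total += targets.size();
        }
        return total;
    }
}
```

Because the edges come from the stream itself, no friends/followers lookups are needed, which is what keeps this approach clear of the search rate limit.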

Does facebook graph api provide support for searching Post messages by location?

Does facebook graph api provide support for searching Post messages by specific location?
It's going to be a multi-step process.
1. Find the places within your criteria: https://graph.facebook.com/search?q=coffee&type=place&center=37.76,-122.427&distance=1000&fields=id
2. Loop through each of those ids and build yourself a multi-query FQL request with up to 50 queries (50 is the maximum Facebook allows): fql?q=SELECT post_id, message, attachment FROM stream WHERE source_id = {pageId}
3. Run the queries and add the results to your list of posts.
4. Repeat steps 2-3 with the remaining ids returned from the search.
5. Display the list of posts.
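The batching part of the list above (at most 50 queries per request, then repeat with the remaining ids) can be sketched as a simple chunking helper. FqlBatcher is a hypothetical name, the query text is copied from the answer, and actually sending each batch to Facebook is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only groups queries, it sends nothing.
class FqlBatcher {
    // Turns place/page ids into FQL queries, grouped into batches of at most batchSize.
    static List<List<String>> batches(List<String> pageIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int start = 0; start < pageIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pageIds.size());
            List<String> batch = new ArrayList<>();
            for (String id : pageIds.subList(start, end)) {
                batch.add("SELECT post_id, message, attachment FROM stream WHERE source_id = " + id);
            }
            out.add(batch);
        }
        return out;
    }
}
```

Calling FqlBatcher.batches(ids, 50) on 120 ids yields three batches of 50, 50 and 20 queries, one request each.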
You can now directly search for posts, photos near a location:
https://developers.facebook.com/docs/reference/api/#searching
It was announced yesterday:
http://developers.facebook.com/blog/post/2012/03/07/building-better-stories-with-location-and-friends/
An example from the docs:
https://graph.facebook.com/search?type=location&center=37.76,-122.427&distance=1000&access_token=whatever
No. There is no way to search posts by location using the Graph API. Please check this for more details: https://developers.facebook.com/tools/explorer
If there is some Place object for your location, then yes you can.
