Best way to gather large volume of tweets?

So I am currently trying to gather tweets on a specific location and then analyse what is going on in that location from the tweets gathered. My task basically involves a lot of data mining.
The main problem I have come across however is gathering enough tweets that will allow me to make a judgement.
I have been using the Twitter Streaming API, but it only delivers about 1% of all tweets, which is far from enough. I mined 100,000 tweets and very few were in English, let alone related to the location I was looking for.
I have also noticed that Twitter rate-limits how often you can call a method via their API. How do sites like trendsmap.com work? Are they somehow accessing a larger data set?
Edit: OK, so I have tried the geolocation feature in the Twitter4J API. It turns out the rate limits can be avoided if you are careful with your implementation. However, the number of people who actually have geolocation turned on when tweeting is very low, so the results do not represent the people in that area, and I seem to get the same tweets every single time. Twitter does offer a search operator, "near", which works great on their website, but as far as I can tell they have not included this functionality in their API.

If you are searching using the Twitter API you can restrict your searches to a specific geolocation using the geocode option.
You can use result_type=recent to ensure you're only getting the most recent tweets.
The maximum count - that is, number of tweets per request - is 100.
The current limit on number of search requests per hour is 450.
So, that's a maximum of 45,000 tweets per hour - is that enough for you?
tl;dr - use the most restrictive set of search parameters to limit the results to those you actually need.
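As a rough sketch of the parameters mentioned above, the following Java builds a search URL with geocode, result_type and count, and multiplies out the request ceiling using the figures quoted in this answer. The endpoint URL, the example coordinates, and the class and method names are assumptions for illustration and are not taken from the answer; check them against the current API documentation before relying on them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; all names here are illustrative only.
final class GeoSearchSketch {
    // Assumed v1.1 Search API endpoint; verify against the current docs.
    static final String SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json";

    // Builds a search URL restricted to a circle around (lat, lon).
    static String buildUrl(String query, double lat, double lon, int radiusKm) {
        String geocode = lat + "," + lon + "," + radiusKm + "km";
        return SEARCH_URL
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&geocode=" + URLEncoder.encode(geocode, StandardCharsets.UTF_8)
                + "&result_type=recent"   // most recent tweets only
                + "&count=100";           // maximum tweets per request
    }

    // Upper bound on tweets for one rate-limit window.
    static int maxTweets(int tweetsPerRequest, int requestsPerWindow) {
        return tweetsPerRequest * requestsPerWindow;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("traffic", 51.5074, -0.1278, 10));
        System.out.println(maxTweets(100, 450));
    }
}
```

Sending the request still needs authentication, which is omitted here; a library such as Twitter4J normally handles that part.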

Related

How can I use Google Places API multiple times in a location aware app?

I am developing an app which will give you nearby Mosques within 10 km of your current location. Now that the Places API allows a certain number of queries per day, I have used firebase to store nearby Mosques for a certain location and I first check if the data is in database or not before querying. But this still doesn't solve the problem. e.g. if a user is on the go the whole day then the results must be changing every single minute, according to his/her location. How can I achieve the desirable results?
As mentioned earlier, I am saving nearby locations in a database with their relative location (around which they exist). But this doesn't quite solve the problem.
Any help will be greatly appreciated.

Places API is a commercial offering - you are meant to pay for using it, if you want to make applications around it.
There's a certain small number of calls that you can do for free, but this is only meant as testing grounds or private use. I am no lawyer, but I would guess that circumventing the fee by scraping the map (like setting a bot to go around a country to build a database of points of interests) would be illegal and would probably get you a letter from Google saying you should stop.

Use the AutocompleteSessionToken class to generate a token and place it after your key. The token reduces your usage because you can query the Places API multiple times and it will still be counted as a single request. I hope this helps, because I did not fully understand your question. Here is a sample link:
https://maps.googleapis.com/maps/api/place/autocomplete/json?input=1600+Amphitheatre&key=&sessiontoken=1234567890
For more details, see here.

Reduction in georeferenced tweets

While streaming twitter data, I found that there has been an obvious reduction in geo-referenced tweets (tweets with lat and lon). Is it because of the Foursquare information integration? Or are there any other issues?
Many thanks!

I worked on a Social Analytics by Location application last year. We sampled tweets from Twitter with the intention of using the geolocation attributes to determine the sentiment by area. Unfortunately, we found that only 10-15% of tweets (based on our own findings) were actually geo-tagged, which was not enough to provide an accurate depiction of sentiment. Instead, we opted for using location-indicative hashtags.
In saying that it depends on the sample size. We were trying to determine sentiment in areas such as buildings which had a small amount of active twitter users. If your aim is to find tweets within much larger areas such as Towns/Cities/Countries then 10-15% is probably enough for your needs.
To answer your original question: users generally keep their location private unless they explicitly intend to check in somewhere, so my guess is that the 10-15% of tweets that are geo-located result from users forgetting to disable geolocation, or from using a new or infrequently used device where it is not disabled. It can also be attributed to the Foursquare integration, as I'm sure users overlook the fact that Foursquare provides Twitter with the geolocation information.
This article is an interesting read. It outlines an application developed by the University of SoCal that can help users identify if they are giving away sensitive/private location information with their tweets.

Getting All Tweets From a Country Within A Time Period at Java

I am working on a project in which I will collect all tweets posted from a country within a certain time period. Afterwards I will do data mining on them (for example, examining how many positive thoughts are expressed about a certain pupil). I want to use Java as the programming language, but I don't know how to start this project. I searched and found:
Twitter's Search API
Twitter's Streaming API
Twitter4J a twitter API for Java
Something interesting outside Java: http://dev.datasift.com/discussions/category/csdl-language
Where can I start in order to get all tweets from a country (or, if possible, from a given state) within a time period? Some examples work like this: you give a username and it returns the tweets if the profile is public. I don't have a list of all public profiles. Should I handle that problem, and how?
Any ideas?

If you are going to use Java, Twitter4J is your best shot.
However, you will have to choose a strategy for retrieving the tweets you want.
You can either get the data from Twitter itself or from a data provider that has full Firehose access, such as DataSift or Gnip. If you want to use a data provider, DataSift is the way to go because of its own query language, which is pretty cool.
If you retrieve the data yourself, you have two options.
First, if you want the tweets in real time, you need the Twitter Streaming API, and Twitter4J makes it really easy to use. Unfortunately, the Streaming API doesn't support country or language filtering. You can listen on the Streaming API for the search queries you have registered.
Your second option is the Search API, which Twitter4J also makes pretty easy to use. The Search API supports many more filtering options, but there is no way to filter tweets by country. Filtering tweets by language (for example en or fr) is a more useful alternative.
Hope this helps.

You want to use the search API. However, the API doesn't allow searching by country, only by geocode.
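Because a geocode is a centre point plus a radius, a country can only be approximated by one or more circles. As a minimal sketch, assuming you post-filter geotagged tweets yourself, the following Java checks whether a coordinate falls inside such a circle using the haversine great-circle distance. The class name, method names, and example coordinates are hypothetical and not part of any Twitter library.

```java
// Hypothetical helper for post-filtering geotagged tweets by circle.
final class GeoCircleSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Haversine great-circle distance between two points, in kilometres.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if (lat, lon) lies within radiusKm of the circle centre.
    static boolean withinCircle(double lat, double lon,
                                double centreLat, double centreLon, double radiusKm) {
        return distanceKm(lat, lon, centreLat, centreLon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Is a tweet geotagged in London within 400 km of central Paris?
        System.out.println(withinCircle(51.5074, -0.1278, 48.8566, 2.3522, 400));
    }
}
```

A single circle will include neighbouring countries or miss corners of the target one, so several smaller circles usually give a closer fit at the cost of more requests.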

In Twitter4J you can get the location like this:
tweet.getUser().getLocation()
However, this returns whatever the user typed into the location field of their profile.

How to get around the Twitter 3200 status limit? [duplicate]

With https://dev.twitter.com/docs/api/1/get/statuses/user_timeline I can get the 3,200 most recent tweets. However, certain sites like http://www.mytweet16.com/ seem to bypass the limit, and I could not find anything about this when browsing the API documentation.
How do they do it, or is there another API that doesn't have the limit?

You can use the Twitter search page to bypass the 3,200 limit. However, you have to scroll down many times in the search results page. For example, I searched for tweets from @beyinsiz_adam. This is the link to the search results:
https://twitter.com/search?q=from%3Abeyinsiz_adam&src=typd&f=realtime
To scroll down repeatedly, you can use the following JavaScript code.
// Scroll to the bottom of the page once per second so more tweets keep loading.
var myVar = setInterval(function () { myTimer(); }, 1000);
function myTimer() {
    window.scrollTo(0, document.body.scrollHeight);
}
// Call clearInterval(myVar) to stop scrolling.
Run it in the Firebug console and wait for all the tweets to load.

The only way to see more is to start saving them before the user's tweet count hits 3200. Services which show more than 3200 tweets have saved them in their own dbs. There's currently no way to get more than that through any Twitter API.
http://www.quora.com/Is-there-a-way-to-get-more-than-3200-tweets-from-a-twitter-user-using-Twitters-API-or-scraping
https://dev.twitter.com/discussions/276
Note from that second link: "…the 3,200 limit is for browsing the timeline only. Tweets can always be requested by their ID using the GET statuses/show/:id method."
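The "save them in your own database" approach can be sketched as a small store keyed by tweet ID, so that repeated timeline fetches never create duplicates. This is a minimal in-memory sketch: the class and method names are hypothetical, the fetching itself is left out, and a real service would persist the map to disk or a database.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory archive; a real service would persist this.
final class TimelineArchiveSketch {
    // Tweet ID -> tweet text, sorted by ID so the newest stored ID is lastKey().
    private final TreeMap<Long, String> tweets = new TreeMap<>();

    // Merges one fetched batch and returns how many tweets were new.
    int merge(Map<Long, String> batch) {
        int added = 0;
        for (Map.Entry<Long, String> e : batch.entrySet()) {
            if (tweets.putIfAbsent(e.getKey(), e.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    // Highest stored ID, or 0 if nothing has been stored yet.
    long newestId() {
        return tweets.isEmpty() ? 0L : tweets.lastKey();
    }

    int size() {
        return tweets.size();
    }

    public static void main(String[] args) {
        TimelineArchiveSketch archive = new TimelineArchiveSketch();
        archive.merge(Map.of(1L, "first", 2L, "second"));
        archive.merge(Map.of(2L, "second", 3L, "third")); // ID 2 is skipped
        System.out.println(archive.size() + " stored, newest ID " + archive.newestId());
    }
}
```

The value returned by newestId() could be passed to the next fetch as a lower bound (for example a since_id style parameter, if the endpoint you use supports one), so that each poll only asks for tweets you have not stored yet.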

I've been in this (Twitter) industry for a long time and witnessed lots of changes in Twitter API and documentation. I would like to clarify one thing to you. There is no way to surpass 3200 tweets limit. Twitter doesn't provide this data even in its new premium API.
The only way someone can surpass this limit is by saving the tweets of an individual Twitter user.
There are tools available which claim to have a large database and provide more than 3200 tweets. A few that I know of are followersanalysis.com and keyhole.co.

You can use a tool I wrote that bypasses the limit.
It saves the Tweets in a JSON format.
https://github.com/pauldotknopf/twitter-dump

You can use the Python library snscrape to do it. Alternatively, you can use the ExportData tool to get all tweets for a user; it returns preprocessed CSV and spreadsheet files. The first option is free, but it provides less information and requires more manual work.

Building network graph from twitter users by subject

I'm trying to construct a social network graph of twitter users who have mentioned a particular topic. My strategy to do this goes roughly like this:
1. Query twitter for a topic. Collect the first 100 tweets that come up and add those users to the graph.
2. For each user:
   a. Retrieve friends and followers.
   b. Query each friend/follower for the topic. If they turn up a result (meaning they've discussed the topic), add them to the graph.
3. For each user that was added to the graph, return to step 2 until the desired search depth is reached.
My problem is two-fold. First of all, this approach quickly exceeds my search API rate limit. Even with a search depth of 2, it's quite likely that I'll find people with 100+ friends/followers and I am unable to query them all before hitting the rate limit.
Secondly, this all takes quite a while. The Twitter API is not fast. In the hypothetical event that I was not rate limited, I could submit the requests asynchronously, but I can't help wondering if there is a more efficient way.
I've tried aggregating the requests into one query per search depth:
topic AND from:name1 OR from:name2 .... OR from:namei
This basically explodes. I get a connection reset error from the twitter API. If I copy the query into the twitter web page, it just sits for awhile and then says "loading tweets seems to be taking awhile."
I also emailed api@twitter.com to ask for suggestions or an access increase, but I have had no response so far.
If anyone has any suggestions on how to go about gathering this type of information through the twitter API, I would very much appreciate it. I am currently using twitter4j and java.
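For reference, the crawl described in this question can be sketched as a depth-limited breadth-first search. In this sketch the two Twitter calls are replaced by plain functions supplied by the caller (neighbours for friends and followers, discussesTopic for the per-user topic query), so it runs without network access; those names and the class are hypothetical. It caches each topic check so that no user is queried twice, which reduces the request count but does not remove the rate limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; the two function parameters stand in for API calls.
final class TopicCrawlSketch {
    static Map<String, Set<String>> crawl(List<String> seeds,
                                          Function<String, List<String>> neighbours,
                                          Predicate<String> discussesTopic,
                                          int maxDepth) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Map<String, Boolean> checked = new HashMap<>(); // one topic query per user
        for (String seed : seeds) {
            graph.put(seed, new LinkedHashSet<>());
            checked.put(seed, true); // seeds came from the topic search itself
        }
        Deque<String> frontier = new ArrayDeque<>(graph.keySet());
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String user : frontier) {
                for (String other : neighbours.apply(user)) {
                    if (other.equals(user)) continue;
                    boolean onTopic = checked.computeIfAbsent(other, discussesTopic::test);
                    if (!onTopic) continue;
                    graph.get(user).add(other);
                    if (!graph.containsKey(other)) { // first visit: expand next round
                        graph.put(other, new LinkedHashSet<>());
                        next.add(other);
                    }
                }
            }
            frontier = next;
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, List<String>> net = Map.of("a", List.of("b", "c"), "b", List.of("a"));
        System.out.println(crawl(List.of("a"), u -> net.getOrDefault(u, List.of()),
                u -> !u.equals("c"), 2));
    }
}
```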

Have you tried just using a filtered stream for a topic, and building the graph using mentions and retweets? This is quite indirect, and will still be slow, but won't hit any rate limits.
See http://truthy.indiana.edu/ and http://cnets.indiana.edu/groups/nan/truthy
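As a minimal sketch of building the graph from mentions, the following Java extracts @mentions from tweet text with a regular expression and records an edge from the author to each mentioned user. It assumes the streamed tweets have already been reduced to an author name and a text string; the class and method names are hypothetical, and in practice the mention data delivered with each tweet would be more reliable than parsing the text.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: directed graph of who mentions whom.
final class MentionGraphSketch {
    // Simplified handle pattern: up to 15 letters, digits or underscores.
    private static final Pattern MENTION = Pattern.compile("@(\\w{1,15})");

    // Author -> set of users they mentioned (all lower-cased).
    private final Map<String, Set<String>> edges = new TreeMap<>();

    // Adds one edge per @mention found in the tweet text.
    void addTweet(String author, String text) {
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            edges.computeIfAbsent(author.toLowerCase(), k -> new TreeSet<>())
                 .add(m.group(1).toLowerCase());
        }
    }

    Map<String, Set<String>> edges() {
        return edges;
    }

    public static void main(String[] args) {
        MentionGraphSketch graph = new MentionGraphSketch();
        graph.addTweet("Alice", "RT @Bob: hello @carol");
        System.out.println(graph.edges());
    }
}
```

Retweets in the old "RT @user" style are counted as mentions here, which matches the suggestion of using both mentions and retweets as edges.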
