Unable to scrape full data on Tiktok

I've been trying to scrape data about a user's TikTok posts through an API.
The HTTP request works, but I need information about roughly the last 170 videos from that user, and the API only returns the last 20, even when I specify timestamps covering 20 December 2021 to today.
Here's the code I used for the request:
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("https://tiktok-scraper2.p.rapidapi.com/user/videos?sec_uid={SEC_UID}&user_id={ID}&user_name={USERNAME}&min_cursor=1639958400000&max_cursor=1674932652000"))
.header("X-RapidAPI-Key", "{KEY}")
.header("X-RapidAPI-Host", "tiktok-scraper2.p.rapidapi.com")
.method("GET", HttpRequest.BodyPublishers.noBody())
.build();
HttpResponse<String> response;
try {
response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());
} catch (Exception e) {
System.err.println("Error while executing the request: \n" + e.getMessage());
}
And this is what the console prints (the real output doesn't contain "xxx"; I've redacted my data), after I made it more readable in Notepad++:
{"itemList":[{"id":"7193xxxxxxxxxxxxxxx","video":{"cover":"xxx"}
,"stats":{"commentCount":206,"diggCount":57100,"playCount":515500,"shareCount":71}
}
,{"id":"7191xxxxxxxxxxxxxxx","video":{"cover":"xxx"}
,"stats":{"commentCount":25,"diggCount":23600,"playCount":153300,"shareCount":22}
}
, (...) // Data
,{"id":"7174xxxxxxxxxxxxxxx","video":{"cover":"xxx"}
,"stats":{"commentCount":2,"diggCount":272,"playCount":4737,"shareCount":1}
}
],"statusCode":0,"status_code":0}
The output stops at timestamp 1670284800000, which means I'm missing almost a year of data.
I should also mention that I've tried other APIs, but none of them returned more than 20 or 30 results, and only the latest ones, even when I specified otherwise. I've also inspected the source of the profile page: searching for diggCount with Ctrl+F returns about 30 results as well, even though I had scrolled all the way down to the first videos I want to analyse. I don't know whether the problem comes from there.
Anyone who knows how to fix this is welcome to answer. Any advice is appreciated, including a pointer to another working API: this one seems to be the best free one I've found, but I might have missed a better one.
Thank you in advance :)

Related

How to send "Security : low" to DVWA as cookie parameter using Jsoup?

I am trying to send a request using jsoup with manually inserted cookies, in order to detect an SQL injection vulnerability.
The problem seems to be that only one of the cookies works, and I don't understand why.
I first authenticate myself manually and get the cookies. Example:
PHPSESSID : b74302c3c6af62d23047a450a40cbf5a
security : high
After I get the cookies, I send my request (which would look like this from the browser: http://localhost:8090/dvwa/vulnerabilities/sqli/?id='&Submit=Submit#) using the same PHPSESSID but with security : low. The purpose is to force a "You have an error in your SQL syntax" response, which signals an SQL injection vulnerability. The PHPSESSID is accepted (I get the dvwa/vulnerabilities/sqli page and not the login page, so the session is recognized as authenticated), but "security : low" does not seem to take effect. I can't find the problem.
The jsoup code for the initial connection, which lets me parse the forms on the page, looks like this. I supply the cookies manually.
Connection connection = Jsoup.connect(urlDTO.getUrl())
.userAgent(StringConstants.USER_AGENT)
.cookies(cookies)//Map<String,String>
.referrer(StringConstants.REFERRER);
Document htmlDocument = connection.get();
To send the form I use this code:
Connection connection = Jsoup.connect(formDTO.getUrl())
.userAgent(StringConstants.USER_AGENT)
.cookies(cookies)
.data(listToMap(formDTO.getInputList()))// id = ' , Submit = Submit
.method(getMethod(formDTO.getMethod()))
.referrer(StringConstants.REFERRER);
Connection.Response res = connection.execute();
Document doc = res.parse();
Does anyone know what I'm doing wrong?

After much debugging I found the origin of the problem and the odd behavior. There was nothing wrong with the cookies, headers or URL; the problem was in .method(). The default value of .method() is Method.GET, but since I was sending dynamic requests I had to determine the method dynamically as well. To do that, I parsed the forms to get the method and passed the matching type when building the connection.
if (method.equals("post")) {
return Method.POST;
}
if (method.equals("get")) {
return Method.GET;
}
return Method.POST;
This had worked until now and, surprisingly, even with the wrong method the server still sent back an almost valid response, so I overlooked it.
Here is the fix:
if (method.toLowerCase().equals("post")) {
return Method.POST;
}
if (method.toLowerCase().equals("get")) {
return Method.GET;
}
return Method.GET;
It was my mistake and not really a Jsoup problem, but since I overlooked it others might too, so here is a reminder.
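A slightly more defensive variant of that fix, as a minimal sketch. FormMethod is a hypothetical stand-in for Jsoup's Connection.Method so that the snippet compiles on its own, and the class and method names are made up for illustration. It relies on the fact that HTML treats a form's method attribute case-insensitively and falls back to GET when the attribute is missing or invalid.

```java
import java.util.Locale;

final class FormMethods {
    // Hypothetical stand-in for org.jsoup.Connection.Method.
    enum FormMethod { GET, POST }

    // Maps the raw "method" attribute of a parsed form to a request method.
    // Anything that is not "post" (ignoring case and surrounding whitespace),
    // including a missing attribute, falls back to GET.
    static FormMethod parse(String raw) {
        if (raw == null) {
            return FormMethod.GET;
        }
        String normalized = raw.trim().toLowerCase(Locale.ROOT);
        return normalized.equals("post") ? FormMethod.POST : FormMethod.GET;
    }
}
```

In the real code the return value would be Method.POST or Method.GET, passed straight to .method(...).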

restAssured - cannot master post method

Fellow stackoverflowians :)
I've been trying for quite some time to make a POST call using the Gmail API.
I've been trying to use createDraft and createLabel.
I think I've now (mostly) worked out how to do this correctly, but I get this error:
java.lang.AssertionError: 1 expectation failed.
Expected status code <200> but was <400>.
I realise this error occurs because my request is incorrect.
Could you help me with this?
Here's my code:
import io.restassured.RestAssured.*
import io.restassured.http.ContentType
import io.restassured.matcher.RestAssuredMatchers.*
import org.hamcrest.Matchers.*
import org.testng.annotations.Test
class RestAPIAutoTestPost {
@Test
fun createLabelInGoogleMail() {
RestAssured.baseURI = "https://www.googleapis.com/gmail/v1/users/me"
val accessToken = "ya29.Glw7BEv6***"
val jsonAsMap = HashMap<String, Any>()
jsonAsMap.put("id", "labelAPITestNameID")
jsonAsMap.put("labelListVisibility", "labelShow")
jsonAsMap.put("messageListVisibility", "show")
jsonAsMap.put("messagesTotal", "0")
jsonAsMap.put("messagesUnread", "0")
jsonAsMap.put("name", "labelAPITestName")
jsonAsMap.put("threadsTotal", "0")
jsonAsMap.put("threadsUnread", "0")
jsonAsMap.put("type", "user")
given().
contentType(ContentType.JSON).
body(jsonAsMap).
`when`().
post("/labels?access_token=$accessToken").
then().
statusCode(200)
}
}
I suppose I'm either using HashMap incorrectly or sending some incorrect body fields.
I've only just started learning RestAssured, so I beg your pardon for the newbie question.
Thanks!
P.S. I'd really appreciate any help with POST methods and putting data into the body.

I think your use of RestAssured and HashMap is correct. I think you are getting a 400 from this API because you are specifying the id property. By playing with this in Google's API Explorer, I was able to generate 400 errors by doing that. According to the documentation, the only things you need to specify for a POST/Create are: labelListVisibility, messageListVisibility, and name. The id is returned to you as part of the response.
A good feature in RestAssured is that you can have it log what it sends or receives when there is an error or all the time.
Log all requests:
given().log().all()
Log all responses:
then().log().all()
Or just when validations fail:
then().log().ifValidationFails()
Using that will give you a more precise reason why your interaction with the API is failing because it will show whatever Google is sending back. So we can see for sure if I'm right about the id.
And since you seem to be using Kotlin for this, you might want to take advantage of its great multiline string capabilities and just create the JSON payload manually:
val body = """
{
"labelListVisibility": "labelShow",
"messageListVisibility": "show",
"name": "ThisIsATest"
}
"""

Android: App working on WiFi but neither on 3G nor 2G

I desperately need help with this problem. I know there are already questions on these lines but none of them are quite like the issue I'm facing at the moment.
I have an app which pulls the following JSON data from a URL:
{
"dateToday": "17th May"
}
The code to retrieve that data is as follows:
protected List<String> doInBackground(List<String>... arg0) {
Log.d("Refresh Check","In background...");
// TODO Auto-generated method stub
List<String> matchDetails = new ArrayList<String>();
try {
Log.d("Refresh Check","In try...");
json = jsonData.retrieveData(URL, client);
Log.d("Refresh Check",json.toString());
String date = json.getString("dateToday");
matchDetails.add(date);
return matchDetails;
} catch (ClientProtocolException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (JSONException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return Arrays.asList("Connectivity issue");
}
And the retrieve data function is:
public JSONObject retrieveData(String URL, HttpClient client)
throws ClientProtocolException, IOException, JSONException {
StringBuilder url = new StringBuilder(URL);
HttpGet get = new HttpGet(url.toString());
HttpResponse r = client.execute(get);
int status = r.getStatusLine().getStatusCode();
if (status == 200) {
HttpEntity e = r.getEntity();
String data = EntityUtils.toString(e);
JSONObject last = new JSONObject(data);
return last;
} else {
return null;
}
}
So when I run this on my phone using WiFi, everything works and gets updated perfectly but when I switch to my data plan on 3G or 2G, the data just doesn't refresh at all.
The 3rd Log.d in my code, which checks for the String, keeps returning the date as 17th May even if I change the JSON value to 18th.
The only difference in the logcat I could note for WiFi versus my data plan was that the data plan had an error on the lines of Network Controller: iconLevel >= 5
Here's where it gets annoying though, there are times when the app works completely fine even on 2G, it's like it won't work for a while, then say I switch from 2G to WiFi and then back to 2G, it works. The behaviour of the app has been very erratic.
Things I've tried so far:
1.) Use another phone to see if the issue persists (it did).
2.) Check if my phone browser could access the URL and display the updated results (it could).
So I am now clueless as hell and really close to giving up on the app after 3 weeks of unexplained execution.
If anyone could help, I'd be very very very grateful. This is what I have now narrowed to:
1.) Why precisely would getting light-weight data from a particular URL behave differently on WiFi and on 3G/2G
2.) Is android saving the data to cache or something? Because every time I start the app, it should reach the doInBackground wherein, it should retrieve the updated data from the site but the log simply shows the old data.
3.) Would it help if I tried loading the apk separately onto my device rather than running it on my phone through eclipse? I mean, is there any difference between the two?
Any help would be extremely appreciated. Thanks in advance.

It's probably caching of the JSON response somewhere between your app and the server.
You can try appending a timestamp parameter to the URL whenever you request data, so that every request looks new and cannot be served from a cache.
For example:
Original URL: mytest.com/webservice/?date=today
With the extra parameter: mytest.com/webservice/?date=today&timestamp=ddmmyyhhmmss
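The suggestion above can be sketched in Java as follows. This is only an illustration: the class and method names are made up, the parameter name timestamp is taken from the example URL, and the sketch assumes the URL has no #fragment part.

```java
final class CacheBuster {
    // Appends a "timestamp" query parameter so that each request URL is unique.
    // Uses '&' when the URL already has a query string and '?' otherwise.
    // Assumption: the URL contains no "#fragment" part.
    static String withTimestamp(String url, long epochMillis) {
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + "timestamp=" + epochMillis;
    }
}
```

A caller would typically pass System.currentTimeMillis() as the second argument, for example CacheBuster.withTimestamp(URL, System.currentTimeMillis()).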

You obviously have a cache problem here. As everything runs fine on WiFi, I think the issue probably comes from a cache in your network provider's infrastructure. Your request is a GET, so it could easily be cached by any element on the path. To work around this you can:
configure the HTTP server to add cache control headers (a Cache-Control header with no-cache or max-age, or an Expires header) to the response to your request (how to do this depends on the HTTP server you use). Actually, it can be a bit trickier than that; see the section on avoiding caching on Wikipedia.
add a random parameter to your URL (as suggested by @Darkangle)

I apologize for the delay in getting back about this.
Darkangle's suggestion seems to be spot on.
However, I had checked through the cache issue on stackoverflow and found a solution which worked well for me.
This was the change to the code:
StringBuilder url = new StringBuilder(URL);
HttpGet get = new HttpGet(url.toString());
get.setHeader("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP 1.1.
get.setHeader("Pragma", "no-cache"); // HTTP 1.0.
HttpResponse response = client.execute(get);
The get.setHeader calls make sure new data is fetched every time.
Thanks to everyone who helped.
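For readers not using Apache HttpClient, the same two request headers can be set on the standard HttpURLConnection. This is a sketch under that substitution, not the code from the post above; the class and method names are made up, and nothing is sent until the caller actually reads from the connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class NoCacheConnections {
    // Prepares (but does not send) a GET request that asks caches along the
    // path not to serve a stored copy.
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setUseCaches(false); // skip any local response cache
            conn.setRequestProperty("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP 1.1
            conn.setRequestProperty("Pragma", "no-cache"); // HTTP 1.0
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The request goes out only when the caller invokes something like getResponseCode() or getInputStream() on the returned connection.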

Spring REST tutorial [duplicate]

I'm building a REST API, but I've encountered a problem.
It seems that accepted practice in designing a REST API is that if the resource requested doesn't exist, a 404 is returned.
However, to me, this adds unnecessary ambiguity. HTTP 404 is more traditionally associated with a bad URI. So in effect we're saying "Either you got to the right place, but that specific record does not exist, or there's no such location on the Internets! I'm really not sure which one..."
Consider the following URI:
http://mywebsite/api/user/13
If I get a 404 back, is that because User 13 does not exist? Or is it because my URL should have been:
http://mywebsite/restapi/user/13
In the past, I've just returned a NULL result with an HTTP 200 OK response code if the record doesn't exist. It's simple, and in my opinion very clean, even if it's not necessarily accepted practice. But is there a better way to do this?

404 is just the HTTP response code. On top of that, you can provide a response body and/or other headers with a more meaningful error message that developers will see.

Use 404 if the resource does not exist. Don't return 200 with an empty body.
This is akin to undefined vs empty string (e.g. "") in programming. While very similar, there is definitely a difference.
404 means that nothing exists at that URI (like an undefined variable in programming). Returning 200 with an empty body means that something does exist there and that something is just empty right now (like an empty string in programming).
404 doesn't mean it was a "bad URI". There are special HTTP codes that are intended for URI errors (e.g. 414 Request-URI Too Long).

As with most things, "it depends". But to me, your practice is not bad and is not going against the HTTP spec per se. However, let's clear some things up.
First, URI's should be opaque. Even if they're not opaque to people, they are opaque to machines. In other words, the difference between http://mywebsite/api/user/13, http://mywebsite/restapi/user/13 is the same as the difference between http://mywebsite/api/user/13 and http://mywebsite/api/user/14 i.e. not the same is not the same period. So a 404 would be completely appropriate for http://mywebsite/api/user/14 (if there is no such user) but not necessarily the only appropriate response.
You could also return an empty 200 response or more explicitly a 204 (No Content) response. This would convey something else to the client. It would imply that the resource identified by http://mywebsite/api/user/14 has no content or is essentially nothing. It does mean that there is such a resource. However, it does not necessarily mean that you are claiming there is some user persisted in a data store with id 14. That's your private concern, not the concern of the client making the request. So, if it makes sense to model your resources that way, go ahead.
There are some security implications to giving your clients information that would make it easier for them to guess legitimate URI's. Returning a 200 on misses instead of a 404 may give the client a clue that at least the http://mywebsite/api/user part is correct. A malicious client could just keep trying different integers. But to me, a malicious client would be able to guess the http://mywebsite/api/user part anyway. A better remedy would be to use UUID's. i.e. http://mywebsite/api/user/3dd5b770-79ea-11e1-b0c4-0800200c9a66 is better than http://mywebsite/api/user/14. Doing that, you could use your technique of returning 200's without giving much away.

This is a very old post, but I faced a similar problem and would like to share my experience.
I am building a microservice architecture with REST APIs. I have some REST GET services that collect data from a back-end system based on the request parameters.
I followed the REST API design documents and sent back HTTP 404 with a well-formed JSON error message when no data matched the query conditions (for example, when zero records were selected).
When there was no data to send back, I prepared a well-formed JSON message with an internal error code, etc., to inform the client about the reason for the "Not Found", and sent it back with HTTP 404. That works fine.
Later I created a REST API client class, a simple helper that hides the HTTP communication code, and I used this helper every time I called my REST APIs from my code.
BUT I needed to write confusing extra code just because HTTP 404 had two different meanings:
the real HTTP 404, when the REST API is not available at the given URL; it is returned by the application server or web server where the REST API application runs
the HTTP 404 the client also gets back when no data in the database matches the where condition of the query.
Important: my REST API error handler catches all exceptions that occur in the back-end service, which means that on any error my REST API always returns a well-formed JSON message with the error details.
This is the first version of my client helper method, which handles the two different HTTP 404 responses:
public static String getSomething(final String uuid) {
String serviceUrl = getServiceUrl();
String path = String.format("user/%s", uuid);
String requestUrl = serviceUrl + path;
String httpMethod = "GET";
Response response = client
.target(serviceUrl)
.path(path)
.request(ExtendedMediaType.APPLICATION_UTF8)
.get();
if (response.getStatus() == Response.Status.OK.getStatusCode()) {
// HTTP 200
return response.readEntity(String.class);
} else {
// confusing code comes here just because
// I need to decide the type of HTTP 404...
// trying to parse response body
try {
String responseBody = response.readEntity(String.class);
ObjectMapper mapper = new ObjectMapper();
ErrorInfo errorInfo = mapper.readValue(responseBody, ErrorInfo.class);
// re-throw the original exception
throw new MyException(errorInfo);
} catch (IOException e) {
// this is a real HTTP 404
throw new ServiceUnavailableError(response, requestUrl, httpMethod);
}
// No statement can follow here: both the try and the catch branch throw.
}
}
BUT, because my Java or JavaScript client can receive two kinds of HTTP 404, I need to check the body of the response whenever I get an HTTP 404. If I can parse the response body, then I know the REST API answered and there was simply no data to send back to the client.
If I am not able to parse the response, that means I got a real HTTP 404 from the web server (not from the REST API application).
It is confusing, and the client application always needs to do extra parsing to find the real reason for the HTTP 404.
Honestly, I do not like this solution. It is confusing and forces extra bullshit code into every client.
So instead of using HTTP 404 in these two different scenarios, I decided to do the following:
I no longer use HTTP 404 as a response code in my REST application.
I use HTTP 204 (No Content) instead of HTTP 404.
In that case the client code can be more elegant:
public static String getString(final String processId, final String key) {
String serviceUrl = getServiceUrl();
String path = String.format("key/%s", key);
String requestUrl = serviceUrl + path;
String httpMethod = "GET";
log(requestUrl);
Response response = client
.target(serviceUrl)
.path(path)
.request(ExtendedMediaType.APPLICATION_JSON_UTF8)
.header(CustomHttpHeader.PROCESS_ID, processId)
.get();
if (response.getStatus() == Response.Status.OK.getStatusCode()) {
return response.readEntity(String.class);
} else {
String body = response.readEntity(String.class);
ObjectMapper mapper = new ObjectMapper();
ErrorInfo errorInfo = mapper.readValue(body, ErrorInfo.class);
throw new MyException(errorInfo);
}
throw new AnyServerError(response, requestUrl, httpMethod);
}
I think this handles that issue better.
If you have any better solution please share it with us.
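The client-side decision described in this answer can be reduced to a small pure function. The following is a sketch under assumptions, not the author's helper: the class and method names are hypothetical, and it takes the status code and body as plain values so that it does not depend on any HTTP client library.

```java
import java.util.Optional;

final class LookupResponses {
    // Interprets a lookup response under the "204 instead of 404" convention:
    //   200 -> the record, 204 -> no matching record,
    //   anything else (including a genuine 404) -> a real failure.
    static Optional<String> interpret(int status, String body) {
        switch (status) {
            case 200:
                return Optional.of(body);
            case 204:
                return Optional.empty();
            default:
                throw new IllegalStateException("Unexpected HTTP status " + status);
        }
    }
}
```

With this split, a 404 can only mean that the endpoint itself was not reached, so the client never has to parse an error body to find out which kind of "not found" it received.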

404 Not Found technically means that the URI does not currently map to a resource. In your example, I interpret a request to http://mywebsite/api/user/13 that returns a 404 to imply that this URL was never mapped to a resource. To the client, that should be the end of the conversation.
To address concerns about ambiguity, you can enhance your API by providing other response codes. For example, suppose you want to allow clients to issue GET requests to the URL http://mywebsite/api/user/13, but you want to communicate that clients should use the canonical URL http://mywebsite/restapi/user/13. In that case, you may want to issue a permanent redirect by returning a 301 Moved Permanently and supplying the canonical URL in the Location header of the response. This tells the client to use the canonical URL for future requests.

So in essence, it sounds like the answer could depend on how the request is formed.
If the requested resource forms part of the URI as per a request to http://mywebsite/restapi/user/13 and user 13 does not exist, then a 404 is probably appropriate and intuitive because the URI is representative of a non-existent user/entity/document/etc. The same would hold for the more secure technique using a GUID http://mywebsite/api/user/3dd5b770-79ea-11e1-b0c4-0800200c9a66 and the api/restapi argument above.
However, if the requested resource ID was included in the request header [include your own example], or indeed in the URI as a parameter, e.g. http://mywebsite/restapi/user/?UID=13, then the URI would still be correct (because the concept of a USER does exist at http://mywebsite/restapi/user/), and therefore the response could reasonably be expected to be a 200 (with an appropriately verbose message), because the specific user known as 13 does not exist but the URI does. This way we are saying the URI is good, but the request for data has no content.
Personally a 200 still doesn't feel right (though I have previously argued it does). A 200 response code (without a verbose response) could cause an issue not to be investigated when an incorrect ID is sent for example.
A better approach would be to send a 204 No Content response. This is compliant with the W3C's description: "The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation." [1] The confusion, in my opinion, is caused by the Wikipedia entry stating: "204 No Content - The server successfully processed the request, but is not returning any content. Usually used as a response to a successful delete request." The last sentence is highly debatable. Consider the situation without that sentence and the solution is easy: just send a 204 if the entity does not exist. There is even an argument for returning a 204 instead of a 404: the request has been processed and no content has been returned! Please be aware, though, that 204s do not allow content in the response body.
Sources
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
1. http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

This old but excellent article... http://www.infoq.com/articles/webber-rest-workflow says this about it...
404 Not Found - The service is far too lazy (or secure) to give us a real reason why our request failed, but whatever the reason, we need to deal with it.

This recently came up with our team.
We use both 404 Not Found with a message body and 204 No Content, based on the following rationale.
If the request URI indicates the location of a single resource, we use 404 Not found. When the request queries a URI, we use 204 No Content
http://mywebsite/api/user/13 would return 404 when user 13 does not exist
http://mywebsite/api/users?id=13 would return 204 no content
http://mywebsite/api/users?firstname=test would return 204 no content
The idea here being, 'query routes' are expected to be able to return 1, many or no content.
Whatever pattern you choose, the most important thing is to be consistent, so get buy-in from your team.
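That convention is small enough to write down as a single function. This is a minimal sketch with hypothetical names, assuming a successful hit is answered with 200 in both cases (the answer above only specifies the two miss cases):

```java
final class NotFoundPolicy {
    // Picks the status code for a lookup:
    //   any match                         -> 200 (assumption, not stated above)
    //   miss on a single-resource URI     -> 404, e.g. /api/user/13
    //   miss on a query URI               -> 204, e.g. /api/users?id=13
    static int statusFor(boolean singleResource, int matches) {
        if (matches > 0) {
            return 200;
        }
        return singleResource ? 404 : 204;
    }
}
```

Keeping the rule in one place like this is one way to get the consistency the answer asks for, since every endpoint then goes through the same decision.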

The Uniform Resource Identifier is a unique pointer to the resource. A poorly formed URI doesn't point to the resource, and therefore performing a GET on it will not return a resource. 404 means the server has not found anything matching the Request-URI. If you put in a wrong or bad URI, that is your problem and the reason you didn't get to a resource, whether an HTML page or an IMG.

Since this discussion seems to be able to survive the end of time I'll throw in the JSON:API Specifications
404 Not Found
A server MUST respond with 404 Not Found when processing a request to fetch a single resource that does not exist, except when the request warrants a 200 OK response with null as the primary data (as described above).
HTTP/1.1 200 OK
Content-Type: application/vnd.api+json
{
"links": {
"self": "http://example.com/articles/1/author"
},
"data": null
}
Also please see this Stackoverflow question

For this scenario, HTTP 404 is the response code returned by the REST API,
like 400, 401 or 422 Unprocessable Entity.
Use exception handling to read the full error message:
try{
// call the rest api
} catch(RestClientException e) {
//process exception
if(e instanceof HttpStatusCodeException){
String responseText=((HttpStatusCodeException)e).getResponseBodyAsString();
//now you have the response, construct json from it, and extract the errors
System.out.println("Exception :" +responseText);
}
}
This exception block gives you the actual message returned by the REST API.

How to retrieve home timeline from twitter in the form of .xml or JSON using java?

I want to get a home timeline from twitter and I was able to get the home timeline using twitter4j and oauth authentication method
ConfigurationBuilder confBuilder = new ConfigurationBuilder();
confBuilder.setOAuthAccessToken(accessToken.getToken())
.setOAuthAccessTokenSecret(accessToken.getTokenSecret())
.setOAuthConsumerKey(key)
.setOAuthConsumerSecret(secret);
Twitter twitter = new TwitterFactory(confBuilder.build()).getInstance();
User user = twitter.verifyCredentials();
List<Status> statuses = twitter.getHomeTimeline();
but the result is not in the form of .xml or JSON. I also tried
WebResource resource = client.resource("https://api.twitter.com/1/statuses/user_timeline.json");
but all I get is GET https://api.twitter.com/1/statuses/user_timeline.json returned a response status of 401 Unauthorized
I've googled this many times but I just can't get it right. I need sample Java code showing how to do it. Complete code that runs right away would be really helpful, as I've found a lot of partial programs and couldn't get any of them to work. Thank you in advance.

OK, so after looking at the release notes for the 2.2.x versions, it appears there is a way to get the JSON representation from Twitter4J, but it's disabled by default since it uses some extra memory.
So, you need to:
Enable the JSONStore using the jsonStoreEnabled config option
Get the JSON representation of a request using the getRawJson method
Sorry there's no code example, I haven't tried it myself.

401 Unauthorized:
Authentication credentials were missing or incorrect.
You need to authenticate before you perform the query.
