I want to get Amazon page and product information from their website so I can use it in a future project. I have no experience with APIs, and I saw that I would need to pay to use Amazon's. My current plan was to use a WebRequest class, which basically pulls down the page's raw text, and then parse through it to get what I need. It pulls down HTML from every website I have tried except Amazon. When I try to use it for Amazon I get text like this...
??èv~-1?½d!Yä90û?¡òk6??ªó?l}L??A?{í??j?ì??ñF Oü?ª[D ú7W¢!?É?L?]â v??ÇJ???t?ñ?j?^,Y£>O?|?I`OöN??Q?»bÇJPy1·¬Ç??RtâU??Q%vB??^íè|??ª?
Can someone explain to me why this happens? Or even better, could you point me towards a better way of doing this? Any help is appreciated.
This is the class I mentioned...
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;

public class WebRequest {
    protected String url;
    protected ArrayList<String> pageText;

    public WebRequest() {
        url = "";
        pageText = new ArrayList<String>();
    }

    public WebRequest(String url) {
        this.url = url;
        pageText = new ArrayList<String>();
        load();
    }

    public boolean load() {
        boolean returnValue = true;
        try {
            URL thisURL = new URL(url);
            BufferedReader reader = new BufferedReader(new InputStreamReader(thisURL.openStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                pageText.add(line);
            }
            reader.close();
        }
        catch (Exception e) {
            returnValue = false;
            System.out.println("Failed to load " + url + ": " + e);
        }
        return returnValue;
    }

    public boolean load(String url) {
        this.url = url;
        return load();
    }

    public String toString() {
        String returnString = "";
        for (String s : pageText) {
            returnString += s + "\n";
        }
        return returnString;
    }
}
It could be that the page is returned using a different character encoding than your platform default. If that's the case, you should specify the appropriate encoding, e.g.:
new InputStreamReader(thisURL.openStream(), "UTF-8")
But that data doesn't look like character data at all to me. It's too random. It looks like binary data. Are you sure you're not downloading an image by mistake?
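If you want to rule out an encoding or compression mismatch, a quick check of the response headers will tell you what the server actually sent back. A minimal sketch with plain HttpURLConnection (nothing here is specific to Amazon; the UTF-8 fallback is a simplification):
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class EncodingCheck {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(args[0]).openConnection();
        // The Content-Type header usually carries the charset, e.g. "text/html; charset=ISO-8859-1"
        System.out.println("Content-Type: " + conn.getContentType());
        // If this says "gzip", the body is compressed and must be unwrapped before decoding
        System.out.println("Content-Encoding: " + conn.getContentEncoding());

        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}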
If you want to make more sophisticated HTTP requests, there are quite a few Java libraries, e.g. OkHttp and AsyncHttpClient.
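For example, a minimal GET with OkHttp might look roughly like this (API names are from OkHttp 3.x; the URL is a placeholder):
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class OkHttpExample {
    public static void main(String[] args) throws Exception {
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
                .url("https://example.com/")   // placeholder URL
                .build();
        // OkHttp handles redirects, gzip decompression and charsets for you
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}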
But it's worth bearing in mind that Amazon probably doesn't like people scraping its site, and will have built-in detection of malicious or unwanted activity. It might be sending you gibberish on purpose to deter you from continuing. You should be careful, because some big sites may block your IP temporarily or permanently.
My advice would be to learn how to use the Amazon APIs. They're pretty powerful—and you won't get yourself banned.
Related
I'm trying to write unit tests for my program and use mock data. I'm a little confused on how to intercept an HTTP Get request to a URL.
My program calls a URL to our API and it is returned a simple XML file. I would like the test to instead of getting the XML file from the API online to receive a predetermined XML file from me so that I can compare the output to the expected output and determine if everything is working correctly.
I was pointed to Mockito and have been seeing many different examples such as this SO post, How to use mockito for testing a REST service? but it's not becoming clear to me how to set it all up and how to mock the data (i.e., return my own xml file whenever the call to the URL is made).
The only thing I can think of is having another program running locally on Tomcat, and in my test passing a special URL that calls that locally running program, which then returns the XML file I want to test with. But that just seems like overkill and I don't think it would be acceptable. Could someone please point me in the right direction?
private static InputStream getContent(String uri) {
    HttpURLConnection connection = null;
    try {
        URL url = new URL(uri);
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/xml");
        return connection.getInputStream();
    } catch (MalformedURLException e) {
        LOGGER.error("internal error", e);
    } catch (IOException e) {
        LOGGER.error("internal error", e);
    } finally {
        if (connection != null) {
            connection.disconnect();
        }
    }
    return null;
}
I am using Spring Boot and other parts of the Spring Framework if that helps.
Part of the problem is that you're not breaking things down into interfaces. You need to wrap getContent in an interface and provide a concrete class implementing that interface. This concrete class will then need to be passed into any class that uses the original getContent. (This is essentially dependency inversion.) Your code will end up looking something like this.
public interface IUrlStreamSource {
    InputStream getContent(String uri);
}

public class SimpleUrlStreamSource implements IUrlStreamSource {
    protected final Logger LOGGER;

    public SimpleUrlStreamSource(Logger LOGGER) {
        this.LOGGER = LOGGER;
    }

    // pulled out to allow test classes to provide
    // a version that returns mock objects
    protected URL stringToUrl(String uri) throws MalformedURLException {
        return new URL(uri);
    }

    public InputStream getContent(String uri) {
        HttpURLConnection connection = null;
        try {
            URL url = stringToUrl(uri);
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Accept", "application/xml");
            return connection.getInputStream();
        } catch (MalformedURLException e) {
            LOGGER.error("internal error", e);
        } catch (IOException e) {
            LOGGER.error("internal error", e);
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
        return null;
    }
}
Now code that was using the static getContent should go through an IUrlStreamSource instance's getContent(). You then provide a mocked IUrlStreamSource, rather than a SimpleUrlStreamSource, to the object that you want to test.
If you want to test SimpleUrlStreamSource (but there's not much to test), then you can create a derived class that provides an implementation of stringToUrl that returns a mock (or throws an exception).
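For example, the mock could look roughly like this with Mockito (class names follow the sketch above; the XML content and URL are placeholders):
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.junit.Test;

public class ProcessorTest {

    @Test
    public void usesCannedXml() throws Exception {
        String cannedXml = "<response><item>42</item></response>";
        InputStream fakeStream = new ByteArrayInputStream(cannedXml.getBytes("UTF-8"));

        IUrlStreamSource source = mock(IUrlStreamSource.class);
        when(source.getContent("http://example.com/api")).thenReturn(fakeStream);

        // Pass the mock into whatever class normally calls getContent()
        // and assert on its output as usual.
    }
}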
The other answers here advise you to refactor your code to use a sort of provider which you can replace during your tests, which is the better approach.
If that isn't a possibility for whatever reason you can install a custom URLStreamHandlerFactory that intercepts the URLs you want to "mock" and falls back to the standard implementation for URLs that shouldn't be intercepted.
Note that this is irreversible, so you can't remove the InterceptingUrlStreamHandlerFactory once it's installed - the only way to get rid of it is to restart the JVM. You could implement a flag in it to disable it and return null for all lookups - which would produce the same results.
URLInterceptionDemo.java:
public class URLInterceptionDemo {

    private static final String INTERCEPT_HOST = "dummy-host.com";

    public static void main(String[] args) throws IOException {
        // Install our own stream handler factory
        URL.setURLStreamHandlerFactory(new InterceptingUrlStreamHandlerFactory());

        // Fetch an intercepted URL
        printUrlContents(new URL("http://dummy-host.com/message.txt"));
        // Fetch another URL that shouldn't be intercepted
        printUrlContents(new URL("http://httpbin.org/user-agent"));
    }

    private static void printUrlContents(URL url) throws IOException {
        try (InputStream stream = url.openStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(stream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    private static class InterceptingUrlStreamHandlerFactory implements URLStreamHandlerFactory {

        @Override
        public URLStreamHandler createURLStreamHandler(final String protocol) {
            if ("http".equalsIgnoreCase(protocol)) {
                // Intercept HTTP requests
                return new InterceptingHttpUrlStreamHandler();
            }
            return null;
        }
    }

    private static class InterceptingHttpUrlStreamHandler extends URLStreamHandler {

        @Override
        protected URLConnection openConnection(final URL u) throws IOException {
            if (INTERCEPT_HOST.equals(u.getHost())) {
                // This URL should be intercepted, return the file from the classpath
                return URLInterceptionDemo.class.getResource(u.getHost() + "/" + u.getPath()).openConnection();
            }
            // Fall back to the default handler; by passing the default handler here we won't end up
            // in the factory again - which would trigger infinite recursion
            return new URL(null, u.toString(), new sun.net.www.protocol.http.Handler()).openConnection();
        }
    }
}
dummy-host.com/message.txt:
Hello World!
When run, this app will output:
Hello World!
{
"user-agent": "Java/1.8.0_45"
}
It's pretty easy to change the criteria of how you decide which URLs to intercept and what you return instead.
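For instance, the disable flag mentioned above could be a simple static boolean checked in the factory (a sketch; the enabled field is made up):
private static class InterceptingUrlStreamHandlerFactory implements URLStreamHandlerFactory {

    // Volatile so tests can flip it from any thread
    static volatile boolean enabled = true;

    @Override
    public URLStreamHandler createURLStreamHandler(final String protocol) {
        if (enabled && "http".equalsIgnoreCase(protocol)) {
            return new InterceptingHttpUrlStreamHandler();
        }
        // Returning null makes the JDK fall back to its built-in handlers
        return null;
    }
}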
The answer depends on what you are testing.
If you need to test the processing of the InputStream
If getContent() is called by some code that processes the data returned by the InputStream, and you want to test how the processing code handles specific sets of input, then you need to create a seam to enable testing. I would simply move getContent() into a new class, and inject that class into the class that does the processing:
public interface ContentSource {
    InputStream getContent(String uri);
}
You could create a HttpContentSource that uses URL.openConnection() (or, better yet, the Apache HttpClient code).
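A rough sketch of what that HttpContentSource might look like (essentially the getContent() from the question moved behind the interface; error handling kept minimal):
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpContentSource implements ContentSource {

    @Override
    public InputStream getContent(String uri) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(uri).openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Accept", "application/xml");
            return connection.getInputStream();
        } catch (IOException e) {
            // MalformedURLException is an IOException, so one catch covers both cases
            return null;
        }
    }
}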
Then you would inject the ContentSource into the processor:
public class Processor {
    private final ContentSource contentSource;

    @Inject
    public Processor(ContentSource contentSource) {
        this.contentSource = contentSource;
    }

    ...
}
The code in Processor could be tested with a mock ContentSource.
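For instance, a test with a hand-rolled fake could look roughly like this (the XML, the process() call, and the class names are placeholders):
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.junit.Test;

public class ProcessorFakeSourceTest {

    // Fake ContentSource that always returns the same canned XML
    private static class FakeContentSource implements ContentSource {
        @Override
        public InputStream getContent(String uri) {
            String xml = "<response><value>expected</value></response>";
            return new ByteArrayInputStream(xml.getBytes());
        }
    }

    @Test
    public void processesCannedXml() {
        Processor processor = new Processor(new FakeContentSource());
        // Call whatever method does the processing and compare against the
        // expected output, e.g.:
        // assertEquals(expectedResult, processor.process("http://example.com/api"));
    }
}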
If you need to test the fetching of the content
If you want to make sure that getContent() works, you could create a test that starts a lightweight in-memory HTTP server that serves the expected content, and have getContent() talk to that server. That does seem overkill.
If you need to test a large subset of the system with fake data
If you want to make sure things work end to end, write an end-to-end system test. Since you indicated you use Spring, you can use Spring to wire together parts of the system (or to wire the entire system, but with different properties). You have two choices:
Have the system test start a local HTTP server, and when you have your test create your system, configure it to talk to that server. See the answers to this question for ways to start the HTTP server.
Configure spring to use a fake implementation of ContentSource. This gets you slightly less confidence that everything works end-to-end, but it will be faster and less flaky.
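For the second option, a sketch with Spring Boot test support might look like this (bean and class names are made up; ContentSource is the interface sketched above):
import java.io.ByteArrayInputStream;

import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;

@TestConfiguration
public class FakeContentSourceConfig {

    @Bean
    @Primary  // takes precedence over the real ContentSource bean in tests
    public ContentSource fakeContentSource() {
        return uri -> new ByteArrayInputStream(
                "<response><value>expected</value></response>".getBytes());
    }
}
You would then pull this configuration into the system test, e.g. with @Import(FakeContentSourceConfig.class) on a @SpringBootTest class.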
I just started using the YouTube API for Java and I'm having a tough time trying to figure out why things don't work, since the exception/stack trace is nowhere to be found. What I'm trying to do is get the list of videos uploaded by the current user.
GoogleTokenResponse tokenFromExchange = new GoogleTokenResponse();
tokenFromExchange.setAccessToken(accessToken);
GoogleCredential credential = new GoogleCredential.Builder().setJsonFactory(JSON_FACTORY).setTransport(TRANSPORT).build();
credential.setFromTokenResponse(tokenFromExchange);
YouTube.Channels.List channelRequest = youtube.channels().list("contentDetails");
channelRequest.setMine(true);
channelRequest.setFields("items/contentDetails,nextPageToken,pageInfo");
ChannelListResponse channelResult = channelRequest.execute();
I don't see anything wrong with this code, and I also tried removing multiple things, but I'm still not able to get it to work. Please let me know if you have run into a similar issue. The version of the client library I'm using is v3-rev110-1.18.0-rc.
The YouTube API docs have some working code you can use.
public static YouTubeService service;
public static String USER_FEED = "http://gdata.youtube.com/feeds/api/users/";
public static String CLIENT_ID = "...";
public static String DEVELOPER_KEY = "...";

public static int getVideoCountOf(String uploader) {
    try {
        service = new YouTubeService(CLIENT_ID, DEVELOPER_KEY);
        // e.g. uploader = "UCK-H1e0S8jg-8qoqQ5N8jvw" (sample user)
        String feedUrl = USER_FEED + uploader + "/uploads";
        VideoFeed videoFeed = service.getFeed(new URL(feedUrl), VideoFeed.class);
        return videoFeed.getTotalResults();
    } catch (Exception ex) {
        Logger.getLogger(YouTubeCore.class.getName()).log(Level.SEVERE, null, ex);
    }
    return 0;
}
This simply gives you the number of videos a user has. You can read through videoFeed using the printEntireVideoFeed method provided on their API page.
I have cloud storage at Strato, namely HiDrive. It uses the WebDAV protocol; note that it's based on HTTP. The client application they provide is poor and buggy, so I tried various other tools for synchronization, but none of them worked the way I need.
I'm therefore trying to implement it in Java using the Sardine project. Is there any code for hard-copying a local source folder to an external cloud folder? I haven't found anything in that direction.
The following code is supposed to upload the file...
Sardine sardine = SardineFactory.begin("username", "password");
InputStream fis = new FileInputStream(new File("some/file/test.txt"));
sardine.put("https://webdav.hidrive.strato.com/users/username/Backup", fis);
... but throws an exception instead:
Exception in thread "main" com.github.sardine.impl.SardineException: Unexpected response (301 Moved Permanently)
at com.github.sardine.impl.handler.ValidatingResponseHandler.validateResponse(ValidatingResponseHandler.java:48)
at com.github.sardine.impl.handler.VoidResponseHandler.handleResponse(VoidResponseHandler.java:34)
at com.github.sardine.impl.handler.VoidResponseHandler.handleResponse(VoidResponseHandler.java:1)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:218)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
at com.github.sardine.impl.SardineImpl.execute(SardineImpl.java:828)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:755)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:738)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:726)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:696)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:689)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:682)
at com.github.sardine.impl.SardineImpl.put(SardineImpl.java:676)
Printing out the folders in that directory works, so the connection/authentication did succeed:
List<DavResource> resources = sardine.list("https://webdav.hidrive.strato.com/users/username/Backup");
for (DavResource res : resources) {
    System.out.println(res);
}
Please either help me fix my code or link me to some file synchronization library that works for my purpose.
Sardine uses HttpClient internally. There is a similar question here where you can find an answer: Httpclient 4, error 302. How to redirect?
Try converting the InputStream object into a byte array before you call put(). Something like the below:
byte[] fisByte = IOUtils.toByteArray(fis);
sardine.put("https://webdav.hidrive.strato.com/users/username/Backup", fisByte);
It worked for me. Let me know.
I had to extend org.apache.http.impl.client.LaxRedirectStrategy and also the getRedirect() method of org.apache.http.impl.client.DefaultRedirectStrategy with handling for the needed methods: PUT, MKCOL, etc. By default only GET is redirected.
It looks like this:
private static final String[] REDIRECT_METHODS = new String[] { HttpGet.METHOD_NAME, HttpPost.METHOD_NAME, HttpHead.METHOD_NAME, HttpPut.METHOD_NAME, HttpDelete.METHOD_NAME, HttpMkCol.METHOD_NAME };
The isRedirectable method:
for (final String m : REDIRECT_METHODS) {
    if (m.equalsIgnoreCase(method)) {
        System.out.println("isRedirectable true");
        return true;
    }
}
return method.equalsIgnoreCase(HttpPropFind.METHOD_NAME);
The getRedirect method:
final URI uri = getLocationURI(request, response, context);
final String method = request.getRequestLine().getMethod();
if (method.equalsIgnoreCase(HttpHead.METHOD_NAME)) {
    return new HttpHead(uri);
} else if (method.equalsIgnoreCase(HttpGet.METHOD_NAME)) {
    return new HttpGet(uri);
} else if (method.equalsIgnoreCase(HttpPut.METHOD_NAME)) {
    HttpPut httpPut = new HttpPut(uri);
    httpPut.setEntity(((HttpEntityEnclosingRequest) request).getEntity());
    return httpPut;
} else if (method.equalsIgnoreCase("MKCOL")) {
    return new HttpMkCol(uri);
} else if (method.equalsIgnoreCase("DELETE")) {
    return new HttpDelete(uri);
} else {
    final int status = response.getStatusLine().getStatusCode();
    if (status == HttpStatus.SC_TEMPORARY_REDIRECT) {
        return RequestBuilder.copy(request).setUri(uri).build();
    } else {
        return new HttpGet(uri);
    }
}
That worked for me.
In my app, I create a SQLite database. Then I populate it with JSON data fetched from a URL using an instance of the HttpAsyncTask class in my main activity. That works fine, but I also want to update the database. New data (one row in the database) is added to the URL page once per day, and I want to implement a "synchronize" button in the app that updates the database with only the new information. Could I get some advice on how to do this? My HttpAsyncTask is below, if that helps - I'm thinking I might need an if/else clause in the onPostExecute() method that adds all the rows only if the database is getting created for the first time. I thought about trying to put an HttpAsyncTask class in my DatabaseHelper, but that doesn't really make sense.
private class HttpAsyncTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        return GET(urls[0]);
    }

    @Override
    protected void onPostExecute(String result) {
        try {
            JSONObject main = new JSONObject(result);
            JSONObject body = main.getJSONObject("body");
            JSONArray measuregrps = body.getJSONArray("measuregrps");
            // get measurements for date, unit, and value (weight)
            for (int i = 0; i < measuregrps.length(); i++) {
                JSONObject row = measuregrps.getJSONObject(i);
                // a lot of getting & parsing data happens
                db.addEntry(new Entry(date, weight, null, null));
                // adds all the lines every time this is run, but I only want to add all
                // the lines once and then add new rows one by one from there
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
}
public static String GET(String url) {
    InputStream is = null;
    String result = "";
    try {
        HttpClient client = new DefaultHttpClient();
        HttpGet get = new HttpGet(url);
        HttpResponse httpResponse = client.execute(get);
        is = httpResponse.getEntity().getContent();
        if (is != null)
            result = convertInputStream(is);
        else
            result = "Did not work!";
    } catch (Exception e) {
        Log.d("input stream", e.getLocalizedMessage());
    }
    return result;
}

private static String convertInputStream(InputStream is) throws IOException {
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
    String line = "";
    while ((line = reader.readLine()) != null)
        builder.append(line);
    is.close();
    return builder.toString();
}
Your implementation totally depends on the project requirements.
If there are continuous changes on the server, the right way to implement the synchronization process is:
1. Implement a sync process that works entirely in the background. This sync will be customized to call the specific API calls/service classes required for syncing.
2. The server notifies the mobile client when data changes.
3. To get server updates, run a service/sync at predefined intervals that checks for updates, or implement GCM push messages.
A SyncAdapter would be best for the sync service (a bare-bones skeleton is sketched below).
Also, don't forget to use a ContentProvider, since database calls will come concurrently from both the UI and the background.
Hope this helps you decide.
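A bare-bones SyncAdapter skeleton, for reference (class name and sync logic are placeholders; the account and ContentProvider plumbing is omitted):
import android.accounts.Account;
import android.content.AbstractThreadedSyncAdapter;
import android.content.ContentProviderClient;
import android.content.Context;
import android.content.SyncResult;
import android.os.Bundle;

public class MeasurementSyncAdapter extends AbstractThreadedSyncAdapter {

    public MeasurementSyncAdapter(Context context, boolean autoInitialize) {
        super(context, autoInitialize);
    }

    @Override
    public void onPerformSync(Account account, Bundle extras, String authority,
                              ContentProviderClient provider, SyncResult syncResult) {
        // Runs on a background thread: fetch the JSON here and insert only
        // the rows that are newer than what is already in the database.
    }
}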
You have to check whether similar data is already available in the table; if yes, update the existing row, and if not, insert the new data.
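For example, if a row is uniquely identified by its date, one low-effort approach is a UNIQUE constraint plus insertWithOnConflict, so re-running the import can't create duplicates (table and column names here are made up):
import android.content.ContentValues;
import android.database.sqlite.SQLiteDatabase;

public class EntryWriter {
    // Assumes the table was created with a UNIQUE constraint on the date column,
    // e.g. CREATE TABLE entries (date TEXT UNIQUE, weight REAL)
    public static void addOrUpdate(SQLiteDatabase db, String date, double weight) {
        ContentValues values = new ContentValues();
        values.put("date", date);
        values.put("weight", weight);
        // Inserts a new row, or replaces an existing row that has the same date
        db.insertWithOnConflict("entries", null, values, SQLiteDatabase.CONFLICT_REPLACE);
    }
}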
I have been trying to get an app to work that allows me to read from a URL and then use the text that I get from the URL for other purposes in the app.
The problem I'm having is that the text isn't being "saved".
I know for a fact that my #getText method works because I ran a basic command line application in IntelliJ:
static String textFromUrl;

public static void main(String[] args) {
    textFromUrl = Vars.getEngUrlText();
    System.out.println(textFromUrl);
}
and the result was it printing the exact text it should. And this was written inside the main activity's class in my Android project; I just ran the main method as a normal Java application instead of running the actual apk from my USB device.
Now, I try to do the same on the Android device by doing the following.
In Vars class:
static String textFromUrl;
In #onCreate of the first activity:
Vars.textFromUrl = Vars.getEngUrlText();
TextView tx = (TextView) findViewById(R.id.titletext);
tx.setText(Vars.textFromUrl);
and it just displays blank, no text, no nothing. Rest of the layout is fine though, everything else shows and no errors. I'm assuming the value of textFromUrl is null and I don't know why.
Yes I do have the proper permissions to access the web in my AndroidManifest because I'm using a WebView and it works fine. I've even tried running threads that give it some time to wait (about 5 seconds) before changing the text and it still won't work.
What's going on?
getText and getEngUrlText are below; getEngUrlText calls getText:
public static String getText(String url) throws Exception {
    URL website = new URL(url);
    URLConnection connection = website.openConnection();
    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    connection.getInputStream()));

    StringBuilder response = new StringBuilder();
    String inputLine;
    while ((inputLine = in.readLine()) != null)
        response.append(inputLine);

    in.close();
    return response.toString();
}

public static String getEngUrlText() {
    try {
        textFromUrl = getText("url that is supposed to be here removed");
    } catch (Exception e) {
        e.printStackTrace();
    }
    return textFromUrl;
}
Fixed this by targeting a lower SDK. Apparently I didn't see a NetworkOnMainThreadException that the logcat was displaying, because I was only looking for errors and not all warnings and such.
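For reference, a sketch of doing the fetch off the main thread instead, reusing the Vars class and TextView from the question (this also works when targeting newer SDKs):
// Inside the Activity's onCreate(), after setContentView(...)
final TextView tx = (TextView) findViewById(R.id.titletext);
new Thread(new Runnable() {
    @Override
    public void run() {
        // Network I/O happens off the main thread, so no NetworkOnMainThreadException
        final String text = Vars.getEngUrlText();
        // UI updates have to go back to the main thread
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                tx.setText(text);
            }
        });
    }
}).start();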