I would like to get Amazon's categories. I am planning to scrape the site rather than use the API.
I have scraped http://www.amazon.com and collected all the categories and sub-categories under the Shop By Department drop-down. I have created a web service to do this. The code is here:
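import urllib2
from json import dumps
from bottle import route, run, response
from bs4 import BeautifulSoup  # assumption: the original import was not shown

@route('/hello')  # route path assumed; the original decorator was not shown
def hello():
    req = urllib2.Request("http://www.amazon.com",
                          headers={"Content-Type": "application/json"})
    html = urllib2.urlopen(req).read()
    soup = BeautifulSoup(html)
    last_page = soup.find('div', id="nav_subcats")
    alltext = []
    for elm in last_page.findAll('a'):
        text = elm.text
        # the browse node ID is whatever follows "&node=" in the link
        node = elm.get('href').partition("&node=")[2]
        alltext.append({"category_name": text, "category_id": node})
    response.content_type = 'application/json'
    return dumps(alltext)

run(host='localhost', port=8080, debug=True)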
I pass the category name and category ID as a JSON object to one of my team members, who passes them to the API to get the product listing for each category.
That part is written in Java. Here is the code:
for (int pageno = 1; pageno <= 10; pageno++) {
    String page = String.valueOf(pageno);
    String category_string = selectedOption.get("category_name").toString();
    String category_id = selectedOption.get("category_id").toString();
    final Map<String, String> params = new HashMap<String, String>(3);
    params.put(AmazonClient.Op.PARAM_OPERATION, "ItemSearch");
    params.put("SearchIndex", category_string);
    params.put("BrowseNodeId", category_id);
    params.put("Keywords", category_string);
    params.put("ItemPage", page);
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    Document doc = null;
    DocumentBuilder db = dbf.newDocumentBuilder();
    InputStream is = client.getInputStream(params);
    doc = db.parse(is);
    NodeList itemList = doc.getElementsByTagName("Items");
}
But I get this error when I pass the category ID as the BrowseNodeId and the category name as both Keywords and SearchIndex.
For example:
SearchIndex and Keywords: Amazon Instant Video
The value you specified for SearchIndex is invalid. Valid values include [ 'All','Apparel',...................................reless','WirelessAccessories' ].
I would like to know from which Amazon URL I can get all the categories and their browse nodes.
Thank you
I have never looked at Amazon's API before, so this is just a guess, but based on the error message it would seem that "Amazon Instant Video" is not a valid search index. Just because it appears in the drop-down list doesn't necessarily mean that it is a valid search index.
Here's a list of search indices for the US: http://docs.aws.amazon.com/AWSECommerceService/latest/DG/USSearchIndexParamForItemsearch.html. I don't know how up to date it is, but "Amazon Instant Video" does not appear on the list. The error message does include a list of valid search index values, and these do appear to correspond to the above list.
For other locales, look here: http://docs.aws.amazon.com/AWSECommerceService/latest/DG/APPNDX_SearchIndexParamForItemsearch.html
I don't think that this is a coding problem per se.
You might like to take a look at python-amazon-product-api. The API might be useful to you, and the documentation might give you some ideas.
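Since the scraped drop-down names do not all map to valid search indices, one option is to check each name before calling ItemSearch. The following is only a rough sketch: the class and method names are made up for illustration, and the set holds just the few values that are legible in the truncated error message above. The full list would have to come from the Amazon documentation linked earlier.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SearchIndexCheck {

    // Placeholder subset: only values legible in the truncated error message.
    // The complete list should be taken from the Amazon documentation.
    static final Set<String> VALID_SEARCH_INDICES =
            new HashSet<>(Arrays.asList("All", "Apparel", "WirelessAccessories"));

    /** Returns true if the scraped category name can be used as a SearchIndex. */
    static boolean isValidSearchIndex(String name) {
        return name != null && VALID_SEARCH_INDICES.contains(name);
    }

    public static void main(String[] args) {
        // Print which scraped names would be accepted as a SearchIndex.
        for (String scraped : new String[] {"Apparel", "Amazon Instant Video"}) {
            System.out.println(scraped + " -> " + isValidSearchIndex(scraped));
        }
    }
}
```

Names that fail the check can be skipped or mapped by hand to a documented search index before the request is built.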
I have a string list with the dates of the days of a given week.
String[] daysweek = {"10/05/2020", "11/05/2020", "12/05/2020", "13/05/2020", "14/05/2020", "15/05/2020", "16/05/2020"};
My goal is to be able to find several documents that belong to a certain week. The comparison field is "firstday".
Below is an image of the document structure in the database:
Document insert = new Document().append("$elemMatch", Arrays.asList(daysweek));
Document filterstar = new Document().append("id_motorista", idmotorista).append("pagamento", false).append("firstday", insert);
coll.find(filterstar).projection(new Document().append("_id", 1).append("origem",1).append("destino", 1).append("formadepagamento", 1).append("valordaviagem",1)
    .append("notamotorista",1).append("pagamento",1).append("iniciodaviagem", 1).append("fimdaviagem",1).append("viagemcancelada", 1).append("horadaaceitacao",1)
    .append("horacancelamentomotorista", 1).append("horacancelamentousuario", 1).append("taxadecancelamento", 1).append("valordaviagemmotorista", 1).append("valordaviagemusuario", 1).append("id_acompanhamento",1)
    .append("taxaaplicativo", 1).append("taxacartao", 1).append("taxamotorista", 1)).sort(new Document().append("firstday", 1)).limit(100)
    .into(docs).addOnSuccessListener(new OnSuccessListener<List<Document>>() {
        public void onSuccess(List<Document> documents) {}
    });
But the search finds no documents. The expected number of matching documents is 35.
I would like to know whether there is a way to find documents where a given field matches any of the items in an ArrayList.
$elemMatch is used when you're querying against an array field. In your scenario you're querying against a string field and the input is an array, so you can just use the $in operator.
Mongo Shell Syntax :
db.collection.find({firstday : {$in : ["10/05/2020", "11/05/2020", "12/05/2020", "13/05/2020", "14/05/2020", "15/05/2020", "16/05/2020"]}})
Test : mongoplayground
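If the week is computed rather than typed by hand, the list given to $in can be generated with java.time. This is a minimal sketch that assumes the dd/MM/yyyy string format used in the question; the class and method names are only illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class WeekDates {

    // Same day/month/year layout as the "firstday" strings in the question.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    /** Returns the seven consecutive dates starting at firstDay, formatted as strings. */
    static List<String> weekStartingAt(LocalDate firstDay) {
        List<String> days = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            days.add(firstDay.plusDays(i).format(FORMAT));
        }
        return days;
    }

    public static void main(String[] args) {
        // Week from 10/05/2020 to 16/05/2020, as in the question.
        System.out.println(weekStartingAt(LocalDate.of(2020, 5, 10)));
    }
}
```

The resulting List<String> can then be used as the value of $in, for example new Document("$in", days).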
@whoami's advice worked for me :D
So I changed part of the code.
I changed this:
Document insert = new Document().append("$elemMatch", Arrays.asList(daysweek));
to this:
Document insert = new Document().append("$in", Arrays.asList(daysweek));
Document insert = new Document().append("$in", Arrays.asList(daysweek));
Document filterstar = new Document().append("id_motorista", idmotorista).append("pagamento", false).append("firstday", insert);
coll.find(filterstar).projection(new Document().append("_id", 1).append("origem",1).append("destino", 1).append("formadepagamento", 1).append("valordaviagem",1)
    .append("notamotorista",1).append("pagamento",1).append("iniciodaviagem", 1).append("fimdaviagem",1).append("viagemcancelada", 1).append("horadaaceitacao",1)
    .append("horacancelamentomotorista", 1).append("horacancelamentousuario", 1).append("taxadecancelamento", 1).append("valordaviagemmotorista", 1).append("valordaviagemusuario", 1).append("id_acompanhamento",1)
    .append("taxaaplicativo", 1).append("taxacartao", 1).append("taxamotorista", 1)).sort(new Document().append("firstday", 1)).limit(100)
    .into(docs).addOnSuccessListener(new OnSuccessListener<List<Document>>() {
        public void onSuccess(List<Document> documents) {}
    });
I am trying to add a string to the end of an existing array in a MongoDB document.
I have looked at the MongoDB documentation, which led me to the $push page and to other similar questions. None of them has worked so far, because the document I have has no IDs made by me; they are auto-generated as a new element is added to the array.
Document in collection:
_id: 5ce85c1e1c9d4400003dcfd9
name: "Halloween party"
category: 2
date: 2019-10-31T23:00:00.000+00:00
address: "Sample Street, london"
description: "It's Halloween, bring your costumes and your personality to the studen..."
bookings: Array
0: "1610512"
I am able to get the document to which I want to append the string with the following code.
Java Code:
MongoDatabase database = mongoClient.getDatabase("KioskDB");
MongoCollection<Document> Kiosk = database.getCollection("Events");
Document searchQuery = new Document();
searchQuery.put("name", selectedActivityName);
searchQuery.put("bookings", username);
FindIterable<Document> documents = Kiosk.find(searchQuery);
for (Document document : documents) {
    System.out.println(document);
}
This gives me the following output:
Document{{_id=5ce85c1e1c9d4400003dcfd9, name=Halloween party, category=2, date=Thu Oct 31 23:00:00 GMT 2019, address=Sample Street, london, description=It's Halloween, bring your costumes and your personality to the student Bar and join us in this age long celebration., bookings=[1610512]}}
How do I go about appending a new string to the end of the array, so that I get something like the document shown below?
Desired final document:
_id: 5ce85c1e1c9d4400003dcfd9
name: "Halloween party"
category: 2
date: 2019-10-31T23:00:00.000+00:00
address: "Sample Street, London"
description: "It's Halloween, bring your costumes and your personality to the studen..."
bookings: Array
0: "1610512"
1: "1859301"
Was able to find the answer with the following code.
DBObject listItem = new BasicDBObject("bookings", username);
Kiosk.updateOne(eq("name", selectedActivityName), new Document().append("$push", listItem));
Here username is the number (e.g. 1859301), selectedActivityName is the value of the name field (e.g. Halloween party), and Kiosk is the collection.
I will try this code according to the documentation. Ref. https://mongodb.github.io/mongo-java-driver/3.4/driver/getting-started/quick-start/
Document doc = new Document("name", "Halloween party")
.append("bookings", Arrays.asList("1859301"));
Since the 3.0 Java driver, there are helper methods for filters, which make querying Mongo a lot nicer and more readable. In 3.1 they also added helper methods for updates, which make things like this straightforward and make it easy to understand what is happening:
Bson query = Filters.eq("name", selectedActivityName);
Bson update = Updates.push("bookings", username);
collection.findOneAndUpdate(query, update);
Doing this in older versions is possible as well. This syntax should still hold true for pre 3.0 versions. However, if you're running older than 3.0 you'll need to replace Document with BasicDBObject.
Bson query = new Document("name", selectedActivityName);
Bson update = new Document("$push", new Document("bookings", username));
collection.findOneAndUpdate(query, update);
I'm trying to extract data from a webpage. For example, let's say I wish to fetch information from chess.org.il.
I know the player's ID is 25022, which means I can request http://www.chess.org.il/Players/Player.aspx?Id=25022
In that page I can see that this player's fide ID = 2821109.
From that, I can request this page: http://ratings.fide.com/card.phtml?event=2821109
And from that I can see that stdRating=1602.
How can I get the "stdRating" output from a given "localID" input in Java?
(localID, fideID and stdRating are just helper names that I use to clarify the question)
You could try the univocity-html-parser, which is very easy to use and avoids a lot of spaghetti code.
To get the standard rating for example you can use this code:
public static void main(String... args) {
UrlReaderProvider url = new UrlReaderProvider("http://ratings.fide.com/card.phtml?event={EVENT}");
url.getRequest().setUrlParameter("EVENT", 2821109);
HtmlElement doc = HtmlParser.parseTree(url);
String rating = doc.query()
Which produces the value 1602.
But getting data by querying individual nodes and trying to stitch all pieces together is not exactly easy.
I expanded the code to illustrate how you can use the parser to get more information into records. Here I created records for the player and her rank details which are available in the table of the second page. It took me less than 1h to get this done:
public static void main(String... args) {
UrlReaderProvider url = new UrlReaderProvider("http://www.chess.org.il/Players/Player.aspx?Id={PLAYER_ID}");
url.getRequest().setUrlParameter("PLAYER_ID", 25022);
HtmlEntityList entities = new HtmlEntityList();
HtmlEntitySettings player = entities.configureEntity("player");
player.addField("id").match("b").withExactText("מספר שחקן").getFollowingText().transform(s -> s.replaceAll(": ", ""));
player.addField("name").match("h1").followedImmediatelyBy("b").withExactText("מספר שחקן").getText();
player.addField("date_of_birth").match("b").withExactText("תאריך לידה:").getFollowingText();
player.addField("fide_id").matchFirst("a").attribute("href", "http://ratings.fide.com/card.phtml?event=*").getText();
HtmlLinkFollower playerCard = player.addField("fide_card_url").matchFirst("a").attribute("href", "http://ratings.fide.com/card.phtml?event=*").getAttribute("href").followLink();
HtmlEntitySettings ratings = playerCard.addEntity("ratings");
configureRatingsBetween(ratings, "World Rank", "National Rank ISR", "world");
configureRatingsBetween(ratings, "National Rank ISR", "Continent Rank Europe", "country");
configureRatingsBetween(ratings, "Continent Rank Europe", "Rating Chart", "continent");
Results<HtmlParserResult> results = new HtmlParser(entities).parse(url);
HtmlParserResult playerData = results.get("player");
String[] playerFields = playerData.getHeaders();
for(HtmlRecord playerRecord : playerData.iterateRecords()){
for(int i = 0; i < playerFields.length; i++){
System.out.print(playerFields[i] + ": " + playerRecord.getString(playerFields[i]) +"; ");
HtmlParserResult ratingData = playerRecord.getLinkedEntityData().get("ratings");
for(HtmlRecord ratingRecord : ratingData.iterateRecords()){
System.out.print(" * " + ratingRecord.getString("rank_type") + ": ");
System.out.println(ratingRecord.fillFieldMap(new LinkedHashMap<>(), "all_players", "active_players", "female", "u16", "female_u16"));
private static void configureRatingsBetween(HtmlEntitySettings ratings, String startingHeader, String endingHeader, String rankType) {
Group group = ratings.newGroup()
group.addField("rank_type", rankType);
group.addField("all_players").match("tr").withText("World (all", "National (all", "Rank (all").match("td", 2).getText();
group.addField("active_players").match("tr").followedImmediatelyBy("tr").withText("Female (active players):").match("td", 2).getText();
group.addField("female").match("tr").withText("Female (active players):").match("td", 2).getText();
group.addField("u16").match("tr").withText("U-16 Rank (active players):").match("td", 2).getText();
group.addField("female_u16").match("tr").withText("Female U-16 Rank (active players):").match("td", 2).getText();
The output will be:
id: 25022; name: יעל כהן; date_of_birth: 02/02/2003; fide_id: 2821109; rating_std: 1602; rating_rapid: 1422; rating_blitz: 1526;
* world: {all_players=195907, active_players=94013, female=5490, u16=3824, female_u16=586}
* country: {all_players=1595, active_players=1024, female=44, u16=51, female_u16=3}
* continent: {all_players=139963, active_players=71160, female=3757, u16=2582, female_u16=372}
Hope it helps
Disclosure: I'm the author of this library. It's commercial closed source but it can save you a lot of development time.
As @Alex R pointed out, you'll need a web scraping library for this.
The one he recommended, JSoup, is quite robust and is pretty commonly used for this task in Java, at least in my experience.
You'd first need to construct a document that fetches your page, eg:
int localID = 25022; //your player's ID.
Document doc = Jsoup.connect("http://www.chess.org.il/Players/Player.aspx?Id=" + localID).get();
From this Document object you can fetch a lot of information, for example the FIDE ID you requested. Unfortunately, the web page you linked isn't very simple to scrape, and you'll basically need to go through every link on the page to find the relevant one, for example:
Elements fidelinks = doc.select("a[href*=fide.com]");
This Elements object should give you a list of all links that link to anything containing the text fide.com, but you probably only want the first one, eg:
Element fideurl = doc.selectFirst("a[href*=fide.com]");
From that point on, I don't want to write all the code for you, but hopefully this answer serves as a good starting point!
You can get the ID alone by calling the text() method on your Element object, and you can get the link itself by calling attr("href") on it.
The css selector you can use to get the other value is
div#main-col table.contentpaneopen tbody tr td table tbody tr td table tbody tr:nth-of-type(4) td table tbody tr td:first-of-type, which will get you the std score specifically, at least with standard css, so this should work with jsoup as well.
I am trying to get the value of a key from a sub-document and I can't seem to figure out how to use the BasicDBObject.get() function since the key is embedded two levels deep. Here is the structure of the document
File {
    name: file_1
    report: {
        name: report_1,
        group: RnD
    }
}
Basically a file has multiple reports and I need to retrieve the names of all reports in a given file. I am able to do BasicDBObject.get("name") and I can get the value "file_1", but how do I do something like this BasicDBObject.get("report.name")? I tried that but it did not work.
You should first get the "report" object and then access its contents. You can see sample code below.
DBCursor cur = coll.find();
for (DBObject doc : cur) {
    String fileName = (String) doc.get("name");
    DBObject report = (BasicDBObject) doc.get("report");
    String reportName = (String) report.get("name");
}
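As far as I know, BasicDBObject is itself a Map and get() looks up one literal key, which is why get("report.name") finds nothing. The same two-step lookup can be wrapped in a small helper that walks a dotted path. The sketch below uses plain java.util maps as a stand-in for the driver classes so it runs without MongoDB; the helper name getByPath is hypothetical, and it does not handle the case where "report" is a list of sub-documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedLookup {

    /** Walks a dotted path such as "report.name" through nested maps; returns null if a step is missing. */
    static Object getByPath(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-document value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Same shape as the File document in the question.
        Map<String, Object> report = new LinkedHashMap<>();
        report.put("name", "report_1");
        report.put("group", "RnD");
        Map<String, Object> file = new LinkedHashMap<>();
        file.put("name", "file_1");
        file.put("report", report);

        System.out.println(getByPath(file, "name"));
        System.out.println(getByPath(file, "report.name"));
    }
}
```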
I found a second way of doing it in another post (I didn't save the link, otherwise I would have included it).
where query = (BasicDBObject) cursor.next()
You can also use queries, as in the case of MongoTemplate and so on...
Query query = new Query(Criteria.where("report.name").is("some value"));
You can try this, this worked for me
BasicDBObject query = new BasicDBObject("report.name", "some value");
I am constructing XML code using Java. See my code snippet.
Document document = null;
String xml = "";
ReportsDAO objReportsDAO = null;
logger.info("Getting XML data for Consumable Report Starts...");
objReportsDAO = new ReportsDAO();
List consumableDTOLst = objReportsDAO.getConsumableData(issuedBy, issuedTo, employeeType, itemCode, itemName, className, transactionFromDate, transactionToDate, machineCode, workOrderNumber, jobName, customerId);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.newDocument();
Element rootElmnt = (Element) document.createElement("items");
Element elmt = null;
ConsumableDTO objConsumableDTO = null;
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy");
for (int i = 0; i < consumableDTOLst.size(); i++) {
    objConsumableDTO = (ConsumableDTO) consumableDTOLst.get(i);
    elmt = (Element) document.createElement("item");
    elmt.setAttribute("IssuedBy", objConsumableDTO.getIssuedBy());
    elmt.setAttribute("IssuedTo", objConsumableDTO.getIssuedTo());
    elmt.setAttribute("EMPLOYECADRE", objConsumableDTO.getEmployeeType());
    elmt.setAttribute("ITEMCODE", objConsumableDTO.getItemCode());
    elmt.setAttribute("ITEMNAME", objConsumableDTO.getItemName());
    elmt.setAttribute("ITEMCLASS", objConsumableDTO.getClassName());
    elmt.setAttribute("DATE", sdf.format(objConsumableDTO.getTransactionDate()));
    elmt.setAttribute("machineCode", objConsumableDTO.getMachineCode());
    elmt.setAttribute("JOB", objConsumableDTO.getJobName());
    elmt.setAttribute("WORKORDERNUMBER", objConsumableDTO.getWorkOrderNumber());
    elmt.setAttribute("CustomerName", objConsumableDTO.getCustomerName());
    elmt.setAttribute("RoleName", objConsumableDTO.getGroupName());
    elmt.setAttribute("VendorName", objConsumableDTO.getVendorName());
    elmt.setAttribute("QTY", String.valueOf(Math.abs(objConsumableDTO.getQuantity())));
    elmt.setAttribute("unitDescription", objConsumableDTO.getUnitDescription());
    elmt.setAttribute("RATEPERQTY", String.valueOf(objConsumableDTO.getRate()));
    elmt.setAttribute("AMOUNT", String.valueOf(objConsumableDTO.getAmount()));
}
The problem is that all the attributes are sorted automatically. How can I prevent this?
For example, I get
<empdetails age="25" name="john"/>
but I want
<empdetails name="john" age="25"/>
Please suggest some ideas.
Duplicate: Order of XML attributes after DOM processing
From the accepted answer:
Look at section 3.1 of the XML recommendation. It says, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."
If a piece of software requires attributes on an XML element to appear in a specific order, that software is not processing XML, it's processing text that looks superficially like XML. It needs to be fixed.
If it can't be fixed, and you have to produce files that conform to its requirements, you can't reliably use standard XML tools to produce those files.
Credit to Robert Rossney
XML attributes are not ordered. How they're output is dependent on the XML output mechanism you use.
Consequently you could write your own output mechanism, but you shouldn't rely on any consumer to consume the attributes in an ordered fashion. If you want or need ordering, you should instead specify a sequence of XML elements below this node.
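To illustrate the child-element approach, here is a minimal sketch using only the JDK's DOM and Transformer classes. The class name OrderedFields and the toXml helper are made up for this example, and the empdetails/name/age names are borrowed from the question. Child elements are serialized in the order they were appended, so the order survives where attribute order does not.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OrderedFields {

    /** Builds an empdetails element with one child element per field, in the order given. */
    static String toXml(String[][] fields) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("empdetails");
        document.appendChild(root);

        // Child elements keep document order, unlike attributes.
        for (String[] field : fields) {
            Element child = document.createElement(field[0]);
            child.setTextContent(field[1]);
            root.appendChild(child);
        }

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(new String[][] {{"name", "john"}, {"age", "25"}}));
    }
}
```

The trade-off is a more verbose document, and any consumer that currently reads attributes would need to read child elements instead.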