Parse response from Wikipedia API

I am trying to parse the response from the Wikipedia API (MediaWiki). The URLs I am using are of the form:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Argo_(2012_film)
The response from the API has the Wikipedia content inside an XML tag, which looks like this (just an incomplete sample):
{{Use mdy dates|date=October 2012}} {{Infobox film | name = Argo | image =
Argo2012Poster.jpg | alt = <!-- See: WP:ALT --> | caption = Theatrical release poster |
tagline = "The movie was fake. The mission was real." | director = [[Ben Affleck]] |
producer = [[Grant Heslov]]<br />Ben Affleck<br />[[George Clooney]] | based on = {{Based
on|''The Master of Disguise''|[[Tony Mendez|Antonio J. Mendez]]}}<br />{{Based on|''The
Great Escape''|[[Joshuah Bearman]]}} | screenplay = [[Chris Terrio]] | starring = Ben
Affleck<br />[[Bryan Cranston]]<br />[[Alan Arkin]]<br />[[John Goodman]] | music =
[[Alexandre Desplat]] | cinematography = [[Rodrigo Prieto]] | editing = [[William
Goldenberg]] | studio = [[Graham King|GK Films]]<br />[[Smokehouse Pictures]] | distributor =
[[Warner Bros.]] | released = {{Film date|2012|08|31|Telluride Film
Festival|2012|10|12|United States}} | runtime = 120 minutes<ref> ...continued
This does not look like JSON or XML. How do I parse it?

If you want to get the content parsed as HTML, add &rvparse to the query.
For example, when you execute the query
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Argo_%282012_film%29&rvparse
the response contains something like (after skipping the infobox):
<i><b>Argo</b></i> is a 2012 American <a href="/wiki/Political_thriller"
title="Political thriller">political thriller</a> film directed by Ben Affleck.
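A minimal sketch of how the request URL from this answer could be assembled in Java. The class name WikiQueryUrl and the method buildUrl are hypothetical names for illustration, and URLEncoder.encode(String, Charset) assumes Java 10 or later. Newer MediaWiki versions appear to mark rvparse as deprecated in favor of action=parse, so treat this as mirroring the answer rather than as current best practice.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the revisions query shown in the question.
final class WikiQueryUrl {
    private static final String ENDPOINT = "https://en.wikipedia.org/w/api.php";

    /** Appends rvparse when the content should come back as parsed HTML. */
    static String buildUrl(String title, boolean parseAsHtml) {
        String url = ENDPOINT
                + "?action=query&prop=revisions&rvprop=content&format=xml"
                // URL-encode the title so that "(" and ")" become %28 and %29
                + "&titles=" + URLEncoder.encode(title, StandardCharsets.UTF_8);
        return parseAsHtml ? url + "&rvparse" : url;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("Argo_(2012_film)", true));
    }
}
```

The printed URL should match the one in the answer above; fetching it and reading the XML is left out of the sketch.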

Related

Index out of Bounds Exception - when comparing two(2) lists of Strings

I am comparing two lists of Strings. The comparison itself finishes successfully, but afterwards I get:
java.lang.IndexOutOfBoundsException: Index: 7, Size: 7
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at java.util.Collections$UnmodifiableList.get(Collections.java:1309)
at com.cucumber.CucumberWebGui.StepDefinitions.tableHeaders(StepDefinitions.java:254)
at ✽.Then table with id "content" has header values of(tableHeader.feature:9)
The first list I pass in from a cucumber feature file. The second I collect from the table headers at this website - "http://toolsqa.com/automation-practice-table/"
I have tried changing the for loop, but it doesn't help. I have read other people's same issue on Stack Overflow, but I cannot solve it.
I don't know what to do.
Here is the code and feature file -
Code -
@SuppressWarnings("deprecation")
@Then("^table with id \"([^\"]*)\" has header values of$")
public void tableHeaders(String id, DataTable table) {
    java.util.List<java.util.List<String>> expectedHeaders = table.raw();
    WebElement container = driver.findElement(By.id(id));
    List<WebElement> allHeaders = container.findElements(By.tagName("th"));
    List<String> actualHeaders = new ArrayList<String>();
    for (WebElement header : allHeaders) {
        actualHeaders.add(header.getText());
    }
    for (int i = 0; i < actualHeaders.size(); i++) {
        Assert.assertEquals(expectedHeaders.get(i).get(0), actualHeaders.get(i));
    }
}
Feature File -
Scenario: Test Table Header assertion
Then table with id "content" has header values of
| Structure |
| Country |
| City |
| Height |
| Built |
| Rank |
| … |

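Probably because expectedHeaders has fewer elements than actualHeaders. The loop runs up to actualHeaders.size(), so once i reaches 7, expectedHeaders.get(i) goes past the end of the 7-row list from the feature file.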
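One way to make this kind of mismatch visible is to compare the list sizes before comparing the elements. The sketch below uses plain lists instead of Selenium and Cucumber objects; HeaderCheck and firstDifference are hypothetical names, not part of either library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports the first difference between two header lists.
final class HeaderCheck {
    /** Returns null when the lists match, otherwise a description of the first difference. */
    static String firstDifference(List<String> expected, List<String> actual) {
        // Check sizes first, so a length mismatch never turns into an IndexOutOfBoundsException
        if (expected.size() != actual.size()) {
            return "expected " + expected.size() + " headers but found " + actual.size();
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return "header " + i + ": expected \"" + expected.get(i)
                        + "\" but found \"" + actual.get(i) + "\"";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("Structure", "Country", "City");
        List<String> actual = Arrays.asList("Structure", "Country", "City", "Height");
        System.out.println(firstDifference(expected, actual));
    }
}
```

In the step definition, the same idea would amount to asserting that both lists have the same size before entering the loop.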

Sending table index in SNMP trap

I have implemented an SNMP agent using snmp4j, and am running into a bit of a snag with how to properly report SNMPv3 traps/notifications to an SNMP manager. The implementation is set up to manage a table of values indexed by instanceId. When you query the agent you receive the OIDs of the various fields suffixed by ".1", ".2", ".3", etc. based on the particular link instance associated with the OID value. So querying results in:
---------------------------------------
| Entity | OID         | Value        |
---------------------------------------
| Link1  | linkAlias.1 | Link 1       |
| Link2  | linkAlias.2 | Link 2       |
| Link1  | linkState.1 | 1            |
| Link2  | linkState.2 | 3            |
| Link1  | linkText.1  | UP           |
| Link2  | linkText.2  | INITIALIZING |
---------------------------------------
That works great. However, I need to be able to send traps in a similar way so that the index of the table is sent with the trap. That way, alarms triggered from SNMP queries can be cleared when the link status changes. I tried simply adding the instanceId as a varbind, as seen in my code block below, but the entity is always reported as the generic "Link". Has anyone encountered this who could help me solve it? Any help is greatly appreciated.
public static void updateLinkStatus(int instanceId, LinkState status)
{
    boolean varChanged = false;
    Variable[] currentManagedObject = currentManagedObjects.get(instanceId);
    if(currentManagedObject != null)
    {
        // If we are managing this object
        if(((Integer32)currentManagedObject[1]).toInt() != status.value)
            varChanged = true;
        // Update the managed object Status Integer and Status Text
        currentManagedObject[1] = new Integer32(status.value);
        currentManagedObject[2] = new OctetString(status.getLinkStateText());
    }
    else
    {
        varChanged = true; // No previous record to check against
    }
    // Send trap now if not equal to previous value
    if(varChanged)
    {
        OID trapOid = null;
        int linkState = LinkState.UNKNOWN.value; // Will be overridden
        String linkStateText = null;
        if(status == LinkState.DOWN)
        {
            trapOid = oidLinkDown;
            linkState = LinkState.DOWN.value;
            linkStateText = LinkState.DOWN.getLinkStateText();
        }
        else if(status == LinkState.MISCONFIGURED)
        {
            trapOid = oidLinkMisconfigured;
            linkState = LinkState.MISCONFIGURED.value;
            linkStateText = LinkState.MISCONFIGURED.getLinkStateText();
        }
        else if(status == LinkState.UP)
        {
            trapOid = oidLinkUp;
            linkState = LinkState.UP.value;
            linkStateText = LinkState.UP.getLinkStateText();
        }
        else if(status == LinkState.INITIALIZING)
        {
            trapOid = oidLinkInitializing;
            linkState = LinkState.INITIALIZING.value;
            linkStateText = LinkState.INITIALIZING.getLinkStateText();
        }
        else
        {
            // Theoretically, should never change to LinkState.UNKNOWN - no trap available for it
            linkState = LinkState.UNKNOWN.value;
            linkStateText = LinkState.UNKNOWN.getLinkStateText();
        }
        // Create variable bindings for V3 trap
        if(trapOid != null)
        {
            List<VariableBinding> variableBindings = new ArrayList<VariableBinding>();
            variableBindings.add(new VariableBinding(oidLinkState, new Integer32(linkState)));
            variableBindings.add(new VariableBinding(oidLinkText, new OctetString(linkStateText)));
            variableBindings.add(new VariableBinding(oidLinkInstanceID, new Integer32(instanceId)));
            //I've tried the below varbind too and it also doesn't work
            //variableBindings.add(new VariableBinding(new OID(oidLinkInstanceID.toIntArray(), instanceId)));
            agent.sendTrap_Version3(trapOid, variableBindings);
        }
    }
}
Edit:
Note: The links are dynamically configurable so I cannot simply define each link with a separate OID; I define the base OID in the MIB and need to add the index dynamically.

I discovered the solution. I attempted to simply add the instance index to the OID and send a NULLOBJ as the value in the varbind like so:
variableBindings.add(new VariableBinding(new OID(oidLinkInstanceID.toIntArray(), instanceId)));
But the manager rejected the message. So I added the instance index to the OID as well as the varbind value like so:
variableBindings.add(new VariableBinding(new OID(oidLinkInstanceID.toIntArray(), instanceId), new Integer32(instanceId)));
And the manager reports the entity as Link1, Link2, etc. consistent with the SNMP table.

Java - data structure for storing and getting translations (as object properties)

I want to save translation values in a class. Because it's so convenient, Java's Locale implementation seems like the correct key for the mapping. The problem is: if I just use HashMap<Locale, String> translations = ...; for the translations, my code will not be able to fall back when a specific locale is not available.
How can I achieve a good data structure for storing translations of an object?
Note that these translations are not translations of program elements such as a user interface. Imagine the class being a Dictionary entry, so each class has its own set of translations, which is different every time.
Here is an example of what the problem with a HashMap would be:
import java.util.HashMap;
import java.util.Locale;
public class Example
{
    private final HashMap<Locale, String> translationsMap = new HashMap<>();
    /*
     * +------------------------+-------------------+-------------------+
     * | Input                  | Expected output   | Actual output     |
     * +------------------------+-------------------+-------------------+
     * | new Locale("en")       | "enTranslation"   | "enTranslation"   |
     * | new Locale("en", "CA") | "enTranslation"   | null              | <-- Did not fall back
     * | new Locale("de")       | "deTranslation"   | "deTranslation"   |
     * | new Locale("de", "DE") | "deTranslation"   | null              | <-- Did not fall back
     * | new Locale("de", "AT") | "deATTranslation" | "deATTranslation" |
     * | new Locale("fr")       | "frTranslation"   | "frTranslation"   |
     * | new Locale("fr", "CA") | "frTranslation"   | null              | <-- Did not fall back
     * +------------------------+-------------------+-------------------+
     */
    public String getTranslation(Locale locale)
    {
        return translationsMap.get(locale);
    }
    public void addTranslation(Locale locale, String translation)
    {
        translationsMap.put(locale, translation);
    }
    // instance initializer block
    {
        addTranslation(new Locale("en"), "enTranslation");
        addTranslation(new Locale("de"), "deTranslation");
        addTranslation(new Locale("fr"), "frTranslation");
        addTranslation(new Locale("de", "AT"), "deATTranslation");
    }
}

This is a little bit hacky, but it works. Using a ResourceBundle.Control, it's possible to use a standard implementation for fallbacks.
private Map<Locale, String> translations = new HashMap<>();
/** static: this instance is not modified or bound, it can be reused for multiple instances */
private static final ResourceBundle.Control CONTROL = ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
@Nullable
public String getTranslation(@NotNull Locale locale)
{
    List<Locale> localeCandidates = CONTROL.getCandidateLocales("_dummy_", locale); // Sun's implementation discards the string argument
    for (Locale currentCandidate : localeCandidates)
    {
        String translation = translations.get(currentCandidate);
        if (translation != null)
            return translation;
    }
    return null;
}
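To see the fallback in action, here is a self-contained sketch of the same idea, using only the JDK. The class name TranslatedEntry is a hypothetical stand-in for the dictionary-entry class from the question, and the base name passed to getCandidateLocales is an arbitrary placeholder.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;

// Hypothetical dictionary entry that falls back from a specific locale to a more general one.
final class TranslatedEntry {
    private static final ResourceBundle.Control CONTROL =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);

    private final Map<Locale, String> translations = new HashMap<>();

    void addTranslation(Locale locale, String translation) {
        translations.put(locale, translation);
    }

    /** Returns the translation for the most specific candidate locale that has one, or null. */
    String getTranslation(Locale locale) {
        // For en-CA the default candidate list is en_CA, en, then the root locale
        for (Locale candidate : CONTROL.getCandidateLocales("unused", locale)) {
            String translation = translations.get(candidate);
            if (translation != null) {
                return translation;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TranslatedEntry entry = new TranslatedEntry();
        entry.addTranslation(Locale.forLanguageTag("en"), "enTranslation");
        entry.addTranslation(Locale.forLanguageTag("de"), "deTranslation");
        entry.addTranslation(Locale.forLanguageTag("de-AT"), "deATTranslation");

        System.out.println(entry.getTranslation(Locale.forLanguageTag("en-CA")));
        System.out.println(entry.getTranslation(Locale.forLanguageTag("de-AT")));
        System.out.println(entry.getTranslation(Locale.forLanguageTag("ja")));
    }
}
```

Adding a translation under Locale.ROOT would give every lookup a last-resort default, since the root locale is the final candidate in the list.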

Have your classes extend ListResourceBundle.
See here: https://docs.oracle.com/javase/tutorial/i18n/resbundle/list.html

Regular expression, omit bracket

I need help figuring out a regular expression that removes all the data between {{ and }}.
Below is the corpus:
{{for|the American actor|Russ Conway (actor)}}
{{Use dmy dates|date=November 2012}}
{{Infobox musical artist <!-- See Wikipedia:WikiProject_Musicians -->
| birth_name = Trevor Herbert Stanford
| birth_date = {{birth date|1925|09|2|df=y}}
| birth_place = [[Bristol]], [[England]], UK
| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place = [[Eastbourne]], [[Sussex]], England, UK
| origin =
}}
record|hits]].<ref name="British Hit Singles & Albums"/>
{{reflist}}
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*{{YouTube|TnIpQhDn4Zg|Russ Conway playing Side Saddle}}
{{Authority control|VIAF=41343596}}
<!-- Metadata: see [[Wikipedia:Persondata]] -->
{{Persondata
| NAME =Conway, Russ
}}
{{DEFAULTSORT:Conway, Russ}}
[[Category:1925 births]]
Below is the expected output, with all the curly braces removed along with the text within them:
record|hits]].<ref name="British Hit Singles & Albums"/>
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*
<!-- Metadata: see [[Wikipedia:Persondata]] -->
[[Category:1925 births]]
P.S. I have omitted the whitespace in the output; I will take care of that myself.

This would take care of nested {{ }}
Matcher m = Pattern.compile("\\{[^{}]*\\}").matcher(input);
while (m.find())
{
    input = m.replaceAll("");
    m.reset(input);
}
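Wrapped into a complete class, the loop above might look like the sketch below. BraceStripper and strip are hypothetical names. The pattern matches single innermost {...} groups, so each pass peels off one nesting level. Note that it would also remove single-brace groups, and it leaves unbalanced braces in place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the loop from the answer above.
final class BraceStripper {
    // Matches a brace pair that contains no other braces, i.e. an innermost group
    private static final Pattern INNERMOST = Pattern.compile("\\{[^{}]*\\}");

    /** Repeatedly removes innermost {...} groups until none are left. */
    static String strip(String input) {
        Matcher m = INNERMOST.matcher(input);
        while (m.find()) {
            input = m.replaceAll("");
            m.reset(input); // rescan the shortened text for the next nesting level
        }
        return input;
    }

    public static void main(String[] args) {
        String sample = "{{Infobox\n| birth_date = {{birth date|1925|09|2|df=y}}\n}}\n==External links==";
        System.out.println(strip(sample));
    }
}
```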

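string.replaceAll("\\{\\{[\\s\\S]*?\\}\\}","");
will produce the output below (apart from nested templates, where the lazy match stops at the first }} and leaves the rest of the outer template behind):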
record|hits]].<ref name="British Hit Singles & Albums"/>
==External links==
*[http://www.russconway.co.uk/ Russ Conway]
*
<!-- Metadata: see [[Wikipedia:Persondata]] -->
[[Category:1925 births]]

Is there an alternative to javax.mail.search?

I'm using GNU NNTP to connect to leafnode, which is an NNTP server, on localhost. The GNU API utilizes javax.mail.Message, which comes with the following caveat:
From the Message API:
..the message number for a particular Message can change during a
session if other messages in the Folder are deleted and expunged.
So, currently, I'm using javax.mail.search to search for a known message. Unfortunately, for each search the entire folder has to be searched. I could keep the folder open and in that way speed up the search a bit, but it just seems clunky.
What's an alternative approach to using javax.mail.search? This:
SearchTerm st = new MessageIDTerm(id);
List<Message> messages = Arrays.asList(folder.search(st));
works fine when the javax.mail.Folder only has a few Messages. However, for very large Folders there must be a better approach. Instead of the Message-ID header field, Xref might be preferable, but it still has the same fundamental problem of searching strings.
Here's the database, which just needs to hold enough information to find/get/search the Folders for a specified message:
mysql>
mysql> use usenet;show tables;
Database changed
+------------------+
| Tables_in_usenet |
+------------------+
| articles         |
| newsgroups       |
+------------------+
2 rows in set (0.00 sec)
mysql>
mysql> describe articles;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| ID           | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| MESSAGEID    | varchar(255) | YES  |     | NULL    |                |
| NEWSGROUP_ID | bigint(20)   | YES  | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql>
mysql> describe newsgroups;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| ID        | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| NEWSGROUP | varchar(255) | YES  |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql>
While the schema is very simple at the moment, I plan to add complexity to it.
Messages are queried with getMessage():
package net.bounceme.dur.usenet.model;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.mail.*;
import javax.mail.search.MessageIDTerm;
import javax.mail.search.SearchTerm;
import net.bounceme.dur.usenet.controller.Page;
public enum Usenet {
    INSTANCE;
    private final Logger LOG = Logger.getLogger(Usenet.class.getName());
    private Properties props = new Properties();
    private Folder root = null;
    private Store store = null;
    private List<Folder> folders = new ArrayList<>();
    private Folder folder = null;
    Usenet() {
        LOG.fine("controller..");
        props = PropertiesReader.getProps();
        try {
            connect();
        } catch (Exception ex) {
            Logger.getLogger(Usenet.class.getName()).log(Level.SEVERE, "FAILED TO LOAD MESSAGES", ex);
        }
    }
    public void connect() throws Exception {
        LOG.fine("Usenet.connect..");
        Session session = Session.getDefaultInstance(props);
        session.setDebug(true);
        store = session.getStore(new URLName(props.getProperty("nntp.host")));
        store.connect();
        root = store.getDefaultFolder();
        setFolders(Arrays.asList(root.listSubscribed()));
    }
    public List<Message> getMessages(Page page) throws Exception {
        Newsgroup newsgroup = new Newsgroup(page);
        LOG.fine("fetching.." + newsgroup);
        folder = root.getFolder(newsgroup.getNewsgroup());
        folder.open(Folder.READ_ONLY);
        List<Message> messages = Arrays.asList(folder.getMessages());
        LOG.fine("..fetched " + folder);
        return Collections.unmodifiableList(messages);
    }
    public List<Folder> getFolders() {
        LOG.fine("folders " + folders);
        return Collections.unmodifiableList(folders);
    }
    private void setFolders(List<Folder> folders) {
        this.folders = folders;
    }
    public Message getMessage(Newsgroup newsgroup, Article article) throws MessagingException {
        LOG.fine("\n\ntrying.." + newsgroup + article);
        String id = article.getMessageId();
        Message message = null;
        folder = root.getFolder(newsgroup.getNewsgroup());
        folder.open(Folder.READ_ONLY);
        SearchTerm st = new MessageIDTerm(id);
        List<Message> messages = Arrays.asList(folder.search(st));
        LOG.severe(messages.toString());
        if (!messages.isEmpty()) {
            message = messages.get(0);
        }
        LOG.info(message.getSubject());
        return message;
    }
}
The problem, which I'm only now realizing, is that:
...the message number for a particular Message can change during a session if other messages in the Folder are deleted and expunged.
Regardless of which particular header is used, it's something like:
Message-ID: <x1-CZwog1NTZLd68+JJY35Zrl9OqXE@gwene.org>
or
Xref: dur.bounceme.net gwene.com.economist:541
So that there's always a String which needs parsing and searching, which is quite awkward.
I do notice that MimeMessage has a very convenient getMessageID method. Unfortunately, GNU uses javax.mail.Message and not MimeMessage. Granted, it's possible to instantiate a folder and MimeMessage, but I don't see any savings there in that from one run to another there's no guarantee that getMessageID will return the correct message.
The awkward solution I see is to maybe create a persistent folder of MimeMessage's, but that seems like overkill.
Hence, using a header, either Xref or Message-ID and then parsing and searching strings...
Is there a better way?

javax.mail is a lowest-common-denominator API, and its behavior depends entirely on what the backend is. So, without knowing what you are talking to, it's not really possible to give a good answer to your question. Chances are, however, that you'll need to talk directly to whatever you're talking to and learn more about its behavior.
This might be a comment rather than an answer, but I think the information that this API is just a thin layer might be enough to justify posting it as one.
