Rest-Assured XSD References Other XSDs - java

I am writing an XML validator that checks documents against their schemas using Rest-Assured. However, I am having trouble handling XSDs that reference other XSDs, because I retrieve the original XSD from a URL using GET.
I have been trying to implement my own parsing to consolidate the XSDs (Strings) into one XSD (String), but it is becoming a recursive monster and is extremely inefficient and difficult to maintain. The algorithm is at the end of the post.
I have two questions:
1) My problem is that I am using GET to retrieve the XSD, so it's not within the namespace. Is there a way to GET all referenced XSDs and consolidate them using Rest-Assured? I wouldn't have a clue about how to go about this.
2) Is there a better way to handle includes in general? As you can see, my algorithm is very costly and overcomplicated (especially the ref attribute), and I'm sure something will break easily if I change my test cases.
My algorithm so far (pseudo-code, to keep it readable) is the following:
boolean xmlValid(String xmlAddress, String xsdAddress){
    LinkedList XSDList = new LinkedList();
    XSDList.add(xsdAddress);
    xsdString = getExternalXSDStrings(XSDList, "");
    try { //No pseudo-code here
        RestAssured.expect().
            statusCode(200).
            body(
                RestAssuredMatchers.matchesXsd(xsdString)).
            when().
            get(xmlAddress);
    } catch Exceptions {...}
}
String getExternalXSDStrings(LinkedList xsdReferences, String prevString){
    LinkedList recursiveXSDReferences = new LinkedList();
    for (xsdRef : xsdReferences) {
        xsdAddress = "http://..." + xsdRef;
        Open InputStream from URL;
        while (inputLine != null) {
            if (prologFlag) {
                //Do nothing, this is to avoid multiple prologs
            } else if (includeFlag) {
                if (refFlag) Note reference;
                else recursiveXSDReferences.add(includeReference);
            } else if (refFlag) {
                referenceDefinition = Extract referenced element definition;
                xsdString = xsdString + referenceDefinition;
            } else {
                xsdString = xsdString + inputLine;
            }
        }
        Close input stream;
    }
    xsdString = prevString + xsdString;
    if (recursiveXSDReferences.size() > 0) return getExternalXSDStrings(recursiveXSDReferences, xsdString);
    else return xsdString;
}
Thank you very much in advance!

Perhaps you can make use of XmlConfig in Rest-Assured's detailed configuration. This gives you access to configure features, namespaces, etc. For example, if you want to disable the loading of external DTDs you could do:
given().config(RestAssured.config().xmlConfig(xmlConfig().disableLoadingOfExternalDtd())). ..
So perhaps you could look at how the disableLoadingOfExternalDtd method is implemented to get some hints.
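Putting that together with the question's own matcher call, a minimal sketch could look like this. It assumes a Rest-Assured version where these classes live under io.restassured (older releases use com.jayway.restassured), and that xsdString is the schema text you already fetched via GET; it is an illustration, not a verified setup.
import static io.restassured.config.XmlConfig.xmlConfig;
import static io.restassured.matcher.RestAssuredMatchers.matchesXsd;

import io.restassured.RestAssured;

public class XsdValidationSketch {

    static boolean xmlValid(String xmlAddress, String xsdString) {
        // Apply the XmlConfig globally before making the request
        RestAssured.config = RestAssured.config()
                .xmlConfig(xmlConfig().disableLoadingOfExternalDtd());
        try {
            RestAssured.expect().
                    statusCode(200).
                    body(matchesXsd(xsdString)).
                    when().
                    get(xmlAddress);
            return true;
        } catch (AssertionError | RuntimeException e) {
            // validation failure or request problem
            return false;
        }
    }
}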

Related

Trying to add substrings from newLines in a large file to a list

I downloaded my extended listening history from Spotify and I am trying to make a program that turns the data into a list of artists, without duplicates, that I can easily make sense of. The file is rather huge because it has data on every stream I have done since 2016 (307790 lines of text in total). This is what 2 lines of the file look like:
{"ts":"2016-10-30T18:12:51Z","username":"edgymemes69endmylifepls","platform":"Android OS 6.0.1 API 23 (HTC, 2PQ93)","ms_played":0,"conn_country":"US","ip_addr_decrypted":"68.199.250.233","user_agent_decrypted":"unknown","master_metadata_track_name":"Devil's Daughter (Holy War)","master_metadata_album_artist_name":"Ozzy Osbourne","master_metadata_album_album_name":"No Rest for the Wicked (Expanded Edition)","spotify_track_uri":"spotify:track:0pieqCWDpThDCd7gSkzx9w","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"fwdbtn","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":0,"incognito_mode":false},
{"ts":"2021-03-26T18:15:15Z","username":"edgymemes69endmylifepls","platform":"Android OS 11 API 30 (samsung, SM-F700U1)","ms_played":254120,"conn_country":"US","ip_addr_decrypted":"67.82.66.3","user_agent_decrypted":"unknown","master_metadata_track_name":"Opportunist","master_metadata_album_artist_name":"Sworn In","master_metadata_album_album_name":"Start/End","spotify_track_uri":"spotify:track:3tA4jL0JFwFZRK9Q1WcfSZ","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"trackdone","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":1616782259928,"incognito_mode":false},
In the actual text file each stream is on its own line. NetBeans is telling me the exception happens at line 19, and it only fails when I am looking for a substring bounded by the indexOf calls. My code is below. I have no idea why this isn't working; any ideas?
import java.io.File;
import java.util.*;

public class MainClass {
    public static void main(String args[]) {
        File dat = new File("SpotifyListeningData.txt");
        List<String> list = new ArrayList<String>();
        Scanner swag = null;
        try {
            swag = new Scanner(dat);
        }
        catch(Exception e) {
            System.out.println("pranked");
        }
        while (swag.hasNextLine())
            if (swag.nextLine().length() > 1)
                if (list.contains(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album"))))
                    System.out.print("");
                else
                    try {list.add(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album")));}
                    catch(Exception e) {}
        System.out.println(list);
    }
}
Find a JSON parser you like.
Create a class with the fields you care about, marked up to the parser's specs.
Read the file into a collection of objects. Most parsers will stream the contents so you're not storing one massive string.
You can then load the data into objects and store them as you see fit. For your purposes, a TreeSet is probably what you want.
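As a rough sketch of that approach using Jackson (the StreamRecord class, the single field it keeps, and the trailing-comma handling are my assumptions about the file, not something from the original post):
import java.io.File;
import java.util.Scanner;
import java.util.TreeSet;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ArtistExtractor {

    // Only the field we care about; everything else in the JSON is ignored
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class StreamRecord {
        @JsonProperty("master_metadata_album_artist_name")
        public String artistName;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        TreeSet<String> artists = new TreeSet<>(); // sorted, no duplicates

        try (Scanner in = new Scanner(new File("SpotifyListeningData.txt"))) {
            while (in.hasNextLine()) {
                String line = in.nextLine().trim();
                if (line.isEmpty()) continue;
                // Each stream is on its own line but may end with a comma
                if (line.endsWith(",")) line = line.substring(0, line.length() - 1);
                StreamRecord record = mapper.readValue(line, StreamRecord.class);
                if (record.artistName != null) artists.add(record.artistName);
            }
        }
        System.out.println(artists);
    }
}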
Your code will throw a lot of exceptions, not least because you don't use braces. Please use braces around every block, whether it is an if, an else, a loop, whatever. It's good practice and prevents unnecessary bugs.
More importantly, every time scanner.nextLine() is called it reads (and consumes) the next line from the file, so you must not call it repeatedly when you mean to work on the same line.
The best way to deal with this is to write a class whose fields mirror the JSON in each line of the file, map the JSON to that class, and read the desired field from the resulting object.
Your approach is risky and depends heavily on the structure of the data, even on whitespace. Still, I fixed some lines in your code and the following will work for your purpose, although I don't really like manipulating strings this way.
while (swag.hasNextLine()) {
    String swagNextLine = swag.nextLine();
    if (swagNextLine.length() > 1) {
        String toBeAdded = swagNextLine.substring(
                swagNextLine.indexOf("artist_name") + "artist_name".length() + 2,
                swagNextLine.indexOf("master_metadata_album_album") - 2);
        if (list.contains(toBeAdded)) {
            System.out.print("Match");
        } else {
            try {
                list.add(toBeAdded);
            } catch (Exception e) {
                System.out.println("Add to list failed");
            }
        }
        System.out.println(list);
    }
}

Leave entities as-is when parsing XML with Woodstox

I'm using Woodstox to process an XML that contains some entities (most notably &gt;) in the value of one of the nodes. To use an extreme example, it's something like this:
<parent>&nbsp; &lt; &nbsp; &gt; &amp; &quot; &apos; &nbsp;</parent>
I have tried a lot of different configuration options for both WstxInputFactory (IS_REPLACING_ENTITY_REFERENCES, P_TREAT_CHAR_REFS_AS_ENTS, P_CUSTOM_INTERNAL_ENTITIES...) and WstxOutputFactory, but no matter what I try, the output is always something like this:
<parent>nbsp; &lt; nbsp; > &amp; " ' nbsp;</parent>
(&gt; gets converted to >, &lt; stays the same, &nbsp; loses the &...)
I'm reading the XML with an XMLEventReader created with
XMLEventReader reader = wstxInputFactory.createXMLEventReader(new StringReader(fulltext));
after configuring the WstxInputFactory.
Is there any way to configure Woodstox to just ignore all entities and output the text exactly as it was in the input String?
First of all, you need to include actual code, since "the output is always something like this" makes no sense without explaining exactly how you are outputting the parsed content: you may be printing events, using some library, or perhaps using a Woodstox stream or event writer.
Second: in XML there is a difference between the small number of pre-defined entities (lt, gt, apos, quot, amp) and arbitrary user-defined entities, which is what nbsp would be here. The former you can use as-is, since they are already defined; the latter only exist if you define them in a DTD.
Handling of the two groups differs, too: the former will always be expanded no matter what, as required by the XML specification. The latter will be resolved (unless resolution is disabled) and then expanded, or, if not defined, an exception will be thrown.
You can also specify a custom resolver, as mentioned in the other answer, but this will only be used for custom entities (here, &nbsp;).
In the end it is also good to explain not so much what you are doing as what you are trying to achieve. That makes it easier to suggest approaches, rather than answering a specific "how do I do X" question that may not be the right way to go about it.
And as to configuration of Woodstox, maybe this blog entry:
https://medium.com/@cowtowncoder/configuring-woodstox-xml-parser-woodstox-specific-properties-1ce5030a5173
will help (as well as the 2 others in the series); it covers the existing configuration settings.
The five basic XML entities (quot, amp, apos, lt, gt) will always be processed. As far as I know, there is no way to get at their original source text with SAX.
The other entities you can process manually: capture the events until the end of the element and concatenate the values:
XMLInputFactory factory = WstxInputFactory.newInstance();
factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
XMLEventReader xmlr = factory.createXMLEventReader(
        this.getClass().getResourceAsStream(xmlFileName));

String value = "";
while (xmlr.hasNext()) {
    XMLEvent event = xmlr.nextEvent();
    if (event.isCharacters()) {
        value += event.asCharacters().getData();
    }
    if (event.isEntityReference()) {
        value += "&" + ((EntityReference) event).getName() + ";";
    }
    if (event.isEndElement()) {
        // Assign it to the right variable
        System.out.println(value);
        value = "";
    }
}
For your example input:
<parent>&nbsp; &lt; &nbsp; &gt; &amp; &quot; &apos; &nbsp;</parent>
The output will be:
&nbsp; < &nbsp; > & " ' &nbsp;
Otherwise, if you want to convert all the entities, you could use a custom XMLResolver for the undeclared entities:
public class NaiveHtmlEntityResolver implements XMLResolver {

    private static final Map<String, String> ENTITIES = new HashMap<>();

    static {
        ENTITIES.put("nbsp", " ");
        ENTITIES.put("apos", "'");
        ENTITIES.put("quot", "\"");
        // and so on
    }

    @Override
    public Object resolveEntity(String publicID,
                                String systemID,
                                String baseURI,
                                String namespace) throws XMLStreamException {
        if (publicID == null && systemID == null) {
            return ENTITIES.get(namespace);
        }
        return null;
    }
}
And then tell Woodstox to use it for the undeclared entities:
factory.setProperty(WstxInputProperties.P_UNDECLARED_ENTITY_RESOLVER, new NaiveHtmlEntityResolver());

Java Jersey REST Request Parameter Sanitation

I'm trying to make sure my Jersey request parameters are sanitized.
When processing a Jersey GET request, do I need to filter non String types?
For example, if the parameter submitted is an integer are both option 1 (getIntData) and option 2 (getStringData) hacker safe? What about a JSON PUT request, is my ESAPI implementation enough, or do I need to validate each data parameter after it is mapped? Could it be validated before it is mapped?
Jersey Rest Example Class:
public class RestExample {

    //Option 1: Submit data as an Integer
    //Jersey throws an internal server error if the type is not Integer
    //Is that a valid way to validate the data?
    //Integer Data, not filtered
    @Path("/data/int/{data}/")
    @GET
    @Produces(MediaType.TEXT_HTML)
    public Response getIntData(@PathParam("data") Integer data) {
        return Response.ok("You entered:" + data).build();
    }

    //Option 2: Submit data as a String, then validate it and cast it to an Integer
    //String Data, filtered
    @Path("/data/string/{data}/")
    @GET
    @Produces(MediaType.TEXT_HTML)
    public Response getStringData(@PathParam("data") String data) {
        data = ESAPI.encoder().canonicalize(data);
        if (ESAPI.validator().isValidInteger("data", data, 0, 999999, false)) {
            int intData = Integer.parseInt(data);
            return Response.ok("You entered:" + intData).build();
        }
        return Response.status(404).entity("404 Not Found").build();
    }

    //JSON data, HTML encoded
    @Path("/post/{requestid}")
    @POST
    @Consumes({MediaType.APPLICATION_FORM_URLENCODED, MediaType.APPLICATION_JSON})
    @Produces(MediaType.TEXT_HTML)
    public Response postData(String json) {
        json = ESAPI.encoder().canonicalize(json);
        json = ESAPI.encoder().encodeForHTML(json);
        //Is there a way to iterate through each JSON KeyValue and filter here?
        ObjectMapper mapper = new ObjectMapper();
        DataMap dm = new DataMap();
        try {
            dm = mapper.readValue(json, DataMap.class);
        } catch (Exception e) {
            e.printStackTrace();
        }
        //Do we need to validate each DataMap object value and is there a dynamic way to do it?
        if (ESAPI.validator().isValidInput("strData", dm.strData, "HTTPParameterValue", 25, false, true)) {
            //Is Integer validation needed or will the thrown exception be good enough?
            return Response.ok("You entered:" + dm.strData + " and " + dm.intData).build();
        }
        return Response.status(404).entity("404 Not Found").build();
    }
}
Data Map Class:
public class DataMap {
    public DataMap() {}
    String strData;
    Integer intData;
}
The short answer is yes, though by "filter" I interpret it as "validate," because no amount of "filtering" will EVER provide you with SAFE data. You can still run into integer overflows in Java, and while those may not have immediate security concerns, they could still put parts of your application into an unplanned-for state, and hacking is all about perturbing the system in ways you can control.
You packed waaaaay too many questions into one "question," but here we go:
First off, the lines
json = ESAPI.encoder().canonicalize(json);
json = ESAPI.encoder().encodeForHTML(json);
Aren't doing what you think they're doing. If your JSON is coming in as a raw String right here, these two calls are going to be applying mass rules across the entire string, when you really need to handle these with more surgical precision, which you seem to at least be subconsciously aware of in the next question.
//Is there a way to iterate through each JSON KeyValue and filter here?
Partial duplicate of this question.
While you're in the loop discussed here, you can perform any data transformations you want, but what you should really be considering is using the JSONObject class referenced in that first link. Then you'll have JSON parsed into an object where you'll have better access to JSON key/value pairs.
//Do we need to validate each DataMap object value and is there a dynamic way to do it?
Yes, we validate everything that comes from a user. All users are assumed to be trained hackers, and smarter than you. However, if you handled filtering before you did your data-mapping transformation, you don't need to do it a second time. Doing it dynamically?
Something like:
JSONObject json = new JSONObject(s);
Iterator iterator = json.keys();
while (iterator.hasNext()) {
    String data = iterator.next();
    //filter and or business logic
}
^^That syntax is skipping typechecks but it should get you where you need to go.
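For completeness, a type-checked variant of that loop could look like this (a sketch using the org.json JSONObject class; the println is just a placeholder for your filtering/business logic):
import java.util.Iterator;

import org.json.JSONException;
import org.json.JSONObject;

public class JsonFilterSketch {

    static void filterKeys(String s) throws JSONException {
        JSONObject json = new JSONObject(s);
        Iterator<String> keys = json.keys();
        while (keys.hasNext()) {
            String key = keys.next();
            Object value = json.get(key);
            // filter and/or business logic per key/value pair
            System.out.println(key + " -> " + value);
        }
    }
}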
//Is Integer validation needed or will the thrown exception be good enough?
I don't see where you're throwing an exception with these lines of code:
if (ESAPI.validator().isValidInput("strData", dm.strData, "HTTPParameterValue", 25, false, true)) {
    //Is Integer validation needed or will the thrown exception be good enough?
    return Response.ok("You entered:" + dm.strData + " and " + dm.intData).build();
}
Firstly, in Java we have autoboxing, which means this:
int foo = 555555;
String bar = "";
//the code
foo + bar;
will be cast to a String in every instance: the compiler will promote the int to an Integer and then silently call its toString() method. Also, your Response.ok( String ) call is where you're going to want to encodeForHTML, or whatever the output context may be (a small sketch follows below). Encoding methods are ALWAYS for outputting data to the user, whereas canonicalize is what you call when receiving data. Finally, in this segment of code we also have an error in assuming you're dealing with an HTTPParameter. Not at this point in the code. You validate HTTP parameters in places where you're calling request.getParameter("id"), where id isn't a large blob of data like an entire JSON or XML payload. At this point you should be validating for things like "SafeString".
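For example (a sketch only, reusing the objects from the question), the HTML encoding would move to where the response string is built:
// canonicalize/validate on the way in, encode on the way out
String htmlSafe = ESAPI.encoder().encodeForHTML("You entered:" + dm.strData + " and " + dm.intData);
return Response.ok(htmlSafe).build();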
Usually there are parsing libraries in Java that can at least get you to the level of Java objects, but on the validation side you're always going to be running through every item and punting whatever might be malicious.
As a final note, keep these principles in mind while coding and your code will be cleaner and your thought process much more focused:
User input is NEVER safe. (Yes, even if you've run it through an XSS filter.)
Use validate and canonicalize methods whenever RECEIVING data, and encode methods whenever transferring data to a different context, where a context is something like an HTML field, an HTTP attribute, JavaScript input, etc.
Instead of the method isValidInput() I'd suggest using getValidInput(), because it calls canonicalize for you, so that's one less call you have to make (see the sketch after this list).
Encode ANY time your data is going to be passed to another dynamic language, like SQL, Groovy, Perl, or JavaScript.
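A minimal sketch of that getValidInput() suggestion, reusing the "HTTPParameterValue" rule and length limit from the question (exact overloads can vary between ESAPI versions):
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.errors.ValidationException;

public class InputValidationSketch {

    static String validStrDataOrNull(String raw) {
        try {
            // Canonicalizes and validates in one step; returns the safe value
            // or throws if the input does not match the configured pattern.
            return ESAPI.validator().getValidInput(
                    "strData", raw, "HTTPParameterValue", 25, false);
        } catch (ValidationException e) {
            // Reject the request here, e.g. respond with 400/404
            return null;
        }
    }
}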

Accessing annotations in UIMA

Is there a way in UIMA to access the annotations from the tokens, the same way the CAS debugger GUI does? You can of course access all the annotations from the index repository, but I want to loop over the tokens and get all the annotations associated with each token.
The reason is simply that I want to check some annotations and discard the others, and this way it is much easier. Any help is appreciated :)
I'm a uimaFIT developer.
If you want to find all annotations within the boundaries of another annotation, you may prefer the shorter and faster variant
JCasUtil.selectCovered(referenceAnnotation, <T extends ANNOTATION>);
Mind that it is not a good idea to create a "dummy" annotation with the desired offsets and then search within its boundaries, because this immediately allocates memory in the CAS and is not garbage-collected unless the complete CAS is collected.
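To relate this to the original question (looping over the tokens and collecting the annotations attached to each one), a sketch along these lines should work; Token stands in for whatever token type your type system defines, and the org.apache.uima.fit package is the one used by current uimaFIT releases (older releases use org.uimafit):
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class TokenAnnotationSketch {

    static void inspect(JCas jcas) {
        for (Token token : JCasUtil.select(jcas, Token.class)) {
            // Annotations whose span lies inside this token
            for (Annotation covered : JCasUtil.selectCovered(jcas, Annotation.class,
                    token.getBegin(), token.getEnd())) {
                System.out.println(token.getCoveredText() + " covers " + covered.getType().getName());
            }
            // Annotations whose span contains this token
            for (Annotation covering : JCasUtil.selectCovering(jcas, Annotation.class,
                    token.getBegin(), token.getEnd())) {
                System.out.println(token.getCoveredText() + " is inside " + covering.getType().getName());
            }
        }
    }
}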
After searching and asking the developers of cTAKES (Apache clinical Text Analysis and Knowledge Extraction System): you can use the uimaFIT library, which can be found at http://code.google.com/p/uimafit/ . The following code can be used:
List list = JCasUtil.selectCovered(jcas, <T extends Annotation>, startIndex, endIndex);
This will return all the annotations of the given type between the two indices.
Hope that will help
If you don't want to use uimaFIT, you can create a filtered iterator to loop through the annotations of interest.
The relevant section is in the UIMA reference documentation.
I recently used this approach in some code to find a sentence annotation which encompassed a regex annotation (this approach was acceptable for our project because all regular expression matches were shorter than the sentences in the document, and there was only one regex match per sentence; obviously, based on indexing rules, your mileage may vary. If you are afraid of running into another shorterAnnotationType, put the inner code into a while loop):
static ArrayList<annotationsPair> process(Annotation shorterAnnotationType,
        Annotation longerAnnotationType, JCas aJCas) {
    ArrayList<annotationsPair> annotationsList = new ArrayList<annotationsPair>();
    FSIterator it = aJCas.getAnnotationIndex().iterator();
    FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
    constraint.add(shorterAnnotationType.getType());
    constraint.add(longerAnnotationType.getType());
    it = aJCas.createFilteredIterator(it, constraint);

    Annotation a = null;
    int shorterBegin = -1;
    int shorterEnd = -1;
    it.moveTo(shorterAnnotationType);
    while (it.isValid()) {
        a = (Annotation) it.get();
        if (a.getClass() == shorterAnnotationType.getClass()) {
            shorterBegin = a.getBegin();
            shorterEnd = a.getEnd();
            System.out.println("Target annotation from " + shorterBegin
                    + " to " + shorterEnd);
            //because assume that sentence type is longer than other type,
            //the sentence gets indexed prior
            it.moveToPrevious();
            if (it.isValid()) {
                Annotation prevAnnotation = (Annotation) it.get();
                if (prevAnnotation.getClass() == longerAnnotationType.getClass()) {
                    int sentBegin = prevAnnotation.getBegin();
                    int sentEnd = prevAnnotation.getEnd();
                    System.out.println("found annotation [" + prevAnnotation.getCoveredText()
                            + "] location: " + sentBegin + ", " + sentEnd);
                    annotationsPair pair = new annotationsPair(a, prevAnnotation);
                    annotationsList.add(pair);
                }
                //return to where you started
                it.moveToNext(); //will not invalidate iter because just came from next
            }
        }
        it.moveToNext();
    }
    return annotationsList;
}
Hope this helps!
Disclaimer: I am new to UIMA.

Is it ok to handle a class metadata through reflection to ensure a DRY approach?

The title might seem unsettling, but let me explain.
I'm facing an interesting challenge: I have a hierarchy of classes, each with an associated object that stores metadata about each of its attributes (an int-valued enum with edit flags like UPDATED or NO_UPDATE).
The problem comes when merging two objects, because I don't want to hand-check EVERY field on a class to see whether it was updated and then skip or apply the change.
My idea: Reflection.
All the objects are behind an interface, so I could use IObject.class.getMethods() and iterate over that array in this fashion:
IClass clazz = //Instance of the first class;
IAnotherClass anotherClass = //Instance of the second class;
for (Method m : IObject.class.getMethods()) {
    if (m.getName().startsWith("get")) {
        try {
            //Under this method (which is a getter) I cast it on
            //both classes who implement interfaces that extend an
            //interface that defines the getters to make them
            //consistent and ensure I'll invoke the same methods.
            String propertyClass = (String) m.invoke(clazz);
            String propertyAnotherClass = (String) m.invoke(anotherClass);
            if (!propertyClass.equals(propertyAnotherClass)) {
                //Update attribute and attribute status.
            }
        } catch (Exception e) {
        }
    }
}
Is there another way to implement this, or should I stick to lengthy methods that go attribute by attribute and do the checks that way? The objects are not going to change that much and the architecture is quite modular, so there is not much updating involved if the fields change, but having to maintain a method like that worries me a little.
EDIT 1: I'm posting working code for what I have so far. This code is a solution for me but, though it works, I'm treating it as a last resort, not because I have time to spend but because I don't want to reinvent the wheel. If I use it, I'll build a static list of the methods so I only have to fetch that list once, considering the point AlexR made.
private static void merge(IClazz from, IClazz to) {
    Method methods[] = from.getClass().getDeclaredMethods();
    for (Method m : methods) {
        if (m.getName().startsWith("get") && !m.getName().equals("getMetadata")) {
            try {
                String commonMethodAnchor = m.getName().split("get")[1];
                if (!m.getReturnType().cast(m.invoke(from)).equals(m.getReturnType().cast(m.invoke(to)))) {
                    String setterMethodName = "set" + commonMethodAnchor;
                    Method setter = IClazz.class.getDeclaredMethod(setterMethodName, m.getReturnType());
                    setter.invoke(to, m.getReturnType().cast(m.invoke(from)));

                    //Updating metadata
                    String metadataMethodName = "set" + commonMethodAnchor + "Status";
                    Method metadataUpdater = IClazzMetadata.class.getDeclaredMethod(metadataMethodName, int.class);
                    metadataUpdater.invoke(to.getMetadata(), 1);
                }
            } catch (Exception e) {
            }
        }
    }
}
metadataUpdater sets the value to 1 just to simulate the "UPDATED" flag I'm using in the real scenario.
EDIT 3: Thanks Juan, David and AlexR for your suggestions and directions! They made me consider things I had not considered at first (I'm upvoting all your answers because all of them helped me).
After adding what AlexR suggested and checking jDTO and Apache Commons (finding out that the general concepts are quite similar in the end), I've decided to stick with my code instead of using other tools, since it works given the object hierarchy and metadata structure of the solution and no exceptions have popped up so far. The code is the one in the 2nd edit; I've placed it in a helper class and it did the trick in the end.
Apache Commons BeanUtils may solve your problem: http://commons.apache.org/beanutils/
If you want to copy all properties, try copyProperties: http://commons.apache.org/beanutils/v1.8.3/apidocs/src-html/org/apache/commons/beanutils/BeanUtils.html#line.134
Here is an example from: http://www.avajava.com/tutorials/lessons/how-do-i-copy-properties-from-one-bean-to-another.html
FromBean fromBean = new FromBean("fromBean", "fromBeanAProp", "fromBeanBProp");
ToBean toBean = new ToBean("toBean", "toBeanBProp", "toBeanCProp");

System.out.println(ToStringBuilder.reflectionToString(fromBean));
System.out.println(ToStringBuilder.reflectionToString(toBean));

try {
    System.out.println("Copying properties from fromBean to toBean");
    BeanUtils.copyProperties(toBean, fromBean);
} catch (IllegalAccessException e) {
    e.printStackTrace();
} catch (InvocationTargetException e) {
    e.printStackTrace();
}

System.out.println(ToStringBuilder.reflectionToString(fromBean));
System.out.println(ToStringBuilder.reflectionToString(toBean));
I think the best approach would be using proxy objects, either dynamic proxies or cglib enhancers or something like it, so you decorate the getters and setters and you can keep track of the changes there.
Hope it helps.
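A minimal sketch of the dynamic-proxy idea (IClazz is borrowed from the question as the interface to wrap; a real implementation would decide how to feed the recorded names back into the metadata flags):
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashSet;
import java.util.Set;

public class ChangeTrackingHandler implements InvocationHandler {

    private final Object target;                        // the real IClazz instance
    private final Set<String> updatedProperties = new HashSet<>();

    private ChangeTrackingHandler(Object target) {
        this.target = target;
    }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new ChangeTrackingHandler(target));
    }

    public Set<String> getUpdatedProperties() {
        return updatedProperties;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().startsWith("set")) {
            // Remember which property was touched so its metadata flag can be set later
            updatedProperties.add(method.getName().substring(3));
        }
        return method.invoke(target, args);             // delegate to the real object
    }
}
Usage would be something like IClazz tracked = ChangeTrackingHandler.wrap(instance, IClazz.class); every setter call on tracked is then recorded while still reaching the underlying object, and the handler can be fetched back with Proxy.getInvocationHandler(tracked) to read the recorded names.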
Your approach is OK, but keep in mind that getMethod() is much slower than invoke(), so if your code is performance critical you will probably want to cache the Method objects.
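And a sketch of that caching suggestion, using the IObject interface from the question (the getter filter mirrors the question's loop; everything else is illustrative):
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GetterCache {

    // Resolved once, then reused for every merge instead of scanning the methods each time
    private static final List<Method> GETTERS;

    static {
        List<Method> getters = new ArrayList<>();
        for (Method m : IObject.class.getMethods()) {
            if (m.getName().startsWith("get")) {
                getters.add(m);
            }
        }
        GETTERS = Collections.unmodifiableList(getters);
    }

    public static List<Method> getters() {
        return GETTERS;
    }
}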
