Hive - Remove substring from string - java

I need to remove a substring from a given string (replace it with an empty string), where the substring can appear at different positions in the string.
I want to remove "fruit":"apple" from each of the following strings and get the corresponding result:
{"client":"web","fruit":"apple"} --> {"client":"web"}
{"fruit":"apple","client":"web"} --> {"client":"web"}
{"client":"web","fruit":"apple","version":"v1.0"} --> {"client":"web","version":"v1.0"}
{"fruit":"apple"} --> null or empty string
I used regexp_replace(str, "\,*\"fruit\"\:\"apple\"", "") but that didn't get me the expected results. What is the right way to construct the regex?

It seems that you are working with data in JSON format. Depending on the dependencies available in your project, you can achieve this without any regular expressions.
For example, if you are using Google's Gson library, you can parse the String into a JsonObject and then remove the property from it:
String input = "your data";
JsonParser parser = new JsonParser();
JsonObject o = parser.parse(input).getAsJsonObject();
try {
String foundValue = o.getAsJsonPrimitive("fruit").getAsString();
if ("apple".equals(foundValue)) {
o.remove("fruit");
}
} catch (Exception e) {
e.printStackTrace();
}
String filteredData = o.toString();
P.S. This code is not a final version. It may need to handle some extra situations (when there is no such field, or when the field contains a non-primitive value); further details are needed to cover those.
P.P.S. IMO, using regex in such situations makes the code less readable and less flexible.
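If a regex is still wanted, as the question asked, one possible pattern removes the pair together with exactly one neighbouring comma. The sketch below is plain Java with no JSON library; the class and method names are made up for illustration, and it assumes the pair always appears exactly as "fruit":"apple" with no whitespace. Hive's regexp_replace uses Java regex syntax, so the same pattern should carry over, though the quoting would need adapting.

```java
import java.util.regex.Pattern;

class FruitRemover {
    // Either a leading comma plus the pair, or the pair plus an optional trailing comma.
    private static final Pattern FRUIT_APPLE =
            Pattern.compile(",\"fruit\":\"apple\"|\"fruit\":\"apple\",?");

    static String removeFruitApple(String json) {
        String result = FRUIT_APPLE.matcher(json).replaceAll("");
        // The question expects null or an empty string when nothing else is left.
        return "{}".equals(result) ? "" : result;
    }

    public static void main(String[] args) {
        System.out.println(removeFruitApple("{\"client\":\"web\",\"fruit\":\"apple\"}"));
        System.out.println(removeFruitApple("{\"fruit\":\"apple\",\"client\":\"web\"}"));
        System.out.println(removeFruitApple(
                "{\"client\":\"web\",\"fruit\":\"apple\",\"version\":\"v1.0\"}"));
        System.out.println(removeFruitApple("{\"fruit\":\"apple\"}").isEmpty());
    }
}
```

The alternation matters: trying the leading-comma form first keeps the surrounding commas balanced when the pair sits in the middle or at the end of the object.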

Related

How to map values of a json object without knowing about the format?

I have the following programming requirement:
problem:
Given two JSONs A and B, if the fields x,y,z in JSON A match the fields i,o,p in B return true else false.
approach:
I want to stay away from building a matching engine that depends on the json's format. I don't want to format the jsons by using pojos and then do object matching. My approach is to convert all the jsons into a hash map and then specify the location of the fields by using a string:
Example:
money -> a,b,c
{
a :
{
b : {
c: {
money : "100"
}
}
}
}
However, this approach seems a bit tricky because we have to take collections into account, and I have to cover all of the edge cases. Is there any Spring library or Java tool I can use for this purpose?
There are many libraries for this purpose. One of the most popular is Gson (com.google.gson).
Usage:
JsonObject jo = (JsonObject) jsonParser.parse("{somejsonstring}");
jo.has("objectProperty"); // check if the property exists
jo.get("objectProperty"); // returns a JsonElement
jo.get("objectProperty").isJsonArray(); // check that the property has the type you want
jo.getAsJsonArray("objectProperty"); // get the property as a JsonArray
You may simplify this work by using the im.wilk.vor:Voritem library, available on GitHub or in the Maven repository.
JsonElement je_one = jsonParser.parse("{some_json_string}");
JsonElement je_two = jsonParser.parse("{other_json_string}");
VorItem vi_one = vorItemFactory.from(je_one);
VorItem vi_two = vorItemFactory.from(je_two);
if (vi_one.get("a.b.c").optionalLong().isPresent() ) {
return vi_one.get("a.b.c").optionalLong().equals(vi_two.get("i.o.p").optionalLong());
}
return false;
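The "convert to a hash map and address fields by a path string" idea from the question can also be sketched with the standard library alone. This is a minimal sketch, not a library API: the class and method names are hypothetical, it assumes the JSON has already been parsed into nested Map and List values (for example by Jackson or Gson), and it treats a numeric path segment as a list index to cover the collection case.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class PathLookup {
    // Walks nested Maps and Lists along a dot-separated path such as "a.b.c.money".
    // A numeric segment is used as an index when the current node is a List.
    static Optional<Object> resolve(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(segment);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(segment);
                } catch (NumberFormatException e) {
                    return Optional.empty();
                }
                if (index < 0 || index >= list.size()) {
                    return Optional.empty();
                }
                current = list.get(index);
            } else {
                // A scalar was reached before the path was exhausted.
                return Optional.empty();
            }
            if (current == null) {
                return Optional.empty();
            }
        }
        return Optional.of(current);
    }

    // True only when both paths resolve and the resolved values are equal.
    static boolean fieldsMatch(Object a, String pathA, Object b, String pathB) {
        Optional<Object> left = resolve(a, pathA);
        return left.isPresent() && left.equals(resolve(b, pathB));
    }

    public static void main(String[] args) {
        Map<String, Object> a = Map.of("a", Map.of("b", Map.of("c", Map.of("money", "100"))));
        Map<String, Object> b = Map.of("i", List.of(Map.of("p", "100")));
        System.out.println(fieldsMatch(a, "a.b.c.money", b, "i.0.p"));
    }
}
```

Because the paths are plain strings, the field mapping (x,y,z against i,o,p) can live in configuration rather than in POJOs.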

Java and Json- Securing json input strings from injections

I am using fasterxml and wonder how I have to handle the incoming string to prevent any kind of injection :( I have googled a lot and can't find the right information. Can anyone help me out with this?
Updated: What I was trying to ask is that I was asked to escape the incoming JSON strings so that the requests can't be abused. But I can't find useful information about JSON escaping, as it seems to allow quite a lot of characters.
Gson gson = new Gson();
String escaped = gson.toJson(value);
if(value instanceof String) {
if(escaped.startsWith("\"")) {
escaped = escaped.substring(1);
}
if(escaped.endsWith("\"")) {
escaped = escaped.substring(0, escaped.length() - 1);
}
return escaped;
}
value = escaped;
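For reference, making a value safe to place inside a JSON string literal only requires handling a handful of characters: the quotation mark, the backslash, and control characters. The sketch below does this with the standard library alone; the class and method names are hypothetical, and in practice letting Jackson or Gson serialize the value is the safer route. Escaping of this kind only keeps the JSON well-formed. The values still need validating, and encoding again for whatever context (SQL, HTML, and so on) they end up in.

```java
class JsonStringEscaper {
    // Escapes the characters that must not appear raw inside a JSON string:
    // the quotation mark, the backslash, and control characters below U+0020.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the six-character hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("say \"hi\"\n"));
    }
}
```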

Java Jersey REST Request Parameter Sanitation

I'm trying to make sure my Jersey request parameters are sanitized.
When processing a Jersey GET request, do I need to filter non String types?
For example, if the parameter submitted is an integer are both option 1 (getIntData) and option 2 (getStringData) hacker safe? What about a JSON PUT request, is my ESAPI implementation enough, or do I need to validate each data parameter after it is mapped? Could it be validated before it is mapped?
Jersey Rest Example Class:
public class RestExample {
//Option 1 Submit data as an Integer
//Jersey throws an internal server error if the type is not Integer
//Is that a valid way to validate the data?
//Integer Data, not filtered
@Path("/data/int/{data}/")
@GET
@Produces(MediaType.TEXT_HTML)
public Response getIntData(@PathParam("data") Integer data){
return Response.ok("You entered:" + data).build();
}
//Option 2 Submit data as a String, then validate it and cast it to an Integer
//String Data, filtered
@Path("/data/string/{data}/")
@GET
@Produces(MediaType.TEXT_HTML)
public Response getStringData(@PathParam("data") String data) {
data = ESAPI.encoder().canonicalize(data);
if (ESAPI.validator().isValidInteger("data", data, 0, 999999, false))
{
int intData = Integer.parseInt(data);
return Response.ok("You entered:" + intData).build();
}
return Response.status(404).entity("404 Not Found").build();
}
//JSON data, HTML encoded
@Path("/post/{requestid}")
@POST
@Consumes({MediaType.APPLICATION_FORM_URLENCODED, MediaType.APPLICATION_JSON})
@Produces(MediaType.TEXT_HTML)
public Response postData(String json) {
json = ESAPI.encoder().canonicalize(json);
json = ESAPI.encoder().encodeForHTML(json);
//Is there a way to iterate through each JSON KeyValue and filter here?
ObjectMapper mapper = new ObjectMapper();
DataMap dm = new DataMap();
try {
dm = mapper.readValue(json, DataMap.class);
} catch (Exception e) {
e.printStackTrace();
}
//Do we need to validate each DataMap object value and is there a dynamic way to do it?
if (ESAPI.validator().isValidInput("strData", dm.strData, "HTTPParameterValue", 25, false, true))
{
//Is Integer validation needed or will the thrown exception be good enough?
return Response.ok("You entered:" + dm.strData + " and " + dm.intData).build();
}
return Response.status(404).entity("404 Not Found").build();
}
}
Data Map Class:
public class DataMap {
public DataMap(){}
String strData;
Integer intData;
}
The short answer is yes, though by "filter" I interpret it as "validate," because no amount of "filtering" will EVER provide you with SAFE data. You can still run into integer overflows in Java, and while those may not have immediate security concerns, they could still put parts of your application in an unplanned for state, and hacking is all about perturbing the system in ways you can control.
You packed waaaaay too many questions into one "question," but here we go:
First off, the lines
json = ESAPI.encoder().canonicalize(json);
json = ESAPI.encoder().encodeForHTML(json);
Aren't doing what you think they're doing. If your JSON is coming in as a raw String right here, these two calls are going to be applying mass rules across the entire string, when you really need to handle these with more surgical precision, which you seem to at least be subconsciously aware of in the next question.
//Is there a way to iterate through each JSON KeyValue and filter here?
Partial duplicate of this question.
While you're in the loop discussed here, you can perform any data transformations you want, but what you should really be considering is using the JSONObject class referenced in that first link. Then you'll have JSON parsed into an object where you'll have better access to JSON key/value pairs.
//Do we need to validate each DataMap object value and is there a dynamic way to do it?
Yes, we validate everything that comes from a user. All users are assumed to be trained hackers, and smarter than you. However if you handled filtering before you do your data mapping transformation, you don't need to do it a second time. Doing it dynamically?
Something like:
JSONObject json = new JSONObject(s);
Iterator<String> iterator = json.keys();
while (iterator.hasNext()) {
    String key = iterator.next();
    Object value = json.get(key);
    //filter and/or business logic on key and value
}
^^That sketch skips type checks on the values, but it should get you where you need to go.
//Is Integer validation needed or will the thrown exception be good enough?
I don't see where you're throwing an exception with these lines of code:
if (ESAPI.validator().isValidInput("strData", dm.strData, "HTTPParameterValue", 25, false, true))
{
//Is Integer validation needed or will the thrown exception be good enough?
return Response.ok("You entered:" + dm.strData + " and " + dm.intData).build();
}
Firstly, Java applies string conversion during concatenation, which means this:
int foo = 555555;
String bar = "";
//the code
String result = foo + bar;
will always produce a String. The compiler converts the int as if by calling Integer.toString(). Also, your Response.ok( String ); call is where you're going to want to encodeForHTML, or to encode for whatever the output context may be. Encoding methods are ALWAYS for outputting data to the user, whereas canonicalize is what you call when receiving data.
Finally, this segment of code also contains an error: you're assuming that you're dealing with an HTTPParameter. NOT at this point in the code. You validate HTTP parameters where you're calling request.getParameter("id"), where id isn't a large blob of data like an entire JSON response or an entire XML response. At this point you should be validating for things like "SafeString".
Usually there are parsing libraries in Java that can at least get you to the level of Java objects, but on the validation side you're always going to be running through every item and punting whatever might be malicious.
As a final note, while coding, keep these principles in mind, and your code will be cleaner and your thought process much more focused:
user input is NEVER safe. (Yes, even if you've run it through an XSS filter.)
Use validate and canonicalize methods whenever RECEIVING data, and encode methods whenever transferring data to a different context, where context is defined as "HTML field, HTTP attribute, JavaScript input, etc."
Instead of using the method isValidInput(), I'd suggest using getValidInput(), because it calls canonicalize for you, saving you one call.
Encode ANY time your data is going to be passed to another dynamic language, like SQL, groovy, Perl, or javascript.
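The "validate when receiving, encode when outputting" split can be illustrated without ESAPI. The following is only a sketch under stated assumptions: the class and both helper names are hypothetical, the digit limit mirrors the 0 to 999999 range used in the question, and the HTML escaping covers just the five characters that matter in text and quoted-attribute contexts. It is not a replacement for ESAPI's validator and encoder.

```java
import java.util.OptionalInt;

class InputOutputSketch {
    // Validate on the way in: accept only 1 to 6 ASCII digits, then range-check.
    static OptionalInt parseBoundedInt(String raw, int min, int max) {
        if (raw == null || !raw.matches("[0-9]{1,6}")) {
            return OptionalInt.empty();
        }
        int value = Integer.parseInt(raw); // cannot overflow: at most 6 digits
        return (value >= min && value <= max) ? OptionalInt.of(value) : OptionalInt.empty();
    }

    // Encode on the way out: minimal escaping for HTML text and quoted attributes.
    static String encodeForHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;"); break;
                case '>':  sb.append("&gt;"); break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#x27;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        OptionalInt parsed = parseBoundedInt("42", 0, 999999);
        System.out.println(parsed.isPresent()
                ? "You entered:" + parsed.getAsInt()
                : "404 Not Found");
        System.out.println(encodeForHtml("<script>alert('x')</script>"));
    }
}
```

Rejecting anything that is not a short run of digits before parsing also sidesteps the integer overflow concern raised at the top of this answer.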

JAXB: Get Tag as String

This question may have been answered before in some dark recess of the Interwebs, but I couldn't even figure out how to form a meaningful Google query to search for it.
So: Suppose I have a (simplified) XML document like so:
<root>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
</tag3>
</root>
I know how to use JAXB to unmarshal this into a Java Object in the standard use cases.
What I don't know how to do is unmarshal tag3's contents wholesale into a String. By which I mean:
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
as a String, tags and all.
Use the @XmlAnyElement annotation.
I've been looking for the same solution and expected to find some annotation that prevents parsing the DOM and leaves it as it is, but did not find one.
Detail at:
Using JAXB to extract inner text of XML element
and
http://blog.bdoughan.com/2011/04/xmlanyelement-and-non-dom-properties.html
I added one check in the method getElement(); otherwise we could get an IndexOutOfBoundsException:
if (xml.indexOf(START_TAG) < 0) {
return "";
}
For me, this solution behaves quite strangely. The method getElement() is called for every tag of your XML. The first call is for "Value", the second for "ValueWord", etc. It appends each tag to the previous ones.
Update:
I noticed that this approach works only for ONE occurrence of the tag that we want to parse to a String. It's impossible to correctly parse the following example:
<root>
<parent1>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>Foo</something1>
<something2>Bar</something2>
<something3>Baz</something3>
</tag3>
</parent1>
<parent2>
<tag1>Value</tag1>
<tag2>Word</tag2>
<tag3>
<something1>TheSecondFoo</something1>
<something2>TheSecondBar</something2>
<something3>TheSecondBaz</something3>
</tag3>
</parent2>
</root>
"tag3" with parent tag "parent2" will contain parameters from the first tag (Foo, Bar, Baz) instead of (TheSecondFoo, TheSecondBar, TheSecondBaz)
Any suggestions are appreciated.
Thanks.
I have a utility method that might come in handy for you in that case. See if it helps. I made some sample code with your example:
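import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    public static void main(String[] args) {
        String text = "<root><tag1>Value</tag1><tag2>Word</tag2><tag3><something1>Foo</something1><something2>Bar</something2><something3>Baz</something3></tag3></root>";
        System.out.println(extractTag(text, "<tag3>"));
    }

    // Returns the text between the first occurrence of the tag and its closing tag, or "" if absent.
    public static String extractTag(String xml, String tag) {
        String value = "";
        String endTag = "</" + tag.substring(1);
        // Quote the tags so they match literally; DOTALL lets "." span line breaks.
        Pattern p = Pattern.compile(Pattern.quote(tag) + "(.*?)" + Pattern.quote(endTag), Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        if (m.find()) {
            value = m.group(1);
        }
        return value;
    }
}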
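For the case with several tag3 elements, a regex that stops at the first match is not enough. One alternative, sketched below, swaps JAXB for the JDK's built-in DOM parser and an identity Transformer, then serializes the children of every matching element. The class and method names are hypothetical, and the sketch assumes the document fits in memory and that tag3 elements are not nested inside each other.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXmlExtractor {
    // Returns the serialized children of every element named tagName, in document order.
    static List<String> innerXml(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            // Without this, every serialized child would start with an XML declaration.
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> results = new ArrayList<>();
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = 0; i < matches.getLength(); i++) {
                StringWriter out = new StringWriter();
                NodeList children = matches.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    serializer.transform(new DOMSource(children.item(j)), new StreamResult(out));
                }
                results.add(out.toString());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><parent1><tag3><something1>Foo</something1></tag3></parent1>"
                + "<parent2><tag3><something1>TheSecondFoo</something1></tag3></parent2></root>";
        for (String inner : innerXml(xml, "tag3")) {
            System.out.println(inner);
        }
    }
}
```

Each list entry corresponds to one tag3 element, so the second entry holds the TheSecond... values rather than a repeat of the first.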

Parsing URL into components

I want to parse a descriptive-style URL with slashes (such as server/books/thrillers/johngrisham/thefirm), in Java.
My overall idea is to handle the data I receive to do a lookup (therefore using the URL as search criteria) in a database and then return HTML pages with data on it.
How do I do this?
String urlToParse = "server/books/thrillers/johngrisham/thefirm";
String[] parsedURL = urlToParse.split("/");
What you will have is an array of strings that you can then work with.
// parsedURL[0] == "server";
// parsedURL[1] == "books";
// parsedURL[2] == "thrillers";
// parsedURL[3] == "johngrisham";
// parsedURL[4] == "thefirm";
The split() method of the String class can do the work, as Ionut commented earlier.
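If the input may be a full URL with a scheme and host, or may carry leading or doubled slashes, a slightly more defensive variant is to let java.net.URI isolate the path first and then drop empty segments. This is a small sketch; the class and method names are made up for illustration, and it assumes the input is a syntactically valid URI.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class UrlSegments {
    // Splits the path part of a URL into its non-empty segments.
    static List<String> pathSegments(String url) {
        String path = URI.create(url).getPath();
        List<String> segments = new ArrayList<>();
        if (path == null) {
            // Opaque URIs such as mailto: addresses have no path.
            return segments;
        }
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                segments.add(part);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(pathSegments("http://server/books/thrillers/johngrisham/thefirm"));
        System.out.println(pathSegments("server/books/thrillers/johngrisham/thefirm"));
    }
}
```

With a scheme present, "server" is treated as the host and is not part of the segments; without one, it is simply the first path segment, matching the split() result above.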
