I am currently trying to use the OWASP library to remove some html from a string. A string that I have is a list:
one
two
three
Which in markup, the string looks like "<ul><li>one</li><li>two</li><li>three</li></ul>".
When I use the OWSAP library with a policy like:
PolicyFactory policy = new HtmlPolicyBuilder()
.allowUrlProtocols("https")
.allowStandardUrlProtocols()
.requireRelNofollowOnLinks()
.toFactory();
OSWAP converts the list to: "onetwothree" . I instead would like to add spaces between the list items, and be able to convert the string to "one two three". I was wondering how / if there is a way to do this with OWSAP? I am new to using this so any advice is appreciated!
This is how it works, in case someone in future faced the same issue
It's in kotlin but pretty simple to be converted to java
private class SeparateTextPostProcessor(htmlStreamEventReceiver: HtmlStreamEventReceiver, val separator: String = " ") :
HtmlStreamEventReceiverWrapper(htmlStreamEventReceiver) {
override fun text(elem: String) {
if (elem.isNotEmpty()) {
// adds a space between elements
underlying.text("$elem$separator")
} else {
underlying.text(elem)
}
}
}
fun String.stripHTML(): String {
val policy =
HtmlPolicyBuilder().withPostprocessor { SeparateTextPostProcessor(it) }.toFactory()
val sanitized = policy.sanitize(this)
// Sanitized text will contain additional escaped characters.
return sanitized.trim()
}
Related
I have a question regarding best practices considering Java regular expressions/Strings manipulation.
I have a changing String template, let's say this time it looks like this:
/get/{id}/person
I have another String that matches this pattern eg.
/get/1234ewq/person
Keep in mind that the pattern could change anytime, slashes could disappear etc.
I would like to extract the difference between the two of them i.e. the result of the processing would be 1234ewq.
I know I could iterate over them char by char and compare, but, if it is possible, I wanted to find some smart approach to it with regular expressions.
What would be the best Java approach?
Thank you.
For you to answer your question with a regex approach I built a small example class which should hint you into a direction you could go with this (see below).
The problem with this approach is that you dynamically create a regular expression that depends on your template strings. This means that you have to somehow verify that your templates do not interfere with the regex compilation and matching process itself.
Also atm if you would use the same placeholder multiple times within a template the resulting HashMap only contains the value for the last placeholder mapping of that kind.
Normally this is the expected behaviour but this depends on your strategy of filling your templates.
For template processing in general you could have a look at the mustache library.
Also as Uli Sotschok mentioned, you probably would be better of with using something like google-diff-match-patch.
public class StringExtractionFromTemplate {
public static void main(String[] args) {
String template = "/get/{id}/person";
String filledTemplate = "/get/1234ewq/person";
System.out.println(diffTemplateInsertion(template, filledTemplate).get("id"));
}
private static HashMap<String, String> diffTemplateInsertion(String template, String filledTemplate){
//language=RegExp
String placeHolderPattern = "\\{(.+)}";
HashMap<String, String> templateTranslation = new HashMap<>();
String regexedTemplate = template.replaceAll(placeHolderPattern, "(.+)");
Pattern pattern = Pattern.compile(regexedTemplate);
Matcher templateMatcher = pattern.matcher(template);
Matcher filledTemplateMatcher = pattern.matcher(filledTemplate);
while (templateMatcher.find() && filledTemplateMatcher.find()) {
if(templateMatcher.groupCount() == filledTemplateMatcher.groupCount()){
for (int i = 1; i <= templateMatcher.groupCount(); i++) {
templateTranslation.put(
templateMatcher.group(i).replaceAll(placeHolderPattern,"$1"),
filledTemplateMatcher.group(i)
);
}
}
}
return templateTranslation;
}
}
I am trying to format my Jackson Yaml serialization in a certain way.
employees:
- name: John
age: 26
- name: Bill
age: 17
But, when I serialize the object, this is the format that I get.
employees:
-
name: John
age: 26
-
name: Bill
age: 17
Is there any way to get rid of the newline at the start of an object in the array? This is purely a personal preference/human readability issue.
These are the properties I'm currently setting on the YAMLFactory:
YAMLFactory yamlFactory = new YAMLFactory()
.enable(YAMLGenerator.Feature.MINIMIZE_QUOTES) //removes quotes from strings
.disable(YAMLGenerator.Feature.WRITE_DOC_START_MARKER)//gets rid of -- at the start of the file.
.enable(YAMLGenerator.Feature.INDENT_ARRAYS);// enables indentation.
I've looked in the java docs for the YAMLGenerator in Jackson, and looked at the other questions on stackoverflow, but I can't find an option to do what I'm trying to do.
I've tried CANONICAL_OUTPUT, SPLIT_LINES and LITERAL_BLOCK_STYLE properties, the last one being automatically set when MINIMIZE_QUOTES is set.
CANONICAL_OUTPUT seems to add brackets around arrays.
SPLIT_LINES and LITERAL_BLOCK_STYLE are related to how multi-line strings are handled.
The short answer is there is currently no way to do this through Jackson. This is due to a bug within snakeyaml where if you set the indicatorIndent property, the whitespace is not handled properly, and therefore snakeyaml adds the new line.
I have found a workaround using snakeyaml directly.
//The representer allows us to ignore null properties, and to leave off the class definitions
Representer representer = new Representer() {
//ignore null properties
#Override
protected NodeTuple representJavaBeanProperty(Object javaBean, Property property, Object propertyValue, Tag customTag) {
// if value of property is null, ignore it.
if (propertyValue == null) {
return null;
}
else {
return super.representJavaBeanProperty(javaBean, property, propertyValue, customTag);
}
}
//Don't print the class definition
#Override
protected MappingNode representJavaBean(Set<Property> properties, Object javaBean) {
if (!classTags.containsKey(javaBean.getClass())){
addClassTag(javaBean.getClass(), Tag.MAP);
}
return super.representJavaBean(properties, javaBean);
}
};
DumperOptions dumperOptions = new DumperOptions();
//prints the yaml as nested blocks
dumperOptions.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
//indicatorIndent indents the '-' character for lists
dumperOptions.setIndicatorIndent(2);
//This is the workaround. Indent must be set to at least 2 higher than the indicator indent because of how whitespace is handled.
//If not set to 2 higher, then the newline is added.
dumperOptions.setIndent(4);
Yaml yaml = new Yaml(representer, dumperOptions);
//prints the object to a yaml string.
yaml.dump(object);
The workaround happens with setting the indent property on the DumperOptions. You need to set the indent to a value at least 2 higher than the indicatorIndent, or else the newline will be added. This is due to how whitespace is handled within snakeyaml.
Looking at the Jackson source code the snakeyaml dumper options are created like:
#Andrey does this look good to you?
protected DumperOptions buildDumperOptions(int jsonFeatures, int yamlFeatures,
org.yaml.snakeyaml.DumperOptions.Version version)
{
DumperOptions opt = new DumperOptions();
// would we want canonical?
if (Feature.CANONICAL_OUTPUT.enabledIn(_formatFeatures)) {
opt.setCanonical(true);
} else {
opt.setCanonical(false);
// if not, MUST specify flow styles
opt.setDefaultFlowStyle(FlowStyle.BLOCK);
}
// split-lines for text blocks?
opt.setSplitLines(Feature.SPLIT_LINES.enabledIn(_formatFeatures));
// array indentation?
if (Feature.INDENT_ARRAYS.enabledIn(_formatFeatures)) {
// But, wrt [dataformats-text#34]: need to set both to diff values to work around bug
// (otherwise indentation level is "invisible". Note that this should NOT be necessary
// but is needed up to at least SnakeYAML 1.18.
// Also looks like all kinds of values do work, except for both being 2... weird.
opt.setIndicatorIndent(1);
opt.setIndent(2);
}
// 14-May-2018: [dataformats-text#84] allow use of platform linefeed
if (Feature.USE_PLATFORM_LINE_BREAKS.enabledIn(_formatFeatures)) {
opt.setLineBreak(DumperOptions.LineBreak.getPlatformLineBreak());
}
return opt;
}
I ran into this and ended up writing this blog entry to describe the solution I came up with. In short, I made a custom subclass of YAMLGenerator and YAMLFactory and used that to configure Jackson's YAMLMapper. Not "clean" but not huge and reasonably effective. Let me set arbitrary DumperOptions
Source below, but also available in this gist.
Warning - I did this all in Kotlin, but it's trivial code, and should be easily back-portable to Java:
val mapper: YAMLMapper = YAMLMapper(MyYAMLFactory()).apply {
registerModule(KotlinModule())
setSerializationInclusion(JsonInclude.Include.NON_EMPTY)
}
class MyYAMLGenerator(
ctx: IOContext,
jsonFeatures: Int,
yamlFeatures: Int,
codec: ObjectCodec,
out: Writer,
version: DumperOptions.Version?
): YAMLGenerator(ctx, jsonFeatures, yamlFeatures, codec, out, version) {
override fun buildDumperOptions(
jsonFeatures: Int,
yamlFeatures: Int,
version: DumperOptions.Version?
): DumperOptions {
return super.buildDumperOptions(jsonFeatures, yamlFeatures, version).apply {
//
// NOTE: CONFIGURATION HAPPENS HERE!!
//
defaultScalarStyle = ScalarStyle.LITERAL;
defaultFlowStyle = FlowStyle.BLOCK
indicatorIndent = 2
nonPrintableStyle = ESCAPE
indent = 4
isPrettyFlow = true
width = 100
this.version = version
}
}
}
class MyYAMLFactory(): YAMLFactory() {
#Throws(IOException::class)
override fun _createGenerator(out: Writer, ctxt: IOContext): YAMLGenerator {
val feats = _yamlGeneratorFeatures
return MyYAMLGenerator(ctxt, _generatorFeatures, feats,_objectCodec, out, _version)
}
}
In today's environment using at least version 2.12.x of jackson-dataformat-yaml thanks to the introduction of YAMLGenerator.Feature.INDENT_ARRAYS_WITH_INDICATOR, you can simply just do:
public ObjectMapper yamlObjectMapper() {
final YAMLFactory factory = new YAMLFactory()
.enable(YAMLGenerator.Feature.INDENT_ARRAYS_WITH_INDICATOR)
.enable(YAMLGenerator.Feature.MINIMIZE_QUOTES)
.disable(YAMLGenerator.Feature.WRITE_DOC_START_MARKER);
return new ObjectMapper(factory);
}
Which will give you the output of:
employees:
- name: John
age: 26
- name: Bill
age: 17
I have a property file (a.txt) which has the values (Example values given below) like below
test1=10
test2=20
test33=34
test34=35
By reading this file, I need to produce an output like below
value = 35_20_34_10
which means => I have a pattern like test34_test2_test33_test1
Note, If the 'test33' has any value other than 34 then I need to produce the value like below
value = 35_20_10
which means => I have a pattern like test34_test2_test1
Now my problem is, every time when the customer is making the change in the logic, I am making the change in the code. So what I expect is, I want to keep the logic (pattern) in another property file so I will be sending the two inputs to the util (one input is the property file (A.txt) another input will be the 'pattern.txt'),
My util has to be compare the A.txt and the business logic 'pattern.txt' and produce the output like
value = 35_20_34_10 (or)
value = 35_20_10
If there an example for such pattern based logic as I expect?
Any predefined util / java class does this?
Any help would be Great.
thanks,
Harry
First of all, svasa's answer makes a lot of sense, but covers different level of
abstraction. I recommend you read his answer too, that pattern should
be useful.
You may wanna look at Apache Velocity and FreeMarker libraries to see how they structure their API.
Those are template engines - they usually have some abstraction of pattern or format, and abstraction of variable/value binding (or namespace, or source). You can render a template by binding it with a binding/namespace, which yields the result.
For example, you may wanna have a pattern "<a> + <b>", and binding that looks like a map: {a: "1", b: "2"}. By binding that binding to that pattern you'll get "1 + 2", when interpreting <...> as variables.
You basically load the pattern from your pattern.txt, then load your data file A.txt (for example, by treating it as properties and using Properties class) and construct binding based on these properties. You'll get your output and possibility to customize the pattern all the time.
You may call the sequences like test34_test2_test33_test1 as a pattern, let me call them as constraints when building something.
To me this problem best fits into a
builder pattern.
When building the value you want, you tell the builder that these are my constraints(pattern) and these are my original properties like below:
new MyPropertiesBuilder().setConstraints(constraints).setProperties(original).buildValue();
Details:
Set some constraints in a separate file where you specify the order of the properties and their values like :
test34=desiredvalue-could-be-empty
test2=desiredvalue-could-be-empty
test33=34
test1=desiredvalue-could-be-empty
The builder goes over the constraints in the order specified, but get the values from the original properties and build the desired string.
One way to achieve your requirement through builder pattern is to define classes like below :
Interface:
public interface IMyPropertiesBuilder
{
public void setConstraints( Properties properties );
public void setProperties( Properties properties );
public String buildValue();
}
Builder
public class MyPropertiesBuilder implements IMyPropertiesBuilder
{
private Properties constraints;
private Properties original;
#Override
public void setConstraints( Properties constraints )
{
this.constraints = constraints;
}
#Override
public String buildValue()
{
StringBuilder value = new StringBuilder();
Iterator it = constraints.keySet().iterator();
while ( it.hasNext() )
{
String key = (String) it.next();
if ( original.containsKey( key ) && constraints.getProperty( key ) != null && original.getProperty( key ).equals( constraints.getProperty( key ) ) )
{
value.append( original.getProperty( key ) );
value.append( "_" );
}
}
return value.toString();
}
#Override
public void setProperties( Properties properties )
{
this.original = properties;
}
}
User
public class MyPropertiesBuilderUser
{
private Properties original = new Properties().load(new FileInputStream("original.properties"));;
private Properties constraints = new Properties().load(new FileInputStream("constraints.properties"));
public String getValue()
{
String value = new MyPropertiesBuilder().setConstraints(constraints).setProperties(original).buildValue();
}
}
I need to replace substring from a given string with empty string with the substring appearing in different positions of the string.
I want to remove the "fruit":"apple" from these possible combinations of the strings and expected the corresponding string:
{"client":"web","fruit":"apple"} --> {"client":"web"}
{"fruit":"apple","client":"web"} --> {"client":"web"}
{"client":"web","fruit":"apple","version":"v1.0"} --> {"client":"web","version":"v1.0"}
{"fruit":"apple"} --> null or empty string
I used regexp_replace(str, "\,*\"fruit\"\:\"apple\"", "") but that didn't get me the expected results. What is the right way to construct the regex?
It seems that you are working with data in JSON format. Depending from included dependencies you can achieve it totally without regular expression.
For example, if you are using Google's lib Gson, then you can parse String to JsonObject and then remove property from it
String input = "your data";
JsonParser parser = new JsonParser();
JsonObject o = parser.parse(input).getAsJsonObject();
try {
String foundValue = o.getAsJsonPrimitive("fruit").getAsString();
if ("apple".equals(foundValue)) {
o.remove("fruit");
}
} catch (Exception e) {
e.printStackTrace();
}
String filteredData = o.toJSONString();
P.S. code is not final version, it might needs handling of some situations (when there is no such field, or it contains non-primitive value), need further details to cover it
P.P.S. IMO, using regex in such situatioins makes code less readable and flexible
I have the following method which I use to get the object converted to yaml representation (which I can eg. print to console)
#Nonnull
private String outputObject(#Nonnull final ObjectToPrint packageSchedule) {
DumperOptions options = new DumperOptions();
options.setAllowReadOnlyProperties(true);
options.setPrettyFlow(true);
return new Yaml(new Constructor(), new JodaTimeRepresenter(), options).dump(ObjectToPrint);
}
All is good, but for some object contained within ObjectToPrint structure I get something like reference name and not the real object content, eg.
!!com.blah.blah.ObjectToPrint
businessYears:
- businessYearMonths: 12
ppiYear: &id001 {
endDate: 30-06-2013,
endYear: 2013,
startDate: 01-07-2012,
startYear: 2012
}
ppiPeriod:
ppiYear: *id001
endDate: 27-03-2014
startDate: 21-06-2013
units: 24.000
number: 1
As you can see from the example above I have ppiYear object printed (marked as $id001) and the same object is used in ppiPeriod but only the reference name is printed there, not the object content.
How to print the object content everytime I use that object within my structure, which I want to be converted to yaml (ObjectToPrint).
PS. It would be nice not to print the reference name at all (&id001) but thats not crucial
This is because you reference the same object at different places. To avoid this you need to create copies of those objects.
Yaml does not have a flag to switch this off, because you might get into endless loops in case of cyclic references.
However you might tweak Yaml source code to ignore double references:
look at Serializer line ~170 method serializeNode:
...
if ( this.serializedNodes.contains(node) ) {
this.emmitter.emit( new AliasEvent( ... ) );
} else {
serializedNodes.add(node); // <== Replace with myHook(serializedNodes,node);
...
void myHook(serializedNodes,node) {
if ( node's class != myClass(es) to avoid ) {
serializedNodes.add(node);
}
if you find a way to avoid Yaml to put nodes into the serializedNodes collection, your problem will be solved, however your programm will loop endless in case of cyclic references.
Best solution is to add a hook which avoids registering only the class you want to be written plain.
As a way to do this without changing the SnakeYAML source code, you can define:
public class NonAnchorRepresenter extends Representer {
public NonAnchorRepresenter() {
this.multiRepresenters.put(Map.class, new RepresentMap() {
public Node representData(Object data) {
return representWithoutRecordingDescendents(data, super::representData);
}
});
}
protected Node representWithoutRecordingDescendents(Object data, Function<Object,Node> worker) {
Map<Object,Node> representedObjectsOnEntry = new LinkedHashMap<Object,Node>(representedObjects);
try {
return worker.apply(data);
} finally {
representedObjects.clear();
representedObjects.putAll(representedObjectsOnEntry);
}
}
}
and use it as e.g. new Yaml(new SafeConstructor(), new NonAnchorRepresenter());
This only does maps, and it does use anchors when it has to, i.e. where a map refers to an ancestor. It would need similar extensions for sets and lists if required. (In my case it was empty maps that were the biggest offender.)
(It would be more easily handled in the SnakeYAML codebase on Representer.representData looking at an option e.g. setAllowAnchors defaulting to true, with the logic above to reset representedObjects after recursing if disallowed. I take the point at https://stackoverflow.com/a/18419489/109079 about the possibility of endless loops but it would be straightforward using this strategy to detect any reference to a parent map and fail fast.)
Another solution which i came by is using io.circle instead snake-yaml if it possible.
Scala code:
private[this] def removeAnchors(configYaml: String): String = {
val withoutAnchorsObj = io.circe.yaml.parser.parse(configYaml).valueOr(throw _)
val withoutAnchorString = io.circe.yaml.Printer(dropNullKeys = true, mappingStyle = Printer.FlowStyle.Block).pretty(withoutAnchorsObj)
logger.info(s"Removed Anchors from configYaml: $configYaml, result: $withoutAnchorString")
withoutAnchorString
}
build.sbt:
val circeVersion = "0.12.0"
libraryDependencies ++= Seq(
"io.circe" %% "circe-yaml" % circeVersion,
"io.circe" %% "circe-parser" % circeVersion,
)