How does this normalize function work?

I was doing a JUnit tutorial and I came across this normalize function that was being tested. It was defined like this:
public static String normalizeWord(String word) {
try {
int i;
Class<?> normalizerClass = Class.forName("java.text.Normalizer");
Class<?> normalizerFormClass = null;
Class<?>[] nestedClasses = normalizerClass.getDeclaredClasses();
for (i = 0; i < nestedClasses.length; i++) {
Class<?> nestedClass = nestedClasses[i];
if (nestedClass.getName().equals("java.text.Normalizer$Form")) {
normalizerFormClass = nestedClass;
}
}
assert normalizerFormClass.isEnum();
Method methodNormalize = normalizerClass.getDeclaredMethod(
"normalize",
CharSequence.class,
normalizerFormClass);
Object nfcNormalization = null;
Object[] constants = normalizerFormClass.getEnumConstants();
for (i = 0; i < constants.length; i++) {
Object constant = constants[i];
if (constant.toString().equals("NFC")) {
nfcNormalization = constant;
}
}
return (String) methodNormalize.invoke(null, word, nfcNormalization);
} catch (Exception ex) {
return null;
}
}
How does this function work? What is it actually doing?

It does the same as:
import java.text.Normalizer;
try {
return Normalizer.normalize(word, Normalizer.Form.NFC);
} catch (Exception ex) {
return null;
}
Except that all operations are performed via Reflection.

It's using reflection to call
java.text.Normalizer.normalize(word, java.text.Normalizer.Form.NFC);
Presumably to allow it to run on Java versions before 1.6 which don't have this class.
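The reflective lookup can also be written more compactly. The following is only a sketch of the same idea, not the tutorial author's code: ReflectiveNfc is a hypothetical class name, and the enum constant is fetched through getField instead of looping over the nested classes and constants.

```java
import java.lang.reflect.Method;

final class ReflectiveNfc {
    // Calls java.text.Normalizer.normalize(word, Normalizer.Form.NFC) without
    // referring to either class at compile time.
    static String normalizeWord(String word) {
        try {
            Class<?> normalizer = Class.forName("java.text.Normalizer");
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            // Enum constants are public static fields of the enum class.
            Object nfc = form.getField("NFC").get(null);
            Method normalize = normalizer.getMethod("normalize", CharSequence.class, form);
            return (String) normalize.invoke(null, word, nfc);
        } catch (Exception ex) {
            // Same behaviour as the original: any failure yields null.
            return null;
        }
    }
}
```

As in the original, a missing class or a failing call is reported only as a null return value, so callers cannot tell the two cases apart.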

This function performs Unicode string normalization.
In Unicode, the same text can be represented in more than one way. For example, an accented character can be stored precomposed, as a single Unicode character, or decomposed, as the base letter followed by a combining accent mark.
The java.text.Normalizer class was introduced in Java 6. For Java 5, there is a Sun proprietary class.
See the class info.olteanu.utils.TextNormalizer in the Phramer project (http://sourceforge.net/projects/phramer/ , www.phramer.org ) for a way to get a normalizer in both Java 5 (Sun JDK) and Java 6 without any compilation issues (the code compiles in any version >= 5 and runs in both JVMs, even though Sun dropped the Java 5 proprietary class).
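To make the composed/decomposed distinction concrete, here is a small sketch using the Java 6+ class directly. NfcDemo and nfc are hypothetical names chosen for the example.

```java
import java.text.Normalizer;

final class NfcDemo {
    // Normalizes to NFC, the composed form.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // LATIN SMALL LETTER E WITH ACUTE, one code point
        String decomposed = "e\u0301";  // 'e' followed by COMBINING ACUTE ACCENT
        System.out.println(composed.equals(decomposed));       // false
        System.out.println(composed.equals(nfc(decomposed)));  // true
    }
}
```

Both strings render the same, but they only compare equal after normalization.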

Related

Saxon-HE Java Extension - How do I access the value of a xsl-variable which is passed as a parameter?

I have created a function using the Saxon documentation which has 3 parameters. The function takes an input string and pads it out to a given size, using an integer size and a pad string.
padStringLeft(inputStr,size,padChar)
If I put this in my XSLT and hard wire the parameters the function works.
<debug1><xsl:value-of select="c4j_XSLT_Ext_padStringLeft:padStringLeft('1',4,'0')" /></debug1>
The output from the above would be '0001'
However, when I pass the contents of an XSLT variable and set a breakpoint in my Java function, I can see that param0 arrives as a LazySequence.
<debug2><xsl:value-of select="c4j_XSLT_Ext_padStringLeft:padStringLeft($myvar,4,'0')" /></debug2>
As my code is attempting to treat it as a string, it does not work.
How should I handle this scenario: how do I access the value of the xsl-variable/param, and what if I sometimes want to use a literal string instead of a variable?
Java function:
public class XSLT_Ext_padStringLeft extends ExtensionFunctionDefinition
{
@Override
public SequenceType[] getArgumentTypes()
{
return new SequenceType[]{SequenceType.SINGLE_STRING,SequenceType.SINGLE_INTEGER, SequenceType.SINGLE_STRING};
}
@Override
public StructuredQName getFunctionQName()
{
return new StructuredQName("c4j_XSLT_Ext_padStringLeft", "http://com.commander4j.Transformation.XSLT_Ext_padStringLeft", "padStringLeft");
}
@Override
public SequenceType getResultType(SequenceType[] arg0)
{
return SequenceType.SINGLE_STRING;
}
@Override
public ExtensionFunctionCall makeCallExpression()
{
return new ExtensionFunctionCall() {
@Override
public Sequence call(XPathContext context, Sequence[] arguments) throws XPathException {
String inputStr;
try
{
inputStr = ((StringValue)arguments[0]).getStringValue();
} catch (ClassCastException ex)
{
inputStr = "";
}
long size;
try
{
String temp =arguments[1].toString();
size = Integer.valueOf(temp);
} catch (ClassCastException ex)
{
size = 1;
}
String padStr;
try
{
padStr = ((StringValue)arguments[2]).getStringValue();
} catch (ClassCastException ex)
{
padStr = "";
}
String result = inputStr;
while (result.length() < size)
{
result = padStr + result;
}
return StringValue.makeStringValue(result);
}
};
}
}
Thanks
Dave

In general the parameters are passed as instance of the class net.sf.saxon.om.Sequence, and you should only use the methods on the interface Sequence, rather than examining what particular kind of Sequence it is, because that could change in the future.
If you're expecting a singleton sequence (that is, a single item), call head() to get the first item in the sequence (this will return null if the sequence is empty). You will then have an instance of net.sf.saxon.om.Item. (The Sequence might already be an Item, because an item is a sequence, but you can't rely on that, and calling head() is safer than casting.) If you're expecting a string, you can safely call getStringValue() on this item to get the value as a string.
Also note, Saxon uses lazy evaluation wherever possible, which means that the string might not actually be computed until someone asks for its value. This means that innocent-looking calls like head() and getStringValue() can actually throw exceptions, and you need to be prepared for this.
So in short, you should replace
inputStr = ((StringValue)arguments[0]).getStringValue();
with
inputStr = arguments[0].head().getStringValue();
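A plain-Java analogy (this is not Saxon's API) may help with the point about lazy evaluation: the error surfaces only when the value is requested, so that is the place where it has to be caught or allowed to propagate. LazyDemo, lazyUpper and valueOrDefault are hypothetical names for this sketch.

```java
import java.util.function.Supplier;

final class LazyDemo {
    // Returns a value that is only computed when get() is called.
    // Building the Supplier never fails, even for a null input.
    static Supplier<String> lazyUpper(String input) {
        return () -> input.toUpperCase(); // NullPointerException is raised here, at get() time
    }

    // Forces the lazy value and falls back to a default if evaluation fails.
    static String valueOrDefault(Supplier<String> lazy, String fallback) {
        try {
            return lazy.get();
        } catch (RuntimeException ex) {
            return fallback;
        }
    }
}
```

In the extension function from the question, call(...) already declares throws XPathException, so letting the exception propagate to the caller is an option as well as catching it locally.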

"Also note, Saxon uses lazy evaluation wherever possible, which means that the string might not actually be computed until someone asks for its value. This means that innocent-looking calls like head() and getStringValue() can actually throw exceptions, and you need to be prepared for this."
So if I understand you correctly - when I call Transform to process the XSLT transformation it will call each of my custom java external functions as needed but the reference to
inputStr = arguments[0].head().getStringValue();
could generate an exception?
Would I then need to do something within the Java function to force it to get the value, or would I let the exception propagate back to the calling transformation and catch it there?
Dave

Accessing JavaScript array through Java with graal.js

I am migrating a project from Nashorn to graal.js. The project utilizes a large amount of scripts (over 3,400) and there's a function on the Java side which invokes a method; the method returns a JavaScript list of objects.
function filterList(ob)
{
var list = [];
var arr = ob.toArray();
for(var i = 0; i < arr.length; i++)
{
if(global.isValid(arr[i]))
{
list.push(arr[i]);
}
}
return list;
}
This worked fine on Nashorn previously with the use of ScriptUtils. This code was written by the developer who worked on the project before I picked it up:
try {
Object p = iv.invokeFunction("filterList", this.getList());
if(p != null) {
List<MyObj> lObj = new ArrayList<>(((Map<String, MyObj>)(ScriptUtils.convert(p, Map.class))).values());
return lObj;
}
} catch (ScriptException | NoSuchMethodException ex) {
ex.printStackTrace();
}
How can I access the array through Java with graal.js? I have tried using Value.asValue(p).as(MyObj[].class) to no avail. I have also tried following the Nashorn migration guide, which suggests casting the object to List or Map, also to no avail.
I'm aware of a solution where I would have to rewrite the script to just use new Java.type('java.util.ArrayList'); and return a List rather than an array - however there are thousands of scripts and rewriting all of them will be incredibly tedious.

You can use Value.getArraySize() and Value.getArrayElement(index):
Context context = Context.newBuilder("js").build();
Value result = context.eval("js", "var list=[1,2,'foo',true]; list;");
if (result.hasArrayElements()) {
for (int i=0;i<result.getArraySize();i++) {
System.out.println(result.getArrayElement(i));
}
}
You can find the full Javadoc of the Value class at https://www.graalvm.org/sdk/javadoc/org/graalvm/polyglot/Value.html
Best,
Christian

How to determine line number for the method with java ASM?

I need to determine the line number of a specific method in a class using the ObjectWeb ASM library. The line number of the method declaration or of the first line in the method's body is equally acceptable (6 or 7 in the example).
Example:
1. public class Foo {
...
6. public void bar() {
7. try {
8. try {
9. System.out.println(); //first executable line
I tried to use MethodVisitor's visitLineNumber method, but it visits only the first executable line (line 9 in the example).
I found a solution for this problem using the Javassist library (link). But is there a way to solve this with ASM?
EDIT:
The following snippet gave the same result: line 9 instead of 6 or 7.
public static int getLineNumber(String path) throws IOException {
final File f = new File(path);
try (FileInputStream fis = new FileInputStream(f)) {
ClassReader reader = new ClassReader(fis);
ClassNode clNode = new ClassNode(Opcodes.ASM5);
reader.accept(clNode, 0); // second argument is parsing flags (e.g. ClassReader.SKIP_FRAMES), not the API version
for (MethodNode mNode : (List<MethodNode>) clNode.methods) {
if (mNode.name.equals("bar")) {
ListIterator<AbstractInsnNode> it = mNode.instructions.iterator();
while (it.hasNext()) {
AbstractInsnNode inNode = it.next();
if (inNode instanceof LineNumberNode) {
return ((LineNumberNode) inNode).line;
}
}
}
}
}
return -1;
}

The line numbers provided by any bytecode processing library are based on the LineNumberTable attribute, which maps executable instructions of the method to line numbers. So it’s a fundamental limitation that you cannot find source code lines in the class file that do not cause the generation of executable bytecode.
Sometimes it even depends on the compiler which source code line a construct spanning multiple lines gets assigned to.
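The same LineNumberTable is what the JVM consults when it fills in stack trace line numbers, so the effect can be observed without ASM. A minimal sketch, assuming the class is compiled with line number information (javac's default); LineNumberDemo is a hypothetical name.

```java
final class LineNumberDemo {
    static int currentLine() {
        // Frame 0 is this method; its line number comes from the
        // LineNumberTable entry of the instruction creating the Throwable.
        return new Throwable().getStackTrace()[0].getLineNumber();
    }

    public static void main(String[] args) {
        // Prints the line of the return statement above, not the line of
        // the method declaration; a negative value means no line info.
        System.out.println(currentLine());
    }
}
```

The reported line belongs to an executable statement inside the method body, which matches the behaviour seen with visitLineNumber in the question.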

public static LineNumberNode findLineNumberForInstruction(InsnList insnList, AbstractInsnNode insnNode) {
Validate.notNull(insnList);
Validate.notNull(insnNode);
int idx = insnList.indexOf(insnNode);
Validate.isTrue(idx != -1);
// Walk backwards from insnNode to the nearest preceding LineNumberNode
ListIterator<AbstractInsnNode> insnIt = insnList.iterator(idx);
while (insnIt.hasPrevious()) {
AbstractInsnNode node = insnIt.previous();
if (node instanceof LineNumberNode) {
return (LineNumberNode) node;
}
}
return null;
}

Java String Mutability - java.lang.NoSuchFieldException: offset

I'm new to Java and I saw a Q&A section here with two examples where immutability is circumvented. Upon testing MutableString.java:
import java.lang.reflect.Field;
public class MutableString {
public static void main(String[] args) {
String s = "Immutable";
String t = "Notreally";
mutate(s, t);
StdOut.println(t);
// strings are interned so this doesn't even print "Immutable" (!)
StdOut.println("Immutable");
}
// change the first min(|s|, |t|) characters of s to t
public static void mutate(String s, String t) {
try {
Field val = String.class.getDeclaredField("value");
Field off = String.class.getDeclaredField("offset");
val.setAccessible(true);
off.setAccessible(true);
int offset = off.getInt(s);
char[] value = (char[]) val.get(s);
for (int i = 0; i < Math.min(s.length(), t.length()); i++)
value[offset + i] = t.charAt(i);
}
catch (Exception e) { e.printStackTrace(); }
}
}
I received the following error:
java.lang.NoSuchFieldException: offset
Any input on the following would be greatly appreciated:
a) why do I get this exception
b) how do I check which fields exist in a class (Java strings specifically)

Disclaimer: these kinds of hacks are interesting lessons in learning and fun trivia. But they are definitely not something that you want to use in any production code. It will lead to pain.
By their very nature, such a hack always depends on implementation details of the classes that are hacked.
In your case you seem to be using a String implementation that doesn't have an offset field, but uses some other mechanism (or maybe just a different name!).
For example, I've just reviewed the Oracle Java 7 String class and it no longer has the offset field (which was used in Java 6 and earlier to share the char[] among substrings)!*
You can use Class.getDeclaredFields() to check which fields this implementation does define:
for (Field f : String.class.getDeclaredFields()) {
System.out.println(f);
}
For a version of that hack that works with Java 7, you could do this:
public static void mutate(String s, String t) {
try {
Field val = String.class.getDeclaredField("value");
val.setAccessible(true);
char[] value = (char[]) val.get(s);
for (int i = 0; i < Math.min(s.length(), t.length()); i++)
value[i] = t.charAt(i);
}
catch (Exception e) { e.printStackTrace(); }
}
Of course, this too will break if the internals of String change again.
* Here's an email that talks about that change; it seems that sharing the char[] only led to improved performance in a few special cases.
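One way to make such a hack fail more gracefully is to check for a field before touching it. A small sketch follows; FieldCheck and hasDeclaredField are hypothetical names, and which fields String actually declares depends on the JDK in use.

```java
import java.lang.reflect.Field;

final class FieldCheck {
    // Returns true if the class itself declares a field with the given name.
    static boolean hasDeclaredField(Class<?> type, String name) {
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The result for "offset" varies by JDK version.
        System.out.println("offset present: " + hasDeclaredField(String.class, "offset"));
        System.out.println("value present:  " + hasDeclaredField(String.class, "value"));
    }
}
```

Listing fields this way only reads metadata; it does not need setAccessible(true).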

Java how common is extending/wrapping built-in classes

I'm new to the Java language and I've tried to write my first relatively complex program. After I wrote a few classes I've realized that I barely use built-in classes (like BigInteger, MessageDigest, ByteBuffer) directly because they don't totally fit my needs. Instead I write my own class and inside the class I use the built-in class as an attribute.
Example:
public class SHA1 {
public static final int SHA_DIGEST_LENGTH = 20;
private MessageDigest md;
public SHA1() {
try {
md = MessageDigest.getInstance("SHA-1");
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
}
public void update(byte[] data) {
md.update(data);
}
public void update(BigNumber bn) {
md.update(bn.asByteArray());
}
public void update(String data) {
md.update(data.getBytes());
}
public byte[] digest() {
return md.digest();
}
}
With this simple class I don't have to use try/catch when using SHA1, I can pass my custom BigNumber class as a parameter, and I can also pass a String to the update function.
The following BigNumber class contains all of the functions that I need, exactly how I need them.
public class BigNumber {
private BigInteger m_bn;
public BigNumber() {
m_bn = new BigInteger("0");
}
public BigNumber(BigInteger bn) {
m_bn = bn;
}
public BigNumber(String hex) {
setHexStr(hex);
}
//reversed no minsize
public byte[] asByteArray() {
return asByteArray(0, true);
}
//reversed with minsize
public byte[] asByteArray(int minSize) {
return asByteArray(minSize, true);
}
public byte[] asByteArray(int minSize, boolean rev) {
byte[] mag = m_bn.toByteArray();
//delete sign bit
//there is always a sign bit! so if bitNum % 8 is zero then
//the sign bit created a new byte (0th)
if(getNumBits() % 8 == 0) {
byte[] tmp = new byte[mag.length-1];
System.arraycopy(mag, 1, tmp, 0, mag.length-1);
mag = tmp;
}
//extend the byte array if needed
int byteSize = (minSize >= getNumBytes()) ? minSize : getNumBytes();
byte[] tmp = new byte[byteSize];
//if mag's length is smaller than byteSize then we keep 0x00-s on the left
System.arraycopy(mag, 0, tmp, byteSize-mag.length, mag.length);
if(rev) ByteManip.reverse(tmp);
return tmp;
}
public String asHexStr() {
return ByteManip.byteArrayToHexStr(asByteArray(0, false));
}
public void setHexStr(String hex) {
m_bn = new BigInteger(hex, 16);
}
public void setBinary(byte[] data) {
//reverse = true
ByteManip.reverse(data);
//set as hex (binary set has some bug with the sign bit...)
m_bn = new BigInteger(ByteManip.byteArrayToHexStr(data), 16);
}
public void setRand(int byteSize) {
byte[] tmp = new byte[byteSize];
new Random().nextBytes(tmp);
//reversing byte order, but it doesn't really matter since it is a random
//number
setBinary(tmp);
}
public int getNumBytes() {
return (m_bn.bitLength() % 8 == 0) ? (m_bn.bitLength() / 8) : (m_bn.bitLength() / 8 + 1);
}
public int getNumBits() {
return m_bn.bitLength();
}
public boolean isZero() {
return m_bn.equals(BigInteger.ZERO);
}
//operations
public BigNumber modExp(BigNumber exp, BigNumber mod) {
return new BigNumber(m_bn.modPow(exp.m_bn, mod.m_bn));
}
public BigNumber mod(BigNumber m) {
return new BigNumber(m_bn.mod(m.m_bn));
}
public BigNumber add(BigNumber bn) {
return new BigNumber(m_bn.add(bn.m_bn));
}
public BigNumber subtract(BigNumber bn) {
return new BigNumber(m_bn.subtract(bn.m_bn));
}
public BigNumber multiply(BigNumber bn) {
return new BigNumber(m_bn.multiply(bn.m_bn));
}
}
My question is: how common is it in Java to use these kinds of classes instead of the built-in classes? Does it make my code unreadable for other programmers (compared to implementing everything with built-in classes)?
I've read that new C++ programmers desperately try to write the code they used to write in C, and therefore the benefits of C++ remain hidden from them.
I'm afraid I'm doing something like that in Java: trying to implement everything on my own instead of using the built-in classes directly.
Is this happening (for example in the BigNumber class)?
Thank you for your opinions!

I normally write a utility class to hold helper logic, such as:
public class CommonUtil {
// static, so callers need no CommonUtil instance
public static byte[] asByteArray(int minSize)
{
return "something".getBytes();
}
// add more utility methods
}

Wrapping a class makes sense when you add some value by doing so. If you are adding small functionality it can be worth using a Utility class instead of wrapping an existing one.
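For the SHA1 wrapper in the question, the utility-class route might look like the sketch below. DigestUtil and sha1 are hypothetical names, and UTF-8 is an assumed encoding (the question's code calls getBytes() with the platform default).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestUtil {
    private DigestUtil() {
        // static helpers only
    }

    // Returns the 20-byte SHA-1 digest of the UTF-8 bytes of the text.
    static byte[] sha1(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on Java platforms, so this is unexpected.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Callers still avoid the try/catch, but they keep working with the standard MessageDigest output (a plain byte[]) instead of a custom type.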

I think that unless you have a very good reason for implementing the same functionality again, you probably should not do it. Here are several reasons:
Built-in classes are used by a lot of people around the world, and therefore they are likely to have fewer bugs than your code.
Users experienced in Java will be better at using standard classes than your classes, and they will need less time to understand your code and write something new in your project.
Built-in classes have good documentation, and therefore it is much easier to use them.
You are wasting your time by implementing something that was already implemented and tested by Java professionals. It is better to concentrate on your own project.
If you are writing a long-term project, you will need to support all your classes. Oracle is already supporting the built-in classes. Let them do their job!
Last but not least: are you sure that you know more about the problem than the author of a built-in class? Only if the answer is yes, consider writing your own implementation. Even implementing everyday classes, such as collections or time-related classes, can be tricky.

You're not gaining anything by making a class that does this stuff for you. If you're going to be doing certain operations a lot, then you might want to create a new class with static methods that do these important things for you.
Let's assume that you want a sorted array at all times. You could make a new class, let's call it SortedArray. You could sort it whenever you add something in, but why would you do that when you can just add in everything and then call the (utility) method Arrays.sort?
For common operations, take a look at Java's Arrays class: if you do something often, such as searching or sorting, that's something you make a method for. In your case, you might make a utility method that turns the BigInteger into a byte array for you. You shouldn't just be making your own 'better' version that does what you want. When other people look at your code, it's much better if you use standard objects instead of custom objects that don't really do anything.
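As a sketch of that last suggestion, a static helper can do the BigInteger-to-bytes conversion without a wrapper class. BigIntegers and toUnsignedBytes are hypothetical names; unlike the question's asByteArray, this version assumes a non-negative value and returns big-endian bytes without the optional reversal.

```java
import java.math.BigInteger;

final class BigIntegers {
    private BigIntegers() {
        // static helpers only
    }

    // Big-endian magnitude of a non-negative value, left-padded with
    // zero bytes so that the result has at least minSize bytes.
    static byte[] toUnsignedBytes(BigInteger value, int minSize) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative value");
        }
        byte[] raw = value.toByteArray(); // two's complement, may start with a 0x00 sign byte
        int start = (raw.length > 1 && raw[0] == 0) ? 1 : 0;
        int len = raw.length - start;
        byte[] out = new byte[Math.max(len, minSize)];
        System.arraycopy(raw, start, out, out.length - len, len);
        return out;
    }
}
```

For example, BigInteger.valueOf(255) becomes a single byte 0xFF, and BigInteger.ONE with minSize 4 becomes 00 00 00 01. The rest of the code can keep using BigInteger directly.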

As @Shark commented, there's no point in creating your own solutions, because:
They take time to create
They end up less flexible
However, you can extend classes (it's recommended) or use 3rd party frameworks that might suit you better.
