Reading Unicode Characters using Selenium Webdriver in Java

I am trying to get the text of the following web element including its unicode character (Copyright symbol).
© 2021 ABC Inc. All rights reserved.
I tried getWebDriver().findElement(elem).getText() but that gives me the following output.
? 2021 ABC Inc. All rights reserved.
I saw a few earlier posts on this but still could not figure out how to read this web element so that I capture the Unicode symbol (©) as well.
Appreciate any suggestions in this regard.
Thanks!

Update: I am confused after finding that number 169 decimal is the COPYRIGHT SIGN character in both Unicode and Windows-1252. So I have no idea as to what is really going on!
I will leave this Answer as-is in case the code is helpful to anyone trying to solve this mystery.
This is likely due to a limited (non-Unicode) character set and encoding in use by whatever means you generated your output.
Here is demo code showing your example string dumped to console via System.out by using the current default Charset, using UTF-8, and using the limited legacy Windows-1252.
See this example code run live at IdeOne.com.
import java.util.*;
import java.lang.*;
import java.io.*;
import java.nio.charset.StandardCharsets ;
import java.nio.charset.Charset ;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
String blurb = "© 2021 ABC Inc. All rights reserved." ;
// The character set and encoding currently in use by `System.out` is not known, some default.
System.out.println( "----------| default |--------------------------" );
System.out.println( "blurb: " + blurb ) ;
// Let's set the character set and encoding to UTF-8 by wrapping `System.out` in a `PrintStream`.
System.out.println( "----------| UTF-8 |--------------------------" );
try
{
PrintStream printStream = new PrintStream( System.out , true , StandardCharsets.UTF_8.name() );
printStream.println( "blurb: " + blurb );
}
catch ( UnsupportedEncodingException e )
{
e.printStackTrace();
}
// In contrast, try Windows-1252 character set.
System.out.println( "----------| windows-1252 |--------------------------" );
// Verify windows-1252 charset is available on the current JVM.
String windows1252CharSetName = "windows-1252";
boolean isWindows1252CharsetAvailable = Charset.availableCharsets().keySet().contains( windows1252CharSetName );
if ( isWindows1252CharsetAvailable )
{
System.out.println( "isWindows1252CharsetAvailable = " + isWindows1252CharsetAvailable );
} else
{
System.out.println( "FAIL - No charset available for name: " + windows1252CharSetName );
}
// Print the blurb.
try
{
PrintStream printStream = new PrintStream( System.out , true , windows1252CharSetName );
printStream.println( "blurb: " + blurb );
}
catch ( UnsupportedEncodingException e )
{
e.printStackTrace();
}
}
}
When run.
----------| default |--------------------------
blurb: © 2021 ABC Inc. All rights reserved.
----------| UTF-8 |--------------------------
blurb: © 2021 ABC Inc. All rights reserved.
----------| windows-1252 |--------------------------
isWindows1252CharsetAvailable = true
blurb: � 2021 ABC Inc. All rights reserved.
As expected, we see the COPYRIGHT SIGN character (code point 169 decimal) appear properly for Unicode but fail for Windows-1252. According to Wikipedia,
Recommended reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
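A minimal sketch that may help with the mystery in the update above. It prints the code point of the sign and the bytes each charset produces for it. The names CopyrightBytes, hex and codePoints are made up for illustration. The sign is a single byte A9 in Windows-1252 but the two bytes C2 A9 in UTF-8, and a lone A9 is not valid UTF-8. So one plausible reading, offered as an assumption, is that a console expecting UTF-8 shows the Windows-1252 byte as the replacement character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CopyrightBytes {
    // Renders bytes as two-digit hex so the result does not depend on console encoding.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Renders each code point of the string in U+XXXX notation.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sign = "\u00A9";
        System.out.println(codePoints(sign));                                    // U+00A9
        System.out.println(hex(sign.getBytes(StandardCharsets.UTF_8)));          // C2 A9
        System.out.println(hex(sign.getBytes(Charset.forName("windows-1252")))); // A9
        // Decoding the single Windows-1252 byte as UTF-8 yields the replacement character.
        String misread = new String(new byte[] { (byte) 0xA9 }, StandardCharsets.UTF_8);
        System.out.println(codePoints(misread));                                 // U+FFFD
    }
}
```

Passing the result of getText() to codePoints would show whether the string really starts with U+00A9 (only the output is at fault) or with U+003F (the character was already lost before printing).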

Related

Not able to decode unicode of Currency Symbol when passed as environment variables

The following code works well for Rupee Symbol.
String encode ="\u20B9";
byte[] ptext = encode.getBytes(StandardCharsets.UTF_8);
String value = new String(ptext,StandardCharsets.UTF_8);
System.out.print(value);
It prints : ₹
But when we pass the Rupee symbol's Unicode escape as an environment variable, it does not work:
String encode = System.getenv("encode");
byte[] ptext = encode.getBytes(StandardCharsets.UTF_8);
String value = new String(ptext,StandardCharsets.UTF_8);
System.out.print(value);
It prints : \u20B9
I tried this on macOS.

Java only understands \uXXXX format when it's inside a Java source file and the compiler handles it (parses it and converts it to a single character). If you read the value from somewhere, it won't be interpreted as a unicode character (except in a .properties file, which is a bit special).
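If the environment variable has to carry the escape text rather than the character itself, the escape can be decoded by hand at run time. Here is a minimal sketch of that idea. The class and method names (UnicodeEscapes, unescape) are hypothetical, and it only handles the simple four-hex-digit form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every escape sequence found in the runtime string with the character it names.
    static String unescape(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Six literal characters, the way System.getenv would deliver them.
        String raw = "\\u20B9";
        System.out.println(raw.length());            // 6
        System.out.println(unescape(raw).length());  // 1
    }
}
```

With this, something like unescape(System.getenv("encode")) would give the single Rupee character, provided the variable is set and the console can display it.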

Copy-Paste ₹ character
Paste the actual ₹ character rather than using the \u escape, as noted in the correct Answer by Kayaman.
Works on macOS High Sierra with Java 10.0.1 (Zulu by Azul Systems) within IntelliJ 2018.2.2.
package com.basilbourque.example;
public class Env {
// Example passing Unicode characters via system environment variable.
public static void main ( String[] args ) {
System.out.println( "Bonjour tout le monde." );
System.out.println( System.getProperty( "java.version" ) );
System.out.println( System.getenv( "wazzup" ) );
System.out.println( System.getenv( "fire" ) );
System.out.println( System.getenv( "rupee" ) );
}
}
Bonjour tout le monde.
10.0.1
lunch
🔥
₹
UnicodeChecker app
Tip: You can find and copy all the Unicode characters from within the excellent free-of-cost macOS app, UnicodeChecker.

How to collect directory listing along with each file CRC checksum?

I use the following command to get a directory listing on *nix (Linux, AIX, SunOS, HP-UX) platforms:
Command
ls -latr
Output
drwxr-xr-x 2 ricky support 4096 Aug 29 11:59 lib
-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh
The cksum command is used for getting the CRC checksum.
How can the CRC checksum be appended after each file (in a full directory listing, as shown below), keeping this format on these *nix (Linux, AIX, SunOS, HP-UX) platforms?
drwxr-xr-x 2 ricky support 4096 Aug 29 11:59 lib
-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh 4287252281
Update note: no third-party applications. I am using Java/Groovy to parse the output into a given format, which ultimately forms an XML document via Groovy's XmlSlurper (the generated XML files are around 5 MB).
"permission","hardlink","owner","group","fsize","month","date","time","filename","checksum"
All Suggestions are welcome! :)
Update with my code
But here I am calculating an MD5 hex digest, which gives output similar to the md5sum command on Linux. So it's no longer cksum, as I cannot use Jacksum because of a licensing issue :(
import groovy.io.FileType
import java.text.Format
import java.text.SimpleDateFormat
class CheckSumCRC32 {
public def getFileListing(String file){
def dir = new File(file)
def filename = null
def md5sum = null
def filesize = null
def lastmodified = null
def lastmodifiedDate = null
def lastmodifiedTime = null
def permission = null
Format formatter = null
def list=[]
if(dir.exists()){
dir.eachFileRecurse (FileType.FILES) { fname ->
list << fname
}
list.each{fileob->
try{
md5sum=getMD5CheckSum(fileob.toString())
filesize=fileob.length()+"b"
lastmodified=new Date(fileob.lastModified())
lastmodifiedDate=lastmodified.format('dd/MM/yyyy')
formatter=new SimpleDateFormat("hh:mm:ss a")
lastmodifiedTime=formatter.format(lastmodified)
permission=getReadPermissions(fileob)+getWritePermissions(fileob)+getExecutePermissions(fileob)
filename=getRelativePath("E:\\\\temp\\\\recurssive\\\\",fileob.toString())
println "$filename, $md5sum, $lastmodifiedDate, $filesize, $permission, $lastmodifiedDate, $lastmodifiedTime "
}
catch(FileNotFoundException fne){
println fne
}
catch(IOException io){
println io
}
catch(Exception e){
println e
}
}
}
}
public def getReadPermissions(def file){
String temp="-"
if(file.canRead())temp="r"
return temp
}
public def getWritePermissions(def file){
String temp="-"
if(file.canWrite())temp="w"
return temp
}
public def getExecutePermissions(def file){
String temp="-"
if(file.canExecute())temp="x"
return temp
}
public def getRelativePath(def main, def file){
return file.toString().replaceAll(main, "")
}
public static void main(String[] args) {
CheckSumCRC32 crc = new CheckSumCRC32();
crc.getFileListing("E:\\temp\\recurssive")
}
}
Output
release.zip, 25f995583144bebff729086ae6ec0eb2, 04/06/2012, 6301510b, rwx, 04/06/2012, 02:46:32 PM
file\check\release-1.0.zip, 3cc0f2b13778129c0cc41fb2fdc7a85f, 18/07/2012, 11786307b, rwx, 18/07/2012, 04:13:47 PM
file\Dedicated.mp3, 238f793f0b80e7eacf5fac31d23c65d4, 04/05/2010, 4650908b, rwx, 04/05/2010, 10:45:32 AM
But I still need a way to get the hard link count, owner, and group. From searching the net, it looks like Java 7 has this capability, and I am stuck with Java 6. Any help?

Take a look at http://www.jonelo.de/java/jacksum/index.html - it is reported to provide cksum - compatible CRC32 checksums.
BTW, I tried using java.util.zip.CRC32 to calculate checksums, and it gives a different value than cksum does, so it must use a slightly different algorithm.
EDIT: I tried jacksum, and it works, but you have to tell it to use the 'cksum' algorithm - apparently that is different from crc32, which jacksum also supports.
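For what it's worth, the difference is small enough to code by hand. As far as I understand the POSIX description, cksum uses the same generator polynomial as CRC-32 but starts from zero, processes bits most significant first, and feeds the length of the data in after the data itself. Below is a minimal Java sketch based on that reading. PosixCksum is a made-up name, and the result should be compared with the real cksum on each platform before relying on it.

```java
class PosixCksum {
    private static final int[] TABLE = new int[256];

    static {
        // Table for polynomial 0x04C11DB7, most significant bit first (no bit reflection).
        for (int i = 0; i < 256; i++) {
            int c = i << 24;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 0x80000000) != 0 ? (c << 1) ^ 0x04C11DB7 : (c << 1);
            }
            TABLE[i] = c;
        }
    }

    // Returns the checksum as an unsigned 32-bit value held in a long.
    static long cksum(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ b) & 0xFF];
        }
        // Feed the length, least significant octet first, using as few octets as needed.
        for (long len = data.length; len != 0; len >>>= 8) {
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ (int) len) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // Empty input: no data and no length octets, so only the final complement applies.
        System.out.println(cksum(new byte[0])); // 4294967295
    }
}
```

The sketch avoids anything newer than Java 6 and takes the whole content as a byte array for brevity. For large files the first loop would be fed from a stream while counting the bytes read.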

Well, you could run the command, then, for each line, run the cksum and append it to the line.
I did the following:
dir = "/home/will"
"ls -latr $dir".execute().in.eachLine { line ->
// let's omit the first line, which starts with "total"
if (line =~ /^total/) return
// for directories, we just print the line
if (line =~ /^d/)
{
println line
}
else
{
// for files, we split the line by one or more spaces and join
// the last pieces to form the filename (there must be a better
// way to do this)
def fileName = line.split(/ {1,}/)[8..-1].join(" ")
// now we get the first part of the cksum
def cksum = "cksum $dir/$fileName".execute().in.text.split(/ {1,}/)[0]
// concat the result to the original line and print it
println "$line $cksum"
}
}
Special attention to my "there must be a better way to do this".
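One possibly better way, sketched in Java since the output ends up parsed there anyway: split on whitespace with a limit of 9, so the ninth field keeps everything after the time, including spaces in the file name. LsLineParser and parse are hypothetical names. The sketch assumes the layout shown in the question (eight fields before the name) and does not handle symbolic links, whose lines carry an extra "-> target" part.

```java
class LsLineParser {
    // Splits one line of "ls -l" style output into at most 9 fields:
    // permission, hardlink, owner, group, fsize, month, date, time, filename.
    static String[] parse(String line) {
        return line.trim().split("\\s+", 9);
    }

    public static void main(String[] args) {
        String[] fields = parse("-rwxrwxrwx 1 ricky support 924 Aug 29 12:00 initservice.sh");
        System.out.println(fields[0]); // -rwxrwxrwx
        System.out.println(fields[8]); // initservice.sh
    }
}
```

The limit argument is what matters here: without it, a name such as "my notes.txt" would be cut into two fields.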

Is there a way to fix wrong encoded strings?

I am getting this string via a message broker (Stomp):
JoÃÂ£o
and this is how it is supposed to be:
João
Is there a way to revert this in Java?!
Thanks!

U+00C3 Ã c3 83 LATIN CAPITAL LETTER A WITH TILDE
U+00C2 Â c3 82 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+00A3 £ c2 a3 POUND SIGN
U+00E3 ã c3 a3 LATIN SMALL LETTER A WITH TILDE
I'm having trouble determining how this could be a data (encoding) conversion problem. Is it possible the data is just bad?
If the data isn't bad, then we have to assume you are misinterpreting the encoding. We don't know the original encoding, and unless you're doing something different, Java holds strings internally as UTF-16. I don't see how João encoded in any common encoding could be interpreted as JoÃÂ£o in UTF-16.
Just to be sure, I whipped this python script up with no match found. I'm not entirely sure it covers all encodings or I'm not missing a corner case, FWIW.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pkgutil
import encodings
good = u'João'
bad = u'JoÃÂ£o'
false_positives = set(["aliases"])
found = set(name for imp, name, ispkg in pkgutil.iter_modules(encodings.__path__) if not ispkg)
found.difference_update(false_positives)
print found
for x in found:
    for y in found:
        res = None
        try:
            res = good.encode(x).decode(y)
            print res,x,y
        except:
            pass
        if not res is None:
            if res == bad:
                print "FOUND"
                exit(1)

In some cases a hack works. But best is to prevent it from ever happening.
I had this problem before when I had a servlet that correctly printed the correct headers and http content type and encoding on the page, but IE would submit forms encoded with latin1 instead of the correct one. So I created a quick dirty hack (involving a request wrapper that detects and converts if it is indeed IE) to fix it for new data which worked fine. And for the data in the database that was already messed up, I used the following hack.
Unfortunately my hack doesn't work perfectly for your example string, but it looks very close (just an extra Ã in your broken string compared to my 'theoretical cause' reproduced broken string). So perhaps my guess of "latin1" is wrong, and you should try others (such as in that other link posted by Tomas).
package peter.test;
import java.io.UnsupportedEncodingException;
/**
* User: peter
* Date: 2012-04-12
* Time: 11:02 AM
*/
public class TestEncoding {
public static void main(String args[]) throws UnsupportedEncodingException {
//In some cases a hack works. But best is to prevent it from ever happening.
String good = "João";
String bad = "JoÃÂ£o";
//this line demonstrates what the "broken" string should look like if it is reversible.
String broken = breakString(good, bad);
//here we show that it is fixable if broken like breakString() does it.
fixString(good, broken);
//this line attempts to fix the string, but it is not fixable unless broken in the same way as breakString()
fixString(good, bad);
}
private static String fixString(String good, String bad) throws UnsupportedEncodingException {
byte[] bytes = bad.getBytes("latin1"); //read the Java bytes as if they were latin1 (if this works, it should result in the same number of bytes as java characters; if using UTF8, it would be more bytes)
String fixed = new String(bytes, "UTF8"); //take the raw bytes, and try to convert them to a string as if they were UTF8
System.out.println("Good: " + good);
System.out.println("Bad: " + bad);
System.out.println("bytes1.length: " + bytes.length);
System.out.println("fixed: " + fixed);
System.out.println();
return fixed;
}
private static String breakString(String good, String bad) throws UnsupportedEncodingException {
byte[] bytes = good.getBytes("UTF8");
String broken = new String(bytes, "latin1");
System.out.println("Good: " + good);
System.out.println("Bad: " + bad);
System.out.println("bytes1.length: " + bytes.length);
System.out.println("broken: " + broken);
System.out.println();
return broken;
}
}
And the result (with Sun jdk 1.7.0_03):
Good: João
Bad: JoÃÂ£o
bytes1.length: 5
broken: JoÃ£o
Good: João
Bad: JoÃ£o
bytes1.length: 5
fixed: João
Good: João
Bad: JoÃÂ£o
bytes1.length: 6
fixed: Jo�£o
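One possible explanation for the extra Ã, offered as an assumption rather than a diagnosis: the text was mis-decoded twice, and the second round left an invisible control character (U+0083) between Ã and Â that was lost when the string was pasted. The byte table in the earlier answer fits this, since c3 83 c2 a3 is what "Ã£" looks like in UTF-8. A minimal sketch under that assumption, with made-up names (MojibakeFix, fixOnce):

```java
import java.nio.charset.StandardCharsets;

class MojibakeFix {
    // Undoes one round of "UTF-8 bytes read as ISO-8859-1".
    static String fixOnce(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the broker string, with U+0083 restored
        // between the two accented capitals. Escapes keep the source ASCII-only.
        String suspected = "Jo\u00C3\u0083\u00C2\u00A3o";
        String once = fixOnce(suspected);  // J, o, U+00C3, U+00A3, o
        String twice = fixOnce(once);      // J, o, U+00E3, o
        System.out.println(once.length());             // 5
        System.out.println(twice.equals("Jo\u00E3o")); // true
    }
}
```

If the string arriving from the broker still contains the U+0083 character, applying the fix twice should recover it. If that character is already gone, as in the pasted sample, the second byte pair is broken and the result is the same replacement character shown in the output above.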

Java: List Cameras that are Plugged In

My program currently gets a list of drives plugged into the computer with File.listRoots(). But, when I plug a camera or an MP3 player into the computer directly (instead of inserting the memory card), it's not listed, nor does it have a drive letter in Windows Explorer. For example, here's the location of my camera:
Computer\Canon PowerShot SD750\Removable storage
How can I also list cameras/other devices that do not have a drive letter? I assume this will require a JNI library of some sort, but I don't know for sure obviously.
Thanks!
P.S. Out of desperation, I did try to list the contents of Computer\; it didn't work of course.
Update: I found this question here: Portable Device Path on Windows ; that's exactly the problem I'm having, but there is no solution laid out there.

Java 7 has some promising looking classes in this area, like this one:
http://download.java.net/jdk7/docs/api/java/nio/file/FileSystem.html
Assuming that you need it to work on Java 6 as well, I would suggest running a shell script and parsing its output.
On Windows you could run mountvol, on Unix/MacOS X mount etc. Of course parsing the output would be somewhat tedious and you would have to worry about every OS your app runs on, but hey, at least... not sure what.... it works?
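A rough sketch of the run-and-parse idea, assuming the mountvol and mount commands named above are on the PATH. CommandOutput and run are made-up names. It is written for a current JDK (try-with-resources and UncheckedIOException are not available on Java 6), and it only collects the lines; the per-OS parsing is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommandOutput {
    // Runs a command and returns everything it prints (stdout and stderr merged), line by line.
    static List<String> run(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        List<String> lines = new ArrayList<String>();
        try {
            Process process = builder.start();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command[0], e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pick the command by operating system, as suggested in the text above.
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        for (String line : run(windows ? "mountvol" : "mount")) {
            System.out.println(line);
        }
    }
}
```

Keep in mind that a device without a drive letter or mount point may not show up in this output at all.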
Here is some sample code which seems helpful on Windows:
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
& "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colItems = objWMIService.ExecQuery("Select * from Win32_Volume")
For Each objItem In colItems
WScript.Echo "Automount: " & objItem.Automount
WScript.Echo "Block Size: " & objItem.BlockSize
WScript.Echo "Capacity: " & objItem.Capacity
WScript.Echo "Caption: " & objItem.Caption
WScript.Echo "Compressed: " & objItem.Compressed
WScript.Echo "Device ID: " & objItem.DeviceID
WScript.Echo "Dirty Bit Set: " & objItem.DirtyBitSet
WScript.Echo "Drive Letter: " & objItem.DriveLetter
WScript.Echo "Drive Type: " & objItem.DriveType
WScript.Echo "File System: " & objItem.FileSystem
WScript.Echo "Free Space: " & objItem.FreeSpace
WScript.Echo "Indexing Enabled: " & objItem.IndexingEnabled
WScript.Echo "Label: " & objItem.Label
WScript.Echo "Maximum File Name Length: " & objItem.MaximumFileNameLength
WScript.Echo "Name: " & objItem.Name
WScript.Echo "Quotas Enabled: " & objItem.QuotasEnabled
WScript.Echo "Quotas Incomplete: " & objItem.QuotasIncomplete
WScript.Echo "Quotas Rebuilding: " & objItem.QuotasRebuilding
WScript.Echo "Serial Number: " & objItem.SerialNumber
WScript.Echo "Supports Disk Quotas: " & objItem.SupportsDiskQuotas
WScript.Echo "Supports File-Based Compression: " & _
objItem.SupportsFileBasedCompression
WScript.Echo
Next
Here is the output I got for my ebook reader:
Automount: True
Block Size: 4096
Capacity: 999120896
Caption: G:\
Compressed:
Device ID: \\?\Volume{8e3b4ce5-a124-11e0-9d2b-e30c5839642d}\
Dirty Bit Set: False
Drive Letter: G:
Drive Type: 2
File System: FAT32
Free Space: 663683072
Indexing Enabled:
Label: PocketBook9
Maximum File Name Length: 255
Name: G:\
Quotas Enabled:
Quotas Incomplete:
Quotas Rebuilding:
Serial Number: 1276177233
Supports Disk Quotas: False
Supports File-Based Compression: False

The above problem can be solved using the jMTP library at
http://code.google.com/p/jmtp/
Here is my code
package jmtp;
import be.derycke.pieter.com.COMException;
import be.derycke.pieter.com.Guid;
import java.io.*;
import java.math.BigInteger;
import jmtp.PortableDevice;
import jmtp.*;
public class Jmtp {
public static void main(String[] args) {
PortableDeviceManager manager = new PortableDeviceManager();
PortableDevice device = manager.getDevices()[0];
// Connect to my mp3-player
device.open();
System.out.println(device.getModel());
System.out.println("---------------");
// Iterate over deviceObjects
for (PortableDeviceObject object : device.getRootObjects()) {
// If the object is a storage object
if (object instanceof PortableDeviceStorageObject) {
PortableDeviceStorageObject storage = (PortableDeviceStorageObject) object;
for (PortableDeviceObject o2 : storage.getChildObjects()) {
//
// BigInteger bigInteger1 = new BigInteger("123456789");
// File file = new File("c:/JavaAppletSigningGuide.pdf");
// try {
// storage.addAudioObject(file, "jj", "jj", bigInteger1);
// } catch (Exception e) {
// //System.out.println("Exception e = " + e);
// }
//
System.out.println(o2.getOriginalFileName());
}
}
}
manager.getDevices()[0].close();
}
}
Do not forget to add the jmtp.dll files (which come with the jMTP download) as a native library. For more info, see my answer at
http://stackoverflow.com/questions/12798530/including-native-library-in-netbeans

This may not be the answer you're looking for, but is assigning them to a drive letter not an option? You can usually manually do this with USB devices on Windows using My Computer > right-click > Manage > Storage.
It's possible that CaptureDeviceManager in JMF (java media framework) could help you but I kind of doubt it.

Maybe you can take a look at the Morena Framework, http://www.gnome.sk/Twain/jtp.htmlv (it seems to be open source but a little expensive, though there is a free evaluation version). It is for TWAIN-compatible scanners/cameras (Windows/Mac) or SANE-compatible ones (Linux or other Unix flavors). To get a list of connected devices, you can do this:
import SK.gnome.morena.*;
import SK.gnome.twain.*;
public class Test
{
public static void main(String[] args) throws Exception
{
TwainSource[] sources=TwainManager.listSources();
if(sources == null) return;
for(int i = 0; i < sources.length; i++)
{
System.out.println("Twain source is: " + sources[i].toString());
}
}
}
Maybe that could help; if not, I think JMF may be a possible solution.

Parse a task list

A file contains the following:
HPWAMain.exe 3876 Console 1 8,112 K
hpqwmiex.exe 3900 Services 0 6,256 K
WmiPrvSE.exe 3924 Services 0 8,576 K
jusched.exe 3960 Console 1 5,128 K
DivXUpdate.exe 3044 Console 1 16,160 K
WiFiMsg.exe 3984 Console 1 6,404 K
HpqToaster.exe 2236 Console 1 7,188 K
wmpnscfg.exe 3784 Console 1 6,536 K
wmpnetwk.exe 3732 Services 0 11,196 K
skypePM.exe 2040 Console 1 25,960 K
I want to get the process ID of the skypePM.exe. How is this possible in Java?
Any help is appreciated.

Algorithm
Open the file.
In a loop, read a line of text.
If the line of text starts with skypePM.exe then extract the number.
Repeat looping until all lines have been read from the file.
Close the file.
Implementation
import java.io.*;
public class T {
public static void main( String args[] ) throws Exception {
BufferedReader br = new BufferedReader(
new InputStreamReader(
new FileInputStream( "tasklist.txt" ) ) );
String line;
while( (line = br.readLine()) != null ) {
if( line.startsWith( "skypePM.exe" ) ) {
line = line.substring( "skypePM.exe".length() );
int taskId = Integer.parseInt( (line.trim().split( " " ))[0] );
System.out.println( "Task Id: " + taskId );
}
}
br.close();
}
}
Alternate Implementation
If you have Cygwin and related tools installed, you could use:
cat tasklist.txt | grep skypePM.exe | awk '{ print $2; }'

To find the process ID of the application skypePM.exe:
Open the file.
Read the lines one by one.
Find the line that begins with skypePM.exe.
In that line, skip the spaces after the process name and parse the number that follows.
That number is the process ID.
It is all string operations.
Remember that the format of the file must not change after you write the code.

If you really want to parse the output, you may need a different strategy. If your output file really is the result of a tasklist execution, then it should have some column headers at the top of it like:
Image Name PID Session Name Session# Mem Usage
========================= ======== ================ =========== ============
I would use these, in particular the set of equal signs with spaces, to break any subsequent strings using a fixed-width column strategy. This way, you could have more flexibility in parsing the output if needed (i.e. maybe someone is looking for java.exe or wjava.exe). Do keep in mind the last column may not be padded with spaces all the way to the end.
I will say, in the strictest sense, the existing answers should work for just getting the PID.
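A minimal sketch of the fixed-width strategy, assuming the separator line of equal signs is present in the file as shown above. FixedWidthColumns, columnBounds and slice are hypothetical names. Column boundaries come from the runs of equal signs, and each data line is then cut at those positions. Lines shorter than the last column are tolerated, since the last column may not be padded.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthColumns {
    // Returns one {start, end} pair (end exclusive) for every run of '=' in the separator line.
    static List<int[]> columnBounds(String separator) {
        List<int[]> bounds = new ArrayList<int[]>();
        int i = 0;
        while (i < separator.length()) {
            if (separator.charAt(i) == '=') {
                int start = i;
                while (i < separator.length() && separator.charAt(i) == '=') {
                    i++;
                }
                bounds.add(new int[] { start, i });
            } else {
                i++;
            }
        }
        return bounds;
    }

    // Cuts a data line at the given boundaries and trims each cell.
    static String[] slice(String line, List<int[]> bounds) {
        String[] cells = new String[bounds.size()];
        for (int c = 0; c < cells.length; c++) {
            int start = Math.min(bounds.get(c)[0], line.length());
            int end = Math.min(bounds.get(c)[1], line.length());
            cells[c] = line.substring(start, end).trim();
        }
        return cells;
    }

    public static void main(String[] args) {
        String separator = "========================= ======== ================ =========== ============";
        // Sample row built with matching widths; a real row would be read from the file.
        String row = String.format("%-25s %8s %-16s %11s %12s",
                "skypePM.exe", "2040", "Console", "1", "25,960 K");
        String[] cells = slice(row, columnBounds(separator));
        System.out.println("Image: " + cells[0] + ", PID: " + cells[1]);
    }
}
```

Matching on the first cell (for example with equalsIgnoreCase) then makes it easy to look for java.exe or any other image name without changing the parsing.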

Implementing this in Java is not a good approach. Shell or other scripting languages may help you a lot. Anyway, Jawk is an implementation of awk in Java; I think it may help you.
