Find out programmatically which code page is being used for stderr - java

I've made some progress on my previous question, and discovered that the message being displayed is redirected from stderr and is encoded with Cp850.
So, how can I discover this from within a Java program? Reading the Wikipedia page on code page 850 and looking at the characters it can encode, it's obvious that this won't be used on all Windows systems, because it doesn't include the characters needed for, say, Japanese. (And I assume it won't be used on non-Windows systems.)
If I print the result of Charset.defaultCharset(), I get windows-1252. But using Cp1252 to decode the output from stderr doesn't give the correct text: what should be è prints as Š.
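For reference, that mismatch is exactly what you get when a Cp850-encoded byte is decoded as windows-1252; the following small standalone sketch (the class name is just illustrative) reproduces it:

import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] raw = "è".getBytes("Cp850");                   // the single byte 0x8A
        System.out.println(new String(raw, "Cp850"));         // è
        System.out.println(new String(raw, "windows-1252"));  // Š
    }
}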
This has been asked before in a slightly different way, but the answers given are just wrong or don't apply to what I'm trying to do. Is this possible to do in Java? If not, what are people doing to get a stderr message to the user in a format they can read?

Here's what worked for me, for the benefit of future Stack Overflowers.
import com.sun.jna.Library;
import com.sun.jna.Native;

public static int getCodePage() {
    Kernel32 lib = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);
    return lib.GetOEMCP(); // OEM code page used by the console, e.g. 850
}

// Minimal JNA binding for the one Win32 call we need.
public interface Kernel32 extends Library {
    public int GetOEMCP();
}
The code page returned will be an int corresponding to the list at Code Page Identifiers. These can be correlated to the list of supported encodings for Java. (I made a static hashmap. I'm sure there are other, nicer ways to do it.)
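For illustration, a partial mapping of that kind might look like the sketch below; the entries shown are my own selection and the class name is arbitrary:

import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class CodePageMapping {

    private static final Map<Integer, String> CODE_PAGES = new HashMap<>();
    static {
        CODE_PAGES.put(437, "Cp437");          // US OEM
        CODE_PAGES.put(850, "Cp850");          // Western European OEM
        CODE_PAGES.put(932, "windows-31j");    // Japanese (Shift_JIS variant)
        CODE_PAGES.put(1252, "windows-1252");  // Windows Latin-1
    }

    public static Charset charsetForCodePage(int codePage) {
        String name = CODE_PAGES.get(codePage);
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }
}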

Related

JNA call with String behaves differently from one with byte[]

I have a JNA Java interface for a C function mpv_set_option_string defined as:
public interface MPV extends StdCallLibrary {
    MPV INSTANCE = Native.loadLibrary("lib/mpv-1.dll", MPV.class, W32APIOptions.DEFAULT_OPTIONS);

    long mpv_create();
    int mpv_initialize(long handle);
    int mpv_set_option_string(long handle, String name, String data);
}
When I call this like this:
System.setProperty("jna.encoding", "UTF8");
long handle = MPV.INSTANCE.mpv_create();
int error = MPV.INSTANCE.mpv_initialize(handle);
error = MPV.INSTANCE.mpv_set_option_string(handle, "keep-open", "always");
I get an error back (-5) from the last call, indicating the option (keep-open) is not found.
However, if I change the JNA function signature to:
int mpv_set_option_string(long handle, byte[] name, byte[] data);
...and then call it like this:
error = MPV.INSTANCE.mpv_set_option_string(
        handle,
        "keep-open\0".getBytes(StandardCharsets.UTF_8),
        "always\0".getBytes(StandardCharsets.UTF_8)
);
...it returns no error (0) and works correctly (or so it seems).
What I don't get is that JNA is supposed to encode String by default as a NUL-terminated char* using UTF-8 (exactly what I do manually), yet I get different results.
Anyone able to shed some light on this?
You shouldn't be passing W32APIOptions to a library that isn't a Win32 API.
By default, JNA maps String to char*, so removing the options should fix the issue for you.
You should also be using an explicit native type for your handle instead of Java long. Pointer is probably correct in this case.
Looks like I found the issue, although I'm not 100% sure what is happening.
It seems that using W32APIOptions.DEFAULT_OPTIONS means it will use the UNICODE settings (because the w32.ascii property is false). This looked okay to me, as mpv-1.dll works with UTF-8 strings only, which is Unicode.
However, now I'm guessing that in this case it means it will call a wide-char version of the library function (and if that doesn't exist, still call the original function), and probably means it encodes Strings with two bytes per character. This is because most Win32 libraries have an ASCII and a WIDE version of methods accepting strings, but nothing for UTF-8.
Since mpv-1.dll only accepts UTF-8 (and isn't really Win32), strings should just be encoded as bytes in UTF-8 format (basically, just leave them alone). To let JNA know this, either do not pass a W32APIOptions map at all, or select ASCII_OPTIONS manually.
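A minimal sketch of that fix, reusing the declarations from the question (the DLL path and function names are the asker's, not verified here): drop the W32APIOptions map so JNA passes each String as a plain NUL-terminated char* in the jna.encoding charset.

import com.sun.jna.Library;
import com.sun.jna.Native;

public interface MPV extends Library {
    MPV INSTANCE = Native.loadLibrary("lib/mpv-1.dll", MPV.class);

    long mpv_create();
    int mpv_initialize(long handle);
    int mpv_set_option_string(long handle, String name, String data);
}

With jna.encoding set to UTF-8 as in the question, the original String-based call should then reach the library with the same bytes as the manual byte[] version.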

Some Chinese characters cannot be interpreted when REST call is triggered

Hopefully there are some of you who already encountered this problem and had a fix for it.
I've been encountering a problem with some Chinese characters like '𨍭' and '𠀪': when the REST call is triggered, they arrive as 'ࠀª' and our backend system cannot interpret them.
I've created a simple Spring Boot program to check; see below. The character encoding is set to UTF-8.
@RequestMapping(value = "/chinese", method = RequestMethod.POST, produces = "application/json", consumes = "application/json;charset=UTF-8")
public String interpret(@RequestBody SampleRESTBean bean) {
    String value = bean.getChineseName();
    return value;
}
The bean object is below:
public class SampleRESTBean {

    private String chineseName;

    public String getChineseName() {
        return chineseName;
    }

    public void setChineseName(String chineseName) {
        this.chineseName = chineseName;
    }
}
Now, to test (in Postman), My JSON object is set as below
{"chineseName":"𨍭獲𠀪保意外,另危疾加1/1000" }
When the object is inside the method, it looks like this
਍­獲ࠀª保意外,另危疾加1/1000
The other Chinese characters are interpreted but some specific characters are not.
Is there a solution for this kind of problem? I've searched many references but still cannot fix this issue. A solution for JAX-RS would also help, because it produces the same issue there.
Need your assistance please. Thanks!
A little bit of research shows that:
The Unicode codepoint for 𨍭 is U+2836D.
The Unicode codepoint for 𠀪 is U+2002A.
Both are codepoints in code plane 2.
I suspect that the problem is in the way that your client-side or server-side software stack is handling characters that are not in Unicode code-plane 0 (aka the BMP). These are represented as 2 char values when the data is converted to an array of char (or a String, StringBuffer, etcetera.) Some (old) libraries and a lot of custom code don't handle this properly.
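A small self-contained illustration of that representation (my own example, not from the original answer):

public class SurrogateDemo {
    public static void main(String[] args) {
        // U+2836D (𨍭) is outside the BMP, so a Java String stores it as two chars.
        String s = new StringBuilder().appendCodePoint(0x2836D).toString();
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1));
        // prints D860 DF6D -- the high/low surrogate pair for U+2836D
    }
}

Any code that walks the data char by char, or re-encodes it with the wrong charset, can mangle exactly these pairs.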
I suggest that you do the following:
Brush up on your understanding of Unicode, UTF-16, UTF-8 and "surrogate pairs".
Brush up on your skills in using a Java debugger.
Use the debugger to find out what is happening on the client and server side [1].
Once you have found what is causing the problem:
If it is your code, fix it.
If it is someone else's code, submit a bug report and/or look for a newer version that has already fixed the problem.
[1] Set breakpoints at key points and look at the char values in the data structures that are passing the data around. Look for where the values seem to go "bad".

How do I mask passwords in a command-line Java program with asterisks or the like?

The standard JVM method for reading a password from the command line without showing it is java.io.Console.readPassword(). This, however, shows nothing while the user is typing; users accustomed to graphical programs will expect symbols such as "•" or "*" to appear in place of the characters they type. Naturally, they will also want backspacing, inserting, and so on to work as normal, just with all the characters being operated on replaced with the same symbol.
In 2019, is there a generally accepted JVM procedure for showing "*******" when the user types "hunter2" in a console application? Can this even be done properly without a GUI? A 2011 SO question on the topic got an answer linking to this article on the topic; can we do better nowadays than the rather elaborate solution shown therein?
(I happen to be using Kotlin as my language of choice, so a Kotlin-specific solution will satisfy if there is one.)
hunter2? Wow. Reference acknowledged.
There is no easy way. The primary problem is that the standard System.in doesn't give you any characters at all until the user has pressed enter, so there's no way to emulate it (if you try to read char-for-char from System.in and emit a * every time a key is pressed, that won't work).
The lanterna library at https://github.com/mabe02/lanterna can do it, though. If you want to emulate what it does yourself, it's very complicated: it has branching code paths for Unix and Windows. For example, on Unix it uses some hackery to figure out which tty you're on, and then opens the right /dev/tty device. With lanterna, writing this feature would be trivial.
It's that or accept Console.readPassword()'s blank nothingness, really. Or write a web interface or a Swing/AWT/JavaFX GUI.
I think the answer to your question can be found on Stack Overflow itself.
Please see this:
masking-password-input-from-the-console-java
Sample code from there:
import java.io.Console;

public class Main {

    public void passwordExample() {
        Console console = System.console();
        if (console == null) {
            System.out.println("Couldn't get Console instance");
            System.exit(0);
        }
        console.printf("Testing password%n");
        char[] passwordArray = console.readPassword("Enter your secret password: ");
        console.printf("Password entered was: %s%n", new String(passwordArray));
    }

    public static void main(String[] args) {
        new Main().passwordExample();
    }
}
hope this is helpful. :)

Preventing "Null Byte Attacks" | Java

My initial understanding of this topic is that I need to filter out certain junk characters in the request to avoid these attacks.
I have decided to solve this by pattern matching on every request parameter before using it. Most of the posts available on the internet talk about null bytes, and the examples given show how file I/O operations are the main victims of this attack. So the following are my questions:
Are file I/O operations the only thing a null byte can affect, or can other operations also fall victim to this attack?
Which characters/strings/patterns do I need to take care of if I want to filter my request parameters so they are safe from null byte attacks? I have a list, and I am sure it is not a complete one: %00, \0, and 0x00 in hex.
The articles that I am referring to are:
http://projects.webappsec.org/w/page/13246949/Null%20Byte%20Injection
http://www.perlmonks.org/index.pl?node_id=38548
http://hakipedia.com/index.php/Poison_Null_Byte
Thanks in advance
So to make it more clear:
The first post points out the vulnerability in Java that I am talking about. The string serverlogs.txt%00.db is allowed in Java, but in C/C++ it becomes serverlogs.txt, since in C the %00 would be replaced by a null byte, causing the string to terminate after serverlogs.txt. So we should avoid such characters. This is what I am trying to figure out: which such characters should I not allow?
String fn = request.getParameter("fn");
if (fn.endsWith(".db")) {
    File f = new File(fn);
    // read the contents of “f” file
    …
}
Have you tried it? I wrote this quick unit test:
@Test
public void test() throws Exception {
    FileOutputStream out = new FileOutputStream("test.txt");
    out.write("hello!".getBytes("utf-8"));
    out.close();

    String badPath = "test.txt\0foo";
    File file = new File(badPath);
    FileInputStream in = new FileInputStream(file);
    System.out.println(StreamUtils.copyToString(in, Charset.forName("utf-8")));
}
Now, if the null character broke the string, I would expect to have the contents of my file printed to the console. Instead, I get a FileNotFoundException. For the record, this was using Java 1.7.0_40 on Ubuntu 13.04.
Update
Further investigation reveals this code in File#isInvalid:
final boolean isInvalid() {
    if (status == null) {
        status = (this.path.indexOf('\u0000') < 0) ? PathStatus.CHECKED
                                                   : PathStatus.INVALID;
    }
    return status == PathStatus.INVALID;
}
Not a bad question. I'm doubtful that this is a valid vulnerability on all platforms (for example, I believe Windows uses Pascal-style strings, not null-terminated strings, in its kernel), but I would not at all be surprised if some platforms and JVMs were in fact vulnerable to this kind of attack.
The key point to consider is where your strings are coming from, and what you're doing to those bytes before you interact with them as strings. Any bytes coming from a remote machine should always be assumed to be malicious until proven otherwise. And you should never take strings that come over the Internet and try to turn them into paths on your local machine. Yes, web servers like Apache do this, but that's also the most vulnerable code they have. The correct solution is: don't try to blacklist bad data (like null bytes), only whitelist good data.
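A hedged sketch of that whitelisting idea (the pattern, base directory, and method name below are my own illustrative choices, not from the answer): accept only file names matching a strict pattern, rather than trying to strip out "bad" bytes.

import java.io.File;
import java.util.regex.Pattern;

public class FileNameWhitelist {

    // Hypothetical policy: short names of letters, digits, '_' or '-', ending in ".db".
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_-]{1,64}\\.db");
    private static final File BASE_DIR = new File("/var/app/logs"); // illustrative base directory

    static File resolveLogFile(String requested) {
        if (requested == null || !SAFE_NAME.matcher(requested).matches()) {
            throw new IllegalArgumentException("rejected file name");
        }
        return new File(BASE_DIR, requested);
    }
}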
You might also fight the null byte issue from the other angle!
In May 2013 Oracle fixed the problem: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8014846
So, upgrade to Java 8 or Java 7u40 and you are protected.
(Yes, I tested it!) It works!
If a link to my personal blog is not considered spam, I'll drop it here:
http://crocode.blogspot.ru/2015/03/java-null-byte-injections.html
If I'm reading your question correctly, you want to prevent executable code from being injected into memory after the terminating null byte of a string.
Java ain't C.
Java doesn't use terminating null bytes for its strings, so you don't need to protect against this.
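To illustrate the point (my own example, reusing the file name from the question): an embedded '\0' is just another character in a Java String.

public class NullByteDemo {
    public static void main(String[] args) {
        String s = "serverlogs.txt\0.db";
        System.out.println(s.length());        // 18 -- the NUL doesn't terminate anything
        System.out.println(s.endsWith(".db")); // true
    }
}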

Rename PDF from PDF title

I want to organize my PDF files downloaded from the internet. It is clear that many of them are ill-named, so I want to extract the real title from each file. Many of them are generated from LaTeX, and I think that from the compiled PDF we could find the \title{} keyword or something like that. I then want to use this to rename the file.
I can read the metadata using pyPdf, but most PDFs do not contain the title in their metadata. I have tried it with my whole collection and found none!
Two questions:
1. Is it possible to read the title from a PDF compiled from LaTeX?
2. Which library (mainly in C/C++, Java, or Python) can I use to get that information?
Thanks in advance.
I think this is not really possible. The LaTeX information is no longer present in the pdf. If the title is not present in the metadata, you might be able to deduce the title from the structure information if it is a "tagged pdf". Most pdfs aren't however, and those that are will probably provide the metadata anyway.
This leaves you with layout analysis: try to determine what is the title from the document by looking at layout characteristics. For python, you might want to have a look at pdfminer.
The following example uses pdfminer to determine the title using a rather simplistic approach:
we assume that the title is somewhere on the first page
we leave it to pdfminer to recognize "blocks of text" on the first page
we assume that the title is printed "bigger" than the rest of the page. Looking at the height of each line in the text blocks, we determine which block contains the "tallest" line, and assume that that block contains the title
we let pdfminer extract the text from the block,
the text will probably contain newlines (placed by pdfminer) because the title might contain more than one line, and other needless whitespace, so we do some simple whitespace normalization (replace consecutive whitespace by a single space, and strip leading and trailing whitespace), and that's it!
As I said: this approach is rather simplistic, and might or might not give good results for your documents, but it may point you in the right direction. Here it goes:
import sys
import re
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox
filename = sys.argv[1]
fp = open(filename, 'rb')
parser = PDFParser(fp)
doc = PDFDocument()
parser.set_document(doc)
doc.set_parser(parser)
doc.initialize()
rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interp = PDFPageInterpreter(rsrcmgr, device)
pages = doc.get_pages()
first_page = pages.next()
interp.process_page(first_page)
layout = device.get_result()
textboxes = [i for i in layout if isinstance(i, LTTextBox)]
box_with_tallest_line = max(textboxes, key=lambda x: max(i.height for i in x))
text = box_with_tallest_line.get_text()
print re.sub('\s+', ' ', text).strip()
I'll leave renaming the file to you (note that the title might contain characters that you might not want, or that are not even valid in filenames). Pdfminer documentation is rather sparse at the moment, so you might want to ask on the mailing list if you need to know more. (don't know very much about it myself, but couldn't resist trying ;-)). Or you might try a similar approach with other pdf libraries/other languages.
In python, your best bet is to look at pyPdf (Debian package: python-pypdf). Here's some code:
import pyPdf, sys
filename=sys.argv[1]
i=pyPdf.PdfFileReader(open(filename,"rb"))
d=i.getDocumentInfo()
print d["/Title"]
In my experience, few PDFs have the "/Title" attribute set, though, so your mileage may vary. In that case, you'll have to guess the title from the contents, which is bound to be error-prone. pyPdf may help you with that as well.
Try iText (Java). I found this example, try it (you may add generics, if supported):
PdfReader reader = new PdfReader("yourpdf.pdf");
HashMap map = reader.getInfo();
Set keys = map.keySet();
Iterator i = keys.iterator();
while (i.hasNext()) {
    String thiskey = (String) i.next();
    System.out.println(thiskey + ":" + (String) map.get(thiskey));
}
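As a follow-up sketch of my own (not part of the original example, and assuming the info map exposes a "Title" entry): once the title has been read, it can be used to rename the file, after replacing characters that are commonly invalid in file names.

// Needs java.io.File in addition to the iText imports above.
String title = (String) map.get("Title");
if (title != null && !title.trim().isEmpty()) {
    String safe = title.trim().replaceAll("[\\\\/:*?\"<>|]+", "_");
    File source = new File("yourpdf.pdf");
    File target = new File(source.getParentFile(), safe + ".pdf");
    boolean renamed = source.renameTo(target); // check the result in real code
}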
Another option for C++ is Poppler.
I tried to do something similar in the past (and was asking for advice here:
Extracting text from PDF with Poppler (C++)) but never really got it working. At the end of the day I realised that, at least for my use, it was easier to rename the files manually.
The best solution I found for renaming PDF files using not just the title, but any text you need from the PDF file, is the A-PDF Rename app; it worked very well for all the files I tried.
