Confusion Decoding Using Format and Printf

I am very new to String formatting using the format() and printf() methods. I have read the tutorial on the Oracle site but find it very confusing, so I decided to try some examples. I got this sample and have understood that the output is 124.00.
public class TestStringFormatter {
public static void main(String[] args){
/* I do understand % - denotes start of instruction
, is the flag
6 - denotes width
2 - Denotes precision
f - Type */
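String s = String.format("%,6.2f", 124.000);
// println rather than printf: s is already formatted, and printf would treat any '%' in it as a format specifier
System.out.println(s);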
}
}
What I am not able to understand is the , flag: what does it do, and how is it used in this formatting?
Can someone explain the use of flag "," in this example.

The comma flag tells the formatter to insert grouping (thousands) separators appropriate to the locale. In the US that separator is a comma. Other locales use whatever separator makes sense there, for example a period in Germany. Formatting 123 with the comma flag yields 123, and formatting 123456789 with the comma flag yields 123,456,789. In your example, 124.00 has only three integer digits, so the flag has no visible effect.
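Here is a minimal sketch of the ',' flag with explicit locales. The class and helper names are made up for illustration. Passing a Locale to String.format is standard Java; if you omit it, the default locale is used.

```java
import java.util.Locale;

class CommaFlagDemo {
    // Formats value with the given locale; the ',' flag in the format string
    // asks for that locale's grouping separator.
    static String group(Locale locale, String format, Object value) {
        return String.format(locale, format, value);
    }

    public static void main(String[] args) {
        System.out.println(group(Locale.US, "%,d", 123));              // 123 (nothing to group)
        System.out.println(group(Locale.US, "%,d", 123456789));        // 123,456,789
        System.out.println(group(Locale.US, "%,6.2f", 124.000));       // 124.00
        System.out.println(group(Locale.US, "%,6.2f", 1234.5));        // 1,234.50
        System.out.println(group(Locale.US, "%6.2f", 1234.5));         // 1234.50 (no ',' flag, no separator)
        System.out.println(group(Locale.GERMANY, "%,d", 123456789));   // 123.456.789
        System.out.println(group(Locale.GERMANY, "%,.2f", 1234.5));    // 1.234,50
    }
}
```

The width 6 in %,6.2f is only a minimum. A grouped value longer than that, such as 1,234.50, is printed in full.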

Converting Strings into the numeric unicode equivalent for each letter in the word using a loop using Java? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I've been trying to figure out how to convert Strings from user input into their numeric Unicode values. From the assignment, I know I'm supposed to use loops to convert the words to Unicode, but I'm uncertain how to do this.
I'm pretty sure it's supposed to be either a while or a for loop. They said the while loop is for when you don't have a set number of iterations, which I don't think this one does.
I have looked up Unicode and I'm still confused, but I think this might be a Unicode site: http://www.unicodetables.com/
which led me to the Basic Latin chart:
http://www.unicode.org/charts/PDF/U0000.pdf
But I have no idea what to do with it.
I also saw a similar question that used ord(), but that was for Python, so I don't know whether Java has an equivalent.
The code that I have so far is here:
package calculate;
import java.util.Scanner;
public class UniCal {
public static void main(String[] args) {
String word1;
String word2;
String resultFullWord;
int resultAbsVal = 0;
Scanner scanner = new Scanner(System.in);
System.out.println("What is your first word?");
word1 = scanner.nextLine();
System.out.println("And your second word?");
word2 = scanner.nextLine();
resultFullWord = (word1 + " " + word2);
System.out.println(resultFullWord);
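// The following is the Python from the similar question; it is not Java, so it is kept only as a comment:
// word = input("Enter a word: ")
// for letter in word:
//     print(letter + ': ', ord(letter))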
// while (resultFullWord.equals(resultFullWord)) {
// Do something ...
// Get input into userChar
// }
// Math.abs
System.out.println("Your absolute value based on your word is : " + resultAbsVal);
scanner.close();
}
}

You are missing the key point that Java char values already provide a numeric encoding of Unicode, and also that Java Strings have methods for extracting Unicode "code points" -- the numeric values that Unicode assigns to characters.
It's unclear exactly what the assignment expects. If the "numeric unicode equivalent" means the UTF-16 code unit value, then you can just iterate over the chars of the string. Depending on which method you use to print them, you may need to cast them to type int to get numbers instead of characters.
I suspect that the above is what the assignment expects you to do, since you seem to have had little or no introduction to Unicode concepts or terminology. However, eventually you will need to be aware that UTF-16, and therefore Java Strings, encode every character outside the Basic Multilingual Plane (code points above U+FFFF, such as many emoji) as a two-char "surrogate pair". If you're meant to print the numeric value of each Unicode character encoded by a String, as opposed to each Java char, then you need a different approach: you want to iterate over the code points represented by the String, not the chars.
To iterate over all the Unicode code points of a String, I would be inclined to use String.codePoints(), but that provides them to you in the form of an IntStream. That's a really convenient form, except that it's not conducive to an explicit iteration via a for or while loop, and if you're new to Java you may not have learned about streams yet. In that case, you can use String.codePointCount() to find out how many code points the string contains, then use a for loop over the code point indexes. Note that String.codePointAt() takes a char index, not a code point index. So either convert each code point index with String.offsetByCodePoints() before extracting the value, or advance a char index by Character.charCount() of each code point you read.
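Here is a minimal sketch of both readings of "numeric Unicode value". The class and method names are illustrative. It uses only the standard String methods charAt, codePointCount, offsetByCodePoints and codePointAt.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointDemo {
    // Reading 1: one number per Java char (UTF-16 code unit).
    static List<Integer> charValues(String s) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            values.add((int) s.charAt(i)); // cast to int so we get a number, not a character
        }
        return values;
    }

    // Reading 2: one number per Unicode code point; a surrogate pair yields a single value.
    static List<Integer> codePointValues(String s) {
        List<Integer> values = new ArrayList<>();
        int count = s.codePointCount(0, s.length());
        for (int k = 0; k < count; k++) {
            int charIndex = s.offsetByCodePoints(0, k); // k-th code point -> its char index
            values.add(s.codePointAt(charIndex));
        }
        return values;
    }

    public static void main(String[] args) {
        String word = "Hi\uD83D\uDE00"; // "Hi" followed by U+1F600 (grinning face), stored as a surrogate pair
        System.out.println(charValues(word));      // [72, 105, 55357, 56832]
        System.out.println(codePointValues(word)); // [72, 105, 128512]
    }
}
```

offsetByCodePoints rescans from the start of the string on every call. That is fine for short words. For long strings, advancing a char index by Character.charCount(codePoint) avoids the rescans.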

Why is the java compiler stripping all unicode characters before the actual compilation? [duplicate]

Closed 7 years ago.
I am very new to Java and I have code like this:
public class Puzzle {
public static void main(String... args) {
System.out.println("Hi Guys!");
// Character myChar = new Character('\u000d');
}
}
You can see the line:
Character myChar = new Character('\u000d');
is commented out. But still, I get an error like this when I run javac:
Puzzle.java:9: error: unclosed character literal
// Character myChar = new Character('\u000d');
^
1 error
In this blog post I found the reason for the error. The blog says:
Java compiler, just before the actual compilation strips out all the unicode characters and converts it to character form. This parsing is done for the complete source code which includes the comments also. After this conversion happens then the Java compilation process continues.
In our code, when the Java compiler encounters \u000d, it considers this as a newline and changes the code as below,
public class Puzzle {
public static void main(String... args) {
System.out.println("Hi Guys!");
// Character myChar = new Character('
');
}
}
With this I have two questions:
Why does Java parse the unicode first? Are there any advantages to it?
Even though the line is commented out, Java is still trying to parse it! Is this the only case where it does this? Or does it generally parse commented lines too? I'm confused.
Thanks in advance.

Why does Java parse the Unicode first? Are there any advantages to it?
Unicode escape sequences are replaced first, before the compiler proceeds to lexical analysis.
Quoting from The Java™ Language Specification, §3.3 Unicode Escapes:
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) for the indicated hexadecimal value, and passing all other characters unchanged.
So, for example, the following source code results in an error:
// String s = "\u000d";
But this one is valid:
/*String s = "\u000d";*/
Because when \u000d is replaced with a line terminator, it will look like this:
/*String s = "
";*/
Which is totally fine with the multi-line comment /* */.
Also the following code:
public static void main(String[] args) {
// Comment.\u000d System.out.println("I will be printed out");
// Comment.\u000a System.out.println("Me too.");
}
Will print out:
I will be printed out
Me too.
Because after the Unicode escapes are replaced, both System.out.println() statements are outside the comments.
To answer your question: the Unicode replacement has to happen at some point. One could argue that it should happen either before or after comments are removed. A choice was made to do it before removing the comments.
The reasoning might be that a comment is just another lexical element, and before identifying and analyzing lexical elements you usually want to replace Unicode escape sequences.
See this example:
/\u002f This is a comment line
If placed in a Java source file, it causes no compile errors, because \u002f is translated to the character '/' and, together with the preceding '/', forms the start of a line comment //.
Even though the line is commented out, Java is still trying to parse it! Is this the only case where it does this? Or does it generally parse commented lines too? I'm confused.
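The Java compiler does not analyze the contents of comments, but it still has to scan them to know where they end, and by then every Unicode escape has already been translated. That is why a \u000d inside a // comment ends the comment early.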
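A practical consequence follows from JLS §3.3 as quoted above. To put a carriage return or line feed into a char or String literal, use the escape sequences \r and \n, which are interpreted inside the literal. Avoid \u000d and \u000a, which become real line terminators before the literal is even tokenized. The small sketch below shows the safe forms; the class name is illustrative. It deliberately contains no \u000d or \u000a anywhere, since even a comment containing one would be affected.

```java
class LineTerminatorLiterals {
    // '\r' and '\n' are ordinary escape sequences, interpreted inside the literal,
    // so they are safe in code and in comments.
    static final char CR = '\r';
    static final char LF = '\n';

    // Other Unicode escapes are also translated before tokenizing, but since the
    // result here is an ordinary letter, this literal is simply 'A'.
    static final char A = '\u0041';

    public static void main(String[] args) {
        System.out.println((int) CR);  // 13
        System.out.println((int) LF);  // 10
        System.out.println(A);         // A
        System.out.println(A == 'A');  // true
    }
}
```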

Java regex pattern matching (Irish car registration)

Sorry if this is a dumb question, but it's been driving me mental for the past 5 days.
I'm trying to make a regex pattern to match Irish car registrations, for example '12-W-1234'.
So far this is what I have:
import java.util.ArrayList;
import java.util.List;
public class ValidateDemo {
public static void main(String[] args) {
List<String> input = new ArrayList<String>();
input.add("12-WW-1");
input.add("12-W-223");
input.add("02-WX-431");
input.add("98-zd-4134");
input.add("99-c-7465");
for (String car : input) {
if (car.matches("^(\\d{2}-?\\w*([KK|kk|ww|WW|c|C|ce|CE|cn|CN|cw|CW|d|D|dl|DL|g|G|ke|KE|ky|KY|l|L|ld|LD|lh|LH|lk|LK|lm|LM|ls|LS|mh|MH|mn|MN|mo|MO|oy|OY|so|SO|rn|RN|tn|TN|ts|TS|w|W|wd|WD|wh|WH|wx|WX])-?\\d{1,4})$")) {
System.out.println("Car Template " + car);
}
}
}
}
My problems come up when it checks registrations whose county part contains a single letter that is in my pattern, e.g. '12-ZD-1234'.
ZD isn't a valid county ID, but since D is valid, the pattern accepts it and the registration is displayed.
Any help would be great.
I've already done research on a few websites including this and this.
These websites helped, but I'm still having my problems.
By the by, I'm going to change the code to convert all inputs to uppercase to reduce the size of my code.
Thanks for the help

Besides the \\w* that others have pointed out, you're misusing character classes ([...]). To actually use alternation (|), take out the square brackets as well:
^(\\d{2}-?(KK|kk|ww|WW|c|C|ce|CE|cn|CN|cw|CW|d|D|dl|DL|g|G|ke|KE|ky|KY|l|L|ld|LD|lh|LH|lk|LK|lm|LM|ls|LS|mh|MH|mn|MN|mo|MO|oy|OY|so|SO|rn|RN|tn|TN|ts|TS|w|W|wd|WD|wh|WH|wx|WX)-?\\d{1,4})$
Here are some examples to show you how character classes actually work:
[abc] matches a single character, either a, b, or c.
[aabbcc] is equivalent to [abc] (duplicates are disregarded).
[|] matches a pipe character, i.e. | loses its alternation meaning inside a class.
[KK|kk|ww|WW|c|C|ce|CE ... ] ends up being equivalent to [K|kwWcCeE ... ] because, again, duplicates are disregarded.
You were correct to use the alternation operator (|) to do what you desired, but you didn't need to use character classes.
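Here is a sketch that checks the corrected pattern against the question's inputs, plus the extra '12-ZD-1234' case. The class name is illustrative. The pattern string is the one from this answer, split across lines only for width. The county list itself, including KK and WW, is taken from the question as-is and not checked against the official codes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class IrishRegDemo {
    // Corrected pattern from the answer above: alternation (|) inside a group, no \w*.
    static final Pattern REG = Pattern.compile(
        "^(\\d{2}-?(KK|kk|ww|WW|c|C|ce|CE|cn|CN|cw|CW|d|D|dl|DL|g|G|ke|KE|ky|KY|l|L|ld|LD|lh|LH"
        + "|lk|LK|lm|LM|ls|LS|mh|MH|mn|MN|mo|MO|oy|OY|so|SO|rn|RN|tn|TN|ts|TS|w|W|wd|WD|wh|WH|wx|WX)"
        + "-?\\d{1,4})$");

    static boolean isValid(String reg) {
        return REG.matcher(reg).matches();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "12-WW-1", "12-W-223", "02-WX-431", "98-zd-4134", "99-c-7465", "12-ZD-1234");
        for (String car : input) {
            System.out.println(car + " -> " + isValid(car)); // true, true, true, false, true, false
        }
        // A character class matches exactly one character, so [ww|WW] is the set {w, |, W}.
        System.out.println(Pattern.matches("[ww|WW]", "|"));  // true
        System.out.println(Pattern.matches("[ww|WW]", "ww")); // false
    }
}
```

'02-WX-431' still matches even though W comes before WX in the alternation. When W leaves an X that the rest of the pattern cannot consume, the engine backtracks and tries the later alternatives.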

You can improve your pattern like this:
^[0-9]{2}-?(?>c[enw]?|C[ENW]?|dl?|DL?|g|G|k[eky]|K[EKY]|l[dhkms]?|L[DHKMS]?|m[hno]|M[HNO]|oy|OY|rn|RN|so|SO|t[ns]|T[NS]|w[dhx]?|W[DHX]?)-?[0-9]{1,4}$
And if you don't care about the case of letters:
^(?i)[0-9]{2}-?(?>c[enw]?|dl?|g|k[eky]|l[dhkms]?|m[hno]|oy|rn|so|t[ns]|w[dhx]?)-?[0-9]{1,4}$
Note that anchors (^ and $) are useful if your string must only contain the car registration number.
Note 2: You can improve it further by putting the most frequent county first in the alternation.
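Here is a sketch of the case-insensitive variant compiled with java.util.regex, with m[hno] and oy as separate alternatives. The class name is illustrative. Because (?>...) is atomic, the engine will not backtrack into the group once it has matched. Each alternative therefore has to grab greedily what you want: w[dhx]? takes the X in WX before the group closes.

```java
class IrishRegCaseInsensitive {
    // (?i) makes the rest of the pattern case-insensitive; (?>...) is an atomic group.
    static final java.util.regex.Pattern REG = java.util.regex.Pattern.compile(
        "^(?i)[0-9]{2}-?(?>c[enw]?|dl?|g|k[eky]|l[dhkms]?|m[hno]|oy|rn|so|t[ns]|w[dhx]?)-?[0-9]{1,4}$");

    static boolean isValid(String reg) {
        return REG.matcher(reg).matches();
    }

    public static void main(String[] args) {
        String[] cars = {"12-W-223", "02-wx-431", "99-C-7465", "98-zd-4134", "12-ZD-1234"};
        for (String car : cars) {
            System.out.println(car + " -> " + isValid(car)); // true, true, true, false, false
        }
    }
}
```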

Irish number plates can also start with three digits: since 2013 they have the form (year)(1|2)-(county)-(number). So the regex could be simply (\d+-?\w{1,2}-?\d+), using {1,2} because some county codes, such as W and D, are a single letter.
However, the best form of validation is to run this against a vehicle registration API, such as http://ie.carregistrationapi.com/, since this will determine whether the vehicle is registered, rather than just whether it is in the right format.

foo.split(',').length != number of ',' found in 'foo'?

Maybe it's because it's the end of the day on a Friday, and I have already found a workaround, but this is killing me.
I am using Java, but I am a .NET developer.
I have a string and I need to split it on commas. Let's say it's a row in a CSV file that has 210 columns. line.split(",").length will sometimes be 199, while the count of ',' will be 208 or 209. I even count them in 2 different ways to be sure: using a regex, then, after losing my sanity, manually looping through and checking each character.
What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?
thanks much

First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.
"foo,,bar,baz,,".split(",")
is:
{"foo", "", "bar", "baz"}
an array of 4 elements. But
"foo,,bar,baz,,".split(",", -1)
is:
{"foo", "", "bar", "baz", "", ""}
with all 6.
Note that only trailing empty strings are dropped by default.
Finally, don't forget that the String is compiled into a regex. That doesn't matter here, since , is not a special character, but you should keep it in mind.
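Here is a small sketch of the behaviour described above; the class and method names are made up for illustration. With the default limit, trailing empty strings vanish. With a negative limit they are kept, so for a literal comma the array length is always one more than the number of commas. The last line shows quoting a delimiter that is a regex metacharacter.

```java
import java.util.regex.Pattern;

class SplitCountDemo {
    // Counts occurrences of c in s with a plain loop.
    static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String row = "foo,,bar,baz,,";
        System.out.println(count(row, ','));            // 5
        System.out.println(row.split(",").length);      // 4: trailing empty strings dropped
        System.out.println(row.split(",", -1).length);  // 6: commas + 1
        // Unquoted, "|" is an empty alternation and would split between every character.
        System.out.println("a|b|c".split(Pattern.quote("|")).length); // 3
    }
}
```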

There are a couple of things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.
But what you're dealing with probably comes from consecutive delimiters, e.g. a,,,,b,c,,,,,
The ones at the end get dropped. Check the java documentation for the split function.
http://download.java.net/jdk7/docs/api/java/lang/String.html

As others have pointed out, String.split has some very non-intuitive behaviour.
If you're using Google's Guava open-source Java library, there's a Splitter class which gives a much nicer (in my opinion) API for this, with more flexibility:
import com.google.common.base.Splitter;
String input = "foo, bar,";
Splitter.on(',').split(input);
// returns "foo", " bar", ""
Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"
Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"

Is it omitting blanks?
Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?
Are there extra delimiters in the cell data?

Short example: foo = "1,2" and
foo.split(",").length = 2
count(foo, ",") = 1
Probably you have a mistake in your code. Here is an example in Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String row = "1,2,3,4,,5"; // second example: "1,2,3,5,,"
// prints 6 for the first example, but 4 for the second (trailing empty strings are dropped)
System.out.println(row.split(",").length);
// count how many commas the row contains
Pattern pattern = Pattern.compile(",");
Matcher m = pattern.matcher(row);
int nr = 0;
while (m.find()) {
    nr++;
}
System.out.println(nr); // prints 5 for both examples

Print string literal unicode as the actual character

In my Java application I have been passed a string that looks like this:
"\u00a5123"
When printing that string to the console, I get the same string as the output (as expected).
However, I want to print it with the Unicode escape converted into the actual yen symbol (\u00a5 -> ¥). How would I go about doing this?
I.e., so it looks like this: "¥123"

I wrote a little program:
public static void main(String[] args) {
System.out.println("\u00a5123");
}
Its output:
¥123
i.e. it output exactly what you stated in your post. I'm not sure there isn't something else going on. What version of Java are you using?
edit:
In response to your clarification, there are a couple of different techniques. The most straightforward is to look for a "\u" followed by 4 hex characters, extract that piece, and replace it with the character for that hex code (using the Character class). This of course assumes the string never contains a literal \u that is not meant as an escape.
I am not aware of any particular system to parse the String as though it was an encoded Java String.
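Here is a minimal sketch of the "find \u plus four hex digits and replace" technique described above. The class and method names are illustrative. It assumes every backslash-u in the input really is an escape and handles only exactly four hex digits. It uses the StringBuffer overload of Matcher.appendReplacement, which exists on every Java version; Java 9 added a StringBuilder overload.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // A literal backslash, the letter u, then exactly four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            String decoded = String.valueOf(Character.toChars(code));
            // quoteReplacement keeps a decoded '$' or backslash from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\\u00a5123"; // the text backslash, u, 0, 0, a, 5, 1, 2, 3 as the asker received it
        System.out.println(unescape(input)); // the yen sign followed by 123
    }
}
```

Whether the yen sign displays correctly still depends on the console's encoding. The String itself contains the right character either way.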

As has been mentioned before, these strings will have to be parsed to get the desired result.
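Tokenize the string by using \u as separator. For example: \u63A5\u53D7 => { "63A5", "53D7" }. A token may also carry ordinary text after the escape (for "\u00a5123" the token is "00a5123"), so only its first four characters are the hex code.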
Process these strings as follows:
String hex = "63A5";
int intValue = Integer.parseInt(hex, 16);
System.out.println((char)intValue);

You're probably going to have to write a parser for these, unless you can find one in a third-party library. There is nothing in the JDK to parse these for you. I know because I fairly recently had an idea to use these kinds of escapes as a way to smuggle Unicode through a Latin-1-only database. (I ended up doing something else, btw.)
I will tell you that java.util.Properties escapes and unescapes Unicode characters in this manner when reading and writing files (since the files have to be ASCII). The methods it uses for this are private, so you can't call them, but you could use the JDK source code to inspire your solution.

Could replace the above with this:
System.out.println((char)0x63A5);
Here is code to print all of the box-drawing Unicode characters (U+2500 to U+257F):
public static void printBox()
{
for (int i=0x2500;i<=0x257F;i++)
{
System.out.printf("0x%x : %c\n",i,(char)i);
}
}
