I need to display a list in table/grid format, so I'm using String.format() as in the following example,
how to print object list to file with formatting in table format using java
My issue is that I need to force-wrap the output at 80 chars. The table's maximum width is 80, any further output must continue on the next line.
Is this possible?
Current code, without wrapping implemented:
StringBuilder sbOutput = new StringBuilder();
sbOutput.append(String.format("%-14s%-200s%-13s%-24s%-12s", "F1", "F2", "F3", "F4", "F5"));
for (MyObject result: myObjects) {
sbOutput.append(String.format("%-14s%-200s%-13s%-24s%-12s", result.getF1(),
result.getF2(), result.getF3(), result.getF3(), result.getF4()));
}
You can inject a newline into a string every 80 chars like this:
str.replaceAll(".{80}(?=.)", "$0\n");
So your code would become:
sbOutput.append(String.format("%-14s%-200s%-13s%-24s%-12s", result.getF1(),
result.getF2(), result.getF3(), result.getF3(), result.getF4())
.replaceAll(".{80}(?=.)", "$0\n"));
The search regex means "80 chars that have a character following" and "$0" in the replacement means "everything matched by the search".
The (?=.) is a look ahead asserting the match is followed by any character, which prevents output that is an exact multiple of 80 chars getting an unecessary newline added after it.
Related
I am reading a JSON file in Java using this code:
String data = Files.readFile(jsonFile)
.trim()
.replaceAll("[^\\x00-\\x7F]", "")
.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
.replaceAll("\\p{C}", "");
In my JSON file, there is a unique char: 'あ' (12354) that is interpreted to: "" (nothing) when reading the file.
How can I make this char show up in my variable "data"?
Due to answers I've got, I understand that the data is cleaned from high ASCII characters by adding replaceAll("[^\\x00-\\x7F]", ""). But what can I do if I want all high ASCII characters to be cleaned except this one 'あ'?
The character you want is the unicode character HIRAGANA LETTER A and has code U+3042.
You can simply add it to the list of valid characters:
...
.replaceAll("[^\\x00-\\x7F\\u3042]", "")
...
I created a JSON file on the fly by using some runtime data and stored as string as like below:
JSON:
{
"ticketDetails": "kindle tracking ticket: TICKET0900060
Iimpact statement: impacted due to year 2020 format handling issue,
depending on the Gateway,
user can be asked to
try with another instrument.
Timeline: 00: 00 SAP Internal Declines spiked to 300 +
05: 22 AM flintron reported DECLINED errors since 0: 00 PST.
As per TDO,flitron is not seeing clear metrics impact " }
Note: I just copied the exact json which i'm getting at runtime. It containse \n space and exactly like above.
I can see few JSONObjects like ticketDetails is having huge description and when I tried to parse the above string is leading to parse error.
I tried the below way to eliminate the parse error by using
String removeSpecialCharacterFromJson= jsonString.replaceAll("[^A-Za-z0-9]","")
System.out.println(removeSpecialCharacterFromJson);
Sample Output:
kindletrackingticketTICKET0900060Iimpactstatementimpactedduetoyear2020formathandlingissue....[space between characters are removed and It's hard to read the string]
The above code removed all the special characters from the string and It will be successfully parsed. But the description is not having the space and It very hard to read the content after the above changes done.
I tried to escape the \s in the regular expression which is giving the original String value which is leading to parse exception.
String removeSpecialCharacterFromJson= jsonString.replaceAll("[^A-Za-z0-9\\s]","")
Is there anyohter way to handle this ? I just want to the ticketDetails to be readable format and It should not have any special characters and \n lines.
Can someone help me on this?
s in regular expression is not for space but for the whitespace
I guess that you may have some additional non allowed whitespace in your JSON string
Take a look at The JSON spec (RFC 7159):
Insignificant whitespace is allowed before or after any of the six structural characters.
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ) ; Carriage return
Verify your values and look for improper whitespaces
I am trying to remove a special character (å) from a column in a dataframe.
My data looks like:
ClientID,PatientID
AR0001å,DH_HL704221157198295_91
AR00022,DH_HL704221157198295_92
My original data is approx 8TB in size from which I need to get rid of this special character.
Code to load data:
reader.option("header", true)
.option("sep", ",")
.option("inferSchema", false)
.option("charset", "ISO-8859-1")
.schema(schema)
.csv(path)
After loading into dataframe when I do df.show() it shows:
+--------+--------------------+
|ClientID| PatientID|
+--------+--------------------+
|AR0001Ã¥|DH_HL704221157198...|
|AR00022 |DH_HL704221157198...|
+--------+--------------------+
Code I used to try to replace this character:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "\å", ""));
But this didn't work. While loading the data in dataframe if I change the charset to "UTF-8" it works.
I am not able to find a solution with the current charset (ISO-8859-1).
Some things to note,
Make sure to assign the result to a new variable and use that afterwards
You do not need to escape "å" with \
colName in the command should be ClientId or PatientID
If you did all these things, then I would suggest to, instead of matching on "å", try matching on the characters you want to keep. For example, for the ClientID column,
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^A-Z0-9_]", ""));
Another approach would be to convert the UTF-8 character "å" to it's ISO-8859-1 equivalent and replace with the resulting string.
String escapeChar = new String("å".getBytes("UTF-8"), "ISO-8859-1");
The below command will remove all the special characters and will keep all the lower/upper case alphabets and all the numbers in the string:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^a-zA-Z0-9]", ""));
For Simple Java Mail I'm trying to deal with a somewhat free-format of delimited email addresses. Note that I'm specifically not validating, just getting the addresses out of a list of addresses. For this use case the addresses can be assumed to be valid.
Here is an example of a valid input:
"name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
So there are two basic forms "name#domain.com" and "Joe Sixpack ", which can appear in a comma / semicolon delimited string, ignoring white space padding. The problem is that the names can contains delimiters as valid characters.
The following array shows the data needed (trailing spaces or delimiters would not be a big problem):
["name#domain.com",
"Sixpack, Joe 1 <name#domain.com>",
"Sixpack, Joe 2 <name#domain.com>",
"Sixpack, Joe, 3<name#domain.com>",
"nameFoo#domain.com",
"nameBar#domain.com",
"nameBaz#domain.com"]
I can't think of a clean way to deal with this. Any suggestion how I can reliably recognize whether a comma is part of a name or is a delimiter?
Final solution (variation on the accepted answer):
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
// recognize value tails and replace the delimiters there, disambiguating delimiters
const result = string
.replace(/(#.*?>?)\s*[,;]/g, "$1<|>")
.replace(/<\|>$/,"") // remove trailing delimiter
.split(/\s*<\|>\s*/) // split on delimiter including surround space
console.log(result)
Or in Java:
public static String[] extractEmailAddresses(String emailAddressList) {
return emailAddressList
.replaceAll("(#.*?>?)\\s*[,;]", "$1<|>")
.replaceAll("<\\|>$", "")
.split("\\s*<\\|>\\s*");
}
since you are not validating, i assume that the email addresses are valid.
Based on this assumption, i will look up an email address followed by ; or , this way i know its valid.
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
const result = string.match(/(.*?#.*?\..*?)[,;]/g)
console.log(result)
This pattern works for your provided examples:
([^#,;\s]+#[^#,;\s]+)|(?:$|\s*[,;])(?:\s*)(.*?)<([^#,;\s]+#[^#,;\s]+)>
([^#,;\s]+#[^#,;\s]+) # email defined by an # with connected chars except ',' ';' and white-space
| # OR
(?:$|\s*[,;])(?:\s*) # start of line OR 0 or more spaces followed by a separator, then 0 or more white-space chars
(.*?) # name
<([^#,;\s]+#[^#,;\s]+)> # email enclosed by lt-gt
PCRE Demo
Using Java's replaceAll and split functions (mimicked in javascript below), I would say lock onto what you know ends an item (the ".com"), replace separator characters with a unique temp (a uuid or something like <|>), and then split using your refactored delimiter.
Here is a javascript example, but Java's repalceAll and split can do the same job.
var string = "name#domain.com,Joe Sixpack <name#domain.com>, Sixpack, Joe <name#domain.com> ;Sixpack, Joe<name#domain.com> , name#domain.com,name#domain.com;name#domain.com;"
const result = string.replace(/(\.com>?)[\s,;]+/g, "$1<|>").replace(/<\|>$/,"").split("<|>")
console.log(result)
79 0009!017009!0479%0009!0479 0009!0469%0009!0469
0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449
0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419
0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009
0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339
0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032
In this data, I'm supposed to extract the number 47, 46 , 45 , 44 and so on. I´m supposed to avoid the rest. The numbers always follow this flow - 9!0 no 9%
for example: 9!0 42 9%
Which language should I go about to solve this and which function might help me?
Is there any function that can position a special character and copy the next two or three elements?
Ex: 9!0 42 9% and ' 009
look out for ! and then copy 42 from there and look out for ' that refers to another value (009). It's like two different regex to be used.
You can use whatever language you want, or even a unix command line utility like sed, awk, or grep. The regex should be something like this - you want to match 9!0 followed by digits followed by 0%. Use this regex: 9!0(\d+)0% (or if the numbers are all two digits, 9!0(\d{2})0%).
The other answers are fine, my regex solution is simply "9!.(\d\d)"
And here's a full solution in powershell, which can be easily correlated to other .net langs
$t="79 0009!017009!0479%0009!0479 0009!0469%0009!0469 0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449 0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419 0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009 0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339 0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032"
$p="9!.(\d\d)"
$ms=[regex]::match($t,$p)
while ($ms.Success) {write-host $ms.groups[1].value;$ms=$ms.NextMatch()}
This is perl:
#result = $subject =~ m/(?<=9!0)\d+(?=9%)/g;
It will give you an array of all your numbers. You didn't provide a language so I don't know if this is suitable for you or not.
Pattern regex = Pattern.compile("(?<=9!0)\\d+(?=9%)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}