How do I change 38k to 38000, or 43k to 43000, etc.?
I tried this:
String s0=toClean.replaceAll("[0-9]k", "[0-9]000");
But it's wrong. It changes 38k to 3[0-9]000, because the replacement string is inserted literally.
You need to match not just one digit but several, so add +. You should also capture the number and reinsert it with capture group 1 ($1), and check that nothing else follows the k (so that, e.g., 38kabc isn't matched):
String s0=toClean.replaceAll("([0-9]+)k\\b", "$1000");
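To make the behaviour of that replacement easier to check, here is a minimal, self-contained sketch. The class and method names (KSuffixDemo, expandK) are made up for illustration; only the replaceAll call comes from the answer above.

```java
class KSuffixDemo {
    // Expands a "k" that directly follows a run of digits into "000".
    static String expandK(String toClean) {
        // ([0-9]+) captures the whole number; \\b requires the "k" to end the word.
        // In the replacement, Java reads "$1" as group 1 and stops there because
        // no group 10 exists, so the following "000" is literal text.
        return toClean.replaceAll("([0-9]+)k\\b", "$1000");
    }

    public static void main(String[] args) {
        System.out.println(expandK("38k"));        // 38000
        System.out.println(expandK("43k and 7k")); // 43000 and 7000
        System.out.println(expandK("38kabc"));     // 38kabc (left unchanged)
    }
}
```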
I have a file with data in the first row that I want to extract. The data looks like this:
20200403|AS421|||FINN|
public void handleLine(String line) {
if (line.contains(firstJobConfig.DELIMITER_PIPE)){
headerInfo.setcreateDate(line.substring(0, line.indexOf(firstJobConfig.DELIMITER_PIPE)));
headerInfo.setformName(line.substring(line.indexOf(firstJobConfig.DELIMITER_PIPE)));
}
}
}
I have code that pulls 20200403 into my createDate variable, but I can't figure out how to get my formName set to AS421. Right now it's set to |AS421|||FINN|. I know that line.substring(9, 14) would work, but I want to start after the first pipe delimiter (|) and stop at the next one.
Right now, you're doing this: headerInfo.setformName(line.substring(line.indexOf(firstJobConfig.DELIMITER_PIPE))). You take a substring that starts at the index of the first delimiter and don't specify where it ends, which is why the result is |AS421|||FINN|. A better way is to use line.split("\\|"). In your case it returns an array of 5 elements: ["20200403","AS421","","","FINN"]. Then you can do:
String[] table = line.split("\\|");
headerInfo.setcreateDate(table[0]);
headerInfo.setformName(table[1]);
You can split the string as shown below.
Add a + to match one or more consecutive pipes, so that the empty fields between them are dropped:
temp.split("\\|+");
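The approaches above can be compared side by side in a small sketch. The class name PipeFieldsDemo and the helper secondField are hypothetical; the helper shows the indexOf-only variant the question asks about (start after the first pipe, stop at the next one), and it assumes a single-character or fixed-string delimiter.

```java
class PipeFieldsDemo {
    // Returns the text between the first and second delimiter using indexOf only.
    static String secondField(String line, String delimiter) {
        int first = line.indexOf(delimiter);
        if (first < 0) {
            return ""; // no delimiter at all
        }
        int start = first + delimiter.length();
        int second = line.indexOf(delimiter, start);
        // If there is no second delimiter, take the rest of the line.
        return second < 0 ? line.substring(start) : line.substring(start, second);
    }

    public static void main(String[] args) {
        String line = "20200403|AS421|||FINN|";

        String[] all = line.split("\\|");        // keeps the empty fields in the middle
        String[] collapsed = line.split("\\|+"); // treats runs of pipes as one delimiter

        System.out.println(all.length);             // 5
        System.out.println(collapsed.length);       // 3
        System.out.println(all[1]);                 // AS421
        System.out.println(secondField(line, "|")); // AS421
    }
}
```

Note that with "\\|+" the field positions shift when a field is empty, so the plain "\\|" version is the safer choice if you rely on fixed indexes.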
I am a Java developer, but am working on a C# project. What I need to do is split a String by a delimiter, but limit it to a certain number of fields. In Java, I can do this:
String message = "xx/xx - xxxxxxxxxxxxxxxxxxx - xxxxxxx";
String[] splitMessage = message.split("\\s-", 3);
In this case, it splits on the -, but I also want it to check for a space before the dash and limit the result to 3 fields. The incoming String has the form ___ - ____________ - _________, where the first blank is a date (like 12/31), the second is a message, and the third is a location tied to the message. I limit it to 3 fields so that the array has only 3 elements, because the message itself can sometimes contain dashes, like this: 12/31 - Test message - test - Test City, 11111. So my Java code above would split it into this:
0: 12/31
1: Test message - test
2: Test City, 11111
I am trying to achieve something similar in C#, but am not sure how to limit it to a certain number of fields. This is my C# code:
var splitMessage = Regex.Split(Message, " -");
The problem is that without a limit, it splits it into 4 or 5 fields instead of just 3. For example, if this were the message: 12/31 - My test - don't use - just a test - Test City, 11111, it would return a string[] with 5 elements:
0: 12/31
1: My test
2: don't use
3: just a test
4: Test City, 11111
When I want it to return this:
0: 12/31
1: My test - don't use - just a test
2: Test City, 11111
Before you ask, I can't change the incoming String. I have to parse it the same way I did in Java. So is there an equivalent to limiting it to 3 fields? Is there a better way to do it besides using Regex.Split()?
If you want to split based on the first and last instance of -, such that you get exactly three fields (so long as there are at least two dashes in the string), C# does actually have a neat trick for this. C# Regex allows for non-fixed-width lookbehinds. So the following regex:
(?<=^[^-]*)-|-(?=[^-]*$)
(?<= //start lookbehind
^ //look for start of string
[^-]* //followed by any amount of non-dash characters
) //end lookbehind
- //match the dash
| //OR
- //match a dash
(?= //lookahead for
[^-]* //any amount of non-dash characters
$ //then the end of the string
) //end lookahead
Will match the first and last dash, and allow you to split the string the way you want to.
var splitMessage = Regex.Split(Message, "(?<=^[^-]*)-|-(?=[^-]*$)");
Note that this also has no problem splitting into fewer than three groups if there are fewer dashes, but it will never split into more than three.
You can't split like that when the delimiter also appears inside one of the desired groups, except when that group is the last one.
You can, however, use a custom regex that consumes as much as possible in the 2nd group to parse the input:
var splitMessage = Regex.Match("12/31 - Test message - test - Test City, 11111", "^(.+?) - (.+) - (.+)$")
.Groups
.Cast<Group>()
// skip first group which is the entire match
.Skip(1)
.Select(x => x.Value)
.ToArray();
Given that the first group is "xx/xx", you can also opt to use this regex instead:
"^(../..) - (.+) - (.+)$"
// or, assuming it is a date (verbatim string, so \d needs no extra escaping)
@"^(\d{2}/\d{2}) - (.+) - (.+)$"
EDIT: Or, you can just split by " - " and then concatenate everything in the middle when there are more than 3 parts:
var groups = "12/31 - Test message - test - Test City, 11111".Split(new[] { " - " }, StringSplitOptions.None);
if (groups.Length > 3)
{
groups = new[]
{
groups[0],
string.Join(" - ", groups.Skip(1).Take(groups.Length - 2)),
groups[groups.Length - 1]
};
}
When I have to split a string at certain delimiters including optional spaces, I usually do it this way:
String message = "xx/xx - xxxxxxxxxxxxxxxxxxx - xxxxxxx";
String[] splitMessage = message.split(" *- *", 3);
System.out.println(Arrays.asList(splitMessage));
Outputs: [xx/xx, xxxxxxxxxxxxxxxxxxx, xxxxxxx]
String message = "12/31 - My test - don't use - just a test - Test City; 11111";
String[] splitMessage = message.split(" *- *", 3);
System.out.println(Arrays.asList(splitMessage));
Outputs: [12/31, My test, don't use - just a test - Test City; 11111]
But you seem to want something different:
splitMessage[0] shall contain the first part
splitMessage[1] shall contain the second and third part
splitMessage[2] shall contain the rest
How do you want to tell your computer that the second output element shall contain two parts? I think this is impossible except by splitting the string into all 5 parts and then re-concatenating the parts together as you want.
Maybe it's not clear what result you want. Can you specify the requirement more clearly: What shall happen if the input string contains more than 3 elements?
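The "split into all parts, then re-concatenate" idea can be sketched in Java as follows. This is only an illustration under one assumption taken from the question: the first and the last field never contain " - ", so any extra dashes belong to the middle field. The names SplitKeepMiddleDemo and splitFirstAndLast are made up.

```java
import java.util.Arrays;

class SplitKeepMiddleDemo {
    // Splits on " - " and, when there are more than three parts,
    // glues the middle parts back together so exactly three remain.
    static String[] splitFirstAndLast(String message) {
        String[] parts = message.split(" - ");
        if (parts.length <= 3) {
            return parts; // nothing to merge
        }
        String middle = String.join(" - ",
                Arrays.copyOfRange(parts, 1, parts.length - 1));
        return new String[] { parts[0], middle, parts[parts.length - 1] };
    }

    public static void main(String[] args) {
        String message = "12/31 - My test - don't use - just a test - Test City, 11111";
        // Prints: [12/31, My test - don't use - just a test, Test City, 11111]
        System.out.println(Arrays.asList(splitFirstAndLast(message)));
    }
}
```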
I have to deal with a problem caused by a dirty design. I get a list of strings and want to parse attributes out of them. Unfortunately, I can't change the source where these strings are created.
Example:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false"
Now I want to extract the attributes type, languageCode, url, ref, info and deactivated.
The problem here is the field info, whose text is not delimited by quotation marks. Commas may also occur in this field, so I can't use the comma at the end of the text to find out where it ends.
Additionally, those strings do not always contain all attributes. type, info and deactivated are always present; the rest are optional.
Any suggestions how I can solve this problem?
One possible solution is to search for = characters in the input and then take the single word immediately before it as the field name - it seems that all your field names are single words (no whitespace). If that's the case, you can then take everything after the = until the next field name (accounting for separating ,) as the value.
This assumes that the value cannot contain =.
Edit:
As a possible way to handle embedded =, you can check whether the word in front of it is one of your known field names. If not, you can treat the = as an embedded character rather than an operator. This, however, assumes that you have a fixed set of known fields (some of which may not always appear). This assumption may be eased if you know that the field names are case-sensitive.
Assuming that the order of the elements is fixed, you could write a solution using a regex like this one:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String regex = //type, info and deactivated are always present
"type=(?<type>.*?)"
+ "(?:, languageCode=(?<languageCode>.*?))?"//optional group
+ "(?:, url=(?<url>.*?))?"//optional group
+ "(?:, ref=(?<rel>.*?))?"//optional group
+ ", info=(?<info>.*?)"
+ ", deactivated=(?<deactivated>.*?)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
if(m.matches()){
System.out.println("type -> "+m.group("type"));
System.out.println("languageCode -> "+m.group("languageCode"));
System.out.println("url -> "+m.group("url"));
System.out.println("rel -> "+m.group("rel"));
System.out.println("info -> "+m.group("info"));
System.out.println("deactivated -> "+m.group("deactivated"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
rel -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
EDIT: Version2 regex searching for oneOfPossibleKeys=value where value ends with:
, oneOfPossibleKeys=
or has end of string after it (represented by $).
Code:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String[] possibleKeys = {"type","languageCode","url","ref","info","deactivated"};
String keysStrRegex = String.join("|", possibleKeys);
//above will contain type|languageCode|url|ref|info|deactivated
String regex = "(?<key>\\b(?:"+keysStrRegex+")\\b)=(?<value>.*?(?=, (?:"+keysStrRegex+")=|$))";
// (?<key>\b(?:type|languageCode|url|ref|info|deactivated)\b)
// =
// (?<value>.*?(?=, (?:type|languageCode|url|ref|info|deactivated)=|$))
System.out.println(regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group("key")+" -> "+m.group("value"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
You could use a regular expression, capturing all the "fixed" groups and using whatever remains for info. This should even work if the info part contains , or = characters. Here's a quick example (using Python, but that should not be a problem...).
>>> import re
>>> p = r"(type=[A-Z]+), (languageCode=[-A-Z]+), (url=[^,]+), (ref=\d), (info=.+?), (deactivated=(?:true|false))"
>>> s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false"
>>> re.search(p, s).groups()
('type=INFO',
'languageCode=EN-GB',
'url=http://www.stackoverflow.com',
'ref=1',
'info=Text, that may contain all kind of chars, even deactivated=true.',
'deactivated=false')
If any of those elements are optional, you can put a ? after those groups, and make the comma optional. If the order can be different, then it's more complicated. In this case, instead of using one RegEx to capture everything at once, use several RegExes to capture the individual attributes and then remove (replace with '') those in the string before matching the next attribute. Finally, match info.
On further consideration, given that those attributes could have any order, it may be more promising to capture just everything spanning from one keyword to the next, regardless of its actual content, very similar to Pshemo's solution:
keys = "type|languageCode|url|ref|info|deactivated"
p = r"({0})=(.+?)(?=\, (?:{0})=|$)".format(keys)
matches = re.findall(p, s)
But this, too, might fail in some very obscure cases, e.g. if the info attribute contains something like ', ref=foo', including the comma. However, there seems to be no way around those ambiguities. If you had a string like info=in this string, ref=1, and in another, ref=2, ref=1, does it contain one ref attribute, or three, or none at all?
[introduction][position]Lead Researcher and Research Manager[/position] in the [affiliation]Web Search and Mining Group, Microsoft Research[/affiliation]</b>.
I am a [position]lead researcher[/position] at [affiliation]Microsoft Research[/affiliation]. I am also [position]adjunct professor[/position] of [affiliation]Peking University[/affiliation], [affiliation]Xian Jiaotong University[/affiliation] and [affiliation]Nankai University[/affiliation].
I joined [affiliation]Microsoft Research[/affiliation] in June 2001. Prior to that, I worked at the Research Laboratories of NEC Corporation.
I obtained a [bsdegree]B.S.[/bsdegree] in [bsmajor]Electrical Engineering[/bsmajor] from [bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate] and a [msdegree]M.S.[/msdegree] in [msmajor]Computer Science[/msmajor] from [msuniv]Kyoto University[/msuniv] in [msdate]1990[/msdate]. I earned my [phddegree]Ph.D.[/phddegree] in [phdmajor]Computer Science[/phdmajor] from the [phduniv]University of Tokyo[/phduniv] in [phddate]1998[/phddate].
I am interested in [interests]statistical learning[/interests], [interests]natural language processing[/interests], [interests]data mining, and information retrieval[/interests].[/introduction]
I'm able to strip all tags from the paragraph above with:
String stripped = html.replaceAll("\\[.*?\\]", "");
But I'd like to keep three pairs of tags in the paragraph: [bsuniv][/bsuniv], [msuniv][/msuniv] and [phduniv][/phduniv]. In other words, I don't want to strip the tags containing the keyword "univ". I can't find a convenient way to rewrite the regular expression. Can anyone help me?
You can use a negative lookahead assertion here:
str = str.replaceAll("\\[(.(?!univ))*?\\]", "");
or:
str = str.replaceAll("\\[((?!univ).)*?\\]", "");
Both of them will give you the desired output. There is only one difference:
The first one matches a character and then asserts that it is not followed by univ, before moving on to the next character.
The second one asserts, at the position before each character, that univ does not start there, and only then matches that single character.
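As a quick check of the second pattern, here is a minimal sketch on a shortened version of the sample paragraph. The class and method names (KeepUnivTagsDemo, stripExceptUniv) are made up; the regex is the one from the answer.

```java
class KeepUnivTagsDemo {
    // Removes every [tag] or [/tag] unless the text inside the brackets contains "univ".
    static String stripExceptUniv(String text) {
        // ((?!univ).)*? matches one character at a time, but only at positions
        // where "univ" does not start, so a tag containing "univ" can never be
        // matched all the way to its closing bracket.
        return text.replaceAll("\\[((?!univ).)*?\\]", "");
    }

    public static void main(String[] args) {
        String sample = "a [bsdegree]B.S.[/bsdegree] from "
                + "[bsuniv]Kyoto University[/bsuniv] in [bsdate]1988[/bsdate]";
        // Prints: a B.S. from [bsuniv]Kyoto University[/bsuniv] in 1988
        System.out.println(stripExceptUniv(sample));
    }
}
```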
I use java and a regexp.
I've made a regexp for password validation :
String PASSWORD_PATTERN_ADVANCED = "^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\\\\##$¤£µ§%&<>,.!:?;~{-|`'_^¨éèçàù)=}()°\"\\]\\[²³*/+]).{8,20}$";
or without the extra slash :
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\\##$¤£µ§%&<>,.!:?;~{-|`'_^¨éèçàù)=}()°"\]\[²³*/+]).{8,20}$
which means (I may be wrong): at least one digit / at least one lowercase / at least one uppercase / at least one of the special chars listed / with a minimum total length of 8 and a max of 20...
made a test case generating password for success and failure...
success -> OK, all passed
failure -> Almost OK ...
The only passwords that fail to fail :D are the ones with a space in them, like:
iF\ !h6 2A3|Gm
¨I O7 gZ2%L£k vd~39
2< A Uw a7kEw6,6S^
cC2c5N#
6L kIw~ Béj7]5
ynRZ #44ç
9A `sè53Laj A
s²R[µ3 9UrR q8n
I am puzzled.
Any thoughts on how to make it work?
Thanks
A regex may not be the right tool for the job here.
Regexes are best suited for matching patterns; what you're describing isn't really a pattern, per se; it's more of a rule set. Sure, you may be able to create some regex that helps, but it's a really complex and opaque piece of code, which makes maintenance a challenge.
A method like this might be a better fit:
public boolean isValidPassword(String password) {
    // Local variables must be initialized before use.
    boolean containsLowerCase = false;
    boolean containsUpperCase = false;
    boolean containsInvalid = false;
    boolean containsSpecialChar = false;
    boolean containsDigit = false;
    for (char c : password.toCharArray()) {
        // Java has no ||= operator; |= on booleans accumulates the same way.
        containsLowerCase |= Character.isLowerCase(c);
        containsUpperCase |= Character.isUpperCase(c);
        containsDigit |= Character.isDigit(c);
        containsSpecialChar |= someMethodForDetectingIfItIsSpecial(c);
        // Assumption: whitespace is what counts as invalid; adjust as needed.
        containsInvalid |= Character.isWhitespace(c);
    }
    return containsLowerCase &&
            containsUpperCase &&
            containsSpecialChar &&
            containsDigit &&
            !containsInvalid &&
            password.length() >= 8 && password.length() <= 20;
}
You'd need to decide the best way to detect a special character (specialCharArray.contains(c), regular expression, etc).
However, this approach would make adding new rules a lot simpler.
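I may be wrong, but if you simply don't want spaces, then use [^\\s] instead of . in your pattern. The place that matters most is the final .{8,20}: it consumes the whole password, and . matches a space. Changing only the lookaheads is not enough, because a password whose required characters all appear before its first space would still pass.
String PASSWORD_PATTERN_ADVANCED =
"^(?=[^\\s]*\\d)"
+ "(?=[^\\s]*[a-z])"
+ "(?=[^\\s]*[A-Z])"
+ "(?=[^\\s]*[\\\\##$¤£µ§%&<>,.!:?;~{-|`'_^¨éèçàù)=}()°\"\\]\\[²³*/+])"
+ "[^\\s]{8,20}$";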
None of your conditions are stating what can't be in the password, only what must. You need one more condition that combines all the possible valid characters and makes sure all characters in the password are in that list (i.e., (\d|[a-z]|[A-Z]|##$...){8,20} as the final condition). Either that or a list of rejected characters.
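A minimal sketch of that idea follows. It keeps the four "must contain" lookaheads and replaces the final .{8,20} with a character class listing everything that is allowed. The special-character set here is a deliberately shortened, hypothetical one (#$%&!?) to keep the example readable; the real list from the question would go in its place. The class name PasswordWhitelistDemo is also made up.

```java
import java.util.regex.Pattern;

class PasswordWhitelistDemo {
    // Hypothetical, shortened set of allowed special characters.
    private static final String SPECIAL = "#$%&!?";

    private static final Pattern PASSWORD = Pattern.compile(
            "^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[" + SPECIAL + "])"
            // Final condition: every character must come from the allowed set.
            + "[\\da-zA-Z" + SPECIAL + "]{8,20}$");

    static boolean isValid(String password) {
        return PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("aB1#cdefg"));  // true
        System.out.println(isValid("aB1# cdefg")); // false: space is not in the allowed set
        System.out.println(isValid("abcdefgh"));   // false: no digit, uppercase or special char
    }
}
```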