Split UK postcode into two main parts using java


This regular expression for validating postcodes works perfectly:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})$
but I want to split the postcodes to retrieve the individual parts of each postcode using Java.
How can this be done in Java?

Here are the official regexes for matching UK postcodes:
http://interim.cabinetoffice.gov.uk/media/291370/bs7666-v2-0-xsd-PostCodeType.htm
If you want to split a found postcode into its two parts, isn't it simply a question of splitting on whitespace? A UK postcode's two parts are just separated by a space, right? In Java this would be:
String[] fields = postcode.split("\\s");
where postcode is a validated postcode and fields[] will be an array of length 2 containing the first and second parts. This assumes exactly one space between the parts: the validation regex in the question also accepts postcodes with no space, and those would give an array of length 1.
Edit: If this is to validate user input and you want to validate the first part, your regex would be:
Pattern firstPart = Pattern.compile("[A-Z]{1,2}[0-9R][0-9A-Z]?");
To validate the second part (a digit followed by two letters other than C, I, K, M, O or V), it is:
Pattern secondPart = Pattern.compile("[0-9][A-Z&&[^CIKMOV]]{2}");
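A minimal sketch, assuming the two parts may or may not be separated by a single space: the first-part and second-part patterns can be combined into one regex with capture groups, so the split works in both cases. The class and method names are hypothetical, and the input is upper-cased first because both patterns only accept capital letters.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeSplitter {
    // Group 1: outward (first) part; optional single space; group 2: inward (second) part.
    // [A-Z&&[^CIKMOV]] is Java's class intersection: A-Z except C, I, K, M, O and V.
    private static final Pattern POSTCODE =
            Pattern.compile("^([A-Z]{1,2}[0-9R][0-9A-Z]?) ?([0-9][A-Z&&[^CIKMOV]]{2})$");

    /** Returns {first, second}, or null if the input does not match the pattern. */
    static String[] split(String postcode) {
        Matcher m = POSTCODE.matcher(postcode.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String pc : new String[] { "SW1A 1AA", "m11ae", "EC1A1BB", "GIR 0AA" }) {
            String[] parts = split(pc);
            System.out.println(pc + " -> " + parts[0] + " | " + parts[1]);
        }
    }
}
```

Because the space is optional, the regex engine decides where the first part ends by backtracking: the second part is always the final digit plus two letters.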

I realise that it's been a long time since this question was asked, but I had the same requirement and thought I'd post my solution (written in C#) in case it helps someone out there:
const string matchString = @"^(?<Primary>([A-Z]{1,2}[0-9]{1,2}[A-Z]?))(?<Secondary>([0-9]{1}[A-Z]{2}))$";
var regEx = new Regex(matchString);
var match = regEx.Match(Postcode);
var postcodePrimary = match.Groups["Primary"].Value;
var postcodeSecondary = match.Groups["Secondary"].Value;
This doesn't validate the postcode, but it does split it into its two parts when no space has been entered between them.
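The snippet above is C#; since the question asks about Java, here is a sketch of the same idea in Java. Java (7 and later) supports named groups with the same (?<name>...) syntax. The pattern is slightly simplified (the redundant inner groups and {1} are dropped), and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeNamedGroups {
    // Same shape as the C# pattern: no space allowed between the two parts.
    private static final Pattern POSTCODE = Pattern.compile(
            "^(?<Primary>[A-Z]{1,2}[0-9]{1,2}[A-Z]?)(?<Secondary>[0-9][A-Z]{2})$");

    /** Returns {primary, secondary}, or null when the input does not match (e.g. it contains a space). */
    static String[] split(String postcode) {
        Matcher m = POSTCODE.matcher(postcode);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("Primary"), m.group("Secondary") };
    }

    public static void main(String[] args) {
        String[] parts = split("SW1A1AA");
        System.out.println(parts[0] + " " + parts[1]);
    }
}
```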

You can use Google's recently open-sourced library for this: http://code.google.com/p/libphonenumber/

Related

replace char in specific pattern or after specific character

The task at hand is to replace "-" with "/" in a birthday format e.g. 03-12-89 -> 03/12/89. However, the "-" must be able to appear elsewhere in the string e.g. "My-birthday-is-on-the: 03/12/89".
I have tried creating substrings, replacing the "-" in the birthday part and then combining the strings again. However, that solution is inflexible and fails the test cases.
I think I should be able to do this with a regular expression, but I can't seem to construct it. So now I'm back to: String newStr = input.replace("-", "/"); which replaces all instances of "-", which I don't want.
Can anyone help?

You can use the following regex:
(?<=\d{2})-
with replacement \/ (no need to escape it in Java)
INPUT:
My-birthday-is-on-the: 03-12-89
OUTPUT:
My-birthday-is-on-the: 03/12/89
Code:
String input = "My-birthday-is-on-the: 03-12-89";
System.out.println(input.replaceAll("(?<=\\d{2})-", "/"));
OUTPUT:
My-birthday-is-on-the: 03/12/89

The easiest way that comes to mind is to match \d{2}-\d{2}-\d{2} with capture groups, then use the captured numbers to rebuild the birthdate the way you want it. Something like this:
String input = "My-birthday-is-on-the: 03-12-89";
input = input.replaceAll("\\b(\\d{2})-(\\d{2})-(\\d{2})\\b", "$1/$2/$3");
The advantage of specifying the full pattern is that it avoids matching anything other than a six-digit, dash-separated birthday.
Edit:
Based on your comment below, it sounds like you may want to do this replacement on any number made of three dash-separated groups, with any number of digits in each group. In that case, we can slightly modify the above code:
String input = "Your policy number is: 123-45-6789.";
input = input.replaceAll("\\b(\\d+)-(\\d+)-(\\d+)\\b", "$1/$2/$3");
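A self-contained sketch of both variants from this answer (the class and method names are hypothetical). The \b word boundaries keep the strict version from touching numbers whose groups are longer than two digits.

```java
class DashToSlash {
    // Strict: exactly two digits in each of the three dash-separated groups, e.g. 03-12-89.
    static String fixBirthday(String input) {
        return input.replaceAll("\\b(\\d{2})-(\\d{2})-(\\d{2})\\b", "$1/$2/$3");
    }

    // Relaxed (from the edit): any number of digits in each group, e.g. 123-45-6789.
    static String fixAnyDigits(String input) {
        return input.replaceAll("\\b(\\d+)-(\\d+)-(\\d+)\\b", "$1/$2/$3");
    }

    public static void main(String[] args) {
        System.out.println(fixBirthday("My-birthday-is-on-the: 03-12-89"));
        System.out.println(fixAnyDigits("Your policy number is: 123-45-6789."));
        System.out.println(fixBirthday("123-45-6789")); // left unchanged by the strict version
    }
}
```

The dashes in "My-birthday-is-on-the" survive because neither pattern matches anything without digits on both sides of each dash.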

Using regex in Java to extract string after first comma and before two capital letters and a comma

I am currently working with strings that follow this format:
4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125
and I am trying to use regex to extract all the words and numbers as separate strings (as in MI, Wood, 33.0 and so forth), with one exception: I want to treat the part that follows the first comma as a single string, up to the all-caps field, so the regex would extract this:
[4] [Matt, Hopkins] [MI] [5.75] [Wood] and so forth.
Note that the name part can contain no commas at all, i.e. [Hopkins], or more than one, i.e. [Matt, Jr., Hopkins]. The all-caps field describes a state and so always follows the same format.
I do not understand Regex well enough to do that - so far I only came up with
[a-zA-Z(?:\d*\.)?\d+-]+
which handles all fields alright, except the name.

You can do something like (my Java is a bit rusty and I'm posting this from a phone):
String[] values = data.split(",(?! )");
Java allows splitting a string on a regex, and this simple specimen uses a negative lookahead to ensure that you're only splitting on CSV commas, rather than the ones in names.
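A runnable version of that split, as a sketch. It relies on the assumption that every comma inside a name is followed by a space and that no field-separating comma is; if the data ever breaks that rule, the split will be wrong. The class name is hypothetical.

```java
class NameAwareSplit {
    // ",(?! )" matches a comma NOT followed by a space (negative lookahead),
    // so "Matt, Hopkins" and "Matt, Jr., Hopkins" each stay in one field.
    static String[] split(String data) {
        return data.split(",(?! )");
    }

    public static void main(String[] args) {
        String row = "4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125";
        for (String value : split(row)) {
            System.out.println("[" + value + "]");
        }
    }
}
```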

Using regex might just make things harder for yourself here.
This looks like CSV data. You can use a CSV library to correctly parse this into individual fields (*):
String[] fields = YourCsvLibrary.parseRow(string); // or string.split(","), maybe.
and then recombine the fields as appropriate. For example, your regex's logic can be expressed via the following code:
String[] output = Arrays.copyOfRange(fields, 1, fields.length);
output[0] = fields[0];
output[1] = fields[1] + "," + fields[2];
(*) String.split(",") might work, provided the field data doesn't contain quotes, commas, newlines, etc.
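A self-contained sketch of the recombination step, using plain String.split(",") in place of a CSV library (so the caveat in the footnote applies). Like the code above, it assumes the name contains exactly one comma; the class and method names are hypothetical.

```java
import java.util.Arrays;

class RecombineFields {
    static String[] recombine(String row) {
        String[] fields = row.split(",");
        // Copy everything from index 1 on (one slot shorter than fields),
        // then restore the id in [0] and rejoin the two name pieces in [1].
        String[] output = Arrays.copyOfRange(fields, 1, fields.length);
        output[0] = fields[0];
        output[1] = fields[1] + "," + fields[2];
        return output;
    }

    public static void main(String[] args) {
        String row = "4,Matt, Hopkins,MI,5.75,Wood,33.0,2.25,2.1,2016-09-02,74.25,69.3,8.254125,151.804125";
        System.out.println(Arrays.toString(recombine(row)));
    }
}
```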

Regex for matching multiple date formats?

Sorry if this is a noob question but I'm not very comfortable with regex and (as of now) this is a little beyond my understanding.
My dilemma is that we have a variety of ID badges that get scanned into an Android application, and I'm trying to parse out some dates.
For example, some dates are represented like so:
"ISS20141231" format = yyyyMMdd desired output = "20141231"
"ISS12312014" format = MMddyyyy desired output = "12312014"
"ISS12-31-2014" format = MM-dd-yyyy desired output = "12312014"
currently I have a regex pattern:
Pattern p = Pattern.compile("ISS(\\d{8})");
Matcher m = p.matcher(scanData);
which worked fine for the first two examples but recently I have realized that we also occasionally have dates which use dashes (or slashes) as separators.
Is there an efficient means for extracting these dates without having to write multiple patterns and loop through each one checking for a match?
possibly similar to: "ISS([\d{8} (\d{2}\w\d{2}\w\d{4}) (\d{4}\w\d{2}\w\d{2})])"
Thanks!!
[EDIT]
Just to make things a little bit more clear. The substring ("ISSMMddyyyy") is from a much larger string and could be located anywhere within it. So regex must search the original (200+ byte) string for a match.

If that date string is actually a substring of a larger string, and so you need the regex in order to also search for that pattern, you could modify your regex to be:
ISS([\\d\\-/]{8,10})
And then when retrieving the capture group, strip the hyphens and slashes.
String dateStr = m.group(1).replaceAll("[/\\-]", "");
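Put together, the search and clean-up might look like the sketch below (class and method names are hypothetical). Note that find() searches anywhere in the longer scan string, and that the greedy {8,10} can pick up extra digits, hyphens or slashes if the date is immediately followed by them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BadgeDate {
    // "ISS" followed by 8 to 10 characters that are digits, hyphens or slashes.
    private static final Pattern DATE = Pattern.compile("ISS([\\d\\-/]{8,10})");

    /** Returns the date digits with separators removed, or null if no "ISS" date is found. */
    static String extract(String scanData) {
        Matcher m = DATE.matcher(scanData);
        if (!m.find()) {
            return null;
        }
        return m.group(1).replaceAll("[/\\-]", "");
    }

    public static void main(String[] args) {
        System.out.println(extract("...header...ISS12-31-2014 more data"));
        System.out.println(extract("...header...ISS20141231 more data"));
    }
}
```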

You can do two replacements, i.e. remove ISS first and then remove / or -:
str = str.replaceFirst("^ISS", "").replaceAll("[/-]", "");

Or, to use only one regex:
Search: ISS([0-9]+)([-./])([0-9]+)([-./])([0-9]+)
Replace: $1$3$5
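In Java, the single-regex search and replace could be sketched as follows, assuming each number is a run of one or more digits. Java's replacement syntax refers to numbered groups as $1, $3 and $5 (a numeric ${1} is rejected, since ${...} is reserved for named groups). The class and method names are hypothetical.

```java
class BadgeDateReplace {
    // Groups 1, 3 and 5 are the digit runs; groups 2 and 4 are the separators, which are dropped
    // along with the "ISS" prefix.
    static String normalize(String s) {
        return s.replaceAll("ISS([0-9]+)([-./])([0-9]+)([-./])([0-9]+)", "$1$3$5");
    }

    public static void main(String[] args) {
        System.out.println(normalize("ISS12-31-2014"));
        System.out.println(normalize("id:ISS12.31.2014;"));
    }
}
```

Dates without separators, such as ISS20141231, do not match this pattern and are left as they are, prefix included.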

Extracting fields from .csv file input line

I'm very new to the world of Java programming, and although I know this is a ridiculously easy question, I can't seem to phrase my searches in a way that turns up the answer I need...so hopefully someone from this community won't mind helping me.
My program needs to take an input line from a .csv file and split it into the fields of an array, using commas as delimiters. The fields of the array are then assigned to variables of different data types: char, int, float, and String. What I'm struggling with is the formatting for my String variables.
Here is part of my code:
public void parseCSV(String inputLine) {
String[] splitFields;
splitFields = inputLine.split(",");
try {
empNumber = Integer.parseInt(splitFields[0]);
payType = splitFields[1].charAt(0);
hourlyRate = Float.parseFloat(splitFields[2]);
last name =
I need to assign variable lastName, a String data type, to position 3 of my splitFields array. I just don't know how to format it. Help would be greatly appreciated!

A warning on your overall approach
Go with the other answers if you're doing a homework assignment with a simple csv file, but splitting a String on the comma character , will not work for more complicated CSVs. Example:
"Roberts, John", Chicago
This should be read as two cells, where the first is Roberts, John. Naive splitting on , will read this as three cells: '"Roberts', ' John"' and ' Chicago', with the quote characters and leading spaces left attached.
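To make the problem concrete, here is a deliberately minimal, hypothetical splitter that ignores commas inside double quotes. It is only an illustration of why quoting matters: it does not handle escaped quotes ("") or line breaks inside fields, which is the kind of case a mature CSV library takes care of.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {
    /** Splits on commas outside double quotes, drops the quotes and trims each cell. */
    static List<String> split(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle quoted state; the quote is not kept
            } else if (c == ',' && !inQuotes) {
                cells.add(current.toString().trim());
                current.setLength(0);            // start the next cell
            } else {
                current.append(c);
            }
        }
        cells.add(current.toString().trim());    // last cell has no trailing comma
        return cells;
    }

    public static void main(String[] args) {
        String line = "\"Roberts, John\", Chicago";
        System.out.println(line.split(",").length);  // naive split: 3 cells
        System.out.println(split(line).size());      // quote-aware: 2 cells
        System.out.println(split(line).get(0));      // Roberts, John
    }
}
```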
What you should be doing (for robust code)
If you're writing serious/production-level code, you should use the Apache Commons CSV library to parse CSVs. There are enough tricky issues with commas and quotes, and enough variation in possible formats, that it makes sense to use a mature library. There's no reason to reinvent the wheel.
Another tool for parsing text
If you're a beginner, this might be opening up a can of worms, but a powerful tool for parsing/validating text input is "regular expressions." Regular expressions can be used to match a string against a pattern and to extract portions of a string. Once you have extracted a String from a specific cell of a csv, you could use a regular expression to validate that the String is in the format you're expecting.
While you're unlikely to really need regular expressions for this project, I thought I'd mention it.
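As a small illustration of that idea, the sketch below checks an extracted cell against a pattern before parsing it. The rate format (digits with an optional decimal part of one or two digits) and the names CellValidator and isValidRate are hypothetical, chosen only for the example.

```java
import java.util.regex.Pattern;

class CellValidator {
    // Hypothetical rule: one or more digits, optionally "." plus one or two digits (e.g. "15", "15.75").
    private static final Pattern RATE = Pattern.compile("\\d+(\\.\\d{1,2})?");

    static boolean isValidRate(String cell) {
        // matches() requires the whole (trimmed) cell to fit the pattern.
        return RATE.matcher(cell.trim()).matches();
    }

    public static void main(String[] args) {
        for (String cell : new String[] { "15.75", " 15 ", "abc", "1.234" }) {
            System.out.println("'" + cell + "' valid? " + isValidRate(cell));
        }
    }
}
```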

String.split(...) returns a String[] so you really can just assign a specific index to a String.
String s = "one two dog!";
String[] sa = s.split(" ");
String ns = sa[1]; // ns now equals "two"
so you can just:
last_name = splitFields[index]; // this will work fine as long as index is within the `array` bounds.
Please note that your last name variable has a space in its name (that might have been your problem).
I also recommend being careful with the parse calls: Integer.parseInt(...) and Float.parseFloat(...) will throw a NumberFormatException if the text isn't a valid number.

Easy, it is already a String, so you do not have to perform additional parsing. The following assignment will do the trick:
lastName = splitFields[3];
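Putting the pieces together, a minimal sketch of the whole method might look like this. The enclosing class Employee and the field types are assumptions based on the question's description (int, char, float, String); the trim() calls guard against stray spaces around commas, and the catch block simply reports lines that cannot be parsed.

```java
class Employee {
    int empNumber;
    char payType;
    float hourlyRate;
    String lastName;

    void parseCSV(String inputLine) {
        String[] splitFields = inputLine.split(",");
        try {
            empNumber = Integer.parseInt(splitFields[0].trim());
            payType = splitFields[1].trim().charAt(0);
            hourlyRate = Float.parseFloat(splitFields[2].trim());
            lastName = splitFields[3].trim(); // already a String, so no parsing is needed
        } catch (NumberFormatException | IndexOutOfBoundsException e) {
            // Bad number, missing field or empty pay type: report and leave the fields as they were.
            System.err.println("Could not parse line: " + inputLine + " (" + e + ")");
        }
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.parseCSV("1001,H,15.75,Smith");
        System.out.println(e.empNumber + " " + e.payType + " " + e.hourlyRate + " " + e.lastName);
    }
}
```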

How to extract substrings from a string in java

I am not very confident in Java, so I need some help extracting multiple substrings from a string. The string is given below.
I have a text file with possibly thousands of similar POS-tagged lines, and I need to extract the original text from them. I have tried using a tokenizer but didn't really get the result I wanted. I tried using Pattern and Matcher, and I am having problems with the regex.
String s = "I_PRP recently_RB purchased_VBD this_DT camera_NN";
I want to get the output: I recently purchased this camera.
I use
Regex: [\/](.*?)\s\b
But it's not working. Please help me.

try
String s= "I_PRP recently_RB purchased_VBD this_DT camera_NN";
s = s.replaceAll("_\\w+(?=(\\s|$))", "");
System.out.println(s);
prints
I recently purchased this camera

It seems that the suffix is a tag indicating the word type (e.g. noun, verb or pronoun). If the suffix will always be capital letters, it is safer to use the following regex in your replaceAll:
s = s.replaceAll("_[A-Z]+(?=(\\s|$))", "");
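Both suggestions as a self-contained sketch (class and method names are hypothetical). The difference shows up on a word that itself contains an underscore: \w also matches "_", so the first regex swallows everything from the first underscore, while the capital-letters version removes only the trailing tag.

```java
class StripPosTags {
    // Removes "_" plus word characters when followed by whitespace or the end of the input.
    static String stripAny(String s) {
        return s.replaceAll("_\\w+(?=(\\s|$))", "");
    }

    // Stricter: the tag after "_" must consist of capital letters only.
    static String stripUpper(String s) {
        return s.replaceAll("_[A-Z]+(?=(\\s|$))", "");
    }

    public static void main(String[] args) {
        String tagged = "I_PRP recently_RB purchased_VBD this_DT camera_NN";
        System.out.println(stripAny(tagged));
        System.out.println(stripUpper(tagged));
        System.out.println(stripAny("snake_case_NN") + " vs " + stripUpper("snake_case_NN"));
    }
}
```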
