How to remove unbalanced/unpartnered double quotes (in Java)

I'd like to share this relatively tricky problem with everyone here.
I am trying to remove unbalanced (unpaired) double quotes from a string.
My work is in progress and I may be close, but I don't have a working solution yet: I cannot remove all of the unpaired double quotes from the string.
Example Input
string1=injunct! alter ego."
string2=successor "alter ego" single employer" "proceeding "citation assets"
Output Should be
string1=injunct! alter ego.
string2=successor "alter ego" single employer proceeding "citation assets"
This problem sounds similar to
Using Java remove unbalanced/unpartnered parenthesis
Here is my code so far (it doesn't delete all the unpaired double quotes)
private String removeUnattachedDoubleQuotes(String stringWithDoubleQuotes) {
String firstPass = "";
String openingQuotePattern = "\\\"[a-z0-9\\p{Punct}]";
String closingQuotePattern = "[a-z0-9\\p{Punct}]\\\"";
int doubleQuoteLevel = 0;
for (int i = 0; i < stringWithDoubleQuotes.length() - 3; i++) {
String c = stringWithDoubleQuotes.substring(i, i + 2);
if (c.matches(openingQuotePattern)) {
doubleQuoteLevel++;
firstPass += c;
}
else if (c.matches(closingQuotePattern)) {
if (doubleQuoteLevel > 0) {
doubleQuoteLevel--;
firstPass += c;
}
}
else {
firstPass += c;
}
}
String secondPass = "";
doubleQuoteLevel = 0;
for (int i = firstPass.length() - 1; i >= 0; i--) {
String c = stringWithDoubleQuotes.substring(i, i + 2);
if (c.matches(closingQuotePattern)) {
doubleQuoteLevel++;
secondPass = c + secondPass;
}
else if (c.matches(openingQuotePattern)) {
if (doubleQuoteLevel > 0) {
doubleQuoteLevel--;
secondPass = c + secondPass;
}
}
else {
secondPass = c + secondPass;
}
}
String result = secondPass;
return result;
}

It could probably be done in a single regex if there is no nesting.
There is a roughly defined concept of delimiters, and it is possible to 'bias'
those rules to get a better outcome.
It all depends on what rules are set forth. This regex takes into account
three possible scenarios, in order:
Valid Pair
Invalid Pair (with bias)
Invalid Single
It also doesn't match a pair of quotes across a line break. But it does process multiple
lines combined as a single string. To change that, remove \n where you see it.
global context - raw find regex
shortened
(?:("[a-zA-Z0-9\p{Punct}][^"\n]*(?<=[a-zA-Z0-9\p{Punct}])")|(?<![a-zA-Z0-9\p{Punct}])"([^"\n]*)"(?![a-zA-Z0-9\p{Punct}])|")
replacement grouping
$1$2 or \1\2
Expanded raw regex:
(?: // Grouping
// Try to line up a valid pair
( // Capt grp (1) start
" // "
[a-zA-Z0-9\p{Punct}] // 1 of [a-zA-Z0-9\p{Punct}]
[^"\n]* // 0 or more non- [^"\n] characters
(?<=[a-zA-Z0-9\p{Punct}]) // 1 of [a-zA-Z0-9\p{Punct}] behind us
" // "
) // End capt grp (1)
| // OR, try to line up an invalid pair
(?<![a-zA-Z0-9\p{Punct}]) // Bias, not 1 of [a-zA-Z0-9\p{Punct}] behind us
" // "
( [^"\n]* ) // Capt grp (2) - 0 or more non- [^"\n] characters
" // "
(?![a-zA-Z0-9\p{Punct}]) // Bias, not 1 of [a-zA-Z0-9\p{Punct}] ahead of us
| // OR, this single " is considered invalid
" // "
) // End Grouping
Perl testcase (don't have Java)
$str = '
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
';
print "\n'$str'\n";
$str =~ s
/
(?:
(
"[a-zA-Z0-9\p{Punct}]
[^"\n]*
(?<=[a-zA-Z0-9\p{Punct}])
"
)
|
(?<![a-zA-Z0-9\p{Punct}])
"
( [^"\n]* )
" (?![a-zA-Z0-9\p{Punct}])
|
"
)
/$1$2/xg;
print "\n'$str'\n";
Output
'
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
'
'
string1=injunct! alter ego.
string2=successor "alter ego" single employer "a" free proceeding "citation assets"
'

You could use something like (Perl notation):
s/("(?=\S)[^"]*(?<=\S)")|"/$1/g;
Which in Java would be:
str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
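For anyone who wants to try the one-liner above, here is a minimal self-contained sketch. The class name QuoteCleaner and the method name removeUnpairedQuotes are made up for illustration; only the regex and the "$1" replacement come from the answer. It assumes a quote pair counts as valid only when the quoted text starts and ends with a non-whitespace character.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the original answer.
final class QuoteCleaner {

    // Group 1: a quote pair whose content starts and ends with a non-whitespace character.
    // Second alternative: any other (stray) double quote.
    private static final Pattern QUOTES =
            Pattern.compile("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"");

    static String removeUnpairedQuotes(String input) {
        // "$1" keeps valid pairs; for a stray quote group 1 did not match,
        // so nothing is written in its place.
        return QUOTES.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeUnpairedQuotes("injunct! alter ego.\""));
        System.out.println(removeUnpairedQuotes(
                "successor \"alter ego\" single employer\" \"proceeding \"citation assets\""));
    }
}
```

With the two example inputs from the question this should print the two expected output strings. Note that an empty pair ("") is treated as valid under this rule.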


Java match two strings without last character

I have a URL whose path is /mypath/check/10.10/-123.11. I want a comparison to return true if there are (optionally) 3 digits after a decimal point instead of 2, e.g. /mypath/check/10.101/-123.112 should return true when matched. The part before the decimal point must match exactly in both occurrences.
Some examples:
Success
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.112
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.11
/mypath/check/10.10/-123.11 = /mypath/check/10.10/-123.112
/mypath/check/10.10/123.11 = /mypath/check/10.101/123.112
.. and so forth
Failure :
/mypath/check/10.10/-123.11 != /mypath/check/10.121/-123.152
/mypath/check/10.11/-123.11 != /mypath/check/10.12/-123.11
The number before the decimal point can have an optional leading - followed by 1 to 3 digits.
Try /mypath/check/10\.10/-?123\.11[ ]*=[ ]*/mypath/check/(\d\d)\.\1\d?/
demo
Try this:
url1.equals(url2) || url1.equals(url2.replaceAll("\\d$", ""))
Idea
Regex subpatterns that should match optionally are suffixed with the ? quantifier. In your case this applies to the 3rd digit after a decimal point.
An equality test modulo that optional digit can be implemented by matching each occurrence of the context pattern and replacing the optional part within the match with the empty string. After this normalization the strings can be tested for equality.
Code
// Initializing test data.
// Will compare Strings in batch1, batch2 at the same array position.
//
String[] batch1 = {
"/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.11/-123.11"
};
String[] batch2 = {
"/mypath/check/10.101/-123.112"
, "/mypath/check/10.101/-123.11"
, "/mypath/check/10.10/-123.112"
, "/mypath/check/10.101/123.112"
, "/mypath/check/10.121/-123.152"
, "/mypath/check/10.12/-123.11"
};
// Regex pattern used for normalization:
// - Basic pattern: decimal point followed by 2 or 3 digits
// - Optional part: 3rd digit of the basic pattern
// - Additional context: Pattern must match at the end of the string or be followed by a non-digit character.
//
Pattern re_p = Pattern.compile("([.][0-9]{2})[0-9]?(?:$|(?![0-9]))");
// Replacer routine for processing the regex match. Returns capture group #1.
// Note: Matcher.replaceAll(Function) requires Java 9 or later.
Function<MatchResult, String> fnReplacer= (MatchResult m)-> { return m.group(1); };
// Processing each test case
// Expected result
// match
// match
// match
// match
// mismatch
// mismatch
//
for ( int i = 0; i < batch1.length; i++ ) {
String norm1 = re_p.matcher(batch1[i]).replaceAll(fnReplacer);
String norm2 = re_p.matcher(batch2[i]).replaceAll(fnReplacer);
if (norm1.equals(norm2)) {
System.out.println("Url pair #" + Integer.toString(i) + ": match ( '" + norm1 + "' == '" + norm2 + "' )");
} else {
System.out.println("Url pair #" + Integer.toString(i) + ": mismatch ( '" + norm1 + "' != '" + norm2 + "' )");
}
}
Demo available here (ideone.com).
I'm assuming the first URL always has exactly 2 digits after every decimal point. If so, match the 2nd URL to the regex formed by appending an optional digit to the end of each decimal fraction in the first URL.
static boolean matchURL(String url1, String url2)
{
return url2.matches(url1.replaceAll("([.][0-9]{2})", "$1[0-9]?"));
}
Test:
String url1 = "/mypath/check/10.10/-123.11";
List<String> tests = Arrays.asList(
"/mypath/check/10.10/-123.11",
"/mypath/check/10.10/-123.111",
"/mypath/check/10.101/-123.11",
"/mypath/check/10.101/-123.111",
"/mypath/check/10.11/-123.11"
);
for(String url2 : tests)
System.out.format("%s : %s = %b%n", url1, url2, matchURL(url1, url2));
Output:
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.11/-123.11 = false
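One caveat with building the regex directly from url1: the dots in the URL are left unescaped, so they match any character (10x10 would be accepted for 10.10). A possible variant, sketched below, quotes the literal parts with Pattern.quote. The class name QuotedUrlMatcher is hypothetical, and it keeps the same assumption that the first URL always has exactly 2 digits after every decimal point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of matchURL; names are illustrative only.
final class QuotedUrlMatcher {

    private static final Pattern TWO_DECIMALS = Pattern.compile("[.][0-9]{2}");

    static boolean matchUrl(String url1, String url2) {
        StringBuilder regex = new StringBuilder();
        Matcher m = TWO_DECIMALS.matcher(url1);
        int last = 0;
        while (m.find()) {
            // Quote everything up to and including the 2-digit fraction,
            // then allow one optional extra digit.
            regex.append(Pattern.quote(url1.substring(last, m.end())));
            regex.append("[0-9]?");
            last = m.end();
        }
        // Quote whatever follows the last fraction (possibly nothing).
        regex.append(Pattern.quote(url1.substring(last)));
        return url2.matches(regex.toString());
    }

    public static void main(String[] args) {
        String url1 = "/mypath/check/10.10/-123.11";
        System.out.println(matchUrl(url1, "/mypath/check/10.101/-123.111"));
        System.out.println(matchUrl(url1, "/mypath/check/10x10/-123.11"));
    }
}
```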

How do I stop regex after finding "Message: "?

I'm splitting the body of a JSON message with the regex ":|\n" and storing the values in an array. I would like help stopping the regex from splitting the message once it finds "Message: ".
In the JSON body, each section is separated by a new line, so the body looks similar to this:
{"body": "Name: Alfred Alonso\nCompany: null\nEmail: 123#abc.com\nPhone Number: 123-456-9999\nProject Type: Existing\nContact by: Email\nTime Frame: within 1 month\nMessage: Hello,\nThis is my message.\nThank You,\nJohn Doe"}
The code below works perfectly when the user doesn't create a new line within the message, so the entire message gets stored as one array value.
Thank you to anyone that can help me fix this!
String[] messArr = body.split(":|\n");
for (int i = 0; i < messArr.length; i++)
messArr[i] = messArr[i].trim();
if ("xxx".equals(eventSourceARN)) {
name = messArr[1];
String[] temp;
String delimiter = " ";
temp = name.split(delimiter);
name = temp[0];
String lastName = temp[1];
company = messArr[3];
email = messArr[5];
phoneNumber = messArr[7];
projectType = messArr[9];
contactBy = messArr[11];
timeFrame = messArr[13];
message = messArr[15];
I would like
messArr[14] = "Message"
messArr[15] = "Hello, This is my message. Thank you, John Doe"
This is what I get
[..., Message, Hello,, This is my message., Thank You, John Doe].
messArr[14] = "Message"
messArr[15] = "Hello,"
messArr[16] = "This is my message."
messArr[17] = "Thank You,"
messArr[18] = "John Doe"
Instead of using split, you can use a find loop, e.g.
Pattern p = Pattern.compile("([^:\\v]+): |((?<=Message: )(?s:.*)|(?!$).*)\\R?");
List<String> result = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
result.add(m.start(1) != -1 ? m.group(1) : m.group(2));
Test
String input = "Name: Alfred Alonso\n" +
"Company: null\n" +
"Email: 123#abc.com\n" +
"Phone Number: 123-456-9999\n" +
"Project Type: Existing\n" +
"Contact by: Email\n" +
"Time Frame: within 1 month\n" +
"Message: Hello,\n" +
"This is my message.\n" +
"Thank You,\n" +
"John Doe";
Pattern p = Pattern.compile("([^:\\v]+): |((?<=Message: )(?s:.*)|(?!$).*)\\R?");
List<String> result = new ArrayList<>();
for (Matcher m = p.matcher(input); m.find(); )
result.add(m.start(1) != -1 ? m.group(1) : m.group(2));
for (int i = 0; i < result.size(); i++)
System.out.println("result[" + i + "]: " + result.get(i));
Output
result[0]: Name
result[1]: Alfred Alonso
result[2]: Company
result[3]: null
result[4]: Email
result[5]: 123#abc.com
result[6]: Phone Number
result[7]: 123-456-9999
result[8]: Project Type
result[9]: Existing
result[10]: Contact by
result[11]: Email
result[12]: Time Frame
result[13]: within 1 month
result[14]: Message
result[15]: Hello,
This is my message.
Thank You,
John Doe
Explanation
Match one of:
( Start capture #1
[^:\v]+ Match one or more characters that are not a : or a linebreak
) End capture #1
: Match, but don't capture, a : followed by a single space
| or:
( Start capture #2
Match one of:
(?<=Message: )(?s:.*) Rest of input, i.e. all text including linebreaks, if the text is immediately preceded by "Message: "
| or:
(?!$) Don't match if we're already at end-of-input
.* Match 0 or more characters up to end-of-line, excluding the EOL
) End capture #2
\R? Match, but don't capture, an optional linebreak. This doesn't apply to the Message text, and it is optional in case there is no Message text and no linebreak after the last value
If you want to, you could do exactly what you are doing and then put things together later. As you are trimming, notice where it says Message, then know that the Message is in the next slot and beyond. Then put it back together.
int messagePosition = -1;
for (int i = 0; i < messArr.length; i++){
messArr[i] = messArr[i].trim();
if (i>0 && messArr[i-1].equals("Message")){
messagePosition =i;
}
}
if (messagePosition > -1){
for (int i=messagePosition+1; i <messArr.length; i++){
messArr[messagePosition]=messArr[messagePosition]+" "+messArr[i];
}
}
One downside is that because arrays are fixed size, you need to act as if there is nothing beyond the messagePosition. So any calculations with length will be misleading. If for some reason you are worried you will look in the slots beyond, you could add messArr[i]=""; to the second for loop after the concatenation step.
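To sidestep the leftover slots, the same rejoin idea can copy the result into a shorter array. The following is only a sketch: the class name MessageRejoiner and the method name splitKeepingMessage are made up, the pieces are rejoined with a single space, and any colon inside the message text is still lost because the original split consumes it.

```java
import java.util.Arrays;

// Hypothetical helper; the names are illustrative, not from the question.
final class MessageRejoiner {

    static String[] splitKeepingMessage(String body) {
        String[] parts = body.split(":|\n");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        // Find the slot right after the first "Message" label.
        int messagePosition = -1;
        for (int i = 1; i < parts.length; i++) {
            if (parts[i - 1].equals("Message")) {
                messagePosition = i;
                break;
            }
        }
        if (messagePosition == -1) {
            return parts; // no message section, nothing to rejoin
        }
        // Glue everything from the message slot to the end back together.
        String message = String.join(" ",
                Arrays.copyOfRange(parts, messagePosition, parts.length));
        // Keep the labels and values before the message, plus one slot for the message.
        String[] result = Arrays.copyOf(parts, messagePosition + 1);
        result[messagePosition] = message;
        return result;
    }

    public static void main(String[] args) {
        String body = "Name: Alfred Alonso\nMessage: Hello,\nThis is my message.\nThank You,\nJohn Doe";
        System.out.println(Arrays.toString(splitKeepingMessage(body)));
    }
}
```

Because the returned array ends at the message slot, length-based calculations stay meaningful.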

How to search a CSV file based on an input field?

I am able to read only a small number of lines (41, to be exact); after that I am unable to read any more.
import java.io.File;
import java.util.Scanner;
public class FileReader {
public static void main(String[] args) {
String filePath = "qwe.csv";
System.out.println("Enter the City name to be Searched\n");
Scanner in = new Scanner(System.in);
String searchTerm = in.nextLine();
readRecord(filePath, searchTerm);
}
public static void readRecord( String filePath, String searchTerm ) {
boolean found = false;
String City = ""; String City_Asciis = ""; String Lattitude = "";
String longitude = ""; String Country = "";
String iso_2 = ""; String iso_3 = ""; String Admin_Name = "";
String Capital = ""; String Population = ""; String Id = "";
try {
File file = new File(filePath);
Scanner x = new Scanner (file);
x.useDelimiter("[,\n]"); //to separate the data items
//hasNext - Returns true if the scanner has another token/value in its input
while(x.hasNext() && !found) {
City = x.next();
City_Asciis = x.next();
Lattitude = x.next();
longitude = x.next();
Country = x.next();
iso_2 = x.next();
iso_3 = x.next();
Admin_Name = x.next();
Capital = x.next();
Population = x.next();
Id = x.next();
if (City.equals(searchTerm)) {
found = true;
}
}
if (found) {
System.out.println(" The following details are of city : " + City +"\n The Ascii string would be : "
+ City_Asciis +"\n Its having the lattitude around : "
+ Lattitude + "\n and Longitude of : "+ longitude +"\n It is situated in : "
+ Country +"\n These have iso code like : "+ iso_2 +" and : "+ iso_3 +"\n It comes under : "
+ Admin_Name +" State \n Capital of this city is : "+ Capital +"\n The population is around : "
+ Population +"\n ZIP code is : "+Id+"");
}
else {
System.out.print("Enter the Correct City Name");
}
}
catch(Exception e1){
System.out.print("file not found \n");
e1.printStackTrace();
}
}
}
This code should look up the searched city in the file at the given path and, given a particular city name, print the details of that city.
Who knows?
Without getting a headache, the code itself looks as though it should work, and offhand I can't see why your read would stop at 41 lines without running a series of experiments against actual data. Not many people want to do that, which is why you've been asked to provide some fictitious sample data.
It could be as simple as the fact that you are satisfying the boolean found variable criteria within the while loop condition and the loop breaks out stopping the read. I would suspect this since you do indicate that the "code will load the searched city from the file path given". I should think that this is not really what you would want for the mere reason that some countries contain the same City names. As a matter of fact, some States, Provinces, or Regions within the same Country can contain same City names. As an example did you know that in the United States alone, there are 88 Cities and Towns named Washington? I know, weird right, especially when you consider that there are only 50 States and 2 Territories. Benjamin Franklin was also one of the Founding Fathers of the United States and there are 35 cities and towns/villages that honorably carry the name of Franklin within that country.
If your data file or database is big enough then I'm sure you would want to display all cities that match your specific search criteria. With that being said perhaps what you need to do is get rid of that && !found condition for the while loop. I personally wouldn't use the Scanner#hasNext() method in the while loop condition either. It's an invitation for disaster since it's more focused towards checking the availability of tokens when used in combination with Scanner#next() rather than actual file lines. Use the Scanner#hasNextLine() in combination with the Scanner#nextLine() method then use the String#split() method to parse the CSV comma (,) delimited data lines one line at a time.
Below I provide a runnable Java code example which demonstrates the above mentioned methods. Your readRecord() method is used but considerably modified to accommodate the following options:
Return a List Interface (List<String>) of City information found pertaining to the supplied search criteria.
Ignore (skip past) blank or comment lines within the CSV file. Comment lines can start with # or ;.
The option to ignore letter case during the search.
Allow for selection of the desired City Information field to which the Search Criteria will be applied. The City Information Fields are: City, CityAscii, Latitude, Longitude, Country, ISO2, ISO3, AdminName, Capital, Population, and ID. Wildcard (? and *) characters can be used when supplying the desired search field so that the entire field name does not need to be provided, for example: lat* for Latitude. So, you can do a City Information search simply based on population if you like instead of a City Name.
Allow for wildcard (? and *) characters to be used within the supplied search criteria, for example: wash*. This tells the method to search for any city whose name starts with Wash, like Washington or Washougal or Washtucna.
Allow for a limit on the number of found city instances to be returned.
Below is the runnable code which demonstrates the above mentioned concepts. The code is well commented. There are Regular Expressions used within the code and if you want an explanation of those expressions then copy/paste them into regex101.com.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class CityInfoRecords {
public static void main(String[] args) {
/* The application is started this way so that there
is no need for static methods or variables.
*/
new CityInfoRecords().startApp(args);
}
// Application Start method.
private void startApp(String[] args) {
String ls = System.lineSeparator(); // Not all OS Consoles work well with "\n"
String filePath = "qwe.csv"; // Path and file name of the data file.
Scanner in = new Scanner(System.in);
// Provide the City Info Field to base search from...
System.out.println("Enter the Data Field you want to search by:" + ls
+ "[City, CityAscii, Lattitude, Longitude, Country" + ls
+ "ISO2, ISO3, AdminName, Capital, Population, ID]" + ls
+ "Wildcards (? and *) can be used:");
String searchField = in.nextLine();
// Provide the Search Criteria to find within the supplied City Info Field.
System.out.println(ls + "Enter the search criteria you are looking for" + ls
+ "in " + searchField + ". Wildcards (? and *) are permitted:");
String searchCriteria = in.nextLine();
// Declare a List Interface of String and fill it
// with the call to the readRecord method.
List<String> cityInfoList = readRecord(filePath, searchField, searchCriteria, 0, "N/A");
// Display the returned List to console window.
for (int i = 0; i < cityInfoList.size(); i++) {
System.out.println(cityInfoList.get(i));
}
}
/**
* Returns a List Interface of the City Information found based on the supplied
* search criteria.<br><br>
*
* @param filePath (String) The full path and file name of the data file to read
* containing City information.<br>
*
* @param searchField (String) The City Information Field to base the supplied
* Search Criteria from. Any City Information Field can be supplied here and
* letter case is optional. The wildcard characters (? and *) can also be used
* here so that the entire field name does not need to be supplied, for example:
* <pre>
* lat* for the Latitude field or
* *asc* for the CityAscii field or
* iso? for either the ISO2 or ISO3 fields or simply
* City for the City field.</pre><br>
*
* The <b>?</b> wildcard character specifies any single alphanumeric character,
* as in ?an, which locates "ran," "pan", "can", and "ban".<br><br>
*
* The <b>*</b> wildcard character specifies zero or more of any alphanumeric
* character, as in corp*, which locates "corp", "corporate", "corporation",
* "corporal", and "corpulent".<br>
*
* @param searchCriteria (String) The search criteria string. This can be any
* string you would like to search for within the supplied City Information
* Field. By default letter case is ignored during searches therefore the
* supplied search criteria string does not need to be letter case specific
* however if you want the search to be case specific then set this method's
* optional ignoreLetterCase parameter to false.<br><br>
*
* Wildcard characters (? and *) can also be used within the Search Criteria
* string so as to expand the search to other possibilities, for example if
* the "City" field is supplied and a criteria string like: "wash*" is supplied
* then any city which name starts with "Wash" will have their city information
* returned.<br><br>
*
* The <b>?</b> wildcard character specifies any single alphanumeric character,
* as in ?an, which locates "ran," "pan", "can", and "ban".<br><br>
*
* The <b>*</b> wildcard character specifies zero or more of any alphanumeric
* character, as in corp*, which locates "corp", "corporate", "corporation",
* "corporal", and "corpulent".<br>
*
* @param numberOfFoundToReturn (int) The number of cities whose information
* should be returned. If 0 is supplied then all cities found will be returned.<br>
*
* @param noDataReplacement (String) Sometimes there is no data supplied for a
* specific field within the data file or the file data line may not contain
* the same amount of delimited data. Rather than returning NULL or Null String
* ("") for empty data fields you can supply here what to actually return in
* such a case. "N/A" is a good choice or perhaps: "Nothing Supplied". Whatever
* you like to use can be supplied here.<br>
*
* @param ignoreLetterCase (Optional - Boolean - Default is true) By default
* searches ignore letter case but if you want your search to be letter case
* specific then you can supply boolean false to this optional parameter.<br>
*
* @return (String List Collection) Information for every City found within the
* supplied data file which matches the supplied field and search criteria.
*/
public List<String> readRecord(String filePath, String searchField,
String searchCriteria, int numberOfFoundToReturn,
String noDataReplacement, boolean... ignoreLetterCase) {
String ls = System.lineSeparator(); // Not all OS Consoles work well with "\n" (property)
boolean ignoreCase = true; // Ignore letter case when searching (Default - property)
if (ignoreLetterCase.length > 0) {
ignoreCase = ignoreLetterCase[0];
}
boolean found = false; // Flag to indicate data was found (toggles)
int foundCounter = 0; // Indicates number of same data found (increments)
List<String> returnableList = // The List of found city information that will be returned (collection)
new ArrayList<>();
// City Information Variables (data fields)
String city;
String cityAscii;
String latitude;
String longitude;
String country;
String iso2;
String iso3;
String adminName;
String capital;
String population;
String id;
// Open Scanner to read data file...
// Try With Resources is used here to auto close the reader.
try (Scanner fileReader = new Scanner(new File(filePath))) {
// Iterate through data file...
while (fileReader.hasNextLine()) {
// Read file line by line and remove leading or
// trailing whitespaces, tabs, line breaks, etc.
String cityData = fileReader.nextLine().trim();
// Skip blank or comment lines (comment lines can be lines that start with # or ;)
if (cityData.equals("") || cityData.startsWith("#") || cityData.startsWith(";")) {
continue; // Get next file line
}
// Split the read line on commas, ignoring any whitespace around them.
String[] cityInfo = cityData.split("\\s*,\\s*");
// The number of data pieces split from data line.
// Not all lines may contain the same amount of data.
int i = cityInfo.length;
/* Ternary is used to fill city information variables
so that data not provided will not be null or null string.
As an example, for the city variable this is the same as:
if (i >= 1 && !cityInfo[0].equals("")) {
city = cityInfo[0].trim();
}
else {
city = noDataReplacement;
}
*/
city = (i >= 1 && !cityInfo[0].equals("")) ? cityInfo[0].trim() : noDataReplacement;
cityAscii = (i >= 2 && !cityInfo[1].equals("")) ? cityInfo[1].trim() : noDataReplacement;
latitude = (i >= 3 && !cityInfo[2].equals("")) ? cityInfo[2].trim() : noDataReplacement;
longitude = (i >= 4 && !cityInfo[3].equals("")) ? cityInfo[3].trim() : noDataReplacement;
country = (i >= 5 && !cityInfo[4].equals("")) ? cityInfo[4].trim() : noDataReplacement;
iso2 = (i >= 6 && !cityInfo[5].equals("")) ? cityInfo[5].trim() : noDataReplacement;
iso3 = (i >= 7 && !cityInfo[6].equals("")) ? cityInfo[6].trim() : noDataReplacement;
adminName = (i >= 8 && !cityInfo[7].equals("")) ? cityInfo[7].trim() : noDataReplacement;
capital = (i >= 9 && !cityInfo[8].equals("")) ? cityInfo[8].trim() : noDataReplacement;
population = (i >= 10 && !cityInfo[9].equals("")) ? cityInfo[9].trim() : noDataReplacement;
id = (i >= 11 && !cityInfo[10].equals("")) ? cityInfo[10].trim() : noDataReplacement;
// Determine the city data field we want to search in
String regex;
// Were wildcards used in the supplied Search Field string?
if (searchField.contains("?") || searchField.contains("*")) {
// Yes... Prep regex to get proper search field
regex = searchField.replace("?", ".?").replace("*", ".*?").toLowerCase();
}
else {
regex = "(?i)(" + searchField + ")";
}
// Get proper search field data
String field = "";
if ("city".toLowerCase().matches(regex)) {
field = city;
}
else if ("cityAscii".toLowerCase().matches(regex)) {
field = cityAscii;
}
else if ("latitude".matches(regex) || "lattitude".matches(regex)) { // accept both spellings
field = latitude;
}
else if ("longitude".toLowerCase().matches(regex)) {
field = longitude;
}
else if ("country".toLowerCase().matches(regex)) {
field = country;
}
else if ("iso2".toLowerCase().matches(regex)) {
field = iso2;
}
else if ("iso3".toLowerCase().matches(regex)) {
field = iso3;
}
else if ("adminName".toLowerCase().matches(regex)) {
field = adminName;
}
else if ("capital".toLowerCase().matches(regex)) {
field = capital;
}
else if ("population".toLowerCase().matches(regex)) {
field = population;
}
else if ("id".toLowerCase().matches(regex)) {
field = id;
}
if (field.equals("")) {
System.err.println("Invalid Search Field Name Provided! (" + searchField + ")");
return returnableList;
}
// See if the search criteria contains wildcard characters
// A search can be carried out using wildcards in this method.
if (searchCriteria.contains("?") || searchCriteria.contains("*")) {
// There is...build the required Regular Expression (RegEx) to use.
regex = searchCriteria.replace("?", ".?").replace("*", ".*?");
// See if the data item matches the search criteria ignoring letter case if desired.
// The String.matches() method is used for this and ternary for ignoring letter case.
if (ignoreCase ? field.toLowerCase().matches(regex.toLowerCase()) : field.matches(regex)) {
found = true; // toggle flag to true if there is a match.
}
}
// No wildcard characters in search criteria...
// Ternary is used in condition to handle ignore letter case if desired.
else if (ignoreCase ? field.equalsIgnoreCase(searchCriteria) : field.equals(searchCriteria)) {
found = true; // toggle flag to true if there is a match.
}
// If the 'found' flag has been set to true...
if (found) {
// Add City information to returnable ArrayList
String info = ls + "The following details are of city: " + city + ls
+ "The Ascii string would be: " + cityAscii + ls
+ "It has the approximate Latitude of: " + latitude + ls
+ "And the approximate Longitude of: " + longitude + ls
+ "It is situated in the country of: " + country + ls
+ "The city has iso codes like: " + iso2 + " and: " + iso3 + ls
+ "The State/Province/Region is: " + adminName + ls
+ "Capital of this city is: " + capital + ls
+ // Didn't know cities had capitals
"The population is approximately: " + population + ls
+ "City general ZIP code is: " + id;
returnableList.add(info); // Add to list
found = false; // Toggle found flag back to false in prep to locate more city data.
foundCounter++; // increment the found counter.
// If the First Instance Only flag is true then...
if (numberOfFoundToReturn > 0 && foundCounter == numberOfFoundToReturn) {
// Break out of the 'while' loop. We don't need anymore cities.
break;
}
}
}
// If the Found Counter was not incremented then
// we didn't find any data in file... Inform User.
if (foundCounter == 0) {
System.err.print(ls + "Can not find City Name (" + searchCriteria
+ ") in data file!" + ls);
}
}
catch (FileNotFoundException ex) {
System.err.print("City Data file not found! (" + filePath + ")" + ls);
}
// Return the List of found data.
return returnableList;
}
}
Create a new Java Application Project and name it CityInfoRecords. Copy and paste the above code over top of the Main Startup class. Run the application, read the console prompts carefully and enter the proper data.
The first prompt asks for a City Information Field name...enter: city.
The second prompt will ask for the search criteria for city...enter a city name in Upper or lower case (it doesn't matter). The city information will be displayed in console but only if that city name is contained within the City field within the data file.
Now run the code again and enter the same data except this time, for the city name just provide the first three letters of a city name and an asterisk (*) and then hit the enter key. Now any city information within your specific City Data File which starts with those supplied three letters will be displayed within the Console Window.
Play with it, try different fields to search from and play with the wildcard characters as well with your supplied field or search criteria data.
As a next step, it would be better to turn readRecord into a class of its own instead of a method.
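A side note on the wildcard handling: the code above turns the user input into a regex with plain string replacement, so any regex metacharacter in the input (a dot in "St. Louis", for instance) keeps its regex meaning. A possible alternative, sketched below, quotes every literal character. The class name WildcardMatcher and its methods are hypothetical, and note one deliberate difference: here ? matches exactly one character, as the Javadoc describes, whereas the code above maps it to ".?".

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the CityInfoRecords class above.
final class WildcardMatcher {

    // Translate a ?/* wildcard string into a regex, quoting all other characters.
    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char ch : wildcard.toCharArray()) {
            if (ch == '*') {
                regex.append(".*");   // zero or more of any character
            } else if (ch == '?') {
                regex.append('.');    // exactly one character (assumption, see above)
            } else {
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return regex.toString();
    }

    static boolean matches(String text, String wildcard, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(wildcardToRegex(wildcard), flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Washington", "wash*", true));
        System.out.println(matches("Stx Louis", "st. l*", true));
    }
}
```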

Java: IndexOf(String string) that returns wrong character

I am writing a file browser program that displays file directory path while a user navigates between folders/files.
I have the following String as file path:
"Files > Cold Storage > C > Capital"
I am using Java's indexOf(String) method to get the index of the 'C' character between > C >, but it returns the first occurrence instead, the one in the word Cold.
I need to get the standalone 'C' that sits between > C >.
This is my code:
StringBuilder mDirectoryPath = new StringBuilder("Files > Cold Storage > C > Capital");
String mTreeLevel = "C";
int i = mDirectoryPath.indexOf(mTreeLevel);
if (i != -1) {
mDirectoryPath.delete(i, i + mTreeLevel.length());
}
I need a flexible solution that also works for other similar cases.
Any help is appreciated!
A better approach would be to use a List of Strings.
public void test() {
List<String> directoryPath = Arrays.asList("Files", "Cold Storage", "C", "Capital");
int cDepth = directoryPath.indexOf("C");
System.out.println("cDepth = " + cDepth);
}
Search for the first occurrence of " C " :
String mTreeLevel = " C ";
int i = mDirectoryPath.indexOf(mTreeLevel);
Then add 1 to get the index of 'C' (assuming the String you searched for was found).
If you only want to delete the single 'C' character :
if (i >= 0) {
mDirectoryPath.delete(i + 1, i + 2);
}
EDIT:
If searching for " C " may still return the wrong occurrence, search for " > C > " instead.
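Searching for " > C > " still misses a level that is first or last in the path. A more general option is to walk the path one segment at a time and compare whole segments. The sketch below assumes the separator is always " > "; the class name PathSegments and the method name indexOfSegment are made up for illustration.

```java
// Hypothetical helper; assumes levels are always separated by " > ".
final class PathSegments {

    private static final String SEPARATOR = " > ";

    // Returns the start index of the first segment equal to 'segment', or -1.
    static int indexOfSegment(CharSequence path, String segment) {
        String s = path.toString();
        int start = 0;
        while (start <= s.length()) {
            int end = s.indexOf(SEPARATOR, start);
            if (end == -1) {
                end = s.length(); // last segment runs to the end of the path
            }
            if (s.substring(start, end).equals(segment)) {
                return start;
            }
            start = end + SEPARATOR.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuilder mDirectoryPath = new StringBuilder("Files > Cold Storage > C > Capital");
        String mTreeLevel = "C";
        int i = indexOfSegment(mDirectoryPath, mTreeLevel);
        if (i != -1) {
            mDirectoryPath.delete(i, i + mTreeLevel.length());
        }
        System.out.println(mDirectoryPath);
    }
}
```

Because whole segments are compared, "C" no longer matches inside "Cold Storage" or "Capital".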

Finding errors in LaTeX document, java

So I have a bunch of LaTeX styled documents, one looks like this...
\documentclass{article}
\usepackage{amsmath, amssymb, amsthm}
\begin{document}
{\Large \begin{center} Homework Problems \end{center}}\begin{itemize}\item\end{itemize}
\begin{enumerate}
\item Prove: For all sets $A$ and $B$, $(A - B) \cup
(A \cap B) = A$.
\begin{proof}
\begin{align}
& (A - B) \cup (A \cap B) && \\
& = (A \cap B^c) \cup (A \cap B) && \text{by
Alternate Definition of Set Difference} \\
& = A \cap (B^c \cup B) && \text{by Distributive Law} \\
& = A \cap (B \cup B^c) && \text{by Commutative Law} \\
& = A \cap U && \text{by Union with the Complement Law} \\
& = A && \text{by Intersection with $U$ Law}
\end{align}
\end{proof}
\item If $n = 4k + 3$, does 8 divide $n^2 - 1$?
\begin{proof}
Let $n = 4k + 3$ for some integer $k$. Then
\begin{align}
n^2 - 1 & = (4k + 3)^2 - 1 \\
& = 16k^2 + 24k + 9 - 1 \\
& = 16k^2 + 24k + 8 \\
& = 8(2k^2 + 3k + 1) \text{,}
\end{align}
which is certainly divisible by 8.
\end{proof}
\end{enumerate}
\end{document}
First, I had to read through each document line by line, find all of the "\begin{BLOCK}" and "\end{BLOCK}" commands, and push the BLOCK string onto a Stack; when I found the matching "\end", I would call pop() on the Stack. I pretty much got all that done, it's just not well organized, or at least I think there is a better way to go about it than all my "if" statements. So that is my first question: is there something better than the way I did it?
Next, I want to find errors and report them. For example, if I removed the line "\begin{document}" from the text above, I want the program to run through and do everything it is supposed to, but when it reaches the line "\end{document}", it should report the missing "\begin" command. I got my code to handle other examples, such as removing the "\begin" commands for enumerate or itemize, but I can't get that case to work.
Finally, I want to be able to handle missing "\end" commands. I have made an attempt at it, but I can't quite get the conditions right. Let's say I have this document...
\begin{argument}
\begin{Palin}No it can't. An argument is a connected series of statements intended to establish a proposition.\end{Palin}
\begin{Cleese}No it isn't.\end{Cleese}
\begin{Palin}\expression{exclamation}Yes it is! It's not just contradiction.\end{Palin}
\begin{Cleese}Look, if I argue with you, I must take up a contrary position.\end{Cleese}
\begin{Palin}Yes, but that's not just saying \begin{quotation}'No it isn't.'\end{Palin}
\begin{Cleese}\expression{exclamation}Yes it is!\end{Cleese}
\begin{Palin}\expression{exclamation}No it isn't!\end{Palin}
\begin{Cleese}\expression{exclamation}Yes it is!\end{Cleese}
\begin{Palin}Argument is an intellectual process. Contradiction is just the automatic gainsaying of any statement the other person makes.\end{Palin}
\end{argument}
You'll notice on line 6 there is a "\begin{quotation}" command without an "\end". My code when going through this particular document gives me this as output...
PARSE ERROR Line 6: Missing command \begin{Palin}.
PARSING TERMINATED!
This is obviously not true, but I don't know how to restructure my error handling to get these cases to work. Can anyone provide any help, especially with organizing this code so it is better suited to finding these issues?
-------------------------------------------CODE-------------------------------------------
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.Stack;
import java.util.StringTokenizer;
public class LaTeXParser{
public static void main(String args[]) throws FileNotFoundException{
Scanner scan = new Scanner(System.in);
Stack s = new Stack();
int lineCount = 0;
String line;
String nextData = null;
String title = null;
String fname;
System.out.print("Enter the name of the file (no extension): ");
fname = scan.next();
fname = fname + ".txt";
FileInputStream fstream = new FileInputStream(fname);
Scanner fscan = new Scanner(fstream);
System.out.println();
while(fscan.hasNextLine()){
lineCount++;
line = fscan.nextLine();
StringTokenizer tok = new StringTokenizer(line);
while(tok.hasMoreElements()){
nextData = tok.nextToken();
System.out.println("The line: "+nextData);
if(nextData.contains("\\begin") && !nextData.contains("\\end")){
if(nextData.charAt(1) == 'b'){
title = nextData.substring(nextData.indexOf("{") + 1, nextData.indexOf("}"));
s.push(title);
}
}//end of BEGIN if
if(nextData.contains("\\end") && !nextData.contains("\\begin")){
String[] theLine = nextData.split("[{}]");
for(int i = 0 ; i < theLine.length ; i++){
if(theLine[i].contains("\\end") && !s.isEmpty() && theLine[i+1].equals(s.peek())){
s.pop();
i++;
}
if(theLine[i].contains("\\end") && !theLine[i+1].equals(s.peek())){
System.out.println("PARSE ERROR Line " + lineCount + ": Missing command \\begin{" + theLine[i+1] + "}.");
System.out.println("PARSING TERMINATED!");
System.exit(0);
}
}
}//end of END if
if(nextData.contains("\\begin") && nextData.contains("\\end")){
String[] theLine = nextData.split("[{}]");
for(int i = 0 ; i < theLine.length ; i++){
if(theLine[i].contains("\\end") && theLine[i+1].equals(s.peek())){
s.pop();
}
if(theLine[i].equals("\\begin")){
title = theLine[i+1];
s.push(title);
}
}
}//end of BEGIN AND END if
}
}//end of whiles
fscan.close();
if(s.isEmpty()){
System.out.println();
System.out.println(fname + " LaTeX file is valid!");
System.exit(0);
}
while(!s.isEmpty()){
}
}
}
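One possible way to organize the stack logic described above is to let a regular expression find every \begin{...} and \end{...} on a line, and to store the line number next to each name on the stack. This is only a sketch under a few assumptions of mine: the names EnvironmentChecker, Open, and check are made up, LaTeX comments and verbatim text are not handled, and errors are collected in a list instead of terminating at the first one as the code above does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnvironmentChecker {
    // Matches \begin{name} or \end{name}; group 1 is the command, group 2 the name.
    private static final Pattern COMMAND = Pattern.compile("\\\\(begin|end)\\{([^}]*)\\}");

    // An open environment together with the line where it was opened.
    private static final class Open {
        final String name;
        final int line;
        Open(String name, int line) { this.name = name; this.line = line; }
    }

    // Returns one message per problem found; an empty list means balanced.
    static List<String> check(List<String> lines) {
        List<String> errors = new ArrayList<>();
        Deque<Open> open = new ArrayDeque<>();
        for (int n = 0; n < lines.size(); n++) {
            int lineNo = n + 1;
            Matcher m = COMMAND.matcher(lines.get(n));
            while (m.find()) {
                String name = m.group(2);
                if (m.group(1).equals("begin")) {
                    open.push(new Open(name, lineNo));
                    continue;
                }
                // \end{name}: is a matching \begin anywhere on the stack?
                boolean hasMatch = open.stream().anyMatch(o -> o.name.equals(name));
                if (!hasMatch) {
                    errors.add("Line " + lineNo + ": Missing command \\begin{" + name + "}.");
                    continue;
                }
                // Everything opened after the matching \begin was never closed.
                while (!open.peek().name.equals(name)) {
                    Open unclosed = open.pop();
                    errors.add("Line " + unclosed.line + ": Missing command \\end{" + unclosed.name + "}.");
                }
                open.pop(); // the matching \begin itself
            }
        }
        // Anything still open at the end of the file has no \end at all.
        while (!open.isEmpty()) {
            Open unclosed = open.pop();
            errors.add("Line " + unclosed.line + ": Missing command \\end{" + unclosed.name + "}.");
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
                "\\begin{argument}",
                "\\begin{Palin}Not just saying \\begin{quotation}'No it isn't.'\\end{Palin}",
                "\\end{argument}");
        check(doc).forEach(System.out::println);
    }
}
```

With this structure, an \end whose name is further down the stack blames the unclosed environments above it, while an \end whose name is not on the stack at all is reported as a missing \begin. That distinction covers both the missing "\begin{document}" case and the missing "\end{quotation}" case.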
