How to match a regex with words followed by date? - java

I'd like to write a regex to match sentences like these:
"I rated Minions (2015)..."
"I rated Beauty and the Beast (2015)..."
I've tried a regex like:
I rated \\w+ \\(\\b(18|19|20)\\d{2}\\b\\)
but it works only in the first case, when the title is a single word.
Between "I rated" and the year there is a title of a movie with no fixed length. Could you help me?

Try using regex like
[^.?!(]* \\((18|19|20)\\d{2}\\)
OR
\\w+ (?:\\w+ )*\\((?:1[89]|20)\\d{2}\\)
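As a rough illustration of the first pattern in use, here is a minimal sketch. The "I rated " prefix, the capturing group around the title, and the class and method names are assumptions added for the example; they are not part of the answer above. Note that the character class stops at `.`, `?`, `!` and `(`, so titles containing those characters will not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RatedTitleYear {
    // Assumed variant of the first pattern: "I rated " prefix added,
    // group 1 = title, group 2 = year restricted to 18xx, 19xx or 20xx.
    private static final Pattern RATED =
        Pattern.compile("I rated ([^.?!(]*) \\(((?:18|19|20)\\d{2})\\)");

    /** Returns {title, year}, or null when the sentence does not match. */
    static String[] extract(String sentence) {
        Matcher m = RATED.matcher(sentence);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] hit = extract("I rated Beauty and the Beast (2015)...");
        System.out.println(hit[0] + " / " + hit[1]);
    }
}
```

Because the year group only accepts 18xx, 19xx and 20xx, a sentence such as "I rated Something (1789)" is rejected by this variant.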

Assuming that:
you don't really need to validate the year
your text has mixed spurious sentences, as opposed to one-liner "I rated..."
you want to do something with the movie title and year separately
You can use:
String text = "I rated Minions (2015)... I like turtles. "
    + "I rated Beauty and the Beast (2015)... "
    + "I rated rare live footage of Louis XVI being beheaded (1789)";
Pattern pattern = Pattern.compile(
//   | starts with "I rated"
//   |       | group 1 with the title
//   |       |     | open parenthesis
//   |       |     |  | group 2 with non-validated year
//   |       |     |  |     | closing parenthesis
    "I rated (.+?) \\((\\d+)\\)");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.printf(
        "Title: %s - Year: %s%n",
        // title is back-referenced as group 1
        matcher.group(1),
        // year is back-referenced as group 2
        matcher.group(2)
    );
}
... which prints:
Title: Minions - Year: 2015
Title: Beauty and the Beast - Year: 2015
Title: rare live footage of Louis XVI being beheaded - Year: 1789


Subtotaling an arraylist that was imported from a csv file

I'm working on a school project for a Java coding class and need some help.
I've imported a CSV file and split its contents into an ArrayList, and I want to total each category in the file.
Right now my output is:
Item | Category | Amount | Price | Location
Carrots | Food | 12 | $2 | Walmart
Potatoes | Food | 15 | $3 | Walmart
T-Shirt | Clothes | 1 | $16 | Walmart
Toothbrush | Cleaning | 2 | $2 | Walmart
My goal is to get subtotals for each category, such as:
Item | Category | Amount | Price | Location
Carrots | Food | 12 | $2 | Walmart
Potatoes | Food | 15 | $3 | Walmart
Subtotal for Food is: $69
T-Shirt | Clothes | 1 | $16 | Walmart
Subtotal for Clothes is: $16
Toothbrush | Cleaning | 2 | $2 | Walmart
Subtotal For Cleaning is: $4
and then to eventually have the whole thing totalled at the end and displayed like this:
Item | Category | Amount | Price | Location
Carrots | Food | 12 | $2 | Walmart
Potatoes | Food | 15 | $3 | Walmart
Subtotal for Food is: $69
T-Shirt | Clothes | 1 | $16 | Walmart
Subtotal for Clothes is: $16
Toothbrush | Cleaning | 2 | $2 | Walmart
Subtotal For Cleaning is: $4
Total is: $89
Here is my code right now:
public static void main(String[] args) {
    ArrayList<output> csvstuff = new ArrayList<output>();
    String fileName = "Project2.csv";
    File file = new File(fileName);
    try {
        Scanner inputStream = new Scanner(file);
        while (inputStream.hasNext()) {
            String data = inputStream.next();
            String[] values = data.split(",");
            String Item = values[0];
            String Category = values[1];
            String Amount = values[2];
            String Price = values[3];
            String Location = values[4];
            String format = "%-10s |%10s |%10s |%10s |%10s\n";
            System.out.printf(format, values[0], values[1], values[2], values[3], values[4]);
            output _output = new output(Item, Category, Amount, Price, Location);
            csvstuff.add(_output);
        }
        inputStream.close();
    }
    catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Instead of printing the values while reading in the loop, only add each row to the ArrayList, as you already do. Then sort the data by category. Refer to Comparable and Comparator in Java.
After that, you can simply iterate over the sorted list and keep a running subtotal for each category.
Useful link: Comparator vs. Comparable
Sort all rows by their category.
Walk the sorted list and keep a subtotal count.
If an item is in a different category than the previous one, emit a subtotal row and clear the subtotal value. (You'll also have to do this once after the last row.)
Then emit a line describing the current row.
To emit a grand total, simply keep another total that is updated with each row and never cleared.
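The steps above can be sketched roughly as follows. The `Row` class, its numeric fields, and the method names are hypothetical (the question's own class is called `output` and keeps every column as a String), and the subtotal is assumed to be amount × price, which is what the expected $69 for Food suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SubtotalSketch {
    // Hypothetical row type with numeric fields (an assumption for this sketch).
    static final class Row {
        final String item, category, location;
        final int amount, price;

        Row(String item, String category, int amount, int price, String location) {
            this.item = item;
            this.category = category;
            this.amount = amount;
            this.price = price;
            this.location = location;
        }
    }

    static List<String> report(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        // List.sort is stable, so rows keep their file order within a category.
        sorted.sort(Comparator.comparing((Row r) -> r.category));

        List<String> lines = new ArrayList<>();
        int subtotal = 0;
        int total = 0;
        String current = null;
        for (Row r : sorted) {
            // Category changed: emit the finished subtotal and reset it.
            if (current != null && !current.equals(r.category)) {
                lines.add("Subtotal for " + current + " is: $" + subtotal);
                subtotal = 0;
            }
            current = r.category;
            lines.add(String.join(" | ", r.item, r.category,
                String.valueOf(r.amount), "$" + r.price, r.location));
            int cost = r.amount * r.price; // assumed meaning of "subtotal"
            subtotal += cost;
            total += cost; // grand total is never cleared
        }
        // Emit the subtotal of the last category as well.
        if (current != null) {
            lines.add("Subtotal for " + current + " is: $" + subtotal);
        }
        lines.add("Total is: $" + total);
        return lines;
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("Carrots", "Food", 12, 2, "Walmart"),
            new Row("Potatoes", "Food", 15, 3, "Walmart"),
            new Row("T-Shirt", "Clothes", 1, 16, "Walmart"),
            new Row("Toothbrush", "Cleaning", 2, 2, "Walmart"));
    }

    public static void main(String[] args) {
        report(sampleRows()).forEach(System.out::println);
    }
}
```

Because the rows are sorted by category name, the groups come out alphabetically (Cleaning, Clothes, Food) rather than in file order. If file order matters, a `LinkedHashMap` keyed by category might be a better fit than sorting.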

Paragraph to Table in Different Columns

I want to create a program that takes a paragraph entered in a text area and puts certain parts of it into different columns of a table. For example, given the statement:
My name is James Olson. I am 21 years old. I am a doctor. I live in Canterville, Bacon Street, London.
Then the table should automatically look like:
| Name | Age | Profession | Area Name | Street Name | Area |
| James | 21 | Doctor | Canterville | Bacon Street | London |
I also want to know which language would suit this best: Python or Java.
Yes, it is certainly possible, and I would personally prefer Python for the job.
I've written some code. It is not the best or most efficient, but it will do the job, with one catch: it only works if the order and pattern of your sentences stay the same. The pattern must be exactly like the one in your example.
If you want the code to work for multiple sentences, a slight change to add a loop will do.
import pandas as pd
my_sent = "My name is James Olson. I am 21 years old. I am a doctor. I live in Canterville, Bacon Street, London."
my_words = my_sent.split()
my_stopwords = ['My', 'name', 'is', 'I', 'am', 'years', 'old.', 'I', 'am', 'a', 'I', 'live', 'in',]
cleaned_stopwords = []
useful_words = []
for temp in my_stopwords:
    cleaned_stopwords.append(temp.lower().strip())
for word in my_words:
    if word.lower().strip() not in cleaned_stopwords:
        useful_words.append(word.title().strip(".").strip(","))
name = useful_words[0] + " " + useful_words[1]
street = useful_words[5] + " " + useful_words[6]
useful_words.pop(0)
useful_words.pop(0)
useful_words.insert(0, name)
useful_words.pop(4)
useful_words.pop(4)
useful_words.insert(4, street)
all_columns = ["Name", "Age", "Profession", "Area Name", "Street Name", "Area"]
my_df = pd.DataFrame([useful_words], columns = all_columns)
Output:
          Name  Age  Profession    Area Name   Street Name    Area
0  James Olson   21      Doctor  Canterville  Bacon Street  London
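Since the question also asks about Java, here is a rough regex-based sketch of the same idea. It shares the limitation mentioned above: the paragraph must follow the example's wording and order exactly. The class name, method name, and the pattern itself are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParagraphToRow {
    // One capturing group per table column, in the order of the example sentence.
    private static final Pattern SENTENCE = Pattern.compile(
        "My name is ([^.]+)\\. I am (\\d+) years old\\. I am an? ([^.]+)\\. "
        + "I live in ([^,]+), ([^,]+), ([^.]+)\\.");

    /** Returns {name, age, profession, area name, street name, area}, or null if the text does not fit. */
    static String[] toRow(String paragraph) {
        Matcher m = SENTENCE.matcher(paragraph);
        if (!m.find()) {
            return null;
        }
        String[] row = new String[m.groupCount()];
        for (int i = 0; i < row.length; i++) {
            row[i] = m.group(i + 1); // groups are numbered from 1
        }
        return row;
    }

    public static void main(String[] args) {
        String text = "My name is James Olson. I am 21 years old. I am a doctor. "
            + "I live in Canterville, Bacon Street, London.";
        System.out.println(String.join(" | ", toRow(text)));
    }
}
```

Unlike the pandas version, this sketch leaves the profession in its original case ("doctor").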

Regex doesn't work in Java

My server receives data that should follow a known pattern. The following two lines are examples of valid data:
¬14AAAA3170008#¶
%AAAA3170010082¶
So, to check whether the data is fine, I wrote the following regexes:
(?<pacote>\\A¬\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})
(?<pacote>\\A[$%][a-fA-F0-9]{4}\\d{10}.)
They work fine on regex101.com, but Java's Pattern and Matcher don't handle them as I expect. Here is my Java code:
String data = "¬14AAAA3170008#¶%AAAA3170010082¶";
Pattern patternData = Pattern.compile("(?<pacote>\\A¬\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})", Pattern.UNICODE_CASE);
Matcher matcherData = patternData.matcher(data);
if (matcherData.matches()) {
    System.out.println("KNOW. DATA[" + matcherData.group("pacote") + "]");
} else {
    System.out.println("UNKNOW");
}
It didn't work as expected. Could someone help me figure out what mistake I'm making?
You're using Matcher#matches, which matches the whole input.
However, the Pattern you're using only applies for the first input, and your whole input contains the two cases concatenated.
On top of that, the \\A boundary matcher anchors the pattern to the start of the input, so a pattern containing it can only match at the very beginning.
You can use the following pattern to generalize and match the two:
String test = "¬14AAAA3170008#¶%AAAA3170010082¶";
Pattern p = Pattern.compile(
//   | named group definition
//   |         | actual pattern
//   |         | | ¬ + 2 digits or $%
//   |         | |            | 4 hex alnums
//   |         | |            |             | 7 to 10 digits
//   |         | |            |             |        | any 1 or 2 characters
//   |         | |            |             |        |      | multiple times (2 here)
    "(?<pacote>((¬\\d{2}|[$%])[a-fA-F0-9]{4}\\d{7,10}.{1,2})+)"
);
Matcher m = p.matcher(test);
if (m.find()) {
    System.out.println(m.group("pacote"));
}
Output
¬14AAAA3170008#¶%AAAA3170010082¶
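To make the difference between matches() and find() described above concrete, here is a minimal sketch that runs the question's first pattern against the question's data. The class and method names are made up for the example, and the two special characters are written as unicode escapes only to keep the source ASCII-safe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // The question's sample data; \u00AC is the not sign, \u00B6 is the pilcrow.
    static final String DATA = "\u00AC14AAAA3170008#\u00B6%AAAA3170010082\u00B6";

    // The question's first pattern, unchanged apart from the unicode escape.
    static final Pattern FIRST_PACKET =
        Pattern.compile("(?<pacote>\\A\u00AC\\d{2}[a-fA-F0-9]{4}\\d{7}.{2})");

    /** matches() succeeds only if the pattern consumes the entire input. */
    static boolean matchesWholeInput(String s) {
        return FIRST_PACKET.matcher(s).matches();
    }

    /** find() looks for the pattern anywhere it is allowed to start. */
    static String findPacket(String s) {
        Matcher m = FIRST_PACKET.matcher(s);
        return m.find() ? m.group("pacote") : null;
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeInput(DATA)); // false: a second packet follows
        System.out.println(findPacket(DATA) != null); // true: first packet found at the start
    }
}
```

Matcher#lookingAt is a third option: it anchors at the start like matches() but does not require the pattern to consume the whole input.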

Setting up a Regex Pattern in Java

I am having some difficulty trying to create a working regex pattern for recognizing a string of file names.
Pattern pattern = Pattern.compile("(\\w+)(\\.)(t)(x)(t)(\\s$|\\,)");
When I call find() on a Matcher for my sample input, say
"file1.txt,file2.txt "
it returns true, which is fine. However, erroneous input also returns true.
This erroneous input includes strings such as:
"file1.txt,,file2.txt "
"file%.text "
I have been consulting an online reference while constructing the pattern, and I'm pretty sure I'm missing something rather obvious.
To validate your file name list, you can use the following solution:
Pattern pattern = Pattern.compile(
//   | preceded by start of input, or
//   |     | comma, preceded itself by either word or space
//   |     |              | file name
//   |     |              |     | dot
//   |     |              |     |    | extension
//   |     |              |     |    |  | optional 1 space
//   |     |              |     |    |  |   | followed by end of input
//   |     |              |     |    |  |   |    | or comma,
//   |     |              |     |    |  |   |    | followed itself by word character
    "(?<=^|(?<=[\\w\\s]),)(\\w+)(\\.)txt\\s?(?=$|,(?=\\w))");
String input = "file1.txt,file2.txt";
String badInput = "file3.txt,,file4.txt";
String otherBadInput = "file%.txt, file!.txt";
Matcher m = pattern.matcher(input);
while (m.find()) {
    // printing group 1: file name
    System.out.println(m.group(1));
}
m = pattern.matcher(badInput);
// won't find anything
while (m.find()) {
    // printing group 1: file name
    System.out.println(m.group(1));
}
m = pattern.matcher(otherBadInput);
// won't find anything
while (m.find()) {
    // printing group 1: file name
    System.out.println(m.group(1));
}
Output
file1
file2
This may help:
Pattern pattern = Pattern.compile("(.*?\\.\\w+)(?:,|$)");
String files = "file1.txt,file2.txt";
Matcher mFile = pattern.matcher(files);
while (mFile.find()) {
    System.out.println("file: " + mFile.group(1));
}
Output:
file: file1.txt
file: file2.txt
With only .txt files: (.*?\\.txt)(?:,|$)
Naive way:
public boolean validateFileName(String string) {
    String[] fileNames = string.split(",");
    // matches() must match the complete name, not just part of it
    Pattern pattern = Pattern.compile("\\b[\\w]*[.](txt)\\b");
    for (int i = 0; i < fileNames.length; i++) {
        Matcher m = pattern.matcher(fileNames[i]);
        if (!m.matches())
            return false;
    }
    return true;
}
Pattern p = Pattern.compile("((\\w+\\.txt)(,??|$))+");
The above Pattern lets "file1.txt,,file2.txt" pass, but does not get an empty file for the two commas.
The other Strings "file1.txt,file2.txt" and "file%txt" are processed correctly.

Regular Expression in Java for UMASK

I need a Java regular expression to get the following two values:
the value of the UMASK parameter in file /etc/default/security should be set to 077. [Current value: 022] [AA.1.9.3]
the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]
I need to get the file name from the input string, as well as the current value if present.
I wrote the regex .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])?.* which matches both strings and captures the file name, but does NOT capture the current value.
Then I tried another regex: .* (.*?/.*?) (?:\\[Current value\\: (\\d+)\\])? .* which differs from the first only by a space before the last .*. It matches string 1 and captures both the file name and the current value, but it does NOT match string 2.
How can I correct these regular expressions to obtain the values described above?
If I understand your requirements correctly (file name and current octal permissions value), you can use the following Pattern:
String input =
    "Value for parameter UMASK in file /etc/default/security should be set to 077. " +
    "[Current value: 022] [AA.1.9.3] - " +
    "Value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]";
Pattern p = Pattern.compile(
//   | "file " marker
//   |    | group 1: the file path
//   |    |    | space after
//   |    |    || any characters
//   |    |    ||  | escaped square bracket
//   |    |    ||  |  | "Current value: " marker
//   |    |    ||  |  |              | group 2:
//   |    |    ||  |  |              | digits for value
//   |    |    ||  |  |              |     | closing bracket
    "file (.+?) .+?\\[Current value: (\\d+)\\]");
Matcher m = p.matcher(input);
// iterates, but will find only once in this instance (which is desirable)
while (m.find()) {
    System.out.printf("File: %s%nCurrent value: %s%n", m.group(1), m.group(2));
}
Output
File: /etc/default/security
Current value: 022
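The pattern above only reports the message that carries a "[Current value: ...]" part. If both messages from the question need to be handled, one possible approach is to make that part optional and process each message separately. The following is a sketch under those assumptions: the class and method names are made up, and the path is assumed to be the first whitespace-free token starting with "/".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UmaskMessageParser {
    // Group 1: the first token that starts with '/' (assumed to be the file path).
    // Group 2: the digits of an optional "[Current value: NNN]" part later in the message.
    private static final Pattern MESSAGE =
        Pattern.compile("(/\\S+)(?:.*?\\[Current value: (\\d+)\\])?");

    /** Returns {path, currentValue}; currentValue is null when absent. Returns null if no path is found. */
    static String[] parse(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String first = "the value of the UMASK parameter in file /etc/default/security "
            + "should be set to 077. [Current value: 022] [AA.1.9.3]";
        String second = "the value of UMASK should be set to 077 in /etc/skel/.profile [AA.1.9.3]";
        for (String message : new String[] { first, second }) {
            String[] parsed = parse(message);
            System.out.printf("File: %s, current value: %s%n", parsed[0], parsed[1]);
        }
    }
}
```

For the second message, group 2 does not participate in the match, so `m.group(2)` returns null and the caller has to check for that.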
