Match text between empty lines - java

I have text in the following blocks:
AAAAAAA
BBBBBBB
CCCCCCC
DDDDDD. YYYYYYYYYYYYYYYYYYYYYY
EEEEE 1234567890
Some random text
Some text random
Random text
Text
Some random text
ZZZZZZZZZZZZZZZZ
UUUUUUUUUUUUUUUU
How can I select the following block with a regexp?
Some random text
Some text random
Random text
Text
Some random text
From the original text I know that this block comes after the line DDDDDD. YYYYYYYYYYYYYYYYYYYYYY, which is optionally followed by the line EEEEE 1234567890, and that the block sits between lines containing only \s characters.
I have tried the pattern DDDDDD.*\\s+(.*)\\s+ but it doesn't work.

You can use the following Pattern to match your expected text:
String text = "AAAAAAA\nBBBBBBB\nCCCCCCC\n\nDDDDDD. YYYYYYYYYYYYYYYYYYYYYY "
+ "\nEEEEE 1234567890 "
+ "\n\nSome random text\nSome text random\nRandom text\nText \nSome random text\n\n"
+ "ZZZZZZZZZZZZZZZZ\nUUUUUUUUUUUUUUUU";
Pattern p = Pattern.compile(
// | 6 "D"s
// | | actual dot
// | | | some whitespace
// | | | | 22 "Y"s
// | | | | | more whitespace
// | | | | | | optional:
// | | | | | || 5 "E"s
// | | | | | || | whitespace
// | | | | | || | | 10 digits
// | | | | | || | | | more whitespace including line breaks
// | | | | | || | | | | your text
// | | | | | || | | | | | followed by any "Z" sequence
"D{6}\\.\\s+Y{22}\\s+(E{5}\\s\\d{10}\\s+)?(.+?)(?=Z+)",
Pattern.DOTALL
);
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println(m.group(2));
}
Output
Some random text
Some text random
Random text
Text
Some random text
Note
Not sure how to delimit the final part, so I just anchored on a sequence of one or more uppercase "Z" characters.
Up to you to refine.
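If the goal is simply to grab the block sitting between blank lines, an alternative sketch (not tied to the D/Y/E markers, so it only applies when the block's position among the blank-line-separated blocks is known) is to split the text on runs of blank lines:

```java
public class BlankLineSplit {
    // Splits the text on runs of blank (whitespace-only) lines.
    static String[] blocks(String text) {
        return text.split("\\n\\s*\\n");
    }

    public static void main(String[] args) {
        String text = "AAAAAAA\nBBBBBBB\nCCCCCCC\n\nDDDDDD. YYYYYYYYYYYYYYYYYYYYYY "
                + "\nEEEEE 1234567890 "
                + "\n\nSome random text\nSome text random\nRandom text\nText \nSome random text\n\n"
                + "ZZZZZZZZZZZZZZZZ\nUUUUUUUUUUUUUUUU";
        String[] b = blocks(text);
        // b[0] = header lines, b[1] = DDDDDD/EEEEE lines,
        // b[2] = the wanted block, b[3] = Z/U lines
        System.out.println(b[2]);
    }
}
```

Note that a line containing only spaces still separates blocks here, because `\s*` between the two `\n` swallows the intervening whitespace.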

Related

Select row value from a different column in Spark Java using some rule

I want to select a different row value for each row from different columns, using some complex rule.
For example I have this data set:
+----------+---+---+---+
| Column A | 1 | 2 | 3 |
+ -------- +---+---+---+
| User 1 | A | H | O |
| User 2 | B | L | J |
| User 3 | A | O | N |
| User 4 | F | S | E |
| User 5 | S | G | V |
+----------+---+---+---+
I want to get something like this:
+----------+---+---+---+---+
| Column A | 1 | 2 | 3 | F |
+ -------- +---+---+---+---+
| User 1 | A | H | O | O |
| User 2 | B | L | J | J |
| User 3 | A | O | N | A |
| User 4 | F | S | E | E |
| User 5 | S | G | V | S |
+----------+---+---+---+---+
The values for column F are selected using a complex rule for which the when function is not applicable. If there are 1000 columns to select from, can a UDF do this?
I already tried making a UDF that stores the name of the column to select the value from, so that name can then be used to access that column's row value. For example, I tried storing the row value 233 (the result of the complex rule) for row 100, then using it as a column name (column 233) to access that column's value for row 100. However, I never got it to work.
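The lookup itself need not be a per-column UDF: the rule can produce a column name, and the value is then fetched by that name. A minimal plain-Java sketch of this idea follows (the rule below is hypothetical; in Spark itself this would correspond to packing the candidate columns into a map column and selecting from it, e.g. with functions.map plus element_at in newer versions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicColumnPick {
    // Stand-in for the complex rule: given a row, return the NAME of the
    // column whose value should go into F. (Hypothetical rule, for illustration.)
    static String rule(Map<String, String> row) {
        return row.get("1").equals("A") ? "3" : "1";
    }

    // Look the value up by the name the rule produced.
    static String pick(Map<String, String> row) {
        return row.get(rule(row));
    }

    public static void main(String[] args) {
        Map<String, String> user1 = new LinkedHashMap<>();
        user1.put("1", "A"); user1.put("2", "H"); user1.put("3", "O");
        Map<String, String> user5 = new LinkedHashMap<>();
        user5.put("1", "S"); user5.put("2", "G"); user5.put("3", "V");
        System.out.println(pick(user1)); // O (rule chose column "3")
        System.out.println(pick(user5)); // S (rule chose column "1")
    }
}
```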

ANTLR4 parser errors in code when Intellij plugin constructs it correctly

I am attempting to construct a compiler for the golfing language Vyxal and am using ANTLR for parsing. I have managed to write down almost the entire language, and it works perfectly in the IntelliJ plugin for ANTLR, but when I run the code with the input ⟨1|2⟩⟨1|2⟩, it says line 1:10 mismatched input '<EOF>' expecting <a bunch of random characters>. I found this question, but my lexer and parser rules are in the same file, and I have already confirmed that the tokens are correct.
My grammar:
grammar Vyxal;
@header {
package io.github.seggan.jyxal.antlr;
}
file
: program EOF
;
program
: (literal | structure | element)+
;
element
: PREFIX? element_type
;
element_type
: ALPHA | '<' | ':' | '\u00d7' | '\u1e40' | '\u1e6b' | '\u2087' | '\u00be' | '\u2084' | '\u21b5'
| '\u00b9' | '\u03a0' | '\u00e6' | '\u1e61' | '\u2211' | '\u1e8e' | '\u221a' | '\u1e0b' | '\u00a7'
| '\u00b2' | '\u2026' | '\u1e45' | '\u017b' | '\u01cd' | '-' | '\u2235' | '\u2194' | '\u2260' | '\u027e'
| '\u00a4' | '\u20b4' | '\u01cf' | '\u21e7' | '\u0121' | '\u1e8f' | '\u207c' | '\u204b' | '\u2229'
| '\u2248' | '\u2237' | '\u2088' | '\u00f7' | '\u0227' | '\u0280' | '\u2080' | '\u1e02' | '\u228d'
| '\u2234' | '\u2228' | '\u022f' | '\u2070' | '\u1e8a' | '\u21e9' | '\u1e87' | '\u2039' | '\u1e2d'
| '\u2020' | '\u201f' | '\u2308' | '\u2081' | '!' | '\u20ac' | '\u0188' | '\u01d2' | '\u027d' | '\u0281'
| ',' | '\u022e' | '\u22ce' | '\u03c4' | '\u01ce' | '\u1e59' | '%' | '\u1e86' | '\u2227' | '\u21b2'
| '\u01d0' | '\u00a2' | '\u201e' | '\u0116' | '\u2082' | '\u1e1e' | '\ua60d' | '}' | '*' | '\u1e8b'
| '?' | '\u2085' | '\u0140' | '\u00df' | '\u27c7' | '\u2105' | '\u00a5'| '\u2086' | '\u0120' | '\u1e57'
| '\u221e' | '\u1e56' | '\ua71d' | '\u01d3' | '\u203a' | '\u03b5' | '\u25a1' | '\u1e6a' | '\u00a6'
| '\u0117' | '$' | '\u1e58' | '\u0130' | '=' | '\u2193' | '\u010b' | '\u2083' | '\u1e22' | '_' | '\u27d1'
| '\u010a' | '\u013f' | '\u00ac' | '\u00b6' | '\u00f0' | '\u1e1f' | '\u00a1' | '\u00af' | '\u2265'
| '\u01d4' | '\u017c' | '\u2191' | '\u1e0a' | '\u00bc' | '\u22cf' | '\u01d1' | '>' | '\u1e41' | '\u00a3'
| '\u215b' | '\u1e23' | '+' | '\u00b1' | '/' | '\u21b3' | '\u222a' | '\u2207' | '\u2264' | '\u1e03'
| '\u2310' | '^' | '\u1e60' | '\u0226' | '\u03b2' | '\u2022' | '\u00bd' | '\u1e44'
;
// structures
structure
: if_statement
| for_loop
| while_loop
| lambda
| function
| variable_assn
;
if_statement
: '[' program ('|' program)? ']'?
;
for_loop
: '(' (variable '|')? program ')'?
;
while_loop
: '{' (program '|')? program '}'?
;
lambda
: LAMBDA_TYPE (integer '|')? program ';'?
;
function
: '#' variable ((':' parameter (':' parameter)*)? '|' program)? ';'?
;
variable_assn
: ASSN_SIGN variable
;
variable
: (ALPHA | DIGIT)+
;
parameter
: '*' | variable | integer
;
// types
literal
: number
| string
| list
;
string
: normal_string
| compressed_string
| single_char_string
| double_char_string
;
number
: integer
| complex
| compressed_number
;
integer
: DIGIT+ ('.' DIGIT+)?
;
complex
: integer '°' integer
;
list
: '\u27e8' program ('|' program)* '\u27e9'?
;
any_text
: .+?
;
compressed_string
: '\u00ab' any_text '\u00ab'?
;
normal_string
: '`' any_text '`'?
;
single_char_string
: '\\' .
;
double_char_string
: '‛' . .
;
compressed_number
: '\u00bb' any_text '\u00bb'?
;
DIGIT
: [0-9]
;
// code
PREFIX
: [¨Þkø∆]
;
ALPHA
: [a-zA-Z]
;
ASSN_SIGN
: '→' | '←'
;
// structures
LAMBDA_TYPE
: [λƛ'µ]
;
WHT
: [ \t\n\r] -> skip
;
The code I use is simple:
String s = Files.readString(Path.of(args[0]));
VyxalLexer lexer = new VyxalLexer(CharStreams.fromString(s));
VyxalParser parser = new VyxalParser(new CommonTokenStream(lexer));
The IntelliJ plugin constructs the parse tree properly, and the tokens have the correct values:
[#-1,0:0='⟨',<163>,1:0]
[#-1,1:1='1',<170>,1:1]
[#-1,2:2='|',<154>,1:2]
[#-1,3:3='2',<170>,1:3]
[#-1,4:4='⟩',<164>,1:4]
[#-1,5:5='⟨',<163>,1:5]
[#-1,6:6='1',<170>,1:6]
[#-1,7:7='|',<154>,1:7]
[#-1,8:8='2',<170>,1:8]
[#-1,9:9='⟩',<164>,1:9]
And when I run:
VyxalLexer lexer = new VyxalLexer(CharStreams.fromString("⟨1|2⟩⟨1|2⟩"));
VyxalParser parser = new VyxalParser(new CommonTokenStream(lexer));
System.out.println(parser.file().toStringTree(parser));
the following is printed:
(file (program (literal (list ⟨ (program (literal (number (integer 1)))) | (program (literal (number (integer 2)))) ⟩)) (literal (list ⟨ (program (literal (number (integer 1)))) | (program (literal (number (integer 2)))) ⟩))) <EOF>)
Which is in sync with what IntelliJ displays.
I'm guessing you haven't regenerated the parser classes recently, so your own Java code is using older parser classes than the IntelliJ plugin is.
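A typical fix is to regenerate the lexer/parser from the grammar and recompile before running. The exact commands depend on your build setup; the jar name and output directory below are placeholders:

```shell
# Regenerate VyxalLexer/VyxalParser from the grammar (paths are examples).
antlr4 -package io.github.seggan.jyxal.antlr \
       -o src/main/java/io/github/seggan/jyxal/antlr Vyxal.g4

# Recompile so the running code picks up the regenerated classes.
javac -cp antlr-4.x-complete.jar \
      src/main/java/io/github/seggan/jyxal/antlr/*.java
```

If the grammar is built through Maven or Gradle with an ANTLR plugin, a clean rebuild (e.g. mvn clean generate-sources compile) achieves the same thing.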

Java code to write a missing value in an Informatica PowerCenter mapping

I have a task to look at a database (SAP IDoc) that has specific values in it, derived by segments. At the end of the mapping I have to export an XML that has a subcomponent which can have more than one row. My problem is that we have a component with two values that are separated by a qualifier.
Every transaction looks like so:
+----------+-----------+--------+
| QUALF_1 | BETRG_dc | DOCNUM |
+----------+-----------+--------+
| 001 | 20 | xxxxxx |
| 001 | 22 | xxxxxx |
+----------+-----------+--------+
+---------+-----------+-----------+
| QUALF_2 | BETRG_pr | DOCNUM |
+---------+-----------+-----------+
| 013 | 30 | xxxxxx |
| 013 | 40 | xxxxxx |
+---------+-----------+-----------+
My problem is that when these are joined with the built-in transformations, we get a Cartesian product, like so:
+---------+-----------+-----------+
| DOCNUM | BETRG_dc | BETRG_pr |
+---------+-----------+-----------+
| xxxxxx | 20 | 30 |
| xxxxxx | 20 | 40 |
| xxxxxx | 22 | 30 |
| xxxxxx | 22 | 40 |
+---------+-----------+-----------+
As you can see, only the first and last rows are correct.
The problem comes from the fact that if BETRG_dc is 0, the whole segment is not sent, so a Filter transformation fails.
What I found out is that the segment numbers of QUALF_1 and QUALF_2 are sequential: QUALF_1 is, for example, 48 and QUALF_2 is 49.
Can you help me create a Java transformation that adds a row for a missing QUALF_1?
Here is a table of requirements:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 013 | 20 | 48 |
| 001 | 150 | 49 |
| 013 | 15 | 57 |
| 001 | 600 | 58 |
+-------+-------+---------------+
I want the transformation to take a look and if we have a source like this:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 001 | 150 | 49 |
| 013 | 15 | 57 |
| 001 | 600 | 58 |
+-------+-------+---------------+
to go ahead and insert a row with segment number 48 and a BETRG value of "0".
I have tried every transformation I can think of.
The expected output should be like this:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 013 | 0 | 48 |
| 001 | 150 | 49 |
| 013 | 15 | 57 |
| 001 | 600 | 58 |
+-------+-------+---------------+
You should join both tables in a Joiner transformation.
Use a left (master) outer join and then take it into a target. Then map the BETRG column from the right table to the target, and the rest of the columns from the left table.
What happens is that whenever there is no match, BETRG will be empty. Take it into an Expression transformation and, if the value is null or empty, change it to 0 or whatever value you wish.
Here is what I have created, but unfortunately for now it works only at the row level and not on the whole data set. I am still working on making the code run properly:
QUALF_out = QUALF;
BETRG_out = BETRG;
SegmentNumber_out = SegmentNumber;
if (QUALF.equals("001")) {
    segment_new = (SegmentNumber - 1);
}
int colCount = 1;
myList.add(SegmentNumber);
System.out.println("SegmentNumber_out: " + segment_new);
if (Arrays.asList(myList).contains(segment_new)) {
    QUALF_out = "013";
    BETRG_out = "0";
    SegmentNumber_out = segment_new;
    generateRow();
} else {
    QUALF_out = QUALF;
    BETRG_out = BETRG;
    SegmentNumber_out = SegmentNumber;
    generateRow();
}
Here is what works:
import java.util.*;

private ArrayList<String> myList2 = new ArrayList<String>();

QUALF_out = QUALF;
BETRG_out = BETRG;
SegmentNumber_out = SegmentNumber;
DOCNUM = DOCNUM;
array_for_search = QUALF + ParentSegmentNumber + DOCNUM;
myList2.add(array_for_search);
System.out.println("myList: " + myList2);
System.out.println("Array: " + myList2.contains("910" + ParentSegmentNumber + DOCNUM));
if (!myList2.contains("910" + ParentSegmentNumber + DOCNUM)) {
    QUALF_out = "910";
    BETRG_out = "0";
}
generateRow();
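Outside PowerCenter, the gap-filling rule itself — for every 001 row whose predecessor segment is missing, emit a 013 row with BETRG "0" — can be sketched in plain Java. The Row class below is a stand-in for the transformation's ports; the real Java transformation would call generateRow() instead of appending to a list:

```java
import java.util.*;

public class FillMissingQualf {
    static final class Row {
        final String qualf, betrg;
        final int segmentNumber;
        Row(String qualf, String betrg, int segmentNumber) {
            this.qualf = qualf;
            this.betrg = betrg;
            this.segmentNumber = segmentNumber;
        }
    }

    // For each "001" row, if no row exists at segmentNumber - 1,
    // insert a "013" row with BETRG "0" at that segment number.
    static List<Row> fill(List<Row> input) {
        Set<Integer> present = new HashSet<>();
        for (Row r : input) present.add(r.segmentNumber);
        List<Row> out = new ArrayList<>();
        for (Row r : input) {
            if (r.qualf.equals("001") && !present.contains(r.segmentNumber - 1)) {
                out.add(new Row("013", "0", r.segmentNumber - 1));
            }
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("001", "150", 49),
                new Row("013", "15", 57),
                new Row("001", "600", 58));
        for (Row r : fill(rows)) {
            System.out.println(r.qualf + " " + r.betrg + " " + r.segmentNumber);
        }
    }
}
```

On the sample source above this yields the expected four rows, with (013, 0, 48) inserted before segment 49 and segment 58 left alone because its predecessor 57 already exists.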

Filter a Dataset where a column is not a number, using Spark Java API 2.2?

I'm new to the Spark Java API. I want to filter my Dataset where a column is not a number. My dataset ds1 looks like this:
+---------+------------+
| account| amount |
+---------+------------+
| aaaaaa | |
| aaaaaa | |
| bbbbbb | |
| 123333 | |
| 555555 | |
| 666666 | |
I want to return a Dataset ds2 like this:
+---------+------------+
| account| amount |
+---------+------------+
| 123333 | |
| 555555 | |
| 666666 | |
I tried this, but it doesn't work for me:
ds2 = ds1.select("account").where(dsFec.col("account").isNaN());
Can someone please guide me with a sample Spark expression to resolve this?
You can define a UDF to check whether the string in the account column is numeric or not:
UDF1 checkNumeric = new UDF1<String, Boolean>() {
    public Boolean call(final String account) throws Exception {
        return StringUtils.isNumeric(account);
    }
};
sqlContext.udf().register("numeric", checkNumeric, DataTypes.BooleanType);
and then use the callUDF function to call it:
df.filter(callUDF("numeric", col("account"))).show();
which should give you
+-------+------+
|account|amount|
+-------+------+
| 123333| |
| 555555| |
| 666666| |
+-------+------+
Just cast and check if result is null:
ds1.select("account").where(dsFec.col("account").cast("bigint").isNotNull());
One way to do this:
Scala Equivalent:
import scala.util.Try
df.filter(r => Try(r.getString(0).toInt).isSuccess).show()
+-------+------+
|account|amount|
+-------+------+
| 123333| |
| 555555| |
| 666666| |
+-------+------+
Or the same in Scala, using try/catch:
df.map(r => (r.getString(0), r.getString(1), {
  try { r.getString(0).toInt; true }
  catch { case runtime: RuntimeException => false }
})).filter(_._3 == true).drop("_3").show()
+------+---+
| _1| _2|
+------+---+
|123333| |
|555555| |
|666666| |
+------+---+
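Note that StringUtils.isNumeric comes from Apache Commons Lang, not the JDK. If that dependency is unwanted, a plain-regex check is a minimal substitute (a sketch: this variant accepts only unsigned integers — no sign, decimal point, or whitespace — which matches the account values shown above):

```java
public class NumericCheck {
    // True if the string is non-null, non-empty, and consists solely of ASCII digits.
    // String.matches anchors the pattern to the whole string.
    static boolean isNumeric(String s) {
        return s != null && s.matches("\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("123333")); // true
        System.out.println(isNumeric("aaaaaa")); // false
    }
}
```

In Spark this predicate would be registered and called exactly like the checkNumeric UDF above.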

Iterating through lists of lists in Java

I have two tables User and Roles with one-to-one relation as below.
User
_________________________________________
| Id | user_name | full_name | creator |
_________________________________________
| 1 | a | A | a |
| 2 | b | B | a |
| 3 | c | C | a |
| 4 | d | D | c |
| 5 | e | E | c |
| 6 | f | F | e |
| 7 | g | G | e |
| 8 | h | H | e |
| 9 | i | I | e |
|10 | j | J | i |
_________________________________________
Roles
_______________________________________
| id | user_mgmt | others | user_id |
_______________________________________
| 1 | 1 | 1 | 1 |
| 2 | 0 | 1 | 2 |
| 3 | 1 | 0 | 3 |
| 4 | 0 | 1 | 4 |
| 5 | 1 | 1 | 5 |
| 6 | 0 | 1 | 6 |
| 7 | 0 | 0 | 7 |
| 8 | 0 | 0 | 8 |
| 9 | 1 | 0 | 9 |
________________________________________
The Roles table has boolean columns, so if a user has the user_mgmt role he can add users (how many users can be added by a user is not fixed). I want to fetch all users created by a user and their child users (a is the parent, c is a child of a, and e is a child of c).
Here is my code to fetch users:
public void loadUsers() {
    List<Users> users = new ArrayList<>();
    String creator = user.getUserName();
    List<Users> createdUsers = userService.getUsersByCreator(creator);
    for (Users user : createdUsers) {
        Roles role = user.getRoles();
        if (role.isEmpMgnt()) {
            users.add(user);
            loadUsers();
        }
    }
}
This gives me a stack overflow error. If I don't call loadUsers() recursively, it returns only a single level of results. Is there any solution to this? Thanks in advance for any help.
This gives you a stack overflow error because a has creator a, so you have an infinite loop for user a. For a you should set the creator to null, or skip self-references in the code.
Also, you should pass the current user into the loadUsers() method and read only users that are created by it, like
public void loadUsers(String creator)
and only process users created by that creator. Here,
String creator = user.getUserName();
what is user? You should use creator. The question is how you obtain the initial creator; probably the initial creator should be the user whose creator is null.
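Putting the advice together — pass the creator in, skip self-references, and recurse only into users who have the user_mgmt role — a minimal self-contained sketch looks like this (the in-memory map and set stand in for userService and the Roles table):

```java
import java.util.*;

public class LoadUsers {
    // username -> creator username (the User table from the question)
    static final Map<String, String> CREATOR = new LinkedHashMap<>();
    // usernames that have the user_mgmt role (the Roles table)
    static final Set<String> USER_MGMT = new HashSet<>();

    static {
        String[][] users = {{"a","a"},{"b","a"},{"c","a"},{"d","c"},{"e","c"},
                            {"f","e"},{"g","e"},{"h","e"},{"i","e"},{"j","i"}};
        for (String[] u : users) CREATOR.put(u[0], u[1]);
        USER_MGMT.addAll(Arrays.asList("a", "c", "e", "i"));
    }

    // Collects all users created (directly or transitively) by `creator`.
    // Skipping the self-reference (a created by a) makes the recursion terminate.
    static void loadUsers(String creator, List<String> result) {
        for (Map.Entry<String, String> e : CREATOR.entrySet()) {
            String user = e.getKey();
            if (!e.getValue().equals(creator) || user.equals(creator)) continue;
            result.add(user);
            if (USER_MGMT.contains(user)) {
                loadUsers(user, result);
            }
        }
    }

    public static void main(String[] args) {
        List<String> result = new ArrayList<>();
        loadUsers("a", result);
        System.out.println(result); // [b, c, d, e, f, g, h, i, j]
    }
}
```

With the sample data this collects all nine users below a without ever revisiting a itself.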
