Regular Expression to get Information from Whatsapp Text file

I don't know how to write a regular expression that extracts the different fields from a text file. I am working on a text file exported from a WhatsApp chat, which contains the message details.
Consider the following data from such a file:
25/12/2012 9:15 am: User1: Faith makes all things possible,
Hope makes all things work,
Love makes all things beautiful,
May you have all the three for this Christmas.
MERRY CHRISTMAS
01/01/2013 12:03 am: User1: <message>.
04/08/2013 10:54 am: User2: Happy Friendship day
13/10/2013 11:57 am: User1:<message>
<message continues>
<message continues>
30/12/2013 10:07 pm: User3:<message>
30/12/2013 11:12 pm: User4: Same to you
This is a sample chat from which I need to extract the date, time, username and message. I am working in Java.
The Java code I have worked out so far is below, but I haven't found a REGEX that meets my requirement.
BufferedReader br = new BufferedReader(new FileReader("text filepath"));
String sCurrentLine;
Pattern r = Pattern.compile(REGEX); // REGEX required for extracting data
while ((sCurrentLine = br.readLine()) != null) {
    System.out.println(sCurrentLine);
    Matcher m = r.matcher(sCurrentLine);
    if (m.find()) {
        System.out.println("Date: " + m.group(1));
        System.out.println("Time: " + m.group(2));
        System.out.println("User: " + m.group(3));
        System.out.println("Message: " + m.group(4));
    } else {
        System.out.println("NO MATCH");
    }
}
br.close();
Thanks in advance for any help!

I think you're looking for this regex,
(\d{2}\/\d{2}\/\d{4})\s(\d(?:\d)?:\d{2} [ap]m):\s([^:]*):(.*?)(?=\s*\d{2}\/|$)
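As a Java string literal, the backslashes are doubled and (?s) is added so that . also matches line breaks. Since a message can span several lines, apply it to the whole file content rather than to one line at a time: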
"(?s)(\\d{2}/\\d{2}/\\d{4})\\s(\\d(?:\\d)?:\\d{2} [ap]m):\\s([^:]*):(.*?)(?=\\s*\\d{2}/|$)"
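Here is a minimal sketch of how the regex could be applied to the whole chat text at once. The class name WhatsAppChatParser and the parse method are hypothetical, and the lookahead is a stricter variant of the one above (my own assumption, not part of the original regex): it requires a line break followed by a complete dd/MM/yyyy date, so a fragment such as "25/12" inside a message does not end the message early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class WhatsAppChatParser {

    // Same four capture groups as the regex above. The lookahead stops a
    // message at a line break followed by a full date, or at end of input.
    private static final Pattern MESSAGE = Pattern.compile(
            "(?s)(\\d{2}/\\d{2}/\\d{4})\\s(\\d{1,2}:\\d{2} [ap]m):\\s([^:]*):(.*?)"
            + "(?=\\R\\d{2}/\\d{2}/\\d{4}\\s|\\z)");

    /** Returns one {date, time, user, message} array per chat message. */
    static List<String[]> parse(String chat) {
        List<String[]> messages = new ArrayList<>();
        Matcher m = MESSAGE.matcher(chat);
        while (m.find()) {
            messages.add(new String[] {
                    m.group(1), m.group(2), m.group(3).trim(), m.group(4).trim()});
        }
        return messages;
    }

    public static void main(String[] args) {
        // In-memory sample; a real file would first be read into one String.
        String chat = String.join("\n",
                "25/12/2012 9:15 am: User1: Faith makes all things possible,",
                "Hope makes all things work,",
                "01/01/2013 12:03 am: User1: Happy New Year",
                "04/08/2013 10:54 am: User2: Happy Friendship day");
        for (String[] msg : parse(chat)) {
            System.out.println("Date: " + msg[0] + ", Time: " + msg[1]
                    + ", User: " + msg[2] + ", Message: " + msg[3]);
        }
    }
}
```

To use it on a real export, the file content has to be loaded into a single string first (for example with Files.readAllBytes), because matching line by line with readLine() cannot see the continuation lines of a message.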

Related

Removing the #-Tag from a message (command) Discord JDA

I tried to make a mute command that takes the user's #-tag. I got all the mute functionality working, but I want to add a reason. If the player's name is, for example, "Abc Abc A#0001", the bot shows "Abc A#0001" in the reason message sent to the user.
I tried:
.replace(event.getMessage().getMentionedMembers().get(0).getAsMention() + " ", "");
= the full name is shown
and
.replace(args[1] + " ", "");
= "Abc A#0001" is shown

If you are asking how to get, for example, "Full Username#0001", you can use getAsTag() from the User object instead of getAsMention().

Parsing a Tab Separated File

I'm attempting to parse a TSV file from IMDb:
$hutter Battle of the Sexes (2017) (as $hutter Boy) [Bobby Riggs Fan] <10>
NVTION: The Star Nation Rapumentary (2016) (as $hutter Boy) [Himself] <1>
Secret in Their Eyes (2015) (uncredited) [2002 Dodger Fan]
Steve Jobs (2015) (uncredited) [1988 Opera House Patron]
Straight Outta Compton (2015) (uncredited) [Club Patron/Dopeman]
$lim, Bee Moe Fatherhood 101 (2013) (as Brandon Moore) [Himself - President, Passages]
For Thy Love 2 (2009) [Thug 1]
Night of the Jackals (2009) (V) [Trooth]
"Idle Talk" (2013) (as Brandon Moore) [Himself]
"Idle Times" (2012) {(#1.1)} (as Brandon Moore) [Detective Ryan Turner]
As you can see, some lines start with a tab and some do not. I want a map with the actor's name as the key and a list of movies as the value. The actor's name is separated from the movie listing by one or more tabs.
My code:
while ((line = reader.readLine()) != null) {
    Matcher matcher = headerPattern.matcher(line);
    boolean headerMatchFound = matcher.matches();
    if (headerMatchFound) {
        Logger.getLogger(ActorListParser.class.getName()).log(Level.INFO, "Header for actor list found");
        String newline;
        reader.readLine();
        while ((newline = reader.readLine()) != null) {
            String[] fullLine = null;
            String actor;
            String title;
            Pattern startsWithTab = Pattern.compile("^\t.*");
            Matcher tab = startsWithTab.matcher(newline);
            boolean tabStartMatcher = tab.matches();
            if (!tabStartMatcher) {
                fullLine = newline.split("\t.*");
                System.out.println("Actor: " + fullLine[0] +
                        "Movie: " + fullLine[1]);
            }//this line will have code to match lines that start with tabs.
        }
    }
}
The way I've done this only works for a few lines before I get an ArrayIndexOutOfBoundsException. How can I parse the lines and split each one into at most 2 strings when it contains one or more tabs?

There are subtleties in parsing tab/comma-delimited data files having to do with quoting and escaping.
To save yourself a lot of work, frustration and headaches, you really should consider using one of the existing CSV parsing libraries such as OpenCSV or Apache Commons CSV.
Posted as an answer instead of a comment because the OP has not stated a reason for reinventing the wheel and there are some tasks that really have been "solved" once and for all.
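For the narrower question of splitting a line into at most two strings, here is a minimal sketch in plain Java. It assumes the data has no quoting or escaping to worry about. The pattern "\t.*" in the question treats the first tab and everything after it as the delimiter, so split returns a single element and fullLine[1] throws. Splitting on "\t+" with a limit of 2 keeps the rest of the line intact. The class name ActorListSketch and the sample lines (including where the tabs sit) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, assuming plain tab-delimited lines without quoting.
class ActorListSketch {

    /** Groups movie listings under the most recently seen actor name. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> moviesByActor = new LinkedHashMap<>();
        String currentActor = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank separator lines carry no data
            }
            String movie;
            if (line.startsWith("\t")) {
                // Continuation line: belongs to the actor seen last.
                movie = line.trim();
            } else {
                // A limit of 2 yields at most {actor, rest of line}.
                String[] parts = line.split("\t+", 2);
                currentActor = parts[0];
                if (parts.length < 2) {
                    // Actor name without a movie on the same line.
                    moviesByActor.computeIfAbsent(currentActor, k -> new ArrayList<>());
                    continue;
                }
                movie = parts[1].trim();
            }
            if (currentActor != null) {
                moviesByActor.computeIfAbsent(currentActor, k -> new ArrayList<>()).add(movie);
            }
        }
        return moviesByActor;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("$hutter\t\tBattle of the Sexes (2017)  (as $hutter Boy)  [Bobby Riggs Fan]  <10>");
        lines.add("\t\t\tSteve Jobs (2015)  (uncredited)  [1988 Opera House Patron]");
        lines.add("");
        lines.add("$lim, Bee Moe\tFatherhood 101 (2013)  (as Brandon Moore)  [Himself - President, Passages]");
        parse(lines).forEach((actor, movies) -> System.out.println(actor + " -> " + movies));
    }
}
```

The check on parts.length is what avoids the exception: a line without any tab still produces a one-element array, and the sketch records the actor without a movie instead of indexing past the end.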

Need to filter, parse and sort multiple log files

I need to collect a subset of information from log files that reside on one or more log file servers. I have the following Java code that does the initial data collection and filtering:
public String getLogServerInfo(String userName, String password, String hostNames, String id) throws Exception {
    int timeout = 5;
    String results = "";
    String[] hostNameArray = hostNames.split("\\s*,\\s*");
    for (String hostName : hostNameArray) {
        SSHClient ssh = new SSHClient();
        ssh.addHostKeyVerifier(new PromiscuousVerifier());
        try {
            Utils.writeStdOut("Parsing server: " + hostName);
            ssh.connect(hostName);
            ssh.authPassword(userName, password);
            Session s = ssh.startSession();
            try {
                String sh1 = "cat /logs/en/event/event*.log | grep \"" + id + "\" | grep TYPE=ERROR";
                Command cmd = s.exec(sh1);
                results += IOUtils.readFully(cmd.getInputStream()).toString();
                cmd.join(timeout, TimeUnit.SECONDS);
                Utils.writeStdOut("\n** exit status: " + cmd.getExitStatus());
            } finally {
                s.close();
            }
        } finally {
            ssh.disconnect();
            ssh.close();
        }
    }
    return results;
}
The results string variable looks something like this:
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, HOST=server1, APPLICATION=app1, FUNCTION=function1, STATUS=null, GUID=null, etc. etc.
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, HOST=server1, APPLICATION=app1, FUNCTION=function1, STATUS=null, GUID=null, etc. etc.
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, HOST=server2, APPLICATION=app1, FUNCTION=function2, STATUS=null, GUID=null, etc. etc.
I need to accomplish the following:
How can I sort the results by TIMESTAMP? They are unsorted right now because I am reading one or more files and appending the results to the end of a string.
I only want a subset of "columns" returned, such as TYPE, TIMESTAMP, FUNCTION. I thought I could do it with a regex in the grep, but maybe arrays would be better?
Results are simply being printed to console/report, as this is only printed for failed tests, and is there for troubleshooting purposes only.

I took the list of output that you provided and put it in a file, named test.txt, making sure that each "TYPE=ERROR etc. etc" was in a new line (I guess it's the same in your output, but it isn't clear).
Then I used cat test.txt | cut -d',' -f1,2,5 | sort -k2 to do what you want.
cut -d',' -f1,2,5 splits each line by comma and only reports tokens number 1, 2 and 5 (TYPE, TIMESTAMP, FUNCTION). If you want more, you can add more numbers depending on which tokens you want.
sort -k2 sorts according to the 2nd column (TIMESTAMP)
The output I get is:
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, FUNCTION=function2
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, FUNCTION=function1
TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, FUNCTION=function1
So what you should try is to pipe your command further into | cut -d',' -f1,2,5 | sort -k2
I hope it helps.

After working on this some more, I found that one of the key/value pairs allows commas in its values, so cut will not work. Here is the finished product:
My grep command stays the same, collecting data from all servers:
String sh1 = "cat /logs/en/event/event*.log | grep \"" + id + "\" | grep TYPE=ERROR";
Command cmd = s.exec(sh1);
results += IOUtils.readFully(cmd.getInputStream()).toString();
Put the string into an array, so I can process it line by line:
String lines[] = results.split("\r?\n");
I then used a regex to get the data I needed, repeating the code below for each line in the array and for as many columns as needed. It's a bit of a hack; I probably could have done it better by replacing the comma in the offending key/value pair, then calling split() with a comma as the delimiter and looping over the fields I want.
lines2[i] = "";
Pattern p = Pattern.compile("TYPE=(.*?), APPLICATION=.*");
Matcher m = p.matcher(lines[i]);
if (m.find()) {
lines2[i] += ("TYPE=" + m.group(1));
}
Finally, this sorts by timestamp, since it is the second column:
Arrays.sort(lines2);
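One caveat: Arrays.sort on strings compares the lines as text. With a 12-hour clock that only matches chronological order in some cases, for example "12:05:00 000 PM" sorts after "01:00:00 000 PM" as text even though it is earlier in the day. Here is a minimal sketch that parses the TIMESTAMP value and sorts on it instead. The class name LogLineSorter is hypothetical, and reading the date as MM/dd/yyyy is an assumption; swap the pattern to dd/MM/yyyy if the logs use that order.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the MM/dd/yyyy reading of the date is an assumption.
class LogLineSorter {

    private static final Pattern TIMESTAMP = Pattern.compile(
            "TIMESTAMP=(\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2} \\d{3} [AP]M)");

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss SSS a", Locale.ENGLISH);

    /** Extracts the TIMESTAMP value of a log line as a LocalDateTime. */
    static LocalDateTime timestampOf(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No TIMESTAMP in: " + line);
        }
        return LocalDateTime.parse(m.group(1), FORMAT);
    }

    /** Returns a new list with the lines ordered chronologically. */
    static List<String> sortByTimestamp(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparing(LogLineSorter::timestampOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:31 253 AM, HOST=server1, FUNCTION=function1",
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:59 123 AM, HOST=server1, FUNCTION=function1",
                "TYPE=ERROR, TIMESTAMP=10/03/2015 07:14:28 956 AM, HOST=server2, FUNCTION=function2");
        sortByTimestamp(lines).forEach(System.out::println);
    }
}
```

Lines without a TIMESTAMP make timestampOf throw, which seemed the safer default for a troubleshooting report; filtering such lines out beforehand would be an alternative.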

Show Prolog "write" output in Console Java

I'm using the JPL library to link a Prolog program with a Java interface.
I have a (working) JTextArea to which I redirect everything that appears in the console, but the output of the Prolog "write" instruction does not show up there.
I used this method for queries:
Query q2 = new Query(t2);
System.out.println("Query " + t2 + " is " + (q2.hasSolution()));
but when the query is false, the text from the "write" parts of the Prolog program, for example:
write('Sorry, you can''t go from the '), write(CurPlace), write(' to the '), write(Place), nl,
doesn't appear in JTextArea.
I also tried System.out.println("first solution of " + q2 + ": X = " + q2.oneSolution().get("X")); but it doesn't work.
What appears in the JTextArea is this:
Query goto(garage).
is false
but I also expected the "write" warnings contained in the .pl file. For example:
write('Sorry, you don''t have the keys')

download cover art from musicbrainz with java

I have been struggling for a couple of hours with how to link a discid to a MusicBrainz MBID.
So, using dietmar-steiner / JMBDiscId
JMBDiscId discId = new JMBDiscId();
if (discId.init(PropertyFinder.getProperty("libdiscid.path")))
{
String musicBrainzDiscID = discId.getDiscId(PropertyFinder.getProperty("cdrom.path"));
}
or musicbrainzws2-java
Disc controller = new Disc();
String drive = PropertyFinder.getProperty("cdrom.path");
try {
DiscWs2 disc =controller.lookUp(drive);
log.info("DISC: " + disc.getDiscId() + " match: " + disc.getReleases().size() + " releases");
....
I can extract a discid for freedb or MusicBrainz easily (more or less), but I have not found a way to calculate the id that I need to download cover art via the CoverArtArchiveClient from last.fm.
CoverArtArchiveClient client = new DefaultCoverArtArchiveClient();
try
{
UUID mbid = UUID.fromString("mbid to locate release");
fm.last.musicbrainz.coverart.CoverArt coverArt = client.getByMbid(mbid);
Theoretically, I assume, I could use the data collected by musicbrainzws2-java to trigger a search and then use the MBID from the result ... but that cannot be the best option.
I am happy about any push into the right direction...
Cheers,
Ed.

You don't calculate the MBID. The MBID is attached on every entity you retrieve from MusicBrainz.
When getting releases by DiscID you get a list. Each entry is a release and has an MBID, accessible with getId():
for (ReleaseWs2 rel : disc.getReleases()){
log.info("MBID: " + rel.getId() + ", String: " + rel.toString());
}
You then probably want to try the CoverArtArchive (CAA) for every release and take the first cover art you get.
Unfortunately I don't know of any API documentation for musicbrainzws2 on the web. I recommend running javadoc on all source files.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
