Example using WikipediaTokenizer in Lucene - java

I want to use WikipediaTokenizer from the Lucene project (http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html), but I have never used Lucene. I just want to convert a Wikipedia string into a list of tokens. I see that only four methods are listed for this class: end, incrementToken, reset, and reset(reader). Can someone point me to an example of how to use it?
Thank you.

In Lucene 3.0, the next() method has been removed. You now iterate through the tokens with incrementToken(), which returns false when the end of the input stream is reached. To read each token, use the methods of the AttributeSource class: for every attribute you want to obtain (term, type, payload, etc.), register the corresponding attribute class with your tokenizer using the addAttribute method, then read the returned attribute object after each successful call to incrementToken().
The following partial code sample is from the test class of WikipediaTokenizer, which you can find if you download the Lucene source code.
...
WikipediaTokenizer tf = new WikipediaTokenizer(new StringReader(test));
int count = 0;
int numItalics = 0;
int numBoldItalics = 0;
int numCategory = 0;
int numCitation = 0;
TermAttribute termAtt = tf.addAttribute(TermAttribute.class);
TypeAttribute typeAtt = tf.addAttribute(TypeAttribute.class);
while (tf.incrementToken()) {
String tokText = termAtt.term();
//System.out.println("Text: " + tokText + " Type: " + token.type());
String expectedType = (String) tcm.get(tokText);
assertTrue("expectedType is null and it shouldn't be for: " + tf.toString(), expectedType != null);
assertTrue(typeAtt.type() + " is not equal to " + expectedType + " for " + tf.toString(), typeAtt.type().equals(expectedType) == true);
count++;
if (typeAtt.type().equals(WikipediaTokenizer.ITALICS) == true){
numItalics++;
} else if (typeAtt.type().equals(WikipediaTokenizer.BOLD_ITALICS) == true){
numBoldItalics++;
} else if (typeAtt.type().equals(WikipediaTokenizer.CATEGORY) == true){
numCategory++;
}
else if (typeAtt.type().equals(WikipediaTokenizer.CITATION) == true){
numCitation++;
}
}
...

WikipediaTokenizer tf = new WikipediaTokenizer(new StringReader(test));
Token token = new Token();
token = tf.next(token);
http://www.javadocexamples.com/java_source/org/apache/lucene/wikipedia/analysis/WikipediaTokenizerTest.java.html
Regards

public class WikipediaTokenizerTest {
static Logger logger = Logger.getLogger(WikipediaTokenizerTest.class);
protected static final String LINK_PHRASES = "click [[link here again]] click [http://lucene.apache.org here again] [[Category:a b c d]]";
public WikipediaTokenizer testSimple() throws Exception {
String text = "This is a [[Category:foo]]";
return new WikipediaTokenizer(new StringReader(text));
}
public static void main(String[] args){
WikipediaTokenizerTest wtt = new WikipediaTokenizerTest();
try {
WikipediaTokenizer x = wtt.testSimple();
logger.info(x.hasAttributes());
Token token = new Token();
int count = 0;
int numItalics = 0;
int numBoldItalics = 0;
int numCategory = 0;
int numCitation = 0;
while (x.incrementToken() == true) {
logger.info("seen something");
}
} catch(Exception e){
logger.error("Exception while tokenizing Wiki Text: " + e.getMessage());
}
}
}

Related

ATM using data structure without database and filehandling

I created an ArrayList of my customer class and store the data using JOptionPane. How can I get the data at a specific index of the ArrayList in order to update a customer's data?
Here is my customer class:
public class Customer_Data {
public int account_num,starting_balance=0 ;
public String pincode="",name="",type="",account_num1="";
public Object status;
}
This is the admin class for creating, deleting, and updating customers:
public class ADMIN extends javax.swing.JFrame {
/**
* Creates new form ADMIN
*/
public ADMIN() {
this.user = new ArrayList<Customer_Data>();
initComponents();
}
List<Customer_Data> user;
public void create_account() {
Customer_Data a = new Customer_Data();
a.account_num = (user.size() - 1)+1;
String[] s0 = {"Current", "Savings"};
String[] s01 = {"Active", "Deactive"};
String s = ""; a.name=JOptionPane.showInputDialog("Enter Name");
String s1 = "";
//a.pincode = JOptionPane.showInputDialog("Enter PinCode", s1);
do {
a.pincode = JOptionPane.showInputDialog("Enter 5 digit PinCode", s1);
} while (!a.pincode.matches("[0-9]{5}")); String s2="";
s2 = JOptionPane.showInputDialog("Enter Starting Balance ");
a.starting_balance = Integer.parseInt(s2);
//String s3 = "";
a.status = (String) JOptionPane.showInputDialog(null, "Select Status...", "Status", JOptionPane.QUESTION_MESSAGE, null, s01, s01[0]);
a.type = (String) JOptionPane.showInputDialog(null, "Select Type...", "Type", JOptionPane.QUESTION_MESSAGE, null, s0, s0[0]);
user.add(a);
for (int i = 0; i < user.size(); i++) {
Customer_Data var = user.get(i);
JOptionPane.showMessageDialog(null, var.account_num + "\n" + var.name + "\n" + var.pincode + "\n" + var.status + "\n" + var.type, "sad", JOptionPane.PLAIN_MESSAGE);
}
}
How can I get the data at a specific index in the search function?
public void Search() {
String s1 = "", s2 = "";
s1 = JOptionPane.showInputDialog("Enter Account Number u want to ", s2);
for (int i = 0; i < user.size(); i++) {
if (user.contains(s1)) {
for (int u = 0; u < user.indexOf(i); u++) {
Customer_Data var = user.get(u);
JOptionPane.showMessageDialog(null, var.account_num + "\n" + var.name + "\n" + var.pincode + "\n" + var.status + "\n" + var.type, "sad", JOptionPane.PLAIN_MESSAGE);
}
} else {
JOptionPane.showMessageDialog(null, "Not Fount");
}
}
}
Since your question is mainly about the search function, I'll try to demonstrate how you can fix it. Check out the Search() function in the program below.
I loop over the Customer_Data list only once. For each Customer_Data in the list, I read account_num and compare it with the value the user entered.
To avoid confusion, I did not change any of the identifier names you used, but I strongly recommend following the Java naming conventions. E.g.:
Use Admin instead of ADMIN
Use CustomerData instead of Customer_Data
Use search() instead of Search()
Use accountNum instead of account_num etc.
import javax.swing.JOptionPane;
import java.util.ArrayList;
import java.util.List;
public class ADMIN {
private List<Customer_Data> user;
public static void main(String[] args) {
ADMIN admin = new ADMIN();
admin.user = new ArrayList<>();
Customer_Data customer1 = new Customer_Data();
customer1.account_num = 123;
customer1.name = "Kevin";
admin.user.add(customer1);
Customer_Data customer2 = new Customer_Data();
customer2.account_num = 456;
customer2.name = "Sally";
admin.user.add(customer2);
Customer_Data customer3 = new Customer_Data();
customer3.account_num = 789;
customer3.name = "Peter";
admin.user.add(customer3);
admin.Search();
}
public void Search() {
String s1 = "", s2 = "";
s1 = JOptionPane.showInputDialog("Enter Account Number u want to ", s2);
boolean found = false;
for (int i = 0; i < user.size(); i++) {
Customer_Data var = user.get(i);
if (var.account_num == Integer.parseInt(s1)) {
JOptionPane.showMessageDialog(null, var.account_num + "\n" + var.name, "sad", JOptionPane.PLAIN_MESSAGE);
found = true;
}
}
if (!found) {
JOptionPane.showMessageDialog(null, "Not Found");
}
}
}
class Customer_Data {
public int account_num,starting_balance=0 ;
public String pincode="",name="",type="",account_num1="";
public Object status;
}
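The List<E> method listObject.indexOf(theSearchObject) is what you are looking for.
It returns an int: the index of the first matching element, or -1 if the list does not contain it. Matching is done with equals(), so Customer_Data would have to override equals() for a lookup by account number to work.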
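A minimal sketch of the indexOf approach, under the assumption that the customer class overrides equals() so that two customers count as equal when their account numbers match. The class in the question does not do this, so Customer, IndexOfSketch and findIndex below are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Customer_Data, reduced to two fields.
class Customer {
    final int accountNum;
    final String name;

    Customer(int accountNum, String name) {
        this.accountNum = accountNum;
        this.name = name;
    }

    // indexOf relies on equals, so equality is defined by account number only.
    @Override
    public boolean equals(Object o) {
        return o instanceof Customer && ((Customer) o).accountNum == accountNum;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(accountNum);
    }
}

class IndexOfSketch {
    // Returns the index of the customer with this account number, or -1.
    static int findIndex(List<Customer> users, int accountNum) {
        // The probe object only needs the field that equals() looks at.
        return users.indexOf(new Customer(accountNum, ""));
    }

    public static void main(String[] args) {
        List<Customer> users = new ArrayList<>();
        users.add(new Customer(123, "Kevin"));
        users.add(new Customer(456, "Sally"));
        int i = findIndex(users, 456);
        System.out.println(i >= 0 ? users.get(i).name : "Not found");
    }
}
```

Once the index is known, users.get(i) returns the object to update. If changing equals() is not desirable, the explicit loop from the previous answer does the same job.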

Overall count for substrings in a string java

I have a program which takes tweets from Twitter that contain a specific word and searches through each tweet to count the occurrences of another word that relates to the topic (e.g. in this case the main word is cameron and it's searching for tax and panama). I have it working so that it counts for each individual tweet, but I can't work out how to get a cumulative count of all the occurrences. I've played around with incrementing a variable when the word occurs, but it doesn't seem to work. The code is below; I've taken out my Twitter API keys for obvious reasons.
public class TwitterWordCount {
public static void main(String[] args) {
ConfigurationBuilder configBuilder = new ConfigurationBuilder();
configBuilder.setOAuthConsumerKey(XXXXXXXXXXXXXXXXXX);
configBuilder.setOAuthConsumerSecret(XXXXXXXXXXXXXXXXXX);
configBuilder.setOAuthAccessToken(XXXXXXXXXXXXXXXXXX);
configBuilder.setOAuthAccessTokenSecret(XXXXXXXXXXXXXXXXXX);
//create instance of twitter for searching etc.
TwitterFactory tf = new TwitterFactory(configBuilder.build());
Twitter twitter = tf.getInstance();
//build query
Query query = new Query("cameron");
//number of results pulled each time
query.setCount(100);
//set the language of the tweets that we want
query.setLang("en");
//Execute the query
QueryResult result;
try {
result = twitter.search(query);
//Get the results
List<Status> tweets = result.getTweets();
//Print out the information
for (Status tweet : tweets) {
//get information about the tweet
String userName = tweet.getUser().getName();
long userId = tweet.getUser().getId();
Date creationDate = tweet.getCreatedAt();
String tweetText = tweet.getText();
//print out the information
System.out.println();
System.out.println("Tweeted by " + userName + "(" + userId + ") on date " + creationDate);
System.out.println("Tweet: " + tweetText);
// System.out.println();
String s = tweetText;
Pattern pattern = Pattern.compile("\\w+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.print(matcher.group() + " ");
}
String str = s;
String findStr = "tax";
int lastIndex = 0;
int count = 0;
//int countall = 0;
while (lastIndex != -1) {
lastIndex = str.indexOf(findStr, lastIndex);
if (lastIndex != -1) {
count++;
lastIndex += findStr.length();
//countall++;
}
}
System.out.println();
System.out.println(findStr + " = " + count);
String two = tweetText;
String str2 = two;
String findStr2 = "panama";
int lastIndex2 = 0;
int count2 = 0;
while (lastIndex2 != -1) {
lastIndex2 = str2.indexOf(findStr2, lastIndex2);
if (lastIndex2 != -1) {
count++;
lastIndex2 += findStr.length();
}
System.out.println(findStr2 + " = " + count2);
}
}
}
catch (TwitterException ex) {
ex.printStackTrace();
}
}
}
I'm also aware that this definitely isn't the cleanest of programs, it's work in progress!
You must define your count variables outside of the for loop, so that they are not reset to 0 for every tweet.
int countKeyword1 = 0;
int countKeyword2 = 0;
for (Status tweet : tweets) {
//increase the count variables in your while loops
}
System.out.println("Keyword1 occurrences : " + countKeyword1 );
System.out.println("Keyword2 occurrences : " + countKeyword2 );
System.out.println("All occurrences : " + (countKeyword1 + countKeyword2) );
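The idea can be sketched without the Twitter API. In the example below the list of strings stands in for result.getTweets(), and the class name KeywordTotals, the helper countOccurrences and the sample texts are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class KeywordTotals {
    // Counts non-overlapping, case-sensitive occurrences of word in text.
    static int countOccurrences(String text, String word) {
        int count = 0;
        int index = text.indexOf(word);
        while (index != -1) {
            count++;
            index = text.indexOf(word, index + word.length());
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the tweet texts; these strings are invented.
        List<String> tweets = Arrays.asList(
                "cameron on tax and more tax",
                "panama papers and tax",
                "nothing relevant here");

        // Totals live outside the loop, so they accumulate across tweets.
        int totalTax = 0;
        int totalPanama = 0;
        for (String tweet : tweets) {
            totalTax += countOccurrences(tweet, "tax");
            totalPanama += countOccurrences(tweet, "panama");
        }
        System.out.println("tax = " + totalTax);
        System.out.println("panama = " + totalPanama);
    }
}
```

Moving the counting loop into one helper also avoids the copy-and-paste slips that are easy to make when the same loop is duplicated per keyword with renamed variables.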

Exception in thread "main" java.lang.NullPointerException when trying to update file

I'm in a beginner CS class and I'm trying to update info in a file. The info in the array does get replaced temporarily; however, I am unable to save the changes to the file. And, even after it's replaced, I get the "null" error.
Here is my code, I have omitted the lines and methods that are unrelated:
public static void readData(){
// Variables
int choice2, location;
// Read file
File dataFile = new File("C:/Users/shirley/Documents/cddata.txt");
FileReader in;
BufferedReader readFile;
// Arrays
String[] code = new String[100];
String[] type = new String[100];
String[] artist = new String[100];
String[] song = new String[100];
Double[] price = new Double[100];
Double[] vSales = new Double[100];
// Split Variables
String tempCode, tempType, tempArtist, tempSong, tempPrice, tempVsales;
// Split
String text;
int c = 0;
try{
in = new FileReader(dataFile);
readFile = new BufferedReader(in);
while ((text = readFile.readLine()) != null){
// Split line into temp variables
tempCode = text.substring(0,5);
tempType = text.substring(5,15);
tempArtist = text.substring(16,30);
tempSong = text.substring(30,46);
tempPrice = text.substring(46,52);
tempVsales = text.substring(52);
// Place text in correct arrays
code[c] = tempCode;
type[c] = tempType;
artist[c] = tempArtist;
song[c] = tempSong;
price[c] = Double.parseDouble(tempPrice);
vSales[c] = Double.parseDouble(tempVsales);
c += 1; // increase counter
}
// Output to user
Scanner kb = new Scanner(System.in);
System.out.print("\nSelect another number: ");
choice2 = kb.nextInt();
// Reads data
if (choice2 == 5){
reqStatsSort(code,type,artist,song,price,vSales,c);
location = reqStatistics(code,type,artist,song,price,vSales,c);
if (location == -1){
System.out.println("Sorry, code not found.");
}
else{
System.out.print("Enter new volume sales: ");
vSales[location] = kb.nextDouble();
}
displayBestSellerArray(type,artist,song,vSales,c);
readFile.close();
in.close();
changeVolume(code,type,artist,song,price,vSales,c); // Method to rewrite file
readData();
}
}catch(FileNotFoundException e){
System.out.println("File does not exist or could not be found.");
System.err.println("FileNotFoundException: " + e.getMessage());
}catch(IOException e){
System.out.println("Problem reading file.");
System.err.println("IOException: " + e.getMessage());
}
}
/////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////
////////////////////////////////////////////////////////
///////////////// REQ STATS SORT METHOD ////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
public static void reqStatsSort(String[] sortCode, String[] sortType, String[] sortArtist,
String[] sortSong, Double[] sortPrice, Double[] sortVSales, int c){
// Variables
String tempCode, tempArtist, tempType, tempSong;
double tempVsales, tempPrice;
for(int j = 0; j < (c - 1); j++){
for (int k = j + 1; k < c; k++){
if ((sortCode[k]).compareToIgnoreCase(sortCode[j]) < 0){
// Switch CODE
tempCode = sortCode[k];
sortCode[k] = sortCode[j];
sortCode[j] = tempCode;
// Switch TYPE
tempType = sortType[k];
sortType[k] = sortType[j];
sortType[j] = tempType;
// Switch ARTIST
tempArtist = sortArtist[k];
sortArtist[k] = sortArtist[j];
sortArtist[j] = tempArtist;
// Switch SONG
tempSong = sortSong[k];
sortSong[k] = sortSong[j];
sortSong[j] = tempSong;
// Switch VOLUME
tempVsales = sortVSales[k];
sortVSales[k] = sortVSales[j];
sortVSales[j] = tempVsales;
// Switch PRICE
tempPrice = sortPrice[k];
sortPrice[k] = sortPrice[j];
sortPrice[j] = tempPrice;
}
}
}
}
/////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////
////////////////////////////////////////////////////////
/////////////// REQUEST STATISTICS METHOD //////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
public static int reqStatistics(String[] statsCode, String[] statsType,
String[] statsArtist, String[] statsSong, Double[] statsPrice,
Double[] statsVSales, int c){
// Variables
String cdCode;
// Obtain input from user
Scanner kb = new Scanner(System.in);
System.out.print("Enter a CD code: ");
cdCode = kb.nextLine();
// Binary search
int position;
int lowerbound = 0;
int upperbound = c - 1;
// Find middle position
position = (lowerbound + upperbound) / 2;
while((statsCode[position].compareToIgnoreCase(cdCode) != 0) && (lowerbound <= upperbound)){
if((statsCode[position].compareToIgnoreCase(cdCode) > 0)){
upperbound = position - 1;
}
else {
lowerbound = position + 1;
}
position = (lowerbound + upperbound) / 2;
}
if (lowerbound <= upperbound){
return(position);
}
else {
return (-1);
}
}
/////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////
////////////////////////////////////////////////////////
/////////////// BEST SELLER ARRAY METHOD //////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
public static void displayBestSellerArray (String[] displaySortedType,
String[] displaySortedArtist, String[] displaySortedSong,
Double[] displaySortedVSales, int c){
// Output to user
System.out.println();
System.out.println("MUSIC ARTIST HIT SONG VOLUME");
System.out.println("TYPE SALES");
System.out.println("--------------------------------------------------------------------");
for (int i = 0; i < c; i++){
System.out.print(displaySortedType[i] + " " + displaySortedArtist[i] + " "
+ displaySortedSong[i] + " ");
System.out.format("%6.0f",displaySortedVSales[i]);
System.out.println();
}
}
/////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////
////////////////////////////////////////////////////////
////////////////// CHANGE VOLUME METHOD ////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
public static void changeVolume(String[] writeCode, String[] writeType,
String[] writeArtist, String[] writeSong, Double[] writePrice,
Double[] writeVSales, int c){
File textFile = new File("C:/Users/shirley/Documents/cddata.txt");
FileWriter out;
BufferedWriter writeFile;
// Variables
String entireRecord, tempVSales;
int decLoc;
try{
out = new FileWriter(textFile);
writeFile = new BufferedWriter(out);
// Output to user
for (int i = 1; i <= c; i++){
// Convert volume sales to String
tempVSales = Double.toString(writeVSales[i]);
// Get rid of decimals
decLoc = (tempVSales.indexOf("."));
tempVSales = tempVSales.substring(0,decLoc);
// Create record line
entireRecord = writeCode[i] + " " + writeType[i] + " " + writeArtist[i]
+ " " + writeSong[i] + " " + writePrice[i] + " " + tempVSales;
// Write record to file
writeFile.write(entireRecord);
if (i != c){
writeFile.newLine();
}
}
writeFile.close();
out.close();
System.out.println("Data written to file.");
}
catch(IOException e){
System.out.println("Problem writing to file.");
System.out.println("IOException: " + e.getMessage());
}
}
The last method, changeVolume(), is what isn't working. The error I get is
Exception in thread "main" java.lang.NullPointerException
at culminating3.Culminating3.changeVolume(Culminating3.java:508)
at culminating3.Culminating3.readData(Culminating3.java:185)
at culminating3.Culminating3.readData(Culminating3.java:167)
at culminating3.Culminating3.main(Culminating3.java:47)
Java Result: 1
Line 508 is:
tempVSales = Double.toString(writeVSales[i]);
in the changeVolume method().
So my program asks the user for a CD code to change the volume of sales, and sorts the arrays to perform a binary search if the inputted code exists. If it does, my program replaces the old volume of sales (which it does), and saves it with the changeVolume() method (which it doesn't do and gives me the error).
Please keep in mind I'm a newbie. It looks fine to me but I can't figure out why it's not working. I apologize for any messes in the code. writeVSales[] shouldn't be null because I assigned input in the readData() method?
Problem is here:
// Convert volume sales to String
tempVSales = Double.toString(writeVSales[i]);
// Get rid of decimals
decLoc = (tempVSales.indexOf("."));
tempVSales = tempVSales.substring(0,decLoc);
I suggest you take some sample values and work through this part first.
You can use StringTokenizer to perform this.
When you read the information into the arrays, you start at index 0 (good) and increment c after each record is stored (again, this is fine). When the loop ends, c holds the number of records, so the filled indices are 0 to c - 1 and everything from index c onwards is still null.
int c = 0;
try{
in = new FileReader(dataFile);
readFile = new BufferedReader(in);
while ((text = readFile.readLine()) != null){
// Split line into temp variables
tempCode = text.substring(0,5);
tempType = text.substring(5,15);
tempArtist = text.substring(16,30);
tempSong = text.substring(30,46);
tempPrice = text.substring(46,52);
tempVsales = text.substring(52);
// Place text in correct arrays
code[c] = tempCode;
type[c] = tempType;
artist[c] = tempArtist;
song[c] = tempSong;
price[c] = Double.parseDouble(tempPrice);
vSales[c] = Double.parseDouble(tempVsales);
c += 1; // increase counter
}
Later, in changeVolume(), your for loop starts at 1 and goes up to and including c. So you skip the first element and then read index c, which was never assigned and is therefore null. Double.toString(writeVSales[i]) has to unbox that null, hence the NullPointerException.
// Output to user
for (int i = 1; i <= c; i++){
//code
}
Change the for loop to start at 0 and run while i < c (i.e. up to c - 1):
for (int i = 0; i < c; i++){
// Convert volume sales to String
tempVSales = Double.toString(writeVSales[i]);
// Get rid of decimals
decLoc = (tempVSales.indexOf("."));
tempVSales = tempVSales.substring(0,decLoc);
// Create record line
entireRecord = writeCode[i] + " " + writeType[i] + " " + writeArtist[i]
+ " " + writeSong[i] + " " + writePrice[i] + " " + tempVSales;
// Write record to file
writeFile.write(entireRecord);
if (i != c - 1){ // no newline after the last record
writeFile.newLine();
}
}
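The effect of the two loop bounds can be reproduced in isolation. This is a small sketch, not the original program: LoopBoundsSketch, sumFilled and sumOffByOne are hypothetical names, and summing stands in for writing records to the file.

```java
class LoopBoundsSketch {
    // Visits indices 0 .. c-1, which are exactly the entries that were filled.
    static double sumFilled(Double[] values, int c) {
        double sum = 0;
        for (int i = 0; i < c; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Same loop with the bounds from the question (1 .. c): skips index 0
    // and reads values[c], which was never assigned.
    static double sumOffByOne(Double[] values, int c) {
        double sum = 0;
        for (int i = 1; i <= c; i++) {
            sum += values[i]; // unboxing null at i == c throws NullPointerException
        }
        return sum;
    }

    public static void main(String[] args) {
        Double[] vSales = new Double[100]; // every entry starts out as null
        vSales[0] = 10.0;
        vSales[1] = 20.0;
        int c = 2; // number of records read

        System.out.println(sumFilled(vSales, c));
        try {
            sumOffByOne(vSales, c);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when reading index " + c);
        }
    }
}
```

An array of the wrapper type Double is filled with null references, unlike a double[] array, which would be filled with 0.0. That is why the out-of-range read shows up as a NullPointerException rather than as a wrong value.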

Tokenizer not separating string? (JAVA)

I have a class called FileLocation which stores the size, name, drive and directory of a file.
The class is supposed to separate the extension from the file name ("java" from "test.java") and then compare it to another file using an equals method. For some reason, though, it returns false every time. Any idea what's wrong?
Class file
import java.util.*;
public class FileLocation
{
private String name;
private char drive;
private String directory;
private int size;
public FileLocation()
{
drive = 'X';
directory = "OOProgramming\\Practicals\\";
name = "test";
size = 2;
}
public FileLocation(char driveIn, String dirIn, String nameIn, int sizeIn)
{
drive = driveIn;
directory = dirIn;
name = nameIn;
size = sizeIn;
}
public String getFullPath()
{
return drive + ":\\" + directory + name;
}
public String getFileType()
{
StringTokenizer st1 = new StringTokenizer(name, ".");
return "File type is " + st1.nextToken();
}
public String getSizeAsString()
{
StringBuilder data = new StringBuilder();
if(size > 1048575)
{
data.append("gb");
}
else if(size > 1024)
{
data.append("mb");
}
else
{
data.append("kb");
}
return size + " " + data;
}
public boolean isTextFile()
{
StringTokenizer st2 = new StringTokenizer(name, ".");
if(st2.nextToken() == ".txt" || st2.nextToken() == ".doc")
{
return true;
}
else
{
return false;
}
}
public void appendDrive()
{
StringBuilder st1 = new StringBuilder(drive);
StringBuilder st2 = new StringBuilder(directory);
StringBuilder combineSb = st1.append(st2);
}
public int countDirectories()
{
StringTokenizer stDir =new StringTokenizer(directory, "//");
return stDir.countTokens();
}
public String toString()
{
return "Drive: " + drive + " Directory: " + directory + " Name: " + name + " Size: " + size;
}
public boolean equals(FileLocation f)
{
return drive == f.drive && directory == f.directory && name == f.name && size == f.size;
}
}
Tester program
import java.util.*;
public class FileLocationTest
{
public static void main(String [] args)
{
Scanner keyboardIn = new Scanner(System.in);
FileLocation javaAssign = new FileLocation('X', "Programming\\Assignment\\", "Loan.txt", 1);
int selector = 0;
System.out.print(javaAssign.isTextFile());
}
}
This code never returns true:
StringTokenizer st2 = new StringTokenizer(name, ".");
if(st2.nextToken() == ".txt" || st2.nextToken() == ".doc")
If the file name is file.txt, this is what happens:
(st2.nextToken() == ".txt") means ("file" == ".txt") false
(st2.nextToken() == ".doc") means ("txt" == ".doc") false
The first token is the file name and the second token is the extension, without the dot. Each call to nextToken() consumes a token, so the two comparisons look at two different tokens. On top of that, == compares object references rather than string contents.
The correct code is:
StringTokenizer st2 = new StringTokenizer(name, ".");
String filename = st2.nextToken();
String ext = st2.nextToken();
if(ext.equalsIgnoreCase("txt") || ext.equalsIgnoreCase("doc"))
Always use equals to compare strings, not ==.
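As an alternative to StringTokenizer, here is a short sketch that takes everything after the last dot as the extension. It swaps in lastIndexOf for the tokenizer, and the names ExtensionSketch and extensionOf are hypothetical.

```java
class ExtensionSketch {
    // Returns the text after the last '.', or "" when there is no extension.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1);
    }

    // Compares string contents (ignoring case), not object references.
    static boolean isTextFile(String name) {
        String ext = extensionOf(name);
        return ext.equalsIgnoreCase("txt") || ext.equalsIgnoreCase("doc");
    }

    public static void main(String[] args) {
        System.out.println(isTextFile("Loan.txt"));
        System.out.println(isTextFile("test.java"));
    }
}
```

The same rule applies to the equals method in the question: directory == f.directory and name == f.name compare references, so directory.equals(f.directory) and name.equals(f.name) are needed there as well.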
Take a look at my own question I posted a while back. I ended up using Apache Lucene's tokenizer.
Here is how you use it (copied from here):
TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
int startOffset = offsetAttribute.startOffset();
int endOffset = offsetAttribute.endOffset();
String term = charTermAttribute.toString();
}

Improving the code that parses a Text File

Text file (the first three lines are simple to read; the next three lines start with p):
ThreadSize:2
ExistingRange:1-1000
NewRange:5000-10000
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
p:25 - CrossPromoEditItemRule Data:New UserLogged:false Attribute:1 Attribute:10107 Attribute:10108
p:20 - CrossPromoManageRules Data:Previous UserLogged:true Attribute:1 Attribute:10107 Attribute:10108
Below is the code I wrote to parse the above file; after parsing, I set the corresponding values using the setters. I would like to know whether this code can be improved, in terms of parsing or otherwise, for example by using a regex. My main goal is to parse the file and set the corresponding values. Any feedback or suggestions will be highly appreciated.
private List<Command> commands;
private static int noOfThreads = 3;
private static int startRange = 1;
private static int endRange = 1000;
private static int newStartRange = 5000;
private static int newEndRange = 10000;
private BufferedReader br = null;
private String sCurrentLine = null;
private int distributeRange = 100;
private List<String> values = new ArrayList<String>();
private String commandName;
private static String data;
private static boolean userLogged;
private static List<Integer> attributeID = new ArrayList<Integer>();
try {
// Initialize the system
commands = new LinkedList<Command>();
br = new BufferedReader(new FileReader("S:\\Testing\\Test1.txt"));
while ((sCurrentLine = br.readLine()) != null) {
if(sCurrentLine.contains("ThreadSize")) {
noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
} else if(sCurrentLine.contains("ExistingRange")) {
startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else if(sCurrentLine.contains("NewRange")) {
newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
} else {
allLines.add(Arrays.asList(sCurrentLine.split("\\s+")));
String key = sCurrentLine.split("-")[0].split(":")[1].trim();
String value = sCurrentLine.split("-")[1].trim();
values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
for(String s : values) {
if(s.contains("Data:")) {
data = s.split(":")[1];
} else if(s.contains("UserLogged:")) {
userLogged = Boolean.parseBoolean(s.split(":")[1]);
} else if(s.contains("Attribute:")) {
attributeID.add(Integer.parseInt(s.split(":")[1]));
} else {
commandName = s;
}
}
Command command = new Command();
command.setName(commandName);
command.setExecutionPercentage(Double.parseDouble(key));
command.setAttributeID(attributeID);
command.setDataCriteria(data);
command.setUserLogging(userLogged);
commands.add(command);
}
}
} catch(Exception e) {
System.out.println(e);
}
I think you should know what exactly you're expecting while using RegEx. http://java.sun.com/developer/technicalArticles/releases/1.4regex/ should be helpful.
To answer a comment:
p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060
To parse the line above with a regex (expecting exactly three Attribute: entries):
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true Attribute:1 Attribute:16 Attribute:2060";
Matcher m = Pattern
.compile(
"p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(.*?)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)\\s+Attribute:(\\d+)")
.matcher(parseLine);
if(m.find()) {
int p = Integer.parseInt(m.group(1));
String method = m.group(2);
String data = m.group(3);
boolean userLogged = Boolean.valueOf(m.group(4));
int at1 = Integer.parseInt(m.group(5));
int at2 = Integer.parseInt(m.group(6));
int at3 = Integer.parseInt(m.group(7));
System.out.println(p + " " + method + " " + data + " " + userLogged + " " + at1 + " " + at2 + " "
+ at3);
}
EDIT: looking at your comment, you can still use a regex:
String parseLine = "p:55 - AutoRefreshStoreCategories Data:Previous UserLogged:true "
+ "Attribute:1 Attribute:16 Attribute:2060";
Matcher m = Pattern.compile("p:(\\d+)\\s-\\s(.*?)\\s+Data:(.*?)\\s+UserLogged:(\\S+)").matcher(
parseLine);
if(m.find()) {
for(int i = 0; i < m.groupCount(); ++i) {
System.out.println(m.group(i + 1));
}
}
Matcher m2 = Pattern.compile("Attribute:(\\d+)").matcher(parseLine);
while(m2.find()) {
System.out.println("Attribute matched: " + m2.group(1));
}
But this only works if the text Attribute: does not appear before the "real" attributes (for example as part of the method name after p).
You can use the Scanner class. It has some helper methods for reading text files.
I would turn this inside out. Presently you are:
Scanning the line for a keyword: the entire line if it isn't found, which is the usual case as you have a number of keywords to process and they won't all be present on every line.
Scanning the entire line again for ':' and splitting it on all occurrences
Mostly parsing the part after ':' as an integer, or occasionally as a range.
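So there are several complete scans of each line. Unless the file has zillions of lines this isn't a concern in itself, but it demonstrates that you have got the processing back to front: find the ':' first, take the keyword in front of it, and then decide how to parse the rest.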
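One way to read "inside out" is sketched below for the three header lines: locate the first ':' once, then dispatch on the key. This is an illustration under assumptions rather than the answerer's code; Settings, InsideOutParser, parseRange and applyLine are hypothetical names, and the p: command lines are left to the default branch.

```java
// Hypothetical holder for the values from the three header lines.
class Settings {
    int threadSize;
    int existingStart, existingEnd;
    int newStart, newEnd;
}

class InsideOutParser {
    // Parses "a-b" into a two-element array {a, b}.
    static int[] parseRange(String value) {
        String[] parts = value.split("-");
        return new int[] {
            Integer.parseInt(parts[0].trim()),
            Integer.parseInt(parts[1].trim())
        };
    }

    // Finds the first ':' once, then dispatches on the key in front of it.
    static void applyLine(Settings s, String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return; // not a key:value line
        }
        String key = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        switch (key) {
            case "ThreadSize":
                s.threadSize = Integer.parseInt(value);
                break;
            case "ExistingRange": {
                int[] r = parseRange(value);
                s.existingStart = r[0];
                s.existingEnd = r[1];
                break;
            }
            case "NewRange": {
                int[] r = parseRange(value);
                s.newStart = r[0];
                s.newEnd = r[1];
                break;
            }
            default:
                break; // "p" command lines would be handled here
        }
    }

    public static void main(String[] args) {
        Settings s = new Settings();
        applyLine(s, "ThreadSize:2");
        applyLine(s, "ExistingRange:1-1000");
        applyLine(s, "NewRange:5000-10000");
        System.out.println(s.threadSize + " " + s.existingStart + "-" + s.existingEnd
                + " " + s.newStart + "-" + s.newEnd);
    }
}
```

Each line is scanned once for the separator, and the key decides how the value is interpreted, instead of testing the whole line against every keyword in turn.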
