I am creating a list of files under one directory recursively. I have one variable (UNPACK_EXT) defined as a constant and imported, which should be excluded from the search / file list.
The goal is to return a list of all files whose names contain the (STATS_FILE) variable, without searching any directory whose name contains (UNPACK_EXT).
I pass in a variable called baseDir, which is the starting point of my search. I only want to list files that match (STATS_FILE); however, because of the size of the directories whose names contain (UNPACK_EXT), I need to exclude them, as they are already available in other parts of the application.
This is part of the code:
public class FindStatFilesTag extends BaseTag
{
private String baseDir;
@Override
public void doTag() throws JspException, IOException
{
if ((!isEmpty(var)) && (!isEmpty(baseDir)))
{
if (baseDir.contains("TS"))
{
String pmr = baseDir;
JspContext context = getJspContext();
baseDir = BASE_SF_PATH + "TS" + pmr.charAt(2) + pmr.charAt(3) + pmr.charAt(4) + "/" + pmr.charAt(5) + pmr.charAt(6) + pmr.charAt(7) + "/" + baseDir;
List<File> fileList = new ArrayList<File>();
String startDir = baseDir;
String dirName = startDir + "/" ;
File dir = new File(dirName);
fileList.addAll(listAll(dir));
Map<String, List<FileInfo>> fileInfoMap = new TreeMap<String, List<FileInfo>>();
for (Iterator<File> iter = fileList.iterator(); iter.hasNext();)
{
File file = iter.next();
String name = file.getAbsolutePath();
String shortName = "";
if ((name.contains(STATS_FILE)) && (!name.contains(UNPACK_EXT)))
{
shortName = name.substring(name.indexOf(STATS_FILE));
}
FileInfo fileInfo = new FileInfo(name, "", file.length());
if (fileInfoMap.containsKey(shortName))
{
List<FileInfo> fileInfoList = fileInfoMap.get(shortName);
if (!fileInfoList.contains(fileInfo))
{
fileInfoList.add(fileInfo);
Collections.sort(fileInfoList);
}
}
else
{
List<FileInfo> fileInfoList = new ArrayList<FileInfo>();
fileInfoList.add(fileInfo);
fileInfoMap.put(shortName, fileInfoList);
}
}
context.setAttribute(var, fileInfoMap);
}
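For the exclusion itself, the cheapest place to prune is inside the recursive listing rather than after it, so that directories whose names contain (UNPACK_EXT) are never descended into at all. A minimal sketch of what listAll could look like with that pruning (the listFiles-based recursion is an assumption; the original listAll is not shown above):
private List<File> listAll(File dir)
{
    List<File> result = new ArrayList<File>();
    // listFiles() returns null if dir is not a directory or cannot be read
    File[] children = dir.listFiles();
    if (children == null)
    {
        return result;
    }
    for (File child : children)
    {
        if (child.isDirectory())
        {
            // prune here: never descend into directories whose name contains UNPACK_EXT
            if (!child.getName().contains(UNPACK_EXT))
            {
                result.addAll(listAll(child));
            }
        }
        else
        {
            result.add(child);
        }
    }
    return result;
}
With the pruning done here, the !name.contains(UNPACK_EXT) test in the loop above would no longer be needed; the loop only has to keep the STATS_FILE matches.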
I made a Java program that checks the contents of a directory and generates an MD5 checksum for each file. When the program is done, it saves the result to a CSV file. So far the lookup of files works perfectly, except that when writing to the CSV I want to add only newly detected files. I think the issue is that the MD5 string used as the key is not found correctly.
Here is an excerpt of the CSV file:
4d1954a6d4e99cacc57beef94c80f994,uiautomationcoreapi.h;E:\Tools\Strawberry-perl-5.24.1.1-64\c\x86_64-w64-mingw32\include\uiautomationcoreapi.h;N/A
56ab7135e96627b90afca89199f2c708,winerror.h;E:\Tools\Strawberry-perl-5.24.1.1-64\c\x86_64-w64-mingw32\include\winerror.h;N/A
146e5c5e51cc51ecf8d5cd5a6fbfc0a1,msimcsdk.h;E:\Tools\Strawberry-perl-5.24.1.1-64\c\x86_64-w64-mingw32\include\msimcsdk.h;N/A
e0c43f92a1e89ddfdc2d1493fe179646,X509.pm;E:\Tools\Strawberry-perl-5.24.1.1-64\perl\vendor\lib\Crypt\OpenSSL\X509.pm;N/A
As you can see, the MD5 comes first as the key, followed by a long string containing the name, location and score, which is split on the ; character.
And here is the code that should make sure only new ones are added:
private static HashMap<String, String> map = new HashMap<String,String>();
public void UpdateCSV(HashMap<String, String> filemap) {
/*Set set = filemap.entrySet();
Iterator iterator = set.iterator();
while(iterator.hasNext()) {
Map.Entry mentry = (Map.Entry) iterator.next();
String md = map.get(mentry.getKey());
System.out.println("checking key:" + md);
if (md == null) {
String[] line = mentry.getValue().toString().split(";");
System.out.println("Adding new File:" + line[0]);
map.put(mentry.getKey().toString(), mentry.getValue().toString());
}
}*/
for (final String key : filemap.keySet()) {
String md = map.get(key.toCharArray());
if (md == null) {
System.out.println("Key was not found:" + key);
String[] line = filemap.get(key).toString().split(";");
System.out.println("Adding new File:" + line[0]);
map.put(key, filemap.get(key));
}
}
}
As you can see from the commented-out code, I have already tried this in different ways. The HashMap filemap holds the current state of the folder structure.
To read the already saved CSV file I use the following code:
private void readCSV() {
System.out.println("Reading CSV file");
BufferedReader br = new BufferedReader(filereader);
String line = null;
try {
while ((line = br.readLine()) != null) {
String str[] = line.split(",");
for (int i = 0; i < str.length; i++) {
String arr[] = str[i].split(":");
map.put(arr[0], arr[1]);
System.out.println("just added to map" + arr[0].toString() + " with value "+ arr[0].toString() );
}
}
}
catch(java.io.IOException e) {
System.out.println("Can't read file");
}
}
So when I run the program, it says that all files are new even though they are already known in the CSV. Can anyone help me get this key string checked correctly?
As @Ben pointed out, your problem is that you use a String as the key when putting, but a char[] when getting.
It should be something along the lines of:
for (final String key : filemap.keySet()) {
map.computeIfAbsent(key, k -> {
System.out.println("Key was not found:" + k);
String[] line = filemap.get(k).toString().split(";");
System.out.println("Adding new File:" + line[0]);
return filemap.get(k);
});
}
Since you need both the key and the value from filemap, you are better off iterating over its entrySet. This saves you the additional filemap.get calls:
for (final Map.Entry<String, String> entry : filemap.entrySet()) {
final String key = entry.getKey();
final String value = entry.getValue();
map.computeIfAbsent(key, k -> {
System.out.println("Key was not found:" + k);
String[] line = value.split(";");
System.out.println("Adding new File:" + line[0]);
return value;
});
}
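If the log lines are not needed, the same "insert only when the key is missing" behaviour can also be written with putIfAbsent, which returns the previous value, or null when the key was new. A minimal sketch under the same assumptions about map and filemap:
for (final Map.Entry<String, String> entry : filemap.entrySet()) {
    // putIfAbsent returns null only when the key was not present yet, i.e. the file is new
    if (map.putIfAbsent(entry.getKey(), entry.getValue()) == null) {
        System.out.println("Adding new File:" + entry.getValue().split(";")[0]);
    }
}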
I have a MapReduce program which can process delimited, fixed-width and Excel files. There is no problem reading delimited and fixed-width files. But the problem with Excel files is that the setup() and cleanup() methods are called, yet map() is not. I tried adding annotations to map(), but it still didn't work.
public class RulesDriver extends Configured implements Tool {
private static Logger LOGGER = LoggerFactory.getLogger(RulesDriver.class);
RuleValidationService aWSS3Service = new RuleValidationService();
HashMap<String, Object> dataMap = new HashMap<String, Object>();
HashMap<String, String> controlMap = new HashMap<String, String>();
public String inputPath = "";
public String outputPath = "";
private static DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm");
ControlFileReader ctrlReader = new ControlFileReader();
CSVToExcel csv2Excel = new CSVToExcel();
HashMap<Integer,String> countMap = new HashMap<Integer,String>();
HashMap<String,Integer> numberValueMap = new HashMap<String,Integer>();
HashMap<String,Object> rulesMap = new HashMap<String,Object>();
CharsetConvertor charsetConvertor = new CharsetConvertor();
ControlFileComparison controlFileComparison = new ControlFileComparison();
boolean isControlFileValid = false;
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new RulesDriver(), args);
System.exit(res);
}
@Override
public int run(String[] args) throws Exception {
LOGGER.info("HADOOP MapReduce Driver started");
if (args.length < 3) {
LOGGER.info("Args ");
return 1;
}
int j = -1;
//Prop - Starts
String cacheBucket = args[0];
String s3accesskey = args[1];
String s3accesspass = args[2];
//Prop - ends
// file2InputPath Value is from DB
String file2InputLocation = "";
String fileComparisonInd = "";
String inputPath = "";
String outputPath = "";
String url = "";
String userId = "";
String password = "";
String fileType = "";
String ctrlCompResult = "N";
try {
Configuration conf = new Configuration();
Properties prop = new Properties();
//Prop - starts
prop.setProperty("qatool.cacheBucket", cacheBucket);
prop.setProperty("qatool.s3accesskey", s3accesskey);
prop.setProperty("qatool.s3accesspass", s3accesspass);
String propertiesFile = aWSS3Service.getObjectKey(cacheBucket, "application",prop);
if(null==propertiesFile && "".equals(propertiesFile)){
return 0;
}
S3Object s3Object = aWSS3Service.getObject(cacheBucket, propertiesFile, prop);
LOGGER.info("Loading App properties");
InputStreamReader in = new InputStreamReader(s3Object.getObjectContent());
Properties appProperties = new Properties();
try {
appProperties.load(in);
prop.putAll(appProperties);
LOGGER.info(" ",prop);
}
catch (IOException e1) {
LOGGER.error("Exception while reading properties file :" , e1);
return 0;
}
initialize(prop);
if (!(dataMap == null)) {
if (("N").equals(dataMap.get("SuccessIndicator"))) {
return 0;
}
List value = (ArrayList) dataMap.get("LookUpValList");
LOGGER.info("lookUpVallist",value);
}
if (dataMap != null) {
controlMap = (HashMap<String, String>) dataMap.get("ControlMap");
}
if (controlMap != null) {
inputPath = (prop.getProperty("qatool.rulesInputLocation") + "/").concat((String) dataMap.get("InputFileName")); //TEMP
LOGGER.info(inputPath);
fileType = (String) dataMap.get("FileType");
} else {
inputPath = (prop.getProperty("qatool.rulesInputLocation") + "/").concat((String) dataMap.get("InputFileName"));
LOGGER.info(inputPath);
fileType = (String) dataMap.get("FileType");
}
rulesMap = (HashMap<String,Object>)dataMap.get("RulesMap");
isControlFileValid = controlFileComparison.compareControlFile(controlMap, aWSS3Service, prop, rulesMap); //TEMP
LOGGER.info("isControlFileValid in driver : "+isControlFileValid);
if(isControlFileValid){
ctrlCompResult = "Y";
}
conf.set("isControlFileValid", ctrlCompResult);
// ** DATABASE Connection **/
String ctrlFileId = controlMap.get("ControlFileIdentifier");
url = prop.getProperty(QaToolRulesConstants.DB_URL);
userId = prop.getProperty(QaToolRulesConstants.DB_USER_ID);
password = prop.getProperty(QaToolRulesConstants.DB_USER_DET);
InpflPrcsSumm inpflPrcsSumm = new InpflPrcsSumm();
DBConnectivity dbConnectivity = new DBConnectivity(url, userId, password);
inpflPrcsSumm = dbConnectivity.getPreviousFileDetail(ctrlFileId);
dbConnectivity.closeConnection();
LOGGER.info( " inpflPrcsSumm.getPrevFileId() " + inpflPrcsSumm.getPrevFileId());
prop.setProperty(QaToolRulesConstants.PREV_FILE_ID, inpflPrcsSumm.getPrevFileId().toString());
file2InputLocation = inpflPrcsSumm.getPrevFileLocation();
boolean file2Available = file2InputLocation.isEmpty();
String folderPath = "";
String bucket = "";
if (!file2Available) {
String arr[] = file2InputLocation.split("/");
if(file2InputLocation.startsWith("http")){
bucket = arr[3];
}else{
bucket = arr[2];
}
folderPath = file2InputLocation.substring(file2InputLocation.lastIndexOf(bucket) + bucket.length() + 1, file2InputLocation.length());
}
// File 2 input path
prop.setProperty("qatool.file2InputPath", file2InputLocation);
if(!file2Available){
file2InputLocation = file2InputLocation + "/Success";
String file2Name = aWSS3Service.getObjectKey(bucket, folderPath,prop);
LOGGER.info("bucket->"+bucket);
LOGGER.info("folderPath->"+folderPath);
file2Name = file2Name.substring(file2Name.lastIndexOf("/")+1, file2Name.length());
prop.setProperty("file2Name", (null!=file2Name && "".equals(file2Name))?"":file2Name);
LOGGER.info(prop.getProperty("file2Name"));
}
prop.setProperty("qatool.auditPrevFolderPath", folderPath);
prop.setProperty("qatool.auditBucketPrevFolderPath", bucket);
LOGGER.info("ctrlFileId : " + ctrlFileId);
LOGGER.info("BUCKET : " + bucket);
LOGGER.info("folder : " + folderPath);
Date dateobj = new Date();
outputPath = (String) prop.getProperty("qatool.rulesOutputLocation") + "/" + dateFormat.format(dateobj); //TEMP
fileComparisonInd = controlMap.get("FileComparisonIndicator");
Gson gson = new Gson();
String propSerilzation = gson.toJson(prop);
conf.set("application.properties", propSerilzation);
Job job = Job.getInstance(conf);
job.setJarByClass(RulesDriver.class);
job.setJobName("Rule Validation and Comparison");
job.getConfiguration().set("fs.s3n.awsAccessKeyId", (String) prop.getProperty("qatool.s3accesskey"));
job.getConfiguration().set("fs.s3n.awsSecretAccessKey", (String) prop.getProperty("qatool.s3accesspass"));
job.getConfiguration().set("fs.s3.awsAccessKeyId", (String) prop.getProperty("qatool.s3accesskey"));
job.getConfiguration().set("fs.s3.awsSecretAccessKey", (String) prop.getProperty("qatool.s3accesspass"));
job.getConfiguration().set("fs.s3a.awsAccessKeyId", (String) prop.getProperty("qatool.s3accesskey"));
job.getConfiguration().set("fs.s3a.awsSecretAccessKey", (String) prop.getProperty("qatool.s3accesspass"));
job.getConfiguration().set("fs.s3n.endpoint", "s3.amazonaws.com");
job.getConfiguration().set("fs.s3.endpoint", "s3.amazonaws.com");
job.getConfiguration().set("fs.s3a.endpoint", "s3.amazonaws.com");
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(RulesCountReducer.class);
job.setNumReduceTasks(1);
job.setMaxMapAttempts(1);
job.setMaxReduceAttempts(1);
if("UTF-16".equalsIgnoreCase(controlMap.get("FileCodePage"))){
convertEncoding((String)dataMap.get("InputFileName"),rulesInputLocation,prop);
if (!file2Available && "Y".equals(ctrlCompResult)) {
convertEncoding(inpflPrcsSumm.getPrevFileName(),file2InputLocation,prop);
}
}
LOGGER.info("fileComparisonInd + "+ fileComparisonInd + " file2Available + " + file2Available + " ctrlCompResult + " + ctrlCompResult);
if (fileType != null && fileType.equals(QaToolRulesConstants.INPUT_FILE_TYPE_DELIMI)) {
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
MultipleInputs.addInputPath(job, new Path(rulesInputLocation), TextInputFormat.class, TextRulesMapper.class);
if (fileComparisonInd.equalsIgnoreCase(QaToolRulesConstants.FILE_COMP_INDICATOR) && !file2Available && "Y".equals(ctrlCompResult)) {
Path file2InputPath = new Path(file2InputLocation);
if (isInputPathAvail(file2InputPath, conf)) {
MultipleInputs.addInputPath(job, file2InputPath, TextInputFormat.class, TextRulesMapperFile2.class);
}
}
} else if (fileType != null && fileType.equals(QaToolRulesConstants.INPUT_FILE_TYPE_EXCEL)) {
job.setInputFormatClass(ExcelInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
MultipleInputs.addInputPath(job, new Path(rulesInputLocation), ExcelInputFormat.class, ExcelMapper.class);
String inputFileName = controlMap.get("InputFileName");
String fileExtn = inputFileName.substring(inputFileName.lastIndexOf(".") + 1);
prop.setProperty("File", "Excel");
prop.setProperty("fileExtension", fileExtn);
if (fileComparisonInd.equalsIgnoreCase(QaToolRulesConstants.FILE_COMP_INDICATOR) && !file2Available && "Y".equals(ctrlCompResult)) {
Path file2InputPath = new Path(file2InputLocation);
if (isInputPathAvail(file2InputPath, conf)) {
MultipleInputs.addInputPath(job, file2InputPath, ExcelInputFormat.class, ExcelMapper2.class);
}
}
} else if (fileType != null && fileType.equals(QaToolRulesConstants.INPUT_FILE_TYPE_FIXED)) {
prop.setProperty("File", "DAT");
MultipleInputs.addInputPath(job, new Path(rulesInputLocation), TextInputFormat.class, FixedWidthMapper.class);
if (fileComparisonInd.equalsIgnoreCase(QaToolRulesConstants.FILE_COMP_INDICATOR) && !file2Available && "Y".equals(ctrlCompResult)) {
Path file2InputPath = new Path(file2InputLocation);
if (isInputPathAvail(file2InputPath, conf)) {
MultipleInputs.addInputPath(job, file2InputPath, TextInputFormat.class, FixedWidthMapper2.class);
}
}
}
MultipleOutputs.addNamedOutput(job, "error", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, "success", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(job, QaToolRulesConstants.ADDED_DELETED, TextOutputFormat.class, Text.class, Text.class );
/*MultipleOutputs.addNamedOutput(job, QaToolRulesConstants.ADDED_UPDATED, TextOutputFormat.class, Text.class, Text.class );*/ //TEMP ADDED FOR ADDED AND UPDATED
MultipleOutputs.addNamedOutput(job, QaToolRulesConstants.DETAIL, TextOutputFormat.class, Text.class, Text.class);
FileOutputFormat.setOutputPath(job, new Path(outputPath));
j = job.waitForCompletion(true) ? 0 : 1;
LOGGER.info("Program Complted with return " + j);
//Code Added for Control file Movement -- starts
String outputBucket = rulesOutputLocation;
outputBucket = outputBucket.substring(outputBucket.indexOf("//")+2, outputBucket.length());
outputBucket = outputBucket.substring(0,(outputBucket.indexOf("/")));
String controlFileNamekey = aWSS3Service.getObjectKey(outputBucket, "delivery/"+ dataMap.get("ControlFileName"),prop);
if (controlFileNamekey != null) {
controlFileNamekey = (String) controlFileNamekey.substring(controlFileNamekey.lastIndexOf("/") + 1,controlFileNamekey.length());
String outputCtrlFilePath = "delivery/"+ dateFormat.format(dateobj) +"/" + controlFileNamekey;
LOGGER.info("controlFileNamekey "+controlFileNamekey+" outputCtrlFilePath "+outputCtrlFilePath);
aWSS3Service.moveObjects(outputBucket, "delivery/"+controlFileNamekey, outputBucket, outputCtrlFilePath,prop);
}
//Code Added for Control file Movement -- Ends
if (j == 0) {
// Get counters
LOGGER.info("Transfer");
final Counters counters = job.getCounters();
long duplicates = counters.findCounter(MATCH_COUNTER.DUPLICATES).getValue();
LOGGER.info("duplicates->"+duplicates);
long groupingThreshold = counters.findCounter(MATCH_COUNTER.GROUPING_ERR_THRESHOLD).getValue();
LOGGER.info("groupingThreshold->"+groupingThreshold);
if(groupingThreshold==1 || duplicates==1){
if(duplicates==1){
writeOutputFile(folderName,dateobj,"DuplicateRecords",prop,cacheBucket);
}else{
writeOutputFile(folderName,dateobj,"GroupingThreshold",prop,cacheBucket);
}
}else{
long successCount = counters.findCounter(MATCH_COUNTER.SUCCESS_COUNT).getValue();
if (controlMap.get("ColumnHeaderPresentIndicator") != null
&& controlMap.get("ColumnHeaderPresentIndicator").equals("Y")) {
successCount = successCount-1;
}
LOGGER.info("successCount "+successCount);
LOGGER.info("TOLERANCEVALUE " + counters.findCounter(MATCH_COUNTER.TOLERANCEVALUE).getValue());
LOGGER.info("RULES_ERRORTHRESHOLD " + counters.findCounter(MATCH_COUNTER.RULES_ERRORTHRESHOLD).getValue());
long errorThreshold = counters.findCounter(MATCH_COUNTER.RULES_ERRORTHRESHOLD).getValue();
LOGGER.info("COMPARISION_ERR_THRESHOLD " + counters.findCounter(MATCH_COUNTER.COMPARISION_ERR_THRESHOLD).getValue());
writeOutputFile(folderName,dateobj, outputPath + "," + successCount + "," + counters.findCounter(MATCH_COUNTER.TOLERANCEVALUE).getValue() + "," + errorThreshold + ","
+counters.findCounter(MATCH_COUNTER.COMPARISION_ERR_THRESHOLD).getValue()+","+ctrlCompResult,prop,cacheBucket);
String auditBucketName = "";
auditBucketName = rulesOutputLocation;
auditBucketName = auditBucketName.substring(auditBucketName.indexOf("//") + 2, auditBucketName.length() - 1);
auditBucketName = auditBucketName.substring(0, (auditBucketName.indexOf("/")));
String auditFileMovementPath = "delivery/" + dateFormat.format(dateobj);
auditFile = auditFile.replace(".xlsx","");
String inputFileName = (String) dataMap.get("InputFileName");
inputFileName = inputFileName.substring(0,inputFileName.lastIndexOf(".")).concat(".xlsx");
try {
LOGGER.info("Audit Bucket Name : " + auditBucketName);
LOGGER.info("Move parameter >>> outputbucketname : auditFileLocation : auditBucketName : auditFileMovementPath auditFile ");
LOGGER.info("Move parameter " + outputbucketname + ", " + auditFileLocation + " , " + auditBucketName + " , " + auditFileMovementPath + "/" + auditFile + "_" + inputFileName);
aWSS3Service.moveObjects(outputbucketname, auditFileLocation, auditBucketName, auditFileMovementPath +"/"+ auditFile +"_"+ inputFileName, prop);
} catch (Exception e) {
LOGGER.error("Exception while moving audit file ",e);
}
}
}else{
writeOutputFile(folderName,dateobj,"DuplicateRecords",prop,cacheBucket);
}
} catch (Exception e) {
LOGGER.error("Error in RulesDriver ", e);
}
return j;
}
}
Excel Mapper:
public class ExcelMapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
protected void setup(Mapper<LongWritable, Text, Text, Text>.Context context)throws InterruptedException, IOException {
LOGGER.info("Inside Mapper Setup");
}
@Override
public void map(LongWritable key, Text value, Context context) throws InterruptedException, IOException {
}
@Override
protected void cleanup(Context context) throws IOException,
InterruptedException {
}
}
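For context on why setup() and cleanup() can run while map() never does: the Hadoop framework's Mapper.run() calls map() once per record produced by the InputFormat's RecordReader, so if ExcelInputFormat never yields a key/value pair for the split, the loop body is simply skipped. Roughly, the framework does the equivalent of the following (a simplified sketch of the standard Mapper.run() contract, not of ExcelInputFormat itself), so the thing to verify is whether the Excel RecordReader ever returns true from nextKeyValue() for your input:
public void run(Context context) throws IOException, InterruptedException {
    setup(context);                        // always called once per map task
    try {
        while (context.nextKeyValue()) {   // driven by the RecordReader of the InputFormat
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    } finally {
        cleanup(context);                  // always called, even when no records were read
    }
}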
I am trying to copy files from one directory to another after renaming them, but keep getting the error:
Exception in thread "main" java.nio.file.NoSuchFileException: C:\Users\talain\Desktop\marketingOriginal\FX Rates\FY17\Q11\Week_12___February_12_2016.xls -> C:\Users\talain\Desktop\fakeOnedrive\FX Rates\FY17\Q11\0
My code:
public class shortenFilenameClass
{
static String absolutePathLocal = "C:\\Users\\talain\\Desktop\\marketingOriginal".replace('\\', '/'); //path to original files
static String absolutePathOnedrive= "C:\\Users\\talain\\Desktop\\fakeOnedrive".replace('\\', '/'); //path to onedrive
public static void main(String[] args) throws IOException
{
System.out.println(absolutePathLocal.length());
File local = new File(absolutePathLocal);
File onedrive = new File(absolutePathOnedrive);
int fileCount = 0;
File[] filesInDir = local.listFiles();
manipulateFiles(filesInDir, fileCount);
}
public static void manipulateFiles(File[] filesInDirPassed, int fileCount) throws IOException
{
for(int i = 0; i < filesInDirPassed.length; i++)
{
File currentFile = filesInDirPassed[i];
if(currentFile.isDirectory())
{
String local = currentFile.getAbsolutePath();
//String onedrive = current
manipulateFiles(currentFile.listFiles(), fileCount);
}
String name = currentFile.getName();
System.out.println("old filename: " + name);
String newName = String.valueOf(fileCount);
fileCount++;
File oldPath = new File(currentFile.getAbsolutePath());
System.out.println("oldPath: " + oldPath);
//currentFile.renameTo(new File(oldPath.toString()));
System.out.println("currentFile: " + currentFile);
String pathExtension = new String(currentFile.getAbsolutePath().substring(42));
pathExtension = pathExtension.replaceAll(name, newName);
File newPath = new File(absolutePathOnedrive + "/" + pathExtension);
System.out.println("newPath: " + newPath);
copyFileUsingJava7Files(oldPath, newPath);
File finalPath = new File(absolutePathOnedrive + "" + name);
//newPath.renameTo(new File(finalPath.toString()));
//copyFileUsingJava7Files(newPath, finalPath);
System.out.println("renamed: " + name + "to: " + newName + ", copied to one drive, and changed back to original name");
}
}
private static void copyFileUsingJava7Files(File source, File dest) throws IOException {
Files.copy(source.toPath(), dest.toPath());
}
}
The source -> target form of that message comes from Files.copy: it names the source file and the destination path it tried to create. A NoSuchFileException here means one of the two does not exist; since the source was just listed from disk, the likely culprit is that the destination directory (C:\Users\talain\Desktop\fakeOnedrive\FX Rates\FY17\Q11) has not been created before the copy, so the target file cannot be created inside it. Check the line where the error originates and make sure the destination directory tree exists before copying into it.
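One way to rule that out is to create the missing parent directories right before each copy. A minimal sketch of the existing helper with that added (the rest of the traversal is assumed unchanged):
private static void copyFileUsingJava7Files(File source, File dest) throws IOException {
    // Ensure the destination's parent directories exist; createDirectories is a no-op
    // when they are already there.
    Files.createDirectories(dest.toPath().getParent());
    Files.copy(source.toPath(), dest.toPath());
}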
Basics of this program:
It runs a web crawler based on the parent URL and keyword specified by the user in the controller (main). If the keyword is found in the page text, the URL is saved to an array list:
ArrayList UrlHits = new ArrayList();
Once the crawl is complete, the program calls methods from the WriteFile class in main to write an HTML file containing all the UrlHits.
WriteFile f = new WriteFile();
f.openfile(Search);
f.StartHtml();
f.addUrl(UrlHits);
f.EndHtml();
f.closeFile();
All but f.addUrl work correctly, creating an HTML file with the correct name in the correct directory. But none of the strings from the ArrayList are written to the file.
public static void main(String[] args) throws Exception {
RobotstxtConfig robotstxtConfig2 = new RobotstxtConfig();
String crawlStorageFolder = "/Users/Jake/Documents/sem 2/FYP/Crawler/TestData";
int numberOfCrawlers = 1;
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);
config.setMaxDepthOfCrawling(21);
config.setMaxPagesToFetch(24);
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
Scanner perentUrl = new Scanner(System.in);
System.out.println("Enter full perant Url... example. http://www.domain.co.uk/");
String Url = perentUrl.nextLine();
Scanner keyword = new Scanner(System.in);
System.out.println("Enter search term... example. Pies");
String Search = keyword.nextLine();
System.out.println("Searching domain :" + Url);
System.out.println("Keyword:" + Search);
ArrayList<String> DomainsToInv = new ArrayList<String>();
ArrayList<String> SearchTerms = new ArrayList<String>();
ArrayList<String> UrlHits = new ArrayList<String>();
DomainsToInv.add(Url);
SearchTerms.add(Search);
controller.addSeed(Url);
controller.setCustomData(DomainsToInv);
controller.setCustomData(SearchTerms);
controller.start(Crawler.class, numberOfCrawlers);
WriteFile f = new WriteFile();
f.openfile(Search);
f.StartHtml();
f.addUrl(UrlHits);
f.EndHtml();
f.closeFile();
}
}
public class Crawler extends WebCrawler {
@Override
public void visit(Page page) {
int docid = page.getWebURL().getDocid();
String url = page.getWebURL().getURL();
String domain = page.getWebURL().getDomain();
String path = page.getWebURL().getPath();
String subDomain = page.getWebURL().getSubDomain();
String parentUrl = page.getWebURL().getParentUrl();
String anchor = page.getWebURL().getAnchor();
System.out.println("Docid: " + docid);
System.out.println("URL: " + url);
System.out.println("Domain: '" + domain + "'");
System.out.println("Sub-domain: '" + subDomain + "'");
System.out.println("Path: '" + path + "'");
System.out.println("Parent page: " + parentUrl);
System.out.println("Anchor text: " + anchor);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
List<WebURL> links = htmlParseData.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
}
Header[] responseHeaders = page.getFetchResponseHeaders();
if (responseHeaders != null) {
System.out.println("Response headers:");
for (Header header : responseHeaders) {
System.out.println("\t" + header.getName() + ": " + header.getValue());
}
}
System.out.println("=============");
ArrayList<String> SearchTerms = (ArrayList<String>) this.getMyController().getCustomData();
ArrayList<String> UrlHits = (ArrayList<String>) this.getMyController().getCustomData();
for (String Keyword : SearchTerms) {
System.out.println("Searching Keyword: " + Keyword);
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
int KeywordCounter = 0;
String pagetext = htmlParseData.getText();
Pattern pattern = Pattern.compile(Keyword);
Matcher match1 = pattern.matcher(pagetext);
if (match1.find()) {
while (match1.find()) {
KeywordCounter++;
}
System.out.println("FOUND " + Keyword + " in page text. KeywordCount: " + KeywordCounter);
UrlHits.add(url);
for (int i = 0; i < UrlHits.size(); i++) {
System.out.print(UrlHits.get(i) + "\n");
System.out.println("=============");
}
} else {
System.out.println("Keyword search was unsuccesful");
System.out.println("=============");
}
}
}
}
public class WriteFile {
private Formatter x;
public void openfile(String keyword) {
try {
x = new Formatter(keyword + ".html");
} catch (Exception e) {
System.out.println("ERROR");
}
}
public void StartHtml() {
x.format("%s %n %s %n %s %n %s %n %s %n ", "<html>", "<head>", "</head>", "<body>", "<center>");
}
public void addUrl(ArrayList<String> UrlHits) {
for (String list : UrlHits) {
x.format("%s%s%s%s%s%n%s%n", "", list, "", "<br>");
}
}
public void EndHtml() {
x.format("%s %n %s %n %s %n", "</center>", "</body>", "</html>");
}
public void closeFile() {
x.close();
}
}
Apologies for the class headers outside the code blocks; it's a little fiddly. I have tried a few different for statements to get the method to output the array list, but it doesn't seem to work. The strings are being added to the array list, as I can print them with a for loop in main. But when I pass the array list to the addUrl method, it produces nothing. Is there an easier way to write ArrayLists with Formatter and .format?
Thanks for your help.
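On the Formatter side, note that the format string in addUrl declares six %s specifiers but only four arguments are passed, so it would throw a MissingFormatArgumentException as soon as the list is non-empty. A simpler version could look like this (a sketch only; it does not explain why UrlHits arrives at addUrl empty):
public void addUrl(ArrayList<String> UrlHits) {
    for (String url : UrlHits) {
        // one specifier per argument: the URL, a <br> tag, then a newline
        x.format("%s%s%n", url, "<br>");
    }
}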