How to replace a part of a string in an xml file? - java

I have an xml file with something like this:
<Verbiage>
The whiskers plots are based on the responses of incarcerated
<Choice>
<Juvenile> juveniles who have committed sexual offenses. </Juvenile>
<Adult> adult sexual offenders. </Adult>
</Choice>
If the respondent is a
<Choice>
<Adult>convicted sexual offender, </Adult>
<Juvenile>juvenile who has sexually offended, </Juvenile>
</Choice>
#his/her_lc# percentile score, which defines #his/her_lc# position
relative to other such offenders, should be taken into account as well as #his/her_lc# T score. Percentile
scores in the top decile (> 90 %ile) of such offenders suggest that the respondent
may be defensive and #his/her_lc# report should be interpreted with this in mind.
</Verbiage>
I am trying to find a way to parse the XML file (I've been using DOM), search for #his/her_lc# and replace it with "her". I've tried using FileReader, BufferedReader, String.replaceAll and FileWriter, but those didn't work.
Is there a way I could do this using XPath?
Ultimately I want to search this XML file for this string and replace it with another string.
Do I have to add a tag around the string I want to replace so that it can be parsed that way?
Code I tried:
protected void parse() throws ElementNotValidException {
try {
//Parse xml File
File inputXML = new File("template.xml");
DocumentBuilderFactory parser = DocumentBuilderFactory.newInstance(); // new instance of doc builder
DocumentBuilder dParser = parser.newDocumentBuilder(); // calls it
Document doc = dParser.parse(inputXML); // parses file
FileReader reader = new FileReader(inputXML);
String search = "#his/her_lc#";
String newString;
BufferedReader br = new BufferedReader(reader);
while ((newString = br.readLine()) != null){
newString.replaceAll(search, "her");
}
FileWriter writer = new FileWriter(inputXML);
writer.write(newString);
writer.close();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
Code I was given to fix:
try {
File inputXML = new File("template.xml"); // creates new input file
DocumentBuilderFactory parser = DocumentBuilderFactory.newInstance(); // new instance of doc builder
DocumentBuilder dParser = parser.newDocumentBuilder(); // calls it
Document doc = dParser.parse(inputXML); // parses file
doc.getDocumentElement().normalize();
NodeList pList = doc.getElementsByTagName("Verbiage"); // gets element by tag name and places into list to begin parsing
int gender = 1; // gender has to be taken from the response file, it is hard coded for testing purposes
System.out.println("----------------------------"); // new line
// loops through the list of Verbiage tags
for (int temp = 0; temp < pList.getLength(); temp++) {
Node pNode = pList.item(temp); // sets node to the current item in the list
if (pNode.getNodeType() == Node.ELEMENT_NODE) { // if the node type = the element node
Element eElement = (Element) pNode;
NodeList pronounList = doc.getElementsByTagName("pronoun"); // gets a list of pronoun element tags
if (gender == 0) { // if the gender is male
int count1 = 0;
while (count1 < pronounList.getLength()) {
if ("#he/she_lc#".equals(pronounList.item(count1).getTextContent())) {
pronounList.item(count1).setTextContent("he");
}
if ("#he/she_caps#".equals(pronounList.item(count1).getTextContent())) {
pronounList.item(count1).setTextContent("He");
}
if ("#his/her_lc#".equals(pronounList.item(count1).getTextContent())) {
pronounList.item(count1).setTextContent("his");
}
if ("#his/her_caps#".equals(pronounList.item(count1).getTextContent())) {
pronounList.item(count1).setTextContent("His");
}
if ("#him/her_lc#".equals(pronounList.item(count1).getTextContent())) {
pronounList.item(count1).setTextContent("him");
}
count1++;
}
pNode.getNextSibling();
} else if (gender == 1) { // female
int count = 0;
while (count < pronounList.getLength()) {
if ("#he/she_lc#".equals(pronounList.item(count).getTextContent())) {
pronounList.item(count).setTextContent("she");
}
if ("#he/she_caps#".equals(pronounList.item(count).getTextContent())) {
pronounList.item(count).setTextContent("She");
}
if ("#his/her_lc#".equals(pronounList.item(count).getTextContent())) {
pronounList.item(count).setTextContent("her");
}
if ("#his/her_caps#".equals(pronounList.item(count).getTextContent())) {
pronounList.item(count).setTextContent("Her");
}
if ("#him/her_lc#".equals(pronounList.item(count).getTextContent())) {
pronounList.item(count).setTextContent("her");
}
count++;
}
pNode.getNextSibling();
}
}
}
// write the content to file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
System.out.println("-----------Modified File-----------");
StreamResult consoleResult = new StreamResult(System.out);
transformer.transform(source, new StreamResult(new FileOutputStream("template.xml"))); // writes changes to file
} catch (Exception e) {
e.printStackTrace();
}
}
This code I think would work if I could figure out how to associate the tag Pronoun with the pronounParser that this code is in.

I used this example and your template.xml, and I think it works.
public static void main(String[] args) {
    File inputXML = new File("template.xml");
    String search = "#his/her_lc#";
    StringBuilder strTotale = new StringBuilder();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(inputXML))) {
        String newString;
        while ((newString = br.readLine()) != null) {
            // replaceAll returns a new String, so the result must be reassigned
            newString = newString.replaceAll(search, "her");
            // readLine strips the line terminator, so add it back
            strTotale.append(newString).append(System.lineSeparator());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(strTotale.toString());
}
First, you must reassign the result of replaceAll, because it returns a new string and leaves the original unchanged:
newString = newString.replaceAll(search, "her");
Second, I used a StringBuilder to collect all the lines.
I hope this helps.
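As a small sketch of the same idea without regular expressions: String.replace treats the search text literally, whereas replaceAll interprets it as a regex. The class name PronounFiller and the replacement table are assumptions for illustration, based on the placeholders that appear in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original code.
final class PronounFiller {

    // Placeholder -> replacement table for a female respondent (assumed values).
    static Map<String, String> femalePronouns() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("#he/she_lc#", "she");
        map.put("#he/she_caps#", "She");
        map.put("#his/her_lc#", "her");
        map.put("#his/her_caps#", "Her");
        map.put("#him/her_lc#", "her");
        return map;
    }

    // Returns a new string with every placeholder replaced; the input is not modified.
    static String fill(String text, Map<String, String> pronouns) {
        String result = text;
        for (Map.Entry<String, String> entry : pronouns.entrySet()) {
            // String.replace matches the placeholder literally and returns a new String
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Calling PronounFiller.fill(line, PronounFiller.femalePronouns()) on each line read from the file would cover all the placeholders in one pass; a second table for male respondents could be built the same way.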

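Since strings are immutable, you cannot modify them in place; replaceAll returns a new string. Collect the results in a StringBuilder/StringBuffer instead of a String.
String search = "#his/her_lc#";
String newString;
StringBuilder str = new StringBuilder();
BufferedReader br = new BufferedReader(new FileReader(inputXML));
while ((newString = br.readLine()) != null) {
    // append the replaced line and restore the line break that readLine strips
    str.append(newString.replaceAll(search, "her")).append(System.lineSeparator());
}
br.close(); // finish reading before the same file is overwritten
FileWriter writer = new FileWriter(inputXML);
writer.write(str.toString()); // write takes a String, not a StringBuilder
writer.close();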
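Since the question also asks about XPath, here is a minimal sketch of that route. It selects every text node containing the placeholder and rewrites it in the DOM, which suggests that no extra wrapper tag is strictly needed. The class and method names are hypothetical, and the sketch assumes the placeholder contains no single quote, because it is embedded in the XPath expression as a literal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original code.
final class PlaceholderXPath {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Replaces the placeholder in every matching text node and returns
    // the number of text nodes that were changed.
    static int replaceInTextNodes(Document doc, String placeholder, String value) {
        // Assumption: placeholder contains no single quote
        String expression = "//text()[contains(., '" + placeholder + "')]";
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                Node text = hits.item(i);
                text.setNodeValue(text.getNodeValue().replace(placeholder, value));
            }
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }
}
```

The modified document can then be written back with a Transformer, as the "Code I was given to fix" in the question already does.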

Related

Converting CSV file to Hierarchy XML with JAVA

We have a Java program that needs to convert a CSV file to hierarchical XML.
The output should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<UteXmlComuniction xmlns="http://www....../data">
<ClientGeneralData>
<Client>
<pfPg></pfPg>
<name>Arnold</name>
<Family>Bordon</Family>
</Client>
<Contract>
<ContractDetail>
<Contract>100020</Contract>
<ContractYear>2019</ContractYear>
</ContractDetail>
</Contract>
</ClientGeneralData>
</UteXmlComuniction>
We are flexible about the CSV format and can define it however we want. I thought it might work with headers like these:
"UteXmlComuniction/ClientGeneralData/Client/pfpg", "UteXmlComuniction/ClientGeneralData/Client/name",
"UteXmlComuniction/ClientGeneralData/Client/Family", ...
This is our code, but it only produces flat XML. Also, I cannot use the "/" character in the CSV file, because the program does not accept it.
public class XMLCreators {
// Protected Properties
protected DocumentBuilderFactory domFactory = null;
protected DocumentBuilder domBuilder = null;
public XMLCreators() {
try {
domFactory = DocumentBuilderFactory.newInstance();
domBuilder = domFactory.newDocumentBuilder();
} catch (FactoryConfigurationError exp) {
System.err.println(exp.toString());
} catch (ParserConfigurationException exp) {
System.err.println(exp.toString());
} catch (Exception exp) {
System.err.println(exp.toString());
}
}
public int convertFile(String csvFileName, String xmlFileName,
String delimiter) {
int rowsCount = -1;
try {
Document newDoc = domBuilder.newDocument();
// Root element
Element rootElement = newDoc.createElement("XMLCreators");
newDoc.appendChild(rootElement);
// Read csv file
BufferedReader csvReader;
csvReader = new BufferedReader(new FileReader(csvFileName));
int line = 0;
List<String> headers = new ArrayList<String>(5);
String text = null;
while ((text = csvReader.readLine()) != null) {
StringTokenizer st = new StringTokenizer(text, delimiter, false);
String[] rowValues = new String[st.countTokens()];
int index = 0;
while (st.hasMoreTokens()) {
String next = st.nextToken();
rowValues[index++] = next;
}
if (line == 0) { // Header row
for (String col : rowValues) {
headers.add(col);
}
} else { // Data row
rowsCount++;
Element rowElement = newDoc.createElement("row");
rootElement.appendChild(rowElement);
for (int col = 0; col < headers.size(); col++) {
String header = headers.get(col);
String value = null;
if (col < rowValues.length) {
value = rowValues[col];
} else {
// ?? Default value
value = "";
}
Element curElement = newDoc.createElement(header);
curElement.appendChild(newDoc.createTextNode(value));
rowElement.appendChild(curElement);
}
}
line++;
}
ByteArrayOutputStream baos = null;
OutputStreamWriter osw = null;
try {
baos = new ByteArrayOutputStream();
osw = new OutputStreamWriter(baos);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
Source src = new DOMSource(newDoc);
Result result = new StreamResult(osw);
aTransformer.transform(src, result);
osw.flush();
System.out.println(new String(baos.toByteArray()));
} catch (Exception exp) {
exp.printStackTrace();
} finally {
try {
osw.close();
} catch (Exception e) {
}
try {
baos.close();
} catch (Exception e) {
}
}
// Output to console for testing
// Resultt result = new StreamResult(System.out);
} catch (IOException exp) {
System.err.println(exp.toString());
} catch (Exception exp) {
System.err.println(exp.toString());
}
return rowsCount;
// "XLM Document has been created" + rowsCount;
}
}
Do you have any suggestions for how I should modify the code, or how I can change my CSV, to get hierarchical XML?
csv:
pfPg;name;Family;Contract;ContractYear
There are several libraries for reading CSV in Java. Store the values in a container, e.g. a HashMap.
Then create Java classes representing your XML structure.
class Client {
private String pfPg;
private String name;
private String Family;
}
class ClientGenaralData {
private Client client;
private Contract contract;
}
Map the CSV values to your Java classes, either with custom code or with a mapper like Dozer. Then use XML binding with Jackson or JAXB to create XML from the Java objects.
Jackson xml
Dozer HowTo
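If a binding library is more than you need, a different route is to stay with the DOM API the question already uses and encode the hierarchy in the CSV header. This is a minimal sketch of that alternative, not the Jackson/JAXB approach described above. It assumes the header columns are dot-separated paths such as Client.name (a dot is used because the question says "/" is not accepted), and the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of the original code.
final class CsvToNestedXml {

    // Builds one document from one data row; headerPaths[i] is a
    // dot-separated element path for the value in values[i].
    static Document build(String rootName, String[] headerPaths, String[] values) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (int col = 0; col < headerPaths.length; col++) {
            Element current = root;
            // Walk down the path, reusing elements that already exist
            for (String segment : headerPaths[col].split("\\.")) {
                current = childOrCreate(doc, current, segment);
            }
            current.setTextContent(col < values.length ? values[col] : "");
        }
        return doc;
    }

    // Returns the first child element with the given name, creating it if absent.
    private static Element childOrCreate(Document doc, Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element created = doc.createElement(name);
        parent.appendChild(created);
        return created;
    }
}
```

With a header such as Client.pfPg;Client.name;Client.Family;Contract.ContractDetail.Contract;Contract.ContractDetail.ContractYear, each data row yields one nested tree, which can be serialized with the Transformer code already present in convertFile.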

How do you read and process a line that can have multiple objects on one line JAVA

I want to read a CSV file containing a number of elements that can be either metal or non-metal. Each line in the CSV file can hold multiple elements; if it does, every row of the file has the same number of elements. A valid line looks like this:
<symbol,name,atomNum,mass,other/><symbol,name,atomNum,mass,other/>
where other is a character if it's a non-metal and a double if it's a metal. This is what I have done so far; I am not sure how to read multiple element objects from one line.
public static void readValues(String fileName)
{
if (!constructed)
{
FileInputStream fileStrm = null;
InputStreamReader rdr;
BufferedReader bufRdr;
String line;
int lineNum;
try
{
fileStrm = new FileInputStream(fileName);
rdr = new InputStreamReader(fileStrm);
bufRdr = new BufferedReader(rdr);
lineNum = 0;
line = bufRdr.readLine();
while(line != null)
{
lineNum++;
elements[lineNum - 1] = processElements(line);
line = bufRdr.readLine();
System.out.println(elements);
}
fileStrm.close();
}
catch(IOException e)
{
if(fileStrm != null)
{
try
{
fileStrm.close();
constructed = true;
}
catch (IOException ex2)
{
}
}
System.out.print("error in file processing: " + e.getMessage());
}
}
}
private static Element processElements(String line)
{
char chr;
Element element;
String[] lineArray = line.split(",");
if(lineArray[4] == chr)
{
element = new NonMetal();
elements.setState(lineArray[4]);
}
else
{
element = new Metal();
element.setConduct(Double.parseDouble(lineArray[4]));
}
element.setSymbol(lineArray[0]);
element.setName(lineArray[1]);
element.setAtomNum(Integer.parseInt(lineArray[2]));
element.setMass(Double.parseDouble(lineArray[3]));
return elements;
}
I have an Element superclass and two subclasses, Metal and NonMetal, each containing the standard methods for a class.
You could use a regex to identify a group between < and />, while processing the line.
<(\w+),(\w+),(\w+),(\w+),(\w+)\/>
https://regex101.com/r/bKnYJZ/1/
String line = "<symbol,name,atomNum,mass,other/><symbol,name,atomNum,mass,other/>";
String regex = "<(\\w+),(\\w+),(\\w+),(\\w+),(\\w+)\\/>";
Matcher matcher = Pattern.compile(regex)
.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
System.out.println(matcher.group(4));
System.out.println(matcher.group(5));
}
You could put this code into your processElements method to identify each element between < />
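One caveat with the pattern above: \w+ does not match a decimal point, so a mass such as 1.008 would not be captured. The following sketch uses a character class that accepts anything except commas, slashes and angle brackets, and decides between metal and non-metal by checking whether the last field parses as a double. The class name, the method names and the sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
final class ElementLineParser {

    // Five comma-separated fields between '<' and '/>'
    private static final Pattern RECORD = Pattern.compile(
            "<([^,<>/]+),([^,<>/]+),([^,<>/]+),([^,<>/]+),([^,<>/]+)/>");

    // Returns one String[5] per <.../> group found on the line.
    static List<String[]> parse(String line) {
        List<String[]> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(line);
        while (matcher.find()) {
            String[] fields = new String[5];
            for (int i = 0; i < 5; i++) {
                fields[i] = matcher.group(i + 1).trim();
            }
            records.add(fields);
        }
        return records;
    }

    // A record is treated as a metal when its last field is numeric.
    static boolean isMetal(String other) {
        try {
            Double.parseDouble(other);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Inside processElements, each String[] could then be turned into a Metal or NonMetal with the setters shown in the question.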
Aside from your problem, I suggest using try-with-resources for simpler and shorter resource handling.
try(FileInputStream fileStrm = new FileInputStream(fileName);
InputStreamReader rdr = new InputStreamReader(fileStrm);
BufferedReader bufRdr = new BufferedReader(rdr))
{
// here your code and no need to take care of resource closing...
}
catch(IOException e)
{
System.out.print("error in file processing: " + e.getMessage());
}

Implemented Jaccard distance into ANTLR to find similarity of java code

After a while, I was able to get a unique ID from a .java file using ANTLR. I then split that unique ID into 4-grams using an n-gram analyzer. This is my code:
public void runAlgoritma(File mainFile, List<String> fileJlist)
BufferedReader in = null;
try {
in = new BufferedReader(new FileReader(FileUtama.getAbsolutePath()));
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
lexer.preserveWhitespacesAndComments = false;
try {
lexer.setCharStream(new ANTLRReaderStream(in));
} catch (IOException e) {
e.printStackTrace();
}
final CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
tokens.LT(10); // force load
Antlr3JavaParser parser = new Antlr3JavaParser(tokens);
StringBuilder sbr = new StringBuilder();
List tokenList = tokens.getTokens();
for (int i = 0; i < tokenList.size(); i++) {
org.antlr.runtime.Token token = (org.antlr.runtime.Token) tokenList.get(i);
int text = token.getType();
sbr.append(text);
}
String mainFile = sbr.toString();
StringBuffer stringBuffer = new StringBuffer();
for (String term : new NgramAnalyzer(4).analyzer(mainFile)) {
stringBuffer.append(term + "\n");
}
System.out.println(stringBuffer);
I was wondering how I can compare two Java source files using Jaccard similarity on the n-grams that I have made.
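For reference, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|, and the Jaccard distance is one minus that value. A minimal sketch, assuming each file has already been reduced to a token string as in the code above; the class name and the character-level n-gram helper are hypothetical stand-ins for NgramAnalyzer:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the original code.
final class JaccardSimilarity {

    // Collects the distinct character n-grams of a string.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

Building one set per source file and passing the two sets to similarity gives a value between 0 (no shared n-grams) and 1 (identical n-gram sets).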

splitting of csv file based on column value in java

I want to split a CSV file into multiple CSV files depending on a column value.
Structure of csv file: Name,Id,Dept,Course
abc,1,CSE,Btech
fgj,2,EE,Btech
(Rows are not separated by ; at end)
If value of Dept is CSE or ME , write it to file1.csv, if value is ECE or EE write it to file2.csv and so on.
Can I use Drools for this purpose? I don't know Drools well.
Any help on how this can be done?
This is what I have done so far:
public void run() {
String csvFile = "C:/csvFiles/file1.csv";
BufferedReader br = null;
BufferedWriter writer=null,writer2=null;
String line = "";
String cvsSplitBy = ",";
String FileName = "C:/csvFiles/file3.csv";
String FileName2 = "C:/csvFiles/file4.csv";
try {
writer = new BufferedWriter(new FileWriter(FileName));
writer2 = new BufferedWriter(new FileWriter(FileName2));
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] values=line.split(cvsSplitBy);
if(values[2].equals("CSE"))
{
writer.write(line);
}
else if(values[2].equals("ECE"))
{
writer2.write(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
writer.flush();
writer.close();
writer2.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
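1) First, find the index of the column to split on, using the header row if there is one, or a fixed index otherwise.
2) Then follow the algorithm below, which produces a map whose key is the value of the split column and whose value is the list of rows with that key:
Map<String, List<String>> resultMap = new HashMap<>();
void add(String key, String row) {
    List<String> data = resultMap.containsKey(key) ? resultMap.get(key) : new ArrayList<String>();
    data.add(row);
    resultMap.put(key, data);
}
Map<String, List<String>> getSplittedMap(List<String> rows, int columnIndex) {
    for (String currentRow : rows) {
        // the key is the value of the split column in this row
        String key = currentRow.split(",")[columnIndex];
        add(key, currentRow);
    }
    return resultMap;
}
Hope this helps.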
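The grouping idea can be sketched so that several column values share one output file, as the question requires (CSE and ME in one file, ECE and EE in another). The class name, the method name and the routing table are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original code.
final class CsvSplitter {

    // Groups rows by target file name. valueToFile maps a column value
    // (e.g. "CSE") to the file it belongs in (e.g. "file1.csv").
    // Rows that are too short or have no route are skipped.
    static Map<String, List<String>> groupByTarget(
            List<String> rows, int column, Map<String, String> valueToFile) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length <= column) {
                continue; // blank or malformed row
            }
            String target = valueToFile.get(fields[column].trim());
            if (target == null) {
                continue; // no route for this value, e.g. the header row
            }
            grouped.computeIfAbsent(target, k -> new ArrayList<>()).add(row);
        }
        return grouped;
    }
}
```

Each entry of the result can then be written with Files.write(Paths.get(fileName), rows), which puts one row per line.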
FileOutputStream f_ECE = new FileOutputStream("providelocation&filenamehere");
FileOutputStream f_CSE_ME = new FileOutputStream("providelocation&filenamehere");
FileInputStream fin = new FileInputStream("providelocation&filenamehere");
int size = fin.available(); // bytes available to read (the file length for a small local file)
byte b[] = new byte[size];
fin.read(b);
String s = new String(b); // file copied into string
String s1[] = s.split("\n");
for (int i = 0; i < s1.length; i++) {
    String s3[] = s1[i].split(",");
    if (s3.length < 3) continue; // skip blank or malformed rows
    String row = s1[i] + "\n"; // restore the line break removed by split
    if (s3[2].equals("ECE"))
        f_ECE.write(row.getBytes());
    if (s3[2].equals("CSE") || s3[2].equals("ME"))
        f_CSE_ME.write(row.getBytes());
}
fin.close();
f_ECE.close();
f_CSE_ME.close();

Strip whitespace and newlines from XML in Java

Using Java, I would like to take a document in the following format:
<tag1>
<tag2>
<![CDATA[ Some data ]]>
</tag2>
</tag1>
and convert it to:
<tag1><tag2><![CDATA[ Some data ]]></tag2></tag1>
I tried the following, but it isn't giving me the result I am expecting:
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
dbfac.setIgnoringElementContentWhitespace(true);
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new FileInputStream("/tmp/test.xml"));
Writer out = new StringWriter();
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "no");
tf.transform(new DOMSource(doc), new StreamResult(out));
System.out.println(out.toString());
Working solution following instructions in the question's comments by @Luiggi Mendoza.
public static String trim(String input) {
BufferedReader reader = new BufferedReader(new StringReader(input));
StringBuffer result = new StringBuffer();
try {
String line;
while ( (line = reader.readLine() ) != null)
result.append(line.trim());
return result.toString();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
Recursively traverse the document. Remove any text nodes with blank content. Trim any text nodes with non-blank content.
public static void trimWhitespace(Node node)
{
NodeList children = node.getChildNodes();
for(int i = 0; i < children.getLength(); ++i) {
Node child = children.item(i);
if(child.getNodeType() == Node.TEXT_NODE) {
child.setTextContent(child.getTextContent().trim());
}
trimWhitespace(child);
}
}
As documented in an answer to another question, the relevant function would be DocumentBuilderFactory.setIgnoringElementContentWhitespace(), but - as pointed out here already - that function requires the use of a validating parser, which requires an XML schema, or some such.
Therefore, your best bet is to iterate through the Document you get from the parser, and remove all nodes of type TEXT_NODE (or those TEXT_NODEs which contain only whitespace).
I support @jtahlborn's answer. Just for completeness, I adapted his solution to completely remove the whitespace-only text nodes instead of just clearing them.
public static void stripEmptyElements(Node node)
{
NodeList children = node.getChildNodes();
for(int i = 0; i < children.getLength(); ++i) {
Node child = children.item(i);
if(child.getNodeType() == Node.TEXT_NODE) {
if (child.getTextContent().trim().length() == 0) {
child.getParentNode().removeChild(child);
i--;
}
}
stripEmptyElements(child);
}
}
The Java 8 transformer does not insert any empty lines, but the Java 10 transformer puts them everywhere. I still want to keep pretty indentation. This is my helper function to create an XML string from any DOM Element instance, such as the doc.getDocumentElement() root node.
public static String createXML(Element elem) throws Exception {
DOMSource source = new DOMSource(elem);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
//transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
//transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,"yes");
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
transformer.transform(source, result);
// The Java 10 transformer adds unnecessary empty lines; remove them
BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));
StringBuilder buf = new StringBuilder();
try {
final String NL = System.getProperty("line.separator", "\r\n");
String line;
while( (line=reader.readLine())!=null ) {
if (!line.trim().isEmpty()) {
buf.append(line);
buf.append(NL);
}
}
} finally {
reader.close();
}
return buf.toString(); //writer.toString();
}
To strip whitespace and newlines from XML in Java, try the following solution, which uses a StringBuffer and conditional logic:
public static String LimpaXML(String xml) {
StringBuffer result = new StringBuffer();
char c_prev = '\0';
xml = xml.trim();
int len = xml.length();
for (int i=0; i<len; i++) {
char c = xml.charAt(i);
char c_next = (i+1 < len) ? xml.charAt(i+1) : '\0';
if (c == '\n') continue;
if (c == '\r') continue;
if (c == '\t') c = ' ';
if (c == ' ') {
if (c_prev == ' ') continue;
if (c_next == '\0') continue;
if (c_prev == '>') continue;
if (c_next == '>') continue;
}
result.append(c);
c_prev = c;
}
return result.toString();
}
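Try this code. It copies source.xml to destination.xml with FileInputStream and FileOutputStream. Note that a byte-for-byte stream copy preserves whitespace and indentation, so on its own it does not strip them.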
try {
File f1 = new File("source.xml");
File f2 = new File("destination.xml");
InputStream in = new FileInputStream(f1);
OutputStream out = new FileOutputStream(f2);
byte[] buf = new byte[1024];
int len;
while ((len = in.read(buf)) > 0){
out.write(buf, 0, len);
}
in.close();
out.close();
System.out.println("File copied.");
} catch(FileNotFoundException ex){
System.out.println(ex.getMessage() + " in the specified directory.");
System.exit(0);
} catch(IOException e7){
System.out.println(e7.getMessage());
}
