I'm using the XMLStreamReader interface from javax.xml to parse an XML file. The file contains huge data amounts and single text nodes of several KB.
The validating and reading generally works very good, but I'm having trouble with text nodes that are larger than 15k characters. The problem occurs in this function
String foo = "";
if (xsr.getEventType() == XMLStreamConstants.CHARACTERS) {
foo = xsr.getText();
xsr.next(); // read next tag
}
return foo;
xsr being the stream reader. The text in the text node is 53'337 characters long in this particular case (but varies), however the xsr.getText() method only returns the first 15'537 of them. Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...
I did not find anything in the documentation or anywhere else about this. Is it intended behavior or can someone confirm/deny it? Am I using it the wrong way somehow?
Thanks
Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...
Actually, that is the idea :)
The parser is permitted to break up the event stream however it wishes, as long as it's consistent with the original document. That means it can, and often will, break up your text data into multiple events. How and when it chooses to do so is an implementation detail internal to the parser, and is essentially unpredictable.
So yes, if you receive multiple sequential CHARACTERS events, you need to append them manually. This is the price you pay for a low-level API.
Another option is the javax.xml.stream.isCoalescing option (documented in XMLStreamReader.next() or Using StAX), which automatically concatenates long text into a single string. The following JUint3 test passes.
Warning: isCoalescing probably shouldn't be used in production because if the document has lots of character references ( ) or entity references (<), it will cause a StackOverflowError!
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import junit.framework.TestCase;
public class XmlStreamTest extends TestCase {
public void testLengthInXMlStreamReader() throws XMLStreamException {
StringBuilder b = new StringBuilder();
b.append("<root>");
for (int i = 0; i < 65536; i++)
b.append("hello\n");
b.append("</root>");
InputStream is = new ByteArrayInputStream(b.toString().getBytes());
XMLInputFactory inputFactory = XMLInputFactory.newFactory();
inputFactory.setProperty("javax.xml.stream.isCoalescing", true);
XMLStreamReader reader = inputFactory.createXMLStreamReader(is);
reader.nextTag();
reader.next();
assertEquals(6 * 65536, reader.getTextLength());
}
}
Related
I have user readable file with several hundreds rows.
Each row is quite short(~20-30 symbols).
From time to time I need to execute equals operation with that string against another strings.
If Strings are different I need to find first row which differs. Sure I can do it manually:
in a loop find first character which differs then find previous and following '/n' but this code is not beaiful from my point of view.
Is there any other way to achieve it using some external libraries ?
There's no need for any library, what you ask is rather straightforward. But it's unique enough that no library would have it, so just write it yourself.
import java.nio.file.Files;
import java.util.*;
...
Optional<String> findFirstDifferentLine(Path file, Collection<String> rows) throws IOException {
try (var fileStream = Files.lines(file)) { // Need to close
var fileIt = fileStream.iterator();
var rowIt = rows.iterator();
while (fileIt.hasNext() && rowIt.hasNext()) {
var fileItem = fileIt.next();
if (!Objects.equal(fileItem, rowIt.next()) {
return Optional.of(fileItem);
}
}
return Optional.of(fileIt)
.filter(Iterator::hasNext)
.map(Iterator::next);
}
}
I have a method that starts creating JSON files in each of the folders in my tree.
public static void fill(List<String> subFoldersPaths) {
for (int i = 0; i < subFoldersPaths.size(); i++) {
String fullFileName = subFoldersPaths.get(i) + FILE_NAME;
String formatFullFileName = String.format(fullFileName, i)+"%d";
Runnable runnable = new JsonCreator(formatFullFileName);
new Thread(runnable).start();
}
}
List<String> subFoldersPaths is a list that contains paths to each folder in order.
Here is my folder structure:
I want each folder to be filled with files in a separate thread every 0.08 seconds. But my class will not fill every folder.
Here is a class that implements Runnable, which should perform the filling:
import com.epam.lab.model.Author;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import net.andreinc.mockneat.MockNeat;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import java.io.FileWriter;
import java.io.IOException;
public class JsonCreator implements Runnable {
private static Logger logger = LogManager.getLogger();
private static String fileName;
private static final int FILES_COUNT = 100;
public JsonCreator(String s){
this.fileName = s;
}
#Override
public void run() {
for (int i = 0; i < FILES_COUNT; i++) {
try {
String formatFullFileName = String.format(fileName, i)+".json";
FileWriter fileWriter = new FileWriter(formatFullFileName);
fileWriter.write(createJsonString());
fileWriter.close();
Thread.sleep(80);
} catch (IOException | InterruptedException e) {
logger.error("File was not created", e);
}
}
}
private static String createJsonString() {
MockNeat mockNeat = MockNeat.threadLocal();
Gson gson = new GsonBuilder()
.setPrettyPrinting()
.create();
String json = mockNeat
.reflect(Author.class)
.field("authorName", mockNeat.names().first())
.field("authorSurname", mockNeat.names().last())
.map(gson::toJson)
.val();
return json;
}
}
But this class fills not every folder with files. (maybe there is a problem with the file names) I can not figure it out.
And I want each folder below "foo" to be filled in a separate thread of JSON files in the amount of FILES_COUNT = 10
some examples of algorithm execution:
The folder structure is created with the participation of the random, so it is almost always different. but this does not affect the fact that files are not created in all folders
Your code is buggy; you cannot ever use that FileWriter constructor. Use new FileWriter(formatFullFileName, StandardCharsets.UTF_8), which is only in jdk11. If you're not on JDK11, you can't use FileWriter at all (it uses platform default encoding, and that is not acceptable; JSON must be in UTF-8 as per the JSON spec, and you have no guarantee that UTF-8 is your platform default).
you aren't guarding your FileWriter with an ARM block - you should add that.
In the initial block, formatFullFileName is a variable that is a format string. In the run() method, it's the opposite (it's the result of running a String.format op on one). Makes your code very hard to read.
Most likely your filenames are incorrect. You should be using List<Path> which would have removed any doubt. If your List<String> subFoldersPaths contains, for example, /home/misnomer/project/foo/1stLayerSubFolder0 in it, and the constant FILE_NAME (which you did not put in your pastes) is, say, example, then the path for the very first file to be created becomes: /home/misnomer/project/foo/1stLayerSubFolder0example0.json which is not what you wanted - you're missing a slash.
NB: If using the newer path API, writing a string to a file becomes vastly simpler: Files.write(path, string) is all you need (and note that the Files API defaults to UTF-8, unlike most other parts of the java libraries that involve turning strings to bytes or vice versa).
The paste needs more info, or you should debug this on your own: Print when you write a file, preferably including the thread ID (you can get it with Thread.currentThread().getName()). That's how programming works: You don't just stare at it, go --heck, I dunno, better ask stack overflow!-- and then give up. You debug it. Use a debugger, or if you can't/don't want to, use the poor man's debugger: Add a whole bunch of System.out.println statements. Go through your code and imagine (write it down if you have to) which each step is doing. Then, add a println statement that confirms this. The very place where what the program says it is doing does not match with what you thought it would do? That's where a bug is. Fix it, and keep going until all bugs are eliminated.
I'd like to read in a file and replace some text with new text. It would be simple using asm and int 21h but I want to use the new java 8 streams.
Files.write(outf.toPath(),
(Iterable<String>)Files.lines(inf)::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);
Somewhere in there I'd like a lines.replace("/*replace me*/","new Code()\n");. The new lines are because I want to test inserting a block of code somewhere.
Here's a play example, that doesn't work how I want it to, but compiles. I just need a way to intercept the lines from the iterator, and replace certain phrases with code blocks.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import static java.nio.file.StandardOpenOption.*;
import java.util.Arrays;
import java.util.stream.Stream;
public class FileStreamTest {
public static void main(String[] args) {
String[] ss = new String[]{"hi","pls","help","me"};
Stream<String> stream = Arrays.stream(ss);
try {
Files.write(Paths.get("tmp.txt"),
(Iterable<String>)stream::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);
} catch (IOException ex) {}
//// I'd like to hook this next part into Files.write part./////
//reset stream
stream = Arrays.stream(ss);
Iterable<String> it = stream::iterator;
//I'd like to replace some text before writing to the file
for (String s : it){
System.out.println(s.replace("me", "my\nreal\nname"));
}
}
}
edit: I've gotten this far and it works. I was trying with filter and maybe it isn't really necessary.
Files.write(Paths.get("tmp.txt"),
(Iterable<String>)(stream.map((s) -> {
return s.replace("me", "my\nreal\nname");
}))::iterator,
CREATE, WRITE, TRUNCATE_EXISTING);
The Files.write(..., Iterable, ...) method seems tempting here, but converting the Stream to an Iterable makes this cumbersome. It also "pulls" from the Iterable, which is a bit odd. It would make more sense if the file-writing method could be used as the stream's terminal operation, within something like forEach.
Unfortunately, most things that write throw IOException, which isn't permitted by the Consumer functional interface that forEach expects. But PrintWriter is different. At least, its writing methods don't throw checked exceptions, although opening one can still throw IOException. Here's how it could be used.
Stream<String> stream = ... ;
try (PrintWriter pw = new PrintWriter("output.txt", "UTF-8")) {
stream.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(pw::println);
}
Note the use of forEachOrdered, which prints the output lines in the same order in which they were read, which is presumably what you want!
If you're reading lines from an input file, modifying them, and then writing them to an output file, it would be reasonable to put both files within the same try-with-resources statement:
try (Stream<String> input = Files.lines(Paths.get("input.txt"));
PrintWriter output = new PrintWriter("output.txt", "UTF-8"))
{
input.map(s -> s.replaceAll("foo", "bar"))
.forEachOrdered(output::println);
}
I have some very simple XML that I wish to unmarshall. I'm only interested in the values for one of the elements which repeats. Here's some sample XML:
<Document>
<HeaderGuff>Whatever</HeaderGuff>
<Foos>
<FooId>1</FooId>
<FooId>2</FooId>
</Foos>
</Document>
I would like to use JAXB to allow me to iterate of the FooId's as a Long.
The usual examples require creating a data class with setFooId and getFooId methods. Is there a way to unmarshall directly to Long such that I can do this:
for ( Long fooId : <something JAXB> )
I do not want to load all the identifiers into memory at once as there are potentially many of them, and they are only needed one at a time for individual processing.
Since you are only interested in one of the elements and don't want to bring all the data into memory at once I would use a StAX parser (included in the JDK/JRE since Java SE 6) instead of JAXB for this use case.
You would then advance your XMLStreamReader to the FooId element, process it and then advance it to the next element.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource source = new StreamSource("input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(source);
while(xsr.hasNext()) {
if(xsr.isStartElement() && "FooId".equals(xsr.getLocalName())) {
long value = Long.valueOf(xsr.getElementText());
System.out.println(value);
}
xsr.next();
}
xsr.close();
}
}
By 'output steam' i mean any object which receives a sequence of bytes, or characters or whatever. So, java.io.OutputStream, but also java.io.Writer, javax.xml.stream.XMLStreamWriter's writeCharacters method, and so on.
I'm writing mock-based tests for a class whose main function is to write a stream of data to one of these (the XMLStreamWriter, as it happens).
The problem is that the stream of data is written in a series of calls to the write method, but what matters is not the calls, but the data. For example, given an XMLStreamWriter out, these:
out.writeCharacters("Hello, ");
out.writeCharacters("world!");
Are equivalent to this:
out.writeCharacters("Hello, world!");
It really doesn't matter (for my purposes) which happens. There will be some particular sequence of calls, but i don't care what it is, so i don't want to write expectations for that particular sequence. I just want to expect a certain stream of data to be written any which way.
One option would be to switch to state-based testing. I could accumulate the data in a buffer, and make assertions about it. But because i'm writing XML, that would mean making some fairly complex and ugly assertions. Mocking seems a much better way of dealing with the larger problem of writing XML.
So how do i do this with a mock?
I'm using Moxie for mocking, but i'm interested in hearing about approaches with any mocking library.
A fairly elegant strategy to test output or input streams is to use PipedInputStream and PipedOutputStream classes. You can wire them together in the set up of the test, and then check what has been written after the target method is executed.
You can work the other direction preparing some input and then let the test read this prepared data from the input stream as well.
In your case, you could just mock that "out" variable with a PipedOutputStream, and plug a PipedInputStream to it this way:
private BufferedReader reader;
#Before
public void init() throws IOException {
PipedInputStream pipeInput = new PipedInputStream();
reader = new BufferedReader(
new InputStreamReader(pipeInput));
BufferedOutputStream out = new BufferedOutputStream(
new PipedOutputStream(pipeInput))));
//Here you will have to mock the output somehow inside your
//target object.
targetObject.setOutputStream (out);
}
#Test
public test() {
//Invoke the target method
targetObject.targetMethod();
//Check that the correct data has been written correctly in
//the output stream reading it from the plugged input stream
Assert.assertEquals("something you expects", reader.readLine());
}
I'll admit that I'm probably partial to using a ByteArrayOutputStream as the lowest level OutputStream, fetching the data after execution and peforming whatever assertions that are needed. (perhaps using SAX or other XML parser to read in the data and dive through the structure)
If you want to do this with a mock, I'll admit I'm somewhat partial to Mockito, and I think you could accomplish what you're looking to do with a custom Answer which when the user invokes writeCharacters on your mock, would simply append their argument to a Buffer, and then you can make assertions on it afterwards.
Here's what I have in my head (hand written, and haven't executed so syntax issues are to be expected :) )
public void myTest() {
final XMLStreamWriter mockWriter = Mockito.mock(XMLStreamWriter.class);
final StringBuffer buffer = new StringBuffer();
Mockito.when(mockWriter.writeCharacters(Matchers.anyString())).thenAnswer(
new Answer<Void>() {
Void answer(InvocationOnMock invocation) {
buffer.append((String)invocation.getArguments()[0]);
return null;
}
});
//... Inject the mock and do your test ...
Assert.assertEquals("Hello, world!",buffer.toString());
}
(Disclaimer: I'm the author of Moxie.)
I assume you want to do this using logic embedded in the mock so that calls that violate your expectation fail fast. Yes, this is possible - but not elegant/simple in any mocking library I know of. (In general mock libraries are good at testing the behavior of method calls in isolation/sequence, but poor at testing more complex interactions between calls over the lifecycle of the mock.) In this situation most people would build up a buffer as the other answers suggest - while it doesn't fail fast, the test code is simpler to implement/understand.
In the current version of Moxie, adding custom parameter-matching behavior on a mock means writing your own Hamcrest matcher. (JMock 2 and Mockito also let you use custom Hamcrest matchers; EasyMock lets you specify custom matchers that extend a similar IArgumentMatcher interface.)
You'll want a custom matcher that will verify that the string passed to writeCharacters forms the next part of the sequence of text you expect to be passed into that method over time, and which you can query at the end of the test to make sure it's received all of the expected input. An example test following this approach using Moxie is here:
http://code.google.com/p/moxiemocks/source/browse/trunk/src/test/java/moxietests/StackOverflow6392946Test.java
I've reproduced the code below:
import moxie.Mock;
import moxie.Moxie;
import moxie.MoxieOptions;
import moxie.MoxieRule;
import moxie.MoxieUnexpectedInvocationError;
import org.hamcrest.BaseMatcher;
import org.hamcrest.Description;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
// Written in response to... http://stackoverflow.com/questions/6392946/
public class StackOverflow6392946Test {
private static class PiecewiseStringMatcher extends BaseMatcher<String> {
private final String toMatch;
private int pos = 0;
private PiecewiseStringMatcher(String toMatch) {
this.toMatch = toMatch;
}
public boolean matches(Object item) {
String itemAsString = (item == null) ? "" : item.toString();
if (!toMatch.substring(pos).startsWith(itemAsString)) {
return false;
}
pos += itemAsString.length();
return true;
}
public void describeTo(Description description) {
description.appendText("a series of strings which when concatenated form the string \"" + toMatch + '"');
}
public boolean hasMatchedEntirely() {
return pos == toMatch.length();
}
}
#Rule
public MoxieRule moxie = new MoxieRule();
#Mock
public XMLStreamWriter xmlStreamWriter;
// xmlStreamWriter gets invoked with strings which add up to "blah blah", so the test passes.
#Test
public void happyPathTest() throws XMLStreamException{
PiecewiseStringMatcher addsUpToBlahBlah = new PiecewiseStringMatcher("blah blah");
Moxie.expect(xmlStreamWriter).anyTimes().on().writeCharacters(Moxie.argThat(addsUpToBlahBlah));
xmlStreamWriter.writeCharacters("blah ");
xmlStreamWriter.writeCharacters("blah");
Assert.assertTrue(addsUpToBlahBlah.hasMatchedEntirely());
}
// xmlStreamWriter's parameters don't add up to "blah blah", so the test would fail without the catch clause.
// Also note that the final assert is false.
#Test
public void sadPathTest1() throws XMLStreamException{
// We've specified the deprecated IGNORE_BACKGROUND_FAILURES option as otherwise Moxie works very hard
// to ensure that unexpected invocations can't get silently swallowed (so this test will fail).
Moxie.reset(xmlStreamWriter, MoxieOptions.IGNORE_BACKGROUND_FAILURES);
PiecewiseStringMatcher addsUpToBlahBlah = new PiecewiseStringMatcher("blah blah");
Moxie.expect(xmlStreamWriter).anyTimes().on().writeCharacters(Moxie.argThat(addsUpToBlahBlah));
xmlStreamWriter.writeCharacters("blah ");
try {
xmlStreamWriter.writeCharacters("boink");
Assert.fail("above line should have thrown a MoxieUnexpectedInvocationError");
} catch (MoxieUnexpectedInvocationError e) {
// as expected
}
// In a normal test we'd assert true here.
// Here we assert false to verify that the behavior we're looking for has NOT occurred.
Assert.assertFalse(addsUpToBlahBlah.hasMatchedEntirely());
}
// xmlStreamWriter's parameters add up to "blah bl", so the mock itself doesn't fail.
// However the final assertion fails, as the matcher didn't see the entire string "blah blah".
#Test
public void sadPathTest2() throws XMLStreamException{
PiecewiseStringMatcher addsUpToBlahBlah = new PiecewiseStringMatcher("blah blah");
Moxie.expect(xmlStreamWriter).anyTimes().on().writeCharacters(Moxie.argThat(addsUpToBlahBlah));
xmlStreamWriter.writeCharacters("blah ");
xmlStreamWriter.writeCharacters("bl");
// In a normal test we'd assert true here.
// Here we assert false to verify that the behavior we're looking for has NOT occurred.
Assert.assertFalse(addsUpToBlahBlah.hasMatchedEntirely());
}
}