I have a folder with thousands of text files with JSON content that I need to read, convert into a POJO and then save into a MySQL database. I intend to use a Spring Batch application.
Here is the issue: so far, the research I have done only covers reading multiple CSV or XML files, not JSON data. Specifically, I need to convert this method for parsing a CSV file into a JSON parser.
@Bean
public FlatFileItemReader<Person> reader() {
    FlatFileItemReader<Person> reader = new FlatFileItemReader<Person>();
    reader.setLineMapper(new DefaultLineMapper<Person>() {{
        setLineTokenizer(new DelimitedLineTokenizer() {{
            setNames(new String[] {"firstname", "lastname", "email", "age"});
        }});
        setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
            setTargetType(Person.class);
        }});
    }});
    return reader;
}
This code parses a JSON file:
JSONParser parser = new JSONParser();
Object obj = parser.parse(new FileReader("C:\\path\\sample.json"));
The method might be something like this:
@Bean
public FileReader<Person> reader() {
    FileReader<Person> reader = new FileReader<Person>();
    /**** need help on what to do here ****/
    return reader;
}
Also, since I am reading all the files in a directory, I am passing the value of that directory in this format:
@Value(value="C:\\path\\*.json")
private Resource[] resources;
So I need help on how to use this value (a directory of files) instead of the single file location I showed earlier:
Object obj = parser.parse(new FileReader("C:\\path\\sample.json"));
You can use the MultiResourceItemReader with a JsonItemReader as delegate. Here is a quick example:
@Bean
public MultiResourceItemReader<Person> jsonItemReader(Resource[] resources) {
    JsonItemReader<Person> delegate = new JsonItemReaderBuilder<Person>()
            .jsonObjectReader(new JacksonJsonObjectReader<>(Person.class))
            .name("personItemReader")
            .build();
    MultiResourceItemReader<Person> reader = new MultiResourceItemReader<>();
    reader.setDelegate(delegate);
    reader.setResources(resources);
    return reader;
}
You can find more details about the JsonItemReader in the reference documentation.
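For the MySQL part of the question, here is a minimal sketch of wiring this reader into a chunk-oriented step. The JdbcBatchItemWriter bean, the table, and the column names are illustrative assumptions, not part of the question:
@Bean
public JdbcBatchItemWriter<Person> personWriter(DataSource dataSource) {
    // Hypothetical writer: adjust the SQL to your actual table and columns.
    return new JdbcBatchItemWriterBuilder<Person>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO person (first_name, last_name, email, age) "
                    + "VALUES (:firstName, :lastName, :email, :age)")
            .dataSource(dataSource)
            .build();
}

@Bean
public Step loadJsonStep(StepBuilderFactory steps,
                         MultiResourceItemReader<Person> jsonItemReader,
                         JdbcBatchItemWriter<Person> personWriter) {
    return steps.get("loadJsonStep")
            .<Person, Person>chunk(100)
            .reader(jsonItemReader)
            .writer(personWriter)
            .build();
}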
Hope this helps.
https://docs.spring.io/spring-batch/4.0.x/reference/html/readersAndWriters.html#multiFileInput - I only found this after I found this answer
I'm new to Spring Integration and I have a project where I want to process the contents of a zip file. The zip contains a number of TIFF files and one XML file. The XML file contains information about how many TIFF files should be in the zip, so I need to compare the number of TIFF files with the info in the XML file.
My process is: poll a directory for a zip file, move the zip file to a "working" directory, unzip the file, find the XML file and read its contents, count the TIFF files, and confirm that the names and number of files match the XML data.
I have all of the steps working except for the stage where I try to read the zip and TIFF files. My problem is that the UnZipTransformer creates an intermediate directory with a name in UUID format below the working directory, and I can't work out how to get the files from this directory.
Directory structure after the unzip has happened:
working/
    0eca3f50-eedb-9ebd-5a3a-4ceb3ad8abbd/
        01.tif
        02.tif
        my.xml
This Flow works. It extracts the contents of the zip file.
@Configuration
public class FileUnzipIntegrationFlow {

    public static final String UNZIP_OUTPUT = "unzipOutputChannel";

    @Value("${unzipped.dir}")
    String unzippedDir;

    @Value("${working.dir}")
    String workingDir;

    @Bean
    public MessageSource<File> unzippedDirectory() {
        FileReadingMessageSource messageSource = new FileReadingMessageSource();
        messageSource.setDirectory(new File(unzippedDir));
        return messageSource;
    }

    @Bean
    public IntegrationFlow fileUnzipper() {
        return IntegrationFlows.from(unzippedDirectory(), c -> c.poller(Pollers.fixedDelay(1000)))
                .filter(source -> ((File) source).getName().endsWith(".zip"))
                .transform(unZipTransformer())
                .log()
                .get();
    }

    @Bean(name = UNZIP_OUTPUT)
    public SubscribableChannel unzipOutputChannel() {
        return MessageChannels.publishSubscribe(UNZIP_OUTPUT)
                .get();
    }

    @Bean
    @Transformer(inputChannel = "input", outputChannel = UNZIP_OUTPUT)
    public UnZipTransformer unZipTransformer() {
        UnZipTransformer unZipTransformer = new UnZipTransformer();
        unZipTransformer.setExpectSingleResult(false);
        unZipTransformer.setZipResultType(ZipResultType.FILE);
        unZipTransformer.setWorkDirectory(new File(workingDir));
        unZipTransformer.setDeleteFiles(false);
        return unZipTransformer;
    }
}
I can't work out how to get to the sub-directory in this Flow
import static com.santanderuk.spring.integration.FileUnzipIntegrationFlow.UNZIP_OUTPUT;

@Configuration
public class XmlVerificationFlow {

    @Value("${working.dir}")
    String workingDir;

    @Bean
    public IntegrationFlow xmlVerifier() {
        return IntegrationFlows.from(workingDirectory(),
                        c -> c.poller(Pollers.fixedRate(1000).maxMessagesPerPoll(1)))
                .filter(source -> ((File) source).getName().toLowerCase().endsWith(".xml"))
                .handle(xmlFileHandler())
                .get();
    }

    @Bean
    public MessageSource<File> workingDirectory() {
        FileReadingMessageSource messageSource = new FileReadingMessageSource();
        messageSource.setDirectory(new File(workingDir));
        return messageSource;
    }
}
The snippet above only works when I manually move the XML file from the sub-directory into the working directory. I can also see the payload value in the logging, which contains the directory name I need, but I have not been able to find out how to access this information:
2022-03-28 19:39:14.395 INFO 7588 --- [ scheduling-1] o.s.integration.handler.LoggingHandler : GenericMessage [payload={10000001.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000001.tif, 10000002.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000002.tif, 10000003.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000003.tif, 10000004.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000004.tif, 10000005.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000005.tif, 10000006.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000006.tif, 10000007.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000007.tif, 10000008.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000008.tif, 10000009.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000009.tif, 10000010.tif=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\10000010.tif, 20220304S092800.XML=processing\working\1cd8f803-2e45-dfe2-1c39-99b0d74f83f0\20220304S092800.XML}, headers={file_originalFile=processing\unzipped\202203040001.zip, id=2835eb9e-ff3b-71bf-7432-4967a1f808f6, file_name=202203040001.zip, file_relativePath=202203040001.zip, timestamp=1648492754392}]
It looks like you are ignoring the power of the UnZipTransformer. It returns something like this:
final SortedMap<String, Object> uncompressedData = new TreeMap<>();
which, in your case, has content like this:
uncompressedData.put(zipEntryName, destinationFile);
So, you don't need that extra flow to poll a working dir when you can work easily with the result of unzipping.
On the other hand, the FileReadingMessageSource can be configured with a RecursiveDirectoryScanner to let it iterate sub-directories recursively. By default it scans only the top directory and ignores all sub-directories.
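Here is a minimal sketch of that second option, assuming Spring Integration 5.x (where RecursiveDirectoryScanner is available) and the workingDirectory() bean from the question:
@Bean
public MessageSource<File> workingDirectory() {
    FileReadingMessageSource messageSource = new FileReadingMessageSource();
    messageSource.setDirectory(new File(workingDir));
    // Recurse into sub-directories (such as the UUID-named one created by the
    // UnZipTransformer); the default scanner only reads the top-level directory.
    messageSource.setScanner(new RecursiveDirectoryScanner());
    return messageSource;
}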
Thanks very much Artem. As suggested, I used the SortedMap that was returned by the UnZipTransformer in a ZipOutputHandler to count the tiff files and record the xml file name.
Updated Flow
@Bean
public IntegrationFlow fileUnzipper() {
    return IntegrationFlows.from(unzippedDirectory(), c -> c.poller(Pollers.fixedDelay(1000)))
            .filter(source -> ((File) source).getName().endsWith(".zip"))
            .transform(unZipTransformer())
            .handle(zipOutputHandler(), "process")
            .log()
            .get();
}
The process method in the handler:
public File process(SortedMap<String, Object> uncompressedData) {
    uncompressedData.forEach((s, o) -> {
        if (s.endsWith("tif")) {
            tiffCount++;
            tiffNames.add(s);
        }
        if (s.endsWith("XML")) {
            xmlName = s;
            xmlFile = new File(o.toString());
            extractedDirectory = xmlFile.getParentFile();
        }
    });
    // The original snippet ends here; returning the extracted directory is assumed.
    return extractedDirectory;
}
I'm trying to build a Spring Boot app that lets me post a JSON object from Postman and save it to an existing JSON file that already contains other data. I'm new to Jackson, so perhaps I missed something?
This is how my JSON file looks:
[
    {
        "Name":"After Dark",
        "Author":"Haruki Murakami"
    },
    {
        "Name":"It",
        "Author":"Stephen King"
    }
]
This is what I have tried:
@PostMapping("/insertBook")
public void insertBook(@RequestBody Book book) throws IOException, URISyntaxException {
    File booksJsonFile = Paths.get(this.getClass().getResource("/books.json").toURI()).toFile();
    objectMapper.writeValue(booksJsonFile, book);
}
It writes to an empty file, but it doesn't append to an existing JSON file.
I also have tried this:
@PostMapping("/insertBook")
public void insertBook(@RequestBody Book book) throws URISyntaxException {
    try {
        File file = Paths.get(this.getClass().getResource("/books.json").toURI()).toFile();
        FileWriter fileWriter = new FileWriter(file, true);
        SequenceWriter seqWriter = objectMapper.writer().writeValuesAsArray(fileWriter);
        seqWriter.write(book);
        seqWriter.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Do I need to use something else to achieve the result that I want?
I will be thankful for your help.
I have tried to reproduce your problem with your code and came to the following conclusions:
You cannot modify a file under resources directly. Here is an explanation why.
I managed to append new JSON to the file (using your approach but saving the file locally), but it's probably not what you expect, because the JSON structure gets corrupted:
[
    {
        "Name":"After Dark",
        "Author":"Haruki Murakami"
    },
    {
        "Name":"It",
        "Author":"Stephen King"
    }
][{"Name":"new name","Author":"new author"}]
I am afraid that it is not possible to update the current JSON structure directly in the file.
I managed to solve your problem using the org.json library. However, the disadvantage of my solution is the necessity of rewriting the entire file each time. In addition, I used the synchronized keyword in order to avoid simultaneous file modification.
public synchronized void updateJsonFile(Book book) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();
    Path path = Paths.get("./books.json");
    final String currentJsonArrayAsString = Files.readString(path);
    try (FileWriter fileWriter = new FileWriter(path.toFile(), false)) {
        JSONObject jsonObject = new JSONObject(objectMapper.writeValueAsString(book));
        JSONArray jsonArray = new JSONArray(currentJsonArrayAsString);
        jsonArray.put(jsonObject);
        fileWriter.write(jsonArray.toString());
    }
}
And now books.json has the following content:
[
    {
        "Author":"Haruki Murakami",
        "Name":"After Dark"
    },
    {
        "Author":"Stephen King",
        "Name":"It"
    },
    {
        "Author":"new author",
        "Name":"new name"
    }
]
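If you would rather avoid the extra org.json dependency, the same rewrite-the-whole-file approach can be sketched with Jackson alone. This is an alternative to the answer above, not its original code, and it assumes the same Book POJO and file location:
public synchronized void updateJsonFile(Book book) throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();
    File file = new File("./books.json");
    // Read the existing array, append the new book, and rewrite the whole file.
    List<Book> books = objectMapper.readValue(file, new TypeReference<List<Book>>() {});
    books.add(book);
    objectMapper.writerWithDefaultPrettyPrinter().writeValue(file, books);
}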
I am saving a player to a .json file like this:
public static void savePlayer(Player player) {
    final String username = player.getUsername();
    final byte[] json = new Gson().toJson(player).getBytes();
    final String path = "pack/players/" + username;
    try {
        Files.write(Paths.get(path + "/data.json"), json, StandardOpenOption.CREATE);
        logger.info("Successfully SAVED player [username=%s]!", username);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
and I am using this Eclipse JSON editor plugin to view the json file: https://sourceforge.net/projects/eclipsejsonedit/
However, when looking at the file, it is all compressed into one line and not beautified.
The resulting single line of JSON is printed like this (instead of being beautified):
{"inventory":{"data":[null]},"equipment":{"data":[null]},"playerCredentials":{"username":"kay","password":"password"},"attribute":{}}
What can I do to beautify the code before saving it?
Since you are using Gson, you can use its pretty printing and create a new Gson object:
Gson gson = new GsonBuilder().setPrettyPrinting().create();
And then create a String from the player object:
String output = gson.toJson(player);
If you use a FileWriter you can simply use that string to write to a file.
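Putting it together, a minimal sketch that reuses the save path from the question:
Gson gson = new GsonBuilder().setPrettyPrinting().create();
String output = gson.toJson(player);
String path = "pack/players/" + player.getUsername() + "/data.json";
try (FileWriter writer = new FileWriter(path)) {
    writer.write(output);
} catch (IOException e) {
    e.printStackTrace();
}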
I want my Spring Batch module to read multiple files at a time. Currently, my batch module can only read one file at a time, although it can read multiple lines from that file. How can I implement this?
<core type="BatchCreate" class="com.batch.BatchCore" concurrency="1" start="20" stop="80" status="true">
You can use MultiResourceItemReader to read multiple files.
Below is the Java configuration for the same:
@Bean
public MultiResourceItemReader<POJO> multiResourceItemReader() throws IOException {
    MultiResourceItemReader<POJO> resourceItemReader = new MultiResourceItemReader<POJO>();
    ClassLoader cl = this.getClass().getClassLoader();
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver(cl);
    // filePath holds the location pattern, e.g. a directory wildcard injected via @Value
    Resource[] resources = resolver.getResources("file:" + filePath);
    resourceItemReader.setResources(resources);
    resourceItemReader.setDelegate(reader());
    return resourceItemReader;
}
@Bean
public FlatFileItemReader<POJO> reader() {
    FlatFileItemReader<POJO> reader = new FlatFileItemReader<POJO>();
    reader.setLineMapper(new DefaultLineMapper<POJO>() {{
        setLineTokenizer(new DelimitedLineTokenizer() {{
            setNames(new String[]{"id1", "POJO"});
        }});
        setFieldSetMapper(new BeanWrapperFieldSetMapper<POJO>() {{
            setTargetType(POJO.class);
        }});
    }});
    return reader;
}
I am using Spring Batch to read from a CSV file and write the lines on the screen.
My job is composed of 3 parts:
Part 1: Verify that the CSV file exists in an INPUT directory on my disk; if it does, the file is moved to another directory called PROD.
Part 2: Extract data from the CSV file using FlatFileItemReader.
Part 3: Write all the items to the screen.
The problem is that the FlatFileItemReader throws org.springframework.batch.item.ItemStreamException: Failed to initialize the reader, caused by java.lang.IllegalArgumentException: Input resource must be set.
Here is my code:
@Bean
public FlatFileItemReader<UniversInvestissement> reader() {
    FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
    File csvFile = new File("C://INPUT/data.csv");
    Resource resource = resourceLoader.getResource("file:" + csvFile.getAbsolutePath());
    reader.setLinesToSkip(1);
    reader.setResource(resource);
    DefaultLineMapper lineMapper = new DefaultLineMapper();
    DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
    tokenizer.setNames(new String[]{"COL1", "COL2", "COL3", "COL4"});
    tokenizer.setDelimiter(";");
    FieldSetMapper fieldSetMapper = new UniversInvestissementFieldSetMapper();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    reader.setLineMapper(lineMapper);
    reader.setEncoding("Cp1252");
    return reader;
}

@Bean
public UniversInvestissementWriter writer() {
    return new UniversInvestissementWriter();
}

@Bean
public UniversInvestissementProcessor processor() {
    return new UniversInvestissementProcessor();
}

@Bean
public Step extractData() {
    return steps.get("extractData")
            .<UniversInvestissement, UniversInvestissementProcessorResult>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
Actually the problem is that when the FlatFileItemReader is initialized it can't find the CSV file as a resource !
Is there a way to postpone the resource assignment and avoid this exception ?
You can use reader.setStrict(false); if you set strict mode to false, the reader will not throw an exception when the input resource does not exist. You might have to use @StepScope to make the reader lazy. I am using the same setup and it's working fine for me. Hope this helps.
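A minimal sketch of that combination, reusing the reader configuration from the question (the line-mapper setup is elided):
@Bean
@StepScope
public FlatFileItemReader<UniversInvestissement> reader() {
    FlatFileItemReader<UniversInvestissement> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("C://INPUT/data.csv"));
    // With strict mode off, open() logs a warning instead of throwing an
    // ItemStreamException when the input resource does not exist yet.
    reader.setStrict(false);
    // ... line mapper, delimiter, encoding and linesToSkip as in the question ...
    return reader;
}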
"Verify if the CSV file exists in some INPUT directory on my disk, if it returns TRUE the file will be moved to another directory called PROD"
This problem can easily be solved using a JobExecutionDecider:
public class Checker implements JobExecutionDecider {
    public FlowExecutionStatus decide(JobExecution jobExecution, StepExecution stepExecution) {
        if (<file not found in INPUT/ dir>) {
            return FlowExecutionStatus.STOPPED;
        }
        if (!<copy file from INPUT/ to PROD/ works>) {
            return FlowExecutionStatus.FAILED;
        }
        return FlowExecutionStatus.COMPLETED;
    }
}
Of course, extractData() must be changed to make use of a programmatic flow decision (check here for a simple example).
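A minimal sketch of such a flow definition; the job and flow names are illustrative, and jobs is assumed to be the usual JobBuilderFactory:
@Bean
public Job importJob() {
    Checker checker = new Checker();
    // Route the job based on the decider's status before running extractData().
    Flow flow = new FlowBuilder<Flow>("checkThenExtract")
            .start(checker)
            .on("COMPLETED").to(extractData())
            .from(checker).on("STOPPED").end()
            .from(checker).on("FAILED").fail()
            .build();
    return jobs.get("importJob")
            .start(flow)
            .end()
            .build();
}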
I think the problem is in your resourceLoader, because such an exception is thrown by a non-null assertion on the resource instance, so your resourceLoader is returning a null value.
Try to use a FileSystemResource without any resource loader. For example:
reader.setResource(new FileSystemResource(csvFile));