How to get the result of a job in another job in Flink? [closed] - java

Here is the situation. I have two data sources: a message queue and a MySQL table, which can be regarded as a DataStream and a DataSet respectively. I want to start a job based on the DataStream that pulls data from the message queue and performs some calculation. During that calculation, a job based on the DataSet (the MySQL table) is needed, whose OutputFormat should return its result to the DataStream job.
I'm stuck here and need some help.

You cannot mix the DataStream and DataSet APIs in the same job. But there are ways to access MySQL from a streaming job. You can:
query MySQL from a flatmap (see the sketch after this list)
use async I/O to do that more efficiently
stream in the data from MySQL using something like Debezium
Depending on how you want to connect the data from MySQL to your other stream(s), you may want to use a CoFlatMapFunction or a CoProcessFunction.
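For the first option, here is a minimal sketch of querying MySQL from a RichFlatMapFunction inside the streaming job. The JDBC URL, credentials, and the lookup table and its columns are placeholder assumptions, not part of the original question:

```java
// Sketch: enrich streaming records by querying MySQL from a RichFlatMapFunction.
// Table `lookup(id, value)`, the JDBC URL and credentials are hypothetical.
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MySqlEnrichFlatMap extends RichFlatMapFunction<String, String> {
    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // One connection per parallel task instance.
        connection = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
        statement = connection.prepareStatement(
                "SELECT value FROM lookup WHERE id = ?");
    }

    @Override
    public void flatMap(String id, Collector<String> out) throws Exception {
        statement.setString(1, id);
        try (ResultSet rs = statement.executeQuery()) {
            if (rs.next()) {
                // Emit the streaming record enriched with the MySQL value.
                out.collect(id + "," + rs.getString("value"));
            }
        }
    }

    @Override
    public void close() throws Exception {
        if (statement != null) statement.close();
        if (connection != null) connection.close();
    }
}
```

The async I/O variant follows the same shape, but the lookup goes into an AsyncFunction used with AsyncDataStream, so it does not block the task thread while waiting on MySQL.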

Related

Import large csv files spring [closed]

I have not used Spring Batch so far, and I was wondering if this is maybe the time to break the ice.
I have large CSV files with 10k to 30k lines that I need to import into a database. I would also need to do some processing of that data, like checking whether some fields exist in the db (if not, the row should be ignored). I have never used Spring Batch, so I would need to spend some time getting to know it before I could use it, but I also have a really strict deadline for this.
Is it really worth it for this kind of work? It would not be a scheduled job run on a daily, weekly or monthly basis; it would be run as needed, maybe once every few months.
So is it "overkill" to use batch processing, or would it be fine to just iterate line by line with some buffered reader?
If it is a one-off job for "just" 10k-20k lines, Spring Batch is complete overkill and you are better off writing something smaller yourself.
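For comparison, a rough sketch of the plain line-by-line approach with a BufferedReader and JDBC batch inserts. The CSV layout (id,name,categoryId), the table names, the file path, and the connection details are placeholder assumptions:

```java
// Sketch: import a CSV line by line, skip rows whose category is not in the db,
// and insert the rest in batches so memory stays flat.
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CsvImporter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             BufferedReader reader = Files.newBufferedReader(Paths.get("data.csv"));
             PreparedStatement exists = conn.prepareStatement(
                     "SELECT 1 FROM categories WHERE id = ?");
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO items (id, name, category_id) VALUES (?, ?, ?)")) {

            String line;
            int batched = 0;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length < 3) {
                    continue; // malformed line: skip it
                }
                exists.setString(1, cols[2]);
                try (ResultSet rs = exists.executeQuery()) {
                    if (!rs.next()) {
                        continue; // category not in the db: ignore the row
                    }
                }
                insert.setString(1, cols[0]);
                insert.setString(2, cols[1]);
                insert.setString(3, cols[2]);
                insert.addBatch();
                if (++batched % 1000 == 0) {
                    insert.executeBatch(); // flush in chunks
                }
            }
            insert.executeBatch(); // flush the remainder
        }
    }
}
```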

Serialize the map on every iteration [closed]

I have a ConcurrentHashMap which stores ID and timestamp when this ID was updated.
This data needs to be persistent: after a restart, the application should be aware of the state the data was in when shutdown was called.
I am currently serializing the map to a file on shutdown and loading it back when the application restarts.
However, I can foresee that serialization on shutdown would fail when the disk is full, and that would mean data loss, which is unacceptable.
I thought of using a DB to store the data, but that would add network overhead on every update.
The only thing that comes to mind right now is to serialize the map on every update. This would ensure that most of the data is persisted even if the disk fills up or there is an unexpected shutdown.
I am aware that this is a heavy operation and am open to alternative solutions.
Also note, this map may hold over 1200K entries...
Thanks in advance
If your scenario allows some data loss, then a few options are:
1. Periodically save a snapshot of your hashmap, so at most you lose the data from one interval (see the sketch after this list).
2. For a strict scenario, log every action so that you can replay the log and recover the original values. Since the log is append-only and rarely read, it should not be a performance hit; log-based techniques like this are used in systems such as ZooKeeper for metadata storage.
3. Or persist to some kind of db asynchronously, by pushing updates onto a queue and processing them in batches.
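A minimal sketch of option 1 (periodic snapshots), assuming an illustrative 30-second interval and file names. The snapshot is written to a temp file and then moved into place, so a failed write (for example, disk full) does not clobber the last good copy:

```java
// Sketch: periodically serialize a point-in-time copy of the map to disk.
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SnapshotWriter {
    private final ConcurrentHashMap<String, Long> map;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public SnapshotWriter(ConcurrentHashMap<String, Long> map) {
        this.map = map;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(this::snapshot, 30, 30, TimeUnit.SECONDS);
    }

    private void snapshot() {
        Path tmp = Paths.get("snapshot.tmp");
        Path target = Paths.get("snapshot.bin");
        try {
            try (ObjectOutputStream out =
                         new ObjectOutputStream(Files.newOutputStream(tmp))) {
                // Copy first so the serialized view is a consistent point-in-time map.
                out.writeObject(new HashMap<>(map));
            }
            // Replace the old snapshot only after the new one was written successfully.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (Exception e) {
            // Keep the previous snapshot if the write fails (e.g. disk full).
            e.printStackTrace();
        }
    }
}
```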

What is the most efficient way to use database? [closed]

I'm new to using a database, specifically MySQL. I'm creating a web application for class in which you can look up the name of a book and it displays the book's summary. My question is: should I send a query to the database that collects all of the books' data on initialization and put it into a HashMap inside a manager class for lookups, or should I run a query each time to look up a specific book's information?
It depends on the data transport time, I would say. If your average query time multiplied by the number of requests is less than the cost of loading everything into a HashMap up front, use queries. Otherwise, collect everything once and put it into a HashMap.
But if you have thousands of rows, you should use queries, because otherwise you will use too much RAM.
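For example, a small sketch of the query-per-lookup approach with a prepared statement; the books table, its columns, and the connection details are assumptions for illustration:

```java
// Sketch: look up one book's summary per request instead of preloading all rows.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Optional;

public class BookRepository {
    private final String url = "jdbc:mysql://localhost:3306/library";

    public Optional<String> findSummaryByTitle(String title) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT summary FROM books WHERE title = ?")) {
            ps.setString(1, title);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.of(rs.getString("summary"))
                                 : Optional.empty();
            }
        }
    }
}
```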

Efficiency for Pagination in Java [closed]

I am writing pagination for a set of records in Java. I am talking to a service which fetches a result set of 500 records at a time, and I use that service to display at most 50 records at a time with page markers.
Is this really an efficient model, or is there another way to improve the suggested pagination model?
Depends on your data.
If your data is rapidly changing, this definitely isn't a suitable method. I would suggest tweaking the service to return only as many records as the page actually needs.
If the data is static and need not be time dependent, this should work fine. Just fetch the 500 records, put them in a local buffer and display from there. Once that block is exhausted, replenish it (see the sketch below).
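A sketch of that buffered model: pull a block of 500 from the backing service, serve pages of 50 out of the local buffer, and refetch only when a requested page falls outside the current block. RecordService and its fetch(offset, limit) method are hypothetical stand-ins for the actual service:

```java
// Sketch: page over a remote service in blocks of 500, serving 50 at a time.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Paginator<T> {
    public interface RecordService<T> {
        List<T> fetch(int offset, int limit);
    }

    private static final int BLOCK_SIZE = 500;
    private static final int PAGE_SIZE = 50;

    private final RecordService<T> service;
    private List<T> buffer = new ArrayList<>();
    private int blockOffset = 0; // offset of the current buffer in the full result set

    public Paginator(RecordService<T> service) {
        this.service = service;
    }

    /** Returns the records for a 0-based page of 50. */
    public List<T> page(int pageIndex) {
        int start = pageIndex * PAGE_SIZE;
        // Replenish the buffer only if the requested page falls outside it.
        if (buffer.isEmpty() || start < blockOffset
                || start >= blockOffset + buffer.size()) {
            blockOffset = (start / BLOCK_SIZE) * BLOCK_SIZE;
            buffer = service.fetch(blockOffset, BLOCK_SIZE);
        }
        int from = start - blockOffset;
        if (from >= buffer.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + PAGE_SIZE, buffer.size());
        return buffer.subList(from, to);
    }
}
```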

Is it a good idea to store a List of 100000 POJO objects in memory [closed]

For a test data scenario I need to read a file containing 100000 rows, process each row against some condition, and then, based on that condition, output the data in text format.
For this I am planning to store each line of data in a POJO and then collect the POJOs into a List.
My worry is having 100000 POJOs in memory. This is just for a test case.
I think using an InputStream to read the file will be better, since you can still fetch rows one by one. Read one line at a time, apply your condition, and then write the output.
Storing too many objects in a List may cause an OutOfMemoryError.
In any case, it's a bad design to store all 100000 rows as POJOs in memory. Some of the possible solutions are:
Read one row at a time and process it (see the sketch after this list).
Rather than reading from a file one record at a time and processing it in Java, use some scripting language to populate a database table, and then process the records from the table in your Java code.
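A minimal sketch of the first option, streaming the file with Files.lines instead of building a 100000-element List; the condition, the transformation, and the file names are placeholder assumptions:

```java
// Sketch: process a large file row by row without holding it all in memory.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RowByRowProcessor {
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get("input.txt"));
             PrintWriter out = new PrintWriter(
                     Files.newBufferedWriter(Paths.get("output.txt")))) {
            lines.filter(line -> line.contains("ACTIVE")) // placeholder condition
                 .map(String::toUpperCase)                // placeholder transformation
                 .forEach(out::println);                  // one output line per kept row
        }
    }
}
```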
