Reading & Parsing Blockchain DAT files - java

I'm working on some code that reads the DAT files in the Blockchain, and I was trying to use bitcoinj because it seemed fairly straightforward. However, I can't seem to get it to actually read the blocks within the DAT file. I've tried many different versions and have made no significant progress.
I'm feeling like this should be fairly straightforward, and I'm just missing something simple here. To be clear, I'm not trying to write to the Blockchain, just read the DAT files.
Thanks!
Here is a code snippet.
NetworkParameters np = new MainNetParams();
Context c = new Context( np );
Context.getOrCreate(MainNetParams.get());
List<File> blockChainFiles = new ArrayList<>();
blockChainFiles.add( new File( "blk00000.dat" ) );
BlockFileLoader bfl = new BlockFileLoader(np, blockChainFiles);
int blockNum = 0;
// Iterate over the blocks in the dataset.
for (Block block : bfl) {
...
This code produces the following error:
Exception in thread "main" java.lang.IllegalStateException: Context does not match implicit network params: org.bitcoinj.params.MainNetParams#9d1d82f2 vs org.bitcoinj.params.MainNetParams#9d1d82f2
at org.bitcoinj.core.Context.getOrCreate(Context.java:147)
at testBitcoin.main(testBitcoin.java:20)

The block .dat files contain multiple blocks in one file, including orphans, separated by magic numbers.
Please refer to https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
Your code doesn't seem to be looking for the magic numbers or skipping ahead by the lengths specified in the message structure.
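For reference, here is a minimal sketch (without bitcoinj) of what walking those records looks like at the byte level, assuming the mainnet magic bytes F9 BE B4 D9 followed by a little-endian 4-byte length, as described on that wiki page. BlockFileLoader does this scanning for you internally; the file name is the one from the question.
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class RawBlockScan {
    private static final int MAGIC = 0xF9BEB4D9; // mainnet magic as it appears on disk

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream("blk00000.dat"))) {
            int count = 0;
            while (true) {
                int magic;
                try {
                    magic = in.readInt();            // 4 magic bytes, read big-endian
                } catch (EOFException eof) {
                    break;                           // end of file
                }
                if (magic != MAGIC) {
                    break;                           // trailing zero padding or unexpected data
                }
                int length = Integer.reverseBytes(in.readInt()); // 4-byte length, little-endian
                byte[] rawBlock = new byte[length];
                in.readFully(rawBlock);              // one serialized block (may be an orphan)
                count++;
            }
            System.out.println("records found: " + count);
        }
    }
}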

Just get rid of the line the exception complains about, Context.getOrCreate(MainNetParams.get()); it's not needed.
The following slightly altered version of your code worked for me:
List<File> blockChainFiles = new ArrayList<>();
blockChainFiles.add(new File("blk00000.dat"));
MainNetParams params = MainNetParams.get();
Context context = new Context(params);
BlockFileLoader bfl = new BlockFileLoader(params, blockChainFiles);
// Iterate over the blocks in the dataset.
for (Block block : bfl) {
System.out.println(block.getHashAsString());
}

You can use my blockchain parser. It's written in Python and can parse all the data from the blk*.dat files into a simple text view.

Related

Efficient way to convert a stream of Strings into grouped list of strings

I have a function which will receive a Stream<String>. This stream represents the lines in a file (as returned by Files.lines(somePath)). The file itself is actually the concatenation of many files into a single file, something like this:
__HEADER__ # for file 1
data
more data
...
__HEADER__ # file 2 starts here
some more data...
...
I need to convert the stream into multiple physical files on the filesystem.
I've tried the simple approach, something along the lines of:
String allLinesJoined = lineStream.collect(Collectors.joining());
// This solution seems to get stuck on the line above ^
String files[] = allLinesJoined.split("__HEADER__");
for (String fileStr : files)
{
    // This function will write each fileStr to a separate file
    // (filename is determined by contents of fileStr)
    writeToPhysicalFile(fileStr);
}
But the input file is about 300 MB (and could get larger) and this solution seems to get stuck on the first line. Maybe it would complete if I had more memory...?
Is there a better way to do this, if my starting point is a Stream<String>, or should I start making other changes so that this bit of code can just read through the file line by line, without using the streaming API?
(the order of the lines does matter, in the context of these files)
tl;dr
I need to turn one big file represented as Stream<String> into many little files. Each little file begins with __HEADER__ and contains all the lines after it, up to the next __HEADER__. The current library uses streams to provide the file, but is it even worth trying to do this with streams, or will my life be easier if I change the library to offer non-stream functionality?
Collecting everything into one big string kills the whole idea of streams.
Try forEachOrdered():
Stream<String> lineStream = Files.lines(Paths.get("your_file"));
lineStream.forEachOrdered((s) -> {
    if (s.startsWith("__HEADER__")) {
        // create new file
    } else {
        // append to this file
    }
});
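If it helps, here is one way to fill in those two comment blocks: keep the current writer in a one-element array so the lambda (which can only capture effectively final variables) can swap it when a header line appears. The input name and the numbered part_N.txt output names are placeholders; per the question, the real file name would come from the content itself.
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamSplitter {
    public static void main(String[] args) throws IOException {
        // single-element holders so the lambda can swap the current writer
        // and bump the part counter
        BufferedWriter[] current = new BufferedWriter[1];
        int[] part = {0};

        try (Stream<String> lines = Files.lines(Paths.get("big_input.txt"))) {
            lines.forEachOrdered(line -> {
                try {
                    if (line.startsWith("__HEADER__")) {
                        if (current[0] != null) {
                            current[0].close();
                        }
                        // placeholder naming; derive the real name from the content
                        Path out = Paths.get("part_" + (part[0]++) + ".txt");
                        current[0] = Files.newBufferedWriter(out);
                    }
                    if (current[0] != null) {
                        current[0].write(line);
                        current[0].newLine();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } finally {
            if (current[0] != null) {
                current[0].close();
            }
        }
    }
}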

Usefulness of DELETE_ON_CLOSE

There are many examples on the internet showing how to use StandardOpenOption.DELETE_ON_CLOSE, such as this:
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
Other examples similarly use Files.newOutputStream(..., StandardOpenOption.DELETE_ON_CLOSE).
I suspect all of these examples are probably flawed. The purpose of writing a file is that you're going to read it back at some point; otherwise, why bother writing it? But wouldn't DELETE_ON_CLOSE cause the file to be deleted before you have a chance to read it?
If you create a work file (to work with large amounts of data that are too large to keep in memory) then wouldn't you use RandomAccessFile instead, which allows both read and write access? However, RandomAccessFile doesn't give you the option to specify DELETE_ON_CLOSE, as far as I can see.
So can someone show me how DELETE_ON_CLOSE is actually useful?
First of all, I agree with you: in the Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE) example the use of DELETE_ON_CLOSE is meaningless. After a (not so intense) search through the internet, the only example I could find showing that usage was the one you might have got it from (http://softwarecave.org/2014/02/05/create-temporary-files-and-directories-using-java-nio2/).
This option is not intended to be used with Files.write(...) only. The API docs make it quite clear:
This option is primarily intended for use with work files that are used solely by a single instance of the Java virtual machine. This option is not recommended for use when opening files that are open concurrently by other entities.
Sorry I can't give you a short but meaningful example; think of such a file as something like a swap file/partition used by an operating system: the current JVM needs to store data on disc temporarily, and after shutdown that data is of no use anymore. As a practical example, it is similar to a JEE application server which might decide to serialize some entities to disc to free up memory.
edit: Maybe the following (oversimplified) code can be taken as an example to demonstrate the principle. (So please, nobody should start a discussion about how this "data management" could be done differently, that using a fixed temporary filename is bad, and so on.)
in the try-with-resources block you need, for some reason, to externalize data (the reasons are not the subject of this discussion)
you have random read/write access to this externalized data
this externalized data is of use only inside the try-with-resources block
with the StandardOpenOption.DELETE_ON_CLOSE option you don't need to handle the deletion yourself after use; the JVM will take care of it (the limitations and edge cases are described in the API)
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.Random;

public class ExternalizeDataExample {

    static final int RECORD_LENGTH = 20;
    static final String RECORD_FORMAT = "%-" + RECORD_LENGTH + "s";

    // add exception handling, left out only for the example
    public static void main(String[] args) throws Exception {
        EnumSet<StandardOpenOption> options = EnumSet.of(
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.READ,
                StandardOpenOption.DELETE_ON_CLOSE
        );
        Path file = Paths.get("/tmp/external_data.tmp");
        try (SeekableByteChannel sbc = Files.newByteChannel(file, options)) {
            // during your business processing the below two cases might happen
            // several times in random order

            // example of huge datastructure to externalize
            String[] sampleData = {"some", "huge", "datastructure"};
            for (int i = 0; i < sampleData.length; i++) {
                byte[] buffer = String.format(RECORD_FORMAT, sampleData[i])
                        .getBytes();
                ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
                sbc.position(i * RECORD_LENGTH);
                sbc.write(byteBuffer);
            }

            // example of processing which needs the externalized data
            Random random = new Random();
            byte[] buffer = new byte[RECORD_LENGTH];
            ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
            for (int i = 0; i < 10; i++) {
                sbc.position(RECORD_LENGTH * random.nextInt(sampleData.length));
                sbc.read(byteBuffer);
                byteBuffer.flip();
                System.out.printf("loop: %d %s%n", i, new String(buffer));
            }
        }
    }
}
DELETE_ON_CLOSE is intended for working temp files.
If you need to perform some operation whose data must be temporarily stored in a file, but you don't need to use the file outside of the current execution, DELETE_ON_CLOSE is a good solution for that.
An example is when you need to store information that can't be kept in memory, for example because it is too large.
Another example is when you need to store the information temporarily, only need it at a later point, and don't want to occupy memory for it in the meantime.
Imagine also a situation in which a process needs a lot of time to complete. You store information in a file and only use it later (perhaps many minutes or hours afterwards). This guarantees that memory is not used for that information while you don't need it.
DELETE_ON_CLOSE tries to delete the file when you explicitly close it by calling close(), or when the JVM shuts down if it wasn't closed manually before.
Here are two possible ways it can be used:
1. When calling Files.newByteChannel
This method returns a SeekableByteChannel suitable for both reading and writing, in which the current position can be modified.
Seems quite useful for situations where some data needs to be stored out of memory for read/write access and doesn't need to be persisted after the application closes.
2. Write to a file, read back, delete:
An example using an arbitrary text file:
Path p = Paths.get("C:\\test", "foo.txt");
System.out.println(Files.exists(p));
try {
    Files.createFile(p);
    System.out.println(Files.exists(p));

    try (BufferedWriter out = Files.newBufferedWriter(p, Charset.defaultCharset(), StandardOpenOption.DELETE_ON_CLOSE)) {
        out.append("Hello, World!");
        out.flush();

        try (BufferedReader in = Files.newBufferedReader(p, Charset.defaultCharset())) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
System.out.println(Files.exists(p));
This outputs (as expected):
false
true
Hello, World!
false
This example is obviously trivial, but I imagine there are plenty of situations where such an approach may come in handy.
However, I still believe the old File.deleteOnExit method may be preferable, as you won't need to keep the output stream open for the duration of any read operations on the file.
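For comparison, a minimal sketch of the File.deleteOnExit route mentioned above (the file prefix and content are arbitrary): the file can be written, closed, and reopened for reading as often as needed, and is only removed when the JVM exits normally.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class DeleteOnExitSketch {
    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("work", ".txt");
        temp.deleteOnExit(); // deleted when the JVM terminates normally

        Path p = temp.toPath();
        Files.write(p, Arrays.asList("Hello, World!")); // write and close immediately
        System.out.println(Files.readAllLines(p));      // reopen and read back later
        // no explicit cleanup needed; the JVM removes the file on exit
    }
}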

Create a text file if it doesn't exist and append to it if it does using Java BufferedWriter

This is probably ridiculously simple for gun Java programmers, yet the fact that I (a relative newbie to Java) couldn't find a simple, straightforward example of how to do it means that I'm going to use the self-answer option to hopefully prevent others going through similar frustration.
I needed to output error information to a simple text file. These actions would be infrequent and small (and sometimes not needed at all) so there is no point keeping a stream open for the file; the file is opened, written to and closed in the one action.
Unlike other "append" questions that I've come across, this one requires that the file be created on the first call to the method in that run of the Java application. The file will not exist before that.
The original code was:
Path pathOfLog = Paths.get(gsOutputPathUsed + gsOutputFileName);
Charset charSetOfLog = Charset.forName("US-ASCII");
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog);
bwOfLog.append(stringToWrite, 0, stringToWrite.length());
iReturn = stringToWrite.length();
bwOfLog.newLine();
bwOfLog.close();
The variables starting with gs are pre-populated string variables showing the output location, and stringToWrite is an argument which is passed in.
So the .append method should be enough to show that I wanted to append content, right?
But it isn't; each time the procedure was called the file was left containing only the string of the most recent call.
The answer is that you also need to specify open options when calling the newBufferedWriter method. What gets you are the default options, as specified in the documentation:
If no options are present then this method works as if the CREATE,
TRUNCATE_EXISTING, and WRITE options are present.
Specifically, it's TRUNCATE_EXISTING that causes the problem:
If the file already exists and it is opened for WRITE access, then its
length is truncated to 0.
The solution, then, is to change
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog);
to
bwOfLog = Files.newBufferedWriter(pathOfLog, charSetOfLog, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
Probably obvious to long time Java coders, less so to new ones. Hopefully this will help someone avoid a bit of head banging.
You can also try this:
Path path = Paths.get("C:\\Users", "textfile.txt");
String text = "\nHello how are you ?";
try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8, StandardOpenOption.APPEND, StandardOpenOption.CREATE)) {
    writer.write(text);
} catch (IOException e) {
    e.printStackTrace();
}

Read metadata with ExifTool

I'm trying to read an Illustrator file's metadata values using ExifTool. I tried the following.
File[] images = new File("filepath").listFiles();
ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for (File f : images) {
    if (f.toString().contains(".ai")) {
        System.out.println("test " + tool.getImageMeta(f, Tag.DATE_TIME_ORIGINAL));
    }
}
tool.close();
The above code doesn't print any value. I even tried this:
public static final File[] IMAGES = new File("filepath").listFiles();

ExifTool tool = new ExifTool(Feature.STAY_OPEN);
for (File f : IMAGES) {
    System.out.println("\n[" + f.getName() + "]");
    System.out.println(tool.getImageMeta(f, Format.NUMERIC, Tag.values()));
}
This only prints {IMAGE_HEIGHT=2245, IMAGE_WIDTH=5393}. How do I read metadata values using ExifTool? Any advice and reference links are highly appreciated.
With the given API, one of the following applies:
1 - it does not contain the tag you are looking for,
2 - the file itself might not have that tag filled in, or
3 - you might want to roll your own using a more general tag command when calling exiftool.exe.
Look in the source code and find the enum containing all the tags available to the API; that'll show you what you're restricted to. But yeah, you might want to consider making your own class similar to the one you're using (I'm in the midst of doing the same). That way you can store the tags in, say, a Set or HashMap instead of an enum and be much less limited in tag choice. Then all you have to do is write the commands for the tags you want to the process's OutputStream and read the results from its InputStream.
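To illustrate that last point, here is a rough sketch that skips the wrapper library entirely and just shells out to exiftool once per file, assuming the exiftool executable is on the PATH; the tag names and file path here are only examples.
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExifToolRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        File file = new File("sample.ai"); // placeholder path
        // -S prints "TagName: value" lines; pass whichever tags you need
        ProcessBuilder pb = new ProcessBuilder(
                "exiftool", "-S", "-CreateDate", "-CreatorTool", file.getAbsolutePath());
        pb.redirectErrorStream(true);
        Process proc = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // e.g. "CreateDate: 2015:01:01 10:00:00"
            }
        }
        proc.waitFor();
    }
}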

Java - find matching pairs from list

background:
I need to load test a process on a server that I am working with. What I am doing is creating a bunch of files on the client side and uploading them to the server. The server monitors for new files (in the input dir; file names are unique), and once there is a new file it processes it; once done, it creates a response file with the same name but a different extension in the output dir. If the processing fails, it moves the incoming file to the error dir. I am using inotifywait to monitor the changes on the server, which outputs:
10:48:47 /path/to/in/ CREATE ABCD.infile1
10:48:55 /path/to/out/ CREATE ABCD.outfile1
or
10:49:11 /path/to/in/ CREATE ASDF.infile1
10:49:19 /path/to/err/ CREATE ASDF.infile1
problem:
I need to parse the list of all results (planning to implement it in Java) like so: I take the infile and match it with the same file name (found either in ERR or OUT), calculate the time taken, and indicate whether it was a success or not. The idea I have is to create 3 lists (in, out, err) and try to parse them, something like (in pseudo-code):
inList
outList
errList

for item : inList
    if outList.contains(item) parse;
    else if errList.contains(item) parse;
    else error;
question:
Is this efficient? Or is there a better way to approach this situation? You might think that this is code you execute just once, so why the struggle, but I really would like to know how to handle this properly.
The solution with lists is problematic, as you will have to keep them properly synchronized with the state of the drive and always load them. What's more, at some point you will reach the capacity limit for files stored in a single location.
The alternatives you have are to use the I/O API to check path existence, or to introduce an intermediate database where you store your values.
Another approach is a database where you store the keys and the physical paths the files actually have.
If I were you, I would start with the I/O API and design a simple interface that could be replaced in the future if the solution turned out to be inefficient.
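As an illustration of the I/O API route, a minimal sketch; the directory paths and the .infile1/.outfile1 naming are taken from the example in the question.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResultChecker {
    private static final Path OUT_DIR = Paths.get("/path/to/out");
    private static final Path ERR_DIR = Paths.get("/path/to/err");

    // Returns "SUCCESS", "ERROR", or "PENDING" for one input file name,
    // based purely on where its counterpart currently exists.
    static String resultFor(String inFileName) {
        String responseName = inFileName.replace(".infile1", ".outfile1");
        if (Files.exists(OUT_DIR.resolve(responseName))) {
            return "SUCCESS";
        }
        if (Files.exists(ERR_DIR.resolve(inFileName))) {
            return "ERROR";
        }
        return "PENDING";
    }

    public static void main(String[] args) {
        System.out.println(resultFor("ABCD.infile1"));
    }
}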
You can use the "UserDefinedfileAttributeView" concept.
Create your own File attribute, say, "Result" and set its value accordingly for the files in IN dir. If the file is moved to OUT dir, "Result"="Success" and if the file is moved to ERR dir, "Result"="Error"
I tried the below code, hope it helps.
public static void main(String[] args) {
    try {
        Path file = Paths.get("C:\\Users\\rohit\\Desktop\\imp docs\\Steps.txt");
        UserDefinedFileAttributeView userView =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);

        // write the custom attribute
        String attribName = "RESULT";
        String attribValue = "SUCCESS";
        userView.write(attribName, Charset.defaultCharset().encode(attribValue));

        // read all user-defined attributes back
        List<String> attribList = userView.list();
        for (String s : attribList) {
            ByteBuffer buf = ByteBuffer.allocate(userView.size(s));
            userView.read(s, buf);
            buf.flip();
            String value = Charset.defaultCharset().decode(buf).toString();
            if ("SUCCESS".equals(value)) {
                System.out.print(String.format("User defined attribute: %s", s));
                System.out.println(String.format("; value: %s", value));
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
You can do this for every file placed in the IN dir.
