I am invoking a DB2 stored procedure, created using COBOL, from my Java application.
input macro (type varchar):
01 SP1-INPUTS.
05 FIELD-1 PIC X(03).
05 FIELD-2 PIC S9(09) COMP.
05 FIELD-3 PIC S9(15)V9(02) COMP-3.
05 FIELD-3X REDEFINES FIELD-3 PIC X(09)
To test the stored procedure, I know only the value for FIELD-1. For the other fields, how many zeros should I put to fill the packed portions? Please see the code I wrote; I am confused about which dummy values to pass.
String field1="abc";
String field2="000000000"; // 9 zeroes, correct?
String field3="00...0"; // should I give 18 zeroes or 9 zeroes?
How many characters in total for the input macro?
COBOL does not have strings; it has fixed-length fields with no terminators.
So for FIELD-1 you have three "characters". A PICture of X can allow any of the 256 possible bit-values, but would typically contain human-readable values.
FIELD-2 is a binary field. You could consider it to be a four-byte integer.
However, the way it is defined there, COMP with an S, it has a maximum value of positive 999,999,999 and a minimum value of negative 999,999,999.
If it were defined as COMP-5, it could contain the full range for all bits in those four bytes. Note that compiler option TRUNC(BIN) would modify the behaviour of COMP to COMP-5 anyway, so you need to check with the Mainframe side what compile option is used for TRUNC (other values are STD and OPT).
On an IBM Mainframe a native binary field is Big Endian. On your local machine a native binary value would be Little Endian. For instance, a value of 16909060 would be stored as X'01020304'.
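To see the Big Endian layout from Java, a quick sketch (plain JDK; the buffer is just for demonstration):
// ByteBuffer and ByteOrder are from java.nio
ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
buf.putInt(16909060); // X'01020304'
// buf.array() now holds { 0x01, 0x02, 0x03, 0x04 } - the mainframe layout.
// With ByteOrder.LITTLE_ENDIAN the same value is stored as { 0x04, 0x03, 0x02, 0x01 }.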
FIELD-3 is packed-decimal. It is nine bytes long, but each byte contains two decimal digits, except for the low-order (right-most) byte which contains one digit followed by a sign specifier (the PICture uses S, so the sign should be C for positive and D for negative, base-16). There is an implied decimal point (the V) and two decimal places.
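If you are stuck receiving the packed field as-is, decoding it in Java is mechanical. A minimal sketch, assuming the nine raw bytes are already in a byte array (the method name and parameters are mine, not from any library):
// BigDecimal and BigInteger are from java.math
static BigDecimal unpackComp3(byte[] comp3, int decimalPlaces) {
    StringBuilder digits = new StringBuilder();
    for (int i = 0; i < comp3.length; i++) {
        digits.append((comp3[i] >> 4) & 0x0F);  // high nibble: always a digit
        if (i < comp3.length - 1) {
            digits.append(comp3[i] & 0x0F);     // low nibble: a digit...
        }
    }
    int sign = comp3[comp3.length - 1] & 0x0F;  // ...except the last, which is the sign
    BigDecimal value = new BigDecimal(new BigInteger(digits.toString()))
            .movePointLeft(decimalPlaces);
    return sign == 0x0D ? value.negate() : value; // D is negative, C positive
}
For FIELD-3 you would call unpackComp3(nineBytes, 2), because of the V9(02).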
FIELD-3X is another X field. It could again contain any bit-pattern in each of the bytes. You'd need to know what the intended use of that field was (there's not even the slightest clue from the names, as with the others, which is a bad way to do things).
The design is wrong. It is completely nuts to send binary and packed-decimal fields from the Mainframe to somewhere else, or vice versa. If all the fields were defined as X or 9 without the USAGE (implied) of COMP or COMP-3, then you'd have a very easy ride.
There's a required full-stop/period missing from the final definition, but that's likely a paste error.
01 SP1J-INPUTS.
05 a-meaningful-name PIC X(03).
05 another-meaningful-name PIC S9(09)
SIGN LEADING SEPARATE.
05 a-third-meaningful-name PIC +9(15).9(02).
05 yet-another-meaningful-name
REDEFINES a-third-meaningful-name
PIC X(19).
This shows two ways to deal with the sign, either with the SIGN clause, or by using a numeric-edited definition. The . in the numeric-edited definition is an actual decimal-point, rather than an implied one, as with the V.
All the data is now "text" or "character" data, which should be easy for you to deal with. EBCDIC (IBM Mainframe encoding) to ASCII (your local machine encoding, likely) is easy and can be done at data level, not field level.
For your to-and-fro communication, the above would be much easier for you, much less error-prone and more easily auditable. Internally the COBOL program can easily convert to/from those for its own internal use.
If you don't get them to change your interface to "character" then you'll have all sorts of extra coding and testing to do, for no advantage.
An example of that layout with ABC, 123456789 (negative) and 123456789012345.67 (positive) would be
ABC-123456789+123456789012345.67
Note that in addition to no field-delimiters, there are no data/record delimiters either. No "null"s.
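To show how simply that record comes apart, a hedged Java sketch slicing it by position (assuming the bytes have already been translated from EBCDIC and wrapped in a String):
String record = "ABC-123456789+123456789012345.67";
String field1 = record.substring(0, 3);                       // "ABC"
int field2 = Integer.parseInt(record.substring(3, 13));       // -123456789
BigDecimal field3 = new BigDecimal(record.substring(13, 32)); // 123456789012345.67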
There is an alternative to the actual decimal-point, which would be to provide a scaling factor. Further, you could "hard-code" the scaling in your program.
I assume the above data would be easy for you to both accept and create. Please try to get your interface changed. If they refuse, document with your boss the impact in extra code, and that the extra code is just to be able to "understand" the data before you even get to think about using it. Which is silly.
To create the easy-format data for you, the COBOL program needs to do this:
MOVE FIELD-1 TO a-meaningful-name
MOVE FIELD-2 TO another-meaningful-name
MOVE FIELD-3 TO a-third-meaningful-name
To receive the easy-format data from you, the COBOL program needs to do this:
MOVE a-meaningful-name TO FIELD-1
MOVE another-meaningful-name TO FIELD-2
MOVE a-third-meaningful-name TO FIELD-3
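And on the Java side, building that character record to send is a one-liner; a sketch using the example values from above:
String record = String.format("%-3s%+010d%+019.2f",
        "ABC", -123456789, new BigDecimal("123456789012345.67"));
// -> "ABC-123456789+123456789012345.67"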
If the REDEFINES has a purpose, it would require specific code for the fourth field; what that is, is difficult for me to guess, but not difficult to code once the actual need is known.
Nothing onerous, and vastly simpler than what you have to code otherwise.
Let’s say I have a huge set of strings that I want to write into a file as efficiently as possible. I don’t care if it’s not human readable.
The first thing that came to my mind was to write the strings as raw bytes to a binary file. I tried using DataOutputStream and writing the byte array. However, when I open my file, it is readable.
How does this work? Does it actually write binary under the hood and only my text editor is making it readable?
Is this the most efficient way to do this?
I’d use this for a project where performance is key so I’m looking for the fastest way to write to a file (no need to be human readable).
Thanks in advance.
Files are just a sack of bytes. They always are, even a txt file.
Characters don't really exist, in the sense that computers don't know what they are, not really. They just know numbers.
So, how does that work?
Welcome to the wonderful world of text encoding.
The job of storing, say, the string "Hello!" in a file requires converting the notion of H, e, l, l, o, and ! into bytes first, and then writing those bytes into a file.
In order to do that, you first need a lookup table; one that translates characters into numbers. Then, we have to convert those numbers to bytes, and then we can save them to a file.
A common encoding is US-ASCII. US-ASCII contains only 95 printable characters in its mapping: the 26 letters of the English alphabet in both lower and uppercase form, all digits, a few useful symbols such as !@#$%^&*( and space. That's it. US-ASCII simply has no 'mapping' for e.g. é or ☃ or even 😊.
All these characters are mapped to a number between 32 and 126, and so to put this in a text file, just write that number, as bytes can represent anything between 0 and 255, so it 'just fits' (in fact, the high bit is always 0).
But, it's 2021, and we have emoji, and we figured out a while ago that, as it turns out, there are languages out there that aren't English. Amazing, that.
So, the commonly used table is the Unicode table. This table represents a liiiiitle more than 95 characters. Nono, this table has a whopping, at time of writing, 143,859 characters in its table. Holy moly batman, that's a ton.
Clearly, the numbers that these 143,859 glyphs are mapped to must, at the very least, fall between 0 and 143,859 (it's actually a larger number range; there are gaps for convenience and to leave room for future updates).
You could just state that each number is an int (between 0 and 2^31; 4 bytes total), and store each character as an int (so, Hello! would turn into a file on disk that is 24 bytes large).
But, a more common encoding is UTF-8. UTF-8 has the property that it stores ASCII-compatible characters as, well, ASCII, because those 95 numbers have the same 'number translation' in Unicode as they do in ASCII, AND UTF-8 stores those numbers as just that byte. UTF-8 stores each character in 1, 2, 3, or 4 bytes, depending on what character it is. It's a 'variable length encoding scheme'.
You can look up UTF_8 on e.g. wikipedia if you want to know the deal.
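As a quick sanity check of that variable length, you can ask Java directly (the sample characters are arbitrary):
// StandardCharsets is from java.nio.charset
System.out.println("A".getBytes(StandardCharsets.UTF_8).length);  // 1 - ASCII range
System.out.println("é".getBytes(StandardCharsets.UTF_8).length);  // 2
System.out.println("☃".getBytes(StandardCharsets.UTF_8).length);  // 3
System.out.println("😊".getBytes(StandardCharsets.UTF_8).length); // 4 - outside the BMP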
For English-esque text, UTF-8 is extremely efficient, and no worse than ASCII (so there is no reason to use ASCII). You can do this in Java, quite easily:
// Path, Files etc are from java.nio.file
Path p = Paths.get("mytextfile.txt");
Files.writeString(p, "Hello!");
That's all you need; the Files API defaults to UTF_8 (be aware that the old and mostly obsolete APIs such as FileWriter don't, and you should ALWAYS specify charset encoding for those! Or better yet, just don't use em and use java.nio.file instead).
Note that you can shove a unicode snowman or even emoji in there, it'll save fine.
There is no 'binary' variant. The file is bytes. If you open it in a text editor or run cat thatfile.txt, guess what? cat or your editor is reading in the bytes, taking a wild stab in the dark as to what encoding it might be, looking every decoded value up in the character table, and then farming the work out to the font rendering engine to show the characters again. It's just the editor giving you the benefit of showing the file with bytes:
72, 101, 108, 108, 111, 33
as Hello! because that's a lot easier to read. Open that 'text file' with a hex editor and you'll see that it contains exactly that sequence of numbers I showed (well, in hex, that too is just a rendering convenience).
Still, if you want to store it 'efficiently', the answer is trivial: use a compression algorithm. You can throw that data through e.g. new GZIPOutputStream or use more fancy compressors:
Path p = Paths.get("file.txt.gz");
try (OutputStream out = Files.newOutputStream(p);
     GZIPOutputStream zip = new GZIPOutputStream(out)) { // java.util.zip
    String shakespeare = "type the complete works of shakespeare here";
    zip.write(shakespeare.getBytes(StandardCharsets.UTF_8));
}
You'll find that file.txt.gz will be considerably fewer bytes than the total character count of the combined works of Shakespeare. Voila. Efficiency.
You can futz with your compression algorithm; there are many. Some are optimized for specific purposes, most fall on a tradeoff line between 'speed of compression' and 'efficiency of compression'. Many are configurable (compress better at the cost of running longer vs compress quickly, but it won't be quite as efficient). A few basic compression algorithms are baked into java, and for the fancier ones, well, a few have pure java implementations you can use.
I am trying to make a function in Java to splice one binary input into another at the bit level, as opposed to the byte level. For example, splicing "00000000" into "11111111" at position 3 would produce "1110000000011111".
I have tried looking into the JBBP library for dealing with binary data, but it is almost as though there is a special coding language, other than Java, specific to this library, which you write and then pass as a string to the JBBPParser.prepare() function. I tried looking in the Javadocs for the library, but they only describe what the functions do, not what the commands you can pass as a string are, what those commands do, or what the proper syntax for those commands is.
Can anyone provide a link to the documentation for the commands you can pass as strings to the JBBPParser functions, or provide an alternative way to splice binary data together at an arbitrary bit without relying on binary strings and parseInt, as they are inefficient?
Read about Endianness. Java is big-endian. This means that your example is wrong. Instead of inserting the zero string (00000000) after position 3 in the target byte, your example demonstrates inserting the zero string either after position 6 or before position 5 (see bit numbering below).
The operation you are looking for is called "left shift". Here is a link discussing Bitwise and Bit Shift Operators.
Here is the strategy you want to use (a code sketch follows the bit-numbering note below):
Determine how many bits to shift from the target into the destination. I will call this the initialShiftCount. In your example, the initialShiftCount is 3 bits.
Left shift the initialShiftCount number of bits into the destination.
Left shift all 8 bits from the insert string into the destination.
Left shift the remaining bits from the target into the destination.
Terms:
destination - the location into which you want to place the combination of the bits in the insert string and the target. In your example, this is the variable that ends up holding the string 1110000000011111.
insert string - the bits you want to insert into the target bits. In your example, the insert string is 00000000.
target - the bits into which you want to insert the insert string. In your example, the target is 11111111.
Bit Numbering
In a big-endian system, the bits are numbered as follows: 87654321
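Putting the strategy together, a minimal sketch (the method name and the 8-bit-in/16-bit-out widths are mine, matching the question's example):
static int splice(int target, int insert, int initialShiftCount) {
    int result = target >>> (8 - initialShiftCount); // step 1: top bits of the target
    result = (result << 8) | (insert & 0xFF);        // step 2: all 8 insert-string bits
    int remaining = 8 - initialShiftCount;
    result = (result << remaining)                   // step 3: remaining target bits
            | (target & ((1 << remaining) - 1));
    return result;
}
// splice(0b11111111, 0b00000000, 3) returns 0b1110000000011111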
Alright, so we need to store a list of words and their respective position in a much bigger text. We've been asked if it's more efficient to save the position represented as text or represented as bits (data streams in Java).
I think that a bitwise representation is best, since the text "1024" takes up 4*8=32 bits, while the number itself needs only 11 bits.
The follow-up question is whether the index should be saved in one or two files. Here I thought "perhaps you can't combine text and bitwise representation in one file?" and that this is the reason you'd need two files?
So the question, first and foremost, is: can I store text information (the word) combined with bitwise information (its position) in one file?
Too vague in terms of what's really needed.
If you have up to a few million words + positions, don't even bother thinking about it. Store in whatever format is the simplest to implement; space would only be an issue if you need to send the data over a low-bandwidth network.
Then there is general data compression available: by just wrapping your Input/OutputStreams with a Deflater or GZIP stream (already built into the JRE) you will get reasonably good compression (50% or more for text). That easily beats what you can quickly write yourself. If you need better compression there is XZ for Java (implements LZMA compression), open source.
If you need random access, you're on the wrong track, you will want to carefully design the data layout for the access patterns and storage should be only of tertiary concern.
The number 1024 would take at least 2-4 bytes (so 16-32 bits), as you need to know where the number starts and ends, and so it must have a fixed size. If your positions are very big, like 124058936, you would need to use 4 bytes per number (which would still be better than the 9 bytes of a string representation).
Using binary files you'll need a way to know where the string starts and ends, too. You can do this by storing a byte before it with its length, and reading the string like this:
byte[] arr = new byte[in.readByte()]; // in.readByte()*2 if the string is encoded in 16 bits
in.readFully(arr); // in is a DataInputStream / RandomAccessFile; readFully fills the whole array
String yourString = new String(arr, StandardCharsets.US_ASCII);
The other possibility would be terminating your string with a null character (00), but you would need to create your own implementation for that, as no readers support it by default (AFAIK).
Now, is it really worth storing it as binary data? That really depends on how big your positions are (the strings themselves, separated from their position with a space in the text version, would take the same amount of bytes either way).
My recommendation is that you use the text version, as it will probably be easier to parse and more readable.
About using one or two files: it doesn't really matter. You can combine text and binary in the same file, and it would take the same space (though making it two separate files will always take a bit more space, and it might be messier to edit).
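For completeness, a sketch of that mixed layout (hypothetical file name): a length-prefixed ASCII word followed by a 4-byte big-endian position, matching the reading snippet above:
// DataOutputStream etc are from java.io; StandardCharsets from java.nio.charset
try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream("index.bin")))) {
    byte[] word = "example".getBytes(StandardCharsets.US_ASCII);
    out.writeByte(word.length); // the length prefix read back by in.readByte()
    out.write(word);            // the text part
    out.writeInt(1024);         // the binary part: 4 bytes, big-endian
}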
I understand the structure of a java .class file, but when I want to interpret the raw hex data I get a bit lost.
This is a hex dump of a class file, excluding the header and constant pool.
I understand the header to be the magic number, minor_version and major_version. It seems the next value should be the access flags.
Which value would that be in this chart? 000000b0? I thought it would be a simple number not a hex value.
Which value is this_class, the index into the constant pool where the class details can be determined?
The 000000b0 is not part of the data. It's the offset (in hex) within the file at which the following 16 bytes are located.
The two-digit hex numbers are the actual data. Read them from left to right. Each row is in two groups of eight, purely to assist in working out offsets etc.
So to answer your question indirectly, you can work out where the access flags are by simply counting past the number of bytes used by the magic number, minor version and major version. The access flags will come next. Likewise, to find any other values (such as this_class), you have to work out what their offset is and look at that location in the data.
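One way to double-check your counting is to read the same fixed-size fields in Java. A hedged sketch (file name made up) using DataInputStream, which, like the class file format, is big-endian:
// DataInputStream and FileInputStream are from java.io
try (DataInputStream in = new DataInputStream(new FileInputStream("Foo.class"))) {
    int magic = in.readInt();             // 4 bytes: 0xCAFEBABE
    int minor = in.readUnsignedShort();   // 2 bytes: minor_version
    int major = in.readUnsignedShort();   // 2 bytes: major_version
    int cpCount = in.readUnsignedShort(); // 2 bytes: constant_pool_count
    // The constant pool that follows is variable-length; only after skipping
    // it do you reach access_flags and this_class, each a 2-byte value.
}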
You say that you expected a "simple number, not a hex value", but that doesn't really make sense, as hex values are simple numbers. They're simply represented in base-16 instead of base-10. There are plenty of resources online that will teach you how to convert between the two.
I'm trying to implement the Chord protocol in order to quickly look up some nodes and keys in a small network. What I can't figure out is... Chord considers the nodes and keys as being placed on a circle, with their placement dictated by the hash values obtained by applying the SHA-1 hash function. How exactly do I operate with those values? Do I make them a string, like de9f2c7f d25e1b3a fad3e85a 0bd17d9b 100db4b3, and then compare them as such, considering that "a" < "b" is true? Or how? How do I know if a key is before or after another?
Since the keyspace is a ring, a single value can't be said to be greater than another, because if you go the other way around the ring, the opposite is true. You can say a value is within a range or not. In the Chord DHT, each server is responsible for the keys within the range of values between it and its predecessor.
I would advise against using strings for the hash values. You shouldn't use the hashCode function for distributed systems, but you do need to do math on the hash keys when adding new nodes. You could try converting the hashes into BigIntegers instead.
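A minimal sketch of that approach (the method names are mine): hash to a non-negative BigInteger, then do the ring-range test with compareTo:
// BigInteger from java.math; MessageDigest from java.security
static BigInteger sha1AsNumber(String key) throws Exception {
    byte[] digest = MessageDigest.getInstance("SHA-1")
            .digest(key.getBytes(StandardCharsets.UTF_8));
    return new BigInteger(1, digest); // signum 1: treat all 160 bits as non-negative
}

// Is 'id' in the half-open ring interval (from, to]? Handles wrap-around.
static boolean inRange(BigInteger id, BigInteger from, BigInteger to) {
    if (from.compareTo(to) < 0) {
        return id.compareTo(from) > 0 && id.compareTo(to) <= 0;
    }
    return id.compareTo(from) > 0 || id.compareTo(to) <= 0; // interval wraps past zero
}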
SHA-1 hashes are not strings but very long hex numbers; they are often stored as strings because they would otherwise require a native 160-bit number type. They are built as five 32-bit words which are then often 'strung' together.
Using SHA-1 strings as the numbers they represent is not hard, but requires a library that can handle such large numbers (like BigInteger or bcmath). These libraries work by calculating the numbers within the string one column at a time, from right to left, much like a person using pen and paper to add, multiply, divide, etc. They will typically have functions for doing common math as well as comparisons, and they often take strings as arguments. Also, make sure that you use a big-number function any time you need to go from hex to dec, or else your 160-bit hex number will likely get rounded into a 64-bit dec float or similar and lose most of its accuracy.
More/less-than comparisons are used in Chord to figure ranges, but they do so using modulo arithmetic so that they 'wrap', making ranges such as [64, 2] possible. The actual formula is
finger[k] = find_successor((n + 2^(k-1)) mod 2^160)
where 'n' is the sha1 of a node and 'k' is the finger number.
Remember, 'n' will be hex while 'k' and the mod(2^160) will typically be dec, so this is where your BigInt hex to BigInt dec conversion will be needed.
Even if your programming framework will let you create these vars as hex, 160 is specifically a dec (literally meaning one hundred and sixty bits) and besides, wrapping your brain around 'mod(2^160)' is already hard enough without visualizing it as hex. Convert 'n' to dec rather than converting 'k' etc. to hex, and then use a BigInt lib to do the math, including comparisons.