When I am sending a variable-length file from the mainframe to a UNIX box using Connect:Direct, the file on UNIX has some extra bytes at the beginning that are not in the mainframe file. I tried using different SYSOPTS options but I am still getting those initial bytes. Any idea?
You should look at getting the file copied to a fixed-length record (RECFM=FB) file on the mainframe before doing the transfer. There are a number of mainframe utilities that can do this (e.g. SORT).
If you transfer it as a VB file you should also leave it as an EBCDIC file (the BDW/RDW fields are binary fields and should not be translated to ASCII).
As others have said, it would be useful to have an example of the file.
Following on from NealB's answer: a VB file on the mainframe is stored in this format
<BDW><RDW>Record Data 1
<RDW>Record Data 2
....
<RDW>Record Data n-1
<BDW><RDW>Record Data n
<RDW>Record Data n+1
....
<RDW>Record Data o-1
<BDW><RDW>Record Data o
<RDW>Record Data o+1
....
Where
BDW : Block Descriptor Word, 4 bytes; the first 2 bytes hold the block length in big-endian format (the length includes the 4-byte BDW itself); the last 2 bytes are hex zeros for disk files (tape files can use these 2 bytes).
RDW : Record Descriptor Word, 4 bytes; the first 2 bytes hold the record length in big-endian format (the length includes the 4-byte RDW itself); the last 2 bytes are hex zeros.
So if a block contained three 80-byte records, each RDW would hold 84 (x'0054', the data plus the 4-byte RDW), the BDW would hold 256 (x'0100', three 84-byte records plus the 4-byte BDW), and the block would look like
---BDW--- ---RDW---
0100 0000 0054 0000 80 bytes of data (record 1)
0054 0000 80 bytes of data (record 2)
0054 0000 80 bytes of data (record 3)
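For illustration, a minimal Java sketch of walking those BDW/RDW fields, assuming the file was transferred in binary so the descriptor words are intact and untranslated (the file name vbfile.bin is made up):
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class VbReader {
    public static void main(String[] args) throws IOException {
        // Hypothetical file name; the file must have been transferred in binary
        // so the BDW/RDW bytes are still present and untranslated.
        try (DataInputStream in = new DataInputStream(new FileInputStream("vbfile.bin"))) {
            while (true) {
                int blockLen;
                try {
                    blockLen = in.readUnsignedShort();   // BDW: block length, big-endian
                } catch (EOFException eof) {
                    break;                               // no more blocks
                }
                in.skipBytes(2);                         // BDW: 2 unused (zero) bytes
                int remaining = blockLen - 4;            // block length includes the 4-byte BDW
                while (remaining > 0) {
                    int recLen = in.readUnsignedShort(); // RDW: record length, big-endian
                    in.skipBytes(2);                     // RDW: 2 unused (zero) bytes
                    byte[] data = new byte[recLen - 4];  // record length includes the 4-byte RDW
                    in.readFully(data);                  // data is still EBCDIC at this point
                    remaining -= recLen;
                    System.out.println("record of " + data.length + " bytes");
                }
            }
        }
    }
}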
There may be a UNIX utility for handling mainframe VB files.
There are some VB options for Connect:Direct (NDM); see http://pic.dhe.ibm.com/infocenter/sb2bi/v5r2/index.jsp?topic=%2Fcom.ibm.help.cd_interop_sysopts.doc%2FCDP_UNIXSysopts.html.
Looking at the documentation, you cannot combine the VB options with ASCII translation, so converting the file to fixed-length records (RECFM=FB) on the mainframe may make a lot of sense.
Note: You could try looking at the file with the Record Editor, using the File Wizard (the button to the left of the layout name). The wizard should pick up that it is a mainframe VB file.
Note: While converting the file to fixed-length records on the mainframe would be the best option, the Java project JRecord can read mainframe VB files if need be.
Some extra bytes... how many is "some"?
If there are always 4 bytes, these may be the RDW (Record Descriptor Word) which carries the record length.
I don't know much about Connect:Direct, but from a command-line FTP session on the mainframe you can verify the RDW status using the LOCSTAT command as follows:
Command:
LOCSTAT RDW
Response:
RDW's from VB/VBS files are retained as part of data.
If you see the above message you can drop the RDW's using the following command:
Command:
LOCSITE NORDW
If you are pulling from the mainframe then you can find out whether RDW's are being stripped or not using the FTP command:
QUOTE STAT
You will then see several messages, one of which reports the RDW status:
211-RDWs from variable format datasets are retained as part of the data.
Again, you can fix this with
QUOTE SITE NORDW
after which QUOTE STAT should give you:
211-RDWs from variable format datasets are discarded
Are the extra bytes 0xEF 0xBB 0xBF, 0xFF 0xFE or 0xFE 0xFF? That's a UTF Byte Order Mark (BOM).
If it's UTF-8, ignore it. Strip it, if you like. It's pointless.
If it's UTF-16, then you can use the bytes to determine endianness. If you know the endianness, it's safe to ignore or strip them.
If you control the application generating the files, change it so it stops saving them as UTF. Just save the files as ASCII and the BOMs will go away.
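If you want to check for a BOM programmatically, here is a minimal Java sketch (the file name input.txt is a placeholder); it only inspects the first few bytes and pushes them back when no BOM is found:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class BomCheck {
    public static void main(String[] args) throws IOException {
        try (PushbackInputStream in =
                 new PushbackInputStream(new FileInputStream("input.txt"), 3)) {
            byte[] head = new byte[3];
            int n = in.read(head);
            if (n == 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
                System.out.println("UTF-8 BOM found and skipped");
            } else if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
                System.out.println("UTF-16 big-endian BOM");
                if (n > 2) in.unread(head, 2, n - 2);   // push back the byte read beyond the BOM
            } else if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
                System.out.println("UTF-16 little-endian BOM");
                if (n > 2) in.unread(head, 2, n - 2);
            } else if (n > 0) {
                in.unread(head, 0, n);                  // no BOM: push everything back
            }
            // ... continue reading 'in' from here ...
        }
    }
}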
Related
I'm trying to convert a mainframe fixed-length file from EBCDIC format to ASCII format. Currently I'm reading the file using the JZOS API (ZFile) and converting it field by field. Is it possible to convert without knowing the layout of the file (a.k.a. the COPYBOOK), by just reading the entire bytes of a record or line? If so, how do I handle packed decimals and binary values?
is it possible to convert without knowing the layout of file (aka
COPYBOOK) by just reading entire bytes of a record or line?
No.
Text fields must be converted from EBCDIC to ASCII. Binary and packed decimal fields must not be converted. If you convert binary fields then you will alter their values, it's possible (likely? certain?) you will destroy their values.
Binary fields coming from a mainframe will be big-endian, you may need to convert these to little endian. +10 in a halfword on a mainframe is x'000A' while on a little endian machine it is x'0A00'.
Packed decimal fields may have implied decimal positions. If your file contains x'12345C' that may represent +123.45 or +12,345. The format of the field tells you how to interpret the data.
You cannot do the conversion without knowing the record layout including field formats.
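As a rough illustration of why the layout matters, here is a sketch that converts one record under a made-up layout: a 20-byte EBCDIC text field, a 2-byte big-endian binary field, and a 3-byte packed-decimal field with two implied decimal places (code page Cp037 is an assumption):
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class RecordConverter {
    // Hypothetical layout: 20-byte text, 2-byte binary, 3-byte packed decimal (2 implied decimals)
    static final Charset EBCDIC = Charset.forName("Cp037");

    static void convert(byte[] record) {
        // Text field: translate EBCDIC -> Unicode/ASCII
        String name = new String(record, 0, 20, EBCDIC);

        // Binary field: big-endian on the mainframe; do NOT translate the bytes
        short qty = ByteBuffer.wrap(record, 20, 2).getShort();

        // Packed decimal field: two digits per byte, sign in the low nibble of the last byte
        BigDecimal amount = unpack(record, 22, 3, 2);

        System.out.println(name.trim() + " " + qty + " " + amount);
    }

    static BigDecimal unpack(byte[] rec, int off, int len, int scale) {
        long value = 0;
        for (int i = off; i < off + len - 1; i++) {
            value = value * 100 + ((rec[i] >> 4) & 0x0F) * 10 + (rec[i] & 0x0F);
        }
        int last = rec[off + len - 1];
        value = value * 10 + ((last >> 4) & 0x0F);   // last digit
        int sign = last & 0x0F;                      // C or F = positive, D = negative
        if (sign == 0x0D) value = -value;
        return BigDecimal.valueOf(value, scale);
    }

    public static void main(String[] args) {
        byte[] record = new byte[25];
        byte[] name = "SAMPLE".getBytes(EBCDIC);
        System.arraycopy(name, 0, record, 0, name.length);
        for (int i = name.length; i < 20; i++) record[i] = 0x40;   // EBCDIC spaces
        record[20] = 0x00; record[21] = 0x0A;                      // binary halfword +10
        record[22] = 0x12; record[23] = 0x34; record[24] = 0x5C;   // packed +123.45
        convert(record);                                           // prints: SAMPLE 10 123.45
    }
}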
In my experience, the best way to avoid these difficulties is to preprocess the file on the mainframe, converting all binary and packed decimal fields to text with embedded explicit signs and decimal points. Then the file can safely go through code page (EBCDIC to ASCII in this case) conversion.
Such preprocessing can easily be done with the mainframe SORT utility, which excels at data transformations.
I am seeing something unusual in my zip files.
I have two .txt files and both are zipped through java.util.zip (ZipOutputStream, ZipEntry, ...) in my application and then returned in the response as a downloadable zip file through the browser.
One file's data is a database blob and the other's is a StringBuffer. My blob txt file is 10 MB and my StringBuffer txt file is 15 MB, but when these are zipped the blob txt zip file is larger than the StringBuffer txt zip file, even though it contains the smaller txt file.
Any reason why this might be happening?
The StringBuffer and (as of Java 5) StringBuilder classes store just the buffer for the character data plus the current length (without the additional offset and hash code fields of a String), but that buffer can be larger than the actual number of characters placed in it. Also, a Java char takes up two bytes, even if you are using them to store boring old ASCII values that would fit into a single byte.
Your BLOB -- binary large object -- probably contains data that isn't text and isn't as compressible as text. For example, it could contain an image.
If you don't already know what the blob contains, you can use a hexdump program to look at it.
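If you want to confirm that compressibility, not input size, is the difference, here is a quick sketch that deflates both byte arrays and compares the results (blobBytes and textBytes are placeholders for your own data):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

public class CompressCompare {
    // Deflate a byte array and report the compressed size
    static int compressedSize(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(out)) {
            def.write(input);
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] blobBytes = new byte[0]; // placeholder: bytes from the blob
        byte[] textBytes = new byte[0]; // placeholder: bytes from the StringBuffer text
        System.out.println("blob: " + blobBytes.length + " -> " + compressedSize(blobBytes));
        System.out.println("text: " + textBytes.length + " -> " + compressedSize(textBytes));
    }
}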
The question may be quite vague, so let me expand on it here.
I'm developing an application in which I'll be reading data from a file. I have a FileReader class which opens the file in the following fashion:
currentFileStream = new FileInputStream(currentFile);
fileChannel = currentFileStream.getChannel();
Data is read as follows:
bytesRead = fileChannel.read(buffer); // Data is buffered using a ByteBuffer
I'm processing the data in one of two forms: one is binary and the other is character.
If it is processed as character data, I do an additional step of decoding the ByteBuffer into a CharBuffer:
CoderResult result = decoder.decode(byteBuffer, charBuffer, false);
Now my problem is that I need to reposition the file to some offset during recovery mode, in case of a failure or crash in the application.
For this, I maintain a byteOffset which keeps track of the number of bytes processed in binary mode, and I persist this variable.
If something happens, I reposition the file like this:
fileChannel.position(byteOffset);
which is pretty straightforward.
But if the processing mode is character, I maintain a recordOffset which keeps track of the character position/offset in the file. During recovery I make calls to read() internally until I reach the character offset of the persisted recordOffset + 1.
Is there any way to get the corresponding bytes which were needed to decode the characters? For instance, if recordOffset is 400, its corresponding byteOffset might be 410 or 480 or so (depending on the charset). Then, while repositioning, I could do this:
fileChannel.position(recordOffset); //recordOffset equivalent value in number of bytes
instead of making repeated calls internally in my application.
Another approach I could think of was using InputStreamReader's skip method.
If there is a better approach for this, or if it is possible to get a byte-to-character mapping, please let me know.
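For illustration only, here is a sketch of one way to track the byte offset alongside the character decoding: the input ByteBuffer's position tells you how many bytes each decode() call consumed, so that count can be persisted alongside the record offset (UTF-8 and the names here are assumptions, not part of the original code):
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class OffsetTrackingDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private long bytesConsumedSoFar = 0;   // persist this alongside recordOffset

    /** Decode one buffer-full and return the byte offset reached afterwards. */
    long decodeChunk(ByteBuffer byteBuffer, CharBuffer charBuffer) {
        int before = byteBuffer.position();
        CoderResult result = decoder.decode(byteBuffer, charBuffer, false);
        int consumed = byteBuffer.position() - before;   // bytes actually used by the decoder
        bytesConsumedSoFar += consumed;
        // When a record boundary falls at the end of a decode step, bytesConsumedSoFar
        // can be saved and later passed straight to fileChannel.position(...) on recovery.
        return bytesConsumedSoFar;
    }
}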
Does anyone have experience with reading eVRC (Electronic Vehicle Registration Cards) and APDU commands in Java?
Any example will be useful.
Thanks in advance.
I would strongly suggest you go with the javax.smartcardio libraries. Note that there are some availability issues, such as for 64-bit runtimes and access conditions for 32-bit runtimes in the later Java runtime environments. That said, the APDU and CardTerminal interfaces are pretty neat compared to many other APIs dealing with APDUs.
[UPDATE] About the commands: this seems to be a simple file-based card that does not perform any encryption and employs a proprietary file structure within the specified DF. So the basic operation is: retrieve the ATR, SELECT by AID (now you are in the DF, the root of the application), then select each file using SELECT by File ID, followed by some number of READ BINARY commands.
E.g.
send "00A4040C 0X <AID>" // SELECT DF aid was not given in document, so find this out, probably JRC
send "00A40200 02 D001 00" // SELECT EF.Registration_A (and hopefully parse the response to get the file length)
send "00B00000 00" // READ BINARY return up to 256 bytes or
send "00B00005 XX" // READ BINARY return xx bytes, the number of bytes left, from offset 05
That would be in Java (off the top of my head):
CommandAPDU command = new CommandAPDU(0x00, 0xA4, 0x02, 0x00, new byte[] { (byte) 0xD0, (byte) 0x01 }, 256);
ResponseAPDU response = channel.transmit(command);
Note that you might need to parse the first few bytes of the READ BINARY response to find out the file length in a compatible way. Make sure you don't read past the actual number of bytes left, as you may get basically any error. When looping, only count the number of bytes actually returned, not the (maximum) number requested.
If you are using the smartcard IO libs, you only have to specify the first 4 bytes as the header, then the data (the length of the command data will be calculated for you) and then Ne, the maximum number of bytes you want returned (if applicable).
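As a rough sketch of that READ BINARY loop with javax.smartcardio (the file length is a placeholder you would take from the parsed response or file header, and the EF is assumed to be selected already):
import javax.smartcardio.CardChannel;
import javax.smartcardio.CommandAPDU;
import javax.smartcardio.ResponseAPDU;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class EfReader {
    /** Read 'fileLength' bytes from the currently selected EF using READ BINARY. */
    static byte[] readBinary(CardChannel channel, int fileLength) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        while (offset < fileLength) {
            int toRead = Math.min(fileLength - offset, 256);
            // CLA=00, INS=B0 (READ BINARY), P1/P2 = offset, Ne = bytes requested
            CommandAPDU read = new CommandAPDU(0x00, 0xB0, (offset >> 8) & 0x7F, offset & 0xFF, toRead);
            ResponseAPDU resp = channel.transmit(read);
            if (resp.getSW() != 0x9000) {
                throw new IOException("READ BINARY failed, SW=" + Integer.toHexString(resp.getSW()));
            }
            byte[] data = resp.getData();
            if (data.length == 0) break;   // nothing returned; avoid looping forever
            out.write(data);
            offset += data.length;         // count the bytes actually returned, not the Ne requested
        }
        return out.toByteArray();
    }
}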
The main pain is parsing the underlying BER structure and verifying the signature of course, but I consider that out of scope.
You may like https://github.com/grakic/jevrc
JEvrc is a reusable open-source Java library for reading public data from Serbian/EU eVRC cards. It includes a simplified TLV parser for parsing card data. It supports the Serbian eVRC card profile but should be possible to generalize with a patch or two.
I read the following bit from Oracle:
Can I execute methods on compressed versions of my objects, for example isempty(zip(serial(x)))?
This is not really viable for arbitrary objects because of the encoding of objects. For a particular object (such as String) you can compare the resulting bit streams. The encoding is stable, in that every time the same object is encoded it is encoded to the same set of bits.
So I got this idea: say I have a char array about 4M long. Is it possible for me to compress it to several hundred bytes using GZIPOutputStream, then map the whole file into memory and do a random search on it by comparing bits? Say I am looking for the char sequence "abcd": could I somehow get the bit sequence of the compressed version of "abcd" and then just search the file for it? Thanks.
You cannot use GZIP or similar to do this, as the encoding of each byte changes as the stream is processed; i.e. the only way to determine what a byte means is to read all the preceding bytes.
If you want to access the data randomly, you can break the String into smaller sections. That way you only need to decompress a relatively short section of data.
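A sketch of that idea: compress fixed-size sections independently so any one section can be decompressed on its own (the section size and class name are made up):
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkedText {
    static final int SECTION_CHARS = 64 * 1024;   // arbitrary section size
    private final List<byte[]> sections = new ArrayList<>();

    /** Compress the text in independent sections. */
    ChunkedText(String text) throws IOException {
        for (int i = 0; i < text.length(); i += SECTION_CHARS) {
            String part = text.substring(i, Math.min(text.length(), i + SECTION_CHARS));
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(part.getBytes(StandardCharsets.UTF_8));
            }
            sections.add(buf.toByteArray());
        }
    }

    /** Decompress only the section containing the given character index. */
    String sectionAt(int charIndex) throws IOException {
        byte[] compressed = sections.get(charIndex / SECTION_CHARS);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
Searching then means decompressing and scanning one section at a time, rather than trying to match bits inside the compressed stream.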