& becomes &amp; during FTP to MVS - java

I am using a Java library (edtftpj) to FTP a file from a web app hosted on a Tomcat server to an MVS system.
The FTP transfer mode is ASCII and transfer is done using FTP streams. The data from a String variable is stored into an MVS dataset.
The problem is that all the ampersand characters get converted to &amp;. I have tried various escape characters, including \&, ^& and X'50' (the hex value), but none of them helped.
Does anyone have an idea how to escape the ampersands?

Nothing in the FTP protocol would cause this encoding behavior.
Representing & as &amp; is an XML-style escape. Other systems might use the same scheme, but it is the standard XML encoding.
Something in the reading or writing of the data thinks it should be escaping this character and is applying that encoding.
If anything on the MVS side is using Java, it is probably communicating via SOAP with some other connector, which implies XML and could explain the escape sequence.
Either way, the FTP protocol itself is not the problem: an ASCII transfer should only translate things like line endings, and & is already a valid ASCII character, so it would not be affected. If anything, it is the MVS system doing this escaping.
Binary transfer is preferred in almost every case, since it doesn't do any interpretation or encoding of the raw bytes.

Using FTP in ASCII mode to or from MVS (z/OS) always performs a code page conversion (i.e. ASCII <-> EBCDIC) on the data connection. It is therefore very important to set up the connection with the appropriate parameters for the dataset type and code pages. Example (a Java sketch follows the list):
site SBD=(IBM-037,ISO8859-1)
site TRAck
site RECfm=FB
site LRECL=80
site PRImary=5
site SECondary=5
site BLKsize=6233
site Directory=50
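For reference, a minimal sketch of issuing those SITE parameters from Java before an ASCII-mode upload. It uses Apache Commons Net rather than edtftpj (which has its own mechanism for sending raw commands); the host, credentials and dataset name are placeholders.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

public class MvsAsciiUpload {
    public static void main(String[] args) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("mvs.example.com");          // placeholder host
        ftp.login("user", "password");           // placeholder credentials

        // Tell the server which code pages to translate between on the data connection.
        ftp.sendSiteCommand("SBD=(IBM-037,ISO8859-1)");
        // Allocation attributes for the target dataset (mirrors the SITE list above).
        ftp.sendSiteCommand("RECFM=FB LRECL=80 BLKSIZE=6233 TRACKS PRIMARY=5 SECONDARY=5");

        // ASCII transfer mode, so the server performs the ASCII <-> EBCDIC conversion.
        ftp.setFileType(FTP.ASCII_FILE_TYPE);

        String data = "RECORD WITH AN AMPERSAND & IN IT";
        try (InputStream in = new ByteArrayInputStream(data.getBytes("ISO8859-1"))) {
            ftp.storeFile("'HLQ.MY.DATASET'", in); // placeholder dataset name
        }

        ftp.logout();
        ftp.disconnect();
    }
}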
As an alternative, use BINARY mode and perform the conversion manually with standard tools or libraries on the receiving end.
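If you take the BINARY route, the translation can also be done in Java with the JVM's built-in EBCDIC charsets. A small sketch, assuming the IBM-037 code page used above:

import java.nio.charset.Charset;

public class EbcdicConversion {
    private static final Charset EBCDIC = Charset.forName("IBM037"); // a.k.a. Cp037

    // Convert a Java String to EBCDIC bytes before a binary upload.
    public static byte[] toEbcdic(String text) {
        return text.getBytes(EBCDIC);
    }

    // Convert EBCDIC bytes received from the host back into a String.
    public static String fromEbcdic(byte[] ebcdicBytes) {
        return new String(ebcdicBytes, EBCDIC);
    }
}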
Ref links:
1. Preset commands to tackle codepage problem.
2. Converting ASCII to EBCDIC via FTP on an MVS host.
3. Transferring Files to and from MVS.
4. FTP code page conversion.
5. FTP File Transfer Protocol and Z/OS (pdf).

Related

Non-ascii characters being consumed incorrectly by Apache-camel sftp consumer

I am using a simple camel sftp route such as this:
.(sftp:account#host/some-directory?password=somePassword&charset=utf-8&delay=10000&preMove=.processing&move=.done)
It polls the SFTP server, grabbing files and persisting the data to a database. The files on the server are encoded in UTF-8; here is a sample name with a special character:
María
This character is consumed by the route and saved as:
Mar??????a
Any idea as to why these characters are being consumed incorrectly?
To answer my own question: there was an issue with BeanIODataFormat marshalling the data. The default encoding for BeanIODataFormat was set to ASCII; to solve the issue I had to set the BeanIODataFormat encoding to UTF-8 manually.
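A sketch of that fix in route form. The mapping file, stream name and endpoints are placeholders, and the exact encoding setter may differ between camel-beanio versions, so treat the setEncoding call as an assumption to verify against your version.

import java.nio.charset.Charset;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.beanio.BeanIODataFormat;

public class SftpRoute extends RouteBuilder {
    @Override
    public void configure() {
        BeanIODataFormat beanio = new BeanIODataFormat("beanio-mappings.xml", "fileStream"); // placeholder mapping/stream
        beanio.setEncoding(Charset.forName("UTF-8")); // assumed setter; the default was ASCII

        from("sftp://account@host/some-directory?password=somePassword&charset=utf-8"
                + "&delay=10000&preMove=.processing&move=.done")
            .unmarshal(beanio)
            .to("jpa:com.example.Record"); // placeholder persistence endpoint
    }
}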

Email intercepting techniques

I have a requirement where I intend to build a tool to scan email contents, including attachments. The email server is going to be either SendMail or z/OS Communication Server, both of which support SMTP. The server is not a Microsoft implementation, so MAPI or the Outlook API is not in the picture. The tool would be Java-based and basically needs to look for content that is not permitted based on some rules. What are my options here? There is the possibility of using a proxy server, but we are looking for a more direct approach.
The z/OS Communication Server SMTP implementation has a built-in "exit" capability - see http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/F1A1B4B0/30.3?DT=20110609204120#HDRWQ1299.
The exit is called for just about any SMTP activity and it can examine, change or reject just about anything based on the rules you establish. It is generally written in IBM Assembler Language, but there's no reason you couldn't have a thin assembler layer that passes data to a Java app using whatever protocol you like (say, a pipe or a socket).
There are many little details to handle, such as character encoding (EBCDIC vs. ASCII or UTF-8, for example), plus weeding out attachments from the email content. But using the exit preserves all the z/OS-specific features of IBM's SMTP server without trying to recreate any of that yourself.
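A minimal sketch of the Java side only, assuming the assembler exit forwards each message as line-oriented text over a local TCP socket; the port, code page and rule list are placeholders, and the reply protocol back to the exit is entirely up to you.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

public class MailContentScanner {
    private static final List<String> BANNED = Arrays.asList("secret", "confidential"); // placeholder rules

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9999)) {       // placeholder port
            while (true) {
                try (Socket exit = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             exit.getInputStream(), Charset.forName("IBM037")))) { // if the exit sends EBCDIC
                    String line;
                    boolean reject = false;
                    while ((line = in.readLine()) != null) {
                        for (String banned : BANNED) {
                            if (line.toLowerCase().contains(banned)) {
                                reject = true;
                            }
                        }
                    }
                    // Tell the exit whether to accept or reject the message.
                    exit.getOutputStream().write((reject ? "REJECT\n" : "ACCEPT\n").getBytes("US-ASCII"));
                }
            }
        }
    }
}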
Good luck!

Does a file uploaded using JSP need to encode before upload?

If a binary file is uploaded using JSP, the binary data may contain bytes that have special meaning to some network devices and could cause problems when passing through them. If I upload a file such as an image, do I need to encode the file with Base64 or some other encoding?
If you are using a form in JSP, i.e.
<form enctype="multipart/form-data">
then there is no need for encoding; the file will be sent to the server as a multipart request.
Beyond that, it depends on what technique you are using to upload your file.
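For completeness, a minimal Servlet 3.x sketch of the receiving side, assuming the form above posts to /upload with an input named "file" (both names are illustrative):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

@WebServlet("/upload")
@MultipartConfig(location = "/tmp")  // uploaded parts are written relative to this location
public class UploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Part filePart = req.getPart("file");              // raw bytes arrive unencoded in the multipart body
        filePart.write(filePart.getSubmittedFileName());  // getSubmittedFileName requires Servlet 3.1+
    }
}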
There is no need to encode the file. When you send data using a network protocol, for example TCP, the data is enclosed in a protocol envelope. The envelope fields may be used by network hardware; fields like the IP address can be examined, for example. But your data payload is not interpreted, and therefore cannot have any special meaning to routers, gateways, etc.

How should I handle LIST command in my FTP server?

I'm writing an FTP server in Java, and now I want to answer the LIST command. Sending only file names is enough; I don't need to send file size, owner, permissions, etc. It seems that just sending some strings as file names does not satisfy the client (I tried both ASCII and binary formats). How can I find out what an FTP client expects as a reply?
I'm testing my server with FireFTP and FileZilla.
If you want to create a compatible FTP server, you need to handle LIST and NLST (the standard commands) and also the MLST and MLSD extension commands.
The format of the LIST reply is not defined anywhere, and there are about 400 formats encountered in the wild. Using the Unix ls format or the Windows DIR format will work with most clients, as these formats are widespread and well supported.
NLST is a list of file names only.
MLST and MLSD use a machine-parseable format (that is what the M stands for), described in RFC 3659. It is easier for clients to handle, and its support is very welcome.
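As an illustration, a sketch of emitting one Unix "ls -l"-style line per file for the LIST reply; the owner and group are dummies, and since most clients parse whitespace-separated fields, the exact column widths are not critical. Each line is sent over the data connection terminated with \r\n.

import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ListFormatter {
    private static final SimpleDateFormat DATE =
            new SimpleDateFormat("MMM dd HH:mm", Locale.ENGLISH);

    // Produce one line of a Unix-style LIST reply for a local file.
    public static String formatLine(File f) {
        String perms = (f.isDirectory() ? "d" : "-") + "rw-r--r--";
        return String.format("%s %3d %-8s %-8s %10d %s %s",
                perms, 1, "ftp", "ftp", f.length(),
                DATE.format(new Date(f.lastModified())), f.getName());
    }
}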
The canonical place to look is the relevant RFC: http://www.ietf.org/rfc/rfc959.txt
Unfortunately, in this particular instance the RFC is pretty vague:
Since the information on a file may vary widely from system
to system, this information may be hard to use automatically
in a program, but may be quite useful to a human user.
In order to ensure compatibility with existing FTP clients, your best bet is to look at some widely-deployed FTP server software and emulate the format of its output.

Handling Character Encoding in URI on Tomcat

On the web site I am trying to help with, a user can type a URL into the browser containing, for example, the following Chinese characters:
http://localhost:8080?a=测试
On server, we get
GET /a=%E6%B5%8B%E8%AF%95 HTTP/1.1
As you can see, it's UTF-8 encoded, then URL encoded. We can handle this correctly by setting encoding to UTF-8 in Tomcat.
However, on certain browsers we sometimes get Latin-1 encoding:
http://localhost:8080?a=ß
turns into
GET /a=%DF HTTP/1.1
Is there any way to handle this correctly in Tomcat? It looks like the server has to do some intelligent guessing. We don't expect to handle Latin-1 correctly 100% of the time, but anything is better than what we are doing now, which is assuming everything is UTF-8.
The server is Tomcat 5.5. The supported browsers are IE 6+, Firefox 2+ and Safari on iPhone.
Unfortunately, UTF-8 encoding is a "should" in the URI specification, which seems to assume that the origin server will generate all URLs in such a way that they will be meaningful to the destination server.
There are a couple of techniques that I would consider; all involve parsing the query string yourself (although you may know better than I whether setting the request encoding affects the query string to parameter mapping or just the body).
First, examine the query string for lone "high bytes": a single byte with the high bit set is never valid UTF-8 on its own, since non-ASCII characters are always encoded as sequences of two or more bytes (the Wikipedia entry has a nice table of valid and invalid bytes); a sketch of this check follows below.
Less reliable would be to look at the "Accept-Charset" header in the request. I don't think this header is required (I haven't checked the HTTP spec to verify), and I know that Firefox, at least, will send a whole list of acceptable values. Picking the first value in the list might work, or it might not.
Finally, have you done any analysis on the logs, to see if a particular user-agent will consistently use this encoding?
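A sketch of the first technique: percent-decode the query-string value to raw bytes yourself (e.g. %DF -> 0xDF), attempt a strict UTF-8 decode, and fall back to Latin-1 if it fails. The class and method names are illustrative.

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class QueryValueDecoder {
    // rawBytes are the percent-decoded bytes of one query parameter value.
    public static String decode(byte[] rawBytes) {
        CharsetDecoder utf8 = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(rawBytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8 -- assume the browser sent ISO-8859-1 (Latin-1).
            return new String(rawBytes, Charset.forName("ISO-8859-1"));
        }
    }
}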
