Hercules Shared Device Server

Overview
Technical BS
Caching
Compression
To Do

Overview

Shared device support allows multiple Hercules instances to share devices. The device will be 'local' to one instance and 'remote' to all other instances. The local instance is the 'server' for that device and the remote instance is the 'client'. You do not have to IPL an operating system on the device server. Any number of Hercules instances can act as a server in a "Hercplex".

To use a device on a remote system, instead of specifying a file name on the device config statement, you specify

ip_address_or_name:port:devnum

For example:

    0100  3350  localhost:3990:0100

which says there is a device server on the local host listening on port 3990 and we want to use its 0100 device as 0100. The default port is 3990 and the default remote device number is the local device number. So we could say:

    0100  3350  localhost

instead, providing we don't actually have a file 'localhost'. Interestingly, the instance on the local host listening on 3990 could have a statement:

    0100  3350  192.168.200.1::0200

which means that instance in turn will use device 0200 on the server at 192.168.200.1 listening on port 3990. The original instance will have to 'hop' thru the second instance to get to the real device.

Device sharing can be 'split' between multiple instances. For example, suppose instance A has:

    SHRDPORT 3990
    0100  3350  localhost:3991
    0101  3350  mvscat

and instance B has:

    SHRDPORT 3991
    0100  3350  mvsres
    0101  3350  localhost

Then each instance acts as both a client and as a server.

When 'SHRDPORT' is specified, thread 'shared_server' is started at the end of Hercules initialization. In the example above, neither Hercules instance can initialize their devices until the server is started on each system. In this case, the device trying to access a server gets the 'connecting' bit set on in the DEVBLK and the device still needs to initialize. After the shared server is started, a thread is attached for each device that is connecting to complete the connection (which is the device init handler).

Technical BS

There are (at least) two approaches to sharing devices. One is to execute the channel program on the server system. The server will need to request from the client system information such as the ccw and the data to be written, and will need to send to the client data that has been read and status information. The second is to execute the channel program on the client system. Here the client system makes requests to the server system to read and write data.

The second approach is currently implemented. The first approach arguably emulates 'more correctly'. However, an advantage of the implemented approach is that it is easier because only the client sends requests and only the server sends responses.

Both client and server have a DEVBLK structure for the device. Absurdly, perhaps, in originally designing an implementation for shared devices it was not clear what type of process should be the server. It was a quantum leap forward to realize that it could just be another hercules instance.

Protocol

(If this section is as boring for you to read as it was for me to write then please skip to the next section ;-)

The client sends an 8 byte request header and maybe some data:

    +-----+-----+-----+-----+-----+-----+-----+-----+
    | cmd |flag |  devnum   |    id     |   length  |
    +-----+-----+-----+-----+-----+-----+-----+-----+

                <-------- length --------->
                +----- .  .  .  .  . -----+
                |          data           |
                +----- .  .  .  .  . -----+

'cmd' identifies the client request. The requests are:

0xe0 CONNECT: Connect to the server. This requires the server to allocate resources to support the connection. Typically issued during device initialization or after being disconnected after a network error or timeout.
0xe1 DISCONNECT: Disconnect from the server. The server can now release the allocated resources for the connection. Typically issued during device close or detach.
0xe2 START: Start a channel program on the device. If the device is busy or reserved by another system then wait until the device is available unless the NOWAIT flag bit is set, then return a BUSY code. Once START succeeds then the device is unavailable until the END request.
0xe3 END: Channel program has ended. Any waiters for the device can now retry.
0xe4 RESUME: Similar to START except a suspended channel program has resumed.
0xe5 SUSPEND: Similar to END except a channel program has suspended itself. If the channel program is not resumed then the END request is not issued.
0xe6 RESERVE: Makes the device unavailable to any other system until a RELEASE request is issued. Must be issued within the scope of START/END.
0xe7 RELEASE: Makes the device available to other systems after the next END request. Must be issued within the scope of START/END.
0xe8 READ: Read from a device. A 4-byte 'record' identifier is specified in the request data to identify what data to read in the device context. Must be issued within the scope of START/END.
0xe9 WRITE: Write to a device. A 2-byte 'offset' and a 4-byte 'record' is specified in the request data, followed by the data to be written. 'record' identifies what data is to be written in the device context and 'offset' and 'length' identify what to update in 'record'. Must be issued within the scope of START/END.
0xea SENSE: Retrieves the sense information after an i/o error has occurred on the server side. This is typically issued within the scope of the channel program having the error. Client side sense or concurrent sense will then pick up the sense data relevant to the i/o error. Must be issued within the scope of START/END.
0xeb QUERY: Obtain device information, typically during device initialization.
0xec COMPRESS: Negotiate compression parameters. Notifies the server what compression algorithms are supported by the client and whether or not data sent back and forth from the client or server should be compressed or not. Typically issued after CONNECT.
NOTE: This action should actually be SETOPT or some such; it was just easier to code a COMPRESS specific SETOPT (less code).

'flag' qualifies the client request and varies by the request.

0x80 NOWAIT: For START, if the device is unavailable then return BUSY instead of waiting for the device.
0x40 QUERY: Identifies the QUERY request:
0x41 DEVCHAR: Device characteristics data
0x42 DEVID: Device identifier data
0x43 DEVUSED: Hi used track/block (for dasdcopy)
0x48 CKDCYLS: Number cylinders for CKD device
0x4c FBAORIGIN: Origin block for FBA
0x4d FBANUMBLK: Number of FBA blocks
0x4e FBABLKSIZ: Size of an FBA block
0x3x COMP: For WRITE, data is compressed at offset 'x':
0x2x BZIP2: using bzip2
0x1x LIBZ: using zlib
0xxy: For COMPRESS, identifies the compression algorithms supported by the client (0x2y for bzip2, 0x1y for zlib, 0x3y for both) and the zlib compression parameter 'y' for sending otherwise uncompressed data back and forth. If 'y' is zero (default) then no uncompressed data is compressed between client & server.

'devnum' identifies the device by number on the server instance. The device number may be different than the device number on the client instance.

'id' identifies the client to the server. Each client has a unique positive (non-zero) identifier. For the initial CONNECT request 'id' is zero. After a successful CONNECT, the server returns in the response header the identifier to be used for all other requests (including subsequent CONNECT requests). This is saved in dev->rmtid.

'length' specifies the length of the data following the request header. Currently length is non-zero for READ/WRITE requests.

The server sends an 8 byte response header and maybe some data:

    +-----+-----+-----+-----+-----+-----+-----+-----+
    |code |stat |  devnum   |    id     |  length   |
    +-----+-----+-----+-----+-----+-----+-----+-----+

                <-------- length --------->
                +----- .  .  .  .  . -----+
                |          data           |
                +----- .  .  .  .  . -----+

'code' indicates the response to the request. OK (0x00) indicates success however other codes also indicate success but qualified in some manner:

0x80 ERROR: An error occurred. The server provides an error message in the data section.
0x40 IOERR: An i/o error occurred during a READ/WRITE request. The status byte has the 'unitstat' data. This should signal the client to issue the SENSE request to obtain the current sense data.
0x20 BUSY: Device was not available for a START request and the NOWAIT flag bit was turned on.
0x10 COMP: Data returned is compressed. The status byte indicates how the data is compressed (zlib or bzip2) and at what offset the compressed data starts (0 .. 15). This bit is only turned on when both the 'code' and 'status' bytes would otherwise be zero.
0x08 PURGE: START request was issued by the client. A list of 'records' to be purged from local cache is returned. These are 'records' that have been updated since the last START/END request from the client by other systems. Each record identifier is a 4-byte field in the data segment. The number of records then is 'length'/4. If the number of records exceeds a threshold (16) then 'length' will be zero indicating that the client should purge all locally cached records for the device.

'stat' contains status information as a result of the request. For READ/WRITE requests this contains the 'unitstat' information if an IOERR occurred.

'devnum' specifies the server device number

'id' specifies the system identifier for the request.

'length' is the size of the data returned.

Caching

Cached records (eg CKD tracks or FBA blocks) are kept independently on both the client and server sides. Whenever the client issues a START request to initiate a channel program the server will return a list of records to purge from the client's cache that have been updated by other clients since the last START request. If the list is too large the server will indicate that the client should purge all records for the device.

Compression

Data that would normally be transferred uncompressed between client and host can optionally be compressed by specifying the 'comp=' keyword on the device configuration statement or attach command. For example:

    0100  3350  192.168.2.12  comp=3

The value of the 'comp=' keyword is the zlib compression parameter which should be a number between 1 .. 9. A value closer to 1 means less compression but less processor time to perform the compression. A value closer to 9 means the data is compressed more but more processor time is required.

If the server is on 'localhost' then you should not specify 'comp='. Otherwise you are just stealing processor time to do compression/ uncompression from hercules. If the server is on a local network then I would recommend specifying a low value such as 1, 2 or 3. We are on a curve here, trying to trade cpu cycles for network traffic to derive an optimal throughput.

If the devices on the server are compressed devices (eg CCKD or CFBA) then the 'records' (eg. track images or block groups) may be transferred compressed regardless of the 'comp=' setting. This depends on whether the client supports the compression type (zlib or bzip2) of the record on the server and whether the record is actually compressed in the server cache.

For example:

Suppose on the client that you execute one or more channel programs to read a record on a ckd track, update a record on the same track, and then read another (or the same) record on the track.

For the first read the server will read the track image and pass it to the client as it was originally compressed in the file. To update a portion of the track image the server must uncompress the track image so data in it can be updated. When the client next reads from the track image, the track image is uncompressed.

Specifying 'comp=' means that uncompressed data sent to the client will be compressed. If the data to be sent to the client is already compressed then the data is sent as is, unless the client has indicated that it does not support that compression algorithm.

To Do

More doc (sorry, I got winded)
Delays observed during short transfers (redrive select ?)
Better server side behaviour due to disconnect
etc.

Greg Smith gsmith@nc.rr.com

Last updated $Date$ $Revision$