Jython Journeys

Notes about my work with jython and python

Asynchronous networking for jython: the select module

Over the Christmas holidays in 2003, I wrote a design for how Asynchronous Socket I/O might be implemented in jython, using java.nio. I wrote up some notes in HTML, and placed them on a group of pages on xhaus.com. These notes were the basis of the design which I later used to implement asynchronous I/O for jython sockets, which is now a part of the jython distribution.

Although these notes are now out-of-date, having been surpassed by the actual implementation itself, I am publishing them here for historical purposes. The notes are broken down into four main areas.

  1. Overview
  2. Socket module
  3. Select module
  4. Asyncore module

For documentation on how to use jython’s asynchronous socket I/O, see the documentation for the socket, select, asyncore, and asynchat modules.


Implementing the select function and poll objects

The purpose of this document is to discuss implementation strategies for a jython implementation of the cpython select module. In cpython, the select module provides two different methods for simultaneously examining the readiness status of a set of I/O channels.

  1. The select function, which is derived from the original Unix select call, and takes as parameters three lists of file descriptors:
    1. a list containing file descriptors which may be readable
    2. a list containing file descriptors which may be writable
    3. a list containing file descriptors which may have out-of-band (OOB) data

    When the select function is called, it reads through each of these lists, and prepares its own internal lists of file descriptors which are to be “selected”. It then examines the readiness of the file descriptors in this internal list, and returns three lists which are subsets of the three lists passed in as parameters.

  2. Poll objects, which are object-oriented descendants of the select function. In order to examine the readiness status of multiple file descriptors, the programmer creates a poll object. File descriptors are explicitly registered with the poll object. These descriptors are checked every time the poll object’s poll() method is called, until they are explicitly unregistered.

As mentioned on the cpython Poll objects documentation page, poll objects are superior to the select function call, since they only require listing the file descriptors of interest, while select() builds a bitmap, turns on bits for the fds of interest, and then afterward the whole bitmap has to be linearly scanned again. This means that performance of poll objects should be better than the select function.
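For comparison, the two interfaces can be exercised side by side in cpython. The following is a minimal sketch, assuming cpython 3 and using a socket.socketpair() as a stand-in for real network channels; the variable names are illustrative only. Poll objects are not available on Windows, hence the hasattr check.

```python
import select
import socket

# A connected pair of sockets stands in for real network channels.
left, right = socket.socketpair()
left.sendall(b"ping")  # makes `right` readable

# select(): three lists in, three (subset) lists out.
readable, writable, exceptional = select.select([right], [right], [right], 1.0)

# Poll objects: register once, then poll as often as needed.
poll_events = []
if hasattr(select, "poll"):
    pobj = select.poll()
    pobj.register(right, select.POLLIN | select.POLLOUT)
    poll_events = pobj.poll(1000)  # list of (fd, event mask) tuples
    pobj.unregister(right)

left.close()
right.close()
```

Note that select() hands back the objects that were passed in, while poll() reports integer file descriptors together with an event mask.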

Although poll objects may be superior, the select function call is still supported in cpython, for the following reasons.

  1. Historic. There is a substantial body of code written in cpython which uses the select function.
  2. Windows. Windows does not support poll objects (Windows has its own equivalent API call).
  3. Unix. Although most modern unices support poll objects, there are still some variants that do not.

As discussed below, cpython poll objects exhibit a very similar model to java.nio.channels.Selector objects, and thus it is natural to implement one using the other. Once poll objects have been implemented, a jython implementation of the select function can be layered on top of them.


Implementing poll objects with java Selector objects

A likely approach to implementing cpython poll objects in jython is to implement them using java Selector objects.

The following are similarities between cpython select.poll objects and java.nio.channels.Selector objects.

  1. Both are single objects which can “watch” multiple I/O channels at once.
  2. Both monitor a set of I/O channels which have been explicitly registered with the object.
  3. Both return a set of “keys” which indicate which channels are ready for I/O. In case of cpython, this is in the form of a list of tuples giving (file descriptor, event mask). In java’s case, this is in the form of a set of java.nio.channels.SelectionKeys, which contain an event mask (or in java terminology, a set of interestOps()).
  4. Both objects support the concept of a timeout, i.e. that the method call which watches the set of registered channels can time out after a user-specified time.
  5. Both objects support the concept of taking a “snapshot view”, i.e. a zero-length timeout.
  6. Both objects support the concept of waiting indefinitely for some readiness notification to appear.

The important differences are as follows:

  1. Java Selector objects have clearly defined semantics when a single selector object is operated from multiple threads. In contrast, the documentation for cpython poll objects makes no guarantees (or even statements) at all about access to poll objects from multiple threads. Presumably this is related to the fact that cpython has a Global Interpreter Lock (GIL), which prevents more than one thread from executing python code at any given time. See below under Threading for more information.
  2. Java SelectionKey objects permit the attachment of application objects, meaning that the programmer can attach a handler object to each channel, which might be responsible for handling all I/O on the channel. Note however that although this facility is useful when writing application frameworks, it is unlikely to be useful in the case of implementing a cpython compatible select module, since cpython poll objects do not support the “attachment” of application objects to I/O channels to be watched. However, the asyncore module may be an ideal place to exploit this feature.

Threading and the cpython Global Interpreter Lock

TBD: Need to understand possible use-cases for calling poll objects from multiple threads.

If you have come across a need to call poll objects from multiple threads, I’d like to hear about it.


Registering and unregistering channels

The cpython documentation states that when a channel is to be registered, the parameter passed can be either of two object types:

  1. The channel’s file descriptor
  2. An object which has a fileno() method which returns a file descriptor.

In the jython implementation of poll objects, a SelectableChannel object is required. Therefore, it is proposed that in jython the same model as cpython be adopted, with the term file descriptor replaced by the term SelectableChannel.

If an object is registered which is an instance of java.nio.channels.SelectableChannel or any of its subclasses, then that object represents the channel which is to be watched.

If the object is not such an instance, then the object’s fileno() method should be called, to attempt to retrieve a SelectableChannel. If the method does not exist, then the object is not watchable, and an exception should be raised. If the method returns an object, but that object is not a SelectableChannel, the channel is unwatchable and an exception should be raised. If the method returns an object and that object is a SelectableChannel, then that object is the channel to be watched.

Suggested code for this purpose is

import java.nio.channels

# JynioException is assumed to be an exception class defined elsewhere in this module.

class poll:

    def _getselectable(self, channel):
        # A SelectableChannel can be watched directly.
        if isinstance(channel, java.nio.channels.SelectableChannel):
            return channel
        # Otherwise, ask the object for its channel through its fileno() method.
        fileno = getattr(channel, 'fileno', None)
        if callable(fileno):
            result = fileno()
            if isinstance(result, java.nio.channels.SelectableChannel):
                return result
        # No fileno() method, or it did not return a SelectableChannel.
        raise JynioException("Channel '%s' is not a watchable channel" % (channel,))

A typical call sequence using the register function is given here. Note that this code should work in both cpython and jython.

import select
from socket import *
 
pobj = select.poll()
s = socket(AF_INET, SOCK_STREAM)
s.setblocking(0)
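# connect_ex() returns an error code instead of raising an exception
# while the non-blocking connect is still in progress.
s.connect_ex( ('www.python.org', 80) )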
pobj.register(s, select.POLLIN | select.POLLOUT)
readychans = pobj.poll()

Internal data of the poll object

The poll object needs to keep track of the channels which it is watching. In java, java.nio.channels.Selector objects are used to watch multiple channels. Therefore, it is proposed to have a Selector as an instance variable of every poll object, against which all channels can be registered. The Selector should be created at __init__ time of the poll object, like so.

import java.nio.channels
 
class poll:
 
    def __init__(self):
        # [....]
        self.selector = java.nio.channels.Selector.open()
        # [....]

In the cpython model, a channel is registered with a watcher, whereas in java the channel registers itself with the watcher, returning a SelectionKey representing the registration. This means that when a Selector reports which channels are ready for I/O operations, it returns a set of SelectionKeys. However, we need to return to the programmer the original object that they passed in, so we need a mechanism for mapping SelectionKeys back to the original parameters.

When a SelectionKey is returned from a Selector, the channel corresponding to that key can be obtained through the channel() method. Further, since we know that the channel represents a socket, we can retrieve the underlying java socket through the channel’s socket() method. However, if we do not keep track of the user’s original object, then we cannot know from the SelectionKey which object the user passed in. Therefore, we need to keep track of the original object the user passed into the function. This could be done with a simple dictionary which is internal to the jython poll object. The key for the dictionary would be the SelectableChannel corresponding to the channel.

When a channel is unregistered with a poll object, the channel should no longer be watched. The java model, however, is slightly different than what might be expected (surprise!): instead of a watcher unregistering a watchable channel, the key representing the registration of a watchable channel with a watcher must be cancelled! Therefore, it will be necessary to keep track of all SelectionKeys generated by the Selectable/Selector/registration process, so that they can be cancelled later.

It is proposed to store this information in a simple python dictionary which is internal to the jython poll object. The key for the dictionary would be the SelectableChannel corresponding to the channel. The values in the dictionary should contain the following.

  1. userobject, i.e. the object which the user passed into the registration function, and which should be returned to them when the channel is ready for I/O.
  2. SelectionKey, i.e. the SelectionKey representing the registration with the internal Selector object. This is so that the registration can be cancelled at a later stage.

This dictionary should be created at __init__ time of the poll object, and modified at every channel registration and unregistration. The following are suggested algorithms for the poll object. Note that, according to the cpython 2.3 documentation, it is not an error for a channel to be registered twice with a poll object, but any attempt to unregister a channel that was never registered should result in a KeyError. Note that this code ignores the need for event flags, the implementation of which will be discussed below.

import java.nio.channels

class poll:

    def __init__(self):
        self.selector = java.nio.channels.Selector.open()
        # Maps SelectableChannel -> (userobject, SelectionKey)
        self.chanmap = {}

    def _getselectable(self, userobject):
        # Implementation which seeks a SelectableChannel object given above
        pass

    def register(self, userobject, flags):
        # Flags are passed through untranslated for now
        channel = self._getselectable(userobject)
        selectionkey = channel.register(self.selector, flags)
        self.chanmap[channel] = (userobject, selectionkey)

    def unregister(self, userobject):
        channel = self._getselectable(userobject)
        # Raises KeyError if the channel was never registered
        self.chanmap[channel][1].cancel()
        del self.chanmap[channel]
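The bookkeeping can be tried out without any java classes. The following is a rough sketch in plain python: FakeKey and Registry are hypothetical stand-ins invented for this illustration (FakeKey plays the role of a SelectionKey), and plain strings play the role of channels. It assumes the java behaviour that registering a channel a second time with the same selector hands back the same key.

```python
class FakeKey:
    """Hypothetical stand-in for a java.nio.channels.SelectionKey."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class Registry:
    """Mirrors the chanmap bookkeeping proposed above."""

    def __init__(self):
        self.chanmap = {}  # channel -> (userobject, key)

    def register(self, userobject, channel):
        entry = self.chanmap.get(channel)
        # Registering twice is not an error: the existing key is reused.
        key = entry[1] if entry else FakeKey()
        self.chanmap[channel] = (userobject, key)
        return key

    def unregister(self, channel):
        # pop() raises KeyError if the channel was never registered.
        userobject, key = self.chanmap.pop(channel)
        key.cancel()
        return key
```

Registering "chan-a" twice therefore leaves a single entry in chanmap, and a second unregister of the same channel raises KeyError, as the cpython documentation requires.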

Mapping cpython event flags to java interest ops

When a channel is registered with a watcher, the programmer defines a set of events in which they are interested. In cpython, this is done through OR’ing bitmasks containing the following possible values:

Flag | Meaning
POLLIN | Check whether the channel has data ready for reading
POLLOUT | Check whether the channel is ready for writing data
POLLPRI | Check whether there is Out-Of-Band data present on this channel

The following are important to note

  1. Cpython does not have an event flag to represent a CONNECT event. Instead, a client socket is known to be connected when it is ready for writing, i.e. a POLLOUT event is generated.
  2. Cpython does not have an event flag to represent an ACCEPT event, i.e. that a server socket accept() would not block if called, and would return a new socket. Instead, a server socket is known to be ready to accept when it is ready for reading, i.e. a POLLIN event is generated.
  3. Note that the above seems to be documented only in the cpython documentation for the asyncore module. It is also interesting to note that the asyncore module has reverse-engineered its own pseudo-events to represent the CONNECT and ACCEPT events.

Java also uses OR’ed bitmasks of flags to represent the set of events that are of interest on the channel. Java defines the following constants in the java.nio.channels.SelectionKey class.

Flag | Meaning
OP_ACCEPT | Check whether the server socket channel is ready for accepting. Server socket channels support this event only! They do not support OP_READ!
OP_CONNECT | Check whether the client socket channel is connected
OP_READ | Check whether the channel has data ready for reading
OP_WRITE | Check whether the channel is ready for writing data

Obviously, this presents some difficulties, since there are discrepancies between the event flags that are used by cpython and java. Cpython uses the POLLIN flag to represent two events that are represented by separate flags in java: OP_ACCEPT and OP_READ. Similarly, cpython uses the POLLOUT flag to represent both the java events OP_CONNECT and OP_WRITE. There is no way in cpython to specify the CONNECT or ACCEPT events, but these are required by Java. Conundrum!

A naive solution to this problem might be to set both java flags as being of interest when the relevant cpython flag is specified, for example to use the event mask OP_READ | OP_ACCEPT when the python mask POLLIN is specified. However, this may result in an error, since OP_ACCEPT is only a valid flag on server socket channels: it would probably generate an IllegalArgumentException if used on a client socket (see the documentation for the SelectionKey.interestOps() method for more information). As mentioned on that page, if an event flag is not present in the mask of valid operations on that channel (as determined by the SelectableChannel.validOps() method), then an exception is raised.

Therefore, a proposed solution is to only add these implied java events when they are present in the validOps mask for the channel being registered. Code to carry out this proposed mapping is as follows

from java.nio.channels.SelectionKey import OP_ACCEPT, OP_CONNECT, OP_WRITE, OP_READ

POLLIN   = 1
POLLOUT  = 2
POLLPRI  = 4

class poll:

    def register(self, channel, mask):
        jmask = 0
        if mask & POLLIN:
            # Note that OP_READ is NOT a valid event on server socket channels.
            if channel.validOps() & OP_ACCEPT:
                jmask = OP_ACCEPT
            else:
                jmask = OP_READ
        if mask & POLLOUT:
            jmask |= OP_WRITE
            if channel.validOps() & OP_CONNECT:
                jmask |= OP_CONNECT
        # [....]

Note that the python POLLPRI flag is not handled by this code; the reasons for its absence are discussed below.
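The mapping can also be written as a pure function and checked without a JVM. This is a sketch only: to_java_mask and the two *_OPS names are invented for this illustration, the OP_* numbers are the constant values from the java API, and the POLL* numbers are the placeholder values used in these notes. It is a slight variation on the code above, in that every candidate java flag is filtered through the channel’s valid ops, so POLLOUT on a server socket channel maps to nothing rather than to OP_WRITE.

```python
# java.nio.channels.SelectionKey constants (values from the java API)
OP_READ, OP_WRITE, OP_CONNECT, OP_ACCEPT = 1, 4, 8, 16

# Placeholder python-side values, as used in these notes
POLLIN, POLLOUT, POLLPRI = 1, 2, 4

# What validOps() reports for the two channel types of interest
SOCKET_CHANNEL_OPS = OP_READ | OP_WRITE | OP_CONNECT
SERVER_SOCKET_CHANNEL_OPS = OP_ACCEPT


def to_java_mask(pymask, valid_ops):
    """Translate a poll event mask into interest ops valid for the channel."""
    jmask = 0
    if pymask & POLLIN:
        # POLLIN covers both "data to read" and "connection to accept".
        jmask |= (OP_READ | OP_ACCEPT) & valid_ops
    if pymask & POLLOUT:
        # POLLOUT covers both "can write" and "connect completed".
        jmask |= (OP_WRITE | OP_CONNECT) & valid_ops
    return jmask
```

For example, to_java_mask(POLLIN, SERVER_SOCKET_CHANNEL_OPS) yields OP_ACCEPT, while the same python mask on a client socket channel yields OP_READ.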

Combining the flag mapping with the registration code given earlier gives a final definition of the poll object (minus the definition of the poll() method) as follows.

import java.nio.channels
from java.nio.channels.SelectionKey import OP_ACCEPT, OP_CONNECT, OP_WRITE, OP_READ

# These values should be checked in the cpython implementation.
# (On Linux, for instance, cpython has POLLIN = 1, POLLPRI = 2, POLLOUT = 4.)

POLLIN   = 1
POLLOUT  = 2
POLLPRI  = 4

class poll:

    def __init__(self):
        self.selector = java.nio.channels.Selector.open()
        self.chanmap = {}

    def _getselectable(self, userobject):
        # Implementation which seeks a SelectableChannel object given above
        pass

    def register(self, userobject, mask):
        # Resolve the channel first: its validOps() decide the mapping below.
        channel = self._getselectable(userobject)
        jmask = 0
        if mask & POLLIN:
            # Note that OP_READ is NOT a valid event on server socket channels.
            if channel.validOps() & OP_ACCEPT:
                jmask = OP_ACCEPT
            else:
                jmask = OP_READ
        if mask & POLLOUT:
            jmask |= OP_WRITE
            if channel.validOps() & OP_CONNECT:
                jmask |= OP_CONNECT
        selectionkey = channel.register(self.selector, jmask)
        self.chanmap[channel] = (userobject, selectionkey)

    def unregister(self, userobject):
        channel = self._getselectable(userobject)
        self.chanmap[channel][1].cancel()
        del self.chanmap[channel]

An implementation of the poll method

We are now in a position to define the poll() method of poll objects. The return value from this function should be a list of tuples, containing (channel, event mask), where event mask has bits set for all events that occurred on the channel. Since the event mask returned from the internal java.nio.channels.Selector object will possibly contain the extended java events OP_ACCEPT and OP_CONNECT, these should be mapped to the python events POLLIN and POLLOUT respectively.

The cpython poll.poll() method takes a timeout parameter, which means different things depending on its value. Java uses separate methods to represent these different timeout scenarios. The mapping from one to the other is given in this table.

Timeout value | Cpython meaning | Java equivalent
None | The poll call is to block until at least one channel is ready for operation. | Selector.select()
Negative value | Same as timeout value of None | Selector.select()
Positive value | The value gives the timeout value in milliseconds. | Selector.select(long)
Zero | The value gives a timeout value of zero milliseconds, i.e. the call returns immediately. | Selector.selectNow()

Therefore, different methods of the internal Selector object will have to be called, depending on the timeout value specified.

class poll:

    def poll(self, timeout=None):
        if timeout is None or timeout < 0:
            self.selector.select()
        elif timeout == 0:
            # Selector.select(0) would block indefinitely, hence selectNow()
            self.selector.selectNow()
        else:
            # No multiplication required: both cpython and java use millisecond timeouts
            self.selector.select(timeout)
        selectedkeys = self.selector.selectedKeys() # is this thread-safe?
        results = []
        for k in selectedkeys: # Naive implementation, returned list is actually a java.util.Set
            jmask = k.readyOps()
            pymask = 0
            if jmask & OP_READ: pymask |= POLLIN
            if jmask & OP_WRITE: pymask |= POLLOUT
            if jmask & OP_ACCEPT: pymask |= POLLIN
            if jmask & OP_CONNECT: pymask |= POLLOUT
            # Now return the original userobject, and the return event mask
            # A python 2.2 generator would be sweet here
            results.append( (self.chanmap[k.channel()][0], pymask) )
        # Keys stay in the selected-key set until they are removed,
        # so empty the set before the next call.
        selectedkeys.clear()
        return results
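The two translations performed by poll() can be isolated as small helper functions and checked in plain python. This is a sketch under the same assumptions as the notes: the OP_* numbers are the java constant values, the POLL* numbers are the placeholder values used here, and to_python_mask and selector_call are names invented for this illustration. selector_call only names the Selector method that would be invoked.

```python
# java.nio.channels.SelectionKey constants (values from the java API)
OP_READ, OP_WRITE, OP_CONNECT, OP_ACCEPT = 1, 4, 8, 16

# Placeholder python-side values, as used in these notes
POLLIN, POLLOUT = 1, 2


def to_python_mask(jmask):
    """Fold the four java ready ops back into the two python events."""
    pymask = 0
    if jmask & (OP_READ | OP_ACCEPT):
        pymask |= POLLIN
    if jmask & (OP_WRITE | OP_CONNECT):
        pymask |= POLLOUT
    return pymask


def selector_call(timeout):
    """Name the Selector method matching a cpython poll() timeout."""
    if timeout is None or timeout < 0:
        return "select()"
    if timeout == 0:
        return "selectNow()"
    return "select(%d)" % timeout
```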

How to deal with Out-Of-Band data

In cpython, it is possible to check for the presence of Out-Of-Band data on a channel. Out-of-Band data, also known as priority data or urgent data, is a TCP feature whereby bytes can be sent on a socket which bypass the normal sequencing restrictions imposed on data sent through a socket.

However, although this is a built-in feature of TCP sockets, such out-of-band data is rarely used in actual protocol implementations. Nonetheless, there may be software out there which uses this facility. The readiness status for presence of this data is checked by passing the flag POLLPRI when registering the channel.

Currently, the java.nio non-blocking APIs seem to have minimal support for Out-Of-Band data. There are no event masks with which one can register interest in such priority data. This is confirmed on the documentation page for the java.net.Socket.setOOBInline() method, which says: “Note, only limited support is provided for handling incoming urgent data. In particular, no notification of incoming urgent data is provided and there is no capability to distinguish between normal data and urgent data unless provided by a higher level protocol”.

Because of this lack of support for priority/urgent data on the java platform, it is proposed that this form of processing not be supported in this module. If the programmer requires the use of urgent data with jython, i.e. on the java platform, then they will probably be sophisticated enough to bypass this implementation of the select module in jython, and go direct to the java API. A consequent question is whether POLLPRI flags should simply be ignored when used, or should raise an UnsupportedOperation exception.

If this is deemed to be an unacceptable compromise, then I have no solution to propose for the problem.
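The two candidate behaviours for POLLPRI could be sketched as follows. This is purely illustrative: check_mask and its strict parameter are hypothetical names, the POLL* numbers are the placeholder values used in these notes, and NotImplementedError stands in for whichever exception would finally be chosen.

```python
# Placeholder python-side values, as used in these notes
POLLIN, POLLOUT, POLLPRI = 1, 2, 4


def check_mask(mask, strict=False):
    """Return the mask with POLLPRI removed, or raise if strict is set."""
    if mask & POLLPRI:
        if strict:
            raise NotImplementedError("POLLPRI is not supported on the java platform")
        # Lenient behaviour: silently drop the unsupported flag.
        mask &= ~POLLPRI
    return mask
```

Silently dropping the flag keeps existing cpython code running, at the cost of never reporting urgent data; raising makes the limitation visible at registration time.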


Implementing the select function using poll objects

Now that we have finished the implementation of poll objects, we can use them to implement the select function, also in the cpython select module.

The only complexity is that the select function takes separate lists for channels that are to be examined for read and write events. The user may want to watch a channel for both read and write events, so a given channel might be present in both lists. If so, it must be registered only once with the poll object, using an event mask that checks for both read and write events.

def select(read_fd_list, write_fd_list, outofband_fd_list):
    # First create a poll object to do the actual watching.
    pobj = poll()
    # Check the read list
    for fd in read_fd_list:
        mask = POLLIN
        if fd in write_fd_list:
            mask |= POLLOUT
        pobj.register(fd, mask)
    # And now the write list
    for fd in write_fd_list:
        if fd not in read_fd_list: # fds in both have already been registered.
            pobj.register(fd, POLLOUT)
    results = pobj.poll()
    # Now start preparing the results
    # The out-of-band list is never watched, so its result list stays empty.
    read_ready_list, write_ready_list, oob_ready_list = [], [], []
    for fd, mask in results:
        if mask & POLLIN:
            read_ready_list.append(fd)
        if mask & POLLOUT:
            write_ready_list.append(fd)
    return read_ready_list, write_ready_list, oob_ready_list
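The same layering can be tried in cpython itself, where poll objects already exist on most Unix platforms. The following is a minimal sketch, assuming cpython 3 on a platform that provides select.poll; select_via_poll is a name invented for this illustration, and its timeout parameter (in seconds, as for the real select function) is an addition not present in the code above. Events such as POLLHUP and POLLERR are deliberately ignored here.

```python
import select
import socket


def select_via_poll(rlist, wlist, xlist, timeout=None):
    """select()-style wrapper around a cpython poll object; timeout in seconds."""
    masks = {}    # fd number -> combined event mask, so each fd is registered once
    objects = {}  # fd number -> the object the caller passed in
    wanted = ((rlist, select.POLLIN), (wlist, select.POLLOUT), (xlist, select.POLLPRI))
    for seq, flag in wanted:
        for obj in seq:
            fd = obj if isinstance(obj, int) else obj.fileno()
            masks[fd] = masks.get(fd, 0) | flag
            objects[fd] = obj
    pobj = select.poll()
    for fd, mask in masks.items():
        pobj.register(fd, mask)
    # poll() takes milliseconds, whereas select() takes seconds.
    events = pobj.poll(None if timeout is None else int(timeout * 1000))
    ready = ([], [], [])
    for fd, mask in events:
        for out, flag in zip(ready, (select.POLLIN, select.POLLOUT, select.POLLPRI)):
            if mask & flag:
                out.append(objects[fd])
    return ready
```

Collecting the masks in a dictionary before registering anything takes care of a channel that appears in more than one list, without the repeated list membership tests.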

Complete but untested code based on these notes

The following code should be a complete implementation of the cpython select module, in jython, using java APIs. However, the following are important points

  1. Will not work without socket mods. In order for this code to work, a number of modifications would have to be made to the jython socket module. I currently have no plans to make these modifications: I can’t afford the time. Maybe in a few months.
  2. Code is completely untested. This code has only been tested for correct syntax. It has never even been run.

Written by alan.kennedy

December 26th, 2003 at 9:05 pm

Posted in jython
