Optimal arbitrary value integer encoding.: april 2015

donderdag 9 april 2015

The 8 bit prototype.

For practical implementations of the algorithm discussed in the previous posting, the following python code may be used as a prototype.

class SBE8_(): #optimal stop bit encoding of arbitrary int values at char size 8
    def enc(self, I):
        L=1 #nr chars in encoding
        Vo=1<<7 #overfow value of one character
        while I>=Vo:
            I-=Vo
            L+=1
            Vo=Vo<<7 #overflow value of L characters
        Enc=bytearray()
        Sbit=128 #stop bit termination value
        Vo=1<<7
        while L:
            I,Vc=divmod(I,Vo)
            Enc.insert(0,Sbit+Vc)
            Sbit=0 #continue value of stop bit
            L-=1
        return Enc

    def dec(self, Enc):
        I=Enc.pop(0)
        Sbit=I&128
        I=I&127
        if Sbit: #done decoding
            return I
        L=0 #nr characters processed minus 1
        while not Sbit:
            Vc=Enc.pop(0)
            Sbit=Vc&128
            Vc=Vc&127
            I=(I<<7)+Vc #add character payload value
            L+=1
        while L:
            I+=(1<<L*7) #add character overflow value
            L-=1
        return I

Suggestions for a better algorithm and improved implementations are most welcome.

woensdag 8 april 2015

The new proposal

The new algorithm is a variation of the stop bit encodings that are currently being used.

Stop bit encoding reserves one bit position in every character to indicate whether the character being examined is the last one within the encoding (termination value) or is being followed by another character (continuation value).

The remaining bits are part of the actual integer encoding, the payload bits.

Any position within the character can be reserved for the stop bit.

The most significant bit is a popular choice that will be adopted in the proposal.

The (continuation, termination) values can either be (0,1) or (1,0).

In this proposal 0 is used as the continuation value and 1 as the termination indicator.

The stop bit encodings give the following results for character size 2:

0 : T0

1 : T1

2 : C1T0

3 : C1T1

4 : C1C0T0

5 : C1C0T1

6 : C1C1T0

7 : C1C1T1

8 : C1C0C0T0

9 : C1C0C0T1

etc.

The proposal is to change that into:

0 : T0

1 : T1

2 : C0T0

3 : C0T1

4 : C1T0

5 : C1T1

6 : C0C0T0

7 : C0C0T1

8 : C0C1T0

9 : C0C1T1

10 : C1C0T0

11 : C1C0T1

12 : C1C1T0

13 : C1C1T1

14 : C0C0C0T0

15 : C0C0C0T1

etc.

The following code illustrates the algorithm for various character sizes:

class SBE_(): #optimal stop bit encoding of arbitrary integer values
    def __init__(self, ChrLen=2, Cbit='0', Tbit='1'):
        self.Clp=ChrLen-1 #nr. payload bits per char
        self.Cbit=Cbit
        self.Tbit=Tbit

    def enc(self, I):
        L=1 #nr chars in encoding
        Vo=1<<(self.Clp) #overfow value of one character
        while I>=Vo:
            I=I-Vo
            L=L+1
            Vo=Vo<<self.Clp #overflow value of L characters
        Enc=''
        Sbit=self.Tbit
        Vo=1<<self.Clp
        while L:
            I,Vc=divmod(I,Vo)
            Enc=Sbit+(self.Clp*'0'+bin(Vc)[2:])[-self.Clp:]+Enc
            Sbit=self.Cbit
            L=L-1
        return Enc

    def dec(self, Enc):
        Enc,Sbit,I=self.rdchr(Enc)
        if Sbit==self.Tbit:
            return I
        L=1 #nr characters processed
        while Sbit==self.Cbit:
            Enc,Sbit,Vc=self.rdchr(Enc)
            I=(I<<self.Clp)+Vc
            L=L+1
        L=L-1
        while L:
            I=I+(1<<L*self.Clp) #add overflow value
            L=L-1
        return I

    def rdchr(self, Enc):
        Sbit=Enc[0]
        Cval=int('0b'+Enc[1:1+self.Clp],2)
        Enc=Enc[self.Clp+1:]
        return Enc, Sbit, Cval

In the next posting I will present a prototype that may be used in the development of an 8 bit character practical implementation.

Inputs from comp.lang.python.

I asked for help on comp.lang.python with the implementation of the 8 bit codec that I introduced in the previous posting.

The question triggered a very helpful discussion that made me change the proposed encoding algorithm.

Part of the discussion covered the question why I would want to use such an obscure algorithm.

There are many algorithms available, each with their own merits.

The answer is that I am looking for the algorithm that requires the minimal number of bits to encode arbitrary integer values.

None of the currently available algorithms does that.

Dave Angel showed me that the algorithm I proposed didn't either.

So I went back to the drawing board with the objective to find something better then the MIDI algorithm he used to show me.

This time I tested the new proposal more thoroughly myself.

It beats the MIDI encoding for every character size and it beats the the first proposal for character sizes larger then 2.

At character size 2 the two proposals do compete with each other, depending on the integer value.

For now I consider the new proposal to be my best option.

I will present it in the next posting and I hope that it may be improved with your help.