Optimal arbitrary value integer encoding.

Character size.

The natural character size is 2 bits, and that will eventually become the standard.

Until then, the current standard of 8 bits will dominate, so that will be our choice in the first practical implementations of the codecs.

The following Python code gives a reference model that can be used to verify more practical implementations:

class IETF8_(): #integer encoding termination field with character size 8
    def __init__(self):
        self.ChrLen=8 #character size

    def enc(self, I): #encode non-negative integer I, return its byte image
        #start by encoding integer value 0:
        Fprev=None #most recent continuation field (None: no such field yet)
        MaxLen=1 #max number of characters of the termination field
        Len=1 #current (initial) number of characters
        Val=0 #value stored in the termination field
        #next encode larger integer values in steps of increasing Dmax
        while I: #more to encode
            if Len<MaxLen: Dmax=2**(Len*self.ChrLen) #msb is 'free'
            else: Dmax=2**(Len*self.ChrLen-1) #msb indicates field type
            if I<Dmax: #it fits in my current shape
                Val=I
                I=0
            else: #i need to reshape myself
                if Fprev and Fprev.incr(): #i can grow
                    Len=Len+1
                else: #I can not grow, so i spawn a new continuation field
                      #and reset myself
                    Fprev=IECF8_(Fprev, MaxLen) #new continuation field
                    Len=1 #number of characters
                    Val=0 #nr integer values being used
                    MaxLen=Dmax #new max for Len is the current Dmax
                I=I-Dmax
        #done encoding, return encoding fields sequence image
        #in this implementation we use a python bytearray object
        #to represent the bits
        IEtext=(Len*self.ChrLen*'0' +bin(Val)[2:])[-self.ChrLen*Len:]
        #ascii character bit encoding
        MyPersonalImg=bytearray() #compact encoding
        NrBytes=len(IEtext)//8
        for ByteNr in range(NrBytes):
            ByteImg=IEtext[ByteNr*8:ByteNr*8+8]
            MyPersonalImg.append(int(ByteImg,2))
        if Fprev: return Fprev.img()+ MyPersonalImg
        else: return MyPersonalImg

class IECF8_(): #integer encoding continuation field with character size 8
    def __init__(self, Fprev=None, Len=0):
        self.ChrLen=8 #character size
        self.Fprev=Fprev #previous integer encoding (continuation) field
        self.Len=Len #my (fixed) nr of char's
        self.Val=0 #first length indication possible,
                   #next field has 1 character to start with
        if Len: self.MaxVal=2**(self.Len*self.ChrLen-1)-1
                #largest length indication possible
        else: self.MaxVal=0

    def incr(self): #increase length indication of next field if possible
        if self.Val<self.MaxVal: #next field may grow
            self.Val=self.Val+1 #I note its growth
            return True #grant request to grow
        else: return False #deny request to grow in current shape

    def img(self): #image of me and my previous fields
        IEtext=(self.Len*self.ChrLen*'0' \
                +bin(self.Val+self.MaxVal+1)[2:])[-self.ChrLen*self.Len:]
        #ascii character bit encoding
        MyPersonalImg=bytearray() #compact encoding
        NrBytes=len(IEtext)//8
        for ByteNr in range(NrBytes):
            ByteImg=IEtext[ByteNr*8:ByteNr*8+8]
            MyPersonalImg.append(int(ByteImg,2))
        if self.Fprev: return self.Fprev.img()+ MyPersonalImg
        else: return MyPersonalImg

class IDF8_(): #integer decoding field with character size 8
    def dec(self, Sequ): #decode byte image sequence of fields
        Flen=1 #first field length is 1 character
        MaxFlen=1 #limit to growth
        Vi=0 #intermediate value (counter) of final integer
        #
        while Flen: #there is a next field in the encoding
            Border=2**(Flen*8-1) #2**(number of bits in field - 1)
            Fimg=Sequ[:Flen] #next field image
            Sequ=Sequ[Flen:] #strip image from sequence
            Bval=0 #initial value of binary field interpretation
            for ByteNr in range(Flen):
                Bval=Bval<<8 #previous field value shift
                ByteVal=Fimg[ByteNr] #current byte valuation
                Bval=Bval+ByteVal #final new field binary value interpretation
            if Flen==MaxFlen and Bval>=Border: #it's a continuation field
                FvA=0 #aggregated values of field in all shapes
                for L in range(1, MaxFlen): #all lengths from 1 up to MaxFlen-1
                    FvA=FvA+2**(L*8)
                FvA=FvA+Border #final shape adds half its binary max value
                Vi=Vi+FvA #add aggregated field value
                Flen=Bval-Border+1 #next field nr of chars
                MaxFlen=Border #next max len
            else:#it's the termination field
                FvA=0 #aggregated values of field in shapes ranging
                      #from len=1 to current len
                for L in range(1, Flen):
                    FvA=FvA+2**(L*8)
                Vi=Vi+FvA+Bval #add aggregated values plus binary field value
                Flen=0 #stop eating fields from Sequ
        return Vi, Sequ

Encoder=IETF8_()
Decoder=IDF8_()
for I in range(20):
    Enc=Encoder.enc(I*111)
    print(I*111, Enc, Decoder.dec(Enc))
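
As a possible starting point for a more practical implementation, here is a minimal sketch of the same field layout written as two plain functions. The names encode and decode are hypothetical and do not appear in the reference model, the use of int.to_bytes and int.from_bytes in place of the bit-string conversion is a substitution, and the checks for negative and truncated input are additions. It is meant to be compared against the reference model, not to replace it.

```python
def encode(value):
    """Encode a non-negative integer into bytes (8-bit characters)."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")  # added check
    conts = []    # continuation fields, each as [length, val, max_val]
    max_len = 1   # maximum length (in characters) of the termination field
    length = 1    # current length of the termination field
    val = 0       # value stored in the termination field
    while value:
        # At maximum length the top bit marks the field type,
        # so one bit less is available for the value.
        bits = length * 8 - (0 if length < max_len else 1)
        dmax = 1 << bits
        if value < dmax:               # fits in the current shape
            val, value = value, 0
            break
        if conts and conts[-1][1] < conts[-1][2]:
            conts[-1][1] += 1          # last continuation field grants growth
            length += 1
        else:                          # spawn a new continuation field
            conts.append([max_len, 0, (1 << (max_len * 8 - 1)) - 1])
            length, max_len = 1, dmax  # restart with a 1-character field
        value -= dmax
    out = bytearray()
    for c_len, c_val, c_max in conts:
        out += (c_val + c_max + 1).to_bytes(c_len, "big")
    out += val.to_bytes(length, "big")
    return bytes(out)


def decode(data):
    """Decode one integer from the start of data; return (value, rest)."""
    flen = 1       # length of the next field in characters
    max_flen = 1   # limit to the growth of that field
    total = 0
    pos = 0
    while True:
        border = 1 << (flen * 8 - 1)
        field = data[pos:pos + flen]
        if len(field) < flen:
            raise ValueError("truncated encoding")  # added check
        pos += flen
        bval = int.from_bytes(field, "big")
        if flen == max_flen and bval >= border:     # continuation field
            total += sum(1 << (n * 8) for n in range(1, max_flen)) + border
            flen = bval - border + 1
            max_flen = border
        else:                                       # termination field
            total += sum(1 << (n * 8) for n in range(1, flen)) + bval
            return total, data[pos:]


if __name__ == "__main__":
    for n in (0, 127, 128, 383, 384, 2109):
        img = encode(n)
        print(n, img.hex(), decode(img))
```

The boundary values are the interesting ones to check: 127 is the last value that fits in a single character, 128 is the first that needs a continuation field, and 384 is the first that needs a two-character termination field.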

Please contribute.

I invite programmers everywhere to contribute practical implementations in an iterative, cooperative development process that will eventually minimize the processor time the codecs require.

I am personally most interested in a Python implementation, but contributions in other environments can obviously be just as helpful.

Monday, 16 February 2015
