giflzw
giflzw is a small C library for incrementally encoding and decoding GIF image data streams using the LZW algorithm. I usually refer to encoding and decoding rather than compression and decompression. The GIF 89a specification (and older 87a specification) frequently use the same terms.
The giflzw library is general enough it may be used for encoding and decoding data other than GIF image data, but there are better alternatives in probably every case.
Common definitions and declarations
The definitions needed to use the library are:
#define GLZW_OK 0
#define GLZW_NO_INPUT_AVAIL 1
#define GLZW_NO_OUTPUT_AVAIL 2
#define GLZW_OUT_OF_MEMORY 3
#define GLZW_INTERNAL_ERROR 4
#define GLZW_INVALID_DATA 5
GLZWUint
is a typedef for an unsigned integer of at least 32 bits. GLZWByte
is unsigned char
. These types are used in the library to avoid collisions with names in programs that include the <glzwe.h>
and <glzwd.h>
header files.
In the examples here, I use Uint
and Byte
for these types.
Encoding (compression)
source code: header file and implementation
Header file for the encoding routines
#include <glzwe.h>
Initialization
int glzwe_init(void **pstate, const GLZWUint lzw_min_code_width);
Call once to initialize the encoder for encoding one stream of bytes. The function will allocate and initialize a structure and return a pointer to it via pstate
. The function also requires an integer that is described as “LZW Minimum Code Size” in section 22 of the GIF 89a specification.
pstate
will be used to return a pointer to the encoder’s internal state. This is “opaque”; the user should not depend on what is inside the library’s state structure.
lzw_min_code_width
is an integer, at least 2 and at most 8, representing the width in bits of the data to be encoded. If whole bytes are to be encoded (as for a 256-color palette), use 8. For smaller palettes (color tables) of size 2ⁿ (2 <= n < 8), use the corresponding n. For 2-color palettes, use 2 (see the specification for details). Note that the data must be unpacked, that is, each byte in the stream contains a single value referring to a color table entry.
Returns: GLZW_OUT_OF_MEMORY
if the structure cannot be allocated, or GLZW_OK
otherwise.
Example: To begin encoding a stream of bytes, use:
void *state;
int ret = glzwe_init(&state, 8);
Encoding
int glzwe(void *state, const GLZWByte *in_ptr, GLZWByte *out_ptr,
GLZWUint *in_avail, GLZWUint *out_avail,
GLZWUint end_of_data);
Call as many times as needed to encode a stream of bytes. It is the caller’s responsibility to ensure that bytes have a value less than 2lzw_min_code_width.
state
is a pointer to the structure initialized by glzwe_init()
.
in_ptr
is a pointer to a buffer containing bytes to be encoded.
out_ptr
is a pointer to an output buffer that will contain the resulting codes.
in_avail
is a pointer to a count of bytes in the input buffer.
out_avail
is a pointer to a count of bytes left in the output buffer.
end_of_data
is a flag that is zero if there is more data to be encoded than is present in the input buffer, and non-zero when the input buffer contains the last bytes in the stream to be encoded. In the case where end_of_data
is non-zero, the input buffer may be empty, i.e. in_avail
points to a variable containing 0.
This function supports incremental encoding, i.e. a chunk of input or output at a time. The function will encode the data at the location referenced by in_ptr
, storing output codes at the location referenced by out_ptr
. As bytes are consumed or emitted, in_avail
and out_avail
are decremented. The function returns when either one reaches zero and it attempts to consume or produce more data.
The caller must keep track of how far in_avail
and out_avail
are decremented, and must also ensure the in_ptr
and out_ptr
are updated as appropriate before the next call.
Returns:
GLZW_NO_INPUT_AVAIL
when in_avail
reaches zero and end_of_data
is not set. Caller must call again with at least one byte of input available.
GLZW_NO_OUTPUT_AVAIL
when out_avail
reaches zero. Caller must call again with at least one byte of output space available.
GLZW_OK
after the last byte of output has been stored.
GLZW_INTERNAL_ERROR
can only occur if the state structure is tampered with.
Example: To encode a stream of bytes, you could use:
Uint incnt, incnt0, outcnt;
enum {inbufsize = 30, outbufsize = 10};
Byte inbuf[inbufsize], outbuf[outbufsize];
// ... open input and output to infp, outfp;
void *state;
int r = glzwe_init(&state, 8);
Byte *p = inbuf;
incnt = 0;
int end_of_data = 0;
do {
outcnt = outbufsize;
incnt0 = incnt;
r = glzwe(state, p, outbuf, &incnt, &outcnt, end_of_data);
fwrite(outbuf, 1, outbufsize - outcnt, outfp);
if (r == GLZW_NO_INPUT_AVAIL) {
p = inbuf;
incnt = fread(p, 1, inbufsize, infp);
end_of_data = incnt < inbufsize;
} else if (r == GLZW_NO_OUTPUT_AVAIL) {
p += incnt0 - incnt;
} else if (r != GLZW_OK) {
printf("ERROR r:%d\n", r);
abort();
}
} while (r != GLZW_OK);
// close files; call glzwe_end(state);
(A complete version of this program is in the examples
code directory.)
Notes on this example: The input buffer is intially empty. The first call to glzwe()
will return GLZW_NO_INPUT_AVAIL
and then the input will be read. After each call to glzwe()
, any output produced will be written. Then, if the input buffer is exhausted, more data will be read. The end_of_data
flag is set when all input is read. If the output buffer is full, the input buffer pointer is adjusted to pick up the rest of the input on the next call to glzwe()
. This program will work for any sizes of input and output buffers.
Finishing
void glzwe_end(void *state);
Release the memory allocated by glzwe_init()
for the state structure.
Decoding (decompression)
source code: header file and implementation
Header file for the decoding routines
#include <glzwd.h>
Initialization
int glzwd_init(void **pstate, const GLZWUint lzw_min_code_width);
Call once to initialize the decoder for decoding one stream of bytes. The function will allocate and initialize a structure and return a pointer to it via pstate
. The function also requires an integer that is described as “LZW Minimum Code Size” in section 22 of the GIF 89a specification.
pstate
will be used to return a pointer to the decoder’s internal state. This is “opaque”; the user should not depend on what is inside the library’s state structure.
lzw_min_code_width
is an integer, at least 2 and at most 8, representing the width in bits of the data to be decoded. It must match the value used to encode the data, as described above for glzwe_init()
.
Returns: GLZW_OUT_OF_MEMORY
if the structure cannot be allocated, or GLZW_OK
otherwise.
Example: To begin decoding a stream of bytes, use:
void *state;
int ret = glzwd_init(&state, 8);
Decoding
int glzwd(void *state, const GLZWByte *in_ptr, GLZWByte *out_ptr,
GLZWUint *in_avail, GLZWUint *out_avail);
Call as many times as needed to decode a stream of bytes.
state
is a pointer to the structure initialized by glzwd_init()
.
in_ptr
is a pointer to a buffer containing bytes to be decoded.
out_ptr
is a pointer to an output buffer that will contain the resulting bytes.
in_avail
is a pointer to a count of bytes in the input buffer.
out_avail
is a pointer to a count of bytes left in the output buffer.
This function supports incremental decoding, i.e. a chunk of input or output at a time. The function will decode the data at the location referenced by in_ptr
, storing output bytes at the location referenced by out_ptr
. As bytes are consumed or emitted, in_avail
and out_avail
are decremented. The function returns when either one reaches zero and it attempts to consume or produce more data.
The caller must keep track of how far in_avail
and out_avail
are decremented, and must also ensure the in_ptr
and out_ptr
are updated as appropriate before the next call.
Returns:
GLZW_NO_INPUT_AVAIL
when in_avail
reaches zero. Caller must call again with at least one byte of input available.
GLZW_NO_OUTPUT_AVAIL
when out_avail
reaches zero. Caller must call again with at least one byte of output space available.
GLZW_OK
after the last byte of output has been stored.
GLZW_INVALID_DATA
if the input bytes form an invalid stream of LZW codes.
GLZW_INTERNAL_ERROR
can only occur if the state structure is tampered with.
Example: To decode a stream of bytes, you could use:
Uint incnt, incnt0, outcnt;
enum {inbufsize = 30, outbufsize = 10};
Byte inbuf[inbufsize], outbuf[outbufsize];
// ... open input and output to infp, outfp;
void *state;
int r = glzwd_init(&state, 8);
Byte *p = inbuf;
incnt = 0;
do {
outcnt = outbufsize;
incnt0 = incnt;
r = glzwd(state, p, outbuf, &incnt, &outcnt);
fwrite(outbuf, 1, outbufsize - outcnt, outfp);
if (r == GLZW_NO_INPUT_AVAIL) {
p = inbuf;
incnt = fread(p, 1, inbufsize, infp);
if (!incnt) {
printf("ERROR: empty or bad input.\n");
abort();
}
} else if (r == GLZW_NO_OUTPUT_AVAIL) {
p += incnt0 - incnt;
} else if (r != GLZW_OK) {
printf("ERROR r:%d\n", r);
abort();
}
} while (r != GLZW_OK);
// close files; call glzwd_end(state);
(A complete version of this program is in the examples
code directory.)
Notes on this example: This is similar to the example for encoding. When encoding, the encoder must be notified via the end_of_data
argument that all input has been supplied. When decoding, this is not needed because a well-formed coded input stream will have an end code. Instead, we observe that if the decoder expects more input (GLZW_NO_INPUT_AVAIL
) and no more input is available (incnt
is zero), then the input must be empty or not well-formed.
The above approach will detect if the input file is truncated, but if you also wish to detect if there is extra data after the END code, you could insert this after the do ... while()
:
if (incnt || fread(inbuf, 1, 1, infp)) {
printf("ERROR: junk after END\n");
abort();
}
Finishing
void glzwd_end(void *state);
Release the memory allocated by glzwd_init()
for the state structure.