Includes: <sys/types.h>, <stdint.h>, <stdbool.h>
Introduction
A chunk is a substring of a string. Chunks can be specified not only in
characters but also as the result of some limited parsing of the given
string, such as splitting it into items, lines or words.
Functions
- LEODoForEachChunk
- LEOGetChunkRanges
void LEODoForEachChunk(
    const char *inStr,
    size_t inBufSize,
    LEOChunkType inType,
    bool (*inChunkCallback)(
        const char* currStr,
        size_t currLen,
        size_t currStart,
        size_t currEnd,
        void* userData ),
    uint32_t itemDelimiter,
    void *userData );
Parameters
- inStr
  A UTF-8-encoded string to be split into chunks, or, for
  the byte chunk type, an arbitrary buffer of bytes.
- inBufSize
  The number of bytes in inStr to parse.
- inType
  The type of unit for the chunk items to pass to the
  callback.
- inChunkCallback
  A pointer to a function that will be called once for
  each chunk item. If this function returns false,
  parsing of the string is aborted; return true to keep
  going.
- itemDelimiter
  The item delimiter to use when inType is kLEOChunkTypeItem.
- userData
  A pointer to an arbitrary block of data that will be
  passed to inChunkCallback as its userData parameter.
  Use this to pass in context information that your
  callback needs. LEODoForEachChunk() makes no
  assumptions about this pointer and does nothing with
  it except pass it on.
Discussion
Determine all the chunks of a certain type in a string and call the given
callback for each chunk.
void LEOGetChunkRanges(
    const char *inStr,
    LEOChunkType inType,
    size_t inRangeStart,
    size_t inRangeEnd,
    size_t *outChunkStart,
    size_t *outChunkEnd,
    size_t *outDelChunkStart,
    size_t *outDelChunkEnd,
    uint32_t itemDelimiter );
Parameters
- inStr
  A UTF-8-encoded string to be parsed to determine the
  range of the given chunk, or, for the byte chunk type,
  an arbitrary zero-terminated string of bytes.
- inType
  The type of unit in which the chunk range is specified.
- inRangeStart
  The start offset of the range, expressed in the unit
  specified by inType.
- inRangeEnd
  The end offset of the range, expressed in the unit
  specified by inType.
- outChunkStart
  On return, set to a byte offset indicating the start
  of the payload of the given chunk, without any
  starting delimiters.
- outChunkEnd
  On return, set to a byte offset indicating the end
  of the payload of the given chunk, without any
  ending delimiters.
- outDelChunkStart
  On return, set to a byte offset indicating the start
  of the given chunk, including any starting or ending
  delimiters that would have to be deleted to remove
  this chunk completely from its string.
- outDelChunkEnd
  On return, set to a byte offset indicating the end
  of the given chunk, including any starting or ending
  delimiters that would have to be deleted to remove
  this chunk completely from its string.
- itemDelimiter
  The item delimiter to use when inType is kLEOChunkTypeItem.
Discussion
Determine which byte range corresponds to the given chunk range of inStr.
You get back two offset pairs: one for extracting the value from the string,
and a second for deleting the chunk, which may include one delimiter.
Constants
- gLEOChunkTypeNames
Discussion
String names for each chunk type.
Typedefs
- LEOChunkType
Constants
kLEOChunkTypeINVALID
Used in some cases to indicate that something
that *can* be a chunk reference is *not* a
chunk.
kLEOChunkTypeByte
Bytes. Taking a byte out of a string may tear
it out of the middle of a multi-byte UTF-8
character and leave the result invalid as a
string.
kLEOChunkTypeCharacter
UTF-8 characters. One character may use
several bytes, e.g. a Chinese or Japanese
character.
kLEOChunkTypeItem
Items are delimited by a certain character
(by default, a comma). If there are several
delimiters immediately in sequence, the
items between them are considered to be
empty. Items are assumed to be UTF-8 strings.
kLEOChunkTypeLine
Lines are delimited by a return or a line
feed. Otherwise, lines behave like items.
kLEOChunkTypeWord
Words are delimited by one or more spaces,
tabs, returns or line feeds, i.e. whitespace
characters. There can be no 'empty' words,
and punctuation is treated just like any
other alphabetic character.
Discussion
There are different kinds of chunks, and each is parsed differently depending
on which of these constants you pass in.
© 2010-2013 Uli Kusterer, all rights reserved.
Last Updated: 2019-02-10