In C++, strings are backed by character arrays; it is easiest to bootstrap our
string theory by using *null-terminated*
list Zs as the Coq model type - which
is what the
zstring module does. Atop these
zstrings we can derive the
concept of
cstrings which re-express the same
zstring representation predicates,
utilities and theory in terms of a
BS.t (bytestring) Coq model carrier type. You
should *almost always* use a
cstring concept instead of a
zstring concept
if possible, as the automation has been tuned to work best for the former; there are
sufficient reasoning principles for converting between the two if necessary.
Notes
- We omit discussion of zstrings from here on out due to their aforementioned
subsumption by cstrings.
- Whereas cstring.t := BS.t *does not* contain a null-terminator,
zstring.t := list Z *does* contain a null-terminator; the conversion
utilities deal with appropriately inserting/stripping this null-terminator.
- The `'` variant of the WF predicate uses an alternative formulation which
was more useful during a previous iteration of some downstream proofs.
We retain this alternative formulation so that we can avoid re-proving that
code, but you should *always* prefer to use WF over WF'. However, there
are sufficient reasoning principles for seamlessly switching between the two.
- Due to scope complexities, some definitions are actually notations which
wrap an identically-named term which is prefixed by an underscore; use this
underscore-prefixed version if Coq complains about a notation not being
fully applied.
cstring.t and cstring.WF - Well Formed BS.ts (bytestrings) as a Carrier Type for C++ Strings
While it is useful to take advantage of Coq's bytestrings to model C++ strings,
this is not quite strong enough. In particular, C++ strings are null-terminated
and thus cannot contain any intervening zeros. While we could've bundled
this well-formedness side-condition into the carrier type, we decide instead
to separate it out into a predicate
cstring.WF : cstring.t -> Prop. This has
the added benefit of easing the proof of the core
cstring.t theory since we
avoid the use of dependent records.
The following is a rough enumeration of the definitions and theory related to
cstring.t and
cstring.WF:
- cstring.t := BS.t: bytestrings as a carrier type for C++ strings
- cstring.WF (cstr : cstring.t) : Prop := <cstr doesn't contain the zero-byte>:
well-formedness of cstring.ts.
- cstring.size (cstr : cstring.t) : Z
:= <length of the string *including* the null-terminator>
- cstring.strlen (cstr : cstring.t) : Z
:= <length of the string *excluding* the null-terminator>
- This function mirrors the behavior of `std::strlen` in C++
- cstring.from_zstring/cstring.to_zstring: utilities for converting back and
forth between cstring.ts and zstring.ts while appropriately inserting/
stripping the null-terminator to accomodate the fact that we exlude it for
cstrings.
cstring.R - Fixed Sized Strings
Very often we are interested in dealing with a string as a fixed(-size) value,
and
cstring.R (q : Qp) (cstr : cstring.t) : Rep reflects
q-ownership of the
null-terminated character-array which contains the *non-null-terminated* and
cstring.WF bytestring
cstr. This rep is useful for dealing with string constants
that are inserted by the compiler and passed around/read during execution;
while it would be reasonably straightforward to add theory for modifying characters
in a random-access way, the
BS.t theory for this is somewhat lacking and so
the use-case languishes for now.
cstring.bufR - Dynamic Sized Strings Backed by Fixed Sized Character Arrays
String constants read into a fixed-sized buffer present a slightly different use
case. If the buffer has size
sz, it contains some
cstring.t cstr which is
cstring.WF, and
cstring.size cstr <= sz then the "weakest" valid representation
is:
```
cstring.R q cstr **
Exists l,
.
Tuchar ! cstring.size cstr |-> arrayR Tuchar (fun _ => anyR q Tuchar) l **
| cstring.size cstr + length l = sz |
```
However, we want to be able to easily add new characters to our buffer and -
ideally - remove them in a disciplined way. Given that our existing definitions
require our string to be null-terminated in memory, it is not obvious how we would
justify such `push_back`/`pop_back` reasoning principles while avoiding an explicit
binding and characterization of the tail of the buffer. To address this we we
restrict our definition of
cstring.bufR q sz cstr to use
primR Tuchar q 0
instead of
anyR q Tuchar.
With this approach, a
cstring.bufR q sz cstr fact allows you to `push_back` a new
character to the
cstring.t model by writing it to the
cstring.bufR
at position
.[Tuchar ! cstring.size cstr]. Furthermore, the tail of the
cstring.t model can be removed by zeroing that portion of the
cstring.bufR.
Note: at the moment the theory supporting this `push_back`/`pop_back` use-case
is not fully fleshed out, but ongoing verification efforts will allow us
to backfill the needed things.