Commit 49ebef9b authored by Eric Myhre

Working on some more prose about schema migration.

Signed-off-by: Eric Myhre <hash@exultant.us>
parent c209676a
@@ -86,10 +86,6 @@ not entirely a part of it.
schema stack probing, options for programmatic callbacks (for fancy migrations),
etc
### Slurping
// particular to ipldcbor.Node (and other serializables)?
Excursions
----------
@@ -103,8 +99,114 @@ Aside from this introduction, we won't use the terms "open" and "closed" much.
All schema types are *like* "closed"; but they're also inherently "open" since
we are of course handling data which may have existed outside of the schema.
In IPLD schema tooling, we always coerce data from its "open" nature to our
"closed" treatment of it by frontloading that check when handling the whole
document. As such, the distinction isn't particularly useful to make.
In IPLD, the data at the Data Model layer is always "open" in nature; and
at the Schema layer we treat it as "closed". As such, we don't spend much
further time with the "open"/"closed" distinction; it's simply "does this data
match the schema or not?".
Most go-ipld-prime APIs for handling typed data will frontload the schema match
checking -- by the time a handle to the document has been returned, the entire
piece of data is verified to match the schema.
There are also some optional ways to use the library which defer the
open->closed mapping until midway through your handling of the data;
in exchange, the schema mismatch error becomes something that needs handling
at that point in your code rather than up front.
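
As a concrete illustration, here's a minimal sketch of the frontloaded flow.
Every name in it is hypothetical -- `loadTyped` stands in for whatever
decode-and-check entry point you use; the point is only *where* the mismatch
error surfaces:

```go
package main

import "fmt"

// TypedDoc stands in for a handle to a fully schema-checked document.
type TypedDoc struct{}

// loadTyped stands in for any API that decodes a document and checks the
// *entire* thing against a schema before returning a handle to it.
func loadTyped(raw []byte) (*TypedDoc, error) {
	// ... decoding plus the whole-document schema match happens here ...
	if len(raw) == 0 { // placeholder for a real mismatch condition
		return nil, fmt.Errorf("schema mismatch")
	}
	return &TypedDoc{}, nil
}

func main() {
	doc, err := loadTyped([]byte(`{"name":"..."}`))
	if err != nil {
		fmt.Println("rejected up front:", err) // all mismatches surface here
		return
	}
	_ = doc // past this point, no schema mismatch errors are possible
}
```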
Schemas and Migration
---------------------
Fundamental to our approach to schemas is an understanding:
> Data Never Changes. Only our interpretation varies.
Data can be created under one schema, and interpreted later under another.
Data may predate or be created without any kind of schema at all.
All of this needs to be fine.
Moreover, before talking about migration, it's important to note that we
don't allow the comforting, easy notion that migration is a one-way process,
or can be carried out atomically at one magically instantaneous point in time.
Because data is immutable, and producing updated versions of it doesn't make
the older versions of the data go away, migration is less a thing that you do,
and more a state of mind. Migration has to be seamless at any time.
### Using Schema Match checking as Version Detection
We don't include any built-in/blessed concepts of versioning in IPLD Schemas.
It's not necessary: we have rich primitives which can be used to build
either explicit versioning or version detection, at your option.
Since it's easy to check if a schema fits over a piece of data, it's
easy to simply probe a series of schemas until finding one that fits.
Therefore, any constraint a schema makes has the potential to be used
for version detection!
There are a handful of recognizable patterns that are used frequently
(the union-based ones are sketched just after this list):
- Using a union to get nominative typing at the document root.
- e.g. `{"foo": {...}}`, using "foo" as the type+version hint.
- See the schema-schema for an example of this!
- Any union representation will do.
- Using a "version" field, plus manual unpacking.
- e.g. `{"version": "1.2.3", "data":{...}}`
- This can be implemented using unions of either envelope or inline representation!
- However, it might not be best to do so: this requires that the multiple
versions be implemented *within* your one schema! Typically it's more
composable and maintainable to have a separate schema per version.
- This can be implemented by double-unpacking. E.g., match once with a struct
with fields for version (keep it) and content (dev/null it); and match again
with a more complete schema chosen based on the version.
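
Sketches of the two union-based patterns above, written in the schema DSL and
held as Go string constants for illustration (all type names here are
invented):

```go
// Pattern 1: a keyed union at the document root; the single map key
// ("foo-v1" vs "foo-v2") acts as the type+version hint.
const rootUnionSchema = `
type Doc union {
	| FooV1 "foo-v1"
	| FooV2 "foo-v2"
} representation keyed
`

// Pattern 2: an envelope union; a "version" discriminant field sits
// beside a "data" content field, matching {"version": ..., "data": {...}}.
const envelopeUnionSchema = `
type Doc union {
	| FooV1 "1"
	| FooV2 "2"
} representation envelope {
	discriminantKey "version"
	contentKey "data"
}
`
```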
(Currently, this probing is left to the library user. More built-in features
around this are expected to come in the future.)
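
Until then, user-side probing can be a plain loop. A rough sketch, where the
`Schema` interface is invented to stand in for whatever decode-plus-schema-check
call your setup provides:

```go
import "fmt"

// Schema is a stand-in for anything that can attempt to bind raw serial
// data against one schema, returning an error on mismatch.
type Schema interface {
	Match(raw []byte) (interface{}, error)
}

// interpret probes candidate schemas in order (typically newest first);
// the first one whose match check passes "detects" the version.
func interpret(raw []byte, candidates []Schema) (interface{}, error) {
	for _, s := range candidates {
		if doc, err := s.Match(raw); err == nil {
			return doc, nil
		}
	}
	return nil, fmt.Errorf("data does not match any known schema")
}
```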
(In the future, we may also be able to construct some specialized schemas that
suggest jumping to another schema specifically and directly (rather than
linear probing); some research required. (Ideally this would work consistently
regardless of the ordering of fields in the arriving data, but there's some
tension between that and performance.) It might also be possible to construct
these as a user already!)
### Some comments on versioning-theory
There are different philosophies of versioning: namely, explicit versioning
versus version detection; which to use is a choice.

In short, explicit versioning tends towards fragility and is not particularly
fork/community/decentralization-friendly. Version detection -- or its
generalized cousin, *feature* detection -- is strictly more powerful, but
tends to require more thought to deploy effectively.
Explicit versioning tends to treat version numbers as a junk drawer, upon
which we can heap unbounded amounts of not-necessarily-related semantics.
This is a temptation which can be mitigated through diligence, but the
fundamental incentive is always there: like global variables in programming,
a document-global explicit version allows lazy coding and fosters presumptions.
Version/feature detection has the potential to become a fractal.
Using it well thus *also* requires diligence. However, there is no built-in
siren temptation to misuse it in the same way as explicit versioning; the
trade-offs in complexity tend to make themselves fairly pronounced, and
as such are relatively easily communicated.
It's impossible to make a blanket prescription of how to associate version
information with data; IPLD Schemas makes either choice viable.
### Strongly linked Schemas
It is possible to have a document which links directly to its own Schema!
Since IPLD Schemas are themselves representable in IPLD, it's outright trivial
to make an object containing a CID linking to a Schema.
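
For instance, a hypothetical sketch in the schema DSL (held as a Go string
constant; the `&` marks a link field, and `SchemaDocument` is an invented
name for whatever type describes a schema in your setup):

```go
const selfDescribingSchema = `
type SelfDescribing struct {
	schema &SchemaDocument # CID of the schema this data claims to match
	content Bytes          # the payload (shown as Bytes only for the sketch)
}
`
```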
Such a link may be useful -- in particular, it certainly solves any issue of
choosing unique version strings when using explicit versioning! -- but it is
also useless, by definition, for *migration*.
Migration means wanting to treat old data as new data matching a new schema.
Knowing which other schema is stated to match the data can be a useful input
to deciding how to treat that data, but -- unless you're okay using that *exact*
schema, and it's what your application logic is already built against -- that
knowledge doesn't fully specify what to do to turn that data into what you want.
### Actually Migrating!
// todo: more detailed behavior trees
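
As a rough placeholder until then: one shape this can take is to probe the
newest schema first and lift older matches into the current form. Everything
below is invented for illustration.

```go
import (
	"errors"
	"fmt"
)

// Two hypothetical versions of the same document shape.
type PersonV1 struct{ Name string }
type PersonV2 struct {
	Name     string
	Nickname string // added in v2; left empty when lifting v1 data
}

// matchV1 and matchV2 stand in for "decode these bytes and check them
// against the v1/v2 schema"; the real calls depend on codec and setup.
func matchV1(raw []byte) (*PersonV1, error) { return nil, errors.New("stub: no match") }
func matchV2(raw []byte) (*PersonV2, error) { return nil, errors.New("stub: no match") }

// interpretPerson treats migration as interpretation: the stored data
// never changes; only our view of it is lifted to the current shape.
func interpretPerson(raw []byte) (*PersonV2, error) {
	if v2, err := matchV2(raw); err == nil {
		return v2, nil // already current; nothing to do
	}
	if v1, err := matchV1(raw); err == nil {
		return &PersonV2{Name: v1.Name}, nil // lift v1 into the v2 shape
	}
	return nil, fmt.Errorf("data matches no known schema version")
}
```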