gendemo now includes own scalars, & design doc.

Check out the 'HACKME_scalars.md' file for why this commit represents a bunch of nontrivial decisions. There are a lot of possible ways the compile failures here could be fixed, but some of them would have bigger consequences of long-term weirdness than others. Or at least, it certainly looks that way from here. Maybe we'll see.

The scalar implementations are almost exact replicas of the basicnode implementations. Oddly, they haven't been exported yet. This may change (again, see discussion in the HACKME_scalars document), in which case the code will also change to use wrapper structs.

Note one other divergence: the error messages have different content for the typename: we mention the package, since this will probably be typical and useful for codegen-created types in general.

The gendemo package almost compiles again. The rest of the fixes aren't related to this topic, so they come in the next commit.

Commit bddec04c authored Feb 23, 2020 by Eric Myhre
4 changed files
--- a/_rsrch/nodesolution/node/gendemo/HACKME_scalars.md
+++ b/_rsrch/nodesolution/node/gendemo/HACKME_scalars.md
+What's the deal with scalars, anyway?
+=====================================
+Two sorts of scalars
+--------------------
+There are two sorts of scalars that show up in codegen:
+- 1: scalars that are just the plain kind (e.g. "string", not even named);
+- 2: scalars that have named types.
+Plain scalars can't have any special rules or semantics attached to them.
+Named types with scalar kinds (aka a "typedef") **can** have additional rules and semantics attached to them.
+Let's talk about named scalars first, because it's clearer that there's fun there.
+### named scalars
+Named scalars cause a type to be generated.
+That type information is part of their identity (practically speaking: affects their definition of equality).
+#### named scalars are never equal even if their contents are
+It stands to reason that named scalars can't be freely interchanged.
+If you have a schema:
+```ipldsch
+type Foo string
+type Bar string
+```
+... then you'll get codegen output code with an exported type for each:
+```go
+type Foo struct{ x string }
+/*...*/
+type Bar struct{ x string }
+/*...*/
+```
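+... and clearly, `Foo{"asdf"}` and `Bar{"asdf"}` are not equal (in Go, comparing them directly does not even compile, and comparing them as interface values yields `false`).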
+#### named scalars appear in specialized method argument types and return types
+Just like any other named type, named scalars will appear in specialized methods
+which are exported on codegen'd types.
+For example, if you have a schema:
+```ipldsch
+type Foo string
+type Bar string
+type Foomp map {Foo:Bar}
+```
+... then you'll get codegen output code which includes a method on Foomp:
+```go
+func (x *Foomp) Lookup(k *Foo) (*Bar) { /*...*/ }
+```
+Such specialized methods are often much shorter, much more efficient to execute,
+and involve much less error handling to use than their more generalized
+counterparts on the `ipld.Node` interface.
+Note that when named scalars appear in the signatures of specialized methods,
+they always appear as pointers.  They will never be `nil`, but there is still
+a reason that pointers are used here, and it's based on performance.
+(The details don't matter as a user, but: it means if those values need to be
+regarded as the `ipld.Node` interface again in the future, that boxing is
+inexpensive since we already have a (heap-escaped long ago) pointer.
+By contrast, copying by value in more places is likely to result in more
+heap escapes and thus additional undesirable new allocation costs in the
+(entirely common!) case that the values end up handled as `ipld.Node` later.)
+#### named scalars have a specialized method which unboxes them to a native primitive unconditionally
+Every named scalar type has a specialized unbox method corresponding to its kind.
+For example, for a `type Foo string`, there will be a `func (f Foo) String() string` method
+(in addition to the `func (f Foo) AsString() (string, error)` method,
+which does the same thing but is stuck presenting an error due to interface conformance even though we know that it's statically impossible).
+#### named scalars can have additional methods attached to them
+It's possible for users of codegen to attach additional methods to the types
+generated for a named scalar.
+This can be either done for purely aesthetic/ergonomic purposes particular
+to the user's exact product, or, as part of some extended library features.
+For example, we plan to support extended features like "validation" methods
+via detecting when a user adds a `Validate() error` method to a generated type.
+### plain scalars
+Plain scalars also cause a type to be generated;
+one type for each kind in the Data Model is sufficient.
+Plain scalars show up in codegen output packages almost exactly as if
+there was a short preamble in every schema:
+```ipldsch
+type Int int
+type Bool bool
+type Float float
+type String string
+```
+#### note about schema syntax
+There's an issue about capitalization that's somewhat unresolved in schemas:
+namely, is `type Fwee struct { someField string }` allowed, or a parse error?
+This syntax is questionable because it means some of the scalar kind identifier
+keywords are allowed in the same place as type names,
+and it's potentially confusing because when we come to interacting with the
+generated output code in golang, we still have `String`-with-a-capital-S
+as a type identifier.
+At any rate, it seems clear that you can mentally capitalize the 's'
+at any time you see this debatable syntax.
+#### plain scalars appear in specialized method argument types and return types
+This is the same story as for named scalars.
+For example, if you have a schema:
+```ipldsch
+type Foomp map {String:String}
+```
+... then you'll get codegen output code which includes a method on Foomp:
+```go
+func (x *Foomp) Lookup(k *String) (*String) { /*...*/ }
+```
+The type might carry less semantic information than it does when a
+named scalar shows up in the same position, but we still use a generated
+type (and a pointer) here for two reasons: first of all, and more simply,
+consistency; but secondly, for the same performance reasons as applied
+to named scalars (if we need to treat this value as an `ipld.Node` again
+in the future, it's much better if we already have a heap pointer rather
+than a bare primitive value (`runtime.convT*` functions are not your
+favorite thing to see in a pprof flamegraph)).
+(FUTURE: this is still worth review.  We might actually want to use
+bare primitives in a lot of these cases, because surely, if you're about
+to want to treat something as an `ipld.Node` again, then you can use the
+generalized methods conforming to `ipld.Node` which already yield that...?
+We'll get more information and impressions about this after trying to use
+codegen in bulk (especially the specialized methods).)
+#### plain scalars do not allow additional method attachments
+While we can't *stop* developers from modifying the source code emitted by codegen,
+adding a method to any of the plain scalars is intensely discouraged.
+Nothing sensible or good can come of trying to attach a "Validate" method
+to something like the `String` type.  Don't do it.
+Code reuse for plain scalars
+----------------------------
+We *always* need some type that can contain a plain scalar while also
+implementing all the `ipld.Node` methods.  Even if we didn't export it
+or show it in any method signatures anywhere at all, we'd *still* need it
+for internal implementation of other types, because it's important those
+types be able to return a pointer to their fields in their implementations of
+the `ipld.Node` contract (otherwise, they'd be terribly slow and alloc-heavy).
+### can we reuse another package's plain scalars?
+Since there's no functional difference between the plain scalars in a schema
+and the scalars implementation from another package that's untyped in the first place,
+can we reuse some code from an untyped package in codegen output packages?
+No.
+(Or: "maybe, conditionally, and it would have a lot of caveats and make the
+untyped package we try to hitch a ride on become significantly weirder, so...
+it's probably not worth it".)
+The reason to desire this is so there's less (admittedly quite duplicative) code
+in the package emitted by using codegen.
+However, there are *many* "cons" which outweigh that single "pro":
+- This would require the untyped package to export their concrete implementation types.
+	- This is the *only* reason those implementation types would need to be exported, which is a concerning smell all by itself.
+	- In the case where we consider using the 'basicnode' package in particular:
+		- Exporting those types allows creation by casting, which exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things which will make refactors harder.
+			- We don't like allowing casting for creating values in general for reasons explored well in the go-cid refactors to use wrapper structs: if casting is possible, it's far too easy for an end-user to write shoddy code which dodges all constructors and validation logic.
+		- Exporting those types allows unboxing by casting, which again exposes an API surface that's not conventional (nor necessarily even possible) for other packages, and will thus be likely to create confusion as well as create multiple ways of doing things which will make refactors harder.
+			- Since we're talking about scalars and they're essentially copy-by-value (except for bytes -- but we give up and rely on "lawful" code for those anyway, since defensive copies are completely nonviable in performance terms), this doesn't create incorrectness issues... but it's still not *good*.
+			- Note that while casting to concrete types exported by the output package of codegen is considered acceptable, this is a different beast: you still can't get the raw content out without using at least one more unboxing method; and, if you're casting or doing a type switch with a type in a codegen package, it should already instantly be clear that your code is no longer general-purpose, and this will surprise no one.
+		- ...And while the above two are true only because the implementation is by typedefs and they could be fixed by using a wrapper struct... that fix would have exactly the effect of making reuse impossible anyway, since the field in that wrapper struct would need to be unexported (otherwise, immutability would then in turn trivially shatter).
+		- The implementation of the scalar for link kinds can't be reused anyway (it *does* use a wrapper struct already, and needs to; Go doesn't permit declaring methods on a named type whose underlying type is an interface), adding yet more inconsistency and jagged edges to the picture.
+		- The "more unnecessarily(-for-end-user-perspectives) exported symbols" code smell counts about 10x as hard for this package in particular, since it's often one of the first ones a newcomer to this library will see: there shouldn't be weird designs with elaborate and far away justifications poking up here.
+- Reusing concrete types between packages makes it more likely incautious users could write code that uses native equality on scalars and get away with it *sometimes*.  Since this is still incorrect and would sometimes fail in fully general code, it's better if code like this flunks out as early as possible, which results in a better ecosystem overall.
+- We like it when error messages can include a type name.  It's marginally better for that to be something like "gendemo.String" ('gendemo' being consistent with whatever the rest of the package also says) than just bare "string".
+There are also a few bits that aren't entirely known (at least, at the time of this writing):
+namely, how 'any' types are going to be handled in codegen.
+Probably, though, the answer is: it's just treated as 'ipld.Node',
+and the codegen package doesn't export *any* more types which regard this situation because that's already sufficient.
+Long story short?  It's better to have plain scalar types in codegen output,
+even if they look somewhat duplicative,
+because trying to do anything fancier either fails outright
+or spawns ridiculously detailed epicycles of complexity.
+Emitting the plain scalar types in codegen output
+is *more consistent* in almost every way,
+will generate less cognitive load for users,
+and just plain *works unconditionally*.
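
Editor's sketch (not part of the commit): the HACKME's points about type identity and about creation by casting can be illustrated with a small standalone Go program. `Foo` and `Bar` mirror the wrapper-struct shape shown in the document. `typedefString`, `newTypedefString`, and `sameValue` are hypothetical names invented for this illustration. Because everything lives in one file here, the unexported field of a wrapper struct is still reachable; the protection the document describes only applies across package boundaries.

```go
package main

import (
	"errors"
	"fmt"
)

// Foo and Bar follow the wrapper-struct shape from the HACKME;
// they are illustrative, not real codegen output.
type Foo struct{ x string }
type Bar struct{ x string }

// typedefString is a hypothetical typedef-style scalar
// (the same shape as plainString in this commit).
type typedefString string

// newTypedefString is a hypothetical validating constructor.
func newTypedefString(s string) (typedefString, error) {
	if s == "" {
		return "", errors.New("typedefString must not be empty")
	}
	return typedefString(s), nil
}

// sameValue compares two values as interfaces, which is how
// general-purpose code would end up comparing nodes.
func sameValue(a, b interface{}) bool { return a == b }

func main() {
	// Different named types never compare equal, even with equal contents.
	fmt.Println(sameValue(Foo{"asdf"}, Bar{"asdf"})) // false
	fmt.Println(sameValue(Foo{"asdf"}, Foo{"asdf"})) // true

	// The constructor rejects the empty string...
	_, err := newTypedefString("")
	fmt.Println(err != nil) // true

	// ...but a plain conversion creates the value anyway,
	// dodging the constructor entirely.
	bypass := typedefString("")
	fmt.Println(len(bypass)) // 0
}
```

If `typedefString` were exported from another package, the conversion in the last step would be available to every caller, which is the "creation by casting" concern raised in the list above.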
--- a/_rsrch/nodesolution/node/gendemo/int.go
+++ b/_rsrch/nodesolution/node/gendemo/int.go
+package gendemo
+import (
+	ipld "github.com/ipld/go-ipld-prime/_rsrch/nodesolution"
+	"github.com/ipld/go-ipld-prime/_rsrch/nodesolution/node/mixins"
+)
+var (
+	_ ipld.Node          = plainInt(0)
+	_ ipld.NodeStyle     = Style__Int{}
+	_ ipld.NodeBuilder   = &plainInt__Builder{}
+	_ ipld.NodeAssembler = &plainInt__Assembler{}
+)
+// plainInt is a simple boxed int that complies with ipld.Node.
+type plainInt int
+// -- Node interface methods -->
+func (plainInt) ReprKind() ipld.ReprKind {
+	return ipld.ReprKind_Int
+}
+func (plainInt) LookupString(string) (ipld.Node, error) {
+	return mixins.Int{"gendemo.Int"}.LookupString("")
+}
+func (plainInt) Lookup(key ipld.Node) (ipld.Node, error) {
+	return mixins.Int{"gendemo.Int"}.Lookup(nil)
+}
+func (plainInt) LookupIndex(idx int) (ipld.Node, error) {
+	return mixins.Int{"gendemo.Int"}.LookupIndex(0)
+}
+func (plainInt) LookupSegment(seg ipld.PathSegment) (ipld.Node, error) {
+	return mixins.Int{"gendemo.Int"}.LookupSegment(seg)
+}
+func (plainInt) MapIterator() ipld.MapIterator {
+	return nil
+}
+func (plainInt) ListIterator() ipld.ListIterator {
+	return nil
+}
+func (plainInt) Length() int {
+	return -1
+}
+func (plainInt) IsUndefined() bool {
+	return false
+}
+func (plainInt) IsNull() bool {
+	return false
+}
+func (plainInt) AsBool() (bool, error) {
+	return mixins.Int{"gendemo.Int"}.AsBool()
+}
+func (n plainInt) AsInt() (int, error) {
+	return int(n), nil
+}
+func (plainInt) AsFloat() (float64, error) {
+	return mixins.Int{"gendemo.Int"}.AsFloat()
+}
+func (plainInt) AsString() (string, error) {
+	return mixins.Int{"gendemo.Int"}.AsString()
+}
+func (plainInt) AsBytes() ([]byte, error) {
+	return mixins.Int{"gendemo.Int"}.AsBytes()
+}
+func (plainInt) AsLink() (ipld.Link, error) {
+	return mixins.Int{"gendemo.Int"}.AsLink()
+}
+func (plainInt) Style() ipld.NodeStyle {
+	return Style__Int{}
+}
+// -- NodeStyle -->
+type Style__Int struct{}
+func (Style__Int) NewBuilder() ipld.NodeBuilder {
+	var w plainInt
+	return &plainInt__Builder{plainInt__Assembler{w: &w}}
+}
+// -- NodeBuilder -->
+type plainInt__Builder struct {
+	plainInt__Assembler
+}
+func (nb *plainInt__Builder) Build() ipld.Node {
+	return nb.w
+}
+func (nb *plainInt__Builder) Reset() {
+	var w plainInt
+	*nb = plainInt__Builder{plainInt__Assembler{w: &w}}
+}
+// -- NodeAssembler -->
+type plainInt__Assembler struct {
+	w *plainInt
+}
+func (plainInt__Assembler) BeginMap(sizeHint int) (ipld.MapNodeAssembler, error) {
+	return mixins.IntAssembler{"gendemo.Int"}.BeginMap(0)
+}
+func (plainInt__Assembler) BeginList(sizeHint int) (ipld.ListNodeAssembler, error) {
+	return mixins.IntAssembler{"gendemo.Int"}.BeginList(0)
+}
+func (plainInt__Assembler) AssignNull() error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignNull()
+}
+func (plainInt__Assembler) AssignBool(bool) error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignBool(false)
+}
+func (na *plainInt__Assembler) AssignInt(v int) error {
+	*na.w = plainInt(v)
+	return nil
+}
+func (plainInt__Assembler) AssignFloat(float64) error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignFloat(0)
+}
+func (plainInt__Assembler) AssignString(string) error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignString("")
+}
+func (plainInt__Assembler) AssignBytes([]byte) error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignBytes(nil)
+}
+func (plainInt__Assembler) AssignLink(ipld.Link) error {
+	return mixins.IntAssembler{"gendemo.Int"}.AssignLink(nil)
+}
+func (na *plainInt__Assembler) AssignNode(v ipld.Node) error {
+	if v2, err := v.AsInt(); err != nil {
+		return err
+	} else {
+		*na.w = plainInt(v2)
+		return nil
+	}
+}
+func (plainInt__Assembler) Style() ipld.NodeStyle {
+	return Style__Int{}
+}
--- a/_rsrch/nodesolution/node/gendemo/map_K2_T2.go
+++ b/_rsrch/nodesolution/node/gendemo/map_K2_T2.go
@@ -15,12 +15,6 @@ import (
 	type T2 struct { a int, b int, c int, d int }
 */
-// Note how we're not able to use `int` in the structs, but instead `plainInt`: this is so we can take address of those fields directly and return them as nodes.
-//  We don't currently have concrete exported types that allow us to do this.  Maybe we should?
-type plainString string // FIXME placeholder, doesn't actually implement Node.  we need our own type in the same package for this.
-type plainInt int       // FIXME placeholder, doesn't actually implement Node.  we need our own type in the same package for this.
 type K2 struct{ u, i plainString }
 type T2 struct{ a, b, c, d plainInt }
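
Editor's sketch (not part of the commit): the reason the struct fields are `plainInt` rather than `int` is that a pointer to such a field can be handed out directly as a node. The program below is a minimal standalone illustration. `Node` is a hypothetical one-method stand-in for `ipld.Node`, and `fieldA` is an invented accessor. The mutation at the end exists only to make the aliasing visible; real nodes are meant to be treated as immutable.

```go
package main

import "fmt"

// Node is a hypothetical, heavily reduced stand-in for ipld.Node.
type Node interface {
	AsInt() (int, error)
}

// plainInt has the same shape as the type added in int.go.
type plainInt int

// AsInt uses a value receiver, so both plainInt and *plainInt satisfy Node.
func (n plainInt) AsInt() (int, error) { return int(n), nil }

// T2 matches the struct in map_K2_T2.go.
type T2 struct{ a, b, c, d plainInt }

// fieldA (hypothetical) returns a Node that points straight at the field.
// Because the interface holds a pointer into the existing struct,
// no new boxed copy of the integer is needed.
func (t *T2) fieldA() Node { return &t.a }

func main() {
	rec := &T2{a: 1, b: 2, c: 3, d: 4}
	n := rec.fieldA()

	// Demonstration only: changing the field shows that n aliases it.
	rec.a = 42
	v, _ := n.AsInt()
	fmt.Println(v) // 42
}
```

With a bare `int` field there would be no type to hang the `Node` methods on, so returning the field as a node would require converting the value into some boxed node type first.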

--- a/_rsrch/nodesolution/node/gendemo/string.go
+++ b/_rsrch/nodesolution/node/gendemo/string.go
+package gendemo
+import (
+	ipld "github.com/ipld/go-ipld-prime/_rsrch/nodesolution"
+	"github.com/ipld/go-ipld-prime/_rsrch/nodesolution/node/mixins"
+)
+var (
+	_ ipld.Node          = plainString("")
+	_ ipld.NodeStyle     = Style__String{}
+	_ ipld.NodeBuilder   = &plainString__Builder{}
+	_ ipld.NodeAssembler = &plainString__Assembler{}
+)
+// plainString is a simple boxed string that complies with ipld.Node.
+// It's useful for many things, such as boxing map keys.
+//
+// The implementation is a simple typedef of a string;
+// handling it as a Node incurs 'runtime.convTstring',
+// which is about the best we can do.
+type plainString string
+// -- Node interface methods -->
+func (plainString) ReprKind() ipld.ReprKind {
+	return ipld.ReprKind_String
+}
+func (plainString) LookupString(string) (ipld.Node, error) {
+	return mixins.String{"gendemo.String"}.LookupString("")
+}
+func (plainString) Lookup(key ipld.Node) (ipld.Node, error) {
+	return mixins.String{"gendemo.String"}.Lookup(nil)
+}
+func (plainString) LookupIndex(idx int) (ipld.Node, error) {
+	return mixins.String{"gendemo.String"}.LookupIndex(0)
+}
+func (plainString) LookupSegment(seg ipld.PathSegment) (ipld.Node, error) {
+	return mixins.String{"gendemo.String"}.LookupSegment(seg)
+}
+func (plainString) MapIterator() ipld.MapIterator {
+	return nil
+}
+func (plainString) ListIterator() ipld.ListIterator {
+	return nil
+}
+func (plainString) Length() int {
+	return -1
+}
+func (plainString) IsUndefined() bool {
+	return false
+}
+func (plainString) IsNull() bool {
+	return false
+}
+func (plainString) AsBool() (bool, error) {
+	return mixins.String{"gendemo.String"}.AsBool()
+}
+func (plainString) AsInt() (int, error) {
+	return mixins.String{"gendemo.String"}.AsInt()
+}
+func (plainString) AsFloat() (float64, error) {
+	return mixins.String{"gendemo.String"}.AsFloat()
+}
+func (x plainString) AsString() (string, error) {
+	return string(x), nil
+}
+func (plainString) AsBytes() ([]byte, error) {
+	return mixins.String{"gendemo.String"}.AsBytes()
+}
+func (plainString) AsLink() (ipld.Link, error) {
+	return mixins.String{"gendemo.String"}.AsLink()
+}
+func (plainString) Style() ipld.NodeStyle {
+	return Style__String{}
+}
+// -- NodeStyle -->
+type Style__String struct{}
+func (Style__String) NewBuilder() ipld.NodeBuilder {
+	var w plainString
+	return &plainString__Builder{plainString__Assembler{w: &w}}
+}
+// -- NodeBuilder -->
+type plainString__Builder struct {
+	plainString__Assembler
+}
+func (nb *plainString__Builder) Build() ipld.Node {
+	return nb.w
+}
+func (nb *plainString__Builder) Reset() {
+	var w plainString
+	*nb = plainString__Builder{plainString__Assembler{w: &w}}
+}
+// -- NodeAssembler -->
+type plainString__Assembler struct {
+	w *plainString
+}
+func (plainString__Assembler) BeginMap(sizeHint int) (ipld.MapNodeAssembler, error) {
+	return mixins.StringAssembler{"gendemo.String"}.BeginMap(0)
+}
+func (plainString__Assembler) BeginList(sizeHint int) (ipld.ListNodeAssembler, error) {
+	return mixins.StringAssembler{"gendemo.String"}.BeginList(0)
+}
+func (plainString__Assembler) AssignNull() error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignNull()
+}
+func (plainString__Assembler) AssignBool(bool) error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignBool(false)
+}
+func (plainString__Assembler) AssignInt(int) error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignInt(0)
+}
+func (plainString__Assembler) AssignFloat(float64) error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignFloat(0)
+}
+func (na *plainString__Assembler) AssignString(v string) error {
+	*na.w = plainString(v)
+	return nil
+}
+func (plainString__Assembler) AssignBytes([]byte) error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignBytes(nil)
+}
+func (plainString__Assembler) AssignLink(ipld.Link) error {
+	return mixins.StringAssembler{"gendemo.String"}.AssignLink(nil)
+}
+func (na *plainString__Assembler) AssignNode(v ipld.Node) error {
+	if v2, err := v.AsString(); err != nil {
+		return err
+	} else {
+		*na.w = plainString(v2)
+		return nil
+	}
+}
+func (plainString__Assembler) Style() ipld.NodeStyle {
+	return Style__String{}
+}