- 12 Mar, 2021 2 commits
-
-
Eric Myhre authored
I'm still not fully convinced this is always a win for performance, but as discussed in https://github.com/ipld/go-ipld-prime/pull/143#discussion_r582335774 , there was past evidence, and reproducing that takes work -- so, we're going to try to be relatively conservative here and keep this logical branch behavior in place for now. (The reason this may be worth re-examining is that the change to hashing interface probably gets rid of one big source of copies. That doens't answer the holistic performance question alone, though; it just reopens it.)
-
Eric Myhre authored
-
- 25 Feb, 2021 5 commits
-
-
Eric Myhre authored
And, make a package which can be imported to register "all" of the multihashes. (Or at least all of them that you would've expected from go-multihash.) There are also packages that are split roughly per the transitive dependency it brings in, so you can pick and choose. This cascaded into more work than I might've expected. Turns out a handful of the things we have multihash identifiers for actually *do not* implement the standard hash.Hash contract at all. For these, I've made small shims. Test fixtures across the library switch to using sha2-512. Previously I had written a bunch of them to use sha3 variants, but since that is not in the standard library, I'm going to move away from that so as not to re-bloat the transitive dependency tree just for the tests and examples.
-
Eric Myhre authored
This significantly reworks how linking is handled. All of the significant operations involved in storing and loading data are extracted into their own separate features, and the LinkSystem just composes them. The big advantage of this is we can now add as many helper methods to the LinkSystem construct as we want -- whereas previously, adding methods to the Link interface was a difficult thing to do, because that interface shows up in a lot of places. Link is now *just* treated as a data holder -- it doesn't need logic attached to it directly. This is much cleaner. The way we interact with the CID libraries is also different. We're doing multihash registries ourselves, and breaking our direct use of the go-multihash library. The big upside is we're now using the familiar and standard hash.Hash interface from the golang stdlib. (And as a bonus, that actually works streamingly; go-mulithash didn't.) However, this also implies a really big change for downstream users: we're no longer baking as many hashes into the new multihash registry by default.
-
Eric Myhre authored
This commit is a merge commit, but using the "ours" strategy -- meaning that the contents are functionally ignored and have no impact on the branch they're being merged into. Something went off the rails at some point here, and I'm not sure what, but I'm calling it quits and this work will have to be rebooted in a fresh set of patches sometime in the future. The "hackme" file discussing the choices: probably salvagable. It's not perfect (some sections read harsher than others; should redo it with a table that applies the same checklist to each option), and obviously the final conclusion it came to is questionable, but most of the discussion is probably good. The "carrier types" sidequest: extremely questionable. The micro-codegenerator I made for that lives on in its own repo: see https://github.com/warpfork/go-quickimmut . But overall, I don't think this went well. With these, we got immutability, and we got it without an import cycle or any direct dependence on the IPLD Schema codegen output, but... we lost other things, like the ability to differentiate empty lists from nil lists, and it drops map ordering again, etc. All of these are problems solved by the IPLD Schema codegen, and losing those features again was just... painful. And the more places we're stuck flipping between various semantics for representing "Maybe", the vastly more likely it becomes to be farming bugs; this approach hit that. Tracking "Is this nil or empty at this phase of its life" got very confusing and is one of the main reasons I'm feeling it's probably wise to put this whole thing down. The first time I re-wrote a custom order-preservation feature, I was like "ugh, golang, but okay, fine". The second and third and fourth times I ran into it, and realized not doing the work again would result in randomized error message orders and muck up my attempts at unit tests for error paths... I was less happy. I still don't know what the solution to this is, other than trying to use the IPLD Schema codegen in a cyclic way, because it *does* solve these problems. All of the transformation code in the dmt package which flipped things into the compiler structures: awful. To be fair, I expected that to be awful. But it was really awful. The "carrier types" stuff all being attached to a Compiler type, for sheer grouping: extremely questionable. If we were going to have this architecture of types|compiler|dmt overall, I'd hands down absolutely full-stop definitely no-question want the compiler to just be its own package. Unfortunately, as you already know, golang: if we want the types metadata types to be immutable, all the code for building them has to be in-package. Any form of namespacing at all would be an adequate substitute for a full-on package boundary. Possibly even just a feature that allowed some things in a package to be grouped together in the *documentation* would be satiating! The "compiler_rules.go" file: really quite good! Not entirely certain of the "first rule in a set to flunk causes short-circuit" tactic, because there's at least one case where that resulted in under-informative error messages that could've reasonably identified two issues in one pass if not for the short-circuit. But otherwise, the table-driven approach seemed promising. Holistically: it seemed like having a "compiler" system that was separate from the "dmt" data holder types would be a nice separation of concerns. When actually writing it: no, it was not. The compiler system ended up needing to pass through almost the exact range of semantics that the dmt expresses (whether good or bad), so that when the validator system was built on the compiler's data types, it could provide reasonable responses to any issues in the data that might've originated from the dmt format (and in turn this ultimately matters for end-user experience). With that degree of coupling, the compiler system ends up forced to have a very limited and sharp-edged API that's not at all natural, and certainly doesn't add any value. It's possible this is inevitable (we didn't start this quest in order to get a nice golang API, we started it to get rid of a dang import cycle!), but it definitely generated pain. I'm frankly not sure what the path forward here is. Having the validation logic attach to the dmt would solve the coupling pressure; but the tradeoff would be there's no validation logic on the constructors which produce the in-memory type info! Is that okay? I dunno. It would definitely make me grit my teeth, but maybe it's one of the least bad options left, seeing as how many other angles of attack seem to have turned pretty sour now. Another angle that deserves more thought is cyclebreaking by removing self-description from generated types (this gets a mention in the hackme doc in the diff already, but perhaps isn't studied far enough). Whatever it is: it's going to start as a new body of work. This well here is dry. Also documented and discussed in the web at https://github.com/ipld/go-ipld-prime/pull/144 .
-
Eric Myhre authored
I'm about to call it quits on this. I'm not sure exactly where this got off the rails, but I'm not happy about how its going, and with this diff, I've reached enough "huh,hmm" moments that I think it's going to end up being less work restarting on a cleaner approach than it's going to be work finishing this, fixing all the bugs resulting from the mess of maybeism, and then maintaining it. Comments in the diff body show the exact moment of my exasperation reaching a critical threshhold. I'm really not happy with the golang typesystem today. A more systematic review of this stack of diffs will follow in the subsequent commit message. It will be a merge-ignore commit.
-
Eric Myhre authored
This'll probably be a lot of grind to refactor, but otherwise isn't a huge jump semantically. I waffled on sprinting straight ahead to writing schema DMT as JSON, and using those as the test fixture input. On the one hand: it would be nice to get that much closer to purely textual test fixtures. On the other hand, it's really verbose, and I don't want to (long run, we'll use schema DSL for this, and while we'll also save the DMT fixtures, they should ideally probably mostly be generated and just human-checked rather than human-penned); and also, since we don't have implicits implemented correctly yet, we'd need to update all that JSON again anyway once that feature is complete... which pushes it overall to a net "nah".
-
- 11 Feb, 2021 1 commit
-
-
Eric Myhre authored
Begin using it in some parts of testing of schema parse&compile. (I don't think I'll try to push all usage of go-wish over to quicktest in one flurry of diffs right now. But it might be future work.) The breaking point is that asserting on lists of errors using go-wish was starting to exhibit Weird behaviors for only *some* error types. It's likely that this would be addressable using some kind of go-cmp customization, but go-wish doesn't really expose that ability. The quicktest library does. Switching to quicktest also brings a bunch of other nice features along with it, like stack traces and other forms of additional info. The holistic feel of it is also pretty similar to go-wish (particularly since https://github.com/frankban/quicktest/pull/77 ), so many things should be easy to switch. I suspect I might want some more checker functions to assert on types... but those should be easy to add as a third party, either as a go-cmp.Transformer or a qt.Checker (and then we can work on upstreaming at leisure, if appropriate). In this commit, I'm handling the error list situation using a go-cmp.Transformer to stringify all the errors. This means that the error *types* aren't checked, which is definitely loses ditches some information... but the upside is that its easy to use the tests to check that the eventual string of the error message is human-readable. In this API, I think that readability is the higher priority.
-
- 27 Jan, 2021 1 commit
-
-
Eric Myhre authored
Use a "__" pattern -- the same one already used in our schema codegen. This is less ambiguous given we have some other names which already use single underscores to demarcate other semantics (e.g. union membership groupings). Also things now ready left-to-right as is normal for golang (e.g., the word "map" or "list" now comes first). No logical changes. Just naming.
-
- 25 Jan, 2021 1 commit
-
-
Daniel Martí authored
As a method superset of Node, with the Substrate method added on top. This way, we can generally say that any ADL implementation should include this method to allow for easy encoding over the wire. Reify, the opposite of Substrate, is notably missing here. This is because it's generally a top-level function, not a method. We might want to rethink this disparity in the future.
-
- 24 Jan, 2021 13 commits
-
-
Eric Myhre authored
Extracted a few last comment hunks that were useful and that's the end of it. The migrated tests only mostly pass (but, remember, we've been several commits in a row already where we're making checkpoints to keep this refactor managable, and CI is not full green across the board for some of them). I've updated some of the error text assertions, but there's other breakage as well. One is just other todos. Another is... It may be time to switch test libraries. What go-wish (and transitively, go-cmp) is doing with `[]error` values is... not helpful.
-
Eric Myhre authored
And I think this now ports all the rules we'd previously written against the attempted schema2 (the wrapper style) API. So I can now remove the rest of that in the next commit. It will soon be time to start updating all the gengo stuff to use this, including all of its tests.
-
Eric Myhre authored
Also, shift some error message text towards a more consistent phrasing.
-
Eric Myhre authored
Commit to the strategy of having the first flunked rule for a type result in short-circuit skipping of subsequent rules. It's simple, and it's sufficient.
-
Eric Myhre authored
More rules are still to come.
-
Eric Myhre authored
Both are now accessible. Name is not always present. Get rid of casts that are unnecessary. Constructors for anonymous types are still upcoming; all the current constructors dupe the name into the reference field. Planning to add distinct methods on the Compiler for anon types.
-
Eric Myhre authored
-
Eric Myhre authored
It details a variety of considered approaches. Spoiler: I'm not actually super pleased with the one I'm currently pursuing. The amount of boilerplate I'm grinding out for this is really, really no fun at all. It's possibly that the reasoning leading here is still sound. It's just unpleasant.
-
Eric Myhre authored
Carving out hunks of the schema2 implementation of them (which still hoped to use the dmt more directly) as I port them. As comments in the diff state: I had a hope here that I could make this relatively table-driven, which would increase legibility and make it easier to port these checks to other implementations as well. We'll... see how that goes; it's not easy to flatten.
-
Eric Myhre authored
-
Eric Myhre authored
schema: so much boilerplate for feeding information to the Compiler that I wrote another supplementary code generator. (I'm getting very weary of golang.) This new bit of codegen makes the compiler.go file fairly readable again, though, so I'm satisfied with it. The Compiler API is now complete enough that I can start repairing other things to use it properly. The schemadmt.Schema.Compile() function and all of its helpers compile again now. So does *most* of the whole codegen system... with the notable exception of all the hardcoded typesystem spawning which used the old placeholder methods which have now been stricken. TypeSystem now maintains order. This allowed me to remove some sort operations from the code generator. This also means the next time any existing codegen is re-run, the output file will shift significantly. However, it shouldn't do so again in the future.
-
Eric Myhre authored
As with parent commit: this is a checkpoint. CI will not be passing.
-
Eric Myhre authored
This commit does not pass CI or even fully compile, and while I usually try to avoid those, A) I need a checkpoint!, and B) I think this one is interestingly illustrative, and I'll probably want to refer to this diff and the one that will follow it in the future as part of architecture design records (or even possibly experience reports about golang syntax). In this commit: we have three packages: - schema: full of interfaces (and only interfaces) - schema/compiler: creates values matching schema interfaces - schema/dmt: contains codegen'd types that parse schema documents. The dmt package feeds data to the compiler package, and the compiler package emits values matching the schema interface. This all works very nicely and avoids import cycles. (Avoiding import cycles has been nontrivial, here, unfortunately. The schema/schema2 package (which is still present in this commit, but will be removed shortly -- I've scraped most of it over into this new 'compiler' package already, just not a bunch of the validation rules stuff, yet) was a dream of making this all work by just having thin wrapper types around the dmt types. This didn't fly... because codegen'd nodes comply with `schema.TypedNode`, and complying with `schema.TypedNode` means they have a function which references `schema.Type`... and that means we really must depend on that interface and the package it's in. Ooof.) The big downer with this state, and why things are currently non-compiling at this checkpoint I've made here, is that we have to replicate a *lot* of methods into single-use interfaces in the schema package for this to work. This belies the meaning of "interface". The reason we'd do this -- the reason to split 'compiler' into its own package -- is most because I wanted to keep all the constructor mechanisms for schema values out of the direct path of the user's eye, because most users shouldn't be using the compiler directly at all. But... I'm shifting to thinking this attempt to segregate the compiler details isn't worth it. A whole separate package costs too much. Most concretely, it would make it impossible to make the `schema.Type` interface "closed" (e.g. by having an unexported method), and I think at that point we would be straying quite far from desired semantics.
-
- 21 Jan, 2021 2 commits
-
-
Daniel Martí authored
This means we no longer clutter the repository with lots of files, even if they are git-ignored. It's always a bit of a red flag when you run "go test ./..." and the result is a bunch of leftover files. We still want to keep the files around, for the sake of Go's build cache. And we still want their paths to be static between "go test" runs. So put them in a static dir under os.TempDir. This does mean that concurrent runs of these tests will likely not work well. I don't imagine that's going to be a problem anytime soon, though. If it really becomes a problem in the future, we could figure something out like grabbing a file lock for the directory. The idea behind using os.TempDir is that it will likely remain in place between a number of "go test" runs within a hacking session, but it will be eventually cleaned up by the system, such as when rebooting. Note that we need to use globbing since one can't build "proper packages" located outside a module. The only exception is building an ad-hoc set of explicit Go files. While at it, use filepath.Join, to be nice.
-
Daniel Martí authored
I started rewriting the "getting started" Go guide to use fluent/qp instead of fluent, but I'm missing some of the helpers since the guide uses links, among others. This is largely copy-pasted, but there are just a handful of types and it's barely a dozen lines per type. A generator is not worth it for 100 lines of code that will rarely ever need to change.
-
- 18 Jan, 2021 2 commits
-
-
Daniel Martí authored
This is what I came up with, building on top of Eric's quip. I don't want to waste too much time naming this, and I like two-letter package names in place of dot-imports, so "qp" seems good enough for now. They are the "strong" consonants when one says "Quick iPld". First, move the benchmarks comparing all fluent packages to the root fluent package, to keep things a bit more tidy. Second, make all the benchmarks report their allocation stats, without having to always remember to use the -benchmem flag. Third, add a qp benchmark. Fourth, notice a couple of potential bugs in the quip benchmarks, and add TODOs for them. Finally, add the qp API. It differs from quip in a few external ways: 1) No error pointers. Instead, it uses panics which are recovered at the top-level API layer. This reduces verbosity, removes the "forgot to handle an error" type of mistake, and does not affect performance thanks to the defers being statically allocated in the stack. 2) Supposed better composition. For example, one can use MapEntry along with Map to have a map inside another map. In contrast, quip requires either an extra layer of func literals, or extra API like AssignMapEntryString. 3) Thanks to the points above, the API is significantly smaller. Note that some helper APIs like Bool are missing, but even when added, qp should expose about half the API funcs taht quip does. This is the first proof of concept. I'll probably finish adding the rest of the API helpers when I find the first use case for qp. Benchmark numbers, with perflock and benchstat on my i5-8350u laptop: name time/op Quip-8 1.39µs ± 1% QuipWithoutScalarFuncs-8 1.42µs ± 2% Qp-8 1.46µs ± 2% name alloc/op Quip-8 912B ± 0% QuipWithoutScalarFuncs-8 912B ± 0% Qp-8 912B ± 0% name allocs/op Quip-8 18.0 ± 0% QuipWithoutScalarFuncs-8 18.0 ± 0% Qp-8 18.0 ± 0%
-
Daniel Martí authored
In particular, use keys for ipld error structs. These have one field, so the changes are pretty simple. Reduces 'go vet ./...' from 2647 lines of output to 2365. Updates #102.
-
- 10 Jan, 2021 3 commits
-
-
Daniel Martí authored
Practically every subtest ends up at 7 or so levels of names, like: TestMapsContainingMaybe/maybe-using-ptr/generate/compile/bhvtest/non-nullable/typed-create However, note that the "generate" and "compile" levels are always there, so their presence just adds verbosity in the output and makes the developer's life more difficult. Extremely nested sub-tests are already rare, so at least we can just keep the components that add useful information in the output. "bhvtest" is also pretty redundant, but that one actually matters - its subtest can be skipped depending on build tags.
-
Daniel Martí authored
%q is superior to the manually quoted "%s", since it will properly escape the inner string when quoting. For example: "here goes a double quote: \" " Even if this does not matter in practice, using %q is easier too.
-
Daniel Martí authored
In particular, this removes ~50 out of the 2.7k warnings in 'go vet ./...' in this repository. Mainly, the "unreachable code" ones. This was caused by edge cases in some of the generated code which caused an unconditional return or panic statement to be followed by other code. Fix all of them with a bit more template logic. Some of the Next methods go a bit further. If they serve no purpose as the switch has no cases to be matched, just unconditionally return an error. In the future we can perhaps reuse a single function for that. Finally, I was having a hard time actually following the logic in kindedUnionNodeAssemblerMethodTemplateMunge, so I've indented the code a bit to follow the template logic and scoping. These changes move us towards pleasing vet, which is nice, but also make the code waste a bit less space.
-
- 08 Jan, 2021 2 commits
-
-
Eric Myhre authored
Introduce 'quip' data building helpers.
-
Eric Myhre authored
Lots of individual things: - removed the "Begin*" functions; no need to expose that kind of raw operation when the callback forms are zero-cost. - renamed functions for consistent "Build" vs "Assemble". - added "Assign*" functions for all scalar kinds, which reduces the usage of "AbsorbError" (but also, left AbsorbError in). - renamed the ListEntry/MapEntry functions to also have "Assemble*" forms (still callback style). - while also adding Assign{Map|List}Entry{Kind} functions (lets you get rid of another callback whenever the value is a scalar). - added Assign{|MapEntry|ListEntry} functions, which shell out to fluent.Reflect for even more convenience (at the cost of performance). - moved higher level functions like CopyRange to a separate file. - new benchmark, for the terser form of working with scalars. (It's also evidently slightly faster, because fewer small function calls. Slightly surprising considering how much inlining we might expect, but, huh. Alright, surprise bonus; acceptable.) - example function updated to use the terser form. With these terseness improvements to handling of scalars, the overall SLOC count for using the quip system is now exactly on par with fluent. Varations on map key arguments (which could be PathSegment or even Node, in addition to string) still aren't made available this. Perhaps that's just okay. If you're really up some sort of creek where you need that, you can still just use the MapAssembler.AssembleKey system directly, which can do everything.
-
- 07 Jan, 2021 2 commits
-
-
Eric Myhre authored
gengo: support for unions with stringprefix representation.
-
Eric Myhre authored
-
- 05 Jan, 2021 1 commit
-
-
Eric Myhre authored
If there's any furhter action to take after the callback, it's necessary to recheck the error slot before taking that action; and of course we definitely don't want to overwrite the error if one had been set during the callback. I wonder if it would ever be useful to have a variant of these functions which *does* attempt to call `Finish`, etc, and thus still build a partial tree at the end even if it stopped on error midway. (Right now, if something stops midway, the final `Build` call will panic, and tell you you haven't finished all the assemblers.) It would be a bit more complicated though: we'd potentially need to accumulate more than one error. And in practice, when working on schema'd data, it would often still result in invalid results (anything other than optional fields, or type-level maps and lists, will fail some other logical rule if not fully filled in), which makes me wonder how often this would be useful.
-
- 03 Jan, 2021 5 commits
-
-
Eric Myhre authored
-
Eric Myhre authored
target of opporunity DRY improvement: use more shared templates for structs with stringjoin representations. (Encountered while working on support unions with stringprefix representations.)
-
Eric Myhre authored
Should not affect most user code; though these are technically exported symbols, they're very unlikely to be used directly.
-
Eric Myhre authored
-
Eric Myhre authored
-