diff --git a/docs/arch.md b/docs/arch.md index 34bf66e924e3c27cc19ed37cdc09dbfd5174fb10..2ef9eabc5f79aaede31a49ab8635fe5b52f80fdc 100644 --- a/docs/arch.md +++ b/docs/arch.md @@ -8,11 +8,11 @@ DMS3 software runs on network *endpoint nodes* [^1] and consist of thin decentra [^3]: network protocol stack enables an endpoint to interact with peer endpoints and access technological capability without compromising personal security and privacy. The peer-to-peer (p2p) network protocol network implements an overlay network over the Internet reducing network threat footprint. [^4]: fault tolerant data services unburden greenfield application developers from the complexities of implementing data scaling, reliability, and availability. -|![Architecture Overview](img/dms3-1.jpg) | +|![Architecture Overview](img/dms3-2.jpg) | |:--:| | **Fig. 1: Architecture Overview** | -### DApp +### DApps DMS3 provides discovery of, and access to, decentralized applications that run on endpoint nodes. Dapp generated data persists on an endpoint node that runs a dapp. @@ -22,17 +22,25 @@ Dapps may communicate with other compatible dapps via a locally running DMS3 API The DMS3 API provide a layer of isolation to dapps from the affects of changes in the network layer that implement new capabilities. -### Network Protocol Stack +### Network Protocol Stacks -The network layer consists of two major subsystems, a permission-less decentralized blockchain network for p2p interactions and a centralized permission-ed network for high performance access to scalable information storage and retrieval providing data protection, reliability and availability. +The network layer consists of several major subsystems. Permission-less decentralized blockchain networks for p2p services and a centralized permission-ed network for data cloud services providing high performance access to scalable information storage, retrieval, protection, reliability, availability, and distribution services. 
-#### P2P Protocol Network +#### Financial Blockchain A blockchain network similar to Ethereum. -#### PBFT Storage and Retrieval +#### Information Blockchain -Practical Byzantine Fault Tolerant service implementations of Filesystem and Search Engine services. This network provides data hosting services extending personal compute and storage node capacity via centralized network services that protect personal data and privacy without compromising endpoint control over generated data. +A blockchain network similar to IPFS. + +#### Data Cloud Services + +Practical Byzantine Fault Tolerant services. This network provides data storage, retrieval, protection, and distribution services extending personal compute and storage node capacity via centralized network services that protect personal data and privacy without compromising endpoint control over generated data. + +#### P2P Network + +A permission-less peer-to-peer network providing disintermediated services between network nodes, based on LIBP2P used by other projects including Ethereum and IPFS. ### Network Topology diff --git a/docs/css/theme_extra.css b/docs/css/theme_extra.css new file mode 100644 index 0000000000000000000000000000000000000000..98f4981a1c36d44716f4a74003328e1fcfc3c9db --- /dev/null +++ b/docs/css/theme_extra.css @@ -0,0 +1,373 @@ +/* + this style file was created primarily to apply + documentation brand color scheme. the standard + MkDocs colors are too bright and harsh on my eyes. + + the color scheme is subject to future redesign. 
+*/ +a { + color: #3b6018 !important; +} + +.btn-neutral { + background-color: #9da396 !important; + color: #d8e1cf !important; + border-color: #c0c7b8 !important; + +} + +.hljs, code { + background-color: #333333 !important; + border-color: #555555 !important; +} + +.wy-nav-content-wrap { + background-color: #111111 !important; +} + +.wy-nav-side { + background: #1e211c !important; +} + +.wy-side-nav-search { + background-color: #2b4016 !important; + color: yellow !important; +} + +.wy-side-nav-search > a.icon { + color: #cdd0cb !important; +} + +.wy-nav-content { + /* background: #222222 !important; */ + background: #1b1c19 !important; + /* color: #777777 !important; */ + color: #a7aca3 !important; +} + +.wy-menu { + background-color: #333333 !important; + color: #888888 !important; +} + +.wy-menu-vertical ul.tocbase li a { + background-color: #2f302e !important; + color: #dbeace !important; +} + +/* + * Sphinx doesn't have support for section dividers like we do in + * MkDocs, this styles the section titles in the nav + * + * https://github.com/mkdocs/mkdocs/issues/175 + */ +.wy-menu-vertical span { + line-height: 18px; + padding: 0.4045em 1.618em; + display: block; + position: relative; + font-size: 90%; + color: #838383; +} + +/* + * Long navigations run off the bottom of the screen as the nav + * area doesn't scroll. + * + * https://github.com/mkdocs/mkdocs/pull/202 + * + * Builds upon pull 202 https://github.com/mkdocs/mkdocs/pull/202 + * to make toc scrollbar end before navigations buttons to not be overlapping. + */ +.wy-nav-side { + height: calc(100% - 45px); + overflow-y: auto; + min-height: 0; +} + +.rst-versions{ + border-top: 0; + height: 45px; +} + +@media screen and (max-width: 768px) { + .wy-nav-side { + height: 100%; + } +} + +/* + * readthedocs theme hides nav items when the window height is + * too small to contain them. 
+ * + * https://github.com/mkdocs/mkdocs/issues/#348 + */ +.wy-menu-vertical ul { + margin-bottom: 2em; +} + +/* + * Wrap inline code samples otherwise they shoot of the side and + * can't be read at all. + * + * https://github.com/mkdocs/mkdocs/issues/313 + * https://github.com/mkdocs/mkdocs/issues/233 + * https://github.com/mkdocs/mkdocs/issues/834 + */ +code { + white-space: pre-wrap; + word-wrap: break-word; + padding: 2px 5px; +} + +/** + * Make code blocks display as blocks and give them the appropriate + * font size and padding. + * + * https://github.com/mkdocs/mkdocs/issues/855 + * https://github.com/mkdocs/mkdocs/issues/834 + * https://github.com/mkdocs/mkdocs/issues/233 + */ +pre code { + white-space: pre; + word-wrap: normal; + display: block; + padding: 12px; + font-size: 12px; +} + +/* + * Fix link colors when the link text is inline code. + * + * https://github.com/mkdocs/mkdocs/issues/718 + */ +a code { + color: #2980B9; +} +a:hover code { + color: #3091d1; +} +a:visited code { + color: #9B59B6; +} + +/* + * The CSS classes from highlight.js seem to clash with the + * ReadTheDocs theme causing some code to be incorrectly made + * bold and italic. + * + * https://github.com/mkdocs/mkdocs/issues/411 + */ +pre .cs, pre .c { + font-weight: inherit; + font-style: inherit; +} + +/* + * Fix some issues with the theme and non-highlighted code + * samples. Without and highlighting styles attached the + * formatting is broken. 
+ * + * https://github.com/mkdocs/mkdocs/issues/319 + */ +.no-highlight { + display: block; + padding: 0.5em; + color: #333; +} + + +/* + * Additions specific to the search functionality provided by MkDocs + */ + +.search-results article { + margin-top: 23px; + border-top: 1px solid #E1E4E5; + padding-top: 24px; +} + +.search-results article:first-child { + border-top: none; +} + +form .search-query { + width: 100%; + border-radius: 50px; + padding: 6px 12px; /* csslint allow: box-model */ + border-color: #D1D4D5; +} + +.wy-menu-vertical li ul { + display: inherit; +} + +/* + * Improve inline code blocks within admonitions. + * + * https://github.com/mkdocs/mkdocs/issues/656 + */ + .admonition code { + color: #404040; + border: 1px solid #c7c9cb; + border: 1px solid rgba(0, 0, 0, 0.2); + background: #f8fbfd; + background: rgba(255, 255, 255, 0.7); +} + +/* + * Account for wide tables which go off the side. + * Override borders to avoid wierdness on narrow tables. + * + * https://github.com/mkdocs/mkdocs/issues/834 + * https://github.com/mkdocs/mkdocs/pull/1034 + */ +.rst-content .section .docutils { + width: 100%; + overflow: auto; + display: block; + border: none; +} + +.rst-content dl:not(.docutils) dt { + /* border-top: solid 3px #6ab0de; */ + border-top: solid 3px #416f16; + background: none; + /* color: #2980B9; */ + color: #305012; +} + +td, th { + border: 1px solid #e1e4e5 !important; /* csslint allow: important */ + border-collapse: collapse; +} + +/* + * readthedocs-nested overrides + * +*/ + +/* .wy-menu-vertical li ul { + display: none +} */ + +/* .wy-menu-vertical li.current ul { + display: block; +} */ + +/* + * Pallet 227 - 201 - 175 + * e3e3e3 c9c9c9 bdbdbd + */ + +.wy-menu-vertical ul.tocbase ul.current { + display: block !important; +} + +.wy-menu-vertical ul.tocbase ul.toc-hidden { + display: none !important; +} + +.wy-menu-vertical ul.tocbase li a { + position: relative; + color: #808080; + font-size: 14px; + font-weight: 400; +} + 
+.wy-menu-vertical ul.tocbase li.current.with-children > a { + color: #333333; + font-weight: bold; +} + +.wy-menu-vertical ul.tocbase span.toctree-expand { + position: absolute; + top: 6px; + margin: 0px; + padding: 0px; + font-size: 12px !important; + font-weight: 400; + color: #333333; +} + +.wy-menu-vertical ul.tocbase li.toctree-l2 > a > span.toctree-expand { + left: 6px; +} +.wy-menu-vertical ul.tocbase li.toctree-l3 > a > span.toctree-expand { + left: 16px; +} +.wy-menu-vertical ul.tocbase li.toctree-l4 > a > span.toctree-expand { + left: 26px; +} +.wy-menu-vertical ul.tocbase li.toctree-l5 > a > span.toctree-expand { + left: 36px; +} + +.wy-menu-vertical ul.tocbase li.navtree.toctree-l1.inactive > a, +.wy-menu-vertical ul.tocbase li.navtree.toctree-l2.inactive > a { + color: #b3b3b3; +} + +/* Selected top-level heading */ +.wy-menu-vertical .navtree li.toctree-l2.current, +.wy-menu-vertical .navtree li.toctree-l2.current > a { + background: #fcfcfc; +} +.wy-menu-vertical .navtree li.toctree-l2.current > a, +.wy-menu-vertical .navtree li.toctree-l2.current > span { + color: #404040 !important; +} + +.wy-menu-vertical .navtree li.toctree-l2.current > a { + padding: 6px 0 6px 22px; + font-weight: bold; +} + +/* Level 2 heading */ +.wy-menu-vertical ul.navtree.subnav-l1 ul.subnav-l2, +.wy-menu-vertical ul.navtree.subnav-l1 li.toctree-l2.current li.toctree-l3 > a { + background: #e3e3e3; +} + +.wy-menu-vertical ul.subnav-l1 ul.subnav-l2, +.wy-menu-vertical ul.subnav-l1 li.toctree-l2.current li.toctree-l3 > a { + background: #c9c9c9; +} + +.wy-menu-vertical ul.tocbase li.toctree-l2.current li.toctree-l3 > a { + padding: 6px 0 6px 32px; +} + +/* Level 3 heading */ +.wy-menu-vertical ul.navtree.subnav-l1 li.toctree-l2.current li.toctree-l3.current > a, +.wy-menu-vertical ul.navtree.subnav-l1 li.toctree-l3.current li.toctree-l4 > a { + background: #c9c9c9; +} +.wy-menu-vertical ul.subnav-l1 li.toctree-l2.current li.toctree-l3.current > a, +.wy-menu-vertical 
ul.subnav-l1 li.toctree-l3.current li.toctree-l4 > a { + background: #bdbdbd; +} +.wy-menu-vertical li.toctree-l3.current li.toctree-l4 > a { + padding: 6px 0 6px 42px; +} + +/* Level 4 heading */ +.wy-menu-vertical ul.tocbase li.toctree-l3.current li.toctree-l4.current > a, +.wy-menu-vertical li.toctree-l4.current li.toctree-l5 > a { + background: #bdbdbd; +} +.wy-menu-vertical li.toctree-l4.current li.toctree-l5 > a { + padding: 6px 0 6px 52px; +} + +.wy-menu-vertical ul.navtree.subnav-l1 li.toctree-l2.current li.toctree-l3 > a:hover, +.wy-menu-vertical ul.subnav-l1 li.toctree-l2.current li.toctree-l3 > a:hover, +.wy-menu-vertical ul.navtree.subnav-l1 li.toctree-l2.current li.toctree-l3.current > a:hover, +.wy-menu-vertical ul.tocbase li.toctree-l4 > a:hover, +.wy-menu-vertical ul.tocbase li.toctree-l4 > a:hover, +.wy-menu-vertical ul.tocbase li.toctree-l5 > a:hover, +.wy-menu-vertical ul.tocbase li.toctree-l6 > a:hover, +.wy-menu-vertical ul.tocbase li.toctree-l7 > a:hover { + /* background: #d6d6d6 !important; */ + background: #484b45 !important; +} diff --git a/docs/design.md b/docs/design.md index dd4fe3472348f99036d216b4db760163c2c2a7b8..f999e1de2f9885a64520a91acb16119188c73d40 100644 --- a/docs/design.md +++ b/docs/design.md @@ -14,15 +14,15 @@ Consistent performance is achieved by enforcing configurable upper bound on the Information is organized into index repository **_containers_** [^1] considering multiple criteria: -[^1]: container and repository is used interchangeably in this document to mean the set of files and folders the search engine uses to track volatile and persistent state of a repository. +[^1]: container, and repository is used interchangeably in this document to mean the set of files and folders the search engine uses to track volatile and persistent state of a repository. -1. A document **_kind_** suggests a certain structure and semantics context for a class of similar documents. 
Documents of a certain kind are said to conform to a common schema and contain information of a common theme. +1. A document **_kind_** suggests a certain structure and semantic context for a class of similar documents. Documents of a certain kind are said to conform to a common schema and contain information of a common theme. The configuration of a document kind defines an **_infospace_** that specifies the concrete structure of documents in the index repository conveyed to the search engine. The semantics of substructures in the document are subject to the application that generates and processes the document. -2. An **_infostore_** is a set of containers that collects documents of similar kind. As content is added to a repository and as that repository grows and reaches a configurable document limit, the set is expanded with an additional repository to host additional documents of the same kind. The container-set or **_reposet_** [^2] may expand up to an architectural limit and provides a mechanism to physically shard information. The repository document limit and other life-cycle properties can be configured based on the kind of infospace. An infostore serves to limit search scope of a query target. +2. An **_infostore_** is a set of containers that collects documents of similar kind. As content is added to a repository and as that repository grows and reaches a configurable document limit, the set is expanded with an additional repository to host additional documents of the same kind. The container-set or **_reposet_** [^2] may expand up to an architectural limit and provides a mechanism to physically shard information. The repository document limit and other life-cycle properties can be configured based on the kind of infospace. An infostore serves to limit the search scope of a query target. 
[^2]: container-set and reposet is used interchangeably in this document, it represents a generic term that refers to the set of index repositories that form an infostore. -3. A **_metastore_** is a set of index repositories that collects documents providing metadata for corresponding infostores of similar kind. A metastore serves to limit search scope of a query target. Each document in a metastore repository describes the kind of data contained in an associated infostore. A metastore also represents a manifest for hosted information, serves as a testament made by the endpoint user regarding hosted information. +3. A **_metastore_** is a set of index repositories that collects documents providing metadata for corresponding infostores of similar kind. A metastore serves to limit search scope of a query target. Each document in a metastore repository describes the kind of data contained in an associated infostore. A metastore also represents a manifest for hosted information, serves as a testament made by the endpoint user regarding hosted information. Matching kind names are used to associate metastores and infostores. Infostore and metastore life-cycle management is at the purview of a user managing a DMS3 endpoint node, also known as a **_source_** of information. @@ -58,7 +58,9 @@ Any source managing a dms3 node may decide independently to assume a role as **_ ## Installing the software -Installation instructions will be provided at a later time. DMS3 builds on forked versions of key open source software components. A partial list of key components is provided below, additional credits will be provided at a later time. +Installation instructions will be provided at a later time. + +DMS3 builds on forked versions of key open source software components. We thank all open source contributors and acknowledge their community contributions. A partial list of key components is provided below, additional credits will be provided at a later time. 
- [IPFS](https://ipfs.io/) with modifications to support new capabilities. In particular dms3 adds the ability to manage document index repositories and dis-intermediated search. @@ -90,7 +92,6 @@ A summary of _index_ repository life-cycle commands are listed below. - use _mkdoc_ to generate a document template to be indexed - use _addoc_ to index a document - use _rmdoc_ to delete a document added to an index repository -- use _publish_ to share, allowing p2p notes to find, documents in an index - use _ls_ to display a list of index repositories - use _show_ to display a document in an index repository - use _stat_ to display index repository statistics @@ -108,7 +109,7 @@ To-Be-Specified ## Search Engine -DMS3 uses a forked version of the open source [Indri Search Engine](https://sourceforge.net/p/lemur/wiki/Home/) as its underlying Search Engine (**_SE_**) with some modifications that allow extensible non-text data type support. +DMS3 uses Vank as its underlying Search Engine (**_SE_**). Vank is a forked version of the open source [Indri Search Engine](https://sourceforge.net/p/lemur/wiki/Home/) augmented with a library that allows extensible application data type support. Refer to the following page for [a background on the Indri search engine](https://sourceforge.net/p/lemur/wiki/Language%20Modeling%20and%20Information%20Retrieval%20Background/). @@ -116,65 +117,104 @@ Refer to [Indri repository structure](https://sourceforge.net/p/lemur/wiki/Indri Refer to [Indri parameters file](https://sourceforge.net/p/lemur/wiki/IndriBuildIndex%20Parameters/) for a discussion of search engine indexer configuration parameters. -## Application programming interface (API) +## Application Programming Interface (API) + +DMS3 provides an API to simplify dapp development that targets the platform. Details of the dms3api will be provided at a later time. + +The API helps dapps create or connect to multiple index repositories. 
Each repository has an active in-memory index where new documents are added to the index dynamically. In-memory indexes are written out to disk for durability, and when the number of indexes grows, a background process merges repository indexes, consolidating them on disk. All this dynamic index management is performed by the search engine for each of the connected repository contexts.
+
+DMS3 treats each index repository as a container, and manages the growth in information using multiple kinds of reposets to physically shard information in bounded containers.
+
+### Information Blockchain
+
+**_DMS3 Information Blockchain_** is a filing system for organizing information to facilitate the creation, storage, retrieval, and sharing of information libraries that scale in size and offer high performance search.
+
+Information is organized into a _container namespace_ managed by lifecycle management libraries and a high performance search engine as searchable **_index repository sets_ (a.k.a. reposet)**.
+
+The container namespace is identified by an ordered set of _component name keys_ defined as:
+
+**_type_**
+: A container has a **_type_** classifying the contained information as either data (value _"infostore"_) or metadata (value _"metastore"_).
+
+**_kind_**
+: A container is assigned a unique abstract **_kind_** that defines a common _structure_ and _semantic theme_ over the documents it contains. The value for kind is a string chosen by the end user.
+
+**_name_**
+: A container has a **_name_** value assigned by the end user that is unique within a kind. Information added to an index repository is secured by the DMS3 blockchain.
+
+**_setID_**
+: A container is assigned a **_setID_** that specifies the order of the container within the reposet. The value is an integer bound by a maximum number defined by the architecture; the initial range is [1,255].
+
+**_wwID_**
+: The index library assigns a **_wwID_** (who-what ID) that identifies the app (who) and the index parameters name (what) that created the index. When creating an index with the dms3api, the application name will always be "dms3" and the index parameters name will match the pre-configured _kind_ value used to generate the parameters file.
+
+**_volID_**
+: A container is assigned a **_volID_** that specifies a volume id. The volume id conveys logical group information about the documents within the container. The volume id is assigned by the lifecycle management library.
+
+There are several DMS3 storage areas used to build the DMS3 information blockchain:
+
+Local File Store (lfs)
+: Index repository state is stored in the local dms3 repository and accessed via the search engine interface. This represents a _Mutable Index Repository_.
+
+Unix File Store (ufs)
+: Index metadata and document data files are added to the dms3 UnixFS store, so they may be used to reconstitute an index repository or be used to share information on the p2p network. This is part of the _Immutable Index Repository_.
-DMS3 provides an API to simplify dapp developments that target the platform. Details of the dms3api will be provided at a later time.
+Key Value Store (kvs)
+: Index repository metadata is stored in a dms3 key-value store to enable recovery of index repositories in the local file store, and to share information on the p2p network. This is part of the immutable index repository.
-The API helps dapps create or connect to multiple repositories. Each repository has an active in-memory index where new documents are added to the index dynamically. In-memory indexes are written out to disk for durability, and when the number of indexes grows a background process merges repository indexes consolidating them on disk. All this dynamic index management is performed by the search engine for each of the connected repository contexts.
+### Mutable Index State -dms3 treats each index repository as a container, and manages the growth in information using multiple kinds of reposets to physically chard information in bounded containers. +The mkidx command is used to create a mutable index repository. -### Index repository state +dms3 uses a path naming convention for each index repository on the local file system within the _index_ sub-folder that contains the repository root _reposet_. -dms3 uses a path naming convention for each index repository on the local file system within the _index_ sub-folder. In the following example: +The following shows an example index repository path: ```bash -ls ~/.dms3/index/reposet/blog/myblog20/w1543348319-a1-c1-o0/ +ls ~/.dms3/index/reposet/blog/infospace-myblog20/1/dms3-blog/w1543348319/ ``` -where path components have the following implied semantics + +Path component keys map the index into the container namespace: ~/.dms3 -: initialized local file system dms3 root +: initialized default local file system dms3 repository root /index -: initialized local file system index repository root +: local file system dms3 index repository root /reposet -: root folder for all resposets on the local file system +: root folder for all dms3 index repositories on the local file system /blog -: _kind_ name of the reposet, shared by all repos in the set. +: _kind_ name of the reposet, shared by all repos within the set. -/myblog20 -: root folder name for a physical repo instance +/infospace-myblog20 +: _type_-_name_ of a repo providing applications with a _Physical Sharding Mechanism_. -/w1543348319-a1-c1-o0 -: root folder for a logical grouping repo instance +/1 +: _setID_ of a repo providing container set growth for a kind of information - with name components encoding index metadata: +/dms3-blog +: library assigned _wwID_ label identifying the app name hyphen-separated index parameters name used to create the index. 
- w - : window start time value, seconds since epoch +/w1543348319 +: library assigned _volID_ of a repo providing applications with a _Logical Sharding Mechanism_. - a - : area value associated with documents - c - : category value associated with documents +### Key Search Engine Properties - o - : offset value used to encode absolute times - area/category designations provide applications with a logical sharding mechanism. +This section describes some key low level interface of the search engine. Detailed documentation on the search engine will be provided later and is outside the scope of this document. +#### Index Metadata and Fields -### Index metadata and fields +The indexer defines metadata as non-searchable _fields_ and provides forward and reverse document lookup by metadata value. Whereas other fields can be used to qualify query searches. -The indexer defines metadata as non-searchable _fields_ and provides forward and reverse document lookup by metadata value. Whereas, fields can be used to qualify query searches. +#### Document Structure and Schema +_Details to be documented at a later time..._ -### Document structure and schema Fields are used to convey document structure and are used to influence document queries. -### Supported document parsers +#### Supported Document Parsers The preconfigured file class environment and parsers [indexer file formats](https://sourceforge.net/p/lemur/wiki/Indexer%20File%20Formats/) exhibit the metadata extraction behavior discussed here. @@ -190,6 +230,10 @@ path docno : /home/username/temp/data/sitemap.xml +Path to index +: The lfs path of the index repository + + For **html, xml, text** documents, the parser additionally adds the metadata. For example: filetype @@ -210,7 +254,7 @@ author The parser additionally adds the metadata listed within a document, regardless of position in the document tag hierarchy (i.e. whether the metadata fields are wrapped within a tag or not). 
the metadata associated with a document is not required to include all the metadata keys specified during index creation. -A file may contain multiple documents wrapped within a tag, in the absence of such a tag, the file represents a single document. +A file added to the index may contain multiple documents wrapped within a tag, in the absence of such a tag, the file represents a single document. The indexer requires one metadata field for documents: _docno_, the value of this field must be unique within the index, otherwise the document is @@ -220,22 +264,268 @@ When an application generated (using the addString interface) document includes first added by the parser, and the second added by the application. The parser will use the last occurrence as the authoritative _docno_ record. +### Immutable Index State + +Immutable index repository state enables mutable index recovery and sharing on the p2p network. + +The immutable index repository state is updated when a container is created, and when a document is added to a container. 
+
+Immutable index repository state consists of the following dag node tree:
+
+```mermaid
+erDiagram
+    Reposet-root ||--o{ Store-block : contains
+    Reposet-root {
+        Link-array-256 metastore-block
+        Link-array-256 infostore-block
+    }
+    Store-block ||--o{ RepoProps : contains
+    Store-block {
+        Link-array-65536 RepoProps
+    }
+    RepoProps {
+        String Type
+        String Kind
+        String Name
+        String Path
+        String CreatedAt
+        Link Params
+        Link-array-256 Corpus
+        Boolean PublishStatus
+        Integer PublishInterval
+        Map Stats
+    }
+    RepoProps ||--|| Params : links-to
+    Params {
+        Bytes Tagged-parameters-file
+    }
+    RepoProps ||--o{ Corpus : links-to
+    Corpus {
+        Link-array-65536 file-id
+    }
+```
+
+Initial design makes the following choices and assumptions:
+
+1) limit maximum block size to 2 MiB
+
+2) maximum corpus size per index repo container = 10 GiB
+
+3) an average document size of 1 KiB
+
+Given the above assumptions, the following constraints can be computed:
+
+- max block size = 2 x 1024 x 1024 = 2097152 bytes
+- size of CID string = 32 bytes
+- CIDs per block = 2097152 / 32 = 65536
+- max corpus size = 10 x 1024 x 1024 x 1024 = 10737418240 bytes
+- max documents per container = 10737418240 / 1024 = 10485760
+- max CIDs per container (one CID per document) = 10485760
+- max corpus blocks per container = 10485760 / 65536 = 160
+
+_describe repoprops and store blocks_
+
+
+### Managing Index Repositories
+
+#### Configure
+
+Index configuration property structure is shown below. Notes on current implementation constraints and limitations are discussed here. Configuration rules are likely to change as the functionality evolves.
+
+Once an index of a specific kind is created, its configuration parameters must not be modified. Otherwise the search engine may return incorrect query results or, worse, crash. Modifying index structure will renumber terms like metadata and fields, effectively corrupting the configuration.
+
+When recovering a corrupted index from immutable state, the documents re-added to the index will need to be re-encoded to avoid incorrect time information. Recovery tools will be developed at a later time.
+
+The DMS3 configuration file allows configuring various index repository properties. The search engine supports its own set of configuration properties via the parameters file. The lifecycle management library imposes additional conventions when mapping the dms3 index configuration to the search engine parameters file. The following is a summary of the current mapping conventions, which will likely evolve over time:
+
+index and corpus parameter configuration
+  params[index] = cfg Indexer.Path * code overrides configured value *
+  params[corpus][annotation] = cfg Indexer.Corpus.Annotation * not used *
+  params[corpus] = cfg Indexer.Corpus
+  params[corpus][path] = cfg Indexer.Corpus.Path * code overrides configured value *
+  params[corpus][class] = cfg Indexer.Corpus.Class
+  params[corpus][metadata] = cfg Indexer.Corpus.Metadata * not used *
+
+optional parameter configuration
+  params[memory] = cfg Indexer.Memory
+  params[stemmer][name] = cfg Indexer.Stemmer
+  params[normalize] = cfg Indexer.Normalize
+  params[stopper][word] = cfg Indexer.Stopper[i]
+
+metadata and field parameter configuration is hard coded
+
+document kind-specific field parameter configuration
+  params[field][name][f] = cfg Metadata.Kind[i][f]
+
+note: the infospace interface can override some parameters at time of
+index creation (see MakeIndex).
+
+TODO: remove from index configuration:
+  Indexer.Corpus:
+    annotations not used
+    path is overridden (computed)
+    metadata not used
+
+  Indexer.Path:
+    path is overridden (computed)
+
+  Indexer.Stopper: []
+    is overridden, or complemented by global stopwords file
+
+{
+  "Indexer": {
+    "Corpus": {
+      "Class": "html",
+      "Path": ""
+    },
+    "MaxDocs": "100M",
+    "Memory": "100M",
+    "Normalize": true,
+    "Path": "",
+    "Stemmer": "krovetz",
+    "Stopper": [
+      "a",
+      "an",
+      "the",
+      "as"
+    ]
+  },
+  "Metadata": {
+    "Kind": [
+      {
+        "Field": [
+          "About",
+          "Address",
+          "Affiliation",
+          "Author",
+          "Brand",
+          "Citation",
+          "Description",
+          "Email",
+          "Headline",
+          "Keywords",
+          "Language",
+          "Name",
+          "Telephone",
+          "Version"
+        ],
+        "Name": "blog"
+      }
+    ],
+    "Publisher": [
+      {
+        "Schedule": [
+          {
+            "Interval": "immediate",
+            "Status": "enabled",
+            "Name": [
+              "myblog20"
+            ]
+          },
+          {
+            "Interval": "daily",
+            "Status": "enabled",
+            "Name": [
+              "mideastern-foods"
+            ]
+          },
+          {
+            "Interval": "weekly",
+            "Status": "disabled",
+            "Name": [
+            ]
+          }
+        ]
+      }
+    ]
+  },
+  "Retriever": {
+    "MaxResultCount": 100
+  }
+}
+
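The sample configuration above can be read back and checked with a few lines of Go. The following is a minimal sketch, not the DMS3 implementation: `IndexConfig`, `ParseConfig`, `BuildParams`, and `CheckSchedules` are hypothetical names introduced here for illustration. It assumes the optional-parameter mapping conventions summarized earlier and the rule that an index repository may be assigned to at most one publishing schedule. Values that the code overrides (`Indexer.Path`, `Indexer.Corpus.Path`) are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Schedule mirrors one entry of Metadata.Publisher[].Schedule in the sample.
type Schedule struct {
	Interval string
	Status   string
	Name     []string
}

// IndexConfig covers only the fields this sketch needs; other keys in the
// configuration (MaxDocs, Metadata.Kind, Retriever) are ignored on decode.
type IndexConfig struct {
	Indexer struct {
		Corpus struct {
			Class string
		}
		Memory    string
		Normalize bool
		Stemmer   string
		Stopper   []string
	}
	Metadata struct {
		Publisher []struct {
			Schedule []Schedule
		}
	}
}

// ParseConfig decodes an index configuration document (hypothetical helper).
func ParseConfig(data []byte) (IndexConfig, error) {
	var cfg IndexConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// BuildParams applies the cfg-to-params conventions for the values that are
// not overridden by code (hypothetical helper, assumed layout).
func BuildParams(cfg IndexConfig) map[string]interface{} {
	return map[string]interface{}{
		"corpus":    map[string]interface{}{"class": cfg.Indexer.Corpus.Class},
		"memory":    cfg.Indexer.Memory,
		"stemmer":   map[string]interface{}{"name": cfg.Indexer.Stemmer},
		"normalize": cfg.Indexer.Normalize,
		"stopper":   map[string]interface{}{"word": cfg.Indexer.Stopper},
	}
}

// CheckSchedules returns an error when a repository name is listed under
// more than one publishing schedule.
func CheckSchedules(cfg IndexConfig) error {
	seen := map[string]int{} // repository name -> index of its schedule
	idx := 0
	for _, pub := range cfg.Metadata.Publisher {
		for _, s := range pub.Schedule {
			for _, name := range s.Name {
				if prev, ok := seen[name]; ok && prev != idx {
					return fmt.Errorf("repository %q is assigned to more than one schedule", name)
				}
				seen[name] = idx
			}
			idx++
		}
	}
	return nil
}

func main() {
	sample := []byte(`{
	  "Indexer": {"Corpus": {"Class": "html"}, "Memory": "100M",
	              "Normalize": true, "Stemmer": "krovetz", "Stopper": ["a", "an"]},
	  "Metadata": {"Publisher": [{"Schedule": [
	    {"Interval": "daily", "Status": "enabled", "Name": ["mideastern-foods"]},
	    {"Interval": "weekly", "Status": "disabled", "Name": []}
	  ]}]}
	}`)
	cfg, err := ParseConfig(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("params:", BuildParams(cfg))
	fmt.Println("schedule check:", CheckSchedules(cfg))
}
```

Decoding into a narrow struct keeps the sketch tolerant of configuration keys it does not know about, which seems appropriate while the mapping conventions are still expected to change.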
+
+_Details to be documented at a later time..._
+
+#### Index
+
+_Details to be documented at a later time..._
+
+#### Query
+
+_Details to be documented at a later time..._
+
+#### Track
+
+A key-value pair is stored in the dms3 KVS data store when a new index repository is created.
+
+A key composing the container namespace name of an index repository is used to look up repo statistics in the KVS.
+
+_Additional details to be documented at a later time..._
+
+
+#### Recover
+
+The immutable index state is used to reconstruct the mutable index state.
+
+Index container recovery will be on a container instance basis.
+
+_Details to be documented at a later time..._
+
+#### Publish
+
+An index may be marked for publishing to share its content on the p2p network.
+
+The publish properties of an index are specified by the index configuration file.
+
+A number of mutually exclusive publishing schedules are supported. An index repository may be assigned to at most one schedule.
+
+Publishing properties are bound to the repository _name_ key of the container namespace, and affect all container instances within the named sub-space.
+
+The publish properties define:
+
+Status
+: Current publishing status. The value is _enabled_ or _disabled_. The default value is initialized to _disabled_ when the index repository is created.
+
+Interval
+: The interval duration at which index state updates are published. Valid interval values include: _immediate_, _daily_, _weekly_, _biweekly_, _monthly_, _quarterly_, _half-annual_, _annual_
+
+Name
+: The list of index repository names to be published.
+
+Once an index publishing schedule is configured, you must run the daemon with the index publishing and subscription feature enabled:
+
+```
+dms3 daemon [--enable-index-pubsub]
+```
-## Network protocol stack
+## Network Protocol Stacks
 DMS3 offers two classes of fault tolerant services.
-1. Centralized practical byzantine fault tolerant (PBFT) services provide protection for personal private data.
+1. Decentralized Information Blockchain protocol services to provide distribution and access services for shared public data. - For personal information not publicly shared, DMS3 offers centralized PBFT services for maintaining immutable data integrity. The immutable state for a repository consists of the initial configuration parameters, documents added to the repository corpus, and other repository management state. +2. Decentralized Financial Blockchain protocol services to provide smart contract based information trading services. -2. Decentralized p2p protocol services to provide distribution and access services for shared public data. +3. Centralized practical byzantine fault tolerant (PBFT) services provide data storage scaling, protection, access, and distribution services. + +4. Decentralized p2p protocol services to provide distribution and access services for shared public data. - For shared public information, this service defrays the hosting and access costs of dedicated compute and storage resources. The following sub-sections describe these services. -### Practical byzantine fault tolerant (PBFT) services +### Decentralized Information Blockchain Services + +_Details to be documented at a later time..._ + + +### Decentralized Financial Blockchain Services + +_Details to be documented at a later time..._ + + +### Practical Byzantine Fault Tolerant (PBFT) Services DMS3 allows participants to offer centralized data protection services to protect the users' personal information. @@ -252,6 +542,8 @@ dms3 implements two PBFS services. DMS3 PBFS services automatically recover from a configured number of arbitrary simultaneous faults. +_Additional details to be documented at a later time..._ + #### Index fault recovery @@ -268,13 +560,13 @@ There are two options to recover an index repository. #### Publishing a repository -Use the _index_ _publish_ command to enable other users on the p2p network to search and access the contents of a repository. 
+Use the _index_ _config_ command to enable publishing of repositories to be shared with other users on the p2p network. ### Published information distribution and protection services -High demand for published information can overburden participant's compute resources, dms3 enables optional paid p2p services to offload compute and storage onto other p2p nodes. +High demand for published information can overburden a participant's compute resources. DMS3 enables optional paid p2p services to offload bandwidth, compute, and storage loads onto a proprietary fault-tolerant centralized Data Cloud. -p2p protocol network enables participants to contribute compute and storage resources to gain income in return to the use of their resources. +The p2p protocol network also enables participants to contribute compute and storage resources and earn income in return for the use of their resources. #### Information curators diff --git a/docs/diagrams.md b/docs/diagrams.md index 250a5e0815c5927292f1b848e0293de8deaeef63..82045cafc13e8063d39cf802187f1701c07a23b5 100644 --- a/docs/diagrams.md +++ b/docs/diagrams.md @@ -64,3 +64,23 @@ graph TD; B-->D; C-->D; ``` + +```mermaid +erDiagram + CUSTOMER ||--o{ ORDER : places + CUSTOMER { + string name + string custNumber + string sector + } + ORDER ||--|{ LINE-ITEM : contains + ORDER { + int orderNumber + string deliveryAddress + } + LINE-ITEM { + string productCode + int quantity + float pricePerUnit + } +``` diff --git a/docs/img/dms3-2.jpg b/docs/img/dms3-2.jpg new file mode 100644 index 0000000000000000000000000000000000000000..823038807163c7c73b334315f7ddaee8ca20b4b6 Binary files /dev/null and b/docs/img/dms3-2.jpg differ diff --git a/docs/notes.md b/docs/notes.md index 96871d30317d28f831a2ab527e71aba8d77aef7c..c649990f066a5410347756e42220acac559c5c38 100644 --- a/docs/notes.md +++ b/docs/notes.md @@ -1,556 +1,25 @@ -## Implementation Notes +## Appendix Notes -### Preexisting Component behavior +### Packaged Assets -### Peer Node Builder and 
Command Execution Environment +When DMS3 is initialized, a list of assets is added to the DMS3 UnixFS. -egrep --color NewNode ../go-dms3-fs/core/builder.go - implements NewNode, primary function used to build a new peer node object. +The list of assets includes: -Following programs reference NewNode to create a new peer node: +Default stopwords +: This file is used by the search engine indexer. It is referenced from the params file generated when creating a new index container. The params file assumes this file exists at a well-known path and file name in the dms3 repo index root (by default: ~/.dms3/index/stopwords). -egrep --color NewNode ../go-dms3-fs/cmd/dms3fswatch/main.go - Online: true - watch and adds local filesystem objects to dms3fs +Kind of Content schema +: Metadata defining document field structure and semantics. The definitions are used to create index containers hosting documents of a certain kind. -egrep --color NewNode ../go-dms3-fs/cmd/dms3fs/init.go - Online: false - adds default assets to dms3fs - initializes dms3ns keyspace +### Kind of Content Schema -egrep --color NewNode ../go-dms3-fs/cmd/dms3fs/main.go - Online: false - implements the primary CLI binary for dms3fs - command details specify necessary execution context (client or daemon) - client is used when command does not use repo and can run on client - daemon is used when running and command uses repo or details require it - see [go-dms3-fs/cmd/dms3fs/main.go]commandShouldRunOnDaemon +The index config file is initialized with document kind metadata from the pre-packaged asset list. The asset list kind schema is a subset sourced from a working group focused on defining schema standards: [schema.org](https://schema.org/). 
-egrep --color NewNode ../go-dms3-fs/cmd/dms3fs/daemon.go - Online: true - Runs a network-connected DMS3FS node +As a privacy-focused solution, the DMS3 subset excludes schema elements that are typically used for tracking and targeted advertising, or that are intended for features outside the scope of DMS3. -CLI commands run by the primary CLI binary for dms3fs - -egrep --color NewNode ../go-dms3-fs/core/commands/add.go - adds local filesystem objects to dms3fs - -egrep --color NewNode ../go-dms3-fs/core/commands/index/mk.go (move to new behavior section) - make/add local index repository objects to dms3fs - - -### Mutable File System (mfs) - -A merkle DAG tracks permanent filesystem objects. The Linked Data (LD) format allows creating relationships between DAG objects designed to support an object in a Unix filesystem that has a name and a hierarchical path. The DAG graph supports an extensible node data format allowing use of arbitrary data structures managed in the DAG. - -A mutable file system uses the well know key "/local/filesroot" to provide unix-like [path/name] filesystem operations as a convenience. The mfs root Cid is periodically published in the datastore to keep the mfs root Cid fresh. - -NewRoot references with publish function specified -egrep --color \.NewRoot ../go-dms3-fs/fuse/dms3ns/dms3ns_unix.go - implements fuse dms3ns filesystem. uses dms3ns publisher -egrep --color \.NewRoot ../go-dms3-fs/core/core.go - add command makes a new Dms3FsNode calling [core/builder.go]NewNode() builder [core/core.go]NewNode() calls setupNode(), which calls loadsFilesRoot(), which calls [../]go-mfs/system.go]mfs.NewRoot() to create or reuse a mfs root DAG node, which must be of type TDirectory or THAMTShard. NewRoot() starts a republisher for the mfs root, which republishes the mfs root cid to the datastore under key "/local/filesroot" (using short/long intervals: time.Millisecond*300, time.Second*3). - - Dms3FsNode node uses record.NamespacedValidator(). 
- - Files or Directories added/managed using the files commands and with using builder will have the mfs root as parent. so when child list is updated, republisher will update mfs root cid in the datastore. - -NewRoot references without publish function specified -egrep --color \.NewRoot ../go-dms3-fs/core/commands/add.go -egrep --color \.NewRoot ../go-dms3-fs/core/commands/index/mk.go - - -### Block, Exchange, and Bitswap Services - -Objects stored in the merkle DAG can be accessed by local and remote peer nodes. A number of path resolvers are used to effectively resolve a given path to a dms3fs path, where the last component of the path represents a Cid. The peer node object includes properties that enable access to the DAGservice and Blockservice interfaces. The Blockservice enables seamless retrieval of a block from the local node or a remote provider. When storing a block on the local node, the Blockservice informs the Exchange service of the existence of the Cid, which in turn informs the Bitswap service responsible for swapping blocks between peer nodes. The Bitswap service runs a pubsub protocol to manage WantLists of Cids to swap with peer nodes. - -#### Object Path Resolution - -egrep --color Resolve ./core/pathresolver.go - Resolve first calls ResolveDMS3NS() to resolve dms3ns path to dms3fs path, then calls the provided path resolver to resolve to a DAG node. the provided path resolver is single hop resolver [go-path/resolver/resolver.go]Resolver - r := &resolver.Resolver{ - DAG: n.DAG, - ResolveOnce: uio.ResolveUnixfsOnce, // or resolver.ResolveOnce, aka [go-path/resolver/resolver.go]ResolveOnce - } -egrep --color ResolveToCid ../go-dms3-fs/core/pathresolver.go - ResolveToCid is used by plumbing commands like Pin and - -core path resolver uses the [go-merkledag/merkledag.go]Get() to retrieve a DAG node from a resolved path. DAG Get() in turn uses the Dms3FsNode blockservice to fetch the DAG node block. 
- b, err := n.Blocks.GetBlock(ctx, c) - -#### Object Access - -egrep --color bserv\.New ../go-dms3-fs/core/builder.go -DAG node getter uses [dms3-fs/go-blockservice]New() to get Blocks -the Dms3FsNode [core/builder.go]setupNode() creates the blockservice - n.Blocks = bserv.New(n.Blockstore, n.Exchange) - n.DAG = dag.NewDAGService(n.Blocks) -[core/core.go]startOnlineServicesWithHost() creates the Dms3FsNode block exchange service. - // setup exchange service - bitswapNetwork := bsnet.NewFromDms3FsHost(n.PeerHost, n.Routing) - n.Exchange = bitswap.New(ctx, bitswapNetwork, n.Blockstore) -the blockservice GetBlock() seamlessly retrieves blocks from the local node or a peer node: - - block, err := bs.Get(c) - if err == nil { - return block, nil - } - if err == blockstore.ErrNotFound && f != nil { - // TODO be careful checking ErrNotFound. If the underlying - // implementation changes, this will break. - log.Debug("Blockservice: Searching bitswap") - blk, err := f.GetBlock(ctx, c) - if err != nil { - if err == blockstore.ErrNotFound { - return nil, ErrNotFound - } - return nil, err - } - log.Event(ctx, "BlockService.BlockFetched", c) - return blk, nil - } - -go-fs-blockstore/blockstore.go - the blockstore hosts blocks in a flatfs datastore. -go-block-format/blocks.go - defines the block format -go-ds-flatfs/flatfs.go - implements a datastore that stores all objects in a two-level directory structure in the local file system, regardless of the hierarchy of the keys. -go-fs-ds-help/key.go - provides conversion between Cid and datastore key -the blockstore Get() retrieves the block from the flatfs datastore using a key converted from the Cid, and returns a formated basic block consisting of data read and the cid key. - -#### Locating Remote Objects - -So how does exchange find the block? 
-egrep --color Fetcher ../go-fs-exchange-interface/interface.go -Fetcher is an interface that includes GetBlock() and GetBlocks() functions implemented by the [go-bitswap/bitswap.go] service. the [go-bitswap/network/] exchanges DAG blocks with peer nodes using a [github.com/gxed/pubsub/] protocol. GetBlocks() adds cids of blocks into WantList. the [go-blockservice/blockservice.go]AddBlock() function announces to the exchange service that this node has a block (HasBlock()), which may trigger notification to peer node(s) that want the added cid block. - - - -## DMS3 Introduced Component behavior - -### Corpus Documents - -Corpus documents added to an index repository are also stored in the DMS3FS. - -Index repository and reposet state is stored in the DMS3FS UnixFS block store. - -The Merkle DAG is used to store repository properties and relationships in the UnixFS by managing index repository related directory and file nodes. - -The index datastore is used to track the Cid of a DAG reposet, and Cids of corpus documents in the DMS3FS for a specific repo. The tracking in index datastore applies a key convention that facilitates lookup of reposet and corpus documents given repository class, kind, name, and repo name. - -Notes on index documents: -- index environment addFile - - assumes metadata fields are stored within the file being added - - returns -1, not docid. docid can subsequently be looked up via metadata fields -- index environment addString - - includes metadata vector as input parameter - - returns -docid -- the only indri required metadata is docno. 
all index documents have unique docno values -- the only infospace required metadata is base-time - - represents absolute start time of partition window - - document abs-time fields are encoded as rel-time to partition window start time - -### Configuration (mutable) - -To Be Defined - - -### Repository (immutable) - -Repositories provide a common framework for information organization, indexing, and query services. The framework is common to both infostore and metastore and differs only in the semantics of content in each state space. - -A repository set facilitates life-cycle management of growth in a specific kind of content. - -DMS3FS stores the following for repository properties: - -1. A DMS3FS UnixFS stored index properties - - - A reposet is tracked in the datastore using a key convention: - - "/local/filesroot/index/reposet/_kind_/_type_/_name_" - - where, - _type_ is either the string "infostore" or the string "metastore" and - _kind_ is a locally unique string for the kind of reposet, - _name_ is a locally unique string for the name of reposet - - - a reposet direcory contains the following file entries: - - $ dms3fs ls QmSyVYKKQ3bH8EcNh9PUrQecwrcSHWXDBRt7yL1JsJbC5Z - zb2rhek8ZLjWVXJdTxJhLo7d4o8oPTJZDLe1WwtcQtrDcqhQb 3128 params - Qma1pjUPVL7Qu68bciyrqvz1doS6SHzo6qQfGRYysW4qma 129 reposetprops - QmSbzdraUBhMfNuu6EMz4BqiPkyFgY99n9vSZJtdaVSm9Z 168 w1543348319-a1-c1-o0 - - where, - params - is a configuration file that informa search engine functions. This file is common for all repositories in a reposet. The contents are controlled via kind specific index configuration parameters. - reposetprops - a JSON encoded reposet properties file - w1543348319-a1-c1-o0 - a JSON encoded repo properties file for a repository in the reposet. 
- - Reposet properties are defined as follows: - - ```go -type reposetProps struct { - Type: string, // reposet class or type, "infostore" or "metastore" - Kind: string, // locally unique reposet kind - Name string, // locally unique reposet name - CreatedAt int64, // create time - MaxAreas uint8, // maximum number of areas - MaxCats uint8, // maximum number of categories - MaxDocs uint64, // maximum number of documents per repo -} - -example: - $ dms3fs cat Qma1pjUPVL7Qu68bciyrqvz1doS6SHzo6qQfGRYysW4qma - { - "Type":"metastore", - "Kind":"blog", - "Name":"myblog20", - "CreatedAt":1543348319, - "MaxAreas":64, - "MaxCats":64, - "MaxDocs":50000000 - } -``` - - Repo properties are defined as follows: - - ```go -type repoProps struct { - Type: string, // repo class or type, "infostore" or "metastore" - Kind: string, // locally unique repo kind - Name string, // locally unique repo name - Offset: 0, // shard tag1 create time (seconds) since reposet create - Area: 1, // shard tag2 area index - Cat: 1, // shard tag3 category index - "Path": string // local fs path to params file -} - -example: - $ dms3fs cat QmSbzdraUBhMfNuu6EMz4BqiPkyFgY99n9vSZJtdaVSm9Z - { - "Type":"metastore", - "Kind":"blog", - "Name":"w1543348319-a1-c1-o0", - "Offset":0, - "Area":1, - "Cat":1, - "Path":"/home/username/.dms3-fs/index/reposet/blog/myblog20/params" - } - -``` - -### Repository (mutable) - -Local filesystem stored repository mutable state: - - a. Directory Path - - A path is created when creating a new reposet that contains files and sub folders used by the indexer. A copy of the parameters file is placed in the local file system for the index server to read its configuration from. 
The path of a repository is composed of the following components: - - "_kind_/_type_/_name_", where - _root_ is the index repository root - _kind_ is a locally unique string for the kind of repository, - _type_ is either the string "infostore" or the string "metastore", and - _shard_ is a locally unique string for the name of repository - - For example, the very first "blog" kind repository created at Unix time of 1538751225 (seconds since 1970 epoch) will create the following folder structure by default: - - ```go - // - // local filesystem repository file folder hierarchy. - // - // index , cfg parameter Indexer.Path, must be relative path - // - - // reposet root - // - /reposet - // reposet kind root - // - /reposet/ - // reposet root folder - // - /reposet// - // reponame, composed as: - // - window: uint64, // creation time (Unix, seconds), sharding tag - // - area: uint8, // area number, sharding tag - // - cat: uint8, // category number, sharding tag - // - offset: int64, // time since creation (seconds), recovery tag - // repo specific files and sub-folders, cfg parameters are ignored for these - // - /reposet////corpus, cfg parameter Indexer.Corpus.Path - // - /reposet////metadata, cfg parameter Indexer.Corpus.Metadata - // - /reposet//params, repo params file, no corresponding cfg parameter - // params file is common for all repos in a reposet - // -``` - -### index information space publisher/resolver - - current behavior/capability - - command execution process calls core.NewNode() that builds the peer node - object with mode = {localmode, offlinemode, onlinemode}. - - network services are only enabled when the daemon is started. - - core.NewNode() calls setupNode() which uses startOnlineServises() - dms3-fs/go-dms3-fs/core/builder.go setupNode() calls startOnlineServices() - to initialize the routing subsystem. 
startOnlineServices() calls - startOnlineServicesWithHost() to initialize the set of Host services: - func NewNode(ctx context.Context, cfg *BuildCfg) (*Dms3FsNode, error) { - n.RecordValidator = record.NamespacedValidator{ - "pk": record.PublicKeyValidator{}, - "dms3ns": dms3ns.Validator{KeyBook: n.Peerstore}, - } - builder.go initializes BuildCfg.Host = DefaultHostOption = constructPeerHost - and passes it to startOnlineServices() in core.go: - constructPeerHost() { - return dms3p2p.New(ctx, options...) - dms3-p2p/go-p2p/config/config.go - dms3-p2p/go-p2p/p2p.go - When mode == OnlineMode(), following services are initialized: - Online services initialized by core/core.go and builder.go - // Online - PeerHost p2phost.Host // the network host (server+client) - Bootstrapper io.Closer // the periodic bootstrapper - Routing routing.Dms3FsRouting // the routing system. recommend dms3fs-dht - Exchange exchange.Interface // the block exchange + strategy (bitswap) - Namesys namesys.NameSystem // the name system, resolves paths to hashes - Ping *ping.PingService - Reprovider *rp.Reprovider // the value reprovider system - Dms3NsRepub *dms3nsrp.Republisher - - (PeerHost service) - dms3-p2p/go-p2p/p2p/host/basic/basic_host.go - basic host service - dms3-p2p/go-p2p/p2p/protocol/identify - IDService node ID protocol - ID service check major/minor version to see if peer is compatible. - - (Bootstrapper service) - periodic check for known peers - core/core.go - - (Routing service) - routing.Dms3FsRouting // the routing system. 
recommend dms3fs-dht - dms3-p2p/go-p2p-kad-dht (recommended, default) - dms3-p2p/go-p2p-pubsub-router - dms3-p2p/go-floodsub () - dms3-fs/go-fs-routing/none - see bottom of core/core.go and startOnlineServicesWithHost() - type RoutingOption func(context.Context, p2phost.Host, ds.Batching, record.Validator) (routing.Dms3FsRouting, error) - var DHTOption RoutingOption = constructDHTRouting - var DHTClientOption RoutingOption = constructClientDHTRouting - var NilRouterOption RoutingOption = nilrouting.ConstructNilRouting - - the RoutingOption function is invoked by referencing one of: - DHTOption, DHTClientOption, and NilRouterOption - see cmd/dms3fs/daemon.go and ../go-dms3-fs/core/builder.go - - (Exchange service) - dms3-fs/go-fs-exchange-interface/interface.go - protocol used to exchange block with peer nodes - - (Namesys service) - dms3ns publish use case - package sequence - dms3-fs/go-dms3-fs/core/commands/name/publish.go - publish command - dms3-fs/go-dms3-fs/namesys/ - name publisher/resolve - dms3-fs/go-dms3-fs/go-dms3ns/ - name record create/update/validate - dms3-fs/go-datastore/ - local store - dms3-p2p/go-p2p-routing/ - routing store - runs local only and writes record entry to ds and routing - // ds dms3-fs/go-datastore - ds key: ds.NewKey("/dms3ns/" + base32.RawStdEncoding.EncodeToString([]byte(id))) - // r routing.ValueStore, routing dms3-p2p/go-p2p-routing - r key: "/dms3ns/"+h(pubkey) - dms3-fs/go-dms3-fs/core/commands/name/publish.go - err := n.Namesys.PublishWithEOL(ctx, k, ref, eol) - // see dms3-fs/go-dms3-fs/namesys/ - dms3ns publish/resolve impl - record, err := p.updateRecord(ctx, k, value, eol) - // Create record - entry, err := dms3ns.Create(k, []byte(value), seqno, eol) - // Set the TTL - entry.Ttl = proto.Uint64(uint64(ttl.Nanoseconds())) - data, err := proto.Marshal(entry) - // Put the new record. 
- if err := p.ds.Put(Dms3NsDsKey(id), data); err != nil - ds.NewKey("/dms3ns/" + base32.RawStdEncoding.EncodeToString([]byte(id))) - return entry, nil - return PutRecordToRouting(ctx, p.routing, k.GetPublic(), record) - // Store dms3ns entry at "/dms3ns/"+h(pubkey) - return r.PutValue(timectx, dms3nskey, data) - dms3nskey = RecordKey(pid peer.ID) string - return "/dms3ns/" + string(pid) - // see dms3-fs/go-dms3-fs/go-dms3ns/record.go - // r routing.ValueStore, see dms3-p2p/go-p2p-routing - - (Ping service) - dms3-p2p/go-p2p/p2p/protocol/ping - - (Reprovider service) - swarm exchange use case - dms3-fs/go-dms3-fs/exchange/reprovide/providers.go - provides set of pinned (DirectKeys and RecursiveKeys) cids into - routing.ContentRouting - routes cids to swarm peers - - (Dms3NsRepub service) - dms3-fs/go-dms3-fs/namesys/republisher - - - dms3-p2p/go-p2p/p2p/protocol/index - need: lookup & query - dms3-fs/go-dms3inf/dms3info.go - need: index info record management - dms3-fs/go-dms3-fs/infosys/ - dms3ns name publisher/resolver implementation - -routing strategies in -less ../../dms3-p2p/go-p2p-routing/routing.go - -routing clients ---------------- -egrep --color \.Dms3FsRouting ../go-fs-routing/offline/offline.go - -egrep --color \.Dms3FsRouting ../go-fs-routing/none/none_client.go - -egrep --color \.Dms3FsRouting ../go-dms3-fs/core/core.go -egrep --color \.Dms3FsRouting ../go-dms3-fs/core/commands/dht.go - -egrep --color \.ContentRouting ../go-bitswap/network/dms3fs_impl.go - -egrep --color \.ContentRouting ../go-dms3-fs/exchange/reprovide/reprovide.go - -egrep --color \.ValueStore ../go-dms3-fs/namesys/publisher.go -egrep --color \.ValueStore ../go-dms3-fs/namesys/routing.go -egrep --color \.ValueStore ../go-dms3-fs/namesys/namesys.go - -routing providers ------------------ -type ContentRouting interface { - // Provide adds the given cid to the content routing system. 
If 'true' is - // passed, it also announces it, otherwise it is just kept in the local - // accounting of which objects are being provided. - Provide(context.Context, *cid.Cid, bool) error - - // Search for peers who are able to provide a given key - FindProvidersAsync(context.Context, *cid.Cid, int) <-chan pstore.PeerInfo -} - -// PeerRouting is a way to find information about certain peers. -// This can be implemented by a simple lookup table, a tracking server, -// or even a DHT. -type PeerRouting interface { - // Find specific Peer - // FindPeer searches for a peer with given ID, returns a pstore.PeerInfo - // with relevant addresses. - FindPeer(context.Context, peer.ID) (pstore.PeerInfo, error) -} - -// ValueStore is a basic Put/Get interface. -type ValueStore interface { - // PutValue adds value corresponding to given Key. - PutValue(context.Context, string, []byte, ...ropts.Option) error - - // GetValue searches for the value corresponding to given Key. - GetValue(context.Context, string, ...ropts.Option) ([]byte, error) -} - -egrep --color \.Dms3FsRouting ../../dms3-p2p/go-p2p-routing-helpers/parallel.go -egrep --color \.PeerRouting ../../dms3-p2p/go-p2p-routing-helpers/parallel.go -egrep --color \.ContentRouting ../../dms3-p2p/go-p2p-routing-helpers/parallel.go -egrep --color \.ValueStore ../../dms3-p2p/go-p2p-routing-helpers/parallel.go - -egrep --color \.ContentRouting ../../dms3-p2p/go-p2p-routing-helpers/composed.go -egrep --color \.PeerRouting ../../dms3-p2p/go-p2p-routing-helpers/composed.go -egrep --color \.Dms3FsRouting ../../dms3-p2p/go-p2p-routing-helpers/composed.go -egrep --color \.ValueStore ../../dms3-p2p/go-p2p-routing-helpers/composed.go - -egrep --color \.ValueStore ../../dms3-p2p/go-p2p-routing-helpers/limited.go - -egrep --color \.Dms3FsRouting ../../dms3-p2p/go-p2p-routing-helpers/null.go - -egrep --color \.Dms3FsRouting ../../dms3-p2p/go-p2p-routing-helpers/tiered.go - -egrep --color \.ContentRouting 
../../dms3-p2p/go-p2p-pubsub-router/pubsub.go -../../dms3-p2p/go-p2p-pubsub-router/pubsub.go - -recored validators -egrep --color Validator ./core/builder.go -egrep --color Validator ../go-dms3ns/record.go -egrep --color Validator ../../dms3-p2p/go-p2p-kad-dht/opts/options.go -egrep --color Validator ../../dms3-p2p/go-floodsub/pubsub.go -egrep --color Validator ../../dms3-p2p/go-p2p-record/validator.go -egrep --color Validator ../../dms3-p2p/go-p2p-record/pubkey.go -egrep --color Validator ../../dms3-p2p/go-p2p-pubsub-router/pubsub.go - -for command examples of record get/put/validate, see: -dms3fs name publish (and resolve) -dms3fs name pubsub * -dms3fs pubsub * - - -## Index and Query Commands - - -### Index Commands - -Example blog use case: - -```bash -dms3fs index config show # show index configuration -dms3fs index config --json Metadata {} # reset all index metadata -dms3fs index config --json Metadata.Kind [{}] # reset kind metadata -dms3fs index config --json Metadata.Kind '[{"Name": "blog", "Field": ["author","name"]}]' # set blog fields -dms3fs index config --json Metadata.Kind '[{"Name": "blog", "Field": ["About", "Address", "Affiliation", "Author", "Brand", "Citation", "Description", "Email", "Headline", "Keywords", "Language", "Name", "Telephone", "Version"]}]' # set blog fields -dms3fs index config --json Metadata.Kind # show metadata kinds - -dms3fs index mkidx -k=blog -n myblog # make blog kind infostore -dms3fs index mkdoc -k=blog > b.xml # make empty blog template - # edit the blog document: b.xml -dms3fs index addoc b.xml # add blog to infostore -dms3fs index rmdoc # remove doc from repository - -dms3fs index mkidx -k=blog -n myblog # make metastore for blog infostore -dms3fs index mkdoc -k=blog > n.xml # make empty blog metastore template - # edit metastore document: ns.xml -dms3fs index addoc ns.xml # add it to metastore infostore -dms3fs index publish # publish path, infostore or metastore - -dms3fs index ls # list infostore -dms3fs index 
stats # display infostore stats -dms3fs index show # show service status -dms3fs index start # start service -dms3fs index stop # stop service -dms3fs index restart # reset/restart service -dms3fs index recover # rebuild index data repository -``` -Notes: -[cmd/dms3fs/daemon.go]serveHTTPApi registers http.ServeMux url handlers under "/api" and creates a goroutine to serve the url endpoints. - -[core/corehttp/corehttp.go] module defines most of the urls served by the daemon, and defines the Serve function called by the goroutine created in -[cmd/dms3fs/daemon.go]serveHTTPApi. - -[core/corehttp/commands.go] module defines CommandsOption function that constructs a ServerOption for hooking commands into the HTTP server. - -all commands under [core/commands/root.go]Root and [core/commands/root.go]RootRO are hooked as daemon api url endpoints, and executed via the daemon. cli command Root is defined in [cmd/dms3fs/dms3fs.go] along with cmdDetailsMap, which restricts command execution environment/requirement. - -the index repository management command cmdDetails should be updated in the cmdDetailsMap to properly mark commands that cannot run on daemon (such as index/config/edit). -All index repository management commands are wired in core/commands/root.go -the commands are implemented in commands/index/*.go. the configuration command is implemented in commands/idxconfig.go, because of dependency on private function unwrapOutput in package commands. core functions are located in core/coreindex/, coreapi in core/coreapi/index.go, core/coreapi/interface/index.go, core/coreapi/interface/options/index.go (core api needs updates, currently testing in local environment). - -### Query Commands - -```bash -dms3fs index query -options... ... # find docs in infostore -``` - -### Kind of Content - -Documents in a infostore should be structured using common metadata fields to enable more refined matching when searching using a robust query language. 
- -Users are free to choose any preferred document metadata field structure. Standardized fields improve ease of discovery use of common search query patterns. - -Significant collaborative community effort has been invested to create, maintain, and promote common [**_schema vocabulary_**](https://schema.org/) for structured data on the Internet. - -As an advocate for personal privacy, DMS3 takes a minimalist approach to limit use of metadata vocabulary to that which is relevant for the sharing endpoints. DMS3 is not an advocate for collecting and aggregating personal data for the promotion and placement of sponsored messages targeting individuals. - -DMS3 suggested subset vocabulary usage is discussed below. We encourage community discussions and and suggestions for what may constitute appropriate common metadata for various kind of content. Use of suggested vocabulary is strictly at the option of the information source. - -#### Blog Kind of Content - -This secion defines the metadata fields for Blog content. - -##### [Blog](https://schema.org/Blog "Schema Org. Blog") metadata fields +#### [Blog](https://schema.org/Blog "Schema Org. Blog") metadata fields - About - The subject matter of the content @@ -569,7 +38,7 @@ This secion defines the metadata fields for Blog content. - Description property from [Thing](https://schema.org/thing "Schema Org. Thing") - A description of the content -##### [Person](https://schema.org/person "Schema Org. Person") metadata fields +#### [Person](https://schema.org/person "Schema Org. Person") metadata fields - Address - Physical address of the person. diff --git a/docs/why.md b/docs/why.md index c5c6d7700f80c4de5abd3f636f0616fdbfde4d86..5b43aa05b57ef2ef88229d567aba3f62f8a9e87d 100644 --- a/docs/why.md +++ b/docs/why.md @@ -213,3 +213,9 @@ In "An Ugly Truth: Inside Facebook's Battle for Domination," New York Times repo The new book explores in-depth the inner workings of the company and its top executives. 
The word ugly in the title comes from a memo written by one of Facebook's own vice presidents, with Frenkel and Kang’s reporting highlighting that many of the platform’s perceived flaws are deliberate design choices. “So many of Facebook's problems are built into the way that they do business,” Frenkel says. “The very business model that they're premised on … is to keep you online.” + +##### [Apple says it will reject any government demands to use new child sexual abuse image detection system for surveillance](https://www.cnbc.com/2021/08/09/apple-will-reject-demands-to-use-csam-system-for-surveillance-.html) + +"Some cryptographers are worried about what could happen if a country such as China were to pass a law saying the system also has to include politically sensitive images. Apple CEO Tim Cook has previously said that the company follows laws in every country where it conducts business." + +“It’s truly disappointing that Apple got so hung up on its particular vision of privacy that it ended up betraying the fulcrum of user control: being able to trust that your device is truly yours,” technology commentator Ben Thompson wrote in a newsletter on Monday. 
diff --git a/mkdocs.yml b/mkdocs.yml index ec98454221b53bd55e033e88c114848aefb5f7fe..58de9ec7aac13b1dbc83e93240ee99fccfbdaea6 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,5 +1,6 @@ site_name: DMS3 documentation +site_url: https://dms3.io/ nav: - Home: index.md @@ -11,14 +12,20 @@ nav: - Architecture Overview: arch.md - High-level Design: design.md - Roadmap: roadmap.md + - Appendix Notes: notes.md # - Example Math and Diagrams: diagrams.md -# - Implementation Notes: notes.md theme: # name: readthedocs name: rtd-dropdown custom_dir: docs/ +plugins: + - search + - mermaid2: + arguments: + theme: 'dark' + markdown_extensions: - footnotes - pymdownx.arithmatex @@ -39,11 +46,13 @@ markdown_extensions: format: !!python/name:pymdownx.arithmatex.fence_mathjax_format extra_css: -# - https://unpkg.com/mermaid@7.1.2/dist/mermaid.css + - 'css/theme_extra.css' + - https://unpkg.com/mermaid@7.1.2/dist/mermaid.css extra_javascript: - 'js/converter.js' - - 'js/mermaid.min.js' - - 'js/mermaid.min.js.map' +# - 'js/mermaid.min.js' + - https://unpkg.com/mermaid/dist/mermaid.min.js +# - 'js/mermaid.min.js.map' - 'https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.9.1/underscore-min.js' - 'https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.9.1/underscore-min.js.map' - 'https://cdnjs.cloudflare.com/ajax/libs/raphael/2.3.0/raphael.min.js'