Implementing a World Wide Database on an EOS blockchain


#1

Hello Community,

My apologies in advance if there is a more appropriate category for this discussion.

Please have a look at this list of data sources. It is a list of databases implemented in a standard called the Resource Description Framework. RDF allows the world to share a web-scale database schema in which shared, reused URIs replace both primary keys and table/column names. These open schemas are called ontologies, and they are maintained by communities of enthusiasts in the field being modeled (for more info on the semantic web vision, this is a good place to start).

The problem is that our RDF databases currently live in silos: like most applications, they run in isolated processes with isolated state, which makes federated query results a difficult problem to solve (no viable solutions are known, AFAIK). When I started studying the EOS vision, I began to wonder whether the EOS blockchain could let you take the persistence object of one of these databases and swap it for an EOS smart contract running on the blockchain. Suppose the database writes and reads its state to and from an rdf.db file, and uses db_file_source to manage file I/O. If I swapped db_file_source for eos_source, where eos_source implemented a file interface but read from and wrote to an EOS blockchain, would that be feasible from a performance standpoint? The focus here is on transaction relay speed (which I haven't been hearing much about), not so much the volume of transactions in a given time.
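To make the swap concrete, here is a minimal sketch in Python of what an eos_source with a file-like interface might look like. All names here are assumptions for illustration (db_file_source and eos_source are hypothetical components), and a plain list stands in for the chain:

```python
class EosSource:
    """Hypothetical drop-in for db_file_source: same read/write surface,
    but every write becomes an appended transaction instead of a
    mutation of a local rdf.db file."""

    def __init__(self, chain):
        # `chain` stands in for the EOS smart contract's transaction log.
        self.chain = chain

    def write(self, offset, data):
        # One I/O operation == one recorded transaction.
        self.chain.append((offset, bytes(data)))

    def read(self, offset, length):
        # Naive read: rebuild the full file state from the ordered
        # transaction history, then slice out the requested range.
        size = max((o + len(d) for o, d in self.chain), default=0)
        state = bytearray(size)
        for o, d in self.chain:
            state[o:o + len(d)] = d
        return bytes(state[offset:offset + length])

src = EosSource([])
src.write(0, b"hello world")
src.write(6, b"eosio")
print(src.read(0, 11))  # later transactions overwrite earlier bytes
```

The naive read here replays the whole history on every call; whether that can approach file-backed performance is exactly the open question.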

What I’m proposing is that all these RDF databases write their state (perform their CRUD operations) against one common persistence object: EOS. That way, when I query one of them, that instance of the database can both look up and reason against the “global persistence object” to derive a much richer and more diverse set of results. Only the low-level persistence object needs to run on the blockchain, which would minimize the effort required to port the databases over to EOS.

My question is: can EOS meet or exceed the performance of a file-backed persistence layer? If so, my next question is: is there anyone here who would be willing to contribute to an open-source effort to build an EOS-backed persistence layer for RDF databases? Or could you point me to places where I could find such people? I suspect that if EOS can handle this requirement, there are a lot of enthusiastic folks in the semantic web community who would be interested in a viable solution for federated queries.


#2

Maybe a visual example or a short paper could elaborate further on this idea?


#3

I think this sounds intriguing. I look forward to the visual example or white paper.
-Thomas


#4

Hello,

For this example I will be using Virtuoso, a quad store written in C++. This RDF database has a module, disk.c, which implements a file access manager over xyz.db, the file that persists the database state. I propose swapping the xyz.db file for a named pipe, e.g. a FIFO special file, and putting a script (an “eos adaptor”) on the other end of the pipe that serves as Virtuoso’s proxy to the EOS smart contract. This minimizes the level of effort required for existing quad stores to interface with the blockchain. There may be a better way to abstract the persistence layer (e.g. extract an interface from disk.c, then implement an eos_disk.c), but the above seemed to me to be the path of least resistance.
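As a sketch of the adaptor side of the pipe (Python; the wire format, function names, and the dispatch callback are all invented for illustration, not an existing protocol):

```python
def parse_request(line):
    """Split one message from Virtuoso's end of the pipe into an op.
    Hypothetical wire format: b'W <offset> <hex payload>' for writes,
    b'R <offset> <length>' for reads."""
    parts = line.split()
    if parts[0] == b"W":
        return ("write", int(parts[1]), bytes.fromhex(parts[2].decode()))
    if parts[0] == b"R":
        return ("read", int(parts[1]), int(parts[2]))
    raise ValueError("unknown op: %r" % line)

def adaptor_loop(fifo_path, dispatch):
    """Block on the FIFO special file; each line from Virtuoso's side
    becomes one call into whatever `dispatch` uses to talk to the EOS
    smart contract."""
    with open(fifo_path, "rb") as fifo:
        for line in fifo:
            dispatch(parse_request(line))
```

The real framing would have to match however disk.c actually buffers its writes; the point is only that the adaptor is a thin translator between pipe traffic and contract actions.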

High-Level Architecture
As an example of how a disk write will work with the eos adaptor, the buffer used by disk.c will pass the write offset and total_bytes to the EOS blockchain, and this transaction (and nothing more) will be recorded on the blockchain, respecting its index in the sequence of I/O transactions. On a read request, the buffer will pass the read offset and length to EOS. To derive the current state of the file inside that byte window, the smart contract will do the equivalent of running the transaction history from beginning to end. As an optimization, as it runs each transaction/state transition, it will only consider (i.e. read and record) the bytes passing through the requested window. It will then return the state/content in the window to the adaptor, which passes the content back to the file buffer. In this way, the state machine (the blockchain) could model an xyz.db file of indefinite length and manage access to that file for an indefinite number of Virtuoso instances.
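The window-only replay described above can be sketched like this (Python; read_window and the (offset, data) log layout are assumptions for illustration):

```python
def read_window(log, offset, length):
    """Derive the current content of the byte window
    [offset, offset + length) by replaying the ordered transaction
    history. Only bytes passing through the requested window are read
    and recorded; everything else is skipped."""
    window = bytearray(length)  # never-written bytes stay zero
    for tx_offset, data in log:
        # Intersection of this write with the requested window.
        start = max(tx_offset, offset)
        end = min(tx_offset + len(data), offset + length)
        if start < end:
            window[start - offset:end - offset] = \
                data[start - tx_offset:end - tx_offset]
    return bytes(window)

# The log is the ordered sequence of write transactions on the chain.
log = [(0, b"hello world"), (6, b"eosio")]
print(read_window(log, 6, 5))  # the later transaction wins in-window
```

Replaying in order means later transactions naturally overwrite earlier ones inside the window, so no separate conflict resolution is needed.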

Interoperability and Data Access Control
If all the data in this blockchain-backed database were public (e.g. DBpedia), then data access control would not be an issue. But if we wish to store private data on this blockchain, then role-based access control would be needed to manage which transactions may be considered when deriving the file state. Those parts of the requested window which are not authorized are simply “redacted”.
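A sketch of the redaction step (Python; the authorized-range representation is an assumption for illustration, and a real implementation would derive those ranges from on-chain role assignments):

```python
def redact(window, window_offset, authorized_ranges, fill=0x00):
    """Blank out every byte of a derived window that falls outside the
    caller's authorized (start, end) byte ranges in the file."""
    out = bytearray([fill]) * len(window)
    for start, end in authorized_ranges:
        # Clip each authorized range to the requested window.
        lo = max(start, window_offset)
        hi = min(end, window_offset + len(window))
        if lo < hi:
            out[lo - window_offset:hi - window_offset] = \
                window[lo - window_offset:hi - window_offset]
    return bytes(out)

# Caller may see bytes [0, 6) of the file; the rest is redacted.
print(redact(b"secretdata", 0, [(0, 6)]))
```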

Semantic interoperability between clients is built into the language the CRUD operations are written in, so we get this requirement for free.

Simple Use Case
To give an example, suppose the Pet Lover’s Association creates a Dog ontology and inserts facts about different dog breeds. Suppose also that Acme, Co. has created an ontology for defining service requests. These ontologies determine the structure of entries in the database (they define the metadata attributes).

Now suppose I am searching for a dog sitter. I will use my favorite RDF database client to insert a “record” in the database representing my request. I will begin by entering the keywords “New Service”. The client will match my keywords to the most popular URI; let’s assume it is the one provided by Acme, Co.'s service ontology. To determine how to display Acme, Co.'s schema as a form, the client will call an EOS smart contract which accepts a URI, retrieves the corresponding schema, and returns the appropriate viewer code (JavaScript, CSS, etc.). It would act similarly to how XSLT works. The model->viewer mappings are inserted as records into the RDF database by the crowd. The resulting code is then placed onto the “canvas” of the RDF client. This can happen recursively for elements within the Service schema, for example those controlled by parties outside Acme, Co., such as time (which might map to a date picker). The form will be pre-filled with data from the blockchain (e.g. my RDF “space” on EOS containing my contact info, or a list of dog breeds maintained by the PLA) whenever possible.
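As a toy sketch of the schema-to-viewer lookup (Python; the URIs, snippets, and function name are all made up for illustration, and real viewer code would be JavaScript/CSS as described above):

```python
# Crowd-inserted model->viewer mappings, keyed by schema URI.
VIEWER_MAP = {
    "http://acme.example/ont/ServiceRequest":
        "<form><textarea name='description'></textarea></form>",
    "http://www.w3.org/2001/XMLSchema#dateTime":
        "<input type='datetime-local'/>",  # e.g. a time maps to a date picker
}

def resolve_viewer(uri, viewer_map):
    """What the smart contract would do: accept a URI, look up the
    registered schema->viewer mapping, and return the viewer code.
    (Recursing into nested schema elements is left out of this sketch.)"""
    return viewer_map.get(uri, "<pre>no viewer registered</pre>")

print(resolve_viewer("http://acme.example/ont/ServiceRequest", VIEWER_MAP))
```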

A service provider will then subscribe to service requests, and she will view lists of requests in the manner described above, meaning the list will be a structured document which is transformed into viewer code by the blockchain. Her subscription is likewise a record registered on the blockchain by the service provider (more specifically, a SPARQL query that is triggered each time someone inserts a record on the blockchain involving properties and/or subjects appearing in the query). The blockchain will push her query result sets to the EOS account of her choice. This means she can push the result set to another smart contract of her own authoring (an “agent”), perhaps to provide automated services for the request’s creator. In this way, human-to-machine and machine-to-machine service requests are mixed in with human-to-human service requests.
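The subscription trigger could be sketched as follows (Python; real subscriptions would hold full SPARQL queries, this sketch only does the cheap term-matching step that decides which queries to re-run, and every URI and account name here is invented):

```python
def matching_subscriptions(triple, subscriptions):
    """Return subscriptions whose registered query mentions the inserted
    triple's subject or predicate. A real trigger would then re-run each
    matching SPARQL query and push its result set."""
    subject, predicate, _obj = triple
    return [sub for sub in subscriptions
            if subject in sub["terms"] or predicate in sub["terms"]]

subs = [{"account": "dogsitter1",  # EOS account to push result sets to
         "terms": {"http://acme.example/ont/serviceType"}}]
inserted = ("http://example.org/request/42",
            "http://acme.example/ont/serviceType",
            "dog-sitting")
for sub in matching_subscriptions(inserted, subs):
    print("push result set to", sub["account"])
```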

Performance
Another consideration is that blocks are created every 3 seconds, and create/insert/delete DB transactions are recorded in blocks in this model. The latency will need to be on par with file I/O operations. Some delay would perhaps be tolerable as a tradeoff for federated querying, but an average query response time of a few seconds or less should be the goal.

I am eager to know your thoughts.


#5

Hi

It is astonishing that you have not gotten answers from the developers here yet. :frowning:

In my humble opinion I say definitely yes - it is possible.

If private data is stored on the blockchain, would it be possible to offload it to a sidechain which is maintained externally?

Does a node store the blockchain in full, or can a node store it only partially? If we start to push the data sources from your list onto the blockchain, the size of the blockchain will explode.

Thank you.