Update protocol to match automerge-repo JS 1.0 #67
Conversation
Context: the automerge-repo JS implementation supports a request workflow for syncing a document we don't have. In this workflow the requesting peer sends a "request" message which is identical to the current sync message except tagged with a different message type. Responding peers can then reply with either a sync message or an "unavailable" message, which allows the receiver to tell the difference between a document the other end doesn't have and a document the other end has but which is empty.

Problem: in the repo loop we have no way of telling the difference between a request for a new document and announcing a document we have, because both situations are expressed using the `RepoEvent::NewDoc` event.

Solution: split `NewDoc` into a `RequestDoc` and a `NewDoc` event. This allows us to send request messages for `RequestDoc` and sync messages for `NewDoc`.
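A minimal sketch of that split, with illustrative placeholder types (these are not the crate's exact definitions), might look like this:

```rust
// Hedged sketch, not the crate's actual definitions: splitting the single
// "new document" event lets the repo loop pick the right wire message.
type DocumentId = String; // placeholder for the real DocumentId type

#[allow(dead_code)]
enum RepoEvent {
    /// A document we already have locally: announce it to peers with sync messages.
    NewDoc(DocumentId),
    /// A document we don't have: send "request" messages, which peers answer
    /// with either a sync message or "unavailable".
    RequestDoc(DocumentId),
}

fn main() {
    let event = RepoEvent::RequestDoc("doc-123".to_string());
    match event {
        RepoEvent::NewDoc(id) => println!("announce {id} with a sync message"),
        RepoEvent::RequestDoc(id) => println!("send a request message for {id}"),
    }
}
```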
Force-pushed from 77d267a to 782b2c3.
Force-pushed from 782b2c3 to c012dfb.
Context: when a peer requests a document from us we often don't want to just check our own storage; we also want to ask all the other peers we are connected to if they have the document. This is the case for e.g. a sync server which is relaying documents between two peers.

Problem: the current implementation just checks local storage.

Solution: when we receive a request, first send a request out to all the peers we are connected to, and only once they have all responded send a response to the requestor.
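A rough sketch of the bookkeeping this implies (illustrative names, not this crate's actual types):

```rust
// Hedged sketch: track which peers a forwarded request is still waiting on,
// and only answer the original requestor once every peer has replied
// (or one of them turned out to have the document).
use std::collections::HashSet;

type RepoId = String;

struct PendingRequest {
    /// Peers we forwarded the request to and have not heard back from yet.
    awaiting: HashSet<RepoId>,
    /// Peers that asked us for the document and still need an answer.
    requestors: HashSet<RepoId>,
}

impl PendingRequest {
    /// Record a reply from a forwarded peer. Returns true once every peer has
    /// answered, at which point we can tell the requestors "unavailable"
    /// (or start syncing, if anyone had the document).
    fn peer_replied(&mut self, peer: &RepoId) -> bool {
        self.awaiting.remove(peer);
        self.awaiting.is_empty()
    }
}

fn main() {
    let mut req = PendingRequest {
        awaiting: ["peer-a".to_string(), "peer-b".to_string()].into_iter().collect(),
        requestors: ["peer-c".to_string()].into_iter().collect(),
    };
    assert!(!req.peer_replied(&"peer-a".to_string()));
    assert!(req.peer_replied(&"peer-b".to_string()));
    println!("all peers replied; respond to {:?}", req.requestors);
}
```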
Force-pushed from c012dfb to 6aa2fd6.
LGTM, although I didn't have enough time yet to review in full detail...
Good to see many existing TODOs taken care of...
I think I have finally wrapped my head around the logic.
At a very high level I'm wondering: why separate the new request concept from the existing peer connection and document info? I find the back and forth between these hard to read, and I'm hoping the logic can be consolidated by updating the existing concepts, as opposed to introducing a new and separate one (which is not logically separate anyway, because of the back and forth).
A couple of things:
- The equivalent of `Self::fail_request` could just be called in the main loop on all requests that are complete. The benefit is that that logic is in one place, as opposed to being called at different points.
- I'm wondering if `Request` could not be replaced by a combination of an `awaiting_response_from: HashSet<RepoId>` on `DocumentInfo`, while `awaiting_our_response` could be inferred from the existence of `PeerConnection::PendingAuth` (perhaps with additional data to differentiate a peer who requested the document, like a boolean) when the doc state transitions out of the bootstrap state? (A rough sketch of what I mean is below, after this comment.)
- `Request::initiate_local` seems to me the same as having a document in the bootstrap state without any peers awaiting our response?
In all cases, the goal is to not have to read all the code to understand the logic; just looking at the enums and structs should be enough. Currently this is not possible because the request is separate from the document info, and the "state" of the algorithm appears to be shared between the state of the request and the state of the document info.
It would also be beneficial to layer the logic, as opposed to having what seem like special conditions, such as in the handling of `NetworkEvent::Request`, which skips the share auth phase if the document is not syncing.
Layering would look like:
- Always first get the share decisions
- If sharing is accepted, start syncing if the document is in `DocState::Sync`, otherwise request it from other peers.
- As sync messages or `Unavailable` events are handled, update the document info and/or peer connection and send outgoing messages.
In other words, always follow the same steps, but with different data.
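To make the suggestion concrete, here is a hedged sketch of what that consolidation might look like (field and variant names are illustrative, not the crate's real types):

```rust
// Hedged sketch of the suggested consolidation: keep the request bookkeeping
// on the document info itself so the algorithm's state is readable from the
// structs alone, rather than in a separate `Request` state machine.
use std::collections::HashSet;

type RepoId = String;

#[allow(dead_code)]
enum DocState {
    /// We don't have the document yet; these are the peers we asked and have
    /// not heard back from.
    Bootstrap { awaiting_response_from: HashSet<RepoId> },
    /// Normal symmetric sync with peers.
    Sync,
}

struct DocumentInfo {
    state: DocState,
    /// Peers who requested the document from us and still need an answer.
    awaiting_our_response: HashSet<RepoId>,
}

fn main() {
    let info = DocumentInfo {
        state: DocState::Bootstrap { awaiting_response_from: HashSet::new() },
        awaiting_our_response: HashSet::new(),
    };
    // Once nothing is awaited in either direction, the request is complete.
    let complete = matches!(
        &info.state,
        DocState::Bootstrap { awaiting_response_from } if awaiting_response_from.is_empty()
    ) && info.awaiting_our_response.is_empty();
    println!("request complete: {complete}");
}
```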
```rust
        request::Request::new(document_id.clone())
    });

    if info.state.is_bootstrapping() {
```
Perhaps move this to the `DocState::Bootstrap` match arm above?
| "responding to request with sync as we have the doc" | ||
| ); | ||
| // if we have this document then just start syncing | ||
| Self::enqueue_share_decisions( | 
Why don't we do this first in all cases?
This is a fairly chunky change, the primary reason for which is that the automerge-repo JS protocol introduces a request workflow for requesting documents you don't have, in contrast to the symmetric sync protocol we have been using so far. This requires the addition of two new message types: a "request" and an "unavailable" type. The requesting peer sends a request and the responding peer then replies with either a "sync" or an "unavailable" message. Some further complexity is introduced by the fact that each peer is expected to only return unavailable once it has queried both its storage and all the other peers it knows about.
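Roughly, the message shapes this describes might look like the following (a hedged sketch; variant and field names are assumptions, not the actual wire format):

```rust
// Hedged sketch of the three message kinds described above.
type DocumentId = String;
type SyncPayload = Vec<u8>; // placeholder for an encoded automerge sync message

#[allow(dead_code)]
enum Message {
    /// Ordinary symmetric sync traffic for a document both peers know about.
    Sync { document_id: DocumentId, data: SyncPayload },
    /// Same payload as `Sync`, but tagged so the receiver knows the sender
    /// does not have the document and is asking for it.
    Request { document_id: DocumentId, data: SyncPayload },
    /// Sent only after the responder has checked its own storage and asked
    /// all of its other connected peers.
    Unavailable { document_id: DocumentId },
}

fn main() {
    let msg = Message::Unavailable { document_id: "doc-123".to_string() };
    if let Message::Unavailable { document_id } = msg {
        println!("no one has {document_id}");
    }
}
```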
To implement this, there are the following major changes:

- `request_document` now returns an `Option` (a toy sketch of the calling convention is below). This is the primary advantage of the new workflow: you no longer need a timeout in the common case when no one has the document; instead everyone will return `Unavailable` and we can return `Ok(None)` from `request_document`.
- `RepoEvent::NewDoc` is split into `RepoEvent::NewDoc` and `RepoEvent::RequestDoc`, so that we can distinguish between the two cases and decide whether or not to send request messages to our peers.
- `Repo::new_document` now returns a future, which allows us to move the `DocumentInfo` creation into the repo loop.
- A `Request` state machine is tracked by the `Repo` to figure out when to send requests to other peers on behalf of incoming requests and when a request is complete.

This brings us up to interop with the current JS implementation, with the exception of ephemeral messages, which I have another patch for that I will submit once this is merged.
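A toy sketch of the `Ok(None)` calling convention mentioned above (the stub below is a stand-in, not the crate's real `request_document` signature):

```rust
// Hedged sketch: callers can now tell "nobody has this document" apart from
// an error or a timeout. The stub below is illustrative only.
type DocumentId = String;

#[derive(Debug)]
struct DocHandle(DocumentId);

#[derive(Debug)]
#[allow(dead_code)]
enum RepoError {
    Shutdown,
}

// Toy stand-in for requesting a document from the repo after this change.
fn request_document(id: DocumentId, some_peer_has_it: bool) -> Result<Option<DocHandle>, RepoError> {
    if some_peer_has_it {
        Ok(Some(DocHandle(id)))
    } else {
        Ok(None) // every peer answered "unavailable"
    }
}

fn main() {
    match request_document("doc-123".to_string(), false) {
        Ok(Some(doc)) => println!("found: {doc:?}"),
        Ok(None) => println!("all peers returned unavailable; no timeout needed"),
        Err(e) => println!("repo error: {e:?}"),
    }
}
```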
This will be a breaking change to the network protocol and would thus require a network-wide upgrade. It is possible that we could write a compatibility layer which we could maintain for a while. @issackelly is that something you would need, or is a big bang upgrade okay for you?