|
From: | Ergus |
Subject: | Re: Question collaborative editing. |
Date: | Thu, 1 Oct 2020 01:11:59 +0200 |
On Wed, Sep 30, 2020 at 09:08:27PM +0300, Eli Zaretskii wrote:
From: Qiantan Hong <qhong@mit.edu> CC: Ergus <spacibba@aol.com>, Fermin <fmfs@posteo.net>, Jean Louis <bugs@gnu.support>, Noam Postavsky <npostavs@gmail.com>, Emacs developers <emacs-devel@gnu.org>, Karl Fogel <kfogel@red-bean.com>, Stefan Monnier <monnier@iro.umontreal.ca> Date: Wed, 30 Sep 2020 17:48:13 +0000 > I'm sorry, I probably miss too much background information here, > because I don't really understand what is the issue you are describing > here. E.g., why is user intent important, when all you need is to > keep buffer text synchronized between several sessions? That’s the baseline for collaborative editing, there’s apparently solution that satisfies this single property (convergence property) but are extremely bad — e.g. the effect of some user input takes no effect, or a trivial program displaying an empty buffer for every user. Actually if we only want this property there’s much simpler way than CRDT/OT, e.g. let the server (or one client) send the whole buffer periodically. One example of diff not preserving user intent is: suppose in the buffer there is “one two one” And users agree that this sentence should always end in “one”. Now user A deleted the first two words “one two “ user B inserted “three” between the first “one” and “two” However, the diff have no idea about users’ agreement and what exactly user does (unless we get it from modification hooks). For user A, the diff just see “one” after and assume that A delete the suffix “two one”. The CRDT then does it job correctly and gives final synchronization result “one three”. Clearly, user intent is violated.I understand all that, but (a) you can overcome that by techniques different from CRDT, and (b) unless we mean very different things by "user intent", the intent is not important here. What is important, AFAIU, is to specify the changes in a way that will produce the desired result even if the buffer in the other Emacs was meanwhile modified.
Hi Eli: I will probably describe the obvious, but it seems that there is a miss communication here. The CRDT algorithm works in a way that every single character in the buffer has a unique ID if you have "ab" then a could be zero and b could be 1... Inserting a char between them (acb) will make c be 0.5; adding another (acdb) will make d=0.75 (this is just a simplified example) So adding a letter only needs to contain the new number and the char to insert {0.5; c} {0.75; d}. And to insert "c" we only need to find which char in the buffer are x < 0.5 and x+1 > 0.5. If I understood the Qiantan idea; his approach is to add such ID as a text property within the emacs buffer. To modify the buffer it is only needed to search (go to) the property ID of text chars/words whatever in the emacs buffer and perform the action. So the error probability is smaller because the IDs ARE in the text itself. This will reduce undetected errors inserting in the wrong positions when translating form the CRDT ID to the real "global" buffer position to do an "insert". Which could happen in some corner cases... (lets say basically synchronization errors between local modifications (which modify local absolute positions) and processing remote ones using outdated information (translating to global indices)... Said that; the external library approach seemed to work with tandem https://github.com/jscheid/tandem-emacs I am not sire how well it did. My previous concerns about the internal storage and performance was because the linear search of text properties in a long linear buffer may be very expensive as described in the refereed paper; so using a two steps search improved performance as described there... There are also other optimizations like grouping operations or remove redundant ones that with the C approach could reduce the operation that arrive to emacs. Probably in our case the improvement may be to use a bisection search and position hints. Any way. I am pretty sure that the Qiantan lisp only implementation could work; but performance is something that needs to be tested (hopefully performance issues will be negligible compared to network latency). The C library implementation was my utopic ideal implementation in order to make it efficient, portable and reusable with other editors... But it will somehow require to start a completely independent project from scratch and would require much more time to implement (algorithm, network handling, json, synchronization between clients and with emacs)... And basically emacs provides all that. Maybe this time simple is not better, but more realistic.
[Prev in Thread] | Current Thread | [Next in Thread] |