4. Design rationale
This proposal introduces a new denylist format which aims to fulfil the
following aspects, which are a must for such a system:
- Efficient parsing at scale. Compact.
- Simplicity and extensibility for extra features, both in future versions of
the spec and in custom systems.
- Easy to read and to understand.
- Integration-ready: Avoid the requirement of custom tooling or implementation.
support to manage denylists. Text-file operations as interface for
list-editing.
- Support the necessary types of blocks (by cid, by path, double-hash etc.)
needed by users and operators.
- IPFS and DAGification friendly.
The proposed design is part of a holistic approach to content-moderation for IPFS for which we have the following detailed wishlist of items ultimately related to the denylist format:
-
Regarding the type of blocking:
- Ability to block content from being retrieved, stored or served by multihash
- Ability to block content that is referenced with an IPFS-path from a blocked multihash or traversing a blocked multihash.
- Ability to block by regexp-matching an IPFS path
- Ability to block based on content-type (i.e. only store/serve plain-text,and pictures)
- Ability to block based on CID codec (only allow Codec X)
- Ability to block based on multihash function (”no identity multihashes”)
- Ability to block IPNS names
-
Regarding the lists:
- Compact format, compression friendly
- Line-based so that updates can be watched
- Lists support CIDs
- Lists support CIDs+path (explicit)
- Lists support CIDs+path (implicit - everything referenced from CID)
- Lists support double-hashed multi-hashes
- Lists support double-hashed cid+path (current badbits format)
- Lists can be edited by hand on a text editor
- Lists are ipfs-replication-friendly (adding a new entry does not require downloading more than 1 IPFS block, to sync the list).
- Lists support comments
- Lists support gateway http error hints (i.e. type of block)
echo "/ipfs/cid" >> ~/.config/ipfs/denylists/custom
should work
- Lists have a header section with information about the list.
-
Regarding the implementation:
- Multiple denylists should be supported
- Hot-reloading of list (no restart of IPFS required)
- List removal does not require restart
- Minimal introduction of latency
- Minimal memory footprint (i.e. only read minimum amount of data into memory)
- Clean denylist module entrypoints (easy integration in current ipfs stack layers)
- Portable architecture (to other IPFS implementations). i.e. good interfaces to switch from an embedded implementation to something that could run separately, or embedded in other languages (i.e. even servicing multiple ipfs daemons).
- Text-based API.
ipfs deny <cid>
and the like are nice-to-have but not a must to work with denylists.
- Security in mind: do not enable amplification attacks through lists (i.e. someone requesting a recursively blocked CID repeteadly over the gateway endpoint causes traversal of the whole CID-DAG.
-
Regarding list distribution:
- Ability to subscribe to multiple lists, and fetch any updates as they happen
- Ability to publish own lists so that others can subscribe to them
- List-subscription configuration or file details remote lists that the user is subscribed to. Editable by hand.
- Ability to subscribe to list subscriptions.
- List subscriptions can carry context (i.e. publisher, email, type of blocking.
4.1 User benefit
Users and developers will benefit from a list format that is easy to work with because:
- It can be understood by just looking at it.
- It can be edited by hand.
- Implementations can choose to support different aspects (i.e. blocking but no optional hints).
- Denylist parsers are easy and stupid.
4.3 Alternatives
This proposal is a follow up to a previous proposal, which has several shortcomings that make it not very practical when working at scale. Both list formats can co-exist though but ultimately it will be a matter of implementation support, and it would be better to settle on one thing.
It is also a followup on the "badbits" denylist format, which has similar issues and is not flexible enough.
4.4 Copyright
Copyright and related rights waived via CC0.