Adds support for additional, optional content type options that allow the client and server to signal or negotiate a specific block order in the returned CAR.
We want to make it easier to build light-clients for IPFS. We want them to have low memory footprints on arbitrary sized files. The main pain point preventing this is the fact that CAR ordering isn't specified.
This requires keeping some kind of reference either on disk, or in memory to previously seen blocks for two reasons.
Blocks can arrive out of order, meaning when a block is consumed (data is read and returned to the consumer) and when it's received might not match.
Blocks can be reused multiple times, this is handy for cases when you plan to cache on disk but not at all when you want to process a stream with use & forget policy.
What we really want is for the gateway to help us a bit, and give us blocks in a useful order.
The existing Trustless Gateway specification does not provide a mechanism for negotiating the order of blocks in CAR responses.
This IPIP aims to improve the status quo.
CAR content type
(application/vnd.ipld.car
)
already supports version
parameter, which allows gateway to indicate which
CAR flavor is returned with the response.
The proposed solution introduces two new parameters for the content type headers
in HTTP requests and responses: order
and dups
.
The order
parameter allows the client to indicate its preference for a
specific block order in the CAR response, and the dups
parameter specifies
whether duplicate blocks are allowed in the response.
A Client SHOULD send Accept
HTTP header to leverage content type negotiation
based on section 12.5.1 of [rfc9110] to get the preferred response type.
More details in Section 5. (CAR Responses) of [trustless-gateway].
The proposed specification change aims to address the limitations of the existing Trustless Gateway specification by introducing a mechanism for negotiating the block order in CAR responses.
By allowing clients to indicate their preferred block order, Trustless Gateways can cache CAR responses for popular content, resulting in improved performance and reduced network load. Clients benefit from more efficient data handling by deserializing blocks as they arrive,
We reuse exiting HTTP content type negotiation, and the CAR content type, which
already had the optional version
parameter.
The proposed specification change brings several benefits to end users:
Improved Performance: Gateways can decide on their implicit default ordering
and cache CAR responses for popular content. In turn, clients can benefit
from strong Etag
in ordered (deterministic) responses. This reduces the
response time for subsequent requests, resulting in faster content retrieval
for users.
Reduced Memory Usage: Clients no longer need to buffer the entire CAR response in memory until the deserialization of the requested entity is finished. With the ability to deserialize blocks as they arrive, users can conserve memory resources, especially when dealing with large CAR responses.
Efficient Data Handling: By discarding blocks as soon as the CID is validated and data is deserialized, clients can efficiently process the data in real-time. This is particularly useful for light clients, IoT devices, mobile web browsers, and other streaming applications where immediate access to the data is required.
Customizable Ordering: Clients can indicate their preferred block order in the
Accept
header, allowing them to prioritize specific ordering strategies that
align with their use cases. This flexibility enhances the user experience
and empowers users to optimize content retrieval according to their needs.
The proposed specification change is backward compatible with existing client and server implementations.
Trustless Gateways that do not support the negotiation of block order in CAR
responses will continue to function as before, providing their existing default
behavior, and the clients will be able to detect it by inspecting the
Content-Type
header present in HTTP response.
Clients that do not send the Accept
header or do not recognize the order
and dups
parameters in the Content-Type
header will receive and process CAR
responses as they did before: buffering/caching all blocks until done with the
final deserialization.
Existing implementations can choose to adopt the new specification and implement support for the negotiation of block order incrementally. This allows for a smooth transition and ensures compatibility with both new and old clients.
The proposed specification change does not introduce any negative security implications beyond those already present in the existing Trustless Gateway specification. It focuses on enhancing performance and data handling without affecting the underlying security model of IPFS.
Light clients with support for order
and dups
CAR content type parameters
will be able to detect malicious response faster, reducing risks of
memory-based DoS attacks from malicious gateways.
Several alternative approaches were considered before arriving at the proposed solution:
Implicit Server-Side Configuration: Instead of negotiating the block order, in the CAR response, the Trustless Gateway could have a server-side configuration that specifies the default order. However, this approach would limit the flexibility for clients, requiring them to have prior knowledge about order supported by each gateway.
Fixed Block Order: Another option was to enforce a fixed block order in the
CAR responses. However, this approach would not cater to the varying needs
and preferences of different clients and use cases, and is not backward
compatible with the existing Trustless Gateways which return CAR responses
with Weak Etag
and unspecified block order.
Separate X-
HTTP Header: Introduction of a separate HTTP reader was
rejected because we try to use HTTP semantics where possible, and gateways
already use HTTP content type negotiation for CAR version
and reusing it
saves a few bytes in each round-trip. Also, [rfc6648] advises against
use of X-
and similar constructs in new protocols.
The decision to not implement a single preset pack with predefined behavior,
instead of separate parameters for order and duplicates (dups), was driven
by considerations of ambiguity and potential future problems when adding
more determinism to responses. For instance, if we were to include a new
behavior like foo=y|n
alongside an existing preset like pack=orderdfs+dupsy
,
it would either necessitate the addition of a separate parameter or impose
the adoption of a new version of every preset (e.g., orderdfs-dupsy+fooy
and
orderdfs+dupsy+foon
). Maintaining and deploying such changes across a
decentralized ecosystem, where gateways may operate on different software,
becomes more complex. In contrast, utilizing separate parameters for each
behavior enables easier maintenance and deployment in a decentralized
ecosystem with varying gateway software.
The proposed solution of negotiating the block order through headers is future-proof, allows for flexibility, interoperability, and customization while maintaining compatibility with existing implementations.
Implementation compliance can be determined by testing the negotiation process
between clients and Trustless Gateways using various combinations of order
and
dups
parameters.
Relevant tests were added to gateway-conformance test suite in #87, and include the below fixture.
bafybeihchr7vmgjaasntayyatmp5sv6xza57iy2h4xj7g46bpjij6yhrmy
(CAR)
dups=n
, then there should be no duplicate blocks in the returned CAR.dups=y
, then the blocks of the file are sent twice.order=dfs
and checking if blocks that belong to files arrive in the DFS order.
bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i
(CAR)Copyright and related rights waived via CC0.