Applications Area Working Group | M. Kerwin |
Internet-Draft | QUT |
Intended status: Informational | December 21, 2018 |
Expires: June 24, 2019 |
Using the file URI Scheme
draft-kerwin-rfc8089-bis-info-latest
This document describes common usages of file URIs, beyond those prescribed – and in some cases even allowed – in the core specification.
This draft should be discussed on the GitHub repository <https://github.com/phluid61/internet-drafts/labels/rfc8089-bis-info>.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on June 24, 2019.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
The file URI scheme is specified in [draft-kerwin-rfc8089-bis-core]. That specification defines the syntax and describes operations that can be performed on a core subset of file URIs, necessary for basic interoperability. However, in the real world there are many uses of file URIs that do not conform to the core specification but nevertheless exhibit common traits and behaviours. This document describes those cases, to provide a pathway for interoperability beyond the core specification.
This is not a standard, so any prescriptive or normative language is intended to provide interoperability and/or security, but does not describe an actual standard requirement.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Syntax elements are defined in Augmented Backus-Naur Form (ABNF) [RFC5234], where possible using incremental alternative syntax to extend the core syntax rather than replacing existing definitions.
These extensions might be encountered by existing usages of the file URI scheme, but are not supported by the core specification [draft-kerwin-rfc8089-bis-core].
Some resources include active scripts that interact with the resource’s URI, for example JavaScript accessing the Location interface [HTML5.Location] in an HTML document. These scripts can inspect and/or modify the query component ([RFC3986], Section 3.4) of the URI. To support this behaviour, the file URI scheme may be extended to include a query component.
As the absolute path to a file is represented by the hierarchical part of a file URI ([draft-kerwin-rfc8089-bis-core], Section 2), the query component, if present, is not used when dereferencing a file URI. As a result, multiple file URIs can point to the same file if they differ only in the presence and/or value of the query components. Care must be taken to avoid issues resulting from possibly unexpected aliasing in such cases.
To allow a query component to be included in a file URI the core file-URI rule can be extended with the following definition:
file-URI =/ file-scheme ":" file-hier-part "?" query
This uses the query rule from [RFC3986].
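The aliasing described above can be illustrated with a minimal Python sketch. It assumes that dereferencing depends only on the authority and path; the function names and the example URIs are hypothetical, and the standard urllib.parse module stands in for a real file URI parser.

```python
from urllib.parse import urlsplit, unquote


def dereference_target(uri: str) -> tuple:
    """Return the (authority, path) pair a file URI would be dereferenced to.

    The query component, if any, is deliberately ignored.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    return (parts.netloc, unquote(parts.path))


def same_file(a: str, b: str) -> bool:
    """True if two file URIs differ at most in their query components."""
    return dereference_target(a) == dereference_target(b)
```

Under these assumptions, `file:///path/to/page.html?mode=edit` and `file:///path/to/page.html` are aliases of one file, so a cache keyed on the full URI string would store that file twice.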
It might be necessary to include user information such as a user name in a file URI, for example when representing a VMS file path with a node reference that includes an access control string.
To allow user information to be included in a file URI the core file-auth rule can be extended with the following definition:
file-auth =/ userinfo "@" host
This uses the userinfo rule from [RFC3986].
The presence of a password in a “user:password” userinfo field is deprecated by [RFC3986], Section 3.2.1. Implementers MUST take care when dealing with information that can be used to identify a user or grant access to a system, including generation, transmission, and storage of said information.
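One way to take such care is to drop the password before a file URI is logged or stored. The following Python sketch is an illustration only: the function name and the example host are hypothetical, and it assumes that everything after the first colon in the userinfo is a password.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(uri: str) -> str:
    """Return the URI with any password removed from its userinfo."""
    parts = urlsplit(uri)
    userinfo, at, host = parts.netloc.rpartition("@")
    if not at:
        return uri  # no userinfo present, nothing to remove
    # Assumption: text after the first ":" in userinfo is a password.
    user = userinfo.split(":", 1)[0]
    netloc = user + "@" + host if user else host
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))
```

For example, `file://user:secret@node.example.com/dir/file.txt` becomes `file://user@node.example.com/dir/file.txt`.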
On MS-DOS or Windows file systems an absolute file path can begin with a drive letter. This is supported by the core syntax explicitly in the local-path rule and implicitly in auth-path.
Note that comparison of drive letters in MS-DOS or Windows file paths is case-insensitive. In some usages of file URIs drive letters are canonicalized by converting them to uppercase, and other usages treat URIs that differ only in the case of the drive letter as identical.
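Both usages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: it only recognises a drive letter that is the first path segment after the authority (for example `file:///c:/...`), the scheme is assumed to be written in lowercase, and the function names are hypothetical.

```python
import re

# Drive letter as the first path segment, after an optional authority.
_DRIVE = re.compile(r"^(file:(?://[^/?#]*)?/)([A-Za-z]):(?=/|$)")


def canonicalize_drive_letter(uri: str) -> str:
    """Convert the drive letter of a file URI, if any, to uppercase."""
    return _DRIVE.sub(lambda m: m.group(1) + m.group(2).upper() + ":",
                      uri, count=1)


def equal_ignoring_drive_case(a: str, b: str) -> bool:
    """Compare two file URIs, treating drive letter case as insignificant."""
    return canonicalize_drive_letter(a) == canonicalize_drive_letter(b)
```

URIs without a drive letter in that position pass through unchanged.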
Historically some usages of file URIs have misused drive letters in several ways:
To accommodate historical file URIs that have a drive letter encoded in the authority, the core file-auth rule can be extended with the following definition:
file-auth =/ drive-letter
For example:
[RFC3986] forbids the vertical line “|” character from appearing unescaped in any portion of a URI; however, it might be necessary to interpret or update old file URIs that include it.
To accommodate historical file URIs that have a vertical line “|” character instead of a colon “:” in the drive letter construct, the auth-path, local-path, and drive-letter rules in the core specification can be extended with the following definitions:
auth-path     =/ [ file-auth ] file-absolute
local-path    =/ file-absolute
drive-letter  =/ ALPHA "|"
file-absolute = "/" drive-letter path-absolute
This is intended to support MS-DOS or Windows file URIs with vertical line characters in the drive letter construct. For example:
It can also be paired with the expansion in Section 2.3.1. For example:
To update such an old URI, replace the vertical line “|” character with a colon “:”.
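The replacement can be sketched in Python as follows. The sketch assumes the drive letter construct is the first path segment after the authority and is followed by a slash, as in the file-absolute rule; it does not cover a drive letter encoded in the authority. The function name is hypothetical.

```python
import re

# A single letter followed by "|" as the first path segment.
_LEGACY_DRIVE = re.compile(r"^(file:(?://[^/?#]*)?/)([A-Za-z])\|(?=/)")


def update_vertical_line(uri: str) -> str:
    """Replace "|" with ":" in the drive letter construct of a file URI."""
    return _LEGACY_DRIVE.sub(r"\1\2:", uri, count=1)
```

A vertical line anywhere else in the URI is left alone, since it cannot be assumed to belong to a drive letter.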
To accommodate historical file URIs that don’t use either a colon “:” or vertical line “|” character in the drive letter construct, the core drive-letter rule can be expanded with the following definition:
drive-letter =/ ALPHA
For example:
It can also be paired with the expansion in Section 2.3.1. For example:
Care MUST be taken when interpreting all such file URIs, as this interpretation can only be applied if it can be determined with reasonable certainty that the drive letters are intended as such.
To mimic the behaviour of MS-DOS or Windows file systems, relative references beginning with a slash “/” SHOULD be resolved relative to the drive letter, when present; and resolution of “..” dot segments (per Section 5.2.4 of [RFC3986]) SHOULD be modified so that it never overwrites the drive letter.
For example:
base URI:   file:///c:/path/to/file.txt
rel. ref.:  /some/other/thing.bmp
resolved:   file:///c:/some/other/thing.bmp

base URI:   file:///c:/foo.txt
rel. ref.:  ../bar.txt
resolved:   file:///c:/bar.txt
However, given that this behaviour is supported by neither the core specification nor the generic URI specification in [RFC3986], implementations MUST take care when implementing this extension.
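A minimal Python sketch of this modified resolution follows. It is an illustration under stated assumptions: the base URI carries its drive letter as the first path segment, the reference is a non-empty path with an optional query or fragment (no scheme or authority), and dot segments are removed with a simplified form of the [RFC3986] Section 5.2.4 algorithm. All function names are hypothetical.

```python
import re

# Base URI split into: scheme + authority, drive segment, rest of the path.
_BASE = re.compile(r"^(file://[^/?#]*)(/[A-Za-z]:)(/[^?#]*)")


def _remove_dot_segments(path: str) -> str:
    """Simplified RFC 3986 Section 5.2.4 for paths that start with "/"."""
    out = []
    segments = path.split("/")[1:]
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "..":
            if out:
                out.pop()  # never climbs above the root of the drive
            if last:
                out.append("")
        elif seg == ".":
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/" + "/".join(out)


def resolve_on_drive(base: str, ref: str) -> str:
    """Resolve a path-only reference without losing the base drive letter."""
    m = _BASE.match(base)
    if m is None:
        raise ValueError("base is not a file URI with a drive letter")
    prefix, drive, base_path = m.groups()
    ref_path, suffix = re.match(r"^([^?#]*)(.*)$", ref).groups()
    if not ref_path or ref_path.startswith("//"):
        raise ValueError("only non-empty path references are handled here")
    if ref_path.startswith("/"):
        merged = ref_path  # resolved relative to the drive letter
    else:
        merged = base_path[:base_path.rfind("/") + 1] + ref_path
    return prefix + drive + _remove_dot_segments(merged) + suffix
```

Because the drive segment is split off before dot segments are removed and re-attached afterwards, a “..” segment can never consume it.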
Some usages of the file URI scheme allow UNC filespace selector strings [MS-DTYP] to be translated to and from file URIs, either by mapping the entire UNC string to the path segment of a URI, or by mapping the equivalent segments of the two schemes (hostname <=> authority, sharename+objectnames <=> path).
In either case it is not uncommon to encounter a dollar sign “$” in the sharename segment of a UNC filespace selector string, for example “\\localhost\c$\foo.txt”, or the equivalent position in a file URI. The dollar sign symbol is a reserved character ([RFC3986], Section 2.2) but does not carry special meaning when it appears in these positions without percent-encoding ([RFC3986], Section 2.1).
It is common to encounter file URIs that encode entire UNC strings in the path, usually with all backslash “\” characters replaced with slashes “/”.
To interpret such URIs, the core auth-path rule can be extended with the following definitions:
auth-path     =/ unc-authority path-absolute
unc-authority = 2*3"/" file-host
file-host     = inline-IP / IPv4address / reg-name
inline-IP     = "%5B" ( IPv6address / IPvFuture ) "%5D"
This syntax uses the IPv4address, IPv6address, IPvFuture, and reg-name rules from [RFC3986].
This extended syntax is intended to support URIs that take the following forms:
This representation is notably used by the Firefox web browser. See Bugzilla#107540 [Bug107540].
It also further limits the definition of a “local file URI” ([draft-kerwin-rfc8089-bis-core], Section 1.1) by excluding any file URI with a path that encodes a UNC string.
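The unc-authority rule can be recognised with a short Python sketch. It assumes that the core syntax places "//" before auth-path, so that a UNC string encoded in the path shows up as four or five slashes after "file:". The function name and example host are hypothetical, and percent-encoded IP literals in the host are passed through untouched.

```python
import re
from urllib.parse import unquote

# "file:" "//" followed by 2*3"/", then a host and an absolute path.
_UNC_IN_PATH = re.compile(r"^file:/{4,5}([^/?#]+)(/[^?#]*)")


def unc_from_path_form(uri: str):
    """Return the UNC string encoded in the path of a file URI, or None."""
    m = _UNC_IN_PATH.match(uri)
    if m is None:
        return None
    host, path = m.groups()
    return "\\\\" + host + unquote(path).replace("/", "\\")
```

A result of None means the URI does not match this extension; such a URI may still be a local file URI in the sense of the core specification.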
It is less common, but not unheard of, to encounter implementations that transform UNC filespace selector strings into file URIs and vice versa by mapping the equivalent segments of the two schemes.
The following is an algorithmic description of the process of translating a UNC filespace selector string to a file URI. It uses the syntactic elements defined in [MS-DTYP].
For example:
UNC String:   \\host.example.com\Share\path\to\file.txt
URI:          file://host.example.com/Share/path/to/file.txt
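The segment mapping shown in this example can be sketched in Python. This is an illustration, not a transcription of the algorithm: it assumes the host name can be copied into the authority unchanged, that the share name and object names become path segments, and that the dollar sign is left unencoded. The function name is hypothetical.

```python
from urllib.parse import quote


def unc_to_file_uri(unc: str) -> str:
    """Map \\\\host\\share\\object... onto file://host/share/object..."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC filespace selector string")
    parts = unc[2:].split("\\")
    host, segments = parts[0], parts[1:]
    if not host or not segments or not segments[0]:
        raise ValueError("UNC string needs a host name and a share name")
    # Percent-encode each segment, but keep "$" as-is (e.g. share "c$").
    path = "/".join(quote(seg, safe="$") for seg in segments)
    return "file://" + host + "/" + path
```

Segments are encoded individually so that any slash inside a name is escaped rather than read as a path separator.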
The inverse algorithm, for translating a file URI to a UNC filespace selector string, is left as an exercise for the reader.
Historically some usages of file URIs have naively copied entire file paths into the path components of file URIs. Where MS-DOS or Windows file paths were thus copied the resulting URI strings contained unencoded backslash “\” characters, which are forbidden by both [RFC1738] and [RFC3986].
It might be possible to translate or update such an invalid file URI by replacing all backslashes “\” with slashes “/”, if it can be determined with reasonable certainty that the backslashes are intended as path separators.
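Such a repair is a plain character substitution, as in the following Python sketch. The function name is hypothetical, and the sketch assumes the caller has already decided that the backslashes are path separators; the code itself cannot make that determination.

```python
def repair_backslashes(uri: str) -> str:
    """Replace every backslash in a file URI with a slash.

    Assumption: the caller has established that the backslashes were
    intended as path separators.
    """
    if not uri.lower().startswith("file:"):
        raise ValueError("not a file URI: " + uri)
    return uri.replace("\\", "/")
```

For example, the string `file:///c:\path\to\file.txt` becomes `file:///c:/path/to/file.txt`.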
TO DO
[draft-kerwin-rfc8089-bis-core] | Kerwin, M., "The file URI Scheme", 2018. |
[MS-DTYP] | Microsoft Open Specifications, "Windows Data Types, 2.2.57 UNC", October 2015. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC3986] | Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005. |
[RFC5234] | Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008. |
[RFC8174] | Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017. |
[Bug107540] | Bugzilla@Mozilla, "Bug 107540", October 2007. |
[HTML5.Location] | The World Wide Web Consortium, "HTML5", October 2014. |
[RFC1738] | Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, DOI 10.17487/RFC1738, December 1994. |