| Internet-Draft | rfc8089-bis-info | April 2018 |
| Kerwin | Expires 1 November 2018 | [Page] |
This document describes common usages of file URIs, beyond those prescribed -- and in some cases even allowed -- in the core specification.¶
This draft should be discussed on the GitHub repository <https://github.com/phluid61/internet-drafts/labels/rfc8089-bis-info>.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 1 November 2018.¶
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The file URI scheme is specified in [draft-kerwin-rfc8089-bis-core]. That specification defines the syntax and describes operations that can be performed on a core subset of file URIs, necessary for basic interoperability. However, in the real world there are many uses of file URIs that do not conform to the core specification, but nevertheless exhibit common traits and behaviours. This document describes those cases, to provide a pathway for interoperability beyond the core specification.¶
This document is not a standard, so any prescriptive or normative language is intended to promote interoperability and/or security, but does not describe an actual standard requirement.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Syntax elements are defined in Augmented Backus-Naur Form (ABNF) [RFC5234], where possible using incremental alternative syntax to extend the core syntax rather than replacing existing definitions.¶
These extensions might be encountered in existing usages of the file URI scheme, but are not supported by the core specification [draft-kerwin-rfc8089-bis-core].¶
Some resources include active scripts that interact with the resource's URI, for example JavaScript accessing the Location interface [HTML5.Location] in an HTML document. These scripts can inspect and/or modify the query component ([RFC3986], Section 3.4) of the URI. To support this behaviour, the file URI scheme can be extended to include a query component.¶
As the absolute path to a file is represented by the hierarchical part of a file URI ([draft-kerwin-rfc8089-bis-core], Section 2), the query component, if present, is not used when dereferencing a file URI. As a result, multiple file URIs can point to the same file if they differ only in the presence or value of their query components. Care must be taken to avoid issues arising from this possibly unexpected aliasing.¶
To allow a query component to be included in a file URI the core file-URI
rule can be extended with the following definition:¶
file-URI =/ file-scheme ":" file-hier-part "?" query¶
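As a minimal sketch in Python (standard library only), the following helper discards the query component before locating the file, which makes the aliasing described above easy to see. The name `dereference_path` is hypothetical, and the sketch assumes the caller handles percent-decoding and any authority separately.

```python
from urllib.parse import urlsplit


def dereference_path(uri: str) -> str:
    """Return the part of a file URI used to locate the file (hypothetical helper).

    The query component, if any, is deliberately discarded: it is not
    used when dereferencing a file URI.
    """
    parts = urlsplit(uri)
    if parts.scheme.lower() != "file":
        raise ValueError("not a file URI")
    return parts.path


a = "file:///home/user/page.html?lang=en"
b = "file:///home/user/page.html?lang=fr"
# Two different URIs, one file: the aliasing the text warns about.
print(a != b, dereference_path(a) == dereference_path(b))  # True True
```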
It might be necessary to include user information, such as a user name, in a file URI; for example, when representing a VMS file path with a node reference that includes an access control string.¶
To allow user information to be included in a file URI the core file-auth
rule can be extended with the following definition:¶
file-auth =/ userinfo "@" host¶
This uses the userinfo rule from [RFC3986].¶
The presence of a password in a "user:password" userinfo field is deprecated by [RFC3986], Section 3.2.1. Implementers MUST take care when dealing with information that can be used to identify a user or grant access to a system, including generation, transmission, and storage of said information.¶
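A minimal Python sketch of handling the extended file-auth rule follows; `split_file_auth` and `without_password` are hypothetical helper names, and the host name `node.example.com` is illustrative only. It relies on the standard `urllib.parse.urlsplit` accessors, never hands a password back to the caller, and produces a copy of the URI with any password removed, for example before logging or storing it.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_file_auth(uri: str) -> Tuple[Optional[str], str]:
    """Return (user name, host) from a file URI; never the password."""
    parts = urlsplit(uri)
    # parts.password is deliberately not returned (RFC 3986, Section 3.2.1).
    return parts.username, parts.hostname or ""


def without_password(uri: str) -> str:
    """Copy of the URI with any "user:password" reduced to "user"."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri
    host = parts.netloc.rpartition("@")[2]  # everything after the last "@"
    return urlunsplit(parts._replace(netloc=f"{parts.username}@{host}"))
```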
On MS-DOS or Windows file systems an absolute file path can begin
with a drive letter. This is supported by the core syntax
explicitly in the local-path rule and implicitly in auth-path.¶
Note that comparison of drive letters in MS-DOS or Windows file paths is case-insensitive. In some usages of file URIs drive letters are canonicalized by converting them to uppercase, and other usages treat URIs that differ only in the case of the drive letter as identical.¶
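The following Python sketch shows the first of these practices, upper-casing the drive letter, and uses it to compare URIs. The helper names are hypothetical, and the sketch assumes the core "c:" form in the first path segment; the rest of the path is left untouched, so path comparison stays case-sensitive.

```python
import re

# First path segment of a file URI holding a core drive letter, e.g. the
# "c:" in "file:///c:/x", "file:/c:/x" or "file://host/c:/x".
_DRIVE = re.compile(r"^(file:(?://[^/?#]*)?/)([A-Za-z]):(?=/|$)", re.IGNORECASE)


def canonical_drive(uri: str) -> str:
    """Upper-case the drive letter, if any; leave everything else as-is."""
    return _DRIVE.sub(lambda m: m.group(1) + m.group(2).upper() + ":", uri, count=1)


def same_drive_insensitive(a: str, b: str) -> bool:
    """Treat URIs that differ only in drive-letter case as identical."""
    return canonical_drive(a) == canonical_drive(b)
```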
Historically some usages of file URIs have misused drive letters in several ways:¶
Encoding the drive letter in the URI's authority component.¶
Omitting the colon ":" from the drive letter, or replacing it with a vertical line "|" character.¶
A combination of the two.¶
[RFC3986] forbids the vertical line "|" character from appearing unescaped in any portion of a URI; however, it might be necessary to interpret or update old file URIs that include it.¶
To accommodate historical file URIs that have a vertical line "|"
character instead of a colon ":" in the drive letter construct, the
auth-path, local-path, and drive-letter rules in the core
specification can be extended with the following definitions:¶
auth-path     =/ [ file-auth ] file-absolute
local-path    =/ file-absolute
drive-letter  =/ ALPHA "|"
file-absolute  = "/" drive-letter path-absolute¶
This is intended to support MS-DOS or Windows file URIs with vertical line characters in the drive letter construct. For example:¶
file:///c|/path/to/file¶
It can also be paired with the expansion in Section 2.3.1. For example:¶
file://c|/path/to/file¶
To update such an old URI, replace the vertical line "|" character with a colon ":".¶
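A hedged Python sketch of this update rule follows; the name `update_pipe_drive` is hypothetical. It only rewrites a single letter followed by "|" when that letter is the first path segment or the whole authority, and it leaves the drive letter where it was found; moving a drive letter out of the authority would be a separate step, not shown here.

```python
import re

# A drive letter written with "|" instead of ":", either as the first path
# segment ("file:///c|/x", "file:/c|/x", "file://host/c|/x") or as the whole
# authority ("file://c|/x").
_PIPE_DRIVE = re.compile(
    r"^(file:(?://[^/?#]*)?/|file://)([A-Za-z])\|(?=/|$)", re.IGNORECASE
)


def update_pipe_drive(uri: str) -> str:
    """Replace the "|" of an old-style drive letter with ":"."""
    return _PIPE_DRIVE.sub(r"\1\2:", uri, count=1)
```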
To accommodate historical file URIs that use neither a colon ":"
nor a vertical line "|" character in the drive letter construct, the
core drive-letter rule can be extended with the following
definition:¶
drive-letter =/ ALPHA¶
For example:¶
file:///c/path/to/file¶
It can also be paired with the expansion in Section 2.3.1. For example:¶
file://c/path/to/file¶
Care MUST be taken when interpreting all such file URIs, as this interpretation can only be applied if it can be determined with reasonable certainty that the drive letters are intended as such.¶
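Because a bare letter is ambiguous, the Python sketch below (all names hypothetical) only adds the missing colon when the caller explicitly asserts that drive letters are intended, and only for URIs with an empty or absent authority. Forms such as "file://c/path/to/file" are deliberately left alone, since "c" there could equally be a host name.

```python
import re

# A single letter as the first path segment, with an empty authority
# ("file:///c/x") or no authority ("file:/c/x"). Authority forms such as
# "file://c/x" are intentionally not matched: "c" may be a host name.
_BARE_DRIVE = re.compile(r"^(file:///|file:/(?!/))([A-Za-z])(?=/|$)", re.IGNORECASE)


def add_drive_colon(uri: str, assume_drive_letters: bool = False) -> str:
    """Turn "file:///c/x" into "file:///c:/x", but only when told to."""
    if not assume_drive_letters:
        return uri  # not certain enough: leave the URI as found
    return _BARE_DRIVE.sub(r"\1\2:", uri, count=1)
```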
To mimic the behaviour of MS-DOS or Windows file systems, relative references beginning with a slash "/" SHOULD be resolved relative to the drive letter, when present; and resolution of ".." dot segments (per Section 5.2.4 of [RFC3986]) SHOULD be modified so that it never removes or overwrites the drive letter.¶
For example:¶
base URI:  file:///c:/path/to/file.txt
rel. ref.: /some/other/thing.bmp
resolved:  file:///c:/some/other/thing.bmp

base URI:  file:///c:/foo.txt
rel. ref.: ../bar.txt
resolved:  file:///c:/bar.txt¶
However, given that this behaviour is supported by neither the core specification nor the generic URI specification in [RFC3986], implementations MUST take care when implementing this extension.¶
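The following Python sketch illustrates both modifications under stated assumptions: the base URI uses the core "/c:" drive letter form, and only path-only references are treated specially; everything else falls back to the standard `urllib.parse.urljoin`. `resolve_keeping_drive` is a hypothetical name, and `remove_dot_segments` is a direct transcription of [RFC3986], Section 5.2.4. Because the drive letter is stripped before merging and re-attached afterwards, ".." segments can never climb past it.

```python
import re
from urllib.parse import urljoin, urlsplit

# The core drive-letter form at the start of a file URI path: "/c:/..." or "/c:".
_DRIVE = re.compile(r"^/([A-Za-z]:)(?=/|$)")


def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4, rule by rule."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]                      # A
        elif path.startswith("./"):
            path = path[2:]                      # A
        elif path.startswith("/./"):
            path = path[2:]                      # B
        elif path == "/.":
            path = "/"                           # B
        elif path.startswith("/../"):
            path = path[3:]                      # C
            if out:
                out.pop()
        elif path == "/..":
            path = "/"                           # C
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""                            # D
        else:                                    # E: move one segment across
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve_keeping_drive(base: str, ref: str) -> str:
    """Resolve ref against base without ever replacing base's drive letter."""
    b, r = urlsplit(base), urlsplit(ref)
    m = _DRIVE.match(b.path)
    # Ordinary RFC 3986 resolution when there is no drive letter to protect,
    # or when ref has its own scheme, authority, or an empty path.
    if m is None or r.scheme or r.netloc or not r.path:
        return urljoin(base, ref)
    drive = m.group(1)                           # e.g. "c:"
    rest = b.path[len(drive) + 1:] or "/"        # base path below the drive
    if r.path.startswith("/"):
        merged = r.path                          # relative to the drive root
    else:
        merged = rest[: rest.rfind("/") + 1] + r.path
    path = "/" + drive + remove_dot_segments(merged)
    uri = f"{b.scheme}://{b.netloc}{path}"
    if r.query:
        uri += "?" + r.query
    if r.fragment:
        uri += "#" + r.fragment
    return uri
```

For comparison, plain `urljoin("file:///c:/foo.txt", "../bar.txt")` returns "file:///bar.txt", because it treats "c:" as an ordinary path segment.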
Some usages of the file URI scheme allow UNC filespace selector strings [MS-DTYP] to be translated to and from file URIs, either by mapping the entire UNC string to the path component of a URI, or by mapping the equivalent segments of the two schemes (hostname <=> authority, sharename+objectnames <=> path).¶
In either case it is not uncommon to encounter a dollar sign "$" in the sharename segment of a UNC filespace selector string, for example "\\localhost\c$\foo.txt", or the equivalent position in a file URI. The dollar sign symbol is a reserved character ([RFC3986], Section 2.2) but does not carry special meaning when it appears in these positions without percent-encoding ([RFC3986], Section 2.1).¶
It is common to encounter file URIs that encode entire UNC strings in the path, usually with all backslash "\" characters replaced with slashes "/".¶
To interpret such URIs, the core auth-path rule can be extended
with the following definitions:¶
auth-path     =/ unc-authority path-absolute
unc-authority  = 2*3"/" file-host
file-host      = inline-IP / IPv4address / reg-name
inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"¶
This syntax uses the IPv4address, IPv6address, IPvFuture,
and reg-name rules from [RFC3986].¶
Note that the file-host rule is the same as host but with
percent-encoding applied to "[" and "]" characters.¶
This extended syntax is intended to support URIs that take the following forms:¶
The representation of a non-local file, with an empty authority and a complete (transformed) UNC string in the path. E.g.:¶
file:////host.example.com/path/to/file¶
As above, with an extra slash between the empty authority and the two slashes of the transformed UNC string, as per the syntax defined in [RFC1738]. E.g.:¶
file://///host.example.com/path/to/file¶
This representation is notably used by the Firefox web browser. See Bugzilla#107540 [Bug107540].¶
This extended syntax also further limits the definition of a "local file URI" ([draft-kerwin-rfc8089-bis-core], Section 1.1) by excluding any file URI with a path that encodes a UNC string.¶
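A minimal Python sketch of interpreting the four- and five-slash forms follows; the name `unc_from_file_uri` is hypothetical. It percent-decodes the host (which undoes the "%5B" and "%5D" of inline-IP) and the path, converts slashes back to backslashes, and, as a simplifying assumption, ignores any query or fragment.

```python
import re
from typing import Optional
from urllib.parse import unquote

# "file:" + the empty authority "//" + the two or three slashes of the
# unc-authority rule, then the file-host and an optional path.
_UNC_IN_PATH = re.compile(r"^file:/{4,5}([^/?#]+)(/[^?#]*)?", re.IGNORECASE)


def unc_from_file_uri(uri: str) -> Optional[str]:
    """Recover a UNC string from a file URI that encodes it in the path.

    Returns None when the URI does not have the four- or five-slash form.
    """
    m = _UNC_IN_PATH.match(uri)
    if m is None:
        return None
    host = unquote(m.group(1))          # e.g. "%5B...%5D" becomes "[...]"
    path = unquote(m.group(2) or "")
    return "\\\\" + host + path.replace("/", "\\")
```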
It is less common, but not unheard of, to encounter implementations that transform UNC filespace selector strings into file URIs and vice versa by mapping the equivalent segments of the two schemes.¶
The following is an algorithmic description of the process of translating a UNC filespace selector string to a file URI. It uses the syntactic elements defined in [MS-DTYP].¶
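1. Initialize a new URI with the "file:" scheme identifier.¶
2. Append the authority:¶
   1. Append the "//" authority sigil to the URI.¶
   2. Append the host-name field of the UNC string to the URI.¶
3. Append the share-name:¶
   1. Transform the share-name to a path segment ([RFC3986], Section 3.3) to conform to the encoding rules of [RFC3986], Section 2.¶
   2. Append a delimiting slash character "/" and the transformed segment to the URI.¶
4. For each object-name:¶
   1. Transform the object-name to a path segment as above.¶
   2. Append a delimiting slash character "/" and the transformed segment to the URI.¶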
For example:¶
UNC String: \\host.example.com\Share\path\to\file.txt
URI:        file://host.example.com/Share/path/to/file.txt¶
The inverse algorithm, for translating a file URI to a UNC filespace selector string, is left as an exercise for the reader.¶
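The forward algorithm can be sketched in Python as follows. This is an illustration under assumptions, not a complete implementation: `unc_to_file_uri` is a hypothetical name, the host-name is copied verbatim, and each share-name and object-name is percent-encoded so that only characters permitted in an [RFC3986] path segment remain unencoded. A dollar sign "$" is therefore left as-is, matching the discussion of sharenames above.

```python
from urllib.parse import quote

# Characters allowed unencoded in an RFC 3986 path segment besides the
# unreserved set (which quote() never encodes): sub-delims, ":" and "@".
_PCHAR_SAFE = "!$&'()*+,;=:@"


def unc_to_file_uri(unc: str) -> str:
    """Map UNC host/share/object names onto file URI authority/path segments."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC filespace selector string")
    parts = unc[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("UNC string needs a host-name and a share-name")
    host, share, *objects = parts
    uri = "file:"                       # initialize with the scheme identifier
    uri += "//" + host                  # authority sigil + host-name (verbatim)
    for name in [share, *objects]:      # share-name, then each object-name
        uri += "/" + quote(name, safe=_PCHAR_SAFE)
    return uri
```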
Historically, some usages of file URIs have naively copied entire file paths into the path components of file URIs. Where MS-DOS or Windows file paths were copied in this way, the resulting URI strings contained unencoded backslash "\" characters, which are forbidden by both [RFC1738] and [RFC3986].¶
It might be possible to translate or update such an invalid file URI by replacing all backslashes "\" with slashes "/", if it can be determined with reasonable certainty that the backslashes are intended as path separators.¶
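A minimal Python sketch of this repair follows; `replace_backslashes` is a hypothetical name, and the decision that the backslashes really are path separators is left to the caller through an explicit flag. As a simplifying assumption, only the portion before any "?" or "#" is changed.

```python
def replace_backslashes(uri: str, backslashes_are_separators: bool) -> str:
    """Replace backslashes with "/" before any query or fragment, if told to."""
    if not backslashes_are_separators:
        return uri
    # Find where the query or fragment (if any) starts; leave that part alone.
    ends = [i for i in (uri.find("?"), uri.find("#")) if i != -1]
    cut = min(ends) if ends else len(uri)
    return uri[:cut].replace("\\", "/") + uri[cut:]
```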
TO DO¶