Applications Area Working Group M. Kerwin
Internet-Draft QUT
Intended status: Informational December 21, 2018
Expires: June 24, 2019

Using the file URI Scheme
draft-kerwin-rfc8089-bis-info-latest

Abstract

This document describes common usages of file URIs, beyond those prescribed – and in some cases even allowed – in the core specification.

Note to Readers

This draft should be discussed on the GitHub repository <https://github.com/phluid61/internet-drafts/labels/rfc8089-bis-info>.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on June 24, 2019.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

The file URI scheme is specified in [draft-kerwin-rfc8089-bis-core]. That specification defines the syntax of a core subset of file URIs, and describes the operations that can be performed on them, as necessary for basic interoperability. In practice, however, many uses of file URIs do not conform to the core specification but nevertheless exhibit common traits and behaviours. This document describes those cases, to provide a pathway for interoperability beyond the core specification.

1.1. Notational Conventions

This document is not a standard. Any prescriptive or normative language in it is intended to promote interoperability and/or security, and does not describe an actual standards requirement.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Syntax elements are defined in Augmented Backus-Naur Form (ABNF) [RFC5234], where possible using incremental alternative syntax to extend the core syntax rather than replacing existing definitions.

2. Nonstandard Extensions

These extensions might be encountered by existing usages of the file URI scheme, but are not supported by the core specification [draft-kerwin-rfc8089-bis-core].

2.1. Query Components

Some resources include active scripts that interact with the resource’s URI, for example JavaScript accessing the Location interface [HTML5.Location] in an HTML document. These scripts can inspect and/or modify the query component ([RFC3986], Section 3.4) of the URI. To support this behaviour, the file URI scheme may be extended to include a query component.

As the absolute path to a file is represented by the hierarchical part of a file URI ([draft-kerwin-rfc8089-bis-core], Section 2), the query component, if present, is not used when dereferencing a file URI. As a result, multiple file URIs can point to the same file if they differ only in the presence and/or value of the query components. Care must be taken to avoid issues resulting from possibly unexpected aliasing in such cases.

To allow a query component to be included in a file URI the core file-URI rule can be extended with the following definition:

   file-URI       =/ file-scheme ":" file-hier-part "?" query

This uses the query rule from [RFC3986].
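
The aliasing described above can be illustrated with a minimal sketch, assuming Python's standard urllib.parse module. The helper names dereference_key and same_file are hypothetical and are not part of any specification. The sketch compares only the authority and path, so it does not account for percent-encoding normalization or for the equivalence of an empty authority and "localhost".

```python
from urllib.parse import urlsplit


def dereference_key(uri: str) -> tuple[str, str]:
    """Return the (authority, path) pair used to locate the file.

    The query component is deliberately left out, because it plays no
    part in dereferencing a file URI.
    """
    parts = urlsplit(uri)
    if parts.scheme.lower() != "file":
        raise ValueError("not a file URI: " + uri)
    # Host names are case-insensitive; paths are compared as written.
    return parts.netloc.lower(), parts.path


def same_file(a: str, b: str) -> bool:
    """True if two file URIs differ at most in their query components."""
    return dereference_key(a) == dereference_key(b)
```

With this sketch, same_file("file:///path/to/page.html?tab=1", "file:///path/to/page.html?tab=2") is true, which is the kind of aliasing an application might need to detect before caching or locking by URI.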

2.2. User Information

It might be necessary to include user information such as a user name in a file URI, for example when representing a VMS file path with a node reference that includes an access control string.

To allow user information to be included in a file URI the core file-auth rule can be extended with the following definition:

   file-auth      =/ userinfo "@" host

This uses the userinfo rule from [RFC3986].

The presence of a password in a “user:password” userinfo field is deprecated by [RFC3986], Section 3.2.1. Implementers MUST take care when dealing with information that can be used to identify a user or grant access to a system, including generation, transmission, and storage of said information.
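
One way to limit exposure of a deprecated password is to remove it before a URI is logged or stored. The following is a minimal sketch of that idea, assuming Python's standard urllib.parse module. The function name strip_password and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def strip_password(uri: str) -> str:
    """Return the URI with any ':password' removed from its userinfo."""
    parts = urlsplit(uri)
    if parts.password is None:
        # No password present; return the input untouched.
        return uri
    # Keep everything after the last "@" as the host part.
    host = parts.netloc.rpartition("@")[2]
    netloc = f"{parts.username}@{host}"
    return parts._replace(netloc=netloc).geturl()
```

For example, strip_password("file://user:secret@node.example.com/dir/file.txt") yields a URI that retains the user name but no longer carries the password.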

2.3. MS-DOS and Windows Drive Letters

On MS-DOS or Windows file systems an absolute file path can begin with a drive letter. This is supported by the core syntax explicitly in the local-path rule and implicitly in auth-path.

Note that comparison of drive letters in MS-DOS or Windows file paths is case-insensitive. Some usages of file URIs canonicalize drive letters by converting them to uppercase, while others treat URIs that differ only in the case of the drive letter as identical.
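
The uppercase canonicalization mentioned above might look like the following minimal sketch in Python. The function name canonicalize_drive is hypothetical, and the pattern assumes that the drive letter is the first path segment and is preceded by a slash (for example "file:///c:/..." or "file:/c:/..."). Other forms are returned unchanged.

```python
import re

# "file:", an optional "//authority", a slash, then "<letter>:".
_DRIVE = re.compile(r"^(file:(?://[^/]*)?/)([A-Za-z]):", re.IGNORECASE)


def canonicalize_drive(uri: str) -> str:
    """Uppercase the drive letter of a file URI, if one is present."""
    return _DRIVE.sub(lambda m: m.group(1) + m.group(2).upper() + ":", uri)


def same_modulo_drive_case(a: str, b: str) -> bool:
    """Compare two file URIs, ignoring the case of the drive letter."""
    return canonicalize_drive(a) == canonicalize_drive(b)
```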

Historically some usages of file URIs have misused drive letters in several ways:

2.3.1. Drive Letter Authority

To accommodate historical file URIs that have a drive letter encoded in the authority, the core file-auth rule can be extended with the following definition:

   file-auth      =/ drive-letter

For example:

2.3.2. Vertical Line Character

[RFC3986] forbids the vertical line “|” character from appearing unescaped in any portion of a URI; however, it might be necessary to interpret or update old file URIs that include it.

To accommodate historical file URIs that have a vertical line “|” character instead of a colon “:” in the drive letter construct, the auth-path, local-path, and drive-letter rules in the core specification can be extended with the following definitions:

   auth-path      =/ [ file-auth ] file-absolute

   local-path     =/ file-absolute

   drive-letter   =/ ALPHA "|"

   file-absolute  = "/" drive-letter path-absolute

This is intended to support MS-DOS or Windows file URIs with vertical line characters in the drive letter construct. For example:

It can also be paired with the expansion in Section 2.3.1. For example:

To update such an old URI, replace the vertical line “|” character with a colon “:”.
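
The update step can be sketched in Python as follows. This is a hedged illustration rather than a complete parser: the function name update_vertical_line is hypothetical, and the pattern only rewrites a "|" that directly follows a single letter in the drive letter position (in the authority, or as the first path segment), leaving any other "|" untouched.

```python
import re

# A single letter followed by "|" in the drive letter position, either
# after "//authority/" or after zero to three slashes following "file:".
_PIPE_DRIVE = re.compile(
    r"^(file:(?://[^/]*/|/{0,3})[A-Za-z])\|(?=/|$)", re.IGNORECASE
)


def update_vertical_line(uri: str) -> str:
    """Replace the "|" of a historical drive letter with ":"."""
    return _PIPE_DRIVE.sub(lambda m: m.group(1) + ":", uri)
```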

2.3.3. Letter-Only Drive Letter

To accommodate historical file URIs that don’t use either a colon “:” or vertical line “|” character in the drive letter construct, the core drive-letter rule can be extended with the following definition:

   drive-letter   =/ ALPHA

For example:

It can also be paired with the expansion in Section 2.3.1. For example:

Care MUST be taken when interpreting all such file URIs, as this interpretation can only be applied if it can be determined with reasonable certainty that the drive letters are intended as such.

2.4. MS-DOS and Windows Relative Resolution

To mimic the behaviour of MS-DOS or Windows file systems, relative references beginning with a slash “/” SHOULD be resolved relative to the drive letter, when present, and resolution of “..” dot segments (per Section 5.2.4 of [RFC3986]) SHOULD be modified so that it never overwrites the drive letter.

For example:

   base URI:   file:///c:/path/to/file.txt
   rel. ref.:  /some/other/thing.bmp
   resolved:   file:///c:/some/other/thing.bmp

   base URI:   file:///c:/foo.txt
   rel. ref.:  ../bar.txt
   resolved:   file:///c:/bar.txt

However, because this behaviour is supported by neither the core specification nor the generic URI specification in [RFC3986], implementers MUST take care when implementing this extension.
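
The modified resolution can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full implementation of [RFC3986] Section 5.2: the function name resolve_keeping_drive is hypothetical, the base URI is assumed to have a drive letter with a colon as its first path segment, and the reference is assumed to be a non-empty path-only reference (no scheme, authority, query, or fragment).

```python
import re
from urllib.parse import urlsplit

# A path of the form "/<letter>:" optionally followed by "/rest".
_DRIVE_PATH = re.compile(r"^/([A-Za-z]:)(/.*)?$")


def resolve_keeping_drive(base: str, ref: str) -> str:
    """Resolve a path-only reference without losing the drive letter."""
    parts = urlsplit(base)
    match = _DRIVE_PATH.match(parts.path)
    if parts.scheme != "file" or match is None:
        raise ValueError("base must be a file URI with a drive letter")
    drive, rest = match.group(1), match.group(2) or "/"

    if ref.startswith("/"):
        # Rooted reference: resolve relative to the drive, not the URI root.
        merged = ref
    else:
        # Merge with the base path, excluding its last segment.
        merged = rest.rsplit("/", 1)[0] + "/" + ref

    segments = merged.split("/")[1:]
    out = []
    for seg in segments:
        if seg == "..":
            if out:
                out.pop()  # never climbs above the drive letter
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash of a directory result
    return f"file://{parts.netloc}/{drive}/" + "/".join(out)
```

Applied to the two examples above, this sketch returns file:///c:/some/other/thing.bmp and file:///c:/bar.txt respectively.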

2.5. UNC Strings

Some usages of the file URI scheme allow UNC filespace selector strings [MS-DTYP] to be translated to and from file URIs, either by mapping the entire UNC string to the path segment of a URI, or by mapping the equivalent segments of the two schemes (hostname <=> authority, sharename+objectnames <=> path).

In either case it is not uncommon to encounter a dollar sign “$” in the sharename segment of a UNC filespace selector string, for example “\\localhost\c$\foo.txt”, or the equivalent position in a file URI. The dollar sign symbol is a reserved character ([RFC3986], Section 2.2) but does not carry special meaning when it appears in these positions without percent-encoding ([RFC3986], Section 2.1).

2.5.1. file URI with UNC Path

It is common to encounter file URIs that encode entire UNC strings in the path, usually with all backslash “\” characters replaced with slashes “/”.

To interpret such URIs, the core auth-path rule can be extended with the following definitions:

   auth-path      =/ unc-authority path-absolute

   unc-authority  = 2*3"/" file-host

   file-host      = inline-IP / IPv4address / reg-name

   inline-IP      = "%5B" ( IPv6address / IPvFuture ) "%5D"

This syntax uses the IPv4address, IPv6address, IPvFuture, and reg-name rules from [RFC3986].

This extended syntax is intended to support URIs that take the following forms:

It also further limits the definition of a “local file URI” ([draft-kerwin-rfc8089-bis-core], Section 1.1) by excluding any file URI with a path that encodes a UNC string.
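
As a rough illustration of interpreting such URIs, the following Python sketch recovers a UNC string from a file URI whose path encodes one. It is derived only from the grammar above: "file:" followed by the two authority slashes and then two or three further slashes, giving four or five slashes in total before the host. The function name unc_path_uri_to_unc is hypothetical, the host is copied verbatim (including any "%5B"/"%5D" delimiters), and percent-decoding the path with urllib.parse.unquote is an assumption about what the consumer wants.

```python
import re
from urllib.parse import unquote

# "file:" + "//" (empty authority) + 2*3"/" + file-host + path-absolute
_UNC_URI = re.compile(r"^file:/{4,5}([^/]+)(/.*)$", re.IGNORECASE)


def unc_path_uri_to_unc(uri: str):
    """Return the UNC string encoded in the URI's path, or None."""
    match = _UNC_URI.match(uri)
    if match is None:
        return None
    host = match.group(1)           # copied as written
    path = unquote(match.group(2))  # decode percent-encoded octets
    return "\\\\" + host + path.replace("/", "\\")
```

A return value of None indicates that the URI does not use this form; under the narrowed definition above, it could then still be a local file URI.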

2.5.2. file URI with Authority

It is less common, but not unheard of, to encounter implementations that transform UNC filespace selector strings into file URIs and vice versa by mapping the equivalent segments of the two schemes.

The following is an algorithmic description of the process of translating a UNC filespace selector string to a file URI. It uses the syntactic elements defined in [MS-DTYP].

  1. Initialize a new URI with the “file:” scheme identifier.
  2. Append the authority:
    1. Append the “//” authority sigil to the URI.
    2. Append the host-name field of the UNC string to the URI as its host component. If the host-name field is the string “localhost” this can produce an ambiguous file URI, and the field SHOULD be replaced with a fully qualified domain name or address.
  3. Append the share-name:
    1. Transform the share-name to a path segment ([RFC3986], Section 3.3) to conform to the encoding rules of Section 2 of [RFC3986].
    2. Append a delimiting slash character “/” and the transformed segment to the URI.
  4. For each object-name:
    1. Transform the object-name to a path segment as above.

      The colon character “:” is allowed as a delimiter before stream-name and stream-type in the file-name, if present.
    2. Append a delimiting slash character “/” and the transformed segment to the URI.

For example:

   UNC String:   \\host.example.com\Share\path\to\file.txt
   URI:          file://host.example.com/Share/path/to/file.txt
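
The steps above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation. The function name unc_to_file_uri is hypothetical, the UNC string is split naively on backslashes instead of being parsed against the [MS-DTYP] grammar, and a “localhost” host-name is rejected with an error because the sketch has no way of knowing a suitable fully qualified replacement. The dollar sign and colon are left unencoded in segments, as discussed in Section 2.5 and in step 4.

```python
from urllib.parse import quote


def unc_to_file_uri(unc: str) -> str:
    """Translate a UNC filespace selector string to a file URI."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC string: " + unc)
    host, *names = unc[2:].split("\\")
    if not host or not names or not names[0]:
        raise ValueError("UNC string needs a host-name and a share-name")
    if host.lower() == "localhost":
        # Assumption: callers substitute a fully qualified name themselves.
        raise ValueError("ambiguous host-name 'localhost'")

    uri = "file:"          # step 1: scheme identifier
    uri += "//" + host     # step 2: authority sigil and host
    for name in names:     # steps 3 and 4: share-name, then object-names
        uri += "/" + quote(name, safe="$:")
    return uri
```

Calling unc_to_file_uri(r"\\host.example.com\Share\path\to\file.txt") reproduces the URI in the example above.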

The inverse algorithm, for translating a file URI to a UNC filespace selector string, is left as an exercise for the reader.

2.6. Backslash as Separator

Historically some usages of file URIs have naively copied entire file paths into the path components of file URIs. Where MS-DOS or Windows file paths were thus copied the resulting URI strings contained unencoded backslash “\” characters, which are forbidden by both [RFC1738] and [RFC3986].

It might be possible to translate or update such an invalid file URI by replacing all backslashes “\” with slashes “/”, if it can be determined with reasonable certainty that the backslashes are intended as path separators.
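
A minimal Python sketch of this update follows. The function name fix_backslashes and the keyword argument separators_confirmed are hypothetical. The sketch does not attempt to decide whether the backslashes are path separators; that judgement is left to the caller, who has to opt in explicitly.

```python
def fix_backslashes(uri: str, *, separators_confirmed: bool) -> str:
    """Replace backslashes with slashes in a file URI.

    The caller must confirm that the backslashes are intended as path
    separators; otherwise the URI is returned unchanged.
    """
    if not separators_confirmed or not uri.lower().startswith("file:"):
        return uri
    return uri.replace("\\", "/")
```

For instance, with separators_confirmed=True, an illustrative input such as file:///c:\path\to\file.txt becomes file:///c:/path/to/file.txt.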

3. Security Considerations

TO DO

4. References

4.1. Normative References

[draft-kerwin-rfc8089-bis-core] Kerwin, M., "The file URI Scheme", 2018.
[MS-DTYP] Microsoft Open Specifications, "Windows Data Types, 2.2.57 UNC", October 2015.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

4.2. Informative References

[Bug107540] Bugzilla@Mozilla, "Bug 107540", October 2007.
[HTML5.Location] The World Wide Web Consortium, "HTML5", October 2014.
[RFC1738] Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, DOI 10.17487/RFC1738, December 1994.

Author's Address

Matthew Kerwin
Queensland University of Technology
Victoria Park Road
Kelvin Grove, QLD 4059
Australia

EMail: matthew.kerwin@qut.edu.au