Overview

This design document proposes that TT3 will have an explicit way to differentiate between accessing an item in a hash and calling a hash virtual method.

Description

In TT2, there is no way to differentiate between accessing a hash item and calling a hash virtual method.

[% hash.size %]       # hash item or vmethod

It is impossible to test for the presence of a hash item where a virtual method of the same name exists. Consider a hash reference containing information about a particular font configuration.

[% font1 = {
      family = 'Georgia'
      weight = 'bold'
   }
   font2 = {
      family = 'Georgia'
      weight = 'normal'
      size   = '140%'
   }

   font = some_condition ? font1 : font2
%]

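When it comes to generating our CSS file (for example), we have no way of determining if the font hash contains a size item. If there is no size key defined, then TT will fall back on calling the size virtual method, which returns the number of key/value pairs in the hash. For font1 that is 2, so the test is true even though no size was ever set.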

font-family: [% font.family %];
font-weight: [% font.weight %];
[% IF font.size     # size is optional -%]      # FAIL!
font-size: [% font.size %];
[% END -%]

The item hash virtual method can be used to explicitly look for an item in a hash reference.

[% IF font.item('size') %]
...
[% END %]

However, what happens if the hash array itself contains an item key? In that case, the value stored under that key is returned and the item vmethod is never called.
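
The lookup order described above can be modelled in a few lines of Python. This is only a sketch, not TT2's actual stash code: the names tt2_dot and HASH_VMETHODS are made up for illustration, the vmethod table holds just two entries, and it assumes that the mere presence of a key is what makes the hash item win.

```python
# Hypothetical stand-in for TT2's hash virtual method table (two entries only).
HASH_VMETHODS = {
    "size": lambda h: len(h),
    "item": lambda h, key: h.get(key),
}

def tt2_dot(hash_ref, name, *args):
    """Resolve hash.name in TT2 order: a hash item wins, a vmethod is the fallback."""
    if name in hash_ref:
        return hash_ref[name]
    if name in HASH_VMETHODS:
        return HASH_VMETHODS[name](hash_ref, *args)
    return None

font1 = {"family": "Georgia", "weight": "bold"}
font2 = {"family": "Georgia", "weight": "normal", "size": "140%"}

print(tt2_dot(font2, "size"))   # the stored item: 140%
print(tt2_dot(font1, "size"))   # no item, so the vmethod answers: 2

# An 'item' key shadows the item vmethod, so the explicit lookup is lost too.
odd = {"item": "oops", "size": "140%"}
print(tt2_dot(odd, "item", "size"))   # the stored 'oops', not 140%
```

The caller gets a value back in every case and has no way of telling which of the two branches produced it.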

Note that this problem only occurs with hash virtual methods. Lists are unaffected because their items always have numerical indices. Text strings don't have any sub-items so any dot operations are always virtual methods.

Possible Solutions

Resolve Virtual Methods First

One solution is to make virtual methods resolve before hash items. This would at least solve the problem of being able to call the item() virtual method, even in the presence of an item key in the hash. A reciprocal method() method would perform the same explicit lookup for virtual methods only.

[% font.item('size')   %]
[% font.method('size') %]

The other benefit of resolving virtual methods before hash items is that the list of virtual methods is finite, usually static, and typically known in advance whereas the contents of a hash array generally aren't.

In other words, you will know that font.size will always resolve to the virtual method (unless you've explicitly disabled that particular virtual method, or indeed all hash virtual methods as will be possible). Under the current system, you can never be sure without first looking to see if such an item is defined in the hash (and even then, you can never be sure that the font.item call is being mapped to the item() virtual method).

The downside is that accessing hash items will take slightly longer as it requires an additional lookup in the hash virtual method table first. However, it should be possible to mitigate this by specifying that virtual methods are fixed at compile time. This will allow the compiler to detect those dotops that are candidates for being vmethods and those that aren't. The correct code can be generated in each case to either look in the virtual methods table first, or skip that step if the dotop isn't recognised as being a virtual method of any kind.
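
As a rough illustration of that compile-time split, here is a Python sketch. It assumes the set of hash virtual methods is frozen before any template is compiled; compile_dotop and HASH_VMETHODS are hypothetical names, and a real compiler would emit code rather than return closures.

```python
# Hypothetical set of hash virtual methods, fixed at compile time.
HASH_VMETHODS = {
    "size": lambda h: len(h),
    "keys": lambda h: sorted(h),
    "item": lambda h, key: h.get(key),
}

def compile_dotop(name):
    """Return a lookup function specialised for one dotop name."""
    vmethod = HASH_VMETHODS.get(name)
    if vmethod is None:
        # Not a known vmethod: generate the fast path, a plain item fetch
        # that never touches the vmethod table at run time.
        def fetch_item(hash_ref, *args):
            return hash_ref.get(name)
        return fetch_item

    # Known vmethod name: under this proposal the vmethod always wins.
    def call_vmethod(hash_ref, *args):
        return vmethod(hash_ref, *args)
    return call_vmethod

font = {"family": "Georgia", "size": "140%", "item": "oops"}
get_family = compile_dotop("family")
get_size = compile_dotop("size")
get_item = compile_dotop("item")

print(get_family(font))         # Georgia
print(get_size(font))           # 3 (the vmethod, not the stored '140%')
print(get_item(font, "size"))   # 140% (item() still works despite the 'item' key)
```

The table lookup happens once per dotop when the template is compiled, so plain item accesses such as font.family pay no extra cost when the template runs.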

Namespace Prefixes

While that provides a solution, it is hardly optimal in terms of syntax.

[% font.item('size')   %]   vs   [% font.size %]
[% font.method('size') %]   vs   [% font.size %]

One further solution is to allow namespace prefixes to adorn dotops. Namespace prefixes are a general purpose extension to TT3 that will be discussed in detail in a subsequent design document. In brief, they look like this:

prefix:value

Here are some examples of namespaces in use:

[% foo = var:include        %]     # the variable called 'include'
[% bar = word:hello         %]     # the word 'hello'
[% baz = file:wibble.txt    %]     # URIs
[% bam = http://example.com %]

Applying this to dotops, we could have:

[% font.item:size   %]
[% font.method:size %]
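
To make the intended behaviour concrete, here is a small Python model of a prefixed dotop. It is a sketch under assumptions: resolve_dotop and HASH_VMETHODS are invented names, only the item: and method: prefixes are handled, and an unprefixed dotop is assumed to keep the TT2 order (item first, then vmethod).

```python
# Hypothetical stand-in for the hash virtual method table.
HASH_VMETHODS = {"size": lambda h: len(h), "keys": lambda h: sorted(h)}

def resolve_dotop(hash_ref, dotop):
    """Resolve one dotop that may carry an 'item:' or 'method:' prefix."""
    prefix, sep, name = dotop.partition(":")
    if not sep:
        # No prefix (assumption): TT2 order, item first, then vmethod.
        if dotop in hash_ref:
            return hash_ref[dotop]
        if dotop in HASH_VMETHODS:
            return HASH_VMETHODS[dotop](hash_ref)
        return None
    if prefix == "item":
        return hash_ref.get(name)            # hash item only
    if prefix == "method":
        return HASH_VMETHODS[name](hash_ref)  # virtual method only
    raise ValueError(f"unknown dotop prefix: {prefix}")

font1 = {"family": "Georgia", "weight": "bold"}
font2 = {"family": "Georgia", "weight": "normal", "size": "140%"}

print(resolve_dotop(font2, "item:size"))     # 140%
print(resolve_dotop(font2, "method:size"))   # 3
print(resolve_dotop(font1, "item:size"))     # None, so an IF test now fails as wanted
```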

Explicit Parens

Another solution is to identify virtual methods by the presence of a (possibly empty) parenthesised argument list.

[% font.size %]         # hash item
[% font.size() %]       # hash method

This provides a lightweight syntax for unambiguously differentiating between the two. However, it would require that all dotops have argument lists (or just hash dotops, but that would be inconsistent).

[% font.keys().join().html() %] vs [% font.keys.join.html %]

This also clashes with another proposed change to the TT language which uses the presence or absence of parentheses to determine if it should call a subroutine or reference it.

# proposed TT3 behaviour
[% foo = bar() %]       # call bar, store result in foo
[% wiz = bar   %]       # store bar reference in wiz
[% foo = wiz() %]       # call wiz(), aliased to bar()

I'm not ruling this one out just yet, but let's just say that it's already got a pencil line through it and a side comment in red pen saying "NO NO NO!". Feel free to convince me otherwise if you think this is the way forward.

High Precedence Pipe Operator

With the merging of filters and text virtual methods in TT3, there will be a direct parallel between the pipe operator | (currently an alias for the FILTER keyword) and the invocation of a virtual method.

# effectively the same thing in TT3
[% text|html %]
[% text.html %]

However, the | pipe operator has traditionally had a much lower precedence than the . dot operator.

# TT2
[% INCLUDE header title=page.title | indent(4) %]

In the above example, the output of the INCLUDE directive is piped into the indent filter. However, if the | pipe operator were to be used at the same precedence level as the . dot operator, then the page.title would be piped into the indent filter/vmethod first, with the resulting output then being assigned to the title variable for passing into the header template.

Using <<...>> to indicate the precedence, that would be parsed as:

# <<...>> indicates the bit that gets evaluated first
[% INCLUDE header title = << page.title | indent(4) >> %]
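
The difference between the two parses is easy to see when they are written out as ordinary function calls. The Python below is only a stand-in: include_header plays the part of a hypothetical two-line header template and indent approximates the indent filter by padding every line.

```python
def indent(text, width):
    """Indent every line of text by width spaces (stand-in for the indent filter)."""
    pad = " " * width
    return "\n".join(pad + line for line in text.split("\n"))

def include_header(title):
    """Hypothetical 'header' template that produces two lines of output."""
    return f"<h1>{title}</h1>\n<hr>"

title = "Home"

# Low precedence (TT2): the filter wraps the whole INCLUDE directive.
low = indent(include_header(title=title), 4)

# High precedence: the filter binds to page.title alone.
high = include_header(title=indent(title, 4))

print(repr(low))    # '    <h1>Home</h1>\n    <hr>'
print(repr(high))   # '<h1>    Home</h1>\n<hr>'
```

Both forms run without error, which is exactly why a change of precedence would break existing templates quietly rather than loudly.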

We could work around that problem by using the filter keyword as the lower precedence keyword when required.

# TT3
[% include header title=page.title filter indent(4) %]
# parsed as
[% << include header title=page.title >> filter indent(4) %]

Here is a rather contrived example that uses both:

[% include example items=hash|keys|join filter indent(4) %]

Although that is in keeping with other cases where keywords denote lower precedence counterparts of their symbolic forms (e.g. and vs &&, not vs !), it contradicts the Unix | pipe operator from which it is derived. In the shell it effectively has very low precedence, combining a series of commands and their arguments rather than operating on any of the individual arguments.

$ cat /some/file | grep wibble        # Unix shell

Reformatting the earlier example slightly, it becomes much less obvious that the | is only operating on the preceding token, rather than the whole expression.

[% include example items=hash
         | keys
         | join
         filter indent(4)
%]

I know for a fact that I've got lots of places where I've used | in this way, all of which will become subtly broken by this change. I suspect that means that many others will be in the same position. For that reason, I think it's probably best to leave | at the same low precedence level that it currently has. It will then be a direct alias for the filter keyword. Nothing more, nothing less.

Furthermore, using the pipe operator as a high precedence dot operator for calling virtual methods only solves half the problem. There is no existing counterpart for fetching a hash item while bypassing any virtual method lookup, and few, if any, unused symbols that could be pressed into this service.

Dotop Modifiers

Another possible solution is to use an additional symbol after the dot to denote a more specific operation. The obvious choice for calling a virtual method would appear to be .|

[% hash.|keys %]        # keys vmethod

I don't think it looks particularly nice, but it's hopefully not something that you'll need to use that often. In the simple case, | should work fine by itself.

[% hash.|keys %]        # keys vmethod (high precedence)
[% hash|keys %]         # keys vmethod (low precedence)

It's only when you're using it as part of another expression that you may need to be explicit:

[% include example text=hash.|keys.join | indent(4) %]

In fact, I really don't like that much at all. But then I would probably write the filter out as a keyword in that case:

[% include example text=hash.|keys.join filter indent(4) %]

And let's not forget that this only applies if my hash could possibly contain a keys item. Otherwise it would be dotops all the way.

[% include example text=hash.keys.join | indent(4) %]

So I guess I'm prepared to overlook the ugliness, given that it's something of an edge case.

One candidate for the counterpart operator (to indicate that we're only interested in looking up a hash item) is .\

[% hash.\keys %]        # hash item 'keys'
[% hash.|keys %]        # hash virtual method 'keys'

I think there's a nice duality between the two operators. They're close enough in appearance to reflect the fact that they're doing similar things but different enough to be visually distinctive. Furthermore, the backslash symbol is typically used (in TT and elsewhere) to escape the following symbol.

[% some_text = "The $item costs over \$1000" %]

In a similar sense, the backslash is escaping the next word to tell TT to treat it as a literal entry in the hash rather than invoking any special meaning (i.e. a virtual method).

Given that we accept | by itself, we may also want to do the same for \.

[% hash|keys %]     # low precedence form of .|
[% hash\keys %]     # "    "    "    "    "  .\
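
Here is a Python sketch of how a chain of dotops with these modifiers might be resolved. Everything in it is an assumption made for illustration: the names resolve_step and resolve_path, the tiny vmethod tables, the TT2-style item-first order for unmarked dotops, and the sorted output of keys (chosen only so the result is predictable). It covers the dotted forms only, not the bare | and \ forms.

```python
import re

# Hypothetical vmethod tables; keys is sorted only to keep the output stable.
HASH_VMETHODS = {"keys": lambda h: sorted(h), "size": lambda h: len(h)}
LIST_VMETHODS = {"join": lambda l: " ".join(str(x) for x in l),
                 "size": lambda l: len(l)}

# One segment: a dot, an optional | or \ modifier, then a name.
SEGMENT = re.compile(r"\.([|\\]?)(\w+)")

def resolve_step(value, modifier, name):
    """Apply a single dotop, honouring the .| and .\\ modifiers on hashes."""
    if isinstance(value, dict):
        if modifier == "\\":
            return value.get(name)               # hash item only
        if modifier == "|":
            return HASH_VMETHODS[name](value)    # virtual method only
        # Unmarked (assumption): TT2 order, item first, then vmethod.
        if name in value:
            return value[name]
        if name in HASH_VMETHODS:
            return HASH_VMETHODS[name](value)
        return None
    if isinstance(value, list):
        # Lists have no named items, so any name is a vmethod.
        return LIST_VMETHODS[name](value)
    return None

def resolve_path(root, path):
    """Evaluate a chain such as '.|keys.join' against root."""
    if not re.fullmatch(r"(?:\.[|\\]?\w+)+", path):
        raise ValueError(f"cannot parse dotop chain: {path!r}")
    value = root
    for modifier, name in SEGMENT.findall(path):
        value = resolve_step(value, modifier, name)
    return value

data = {"keys": "a stored value", "b": 1, "a": 2}

print(resolve_path(data, ".keys"))         # a stored value
print(resolve_path(data, ".\\keys"))       # a stored value
print(resolve_path(data, ".|keys.join"))   # a b keys
```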

Other Candidates

Having given this subject careful consideration over the best part of 5 years or so, I think I can honestly say that I really haven't got a clue about which symbols are most appropriate for this role. There's nothing that jumps out as being obvious, and the symbols that do suggest themselves already have other, conflicting roles.

The .| and .\ are probably the most logically consistent choices so far, but they're far from perfect, IMHO.

So here are some of the other things that I've considered.

Quotes

Single or double quoted strings indicate literal hash items, backquoted strings indicate virtual methods.

[% hash.'keys' %]         # hash item 'keys'
[% hash."keys" %]         # ditto
[% hash.`keys` %]         # virtual method 'keys'

This is nice because backquotes have traditionally been used to indicate execution (e.g. of a shell command) whereas single/double quotes are typically literal values (although double quotes can of course contain interpolated values).

One minor complaint is that it's not always easy to distinguish between single quotes and backquotes in certain fonts, particularly in smaller sizes. Although we can easily brush that off as a fault of the font (or display) rather than the language, the fact that they are less visually distinct should not be ignored. Furthermore, it's not unheard of for novice programmers or non-programming designers to not realise that there's a difference between ' and `, and write ``hello world'' when they really mean "hello world".

Percent / Ampersand

Playing along with Perl's sigils that use % to indicate a hash array and & for code.

[% hash.%keys %]        # hash item 'keys'
[% hash.&keys %]        # virtual method 'keys'

I quite like the &keys to indicate code, but I'm not sure about the %keys. Apart from anything else, I've got plans to use that for hash expansion (along with @ for list expansion).

[% combo_list = [@list1, @list2] %]             # list expansion
[% combo_hash = [%hash1, %hash2] %]             # hash expansion

By natural progression to dotops:

[% my.combo_list = [my.@list1, my.@list2] %]    # list expansion
[% my.combo_hash = [my.%hash1, my.%hash2] %]    # hash expansion

Post-Circumfix Operators

We could support Perl's hash syntax for direct access to items:

[% hash{item} %]

But then we should probably also support a similar syntax for lists.

[% list[n] %]

What if you use the "wrong" brackets on a hash or list? TT doesn't care when it comes to dotops, so it probably shouldn't care anywhere else.

[% list{3} %]
[% hash['keys'] %]

In which case, is the difference between {...} and [...] only that we automatically quote items in {...} but not in [...] (like Perl)?

[% hash['item'] %]  ===  [% hash{item} %]
[% hash[key] %]     ===  [% hash{$key} %]

Note that numbers don't care if they're quoted or not, so the difference is moot when it comes to lists. However, if you wanted to use a variable to provide the list index then the difference between {...} and [...] would be significant.

[% list[3] %]       ===  [% list{3} %]
[% list[n] %]       ===  [% list{$n} %]
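
The quoting rule being floated here can be sketched in Python as follows. It is a guess at the semantics rather than a specification: subscript_key and fetch are hypothetical helpers, {...} is assumed to auto-quote bare words and treat a leading $ as a variable reference, and [...] is assumed to treat bare words as variables and quoted strings as literals.

```python
def subscript_key(bracket, content, variables):
    """Turn the text inside {...} or [...] into the key to look up."""
    if bracket == "{":
        # Curly: bare words are auto-quoted; a leading $ asks for a variable.
        if content.startswith("$"):
            return variables[content[1:]]
        return content
    if bracket == "[":
        # Square: nothing is auto-quoted, so quotes mark literals.
        if len(content) >= 2 and content[0] == content[-1] and content[0] in "'\"":
            return content[1:-1]
        if content.isdigit():
            return content            # numbers don't care about quoting
        return variables[content]     # a bare word is a variable
    raise ValueError(f"unknown bracket: {bracket!r}")

def fetch(container, bracket, content, variables):
    """Look up one subscript; the container type decides item vs. index."""
    key = subscript_key(bracket, content, variables)
    if isinstance(container, list):
        key = int(key)                # list indices are numeric either way
    return container[key]

variables = {"key": "weight", "n": 1}
font = {"family": "Georgia", "weight": "bold"}
names = ["a", "b", "c", "d"]

print(fetch(font, "[", "'weight'", variables), fetch(font, "{", "weight", variables))
print(fetch(font, "[", "key", variables), fetch(font, "{", "$key", variables))
print(fetch(names, "[", "3", variables), fetch(names, "{", "3", variables))
print(fetch(names, "[", "n", variables), fetch(names, "{", "$n", variables))
```

Each printed pair is identical, matching the equivalences listed above. The case the model rejects is list{n}: the bare word n is quoted rather than looked up, so it cannot be used as an index.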

Although this fits quite nicely with TT's idea of how the world of data works, it is somewhat different to Perl's outlook. That doesn't bother me by itself because TT has never been afraid to do things differently from Perl. But the fact that we're reusing Perl's specific syntax while changing the meaning slightly makes me wary that we could end up confusing people.

It also looks rather cumbersome when chained together. Although we've already established that these are edge cases that shouldn't need to be used that often, it would still be nice to have something that looked less "brackety" and closer to what dotops look like.

[% foo.bar.baz %]
[% foo{bar}{baz} %]

Conclusion

http://tt3.template-toolkit.org/docs/design/TT3DD05.html last modified 13:25:03 10-Dec-2009