TADS 3 System Development

Locational naming

I'm finally getting around to a language feature I've been meaning to add for a long time now. The design is pretty straightforward, but I'd like to get some opinions before settling on the exact syntax.

I'll start by describing the feature at a high level, then I'll get into the syntax details.

First the motivation. Object naming in larger TADS games gets a little tedious, mostly because all object names are global. For a common object like a table, it'd be nice to be able to name it something simple like 'table', but we usually can't because we might have a few other tables scattered around the game. The usual way I deal with this is to choose fairly long names for items, doing something like combining the room name and the item name: kitchenTable, say, or iceCavePedestal. That solves the problem of keeping names unique, but at the cost of making the names hard to read and tedious to type.

(Note that anonymous objects were motivated by this same problem, and went a long way toward solving it. Anonymous objects neatly deal with the very common case of self-contained objects that no one else needs to refer to directly, such as decorations and components. But anonymity doesn't work when we need to refer to an object from code belonging to another object. That happens often enough that the object naming nuisance is still with us.)

Given this pattern of ad hoc object naming by location, how could we make it more automatic, and also improve readability and reduce our keyboard workload?

The compiler already has some awareness of the game world's containment structure, thanks to the '+' syntax for object definitions. That syntax lets us mirror the containment structure of the game world in the lexical structure of the source code, in a fairly natural way. The idea behind the new feature is to take this compile-time containment structure and use it to partition the object namespace. Rather than having to *explicitly* use a name like kitchenTable, it would be nicer to be able to call it simply 'table', and let the compiler remember that it's defined within the kitchen. If we also have a table defined in the parlor, it would be nice to be able to call it simply 'table' as well, and let the compiler sort out which is which based on context, and when there's not enough context, based on some new syntax that lets us tell the compiler which one we're talking about.

The details of the new feature:

1. Object names can be repeated, as long as a name is only used once at a given containment level.

kitchen: Room;
+ table: Surface;
++ box: Container;

parlor: Room;
+ table: Surface;
++ box: Container;

2. Within an object definition, the naming context is established by the object's location. Within a parlor method, a reference to 'table' is the parlor table; within a kitchen method, a reference to 'table' is the kitchen table. This extends inwards and outwards, so that 'table' means the parlor table from within the parlor itself, from within the parlor table, and from within the box on the parlor table.

3. When the location context doesn't resolve an ambiguous name, we can explicitly name the object path. For example, in the code for livingRoom, outside of the kitchen and parlor namespaces, if we want to refer to one of those tables, we have to call it 'kitchen.table' or 'parlor.table'.

4. Similarly, if we want to refer to something outside of the current context namespace, we can use an explicit path. E.g., within a kitchen method, we can refer to parlor.table.

5. Paths only have to qualify things as far as they're unique, so we can refer to parlor.box, for example - we don't have to write the full path to parlor.table.box.

6. Globally unique object names never have to be qualified when referenced. If there's only one 'box' object defined in the game, there's never a need to use a location path to refer to it, no matter what the context - 'box' can only mean that one object. This is crucial because it means that existing code is seamlessly compatible with the new naming system. All existing code necessarily uses a unique name for each object, since reuse was always an error before, so all object references in existing code are already unique without any location qualifiers.
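To make the six rules concrete, here is a rough Python model of how a compiler might resolve names against the containment tree. This is only a sketch of one possible reading of the rules, not the actual TADS compiler logic: the `Node` class, the `resolve` function, and the choice to report an error whenever a scope contains more than one match are all assumptions made for illustration.

```python
class Node:
    """One object in the compile-time containment tree (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            # Rule 1: a name may be used only once per containment level.
            if name in parent.children:
                raise ValueError(f"duplicate name {name!r} at this level")
            parent.children[name] = self

    def path(self):
        """Full dotted path from the top level, e.g. 'parlor.table.box'."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return ".".join(reversed(parts))

    def descendants(self):
        """Yield every object nested anywhere below this one."""
        for child in self.children.values():
            yield child
            yield from child.descendants()


def find_within(scope, name):
    """Objects called `name` in scope's subtree, counting scope itself."""
    hits = [n for n in scope.descendants() if n.name == name]
    if scope.name == name:
        hits.append(scope)
    return hits


def resolve(context, dotted):
    """Resolve a possibly qualified name as seen from the `context` object."""
    first, *rest = dotted.split(".")
    # Rules 2 and 6: search the context's own subtree first, then each
    # enclosing level, ending with the whole game (the unnamed root).
    scope = context
    while True:
        hits = find_within(scope, first)
        if len(hits) == 1:
            node = hits[0]
            break
        if len(hits) > 1:
            # Assumption: more than one candidate in a scope is an error.
            raise LookupError(f"{first!r} is ambiguous here; qualify it")
        if scope.parent is None:
            raise LookupError(f"{first!r} is not defined")
        scope = scope.parent
    # Rules 3 to 5: each later path element only has to be unique
    # somewhere inside the element before it.
    for name in rest:
        hits = [n for n in node.descendants() if n.name == name]
        if len(hits) != 1:
            raise LookupError(f"{name!r} is missing or ambiguous in {node.path()!r}")
        node = hits[0]
    return node


# The example tree from rule 1, plus a livingRoom for rule 3.
root = Node(None)  # unnamed top level
kitchen = Node("kitchen", root)
kitchen_table = Node("table", kitchen)
kitchen_box = Node("box", kitchen_table)
parlor = Node("parlor", root)
parlor_table = Node("table", parlor)
parlor_box = Node("box", parlor_table)
living_room = Node("livingRoom", root)

print(resolve(parlor_box, "table").path())        # parlor.table
print(resolve(kitchen, "parlor.table").path())    # parlor.table
print(resolve(living_room, "parlor.box").path())  # parlor.table.box
```

In this model, a bare `table` referenced from `living_room` raises `LookupError`, which corresponds to the case in rule 3 where an explicit path is required.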

Now, on to my syntax questions.

When I started thinking about this feature, it seemed natural to use "." as the location path operator. This overloads the "." symbol, which of course also is used for property evaluation, but there's no ambiguity because of the rule that a given symbol can be an object name or a property name, but not both. If an object name is on the right side of a dot, there's only one possible meaning. There's also an excellent precedent for this within the C++ language family, in that Java uses "." as the package namespace scoping operator, which I think is a fairly close parallel.

If we relied on C++ alone as our syntax model, we'd probably choose "::" as the operator, since C++ uses that as its namespace scoping operator. The advantage of using "::" instead of "." for TADS location naming is that there's no ambiguity in the syntax - anyone looking at a piece of code would be able to tell that they're looking at an object location path, without having to know anything about the object names involved. With ".", there's no semantic ambiguity, but there's syntactic ambiguity - you have to know how the names are defined to know whether a given "." means property evaluation or location scoping. So a casual reader looking at a piece of code without having studied the overall context might misinterpret it.

Of the two, "." or "::", I like "." better aesthetically. I do see some value in the clarity of using separate operators for scoping vs property evaluation, but "." just looks a little cleaner to me somehow.

The wrench in the works, though, is that we need a global scoping operator, and I don't think "." will work for that. The global scoping operator is needed for situations like this:

kitchen: Room;
+ table: Surface;
++ box: Container;

box: Container;

We have a box within the kitchen, and a box out on its own at the top of the location tree. If we refer to just 'box' anywhere within the kitchen object tree, the context rule will always give us kitchen.table.box. If for some reason we really want to refer to that top-level box instead within the kitchen context, we need a global scoping operator - we need a way to say "the outermost 'box'". In C++, we'd write ::box. I don't think there's an equivalent with Java packages - or, rather, I think the equivalent in Java is that you simply have to move that outermost 'box' into a namespace if you want to be able to refer to it from within another namespace. I'm pretty sure I want to have an explicit outer scoping operator, though, mostly because it makes things a little easier to isolate when writing extensions and libraries.
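The shadowing problem can be sketched with a small Python model. Everything here is illustrative: `OBJECTS`, `resolve_in_context`, and `resolve_global` are made-up names, and the innermost-scope-first search is just one reading of the context rule. The post does not say what a bare 'box' should mean outside the kitchen tree, so the model simply reports an ambiguity there.

```python
# All objects in the example, written as full containment paths.
OBJECTS = [
    ("kitchen",),
    ("kitchen", "table"),
    ("kitchen", "table", "box"),
    ("box",),  # the stand-alone box at the top of the location tree
]


def resolve_in_context(context, name):
    """Contextual lookup: try the innermost enclosing scope first."""
    for depth in range(len(context), -1, -1):
        scope = context[:depth]
        # Candidates live inside `scope` (or are `scope` itself) and end in `name`.
        hits = [p for p in OBJECTS if p[:depth] == scope and p[-1] == name]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            # Assumption: several candidates at one level is an error.
            raise LookupError(f"{name!r} is ambiguous here")
    raise LookupError(f"{name!r} is not defined")


def resolve_global(name):
    """Model of a global-scope reference, such as the C++-style ::box."""
    if (name,) in OBJECTS:
        return (name,)
    raise LookupError(f"no top-level object named {name!r}")


# From anywhere inside the kitchen tree, plain 'box' finds the nested box.
print(resolve_in_context(("kitchen",), "box"))          # ('kitchen', 'table', 'box')
print(resolve_in_context(("kitchen", "table"), "box"))  # ('kitchen', 'table', 'box')

# Only an explicit global reference reaches the top-level box.
print(resolve_global("box"))                            # ('box',)
```

Without something like `resolve_global`, no reference written inside the kitchen tree can reach the top-level box in this model, which is exactly the gap a global scoping operator would fill.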

Here are the options I see:

1. ::box, kitchen.box - use :: as the global scope operator only, and use . as the scope path operator. The upside is that most paths will contain only .'s because most paths won't need to shoot out to global scope, and when they do, it'll mostly be single element paths like ::box. The downside is the inconsistent representation of what are essentially two facets of the same operator; multi-element global paths like ::kitchen.box make this especially apparent.

2. ::box, kitchen::box - use :: as the scoping operator in all cases. It's consistent, but I find paths like kitchen::box somewhat less aesthetically pleasing than kitchen.box.

3. kitchen.box - use . as the infix operator, and don't allow reusing an object name at global scope. There's no need for a global scope operator in this case because all ambiguous names are inside unique root objects.

The more I think about it, the more this strikes me as a reasonable and maybe even desirable restriction. Allowing reuse at the global level seems like a recipe for confusion when mixing code from multiple sources, like libraries and extensions.

4. outer.box, kitchen.box - use a keyword to represent the global scope, such as outer, global, unnamed, or anonymous. In principle I kind of like this approach, but in practice I haven't been able to come up with a keyword that I like for it.

Any thoughts are welcome.
  • Nested objects are members of the object they're defined in. "+" objects are not members. Overloading "." breaks the intuitive scoping rules we currently have, leaking the member namespace upwards. The member namespace should really be private to the object and should not conflict with identifiers outside of it. (As a side effect, it does break backwards compatibility; the above code compiles just fine right now.) Using a new operator, like "::", allows it to have a whole new meaning, including the "it's not exact" behavior, where "foo::bar" and "foo::baz::bar" can refer to the same thing but "foo.bar" and "foo.baz.bar" are different things. If "." is overloaded, then:
    
    kitchen: Room {
        stateHandler: Handler {
            burnedDown = nil;
        }
    }
    

    one would expect to write "kitchen.burnedDown" and have it work. But it won't work, since we have two different rules for the same operator. As for whether it's common for users to write code like that, hey, you yourself told me recently something about "trying to outsmart the user and decide we know better than they do what they want" :-)

    Another thing is that a new user, just learning Tads, will never expect that "." can be used for that. Even people coming from Java and C# will not expect it, since "." is actually a member access operator. It just happens that all members, without exception, belong to the namespace of the class. The rvalue is always a member of a class. You can't access non-members with ".". It's not intuitive to overload "." for this.

    I suppose the core reason I really dislike the idea is that introducing limitations and special cases to a language because of a subjective perception about the aesthetic on-screen appearance of an ASCII character, does not look like good design to me. One of my professors once told me that whenever you come up with a feature for a language, the first thing you should do is take a look at the problems it can create rather than the ones it can solve. The intentions of overloading "." are well meant indeed. But it just looks like abusive overloading to me (I've certainly seen much of that with C++ operator overloading; don't get me started on that one.) Don't forget that in Perdition's Flames you wrote: "Hell is paved with good intentions." :-)

    Also, "::" is more future proof, in case you decide to introduce full namespace support with its own set of rules.

    Edit:
    Oh, just to make it clear: I don't feel *that* strongly about the issue of overloading ".", even if the discussion might look very involved. Personally I would have no trouble understanding the quirks of overloading it. It's just that Tads already has a "difficult to fully comprehend" tag attached to it, and I just find it preferable to keep that to a minimum.

    Edited at 2012-07-28 09:30 pm (UTC)
    • Re backward compatibility: I'm not sure I see what you're getting at there; what gets broken? The earlier code example of

      kitchen: Room
      box: Thing
      ;
      + box: Thing;

      definitely won't compile with existing versions - you'll get "symbol 'box' is already defined - can't redefine as object" at the "+ box" definition, since "box" is already a property.
      • You're right. Now I'm confused myself. Tads is already leaking member identifiers into the global scope.

        I suppose that means overloading "." is confusingly consistent at some level :-P
        • "I suppose that means overloading '.' is confusingly consistent at some level" - actually, I kind of see it the other way around. It's not that member identifiers leak into global scope; it's that properties aren't namespace members. Classes aren't namespaces, and "." for property evaluation isn't a scoping operator. A better analog in C++ would be an indexing operator.

          This might be part of why I haven't been seeing the overloading as being as potentially confusing as you have - property evaluation is a dynamic run-time operator, and locational naming is just a way of writing a compound identifier that's resolved at compile time. The two uses occupy such widely separated conceptual spots that I haven't been worried about overloading the symbol, much as I assume you don't have similar objections to '.' also being overloaded as the decimal point symbol in floating-point constants. The decimal point overload is easily distinguishable at a glance on the basis of lexical structure, which isn't true of the namespace overload, so I think it comes back to the question of whether it would be distinguishable at a glance on the basis of the English semantics of the object names involved. My intuition is: most of the time yes, some of the time no; but maybe that's not a satisfactory average case.


          Edited at 2012-07-30 11:05 pm (UTC)