TADS 3 System Development

Locational naming

I'm finally getting around to a language feature I've been meaning to add for a long time now. The design is pretty straightforward, but I'd like to get some opinions before settling on the exact syntax.

I'll start by describing the feature at a high level, then I'll get into the syntax details.

First the motivation. Object naming in larger TADS games gets a little tedious, mostly because all object names are global. For a common object like a table, it'd be nice to be able to name it something simple like 'table', but we usually can't because we might have a few other tables scattered around the game. The usual way I deal with this is to choose fairly long names for items, doing something like combining the room name and the item name: kitchenTable, say, or iceCavePedestal. That solves the problem of keeping names unique, but at the cost of making the names hard to read and tedious to type.

(Note that anonymous objects were motivated by this same problem, and went a long way toward solving it. Anonymous objects neatly deal with the very common case of self-contained objects that no one else needs to refer to directly, such as decorations and components. But anonymity doesn't work when we need to refer to an object from code belonging to another object. That happens often enough that the object naming nuisance is still with us.)

Given this pattern of ad hoc object naming by location, how could we make it more automatic, and also improve readability and reduce our keyboard workload?

The compiler already has some awareness of the game world's containment structure, thanks to the '+' syntax for object definitions. That syntax lets us mirror the containment structure of the game world in the lexical structure of the source code, in a fairly natural way. The idea behind the new feature is to take this compile-time containment structure and use it to partition the object namespace. Rather than having to *explicitly* use a name like kitchenTable, it would be nicer to be able to call it simply 'table', and let the compiler remember that it's defined within the kitchen. If we also have a table defined in the parlor, it would be nice to be able to call it simply 'table' as well, and let the compiler sort out which is which based on context, and when there's not enough context, based on some new syntax that lets us tell the compiler which one we're talking about.

The details of the new feature:

1. Object names can be repeated, as long as a name is only used once at a given containment level.

kitchen: Room;
+ table: Surface;
++ box: Container;

parlor: Room;
+ table: Surface;
++ box: Container;

2. Within an object definition, the naming context is established by the object's location. Within a parlor method, a reference to 'table' is the parlor table; within a kitchen method, a reference to 'table' is the kitchen table. This extends inwards and outwards, so that 'table' means the parlor table when referenced from the parlor itself, from the parlor table, or from the box on the parlor table.

3. When the location context doesn't resolve an ambiguous name, we can explicitly name the object path. For example, in the code for livingRoom, outside of the kitchen and parlor namespaces, if we want to refer to one of those tables, we have to call it 'kitchen.table' or 'parlor.table'.

4. Similarly, if we want to refer to something outside of the current context namespace, we can use an explicit path. E.g., within a kitchen method, we can refer to parlor.table.

5. Paths only have to qualify things as far as they're unique, so we can refer to parlor.box, for example - we don't have to write the full path to parlor.table.box.

6. Globally unique object names never have to be qualified when referenced. If there's only one 'box' object defined in the game, there's never a need to use a location path to refer to it, no matter what the context - 'box' can only mean that one object. This is crucial because it means that existing code is seamlessly compatible with the new naming system. All existing code necessarily uses a unique name for each object, since reuse was always an error before, so all object references in existing code are already unique without any location qualifiers.
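To make rules 1-6 concrete, here is a rough Python model of how the lookup might work. It is only a sketch of one possible reading of the rules, not the compiler's actual algorithm: the Obj class, the resolve function, the unnamed root object, and the 'lamp' object are all invented for illustration, and the choice to report an error as soon as a name is ambiguous at some level is an assumption.

```python
class Obj:
    """One game object in the compile-time '+' containment tree (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def descendants(self):
        """Yield every object nested anywhere below this one."""
        for child in self.children:
            yield child
            yield from child.descendants()

    def path(self):
        """Full dotted path from the top level, e.g. 'parlor.table.box'."""
        parts = []
        node = self
        while node.parent is not None:  # stop at the unnamed root
            parts.append(node.name)
            node = node.parent
        return ".".join(reversed(parts))


def find_unique(scope, name):
    """Return the one object called `name` below `scope`, or None if there is none."""
    matches = [o for o in scope.descendants() if o.name == name]
    if len(matches) > 1:
        # Assumption: ambiguity is an error rather than "first match wins".
        raise LookupError(f"'{name}' is ambiguous here; qualify it with a path")
    return matches[0] if matches else None


def resolve(ref, context):
    """Resolve a reference such as 'table' or 'parlor.box' written inside `context`."""
    first, *rest = ref.split(".")
    # Rules 2 and 6: look in the current object's subtree, then widen outwards
    # one level at a time until the unnamed root (the whole game) is reached.
    scope, target = context, None
    while scope is not None and target is None:
        target = find_unique(scope, first)
        scope = scope.parent
    if target is None:
        raise LookupError(f"undefined object '{first}'")
    # Rules 3-5: each further path element only has to be unique below the
    # previous one, so 'parlor.box' can stand for 'parlor.table.box'.
    for name in rest:
        found = find_unique(target, name)
        if found is None:
            raise LookupError(f"no '{name}' inside '{target.path()}'")
        target = found
    return target


# The example world from rule 1, plus a hypothetical living room and lamp.
root = Obj("")
kitchen = Obj("kitchen", root)
kitchen_table = Obj("table", kitchen)
kitchen_box = Obj("box", kitchen_table)
parlor = Obj("parlor", root)
parlor_table = Obj("table", parlor)
parlor_box = Obj("box", parlor_table)
living_room = Obj("livingRoom", root)
lamp = Obj("lamp", living_room)

print(resolve("table", parlor_box).path())   # parlor.table
print(resolve("parlor.box", kitchen).path())  # parlor.table.box
print(resolve("lamp", kitchen_box).path())    # livingRoom.lamp
```

In this model, a bare 'table' written in the livingRoom code raises an error, because neither the living room's own subtree nor the game as a whole has exactly one table; that is the case where rule 3 asks for kitchen.table or parlor.table.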

Now, on to my syntax questions.

When I started thinking about this feature, it seemed natural to use "." as the location path operator. This overloads the "." symbol, which of course also is used for property evaluation, but there's no ambiguity because of the rule that a given symbol can be an object name or a property name, but not both. If an object name is on the right side of a dot, there's only one possible meaning. There's also an excellent precedent for this within the C++ language family, in that Java uses "." as the package namespace scoping operator, which I think is a fairly close parallel.

If we relied on C++ alone as our syntax model, we'd probably choose "::" as the operator, since C++ uses that as its namespace scoping operator. The advantage of using "::" instead of "." for TADS location naming is that there's no ambiguity in the syntax - anyone looking at a piece of code would be able to tell that they're looking at an object location path, without having to know anything about the object names involved. With ".", there's no semantic ambiguity, but there's syntactic ambiguity - you have to know how the names are defined to know whether a given "." means property evaluation or location scoping. So a casual reader looking at a piece of code without having studied the overall context might misinterpret it.

Of the two, "." or "::", I like "." better aesthetically. I do see some value in the clarity of using separate operators for scoping vs property evaluation, but "." just looks a little cleaner to me somehow.

The wrench in the works, though, is that we need a global scoping operator, and I don't think "." will work for that. The global scoping operator is needed for situations like this:

kitchen: Room;
+ table: Surface;
++ box: Container;

box: Container;

We have a box within the kitchen, and a box out on its own at the top of the location tree. If we refer to just 'box' anywhere within the kitchen object tree, the context rule will always give us kitchen.table.box. If for some reason we really want to refer to that top-level box instead within the kitchen context, we need a global scoping operator - we need a way to say "the outermost 'box'". In C++, we'd write ::box. I don't think there's an equivalent with Java packages - or, rather, I think the equivalent in Java is that you simply have to move that outermost 'box' into a namespace if you want to be able to refer to it from within another namespace. I'm pretty sure I want to have an explicit outer scoping operator, though, mostly because it makes things a little easier to isolate when writing extensions and libraries.
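The shadowing problem can be modeled in a few lines of Python. This is only an illustrative sketch: the OBJECTS table, the resolve function, and the tuple representation of paths are invented here, and the '::' prefix simply borrows the C++ spelling mentioned above as a stand-in for whatever global scoping syntax is eventually chosen.

```python
# Every object is identified by its full compile-time path (hypothetical model).
OBJECTS = [
    ("kitchen",),
    ("kitchen", "table"),
    ("kitchen", "table", "box"),
    ("box",),  # the top-level box
]


def resolve(name, context):
    """Resolve a bare name, or a '::name' global reference, written inside `context`."""
    if name.startswith("::"):
        # Global-scope form: only a top-level object can match.
        wanted = (name[2:],)
        if wanted in OBJECTS:
            return wanted
        raise LookupError(f"no top-level object '{name[2:]}'")
    # Context rule: search the innermost enclosing object first, then widen.
    for depth in range(len(context), -1, -1):
        prefix = context[:depth]
        matches = [
            p for p in OBJECTS
            if len(p) > depth and p[:depth] == prefix and p[-1] == name
        ]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            # Assumption: more than one candidate at the same level is an error.
            raise LookupError(f"'{name}' is ambiguous here")
    raise LookupError(f"undefined object '{name}'")


# Anywhere inside the kitchen tree, a bare 'box' finds the nested box...
print(resolve("box", ("kitchen",)))          # ('kitchen', 'table', 'box')
print(resolve("box", ("kitchen", "table")))  # ('kitchen', 'table', 'box')
# ...so reaching the top-level box from there needs the explicit global form.
print(resolve("::box", ("kitchen",)))        # ('box',)
```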

Here are the options I see:

1. ::box, kitchen.box - use :: as the global scope operator only, and use . as the scope path operator. The upside is that most paths will contain only .'s because most paths won't need to shoot out to global scope, and when they do, it'll mostly be single element paths like ::box. The downside is the inconsistent representation of what are essentially two facets of the same operator; multi-element global paths like ::kitchen.box make this especially apparent.

2. ::box, kitchen::box - use :: as the scoping operator in all cases. It's consistent, but I find paths like kitchen::box somewhat less aesthetically pleasing than kitchen.box.

3. kitchen.box - use . as the infix operator, and don't allow reusing an object name at global scope. There's no need for a global scope operator in this case because all ambiguous names are inside unique root objects.

The more I think about it, the more this strikes me as a reasonable and maybe even desirable restriction. Allowing reuse at the global level seems like a recipe for confusion when mixing code from multiple sources, like libraries and extensions.

4. outer.box, kitchen.box - use a keyword to represent the global scope, such as outer, global, unnamed, or anonymous. In principle I kind of like this approach, but in practice I haven't been able to come up with a keyword that I like for it.

Any thoughts are welcome.
  • Personally, I don't like the "." operator for this. I think it's important to be able to read your own code 4 months later without the need to get "the feel" for it all over again. Being able to know that "box" refers to an object rather than a property without first having to look at its definition is important and outweighs any cosmetic issues. This would help later maintenance of your games (be it for half a year's worth of user bug reports and feedback, or re-use of your old code in a new game.)

    I just hope though that my opinion here is not too biased, since I hate Java with a passion...
  • As for a keyword to represent the global namespace, "nil" looks like a natural choice? Though I prefer the C++ syntax which just uses nothing.
    • I think your first option is definitely the least desirable for the reason you give; it looks inconsistent.

      I believe C# uses . as the scoping operator as well as property evaluation operator, and I assume that doesn't cause too much of a headache for C# programmers, and I agree it looks more aesthetically pleasing.

      That said, I take realnc's point about the readability of code. If we use :: it'll be quite unambiguous what it means, and it also provides a straightforward consistent solution to the global scope problem. Thus, while aesthetically, I'd prefer your option 4 (or 3), pragmatically I'd incline towards your option 2. In other words I'd be happy enough either way, but suspect that :: might be the safer choice.

      Incidentally, when you come to document this feature it may be worth emphasizing that the namespace scoping is strictly compile time (assuming I correctly understand your intention). If the player takes the box from the parlor to the kitchen and the box that was originally in the kitchen to the parlor, they presumably retain their initial namespace references (that is, after the swap the box brought into the kitchen is still parlor::box or parlor.box and the box deposited in the parlor is still kitchen::box). I wonder if there's a potential source of subtle bugs here if an author writes code on the assumption that any object named box will do (e.g. the player needs to put a certain number of cans in a box on the kitchen scales) and just refers to it as 'box' (rather than, say, checking whether it is of class Box), when the player actually employs parlor::box for the purpose.


      • You're correct that the naming structure is frozen at compile-time and isn't affected by moving objects around after the game starts running. I expect that in practice it'll be most natural to use this kind of naming for objects that are fixed in place anyway; I think when you're designing an object to be portable you'll be less likely to think of its name in terms of its initial location.

        As for "box" becoming confused with something like a class name in the author's mind, that hadn't occurred to me, but I suppose it's possible. If it turns out to be confusing, it's certainly something we can emphasize in the documentation, although it strikes me as the kind of thing where we should wait to see if it actually confuses anyone, lest documenting it presumptively backfire by planting the confusion it was meant to clear up!

        Edited at 2012-07-27 10:04 pm (UTC)
      • C# can get away with using "." because it doesn't use the "+" containment model. Neither does Java. (Interestingly, C# also uses "::" and a "global" keyword to make matters more confusing.) If you see "foo.bar" in C#, you know where to look for "bar". In Tads, you don't. It could be a property/method in foo, or it could be a nested object and be something like "foo::snafu::baz::bar". Or maybe "foo::snafu::baz.bar". Or perhaps "foo::snafu.baz.bar"? Note how I used "::" here to resolve the ambiguity :-)

        In short, using "." should IMO always mean that the identifier to the right is a member of the object to the left. Using it for namespace scoping, especially recursive nested scoping, breaks that assumption.

        "foo::bar" is a bit clearer. You still have to go hunting to find "bar", but you will know which kind of hunting it will be, namely an object preceded by one or more "+" characters below the "foo" definition.

        On another matter, unrelated to the "." vs "::" issue, there's the issue of precedence. If you have:

        + box
        + table
        ++ box

        then which box should the compiler pick for "kitchen.box"? Should it emit an error, a warning and use the first box, or simply use the first box and stay silent? This can be a serious source of mistakes, since the author can use "kitchen.box", and then later delete the first box (or move it into the namespace of a different object):

        // + box   (deleted)

        + table
        ++ box

        But throughout his whole program, he has references to "kitchen.box" and those will now operate on a different box object. Right now, if you remove an object, you know that the compiler will provide extremely accurate assistance in finding all your references to it, since the code won't build and you'll get nice errors showing the file and line of each reference.

        An attempt to solve this problem might be to enforce the use of a full path when there's ambiguity. But it's a naive solution and doesn't work; even if you enforce full paths when there's ambiguity, you still end up with "kitchen.box" which *is* the full path of the first box object, so you didn't solve the problem (deleting the first box object now makes "kitchen.box" silently refer to the other box.)

        Short of always using full paths (no automatic lookups in nested namespaces), disallowing duplicate identifiers in the same top-level namespace might be the only solution.

        Edited at 2012-07-28 03:18 am (UTC)
        • Regarding the potential for confusion if "." is overloaded, I see your point and agree to some extent, but I'm not entirely convinced it would actually be that confusing in practice. If this were C++ maybe so, but given that the object sets we're talking about are mostly meant to model physical scenarios, I expect that in practice it'll usually be obvious whether something is an object or property name just from the name itself. And when it's not, it shouldn't be too hard to clear it up with a simple text search, assuming the name in question isn't something like 'a' or 'x'.

          On the second point, where you have kitchen.box and kitchen.table.box, the resolution scheme would be that an exact match takes precedence, so that's how you'd resolve that ambiguity. I can see some value in avoiding these situations by generalizing the restriction about top-level naming so that it's true at any level - i.e., a name can only be reused at a given level if it's qualified at that level.

          Your point about moving objects is another interesting one - definitely a potential source of confusion. As with the '.' overloading, I'm not sure how much actual confusion this would cause in practice; it's conceivable that it would be a big problem, or that it would just never come up. What I expect is that it won't be much of a practical problem, simply because most object references won't have explicit paths to start with, because they'll be local to the '+' tree where they're used. Outside-looking-in references should be the exception, as a natural result of the way these games tend to be designed. A big part of the point of this is to reduce verbosity by contextual scoping, so if it turns out that real code has lots of explicit paths, it kind of defeats the purpose, since you might as well just use the compound "parlorTableBox" type names you have to use now.

          I'm starting to think this feature might need some amount of empirical experimentation/testing before rolling it out, to see what the tradeoffs are. I'm still not liking :: aesthetically, for example, so from my perspective it would be useful to get some experiential data on whether overloading '.' is confusing or benign in a real project, and similarly to determine if object relocation creates horribly subtle bugs.
          • "In practice it'll usually be obvious whether something is an object or property name just from the name itself."

            kitchen: Room {
                box: Thing { }
            }
            + box: Thing { }

            I wouldn't want to be the one documenting what "kitchen.box" means only to have the user go "wut?" :-P
            • *That* kind of conflict won't be possible, actually - the locational naming doesn't change the rule that a symbol can be a property name or an object name, but not both. So you couldn't create a definition like that in a single game.

              Now, you could have that 'kitchen' definition in one game, and that 'box' definition in another game, so if your point isn't about conflicting usage in one game but just that the usage isn't *always* so obvious from the name itself, point taken. However, I only said it's *usually* clear, which I still think is true; and anyway I think this might be a misleading example, because in practice nested objects aren't really intended (or convenient) for creating containment hierarchies - nested objects are much more suitable for creating small *anonymous* objects that get plugged into inherited property roles, so you're much more likely to see something like 'east: RoomConnector { }'. Nested objects don't usually look like they confer object names, they usually look like unnamed objects being plugged into property names.

              I think there's also an unrelated point to be taken from your example, forgetting what I just said about it probably being atypical to start with. If your point is that this is a pretty subtle difference for a naive user to grasp, then it could, maybe perversely, be taken as a point in favor of overloading the syntax. Because with overloaded syntax, the naive user can just write kitchen.box in either case and it'll work, even if they haven't internalized the different information models in the two cases. With separate syntax the user will have to internalize the model to know which case is . and which is ::. Just to be clear, though, I don't think it's good programming language design to intentionally fuzzy up the syntax to let users off the hook for understanding the information model - I'm trying to justify overloading '.' on the basis that the difference is clear enough that users don't require separate operators for clarity any more than the compiler does.

              Edited at 2012-07-28 08:04 pm (UTC)
              • Nested objects are members of the object they're defined in. "+" objects are not members. Overloading "." breaks the intuitive scoping rules we currently have, leaking the member namespace upwards. The member namespace should really be private to the object and should not conflict with identifiers outside of it. (As a side effect, it does break backwards compatibility; the above code compiles just fine right now.) Using a new operator, like "::", allows it to have a whole new meaning, including the "it's not exact" behavior, where "foo::bar" and "foo::baz::bar" can refer to the same thing but "foo.bar" and "foo.baz.bar" are different things. If "." is overloaded, then:
                kitchen: Room {
                    stateHandler: Handler {
                        burnedDown = nil;
                    }
                }

                one would expect to write "kitchen.burnedDown" and have it work. But it won't work, since we have two different rules for the same operator. As for whether it's common for users to write code like that, hey, you yourself told me recently something about "trying to outsmart the user and decide we know better than they do what they want" :-)

                Another thing is that a new user, just learning Tads, will never expect that "." can be used for that. Even people coming from Java and C# will not expect it, since "." is actually a member access operator. It just happens that all members, without exception, belong to the namespace of the class. The right-hand operand is always a member of a class. You can't access non-members with ".". It's not intuitive to overload "." for this.

                I suppose the core reason I really dislike the idea is that introducing limitations and special cases to a language because of a subjective perception about the aesthetic on-screen appearance of an ASCII character, does not look like good design to me. One of my professors once told me that whenever you come up with a feature for a language, the first thing you should do is take a look at the problems it can create rather than the ones it can solve. The intentions of overloading "." are well meant indeed. But it just looks like abusive overloading to me (I've certainly seen much of that with C++ operator overloading; don't get me started on that one.) Don't forget that in Perdition's Flames you wrote: "Hell is paved with good intentions." :-)

                Also, "::" is more future proof, in case you decide to introduce full namespace support with its own set of rules.

                Oh, just to make it clear: I don't feel *that* strongly about the issue of overloading ".", even if the discussion might look very involved. Personally I would have no trouble understanding the quirks of overloading it. It's just that Tads already has a "difficult to fully comprehend" tag attached to it, and I just find it preferable to keep that to a minimum.

                Edited at 2012-07-28 09:30 pm (UTC)
                • Re backward compatibility: I'm not sure I see what you're getting at there; what gets broken? The earlier code example of

                  kitchen: Room
                  box: Thing
                  + box: Thing;

                  definitely won't compile with existing versions - you'll get "symbol 'box' is already defined - can't redefine as object" at the "+ box" definition, since "box" is already a property.
                  • You're right. Now I'm confused myself. Tads is already leaking member identifiers into the global scope.

                    I suppose that means overloading "." is confusingly consistent at some level :-P
                    • "I suppose that means overloading '.' is confusingly consistent at some level" - actually, I kind of see it the other way around. It's not that member identifiers leak into global scope; it's that properties aren't namespace members. Classes aren't namespaces, and "." for property evaluation isn't a scoping operator. A better analog in C++ would be an indexing operator.

                      This might be part of why I haven't been seeing the overloading as being as potentially confusing as you have - property evaluation is a dynamic run-time operator, and locational naming is just a way of writing a compound identifier that's resolved at compile time. The two uses occupy such widely separated conceptual spots that I haven't been worried about overloading the symbol, much as I assume you don't have similar objections to '.' also being overloaded as the decimal point symbol in floating-point constants. The decimal point overload is easily distinguishable at a glance on the basis of lexical structure, which isn't true of the namespace overload, so I think it comes back to the question of whether it would be distinguishable at a glance on the basis of the English semantics of the object names involved. My intuition is: most of the time yes, some of the time no; but maybe that's not a satisfactory average case.

                      Edited at 2012-07-30 11:05 pm (UTC)
    • I think on this part I'm getting pretty comfortable with the restriction that all top-level names are unique, which eliminates the need for a global scoping operator or keyword.
  • this may be only tangential to the main question, but since T3 doesn't use namespacing in the general sense (at least, I don't think it does...) is there any accommodation in the syntax here for a future where T3 does implement some kind of namespacing (particularly for the benefit of extension writers)?
    • I haven't really been planning to add a more formal package or namespace mechanism. If that were added at some point, it could probably just use the same syntax that the locational naming does, although doing that would have the same potential for confusion that overloading "." for locational naming has.

      I've been thinking, though, that locational naming might be a good enough scoping tool for extensions that it would make a pure namespace feature redundant. An extension could effectively group its objects into a namespace by putting them inside a container object, which wouldn't have to do anything other than serve as the namespace container.
      • This wouldn't work for functions though. Or, if Tads decides to go that route, macros (not sure if it's a good idea to be able to scope macros. But, why not?)
        • For functions, you can always instead make the code a method of the namespace object.

          I don't even want to think about macro namespaces. :)

          Edited at 2012-07-28 06:14 am (UTC)