Reposted from: http://dmitrysoshnikov.com/ecmascript/chapter-5-functions/
Introduction
In this article we will talk about one of the general ECMAScript objects: functions. In particular, we will go through the various types of functions, define how each type influences the variable object of a context, and see what is contained in the scope chain of each function. We will answer frequently asked questions such as: “is there any difference (and if there is, what is it?) between functions created as follows:
var foo = function () { ... };
and functions defined in the “habitual” way?”:
function foo() { ... }
Or: “why, in the following call, does the function have to be surrounded with parentheses?”:
(function () { ... })();
Since these articles rely on earlier chapters, for a full understanding of this part it is desirable to read Chapter 2. Variable object and Chapter 4. Scope chain, since we will actively use terminology from these chapters.
But let us take things one at a time. We begin with a look at the types of functions.
Types of functions
In ECMAScript there are three function types and each of them has its own features.
Function Declaration
- has an obligatory name;
- in the source code it appears either at the Program level or directly in the body of another function (FunctionBody);
- is created on entering the context stage;
- influences the variable object;
- and is declared in the following way:
function exampleFunc() { ... }
The main feature of this type of function is that only such functions influence the variable object (they are stored in the VO of the context). This feature determines the second important point (a consequence of the nature of the variable object): at the code execution stage they are already available, since FDs are stored in the VO on entering the context stage, before execution begins.
Example (the function is called before its declaration in the source code):
foo();

function foo() {
  alert('foo');
}
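The same effect can be checked in any ES5+ engine, Node.js included (an assumption; this sketch is not part of the original article). The hypothetical helper hoistingDemo compares typeof of a declaration and of a var-assigned function expression before and after their positions in the source. The var name is also known on entering the context, but only with the value undefined; the FE itself does not exist yet:

```javascript
// hoistingDemo is a hypothetical helper used only for illustration.
function hoistingDemo() {
  // On entering the context "declared" already holds a function,
  // while "expressed" is a variable whose value is still undefined.
  var before = [typeof declared, typeof expressed];

  function declared() {}          // FD: created on entering the context
  var expressed = function () {}; // FE: created when this line executes

  var after = [typeof declared, typeof expressed];
  return { before: before, after: after };
}

var result = hoistingDemo();
console.log(result.before); // [ 'function', 'undefined' ]
console.log(result.after);  // [ 'function', 'function' ]
```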
What is also important is the position at which the function is defined in the source code (see the second bullet in the Function Declaration definition above):
// function can be declared:
// 1) directly in the global context
function globalFD() {
  // 2) or inside the body
  // of another function
  function innerFD() {}
}
These are the only two positions in code where a function may be declared (i.e. it is impossible to declare it in an expression position or inside a code block).
The alternative to function declarations is the function expression, which we are about to cover.
Function Expression
- in the source code can only be defined at the expression position;
- can have an optional name;
- its definition has no effect on the variable object;
- and is created at the code execution stage.
The main feature of this type of function is that in the source code it is always in expression position. Here is a simple example, an assignment expression:
var foo = function () { ... };
This example shows how an anonymous FE is assigned to the foo variable. After that the function is available via the name foo: foo().
The definition states that this type of functions can have an optional name:
var foo = function _foo() { ... };
What is important to note here is that from the outside the FE is accessible via the variable foo (foo()), while from inside the function (for example, in a recursive call) it is also possible to use the name _foo.
When a FE is assigned a name it can be difficult to distinguish it from a FD. However, if you know the definition, it is easy to tell them apart: FE is always in the expression position. In the following example we can see various ECMAScript expressions in which all the functions are FE:
// in parentheses (grouping operator) can be only an expression
(function foo() {});

// in the array initialiser - also only expressions
[function bar() {}];

// comma also operates with expressions
1, function baz() {};
The definition also states that FE is created at the code execution stage and is not stored in the variable object. Let’s see an example of this behavior:
// FE is available neither before the definition
// (because it is created at code execution phase),

alert(foo); // "foo" is not defined

(function foo() {});

// nor after, because it is not in the VO

alert(foo); // "foo" is not defined
The logical question now is why we need this type of function at all. The answer is obvious: to use them in expressions without “polluting” the variable object. This can be demonstrated by passing a function as an argument to another function:
function foo(callback) {
  callback();
}

foo(function bar() {
  alert('foo.bar');
});

foo(function baz() {
  alert('foo.baz');
});
If an FE is assigned to a variable, the function remains stored in memory and can later be accessed via this variable name (because variables, as we know, influence the VO):
var foo = function () {
  alert('foo');
};

foo();
Another example is creating an encapsulated scope to hide auxiliary helper data from the external context (in the following example we use an FE which is called right after its creation):
var foo = {};

(function initialize() {

  var x = 10;

  foo.bar = function () {
    alert(x);
  };

})();

foo.bar(); // 10;

alert(x); // "x" is not defined
We see that the function foo.bar (via its [[Scope]] property) has access to the internal variable x of the function initialize. At the same time, x is not accessible directly from the outside. This strategy is used in many libraries to create “private” data and hide auxiliary entities. Often in this pattern the name of the initializing FE is omitted:
(function () {

  // initializing scope

})();
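A minimal sketch of the same pattern, assuming a hypothetical counter object whose state should stay private: the initializing FE runs once, and only the functions it returns keep access to count through their [[Scope]]:

```javascript
// "counter" and its methods are illustrative names, not a library API.
var counter = (function () {
  var count = 0; // private: not reachable from the outside

  return {
    increment: function () {
      count += 1;
      return count;
    },
    current: function () {
      return count;
    }
  };
})();

counter.increment();
counter.increment();
console.log(counter.current()); // 2
console.log(typeof count);      // undefined: "count" is not in the outer VO
```

The object literal is the only thing that escapes the FE, so the outer context gains one name (counter) no matter how many helpers live inside.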
Here is another example of FEs which are created conditionally at runtime and do not pollute the VO:
var foo = 10;

var bar = (foo % 2 == 0
  ? function () { alert(0); }
  : function () { alert(1); }
);

bar(); // 0
Question “about surrounding parentheses”
Let us go back and answer the question from the beginning of the article: “why is it necessary to surround a function with parentheses if we want to call it right after its definition?”. The answer lies in the restrictions on the expression statement.
According to the standard, an expression statement (ExpressionStatement) cannot begin with an opening curly brace, {, since it would be indistinguishable from a block, and it also cannot begin with the function keyword, since then it would be indistinguishable from a function declaration. I.e., if we try to define an immediately invoked function the following way (starting with the function keyword):
function () { ... }();

// or even with a name

function foo() { ... }();
we are dealing with function declarations, and in both cases the parser will produce a parse error. However, the reasons for these parse errors differ.
If we put such a definition in global code (i.e. at the Program level), the parser must treat the function as a declaration, since it starts with the function keyword. In the first case we get a SyntaxError because the function has no name (a function declaration, as we said, must always have a name).
In the second case we do have a name (foo) and the function declaration should be created normally. But it is not, because there is another syntax error: a grouping operator with no expression inside it. Notice that in this case it really is a grouping operator which follows the function declaration, not the parentheses of a function call! So if we had the following source:
// "foo" is a function declaration
// and is created on entering the context

alert(foo); // function

function foo(x) {
  alert(x);
}(1); // and this is just a grouping operator, not a call!

foo(10); // and this is already a call, 10
everything is fine, since here we have two syntactic productions: a function declaration and a grouping operator with an expression (1) inside it. The example above is the same as:
// function declaration
function foo(x) {
  alert(x);
}

// a grouping operator
// with the expression
(1);

// another grouping operator with
// another (function) expression
(function () {});

// also - the expression inside
("foo");

// etc
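Since a parse error stops a whole script, one way to try these snippets side by side (an approach chosen for this sketch, not the article's) is to hand them to the Function constructor, which parses a string as a FunctionBody and throws a SyntaxError if it cannot. Inside a FunctionBody the same rule applies as at the Program level: a statement starting with function is a declaration. parses is a hypothetical helper:

```javascript
// parses is a hypothetical helper: it reports whether the string
// is accepted as a FunctionBody. The body is never executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Starts with "function": a declaration, and declarations need a name.
console.log(parses("function () {}();"));                   // false

// Named declaration, then "()": a grouping operator with no expression.
console.log(parses("function foo() {}();"));                // false

// Named declaration, then the grouping operator "(1)".
console.log(parses("function foo(x) { return x; }(1);"));   // true

// Grouping operator first: a function expression, then a real call.
console.log(parses("(function foo(x) { return x; })(1);")); // true

// In the third form "(1)" is not a call: foo never runs.
var notCalled = new Function(
  "var r = 'not called'; function foo() { r = 'called'; }(1); return r;"
)();
console.log(notCalled); // not called
```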
If we had such a definition inside a statement, then, as we said, the ambiguity would cause a syntax error:
if (true) function foo() {alert(1)}
By the specification, the construction above is syntactically incorrect (an expression statement cannot begin with the function keyword), but, as we will see below, none of the implementations report a syntax error; each handles this case in its own manner.
Given all this, how do we tell the parser that we really want to call a function immediately after its creation? The answer is obvious: it should be a function expression, not a function declaration. The simplest way to create an expression is the grouping operator mentioned above, since what it contains is always an expression. Thus, the parser treats the code as a function expression (FE) and there is no ambiguity. Such a function is created during the execution stage, then executed, and then removed (if there are no references to it).
(function foo(x) {
  alert(x);
})(1); // OK, it's a call, not a grouping operator, 1
In the example above the parentheses at the end (the Arguments production) are already a call of the function, not a grouping operator as in the case of an FD.
Notice that in the following example of immediate invocation the surrounding parentheses are not required, since the function is already in expression position and the parser knows that it deals with an FE, which should be created at the code execution stage:
var foo = {

  bar: function (x) {
    return x % 2 != 0 ? 'yes' : 'no';
  }(1)

};

alert(foo.bar); // 'yes'
As we see, foo.bar is a string, not a function as it may seem at a first inattentive glance. The function here is used only to initialize the property depending on the conditional parameter: it is created and called right away.
Therefore, the complete answer to the question “about parentheses” is the following:
Grouping parentheses are needed when a function is not in expression position and we want to call it immediately after its creation; in this case we simply transform the function into an FE manually.
When the parser already knows that it deals with an FE, i.e. the function is already in expression position, the parentheses are not required.
Apart from surrounding parentheses, any other way of transforming a function into an FE works too. For example:
1, function () {
  alert('anonymous function is called');
}();

// or this one
!function () {
  alert('ECMAScript');
}();

// and any other manual
// transformation
...
However, grouping parentheses are simply the most widespread and elegant way to do it.
By the way, the grouping operator can surround either the function alone, without the call parentheses, or the function together with the call parentheses. I.e. both expressions below are correct FEs:
(function () {})();
(function () {}());
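To see that each of these forms really is a call, here is a small sketch (the variable names are illustrative) that records what each whole expression evaluates to. With ! and void the outer operator applies to the result of the call, so the whole expression does not yield the function's return value:

```javascript
var a = (function () { return "a"; })();    // grouping around the function: "a"
var b = (function () { return "b"; }());    // grouping around the call: "b"
var c = !function () { return "c"; }();     // !"c", i.e. false
var d = (1, function () { return "d"; }()); // comma operator: "d"
var e = void function () { return "e"; }(); // void discards the result: undefined

console.log([a, b, c, d, e]); // [ 'a', 'b', false, 'd', undefined ]
```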
Implementations extension: Function Statement
The following example shows code that no implementation processes according to the specification:
if (true) {

  function foo() {
    alert(0);
  }

} else {

  function foo() {
    alert(1);
  }

}

foo(); // 1 or 0 ? test in different implementations
It should be said here that, according to the standard, this syntactic construction is incorrect in general because, as we remember, a function declaration (FD) cannot appear inside a code block (here if and else contain code blocks). As has been said, an FD can appear in only two places: at the Program level or directly inside the body of another function.
The above example is incorrect because a code block can contain only statements, and the only statement through which a function could appear within a block is the expression statement. But by definition it cannot begin with an opening curly brace (since it would be indistinguishable from a code block) or the function keyword (since it would be indistinguishable from an FD).
However, in the section on error handling the standard allows implementations to extend the program syntax. One such extension can be seen in the case of functions that appear in blocks. None of the implementations existing today throw an exception in this case; they all process it, but each in its own way.
The presence of if-else branches implies a choice of which of the two functions will be defined. Since this decision must be made at runtime, that implies a function expression (FE) should be used. However, the majority of implementations simply create both function declarations (FD) on entering the context stage, and since both functions use the same name, only the last declared function will be called. In this example the function foo shows 1, although the else branch never executes.
The SpiderMonkey implementation, however, treats this case in a twofold way: on the one hand, it does not consider such functions to be declarations (i.e. the function is created according to the condition, at the code execution stage); on the other hand, they are not real function expressions, since they cannot be called without surrounding parentheses (again the parse error, “indistinguishable from an FD”), and they are stored in the variable object.
In my opinion, SpiderMonkey handles this case correctly, singling out its own intermediate type of function (FE + FD). Such functions are created at the right time and according to the conditions, but, unlike FEs and more like FDs, they are available to be called from the outside. SpiderMonkey calls this syntactic extension a Function Statement (FS for short); this terminology is mentioned in MDC. The inventor of JavaScript, Brendan Eich, also noted this type of function provided by the SpiderMonkey implementation.
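Later editions of the standard settled this case (a note about ES2015+, not about the ES3 text discussed here). In strict-mode code, function declarations in blocks are block-scoped; in sloppy-mode code, Annex B web-compatibility semantics additionally assign the function to a function-level variable when the declaration is evaluated, which is close to SpiderMonkey's function statements. A sketch assuming an ES2015+ engine such as Node.js; each body is compiled by the Function constructor, so its strictness depends only on its own "use strict" directive:

```javascript
// Sloppy mode: "foo" exists from the start (as undefined) and receives
// the function from the branch that actually runs.
var sloppy = new Function(`
  var before = typeof foo;
  if (true) {
    function foo() { return 0; }
  } else {
    function foo() { return 1; }
  }
  return [before, foo()];
`);
console.log(sloppy()); // [ 'undefined', 0 ]

// Strict mode: the declaration is visible only inside its block.
var strict = new Function(`
  "use strict";
  if (true) {
    function foo() { return 0; }
  }
  return typeof foo;
`);
console.log(strict()); // undefined
```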
Feature of Named Function Expression (NFE)
When an FE has a name (a named function expression, NFE for short), one important feature arises. As we know from the definition (and as we saw in the examples above), function expressions do not influence the variable object of a context (which means it is impossible to call them by name before or after their definition). However, an FE can call itself by name in a recursive call:
(function foo(bar) {

  if (bar) {
    return;
  }

  foo(true); // "foo" name is available

})();

// but from the outside, correctly, is not

foo(); // "foo" is not defined
Where is the name “foo” stored? In the activation object of foo? No, since nobody has defined any “foo” name inside the foo function. In the parent variable object of the context which creates foo? Also no; remember the definition: an FE does not influence the VO, which is exactly what we see when calling foo from the outside. Where then?
Here is how it works: when the interpreter meets a named FE at the code execution stage, before creating the FE it creates an auxiliary special object and adds it to the front of the current scope chain. Then it creates the FE itself, at which stage the function gets the [[Scope]] property (as we know from Chapter 4. Scope chain): the scope chain of the context which created the function (i.e. [[Scope]] contains that special object). After that, the name of the FE is added to the special object as its only property, whose value is a reference to the FE. The last action is removing that special object from the parent scope chain. Let us see this algorithm in pseudo-code:
specialObject = {};

Scope = specialObject + Scope;

foo = new FunctionExpression;
foo.[[Scope]] = Scope;
specialObject.foo = foo; // {DontDelete}, {ReadOnly}

delete Scope[0]; // remove specialObject from the front of scope chain
Thus, the function name is not available from the outside (since it is not present in the parent scope), but it is available inside the function through the special object saved in the function's [[Scope]].
It should be noted, however, that some implementations, for example Rhino, save this optional name not in the special object but in the activation object of the FE. Microsoft's implementation, JScript, completely breaks the FE rules: it keeps this name in the parent variable object, and the function becomes available outside.
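The properties of the special object can be observed from ordinary code in a conforming engine (a sketch assuming any ES5+ engine; the variable names are illustrative). The name works for recursion but does not leak outward. Because the special object sits in front of the scope chain outside the function's own activation object, a local var with the same name shadows it. The binding is also read-only ({ReadOnly} in ES3, an immutable binding since ES5), which strict-mode code reports as a TypeError:

```javascript
var outer = function foo(n) {
  // "foo" is resolved through the special object saved in [[Scope]]
  return n > 0 ? foo(n - 1) + 1 : 0;
};
console.log(outer(3));   // 3
console.log(typeof foo); // undefined: the name is not in the enclosing VO

// A local variable in the AO is found before the special object.
var shadowed = (function foo() {
  var foo = "local";
  return foo;
})();
console.log(shadowed); // local

// Assigning to the NFE name fails; strict mode makes it a TypeError.
var readOnly = (function foo() {
  "use strict";
  try {
    foo = 42;
    return "assigned";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
})();
console.log(readOnly); // TypeError
```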
NFE and SpiderMonkey
Let us have a look at how different implementations handle this problem. Some versions of SpiderMonkey have a feature related to the special object which can be treated as a bug (although everything was implemented according to the standard, so it is more of an editorial defect of the specification). It is related to the mechanism of identifier resolution: the scope chain analysis is two-dimensional, and when resolving an identifier it considers the prototype chain of every object in the scope chain as well.
We can see this mechanism in action if we define a property in Object.prototype and use a “nonexistent” variable in the code. In the following example, when resolving the name x, the global object is reached without finding x. However, since in SpiderMonkey the global object inherits from Object.prototype, the name x is resolved there:
Object.prototype.x = 10;

(function () {
  alert(x); // 10
})();
Activation objects do not have prototypes. With the same starting conditions, we can see the same behavior in an example with an inner function. If we define a local variable x, declare an inner function (FD or anonymous FE) and then reference x from the inner function, this variable is resolved normally in the parent function context (i.e. where it should be, and is), not in Object.prototype:
Object.prototype.x = 10;

function foo() {

  var x = 20;

  // function declaration

  function bar() {
    alert(x);
  }

  bar(); // 20, from AO(foo)

  // the same with anonymous FE

  (function () {
    alert(x); // 20, also from AO(foo)
  })();

}

foo();
Some implementations, however, do set a prototype for activation objects, which is an exception compared to most other implementations. So, in the Blackberry implementation the value x from the above example is resolved to 10, i.e. the activation object of foo is never reached, since the value is found in Object.prototype:
AO(bar FD or anonymous FE) -> no ->
AO(bar FD or anonymous FE).[[Prototype]] -> yes - 10
And we can see exactly the same situation in SpiderMonkey in the case of the special object of a named FE. By the standard, this special object is a normal object, “as if by the expression new Object()”, and accordingly it should inherit from Object.prototype, which is exactly what can be seen in the SpiderMonkey implementation (but only up to version 1.7). Other implementations (including newer versions of SpiderMonkey) do not set a prototype for that special object:
function foo() {

  var x = 10;

  (function bar() {

    alert(x); // 20, but not 10, as don't reach AO(foo)

    // "x" is resolved by the chain:
    // AO(bar) - no -> __specialObject(bar) -> no
    // __specialObject(bar).[[Prototype]] - yes: 20

  })();
}

Object.prototype.x = 20;

foo();
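In ES5 and later the specification itself changed here: the scope holding an NFE's name is a declarative environment record rather than an object created "as if by new Object()", so there is no prototype to consult. A sketch of the example above for a modern engine (it temporarily adds Object.prototype.x and removes it again, which you would not do in real code):

```javascript
function foo() {
  var x = 10;
  return (function bar() {
    // AO(bar): no x; NFE scope: only "bar", no prototype; AO(foo): x = 10
    return x;
  })();
}

var resolved;
Object.prototype.x = 20; // temporary pollution, for this demo only
try {
  resolved = foo();
} finally {
  delete Object.prototype.x; // restore Object.prototype
}
console.log(resolved); // 10
```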
NFE and JScript
Microsoft's ECMAScript implementation, JScript, which is currently built into Internet Explorer (up to JScript 5.8 in IE8), has a number of bugs related to named function expressions (NFE). Each of these bugs completely contradicts the ECMA-262-3 standard; some of them may cause serious errors.
First, JScript breaks the main rule of FEs: that they must not be stored in the variable object under their names. The optional FE name, which should be stored in the special object and be accessible only inside the function itself (and nowhere else), is stored here directly in the parent variable object. Moreover, JScript treats a named FE as a function declaration (FD), i.e. it is created on entering the context stage and is available before its definition in the source code:
// FE is available in the variable object
// via optional name before the
// definition like a FD
testNFE();

(function testNFE() {
  alert('testNFE');
});

// and also after the definition
// like FD; optional name is
// in the variable object
testNFE();
As we see, this is a complete violation of the rules.
Second, when a named FE is assigned to a variable in its declaration, JScript creates two different function objects. It is difficult to call such behavior logical (especially considering that outside of the NFE its name should not be accessible at all):
var foo = function bar() {
  alert('foo');
};

alert(typeof bar); // "function", NFE again in the VO: already a mistake

// but, further is more interesting
alert(foo === bar); // false!

foo.x = 10;
alert(bar.x); // undefined

// but both functions make
// the same action

foo(); // "foo"
bar(); // "foo"
Again we see complete disorder.
However, note that if we define the NFE separately from the assignment (for example, via the grouping operator) and only after that assign it to a variable, the equality check returns true, just as if it were one object:
(function bar() {});

var foo = bar;

alert(foo === bar); // true

foo.x = 10;
alert(bar.x); // 10
This can be explained. Actually, two objects are again created, but afterwards only one remains. If we again consider that the NFE here is treated as a function declaration (FD), then on entering the context stage the FD bar is created. After that, at the code execution stage, the second object, the function expression (FE) bar, is created and is not saved anywhere. Accordingly, since there are no references to the FE bar, it is removed. Thus there is only one object, the FD bar, and a reference to it is assigned to the foo variable.
Third, regarding the indirect reference to a function via arguments.callee: it references the object under whose name the function was activated (to be exact, functions, since there are two objects):
var foo = function bar() {

  alert([
    arguments.callee === foo,
    arguments.callee === bar
  ]);

};

foo(); // [true, false]
bar(); // [false, true]
Fourth, since JScript treats an NFE as a usual FD, it does not follow the rules of conditional statements, i.e. just like an FD, the NFE is created on entering the context and the last definition in the code is used:
var foo = function bar() {
  alert(1);
};

if (false) {

  foo = function bar() {
    alert(2);
  };

}
bar(); // 2
foo(); // 1
This behavior can also be “logically” explained. On entering the context stage, the last encountered FD named bar is created, i.e. the function with alert(2). After that, at the code execution stage, a new function, the FE bar, is created, and a reference to it is assigned to the foo variable. Thus (since further in the code the if-block with the condition false is unreachable), activating foo produces alert(1). The logic is clear, but I have put the word “logically” in quotes because such an implementation is obviously broken and depends on JScript bugs.
The fifth NFE bug in JScript is related to creating properties of the global object by assigning a value to an unqualified identifier (i.e. without the var keyword). Since the NFE is treated here as an FD and, accordingly, stored in the variable object, when the function name is the same as the unqualified identifier, the assignment (which should create not a variable but an ordinary property of the global object) does not produce a global property.
(function () {

  // without var not a variable in the local
  // context, but a property of global object

  foo = function foo() {};

})();

// however from the outside of
// anonymous function, name foo
// is not available

alert(typeof foo); // undefined
Again, the “logic” is clear: the function declaration foo gets into the activation object of the anonymous function's local context on entering the context stage. By the code execution stage the name foo already exists in the AO, i.e. it is treated as local. Accordingly, the assignment simply updates the existing AO property foo instead of creating a new property of the global object, as it should according to ECMA-262-3.
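For contrast, here is a sketch of how a conforming engine treats the second and fourth cases (assuming a current ES5+ engine such as Node.js). The name stays out of the enclosing scope, there is only one function object, and an FE inside an unreached branch is simply never created. arguments.callee is left out because it throws in strict-mode code:

```javascript
var foo = function bar() {
  return bar; // inside, "bar" refers to this very function
};

console.log(typeof bar);    // undefined: the name did not leak
console.log(foo() === foo); // true: one object, reachable under two names

var baz = function qux() { return 1; };
if (false) {
  baz = function qux() { return 2; };
}
console.log(baz());         // 1: the FE in the unreached branch was never created
console.log(typeof qux);    // undefined
```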
Functions created via Function constructor
This type of function object is discussed separately from FDs and FEs since it also has its own features. The main feature is that the [[Scope]] property of such functions contains only the global object:
var x = 10;

function foo() {

  var x = 20;
  var y = 30;

  var bar = new Function('alert(x); alert(y);');

  bar(); // 10, "y" is not defined

}

foo();
We see that the [[Scope]] of the bar function does not contain the AO of the foo context: the variable “y” is not accessible and the variable “x” is taken from the global context. By the way, note that the Function constructor can be used both with the new keyword and without it; the two variants are equivalent.
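A runnable variant of the example above (a sketch; it assumes no global variable named localOnly exists) that returns results instead of calling alert, and also calls Function without new:

```javascript
function makeReaders() {
  var localOnly = 30;

  // FE: its [[Scope]] includes AO(makeReaders)
  var viaExpression = function () {
    return typeof localOnly;
  };

  // Function constructor: [[Scope]] holds only the global object
  var viaConstructor = new Function("return typeof localOnly;");
  var withoutNew = Function("return typeof localOnly;"); // same as with new

  return [viaExpression(), viaConstructor(), withoutNew()];
}

console.log(makeReaders()); // [ 'number', 'undefined', 'undefined' ]
```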
The other feature of such functions is related to Equated Grammar Productions and Joined Objects. The specification offers this mechanism as a suggestion for optimization (implementations have the right not to use it). For example, if we have an array of 100 elements which is filled with functions in a loop, an implementation can use this joined-objects mechanism. As a result, only one function object can be used for all elements of the array:
var a = [];

for (var k = 0; k < 100; k++) {
  a[k] = function () {}; // possibly, joined objects are used
}
But functions created via the Function constructor are never joined:
var a = [];

for (var k = 0; k < 100; k++) {
  a[k] = Function(''); // always 100 different functions
}
Another example related to joined objects:
function foo() {

  function bar(z) {
    return z * z;
  }

  return bar;
}

var x = foo();
var y = foo();
Here, too, an implementation has the right to join the objects x and y (and use one object), because the functions are physically indistinguishable (including their internal [[Scope]] property). Therefore, functions created via the Function constructor always require more memory.
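Note that the joined-objects provisions belong to ECMA-262-3; ES5 no longer contains them, so in current engines every evaluation of a function declaration or expression yields a new object (a statement about later editions, not about the ES3 text discussed here). A sketch that checks this in a modern engine:

```javascript
function foo() {
  function bar(z) {
    return z * z;
  }
  return bar;
}

var x = foo();
var y = foo();
console.log(x === y); // false: two distinct function objects
x.tag = "x only";
console.log(y.tag);   // undefined: properties are not shared

var a = [];
for (var k = 0; k < 3; k++) {
  a[k] = function () {};
}
console.log(a[0] === a[1]); // false
```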
Algorithm of function creation
The pseudo-code of the function creation algorithm (except the steps with joined objects) is given below. This description helps to understand in more detail what function objects in ECMAScript consist of. The algorithm is identical for all function types.
F = new NativeObject();

// property [[Class]] is "Function"
F.[[Class]] = "Function"

// a prototype of a function object
F.[[Prototype]] = Function.prototype

// reference to function itself
// [[Call]] is activated by call expression F()
// and creates a new execution context
F.[[Call]] = <reference to function>

// built in general constructor of objects
// [[Construct]] is activated via "new" keyword
// and it is the one who allocates memory for new
// objects; then it calls F.[[Call]]
// to initialize created objects passing as
// "this" value newly created object
F.[[Construct]] = internalConstructor

// scope chain of the current context
// i.e. context which creates function F
F.[[Scope]] = activeContext.Scope
// if this function is created
// via new Function(...), then
F.[[Scope]] = globalContext.Scope

// number of formal parameters
F.length = countParameters

// a prototype of created by F objects
__objectPrototype = new Object();
__objectPrototype.constructor = F // {DontEnum}, is not enumerable in loops
F.prototype = __objectPrototype

return F
Note that F.[[Prototype]] is the prototype of the function (constructor), while F.prototype is the prototype of objects created by this function (the terminology is often muddled, and some articles incorrectly call F.prototype the “prototype of the constructor”).
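Several steps of this algorithm leave traces that can be checked from ordinary code in any ES5+ engine (a sketch; F is just a sample constructor). Object.getPrototypeOf exposes F.[[Prototype]], which is a different object from F.prototype:

```javascript
function F(a, b) {}

console.log(Object.getPrototypeOf(F) === Function.prototype);        // true: F.[[Prototype]]
console.log(F.length);                                               // 2: formal parameters
console.log(F.prototype.constructor === F);                          // true
console.log(F.prototype.propertyIsEnumerable("constructor"));        // false: {DontEnum}

var obj = new F();
console.log(Object.getPrototypeOf(obj) === F.prototype);             // true
console.log(Object.getPrototypeOf(F) === F.prototype);               // false: two different things
```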
Conclusion
This article turned out rather long; however, we will come back to functions when we discuss their work as constructors in one of the following chapters about objects and prototypes. As always, I am glad to answer your questions in the comments.
Additional literature
- ECMA-262-3, section 13: Function Definition;
- ECMA-262-3, section 15.3: Function Objects.