ECMA-262 Core: JavaScript Syntax











Introduction And Review

The source text of an ECMAScript program is first converted into a sequence of input elements, which are either tokens, line terminators, comments or white space.

ECMAScript syntax is much like that of other programming languages. Some areas follow popular languages such as Perl, C and Java, while others are completely different.





Unicode Character Set

Unicode is the character set of the Web. Its 16-bit encoding form (UTF-16), which ECMAScript uses for strings, can represent every commonly used written language on planet Earth; characters outside the Basic Multilingual Plane take two 16-bit code units (a surrogate pair). English-speaking programmers use only a tiny portion of the Unicode character set, typically the Latin-1 (ISO-8859-1) subset of Unicode. Most of us can get by with just the ASCII characters.

The current ECMAScript standard allows Unicode characters throughout a JavaScript program: in comments, in string and regular expression literals, and in identifiers. Earlier versions of JavaScript allowed only ASCII characters.
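To see the 16-bit encoding at work, the sketch below inspects two strings. It assumes an ECMAScript 2015 or later engine (for the \u{...} escape, codePointAt and Array.from); the sample strings are arbitrary.

```javascript
// JavaScript strings are sequences of 16-bit UTF-16 code units.
const latin = "caf\u00e9";   // "café": every character is in the Basic Multilingual Plane
const emoji = "\u{1F600}";   // U+1F600 (grinning face) lies outside the BMP

console.log(latin.length);                      // prints: 4 (one code unit per character)
console.log(emoji.length);                      // prints: 2 (stored as a surrogate pair)
console.log(emoji.charCodeAt(0).toString(16));  // prints: d83d (the high surrogate)
console.log(emoji.codePointAt(0).toString(16)); // prints: 1f600 (the full code point)
console.log(Array.from(emoji).length);          // prints: 1 (iteration goes by code point)
```

Note that length counts code units, not characters, which is why the emoji reports 2.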





Case Sensitivity

JavaScript is case-sensitive. All keywords are lower-case, so an upper-case form of a keyword (While, IF) is not recognized as that keyword; the interpreter treats it as an ordinary identifier. Identifiers must be spelled exactly as they were declared, and the names of built-in objects and functions must be spelled exactly as given (Math.max, not Math.Max).
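A short sketch of these rules; the variable names are arbitrary examples.

```javascript
// Identifiers that differ only in case are three separate variables.
let total = 10;
let Total = 20;
let TOTAL = 30;
console.log(total, Total, TOTAL); // prints: 10 20 30

// Built-in names must be spelled exactly: Math.max exists, Math.Max does not.
console.log(typeof Math.max); // prints: function
console.log(typeof Math.Max); // prints: undefined

// "While" is not the keyword "while", so it is an ordinary (legal) identifier.
const While = "just a name";
console.log(While); // prints: just a name
```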

XML is also case-sensitive, and XML, like JavaScript, supports Unicode. When an XML parser or the JavaScript interpreter processes a document, it compares characters by their Unicode code points. An upper-case "A" (U+0041) is not the same as a lower-case "a" (U+0061).
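The code-point comparison can be checked directly in JavaScript. The toLowerCase call at the end is one common way to get a case-insensitive comparison; it is an illustration, not the only approach.

```javascript
const upperA = "A";
const lowerA = "a";

// Characters are compared by their Unicode code points.
console.log(upperA.charCodeAt(0).toString(16)); // prints: 41 (U+0041)
console.log(lowerA.charCodeAt(0).toString(16)); // prints: 61 (U+0061)
console.log(upperA === lowerA);                 // prints: false

// A case-insensitive comparison has to normalize case explicitly.
console.log(upperA.toLowerCase() === lowerA);   // prints: true
```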

Because of case sensitivity, a programmer should adopt a standard way of naming identifiers. Each type of identifier (label names, variable names, function names and object names) can follow its own consistent naming style. This reduces confusion and debugging time.
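As an illustration, the sketch below applies one widely used convention: UPPER_SNAKE_CASE for constants, PascalCase for constructors, and camelCase for variables, functions, methods and labels. This convention is an assumption chosen for the example, not a language rule, and ShoppingCart and the other names are hypothetical.

```javascript
// One common convention (not a language rule):
//   UPPER_SNAKE_CASE for constants, PascalCase for constructors,
//   camelCase for variables, functions, methods and labels.
const MAX_ITEMS = 3;

function ShoppingCart() {
  this.items = [];
}

ShoppingCart.prototype.addItem = function (itemName) {
  if (this.items.length < MAX_ITEMS) {
    this.items.push(itemName);
  }
};

const myCart = new ShoppingCart();
const groceryList = ["apple", "bread", "milk", "eggs"];

fillCart: for (let i = 0; i < groceryList.length; i++) {
  if (myCart.items.length === MAX_ITEMS) {
    break fillCart; // the label name follows the camelCase style too
  }
  myCart.addItem(groceryList[i]);
}
console.log(myCart.items); // prints: [ 'apple', 'bread', 'milk' ]
```

Whatever convention a team picks, the point is that each kind of identifier is recognizable at a glance.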

HTML element and attribute names are not case-sensitive (XHTML, by contrast, requires them in lower-case), yet HTML elements and JavaScript code often live in the same document. This can be confusing, because many HTML element and attribute names correspond to JavaScript objects and their properties, and those JavaScript names are case-sensitive. HTML code can also be embedded inside JavaScript code, for example in strings. Keeping your HTML tag and attribute names in lower-case can eliminate many JavaScript coding problems.





White Space

Typically, "white space" is a small set of characters made up of spaces, tabs and newlines. The interpreter ignores a series of these characters when it appears between tokens. However, there is also "significant white space": some tokens have specific rules about white space. For example, white space inside a string literal is part of the string's value, and white space is required to separate a keyword from an adjacent identifier, as in "var x".

Programmers can use white space to format their code into consistent patterns (indentation, blank lines and so on), or in any other way that makes the code more readable.

In summary, during the process of tokenizing, the interpreter must be aware of the rules for typical white space and significant white space.
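The sketch below contrasts ignored and significant white space. The subtraction example is one illustration of white space changing how characters are grouped into tokens.

```javascript
// White space between tokens is ignored: these two expressions are identical.
const compact = 1+2*3;
const spaced  = 1 +   2   * 3;
console.log(compact === spaced); // prints: true

// White space inside a string literal is part of the value.
console.log("a b".length); // prints: 3
console.log("ab".length);  // prints: 2

// White space can change how characters are grouped into tokens:
// "- -" is two minus operators, while "--" is the decrement operator.
let x = 5;
const y = 2;
const difference = x - -y;
console.log(difference); // prints: 7
// x--y would be a SyntaxError: it is read as the tokens x, --, y.
```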

A side note: XHTML has an attribute, "xml:space", that controls white-space handling in a few of its elements. It is really XML syntax, used in the XHTML DTD to declare how white space is treated.





Line Breaks (new line char)

Line breaks are not necessary in your code as long as you use a semicolon after every statement. But line breaks make your code more readable by placing each statement on a separate line.

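As far as the tokenizer is concerned, line breaks are mostly treated like white space, so a statement that is split between two lines (no semicolon at the end of the first line) is usually parsed as one statement. The exception is automatic semicolon insertion: a line break directly after return, break or continue ends the statement, and a line break also ends a statement when the next token cannot continue it.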
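A minimal sketch of both cases; brokenReturn and fixedReturn are hypothetical names used only for this illustration.

```javascript
// A statement split across lines is still one statement,
// because "1 +" cannot end the expression on its own.
const sum = 1 +
  2 +
  3;
console.log(sum); // prints: 6

// Exception: a line break right after "return" ends the statement.
// A semicolon is inserted after return, so the function returns undefined
// and the expression on the next line is never evaluated.
function brokenReturn() {
  return
    42;
}
console.log(brokenReturn()); // prints: undefined

function fixedReturn() {
  return 42;
}
console.log(fixedReturn()); // prints: 42
```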





Semicolons

JavaScript accepts a semicolon at the end of every statement (EOS: end-of-statement). In most cases the EOS semicolon is optional: when a line break ends a statement, JavaScript inserts the semicolon automatically. It does so only when the next line cannot continue the current statement, so a missing semicolon can silently join two lines into one. It is good coding practice to include semicolons. Some older browsers may require them, and future versions of ECMAScript could become stricter and require every statement to end with a semicolon, just as C and Java do.
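The sketch below deliberately omits most semicolons to show where automatic insertion helps and where it does not. The names base, outcome and safeTotal are illustrative only.

```javascript
// No semicolons: one is inserted automatically at each line break here,
// because the next line cannot continue the previous statement.
const a = 1
const b = 2
console.log(a + b) // prints: 3

// Pitfall: no semicolon is inserted when the next line CAN continue the
// statement. These two lines are read as one: total = base(2 + 3)
const base = 5
let outcome
try {
  const total = base
  (2 + 3)
  outcome = total
} catch (err) {
  outcome = err.name // base is not a function
}
console.log(outcome) // prints: TypeError

// An explicit semicolon keeps the two lines separate.
const safeTotal = base;
(2 + 3);
console.log(safeTotal) // prints: 5
```

Ending every statement with a semicolon, as recommended above, avoids this class of bug.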





Comments

ECMAScript has two kinds of comments. The single-line comment begins with two forward slashes (//) and runs to the end of the line. The multiline comment begins with a forward slash and an asterisk (/*) and ends with an asterisk and a forward slash (*/):

    // a single-line comment

    /* a multiline comment,
       as many lines
       as needed */

Comments are treated like white space.
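Because comments act like white space, they can even sit between tokens, while comment delimiters inside a string are just characters. A brief sketch; the values are arbitrary.

```javascript
// Comments are treated like white space, so one may sit between two tokens.
const area = 4 /* width */ * 2 /* height */;
console.log(area); // prints: 8

// Comment delimiters inside a string literal are ordinary characters.
const url = "http://example.com/*not a comment*/";
console.log(url.indexOf("//")); // prints: 5

// Multiline comments do not nest: the first closing delimiter ends the
// comment, so wrapping code that already contains one breaks the program.
```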





Tokens

The JavaScript interpreter has the job of reading raw source text (the script) and making sense of it, and it follows many rules in doing so. It has to recognize white space, lump consecutive white-space characters together and then ignore them. It also needs to know what to do with the relevant text between the white space. Breaking that text into meaningful units is called tokenizing, and the units are called tokens. Tokens can be keywords, identifiers, literals, regular expressions, operators and syntax punctuation. Tokens do not always need white space between them: in x=1+2 the tokenizer still finds five tokens.
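To make tokenizing concrete, here is a toy tokenizer. It is a simplified sketch, not how a real engine works: it assumes a tiny subset of JavaScript (identifiers, a few keywords, integer literals, double-quoted strings without escapes, and single-character punctuators), and tokenize, KEYWORDS and PUNCTUATORS are hypothetical names.

```javascript
// A toy tokenizer for a tiny JavaScript subset. Real engines follow the
// full ECMAScript lexical grammar (comments, regex literals, escapes, etc.).
const KEYWORDS = new Set(["var", "let", "const", "if", "else", "return"]);
const PUNCTUATORS = "=+-*/;(){},<>";

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // white space separates tokens and is then discarded
    } else if (/[A-Za-z_$]/.test(ch)) {
      let j = i + 1;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      const word = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(word) ? "keyword" : "identifier", value: word });
      i = j;
    } else if (/[0-9]/.test(ch)) {
      let j = i + 1;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push({ type: "number", value: source.slice(i, j) });
      i = j;
    } else if (ch === '"') {
      const end = source.indexOf('"', i + 1);
      if (end === -1) throw new SyntaxError("Unterminated string literal");
      tokens.push({ type: "string", value: source.slice(i, end + 1) });
      i = end + 1;
    } else if (PUNCTUATORS.includes(ch)) {
      tokens.push({ type: "punctuator", value: ch });
      i++;
    } else {
      throw new SyntaxError("Unexpected character: " + ch);
    }
  }
  return tokens;
}

const tokens = tokenize('var greeting = "hi";');
console.log(tokens.map((t) => t.type + ":" + t.value).join(" "));
// prints: keyword:var identifier:greeting punctuator:= string:"hi" punctuator:;

console.log(tokenize("x=1+2").length); // prints: 5 (no white space needed)
```

The keyword check happens after a whole word is read, which is why "varx" would come out as a single identifier rather than the keyword var.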





Identifiers

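The rules for JavaScript identifiers conform to those of Java and many other programming languages. JavaScript allows the following characters in an identifier: letters (a-z and A-Z), digits (0-9), the underscore (_) and the dollar sign ($).

Any combination of these characters may make up an identifier, with the following exceptions: the first character cannot be a digit, and an identifier cannot be a keyword or a reserved word.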

Recall that identifiers are case-sensitive. Later versions of JavaScript allow any Unicode letter or digit in an identifier. These characters can also be written as a Unicode escape sequence: \u followed by the four hexadecimal digits of the character's code (for example, \u00e9 for "é").
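The identifier rules can be sketched as follows, assuming an ES5 or later engine for the Unicode identifiers. The illegal forms are left commented out because they would stop the script from compiling.

```javascript
// Letters, digits, the underscore and the dollar sign are all allowed.
const _count = 1;
const $total = 2;
const item2 = 3;
console.log(_count + $total + item2); // prints: 6

// Unicode letters are allowed too (in later versions of JavaScript).
const π = 3.14159;

// A \u escape spells the same identifier: \u03c0 is the letter π.
console.log(\u03c0); // prints: 3.14159

// \u0041 is "A", so this declares an identifier spelled "Age".
const \u0041ge = 30;
console.log(Age); // prints: 30

// Both of these would be syntax errors, so they stay commented out:
// const 2item = 4;  // an identifier cannot begin with a digit
// const var = 5;    // "var" is a keyword
```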





Literals

A literal is a value written directly in the script. Literals are often used to assign a value to a variable. Since a variable can hold values of many data types, the syntax of a literal depends on the data type it represents. The primitive data types as well as the reference data types have syntax rules governing their corresponding literals. Literals are covered separately on page "ECMA-262 Core: JavaScript Literals".
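For orientation, a quick sketch of the most common literal forms; the values themselves are arbitrary.

```javascript
const count = 42;               // number literal
const price = 9.99;             // number literal with a fraction
const language = "JavaScript";  // string literal
const done = false;             // boolean literal
const nothing = null;           // null literal
const pattern = /ab+c/i;        // regular expression literal
const point = { x: 1, y: 2 };   // object literal
const list = [1, 2, 3];         // array literal

console.log(typeof count, typeof language, typeof done); // prints: number string boolean
console.log(pattern.test("ABBBC"));                      // prints: true (the i flag ignores case)
console.log(point.y + list.length);                      // prints: 5
```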





Keywords

A keyword is a word that is part of the language (its built-in vocabulary), such as var, if, for, function and return. JavaScript has many keywords; the complete set is defined by the ECMAScript standard. Keywords must not be used as identifiers, since they are reserved strictly for the language syntax.

As a presentation convention, we attempt to render keywords with green bold italic when used in the context of a keyword.
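One way to observe this rule is to ask the engine to compile a declaration. The helper canDeclareVariable below is hypothetical and relies on the Function constructor, which compiles its body (in non-strict mode) without running it and throws a SyntaxError for invalid code. It is for illustration only; untrusted input should never be passed to the Function constructor.

```javascript
// Hypothetical helper: reports whether a word can be declared as a variable.
function canDeclareVariable(word) {
  try {
    new Function("var " + word + ";"); // compiles the body, does not run it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(canDeclareVariable("total"));  // prints: true
console.log(canDeclareVariable("return")); // prints: false (keyword)
console.log(canDeclareVariable("if"));     // prints: false (keyword)
console.log(canDeclareVariable("Return")); // prints: true (keywords are lower-case)
```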





Reserved Words

Reserved words are words the ECMAScript standard has set aside because they may be incorporated into the language syntax in a future release. Like keywords, reserved words must not be used as identifiers. The JavaScript reserved words are common words from other programming languages that JavaScript has not yet implemented (enum, for example). The complete set of reserved words is defined by the ECMAScript standard.
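Some reserved words apply in every context, while others (such as implements) are reserved only in strict mode code, assuming an ES5 or later engine. The sketch below checks this with a hypothetical compiles helper that uses the Function constructor to compile, but not run, a snippet.

```javascript
// Hypothetical helper: true if the code compiles, false on a SyntaxError.
// The Function constructor only compiles the body; it does not run it.
function compiles(code) {
  try {
    new Function(code);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("var enum;"));                     // prints: false (always reserved)
console.log(compiles("var implements;"));               // prints: true (non-strict code)
console.log(compiles("'use strict'; var implements;")); // prints: false (reserved in strict mode)

// Reserved words may still be used as property names (ES5 and later).
const flags = { enum: 1, implements: 2 };
console.log(flags.enum + flags.implements);             // prints: 3
```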





Operator Structure

Some operators are keywords (delete, void, new, typeof, etc.). Most operators are symbols, and these symbols are part of the JavaScript language syntax. The structure of an expression depends on how many operands its operator takes: unary operators take one (-x), binary operators take two (a + b), and the conditional operator takes three (a ? b : c). In turn, operand values may need to conform to the type the operator expects; where they do not, JavaScript often converts them. The JavaScript interpreter recognizes operators and the general syntax rules for their operands.
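A sketch of operators grouped by operand count, plus two type conversions; the record object is an arbitrary example.

```javascript
const record = { id: 7, label: "seven" };

// Unary operators take one operand; some are keywords, some are symbols.
console.log(typeof record);     // prints: object
console.log(-record.id);        // prints: -7
console.log(void 0);            // prints: undefined
delete record.label;
console.log("label" in record); // prints: false ("in" is a binary keyword operator)

// Binary operators take two operands.
console.log(record.id * 2);     // prints: 14

// The conditional operator is the only one that takes three operands.
console.log(record.id > 5 ? "big" : "small"); // prints: big

// Operands are converted to the type an operator expects:
console.log("3" * "4"); // prints: 12 (both strings converted to numbers)
console.log("3" + 4);   // prints: 34 ("+" with a string operand concatenates)
```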





Statement Syntax

JavaScript statements also have a predetermined structure. Some statements have multiple forms in which some portions are optional. Other statements depend on punctuation (braces around blocks, commas, or the two semicolons in a for statement) to define their structure. The syntax for statements is covered separately on page "ECMA-262 Core: JavaScript Statement Syntax".
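A small sketch of optional parts and structural punctuation in a few statements; the variable names are illustrative.

```javascript
// Two semicolons separate the three optional parts of a for statement.
const seen = [];
for (let i = 0; i < 3; i++) {
  seen.push(i);
}

// Every part may be omitted, but both semicolons must remain.
let steps = 0;
for (;;) {
  steps++;
  if (steps === 3) break;
}

// The else clause of an if statement is optional.
if (steps === 3) seen.push("done");

// A comma separates several declarations in one statement.
let first = seen[0], last = seen[seen.length - 1];
console.log(seen);        // prints: [ 0, 1, 2, 'done' ]
console.log(first, last); // prints: 0 done
```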




Rx4AJAX        2008 This Site Built By PPThompson