PHP: What’s a valid JavaScript identifier (or function name)?

After another reply to a question I’ve had on StackOverflow for a while, I decided that I should add another level of security to my method of providing JSONP from PHP. The way I did it before, I didn’t check the provided callback at all. This means that someone could technically put whatever they wanted in there, including malicious code. So it might be a good idea to check that the callback, which should be a function name, actually is a valid function name. But,

What is valid?

To figure that out, we need to look in the ECMAScript Language Specification. In chapter 13 on functions, we find that a function name is a so-called identifier, which is described in section 7.6. There we can find the following facts:

Identifier                   <IdentifierName> but not <ReservedWord>
IdentifierName               <IdentifierStart>
                             <IdentifierName> <IdentifierPart>
IdentifierStart              <UnicodeLetter>
                             $
                             _
                             \ <UnicodeEscapeSequence>
IdentifierPart               <IdentifierStart>
                             <UnicodeCombiningMark>
                             <UnicodeDigit>
                             <UnicodeConnectorPunctuation>
                             <ZWNJ>
                             <ZWJ>
UnicodeLetter                Uppercase letter (Lu)
                             Lowercase letter (Ll)
                             Titlecase letter (Lt)
                             Modifier letter (Lm)
                             Other letter (Lo)
                             Letter number (Nl)
UnicodeCombiningMark         Non-spacing mark (Mn)
                             Combining spacing mark (Mc)
UnicodeDigit                 Decimal number (Nd)
UnicodeConnectorPunctuation  Connector punctuation (Pc)
UnicodeEscapeSequence        The definition of the nonterminal UnicodeEscapeSequence is given in 7.8.4
ZWNJ                         U+200C (Zero-width non-joiner)
ZWJ                          U+200D (Zero-width joiner)
ReservedWord                 <Keyword>
                             <FutureReservedWord>
                             <NullLiteral>
                             <BooleanLiteral>
Keyword                      break, do, instanceof, typeof, case, else, new, var, catch,
                             finally, return, void, continue, for, switch, while, debugger,
                             function, this, with, default, if, throw, delete, in, try
FutureReservedWord           class, enum, extends, super, const, export, import,
                             implements, let, private, public, yield, interface, package,
                             protected, static
NullLiteral                  null
BooleanLiteral               true, false

Looks long, but not too complicated.

Checking if a string is valid

Checking whether a string is a valid identifier is now pretty easy. We just need to make sure the string matches the allowed syntax, and that it’s not a reserved word. The first we can solve with a regular expression and the second with a simple array of the reserved words to reject. For example, something along the following lines.

function is_valid_identifier($subject)
{
    // First character: $, _ or a Unicode letter (L*, Nl). The rest may also be
    // combining marks, digits, connector punctuation, ZWNJ or ZWJ.
    $identifier_syntax
      = '/^[$_\p{L}\p{Nl}][$_\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\x{200C}\x{200D}]*+$/u';

    $reserved_words = array('break', 'do', 'instanceof', 'typeof', 'case',
      'else', 'new', 'var', 'catch', 'finally', 'return', 'void', 'continue',
      'for', 'switch', 'while', 'debugger', 'function', 'this', 'with',
      'default', 'if', 'throw', 'delete', 'in', 'try', 'class', 'enum',
      'extends', 'super', 'const', 'export', 'import', 'implements', 'let',
      'private', 'public', 'yield', 'interface', 'package', 'protected',
      'static', 'null', 'true', 'false');

    // Lowercasing before the lookup also rejects e.g. "Break", which is
    // stricter than the specification (reserved words are case-sensitive).
    return preg_match($identifier_syntax, $subject)
        && ! in_array(mb_strtolower($subject, 'UTF-8'), $reserved_words);
}

Not too complex, although the regular expression might look a bit nuts at first because of all the Unicode character groups. You might find regular expressions for this elsewhere that only use a-z for the letters, but as you saw from the specification, that won’t cover much of what’s actually valid.
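For the JSONP case that started all this, the check could be wired in roughly like the sketch below. Note the assumptions: jsonp_body is a made-up helper name, the validator is passed in as a callable so the snippet runs on its own, and "callback" is just my guess at the name of the GET parameter the client uses.

```php
<?php
/**
 * Build a JSONP response body, or return null when the callback name is
 * rejected by the validator. (Hypothetical helper, not part of any library.)
 */
function jsonp_body($callback, $data, callable $is_valid)
{
    // Anything that is not a string, or that the validator rejects, is refused.
    if (!is_string($callback) || !$is_valid($callback)) {
        return null;
    }

    return $callback . '(' . json_encode($data) . ');';
}

// Usage, assuming is_valid_identifier() is defined and the client sends the
// function name in a "callback" GET parameter:
//
//   $callback = isset($_GET['callback']) ? $_GET['callback'] : null;
//   $body = jsonp_body($callback, array('ok' => true), 'is_valid_identifier');
//   if ($body === null) {
//       header('HTTP/1.1 400 Bad Request');
//   } else {
//       header('Content-Type: application/javascript; charset=utf-8');
//       echo $body;
//   }
```

Returning null instead of echoing directly leaves it up to the caller to decide how to respond to a rejected callback.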

I built the expression using the very helpful RegexBuddy and exported an HTML explanation of it. I also threw together a tiny identifier validator where you can test it out. You find it all at

And that’s that. Hope that might be helpful for someone and please let me know if you find any issues with it!

Note: I have ignored the issue with the Unicode escape sequences for now as I’m not quite sure how to best handle those. From the specification:

A UnicodeEscapeSequence cannot be used to put a character into an IdentifierName that would otherwise be illegal. In other words, if a \UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence’s CV, the result must still be a valid IdentifierName that has the exact same sequence of characters as the original IdentifierName. All interpretations of identifiers within this specification are based upon their actual characters regardless of whether or not an escape sequence was used to contribute any particular characters.

So, I’m not sure if there is a way to just convert those sequences into actual characters, or if PHP does this automatically as they come in as GET parameters, or what. Either way, my code above ignores them. This means that identifiers with escape sequences will not be considered valid. If you have some good ideas on how to handle it, please leave a comment 🙂
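One possible direction, going by the quoted rule, would be to replace each \uXXXX escape with the character it stands for and then validate the result like any other string. The sketch below is only that, a sketch: decode_identifier_escapes is a made-up name, it assumes the mbstring extension is available, and it assumes the escapes arrive as literal backslash sequences in the string.

```php
<?php
/**
 * Replace \uXXXX escapes with the characters they stand for, so the result
 * can be validated like any other identifier. Escapes for surrogate halves
 * are left untouched; the remaining backslash then fails the syntax check.
 * (Hypothetical helper name; requires the mbstring extension.)
 */
function decode_identifier_escapes($subject)
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($match) {
            $code_point = hexdec($match[1]);

            // A surrogate half is not a character on its own.
            if ($code_point >= 0xD800 && $code_point <= 0xDFFF) {
                return $match[0];
            }

            // Encode the 16-bit code point as UTF-16BE, then convert to UTF-8.
            return mb_convert_encoding(pack('n', $code_point), 'UTF-8', 'UTF-16BE');
        },
        $subject
    );
}
```

The decoded string would then be passed to the identifier check, so an escape that hides an illegal character such as \u0028 still ends up rejected.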

  • It should be noted that your regex has some false positives. Supplementary Unicode characters (e.g. U+2F800 CJK Compatibility Ideograph, which is listed in the [Lo] category) are disallowed in identifier names, as JavaScript interprets them as two individual surrogate halves (e.g. \uD87E\uDC00) which don’t match any of the allowed Unicode categories. Your regex, however, would allow such a character.

    • Yeah, thanks for letting me know 🙂 For most purposes I think I’ll prefer to keep it simple though. At least where I use it in a JSONP handler as just a sanity-check of the callback.
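For anyone who does want to rule out supplementary characters, a small extra check might look like the following sketch. The function name has_supplementary_characters is made up for illustration, and the idea is simply to reject any string for which it returns true before running the identifier check.

```php
<?php
/**
 * True when the string contains a code point outside the Basic Multilingual
 * Plane (above U+FFFF). (Hypothetical helper name.)
 */
function has_supplementary_characters($subject)
{
    // With the /u modifier, \x{...} refers to Unicode code points.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $subject) === 1;
}
```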

  • Hello, your post has greatly helped me.
    Up until a few hours ago I had no idea what JSONP was, but was trying to use cross domain JSON through Sencha Touch…
    Well, long story short: found your post about JSONP, used your function, failed a little more, fiddled with your code and got it.
    The only thing was that the callback function name Sencha sends has a lot of dots (.) in it, so I had to filter…
    which worked perfectly.
    Now I’m wondering which could be the most loose way to protect from injection XSS. Though my hacky solution seems to work fine in this case.
    Thanks again.
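For dotted callback names like that, one option is to split on the dots and require every segment to be a valid identifier. This is only a sketch and not the commenter's actual code: is_valid_callback_path is a made-up name, and the identifier check is passed in as a callable so the snippet stands on its own.

```php
<?php
/**
 * Accept a dotted callback such as "App.data.callback1" (a made-up example)
 * when every dot-separated segment passes the given identifier check.
 */
function is_valid_callback_path($subject, callable $is_valid_segment)
{
    // explode() yields an empty segment for leading, trailing or doubled
    // dots, which the identifier check is expected to reject.
    foreach (explode('.', $subject) as $segment) {
        if (!$is_valid_segment($segment)) {
            return false;
        }
    }

    return true;
}

// Usage, assuming is_valid_identifier() is defined:
//
//   is_valid_callback_path($_GET['callback'], 'is_valid_identifier');
```

Since only identifiers and dots get through, parentheses, semicolons and quotes are all rejected, which keeps this fairly strict while still allowing namespaced callbacks.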

  • Krish

    Hi, what about the validity of the question mark?

  • King

    Can I know: JavaScript tokens are categorized into five groups, what are they? And they are identifiers…?