Spotting Dead Code

Highlighting dead code sections is a ubiquitous feature of modern IDEs. How would you implement it in Arbori?

Our running example would be adopted from Trivadis Code Guildelines, Rule G-1040:

begin
   if 2 = 3 then
      null; -- some dead code here
   else
      null; -- some enabled code here
   end if;

   case
      when 1 = 1 and 'x' = 'y' then
         null; -- some dead code here
      else
         null; -- some further enabled code here
   end case;

   return;
   null; -- some dead code here
 end;
/

The first step of semantic analysis is always examining the parse tree:

[0,39)   sql_statement  sql_statements
  [0,38)   block_stmt  labeled_block_stmt  library_unit
    [0,1)   'BEGIN'
    [1,36)   seq_of_stmts
      [1,14)   if_stmt  labeled_nonblock_stmt  ...
        [1,2)   'IF'
        [2,5)   and_expr  boolean_primary  pls_expr  rel
          [2,3)   arith_expr  digits  factor  ...
          [3,5)   relal_op_sim_expr_opt
            [3,4)   '='  relal_op
            [4,5)   arith_expr  digits  factor  ...
        [5,6)   'THEN'
        [6,8)   labeled_nonblock_stmt  null_stmt  ...
          [6,7)   'NULL'
          [7,8)   ';'
        [8,11)   else_clause_opt
          [8,9)   'ELSE'
          [9,11)   labeled_nonblock_stmt  null_stmt  ...
            [9,10)   'NULL'
            [10,11)   ';'
        [11,12)   'END'
        [12,13)   'IF'
        [13,14)   ';'
      [14,36)   stmt_list_opt
        [14,34)   stmt_list_opt
          [14,32)   case_stmt  labeled_nonblock_stmt  ...
            [14,15)   'CASE'
            [15,26)   case_stmt_alt  case_stmt_alt_seq
              [15,16)   'WHEN'
              [16,23)   and_expr  case_choice  ...
                [16,19)   and_expr  boolean_primary  rel
                  [16,17)   arith_expr  digits  factor  ...
                  [17,19)   relal_op_sim_expr_opt
                    [17,18)   '='  relal_op
                    [18,19)   arith_expr  digits  factor  ...
                [19,20)   'AND'
                [20,23)   boolean_primary  rel
                  [20,21)   arith_expr  factor  pri  ...
                  [21,23)   relal_op_sim_expr_opt
                    [21,22)   '='  relal_op
                    [22,23)   arith_expr  factor  pri  ...
              [23,24)   'THEN'
              [24,26)   labeled_nonblock_stmt  null_stmt  ...
                [24,25)   'NULL'
                [25,26)   ';'
            [26,29)   else_clause_opt
              [26,27)   'ELSE'
              [27,29)   labeled_nonblock_stmt  null_stmt  ...
                [27,28)   'NULL'
                [28,29)   ';'
            [29,30)   'END'
            [30,31)   'CASE'
            [31,32)   ';'
          [32,34)   labeled_nonblock_stmt  return_stmt  ...
            [32,33)   'RETURN'
            [33,34)   ';'
        [34,36)   ... stmt  null_stmt  ...
          [34,35)   'NULL'
          [35,36)   ';'
    [36,37)   'END'
    [37,38)   ';'
  [38,39)   '/'

To identify the dead code sections in the IF and CASE operators we must have to evaluate boolean conditions which were recognized at the parse nodes [2,5) and [16,23), correspondingly. From the parse tree output it is evident that those two nodes are not labeled with something natural like “boolean_condition“. It is desirable to handle both scenarios at once, which leaves us with the two choices: and_expr or pls_expr. The first grammar symbol is clearly an artifact of the conjunction in our example, and it would not work for disjunction. On the other hand, the pls_expr payload is too general and embraces other types of expressions. To solve it, we can always add more constraints into Arbori query such as the parse node predecessor payload which is a keyword, or better yet the successor, which is the same — the keyword THEN — in both scenarios, but it won’t be necessary.

Our more pressing issue at the moment is how to evaluate an arbitrary boolean condition with constant literals. This is a challenge because code execution/evaluation is currently beyond Arbori’s capabilities. We solve it with a hack by introducing JavaScript condition which evaluates the code recognized at the parse node:

    function isTrueCondition(node) {
        var content = nodeContent(node);
        content = content.replaceAll('and','&&');
        content = content.replaceAll('=','==');
        try {
            return eval(content)?true:false;
        } catch (error) {
            return 'N/A';   
        }
    }

For example, the CASE operator condition 1 = 1 and 'x' = 'y' would be rewritten into JS syntax 1 == 1 && 'x' == 'y' before being submitted to evaluation by the JS engine.

Now, we can plug in this function into the following Arbori query identifying the dead code. It has been done in two steps, first identifying all conditions that evaluate to true and false together with the corresponding code sections:

TrueConditions:  [boolean) pls_expr 
              & .isTrueCondition('boolean')
              & ( boolean = dead-3
                | boolean^+1 = dead
              ) & [dead) else_clause_opt
;
              
FalseConditions:  [boolean) pls_expr 
              & !.isTrueCondition('boolean')
              & boolean = dead-2
              & [dead-1) 'THEN' 
;

Then, both predicates can be combined in the report query, where wold also add the easier dead code recognition for the statements following the RETURN. The full Arbori program:

include "std.arbori"

{
    var Service = Java.type('oracle.dbtools.util.Service');

     function isTrueCondition(node) {
        var content = nodeContent(node);
        content = content.replaceAll('and','&&');
        content = content.replaceAll('=','==');
        try {
            return eval(content)?true:false;
        } catch (error) {
            return 'N/A';   
        }
    }
}

TrueConditions:  [boolean) pls_expr 
              & .isTrueCondition('boolean')
              & ( boolean = dead-3
                | boolean^+1 = dead
              ) & [dead) else_clause_opt
;
              
FalseConditions:  [boolean) pls_expr 
              & !.isTrueCondition('boolean')
              & boolean = dead-2
              & [dead-1) 'THEN' 
;

FollowingReturn: [semicolon) ';' & [semicolon-1) 'RETURN'
            &  [dead) stmt
            &  semicolon)=[dead
;
              
DeadCode: TrueConditions
  |       FalseConditions
  |       FollowingReturn
-> {
    var pos = target.src.get(tuple.get('dead').from).begin; 
    var lineMap = Service.lineMap(target.input);
    var line = Service.charPos2LineNo1(lineMap,pos);
    
    javax.swing.JOptionPane.showMessageDialog(null,'line#'+line+'  :  '+nodeContent('dead'));           
}    

Advertisement

How to cook your own Language

The idea of Domain Specific Languages has been around for a while. For example, Oracle Graal/Truffle framework provides a runtime environment for many popular languages, but is also making an effort to simplify the new language development. The starting point for Graal/Truffle is the language semantics captured in the Abstract Syntax Tree (AST). However, the first part — how to get from the language grammar defined in CFG to the AST — is left open.

In this article we’ll explore the full path, from CFG to the language implementation runtime. In particular we’ll investigate what is the minimal amount of effort involved for the language developer to accomplish it and, in particular, if the concept of AST is necessary.

Running Example

For illustrative purposes, we’ll adopt the example from Wikipedia page on AST:

a := 30
b := 47 - 5
while b != 0
    --print a
    --print b
    if a > b
        a := a - b
        --print a
    else
        b := b - a
        --print b
    end
end        
print a
print b

The minor amendments to the original syntax include the addition of the end delimiter for the while and if statements; otherwise, the sequence of statements in the grammar is ambiguous. Also, the print statement is added, which is helpful for debugging purposes.

This this code snippet would be referenced in the following sections by the file name: example.expr.

Grammar

Writing the grammar for our tiny language is certainly a required step. The alternative — manually written procedural recursive descend parser — is more laborious. Here is the BNF:

program:
     stmt
   | program stmt
;   

stmt:
     assignment
   | while
   | if
   | print
;

assignment: identifier ':' '=' expr
;

expr:
     digits
   | identifier
   | expr '-' expr
;

boolean: 
      expr '>' expr
    | expr '!' '=' expr
;

while:  'while' boolean  program 'end'
;

if:  'if' boolean  program 'else'  program 'end'
;

print: 'print' expr
;  

Arguably, plain BNF is the simplest language posible, so that little comment is warranted.

For future references, this grammar is referenced by the file name: expr.grammar.

Parsing

The prerequisite for parsing is the choice of Parser Engine. Our choice here is the one based on Earley parsing algorithm. The implementation is in Java and is provided in the dbtools-common.jar, which is the support library for popular SQL tools, such as SQLDeveloper, SQLcl, and ORDS. If the reader is uncomfortable with the code base which is not Open Source, here is the github link to an implementation, which doesn’t include Oracle artifacts.

With parser engine support we instantiate the parser and run it as follows:

package ast;

import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

import oracle.dbtools.parser.Cell;
import oracle.dbtools.parser.Earley;
import oracle.dbtools.parser.Grammar;
import oracle.dbtools.parser.LexerToken;
import oracle.dbtools.parser.Matrix;
import oracle.dbtools.parser.ParseNode;
import oracle.dbtools.parser.RuleTuple;
import oracle.dbtools.parser.Visual;
import oracle.dbtools.parser.plsql.SyntaxError;
import oracle.dbtools.util.Service;

public class ExprEarley extends Earley {

	public static void main( String[] args ) throws Exception {
		String input =  Service.readFile(ExprEarley.class, "example.expr");

		Earley earley = new ExprEarley();

		List<LexerToken> src = LexerToken.parse(input);
		if( src.size() < 100 )
			LexerToken.print(src);
		else
			System.out.println("src.size()="+src.size());

		Matrix matrix = new Matrix(earley);
		/*if( src.size() < 400 ) {
			Visual.pixels = 900;  // Mac scaling for UHD; default for Win
			matrix.visual = new Visual(src, earley);
		}*/

		//earley.skipRanges = false; //<-- disable optimization

		earley.parse(src, matrix);

		if( matrix.visual != null )
			matrix.visual.draw(matrix);


		// get the bottom-right corner of the matrix
		// this is "top" in the "bottom-to-the-top" parser terminology
		Cell top = matrix.get(0, src.size());
		if( top != null ) // if input is syntactically recognized
			System.out.println(src.size()+ " tokens, top = " + top.toString()); 
		else {
			System.out.println(SyntaxError.checkSyntax(input, new String[]{"program"}, src, earley, matrix));
			return;
		}

		ParseNode root = earley.forest(src, matrix, true, input);
		root.printTree();
	}

    public ExprEarley() throws Exception {
        super(getRules());
        isCaseSensitive = true;
    }

	private static Set<RuleTuple> getRules() throws IOException {
        Set<RuleTuple> ret = new TreeSet<>();
        String rulesTxt = Service.readFile(ExprEarley.class,"expr.grammar"); 
        List<LexerToken> src = LexerToken.parse(rulesTxt);
        ParseNode root = Grammar.parseGrammarFile(src, rulesTxt);
        Grammar.grammar(root, src, ret);
        return ret;
	}
}

The bulk of the code it the ExprEarley.main() method, which a user who not interested in the bells and whistles of Earley algorithm can skip. Moreover, in many use cases, using the wrapper Parsed.class is less verbose as compared to the above boiler plate code parsing in steps, and this would actually be the approach in the next section. Therefore, the parser for the expr.grammar itself, is actually only a dozen or so lines at the bottom, that is the constructor, and the grammar extraction method.

For an interested reader, however, here is the walkthrough of the main() method:

  • At line 22 the example.expr input have been read. The first argument is Java resource prompting for the file directory collocated with ExprEarley.class. Alternatively, the reader can use any other of dozens reading the file methods available in Java.
  • At line 26 we have performed lexical analysis, which produced a list of Tokens. The Lexical Analyzer in dbtools-common library is essentially a sugarcoated java.util.StringTokenizer amended to cover a large variety of practical languages. The language specifics such as the comment style is controlled via extra arguments to the call, but in our case we have chosen the default, which is SQL.
  • At line 32 we have installed the object which captures the crux of the Earley parsing algorithm. The commented code visualizes the parsing process as the two dimensional grid with the recognized grammar rules in the grid cells. The cubic complexity of the parser method is evident from this visualization, as the power 2 comes from the two dimensional nature of the grid, and one extra multiple comes from the Earley complete step. An interested reader is referred to this document for details.
  • At line 38 we have an option to disable ad-hock optimization which differentiates our implementation from the canonical Earley algorithm. The user has an option to toggle optimization at the Matrix visualization panel, and witness grid areas colored in blue with skipped analysis.
  • Finally, at line 56 the parser produced the Parse Tree, which in our case is
[0,38)   program
  [0,36)   program
    [0,34)   program
      [0,10)   program
        [0,4)   assignment  program  stmt
          [0,1)   identifier
          [1,2)   ':'
          [2,3)   '='
          [3,4)   digits  expr
        [4,10)   assignment  stmt
          [4,5)   identifier
          [5,6)   ':'
          [6,7)   '='
          [7,10)   expr
            [7,8)   digits  expr
            [8,9)   '-'
            [9,10)   digits  expr
      [10,34)   stmt  while
        [10,11)   'while'
        [11,15)   boolean
          [11,12)   expr  identifier
          [12,13)   '!'
          [13,14)   '='
          [14,15)   digits  expr
        [15,33)   if  program  stmt
          [15,16)   'if'
          [16,19)   boolean
            [16,17)   expr  identifier
            [17,18)   '>'
            [18,19)   expr  identifier
          [19,25)   assignment  program  stmt
            [19,20)   identifier
            [20,21)   ':'
            [21,22)   '='
            [22,25)   expr
              [22,23)   expr  identifier
              [23,24)   '-'
              [24,25)   expr  identifier
          [25,26)   'else'
          [26,32)   assignment  program  stmt
            [26,27)   identifier
            [27,28)   ':'
            [28,29)   '='
            [29,32)   expr
              [29,30)   expr  identifier
              [30,31)   '-'
              [31,32)   expr  identifier
          [32,33)   'end'
        [33,34)   'end'
    [34,36)   print  stmt
      [34,35)   'print'
      [35,36)   expr  identifier
  [36,38)   print  stmt
    [36,37)   'print'
    [37,38)   expr  identifier

Program Evaluation

The common wisdom of program execution is transforming parse tree into AST. Again, we investigate if this step can be omitted, or possibly replaced by something else.

In the implementation we have chosen JavaScript engine available in Java, because it is a dynamic language with functions as first class citizens. Pure Java solution would be little more verbose with boilerplate wrapper objects.

There are the two alternative ways to run the scripts:

  • as Arbori program, or
  • directly via JavaScript engine

We don’t really leverage any Semantic Analysis facilities available, and the only advantage that Arbori provides is creating the environment with the parsed object available as global variable. In the second approach we would have to explicitly put it there.

The Java which runs Arbori is the following:

package ast;

import java.util.Map;

import oracle.dbtools.arbori.MaterializedPredicate;
import oracle.dbtools.arbori.Program;
import oracle.dbtools.parser.Earley;
import oracle.dbtools.parser.Parsed;
import oracle.dbtools.util.Service;

public class Evaluate {
	public static void main( String[] args ) throws Exception {
		String input =  Service.readFile(ExprEarley.class, "example.expr");
		Earley earley = new ExprEarley();
		
		Parsed parsed = new Parsed(input,earley,"program");
		
		Program program = new Program(earley, "JS") {
		};
		String prg = Service.readFile(ExprEarley.class, "evaluate.arbori");
		program.compile(prg);
        
        Map<String,MaterializedPredicate> output = program.eval(parsed);
	}
}

As we have promised in the previous section, the parsing is performed with one call at line 16. The third argument — the "program" literal — is the grammar top symbol.

The evaluate.arbori code mentioned at line 20 is essentially the following JavaScript:

{
   var Service = Java.type('oracle.dbtools.util.Service');
   target.root.printTree();
   
   var debug = true;
   
   var scope = new java.util.HashMap();
   
   var ruleHandlers = {
        "assignment:  identifier  ':'  '='  expr;"
                                 : function Assignment( node, children ) {
                                       scope.put( children[0].content(target.src),eval(children[3]) );
                                   }
      , "expr:  expr  '-'  expr;"
                                 : function Minus( node, children ) {
                                        return eval(children[0])-eval(children[2]);
                                   }
      , "expr:  digits;"         
                                 : function Digits( node, children ) {
                                       return java.lang.Integer.parseInt(node.content(target.src));
                                   }
      , "if:  'if'  boolean  program  'else'  program  'end';"         
                                 : function If( node, children ) {
                                       var condition = eval(children[1]);
                                       if( condition )
                                           return eval(children[2]);
                                       else    
                                           return eval(children[4]);
                                   }
      , "boolean:  expr  '>'  expr;"         
                                 : function Gt( node, children ) {
                                       return eval(children[0]) > eval(children[2]);
                                   }
      , "expr:  identifier;"         
                                 : function Id( node, children ) {
                                       return scope.get(node.content(target.src));
                                   }
      , "while:  'while'  boolean  program  'end';"         
                                 : function While( node, children ) {
                                       var i = 0;   // to avoid accidental infinite loop
                                                        //  removed in the production code 
                                       while( eval(children[1]) && ++i<5 ) 
                                           eval(children[2]);
                                   }
      , "boolean:  expr  '!'  '='  expr;"         
                                 : function Ne( node, children ) {
                                        return eval(children[0]) != eval(children[3]);
                                   }
      , "print:  'print'  expr;"         
                                 : function Print( node, children ) {
                                       print( children[1].content(target.src)+"="+eval(children[1]) );
                                   }
   }
   
   
   function matchingProduction( node ) {
      var children = node.children().toArray();
      var rules = target.getEarley().rules;
      for( ir in rules ) {
         var rule = rules[ir];
         if( !node.contains(rule.head) )
         	continue;
         if( node.from+1 == node.to ) {
            if( rule.head != rule.rhs[0] && node.contains(rule.rhs[0]) )
               return rule;
            else
               continue;   
         }
         var matched = true;
      	 for( ik in children ) {
      	    if( rule.rhs[ik] == null ) {
      	       matched = false;
      	       break;
      	    }
      	    if( !children[ik].contains(rule.rhs[ik]) ) {
      	       matched = false;
      	       break;
      	    }
      	 }
      	 if( matched )
      	 	return rule.toString();
      }
   }
   
   function eval(node) {
      if(debug) java.lang.System.out.print(Service.padln(node,40)+"  ");

      var children = node.children().toArray();
            	 
      var rule = matchingProduction( node );
      
      if(debug) print(rule);
      
      if( rule == null )
      	  return "N/A";
      	      
      var handler = ruleHandlers[rule];
      //print('handler='+handler);	
      if( handler != null) {
	      var ret = handler(node, children); 
	      if( debug && ret != null ) 
	         print("EVAL "+node + " TO " + ret);
	      return ret;
	  } else if( children != null ) {    	 
	      for( ik in children ) {
	         eval(children[ik]);
	      } 
	  } else
	      return "?"; 	 
   }
   
   eval(target.root);
   print(scope);
}

It looks little more verbose than it actually is, because only the lines 9 to 52 is actually the code that language developer has to write, The matchingProduction() and eval() functions are universal for the language, and thus can be hidden in a library. The matchingProduction() function purpose is to find the grammar rule disjunct that has been actually recognized at the parse node. The implementation itself is far from being efficient, but this is not our concern here. Again, our focus is how much language specific supporting code the developer has to write.

In lines 9 to 52 we have mapped the string representations of the rule productions into the functions that perform the evaluation at the nodes where such production has been discovered. For language developer it might be helpful to run the following Arbori query which captures matching production at each node:

matchingProductions: [node < node)
-> {
   var node = tuple.get('node')
   print(Service.padln(node,40)+"  "+matchingProduction(node));
}

Consequently, the string representation of relevant rules are copied to the ruleHandlers map.

For example, at the node

[15,33) if program stmt

it have recognized the rule production

if: 'if' boolean program 'else' program 'end';

There is only one production that is recognized at each node, so that we can leverage this knowledge by attaching the evaluation function to it in the ruleHandlers map. In our example, the function If( node, children ) would be called as soon as the above production is recognized. In this function we process the parse tree node, and then descend to node’s children.

Conclusion

There seems to be an interesting conceptual distinction between Semantic Analysis of a program, and the execution of thereof. At present, it seems that you can’t just write an Arbori query that executes the program, which is somewhat puzzling. Semantic Analysis inspects the program code globally, for example, capturing the relations between the ancestor and descendant parse tree nodes. On the other hand, the program execution mechanism is mostly concerned with the parent-child relations. It is the recursion leverage, that perhaps fixes the execution mechanism shortcomings.

Aggregates with Having Clause

Arbori features very limited data type system, hinged upon essentially a single type: the parse tree node. In many Arbori applications semantic code analysis imposes restrictions on code complexity, which rely upon numerically valued aggregate functions. However, due to aforementioned datatype design aggregation with grouping in Arbori covers only 4 cases:

  1. The oldest node, that is preceding all others
  2. The youngest node, that is following everybody else
  3. The node outliving other nodes
  4. The node with the lifespan within the others, that is closest to parse tree leaves

The examples where aggregation doesn’t fit into these cases are easy to come by. For instance, can we query for all the select_clauses which select_term count doesn’t exceed 10?

Here is the solution, leveraging Java Script Conditions:

{
   function descendantLimit( descendantContent, limit ) {
      var nodeNames = tuple.keySet();
      var node = tuple.get(nodeNames.toArray()[0]);
      var cnt = 0;
      var descendants = node.descendants();
      for( i in descendants ) {
         if( descendants[i].contains(descendantContent) )
            cnt++;
      }
      return cnt < eval(limit);
   }
}

smallSelectTerms: [clause) select_clause 
               & .descendantLimit('select_term','10')
-> {
   print(tuple);
}

At line#2 we have introduced a new function descendantLimit. Javascript conditions in Arbori are allowed have 0,1, or 2 arguments, and the arguments are assumed to be string literals. The return value is always boolean, so that Arbori runtime can evaluate the function when examining each tuple.

At line#3 we have extracted all the tuple attribute names. Getting ahead to line#15 we see that the function is called within unary predicate, so this list must have length 1.

At next line #4 we have extracted the current parse node, by the attribute name.

Line #6 introduced the variable referencing the list of all the node descendants, which is iterated at line #7.

Lines# 8-9 increment the counter whenever a descendant payload contains select_term.

Finally, at line#11 function returns the evaluation of the condition that number of select_terms doesn’t exceed the limit.

Fully programmable Code Templates

A recurring theme of this blog is extensibility of SQL Developer, and Integrated Development Environments, in general. The readers of this blog probably already guessed the answer. But what is the problem?

Consider the Code Templates functionality, which steered some confusion on SQL Dev OTN Forum. What are code templates? They are functions that generate code. Such as a function returning a fixed string literal, or the current date function that generates date string, or a function that generates the current schema name. Those functions can be defined either on the server side in PL/SQL, or client tier in javascript. Again, as you might have already guessed that here we’ll promote the Arbori based solution.

There are several facets to the problem:

  • Where do we keep those javascript functions?
  • How do we prompt those, which are relevant in the context?
  • How do we act according to the user’s choice?

We’ll declare the functions in arbori file, like this:

include "std.arbori"

templateDefinitions: runOnce -> {
   function selectFromDual() {
       return 'select * from dual;'
   }
   function insertInto() {
       return 'insert into '
   }
   ...
}

Next, let’s get a handle to the editor window, and provide a function that will do the insertion in the editor panel:

   var panel = struct.get('outlinePanel');
   var editor = panel.getEditor().getFocusedEditorPane()
      
   function insertAtCursor( text ) {
      var doc = editor.getDocument()
      var pos = editor.getCaretPosition()
      doc.insertString(pos, text, null)
   }   

Finally, we’ll build the context menu that prompts those templates:

   var menu = new javax.swing.JPopupMenu()
   
   var item1 = new javax.swing.JMenuItem("<html><b>"+selectFromDual());
   var listener1 = new java.awt.event.ActionListener(
      function () {
         insertAtCursor(selectFromDual())
      }   
   );
   item1.addActionListener(listener1);
   menu.add(item1)
   
   var item2 = new javax.swing.JMenuItem(insertInto());   
   var listener2 = new java.awt.event.ActionListener(
      function () {
         insertAtCursor(insertInto())
      }   
   );
   item1.addActionListener(listener2);
   menu.add(item2)

   editor.setComponentPopupMenu(menu)

Here is the full code:

include "std.arbori"

codeTemplates: runOnce -> {
   function selectFromDual() {
       return 'select * from dual;'
   }
   function insertInto() {
       return 'insert into'
   }
   var panel = struct.get('outlinePanel');
   var editor = panel.getEditor().getFocusedEditorPane()
      
   function insertAtCursor( text ) {
      var doc = editor.getDocument()
      var pos = editor.getCaretPosition()
      doc.insertString(pos, text, null)
   }   
   
   var menu = new javax.swing.JPopupMenu()
   
   var item1 = new javax.swing.JMenuItem("<html><b>"+selectFromDual());
   var listener1 = new java.awt.event.ActionListener(
      function () {
         insertAtCursor(selectFromDual())
      }   
   );
   item1.addActionListener(listener1);
   menu.add(item1)
   
   var item2 = new javax.swing.JMenuItem(insertInto());   
   var listener2 = new java.awt.event.ActionListener(
      function () {
         insertAtCursor(insertInto())
      }   
   );
   item2.addActionListener(listener2);
   menu.add(item2)

   editor.setComponentPopupMenu(menu)
}

and the screenshot:

Certainly, this is very basic implementation, which can be greatly improved. For example, the context menu can be aware of the text surrounding the cursor, and filter out the item list accordingly. However, to keep the article concise, I leave it to imagination of the reader, what other directions this idea can be extended.

Too many tabs

This feature is credited to Surya Gupta from Eugene Perkov team. Arbori execution results panel instantiates a tab with a grid for every named predicate, which quickly becomes overwhelming for complex program. In version 20.2 this problem is little less significant, because you can modularize your program leveraging the include directive. The included predicates are not displayed, and if you want them to, you can always define their clones in the main module.

Here is the direct method to get rid of the unwanted tabs:

include "std.arbori"

Prelude: runOnce -> {
    /**
     * Arbori program with numerous auxiliary predicates often results
     * in overwhelming number of tabs at SQLDev Arbori execution panel
     */
    function removeTabsExcept( tabArray ) {
        var execOrder = program.querySequence();
        var copy = execOrder.clone();
        for( tabNo = execOrder.length-1; 0 <= tabNo; tabNo=tabNo-1 ) {
            var exclude = true;
            for( tIndex in tabArray ) {
                if( tabArray[tIndex] == execOrder[tabNo] )
                    exclude = false;
            }
            if( exclude ) {
                execOrder.remove(tabNo);
            }
        }
        var Thread = Java.type('java.lang.Thread');
        new Thread(function () {
             Thread.sleep(1000);
             execOrder.clear();
             execOrder.addAll(copy);
        }).start();
    }
}

UnwantedTab1: runOnce;

OKTab1: runOnce;

UnwantedTab3: runOnce;

OKTab4: runOnce;

Finale: runOnce -> {
    removeTabsExcept(['OKTab1','OKTab4']);
}

This program works since version 19.4. The function removeTabsExcept that does all the work should have been refactored into the standard library, but it is not there in 20.2 (it would be in 20.3).

A more intuitive method to remove tabs would have been obtaining a handle to the results panel, and removing the children widgets. The handle would be there in version 20.3.

Arbori 20.2

SQL Developer release 20.2 features several Arbori amendments.

Include directive

Suppose you have defined Arbori predicates and Java Script functions in some arbori file, how do you reuse them? The include directive was there in previous versions, but it failed to work for user-defined Arbori libraries. In version 20.2 it has been remedied with the following ways to include your library:

Absolute path
include "C:\Users\dim\Documents\Arbori\collapse non table nodes.arbori"

finale: runOnce -> {
   print('finished')
}

This snippet executes the collapse non table nodes.arbori program first, and then adds a log record to Java console. The runOnce predicate is defined in the standard library std.arbori, which has been included into the collapse non table nodes.arbori program. What else is in this program is a separate topic, which we’ll focus later.

Relative path

The default SQL Developer directory is where it is started from, which is something like

C:\sqldeveloper-20.2\sqldeveloper\bin

If you create a folder ArboriLib there, then you can reference its content like this:

include "ArboriLib/collapse non table nodes.arbori"

finale: runOnce -> {
   print('finished')
}
Java environment setting

Edit the sqldeveloper.conf to define dbtools.arbori.home Java environment variable like this:

AddVMOption -Ddbtools.arbori.home=/Users/dim/Documents/Arbori

Then, the included Arbori directive will lookup files to the path relative to this directory.

Arbori Editor

Arbori code is now first-class citizen in SQL Developer. You can open .arbori files via the Open File menu item:

The editor has the expected functionality: syntax coloring, undo-redo, Ctrl-F, edit history tab, etc.

UI Widget Handles

Extending SQL Developer functionality via Nashorn Java Script actions has become recurrent theme. To be able to add widgets, or control the existing ones, however, there needs to be a handle to SQL Developer UI objects. In this section we’ll walk through the two examples how to do it.

First, lets add a button to the Code Outline panel. Consider the following Arbori-Nashorn code:

include "std.arbori"

Prelude: runOnce -> {
   var panel = struct.get("outlinePanel");
   var button = new javax.swing.JButton("<html><b>My button!");
   var listener = new java.awt.event.ActionListener(
      function () {
         print('OK');
      }   
   );
   button.addActionListener(listener);
   panel.getToolbar().add(button);
   panel.getToolbar().revalidate();
}

At line #4 we have obtained the handle to the Code Outline panel. Then, we have instantiated a button, and added a listener to it. The final two steps were adding the button to the panel’s toolbar, and refreshing the GUI to render it.

The second example is little more sophisticated. Displaying the full parse tree is problematic, because it has leaf node for every lexical token. The total number of parse tree nodes can be estimated to be multiplied by the average tree depth — roughly the logarithm of the number of tokens. Given that each line of the code editor usually fits several tokens, it is evident that the parse tree fails to provide birds-eye view onto the large file. If you don’t follow this idea, then just compare the sizes of the scrollers for moderately sized input, at the SQL Worksheet and Code Outline panel.

Can we collapse parse tree nodes selectively? The Code Outline toolbar features a built-in menu button to collapse the nodes with certain payload. What if we want to collapse the nodes with ad-hock criteria? For example, given a SQL query with nested subqueries/inline views, can we have all the clauses collapsed?

Here is Arbori program that does it:

include "std.arbori"

Prelude: runOnce -> {
   var panel = struct.get('outlinePanel');
   var tree = panel.tree;
   print('before: tree.getRowCount()='+tree.getRowCount());
   for( i = 1; i < tree.getRowCount(); i++ ) 
        tree.expandRow(i);    
}

"To collapse": [node) select_clause
        | [node) from_clause
        | [node) where_clause
        | [node) group_by_clause
        | [node) order_by_clause
-> {
    var node = tuple.get("node");
    var i;
    for( i = 1; i < tree.getRowCount(); i++ ) {
        var path = tree.getPathForRow(i);
        var n = path.getLastPathComponent().getUserObject();
        if( n.from == node.from && n.to == node.to  )
            tree.collapseRow(i);
    }
}

Finale: runOnce -> {
    print("after: tree.getRowCount()="+tree.getRowCount());    
}

There are three steps:

  1. Prelude, where we get a handle to the outline tree, log the number of rows, and expand all of them
  2. “To Collapse”, where we identify the nodes of interest, and collapse them
  3. Finale, where we log the size of the collapsed tree

Here is an unwieldy SQL query scrolling dozens of pages, which parse tree is collapsed to quite succinct overview:

Java Script Conditions

Arbori is diminutive language with grammar fitting into couple of pages. It turns out that it still features some redundancies. Consider the following input

SELECT Ord.SalesOrderID, Ord.OrderDate,
    (SELECT *
     FROM Sales.SalesOrderDetail 
     WHERE Ord.SalesOrderID = OrdDet.SalesOrderID) 
From Sales.SalesOrderHeader;

and the program

Matches: ?node = 'FROM'
;

What is the expected behavior of matching node source text, is it case sensitive? In particular, would there be a single match in the above example, or the two?

I was never satisfied with Arbori question mark syntax, and suggest a better alternative: Java Script condition. It is a function which returns a boolean, which should be declared in some Nashorn block, and can be used as Arbori condition. Let’s define a string matching condition in the above example.

include "std.arbori"

Prelude: runOnce -> {
    function nodeContent(nodeVar) {
        var node = tuple.get(nodeVar);
        return target.input.substring(
            target.src[node.from].begin,
            target.src[node.to-1].end
        );
    }
    function stringMatch() {
        print(tuple)
        return 'FROM'== nodeContent('node')
    }
}

MatchesFrom: .stringMatch
      & [node < node) 
;

There we have defined the two Java Script functions: the nodeContent, which cuts a text fragment from the input under a recognized parse node, and the stringMatch, which uses it to perform the actual comparison. This is the second article where I invoke the nodeContent function, which hints that this function better be included into the standard library std.arbori.

The purpose of the Prelude predicate is just the declaration of the stringMatch. This function, returning a boolean, can be used as condition in the following predicate — the MatchesFrom. It is tempting to define this predicate simply as

MatchesFrom: .stringMatch
;

but this wouldn’t work. This query would have no attributes, which is not a problem itself. Unlike SQL, Arbori admits relations with empty set of attributes. However, even if this query were succeeded somehow, it would be impossible to extract any attribute from the tuples, because there aren’t any! The trick here is just adding an attribute named node. Please also note that the Java script implementation of the stringMatch actually extracts the value of the attribute node from every tuple. If there were no such attribute in the MatchesFrom Arbori query, then it would raise a runtime error!

In Arbori there is no analog of Relational Algebra project operator, AKA SQL select. The query attributes are inferred implicitly from the conditions. Therefore, we could amend the query with a condition like

[node) identifier

but this condition would filter out all the keywords. Therefore, the bogus condition

 [node < node) 

which asserts that the beginning position of the node interval is less than the node’s end. Let’s execute the program:

This screenshot highlights the two points. First, witness only one match in the input. Second, the print statement inside the stringMatch implementation is useful for debugging purposes. Java console panel at the left demonstrates that the stringMatch condition has been evaluated at every parse tree node.

Customizing SQL Execution

Contemporary IDE strength is defined not only by advanced and flawless functionality, but also by its openness. How open is SQLDeveloper? Yes, you can write extensions in some cumbersome XML syntax, or even plug-in custom java classes, but you can do better via Arbori Nashorn scripting. To illustrate this point I’m planning to write a series of essays demonstrating how to solve problems. I’ll start with a problem which solution works in SQLDeveloper 19.4. The later essays would focus on more impressive customizations supported by soon to be released 20.2.

Suppose we are given a PL/SQL anonymous block with bind variables

begin
:ret := plus(:arg1,:arg2);
end;

how can we execute it? Certainly, this task looks trivial: click run on the plus stored procedure in the navigator tree. Well, this is not exactly the solution, because you wouldn’t be executing the given PL/SQL block; you would be running a generated wrapper block. And if you try to run this block from SQL Worksheet then you will have to figure out how to supply and retrieve bind variables.

You may argue that there is print command in SQL Plus, so that you can channel the function return value through it. Yes, that is kind of a solution, but it is not entirely satisfactory for several reasons. First, SQL*Plus is just one RDBMS client among many, with fading ubiquitousness along the passage of time. Count the number of questions for “jdbc” vs. “sqlplus” on stackoverflow, if you are not convinced. Second, “run via sqlplus” gives you very little flexibility in how can you run this anonymous PL/SQL block, or how would you scale up your solution in more challenging situations.

Let’s run it in Arbori. The first step would be referring to the standard library, where you’ll find many useful predicates and Java Script functions:

include "std.arbori"

For example, it defines the predicate runOnce, such that that it has only one tuple. Consequently, the Nashorn block that follows the next Arbori rule — prelude — is run once, thereby performing initialization:

prelude: runOnce -> {
    var binds = [];
    function nodeContent(nodeVar) {
        var node = tuple.get(nodeVar);
        return target.input.substring(
            target.src[node.from].begin,
            target.src[node.to-1].end
        );
    }
}

There we allocate an array that will collect all the bind variables found by Arbori query, and define a utility function that cuts the fragment of the text recognized under a parse tree node.

In the next section we run Arbori query that searches for all bind variables and collect them in the previously defined binds array:

binds: [node) bind_var -> {
    binds.push(nodeContent('node'));
}

As usual, to know the grammar symbol to look for, that is bind_var in this example, you need to examine parse tree:

After execution it is instructive to iterate through the result set confirming that the binds query indeed found all the relevant parse tree nodes.

Now that we know all the bind variables, we are ready to execute the anonymous PL/SQL block:

run: [input) block_stmt  -> {
    var  sql = nodeContent('input');    
    print('executing: '+sql);
    
    var ConnectionResolver = Java.type('oracle.dbtools.db.ConnectionResolver');  
    var connectionList = ConnectionResolver.getConnectionNames();
    var conn = ConnectionResolver.getConnection('IdeConnections%23local_hr');
    var cs = conn.prepareCall(sql);
    cs.registerOutParameter(1,java.sql.Types.NUMERIC);
  
    var JOptionPane = Java.type('javax.swing.JOptionPane');  
    for( i = 1; i < binds.length; i++ ) {
        var bindVar=JOptionPane.showInputDialog(binds[i],0);
        var intBind = java.lang.Integer.parseInt(bindVar);
        print(binds[i]+'='+intBind)
        cs.setString(i+1, bindVar);    
    }
    cs.executeUpdate();
    JOptionPane.showConfirmDialog(null,'result='+cs.getInt(1));
    cs.close();
}

Before walking through the last code section, lets witness its working. In the screenshot below user is prompted for input variable names:

The result will be shown also in a dialog, but numerous other options to present it are certainly possible.

Here is the line-by-line explanation of the run query. At line #1 there is basic Arbori query searching for the parse tree nodes labeled as block_stmt. This is where the entire begin...end text fragment would be recognized, and with the Nashorn actions fired at every found tuple. The Nashorn action starting at line #2 constitutes the bulk of the code. There we extract the PL/SQl block as a text, and print it to the console at the next line for debugging purpose.

At lines #5,6,7 we leverage SQL Developer ConnectionResolver, so that we can get connection info from the list of connections that you are already have. Obtaining connection via the standard JDBC API is also possible, although little more verbose with hard coded connection credentials.

At lines #8,9 we prepare the call and register the output bind variable. Next comes the loop at lines #12-17 where we iterate through the array of bind variables, ask for their values as captured on the previous screenshot, and set them in the prepared statement.

After setting all the input bind variables we are ready to execute the statement at line #18, and fire the dialog at the next line showing the function output.

To many readers this may look like cumbersome reinventing the wheel. What is the point of reimplementing the existing functionality? Am I implying that the user has to write couple of pages of Arbori code in order to do some basic task? Not necessarily, as you might perfectly reuse the code written by somebody else. The arbori file extension would be recognized in version 20.2, so all you need is a folder/library of already written Arbori programs, and you open them just as naturally as SQL scripts.

Arbori Semantic Actions

Arbori is a tidy query language with relatively limited expressiveness. This constrains its applications, such as SQL Formatter. The arithmetic in Arbori is virtually non-existent; therefore, is it possible, for example, to break procedure/function argument list after every 5th argument?

One approach would have been extending Arbori language with extra capabilities. This direction however is less than desirable, because requiring users to learn new language with concise syntax is challenging proposition by itself. A new language with extensive cumbersome syntax is a nonstarter.

Alternatively, one can piggyback onto some popular general purpose programming language, such as Java Script. This is the direction followed by SQL Developer version 19.2. The “Output queries” section at page 21 of the arbori reference guide describes how this newly introduced JS action is attached to Arbori query.

In this post we’ll provide one more example: Custom Syntax Coloring of TO_DATE built-in SQL function called with wrong number of arguments. Employing the Arbori query from that forum thread is performed in 5 steps:

1. Append the Arbori query

WrongToDateNumOfArgs:
( [node^) function_call
| [node^) datetime_literal
| [node^) function_expression
) & ?node = 'TO_DATE'
& ([arg) expr
| [arg) string_literal
| [arg) pls_expr )
&  arg^-1 = node
& [arg+1) ')'
& [arg-1) '('
;

at the bottom of the editor pane at the Preferences -> Code Editor -> PL/SQL Syntax Colors -> PL/SQL Custom Syntax Rules

2. Confirm your edit with OK, restart SQLDev. Saving Arbori predicate which is not equipped with JS callback action is a workaround for the bug reported in version 21.2.

3. At the parent preference panel Preferences -> Code Editor -> PL/SQL Syntax Colors you should be able to see the newly introduced style:color

4. Add JS semantic action:

WrongToDateNumOfArgs:
( [node^) function_call
| [node^) datetime_literal
| [node^) function_expression
) & ?node = 'TO_DATE'
& ([arg) expr
| [arg) string_literal
| [arg) pls_expr )
&  arg^-1 = node
& [arg+1) ')'
& [arg-1) '('
->{
   var node = tuple.get("node");
   struct.addStyle(target, node, "WrongToDateNumOfArgs");
}

5. Witness its working:custom

Custom Syntax Coloring

SqlDeveloper 18.3 features one more application of Semantic Analysis based upon Arbori parse tree query language. It provides much easier venue to mastering Arbori, compared to formatting (which is ubiquitous, but grew to almost 800 conditions aggregated into more than 30 rules).

First, there are 3 additional PL/SQL color styles, PlsqlCustom1-3:

styles.png There I have changed the default gray color for PlsqlCustom1 to more eye-catching pinkish highlight. This color would be appropriate for what we’ll do in the very first exercise — highlighting all the EXCEPTION keywords in PL/SQL code. The template is already there, all you need to do is activating the commented PlsqlCustom1 rule:

custom.png

Let’s examine the rule in detail:

PlSqlCustom1:          
   [node) 'EXCEPTION'  
->                  
; 

The rule header – PlSqlCustom1 – matches the name of the syntax style from the parent – Code Editor: PL/SQL Syntax Colors – preference page.

Consequently, PL/SQL code fragments identified by this rule are highlighted according to the chosen style.

The colon separates the rule header from the rule body. In general, a rule body specifies some logical criteria for PL/SQL code selection. The code selection method is parse driven: as soon as some nodes of interest are identified on the parse tree, the text covered by that node is subject to the rule’s action. In our example, the condition is

[node) 'EXCEPTION'

and it chooses all the nodes in the parse tree which payload is the EXCEPTION keyword. The “node” is the rule’s attribute name and the rule action expects a variable with such name verbatim. The rule attribute names are wrapped with the open bracket on the left and closed parenthesis on the right. This syntax is warranted by the convention to identify parse tree nodes with semi open segments specifying the text fragment boundaries.

The arrow  at the next line prompts that this rule triggers some action, namely the custom syntax highlighting. This is opposed to auxiliary rules, which can be leveraged in the other rules in the spirit similar to how views and CTEs are used in SQL.

The final semicolon demarcates the end of the rule.

The second rule – PlSqlCustom2 – is enabled by default, and its logic is documented at the help page.

Lets try something little more ambitious and try to highlight all the variables which are not defined. We can either add the PlSqlCustom3 rule, or start with scratch and repurpose the PlSqlCustom1. Here is the first attempt:

PlSqlCustom1: 
[node) identifier 
& ![node)decl_id 
& ![node-1) '.'
& ![node+1) '.'
& ![node^) pkg_body
& ![node^) subprg_spec
& ![node^) subprg_body
& ![node^) procedure_call
->
;

identifiers.pngWithout the negative conditions many more identifiers would have been highlighted as indicated by those red arrows. How would you know what grammar symbol to target? This can only be accomplished when examining the parse tree. For example, p_param1_todo declaration would have been highlighted because it is formally an identifier. However, it is also a decl_id. The reader is encouraged try commenting some of those negative conditions to see what is the effect at the PL/SQL code sample panel.

Why the raise has been highlighted, it is certainly not an identifier? This is a bug: the combined SQL&PL/SQL grammar is highly ambiguous, and the parser recognized the statement as a function call.

The second step is highlighting all variable declarations. We still want to keep the previous rule for later, this is why it is renamed, and its semantic action is removed:

decl.png

The final step is excluding those declarations from all identifier entries. However, it is little more subtle than that, and we can’t formally apply the subset difference. The identifiers and their declarations are always different nodes, and we want to match them by name. For that purpose, we rename the node attribute to the decl and introduce the two-attribute predicate matches(node, decl). Then, the final query is a set difference of all the ids minus matches (subtracting relations of different signature is legitimate in Arbori):

final.png

A little more refined version of the Arbori query for undefined variables:

ids:    
     [node) identifier
   & ([node) name | [node) arg) 
   & ![node+1) '.'
   & ![node^) procedure_call
   & (![node+1) '=' | ![node+2) '>')
   & ! ?node='true'
   & ! ?node='false'
   & ! ?node='sqlerrm'
; 

decls:     
     [decl) decl_id 
    & ([decl^) basic_d | [decl^) prm_spec | [decl^) subprg_spec)
|  [decl) identifier & [decl^) cursor_d 
; 

matches: ?ids.node = ?decls.decl
; 

PlSqlCustom1: ids - matches
->
;

This solution is only approximate, because no effort has been made to identify the variable scope. The ancestor-descendant relationship between the parse nodes is expressible in Arbori, and you can find numerous examples of its usage at the Custom Format preference page. However, unlike formatting, syntax highlighting is something triggered by every user keystroke. Arbori queries involving ancestor-descendant are slightly less efficient than simple traverse of parse tree… use it with caution.

Version 19.2 amendments

Since version 19.2 you are no longer limited with 3 rules and the fixed rule naming scheme.