Java Script Conditions

May 29, 2020

Arbori is diminutive language with grammar fitting into couple of pages. It turns out that it still features some redundancies. Consider the following input

SELECT Ord.SalesOrderID, Ord.OrderDate,
    (SELECT *
     FROM Sales.SalesOrderDetail 
     WHERE Ord.SalesOrderID = OrdDet.SalesOrderID) 
From Sales.SalesOrderHeader;

and the program

Matches: ?node = 'FROM'
;

What is the expected behavior of matching node source text, is it case sensitive? In particular, would there be a single match in the above example, or the two?

I was never satisfied with Arbori question mark syntax, and suggest a better alternative: Java Script condition. It is a function which returns a boolean, which should be declared in some Nashorn block, and can be used as Arbori condition. Let’s define a string matching condition in the above example.

include "std.arbori"

Prelude: runOnce -> {
    function nodeContent(nodeVar) {
        var node = tuple.get(nodeVar);
        return target.input.substring(
            target.src[node.from].begin,
            target.src[node.to-1].end
        );
    }
    function stringMatch() {
        print(tuple)
        return 'FROM'== nodeContent('node')
    }
}

MatchesFrom: .stringMatch
      & [node < node) 
;

There we have defined the two Java Script functions: the nodeContent, which cuts a text fragment from the input under a recognized parse node, and the stringMatch, which uses it to perform the actual comparison. This is the second article where I invoke the nodeContent function, which hints that this function better be included into the standard library std.arbori.

The purpose of the Prelude predicate is just the declaration of the stringMatch. This function, returning a boolean, can be used as condition in the following predicate — the MatchesFrom. It is tempting to define this predicate simply as

MatchesFrom: .stringMatch
;

but this wouldn’t work. This query would have no attributes, which is not a problem itself. Unlike SQL, Arbori admits relations with empty set of attributes. However, even if this query were succeeded somehow, it would be impossible to extract any attribute from the tuples, because there aren’t any! The trick here is just adding an attribute named node. Please also note that the Java script implementation of the stringMatch actually extracts the value of the attribute node from every tuple. If there were no such attribute in the MatchesFrom Arbori query, then it would raise a runtime error!

In Arbori there is no analog of Relational Algebra project operator, AKA SQL select. The query attributes are inferred implicitly from the conditions. Therefore, we could amend the query with a condition like

[node) identifier

but this condition would filter out all the keywords. Therefore, the bogus condition

 [node < node) 

which asserts that the beginning position of the node interval is less than the node’s end. Let’s execute the program:

This screenshot highlights the two points. First, witness only one match in the input. Second, the print statement inside the stringMatch implementation is useful for debugging purposes. Java console panel at the left demonstrates that the stringMatch condition has been evaluated at every parse tree node.

Contemporary IDE strength is defined not only by advanced and flawless functionality, but also by its openness. How open is SQLDeveloper? Yes, you can write extensions in some cumbersome XML syntax, or even plug-in custom java classes, but you can do better via Arbori Nashorn scripting. To illustrate this point I’m planning to write a series of essays demonstrating how to solve problems. I’ll start with a problem which solution works in SQLDeveloper 19.4. The later essays would focus on more impressive customizations supported by soon to be released 20.2.

Suppose we are given a PL/SQL anonymous block with bind variables

begin
:ret := plus(:arg1,:arg2);
end;

how can we execute it? Certainly, this task looks trivial: click run on the plus stored procedure in the navigator tree. Well, this is not exactly the solution, because you wouldn’t be executing the given PL/SQL block; you would be running a generated wrapper block. And if you try to run this block from SQL Worksheet then you will have to figure out how to supply and retrieve bind variables.

You may argue that there is print command in SQL Plus, so that you can channel the function return value through it. Yes, that is kind of a solution, but it is not entirely satisfactory for several reasons. First, SQL*Plus is just one RDBMS client among many, with fading ubiquitousness along the passage of time. Count the number of questions for “jdbc” vs. “sqlplus” on stackoverflow, if you are not convinced. Second, “run via sqlplus” gives you very little flexibility in how can you run this anonymous PL/SQL block, or how would you scale up your solution in more challenging situations.

Let’s run it in Arbori. The first step would be referring to the standard library, where you’ll find many useful predicates and Java Script functions:

include "std.arbori"

For example, it defines the predicate runOnce, such that that it has only one tuple. Consequently, the Nashorn block that follows the next Arbori rule — prelude — is run once, thereby performing initialization:

prelude: runOnce -> {
    var binds = [];
    function nodeContent(nodeVar) {
        var node = tuple.get(nodeVar);
        return target.input.substring(
            target.src[node.from].begin,
            target.src[node.to-1].end
        );
    }
}

There we allocate an array that will collect all the bind variables found by Arbori query, and define a utility function that cuts the fragment of the text recognized under a parse tree node.

In the next section we run Arbori query that searches for all bind variables and collect them in the previously defined binds array:

binds: [node) bind_var -> {
    binds.push(nodeContent('node'));
}

As usual, to know the grammar symbol to look for, that is bind_var in this example, you need to examine parse tree:

After execution it is instructive to iterate through the result set confirming that the binds query indeed found all the relevant parse tree nodes.

Now that we know all the bind variables, we are ready to execute the anonymous PL/SQL block:

run: [input) block_stmt  -> {
    var  sql = nodeContent('input');    
    print('executing: '+sql);
    
    var ConnectionResolver = Java.type('oracle.dbtools.db.ConnectionResolver');  
    var connectionList = ConnectionResolver.getConnectionNames();
    var conn = ConnectionResolver.getConnection('IdeConnections%23local_hr');
    var cs = conn.prepareCall(sql);
    cs.registerOutParameter(1,java.sql.Types.NUMERIC);
  
    var JOptionPane = Java.type('javax.swing.JOptionPane');  
    for( i = 1; i < binds.length; i++ ) {
        var bindVar=JOptionPane.showInputDialog(binds[i],0);
        var intBind = java.lang.Integer.parseInt(bindVar);
        print(binds[i]+'='+intBind)
        cs.setString(i+1, bindVar);    
    }
    cs.executeUpdate();
    JOptionPane.showConfirmDialog(null,'result='+cs.getInt(1));
    cs.close();
}

Before walking through the last code section, lets witness its working. In the screenshot below user is prompted for input variable names:

The result will be shown also in a dialog, but numerous other options to present it are certainly possible.

Here is the line-by-line explanation of the run query. At line #1 there is basic Arbori query searching for the parse tree nodes labeled as block_stmt. This is where the entire begin...end text fragment would be recognized, and with the Nashorn actions fired at every found tuple. The Nashorn action starting at line #2 constitutes the bulk of the code. There we extract the PL/SQl block as a text, and print it to the console at the next line for debugging purpose.

At lines #5,6,7 we leverage SQL Developer ConnectionResolver, so that we can get connection info from the list of connections that you are already have. Obtaining connection via the standard JDBC API is also possible, although little more verbose with hard coded connection credentials.

At lines #8,9 we prepare the call and register the output bind variable. Next comes the loop at lines #12-17 where we iterate through the array of bind variables, ask for their values as captured on the previous screenshot, and set them in the prepared statement.

After setting all the input bind variables we are ready to execute the statement at line #18, and fire the dialog at the next line showing the function output.

To many readers this may look like cumbersome reinventing the wheel. What is the point of reimplementing the existing functionality? Am I implying that the user has to write couple of pages of Arbori code in order to do some basic task? Not necessarily, as you might perfectly reuse the code written by somebody else. The arbori file extension would be recognized in version 20.2, so all you need is a folder/library of already written Arbori programs, and you open them just as naturally as SQL scripts.

Arbori Semantic Actions

August 1, 2019

Arbori is a tidy query language with relatively limited expressiveness. This constrains its applications, such as SQL Formatter. The arithmetic in Arbori is virtually non-existent; therefore, is it possible, for example, to break procedure/function argument list after every 5th argument?

One approach would have been extending Arbori language with extra capabilities. This direction however is less than desirable, because requiring users to learn new language with concise syntax is challenging proposition by itself. A new language with extensive cumbersome syntax is a nonstarter.

Alternatively, one can piggyback onto some popular general purpose programming language, such as Java Script. This is the direction followed by SQL Developer version 19.2. The “Output queries” section at page 21 of the arbori reference guide describes how this newly introduced JS action is attached to Arbori query.

In this post we’ll provide one more example: Custom Syntax Coloring of TO_DATE built-in SQL function called with wrong number of arguments. Employing the Arbori query from that forum thread is performed in 5 steps:

1. Append the Arbori query

WrongToDateNumOfArgs:
( [node^) function_call
| [node^) datetime_literal
| [node^) function_expression
) & ?node = 'TO_DATE'
& ([arg) expr
| [arg) string_literal
| [arg) pls_expr )
&  arg^-1 = node
& [arg+1) ')'
& [arg-1) '('
;

at the bottom of the editor pane at the Preferences -> Code Editor -> PL/SQL Syntax Colors -> PL/SQL Custom Syntax Rules

2. Add JS semantic action:

WrongToDateNumOfArgs:
( [node^) function_call
| [node^) datetime_literal
| [node^) function_expression
) & ?node = 'TO_DATE'
& ([arg) expr
| [arg) string_literal
| [arg) pls_expr )
&  arg^-1 = node
& [arg+1) ')'
& [arg-1) '('
->{
   var node = tuple.get("node");
   struct.addStyle(target, node, "WrongToDateNumOfArgs");
};

3. Confirm your edit with OK, restart SQLDev

4. At the parent preference panel Preferences -> Code Editor -> PL/SQL Syntax Colors you should be able to see the newly introduced style:color

5. Witness its working:custom

Custom Syntax Coloring

October 10, 2018

SqlDeveloper 18.3 features one more application of Semantic Analysis based upon Arbori parse tree query language. It provides much easier venue to mastering Arbori, compared to formatting (which is ubiquitous, but grew to almost 800 conditions aggregated into more than 30 rules).

First, there are 3 additional PL/SQL color styles, PlsqlCustom1-3:

styles.png There I have changed the default gray color for PlsqlCustom1 to more eye-catching pinkish highlight. This color would be appropriate for what we’ll do in the very first exercise — highlighting all the EXCEPTION keywords in PL/SQL code. The template is already there, all you need to do is activating the commented PlsqlCustom1 rule:

custom.png

Let’s examine the rule in detail:

PlSqlCustom1:          
   [node) 'EXCEPTION'  
->                  
; 

The rule header – PlSqlCustom1 – matches the name of the syntax style from the parent – Code Editor: PL/SQL Syntax Colors – preference page.

Consequently, PL/SQL code fragments identified by this rule are highlighted according to the chosen style.

The colon separates the rule header from the rule body. In general, a rule body specifies some logical criteria for PL/SQL code selection. The code selection method is parse driven: as soon as some nodes of interest are identified on the parse tree, the text covered by that node is subject to the rule’s action. In our example, the condition is

[node) 'EXCEPTION'

and it chooses all the nodes in the parse tree which payload is the EXCEPTION keyword. The “node” is the rule’s attribute name and the rule action expects a variable with such name verbatim. The rule attribute names are wrapped with the open bracket on the left and closed parenthesis on the right. This syntax is warranted by the convention to identify parse tree nodes with semi open segments specifying the text fragment boundaries.

The arrow  at the next line prompts that this rule triggers some action, namely the custom syntax highlighting. This is opposed to auxiliary rules, which can be leveraged in the other rules in the spirit similar to how views and CTEs are used in SQL.

The final semicolon demarcates the end of the rule.

The second rule – PlSqlCustom2 – is enabled by default, and its logic is documented at the help page.

Lets try something little more ambitious and try to highlight all the variables which are not defined. We can either add the PlSqlCustom3 rule, or start with scratch and repurpose the PlSqlCustom1. Here is the first attempt:

PlSqlCustom1: 
[node) identifier 
& ![node)decl_id 
& ![node-1) '.'
& ![node+1) '.'
& ![node^) pkg_body
& ![node^) subprg_spec
& ![node^) subprg_body
& ![node^) procedure_call
->
;

identifiers.pngWithout the negative conditions many more identifiers would have been highlighted as indicated by those red arrows. How would you know what grammar symbol to target? This can only be accomplished when examining the parse tree. For example, p_param1_todo declaration would have been highlighted because it is formally an identifier. However, it is also a decl_id. The reader is encouraged try commenting some of those negative conditions to see what is the effect at the PL/SQL code sample panel.

Why the raise has been highlighted, it is certainly not an identifier? This is a bug: the combined SQL&PL/SQL grammar is highly ambiguous, and the parser recognized the statement as a function call.

The second step is highlighting all variable declarations. We still want to keep the previous rule for later, this is why it is renamed, and its semantic action is removed:

decl.png

The final step is excluding those declarations from all identifier entries. However, it is little more subtle than that, and we can’t formally apply the subset difference. The identifiers and their declarations are always different nodes, and we want to match them by name. For that purpose, we rename the node attribute to the decl and introduce the two-attribute predicate matches(node, decl). Then, the final query is a set difference of all the ids minus matches (subtracting relations of different signature is legitimate in Arbori):

final.png

A little more refined version of the Arbori query for undefined variables:

ids:    
     [node) identifier
   & ([node) name | [node) arg) 
   & ![node+1) '.'
   & ![node^) procedure_call
   & (![node+1) '=' | ![node+2) '>')
   & ! ?node='true'
   & ! ?node='false'
   & ! ?node='sqlerrm'
; 

decls:     
     [decl) decl_id 
    & ([decl^) basic_d | [decl^) prm_spec | [decl^) subprg_spec)
|  [decl) identifier & [decl^) cursor_d 
; 

matches: ?ids.node = ?decls.decl
; 

PlSqlCustom1: ids - matches
->
;

This solution is only approximate, because no effort has been made to identify the variable scope. The ancestor-descendant relationship between the parse nodes is expressible in Arbori, and you can find numerous examples of its usage at the Custom Format preference page. However, unlike formatting, syntax highlighting is something triggered by every user keystroke. Arbori queries involving ancestor-descendant are slightly less efficient than simple traverse of parse tree… use it with caution.

Version 19.2 amendments

Since version 19.2 you are no longer limited with 3 rules and the fixed rule naming scheme.

 

Formula for Formatting

September 28, 2017

SQL Developer release 17.3 is out with formatter amendments warranting an explanation. The fundamentals didn’t change: the formatting mechanism is still based on parse tree analysis, which is implemented via Arbori — parse tree query language. New for 17.3 is more intuitive rule structure and better coordination of rules with formatting options. Still, rather than describing 17.3 implementation increments, I’d try presenting a self-contained description of Arbori formatting engine.

We start with indentation, arguably, the most important formatting feature. Consider the following exampleindents.png

The first indent (at level = 0) is applied to the list of attributes in the select clause and the list of tables in the from clause in the main query block. The second offset (at level = 1) is applied to the inner view in parenthesis. The third offset is applied to the select term and the table in the inner view.

How is this specified formally?  We have to examine parse tree:

parseTreeIndents.png

The parse tree nodes, which we want to indent, have grammar symbols (such as the select_list) encircled in green. Unfortunately, there is another, nested, select_list occurrence, marked in red, which labels the parse tree node that we don’t want to indent. We just have to refine our condition that we are after the parse tree nodes labeled with select_list whose parent payload is select_clause!

Formally, all indentation formatting conditions are gathered in the rule named simpleIndentConditions:

simpleIndentConditions.pngThe operations “|” and “&” are logical disjunction and conjunction, correspondingly, so that the entire simpleIndentCondition rule is a monumental DNF. The conjunctions are listed alphabetically; therefore, we find the condition of interest for the nodes labeled with the select_clause buried somewhere in the middle:

selectListConditions.png

The condition consists of the 3 parts:

  • Nodes, labeled with the select_list
  • Parent node (formally “node^“) labeled with select_clause
  • Boolean bind variable :breaksAfterSelectFromWhere

The value of this bind variable is carried over from the Advanced Format preference page:

advancedFormat.png

Bind variables provide a vehicle for this “wall of options” to be seamlessly integrated into the formatter. Flipping the Line Breaks->SELECT/FROM/WHERE option off would have set :breaksAfterSelectFromWhere to false, while flipping it back again would have set :breaksAfterSelectFromWhere = true.  Consequently, with the condition :breaksAfterSelectFromWhere = true, the conjunction would reduce to just

|  [node) select_list & [node^) select_clause

while :breaksAfterSelectFromWhere = false would have effectively eliminating this conjunction from the simpleIndentCondition rule.

Let’s do this experiment, flipping the Line Breaks->SELECT/FROM/WHERE option off. People who requested this option are usually not fond of formatting style with excessive breaks. While at it, let’s switch off breaks on subqueries as well. The result

alignmentsExample.png

warrants an explanation. The first snag is the failure to align the select terms “2” and “3” to the predecessor. Somehow a space has been lost… The other alignments look OK. However, the alignments bring us beyond simple indentation model that I have described earlier. Once again, :breaksAfterSelectFromWhere is set to false, so it have effectively nullified the  condition to indent the select term in the simpleIndentCondition rule. How did it format the select terms (and other items)?

There we have two new rules for line breaks and alignments. The rule for line breaks consists of the two parts, covering conditions when to insert line break before a parse node:

extraBrkBefore.png

and after. Here we see that if a user have chosen an option to have line breaks before the comma, then the line break will be inserted before a comma, which is followed by a select_term.  This is not the condition that inserted line breaks before the select terms 2, and 3 in our example. The following one did:

extraBrkAfter.png

Being able to control how to inserting line breaks solves half of the problem. We also need to specify alignments:

pairwiseAlignments.png

The alignment rule is little more sophisticated than what we have considered before. It involves two variables: which node we want to align, and the target parse tree node, named “predecessor” which we want to align it to. There are several ways the “predecessor” node is related to the aligned node, hence the alignment rule has been split in several parts. The first rule, called pairwiseAlignments1, defines the predecessor node as the younger sibling of the aligned node, formally: predecessor=node-1. This is exactly how from_clause is aligned to the select_clause. The second rule, called pairwiseAlignments2, defines the predecessor node as the younger sibling of the younger sibling of the aligned node, formally: predecessor=node-1-1. This is how individual select terms are aligned. Here is the parse tree to convince the reader how in our example the select term “2” is aligned with the select list “1”:

pairwiseAlignments3.png

Likewise, the select term “3” is aligned with the select list “1,2”:

pairwiseAlignments2.png

 

 

Arbori; the missing manuals

February 11, 2017

Number of lines of code is well known to be poor metric for virtually any purpose. One thing is indisputable though: the total volume of code is growing. Today, developers are frequently challenged with mining huge codebases. Therefore, querying parse trees is not just an academic exercise, but something of ever increasing practical value.

Blog post, though, is hardly a proper venue to introduce a new language. Here are two references to supplement it:

  1. Arbori Starter Manual by Eugene Perkov
  2. Little more detailed Semantic Analysis with Arbori

Formatting is one of the software development features that always steers controversy. To put it bluntly, everybody wants to format the code their own way. In previous SQL Dev versions the answer to that challenge was having elaborate tree-table widget spiced with several dozens of various options.

Arguably, the combined SQL and PL/SQL grammar is the most cumbersome programming language in existence, with total number of grammar productions exceeding 20000. From that perspective, the list of “Advanced Format” options offered by SQL Dev feels inadequate. To address this mismatch, the latest release introduced “Custom Format”. It offers to a developer means to specify formal conditions which SQL and PL/SQL grammatical constructs to indent,  align, pad, add extra line breaks and so on.

As usual, the best way to demonstrate a new feature is an example. Consider the following enhancement request by Stefan Poschenrieder to remove the indent of the NOT NULL ENABLE inline constraint in the following code

CREATE TABLE dim_aspect (
    dim_aspectid       NUMBER(11,0)
        NOT NULL ENABLE,
    source             VARCHAR2(50),
    hier1_desc         VARCHAR2(50),
    hierarchy1         VARCHAR2(50),
    dwh_inserteddate   DATE
);

The first step is identifying the target grammar symbol in the Code Outline panel. Here is screen snapshot illustrating it:
code outline.png
As illustrated there, positioning a cursor over the NOT keyword highlights the corresponding node in the parse tree. Therefore, the grammar symbol of interest is probably “inline_constraint“. Clicking onto the node labeled inline_constraint to double check this guess witnesses it highlighting “NOT NULL ENABLE” in the code editor. (Please be aware that clicking onto a hyperlinked looking grammar symbol has side effect of opening documentation page featuring that symbol railroad diagram definition).

As soon as the grammar symbol of interest is known, lets find it at Custom Format preferences panel. Admittedly, glaring omission there is the search widget. Yet, grammar symbols are arranged in lexicographic order, so that a reader would have no trouble spotting inline_constraint in the following formatting rule:

...
| [node) index_subpartition_clause[69,119)# 
| [node) inline_constraint & ![node) inline_constraint[14,67)
| [node) inline_ref_constraint
...

This fragment of rather convoluted disjunctive condition instructs the formatter to indent the nodes with payload “index_subpartition_clause[69,119)#“, or the nodes labeled with “inline_constraint” (but not those that also labeled as “inline_constraint[14,67)“), or the nodes labeled “inline_ref_constraint“.  Therefore, commenting out the middle conjunction

...
| [node) index_subpartition_clause[69,119)# 
--| [node) inline_constraint & ![node) inline_constraint[14,67)
| [node) inline_ref_constraint
...

is likely a solution to the problem. After making this change, a reader is advised to test what effect do modified formatting rules have on a SQL or PL/SQL code sample in the preview editor. This action is performed with the bottom-left “Run” button. The two neighbors “Import…” and “Export…” are for saving and retrieving your work. By default the modified “format.arbori” file is kept in the SQLDev product preferences directory.

Let’s reinforce this custom formatting idea with two more examples, both from Steven Feuerstein. In both cases we’ll learn some additional syntax of that formal formatting rules specification. The first example is just a bug witnessed by the following test case:

CREATE OR REPLACE PACKAGE BODY plch_pkg IS BEGIN
   FOR indx IN 1..100000 LOOP
      g_strings(    
         indx       --<--- ???
      ) := 'String ' || indx;
   END LOOP;
END;
/

The rule responsible for that unwanted indentation of the “indx” parameter is

| [node) pls_expr & [node-1) '(' & ![node) numeric_literal

It requires the parse node to be labeled with “pls_expr” but not “numeric_literal“. Also, it stipulates that the prior sibling node (formally, “node-1“) is labeled with open parenthesis. Adding the condition for the node successor not to be the closing parenthesis

| [node) pls_expr & [node-1) '(' & ![node) numeric_literal & ![node+1) ')' 

is all it takes to fix this snag.

In the second example

CREATE PACKAGE fmt AS
    PROCEDURE create_checklist (
        user_id_in       IN INTEGER,
        question_id_in   IN INTEGER
    );
    --<-- wanted extra line break 
    PROCEDURE remove_checklist (
        checklist_id_in   IN INTEGER
    );

    PROCEDURE remove_checklist (
        user_id_in       IN INTEGER,
        question_id_in   IN INTEGER
    );

END fmt;

the formatted code have only single line breaks after each package member. The formatting rule responsible for this functionality is “extraLines“. It formally a disjunction of the two rules “sql_stmts” and “significant_statements“. In the later rule we see the following conjunctive condition:

| [node) basic_decl_item & [node+20 < node)

The syntax of first part is familiar: it requires the node to be labeled as “basic_decl_item“.

The second part is little more cumbersome. The “[ node” and “node )” refer to the node beginning and ending position, correspondingly. Each node in the parse tree recognizes a sequence of tokens. In our example nodes labeled as “basic_decl_item” recognizes a sequence of tokens beginning with the PROCEDURE keyword and ending with semicolon. The beginning position is the offset of the PROCEDURE keyword, while the ending position is the offset of the semicolon. The condition  “[node+20 < node)” requires more than 20 tokens between the node beginning and ending position, or informally the package member to be of “sufficient length”.

Please note that package data members are recognized as “basic_decl_item” as well. Therefore, the proper fix depends on if a user wants to inset double line breaks after data members or not. The length condition was a naive attempt on distinguishing data and procedures, but more elaborate condition, distinguishing data and procedures is possible too.

Let’s walk trough yet another example demonstrating how flexible it is. Consider the following example adapted from ToadWorld ER:

Can we vertically align THEN values of CASE statement ? Current Format:

SELECT CASE
         WHEN a.col5 = b.col5 THEN 'COL5'
         WHEN a.col55555555555555 = b.col55 THEN 'COL55555'
         WHEN a.col555555 = b.col55555 THEN 'COL555555'
       END AS VAL
FROM   table1;

Desired Format:

SELECT CASE
         WHEN a.col5 = b.col5                THEN 'COL5'
         WHEN a.col55555555555555 = b.col55  THEN 'COL55555'
         WHEN a.col555555   = b.col5555      THEN 'COL555555'
       END AS VAL
FROM   table1;

For the reference, the default formatting in SQLDeveloper looks less than desirable:

SELECT
        CASE WHEN
            a.col5 = b.col5
        THEN
            'COL5'
        WHEN
            a.col55555555555555 = b.col55
        THEN
            'COL55555'
        WHEN
            a.col555555 = b.col55555
        THEN
            'COL555555'
        END
    AS val
FROM
    table1

However, to satisfy this request it requires exactly 4 changes of the formatting program.

First, before taking on alignments, we should fix unwanted indentation. How about not indenting conditions a.col5 = b.col5 and some such? By examining the parse tree, we notice that the code fragment a.col5 = b.col5 is formally a condition. Therefore, it must be that the line

| [node) condition  &  [node-1) 'WHEN'

of the formatting program that is responsible for the effect. Comment (or delete it), and rerun the formatter to confirm this hypothesis:

SELECT
        CASE WHEN a.col5 = b.col5 THEN
            'COL5'
        WHEN a.col55555555555555 = b.col55 THEN
            'COL55555'
        WHEN a.col555555 = b.col55555 THEN
            'COL555555'
        END
    AS val
FROM
    table1;

What about not indenting expressions, like string literal ‘COL5’? Just comment out the following rule:

--| [node) expr  &  [node-1) 'THEN'

The effect of this change, however, is somewhat discouraging:

SELECT
        CASE WHEN a.col5 = b.col5 THEN 'COL5' WHEN a.col55555555555555 = b.col5 THEN 'COL55555' WHEN a.col555555 = b.col5 THEN 'COL555555' WHEN a.col55555555
 = b.col5 THEN 'COL5555555' END
    AS val
FROM
    table1;

Here we need to look onto parse tree yet again to identify what grammar symbol would indent the entire WHEN-THEN fragments. The symbol in question is the  searched_case_expression#, so that the missing indentation rule alternative is just

| [node) searched_case_expression#

With this amendments the sample code formats to:

SELECT
        CASE
            WHEN a.col5 = b.col5 THEN 'COL5'
            WHEN a.col55555555555555 = b.col55 THEN 'COL55555'
            WHEN a.col555555 = b.col5555 THEN 'COL555555'
        END
    AS val
FROM
    table1;

Please note that even though there is a set options for CASE expressions (which, again, is a concession to traditional format design), flipping those options won’t produce format as clean as we just did.

So far we made 3 custom formatting program changes . Now we are ready to take on alignments. Let’s first investigate how current  formatting rules align the equalities,  because reverse engineering similar functionality is the easiest way to master new skills. Here is the code:

/**
 * All alignments (paddings) ...
**/
paddedIdsInScope: (
-- types
  [id) identifier & [id+1) datatype & [scope) relational_properties
| [id) decl_id & ([id+1) prm_spec_unconstrained_type | [id+1) 'IN' | [id+1) 'OUT' ) & [scope) fml_part
| [id) decl_id & ([id+1) constrained_type | [id+1) object_d_rhs) & [scope) adt_definition
| [id) decl_id & ([id+1) constrained_type | [id+1) object_d_rhs) & [scope) decl_list
-- =>
| [id) sim_expr & [id+1) '=' & [id+1+1) '>' & [scope) paren_expr_list
-- :=
| [id) name & [id+1) ':' & [id+1+1) '=' & [scope) seq_of_stmts
-- =
| [id) column & [id+1) '=' & [id+1+1) expr & [scope) where_clause
| [id) column & [id+1) '=' & [id+1+1) expr & [scope) on_using_condition
) & scope < id ->

The first highlighted condition specify that we are interested in columns, followed by equality, followed by some expression. They would be chosen together with the containing where_clause. Therefore, if we choose columns within the CASE scope

| [id) column & [id+1) '=' & [id+1+1) expr & [scope) case_expression

then it would align the equalities (but not THEN expressions):

SELECT
        CASE
            WHEN a.col5                = b.col5 THEN 'COL5'
            WHEN a.col55555555555555   = b.col55 THEN 'COL55555'
            WHEN a.col555555           = b.col55555 THEN 'COL555555'
        END
    AS val
FROM
    table1;

The condition we are after is a slight variation of thereof:

| [id) condition & [id+1) 'THEN' & [id+1+1) expr & [scope) case_expression

which produces:

SELECT
        CASE
            WHEN a.col5 = b.col5               THEN 'COL5'
            WHEN a.col55555555555555 = b.col55 THEN 'COL55555'
            WHEN a.col555555 = b.col55555      THEN 'COL555555'
        END
    AS val
FROM
    table1;