vit_r: default (Default)
[personal profile] vit_r
This is a technical post. It contains the source code of a SQL prototype.

Click this link to switch into a correct DW style to read the text below. You must use wide enough monitor to understand the code.

I use PostgreSQL and the schema "temp" on my test server.

You can load raw data from [livejournal.com profile] podmoskovnik journal. See the first comment for instructions how to load Russian election data.




-- Project:     UXE /   Magic Numbers   
--
-- LINK:        Constructive Paranoia   /      /   German Garden Formatting    /   sample for SQL
--
-- BASE URL:    https://vit-r.dreamwidth.org/927050.html
--
-- (CC) Vit-R, 2018
--
--
-- ========================================================
-- NOTE: This code is written in an industry grade coding style.
--       Don't try to read it from your smartphone.
--       You need a wide enough monitor and a tool with 
--       a reasonable syntax highlighting.  
--       Auto-formatting may make the code unreadable.
-- ========================================================

-- ========================================================
-- Introduction
-- ========================================================
--   It is widely known that we perceive numbers incorrectly.
-- Price tags say "9.99" instead of "10.-" to fool us.
-- The psychology of numbers is more interesting than these simple
-- tricks. 
--   We like some numbers and dislike other numbers. 
-- There are numbers that seem to be bigger or smaller than they
-- really are. There are numbers that we like to add and 
-- the numbers we like to subtract. Unfortunately it is 
-- very difficult to get correct experimental data.
--   Russian elections are great source for psychological analysis.
-- We know that the numbers people write in protocols are manipulated.
-- We know what protocol fields were raised and what protocol fields 
-- were suppressed. After statisticians found out ways to detect 
-- traces of calculators usage the numbers in russian election 
-- protocols are manipulated with more human intuition and basic
-- calculation abilities.
--   If we assume that not manipulated results and the 
-- results of other manipulation kinds (added ballot papers, 
-- votes moved from one candidate to another, values directly
-- taken from calculator screens, etc.) are random, we can 
-- consider Russian election protocols as a great source 
-- for numbers psychology analysis.
--   Imagine, you must pay $10 for every experiment 
-- participant and do not know how many of them take seriously 
-- your instructions like "You must make this number a bit bigger
-- than X percent, this number must be smaller then Y percent
-- this numbers must be small and the sum of all numbers 
-- must be correct and nobody must see that this numbers were 
-- artificially constructed..."
--   Imagine, you have limited budget but need many participants 
-- to test even quite exotic cases...
--   Russian elections regularly generate results of such experiments 
-- and you can use them for free!

-- ========================================================
-- Theory
-- ========================================================
--   Magic numbers are such numbers that make perceived values 
--     a) significantly bigger or smaller than they really are, 
--     b) significantly attractive or unappealing, 
--     c) significantly simple or misleading for calculations,  
--     etc.
--   There are many grounds why some numbers obtain magics:
-- they may be "calculable" in our decimal number system (100, 50),
-- they may be rhythmic (1n1, 2n2),
-- they may be sounding good in particular language
-- they may be loaded with cultural meaning...

-- ========================================================
-- Hypothesis
-- ========================================================
--   People who manipulate values in election protocol tend to 
-- write good magic numbers in fields that they must raise and
-- bad magic numbers in fields that they must suppress.
--   The numbers for raised election protocol fields move from 
-- left to right. Consequently in such cases a "good" number will 
-- be used in more cases then preceding and next numbers.
--   Note, to raise a number someone could write not only 
-- a "good" 100 but also 101 or 102 that mean "100 plus some more".
-- Additionally magic numbers are distributed not evenly and there
-- may be situations when 2 neighboring numbers are similarly
-- attractive and compete for number of cases. Also a person 
-- who adds "approximately 150" has more possibilities 
-- than someone who can move not more than 5 votes.
--   Because the numbers in a raised field move from left to
-- right the "left slope" must be more prominent.
--   A bad number in a raised protocol field will be used 
-- in less cases than the other neighbors.
--   Let say a good number creates a hill with small pits left 
-- and right. A bad number creates a pit.
--   The numbers for suppressed fields move from right to left
-- and consequently create "right slope". Good numbers will be
-- avoided and bad numbers will be preferred. 
  


-- ========================================================
-- Hypothesis test
-- ========================================================
--   This set of SQL requests finds simplest criteria for 
-- magic numbers. 
--     a) only the effects of 1, 2 or 3 last digits are tested
--     b) only the "left slope"-effects for raised fields 
--        and only the "right slope"-effects for suppressed fields
--        are used to test hypotheses 
--     c) only the previous and the next number are considered
--     d) the goal is only to find magical numbers and 
--        statistical significance is not calculated 
--     e) effects of leading digits are ignored 
--        other assumptions and constraints can be found in the code below.   
--   Note, leading digits are important. For instance the number
-- "777" is attractive, the number "877" sounds wrong and
-- the number "1177" is too "artificial". (One of the requests below
-- allows to compare different "effects" for the same last digits.)

-- ========================================================
-- Usage
-- ========================================================
--   I use PostgreSQL and pgAdmin. For other databases you may be needed
-- to make some adjustments and change the chain WITH ... AS ... 
-- into sequence of CREATE TEMP TABLE calls.
--   I use the dataset with the timestamp "2018-03-20 11:21" 
-- saved in the table temp.table_227_level_3
-- You may get slightly different results with other datasets.
--   
--   To see the steps for data preparations visit the base URL 
--
--     1. Copy the WITH ...AS part of the SQL request from
--        "Preparation START" till "Preparation END".
--        The comments before "Preparation START" have no effect
--        and could be copied or removed.
--     2. Add one of "Reporting SELECTs"
--     3. Change / adjust parameters, filters, names or anything else
--     4. Connect to your database and execute the SQL script
--     5. To check other parameters replace the last SELECT 
--        statement with one other SELECT from "Reporting SELECTs"
--
--   The request takes several seconds. There may be comical performance
-- degradation effects for instance by removing several tables 
-- from UNIONs in ru_el_2018_number_set table  

-- ========================================================
-- Short FAQ in Russian
-- ========================================================
-- 1. Это не очередное доказательство того, что выборы нарисованы, а 
--    использование этого факта.
-- 2. Вычислять процент нарисованного не интересно. Если отчерпать дерьмо
--    сверху, мёд внизу всё равно уже не мёд. Выбрасывать надо всё.
-- 3. Демократия начнётся не когда процент уменьшится, а когда начнут 
--    сажать тех, кто подтасовывает результаты. Причём любым путём 
--    подтасовывает. Пока же власть в Российской Федерации 
--    принадлежит училке, которая подбрасывает десяток заготовленных
--    бюллетеней и главарю комиссии, который выгоняет наблюдателей
--    и сочиняет числа в протоколе. Голос гражданина ничего не значит,
--    потому что без проблем вычёркивается нафиг.
-- 4. Тут нет призывов. Нам пофиг. Можно, даже, сказать, что 
--    изберком и нарушители закона о выборах на местах делают полезное дело.
--    Если вы сделаете всё по-демократически, со свободными СМИ, 
--    с нормальной предвыборной подготовкой, с правильной организацией 
--    и честным подсчётом, то выберете не Навального, а Жириновского.
--    А нам это нафиг нужно. Впрочем, и Небутерброд-Навальный тоже.
-- 5. Мне не стыдно. СССР - да, Россия - нет. И не из Украины.
--    Русский язык тоже ни к чему не обязывает. Я выполннил пожелание
--    "Ну и убирайся в свой Израиль", только немного не доехал.


-- ========================================================
-- Preparation START
-- ========================================================

-- DBG  :    to use TEMP TABLEs instead of one request uncomment the following line 
--                                  DROP TABLE IF EXISTS TABLE magic_numbers ;
--                              uncomment the following string and comment out the next string that starts with "WITH..."
--                                  CREATE TEMP TABLE magic_numbers ( magic_number
--                              than uncomment the string with ';' below (See   mark)

WITH magic_numbers (   magic_number             --  psychologically important number as last n digits
                                                --  NOTE:   the effects of leading digits are ignored 

                     , magic_number_len         --  number of digits in the magic_number
                                                --
                                                --  REASON: magic_number is an SQL integer 
                                                --          and the length is the best way to distinguish remainders like 0 and 00

                     , magic_number_type        --  one of string values below
                                                --
                                                --  VALUES: 'good_number'   - direct  configuration pit-hill-pit  
                                                -- 
                                                --                            NOTE: there are strong and weak magical numbers
                                                --                                    but this version does not distinguish them.
                                                -- 
                                                --          'bad_number'    - inverse configuration lawn-pit-lawn 
                                                -- 
                                                --                            NOTE: the force to avoid a bad magical number
                                                --                                    is smaller than the force for good magical numbers
                                                -- 
                                                --          'pit_left'      - preceding number for a 'good_number' 
                                                --                              ('good_number' - 1)
                                                -- 
                                                --                            NOTE: a strong magical number can affect 
                                                --                                    not only direct neighbors 
                                                --                                    (valley-montain-valley configuration)
                                                --                                    but we do not test for such cases.
                                                -- 
                                                --          'pit_right'     - next number for a 'good_number'       
                                                --                              ('good_number' + 1)
                                                --
                                                --          'good_fraction' - may be used in derived numbers for instance 5  from 5 + 3 + 2 = 10
                                                --
                                                --  NOTE:   This version seeks only for simple cases: good_number and bad_number

                                                --  NOTE:   the following texts can be used under (CC) license  
                     , description_en           --  author's public description of the nature of this magic_number
                     , description_ru           --  Russian version of description_en

                     )                      AS ( -- magic_numbers table start

    -- a manually collected table for magic numbers
    --
    VALUES
                -- magic_number | magic number properties
                (     0 
                                , 1 
                                , 'good_number' 
                                , '0 is a strong good number. It seems to be big because it belongs to next 10 and it is good for calculations'
                                , 'С математикой плохо, а запутаться страшно. Ноль очень любят.'
                                ) --

            ,   (     9 
                                , 1 
                                , 'bad_number' 
                                , '9 is the left pit before good 0. It also has some unpleasant characteristics thad dig it down in comparison to 8'
                                , 'Многим кажется, что 1009 сильно меньше 1010. К тому же это куб трёх, а такая математика - уже перебор '
                                ) --

            ,   (     1 
                                , 1 
                                , 'pit_right' 
                                , '1 is the right pit after good 0. It is relatively weak because of rhythms like 1n1 or 1nn1'
                                , 'Эффекты с единицей на конце мы не рассматриваем, потому что они комбинированные'
                                ) --

            ,   (    00 
                                , 2 
                                , 'good_number'  
                                , 'Two 0 digits are like one 0. Even they look as a artificial construct they are liked. '
                                , 'С математикой очень плохо, потому приходится для больших чисел брать два нуля в суммах, чтобы не напутать со сложением'
                                ) --

            ,   (    99 
                                , 2 
                                , 'bad_number' 
                                , '99 is like 9. Additionally a number with repeated digits looks artificially created.'
                                , 'Если к 99 прибавить единичку или немного побольше, кажется, что прибавили не 1, а 100.'
                                ) --


            ,   (   000 
                                , 3 
                                , 'good_number' 
                                , 'Weak good number. Numbers with 4 digits are rare and the number with 3 repeated 0 digits looks too good'
                                , 'С математикой настолько плохо, что приходится писать числа с тремя нулями.'
                                ) -- 

            ,   (   999 
                                , 3 
                                , 'bad_number' 
                                , 'Weak bad number. The proportion signal/noise is weak and the effects of repeated 9 are not significant.'
                                , 'Это тут просто для интереса. Три девятки - это как одна девятка, только с кучей условий'
                                ) --

            ,   (     5 
                                , 1 
                                , 'good_fraction' 
                                , '5 is a weak good number convenient for addition and subtraction. The fraction effect not considered here.'
                                , 'Пятёрка хороша для сложения, но это мы тут не рассматриваем'
                                ) --

            ,   (    05 
                                , 2 
                                , 'good_fraction' 
                                , 'This number shows weak effects of 5'
                                , 'Ноль любят, пятёрку тоже. Почему бы не любить их комбинацию'
                                ) --


            ,   (    77 
                                , 2 
                                , 'good_number' 
                                , 'Lucky number. It may be strongly affected by preceding digits'
                                , 'Семёрка счастливое чило, особенно, когда её много'
                                ) --


            ,   (    13 
                                , 2 
                                , 'bad_number' 
                                , 'Unlucky number'
                                , 'Люди, рисующие цифры в протоколах суеверны.'
                                ) -- 

            ,   (    50
                                , 2 
                                , 'good_number' 
                                , '50 is a combination of two good numbers 5 and 0 with strong 0'
                                , 'Ещё один эффект пробелов в математике'
                                ) -- 

            ,   (    64 
                                , 2 
                                , 'bad_number' 
                                , 'Kemerovo number. It is dual: 6 makes it big but small in comparison with hundreds and 4 maks it not such big'
                                , 'Нумерологические гадания на предсказания катастроф'
                                ) -- 
 
) -- magic_numbers table end 
-- DBG  : uncomment the following line with ";" (see notes above)
--                           and repeat for other tables
--                            ; -- magic_numbers  (TEMP TABLE end)

,   election_2018_with_javka AS (

    -- copy all columns from the raw table and add n_javka (Явка в терминах изберкома)

    SELECT
                *
            ,
                (    n_vydan_dosrochno 
                   + n_vydan_v_pomeshchenii 
                   + n_vydan_vne_pomeshchenija 
                   )                                AS n_javka
             ,  (    n_dejstvitelnyh 
                   + n_ne_dejstvitelnyh 
                   )                                AS n_golosov
        FROM
                temp.table_227_level_3              -- Election data state 2018-03-20 11:21
                
) -- election_2018_with_javka 

,   filtered_election_2018 AS (

    -- place to filter election data

    SELECT
                *
        FROM
                
                election_2018_with_javka
        WHERE
                1 = 1
        --  AND
        --                        n_javka   > 0     -- NOTE: there are UIKs with n_javka = 0
        --  AND
        --      100.0 * n_putin / n_javka   > 65
        --  AND
        --      region LIKE '%город%'

        -- -------------------------------
        -- conditions for integer percents
        -- -------------------------------

            -- -- integer percent in javka
            -- AND
            --      ( --    -0.05 < x < 0.05                

            --            100.0 * n_javka / n_spisok % 1   >    1 - 0.05
            --          OR
            --            100.0 * n_javka / n_spisok % 1   =        0.00
            --          OR
            --            100.0 * n_javka / n_spisok % 1   <        0.05

            --          ) --
            --  AND
            --            n_javka != n_spisok 

            -- -- integer percent in putin
            -- AND
            --      ( --    -0.05 < x < 0.05                

            --            100.0 * n_putin / n_golosov % 1   >   1 - 0.05
            --          OR  
            --            100.0 * n_putin / n_golosov % 1   =       0.00
            --          OR  
            --            100.0 * n_putin / n_golosov % 1   <       0.05

            --          ) --
            --  AND
            --            n_putin != n_golosov


            -- -- remove integer percent in javka
            -- AND
            --  ( -- not in delta or is exact 100%
            --      ( --    x <= - 0.05 OR 0.05 <= x 

            --              100.0 * n_javka / n_spisok % 1   <= 1 - 0.05
            --          AND
            --              100.0 * n_javka / n_spisok % 1   !=     0.00
            --          AND
            --              100.0 * n_javka / n_spisok % 1   >=     0.05
            --          ) --
            --  OR
            --              n_javka = n_spisok 

            --  ) -- not in delta or is exact 100%

            -- -- remove integer percent in putin
            -- AND  
            --                                n_golosov    > 0        -- protect against division by zero
            --  AND
            --  (  -- not in delta or is exact 100%
            --      ( --    x <= - 0.05 OR 0.05 <= x 

            --              100.0 * n_putin / n_golosov % 1  <= 1 - 0.05
            --          AND 
            --              100.0 * n_putin / n_golosov % 1  !=     0.00
            --          AND
            --              100.0 * n_putin / n_golosov % 1  >=     0.05
            --          ) --
            --  OR
            --              n_putin = n_golosov

            --  ) -- not in delta or is exact 100%

) -- filtered_election_2018 
    

-- ========================================================
--  prepared tables with numbers to test                (el_18_value) 
--  and counts of uiks that have reported this number   (count_of_uiks)

--  NOTE: the first table describes protocol rows 
--        The protocol_row_id is an integer which is better for JOINs
--        The values of the protocol_row_id must be set manually in subsecuent tables
--          and the correctness of this is not checked in this prototype

,   protocol_row_kinds (        protocol_row_name       --  readable name of a protocol row kind
                            ,   protocol_row_id         --  int ID to speed up joins
                            ,   protocol_row_type       --  supposed manipulation kind 
                                                            -- text value one of
                                                            -- 'raised'         numbers are moved from left  to right
                                                            -- 'suppressed'     numbers are moved from right to left
                                                            -- 'remainder'      numbers are changed to make sums of other numbers correct
                                                            -- 'insignificant'  numbers are not important 
                            ) AS (

    VALUES
                (   'spisok'                    ,   0 ,     'insignificant' )   -- NOTE: the manipulations with spisok are tricky and are ignored here
            ,   (   'javka'                     ,   1 ,     'raised'        )   -- most important values on 2018 elections and the most frequent fraud
            ,   (   'komissija'                 ,   2 ,     'insignificant' )

                -- most following protocol rows are not important        
            ,   (   'vydan_dosrochno'           ,  11 ,     'insignificant' )
            ,   (   'vydan_v_pomeshchenii'      ,  12 ,     'insignificant' )
            ,   (   'vydan_vne_pomeshchenija'   ,  13 ,     'insignificant' )

            ,   (   'pogashennyh'               ,  20 ,     'insignificant' )
            ,   (   'jashchik_perenosnoj'       ,  21 ,     'insignificant' )
            ,   (   'jashchik_statsionarnyj'    ,  22 ,     'insignificant' )
            ,   (   'ne_dejstvitelnyh'          ,  23 ,     'remainder'     )   --  values below statistically predicted are frequent in cases of fraud
            ,   (   'dejstvitelnyh'             ,  24 ,     'remainder'     )   --  this number is frequently changed by lazy UIKs 
                                                                                --      and is the sum of all 4n numbers  
            ,   (   'utrachennyh'               ,  25 ,     'insignificant' )
            ,   (   'ne_uchtennyh'              ,  16 ,     'insignificant' )

            ,   (   'baburin'                   ,  41 ,     'insignificant' )
            ,   (   'grudin'                    ,  42 ,     'suppressed'    )   --  the values for this candidate were reduced 
            ,   (   'zhirinovskij'              ,  43 ,     'insignificant' )   --  a weak candidate, possibly suppressed in some cases
            ,   (   'putin'                     ,  44 ,     'remainder'     )   --  the values were in many cases raised to reach good percentage 
            ,   (   'sobchak'                   ,  45 ,     'insignificant' )   --  a weak candidate, maybe slightly raised in some cases
            ,   (   'surajkin'                  ,  46 ,     'insignificant' )
            ,   (   'titov'                     ,  47 ,     'insignificant' )
            ,   (   'javlinskij'                ,  48 ,     'insignificant' )

) -- protocol_row_kinds

,   grouped_spisok_2018 AS (

    SELECT
                    0 -- 'spisok'::text
                                                            AS protocol_row_id
            ,           n_spisok                            AS el_18_value
            ,   count(  n_spisok )                          AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_spisok

) -- grouped_spisok_2018 

,   grouped_javka_2018 AS (

    SELECT
                    1 -- 'javka'::text
                                                            AS protocol_row_id
            ,           n_javka                             AS el_18_value
            ,   count(  n_javka )                           AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_javka

) -- grouped_javka_2018 

,   grouped_ne_dejstvitelnyh_2018 AS (


    SELECT
                    23 -- 'ne_dejstvitelnyh'::text
                                                            AS protocol_row_id
            ,           n_ne_dejstvitelnyh                  AS el_18_value
            ,   count(  n_ne_dejstvitelnyh )                AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_ne_dejstvitelnyh

) -- grouped_ne_dejstvitelnyh_2018 

,   grouped_dejstvitelnyh_2018 AS (


    SELECT
                    24 -- 'dejstvitelnyh'::text
                                                            AS protocol_row_id
            ,           n_dejstvitelnyh                     AS el_18_value
            ,   count(  n_dejstvitelnyh )                   AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_dejstvitelnyh

) -- grouped_dejstvitelnyh_2018 

,   grouped_grudin_2018 AS (


    SELECT
                    42 -- 'grudin'::text
                                                            AS protocol_row_id
            ,           n_grudin                            AS el_18_value
            ,   count(  n_grudin )                          AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_grudin

) -- grouped_grudin_2018

,   grouped_putin_2018 AS (


    SELECT
                    44 -- 'putin'::text     
                                                            AS protocol_row_id
            ,           n_putin                             AS el_18_value
            ,   count(  n_putin )                           AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_putin

) -- grouped_putin_2018 

,   grouped_zhirinovskij_2018 AS (


    SELECT
                    43 -- 'zhirinovskij'::text
                                                            AS protocol_row_id
            ,           n_zhirinovskij                      AS el_18_value
            ,   count(  n_zhirinovskij )                    AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_zhirinovskij

) -- grouped_zhirinovskij_2018

,   grouped_sobchak_2018 AS (


    SELECT
                    45 -- 'sobchak'::text
                                                            AS protocol_row_id
            ,           n_sobchak                           AS el_18_value
            ,   count(  n_sobchak )                         AS count_of_uiks
        FROM
                filtered_election_2018
        GROUP BY
                        n_sobchak

) -- grouped_sobchak_2018

-- NOTE: other protocol rows were not added because they are not interesting

-- ========================================================
-- aggregation table

,   ru_el_2018_number_set AS (
    
    --  this table collects all interesting kinds of protocol rows 
    --  
    --  NOTE: do not uncomment tables you do not use because they are small and this does not save calculation times 
    --          but deletion of some tables my make the data mix "uneven" and this may significantly increase times of some joins

        SELECT
                    *
            FROM
                    grouped_spisok_2018                 

    UNION
        SELECT      
                    *
            FROM
                    grouped_javka_2018      

    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_ne_dejstvitelnyh_2018

    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_dejstvitelnyh_2018

    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_putin_2018          
    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_grudin_2018         

    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_zhirinovskij_2018           
    UNION                                   
        SELECT      
                    *
            FROM
                    grouped_sobchak_2018            
    
) -- ru_el_2018_number_set

,   selected_protocol_row_ids AS (

    SELECT 
            DISTINCT protocol_row_id 
        FROM 
                ru_el_2018_number_set

) -- selected_protocol_row_ids 
 
-- ========================================================
--  preparation of a table with counts of UIKs for 3 neighbor numbers
--  This is the simplest method to check for magic numbers

,   full_numbers_vector AS (

    -- prepare full vector without gaps 
    -- NOTE: the count UIKs for a magic number may be 0 and this number will be absent in the ru_el_2018_number_set table

    SELECT 
                ordered_num                                 AS ordered_num
            ,   ordered_num %   10                          AS last_1
            ,   ordered_num %  100                          AS last_2
            ,   ordered_num % 1000                          AS last_3
            ,   ordered_num - 1                             AS previous_num
            ,   ordered_num + 1                             AS next_num         
          FROM 
                GENERATE_SERIES(   1
                                 , ( SELECT max( el_18_value ) FROM grouped_spisok_2018 )   -- n_spisok is the maximal possible number 
                                                                                            -- NOTE: max( n_javka ) is < 8000 
                                 )                          AS ordered_num
) -- full_numbers_vector


,   full_el_18_value_vector AS (

    -- collect count of UIKs for 3 neighbor numbers for all kinds of protocol_row_id 

    SELECT  
                selected_protocol_row_ids.protocol_row_id   AS  protocol_row_id
            ,   full_numbers_vector.ordered_num             AS  el_18_value 
            ,   full_numbers_vector.last_1                  AS  last_1
            ,   full_numbers_vector.last_2                  AS  last_2
            ,   full_numbers_vector.last_3                  AS  last_3

                --  counts
            ,   COALESCE(   the_same_el_18_value.count_of_uiks
                          , 0
                          )                                 AS  count_of_uiks_the_same
            ,   COALESCE(   previous_el_18_value.count_of_uiks
                          , 0
                          )                                 AS  count_of_uiks_previous
            ,   COALESCE(       next_el_18_value.count_of_uiks
                          , 0
                          )                                 AS  count_of_uiks_next

        FROM
                    full_numbers_vector     
            CROSS JOIN
                    selected_protocol_row_ids

            LEFT OUTER JOIN
                    ru_el_2018_number_set                   AS          the_same_el_18_value
                ON
                        full_numbers_vector.ordered_num             =   the_same_el_18_value.el_18_value
                    AND
                        selected_protocol_row_ids.protocol_row_id   =   the_same_el_18_value.protocol_row_id
            LEFT OUTER JOIN
                    ru_el_2018_number_set                   AS          previous_el_18_value
                ON
                        full_numbers_vector.previous_num            =   previous_el_18_value.el_18_value 
                    AND
                        selected_protocol_row_ids.protocol_row_id   =   previous_el_18_value.protocol_row_id

            LEFT OUTER JOIN
                    ru_el_2018_number_set                   AS          next_el_18_value
                ON
                        full_numbers_vector.next_num                =   next_el_18_value.el_18_value
                    AND 
                        selected_protocol_row_ids.protocol_row_id   =   next_el_18_value.protocol_row_id

        WHERE
                -- the same number or one of its neighbors must be reported by at least one UIK
                the_same_el_18_value.count_of_uiks  > 0
            OR
                previous_el_18_value.count_of_uiks > 0
            OR
                    next_el_18_value.count_of_uiks > 0

) -- full_el_18_value_vector

,   full_el_18_value_with_diffs AS (

    -- add differences (left_height and right_height) 
    --  and configuration flags (is_hill, is_pit, is_left_slope, etc.)

    SELECT
                protocol_row_kinds.protocol_row_id          AS protocol_row_id
            ,   protocol_row_kinds.protocol_row_name        AS protocol_row_name
            ,   protocol_row_kinds.protocol_row_type        AS protocol_row_type

            ,   el_18_value
            ,   last_1
            ,   last_2
            ,   last_3
            ,   count_of_uiks_the_same
            ,   count_of_uiks_previous
            ,   count_of_uiks_next
    
            ,   (    count_of_uiks_the_same
                   + count_of_uiks_previous
                   + count_of_uiks_next
                   ) / 3                                    AS mean_count_of_uiks

            --  ========================================
            --  add differences in counts of UIKs
            --  NOTE:   both values may be 0 

            --  ========================================
            --  left height as diff in uiks count
            --  ATTENTION: the code for left height calculation is partially repeated below 
            --              as right height calculation with count_of_uiks_previous ----> count_of_uiks_next

            ,   (    count_of_uiks_the_same
                   - count_of_uiks_previous
                   )                                        AS left_height

            --  ========================================
            -- left height as percent
            -- NOTE: it is simpler to make calculations at once 
            --       this means the calculation of mean_count_of_uiks is repeated many times
            ,   round(   100.0 
                            * -- left_height
                              (    count_of_uiks_the_same
                                 - count_of_uiks_previous   
                                ) 
                            / -- 3 values divided on 3  
                              (   count_of_uiks_the_same
                                + count_of_uiks_previous
                                + count_of_uiks_next
                                ) 
                            * 3

                        , 2 -- digits for percent part
                        )                                   AS left_height_percent

            --  ========================================
            --  right height / see above
            --  ATTENTION: the code for right height calculation is partially repeated above 
            --              as left height calculation with count_of_uiks_previous <---- count_of_uiks_next

            ,   (    count_of_uiks_the_same
                   - count_of_uiks_next
                   )                                        AS right_height

            --  ========================================
            -- right height as percent  / see above 
             
            ,   round(   100.0 
                            * -- right_height
                              (     count_of_uiks_the_same
                                  - count_of_uiks_next
                                  ) 
                            / -- mean count of uiks see above
                                (   count_of_uiks_the_same
                                  + count_of_uiks_previous
                                  + count_of_uiks_next
                                  )
                            * 3

                        , 2 -- digits for percent part / see above
                        )                                   AS right_height_percent

            --  ==========================================
            --  different configurations
            --
            --  NOTE:   1. values 1 and 0 mean "yes" and "no" but integers are better for sum() calls below
            --          2. the simplest condition is used to distinguish configurations
            --              it can be changed for instance to 
            --                      count_of_uiks_previous  + some_insignificant_delta <    count_of_uiks_the_same
            --              instead of
            --                      count_of_uiks_previous                             <    count_of_uiks_the_same 
            --          3. one and only one of the following configuration flags must be 1, 
            --              all other flags must be 0
            --              CAUTION: this condition is not checked

            --
            --      <  >    /\          is_hill
            --
            --
            --      >  <    \/          is_pit
            --
            --
            --               /
            --      <  <    /           is_left_slope
            --
            --
            --              \   
            --      >  >     \          is_right_slope
            --           
            --               _                   
            --      <  =    /           is_left_half_hill
            --
            --              _
            --      =  >     \          is_right_half_hill
            --
            --
            --      >  =    \_          is_left_half_pit
            --
            --
            --      =  <    _/          is_right_half_pit
            --
            --
            --      =  =    --          is_lawn

            ,   CASE
                    WHEN
                            count_of_uiks_previous  <   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  >   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_hill

            ,   CASE
                    WHEN
                            count_of_uiks_previous   >  count_of_uiks_the_same
                        AND
                            count_of_uiks_the_same   <  count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_pit

            ,   CASE
                    WHEN
                            count_of_uiks_previous  <   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  <   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_left_slope

            ,   CASE
                    WHEN
                            count_of_uiks_previous  >   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  >   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_right_slope

            ,   CASE
                    WHEN
                            count_of_uiks_previous  <   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  =   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_left_half_hill

            ,   CASE
                    WHEN
                            count_of_uiks_previous  =   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  >   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_right_half_hill

            ,   CASE
                    WHEN
                            count_of_uiks_previous  >   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  =   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_left_half_pit

            ,   CASE
                    WHEN
                            count_of_uiks_previous  =   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  <   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_right_half_pit

            ,   CASE
                    WHEN
                            count_of_uiks_previous  =   count_of_uiks_the_same 
                        AND
                            count_of_uiks_the_same  =   count_of_uiks_next
                    THEN
                            1
                    ELSE
                            0
                    END                                     AS is_lawn
        
        FROM
                    full_el_18_value_vector
            INNER JOIN
                    protocol_row_kinds
                ON 
                        full_el_18_value_vector.protocol_row_id
                    =         protocol_row_kinds.protocol_row_id 

        WHERE
                -- avoid cases when all 3 counts are = 0
                --  also protect against division by 0
                (   count_of_uiks_the_same
                  + count_of_uiks_previous
                  + count_of_uiks_next
                  ) > 0

) -- full_el_18_value_with_diffs

-- ========================================================
-- collect basic statistics

,   magic_number_statistics AS (

    SELECT 
                -- magic number as string with leading zeros
                lpad(   magic_numbers.magic_number::text
                      , magic_numbers.magic_number_len
                      , '0'
                      )                                                 AS  magic_number_string

            ,   magic_numbers.magic_number_type                         AS  magic_number_type

            ,   full_el_18_value_with_diffs.protocol_row_id         AS  protocol_row_id
            ,   full_el_18_value_with_diffs.protocol_row_name           AS  protocol_row_name
            ,   full_el_18_value_with_diffs.protocol_row_type           AS  protocol_row_type
    
            ,   count( * )                                              AS      count_all

                -- count occurrence of each configuration for magic_number occurrence
            ,   sum(            is_hill     )                           AS      count_hills
            ,   sum(            is_pit      )                           AS      count_pits
            ,   sum(       is_left_slope    )                           AS    count_l_slopes
            ,   sum(      is_right_slope    )                           AS    count_r_slopes
            ,   sum(  is_left_half_hill     )                           AS   count_lh_hills
            ,   sum( is_right_half_hill     )                           AS   count_rh_hills
            ,   sum(  is_left_half_pit      )                           AS   count_lh_pits
            ,   sum( is_right_half_pit      )                           AS   count_rh_pits
            ,   sum(            is_lawn     )                           AS      count_lawns

        FROM 
                    full_el_18_value_with_diffs             
            INNER JOIN
                    magic_numbers
                ON
                    -- the reported number ends with the magic number
                    magic_numbers.magic_number  =   (   CASE
                                                            WHEN    magic_number_len = 1 
                                                                THEN
                                                                    full_el_18_value_with_diffs.last_1

                                                            WHEN    magic_number_len = 2 
                                                                THEN
                                                                    full_el_18_value_with_diffs.last_2

                                                            WHEN    magic_number_len = 3
                                                                THEN
                                                                    full_el_18_value_with_diffs.last_3

                                                            END ) 

        WHERE
                -- NOTE: this is important place where statistically not interesting protocol values from far right tail are removed
 
                mean_count_of_uiks > 7      --  NOTE: The threshold number was randomly selected and may be changed 
                                            --          or other criteria may be used to filter our insignificant cases 

        GROUP BY
                magic_numbers.magic_number
            ,   magic_numbers.magic_number_len
            ,   magic_numbers.magic_number_type

            ,   full_el_18_value_with_diffs.protocol_row_id
            ,   full_el_18_value_with_diffs.protocol_row_name
            ,   full_el_18_value_with_diffs.protocol_row_type

) -- magic_number_statistics

,   magic_number_hypothesis_test_results AS (

    -- this table collect the number of cases that agree with the hypothesis about
    --  left and right slope (see above in the section "Hypothesis test" 

    SELECT
                magic_number_string                         AS magic_number_string

            ,   protocol_row_name                           AS protocol_row_name

            ,   magic_number_type                           AS magic_number_type
                
            ,   protocol_row_type                           AS protocol_row_type

            ,   count_all                                   AS count_all

            ,   CASE 

                    WHEN
                             protocol_row_type   =  'raised'    
                        AND
                             magic_number_type  =   'good_number'

                        THEN
                            -- count of UIKs for this number is bigger than the count of UIKs for previous number
                                count_hills
                            +   count_lh_hills
                            +   count_l_slopes

                    WHEN
                            protocol_row_type   =   'raised'    
                        AND
                            magic_number_type   =   'bad_number'
                        THEN
                            -- count for the next number is bigger 
                                count_pits
                            +   count_rh_pits
                            +   count_l_slopes

                    WHEN
                            protocol_row_type   =   'suppressed'    
                        AND
                            magic_number_type   =   'good_number'
                        THEN
                            -- count for the previous number is bigger 
                                count_pits
                            +   count_lh_pits
                            +   count_r_slopes
                    WHEN
                            protocol_row_type   =   'suppressed'    
                        AND
                            magic_number_type   =   'bad_number'
                        THEN
                            -- count for the next number is smaller 
                                count_hills
                            +   count_rh_hills
                            +   count_r_slopes
                    ELSE
                                0   -- there is no hypothesis to test
                    END                                     AS count_hypotheses
            ,   count_hills                                 AS count_hills
            ,   count_pits                                  AS count_pits

        FROM
                    magic_number_statistics
 
) -- magic_number_hypothesis_test_results

-- ========================================================
-- Preparation END
-- ========================================================

-- ========================================================
-- Reporting SELECTs
--
-- NOTE:    use one of the following reporting SELECT calls
--

-- -------
-- |  1. |
-- --------------------------------------------------------
-- test magic numbers hypothesis 

SELECT 
            round(    100.0 * count_hypotheses / count_all 
                    , 1
                    )                                       AS  hypotheses_confirmation_percent 
        ,   *
        ,   count_hills - count_pits                        AS diff_hills_pits
    FROM
            magic_number_hypothesis_test_results
    WHERE
            count_all > 7           -- NOTE: The threshold number was randomly selected and may be changed 
        AND
            protocol_row_type       IN ( 'raised', 'suppressed' )
        AND
            magic_number_type       IN ( 'good_number', 'bad_number' )
    ORDER BY
    --      hypotheses_confirmation_percent DESC
             length( magic_number_string )
        ,    magic_number_string
        ,    protocol_row_name
    ; --

-- -------
-- |  2. |
-- --------------------------------------------------------
-- list all numbers that end with the specific magic_number
--

SELECT 
             protocol_row_name
        ,    el_18_value
        ,    CASE 
                WHEN
                        count_of_uiks_previous < count_of_uiks_the_same
                    THEN
                        'raising'
                ELSE
                        'failing'
                END                                         AS counts_direction
        ,    is_hill
        ,    is_pit
        ,    mean_count_of_uiks

        ,   left_height
        ,   left_height_percent

    FROM 
            full_el_18_value_with_diffs     
    WHERE
            last_2 = 77
        AND
            protocol_row_name   IN  ( 'javka' )
        AND
            mean_count_of_uiks > 7      
        
    ORDER BY
            left_height_percent
        ,   el_18_value
        ,   protocol_row_id
    LIMIT       
                200
    ; --


-- -------
-- |  3. |
-- --------------------------------------------------------
-- list all heights for the specific magic number

SELECT 
            el_18_value
        ,   protocol_row_name
        ,   count_of_uiks_previous
        ,   count_of_uiks_the_same
        ,   count_of_uiks_next
        ,   left_height
        ,   right_height
        ,   left_height_percent
        ,   right_height_percent
        ,   is_hill
        ,   is_pit

    FROM 
            full_el_18_value_with_diffs
    WHERE
            last_2 = 77 
        AND
            protocol_row_name = 'javka'
    --  AND
    --      left_height <= 0

    ORDER BY
            left_height
        ,   left_height_percent DESC
    --  ,   el_18_value
    --  ,   protocol_row_id
    ; --

-- -------
-- |  4. |
-- --------------------------------------------------------
-- list left height statistic for magic number in all protocol row kinds

SELECT
            protocol_row_name                               AS protocol_row_name
    ,       round(    avg( left_height )
                    , 2
                    )                                       AS mean_left_hight
    ,       round(    avg( left_height_percent  )
                    , 2
                    )                                       AS mean_left_per_cent
    ,       sum( CASE
                        WHEN
                            left_height > 0 
                        THEN
                            1
                        ELSE
                            0
                    END )                                   AS count_left_positive
    ,       sum( CASE
                    WHEN
                            left_height <= 0 
                        THEN
                            1
                    ELSE
                            0
                END )                                       AS count_left_negative
    ,       sum( CASE
                    WHEN
                            left_height > 0 
                    THEN
                             1
                    ELSE
                            -1
                    END )                                   AS diff_count_left_positive
    ,       round(      100.0
                        * sum( CASE
                                WHEN
                                        left_height > 0 
                                THEN
                                        1
                                ELSE
                                        -1
                                END )   
                        / count(*)
                    ,   2
                    )                                       AS diff_count_left_positive
    FROM 
            full_el_18_value_with_diffs
    WHERE
            last_1 = 0
    --  AND 
    --      el_18_value > 100
    GROUP BY
            protocol_row_name
    ; --

-- -------
-- |  5. |
-- --------------------------------------------------------
-- list configurations statistics for a specific magic number

SELECT 

            protocol_row_name                                               AS protocol_row_name

        ,   count( * )                                                      AS count_all


        ,             sum(            is_hill   )                           AS      count_hills
        ,   round(    sum(            is_hill   ) * 100.0 / count(*)
                    , 1
                    )                                                       AS    percent_hills

        ,             sum(            is_pit    )                           AS      count_pits
        ,   round(    sum(            is_pit    ) * 100.0 / count(*)
                    , 1
                    )                                                       AS    percent_pits

        ,             sum(       is_left_slope  )                           AS    count_l_slopes
        ,   round(    sum(       is_left_slope  ) * 100.0  / count(*)
                    , 1
                    )                                                       AS  percent_l_slopes

        ,             sum(      is_right_slope  )                           AS    count_r_slopes
        ,   round(    sum(      is_right_slope  ) * 100.0  / count(*)
                    , 1
                    )                                                       AS  percent_r_slopes

        ,             sum(  is_left_half_hill   )                           AS   count_lh_hills
        ,   round(    sum(  is_left_half_hill   ) * 100.0  / count(*)
                    , 1
                    )                                                       AS percent_lh_hills

        ,             sum( is_right_half_hill   )                           AS   count_rh_hills
        ,   round(    sum( is_right_half_hill   ) * 100.0  / count(*)
                    , 1
                    )                                                       AS percent_rh_hills

        ,             sum(  is_left_half_pit        )                       AS   count_lh_pits
        ,   round(    sum(  is_left_half_pit        ) * 100.0  / count(*)
                    , 1
                    )                                                       AS percent_lh_pits

        ,             sum( is_right_half_pit        )                       AS   count_rh_pits
        ,   round(    sum( is_right_half_pit        ) * 100.0  / count(*)
                    , 1
                    )                                                       AS percent_rh_pits

        ,             sum(            is_lawn       )                       AS      count_lawns
        ,   round(    sum(            is_lawn       ) * 100.0  / count(*)
                    , 1
                    )                                                       AS    percent_lawns

    FROM 
            full_el_18_value_with_diffs
    WHERE
            last_1 = 0
    --  AND
    --      el_18_value > 50
    --  AND
    --      left_height < 0
        AND
            mean_count_of_uiks > 7
    GROUP BY
            protocol_row_name
    ORDER BY
            protocol_row_name
    LIMIT 200
    ; --

-- -------
-- |  6. |
-- --------------------------------------------------------
-- preparation of a segment for graphics 

SELECT

            full_numbers_vector.ordered_num                 AS  number_reported_by_uik

        ,           grouped_javka_2018.count_of_uiks        AS  count_of_uiks_javka 
        ,   grouped_dejstvitelnyh_2018.count_of_uiks        AS  count_of_uiks_dejstvitelnyh
        ,           grouped_putin_2018.count_of_uiks        AS  count_of_uiks_putin
        ,          grouped_grudin_2018.count_of_uiks        AS  count_of_uiks_grudin
        ,    grouped_zhirinovskij_2018.count_of_uiks        AS  count_of_uiks_zhirinovskij
        ,         grouped_sobchak_2018.count_of_uiks        AS  count_of_uiks_sobchak



    FROM
                    full_numbers_vector
        LEFT OUTER JOIN
                    grouped_javka_2018              
            ON
                    full_numbers_vector.ordered_num     =   grouped_javka_2018.el_18_value
        LEFT OUTER JOIN
                    grouped_dejstvitelnyh_2018
            ON
                    full_numbers_vector.ordered_num     =   grouped_dejstvitelnyh_2018.el_18_value 
        LEFT OUTER JOIN
                    grouped_putin_2018
            ON
                    full_numbers_vector.ordered_num     =   grouped_putin_2018.el_18_value
        LEFT OUTER JOIN
                    grouped_grudin_2018
            ON
                    full_numbers_vector.ordered_num     =   grouped_grudin_2018.el_18_value

        LEFT OUTER JOIN
                    grouped_zhirinovskij_2018
            ON
                    full_numbers_vector.ordered_num     =   grouped_zhirinovskij_2018.el_18_value
        LEFT OUTER JOIN
                    grouped_sobchak_2018
            ON
                    full_numbers_vector.ordered_num     =   grouped_sobchak_2018.el_18_value

    WHERE
                                        58  < full_numbers_vector.ordered_num
        AND
            full_numbers_vector.ordered_num < 72

    ORDER BY
            full_numbers_vector.ordered_num
    ; --


-- -------
-- |  7. |
-- --------------------------------------------------------
-- 2018-04-01
-- synchronous configurations:
--   select all numbers where 
--          grudin has more than 5 UIKs 
--      and the counts of uiks 
--              have significant hills by javka and putin 
--          but pits by grudin


SELECT
    DISTINCT
            this_number_by_javka.el_18_value

        ,   this_number_by_grudin.count_of_uiks_the_same    AS  count_of_uiks_grudin
        ,    this_number_by_javka.count_of_uiks_the_same    AS  count_of_uiks_javka
        ,    this_number_by_putin.count_of_uiks_the_same    AS  count_of_uiks_putin

    FROM
                full_el_18_value_with_diffs                 AS this_number_by_grudin
        INNER JOIN
                full_el_18_value_with_diffs                 AS this_number_by_javka
            ON
                     this_number_by_javka.el_18_value   
                =   this_number_by_grudin.el_18_value   
    
        INNER JOIN
                full_el_18_value_with_diffs                 AS this_number_by_putin
            ON
                    this_number_by_javka.el_18_value
                =   this_number_by_putin.el_18_value

    WHERE
            -- select kind_of_numbers by ID (see protocol_row_kinds table)
    
             this_number_by_javka.protocol_row_name     =   'javka'
        AND
            this_number_by_grudin.protocol_row_name     =   'grudin'
        AND
             this_number_by_putin.protocol_row_name     =   'putin'


            --  select is_pit by grudin and is_hill by others
        AND
             this_number_by_grudin.is_pit   =   1
        AND
             this_number_by_javka.is_hill   =   1
        AND
             this_number_by_putin.is_hill   =   1

            --  select heights
            --  NOTE: numbers move for grudin from right to left and for others they move in the opposite direction
        AND
            this_number_by_grudin.right_height_percent  <   -2.5
        AND
             this_number_by_javka.left_height_percent   >    2.5
        AND
             this_number_by_putin.left_height_percent   >    2.5

            --  filter out not significant cases 
        AND
            this_number_by_grudin.mean_count_of_uiks  > 10
        AND
             this_number_by_putin.mean_count_of_uiks  > 10

    ORDER BY
            this_number_by_javka.el_18_value
    ; --


-- -------
-- |  8. |
-- --------------------------------------------------------
-- test for correctness of the configurations calculations 
--  in full_el_18_value_with_diffs table.
-- This SELECT returns wrong rows. (Correct case returns an empty table)


SELECT 
            *
    FROM
            full_el_18_value_with_diffs
    WHERE
            (     is_hill   
                + is_pit 
                + is_left_slope 
                + is_right_slope  
                + is_left_half_hill
                + is_right_half_hill
                + is_left_half_pit
                + is_right_half_pit
                + is_lawn       
                ) != 1          

    LIMIT 400
    ; --

-- -------
-- |  9. |
-- --------------------------------------------------------
-- extended statistics for magic numbers test

SELECT 
            lpad(   magic_numbers.magic_number::text
                  , magic_numbers.magic_number_len
                  , '0'
                  )
        ,   magic_numbers.magic_number_type

        ,   protocol_row_name
    
        ,   count( * )                                              AS      count_all

        ,   sum(            is_hill
                        +   is_left_slope
                        + is_left_half_hill
                        )                                           AS      count_as_good_number


        ,   round( sum(     is_hill
                        +   is_left_slope
                        + is_left_half_hill
                        )                   * 100.0 / count(*)
                    , 1     
                    )                                               AS    percent_as_good_number

        ,   sum(            is_pit
                        +   is_right_slope
                        + is_right_half_pit
                        )                                           AS      count_as_bad_number


        ,   round( sum(     is_pit
                        +   is_right_slope
                        + is_right_half_pit
                        )                   * 100.0 / count(*)
                    , 1     
                    )                                               AS    percent_as_bad_number


        ,   sum(            is_hill     )                           AS      count_hills
        ,   round( sum(     is_hill     ) * 100.0 / count(*)
                   , 1
                   )                                                AS    percent_hills
        ,   sum(            is_pit      )                           AS      count_pits
        ,   round( sum(     is_pit      ) * 100.0  / count(*)
                   , 1
                   )                                                AS    percent_pits
--      ,   sum(       is_left_slope    )                           AS    count_l_slopes
--      ,   sum(       is_left_slope    ) * 100.0  / count(*)       AS  percent_l_slopes
--      ,   sum(      is_right_slope    )                           AS    count_r_slopes
--      ,   sum(      is_right_slope    ) * 100.0  / count(*)       AS  percent_r_slopes
--      ,   sum(  is_left_half_hill     )                           AS   count_lh_hills
--      ,   sum(  is_left_half_hill     ) * 100.0  / count(*)       AS percent_lh_hills
--      ,   sum( is_right_half_hill     )                           AS   count_rh_hills
--      ,   sum( is_right_half_hill     ) * 100.0  / count(*)       AS percent_rh_hills
--      ,   sum(  is_left_half_pit      )                           AS   count_lh_pits
--      ,   sum(  is_left_half_pit      ) * 100.0  / count(*)       AS percent_lh_pits
--      ,   sum( is_right_half_pit      )                           AS   count_rh_pits
--      ,   sum( is_right_half_pit      ) * 100.0  / count(*)       AS percent_rh_pits
--      ,   sum(            is_lawn     )                           AS      count_lawns
--      ,   sum(            is_lawn     ) * 100.0  / count(*)       AS    percent_lawns

    FROM 
                full_el_18_value_with_diffs             
        INNER JOIN
                magic_numbers
            ON
                -- the reported number ends with magic number
                magic_numbers.magic_number  =   (   CASE
                                                        WHEN    magic_number_len = 1 
                                                            THEN
                                                                full_el_18_value_with_diffs.last_1

                                                        WHEN    magic_number_len = 2 
                                                            THEN
                                                                full_el_18_value_with_diffs.last_2

                                                        WHEN    magic_number_len = 3
                                                            THEN
                                                                full_el_18_value_with_diffs.last_3

                                                        END ) 
    WHERE
            mean_count_of_uiks > 10
    GROUP BY
            magic_numbers.magic_number
        ,   magic_numbers.magic_number_len
        ,   magic_numbers.magic_number_type

        ,   full_el_18_value_with_diffs.protocol_row_id 
        ,   full_el_18_value_with_diffs.protocol_row_name

    ORDER BY
            magic_numbers.magic_number
        ,   magic_numbers.magic_number_len
        ,   full_el_18_value_with_diffs.protocol_row_id 

    LIMIT 400
    ; --

-- -------
-- | 10. |
-- --------------------------------------------------------
-- 2018-04-05
-- order protocol row values by left_height

SELECT 
            protocol_row_name
        ,   el_18_value
        ,   count_of_uiks_the_same
        ,   count_of_uiks_previous
        ,   count_of_uiks_next
        ,   left_height
        ,   left_height_percent
        ,   right_height
        ,   right_height_percent
    FROM
            full_el_18_value_with_diffs
    WHERE
            protocol_row_name = 'javka'
    ORDER BY
            --  uncomment one of the following 
            abs( left_height ) DESC
        --  abs( right_height ) DESC
    LIMIT 
            100
    ; --


-- -------
-- | 11. |
-- --------------------------------------------------------
-- left height for first 70 numbers for insignificant rows
--
-- NOTE: it is impossible to test insignificant candidates with simple methods
--       while count_of_uiks reises very quickly at small numbers and fails afterwards
--       However the graphics of the raise have anomalies on magic numbers

SELECT
            ordered_num

        ,   ne_dejstvitelnyh_numbers.left_height_percent    AS   ne_dejstvitelnyh_left_height_percent
        ,             grudin_numbers.left_height_percent    AS             grudin_left_height_percent
        ,       zhirinovskij_numbers.left_height_percent    AS       zhirinovskij_left_height_percent
        ,            sobchak_numbers.left_height_percent    AS            sobchak_left_height_percent

        ,   ne_dejstvitelnyh_numbers.count_of_uiks_the_same AS   ne_dejstvitelnyh_count_of_uiks
        ,             grudin_numbers.count_of_uiks_the_same AS             grudin_count_of_uiks
        ,       zhirinovskij_numbers.count_of_uiks_the_same AS       zhirinovskij_count_of_uiks
        ,            sobchak_numbers.count_of_uiks_the_same AS            sobchak_count_of_uiks

    FROM

                GENERATE_SERIES(   1, 70 )                  AS  ordered_num 

        INNER JOIN
                full_el_18_value_with_diffs                 AS  ne_dejstvitelnyh_numbers
            ON
                    ne_dejstvitelnyh_numbers.protocol_row_name  =   'ne_dejstvitelnyh'
                AND
                    ne_dejstvitelnyh_numbers.el_18_value        =   ordered_num  

        INNER JOIN
                full_el_18_value_with_diffs                 AS  grudin_numbers
            ON
                    grudin_numbers.protocol_row_name            =   'grudin'
                AND
                    grudin_numbers.el_18_value                  =   ordered_num  

        INNER JOIN
                full_el_18_value_with_diffs                 AS  zhirinovskij_numbers
            ON
                    zhirinovskij_numbers.protocol_row_name      =   'zhirinovskij'
                AND
                    zhirinovskij_numbers.el_18_value            =   ordered_num  

        INNER JOIN
                full_el_18_value_with_diffs                 AS  sobchak_numbers
            ON
                    sobchak_numbers.protocol_row_name           =   'sobchak'
                AND
                    sobchak_numbers.el_18_value                 =   ordered_num  

    ORDER BY
            ordered_num 
    LIMIT 
            200
    ; --
    
-- -------
-- | 12. |
-- --------------------------------------------------------
-- order last 2 digits from most preferred down to most avoided 

SELECT 
            lpad(   last_2::text
                  , 2
                  , '0'
                  )                                         AS last_2_digits_as_text

        ,   sum( count_of_uiks_the_same)                    AS count_of_uiks_with_last_2
        ,   ROW_NUMBER() 
                OVER( ORDER BY 
                        sum( count_of_uiks_the_same) DESC 
                      )                                     AS place
    FROM
            full_el_18_value_with_diffs

    WHERE
            protocol_row_name   = 'javka'           --  1
                                    -- 'grudin'     -- 42
                                    -- 'putin'      -- 44
    GROUP BY
            last_2

    ORDER BY
            place           
        --  last_2
    ; --

-- -------
-- | 13. |
-- --------------------------------------------------------
-- order last 2 digits from most preferred down to most avoided 
-- and then order places by last digit and then by second to make 10x10 table
-- NOTE: add into the table above filtered_election_2018
--          AND
--             n_javka   >= 100  
--      to filter out 1 and 2 digit numbers (from 0 till 99)

, last_two_digits_places AS (
    SELECT 
            lpad(   last_2::text
                  , 2
                  , '0'
                  )                                             AS last_2_digits_as_text
            ,   last_2 / 10                                     AS digit_before_last
            ,   last_2 % 10                                     AS last_digit

            ,   sum( count_of_uiks_the_same)                    AS count_of_uiks_with_last_2_digits
            ,   ROW_NUMBER() 
                    OVER( ORDER BY 
                            sum( count_of_uiks_the_same) DESC 
                          )                                     AS place
 
       FROM
                full_el_18_value_with_diffs

        WHERE
                protocol_row_name   = 'javka'           --  1
                                        -- 'grudin'     -- 42
                                        -- 'putin'      -- 44
        GROUP BY
                last_2

) --  last_two_digits_places


SELECT
            first_0_table.last_digit    AS  n
        ,   first_0_table.place         AS  "0n"
        ,   first_1_table.place         AS  "1n"
        ,   first_2_table.place         AS  "2n"
        ,   first_3_table.place         AS  "3n"
        ,   first_4_table.place         AS  "4n"
        ,   first_5_table.place         AS  "5n"
        ,   first_6_table.place         AS  "6n"
        ,   first_7_table.place         AS  "7n"
        ,   first_8_table.place         AS  "8n"
        ,   first_9_table.place         AS  "9n"


    FROM
                last_two_digits_places                  AS  first_0_table
            
        INNER JOIN
                last_two_digits_places                  AS  first_1_table
            ON
                    first_0_table.last_digit        =       first_1_table.last_digit
                AND
                    first_0_table.digit_before_last =   0   --  0n
                AND
                    first_1_table.digit_before_last =   1   --  1n

        INNER JOIN
                last_two_digits_places                  AS  first_2_table
            ON
                    first_0_table.last_digit        =       first_2_table.last_digit
                AND
                    first_2_table.digit_before_last =   2   --  2n

        INNER JOIN
                last_two_digits_places                  AS  first_3_table
            ON
                    first_0_table.last_digit        =       first_3_table.last_digit
                AND
                    first_3_table.digit_before_last =   3   --  3n

        INNER JOIN
                last_two_digits_places                  AS  first_4_table
            ON
                    first_0_table.last_digit        =       first_4_table.last_digit
                AND
                    first_4_table.digit_before_last =   4   --  4n

        INNER JOIN
                last_two_digits_places                  AS  first_5_table
            ON
                    first_0_table.last_digit        =       first_5_table.last_digit
                AND
                    first_5_table.digit_before_last =   5   --  5n

        INNER JOIN
                last_two_digits_places                  AS  first_6_table
            ON
                    first_0_table.last_digit        =       first_6_table.last_digit
                AND
                    first_6_table.digit_before_last =   6   --  6n

        INNER JOIN
                last_two_digits_places                  AS  first_7_table
            ON
                    first_0_table.last_digit        =       first_7_table.last_digit
                AND
                    first_7_table.digit_before_last =   7   --  7n

        INNER JOIN
                last_two_digits_places                  AS  first_8_table
            ON
                    first_0_table.last_digit        =       first_8_table.last_digit
                AND
                    first_8_table.digit_before_last =   8   --  8n

        INNER JOIN
                last_two_digits_places                  AS  first_9_table
            ON
                    first_0_table.last_digit        =       first_9_table.last_digit
                AND
                    first_9_table.digit_before_last =   9   --  9n


    ORDER BY
        first_0_table.last_digit
    ; --

-- -------
-- | 14. |
-- --------------------------------------------------------
-- list values with maximal count of uiks
-- NOTE:    this SELECT is slow.  
 
SELECT 
            protocol_row_name
        ,   el_18_value
        ,   count_of_uiks_the_same
    FROM
            full_el_18_value_with_diffs                     AS  el_18_values
    WHERE
    --      el_18_values.protocol_row_name = 'javka'    -- 
    --  AND
            el_18_values.count_of_uiks_the_same > ( -- max count of UIKs
                                                    SELECT  
                                                                max( count_of_uiks_the_same ) 
                                                        FROM  
                                                                full_el_18_value_with_diffs AS intern_for_max
                                                        WHERE
                                                                intern_for_max.protocol_row_name = el_18_values.protocol_row_name
                                                        ) -- max count of uiks
                                                    - 20    -- margin for numbers below the max number

    ORDER BY
            protocol_row_name
        ,   count_of_uiks_the_same DESC
        
    ;--

-- EOF

Date: 2018-04-13 01:16 am (UTC)
elsie_flynn: (Default)
From: [personal profile] elsie_flynn
Очень интересная идея, хотя я не все поняла в технической части. А нельзя вместо left/right slope использовать positive/negative slopes, please?

Date: 2018-04-13 02:11 pm (UTC)
elsie_flynn: (Default)
From: [personal profile] elsie_flynn
Про ямки - об@яснение просто замечательное, исчерпывающее.
А насчет склонов - там, где я живу, positive slope - negative slope общеизвестны, их дети в middle school учат. Не знаю, в России у нас этого вроде в школе не было.
Графики классные, с ними текст читается по-другому. Все становится по местам.

Profile

vit_r: default (Default)
vit_r

February 2026

S M T W T F S
12 34 567
8 9 1011121314
15161718192021
22232425262728

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Feb. 11th, 2026 11:29 am
Powered by Dreamwidth Studios