Teradata: values interchange between columns when doing a PIVOT operation

I have a table (DB.TAB_UNPIVOTED) in Teradata with millions of rows and 4 columns: ID, Week, Sales and Profits. Below is a small microcosm of the problem.
For the sake of illustration, DB.TAB_UNPIVOTED looks like this:
I want to PIVOT this table. Here is my code:
SET SQL_STMT = ('CREATE TABLE DB.TAB_PIVOTED AS
( SELECT * FROM DB.TAB_UNPIVOTED
PIVOT
( SUM(Sales) AS Sales, SUM(Profits) AS Profits
FOR KW_Prefix IN (
''CW_1'' AS CW_1,
''CW_2'' AS CW_2,
''CW_3'' AS CW_3
)
) AS dt
) WITH DATA;'
) ;
EXECUTE IMMEDIATE SQL_STMT;
The strange thing is that every time I run the PIVOT, I get a different DB.TAB_PIVOTED table, with values interchanged across columns. For example, one time the output can be:
But the next time it can be a different result with the values interchanged, although Sales and Profits keep their pairing. In one output they might be under CW_1, whereas on another run they might be under CW_3:
This problem may not reproduce on small data, but with millions of rows and CW values ranging from 1 to 52, I see it all the time.
Does anyone have an idea where in the pivoting I am making a mistake?
Any input would be much appreciated.
Update:
I also tried running the code directly, instead of via EXECUTE IMMEDIATE SQL_STMT, but got the same strange results.
CREATE TABLE DB.TAB_PIVOTED AS
( SELECT * FROM DB.TAB_UNPIVOTED
PIVOT
( SUM(Sales) AS Sales, SUM(Profits) AS Profits
FOR KW_Prefix IN (
'CW_1' AS CW_1,
'CW_2' AS CW_2,
'CW_3' AS CW_3
)
) AS dt
) WITH DATA;
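One way to cross-check the PIVOT output is to write the same pivot with conditional aggregation, where each output column names the KW_Prefix value it comes from and the GROUP BY lists the row key explicitly. As far as I know, with SELECT * ... PIVOT every column not referenced in the aggregates or the FOR clause becomes an implicit grouping column, so it is also worth checking which columns that is. The sketch below is plain SQL with a hypothetical three-row sample; it assumes the week labels live in KW_Prefix (as in the FOR clause) and that ID identifies an output row. It is wrapped in a view only so the result can be queried and compared with DB.TAB_PIVOTED.

```sql
-- Hypothetical sample rows; KW_Prefix is assumed to hold the week labels
-- ('CW_1', 'CW_2', ...) named in the PIVOT's FOR clause.
CREATE TABLE tab_unpivoted (
    ID        INTEGER,
    KW_Prefix VARCHAR(10),
    Sales     DECIMAL(18, 2),
    Profits   DECIMAL(18, 2)
);

INSERT INTO tab_unpivoted VALUES (1, 'CW_1', 100, 10);
INSERT INTO tab_unpivoted VALUES (1, 'CW_3', 300, 30);
INSERT INTO tab_unpivoted VALUES (2, 'CW_2', 200, 20);

-- Conditional aggregation: each output column is tied to exactly one KW_Prefix
-- value, and the GROUP BY names the row key explicitly (no implicit grouping).
CREATE VIEW tab_pivoted_check AS
SELECT
    ID,
    SUM(CASE WHEN KW_Prefix = 'CW_1' THEN Sales   END) AS CW_1_Sales,
    SUM(CASE WHEN KW_Prefix = 'CW_1' THEN Profits END) AS CW_1_Profits,
    SUM(CASE WHEN KW_Prefix = 'CW_2' THEN Sales   END) AS CW_2_Sales,
    SUM(CASE WHEN KW_Prefix = 'CW_2' THEN Profits END) AS CW_2_Profits,
    SUM(CASE WHEN KW_Prefix = 'CW_3' THEN Sales   END) AS CW_3_Sales,
    SUM(CASE WHEN KW_Prefix = 'CW_3' THEN Profits END) AS CW_3_Profits
FROM tab_unpivoted
GROUP BY ID;
```

Weeks with no data for an ID come out as NULL, so a value can only ever appear under the column whose CASE matches it.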

Related

Validate my interpretation of an SQL query

My question is going to be a little different, so I hope I'm still adhering to Stack Overflow question etiquette. With that in mind, I'll get straight to the point.
Essentially, since I am still learning SQL, I was looking at examples of scheduled queries in GCP and came across one I wanted to make sure I understand. So I took the query and wrote comments explaining what I think each part of it is doing. The context of the code itself is irrelevant; I'm more curious whether I correctly understand what each clause does.
Could anyone tell me whether I am interpreting it correctly, or whether I have misunderstood something, based on my comments? The code and comments are below. Note that each comment block comes first and the query it describes follows directly after.
-- Create temporary table with the subquery below via the WITH () clause
-- Table contains session date, which webpage, total sessions, total sessions with a logout, and total clicks
-- The data in this temporary table is coming from the `gcp-project-223467.web.top_level` table in BigQuery
-- The columns correspond to dates 01/01/2022 & onwards, and exclude the 'Home' and 'Team' pages
-- The resulting data in the temp table is grouped by date & page type (first and second columns of the resulting temp table)
WITH logins AS (
SELECT
session_date as date,
website_page as page,
SUM(sessions) AS sessions,
SUM(sessions_with_logout) AS logouts,
SUM(clicks) AS clicks
FROM `gcp-project-223467.web.top_level`
WHERE DATE_session >= "2022-01-01"
AND website_page NOT IN ('Home','Team')
AND clicks > 0
GROUP BY 1, 2
)
-- Select the data from the above subquery (via SELECT logins.*)
-- Left join another temp table with data coming from `ingka-web-analytics-prod.web_data.transactions` in BigQuery
-- Left join is being done according to the logins & login_days date_hit AND logins & login_days `logins_web` columns.
-- The specific data taken from the aforementioned BQ table is aggregated and filtered via CASE WHEN - THEN statements
-- Further conditions are specified via the WHERE statements
-- The resulting temporary table in the subquery under LEFT JOIN is named login_days.
-- The columns in the select statement before the left join (web logins, mobile logins etc)
-- are from the temporary table in the select statement under the left join statement
SELECT
logins.*,
logins_web,
mobile_logins,
logins_ios,
logins_android,
logins_final
FROM logins
LEFT JOIN (
SELECT
date_hit as date,
website_page as page,
SUM(CASE WHEN login_type = 'web' THEN SAFE_CAST(count_logins_final AS INT64) END ) AS logins_web,
COUNT(DISTINCT CASE WHEN login_type = 'mobile' THEN login_id END ) AS mobile_logins,
SUM(CASE WHEN login_type = 'ipad' THEN SAFE_CAST(count_logins_final AS INT64) END ) AS logins_ios,
COUNT(DISTINCT CASE WHEN login_type = 'android' THEN login_id END ) AS logins_android,
COUNT(DISTINCT login_id) AS logins_final,
FROM `gcp-project-223467.web.login_data`
WHERE date_hit >= "2022-01-01" AND website_page NOT IN ('Home','Team')
AND count_logins_final != 'NaN'
AND count_logins_final NOT LIKE '%,%'
AND count_logins_final > '0'
AND website_platform != 'ibes'
AND login_type = 'Successful'
GROUP BY 1, 2
)login_days
ON logins.date = login_days.date AND logins.page = login_days.page
WHERE sessions_with_logout > 0
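To check two ideas the comments rely on, here is a small portable SQL sketch (it runs in e.g. SQLite; the tables and rows are hypothetical and much simpler than the BigQuery sources). A WITH clause only names a subquery for the statement that follows it, much like the inline view after LEFT JOIN; nothing is stored. The LEFT JOIN keeps every row of logins and fills the login_days columns with NULL where no matching date/page exists. The query is wrapped in a view only so its result can be inspected.

```sql
-- Hypothetical, simplified stand-ins for the two source tables
CREATE TABLE top_level (session_date DATE, website_page VARCHAR(20), sessions INTEGER);
CREATE TABLE login_data (date_hit DATE, website_page VARCHAR(20), login_id VARCHAR(20));

INSERT INTO top_level VALUES ('2022-01-01', 'Shop', 5);
INSERT INTO top_level VALUES ('2022-01-01', 'Shop', 3);
INSERT INTO top_level VALUES ('2022-01-02', 'Blog', 2);
INSERT INTO login_data VALUES ('2022-01-01', 'Shop', 'a');
INSERT INTO login_data VALUES ('2022-01-01', 'Shop', 'b');

-- "logins" exists only for the duration of this one statement
CREATE VIEW report AS
WITH logins AS (
    SELECT session_date AS date, website_page AS page, SUM(sessions) AS sessions
    FROM top_level
    GROUP BY session_date, website_page
)
SELECT logins.*, login_days.logins_final
FROM logins
LEFT JOIN (
    -- inline view: same idea as a CTE, just written in place
    SELECT date_hit AS date, website_page AS page, COUNT(DISTINCT login_id) AS logins_final
    FROM login_data
    GROUP BY date_hit, website_page
) AS login_days
  ON logins.date = login_days.date AND logins.page = login_days.page;
```

Here the 'Shop' row gets sessions = 8 and logins_final = 2, while the 'Blog' row is kept with logins_final = NULL because nothing in login_data matches it.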

With XMLDIFF, how do I compare only the fields that my XML elements have in common?

Introduction:
I have a query that uses a pipelined function. I won't change the names of the returned columns, but I will add other columns.
I want to compare the result of the old query with the result of the new one (syntactically it is always the same, select * from mypipelinefunction, but I have changed the pipelined function).
I have used "select *" instead of listing the column names because there are a lot of them.
Code:
The code example is simplified to focus on the problem in the title: there is no pipelined function, only two "identical" queries are tested, and the first query has one more column than the second.
SELECT
XMLDIFF (
XMLTYPE.createXML (
DBMS_XMLGEN.getxml ('select 1 one, 2 two from dual')),
XMLTYPE.createXML (
DBMS_XMLGEN.getxml ('select 1 one from dual')))
from dual;
I want XMLDIFF to report no difference, because the only columns I care about are the ones the two queries have in common.
In short, I would like to get this result
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</xd:xdiff>
instead of this result
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><xd:delete-node xd:node-type="element" xd:xpath="/ROWSET[1]/ROW[1]/TWO[1]"/></xd:xdiff>
Is it possible to force XMLDIFF to compare only the columns that are in common?
Another way to fix this problem would be a shortcut in TOAD that transforms select * from t into select first_column, ..., last_column from t. It should work even if t is a pipelined function.
If you only care about certain columns, wrap your query in an outer query that only outputs the columns you care about:
SELECT XMLDIFF (
XMLTYPE.createXML (
DBMS_XMLGEN.getxml (
'SELECT one FROM (select 1 one, 2 two from dual)'
)
),
XMLTYPE.createXML (
DBMS_XMLGEN.getxml (
'SELECT one FROM (select 1 one from dual)'
)
)
) AS diff
FROM DUAL;
Which outputs:
DIFF
<xd:xdiff xsi:schemaLocation="http://xmlns.oracle.com/xdb/xdiff.xsd http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xd="http://xmlns.oracle.com/xdb/xdiff.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><?oracle-xmldiff operations-in-docorder="true" output-model="snapshot" diff-algorithm="global"?></xd:xdiff>
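For the TOAD-shortcut idea, one possible Oracle sketch, assuming you can wrap each version of the pipelined function in a view (my_old_pipeline_function, my_new_pipeline_function, OLD_RESULT_V and NEW_RESULT_V are hypothetical names): the data dictionary then lists each view's columns, and an INTERSECT yields the shared column names to paste into an outer query like the one above. Oracle expands SELECT * when a view is created, so the views would need to be recreated after the function changes.

```sql
-- Hypothetical function and view names; substitute your own.
CREATE OR REPLACE VIEW old_result_v AS
  SELECT * FROM TABLE(my_old_pipeline_function());

CREATE OR REPLACE VIEW new_result_v AS
  SELECT * FROM TABLE(my_new_pipeline_function());

-- Comma-separated list of the column names both views share
-- (dictionary names are stored in upper case, hence the upper-case view names).
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) AS common_columns
FROM (
    SELECT column_name FROM user_tab_columns WHERE table_name = 'OLD_RESULT_V'
    INTERSECT
    SELECT column_name FROM user_tab_columns WHERE table_name = 'NEW_RESULT_V'
);
```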

BigQuery - Create a table from results of a query that uses complex CTEs?

I have a multi-CTE query with large underlying datasets that is run too frequently. I could create a table from the results of that query for people to use instead, and refresh it daily, but I'm lost on the syntax to create such a table.
CREATE OR REPLACE TABLE dataset.target_table
AS
with cte_one as (
select
stuff
from big.table
),
...
cte_five as (
select
stuff
from other_big.table
),
final as (
select *
from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT
*
FROM final
This is basically what I have. It actually creates the target table, even with the right schema, but doesn't insert any rows... Any hints? Thanks
If you really want to do this in one step, you can just do SELECT INTO...
with cte_one as (
select
stuff
from big.table
),
...
cte_five as (
select
stuff
from other_big.table
),
final as (
select *
from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT
*
INTO dataset.target_table
FROM final
That said, since this isn't just a one-off need, I recommend creating the landing table once and then scheduling a daily flush and fill (TRUNCATE + INSERT) to update the data. That gives you more explicit control over the data types and lets you work with a persistent object rather than something rebuilt from scratch every day.
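A minimal BigQuery sketch of that flush-and-fill pattern, with hypothetical column names (id, stuff) and a hypothetical source table my_dataset.big_table standing in for the question's CTEs. As far as I know, BigQuery does not accept SELECT ... INTO a table (that is SQL Server syntax); there, the one-step form is CREATE [OR REPLACE] TABLE ... AS followed by the WITH query, as in the question. The refresh statements below could run as a daily scheduled query; note that readers may briefly see an empty table between the TRUNCATE and the INSERT.

```sql
-- One-off setup: create the landing table with explicit column types
-- (column names are hypothetical placeholders).
CREATE TABLE IF NOT EXISTS dataset.target_table (
  id    INT64,
  stuff STRING
);

-- Daily refresh: empty the table, then refill it from the CTE query.
TRUNCATE TABLE dataset.target_table;

INSERT INTO dataset.target_table (id, stuff)
WITH cte_one AS (
  SELECT id, stuff
  FROM my_dataset.big_table
),
final AS (
  SELECT id, stuff
  FROM cte_one
)
SELECT id, stuff
FROM final;
```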

SQL Conditional join inside a function

This question has been asked many times on SO, but I never quite found an answer; most of the responses are workarounds that avoid the problem altogether.
I'm working with Microsoft SQL Server and I'm trying to build a query inside a function (for security reasons) that returns either a table or its unnested version by country.
That is, the function should return either
SELECT * FROM SALES AS S
or
SELECT
S.*,
C.Country,
C.CountryPercentage * S.AmountWithouthVAT as CountryValue
FROM SALES AS S
INNER JOIN CountryAllocation AS C ON S.CountryAllocationID = C.CountryAllocationID
(The fact that this join turns a single row into many rows is why I don't simply always use the second version. The reason I don't do the join outside the function is that the person running the function will not have access to either of the tables. Also note that, because of the way permissions work in SQL Server, a dynamic query would require permission evaluation, so that is not a feasible option unless I develop a structure around certificates.)
So now I have 2 problems:
The output table may or may not have the columns Country and CountryValue, which causes problems when defining the output type of the function.
How to use a function parameter to switch between the 2 versions of the table.
I've got a solution, but this code pains my eyes to look upon:
CREATE FUNCTION [dbo].[fn_I_view] (@Type int)
RETURNS @OutTable TABLE
(
SaleID int,
AmountWithouthVAT decimal(18, 2),
Country varchar(50),
AlocationPercentage decimal(18, 2)
)
AS
BEGIN
WITH
Out1 AS
(
SELECT
S.*,
NULL as Country,
NULL as AlocationPercentage
FROM Sales AS S
WHERE @Type = 1
),
Out2 AS
(
SELECT
S.*,
C.Country,
C.CountryPercentage * S.AmountWithouthVAT as CountryValue
FROM SALES AS S
INNER JOIN CountryAllocation AS C ON S.CountryAllocationID = C.CountryAllocationID
WHERE @Type = 2
)
INSERT INTO @OutTable
SELECT * FROM Out1
UNION ALL
SELECT * FROM Out2
RETURN
END
GO
So I can't exactly fix the first problem; I only worked around it by making SELECT * from [INV].[fn_I_ViewAllMyInvoices](1) still return those 2 extra columns with NULL. I didn't fix the second problem either, as I'm calculating both queries when I only need 1 of them (and, as you might expect, this is demo code; the real thing is much more complex).
Is there any way to improve this code, or to solve the problem in a different way? Performance, readability and maintenance improvements are all welcome.
You don't need to calculate both. Just do:
BEGIN
IF @Type = 1
BEGIN
INSERT INTO @OutTable
SELECT S.*, NULL as Country, NULL as AlocationPercentage
FROM Sales s;
END
ELSE
BEGIN
INSERT INTO @OutTable
SELECT S.*, C.Country, C.CountryPercentage * S.AmountWithouthVAT as CountryValue
FROM SALES S JOIN
CountryAllocation C
ON S.CountryAllocationID = C.CountryAllocationID;
END;
RETURN;
END;
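A different sketch, assuming Sales has the SaleID and AmountWithouthVAT columns listed in the question's return table (the function name fn_I_view_inline is hypothetical): an inline table-valued function built from one UNION ALL query. A table-valued function has a single fixed output shape, so the type-1 branch still returns the two extra columns, here as typed NULLs. Because each branch's WHERE clause depends only on @Type, SQL Server typically compiles it as a startup filter, so the branch whose predicate is false is normally not executed; it is worth confirming that in the actual execution plan of the real query. An inline function can also be expanded into the calling query by the optimizer, which a multi-statement function cannot.

```sql
-- Hypothetical inline TVF; column list assumed from the question's return table.
CREATE FUNCTION dbo.fn_I_view_inline (@Type int)
RETURNS TABLE
AS
RETURN
    -- Type 1: plain sales rows, extra columns as typed NULLs
    SELECT S.SaleID,
           S.AmountWithouthVAT,
           CAST(NULL AS varchar(50))    AS Country,
           CAST(NULL AS decimal(18, 2)) AS CountryValue
    FROM Sales AS S
    WHERE @Type = 1
    UNION ALL
    -- Type 2: one row per sale and country allocation
    SELECT S.SaleID,
           S.AmountWithouthVAT,
           C.Country,
           CAST(C.CountryPercentage * S.AmountWithouthVAT AS decimal(18, 2))
    FROM Sales AS S
    INNER JOIN CountryAllocation AS C
        ON S.CountryAllocationID = C.CountryAllocationID
    WHERE @Type = 2;
GO
```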

How should I temporarily store data within a PL/SQL procedure?

I am very new to PL/SQL. I have data in an initial table, named 'FLEX_PANEL_INSPECTIONS', that I am attempting to summarise in a second table, named 'PANEL_STATUS_2', using a PL/SQL procedure. However, due to the nature of the data, I have had to write a case statement to correctly summarise the data from FLEX_PANEL_INSPECTIONS. I have therefore created a third, intermediate table to bridge the two (named 'PANEL_STATUS_1'), since the case statement will not allow columns in the group by clause that specifically order the data (to the best of my knowledge; I get an error when I try to do this). I do not want to store data in the intermediate table. Is there any way I can make it temporary (i.e. have it exist only while the procedure runs, so that data in 'PANEL_STATUS_1' is not retained), create a view within the procedure, or remove the need for the intermediate table altogether?
Any help or criticism of my mistakes / misunderstanding of PL/SQL would be greatly appreciated. Here is the code I have written:
create or replace procedure PANEL_STATUS_PROCEDURE (panel_lot_id in number) as
begin
--Populate intermediate table with information about the status of the panels.
insert into PANEL_STATUS_1 (FLEX_LOT_ID, FLEX_PANEL_DMX, FLEX_PANEL_STATUS)
select FLEX_LOT_ID, FLEX_PANEL_DMX,
--Sum the status values of the 4 panel inspections. A panel passes if and only if this sum = 4.
case sum (FLEX_PANEL_STATUS)
when 4 then 1
else 0
end as new_panel_status
from FLEX_PANEL_INSPECTIONS
where FLEX_LOT_ID = panel_lot_id
group by FLEX_LOT_ID, FLEX_PANEL_DMX;
--Add information about the machine ID and the upload time to this table.
insert into PANEL_STATUS_2 (FLEX_LOT_ID, FLEX_PANEL_DMX, FLEX_PANEL_STATUS, MACHINE_ID, UPLOAD_TIME)
select distinct PANEL_STATUS_1.*, MACHINE_ID, UPLOAD_TIME
from PANEL_STATUS_1, FLEX_PANEL_INSPECTIONS
where (FLEX_PANEL_INSPECTIONS.FLEX_LOT_ID = PANEL_STATUS_1.FLEX_LOT_ID
and FLEX_PANEL_INSPECTIONS.FLEX_PANEL_DMX = PANEL_STATUS_1.FLEX_PANEL_DMX)
and FLEX_PANEL_INSPECTIONS.FLEX_LOT_ID = panel_lot_id;
end PANEL_STATUS_PROCEDURE;
/
You can create your temp table as
create global temporary table gtt_panel_status
( column datatype ... )
on commit [delete|preserve] rows;
(specifying either delete or preserve in the on commit clause).
However, you usually don't need a temp table. You might try a WITH clause (CTE), or else an inline view along the lines of select x, y, z from (select your subquery here).
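For example, the question's two inserts could be folded into one statement by turning PANEL_STATUS_1 into a WITH clause, so no intermediate table is needed. This is a sketch using the question's table and column names (new_panel_status is just the CTE's alias for the computed status), not tested against the real schema:

```sql
create or replace procedure panel_status_procedure (panel_lot_id in number) as
begin
  insert into panel_status_2
    (flex_lot_id, flex_panel_dmx, flex_panel_status, machine_id, upload_time)
  with panel_status_1 as (
    -- one row per panel: 1 if the four inspection statuses sum to 4, else 0
    select flex_lot_id,
           flex_panel_dmx,
           case sum(flex_panel_status) when 4 then 1 else 0 end as new_panel_status
    from   flex_panel_inspections
    where  flex_lot_id = panel_lot_id
    group  by flex_lot_id, flex_panel_dmx
  )
  -- join back to the inspections to pick up the machine ID and upload time
  select distinct ps.flex_lot_id, ps.flex_panel_dmx, ps.new_panel_status,
         fpi.machine_id, fpi.upload_time
  from   panel_status_1 ps
  join   flex_panel_inspections fpi
    on   fpi.flex_lot_id    = ps.flex_lot_id
   and   fpi.flex_panel_dmx = ps.flex_panel_dmx;
end panel_status_procedure;
/
```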
Edit: actually, looking at your query some more, I think what you actually need is an analytic sum, i.e. a total without aggregating. For example, something like this:
create or replace procedure panel_status_procedure
( panel_lot_id in number )
as
begin
-- Add information about the machine ID and the upload time to this table.
insert into panel_status_2
( flex_lot_id
, flex_panel_dmx
, flex_panel_status
, machine_id
, upload_time )
select distinct
flex_lot_id
, flex_panel_dmx
, case sum(flex_panel_status) over (partition by flex_lot_id, flex_panel_dmx)
when 4 then 1
else 0
end
, machine_id
, upload_time
from flex_panel_inspections pi
where pi.flex_lot_id = panel_lot_id;
end panel_status_procedure;
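To see what the analytic sum does, here is a small portable sketch with hypothetical sample rows (it runs in SQLite and should also run in Oracle; the literal 7 stands in for the procedure's panel_lot_id). SUM(...) OVER (PARTITION BY ...) returns the per-panel total on every inspection row instead of collapsing the rows, so machine_id and upload_time stay available without a GROUP BY, and DISTINCT then removes the repeated rows. If a panel's inspections differ in machine_id or upload_time, DISTINCT keeps one row per combination, just as the original two-step version does.

```sql
-- Hypothetical sample: panel P1 passes all 4 inspections, panel P2 fails one.
CREATE TABLE flex_panel_inspections (
    flex_lot_id       INTEGER,
    flex_panel_dmx    VARCHAR(20),
    flex_panel_status INTEGER,
    machine_id        VARCHAR(10),
    upload_time       VARCHAR(20)
);

INSERT INTO flex_panel_inspections VALUES (7, 'P1', 1, 'M1', '2020-01-01 10:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P1', 1, 'M1', '2020-01-01 10:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P1', 1, 'M1', '2020-01-01 10:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P1', 1, 'M1', '2020-01-01 10:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P2', 1, 'M2', '2020-01-01 11:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P2', 1, 'M2', '2020-01-01 11:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P2', 0, 'M2', '2020-01-01 11:00');
INSERT INTO flex_panel_inspections VALUES (7, 'P2', 1, 'M2', '2020-01-01 11:00');
INSERT INTO flex_panel_inspections VALUES (8, 'P9', 1, 'M1', '2020-01-02 09:00');

-- Per-panel total on every row, then DISTINCT collapses the duplicates.
CREATE VIEW panel_status_preview AS
SELECT DISTINCT
    flex_lot_id,
    flex_panel_dmx,
    CASE SUM(flex_panel_status) OVER (PARTITION BY flex_lot_id, flex_panel_dmx)
        WHEN 4 THEN 1
        ELSE 0
    END AS flex_panel_status,
    machine_id,
    upload_time
FROM flex_panel_inspections
WHERE flex_lot_id = 7;  -- stands in for panel_lot_id
```

The view returns one row per panel of lot 7: P1 with status 1 and P2 with status 0.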