Teradata: IN clause in PIVOT can't take data from a table

I want to extract a few calendar weeks from a year of data. Once that's done, I want to pivot the result so that there is one row for each ID.
We have a table DB.MY_CWs with a single column, CW, containing the calendar weeks we are interested in.
The following code extracts the relevant calendar weeks.
CREATE TABLE DB.MY_TABLE AS
(
SELECT ID,
WeekNumber_Of_Year(Sales_Date) AS CW,
AVG(Sales) AS Sales
FROM DB.DataBase_XYZ
WHERE CW IN (SELECT CW FROM DB.MY_CWs)
GROUP BY ID,CW
) WITH DATA;
This code gives us output like this:
But I would like to pivot it so that I get output like this:
I adapted the code from here and ran the following, but Teradata doesn't respond and there is no error either.
CREATE TABLE DB.MY_TABLE2 AS
(
SELECT *
FROM DB.MY_TABLE
PIVOT
(SUM(Sales) AS Sales
FOR CW IN (SELECT CW FROM DB.MY_CWs)
) AS dt
) WITH DATA;
If I use a literal list such as (15,16,17) instead of (SELECT CW FROM DB.MY_CWs), everything works and I get the pivoted table described above.
Can anyone suggest where I am making the mistake?
Many thanks.

I tried to recreate the scenario and got the error below.
CREATE TABLE Failed. 4306: (-4306)Invalid PIVOT query: PIVOT query with sub-query in IN-List is not supported in DDL statement.
There are a few limitations when using a subquery in the PIVOT IN-list.
TD Documentation:
https://docs.teradata.com/r/Teradata-VantageTM-NewSQL-Engine-Release-Summary/March-2019/Release-16.20-Feature-Update-1-Features/Subquery-Support-in-PIVOT-IN-List
Snippet from TD Documentation
Considerations
PIVOT with a subquery in the IN-list is not supported in a multistatement request. PIVOT columns are decided dynamically at the optimization phase. Because of this dynamic behavior, the following are usage considerations of a PIVOT query with a subquery in the IN-list.
Not supported in DDL creation statements.
Not supported in stored procedure's cursor FETCH statement.
SET operations are not allowed on a PIVOT query if subquery is given in the IN-list.
Resultant PIVOT column names cannot be explicitly specified in the SELECT list.
Does not support ORDER BY clause.
If you are using SQL Assistant, kindly check your history for the error details.
Otherwise you can query dbc.dbqlogtbl to check the errortext.
Workaround:
You can achieve the desired output through Dynamic SQL and Stored Procedure.
Steps:
1. Convert the output of the subquery to a string. We can do that with XMLAGG.
2. Concatenate the Step 1 output into the IN clause and execute the dynamically generated SQL.
REPLACE PROCEDURE DYNAMIC_PIVOT()
BEGIN
DECLARE Sqltxt VARCHAR(1000);
DECLARE CWtxt VARCHAR(250);
--Convert rows from MY_CWs to comma delimited string
SET CWtxt=(SELECT TRIM( TRAILING ',' FROM ( XMLAGG(CAST(CW AS VARCHAR(10))||',') (VARCHAR(255)) ) ) FROM MY_CWs);
SET Sqltxt=('CREATE TABLE MY_TABLE2 AS
(
SELECT *
FROM MY_TABLE
PIVOT
(SUM(Sales) AS Sales
FOR CW IN ('|| CWtxt ||')
) AS dt
) WITH DATA;') ;
CALL DBC.SYSEXECSQL(Sqltxt);
END;
CALL DYNAMIC_PIVOT();
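The same string-building idea can be sketched outside the database, for example in a client script that generates the statement and then submits it. This is only a sketch: the function name build_pivot_sql is hypothetical, the week values would in practice come from SELECT CW FROM MY_CWs, and sending the statement to Teradata is left out. Coercing each value to an integer is an added safeguard so that only numeric literals end up in the generated SQL.

```python
def build_pivot_sql(weeks):
    """Return a CREATE TABLE ... PIVOT statement with a literal IN-list."""
    # Coerce every value to int so that only numeric literals reach the SQL text.
    in_list = ",".join(str(int(w)) for w in weeks)
    if not in_list:
        raise ValueError("at least one calendar week is required")
    return (
        "CREATE TABLE MY_TABLE2 AS\n"
        "(\n"
        "SELECT *\n"
        "FROM MY_TABLE\n"
        "PIVOT\n"
        "(SUM(Sales) AS Sales\n"
        f"FOR CW IN ({in_list})\n"
        ") AS dt\n"
        ") WITH DATA"
    )


if __name__ == "__main__":
    # Hypothetical week list; in practice read it from MY_CWs first.
    print(build_pivot_sql([15, 16, 17]))
```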

Using a calculation with an aliased column in ORDER BY

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.
The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to
either a column or column alias in the select list or to a column
defined in a table specified in the FROM clause without any
ambiguities. If the ORDER BY clause references a column alias from
the select list, the column alias must be used standalone, and not as
a part of some expression in ORDER BY clause:
Technically speaking, your query should work, since the ORDER BY clause is logically evaluated after the SELECT clause and should have access to all expressions declared there. But without access to the SQL specs I cannot say whether this is a limitation of SQL Server or whether the other RDBMSs implement it as a bonus feature.
Anyway, you can use CROSS APPLY as a trick.... it is part of FROM clause so the expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)
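The derived-table form is also the portable one, because the alias becomes an ordinary column of the inner query before the outer ORDER BY sees it. A minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine rather than SQL Server; the table contents are made up, and || replaces + because that is SQLite's string concatenation operator:

```python
import sqlite3


def sorted_items(conn):
    # The inner query defines the alias s; the outer ORDER BY then treats s
    # as a plain column, so it can appear inside an expression.
    query = """
        SELECT s
        FROM (SELECT item AS s FROM data) AS step2
        ORDER BY s || ''
    """
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (item TEXT)")
conn.executemany(
    "INSERT INTO data (item) VALUES (?)",
    [("date",), ("banana",), ("cherry",), ("apple",)],
)
print(sorted_items(conn))
```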

Sorting concatenated strings after grouping in Netezza

I'm using the code on this page to create concatenated list of strings on a group by aggregation basis.
https://dwgeek.com/netezza-group_concat-alternative-working-example.html/
I'm trying to get the concatenated string in sorted order, so that, for example, for DB1 I'd get data1,data2,data5,data9
I tried modifying the original code to select from a pre-sorted table, but it doesn't seem to make any difference.
select Col1
, count(*) as NUM_OF_ROWS
, trim(trailing ',' from SETNZ..replace(SETNZ..replace (SETNZ..XMLserialize(SETNZ..XMLagg(SETNZ..XMLElement('X',col2))), '<X>','' ),'</X>' ,',' )) AS NZ_CONCAT_STRING
from
(select * from tbl_concat_demo order by 1,2) AS A
group by Col1
order by 1;
Is there a way to sort the strings before they get aggregated?
BTW - I'm aware there is a GROUP_CONCAT UDF function for Netezza, but I won't have access to it.
This is notoriously difficult to accomplish in sql, since sorting is usually done while returning the data, and you want to do it in the ‘input’ set.
Try this:
1) Create temp table X as select * from tbl_concat_demo Order by col2
Partition by (col1)
2) In your original code above, select from X instead of tbl_concat_demo.
Let me know if it works.
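If the rows can be fetched to a client, the ordering problem goes away, because the sort can be applied explicitly before concatenating. The following is only a sketch of that alternative, not a Netezza solution: the function name concat_sorted and the sample rows are hypothetical, and the sort is plain string order, so data10 would sort before data2.

```python
from itertools import groupby
from operator import itemgetter


def concat_sorted(rows):
    """Map each col1 value to its col2 values, sorted and comma-joined."""
    # Sort by (col1, col2) so that groupby sees each group contiguously
    # and the values inside each group are already in order.
    ordered = sorted(rows)
    return {
        key: ",".join(col2 for _, col2 in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


rows = [
    ("DB1", "data9"),
    ("DB1", "data1"),
    ("DB2", "data3"),
    ("DB1", "data5"),
    ("DB1", "data2"),
]
print(concat_sorted(rows))
```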

Need SQL with subquery to get distinct values for VBA code

I have a table BAR_DATA with two fields: LongDate, Time. Both are long integers. No Access Date/Time involved here.
For each distinct LongDate value there are hundreds of records, each with Time value which may be distinct or duplicate within that LongDate.
I need to create an SQL statement that will group by LongDate and give me a count of distinct Times within each LongDate.
The following SQL statement (built by an Access query) does NOT work (some LongDates are omitted):
Query A
SELECT DISTINCT BAR_DATA.LongDate, Count(BAR_DATA.Time) AS CountOfTime
FROM BAR_DATA
GROUP BY BAR_DATA.LongDate
HAVING (((Count(BAR_DATA.Time))<>390 And (Count(BAR_DATA.Time))<>210));
However, if I use Query B to reference Query DistinctDateTime, it does work:
Query B
SELECT DistinctDateTime.LongDate, Count(DistinctDateTime.Time) AS CountOfTime
FROM DistinctDateTime
GROUP BY DistinctDateTime.LongDate
HAVING (((Count(DistinctDateTime.Time))<>390 And (Count(DistinctDateTime.Time))<>210));
Query DistinctDateTime
SELECT DISTINCT BAR_DATA.LongDate, BAR_DATA.Time
FROM BAR_DATA;
My problem:
I need to get Query B and Query DistinctDateTime wrapped into a single SQL statement so I can paste it into a VBA function. I presume there is some subquery technique, but I have failed at every attempt and have found no pertinent example.
Any help will be greatly appreciated. Thanks!
Subquery your distinct table inside and perform your aggregates outside until you get the desired result:
SELECT DistinctDateTime.LongDate, Count(DistinctDateTime.Time) AS CountOfTime
FROM
(
SELECT DISTINCT BAR_DATA.LongDate, BAR_DATA.Time
FROM BAR_DATA
) AS DistinctDateTime
GROUP BY DistinctDateTime.LongDate
HAVING (((Count(DistinctDateTime.Time))<>390 And (Count(DistinctDateTime.Time))<>210));
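To check the shape of this query on a small data set, here is a sketch that runs the same derived-table pattern against SQLite through Python's sqlite3 module. SQLite stands in for Access here, the function name unexpected_days and the sample rows are made up, and NOT IN is used as an equivalent of the two <> tests so that the expected counts can be passed as parameters.

```python
import sqlite3


def unexpected_days(conn, expected=(390, 210)):
    """Return (LongDate, distinct Time count) where the count is not expected."""
    placeholders = ",".join("?" for _ in expected)
    # Inner query removes duplicate (LongDate, Time) pairs; outer query counts.
    query = f"""
        SELECT d.LongDate, COUNT(d."Time") AS CountOfTime
        FROM (SELECT DISTINCT LongDate, "Time" FROM BAR_DATA) AS d
        GROUP BY d.LongDate
        HAVING COUNT(d."Time") NOT IN ({placeholders})
        ORDER BY d.LongDate
    """
    return conn.execute(query, tuple(expected)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE BAR_DATA (LongDate INTEGER, "Time" INTEGER)')
conn.executemany(
    "INSERT INTO BAR_DATA VALUES (?, ?)",
    [(20200101, 930), (20200101, 931), (20200101, 931), (20200102, 930)],
)
# With an expected count of 2, only the day with a single distinct Time is reported.
print(unexpected_days(conn, expected=(2,)))
```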

What exactly is the/this data statement in SAS doing? PostgreSQL equivalent?

I'm converting a SAS script to Python for a PostgreSQL environment. In a few places I've found a data statement in SAS, which looks something like this (in multiple scripts):
data dups;
set picc;
by btn wtn resp_ji;
if not (first.resp_ji and last.resp_ji);
run;
Obviously datasets aren't the same in python or SQL environments, and I'm having trouble determining what this specific statement is doing. To be clear, there are a number of scripts being converted which create a dataset in this manner with this same name. So my expectation would be that most of these would be overwritten over and over.
I'm also unclear as to what the postgres equivalent to the condition in the data dups statement would be.
Is there an obvious PostgreSQL statement that would work in its place? Something like this?:
CREATE TABLE dups AS
SELECT btn, wtn, resp_ji
WHERE /*some condition that matches the condition in the data statement*/
Does the
by btn wtn resp_ji;
statement mean which columns are copied over, or is that the equivalent of an ORDER BY clause in PostgreSQL?
Thanks.
The statement is using what's called 'by group processing'. Before the step can run, it requires that the data is sorted by btn wtn resp_ji.
The first.resp_ji piece is checking to see if it's the first time it's seen the current value of resp_ji within the current btn/wtn combination. Likewise the last.resp_ji piece is checking if it's the final time that it will see the current value of resp_ji within the current btn/wtn combination.
Combining it all together the statement:
if not (first.resp_ji and last.resp_ji);
Is saying, if the current value of resp_ji occurs multiple times for the current combination of btn/wtn then keep the record, otherwise discard the record. The behaviour of the if statement when used like that implicitly keeps/discards the record.
To do the equivalent in SQL, you could do something like:
Find all records to discard.
Discard those records from the original dataset.
So...
-- Step 1: key combinations that occur exactly once
create table rows_to_discard as
select btn, wtn, resp_ji, count(*) as freq
from mytable
group by btn, wtn, resp_ji
having count(*) = 1;
-- Step 2: keep only the rows that have no match in rows_to_discard
create table want as
select a.*
from mytable a
left join rows_to_discard b on b.btn = a.btn
and b.wtn = a.wtn
and b.resp_ji = a.resp_ji
where b.btn is null;
EDIT : I should mention that there is no simple SQL equivalent. It may be possible by numbering rows in subqueries, and then building logic on top of that but it'd be ugh-ly. It may also depend on the specific flavour of SQL being used.
As someone who learned SAS before PostgreSQL, I found the following much more similar to the SAS first. and last. logic:
--first.
select distinct on (resp_ji) * from <table> order by resp_ji
--last.
select distinct on (resp_ji) * from <table> order by resp_ji desc
A way to detect duplicates (when no extra differentiating field is available) is to use the ctid as tie-breaker:
CREATE TABLE dups
AS
SELECT * FROM pics p
WHERE EXISTS ( SELECT * FROM pics x
WHERE x.btn = p.btn
AND x.wtn = p.wtn
AND x.resp_ji = p.resp_ji
AND x.ctid <> p.ctid
);
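Since the target of the conversion is Python, the by-group filter described above can also be sketched directly in Python. This is an illustration under assumptions: rows are represented as dictionaries, the function name find_dups and the sample data are hypothetical, and a Counter over the key columns replaces the sorted first./last. pass (the result is the same set of rows, in input order rather than sorted order).

```python
from collections import Counter


def find_dups(rows, keys=("btn", "wtn", "resp_ji")):
    """Keep rows whose key combination occurs more than once."""

    def key_of(row):
        return tuple(row[k] for k in keys)

    counts = Counter(key_of(row) for row in rows)
    # not (first.resp_ji and last.resp_ji) is true exactly when the
    # btn/wtn/resp_ji group holds more than one row.
    return [row for row in rows if counts[key_of(row)] > 1]


picc = [
    {"btn": 1, "wtn": 1, "resp_ji": "A"},
    {"btn": 1, "wtn": 1, "resp_ji": "A"},
    {"btn": 1, "wtn": 2, "resp_ji": "A"},
    {"btn": 2, "wtn": 1, "resp_ji": "B"},
]
dups = find_dups(picc)
print(dups)
```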

Selecting columns from a stored Proc?

I have a stored proc and I am trying to select all rows from it.
SELECT * FROM dbo.SEL_My_Func 'arg1','arg2','ar3'
didn't work. So I also tried:
SELECT * FROM EXEC dbo.SEL_My_Func 'arg1','arg2','ar3'
but this also didn't work. How can I test that my stored proc returns the correct results?
I have had to use a proc, rather than a function because I have an ORDER BY as part of the SQL, see: Selecting first row per group
Lastly, am I right in thinking there is no problem limiting which columns are returned from the stored proc, you just can't specify which rows (otherwise you would be better off using a SQL function)?
The approach you are using will not work, but there are workarounds:
SELECT a.[1], a.[2]
FROM OPENROWSET('SQLOLEDB','myserver';'sa';'mysapass',
'exec mydatabase.dbo.sp_onetwothree') AS a
or split your task in two queries
Declare @tablevar table(col1,..
insert into @tablevar(col1,..) exec MyStoredProc 'param1', 'param2'
SELECT col1, col2 FROM @tablevar
EXEC dbo.SEL_My_Func 'arg1','arg2','ar3'
Your assumption about functions is incorrect. You can use partition functions to select the first row in group.
Here is an example to find the first dealer_id for each client:
select client, dealer_id
from (
select client, dealer_id
, RANK() over ( partition by client order by dealer_id) as rnk
from Dealers
) cd where rnk = 1
This can also be done with a function call as well as with a table (my example).
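The ranking pattern can be tried on a small data set as follows. This is a sketch that assumes SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module, as a stand-in for SQL Server; the Dealers rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dealers (client TEXT, dealer_id INTEGER)")
conn.executemany(
    "INSERT INTO Dealers VALUES (?, ?)",
    [("acme", 7), ("acme", 3), ("globex", 5), ("globex", 9)],
)

# Rank the dealers inside each client by dealer_id, then keep rank 1 only.
query = """
    SELECT client, dealer_id
    FROM (
        SELECT client, dealer_id,
               RANK() OVER (PARTITION BY client ORDER BY dealer_id) AS rnk
        FROM Dealers
    ) AS cd
    WHERE rnk = 1
    ORDER BY client
"""
first_dealers = conn.execute(query).fetchall()
print(first_dealers)
```

Note that RANK() returns several rows per client when there are ties on the ordering column; ROW_NUMBER() is the usual choice when exactly one row per group is required.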