Transpose rows to columns in Oracle 10g - sql

I have a table which looks like the following
po_num | terms_type | terms_description
-------+------------+------------------
10     | 1          | Desc-10-1
10     | 2          | Desc-10-2
10     | 3          | Desc-10-3
20     | 1          | Desc-20-1
20     | 3          | Desc-20-3
30     |            |
So, for each Purchase Order (PO_NUM) there could be several terms of agreements (maximum three - 1,2,3) or no terms of agreement at all. Now, what I need is to transpose rows into columns - that is, for each po_num, I would like to have a similar output like below.
po_num | terms1 | termsDesc1 | terms2 | termsDesc2 | terms3 | termsDesc3
-------+--------+------------+--------+------------+--------+-----------
10     | 1      | Desc-10-1  | 2      | Desc-10-2  | 3      | Desc-10-3
20     | 1      | Desc-20-1  |        |            | 3      | Desc-20-3
30     |        |            |        |            |        |
I cannot use PIVOT since I don't have Oracle 11.2 installed, and I don't want to use scalar subqueries in the SELECT because performance degrades severely with that approach. I tried the following query to first concatenate all the fields and then split them with an outer query, but I haven't been able to get it to work yet.
SELECT po_num,
       RTRIM (
          XMLAGG (
             XMLELEMENT (
                po_table,
                po_table.terms_id || '|' || po_table.terms_description || '|')
          ).EXTRACT ('//text()'),
          '|') po_concat
FROM po_table
WHERE 1 = 1
GROUP BY po_table.po_num

In 10g you can use the DECODE function instead of PIVOT:
CREATE TABLE po_table (
po_num NUMBER,
terms_type NUMBER,
terms_description VARCHAR2(20)
);
INSERT INTO po_table VALUES(10, 1, 'Desc-10-1');
INSERT INTO po_table VALUES(10, 2, 'Desc-10-2');
INSERT INTO po_table VALUES(10, 3, 'Desc-10-3');
INSERT INTO po_table VALUES(20, 1, 'Desc-20-1');
INSERT INTO po_table VALUES(20, 3, 'Desc-20-3');
INSERT INTO po_table VALUES(30, NULL, NULL);
COMMIT;
SELECT po_num,
       MAX(DECODE(terms_type, 1, terms_type))        AS terms1,
       MAX(DECODE(terms_type, 1, terms_description)) AS terms1Desc,
       MAX(DECODE(terms_type, 2, terms_type))        AS terms2,
       MAX(DECODE(terms_type, 2, terms_description)) AS terms2Desc,
       MAX(DECODE(terms_type, 3, terms_type))        AS terms3,
       MAX(DECODE(terms_type, 3, terms_description)) AS terms3Desc
FROM po_table
GROUP BY po_num
ORDER BY po_num;
Output:
PO_NUM TERMS1 TERMS1DESC TERMS2 TERMS2DESC TERMS3 TERMS3DESC
------ ------ ---------- ------ ---------- ------ ----------
    10      1 Desc-10-1       2 Desc-10-2       3 Desc-10-3
    20      1 Desc-20-1                         3 Desc-20-3
    30

Something like this:
SELECT po_num,
       MAX(CASE WHEN terms_type = 1 THEN terms_description END) AS termsDesc1,
       MAX(CASE WHEN terms_type = 2 THEN terms_description END) AS termsDesc2,
       MAX(CASE WHEN terms_type = 3 THEN terms_description END) AS termsDesc3
FROM po_table
GROUP BY po_num
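If the terms_type columns are needed as well, to match the full desired layout from the question, the same pattern extends naturally; a sketch against the po_table schema created above:
SELECT po_num,
       MAX(CASE WHEN terms_type = 1 THEN terms_type END)        AS terms1,
       MAX(CASE WHEN terms_type = 1 THEN terms_description END) AS termsDesc1,
       MAX(CASE WHEN terms_type = 2 THEN terms_type END)        AS terms2,
       MAX(CASE WHEN terms_type = 2 THEN terms_description END) AS termsDesc2,
       MAX(CASE WHEN terms_type = 3 THEN terms_type END)        AS terms3,
       MAX(CASE WHEN terms_type = 3 THEN terms_description END) AS termsDesc3
FROM po_table
GROUP BY po_num
ORDER BY po_num;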


Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this, as doing so would save space in my query. I have been reading a bit about UDFs in SQL but don't yet understand whether it is possible to pass a column name in as a parameter (i.e., simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE", etc.). What I would like is a query that looks like this:
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!
There is no direct way, as @Tejash already stated, but it looks like your database model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
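A UDF cannot receive a column name directly (that would require dynamic SQL), but it can receive a column's value row by row; only the windowed MAX has to stay in the query. A minimal sketch, assuming Oracle PL/SQL; the function name flag_before is made up for illustration:
-- Hypothetical helper: receives the column's VALUE for the current row,
-- not the column name itself.
CREATE OR REPLACE FUNCTION flag_before(p_date DATE, p_anchor DATE)
   RETURN NUMBER DETERMINISTIC
IS
BEGIN
   RETURN CASE WHEN p_date <= p_anchor THEN 1 ELSE 0 END;
END flag_before;
/
-- Usage: the window function wraps the call, so only the CASE logic is factored out:
-- MAX(flag_before(table_2."GRP1_MINIMUM_DATE", cohort."ANCHOR_DATE"))
--     OVER (PARTITION BY cohort."USER_ID") AS "GRP1_MINIMUM_DATE"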
Without changing the table structure, you can use an UNPIVOT query to mimic this design:
WITH INVOLVE_EVER (USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE) AS
  (SELECT 1, SYSDATE,   SYSDATE,   SYSDATE,   SYSDATE   FROM dual UNION ALL
   SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
UNPIVOT (minimum_date FOR grp_id IN (GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
|       1 |      1 | 09/09/19     |
|       1 |      2 | 09/09/19     |
|       1 |      3 | 09/09/19     |
|       1 |      4 | 09/09/19     |
|       2 |      1 | 09/08/19     |
|       2 |      2 | 09/07/19     |
|       2 |      3 | 09/06/19     |
|       2 |      4 | 09/05/19     |
With this you can write your query without further code duplication and, if needed, use PIVOT syntax to get one line per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER (USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE) AS
  (SELECT 1, SYSDATE,   SYSDATE,   SYSDATE,   SYSDATE   FROM dual UNION ALL
   SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual),
INPUT_COHORT (USER_ID, ANCHOR_DATE) AS
  (SELECT 1, SYSDATE-1 FROM dual UNION ALL
   SELECT 2, SYSDATE-2 FROM dual UNION ALL
   SELECT 3, SYSDATE-3 FROM dual),
-- Sample data above; the query starts here:
unpiv AS
  (SELECT *
   FROM INVOLVE_EVER
   UNPIVOT (minimum_date FOR grp_id IN (GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
-- qcsj_c000000001000000 is presumably the name Oracle generates for cohort.USER_ID
-- after the join, since the pivot consumes table_2's USER_ID column.
SELECT qcsj_c000000001000000 AS user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
  ON cohort.USER_ID = table_2.USER_ID
PIVOT (MAX(CASE WHEN minimum_date <= cohort.ANCHOR_DATE THEN 1 ELSE 0 END) AS MINIMUM_DATE
       FOR grp_id IN (1 AS GRP1, 2 AS GRP2, 3 AS GRP3, 4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
|       3 |                   |                   |                   |                   |
|       1 |                 0 |                 0 |                 0 |                 0 |
|       2 |                 0 |                 1 |                 1 |                 1 |
This way you only have to write your calculation logic once (see the line starting with PIVOT).

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
|       1 | 00001     |          20 |              78 |
|       2 | 00001     |         110 |             124 |
|       3 | 00001     |         110 |             124 |
|       4 | 00002     |           3 |              18 |
|       5 | 00002     |           3 |              25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
|       4 | 00002     |           3 |              18 |
|       5 | 00002     |           3 |              25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
    SELECT Item_Type
    FROM Table1
    GROUP BY Item_Type
    HAVING COUNT(DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH cte_count_group AS (
    SELECT
        td.Item_ID,
        td.Item_Type,
        td.Location_ID,
        td.Sub_Location_ID,
        cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
        cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
    FROM #TestData td
)
SELECT
    cg.Item_ID,
    cg.Item_Type,
    cg.Location_ID,
    cg.Sub_Location_ID
FROM cte_count_group cg
WHERE cg.cnt_grp_2 > 1
  AND cg.cnt_grp_3 < cg.cnt_grp_2;
You can use EXISTS:
SELECT t.*
FROM Table1 t
WHERE EXISTS (
    SELECT 1
    FROM Table1 t1
    WHERE t.Item_Type = t1.Item_Type AND
          t.Location_ID = t1.Location_ID AND
          t.Sub_Location_ID <> t1.Sub_Location_ID
);
SQL Server has no vector (multi-column) IN, so you can emulate it with a concatenation trick, assuming '#' is an illegal character for Item_Type:
SELECT *
FROM Table1
WHERE Item_Type + '#' + CAST(Location_ID AS VARCHAR(20)) IN (
    SELECT Item_Type + '#' + CAST(Location_ID AS VARCHAR(20))
    FROM Table1
    GROUP BY Item_Type, Location_ID
    HAVING COUNT(DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.
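A hedged alternative that sidesteps the concatenation entirely (a sketch against the same Table1): join to the grouped duplicate keys, which keeps the predicate simple and also adds the Location_ID condition the original query was missing:
SELECT t.*
FROM Table1 t
JOIN (
    SELECT Item_Type, Location_ID
    FROM Table1
    GROUP BY Item_Type, Location_ID
    HAVING COUNT(DISTINCT Sub_Location_ID) > 1
) dup ON dup.Item_Type = t.Item_Type
     AND dup.Location_ID = t.Location_ID;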
I think you can use EXISTS:
SELECT t1.*
FROM Table1 t1
WHERE EXISTS (
    SELECT 1
    FROM Table1 tt1
    WHERE tt1.Item_Type = t1.Item_Type AND
          tt1.Location_ID = t1.Location_ID AND
          tt1.Sub_Location_ID <> t1.Sub_Location_ID
);

SELECT: check the column of the max row

Here are the rows returned by my first select:
SELECT user.id,
       analytic_youtube_demographic.age,
       analytic_youtube_demographic.percent
FROM `user`
INNER JOIN analytic
    ON analytic.user_id = user.id
INNER JOIN analytic_youtube_demographic
    ON analytic_youtube_demographic.analytic_id = analytic.id
Result:
-------------------------
| id | Age   | Percent  |
-------------------------
|  1 | 13-17 |    19.6  |
|  1 | 18-24 |    38.4  |
|  1 | 25-34 |    22.5  |
|  1 | 35-44 |    11.5  |
|  1 | 45-54 |     5.3  |
|  1 | 55-64 |     1.6  |
|  1 | 65+   |     1.2  |
|  2 | 13-17 |    10    |
|  2 | 18-24 |    10    |
|  2 | 25-34 |    25    |
|  2 | 35-44 |     5    |
|  2 | 45-54 |    25    |
|  2 | 55-64 |     5    |
|  2 | 65+   |    20    |
-------------------------
The max value by user_id:
-------------------------
| id | Age   | Percent  |
-------------------------
|  1 | 18-24 |    38.4  |
|  2 | 45-54 |    25    |
|  2 | 25-34 |    25    |
-------------------------
And I need to filter Age in ['25-34', '65+']. I must have at the end:
------
| id |
------
|  2 |
------
I have tried to use MAX(analytic_youtube_demographic.percent), but I don't know how to filter by the age as well. Thanks a lot for your help.
You can use the rank() function to identify the largest percentage values within each user's data set, and then a simple WHERE clause to get the entries that are both of the highest rank and belong to one of the specific demographics you're interested in. Since windowed functions like rank() can't be used in a WHERE clause, this is a two-step process with a subquery or a CTE. Something like this ought to do it:
-- Sample data from the question:
create table [user] (id bigint);
insert [user] values
(1), (2);
create table analytic (id bigint, [user_id] bigint);
insert analytic values
(1, 1), (2, 2);
create table analytic_youtube_demographic (analytic_id bigint, age varchar(32), [percent] decimal(5, 2));
insert analytic_youtube_demographic values
(1, '13-17', 19.6),
(1, '18-24', 38.4),
(1, '25-34', 22.5),
(1, '35-44', 11.5),
(1, '45-54', 5.3),
(1, '55-64', 1.6),
(1, '65+', 1.2),
(2, '13-17', 10),
(2, '18-24', 10),
(2, '25-34', 25),
(2, '35-44', 5),
(2, '45-54', 25),
(2, '55-64', 5),
(2, '65+', 20);
-- First, within the set of records for each user.id, use the rank() function to
-- identify the demographics with the highest percentage.
with RankedDataCTE as
(
select
[user].id,
youtube.age,
youtube.[percent],
[rank] = rank() over (partition by [user].id order by youtube.[percent] desc)
from
[user]
inner join analytic on analytic.[user_id] = [user].id
inner join analytic_youtube_demographic youtube on youtube.analytic_id = analytic.id
)
-- Now select only those records that are (a) of the highest rank within their
-- user.id and (b) either the '25-34' or the '65+' age group.
select
id,
age,
[percent]
from
RankedDataCTE
where
[rank] = 1 and
age in ('25-34', '65+');
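If only the matching user ids are needed, which is the asker's final output, the closing SELECT above can be reduced to a DISTINCT projection; a small variation on the same RankedDataCTE:
select distinct id
from RankedDataCTE
where [rank] = 1 and
      age in ('25-34', '65+');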

Update table in Postgresql by grouping rows

I want to update a table by grouping (or combining) some rows together based on a certain criteria. I basically have this table currently (I want to group by 'id_number' and 'date' and sum 'count'):
Table: foo
-----------------------------
| id_number | date | count |
-----------------------------
|         1 | 2001 |     1 |
|         1 | 2001 |     2 |
|         1 | 2002 |     1 |
|         2 | 2001 |     6 |
|         2 | 2003 |    12 |
|         2 | 2003 |     2 |
-----------------------------
And I want to get this:
Table: foo
-----------------------------
| id_number | date | count |
-----------------------------
|         1 | 2001 |     3 |
|         1 | 2002 |     1 |
|         2 | 2001 |     6 |
|         2 | 2003 |    14 |
-----------------------------
I know that I can easily create a new table with the pertinent info, but how can I modify an existing table like this without making a "temp" table? (Note: I have nothing against using a temporary table; I'm just interested in seeing if I can do it this way.)
If you want to delete rows, you can add a primary key (to distinguish rows) and use two statements: an UPDATE for the sum and a DELETE to end up with fewer rows.
You can do something like this:
create table foo (
id integer primary key,
id_number integer,
date integer,
count integer
);
insert into foo values
  (1, 1, 2001, 1),
  (2, 1, 2001, 2),
  (3, 1, 2002, 1),
  (4, 2, 2001, 6),
  (5, 2, 2003, 12),
  (6, 2, 2003, 2);
select * from foo;
update foo
set count = count_sum
from (
    select id, id_number, date,
           sum(count) over (partition by id_number, date) as count_sum
    from foo
) foo_added
where foo.id_number = foo_added.id_number
  and foo.date = foo_added.date;
delete from foo
using (
    select id, id_number, date,
           row_number() over (partition by id_number, date order by id) as inner_order
    from foo
) foo_ranked
where foo.id = foo_ranked.id
  and foo_ranked.inner_order <> 1;
select * from foo;
You can try it here: http://rextester.com/PIL12447
With only one UPDATE (but with a trigger), you can set count to NULL and have the trigger DELETE the row in that case:
create table foo (
id integer primary key,
id_number integer,
date integer,
count integer
);
create function delete_if_count_is_null() returns trigger
language plpgsql as
$BODY$
begin
    if new.count is null then
        delete from foo
        where id = new.id;
    end if;
    return new;
end;
$BODY$;
create trigger delete_if_count_is_null
after update on foo
for each row
execute procedure delete_if_count_is_null();
insert into foo values
  (1, 1, 2001, 1),
  (2, 1, 2001, 2),
  (3, 1, 2002, 1),
  (4, 2, 2001, 6),
  (5, 2, 2003, 12),
  (6, 2, 2003, 2);
select * from foo;
update foo
set count = case when inner_order = 1 then count_sum else null end
from (
    select id, id_number, date,
           sum(count) over (partition by id_number, date) as count_sum,
           row_number() over (partition by id_number, date order by id) as inner_order
    from foo
) foo_added
where foo.id_number = foo_added.id_number
  and foo.date = foo_added.date
  and foo.id = foo_added.id;
select * from foo;
You can try it in: http://rextester.com/MWPRG10961
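For completeness, a single-statement alternative without a trigger is possible with data-modifying CTEs, assuming PostgreSQL 9.1 or later; a sketch against the same foo table. The UPDATE and the DELETE touch disjoint sets of rows (survivors vs. the rest), which keeps the statement well-defined:
with sums as (
    -- one surviving row (lowest id) per group, plus the group total
    select min(id) as keep_id, id_number, date, sum(count) as total
    from foo
    group by id_number, date
),
upd as (
    -- write each group's total onto its surviving row
    update foo
    set count = s.total
    from sums s
    where foo.id = s.keep_id
    returning foo.id
)
-- remove every non-surviving row
delete from foo
where id not in (select keep_id from sums);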

Count Based on Columns in SQL Server

I have 3 tables:
SELECT id, letter
FROM As
+----+--------+
| id | letter |
+----+--------+
|  1 | A      |
|  2 | B      |
+----+--------+
SELECT id, letter
FROM Xs
+----+--------+
| id | letter |
+----+--------+
|  1 | X      |
|  2 | Y      |
|  3 | Z      |
+----+--------+
SELECT id, As_id, Xs_id
FROM A_X
+----+-------+-------+
| id | As_id | Xs_id |
+----+-------+-------+
|  9 |     1 |     1 |
| 10 |     1 |     2 |
| 11 |     2 |     3 |
| 12 |     1 |     2 |
| 13 |     2 |     3 |
| 14 |     1 |     1 |
+----+-------+-------+
I can count all As and Bs with GROUP BY, but I want to count As and Bs broken down by X, Y and Z. What I want to get is below:
+---+-------+
|   | X,Y,Z |
+---+-------+
| A | 2,2,0 |
| B | 0,0,2 |
+---+-------+
What is the best way to do this in MS SQL? Is using a foreach an efficient way, for example?
edit: It is not a duplicate because I wanted to know the efficient way, not just any way.
For what you're trying to do, without knowing what is inefficient about your current code (because none was provided), a pivot is best. There are a million resources online and here in the Stack Overflow Q/A forums to find what you need; this is probably the simplest explanation of a pivot, and the one I frequently use to remind myself of its complicated syntax.
To specifically answer your question, this is the code that shows how the approach linked above applies to your question.
First, the tables needed to be created:
DECLARE #AS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE #XS AS TABLE (ID INT, LETTER VARCHAR(1))
DECLARE #XA AS TABLE (ID INT, AsID INT, XsID INT)
Values were added to the tables
INSERT INTO #AS (ID, Letter)
SELECT 1,'A'
UNION
SELECT 2,'B'
INSERT INTO #XS (ID, Letter)
SELECT 1,'X'
UNION
SELECT 2,'Y'
UNION
SELECT 3,'Z'
INSERT INTO #XA (ID, ASID, XSID)
SELECT 9,1,1
UNION
SELECT 10,1,2
UNION
SELECT 11,2,3
UNION
SELECT 12,1,2
UNION
SELECT 13,2,3
UNION
SELECT 14,1,1
Then the query which does the pivot is constructed:
SELECT LetterA, [X],[Y],[Z]
FROM (SELECT A.LETTER AS LetterA
,B.LETTER AS LetterX
,C.ID
FROM #XA C
JOIN #AS A
ON A.ID = C.ASID
JOIN #XS B
ON B.ID = C.XSID
) Src
PIVOT (COUNT(ID)
FOR LetterX IN ([X],[Y],[Z])
) AS PVT
When executed, your results are as follows:
LetterA  X  Y  Z
A        2  2  0
B        0  0  2
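If the asker's literal '2,2,0' presentation is wanted, the pivoted counts can be concatenated afterwards; a sketch reusing the temp tables created above (CONCAT assumes SQL Server 2012+):
SELECT LetterA,
       CONCAT([X], ',', [Y], ',', [Z]) AS [X,Y,Z]
FROM (SELECT A.LETTER AS LetterA, B.LETTER AS LetterX, C.ID
      FROM #XA C
      JOIN #AS A ON A.ID = C.ASID
      JOIN #XS B ON B.ID = C.XSID) Src
PIVOT (COUNT(ID) FOR LetterX IN ([X], [Y], [Z])) AS PVT;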
As I said in the comment, just join and do a simple pivot:
if object_id('tempdb..#AAs') is not null drop table #AAs
create table #AAs(id int, letter nvarchar(5))
if object_id('tempdb..#XXs') is not null drop table #XXs
create table #XXs(id int, letter nvarchar(5))
if object_id('tempdb..#A_X') is not null drop table #A_X
create table #A_X(id int, AAs int, XXs int)
insert into #AAs (id, letter) values (1, 'A'), (2, 'B')
insert into #XXs (id, letter) values (1, 'X'), (2, 'Y'), (3, 'Z')
insert into #A_X (id, AAs, XXs)
values (9, 1, 1),
(10, 1, 2),
(11, 2, 3),
(12, 1, 2),
(13, 2, 3),
(14, 1, 1)
select LetterA,
ISNULL([X], 0) [X],
ISNULL([Y], 0) [Y],
ISNULL([Z], 0) [Z]
from (
select distinct a.letter [LetterA], x.letter [LetterX],
count(*) over (partition by a.letter, x.letter order by a.letter) [Counted]
from #A_X ax
join #AAs A on ax.AAs = A.ID
join #XXs X on ax.XXs = X.ID
)src
PIVOT
(
MAX ([Counted]) for LetterX in ([X], [Y], [Z])
) piv
You get the result you asked for:
LetterA  X  Y  Z
A        2  2  0
B        0  0  2