GROUP BY returns wrong count in Snowflake

Below is a row from a Snowflake query. It is the only row in the result with these values (i.e., this row is unique).
ID ACCOUNT_NUMBER DATE_1 DATE_2
123 347 2017-10-19 2017-10-29
I ran a GROUP BY like below to count the number of rows in each group. I got 3 for the above row. Shouldn't I get 1?
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TABLE GROUP BY 1, 2, 3, 4;
ID ACCOUNT_NUMBER DATE_1 DATE_2 COUNT
123 347 2017-10-19 2017-10-29 3
I expected to see count of 1 for this row, but I got 3.

The result is correct. DISTINCT is applied after the grouping and has no effect in the provided query.
From the docs:
Typically, a SELECT statement's clauses are evaluated in the order shown below:
From
Where
Group by
Having
Window
QUALIFY
Distinct
Order by
Limit
Both queries below produce the same result:
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
To apply DISTINCT before the grouping, put it in a subquery:
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM (SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2 FROM TAB)
GROUP BY 1, 2, 3, 4;
or make it part of the aggregate function:
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2,
COUNT(DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2)
FROM TAB
GROUP BY 1, 2, 3, 4;
For sample data:
CREATE OR REPLACE TABLE TAB(ID INT,
ACCOUNT_NUMBER INT,
DATE_1 TEXT,
DATE_2 TEXT)
AS
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29';
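The evaluation order described above can be checked with a small sketch. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for Snowflake, and the function name `run_counts` is made up for this illustration. For these two queries the relevant behavior is the same: DISTINCT runs after GROUP BY, so it only changes the count when it is applied before the grouping.

```python
import sqlite3


def run_counts():
    """Compare DISTINCT applied after grouping with DISTINCT applied before it."""
    # Assumption: SQLite stands in for Snowflake in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (id INT, account_number INT, date_1 TEXT, date_2 TEXT)"
    )
    # Three identical rows, as in the sample data above.
    conn.executemany(
        "INSERT INTO tab VALUES (?, ?, ?, ?)",
        [(123, 347, "2017-10-19", "2017-10-29")] * 3,
    )
    # DISTINCT here sees the already grouped row, so COUNT(*) stays 3.
    with_distinct = conn.execute(
        "SELECT DISTINCT id, account_number, date_1, date_2, COUNT(*) "
        "FROM tab GROUP BY id, account_number, date_1, date_2"
    ).fetchall()
    # DISTINCT in a subquery removes the duplicates before they are counted.
    dedup_first = conn.execute(
        "SELECT id, account_number, date_1, date_2, COUNT(*) "
        "FROM (SELECT DISTINCT id, account_number, date_1, date_2 FROM tab) "
        "GROUP BY id, account_number, date_1, date_2"
    ).fetchall()
    conn.close()
    return with_distinct, dedup_first


if __name__ == "__main__":
    print(run_counts())
```

The first query reports a count of 3 and the second a count of 1 for the same logical row.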

Related

Obtaining one description from multiple domain tables with Oracle

How can I merge descriptions obtained from three domain tables into a single description column?
There is a transaction table that has transaction ID's and three domain tables that between them have the descriptions for the transaction IDs - something like this:
Transaction Table: TRANS_TBL with columns like TRANS_ID, TRANS_START_TM, TRANS_END_TM, TRANS_RESULT_CD
Domain Table 1: DMN_TANS_DESC_TBL1 with columns like TRANS_ID, DMN1_SHORT_DESC, DMN1_LONG_DESC
Domain Table 2: DMN_TANS_DESC_TBL2 with columns like TRANS_ID, DMN2_SHORT_DESC, DMN2_LONG_DESC
Domain Table 3: DMN_TANS_DESC_TBL3 with columns like TRANS_ID, DMN3_SHORT_DESC, DMN3_LONG_DESC
The rows for the TRANS_ID and short descriptions are not unique in each table. A TRANS_ID may have multiple rows in a Domain Table.
Only the Short Description is needed; only one row for a TRANS_ID is wanted.
The column names for descriptions are different in each domain table.
Any given TRANS_ID will appear in only one domain table (I believe other things in the application will break if that is not true, but I don't see anything to enforce that).
Data needs to be extracted with the column headers like this:
TRANS_ID, TRANS_SHORT_DESC, TRANS_START_TM, TRANS_END_TM
No table modifications or additions are permitted.
Using this, the descriptions can be obtained:
select trns.TRANS_ID, dmn1.DMN1_SHORT_DESC, dmn2.DMN2_SHORT_DESC, dmn3.DMN3_SHORT_DESC
from TRANS_TBL trns
left join DMN_TANS_DESC_TBL1 dmn1 ON dmn1.TRANS_ID=trns.TRANS_ID
left join DMN_TANS_DESC_TBL2 dmn2 ON dmn2.TRANS_ID=trns.TRANS_ID
left join DMN_TANS_DESC_TBL3 dmn3 ON dmn3.TRANS_ID=trns.TRANS_ID;
However, that has two problems:
Duplicate rows for each domain table description row, and
Three description columns, two of the three NULL, for each row.
One description row from one domain table can be obtained with this:
select TRANS_ID, TRANS_DESC
from ( select dmn1.TRANS_ID, dmn1.DMN1_SHORT_DESC as "TRANS_DESC",
              row_number() over( partition by dmn1.TRANS_ID ORDER by dmn1.TRANS_ID) as row_num
       from DMN_TANS_DESC_TBL1 dmn1
     )
where row_num=1;
But I haven't found a way to bring those descriptions from the 3 domain tables into a single transaction ID description column.

You can simply use COALESCE or nested NVL or a DECODE or a CASE statement to combine the columns into one. Something like:
SELECT trans_id,
       trans_desc
FROM (SELECT trans_id,
             trans_desc,
             ROW_NUMBER() OVER (PARTITION BY trans_id ORDER BY DECODE(trans_desc,NULL,2,1) ASC, trans_id DESC) seq
      FROM (SELECT trns.trans_id,
                   COALESCE(dmn1.dmn1_short_desc,dmn2.dmn2_short_desc,dmn3.dmn3_short_desc) trans_desc
            FROM trans_tbl trns
            left join DMN_TANS_DESC_TBL1 dmn1 ON dmn1.TRANS_ID=trns.TRANS_ID
            left join DMN_TANS_DESC_TBL2 dmn2 ON dmn2.TRANS_ID=trns.TRANS_ID
            left join DMN_TANS_DESC_TBL3 dmn3 ON dmn3.TRANS_ID=trns.TRANS_ID))
WHERE seq = 1;
The DECODE in the ROW_NUMBER logic is in order to prefer non-null values over null values.
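As a rough, runnable illustration of the COALESCE plus ROW_NUMBER idea, here is a sketch that assumes SQLite (3.25 or later, for window functions) through Python's `sqlite3` rather than Oracle. The short table names `dmn1`, `dmn2`, `dmn3`, the sample rows and the function name `one_desc_per_trans` are invented for the example, and a CASE expression replaces DECODE, which SQLite does not provide.

```python
import sqlite3


def one_desc_per_trans():
    """Return one (trans_id, description) row per transaction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trans_tbl (trans_id INT);
        CREATE TABLE dmn1 (trans_id INT, dmn1_short_desc TEXT);
        CREATE TABLE dmn2 (trans_id INT, dmn2_short_desc TEXT);
        CREATE TABLE dmn3 (trans_id INT, dmn3_short_desc TEXT);
        INSERT INTO trans_tbl VALUES (1), (2), (3), (4);
        -- trans_id 1 is duplicated in its domain table; 4 has no description
        INSERT INTO dmn1 VALUES (1, 'D1 T1'), (1, 'D1 T1');
        INSERT INTO dmn2 VALUES (2, 'D2 T2');
        INSERT INTO dmn3 VALUES (3, 'D3 T3');
        """
    )
    rows = conn.execute(
        """
        SELECT trans_id, trans_desc
        FROM (SELECT trans_id, trans_desc,
                     ROW_NUMBER() OVER (
                         PARTITION BY trans_id
                         -- non-null descriptions sort first
                         ORDER BY CASE WHEN trans_desc IS NULL THEN 2 ELSE 1 END
                     ) AS seq
              FROM (SELECT t.trans_id,
                           COALESCE(d1.dmn1_short_desc,
                                    d2.dmn2_short_desc,
                                    d3.dmn3_short_desc) AS trans_desc
                    FROM trans_tbl t
                    LEFT JOIN dmn1 d1 ON d1.trans_id = t.trans_id
                    LEFT JOIN dmn2 d2 ON d2.trans_id = t.trans_id
                    LEFT JOIN dmn3 d3 ON d3.trans_id = t.trans_id))
        WHERE seq = 1
        ORDER BY trans_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in one_desc_per_trans():
        print(row)
```

Each transaction comes back once, and a transaction with no description in any domain table is kept with a NULL description because of the left joins.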

With sample data like this:
WITH
tbl (TRANS_ID, TRANS_START_TM, TRANS_END_TM) AS
(
Select 1, '09:00:00' , '09:01:00' From Dual Union All
Select 2, '09:12:00' , '09:15:00' From Dual Union All
Select 3, '09:16:00' , '09:17:00' From Dual Union All
Select 4, '09:21:00' , '09:22:00' From Dual Union All
Select 5, '09:23:00' , '09:27:00' From Dual
),
desc_tbl_1 (TRANS_ID, DMN1_SHORT_DESC) AS
(
Select 1, 'D1 T1 - some desc' From Dual Union All
Select 1, 'D1 T1 - some desc' From Dual
),
desc_tbl_2 (TRANS_ID, DMN2_SHORT_DESC) AS
(
Select 2, 'D2 T2 - some desc' From Dual Union All
Select 2, 'D2 T2 - some desc' From Dual Union All
Select 3, 'D2 T3 - some desc' From Dual
),
desc_tbl_3 (TRANS_ID, DMN3_SHORT_DESC) AS
(
Select 4, 'D3 T4 - some desc' From Dual Union All
Select 5, 'D3 T5 - some desc ' From Dual
),
... you could create a CTE, descriptions, to collect them in one column:
descriptions (TRANS_ID, TRANS_SHORT_DESC, RN) AS
(
Select TRANS_ID, DMN1_SHORT_DESC, ROW_NUMBER() OVER(Partition By TRANS_ID Order By 1) From desc_tbl_1 Union All
Select TRANS_ID, DMN2_SHORT_DESC, ROW_NUMBER() OVER(Partition By TRANS_ID Order By 1) From desc_tbl_2 Union All
Select TRANS_ID, DMN3_SHORT_DESC, ROW_NUMBER() OVER(Partition By TRANS_ID Order By 1) From desc_tbl_3
)
Main SQL
Select t.TRANS_ID, d.TRANS_SHORT_DESC, t.TRANS_START_TM, t.TRANS_END_TM
From tbl t
Inner Join descriptions d ON(d.TRANS_ID = t.TRANS_ID And d.RN = 1)
Result:
TRANS_ID  TRANS_SHORT_DESC   TRANS_START_TM  TRANS_END_TM
1         D1 T1 - some desc  09:00:00        09:01:00
2         D2 T2 - some desc  09:12:00        09:15:00
3         D2 T3 - some desc  09:16:00        09:17:00
4         D3 T4 - some desc  09:21:00        09:22:00
5         D3 T5 - some desc  09:23:00        09:27:00
NOTE - If some ID may have no description in any of the 3 domain tables, use a Left Join and handle the null value.

how to select first non-duplicate data in sql

I have a table (Id, Name, Type) in SQL.
Id, Name, Type:
1, AA, 1
2, BB, 2
3, CC, 4
4, DD, 2
5, EE, 3
6, FF, 3
I want to select the first non-duplicate data. Result:
Id, Name, Type:
1, AA, 1
2, BB, 2
3, CC, 4
6, FF, 3
I tried DISTINCT and GROUP BY, but that does not work: I need to select the whole row, not just Type.
select DISTINCT Type
from tbltest

I like CTEs and ROW_NUMBER since they make it easy to change the query later to delete the duplicates.
Presuming that you want to remove duplicate Types and that "first" means according to the ID:
WITH CTE AS(
SELECT Id, Name, Type,
RN = ROW_NUMBER() OVER ( PARTITION BY Type ORDER BY ID )
FROM dbo.Table1
)
SELECT Id, Name, Type FROM CTE WHERE RN = 1

You can do this in several ways. My preference is row_number():
select id, name, type
from (select t.*, row_number() over (partition by type order by id) as seqnum
from tbltest t
) t
where seqnum = 1;
EDIT:
Performance of the above should be reasonable. However, the following might be faster with an index on type, id:
select id, name, type
from tbltest t
where not exists (select 1 from tbltest t2 where t2.type = t.type and t2.id < t.id);
That is, select the rows that have no lower id for the same type.
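To see that the two approaches agree, here is a small sketch that assumes SQLite (3.25+) through Python's `sqlite3` instead of SQL Server; the function name `first_per_type` is made up for the illustration, and the rows are the ones from the question.

```python
import sqlite3


def first_per_type():
    """Return the lowest-id row per type, computed two different ways."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbltest (id INT, name TEXT, type INT)")
    conn.executemany(
        "INSERT INTO tbltest VALUES (?, ?, ?)",
        [(1, "AA", 1), (2, "BB", 2), (3, "CC", 4),
         (4, "DD", 2), (5, "EE", 3), (6, "FF", 3)],
    )
    # Variant 1: number the rows inside each type and keep the first one.
    by_row_number = conn.execute(
        """
        SELECT id, name, type
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) AS seqnum
              FROM tbltest t)
        WHERE seqnum = 1
        ORDER BY id
        """
    ).fetchall()
    # Variant 2: keep rows for which no lower id exists with the same type.
    by_not_exists = conn.execute(
        """
        SELECT id, name, type
        FROM tbltest t
        WHERE NOT EXISTS (SELECT 1 FROM tbltest t2
                          WHERE t2.type = t.type AND t2.id < t.id)
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return by_row_number, by_not_exists


if __name__ == "__main__":
    print(first_per_type())
```

With the question's data both variants keep ids 1, 2, 3 and 5, since "first" is taken to mean the lowest id within each type.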

Oracle SQL: Adding row to select result

Say I have a table with columns: id, group_id, type, val
Some example data from the select:
1, 1, 'budget', 100
2, 1, 'budget adjustment', 10
3, 2, 'budget', 500
4, 2, 'budget adjustment', 30
I want the result to look like
1, 1, 'budget', 100
2, 1, 'budget adjustment', 10
5, 1, 'budget total', 110
3, 2, 'budget', 500
4, 2, 'budget adjustment', 30
6, 2, 'budget total', 530
Please advise,
Thanks.

This will get you the two added lines you want, but not the values for ID and type that you want.
Oracle examples: http://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm
Select id, group_id, type as myType, sum(val) as sumVal
FROM table_name
Group by Grouping sets ((id, group_id, type, val), (group_ID))

As @Serpiton suggested, it seems the functionality you're really looking for is the ability to add sub-totals to your result set, which indicates that rollup is what you need. The usage would be something like this:
SELECT id,
group_id,
coalesce(type, 'budget total') as type,
sum(val) as val
FROM your_table
GROUP BY group_id, ROLLUP ((id, type))

You can use UNION ALL to add more rows to the original select.
select group_id,type,val from tableA
union all
select group_id, 'budget total' as type,sum(val) as val from tableA group by group_id
To show the right order and an id, you can use a nested select:
select rownum, group_id,type,val from (select group_id,type,val from tableA
union all
select group_id, 'budget total' as type,sum(val) as val from tableA group by group_id
order by group_id, type)
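A minimal sketch of the UNION ALL approach, assuming SQLite through Python's `sqlite3` in place of Oracle. The table name `budget` and the function name `with_totals` are invented for the example, and Python's `enumerate` stands in for Oracle's `rownum`.

```python
import sqlite3


def with_totals():
    """Return the detail rows plus one 'budget total' row per group_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE budget (id INT, group_id INT, type TEXT, val INT)")
    conn.executemany(
        "INSERT INTO budget VALUES (?, ?, ?, ?)",
        [(1, 1, "budget", 100), (2, 1, "budget adjustment", 10),
         (3, 2, "budget", 500), (4, 2, "budget adjustment", 30)],
    )
    rows = conn.execute(
        """
        SELECT group_id, type, val
        FROM (SELECT group_id, type, val FROM budget
              UNION ALL
              -- one total per group: group by group_id only
              SELECT group_id, 'budget total', SUM(val)
              FROM budget
              GROUP BY group_id)
        ORDER BY group_id, type
        """
    ).fetchall()
    conn.close()
    # Number the rows after sorting, playing the role of rownum.
    return [(n, *row) for n, row in enumerate(rows, start=1)]


if __name__ == "__main__":
    for row in with_totals():
        print(row)
```

Sorting by type happens to put 'budget total' last within each group here because of alphabetical order; a different set of labels would need an explicit sort key.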

with foo as
(select 1 group_id, 'budget' type, 100 val
from dual
union
select 1, 'budget adjustment', 10
from dual
union
select 2, 'budget', 500
from dual
union
select 2, 'budget adjustment', 30
from dual)
SELECT rank() over(order by type, group_id) rk,
group_id,
nvl(type, 'budget total') as type,
sum(val) as val
FROM foo
group by Grouping sets((group_id, type, val),(group_id))
It's just a continuation of xQbert's post, to get id values.

Multiple row's columns in one row's multiple columns

Table Schema
ID Status Patient
1 critical Gabriel
1 moderate Frank
1 critical Dorin
2 low Peter
3 critical Noman
3 moderate Johnson
Expected Output
ID Patient1 Patient2
1 Gabriel Dorin
3 Noman Null
Here I have to show only those patients whose status is critical.
I found the similar question "Multiple column values in a single row", but it's in SQL and the columns are hard-coded.
Thanks!

First step is to select the critical patients and order them:
select id, patient, row_number() over (partition by id order by patient) as rnk
from your_table
where status='critical';
After this you can select first two critical patients in this manner:
select id,
max(case when rnk=1 then patient end) as Patient1,
max(case when rnk=2 then patient end) as Patient2
from (
select id,
patient,
row_number() over (partition by id order by patient) as rnk
from your_table
where status='critical'
)
group by id;
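The two-step idea above (rank the critical patients, then spread the ranks across columns) can be tried with this sketch. It assumes SQLite (3.25+) through Python's `sqlite3` rather than Oracle, and the function name `two_critical_patients` is made up for the illustration.

```python
import sqlite3


def two_critical_patients():
    """Return (id, Patient1, Patient2) for ids that have critical patients."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INT, status TEXT, patient TEXT)")
    conn.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [(1, "critical", "Gabriel"), (1, "moderate", "Frank"),
         (1, "critical", "Dorin"), (2, "low", "Peter"),
         (3, "critical", "Noman"), (3, "moderate", "Johnson")],
    )
    rows = conn.execute(
        """
        SELECT id,
               MAX(CASE WHEN rnk = 1 THEN patient END) AS Patient1,
               MAX(CASE WHEN rnk = 2 THEN patient END) AS Patient2
        FROM (SELECT id, patient,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY patient) AS rnk
              FROM your_table
              WHERE status = 'critical')
        GROUP BY id
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(two_critical_patients())
```

Because the ranking orders by patient name, Dorin lands in Patient1 and Gabriel in Patient2 for id 1; a different ORDER BY inside the window would change which patient fills which column.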
If you want a more flexible solution you can try a query like the one below, but you still have to fix the number of ranks before runtime:
with your_table as
(select 1 as id, 'critical' as status, 'Gabriel' as patient from dual
union all
select 1, 'moderate', 'Frank' from dual union all
select 1, 'critical', 'Dorin' from dual union all
select 1, 'critical', 'Vasile' from dual union all
select 2, 'low', 'Peter' from dual union all
select 3, 'critical', 'Noman' from dual union all
select 3, 'moderate', 'Johnson' from dual )
select * from (
select id, patient, row_number() over (partition by id order by patient) as rnk
from your_table
where status='critical'
)
pivot (max(patient) for rnk in (1, 2, 3))
order by 1 ;
(This is for three patients.)

Try building the query dynamically and opening the result as a cursor.
SET SERVEROUTPUT ON
DECLARE
v_fact NUMBER := 1;
v_max_cnt number:=1;
V_query CLOB:='';
BEGIN
select max(RNum) into v_max_cnt from(
select row_number() over (partition by ID order by ID) RNum from PATIENTSTATUS where status='critical'
)x;
FOR v_counter IN 1..v_max_cnt LOOP
V_query := V_query||v_fact||' as Patient'||v_fact||(case when v_fact=v_max_cnt then '' else ',' end);
v_fact:=v_fact+1;
END LOOP;
DBMS_OUTPUT.PUT_LINE ('select * from (
select id, patient, row_number() over (partition by id order by patient) as rnk
from PATIENTSTATUS
where status=''critical'')
pivot (max(patient) for rnk in ('||V_query||'))
order by 1;');
END;
From a procedure, the assembled statement (the full query text, not only the IN list held in V_query above) can be opened as a cursor with:
OPEN CUR_Your_Cursor FOR V_query;
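The same "count first, then generate the column list" idea can be sketched outside PL/SQL. This version assumes SQLite (3.25+) through Python's `sqlite3`, uses conditional aggregation because SQLite has no PIVOT clause, and the function names `pivot_critical` and `demo` are hypothetical.

```python
import sqlite3


def pivot_critical(conn):
    """Return (column_names, rows) with one PatientN column per rank."""
    max_cnt = conn.execute(
        "SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM patientstatus "
        "WHERE status = 'critical' GROUP BY id)"
    ).fetchone()[0]
    if max_cnt is None:  # no critical patients at all
        return ["id"], []
    # Only integers generated here are interpolated, never user input.
    cols = ", ".join(
        f"MAX(CASE WHEN rnk = {n} THEN patient END) AS Patient{n}"
        for n in range(1, max_cnt + 1)
    )
    sql = (
        f"SELECT id, {cols} FROM ("
        "  SELECT id, patient,"
        "         ROW_NUMBER() OVER (PARTITION BY id ORDER BY patient) AS rnk"
        "  FROM patientstatus WHERE status = 'critical'"
        ") GROUP BY id ORDER BY id"
    )
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()


def demo():
    """Build the sample table (with a third critical patient) and pivot it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patientstatus (id INT, status TEXT, patient TEXT)")
    conn.executemany(
        "INSERT INTO patientstatus VALUES (?, ?, ?)",
        [(1, "critical", "Gabriel"), (1, "moderate", "Frank"),
         (1, "critical", "Dorin"), (1, "critical", "Vasile"),
         (2, "low", "Peter"), (3, "critical", "Noman"),
         (3, "moderate", "Johnson")],
    )
    result = pivot_critical(conn)
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The number of output columns follows the data, at the cost of running one extra query to find the largest group before the pivot query is built.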

Select query select based on a priority

Someone please change my title to better reflect what I am trying to ask.
I have a table like
Table (id, value, value_type, data)
ID is NOT unique. There is no unique key.
value_type has two possible values, let's say A and B.
Type B is better than A, but often not available.
For each id if any records with value_type B exists, I want all the records with that id and value_type B.
If no record for that id with value_Type B exists I want all records with that id and value_type A.
Notice that if B exists for that id I don't want records with type A.
I currently do this with a series of temp tables. Is there a single select statement (sub queries OK) that can do the job?
Thanks so much!
Additional details:
SQL Server 2005

Use RANK rather than ROW_NUMBER, because you want ties (rows with the same value_type) to share the same rank value:
WITH summary AS (
SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM TABLE t
WHERE t.value_type IN ('A', 'B'))
SELECT s.id,
s.value,
s.value_type,
s.data
FROM summary s
WHERE s.rank = 1
Non CTE version:
SELECT s.id,
s.value,
s.value_type,
s.data
FROM (SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM TABLE t
WHERE t.value_type IN ('A', 'B')) s
WHERE s.rank = 1
Testing with sample data:
WITH test AS (
SELECT 1 AS id, 'B' AS value_type
UNION ALL
SELECT 1, 'B'
UNION ALL
SELECT 1, 'A'
UNION ALL
SELECT 2, 'A'
UNION ALL
SELECT 2, 'A'),
summary AS (
SELECT t.*,
RANK() OVER (PARTITION BY t.id
ORDER BY t.value_type DESC) AS rank
FROM test t)
SELECT *
FROM summary
WHERE rank = 1
I get:
id value_type rank
----------------------
1 B 1
1 B 1
2 A 1
2 A 1

SELECT *
FROM table
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM table
WHERE ID not in (SELECT distinct id
                 FROM table
                 WHERE value_type = 'B')

The shortest query to do the job I can think of:
SELECT TOP 1 WITH TIES *
FROM #test
ORDER BY Rank() OVER (PARTITION BY id ORDER BY value_type DESC)
This is about 50% worse on CPU than OMG Ponies' and Christoperous 5000's solutions, but has the same number of reads. The extra sort is what makes it take more CPU.
The best-performing original query I've come up with so far is:
SELECT *
FROM #test
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM #test T1
WHERE NOT EXISTS (
SELECT *
FROM #test T2
WHERE
T1.id = T2.id
AND T2.value_type = 'B'
)
This consistently beats all the others presented on CPU by about 1/3rd (the others are about 50% more) but has 3x the number of reads. The duration on this query is often 2/3rds the time of all the others. I consider it a good contender.
Indexes and data types could change everything.
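For a quick correctness check (not a performance comparison), the sketch below runs the RANK version and the UNION ALL / NOT EXISTS version on the same rows and returns both results. It assumes SQLite (3.25+) through Python's `sqlite3` rather than SQL Server 2005; the table name `test`, the sample rows and the function name `preferred_rows` are chosen for the illustration.

```python
import sqlite3

ROWS = [
    (1, "X", "A", 1), (1, "X", "A", 2), (1, "X", "A", 3), (1, "X", "A", 4),
    (2, "X", "A", 5), (2, "X", "B", 6), (2, "X", "B", 7),
    (2, "X", "A", 8), (2, "X", "A", 9),
]

# 'B' sorts after 'A', so ordering by value_type DESC ranks B rows first.
RANK_SQL = """
SELECT id, value_type, data
FROM (SELECT t.*,
             RANK() OVER (PARTITION BY id ORDER BY value_type DESC) AS rnk
      FROM test t)
WHERE rnk = 1
ORDER BY id, data
"""

# All B rows, plus every row of an id that has no B row at all.
UNION_SQL = """
SELECT id, value_type, data
FROM (SELECT id, value_type, data FROM test WHERE value_type = 'B'
      UNION ALL
      SELECT id, value_type, data FROM test t1
      WHERE NOT EXISTS (SELECT 1 FROM test t2
                        WHERE t2.id = t1.id AND t2.value_type = 'B'))
ORDER BY id, data
"""


def preferred_rows():
    """Run both queries on the same data and return both result lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (id INT, value TEXT, value_type TEXT, data INT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", ROWS)
    by_rank = conn.execute(RANK_SQL).fetchall()
    by_union = conn.execute(UNION_SQL).fetchall()
    conn.close()
    return by_rank, by_union


if __name__ == "__main__":
    print(preferred_rows())
```

Both queries return the four A rows for id 1 and only the two B rows for id 2. Relative CPU and read costs depend on the engine, the indexes and the data, so this sketch says nothing about which one is faster.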

declare @test as table(
id int, value [nvarchar](255), value_type [nvarchar](255), data int)
INSERT INTO @test
SELECT 1, 'X', 'A',1 UNION
SELECT 1, 'X', 'A',2 UNION
SELECT 1, 'X', 'A',3 UNION
SELECT 1, 'X', 'A',4 UNION
SELECT 2, 'X', 'A',5 UNION
SELECT 2, 'X', 'B',6 UNION
SELECT 2, 'X', 'B',7 UNION
SELECT 2, 'X', 'A',8 UNION
SELECT 2, 'X', 'A',9
SELECT * FROM @test x
INNER JOIN
(SELECT id, MAX(value_type) as value_type FROM
@test GROUP BY id) as y
ON x.id = y.id AND x.value_type = y.value_type

Try this (MSSQL).
Select id, value_typeB, null
from myTable
where value_typeB is not null
Union All
Select id, null, value_typeA
from myTable
where value_typeB is null and value_typeA is not null

Perhaps something like this:
select * from mytable
where id in (select distinct id from mytable where value_type = 'B')
union
select * from mytable
where id in (select distinct id from mytable where value_type = 'A'
and id not in (select distinct id from mytable where value_type = 'B'))

This uses a union, combining all records of value B with all records that have only A values:
SELECT *
FROM mainTable
WHERE value_type = 'B'
UNION ALL
SELECT *
FROM mainTable
WHERE value_type = 'A'
AND id NOT IN (SELECT id
               FROM mainTable
               WHERE value_type = 'B');
