Group and count by another column's value


I have a table like below:
CREATE TABLE public.test_table
(
"ID" serial PRIMARY KEY NOT NULL,
"CID" integer NOT NULL,
"SEG" integer NOT NULL,
"DDN" character varying(3) NOT NULL
)
and data looks like this:
ID CID SEG DDN
1 1 1 "711"
2 1 2 "800"
3 1 3 "124"
4 2 1 "711"
5 3 1 "711"
6 3 2 "802"
7 4 1 "799"
8 5 1 "799"
9 5 2 "804"
10 6 1 "799"
I need to group this data by the CID column and count the groups per DDN, where the DDN is taken from each group's first row (SEG = 1). The counts must distinguish two cases: whether the CID has exactly one row or more than one.
I'm sorry if that isn't clear. Let me show what I need:
DDN END TRA
711 1 2
799 2 1
As you can see, DDN 711 has 1 CID with a single row (ID 4). This is the END column.
It also has 2 CIDs with multiple SEG rows (IDs 1 to 3 and IDs 5 to 6). This is the TRA column.
I am not sure which column should go in the GROUP BY clause.
My solution:
I just found a solution like the one below:
WITH x AS (
SELECT
(SELECT t1."DDN" FROM public.test_table AS t1
WHERE t1."CID"=t."CID" AND t1."SEG"=1) AS ddn,
COUNT("CID") AS seg_count
FROM public.test_table AS t
GROUP BY "CID"
)
SELECT ddn, COUNT(seg_count) AS "TOTAL",
SUM(CASE WHEN x.seg_count=1 THEN 1 ELSE 0 END) as "END",
SUM(CASE WHEN x.seg_count>1 THEN 1 ELSE 0 END) as "TRA"
FROM x
GROUP BY ddn;

Equivalent, faster query:
SELECT "DDN"
, COUNT(*) AS "TOTAL"
, COUNT(*) FILTER (WHERE seg_count = 1) AS "END"
, COUNT(*) FILTER (WHERE seg_count > 1) AS "TRA"
FROM (
SELECT DISTINCT ON ("CID")
"DDN" -- assuming min "SEG" is always 1
, COUNT(*) OVER (PARTITION BY "CID") AS seg_count
FROM test_table
ORDER BY "CID", "SEG"
) sub
GROUP BY "DDN";
Notes
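Before Postgres 12, CTEs were optimization fences, so they were typically slower and best used only where needed. Since Postgres 12, a non-recursive CTE without side effects that is referenced only once can be inlined by the planner.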
This is equivalent to the query in the question assuming that the minimum "SEG" per "CID" is always 1 - since this query returns the row with the minimum "SEG" while your query returns the one with "SEG" = 1. Typically, you would want the "first" segment and my query implements this requirement more reliably, but that's not clear from the question.
COUNT(*) is slightly faster than COUNT(column) and equivalent as long as no NULL values are involved (which applies here). Related:
PostgreSQL: running count of rows for a query 'by minute'
About DISTINCT ON:
Select first row in each GROUP BY group?
The aggregate FILTER syntax requires Postgres 9.4+:
Conditional SQL count
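The counting logic above can be checked end to end on the sample data. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Postgres. It swaps in portable SUM(CASE ...) for FILTER and a MIN("SEG") subquery for DISTINCT ON, so it illustrates the logic rather than the exact query above. The names total, end_count and tra_count are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE test_table ("ID" INTEGER PRIMARY KEY, "CID" INTEGER NOT NULL, '
    '"SEG" INTEGER NOT NULL, "DDN" TEXT NOT NULL)'
)
rows = [
    (1, 1, "711"), (1, 2, "800"), (1, 3, "124"),
    (2, 1, "711"),
    (3, 1, "711"), (3, 2, "802"),
    (4, 1, "799"),
    (5, 1, "799"), (5, 2, "804"),
    (6, 1, "799"),
]
conn.executemany('INSERT INTO test_table ("CID", "SEG", "DDN") VALUES (?, ?, ?)', rows)

query = """
SELECT f."DDN",
       COUNT(*) AS total,
       SUM(CASE WHEN c.seg_count = 1 THEN 1 ELSE 0 END) AS end_count,
       SUM(CASE WHEN c.seg_count > 1 THEN 1 ELSE 0 END) AS tra_count
FROM (
    -- one row per CID: number of segments and the lowest SEG
    SELECT "CID", COUNT(*) AS seg_count, MIN("SEG") AS first_seg
    FROM test_table
    GROUP BY "CID"
) AS c
JOIN test_table AS f
  -- pick the DDN of the first segment; assumes one row per (CID, SEG)
  ON f."CID" = c."CID" AND f."SEG" = c.first_seg
GROUP BY f."DDN"
ORDER BY f."DDN"
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this prints ('711', 3, 1, 2) and ('799', 3, 2, 1), matching the END and TRA values requested in the question.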

Here is the solution I propose; the query can probably be simplified.
CREATE TABLE test_table
(
ID serial PRIMARY KEY NOT NULL,
CID integer NOT NULL,
SEG integer NOT NULL,
DDN character varying(3) NOT NULL
);
insert into test_table(CID,SEG,DDN)
values
( 1, 1, '711'),
( 1, 2, '800'),
( 1, 3, '124'),
( 2, 1, '711'),
( 3, 1, '711'),
( 3, 2, '802'),
( 4, 1, '799'),
( 5, 1, '799'),
( 5, 2, '804'),
( 6, 1, '799');
with ddn_t as (
-- order by seg so that row 1 is the first segment of each cid
select cid, ddn, row_number() over (partition by cid order by seg) as rn
from test_table
), summary as (
-- count rows per cid and attach the ddn of the first segment
select a.cid, count(*) as seg_count, b.ddn
from ddn_t a
join ddn_t b on b.cid = a.cid and b.rn = 1
group by a.cid, b.ddn
)
select ddn,
sum(case when seg_count > 1 then 1 else 0 end) as "TRA",
sum(case when seg_count = 1 then 1 else 0 end) as "END"
from summary
group by ddn;

Related

SQL Server exclusive select on column value

Let's say I am returning the following table from a select
CaseId  DocId  DocumentTypeId  DocumentType      ExpirationDate
1       1      1               I797              01/02/23
1       2      2               I94               01/02/23
1       3      3               Some Other Value  01/02/23
I want to select ONLY the row with DocumentType = 'I797', then if there is no 'I797', I want to select ONLY the row where DocumentType = 'I94'; failing to find either of those two I want to take all rows with any other value of DocumentType.
Using SQL Server ideally.
I think I'm looking for an XOR clause but can't work out how to do that in SQL Server or to get all other values.

Similar to @siggemannen's answer:
select top 1 with ties
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end gr
,docs.*
from docs
order by
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end
Shortest:
select top 1 with ties
docs.*
from docs
order by
case when DocumentType='I797' then 1
when DocumentType='I94' then 2
else 3
end

Something like this perhaps:
select *
from (
select t.*, DENSE_RANK() OVER(ORDER BY CASE WHEN DocumentType = 'I797' THEN 0 WHEN DocumentType = 'I94' THEN 1 ELSE 2 END) AS prioorder
from
(
VALUES
(1, 1, 1, N'I797', N'01/02/23')
, (1, 2, 2, N'I94', N'01/02/23')
, (1, 3, 3, N'Some Other Value', N'01/02/23')
, (1, 4, 3, N'Super Sekret', N'01/02/23')
) t (CaseId,DocId,DocumentTypeId,DocumentType,ExpirationDate)
) x
WHERE x.prioorder = 1
The idea is to rank rows 1, 2 or 3 depending on document type and keep only the best rank present. Since all "the rest" rows share the same rank, you get all of them if both I797 and I94 are missing.
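The priority idea can be tried out on a few inputs. The following is a minimal sketch, assuming Python's built-in sqlite3 module in place of SQL Server; it keeps the rows whose priority equals the minimum priority present, a swapped-in equivalent of the DENSE_RANK filter. The helper pick_documents and the column prio are hypothetical names.

```python
import sqlite3

def pick_documents(doc_types):
    """Return the document types that share the best priority present."""
    # In-memory SQLite database used as a stand-in for SQL Server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (DocId INTEGER PRIMARY KEY, DocumentType TEXT)")
    conn.executemany(
        "INSERT INTO docs (DocumentType) VALUES (?)", [(t,) for t in doc_types]
    )
    query = """
    WITH ranked AS (
        SELECT DocId, DocumentType,
               CASE WHEN DocumentType = 'I797' THEN 0
                    WHEN DocumentType = 'I94' THEN 1
                    ELSE 2 END AS prio
        FROM docs
    )
    SELECT DocumentType
    FROM ranked
    -- keep only the rows with the best (lowest) priority that exists
    WHERE prio = (SELECT MIN(prio) FROM ranked)
    ORDER BY DocId
    """
    picked = [row[0] for row in conn.execute(query)]
    conn.close()
    return picked

print(pick_documents(["I797", "I94", "Some Other Value"]))
print(pick_documents(["I94", "Some Other Value"]))
print(pick_documents(["Some Other Value", "Super Sekret"]))
```

The three calls cover the three cases from the question: I797 present, only I94 present, and neither present (all remaining rows returned).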

select * from YourTable where DocumentType = 'I797'
union
select * from YourTable t where DocumentType = 'I94' and (not exists (select * from YourTable where DocumentType = 'I797'))
union
select * from YourTable t where (not exists (select * from YourTable where DocumentType = 'I797' or DocumentType = 'I94' ))

SQL Server - find horizontal occurrences

I am using SQL Server 2008 R2 and have a table, tableA, with four numeric columns: res_id, res_id2, res_id3 and res_id4.
I want to find a way to count the distinct IDs on each row (the met column), excluding 0 and NULL.
Example:
golf_id res_id res_id2 res_id3 res_id4 met
1579 2068252 2068252 NULL 0 1
1492 2076015 2076015 2076016 2076016 2
1494 2076046 2076046 2076046 2076047 2
1617 2077041 2077042 2077043 2077044 4
1545 2076102 2076102 NULL NULL 1
In the first row I have only 2068252, so met should be 1.
In the second row I have 2076015 and 2076016, so met should be 2.
In the third row I have 2076046 and 2076047, so met should be 2.
In the fourth row I have 2077041, 2077042, 2077043 and 2077044, so met should be 4.
In the fifth row I have 2076102, so met should be 1.
Thank you

A simple solution would be to unpivot your data, and then COUNT the DISTINCT values. I'm pretty sure this'll work on 2008 R2 (though I don't have access to such an instance, nor have had access to one for the best part of a decade).
WITH YourTable AS(
SELECT *
FROM (VALUES(1579,2068252,2068252,NULL ,0 ),
(1492,2076015,2076015,2076016,2076016),
(1494,2076046,2076046,2076046,2076047),
(1617,2077041,2077042,2077043,2077044),
(1545,2076102,2076102,NULL ,NULL ))V(golf_id,res_id,res_id2,res_id3,res_id4))
SELECT YT.golf_id,
YT.res_id,
YT.res_id2,
YT.res_id3,
YT.res_id4,
(SELECT COUNT(DISTINCT NULLIF(V.res_id,0))
FROM (VALUES(res_id),(res_id2),(res_id3),(res_id4))V(res_id)) met
FROM YourTable YT;

A generic way to do it is something like this:
select
a.golf_id
, a.res_id
, a.res_id2
, a.res_id3
, a.res_id4
, b.met
from (
select b.golf_id, count(distinct res_id) as met
from (
select golf_id, res_id from tableA where res_id > 0
union all
select golf_id, res_id2 from tableA where res_id2 > 0
union all
select golf_id, res_id3 from tableA where res_id3 > 0
union all
select golf_id, res_id4 from tableA where res_id4 > 0
) as b
group by b.golf_id
) as b
join tableA as a
on a.golf_id = b.golf_id
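The unpivot-and-count approach can be verified against the sample rows from the question. The following is a minimal sketch, assuming Python's built-in sqlite3 module in place of SQL Server; it unpivots with UNION ALL and uses NULLIF to turn 0 into NULL so that COUNT(DISTINCT ...) skips it. The column alias res is illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (golf_id INTEGER, res_id INTEGER, res_id2 INTEGER, "
    "res_id3 INTEGER, res_id4 INTEGER)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?, ?)",
    [
        (1579, 2068252, 2068252, None, 0),
        (1492, 2076015, 2076015, 2076016, 2076016),
        (1494, 2076046, 2076046, 2076046, 2076047),
        (1617, 2077041, 2077042, 2077043, 2077044),
        (1545, 2076102, 2076102, None, None),
    ],
)

query = """
SELECT golf_id,
       -- NULLIF maps 0 to NULL; COUNT(DISTINCT ...) ignores NULLs
       COUNT(DISTINCT NULLIF(res, 0)) AS met
FROM (
    SELECT golf_id, res_id AS res FROM tableA
    UNION ALL SELECT golf_id, res_id2 FROM tableA
    UNION ALL SELECT golf_id, res_id3 FROM tableA
    UNION ALL SELECT golf_id, res_id4 FROM tableA
) AS unpivoted
GROUP BY golf_id
ORDER BY golf_id
"""
met_by_golf_id = dict(conn.execute(query).fetchall())
print(met_by_golf_id)
```

Because the filtering happens inside COUNT rather than in a WHERE clause, a row whose four IDs are all 0 or NULL would still appear, with met = 0.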

Computing number of filled columns in SQL table

I have a SQL table with 6 columns: one int ID column and five Datetime columns, Round1, Round2, ..., Round5.
In each row, a cell either holds a date or is empty.
I would like a query that shows the number of filled datetime columns per row.
Can you please give some hint on how to build this query? Would this involve an aggregate function?
Thank you

Consider:
SELECT ID, IIf(Round1 Is Null, 0, 1) + IIf(Round2 Is Null, 0, 1) +
IIf(Round3 Is Null, 0, 1) + IIf(Round4 Is Null, 0, 1) + IIf(Round5 Is Null, 0, 1) AS Cnt
FROM Table;
An aggregate function is not helpful unless you first normalize the data with a UNION query.
SELECT ID, Round1 AS Dte, "R1" AS Src FROM table
UNION SELECT ID, Round2, "R2" FROM table
UNION SELECT ID, Round3, "R3" FROM table
UNION SELECT ID, Round4, "R4" FROM table
UNION SELECT ID, Round5, "R5" FROM table;
Then use that query in aggregate SQL.
SELECT ID, Count(Dte) AS CntD FROM Q1 GROUP BY ID;

You can use CASE expressions to return 1 when a value is not NULL or 0 otherwise. Then just add all that.
SELECT id,
CASE
WHEN round1 IS NOT NULL THEN
1
ELSE
0
END
+
...
CASE
WHEN round5 IS NOT NULL THEN
1
ELSE
0
END total
FROM elbat;
And next time do not post images of tables. Post their CREATE statements with sample data as INSERT statements. And tag the specific DBMS you're using.
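The sum of CASE expressions can be written out in full and run. The following is a minimal sketch, assuming Python's built-in sqlite3 module and made-up sample rows (the original data was posted as an image and is not available). The table name elbat and columns round1 to round5 follow the answer above; the dates are hypothetical.

```python
import sqlite3

# In-memory SQLite database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elbat (id INTEGER PRIMARY KEY, round1 TEXT, round2 TEXT, "
    "round3 TEXT, round4 TEXT, round5 TEXT)"
)
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2019-01-01", "2019-01-02", None, None, None),
        (2, "2019-01-01", None, None, None, None),
        (3, None, None, None, None, None),
        (4, "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-05"),
    ],
)

# Build one CASE term per column and add them up, as described above.
round_columns = ["round1", "round2", "round3", "round4", "round5"]
terms = " + ".join(
    f"CASE WHEN {col} IS NOT NULL THEN 1 ELSE 0 END" for col in round_columns
)
query = f"SELECT id, {terms} AS total FROM elbat ORDER BY id"
totals = conn.execute(query).fetchall()
print(totals)
```

Generating the terms from a list of column names keeps the query short to maintain if more RoundN columns are added later.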

Finding row of "duplicate" data with greatest value

I have a table setup as follows:
Key || Code || Date
5 2 2018
5 1 2017
8 1 2018
8 2 2017
I need to retrieve only the key and code where:
Code=2 AND Date > the other record's date
So based on this data above, I need to retrieve:
Key 5 with code=2
Key 8 does not meet the criteria since code 2's date is lower than code 1's date
I tried joining the table on itself but this returned incorrect data
Select key,code
from data d1
Join data d2 on d1.key = d2.key
Where d1.code = 2 and d1.date > d2.date
This query returned incorrect rows.

Perhaps you want this:
select d.*
from data d
where d.code = 2 and
d.date > (select d2.date
from data d2
where d2.key = d.key and d2.code = 1
);
If you just want the key, I would go for aggregation:
select d.key
from data d
group by d.key
having max(case when d.code = 2 then d.date end) > max(case when d.code <> 2 then d.date end);
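The aggregation variant can be checked on the four sample rows. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the database; the identifiers key and date are quoted because they collide with SQL keywords.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data ("key" INTEGER, code INTEGER, "date" INTEGER)')
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(5, 2, 2018), (5, 1, 2017), (8, 1, 2018), (8, 2, 2017)],
)

query = """
SELECT d."key"
FROM data AS d
GROUP BY d."key"
-- latest date with code 2 must be later than the latest date with any other code
HAVING MAX(CASE WHEN d.code = 2 THEN d."date" END)
     > MAX(CASE WHEN d.code <> 2 THEN d."date" END)
"""
matching_keys = [row[0] for row in conn.execute(query)]
print(matching_keys)
```

Only key 5 qualifies: for key 8 the code 2 date (2017) is earlier than the code 1 date (2018). A key that has no rows with another code is excluded too, because the comparison with NULL is not true.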

Use ROW_NUMBER() to number the rows within each group in ascending date order, then filter on the row number. This is based on your sample data and selects the rows numbered 2:
DECLARE @table TABLE ([key] INT, code INT, DATE INT)
INSERT @table
SELECT 5, 2, 2018
UNION ALL
SELECT 5, 2, 2018
UNION ALL
SELECT 8, 1, 2018
UNION ALL
SELECT 8, 2, 2017
SELECT [key], code, DATE
FROM (
SELECT [key], code, DATE, ROW_NUMBER() OVER (
PARTITION BY [key], code ORDER BY DATE
) rn
FROM @table
) x
WHERE rn = 2

How to select columns of data in BigQuery that has all NULL values

How can I select the columns of a BigQuery table that contain only NULL values? For example:
A B C
NULL 1 NULL
NULL NULL NULL
NULL 2 NULL
NULL 3 NULL
I want to retrieve columns A and C. Please can you help!!

Expanding on my comment on Mikhail's answer, this is what I had in mind. It doesn't require generating a query string, which could be quite long if you have a large number of columns. It compares the count of null values for each column name to the total number of rows in the table to decide if the column should be included in the result.
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT null_column
FROM `project.dataset.table` AS t,
UNNEST(REGEXP_EXTRACT_ALL(
TO_JSON_STRING(t),
r'\"([a-zA-Z0-9\_]+)\":null')
) AS null_column
GROUP BY null_column
HAVING COUNT(*) = (SELECT COUNT(*) FROM `project.dataset.table`);

Below is for BigQuery StandardSQL
Simple option:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
It returns the result below, where 0 (zero) indicates that the respective column contains only NULLs:
A B C
0 3 0
If this is "not enough" - below is more "sophisticated" version:
#standardSQL
WITH `project.dataset.table` AS (
SELECT NULL A, 1 B, NULL C UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT NULL, 2, NULL UNION ALL
SELECT NULL, 3, NULL
)
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT COUNT(A) A, COUNT(B) B, COUNT(C) C
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
It returns the result below, listing only the columns that contain only NULLs:
column
A
C
Note: for your real table, just remove the WITH block and replace project.dataset.table with your real table reference.
Also, of course, use your real column names.
"My table has around 700 columns."
Below is an example of how you can easily generate the above query for any number of columns.
1. Run the query below
2. Copy the result - this is the generated query
3. Paste the generated query into the new UI and run it
4. Enjoy (I hope you will) the result :o)
Of course, as usual, replace project.dataset.table with your real table reference
#standardSQL
SELECT
CONCAT('''
SELECT SPLIT(y, ':')[OFFSET(0)] column
FROM (
SELECT REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', '') x
FROM (
SELECT ''', y,
'''
FROM `project.dataset.table`
) t
), UNNEST(SPLIT(x)) y
WHERE CAST(SPLIT(y, ':')[OFFSET(1)] AS INT64) = 0
'''
)
FROM (
SELECT
STRING_AGG(CONCAT('COUNT(', x, ') ', x), ', ') y
FROM (
SELECT REGEXP_EXTRACT_ALL(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}]', ''), r'"([\w_]+)":') x
FROM `project.dataset.table` t
LIMIT 1
), UNNEST(x) x
)
Note: please pay attention to query cost - both "generation query" and final query itself will do full scan
You can generate columns list much cheaper off of table schema in any client of your choice
To test / play with it - you can use same dummy data as for initial queries in my answer
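Building the column list from the table schema in a client can look roughly like this. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a local stand-in for BigQuery (no BigQuery client API is used here); the table name t is hypothetical and the dummy data matches the queries above. With BigQuery you would read the column names from the table's schema instead of PRAGMA table_info.

```python
import sqlite3

# In-memory SQLite table as a stand-in for the BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(None, 1, None), (None, None, None), (None, 2, None), (None, 3, None)],
)

# Read column names from the schema; this does not scan any data.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(t)")]

# Generate SELECT COUNT("A") AS "A", COUNT("B") AS "B", ... in one pass.
select_list = ", ".join(f'COUNT("{col}") AS "{col}"' for col in columns)
counts = conn.execute(f"SELECT {select_list} FROM t").fetchone()

# COUNT(column) skips NULLs, so a count of 0 means the column is all NULL.
all_null_columns = [col for col, n in zip(columns, counts) if n == 0]
print(all_null_columns)
```

Only the generated COUNT query touches the data, so the table is scanned once regardless of the number of columns.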
