Including count combinations with null values in SQL

I have one dataset and am trying to list every combination of its grouping values together with a count. However, I cannot figure out how to include the combinations that have no rows. For example, "Longitudinal?" can be "no" and "Cohort" can be "11-20", but Region 1 had no patients of that age, so that combination does not appear at all. How can I show a count of 0 for it?
Here is the code:
SELECT "s_safe_005prod"."ig_eligi_group1"."site_name" AS "Site Name",
"s_safe_005prod"."ig_eligi_group1"."il_eligi_ellong" AS "Longitudinal?",
"s_safe_005prod"."ig_eligi_group1"."il_eligi_elcohort" AS "Cohort",
count(*) AS "count"
FROM "s_safe_005prod"."ig_eligi_group1"
GROUP BY "s_safe_005prod"."ig_eligi_group1"."site_name",
"s_safe_005prod"."ig_eligi_group1"."il_eligi_ellong",
"s_safe_005prod"."ig_eligi_group1"."il_eligi_elcohort"
ORDER BY "s_safe_005prod"."ig_eligi_group1"."site_name",
"s_safe_005prod"."ig_eligi_group1"."il_eligi_ellong" ASC,
"s_safe_005prod"."ig_eligi_group1"."il_eligi_elcohort" ASC

Cross join the distinct values of the three grouping columns to build the set of all possible combinations. Then left join that set to your original counts and coalesce the NULL counts to zero.
WITH groups AS
(
SELECT a.site_name, b.longitudinal, c.cohort
FROM (SELECT DISTINCT site_name FROM s_safe_005prod.ig_eligi_group1) a,
(SELECT DISTINCT il_eligi_ellong AS longitudinal FROM s_safe_005prod.ig_eligi_group1) b,
(SELECT DISTINCT il_eligi_elcohort AS cohort FROM s_safe_005prod.ig_eligi_group1) c
),
dat AS
(
SELECT site_name,
il_eligi_ellong AS longitudinal,
il_eligi_elcohort AS cohort,
count(*) AS "count"
FROM s_safe_005prod.ig_eligi_group1
GROUP BY site_name,
il_eligi_ellong,
il_eligi_elcohort
)
SELECT groups.site_name,
groups.longitudinal,
groups.cohort,
COALESCE(dat."count", 0) AS "count"
FROM groups
LEFT JOIN dat ON groups.site_name = dat.site_name
AND groups.longitudinal = dat.longitudinal
AND groups.cohort = dat.cohort;
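The cross-join-then-left-join pattern can be tried locally with a small sketch. It runs the same query through Python's built-in sqlite3 module (SQLite dialect) against a few made-up rows. The table name `grp`, the CTE name `combos` and the column alias `cnt` are illustrative assumptions, not the asker's schema.

```python
import sqlite3

# Hypothetical sample rows standing in for s_safe_005prod.ig_eligi_group1:
# (site_name, il_eligi_ellong, il_eligi_elcohort)
ROWS = [
    ("Region 1", "yes", "0-10"),
    ("Region 1", "no", "0-10"),
    ("Region 2", "no", "11-20"),
    ("Region 2", "no", "11-20"),
]

QUERY = """
WITH combos AS (
    -- every combination of the distinct values of the three columns
    SELECT a.site_name, b.longitudinal, c.cohort
    FROM (SELECT DISTINCT site_name FROM grp) AS a
    CROSS JOIN (SELECT DISTINCT il_eligi_ellong AS longitudinal FROM grp) AS b
    CROSS JOIN (SELECT DISTINCT il_eligi_elcohort AS cohort FROM grp) AS c
),
dat AS (
    -- the original counts: only combinations that actually occur
    SELECT site_name,
           il_eligi_ellong AS longitudinal,
           il_eligi_elcohort AS cohort,
           COUNT(*) AS cnt
    FROM grp
    GROUP BY site_name, il_eligi_ellong, il_eligi_elcohort
)
SELECT combos.site_name, combos.longitudinal, combos.cohort,
       COALESCE(dat.cnt, 0) AS cnt  -- combination with no rows -> 0
FROM combos
LEFT JOIN dat
  ON combos.site_name = dat.site_name
 AND combos.longitudinal = dat.longitudinal
 AND combos.cohort = dat.cohort
ORDER BY combos.site_name, combos.longitudinal, combos.cohort
"""


def counts_with_zeros(rows):
    """Return (site_name, longitudinal, cohort, count) for every combination."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE grp (site_name TEXT, il_eligi_ellong TEXT, il_eligi_elcohort TEXT)"
    )
    con.executemany("INSERT INTO grp VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in counts_with_zeros(ROWS):
    print(row)
```

With 2 sites, 2 "Longitudinal?" values and 2 cohorts, this returns 8 rows. Combinations such as ("Region 1", "no", "11-20") appear with a count of 0.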

Related

Select other table as a column based on datetime in BigQuery [duplicate]

This question already has an answer here:
Full outer join and Group By in BigQuery
I have two related tables, and I want to group them based on time. Here are the tables:
I want to select a receipt as a column based on published_at, which must fall between pickup_time and drop_time, to get this result:
I tried a JOIN, but it seems to select only the rows where drop_time is NULL:
SELECT
t.source_id AS source_id,
t.pickup_time AS pickup_time,
t.drop_time AS drop_time,
ARRAY_AGG(STRUCT(r.source_id, r.receipt_id, r.published_at) ORDER BY r.published_at LIMIT 1)[SAFE_OFFSET(0)] AS receipt
FROM `my-project-gcp.data_source.trips` AS t
JOIN `my-project-gcp.data_source.receipts` AS r
ON
t.source_id = r.source_id
AND
r.published_at >= t.pickup_time
AND (
r.published_at <= t.drop_time
OR t.drop_time IS NULL
)
GROUP BY source_id, pickup_time, drop_time
I also tried a subquery, but got this error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN
SELECT
t.source_id AS source_id,
t.pickup_time AS pickup_time,
t.drop_time AS drop_time,
ARRAY_AGG((
SELECT
STRUCT(r.source_id, r.receipt_id, r.published_at)
FROM `my-project-gcp.data_source.receipts` as r
WHERE
t.source_id = r.source_id
AND
r.published_at >= t.pickup_time
AND (
r.published_at <= t.drop_time
OR t.drop_time IS NULL
)
LIMIT 1
))[SAFE_OFFSET(0)] AS receipt
FROM `my-project-gcp.data_source.trips` as t
GROUP BY source_id, pickup_time, drop_time
Each source_id is a car, and only one driver can drive a car at a time, so we can partition by source_id.
Your approach works for small tables. However, because there is no unique join key, the join effectively becomes a cross join and fails on large tables.
Here is a solution using UNION ALL and a look-back technique. It is quite fast and works for medium-sized tables of up to a few GB. It avoids the cross join, but the script is rather long.
The trips table lists every drive by the drivers; the receipts table lists all fines.
We need a unique identifier for each trip so we can join on it later. We use the row number for this (see trips_with_rowid). It is doubled (2*row_number()), so that row_id+1 is free to label the period after each trip.
The summery_tmp CTE unions three tables. First we load the trips table and add an empty column for the fines. Then we load the trips table again to mark the periods when no one was driving the car. Finally, we add the receipts table, filling only source_id, pickup_time (set to published_at) and fines.
In summery, the rows of each source_id are ordered by pickup_time, so each fine comes after the entry of the driver who picked up the car before it. The column row_id_new gives each fine the row_id of that entry, via last_value(row_id ignore nulls).
Finally, grouping by row_id_new and filtering out the unneeded entries produces the result.
I changed the seconds in some of the entered times (out of laziness), so the output differs slightly from your expected result.
With trips as
(Select 1 source_id ,timestamp("2022-7-19 9:37:47") pickup_time, timestamp("2022-07-19 9:40:00") as drop_time, "jhon" driver_name
Union all Select 1 ,timestamp("2022-7-19 12:00:01"),timestamp("2022-7-19 13:05:11"),"doe"
Union all Select 1 ,timestamp("2022-7-19 14:30:01"),null,"foo"
Union all Select 3 ,timestamp("2022-7-24 08:35:01"),timestamp("2022-7-24 09:15:01"),"bar"
Union all Select 4 ,timestamp("2022-7-25 10:24:01"),timestamp("2022-7-25 11:14:01"),"jhon"
),
receipts as
(Select 1 source_id, 101 receipt_id, timestamp("2022-07-19 9:37:47") published_at,40 price
Union all Select 1,102, timestamp("2022-07-19 13:04:47"),45
Union all Select 1,103, timestamp("2022-07-19 15:23:00"),32
Union all Select 3,301, timestamp("2022-07-24 09:15:47"),45
Union all Select 4,401, timestamp("2022-07-25 11:13:47"),45
Union all Select 5,501, timestamp("2022-07-18 07:12:47"),45
),
trips_with_rowid as
(
SELECT 2*row_number() over (order by source_id,pickup_time) as row_id, * from trips
),
summery_tmp as
(
Select *, null as fines from trips_with_rowid
union all Select row_id+1,source_id,drop_time,null,concat("no driver, last one ",driver_name),null from trips_with_rowid
union all select null,source_id, published_at, null,null, R from receipts R
),
summery as
(
SELECT last_value(row_id ignore nulls) over (partition by source_id order by pickup_time ) row_id_new
,*
from summery_tmp
order by 1,2
)
select source_id,min(pickup_time) pickup_time, min(drop_time) drop_time,
any_value(driver_name) driver_name, array_agg(fines IGNORE NULLS) as fines_Sum
from summery
group by row_id_new,source_id
having fines_sum is not null or (pickup_time is not null and driver_name not like "no driver%")
order by 1,2
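The look-back idea can also be tested locally. This is a simplified sketch using Python's built-in sqlite3 module, not a port of the BigQuery script. Its limitations and assumptions:

- SQLite has no LAST_VALUE(... IGNORE NULLS), so the sketch uses a running MAX(row_id) instead. That substitution only works because row_id grows with time within each source_id, which assumes a car's trips never overlap.
- It skips the end event of open trips (drop_time IS NULL), because a NULL timestamp would sort first and break the running maximum.
- It only computes which driver each receipt belongs to.
- Timestamps are zero-padded text so they compare correctly.

```python
import sqlite3

# Sample data from the answer, with zero-padded timestamps for text comparison.
TRIPS = [
    (1, "2022-07-19 09:37:47", "2022-07-19 09:40:00", "jhon"),
    (1, "2022-07-19 12:00:01", "2022-07-19 13:05:11", "doe"),
    (1, "2022-07-19 14:30:01", None, "foo"),
    (3, "2022-07-24 08:35:01", "2022-07-24 09:15:01", "bar"),
    (4, "2022-07-25 10:24:01", "2022-07-25 11:14:01", "jhon"),
]
RECEIPTS = [
    (1, 101, "2022-07-19 09:37:47"),
    (1, 102, "2022-07-19 13:04:47"),
    (1, 103, "2022-07-19 15:23:00"),
    (3, 301, "2022-07-24 09:15:47"),
    (4, 401, "2022-07-25 11:13:47"),
    (5, 501, "2022-07-18 07:12:47"),
]

QUERY = """
WITH trips_with_rowid AS (
    SELECT 2 * ROW_NUMBER() OVER (ORDER BY source_id, pickup_time) AS row_id, trips.*
    FROM trips
),
events AS (
    -- a trip starts: even row_id
    SELECT row_id, source_id, pickup_time AS ts, NULL AS receipt_id FROM trips_with_rowid
    UNION ALL
    -- the car becomes free: odd row_id (open trips have no end event)
    SELECT row_id + 1, source_id, drop_time, NULL FROM trips_with_rowid
    WHERE drop_time IS NOT NULL
    UNION ALL
    -- a receipt: no row_id of its own
    SELECT NULL, source_id, published_at, receipt_id FROM receipts
),
looked_back AS (
    -- running MAX stands in for LAST_VALUE(row_id IGNORE NULLS)
    SELECT receipt_id,
           MAX(row_id) OVER (PARTITION BY source_id ORDER BY ts) AS owner_id
    FROM events
)
SELECT lb.receipt_id, t.driver_name  -- NULL when nobody had the car
FROM looked_back AS lb
LEFT JOIN trips_with_rowid AS t ON t.row_id = lb.owner_id
WHERE lb.receipt_id IS NOT NULL
ORDER BY lb.receipt_id
"""


def receipts_with_driver():
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE trips (source_id INTEGER, pickup_time TEXT, drop_time TEXT, driver_name TEXT)"
    )
    con.execute("CREATE TABLE receipts (source_id INTEGER, receipt_id INTEGER, published_at TEXT)")
    con.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", TRIPS)
    con.executemany("INSERT INTO receipts VALUES (?, ?, ?)", RECEIPTS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(receipts_with_driver())
```

Receipt 301 falls after bar's drop_time, and source 5 has no trips, so both map to no driver.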

Count Distinct values in one column based on other column

I am trying to count the distinct values of Z_l for a given value of X by using a WITH clause.
Please look at the picture of the sample data: I need the distinct values of Z_l where X='ny'.
with distincz_l as (select ny.X, ny.z_l o.cnt From HOPL ny join (select X, count(*) as cnt from HOPL group by X) o on (ny.X = o.Z_l)) select * from HOPL;
You don't need a WITH clause at all; a single statement is enough:
SELECT z_l, count(1)
FROM hopl
WHERE x='ny'
GROUP BY z_l
;
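If what is needed is the number of distinct Z_l values rather than one row per value, COUNT(DISTINCT z_l) returns it in one statement. The sketch below runs both queries against SQLite through Python's sqlite3 module. It uses made-up rows, because the question's data was only shown in a picture; the table layout hopl(x, z_l) is an assumption.

```python
import sqlite3

# Hypothetical rows for HOPL(X, Z_l).
ROWS = [("ny", "a"), ("ny", "a"), ("ny", "b"), ("la", "a"), ("la", "c")]


def z_l_stats(rows, x_value):
    """Return (per-value counts, number of distinct Z_l) for one value of X."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE hopl (x TEXT, z_l TEXT)")
    con.executemany("INSERT INTO hopl VALUES (?, ?)", rows)
    # The answer's query: one row per distinct Z_l with its number of rows.
    per_value = con.execute(
        "SELECT z_l, COUNT(1) FROM hopl WHERE x = ? GROUP BY z_l ORDER BY z_l",
        (x_value,),
    ).fetchall()
    # A single number: how many distinct Z_l values occur for this X.
    (n_distinct,) = con.execute(
        "SELECT COUNT(DISTINCT z_l) FROM hopl WHERE x = ?", (x_value,)
    ).fetchone()
    con.close()
    return per_value, n_distinct


print(z_l_stats(ROWS, "ny"))
```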

How to include column not included in Group By

I have the table DirectCosts with the following columns:
DetailsID (unique)
InvoiceNumber
ProjectID
PayableID
I need to find the duplicate combinations of PayableID and InvoiceNumber.
How can I adjust the following query so that it checks the combination AND displays the list of matching rows instead of the count?
SELECT sinvoicenumber, count(*)
FROM exportdirectcostdetails where iprocoreprojectid = 1187294
GROUP BY sinvoicenumber
HAVING COUNT(*) > 2
Is there a way it can display all columns?
Original question: why do I get the error "ed2 should have column name defined"?
You are using a derived table, so every column of the derived table needs a name. Give the COUNT expression an alias (InvoiceNumberCount below):
select ed1.sinvoicenumber,
ed1.ipayableid,
ed2.sinvoicenumber
from ExportDirectCostDetails ed1
inner join
(
SELECT sinvoicenumber, count(sinvoicenumber) AS InvoiceNumberCount
FROM exportdirectcostdetails
where iprocoreprojectid = 1187294
GROUP BY sinvoicenumber
HAVING COUNT(*) > 2
) ed2
on ed1.sinvoicenumber = ed2.sinvoicenumber
Updated question: how to return all columns
Use a windowed COUNT(*) with a PARTITION BY clause, then filter on it:
SELECT t.* FROM
(SELECT *, count(*) OVER(PARTITION BY ipayableid, sinvoicenumber) AS InvoiceCount
FROM exportdirectcostdetails where iprocoreprojectid = 1187294) as t
WHERE InvoiceCount > 1
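To see that the windowed count keeps every column while flagging duplicates, here is a sketch that runs the same pattern in SQLite via Python's sqlite3 module. The rows and the detailsid column name are hypothetical; the other column names follow the queries above. The WHERE on the project is applied before the window count, so rows from other projects do not inflate it.

```python
import sqlite3

# Hypothetical rows: (detailsid, sinvoicenumber, ipayableid, iprocoreprojectid)
ROWS = [
    (1, "INV-1", 10, 1187294),
    (2, "INV-1", 10, 1187294),  # same invoice + payable as row 1: duplicate
    (3, "INV-1", 11, 1187294),  # same invoice, different payable
    (4, "INV-2", 10, 1187294),
    (5, "INV-1", 10, 999),      # other project, filtered out first
]

QUERY = """
SELECT t.*
FROM (
    SELECT d.*,
           COUNT(*) OVER (PARTITION BY ipayableid, sinvoicenumber) AS invoicecount
    FROM exportdirectcostdetails AS d
    WHERE iprocoreprojectid = ?
) AS t
WHERE t.invoicecount > 1
ORDER BY t.detailsid
"""


def duplicate_rows(rows, project_id):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE exportdirectcostdetails ("
        "detailsid INTEGER, sinvoicenumber TEXT, ipayableid INTEGER, iprocoreprojectid INTEGER)"
    )
    con.executemany("INSERT INTO exportdirectcostdetails VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, (project_id,)).fetchall()
    con.close()
    return result


print(duplicate_rows(ROWS, 1187294))
```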

Select number of records until the sum is less than 'n' - Access SQL

I am working in Microsoft Access. The user enters a percentage, and I have to find the IDs whose values make up that percentage of the total of the VALUE column. For example, in the dataset below (sorted by VALUE in descending order, which is also required), the sum of all values is 8409131.
ID NAME VALUE
1000000090 A 2295175
1000000974 B 1942753
1000015555 C 1887965
1000004864 D 1310400
1000015557 E 972838
If I enter 75%, the target is 6306848.25 (75% of 8409131), so I need to return all the IDs whose running total, taken in that order, is less than or equal to 6306848. In this case, these are the rows:
ID NAME VALUE
1000000090 A 2295175
1000000974 B 1942753
1000015555 C 1887965
Is it possible to achieve this in Access SQL?
My plan is to build a running-total column: the sum of the first two rows, then that sum plus the next row, and so on. But in Access, I cannot figure out how to generate incrementing row numbers in a SELECT query to do this.
Query I tried:
SELECT T1.ID, T1.NAME, T1.VALUE,(T1.VALUE + T2.VALUE)
FROM (
SELECT ID , RUN_MANAGER.NAME AS NAME, RUN_MANAGER.REPORTING_PERIOD, SUM(VALUE) As VALUE
FROM DATA
INNER JOIN RUN_MANAGER
ON DATA.RUN_NUMBER=RUN_MANAGER.RUN_NUMBER
WHERE RUN_MANAGER.NAME='A'
GROUP BY ID,RUN_MANAGER.NAME
ORDER BY SUM(VALUE) DESC) AS T1
INNER JOIN (
SELECT ID , RUN_MANAGER.NAME AS NAME, RUN_MANAGER.REPORTING_PERIOD, SUM(VALUE) As VALUE
FROM DATA
INNER JOIN RUN_MANAGER
ON DATA.RUN_NUMBER=RUN_MANAGER.RUN_NUMBER
WHERE RUN_MANAGER.NAME='A'
GROUP BY ID,RUN_MANAGER.NAME
ORDER BY SUM(VALUE) DESC) AS T2
ON T1.ID=T2.ID+1
This is not a duplicate question: it is specifically about Access SQL, and my data has no incrementing row numbers.
If you have a table like t:
ID NAME VALUE
1000000090 A 2295175
1000000974 B 1942753
1000015555 C 1887965
1000004864 D 1310400
1000015557 E 972838
You can use this query:
SELECT *
FROM t
WHERE
(SELECT SUM(VALUE) FROM t ti WHERE ti.VALUE >= t.VALUE) <= (SELECT SUM(VALUE) FROM t ti) * 0.75
ORDER BY VALUE DESC
This returns:
ID NAME VALUE
1000000090 A 2295175
1000000974 B 1942753
1000015555 C 1887965
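To check the running-total logic outside Access, here is a sketch that runs a correlated-subquery version in SQLite via Python's sqlite3 module, using the question's five rows. It orders the running total by VALUE descending (ti.value >= t.value), which is what the question asks for. With duplicate values, tied rows would be counted together. The share is passed as a parameter (0.75 for 75%).

```python
import sqlite3

# The five rows from the question: (ID, NAME, VALUE)
ROWS = [
    (1000000090, "A", 2295175),
    (1000000974, "B", 1942753),
    (1000015555, "C", 1887965),
    (1000004864, "D", 1310400),
    (1000015557, "E", 972838),
]

QUERY = """
SELECT id, name, value
FROM t
WHERE (SELECT SUM(ti.value) FROM t AS ti WHERE ti.value >= t.value)  -- running total
      <= (SELECT SUM(value) FROM t) * ?                              -- share of grand total
ORDER BY value DESC
"""


def ids_within_share(rows, share):
    """Return the IDs whose running total stays within `share` of the total."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, name TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    ids = [r[0] for r in con.execute(QUERY, (share,)).fetchall()]
    con.close()
    return ids


print(ids_within_share(ROWS, 0.75))
```

The running totals are 2295175, 4237928, 6125893, 7436293 and 8409131. Only the first three stay at or below 6306848.25.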

SQL nested aggregate functions MAX(COUNT(*))

I'm trying to select the maximum of a count of rows, i.e. MAX(COUNT(...)).
Here are my two variants of the SELECT:
SELECT MAX(COUNT_OF_ENROLEES_BY_SPEC) FROM
(SELECT D.SPECCODE, COUNT(D.ENROLEECODE) AS COUNT_OF_ENROLEES_BY_SPEC
FROM DECLARER D
GROUP BY D.SPECCODE
);
SELECT S.NAME, MAX(D.ENROLEECODE)
FROM SPECIALIZATION S
CROSS JOIN DECLARER D WHERE S.SPECCODE = D.SPECCODE
GROUP BY S.NAME
HAVING MAX(D.ENROLEECODE) =
( SELECT MAX(COUNT_OF_ENROLEES_BY_SPEC) FROM
( SELECT D.SPECCODE, COUNT(D.ENROLEECODE) AS COUNT_OF_ENROLEES_BY_SPEC
FROM DECLARER D
GROUP BY D.SPECCODE
)
);
The first one works, but I want to rewrite it using HAVING, as in my second variant, and add one more column. However, the second variant returns no rows, just empty columns.
How can I fix it? Thank you!
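This query is based on the description given in the comments and on some suggestions, so it may be wrong. (Your second variant returns nothing because its HAVING clause compares MAX(D.ENROLEECODE), the largest enrollee code, with the maximum count, so it only matches by coincidence.)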
select -- 4. Join selected codes with specializations
S.Name,
selected_codes.spec_code,
selected_codes.count_of_enrolees_by_spec
from
specialization S,
(
select -- 3. Filter records with maximum popularity only
spec_code,
count_of_enrolees_by_spec
from (
select -- 2. Count maximum popularity in separate column
spec_code,
count_of_enrolees_by_spec,
max(count_of_enrolees_by_spec) over (partition by null) max_count
from (
SELECT -- 1. Get list of declarations and count popularity
D.SPECCODE AS SPEC_CODE,
COUNT(D.ENROLEECODE) AS COUNT_OF_ENROLEES_BY_SPEC
FROM DECLARER D
GROUP BY D.SPECCODE
)
)
where count_of_enrolees_by_spec = max_count
)
selected_codes
where
S.SPECCODE = selected_codes.spec_code
Also, the query is untested, so some syntax errors are possible.
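For the HAVING form the question asks about, the fix is to compare a count, not MAX(D.ENROLEECODE), with the maximum count. The sketch below shows that variant in SQLite via Python's sqlite3 module, with made-up rows. Unlike the answer it uses HAVING instead of a window function, and it also returns ties. The table and column names follow the question, but their exact layout is an assumption.

```python
import sqlite3

# Hypothetical data: SPECIALIZATION(speccode, name), DECLARER(enroleecode, speccode)
SPECS = [(1, "Math"), (2, "Physics"), (3, "History")]
DECLS = [(100, 1), (101, 1), (102, 2), (103, 2), (104, 3)]

QUERY = """
SELECT s.name, d.speccode, COUNT(d.enroleecode) AS count_of_enrolees_by_spec
FROM specialization s
JOIN declarer d ON s.speccode = d.speccode
GROUP BY s.name, d.speccode
HAVING COUNT(d.enroleecode) = (          -- compare a count, not a code
    SELECT MAX(cnt)
    FROM (SELECT COUNT(enroleecode) AS cnt FROM declarer GROUP BY speccode) per_spec
)
ORDER BY s.name
"""


def most_popular_specs():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE specialization (speccode INTEGER, name TEXT)")
    con.execute("CREATE TABLE declarer (enroleecode INTEGER, speccode INTEGER)")
    con.executemany("INSERT INTO specialization VALUES (?, ?)", SPECS)
    con.executemany("INSERT INTO declarer VALUES (?, ?)", DECLS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(most_popular_specs())
```

Math and Physics both have two enrollees, so both are returned.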