SQL Server Select Statement Columns - sql

Original Query:
select StudyID, count(CompletedDate), count(Removed), count(RemovalReason)
from Study a
full outer join Households b
on a.HouseholdID = b.HouseholdID
where StudyID = '123456'
and Removed = 1
and RemovalReason = 5
group by StudyID
How do I write this query so that each condition (Removed = 1, RemovalReason = 5) applies only to its own column's count instead of restricting all of the columns (CompletedDate, Removed, and RemovalReason)? If I execute the query as written, it does not show me the total count for CompletedDate, because the WHERE clause restricts every count to these conditions. Is there a way to write the condition directly next to count?
Table/Columns - Study:
HouseholdID (primary key),
StudyID,
CompletedDate
Table/Columns - Households:
HouseholdID (primary key),
Removed,
RemovalReason

I think you are looking for something like this, but your question is a little loose with details:
select StudyID
, count(CompletedDate)
, sum(case when Removed = 1 then 1 else 0 end)
, sum(case when RemovalReason = 5 then 1 else 0 end)
from Study a
join Households b
on a.HouseholdID = b.HouseholdID
where StudyID = '123456'
group by StudyID
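To see the conditional-aggregation pattern above on concrete rows, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The sample rows and the column aliases (completed_count, removed_count, reason5_count) are made up for illustration, and SQLite only stands in for SQL Server here; the CASE-inside-SUM idea is standard SQL.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Study (HouseholdID INTEGER PRIMARY KEY, StudyID TEXT, CompletedDate TEXT);
    CREATE TABLE Households (HouseholdID INTEGER PRIMARY KEY, Removed INTEGER, RemovalReason INTEGER);
    INSERT INTO Study VALUES
        (1, '123456', '2020-01-05'),
        (2, '123456', '2020-01-09'),
        (3, '123456', NULL),
        (4, '999999', '2020-02-01');
    INSERT INTO Households VALUES
        (1, 1, 5),
        (2, 0, NULL),
        (3, 1, 2),
        (4, 1, 5);
""")

# Each condition lives inside its own aggregate, so it only affects that column.
row = conn.execute("""
    SELECT a.StudyID,
           COUNT(a.CompletedDate)                               AS completed_count,
           SUM(CASE WHEN b.Removed = 1 THEN 1 ELSE 0 END)       AS removed_count,
           SUM(CASE WHEN b.RemovalReason = 5 THEN 1 ELSE 0 END) AS reason5_count
    FROM Study a
    JOIN Households b ON a.HouseholdID = b.HouseholdID
    WHERE a.StudyID = '123456'
    GROUP BY a.StudyID
""").fetchone()
print(row)
```

COUNT(CompletedDate) skips the NULL date, while the two SUM(CASE ...) columns count only the rows that meet their own condition.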

Related

How to update data with multi condition

I have a table with 5 columns like this table below:
I want to fill the approval column based on conditions such as:
APPROVAL = "Y" IN CASE
(1) CUSTOMER_NEW <> CUSTOMER (as in the case CUSTOMER = 12467)
(2) CUSTOMER_NEW = CUSTOMER AND SALE_SHORT_ID COLUMN HAS DIFFERENT VALUE (as in the case CUSTOMER = 13579)
(3) CUSTOMER_NEW = CUSTOMER AND SALE_SHORT_ID HAS THE SAME VALUE THEN MAX(LEN(SALE_ID)) (as in the case CUSTOMER = 65465)
ELSE "N"
My expected result looks like this table:
Thanks for your support.
I am not sure about all of your conditions, but this one could do it:
UPDATE a
SET approval =
CASE WHEN
a.customer <> a.customer_new OR
a.customer = a.customer_new AND b.sale_short_eq = 0 OR
a.customer = a.customer_new AND a.sale_id = b.max_len_sale_id
THEN 'Y'
ELSE 'N'
END
FROM
thetable a
INNER JOIN (
SELECT
customer,
CASE WHEN MIN(sale_short_id) = MAX(sale_short_id) THEN 1 ELSE 0 END AS sale_short_eq,
( SELECT TOP 1 sale_id
FROM thetable
WHERE customer = c.customer
ORDER BY LEN(sale_id) DESC
) AS max_len_sale_id
FROM thetable c
GROUP BY customer
) b
ON a.customer = b.customer
Note that the columns b.sale_short_eq and b.max_len_sale_id are returned by a subquery joined to the main query.
In particular, the condition "SALE_SHORT_ID HAS THE SAME VALUE THEN MAX(LEN(SALE_ID))" is not clear. Did you mean that SALE_ID must be the same as the SALE_ID with the maximum length? I ask because SALE_SHORT_ID cannot be the same as any SALE_ID. And what happens if two different SALE_IDs have the same maximum length?
See: https://dbfiddle.uk/7Ct87-Mk
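For readers without access to the fiddle, the following is a rough sketch of the same three rules, run against SQLite from Python. Everything about the data is an assumption: the rows are invented (one group per rule), and because SQLite has no TOP or LEN, the sketch uses LIMIT 1 and LENGTH with correlated subqueries instead of the joined derived table shown above. It is meant to illustrate the rule logic, not to replace the T-SQL statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thetable (
        customer INTEGER, customer_new INTEGER,
        sale_id TEXT, sale_short_id TEXT, approval TEXT
    );
    -- Hypothetical rows, one group per rule in the question.
    INSERT INTO thetable (customer, customer_new, sale_id, sale_short_id) VALUES
        (12467, 12468, 'A100',  'A1'),   -- rule 1: customer changed
        (13579, 13579, 'B200',  'B2'),   -- rule 2: short ids differ
        (13579, 13579, 'C300',  'C3'),
        (65465, 65465, 'D40',   'D4'),   -- rule 3: same short id,
        (65465, 65465, 'D4001', 'D4');   --         longest sale_id wins
""")

conn.execute("""
    UPDATE thetable
    SET approval = CASE
        WHEN customer <> customer_new THEN 'Y'
        -- true when the customer has more than one distinct sale_short_id
        WHEN (SELECT MIN(t.sale_short_id) <> MAX(t.sale_short_id)
              FROM thetable t
              WHERE t.customer = thetable.customer) THEN 'Y'
        -- true for the row holding the longest sale_id of the customer
        WHEN sale_id = (SELECT t.sale_id
                        FROM thetable t
                        WHERE t.customer = thetable.customer
                        ORDER BY LENGTH(t.sale_id) DESC
                        LIMIT 1) THEN 'Y'
        ELSE 'N'
    END
""")

approvals = [r[0] for r in conn.execute(
    "SELECT approval FROM thetable ORDER BY rowid")]
print(approvals)
```

If two sale_id values tie for the maximum length, the LIMIT 1 (like TOP 1) picks one of them arbitrarily, which is exactly the open question raised above.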

use distinct within case statement

I have a query that uses multiple left joins, and I am trying to get a SUM of values from one of the joined columns.
SELECT
SUM( case when session.usersessionrun =1 then 1 else 0 end) new_unique_session_user_count
FROM session
LEFT JOIN appuser ON appuser.appid = '6279df3bd2d3352aed591583'
AND appuser.userid = session.userid
LEFT JOIN userdevice ON userdevice.appid = '6279df3bd2d3352aed591583'
AND userdevice.userid = appuser.userid
WHERE session.appid = '6279df3bd2d3352aed591583'
AND (session.uploadedon BETWEEN '2022-04-18 08:31:26' AND '2022-05-18 08:31:26')
But this obviously gives redundant session.usersessionrun = 1 counts, since it is a joined result set.
Here the logic was to mark the user as new if the sessionrun for that record is 1.
I grouped by userid and usersessionrun and it shows that the records are repeated.
userid   sessionrun   count
628212   1            2
627a01   1            4
So what I was trying to do was something like
SUM(CASE distinct(session.userid) AND WHEN session.usersessionrun = 1 THEN 1 ELSE 0 END) new_unique_session_user_count
i.e. for every unique user count, session.usersessionrun = 1 should only be done once.
As you have discovered, JOIN operations can generate combinatorial explosions of data.
You need a subquery to count your sessions by userid. Then you can treat the subquery as a virtual table and JOIN it to the other tables to get the information you need in your result set.
The subquery (nothing in my answer is debugged):
SELECT COUNT(*) new_unique_session_user_count,
session.userid
FROM session
WHERE session.appid = '6279df3bd2d3352aed591583'
AND session.uploadedon BETWEEN '2022-04-18 08:31:26'
AND '2022-05-18 08:31:26'
AND session.usersessionrun = 1
GROUP BY userid
This subquery summarizes your session table and has one row per userid. The trick to avoiding JOIN-created combinatorial explosions is using subqueries that generate results with only one row per data item mentioned in a JOIN's ON-clause.
Then, you join it with the other tables like this
SELECT summary.new_unique_session_user_count
FROM (
SELECT COUNT(*) new_unique_session_user_count,
session.userid
FROM session
WHERE session.appid = '6279df3bd2d3352aed591583'
AND session.uploadedon BETWEEN '2022-04-18 08:31:26'
AND '2022-05-18 08:31:26'
AND session.usersessionrun = 1
GROUP BY userid
) summary
JOIN appuser ON appuser.appid = '6279df3bd2d3352aed591583'
AND appuser.userid = summary.userid
JOIN userdevice ON userdevice.appid = '6279df3bd2d3352aed591583'
AND userdevice.userid = appuser.userid
There may be better ways to structure this query, but it's hard to guess at them without more information about your table definitions and business rules.
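One such alternative, offered only as a sketch and not as part of the answer above, is to keep the joins and count distinct user IDs inside the CASE, which is close to what the question was reaching for. The example below uses SQLite from Python with invented rows and a shortened appid ('app1'); the date filter is left out to keep it small. It compares the inflated SUM with COUNT(DISTINCT CASE ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (appid TEXT, userid TEXT, usersessionrun INTEGER);
    CREATE TABLE appuser (appid TEXT, userid TEXT);
    CREATE TABLE userdevice (appid TEXT, userid TEXT, deviceid TEXT);
    -- Hypothetical rows: u1 has two devices, so its session row is duplicated by the join.
    INSERT INTO session VALUES
        ('app1', 'u1', 1),
        ('app1', 'u2', 1),
        ('app1', 'u2', 2),
        ('app1', 'u3', 2);
    INSERT INTO appuser VALUES ('app1', 'u1'), ('app1', 'u2'), ('app1', 'u3');
    INSERT INTO userdevice VALUES
        ('app1', 'u1', 'd1'),
        ('app1', 'u1', 'd2'),
        ('app1', 'u2', 'd3'),
        ('app1', 'u3', 'd4');
""")

naive, distinct_users = conn.execute("""
    SELECT
        -- counts one per joined row, so users with several devices are counted repeatedly
        SUM(CASE WHEN s.usersessionrun = 1 THEN 1 ELSE 0 END),
        -- CASE yields the userid or NULL; COUNT(DISTINCT ...) ignores NULLs and duplicates
        COUNT(DISTINCT CASE WHEN s.usersessionrun = 1 THEN s.userid END)
    FROM session s
    LEFT JOIN appuser a ON a.appid = 'app1' AND a.userid = s.userid
    LEFT JOIN userdevice d ON d.appid = 'app1' AND d.userid = a.userid
    WHERE s.appid = 'app1'
""").fetchone()
print(naive, distinct_users)
```

Here the plain SUM reports 3 because u1's first session appears once per device, while the distinct count reports the 2 users who actually have a usersessionrun = 1 row.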

CASE WHEN 2 results in 1 column have the same result in another column

With the sample data below, a collection_ref will carry 3 or 4 movements, but the 2 key ones are the purchase and the sale. Supply chain approved will be 1 once the movement is complete.
I want a new column that shows 'Completed' for all lines that share the same collection ref when that collection ref has a Sale movement with Supply Chain Approved = 1, and 'Outstanding' otherwise. Any ideas?
Thanks in advance
SELECT DISTINCT a.bulk_type_code,
a.bulk_number,
a.supplier_contract_ref,
a.supplier_consignment_ref,
a.supplier_org_code,
a.supplier_org_name,
a.collection_ref,
a.raw_weight_tons,
a.financial_net_weight_tons,
a.purchase_weight_tons,
a.delivery_date,
b.delivery_term_code,
b.delivery_term_description,
c.week_number
FROM bi.bulk_subcontainer_pricing_group_summary a
LEFT JOIN bi.contracts b ON a.supplier_contract_number = b.contract_number
OR a.purchaser_contract_number = b.contract_number
INNER JOIN bi.weeks c ON a.delivery_date BETWEEN c.start_date AND c.end_date
WHERE a.delivery_date BETWEEN #DFrom AND #DTo
AND a.bulk_type_code IN (#BulkType)
AND a.business_unit_code = 'OLLIMP'
AND a.pricing_type_group_code = #pricing_hidden
ORDER BY a.collection_ref
It would be helpful if you could share the schema instead of writing out the column / key names.
It sounds like what you want is something like this:
Select Case when Bulk_Type_Code = 'Sale' and Supply_Chain_Approved = 1 then 'Completed'
when Bulk_Type_Code = 'Sale' and Supply_Chain_Approved = 0 then 'Outstanding'
else NULL end as CalculatedColumn
If the Bulk_Type_Code is not sale, then this column will be null.
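If the status has to appear on every line of the same collection_ref, and not only on the Sale line, one possible approach is a correlated EXISTS check. The sketch below runs on SQLite from Python; the table name movements and the columns movement_type and supply_chain_approved are hypothetical, since the real schema was not shared, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and columns; the real schema was not posted.
    CREATE TABLE movements (
        collection_ref TEXT, movement_type TEXT, supply_chain_approved INTEGER
    );
    INSERT INTO movements VALUES
        ('C1', 'Purchase', 1),
        ('C1', 'Sale',     1),   -- approved sale: every C1 line is Completed
        ('C2', 'Purchase', 1),
        ('C2', 'Sale',     0);   -- sale not approved yet: every C2 line is Outstanding
""")

rows = conn.execute("""
    SELECT m.collection_ref,
           m.movement_type,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM movements s
                    WHERE s.collection_ref = m.collection_ref
                      AND s.movement_type = 'Sale'
                      AND s.supply_chain_approved = 1)
                THEN 'Completed'
                ELSE 'Outstanding'
           END AS status
    FROM movements m
    ORDER BY m.collection_ref, m.movement_type
""").fetchall()
for r in rows:
    print(r)
```

The subquery looks at all lines of the same collection_ref, so the Purchase line inherits the status decided by the Sale line.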

Postgresql: Query using multiple attributes

I'm trying to create a query with PostgreSQL. Unfortunately, the attributes in attribute_table are stored as rows instead of columns, which makes them harder to pull. I want a count based on the three attributes listed below (attribute 1000 = gender, value 2 = female; attribute 1001 = age group, value 5 = 55-64; attribute 1002 = household size, value 1). How do I adjust this query so that it gives me one row instead of three rows for the same personal_ID? Also, the query as written returns no values, but it works if I filter on only one attribute.
select sa.country_id ,count(distinct sa.personal_id )
from study_table sa ,attribute_table a
where sa.country_id =a.country_id and sa.personal_id =a.personal_id
and to_char(sa.mailing_date,'yyyy-MM')='2021-01'
and attribute_id =1000 and a.attribute_number =2
and attribute_id =1001 and a.attribute_number =5
and attribute_id =1002 and a.attribute_number =1
and study_type ='Wave'
and status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id
First, use proper, explicit, standard JOIN syntax.
Second, your WHERE conditions are contradictory. You need ORs . . . and then a HAVING for the final filtering:
select sa.country_id ,count(distinct sa.personal_id )
from study_table sa join
attribute_table a
on sa.country_id = a.country_id and
sa.personal_id = a.personal_id
where to_char(sa.mailing_date,'yyyy-MM') = '2021-01' and
( (attribute_id = 1000 and a.attribute_number = 2) or
(attribute_id = 1001 and a.attribute_number = 5) or
(attribute_id = 1002 and a.attribute_number = 1)
) and
study_type ='Wave' and
status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id
having count(distinct attribute_id) = 3;
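One thing to watch with the query above: because it groups by country_id, the HAVING test is evaluated per country, not per person, so a person who matches only two of the three attributes can still be counted. The sketch below (SQLite from Python, invented rows, a hypothetical status value 'SENT', and a date-range predicate instead of to_char) compares that with a variant that applies the HAVING per person in a derived table first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE study_table (country_id TEXT, personal_id INTEGER,
                              mailing_date TEXT, study_type TEXT, status TEXT);
    CREATE TABLE attribute_table (country_id TEXT, personal_id INTEGER,
                                  attribute_id INTEGER, attribute_number INTEGER);
    -- Hypothetical rows: persons 1 and 3 match all three attributes, person 2 only two.
    INSERT INTO study_table VALUES
        ('US', 1, '2021-01-10', 'Wave', 'SENT'),
        ('US', 2, '2021-01-12', 'Wave', 'SENT'),
        ('CA', 3, '2021-01-15', 'Wave', 'SENT');
    INSERT INTO attribute_table VALUES
        ('US', 1, 1000, 2), ('US', 1, 1001, 5), ('US', 1, 1002, 1),
        ('US', 2, 1000, 2), ('US', 2, 1001, 5), ('US', 2, 1002, 3),
        ('CA', 3, 1000, 2), ('CA', 3, 1001, 5), ('CA', 3, 1002, 1);
""")

ATTR_FILTER = """(
       (a.attribute_id = 1000 AND a.attribute_number = 2)
    OR (a.attribute_id = 1001 AND a.attribute_number = 5)
    OR (a.attribute_id = 1002 AND a.attribute_number = 1))"""

STUDY_FILTER = """sa.mailing_date >= '2021-01-01' AND sa.mailing_date < '2021-02-01'
    AND sa.study_type = 'Wave'
    AND sa.status NOT IN ('NEW', 'EXCLUDED', 'ERROR')"""

# HAVING evaluated per country: person 2 slips through.
per_country = conn.execute(f"""
    SELECT sa.country_id, COUNT(DISTINCT sa.personal_id)
    FROM study_table sa
    JOIN attribute_table a
      ON sa.country_id = a.country_id AND sa.personal_id = a.personal_id
    WHERE {ATTR_FILTER} AND {STUDY_FILTER}
    GROUP BY sa.country_id
    HAVING COUNT(DISTINCT a.attribute_id) = 3
    ORDER BY sa.country_id
""").fetchall()

# HAVING evaluated per person first, then counted per country.
per_person = conn.execute(f"""
    SELECT sa.country_id, COUNT(DISTINCT sa.personal_id)
    FROM study_table sa
    JOIN (SELECT a.country_id, a.personal_id
          FROM attribute_table a
          WHERE {ATTR_FILTER}
          GROUP BY a.country_id, a.personal_id
          HAVING COUNT(DISTINCT a.attribute_id) = 3) m
      ON m.country_id = sa.country_id AND m.personal_id = sa.personal_id
    WHERE {STUDY_FILTER}
    GROUP BY sa.country_id
    ORDER BY sa.country_id
""").fetchall()
print(per_country, per_person)
```

With these rows the country-level HAVING counts two people for 'US', while the per-person version counts only the one who has all three attribute values.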
It seems your attribute_table follows an Entity-Attribute-Value (EAV) design. (IMHO that is an extremely bad plan, but if that is what you have, that is what you deal with.) So to get/validate/set 3 attributes you need to reference that table 3 times in the query, once for each attribute.
select sa.country_id
, count(distinct sa.personal_id )
from study_table sa
join ( select g.personal_id
from attribute_table g
join attribute_table ag on ag.personal_id = g.personal_id
join attribute_table hs on hs.personal_id = g.personal_id
where 1=1
and (g.attribute_id =1000 and g.attribute_number =2)
and (ag.attribute_id =1001 and ag.attribute_number =5)
and (hs.attribute_id =1002 and hs.attribute_number =1)
) sf
on sf.personal_id = sa.personal_id
where 1=1
and to_char(sa.mailing_date,'yyyy-mm')='2021-01'
and sa.study_type ='Wave'
and sa.status not in ('NEW','EXCLUDED','ERROR')
group by sa.country_id;
The above of course assumes the column names study_type and status originate in the study table - seems likely.
But if not you will need 2 more references to the attribute table.
Hint: Always use table aliases on ALL column references.

splitting select query into two or more parts

I am using TOAD for Oracle. While writing some SQL queries I ran into this problem:
My select query uses a few tables that each have approx. 10M rows; 2 of the tables have over 70M rows.
Let's say I have:
a TRANSACTION table (prim. key: SQ_TRANSACTION_ID)
a TRANSACTION_DETAIL table (foreign keys: RF_TRANSACTION_ID, RF_PRODUCT_ID)
a PRODUCT table (prim. key: SQ_PRODUCT_ID)
My select query looks like this:
SELECT TR.TRANSACTION_ID,
SUM(CASE WHEN PR.CD_PRODCUT_TYPE = 'A'
THEN TRD.CS_INVOICE_PRICE ELSE 0 END) A_PRODUCT_TOTAL,
SUM(CASE WHEN PR.CD_PRODCUT_TYPE <> 'A'
THEN TRD.CS_INVOICE_PRICE ELSE 0 END) B_PRODUCT_TOTAL
FROM TRANSACTION TR,
TRANSACTION_DETAIL TRD,
PRODUCT PR
WHERE TR.SQ_TRANSACTION_ID = TRD.RF_TRANSACTION_ID
AND TRD.RF_PRODUCT_ID = PR.SQ_PRODUCT_ID
GROUP BY TR.TRANSACTION_ID,
CASE WHEN PR.CD_PRODCUT_TYPE = 'A' THEN TRD.CS_INVOICE_PRICE ELSE 0 END,
CASE WHEN PR.CD_PRODCUT_TYPE <> 'A' THEN TRD.CS_INVOICE_PRICE ELSE 0 END
Is there a way to split this query into two or more parts that reference each other through their foreign/primary keys? I mean splitting it into two parts, where the first part fetches A_PRODUCT_TOTAL and the second part fetches B_PRODUCT_TOTAL. The transaction IDs of the two parts should match in the result data.
A direct translation of your query would be:
SELECT TR.TRANSACTION_ID, SUM(TRD.CS_INVOICE_PRICE) A_PRODUCT_TOTAL
FROM TRANSACTION TR join
TRANSACTION_DETAIL TRD
on TR.SQ_TRANSACTION_ID = TRD.RF_TRANSACTION_ID join
PRODUCT PR
on TRD.RF_PRODUCT_ID = PR.SQ_PRODUCT_ID
WHERE PR.CD_PRODCUT_TYPE = 'A'
GROUP BY TR.TRANSACTION_ID,
CASE WHEN PR.CD_PRODCUT_TYPE = 'A' THEN TRD.CS_INVOICE_PRICE ELSE 0 END
However, I suspect that you don't want the second expression in the GROUP BY, because each transaction would then be split into rows where the invoice price is the same:
SELECT TR.TRANSACTION_ID, SUM(TRD.CS_INVOICE_PRICE) A_PRODUCT_TOTAL
FROM TRANSACTION TR join
TRANSACTION_DETAIL TRD
on TR.SQ_TRANSACTION_ID = TRD.RF_TRANSACTION_ID join
PRODUCT PR
on TRD.RF_PRODUCT_ID = PR.SQ_PRODUCT_ID
WHERE PR.CD_PRODCUT_TYPE = 'A'
GROUP BY TR.TRANSACTION_ID;
The query for 'B' would be similar.
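To check how the split compares with a single pass, here is a small sketch on SQLite from Python. The rows are invented, the table name is double-quoted because TRANSACTION is an SQL keyword, and the column names (including the CD_PRODCUT_TYPE spelling) are taken from the question. It runs the original two-column aggregation grouped only by TRANSACTION_ID, then the 'A'-only query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "TRANSACTION" (SQ_TRANSACTION_ID INTEGER PRIMARY KEY, TRANSACTION_ID TEXT);
    CREATE TABLE PRODUCT (SQ_PRODUCT_ID INTEGER PRIMARY KEY, CD_PRODCUT_TYPE TEXT);
    CREATE TABLE TRANSACTION_DETAIL (RF_TRANSACTION_ID INTEGER, RF_PRODUCT_ID INTEGER,
                                     CS_INVOICE_PRICE INTEGER);
    -- Hypothetical rows: T1 has 'A' and 'B' products, T2 has only a 'B' product.
    INSERT INTO "TRANSACTION" VALUES (1, 'T1'), (2, 'T2');
    INSERT INTO PRODUCT VALUES (10, 'A'), (11, 'B');
    INSERT INTO TRANSACTION_DETAIL VALUES
        (1, 10, 100), (1, 10, 50), (1, 11, 30), (2, 11, 70);
""")

JOINS = """
    FROM "TRANSACTION" TR
    JOIN TRANSACTION_DETAIL TRD ON TR.SQ_TRANSACTION_ID = TRD.RF_TRANSACTION_ID
    JOIN PRODUCT PR ON TRD.RF_PRODUCT_ID = PR.SQ_PRODUCT_ID
"""

# Single pass: both totals, one row per transaction.
both = conn.execute(f"""
    SELECT TR.TRANSACTION_ID,
           SUM(CASE WHEN PR.CD_PRODCUT_TYPE = 'A'
                    THEN TRD.CS_INVOICE_PRICE ELSE 0 END) AS A_PRODUCT_TOTAL,
           SUM(CASE WHEN PR.CD_PRODCUT_TYPE <> 'A'
                    THEN TRD.CS_INVOICE_PRICE ELSE 0 END) AS B_PRODUCT_TOTAL
    {JOINS}
    GROUP BY TR.TRANSACTION_ID
    ORDER BY TR.TRANSACTION_ID
""").fetchall()

# Split version for 'A' only: the WHERE filter removes transactions with no 'A' rows.
only_a = conn.execute(f"""
    SELECT TR.TRANSACTION_ID, SUM(TRD.CS_INVOICE_PRICE) AS A_PRODUCT_TOTAL
    {JOINS}
    WHERE PR.CD_PRODCUT_TYPE = 'A'
    GROUP BY TR.TRANSACTION_ID
    ORDER BY TR.TRANSACTION_ID
""").fetchall()
print(both, only_a)
```

Note the difference in row sets: the single-pass query returns T2 with an 'A' total of 0, whereas the split 'A' query omits T2 entirely, so matching the two parts on transaction ID afterwards would need an outer join.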