Why can't I correctly group data in SQL? - sql

I am trying to pull data and format with these headers in this order.
I'm pulling the data from Snowflake using SQL, and in the table I'm pulling the data from, the PD_amt and CN_amt are listed as separate transactions. Currently, when I pull the data, it still shows up as two separate lines with null values rather than being correctly grouped into a single row under a single user id; you can see it here in the output (I highlighted a few rows to show the issue).
select atr.user_id
, date_trunc('day', atr.trans_date) trans_day
, atr.year_month
, case when atr.trans_type = 'PD' then atr.trans_id end as PD_transaction_id
, case when atr.trans_type = 'PD' then sum(atr.amount) end as PD_amt
, case when atr.trans_type = 'CN' then atr.trans_id end as CN_transaction_id
, case when atr.trans_type = 'CN' then sum(atr.amount) end as CN_amt
from wisen_data.sm_account_trans atr
where ((trans_type = 'CN' and trans_sub_type = 'HSNMLP')
or trans_type in ('PP','PD'))
and atr.status not in ('VOIDED','FAILED')
and atr.trans_date >= date_trunc('month', current_date-3*365)
group by atr.user_id, trans_day, atr.year_month, atr.trans_type, atr.trans_id
order by trans_day
I'm not super proficient with SQL so I'm hoping to get some quick help to get this to work. Thank you!

Use conditional aggregation. The case expression is the argument to the sum():
select atr.user_id,
date_trunc('day', atr.trans_date) as trans_day,
atr.year_month,
sum(case when atr.trans_type = 'PD' then atr.amount end) as PD_amt,
sum(case when atr.trans_type = 'CN' then atr.amount end) as CN_amt
from wisen_data.sm_account_trans atr
where ((trans_type = 'CN' and trans_sub_type = 'HSNMLP')
or trans_type in ('PP','PD')
) and
atr.status not in ('VOIDED','FAILED') and
atr.trans_date >= date_trunc('month', current_date-3*365)
group by atr.user_id, trans_day, atr.year_month
order by trans_day;
I removed PD_transaction_id and CN_transaction_id because I'm not sure what these are supposed to be.
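To see why moving the CASE inside SUM() fixes the split rows, here is a minimal sketch using Python's built-in sqlite3 module. The table account_trans and its rows are made up for illustration and are not the asker's Snowflake schema; the point is only that trans_type no longer appears in the GROUP BY, so both transaction types land on one row per user and day.

```python
import sqlite3

# Hypothetical miniature of the transactions table; names and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_trans "
    "(user_id INTEGER, trans_day TEXT, trans_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO account_trans VALUES (?, ?, ?, ?)",
    [
        (1, "2021-01-05", "PD", 100.0),
        (1, "2021-01-05", "CN", 40.0),
        (2, "2021-01-06", "PD", 75.0),
    ],
)

# The CASE is the argument of SUM, and trans_type is not a grouping column,
# so PD and CN amounts collapse into a single row per user and day.
rows = conn.execute(
    """
    SELECT user_id,
           trans_day,
           SUM(CASE WHEN trans_type = 'PD' THEN amount END) AS pd_amt,
           SUM(CASE WHEN trans_type = 'CN' THEN amount END) AS cn_amt
    FROM account_trans
    GROUP BY user_id, trans_day
    ORDER BY trans_day, user_id
    """
).fetchall()

for row in rows:
    print(row)
```

User 2 has no CN transaction, so cn_amt comes back as NULL (None in Python) for that row; wrap the SUM in COALESCE(..., 0) if a zero is preferred.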

Related

How to present a particular SQL queried row as columns in output

I need to present the attached output in PIC1 as the result in PIC2. The query used for generating PIC1 output in SQLDeveloper:
select subs_nm, as_of_date, run_status, (select max (tp.pr_vl)
from ual_mng.tqueue tq, ual_mng.tparams tp, ual_mng.tstatus ts
WHERE tq.tid = tp.tid AND tq.tid = ts.tid and tq.run_id = pcm.run_id and tp.pr_nm in ('TOT_RECORD_CNT')) as RECORD_COUNT
from UAL_MNG.PCM_SUBS_RUN_DTL_VW pcm where SUBS_NM='S_TS2_AQUA_A1_RLAP_DL' and AS_OF_DATE in ('2021-09-01','2021-09-02') order by run_start_dtm desc;
Appreciate all help.
If you don't need it to be dynamic (i.e. there will only ever be two columns and you know which two dates they are), you can do:
select subs_nm,
max(case when as_of_date = '2021-09-01' then RECORD_COUNT else 0 end) as SEP1,
max(case when as_of_date = '2021-09-02' then RECORD_COUNT else 0 end) as SEP2
from (
-- Your query
)
group by subs_nm
You can work out the percentage difference using the same expressions.
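N.B. I would always use an explicit date format mask. Comparing a date column to a bare string relies on the session's implicit date format, so the query might not run on a different machine or client. Use to_date('2021-09-01', 'yyyy-mm-dd') instead.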
Posting the query that worked in the script:
select subs_nm, SEP1, SEP2, round((((SEP1-SEP2)/SEP1)*100),2) as DIFF_PER
from (
select subs_nm,
max(case when as_of_date='2021-09-01' then RECORD_COUNT else '0' end) as SEP1,
max(case when as_of_date='2021-09-02' then RECORD_COUNT else '0' end) as SEP2
from (
-- *Main Query*
)
group by subs_nm
);
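The pivot plus percentage difference can be tried out end to end with a small sketch in Python's sqlite3. The table run_counts, its columns and its rows are hypothetical stand-ins for the "main query" above. The NULLIF guard is my own addition, not part of the posted query: it avoids a divide-by-zero when the first date has no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the result of the asker's main query.
conn.execute(
    "CREATE TABLE run_counts (subs_nm TEXT, as_of_date TEXT, record_count INTEGER)"
)
conn.executemany(
    "INSERT INTO run_counts VALUES (?, ?, ?)",
    [
        ("S1", "2021-09-01", 200),
        ("S1", "2021-09-02", 150),
        ("S2", "2021-09-02", 80),  # no row for 2021-09-01
    ],
)

# Inner query: one column per date. Outer query: percentage difference,
# with NULLIF turning a zero denominator into NULL instead of an error.
rows = conn.execute(
    """
    SELECT subs_nm, sep1, sep2,
           ROUND((sep1 - sep2) * 100.0 / NULLIF(sep1, 0), 2) AS diff_per
    FROM (
        SELECT subs_nm,
               MAX(CASE WHEN as_of_date = '2021-09-01' THEN record_count ELSE 0 END) AS sep1,
               MAX(CASE WHEN as_of_date = '2021-09-02' THEN record_count ELSE 0 END) AS sep2
        FROM run_counts
        GROUP BY subs_nm
    )
    ORDER BY subs_nm
    """
).fetchall()

for row in rows:
    print(row)
```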

aggregate function error in case expression

I have this query
SELECT mylearning.Employee_Id,
case
when max(case when not mylearning.CourseStatusTXT = 'Completed' then 1 else 0 end) = 0 then '2018 Complete'
when max(case when mylearning.CourseStatusTXT in ('Started', 'Not Started') then 1 else 0 end) = 1 then '2018 Not Complete'
end as Completion_Status
FROM Analytics.myLearning_Completions as mylearning inner join Analytics.Workday WD on mylearning.Employee_ID = WD.Employee_ID
And I want to add a condition to the first when statement to make it like this
when max(case when not mylearning.CourseStatusTXT = 'Completed' then 1 else 0 end) = 0
and WD.Adjusted_Hire_Date like '2019% '
and mylearning.CourseTimeCompletedH < cast (WD.Adjusted_Hire_Date as date format 'YYYY/MM/DD') +7
then '2018 Complete'
but I keep getting this error
Executed as Single statement. Failed [3504 : HY000] Selected non-aggregate values must be part of the associated group.
Elapsed time = 00:00:00.069
How can I fix it?
Like a couple others mentioned, you are trying to mix grouped data with non-aggregated data in your calculation, which is why you're getting the 3504 error. You need to either include the referenced columns in your GROUP BY or wrap them in an aggregate function (e.g. MAX).
I'm not 100% sure if this is what you're after, but hopefully it can help you along.
SELECT
mylearning.Employee_Id,
CASE
WHEN
MAX(CASE WHEN NOT mylearning.CourseStatusTXT = 'Completed' THEN 1 ELSE 0 END) = 0 AND
WD.Adjusted_Hire_Date like '2019% ' AND
-- Check if most recently completed course is before Hire (Date + 1 week)
MAX(mylearning.CourseTimeCompletedH) <
CAST(WD.Adjusted_Hire_Date AS DATE FORMAT 'YYYY/MM/DD') + 7
THEN '2018 Complete' -- No incomplete learnings
WHEN MAX(
CASE WHEN mylearning.CourseStatusTXT IN ('Started', 'Not Started') THEN 1 ELSE 0 END
) = 1 THEN '2018 Not Complete' -- Started / Not Started learnings exist
END AS Completion_Status
FROM Analytics.myLearning_Completions as mylearning -- Get learning info
INNER JOIN Analytics.Workday WD on mylearning.Employee_ID = WD.Employee_ID -- Employee info
GROUP BY mylearning.Employee_Id, WD.Adjusted_Hire_Date
This will give you a summary per employee, with a couple assumptions:
Assuming employee_ID value in Analytics.Workday is a unique value (one-to-one join), to use WD.Adjusted_Hire_Date in your comparisons, you just need to include it in the GROUP BY.
Assuming you have multiple courses per employee_Id, in order to use mylearning.CourseTimeCompletedH in your comparisons, you'd need to wrap that in an aggregate like MAX.
The caveat here is that the query will check if the most recently completed course per employee is before the "hire_date" expression, so I'm not sure if that's what you're after.
Give it a try and let me know.
The issue here is that you are mixing row-by-row detail with grouped or aggregated data in the same query. Without a GROUP BY clause, an aggregate outputs a single value for all the rows. With a GROUP BY clause, it outputs a single value for each group. When you are grouping you can also select any columns that appear in the GROUP BY clause, since each has exactly one value per group.
If you want this data for each employee, you could group by employee_id. Any other column would then also need to be inside an aggregate, like MAX(Adjusted_Hire_Date).
Maybe this is what you want?
SELECT
mylearning.employee_id
, case
when CourseStatusTXT = 'Completed' and WD.Adjusted_Hire_Date like '2019%'
and mylearning.CourseTimeCompletedH < cast (WD.Adjusted_Hire_Date as date format 'YYYY/MM/DD') +7
then '2018 Complete'
else '2018 Not Complete'
end CompletionStatus
FROM myLearning_Completions mylearning, Workday WD
WHERE mylearning.employee_id = WD.employee_id
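The rule discussed in this thread (every selected column is either in the GROUP BY or inside an aggregate) can be sketched with Python's sqlite3. The tables completions and workday and their rows are invented for illustration. One caveat: SQLite is more lenient than Teradata and would not raise error 3504 for a stray non-aggregated column, so this only shows the shape of a query that satisfies the rule, not the error itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of the two tables in the question.
conn.executescript(
    """
    CREATE TABLE completions (employee_id INTEGER, course_status TEXT);
    CREATE TABLE workday (employee_id INTEGER PRIMARY KEY, hire_date TEXT);
    INSERT INTO completions VALUES
        (1, 'Completed'), (1, 'Completed'),
        (2, 'Completed'), (2, 'Started');
    INSERT INTO workday VALUES (1, '2019-03-01'), (2, '2019-05-10');
    """
)

# hire_date is one-to-one with employee_id, so it can simply join the GROUP BY.
# course_status varies within a group, so it is only used inside MAX(CASE ...).
status_rows = conn.execute(
    """
    SELECT c.employee_id,
           w.hire_date,
           CASE WHEN MAX(CASE WHEN c.course_status <> 'Completed' THEN 1 ELSE 0 END) = 0
                THEN 'Complete'
                ELSE 'Not Complete'
           END AS completion_status
    FROM completions c
    JOIN workday w ON w.employee_id = c.employee_id
    GROUP BY c.employee_id, w.hire_date
    ORDER BY c.employee_id
    """
).fetchall()

for row in status_rows:
    print(row)
```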

SQL Query to consolidate multiple records on one row based on criteria

I want to produce a SQL Query which looks at multiple rows and populates certain information in certain columns. I am using SQL Server.
I have 2 tables: a transaction table and a transaction data table. Sample data below.
What I am after is a query which returns this:
The Bank column and Amount column should get their information from the record that has the bank as the Category (i.e. the first line in the TransData table, for example). The Category column should take its value from the Category of the other record if there is only one record apart from the bank record; otherwise it should be populated with 'Multiple'.
Initially I thought this was relatively straight forward, but I was wrong, and now I am stuck.
The code I have got so far is:
SELECT
T.dtm_TransDate,
T.txt_Type,
T.txt_Description,
CASE
WHEN TD.txt_Category = 'Current' THEN TD.txt_Category
END AS 'Bank',
TD.dbl_Amount
FROM dbo.tbl_Trans AS T
JOIN dbo.tbl_TransData AS TD ON TD.int_TransID = T.int_Trans_ID
WHERE
(T.txt_Type = 'REC' OR T.txt_Type = 'PAY')
AND T.dtm_TransDate > '2019-02-01'
This produces every record, but I need to consolidate several records into one.
I did not know if I need to use a pivot for this, but unsure on how that works.
Any help or pointing me in the right direction would be much appreciated. Any further information needed or clarified, please let me know.
Thanks in advance
This might work well for you. The inner query pre-aggregates per transaction ID. For the rows whose category is "Bank", it sums the amounts and also returns the category value and a count. At the same time it counts the rows that are NOT the bank and takes the max of their category: if there is only one such row, great, that max is the category you want. If there is more than one, the COUNTER column tells you that too.
Then the outer query applies a test based on the COUNTER column to either retrieve the bank or category (multiple) respectively.
select
T.TransID,
T.TransDate,
T.Reference,
T.Description,
case when PQ.CountOfBank = 1
then PQ.SingleBank
else 'Multiple Bank' end Bank,
case when PQ.CountOfPurpose = 1
then PQ.SinglePurpose
else 'Multiple' end Category,
PQ.SumOfBank amount
from
(select
TD.TransID,
sum( case when TD.Category = 'Bank'
then TD.Amount else 0 end ) SumOfBank,
sum( case when TD.Category = 'Bank'
then 1 else 0 end ) CountOfBank,
max( case when TD.Category = 'Bank'
then TD.Category else ' ' end ) as SingleBank,
sum( case when TD.Category = 'Bank'
then 0 else 1 end ) CountOfPurpose,
max( case when TD.Category = 'Bank'
then ' ' else TD.Category end ) as SinglePurpose
from
TransData TD
group by
TD.TransID ) PQ
JOIN Trans T
on PQ.TransID = T.TransID
If you need a date filter applied, add that to the INNER PreQuery (PQ) WHERE clause portion.
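Here is a cut-down sketch of the pre-aggregation idea in Python's sqlite3, collapsed into a single query for brevity. The table trans_data, its columns and the sample rows are hypothetical, and the bank row is assumed to be marked with the literal category 'Bank'. It shows the counter deciding between the single category and 'Multiple'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detail table: one bank row plus one or more category rows per transaction.
conn.execute("CREATE TABLE trans_data (trans_id INTEGER, category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO trans_data VALUES (?, ?, ?)",
    [
        (1, "Bank", 100), (1, "Rent", 100),
        (2, "Bank", 250), (2, "Food", 150), (2, "Fuel", 100),
    ],
)

# Amount comes only from the bank row. The category is the single non-bank
# category when exactly one exists, otherwise the literal 'Multiple'.
rows = conn.execute(
    """
    SELECT trans_id,
           SUM(CASE WHEN category = 'Bank' THEN amount ELSE 0 END) AS bank_amount,
           CASE WHEN SUM(CASE WHEN category <> 'Bank' THEN 1 ELSE 0 END) = 1
                THEN MAX(CASE WHEN category <> 'Bank' THEN category END)
                ELSE 'Multiple'
           END AS category
    FROM trans_data
    GROUP BY trans_id
    ORDER BY trans_id
    """
).fetchall()

for row in rows:
    print(row)
```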
Do you just want a group by?
SELECT T.dtm_TransDate, T.txt_Type, T.txt_Description,
'BANK' as bank, -- unclear what the logic is here
(CASE WHEN MIN(TD.txt_Category) = MAX(TD.txt_Category)
THEN MIN(TD.txt_Category)
ELSE 'Multiple'
END) as category,
SUM(TD.dbl_Amount) as Amount
FROM dbo.tbl_Trans T JOIN
dbo.tbl_TransData TD
ON TD.int_TransID = T.int_Trans_ID
WHERE T.txt_Type IN ('REC', 'PAY') AND
T.dtm_TransDate > '2019-02-01'
GROUP BY T.dtm_TransDate, T.txt_Type, T.txt_Description;
Can't you just use something similar to the below?:
SELECT
T.dtm_TransDate,
T.txt_Type,
T.txt_Description,
CASE
WHEN TD.txt_Category = 'Current' THEN TD.txt_Category
END AS 'Bank',
MAX(TD.dbl_Amount)
FROM dbo.tbl_Trans AS T
JOIN dbo.tbl_TransData AS TD ON TD.int_TransID = T.int_Trans_ID
WHERE (T.txt_Type = 'REC' OR T.txt_Type = 'PAY')
AND T.dtm_TransDate > '2019-02-01'
GROUP BY T.dtm_TransDate, T.txt_Type, T.txt_Description, TD.txt_Category
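What I did here was add a MAX() on the Amount attribute taken from the second table. This returns the greatest dbl_Amount within each group (it does not operate on the TransID you are joining on). Note that TD.txt_Category is part of the GROUP BY, so you still get one row per category.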
Hope this helps :)

compare two different date ranges sales in two columns

I want to compare sales for two different date ranges in two columns. I am using the query below, but it's giving the wrong sales. Please correct my query.
select s1.Itm_cd,s1.Itm_Name,Sum(S1.amount),Sum(s2.amount)
from salestrans s1,salestrans s2
where s1.Itm_cd = S2.Itm_cd
and S1.Tran_dt between '20181101' and '20181130'
and S2.Tran_dt between '20171101' and '20171130'
group by s1.Itm_cd,s1.Itm_Name
Order by s1.Itm_cd
I suspect that you want conditional aggregation here:
WITH cte AS (
SELECT
s1.Itm_cd,
s1.Itm_Name,
SUM(CASE WHEN s1.Tran_dt BETWEEN '20181101' AND '20181130'
THEN s1.amount ELSE 0 END) AS sum_2018,
SUM(CASE WHEN s1.Tran_dt BETWEEN '20171101' AND '20171130'
THEN s1.amount ELSE 0 END) AS sum_2017
FROM salestrans s1
GROUP BY
s1.Itm_cd,
s1.Itm_Name
)
SELECT
Itm_cd,
Itm_Name,
sum_2018,
sum_2017,
CASE WHEN COALESCE(sum_2017, 0) <> 0
THEN FORMAT(100.0 * (sum_2018 - sum_2017) / sum_2017, 'N', 'en-us')
ELSE 'NA' END AS growth_pct
FROM cte
ORDER BY
Itm_cd;
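A likely reason the original self-join reports wrong sales is that every 2018 row for an item gets paired with every 2017 row for the same item, so each side is summed several times over. The sketch below, in Python's sqlite3 with an invented salestrans table (dates stored as 'YYYYMMDD' text), contrasts the two approaches on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sales rows: two sales in Nov 2018, three in Nov 2017, same item.
conn.execute("CREATE TABLE salestrans (itm_cd TEXT, tran_dt TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO salestrans VALUES (?, ?, ?)",
    [
        ("A", "20181105", 10), ("A", "20181120", 20),
        ("A", "20171103", 5), ("A", "20171110", 5), ("A", "20171125", 5),
    ],
)

# Self-join: 2 x 3 = 6 joined rows, so each side is counted multiple times.
joined = conn.execute(
    """
    SELECT s1.itm_cd, SUM(s1.amount), SUM(s2.amount)
    FROM salestrans s1
    JOIN salestrans s2 ON s1.itm_cd = s2.itm_cd
    WHERE s1.tran_dt BETWEEN '20181101' AND '20181130'
      AND s2.tran_dt BETWEEN '20171101' AND '20171130'
    GROUP BY s1.itm_cd
    """
).fetchall()

# Conditional aggregation: a single pass over the table, each row counted once.
pivoted = conn.execute(
    """
    SELECT itm_cd,
           SUM(CASE WHEN tran_dt BETWEEN '20181101' AND '20181130' THEN amount ELSE 0 END),
           SUM(CASE WHEN tran_dt BETWEEN '20171101' AND '20171130' THEN amount ELSE 0 END)
    FROM salestrans
    GROUP BY itm_cd
    """
).fetchall()

print("self-join:", joined)    # inflated totals
print("conditional:", pivoted)  # true totals: 30 and 15
```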
Please try the following
select s1.Itm_cd,s1.Itm_Name,Sum(S1.amount),Sum(s2.amount)
from salestrans s1,salestrans s2
where s1.Itm_cd = S2.Itm_cd
and Convert(Varchar(10),S1.Tran_dt,112) between '20181101' and '20181130'
and Convert(Varchar(10),S2.Tran_dt,112) between '20171101' and '20171130'
group by s1.Itm_cd,s1.Itm_Name
Order by s1.Itm_cd
The logic here is that the right-hand side of the comparison is a date only, with no separators and no time part. The column on the left-hand side should be converted to the same format before comparing.
if(Convert(Varchar(10), getdate(),112) = '20181224')
print 'Matched'
else
print 'Not Matched'
if(getdate() = '20181224')
print 'Matched'
else
print 'Not Matched'
Here the output is Matched for the first check and Not Matched for the second, because only the first converts both sides to the same format before comparing.
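The same effect can be reproduced by analogy in Python's sqlite3. SQLite stores timestamps as text rather than as a SQL Server datetime, so this is only an illustration of the principle, and the events table is made up. It also shows a third option that I am adding here as a suggestion: a half-open range, which matches the whole day without converting the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a timestamp that carries a time-of-day part.
conn.execute("CREATE TABLE events (tran_dt TEXT)")
conn.execute("INSERT INTO events VALUES ('2018-12-24 10:15:00')")


def count_where(condition: str) -> int:
    """Return how many rows satisfy the given WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM events WHERE {condition}").fetchone()[0]


# Raw timestamp vs. date-only literal: the time part prevents a match.
raw_match = count_where("tran_dt = '2018-12-24'")

# Reduce the column to a date first, then compare like with like.
converted_match = count_where("date(tran_dt) = '2018-12-24'")

# Half-open range: covers the whole day, leaves the column untouched.
range_match = count_where("tran_dt >= '2018-12-24' AND tran_dt < '2018-12-25'")

print(raw_match, converted_match, range_match)
```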

PIVOT from JOIN tables

I have a large database and in this database, there are two tables I need to pull information from. I have pulled all the data I need out of the two tables by using both a JOIN and a CASE WHEN. Here is a screenshot of the output:
[Screenshot: SQL Server output]
This is the code that I used to pull the data:
SELECT [PORTMultiMax].[dbo].cardholdertable.cardid as CardID,CardHolderTable.FirstName as FirstName,CardHolderTable.LastName as LastName, CardHolderTable.InitLet as MI, CardHolderPersonalDataXrTable.PersonalDataItem as Data,
CASE WHEN PersonalDataID = '4' THEN 'SSN'
WHEN PersonalDataID = '22' THEN 'Employer'
WHEN PersonalDataID = '30' THEN 'Training Type'
WHEN PersonalDataID = '32' THEN 'Primary Sponsor'
WHEN PersonalDataID = '37' THEN 'Training Date'
ELSE NULL END AS Description
FROM [PORTMultiMax].[dbo].[CardHolderTable]
join [PORTMultiMax].[dbo].[CardHolderPersonalDataXrTable]
on cardholdertable.CardID=CardHolderPersonalDataXrTable.CardID
where PersonalDataID IN (4,22,30,32,33,37)
order by LastName
The tables involved are named: CardHolderTable and CardHolderPersonalDataXrTable
What I need to do next is get rid of the duplicate name entries in the data. So for example, "JAMES AARON" has multiple rows due to him having multiple descriptors ("Training Date, TrainingType, Employer, and SSN").
I wanted to try and use a PIVOT to pull the row data out and put them in columns named "SSN, Employer, etc...". My problem is I have never used PIVOT before and I am confused on how to apply a PIVOT code to my current SQL query.
PLEASE HELP. Thank you so much
Given your query, I think conditional aggregation is simpler:
SELECT ch.FirstName, ch.LastName,
MAX(CASE WHEN PersonalDataID = '4' THEN 1 ELSE 0 END) as is_SSN,
MAX(CASE WHEN PersonalDataID = '22' THEN 1 ELSE 0 END) as is_Employer,
MAX(CASE WHEN PersonalDataID = '30' THEN 1 ELSE 0 END) as is_TrainingType,
MAX(CASE WHEN PersonalDataID = '32' THEN 1 ELSE 0 END) as is_PrimarySponsor,
MAX(CASE WHEN PersonalDataID = '37' THEN 1 ELSE 0 END) as is_TrainingDate
from [PORTMultiMax].[dbo].[CardHolderTable] ch join
[PORTMultiMax].[dbo].[CardHolderPersonalDataXrTable] cp
on ch.CardID = cp.CardID
where PersonalDataID IN (4, 22, 30, 32, 33, 37)
group by ch.FirstName, ch.LastName;
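The flags above only say whether each descriptor exists. If the goal is to show the values themselves in columns named SSN, Employer and so on, as the question describes, the same pattern works with the data item in the THEN branch. Below is a minimal sketch in Python's sqlite3. The table person_data, its columns and its rows are invented, and grouping by the card ID rather than only the name is my assumption, made so that two people with the same name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, already-joined view of the two tables: one row per card and data item.
conn.execute(
    "CREATE TABLE person_data "
    "(card_id INTEGER, last_name TEXT, data_id INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO person_data VALUES (?, ?, ?, ?)",
    [
        (1, "AARON", 4, "xxx-xx-0001"),
        (1, "AARON", 22, "Acme"),
        (1, "AARON", 30, "Safety"),
        (2, "BAKER", 22, "Globex"),
    ],
)

# MAX ignores NULLs, so each column picks up the one matching item per card
# and stays NULL when that descriptor is missing.
rows = conn.execute(
    """
    SELECT card_id,
           last_name,
           MAX(CASE WHEN data_id = 4  THEN item END) AS ssn,
           MAX(CASE WHEN data_id = 22 THEN item END) AS employer,
           MAX(CASE WHEN data_id = 30 THEN item END) AS training_type
    FROM person_data
    GROUP BY card_id, last_name
    ORDER BY card_id
    """
).fetchall()

for row in rows:
    print(row)
```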