I've written a query in Apache Spark in the hope of taking multiple rows of customer data and rolling them up into one row per customer, showing which types of products they have open. So data that looks like this:
Customer Product
1 Savings
1 Checking
1 Auto
Ends up looking like this:
Customer Product
1 Savings/Checking/Auto
The query currently still returns multiple rows. I tried GROUP BY, but that doesn't show all the products a customer has; it just shows one product.
Is there a way to do this in Apache Spark or SQL (which is really similar to Spark)? Unfortunately, I don't have MySQL, nor do I think IT will install it for me.
SELECT
"ACCOUNT"."account_customerkey" AS "account_customerkey",
max(
concat(case when Savings=1 then ' Savings'end,
case when Checking=1 then ' Checking 'end,
case when CD=1 then ' CD /'end,
case when IRA=1 then ' IRA /'end,
case when StandardLoan=1 then ' SL /'end,
case when Auto=1 then ' Auto /'end,
case when Mortgage=1 then ' Mortgage /'end,
case when CreditCard=1 then ' CreditCard 'end)) AS Description
FROM "ACCOUNT" "ACCOUNT"
inner join (
SELECT
"ACCOUNT"."account_customerkey" AS "customerkey",
CASE WHEN "ACCOUNT"."account_producttype" = 'Savings' THEN 1 ELSE NULL END AS Savings,
CASE WHEN "ACCOUNT"."account_producttype" = 'Checking' THEN 1 ELSE NULL END AS Checking,
CASE WHEN "ACCOUNT"."account_producttype" = 'CD' THEN 1 ELSE NULL END AS CD,
CASE WHEN "ACCOUNT"."account_producttype" = 'IRA' THEN 1 ELSE NULL END AS IRA,
CASE WHEN "ACCOUNT"."account_producttype" = 'Standard Loan' THEN 1 ELSE NULL END AS StandardLoan,
CASE WHEN "ACCOUNT"."account_producttype" = 'Auto' THEN 1 ELSE NULL END AS Auto,
CASE WHEN "ACCOUNT"."account_producttype" = 'Mortgage' THEN 1 ELSE NULL END AS Mortgage,
CASE WHEN "ACCOUNT"."account_producttype" = 'Credit Card' THEN 1 ELSE NULL END AS CreditCard
FROM "ACCOUNT" "ACCOUNT"
)a on "account_customerkey" =a."customerkey"
GROUP BY
"ACCOUNT"."account_customerkey"
Please try this.
scala> df.show()
+--------+--------+
|Customer| Product|
+--------+--------+
| 1| Savings|
| 1|Checking|
| 1| Auto|
| 2| Savings|
| 2| Auto|
| 3|Checking|
+--------+--------+
scala> df.groupBy($"Customer").agg(collect_list($"Product").as("Product")).select($"Customer",concat_ws(",",$"Product").as("Product")).show(false)
+--------+---------------------+
|Customer|Product |
+--------+---------------------+
|1 |Savings,Checking,Auto|
|3 |Checking |
|2 |Savings,Auto |
+--------+---------------------+
scala>
See https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/collect_list and related functions
You need to use collect_list which is available with SQL or %sql.
%sql
select id, collect_list(num)
from t1
group by id
I used my own data, you need to tailor. Just demonstrating in more native SQL form.
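If you want to check the roll-up logic without a Spark cluster, here is a minimal sketch. It is only an illustration: Python's built-in sqlite3 module stands in for the engine, and SQLite's group_concat plays the role that concat_ws over collect_list plays in Spark SQL. The table name accounts and the helper rollup_products are made up for the example.

```python
import sqlite3

def rollup_products(rows):
    """Return {customer: 'Product/Product/...'} using a single GROUP BY query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (customer INTEGER, product TEXT)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    # group_concat is SQLite's string aggregate; in Spark SQL the equivalent
    # idea is concat_ws('/', collect_list(product)). Element order within a
    # group is not guaranteed in either engine.
    result = dict(con.execute(
        "SELECT customer, group_concat(product, '/') "
        "FROM accounts GROUP BY customer"
    ))
    con.close()
    return result

rows = [(1, "Savings"), (1, "Checking"), (1, "Auto"),
        (2, "Savings"), (2, "Auto"), (3, "Checking")]
rolled = rollup_products(rows)
for customer, products in sorted(rolled.items()):
    print(customer, products)
```

Each customer comes back as one row with a slash-separated product list, which is the shape asked for in the question.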
Related
My initial query and result looks like below
SELECT * FROM (
SELECT S.CURRENCYCD , S.COMBINEDTXNAMOUNT
, S.INTERCHANGEAMOUNT
, S.OTHERCHARGEAMOUNT
,S.APPLIEDCHARGES
FROM
(SELECT X.CURRENCYCD,
CASE WHEN X.CombinedTxnAmount IS NULL THEN 0 ELSE X.CombinedTxnAmount END AS CombinedTxnAmount,
CASE WHEN X.InterchangeAmount IS NULL THEN 0 ELSE X.InterchangeAmount END AS InterchangeAmount,
CASE WHEN X.OtherChargeAmount IS NULL THEN 0 ELSE X.OtherChargeAmount END AS OtherChargeAmount,
CASE WHEN X.MinimumBillAdjustmentAmount IS NULL THEN 0 ELSE X.MinimumBillAdjustmentAmount END +
CASE WHEN X.AppliedChargeAmount IS NULL THEN 0 ELSE X.AppliedChargeAmount END AS APPLIEDCHARGES FROM
XMLTABLE (XMLNAMESPACES('http://www.eds.com/AgileCard/xsd/ACFMerchantStatement/2013/05' AS "acf",'http://www.eds.com/AgileCard/xsd/ACFMerchant/2012/03' as "acfMerchant")
,'/acf:Statement/acf:ChargeSummaryTotals/acf:ChargeSummaryTotalDetail' passing xmltype ('Myxml')
COLUMNS CURRENCYCD VARCHAR(4) PATH 'acfMerchant:CurrencyCd',
CombinedTxnAmount NUMBER PATH 'acf:CombinedTxnAmount',
InterchangeAmount NUMBER PATH 'acf:InterchangeAmount',
OtherChargeAmount NUMBER PATH 'acf:OtherChargeAmount',
MinimumBillAdjustmentAmount NUMBER PATH 'acf:MinimumBillAdjustmentAmount',
AppliedChargeAmount NUMBER PATH 'acf:AppliedChargeAmount'
)X ) S)
The result is:
currencyCD | combinedTxnAmount | InterchangeAmount| OtherChargeAmount|AppliedCharges
GBP | 126.5 | 97.02 | 252.92 | 476.44
I am trying to unpivot this query result but for some reason I am not getting the expected result. Below is my unpivot query.
SELECT * FROM (
SELECT S.CURRENCYCD , S.COMBINEDTXNAMOUNT
, S.INTERCHANGEAMOUNT
, S.OTHERCHARGEAMOUNT
,S.APPLIEDCHARGES
FROM
(SELECT X.CURRENCYCD,
CASE WHEN X.CombinedTxnAmount IS NULL THEN 0 ELSE X.CombinedTxnAmount END AS CombinedTxnAmount,
CASE WHEN X.InterchangeAmount IS NULL THEN 0 ELSE X.InterchangeAmount END AS InterchangeAmount,
CASE WHEN X.OtherChargeAmount IS NULL THEN 0 ELSE X.OtherChargeAmount END AS OtherChargeAmount,
CASE WHEN X.MinimumBillAdjustmentAmount IS NULL THEN 0 ELSE X.MinimumBillAdjustmentAmount END +
CASE WHEN X.AppliedChargeAmount IS NULL THEN 0 ELSE X.AppliedChargeAmount END AS APPLIEDCHARGES FROM
XMLTABLE (XMLNAMESPACES('http://www.eds.com/AgileCard/xsd/ACFMerchantStatement/2013/05' AS "acf",'http://www.eds.com/AgileCard/xsd/ACFMerchant/2012/03' as "acfMerchant")
,'/acf:Statement/acf:ChargeSummaryTotals/acf:ChargeSummaryTotalDetail' passing xmltype ('myxml')
COLUMNS CURRENCYCD VARCHAR(4) PATH 'acfMerchant:CurrencyCd',
CombinedTxnAmount NUMBER PATH 'acf:CombinedTxnAmount',
InterchangeAmount NUMBER PATH 'acf:InterchangeAmount',
OtherChargeAmount NUMBER PATH 'acf:OtherChargeAmount',
MinimumBillAdjustmentAmount NUMBER PATH 'acf:MinimumBillAdjustmentAmount',
AppliedChargeAmount NUMBER PATH 'acf:AppliedChargeAmount'
)X ) S) UNPIVOT ((AMT) FOR TASK IN ((COMBINEDTXNAMOUNT) AS 'Combined Transaction Charge',
(INTERCHANGEAMOUNT) as 'Interchange'
,(OTHERCHARGEAMOUNT) as 'Other Charges'
,(APPLIEDCHARGES) as 'Charges Applied To Account'
))
Below is what this query returns:
currencycd | Task | AMT
GBP | Combined Transaction Charge | 126.5
GBP | Interchange | 0
GBP | Other Charges | 0
GBP | Charges Applied To Account | 0
Only the first unpivot value displays the amount correctly. Remaining values are displayed as zero.
Can someone point out what is going wrong with this query?
I have a table that lists items and a status about these items. The problem is that some items have multiple different status entries. For example.
HOST Status
1.1.1.1 PASS
1.1.1.1 FAIL
1.2.2.2 FAIL
1.2.3.3 PASS
1.4.2.1 FAIL
1.4.2.1 FAIL
1.1.4.4 NULL
I need to return one status per asset.
HOST Status
1.1.1.1 PASS
1.2.2.2 FAIL
1.2.3.3 PASS
1.4.2.1 FAIL
1.1.4.4 No Results
I have been trying to do this with T-SQL Case statements but can't quite get it right.
The conditions are: PASS plus anything is a PASS, FAIL plus No Results is a FAIL, and NULL is No Results.
Try using a case statement to convert to ordered results and group on that, finally, you'll need to convert back to the nice, human-readable answer:
with cte1 as (
SELECT HOST,
[statNum] = case
when Status like 'PASS' then 2
when Status like 'FAIL' then 1
else 0
end
FROM table
)
SELECT HOST, case max(statNum) when 2 then 'PASS' when 1 then 'FAIL' else 'No Results' end
FROM cte1
GROUP BY HOST
NOTE: I used a CTE statement to hopefully make things a little clearer, but everything could be done in a single SELECT, like so:
SELECT HOST,
[Status] = case max(case when Status like 'PASS' then 2 when Status like 'FAIL' then 1 else 0 end)
when 2 then 'PASS'
when 1 then 'FAIL'
else 'No Results'
end
FROM table
GROUP BY HOST
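Here is a small runnable sketch of the ranking idea, using the sample rows from the question. It is an illustration under assumptions: Python's sqlite3 stands in for SQL Server, so the T-SQL `[Status] =` alias form becomes `AS status`, and the table name scans and the function status_per_host are invented for the example.

```python
import sqlite3

def status_per_host(rows):
    """Collapse multiple status rows per host into one, PASS > FAIL > none."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scans (host TEXT, status TEXT)")
    con.executemany("INSERT INTO scans VALUES (?, ?)", rows)
    query = """
        SELECT host,
               CASE MAX(CASE status WHEN 'PASS' THEN 2
                                    WHEN 'FAIL' THEN 1
                                    ELSE 0 END)      -- NULL falls through to 0
                   WHEN 2 THEN 'PASS'
                   WHEN 1 THEN 'FAIL'
                   ELSE 'No Results'
               END AS status
        FROM scans
        GROUP BY host
    """
    result = dict(con.execute(query))
    con.close()
    return result

rows = [("1.1.1.1", "PASS"), ("1.1.1.1", "FAIL"), ("1.2.2.2", "FAIL"),
        ("1.2.3.3", "PASS"), ("1.4.2.1", "FAIL"), ("1.4.2.1", "FAIL"),
        ("1.1.4.4", None)]
statuses = status_per_host(rows)
for host, status in sorted(statuses.items()):
    print(host, status)
```

Mapping each status to a number first means the precedence is stated explicitly rather than depending on how the strings happen to sort.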
You can use Max(Status) with Group by Host to get one value per host. This works because 'PASS' sorts after 'FAIL', and Max() ignores NULLs:
Select host, coalesce(Max(status),'No results') status
From Table1
Group by host
Order by host
Fiddle Demo Results:
| HOST | STATUS |
|---------|------------|
| 1.1.1.1 | PASS |
| 1.1.4.4 | No results |
| 1.2.2.2 | FAIL |
| 1.2.3.3 | PASS |
| 1.4.2.1 | FAIL |
By default SQL Server is case insensitive. If case sensitivity is a concern for your server, then use the lower() function as below:
Select host, coalesce(Max(Lower(status)),'No results') status
From Table1
Group by host
Order by host
Fiddle demo
WITH CTE (HOST, STATUSVALUE)
AS (
SELECT HOST,
CASE STATUS WHEN 'PASS' THEN 1 WHEN 'FAIL' THEN 0 END AS StatusValue -- a NULL status stays NULL
FROM Data
)
SELECT DISTINCT D.HOST,
CASE ISNULL(GOOD.STATUSVALUE,-1) WHEN 1 THEN 'Pass'
ELSE CASE ISNULL(BAD.STATUSVALUE,-1) WHEN 0 THEN 'Fail' ELSE 'No Results' END
END AS Results
FROM Data AS D
LEFT JOIN CTE AS GOOD
ON GOOD.HOST = D.HOST
AND GOOD.STATUSVALUE = 1
LEFT JOIN CTE AS BAD
ON BAD.HOST = D.HOST
AND BAD.STATUSVALUE = 0
I really wanted to come up with the solution by myself for this one, but this is turning out to be slightly more challenging than I thought it would be.
The table I am trying to retrieve information from looks something like the one below, in simplified form.
Table: CarFeatures
+---+---+---+---+-----+
|Car|Nav|Bth|Eco|Radio|
+---+---+---+---+-----+
|a |y |n |n |y |
+---+---+---+---+-----+
|b |n |y |n |n |
+---+---+---+---+-----+
|c |n |n |y |n |
+---+---+---+---+-----+
|d |n |y |y |n |
+---+---+---+---+-----+
|e |y |n |n |n |
+---+---+---+---+-----+
On the SSRS report, I need to display all the cars that have all the features selected in the given parameters. The query will receive parameters from the report like: Nav-yes/no, Bth-yes/no, Eco-yes/no, Radio-yes/no.
For instance, if the parameter input were 'Yes' for navigation and 'No' for others, the result table should be like;
+---+----------+
|Car|Features |
+---+----------+
|a |Nav, Radio|
+---+----------+
|e |Nav |
+---+----------+
I thought this would be simple, but as I try to get the query done, this is kind of driving me crazy. Below is what I thought initially will get me what I need, but didn't.
select Car,
case when @nav = 'y' then 'Nav ' else '' end +
case when @bth = 'y' then 'Bth ' else '' end +
case when @eco = 'y' then 'Eco ' else '' end +
case when @radio = 'y' then 'Radio ' else '' end As Features
from CarFeatures
where (nav = @nav -- here I don't want the row to be picked if the input is 'n'
or bth = @bth
or eco = @eco
or radio = @radio)
Basically the logic should be something like: if a row has 'yes' for every parameter that is 'yes', list all the features that are 'yes' for that row, even if the parameters are 'no' for those other features.
Also, I am not considering filtering on the report. I want this to be in the stored proc itself.
I would certainly like to avoid multiple IFs, considering I have 4 parameters and writing out every combination of 4 in IF statements is probably not a good idea.
Thanks.
Your schema is awkward and denormalised, you should have 3 tables,
Car
Feature
CarFeature
The CarFeature table should consist of two columns, CarId and FeatureId. Then you could do something like,
SELECT DISTINCT
cr.CarId
FROM
CarFeature cr
WHERE
cr.FeatureId IN SelectedFeatures;
Rant
{
Not only would it be easy to add features without changing the schema,
offer better performance because of support of set based operations
covered by good indices, overall use less storage because you no
longer need to store the No values, you would comply with some well
thought out and established patterns backed by 40+ years of
development effort and clarification.
}
If, for whatever reason, you cannot change the data or schema, you could UNPIVOT the columns like this, Fiddle Here
SELECT
p.Car,
p.Feature
FROM
(
SELECT
Car,
Nav,
Bth,
Eco,
Radio
FROM
CarFeatures) cf
UNPIVOT (Value For Feature In (Nav, Bth, Eco, Radio)) p
WHERE
p.Value='Y';
Or, you could do it old style like this Fiddle Here,
SELECT
Car,
'Nav' Feature
FROM
CarFeatures
WHERE
Nav = 'Y'
UNION ALL
SELECT
Car,
'Bth' Feature
FROM
CarFeatures
WHERE
Bth = 'Y'
UNION ALL
SELECT
Car,
'Eco' Feature
FROM
CarFeatures
WHERE
Eco = 'Y'
UNION ALL
SELECT
Car,
'Radio' Feature
FROM
CarFeatures
WHERE
Radio = 'Y'
to essentially normalise the data in a subquery. Both queries give results like this,
CAR FEATURE
A Nav
A Radio
B Bth
B Radio
C Eco
D Bth
D Eco
E Nav
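To get from that one-row-per-feature shape back to the single "Features" column the question asks for, the unpivoted rows can be re-aggregated per car. A minimal sketch, with assumptions stated up front: Python's sqlite3 stands in for SQL Server, group_concat is SQLite's string aggregate (not a T-SQL function), the data is the table from the question, and features_per_car is a name invented here.

```python
import sqlite3

CARS = [("a", "y", "n", "n", "y"), ("b", "n", "y", "n", "n"),
        ("c", "n", "n", "y", "n"), ("d", "n", "y", "y", "n"),
        ("e", "y", "n", "n", "n")]

def features_per_car(cars):
    """Unpivot with UNION ALL, then roll the features up into one string per car."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE CarFeatures "
                "(Car TEXT, Nav TEXT, Bth TEXT, Eco TEXT, Radio TEXT)")
    con.executemany("INSERT INTO CarFeatures VALUES (?, ?, ?, ?, ?)", cars)
    query = """
        SELECT Car, group_concat(Feature, ', ')
        FROM (
            SELECT Car, 'Nav' AS Feature FROM CarFeatures WHERE Nav = 'y'
            UNION ALL
            SELECT Car, 'Bth' FROM CarFeatures WHERE Bth = 'y'
            UNION ALL
            SELECT Car, 'Eco' FROM CarFeatures WHERE Eco = 'y'
            UNION ALL
            SELECT Car, 'Radio' FROM CarFeatures WHERE Radio = 'y'
        ) AS unpivoted
        GROUP BY Car
    """
    result = dict(con.execute(query))
    con.close()
    return result

features = features_per_car(CARS)
for car, feature_list in sorted(features.items()):
    print(car, feature_list)
```

The order of features inside each string is not guaranteed by SQLite, so treat the list as a set if you compare results.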
Try this; I believe it will serve your purpose.
SELECT Car,
tblPivot.Property AS Features,
tblPivot.Value
INTO #tmpFeature
FROM
(SELECT CONVERT(sql_variant,Car) AS Car,CONVERT(sql_variant,NAV) AS NAV, CONVERT(sql_variant,BTH) AS BTH, CONVERT(sql_variant,ECO) AS ECO,
CONVERT(sql_variant,Radio) AS Radio FROM CarFeatures) CarFeatures
UNPIVOT (Value For Property In (NAV,BTH, ECO, Radio)) as tblPivot
Where tblPivot.Value='y'
SELECT
Car,
STUFF((
SELECT ', ' + Features
FROM #tmpFeature
WHERE (Car = Results.Car)
FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
,1,2,'') AS Features
FROM #tmpFeature Results
GROUP BY Car
Try this; it works.
declare @table table (Car char(1),Nav char(1),Bth char(1),Eco char(1),Radio char(1))
insert into @table
select 'a', 'y' , 'n' , 'n', 'y'
union all
select 'b', 'n' , 'y' , 'n', 'n'
union all
select 'c', 'n' , 'n' , 'y', 'n'
union all
select 'd', 'n' , 'y' , 'y', 'n'
union all
select 'e', 'y' , 'n' , 'n', 'n'
select * from @table
-- Build the list, then trim the trailing comma (LEN ignores trailing spaces)
select a.car,
Features = left((case when a.nav = 'y' then 'Nav, ' else '' end) +
(case when a.bth = 'y' then 'Bth, ' else '' end)+
(case when a.Eco = 'y' then 'Eco, ' else '' end)+
(case when a.Radio = 'y' then 'Radio,' else '' end),
(len(((case when a.nav = 'y' then 'Nav, ' else '' end) +
(case when a.bth = 'y' then 'Bth, ' else '' end)+
(case when a.Eco = 'y' then 'Eco, ' else '' end)+
(case when a.Radio = 'y' then 'Radio,' else '' end)))-1))
from @table a
--Aha! I kind of figured it out myself (very happy) :). Since the value for the columns can only be 'y' or 'n', to ignore a parameter whose value is no,
-- I will just have it look for a value that will never be there.
--If anyone has a better way of doing it, or of enhancing what I have (preferred), that would be appreciated.
--Thanks to everyone who replied. Since this is part of an already existing table and also a piece of a big stored proc, I was reluctant to go with the previous answers to the question.
--variable declaring and assignments here
select Car,
case when @nav = 'y' then 'Nav ' else '' end +
case when @bth = 'y' then 'Bth ' else '' end +
case when @eco = 'y' then 'Eco ' else '' end +
case when @radio = 'y' then 'Radio ' else '' end As Features
from CarFeatures
where (nav = case when @nav = 'y' then 'Y' else 'B' end
OR bth = case when @bth = 'y' then 'Y' else 'B' end
OR eco = case when @eco = 'y' then 'Y' else 'B' end
OR radio = case when @radio = 'y' then 'Y' else 'B' end
)
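For comparison, here is a sketch of another way to express "ignore a parameter when it is 'n'" without a sentinel value. It rests on assumptions: it reads the requirement as "the car must have every requested feature" (so the conditions are joined with AND, unlike the OR above), it builds the Features text from the row's own columns rather than from the parameters, and it uses Python's sqlite3 as a stand-in for SQL Server, so string concatenation is `||` and the trailing separator is removed with SQLite's two-argument rtrim. The function name cars_matching is invented for the example.

```python
import sqlite3

CARS = [("a", "y", "n", "n", "y"), ("b", "n", "y", "n", "n"),
        ("c", "n", "n", "y", "n"), ("d", "n", "y", "y", "n"),
        ("e", "y", "n", "n", "n")]

def cars_matching(cars, nav, bth, eco, radio):
    """Return (car, features) for cars having every feature requested with 'y'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE CarFeatures "
                "(Car TEXT, Nav TEXT, Bth TEXT, Eco TEXT, Radio TEXT)")
    con.executemany("INSERT INTO CarFeatures VALUES (?, ?, ?, ?, ?)", cars)
    query = """
        SELECT Car,
               rtrim(
                   CASE WHEN Nav   = 'y' THEN 'Nav, '   ELSE '' END ||
                   CASE WHEN Bth   = 'y' THEN 'Bth, '   ELSE '' END ||
                   CASE WHEN Eco   = 'y' THEN 'Eco, '   ELSE '' END ||
                   CASE WHEN Radio = 'y' THEN 'Radio, ' ELSE '' END,
                   ', ') AS Features          -- strip the trailing ', '
        FROM CarFeatures
        WHERE (:nav   <> 'y' OR Nav   = 'y')  -- a 'n' parameter filters nothing
          AND (:bth   <> 'y' OR Bth   = 'y')
          AND (:eco   <> 'y' OR Eco   = 'y')
          AND (:radio <> 'y' OR Radio = 'y')
        ORDER BY Car
    """
    params = {"nav": nav, "bth": bth, "eco": eco, "radio": radio}
    rows = con.execute(query, params).fetchall()
    con.close()
    return rows

nav_only = cars_matching(CARS, nav="y", bth="n", eco="n", radio="n")
bth_and_eco = cars_matching(CARS, nav="n", bth="y", eco="y", radio="n")
print(nav_only)
print(bth_and_eco)
```

With every parameter set to 'n' this version returns all cars; add an extra condition if that case should return nothing.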
Given the table like
| userid | active | anonymous |
| 1 | t | f |
| 2 | f | f |
| 3 | f | t |
I need to get:
number of users
number of users with 'active' = true
number of users with 'active' = false
number of users with 'anonymous' = true
number of users with 'anonymous' = false
with single query.
As for now, I only came out with the solution using union:
SELECT count(*) FROM mytable
UNION ALL
SELECT count(*) FROM mytable where active
UNION ALL
SELECT count(*) FROM mytable where anonymous
So I can take the first number and find the non-active and non-anonymous users by simple subtraction.
Is there any way to get rid of union and calculate number of records matching these simple conditions with some magic and efficient query in PostgreSQL 9?
You can use an aggregate function with a CASE to get the result in separate columns:
select
count(*) TotalUsers,
sum(case when active = 't' then 1 else 0 end) TotalActiveTrue,
sum(case when active = 'f' then 1 else 0 end) TotalActiveFalse,
sum(case when anonymous = 't' then 1 else 0 end) TotalAnonTrue,
sum(case when anonymous = 'f' then 1 else 0 end) TotalAnonFalse
from mytable;
See SQL Fiddle with Demo
Assuming your columns are boolean NOT NULL, this should be a bit faster:
SELECT total_ct
,active_ct
,(total_ct - active_ct) AS not_active_ct
,anon_ct
,(total_ct - anon_ct) AS not_anon_ct
FROM (
SELECT count(*) AS total_ct
,count(active OR NULL) AS active_ct
,count(anonymous OR NULL) AS anon_ct
FROM tbl
) sub;
Find a detailed explanation for the techniques used in this closely related answer:
Compute percents from SUM() in the same SELECT sql query
Indexes are hardly going to be of any use, since the whole table has to be read anyway. A covering index might be of help if your rows are bigger than in the example. Depends on the specifics of your actual table.
-> SQLfiddle comparing to @bluefeet's version with CASE statements for each value.
SQL server folks are not used to the proper boolean type of Postgres and tend to go the long way round.
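The count(expr OR NULL) trick can be tried locally. The sketch below is an illustration only: Python's sqlite3 stands in for PostgreSQL, and since SQLite has no boolean type the flags are stored as 1/0 (an assumption of this example; the three-valued logic that makes the trick work is the same). The function name user_counts is invented here.

```python
import sqlite3

def user_counts(rows):
    """Return (total, active, not_active, anon, not_anon) from a single scan."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (userid INTEGER, "
                "active INTEGER NOT NULL, anonymous INTEGER NOT NULL)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
        SELECT total_ct,
               active_ct,
               total_ct - active_ct AS not_active_ct,
               anon_ct,
               total_ct - anon_ct   AS not_anon_ct
        FROM (
            SELECT count(*)                  AS total_ct,
                   -- true OR NULL is true, false OR NULL is NULL,
                   -- and count() skips NULLs, so only true rows are counted
                   count(active OR NULL)     AS active_ct,
                   count(anonymous OR NULL)  AS anon_ct
            FROM mytable
        ) AS sub
    """
    result = con.execute(query).fetchone()
    con.close()
    return result

# The three users from the question: (userid, active, anonymous)
counts = user_counts([(1, 1, 0), (2, 0, 0), (3, 0, 1)])
print(counts)
```

The "not" counts are derived by subtraction, which is only valid because the columns are declared NOT NULL.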
I will be using my output to place into an Excel pivot table. The data is dealing with credit accounts that have either charged off or not.
EDIT: If chargeoffs is checked in the pivot table I want the totalaccounts column to be a count of total accounts regardless of the chargeoffdate value. If chargeoffs is left unchecked I want totalaccounts to be a count of all accounts when chargeoffdate is NULL.
Here is my SQL syntax so far:
SELECT
c.brand,
CASE WHEN a.chargeoffdate IS NULL THEN 'No Chargeoffs'
-- Below here should not be only chargeoffs, it should be chargeoffs + the column above ^^^
WHEN a.chargeoffdate IS NOT NULL THEN 'Chargeoffs'
ELSE 'Unknown' END AS chargeoffs,
COUNT(*) AS totalaccounts
FROM accounts
GROUP BY brand, chargeoffs
You can see the comment in my SQL to understand what I am going for, but I can't figure out how to accomplish this.
I tried:
CASE WHEN a.chargeoffdate IS NULL THEN 'No Chargeoffs'
-- Below here should not be only chargeoffs, it should be chargeoffs + the column above ^^^
WHEN (a.chargeoffdate IS NOT NULL OR a.chargeoffdate IS NULL) THEN 'Chargeoffs Included'
ELSE 'Unknown' END AS chargeoffs
But got the same results as the top query for some reason. Thanks.
ANOTHER EDIT: OUTPUT DESIRED
BRAND 1 | WITH CHARGEOFFS | COUNT(TOTALACCOUNTS)
BRAND 1 | WITHOUT CHARGEOFFS | COUNT(TOTALACCOUNTS)
BRAND 2 | WITH CHARGEOFFS | COUNT(TOTALACCOUNTS)
BRAND 2 | WITHOUT CHARGEOFFS | COUNT(TOTALACCOUNTS)
Updated:
Chargeoffs = Count of all accounts whether chargeoffdate is null or not
No Chargeoffs = Count of all accounts where chargeoffdate is null (they haven't charged off)
SELECT
brand,
count(*) as "Chargeoffs",
sum(CASE WHEN a.chargeoffdate IS NULL THEN 1 ELSE 0 END) as 'No Chargeoffs'
FROM accounts a
GROUP BY brand
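A quick way to sanity-check those two counts is to run the same conditional aggregation on a few rows. This is a sketch under assumptions: Python's sqlite3 stands in for whatever database is actually in use, the sample accounts are invented, and the column aliases are renamed to plain identifiers to sidestep quoting differences between engines.

```python
import sqlite3

def chargeoff_counts(rows):
    """Per brand: all accounts, and accounts that have not charged off."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (brand TEXT, chargeoffdate TEXT)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    query = """
        SELECT brand,
               count(*) AS with_chargeoffs,     -- every account, charged off or not
               sum(CASE WHEN chargeoffdate IS NULL THEN 1 ELSE 0 END)
                   AS without_chargeoffs        -- only accounts never charged off
        FROM accounts
        GROUP BY brand
        ORDER BY brand
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Invented sample data: Brand 1 has one charged-off account, Brand 2 has none.
sample = [("Brand 1", None), ("Brand 1", "2014-03-01"), ("Brand 1", None),
          ("Brand 2", None), ("Brand 2", None)]
per_brand = chargeoff_counts(sample)
print(per_brand)
```

This gives one row per brand with both counts as columns; the two-rows-per-brand layout from the question would need the counts stacked instead.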
UPDATE: I'm tired, but I came up with this long SQL, which is near what you want:
SELECT brand,
tp,
CASE WHEN tp = 1 then sum(cnt) END as 'No Chargeoffs',
sum(cnt) as "Chargeoffs"
FROM (
SELECT
brand,
CASE WHEN a.chargeoffdate IS NULL THEN 1 ELSE 0 END as tp,
count(*) as cnt
FROM accounts a
GROUP BY brand, CASE WHEN a.chargeoffdate IS NULL THEN 1 ELSE 0 END
) t
GROUP BY brand, tp
That was kind of stupid and my solution was easy. I just left it how I had it and the pivot table added them together for me.
I found this out after I had created 2 separate queries and did some data manipulation with SAS to get what I wanted. Ouch.