I have a database table that records transactions in a shop. The main data recorded is the cost of each item, for example:
Item || Cost
TV || 80.00
XboxGame || 55.00
Monitor || 45.00
Controller || 15.00
I want to find out how many of the items purchased cost less than $25, how many cost between $25 and $49.99, and how many cost $50 and above. This is the result I want the SQL query to produce:
Table 1 (target):
Cost || Number of Items
==================================
Less than $25 || 1
$25 - $49.99 || 1
$50 and above || 2
I have tried the following:
SELECT (SELECT COUNT(*) AS Expr1
FROM tblTransactions
WHERE (Cost < 25)) AS [Less than $25],
(SELECT COUNT(*) AS Expr1
FROM tblTransactions AS tblTransactions_1
WHERE (Cost >= 25) AND (Cost < 50)) AS [$25 - $49.99],
(SELECT COUNT(*) AS Expr1
FROM tblTransactions AS tblTransactions_2
WHERE (Cost >= 50)) AS [$50 and above];
However, this produces:
Table 2 (actual result):
Less than $25 || $25 - $49.99 || $50 and above
========================================
1 || 1 || 2
My main question is: how do I change my SQL so that it produces Table 1, instead of Table 2?
Use GROUP BY:
Select
[Cost] = CASE WHEN cost < 25 then 'Less than $25'
WHEN cost >= 25 and cost < 50 then '$25 - $49.99'
WHEN cost >= 50 then '$50 and above' END
,COUNT(*) As [Number of Items]
FROM tblTransactions
GROUP BY (CASE WHEN cost < 25 then 'Less than $25'
WHEN cost >= 25 and cost < 50 then '$25 - $49.99'
WHEN cost >= 50 then '$50 and above' END)
-- cost itself is not in the GROUP BY, so sort on an aggregate of it
ORDER BY MIN(cost)
Edited to include sorting.
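If you want to check the GROUP BY version without a SQL Server instance, the sketch below runs the same idea through Python's built-in sqlite3 module against an in-memory copy of the sample data. It assumes a tblTransactions table with Item and Cost columns; the helper name bucket_counts is hypothetical, and the [Cost] = ... alias syntax is replaced by AS because SQLite does not support it.

```python
import sqlite3

# Sample rows from the question: (Item, Cost).
SAMPLE = [("TV", 80.00), ("XboxGame", 55.00), ("Monitor", 45.00), ("Controller", 15.00)]

# The bucketing expression, written once and reused in SELECT and GROUP BY.
BUCKET = """CASE WHEN Cost < 25 THEN 'Less than $25'
                 WHEN Cost < 50 THEN '$25 - $49.99'
                 ELSE '$50 and above' END"""


def bucket_counts(rows):
    """Return (bucket label, item count) pairs, cheapest bucket first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTransactions (Item TEXT, Cost REAL)")
    conn.executemany("INSERT INTO tblTransactions VALUES (?, ?)", rows)
    query = f"""
        SELECT {BUCKET} AS Bucket, COUNT(*) AS NumberOfItems
        FROM tblTransactions
        GROUP BY {BUCKET}
        ORDER BY MIN(Cost)  -- Cost is not grouped, so sort on an aggregate of it
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(bucket_counts(SAMPLE))
# [('Less than $25', 1), ('$25 - $49.99', 1), ('$50 and above', 2)]
```

Note that with GROUP BY, a range with no matching rows does not appear in the result at all; it is not reported as 0.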
You can run a separate query per range and combine the results with UNION ALL:
SELECT 'Less than $25' AS [Cost], COUNT(*) AS [Number of Items]
FROM tblTransactions
WHERE (Cost < 25)
UNION ALL
SELECT '$25 - $49.99', COUNT(*)
FROM tblTransactions
WHERE (Cost >= 25) AND (Cost < 50)
UNION ALL
SELECT '$50 and above', COUNT(*)
FROM tblTransactions
WHERE (Cost >= 50)
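A minimal sqlite3 sketch of the UNION ALL approach, using a hypothetical in-memory tblTransactions table filled with the question's sample rows. The column alias CostRange is an assumption, chosen so it does not shadow the Cost column.

```python
import sqlite3

SAMPLE = [("TV", 80.00), ("XboxGame", 55.00), ("Monitor", 45.00), ("Controller", 15.00)]

# One aggregate SELECT per range, stacked with UNION ALL.
QUERY = """
    SELECT 'Less than $25' AS CostRange, COUNT(*) AS NumberOfItems
    FROM tblTransactions WHERE Cost < 25
    UNION ALL
    SELECT '$25 - $49.99', COUNT(*)
    FROM tblTransactions WHERE Cost >= 25 AND Cost < 50
    UNION ALL
    SELECT '$50 and above', COUNT(*)
    FROM tblTransactions WHERE Cost >= 50
"""


def union_bucket_counts(rows):
    """Return {range label: count}; every range is present, even when its count is 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTransactions (Item TEXT, Cost REAL)")
    conn.executemany("INSERT INTO tblTransactions VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


print(union_bucket_counts(SAMPLE))
```

Each branch is an aggregate with no GROUP BY, so it always returns exactly one row; that is why empty ranges come back as 0 here. Without an ORDER BY the row order of a UNION ALL is not guaranteed, which is why the sketch collects the rows into a dict.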
Use UNION ALL, with one SELECT per label (the table and column names below are placeholders):
SELECT 'Some text' AS [Label],
COUNT(*) AS [Count]
FROM SomeTable
WHERE SomeCol = 'some val'
UNION ALL
SELECT 'Other text',
COUNT(*)
FROM SomeTable
WHERE SomeCol = 'other val'
You could do that with UNION ALL:
(SELECT COUNT(*) AS Count
, 'Less than $25' AS Label
FROM tblTransactions
WHERE Cost < 25)
UNION ALL
(SELECT COUNT(*) AS Count
, '$25 - $49.99' AS Label
FROM tblTransactions
WHERE (Cost >= 25) AND (Cost < 50))
UNION ALL
(SELECT COUNT(*) AS Count
, '$50 and above' AS Label
FROM tblTransactions
WHERE Cost >= 50)
If I were writing this, I would start with a subquery (or view) to first group the values into buckets.
SELECT
(CASE WHEN (Cost >= 50) THEN 50
WHEN (Cost >= 25) THEN 25
ELSE 0 END) as costLevel
FROM transactions t
Then it's trivial to group.
SELECT
(CASE costLevel WHEN 50 THEN '$50 and above'
WHEN 25 THEN '$25 - $49.99'
ELSE 'Less than $25' END) as [Cost]
, COUNT(*) AS [Number of Items]
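FROM (SELECT (CASE WHEN (Cost >= 50) THEN 50
WHEN (Cost >= 25) THEN 25
ELSE 0 END) as costLevel
FROM transactions t) AS cl -- the previous query, inlined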
GROUP BY costLevel
-- optional:
-- ORDER BY costLevel
The above syntax is for SQL Server, so YMMV elsewhere. (I've also treated tblTransactions and Transactions as a typo.)
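The two steps can be tried end to end with Python's sqlite3 module. This is a sketch: the transactions table, its columns and the sample rows are stand-ins for the question's data, and the first query is inlined as the subquery.

```python
import sqlite3

SAMPLE = [("TV", 80.00), ("XboxGame", 55.00), ("Monitor", 45.00), ("Controller", 15.00)]

QUERY = """
    SELECT CASE costLevel WHEN 50 THEN '$50 and above'
                          WHEN 25 THEN '$25 - $49.99'
                          ELSE 'Less than $25' END AS CostRange,
           COUNT(*) AS NumberOfItems
    FROM (
        -- Step 1: map each cost to the lower bound of its bucket.
        SELECT CASE WHEN Cost >= 50 THEN 50
                    WHEN Cost >= 25 THEN 25
                    ELSE 0 END AS costLevel
        FROM transactions
    ) AS cl
    -- Step 2: group on the numeric level, which also gives a natural sort order.
    GROUP BY costLevel
    ORDER BY costLevel
"""


def level_bucket_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (Item TEXT, Cost REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(level_bucket_counts(SAMPLE))
# [('Less than $25', 1), ('$25 - $49.99', 1), ('$50 and above', 2)]
```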
Personally, I tend to prefer range tables for this:
SELECT name, COUNT(cost) -- count a Transactions column so that empty ranges give 0
FROM (VALUES ('Less than $25', CAST(0 as DECIMAL(5, 2)), CAST(25 as DECIMAL(5, 2))),
('$25 - $49.99', 25, 50),
('$50 and above', 50, null)) Range(name, lower, upper)
LEFT JOIN Transactions
ON cost >= lower
AND (upper IS NULL OR cost < upper)
GROUP BY name
(Should work on pretty much any RDBMS that accepts a VALUES list as a derived table.)
This has the added benefit that the range table can be persisted as a real table, so end users can maintain it. Also, the optimizer can use an index on cost, which can make the query perform better. Because of the LEFT JOIN, and as long as you count a Transactions column such as cost rather than *, you'll also get a 0 count for ranges that have no transactions.
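Here is a sketch of the range-table idea in SQLite via Python's sqlite3. SQLite does not accept the Range(name, lower, upper) column-alias form on a VALUES list, so the ranges go into a CTE instead; the column names lo, hi and sort_key are assumptions. It counts t.cost rather than *, so a range with no matching transactions reports 0 instead of 1.

```python
import sqlite3

SAMPLE = [("TV", 80.00), ("XboxGame", 55.00), ("Monitor", 45.00), ("Controller", 15.00)]

QUERY = """
    WITH ranges(name, lo, hi, sort_key) AS (
        VALUES ('Less than $25', 0, 25, 1),
               ('$25 - $49.99', 25, 50, 2),
               ('$50 and above', 50, NULL, 3)   -- NULL upper bound = open-ended
    )
    SELECT r.name, COUNT(t.cost) AS NumberOfItems  -- COUNT(*) would give 1 for empty ranges
    FROM ranges AS r
    LEFT JOIN Transactions AS t
      ON t.cost >= r.lo AND (r.hi IS NULL OR t.cost < r.hi)
    GROUP BY r.name, r.sort_key
    ORDER BY r.sort_key
"""


def range_counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Transactions (item TEXT, cost REAL)")
    conn.executemany("INSERT INTO Transactions VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(range_counts(SAMPLE))
print(range_counts([("TV", 80.00)]))  # the two cheaper ranges show 0
```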
Related
I have a table in GBQ in the following format:
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders. I did it as follows:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders by month, so that each month adds up to 100%.
Currently, to do this I run the query once per month, which is not practical:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way to execute a single query and get this result?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...
SELECT
SIGN(Orders) AS HasOrders,
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
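To see the per-month partitioning without BigQuery, the sketch below uses Python's sqlite3 with a few made-up rows (the table name orders_by_month and the sample users are hypothetical). It counts each (Month, HasOrders) group in a subquery and then divides by SUM(...) OVER (PARTITION BY Month), which is the same calculation as SUM(COUNT(*)) OVER (PARTITION BY Month) in the answer. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample: (UserId, Orders, Month).
ROWS = [
    ("XDT", 23, 1), ("FKR", 3, 1), ("GHR", 7, 1), ("ABC", 0, 1),
    ("XDT", 0, 2), ("GHR", 23, 2),
]

QUERY = """
    SELECT Month, HasOrders,
           ROUND(n * 100.0 / SUM(n) OVER (PARTITION BY Month), 2) AS Parts
    FROM (
        -- Count users per month and per has/has-not-ordered flag.
        SELECT Month, CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders, COUNT(*) AS n
        FROM orders_by_month
        GROUP BY Month, CASE WHEN Orders = 0 THEN 0 ELSE 1 END
    )
    ORDER BY Month, HasOrders
"""


def monthly_parts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_by_month (UserId TEXT, Orders INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO orders_by_month VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(monthly_parts(ROWS))
# [(1, 0, 25.0), (1, 1, 75.0), (2, 0, 50.0), (2, 1, 50.0)]
```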
You've stated that it's important for the total to be 100%, so you might consider rounding down the no-orders share and rounding up the has-orders share, for those cases where the percentage falls exactly on an odd multiple of 0.5%. Alternatively, rounding half toward even, or rounding the smaller share down, might be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNC(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5
AND MOD(TRUNC(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNC(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
ORDER BY Month, HasOrders
It's usually not advisable to test floating-point values for equality, and I don't know how BigQuery's FLOAT64 will behave in the comparison against 0.5, although one half is exactly representable in binary. See how these behave in a case where the split is 101 vs. 99. I don't have immediate access to BigQuery, so be aware that Postgres's rounding behavior may differ from BigQuery's:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09
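The rounding rules are easier to compare outside SQL. The following Python sketch (no database involved; the function names are hypothetical) applies the logic of each CASE expression above to the 101 vs. 99 split, plus a naive round-half-up on both shares to show how the total can reach 101%.

```python
import math


def half_up(x: float) -> int:
    """Round halves upward (correct for the non-negative percentages used here)."""
    return math.floor(x + 0.5)


def smallest_toward_zero(p: float) -> int:
    """At an exact .5, round the smaller share (below 50%) down; otherwise round half up."""
    return math.floor(p) if p - math.trunc(p) == 0.5 and p < 50 else half_up(p)


def rounded_splits(no_orders: int, has_orders: int) -> dict:
    total = no_orders + has_orders
    pct0 = no_orders * 100 / total   # share with no orders
    pct1 = has_orders * 100 / total  # share with orders
    return {
        "naive_half_up": (half_up(pct0), half_up(pct1)),
        "round0_down": (math.floor(pct0), math.ceil(pct1)),
        "toward_even": (round(pct0), round(pct1)),  # Python's round() rounds half to even
        "smallest_toward_zero": (smallest_toward_zero(pct0), smallest_toward_zero(pct1)),
    }


print(rounded_splits(99, 101))
# {'naive_half_up': (50, 51), 'round0_down': (49, 51), 'toward_even': (50, 50), 'smallest_toward_zero': (49, 51)}
```

Only the naive rule overshoots (50 + 51 = 101); the other three keep the month at exactly 100.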
Consider the approach below:
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`,
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
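SQLite has neither COUNTIF nor UNPIVOT, so this Python/sqlite3 sketch of the same approach swaps in SUM(condition) for COUNTIF and a UNION ALL for the unpivot step. The table name orders_by_month and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample: (UserId, Orders, Month).
ROWS = [
    ("XDT", 23, 1), ("FKR", 3, 1), ("GHR", 7, 1), ("ABC", 0, 1),
    ("XDT", 0, 2), ("GHR", 23, 2),
]

QUERY = """
    WITH per_month AS (
        -- One row per month with both shares as columns (the COUNTIF step).
        SELECT Month,
               SUM(Orders = 0) * 1.0 / COUNT(*) AS p0,
               SUM(Orders > 0) * 1.0 / COUNT(*) AS p1
        FROM orders_by_month
        GROUP BY Month
    )
    -- Turn the two share columns into rows (the UNPIVOT step).
    SELECT '0' AS hasOrders, ROUND(100 * p0, 2) AS parts, Month AS month FROM per_month
    UNION ALL
    SELECT '1', ROUND(100 * p1, 2), Month FROM per_month
    ORDER BY month, hasOrders
"""


def unpivoted_parts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_by_month (UserId TEXT, Orders INTEGER, Month INTEGER)")
    conn.executemany("INSERT INTO orders_by_month VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(unpivoted_parts(ROWS))
```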
I have a table with the following columns:
account, validity_date, validity_month, amount.
For each row, I want to check whether the value in the "amount" field exists in any row of the next month. If yes, indicator = 1, else 0.
account validity_date validity_month amount **required_column**
------- ------------- --------------- ------- ----------------
123 15oct2019 201910 400 0
123 20oct2019 201910 500 1
123 15nov2019 201911 1000 0
123 20nov2019 201911 500 0
123 20nov2019 201911 2000 1
123 15dec2019 201912 400
123 15dec2019 201912 2000
Can anyone help?
Thanks
validity_month/100*12 + validity_month MOD 100 calculates a running month number (so months can be compared across years, e.g. January with the previous December), and the inner ROW_NUMBER reduces multiple rows with the same amount in the same month to a single row (a kind of DISTINCT):
SELECT dt.*
,CASE -- next row is from next month
WHEN Lead(nextMonth IGNORE NULLS)
Over (PARTITION BY account, amount
ORDER BY validity_date)
= (validity_month/100*12 + validity_month MOD 100) +1
THEN 1
ELSE 0
END
FROM
(
SELECT t.*
,CASE -- one row per account/month/amount
WHEN Row_Number()
Over (PARTITION BY account, amount, validity_month
ORDER BY validity_date ) = 1
THEN validity_month/100*12 + validity_month MOD 100
END AS nextMonth
FROM tab AS t
) AS dt
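The month-number trick is plain arithmetic. Here is a small Python check of it, assuming validity_month is an integer YYYYMM value and that / on integers truncates, as it does for the integer column in this answer (Python needs // for that).

```python
def month_number(validity_month: int) -> int:
    """Convert a YYYYMM integer into a running month count: year * 12 + month."""
    return validity_month // 100 * 12 + validity_month % 100


# Consecutive months differ by exactly 1, even across a year boundary,
# whereas the raw YYYYMM values jump from 201912 to 202001 (a difference of 89).
for ym in (201911, 201912, 202001):
    print(ym, month_number(ym))
# 201911 24239
# 201912 24240
# 202001 24241
```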
Edit:
The previous query is for exactly matching amounts. For a range match, the query is probably very hard to write with OLAP functions, but easy with a correlated subquery:
SELECT t.*
,CASE
WHEN
( -- check if there's a row in the next month matching the current amount +/- 10 percent
SELECT Count(*)
FROM tab AS t2
WHERE t2.account = t.account
AND (t2.validity_month/100*12 + t2.validity_month MOD 100)
= ( t.validity_month/100*12 + t.validity_month MOD 100) +1
AND t2.amount BETWEEN t.amount * 0.9 AND t.amount * 1.1
) > 0
THEN 1
ELSE 0
END
FROM tab AS t
But then performance might be really bad, because the subquery is conceptually evaluated once per row.
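A sqlite3 sketch of the correlated-subquery version on the question's rows (dates rewritten as ISO strings). The table name tab follows the answer; the tol parameter for the +/- 10 percent window is a hypothetical addition, and SQLite's % stands in for MOD.

```python
import sqlite3

# Rows from the question: (account, validity_date, validity_month, amount).
ROWS = [
    (123, "2019-10-15", 201910, 400),
    (123, "2019-10-20", 201910, 500),
    (123, "2019-11-15", 201911, 1000),
    (123, "2019-11-20", 201911, 500),
    (123, "2019-11-20", 201911, 2000),
    (123, "2019-12-15", 201912, 400),
    (123, "2019-12-15", 201912, 2000),
]

QUERY = """
    SELECT t.account, t.validity_date, t.validity_month, t.amount,
           CASE WHEN (
               -- is there a row in the next month within +/- tol of this amount?
               SELECT COUNT(*)
               FROM tab AS t2
               WHERE t2.account = t.account
                 AND (t2.validity_month / 100) * 12 + t2.validity_month % 100
                     = (t.validity_month / 100) * 12 + t.validity_month % 100 + 1
                 AND t2.amount BETWEEN t.amount * (1 - :tol) AND t.amount * (1 + :tol)
           ) > 0 THEN 1 ELSE 0 END AS flag
    FROM tab AS t
    ORDER BY t.validity_date, t.amount
"""


def next_month_flags(rows, tol=0.1):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (account INTEGER, validity_date TEXT, "
                 "validity_month INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, {"tol": tol}).fetchall()
    conn.close()
    return result


print([row[-1] for row in next_month_flags(ROWS)])  # [0, 1, 0, 0, 1, 0, 0]
```

Unlike the required column in the question, the last month gets 0 rather than NULL, because the subquery simply finds no rows.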
Assuming the values are unique within a month and you have a value for each month for each account, you can simplify this to:
select t.*,
(case when lead(seqnum) over (partition by account, amount order by validity_month) = seqnum + 1
then 1 else 0
end)
from (select t.*,
dense_rank() over (partition by account order by validity_month) as seqnum
from t
) t;
Note: This puts 0 for the last month rather than NULL, but that can easily be adjusted.
You can do this without the subquery by using month arithmetic. It is not clear what the data type of validity_month is. If I assume a number:
select t.*,
(case when lead(floor(validity_month / 100) * 12 + (validity_month mod 100)
) over (partition by account, amount order by validity_month) =
floor(validity_month / 100) * 12 + (validity_month mod 100) + 1
then 1 else 0
end)
from t;
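A sqlite3 sketch of this LEAD version on the question's rows. It compares the next row's month number with the current one plus 1, and uses SQLite's % for MOD; like the answer, it assumes each amount appears at most once per account and month. The table name t and the ISO date strings are assumptions.

```python
import sqlite3

# Rows from the question: (account, validity_date, validity_month, amount).
ROWS = [
    (123, "2019-10-15", 201910, 400),
    (123, "2019-10-20", 201910, 500),
    (123, "2019-11-15", 201911, 1000),
    (123, "2019-11-20", 201911, 500),
    (123, "2019-11-20", 201911, 2000),
    (123, "2019-12-15", 201912, 400),
    (123, "2019-12-15", 201912, 2000),
]

QUERY = """
    SELECT account, validity_date, validity_month, amount,
           CASE WHEN LEAD((validity_month / 100) * 12 + validity_month % 100)
                         OVER (PARTITION BY account, amount ORDER BY validity_month)
                     = (validity_month / 100) * 12 + validity_month % 100 + 1
                THEN 1 ELSE 0 END AS flag  -- no next row: LEAD is NULL, so ELSE 0
    FROM t
    ORDER BY validity_date, amount
"""


def lead_flags(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (account INTEGER, validity_date TEXT, "
                 "validity_month INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print([row[-1] for row in lead_flags(ROWS)])  # [0, 1, 0, 0, 1, 0, 0]
```

The 400 in October is correctly flagged 0: its next occurrence is in December, two month numbers later.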
Just to add another way to do this, using BigQuery Standard SQL. This query returns 1 when the condition is met, 0 when it is not, and NULL when there is no next month to evaluate (as implied by your required column).
It assumes that we're partitioning on the account field. It also includes a 10% range match on the amount field, based on a comment on the question. Note that if you have an id field, you should include it: if two rows have the same account, validity_date, validity_month and amount, there will only be one resulting row, due to DISTINCT.
Performance-wise, it should be similar to the answer from @dnoeth.
SELECT DISTINCT
t1.account,
t1.validity_date,
t1.validity_month,
t1.amount,
CASE
WHEN t2.amount IS NOT NULL THEN 1
WHEN MAX(t1.validity_month) OVER (PARTITION BY t1.account) > t1.validity_month THEN 0
ELSE NULL
END AS flag
FROM `project.dataset.table` t1
LEFT JOIN `project.dataset.table` t2
ON
t2.account = t1.account AND
DATE_DIFF(
PARSE_DATE("%Y%m", CAST(t2.validity_month AS STRING)),
PARSE_DATE("%Y%m", CAST(t1.validity_month AS STRING)),
MONTH
) = 1 AND
t2.amount BETWEEN t1.amount * 0.9 AND t1.amount * 1.1;
I need to create a table that determines the number of orders that fall into different order size ranges. However, I need to display a count of 0 for the order size range of 1,001 and Above and it isn't showing up when I run my query.
SELECT "Bucket", COUNT(*) AS "Order Count"
FROM
(SELECT CASE
WHEN O.QuantityShares <= 100 THEN '0-100'
WHEN O.QuantityShares <= 400 THEN '101-400'
WHEN O.QuantityShares <= 800 THEN '401-800'
WHEN O.QuantityShares <= 1000 THEN '801-1,000'
ELSE '1,001 and Above' END AS "Bucket"
FROM OrderTransactions O)
GROUP BY "Bucket"
ORDER BY "Bucket" ASC;
You can do this with a subquery to define the buckets and then a LEFT JOIN:
SELECT b.bucket, COUNT(ot.QuantityShares) AS "Order Count"
FROM (SELECT 0 as lower, 100 as upper, '0-100' as bucket FROM dual
UNION ALL SELECT 101, 400, '101-400' FROM dual
. . .
) b LEFT JOIN
OrderTransactions ot
ON ot.QuantityShares BETWEEN b.lower AND b.upper
GROUP BY b.bucket
ORDER BY MIN(b.lower) ASC;
The UNPIVOT clause might also make for an elegant solution.
You just need to create a table of your buckets, then LEFT JOIN your quantities to it, counting only the matched rows:
SELECT Buckets.Bucket AS "Bucket", COUNT(QTY.Bucket) AS "Order Count"
FROM
(SELECT '0-100' AS Bucket
UNION ALL
SELECT '101-400'
UNION ALL
SELECT '401-800'
UNION ALL
SELECT '801-1,000'
UNION ALL
SELECT '1,001 and Above') Buckets
LEFT JOIN
(SELECT CASE
WHEN O.QuantityShares <= 100 THEN '0-100'
WHEN O.QuantityShares <= 400 THEN '101-400'
WHEN O.QuantityShares <= 800 THEN '401-800'
WHEN O.QuantityShares <= 1000 THEN '801-1,000'
ELSE '1,001 and Above' END As Bucket
FROM OrderTransactions O) QTY
ON Buckets.Bucket = QTY.Bucket
GROUP BY Buckets.Bucket
ORDER BY Buckets.Bucket ASC;
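A sqlite3 sketch of the bucket-table LEFT JOIN, showing the empty "1,001 and Above" bucket coming back as 0. The OrderTransactions table is reduced to the single QuantityShares column, the quantities are made up, and the SortKey column is a hypothetical addition so the buckets sort numerically rather than alphabetically.

```python
import sqlite3

QUERY = """
    WITH Buckets(Bucket, SortKey) AS (
        VALUES ('0-100', 1), ('101-400', 2), ('401-800', 3),
               ('801-1,000', 4), ('1,001 and Above', 5)
    ),
    QTY AS (
        SELECT CASE
                   WHEN QuantityShares <= 100 THEN '0-100'
                   WHEN QuantityShares <= 400 THEN '101-400'
                   WHEN QuantityShares <= 800 THEN '401-800'
                   WHEN QuantityShares <= 1000 THEN '801-1,000'
                   ELSE '1,001 and Above'
               END AS Bucket
        FROM OrderTransactions
    )
    SELECT b.Bucket, COUNT(q.Bucket) AS OrderCount  -- NULLs from the LEFT JOIN are not counted
    FROM Buckets AS b
    LEFT JOIN QTY AS q ON q.Bucket = b.Bucket
    GROUP BY b.Bucket, b.SortKey
    ORDER BY b.SortKey
"""


def order_size_counts(quantities):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OrderTransactions (QuantityShares INTEGER)")
    conn.executemany("INSERT INTO OrderTransactions VALUES (?)", [(q,) for q in quantities])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(order_size_counts([50, 100, 250, 900]))
```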
Is there a safe way to avoid grouping by a field when using an aggregate in another field? Here is my example:
SELECT
C.CustomerName
,D.INDUSTRY_CODE
,CASE WHEN D.INDUSTRY_CODE IN ('003','004','005','006','007','008','009','010','017','029')
THEN 'PM'
WHEN UPPER(CustomerName) = 'ULINE INC'
THEN 'ULINE'
ELSE 'DR'
END AS BU
,ISNULL((SELECT SUM(GrossAmount)
where CONVERT(date,convert(char(8),InvoiceDateID )) between DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 1, 0) and DATEADD(year, -1, GETDATE())),0) [PREVIOUS YEAR GROSS]
FROM factMargins A
LEFT OUTER JOIN dimDate B ON A.InvoiceDateID = B.DateId
LEFT OUTER JOIN dimCustomer C ON A.CustomerID = C.CustomerId
LEFT OUTER JOIN CRCDATA.DBO.CU10 D ON D.CUST_NUMB = C.CustomerNumber
GROUP BY
C.CustomerName,D.INDUSTRY_CODE
,A.InvoiceDateID
order by CustomerName
Before grouping I was only getting 984 rows, but after grouping by the A.InvoiceDateID field I am getting over 11k rows. The rows blow up since there are multiple invoices per customer. MIN and MAX won't work, since then it will pull data incorrectly. Would it be best to let my application (Crystal) get rid of the extra lines? Usually I like my base data to be as close as possible to the report's layout.
Try moving the reference to InvoiceDateID inside an aggregate function (a conditional SUM), rather than in a selected subquery's WHERE clause.
In Oracle, here's an example:
with TheData as (
select 'A' customerID, 25 AMOUNT , trunc(sysdate) THEDATE from dual union
select 'B' customerID, 35 AMOUNT , trunc(sysdate-1) THEDATE from dual union
select 'A' customerID, 45 AMOUNT , trunc(sysdate-2) THEDATE from dual union
select 'A' customerID, 11000 AMOUNT , trunc(sysdate-3) THEDATE from dual union
select 'B' customerID, 12000 AMOUNT , trunc(sysdate-4) THEDATE from dual union
select 'A' customerID, 15000 AMOUNT , trunc(sysdate-5) THEDATE from dual)
select
CustomerID,
sum(amount) as "AllRevenue",
sum(case when thedate<sysdate-3 then amount else 0 end) as "OlderRevenue"
from thedata
group by customerID;
Output:
CustomerID | AllRevenue | OlderRevenue
A | 26070 | 26000
B | 12035 | 12000
This says:
For each customerID
I want the sum of all amounts
and I want the sum of amounts earlier than 3 days ago
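The same conditional aggregation can be tried in SQLite through Python's sqlite3. Since SQLite has no sysdate, the sketch assumes fixed ISO dates with "today" taken as 2024-01-10, and uses thedate <= cutoff because the Oracle version compares whole dates against sysdate-3, which still carries the current time of day. The cutoff parameter is a hypothetical addition.

```python
import sqlite3

# The answer's rows with trunc(sysdate - n) replaced by fixed dates ("today" = 2024-01-10).
ROWS = [
    ("A", 25, "2024-01-10"),
    ("B", 35, "2024-01-09"),
    ("A", 45, "2024-01-08"),
    ("A", 11000, "2024-01-07"),
    ("B", 12000, "2024-01-06"),
    ("A", 15000, "2024-01-05"),
]

QUERY = """
    SELECT customerID,
           SUM(amount) AS AllRevenue,
           -- the date filter lives inside the aggregate, so no GROUP BY on the date is needed
           SUM(CASE WHEN thedate <= :cutoff THEN amount ELSE 0 END) AS OlderRevenue
    FROM thedata
    GROUP BY customerID
    ORDER BY customerID
"""


def revenue_by_customer(rows, cutoff="2024-01-07"):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thedata (customerID TEXT, amount INTEGER, thedate TEXT)")
    conn.executemany("INSERT INTO thedata VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"cutoff": cutoff}).fetchall()
    conn.close()
    return result


print(revenue_by_customer(ROWS))  # [('A', 26070, 26000), ('B', 12035, 12000)]
```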
I would like to apply a total discount of $10.00 to each customer. The discount should be spread across multiple transactions until the whole $10.00 is used.
Example:
CustomerID Transaction Amount Discount TransactionID
1 $8.00 $8.00 1
1 $6.00 $2.00 2
1 $5.00 $0.00 3
1 $1.00 $0.00 4
2 $5.00 $5.00 5
2 $2.00 $2.00 6
2 $2.00 $2.00 7
3 $45.00 $10.00 8
3 $6.00 $0.00 9
The query below keeps track of the running sum and calculates the discount depending on whether the running sum is greater than or less than the discount amount.
select
customerid, transaction_amount, transactionid,
(case when 10 > (sum_amount - transaction_amount)
then (case when transaction_amount >= 10 - (sum_amount - transaction_amount)
then 10 - (sum_amount - transaction_amount)
else transaction_amount end)
else 0 end) discount
from (
select customerid, transaction_amount, transactionid,
sum(transaction_amount) over (partition by customerid order by transactionid) sum_amount
from Table1
) t1 order by customerid, transactionid
http://sqlfiddle.com/#!6/552c2/7
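A minimal sqlite3 sketch of the running-sum approach on the question's rows. The table name Table1 follows the answer; the budget parameter is a hypothetical addition, and the inner CASE is written with SQLite's two-argument MIN(), which picks the same value.

```python
import sqlite3

# Rows from the question: (customerid, transaction_amount, transactionid).
ROWS = [
    (1, 8.00, 1), (1, 6.00, 2), (1, 5.00, 3), (1, 1.00, 4),
    (2, 5.00, 5), (2, 2.00, 6), (2, 2.00, 7),
    (3, 45.00, 8), (3, 6.00, 9),
]

QUERY = """
    SELECT transactionid,
           -- sum_amount - transaction_amount = total spent before this transaction
           CASE WHEN :budget > sum_amount - transaction_amount
                THEN MIN(transaction_amount, :budget - (sum_amount - transaction_amount))
                ELSE 0 END AS discount
    FROM (
        SELECT customerid, transaction_amount, transactionid,
               SUM(transaction_amount) OVER (PARTITION BY customerid
                                             ORDER BY transactionid) AS sum_amount
        FROM Table1
    )
    ORDER BY customerid, transactionid
"""


def discounts(rows, budget=10):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (customerid INTEGER, transaction_amount REAL, "
                 "transactionid INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"budget": budget}).fetchall()
    conn.close()
    return result


print([d for _, d in discounts(ROWS)])
```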
The same query with a self join, which should work on most databases, including SQL Server 2008:
select
customerid, transaction_amount, transactionid,
(case when 10 > (sum_amount - transaction_amount)
then (case when transaction_amount >= 10 - (sum_amount - transaction_amount)
then 10 - (sum_amount - transaction_amount)
else transaction_amount end)
else 0 end) discount
from (
select t1.customerid, t1.transaction_amount, t1.transactionid,
sum(t2.transaction_amount) sum_amount
from Table1 t1
join Table1 t2 on t1.customerid = t2.customerid
and t1.transactionid >= t2.transactionid
group by t1.customerid, t1.transaction_amount, t1.transactionid
) t1 order by customerid, transactionid
http://sqlfiddle.com/#!3/552c2/2
You can do this with recursive common table expressions, although it isn't particularly pretty. SQL Server struggles to optimize these types of queries. See "Sum of minutes between multiple date ranges" for some discussion.
If you wanted to go further with this approach, you'd probably need to make a temporary table of x, so you can index it on (customerid, rn).
;with x as (
select
tx.*,
row_number() over (
partition by customerid
order by transaction_amount desc, transactionid
) rn
from
tx
), y as (
select
x.transactionid,
x.customerid,
x.transaction_amount,
case
when 10 >= x.transaction_amount then x.transaction_amount
else 10
end as discount,
case
when 10 >= x.transaction_amount then 10 - x.transaction_amount
else 0
end as remainder,
x.rn as rn
from
x
where
rn = 1
union all
select
x.transactionid,
x.customerid,
x.transaction_amount,
case
when y.remainder >= x.transaction_amount then x.transaction_amount
else y.remainder
end,
case
when y.remainder >= x.transaction_amount then y.remainder - x.transaction_amount
else 0
end,
x.rn
from
y
inner join
x
on y.rn = x.rn - 1 and y.customerid = x.customerid
where
y.remainder > 0
)
update
tx
set
discount = y.discount
from
tx
inner join
y
on tx.transactionid = y.transactionid;
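A sketch of the recursive CTE in SQLite via Python's sqlite3. Following the answer's own suggestion, x is materialized as a helper table indexed on (customerid, rn); SQLite's WITH RECURSIVE and two-argument MIN()/MAX() replace the CASE expressions, and rows the recursion never reaches get a 0 discount through a LEFT JOIN instead of an UPDATE. Like the answer, it hands the discount to the largest transaction first (ORDER BY transaction_amount DESC, transactionid); with the question's data this happens to match transaction order. The budget parameter and index name are hypothetical.

```python
import sqlite3

# Rows from the question: (transactionid, customerid, transaction_amount).
ROWS = [
    (1, 1, 8.00), (2, 1, 6.00), (3, 1, 5.00), (4, 1, 1.00),
    (5, 2, 5.00), (6, 2, 2.00), (7, 2, 2.00),
    (8, 3, 45.00), (9, 3, 6.00),
]


def recursive_discounts(rows, budget=10):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (transactionid INTEGER, customerid INTEGER, "
                 "transaction_amount REAL)")
    conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)
    # Materialize x so the recursive step can use an index on (customerid, rn).
    conn.execute("""
        CREATE TABLE x AS
        SELECT tx.*, ROW_NUMBER() OVER (PARTITION BY customerid
                                        ORDER BY transaction_amount DESC, transactionid) AS rn
        FROM tx
    """)
    conn.execute("CREATE INDEX x_customer_rn ON x (customerid, rn)")
    result = conn.execute("""
        WITH RECURSIVE y(transactionid, customerid, rn, discount, remainder) AS (
            -- anchor: first transaction per customer takes up to the full budget
            SELECT transactionid, customerid, rn,
                   MIN(transaction_amount, :budget), MAX(:budget - transaction_amount, 0)
            FROM x WHERE rn = 1
            UNION ALL
            -- step: the next transaction takes whatever budget is left
            SELECT x.transactionid, x.customerid, x.rn,
                   MIN(x.transaction_amount, y.remainder),
                   MAX(y.remainder - x.transaction_amount, 0)
            FROM y JOIN x ON x.customerid = y.customerid AND x.rn = y.rn + 1
            WHERE y.remainder > 0
        )
        SELECT tx.transactionid, COALESCE(y.discount, 0) AS discount
        FROM tx LEFT JOIN y ON y.transactionid = tx.transactionid
        ORDER BY tx.transactionid
    """, {"budget": budget}).fetchall()
    conn.close()
    return result


print([d for _, d in recursive_discounts(ROWS)])
```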
I usually like to set up a test environment for such questions. I will use a local temporary table. Please note that I inserted the data unordered, since order is not guaranteed in real life.
-- play table
if exists (select 1 from tempdb.sys.tables where name like '%transactions%')
drop table #transactions
go
-- play table
create table #transactions
(
trans_id int identity(1,1) primary key,
customer_id int,
trans_amt smallmoney
)
go
-- add data
insert into #transactions
values
(1,$8.00),
(2,$5.00),
(3,$45.00),
(1,$6.00),
(2,$2.00),
(1,$5.00),
(2,$2.00),
(1,$1.00),
(3,$6.00);
go
I am going to give you two answers.
First, SQL Server 2012 introduced window frames for aggregates (ROWS BETWEEN UNBOUNDED PRECEDING AND n PRECEDING). These let us get a running total (rt) and a running total that stops one row earlier. Given these two values, we can determine whether the maximum discount has been exceeded.
-- Two running totals (SQL Server 2012 and later)
;
with cte_running_total
as
(
select
*,
SUM(trans_amt)
OVER (PARTITION BY customer_id
ORDER BY trans_id
ROWS BETWEEN UNBOUNDED PRECEDING AND
0 PRECEDING) as running_tot_p0,
SUM(trans_amt)
OVER (PARTITION BY customer_id
ORDER BY trans_id
ROWS BETWEEN UNBOUNDED PRECEDING AND
1 PRECEDING) as running_tot_p1
from
#transactions
)
select
*
,
case
when coalesce(running_tot_p1, 0) <= 10 and running_tot_p0 <= 10 then
trans_amt
when coalesce(running_tot_p1, 0) <= 10 and running_tot_p0 > 10 then
10 - coalesce(running_tot_p1, 0)
else 0
end as discount_amt
from cte_running_total;
Again, the above version is using a common table expression and advanced windowing to get the totals.
Do not fret! The same can be done all the way back to SQL Server 2000.
Second solution: I am just going to use ORDER BY, subqueries, and a temporary table to store the information that would normally be in the CTE. You can swap the temporary table for a CTE in SQL Server 2005 and later if you want.
-- w/o any fancy functions - save to temp table
select *,
(
select count(*) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id <= o.trans_id
) as sys_rn,
(
select sum(trans_amt) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id <= o.trans_id
) as sys_tot_p0,
(
select sum(trans_amt) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id < o.trans_id
) as sys_tot_p1
into #results
from #transactions o
order by customer_id, trans_id
go
-- report off temp table
select
trans_id,
customer_id,
trans_amt,
case
when coalesce(sys_tot_p1, 0) <= 10 and sys_tot_p0 <= 10 then
trans_amt
when coalesce(sys_tot_p1, 0) <= 10 and sys_tot_p0 > 10 then
10 - coalesce(sys_tot_p1, 0)
else 0
end as discount_amt
from #results
order by customer_id, trans_id
go
In short, both versions return the expected discounts. Cut and paste the code into SSMS and have some fun.