SQL server sum and count - sql

I've got two tables:
orders:
orderId
orderTotal
state
orderStatesHistory
id
orderId
stateId
I need to display results like this.
Right now my code is as follows (I'm using SQL Server):
WITH totalOrders AS
(
SELECT orders.state, COUNT(1) AS counter, SUM(orders.orderTotal) AS total
FROM orders
CROSS APPLY
(
SELECT orderId, currentDateTime FROM orderStatesHistory
WHERE (orderStatesHistory.stateId = 1 OR orderStatesHistory.stateId = 3)
AND NULLIF(orderStatesHistory.subStateId, 0) IS NULL
AND orderStatesHistory.currentDateTime BETWEEN '2015-07-28 00:00:00' AND '2015-08-04 00:00:00'
AND orders.id = orderStatesHistory.orderId
) AS statesHistory
WHERE orders.state IN (1,2,3,4,5)
AND orders.documentType = 1
GROUP BY orders.state
)
SELECT 9999 AS state, SUM(counter) AS counter, SUM(total) AS total
FROM totalOrders
UNION
SELECT state, counter, total
FROM totalOrders
The problem is that rows in orderStatesHistory might be duplicated, and I only want to use each orderId once in "count()" and "sum()".
I've been struggling pretty hard and I'm not sure I'll be able to do it all in SQL. Maybe some genius can help me out; if not, I'll do it in the application code.
NOTE: When I do count() and sum(), I want to use each orderId only once. If an orderId is duplicated, I don't want to count it again.
Any help is appreciated, even someone saying it is impossible.
PS: I'm willing to use JOINs if necessary; there is no need to use SQL Server specific syntax.
UPDATE 1:
Data in orderStatesHistory
Data in orders

You should be able to accomplish this using a JOIN and DISTINCT instead of a CROSS APPLY. If you could post some sample data, it would help in verifying.
WITH totalOrders AS
(
SELECT orders.state, COUNT(1) AS counter, SUM(orders.orderTotal) AS total
FROM orders
INNER JOIN
(
SELECT DISTINCT orderId FROM orderStatesHistory
WHERE (orderStatesHistory.stateId = 1 OR orderStatesHistory.stateId = 3)
AND NULLIF(orderStatesHistory.subStateId, 0) IS NULL
AND orderStatesHistory.currentDateTime BETWEEN '2015-07-28 00:00:00' AND '2015-08-04 00:00:00'
) AS statesHistory
ON orders.id = statesHistory.orderId
WHERE orders.state IN (1,2,3,4,5)
AND orders.documentType = 1
GROUP BY orders.state
)
SELECT 9999 AS state, SUM(counter) AS counter, SUM(total) AS total
FROM totalOrders
UNION
SELECT state, counter, total
FROM totalOrders
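To see why deduplicating the history rows before the join matters, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The sample rows are invented for illustration, the date and subStateId filters are left out for brevity, and SQLite is only standing in for SQL Server here, so treat it as a demonstration of the technique rather than the exact production query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, state INTEGER, orderTotal REAL);
CREATE TABLE orderStatesHistory (id INTEGER PRIMARY KEY, orderId INTEGER, stateId INTEGER);
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
-- order 1 appears twice in the history on purpose (hypothetical data)
INSERT INTO orderStatesHistory (orderId, stateId) VALUES (1, 1), (1, 3), (2, 1), (3, 3);
""")

# Joining straight to the history counts order 1 twice.
NAIVE = """
SELECT o.state, COUNT(*) AS counter, SUM(o.orderTotal) AS total
FROM orders AS o
INNER JOIN orderStatesHistory AS h ON o.id = h.orderId
WHERE h.stateId IN (1, 3)
GROUP BY o.state
ORDER BY o.state
"""

# Reducing the history to distinct orderIds first counts each order once.
DEDUPED = """
SELECT o.state, COUNT(*) AS counter, SUM(o.orderTotal) AS total
FROM orders AS o
INNER JOIN (
    SELECT DISTINCT orderId
    FROM orderStatesHistory
    WHERE stateId IN (1, 3)
) AS h ON o.id = h.orderId
GROUP BY o.state
ORDER BY o.state
"""

naive_rows = conn.execute(NAIVE).fetchall()
deduped_rows = conn.execute(DEDUPED).fetchall()
print(naive_rows)    # state 1 is inflated by the duplicate history row
print(deduped_rows)  # each order contributes once
```

With this sample data the naive join reports three orders and a total of 40.0 for state 1, while the deduplicated version reports two orders and 30.0.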

Related

How to write SQL query without join?

Recently during an interview I was asked a question: if I have a table like as below:
The requirement is: how many orders and how many shipments per day (based on date column) - output needs to be like this:
I have written the following code, but interviewer ask me to write a SQL query without JOIN and UNION, achieve the same output.
SELECT
COALESCE(a.order_date, b.ship_date), orders, shipments
FROM
(SELECT
order_date, COUNT(1) AS orders
FROM
table
GROUP BY 1) a
FULL JOIN
(SELECT
ship_date, COUNT(1) AS shipments
FROM table
GROUP BY 1) b ON a.order_date = b.ship_date
Is this possible? Could you please advise?
You can use UNION and GROUP BY with conditional aggregation as follows:
SELECT DATE_,
COUNT(CASE WHEN FLAG = 'ORDER' THEN 1 END) AS ORDERS,
COUNT(CASE WHEN FLAG = 'SHIP' THEN 1 END) AS SHIPMENTS
FROM (SELECT ORDER_DATE AS DATE_, 'ORDER' AS FLAG FROM YOUR_TABLE
UNION ALL
SELECT SHIP_DATE AS DATE_, 'SHIP' AS FLAG FROM YOUR_TABLE) T
GROUP BY DATE_
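As a rough check of the conditional-aggregation idea, the sketch below stacks the two date columns with UNION ALL and counts each flag per date, using an in-memory SQLite database from Python. The table name `shipments_log` and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments_log (order_date TEXT, ship_date TEXT);
-- hypothetical sample rows
INSERT INTO shipments_log VALUES
    ('2021-01-01', '2021-01-02'),
    ('2021-01-01', '2021-01-03'),
    ('2021-01-02', '2021-01-03');
""")

QUERY = """
SELECT date_,
       COUNT(CASE WHEN flag = 'ORDER' THEN 1 END) AS orders,
       COUNT(CASE WHEN flag = 'SHIP'  THEN 1 END) AS shipments
FROM (
    SELECT order_date AS date_, 'ORDER' AS flag FROM shipments_log
    UNION ALL
    SELECT ship_date AS date_, 'SHIP' AS flag FROM shipments_log
) AS t
GROUP BY date_
ORDER BY date_
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

COUNT ignores the NULL that a CASE without ELSE produces, which is what makes each column count only its own flag. A date that has shipments but no orders (or the reverse) still appears, with a 0 in the other column.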
In BigQuery, I would express this as:
select date, countif(n = 0) as orders, countif(n = 1) as numships
from t cross join
unnest(array[order_date, ship_date]) date with offset n
group by 1
order by date;
The advantage of this approach (over union all) is two-fold. First, it only scans the table once. More importantly, the unnest() is all on the same node where the data resides -- so data does not need to be moved for the unpivot.

MIN and DISTINCT for ORACLE DB

I have this query
SELECT table_article.articleID, table_article_to_date.sellDate, MIN(table_article.price) AS minPrice
FROM table_article table_article
LEFT JOIN table_article_to_date table_article_to_date ON (table_article.ord_no=table_article_to_date.ord_no)
WHERE table_article.price > 0 AND table_article_to_date.sellDate BETWEEN_TWO_DATES
GROUP BY table_article.articleID, table_article_to_date.sellDate, table_article.price
For the sell date I use a time range. My problem is that I get more than one entry for each articleID.
I want the lowest price of each articleID within a specified time range. DISTINCT is not working with MIN.
Is there a way to do this with a single query?
The problem here is that you are adding the sell date to the GROUP BY clause. To solve the issue, you need to take it out and use a subquery to get it back: inner join the aggregated query with a query that returns the id, sell date and price, matching on the id and price.
Also, there is no need for the price in the GROUP BY, since it's already inside the MIN.
SELECT articleData.articleId, articleData.sellDate, articleData.price FROM
(
SELECT articleId, MIN(price) AS price
FROM table
[...]
GROUP BY articleId
) minPrice
INNER JOIN
(
SELECT articleId, sellDate, price
FROM table
) articleData
ON minPrice.price = articleData.price AND minPrice.articleId = articleData.articleId
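A small sketch of the same pattern, run against an in-memory SQLite database from Python: group by article only to find the minimum price, then join back to recover the sell date. The single table `article_sales` and its rows are assumptions made for the example; the original query joins two tables and filters on a date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_sales (articleId INTEGER, sellDate TEXT, price REAL);
-- hypothetical sample rows
INSERT INTO article_sales VALUES
    (1, '2020-01-01', 10.0),
    (1, '2020-01-02', 8.0),
    (2, '2020-01-01', 5.0),
    (2, '2020-01-03', 7.0);
""")

QUERY = """
SELECT d.articleId, d.sellDate, d.price
FROM (
    SELECT articleId, MIN(price) AS price
    FROM article_sales
    WHERE price > 0
    GROUP BY articleId
) AS m
INNER JOIN article_sales AS d
    ON d.articleId = m.articleId AND d.price = m.price
ORDER BY d.articleId
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Note that if an article was sold at its minimum price on several dates, this approach returns one row per such date.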

SQL querying a customer ID who ordered both product A and B

Having a bit of trouble when trying to figure out how to return a query of a customer who ordered both A and B
What I'm looking for is all customers who order both product A and product B
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING COUNT(distinct product) = 2
I don't normally post code-only answers, but there isn't a lot that words can add to this; the query largely explains itself.
You can also use:
HAVING max(product) <> min(product)
It may be worth pointing out that the WHERE is performed first, filtering to just products A and B. Then the GROUP BY is performed, grouping by customer and counting the distinct number of products (or getting the min and max). Then the HAVING is performed, keeping only the groups with 2 distinct products (or only those where the MIN, i.e. A, differs from the MAX, i.e. B).
If you've never encountered HAVING, it is logically equivalent to:
SELECT CustomerID
FROM(
SELECT CustomerID, COUNT(distinct product) as count_distinct_product
FROM table
WHERE product in ('a','b')
GROUP BY customerid
)z
WHERE
z.count_distinct_product = 2
In a HAVING clause you can only refer to columns that are mentioned in the group by. You can also refer to aggregate operations (such as count/min/max) on other columns not mentioned in the group by
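The equivalence between HAVING and filtering a grouped derived table can be checked with a short sketch, here using an in-memory SQLite database from Python. The `purchases` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (CustomerID INTEGER, product TEXT);
-- hypothetical sample rows: customers 1 and 4 bought both 'a' and 'b'
INSERT INTO purchases VALUES
    (1, 'a'), (1, 'b'), (1, 'a'),
    (2, 'a'), (2, 'a'),
    (3, 'b'), (3, 'c'),
    (4, 'b'), (4, 'a');
""")

WITH_HAVING = """
SELECT CustomerID
FROM purchases
WHERE product IN ('a', 'b')
GROUP BY CustomerID
HAVING COUNT(DISTINCT product) = 2
ORDER BY CustomerID
"""

WITH_DERIVED_TABLE = """
SELECT CustomerID
FROM (
    SELECT CustomerID, COUNT(DISTINCT product) AS count_distinct_product
    FROM purchases
    WHERE product IN ('a', 'b')
    GROUP BY CustomerID
) AS z
WHERE z.count_distinct_product = 2
ORDER BY CustomerID
"""

having_rows = conn.execute(WITH_HAVING).fetchall()
derived_rows = conn.execute(WITH_DERIVED_TABLE).fetchall()
print(having_rows, derived_rows)
```

Customer 2 is excluded because repeated purchases of 'a' still count as one distinct product, and customer 3 is excluded because 'c' is filtered out by the WHERE before grouping.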
I have never worked with SQLite, but since its specs say it is a relational database, it should allow the following query.
select CustomerID
from table t
where exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'A'
)
and exists (
select *
from table
where CustomerID = t.CustomerID
and Product = 'B'
)
I'd use a correlated sub-query with a HAVING clause to scoop in both products in a single WHERE clause.
SELECT
t.Customer
FROM
#t AS t
WHERE
EXISTS
(
SELECT
1
FROM
#t AS s
WHERE
t.Customer = s.Customer
AND s.Product IN ('A', 'B')
HAVING
COUNT(DISTINCT s.Product) = 2
)
GROUP BY
t.Customer;
Select customerid from table group by customerid having product like 'A' and product like 'B', or
you can try having count(distinct product) = 2; this seems to be more accurate.
The whole idea is that within one customerid group, say customer 1, if there are several A's and B's, count(distinct product) gives 2; otherwise it gives 1, so the answer is as above.
Another way I just figured out was
SELECT CustomerID
FROM table
WHERE product in ('a','b')
GROUP BY customerid
HAVING sum(case when product ='a' then 1 else 0 end) > 0
and sum(case when product ='b' then 1 else 0 end) > 0

Using COUNT with MAX in SQL

I am trying to find which customer has the most transactions. The Transaction table has a foreign key that links each transaction to a customer. What I currently have is the following code:
WITH Customers as (
SELECT
[CustName] as 'Customer',
[TRANSACTION].[CustID] as 'Total # of Transactions'
FROM [dbo].[CUSTOMER]
INNER JOIN [dbo].[TRANSACTION]
ON [CUSTOMER].[CustID] = [TRANSACTION].[CustID]
)
SELECT *
FROM Customers
WHERE 'Total # of Transactions' = (SELECT MAX('Total # of Transactions') FROM Customers);
Two things are wrong:
1) The latter part of the code doesn't accept 'Total # of Transactions'. If I were to rename it to a single word, I could treat it kind of like a variable.
2) My last SELECT statement gives me a result of the customer and all their transactions, but doesn't give me a COUNT of those transactions. I'm not sure how to use COUNT in conjunction with MAX.
First select customers and their transaction counts.
Then select the largest one.
Then limit your select to that item.
Work your way from the inside out.
SELECT *
FROM CUSTOMER
WHERE CustID =
(
SELECT TOP 1 CustID
FROM (SELECT CustID, COUNT(*) AS TCOUNT
FROM [TRANSACTION]
GROUP BY CustID) T
ORDER BY T.TCOUNT DESC
)
This should get you everything you need. To get the top customer just add Top 1 to the select
WITH Customers as (
SELECT
[CustName] as Name
FROM [dbo].[CUSTOMER]
INNER JOIN [dbo].[TRANSACTION]
ON [CUSTOMER].[CustID] = [TRANSACTION].[CustID]
)
-- to get count of transactions
Select Count(*) as count, Name
FROM Customers
Group by Name
Order By Count(*) desc
Your inner table just returns CustID as a total number of transactions? You need to start by finding the total count for each customer. Also for a column you can use [Name], when you use apostrophes it thinks you are comparing a string. If you want to return all customers with the highest count, you could use this:
WITH TransactionCounts as (
SELECT
CustID,
COUNT(*) AS TransactionCount
FROM [dbo].[TRANSACTION]
GROUP BY CustID
)
SELECT TOP 1 CUSTOMER.*, TransactionCount
FROM TransactionCounts
INNER JOIN CUSTOMER ON CUSTOMER.CustID = TransactionCounts.CustId
ORDER BY TransactionCount DESC
-- alternate to select all if multiple customers are tied for highest count
--WHERE TransactionCount = (SELECT MAX(TransactionCount) FROM TransactionCounts)
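The tie-aware alternative in the comment above (comparing each count against the maximum count) can be sketched as follows, using an in-memory SQLite database from Python. SQLite has no TOP, so only the MAX variant is shown; the table names `customer` and `txn` and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (CustID INTEGER PRIMARY KEY, CustName TEXT);
CREATE TABLE txn (TxnID INTEGER PRIMARY KEY, CustID INTEGER REFERENCES customer(CustID));
-- hypothetical sample rows: Ann and Bob are tied with 3 transactions each
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO txn (CustID) VALUES (1), (1), (1), (2), (2), (2), (3);
""")

QUERY = """
WITH TransactionCounts AS (
    SELECT CustID, COUNT(*) AS TransactionCount
    FROM txn
    GROUP BY CustID
)
SELECT c.CustName, tc.TransactionCount
FROM TransactionCounts AS tc
INNER JOIN customer AS c ON c.CustID = tc.CustID
WHERE tc.TransactionCount = (SELECT MAX(TransactionCount) FROM TransactionCounts)
ORDER BY c.CustName
"""

top_customers = conn.execute(QUERY).fetchall()
print(top_customers)
```

Counting first in the CTE and filtering afterwards keeps the aggregate and the comparison in separate steps, which is the "work from the inside out" idea applied to ties.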

How do I combine SELECT statements to allow me to calculate percentages, successes and failures in SQL Server?

Imagine a table :
CUST_PROMO (customer_id,PROMOTION), which is used as a mapping between each customer and every promotion that customer has received.
select promotion, count(customer_id) as promo_size
from CUST_PROMO
group by promotion
This gets us the total number of customers in each promotion.
Now, we've got CUSTOMER (customer_id, PROMO_RESPONDED,PROMO_PURCHASED), which lists the customer and which promotion got the customer to respond, and which got them to purchase.
select PROMO_RESPONDED, count(customer_id) as promo_responded
from CUSTOMER
group by PROMO_RESPONDED
select PROMO_PURCHASED,count(customer_id) as promo_purchased
from CUSTOMER
group by PROMO_PURCHASED
This is all very self-explanatory; now I've got the number of people for whom each promo was successful.
But; what I'd like to end up with is [in CSV form]
PROMOTION,PROMO_SIZE,PROMO_RESPONDED,PROMO_PURCHASED,PROMO_RESPSUCCESSRATE,blah
1,100,12,5,12%,...
2,200,23,14,11.5%,...
I have no idea how to do this. I can UNION the three queries above; but that doesn't actually result in what I want. I thought about creating an in-memory table, inserting in each promo value and then doing an update statement with a join against it to set the values each -- but that's pretty messy; and requires a new UPDATE statement for each table/select statement. I could also make a temp table per result set and then join them together; but really; who wants to do that?
I can't think of any way of joining this data that makes any sense; since I'm dealing with aggregates.
So, at best, I need a function that, like UNION, will combine result sets, but will actually combine like columns on a key and ADD those columns rather than union which adds rows. The description makes it sound like a JOIN; but I can't see that working.
Thanks for the help!
SELECT
cp.promotion,
PROMO_SIZE = COUNT(*),
PROMO_RESPONDED = COUNT(c1.customer_id),
PROMO_PURCHASED = COUNT(c2.customer_id),
PROMO_RESPSUCCESSRATE = COUNT(c1.customer_id) * 100.0 / COUNT(*)
FROM CUST_PROMO cp
LEFT JOIN CUSTOMER c1
ON cp.customer_id = c1.customer_id AND cp.promotion = c1.PROMO_RESPONDED
LEFT JOIN CUSTOMER c2
ON cp.customer_id = c2.customer_id AND cp.promotion = c2.PROMO_PURCHASED
GROUP BY cp.promotion
WITH tmp AS
(
-- one branch per measure; the other two columns are padded with 0
SELECT PROMOTION, 0 as promo_responded, 0 as promo_purchased, COUNT(customer_id) as total
FROM CUST_PROMO
GROUP BY PROMOTION
UNION ALL
SELECT PROMO_RESPONDED as PROMOTION, COUNT(customer_id) as promo_responded, 0 as promo_purchased, 0 as total
FROM CUSTOMER
GROUP BY PROMO_RESPONDED
UNION ALL
SELECT PROMO_PURCHASED as PROMOTION, 0 as promo_responded, COUNT(customer_id) as promo_purchased, 0 as total
FROM CUSTOMER
GROUP BY PROMO_PURCHASED
)
SELECT PROMOTION, SUM(promo_responded) as TotalResponded, SUM(promo_purchased) as TotalPurchased, SUM(total) as TotalSize,
-- * 1.0 avoids integer division; NULLIF avoids dividing by zero
SUM(promo_responded) * 1.0 / NULLIF(SUM(total), 0) as ResponseRate, SUM(promo_purchased) * 1.0 / NULLIF(SUM(total), 0) as PurchaseRate
FROM tmp
GROUP BY PROMOTION
Does this work? I'm not sure about the division and multiplication operators, but I believe my logic is good. The key is using correlated subqueries in the select list.
SELECT c.promotion,
COUNT(c.customer_id) as promo_size,
(SELECT COUNT(customer_id)
FROM CUSTOMER
WHERE PROMO_RESPONDED = c.promotion) PROMO_RESPONDED,
(SELECT COUNT(customer_id)
FROM CUSTOMER
WHERE PROMO_PURCHASED = c.promotion) PROMO_PURCHASED,
(SELECT COUNT(customer_id) *100/count(c.customer_id)
FROM CUSTOMER
WHERE PROMO_RESPONDED = c.promotion)
FROM CUST_PROMO c
GROUP BY c.promotion
A cleaner solution using decode. Still not sure the math is working
select PROMOTION, count(CUSTOMER_ID) as promo_size,
SUM(DECODE(PROMO_RESPONDED, PROMOTION, 1, 0)) PROMO_RESPONDED,
SUM(DECODE(PROMO_PURCHASED, PROMOTION, 1, 0)) PROMO_PURCHASED,
SUM(DECODE(PROMO_RESPONDED, PROMOTION, 1, 0))*100/count(CUSTOMER_ID) PROMO_RESPSUCCESSRATE
from CUST_PROMO join CUSTOMER using (CUSTOMER_ID)
group by PROMOTION
Yes, I think JOINing the three aggregate queries is the way to go. The LEFT JOINs are there just in case some promotion gets no responses or no purchases.
I also changed the COUNT(customer_id) to COUNT(*). The result is the same, unless customer_id field can have NULL values in the two tables which most probably is not the case. If however, a customer may appear in two rows of a table with same promotion code, then you should change those into COUNT(DISTINCT customer_id) :
SELECT prom.promotion
, prom.promo_size
, responded.promo_responded
, purchased.promo_purchased
, responded.promo_responded * 1.0 / prom.promo_size
AS promo_response_success_rate
FROM
( SELECT promotion
, COUNT(*) AS promo_size
FROM CUST_PROMO
GROUP BY promotion
) AS prom
LEFT JOIN
( SELECT PROMO_RESPONDED AS promotion
, COUNT(*) AS promo_responded
FROM CUSTOMER
GROUP BY PROMO_RESPONDED
) AS responded
ON responded.promotion = prom.promotion
LEFT JOIN
( SELECT PROMO_PURCHASED AS promotion
, COUNT(*) AS promo_purchased
FROM CUSTOMER
GROUP BY PROMO_PURCHASED
) AS purchased
ON purchased.promotion = prom.promotion
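To illustrate how joining the three aggregates lines the counts up per promotion, here is a minimal sketch against an in-memory SQLite database from Python. The sample rows are invented, and the COALESCE calls and the percentage calculation are additions for the example (they turn the NULLs produced by the LEFT JOINs into zeros), not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUST_PROMO (customer_id INTEGER, promotion INTEGER);
CREATE TABLE CUSTOMER (customer_id INTEGER PRIMARY KEY,
                       promo_responded INTEGER, promo_purchased INTEGER);
-- hypothetical sample rows
INSERT INTO CUST_PROMO VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (5, 3);
INSERT INTO CUSTOMER VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, NULL), (4, NULL, NULL), (5, NULL, NULL);
""")

QUERY = """
SELECT prom.promotion,
       prom.promo_size,
       COALESCE(responded.n, 0) AS promo_responded,
       COALESCE(purchased.n, 0) AS promo_purchased,
       COALESCE(responded.n, 0) * 100.0 / prom.promo_size AS response_rate_pct
FROM (SELECT promotion, COUNT(*) AS promo_size
      FROM CUST_PROMO GROUP BY promotion) AS prom
LEFT JOIN (SELECT promo_responded AS promotion, COUNT(*) AS n
           FROM CUSTOMER GROUP BY promo_responded) AS responded
       ON responded.promotion = prom.promotion
LEFT JOIN (SELECT promo_purchased AS promotion, COUNT(*) AS n
           FROM CUSTOMER GROUP BY promo_purchased) AS purchased
       ON purchased.promotion = prom.promotion
ORDER BY prom.promotion
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because each subquery is already reduced to one row per promotion before the joins, the joins cannot multiply rows, which is why aggregating first and joining second is safe here. Multiplying by 100.0 rather than 100 keeps the rate from being truncated by integer division.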