Create a column with daily count in impala - sql

I want to create a count column that holds the count per day. I have managed to do it like this:
select book, orders, s.common_id,s.order_date,d.customer_region,t.cnt
from books_tbt as s
inner join customer_tbt as d
on s.common_id = d.common_id
inner join (select count(*) as cnt,order_date from customer_tbt where customer !='null'
group by order_date) as t
on t.order_date = d.order_date
where d.customer !='null'
and s.order_date = 20220122
group by book, orders, s.common_id,s.order_date,d.customer_region,t.cnt;
I want to ask if there is a more efficient way to do it?

You can simply use the COUNT(*) OVER (PARTITION BY order_date ORDER BY order_date) window function to calculate the count for each order date.
select book, orders, s.common_id,s.order_date,d.customer_region,d.cnt
from books_tbt as s
inner join
( select d.*, COUNT(*) OVER( Partition by ORDER_DATE Order by ORDER_DATE) as cnt from customer_tbt d where d.customer != 'null') as d on s.common_id = d.common_id -- COUNT(*) OVER cannot be combined with the outer GROUP BY, so it is computed in a subquery; the 'null' filter sits inside so those rows are not counted
where d.customer !='null'
and s.order_date = 20220122
group by book, orders, s.common_id,s.order_date,d.customer_region,d.cnt;
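To see how the per-day window count behaves, here is a minimal sketch using Python's built-in sqlite3 module rather than Impala. The table contents are made up for illustration, and only the columns needed for the count are included. It assumes an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

# Hypothetical miniature of customer_tbt; only the columns the count needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_tbt (common_id INTEGER, customer TEXT, order_date INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_tbt VALUES (?, ?, ?)",
    [
        (1, "ann", 20220122),
        (2, "bob", 20220122),
        (3, "null", 20220122),  # the literal string 'null', filtered out below
        (4, "cat", 20220123),
    ],
)

# WHERE runs before the window function, so filtered rows are never counted.
rows = conn.execute(
    """
    SELECT common_id, order_date,
           COUNT(*) OVER (PARTITION BY order_date) AS cnt
    FROM customer_tbt
    WHERE customer != 'null'
    ORDER BY common_id
    """
).fetchall()

for row in rows:
    print(row)
```

Every surviving row keeps its own identity and carries the count for its day, which is what the join against a grouped subquery was doing in the question.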

Related

Combining multiple queries

I want a table with all customers and their last charge transaction date and their last invoice date. I have the first two, but don't know how to add the last invoice date to the table. Here's what I have so far:
WITH
--Last customer transaction
cust_trans AS (
SELECT customer_id, created
FROM charges a
WHERE created = (
SELECT MAX(created) AS last_trans
FROM charges b
WHERE a.customer_id = b.customer_id)),
--All customers
all_cust AS (
SELECT customers.id AS customer, customers.email, CAST(customers.created AS DATE) AS join_date, ((1.0 * customers.account_balance)/100) AS balance
FROM customers),
--Last customer invoice
cust_inv AS (
SELECT customer_id, date
FROM invoices a
WHERE date = (
SELECT MAX(date) AS last_inv
FROM invoices b
WHERE a.customer_id = b.customer_id))
SELECT * FROM cust_trans
RIGHT JOIN all_cust ON all_cust.customer = cust_trans.customer_id
ORDER BY join_date;
This should get what you need. Notice each individual subquery is left-joined to the customer table, so you always START with the customer, and IF there is a corresponding record in each subquery for max charge date or max invoice date, it will be pulled in. Now, you may want to apply a COALESCE() for the max dates to prevent showing nulls, such as
COALESCE(maxCharges.LastChargeDate, '') AS LastChargeDate
but your call.
SELECT
c.id AS customer,
c.email,
CAST(c.created AS DATE) AS join_date,
((1.0 * c.account_balance) / 100) AS balance,
maxCharges.LastChargeDate,
maxInvoices.LastInvoiceDate
FROM
customers c
LEFT JOIN
(SELECT
customer_id,
MAX(created) LastChargeDate
FROM
charges
GROUP BY
customer_id) maxCharges ON c.id = maxCharges.customer_id
LEFT JOIN
(SELECT
customer_id,
MAX(date) LastInvoiceDate
FROM
invoices
GROUP BY
customer_id) maxInvoices ON c.id = maxInvoices.customer_id
ORDER BY
c.created
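As a quick check of the left-join pattern, here is a small sketch run through Python's sqlite3 module. The rows are invented, and the customers table is cut down to two columns. Customer 2 has no charges or invoices, so both dates come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, email TEXT);
    CREATE TABLE charges (customer_id INTEGER, created TEXT);
    CREATE TABLE invoices (customer_id INTEGER, date TEXT);
    INSERT INTO customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO charges VALUES (1, '2020-01-05'), (1, '2020-03-01');
    INSERT INTO invoices VALUES (1, '2020-02-10');
    """
)

# Each subquery collapses to one row per customer before it is joined,
# so the customer rows are never multiplied.
rows = conn.execute(
    """
    SELECT c.id, mc.LastChargeDate, mi.LastInvoiceDate
    FROM customers c
    LEFT JOIN (SELECT customer_id, MAX(created) AS LastChargeDate
               FROM charges GROUP BY customer_id) mc ON c.id = mc.customer_id
    LEFT JOIN (SELECT customer_id, MAX(date) AS LastInvoiceDate
               FROM invoices GROUP BY customer_id) mi ON c.id = mi.customer_id
    ORDER BY c.id
    """
).fetchall()

for row in rows:
    print(row)
```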

SQL get top 3 values / bottom 3 values with group by and sum

I am working on a restaurant management system. There I have two tables
order_details(orderId,dishId,createdAt)
dishes(id,name,imageUrl)
My customer wants to see a report of the top 3 selling items and the 3 least-selling items for each month.
For the moment I did something like this
SELECT
*
FROM
(SELECT
SUM(qty) AS qty,
order_details.dishId,
MONTHNAME(order_details.createdAt) AS mon,
dishes.name,
dishes.imageUrl
FROM
rms.order_details
INNER JOIN dishes ON order_details.dishId = dishes.id
GROUP BY order_details.dishId , MONTHNAME(order_details.createdAt)) t
ORDER BY t.qty
This gives me the quantity sold for every dish, ordered by qty.
I have to manually filter max 3 records and reject the rest. There should be a SQL way of doing this. How do I do this in SQL?
You would use row_number() for this purpose. You don't specify the database you are using, so I am guessing at the appropriate date functions. I also assume that you mean a month within a year, so you need to take the year into account as well:
SELECT ym.*
FROM (SELECT YEAR(od.CreatedAt) as yyyy,
MONTH(od.createdAt) as mm,
SUM(qty) AS qty,
od.dishId, d.name, d.imageUrl,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) DESC) as seqnum_desc,
ROW_NUMBER() OVER (PARTITION BY YEAR(od.CreatedAt), MONTH(od.createdAt) ORDER BY SUM(qty) ASC) as seqnum_asc
FROM rms.order_details od INNER JOIN
dishes d
ON od.dishId = d.id
GROUP BY YEAR(od.CreatedAt), MONTH(od.CreatedAt), od.dishId
) ym
WHERE seqnum_asc <= 3 OR
seqnum_desc <= 3;
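Here is a runnable sketch of the same idea using Python's sqlite3 module, which needs SQLite 3.25+ for window functions. The data is made up, strftime stands in for YEAR()/MONTH(), and N is set to 1 instead of 3 only to keep the sample small. The quantities are aggregated in a CTE first and ranked afterwards, which is a restructuring for readability and not the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_details (dishId INTEGER, qty INTEGER, createdAt TEXT)")
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [
        (1, 5, "2021-01-03"), (2, 9, "2021-01-04"), (3, 1, "2021-01-09"),
        (1, 2, "2021-01-20"), (4, 4, "2021-01-21"),
        (1, 3, "2021-02-02"), (2, 8, "2021-02-05"),
    ],
)

N = 1  # hypothetical cut-off; the question asks for 3
rows = conn.execute(
    """
    WITH monthly AS (
        SELECT strftime('%Y-%m', createdAt) AS ym, dishId, SUM(qty) AS total_qty
        FROM order_details
        GROUP BY strftime('%Y-%m', createdAt), dishId
    ),
    ranked AS (
        SELECT ym, dishId, total_qty,
               ROW_NUMBER() OVER (PARTITION BY ym ORDER BY total_qty DESC) AS seqnum_desc,
               ROW_NUMBER() OVER (PARTITION BY ym ORDER BY total_qty ASC)  AS seqnum_asc
        FROM monthly
    )
    SELECT ym, dishId, total_qty
    FROM ranked
    WHERE seqnum_desc <= ? OR seqnum_asc <= ?
    ORDER BY ym, total_qty DESC
    """,
    (N, N),
).fetchall()

for row in rows:
    print(row)
```

Note that the two rankings must sort in opposite directions; if both use DESC, the "bottom" filter just repeats the top sellers.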
Using the above info, I used a combination of GROUP BY, ORDER BY and LIMIT, as shown below. I hope this is what you are looking for.
SELECT
t.qty,
t.dishId,
t.month,
d.name,
d.imageUrl
from
(
SELECT
od.dishId,
count(od.dishId) AS 'qty',
date_format(od.createdAt,'%Y-%m') as 'month'
FROM
rms.order_details od
group by date_format(od.createdAt,'%Y-%m'),od.dishId
order by qty desc
limit 3) t
join rms.dishes d on (t.dishId = d.id)

Psql : Get Min, Max and Count records for each partner's invoices, and last payment

I have a table invoice like this :
id, partner_id, number, invoice_date
And a Payment Table like this:
id, payment_date, partner_id
I want to get min and max for both number & invoice_date, and count invoices, and last payment for each partner, something like this :
partner_id, min number, min date, max number, max date, count, last_pay
1, INV-2017-003, 02-01-2017, INV-2020-010, 01-01-2020, 142, 02-12-2019
5, INV-2019-124, 05-03-2019, INV-2020-005, 01-01-2020, 150, 01-01-2020
....
You can join those three tables, including partners, and group by the partners' id column along with the related aggregations:
select pr.id, min(invoice_date), max(invoice_date), count(*), max(payment_date) as last_pay
from partners pr
left join invoices i on i.partner_id = pr.id
left join payments p on p.partner_id = pr.id
group by pr.id
Update : You can use min() over (), max() over () and row_number() analytic functions to get the desired code depending on max and min dates :
select *
from
(
select pr.id,
min(invoice_date) over (partition by pr.id order by invoice_date) as min_invoice_date,
max(invoice_date) over (partition by pr.id order by invoice_date desc) as max_invoice_date,
max(code) over (partition by pr.id order by invoice_date desc) as max_code,
min(code) over (partition by pr.id order by invoice_date) as min_code,
count(*) over (partition by pr.id) as cnt,
max(payment_date) over (partition by pr.id) as last_pay,
row_number() over (partition by pr.id order by invoice_date desc) as rn
from partners pr
left join invoices i on i.partner_id = pr.id
left join payments p on p.partner_id = pr.id
) q
where rn = 1
Why isn't this simple aggregation?
select i.partner_id,
min(i.number) as min_number,
min(i.invoice_date) as min_invoice_date,
max(i.number) as max_number,
max(i.invoice_date) as max_invoice_date,
count(distinct i.invoice_id) as num_invoices,
max(p.payment_date) as max_payment_date
from invoices i left join
payments p
on p.partner_id = i.partner_id
group by i.partner_id;
If you want the number on the earliest invoice (and min() doesn't work), then you can do this with a "first" aggregation function. Unfortunately, Postgres doesn't directly support one. But it does through array functions:
select i.partner_id,
(array_agg(i.number order by i.invoice_date asc))[1] as min_number,
min(i.invoice_date) as min_invoice_date,
(array_agg(i.number order by i.invoice_date desc))[1] as max_number,
max(i.invoice_date) as max_invoice_date,
count(distinct i.invoice_id) as num_invoices,
max(p.payment_date) as max_payment_date
from invoices i left join
payments p
on p.partner_id = i.partner_id
group by i.partner_id;
This is similar to @BarbarosÖzhan's answer, but calculates the min/max before the join (if you have multiple rows per partner for both invoices and payments, the COUNT will be wrong otherwise). Additionally, there's only a single PARTITION/ORDER, which should result in a more efficient plan.
SELECT i.*, p.last_pay
FROM
( -- 1st row has all the min values = filtered using row_number
SELECT
partner_id
,number AS min_code
,invoice_date AS min_invoice_date
-- value from row with max date
,Last_Value(number)
Over (PARTITION BY partner_id
ORDER BY invoice_date
ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS max_code
,Last_Value(invoice_date)
Over (PARTITION BY partner_id
ORDER BY invoice_date
ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS max_invoice_date
,Count(*)
Over (PARTITION BY partner_id) AS Cnt
,Row_Number()
Over (PARTITION BY partner_id ORDER BY invoice_date) AS rn
FROM invoices
) AS i
LEFT JOIN
( -- max payment date per partner
SELECT partner_id, Max(payment_date) AS last_pay
FROM payments
GROUP BY partner_id
) AS p
ON p.partner_id = i.partner_id
WHERE i.rn = 1
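The point about the COUNT being wrong when both tables are joined row by row can be checked with a small sketch in Python's sqlite3 module. The rows are invented, and the pre-aggregated version uses plain GROUP BY subqueries instead of Last_Value, purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoices (id INTEGER, partner_id INTEGER, number TEXT, invoice_date TEXT);
    CREATE TABLE payments (id INTEGER, partner_id INTEGER, payment_date TEXT);
    INSERT INTO invoices VALUES
      (1, 1, 'INV-2017-003', '2017-01-02'),
      (2, 1, 'INV-2020-010', '2020-01-01');
    INSERT INTO payments VALUES
      (1, 1, '2019-11-02'), (2, 1, '2019-12-02'), (3, 1, '2018-05-01');
    """
)

# Joining first pairs every invoice with every payment of the partner.
naive = conn.execute(
    """
    SELECT i.partner_id, COUNT(*)
    FROM invoices i LEFT JOIN payments p ON p.partner_id = i.partner_id
    GROUP BY i.partner_id
    """
).fetchall()

# Aggregating each side to one row per partner before joining avoids that.
safe = conn.execute(
    """
    SELECT i.partner_id, i.min_number, i.max_number, i.cnt, p.last_pay
    FROM (SELECT partner_id, MIN(number) AS min_number,
                 MAX(number) AS max_number, COUNT(*) AS cnt
          FROM invoices GROUP BY partner_id) i
    LEFT JOIN (SELECT partner_id, MAX(payment_date) AS last_pay
               FROM payments GROUP BY partner_id) p
      ON p.partner_id = i.partner_id
    """
).fetchall()

print(naive)
print(safe)
```

With 2 invoices and 3 payments, the naive join reports 6 invoices for the partner, while the pre-aggregated version reports 2.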

how to filter data in sql based on percentile

I have 2 tables. The first one contains customer information such as id, age, and name. The second table contains their id, information about the product they purchased, and the purchase_date (the dates range from 2016 to 2018).
Table 1
-------
customer_id
customer_age
customer_name
Table2
------
customer_id
product
purchase_date
My desired result is a table containing the customer_name and product for customers who made a purchase in 2017 and are older than 75% of the customers who made a purchase in 2016.
Depending on your flavor of SQL, you can get quartiles using the more general ntile analytical function. This basically adds a new column to your query.
SELECT MIN(customer_age) as min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2 WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4=4
This returns the lowest age of the 4th-quartile customers, which can be used in a subquery against the customers who made purchases in 2017.
The argument to ntile is how many buckets you want to divide into. In this case 75%+ equals 4th quartile, so 4 buckets is OK. The OVER() clause specifies what you want to sort by (customer_age in our case), and also lets us partition (group) the data if we want to, say, create multiple rankings for different years or countries.
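To make the bucketing concrete, here is a small sketch with Python's sqlite3 module (SQLite 3.25+). The eight ages are invented and the purchase filter is left out, so it only shows how ntile(4) assigns quartiles and how the 4th-quartile minimum falls out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (customer_id INTEGER, customer_age INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(i, age) for i, age in enumerate([20, 30, 40, 50, 60, 70, 80, 90], start=1)],
)

# Eight rows split into four buckets of two, lowest ages in bucket 1.
rows = conn.execute(
    """
    SELECT customer_age, NTILE(4) OVER (ORDER BY customer_age) AS q4
    FROM table1
    ORDER BY customer_age
    """
).fetchall()

# Lowest age in the top quartile, i.e. the cut-off for "older than 75%".
threshold = min(age for age, bucket in rows if bucket == 4)
print(rows)
print(threshold)
```

When the row count is not divisible by 4, ntile puts the extra rows in the lower-numbered buckets, so the cut-off is approximate for small samples.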
Age is a horrible field to include in a database. Every day it changes. You should have date-of-birth or something similar.
To get the 75th-percentile age among the 2016 purchasers, there are several possibilities. I usually go for row_number() and count(*):
select min(customer_age)
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt;
Then, to use this for a query for 2017:
with a2016 as (
select min(customer_age) as customer_age
from (select c.*,
row_number() over (order by customer_age) as seqnum,
count(*) over () as cnt
from customers c
where exists (select 1
from customer_products cp
where cp.customer_id = c.customer_id and
cp.purchase_date >= '2016-01-01' and
cp.purchase_date < '2017-01-01'
)
) c
where seqnum >= 0.75 * cnt
)
select c.*, cp.product_id
from customers c join
customer_products cp
on cp.customer_id = c.customer_id and
cp.purchase_date >= '2017-01-01' and
cp.purchase_date < '2018-01-01' join
a2016 a
on c.customer_age >= a.customer_age;

MySQL: Returning multiple columns from an in-line subquery

I'm creating an SQL statement that will return a month by month summary on sales.
The summary will list some simple columns for the date, total number of sales and the total value of sales.
However, in addition to these columns, I'd like to include 3 more that list the month's best customer by amount spent. For these columns, I need some kind of inline subquery that can return their ID, Name and the Amount they spent.
My current effort uses an inline SELECT statement; however, as far as I know, each inline statement can only return one column and one row.
To get around this in my scenario, I can of course create 3 separate inline statements, but besides seeming impractical, that increases the query time more than necessary.
SELECT
DATE_FORMAT(OrderDate,'%M %Y') AS OrderMonth,
COUNT(OrderID) AS TotalOrders,
SUM(OrderTotal) AS TotalAmount,
(SELECT SUM(OrderTotal) FROM Orders WHERE DATE_FORMAT(OrderDate,'%M %Y') = OrderMonth GROUP BY OrderCustomerFK ORDER BY SUM(OrderTotal) DESC LIMIT 1) AS TotalCustomerAmount,
(SELECT OrderCustomerFK FROM Orders WHERE DATE_FORMAT(OrderDate,'%M %Y') = OrderMonth GROUP BY OrderCustomerFK ORDER BY SUM(OrderTotal) DESC LIMIT 1) AS CustomerID,
(SELECT CustomerName FROM Orders INNER JOIN Customers ON OrderCustomerFK = CustomerID WHERE DATE_FORMAT(OrderDate,'%M %Y') = OrderMonth GROUP BY OrderCustomerFK ORDER BY SUM(OrderTotal) DESC LIMIT 1) AS CustomerName
FROM Orders
GROUP BY DATE_FORMAT(OrderDate,'%m%y')
ORDER BY DATE_FORMAT(OrderDate,'%y%m') DESC
How can i better structure this query?
FULL ANSWER
After some tweaking of Dave Barker's solution, I have a final version for anyone in the future looking for help.
The solution by Dave Barker worked perfectly with the customer details, however, it made the simpler Total Sales and Total Sale Amount columns get some crazy figures.
SELECT
Y.OrderMonth, Y.TotalOrders, Y.TotalAmount,
Z.OrdCustFK, Z.CustCompany, Z.CustOrdTotal, Z.CustSalesTotal
FROM
(SELECT
OrdDate,
DATE_FORMAT(OrdDate,'%M %Y') AS OrderMonth,
COUNT(OrderID) AS TotalOrders,
SUM(OrdGrandTotal) AS TotalAmount
FROM Orders
WHERE OrdConfirmed = 1
GROUP BY DATE_FORMAT(OrdDate,'%m%y')
ORDER BY DATE_FORMAT(OrdDate,'%Y%m') DESC)
Y INNER JOIN
(SELECT
DATE_FORMAT(OrdDate,'%M %Y') AS CustMonth,
OrdCustFK,
CustCompany,
COUNT(OrderID) AS CustOrdTotal,
SUM(OrdGrandTotal) AS CustSalesTotal
FROM Orders INNER JOIN CustomerDetails ON OrdCustFK = CustomerID
WHERE OrdConfirmed = 1
GROUP BY DATE_FORMAT(OrdDate,'%m%y'), OrdCustFK
ORDER BY SUM(OrdGrandTotal) DESC)
Z ON Z.CustMonth = Y.OrderMonth
GROUP BY DATE_FORMAT(OrdDate,'%Y%m')
ORDER BY DATE_FORMAT(OrdDate,'%Y%m') DESC
Move the inline SQL into an inner join query. So you'd have something like...
SELECT DATE_FORMAT(OrderDate,'%M %Y') AS OrderMonth, COUNT(OrderID) AS TotalOrders, SUM(OrderTotal) AS TotalAmount, Z.OrderCustomerFK, Z.CustomerName, Z.OrderTotal as CustomerTotal
FROM Orders
INNER JOIN (SELECT DATE_FORMAT(OrderDate,'%M %Y') as Mon, OrderCustomerFK, CustomerName, SUM(OrderTotal) as OrderTotal
FROM Orders
GROUP BY DATE_FORMAT(OrderDate,'%M %Y'), OrderCustomerFK, CustomerName ORDER BY SUM(OrderTotal) DESC LIMIT 1) Z
ON Z.Mon = DATE_FORMAT(OrderDate,'%M %Y')
GROUP BY DATE_FORMAT(OrderDate,'%m%y'), Z.OrderCustomerFK, Z.CustomerName
ORDER BY DATE_FORMAT(OrderDate,'%y%m') DESC
You can also do something like:
SELECT
a.`y`,
( SELECT @c:=NULL ) AS `temp`,
( SELECT @d:=NULL ) AS `temp`,
( SELECT
CONCAT(@c:=b.`c`, @d:=b.`d`)
FROM `b`
ORDER BY b.`uid`
LIMIT 1 ) AS `temp`,
@c as c,
@d as d
FROM `a`
Give this a shot:
SELECT CONCAT(o.order_month, ' ', o.order_year),
o.total_orders,
o.total_amount,
x.sum_order_total,
x.ordercustomerfk,
x.customername
FROM (SELECT MONTH(t.orderdate) AS order_month,
YEAR(t.orderdate) AS order_year,
COUNT(t.orderid) AS total_orders,
SUM(t.ordertotal) AS total_amount
FROM ORDERS t
GROUP BY MONTH(t.orderdate), YEAR(t.orderdate)) o
JOIN (SELECT MONTH(t.orderdate) AS order_month,
YEAR(t.orderdate) AS order_year,
SUM(t.ordertotal) AS sum_order_total,
t.ordercustomerfk,
c.customername
FROM ORDERS t
JOIN CUSTOMERS c ON c.customerid = t.ordercustomerfk
GROUP BY t.ordercustomerfk, c.customername, MONTH(t.orderdate), YEAR(t.orderdate)) x ON x.order_month = o.order_month
AND x.order_year = o.order_year
ORDER BY o.order_year DESC, o.order_month DESC
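If window functions are available (MySQL 8.0+; the versions current when this was asked did not have them), another option is to rank customers within each month and keep rank 1. Below is a sketch of that alternative, run through Python's sqlite3 module with invented rows. strftime stands in for DATE_FORMAT, and the Customers join is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, OrderDate TEXT, "
    "OrderTotal REAL, OrderCustomerFK INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-05", 100.0, 7),
        (2, "2010-01-09", 40.0, 8),
        (3, "2010-01-20", 30.0, 8),
        (4, "2010-02-02", 25.0, 7),
        (5, "2010-02-11", 60.0, 9),
    ],
)

rows = conn.execute(
    """
    WITH per_customer AS (
        -- one row per month and customer
        SELECT strftime('%Y-%m', OrderDate) AS ym,
               OrderCustomerFK,
               COUNT(*) AS n_orders,
               SUM(OrderTotal) AS spent
        FROM Orders
        GROUP BY strftime('%Y-%m', OrderDate), OrderCustomerFK
    ),
    ranked AS (
        -- month totals and the customer's rank within the month
        SELECT ym, OrderCustomerFK, spent,
               SUM(n_orders) OVER (PARTITION BY ym) AS total_orders,
               SUM(spent)    OVER (PARTITION BY ym) AS total_amount,
               ROW_NUMBER()  OVER (PARTITION BY ym ORDER BY spent DESC) AS rn
        FROM per_customer
    )
    SELECT ym, total_orders, total_amount, OrderCustomerFK, spent
    FROM ranked
    WHERE rn = 1
    ORDER BY ym DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Because the month totals are computed over the per-customer rows and not over a join, they are not inflated, and each month returns exactly one best customer.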