Looping through table to create a new table in SQL jointly with group by Postgres - sql

Suppose a table has the following structure
product | day | transactionid | saleprice |
------------------------------------------------ |
Apple | 1 | 239849248 | 10 |
Apple | 2 | 239834328 | 10 |
Apple | 2 | 239849249 | 10 |
Apple | 3 | 239849234 | 11 |
Banana | 1 | 239843244 | 2 |
Banana | 2 | 239843244 | 2 |
Banana | 3 | 239843244 | 3 |
Banana | 4 | 239843244 | 3 |
Orange | 1 | 239234238 | 25 |
Orange | 2 | 239234238 | 25 |
Orange | 3 | 239234238 | 25 |
Orange | 3 | 239234238 | 26 |
Orange | 3 | 239234238 | 26 |
Orange | 4 | 239234238 | 27 |
A number of products are sold every day, with multiple transactions at different prices. For each product, I am interested in a change-log of Min(SalePrice) (a changelog because this rarely changes in my data). The following query gives me the daily minimum for a particular product (say Orange):
SELECT max(product), day, min(saleprice)
FROM tableabove
where product = 'Orange'
group by day
order by day asc;
Gives me:
product | day | minsaleprice |
Orange | 1 | 25 |
Orange | 2 | 25 |
Orange | 3 | 25 |
Orange | 4 | 27 |
So I have what I need for a product I specify, but not in the way I need it. For example, for Orange I only need the days when the price changed (plus Day 1), which means it should have only two rows: Day 1 and Day 4. I also do not know how to iterate this over all products in the table to generate a new table that looks as follows.
product | day | minsaleprice |
Apple | 1 | 10 |
Apple | 3 | 11 |
Banana | 1 | 2 |
Banana | 3 | 3 |
Orange | 1 | 25 |
Orange | 4 | 27 |
Any help is appreciated. Thanks.

I think you just want lag():
select t.*
from (select t.*,
lag(saleprice) over (partition by product order by day) as prev_saleprice
from tableabove t
) t
where prev_saleprice is null or prev_saleprice <> saleprice;
EDIT:
If you only want changes day by day, then the same idea works with an additional aggregation:
select t.*
from (select t.product, t.day, min(t.saleprice) as minsaleprice,
lag(min(t.saleprice)) over (partition by t.product order by t.day) as prev_minsaleprice
from tableabove t
group by t.product, t.day
) t
where prev_minsaleprice is null or prev_minsaleprice <> minsaleprice;

Following guidance from Gordon Linoff, I was able to write the query as follows:
SELECT table2.*
FROM (SELECT table1.*, lag(table1.minsaleprice) OVER (PARTITION BY product ORDER BY day) AS prev_price
FROM (SELECT product, day, MIN(saleprice) AS minsaleprice FROM tableabove
GROUP BY day, product)
AS table1)
AS table2
WHERE prev_price IS NULL OR prev_price <> minsaleprice
ORDER BY product, day;
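
To check the changelog idea against the sample data from the question, here is a minimal sketch. It runs the daily-minimum plus lag() approach in an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only so the example is runnable (the thread is about Postgres, and SQLite 3.25 or later is needed for window functions). The CTE names daily and with_prev are illustrative, not from the thread.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableabove "
    "(product TEXT, day INTEGER, transactionid INTEGER, saleprice INTEGER)"
)
# Rows copied from the sample table in the question.
sales = [
    ("Apple", 1, 239849248, 10), ("Apple", 2, 239834328, 10),
    ("Apple", 2, 239849249, 10), ("Apple", 3, 239849234, 11),
    ("Banana", 1, 239843244, 2), ("Banana", 2, 239843244, 2),
    ("Banana", 3, 239843244, 3), ("Banana", 4, 239843244, 3),
    ("Orange", 1, 239234238, 25), ("Orange", 2, 239234238, 25),
    ("Orange", 3, 239234238, 25), ("Orange", 3, 239234238, 26),
    ("Orange", 3, 239234238, 26), ("Orange", 4, 239234238, 27),
]
conn.executemany("INSERT INTO tableabove VALUES (?, ?, ?, ?)", sales)

changelog_sql = """
WITH daily AS (
    -- one row per product and day, carrying that day's minimum price
    SELECT product, day, MIN(saleprice) AS minsaleprice
    FROM tableabove
    GROUP BY product, day
),
with_prev AS (
    -- attach the previous day's minimum within the same product
    SELECT daily.*,
           LAG(minsaleprice) OVER (PARTITION BY product ORDER BY day)
               AS prev_minsaleprice
    FROM daily
)
-- keep the first day of each product and every day the minimum changed
SELECT product, day, minsaleprice
FROM with_prev
WHERE prev_minsaleprice IS NULL OR prev_minsaleprice <> minsaleprice
ORDER BY product, day
"""
changelog = conn.execute(changelog_sql).fetchall()
for row in changelog:
    print(row)
```

The ORDER BY inside the window is what makes "previous" mean "previous day"; without it the row that lag() looks at is not well defined.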

Related

How to get data from previous year?

Here is my sample data.
I need to get the data from the previous period, using lag, into the Hello column.
Could you help me?
+------+--------+------+-------+
| Year | Animal | Plus | Hello |
+------+--------+------+-------+
| 2    | Cat    | 3    |       |
| 2    | Dog    | 4    |       |
| 2    | Mouse  | 5    |       |
| 3    | Cat    | 5    | 3     |
| 3    | Dog    | 6    | 4     |
| 3    | Mouse  | 6    | 5     |
| 3    | Horse  | 6    |       |
| 3    | Pig    | 6    |       |
| 3    | Goose  | 6    |       |
| 4    | Cat    |      | 5     |
| 4    | Dog    |      | 6     |
| 4    | Mouse  |      | 6     |
| 4    | Horse  |      | 6     |
| 4    | Pig    |      | 6     |
+------+--------+------+-------+
You are looking for LAG. This function looks into previous rows.
select
place, year, animal, plus,
lag(plus) over (partition by animal order by year) as hello
from mytable
order by year, animal;
The "previous" row is the closest previous one, i.e. if for 'Goose' there are rows for years 3 and 5 and none for year 4, then year 3 would be considered the previous row for year 5 and LAG would show that value.
If you really want the adjacent previous year, i.e. year - 1, then you can select this year as follows:
select
place, year, animal, plus,
(
select plus
from mytable prev_year
where prev_year.animal = mytable.animal
and prev_year.year = mytable.year - 1
) as hello
from mytable
order by year, animal;
Same thing with an outer join:
select
t.place, t.year, t.animal, t.plus, prev_year.plus as hello
from mytable t
left join mytable prev_year on prev_year.animal = t.animal
and prev_year.year = t.year - 1
order by t.year, t.animal;
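
The difference between "closest previous row" and "exactly year - 1" only shows up when a year is missing. The sketch below illustrates it with a small hypothetical data set (the Goose row for year 5 is invented for the example, and the place column is left out because it does not appear in the sample table). It uses Python's sqlite3 module with an in-memory database, which is an assumption made for runnability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (year INTEGER, animal TEXT, plus INTEGER)")
# Hypothetical rows: Goose has years 3 and 5 but no year 4.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2, "Cat", 3), (3, "Cat", 5), (3, "Goose", 6), (5, "Goose", 7)],
)

# LAG: takes the closest earlier row for the same animal, whatever its year.
lag_rows = conn.execute("""
    SELECT year, animal, plus,
           LAG(plus) OVER (PARTITION BY animal ORDER BY year) AS hello
    FROM mytable
    ORDER BY year, animal
""").fetchall()

# Outer join: only matches a row whose year is exactly one less.
join_rows = conn.execute("""
    SELECT t.year, t.animal, t.plus, prev_year.plus AS hello
    FROM mytable t
    LEFT JOIN mytable prev_year
           ON prev_year.animal = t.animal
          AND prev_year.year = t.year - 1
    ORDER BY t.year, t.animal
""").fetchall()

print(lag_rows)
print(join_rows)
```

For Goose in year 5, the LAG version returns the year 3 value, while the join version returns NULL because there is no year 4 row.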

SQL calculating sum and number of distinct values within group

I want to calculate
(1) total sales amount
(2) number of distinct stores per product
in one query, if possible. Suppose we have data:
+-----------+---------+-------+--------+
| store | product | month | amount |
+-----------+---------+-------+--------+
| Anthill | A | 1 | 1 |
| Anthill | A | 2 | 1 |
| Anthill | A | 3 | 1 |
| Beetle | A | 1 | 1 |
| Beetle | A | 3 | 1 |
| Cockroach | A | 1 | 1 |
| Cockroach | A | 2 | 1 |
| Cockroach | A | 3 | 1 |
| Anthill | B | 1 | 1 |
| Beetle | B | 2 | 1 |
| Cockroach | B | 3 | 1 |
+-----------+---------+-------+--------+
I have tried this with no luck:
select
[product]
,[month]
,[amount]
,cnt_distinct_stores = count(distinct(stores))
from dbo.temp
group by
[product]
,[month]
order by 1,2
Would any combination of a GROUP BY clause with window functions be possible, like SUM(amount) OVER(partition by [product],[month] ORDER BY [month] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?
Try
SELECT product,
SUM(amount),
COUNT(DISTINCT store)
FROM dbo.temp
GROUP BY product
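
As a quick check of the suggested query against the sample data, here is a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server (an assumption made for runnability), the table is named temp rather than dbo.temp, and the column aliases total_amount and cnt_distinct_stores are added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp (store TEXT, product TEXT, month INTEGER, amount INTEGER)"
)
# Rows copied from the sample table in the question.
rows_in = [
    ("Anthill", "A", 1, 1), ("Anthill", "A", 2, 1), ("Anthill", "A", 3, 1),
    ("Beetle", "A", 1, 1), ("Beetle", "A", 3, 1),
    ("Cockroach", "A", 1, 1), ("Cockroach", "A", 2, 1), ("Cockroach", "A", 3, 1),
    ("Anthill", "B", 1, 1), ("Beetle", "B", 2, 1), ("Cockroach", "B", 3, 1),
]
conn.executemany("INSERT INTO temp VALUES (?, ?, ?, ?)", rows_in)

# One row per product: total amount and number of distinct stores.
per_product = conn.execute("""
    SELECT product,
           SUM(amount)           AS total_amount,
           COUNT(DISTINCT store) AS cnt_distinct_stores
    FROM temp
    GROUP BY product
    ORDER BY product
""").fetchall()
print(per_product)
```

Every selected column is either in the GROUP BY or wrapped in an aggregate, which is what the attempt in the question was missing for [amount].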

SELECTing monthly order amounts and item subtotals in a single query with two tables

I have an Orders table in the form:
| id | service_fee_cents | grand_total_cents | created_at |
|----|-------------------|-------------------|---------------|
| 1 | 1400 | 10000 | Jan 21 2018 |
| 2 | 1000 | 10000 | Feb 16 2018 |
| 3 | 500 | 10000 | March 21 2018 |
| 4 | 500 | 10000 | March 20 2018 |
And an Items table in the form
| id | order_id | title | price_cents | quantity |
|----|----------|--------|-------------|----------|
| 1 | 1 | lorem | 2000 | 2 |
| 2 | 1 | ipsum | 2030 | 1 |
| 3 | 2 | pie | 4000 | 4 |
| 4 | 3 | cheese | 6000 | 2 |
| 5 | 3 | burger | 7000 | 1 |
| 6 | 4 | custar | 1000 | 1 |
And I'm trying to run a SQL query to get a result in the form
| month | total_service_fee | total_grand_total | total_subtotal |
|-----------|-------------------|-------------------|----------------|
|2017-11-01 | 42 | 1,610 | 610 |
|2017-12-01 | 30 | 19,912 | 1,912 |
|2018-01-01 | 179 | 1,413 | 413 |
|2018-02-01 | 165 | 2,910 | 910 |
|2018-03-01 | 1,403 | 10,727 | 1,727 |
I've managed to get the first three columns using this query:
SELECT
date_trunc('month', created_at)::date AS month,
SUM(service_fee_cents) / 100 AS total_service_fee,
SUM(grand_total_cents) / 100 AS total_grand_total
FROM orders
GROUP BY month ORDER BY month
How do I get the last one? In the app, I get the sum via the following Ruby code:
order_subtotal = order.items.map{|item| item.price * item.quantity}.reduce(:+)
Which basically takes all the order's items, multiplies price by quantity and adds the results.
This should be a good start:
SELECT Date_trunc('month', created_at) :: DATE AS month,
SUM(service_fee_cents) / 100 AS total_service_fee,
SUM(grand_total_cents) / 100 AS total_grand_total,
SUM(total_subtotal) / 100 AS total_subtotals
FROM orders o
join (SELECT order_id,
SUM(price_cents * quantity) total_subtotal
FROM items i
GROUP BY order_id) i
ON o.id = i.order_id
GROUP BY month
ORDER BY month
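You can get there by joining the Orders table to the Items table and summing the subtotals by month. Note, however, that this join repeats each order row once per item, so SUM(service_fee_cents) and SUM(grand_total_cents) are inflated for any order with more than one item; aggregating the items per order first avoids that. It may also be a somewhat expensive query to run if there are thousands of items in each order like you said.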
SELECT
date_trunc('month', created_at)::date AS month,
SUM(service_fee_cents) / 100 AS total_service_fee,
SUM(grand_total_cents) / 100 AS total_grand_total,
SUM(price_cents * quantity) / 100 AS sub_total
FROM Orders o
JOIN Items i ON i.order_id = o.id
GROUP BY month ORDER BY month
http://sqlfiddle.com/#!15/555a2/1
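
One thing worth checking with the two queries above is whether order-level columns get counted once per order or once per item row. The sketch below runs both shapes on the sample data in an in-memory SQLite database through Python's sqlite3 module (an assumption made for runnability). Because SQLite has no date_trunc, strftime('%Y-%m-01', ...) stands in for it, dates are stored as ISO strings, and amounts are kept in cents rather than divided by 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, service_fee_cents INTEGER, "
    "grand_total_cents INTEGER, created_at TEXT)"
)
conn.execute(
    "CREATE TABLE items (id INTEGER, order_id INTEGER, title TEXT, "
    "price_cents INTEGER, quantity INTEGER)"
)
# Rows copied from the sample tables in the question.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1400, 10000, "2018-01-21"), (2, 1000, 10000, "2018-02-16"),
    (3, 500, 10000, "2018-03-21"), (4, 500, 10000, "2018-03-20"),
])
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "lorem", 2000, 2), (2, 1, "ipsum", 2030, 1), (3, 2, "pie", 4000, 4),
    (4, 3, "cheese", 6000, 2), (5, 3, "burger", 7000, 1), (6, 4, "custar", 1000, 1),
])

# Items are summed per order first, so each order joins to exactly one row.
pre_aggregated = conn.execute("""
    SELECT strftime('%Y-%m-01', o.created_at) AS month,
           SUM(o.service_fee_cents), SUM(o.grand_total_cents),
           SUM(i.total_subtotal)
    FROM orders o
    JOIN (SELECT order_id, SUM(price_cents * quantity) AS total_subtotal
          FROM items GROUP BY order_id) i ON o.id = i.order_id
    GROUP BY month ORDER BY month
""").fetchall()

# Direct join: each order row appears once per item before aggregation.
direct_join = conn.execute("""
    SELECT strftime('%Y-%m-01', o.created_at) AS month,
           SUM(o.service_fee_cents), SUM(o.grand_total_cents),
           SUM(i.price_cents * i.quantity)
    FROM orders o
    JOIN items i ON i.order_id = o.id
    GROUP BY month ORDER BY month
""").fetchall()

print(pre_aggregated)
print(direct_join)
```

On this data the subtotals agree, but the service fee and grand total for January and March come out larger in the direct join, because orders 1 and 3 each have two items.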

For each category (A field), I would like the top ten types (another field) by count

Let's say I have a table that looks like this:
+------------+-------------+-------+
| Category | Type | Count |
+------------+-------------+-------+
| Fruits | Apple | 13 |
| Vegetables | Carrot | 7 |
| Legumes | Kidney Bean | 1 |
| Fruits | Orange | 1 |
| Vegetables | Green | 3 |
| Legumes | Black Bean | 1 |
| Vegetables | Leek | 1 |
| Fruits | Banana | 1 |
| Legumes | Lentil | 1 |
| Fruits | Mango | 1 |
| Fruits | Pinapple | 18 |
| Fruits | Strawberry | 1 |
| Legumes | Flat Bean | 2 |
| Vegetables | Brocolli | 8 |
| Fruits | Rambotan | 1 |
| Fruits | Marang | 15 |
| Vegetables | Cauliflower | 5 |
| Vegetables | Aubergine | 1 |
+------------+-------------+-------+
For each category, I would like the top ten types by count.
Given the table in question is actually millions of rows, if I simply did a select category, type, sum(Count) group by category, type order by category, type then I would get results where the type was not in the top ten.
I'm using postgresql but believe there's likely a more "general" sql way of doing this. Is there?
select Category, Type, Count from (
select your_table.*, row_number() over(partition by Category order by Count desc) as rn
from your_table
) t
where rn <= 10
This gives up to 10 rows for each Category (fewer if the category has fewer types): the ones with the highest Count.
If you want the top 10 results "with ties", use the rank() function instead of row_number().
You can use rank(), row_number(), or dense_rank().
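
To make the row_number() versus rank() difference concrete, here is a minimal sketch on the sample table, using top 2 instead of top 10 so the effect is visible. It runs in an in-memory SQLite database through Python's sqlite3 module (an assumption made for runnability). The Count column is renamed cnt to avoid clashing with the COUNT function, and the row_number() query adds type as a tie-breaker so its output is deterministic; both choices are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (category TEXT, type TEXT, cnt INTEGER)")
# Rows copied from the sample table in the question.
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("Fruits", "Apple", 13), ("Vegetables", "Carrot", 7),
    ("Legumes", "Kidney Bean", 1), ("Fruits", "Orange", 1),
    ("Vegetables", "Green", 3), ("Legumes", "Black Bean", 1),
    ("Vegetables", "Leek", 1), ("Fruits", "Banana", 1),
    ("Legumes", "Lentil", 1), ("Fruits", "Mango", 1),
    ("Fruits", "Pinapple", 18), ("Fruits", "Strawberry", 1),
    ("Legumes", "Flat Bean", 2), ("Vegetables", "Brocolli", 8),
    ("Fruits", "Rambotan", 1), ("Fruits", "Marang", 15),
    ("Vegetables", "Cauliflower", 5), ("Vegetables", "Aubergine", 1),
])

# row_number(): at most 2 rows per category, ties broken by type.
top2_rows = conn.execute("""
    SELECT category, type, cnt FROM (
        SELECT your_table.*,
               ROW_NUMBER() OVER (PARTITION BY category
                                  ORDER BY cnt DESC, type) AS rn
        FROM your_table
    ) t
    WHERE rn <= 2
    ORDER BY category, rn
""").fetchall()

# rank(): tied rows share a rank, so a category can return more than 2 rows.
top2_with_ties = conn.execute("""
    SELECT category, type, cnt FROM (
        SELECT your_table.*,
               RANK() OVER (PARTITION BY category ORDER BY cnt DESC) AS rk
        FROM your_table
    ) t
    WHERE rk <= 2
    ORDER BY category, rk, type
""").fetchall()

print(top2_rows)
print(top2_with_ties)
```

Legumes is the interesting category here: three types are tied on a count of 1, so rank() returns four Legumes rows while row_number() returns two.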

How to return smallest value inside the resultset as a separate column in SQL?

I've been struggling with the following SQL query.
My resultset is now:
| Id | Customer | Sales |
| 1 | 1 | 10 |
| 2 | 1 | 20 |
| 3 | 2 | 30 |
| 4 | 2 | 40 |
What I'd like to do is to add additional column that shows the smallest sale for that customer:
| Id | Customer | Sales | SmallestSale |
| 1 | 1 | 10 | 10 |
| 2 | 1 | 20 | 10 |
| 3 | 2 | 30 | 30 |
| 4 | 2 | 40 | 30 |
As the select query to get those three columns is now rather complex I'd like to avoid subqueries.
Any ideas?
Mika
Assuming your RDBMS supports windowed aggregates:
SELECT Id,
Customer,
Sales,
MIN(Sales) OVER (PARTITION BY Customer) AS SmallestSale
FROM YourTable
select s.Id, s.Customer, s.Sales, sm.SmallestSale
from Sales s
inner join (
select Customer, min(sales) as SmallestSale
from Sales
group by Customer
) sm on s.Customer = sm.Customer
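
The two answers above should return the same rows. A small sketch to confirm that on the sample data, using an in-memory SQLite database through Python's sqlite3 module (an assumption made for runnability; SQLite 3.25 or later is needed for the windowed version). A single table named Sales is used for both queries, which is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Id INTEGER, Customer INTEGER, Sales INTEGER)")
# Rows copied from the sample result set in the question.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 2, 30), (4, 2, 40)],
)

# Windowed aggregate: no subquery, the minimum is computed per Customer partition.
windowed = conn.execute("""
    SELECT Id, Customer, Sales,
           MIN(Sales) OVER (PARTITION BY Customer) AS SmallestSale
    FROM Sales
    ORDER BY Id
""").fetchall()

# Join against a per-customer aggregate: works without window function support.
joined = conn.execute("""
    SELECT s.Id, s.Customer, s.Sales, sm.SmallestSale
    FROM Sales s
    INNER JOIN (
        SELECT Customer, MIN(Sales) AS SmallestSale
        FROM Sales
        GROUP BY Customer
    ) sm ON s.Customer = sm.Customer
    ORDER BY s.Id
""").fetchall()

print(windowed)
print(joined)
```

The windowed form is the one that meets the "avoid subqueries" wish in the question; the join form is the fallback when window functions are not available.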