PostgreSQL: How to write a query for this scenario - sql

I have this below table.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 345 | 144.9 | 2015-09|
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
I need to write a query to show only the bill amount that has most recent transaction date (Date) like below.
+_______+________+__________+________+
|Playid |billid| amount | Date |
+_______+________+__________+________+
|123 | 456 | 200 | 2015-10|
+_______+________+__________+________+
Please help me how do I do it.

MAX(Date) can be used if you want to display only the playid and the most recent date.
However, The issue with what you are trying to do, is that you want to display all the columns. And this where the ranking functions come into play. In this case you can use the row_number function like this:
SELECT PlayId, billid, amount, date
FROM
(
SELECT
PlayId, billid, amount, date,
row_number() over(partition by playid order by date dec) as rn
FROM tablename
) t
where rn = 1
The row_number() over(partition by playid order by date dec) will give each group of playid a ranking number, the first one (the lowest one) will be the one with the most recent date. Then you just need to filter on the row number equal to 1.

Postgres offers distinct on. This is simpler to write and often has the best performance:
select distinct on (playid) t.*
from t
order by playid, order by date desc;

Related

BigQuery for running count of distinct values with a dynamic date-range

We are trying to make a query where we get the sum of unique customers on a specific year-month + the sum of unique customers on the 364 days before the specific date.
For example:
Our customer-table looks like this:
| order_date | customer_unique_id |
| -------- | -------------- |
| 2020-01-01 | tom#email.com |
| 2020-01-01 | daisy#email.com |
| 2019-05-02 | tom#email.com |
In this example we have two customers who ordered on 2020-01-01 and one of them already ordered within the 364-days timeframe.
The desired table should look like this:
| year_month | unique_customers |
| -------- | -------------- |
| 2020-01 | 2 |
We tried multiple solutions, such as partitioning and windows, but nothing seem to work correctly. The tricky part is the uniqueness. We want the look 364 days back but want to do a count distinct on the customers based on that whole period and not based on date/year/month because then we would get duplicates. For example, if you partition by date, year or month tom#email.com would be counted twice instead of once.
The goal of this query is to get insight into the order-frequency (orders divided by customers) over a time period from 12 months.
We work with Google BigQuery.
Hope someone can help us out! :)
Here is a way to achieve your desired results. Note that this query does year-month join in a separate query, and joins it with the rolling 364-day-interval query.
with year_month_distincts as (
select
concat(
cast(extract(year from order_date) as string),
'-',
cast(extract(month from order_date) as string)
) as year_month,
count(distinct customer_id) as ym_distincts
from customer_table
group by 1
)
select x.order_date, x.ytd_distincts, y.ym_distincts from (
select
a. order_date,
(select
count(distinct customer_id)
from customer_table b
where b.order_date between date_sub(a.order_date, interval 364 day) and a.order_date
) as ytd_distincts
from orders a
group by 1
) x
join year_month_distincts y on concat(
cast(extract(year from x.order_date) as string),
'-',
cast(extract(month from x.order_date) as string)
) = y.year_month
Two options using arrays that may help.
Look back 364 days as requested
In case you wish to look back 11 months (given reporting is monthly)
month_array AS (
SELECT
DATE_TRUNC(order_date,month) AS order_month,
STRING_AGG(DISTINCT customer_unique_id) AS cust_mth
FROM customer_table
GROUP BY 1
),
year_array AS (
SELECT
order_month,
STRING_AGG(cust_mth) OVER(ORDER by UNIX_DATE(order_month) RANGE BETWEEN 364 PRECEDING AND CURRENT ROW) cust_12m
-- (option 2) STRING_AGG(cust_mth) OVER (ORDER by cast(format_date('%Y%m', order_month) as int64) RANGE BETWEEN 99 PRECEDING AND CURRENT ROW) AS cust_12m
FROM month_array
)
SELECT format_date('%Y-%m',order_month) year_month,
(SELECT COUNT(DISTINCT cust_unique_id) FROM UNNEST(SPLIT(cust_12m)) AS cust_unique_id) as unique_12m
FROM year_array

Counting number of orders per customer

I have a table with the following columns: date, customers_id, and orders_id (unique).
I want to addd a column in which, for each order_id, I can see how many times the given customer has already placed an order during the previous year.
e.g. this is what it would look like:
customers_id | orders_id | date | order_rank
2083 | 4725 | 2018-08-314 | 1
2573 | 4773 | 2018-09-035 | 1
3393 | 3776 | 2017-09-11 | 1
3393 | 4172 | 2018-01-09 | 2
3393 | 4655 | 2018-08-17 | 3
I'm doing this in BigQuery, thank you!
Use count(*) with a window frame. Ideally, you would use an interval. But BigQuery doesn't (yet) support that syntax. So convert to a number:
select t.*,
count(*) over (partition by customer_id
order by unix_date(date)
range between 364 preceding and current row
) as order_rank
from t;
This treats a year as 365 days, which seems suitable for most purposes.
I suggest that you use the over clause and restrict the data in your where clause. You don't really need a window for your case. If you consider one your a period from 365 days in the past until now, this is gonna work:
select t.*,
count(*) over (partition by customer_id
order by date
) as c
from `your-table` t
where date > DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
order by customer_id, c
If you need some specific year, for example 2019, you can do something like:
select t.*,
count(*) over (partition by customer_id
order by date
) as c
from `your-table` t
where date between cast("2019-01-01" as date) and cast("2019-12-31" as date)
order by customer_id, c

SQL to find max of sum of data in one table, with extra columns

Apologies if this has been asked elsewhere. I have been looking on Stackoverflow all day and haven't found an answer yet. I am struggling to write the query to find the highest month's sales for each state from this example data.
The data looks like this:
| order_id | month | cust_id | state | prod_id | order_total |
+-----------+--------+----------+--------+----------+--------------+
| 67212 | June | 10001 | ca | 909 | 13 |
| 69090 | June | 10011 | fl | 44 | 76 |
... etc ...
My query
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders GROUP BY `month`, `state`
ORDER BY sales;
| month | state | sales |
+------------+--------+--------+
| September | wy | 435 |
| January | wy | 631 |
... etc ...
returns a few hundred rows: the sum of sales for each month for each state. I want it to only return the month with the highest sum of sales, but for each state. It might be a different month for different states.
This query
SELECT `state`, MAX(order_sum) as topmonth
FROM (SELECT `state`, SUM(order_total) order_sum FROM orders GROUP BY `month`,`state`)
GROUP BY `state`;
| state | topmonth |
+--------+-----------+
| ca | 119586 |
| ga | 30140 |
returns the correct number of rows with the correct data. BUT I would also like the query to give me the month column. Whatever I try with GROUP BY, I cannot find a way to limit the results to one record per state. I have tried PartitionBy without success, and have also tried unsuccessfully to do a join.
TL;DR: one query gives me the correct columns but too many rows; the other query gives me the correct number of rows (and the correct data) but insufficient columns.
Any suggestions to make this work would be most gratefully received.
I am using Apache Drill, which is apparently ANSI-SQL compliant. Hopefully that doesn't make much difference - I am assuming that the solution would be similar across all SQL engines.
This one should do the trick
SELECT t1.`month`, t1.`state`, t1.`sales`
FROM (
/* this one selects month, state and sales*/
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
) AS t1
JOIN (
/* this one selects the best value for each state */
SELECT `state`, MAX(sales) AS best_month
FROM (
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
)
GROUP BY `state`
) AS t2
ON t1.`state` = t2.`state` AND
t1.`sales` = t2.`best_month`
It's basically the combination of the two queries you wrote.
Try this:
SELECT `month`, `state`, SUM(order_total) FROM orders WHERE `month` IN
( SELECT TOP 1 t.month FROM ( SELECT `month` AS month, SUM(order_total) order_sum FROM orders GROUP BY `month`
ORDER BY order_sum DESC) t)
GROUP BY `month`, state ;

PostgreSQL return multiple rows with DISTINCT though only latest date per second column

Lets says I have the following database table (date truncated for example only, two 'id_' preix columns join with other tables)...
+-----------+---------+------+--------------------+-------+
| id_table1 | id_tab2 | date | description | price |
+-----------+---------+------+--------------------+-------+
| 1 | 11 | 2014 | man-eating-waffles | 1.46 |
+-----------+---------+------+--------------------+-------+
| 2 | 22 | 2014 | Flying Shoes | 8.99 |
+-----------+---------+------+--------------------+-------+
| 3 | 44 | 2015 | Flying Shoes | 12.99 |
+-----------+---------+------+--------------------+-------+
...and I have a query like the following...
SELECT id, date, description FROM inventory ORDER BY date ASC;
How do I SELECT all the descriptions, but only once each while simultaneously only the latest year for that description? So I need the database query to return the first and last row from the sample data above; the second it not returned because the last row has a later date.
Postgres has something called distinct on. This is usually more efficient than using window functions. So, an alternative method would be:
SELECT distinct on (description) id, date, description
FROM inventory
ORDER BY description, date desc;
The row_number window function should do the trick:
SELECT id, date, description
FROM (SELECT id, date, description,
ROW_NUMBER() OVER (PARTITION BY description
ORDER BY date DESC) AS rn
FROM inventory) t
WHERE rn = 1
ORDER BY date ASC;

SQL to find the date when the price last changed

Input:
Date Price
12/27 5
12/21 5
12/20 4
12/19 4
12/15 5
Required Output:
The earliest date when the price was set in comparison to the current price.
For e.g., price has been 5 since 12/21.
The answer cannot be 12/15 as we are interested in finding the earliest date where the price was the same as the current price without changing in value(on 12/20, the price has been changed to 4)
This should be about right. You didn't provide table structures or names, so...
DECLARE #CurrentPrice MONEY
SELECT TOP 1 #CurrentPrice=Price FROM Table ORDER BY Date DESC
SELECT MIN(Date) FROM Table WHERE Price=#CurrentPrice AND Date>(
SELECT MAX(Date) FROM Table WHERE Price<>#CurrentPrice
)
In one query:
SELECT MIN(Date)
FROM Table
WHERE Date >
( SELECT MAX(Date)
FROM Table
WHERE Price <>
( SELECT TOP 1 Price
FROM Table
ORDER BY Date DESC
)
)
This question kind of makes no sense so im not 100% sure what you are after.
create four columns, old_price, new_price, old_date, new_date.
! if old_price === new_price, simply print the old_date.
What database server are you using? If it was Oracle, I would use their windowing function. Anyway, here is a quick version that works in mysql:
Here is the sample data:
+------------+------------+---------------+
| date | product_id | price_on_date |
+------------+------------+---------------+
| 2011-01-01 | 1 | 5 |
| 2011-01-03 | 1 | 4 |
| 2011-01-05 | 1 | 6 |
+------------+------------+---------------+
Here is the query (it only works if you have 1 product - will have to add a "and product_id = ..." condition on the where clause if otherwise).
SELECT p.date as last_price_change_date
FROM test.prices p
left join test.prices p2 on p.product_id = p2.product_id and p.date < p2.date
where p.price_on_date - p2.price_on_date <> 0
order by p.date desc
limit 1
In this case, it will return "2011-01-03".
Not a perfect solution, but I believe it works. Have not tested on a larger dataset, though.
Make sure to create indexes on date and product_id, as it will otherwise bring your database server to its knees and beg for mercy.
Bernardo.