SQL: select, sort and join tables


For example, I have a table 'orders' with columns ID, order_date, order_price. I need to sort the rows with past dates in descending order and the rows with future dates in ascending order.
For past dates it would be:
SELECT * FROM orders WHERE order_date < CURRENT_DATE() ORDER BY order_date DESC
For future dates it would be:
SELECT * FROM orders WHERE order_date >= CURRENT_DATE() ORDER BY order_date ASC
How can I combine these queries into one?

Assuming MySQL 8 or later (current_date() is a MySQL function), you can use the row_number window function:
select results.* from (
SELECT 0 - row_number() over (order by order_date asc) as RowNum
, orders.* FROM orders WHERE order_date < CURRENT_DATE()
union all
SELECT row_number() over (order by order_date asc) as RowNum
, orders.* FROM orders WHERE order_date >= CURRENT_DATE()
) results
order by results.RowNum asc
The row numbers for past orders are negated, so the first window numbers them by ascending date; the outer ascending sort on RowNum then lists past dates newest first, followed by future dates oldest first.

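Since the DBMS wasn't specified, I'm assuming it supports some DATEDIFF function. I'm sure there's a way to handle this in one clause, but I can't think of it.
ORDER
BY CASE WHEN current_date < order_date THEN 0
ELSE 1
END,
CASE WHEN current_date >= order_date THEN DATEDIFF('day',current_date,order_date)*-1
ELSE DATEDIFF('day',current_date,order_date)
END
This assumes DATEDIFF('day', a, b) returns b minus a in days, so the second key is the non-negative distance from today: future orders come first, nearest first (ascending dates), then past orders, most recent first (descending dates).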

I would use:
order by (case when order_date < current_date() then 1 else 2 end),
(case when order_date < current_date() then order_date end) desc,
order_date asc
The first key explicitly puts the older records first. The second explicitly orders them by descending date. The third orders the rest by ascending date.
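To see the three sort keys at work, here is a minimal sketch that runs the same ORDER BY through Python's sqlite3 module. The sample rows are made up, and "today" is pinned to a fixed, hypothetical date (2024-06-15) instead of current_date() so the result is reproducible; SQLite stands in for MySQL here, which is an assumption of the example.

```python
import sqlite3

# Hypothetical sample data; TODAY replaces current_date() for reproducibility.
TODAY = "2024-06-15"
rows = [
    (1, "2024-06-10", 10.0),
    (2, "2024-06-20", 20.0),
    (3, "2024-06-14", 30.0),
    (4, "2024-06-15", 40.0),
    (5, "2024-06-01", 50.0),
    (6, "2024-07-01", 60.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, order_price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

query = """
SELECT order_date
FROM orders
ORDER BY (CASE WHEN order_date < :today THEN 1 ELSE 2 END),        -- past rows first
         (CASE WHEN order_date < :today THEN order_date END) DESC, -- past rows: newest first
         order_date ASC                                             -- remaining rows: oldest first
"""
ordered_dates = [r[0] for r in conn.execute(query, {"today": TODAY})]
print(ordered_dates)
```

For rows that are not in the past, the second key is NULL for every row, so it does not affect their relative order and the third key decides.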

Related

How to create a Postgres query that will generate a series with calculated values

I have this query that I use to calculate returning customers (with more than one order)
SELECT COUNT(*)
FROM (SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE shop_id = #{shop_id}
AND orders.active = true
AND orders.created_at >= '#{from}'
AND orders.created_at < '#{to}'
GROUP BY customer_id
HAVING COUNT(orders) > 1
ORDER BY order_count) src;
And if I want new customers (that have only one order) I simply change this line:
HAVING COUNT(orders) = 1
Now, how can I generate a series between 2 given dates that will give me the number of new and returning customers for each day between the dates?
Expected result:
date        new  returning
2022-01-01  2    3
2022-01-02  5    9
I have tried this, but it doesn't work at all (I get a syntax error near "from") and I'm not sure how to fix it. Any ideas?
select *, return_customers
from (select created_at, count(*) as order_count
from orders
where shop_id = 43
and created_at >= '2022-07-01'
and created_at < '2022-07-10'
group by customer_id
having count(orders) > 1
order by order_count) as return_customers from generate_series(timestamp '2007-01', timestamp '2022-07-11', interval '1 day')
as g(created_at)
left join (
select created_at::date,
count(*) as order_count
from orders
where shop_id 43
and created_at >= '2022-07-01'
and created_at < '2022-07-10'
group by customer_id
having count(orders) > 1
order by order_count
group by 1) o using (created_at)) sub
order by created_at desc;

This is based on your initial query, without the HAVING clause and using conditional counts with FILTER. The ORDER BY in src is redundant too.
SELECT src.order_date as "date",
COUNT(*) filter (where order_count > 1) as "returning",
COUNT(*) filter (where order_count = 1) as "new"
FROM
(
SELECT date_trunc('day', o.created_at)::date as order_date,
COUNT(*) as order_count
FROM orders o
WHERE o.shop_id = #{shop_id}
AND o.active
AND o.created_at >= '#{from}'
AND o.created_at < '#{to}'
GROUP BY o.customer_id, order_date
) as src
group by order_date;
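As a rough illustration of the conditional counts, the sketch below runs an equivalent query through Python's sqlite3 module (SQLite 3.30 or later also supports FILTER on aggregates). The table contents are invented, the shop_id and active filters are left out for brevity, and SQLite's date() replaces date_trunc; all of that is an assumption of the example. As in the query above, "returning" means more than one order by the same customer on that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2022-01-01 09:00:00"),
        (1, "2022-01-01 17:30:00"),  # customer 1 orders twice on the same day
        (2, "2022-01-01 10:00:00"),
        (3, "2022-01-02 11:00:00"),
        (2, "2022-01-02 12:00:00"),
    ],
)

query = """
SELECT order_date,
       COUNT(*) FILTER (WHERE order_count = 1) AS new_customers,
       COUNT(*) FILTER (WHERE order_count > 1) AS returning_customers
FROM (
    -- one row per customer per day, with that customer's order count for the day
    SELECT date(created_at) AS order_date, customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id, date(created_at)
) AS src
GROUP BY order_date
ORDER BY order_date
"""
daily_counts = conn.execute(query).fetchall()
print(daily_counts)
```

Days without any orders do not appear in this result; producing them would still need a left join from a generated date series, as the question attempted.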

Find rows with similar date values

I want to find customers for whom, for example, the system registered duplicate orders by mistake.
It's pretty easy if reg_date is EXACTLY the same, but I have no idea how to write a query that also counts orders as duplicates when, for example, there is up to 1 second difference between transactions.
select * from
(select customer_id, reg_date, count(*) as cnt
from orders
group by 1,2
) x where cnt > 1
Here is example dataset:
https://www.db-fiddle.com/f/m6PhgReSQbVWVZhqe8n4mi/0
Currently only customer 104's orders are counted as duplicates because their reg_date is identical. I also want to count orders 1, 2 and 4, 5, as there's just 1 second difference.

SELECT
customer_id,
reg_date
FROM (
SELECT
*,
reg_date - lag(reg_date) OVER (PARTITION BY customer_id ORDER BY reg_date) <= interval '1 second' as is_duplicate
FROM
orders
) s
WHERE is_duplicate
Use the lag() window function. It lets you look at the previous record. With this value you can compute the difference and keep the records where the time difference is at most one second.
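The same idea can be tried out with Python's sqlite3 module. This is only a sketch: the rows are invented, and because SQLite has no interval type, the timestamps are converted to Unix seconds with strftime('%s', ...) before subtracting, which is an adaptation for this example rather than the PostgreSQL syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, reg_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 101, "2022-03-01 10:00:00"),
        (2, 101, "2022-03-01 10:00:01"),  # 1 second after order 1
        (3, 101, "2022-03-01 12:00:00"),
        (4, 102, "2022-03-02 08:00:00"),
        (5, 102, "2022-03-02 08:00:00"),  # identical timestamp
        (6, 103, "2022-03-03 09:00:00"),
    ],
)

query = """
SELECT id
FROM (
    SELECT id,
           CAST(strftime('%s', reg_date) AS INTEGER)
             - lag(CAST(strftime('%s', reg_date) AS INTEGER))
               OVER (PARTITION BY customer_id ORDER BY reg_date, id) AS seconds_since_previous
    FROM orders
) AS s
WHERE seconds_since_previous <= 1   -- NULL for a customer's first order, so it is skipped
ORDER BY id
"""
duplicate_ids = [r[0] for r in conn.execute(query)]
print(duplicate_ids)
```

Note that this returns only the later row of each near-duplicate pair; the first order of a customer has no previous row to compare against.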

Try the following script. It returns duplicates per day and customer.
SELECT
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy') reg_date,
customer_id,
count(*) as cnt
FROM orders
GROUP BY
TO_CHAR(reg_date :: DATE, 'dd/mm/yyyy'),
customer_id
HAVING count(*) >1

Same output in two different lateral joins

I'm working on a bit of PostgreSQL to grab the first 10 and last 10 invoices of every month between certain dates. I am having unexpected output in the lateral joins. Firstly the limit is not working, and each of the array_agg aggregates is returning hundreds of rows instead of limiting to 10. Secondly, the aggregates appear to be the same, even though one is ordered ASC and the other DESC.
How can I retrieve only the first 10 and last 10 invoices of each month group?
SELECT first.invoice_month,
array_agg(first.id) first_ten,
array_agg(last.id) last_ten
FROM public.invoice i
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id ASC
LIMIT 10
) first ON i.id = first.id
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id DESC
LIMIT 10
) last on i.id = last.id
WHERE i.invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
GROUP BY first.invoice_month, last.invoice_month;

This can be done with a recursive query that generates the range of months for which we need to find the first and last 10 invoices.
WITH RECURSIVE all_months AS (
SELECT date_trunc('month','2018-01-01'::TIMESTAMP) as c_date, date_trunc('month', '2018-05-11'::TIMESTAMP) as end_date, to_char('2018-01-01'::timestamp, 'YYYY-MM') as current_month
UNION
SELECT c_date + interval '1 month' as c_date,
end_date,
to_char(c_date + INTERVAL '1 month', 'YYYY-MM') as current_month
FROM all_months
WHERE c_date + INTERVAL '1 month' <= end_date
),
invocies_with_month as (
SELECT *, to_char(invoice_date::TIMESTAMP, 'YYYY-MM') invoice_month FROM invoice
)
SELECT current_month, array_agg(first_10.id), 'FIRST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date ASC limit 10
) first_10 ON TRUE
GROUP BY current_month
UNION
SELECT current_month, array_agg(last_10.id), 'LAST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date DESC limit 10
) last_10 ON TRUE
GROUP BY current_month;
In the code above, '2018-01-01' and '2018-05-11' are the dates between which we want to find the invoices. Based on those dates, we generate the months (2018-01, 2018-02, 2018-03, 2018-04, 2018-05) that we need to find the invoices for.
We store this data in all_months.
After we get the months, we do a lateral join to attach the invoices for every month. We need 2 lateral joins to get the first and last 10 invoices.
Finally, the result is represented as:
current_month - the month
array_agg - IDs of all selected invoices for that month
type - type of the selected invoices ('FIRST 10' or 'LAST 10').
So in the current implementation, you will have 2 rows for each month (if there is at least 1 invoice for that month). You can easily combine them into one row if you need to.

LIMIT is working fine. It's your query that's broken. JOIN is just 100% the wrong tool here; it doesn't even do anything close to what you need. By joining up to 10 rows with up to another 10 rows, you get up to 100 rows back. There's also no reason to self join just to combine filters.
Consider instead window queries. In particular, we have the dense_rank function, which can number every row in the result set according to groups:
SELECT
invoice_month,
time_of_month,
ARRAY_AGG(id) invoice_ids
FROM (
SELECT
id,
invoice_month,
-- Categorize as end or beginning of month
CASE
WHEN month_rank <= 10 THEN 'beginning'
WHEN month_reverse_rank <= 10 THEN 'end'
ELSE 'bug' -- Should never happen. Just a fall back in case of a bug.
END AS time_of_month
FROM (
SELECT
id,
invoice_month,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date) month_rank,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date DESC) month_reverse_rank
FROM (
SELECT
id,
invoice_date,
to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) AS fiscal_year_invoices
) ranked_invoices
-- Get first and last 10
WHERE month_rank <= 10 OR month_reverse_rank <= 10
) first_and_last_by_month
GROUP BY
invoice_month,
time_of_month
Don't be intimidated by the length. This query is actually very straightforward; it just needed a few subqueries.
This is what it does logically:
Fetch the rows for the fiscal year in question
Assign a "rank" to the row within its month, both counting from the beginning and from the end
Filter out everything that doesn't rank in the 10 top for its month (counting from either direction)
Add an indicator as to whether it was at the beginning or end of the month. (Note that if there are fewer than 20 rows in a month, it will categorize more of them as "beginning".)
Aggregate the IDs together
This is the tool set designed for the job you're trying to do. If really needed, you can adjust this approach slightly to get them into the same row, but you have to aggregate before joining the results together and then join on the month; you can't join and then aggregate.
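The ranking approach can be checked on a small scale with Python's sqlite3 module. This is a sketch under stated assumptions: the invoices are invented, N = 2 stands in for the 10 in the query above, SQLite's strftime replaces to_char, and the final grouping is done in Python instead of ARRAY_AGG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, invoice_date TEXT)")
# Five invoices in January and three in February 2018, one per day.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(i, f"2018-01-{i:02d}") for i in range(1, 6)]
    + [(10 + i, f"2018-02-{i:02d}") for i in range(1, 4)],
)

N = 2  # stands in for the 10 used in the answer
query = """
SELECT invoice_month,
       CASE WHEN month_rank <= :n THEN 'beginning' ELSE 'end' END AS time_of_month,
       id
FROM (
    SELECT id,
           strftime('%Y-%m', invoice_date) AS invoice_month,
           dense_rank() OVER (PARTITION BY strftime('%Y-%m', invoice_date)
                              ORDER BY invoice_date) AS month_rank,
           dense_rank() OVER (PARTITION BY strftime('%Y-%m', invoice_date)
                              ORDER BY invoice_date DESC) AS month_reverse_rank
    FROM invoice
) AS ranked_invoices
WHERE month_rank <= :n OR month_reverse_rank <= :n
ORDER BY invoice_month, id
"""

# Collect the selected ids per (month, beginning/end) group.
groups = {}
for month, label, invoice_id in conn.execute(query, {"n": N}):
    groups.setdefault((month, label), []).append(invoice_id)
print(groups)
```

February has fewer than 2 * N invoices, so invoice 12 qualifies under both ranks and is labelled "beginning", which is the effect described in the note above.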

Postgresql: How to use a WITH subquery with JOIN

I have 2 tables: orders and contragents. Each contragent might have many orders. Each order has an order_date. I want to get the first order date for each contragent, but with a caveat: if there was a gap of more than 180 days between orders, I need to "forget" the orders before the gap (and thus the first order after the gap is considered "the first").
For this, I've implemented the following statement:
with o1 as (
select order_date, lag(order_date) over(order by order_date ASC) as prev_order_date
from orders o
where o.contragent_code = :code
order by order_date desc)
select o1.order_date from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
limit 1
This returns a single value for the contragent with code :code, which is what I need.
But I cannot figure out how to run a select that would return this date for every contragent in the table!
The only way I was able to do it was with CREATE FUNCTION, but I won't be able to do that in production, so any advice is highly appreciated!

You want to add partition by, which is kinda like group by for over.
with o1 as (
select contragent_code, order_date, lag(order_date) over(partition by contragent_code order by order_date ASC) as prev_order_date
from orders o
order by order_date desc)
select o1.contragent_code, o1.order_date from o1
where extract(day from o1.order_date-o1.prev_order_date)>=180 or o1.prev_order_date is null
order by order_date desc
Now lag looks for the previous order_date among rows with the same contragent_code.
UPDATE: in the end, it turned out that this was not quite enough. This is the final statement:
with s as (
select o.contragent_code, o.order_date,
case
when
extract(day from order_date-lag(order_date) over(partition by contragent_code order by order_date asc))>=180
then o.order_date else null
end as date_with_gap
from orders o
) select contragent_code, coalesce(max(date_with_gap), min(order_date)) from s
group by contragent_code
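To check the logic of the final statement, here is a small sketch using Python's sqlite3 module. The data is made up, and the day difference is computed with SQLite's julianday() instead of extract(day from ...), which is an adaptation for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (contragent_code TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("A", "2020-01-01"),
        ("A", "2020-02-01"),
        ("A", "2021-01-15"),  # more than 180 days after the previous order
        ("A", "2021-02-01"),
        ("B", "2021-03-01"),
        ("B", "2021-04-01"),  # no gap for B
    ],
)

query = """
WITH s AS (
    SELECT contragent_code,
           order_date,
           CASE
               WHEN julianday(order_date)
                    - lag(julianday(order_date)) OVER (PARTITION BY contragent_code
                                                       ORDER BY order_date) >= 180
               THEN order_date
           END AS date_with_gap   -- set only on the first order after a long gap
    FROM orders
)
SELECT contragent_code,
       coalesce(max(date_with_gap), min(order_date)) AS first_order_date
FROM s
GROUP BY contragent_code
ORDER BY contragent_code
"""
first_orders = conn.execute(query).fetchall()
print(first_orders)
```

Contragent A gets the first order after its latest long gap, while B, which has no gap, falls back to its earliest order through coalesce.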

Sql query needs to sort on multiple date columns together

I have a table with three date fields, a start date, mid-term date, and end date. I would like to create a single query to get the most recent activity from the table. Activity in this case being when the date fields are updated.
Can I do this in one query, without having to write 3 separate queries and then combine the values in my code to get the 10 most recent activities? Right now I have:
SELECT TOP 10 * FROM core_table
ORDER BY [start_date] Desc
SELECT TOP 10 * FROM core_table
ORDER BY [process_date] Desc
SELECT TOP 10 * FROM core_table
ORDER BY [archive_date] Desc
So I would want to pull the results of those three queries together to get the top 10 entries based on all three dates.

Based on the answer given by Itiong_sh, though not exactly the same: you can do it in ORDER BY.
select top 10 * from core_table
order by
CASE
WHEN start_date >= process_date AND start_date >= archive_date
THEN start_date
WHEN process_date >= archive_date
THEN process_date
ELSE archive_date
END
DESC
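A quick way to try this ordering is with Python's sqlite3 module. The sketch below uses invented rows and SQLite's LIMIT in place of SQL Server's TOP; both are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core_table (id INTEGER, start_date TEXT, process_date TEXT, archive_date TEXT)"
)
conn.executemany(
    "INSERT INTO core_table VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-01", "2023-01-05", "2023-03-01"),  # latest: archive_date
        (2, "2023-04-01", "2023-02-01", "2023-01-15"),  # latest: start_date
        (3, "2023-01-10", "2023-02-20", "2023-02-10"),  # latest: process_date
        (4, "2022-12-01", "2022-12-02", "2022-12-03"),
    ],
)

query = """
SELECT id
FROM core_table
ORDER BY CASE
             WHEN start_date >= process_date AND start_date >= archive_date THEN start_date
             WHEN process_date >= archive_date THEN process_date
             ELSE archive_date
         END DESC   -- sort by the latest of the three dates, newest first
LIMIT 3
"""
top_ids = [r[0] for r in conn.execute(query)]
print(top_ids)
```

The CASE expression assumes all three date columns are non-NULL; a NULL makes the comparisons unknown and the expression falls through to a later branch.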

I think you need UNION:
SELECT TOP 10
*
FROM
( ( SELECT TOP 10
*, start_date AS activity_date
FROM core_table
ORDER BY [start_date] DESC
)
UNION
( SELECT TOP 10
*, process_date AS activity_date
FROM core_table
ORDER BY [process_date] DESC
)
UNION
( SELECT TOP 10
*, archive_date AS activity_date
FROM core_table
ORDER BY [archive_date] DESC
)
) AS t
ORDER BY activity_date DESC ;

An expansion on Raphaël Althaus' answer:
CREATE TABLE core_table (
...
max_date AS
CASE
WHEN start_date >= process_date AND start_date >= archive_date
THEN start_date
WHEN process_date >= archive_date
THEN process_date
ELSE archive_date
END
);
CREATE INDEX core_table_ie1 ON core_table (max_date);
Then, you can simply...
SELECT TOP 10 *
FROM core_table
ORDER BY max_date DESC;
...and it should use the index range scan instead of a full table scan.
