SQL select specific group from table - sql

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name when last (by trade_date) trade_status was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
Although I can get the desired output with a nested select:
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE 'open' =
(SELECT t2.trade_status FROM trades t2
WHERE t2.seller_name = t1.seller_name
ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
But the query above takes more than 1 minute to execute (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.

I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
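
As a rough check of the window-function approach, here is a minimal sketch that loads the sample rows from the question and runs the same idea. It assumes an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in for PostgreSQL, and it filters on = 'open' rather than <> 'close', which is equivalent for this sample data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER, trade_date TEXT, trade_price INTEGER, "
    "trade_status TEXT, seller_name TEXT)"
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-01-02", 150, "open", "Alex"),
        (2, "2015-03-04", 500, "close", "John"),
        (3, "2015-04-02", 850, "close", "Otabek"),
        (4, "2015-05-02", 150, "close", "Alex"),
        (5, "2015-06-02", 100, "open", "Otabek"),
        (6, "2015-07-02", 200, "open", "John"),
    ],
)

# Tag every row with the status of the seller's most recent trade,
# then keep only sellers whose most recent trade is 'open'.
query = """
SELECT SUM(t.trade_price) AS sum_trade_price, t.seller_name
FROM (SELECT tr.*,
             FIRST_VALUE(trade_status) OVER (
                 PARTITION BY seller_name ORDER BY trade_date DESC
             ) AS last_trade_status
      FROM trades tr) t
WHERE t.last_trade_status = 'open'
GROUP BY t.seller_name
ORDER BY t.seller_name
"""
result = conn.execute(query).fetchall()
print(result)  # [(700, 'John'), (950, 'Otabek')]
```

Alex is dropped because his latest trade (2015-05-02) is 'close', which matches the expected output in the question.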

This should perform reasonably with an index on seller_name:
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name

Related

Find last job change date with JOB_TITLE and EVENT_DATE

Hi, I am working in Azure Databricks and I am looking for a SQL query solution.
Assume that my table has four columns:
ID EVENT_DATE JOB_TITLE PAY
12345 2021-01-01 VP1 100,000
12345 2020-01-10 VP1 90,000
12345 2019-01-20 Analyst1 80,000
12346 2021-02-01 VP2 200,000
12346 2020-02-10 Analyst2 150,000
12346 2020-01-20 Analyst2 110,000
Basically I want the EVENT_DATE when JOB_TITLE changed the last time. This is my desired output:
ID JOB_TITLE PAY LAST_JOB_CHANGE_DATE
12345 VP1 90,000 2020-01-10
12346 VP2 200,000 2021-02-01
For the last column LAST_JOB_CHANGE_DATE, we are pulling from the 2nd and 4th row of the table because that's the date when they changed job the last time.
Thank you!
You can just use an INNER JOIN to accomplish that, i.e.:
%sql
SELECT a.*
FROM yourTable a
INNER JOIN
(
SELECT id, MAX(event_date) event_date
FROM yourTable b
GROUP BY id
) b ON a.id = b.id
AND a.event_date = b.event_date
The ROW_NUMBER approach would also work well:
%sql
WITH cte AS
(
SELECT
ROW_NUMBER() OVER( PARTITION BY id ORDER BY event_date DESC ) AS rn,
*
FROM yourTable a
)
SELECT *
FROM cte
WHERE rn = 1
There's probably a simpler solution for this but the following should work.
I'm assuming you wanted the most recent job change for each employee. To illustrate this, I added an extra row for an Engineer1. The ROW_NUMBER() window function helps us with this.
ID EVENT_DATE JOB_TITLE PAY
12345 2021-01-01 VP1 100,000
12345 2020-01-10 VP1 90,000
12345 2019-01-20 Analyst1 80,000
12345 2018-01-04 Engineer1 75,000
12346 2021-02-01 VP2 200,000
12346 2020-02-10 Analyst2 150,000
12346 2020-01-20 Analyst2 110,000
Here is the query:
SELECT -- (4)
c.ID,
c.JOB_TITLE,
c.PAY,
c.last_job_change_date
FROM
(
SELECT -- (3)
b.ID,
ROW_NUMBER() OVER (PARTITION BY b.ID ORDER BY b.last_job_change_date DESC) AS row_id,
b.JOB_TITLE,
b.PAY,
b.last_job_change_date
FROM
(
SELECT -- (2)
a.ID,
a.JOB_TITLE,
a.PAY,
a.EVENT_DATE as last_job_change_date
FROM
(
SELECT -- (1)
ID,
EVENT_DATE,
PAY,
JOB_TITLE,
LEAD(JOB_TITLE, 1) OVER (
PARTITION BY ID ORDER BY EVENT_DATE DESC) job_change
FROM yourtable
) a
WHERE JOB_TITLE <> job_change
) b
) c
WHERE row_id = 1
I used a 4-step process and annotated the query with each step:
(1) Returns a table with an extra column holding the job title from the next row when each employee's rows are ordered by most recent date first (that is, the employee's previous job title).
(2) Returns the table from (1) but keeps only the rows where the job title differs from the previous one, i.e. removes rows where the employee did not change jobs.
(3) Adds row numbers so we can pick the most recent job change of each employee.
(4) Returns the most recent job change for each employee.
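
The four steps can be sketched in a runnable form as follows. This is only an illustration under some assumptions: it uses an in-memory SQLite database (3.25 or newer) instead of Databricks, stores PAY as plain integers, and folds the nested subqueries into CTEs with the hypothetical names titled and changes.

```python
import sqlite3

# Assumption: SQLite stands in for Databricks SQL; PAY is stored without commas.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (ID INTEGER, EVENT_DATE TEXT, JOB_TITLE TEXT, PAY INTEGER)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (12345, "2021-01-01", "VP1", 100000),
        (12345, "2020-01-10", "VP1", 90000),
        (12345, "2019-01-20", "Analyst1", 80000),
        (12345, "2018-01-04", "Engineer1", 75000),
        (12346, "2021-02-01", "VP2", 200000),
        (12346, "2020-02-10", "Analyst2", 150000),
        (12346, "2020-01-20", "Analyst2", 110000),
    ],
)

query = """
WITH titled AS (
    -- (1) attach the previous job title to every row
    SELECT ID, EVENT_DATE, JOB_TITLE, PAY,
           LEAD(JOB_TITLE, 1) OVER (
               PARTITION BY ID ORDER BY EVENT_DATE DESC
           ) AS job_change
    FROM yourtable
),
changes AS (
    -- (2) keep only rows where the title changed, (3) number them newest first
    SELECT ID, JOB_TITLE, PAY, EVENT_DATE AS last_job_change_date,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EVENT_DATE DESC) AS row_id
    FROM titled
    WHERE JOB_TITLE <> job_change
)
-- (4) the newest change per employee
SELECT ID, JOB_TITLE, PAY, last_job_change_date
FROM changes
WHERE row_id = 1
ORDER BY ID
"""
job_changes = conn.execute(query).fetchall()
print(job_changes)
# [(12345, 'VP1', 90000, '2020-01-10'), (12346, 'VP2', 200000, '2021-02-01')]
```

Note that the comparison in step (2) silently drops each employee's first row, because job_change is NULL there and NULL never compares as different.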

Firebird Query- Return first row each group

In a Firebird database with a table "Sales", I need to select the first sale of each customer. Below is a sample that shows the table and the desired result of the query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tried the code from "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
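
To see the "no earlier sale" anti-join in action, here is a small sketch against the sample data. It assumes SQLite in place of Firebird, and it stores DTHRSALE as ISO-formatted text (the question's dd/mm/yy values converted by hand) so that string comparison matches chronological order. The index name is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Firebird; dates are rewritten as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"),
        (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"),
        (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"),
        (6, 22, "2016-01-10 08:45"),
    ],
)
# Composite index so the self-join can look up earlier sales per customer.
conn.execute("CREATE INDEX idx_sales_customer_date ON sales (customerid, dthrsale)")

# A sale is a customer's first sale if no earlier sale exists for that customer.
query = """
SELECT s1.id, s1.customerid, s1.dthrsale
FROM sales s1
LEFT JOIN sales s2
       ON s2.customerid = s1.customerid
      AND s2.dthrsale < s1.dthrsale
WHERE s2.id IS NULL
ORDER BY s1.id
"""
first_sales = conn.execute(query).fetchall()
print([row[0] for row in first_sales])  # [1, 2, 4, 6]
```

If two sales of one customer share the same earliest timestamp, this approach returns both of them.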
In Firebird 3, get the first row for each customer by the minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id , min(sales_date)
, id, total -- what about other columns?
from SALES
group by customer_id
As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID

SQL query to group by data but with order by clause

I have a table booking with the following data:
GUEST_NO HOTEL_NO DATE_FROM DATE_TO ROOM_NO
1 1 2015-05-07 2015-05-08 103
1 1 2015-05-11 2015-05-12 104
1 1 2015-05-14 2015-05-15 103
1 1 2015-05-17 2015-05-20 101
2 2 2015-05-01 2015-05-02 204
2 2 2015-05-04 2015-05-05 203
2 2 2015-05-17 2015-05-22 202
What I want is the following result:
1) The output should show Guest_no, Hotel_no, Room_no, and a count column with the number of times that combination of the previous three columns is repeated.
So the output should look like:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 2
1 1 104 1
1 1 101 1
2 2 204 1
etc. But I want the result to be ordered, e.g. the output should be ordered by bk.date_to desc.
My query is below. It shows the count, but it stops working once I add order by:
select bk.guest_no, bk.hotel_no, bk.room_no,
count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked
from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
order by bk.date_to desc
Adding order by changes the result: because I order by the date_to column, I have to add that column to the group by clause too, which ends up producing a different result, as below:
GUEST_NO HOTEL_NO ROOM_NO Count
1 1 103 1
1 1 104 1
1 1 103 1
1 1 101 1
2 2 204 1
This is not the output I want.
I want these four columns, ordered by date_to descending, with the count being the number of repetitions of the first three columns.
I think a good way to do this would be grouping by guest_no, hotel_no and room_no, and sorting by the maximum (i.e. most recent) booking date in each group.
SELECT
guest_no,
hotel_no,
room_no,
COUNT(1) AS BookingCount
FROM
booking
GROUP BY
guest_no,
hotel_no,
room_no
ORDER BY
MAX(date_to) DESC;
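
A quick way to check this grouping is to run it on the booking rows from the question. The sketch below assumes an in-memory SQLite database rather than SQL Server; the query itself is unchanged apart from the added column alias.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking (guest_no INTEGER, hotel_no INTEGER, "
    "date_from TEXT, date_to TEXT, room_no INTEGER)"
)
conn.executemany(
    "INSERT INTO booking VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2015-05-07", "2015-05-08", 103),
        (1, 1, "2015-05-11", "2015-05-12", 104),
        (1, 1, "2015-05-14", "2015-05-15", 103),
        (1, 1, "2015-05-17", "2015-05-20", 101),
        (2, 2, "2015-05-01", "2015-05-02", 204),
        (2, 2, "2015-05-04", "2015-05-05", 203),
        (2, 2, "2015-05-17", "2015-05-22", 202),
    ],
)

# date_to stays out of GROUP BY; the aggregate MAX(date_to) drives the sort instead.
query = """
SELECT guest_no, hotel_no, room_no, COUNT(1) AS BookingCount
FROM booking
GROUP BY guest_no, hotel_no, room_no
ORDER BY MAX(date_to) DESC
"""
booking_counts = conn.execute(query).fetchall()
for row in booking_counts:
    print(row)
```

Room 103 for guest 1 appears once with a count of 2, and it is sorted by its latest date_to (2015-05-15).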
Maybe this is what you're looking for?
select
guest_no,
hotel_no,
room_no,
count(*) as Count
from
booking
group by
guest_no,
hotel_no,
room_no
order by
min(date_to) desc
Or maybe max() instead of min(). SQL Fiddle: http://sqlfiddle.com/#!6/e684c/3
You could try this.
select t.* from
(
select bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to,
count(*) as noOfTimesBooked from booking bk
group by bk.guest_no, bk.hotel_no, bk.room_no, bk.date_to
) t
order by t.date_to
You will also have to select date_to and then group the result by it.
If you use a 'group by' clause, SQL Server only allows 'order by' on grouped columns or aggregates. So you can make a subquery and use 'order by' in the outer query.
SELECT * FROM
(select bk.guest_no,bk.hotel_no,bk.room_no
,count(bk.guest_no+bk.hotel_no+bk.room_no) as noOfTimesRoomBooked,
(SELECT MAX(date_to) FROM booking CK
WHERE CK.guest_no=bk.guest_no AND CK.hotel_no=bk.hotel_no
AND CK.room_no=bk.room_no ) AS DATEBOOK
from booking bk
group by bk.guest_no,bk.hotel_no,bk.room_no) A
ORDER BY DATEBOOK DESC
It might help you.

Select MAX for multiple criteria in a group

Apologies if this has been answered, I'm new enough that I didn't even know how to search:
I have one table:
Lot SKU Cost Date
1001-1 1001 .30 10-12-14
1001-2 1001 .33 10-19-14
1001-3 1001 .32 11-20-14
1002-1 1002 .45 10-12-14
1002-2 1002 .45 10-19-14
1002-3 1002 .44 12-01-14
1003-1 1003 .12 10-15-14
1003-2 1003 .13 10-19-14
1003-3 1003 .10 11-23-14
I need to sum the cost of the oldest row for each SKU.
Expected outcome: (.30 + .45 + .12) = .87
Is this possible through one query?
ANSI SQL supports a function called row_number(), which can be very helpful for this type of query. The following is how you would use it in this case:
select sum(cost)
from (select t.*, row_number() over (partition by sku order by date) as seqnum
from table t
) t
where seqnum = 1;
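
Here is a minimal sketch of the row_number() approach on the sample data. Several things are assumptions for illustration: the table is called lots, the date column is called cost_date, dates are stored as ISO strings (the question's mm-dd-yy values would not sort correctly as text), and SQLite 3.25 or newer stands in for the unspecified database.

```python
import sqlite3

# Assumptions: hypothetical table/column names and ISO dates; SQLite as the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lots (lot TEXT, sku INTEGER, cost REAL, cost_date TEXT)")
conn.executemany(
    "INSERT INTO lots VALUES (?, ?, ?, ?)",
    [
        ("1001-1", 1001, 0.30, "2014-10-12"),
        ("1001-2", 1001, 0.33, "2014-10-19"),
        ("1001-3", 1001, 0.32, "2014-11-20"),
        ("1002-1", 1002, 0.45, "2014-10-12"),
        ("1002-2", 1002, 0.45, "2014-10-19"),
        ("1002-3", 1002, 0.44, "2014-12-01"),
        ("1003-1", 1003, 0.12, "2014-10-15"),
        ("1003-2", 1003, 0.13, "2014-10-19"),
        ("1003-3", 1003, 0.10, "2014-11-23"),
    ],
)

# Number the rows of each SKU from oldest to newest, then sum only the oldest ones.
query = """
SELECT SUM(cost)
FROM (SELECT l.*,
             ROW_NUMBER() OVER (PARTITION BY sku ORDER BY cost_date) AS seqnum
      FROM lots l) ranked
WHERE seqnum = 1
"""
# Round to cents to hide binary floating-point noise in the REAL column.
total = round(conn.execute(query).fetchone()[0], 2)
print(total)  # 0.87
```

For real monetary data a decimal or integer-cents column would avoid the rounding step.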
This should work:
select sum(t.cost)
from some_table t
join ( select sku ,
min(some_datetime_column) oldest
from some_table
where some_datetime_column is not null
group by sku
) s on s.sku = t.sku
and s.oldest = t.some_datetime_column

Select info from table where row has max date

My table looks something like this:
group date cash checks
1 1/1/2013 0 0
2 1/1/2013 0 800
1 1/3/2013 0 700
3 1/1/2013 0 600
1 1/2/2013 0 400
3 1/5/2013 0 200
-- Do not need cash just demonstrating that table has more information in it
I want to get the each unique group where date is max and checks is greater than 0. So the return would look something like:
group date checks
2 1/1/2013 800
1 1/3/2013 700
3 1/5/2013 200
attempted code:
SELECT group,MAX(date),checks
FROM table
WHERE checks>0
GROUP BY group
ORDER BY group DESC
The problem with that, though, is that it gives me all the dates and checks rather than just the max date row.
I am using MS SQL Server 2005.
SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group
That works to get the max date. Join it back to your data to get the other columns:
Select t.group, a.max_date, t.checks
from table t
inner join
(SELECT group,MAX(date) as max_date
FROM table
WHERE checks>0
GROUP BY group)a
on a.group = t.group and a.max_date = t.date
Inner join functions as the filter to get the max record only.
FYI, your column names are horrid, don't use reserved words for columns (group, date, table).
You can use a window MAX() like this:
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
to get max dates per group alongside other data:
group date cash checks max_date
----- -------- ---- ------ --------
1 1/1/2013 0 0 1/3/2013
2 1/1/2013 0 800 1/1/2013
1 1/3/2013 0 700 1/3/2013
3 1/1/2013 0 600 1/5/2013
1 1/2/2013 0 400 1/3/2013
3 1/5/2013 0 200 1/5/2013
Using the above output as a derived table, you can then get only rows where date matches max_date:
SELECT
group,
date,
checks
FROM (
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY group)
FROM table
) AS s
WHERE date = max_date
;
to get the desired result.
Basically, this is similar to @Twelfth's suggestion but avoids a join and may thus be more efficient.
You can try the method at SQL Fiddle.
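
For readers without access to the fiddle, here is a small runnable sketch of the windowed MAX() idea. It makes several assumptions: SQLite 3.25 or newer stands in for SQL Server, the table and columns are renamed to the hypothetical payments, grp and dt to avoid reserved words, dates are stored as ISO strings, and the checks > 0 condition from the question is applied inside the derived table.

```python
import sqlite3

# Assumptions: hypothetical names (payments, grp, dt) and ISO dates; SQLite as the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (grp INTEGER, dt TEXT, cash INTEGER, checks INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [
        (1, "2013-01-01", 0, 0),
        (2, "2013-01-01", 0, 800),
        (1, "2013-01-03", 0, 700),
        (3, "2013-01-01", 0, 600),
        (1, "2013-01-02", 0, 400),
        (3, "2013-01-05", 0, 200),
    ],
)

# Compute each group's latest date next to every row, then keep the matching rows.
query = """
SELECT grp, dt, checks
FROM (SELECT p.*,
             MAX(dt) OVER (PARTITION BY grp) AS max_date
      FROM payments p
      WHERE checks > 0) s
WHERE dt = max_date
ORDER BY grp
"""
latest = conn.execute(query).fetchall()
print(latest)
# [(1, '2013-01-03', 700), (2, '2013-01-01', 800), (3, '2013-01-05', 200)]
```

Filtering checks > 0 before the window is computed means the maximum is taken over qualifying rows only, which is what the question's attempted query implied.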
Using IN can have a performance impact. Joining two subqueries will not have the same performance impact and can be accomplished like this:
SELECT *
FROM (SELECT msisdn
,callid
,Change_color
,play_file_name
,date_played
FROM insert_log
WHERE play_file_name NOT IN('Prompt1','Conclusion_Prompt_1','silent')
ORDER BY callid ASC) t1
JOIN (SELECT MAX(date_played) AS date_played
FROM insert_log GROUP BY callid) t2
ON t1.date_played = t2.date_played
SELECT distinct
group,
max_date = MAX(date) OVER (PARTITION BY group), checks
FROM table
Should work.