How do you select latest entry per column1 and per column2? - sql

I'm fairly new to MySQL and need a query I just can't figure out. Given a table like so:
emp cat date amt cum
44 e1 2009-01-01 1 1
44 e2 2009-01-02 2 2
44 e1 2009-01-03 3 4
44 e1 2009-01-07 5 9
44 e7 2009-01-04 5 5
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e7 2009-01-02 2 2
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I need to select the latest date transaction per 'emp' and per 'cat'. The above table would produce something like:
emp cat date amt cum
44 e1 2009-01-07 5 9
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I've tried something like:
select * from orders where emp=44 and cat='e1' order by date desc limit 1;
select * from orders where emp=44 and cat='e2' order by date desc limit 1;
....
but this doesn't feel right. Can anyone point me in the right direction?

This should work, but I haven't tested it.
SELECT orders.* FROM orders
INNER JOIN (
SELECT emp, cat, MAX(date) date
FROM orders
GROUP BY emp, cat
) criteria USING (emp, cat, date)
Basically, this uses a subquery to get the latest entry for each emp and cat, then joins that against the original table to get all the data for that order (since you can't GROUP BY amt and cum).
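To check the idea against the sample data, here is a minimal sketch that runs the same join in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, and the join is written with ON rather than USING only to keep the column references explicit; both are choices made for this sketch, not part of the answer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (emp INTEGER, cat TEXT, date TEXT, amt INTEGER, cum INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (44, "e1", "2009-01-01", 1, 1),
        (44, "e2", "2009-01-02", 2, 2),
        (44, "e1", "2009-01-03", 3, 4),
        (44, "e1", "2009-01-07", 5, 9),
        (44, "e7", "2009-01-04", 5, 5),
        (44, "e2", "2009-01-04", 3, 5),
        (44, "e7", "2009-01-05", 1, 6),
        (55, "e7", "2009-01-02", 2, 2),
        (55, "e1", "2009-01-05", 4, 4),
        (55, "e7", "2009-01-03", 4, 6),
    ],
)

# The subquery finds the latest date per (emp, cat); the join pulls the full row.
latest_rows = conn.execute(
    """
    SELECT o.emp, o.cat, o.date, o.amt, o.cum
    FROM orders AS o
    INNER JOIN (
        SELECT emp, cat, MAX(date) AS date
        FROM orders
        GROUP BY emp, cat
    ) AS criteria
      ON o.emp = criteria.emp
     AND o.cat = criteria.cat
     AND o.date = criteria.date
    ORDER BY o.emp, o.cat
    """
).fetchall()

for row in latest_rows:
    print(row)
```

The dates are stored as ISO-formatted text, so MAX(date) compares them correctly as strings.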

The answer given by @R.Bemrose should work, and here's another trick for comparison:
SELECT o1.*
FROM orders o1
LEFT OUTER JOIN orders o2
ON (o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date)
WHERE o2.emp IS NULL;
This assumes that the columns (emp, cat, date) comprise a candidate key, i.e. there can be only one row for a given combination of emp, cat and date. If two rows tie on the latest date for an emp and cat, both are returned.
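The candidate-key assumption can be seen in a small sketch, again using an in-memory SQLite database from Python as a stand-in for MySQL. The last inserted row is a hypothetical duplicate that is not in the question's data; it shares emp, cat and date with another row to show what the anti-join returns on a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (emp INTEGER, cat TEXT, date TEXT, amt INTEGER, cum INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (44, "e1", "2009-01-03", 3, 4),
        (44, "e1", "2009-01-07", 5, 9),
        (55, "e7", "2009-01-02", 2, 2),
        (55, "e7", "2009-01-03", 4, 6),
        # Hypothetical duplicate: same emp, cat and date as the row above.
        (55, "e7", "2009-01-03", 1, 7),
    ],
)

# A row survives only if no other row for the same (emp, cat) has a later date.
anti_join_rows = conn.execute(
    """
    SELECT o1.emp, o1.cat, o1.date, o1.amt, o1.cum
    FROM orders AS o1
    LEFT OUTER JOIN orders AS o2
      ON o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date
    WHERE o2.emp IS NULL
    ORDER BY o1.emp, o1.cat, o1.amt
    """
).fetchall()

for row in anti_join_rows:
    print(row)
```

Both rows dated 2009-01-03 for emp 55 and cat e7 come back, because neither has a strictly later row to match against.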

Related

Row_Number Sybase SQL Anywhere change on multiple condition

I have a selection that returns
EMP DOC DATE
1 78 01/01
1 96 02/01
1 96 02/01
1 105 07/01
2 4 04/01
2 7 04/01
3 45 07/01
3 45 07/01
3 67 09/01
I want to add a row number (I'll use it as a primary ID), but I want it to restart whenever "EMP" changes and to stay the same when the DOC is the same as the previous one, like:
EMP DOC DATE ID
1 78 01/01 1
1 96 02/01 2
1 96 02/01 2
1 105 07/01 3
2 4 04/01 1
2 7 04/01 2
3 45 07/01 1
3 45 07/01 1
3 67 09/01 2
In SQL Server I could use LAG to compare with the previous DOC, but I can't find a way to do that in Sybase SQL Anywhere. I'm using ROW_NUMBER partitioned by "EMP", but it's not what I need.
SELECT EMP, DOC, DATE, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY EMP, DOC, DATE) ID -- <== THIS WILL CHANGE THE ROW NUMBER ON SAME DOC ON SAME EMP, SO WOULD NOT WORK.
Anyone have a direction for this?
You seem to want dense_rank():
select
emp,
doc,
date,
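dense_rank() over(partition by emp order by doc) id
from mytable
This numbers rows within groups having the same emp, and increments only when doc changes, without gaps. Ordering by date alone would give the same id to different docs that share a date, as for emp 2 in your sample.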
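As a rough check, the sketch below runs dense_rank() over the question's sample rows in SQLite (version 3.25 or later is assumed for window functions) and compares ordering by doc with ordering by date. The helper name ranked_ids is made up for this example, and the dates are kept as the short text values shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (emp INTEGER, doc INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
        (2, 4, "04/01"), (2, 7, "04/01"),
        (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
    ],
)


def ranked_ids(order_column):
    """Return (emp, doc, id) rows, ranking within each emp by the given column."""
    # The column name is interpolated into the SQL text, so only allow known names.
    if order_column not in ("doc", "date"):
        raise ValueError("unexpected column: " + order_column)
    query = f"""
        SELECT emp, doc,
               DENSE_RANK() OVER (PARTITION BY emp ORDER BY {order_column}) AS id
        FROM mytable
        ORDER BY emp, doc
    """
    return conn.execute(query).fetchall()


ids_by_doc = ranked_ids("doc")
ids_by_date = ranked_ids("date")

print(ids_by_doc)
print(ids_by_date)
```

On this data the two orderings differ only for emp 2, where docs 4 and 7 share the date 04/01: ranking by doc gives them ids 1 and 2, while ranking by date gives both id 1.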
If performance is not an issue in your case, you can try something like:
SELECT tx.EMP, tx.DOC, tx.DATE, y.ID
FROM table_xxx tx
JOIN (SELECT EMP, DOC, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY DOC) ID
FROM (SELECT EMP, DOC FROM table_xxx GROUP BY EMP, DOC) x) y
ON tx.EMP = y.EMP AND tx.DOC = y.DOC

Find how many times the department of an employee has changed

I have the following table Employee storing any updates made on an employee:
EmployeeId DepartmentId Status From To
44 30 Recruited 01/01/2017 06/03/2017
44 56 IN 07/03/2017 07/03/2018
44 67 IN 06/05/2018 06/09/2018
44 33 IN 07/09/2018 02/02/2019
44 33 OUT 03/02/2019 31/12/2019
44 45 Recruited 01/02/2020 03/02/2020
44 45 IN 04/02/2020 NULL
I want to count how many times each employee has changed department, knowing that the employee life cycle is Recruited - IN - OUT,
and that some employees leave the company and later come back, as in this example.
I'm not sure what "Recruited", "In" and "Out" have to do with this. If each row represents a period of time when an employee was in a department, then use lag() to measure changes:
select employeeId, count(*)
from (select t.*,
lag(departmentId) over (partition by employeeId order by from_date) as prev_departmentId
from t
) t
where prev_departmentId is null or prev_departmentId <> departmentId
group by employeeId;
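Here is a minimal sketch of the lag() approach on the question's rows, run in SQLite (3.25 or later assumed for window functions) from Python. The table name t and the from_date column follow the query above; storing the dates as ISO text and omitting the To column are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (employeeId INTEGER, departmentId INTEGER, status TEXT, from_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (44, 30, "Recruited", "2017-01-01"),
        (44, 56, "IN", "2017-03-07"),
        (44, 67, "IN", "2018-05-06"),
        (44, 33, "IN", "2018-09-07"),
        (44, 33, "OUT", "2019-02-03"),
        (44, 45, "Recruited", "2020-02-01"),
        (44, 45, "IN", "2020-02-04"),
    ],
)

# Inner query shared by both counts: each row plus the previous row's department.
with_prev = """
    SELECT t.*,
           LAG(departmentId) OVER (PARTITION BY employeeId ORDER BY from_date)
               AS prev_departmentId
    FROM t
"""

# Same filter as the query above: the first row per employee is counted too.
counts_including_first = conn.execute(
    f"""
    SELECT employeeId, COUNT(*)
    FROM ({with_prev}) AS s
    WHERE prev_departmentId IS NULL OR prev_departmentId <> departmentId
    GROUP BY employeeId
    """
).fetchall()

# Variant without the IS NULL branch: only real department-to-department changes.
counts_changes_only = conn.execute(
    f"""
    SELECT employeeId, COUNT(*)
    FROM ({with_prev}) AS s
    WHERE prev_departmentId <> departmentId
    GROUP BY employeeId
    """
).fetchall()

print(counts_including_first, counts_changes_only)
```

The IS NULL branch counts the employee's first department as well, so the sample yields 5 with it and 4 without it. Note that the variant without it returns no row at all for an employee who never changed department.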

calculate Count and Sum from two different table with group by without using inner query

I have two tables: A, with columns id, phone_number and refer_amount,
and B, with columns phone_number and transaction_amount.
I want the sum() of refer_amount and transaction_amount and the count() of phone_number from both tables, grouped by phone_number, without using an inner query.
Table A
phone_number refer_amount
123 50
456 80
789 90
123 90
123 80
123 20
456 20
456 79
456 49
123 49
Table B
phone_number transaction_amount
123 50
123 51
123 79
456 22
456 11
456 78
456 66
456 88
456 88
456 66
789 66
789 23
789 78
789 46
I have tried the following query, but it gives me the wrong output:
SELECT a.phone_number,
COUNT(a.phone_number) AS refer_count,
SUM(a.refer_amount) AS refer_amount,
b.phone_number,
COUNT(b.phone_number) AS transaction_count,
SUM(b.transaction_amount) AS transaction_amount
FROM dbo.A AS a, dbo.B AS b
WHERE a.phone_number = b.phone_number
GROUP BY a.phone_number, b.phone_number
output (wrong):
phone_number refer_count refer_amount phone_number transaction_count transaction_amount
123 15 867 123 15 900
456 28 1596 456 28 1676
789 5 450 789 5 291
output (That I want):
phone_number refer_count refer_amount phone_number transaction_count transaction_amount
123 5 289 123 3 180
456 4 228 456 7 419
789 1 90 789 5 291
I would do the aggregations on the B table in a separate subquery, and then join to it:
SELECT
a.phone_number,
COUNT(a.phone_number) AS a_cnt,
SUM(a.refer_amount) AS a_sum,
COALESCE(b.b_cnt, 0) AS b_cnt,
COALESCE(b.b_sum, 0) AS b_sum
FROM A a
LEFT JOIN
(
SELECT
phone_number,
COUNT(*) AS b_cnt,
SUM(transaction_amount) AS b_sum
FROM B
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number
GROUP BY
a.phone_number,
b.b_cnt,
b.b_sum;
One major potential issue with your current approach is that the join could result in duplicate counting, as a given phone_number record in the A table gets replicated due to the join.
Speaking of joins, note that above I use an explicit join, rather than the implicit one you were using. In general, you should not put commas into the FROM clause.
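The duplicate counting can be reproduced with a short sketch in SQLite driven from Python. It loads the rows exactly as listed in the question, runs a plain join, and then runs a pre-aggregated join grouped by phone number. The aliases ra and tb and the grouping on all non-aggregated columns are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (phone_number INTEGER, refer_amount INTEGER)")
conn.execute("CREATE TABLE B (phone_number INTEGER, transaction_amount INTEGER)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(123, 50), (456, 80), (789, 90), (123, 90), (123, 80),
     (123, 20), (456, 20), (456, 79), (456, 49), (123, 49)],
)
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [(123, 50), (123, 51), (123, 79),
     (456, 22), (456, 11), (456, 78), (456, 66), (456, 88), (456, 88), (456, 66),
     (789, 66), (789, 23), (789, 78), (789, 46)],
)

# Plain join: every A row is repeated once per matching B row, inflating the totals.
naive_rows = conn.execute(
    """
    SELECT ra.phone_number, COUNT(*), SUM(ra.refer_amount), SUM(tb.transaction_amount)
    FROM A AS ra
    JOIN B AS tb ON ra.phone_number = tb.phone_number
    GROUP BY ra.phone_number
    ORDER BY ra.phone_number
    """
).fetchall()

# B is collapsed to one row per phone number before joining, so A is not repeated.
fixed_rows = conn.execute(
    """
    SELECT ra.phone_number,
           COUNT(ra.phone_number) AS a_cnt,
           SUM(ra.refer_amount) AS a_sum,
           COALESCE(tb.b_cnt, 0) AS b_cnt,
           COALESCE(tb.b_sum, 0) AS b_sum
    FROM A AS ra
    LEFT JOIN (
        SELECT phone_number, COUNT(*) AS b_cnt, SUM(transaction_amount) AS b_sum
        FROM B
        GROUP BY phone_number
    ) AS tb ON ra.phone_number = tb.phone_number
    GROUP BY ra.phone_number, tb.b_cnt, tb.b_sum
    ORDER BY ra.phone_number
    """
).fetchall()

print(naive_rows)
print(fixed_rows)
```

For 123 and 456 the pre-aggregated query matches the desired output (5/289/3/180 and 4/228/7/419). For 789 the rows as listed give 4 transactions totalling 213, not the 5 and 291 shown in the question, so the listing of table B appears to be one row short.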
This can help. You don't need a separate aggregate on b.phone_number when you are already matching on a.phone_number = b.phone_number. DISTINCT is needed on the phone number count because both tables contribute rows for it.
With GROUP BY, every selected column that is not inside an aggregate function must appear in the GROUP BY clause.
select a.phone_number, count(distinct a.phone_number), sum(a.refer_amount),
sum (b.transaction_amount)
from A as a, B as b
where a.phone_number=b.phone_number
group by a.phone_number

Split a column based on a character in BigQuery

I have a table as shown below on BigQuery
Name | Score
Tim | 63 > 89 > 90
James| 67 > 44
I want to split the Score column into N separate columns, where N is the maximum score length in the entire table. I would like the table to be as follows.
Name| Score_1 | Score_2 | Score_3
Tim | 63 | 89 | 90
James| 67 | 44 | 0 or NA
I tried the Split command but I end up doing a new row for each Name-Score combination.
For BigQuery Standard SQL
Below is a simple case that assumes you know the expected maximum score length in advance (3 in the example below)
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
SELECT 'James', '67 > 44'
)
SELECT
name,
score[SAFE_OFFSET(0)] AS score_1,
score[SAFE_OFFSET(1)] AS score_2,
score[SAFE_OFFSET(2)] AS score_3
FROM (
SELECT name, SPLIT(score, ' > ') score
FROM `project.dataset.your_table`
)
with result
Row name score_1 score_2 score_3
1 Tim 63 89 90
2 James 67 44 null
Of course, this approach means that if you have many scores (10, 20 or more), you will need to add a corresponding number of extra lines like the one below
score[SAFE_OFFSET(20)] AS score_21
So the above gives you the output schema you asked for.
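If the number of score columns is large, the repeated SAFE_OFFSET lines could be generated rather than typed. The Python sketch below only builds the query text; it does not call BigQuery, and the helper names score_columns and build_query are hypothetical.

```python
def score_columns(max_scores):
    """Build the SELECT-list lines score_1 .. score_N (hypothetical helper)."""
    return ",\n".join(
        f"  score[SAFE_OFFSET({i})] AS score_{i + 1}" for i in range(max_scores)
    )


def build_query(table, max_scores):
    """Assemble the full query text in the same shape as the example above."""
    return (
        "SELECT\n  name,\n"
        + score_columns(max_scores)
        + "\nFROM (\n  SELECT name, SPLIT(score, ' > ') score\n"
        + f"  FROM `{table}`\n)"
    )


query_text = build_query("project.dataset.your_table", 3)
print(query_text)
```

The maximum score length still has to be known before the query text is built, for example from a separate query over the table.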
That said, the version below makes more sense to me and is the better choice in most practical cases:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
SELECT 'James', '67 > 44'
)
SELECT name, score
FROM `project.dataset.your_table`, UNNEST(SPLIT(score, ' > ')) score
with result
Row name score
1 Tim 63
2 Tim 89
3 Tim 90
4 James 67
5 James 44

Sql get latest records of the month for each name

This question has probably been answered before, but I can't find how to get the latest record of each month.
The problem is that my table sometimes has 2 rows for the same month. I can't use an aggregate function (I guess) because the 2 rows hold different data and I need the values from the latest one.
Example:
name Date nuA nuB nuC nuD
test1 05/06/2013 356 654 3957 7033
test1 05/26/2013 113 237 399 853
test3 06/06/2013 145 247 68 218
test4 06/22/2013 37 37 6 25
test4 06/27/2013 50 76 20 84
test4 05/15/2013 34 43 34 54
I need to get a result like:
test1 05/26/2013 113 237 399 853
test3 06/06/2013 145 247 68 218
test4 05/15/2013 34 43 34 54
test4 06/27/2013 50 76 20 84
** in my example the data is in order but in my real table the data is not in order.
For now i have something like:
SELECT Name, max(DATE) , nuA,nuB,nuC,nuD
FROM tableA INNER JOIN
Group By Name, nuA,nuB,nuC,nuD
But it didn't work as i want.
Thanks in advance
Edit1:
It seems that I wasn't clear with my question...
So I added some data to my example to show you how I need it done.
Thanks guys
Use SQL Server ranking functions.
select name, Date, nuA, nuB, nuC, nuD from
(Select *, row_number() over (partition by name, datepart(year, Date),
datepart(month, Date) order by Date desc) as ranker from Table
) Z
where ranker = 1
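The ranking approach can be tried on the sample rows with the sketch below, which uses SQLite (3.25 or later assumed for window functions) from Python in place of SQL Server. Two things are assumptions of this sketch: the dates are stored as ISO text, and strftime('%Y-%m', date) replaces the datepart calls, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (name TEXT, date TEXT, nuA INTEGER, nuB INTEGER,"
    " nuC INTEGER, nuD INTEGER)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("test1", "2013-05-06", 356, 654, 3957, 7033),
        ("test1", "2013-05-26", 113, 237, 399, 853),
        ("test3", "2013-06-06", 145, 247, 68, 218),
        ("test4", "2013-06-22", 37, 37, 6, 25),
        ("test4", "2013-06-27", 50, 76, 20, 84),
        ("test4", "2013-05-15", 34, 43, 34, 54),
    ],
)

# Rank rows within each (name, year-month) by date, newest first, and keep rank 1.
latest_per_month = conn.execute(
    """
    SELECT name, date, nuA, nuB, nuC, nuD
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY name, strftime('%Y-%m', date)
                   ORDER BY date DESC
               ) AS ranker
        FROM tableA
    ) AS z
    WHERE ranker = 1
    ORDER BY name, date
    """
).fetchall()

for row in latest_per_month:
    print(row)
```

Because ROW_NUMBER assigns distinct ranks even to ties, this returns exactly one row per name and month even if two rows share the latest date.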
Try this
SELECT t1.* FROM Table1 t1
INNER JOIN
(
SELECT [name],MAX([date]) as [date] FROM Table1
GROUP BY [name],YEAR([date]),MONTH([date])
) t2
ON t1.[date]=t2.[date] and t1.[name]=t2.[name]
ORDER BY t1.[name]
Can you not just do an order
select * from tablename where Date = (select max(Date) from tablename)
followed by only pulling the first 3?