SQL: Select distinct sum of column with max(column)

I have a salary table like this:
id | person_id | start_date | pay
1 | 1234 | 2012-01-01 | 3000
2 | 1234 | 2012-05-01 | 3500
3 | 5678 | 2012-01-01 | 5000
4 | 5678 | 2013-01-01 | 6000
5 | 9101 | 2012-09-01 | 2000
6 | 9101 | 2014-04-01 | 3000
7 | 9101 | 2011-01-01 | 1500
and so on...
Now I want to query the sum of the salaries of a specific month for all persons of a company.
I already have the ids of the persons who worked in the specific month in the specific company, so I can do something like WHERE person_id IN (...)
I have some problems with the salaries query though. The result for e.g. the month 2012-08 should be:
10000
which is 3500+5000+1500.
So I need to find the summed up pay value (for all persons in the IN clause) for the maximum start_date <= the specific month.
I tried various INNER JOINS but it's been a long day and I can't think straight at the moment.
Any hint is highly appreciated.

You need to get the active record. The following does this by finding, for each person, the latest start date on or before the first day of the month in question:
select sum(s.pay)
from (select person_id, max(start_date) as maxstartdate
from salary
where person_id in ( . . . ) and
start_date <= <first day of month of interest>
group by person_id
) p join
salary s
on s.person_id = p.person_id and
s.start_date = p.maxstartdate
You need to fill in the month and list of ids.
You can also do this with ranking functions, but you don't specify which SQL engine you are using.
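As a rough check of the "latest start date per person, then join back" idea, here is a minimal sketch that runs the query against the sample rows from the question. It assumes an in-memory SQLite database (the question does not name an engine), ISO-formatted date strings, and `<=` on the first day of the month as the question describes; the `:first_day` parameter name is only illustrative.

```python
import sqlite3

# In-memory SQLite stands in for the real engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary (id INTEGER, person_id INTEGER, start_date TEXT, pay INTEGER)"
)
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?, ?)",
    [
        (1, 1234, "2012-01-01", 3000),
        (2, 1234, "2012-05-01", 3500),
        (3, 5678, "2012-01-01", 5000),
        (4, 5678, "2013-01-01", 6000),
        (5, 9101, "2012-09-01", 2000),
        (6, 9101, "2014-04-01", 3000),
        (7, 9101, "2011-01-01", 1500),
    ],
)

# Inner query: latest start_date per person on or before the given day.
# Outer join: pick the salary row that carries that start_date.
query = """
SELECT SUM(s.pay)
FROM (SELECT person_id, MAX(start_date) AS maxstartdate
      FROM salary
      WHERE person_id IN (1234, 5678, 9101)
        AND start_date <= :first_day
      GROUP BY person_id) p
JOIN salary s
  ON s.person_id = p.person_id
 AND s.start_date = p.maxstartdate
"""
total = conn.execute(query, {"first_day": "2012-08-01"}).fetchone()[0]
print(total)  # 10000 for the sample rows (3500 + 5000 + 1500)
```

ISO date strings sort the same way as the dates they represent, which is why the plain string comparison is enough in this sketch.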

You need to use GROUP BY for this kind of query:
select person_id,sum(pay) from salary where person_id in(...) group by person_id
Maybe this will help you.

Related

SQL MAX: max date from multiple locations same part

What I'm looking to find is the last (max) date a part number was purchased from any store. A store can have no sales or some sales; either way, just give the max date:
part | date | loc
123 | 8/1/2022 | store 1
123 | 8/2/2022 | store 1
123 | null | store 2
123 | 8/3/2022 | store 3
The result would be:
part | date | loc
123 | 8/3/2022 | store 1
123 | 8/3/2022 | store 2
123 | 8/3/2022 | store 3
Select the max date in a subquery for every part; it gives you one result per part, the highest date.
The query should work with most RDBMSs:
SELECT DISTINCT [part], (SELECT MAX([date]) FROM Table1 WHERE part = t1.part) [Date],[loc] FROM Table1 t1
part | Date | loc
---: | :------- | :------
123 | 8/3/2022 | store 1
123 | 8/3/2022 | store 2
123 | 8/3/2022 | store 3
I am sure there is a more efficient way to write the query, but I used a subquery. This should get you the desired result:
SELECT DISTINCT m.[part], ad.x AS [date], m.[loc]
FROM [MainTable] AS m
LEFT OUTER JOIN
(SELECT MAX([date]) AS x, [part]
FROM [MainTable]
GROUP BY [part]) AS ad
ON ad.[part] = m.[part]
WHERE m.[part] = 123 --desired value
nbk's answer is the cleaner query.
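To see the correlated-subquery approach in action, here is a small sketch against the sample rows. It assumes SQLite, a hypothetical table name `purchases`, a column renamed to `purchase_date`, and dates stored as ISO strings so that MAX compares them correctly; none of these come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; dates kept as ISO strings (assumption).
conn.execute("CREATE TABLE purchases (part INTEGER, purchase_date TEXT, loc TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (123, "2022-08-01", "store 1"),
        (123, "2022-08-02", "store 1"),
        (123, None, "store 2"),  # store with no sales
        (123, "2022-08-03", "store 3"),
    ],
)

# For each row, look up the latest date for the same part across all stores.
# MAX skips NULLs, so the store without sales still gets the overall max date.
query = """
SELECT DISTINCT t1.part,
       (SELECT MAX(t2.purchase_date)
        FROM purchases t2
        WHERE t2.part = t1.part) AS max_date,
       t1.loc
FROM purchases t1
ORDER BY t1.loc
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

DISTINCT collapses the two "store 1" rows into one, which matches the three-row result shown in the question.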

Alias Reference Date_Diff Days. Need to Parse or create temp table with dates?

Below are the tables and the query, followed by the output the query produces.
Table1
EmployeeID | StartDateTimestamp | CohortID | CohortName
---------- | ------------------ | -------- | ----------
1 | 20080101 01:30:00 | 1 | Peanut
1 | 20090204 01:01:00 | 2 | Apple
2 | 20190107 05:52:14 | 1 | Peanut
3 | 20190311 02:35:26 | 2 | Apple
Employee
EmployeeID | HireStartName | StartDateTimestamp2
---------- | ------------- | -------------------
1 | HiredStart | 20080501 01:30:00
1 | DeferredStart | 20090604 01:01:00
2 | HiredStart | 20190115 05:52:14
3 | HiredStart | 20190330 02:35:26
Query
select
t.cohortid,
min(e.startdatetimestamp2) first,
max(e.startdatetimestamp2) last
from table1 t
inner join employee e on e.employeeid = t.employeeid
group by t.cohortid
Output
ID | FIRST | LAST
1 |20190106 12:00:05 |20180214 03:45:12
2 |20180230 01:45:23 |20180315 01:45:23
My attempt:
SELECT DATE_DIFF(first, last, Day), ID, max(datecolumn1) first, min(datecolumn1) last
Error: Unrecognized name.
How do I enter the reference alias first and last in a Date_Diff?
Do I need to derive a table?
Clarity: Trying to avoid inputting in the dates, since I am looking to find the date diff of both first and last columns for as many rows as there is data.
This answer has been discussed here: Date Difference between consecutive rows
DateDiff has been deprecated, and it is now Date_Diff (first, last, day)
Then I tried:
SELECT ID, DATE_DIFF(PARSE_DATE('%y%m%d',t.first), PARSE_DATE('%y%m%d',t.last), DAY) days
FROM table
Failed to parse input string "20180125 01:00:05"
Tried this
SELECT CohortID, date_diff(first,last,day) as days
FROM (select cohortid,min(startdatetimestamp2) first,
max(startdatetimestamp2) last
FROM employee
JOIN table1 on table1.employeeid = employee.employeeid
group by cohortid)
I get days not found on either side of join
Regarding your first question about using aliases in a query: there are some restrictions on where you can use them, especially in the FROM, GROUP BY and ORDER BY clauses. I encourage you to have a look here to check these restrictions.
As for your main issue, obtaining the difference between two dates: I would like to point out that the timestamp data in both of your tables is treated as DATETIME in BigQuery. Therefore, you should use the built-in DATETIME functions to get the desired results.
The below query uses the data you provided to obtain the aimed output.
WITH
data AS
(
SELECT
t.cohortid AS ID,
PARSE_DATETIME('%Y%m%d %H:%M:%S', MIN(e.startdatetimestamp2)) AS first,
PARSE_DATETIME('%Y%m%d %H:%M:%S', MAX(e.startdatetimestamp2)) AS last
FROM
`test-proj-261014.sample.table1` t
INNER JOIN
`test-proj-261014.sample.employee` e
ON
e.employeeid = t.employeeid
GROUP BY t.cohortid
)
SELECT
ID,
first,
last,
DATETIME_DIFF(last, first, DAY ) AS diff_days
FROM
data
Notice that I used a WITH clause to build an intermediate table that parses StartDateTimestamp2 with PARSE_DATETIME(). Afterwards, I used DATETIME_DIFF() to obtain the difference in days between the two resulting fields.
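The parse-then-diff step can also be illustrated outside BigQuery. The sketch below is plain Python, not the BigQuery API: it parses strings in the same `%Y%m%d %H:%M:%S` layout and counts the calendar-day boundaries between them, which is how I understand DATETIME_DIFF with DAY to behave. The helper name `diff_days` is hypothetical.

```python
from datetime import datetime

# Same layout as the format string passed to PARSE_DATETIME in the answer.
FMT = "%Y%m%d %H:%M:%S"


def diff_days(first: str, last: str) -> int:
    """Parse two timestamp strings and count the calendar days between them."""
    first_dt = datetime.strptime(first, FMT)
    last_dt = datetime.strptime(last, FMT)
    # Compare the date parts only, so the time of day does not affect the count.
    return (last_dt.date() - first_dt.date()).days


print(diff_days("20190107 05:52:14", "20190115 05:52:14"))  # 8
print(diff_days("20190101 23:59:59", "20190102 00:00:01"))  # 1
```

The original attempt failed because `%y%m%d` describes a two-digit year and no time component, so the parser could not consume a string such as "20180125 01:00:05".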

Calculate time span over a number of records

I have a table that has the following schema:
ID | FirstName | Surname | TransmissionID | CaptureDateTime
1 | Billy | Goat | ABCDEF | 2018-09-20 13:45:01.098
2 | Jonny | Cash | ABCDEF | 2018-09-20 13:45:01.108
3 | Sally | Sue | ABCDEF | 2018-09-20 13:45:01.298
4 | Jermaine | Cole | PQRSTU | 2018-09-20 13:45:01.398
5 | Mike | Smith | PQRSTU | 2018-09-20 13:45:01.498
There are well over 70,000 records, and they store logs of transmissions to a web service. How would I go about writing a script that selects the distinct TransmissionID values and also shows the timespan between the earliest and the latest CaptureDateTime record? Essentially, I'd like to see the rate at which the web service is reading and writing records.
Is it even possible to do so in a single SELECT statement or should I just create a stored procedure or report in code? I don't know where to start aside from SELECT DISTINCT TransmissionID for this sort of query.
Here's what I have so far (I'm stuck on the time calculation)
SELECT DISTINCT [TransmissionID],
COUNT(*) as 'Number of records'
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
I'm not sure how to get the difference between the first and last record with the same TransmissionID. I would like to get a result set like:
TransmissionID | TimeToCompletion | Number of records |
ABCDEF | 2.001 | 5000 |
Simply GROUP BY and use MIN / MAX function to find min/max date in each group and subtract them:
SELECT
TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime))
FROM yourdata
GROUP BY TransmissionID
HAVING COUNT(*) > 1
Use min and max to calculate timespan
SELECT [TransmissionID],
COUNT(*) as 'Number of records',datediff(s,min(CaptureDateTime),max(CaptureDateTime)) as timespan
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
A method that returns the average time for all transmissionids, even those with only 1 record:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
FROM yourdata
GROUP BY TransmissionID;
Note that you may not actually want the maximum of the capture date for a given transmissionId. You might want the overall maximum in the table -- so you can consider the final period after the most recent record.
If so, this looks like:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second,
MIN(CaptureDateTime),
MAX(MAX(CaptureDateTime)) OVER ()
) * 1.0 / COUNT(*)
FROM yourdata
GROUP BY TransmissionID;
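The GROUP BY with MIN and MAX pattern from these answers can be tried on the sample rows with the sketch below. It assumes SQLite, which has no DATEDIFF, so the difference is computed with `julianday()` instead (a swapped-in technique, not what the answers use); that also keeps the fractional seconds. Only the columns needed for the calculation are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_table (ID INTEGER, TransmissionID TEXT, CaptureDateTime TEXT)"
)
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?)",
    [
        (1, "ABCDEF", "2018-09-20 13:45:01.098"),
        (2, "ABCDEF", "2018-09-20 13:45:01.108"),
        (3, "ABCDEF", "2018-09-20 13:45:01.298"),
        (4, "PQRSTU", "2018-09-20 13:45:01.398"),
        (5, "PQRSTU", "2018-09-20 13:45:01.498"),
    ],
)

# julianday() returns fractional days; multiply by 86400 to get seconds.
query = """
SELECT TransmissionID,
       COUNT(*) AS record_count,
       ROUND((julianday(MAX(CaptureDateTime))
              - julianday(MIN(CaptureDateTime))) * 86400.0, 3) AS seconds
FROM log_table
GROUP BY TransmissionID
HAVING COUNT(*) > 1
ORDER BY TransmissionID
"""
spans = conn.execute(query).fetchall()
for transmission_id, record_count, seconds in spans:
    print(transmission_id, record_count, seconds)
```

For the five sample rows this should report roughly 0.2 seconds for ABCDEF and 0.1 seconds for PQRSTU, subject to floating-point rounding in `julianday()`.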

How to sum different criteria in SQL?

I have data that looks like this in Redshift:
+-------------+------------+---------+
| Employee_ID | Manager_ID | Revenue |
+-------------+------------+---------+
| 123 | 123 | 1015.24 |
| 541 | 123 | 5587.23 |
+-------------+------------+---------+
I want to write a query that sums manager revenue whenever a Manager_ID is inputted and sums employee revenue whenever an Employee_ID is inputted. Currently, I have a query that looks like this and I have to run it twice:
SELECT
sum(revenue) as revenue
FROM
employee_rev r
WHERE
r.manager_id in ('123','124') --I change this to employee_ID the second time around
If it helps, there is another table like this:
+-------------+------------------------+
| Employee_ID | Role |
+-------------+------------------------+
| 123 | Manager |
| 541 | Individual Contributor |
+-------------+------------------------+
Thank you so much for your time, this seemed really simple and now I'm pretty frustrated.
I think you can just do:
SELECT sum(revenue) as revenue
FROM employee_rev r
WHERE 123 in (r.employee_id, r.manager_id);
That is, for a given id, look in both columns. An employee should never be in the manager column, so this would appear to do what you want.
EDIT:
For multiple ids, you would have to test independently. Either:
WHERE 123 IN (r.employee_id, r.manager_id) OR
456 IN (r.employee_id, r.manager_id)
or:
WHERE r.employee_id in (123, 456) OR
r.manager_id in (123, 456)
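A quick way to check the "look in both columns" condition is to run it on the two sample rows. The sketch below assumes SQLite rather than Redshift, and the helper `revenue_for` and the `:pid` parameter are hypothetical names added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_rev (employee_id INTEGER, manager_id INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO employee_rev VALUES (?, ?, ?)",
    [(123, 123, 1015.24), (541, 123, 5587.23)],
)


def revenue_for(person_id):
    """Sum revenue for rows where the id appears as employee or as manager."""
    row = conn.execute(
        "SELECT SUM(revenue) FROM employee_rev r "
        "WHERE :pid IN (r.employee_id, r.manager_id)",
        {"pid": person_id},
    ).fetchone()
    return row[0]


manager_total = revenue_for(123)   # both rows: 123 manages 541 and itself
employee_total = revenue_for(541)  # only the row where 541 is the employee
print(manager_total, employee_total)
```

Because each row is tested once, a row where the same id sits in both columns (like the first sample row) is still counted only once.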
Use UNION ALL to combine the two selects into one derived table, then sum it. I think this should work:
SELECT sum(result) from (
SELECT
sum(revenue) as result
FROM
employee_rev r
WHERE
r.manager_id in ('123')
UNION ALL
SELECT
sum(revenue) as result
FROM
employee_rev r
WHERE
r.employee_id in ('124')
) t

Find name of employees hired on different joining date

I wrote a query to find the employees hired on the same date.
This is the query:
select a.name,b.name,a.joining,b.joining from [SportsStore].[dbo].[Employees] a,
[SportsStore].[dbo].[Employees] b where a.joining = b.joining and a.name>b.name
Then a question popped into my mind: how do I find only those employees who were hired on different dates? I tried something like this:
select a.name,b.name,a.joining,b.joining from [SportsStore].[dbo].[Employees] a,
[SportsStore].[dbo].[Employees] b where a.joining != b.joining and a.name>b.name
But then I realized this doesn't make sense. I thought about a subquery, but that won't work either because we are selecting from two tables.
I searched and could not find anything.
So the question is: how do we find the names of employees hired on different joining dates?
JOIN the Employees table with a subquery that counts the employees per joining date.
Filtering with where j.num = 1 returns employees hired on different dates.
Filtering with where j.num > 1 returns employees hired on the same date.
select e.id, e.name, e.joining
from [SportsStore].[dbo].[Employees] e
inner join (select joining, count(*) num
from [SportsStore].[dbo].[Employees]
group by joining) j
on j.joining = e.joining
where j.num = 1;
Result for j.num > 1 (hired on the same date):
+----+------+---------------------+
| id | name | joining |
+----+------+---------------------+
| 1 | abc | 01.01.2017 00:00:00 |
+----+------+---------------------+
| 2 | def | 01.01.2017 00:00:00 |
+----+------+---------------------+
| 5 | mno | 01.01.2017 00:00:00 |
+----+------+---------------------+
Result for j.num = 1 (hired on different dates):
+----+------+---------------------+
| id | name | joining |
+----+------+---------------------+
| 3 | ghi | 02.01.2017 00:00:00 |
+----+------+---------------------+
| 4 | jkl | 03.01.2017 00:00:00 |
+----+------+---------------------+
Can check it here: http://rextester.com/OOO96554
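The count-per-date join can be reproduced with the five sample employees shown in the result tables. The sketch below assumes SQLite, a plain `Employees` table without the `[SportsStore].[dbo]` prefix, and joining dates rewritten as ISO strings (reading 02.01.2017 as day.month.year is itself an assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER, name TEXT, joining TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "abc", "2017-01-01"),
        (2, "def", "2017-01-01"),
        (3, "ghi", "2017-01-02"),
        (4, "jkl", "2017-01-03"),
        (5, "mno", "2017-01-01"),
    ],
)

# Subquery j counts employees per joining date; the outer filter picks
# either the dates used once (num = 1) or the shared dates (num > 1).
query = """
SELECT e.id, e.name, e.joining
FROM Employees e
INNER JOIN (SELECT joining, COUNT(*) AS num
            FROM Employees
            GROUP BY joining) j
        ON j.joining = e.joining
WHERE j.num {op} 1
ORDER BY e.id
"""
unique_dates = conn.execute(query.format(op="=")).fetchall()
shared_dates = conn.execute(query.format(op=">")).fetchall()
print(unique_dates)
print(shared_dates)
```

The `{op}` placeholder is filled from a fixed literal in this sketch; it is only a shortcut to run both variants and should not be fed user input.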
If you just need the names (and not the list of different hiring dates), the following rather simple query should do the job:
select id, name
from employee
group by id, name
having count(distinct joining) > 1
After getting the answer, I found another way to get the same result. Here it is. I hope it's helpful to others, and perhaps someone can explain which approach is better and in what scenario.
select name,joining from [SportsStore].[dbo].[Employees] where joining not in
(
select joining
from [SportsStore].[dbo].[Employees]
group by joining
having count(*)>1
)