Finding a date from a SUM - sql

I am having trouble finding what date my customers hit a certain threshold in how much money they make.
customer_id | Amount | created_at
---------------------------
1134 | 10 | 01.01.2010
1134 | 15 | 02.01.2010
1134 | 5 | 03.24.2010
1235 | 10 | 01.03.2010
1235 | 15 | 01.03.2010
1235 | 30 | 01.03.2010
1756 | 50 | 01.05.2010
1756 | 100 | 01.25.2010
To determine how much total amount they made I run a simple query like this:
SELECT customer_id, SUM(amount)
FROM table GROUP BY customer_id
But I need to be able to find, e.g., the date a customer hits $100 in total amount.
Any help is greatly appreciated. Thanks!

Jesse,
I believe you are looking for a version of a "running total" calculation.
Take a look at this post: calculate-a-running-total.
There are a number of useful links there.
This article has a lot of code that you could reuse as well: http://www.sqlteam.com/article/calculating-running-totals.

Something like a HAVING clause:
SELECT customer_id, SUM(amount) as Total FROM table GROUP BY customer_id having Total > 100

I'm not sure if MySQL supports subqueries, so take this with a grain of salt:
SELECT customer_id
     , MIN(created_at) AS FirstDate
  FROM ( SELECT customer_id
              , created_at
              , ( SELECT SUM(amount)
                    FROM [Table] t
                   WHERE t.customer_id = [Table].customer_id
                     AND t.created_at <= [Table].created_at
                ) AS RunTot
           FROM [Table]
       ) x
 WHERE x.RunTot >= 100
 GROUP BY customer_id
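For anyone who wants to try the running-total idea on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name payments and the function threshold_dates are made up for illustration, and the dates are rewritten as ISO strings (an assumption) so that plain string comparison follows date order. It is not MySQL-specific and only meant to show the shape of the query.

```python
import sqlite3

# Sample rows from the question; dates converted from MM.DD.YYYY to ISO
# format (an assumption) so that text comparison matches date order.
ROWS = [
    (1134, 10, "2010-01-01"),
    (1134, 15, "2010-02-01"),
    (1134, 5, "2010-03-24"),
    (1235, 10, "2010-01-03"),
    (1235, 15, "2010-01-03"),
    (1235, 30, "2010-01-03"),
    (1756, 50, "2010-01-05"),
    (1756, 100, "2010-01-25"),
]

# Correlated subquery computes a running total per customer; the outer
# query keeps the earliest date on which that total reaches the threshold.
QUERY = """
SELECT customer_id, MIN(created_at) AS first_date
FROM (
    SELECT p.customer_id,
           p.created_at,
           (SELECT SUM(t.amount)
              FROM payments t
             WHERE t.customer_id = p.customer_id
               AND t.created_at <= p.created_at) AS run_tot
      FROM payments p
) x
WHERE x.run_tot >= ?
GROUP BY customer_id
ORDER BY customer_id
"""


def threshold_dates(threshold):
    """Return (customer_id, first_date) for customers reaching the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (customer_id INTEGER, amount INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (threshold,)).fetchall()
    conn.close()
    return result


print(threshold_dates(100))  # [(1756, '2010-01-25')]
```

Customers who never reach the threshold (1134 and 1235 at $100) simply do not appear in the result. Rows sharing the same date are all counted together, so the reported date is the day the total was reached, not a specific transaction.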

Related

Postgresql: how to select from map of multiple values

I have a SOME_DELTA table which records all party related transactions with amount change
Ex.:
PARTY_ID | SOME_DATE | AMOUNT
--------------------------------
party_id_1 | 2019-01-01 | 100
party_id_1 | 2019-01-15 | 30
party_id_1 | 2019-01-15 | -60
party_id_1 | 2019-01-21 | 80
party_id_2 | 2019-01-02 | 50
party_id_2 | 2019-02-01 | 100
I have a case where an MVC controller accepts a map someMap(party_id, some_date), and I need to get a party_id list with the amount summed up to a specific some_date.
In this case, if I send mapOf("party_id_1" to Date(2019 - 1 - 15), "party_id_2" to Date(2019 - 1 - 2))
I should get a list of party_id values with the amount summed up to some_date.
Output should look like:
party_id_1 | 70
party_id_2 | 50
Currently code is:
select sum(amount) from SOME_DELTA where party_id=:partyId and some_date <= :someDate
But in this case I need to iterate through the map and make multiple DB calls, one summed amount per party_id up to its some_date, which feels wrong.
Is there a more elegant way to get this in one select query? (to avoid 100+ DB calls)
You can use a lateral join for this:
select map.party_id,
c.amount
from (
values
('party_id_1', date '2019-01-15'),
('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
join lateral (
select sum(amount) amount
from some_delta sd
where sd.party_id = map.party_id
and sd.some_date <= map.cutoff_date
) c on true
order by map.party_id;
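If you want to experiment with the "one query for the whole map" idea outside PostgreSQL, here is a rough sketch using Python's sqlite3 module. SQLite has no LATERAL, so this swaps in a plain LEFT JOIN plus GROUP BY against a temporary cutoff table. The names cutoff and sums_until are hypothetical, and the dates are stored as ISO text.

```python
import sqlite3

# Sample rows from the question.
DELTAS = [
    ("party_id_1", "2019-01-01", 100),
    ("party_id_1", "2019-01-15", 30),
    ("party_id_1", "2019-01-15", -60),
    ("party_id_1", "2019-01-21", 80),
    ("party_id_2", "2019-01-02", 50),
    ("party_id_2", "2019-02-01", 100),
]


def sums_until(cutoffs):
    """cutoffs maps party_id -> ISO cutoff date; returns [(party_id, total)]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_delta (party_id TEXT, some_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO some_delta VALUES (?, ?, ?)", DELTAS)
    # Load the whole map once (hypothetical helper table), then join to it.
    conn.execute(
        "CREATE TEMP TABLE cutoff (party_id TEXT PRIMARY KEY, cutoff_date TEXT)"
    )
    conn.executemany("INSERT INTO cutoff VALUES (?, ?)", list(cutoffs.items()))
    rows = conn.execute(
        """
        SELECT c.party_id, SUM(sd.amount) AS amount
        FROM cutoff c
        LEFT JOIN some_delta sd
          ON sd.party_id = c.party_id
         AND sd.some_date <= c.cutoff_date
        GROUP BY c.party_id
        ORDER BY c.party_id
        """
    ).fetchall()
    conn.close()
    return rows


print(sums_until({"party_id_1": "2019-01-15", "party_id_2": "2019-01-02"}))
# [('party_id_1', 70), ('party_id_2', 50)]
```

Because of the LEFT JOIN, a party with no rows before its cutoff still shows up, with a NULL (None) total.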

Calculate time span over a number of records

I have a table that has the following schema:
ID | FirstName | Surname | TransmissionID | CaptureDateTime
1 | Billy | Goat | ABCDEF | 2018-09-20 13:45:01.098
2 | Jonny | Cash | ABCDEF | 2018-09-20 13:45:01.108
3 | Sally | Sue | ABCDEF | 2018-09-20 13:45:01.298
4 | Jermaine | Cole | PQRSTU | 2018-09-20 13:45:01.398
5 | Mike | Smith | PQRSTU | 2018-09-20 13:45:01.498
There are well over 70,000 records and they store logs of transmissions to a web-service. What I'd like to know is how I would go about writing a script that selects the distinct TransmissionID values and also shows the timespan between the earliest CaptureDateTime record and the latest record. Essentially, I'd like to see the rate at which the web-service is reading & writing records.
Is it even possible to do so in a single SELECT statement or should I just create a stored procedure or report in code? I don't know where to start aside from SELECT DISTINCT TransmissionID for this sort of query.
Here's what I have so far (I'm stuck on the time calculation)
SELECT DISTINCT [TransmissionID],
COUNT(*) as 'Number of records'
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
I'm not sure how to get the difference between the first and last record with the same TransmissionID. I would like to get a result set like:
TransmissionID | TimeToCompletion | Number of records |
ABCDEF | 2.001 | 5000 |
Simply GROUP BY and use the MIN / MAX functions to find the min/max date in each group, then subtract them:
SELECT
TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime))
FROM yourdata
GROUP BY TransmissionID
HAVING COUNT(*) > 1
Use MIN and MAX to calculate the timespan:
SELECT [TransmissionID],
COUNT(*) as 'Number of records',datediff(s,min(CaptureDateTime),max(CaptureDateTime)) as timespan
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
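The MIN/MAX approach can be tried out on the sample rows with a small sketch like the one below, written against Python's sqlite3 module. SQLite has no DATEDIFF, so this version subtracts julianday() values instead (an assumption that only applies to SQLite), which also keeps sub-second resolution. The function name transmission_spans is made up, and the second timestamp is assumed to be 13:45:01.108.

```python
import sqlite3

# Sample rows from the question (ID, TransmissionID, CaptureDateTime).
LOG_ROWS = [
    (1, "ABCDEF", "2018-09-20 13:45:01.098"),
    (2, "ABCDEF", "2018-09-20 13:45:01.108"),
    (3, "ABCDEF", "2018-09-20 13:45:01.298"),
    (4, "PQRSTU", "2018-09-20 13:45:01.398"),
    (5, "PQRSTU", "2018-09-20 13:45:01.498"),
]


def transmission_spans():
    """Return (TransmissionID, record count, span in seconds) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_table (ID INTEGER, TransmissionID TEXT, CaptureDateTime TEXT)"
    )
    conn.executemany("INSERT INTO log_table VALUES (?, ?, ?)", LOG_ROWS)
    rows = conn.execute(
        """
        SELECT TransmissionID,
               COUNT(*) AS number_of_records,
               -- julianday() returns fractional days; 86400 seconds per day
               ROUND((julianday(MAX(CaptureDateTime))
                    - julianday(MIN(CaptureDateTime))) * 86400.0, 3) AS seconds
        FROM log_table
        GROUP BY TransmissionID
        HAVING COUNT(*) > 1
        ORDER BY TransmissionID
        """
    ).fetchall()
    conn.close()
    return rows


for transmission_id, count, seconds in transmission_spans():
    print(transmission_id, count, seconds)
```

On SQL Server itself, DATEDIFF(second, ...) returns whole seconds only, so a span like the 2.001 in the desired output would need a finer unit such as millisecond, divided by 1000.0.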
A method that returns the average time for all transmissionids, even those with only 1 record:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
FROM yourdata
GROUP BY TransmissionID;
Note that you may not actually want the maximum of the capture date for a given transmissionId. You might want the overall maximum in the table -- so you can consider the final period after the most recent record.
If so, this looks like:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second,
MIN(CaptureDateTime),
MAX(MAX(CaptureDateTime)) OVER ()
) * 1.0 / COUNT(*)
FROM yourdata
GROUP BY TransmissionID;

How to sum different criteria in SQL?

I have data that looks like this in Redshift:
+-------------+------------+---------+
| Employee_ID | Manager_ID | Revenue |
+-------------+------------+---------+
| 123 | 123 | 1015.24 |
| 541 | 123 | 5587.23 |
+-------------+------------+---------+
I want to write a query that sums manager revenue whenever a Manager_ID is inputted and sums employee revenue whenever an Employee_ID is inputted. Currently, I have a query that looks like this and I have to run it twice:
SELECT
sum(revenue) as revenue
FROM
employee_rev r
WHERE
r.manager_id in ('123','124') --I change this to employee_ID the second time around
If it helps, there is another table like this:
+-------------+------------------------+
| Employee_ID | Role |
+-------------+------------------------+
| 123 | Manager |
| 541 | Individual Contributor |
+-------------+------------------------+
Thank you so much for your time, this seemed really simple and now I'm pretty frustrated.
I think you can just do:
SELECT sum(revenue) as revenue
FROM employee_rev r
WHERE 123 in (r.employee_id, r.manager_id);
That is, for a given id, look in both columns. An employee should never be in the manager column, so this would appear to do what you want.
EDIT:
For multiple ids, you would have to test independently. Either:
WHERE 123 IN (r.employee_id, r.manager_id) OR
456 IN (r.employee_id, r.manager_id)
or:
WHERE r.employee_id in (123, 456) OR
r.manager_id in (123, 456)
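Here is a small sketch of the second form (IN lists on both columns) with a variable number of ids, run through Python's sqlite3 module rather than Redshift. The function name revenue_for is hypothetical, and the ids are treated as integers, which is an assumption about the column types.

```python
import sqlite3

# Sample rows from the question (employee_id, manager_id, revenue).
REV_ROWS = [
    (123, 123, 1015.24),
    (541, 123, 5587.23),
]


def revenue_for(ids):
    """Sum revenue for rows where any of the ids is the employee or the manager."""
    ids = list(ids)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_rev (employee_id INTEGER, manager_id INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO employee_rev VALUES (?, ?, ?)", REV_ROWS)
    # One placeholder per id; the same list is bound once for each column.
    marks = ", ".join("?" for _ in ids)
    sql = (
        "SELECT SUM(revenue) FROM employee_rev r "
        f"WHERE r.employee_id IN ({marks}) OR r.manager_id IN ({marks})"
    )
    total = conn.execute(sql, ids + ids).fetchone()[0]
    conn.close()
    return total


print(revenue_for([123]))  # both rows match: 123 manages them
print(revenue_for([541]))  # only the individual contributor's row
```

Each row is counted at most once even if it matches on both columns, because the OR is evaluated per row. An id that matches nothing gives NULL (None) rather than 0.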
Use union to add two selects into one 'table', then sum it. I think this should work
SELECT sum(result) from (
SELECT
sum(revenue) as result
FROM
employee_rev r
WHERE
r.manager_id in ('123')
UNION ALL
SELECT
sum(revenue) as result
FROM
employee_rev r
WHERE
r.employee_id in ('124')
) t

SQL to find max of sum of data in one table, with extra columns

Apologies if this has been asked elsewhere. I have been looking on Stack Overflow all day and haven't found an answer yet. I am struggling to write the query to find the highest month's sales for each state from this example data.
The data looks like this:
| order_id | month | cust_id | state | prod_id | order_total |
+-----------+--------+----------+--------+----------+--------------+
| 67212 | June | 10001 | ca | 909 | 13 |
| 69090 | June | 10011 | fl | 44 | 76 |
... etc ...
My query
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders GROUP BY `month`, `state`
ORDER BY sales;
| month | state | sales |
+------------+--------+--------+
| September | wy | 435 |
| January | wy | 631 |
... etc ...
returns a few hundred rows: the sum of sales for each month for each state. I want it to only return the month with the highest sum of sales, but for each state. It might be a different month for different states.
This query
SELECT `state`, MAX(order_sum) as topmonth
FROM (SELECT `state`, SUM(order_total) order_sum FROM orders GROUP BY `month`,`state`)
GROUP BY `state`;
| state | topmonth |
+--------+-----------+
| ca | 119586 |
| ga | 30140 |
returns the correct number of rows with the correct data. BUT I would also like the query to give me the month column. Whatever I try with GROUP BY, I cannot find a way to limit the results to one record per state. I have tried PartitionBy without success, and have also tried unsuccessfully to do a join.
TL;DR: one query gives me the correct columns but too many rows; the other query gives me the correct number of rows (and the correct data) but insufficient columns.
Any suggestions to make this work would be most gratefully received.
I am using Apache Drill, which is apparently ANSI-SQL compliant. Hopefully that doesn't make much difference - I am assuming that the solution would be similar across all SQL engines.
This one should do the trick
SELECT t1.`month`, t1.`state`, t1.`sales`
FROM (
/* this one selects month, state and sales*/
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
) AS t1
JOIN (
/* this one selects the best value for each state */
SELECT `state`, MAX(sales) AS best_month
FROM (
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
)
GROUP BY `state`
) AS t2
ON t1.`state` = t2.`state` AND
t1.`sales` = t2.`best_month`
It's basically the combination of the two queries you wrote.
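To see the join-on-the-maximum idea in action, here is a minimal sketch using Python's sqlite3 module instead of Apache Drill. The rows are invented for illustration (only the column names come from the question), the function name top_month_per_state is hypothetical, and a CTE is used so the monthly aggregation is written once, which assumes the engine supports WITH.

```python
import sqlite3

# Made-up rows for illustration: (month, state, order_total).
ORDER_ROWS = [
    ("June", "ca", 13),
    ("June", "ca", 20),
    ("July", "ca", 50),
    ("June", "fl", 76),
    ("July", "fl", 10),
]


def top_month_per_state():
    """Return (month, state, sales) for each state's best month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (month TEXT, state TEXT, order_total INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDER_ROWS)
    rows = conn.execute(
        """
        WITH monthly AS (
            -- sales per month per state
            SELECT month, state, SUM(order_total) AS sales
            FROM orders
            GROUP BY month, state
        )
        SELECT m.month, m.state, m.sales
        FROM monthly m
        JOIN (
            -- best monthly total per state
            SELECT state, MAX(sales) AS best_month
            FROM monthly
            GROUP BY state
        ) b
          ON m.state = b.state
         AND m.sales = b.best_month
        ORDER BY m.state
        """
    ).fetchall()
    conn.close()
    return rows


print(top_month_per_state())  # [('July', 'ca', 50), ('June', 'fl', 76)]
```

One thing to keep in mind: if two months tie for the top total in a state, this kind of join returns both rows for that state.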
Try this:
SELECT `month`, `state`, SUM(order_total) FROM orders WHERE `month` IN
( SELECT TOP 1 t.month FROM ( SELECT `month` AS month, SUM(order_total) order_sum FROM orders GROUP BY `month`
ORDER BY order_sum DESC) t)
GROUP BY `month`, state ;

SQL: Select distinct sum of column with max(column)

I have a salary table like this:
id | person_id | start_date | pay
1 | 1234 | 2012-01-01 | 3000
2 | 1234 | 2012-05-01 | 3500
3 | 5678 | 2012-01-01 | 5000
4 | 5678 | 2013-01-01 | 6000
5 | 9101 | 2012-09-01 | 2000
6 | 9101 | 2014-04-01 | 3000
7 | 9101 | 2011-01-01 | 1500
and so on...
Now I want to query the sum of the salaries of a specific month for all persons of a company.
I already have the ids of the persons who worked in the specific month in the specific company, so I can do something like WHERE person_id IN (...)
I have some problems with the salaries query though. The result for e.g. the month 2012-08 should be:
10000
which is 3500+5000+1500.
So I need to find the summed up pay value (for all persons in the IN clause) for the maximum start_date <= the specific month.
I tried various INNER JOINS but it's been a long day and I can't think straight at the moment.
Any hint is highly appreciated.
You need to get the active record. The following does this by calculating the max start date before the month in question:
select sum(s.pay)
from (select person_id, max(start_date) as maxstartdate
from salary
where person_id in ( . . . ) and
start_date < <first day of month of interest>
group by person_id
) p join
salary s
on s.person_id = p.person_id and
s.start_date = p.maxstartdate
You need to fill in the month and list of ids.
You can also do this with ranking functions, but you don't specify which SQL engine you are using.
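Here is a quick sketch of the "active record" query on the sample salary rows, using Python's sqlite3 module since the engine isn't specified. The function name payroll is made up, and the cutoff is passed as an ISO date string for the first day of the month of interest.

```python
import sqlite3

# Sample rows from the question (id, person_id, start_date, pay).
SALARIES = [
    (1, 1234, "2012-01-01", 3000),
    (2, 1234, "2012-05-01", 3500),
    (3, 5678, "2012-01-01", 5000),
    (4, 5678, "2013-01-01", 6000),
    (5, 9101, "2012-09-01", 2000),
    (6, 9101, "2014-04-01", 3000),
    (7, 9101, "2011-01-01", 1500),
]


def payroll(person_ids, first_day_of_month):
    """Sum the pay of each person's latest salary row before the given date."""
    person_ids = list(person_ids)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salary (id INTEGER, person_id INTEGER, start_date TEXT, pay INTEGER)"
    )
    conn.executemany("INSERT INTO salary VALUES (?, ?, ?, ?)", SALARIES)
    marks = ", ".join("?" for _ in person_ids)
    sql = f"""
        SELECT SUM(s.pay)
        FROM (
            -- latest start date per person before the cutoff
            SELECT person_id, MAX(start_date) AS maxstartdate
            FROM salary
            WHERE person_id IN ({marks})
              AND start_date < ?
            GROUP BY person_id
        ) p
        JOIN salary s
          ON s.person_id = p.person_id
         AND s.start_date = p.maxstartdate
    """
    total = conn.execute(sql, person_ids + [first_day_of_month]).fetchone()[0]
    conn.close()
    return total


print(payroll([1234, 5678, 9101], "2012-08-01"))  # 10000 (3500 + 5000 + 1500)
```

Note that with a strict < comparison, a salary starting exactly on the first day of the month is not picked up. If that row should count for the month, using <= (or the first day of the following month) is probably what you want.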
You have to use GROUP BY for these things:
select person_id, sum(pay) from salary where person_id in (...) group by person_id
Maybe this will help you.