Summing one tables value and grouping it with a single value of another table - sql

I have two tables.
Table 1: Actuals
A table with all the employess and how many hours they have worked in a given month
| ID | empID | hours | Month |
--------------------------------------
Table 2:
A target table which has hour target per month. The target refers to an amount that the sum of all the hours worked by the employees should meet.
| ID | hours target | Month |
-----------------------------------
Is it possible to return the sum of all table 1's hours with the target of table 2 and group it by the month and return it in a single data set?
Example
| Month | Actual Hours | hours target |
-----------------------------------------
| 1 | 320 | 350 |
etc.
Hope that is clear enough and many thanks for considering the question.

This should work:
SELECT t.[month], sum(a.[hours]) as ActualHours, t.[hourstarget]
FROM [TargetTable] t
JOIN [ActualsTable] a on t.[month] = a.[month]
GROUP BY t.[month], t.[hourstarget]
Written in plain English, you're saying "give me the sum of all hours accrued, grouped by the month (and also include the target hours for that month)".

WITH
t1 AS (SELECT mnth, targetHours FROM tblTargetHours),
t2 AS (SELECT mnth, sum(hours) AS totalhours FROM tblEmployeeHours GROUP BY mnth)
SELECT t1.mnth, t2.totalhours, t1.targethours
FROM t1, t2
WHERE t1.mnth = t2.mnth
results:
mnth totalhours targethours
1 135 350
2 154 350
3 128 350

Related

Running sum of unique users in redshift

I have a table with as follows with user visits by day -
| date | user_id |
|:-------- |:-------- |
| 01/31/23 | a |
| 01/31/23 | a |
| 01/31/23 | b |
| 01/30/23 | c |
| 01/30/23 | a |
| 01/29/23 | c |
| 01/28/23 | d |
| 01/28/23 | e |
| 01/01/23 | a |
| 12/31/22 | c |
I am looking to get a running total of unique user_id for the last 30 days . Here is the expected output -
| date | distinct_users|
|:-------- |:-------- |
| 01/31/23 | 5 |
| 01/30/23 | 4 |
.
.
.
Here is the query I tried -
SELECT date
, SUM(COUNT(DISTINCT user_id)) over (order by date rows between 30 preceding and current row) AS unique_users
FROM mytable
GROUP BY date
ORDER BY date DESC
The problem I am running into is that this query not counting the unique user_id - for instance the result I am getting for 01/31/23 is 9 instead of 5 as it is counting user_id 'a' every time it occurs.
Thank you, appreciate your help!
Not the most performant approach, but you could use a correlated subquery to find the distinct count of users over a window of the past 30 days:
SELECT
date,
(SELECT COUNT(DISTINCT t2.user_id)
FROM mytable t2
WHERE t2.date BETWEEN t1.date - INTERVAL '30 day' AND t1.date) AS distinct_users
FROM mytable t1
ORDER BY date;
There are a few things going on here. First window functions run after group by and aggregation. So COUNT(DISTINCT user_id) gives the count of user_ids for each date then the window function runs. Also, window function set up like this work over the past 30 rows, not 30 days so you will need to fill in missing dates to use them.
As to how to do this - I can only think of the "expand to the data so each date and id has a row" method. This will require a CTE to generate the last 2 years of dates plus 30 days so that the look-back window works for the first dates. Then window over the past 30 days for each user_id and date to see which rows have an example of this user_id within the past 30 days, setting the value to NULL if no uses of the user_id are present within the window. Then Count the user_ids counts (non NULL) grouping by just date to get the number of unique user_ids for that date.
This means expanding the data significantly but I see no other way to get truly unique user_ids over the past 30 days. I can help code this up if you need but will look something like:
WITH RECURSIVE CTE to generate the needed dates,
CTE to cross join these dates with a distinct set of all the user_ids in user for the past 2 years,
CTE to join the date/user_id data set with the table of real data for past 2 years and 30 days and window back counting non-NULL user_ids, partition by date and user_id, order by date, and setting any zero counts to NULL with a DECODE() or CASE statement,
SELECT, grouping by just date count the user_ids by date;

SQL Finding sum of rows and returning count of keys

For a database table looking something like this:
id | year | stint | sv
----+------+-------+---
mk1 | 2001 | 1 | 30
mk1 | 2001 | 2 | 20
ml0 | 1999 | 1 | 43
ml0 | 2000 | 1 | 44
hj2 | 1993 | 1 | 70
I want to get the following output:
count
-------
3
with the conditions being count the number of ids that have a sv > 40 for a single year greater than 1994. If there is more than one stint for the same year, add the sv points and see if > 40.
This is what I have written so far but it is obviously not right:
SELECT COUNT(DISTINCT id),
SUM(sv) as SV
FROM public.pitching
WHERE (year > 1994 AND sv >40);
I know the syntax is completely wrong and some of the conditions' information is missing but I'm not familiar enough with SQL and don't know how to properly do the summing of two rows in the same table with a condition (maybe with a subquery?). Any help would be appreciated! (using postgres)
You could use a nested query to get the aggregations, and wrap that for getting the count. Note that the condition on the sum must be in a having clause:
SELECT COUNT(id)
FROM (
SELECT id,
year,
SUM(sv) as SV
FROM public.pitching
WHERE year > 1994
GROUP BY id,
year
HAVING SUM(sv) > 40 ) years
If an id should only count once even it fulfils the condition in more than one year, then do COUNT(distinct id) instead of COUNT(id)
You can try like following using sum and partition by year.
select count( distinct year) from
(
select year, sum(sv) over (partition by year) s
from public.pitching
where year > 1994
) t where s>40
Online Demo

Calculate time span over a number of records

I have a table that has the following schema:
ID | FirstName | Surname | TransmissionID | CaptureDateTime
1 | Billy | Goat | ABCDEF | 2018-09-20 13:45:01.098
2 | Jonny | Cash | ABCDEF | 2018-09-20 13:45.01.108
3 | Sally | Sue | ABCDEF | 2018-09-20 13:45:01.298
4 | Jermaine | Cole | PQRSTU | 2018-09-20 13:45:01.398
5 | Mike | Smith | PQRSTU | 2018-09-20 13:45:01.498
There are well over 70,000 records and they store logs of transmissions to a web-service. What I'd like to know is how would I go about writing a script that would select the distinct TransmissionID values and also show the timespan between the earliest CaptureDateTime record and the latest record? Essentially I'd like to see what the rate of records the web-service is reading & writing.
Is it even possible to do so in a single SELECT statement or should I just create a stored procedure or report in code? I don't know where to start aside from SELECT DISTINCT TransmissionID for this sort of query.
Here's what I have so far (I'm stuck on the time calculation)
SELECT DISTINCT [TransmissionID],
COUNT(*) as 'Number of records'
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
Not sure how to get the difference between the first and last record with the same TransmissionID I would like to get a result set like:
TransmissionID | TimeToCompletion | Number of records |
ABCDEF | 2.001 | 5000 |
Simply GROUP BY and use MIN / MAX function to find min/max date in each group and subtract them:
SELECT
TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime))
FROM yourdata
GROUP BY TransmissionID
HAVING COUNT(*) > 1
Use min and max to calculate timespan
SELECT [TransmissionID],
COUNT(*) as 'Number of records',datediff(s,min(CaptureDateTime),max(CaptureDateTime)) as timespan
FROM [log_table]
GROUP BY [TransmissionID]
HAVING COUNT(*) > 1
A method that returns the average time for all transmissionids, even those with only 1 record:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second, MIN(CaptureDateTime), MAX(CaptureDateTime)) * 1.0 / NULLIF(COUNT(*) - 1, 0)
FROM yourdata
GROUP BY TransmissionID;
Note that you may not actually want the maximum of the capture date for a given transmissionId. You might want the overall maximum in the table -- so you can consider the final period after the most recent record.
If so, this looks like:
SELECT TransmissionID,
COUNT(*),
DATEDIFF(second,
MIN(CaptureDateTime),
MAX(MAX(CaptureDateTime)) OVER ()
) * 1.0 / COUNT(*)
FROM yourdata
GROUP BY TransmissionID;

Postgres how to determine there are X records spanning 2 days of datetimes in a table

I have a table containing electricity meter readings which looks something like this:
| meter_id | reading_interval_datetime |
| 110 | 2018-01-15T00:00:00+00:00 |
| 110 | 2018-01-15T00:30:00+00:00 |
The table is filled with at most 48 records per day (one reading every 30 mins).
What's an efficient way to check if a particular meter has at least two days of readings in there?
You can determine if a meter_id has at least two days by doing:
select meter_id
from t
group by meter_id
having min(reading_interval_datetime::date) <> max(reading_interval_datetime::date);
This will check that there are two dates in the data.
I would do this:
sql> create index your_table_idx on your_table(meter_id, date(reading_interval_datetime));
sql> select meter_id, date(reading_interval_datetime), count(1)
from your_table
where meter_id = THE_METER_ID_YOUD_LIKE_TO_CHECK
group by meter_id, date(reading_interval_datetime)
having count(1) > 1

SQL: Select distinct sum of column with max(column)

I have a salary table like this:
id | person_id | start_date | pay
1 | 1234 | 2012-01-01 | 3000
2 | 1234 | 2012-05-01 | 3500
3 | 5678 | 2012-01-01 | 5000
4 | 5678 | 2013-01-01 | 6000
5 | 9101 | 2012-09-01 | 2000
6 | 9101 | 2014-04-01 | 3000
7 | 9101 | 2011-01-01 | 1500
and so on...
Now I want to query the sum of the salaries of a specific month for all persons of a company.
I already have the ids of the persons who worked in the specific month in the specific company, so I can do something like WHERE person_id IN (...)
I have some problems with the salaries query though. The result for e.g. the month 2012-08 should be:
10000
which is 3500+5000+1500.
So I need to find the summed up pay value (for all persons in the IN clause) for the maximum start_date <= the specific month.
I tried various INNER JOINS but it's been a long day and I can't think straight at the moment.
Any hint is highly appreciated.
You need to get the active record. This following does this by calculating the max start date before the month in question:
select sum(s.pay)
from (select person_id, max(start_date) as maxstartdate
from salary
where person_id in ( . . . ) and
start_date < <first day of month of interest>
group by person_id
) p join
salary s
on s.person_id = p.person_id and
s.maxstartdate = p.start_date
You need to fill in the month and list of ids.
You can also do this with ranking functions, but you don't specify which SQL engine you are using.
You have to use group by for these things....
select person_id,sum(pay) from salary where person_id in(...) group by person_id
may it will helps you.....