Problems with complex query - sql

There are two tables.
In the first I have columns:
id - a person
time - the time of receiving the bonus (timestamp)
money - size of bonus
And the second:
id
time - time of getting a rank (timestamp)
range - military rank (int)
The task is to output the total amount and the number of bonuses received by people while they held the rank of captain (range = 7), aggregated by day.
I have no idea how to build a table with this data. I can summarize the data across all days, for example:
SELECT DISTINCTROW Payment.user_id AS user_id, Sum(IIf(IsNull(Payment.money),0,Payment.money)) AS [Sum - money], Count(Payment.money) AS [Count - Payment], Format(Payment.time, "Short Date") as day
FROM Payment
GROUP BY Payment.user_id, Format (Payment.time, "Short Date")
Having ((Count(Payment.money) > 0));
Can you help me with the second part and with summarizing the results? Thanks.
For example: first table (Payment):
user_id time money
a 01.01.10 00:00:00 15,00
a 01.01.10 10:00:00 2,00
a 03.01.10 00:00:00 3,00
c 04.01.10 00:00:00 4,00
c 04.01.10 00:05:00 5,00
d 06.01.10 00:00:00 6,00
e 07.01.10 00:00:00 7,00
e 08.01.10 00:00:00 8,00
The second one:
user_id time range
a 01.01.10 00:00:00 6
a 01.01.10 09:00:00 7
a 04.01.10 00:00:00 8
b 04.01.10 00:00:00 4
c 04.01.10 00:05:00 7
d 06.01.10 00:00:00 5
e 07.01.10 00:00:00 6
f 08.01.10 00:00:00 6
g 08.01.10 00:00:00 7
I expected:
user_id time sum
a 01.01.10 2
a 03.01.10 3
c 04.01.10 5

Here is one possible method using joins:
select t1.user_id, datevalue(p.time) as [time], sum(p.money) as [sum]
from
(
(select t.user_id, t.time from rank t where t.range = 7) t1
inner join payment p on t1.user_id = p.user_id
)
left join
(select t.user_id, t.time from rank t where t.range > 7) t2 on p.user_id = t2.user_id
where
p.time >= t1.time and (t2.user_id is null or p.time < t2.time)
group by
t1.user_id, datevalue(p.time)
I have assumed that your second table is called rank (this was not stated in your question).
Here, the subquery t1 obtains the set of users with range = 7 (captain), and the subquery t2 obtains the set of users with range > 7. I then select all records with a payment date greater than or equal to the date of promotion to captain, but less than any subsequent promotion (if it exists).
This yields the following result:
+---------+------------+------+
| user_id | time | sum |
+---------+------------+------+
| a | 01/01/2010 | 2.00 |
| a | 03/01/2010 | 3.00 |
| c | 04/01/2010 | 5.00 |
+---------+------------+------+
Unless I have misunderstood, I would argue that your expected result is incorrect as the payment below occurs before user_id = c achieved the rank of captain:
c 04.01.10 00:00:00 4,00
c 04.01.10 00:05:00 7
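As a cross-check of the interval logic described above (a payment counts when it falls on or after the promotion to captain and before any later promotion), here is a minimal Python sketch run against the sample data from the question. The function name captain_bonuses and the tuple layout are illustrative assumptions, and it assumes each user reaches captain at most once. It also returns the number of bonuses per day, which the question asked for.

```python
from collections import defaultdict
from datetime import datetime

CAPTAIN = 7  # rank value for captain, as stated in the question


def ts(text):
    """Parse the 'dd.mm.yy HH:MM:SS' timestamps used in the sample data."""
    return datetime.strptime(text, "%d.%m.%y %H:%M:%S")


payments = [  # (user_id, time, money)
    ("a", ts("01.01.10 00:00:00"), 15.0),
    ("a", ts("01.01.10 10:00:00"), 2.0),
    ("a", ts("03.01.10 00:00:00"), 3.0),
    ("c", ts("04.01.10 00:00:00"), 4.0),
    ("c", ts("04.01.10 00:05:00"), 5.0),
    ("d", ts("06.01.10 00:00:00"), 6.0),
    ("e", ts("07.01.10 00:00:00"), 7.0),
    ("e", ts("08.01.10 00:00:00"), 8.0),
]
ranks = [  # (user_id, time, range)
    ("a", ts("01.01.10 00:00:00"), 6),
    ("a", ts("01.01.10 09:00:00"), 7),
    ("a", ts("04.01.10 00:00:00"), 8),
    ("b", ts("04.01.10 00:00:00"), 4),
    ("c", ts("04.01.10 00:05:00"), 7),
    ("d", ts("06.01.10 00:00:00"), 5),
    ("e", ts("07.01.10 00:00:00"), 6),
    ("f", ts("08.01.10 00:00:00"), 6),
    ("g", ts("08.01.10 00:00:00"), 7),
]


def captain_bonuses(payments, ranks):
    """Return {(user_id, day): (sum, count)} for payments made while captain."""
    # When each user became captain (assumes at most one such row per user).
    start = {user: t for user, t, rank in ranks if rank == CAPTAIN}
    # Earliest promotion beyond captain, if any.
    end = {}
    for user, t, rank in ranks:
        if rank > CAPTAIN:
            end[user] = min(t, end.get(user, t))

    totals = defaultdict(lambda: [0.0, 0])
    for user, t, money in payments:
        if user in start and start[user] <= t and (user not in end or t < end[user]):
            acc = totals[(user, t.date().isoformat())]
            acc[0] += money
            acc[1] += 1
    return {key: tuple(value) for key, value in sorted(totals.items())}
```

Calling captain_bonuses(payments, ranks) on this data gives the same three day totals as the query result above, each with a count of 1.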

Related

Join based on ID and closest date

I have two tables:
Table 1 which contains phone calls (for every CustomerID there is at most one PhoneCall per day):
ActicityID CustomerID PhoneDate
1 A 2019-11-01
2 A 2019-12-01
3 A 2019-12-20
4 B 2019-11-01
5 B 2019-11-20
6 C 2019-11-03
7 D 2019-11-03
8 D 2019-12-01
9 E 2019-11-05
10 F 2019-11-01
Table 2 which contains Orders (OrdDate is the date when the order was placed and BillingDate is the date when the order was charged)
CustomerID OrdDate BillingDate
A 2019-12-03 2019-12-04
A 2019-12-21 2019-12-21
B 2019-11-03 2019-11-10
D 2019-12-02 2019-12-02
F 2019-11-02 2019-11-02
I want to join the tables. The joined table should have the same number of rows as Table 1.
So basically I want to know if there was an order after a phone call. The problem is that if I just join on CustomerID I get an OrdDate and a BillingDate for every customer who has ever made an order. For example, Customer A made an order after the call on 2019-12-01 and after the call on 2019-12-20, but not after the first call.
So my desired output would be
ActicityID CustomerID PhoneDate OrdDate BillingDate
1 A 2019-11-01 NULL NULL
2 A 2019-12-01 2019-12-03 2019-12-04
3 A 2019-12-20 2019-12-21 2019-12-21
4 B 2019-11-01 2019-11-03 2019-11-10
5 B 2019-11-20 NULL NULL
6 C 2019-11-03 NULL NULL
7 D 2019-11-03 NULL NULL
8 D 2019-12-01 2019-12-02 2019-12-02
9 E 2019-11-05 NULL NULL
10 F 2019-11-01 2019-11-02 2019-11-02
I think I need to join on CustomerID and the closest date between PhoneDate and OrdDate but my SQL knowledge is quite limited and I couldn't figure out how to do it.
I think you can do what you want by using lead() to get the next phone date and then just joining:
select a.*, b.orddate, b.billingdate
from (select a.*,
lead(phonedate) over (partition by customerid order by phonedate) as next_pd
from a
) a left join
b
on b.customerid = a.customerid and
b.orddate >= a.phonedate and
(b.orddate < a.next_pd or a.next_pd is null);
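To see what the lead() window does, here is a small Python sketch of the same idea on the sample data: each call owns the interval from its own date up to (but not including) the customer's next call, and an order is attached to the call whose interval contains it. The function name join_calls_to_orders and the tuple layout are assumptions made for illustration.

```python
from datetime import date

calls = [  # (activity_id, customer_id, phone_date)
    (1, "A", date(2019, 11, 1)),
    (2, "A", date(2019, 12, 1)),
    (3, "A", date(2019, 12, 20)),
    (4, "B", date(2019, 11, 1)),
    (5, "B", date(2019, 11, 20)),
    (6, "C", date(2019, 11, 3)),
    (7, "D", date(2019, 11, 3)),
    (8, "D", date(2019, 12, 1)),
    (9, "E", date(2019, 11, 5)),
    (10, "F", date(2019, 11, 1)),
]
orders = [  # (customer_id, ord_date, billing_date)
    ("A", date(2019, 12, 3), date(2019, 12, 4)),
    ("A", date(2019, 12, 21), date(2019, 12, 21)),
    ("B", date(2019, 11, 3), date(2019, 11, 10)),
    ("D", date(2019, 12, 2), date(2019, 12, 2)),
    ("F", date(2019, 11, 2), date(2019, 11, 2)),
]


def join_calls_to_orders(calls, orders):
    """Left-join each call to the orders placed before the customer's next call."""
    # Sort so that each call is directly followed by the same customer's next call;
    # this plays the role of lead(phonedate) over (partition by customerid ...).
    ordered = sorted(calls, key=lambda c: (c[1], c[2]))
    rows = []
    for i, (activity_id, customer, phone_date) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        next_pd = nxt[2] if nxt is not None and nxt[1] == customer else None
        matches = [
            (ord_date, billing_date)
            for cust, ord_date, billing_date in orders
            if cust == customer
            and ord_date >= phone_date
            and (next_pd is None or ord_date < next_pd)
        ]
        # Left-join semantics: keep the call even when no order matches.
        for ord_date, billing_date in matches or [(None, None)]:
            rows.append((activity_id, customer, phone_date, ord_date, billing_date))
    return sorted(rows, key=lambda r: r[0])
```

On the sample data this returns ten rows, one per call. Like the SQL join, it would return more than one row for a call if several orders fell inside the same interval.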
You need to use a sub-query to limit the other table, referencing the TOP 1 associated date record...
SELECT
ActivityID,
CustomerID,
PhoneDate,
(SELECT TOP (1)
OrderDate
FROM
dbo.CustomerBilling AS b
WHERE
a.PhoneDate < OrderDate AND
a.CustomerID = CustomerID
ORDER BY OrderDate) AS OrderDate
FROM
dbo.Activity AS a

Take the last row Group By date

I need to select content statistics grouped by date.
Here is an example of the records:
id cid viewCount created_at
1 1 50 31-12-2018 18:00:00
2 1 50 01-01-2019 18:00:00
3 2 50 01-01-2019 18:00:00
4 2 100 01-01-2019 19:00:00
5 2 150 01-01-2019 20:00:00
6 3 1000 01-01-2019 15:00:00
Need to return :
id cid viewCount date
1 1 50 31-12-2018
2 1 50 01-01-2019
5 2 150 01-01-2019
6 3 1000 01-01-2019
I tried the following code
$qb = $this->createQueryBuilder('c');
$qb->select('a.id as id')
->addSelect('COALESCE(SUM(a.viewCount),0) as viewCount')
->addSelect('DATE_FORMAT(a.createdAt, \'%d-%m-%Y\') as date')
->innerJoin('c.analytics', 'a')
->groupBy('c.cid')
->addGroupBy('date')
->orderBy('a.createdAt', 'ASC');
return:
id cid viewCount date
1 1 50 31-12-2018
2 1 50 01-01-2019
3 2 50 01-01-2019
4 2 100 01-01-2019
5 2 150 01-01-2019
6 3 1000 01-01-2019
I have tried to create a subquery :
$qbLastHour = $this->createQueryBuilder('cc');
$qbLastHour->select('MAX(DATE_FORMAT(aa.createdAt, \'%H\'))')
->innerJoin('cc.analytics', 'aa')
->where('cc.id=c.id')
->groupBy('cc.cid')
->addGroupBy('s');
$qb->addSelect(sprintf("(%s) AS r", $qbLastHour->getDQl()));
But something goes wrong because I don't group by date in the subquery.
Can someone help me? Thank you.
Update
Here is an attempt, in sql again, to select only one row per date and cid based on the max time per day
SELECT id, c.cid, viewCount, max_date
FROM content a
JOIN content_analytic c ON a.id = c.content_id
RIGHT JOIN (SELECT c.cid, DATE_FORMAT(created_at, '%d-%m-%Y') dt, MAX(created_at) max_date
FROM content a
JOIN content_analytic c ON a.id = c.content_id
GROUP BY dt, c.cid) x ON x.max_date = a.created_at and x.cid = c.cid
This is how I believe the query should be in pure sql
SELECT c.cid, COALESCE(SUM(a.viewCount), 0), DATE_FORMAT(a.created_at, '%d-%m-%Y') as date
FROM content a
INNER JOIN content_analytic c ON a.id = c.content_id
GROUP BY c.cid, date
ORDER BY date
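For reference, the "keep only the row with the latest time per cid and day" idea from the update can be sketched in plain Python against the sample records from the question. This is only an illustration of the selection rule, not a Doctrine solution; the function name last_row_per_day and the tuple layout are assumptions.

```python
from datetime import datetime

rows = [  # (id, cid, view_count, created_at)
    (1, 1, 50, datetime(2018, 12, 31, 18, 0)),
    (2, 1, 50, datetime(2019, 1, 1, 18, 0)),
    (3, 2, 50, datetime(2019, 1, 1, 18, 0)),
    (4, 2, 100, datetime(2019, 1, 1, 19, 0)),
    (5, 2, 150, datetime(2019, 1, 1, 20, 0)),
    (6, 3, 1000, datetime(2019, 1, 1, 15, 0)),
]


def last_row_per_day(rows):
    """Keep, for every (cid, calendar day), the row with the latest created_at."""
    latest = {}
    for row in rows:
        _id, cid, _views, created_at = row
        key = (cid, created_at.date())
        if key not in latest or created_at > latest[key][3]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r[0])
```

On the sample data this keeps the rows with ids 1, 2, 5 and 6, which is the result the question asks for.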

Join tables with dates within intervals of 5 min (get avg)

I want to join two tables based on timestamp. The problem is that the two tables don't have exactly the same timestamps, so I want to join them on nearby timestamps using a 5-minute interval.
This query needs to be done using 2 Common table expressions, each common table expression needs to get the timestamps and group them by AVG so they can match
Freezer | Timestamp | Temperature_1
1 2018-04-25 09:45:00 10
1 2018-04-25 09:50:00 11
1 2018-04-25 09:55:00 11
Freezer | Timestamp | Temperature_2
1 2018-04-25 09:46:00 15
1 2018-04-25 09:52:00 13
1 2018-04-25 09:59:00 12
My desired result would be:
Freezer | Timestamp | Temperature_1 | Temperature_2
1 2018-04-25 09:45:00 10 15
1 2018-04-25 09:50:00 11 13
1 2018-04-25 09:55:00 11 12
The current query that i'm working on is:
WITH Temperatures_1 AS (
SELECT Freezer, Temperature_1, Timestamp
FROM TABLE_A
),
Temperatures_2 AS (
SELECT Freezer, Temperature_2, Timestamp
FROM TABLE_B
)
SELECT A.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 as A
RIGHT JOIN Temperatures_2 as B
ON A.FREEZER = B.FREEZER
WHERE A.Timestamp = B.Timestamp
You may want to modify your join criteria instead of filtering the output. Use BETWEEN to bracket the join on the timestamps. I chose +/- 150 seconds because that is 2-1/2 minutes to either side (a 5-minute window to match). You may need something different.
;WITH Temperatures_1 AS (
SELECT Freezer, Temperature_1, Timestamp
FROM TABLE_A
),
Temperatures_2 AS (
SELECT Freezer, Temperature_2, Timestamp
FROM TABLE_B
)
SELECT A.Freezer, A.Timestamp, Temperature_1, Temperature_2
FROM Temperatures_1 as A
RIGHT JOIN Temperatures_2 as B
ON A.FREEZER = B.FREEZER
AND A.Timestamp BETWEEN DATEADD(SECOND, -150, B.Timestamp)
AND DATEADD(SECOND, 150, B.Timestamp)
You should change the join key so that it includes the timestamp, approximated to a 5-minute mark on both tables A and B.
First, check whether the datetime in the left table (A) is less than 2.5 minutes past a 5-minute mark; if so, round it down to that mark. If it is more, round it up to the next 5-minute mark. Do the same for the right table (B). You can do this rounding inside the CTEs, and the right join then stays the same as in your query.
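A minimal Python sketch of the bucketing idea follows, with one deliberate difference that should be read as an assumption: timestamps are floored to the start of their 5-minute interval rather than rounded to the nearest mark. Flooring is what reproduces the desired result in the question, because 09:59 then lands in the 09:55 bucket. Values are averaged per bucket, as the question asks. The function names bucket, average_per_bucket and join_buckets are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def bucket(ts, minutes=5):
    """Floor a timestamp to the start of its interval (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def average_per_bucket(readings):
    """readings: (freezer, timestamp, value) -> {(freezer, bucket): average}."""
    sums = defaultdict(lambda: [0.0, 0])
    for freezer, ts, value in readings:
        acc = sums[(freezer, bucket(ts))]
        acc[0] += value
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def join_buckets(table_a, table_b):
    """Inner-join the two averaged series on (freezer, bucket)."""
    a = average_per_bucket(table_a)
    b = average_per_bucket(table_b)
    return [(f, ts, a[(f, ts)], b[(f, ts)]) for (f, ts) in sorted(a) if (f, ts) in b]


table_a = [  # (freezer, timestamp, temperature_1)
    (1, datetime(2018, 4, 25, 9, 45), 10),
    (1, datetime(2018, 4, 25, 9, 50), 11),
    (1, datetime(2018, 4, 25, 9, 55), 11),
]
table_b = [  # (freezer, timestamp, temperature_2)
    (1, datetime(2018, 4, 25, 9, 46), 15),
    (1, datetime(2018, 4, 25, 9, 52), 13),
    (1, datetime(2018, 4, 25, 9, 59), 12),
]
```

With the sample readings, join_buckets(table_a, table_b) pairs 09:45 with 15, 09:50 with 13 and 09:55 with 12. Note that neither rounding to the nearest mark nor a +/- 150 second window would pair the 09:55 and 09:59 readings, since they are four minutes apart.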

Postgres count number of rows and group them by timestamp

Let's assume I have one table in postgres with just 2 columns:
ID which is PK for the table (bigint)
time which is type of timestamp
Is there a way to count the IDs grouped by the year of time? For example, a time of 18 February 2005 would fall into the 2005 group, so the result would be:
year number of rows
1998 2
2005 5
AND if the number of result rows is smaller than some number (for example 3) SQL will return the result by month
Something like
month number of rows
(February 2018) 5
(March 2018) 2
Is there a nice way to do that in Postgres SQL?
You can do it using window functions (as always).
I use this table:
TABLE times;
id | t
----+-------------------------------
1 | 2018-03-14 20:04:39.81298+01
2 | 2018-03-14 20:04:42.92462+01
3 | 2018-03-14 20:04:45.774615+01
4 | 2018-03-14 20:04:48.877038+01
5 | 2017-03-14 20:05:08.94096+01
6 | 2017-03-14 20:05:16.123736+01
7 | 2017-03-14 20:05:19.91982+01
8 | 2017-01-14 20:05:32.249175+01
9 | 2017-01-14 20:05:35.793645+01
10 | 2017-01-14 20:05:39.991486+01
11 | 2016-11-14 20:05:47.951472+01
12 | 2016-11-14 20:05:52.941504+01
13 | 2016-10-14 21:05:52.941504+02
(13 rows)
First, group by month (subquery per_month).
Then add the sum per year with a window function (subquery with_year).
Finally, use CASE to decide which one you will output and remove duplicates with DISTINCT.
SELECT DISTINCT
CASE WHEN yc > 5
THEN mc
ELSE yc
END AS count,
CASE WHEN yc > 5
THEN to_char(t, 'YYYY-MM')
ELSE to_char(t, 'YYYY')
END AS period
FROM (SELECT
mc,
sum(mc) OVER (PARTITION BY date_trunc('year', t)) AS yc,
t
FROM (SELECT
count(*) AS mc,
date_trunc('month', t) AS t
FROM times
GROUP BY date_trunc('month', t)
) per_month
) with_year
ORDER BY 2;
count | period
-------+---------
3 | 2016
3 | 2017-01
3 | 2017-03
4 | 2018
(4 rows)
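The same decision rule (count per month, sum per year, then report the year total unless it exceeds the threshold) can be sketched in Python to make the CASE logic easier to follow. The function name adaptive_counts is an assumption; the dates mirror the sample table with the time of day and time zone dropped, and the threshold of 5 matches the query above.

```python
from collections import Counter
from datetime import date


def adaptive_counts(dates, threshold=5):
    """Report one count per year, or per month for years above the threshold."""
    per_month = Counter((d.year, d.month) for d in dates)
    per_year = Counter(d.year for d in dates)
    result = {}
    for (year, month), month_count in per_month.items():
        if per_year[year] > threshold:
            result[f"{year:04d}-{month:02d}"] = month_count
        else:
            # Several months of the same year map to one key, like DISTINCT in SQL.
            result[f"{year:04d}"] = per_year[year]
    return sorted(result.items())


sample = (
    [date(2018, 3, 14)] * 4
    + [date(2017, 3, 14)] * 3
    + [date(2017, 1, 14)] * 3
    + [date(2016, 11, 14)] * 2
    + [date(2016, 10, 14)]
)
```

adaptive_counts(sample) yields the four periods shown in the query output: 2016 with 3 rows, 2017-01 and 2017-03 with 3 rows each, and 2018 with 4 rows.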
Just count years. If it's at least 3, then you group by years, else by months:
select
case when (select count(distinct extract(year from time)) from mytable) >= 3 then
to_char(time, 'yyyy')
else
to_char(time, 'yyyy-mm')
end as season,
count(*)
from mytable
group by season
order by season;
(Unlike many other DBMSs, PostgreSQL allows alias names in the GROUP BY clause.)

Select min/max from group defined by one column as subgroup of another - SQL, HPVertica

I'm trying to find the min and max date within a subgroup of another group. Here's example 'data'
ID Type Date
1 A 7/1/2015
1 B 1/1/2015
1 A 8/5/2014
22 B 3/1/2015
22 B 9/1/2014
333 A 8/1/2015
333 B 4/1/2015
333 B 3/29/2014
333 B 2/28/2013
333 C 1/1/2013
What I'd like to identify is - within an ID, what is the min/max Date for each block of similar Type? So for ID # 333 I want the below info:
A: min & max = 8/1/2015
B: min = 2/28/2013
max = 4/1/2015
C: min & max = 1/1/2013
I'm having trouble figuring out how to identify only uninterrupted groupings of Type within a grouping of ID. For ID #1, I need to keep the two 'A' Types with separate min/max dates because they were split by a Type 'B', so I can't just pull the min date of all Type A's for ID #1, it has to be two separate instances.
What I've tried is something like the below two lines, but neither of these accurately captures the case mentioned above for ID #1 where Type B interrupts Type A.
Max(Date) OVER (Partition By ID, Type)
or this:
Row_Number() OVER (Partition By ID, Type ORDER BY Date DESC)
,then selecting Row #1 for max date, and date ASC w/ row #1 for min date
Thank you for any insight you can provide!
If I understand right, you want the min/max values for an id/type grouped using a descending date sort, but the catch is that you want them based on clusters within the id by time.
What you can do is use CONDITIONAL_CHANGE_EVENT to tag the rows on change of type, then use that in your GROUP BY on a standard min/max aggregation.
This would be the intermediate step towards getting to what you want:
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
order by ID, Date desc, Type
ID Type Date cce
1 A 2015-07-01 00:00:00 0
1 B 2015-01-01 00:00:00 1
1 A 2014-08-05 00:00:00 2
22 B 2015-03-01 00:00:00 0
22 B 2014-09-01 00:00:00 0
333 A 2015-08-01 00:00:00 0
333 B 2015-04-01 00:00:00 1
333 B 2014-03-29 00:00:00 1
333 B 2013-02-28 00:00:00 1
333 C 2013-01-01 00:00:00 2
Once you have them tagged with the CCE value, you can aggregate to get the min/max you are looking for, grouping on cce. You can play with the order by at the bottom; this ordering seems to make the most sense to me.
select id, type, min(date), max(date)
from (
select ID, Type, Date,
CONDITIONAL_CHANGE_EVENT(Type) OVER( PARTITION BY ID ORDER BY Date desc) cce
from mytable
group by ID, Type, Date
) x
group by id, type, cce
order by id, 3 desc, 4 desc;
id type min max
1 A 2015-07-01 00:00:00 2015-07-01 00:00:00
1 B 2015-01-01 00:00:00 2015-01-01 00:00:00
1 A 2014-08-05 00:00:00 2014-08-05 00:00:00
22 B 2014-09-01 00:00:00 2015-03-01 00:00:00
333 A 2015-08-01 00:00:00 2015-08-01 00:00:00
333 B 2013-02-28 00:00:00 2015-04-01 00:00:00
333 C 2013-01-01 00:00:00 2013-01-01 00:00:00
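If it helps to see the mechanics outside Vertica, here is a rough Python sketch of the same "tag each change of Type, then aggregate per tag" idea. It only emulates the effect of CONDITIONAL_CHANGE_EVENT on this data; the function name min_max_per_run and the tuple layout are assumptions.

```python
from datetime import date

data = [  # (id, type, date)
    (1, "A", date(2015, 7, 1)),
    (1, "B", date(2015, 1, 1)),
    (1, "A", date(2014, 8, 5)),
    (22, "B", date(2015, 3, 1)),
    (22, "B", date(2014, 9, 1)),
    (333, "A", date(2015, 8, 1)),
    (333, "B", date(2015, 4, 1)),
    (333, "B", date(2014, 3, 29)),
    (333, "B", date(2013, 2, 28)),
    (333, "C", date(2013, 1, 1)),
]


def min_max_per_run(rows):
    """Return (id, type, min_date, max_date) for each uninterrupted run of a type."""
    # Order by id, then by date descending within each id (two stable sorts).
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    ordered.sort(key=lambda r: r[0])

    runs = []
    prev_id = prev_type = None
    for id_, type_, d in ordered:
        if id_ != prev_id or type_ != prev_type:
            # The type (or the id) changed: start a new run, like a new cce value.
            runs.append([id_, type_, d, d])
        else:
            runs[-1][2] = min(runs[-1][2], d)
            runs[-1][3] = max(runs[-1][3], d)
        prev_id, prev_type = id_, type_
    return [tuple(run) for run in runs]
```

For ID 1 this keeps the two Type A rows as separate runs because the Type B row sits between them, and for ID 333 the three Type B rows collapse into one run from 2013-02-28 to 2015-04-01.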