DBFiddle Link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2d7e9a4ddfdc8fb619a8dfc76d767950
Hi. I have one table called 'Model Versions' which has the fields and records as below.
intent_id  intent_name  version  version_created_at  client      sentence
1          a_intent     1        2021-01-01          es_client1  sentence_1
1          a_intent     1        2021-01-01          es_client1  sentence_2
1          a_intent     1        2021-01-01          es_client1  sentence_3
2          b_intent     2        2021-02-01          es_client1  sentence_1
2          b_intent     2        2021-02-01          es_client1  sentence_2
2          b_intent     2        2021-02-01          es_client1  sentence_3
3          c_intent     3        2021-03-01          es_client1  sentence_1
3          c_intent     3        2021-03-01          es_client1  sentence_2
4          d_intent     4        2021-04-01          es_client1  sentence_1
4          d_intent     4        2021-04-01          es_client1  sentence_2
5          e_intent     5        2021-05-01          es_client1  sentence_1
6          g_intent     1        2021-01-01          es_client2  sentence_1
6          g_intent     1        2021-01-01          es_client2  sentence_2
7          h_intent     2        2021-03-01          es_client2  sentence_1
7          h_intent     2        2021-03-01          es_client2  sentence_2
7          h_intent     2        2021-03-01          es_client2  sentence_3
8          i_intent     3        2021-04-01          es_client2  sentence_1
8          i_intent     3        2021-04-01          es_client2  sentence_2
9          j_intent     4        2021-05-01          es_client2  sentence_1
9          j_intent     4        2021-05-01          es_client2  sentence_2
10         k_intent     1        2021-01-01          es_client3  sentence_1
10         k_intent     1        2021-01-01          es_client3  sentence_2
11         k_intent     2        2021-06-01          es_client3  sentence_1
11         k_intent     2        2021-06-01          es_client3  sentence_2
12         k_intent     3        2021-07-01          es_client3  sentence_1
12         k_intent     3        2021-07-01          es_client3  sentence_2
13         k_intent     4        2021-08-01          es_client3  sentence_1
13         k_intent     4        2021-08-01          es_client3  sentence_2
14         k_intent     5        2021-10-01          es_client3  sentence_1
14         k_intent     5        2021-10-01          es_client3  sentence_2
Expected Output:
I wanted to get the top 3 versions of each client along with their respective sentence count. My expected output looks like below:
client      version  total_count_of_sentences_per_version  version_created_at
es_client1  5        1                                     2021-05-01
es_client1  4        2                                     2021-04-01
es_client1  3        2                                     2021-03-01
es_client2  4        2                                     2021-05-01
es_client2  3        2                                     2021-04-01
es_client2  2        3                                     2021-03-01
es_client3  5        2                                     2021-10-01
es_client3  4        2                                     2021-08-01
es_client3  3        2                                     2021-06-01
I tried writing a query with multiple CTEs and PARTITION BY clauses, but none of them worked. I would appreciate your help with this.
You did not specify which 3 versions count as the "top 3". I'll assume you want the 3 latest versions, based on creation date.
My suggestion is to compute ROW_NUMBER() per client as a window function and keep only the first 3 rows.
For instance:
with cte as (
    select
        client,
        version,
        version_created_at,
        count(Sentence) total_count_of_sentences_per_version,
        row_number() over (partition by client order by version_created_at desc) version_row_number
    from model_versions
    group by
        client,
        version,
        version_created_at
)
select
    client,
    version,
    total_count_of_sentences_per_version,
    version_created_at
from cte
where version_row_number <= 3
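To make the grouping and ranking steps concrete, here is a minimal plain-Python sketch of what the query above computes: count the sentences per client and version, then keep the 3 most recently created versions per client. It is not a replacement for the SQL. The `SPEC` list is a compact re-encoding of the question's sample rows, and `top_versions` is a hypothetical helper name; neither appears in the original post.

```python
from collections import Counter, defaultdict

# Compact re-encoding of the question's sample data (an assumption for brevity):
# (client, version, version_created_at, number of sentences)
SPEC = [
    ("es_client1", 1, "2021-01-01", 3), ("es_client1", 2, "2021-02-01", 3),
    ("es_client1", 3, "2021-03-01", 2), ("es_client1", 4, "2021-04-01", 2),
    ("es_client1", 5, "2021-05-01", 1),
    ("es_client2", 1, "2021-01-01", 2), ("es_client2", 2, "2021-03-01", 3),
    ("es_client2", 3, "2021-04-01", 2), ("es_client2", 4, "2021-05-01", 2),
    ("es_client3", 1, "2021-01-01", 2), ("es_client3", 2, "2021-06-01", 2),
    ("es_client3", 3, "2021-07-01", 2), ("es_client3", 4, "2021-08-01", 2),
    ("es_client3", 5, "2021-10-01", 2),
]

# Expand to one row per sentence, mirroring the model_versions table.
rows = [(c, v, d, f"sentence_{i + 1}") for c, v, d, n in SPEC for i in range(n)]


def top_versions(rows, n=3):
    """Return (client, version, sentence_count, created_at) for the n latest versions per client."""
    # Equivalent of GROUP BY client, version, version_created_at with COUNT(sentence).
    counts = Counter((c, v, d) for c, v, d, _ in rows)

    per_client = defaultdict(list)
    for (c, v, d), cnt in counts.items():
        per_client[c].append((v, cnt, d))

    result = []
    for c in sorted(per_client):
        # Equivalent of ROW_NUMBER() OVER (PARTITION BY client ORDER BY created_at DESC) <= n.
        # ISO dates sort correctly as strings.
        latest = sorted(per_client[c], key=lambda t: t[2], reverse=True)[:n]
        result.extend((c, v, cnt, d) for v, cnt, d in latest)
    return result


top = top_versions(rows)
for row in top:
    print(row)
```

With the sample data this yields three rows per client, newest version first.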
You can try this:
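WITH main_tab AS (
    SELECT client,
           version,
           COUNT(*) OVER (PARTITION BY client, version) AS total_count_of_sentences_per_version,
           MIN(version_created_at) OVER (PARTITION BY client, version) AS version_created_at,
           DENSE_RANK() OVER (PARTITION BY client ORDER BY version DESC) AS rn
    FROM model_versions
)
-- DISTINCT collapses the per-sentence rows; rn <= 3 keeps the 3 highest versions per client
SELECT DISTINCT m.*
FROM main_tab m
WHERE m.rn <= 3;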
I am trying to create a dates table in SQL based on a set of inputs, but I haven't been able to figure it out.
I receive the following inputs in SQL:
This table:
Date        Value
2022-01-01  5
2022-07-12  10
2022-11-15  3
A start date = 2022-01-01
A stop date = 2022-12-01
I need to produce a table like the one below, running from the start date to the stop date and assigning to each date in that period the corresponding value from the initial table:
Date        Value
2022-01-01  5
2022-01-02  5
2022-01-03  5
2022-01-04  5
...         5
2022-07-09  5
2022-07-10  5
2022-07-11  5
2022-07-12  10
2022-07-13  10
2022-07-14  10
...         10
2022-11-13  10
2022-11-14  10
2022-11-15  3
2022-11-16  3
2022-11-17  3
2022-11-18  3
How can I do that?
Thanks.
Using the window function lead() over() in concert with an ad-hoc tally table
Example
Select Date = dateadd(DAY, N, A.Date)
      ,A.Value
From  (
        Select *
              ,nDays = datediff(DAY, Date, lead(Date, 1, dateadd(DAY, 1, '2022-12-01')) over (order by Date))
        From  YourTable
      ) A
Join  (
        Select Top 1000 N = -1 + Row_Number() Over (Order By (Select NULL))
        From  master..spt_values n1, master..spt_values n2
      ) B
  on N < nDays
Order by Date
Results
Date Value
2022-01-01 5
2022-01-02 5
2022-01-03 5
2022-01-04 5
2022-01-05 5
...
2022-07-10 5
2022-07-11 5
2022-07-12 10
2022-07-13 10
2022-07-14 10
...
2022-11-12 10
2022-11-13 10
2022-11-14 10
2022-11-15 3
2022-11-16 3
2022-11-17 3
...
2022-11-30 3
2022-12-01 3
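For readers who want to check the expected rows outside the database, the same result can be sketched in plain Python. Note that this is a simple forward fill over a day-by-day loop, not the lead()/tally-table technique used in the query above. The function name `expand` and the variable names are illustrative assumptions.

```python
from datetime import date, timedelta


def expand(changes, start, stop):
    """Return one (day, value) pair per day from start to stop inclusive,
    carrying the most recent value forward."""
    changes = sorted(changes)
    out = []
    idx = -1  # index of the latest change on or before the current day
    day = start
    while day <= stop:
        while idx + 1 < len(changes) and changes[idx + 1][0] <= day:
            idx += 1
        # None if the period starts before the first known value
        value = changes[idx][1] if idx >= 0 else None
        out.append((day, value))
        day += timedelta(days=1)
    return out


# Sample inputs from the question
changes = [(date(2022, 1, 1), 5), (date(2022, 7, 12), 10), (date(2022, 11, 15), 3)]
daily = expand(changes, date(2022, 1, 1), date(2022, 12, 1))

for day, value in daily[:3]:
    print(day, value)
```

The value switches on the day a new row begins (2022-07-12 and 2022-11-15), and the stop date itself is included.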
Given are two series, like this:
#period1
DATE
2020-06-22 310.62
2020-06-26 300.05
2020-09-23 322.64
2020-10-30 326.54
#period2
DATE
2020-06-23 312.05
2020-09-02 357.70
2020-10-12 352.43
2021-01-25 384.39
These two series are correlated to each other, i.e. they each mark either the beginning or the end of a date period. The first series marks the end of a period1 period, the second series marks the end of period2 period. The end of a period2 period is at the same time also the start of a period1 period, and vice versa.
I've been looking for a way to aggregate these periods as date ranges, but apparently this is not easily possible with Pandas dataframes. Suggestions extremely welcome.
In the easiest case, the output layout should reflect the end dates of periods, which period type it was, and the amount of change between start and stop of the period.
Explicit output:
DATE CHG PERIOD
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.0 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
However, if there is any way to actually group by a date range consisting of both a start AND a stop date, that would be much preferable.
Thank you!
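import pandas as pd

p1 = pd.DataFrame(data={'Date': ['2020-06-22', '2020-06-26', '2020-09-23', '2020-10-30'], 'val':[310.62, 300.05, 322.64, 326.54]})
p2 = pd.DataFrame(data={'Date': ['2020-06-23', '2020-09-02', '2020-10-12', '2021-01-25'], 'val':[312.05, 357.7, 352.43, 384.39]})
p1['period'] = 1
p2['period'] = 2
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([p1, p2]).sort_values('Date').reset_index(drop=True)
# absolute change between consecutive period boundaries
df['CHG'] = df['val'].diff(periods=1).abs()
# drop returns a new frame, so assign the result
df = df.drop('val', axis=1)
df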
Output:
Date period CHG
0 2020-06-22 1 NaN
1 2020-06-23 2 1.43
2 2020-06-26 1 12.00
3 2020-09-02 2 57.65
4 2020-09-23 1 35.06
5 2020-10-12 2 29.79
6 2020-10-30 1 25.89
7 2021-01-25 2 57.85
EDIT: matching the format START - STOP - CHANGE - PERIOD
Starting from the above data frame:
df['Start'] = df.Date.shift(periods=1)
df.rename(columns={'Date': 'Stop'}, inplace=True)
df = df[['Start', 'Stop', 'CHG', 'period']]
df
Output:
Start Stop CHG period
0 NaN 2020-06-22 NaN 1
1 2020-06-22 2020-06-23 1.43 2
2 2020-06-23 2020-06-26 12.00 1
3 2020-06-26 2020-09-02 57.65 2
4 2020-09-02 2020-09-23 35.06 1
5 2020-09-23 2020-10-12 29.79 2
6 2020-10-12 2020-10-30 25.89 1
7 2020-10-30 2021-01-25 57.85 2
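The Start/Stop layout can also be checked without pandas. The following is a minimal standard-library sketch, assuming the two series are available as plain dictionaries keyed by ISO date strings; `period_ranges` is a hypothetical helper name. Changes are rounded to 2 decimals to hide floating-point noise.

```python
# Sample series from the question, as plain dicts (an assumption for this sketch)
period1 = {"2020-06-22": 310.62, "2020-06-26": 300.05,
           "2020-09-23": 322.64, "2020-10-30": 326.54}
period2 = {"2020-06-23": 312.05, "2020-09-02": 357.70,
           "2020-10-12": 352.43, "2021-01-25": 384.39}


def period_ranges(p1, p2):
    """Return (start, stop, change, period) tuples ordered by stop date."""
    # Tag every point with its period type; ISO date strings sort chronologically.
    points = sorted(
        [(d, v, 1) for d, v in p1.items()] + [(d, v, 2) for d, v in p2.items()]
    )
    ranges = []
    prev_date, prev_val = None, None
    for d, v, period in points:
        # The first row has no predecessor, so start and change are None.
        chg = None if prev_val is None else round(abs(v - prev_val), 2)
        ranges.append((prev_date, d, chg, period))
        prev_date, prev_val = d, v
    return ranges


ranges = period_ranges(period1, period2)
for r in ranges:
    print(r)
```

Each tuple carries both ends of the range, so later grouping or filtering can use the start date, the stop date, or both.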
# If needed:
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)
df = pd.concat([df1, df2], axis=1)
df.columns = ['start','stop']
df['CNG'] = df.bfill(axis=1)['start'].diff().abs()
df['PERIOD'] = 1
df.loc[df.stop.notna(), 'PERIOD'] = 2
df = df[['CNG', 'PERIOD']]
print(df)
Output:
CNG PERIOD
Date
2020-06-22 NaN 1
2020-06-23 1.43 2
2020-06-26 12.00 1
2020-09-02 57.65 2
2020-09-23 35.06 1
2020-10-12 29.79 2
2020-10-30 25.89 1
2021-01-25 57.85 2
2021-01-29 14.32 1
2021-02-12 22.57 2
2021-03-04 15.94 1
2021-05-07 45.42 2
2021-05-12 16.71 1
2021-09-02 47.78 2
2021-10-04 24.55 1
2021-11-18 41.09 2
2021-12-01 19.23 1
2021-12-10 20.24 2
2021-12-20 15.76 1
2022-01-03 22.73 2
2022-01-27 46.47 1
2022-02-09 26.30 2
2022-02-23 35.59 1
2022-03-02 15.94 2
2022-03-08 21.64 1
2022-03-29 45.30 2
2022-04-29 49.55 1
2022-05-04 17.06 2
2022-05-12 36.72 1
2022-05-17 15.98 2
2022-05-19 18.86 1
2022-06-02 27.93 2
2022-06-17 51.53 1
I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. However, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments (or vice versa).
Therefore, I would like to join T1 on to T2 on ID & on T1Date(+ or - 7 days). There is no other way to match the two tables and to align the records other than using the date range, as this is a poorly designed database. I hope for some help with this as I am stumped.
Here is some sample data:
Table #1:
ID  RAssessmentDate
1   2020-01-03
1   2020-03-03
1   2020-05-03
2   2020-01-09
2   2020-04-09
3   2022-07-21
4   2020-06-30
4   2020-12-30
4   2021-06-30
4   2021-12-30
Table #2:
ID  NAssessmentDate
1   2020-01-07
1   2020-03-02
1   2020-05-03
2   2020-01-09
2   2020-07-06
2   2020-04-10
3   2022-07-21
4   2021-01-03
4   2021-06-28
4   2022-01-02
4   2022-06-26
I would like my end result table to look like this:
ID  RAssessmentDate  NAssessmentDate
1   2020-01-03       2020-01-07
1   2020-03-03       2020-03-02
1   2020-05-03       2020-05-03
2   2020-01-09       2020-01-09
2   2020-04-09       2020-04-10
2   NULL             2020-07-06
3   2022-07-21       2022-07-21
4   2020-06-30       NULL
4   2020-12-30       2021-01-03
4   2021-06-30       2021-06-28
4   2021-12-30       2022-01-02
4   NULL             2022-01-02
Try this:
SELECT
    COALESCE(a.ID, b.ID) ID,
    a.RAssessmentDate,
    b.NAssessmentDate
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RowId, *
    FROM table1
) a
FULL OUTER JOIN (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RowId, *
    FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
   OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
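The query above pairs rows by position within each ID rather than by date proximity. As a point of comparison, here is a minimal plain-Python sketch of the ±7-day matching the question describes, written as a two-pointer merge over the sorted dates of each ID. It is a different technique from the answer's query, and `match_assessments`, `tolerance_days` and the other names are illustrative assumptions. Unmatched dates are kept with `None` on the other side, like a FULL OUTER JOIN.

```python
from collections import defaultdict
from datetime import date, timedelta


def match_assessments(r_rows, n_rows, tolerance_days=7):
    """Pair R and N assessment dates per ID when they lie within tolerance_days."""
    r_by_id, n_by_id = defaultdict(list), defaultdict(list)
    for i, d in r_rows:
        r_by_id[i].append(d)
    for i, d in n_rows:
        n_by_id[i].append(d)

    tol = timedelta(days=tolerance_days)
    out = []
    for i in sorted(set(r_by_id) | set(n_by_id)):
        rs, ns = sorted(r_by_id[i]), sorted(n_by_id[i])
        ri = ni = 0
        while ri < len(rs) or ni < len(ns):
            r = rs[ri] if ri < len(rs) else None
            n = ns[ni] if ni < len(ns) else None
            if r is not None and n is not None and abs(r - n) <= tol:
                out.append((i, r, n))      # dates close enough: one completed review
                ri += 1
                ni += 1
            elif n is None or (r is not None and r < n):
                out.append((i, r, None))   # R assessment with no N partner
                ri += 1
            else:
                out.append((i, None, n))   # N assessment with no R partner
                ni += 1
    return out


def parse(rows):
    return [(i, date.fromisoformat(s)) for i, s in rows]


# Sample data from the question
table1 = parse([(1, "2020-01-03"), (1, "2020-03-03"), (1, "2020-05-03"),
                (2, "2020-01-09"), (2, "2020-04-09"), (3, "2022-07-21"),
                (4, "2020-06-30"), (4, "2020-12-30"), (4, "2021-06-30"),
                (4, "2021-12-30")])
table2 = parse([(1, "2020-01-07"), (1, "2020-03-02"), (1, "2020-05-03"),
                (2, "2020-01-09"), (2, "2020-07-06"), (2, "2020-04-10"),
                (3, "2022-07-21"), (4, "2021-01-03"), (4, "2021-06-28"),
                (4, "2022-01-02"), (4, "2022-06-26")])

pairs = match_assessments(table1, table2)
for row in pairs:
    print(row)
```

On the sample data this produces 12 rows. The unmatched NAssessment for ID 4 comes out as 2022-06-26, the only Table #2 date for that ID without a partner. The greedy merge assumes that assessments for one ID are spaced further apart than the tolerance; if two candidates fall inside the same window, it simply takes the earliest.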
I have the following table:
id name start end
1 Asla 2021-01-01 2021-12-31
1 Asla 2022-01-01 2022-04-15
2 Tina 2021-05-16 2021-09-23
3 Layla 2021-01-01 2021-09-27
3 Layla 2022-01-01 2022-07-18
2 Sim 2020-05-12 2020-08-13
3 Anderas 2021-07-01 2021-09-13
3 Anderas 2021-10-01 2021-11-18
3 Anderas 2022-01-01 2029-11-18
4 Klara 2022-01-01 null
I want to get the people who had work (by date) during 2021 and add a new column showing a status: 'ok' if the person continues to have work in 2022, otherwise 'not ok'; if the person is new, like 'Klara', the status should be 'new'. I also want to show only the last record for every person. Maybe this should also cover the case where end is null?
I tried this:
select w.id ,w.name ,w.start ,w.end, max_date.end
from Work_date w
left join (select * from Work_date where start>='2022-01-01')max_date on max_date.id=id
where w.start>='2021-01-01'
But the problem is that I get this result:
<pre>
id name start end
1 Asla 2021-01-01 null
1 Asla 2022-01-01 2022-04-15
2 Tina 2021-05-16 null
3 Layla 2021-01-01 null
3 Layla 2022-01-01 2022-07-18
3 Anderas 2021-07-01 null
3 Anderas 2021-10-01 2021-11-18
3 Anderas 2022-01-01 null
4 Klara 2022-01-01 null
</pre>
But I want to get this result:
<pre>
id name start end status
1 Asla 2022-01-01 2022-04-15 ok
2 Tina 2021-05-16 2021-09-23 not ok
3 Layla 2022-01-01 2022-07-18 ok
3 Anderas 2022-01-01 2029-11-18 ok
4 Klara 2022-01-01 null ok
</pre>
Looks like you can simply aggregate.
Then use a CASE WHEN for the status.
select
    w.id
  , w.name
  , max(w.start) as start
  , max(w.end) as end
  , case
      when year(max(end)) < 2022 then 'not ok'
      else 'ok'
    end as status
from Work_date w
where w.start >= '2021-01-01'
group by w.id, w.name
order by w.id, max(w.start), max(w.end);
ID  NAME     START       END         STATUS
1   Asla     2022-01-01  2022-04-15  ok
2   Tina     2021-05-16  2021-09-23  not ok
3   Layla    2022-01-01  2022-07-18  ok
3   Anderas  2022-01-01  2029-11-18  ok
4   Klara    2022-01-01  null        ok
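The aggregate-plus-CASE idea can be tried locally with Python's built-in `sqlite3` module. This is a sketch under a few assumptions: it uses SQLite rather than the database from the question, the columns are renamed to `start_date` and `end_date` to avoid the keyword `end`, and the year test is written as a string comparison on ISO dates instead of `year()`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_date (id INTEGER, name TEXT, start_date TEXT, end_date TEXT)"
)
# Sample rows from the question
conn.executemany(
    "INSERT INTO work_date VALUES (?, ?, ?, ?)",
    [
        (1, "Asla", "2021-01-01", "2021-12-31"),
        (1, "Asla", "2022-01-01", "2022-04-15"),
        (2, "Tina", "2021-05-16", "2021-09-23"),
        (3, "Layla", "2021-01-01", "2021-09-27"),
        (3, "Layla", "2022-01-01", "2022-07-18"),
        (2, "Sim", "2020-05-12", "2020-08-13"),
        (3, "Anderas", "2021-07-01", "2021-09-13"),
        (3, "Anderas", "2021-10-01", "2021-11-18"),
        (3, "Anderas", "2022-01-01", "2029-11-18"),
        (4, "Klara", "2022-01-01", None),
    ],
)

query = """
SELECT id,
       name,
       MAX(start_date) AS start_date,
       MAX(end_date)   AS end_date,
       -- A NULL end date makes the comparison NULL, so the row falls through to 'ok'
       CASE WHEN MAX(end_date) < '2022-01-01' THEN 'not ok' ELSE 'ok' END AS status
FROM work_date
WHERE start_date >= '2021-01-01'
GROUP BY id, name
ORDER BY id, MAX(start_date), MAX(end_date)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Sim is filtered out by the WHERE clause because the only record starts in 2020, and Klara is reported as 'ok' because the open-ended record has no end date.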
I have a table RESERVED_BOOKINGS_OVERRIDDEN
booking_product_id on_site_from_dt on_site_to_dt venue_id
4 2021-08-07 16:00:00.000 2021-08-14 10:00:00.000 12
4 2021-08-07 16:00:00.000 2021-08-10 10:00:00.000 12
6 2021-08-02 16:00:00.000 2021-08-09 10:00:00.000 12
and another table ALLOCATED_PRODUCTS
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 3
2021-08-08 00:00:00.000 4 12 3
2021-08-05 00:00:00.000 6 12 1
Now I need to update the ReservedQuant column in the ALLOCATED_PRODUCTS table based on the rows in RESERVED_BOOKINGS_OVERRIDDEN.
ReservedQuant must be reduced by the number of RESERVED_BOOKINGS_OVERRIDDEN rows for which ALLOCATED_PRODUCTS.Date falls between RESERVED_BOOKINGS_OVERRIDDEN.on_site_from_dt and RESERVED_BOOKINGS_OVERRIDDEN.on_site_to_dt and ALLOCATED_PRODUCTS.booking_product_id = RESERVED_BOOKINGS_OVERRIDDEN.booking_product_id.
This should be the state of the data after the update:
Date booking_product_id venue_id ReservedQuant
2021-08-05 00:00:00.000 4 12 3
2021-08-06 00:00:00.000 4 12 3
2021-08-07 00:00:00.000 4 12 1
2021-08-08 00:00:00.000 4 12 1
2021-08-05 00:00:00.000 6 12 0
update a
set a.ReservedQuant = ReservedQuant - (
        select count(1)
        from RESERVED_BOOKINGS_OVERRIDDEN b
        where a.booking_product_id = b.booking_product_id
          and a.date between cast(b.on_site_from_dt as date) and cast(b.on_site_to_dt as date)
    )
from ALLOCATED_PRODUCTS a
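The correlated-subquery update can be checked against the sample data with Python's built-in `sqlite3` module. This is a sketch in SQLite's dialect, which is an assumption: SQLite has no `UPDATE alias ... FROM` form as used above, so the subquery refers to the target table by name, and `date()` stands in for `cast(... as date)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reserved_bookings_overridden (
    booking_product_id INTEGER, on_site_from_dt TEXT, on_site_to_dt TEXT, venue_id INTEGER);
CREATE TABLE allocated_products (
    Date TEXT, booking_product_id INTEGER, venue_id INTEGER, ReservedQuant INTEGER);

INSERT INTO reserved_bookings_overridden VALUES
    (4, '2021-08-07 16:00:00.000', '2021-08-14 10:00:00.000', 12),
    (4, '2021-08-07 16:00:00.000', '2021-08-10 10:00:00.000', 12),
    (6, '2021-08-02 16:00:00.000', '2021-08-09 10:00:00.000', 12);

INSERT INTO allocated_products VALUES
    ('2021-08-05 00:00:00.000', 4, 12, 3),
    ('2021-08-06 00:00:00.000', 4, 12, 3),
    ('2021-08-07 00:00:00.000', 4, 12, 3),
    ('2021-08-08 00:00:00.000', 4, 12, 3),
    ('2021-08-05 00:00:00.000', 6, 12, 1);
""")

# Subtract, for every allocated row, the number of overriding bookings
# for the same product whose on-site date range covers that day.
conn.execute("""
UPDATE allocated_products
SET ReservedQuant = ReservedQuant - (
    SELECT COUNT(*)
    FROM reserved_bookings_overridden AS b
    WHERE b.booking_product_id = allocated_products.booking_product_id
      AND date(allocated_products.Date)
          BETWEEN date(b.on_site_from_dt) AND date(b.on_site_to_dt)
)
""")

result = conn.execute("""
SELECT Date, booking_product_id, ReservedQuant
FROM allocated_products
ORDER BY booking_product_id, Date
""").fetchall()
for row in result:
    print(row)
```

Comparing on the date part matters here: the bookings start at 16:00 on 2021-08-07, so a comparison on the full timestamps would not count the midnight row for that day.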