Transpose a table with multiple ID rows and different assessment dates - sql

I would like to transpose my table to see trends in the data. The data is formatted as follows:
A UserId can occur multiple times because of different assessment periods. Let's say a user with ID 1 incurred some charges in January, February, and March. There are currently three rows that contain the data from these periods, respectively.
I would like to see everything as one row for each user ID, regardless of the number of periods (up to 12 months).
This would enable me to see and compare changes between assessment periods and attributes.
Current format:
UserId AssessmentDate Attribute1 Attribute2 Attribute3
1 2020-01-01 00:00:00.000 -01:00 20.13 123.11 405.00
1 2021-02-01 00:00:00.000 -01:00 1.03 78.93 11.34
1 2021-03-01 00:00:00.000 -01:00 15.03 310.10 23.15
2 2021-02-01 00:00:00.000 -01:00 14.31 41.30 63.20
2 2021-03-01 00:03:45.000 -01:00 0.05 3.50 1.30
Desired format:
UserId LastAssessmentDate Attribute1_M-2 Attribute2_M-1 ... Attribute3_M0
1 2021-03-01 00:00:00.000 -01:00 20.13 123.11 23.15
2 2021-03-01 00:03:45.000 -01:00 NULL 41.30 1.30
Either SQL or Pandas - both work for me. Thanks for the help!
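One possible approach, sketched under a few assumptions: rank each user's rows from newest to oldest with ROW_NUMBER(), then pivot with conditional aggregation. The table name `assessments`, the helper column `periods_back`, and the use of SQLite through Python's `sqlite3` module are choices made only to keep the example runnable (window functions need SQLite 3.25 or later); the timezone offset is dropped from the sample dates for brevity. Note that "M-1" here means "one assessment back", not "one calendar month back", so a user with gaps between assessments would need a calendar-based rank instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name

# Hypothetical table name; the rows are the sample data from the question.
conn.execute(
    "CREATE TABLE assessments (UserId INTEGER, AssessmentDate TEXT, "
    "Attribute1 REAL, Attribute2 REAL, Attribute3 REAL)"
)
conn.executemany(
    "INSERT INTO assessments VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-01-01 00:00:00", 20.13, 123.11, 405.00),
        (1, "2021-02-01 00:00:00", 1.03, 78.93, 11.34),
        (1, "2021-03-01 00:00:00", 15.03, 310.10, 23.15),
        (2, "2021-02-01 00:00:00", 14.31, 41.30, 63.20),
        (2, "2021-03-01 00:03:45", 0.05, 3.50, 1.30),
    ],
)


def label(back):
    """Column suffix: M0 for the latest assessment, M-1 for the one before, ..."""
    return f"M-{back}" if back else "M0"


# One output column per (attribute, periods back) pair, e.g. "Attribute1_M-2".
# Extend the ranges to cover up to 12 periods.
pivot_columns = ",\n       ".join(
    f'MAX(CASE WHEN periods_back = {back} THEN Attribute{n} END) '
    f'AS "Attribute{n}_{label(back)}"'
    for n in (1, 2, 3)
    for back in (2, 1, 0)
)

query = f"""
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY UserId ORDER BY AssessmentDate DESC
           ) - 1 AS periods_back          -- 0 = latest assessment
    FROM assessments
)
SELECT UserId,
       MAX(AssessmentDate) AS LastAssessmentDate,
       {pivot_columns}
FROM ranked
GROUP BY UserId
ORDER BY UserId
"""

wide_rows = conn.execute(query).fetchall()
for row in wide_rows:
    print(dict(row))
```

Periods a user does not have come out as NULL (None in Python), as for user 2 at M-2.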

Related

Select telemetry data based on relational data in PostgreSQL/TimescaleDB

I am storing some telemetry data from some sensors in an SQL table (PostgreSQL) and I want to know how I can write a query that will group the telemetry data using relational information from two other tables.
I have one table which stores the telemetry data from the sensors. This table contains three fields: one for the timestamp, one for the sensor ID, and one for the value of the sensor at that time. The value column is an incrementing count (it only increases).
Telemetry table
timestamp            sensor_id  value
2022-01-01 00:00:00  5          3
2022-01-01 00:00:01  5          5
2022-01-01 00:00:02  5          6
...                  ...        ...
2022-01-01 01:00:00  5          675
I have another table which stores the state of the sensor, whether it was stationary or in motion and the start/end dates of that particular state for that sensor:
Status table
start_date           end_date             status      sensor_id
2022-01-01 00:00:00  2022-01-01 00:20:00  in_motion   5
2022-01-01 00:20:00  2022-01-01 00:40:00  stationary  5
2022-01-01 00:40:00  2022-01-01 01:00:00  in_motion   5
...                  ...                  ...         ...
The sensor is located at a particular location. The Sensor table stores this metadata:
Sensor table
sensor_id  location_id
5          16
In the final table, I have the shifts that occur in each location.
Shift table
shift    location_id  occurrence_id  start_date           end_date
A Shift  16           123            2022-01-01 00:00:00  2022-01-01 00:30:00
B Shift  16           124            2022-01-01 00:30:00  2022-01-01 01:00:00
...      ...          ...            ...                  ...
I want to write a query so that I can retrieve telemetry data that is grouped both by the shifts at the location of the sensor as well as the status of the sensor:
sensor_id  start_date           end_date             status      shift    value_start  value_end
5          2022-01-01 00:00:00  2022-01-01 00:20:00  in_motion   A Shift  3            250
5          2022-01-01 00:20:00  2022-01-01 00:30:00  stationary  A Shift  25           325
5          2022-01-01 00:30:00  2022-01-01 00:40:00  stationary  B Shift  325          490
5          2022-01-01 00:40:00  2022-01-01 01:00:00  in_motion   B Shift  490          675
As you can see, the telemetry data would be grouped by the information in both the Shift table and the Status table. In particular, the sensor was in the stationary status between 2022-01-01 00:20:00 and 2022-01-01 00:40:00, but the 2nd and 3rd rows of the above table split this period in two because the shift changed at 2022-01-01 00:30:00.
Any idea about how to write a query that can do this? That would be really appreciated, thanks!
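A possible starting point, offered as a sketch rather than a tuned TimescaleDB query: join each status interval to every shift interval it overlaps at the sensor's location, take the later start and the earlier end as the boundaries of the combined segment, and then look up the first and last telemetry reading inside each segment. The example runs on SQLite through Python's `sqlite3` so it is self-contained; in PostgreSQL the two-argument MAX/MIN would be GREATEST/LEAST. The column name `ts` (instead of `timestamp`) and the ten-minute telemetry readings are hypothetical stand-ins for the elided data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE telemetry (ts TEXT, sensor_id INTEGER, value INTEGER);
CREATE TABLE status (start_date TEXT, end_date TEXT, status TEXT, sensor_id INTEGER);
CREATE TABLE sensor (sensor_id INTEGER, location_id INTEGER);
CREATE TABLE shift (shift TEXT, location_id INTEGER, occurrence_id INTEGER,
                    start_date TEXT, end_date TEXT);

-- Hypothetical readings, one every ten minutes.
INSERT INTO telemetry VALUES
  ('2022-01-01 00:00:00', 5, 3),   ('2022-01-01 00:10:00', 5, 120),
  ('2022-01-01 00:20:00', 5, 250), ('2022-01-01 00:30:00', 5, 325),
  ('2022-01-01 00:40:00', 5, 490), ('2022-01-01 00:50:00', 5, 600),
  ('2022-01-01 01:00:00', 5, 675);
INSERT INTO status VALUES
  ('2022-01-01 00:00:00', '2022-01-01 00:20:00', 'in_motion', 5),
  ('2022-01-01 00:20:00', '2022-01-01 00:40:00', 'stationary', 5),
  ('2022-01-01 00:40:00', '2022-01-01 01:00:00', 'in_motion', 5);
INSERT INTO sensor VALUES (5, 16);
INSERT INTO shift VALUES
  ('A Shift', 16, 123, '2022-01-01 00:00:00', '2022-01-01 00:30:00'),
  ('B Shift', 16, 124, '2022-01-01 00:30:00', '2022-01-01 01:00:00');
""")

query = """
WITH segments AS (
    SELECT st.sensor_id,
           MAX(st.start_date, sh.start_date) AS start_date,  -- later start
           MIN(st.end_date, sh.end_date)     AS end_date,    -- earlier end
           st.status,
           sh.shift
    FROM status st
    JOIN sensor se ON se.sensor_id = st.sensor_id
    JOIN shift sh  ON sh.location_id = se.location_id
                  AND sh.start_date < st.end_date   -- intervals overlap
                  AND sh.end_date   > st.start_date
)
SELECT s.sensor_id, s.start_date, s.end_date, s.status, s.shift,
       (SELECT t.value FROM telemetry t
         WHERE t.sensor_id = s.sensor_id
           AND t.ts BETWEEN s.start_date AND s.end_date
         ORDER BY t.ts ASC LIMIT 1)  AS value_start,
       (SELECT t.value FROM telemetry t
         WHERE t.sensor_id = s.sensor_id
           AND t.ts BETWEEN s.start_date AND s.end_date
         ORDER BY t.ts DESC LIMIT 1) AS value_end
FROM segments s
ORDER BY s.sensor_id, s.start_date
"""

segments = conn.execute(query).fetchall()
for row in segments:
    print(row)
```

Both segment boundaries are treated as inclusive when picking readings, so the value at a boundary timestamp ends one segment and starts the next.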

Comparing dates from Multiple rows with the same IDs

I have the following table
ID FromDate ToDate
1 2020-01-01 2020-12-31
1 2021-01-01 2021-12-31
1 2022-03-01 2022-12-31
If the difference between the ToDate of any row and the FromDate of the subsequent row is less than 30 days, then I should get one row with the first row's FromDate and the second row's ToDate.
Below is what I would expect to get:
ID FromDate ToDate
1 2020-01-01 2021-12-31
1 2022-03-01 2022-12-31
Any suggestions would be greatly appreciated
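This is commonly treated as a "gaps and islands" problem. A minimal sketch, assuming a table named `periods` (hypothetical) and using SQLite through Python's `sqlite3` so it runs on its own: flag each row that starts a new group (no previous row, or a gap of 30 days or more), turn the flags into a group number with a running sum, and aggregate per group. In SQL Server the date difference would be DATEDIFF(DAY, ...) instead of julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE periods (ID INTEGER, FromDate TEXT, ToDate TEXT)")
conn.executemany(
    "INSERT INTO periods VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-12-31"),
        (1, "2021-01-01", "2021-12-31"),
        (1, "2022-03-01", "2022-12-31"),
    ],
)

query = """
WITH flagged AS (
    SELECT ID, FromDate, ToDate,
           -- 0 when this row starts less than 30 days after the previous
           -- row's ToDate; 1 otherwise (including the first row, where
           -- LAG returns NULL and the comparison is not true).
           CASE WHEN julianday(FromDate)
                     - julianday(LAG(ToDate) OVER (
                           PARTITION BY ID ORDER BY FromDate)) < 30
                THEN 0 ELSE 1 END AS starts_group
    FROM periods
),
grouped AS (
    SELECT *,
           SUM(starts_group) OVER (PARTITION BY ID ORDER BY FromDate) AS grp
    FROM flagged
)
SELECT ID, MIN(FromDate) AS FromDate, MAX(ToDate) AS ToDate
FROM grouped
GROUP BY ID, grp
ORDER BY 1, 2
"""

merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

The sketch only compares each row with the row immediately before it, which is enough when the periods of one ID do not overlap or nest.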

SQL query for getting data for the last 6 months grouped by month?

I know a basic query to get some results for the last 6 months. Let's say like this:
SELECT *
FROM RANDOM_TABLE
WHERE Date_Column >= DATEADD(MONTH, -6, GETDATE())
But what if I'd like to get results grouped by month - each month looking back 6 months into the past?
The first three rows of a result could ideally look like this (count of IDs is random):
Month_and_year  COUNT(ID)
January 2017    120
February 2017   160
March 2017      240
The last three rows:
Month_and_year  COUNT(ID)
November 2021   80
December 2021   350
January 2021    260
Hope it's understandable.
Thanks in advance!
EDIT:
Over the hours I made a few corrections. Most notably I corrected the self join query to reflect my intentions and also added more details to better explain what is going on.
To my knowledge there are two ways about it (which are probably the same under the hood).
Also, please note that these solutions assume you have a month field already in place. If you have a date or timestamp field, you should take one extra preparation step.
[Addendum] To be more precise, I'd say that the ideal would be to have a date/timestamp field that is truncated/flattened to the first day of the month.
As an example,
month       amount
2021-01-01  50
2021-02-01  20
2021-03-01  10
2021-04-01  100
2021-05-01  20
2021-06-01  40
2021-07-01  80
2021-08-01  50
The first is to use a "self-non-equi join":
SELECT
    a.month,
    SUM(b.amount) AS amount_over_6_months
FROM table AS a
INNER JOIN table AS b
    ON a.month BETWEEN b.month AND DATEADD(MONTH, 5, b.month)
WHERE a.month >= DATEADD(MONTH, -5, GETDATE())
GROUP BY a.month
What happens here is that you are joining the table with itself. Specifically, each row in the (a) alias is joined with up to six rows from the (b) alias: the row for the same month and the rows going back as far as five months prior. So...
a.month     b.month     a.amount  b.amount
2021-01-01  2021-01-01  50        50
2021-02-01  2021-01-01  20        50
2021-02-01  2021-02-01  20        20
2021-03-01  2021-01-01  10        50
2021-03-01  2021-02-01  10        20
2021-03-01  2021-03-01  10        10
2021-04-01  2021-01-01  100       50
2021-04-01  2021-02-01  100       20
2021-04-01  2021-03-01  100       10
2021-04-01  2021-04-01  100       100
2021-05-01  2021-01-01  20        50
2021-05-01  2021-02-01  20        20
2021-05-01  2021-03-01  20        10
2021-05-01  2021-04-01  20        100
2021-05-01  2021-05-01  20        20
2021-06-01  2021-01-01  40        50
2021-06-01  2021-02-01  40        20
2021-06-01  2021-03-01  40        10
2021-06-01  2021-04-01  40        100
2021-06-01  2021-05-01  40        20
2021-06-01  2021-06-01  40        40
2021-07-01  2021-02-01  80        20
2021-07-01  2021-03-01  80        10
2021-07-01  2021-04-01  80        100
2021-07-01  2021-05-01  80        20
2021-07-01  2021-06-01  80        40
2021-07-01  2021-07-01  80        80
...         ...         ...       ...
Then it's just a matter of grouping based on the month in the (a) alias, and summing the amounts coming from the (b) alias.
The advantage of this approach is that it should be vendor and version agnostic, save for the DATEADD() function.
The second solution would be to use window functions. I cannot comment on whether this would work with your vendor and the specific version.
SELECT
    month,
    SUM(amount) OVER (ORDER BY month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW)
FROM table
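To compare the two approaches on the sample data above, here is a small runnable sketch. It uses SQLite through Python's `sqlite3`, so DATEADD(MONTH, 5, b.month) becomes date(b.month, '+5 months'); the table name `monthly_amounts` is hypothetical, and the filter on the current date is left out so the result does not depend on when the code runs. One caveat worth keeping in mind: the window version counts rows rather than months, so it matches the self join only when no month is missing from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_amounts (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO monthly_amounts VALUES (?, ?)",
    [
        ("2021-01-01", 50), ("2021-02-01", 20), ("2021-03-01", 10),
        ("2021-04-01", 100), ("2021-05-01", 20), ("2021-06-01", 40),
        ("2021-07-01", 80), ("2021-08-01", 50),
    ],
)

# Approach 1: self non-equi join. Each month in (a) picks up every row in (b)
# whose month lies between five months earlier and the same month.
self_join = conn.execute("""
    SELECT a.month, SUM(b.amount) AS amount_over_6_months
    FROM monthly_amounts AS a
    INNER JOIN monthly_amounts AS b
        ON a.month BETWEEN b.month AND date(b.month, '+5 months')
    GROUP BY a.month
    ORDER BY a.month
""").fetchall()

# Approach 2: window function over the current row and the five rows before it.
window = conn.execute("""
    SELECT month,
           SUM(amount) OVER (
               ORDER BY month ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
           ) AS amount_over_6_months
    FROM monthly_amounts
    ORDER BY month
""").fetchall()

for row in self_join:
    print(row)
print("same result:", self_join == window)
```

For July 2021, for instance, both queries sum February through July: 20 + 10 + 100 + 20 + 40 + 80 = 270.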

SQL join 3 tables on ID and dates

I have the below test data. There are 3 tables, sales table, sales delivery table and sales delivery months table.
I need to join all the tables together, so that the blue marked rows are connected to the blue marked rows and the red marked rows are connected to the red marked rows.
The join should use the From and To columns that exist in every table, I guess.
Update:
I have tried the following:
SELECT *
FROM Sales co
LEFT JOIN SalesDelivery cd
    ON co.SalesID = cd.SalesID
    AND cd.From BETWEEN co.From AND co.To
    AND cd.To BETWEEN co.From AND co.To
LEFT JOIN SalesDeliveryMonth cdp
    ON cd.SalesDeliveryID = cdp.SalesDeliveryID
    AND cdp.From BETWEEN cd.From AND cd.To
    AND cdp.To BETWEEN cd.From AND cd.To
Sales table:
SalesID Name Revenue From To Current row
100 New CRM 250000.00 1800-01-01 2018-10-03 0
100 New CRM 500000.00 2018-10-03 9999-12-31 1
SalesDelivery table:
SalesID SalesDeliveryID SalesDeliveryName Revenue SalesStart From To Current row
100 AB100 New CRM 250000.00 2018-07-01 1800-01-01 2018-10-03 0
100 AB100 New CRM 500000.00 2018-07-01 2018-10-03 9999-12-31 1
100 ABM100 New CRM - maintenance 0.00 2018-07-01 2018-10-03 9999-12-31 1
SalesDeliveryMonths table:
RevenueMonth Month SalesDeliveryID SalesID From To Current row
833333.3333 2018-07-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-07-01 AB100 100 2018-10-04 9999-12-31 1
833333.3333 2018-08-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-08-01 AB100 100 2018-10-04 9999-12-31 1
833333.3333 2018-09-01 AB100 100 1800-01-01 2018-10-04 0
166666.6667 2018-09-01 AB100 100 2018-10-04 9999-12-31 1

Get value for each date if not exist then take previous last updated value

I have a table with the columns EmployeeID, AccountID, and UpdatedDate. Each row holds the data for an account on a date when it changed.
If there was no change, there is no record for that AccountID on that date. Example:
EmployeeID AccountID UpdatedDate
1775 1 2010-12-04 00:00:00.000
1775 1 2010-08-13 23:59:59.000
1775 1 2010-08-13 00:00:00.000
1775 2 2010-12-04 00:00:00.000
1775 3 2010-12-04 00:00:00.000
1775 4 2010-12-04 00:00:00.000
1775 5 2010-12-04 00:00:00.000
1775 6 2010-12-04 00:00:00.000
1775 7 2010-12-04 00:00:00.000
1775 7 2010-06-29 23:59:59.000
I have to get the value of each account for each person on each day. If there is no value on the current day, it should take the last updated value from a previous day, based on the max UpdatedDate. The result should look like this:
EmployeeID, Date, values of each account.
1775;20120307;45;0;0;0;0;0;0;0;0;0;0;504;0;0;25.0;0.0;0.0;0.0;0.0;0.0;0.0;100;;;;;
Can anyone help me?
You can either create a table of dates and left join against that, or create your dates dynamically as described here: generate days from date range
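A rough sketch of the dynamic-dates variant, with several assumptions: the table name `account_updates` and its `Value` column are hypothetical (the sample in the question shows no value column, and the values below are made up), and SQLite is used through Python's `sqlite3` only to keep the example self-contained. A recursive CTE generates one row per day, each day is paired with every employee/account combination, and a correlated subquery picks the most recent update on or before that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_updates "
    "(EmployeeID INTEGER, AccountID INTEGER, UpdatedDate TEXT, Value REAL)"
)
# Dates follow the question's sample; the Value figures are invented.
conn.executemany(
    "INSERT INTO account_updates VALUES (?, ?, ?, ?)",
    [
        (1775, 1, "2010-08-13 00:00:00", 10.0),
        (1775, 1, "2010-08-13 23:59:59", 12.0),
        (1775, 1, "2010-12-04 00:00:00", 45.0),
        (1775, 7, "2010-06-29 23:59:59", 5.0),
    ],
)

query = """
WITH RECURSIVE calendar(day) AS (
    SELECT :start                      -- first day of the report
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :end
),
accounts AS (
    SELECT DISTINCT EmployeeID, AccountID FROM account_updates
)
SELECT a.EmployeeID, a.AccountID, c.day,
       (SELECT u.Value
          FROM account_updates u
         WHERE u.EmployeeID = a.EmployeeID
           AND u.AccountID  = a.AccountID
           AND date(u.UpdatedDate) <= c.day   -- on or before this day
         ORDER BY u.UpdatedDate DESC          -- latest update wins
         LIMIT 1) AS Value
FROM calendar c
CROSS JOIN accounts a
ORDER BY a.EmployeeID, a.AccountID, c.day
"""

daily = conn.execute(query, {"start": "2010-08-12", "end": "2010-08-15"}).fetchall()
for row in daily:
    print(row)

# Convenience lookup keyed by (AccountID, day).
value_on = {(acc, day): value for _, acc, day, value in daily}
```

Days before an account's first update come out as NULL; pivoting the accounts into one column each per day can then be done with conditional aggregation on top of this result.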