Find mean of difference between dates - SQL Select

I have a table with the details below:
Repid | buildDate           | BuildVersion
---------------------------------------------
1     | 2013-11-15 10:41:00 | 1683
1     | 2013-11-15 11:10:00 | 1684
1     | 2013-11-15 12:14:00 | 1685
2     | 2013-11-15 10:41:00 | 1688
2     | 2013-11-15 11:10:00 | 1689
2     | 2013-11-15 12:14:00 | 1690
For each Repid, I need to find the average difference in hours between successive build versions.

select b1.RepId
,      avg(abs(datediff(hour, b1.buildDate, b2.buildDate)))
from   builds b1
join   builds b2
on     b1.BuildVersion = b2.BuildVersion + 1
       and b1.Repid = b2.Repid
group by
       b1.RepId
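One caveat with the query above: in SQL Server, DATEDIFF(hour, ...) counts the hour boundaries crossed and returns an integer, and AVG over integers stays an integer, so the sample data would give 1 rather than a fractional figure. The join on BuildVersion + 1 also assumes the version numbers have no gaps. The sketch below is not the answer's method. It is a rough illustration in Python with SQLite (an assumption, chosen only so the example is self-contained) that pairs each build with the closest earlier version of the same Repid and measures the gap with SQLite's julianday().

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE builds (Repid INTEGER, buildDate TEXT, BuildVersion INTEGER);
INSERT INTO builds VALUES
  (1, '2013-11-15 10:41:00', 1683),
  (1, '2013-11-15 11:10:00', 1684),
  (1, '2013-11-15 12:14:00', 1685),
  (2, '2013-11-15 10:41:00', 1688),
  (2, '2013-11-15 11:10:00', 1689),
  (2, '2013-11-15 12:14:00', 1690);
""")

# Pair each build with the highest earlier BuildVersion of the same Repid
# (tolerates gaps in the numbering), then average the gap in fractional hours.
# julianday() returns days as a float, so multiplying by 24 gives hours.
query = """
SELECT cur.Repid,
       AVG(ABS(julianday(cur.buildDate) - julianday(prev.buildDate)) * 24.0) AS avg_hours
FROM builds AS cur
JOIN builds AS prev
  ON prev.Repid = cur.Repid
 AND prev.BuildVersion = (SELECT MAX(b.BuildVersion)
                          FROM builds AS b
                          WHERE b.Repid = cur.Repid
                            AND b.BuildVersion < cur.BuildVersion)
GROUP BY cur.Repid
ORDER BY cur.Repid
"""
rows = conn.execute(query).fetchall()
for repid, avg_hours in rows:
    print(repid, round(avg_hours, 3))
```

For the sample data the gaps are 29 and 64 minutes, so each Repid averages 46.5 minutes, about 0.775 hours. In SQL Server the same idea could be expressed with DATEDIFF(minute, ...) / 60.0.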

Related

Deleting observations based on certain conditions in SQL

I currently have the dataset below:
Group | Start | End
:---- | :--------- | :---------
A | 2021-01-01 | 2021-04-05
A | 2021-01-01 | 2021-06-05
A | 2021-03-01 | 2021-06-05
B | 2021-06-13 | 2021-08-05
B | 2021-06-13 | 2021-09-05
B | 2021-07-01 | 2021-09-05
C | 2021-10-07 | 2021-10-17
C | 2021-10-07 | 2021-11-15
C | 2021-11-12 | 2021-11-15
I would like to end up with the following final dataset. Essentially, within each group, I want to remove all observations whose start value does not equal the group's minimum start value.
Group | Start | End
:---- | :--------- | :---------
A | 2021-01-01 | 2021-04-05
A | 2021-01-01 | 2021-06-05
B | 2021-06-13 | 2021-08-05
B | 2021-06-13 | 2021-09-05
C | 2021-10-07 | 2021-10-17
C | 2021-10-07 | 2021-11-15
I tried the following code, but I cannot use MIN in a WHERE clause. Any help would be appreciated.
Delete from #df1
where start != min(start)

If you want to remove all rows that do not have their group's earliest [Start], you can join against a subquery that finds the earliest date per group. You can add further ON conditions if you need to match other rows as well.
DELETE o1
FROM observations o1
INNER JOIN (SELECT MIN([Start]) AS minstart, [Group]
            FROM observations
            GROUP BY [Group]) o2
    ON o1.[Group] = o2.[Group] AND o1.[Start] <> o2.minstart
SELECT *
FROM observations
Group | Start | End
:---- | :--------- | :---------
A | 2021-01-01 | 2021-04-05
A | 2021-01-01 | 2021-06-05
B | 2021-06-13 | 2021-08-05
B | 2021-06-13 | 2021-09-05
C | 2021-10-07 | 2021-10-17
C | 2021-10-07 | 2021-11-15

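You can try this, which deletes every row whose start differs from the earliest start in its own group:
DELETE t
FROM table_name AS t
WHERE t.[Start] <> (SELECT MIN(t2.[Start]) FROM table_name AS t2 WHERE t2.[Group] = t.[Group])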

Another alternative using a CTE:
with keepers as (
    select [Group], min(Start) as mStart
    from #df1
    group by [Group]
)
delete src
from #df1 as src
where not exists (select *
                  from keepers
                  where keepers.[Group] = src.[Group]
                    and keepers.mStart = src.Start)
;
You should try to avoid using reserved words as names, since they have to be quoted or bracketed in every query that uses them.
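The per-group deletion discussed in these answers can be checked end to end with a small script. The following is a minimal sketch in Python with SQLite, not SQL Server, so the syntax differs slightly from the answers above. The names observations, grp, start_date and end_date are stand-ins chosen for the example to sidestep the reserved words.

```python
import sqlite3

# In-memory table with the nine sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (grp TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        ("A", "2021-01-01", "2021-04-05"),
        ("A", "2021-01-01", "2021-06-05"),
        ("A", "2021-03-01", "2021-06-05"),
        ("B", "2021-06-13", "2021-08-05"),
        ("B", "2021-06-13", "2021-09-05"),
        ("B", "2021-07-01", "2021-09-05"),
        ("C", "2021-10-07", "2021-10-17"),
        ("C", "2021-10-07", "2021-11-15"),
        ("C", "2021-11-12", "2021-11-15"),
    ],
)

# Delete every row whose start differs from the earliest start of its own group.
# The correlated subquery recomputes the minimum for the row's group.
conn.execute("""
DELETE FROM observations
WHERE start_date <> (SELECT MIN(o2.start_date)
                     FROM observations AS o2
                     WHERE o2.grp = observations.grp)
""")

remaining = conn.execute(
    "SELECT grp, start_date, end_date FROM observations ORDER BY grp, end_date"
).fetchall()
for row in remaining:
    print(row)
```

ISO-formatted date strings sort in chronological order, which is why plain text comparison is enough here.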

DB2: SQL CASE with group by

I have the following tables:
---SALARY_ITEMS---
PERSONID | EMPLOYMENT_REF | GROUP1 | CODE | FROM | END | QUANTITY
000101 | XYX | 400 | 11101 | 2020-02-12 | 2020-02-12 | 12
000101 | XYX | 300 | 1100 | 2020-01-29 | 2020-02-29 | 1
000102 | XYY | 450 | 11111 | 2020-02-01 | 2020-02-12 | 19
000102 | XYY | 400 | 11101 | 2020-02-02 | 2020-02-12 | 82
000103 | XYA | 500 | 1100 | 2020-02-10 | 2020-02-12 | 11
000104 | XYB | 700 | 1100 | 2020-01-12 | 2020-02-12 | 24
---PERSON---
PERSONID | NAME
000101 | Carolina
000102 | Helen
000103 | Jack
000104 | Anna
---EMPLOYMENT---
PERSONID | EMPLOYMENT_REF | POSITION
000101 | XYX | doctor
000102 | XYY | nurse
000103 | XYA | nurse
000104 | XYB | Professor
---ABSENT---
PERSONID | CODE2 | FROM | END
000101 | 123 | 2020-03-01 | 2020-06-30
000102 | 120 | 2020-02-05 | 2020-02-13
000102 | 123 | 2020-03-01 | 2020-03-28
000103 | 115 | 2020-05-05 | 2020-06-30
000104 | 123 | 2020-02-01 | 2020-05-30
What I am trying to do: get all employees who are doctors or nurses, belong to certain groups with certain codes, and worked over 100 hours in February 2020.
The following SQL query gives me what I want, but I would like to add one thing to it:
a new column showing whether the employee was absent in the same period (February 2020) with absence code 120 or 119, or both.
If so, I want to get the CODE2 value; otherwise 'NOTHING'.
How can I do this in DB2?
This is the result I need to get:
PERSONID | NAME | POSITION | QUANTITY | ABSENT (this is what I want to have)
000102 | Helen | NURSE | 101 | 120
Query:
SELECT
    S.PERSONID, P.NAME, E.POSITION, SUM(S.QUANTITY) AS QUANTITY
FROM
    SALARY_ITEMS S
LEFT JOIN
    PERSON P ON S.PERSONID = P.PERSONID
LEFT JOIN
    EMPLOYMENT E ON E.EMPLOYMENT_REF = S.EMPLOYMENT_REF
WHERE
    S.group1 IN ('400', '440', '450', '470', '640')
    AND S.code IN ('11101', '11111', '11121', '11131', '11141')
    AND S.from >= '2020-02-01'
    AND S.end <= '2020-02-29'
    AND E.POSITION IN ('nurse', 'doctor')
    AND (SELECT SUM(S2.QUANTITY) AS QUANTITY2
         FROM SALARY_ITEMS S2
         WHERE S2.group1 IN ('400', '440', '450', '470', '640')
           AND S2.code IN ('11101', '11111', '11121', '11131', '11141')
           AND S2.from >= '2020-02-01'
           AND S2.end <= '2020-02-29'
           AND S.PERSONID = S2.PERSONID) >= '100'
GROUP BY
    S.PERSONID, P.NAME, E.POSITION
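One way the extra column could be added is a correlated subquery in the SELECT list that looks for an overlapping absence with code 119 or 120 and falls back to 'NOTHING'. The sketch below is only an illustration under assumptions: it runs in Python with SQLite rather than DB2, renames the FROM and END columns to from_date and end_date, stores the codes as text, replaces the correlated SUM filter with an equivalent HAVING clause, and uses SQLite's GROUP_CONCAT to cover the "both codes" case (DB2 would need LISTAGG or similar instead, which is untested here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary_items (personid TEXT, employment_ref TEXT, group1 TEXT,
                           code TEXT, from_date TEXT, end_date TEXT, quantity INTEGER);
INSERT INTO salary_items VALUES
  ('000101', 'XYX', '400', '11101', '2020-02-12', '2020-02-12', 12),
  ('000101', 'XYX', '300', '1100',  '2020-01-29', '2020-02-29', 1),
  ('000102', 'XYY', '450', '11111', '2020-02-01', '2020-02-12', 19),
  ('000102', 'XYY', '400', '11101', '2020-02-02', '2020-02-12', 82),
  ('000103', 'XYA', '500', '1100',  '2020-02-10', '2020-02-12', 11),
  ('000104', 'XYB', '700', '1100',  '2020-01-12', '2020-02-12', 24);
CREATE TABLE person (personid TEXT, name TEXT);
INSERT INTO person VALUES
  ('000101', 'Carolina'), ('000102', 'Helen'), ('000103', 'Jack'), ('000104', 'Anna');
CREATE TABLE employment (personid TEXT, employment_ref TEXT, position TEXT);
INSERT INTO employment VALUES
  ('000101', 'XYX', 'doctor'), ('000102', 'XYY', 'nurse'),
  ('000103', 'XYA', 'nurse'),  ('000104', 'XYB', 'Professor');
CREATE TABLE absent (personid TEXT, code2 TEXT, from_date TEXT, end_date TEXT);
INSERT INTO absent VALUES
  ('000101', '123', '2020-03-01', '2020-06-30'),
  ('000102', '120', '2020-02-05', '2020-02-13'),
  ('000102', '123', '2020-03-01', '2020-03-28'),
  ('000103', '115', '2020-05-05', '2020-06-30'),
  ('000104', '123', '2020-02-01', '2020-05-30');
""")

# The scalar subquery collects matching absence codes that overlap February 2020;
# COALESCE turns "no matching absence" (NULL) into the literal 'NOTHING'.
query = """
SELECT s.personid, p.name, e.position, SUM(s.quantity) AS quantity,
       COALESCE((SELECT GROUP_CONCAT(a.code2)
                 FROM absent AS a
                 WHERE a.personid = s.personid
                   AND a.code2 IN ('119', '120')
                   AND a.from_date <= '2020-02-29'
                   AND a.end_date >= '2020-02-01'), 'NOTHING') AS absent_code
FROM salary_items AS s
JOIN person AS p ON p.personid = s.personid
JOIN employment AS e ON e.employment_ref = s.employment_ref
WHERE s.group1 IN ('400', '440', '450', '470', '640')
  AND s.code IN ('11101', '11111', '11121', '11131', '11141')
  AND s.from_date >= '2020-02-01'
  AND s.end_date <= '2020-02-29'
  AND e.position IN ('nurse', 'doctor')
GROUP BY s.personid, p.name, e.position
HAVING SUM(s.quantity) >= 100
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Using a subquery instead of joining the absence table keeps SUM(quantity) from being inflated when a person has several matching absences.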

Finding a minimum date before another date

Let's say I have two tables. One is a table with information about customer service inquiries, which contains information about the customer and the time the inquiry was placed. The customer's information (in this case, the ID) is saved for all future inquiries.
CUST_ID | INQUIRY_ID | INQUIRY_DATE
:------ | :--------- | :---------------
001 | 34 | 2015-05-03 08:15
001 | 36 | 2015-05-05 13:12
002 | 39 | 2015-05-10 18:43
003 | 42 | 2015-05-12 14:58
003 | 46 | 2015-05-14 07:27
001 | 50 | 2015-05-18 19:06
003 | 55 | 2015-05-20 11:40
The other table contains information about the resolution dates for all customer inquiries.
CUST_ID | RESOLVED_DATE
:------ | :---------------
001 | 2015-05-06 12:54
002 | 2015-05-11 08:09
003 | 2015-05-14 19:37
001 | 2015-05-19 16:12
003 | 2015-05-22 08:40
The resolution table doesn't have a key to link to the inquiry table other than the CUST_ID, so in order to calculate the time to resolution, I want to determine the minimum inquiry date before the resolution for EACH resolution date. The resulting table would look like this:
CUST_ID | FIRST_INQUIRY | RESOLVED_DT
:------ | :--------------- | :---------------
001 | 2015-05-03 08:15 | 2015-05-06 12:54
001 | 2015-05-18 19:06 | 2015-05-19 16:12
002 | 2015-05-10 18:43 | 2015-05-11 08:09
003 | 2015-05-12 14:58 | 2015-05-14 19:37
003 | 2015-05-20 11:40 | 2015-05-22 08:40
At first I just went with min(case when INQUIRY_DATE < RESOLVED_DT), but for customers like 001 and 003, who have multiple inquiries across different dates, the query would just return the first inquiry date ever, not the first one since the previous resolution. Does anyone know how to do this? I'm using Netezza.

One option is to create a subquery for each table (inquiries and resolutions) which numbers the rows for each CUST_ID by date. The two subqueries can then be joined on this ordered index column along with the CUST_ID.
I also used the INQUIRY_ID in the inquiries table to break a tie, should one occur. There is no way to break a tie in the resolutions table for a given customer and date based on the data you showed us.
SELECT t1.CUST_ID, t1.INQUIRY_DATE AS FIRST_INQUIRY, t2.RESOLVED_DATE AS RESOLVED_DT
FROM
(
    SELECT CUST_ID, INQUIRY_ID, INQUIRY_DATE,
        (SELECT COUNT(*) + 1
         FROM inquiries
         WHERE CUST_ID = t.CUST_ID AND INQUIRY_DATE <= t.INQUIRY_DATE
             AND INQUIRY_ID < t.INQUIRY_ID) AS index
    FROM inquiries AS t
) AS t1
INNER JOIN
(
    SELECT CUST_ID, RESOLVED_DATE,
        (SELECT COUNT(*) + 1
         FROM resolutions
         WHERE CUST_ID = t.CUST_ID AND RESOLVED_DATE < t.RESOLVED_DATE) AS index
    FROM resolutions t
) AS t2
    ON t1.CUST_ID = t2.CUST_ID AND t1.index = t2.index
Here is what the subquery tables look like:
inquiries:
CUST_ID | INQUIRY_ID | INQUIRY_DATE | index
:------ | :--------- | :--------------- | :----
001 | 34 | 2015-05-03 08:15 | 1
001 | 36 | 2015-05-05 13:12 | 2
002 | 39 | 2015-05-10 18:43 | 1
003 | 42 | 2015-05-12 14:58 | 1
003 | 46 | 2015-05-14 07:27 | 2
001 | 50 | 2015-05-18 19:06 | 3
003 | 55 | 2015-05-20 11:40 | 3
resolutions:
CUST_ID | RESOLVED_DATE | index
:------ | :--------------- | :----
001 | 2015-05-06 12:54 | 1
002 | 2015-05-11 08:09 | 1
003 | 2015-05-14 19:37 | 1
001 | 2015-05-19 16:12 | 2
003 | 2015-05-22 08:40 | 2
Note that this solution is not robust to missing data, e.g. an inquiry that was never closed, or a resolution that was never recorded.
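Matching by position also breaks down when a customer files several inquiries before one resolution: with the sample data, customer 001's second inquiry (2015-05-05 13:12) would be paired with the second resolution (2015-05-19 16:12). A different technique, sketched here as an assumption rather than as the answer's method, is to take for each resolution the earliest inquiry that falls after the customer's previous resolution and before the current one. The example uses Python with SQLite instead of Netezza, and the table names inquiries and resolutions follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inquiries (cust_id TEXT, inquiry_id INTEGER, inquiry_date TEXT);
INSERT INTO inquiries VALUES
  ('001', 34, '2015-05-03 08:15'),
  ('001', 36, '2015-05-05 13:12'),
  ('002', 39, '2015-05-10 18:43'),
  ('003', 42, '2015-05-12 14:58'),
  ('003', 46, '2015-05-14 07:27'),
  ('001', 50, '2015-05-18 19:06'),
  ('003', 55, '2015-05-20 11:40');
CREATE TABLE resolutions (cust_id TEXT, resolved_date TEXT);
INSERT INTO resolutions VALUES
  ('001', '2015-05-06 12:54'),
  ('002', '2015-05-11 08:09'),
  ('003', '2015-05-14 19:37'),
  ('001', '2015-05-19 16:12'),
  ('003', '2015-05-22 08:40');
""")

# For each resolution: find the customer's previous resolution (if any), then
# take the earliest inquiry strictly between that point and this resolution.
# COALESCE(..., '') makes the lower bound open for a customer's first resolution,
# because the empty string sorts before every timestamp string.
query = """
SELECT r.cust_id,
       (SELECT MIN(i.inquiry_date)
        FROM inquiries AS i
        WHERE i.cust_id = r.cust_id
          AND i.inquiry_date < r.resolved_date
          AND i.inquiry_date > COALESCE(
                (SELECT MAX(r2.resolved_date)
                 FROM resolutions AS r2
                 WHERE r2.cust_id = r.cust_id
                   AND r2.resolved_date < r.resolved_date), '')
       ) AS first_inquiry,
       r.resolved_date AS resolved_dt
FROM resolutions AS r
ORDER BY r.cust_id, r.resolved_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A resolution with no inquiry in its window simply gets a NULL first_inquiry, and an unresolved inquiry does not shift the pairing of later rows.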

SQL - Creating a timeline for each ID (Vertica)

I am dealing with the following problem in SQL (using Vertica):
In short -- Create a timeline for each ID (in a table where I have multiple rows, orders in my example, per ID).
What I would like to achieve -- I have a table of historical order dates and I would like to compute the rates of new customers (first order ever in the past month), active customers (>1 order in the last 1-3 months), passive customers (no order for the last 3-6 months) and inactive customers (no order for >6 months).
Which steps I have taken so far -- I was able to construct a table similar to the example presented below:
CustomerID | Current order date | Time between current/previous order | First order date (all-time)
:--------- | :------------------ | :---------------------------------- | :--------------------------
001 | 2015-04-30 12:06:58 | (null) | 2015-04-30 12:06:58
001 | 2015-09-24 17:30:59 | 147 05:24:01 | 2015-04-30 12:06:58
001 | 2016-02-11 13:21:10 | 139 19:50:11 | 2015-04-30 12:06:58
002 | 2015-10-21 10:38:29 | (null) | 2015-10-21 10:38:29
003 | 2015-05-22 12:13:01 | (null) | 2015-05-22 12:13:01
003 | 2015-07-09 01:04:51 | 47 12:51:50 | 2015-05-22 12:13:01
003 | 2015-10-23 00:23:48 | 105 23:18:57 | 2015-05-22 12:13:01
A little bit of intuition: customer 001 placed three orders from which the second one was 147 days after its first order. Customer 002 has only placed one order in total.
What I think the next steps should be -- For each date (including dates on which a given user did not place an order) and each CustomerID, I would like to know how long it has been since his/her last order. This implies creating some sort of timeline for each CustomerID. In the example presented above I would get 287 lines for each CustomerID (the days between 1st of May 2015 and 11th of February 2016, the timespan of this table). This is the step I am having difficulty with. Once it is done, I want to create fields that show, at each date, the last order date, the period between the last order date and the current date, and the state the customer is in at the current date. For the example presented earlier, this would look something like this:
CustomerID | Last order date | Current date | Time between current date/last order | State
:--------- | :------------------ | :------------------ | :----------------------------------- | :-------
001 | 2015-04-30 12:06:58 | 2015-05-01 00:00:00 | 0 00:00:00 | New
...
001 | 2015-04-30 12:06:58 | 2015-06-30 00:00:00 | 60 11:53:02 | Active
...
001 | 2015-09-24 17:30:59 | 2016-02-01 00:00:00 | 129 11:53:02 | Passive
...
...
002 | 2015-10-21 17:30:59 | 2015-10-22 00:00:00 | 0 06:29:01 | New
...
002 | 2015-10-21 17:30:59 | 2015-11-30 00:00:00 | 39 06:29:01 | Active
...
...
003 | 2015-05-22 12:13:01 | 2015-06-23 00:00:00 | 31 11:46:59 | Active
...
003 | 2015-07-09 01:04:51 | 2015-10-22 00:00:00 | 105 11:46:59 | Inactive
...
The dots stand for all the in-between dates, which I have left out of the table for the sake of space.
Once I know the state of each customer (new/active/passive/inactive) for each date, my plan is to count the states grouped by date, which should give me the number of new, active, passive and inactive customers. From there I can easily compute the rates at each date.
Does anybody know how I can achieve this?
Note -- If anyone has other ideas for achieving the goal presented above (using a different approach from the one I had in mind), please let me know!

EDIT
Suppose you start from a table like this:
SQL> select * from ord order by custid, ord_date ;
custid | ord_date
--------+---------------------
1 | 2015-04-30 12:06:58
1 | 2015-09-24 17:30:59
1 | 2016-02-11 13:21:10
2 | 2015-10-21 10:38:29
3 | 2015-05-22 12:13:01
3 | 2015-07-09 01:04:51
3 | 2015-10-23 00:23:48
(7 rows)
You can use Vertica's time series functions TS_FIRST_VALUE() and TS_LAST_VALUE() to fill the gaps and carry the last order date forward to the current date. They are combined with a Vertica TIMESERIES clause over the same table, with an interval of one day, running from the day each customer placed his/her first order up to now (current_date):
select
    custid,
    status_dt,
    last_order_dt,
    case
        when status_dt::date - last_order_dt::date < 30 then case
            when nord = 1 then 'New' else 'Active' end
        when status_dt::date - last_order_dt::date < 90 then 'Active'
        when status_dt::date - last_order_dt::date < 180 then 'Passive'
        else 'Inactive'
    end as status
from (
    select
        custid,
        last_order_dt,
        status_dt,
        conditional_true_event (first_order_dt is null or
            last_order_dt > lag(last_order_dt))
            over(partition by custid order by status_dt) as nord
    from (
        select
            custid,
            ts_first_value(ord_date) as first_order_dt ,
            ts_last_value(ord_date) as last_order_dt ,
            dt::date as status_dt
        from
            ( select custid, ord_date from ord
              union all
              select distinct(custid) as custid, current_date + 1 as ord_date from ord
            ) z timeseries dt as '1 day' over (partition by custid order by ord_date)
    ) x
) y
where status_dt <= current_date
order by 1, 2
;
And you will get something like this:
custid | status_dt | last_order_dt | status
--------+------------+---------------------+---------
1 | 2015-04-30 | 2015-04-30 12:06:58 | New
1 | 2015-05-01 | 2015-04-30 12:06:58 | New
1 | 2015-05-02 | 2015-04-30 12:06:58 | New
...
1 | 2015-05-29 | 2015-04-30 12:06:58 | New
1 | 2015-05-30 | 2015-04-30 12:06:58 | Active
1 | 2015-05-31 | 2015-04-30 12:06:58 | Active
...
etc.
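For readers without access to Vertica, the same per-day timeline can be approximated with a recursive calendar. The following is a rough sketch in Python with SQLite, not the answer's TIMESERIES approach: a recursive CTE generates one row per day, a correlated subquery looks up each customer's latest order on or before that day, and the 30/90/180-day thresholds are copied from the query above. The fixed end date 2016-02-11 (the last order in the sample) stands in for current_date and is an assumption made so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ord (custid INTEGER, ord_date TEXT);
INSERT INTO ord VALUES
  (1, '2015-04-30 12:06:58'),
  (1, '2015-09-24 17:30:59'),
  (1, '2016-02-11 13:21:10'),
  (2, '2015-10-21 10:38:29'),
  (3, '2015-05-22 12:13:01'),
  (3, '2015-07-09 01:04:51'),
  (3, '2015-10-23 00:23:48');
""")

query = """
WITH RECURSIVE days(status_dt) AS (
    -- calendar: one row per day from the first order to :last_day
    SELECT date(MIN(ord_date)) FROM ord
    UNION ALL
    SELECT date(status_dt, '+1 day') FROM days WHERE status_dt < :last_day
),
timeline AS (
    -- per customer and day: latest order so far and number of orders so far
    SELECT c.custid,
           d.status_dt,
           (SELECT MAX(o.ord_date) FROM ord AS o
             WHERE o.custid = c.custid AND date(o.ord_date) <= d.status_dt) AS last_order_dt,
           (SELECT COUNT(*) FROM ord AS o
             WHERE o.custid = c.custid AND date(o.ord_date) <= d.status_dt) AS nord
    FROM (SELECT DISTINCT custid FROM ord) AS c
    CROSS JOIN days AS d
),
aged AS (
    -- drop days before the customer's first order; compute whole days elapsed
    SELECT custid, status_dt, last_order_dt, nord,
           julianday(status_dt) - julianday(date(last_order_dt)) AS days_since
    FROM timeline
    WHERE last_order_dt IS NOT NULL
)
SELECT custid, status_dt, last_order_dt,
       CASE
           WHEN days_since < 30 AND nord = 1 THEN 'New'
           WHEN days_since < 90 THEN 'Active'
           WHEN days_since < 180 THEN 'Passive'
           ELSE 'Inactive'
       END AS status
FROM aged
ORDER BY custid, status_dt
"""
rows = conn.execute(query, {"last_day": "2016-02-11"}).fetchall()
status = {(custid, day): state for custid, day, _, state in rows}
print(status[(1, "2015-05-29")], status[(1, "2015-05-30")])
```

Counting customers per state and day is then a plain GROUP BY status_dt, status over this result.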

Merge two tables using common fields

I have two tables, and I need to bring data from table 1 into table 2 by matching on customer name and sale date. In the first table the name is split across two columns, but in the other table it is in one column.
> list(CustomerSales.CSV)
[[1]]
  CustomerFirstName CustomerLastName    SaleDate_Time InvoiceNo InvoiceValue
1         Hendricks             Eric 30-09-2015 13:00        10         5000
2              Fier          Marilyn 02-10-2015 15:30        15        18000
3           O'Brien            Donna 03-10-2015 13:30        16        25000
4             Perez           Barney 03-10-2015 16:10        17        20000
5              Fier          Marilyn 04-10-2015 11:10        18         6000
6         Hendricks             Eric 05-10-2015 14:00        19         8000
> list(ReturnSales.CSV)
[[1]]
    CustomerName    SaleDate_Time  ReturnDate_Time ReturnNo ReturnValue
1 Hendricks Eric 05-10-2015 14:00 10-10-2015 14:00        1        1000
2  O'Brien Donna 03-10-2015 13:30 15-10-2015 13:30        2        2000
3   Perez Barney 03-10-2015 16:10 12-10-2015 16:10        3        1500
4   Fier Marilyn 02-10-2015 15:30 08-10-2015 15:30        4        2000
The result should be a table like this.
> list(ReturnSales.CSV)
[[1]]
    CustomerName    SaleDate_Time InvoiceNo InvoiceValue  ReturnDate_Time ReturnNo ReturnValue
1 Hendricks Eric 05-10-2015 14:00        19         8000 10-10-2015 14:00        1        1000
2  O'Brien Donna 03-10-2015 13:30        16        25000 15-10-2015 13:30        2        2000
3   Perez Barney 03-10-2015 16:10        17        20000 12-10-2015 16:10        3        1500
4   Fier Marilyn 02-10-2015 15:30        15        18000 08-10-2015 15:30        4        2000
Table 2's CustomerName and SaleDate_Time should be matched against table 1's CustomerFirstName, CustomerLastName and SaleDate_Time. InvoiceNo and InvoiceValue from table 1 should then be added to table 2.
Any suggestions?

If you are looking for an SQL query for the above scenario, you can use something like the one below.
SELECT RS.CustomerName
      ,CS.SaleDate_Time
      ,CS.InvoiceNo
      ,CS.InvoiceValue
      ,RS.ReturnDate_Time
      ,RS.ReturnNo
      ,RS.ReturnValue
FROM CustomerSales CS
INNER JOIN ReturnSales RS
        ON RS.CustomerName = CS.CustomerFirstName + ' ' + CS.CustomerLastName
WHERE RS.SaleDate_Time = CS.SaleDate_Time
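The join above uses `+` for string concatenation, which is SQL Server syntax. As a self-contained check of the same idea, here is a small sketch in Python with SQLite (an assumption; the question itself appears to use R), where concatenation is written `||`. The table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerSales (CustomerFirstName TEXT, CustomerLastName TEXT,
                            SaleDate_Time TEXT, InvoiceNo INTEGER, InvoiceValue INTEGER);
CREATE TABLE ReturnSales (CustomerName TEXT, SaleDate_Time TEXT,
                          ReturnDate_Time TEXT, ReturnNo INTEGER, ReturnValue INTEGER);
""")
# Parameter binding avoids having to escape the apostrophe in O'Brien.
conn.executemany("INSERT INTO CustomerSales VALUES (?, ?, ?, ?, ?)", [
    ("Hendricks", "Eric", "30-09-2015 13:00", 10, 5000),
    ("Fier", "Marilyn", "02-10-2015 15:30", 15, 18000),
    ("O'Brien", "Donna", "03-10-2015 13:30", 16, 25000),
    ("Perez", "Barney", "03-10-2015 16:10", 17, 20000),
    ("Fier", "Marilyn", "04-10-2015 11:10", 18, 6000),
    ("Hendricks", "Eric", "05-10-2015 14:00", 19, 8000),
])
conn.executemany("INSERT INTO ReturnSales VALUES (?, ?, ?, ?, ?)", [
    ("Hendricks Eric", "05-10-2015 14:00", "10-10-2015 14:00", 1, 1000),
    ("O'Brien Donna", "03-10-2015 13:30", "15-10-2015 13:30", 2, 2000),
    ("Perez Barney", "03-10-2015 16:10", "12-10-2015 16:10", 3, 1500),
    ("Fier Marilyn", "02-10-2015 15:30", "08-10-2015 15:30", 4, 2000),
])

# Match on the rebuilt full name AND the sale timestamp, so a customer with
# several sales (e.g. Hendricks Eric) picks up only the invoice being returned.
query = """
SELECT rs.CustomerName, rs.SaleDate_Time, cs.InvoiceNo, cs.InvoiceValue,
       rs.ReturnDate_Time, rs.ReturnNo, rs.ReturnValue
FROM ReturnSales AS rs
JOIN CustomerSales AS cs
  ON rs.CustomerName = cs.CustomerFirstName || ' ' || cs.CustomerLastName
 AND rs.SaleDate_Time = cs.SaleDate_Time
ORDER BY rs.ReturnNo
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The match is an exact string comparison, so stray whitespace or different capitalisation in either name column would prevent a row from joining.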
