SQL Count project items where project status history is within date range

I have 4 tables:
projects: id, title, current_status_id
statuses: id, label
status_history: project_id, status_id, created_at
messages: id, project_id, body, created_at
A status_history row is inserted when, in the application, the project changes status (say, from "lead" to "active" to "complete"). Note the created_at column is a timestamp that records the date of the change. Between status changes, activity is happening in the project and messages are created. For example, the project is initialized with a "lead" status, some messages are created while the project is in this "lead" state, the project is changed to "active" status, some messages are created while the project is in this state, and so on.
I want to create a query that shows: date, # of messages created in "lead" projects, # of messages created in "active" projects, and # of messages created in projects with other statuses. Can this all be done in one query? I am using PostgreSQL.
Here is some pseudo-code that hopefully illuminates what I'm looking for.
* Start at the earliest date
* Find all projects whose status was 'lead' on that date
* Count the number of created messages from these projects with that date
* Find all projects whose status was 'active' on that date
* Count the number of created messages from these projects with that date
* Find all projects whose status was anything else on that date
* Count the number of created messages from these projects with that date
* ... some projects change status, some stay the same, business happens ...
* Go to next date
* Find all projects whose status was 'lead' on that date
* Count the number of created messages from these projects with that date
* Find all projects whose status was 'active' on that date
* Count the number of created messages from these projects with that date
* Find all projects whose status was anything else on that date
* Count the number of created messages from these projects with that date
* ... some projects change status, some stay the same, business happens ...
* keep doing this until the present
While the project does have a current_status_id column, it is the present status and not necessarily the status of the project last month. The status of the project does not change every day - a status_history row is not created every day for every project.

You are looking for a query like the one below. It was written for SQL Server, but the PostgreSQL syntax should be very similar.
SELECT count(*) AS message_count, messages.created_at, statuses.label
FROM messages
JOIN projects ON projects.id = messages.project_id
JOIN status_history ON projects.id = status_history.project_id
JOIN statuses ON statuses.id = status_history.status_id
-- Note: this join matches every status_history row of the project, not only the status in effect when the message was created.
GROUP BY messages.created_at, statuses.label

Try the below.
Replace "lead" and "active" with the status IDs for those 2 statuses.
Note that the first field being selected is a conversion of your created_at timestamp to a date value (removing time).
The counts show the number of projects newly created with those statuses. They do not include projects that already existed and changed to those statuses on the given days. This is accomplished via the not exists subquery.
select date(created_at) as dt
, sum(case when sh.status_id = 'lead' then 1 else 0 end) as num_lead
, sum(case when sh.status_id = 'active' then 1 else 0 end) as num_active
, sum(case when sh.status_id not in ('lead','active') then 1 else 0 end) as num_else
from status_history sh
where not exists
( select 1
from status_history x
where x.project_id = sh.project_id
and x.created_at < sh.created_at )
group by date(created_at)
order by 1

What about:
SELECT to_char(tmp.date, 'YYYY-MM-DD') AS date,
SUM(CASE WHEN tmp.status = 'lead' THEN tmp.messages ELSE 0 END) AS num_lead,
SUM(CASE WHEN tmp.status = 'active' THEN tmp.messages ELSE 0 END) AS num_active
FROM
(
-- messages per day and status; note that this uses the project's current status
SELECT m.created_at::date AS date, COUNT(m.id) AS messages, s.label AS status FROM messages AS m
INNER JOIN projects AS p ON p.id = m.project_id
INNER JOIN statuses AS s ON s.id = p.current_status_id
GROUP BY m.created_at::date, s.id, s.label
) AS tmp
GROUP BY tmp.date;
The inner query groups by both s.id and s.label because label is not a primary key, so one label is not guaranteed to belong to exactly one id.
The derived table holds the number of messages per date and project status label; the outer SELECT only pivots those rows into columns.
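None of the queries above looks up the status that was in effect when each message was created. One way to do that is to join each message to the latest status_history row at or before the message's created_at, then pivot per day. The following is a minimal sketch, not a definitive solution: it uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL so that it runs on its own, the sample rows are invented for illustration, and the projects table is left out because the query does not need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statuses (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE status_history (project_id INTEGER, status_id INTEGER, created_at TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, project_id INTEGER, body TEXT, created_at TEXT);

-- Hypothetical sample data (not from the question)
INSERT INTO statuses VALUES (1, 'lead'), (2, 'active'), (3, 'complete');
INSERT INTO status_history VALUES
  (1, 1, '2020-01-01 09:00:00'),
  (1, 2, '2020-01-02 12:00:00'),
  (2, 1, '2020-01-01 10:00:00'),
  (2, 3, '2020-01-03 08:00:00');
INSERT INTO messages (project_id, body, created_at) VALUES
  (1, 'a', '2020-01-01 10:00:00'),
  (1, 'b', '2020-01-02 09:00:00'),
  (1, 'c', '2020-01-02 13:00:00'),
  (2, 'd', '2020-01-02 11:00:00'),
  (2, 'e', '2020-01-03 09:00:00');
""")

QUERY = """
SELECT date(m.created_at) AS dt,
       SUM(CASE WHEN s.label = 'lead' THEN 1 ELSE 0 END) AS num_lead,
       SUM(CASE WHEN s.label = 'active' THEN 1 ELSE 0 END) AS num_active,
       SUM(CASE WHEN s.label NOT IN ('lead', 'active') THEN 1 ELSE 0 END) AS num_else
FROM messages m
JOIN status_history sh
  ON sh.project_id = m.project_id
 -- keep only the status row in effect when the message was created
 AND sh.created_at = (SELECT MAX(x.created_at)
                      FROM status_history x
                      WHERE x.project_id = m.project_id
                        AND x.created_at <= m.created_at)
JOIN statuses s ON s.id = sh.status_id
GROUP BY date(m.created_at)
ORDER BY dt
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In this sample, the message created on 2020-01-02 at 09:00 is counted under "lead" because project 1 only became "active" at 12:00 that day. In PostgreSQL the same query shape should work with m.created_at::date in place of date(m.created_at).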

Related

How can I count duplicates that fall within a date range? (SQL)

I have a table that contains Applicant ID, Application Date and Job Description.
I am trying to identify duplicates, defined as when the same Applicant ID applies for the same Job Description within 3 days of their other application.
I have already done this for the same date, this way:
CREATE TABLE Duplicates
SELECT
COUNT(ApplicantID) AS ApplicantCount,
ApplicantID,
ApplicationDate,
JobDescription
FROM Applications
GROUP BY ApplicantID, ApplicationDate, JobDescription;
DELETE FROM Duplicates WHERE ApplicantCount < 2;
SELECT COUNT(*) FROM Duplicates;
I'm now trying to make it so it doesn't have to match exactly on the ApplicationDate, but falls within a range. How do you do this?
You can use lead()/lag(). Here is an example that returns the first application when there is a duplicate:
SELECT a.*
FROM (SELECT a.*,
LEAD(ApplicationDate) OVER (PARTITION BY ApplicantID, JobDescription ORDER BY ApplicationDate) as next_ad
FROM Applications a
) a
WHERE next_ad <= ApplicationDate + INTERVAL 3 DAY;
You can also phrase this using exists:
select a.*
from applications a
where exists (select 1
from applications a2
where a2.ApplicantID = a.ApplicantID and
a2.JobDescription = a.JobDescription and
a2.ApplicationDate > a.ApplicationDate and
a2.ApplicationDate <= a.ApplicationDate + interval 3 day
);
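As a rough, self-contained check of the LEAD() approach, the sketch below runs it against a few invented rows. It assumes SQLite (via Python's sqlite3 module) rather than MySQL, so the 3-day window is expressed with julianday() differences instead of INTERVAL 3 DAY; the sample applications are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Applications (ApplicantID INTEGER, ApplicationDate TEXT, JobDescription TEXT);
-- Hypothetical sample data
INSERT INTO Applications VALUES
  (1, '2020-01-01', 'Analyst'),
  (1, '2020-01-03', 'Analyst'),
  (1, '2020-01-10', 'Analyst'),
  (2, '2020-01-01', 'Clerk'),
  (2, '2020-01-05', 'Clerk');
""")

QUERY = """
SELECT ApplicantID, ApplicationDate, JobDescription
FROM (SELECT a.*,
             -- next application by the same applicant for the same job
             LEAD(ApplicationDate) OVER (PARTITION BY ApplicantID, JobDescription
                                         ORDER BY ApplicationDate) AS next_ad
      FROM Applications a)
WHERE julianday(next_ad) - julianday(ApplicationDate) <= 3
ORDER BY ApplicantID, ApplicationDate
"""

first_of_pair = conn.execute(QUERY).fetchall()
print(first_of_pair)
```

Only applicant 1's application on 2020-01-01 is returned, because it is the only one followed by another application for the same job within 3 days.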

How show the last status of a mobile number and old data in the same row ? using SQL

I work at a telecom company, and part of my job is to check the latest status of a specific mobile number along with its last de-active status. Getting the active number is easy: I filter on ACTIVE in the statement. Picking the last de-active status is harder, because a number might have more than one de-active status, or only a single ACTIVE status. I use EXP_DATE as the indicator for the last de-active status. I want to show both the new data and the old data in one row, but I'm struggling with that. Below are my table and my expected result:
my expected result
The queries I use on a daily basis:
select * from test where exp_date > sysdate; -- active numbers
select * from test where exp_date < sysdate; -- de-active numbers
You just need an outer join between one subquery containing the ACTIVE records and one containing the latest DE-ACTIVE record, as follows:
SELECT A.MSISDN,
A.NAME,
A.SUB_STATUS,
A.CREATED_DATE,
A.EXP_DATE,
D.MSISDN AS MSISDN_,
D.NAME AS OLD_NAME,
D.SUB_STATUS OLD_STATUS,
D.CREATED_DATE AS OLD_CREATED_DATE,
D.EXP_DATE AS OLD_EXP_DATE
FROM
(SELECT * FROM TEST
WHERE EXP_DATE > SYSDATE
AND SUB_STATUS = 'ACTIVE') A -- ACTIVE RECORD
-- USE CONDITION TO FETCH ACTIVE RECORD AS PER YOUR REQUIREMENT
FULL OUTER JOIN
(SELECT * FROM
(SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.MSISDN ORDER BY EXP_DATE DESC NULLS LAST) AS RN
FROM TEST T
WHERE T.EXP_DATE < SYSDATE
AND T.SUB_STATUS='DE-ACTIVE')
-- USE CONDITION TO FETCH DEACTIVE RECORD AS PER YOUR REQUIREMENT
WHERE RN = 1
) D
ON (A.MSISDN = D.MSISDN)
Here is an overview of how to do this: one query to get a distinct list of all the phone numbers, left joined to a list of the most recent ACTIVE row for each phone number, and left joined to a list of the most recent DE-ACTIVE row for each phone number.
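The overview above could be sketched roughly as follows. This is an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, the column names follow the other answer (MSISDN, NAME, SUB_STATUS, CREATED_DATE, EXP_DATE), "most recent" is taken to mean the highest EXP_DATE, and the sample rows are invented because the original table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (msisdn TEXT, name TEXT, sub_status TEXT,
                   created_date TEXT, exp_date TEXT);
-- Hypothetical sample data
INSERT INTO test VALUES
  ('111', 'Ann', 'DE-ACTIVE', '2018-01-01', '2018-06-01'),
  ('111', 'Bob', 'DE-ACTIVE', '2018-07-01', '2019-01-01'),
  ('111', 'Cat', 'ACTIVE',    '2019-02-01', '2099-01-01'),
  ('222', 'Dan', 'ACTIVE',    '2019-03-01', '2099-01-01'),
  ('333', 'Eve', 'DE-ACTIVE', '2017-01-01', '2018-01-01');
""")

QUERY = """
WITH ranked AS (
  -- rank rows per number and status, newest EXP_DATE first
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY msisdn, sub_status
                            ORDER BY exp_date DESC) AS rn
  FROM test t
)
SELECT n.msisdn,
       a.name AS name, a.exp_date AS exp_date,
       d.name AS old_name, d.exp_date AS old_exp_date
FROM (SELECT DISTINCT msisdn FROM test) n
LEFT JOIN ranked a
  ON a.msisdn = n.msisdn AND a.sub_status = 'ACTIVE' AND a.rn = 1
LEFT JOIN ranked d
  ON d.msisdn = n.msisdn AND d.sub_status = 'DE-ACTIVE' AND d.rn = 1
ORDER BY n.msisdn
"""

status_rows = conn.execute(QUERY).fetchall()
for row in status_rows:
    print(row)
```

Starting from the distinct list of numbers keeps numbers that have only an ACTIVE row or only DE-ACTIVE rows, with NULLs on the missing side.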
How about conditional aggregation?
select msisdn,
max(case when sub_status = 'DE-ACTIVE' then created_date end) as deactive_date,
max(case when sub_status = 'ACTIVE' then exp_date end) as active_date
from test
group by msisdn

SQL - Get difference on dates based on same column from different rows

I have a table that stores every update to form fields. I'm trying to build a query that calculates how much time has elapsed between each update.
Here is an example of my table:
ticket,last_update,status
12345,2019-03-29 13:54:55.000000,canceled
12345,2019-03-29 12:46:20.000000,analysis
12345,2019-03-28 18:30:55.000000,canceled
12345,2019-03-28 09:31:20.000000,analysis
I want to check the time difference on each status change from analysis to another status (each analysis has a subsequent status).
Example output:
First analysis: difference between analysis 2019-03-28 09:31:20.000000 and canceled 2019-03-28 18:30:55.000000
Second analysis: difference between analysis 2019-03-29 12:46:20.000000 and canceled 2019-03-29 13:54:55.000000
Is it possible to write a SQL statement that returns this data? I'm stuck on this statement:
select ticket, last_update, status from history as h
where h.ticket = 12345
and h.field = 'custom_field_a';
I would like to avoid writing backend code to do this.
I tried using PARTITION BY:
select ticket,
last_update - lag(last_update) over (partition by ticket order by last_update) as difference
from history as h
where h.ticket = 12345
and h.field = 'custom_field_a'
group by ticket, last_update;
It should return 2 rows containing the differences for analysis -> canceled and analysis -> canceled, but I got 4 rows.
You can do something like this:
select ticket,
max(last_update) filter (where status = 'created') as created_ts,
max(last_update) filter (where status = 'canceled') as canceled_ts,
max(last_update) filter (where status = 'analysis') as analysis_ts
from history as h
where h.ticket = 12345 and
h.field = 'custom_field_a'
group by ticket;
I'm not sure how you want the differences expressed, but you can just subtract the relevant values.
You can use the LAG function, which reads a value from the previous row. The query below should calculate the difference:
SELECT last_update - lag(last_update) over (order by last_update) as difference
FROM history AS h
where h.ticket = 12345
and h.field = 'custom_field_a';
You can join the relevant lines, like so:
select created.ticket
, created.last_update as created_ts
, analysis.last_update as analysis_ts
, canceled.last_update as canceled_ts
from history as created
left join history as analysis
on created.ticket = analysis.ticket
and created.field = analysis.field
and analysis.status = 'analysis'
left join history as canceled
on created.ticket = canceled.ticket
and created.field = canceled.field
and canceled.status = 'canceled'
where created.ticket = 12345
and created.field = 'custom_field_a'
and created.status = 'created'
Not sure how field plays into it, it's probably a join condition on all joins as well. This will work if you have one entry per status, otherwise you'll get duplicate rows and might need a different strategy.
You will want to use the lag() window function to get the time difference between the two
https://www.postgresql.org/docs/current/functions-window.html
Edit: you may want to use a CTE to filter your query first for the result you want.
with history_set as(
select
ticket,
lag(last_update)
over (partition by ticket order by last_update) as prev_update,
last_update,
last_update - lag(last_update)
over (partition by ticket order by last_update) as time_diff,
status
from history as h
where h.ticket = 12345
and h.field = 'custom_field_a'
order by last_update
)
select
ticket,
prev_update,
last_update,
time_diff,
status
from history_set
where status <> 'analysis'
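To see the lag-then-filter idea on the question's four sample rows, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL, so the difference is computed in whole seconds with strftime('%s', ...) instead of timestamp subtraction; the fractional seconds are trimmed from the sample timestamps and the field column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (ticket INTEGER, last_update TEXT, status TEXT);
-- Sample rows from the question, fractional seconds trimmed
INSERT INTO history VALUES
  (12345, '2019-03-29 13:54:55', 'canceled'),
  (12345, '2019-03-29 12:46:20', 'analysis'),
  (12345, '2019-03-28 18:30:55', 'canceled'),
  (12345, '2019-03-28 09:31:20', 'analysis');
""")

QUERY = """
SELECT ticket, prev_update, last_update, status,
       CAST(strftime('%s', last_update) AS INTEGER)
     - CAST(strftime('%s', prev_update) AS INTEGER) AS seconds_elapsed
FROM (SELECT ticket, status, last_update,
             -- compute LAG over all rows first, filter afterwards
             LAG(last_update) OVER (PARTITION BY ticket
                                    ORDER BY last_update) AS prev_update
      FROM history)
WHERE status <> 'analysis'
ORDER BY last_update
"""

diffs = conn.execute(QUERY).fetchall()
for row in diffs:
    print(row)
```

The filter on status has to sit outside the subquery: if the analysis rows were removed before LAG() ran, each canceled row would be compared with the previous canceled row instead of with its analysis row.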

ORACLE: How to get earliest record of certain value when value alternates?

I'll simplify what I'm looking for here.
I have a table that stores an asset name, the date (job runs daily), and a value that is either 1 or 0 that indicates whether the asset is out of compliance.
I need to get the earliest date where the value is 0.
The issue I run into is that the issue can be intermittent, such that the same asset may show as in compliance, then out, and then in again. I want to retrieve the earliest date it was out of compliance this time.
Asset Date Compliant
NAME 2-FEB-18 0
NAME 1-FEB-18 0
NAME 31-JAN-18 1
NAME 30-JAN-18 0
In this example, I want to retrieve 1-FEB-18, and not 30-JAN-18.
I'm using a subquery into a temp table that retrieves the MIN(date) which would return 30-JAN-18. Thoughts?
Anonymized current subquery:
least_recent_created AS
(
SELECT t.date,t.ASSET, t.DATABASE_NAME FROM table t
WHERE t.date =
(
SELECT MIN(date)
FROM table2 t2
WHERE t.ASSET_ID = t2.ASSET_ID
AND t.DATABASE_NAME = t2.DATABASE_NAME
AND t2.compliant = 0
)
)
You want the earliest out-of-compliance date since the last in compliance. If the asset was never in compliance, I assume you want the earliest date.
select t.asset, min(date)
from (select t.*,
max(case when t.compliant = 1 then date end) over (partition by asset) as max_compliant1_date
from t
) t
where compliant = 0 and
(date > max_compliant1_date or max_compliant1_date is null)
group by t.asset;
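The logic of this answer can be tried out with the sketch below, which is only an approximation of the Oracle setup: it uses SQLite through Python's sqlite3 module, a hypothetical table name compliance_log, a column named dt instead of date, and ISO date strings so that text comparison orders the dates correctly. The NAME rows come from the question; the OTHER asset is invented to exercise the never-compliant case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compliance_log (asset TEXT, dt TEXT, compliant INTEGER);
INSERT INTO compliance_log VALUES
  ('NAME',  '2018-02-02', 0),
  ('NAME',  '2018-02-01', 0),
  ('NAME',  '2018-01-31', 1),
  ('NAME',  '2018-01-30', 0),
  -- hypothetical asset that was never in compliance
  ('OTHER', '2018-01-15', 0),
  ('OTHER', '2018-01-16', 0);
""")

QUERY = """
SELECT asset, MIN(dt) AS out_of_compliance_since
FROM (SELECT asset, dt, compliant,
             -- latest date on which the asset was in compliance, if any
             MAX(CASE WHEN compliant = 1 THEN dt END)
               OVER (PARTITION BY asset) AS last_compliant
      FROM compliance_log)
WHERE compliant = 0
  AND (last_compliant IS NULL OR dt > last_compliant)
GROUP BY asset
ORDER BY asset
"""

since = conn.execute(QUERY).fetchall()
print(since)
```

For NAME this yields 2018-02-01 rather than 2018-01-30, and for the never-compliant asset it falls back to its earliest date.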
You can use the following query:
SELECT "Asset", MAX("Date")
FROM (
SELECT "Asset", "Date", "Compliant",
CASE
WHEN "Compliant" = 0 AND
LAG("Compliant") OVER (PARTITION BY "Asset"
ORDER BY "Date") = 1 THEN "Date"
END AS OutOfComplianceDate
FROM mytable) t
WHERE OutOfComplianceDate IS NOT NULL
GROUP BY "Asset"
The inner query identifies 'Out-of-Compliance' dates, that is dates where the current record has "Compliant" = 0 whereas the immediately preceding record has "Compliant" = 1.
The outer query returns the latest 'Out-of-Compliance' date per "Asset".

SQL SUM and GROUP BY based on WHERE clause

I'm running PostgreSQL 9.4 and have the following table structure for invoicing:
id BIGINT, time UNIX_TIMESTAMP, customer TEXT, amount BIGINT, status TEXT, billing_id TEXT
I hope I can explain my challenge correctly.
An invoice record can have 3 different statuses: begin, ongoing and done.
Several invoice records can be part of the same invoice line over time.
So when an invoice period begins, a record is created with status begin.
Then every 6 hours a new record with status ongoing is generated, containing the current amount spent in amount.
When an invoice is closed, a record with status done is generated with the total amount spent in column amount. All the invoice records within the same invoice contain the same billing_id.
To calculate a customer's current spending I can run the following:
SELECT sum(amount) FROM invoice_records where id = $1 and time between '2017-06-01' and '2017-07-01' and status = 'done'
But that does not take into account an ongoing invoice that is not closed yet.
How can I also count the largest billing_id with no status done?
I hope that makes sense.
Per invoice (i.e. billing_id) you want the amount of the record with status = 'done' if such exists or of the last record with status = 'ongoing'. You can use PostgreSQL's DISTINCT ON for this (or use standard SQL's ROW_NUMBER to rank the records per invoice).
SELECT DISTINCT ON (billing_id) billing_id, amount
FROM invoice_records
WHERE status IN ('done', 'ongoing', 'begin')
ORDER BY
billing_id,
CASE status WHEN 'done' THEN 1 WHEN 'ongoing' THEN 2 ELSE 3 END,
time DESC;
The ORDER BY clause represents the ranking.
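The ROW_NUMBER alternative mentioned above might look like the following sketch. It is an illustration under assumptions: SQLite (through Python's sqlite3 module) stands in for PostgreSQL because SQLite has no DISTINCT ON, the time column is stored as an integer Unix timestamp, and the invoice rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_records (id INTEGER, time INTEGER, customer TEXT,
                              amount INTEGER, status TEXT, billing_id TEXT);
-- Hypothetical data: invoice A is closed, invoice B is still ongoing
INSERT INTO invoice_records VALUES
  (1, 1000, 'acme',   0, 'begin',   'A'),
  (2, 2000, 'acme',  50, 'ongoing', 'A'),
  (3, 3000, 'acme', 120, 'done',    'A'),
  (4, 4000, 'acme',   0, 'begin',   'B'),
  (5, 5000, 'acme',  30, 'ongoing', 'B'),
  (6, 6000, 'acme',  70, 'ongoing', 'B');
""")

QUERY = """
SELECT billing_id, amount
FROM (SELECT r.*,
             -- rank: 'done' first, then 'ongoing', newest record first
             ROW_NUMBER() OVER (
               PARTITION BY billing_id
               ORDER BY CASE status WHEN 'done' THEN 1
                                    WHEN 'ongoing' THEN 2
                                    ELSE 3 END,
                        time DESC) AS rn
      FROM invoice_records r)
WHERE rn = 1
ORDER BY billing_id
"""

per_invoice = conn.execute(QUERY).fetchall()
total = sum(amount for _, amount in per_invoice)
print(per_invoice, total)
```

Each invoice contributes exactly one row, so summing the amount column gives the closed total for A plus the latest running total for B.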
select id, sum(amount)
from (
select distinct on (billing_id) *
from (
select distinct on (status, billing_id) *
from invoice_records
where
id = $1
and time between '2017-06-01' and '2017-07-01'
and status in ('done', 'ongoing')
-- latest record per status and invoice
order by status, billing_id desc, time desc
) s
-- 'done' sorts before 'ongoing', so a closed invoice wins
order by billing_id desc, status
) s
group by id