Friends,
I have a SQL problem I could use help on. I'm working with SQL Server 2008.
The use case is as follows. We have a system where users watch videos, and each time a user watches a video we record that activity. We capture three properties each time: assetid (an asset is a video), customerid, and status.
A record can have one of three statuses: 'completion', 'playing', or 'start'.
The person who wrote this part of the system is not a developer and, instead of updating the status of an existing record, inserts a new duplicate record each time a user watches a video. As a result we have thousands of duplicate records. Here is a sample dataset
The problem I need to solve is to select a record by assetid, customerid, and status. I need to choose a record that has a status of 'completion' if it exists.
If a record has a status of 'playing', but no record with the same assetid and customerid with a status of 'completion' exists, then choose that record.
If a record has a status of 'start', but no record with the same assetid and customerid with a status of either 'completion' or 'playing' exists, then choose that record.
Here is sample code where I tried to use a CASE statement to solve the problem. I also tried another case statement with a NOT IN subquery, but without success.
INSERT INTO #ViewTime (AssetID, CustomerID, ViewTime)
SELECT
    tt.customerid, tt.assetId, tt.assetstatus,
    CASE
        WHEN tt.AssetStatus = 'COMPLETION' AND ISNUMERIC(timeposition) = 1
            THEN CONVERT(Numeric(18,3), timePosition)
        WHEN tt.AssetStatus = 'PLAYING' AND ISNUMERIC(timeposition) = 1
            THEN CONVERT(Numeric(18,3), timePosition)
        WHEN tt.AssetStatus = 'START' AND ISNUMERIC(timeposition) = 1
            THEN CONVERT(Numeric(18,3), timePosition)
        ELSE null
    END AS ViewTime
FROM TableAssetTracking tt
INNER JOIN TableAssets ta
    ON tt.AssetID = ta.AssetID
WHERE tt.timePosition is not null
  AND AssetBuffering is null
Any suggestions would be greatly appreciated. Thanks, Derek
The problem I need to solve is to select a record by assetid, customerid, and status.
I need to choose a record that has a status of 'completion' if it exists.
select distinct assetID, CustomerID
from table
where status = 'completion'
If a record has a status of 'playing', but no record with the same assetid and customerid with a status of 'completion' exists, then choose that record.
select assetID, CustomerID
from table
where status = 'playing'
except
select assetID, CustomerID
from table
where status = 'completion'
If a record has a status of 'start', but no record with the same assetid and customerid with a status of either 'completion' or 'playing' exists, then choose that record.
select assetID, CustomerID
from table
where status = 'start'
except
select assetID, CustomerID
from table
where status in ('completion', 'playing')
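The queries above do not return the ViewTime that appears in your sample code, because it was not part of the stated requirements.
The following query does. It relies on the statuses happening to sort alphabetically in priority order ('completion' before 'playing' before 'start'), so the first row per assetID and CustomerID is the one you want: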
select *
from
( select assetID, CustomerID, status, ViewTime
, row_number() over (partition by assetID, CustomerID order by status, ViewTime desc) as rn
from table
where status in ('completion', 'playing', 'start')
) tt
where tt.rn = 1
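The ranking idea can also be written with an explicit priority instead of relying on alphabetical order. The following is only a sketch: it uses Python's built-in sqlite3 module so it can be run anywhere (a SQLite build with window functions, 3.25 or later, is assumed), and the table name tracking and the helper pick_best_status are hypothetical stand-ins, not names from the question.

```python
import sqlite3


def pick_best_status(rows):
    """Return one (assetid, customerid, status) row per pair,
    preferring 'completion' over 'playing' over 'start'."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the real schema is not shown in the question.
    conn.execute(
        "CREATE TABLE tracking (assetid INTEGER, customerid INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO tracking VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT assetid, customerid, status
        FROM (
            SELECT assetid, customerid, status,
                   ROW_NUMBER() OVER (
                       PARTITION BY assetid, customerid
                       -- explicit priority: lower number wins
                       ORDER BY CASE status
                                    WHEN 'completion' THEN 1
                                    WHEN 'playing'    THEN 2
                                    WHEN 'start'      THEN 3
                                    ELSE 4
                                END
                   ) AS rn
            FROM tracking
        ) ranked
        WHERE rn = 1
        ORDER BY assetid, customerid
        """
    ).fetchall()
    conn.close()
    return result
```

The same SELECT should carry over to SQL Server 2008 largely unchanged, since ROW_NUMBER() with PARTITION BY is available there; only the table and column names would need adapting.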
There are numerous ways to do this. Here is one.
I am going to use pseudocode because the tables in your sample code don't match the description in your question. You will have to adapt this technique to your tables.
SELECT DISTINCT t.AssetId, t.CustomerId, (
SELECT TOP 1 Status
FROM MyTable t1
WHERE t1.AssetId=t.AssetId
AND t1.CustomerId=t.CustomerId
ORDER BY CASE t1.Status
WHEN 'Completion' THEN 0
WHEN 'Playing' THEN 1
WHEN 'Start' THEN 2
ELSE 3
END ASC
) AS Status
FROM MyTable t
Here is a broader way of looking at your data. What you REALLY want is to identify the last record that was inserted into your table for each person and video. So we get the last one, and then get the status for that record.
IF OBJECT_ID('tempdb..#lastRecord') IS NOT NULL DROP TABLE #lastRecord
SELECT
max(trackingassetdatecreated) dt,
assetid,
customerid
INTO #lastRecord
FROM
TableAssetTracking
GROUP BY
assetid,
customerid
SELECT
t.assetstatus,
lr.*
FROM
TableAssetTracking t
INNER JOIN
#lastRecord lr on
lr.dt = t.trackingassetdatecreated
and lr.assetid = t.assetid
and lr.customerid = t.customerid
I work at a telecom company, and part of my job is to check the latest status for a specific mobile number along with its last de-active status. Getting the active number is easy by using the ACTIVE condition in the statement, but picking the last de-active status is harder, because each number might have more than one de-active record or only a single ACTIVE record. I use EXP_DATE as the indicator for the last de-active status. I want to show both the new data and the old data in one row, but I'm struggling with that. Below are my table and my expected result:
my expected result
query that I use on daily basis
To get the active numbers:
select * from test where exp_date>sysdate;
To get the de-active numbers:
select * from test where exp_date<sysdate;
You just need an outer join between one subquery containing the ACTIVE records and one containing the latest DE-ACTIVE record, as follows:
SELECT A.MSISDN,
A.NAME,
A.SUB_STATUS,
A.CREATED_DATE,
A.EXP_DATE,
D.MSISDN AS MSISDN_,
D.NAME AS OLD_NAME,
D.SUB_STATUS OLD_STATUS,
D.CREATED_DATE AS OLD_CREATED_DATE,
D.EXP_DATE AS OLD_EXP_DATE
FROM
(SELECT * FROM TEST
WHERE EXP_DATE > SYSDATE
AND SUB_STATUS = 'ACTIVE') A -- ACTIVE RECORD
-- USE CONDITION TO FETCH ACTIVE RECORD AS PER YOUR REQUIREMENT
FULL OUTER JOIN
(SELECT * FROM
(SELECT T.*,
ROW_NUMBER() OVER (PARTITION BY T.MSISDN ORDER BY EXP_DATE DESC NULLS LAST) AS RN
FROM TEST T
WHERE T.EXP_DATE < SYSDATE
AND T.SUB_STATUS='DE-ACTIVE')
-- USE CONDITION TO FETCH DEACTIVE RECORD AS PER YOUR REQUIREMENT
WHERE RN = 1
) D
ON (A.MSISDN = D.MSISDN)
Cheers!!
Here is an overview of how to do this: one query to get a distinct list of all the phone numbers, left joined to a list of the most recent active record for each phone number, and left joined to a list of the most recent de-active record for each phone number.
How about conditional aggregation?
select msisdn,
max(case when sub_status = 'DE-ACTIVE' then created_date end) as deactive_date,
max(case when sub_status = 'ACTIVE' then exp_date end) as active_date
from test
group by msisdn
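To make the conditional aggregation concrete, here is a small runnable sketch using Python's sqlite3 module. The column names follow the other answer's query (MSISDN, SUB_STATUS, CREATED_DATE, EXP_DATE); the helper name dates_per_number and the ISO-formatted date strings are assumptions made for illustration, since the original table was not shown.

```python
import sqlite3


def dates_per_number(rows):
    """One row per msisdn: latest DE-ACTIVE created_date and latest ACTIVE exp_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (msisdn TEXT, sub_status TEXT, created_date TEXT, exp_date TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT msisdn,
               -- CASE yields NULL for non-matching rows, and MAX ignores NULLs
               MAX(CASE WHEN sub_status = 'DE-ACTIVE' THEN created_date END) AS deactive_date,
               MAX(CASE WHEN sub_status = 'ACTIVE'    THEN exp_date     END) AS active_date
        FROM test
        GROUP BY msisdn
        ORDER BY msisdn
        """
    ).fetchall()
    conn.close()
    return result
```

A number with no DE-ACTIVE record still appears in the result, with NULL (None in Python) in deactive_date, which is what lets both the old and the new data land on one row.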
The following query resulted in correct results only for the inner query (post_engagement, website purchases) while all other numbers were incorrectly increased manyfold. Any ideas? Thanks.
Schema of the two tables:
Favorite_ads (id, campaign_id, campaign_name, objective, impressions, spend)
Actions (id, ads_id, action_type, value)
SELECT
f.campaign_id,
f.campaign_name,
f.objective,
SUM(f.impressions) AS Impressions,
SUM(f.spend) AS Spend,
SUM(a.post_engagement) AS "Post Engagement",
SUM(a.website_purchases) AS "Website Purchases"
FROM
favorite_ads f
LEFT JOIN (
SELECT
ads_id,
CASE WHEN action_type = 'post_engagement' THEN SUM(value) END AS post_engagement,
CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase' THEN SUM(value) END AS website_purchases
FROM Actions a
GROUP BY ads_id, action_type
) a ON f.id = a.ads_id
WHERE date_trunc('month',f.date_start) = '2018-04-01 00:00:00' AND
date_trunc('month',f.date_stop) = '2018-04-01 00:00:00' -- only get campaigns that ran in April, 2018
GROUP BY f.campaign_id, campaign_name, objective
Order by campaign_id
Without knowing the actual table structure, constraints, dependencies, and data, it's hard to tell what the issue may be.
You already have some leads in the comments, which you should consider first.
For example, you wrote that this sub-query returns correct results:
SELECT ads_id,
CASE
WHEN action_type = 'post_engagement'
THEN SUM(value)
END AS post_engagement,
CASE
WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
THEN SUM(value)
END AS website_purchases
FROM Actions a
GROUP BY ads_id, action_type
Does this one also give correct results?
SELECT ads_id,
SUM(
CASE
WHEN action_type = 'post_engagement'
THEN value
END
) AS post_engagement,
SUM(
CASE
WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
THEN value
END
) AS website_purchases
FROM Actions
GROUP BY ads_id
If so, then try replacing your sub-query with that one. The original sub-query returns one row per ads_id and action_type, so every favorite_ads row is repeated once per action type in the join, and its impressions and spend are summed that many times.
If you still have a problem, then I'd investigate whether your join condition is correct. It seems that a campaign (campaign_id) could have multiple entries with the same id, which would multiply the sub-query results; that depends on what the primary key (or unique constraint) on favorite_ads actually is.
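The multiplication effect can be reproduced on a tiny dataset. The sketch below uses Python's sqlite3 module with a cut-down version of the two tables (only the columns needed here) and made-up values; the helper campaign_totals is hypothetical. It runs the outer query once against the sub-query grouped by ads_id and action_type, and once against the sub-query grouped by ads_id only.

```python
import sqlite3


def campaign_totals(group_by_action_type):
    """Return (campaign_id, impressions, post_engagement, website_purchases)."""
    conn = sqlite3.connect(":memory:")
    # Illustrative data: one ad with three action rows of different types.
    conn.executescript(
        """
        CREATE TABLE favorite_ads (id INTEGER, campaign_id INTEGER, impressions INTEGER);
        CREATE TABLE actions (id INTEGER, ads_id INTEGER, action_type TEXT, value INTEGER);
        INSERT INTO favorite_ads VALUES (1, 100, 1000);
        INSERT INTO actions VALUES
            (1, 1, 'post_engagement', 5),
            (2, 1, 'offsite_conversion.fb_pixel_purchase', 2),
            (3, 1, 'link_click', 7);
        """
    )
    if group_by_action_type:
        # Shape of the original sub-query: one row per (ads_id, action_type).
        sub = """
            SELECT ads_id,
                   CASE WHEN action_type = 'post_engagement'
                        THEN SUM(value) END AS post_engagement,
                   CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
                        THEN SUM(value) END AS website_purchases
            FROM actions
            GROUP BY ads_id, action_type
        """
    else:
        # Suggested shape: exactly one row per ads_id.
        sub = """
            SELECT ads_id,
                   SUM(CASE WHEN action_type = 'post_engagement'
                            THEN value END) AS post_engagement,
                   SUM(CASE WHEN action_type = 'offsite_conversion.fb_pixel_purchase'
                            THEN value END) AS website_purchases
            FROM actions
            GROUP BY ads_id
        """
    row = conn.execute(
        f"""
        SELECT f.campaign_id,
               SUM(f.impressions),
               SUM(a.post_engagement),
               SUM(a.website_purchases)
        FROM favorite_ads f
        LEFT JOIN ({sub}) a ON f.id = a.ads_id
        GROUP BY f.campaign_id
        """
    ).fetchone()
    conn.close()
    return row
```

With this data, campaign_totals(True) reports 3000 impressions for an ad that has 1000, because the ad row is joined to three sub-query rows, while campaign_totals(False) reports 1000. The engagement and purchase figures are the same in both cases, which matches the symptom described in the question.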
I'm running PostgreSQL 9.4 and have the following table structure for invoicing:
id BIGINT, time UNIX_TIMESTAMP, customer TEXT, amount BIGINT, status TEXT, billing_id TEXT
I hope I can explain my challenge correctly.
An invoice record can have one of three statuses: begin, ongoing, and done.
Several invoice records can be part of the same invoice line over time.
So when an invoice period begins, a record is created with status begin.
Then, every 6 hours, a new record is generated with status ongoing, containing the current amount spent in amount.
When an invoice is closed, a record with status done is generated with the total amount spent in column amount. All the invoice records within the same invoice contain the same billing_id.
To calculate a customer's current spending I can run the following:
SELECT sum(amount) FROM invoice_records where id = $1 and time between '2017-06-01' and '2017-07-01' and status = 'done'
But that does not take into account an ongoing invoice that has not been closed yet.
How can I also count the largest billing_id that has no status done?
I hope that makes sense.
Per invoice (i.e. billing_id) you want the amount of the record with status = 'done' if such exists or of the last record with status = 'ongoing'. You can use PostgreSQL's DISTINCT ON for this (or use standard SQL's ROW_NUMBER to rank the records per invoice).
SELECT DISTINCT ON (billing_id) billing_id, amount
FROM invoice_records
WHERE status IN ('done', 'ongoing', 'begin')
ORDER BY
billing_id,
CASE status WHEN 'done' THEN 1 WHEN 'ongoing' THEN 2 ELSE 3 END,
time DESC;
The ORDER BY clause represents the ranking.
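The ROW_NUMBER alternative mentioned above can be sketched as follows. SQLite has no DISTINCT ON, so this runnable version (Python's sqlite3 module, SQLite 3.25 or later assumed) uses the standard window-function form. The integer values in the time column and the helper name current_amounts are assumptions for illustration.

```python
import sqlite3


def current_amounts(rows):
    """Return (billing_id, amount) per invoice: the 'done' record if present,
    otherwise the most recent 'ongoing' record, otherwise 'begin'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoice_records "
        "(id INTEGER, time INTEGER, customer TEXT, amount INTEGER, status TEXT, billing_id TEXT)"
    )
    conn.executemany("INSERT INTO invoice_records VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT billing_id, amount
        FROM (
            SELECT billing_id, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY billing_id
                       ORDER BY CASE status WHEN 'done' THEN 1
                                            WHEN 'ongoing' THEN 2
                                            ELSE 3 END,
                                time DESC   -- newest record first within a status
                   ) AS rn
            FROM invoice_records
            WHERE status IN ('done', 'ongoing', 'begin')
        ) ranked
        WHERE rn = 1
        ORDER BY billing_id
        """
    ).fetchall()
    conn.close()
    return result
```

Summing the amount column of this result gives the customer's current spending, including invoices that are still open.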
select sum(amount), id
from (
select distinct on (billing_id) *
from (
select distinct on (status, billing_id) *
from invoice_records
where
id = $1
and time between '2017-06-01' and '2017-07-01'
and status in ('done', 'ongoing')
order by status, billing_id desc, time desc -- latest record per status and invoice
) s
order by billing_id desc, status -- 'done' sorts before 'ongoing'
) s
group by id
The data below is a tiny snapshot of a huge log table.
Please help me with a query to identify records such as those with TRAN_IDs 451140014 and 440102253.
The status of the record is being updated from 'Actual' to 'Definite'.
According to the business rules of our application this is NOT supposed to happen. I need to fetch the list of all records in this huge table where the status is updated this way.
ROW_ID      TRAN_ID    TRAN_DATE              CHG_TYPE  DB_SESSION   DB_OSUSER  DB_HOST  STAT_CD
500-XNEGXU  451327759  7/24/2015 11:35:26 AM  Update    SBLDATALOAD  siebelp    pas01    Actual
500-XNEGXU  451299279  7/24/2015 10:13:18 AM  Update    SBLDATALOAD  siebelp    pas01    Actual
500-XNEGXU  451140014  7/24/2015 1:04:36 AM   Update    SBLDATALOAD  siebelp    pas01    Definite
500-XNEGXU  440102253  6/23/2015 3:10:33 PM   Update    SBLDATALOAD  convteam   pas01    Actual
500-XNEGXU  426245149  5/8/2015 2:11:21 PM    Update    SBLDATALOAD  convteam   pas11    Actual
Edit:
Thanks a lot, Ponder, for your help. I made a small modification to your query to get the results in a single row. This gives me the next transaction id, the one that flipped the status from 'Actual' to 'Definite'.
select row_id, tran_id, next_tran_id,tran_date, next_tran_date,stat_cd
from (
select abc.*, lag(tran_id) over (order by tran_id desc) next_tran_id,lag(tran_date) over (order by tran_id desc) next_tran_date,
case when stat_cd='Actual' and (lag(stat_cd) over (partition by row_id order by tran_id desc)) = 'Definite' then 1
end change
from abc )
where change = 1 order by row_id, tran_id
This query uses the lead() function to display every row where stat_cd is 'Definite', together with the row that precedes it in tran_id order:
select row_id, tran_id, tran_date, stat_cd
from (
select data.*,
case when stat_cd='Definite'
or (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
end change
from data )
where change = 1 order by row_id, tran_id
You may need to change over (order by tran_id) to over (partition by row_id order by tran_id) if your data is organized this way.
Edit: Modified query after additional information was provided:
select row_id, tran_id, tran_date, stat_cd
from (
select xyz.*,
case
when stat_cd='Actual'
and (lead(stat_cd) over (order by tran_id)) = 'Definite' then 1
when stat_cd='Definite'
and (lag(stat_cd) over (order by tran_id)) = 'Actual' then 2
end change
from xyz)
where change is not null
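As a check, the modified query can be run against the five sample rows from the question. This is a sketch using Python's sqlite3 module (SQLite 3.25 or later assumed for LAG and LEAD); the table name log, the column name change_kind, and the helper find_status_flips are hypothetical, and the window is partitioned by row_id as suggested earlier.

```python
import sqlite3


def find_status_flips(rows):
    """Return the rows on either side of an 'Actual' -> 'Definite' flip.
    change_kind 1 = last 'Actual' row before the flip, 2 = the 'Definite' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (row_id TEXT, tran_id INTEGER, stat_cd TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT row_id, tran_id, stat_cd, change_kind
        FROM (
            SELECT row_id, tran_id, stat_cd,
                   CASE
                       WHEN stat_cd = 'Actual'
                        AND LEAD(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Definite'
                       THEN 1
                       WHEN stat_cd = 'Definite'
                        AND LAG(stat_cd) OVER (PARTITION BY row_id ORDER BY tran_id) = 'Actual'
                       THEN 2
                   END AS change_kind
            FROM log
        ) flagged
        WHERE change_kind IS NOT NULL
        ORDER BY row_id, tran_id
        """
    ).fetchall()
    conn.close()
    return result


# The five sample rows from the question (row_id, tran_id, stat_cd).
sample = [
    ("500-XNEGXU", 451327759, "Actual"),
    ("500-XNEGXU", 451299279, "Actual"),
    ("500-XNEGXU", 451140014, "Definite"),
    ("500-XNEGXU", 440102253, "Actual"),
    ("500-XNEGXU", 426245149, "Actual"),
]
flips = find_status_flips(sample)
```

For the sample data, flips contains exactly the two transactions named in the question, 440102253 and 451140014.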
I have two tables with the structure below:
Person(ID, Name, ...)
Action(ID, FirstPersonId, SecondPersonId, Date)
I want to retrieve this data for each person:
the number of actions in which the person was the second person
since the last action in which they were the first person.
Current query
Select Result.Id ,
(Select Count(*)
From Action
Where SecondPersonId = Result.Id
AND Date > Result.LastAction)
From
(Select ID ,
(
Select Top 1 Date
From Action
Where Action.FirstPersonId = Person.Id
) as LastAction
From Person ) As Result
This query performs badly and I need a much faster one.
with lastActionPerson as -- last action for every first person
(select FirstPersonId , max([Date]) as LastActionDate
from Action
group by FirstPersonId
)
select a.SecondPersonId ,count(*)
from lastActionPerson lap
join Action a
on a.SecondPersonId = lap.FirstPersonId -- be on second person
and a.[Date] > lap.lastActionDate
-- you could continue to right join person table to show the person without actions
group by a.SecondPersonId
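Here is a small runnable sketch of the same CTE approach, using Python's sqlite3 module. To stay clear of keyword clashes in SQLite, the table is called actions and the date column action_date; those names, the helper count_since_last_first, and the sample data are assumptions for illustration only.

```python
import sqlite3


def count_since_last_first(rows):
    """For each person, count actions where they were the second person
    after their most recent action as the first person."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE actions "
        "(id INTEGER, first_person_id INTEGER, second_person_id INTEGER, action_date TEXT)"
    )
    conn.executemany("INSERT INTO actions VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH last_action AS (
            -- latest action date per first person; GROUP BY is required here
            SELECT first_person_id, MAX(action_date) AS last_action_date
            FROM actions
            GROUP BY first_person_id
        )
        SELECT a.second_person_id, COUNT(*)
        FROM last_action lap
        JOIN actions a
          ON a.second_person_id = lap.first_person_id
         AND a.action_date > lap.last_action_date
        GROUP BY a.second_person_id
        ORDER BY a.second_person_id
        """
    ).fetchall()
    conn.close()
    return result
```

People with no qualifying actions are simply absent from the result. Joining the person table with an outer join, as the comment in the answer suggests, would bring them back with a count of zero.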