I have data from two different sources. On one hand, I have user data from our app. This table has a primary key of ID and UTC date, and it only has rows for the UTC dates on which a user uses the app. On the other hand, I have advertisement campaign attribution data for the users (a user can have multiple advertisement campaigns). This table has a primary key of ID and campaign, plus a metric containing an advertisement attribution timestamp. I want to combine the two data sources so that I can compute, among other campaign statistics, whether a campaign generates more revenue than it costs.
App data example:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, UTC_Date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0)])
Advertisement campaign attribution data example:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, Attribution_Timestamp Timestamp, campaign_name STRING>>
[(1, TIMESTAMP('2021-01-01 09:54:31'), "A"),
(1, TIMESTAMP('2021-01-09 22:32:51'), "B"),
(2, TIMESTAMP('2021-01-03 19:12:11'), "A")])
The end result I would like to get is:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, UTC_Date DATE, Revenue FLOAT64, campaign_name STRING>>
[(1, DATE('2021-01-01'), 0, "A"),
(1, DATE('2021-01-05'), 5, "A"),
(1, DATE('2021-01-10'), 0, "B"),
(2, DATE('2021-01-03'), 10, "A"),
(2, DATE('2021-01-08'), 0, "A"),
(2, DATE('2021-01-09'), 0, "A")])
This could presumably be achieved by joining the campaign attribution data to the app data and then forward filling the campaign name.
The problem is that the advertisement attribution timestamp does not necessarily fall on a UTC date that exists in the app data table. This means I cannot use a plain left join on ID and date, because it would not assign campaign_name B to ID 1 (B was attributed on 2021-01-09, but ID 1 has no app row on that date). Does anyone know an elegant way to solve this problem?
Found a solution! Here is what I did (with a little more sample data):
WITH app_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(1, DATE('2021-01-12'), 0),
(1, DATE('2021-01-15'), 0),
(1, DATE('2021-01-16'), 15),
(1, DATE('2021-01-18'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0),
(2, DATE('2021-01-15'), 4),
(2, DATE('2021-02-01'), 0),
(2, DATE('2021-02-08'), 8),
(2, DATE('2021-02-15'), 0),
(2, DATE('2021-03-04'), 0),
(2, DATE('2021-03-06'), 12),
(3, DATE('2021-02-15'), 10),
(3, DATE('2021-02-23'), 5),
(3, DATE('2021-03-25'), 0),
(3, DATE('2021-03-30'), 0)])
),
advertisment_attribution_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, campaign_name STRING>>
[(1, DATE(TIMESTAMP('2021-01-01 09:54:31')), "A"),
(1, DATE(TIMESTAMP('2021-01-09 22:32:51')), "B"),
(1, DATE(TIMESTAMP('2021-01-17 14:30:05')), "C"),
(2, DATE(TIMESTAMP('2021-01-03 19:12:11')), "A"),
(1, DATE(TIMESTAMP('2021-01-15 18:17:57')), "B"),
(3, DATE(TIMESTAMP('2021-03-14 22:32:51')), "C")])
)
SELECT
t1.*,
IFNULL(LAST_VALUE(t2.campaign_name IGNORE NULLS) OVER (PARTITION BY t1.adid ORDER BY t1.utc_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), "Organic") as campaign_name
FROM
app_data t1
LEFT JOIN
advertisment_attribution_data t2
ON t1.adid = t2.adid
AND t1.utc_date = (SELECT MIN(t3.utc_date) FROM app_data t3 WHERE t2.adid=t3.adid AND t2.utc_date <= t3.utc_date)
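The forward fill in the query above relies on BigQuery's LAST_VALUE(... IGNORE NULLS). For an engine without IGNORE NULLS, a common workaround is to number the groups with a running COUNT of the non-NULL campaigns and then spread each group's single campaign over the whole group. A minimal sketch, assuming Python's built-in sqlite3 module with SQLite as a stand-in for BigQuery (for illustration only); it starts from the rows the left join has already produced, and the `joined` table and its columns are hypothetical:

```python
import sqlite3

# Forward fill without IGNORE NULLS (SQLite stands in for BigQuery; illustration only).
FORWARD_FILL_SQL = """
SELECT adid, utc_date,
       -- each group holds at most one non-NULL campaign (its first row), so MAX()
       -- spreads it over the group; group 0 has none and falls back to 'Organic'
       COALESCE(MAX(campaign_name) OVER (PARTITION BY adid, grp), 'Organic') AS campaign_name
FROM (
    SELECT adid, utc_date, campaign_name,
           -- running count of non-NULL campaigns: it grows by one at each attribution
           COUNT(campaign_name) OVER (
               PARTITION BY adid ORDER BY utc_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM joined
) AS g
ORDER BY adid, utc_date
"""


def forward_fill(joined_rows):
    """joined_rows: (adid, utc_date, campaign_name or None) as produced by the left join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE joined (adid INTEGER, utc_date TEXT, campaign_name TEXT)")
    con.executemany("INSERT INTO joined VALUES (?, ?, ?)", joined_rows)
    rows = con.execute(FORWARD_FILL_SQL).fetchall()
    con.close()
    return rows


# Rows for adid 1 and 3 after the join above: each campaign lands on the first
# app date on or after its attribution date, every other row is NULL.
JOINED = [
    (1, "2021-01-01", "A"), (1, "2021-01-05", None), (1, "2021-01-10", "B"),
    (1, "2021-01-12", None), (1, "2021-01-15", "B"), (1, "2021-01-16", None),
    (1, "2021-01-18", "C"),
    (3, "2021-02-15", None), (3, "2021-02-23", None), (3, "2021-03-25", "C"),
    (3, "2021-03-30", None),
]

for row in forward_fill(JOINED):
    print(row)
```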
EDIT
It doesn't work when I select from a real table in app_data. It fails with: Unsupported subquery with table in join predicate.
EDIT 2
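Found a way around the restriction on subqueries in join predicates (apparently such subqueries are allowed when the table is built inline rather than selected from an existing table). This version works in both cases: the subquery moves into the attribution CTE, and the join uses the precomputed date: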
WITH app_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(1, DATE('2021-01-12'), 0),
(1, DATE('2021-01-15'), 0),
(1, DATE('2021-01-16'), 15),
(1, DATE('2021-01-18'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0),
(2, DATE('2021-01-15'), 4),
(2, DATE('2021-02-01'), 0),
(2, DATE('2021-02-08'), 8),
(2, DATE('2021-02-15'), 0),
(2, DATE('2021-03-04'), 0),
(2, DATE('2021-03-06'), 12),
(3, DATE('2021-02-15'), 10),
(3, DATE('2021-02-23'), 5),
(3, DATE('2021-03-25'), 0),
(3, DATE('2021-03-30'), 0)])
),
advertisment_attribution_data AS
(
SELECT
*,
(
SELECT
MIN(t2.utc_date)
FROM app_data t2
WHERE t1.adid=t2.adid
AND t1.utc_date <= t2.utc_date
) as attribution_join_date -- earliest app_data date for this adid on or after the attribution date, so the later join can match an existing app row
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, campaign_name STRING>>
[(1, DATE(TIMESTAMP('2021-01-01 09:54:31')), "A"),
(1, DATE(TIMESTAMP('2021-01-09 22:32:51')), "B"),
(1, DATE(TIMESTAMP('2021-01-17 14:30:05')), "C"),
(2, DATE(TIMESTAMP('2021-01-03 19:12:11')), "A"),
(1, DATE(TIMESTAMP('2021-01-15 18:17:57')), "B"),
(3, DATE(TIMESTAMP('2021-03-14 22:32:51')), "C")]) t1
)
SELECT
t1.*,
IFNULL(LAST_VALUE(t2.campaign_name IGNORE NULLS) OVER (PARTITION BY t1.adid ORDER BY t1.utc_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 'Organic') as campaign_name
FROM
app_data t1
LEFT JOIN
advertisment_attribution_data t2
ON t1.adid = t2.adid
AND t1.utc_date = t2.attribution_join_date
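The same rule can also be phrased from the app side: each app row takes the most recent campaign attributed on or before its UTC date (an "as-of" lookup), falling back to 'Organic'. A minimal sketch, assuming Python's sqlite3 with SQLite as a stand-in for BigQuery and hypothetical names (`attribution`, `attributed_at`); I have not checked whether BigQuery accepts this correlated subquery against real tables:

```python
import sqlite3

AS_OF_SQL = """
SELECT a.adid, a.utc_date, a.revenue,
       COALESCE(
           (SELECT t.campaign_name
            FROM attribution AS t
            WHERE t.adid = a.adid
              AND date(t.attributed_at) <= a.utc_date  -- attributed on or before this app day
            ORDER BY t.attributed_at DESC              -- most recent attribution wins
            LIMIT 1),
           'Organic') AS campaign_name
FROM app_data AS a
ORDER BY a.adid, a.utc_date
"""


def attribute_campaigns(app_rows, attribution_rows):
    """app_rows: (adid, utc_date, revenue); attribution_rows: (adid, timestamp, campaign)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE app_data (adid INTEGER, utc_date TEXT, revenue REAL)")
    con.execute("CREATE TABLE attribution (adid INTEGER, attributed_at TEXT, campaign_name TEXT)")
    con.executemany("INSERT INTO app_data VALUES (?, ?, ?)", app_rows)
    con.executemany("INSERT INTO attribution VALUES (?, ?, ?)", attribution_rows)
    rows = con.execute(AS_OF_SQL).fetchall()
    con.close()
    return rows


# Sample data from the question.
APP = [(1, "2021-01-01", 0), (1, "2021-01-05", 5), (1, "2021-01-10", 0),
       (2, "2021-01-03", 10), (2, "2021-01-08", 0), (2, "2021-01-09", 0)]
ATTRIBUTION = [(1, "2021-01-01 09:54:31", "A"), (1, "2021-01-09 22:32:51", "B"),
               (2, "2021-01-03 19:12:11", "A")]

for row in attribute_campaigns(APP, ATTRIBUTION):
    print(row)
```

One difference from joining on the next app date: this never returns more than one row per app row, so if two attributions fall before the same app date, that day's revenue is not duplicated by the join.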
SQL Fiddle with schema and my initial attempt.
CREATE TABLE person
([firstname] varchar(10), [surname] varchar(10), [dob] date, [personid] int);
INSERT INTO person
([firstname], [surname], [dob] ,[personid])
VALUES
('Alice', 'AA', '1/1/1990', 1),
('Alice', 'AA', '1/1/1990', 2),
('Bob' , 'BB', '1/1/1990', 3),
('Carol', 'CC', '1/1/1990', 4),
('Alice', 'AA', '1/1/1990', 5),
('Kate' , 'KK', '1/1/1990', 6),
('Kate' , 'KK', '1/1/1990', 7)
;
CREATE TABLE person_membership
([personid] int, [personstatus] varchar(1), [memberid] int);
INSERT INTO person_membership
([personid], [personstatus], [memberid])
VALUES
(1, 'A', 10),
(2, 'A', 20),
(3, 'A', 30),
(3, 'A', 40),
(4, 'A', 50),
(4, 'A', 60),
(5, 'T', 70),
(6, 'A', 80),
(7, 'A', 90);
CREATE TABLE membership
([membershipid] int, [memstatus] varchar(1));
INSERT INTO membership
([membershipid], [memstatus])
VALUES
(10, 'A'),
(20, 'A'),
(30, 'A'),
(40, 'A'),
(50, 'T'),
(60, 'A'),
(70, 'A'),
(80, 'A'),
(90, 'T');
There are three tables (as per the fiddle above). The person table contains duplicates: the same people were entered more than once. For the purpose of this exercise, we assume that the combination of first name, surname and DoB is enough to uniquely identify a person.
I am trying to build a query that shows duplicated people (first name + surname + DoB) with two or more active entries in the person table (person_membership.personstatus = 'A') AND two or more active memberships (membership.memstatus = 'A').
Using the example from SQL Fiddle, the result of the query should be just Alice (two active person IDs, two active membership IDs).
I think I'm making progress with the following attempt, but it looks rather cumbersome, and I still need to remove Kate from the final result: she doesn't have a duplicate active membership.
SELECT q.firstname, q.surname, q.dob, p1.personid, m.membershipid
FROM
(SELECT
p.firstname,p.surname,p.dob, count(*) as cnt
FROM
person p
GROUP BY
p.firstname,p.surname,p.dob
HAVING COUNT(1) > 1) as q
INNER JOIN person p1 ON q.firstname=p1.firstname AND q.surname=p1.surname AND q.dob=p1.dob
INNER JOIN person_membership pm ON p1.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
Since you are using SQL Server, window functions are handy for this scenario. The following gives the expected output:
SELECT firstname,surname,dob,personid,memberid
from(
SELECT firstname,surname,dob,p.personid,memberid
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid) rnasc
,Rank() over(partition by p.firstname,p.surname,p.dob order by p.personid desc) rndesc
FROM person p
INNER JOIN person_membership pm ON p.personid=pm.personid
INNER JOIN membership m ON pm.memberid = m.membershipid
where personstatus='A' and memstatus='A')a
where a.rnasc+rndesc>2
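Why rnasc + rndesc > 2 works: for a given row, RANK ascending is 1 plus the number of rows with a smaller personid, and RANK descending is 1 plus the number with a larger one, so the sum exceeds 2 exactly when another active row in the same name/DoB partition has a different personid. A small check, sketched with Python's sqlite3 and the fiddle data (dob stored as ISO text instead of '1/1/1990', an assumption):

```python
import sqlite3

PERSON = [("Alice", "AA", "1990-01-01", 1), ("Alice", "AA", "1990-01-01", 2),
          ("Bob", "BB", "1990-01-01", 3), ("Carol", "CC", "1990-01-01", 4),
          ("Alice", "AA", "1990-01-01", 5), ("Kate", "KK", "1990-01-01", 6),
          ("Kate", "KK", "1990-01-01", 7)]
PERSON_MEMBERSHIP = [(1, "A", 10), (2, "A", 20), (3, "A", 30), (3, "A", 40), (4, "A", 50),
                     (4, "A", 60), (5, "T", 70), (6, "A", 80), (7, "A", 90)]
MEMBERSHIP = [(10, "A"), (20, "A"), (30, "A"), (40, "A"), (50, "T"),
              (60, "A"), (70, "A"), (80, "A"), (90, "T")]

RANK_SQL = """
SELECT firstname, surname, dob, personid, memberid
FROM (
    SELECT p.firstname, p.surname, p.dob, p.personid, pm.memberid,
           RANK() OVER (PARTITION BY p.firstname, p.surname, p.dob ORDER BY p.personid) AS rnasc,
           RANK() OVER (PARTITION BY p.firstname, p.surname, p.dob ORDER BY p.personid DESC) AS rndesc
    FROM person p
    JOIN person_membership pm ON p.personid = pm.personid
    JOIN membership m ON pm.memberid = m.membershipid
    WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
) AS a
-- rnasc + rndesc = 2 + (rows in the partition with a different personid)
WHERE a.rnasc + a.rndesc > 2
ORDER BY personid, memberid
"""


def active_duplicates():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (firstname TEXT, surname TEXT, dob TEXT, personid INTEGER)")
    con.execute("CREATE TABLE person_membership (personid INTEGER, personstatus TEXT, memberid INTEGER)")
    con.execute("CREATE TABLE membership (membershipid INTEGER, memstatus TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", PERSON)
    con.executemany("INSERT INTO person_membership VALUES (?, ?, ?)", PERSON_MEMBERSHIP)
    con.executemany("INSERT INTO membership VALUES (?, ?)", MEMBERSHIP)
    rows = con.execute(RANK_SQL).fetchall()
    con.close()
    return rows


print(active_duplicates())
```

Bob drops out because both of his active rows share personid 3, so both ranks are 1 on each row.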
You need to add GROUP BY and HAVING clauses to return only the duplicated people:
SELECT
person.firstname,person.surname,person.dob
FROM
person, person_membership, membership
WHERE
person.personid=person_membership.personid AND person_membership.memberid = membership.membershipid
AND
person_membership.personstatus = 'A' AND membership.memstatus = 'A'
GROUP BY
person.firstname,person.surname,person.dob
HAVING COUNT(DISTINCT person.personid) > 1 AND COUNT(DISTINCT membership.membershipid) > 1
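Counting joined rows, as in HAVING COUNT(1) > 1, also returns Bob: he has a single person row but two active memberships, so his one personid produces two joined rows. Counting distinct ids avoids that. A sketch comparing both HAVING clauses with Python's sqlite3 on the fiddle data (dob stored as ISO text, an assumption):

```python
import sqlite3

PERSON = [("Alice", "AA", "1990-01-01", 1), ("Alice", "AA", "1990-01-01", 2),
          ("Bob", "BB", "1990-01-01", 3), ("Carol", "CC", "1990-01-01", 4),
          ("Alice", "AA", "1990-01-01", 5), ("Kate", "KK", "1990-01-01", 6),
          ("Kate", "KK", "1990-01-01", 7)]
PERSON_MEMBERSHIP = [(1, "A", 10), (2, "A", 20), (3, "A", 30), (3, "A", 40), (4, "A", 50),
                     (4, "A", 60), (5, "T", 70), (6, "A", 80), (7, "A", 90)]
MEMBERSHIP = [(10, "A"), (20, "A"), (30, "A"), (40, "A"), (50, "T"),
              (60, "A"), (70, "A"), (80, "A"), (90, "T")]

NAIVE_HAVING = "COUNT(1) > 1"  # counts joined rows
DISTINCT_HAVING = "COUNT(DISTINCT p.personid) > 1 AND COUNT(DISTINCT m.membershipid) > 1"


def duplicated_people(having):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (firstname TEXT, surname TEXT, dob TEXT, personid INTEGER)")
    con.execute("CREATE TABLE person_membership (personid INTEGER, personstatus TEXT, memberid INTEGER)")
    con.execute("CREATE TABLE membership (membershipid INTEGER, memstatus TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", PERSON)
    con.executemany("INSERT INTO person_membership VALUES (?, ?, ?)", PERSON_MEMBERSHIP)
    con.executemany("INSERT INTO membership VALUES (?, ?)", MEMBERSHIP)
    rows = con.execute(f"""
        SELECT p.firstname, p.surname, p.dob
        FROM person AS p
        JOIN person_membership AS pm ON p.personid = pm.personid
        JOIN membership AS m ON pm.memberid = m.membershipid
        WHERE pm.personstatus = 'A' AND m.memstatus = 'A'
        GROUP BY p.firstname, p.surname, p.dob
        HAVING {having}
        ORDER BY p.firstname
    """).fetchall()
    con.close()
    return rows


print(duplicated_people(NAIVE_HAVING))     # includes Bob
print(duplicated_people(DISTINCT_HAVING))  # only Alice
```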
WITH Encashment AS (
SELECT T.MachineId, T.Amount, CAST(Occured AS DATETIME) AS Occured
FROM (VALUES
(1, 101, '2017-10-20 09:36:40.057')
,(1, 203, '2017-10-14 12:36:30.081')
,(1, 400, '2017-10-11 04:17:38.023')
) AS T(MachineId, Amount, Occured)
), MoneyAccepted AS (
SELECT T.MachineId, T.Amount, CAST(Occured AS DATETIME) AS Occured
FROM (VALUES
(1, 1, '2017-10-15 09:36:40.057')
,(1, 100, '2017-10-16 12:36:30.081')
,(1, 100, '2017-10-12 16:17:38.023')
,(1, 1, '2017-10-13 09:37:47.057')
,(1, 1, '2017-10-13 09:37:47.057')
,(1, 1, '2017-10-12 15:37:47.057')
,(1, 100, '2017-09-15 12:37:31.081')
,(1, 100, '2017-09-15 16:37:31.081')
,(1, 100, '2017-09-16 13:37:31.081')
,(1, 100, '2017-09-17 13:37:31.081')
) AS T(MachineId, Amount, Occured)
)
I can get the amount of each encashment (SELECT Amount FROM Encashment).
But I want to get, for every encashment, the total amount from MoneyAccepted since the previous encashment.
For example: an encashment happened on 2017-10-20, and up to that time 101 was accepted since the previous encashment (100 at 2017-10-16 12:36:30.081 + 1 at 2017-10-15 09:36:40.057).
How can I get that?
Thanks in advance!
I think what you are looking for is:
DECLARE @Encashment AS TABLE (MachineID INT, Amount INT, Occured DATETIME2)
DECLARE @MoneyAccepted AS TABLE (MachineID INT, Amount INT, Occured DATETIME2)
INSERT @Encashment (MachineID, Amount, Occured)
VALUES (1, 101, '20171020 09:36:40.057')
, (1, 203, '20171014 12:36:30.081')
, (1, 400, '20171011 04:17:38.023')
INSERT @MoneyAccepted (MachineID, Amount, Occured)
VALUES (1, 1, '20171015 09:36:40.057')
, (1, 100, '20171016 12:36:30.081')
, (1, 100, '20171012 16:17:38.023')
, (1, 100, '20171014 09:17:38.023')
, (1, 1, '20171013 09:37:47.057')
, (1, 1, '20171013 09:37:47.057')
, (1, 1, '20171012 15:37:31.081')
SELECT E.Occured AS Encashment_Occured
, SUM(MA.Amount) AS SUM_Amount
FROM @MoneyAccepted AS MA
INNER JOIN (
SELECT MachineID
, Amount
, Occured
, LAG(Occured) OVER(PARTITION BY MachineID ORDER BY Occured) AS Previous_Occured
FROM @Encashment
) AS E
ON E.MachineID = MA.MachineID
AND E.Occured > MA.Occured
AND E.Previous_Occured <= MA.Occured
GROUP BY E.Occured
Result:
+-----------------------------+------------+
| Encashment_Occured | SUM_Amount |
+-----------------------------+------------+
| 2017-10-14 12:36:30.0810000 | 203 |
| 2017-10-20 09:36:40.0570000 | 101 |
+-----------------------------+------------+
This uses LAG, which was introduced in SQL Server 2012, to get the range of applicable dates into a single row.
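The LAG-based join drops the earliest encashment (2017-10-11), because its Previous_Occured is NULL and a comparison with NULL is never true. If money accepted before the first encashment should count towards it, the lower bound can be left open for that row. A sketch of this variant with Python's sqlite3, using the question's sample data (not the answer's, which adds a 2017-10-14 row, so the totals differ from the table above); the snake_case names are my own:

```python
import sqlite3

ENCASHMENT = [(1, 101, "2017-10-20 09:36:40.057"), (1, 203, "2017-10-14 12:36:30.081"),
              (1, 400, "2017-10-11 04:17:38.023")]
MONEY_ACCEPTED = [
    (1, 1, "2017-10-15 09:36:40.057"), (1, 100, "2017-10-16 12:36:30.081"),
    (1, 100, "2017-10-12 16:17:38.023"), (1, 1, "2017-10-13 09:37:47.057"),
    (1, 1, "2017-10-13 09:37:47.057"), (1, 1, "2017-10-12 15:37:47.057"),
    (1, 100, "2017-09-15 12:37:31.081"), (1, 100, "2017-09-15 16:37:31.081"),
    (1, 100, "2017-09-16 13:37:31.081"), (1, 100, "2017-09-17 13:37:31.081"),
]

LAG_SQL = """
SELECT e.occured, SUM(m.amount) AS sum_amount
FROM (
    SELECT machine_id, occured,
           LAG(occured) OVER (PARTITION BY machine_id ORDER BY occured) AS previous_occured
    FROM encashment
) AS e
JOIN money_accepted AS m
  ON m.machine_id = e.machine_id
 AND m.occured < e.occured
 -- the earliest encashment has no predecessor, so its window is open at the bottom
 AND (e.previous_occured IS NULL OR m.occured >= e.previous_occured)
GROUP BY e.machine_id, e.occured
ORDER BY e.occured
"""


def money_per_encashment():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE encashment (machine_id INTEGER, amount INTEGER, occured TEXT)")
    con.execute("CREATE TABLE money_accepted (machine_id INTEGER, amount INTEGER, occured TEXT)")
    con.executemany("INSERT INTO encashment VALUES (?, ?, ?)", ENCASHMENT)
    con.executemany("INSERT INTO money_accepted VALUES (?, ?, ?)", MONEY_ACCEPTED)
    rows = con.execute(LAG_SQL).fetchall()
    con.close()
    return rows


print(money_per_encashment())
```

Timestamps are stored as ISO text here, which compares correctly as long as every value uses the same format.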
Please edit your question, remove html and use plain text for sample data.
I think you could use CROSS APPLY.
Try this:
WITH Encashment AS (
SELECT T.MachineId, T.Amount, CAST(Occured AS DATETIME) AS Occured
FROM (VALUES
(1, 101, '2017-10-20 09:36:40.057')
,(1, 203, '2017-10-14 12:36:30.081')
,(1, 400, '2017-10-11 04:17:38.023')
) AS T(MachineId, Amount, Occured)
), MoneyAccepted AS (
SELECT T.MachineId, T.Amount, CAST(Occured AS DATETIME) AS Occured
FROM (VALUES
(1, 1, '2017-10-15 09:36:40.057')
,(1, 100, '2017-10-16 12:36:30.081')
,(1, 100, '2017-10-12 16:17:38.023')
,(1, 1, '2017-10-13 09:37:47.057')
,(1, 1, '2017-10-13 09:37:47.057')
,(1, 1, '2017-10-12 15:37:47.057')
,(1, 100, '2017-09-15 12:37:31.081')
,(1, 100, '2017-09-15 16:37:31.081')
,(1, 100, '2017-09-16 13:37:31.081')
,(1, 100, '2017-09-17 13:37:31.081')
) AS T(MachineId, Amount, Occured)
)
SELECT M.*, EN.*
FROM MoneyAccepted AS M
CROSS APPLY (
SELECT TOP (1) E.* FROM Encashment AS E
WHERE E.MachineId = M.MachineId AND E.Occured > M.Occured
ORDER BY E.Occured ASC
) AS EN
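The CROSS APPLY query pairs each MoneyAccepted row with the next encashment but does not total them; grouping by that next encashment gives the per-encashment sum. SQLite has no CROSS APPLY, so this sketch (Python's sqlite3, the question's sample data, my own snake_case names) uses a correlated MIN() subquery, which picks the same encashment as TOP (1) ... ORDER BY E.Occured ASC:

```python
import sqlite3

ENCASHMENT = [(1, 101, "2017-10-20 09:36:40.057"), (1, 203, "2017-10-14 12:36:30.081"),
              (1, 400, "2017-10-11 04:17:38.023")]
MONEY_ACCEPTED = [
    (1, 1, "2017-10-15 09:36:40.057"), (1, 100, "2017-10-16 12:36:30.081"),
    (1, 100, "2017-10-12 16:17:38.023"), (1, 1, "2017-10-13 09:37:47.057"),
    (1, 1, "2017-10-13 09:37:47.057"), (1, 1, "2017-10-12 15:37:47.057"),
    (1, 100, "2017-09-15 12:37:31.081"), (1, 100, "2017-09-15 16:37:31.081"),
    (1, 100, "2017-09-16 13:37:31.081"), (1, 100, "2017-09-17 13:37:31.081"),
]

NEXT_ENCASHMENT_SQL = """
SELECT next_encashment, SUM(amount) AS sum_amount
FROM (
    SELECT m.machine_id, m.amount,
           (SELECT MIN(e.occured) FROM encashment AS e
            WHERE e.machine_id = m.machine_id AND e.occured > m.occured) AS next_encashment
    FROM money_accepted AS m
) AS paired
WHERE next_encashment IS NOT NULL  -- like CROSS APPLY, drop money with no later encashment
GROUP BY machine_id, next_encashment
ORDER BY next_encashment
"""


def totals_by_next_encashment():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE encashment (machine_id INTEGER, amount INTEGER, occured TEXT)")
    con.execute("CREATE TABLE money_accepted (machine_id INTEGER, amount INTEGER, occured TEXT)")
    con.executemany("INSERT INTO encashment VALUES (?, ?, ?)", ENCASHMENT)
    con.executemany("INSERT INTO money_accepted VALUES (?, ?, ?)", MONEY_ACCEPTED)
    rows = con.execute(NEXT_ENCASHMENT_SQL).fetchall()
    con.close()
    return rows


print(totals_by_next_encashment())
```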
I have two tables that I want to join, showing the quantity with details.
The tables are joined on ITM and DIA, and the total QTY is equal in both tables for each ITM/DIA combination.
I want to split the table2 quantities across the table1 rows and show the table2 data alongside the table1 data.
I have included sample data below as "table1" and "table2"; you can see my expected result in "tableResult".
CREATE TABLE table1
(`ITM` varchar(5), `DIA` varchar(4), `LOC` varchar(4), `ID` varchar(3), `QTY` int)
;
INSERT INTO table1
(`ITM`, `DIA`, `LOC`, `ID`, `QTY`)
VALUES
('Item1', 'DIA1', 'LOC1', 'ID1', 3),
('Item1', 'DIA1', 'LOC2', 'ID2', 4),
('Item1', 'DIA1', 'LOC2', 'ID2', 6),
('Item1', 'DIA2', 'LOC2', 'ID2', 6),
('Item1', 'DIA2', 'LOC3', 'ID3', 18),
('Item1', 'DIA2', 'LOC4', 'ID4', 90),
('Item1', 'DIA2', 'LOC4', 'ID5', 23),
('Item1', 'DIA3', 'LOC5', 'ID6', 50),
('Item1', 'DIA3', 'LOC6', 'ID7', 20),
('Item2', 'DIA1', 'LOC4', 'ID8', 44),
('Item2', 'DIA2', 'LOC5', 'ID8', 21),
('Item2', 'DIA3', 'LOC6', 'ID9', 20)
;
CREATE TABLE table2
(`ITM` varchar(5), `DIA` varchar(4), `NTA` varchar(5), `QTY` int)
;
INSERT INTO table2
(`ITM`, `DIA`, `NTA`, `QTY`)
VALUES
('Item1', 'DIA1', 'NTA1', 10),
('Item1', 'DIA1', 'NTA2', 3),
('Item1', 'DIA2', 'NTA3', 30),
('Item1', 'DIA2', 'NTA4', 7),
('Item1', 'DIA2', 'NTA5', 100),
('Item1', 'DIA3', 'NTA6', 70),
('Item2', 'DIA1', 'NTA7', 22),
('Item2', 'DIA1', 'NTA8', 20),
('Item2', 'DIA2', 'NTA9', 6),
('Item2', 'DIA2', 'NTA10', 15),
('Item2', 'DIA3', 'NTA11', 8),
('Item2', 'DIA3', 'NTA11', 12)
;
CREATE TABLE tableResult
(`ITM` varchar(5), `DIA` varchar(4), `LOC` varchar(4), `ID` varchar(3), `QTY` int, `NTA` varchar(5), `NewQTY` int)
;
INSERT INTO tableResult
(`ITM`, `DIA`, `LOC`, `ID`, `QTY`, `NTA`, `NewQTY`)
VALUES
('Item1', 'DIA1', 'LOC1', 'ID1', 3, 'NTA1', 3),
('Item1', 'DIA1', 'LOC2', 'ID2', 4, 'NTA1', 4),
('Item1', 'DIA1', 'LOC2', 'ID2', 6, 'NTA1', 3),
('Item1', 'DIA1', 'LOC2', 'ID2', 6, 'NTA2', 3),
('Item1', 'DIA2', 'LOC2', 'ID2', 6, 'NTA3', 6),
('Item1', 'DIA2', 'LOC3', 'ID3', 18, 'NTA3', 18),
('Item1', 'DIA2', 'LOC4', 'ID4', 90, 'NTA3', 6),
('Item1', 'DIA2', 'LOC4', 'ID4', 90, 'NTA4', 7),
('Item1', 'DIA2', 'LOC4', 'ID4', 90, 'NTA5', 77),
('Item1', 'DIA2', 'LOC4', 'ID5', 23, 'NTA5', 23),
('Item1', 'DIA3', 'LOC5', 'ID6', 50, 'NTA6', 50),
('Item1', 'DIA3', 'LOC6', 'ID7', 20, 'NTA6', 20),
('Item2', 'DIA1', 'LOC4', 'ID8', 44, 'NTA7', 22),
('Item2', 'DIA1', 'LOC4', 'ID8', 44, 'NTA8', 20),
('Item2', 'DIA2', 'LOC5', 'ID8', 21, 'NTA9', 6),
('Item2', 'DIA2', 'LOC5', 'ID8', 21, 'NTA10', 15),
('Item2', 'DIA3', 'LOC6', 'ID9', 20, 'NTA11', 8),
('Item2', 'DIA3', 'LOC6', 'ID9', 20, 'NTA11', 12)
;
I could do this with a stored procedure and a cursor, but I want to know whether there is an easier way in SQL Server 2014. I am sure a recursive CTE trick would help.
Could you please share your solution? I would appreciate your ideas.
You simply need to explode the quantities into units in both table1 and table2 and then pair them up side by side.
Pay attention to FN_NUMBERS(n): it is a function that returns a single column with the numbers from 1 to n, and you need it in your database. There are many ways to build it; search for "tally tables" or look here.
I use the following:
CREATE FUNCTION FN_NUMBERS(
@MAX INT
)
RETURNS @N TABLE (N INT NOT NULL PRIMARY KEY)
BEGIN
WITH
Pass0 as (select '1' as C union all select '1'), --2 rows
Pass1 as (select '1' as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select '1' as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select '1' as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select TOP (@MAX) '1' as C from Pass3 as A, Pass3 as B) --65536 rows
,Tally as (select TOP (@MAX) '1' as C from Pass4 as A, Pass2 as B, Pass1 as C) --4194304 rows
--,Tally as (select TOP (@MAX) '1' as C from Pass4 as A, Pass3 as B) --16777216 rows
--,Tally as (select TOP (@MAX) '1' as C from Pass4 as A, Pass4 as B) --4294967296 rows
INSERT INTO @N
SELECT TOP (@MAX) ROW_NUMBER() OVER(ORDER BY C) AS N
FROM Tally
RETURN
END
Back to the SQL:
;with
t1 as (
select *, ROW_NUMBER() over (partition by itm,dia order by loc,id,qty,n) rn -- qty,n keep rows with the same loc/id from interleaving
from table1 t1
join FN_NUMBERS(500) on n<=t1.qty
),
t2 as (
select *, ROW_NUMBER() over (partition by itm,dia order by nta) rn
from table2 t2
join FN_NUMBERS(500) on n<=t2.qty
),
t3 as (
select t1.itm, t1.dia, t1.loc, t1.id, t1.qty, t2.nta, count(t1.n) NewQTY
from t1
join t2 on t1.itm=t2.itm and t1.dia = t2.dia and t1.rn=t2.rn
group by t1.itm, t1.dia, t1.loc, t1.id, t1.qty, t2.nta
)
select *
from t3
order by 1,2,3,4,5,6
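A self-contained sketch of the same explode-and-pair idea with Python's sqlite3, where a recursive CTE plays the role of FN_NUMBERS. Two choices here are my own assumptions rather than the answer's: units are ordered by insertion order (SQLite's rowid), so the two LOC2/ID2 rows of Item1/DIA1 are consumed in the order they were inserted, and the result is grouped by source row instead of by NTA, so the two NTA11 rows of Item2/DIA3 stay separate (8 and 12) as in tableResult:

```python
import sqlite3

TABLE1 = [
    ("Item1", "DIA1", "LOC1", "ID1", 3), ("Item1", "DIA1", "LOC2", "ID2", 4),
    ("Item1", "DIA1", "LOC2", "ID2", 6), ("Item1", "DIA2", "LOC2", "ID2", 6),
    ("Item1", "DIA2", "LOC3", "ID3", 18), ("Item1", "DIA2", "LOC4", "ID4", 90),
    ("Item1", "DIA2", "LOC4", "ID5", 23), ("Item1", "DIA3", "LOC5", "ID6", 50),
    ("Item1", "DIA3", "LOC6", "ID7", 20), ("Item2", "DIA1", "LOC4", "ID8", 44),
    ("Item2", "DIA2", "LOC5", "ID8", 21), ("Item2", "DIA3", "LOC6", "ID9", 20),
]
TABLE2 = [
    ("Item1", "DIA1", "NTA1", 10), ("Item1", "DIA1", "NTA2", 3),
    ("Item1", "DIA2", "NTA3", 30), ("Item1", "DIA2", "NTA4", 7),
    ("Item1", "DIA2", "NTA5", 100), ("Item1", "DIA3", "NTA6", 70),
    ("Item2", "DIA1", "NTA7", 22), ("Item2", "DIA1", "NTA8", 20),
    ("Item2", "DIA2", "NTA9", 6), ("Item2", "DIA2", "NTA10", 15),
    ("Item2", "DIA3", "NTA11", 8), ("Item2", "DIA3", "NTA11", 12),
]

SPLIT_SQL = """
WITH RECURSIVE numbers(n) AS (      -- stands in for FN_NUMBERS: 1 .. max quantity
    SELECT 1 UNION ALL SELECT n + 1 FROM numbers WHERE n < ?
),
units1 AS (                         -- one row per unit of table1, numbered per ITM/DIA
    SELECT t.rowid AS src, t.itm, t.dia, t.loc, t.id, t.qty,
           ROW_NUMBER() OVER (PARTITION BY t.itm, t.dia ORDER BY t.rowid, numbers.n) AS rn
    FROM table1 AS t JOIN numbers ON numbers.n <= t.qty
),
units2 AS (                         -- one row per unit of table2, numbered per ITM/DIA
    SELECT t.rowid AS src, t.itm, t.dia, t.nta,
           ROW_NUMBER() OVER (PARTITION BY t.itm, t.dia ORDER BY t.rowid, numbers.n) AS rn
    FROM table2 AS t JOIN numbers ON numbers.n <= t.qty
)
SELECT u1.itm, u1.dia, u1.loc, u1.id, u1.qty, u2.nta, COUNT(*) AS new_qty
FROM units1 AS u1
JOIN units2 AS u2 ON u1.itm = u2.itm AND u1.dia = u2.dia AND u1.rn = u2.rn
GROUP BY u1.src, u1.itm, u1.dia, u1.loc, u1.id, u1.qty, u2.src, u2.nta
ORDER BY u1.src, u2.src
"""


def split_quantities():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (itm TEXT, dia TEXT, loc TEXT, id TEXT, qty INTEGER)")
    con.execute("CREATE TABLE table2 (itm TEXT, dia TEXT, nta TEXT, qty INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)
    max_qty = max(row[-1] for row in TABLE1 + TABLE2)
    rows = con.execute(SPLIT_SQL, (max_qty,)).fetchall()
    con.close()
    return rows


for row in split_quantities():
    print(row)
```

Units without a partner are dropped by the inner join: for Item2/DIA1, table1 holds 44 units but table2 only 42, so two units of ID8 get no NTA.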
I've got a table with almost 10 million views and would like to run this query on only the latest million or hundred thousand rows or so.
Here's a SQL fiddle with example data and input/output: http://sqlfiddle.com/#!9/340a41
Is this even possible?
CREATE TABLE object (`id` int, `name` varchar(7), `value` int);
INSERT INTO object (`id`, `name`, `value`)
VALUES
(1, 'a', 1),
(2, 'b', 2),
(3, 'c', 100),
(4, 'a', 1),
(5, 'b', 2),
(6, 'c', 200),
(7, 'a', 2),
(8, 'b', 2),
(9, 'c', 300),
(10, 'a', 2),
(11, 'b', 2),
(12, 'a', 2),
(13, 'b', 2),
(14, 'c', 400)
;
-- Want:
-- name, max(id), count(id)
-- 'a', 4, 2
-- 'b', 14, 5
-- 'a', 12, 3
If you want the latest rows and the id is assigned sequentially, you can do this using TOP or LIMIT. In SQL Server:
select top 100000 o.*
from object o
order by id desc;
In MySQL, you would use limit:
select o.*
from object o
order by id desc
limit 100000
select name, count(id) cnt, max(id) max_id, max(value) max_v
from
(select
top 1000000 -- MS SQL Server
id,name,value
from myTable
order by id desc
limit 1000000 -- MySQL
) as latest
group by name
Remove the line that doesn't match your server (TOP for SQL Server, LIMIT for MySQL).
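A quick check of that shape (ORDER BY ... LIMIT inside an aliased derived table, then GROUP BY), using Python's sqlite3, whose LIMIT placement matches MySQL's here, and the question's sample rows; passing n as a parameter is my own addition:

```python
import sqlite3

OBJECT_ROWS = [(1, "a", 1), (2, "b", 2), (3, "c", 100), (4, "a", 1), (5, "b", 2),
               (6, "c", 200), (7, "a", 2), (8, "b", 2), (9, "c", 300), (10, "a", 2),
               (11, "b", 2), (12, "a", 2), (13, "b", 2), (14, "c", 400)]

LATEST_SQL = """
SELECT name, COUNT(id) AS cnt, MAX(id) AS max_id, MAX(value) AS max_v
FROM (
    SELECT id, name, value
    FROM object
    ORDER BY id DESC
    LIMIT ?            -- keep only the latest n ids before aggregating
) AS latest            -- the derived table gets an alias
GROUP BY name
ORDER BY name
"""


def summarize_latest(n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE object (id INTEGER, name TEXT, value INTEGER)")
    con.executemany("INSERT INTO object VALUES (?, ?, ?)", OBJECT_ROWS)
    rows = con.execute(LATEST_SQL, (n,)).fetchall()
    con.close()
    return rows


print(summarize_latest(5))    # only ids 10..14 are aggregated
```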