SQL update from one Table A to table B based on match 2 ID based on aggregation of points - sql

trying to update the value to table(final) from table(first_stage), both of these tables have the same field name but different values, tables contents are:
(Min_Date) date,
(Max_Date) date,
(NoofDays) int,
(IMSI) string,
(Site) string,
(Down_Link) int,
(Up_Link) int,
(Connection) int
based on IMSI and Site, if it exists on the table row then take the minimum date as Min_Date and Maximum date as Max_Date and get the
min(Min_Date),max(Max_Date)sum(NoofDays),sum(Down_Link),sum(up_Link),sum(connection)
and if the both of row id's are not matched(IMSI,Site) with table (Final) then insert the row into final table. I'm still newbee with sql
table first_stage:
MinDate Max_Date NoofDays IMSI Site Down_link Up_link Connection
2019-03-22 2019-03-26 1 222 google 1 1 1
2019-03-26 2019-03-27 3 222 youtube 1 1 1
2019-03-02 2019-03-27 5 333 facebook 2 3 1
2019-03-02 2019-03-27 5 111 facebook 20 33 11
table final:
MinDate Max_Date NoofDays IMSI Site Down_link Up_link Connection
2019-03-01 2019-03-27 1 222 google 2 2 1
2019-03-12 2019-03-25 1 222 youtube 2 2 2
2019-03-25 2019-03-27 4 333 facebook 3 6 1
it must matched with both IMSI and Site to make update statement, final table after update must be look like as the below:
table final:
MinDate Max_Date NoofDays IMSI Site Down_link Up_link Connection
2019-03-01 2019-03-27 2 222 google 3 3 2
2019-03-12 2019-03-27 4 222 youtube 3 3 3
2019-03-02 2019-03-27 9 333 facebook 5 9 2
2019-03-02 2019-03-27 5 111 facebook 20 33 11

I have never worked with vertica, but I think this might work:
MERGE
INTO FINAL
USING FIRST_STAGE
ON IMSI = FIRST_STAGE.IMSI and Site = FIRST_STAGE.Site
WHEN MATCHED THEN UPDATE SET
Min_Date = least(FIRST_STAGE.Min_Date, Min_Date),
Max_Date = greatest(FIRST_STAGE.Max_Date, Max_Date),
NoofDays = FIRST_STAGE.NoofDays + NoofDays,
Down_Link = FIRST_STAGE.Down_Link + Down_Link,
up_Link = FIRST_STAGE.up_Link + up_Link,
connection = FIRST_STAGE.connection + connection
WHEN NOT MATCHED THEN INSERT ( Min_Date,
Max_Date,
NoofDays,
IMSI,
Site,
Down_Link,
Up_Link,
Connection )
VALUES ( FIRST_STAGE.Min_Date,
FIRST_STAGE.Max_Date,
FIRST_STAGE.NoofDays,
FIRST_STAGE.IMSI,
FIRST_STAGE.Site,
FIRST_STAGE.Down_Link,
FIRST_STAGE.Up_Link,
FIRST_STAGE.Connection )

Related

How to index match with conditions in sql

I have tables like this:
regist table
userID
registDate
1
2022-01-22
2
2022-01-23
session table
userID
date_key
traffic
null
2022-01-02
facebook
1
2021-01-03
facebook
1
2021-01-04
google
1
2021-01-05
linkedin
2
2021-01-15
facebook
2
2021-01-25
facebook
3
2021-01-20
facebook
Output
userID
date_key
traffic
regist date
1
2021-01-03
facebook
2022-01-22
1
2021-01-04
google
2022-01-22
1
2021-01-05
linkedin
2022-01-22
2
2021-01-15
facebook
2022-01-23
How do I merge the tables so that I can return the regist date. Do I do a right join?
Is this correct?
select *
from sessiontables st
left join registtable rt on st.userID = rt.userID
where st.userID is not null
How to do exist userID exist in regist table statement?
if I understand correctly, You can try to use self join with an aggregate function.
select rt.userID,
st.date_key,
st.traffic,
rt.registDate
from (
SELECT userID,min(date_key) date_key,traffic
FROM sessiontables
GROUP BY traffic,userID
) st
JOIN registtable rt
ON st.userID=rt.userID

Query to find active days per year to find revenue per user per year

I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id
user_name
user_joining_date
1
Steve
2013-01-04
2
Adam
2012-11-01
3
John
2013-05-05
4
Tony
2012-01-01
5
Dan
2010-01-01
6
Alex
2019-01-01
7
Kim
2019-01-01
bundle_dim
bundle_id
bundle_name
bundle_type
bundle_cost_per_day
101
movies and TV
prime
5.5
102
TV and sports
prime
6.5
103
Cooking
prime
7
104
Sports and news
prime
5
105
kids movie
extra
2
106
kids educative
extra
3.5
107
spanish news
extra
2.5
108
Spanish TV and sports
extra
3.5
109
Travel
extra
2
plans_fact
user_id
bundle_id
bundle_start_date
bundle_end_date
1
101
2019-10-10
2020-10-10
2
107
2020-01-15
(null)
2
106
2020-01-15
2020-12-31
2
101
2020-01-15
(null)
2
103
2020-01-15
2020-02-15
1
101
2020-10-11
(null)
1
107
2019-10-10
2020-10-10
1
105
2019-10-10
2020-10-10
4
101
2021-01-01
2021-02-01
3
104
2020-02-17
2020-03-17
2
108
2020-01-15
(null)
4
102
2021-01-01
(null)
4
103
2021-01-01
(null)
4
108
2021-01-01
(null)
5
103
2020-01-15
(null)
5
101
2020-01-15
2020-02-15
6
101
2021-01-01
2021-01-17
6
101
2021-01-20
(null)
6
108
2021-01-01
(null)
7
104
2020-02-17
(null)
7
103
2020-01-17
2020-01-18
1
102
2020-12-11
(null)
2
106
2021-01-01
(null)
7
107
2020-01-15
(null)
note: NULL bundle_end_date refers to active subscription.
user active days can be calculated as: bundle_end_date - bundle_start_date (for the given bundle)
total revenue per user could be calculated as : total no. of active days * bundle rate per day
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
, sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.price_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to help parse out each multi-year spanning row into it's seperate years. For each year, you need to also recalculate the start and end dates. That's what I do in the yearParsed cte in the code below. I hard code the years into the join statement that creates y. You probably will do it different but however you get those values will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
Now, if this is common, you should probably create a base-table for your 'year' table. But if it's not common, but for this report you don't want to have to keep coming back to hard-code the year information into the y table, you can do this:
declare #yearTable table (
year int,
startDt char(10),
endDt char(10)
);
with y as (
select year = year(min(pf.bundle_start_date))
from #plans_fact pf
union all
select year + 1
from y
where year < year(getdate())
)
insert #yearTable
select year,
startDt = convert(char(4),year) + '-01-01',
endDt = convert(char(4),year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.

Creating a new calculated column in SQL

Is there a way to find the solution so that I need for 2 days, there are 2 UD's because there are June 24 2 times and for the rest there are single days.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is 4 users are there who used the application on 1 day and there are 2 userd who used the application on 2 days
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
from t
group by date
) d
group by cnt
order by cnt desc;

Writing SQL INSERT which retrieves its data from two separate related rows

I am writing a SQL script that is to insert a new record using data from two rows that are under the same AccountID.
My table looks like the following:
AccountID | ActivityId | DisplayDetails | TransactionDate | EnvironmentId
============================================================================
1 7 Display1 2015-02-02 00:00:00.000 1
1 8 DisplayThis1 2018-02-02 00:00:00.000 1
1 7 Display2 1999-02-02 00:00:00.000 2
1 8 DisplayThis2 2000-02-02 00:00:00.000 2
My fix is to find find each 7,8 combination and insert a new row with ActivityId 78 that gets the DisplayDetails from ActivityId 7 and TransactionDate from ActivityId 8.
My queries looks like the following:
SELECT *
INTO #ActivityEight
FROM Account A
WHERE A.ActivityId = 8
INSERT INTO #Account (AccountId, ActivityId, DisplayDetails, TransactionDate)
SELECT VL.AccountId, 78, S.DisplayDetails, VL.TransactionDate
FROM #temp2 VL WITH(NOLOCK)
JOIN #ActivityEight S
ON VL.AccountId = S.AccountId
WHERE VL.ActivityId = 7
However when I run SELECT * FROM Account I get a 78 row for each 7 and 8 row, when I should only get 1 78 row per 7 and 8 combination.
AccountID | ActivityId | DisplayDetails | TransactionDate | EnvironmentId
=============================================================================
1 7 Display1 2015-02-02 00:00:00.000 1
1 8 DisplayThis1 2018-02-02 00:00:00.000 1
1 7 Display2 1999-02-02 00:00:00.000 2
1 8 DisplayThis2 2000-02-02 00:00:00.000 2
1 78 DisplayThis1 2015-02-02 00:00:00.000 NULL
1 78 DisplayThis2 2015-02-02 00:00:00.000 NULL
1 78 DisplayThis1 1999-02-02 00:00:00.000 NULL
1 78 DisplayThis2 1999-02-02 00:00:00.000 NULL
I believe I can utilize the EnvironmentId to achieve the desired functionality, but I'm not sure how.
Any help would be appreciated.
Thanks!
I think this will help you
INSERT INTO #Account (AccountId, ActivityId, DisplayDetails, TransactionDate)
SELECT VL.AccountId, 78, S.DisplayDetails, VL.TransactionDate
FROM Account VL WITH(NOLOCK)
JOIN Account S ON VL.AccountId = S.AccountId and VL.EnvironmentId = S.EnvironmentId
WHERE VL.ActivityId = 7 and S.ActivityId = 8

Access SQL - Select only the last sequence

I have a table with an ID and multiple informative columns. Sometimes however, I can have multiple data for an ID, so I added a column called "Sequence". Here is a shortened example:
ID Sequence Name Tel Date Amount
124 1 Bob 873-4356 2001-02-03 10
124 2 Bob 873-4356 2002-03-12 7
124 3 Bob 873-4351 2006-07-08 24
125 1 John 983-4568 2007-02-01 3
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
So, I would like to obtain only these lines:
124 3 Bob 873-4351 2006-07-08 24
125 2 John 983-4568 2008-02-08 13
126 1 Eric 345-9845 2010-01-01 18
Anyone could give me a hand on how I could build a SQL query to do this ?
Thanks !
You can calculate the maximum sequence using group by. Then you can use join to get only the maximum in the original data.
Assuming your table is called t:
select t.*
from t join
(select id, MAX(sequence) as maxs
from t
group by id
) tmax
on t.id = tmax.id and
t.sequence = tmax.maxs