Sum rows of two tables - sql

I have a Database with two tables:
subscriber_mm
uid_local | uid_foreign | uid_partner
7 |2 |0
7 |4 |0
2 |1 |0
2 |2 |0
5 |1 |0
5 |3 |0
partner_mm
uid_local | uid_foreign | uid_partner
7 |1 |1
My goal is to count the total number of rows by uid_local from both tables
example:
count both tables by uid_local = 7
result: 3
example:
count both tables by uid_local = 2
result: 2
This is my solution (not the best), without the WHERE clause:
SELECT sum(
ROWS ) AS total_rows
FROM (
SELECT count( * ) AS ROWS
FROM partner_mm
UNION ALL
SELECT count( * ) AS ROWS
FROM subscriber_mm
) AS u
How can I implement the WHERE clause?

Pass your own value in place of 2 in this query:
select sum(total_count ) as total_count from
(select count(*) as total_count from subscriber_mm s where s.uid_local=2
union all
select count(*) as total_count from partner_mm m where m.uid_local=2) as a
or
select a.uid_local,sum(total_count ) as total_count from
(select s.uid_local as uid_local, count(*) as total_count from subscriber_mm s group by s.uid_local
union all
select m.uid_local as uid_local, count(*) as total_count from partner_mm m group by m.uid_local) as a
group by a.uid_local
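To check the first query against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the asker's database (an assumption), and count_rows is a hypothetical helper name, not something from the question.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriber_mm (uid_local INTEGER, uid_foreign INTEGER, uid_partner INTEGER);
    CREATE TABLE partner_mm    (uid_local INTEGER, uid_foreign INTEGER, uid_partner INTEGER);
    INSERT INTO subscriber_mm VALUES (7,2,0),(7,4,0),(2,1,0),(2,2,0),(5,1,0),(5,3,0);
    INSERT INTO partner_mm    VALUES (7,1,1);
""")


def count_rows(uid_local):
    """Hypothetical helper: total rows for one uid_local across both tables."""
    sql = """
        SELECT SUM(total_count) FROM (
            SELECT COUNT(*) AS total_count FROM subscriber_mm WHERE uid_local = ?
            UNION ALL
            SELECT COUNT(*) AS total_count FROM partner_mm WHERE uid_local = ?
        ) AS a
    """
    # The same value is bound once per branch of the UNION ALL.
    return conn.execute(sql, (uid_local, uid_local)).fetchone()[0]


print(count_rows(7))  # 3, as in the question's first example
print(count_rows(2))  # 2, as in the question's second example
```

Binding the value as a parameter rather than pasting it into the SQL text keeps the query reusable for any uid_local.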

Please try this:
select uid_local,uid_foreign,uid_partner from subscriber_mm
union
select uid_local,uid_foreign,uid_partner from partner_mm
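You can use UNION to combine the rows of the two tables. Note that UNION removes duplicate rows, so use UNION ALL if every row should be counted.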

There is no need for SUM, since you only need to count the total number of rows returned from both tables for a particular uid_local. You can get the total with the UNION ALL operator, which combines the result sets of both tables without removing duplicate rows:
Select count(*) as Result From
(
Select * from subscriber_mm
where uid_local=7
union all
Select * from partner_mm
where uid_local=7
)as tmp

Filling in missing data in Snowflake

I have a table in Snowflake like this:
TIME USER ITEM
1 frank 1
2 frank 0
3 frank 0
4 frank 0
5 frank 2
6 alf 5
7 alf 0
8 alf 6
9 alf 0
10 alf 9
I want to be able to replace all the zeroes with the next non-zero value, so in the end I have a table like this:
TIME USER ITEM
1 frank 1
2 frank 2
3 frank 2
4 frank 2
5 frank 2
6 alf 5
7 alf 6
8 alf 6
9 alf 9
10 alf 9
How would I write a query that does that in Snowflake?
You can use the conditional_change_event function for this - documented here:
with base_table as (
select
t1.*,
conditional_change_event(item) over (order by time desc) event_num
from test_table t1
order by time desc
)
select
t1.time,
t1.user,
t1.item old_item,
coalesce(t2.item, t1.item) new_item
from base_table t1
left join base_table t2 on t1.event_num = t2.event_num + 1 and t1.item = 0
order by t1.time asc
Above SQL Results:
+----+-----+--------+--------+
|TIME|USER |OLD_ITEM|NEW_ITEM|
+----+-----+--------+--------+
|1 |frank|1 |1 |
|2 |frank|0 |2 |
|3 |frank|0 |2 |
|4 |frank|0 |2 |
|5 |frank|2 |2 |
|6 |alf |5 |5 |
|7 |alf |0 |6 |
|8 |alf |6 |6 |
|9 |alf |0 |9 |
|10 |alf |9 |9 |
+----+-----+--------+--------+
You can use lead(ignore nulls):
select t.*,
(case when item = 0
then lead(nullif(item, 0) ignore nulls) over (partition by user order by time)
else item
end) as imputed_item
from t;
You can also phrase this using last_value():
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc)
from t;
If you want to use first_value() or last_value() in Snowflake, keep in mind that Snowflake handles window frames differently from the ANSI standard, as documented here. If you want the ANSI default frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, you have to state a frame explicitly; otherwise Snowflake defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, which is why the LAST_VALUE example from the previous answer does not work correctly. Here is one example that does work:
select t.*,
last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc rows between unbounded preceding and current row)
from t;
Nothing wrong with above solutions ... but here's a different approach ... I think it's simpler.
select * from good
union all
select
bad.time
,bad.user
,min(good.item)
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
Full COPY|PASTE|RUN SQL:
with cte as (
select * from (
select 1 time, 'frank' user , 1 item union
select 2 time, 'frank' user , 0 item union
select 3 time, 'frank' user , 0 item union
select 4 time, 'frank' user , 0 item union
select 5 time, 'frank' user , 2 item union
select 6 time, 'alf' user , 5 item union
select 7 time, 'alf' user , 0 item union
select 8 time, 'alf' user , 6 item union
select 9 time, 'alf' user , 0 item union
select 10 time, 'alf' user , 9) )
, good as (select * from cte where item<> 0)
, bad as (select * from cte where item= 0)
select * from good
union all
select
bad.time
,bad.user
,min(good.item )
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
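One caveat on the good/bad join: min(good.item) picks the smallest later non-zero value, not the nearest one, so it only matches the expected output when the later values happen to be increasing, as they are in the sample. A minimal sketch of a "nearest later non-zero" variant, run with Python's built-in sqlite3 module (SQLite is an assumption here, standing in for Snowflake; the correlated subquery is my substitution, not part of any answer above):

```python
import sqlite3

# Sample table from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time INTEGER, user TEXT, item INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0), (5, "frank", 2),
    (6, "alf", 5), (7, "alf", 0), (8, "alf", 6), (9, "alf", 0), (10, "alf", 9),
])

# For each zero, look up the closest later non-zero item of the same user.
filled = conn.execute("""
    SELECT t.time, t.user,
           CASE WHEN t.item = 0 THEN (
                    SELECT n.item
                    FROM t AS n
                    WHERE n.user = t.user AND n.time > t.time AND n.item <> 0
                    ORDER BY n.time
                    LIMIT 1)
                ELSE t.item
           END AS new_item
    FROM t
    ORDER BY t.time
""").fetchall()

for row in filled:
    print(row)
```

A zero with no later non-zero value for the same user comes back as NULL (None in Python) rather than being filled.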

SQL select top five most recent row and distinct by a specific column

OK, so say I have a table named appModelFlat, as pictured below, only with a few hundred more rows. It does not have a date field, but I want to find the five most recently created environments (EnvName). There are only 14 possible environments. I want to select the five most recently inserted rows that have different EnvName values. That is to say, I want the most recent 5 rows with distinct EnvName (although DISTINCT doesn't work this way), and I know which rows are most recent from their id: the higher the id, the newer the row. Any help with this query would be appreciated.
id|AppName|EnvName|ServerTypeName|ServerId|OS |OSVersion|CPU|Memory|ExtraStorage|MachineDesc |
----------------------------------------------------------------------------------------------------
1 |ASB |DEV |App |1 |Windows|7 |4 |4 |100 |ASB-DEV-App |
----------------------------------------------------------------------------------------------------
5 |AMS |DEV |APP |2 |RedHat |7.2 |4 |4 |50 |AMS-DEV-App |
----------------------------------------------------------------------------------------------------
6 |SPB |TST |App |1 |Windows|7 |2 |8 |50 |SPB-TST-App |
----------------------------------------------------------------------------------------------------
7 |SBI |TST |Oracle |1 |Solaris|11 |4 |8 |100 |SBI-TST-Oracle|
----------------------------------------------------------------------------------------------------
Here is my first attempt although I'm not sure if it is right. It does give me five results.
SELECT DISTINCT top 5 [ID] = ( SELECT TOP 1 [ID] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[AppName]= ( SELECT TOP 1 [AppName] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[EnvName]
,[ServerTypeName] = ( SELECT TOP 1 [ServerTypeName] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[ServerId] = ( SELECT TOP 1 [ServerId] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
,[OS] = ( SELECT TOP 1 [OS] FROM [AppModelFlat] Y WHERE Y.[EnvName] = X.[EnvName])
FROM [AppModelFlat] X order by id desc
edit:
For the expected result: let's say I only wanted to select the top 2, since I only gave 5 entries here. I would want to get back the following.
5 |AMS |DEV |APP |2 |RedHat |7.2 |4 |4 |50 |AMS-DEV-App |
----------------------------------------------------------------------------------------------------
7 |SBI |TST |Oracle |1 |Solaris|11 |4 |8 |100 |SBI-TST-Oracle|
Because I get only one row per EnvName, and each row has the highest id for its EnvName.
Use row_number() to get the latest row for each EnvName, then take the top 5 ordered by id descending:
select top 5 *
from (
select *
, rn = row_number() over (partition by EnvName order by id desc)
from appModelFlat
) s
where rn = 1
order by id desc
top with ties version:
select top 5 *
from (
select top 1 with ties *
from appModelFlat
order by row_number() over (partition by EnvName order by id desc)
) s
order by id desc
A simple subquery would also do the trick:
SELECT TOP 5 Records.Id, AppName, Records.EnvName, ServerTypeName, ServerId, OS
FROM AppModelFlat Records
INNER JOIN (SELECT EnvName,
MAX(Id) as Id
FROM AppModelFlat
GROUP BY EnvName) Latest ON Records.Id = Latest.Id
ORDER BY Records.Id DESC
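The MAX(id)-per-EnvName idea can be tried against the four sample rows with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite stands in for SQL Server, so it uses LIMIT instead of TOP, the table is trimmed to three columns, and the inner query groups by EnvName so that it returns one id per environment.

```python
import sqlite3

# Trimmed copy of the sample table: only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appModelFlat (id INTEGER, AppName TEXT, EnvName TEXT)")
conn.executemany("INSERT INTO appModelFlat VALUES (?, ?, ?)", [
    (1, "ASB", "DEV"),
    (5, "AMS", "DEV"),
    (6, "SPB", "TST"),
    (7, "SBI", "TST"),
])

# Newest row per EnvName (highest id), then the five newest of those.
latest = conn.execute("""
    SELECT r.id, r.AppName, r.EnvName
    FROM appModelFlat AS r
    JOIN (SELECT EnvName, MAX(id) AS id
          FROM appModelFlat
          GROUP BY EnvName) AS newest
      ON r.id = newest.id
    ORDER BY r.id DESC
    LIMIT 5
""").fetchall()

print(latest)  # [(7, 'SBI', 'TST'), (5, 'AMS', 'DEV')]
```

With only two distinct environments in the sample, the result has two rows, the same ids (5 and 7) the question lists as the expected result.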

SQL select all ids with the max value, then count how many rows are associated with each id

Suppose I have the following table HASCO:
Table HASCO
+--------+------+-----+
|PID |Amount|Date |
+--------+------+-----+
|1 |1000 |Date1|
+--------+------+-----+
|1 |8000 |Date2|
+--------+------+-----+
|2 |8000 |Date3|
+--------+------+-----+
|2 |3000 |Date4|
+--------+------+-----+
|2 |4000 |Date5|
+--------+------+-----+
|3 |4000 |Date6|
+--------+------+-----+
I want to get the following result:
+--------+--------+
|PID |numTours|
+--------+--------+
|1 |2 |
+--------+--------+
|2 |3 |
+--------+--------+
PIDs 1 and 2 both have the maximum amount, 8000; PID 1 has two rows and PID 2 has three rows.
I tried the following query:
SELECT HASCO.PID, COUNT(*) AS numTour
FROM HASCO
GROUP BY HASCO.PID
HAVING HASCO.PID IN
(
SELECT HASCO.PID
FROM HASCO
WHERE HASCO.AMOUNT = (SELECT MAX(HASCO.AMOUNT) FROM HASCO)
)
This works on DB2, but is there a better way to do it?
The subquery in the HAVING clause can be simplified to:
SELECT HASCO.PID, COUNT(*) AS numTour
FROM HASCO
GROUP BY HASCO.PID
HAVING max(HASCO.AMOUNT) = (SELECT MAX(HASCO.AMOUNT) FROM HASCO)
If DB2 supports windowed aggregate functions, then:
Select PID,count(1)
From
(
Select HASCO.PID,
Max(AMOUNT)Over() as Max_amount,
Max(AMOUNT) Over(Partition by PID) as Max_Pid_Amt
From HASCO
) A
Where Max_amount = Max_Pid_Amt
Group by PID
Solution 1
WITH MAXIVALUE AS (SELECT MAX(HASCO.AMOUNT) maxi FROM HASCO)
SELECT f1.PID, COUNT(*) AS numTours
FROM HASCO f1
GROUP BY f1.PID
HAVING max(f1.AMOUNT) = (SELECT maxi FROM MAXIVALUE)
Solution 2
with hasmaxi as (
select distinct f1.pid from HASCO f1
where exists
(select 1 from HASCO f2 having max(f2.AMOUNT)=f1.AMOUNT)
)
SELECT f3.PID, COUNT(*) AS numTours
FROM HASCO f3 inner join hasmaxi f4 on f3.PID=f4.PID
GROUP BY f3.PID
Solution 3
SELECT f3.PID, COUNT(*) AS numTours
FROM HASCO f3 inner join
(
select distinct f1.pid from HASCO f1
where exists
(select 1 from HASCO f2 having max(f2.AMOUNT)=f1.AMOUNT)
) f4 on f3.PID=f4.PID
GROUP BY f3.PID
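As a quick check of the HAVING MAX(...) form against the sample HASCO rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for DB2 here, which is an assumption; the query text itself is the simplified HAVING version from the answers.

```python
import sqlite3

# Sample HASCO rows from the question in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HASCO (PID INTEGER, Amount INTEGER, Date TEXT)")
conn.executemany("INSERT INTO HASCO VALUES (?, ?, ?)", [
    (1, 1000, "Date1"), (1, 8000, "Date2"),
    (2, 8000, "Date3"), (2, 3000, "Date4"), (2, 4000, "Date5"),
    (3, 4000, "Date6"),
])

# Keep only the PIDs whose own maximum equals the global maximum,
# then count all rows belonging to each of those PIDs.
tours = conn.execute("""
    SELECT PID, COUNT(*) AS numTours
    FROM HASCO
    GROUP BY PID
    HAVING MAX(Amount) = (SELECT MAX(Amount) FROM HASCO)
    ORDER BY PID
""").fetchall()

print(tours)  # [(1, 2), (2, 3)]
```

PID 3 drops out because its highest amount (4000) is below the global maximum of 8000.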

How to find rows with the sequence of values in a column using SQL?

Consider the example table name "Person".
Name |Date |Work_Hours
---------------------------
John| 22/1/13 |0
John| 23/1/13 |0
Joseph| 22/1/13 |1
Joseph| 23/1/13 |1
Johnny| 22/1/13 |0
Johnny| 23/1/13 |0
Jim| 22/1/13 |1
Jim| 23/1/13 |0
In the above table, I have to find rows with the sequence of '0' followed by '1' in the column Work_Hours. Please share the idea/Query to do it.
The output I need is
Name |Date |Work_Hours
---------------------------
John| 23/1/13 |0
Joseph| 22/1/13 |1
Johnny| 23/1/13 |0
Jim| 22/1/13 |1
To look into previous or following records, you would usually use the analytic (window) functions LAG and LEAD:
select first_name, work_date, work_hours
from
(
select first_name, work_date, work_hours
, lag(work_hours) over (order by first_name, work_date) as prev_work_hours
, lead(work_hours) over (order by first_name, work_date) as next_work_hours
from person
)
where (work_hours = 0 and next_work_hours = 1) or (work_hours = 1 and prev_work_hours = 0)
order by first_name, work_date;
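The LAG/LEAD condition can be illustrated outside the database with a short plain-Python sketch. It walks the rows in the order they are listed in the question, which is an assumption: the query above orders by name and date instead, so it pairs up different neighbours. zero_then_one is a hypothetical function name.

```python
# Rows in the order shown in the question: (name, date, work_hours).
rows = [
    ("John", "22/1/13", 0), ("John", "23/1/13", 0),
    ("Joseph", "22/1/13", 1), ("Joseph", "23/1/13", 1),
    ("Johnny", "22/1/13", 0), ("Johnny", "23/1/13", 0),
    ("Jim", "22/1/13", 1), ("Jim", "23/1/13", 0),
]


def zero_then_one(rows):
    """Return rows that are part of a 0 -> 1 step between adjacent rows."""
    picked = []
    for i, row in enumerate(rows):
        hours = row[2]
        prev_hours = rows[i - 1][2] if i > 0 else None              # like LAG
        next_hours = rows[i + 1][2] if i + 1 < len(rows) else None  # like LEAD
        if (hours == 0 and next_hours == 1) or (hours == 1 and prev_hours == 0):
            picked.append(row)
    return picked


for row in zero_then_one(rows):
    print(row)
```

In this row order the sketch returns John 23/1/13, Joseph 22/1/13, Johnny 23/1/13 and Jim 22/1/13, the four rows the question asks for.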
Something like
select no_hours.Name, no_hours.Date, some_hours.Date
From Person no_hours
inner join Person some_hours
On no_hours.Name = some_hours.name and some_hours.Date > no_hours.date
Where no_hours.work_hours = 0 and some_hours.work_hours = 1
would be a start.
Needless to say, Name is not a good unique identifier.
Also, work hours going from 0 to 1 to 0 would appear, and 0 to 1 to 0 to 1 would appear several times.
Use >= no_hours.Date instead if you can go from 0 to 1 on the same day.
Perhaps:
SELECT p1.Name,
p1.Date AS Date_1,
p2.Date AS Date_2,
p1.Work_Hours As Work_Hours_1,
p2.Work_Hours As Work_Hours_2
FROM Person p1
INNER JOIN Person p2
on p1.Name=p2.Name
AND p1.Work_Hours=0
AND p2.Work_Hours=1
ORDER BY p1.Name,p1.Date,p2.Date,Work_Hours_1,Work_Hours_2
Your problem (as phrased) is equivalent to asking: Is there a 1 that follows any given row with a 0 for a name?
You can do this with a correlated subquery:
select Name, Date, Work_Hours
from (select t.*,
(select min(date)
from Person t2
where t2.name = t.name and t2.date > t.date and t2.Work_Hours = 1
) as DateOfLater1
from Person t
) t
where (DateOfLater1 is not null and work_hours = 0) or
(DateOfLater1 = date and work_hours = 1);

SQL CTE compare rows in the same table

I have a table with customers from different data sources. There are SSN, License#, and some unique IDs but not all sources have the same IDs. I would like to compare the records on the ID columns (SSN, License, SystemID) and assign a mapped ID if same person found.
I am assuming I can use CTE but not sure where to start. Still trying to learn my way in SQL. Any help will be appreciated. Thanks.
This is how the table looks:
Source|RowID|SSN |License|SystemID
A |1 |SSN1|Lic111 |
A |2 | | |Sys666
B |3 |SSN2| |Sys777
C |4 |SSN1| |
D |5 | |Lic999 |
D |6 | |Lic333 |Sys666
E |7 | | |Sys777
Results (added MapCustomerID)
Source|RowID|SSN |License|SystemID|MapCustomerID
A |1 |SSN1|Lic111 | |1
A |2 | | |Sys666 |2
B |3 |SSN2| |Sys777 |3
C |4 |SSN1| | |1
D |5 | |Lic999 | |4
D |6 | |Lic333 |Sys666 |2
E |7 | | |Sys777 |3
Here is what may be a "good-enough" approach to the problem.
Along each of the three dimensions, find the minimum row id for that dimension (with special handling of NULLs). The overall customer identifier is then the minimum of these three ids. To make it sequential with no gaps, use dense_rank().
with ids as (
select t.*,
(case when SSN is not null
then min(RowId) over (partition by SSN)
end) as SSN_id,
(case when License is not null
then min(RowId) over (partition by License)
end) as License_id,
(case when SystemId is not null
then min(RowId) over (partition by SystemId)
end)as SystemId_id
from t
),
leastid as (
select ids.*,
(case when SSN_Id <= coalesce(License_Id, SSN_Id) and
SSN_Id <= coalesce(SystemId_id, SSN_Id)
then SSN_Id
when License_Id <= coalesce(SystemId_id, License_Id)
then License_Id
else SystemId_id
end) as LeastId
from ids
)
select Source, RowID, SSN, License, SystemID,
dense_rank() over (order by LeastId) as MapCustomerId
from leastid;
This is not a complete solution, but it works for your data. It does not work in the following case:
A |1 |SSN1|Lic111 | |1
A |2 |SSN1| |Sys666 |2
A |3 | | |Sys666 |2
Because this requires two "hops".
When I have faced this situation in the past, I have created the extra column in the table and repeatedly used update to get the minimum id over the different dimensions. Such iteration quickly connects the different pieces. It is probably possible to write a recursive CTE to do the same thing. But, the simpler solution above may solve your problem.
EDIT:
Because I've faced this problem before, I wanted to come up with a single query solution (rather than iterating through updates). This is possible using recursive CTEs. Here is code that seems to work:
with t as (
select 'A' as source, 1 as RowId, 'SSN1' as SSN, 'Lic111' as License, 'ABC' as SystemId union all
select 'A', 2, 'SSN1', NULL, 'Sys666' union all
select 'A', 3, NULL, NULL, 'Sys666' union all
select 'A', 4, NULL, 'Lic222', 'Sys666' union all
select 'A', 5, NULL, 'Lic222', NULL union all
select 'A', 6, NULL, 'Lic444', NULL
),
first as (
select t.*,
(select min(RowId)
from t t2
where t2.SSN = t.SSN or
t2.License = t.License or
t2.SystemId = t.SystemId
) as minrowid
from t
),
cte as (
select rowid, minrowid
from first
union all
select cte.rowid, first.minrowid
from cte join
first
on cte.minrowid = first.rowid and
cte.minrowid > first.minrowid
),
lookup as (
select rowid, min(minrowid) as minrowid,
dense_rank() over (order by min(minrowid)) as MapCustomerId
from cte
group by rowid
)
select t.*, lookup.MapCustomerId
from t join
lookup
on t.rowid = lookup.rowid;
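For comparison, the same multi-hop grouping can be sketched outside SQL with a union-find (disjoint-set) structure. This is a different technique from the recursive CTE above, not a translation of it; map_customers and the tuple layout are hypothetical, and the rows are the six test rows from the query.

```python
def map_customers(rows):
    """rows: (row_id, ssn, license, system_id) tuples; None means no value.

    Returns {row_id: MapCustomerId}, numbering groups by their smallest row id.
    """
    parent = {row_id: row_id for row_id, *_ in rows}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller row id as the root of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (id kind, value) -> first row id carrying that value
    for row_id, *ids in rows:
        for kind, value in zip(("ssn", "license", "system"), ids):
            if value is None:
                continue
            key = (kind, value)
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    roots = sorted({find(r) for r in parent})
    rank = {root: i + 1 for i, root in enumerate(roots)}  # dense, gap-free ids
    return {r: rank[find(r)] for r in parent}


rows = [
    (1, "SSN1", "Lic111", "ABC"),
    (2, "SSN1", None, "Sys666"),
    (3, None, None, "Sys666"),
    (4, None, "Lic222", "Sys666"),
    (5, None, "Lic222", None),
    (6, None, "Lic444", None),
]
print(map_customers(rows))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}
```

Rows 1 to 5 end up in one group because each shares an SSN, license, or system id with another row in the chain, however many hops that takes; row 6 shares nothing and gets its own id.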