Calculate Distinct count when joining two tables


id1 id2 year State Gender
==== ====== ====== ===== =======
1 A 2008 ca M
1 B 2008 ca M
3 A 2009 ny F
3 A 2008 ny F
4 A 2009 tx F
select
state, gender, [year],
count (distinct(cast(id1 as varchar(10)) + id2))
from
tabl1
group by state, gender, [year]
I can get the distinct count state-wise with this query.
Now I need the distinct count city-wise; for example, CA has 3 cities: sfo, la, sanjose. I have a lookup table that maps each city to its state.
table2 - city
====
cityid name
==== ====
1 sfo
2 la
3 sanjose
table 3 - state
====
stateid name
==== ====
1 CA
2 Az
table 4 lookup state city
====
pk_cityId pk_state_id
1 1
2 1
select state,city,gender, [year],
count (distinct(cast(id1 as varchar(10)) + id2))
from
tabl1 p
group by state, gender, [year],city
This query finds the city and state names:
select c.city,s.state from city_state sc
inner join (select * from state)s on sc.state_id = s.state_id
inner join (select * from city)c on sc.city_id = c.city_id
I wrote a similar query using the lookup table, but the distinct count is computed across the whole state and the same number is repeated for each city in that state.
Example: if the count for CA is 10, the counts for its cities should be something like la - 5, sanjose - 4, sfo - 1.
But with my query I get sfo - 10, la - 10, sanjose - 10. I couldn't find the count at the lower level. Any help would be appreciated.
UPDATE:
I have updated the query and the lookup tables.

Your implied schema seems to have a flaw:
You're trying to get city-level aggregates, but you are joining your data table (table1) to your city table (table2) based on the state. This causes EVERY city in the same state to get the same aggregate values; in your case, all California cities have a count of 10.
Can you provide actual DDL statements for your two tables? Perhaps you have other columns there (city_id?) that might provide the necessary data for you to correct your query.
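To illustrate the point, here is a minimal sketch using Python's sqlite3 module as a stand-in for the asker's database. The city_id column on tabl1, the state column on the city table, and the three data rows are assumptions made up for the demonstration; the question does not show them. Joining on the state repeats the state-wide count for every city, while joining on a city key gives per-city counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- city_id is a hypothetical column; the question's table does not show one
CREATE TABLE tabl1 (id1 INTEGER, id2 TEXT, state TEXT, city_id INTEGER);
INSERT INTO tabl1 VALUES
  (1, 'A', 'CA', 1),
  (1, 'B', 'CA', 2),
  (2, 'A', 'CA', 2);

-- simplified lookup: each city row carries its state directly
CREATE TABLE city (city_id INTEGER, name TEXT, state TEXT);
INSERT INTO city VALUES (1, 'sfo', 'CA'), (2, 'la', 'CA'), (3, 'sanjose', 'CA');
""")

count_sql = """
SELECT c.name, COUNT(DISTINCT CAST(t.id1 AS TEXT) || t.id2)
FROM tabl1 t
JOIN city c ON {condition}
GROUP BY c.name
ORDER BY c.name
"""

# Joining on state: every fact row matches every city in that state
by_state = conn.execute(count_sql.format(condition="c.state = t.state")).fetchall()

# Joining on a city key: each fact row matches exactly one city
by_city = conn.execute(count_sql.format(condition="c.city_id = t.city_id")).fetchall()

print(by_state)  # [('la', 3), ('sanjose', 3), ('sfo', 3)]
print(by_city)   # [('la', 2), ('sfo', 1)]
```

Without some column on the data table that identifies the city, there is nothing to split the state total by, which is why the DDL matters.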

I think you need something like the following, but can't be sure w/o further information:
;WITH DistinctState AS
(
SELECT DISTINCT
id1
, id2
, [year]
, [State]
, Gender
FROM tab1
)
SELECT s.state
, c.city
, gender
, [year]
, count(*)
FROM DistinctState s
INNER JOIN
tab2 c
ON s.id1 = c.id1
AND s.id2 = c.id2
GROUP BY
s.state
, c.city
, gender
, [year]

Related

Sql to get all cities in a country given a tree type database

I have been given a table like this
Id Name Type ParentId
1 US country -1
2 NY state 1
3 NYC city 2
4 Yonkers city 2
5 Washington state 1
6 Seattle city 5
7 Tacoma city 5
8 Canada country -1
9 Manitoba state 8
I want to write a SQL query that lists all the cities in a state.
Example
Country state city
US NY NYC
US NY Yonkers
I understand that I need to write a recursive query, but I am not able to do so. I need help writing the SQL for this.

You can use a recursive common table expression:
with recursive cte as (
select id, name, type, parentid
from the_table
where type = 'state'
and name = 'NY'
union all
select c.id, c.name, c.type, c.parentid
from the_table c
join cte p on p.id = c.parentid
)
select *
from cte
where type <> 'state';
The above is standard ANSI SQL, but not all database products support this exact syntax (SQL Server, for example, does not accept the recursive keyword and uses a plain with).
If the number of levels is fixed (so it's always Country -> State -> City) and will never change, you can use a simpler query:
select c.*
from the_table c
where parentid in (select s.id
from the_table s
where s.type = 'state'
and s.name = 'NY');
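As a quick check of the recursive CTE, here is a small sketch that loads the question's sample rows into an in-memory SQLite database (used here only because it ships with Python and accepts the with recursive syntax; the asker's actual database is not stated). The table name the_table follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT, type TEXT, parentid INTEGER);
INSERT INTO the_table VALUES
  (1, 'US',         'country', -1),
  (2, 'NY',         'state',    1),
  (3, 'NYC',        'city',     2),
  (4, 'Yonkers',    'city',     2),
  (5, 'Washington', 'state',    1),
  (6, 'Seattle',    'city',     5),
  (7, 'Tacoma',     'city',     5),
  (8, 'Canada',     'country', -1),
  (9, 'Manitoba',   'state',    8);
""")

ny_cities = conn.execute("""
WITH RECURSIVE cte AS (
  -- anchor: the state we start from
  SELECT id, name, type, parentid
  FROM the_table
  WHERE type = 'state' AND name = 'NY'
  UNION ALL
  -- recursive step: children of anything already found
  SELECT c.id, c.name, c.type, c.parentid
  FROM the_table c
  JOIN cte p ON p.id = c.parentid
)
SELECT name FROM cte WHERE type <> 'state' ORDER BY name
""").fetchall()

print(ny_cities)  # [('NYC',), ('Yonkers',)]
```

The recursive form keeps working if more levels (say, districts under cities) are added later, which is the main reason to prefer it over a fixed-depth query.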

SELECT t1.name country, t2.name state, t3.name city
FROM the_table t1
JOIN the_table t2 ON t1.id = t2.parentid
JOIN the_table t3 ON t2.id = t3.parentid
WHERE t2.name = 'NY';

How to join tables on column containing text

I have 2 tables to merge:
t1
Continent Country City
-----------------------
Europe Germany Munich
NA Canada Ontario
Asia Singapore (blank)
Asia Japan Tokyo
AND
t2
Country Status
-----------------
Germany Complete
Canada Incomplete
Singapore Complete
Japan Complete
I want to get the continent with 2nd highest "Complete" status. I am new to SQL and I am trying hard to learn the basics, but I cannot get this done.

I understand that you want the continent with the second-highest number of countries marked as complete.
If so, you can join, aggregate, order by the count of completed countries per continent, and then keep only the second row:
select continent
from
(select distinct country, continent from t1) t1
inner join t2 on t1.country = t2.country
group by continent
order by sum(case when status = 'Complete' then 1 else 0 end) desc
limit 1, 1
Note the use of distinct when retrieving the association of countries and continents: your sample data looks like it could have more than one row per country/continent tuple (since t1 also lists cities). Without the distinct, we would potentially generate duplicate rows, causing sum() to be wrong.
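Here is a minimal sketch of that approach run against the question's sample data in an in-memory SQLite database. SQLite is only a stand-in so the example is runnable; it spells the paging clause as LIMIT 1 OFFSET 1, which is equivalent to MySQL's limit 1, 1. The completed alias is added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (continent TEXT, country TEXT, city TEXT);
INSERT INTO t1 VALUES
  ('Europe', 'Germany',   'Munich'),
  ('NA',     'Canada',    'Ontario'),
  ('Asia',   'Singapore', NULL),
  ('Asia',   'Japan',     'Tokyo');

CREATE TABLE t2 (country TEXT, status TEXT);
INSERT INTO t2 VALUES
  ('Germany',   'Complete'),
  ('Canada',    'Incomplete'),
  ('Singapore', 'Complete'),
  ('Japan',     'Complete');
""")

second = conn.execute("""
SELECT c.continent,
       SUM(CASE WHEN t2.status = 'Complete' THEN 1 ELSE 0 END) AS completed
FROM (SELECT DISTINCT country, continent FROM t1) c  -- one row per country
JOIN t2 ON t2.country = c.country
GROUP BY c.continent
ORDER BY completed DESC
LIMIT 1 OFFSET 1                                     -- skip the top row, keep the next
""").fetchall()

print(second)  # [('Europe', 1)]
```

Asia has 2 completed countries, Europe 1 and NA 0, so Europe comes second. If two continents tie, which one lands in second place is not defined by this query; a ranking function such as DENSE_RANK() would be needed to handle ties deliberately.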

If you instead mean the country ranked second by the number of cities with status Complete, you can use sub-queries:
with a as
(
select a.country,
sum(case when status = 'Complete' then 1 else 0 end) as CompleteCount
from t1 a inner join t2 b on a.country = b.country
group by a.country
)
select country from
(
select country,
ROW_NUMBER() OVER( ORDER BY CompleteCount desc) as OrderComplete
from a
)a where OrderComplete = 2

T-SQL Address Table with the same Company need latest Contact

I have an address table with primary and secondary company locations, for example:
ADDRESSES:
ID CompanyName AdressType MainID Location
1 ExampleCompany H 0 Germany
2 ExampleCompany N 1 Sweden
3 ExampleCompany N 1 Germany
And we have a Contacts table containing the latest contact for each of the company locations:
Contacts
ID SuperID Datecreate Notes
1 1 10.04.2018 XY
2 3 09.04.2018 YX
3 2 11.04.2018 XX
Now we want to select the latest contact per company and sort them, so that we get a list of all the customers we have not contacted in a long time.
I thought about something like this:
SELECT
ADDRH.ID,
ADDRH.COMPANY1,
TOPCONT.ID,
TOPCONT.DATECREATE,
TOPCONT.NOTES0
FROM dbo.ADDRESSES ADDRH
OUTER APPLY (SELECT TOP 1 ID, SUPERID, DATECREATE, CREATEDBY, NOTES0 FROM DBO.CONTACTS CONT WHERE ADDRH.ID = CONT.SUPERID ORDER BY DATECREATE DESC) TOPCONT
WHERE
TOPCONT.ID IS NOT NULL
ORDER BY TOPCONT.DATECREATE
But this still ignores the fact that the same company appears multiple times in the addresses table. How can I create a list with each company and its latest contact?
Thanks for your help
Greetings

Well, you have to remove duplicates from address as well. Because of the structure of your data, I think the best approach is to use row_number():
SELECT ac.*
-- the two ID columns need distinct aliases inside a derived table
FROM (SELECT a.ID AS AddressID, a.COMPANY1, c.ID AS ContactID, c.DATECREATE, c.NOTES0,
ROW_NUMBER() OVER (PARTITION BY a.COMPANY1 ORDER BY c.DATECREATE DESC) as seqnum
FROM dbo.ADDRESSES a JOIN
DBO.CONTACTS c
ON a.ID = c.SUPERID
WHERE c.ID IS NOT NULL
) ac
WHERE seqnum = 1
ORDER BY ac.DATECREATE;
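A small sketch of the row_number() idea, run against the question's sample rows in an in-memory SQLite database (a stand-in for SQL Server; window functions need SQLite 3.25 or newer). The snake_case column names and the ISO date strings are assumptions chosen for this example so that the dates sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (id INTEGER, company TEXT, address_type TEXT, main_id INTEGER, location TEXT);
INSERT INTO addresses VALUES
  (1, 'ExampleCompany', 'H', 0, 'Germany'),
  (2, 'ExampleCompany', 'N', 1, 'Sweden'),
  (3, 'ExampleCompany', 'N', 1, 'Germany');

CREATE TABLE contacts (id INTEGER, super_id INTEGER, date_create TEXT, notes TEXT);
INSERT INTO contacts VALUES
  (1, 1, '2018-04-10', 'XY'),
  (2, 3, '2018-04-09', 'YX'),
  (3, 2, '2018-04-11', 'XX');
""")

latest = conn.execute("""
SELECT company, contact_id, date_create, notes
FROM (
  SELECT a.company,
         c.id AS contact_id,
         c.date_create,
         c.notes,
         -- number each company's contacts, newest first, across all its locations
         ROW_NUMBER() OVER (PARTITION BY a.company ORDER BY c.date_create DESC) AS seqnum
  FROM addresses a
  JOIN contacts c ON c.super_id = a.id
) ac
WHERE seqnum = 1
ORDER BY date_create   -- longest-uncontacted companies first
""").fetchall()

print(latest)  # [('ExampleCompany', 3, '2018-04-11', 'XX')]
```

Partitioning by the company name rather than by the address ID is what collapses the three locations into one row.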

How to compare two different tables based on two different Columns in SQL Server?

The first table consists of Accounts with State and Country information, which is mostly correct apart from a few wrong rows:
ID NAME State Country
--------------------------------------------------
1 Account 1 NJ USA
2 Account 2 NY NULL
3 Account 3 Beijing Japan
And I have the second table, which has the correct State and Country information against which the first table needs to be compared:
State_Code State Country_Code Country
-------------------------------------------------------
01 NJ A01 USA
02 NY A01 USA
03 Beijing c01 China
The query should check if the state in the first table exists in the second table and if it does, is it associated with the correct country and the result would be a table of rows with wrong info:
So in my example, the comparison should give me the result:
ID NAME State Country
------------------------------------------------------
2 Account 2 NY NULL
3 Account 3 Beijing Japan
I am a beginner trying to learn more about SQL. I tried solving this using a left join and an outer join, neither of which gave me the correct result. I would be very grateful if someone could point me in the right direction or give me an example of how I should approach this.
(I am using Microsoft SQL Server Management Studio)

Please try this. You can change the join condition if you need any changes.
Data
CREATE TABLE firstTable
(
ID INT
,NAME VARCHAR(10)
,State VARCHAR(10)
,Country VARCHAR(10)
)
GO
INSERT INTO firstTable VALUES
(1 ,'Account 1','NJ','USA'),
(2 ,'Account 2','NY',NULL),
(3 ,'Account 3','Beijing','Japan')
GO
CREATE TABLE SecondTable
(
State_Code VARCHAR(10)
,State VARCHAR(10)
,Country_Code VARCHAR(10)
,Country VARCHAR(10)
)
GO
INSERT INTO SecondTable VALUES
('01','NJ' ,'A01','USA'),
('02','NY' ,'A01','USA'),
('03','Beijing' ,'c01','China')
GO
SOLUTION
select f.* from firstTable f
FULL JOIN SecondTable s
ON f.State = s.State and f.Country = s.Country
WHERE f.State IS NOT NULL AND ( s.Country_Code IS NULL OR s.State IS NULL )
OUTPUT
ID NAME State Country
----------- ---------- ---------- ----------
2 Account 2 NY NULL
3 Account 3 Beijing Japan
(2 rows affected)

You want to join the two tables on the State and look for records where the Country doesn't match. Because a NULL never compares as unequal, rows with a missing Country have to be checked explicitly. This query should get you there:
SELECT t1.*, t2.Country AS Expected
FROM table1 t1
JOIN table2 t2 ON t1.State = t2.State
WHERE t1.Country != t2.Country
OR t1.Country IS NULL
Unfortunately, I don't know your table names, so I just had to go with table1 and table2, but hopefully this gives you what you need. I also added in the expected Country, but you can remove that if you don't need it.

I think what you're looking for is "NOT EXISTS". Basically you look for any state/country combos in the first table that don't exist in the second table. Here's an example.
SELECT tbo.ID, tbo.NAME, tbo.STATE, tbo.COUNTRY
FROM TableOne tbo
WHERE NOT EXISTS(
SELECT * FROM TableTwo tbt
WHERE tbo.State = tbt.State
AND tbo.Country = tbt.Country
)
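To see how this behaves on the sample data, including the row with a NULL country, here is a minimal sketch using an in-memory SQLite database as a stand-in for SQL Server. The table names accounts and states are placeholders chosen for the example. It also runs a plain inequality comparison for contrast, which silently drops the NULL row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER, name TEXT, state TEXT, country TEXT);
INSERT INTO accounts VALUES
  (1, 'Account 1', 'NJ',      'USA'),
  (2, 'Account 2', 'NY',      NULL),
  (3, 'Account 3', 'Beijing', 'Japan');

CREATE TABLE states (state_code TEXT, state TEXT, country_code TEXT, country TEXT);
INSERT INTO states VALUES
  ('01', 'NJ',      'A01', 'USA'),
  ('02', 'NY',      'A01', 'USA'),
  ('03', 'Beijing', 'c01', 'China');
""")

# NOT EXISTS: keep accounts with no matching (state, country) pair.
# A NULL country can never match, so that row is reported too.
wrong = conn.execute("""
SELECT a.id, a.name, a.state, a.country
FROM accounts a
WHERE NOT EXISTS (
  SELECT 1 FROM states s
  WHERE s.state = a.state AND s.country = a.country
)
ORDER BY a.id
""").fetchall()

# Plain inequality: NULL <> 'USA' is unknown, so account 2 is missed.
inequality_only = conn.execute("""
SELECT a.id
FROM accounts a
JOIN states s ON s.state = a.state
WHERE a.country <> s.country
""").fetchall()

print(wrong)            # [(2, 'Account 2', 'NY', None), (3, 'Account 3', 'Beijing', 'Japan')]
print(inequality_only)  # [(3,)]
```

Note that NOT EXISTS also reports accounts whose state does not appear in the reference table at all, which may or may not be what you want.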

grabbing information by joining multiple tables

This is going to be a little complicated. Let me start with my tables.
clients [src = 0]
---------
clientID code company
--------- ------- ---------
1 ABC ABC Corp
2 DEF DEF Corp
carriers [src = 1]
---------
clientID code company
--------- ------- -------
1 ABC ABC Inc.
2 JHI JHI Inc.
link
--------
contactID uID src
--------- ----- ----
1 1 0
1 1 1
1 2 0
contact info
--------------
contactID fname lname
--------- ------- --------
1 John Smith
2 Quincy Jones
So, I'm trying to search for, say, "ABC" through the link table. The link table needs to join to either the carriers or the clients table depending on the link.src column. It should find two matches, one in clients and one in carriers, but since both resolve to a contactID (link table) of 1, I should then query the contact info table and return
Found 1 record(s):
John Smith
I hope this makes sense. Any help is greatly appreciated!

Here is one approach using left join:
select distinct co.*
from link l left join
clients cl
on l.src = 0 and l.uid = cl.clientID left join
carriers ca
on l.src = 1 and l.uid = ca.clientID left join
contacts co
on l.contactid = co.contactid
where 'ABC' in (cl.code, ca.code)

Here is another approach. First, you UNION the Clients and Carriers tables and add a new column ContactType to differentiate one from the other. Use 0 for Clients and 1 for Carriers, the same as src. Then you perform a LEFT JOIN to get the desired result.
;WITH Clients(ClientID, Code, Company) AS(
SELECT 1, 'ABC', 'ABC Corp' UNION ALL
SELECT 2, 'DEF', 'DEF Corp'
)
,Carriers(ClientID, Code, Company) AS(
SELECT 1, 'ABC', 'ABC Inc.' UNION ALL
SELECT 2, 'JHI', 'JHI Inc.'
)
,Link(ContactId, UID, Src) AS(
SELECT 1, 1, 0 UNION ALL
SELECT 1, 1, 1 UNION ALL
SELECT 1, 2, 0
)
,ContactInfo(ContactID, FName, LName) AS(
SELECT 1, 'John', 'Smith' UNION ALL
SELECT 2, 'Quincy', 'Jones'
)
-- START
,Contact(ContactID, ContactType, Code, Company) AS(
SELECT
ClientID, 0, Code, Company
FROM Clients
UNION ALL
SELECT
ClientID, 1, Code, Company
FROM Carriers
)
SELECT DISTINCT
ci.FName,
ci.LName
FROM Link l
LEFT JOIN Contact c
ON c.ContactID = l.UID
AND c.ContactType = l.src
LEFT JOIN ContactInfo ci
ON ci.ContactID = l.ContactId -- the link row, not the company row, identifies the contact
WHERE
c.Code = 'ABC'
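The UNION idea can be tried end to end with the question's sample data. This is only a sketch on an in-memory SQLite database; the snake_case table and column names and the find_contacts helper are assumptions made for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients  (client_id INTEGER, code TEXT, company TEXT);
INSERT INTO clients  VALUES (1, 'ABC', 'ABC Corp'), (2, 'DEF', 'DEF Corp');
CREATE TABLE carriers (client_id INTEGER, code TEXT, company TEXT);
INSERT INTO carriers VALUES (1, 'ABC', 'ABC Inc.'), (2, 'JHI', 'JHI Inc.');
CREATE TABLE link (contact_id INTEGER, uid INTEGER, src INTEGER);
INSERT INTO link VALUES (1, 1, 0), (1, 1, 1), (1, 2, 0);
CREATE TABLE contact_info (contact_id INTEGER, fname TEXT, lname TEXT);
INSERT INTO contact_info VALUES (1, 'John', 'Smith'), (2, 'Quincy', 'Jones');
""")

def find_contacts(code):
    """Return the distinct contacts linked to any client or carrier with this code."""
    return conn.execute("""
    WITH company AS (
      -- tag each row with the same 0/1 value that link.src uses
      SELECT client_id AS company_id, 0 AS src, code FROM clients
      UNION ALL
      SELECT client_id, 1, code FROM carriers
    )
    SELECT DISTINCT ci.fname, ci.lname
    FROM link l
    JOIN company c       ON c.company_id = l.uid AND c.src = l.src
    JOIN contact_info ci ON ci.contact_id = l.contact_id
    WHERE c.code = ?
    ORDER BY ci.lname, ci.fname
    """, (code,)).fetchall()

print(find_contacts("ABC"))  # [('John', 'Smith')]
print(find_contacts("JHI"))  # []
```

Searching for ABC matches both a client and a carrier, but DISTINCT collapses them to the single contact John Smith. JHI returns nothing because no link row points at that carrier.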

Look at this from a modeling point of view. You have two tables with the same type of data in each one, the entity company. The only difference between them is their role or relationship to your company. So why not keep them all in the same bucket?
Companies:
ID code Name
-- ---- ---------
1 ABC ABC Corp
2 DEF DEF Corp
3 JHI JHI Inc.
If a particular company could only be a client or a carrier, that designation could be placed in the Companies table. Since obviously one company can be both, the designation goes into a separate table. The following shows that company 1, 'ABC', is both a client ('L') and carrier ('R'), company 2 is only a client and company 3 is only a carrier.
CompanyRoles:
CompanyID Type
--------- ----
1 'L'
1 'R'
2 'L'
3 'R'
There is no need to keep multiple copies of the same data just because a company can play multiple roles. If there is role-dependent data, data that is maintained on client but not for carriers, or vice versa, then subtables can keep that.
As for the contacts, if a company has one contact no matter the role, the contact reference can be added to the Companies table. If the contact is role dependent, it is added to the CompanyRoles table.
CompanyRoles:
CompanyID Type ContactID
--------- ---- ---------
1 'L' 1
1 'R' 2
2 'L' 3
3 'R' 4
Want to see a list of clients?
select c.ID as ClientID, c.Code as ClientCode, c.Name as ClientName,
ct.ContactName
from Companies c
join CompanyRoles cr
on cr.CompanyID = c.ID
and cr.Type = 'L'
left join Contacts ct -- In case no contact is currently defined
on ct.ContactID = cr.ContactID
join ClientSpecificData csd
on csd.ClientID = c.ID;
Want to see a list of carriers?
select c.ID as CarrierID, c.Code as CarrierCode, c.Name as CarrierName,
ct.ContactName
from Companies c
join CompanyRoles cr
on cr.CompanyID = c.ID
and cr.Type = 'R'
left join Contacts ct -- In case no contact is currently defined
on ct.ContactID = cr.ContactID
join CarrierSpecificData csd
on csd.ClientID = c.ID;
You can create views on the last two queries to provide a single data source for those apps that deal with only Clients or only Carriers. Triggers on the views can deal with incoming DML statements as needed to route the data to the appropriate tables.
As you can see, the queries are clean and simple. Data integrity is easy and scalability is not an issue. What more could you want?
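As a rough sketch of this model, the following builds the Companies and CompanyRoles tables from the data shown above in an in-memory SQLite database and exposes the two roles as views. The view names Clients and Carriers, the key constraints, and the omission of the contact and role-specific tables are simplifications assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, Code TEXT, Name TEXT);
INSERT INTO Companies VALUES
  (1, 'ABC', 'ABC Corp'),
  (2, 'DEF', 'DEF Corp'),
  (3, 'JHI', 'JHI Inc.');

-- one row per (company, role); 'L' = client, 'R' = carrier
CREATE TABLE CompanyRoles (
  CompanyID INTEGER REFERENCES Companies(ID),
  Type      TEXT CHECK (Type IN ('L', 'R')),
  PRIMARY KEY (CompanyID, Type)
);
INSERT INTO CompanyRoles VALUES (1, 'L'), (1, 'R'), (2, 'L'), (3, 'R');

CREATE VIEW Clients AS
  SELECT c.ID AS ClientID, c.Code AS ClientCode, c.Name AS ClientName
  FROM Companies c
  JOIN CompanyRoles cr ON cr.CompanyID = c.ID AND cr.Type = 'L';

CREATE VIEW Carriers AS
  SELECT c.ID AS CarrierID, c.Code AS CarrierCode, c.Name AS CarrierName
  FROM Companies c
  JOIN CompanyRoles cr ON cr.CompanyID = c.ID AND cr.Type = 'R';
""")

clients = conn.execute("SELECT ClientCode FROM Clients ORDER BY ClientID").fetchall()
carriers = conn.execute("SELECT CarrierCode FROM Carriers ORDER BY CarrierID").fetchall()

print(clients)   # [('ABC',), ('DEF',)]
print(carriers)  # [('ABC',), ('JHI',)]
```

Company ABC is stored once and shows up in both views, which is the point of moving the role into its own table.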
