SQL Server: Merge Data Rows in a Single Table in Output

I have a SQL Server table with the following fields and sample data:
ID Name Address Age
23052-PF Peter Timbuktu 25
23052-D1 Jane Paris 22
23052-D2 David London 24
23050-PF Sam Beijing 22
23051-PF Nancy NYC 26
23051-D1 Carson Cali 22
23056-PF Grace LA 28
23056-D1 Smith Boston 23
23056-D2 Mark Adelaide 26
23056-D3 Hose Mexico 25
23056-D4 Mandy Victoria 24
Each ID with the -PF suffix is unique in the table.
Each ID with a -Dx suffix is related to the -PF row with the same numeric prefix.
Each -PF row may have zero or more related -Dx rows, up to a maximum of 9.
For example, 11111-PF can have 11111-D1, 11111-D2, 11111-D3, up to 11111-D9.
Expected output for the sample data above:
ID        ID (without suffix)  PF_Name  PF_Address  PF_Age  D_Name  D_Address  D_Age
23052-PF  23052                Peter    Timbuktu    25      Jane    Paris      22
23052-PF  23052                Peter    Timbuktu    25      David   London     24
23050-PF  23050                Sam      Beijing     22      NULL    NULL       NULL
23051-PF  23051                Nancy    NYC         26      Carson  Cali       22
23056-PF  23056                Grace    LA          28      Smith   Boston     23
23056-PF  23056                Grace    LA          28      Mark    Adelaide   26
23056-PF  23056                Grace    LA          28      Hose    Mexico     25
23056-PF  23056                Grace    LA          28      Mandy   Victoria   24
I need to be able to join the -PF and -Dx rows as shown above.
If a -PF row has no -Dx rows, the D_Name, D_Address and D_Age columns in the output should be NULL.
If a -PF row has one or more -Dx rows, PF_Name, PF_Address and PF_Age should repeat on each output row, and D_Name, D_Address and D_Age should contain the values from each related -Dx row.
I need to use MSSQL (SQL Server).
The query should not use views or create additional tables.
Thanks for all your help!

select
    pf.ID,
    pf.IDNum,
    pf.Name    as PF_Name,
    pf.Address as PF_Address,
    pf.Age     as PF_Age,
    dx.Name    as D_Name,
    dx.Address as D_Address,
    dx.Age     as D_Age
from
(
    -- parent rows; left(ID, 5) assumes every ID starts with a 5-character number
    select ID, left(ID, 5) as IDNum, Name, Address, Age
    from mytable
    where right(ID, 3) = '-PF'
) pf
left outer join
(
    -- dependent rows (-D1 ... -D9)
    select ID, left(ID, 5) as IDNum, Name, Address, Age
    from mytable
    where right(ID, 3) != '-PF'
) dx
    on pf.IDNum = dx.IDNum
SqlFiddle demo: http://sqlfiddle.com/#!6/dfdbb/1

SELECT t1.ID, LEFT(t1.ID,5) "ID (without Suffix)",
t1.Name "PF_Name", t1.Address "PF_Address", t1.Age "PF_Age",
t2.Name "D_Name", t2.Address "D_Address", t2.Age "D_Age"
FROM PFTable t1
LEFT JOIN PFTable t2 on LEFT(t1.ID,5) = LEFT(t2.ID,5)
    AND RIGHT(t2.ID,2) <> 'PF'  -- keep the -PF row from joining to itself; must be in ON, not WHERE
WHERE RIGHT(t1.ID,2) = 'PF'
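To sanity-check the self-join, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (standing in for SQL Server, so this is not tested against MSSQL itself) and runs the join. It splits the prefix at the hyphen with SQLite's instr()/substr() instead of LEFT(ID, 5), so it does not assume a 5-character prefix; in SQL Server the equivalent would be LEFT(ID, CHARINDEX('-', ID) - 1). The table name mytable and the helper merge_rows are illustrative.

```python
import sqlite3

# Sample rows from the question: (ID, Name, Address, Age)
ROWS = [
    ("23052-PF", "Peter", "Timbuktu", 25),
    ("23052-D1", "Jane", "Paris", 22),
    ("23052-D2", "David", "London", 24),
    ("23050-PF", "Sam", "Beijing", 22),
    ("23051-PF", "Nancy", "NYC", 26),
    ("23051-D1", "Carson", "Cali", 22),
    ("23056-PF", "Grace", "LA", 28),
    ("23056-D1", "Smith", "Boston", 23),
    ("23056-D2", "Mark", "Adelaide", 26),
    ("23056-D3", "Hose", "Mexico", 25),
    ("23056-D4", "Mandy", "Victoria", 24),
]

# Self-join: every -PF row, plus each -Dx row sharing its numeric prefix.
# The "-PF" filter on dx sits in the ON clause so that a -PF row with no
# -Dx rows still appears (with NULLs) instead of matching itself.
QUERY = """
SELECT pf.ID,
       substr(pf.ID, 1, instr(pf.ID, '-') - 1) AS IDNum,
       pf.Name AS PF_Name, pf.Address AS PF_Address, pf.Age AS PF_Age,
       dx.Name AS D_Name, dx.Address AS D_Address, dx.Age AS D_Age
FROM mytable AS pf
LEFT JOIN mytable AS dx
       ON substr(dx.ID, 1, instr(dx.ID, '-') - 1)
        = substr(pf.ID, 1, instr(pf.ID, '-') - 1)
      AND dx.ID NOT LIKE '%-PF'
WHERE pf.ID LIKE '%-PF'
ORDER BY pf.ID, dx.ID
"""


def merge_rows(rows):
    """Load rows into an in-memory table and return the merged output."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (ID TEXT, Name TEXT, Address TEXT, Age INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in merge_rows(ROWS):
    print(row)
```

Moving the `-PF` exclusion into the WHERE clause instead would silently turn the LEFT JOIN into an inner join and drop Sam's row.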

Related

SPARK SQL query for match output

I have two datasets, ds1 and ds2, as below:
ds1:
CustId  Name    Street1     City
=================================
1       Ron     1 Mn strt   Hyd
2       Ashok   westend av  Delhi
3       Rajesh  5th Cross   Mumbai
4       Venki   2nd Main    NY
ds2:
Id  CustName  CustAddr1    City
=========================================
11  Ron       1 Mn Street  Hyd
12  Ron       eastend avn  Patna
13  Rajesh    2nd Main     Mumbai
14  Girish    100ft rd     BLR
15  Dinesh    60ft         Mum
16  Rajesh    1st Cross    Mumbai
I am trying to find exact matches on ds1.Name = ds2.CustName and ds1.City = ds2.City.
Output:
GrpID  Rec_Id  Count  ds1.cond         Rec_Id  Count  ds2.cond
======================================================================
1      1       1      Ron + Hyd        1001    1      Ron + Hyd
2      2       1      Rajesh + Mumbai  1002    2      Rajesh + Mumbai
How do I write a (Spark) SQL query for this?
I tried the following, joining on the name only:
final Dataset<Row> rslt = spark.sql("select * from ds1 JOIN ds2 ON ds1.Name==ds2.CustName");
but it returns m × n rows when m rows in ds1 match n rows in ds2.
This is my first time working on this. Any suggestions?
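One way to avoid the m × n blow-up is to aggregate each side down to one row per (name, city) key before joining, so each matching key produces a single row carrying both counts. The sketch below runs that idea in SQLite from Python as a stand-in for Spark; ds1/ds2 and their columns come from the question, while match_counts and the cond/ds1_count/ds2_count aliases are illustrative. The GROUP BY + JOIN shape should carry over to spark.sql, but I haven't checked the exact syntax there (string concatenation in particular may need concat()), and the Rec_Id values from the expected output are not derived here.

```python
import sqlite3

DS1 = [
    (1, "Ron", "1 Mn strt", "Hyd"),
    (2, "Ashok", "westend av", "Delhi"),
    (3, "Rajesh", "5th Cross", "Mumbai"),
    (4, "Venki", "2nd Main", "NY"),
]
DS2 = [
    (11, "Ron", "1 Mn Street", "Hyd"),
    (12, "Ron", "eastend avn", "Patna"),
    (13, "Rajesh", "2nd Main", "Mumbai"),
    (14, "Girish", "100ft rd", "BLR"),
    (15, "Dinesh", "60ft", "Mum"),
    (16, "Rajesh", "1st Cross", "Mumbai"),
]

# Collapse each dataset to one row per (name, city) first, then join the
# aggregates: a key that matches m rows in ds1 and n rows in ds2 yields
# one output row with counts m and n instead of m * n joined rows.
QUERY = """
WITH a AS (SELECT Name, City, COUNT(*) AS cnt FROM ds1 GROUP BY Name, City),
     b AS (SELECT CustName, City, COUNT(*) AS cnt FROM ds2 GROUP BY CustName, City)
SELECT a.Name || ' + ' || a.City AS cond, a.cnt AS ds1_count, b.cnt AS ds2_count
FROM a
JOIN b ON a.Name = b.CustName AND a.City = b.City
ORDER BY a.Name, a.City
"""


def match_counts():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ds1 (CustId INTEGER, Name TEXT, Street1 TEXT, City TEXT)")
        conn.execute("CREATE TABLE ds2 (Id INTEGER, CustName TEXT, CustAddr1 TEXT, City TEXT)")
        conn.executemany("INSERT INTO ds1 VALUES (?, ?, ?, ?)", DS1)
        conn.executemany("INSERT INTO ds2 VALUES (?, ?, ?, ?)", DS2)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(match_counts())
```

With the sample data this gives one row for Rajesh + Mumbai (1 in ds1, 2 in ds2) and one for Ron + Hyd (1 and 1), matching the counts in the expected output.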

Join multiple tables and pick results from most recent table

I have 4 tables. I want all the rows and columns from my first table, tbl_2021, plus the rows that are not in tbl_2021 but are present in any of the other 3 tables, based on one condition:
if an id exists in tbl_2020, tbl_2019 and tbl_2018, I need the id and its details from the most recent table, tbl_2020.
If an id is in both the 2019 and 2018 tables, I need the data from 2019, and so on. If it is in 2020 and 2018, then 2020, and so on.
If the same id is in 2021, 2020, 2019 and 2018, the data from 2021 is selected.
Also, I come from a shell scripting background and have just started with SQL, so if anyone could tell me the approach, or what I should do to put these pieces together, I would be very grateful. Thank you.
tbl_2021:
id   name   addr   location  country   contintent  gdp
123  rob    dware  texas     us        us          8
456  lilly  gwood  london    uk        uk          5
670  rick   utown  newyrok   us        us          8
490  zang   kcity  hk        hongkong  hongkong    6
tbl_2020:
id   location  name
999  ger       roger
888  bel       leslie
670  us        marie
tbl_2019:
id   location  name     data  network
999  uk        roger    xx    na
555  rus       vladmir  ux    na
879  us        marie    xx    ua
481  cn        kim
tbl_2018:
id   location  name     data  network
823  uk        roger    xx    na
555  rus       vladmir  ux    na
879  us        maria    xx    ua
670  us        marie    xy    uy
888  in        raj      xx    jo
output:
id   name     addr   location  country   contintent  gdp
123  rob      dware  texas     us        us          8
456  lilly    gwood  london    uk        uk          5
670  rick     utown  newyrok   us        us          8
490  zang     kcity  hk        hongkong  hongkong    6
999  roger           ger
888  leslie          bel
555  vladmir         rus
879  marie           us
481  kim             cn
823  roger           uk
First, you should fix your data model. It is not a good idea to store such data in separate tables. Instead, you should store in a single table with a year column.
Second, I think you can solve your problem using full join, but it is a little tricky:
select coalesce(t21.id, t20.id, t19.id, t18.id) as id,
coalesce(t21.name, t20.name, t19.name, t18.name) as name,
t21.addr,
. . .
from tbl_2021 t21 full join
tbl_2020 t20
on t21.id = t20.id full join
tbl_2019 t19
on t19.id = coalesce(t21.id, t20.id) full join
tbl_2018 t18
on t18.id = coalesce(t21.id, t20.id, t19.id);
You need to carefully figure out how the columns should be pulled from the different tables.
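First, combine the rows from all four tables with UNION ALL, tagging each row with a priority (sl = 1 for the newest table through 4 for the oldest). Then use ROW_NUMBER() to number the rows for each id from newest to oldest. Finally, keep only the first row for each id, i.e. the one from the most recent year.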
with cte as
(
    select id, name, addr, location, country, contintent, gdp,
           row_number() over (partition by id order by sl) as rn
    from
    (
        -- sl ranks the sources: 1 = tbl_2021 (newest) ... 4 = tbl_2018 (oldest)
        select id, name, addr, location, country, contintent, gdp, 1 as sl from tbl_2021
        union all
        -- columns that the older tables do not have are filled with NULL
        select id, name, null, location, null, null, null, 2 from tbl_2020
        union all
        select id, name, null, location, null, null, null, 3 from tbl_2019
        union all
        select id, name, null, location, null, null, null, 4 from tbl_2018
    ) t
)
-- keep only the most recent row for each id
select id, name, addr, location, country, contintent, gdp
from cte
where rn = 1;
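Here is a minimal sketch of the UNION ALL + ROW_NUMBER() approach run end to end against the question's sample tables, using an in-memory SQLite database from Python in place of the real server (SQLite 3.25+ is needed for window functions). The table and column names follow the question; the helper latest_rows and the ranked/combined CTE names are illustrative.

```python
import sqlite3

SETUP = """
CREATE TABLE tbl_2021 (id INTEGER, name TEXT, addr TEXT, location TEXT,
                       country TEXT, contintent TEXT, gdp INTEGER);
CREATE TABLE tbl_2020 (id INTEGER, location TEXT, name TEXT);
CREATE TABLE tbl_2019 (id INTEGER, location TEXT, name TEXT, data TEXT, network TEXT);
CREATE TABLE tbl_2018 (id INTEGER, location TEXT, name TEXT, data TEXT, network TEXT);
INSERT INTO tbl_2021 VALUES (123, 'rob', 'dware', 'texas', 'us', 'us', 8),
                            (456, 'lilly', 'gwood', 'london', 'uk', 'uk', 5),
                            (670, 'rick', 'utown', 'newyrok', 'us', 'us', 8),
                            (490, 'zang', 'kcity', 'hk', 'hongkong', 'hongkong', 6);
INSERT INTO tbl_2020 VALUES (999, 'ger', 'roger'), (888, 'bel', 'leslie'), (670, 'us', 'marie');
INSERT INTO tbl_2019 VALUES (999, 'uk', 'roger', 'xx', 'na'), (555, 'rus', 'vladmir', 'ux', 'na'),
                            (879, 'us', 'marie', 'xx', 'ua'), (481, 'cn', 'kim', NULL, NULL);
INSERT INTO tbl_2018 VALUES (823, 'uk', 'roger', 'xx', 'na'), (555, 'rus', 'vladmir', 'ux', 'na'),
                            (879, 'us', 'maria', 'xx', 'ua'), (670, 'us', 'marie', 'xy', 'uy'),
                            (888, 'in', 'raj', 'xx', 'jo');
"""

# sl = source priority (1 = newest table). ROW_NUMBER() numbers each id's
# rows from newest to oldest, and rn = 1 keeps only the newest one.
QUERY = """
WITH combined AS (
    SELECT id, name, addr, location, country, contintent, gdp, 1 AS sl FROM tbl_2021
    UNION ALL
    SELECT id, name, NULL, location, NULL, NULL, NULL, 2 FROM tbl_2020
    UNION ALL
    SELECT id, name, NULL, location, NULL, NULL, NULL, 3 FROM tbl_2019
    UNION ALL
    SELECT id, name, NULL, location, NULL, NULL, NULL, 4 FROM tbl_2018
),
ranked AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY sl) AS rn
    FROM combined
)
SELECT id, name, addr, location, country, contintent, gdp
FROM ranked
WHERE rn = 1
ORDER BY sl, id
"""


def latest_rows():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in latest_rows():
    print(row)
```

Id 670 comes back as rick (from 2021), 888 as leslie (from 2020 rather than raj from 2018), and 879 as marie (from 2019 rather than maria from 2018), which is the "most recent table wins" rule from the question.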

sqlite3: COUNT & EXCEPT not working as expected

I'm fairly new to SQL, and even after searching the internet for an answer, I still cannot get my COUNT and EXCEPT statements to select what I want.
My Database:
sqlite> CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, primary key(Team, Player));
sqlite> .separator ,
sqlite> .import databaseTest Football
sqlite> .headers on
sqlite> .mode col
sqlite> SELECT Team, Player, Age FROM Football ORDER BY Team;
Team Player Age
---------- ---------- ----------
Arsenal Cech 38
Arsenal Giroud 29
Arsenal Sanchez 28
Arsenal Walcott 27
Chelsea Costa 29
Chelsea Courtois 25
Chelsea Hazard 26
Chelsea Willian 26
Liverpool Can 23
Liverpool Coutinho 24
Liverpool Wjinaldum 25
Liverpool Woodburn 17
Manchester Aguero 29
Manchester Jesus 19
Manchester Silva 28
Manchester Toure 34
Manchester De Gea 26
Manchester Felliani 29
Manchester Rooney 32
Manchester Schweinste 35
Tottenham Delle Ali 22
Tottenham Kane 24
Tottenham Rose 24
Tottenham Vertonghen 27
What I want to do is SELECT the COUNT of teams that do not have a player over the age of 30, so the query should return 3 (Chelsea, Liverpool, Tottenham).
This is the statement I've tried and assumed would work:
sqlite> SELECT COUNT(DISTINCT Team) FROM Football
...> EXCEPT
...> SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30;
COUNT(DISTINCT Team)
--------------------
6
But as you can see it returns '6'. What am I doing wrong and how can I get the correct result?
Here is another way. Look at the maximum age for each team:
SELECT COUNT(*)
FROM (SELECT Team
FROM Football
GROUP BY Team
HAVING MAX(Age) <= 30
) t;
You can also use EXCEPT, but this also requires a subquery. Your query counts first and then applies EXCEPT to two one-row results (6 EXCEPT 3 is still 6), so you need to do the set operation before doing the count:
SELECT COUNT(DISTINCT TEAM)
FROM (SELECT Team FROM Football
EXCEPT
SELECT Team FROM Football WHERE Age > 30
) t;
Strictly speaking, this query could use COUNT(*) rather than COUNT(DISTINCT). However, it is easy to forget that EXCEPT (like UNION) removes duplicate values.
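A quick way to see the difference is to run all three queries side by side. This sketch uses an in-memory SQLite database from Python with the roster from the question. Assumption: the column display cuts values at 10 characters (compare "Schweinste"), which is presumably why the original query reports 6 teams, so the eight "Manchester" rows are split here into two teams with illustrative names; run_all and QUERIES are hypothetical helpers.

```python
import sqlite3

# Roster from the question. ASSUMPTION: "Manchester" is a truncated display
# of two teams; "Manchester City"/"Manchester United" are illustrative names.
PLAYERS = [
    ("Arsenal", "Cech", 38), ("Arsenal", "Giroud", 29),
    ("Arsenal", "Sanchez", 28), ("Arsenal", "Walcott", 27),
    ("Chelsea", "Costa", 29), ("Chelsea", "Courtois", 25),
    ("Chelsea", "Hazard", 26), ("Chelsea", "Willian", 26),
    ("Liverpool", "Can", 23), ("Liverpool", "Coutinho", 24),
    ("Liverpool", "Wjinaldum", 25), ("Liverpool", "Woodburn", 17),
    ("Manchester City", "Aguero", 29), ("Manchester City", "Jesus", 19),
    ("Manchester City", "Silva", 28), ("Manchester City", "Toure", 34),
    ("Manchester United", "De Gea", 26), ("Manchester United", "Felliani", 29),
    ("Manchester United", "Rooney", 32), ("Manchester United", "Schweinste", 35),
    ("Tottenham", "Delle Ali", 22), ("Tottenham", "Kane", 24),
    ("Tottenham", "Rose", 24), ("Tottenham", "Vertonghen", 27),
]

QUERIES = {
    # Counts first, then EXCEPT compares two one-row results: 6 EXCEPT 3 = 6.
    "original": """SELECT COUNT(DISTINCT Team) FROM Football
                   EXCEPT
                   SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30""",
    # Keep teams whose oldest player is 30 or younger, then count them.
    "having_max": """SELECT COUNT(*) FROM (SELECT Team FROM Football
                     GROUP BY Team HAVING MAX(Age) <= 30) t""",
    # Do the set difference on team names first, then count.
    "except_first": """SELECT COUNT(*) FROM (SELECT Team FROM Football
                       EXCEPT
                       SELECT Team FROM Football WHERE Age > 30) t""",
}


def run_all():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, "
                     "PRIMARY KEY(Team, Player))")
        conn.executemany("INSERT INTO Football VALUES (?, ?, ?)", PLAYERS)
        return {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}
    finally:
        conn.close()


print(run_all())
```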

Max from Query from Select data

I am pretty new to SQL and need some help with a query. I am trying to find the MAX TradeCodeID using the following query. It is not returning the data I need. It is pretty much returning t.
select distinct
t.useremployeeid,
max(t.usertradeID),
t.Projectfullname,
t.userfirstname + ' '+ t.userlastname as GreatestPM
from
(select distinct
users.UserTradeId, UserEmployeeID, UserFirstName, UserLastName,
ProjectFullName, ProjectManager,
max(ScheduleDate) as LastDate
from
schedules
left outer join
users on ScheduleUserID = UserID
left outer join
Phases on SchedulePhaseID = PhaseID
left outer join
Projects on phases.ProjectID = projects.ProjectID
left outer join
UserTrades on UserTrades.UserTradeID = Users.UserTradeID
where
users.useractive = 1
and users.useremployeeid <> 0
and users.usertradeid between 21 and 24
and projectfullname is not null
group by
users.UserTradeid, UserEmployeeID, UserFirstName, UserLastName,
ProjectFullName, ProjectManager
having
max(scheduledate) > getdate() ) t
group by
t.projectfullname, t.userfirstname,t.userlastname, UserEmployeeID
order by
t.projectfullname
From the following data set:
useremployeeid UserTradeID Projectfullname GreatestPM
--------------------------------------------------------------------------------
12121 22 162331.05 John Smith
25487 21 166324.1 Chuck Norris
45639 21 166324.1 Brad Pitt
35789 23 166324.1 John Doe
15697 24 166324.1 Matt Damon
28957 23 166324.1 Taylor Swift
76985 21 166324.1 Tony Romo
25496 21 166324.1 George Strait
85695 22 167091.1 Robin Roberts
75632 21 167091.1 Scott Smith
66897 22 1663341.01 Garth Brooks
58766 21 1663341.01 Travis Tritt
37895 21 1663341.01 Sara Roberts
95687 21 1663352.01 Justin Timberlake
85697 24 1663352.01 Sally Walker
I am looking to get the following results:
useremployeeid UserTradeID Projectfullname GreatestPM
----------------------------------------------------------
12121 22 162331.05 John Smith
15697 24 166324.1 Matt Damon
85695 22 167091.1 Robin Roberts
66897 22 1663341.01 Garth Brooks
85697 24 1663352.01 Sally Walker
Thank you for the help.
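One common pattern for "the row with the highest value per group" is ROW_NUMBER() partitioned by the group and ordered by the value descending, keeping row 1. The sketch below applies it to the result set shown in the question using an in-memory SQLite database from Python. pm_rows is a hypothetical table holding that result set in place of derived table t; in SQL Server the same pattern could presumably wrap the existing subquery (partitioning by t.Projectfullname), though this has not been tested against the real schema.

```python
import sqlite3

# The question's current result set: (useremployeeid, UserTradeID,
# Projectfullname, GreatestPM). Projectfullname is kept as text.
ROWS = [
    (12121, 22, "162331.05", "John Smith"),
    (25487, 21, "166324.1", "Chuck Norris"),
    (45639, 21, "166324.1", "Brad Pitt"),
    (35789, 23, "166324.1", "John Doe"),
    (15697, 24, "166324.1", "Matt Damon"),
    (28957, 23, "166324.1", "Taylor Swift"),
    (76985, 21, "166324.1", "Tony Romo"),
    (25496, 21, "166324.1", "George Strait"),
    (85695, 22, "167091.1", "Robin Roberts"),
    (75632, 21, "167091.1", "Scott Smith"),
    (66897, 22, "1663341.01", "Garth Brooks"),
    (58766, 21, "1663341.01", "Travis Tritt"),
    (37895, 21, "1663341.01", "Sara Roberts"),
    (95687, 21, "1663352.01", "Justin Timberlake"),
    (85697, 24, "1663352.01", "Sally Walker"),
]

# Number the rows within each project from the highest UserTradeID down and
# keep the first. With ties, ROW_NUMBER() picks one row arbitrarily; RANK()
# would keep every tied row instead.
QUERY = """
SELECT useremployeeid, UserTradeID, Projectfullname, GreatestPM
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Projectfullname
                              ORDER BY UserTradeID DESC) AS rn
    FROM pm_rows
) ranked
WHERE rn = 1
"""


def top_trade_per_project():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pm_rows (useremployeeid INTEGER, UserTradeID INTEGER, "
                     "Projectfullname TEXT, GreatestPM TEXT)")
        conn.executemany("INSERT INTO pm_rows VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in top_trade_per_project():
    print(row)
```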

SQL Select Distinct returning duplicates

I am trying to return the country, golfer name, golfer age, and average drive for the golfers with the highest average drive from each country.
However, I am getting a result set with duplicates of the same country. What am I doing wrong? Here is my code:
select distinct country, name, age, avgdrive
from pga.golfers S1
inner join
(select max(avgdrive) as MaxDrive
from pga.golfers
group by country) S2
on S1.avgdrive = s2.MaxDrive
order by avgdrive;
These are some of the results I've been getting back. I should only be getting 15 rows, but instead I'm getting 20:
COUN NAME AGE AVGDRIVE
---- ------------------------------ ---------- ----------
Can Mike Weir 35 279.9
T&T Stephen Ames 41 285.8
USA Tim Petrovic 39 285.8
Ger Bernhard Langer 47 289.3
Swe Fredrik Jacobson 30 290
Jpn Ryuji Imada 28 290
Kor K.J. Choi 37 290.4
Eng Greg Owen 33 291.8
Ire Padraig Harrington 33 291.8
USA Scott McCarron 40 291.8
Eng Justin Rose 25 293.1
Ind Arjun Atwal 32 293.7
USA John Rollins 30 293.7
NIr Darren Clarke 37 294
Swe Daniel Chopra 31 297.2
Aus Adam Scott 25 300.6
Fij Vijay Singh 42 300.7
Spn Sergio Garcia 25 301.9
SAf Ernie Els 35 302.9
USA Tiger Woods 29 315.2
You are missing a join condition:
select s1.country, s1.name, s1.age, s1.avgdrive
from pga.golfers S1 inner join
(select country, max(avgdrive) as MaxDrive
from pga.golfers
group by country
) S2
on S1.avgdrive = s2.MaxDrive and s1.country = s2.country
order by s1.avgdrive;
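Your problem is that some people in one country have the same average as the best in another country. For example, Tim Petrovic (USA, 285.8) is returned because 285.8 is the best average drive for T&T (Stephen Ames).
DISTINCT eliminates duplicate rows, not duplicate values in individual columns.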
To get a list of countries with ages, names, and max drives, you would need to group the whole select by country.
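To see the missing join condition in action, this sketch loads a handful of golfers from the result set above into an in-memory SQLite database from Python and runs both the original and the corrected query. It uses a plain golfers table instead of pga.golfers (SQLite would need an attached database for the pga. prefix), and the subset of rows and the compare helper are illustrative, not the full table.

```python
import sqlite3

# A handful of golfers from the result set above (not the full pga.golfers table).
GOLFERS = [
    ("T&T", "Stephen Ames", 41, 285.8),
    ("USA", "Tim Petrovic", 39, 285.8),
    ("USA", "Scott McCarron", 40, 291.8),
    ("USA", "Tiger Woods", 29, 315.2),
    ("Eng", "Greg Owen", 33, 291.8),
    ("Eng", "Justin Rose", 25, 293.1),
    ("Ire", "Padraig Harrington", 33, 291.8),
]

# Original: joins on the drive value only, so any golfer whose average equals
# some country's maximum is returned.
ORIGINAL = """
SELECT DISTINCT country, name, age, avgdrive
FROM golfers S1
INNER JOIN (SELECT MAX(avgdrive) AS MaxDrive FROM golfers GROUP BY country) S2
        ON S1.avgdrive = S2.MaxDrive
"""

# Fixed: the derived table also carries country, and the join matches on both.
FIXED = """
SELECT S1.country, S1.name, S1.age, S1.avgdrive
FROM golfers S1
INNER JOIN (SELECT country, MAX(avgdrive) AS MaxDrive FROM golfers GROUP BY country) S2
        ON S1.avgdrive = S2.MaxDrive AND S1.country = S2.country
ORDER BY S1.avgdrive
"""


def compare():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE golfers (country TEXT, name TEXT, age INTEGER, avgdrive REAL)")
        conn.executemany("INSERT INTO golfers VALUES (?, ?, ?, ?)", GOLFERS)
        return conn.execute(ORIGINAL).fetchall(), conn.execute(FIXED).fetchall()
    finally:
        conn.close()


original, fixed = compare()
print(len(original), "rows from the original query,", len(fixed), "from the fixed one")
```

In this subset the original query returns 7 rows for 4 countries: Petrovic matches the T&T maximum, and Owen and McCarron match the Irish maximum of 291.8.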