Google Sheets "=QUERY()" JOIN ON or equitant

Google Sheets "=QUERY()" JOIN ON or equitant - sql

I have one large spreadsheet with names, addresses, phone numbers, emails, Etc. Some records have a second address for which I have a column named "Address 2" I was hopping to write a query that would give me an output with duplicate rows of which the only difference was the "Address 2" column would be in the main address Column.
Data:
A
B
C
D
E
F
G
1
Status
Name
Address
Phone
Email
Address2
Hire Date
2
Joe Smith
123 Smith St
201 555 3099
Joe#stackoverflow.com
7th Avenue Sq
4
Q
Jane Smith
321 Not Smith St
12/15/1980
5
Robert Smith
818 555 4321
Robert#googlesheets.com
12/13/1981
Looking for an Query output to look like:
A
B
C
D
E
F
1
Status
Name
Address
Phone
Email
Hire Date
2
Joe Smith
123 Smith St
201 555 3099
Joe#stackoverflow.com
3
Joe Smith
7th Avenue Sq
201 555 3099
Joe#stackoverflow.com
4
Q
Jane Smith
321 Not Smith St
12/15/1980
5
Robert Smith
818 555 4321
Robert#googlesheets.com
12/13/1981
I was trying something like:
=QUERY({Sheet1!$A2:$G,Sheet1!$B2:$B,Sheet1!$F2:$J },"SELECT Col1, Col2, Col3, Col4, Col5, Col7 JOIN Col6 ON Col2 = Col2")
Which I think is more or less how it would be in SQL, but Google sheets doesn't have a join function.
Is there any way to get this done?

most simple you can do is:
=QUERY({A1:E, G1:G; A2:B, F2:F, D2:E, G2:G}, "where Col3 is not null", )

Something like this?
You can stack data of the same length with {},
this sample create 2 query function and stack them together.
=ArrayFormula(
LAMBDA(DATA_1,DATA_2,
QUERY({DATA_1;DATA_2},"WHERE Col2 IS NOT NULL ORDER BY Col2",1)
)(
QUERY({A1:G4},"SELECT "&JOIN(",","Col"&{1,2,3,4,5,7}),1),
QUERY({A1:G4},"SELECT "&JOIN(",","Col"&{1,2,6,4,5,7})&" WHERE Col6 IS NOT NULL LABEL "&JOIN(",","Col"&{1,2,6,4,5,7}&"''"),1)
)
)

Related

Access SQL concatenating items by order

I have a need to concatenate name information from two tables in a specific order. So far, all I have is the SQL to retrieve the data. I need help with the SQL to output a name string in the correct order.
example data:
TBL_TITLE_ORDER
ORDER_ID
ORDER_SEQUENCE
TITLE_ID
FIRSTNAME_ID
20
1
30
20
2
456
20
3
33
21
1
31
21
2
32
TBL_TITLE
TITLE_ID
TITLE_NAME
30
Mr
31
Mrs
32
Jones
33
Smith
TBL_FIRSTNAME
FIRSTNAME_ID
NAME
456
John
SELECT TBL_TITLE.TITLE_NAME, TBL_FIRSTNAME.NAME
FROM (TBL_TITLE_ORDER LEFT JOIN TBL_TITLE ON TBL_TITLE_ORDER.TITLE_ID = TBL_TITLE.TITLE_ID) LEFT JOIN TBL_FIRSTNAME ON TBL_TITLE_ORDER.FIRSTNAME_ID = TBL_FIRSTNAME.FIRSTNAME_ID
WHERE (TBL_TITLE_ORDER.ORDER_ID) = 20
ORDER BY TBL_TITLE_ORDER.ORDER_SEQUENCE;
The output I need is a complete name string in the proper order sequence. There may or may not be a record in the TBL_FIRSTNAME table.
What I have so far:
TITLE_NAME
NAME
Mr
John
Smith
Required output:
Mr John Smith
Mrs Jones

Join multiple tables to return only one result for each record from main table

Currently I have three tables I am joining. I have data that was migrated from one system(old) to another system(new). I need to compare this data to ensure matches but also mismatches. I have three tables. One has the list of accounts being moved. The two systems have differnt ID types so this first table is a list of all IDs for the two tables and each account that was moved. So this is my base population.
ID1 ID2
ABC 123
ABC 123
ABC 123
DEF 456
DEF 456
DEF 456
I then have table 2 which is all the data from the old system.
ID Fname Lname
ABC John Smith
ABC Tom Smith
ABC Kate Smith
DEF Jason Thomas
DEF Ruby Thomas
DEF Alex Johnson
Then table 3 is all the data found in the new system.
ID Fname Lname
123 John Smith
123 Tom Smith
123 Kate Smith
456 Jason Thomas
456 Ruby Thomas
Right now when I join these tables on the ID I get a lot more rows than I need.
When I do my join I receive this:
ID Fname_old Lname_old ID2 Fname_new Lname_new
ABC John Smith 123 John Smith
ABC John Smith 123 Tom Smith
ABC John Smith 123 Kate Smith
I am trying to join them where it only returns the row that matches, and if it can't find a match I should still get the ID from the ID file and the data from table 2(old data) as this is the data that was sent to the new system.
ID1 ID2 Fname_old Lname_old Fname_new Lname_new
ABC 123 John Smith John Smith
ABC 123 Tom Smith Tom Smith
ABC 123 Kate Smith Kate Smith
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
DEF 456 Alex Johnson
The code I am using is:
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from table1 a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID

If its just duplicate rows in your first table you could try distincting them in a derived table like below:
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from (SELECT DISTINCT ID1, ID2 FROM table1) a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID

You are joining them on ID columns.
ID columns are usually UNIQUE while you have multiple identical IDs and specify join on those IDs.
Since you need to compare data, i suggest you lookup MATCH and how it works as that seems to be closer to what you are looking for here.

You can get a match using row_number():
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from (select a.*,
row_number() over (partition by id order by id) as seqnum
from table1 a
) a left join
(select b.*,
row_number() over (partition by id order by id) as seqnum
from table2 b
) b
on a.ID1 = b.ID and a.seqnum = b.seqnum
(select c.*,
row_number() over (partition by id order by id) as seqnum
from table3 c
) c
on a.ID2 = c.ID and a.seqnum = c.seqnum;
Note: This does not preserve the "ordering" of the original values, so any rows can be matched with any other. Why? SQL tables represent unordered sets.
If there is an ordering in the tables, you can use that in the order by clauses to get a match consistent with the ordering.

If you have a compare chance for name and last name this code will work.
select DISTINCT a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old, c.fname as
fname_new, c.lname as lname_new from table2 b
left join table1 a on a.ID1=b.ID
left join table3 c on a.ID2=c.ID and b.Fname=c.Fname and b.Lname=c.Lname
My Result :
ID1 ID2 fname_old lname_old fname_new lname_new
ABC 123 John Smith John Smith
ABC 123 Kate Smith Kate Smith
ABC 123 Tom Smith Tom Smith
DEF 456 Alex Johnson NULL NULL
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas

You say that this is data transferred to two systems. So you expect all data to match. You could hence reduce the query to only find data that doesn't match, if any.
Here is a SQL standard compliant query. You tagged your request with hive. I don't know about hive, so you may have to adjust the query.
select
t2.id as id1,
t3.id as id2,
t2.fname as fname_old,
t2.lname as lname_old,
t3.fname as fname_new,
t3.lname as lname_new
from table2 t2
full outer join t3
on t3.fname = t2.fname
and t3.lname = t2.lname
and exists (select null from table1 t1 where t1.id1 = t2.id and t1.id2 = t3.id)
where t2.id is null or t3.id is null;
This is a full anti join. It returns all rows that have no exact match in the other table. It doesn't, however guesstimate which deviating rows may be pairs. You will get a result like this:
ID1 | ID2 | Fname_old | Lname_old | Fname_new | Lname_new
----+-----+-----------+-----------+-----------+----------
DEF | | Alex | Johnson | |
GHI | | Jone | Miller | |
GHI | | Maxx | Miller | |
GHI | | Fritz | Miller | |
| 789 | | | Joan | Miller
| 789 | | | Max | Miller
| 799 | | | Fritz | Miller
As you see, you would have to examine this result manually. But ideally the query shouldn't return any row at all, which would just prove that everything went as expected and nobody (system or person) messed with the data :-)

Find rows for an id where value changes in any of the coloumn in sql

I have a table with say 10 columns where the unique identifier is id column.
I need print all the rows where value for any given id changes for any of the 10 coloums.
Eg:
Id name city state address mail phn number store no business name date
1 adam california ca acbb street xyz#gmail.com 12345 456 abc pvt 01-01-2017
1 adam newyork Ny avc xyz#gmail.com 12345 456 abc pvt 11-03-2018
1 adam newyork Ny avc xyz#gmail.com 12345 456 abcd pvt 24-03-2018
2 brian dallas Tx sasa sasa#gmail.com 21212 dsdssd ltd 01-01-2017
2 brian dallas Tx sasa sasa#gmail.com 232323 21212 dsdssd ltd 01-01-2017
3 donald dallas Tx qwer qwwqw#gmail.com 2121212 435345 sffsss ltd 23-01-2017
As shown above for id 1 there is a change from the first row to 2nd row on city state address so that should come .also for id 1 the 3rd row changes only business name so that should also be printed.
So basically for an id the status for that id before that date should be checked and if there is any change in any column it should be printed.
Same goes for id 2.For id 3 there is only one entry so that should not be printed.
The check needs to be done based on date for an id.
The data set that I have is in millions so would need a fine tuned query.

If you have a column for row number( for example ROWID), you can simply select all rows that has a pair (based on ID) in previous records and check if any of the last row columns differs from the first one:
select a.* from [yourTable] a
inner join [yourTable] b
on a.Id = b.Id where a.RowId > b.RowId
and (a.name <> b.name OR a.city <> b.city OR a.state <> b.state OR a.address <> b.address ... )

SQL Server display left join result set as additional columns instead of rows

I am working with SQL Server 2012, I have a two tables, Prospects and ProspectContact, with this sample data:
ProspectID ProspectName ProspectAdd
-----------------------------------------------
1 ABC Company India
2 XYZ Company UAE
3 PQR Company KSA
4 JKL Company INDIA
ProspectContacts table:
ProspectID CName CAdd CDesignation Email
---------------------------------------------------------------------------
1 Mr Sagar India Manager sagar#gmail.com
1 Mr Saurabh India Manager Saurabh#gmail.com
2 Mr Swami UAE Director swami#gmail.com
2 Mr .ABC UAE Engg abc#gmail.com
3 Mr PQR KSA Manager pqr#gmail.com
4 Mr XYZ INDIA Manager xyz#gmail.com
If I apply left join on the above tables with below query
SELECT
P.[PROSPECTNAME], P.[ADDRESS], PC.CONTACTPERSON, PC.EMAILID
FROM
[PROSPECTS] P
LEFT JOIN
[PROSPECTCONTACTS] PC ON P.PROSPECTID = PC.PROSPECTID
WHERE
P.USERID = #UserID
ORDER BY
CDATE DESC
I get this result set:
ProspectName Adress Name Email
---------------------------------------------------
ABC COMPANY INDIA Mr. Sagar sagar#gmail.com
ABC COMPANY INDIA Mr. Saurabh saurabh#gmail.com
whereas my expected result set would be this:
ProspectName Adress Name1 Email1 Name2 Email2
-----------------------------------------------------------------------------
ABC COMPANY INDIA Mr. Sagar sagar#gmail.com Mr.Saurabh saurabh#gmail.com
XYZ COMPNAY USA Mr.Swami swami#gmail.com Mr. ABC abc#gmail.com
PQR COMPANY KSA Mr.Pqr pqr#gmail.com NULL NULL
JKL COMPNAY INDIA Mr.XYZ xyz#gmail.com NULL NULL

Try something like this:
SELECT P.[PROSPECTNAME], P.[ADDRESS], PC.CONTACTPERSON, PC.EMAILID
FROM [PROSPECTCONTACTS] PC
LEFT JOIN [PROSPECTS] P
ON P.PROSPECTID=PC.PROSPECTID
WHERE P.USERID=#UserID ORDER BY CDATE Desc

get extra rows for each group where date doesn't exist

I've been playing with this for days, and can't seem to come up with something. I have this query:
select
v.emp_name as Name
,MONTH(v.YearMonth) as m
,v.SalesTotal as Amount
from SalesTotals
Which gives me these results:
Name m Amount
Smith 1 123.50
Smith 2 40.21
Smith 3 444.21
Smith 4 23.21
Jones 1 121.00
Jones 2 499.00
Jones 3 23.23
Jones 4 41.82
etc....
What I need to do is use a JOIN or something, so that I get a NULL value for each month (1-12), for each name:
Name m Amount
Smith 1 123.50
Smith 2 40.21
Smith 3 444.21
Smith 4 23.21
Smith 5 NULL
Smith 6 NULL
Smith ... NULL
Smith 12 NULL
Jones 1 121.00
Jones 2 499.00
Jones 3 23.23
Jones 4 41.82
Jones 5 NULL
Jones ... NULL
Jones 12 NULL
etc....
I have a "Numbers" table, and have tried doing:
select
v.emp_name as Name
,MONTH(v.YearMonth) as m
,v.SalesTotal as Amount
from SalesTotals
FULL JOIN Number n on n.Number = MONTH(v.YearMonth) and n in(1,2,3,4,5,6,7,8,9,10,11,12)
But that only gives me 6 additional NULL rows, where what I want is actually 6 NULL rows for each group of names. I've tried using Group By, but not sure how to use it in a JOIN statement like that, and not even sure if that's the correct route to take.
Any advice or direction is much appreciated!

Here's one way to do it:
select
s.emp_name as Name
,s.Number as m
,st.salestotal as Amount
from (
select distinct emp_name, number
from salestotals, numbers
where number between 1 and 12) s left join salestotals st on
s.emp_name = st.emp_name and s.number = month(st.yearmonth)
Condensed SQL Fiddle

You could do:
SELECT EN.emp_name Name,
N.Number M,
ST.SalesTotal Amount
FROM ( SELECT Number
FROM NumberTable
WHERE Number BETWEEN 1 AND 12) N
CROSS JOIN (SELECT DISTINCT emp_name
FROM SalesTotals) EN
LEFT JOIN SalesTotals ST
ON N.Number = MONTH(ST.YearMonth)
AND EN.emp_name = ST.emp_name

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas