I want to join two tables with a common column in Big query? - sql

To join the tables, I am using the following query.
SELECT *
FROM(select user as uservalue1 FROM [projectname.FullData_Edited]) as FullData_Edited
JOIN (select user as uservalue2 FROM [projectname.InstallDate]) as InstallDate
ON FullData_Edited.uservalue1=InstallDate.uservalue2;
The query works but the joined table only has two columns uservalue1 and uservalue2.
I want to keep all the columns present in both the table. Any idea how to achieve that?

#legacySQL
SELECT <list of fields to output>
FROM [projectname:datasetname.FullData_Edited] AS FullData_Edited
JOIN [projectname:datasetname.InstallDate] AS InstallDate
ON FullData_Edited.user = InstallDate.user
or (and preferable)
#standardSQL
SELECT <list of fields to output>
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
ON FullData_Edited.user = InstallDate.user
Note, using SELECT * in such cases lead to Ambiguous column name error, so it is better to put explicit list of columns/fields you need to have in your output
The way around it is in using USING() syntax as in example below.
Assuming that user is the ONLY ambiguous field - it does the trick
#standardSQL
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)
For example:
#standardSQL
WITH `projectname.datasetname.FullData_Edited` AS (
SELECT 1 user, 'a' field1
),
`projectname.datasetname.InstallDate` AS (
SELECT 1 user, 'b' field2
)
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)
returns
user field1 field2
1 a b
whereas using ON FullData_Edited.user = InstallDate.user gives below error
Error: Duplicate column names in the result are not supported. Found duplicate(s): user

Don't use subqueries if you want all columns:
SELECT *
FROM [projectname.FullData_Edited] as FullData_Edited JOIN
[projectname.InstallDate] as InstallDate
ON FullData_Edited.uservalue1 = InstallDate.uservalue2;
You may have to list out the particular columns you want to avoid duplicate column names.
While you are at it, you should also switch to standard SQL.

Related

Want to concatenate column of the second query to the first query but getting errors such as "query block has incorrect number of result columns"

SELECT
ID, PRIM_EMAIL, SEC_EMAIL, PHONE
FROM
STUDENTS.RECORDS
WHERE
ID IN (SELECT ID FROM STUDENTS.INFO WHERE ROLL_NO = '554')
UNION
SELECT NAME
FROM STUDENTS.INFO
WHERE ROLL_NO = '554';
Here Roll_No is a user inserted data so for now I have hard coded it. Basically with the help of ROLL_NO I sort the STUDENTS_INFO table from where I get the ID and based on that I try to get PRIM_EMAIL, SEC_EMAIL, PHONE from the STUDENTS.RECORDS table while matching the foreign keys of both the tables. In addition to the current result set I also want to have the prov_name column.
Any help is very much appreciated. Thank you!
I suspect that you want to put all this information on the same row, which suggests a join rather than union all:
select
r.ID,
r.PRIM_EMAIL,
r.SEC_EMAIL,
r.PHONE,
r.NAME
from STUDENTS.RECORDS r
inner join STUDENTS.INFO i ON i.ID = r.ID
where I.ROLL_NO = '554';
I think the source of your error query block has incorrect number of result columns is coming from trying to union together a table with 4 columns (id, prim_email, sec_email, phone) with 1 column (name).
From your question, I am gathering that you want a single table of id, prim_email, sec_email, phone from students.records and name from students.info.
I think the following query using CTE's might get you (partially) to your final result. You may want to refactor for optimizing performance.
with s_records as ( select * from students.records ),
s_info as ( select * from students.info ),
joined as (
select
s_records.id,
s_records.prim_email,
s_records.sec_email,
s_records.phone,
s_info.name
from s_records
left join s_info
on s_records.roll_no = s_info.roll_no
where roll_np = '554' )
select * from joined
Overall, I think that a join will be part of your solution rather than a union :-)

Ensuring two columns only contain valid results from same subquery

I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words I would needs something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very innefficient to run it twice like in the example.
Is there a way to do this without duplicating two identical sub queries?
Another approach:
select *
from mytable t1
where 2 = (select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The query would also benefit from an index on valid_symbols.symbol
You could try use a CTE like;
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol
From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.

SQL Server : compare two tables with UNION and Select * plus additional label column

I've been playing around with the sample on Jeff' Server blog to compare two tables to find the differences.
In my case the tables are a backup and the current data. I can get what I want with this SQL statement (simplified by removing most of the columns). I can then see the rows from each table that don't have an exact match and I can see from which table they come.
SELECT
MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
FROM
(SELECT
'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT
'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
FROM
[JAS001new].[dbo].[AR_CustomerAddresses]) tmp
GROUP BY
[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
HAVING
COUNT(*) = 1
This Stack Overflow Answer gives me a much cleaner SQL query but does not tell me from which table the rows come.
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
UNION
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
INTERSECT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
I could use the first version but I have many tables that I need to compare and I think that there has to be an easy way to add the source table column to the second query. I've tried several things and googled to no avail. I suspect that maybe I'm just not searching for the correct thing since I'm sure it's been answered before.
Maybe I'm going down the wrong trail and there is a better way to compare the databases?
Could you use the following setup to accomplish your goal?
SELECT 'New not in Old' Descriptor, *
FROM
(
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
) a
UNION
SELECT 'Old not in New' Descriptor, *
FROM
(
SELECT * FROM [JAS001].[dbo].[AR_CustomerAddresses]
EXCEPT
SELECT * FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) b
You can't add the table name there because union, except, and intersection all compare all columns. This means you can't differentiate between them by adding the table name to the query. A group by gives you control over what columns are considered in finding duplicates so you can exclude the table name.
To help you with the large number of tables you need to compare you could write a sql query off the metadata tables that hold table names and columns and generate the sql commands dynamically off those values.
Derive one column using table names like below
SELECT MIN(TableName) as TableName
,[strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
FROM
(SELECT 'Old' as TableName
,[JAS001].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001].[dbo].[AR_CustomerAddresses]
UNION ALL
SELECT 'New' as TableName
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCustomer]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strAddress1]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strCity]
,[JAS001new].[dbo].[AR_CustomerAddresses].[strPostalCode]
,'[JAS001new].[dbo].[AR_CustomerAddresses]' as table_name_came
FROM [JAS001new].[dbo].[AR_CustomerAddresses]
) tmp
GROUP BY [strCustomer]
,[strAddress1]
,[strCity]
,[strPostalCode]
,table_name_came
HAVING COUNT(*) = 1

SQL DB2 store query result into variable

Usind a DB2 tables:
This is what is done manually in a table
select app_id from table_app where APP_NAME='App_Temp';
gets me an ID, say it's 234
I copy than and do:
select * form table_roles where role_app_id=234;
gets me a row , which is my desired end result.
Is there a way save the result of the first one into a variable and do the second query without the manual step in between using local variables?
I know, you can query out the information with a very simple join between two tables, but I'd like to know, how it works with variables. Is there a way?
Just can just plug it in:
select *
from table_roles
where role_app_id = (select app_id from table_app where APP_NAME = 'App_Temp');
If there can be more than one match, use in instead of =.
You can also phrase this as a join:
select r.*
from table_roles r join
table_app
on r.role_app_id = a.app_id and APP_NAME = 'App_Temp';
However, this might return duplicates, if two apps have the same name. In that case, use select distinct:
select distinct r.*
from table_roles r join
table_app
on r.role_app_id = a.app_id and APP_NAME = 'App_Temp';

sql select into subquery

I'm doing a data conversion between systems and have prepared a select statement that identifies the necessary rows to pull from table1 and joins to table2 to display a pair of supporting columns. This select statement also places blank columns into the result in order to format the result for the upload to the destination system.
Beyond this query, I will also need to update some column values which I'd like to do in a separate statement operation in a new table. Therefore, I'm interested in running the above select statement as a subquery inside a SELECT INTO that will essentially plop the results into a staging table.
SELECT
dbo_tblPatCountryApplication.AppId, '',
dbo_tblPatCountryApplication.InvId,
'Add', dbo_tblpatinvention.disclosurestatus, ...
FROM
dbo_tblPatInvention
INNER JOIN
dbo_tblPatCountryApplication ON dbo_tblPatInvention.InvId = dbo_tblPatCountryApplication.InvId
ORDER BY
dbo_tblpatcountryapplication.invid;
I'd like to execute the above statement so that the results are dumped into a new table. Can anyone please advise how to embed the statement into a subquery that will play nicely with a SELECT INTO?
You can simply add an INTO clause to your existing query to create a new table filled with the results of the query:
SELECT ...
INTO MyNewStagingTable -- Creates a new table with the results of this query
FROM MyOtherTable
JOIN ...
However, you will have to make sure each column has a name, as in:
SELECT dbo_tblPatCountryApplication.AppId, -- Cool, already has a name
'' AS Column2, -- Given a name to create that new table with select...into
...
INTO MyNewStagingTable
FROM dbo_tblPatInvention INNER JOIN ...
Also, you might like to use aliases for your tables, too, to make code a little more readable;
SELECT a.AppId,
'' AS Column2,
...
INTO MyNewStagingTable
FROM dbo_tblPatInvention AS i
INNER JOIN dbo_tblPatCountryApplication AS a ON i.InvId = a.InvId
ORDER BY a.InvId
One last note is that it looks odd to have named your tables dbo_tblXXX as dbo is normally the schema name and is separated from the table name with dot notation, e.g. dbo.tblXXX. I'm assuming that you already have a fully working select query before adding the into clause. Some also consider using Hungarian notation in your database (tblName) to be a type of anti-pattern to avoid.
If the staging table doesn't exist and you want to create it on insert then try the following:
SELECT dbo_tblPatCountryApplication.AppId,'', dbo_tblPatCountryApplication.InvId,
'Add', dbo_tblpatinvention.disclosurestatus .......
INTO StagingTable
FROM dbo_tblPatInvention
INNER JOIN dbo_tblPatCountryApplication
ON dbo_tblPatInvention.InvId = dbo_tblPatCountryApplication.InvId;
If you want to insert them in a specific order then use try using a sub-query in the from clause:
SELECT *
INTO StagingTable
FROM
(
SELECT dbo_tblPatCountryApplication.AppId, '', dbo_tblPatCountryApplication.InvId,
'Add', dbo_tblpatinvention.disclosurestatus .......
FROM dbo_tblPatInvention
INNER JOIN dbo_tblPatCountryApplication ON
dbo_tblPatInvention.InvId = dbo_tblPatCountryApplication.InvId
order by dbo_tblpatcountryapplication.invid
) a;
Try
INSERT INTO stagingtable (AppId, ...)
SELECT ... --your select goes here