I am working with a dataset of tables that (a) often requires joining tables together, however also (b) frequently has duplicate columns names. Any time I write a query along the lines of:
SELECT
t1.*, t2.*
FROM t1
LEFT JOIN t2 ON t1.this_id = t2.matching_id
...I get the error Duplicate column names in the result are not supported. Found duplicate(s): this_col, that_col, another_col, more_cols, dupe_col, get_the_idea_col
I understand that with BigQuery, it is better to avoid using * when selecting tables, however my data tables aren't too big + my bigquery budget is high, and doing these joins with all columns helps significantly with data exploration.
Is there anyway BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query all together?
Thanks!
The simplest way is to select records rather than columns:
SELECT t1, t2
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
This is pretty much what I do for ad hoc queries.
If you want the results as columns and not records (they don't look much different in the results), you can use EXCEPT:
SELECT t1.* EXCEPT (duplicate_column_name),
t2.* EXCEPT (duplicate_column_name),
t1.duplicate_column_name as t1_duplicate_column_name,
t2.duplicate_column_name as t2_duplicate_column_name
FROM t1 LEFT JOIN
t2
ON t1.this_id = t2.matching_id;
Is there anyway BigQuery can automatically handle / rename columns in these situations (e.g. prefix the column with the table name), as opposed to not allowing the query all together?
This is possible with BigQuery Legacy SQL - which can be handy for data exploration unless you are dealing with data types or using some functions/features specific to standard sql
So below
#legacySQL
SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.this_id = t2.matching_id
will produce output where all column names will be prefixed with respective alias like t1_this_id and t2_matching_id
Related
I'm migrating the backend a budget database from Access to SQL Server and I ran into an issue.
I have 2 tables (let's call them t1 and t2) that share many fields in common: Fund, Department, Object, Subcode, TrackingCode, Reserve, and FYEnd.
If I want to join the tables to find records where all 7 fields match, I can create an inner join using each field:
SELECT *
FROM t1
INNER JOIN t2
ON t1.Fund = t2.Fund
AND t1.Department = t2.Department
AND t1.Object = t2.Object
AND t1.Subcode = t2.Subcode
AND t1.TrackingCode = t2.TrackingCode
AND t1.Reserve = t2.Reserve
AND t1.FYEnd = t2.FYEnd;
This works, but it runs very slowly. When the backend was in Access, I was able to solve the problem by adding a calculated column to both tables. It basically, just concatenated the fields using "-" as a delimiter. The revised query is as follows:
SELECT *
FROM t1 INNER JOIN t2
ON CalculatedColumn = CalculatedColumn
This looks cleaner and runs much faster. The problem is when I moved t1 and t2 to SQL Server, the same query gives me an error message:
I'm new to SQL Server. Can anyone explain what's going on here? Is there a setting I need to change for the calculated column?
Posting this as an answer from my comment.
Usually, this is an issue with mismatched Data types between the two columns referenced. Check and make sure the data types of the two fields (CompositeID) are the same.
You have to calculate the columns before joining them as the ON clause can only access columns for the table.
It is no good to have two identical tables anyway so you should rethink your design completely.
SELECT t1a.*,t2a.*
FROM (SELECT CalculatedColumn, * FROM t1) t1a INNER JOIN (SELECT CalculatedColumn, * FROM t2 ) t2a
ON t1a.CalculatedColumn = t2a.CalculatedColumn
Is there a way in ABAP's OpenSQL to simplify the select columns in a JOIN when I want to grab all the fields of one table but only selected fields from the other table(s)?
For instance, in mysql we can simply do:
SELECT tb1.*, tb2.b, tb2.d
FROM tableA tb1
INNER JOIN tableB tb2 ON tb1.x = tb2.a
However, OpenSQL does not seem to allow selecting tb1~*, tb2~b, tb2~d so I have to resort to this:
SELECT tb1.x, tb1.y, tb1.z, tb2.b, tb2.d
FROM tableA tb1
INNER JOIN tableB tb2 ON tb1.x = tb2.a
For very large tables, especially standard tables, this becomes unwieldy, difficult to read and more annoying to maintain.
Is there a better way to select all fields of tb1 and some fields from tb2?
Yes, this is possible in the OpenSQL from 7.40 SP08. See this article.
The quotation from the article has that.
Column Specification
In the SELECT list, you can specify all columns of a data source using the syntax data_source~* from 7.40, SP08 on. This can be handy when working with joins.
SELECT scarr~carrname, spfli~*, scarr~url
FROM scarr INNER JOIN spfli ON scarr~carrid = spfli~carrid
INTO TABLE #DATA(result).
In the previous versions unfortunately one has to specify the columns one by one or use database native SQL for example with ADBC.
My first question here. This has been a really helpful platform so far. I am some what a newbie in sql. But I have a freelance project in hand which I should release this month.(reporting application with no database writes)
To the point now: I have been provided with data (excel sheets with rows spanning up to 135000). Requirement is to implement a standalone application. I decided to use sql server compact 3.5 sp2 and C#. Due to time pressure(I thought it made sense too), I created tables based on each xls module, with fields of each tables matching the names of the headers in the xls, so that it can be easily imported via CSV import using SDF viewer or sql server compact toolbox added in visual studio. (so no further table normalizations done due to this reason).
I have a UI design for a typical form1 in which inputs from controls in it are to be checked in an sql query spanning 2 or 3 tables. (eg: I have groupbox1 with checkboxes (names matching field1,field2.. of table1) and groupbox2 with checkboxes matching field3, field4 of table2). also date controls based on which a common 'DateTimeField' is checked in each of the tables.
There are no foreign keys defined on tables for linking(did not arise the need to, since the data are different for each). The only commmon field is a 'DateTimeField'(same name) which exists in each table. (basically readings on a datetime stamp from locations. field1, field 2 etc are locations. For a particular datetime there may or may not be readings from table 1 or table2)
How will I accomplish an sql select query(using Union/joins/nested selects - if sql compact 3.5 supports it) to return fields from the 2 tables based on datetime(where clause). For a given date time there can be even empty values for fields in table 2. I have done a lot of research on this and tried as well. but not yet a good solution probably also due to my bad experience. apologies!
I would really appreciate any of your help! Can provide a sample of the data how it looks if you need it. Thanks in advance.
Edit:
Sample Data (simple as that)
Table 1
t1Id xDateTime loc1 loc2 loc3
(could not format the tabular schmema here. sorry. but this is self explanatory)
... and so on up to 135000 records existing imported from xls
Table 2
t2Id xDateTime loc4 loc5 loc6
.. and so on up to 100000 records imported from xls. merging table 1 and table 2 will result in a huge amount of blank rows/values for a date time.. hence leaving it as it is.
But a UI multiselect(loc1,loc2,loc4,loc5 from both t1 and t2) event from winform needs to combine the result from both tables based on a datetime.
... and so on
I managed to write it which comes very close. I say very close cause i have test in detail with different combination of inputs.. Thanks to No'am for the hint. Will mark as answer if everything goes well.
SELECT T1.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
INNER JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION
SELECT T2.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
RIGHT JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION
SELECT T1.xDateTime, T1.loc2, T2.loc4 FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.xDateTime = T2.xDateTime
WHERE (T1.xDateTime BETWEEN 'somevalue1' AND 'somevalue2')
If 't1DateTime' and 't2DateTime' are the common fields, then apparently you need a query such as
SELECT table1.t1DateTime, table1.tiID, table1.loc2, table2.t2id, table2.loc4
FROM table1
INNER JOIN table2 ON table2.t2DateTime = table1.t1DateTime
This will give you values from rows which match in both tables, according to DateTime. If there is also supposed to be a match with the locations then you will have to add the desired condition to the 'ON' statement.
Based on your comment:
For a given date time there can be even empty values for fields in table 2
my understanding would be that you are not interested in orphaned records in table 2 (based on date) so in that case a LEFT JOIN would do it:
SELECT table1.t1DateTime, table1.tiID, table1.loc2, table2.t2id, table2.loc4
FROM table1
LEFT JOIN table2 ON table2.t2DateTime = table1.t1DateTime
However if there are also entries in table2 with no matching dates in table1 that you need to return you could try this:
SELECT table1.t1DateTime, table1.tiID, table1.loc2, ISNULL(table2.t2id, 0), ISNULL(table2.loc4, 0.0)
FROM table1
LEFT JOIN table2 ON table2.t2DateTime = table1.t1DateTime
WHERE (T1.t1DateTime BETWEEN 'somevalue1' AND 'somevalue2')
UNION ALL
SELECT table2.t2DateTime, '0', '0.0', table2.t2id, table2.loc4
FROM table2
LEFT OUTER JOIN table1 on table1.t1DateTime=table2.t2DateTime
WHERE table1.t1Datetime IS NULL AND T2.t2DateTime BETWEEN 'somevalue1' AND 'somevalue2'
Thanks a lot to #kbbucks.
Works with this so far.
SELECT T1.MonitorDateTime, T1.loc2, T.loc4
FROM Table1 T1
LEFT JOIN Table2 T2 ON T2.MonitorDateTime = T1.MonitorDateTime
WHERE T1.MonitorDateTime BETWEEN '04/05/2011 15:10:00' AND '04/05/2011 16:00:00'
UNION ALL
SELECT T2.MonitorDateTime, '', T2.loc4
FROM Table2 T2
LEFT OUTER JOIN Table1 T1 ON T1.MonitorDateTime = T2.MonitorDateTime
WHERE T1.MonitorDateTime IS NULL AND T2.MonitorDateTime BETWEEN '04/05/2011 15:10:00' AND '04/05/2011 16:00:00'
I have five results to retrieve from a table and I want to write a store procedure that will return all desired rows.
I can write the query like that temporarily:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I supposed I need to receive a list of Ids to split, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TALBE2)
This translates into what is your attempt. And judging by your attempt, this might be the simplest version for you to understand. Although, in the future I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.
I have two tables: table1, table2. Table1 has 10 columns, table2 has 2 columns.
SELECT * FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
I want to select all columns from table1 and only 1 column from table2. Is it possible to do that without enumerating all columns from table1 ?
Yes, you can do the following:
SELECT t1.*, t2.my_col FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
Even though you can do the t1.*, t2.col1 thing, I would not recommend it in production code.
I would never ever use a SELECT * in production - why?
you're telling SQL Server to get all columns - do you really, really need all of them?
by not specifying the column names, SQL Server has to go figure that out itself - it has to consult the data dictionary to find out what columns are present which does cost a little bit of performance
most importantly: you don't know what you're getting back. Suddenly, the table changes, another column or two are added. If you have any code which relies on e.g. the sequence or the number of columns in the table without explicitly checking for that, your code can brake
My recommendation for production code: always (no exceptions!) specify exactly those columns you really need - and even if you need all of them, spell it out explicitly. Less surprises, less bugs to hunt for, if anything ever changes in the underlying table.
Use table1.* in place of all columns of table1 ;)