I am trying to use SQL to create a table that concatenates all dates from a specific range to all items in another table. See image for an example.
I have a solution where I create a column of "null" values in both tables and join on that column, but I'm wondering whether there is a cleaner way to do this.
Example image
I've tried the following:
I added a column holding the same constant value to each table.
Then I joined the two tables on that column, so that every row of one table matched every row of the other.
This produced the intended result, but I'm wondering whether there's a better way that doesn't require adding the constant values:
SELECT c.Date_,k.user_email
FROM `operations-div-qa.7_dbtCloud.calendar_table_hours_st` c
JOIN `operations-div-qa.7_dbtCloud.table_key` k
ON c.match = k.match
ORDER BY Date_,user_email asc
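What the image shows is not a concatenation; it's a join. Listing both tables with no join condition pairs every row of one with every row of the other (dates_table and people_table are placeholder names):
SELECT t1.dates AS date_, t2.name AS person
FROM dates_table t1, people_table t2;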
Cross join should work for you:
It pairs every row of one table with every row of the other. Use it when there is no relationship between the tables.
I did not test this, so the syntax may be slightly off.
SELECT c.Date_,k.user_email
FROM `operations-div-qa.7_dbtCloud.calendar_table_hours_st` c
CROSS JOIN `operations-div-qa.7_dbtCloud.table_key` k
ORDER BY Date_,user_email asc
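To see what the cross join produces, here is a minimal sketch run against SQLite through Python's built-in sqlite3 module rather than BigQuery. The table names, columns, and rows are made up for illustration; only the CROSS JOIN itself mirrors the query above.

```python
import sqlite3

# Hypothetical stand-in tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (date_ TEXT);
    CREATE TABLE table_key (user_email TEXT);
    INSERT INTO calendar VALUES ('2023-01-01'), ('2023-01-02'), ('2023-01-03');
    INSERT INTO table_key VALUES ('ann@example.com'), ('bob@example.com');
""")

# CROSS JOIN pairs every calendar row with every user row, so no ON clause
# and no helper "match" column is needed.
rows = conn.execute("""
    SELECT c.date_, k.user_email
    FROM calendar c
    CROSS JOIN table_key k
    ORDER BY c.date_, k.user_email
""").fetchall()

for row in rows:
    print(row)
```

With 3 dates and 2 users the result has 3 x 2 = 6 rows, which is worth keeping in mind before cross joining large tables.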
Related
When I use a JOIN in BigQuery, the query completes, but the result contains extra columns named Id_1 and Date_1 that hold the same information as the key columns. What could cause this? Here is the code.
SELECT
*
FROM
`bellabeat-case-study-373821.bellabeat_case_study.daily_Activity`
JOIN
`bellabeat-case-study-373821.bellabeat_case_study.sleep_day`
ON
`bellabeat-case-study-373821.bellabeat_case_study.daily_Activity`.Id = `bellabeat-case-study-373821.bellabeat_case_study.sleep_day`.Id
AND `bellabeat-case-study-373821.bellabeat_case_study.daily_Activity`.Date = `bellabeat-case-study-373821.bellabeat_case_study.sleep_day`.Date
I made the query and expected the tables to join by the Primary keys of Id and Date, but instead this created two new columns with the same information.
When you use * in the select list the ON variant of a JOIN clause produces all columns from both tables in the result set. If there are columns with the same name on both sides, then both will show up in the result [with slightly different names] as you can see.
You can use the USING variant of the JOIN clause instead. It merges the columns and produces only one result column for each column named in the USING clause, which is probably what you want. See BigQuery - INNER JOIN.
Your query could take the form:
SELECT
*
FROM
`bellabeat-case-study-373821.bellabeat_case_study.daily_Activity`
JOIN
`bellabeat-case-study-373821.bellabeat_case_study.sleep_day`
USING (Id, Date)
Note: USING can only be used when the columns you want to join with have the exact same name. It won't be possible to use it if a column is, for example, called id in one table and employee_id in the other one.
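The difference between ON and USING can be checked by counting result columns. The sketch below uses SQLite via Python's sqlite3 module as a stand-in, with hypothetical two-row tables. SQLite does not rename the repeated columns to Id_1 and Date_1 the way the question describes for BigQuery, but it does return them twice with ON and once with USING.

```python
import sqlite3

# Hypothetical miniature versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_activity (Id INTEGER, Date TEXT, steps INTEGER);
    CREATE TABLE sleep_day (Id INTEGER, Date TEXT, minutes_asleep INTEGER);
    INSERT INTO daily_activity VALUES (1, '2016-04-12', 13162);
    INSERT INTO sleep_day VALUES (1, '2016-04-12', 327);
""")

def column_names(sql):
    """Return the result column names of a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# ON keeps the key columns from both sides.
on_cols = column_names("""
    SELECT * FROM daily_activity a
    JOIN sleep_day s ON a.Id = s.Id AND a.Date = s.Date
""")

# USING merges each named key column into a single output column.
using_cols = column_names("""
    SELECT * FROM daily_activity
    JOIN sleep_day USING (Id, Date)
""")

print(len(on_cols), on_cols)
print(len(using_cols), using_cols)
```

If the key columns had different names in the two tables, listing the wanted columns explicitly in the SELECT instead of using * would be another way to avoid the repeats.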
I am trying to outer join multiple time series tables in PostgreSQL on multiple conditions - which include the date column and several other identifier columns.
However, the tables do not have continuous time series, i.e. some dates are missing for some of the join conditions. Furthermore, I don't want "duplicate" table-specific columns to be added to a row when there is no match.
I have tried COALESCE() on the dates to fill in missing dates, which is fine. However it is the subsequent joins that are causing me problems. I also can't assume that one of the tables will have rows for all the dates required.
I thought perhaps to use generate series for a date range, with empty columns (? if possible) and then join all the tables on to that?
Please see example below:
I want to join Table A and Table B on columns date, identifier_1 and identifier_2 as an outer join. However where a value is not matched I do not want new columns to be added e.g. table_b_identifier_1.
Table A
id1 and id2 are missing rows on the 03/07 and 04/07, and id1 is also missing a row for the 05/07.
Table B
id2 is missing a row on the 02/07
Desired Output:
Essentially it is a conditional join. If there is a row in both tables for identifier_1 and
It's not clear what went wrong with your attempt to use COALESCE to fill in the key columns, but it works as intended in a query like this:
SELECT
COALESCE(a.date, b.date) AS date,
COALESCE(a.identifier_1, b.identifier_1) AS identifier_1,
COALESCE(a.identifier_2, b.identifier_2) AS identifier_2,
a.value_a,
b.value_b
FROM table_a a
FULL JOIN table_b b ON a.date = b.date
AND a.identifier_1 = b.identifier_1
AND a.identifier_2 = b.identifier_2
Please, check a demo
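For a quick local check of the output shape, the sketch below runs the same COALESCE query on SQLite through Python's sqlite3 module instead of PostgreSQL. The sample rows are invented. SQLite only supports FULL JOIN from version 3.39.0, so on older builds the script falls back to an emulation (LEFT JOIN plus the unmatched table_b rows); that fallback is my substitution, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (date TEXT, identifier_1 TEXT, identifier_2 TEXT, value_a INTEGER);
    CREATE TABLE table_b (date TEXT, identifier_1 TEXT, identifier_2 TEXT, value_b INTEGER);
    -- Hypothetical rows: 07-01 exists only in A, 07-03 only in B.
    INSERT INTO table_a VALUES ('2023-07-01', 'id1', 'x', 10), ('2023-07-02', 'id1', 'x', 11);
    INSERT INTO table_b VALUES ('2023-07-02', 'id1', 'x', 20), ('2023-07-03', 'id1', 'x', 21);
""")

KEYS = ("a.date = b.date AND a.identifier_1 = b.identifier_1 "
        "AND a.identifier_2 = b.identifier_2")

# The query from the answer: one set of key columns, filled from either side.
full_join_sql = f"""
    SELECT COALESCE(a.date, b.date) AS date,
           COALESCE(a.identifier_1, b.identifier_1) AS identifier_1,
           COALESCE(a.identifier_2, b.identifier_2) AS identifier_2,
           a.value_a, b.value_b
    FROM table_a a
    FULL JOIN table_b b ON {KEYS}
    ORDER BY 1, 2, 3
"""

# Fallback for SQLite < 3.39: all A rows, plus B rows with no partner in A.
emulated_sql = f"""
    SELECT a.date, a.identifier_1, a.identifier_2, a.value_a, b.value_b
    FROM table_a a LEFT JOIN table_b b ON {KEYS}
    UNION ALL
    SELECT b.date, b.identifier_1, b.identifier_2, NULL, b.value_b
    FROM table_b b
    WHERE NOT EXISTS (SELECT 1 FROM table_a a WHERE {KEYS})
    ORDER BY 1, 2, 3
"""

sql = full_join_sql if sqlite3.sqlite_version_info >= (3, 39, 0) else emulated_sql
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Each date appears once, with NULL (None in Python) in value_a or value_b where that table had no row, and no extra table-specific key columns.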
I am looking for the best way to combine two tables so that duplicate records, identified by email, are removed, with the values from "Table 2" taking priority over any duplicates. I have considered a full outer join and UNION ALL, but the UNION ALL would be too large because each table has several thousand columns. I want to save the combined table as a view and use it as my full reference table, so that I don't have to add a union (or something similar) to my already complex statements each time. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly you want to join two large tables with thousands of columns that (hopefully) are the same between the two tables using the email column as the join condition and replacing duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between tables you can use a full outer join on only_in_table_1 and table_2
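Here is a small runnable sketch of the NOT EXISTS plus UNION ALL pattern above, using SQLite via Python's sqlite3 module. The two-column tables and their rows are hypothetical and assume both tables share the same columns, which SELECT * with UNION ALL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email_field TEXT, plan TEXT);
    CREATE TABLE table_2 (email_field TEXT, plan TEXT);
    -- ann appears in both tables; the table_2 version should win.
    INSERT INTO table_1 VALUES ('ann@example.com', 'old'), ('bob@example.com', 'old');
    INSERT INTO table_2 VALUES ('ann@example.com', 'new'), ('cy@example.com', 'new');
""")

rows = conn.execute("""
    WITH only_in_table_1 AS (
        -- Keep table_1 rows whose email has no counterpart in table_2.
        SELECT *
        FROM table_1 a
        WHERE NOT EXISTS (
            SELECT 1 FROM table_2 b WHERE b.email_field = a.email_field
        )
    )
    SELECT * FROM table_2
    UNION ALL
    SELECT * FROM only_in_table_1
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

Every email appears once: ann comes from table_2, bob only from table_1, and cy only from table_2. Wrapping the query in CREATE VIEW would give the reusable reference view the question asks for.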
Try a FULL OUTER JOIN between the two tables, then apply COALESCE to each result column to decide which table's column populates it.
This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like to write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!
SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).
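The difference between COUNT(parameter_id) and COUNT(DISTINCT parameter_id) shows up as soon as the data contains an exact duplicate row. The sketch below uses SQLite via Python's sqlite3 module instead of SQL Server, with an invented reviews table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (review_id INTEGER, parameter_id INTEGER, deduction_id INTEGER);
    INSERT INTO reviews VALUES
        (114, 1, 50),  -- two different parameters share deduction 50: an error
        (114, 2, 50),
        (115, 3, 60),  -- exact duplicate row, same parameter: acceptable
        (115, 3, 60),
        (116, 4, 70);
""")

def flagged(count_expr):
    """Return (review_id, deduction_id) groups where count_expr exceeds 1."""
    sql = f"""
        SELECT review_id, deduction_id
        FROM reviews
        GROUP BY review_id, deduction_id
        HAVING {count_expr} > 1
        ORDER BY review_id
    """
    return conn.execute(sql).fetchall()

plain = flagged("COUNT(parameter_id)")              # counts every row
distinct = flagged("COUNT(DISTINCT parameter_id)")  # counts unique parameters

print(plain)
print(distinct)
```

The plain count flags review 115 as well, because its duplicated row is counted twice; the DISTINCT version flags only review 114.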
Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example
If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
SQL Fiddle
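The LEFT JOIN ... IS NULL step described above can be tried locally with the sketch below, which runs the same query on SQLite via Python's sqlite3 module. The rows in test are invented; ReviewID 114 is the only one that breaks the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (ReviewID INTEGER, parameter_id INTEGER, deduction_ID INTEGER);
    INSERT INTO test VALUES
        (114, 1, 50), (114, 2, 50),  -- same deduction, different parameters
        (114, 5, 51),                -- fine on its own, but belongs to review 114
        (116, 4, 70),
        (117, 6, 80), (117, 7, 81);
""")

rows = conn.execute("""
    WITH review_errors AS (
        -- ReviewIDs with more than one parameter for the same deduction.
        SELECT ReviewID
        FROM test
        GROUP BY ReviewID, deduction_ID
        HAVING COUNT(DISTINCT parameter_id) > 1
    )
    SELECT t.*
    FROM test t
    LEFT JOIN review_errors r ON t.ReviewID = r.ReviewID
    WHERE r.ReviewID IS NULL       -- keep only rows with no matching error
    ORDER BY t.ReviewID, t.parameter_id
""").fetchall()

for row in rows:
    print(row)
```

All three rows of review 114 are dropped, including the one with deduction 51, because the filter works on whole ReviewIDs rather than on individual rows.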
I have a very large view of 5 million records. Names repeat across rows, but each row has a unique transaction number. I also have another view of 9,000 records containing unique names. I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query is taking very long to run. Is there any short cut method?
Did you try just using an INNER JOIN? This will return the v1 rows whose name also exists in v2:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
If you need help learning JOIN syntax, here is a great visual explanation.
You can add the DISTINCT keyword which will remove any duplicate values that the query returns.
Use a JOIN.
DISTINCT returns only unique records. It matters here because a record in the first view may match more than one row in the other view, and the join would then return it once per match.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
For faster performance, add an index on column NAME on both tables since you are joining through it.
To learn more about joins, see the link below:
Visual Representation of SQL Joins
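As a rough check that the IN and JOIN forms agree, and of when DISTINCT matters, here is a sketch on SQLite via Python's sqlite3 module. Plain tables named v1 and v2 stand in for the views (an assumption made so that indexes can be created directly; with real views the indexes would go on the underlying tables), and the rows are invented. It shows equivalence of results, not timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v1 (txn INTEGER, name TEXT);
    CREATE TABLE v2 (name TEXT);
    INSERT INTO v1 VALUES (1, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'cy');
    INSERT INTO v2 VALUES ('ann'), ('cy');
    -- Index the join column on both sides.
    CREATE INDEX idx_v1_name ON v1 (name);
    CREATE INDEX idx_v2_name ON v2 (name);
""")

IN_SQL = "SELECT * FROM v1 WHERE name IN (SELECT name FROM v2) ORDER BY txn"
JOIN_SQL = "SELECT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY v1.txn"
DISTINCT_SQL = ("SELECT DISTINCT v1.* FROM v1 INNER JOIN v2 "
                "ON v1.name = v2.name ORDER BY v1.txn")

# While names in v2 are unique, IN and INNER JOIN return the same rows.
in_rows = conn.execute(IN_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()

# A repeated name in v2 makes the plain join return matching rows twice;
# DISTINCT collapses them again.
conn.execute("INSERT INTO v2 VALUES ('ann')")
join_dup_rows = conn.execute(JOIN_SQL).fetchall()
distinct_rows = conn.execute(DISTINCT_SQL).fetchall()

print(in_rows)
print(len(join_dup_rows), len(distinct_rows))
```

Since the second view in the question holds unique names, the plain INNER JOIN should already return each row once there, and DISTINCT would only add work.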