Creating a table out of two tables in SQL - sql

I'm trying to create a table based off of two tables. For example, I have a column in one table called Customer_ID and a column in another table called Debit_Card_Number. How can I make it so I can get the Customer_ID column from one table and the Debit_card_number from the other table and make a table? Thanks

Assuming the two tables are named TableOne and TableTwo and Customer_ID is the attribute common to both:
CREATE TABLE NEW_TABLE_NAME AS (
SELECT
TableOne.Customer_ID,
TableTwo.Debit_Card_Number
FROM
TableOne,
TableTwo
WHERE
TableOne.Customer_ID = TableTwo.Customer_ID
)

Look into using a join. A left join returns every customer id, even if there isn't a matching card number for that id; an inner join returns only the ids that have a match. The value to match on will probably be the customer id, assuming that column also exists in the table with the card number.
create table joined_table as (
select t1.customer_id, t2.debit_card_number
from t1
left join t2
on t1.matchValue = t2.matchValue
)
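The difference between the two join types in the answers above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database; the table names `customers` and `cards` and the sample rows are made up for illustration, and `CREATE TABLE ... AS SELECT` syntax may differ slightly in other database systems.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE cards (customer_id INTEGER, debit_card_number TEXT);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO cards VALUES (1, '4000-0001'), (2, '4000-0002');

    -- LEFT JOIN keeps every customer, even those without a card.
    CREATE TABLE joined_left AS
    SELECT c.customer_id, k.debit_card_number
    FROM customers c
    LEFT JOIN cards k ON c.customer_id = k.customer_id;

    -- INNER JOIN keeps only customers that have a matching card.
    CREATE TABLE joined_inner AS
    SELECT c.customer_id, k.debit_card_number
    FROM customers c
    INNER JOIN cards k ON c.customer_id = k.customer_id;
""")

left_rows = conn.execute(
    "SELECT * FROM joined_left ORDER BY customer_id").fetchall()
inner_rows = conn.execute(
    "SELECT * FROM joined_inner ORDER BY customer_id").fetchall()
print(left_rows)
print(inner_rows)
```

Customer 3 has no card, so it appears in `joined_left` with a NULL card number and is absent from `joined_inner`.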

Related

New table from two table with max(timestamp) - Bigquery SQL

I have two tables that share the combination of retailer and id. I need to create a new table containing every retailer + id combination from the first table, together with the data for that combination from the second table that has the latest timestamp.
The first table has only one record for each retailer, id combination, but the second table has multiple records for each combination, one for each time it was scraped. The new table should hold the latest-timestamp data for each combination.
(input table 1, input table 2, and the output table are not shown)
This is basically aggregation and join:
select *
from table1 t1 left join
(select t2.retailer, t2.id, max(t2.timestamp) as max_timestamp
from table2 t2
group by t2.retailer, t2.id
) t2
using (retailer, id);
If you want the entire most recent row, you can use a variant of this:
select *
from table1 t1 left join
(select ( array_agg(t2 order by timestamp desc limit 1) )[safe_ordinal(1)].*
from table2 t2
group by t2.retailer, t2.id
) t2
using (retailer, id);
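For readers without BigQuery at hand, the "latest row per retailer + id" idea can be sketched in a portable form: aggregate to find the maximum timestamp per key, then join back to fetch the full row. The sketch below runs on SQLite through Python's sqlite3 module; the columns `name`, `price`, and `ts` and all sample rows are assumptions for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (retailer TEXT, id INTEGER, name TEXT);
    CREATE TABLE table2 (retailer TEXT, id INTEGER, price REAL, ts TEXT);
    INSERT INTO table1 VALUES
        ('A', 1, 'widget'), ('B', 1, 'gadget'), ('C', 9, 'orphan');
    INSERT INTO table2 VALUES
        ('A', 1, 10.0, '2020-01-01'),
        ('A', 1, 12.0, '2020-02-01'),
        ('B', 1, 7.5,  '2020-01-15');
""")

latest_rows = conn.execute("""
    SELECT t1.retailer, t1.id, t1.name, t2.price, t2.ts
    FROM table1 t1
    -- Step 1: latest timestamp per (retailer, id).
    LEFT JOIN (
        SELECT retailer, id, MAX(ts) AS max_ts
        FROM table2
        GROUP BY retailer, id
    ) m ON m.retailer = t1.retailer AND m.id = t1.id
    -- Step 2: fetch the full row that carries that timestamp.
    LEFT JOIN table2 t2
        ON t2.retailer = m.retailer AND t2.id = m.id AND t2.ts = m.max_ts
    ORDER BY t1.retailer, t1.id
""").fetchall()
print(latest_rows)
```

Because both joins are left joins, a combination that exists only in the first table (here `('C', 9)`) is kept with NULLs. If two rows share the same maximum timestamp for a key, this form returns both of them.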

Replacing the values in a column in select query

I have a table where one of the columns holds ids that are a foreign key to another table. How can I replace the ids with data from the other table in a select query?
You use a join:
select t1.column_one, t2.display_value
from table_one t1
join table_two t2 on t1.fk_column_to_table_two = t2.primary_key_column;
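A small runnable sketch of that join, using Python's sqlite3 module as a stand-in database. The table and column names follow the answer's placeholders, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_two (
        primary_key_column INTEGER PRIMARY KEY,
        display_value TEXT
    );
    CREATE TABLE table_one (
        column_one TEXT,
        fk_column_to_table_two INTEGER REFERENCES table_two (primary_key_column)
    );
    INSERT INTO table_two VALUES (1, 'red'), (2, 'blue');
    INSERT INTO table_one VALUES ('apple', 1), ('sky', 2), ('cherry', 1);
""")

# The join swaps each foreign-key id for the display value it points to.
fk_rows = conn.execute("""
    SELECT t1.column_one, t2.display_value
    FROM table_one t1
    JOIN table_two t2
        ON t1.fk_column_to_table_two = t2.primary_key_column
    ORDER BY t1.column_one
""").fetchall()
print(fk_rows)
```

If some rows may have a NULL or dangling foreign key and should still appear in the result, a left join would be the variant to try instead.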

How to map each distinct value of a column in one table with each distinct value of a column in another table in Hive

I have two tables in Hive, Table1 and Table2. I want to get each distinct customerID in Table1 and map it to each distinct value in a column called category in Table2, but I am a bit lost on how to do this in Hive. A better example of what I am trying to do is the following: let's say Table1 contains 5 distinct customerIDs and Table2 contains 3 distinct categories. I want my query result to look something like the following:
However, Table1 and Table2 do not have any columns in common, so I am not sure how to perform a join on these two tables in Hive. Is this task possible in Hive? Any insights on this would be greatly appreciated!
You can do that with a cross join of the distinct values from both tables.
select t1.customerid, t2.category
from (select distinct customerid from Table1) t1
cross join (select distinct category from Table2) t2;
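The cross join of distinct values can be tried out on a toy data set. This sketch uses SQLite through Python's sqlite3 module rather than Hive, and the sample rows are made up; the point is only that every distinct customer id gets paired with every distinct category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (customerid INTEGER);
    CREATE TABLE Table2 (category TEXT);
    -- Duplicates on purpose, to show that DISTINCT removes them first.
    INSERT INTO Table1 VALUES (1), (1), (2);
    INSERT INTO Table2 VALUES ('x'), ('y'), ('x');
""")

pairs = conn.execute("""
    SELECT t1.customerid, t2.category
    FROM (SELECT DISTINCT customerid FROM Table1) t1
    CROSS JOIN (SELECT DISTINCT category FROM Table2) t2
    ORDER BY t1.customerid, t2.category
""").fetchall()
print(pairs)
```

With 2 distinct customer ids and 2 distinct categories the result has 2 x 2 = 4 rows; with the 5 ids and 3 categories from the question it would have 15.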

How to find the common values from two different tables having a common column

I have two tables table1 and table2. Both tables have a common column named city.
How do I find all values under city that are in both tables?
You can do an inner join on the city column, to find values that exist in both tables.
select
-- Output the city from either table (since it will be the same)
t1.city
from
-- Join table1 and table2 together, on a matching city column
table1 t1 join table2 t2 on (t1.city=t2.city)
group by
-- Only return a single row per city
t1.city
SELECT tbone.desired_column1,
tbone.desired_column2,
-- other columns from table one
tbtwo.desired_column1,
tbtwo.desired_column2
-- other columns from table two
-- Below, tbone and tbtwo are aliases for the tables, so that you don't have to keep typing the full table names above and below. An alias can be anything, such as A or B or HORSECORRECTINGBATTERY
FROM table1 tbone,
table2 tbtwo
WHERE tbone.city = tbtwo.city
If you don't want to specify which columns to take, just use
SELECT * FROM ...
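As a quick check of the approaches above, the following sketch finds the cities present in both tables in two ways: an inner join with one row per city, and the set operator INTERSECT, which is a swapped-in alternative not mentioned in the answers. It runs on SQLite via Python's sqlite3 module, and the city names are sample data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (city TEXT);
    CREATE TABLE table2 (city TEXT);
    INSERT INTO table1 VALUES ('Paris'), ('Rome'), ('Paris'), ('Oslo');
    INSERT INTO table2 VALUES ('Rome'), ('Paris'), ('Lima');
""")

# Inner join, grouped so that each common city is returned once.
via_join = [row[0] for row in conn.execute("""
    SELECT t1.city
    FROM table1 t1
    JOIN table2 t2 ON t1.city = t2.city
    GROUP BY t1.city
    ORDER BY t1.city
""")]

# INTERSECT returns the distinct values found in both queries.
via_intersect = [row[0] for row in conn.execute("""
    SELECT city FROM table1
    INTERSECT
    SELECT city FROM table2
    ORDER BY city
""")]

print(via_join)
print(via_intersect)
```

Without the GROUP BY (or a DISTINCT), the join would return 'Paris' twice here, because it appears twice in `table1`.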

Join two tables using HiveQL

These are the two tables:
-- Table1 is the MAIN table against which comparisons need to be made
CREATE EXTERNAL TABLE IF NOT EXISTS Table1
(
ITEM_ID BIGINT,
CREATED_TIME STRING,
BUYER_ID BIGINT
)
CREATE EXTERNAL TABLE IF NOT EXISTS Table2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)
BUYER_ID and USER_ID are the same thing.
I need to find all the BUYER_IDs in Table1 that are not present in Table2, along with the total count. I think this calls for a left outer join. I am new to HiveQL, so I am having trouble figuring out the actual syntax. I wrote the SQL query below. Can anyone tell me whether it achieves this?
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
GROUP BY BUYER_ID;
If I understand your requirements correctly, I think you are almost there. It seems you only need to add a condition checking if there's no match between the two tables:
SELECT COUNT(BUYER_ID), BUYER_ID
FROM Table1 dw
LEFT OUTER JOIN Table2 dps ON (dw.BUYER_ID = dps.USER_ID)
WHERE dps.USER_ID IS NULL
GROUP BY BUYER_ID;
The above will filter out BUYER_IDs that do have matches in Table2, and will show the remaining BUYER_IDs and their corresponding count values. (Well, that's what I understand you want.)
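The left-join-plus-IS-NULL pattern (often called an anti-join) can be verified on toy data. This sketch runs on SQLite via Python's sqlite3 module instead of Hive, leaves out the PURCHASED_ITEM array column because it plays no part in the comparison, and uses invented sample rows. The second query, which counts the distinct missing buyers, is an added assumption about what "total COUNT" means in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ITEM_ID INTEGER, CREATED_TIME TEXT, BUYER_ID INTEGER);
    CREATE TABLE Table2 (USER_ID INTEGER);
    INSERT INTO Table1 VALUES
        (100, '2012-07-01', 1), (101, '2012-07-02', 1),
        (102, '2012-07-03', 2),
        (103, '2012-07-04', 3), (104, '2012-07-05', 3), (105, '2012-07-06', 3);
    INSERT INTO Table2 VALUES (1);
""")

# Buyers from Table1 with no matching USER_ID in Table2, with row counts.
missing = conn.execute("""
    SELECT dw.BUYER_ID, COUNT(dw.BUYER_ID)
    FROM Table1 dw
    LEFT OUTER JOIN Table2 dps ON dw.BUYER_ID = dps.USER_ID
    WHERE dps.USER_ID IS NULL
    GROUP BY dw.BUYER_ID
    ORDER BY dw.BUYER_ID
""").fetchall()

# How many distinct buyers are missing overall.
total_missing = conn.execute("""
    SELECT COUNT(DISTINCT dw.BUYER_ID)
    FROM Table1 dw
    LEFT OUTER JOIN Table2 dps ON dw.BUYER_ID = dps.USER_ID
    WHERE dps.USER_ID IS NULL
""").fetchone()[0]

print(missing)
print(total_missing)
```

Buyer 1 exists in `Table2` and is filtered out; buyers 2 and 3 remain, with 1 and 3 rows respectively.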