I have tables built on Hadoop. These are Impala tables (not Kudu).
Issue: I have to update a few column values (e.g. load_date, fraud_type) in the final_up_2 table from the ulti_up_2 table, for a set of keys (dw, auth, ulti_date).
I have the below-mentioned use case:
Table 1 :
create table dbo.ulti_up_2 (
dw string
,auth int
,ulti_date string
,load_date string
,fraud_type string
);
insert into dbo.ulti_up_2
values ('b',1,'2021-07-25','2021-07-27','x'),
('c',0,'2021-07-25','2021-07-27','y');
Table 2:
create table dbo.final_up_2 (id int,auth_date string,dw string,auth int,ulti_date string,load_date string,fraud_type string);
insert into dbo.final_up_2 values
(1,'2021-07-24','a',1,'2021-07-25','2021-07-25','p'),
(2,'2021-07-24','b',1,'2021-07-25','2021-07-25','q'),
(3,'2021-07-24','c',0,'2021-07-25','2021-07-25','t'),
(4,'2021-07-24','d',1,'2021-07-25','2021-07-25','r');
create table dbo.refresh_table1 as
select df_prep.id,df_prep.auth_date,df_prep.dw,df_prep.auth,df_prep.ulti_date,
ulti_prep.fraud_type,ulti_prep.load_date
from dbo.final_up_2 df_prep
left join
dbo.ulti_up_2 ulti_prep
on df_prep.dw=ulti_prep.dw and
df_prep.auth=ulti_prep.auth and
df_prep.ulti_date=ulti_prep.ulti_date;
Output I am getting:
id|auth_date|dw|auth|ulti_date|fraud_type|load_date
(1,'2021-07-24','a',1,'2021-07-25',NULL,NULL),
(2,'2021-07-24','b',1,'2021-07-25','x','2021-07-27'),
(3,'2021-07-24','c',0,'2021-07-25','y','2021-07-27'),
(4,'2021-07-24','d',1,'2021-07-25',NULL,NULL);
Output I need:
id|auth_date|dw|auth|ulti_date|fraud_type|load_date
(1,'2021-07-24','a',1,'2021-07-25','p','2021-07-25'),
(2,'2021-07-24','b',1,'2021-07-25','x','2021-07-27'),
(3,'2021-07-24','c',0,'2021-07-25','y','2021-07-27'),
(4,'2021-07-24','d',1,'2021-07-25','r','2021-07-25');
Thanks in Advance. Please Help.
This is because the left join with ulti_up_2 finds no match for some rows. If you handle those cases, you should get the expected data.
create table dbo.refresh_table1 as
select df_prep.id,df_prep.auth_date,df_prep.dw,df_prep.auth,df_prep.ulti_date,
ifnull(ulti_prep.fraud_type,df_prep.fraud_type) as fraud_type , --This will fetch data from final_up_2 in case left join fails.
ifnull(ulti_prep.load_date,df_prep.load_date) as load_date --This will fetch data from final_up_2 in case left join fails.
from dbo.final_up_2 df_prep
left join
dbo.ulti_up_2 ulti_prep
on df_prep.dw=ulti_prep.dw and
df_prep.auth=ulti_prep.auth and
df_prep.ulti_date=ulti_prep.ulti_date;
I am fairly new to SQL. What I am trying to do is create a view from an existing table. I also need to add a new column to the view which maps to the values of an existing column in the table.
So within the view, if the value in a row for Col_1 = A, then the value in the corresponding row for New_Col = C, and so on.
Does this even make sense? Would I use the CASE clause? Is mapping in this way even possible?
Thanks
The best way to do this is to create a mapping or lookup table.
For example, consider the following LOOKUP table:
COL_A NEW_VALUE
---- -----
A C
B D
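If it helps, here is a minimal sketch of creating and populating such a lookup table (the names LOOKUP, COL_A and NEW_VALUE are just illustrative):
CREATE TABLE LOOKUP (
    COL_A     VARCHAR(10),
    NEW_VALUE VARCHAR(10)
);
INSERT INTO LOOKUP (COL_A, NEW_VALUE)
VALUES ('A', 'C'),
       ('B', 'D');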
Then you can have a query like this:
SELECT A.*, LOOK.NEW_VALUE
FROM TABLEA AS A
JOIN LOOKUP AS LOOK ON A.COL_A = LOOK.COL_A
This is what DimaSUN is doing in his query too -- but in his case he is creating the table dynamically in the body of the query.
Also note, I'm using a JOIN (which is an inner join), so only rows that have a match in the lookup table will be returned; this could filter the results. A LEFT JOIN there would return all data from A, but the new column might be null for some rows.
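For completeness, the LEFT JOIN variant (same hypothetical names as above) would be:
SELECT A.*, LOOK.NEW_VALUE
FROM TABLEA AS A
LEFT JOIN LOOKUP AS LOOK ON A.COL_A = LOOK.COL_A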
Generally, a view is a stored query over a table: it reflects the underlying table's data without altering the original table. So, as per your question, you can derive the new column in a view by using CASE:
CREATE VIEW viewname AS
SELECT a.*,
       CASE WHEN a.Col_1 = 'A' THEN 'C'
            WHEN a.Col_1 = 'B' THEN 'D'
            ELSE NULL
       END AS New_Col
FROM (SELECT * FROM your_table) a
If you have a restricted list of replacement values, you can hardcode that list in the query:
select T.*,map.New_Col
from ExistingTable T
left join (
values
('A','C')
,('B','D')
) map (Col_1,New_Col) on map.Col_1 = T.Col_1
In this sample, 'A' is mapped to 'C' and 'B' to 'D'.
In the general case, you are better off using an additional mapping table (see Hogan's answer).
I have 2 BigQuery tables with nested columns. I need to update all the columns in table1 whenever table1.value1 = table2.value; both tables also hold a huge amount of data.
I could update a single nested column with a static value like below:
#standardSQL
UPDATE `ck.table1`
SET promotion_id = ARRAY(
SELECT AS STRUCT * REPLACE (100 AS PromotionId ) FROM UNNEST(promotion_id)
)
WHERE true
But when I try to reuse the same approach to update multiple columns based on table2 data, I get exceptions.
I am trying to update table1 with table2 data whenever the table1.value1=table2.value with all the nested columns.
As of now, both tables are having a similar schema.
I need to update all the columns in table1 whenever table1.value1=table2.value
... both tables are having a similar schema
I assume by similar you meant same
Below is for BigQuery Standard SQL
You can use the below query to get the combined result and save it back to table1, either via a destination table or the CREATE OR REPLACE TABLE syntax (sketched after the query):
#standardSQL
SELECT AS VALUE IF(value IS NULL, t1, t2)
FROM `project.dataset.table1` t1
LEFT JOIN `project.dataset.table2` t2
ON value1 = value
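A minimal sketch of saving it back in place with CREATE OR REPLACE TABLE (assuming the same project.dataset names as above, and that rewriting table1 is acceptable) would be:
#standardSQL
CREATE OR REPLACE TABLE `project.dataset.table1` AS
SELECT AS VALUE IF(value IS NULL, t1, t2)
FROM `project.dataset.table1` t1
LEFT JOIN `project.dataset.table2` t2
ON value1 = value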
I have not tried this approach with UPDATE syntax - but you can try and let us know :o)
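For reference, an untested sketch of what that UPDATE ... FROM form might look like (promotion_id is only illustrative, and each nested column you want to refresh would need its own SET entry):
#standardSQL
UPDATE `project.dataset.table1` t1
SET promotion_id = t2.promotion_id
FROM `project.dataset.table2` t2
WHERE t1.value1 = t2.value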
I am using Postgresql db. I have data in two tables. Table A has 10 records and Table B 5 records.
I would like to copy Table A data to Table B but only copy the new entries (5 records) and ignore the duplicates/already existing data
I would like to copy data from Table A to Table B so that Table B ends up with 10 records (5 old records + 5 new records from Table A).
Can you please help me as to how can this be done?
Assuming id is your primary key and the table structures are identical (both tables have the same columns, in number and data type), use NOT EXISTS:
insert into TableB
select *
from TableA a
where not exists ( select 0 from TableB b where b.id = a.id )
If you are looking to copy rows unique to A that are not in B, then you can use INSERT...SELECT. The SELECT statement should use the set operator EXCEPT:
INSERT INTO B (column)
SELECT column FROM A
EXCEPT
SELECT column FROM B;
EXCEPT (https://www.postgresql.org/docs/current/queries-union.html) compares the two result sets and returns the distinct rows present in result A but not in B, and these values are then supplied to INSERT. For this to work, both the columns and their respective datatypes must match in the two SELECT queries and your INSERT.
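Applied to the tables in the question (assuming Table A and Table B share the same full column list), the same pattern can be written over all columns:
INSERT INTO B
SELECT * FROM A
EXCEPT
SELECT * FROM B;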
INSERT INTO Table_B
SELECT *
FROM Table_A
ON CONFLICT DO NOTHING;
Here, the conflict is detected based on your primary key (or any other unique constraint on Table_B).
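If you prefer to name the conflict target explicitly, a sketch assuming id is the primary key would be:
INSERT INTO Table_B
SELECT *
FROM Table_A
ON CONFLICT (id) DO NOTHING;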
In SQL something like
SELECT count(id), sum(if(column1 = 1, 1, 0)) from groupedTable
could be formulated to perform a count of the total records as well as filtered records in a single pass.
How can I perform this in spark-data-frame API? i.e. without needing to join back one of the counts to the original data frame.
Just use count for both cases:
df.select(count($"id"), count(when($"column1" === 1, true)))
If column is nullable you should correct for that (for example with coalesce or IS NULL, depending on the desired output).
You can try using Spark with Hive, as Hive supports SQL's sum(if()) functionality.
First you need to create a Hive table on top of your data using the below code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("Hive_Test")
val sc = new SparkContext(conf)
// Creation of Hive context
val hsc = new HiveContext(sc)
import hsc.implicits._

hsc.sql("CREATE TABLE IF NOT EXISTS emp (id INT, name STRING)")
hsc.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/test.txt' INTO TABLE emp")
hsc.sql("""select count(id), SUM(v)
from (
select id, IF(name=1, count(*), 0) AS v
from emp
where id>0
group by id,name
) t2""")