How to transfer data between two tables with different structures? - hive

I want to load data into table 2 using the data contained in table 1. The two tables have the following schemas:
Table 1 :
column A, column B, column C
Table 2 :
column A, column B, column C, column D, column E
The result I want in table 2 is the following :
Table 2 :
A values of Table 1, B values of Table 1, C values of Table 1, NULL (for D values), NULL (for E values)
Is there an HQL command that can do this?

INSERT INTO TABLE2
SELECT A, B, C, NULL, NULL FROM TABLE1;
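For anyone who wants to try the pattern without a Hive cluster, here is a minimal sketch that runs the same INSERT ... SELECT against an in-memory SQLite database through Python's sqlite3 module. SQLite standing in for Hive is an assumption made only to keep the example self-contained, and the names table1, table2 and a..e are illustrative. The target columns are listed explicitly so the statement does not depend on column order.

```python
import sqlite3


def copy_with_null_padding(conn):
    """Copy table1 (a, b, c) into table2 (a, b, c, d, e), filling d and e with NULL."""
    conn.execute(
        "INSERT INTO table2 (a, b, c, d, e) "
        "SELECT a, b, c, NULL, NULL FROM table1"
    )
    return conn.execute("SELECT a, b, c, d, e FROM table2 ORDER BY a").fetchall()


# Illustrative schema and data (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT, c TEXT)")
conn.execute("CREATE TABLE table2 (a INTEGER, b TEXT, c TEXT, d TEXT, e TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [(1, "x", "y"), (2, "p", "q")])

rows = copy_with_null_padding(conn)
print(rows)
```

SQL NULL comes back as Python None, so each copied row ends with two None values.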

Related

Count non-null values from multiple columns at once without manual entry in SQL

I have a SQL table with about 50 columns, the first represents unique users and the other columns represent categories which are scored 1-10.
Here is an idea of what I'm working with
| user | a    | b    | c    |
| abc  | 5    | null | null |
| xyz  | null | 6    | null |
I am interested in counting the number of non-null values per column.
Currently, my queries are:
SELECT col_name, COUNT(col_name) AS count
FROM table
WHERE col_name IS NOT NULL
Is there a way to count non-null values for each column in one query, without having to manually enter each column name?
The desired output would be:
| column | count |
| a      | 1     |
| b      | 1     |
| c      | 0     |
Consider the approach below (no knowledge of the column names is required, with the exception of user):
select column, countif(value != 'null') non_null_count
from your_table t,
unnest(array(
select as struct trim(arr[offset(0)], '"') column, trim(arr[offset(1)], '"') value
from unnest(split(trim(to_json_string(t), '{}'))) kv,
unnest([struct(split(kv, ':') as arr)])
where trim(arr[offset(0)], '"') != 'user'
)) rec
group by column
if applied to sample data in your question - output is
I did this in SQL Server rather than BigQuery, but BigQuery also has UNPIVOT. The idea is to transpose the columns into rows and then run a simple aggregate to see how many records have data in each column. Because UNPIVOT drops rows whose value is NULL, count(*) per column gives the non-null count. My example is below and should work in BigQuery with little or no tweaking.
Here is the table I created:
CREATE TABLE example(
user_name char(3),
a integer,
b integer,
c integer
);
INSERT INTO example(user_name, a, b, c)
VALUES('abc', 5, null, null);
INSERT INTO example(user_name, a, b, c)
VALUES('xyz', null, 6, null);
INSERT INTO example(user_name, a, b, c)
VALUES('tst', 3, 6, 1);
And here is the UNPIVOT I did:
select count(*) as amount, col
from
(select user_name, a, b, c from example) e
unpivot
(blah for col in (a, b, c)
) as unpvt
group by col
Here's an example of the output (note: I added an extra record to the table to make sure it was working properly):
Again, the syntax may be slightly different in BigQuery, but I think this should get you most of the way there.
Here's a link to my db-fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=deaa0e92a4ef1de7d4801e458652816b
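If the database exposes its catalog, another option is to read the column names from the metadata and generate the COUNT list, so nothing is typed by hand. The sketch below does that for SQLite via Python's sqlite3 module and PRAGMA table_info. This is a different technique from the two answers above, SQLite is only a stand-in for BigQuery or SQL Server, and the helper name non_null_counts is hypothetical. It relies on COUNT(col) ignoring NULLs.

```python
import sqlite3


def non_null_counts(conn, table, skip=("user_name",)):
    """Return {column: non-null count} for every column of `table` except `skip`."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")') if r[1] not in skip]
    # COUNT(col) counts only the rows where col IS NOT NULL.
    select_list = ", ".join(f'COUNT("{c}")' for c in cols)
    counts = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(cols, counts))


# The two sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (user_name TEXT, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [("abc", 5, None, None), ("xyz", None, 6, None)],
)

result = non_null_counts(conn, "example")
print(result)
```

Unlike the UNPIVOT version, a column that is entirely NULL still shows up here, with a count of 0.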

Updating a list of data in a normalised table postgres

I have two tables. One is table A which contains an id. Table B is a normalised table that contains a foreign key to table A and some other column called value.
e.g.
Table B
| id | fk| value
Table A
|pk| ... |
Basically I have a list (of length n) of values that I want to insert into table B for a single foreign key, e.g. list = [a, b, c, d], key = 1. The problem is that table B might already contain some of these values, so I only want to insert the ones that aren't already in the table, and delete the ones that aren't in my list.
list = [a, b, c, d], key = 1
table B
| id |fk | value
| 1 | 1 | a
| 2 | 1 | b
| 3 | 1 | e
Is there a way that I can insert only c and d from the list into the table and delete e from the table in one statement? My current attempt is to delete every entry that matches the key and then insert them all, but I don't think this is an efficient way to do it.
Why not just truncate b and insert the new values?
truncate table b;
insert into b (fk, value)
<your list here>;
Or if key is a column in b and you want to delete all keys with a given value:
delete from b where key = 1;
insert into b (fk, value, key)
<your list here with "1" for the key>
This doesn't preserve the id column from b, but your question does not mention that as being important.
An alternative method would use CTEs:
with data(fk, value) as (
<your list here>
),
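d as (
-- restrict the delete to the keys present in the list, so rows for other keys are left alone
delete from b
where b.fk in (select fk from data)
and (b.fk, b.value) not in (select d.fk, d.value from data d)
)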
insert into b (fk, value)
select d.fk, d.value
from data d
where (d.fk, d.value) not in (select b.fk, b.value from b);
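To make the delete-extras / insert-missing idea concrete, here is a rough sketch against an in-memory SQLite database using Python's sqlite3 module. SQLite does not support data-modifying CTEs, so this version uses two statements inside one transaction instead of a single statement. The helper name sync_values and the sample rows are assumptions for illustration only.

```python
import sqlite3


def sync_values(conn, fk, values):
    """Make the rows of b for one fk match `values`: delete extras, insert missing."""
    values = list(values)
    with conn:  # one transaction: both steps commit together or roll back together
        if values:
            marks = ", ".join("?" for _ in values)
            conn.execute(
                f"DELETE FROM b WHERE fk = ? AND value NOT IN ({marks})",
                [fk, *values],
            )
        else:
            conn.execute("DELETE FROM b WHERE fk = ?", [fk])
        # Insert only the values that are not already stored for this fk.
        conn.executemany(
            "INSERT INTO b (fk, value) SELECT ?, ? "
            "WHERE NOT EXISTS (SELECT 1 FROM b WHERE fk = ? AND value = ?)",
            [(fk, v, fk, v) for v in values],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, fk INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO b (id, fk, value) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (3, 1, "e"), (4, 2, "e")],  # last row belongs to another key
)

sync_values(conn, 1, ["a", "b", "c", "d"])
after = conn.execute("SELECT fk, value FROM b ORDER BY fk, value").fetchall()
kept_ids = conn.execute(
    "SELECT id FROM b WHERE fk = 1 AND value IN ('a', 'b') ORDER BY id"
).fetchall()
print(after, kept_ids)
```

The existing a and b rows keep their ids, e is removed for key 1 only, and the row for key 2 is untouched.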

Insert value from two tables based on the rows source table

I have two tables, A and B, each with one column for simplicity's sake; that column is the primary key.
A contains the values (1,2,3) B contains (1,2,3)
The third table should contain the rows of both A and B and has a composite primary key.
Table C (id, src)
If the id comes from table A, I'd like src to be 'A', and if it comes from B, then 'B'.
There can be duplicate IDs between the tables, but they are not the same item, which is why I need to create a composite key based on which table the row comes from.
I've tried
Insert into C (anID, src)
Select
Case when (A.anID is not null)
then A.anID else B.anID end,
case when (A.anID is not null)
then 'A' else 'B' end
from
A,
B
But my results always end up as just 3 rows, (1, A), (2, A), (3, A), when there should be 6 rows (one of each of those with a B as well).
insert into TableC (id, src)
select ID, 'A' from tableA
union
select ID, 'B' from tableB
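The attempt in the question fails because the comma join pairs every row of A with every row of B, and A.anID is never NULL in those pairs, so the CASE always takes the 'A' branch. Stacking the two tables avoids the join entirely. Below is a small sketch of that using Python's sqlite3 module with an in-memory database. The table names table_a, table_b and table_c are illustrative, and UNION ALL is used instead of UNION on the assumption that the literal src value already makes every row distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table_c (id INTEGER, src TEXT, PRIMARY KEY (id, src))")
conn.executemany("INSERT INTO table_a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO table_b VALUES (?)", [(1,), (2,), (3,)])

# Tag each row with its source table, then stack the two result sets.
conn.execute(
    "INSERT INTO table_c (id, src) "
    "SELECT id, 'A' FROM table_a "
    "UNION ALL "
    "SELECT id, 'B' FROM table_b"
)

rows = conn.execute("SELECT id, src FROM table_c ORDER BY src, id").fetchall()
print(rows)
```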

Excel like calculations in SQL

In Excel, if I have, say, the numbers 1, 2 and 3 in columns A, B and C, I can write a formula in column D "=A+B" and then a formula in column E "=D+C".
Basically, I can use the result of a calculated column in the same row.
Can I achieve something similar in SQL with a single query?
For example, something like
SELECT A, B, C, A+B as D, D+C as E
FROM TABLE1
Result: 1, 2, 3, 3, 6
You can use computed columns when creating the table:
CREATE TABLE tbl(id int, A int, B int, C int, D as A+B, E as A + B + C);
insert tbl(A, B, C) values (1, 2, 3)
Or use
SELECT A, B, C, A+B as D, A+B+C as E
FROM TABLE1
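A column alias cannot be referenced by another expression in the same SELECT list, which is why D+C fails in the question. If repeating A+B is undesirable, one common workaround (not mentioned in the answer above) is to compute D in a CTE or derived table and build E on top of it. Here is a minimal sketch using Python's sqlite3 module. The CTE name step and the sample row are illustrative, and SQLite is assumed only for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("INSERT INTO table1 VALUES (1, 2, 3)")

# The CTE computes d once; the outer SELECT can then refer to d by name.
rows = conn.execute(
    """
    WITH step AS (
        SELECT a, b, c, a + b AS d
        FROM table1
    )
    SELECT a, b, c, d, d + c AS e
    FROM step
    """
).fetchall()
print(rows)
```

This returns the row 1, 2, 3, 3, 6 from the question while writing each formula only once.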

sql - select row id based on two column values in same row as id

Using a SELECT, I want to find the ID of a row in a three-column table (each value is unique/dissimilar and is populated from separate tables). Only the ID is auto-incremented.
I have a middle table I reference that has 3 values: ID, A, B.
A is based on data from another table.
B is based on data from another table.
How can I select the row ID when I only know the value of A and B, and A and B are not the same value?
Do you mean that columns A and B are foreign keys?
Does this work?
SELECT [ID]
FROM tbl
WHERE A = @a AND B = @b
SELECT ID FROM table WHERE A=value1 and B=value2
It's not very clear. Do you mean this:
SELECT ID
FROM middletable
WHERE A = knownA
AND B = knownB
Or this?
SELECT ID
FROM middletable
WHERE A = knownA
AND B <> A
Or perhaps "I know A" means you have a list of values for A, which come from another table?
SELECT ID
FROM middletable
WHERE A IN
( SELECT otherA FROM otherTable ...)
AND B IN
( SELECT otherB FROM anotherTable ...)
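For the simplest reading of the question (both values are known), the lookup is an equality match on A and B with the known values passed as bound parameters. Below is a rough sketch using Python's sqlite3 module. The table name middletable follows the answer above, while the helper name find_id and the sample rows are made up for illustration.

```python
import sqlite3


def find_id(conn, a, b):
    """Return the id of the row whose (a, b) pair matches, or None if there is none."""
    row = conn.execute(
        "SELECT id FROM middletable WHERE a = ? AND b = ?",  # ? placeholders are bound safely
        (a, b),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE middletable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO middletable (a, b) VALUES (?, ?)",
    [(10, 20), (10, 30), (11, 20)],
)

found = find_id(conn, 10, 30)
missing = find_id(conn, 99, 99)
print(found, missing)
```

If the (A, B) pair is not guaranteed to be unique, fetchall() would be the safer call, since fetchone() returns only the first match.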