How to update Hive table rows - hive

I have a table that looks like this:
id | Col2 | Col3 | Text
--------------------------
1 | ... | ... | "abc"
2 | ... | ... | "def"
3 | ... | ... | "ghi"
4 | ... | ... | "jkl"
And another table that looks like this:
id | Text
-------------
1 | "qwe"
2 | "rty"
And I want to end up with a table that looks like this:
id | Col2 | Col3 | Text
--------------------------
1 | ... | ... | "qwe"
2 | ... | ... | "rty"
3 | ... | ... | "ghi"
4 | ... | ... | "jkl"
where the original values for col2 and col3 are maintained. Essentially, I want to use the values from table 2 to update the values of table 1 where ids are the same.
I tried:
SELECT
    A.id,
    A.Col2,
    A.Col3,
    A.text
FROM table1 AS A
LEFT JOIN (
    SELECT
        id,
        text
    FROM table2
) AS B
ON A.id = B.id
But this just returned the original table. Is there a way to achieve what I want in Presto/Hive?

You are selecting Text from table A; it should come from table B. Use NVL(B.text, A.text) if you want to take the value from table B when a matching row exists and keep the original value otherwise (see the comment in the code):
INSERT OVERWRITE TABLE table1
SELECT
    A.id,
    A.Col2,
    A.Col3,
    NVL(B.text, A.text) AS text -- take text from table B; if there is no match, keep A.text
FROM table1 AS A
LEFT JOIN table2 AS B ON A.id = B.id;
You can use coalesce(B.text, A.text) instead of NVL, as @PiotrFindeisen mentioned; coalesce works in both Presto and Hive.
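To see the LEFT JOIN plus coalesce pattern in action without a Hive or Presto cluster, here is a minimal sketch using Python's built-in sqlite3 module. The sample values for Col2 and Col3 are made up for illustration (the question elides them), and SQLite stands in for Hive/Presto, so the INSERT OVERWRITE step is not shown:

```python
import sqlite3

# In-memory database standing in for the Hive tables (assumption: SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col2 TEXT, col3 TEXT, text TEXT);
    CREATE TABLE table2 (id INTEGER, text TEXT);
    -- col2/col3 values are placeholders invented for this demo
    INSERT INTO table1 VALUES
        (1, 'x1', 'y1', 'abc'), (2, 'x2', 'y2', 'def'),
        (3, 'x3', 'y3', 'ghi'), (4, 'x4', 'y4', 'jkl');
    INSERT INTO table2 VALUES (1, 'qwe'), (2, 'rty');
""")

# Take text from table2 when the id matches, otherwise keep table1's value.
rows = conn.execute("""
    SELECT a.id, a.col2, a.col3, COALESCE(b.text, a.text) AS text
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Rows 1 and 2 pick up the text from table2, while rows 3 and 4 keep their original text because the LEFT JOIN produces NULL for them.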

Related

How Can I make a kind of column by distribution in a column in Hive SQL

I want to make a result in hive like this :
| COL1 | HISTOGRAM |
+------+-----------------------+
| a | {"A":2, "B":2} |
| b | {"C":2, "A":1, "B":1} |
from this table :
| COL1 | COL2 |
+------+------+
| a | A |
| a | B |
| a | A |
| a | B |
| b | A |
| b | B |
| b | C |
| b | C |
I think Presto SQL has something like what I want:
select COL1, histogram(COL2)
from sample_table
group by COL1
You can calculate counts grouped by col1 and col2, then collect the col2:cnt strings into an array using collect_set or collect_list, concatenate the array with a comma as the delimiter, and convert the resulting string to a map using str_to_map.
Demo:
with data_example as (
select stack (8, --number of tuples
    'a','A'
    ,'a','B'
    ,'a','A'
    ,'a','B'
    ,'b','A'
    ,'b','B'
    ,'b','C'
    ,'b','C'
    ) as (COL1,COL2)
)
select col1, str_to_map(concat_ws(',',collect_set(concat(col2,':',cnt)))) histogram
from
(
    select col1, col2, count(*) cnt from data_example group by col1, col2
)s
group by col1
;
Result:
col1 histogram
a {"A":"2","B":"2"}
b {"A":"1","B":"1","C":"2"}
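The same two-step idea (count per col1/col2 pair, then fold the counts into one map per col1) can be sketched outside Hive. This is only an illustration using Python's sqlite3 module. The table name sample_table comes from the question, and the folding step is done in Python because SQLite has no str_to_map:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?)",
    [("a", "A"), ("a", "B"), ("a", "A"), ("a", "B"),
     ("b", "A"), ("b", "B"), ("b", "C"), ("b", "C")],
)

# Step 1: count rows per (col1, col2) pair, as in the inner Hive query.
counts = conn.execute("""
    SELECT col1, col2, COUNT(*) AS cnt
    FROM sample_table
    GROUP BY col1, col2
    ORDER BY col1, col2
""").fetchall()

# Step 2: fold the per-pair counts into one map per col1 value.
histogram = {}
for col1, col2, cnt in counts:
    histogram.setdefault(col1, {})[col2] = cnt

print(histogram)
```

Unlike the Hive result, the counts stay integers here; str_to_map returns a map of strings, which is why the Hive output shows "2" rather than 2.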

Insert value into a table when strings match conditions from another table

I have two tables in a PostgreSQL database. Table2 has an FK from table1's PK. I want to search table1 for specific strings, and if I find matches I want to update a column in table2 with a string.
Table1
+----+------+------+------+
| PK | Col1 | Col2 | Col3 |
+----+------+------+------+
| 1 | A | x | x |
| 2 | x | x | x |
| 3 | x | A | x |
| 4 | x | x | x |
| 5 | x | x | A |
+----+------+------+------+
Table2
+----+-----------------+
| FK | matching_column |
+----+-----------------+
| 1 | string |
| 2 | |
| 3 | string |
| 4 | |
| 5 | string |
+----+-----------------+
So where table1 contains '%A%'
update table2 with 'string'
I'm not sure where to start on this one. Does anyone have a solution?
Use a subquery:
UPDATE table2
SET matching_column = 'string'
WHERE fk IN (SELECT pk
             FROM table1
             WHERE "Col1" LIKE '%A%'
                OR "Col2" LIKE '%A%'
                OR "Col3" LIKE '%A%');
You could use the update ... set ... from syntax. If I followed you correctly, you want:
update table2 t2
set matching_column = 'string'
from table1 t1
where
    t1.pk = t2.fk
    and 'A' in (t1.col1, t1.col2, t1.col3)
This reads as: if the fk of table2 exists in table1 and one of the three columns (col1, col2, col3) equals 'A', then set matching_column in the corresponding record of table2 to 'string'. To match 'A' anywhere inside a value rather than the whole value, use LIKE '%A%' conditions instead of the IN test.
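For a quick check of the subquery variant, here is a small sketch that runs the update against the sample data using Python's sqlite3 module (an assumption for convenience; the question targets PostgreSQL). It uses IN rather than = because the subquery can return more than one pk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pk INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE table2 (fk INTEGER REFERENCES table1(pk), matching_column TEXT);
    INSERT INTO table1 VALUES
        (1, 'A', 'x', 'x'), (2, 'x', 'x', 'x'), (3, 'x', 'A', 'x'),
        (4, 'x', 'x', 'x'), (5, 'x', 'x', 'A');
    INSERT INTO table2 (fk) VALUES (1), (2), (3), (4), (5);
""")

# Update every table2 row whose fk points at a table1 row containing 'A'.
conn.execute("""
    UPDATE table2
    SET matching_column = 'string'
    WHERE fk IN (SELECT pk
                 FROM table1
                 WHERE col1 LIKE '%A%'
                    OR col2 LIKE '%A%'
                    OR col3 LIKE '%A%')
""")

updated = conn.execute(
    "SELECT fk, matching_column FROM table2 ORDER BY fk"
).fetchall()
print(updated)
```

Rows 1, 3 and 5 receive 'string'; rows 2 and 4 are left as NULL, matching the expected Table2 in the question.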

SQL Server select column names from multiple tables

I have three tables in SQL Server with following structure:
col1 col2 a1 a2 ... an,
col1 col2 b1 b2 ... bn,
col1 col2 c1 c2 ... cn
The first two columns, col1 and col2, are the same in all three tables; however, the tables have different lengths.
I need to select the column names of the tables, and the result I'm trying to achieve is the following:
col1, col2, a1, b1, c1, a2, b2, c2 ...
Is there a way to do it?
It's possible, but the values from the three tables are then combined into a single column.
For example
SELECT A.col1 + '/' + B.col1 + '/' + C.col1 AS col1,
       A.col2 + '/' + B.col2 + '/' + C.col2 AS col2,
       a1, b1, c1, a2, b2, c2,
       *
FROM A
INNER JOIN B
    ON A.ID = B.ID
INNER JOIN C
    ON C.ID = B.ID
SQL-Server is not the right tool to create a generic resultset. The engine needs to know what's coming out in advance. Well, you might try to find a solution with dynamic SQL...
I want to suggest two different approaches.
Both would work with any number of tables, as long as all of them have the columns col1 and col2 with appropriate types.
Let's create a simple mockup scenario first:
DECLARE @mockup1 TABLE(col1 INT,col2 INT,SomeMore1 VARCHAR(100),SomeMore2 VARCHAR(100));
INSERT INTO @mockup1 VALUES(1,1,'blah 1.1','blub 1.1')
                          ,(1,2,'blah 1.2','blub 1.2')
                          ,(1,100,'not in t2','not in t2');
DECLARE @mockup2 TABLE(col1 INT,col2 INT,OtherType1 INT,OtherType2 DATETIME);
INSERT INTO @mockup2 VALUES(1,1,101,GETDATE())
                          ,(1,2,102,GETDATE()+1)
                          ,(1,200,200,GETDATE()+200);
--You can add as many tables as you need
A very pragmatic approach:
Try this simple FULL OUTER JOIN:
SELECT *
FROM @mockup1 m1
FULL OUTER JOIN @mockup2 m2 ON m1.col1=m2.col1 AND m1.col2=m2.col2
--add more tables here
The result
+------+------+-----------+-----------+------+------+------------+-------------------------+
| col1 | col2 | SomeMore1 | SomeMore2 | col1 | col2 | OtherType1 | OtherType2 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 1 | blah 1.1 | blub 1.1 | 1 | 1 | 101 | 2019-03-08 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 2 | blah 1.2 | blub 1.2 | 1 | 2 | 102 | 2019-03-09 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| 1 | 100 | not in t2 | not in t2 | NULL | NULL | NULL | NULL |
+------+------+-----------+-----------+------+------+------------+-------------------------+
| NULL | NULL | NULL | NULL | 1 | 200 | 200 | 2019-09-24 10:53:20.257 |
+------+------+-----------+-----------+------+------+------------+-------------------------+
But you will have to deal with non-unique column names... (This is where a dynamically created statement can help.)
A generic approach using container type XML
Whenever you do not know the result in advance, you can pack the result in a container. This allows a clear structure on the side of your RDBMS and shifts the trouble of dealing with this set to the consumer.
The CTE reads all existing pairs of col1 and col2
Each table's row(s) for the pair of values are inserted as XML
Pairs that do not exist in one of the tables show up as NULL in that table's column
Try this out
WITH AllDistinctCol1Col2Values AS
(
    SELECT col1,col2 FROM @mockup1
    UNION ALL
    SELECT col1,col2 FROM @mockup2
    --add all your tables here
)
SELECT col1,col2
      ,(SELECT * FROM @mockup1 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content1
      ,(SELECT * FROM @mockup2 x WHERE c1c2.col1=x.col1 AND c1c2.col2=x.col2 FOR XML PATH('row'),TYPE) AS Content2
FROM AllDistinctCol1Col2Values c1c2
GROUP BY col1,col2;
The result
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| col1 | col2 | Content1 | Content2 |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | <row><col1>1</col1><col2>1</col2><SomeMore1>blah 1.1</SomeMore1><SomeMore2>blub 1.1</SomeMore2></row> | <row><col1>1</col1><col2>1</col2><OtherType1>101</OtherType1><OtherType2>2019-03-08T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 2 | <row><col1>1</col1><col2>2</col2><SomeMore1>blah 1.2</SomeMore1><SomeMore2>blub 1.2</SomeMore2></row> | <row><col1>1</col1><col2>2</col2><OtherType1>102</OtherType1><OtherType2>2019-03-09T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 100 | <row><col1>1</col1><col2>100</col2><SomeMore1>not in t2</SomeMore1><SomeMore2>not in t2</SomeMore2></row> | NULL |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
| 1 | 200 | NULL | <row><col1>1</col1><col2>200</col2><OtherType1>200</OtherType1><OtherType2>2019-09-24T11:03:49.877</OtherType2></row> |
+------+------+-----------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------+
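The container idea is not tied to XML. As a rough sketch of the same steps (collect all col1/col2 pairs, then attach each table's matching rows as a nested container), here is a version using Python's sqlite3 module with lists of dicts in place of the XML columns. SQLite and the fixed date strings are assumptions made to keep the example self-contained; the table names follow the mockup above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript("""
    CREATE TABLE mockup1 (col1 INT, col2 INT, SomeMore1 TEXT, SomeMore2 TEXT);
    CREATE TABLE mockup2 (col1 INT, col2 INT, OtherType1 INT, OtherType2 TEXT);
    INSERT INTO mockup1 VALUES
        (1, 1, 'blah 1.1', 'blub 1.1'),
        (1, 2, 'blah 1.2', 'blub 1.2'),
        (1, 100, 'not in t2', 'not in t2');
    INSERT INTO mockup2 VALUES
        (1, 1, 101, '2019-03-08'),
        (1, 2, 102, '2019-03-09'),
        (1, 200, 200, '2019-09-24');
""")

# Fixed, trusted list of table names (never build SQL from user input like this).
tables = ["mockup1", "mockup2"]

# Step 1: all distinct (col1, col2) pairs across the tables.
pairs_sql = " UNION ".join(f"SELECT col1, col2 FROM {t}" for t in tables)
pairs_sql += " ORDER BY col1, col2"
pairs = [(r["col1"], r["col2"]) for r in conn.execute(pairs_sql)]

# Step 2: per pair, pack each table's matching rows into a container.
result = []
for col1, col2 in pairs:
    entry = {"col1": col1, "col2": col2}
    for t in tables:
        rows = conn.execute(
            f"SELECT * FROM {t} WHERE col1 = ? AND col2 = ?", (col1, col2)
        ).fetchall()
        entry[t] = [dict(r) for r in rows] or None  # None mirrors the NULL cells
    result.append(entry)

for entry in result:
    print(entry)
```

As with the XML version, the consumer has to unpack each container, but the outer result keeps a fixed shape no matter how many columns each table has.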

Oracle SQL statement without duplicates

I have a requirement to write a SQL statement to return 2 columns, however there cannot be duplicates in either of these columns. For example:
|---------------------|------------------|
| 10 | A |
|---------------------|------------------|
| 11 | B |
|---------------------|------------------|
| 12 | C |
|---------------------|------------------|
| 13 | A | <--- Don't return
|---------------------|------------------|
Using distinct doesn't work, since the row highlighted above is distinct. It also doesn't matter which of the duplicates is returned.
Does anyone know of a way to do this? It feels as though I'm missing something obvious.
Thanks.
You can generate a row number partitioned by col2 and keep only the rows where rn = 1.
CREATE TABLE T(
col1 int,
col2 varchar(5)
);
insert into t values (10,'A');
insert into t values (11,'B');
insert into t values (12,'C');
insert into t values (13,'A');
Query 1:
SELECT t1.col1,t1.col2
FROM (
SELECT t1.*,ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col1) rn
FROM T t1
)t1
WHERE t1.rn = 1
Results:
| COL1 | COL2 |
|------|------|
| 10 | A |
| 11 | B |
| 12 | C |
If you just want the lowest value from the first column, do:
SELECT MIN(column1), column2
FROM YourTable
GROUP BY column2
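Both suggestions above can be tried side by side on the sample data. The following is a minimal sketch using Python's sqlite3 module rather than Oracle (an assumption for portability; window functions need SQLite 3.25 or newer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 TEXT);
    INSERT INTO t VALUES (10, 'A'), (11, 'B'), (12, 'C'), (13, 'A');
""")

# Variant 1: number the rows within each col2 value and keep the first one.
by_row_number = conn.execute("""
    SELECT col1, col2
    FROM (
        SELECT col1, col2,
               ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY col1
""").fetchall()

# Variant 2: keep the lowest col1 per col2 with a plain GROUP BY.
by_min = conn.execute("""
    SELECT MIN(col1), col2
    FROM t
    GROUP BY col2
    ORDER BY 1
""").fetchall()

print(by_row_number)
print(by_min)
```

On this data both queries drop the (13, 'A') row. Note that both only guarantee uniqueness in the second column; the first column stays unique here because it was unique in the input.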
This is not possible in one query in general, because each column can have a different number of unique values.

select data from table 1 and get cross reference values of table 2 when table1 column name is matching with row values of column1 in table2

Table 1
id | check | status
1 | abc | 1
2 | def | 3
Table 2
Column1 | rawvalue | to_be_updated_value|
check | abc | new
status | 3 | 3333
Please help me write a SELECT statement that produces the following output in Oracle 11g.
Expected Output:
id | check | status
1 | **new** | 1
2 | def | **3333**
This is the first thing I think you want to do (applies to SQL Server):
SELECT ISC.COLUMN_NAME, T2.*
FROM <YourDatabase>.INFORMATION_SCHEMA.COLUMNS ISC
INNER JOIN Table2 T2
ON T2.Column1 = ISC.COLUMN_NAME
WHERE ISC.TABLE_NAME = N'Table1';
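If the columns to translate are known in advance, the expected output can also be sketched without the metadata lookup, by joining Table 2 once per column and falling back to the original value when no mapping exists. The example below uses Python's sqlite3 module rather than Oracle or SQL Server, and it stores status as text so it can be compared with rawvalue directly; both are assumptions made for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "check" is quoted because CHECK is a reserved word
    CREATE TABLE table1 (id INTEGER, "check" TEXT, status TEXT);
    CREATE TABLE table2 (column1 TEXT, rawvalue TEXT, to_be_updated_value TEXT);
    INSERT INTO table1 VALUES (1, 'abc', '1'), (2, 'def', '3');
    INSERT INTO table2 VALUES ('check', 'abc', 'new'), ('status', '3', '3333');
""")

# One LEFT JOIN per translated column; COALESCE keeps unmapped values as-is.
rows = conn.execute("""
    SELECT t1.id,
           COALESCE(c.to_be_updated_value, t1."check") AS "check",
           COALESCE(s.to_be_updated_value, t1.status)  AS status
    FROM table1 AS t1
    LEFT JOIN table2 AS c
           ON c.column1 = 'check'  AND c.rawvalue = t1."check"
    LEFT JOIN table2 AS s
           ON s.column1 = 'status' AND s.rawvalue = t1.status
    ORDER BY t1.id
""").fetchall()

print(rows)
```

This hard-codes the column names; a fully generic version would need dynamic SQL built from the column metadata, which is where the query above comes in.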