Hive - joining two tables to find string that like string in reference table

Hive - joining two tables to find string that like string in reference table - hive

I stumbled in a case where requires to mask data using keyword from other reference table, illustrated below:
1:
Table A contains thousands of keyword and Table B contains 15 millions ++ row for each day processing..
How can I replace data in table B using keyword in table A in new column?
I tried to use join but join can only match when the string exactly the same
Here is my code
select
sourcetype, hourx,minutex,t1.adn,hostname,t1.appsid,t1.taskid,
product_id,
location,
smsIncoming,
case
when smsIncoming regexp keyword = true then keyword
else 'undef' end smsIncoming_replaced
from(
select ... from ...
)t1
left join
(select adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount,max(datex)
from ( select adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount,datex ,last_update,
max(last_update) over (partition by keyword) as last_modified
from sqm_stg.reflex_service_map ) as sub
where last_update = last_modified
group by adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount)t2
on t1.adn=t2.adn and t1.appsid=t2.appsid and t1.taskid=t2.taskid
Need advice :)
Thanks

Use instr(string str, string substr) function: Returns the position of the first occurrence of substr in str. Returns null if either of the arguments are null and returns 0 if substr could not be found in str. Be aware that this is not zero based. The first character in str has index 1.
case
when instr(smsIncoming,keyword) >0 then keyword
else 'undef'
end smsIncoming_replaced

Related

How can reiterate a certain statement in select in oracle

There is a column of text with JSON expressions in it. It is not at all clear how long the JSON array is, and in the example code below I have repeated the phrase up to six times (it can be more than six repetitions). How can I repeat a duplicate (case when) based on the longest array length?
I also want to specify the column names with the variables d_i and a_i (here i is the counter).
Can I use a while or loop? If Yes, HOW?
Note: If in any row, the first value in the JSON expression is not greater than 0, then the length of the JSON array in that row is zero, and this continues until the end of the representation. This means that if the first cell of the JSON array has a value, the second cell may have a value, and if the second cell has no value, then the length of the array is definitely 1.
If this condition occurs, the loop must start again.
I hope I have stated what I mean correctly.
select t.tx_id,
--00
case WHEN t.fee[0]:amount>0 then t.fee[0]:denom end as d_0,
case when t.fee[0]:amount>0 then t.fee[0]:amount/1000000 end as a_0,
--01
case WHEN t.fee[1]:amount>0 then t.fee[1]:denom end as d_1,
case when t.fee[1]:amount>0 then t.fee[1]:amount/1000000 end as a_1,
--02
case WHEN t.fee[2]:amount>0 then t.fee[2]:denom end as d_2,
case when t.fee[2]:amount>0 then t.fee[2]:amount/1000000 end as a_2,
--03
case WHEN t.fee[3]:amount>0 then t.fee[3]:denom end as d_3,
case when t.fee[3]:amount>0 then t.fee[3]:amount/1000000 end as a_3,
--04
case WHEN t.fee[4]:amount>0 then t.fee[4]:denom end as d_4,
case when t.fee[4]:amount>0 then t.fee[4]:amount/1000000 end as a_4,
--05
case WHEN t.fee[5]:amount>0 then t.fee[5]:denom end as d_5,
case when t.fee[5]:amount>0 then t.fee[5]:amount/1000000 end as a_5,
--06
case WHEN t.fee[6]:amount>0 then t.fee[6]:denom end as d_6,
case when t.fee[6]:amount>0 then t.fee[6]:amount/1000000 end as a_6
from terra.transactions t
where t.tx_id not in (select s.tx_id from terra.swaps s) and fee[0].amount>0 limit 1000

Assuming that you have the table:
CREATE TABLE transactions (
tx_id NUMBER PRIMARY KEY,
fee JSON
);
With the data:
INSERT INTO transactions (tx_id, fee) VALUES (
1,
'[{"denom":"ABC","amount":100},{"denom":"DEF","amount":0},{"denom":"GHI","amount":1}]'
);
Then the simplest method is to output the data as rows (and not as columns):
select t.tx_id,
j.*
from terra.transactions t
CROSS JOIN JSON_TABLE(
t.fee,
'$[*]'
COLUMNS
denom VARCHAR2(20) PATH '$.denom',
amount NUMBER PATH '$.amount'
) j
where t.tx_id not in (select s.tx_id from terra.swaps s)
and j.amount>0
Which outputs:
TX_ID
DENOM
AMOUNT
1
ABC
100
1
GHI
1
If you want to dynamically pivot the rows to columns then this is best done in whatever middle-tier application (PHP, C#, Java, Python, etc.) that you are using to access the database. If you want to do it in Oracle then you can look at the answers to this question.
db<>fiddle here

I use flatten table:
with flattenTable as (
SELECT
tx_id,
fee,
b.value as fee_parsed,
b.value:amount as fee_amount,
b.value:denom as fee_denom
FROM terra.transactions, TABLE(FLATTEN(terra.transactions.fee)) b
where tx_id not in (select s.tx_id from terra.swaps s ) and fee_amount>0)
SELECT f.*,
case when f.fee_denom='uusd' then f.fee_amount/1000000 else f.fee_amount/1000000*(select
avg(price_usd)
from terra.oracle_prices o,flattenTable f
where o.CURRENCY = f.fee_denom and o.block_timestamp=CURRENT_DATE) end as Fee_USD
from flattenTable f
limit 100

Teradata - selecting between two columns based on whether it starts with a number or not

I have the a query which looks similar to:
SELECT
s.cola, s.colb, t.colc, t.cold, u.cole, u.colf, u.colg, u.colh, u.coli, u.colj, u.colk, u.coll
FROM table1 s
INNER JOIN table2 t
ON s.colb = t.colc
INNER JOIN table3 u
ON u.colm = t.cold
WHERE cast(s.cola as date) between date '2017-11-06' and date '2017-11-10'
ORDER BY 3
I need to add a new column, called col_new, which is to be filled by either u.colm or u.coln. This column will have values from u.colm if that column starts with a number. Otherwise it will have values from u.coln. It is known that either u.coln or u.colm starts with a number, for each entry in table u.
I tried the following query to test if entries starting with a number can be identified or not:
SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(colm), 1, 1)) = 1
THEN 'yes'
ELSE 'no'
END AS col_new
FROM table_u
It returned the error: Syntax error: expected something between '(' and the 'substring' keyword.
Kindly suggest a solution.
Edit:
Exact Error:
[Teradata Database] [3706] Syntax error: expected something between '(' and the 'substring' keyword.

Instead of isnumeric(), just do a comparison:
SELECT (CASE WHEN LEFT(LTRIM(colm), 1) BETWEEN '0' AND '9' THEN 'yes'
ELSE 'no'
END) AS col_new
FROM table_u;
LEFT() is a convenient shorthand for the first "n" characters of a string.

How to use INTO and GROUP BY clause together

SELECT cast ( SUBSTRING ( CAST ("ProcessingDate" AS text), 5, 2 ) as integer),
COUNT(*) INTO resultValue1,resultValue2
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y' AND "ProcessingDate" >= 20160110
GROUP BY 1
ORDER BY 1;
In my database, the ProcessingDate is stored as YYYYMMDD. So, I am extracting its month from it.
This query is working fine if we remove the INTO clause but, I want to store the result to use it further.
So what should be the datatype of the variable resultValue1 and resultValue2 how to store the data(because data will be multiple).
As I am new to PostgreSQL I don't know how to do this can anybody help me out.

Here resultValue1 & resultValue2 will be tables not variables. you can group by using the column names.
Give some column alias names for both columns and group by using them.
You probably want this.
SELECT cast ( SUBSTRING ( cast ("ProcessingDate" as text),5 , 2 ) as
integer) AS resultValue1, COUNT(*) AS resultValue2
INTO <NewTable> --NewTable will be created with those two columns
FROM "DemoLogs"."Project"
-- conditions
Group By 1
-- other contitions/clauses
;
Kindly refer this INTO Documentation.
Hope this helps.

Try this:
SELECT cast ( SUBSTRING ( cast ("ProcessingDate" as text),5 , 2 ) as integer)resultValue1,
COUNT(*)resultValue2
INTO <Table Name>
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y'
AND "ProcessingDate" >= '20160110'
Group By 1
Order By 1;

Store the above query in the variable and remove that INTO clause from it.
So, query will be now :
query = SELECT cast ( SUBSTRING ( CAST ("ProcessingDate" AS text), 5, 2 ) as
integer),
COUNT(*)
FROM "DemoLogs"."Project"
WHERE "Addr" = 'Y' AND "ProcessingDate" >= 20160110
GROUP BY 1
ORDER BY 1;
Now declare result_data of type record and use in the following manner:
We are looping over result_data because it gives number of rows as output
and I have declared resultset of type text to store the result (as I needed further)
FOR result_data IN EXECUTE query
LOOP
RAISE NOTICE 'result : %',to_json(result_data);
resultset = resultset || to_json(result_data);
END LOOP;

how to convert the output of sub query into numeric

select rptName
from RptTable
where rpt_id in (
select LEFT(Reports, NULLIF(LEN(Reports)-1,-1))
from repoAccess1
where uid = 'VIKRAM'
)
this is my sql query In which i have use the sub query to access selected field
in this sub query returns
select LEFT(Reports, NULLIF(LEN(Reports)-1,-1))
from repoAccess1
where uid = 'VIKRAM'
Returns
1,2
that means the query should be like
select rptName
from RptTable where rpt_id in (1,2)
But i m getting this error
Msg 8114, Level 16, State 5, Line 1
Error converting data type nvarchar to numeric.
could anyone tell me ow to modify to get exact ans

It's a little hard to tell without the concrete table definitions, but I'm pretty sure you're trying to compare different data types to each other. If this is the case you can make use of the CAST or the CONVERT function, for example:
SELECT
[rptName]
FROM [RptTable]
WHERE [rpt_id] IN
(
SELECT
CONVERT(int, LEFT([Reports], NULLIF(LEN([Reports]) - 1, -1)))
FROM [repoAccess1]
WHERE [uid] = 'VIKRAM'
)
UPDATE: Since you have updated your question: The LEFT function returns results of either varchar or nvarchar data type. So the resulting query would be
SELECT
[rptName]
FROM [RptTable]
WHERE [rpt_id] IN('1', '2')
Please note the apostrophes (is this the correct term?) around the values. Since [rpt_id] seems to be of data type int the values cannot implicitly be converted. And that's where the aforementioned CAST or CONVERT come into play.

If I understand correctly, the subquery is returning a single row with a value of '1,2'. This is not a number, hence the error.
Before continuing, let me emphasize that storing values in comma delimited string is not the SQL-way of doing things. You should have one row per id, with proper types and foreign keys defined.
That said, sometimes we are stuck with other people's really bad design decisions. If this is the case, you can use LIKE:
select rptName
from RptTable r
where exists (select 1
from repoAccess1 a
where a.uid = 'VIKRAM' and
',' + a.reports + ',' like '%,' + cast(r.rpt_id as varchar(255)) + ',%'
);

select rptName
from RptTable
where rpt_id in (
select CAST(LEFT(Reports, NULLIF(LEN(Reports)-1,-1)) AS INT) as Val
from repoAccess1
where uid = 'VIKRAM'
)

Your query would work fine when (LEFT(Reports, NULLIF(LEN(Reports)-1,-1)) ) returns either 1 or 2 since SQL Server implicitly converts the varchar value to numeric.
It seems there might be a data issue. One of the data returned by LEFT function is non-numeric. In order to find that particular record you can use isnumeric function. Try like this,
SELECT rptName
FROM RptTable
WHERE rpt_id IN (
SELECT LEFT(Reports, NULLIF(LEN(Reports) - 1, - 1))
FROM repoAccess1
WHERE uid = 'VIKRAM'
AND ISNUMERIC(LEFT(Reports, NULLIF(LEN(Reports) - 1, - 1))) = 1
)

How to aggragate integers in postgresql?

I have a query that gives list of IDs:
ID
2
3
4
5
6
25
ID is integer.
I want to get that result like that in ARRAY of integers type:
ID
2,3,4,5,6,25
I wrote this query:
select string_agg(ID::text,',')
from A
where .....
I have to convert it to text otherwise it won't work. string_agg expect to get (text,text)
this works fine the thing is that this result should later be used in many places that expect ARRAY of integers.
I tried :
select ('{' || string_agg(ID::text,',') || '}')::integer[]
from A
WHERE ...
which gives: {2,3,4,5,6,25} in type int4 integer[]
but this isn't the correct type... I need the same type as ARRAY.
for example SELECT ARRAY[4,5] gives array integer[]
in simple words I want the result of my query to work with (for example):
select *
from b
where b.ID = ANY (FIRST QUERY RESULT) // aka: = ANY (ARRAY[2,3,4,5,6,25])
this is failing as ANY expect array and it doesn't work with regular integer[], i get an error:
ERROR: operator does not exist: integer = integer[]
note: the result of the query is part of a function and will be saved in a variable for later work. Please don't take it to places where you bypass the problem and offer a solution which won't give the ARRAY of Integers.
EDIT: why does
select *
from b
where b.ID = ANY (array [4,5])
is working. but
select *
from b
where b.ID = ANY(select array_agg(ID) from A where ..... )
doesn't work
select *
from b
where b.ID = ANY(select array_agg(4))
doesn't work either
the error is still:
ERROR: operator does not exist: integer = integer[]

Expression select array_agg(4) returns set of rows (actually set of rows with 1 row). Hence the query
select *
from b
where b.id = any (select array_agg(4)) -- ERROR
tries to compare an integer (b.id) to a value of a row (which has 1 column of type integer[]). It raises an error.
To fix it you should use a subquery which returns integers (not arrays of integers):
select *
from b
where b.id = any (select unnest(array_agg(4)))
Alternatively, you can place the column name of the result of select array_agg(4) as an argument of any, e.g.:
select *
from b
cross join (select array_agg(4)) agg(arr)
where b.id = any (arr)
or
with agg as (
select array_agg(4) as arr)
select *
from b
cross join agg
where b.id = any (arr)
More formally, the first two queries use ANY of the form:
expression operator ANY (subquery)
and the other two use
expression operator ANY (array expression)
like it is described in the documentation: 9.22.4. ANY/SOME
and 9.23.3. ANY/SOME (array).

How about this query? Does this give you the expected result?
SELECT *
FROM b b_out
WHERE EXISTS (SELECT 1
FROM b b_in
WHERE b_out.id = b_in.id
AND b_in.id IN (SELECT <<first query that returns 2,3,4,...>>))
What I've tried to do is to break down the logic of ANY into two separate logical checks in order to achieve the same result.
Hence, ANY would be equivalent with a combination of EXISTS at least one of the values IN your list of values returned by the first SELECT.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Hive - joining two tables to find string that like string in reference table - hive

Related

How can reiterate a certain statement in select in oracle

Teradata - selecting between two columns based on whether it starts with a number or not

How to use INTO and GROUP BY clause together

how to convert the output of sub query into numeric

How to aggragate integers in postgresql?

Categories

Resources