Converting array of string into array of integer - hive

Right now I'm having a performance issue with this query:
select userid from table_x inner join table_y on array_contains(split(table_y.userids,','),cast(table_x.userid as string))
The userids column in table_y is stored as a string of numbers such as "123, 134, 156", which actually represents three userids: 123, 134 and 156. Table_x has a userid column and holds the personal information of each user. I want to select every userid from table_x that appears in the userids column of table_y.
Am I right in assuming that the performance issue arises because I have to convert the userids in table_y to an array of strings with split(table_y.userids,',') and then use array_contains on strings? If so, does anyone know how to convert the string of userids into an array of integers?
Thank you!

It seems that you are doing a Cartesian product join. Hive cannot use array_contains as a join key, so the condition is applied only after Hive has generated all possible combinations of rows.
To truly join, you need to use explode(split(table_y.userids,',')) and then have a regular equality join:
select x.uid from (select cast(table_x.userid as string) as uid from table_x) x
inner join
(select explode(split(table_y.userids,',')) as uid from table_y) y on
x.uid=y.uid;
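To make the difference concrete, here is a minimal Python sketch (not Hive code) that mimics both strategies on made-up data. The sample rows and the function names join_by_contains and join_by_explode are hypothetical. It also assumes the stored strings may contain a space after each comma, as in "123, 134, 156", in which case the split pieces need trimming before they can match.

```python
# Hypothetical sample data standing in for table_x.userid and table_y.userids.
table_x = [123, 134, 999]
table_y = ["123, 134, 156", "134,200"]


def join_by_contains(xs, ys):
    """Mimics the original query: every (x, y) pair is tested, i.e. a cross product."""
    out = []
    for userid in xs:
        for userids in ys:
            parts = [p.strip() for p in userids.split(",")]
            if str(userid) in parts:
                out.append(userid)
    return out


def join_by_explode(xs, ys):
    """Mimics explode + equality join: flatten y once, then match by key lookup."""
    counts = {}
    for userids in ys:
        for part in userids.split(","):
            key = part.strip()
            counts[key] = counts.get(key, 0) + 1
    out = []
    for userid in xs:
        # One output row per exploded y value equal to this userid.
        out.extend([userid] * counts.get(str(userid), 0))
    return out


print(join_by_contains(table_x, table_y))
print(join_by_explode(table_x, table_y))
```

Both functions return the same rows for this data, but the first touches every pair of rows while the second only does one keyed lookup per userid. One difference to keep in mind: if a single userids string repeats an id, the exploded version yields one row per repetition.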

Related

Databricks spark sql to show associated strings from hashed strings

I'm using a query in Databricks like this:
select * from thisdata where hashed_string in (sha2("mystring1", 512),sha2("mystring2", 512),sha2("mystring3", 512))
This works well and gives me the data I need, but is there a way to show the original string next to each hashed string?
Example:
mystring1 - 1494219340aa5fcb224f6b775782f297ba5487
mystring2 - 5430af17738573156426276f1e01fc3ff3c9e1
Probably not, as there's a reason for it to be hashed, but I'm just checking whether there is a way.
If you have a table with the string and its corresponding hash as columns, you can perform an inner join instead of using an IN clause. After joining, you can build the required output with the concat_ws function.
Let's say you create a table named hashtable with columns mystring and hashed_mystring, and that the other table is named maintable.
You can use the query below to join the two and extract the result in the required format.
select concat_ws('-',h.mystring, m.hashed_string) from maintable m
inner join hashtable h on m.hashed_string = h.hashed_mystring
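The same idea can be sketched outside SQL: a hash cannot be reversed, but if the candidate strings are known, hashing each of them once gives a lookup table to join against. The Python sketch below is only an illustration; the candidate list and the variable names are hypothetical, and it assumes sha2(..., 512) corresponds to a hex-encoded SHA-512 digest of the UTF-8 string.

```python
import hashlib


def sha512_hex(text):
    """Hex-encoded SHA-512 digest of a UTF-8 string."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the hashtable (hashed_mystring -> mystring).
candidates = ["mystring1", "mystring2", "mystring3"]
hashtable = {sha512_hex(s): s for s in candidates}

# Hypothetical stand-in for maintable.hashed_string; one value has no known source string.
maintable = [sha512_hex("mystring2"), sha512_hex("something else")]

# Inner join on the hash, formatted like concat_ws('-', mystring, hashed_string).
matched = [f"{hashtable[h]}-{h}" for h in maintable if h in hashtable]

for line in matched:
    print(line)
```

Hashes with no entry in the lookup table simply drop out, as they would with the inner join.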

Postgres Query based on Delimited String

We have a column in one of our tables that consists of a delimited string of IDs, like this: 72,73,313,502.
What I need to do in the query is parse that string and, for each ID, join it to another table. So for this string I'd get 4 rows, each with the ID and the name from the other table.
Can this be done in a query?
One option is regexp_split_to_table() and a lateral join. Assuming that the CSV string is stored in table1(col) and that you want to join each element against table2(id), you would do:
select ... -- whatever columns you want
from table1 t1
cross join lateral regexp_split_to_table(t1.col, ',') x(id)
inner join table2 t2 on t2.id = x.id::int
It should be noted that storing CSV strings in a relational database is bad design, and should almost always be avoided. Each value in the list should be stored as a separate row, using the proper datatype. Recommended reading: Is storing a delimited list in a database column really that bad?
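As a rough illustration of what the split plus lateral join produces, the following Python sketch expands each delimited string into one row per ID and then looks each ID up in the second table. The names in table2 are invented for the example.

```python
# Hypothetical stand-ins for table1(col) and table2(id, name).
table1 = ["72,73,313,502"]
table2 = {72: "alpha", 73: "beta", 313: "gamma", 502: "delta"}

rows = []
for col in table1:
    for part in col.split(","):       # one element per ID, like regexp_split_to_table
        ident = int(part)             # like x.id::int
        if ident in table2:           # inner join: unmatched IDs are dropped
            rows.append((ident, table2[ident]))

for row in rows:
    print(row)
```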

How do I convert a list of arbitrary values into a subquery that can be used in a join

I want to join a list of values with table rows in SQL.
In short, I have a list of elements and I want to join these elements with a given table.
For example, my list is (1,3,5,100,200,700) and the table is:
id | val
1 | a
2 | b
3 | c
4 | d
100| e
200| f
I know how to do that with an IN clause:
SELECT * FROM table
WHERE id IN(list)
Unfortunately, in my situation (a very, very long list) I cannot use an IN clause and have to join against the list instead.
How can I convert the list (in the format that "in" can deal with) into something that can be joined?
Important constraint: I don't have write permissions in the database, so answers like "write this list into a new table and then join them" don't help me. In general, I need to use this technique hundreds of times in the code, so creating a new table for every query would not be feasible even if I had write permissions.
Can you help me, please?
If that list is a string, you can e.g. replace the parentheses with curly braces and cast it to an array:
where id = any('{1,3,5,100,200,700}'::int[])
Or if you can't change the input:
where id = any(translate('(1,3,5,100,200,700)', '()', '{}')::int[])
The same approach can be used for joining:
select mt.*
from my_table mt
join unnest('{1,3,5,100,200,700}'::int[]) as x(id) on x.id = mt.id
But those two solutions aren't the same: if the list contains duplicates the join will return duplicate rows as well. The id = any() condition will not do that.
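The duplicate behaviour can be illustrated with a small Python sketch that mimics both forms on the sample table from the question. The list with a repeated 3 is made up for the example.

```python
# The sample table from the question, as (id, val) rows.
my_table = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (100, "e"), (200, "f")]

# Hypothetical input list containing a duplicate (3 appears twice).
id_list = [1, 3, 3, 100]

# id = any(...): a membership test, so each table row appears at most once.
wanted = set(id_list)
any_rows = [row for row in my_table if row[0] in wanted]

# join unnest(...): one output row per matching list element.
join_rows = [row for x in id_list for row in my_table if row[0] == x]

print(any_rows)
print(join_rows)
```

If the join form is needed and duplicates are unwanted, the list would have to be de-duplicated first.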

How to use a column name as part of string pattern and replacement string in Replace() statement?

I would like to replace a string in a column, where the string includes values that come from another column. The code below is what I envisaged could be possible, but it doesn't work:
SELECT
REPLACE(m1.MovieDescription,'/img/movie/'+m1.MovieID,'/img/film/'+f1.FilmID)
FROM
Movie m1
INNER JOIN
Film f1
ON f1.MovieID = m1.MovieID
The MovieID and FilmID values are unique for each row. Is there a way to achieve the above, or do I need to resort to a dynamic SQL statement with a cursor?
Your query should be fine, if that is what you want to do. If the ids are numbers, use CONCAT() instead of + to avoid a type conversion error:
SELECT REPLACE(m.MovieDescription,
CONCAT('/img/movie/', m.MovieID),
CONCAT('/img/film/', f.FilmID)
)
FROM Movie m INNER JOIN
Film f
ON f.MovieID = m.MovieID;
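The per-row behaviour can be sketched in Python: the search pattern and the replacement are both built from values in the same joined row, so no dynamic SQL or cursor is involved. The sample row and the function name rewrite are hypothetical.

```python
# Hypothetical joined row from Movie and Film.
rows = [
    {"MovieID": 7, "FilmID": 42, "MovieDescription": "See /img/movie/7/poster.png"},
]


def rewrite(row):
    """Replace the movie path with the film path, using this row's own ids."""
    pattern = f"/img/movie/{row['MovieID']}"      # like CONCAT('/img/movie/', m.MovieID)
    replacement = f"/img/film/{row['FilmID']}"    # like CONCAT('/img/film/', f.FilmID)
    return row["MovieDescription"].replace(pattern, replacement)


for row in rows:
    print(rewrite(row))
```

One thing to watch with a plain substring replace: '/img/movie/7' is also a prefix of '/img/movie/70', so if a description can mention other movies, including a trailing delimiter such as '/' in both strings would be safer.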

Postgresql, sql command, join table with similar string, only string "OM:" is at the begin

I want to join two tables.
left join
c_store on o_order.customer_note = c_store.store_code
The strings in the two fields are almost the same, except that one has "OM:" at the start. For example, the field from o_order.customer_note is
OM:4008
and from c_store.store_code is
4008
Is it possible to join on c_store.store_code by removing (or replacing) the prefix in every o_order.customer_note value?
I tried
c_store on replace(o_order.customer_note, '', 'OM:') = c_store.store_code
but with no success. I think this is only for renaming a column, right? Sorry for this question, I am new to this.
Thanks.
Use a string concatenation in your join condition:
SELECT ...
FROM o_order o
LEFT JOIN c_store c
ON o.customer_note = 'OM:' || c.store_code::text;
But note that while the above logic might fix your query in the short term, in the long term the better fix would be to have proper join columns set up in your database. That is, it is desirable to be able to do joins on equality alone. This would let Postgres use an index, if one exists.
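As an illustration only, this Python sketch mimics the left join on 'OM:' || store_code with invented data, and also shows the argument order of a replace call: the text to search for comes first, then its replacement. Postgres replace(string, from, to) uses the same order, which is the reverse of the attempt in the question.

```python
# Hypothetical stand-ins for o_order.customer_note and c_store(store_code, name).
orders = ["OM:4008", "OM:5001", "no store"]
stores = {"4008": "Store A"}

# Key the stores by the prefixed code, like 'OM:' || c.store_code.
by_note = {"OM:" + code: name for code, name in stores.items()}

# Left join: every order is kept; unmatched ones get None (NULL in SQL).
joined = [(note, by_note.get(note)) for note in orders]

for row in joined:
    print(row)

# Stripping the prefix instead: search text first, replacement second.
print("OM:4008".replace("OM:", ""))
```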