HIVE: How to include null rows in lateral view explode

I have a table as follows:
user_id email
u1 e1, e2
u2 null
My goal is to convert this into the following format:
user_id email
u1 e1
u1 e2
u2 null
So for this I am using the lateral view explode() function in Hive, as follows:
select * FROM table
LATERAL VIEW explode (split(email ,',')) email AS email_id
However, the u2 row is dropped from the output because its email is null. How can the null rows be included in the output too?
Edit: As a workaround I am doing a union of this result with the base table without explode, but I think the data will be scanned a second time because of this. Is there a better way to do it?

Add OUTER to the lateral view to keep the rows for which explode() produces nothing, such as rows where email is NULL. For those rows the generated column is NULL:
select * FROM table LATERAL VIEW OUTER explode ( split ( email ,',' ) ) email AS email_id;
See the Hive documentation on outer lateral views: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#LanguageManualLateralView-OuterLateralViews
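To make the difference concrete, here is a small Python sketch that only emulates the row semantics of LATERAL VIEW explode with and without OUTER. It is not Hive code: the function name lateral_view_explode and the sample rows are made up for illustration, and the email value is assumed to be stored as 'e1,e2' with no space after the comma.

```python
def lateral_view_explode(rows, outer=False):
    """Emulate LATERAL VIEW [OUTER] explode(split(email, ',')) on (user_id, email) rows."""
    out = []
    for user_id, email in rows:
        # split() of a NULL value is NULL, and explode() of NULL yields no rows
        parts = email.split(",") if email is not None else None
        if parts:
            out.extend((user_id, part) for part in parts)
        elif outer:
            # OUTER keeps the source row and fills the exploded column with NULL
            out.append((user_id, None))
    return out


rows = [("u1", "e1,e2"), ("u2", None)]
without_outer = lateral_view_explode(rows)
with_outer = lateral_view_explode(rows, outer=True)
print(without_outer)
print(with_outer)
```

Without OUTER the u2 row disappears; with OUTER it comes back with a null email_id, which is the output the question asks for.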

Related

SQL: select all depending nodes from a list of parent

let's take a table called users like this
id name ... path
22 John ... 2/8/11/22/
23 Mark ... 1/3/9/15/21/23/
where the path represents the parent-child hierarchy.
Now, I have a list of "special users", which we can call "board users"; for simplicity, let's say they have ids 1 to 10.
I would like to select all the rows that descend from the board users (including the board users' own rows), and I would like to add a column holding the id of the board user each row descends from; something like this:
id name ... path board_parent_id
1 Stephany ... 1/ 1
2 Karl ... 2/ 2
... ... ... ... ...
83 Lucy ... 4/11/43/51/69/73/83/ 4
I have tried something like
SELECT u1.id as board_parent_id, u2.*
FROM USERS AS u1
CROSS JOIN USERS AS u2
WHERE u1.id = '1'
AND u2.path LIKE '%1%'
UNION
SELECT u1.id as board_parent_id, u2.*
FROM USERS AS u1
CROSS JOIN USERS AS u2
WHERE u1.id = '2'
AND u2.path LIKE '%2%'
UNION
...
but honestly I believe this is a very stupid way to do this

First observation: your current query should search for '%/1/%', not '%1%'. The latter will also match '63/41/999/', which you don't want.
That also means your path should start with '/', or you should concatenate '/' to the start of it in your query.
If you know the list of golden ids, then it's just a join:
WITH
golden(id) AS
(
VALUES
(1),(2),(3)...,(n)
)
SELECT
golden.id AS board_parent_id,
users.*
FROM
golden
INNER JOIN
users
ON CONCAT('/', users.path) LIKE CONCAT('%/', golden.id, '/%')
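The over-matching problem described above can be checked with a small, self-contained sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in for your database, so it concatenates with || instead of CONCAT. The users Ann and Bob and their paths are invented sample rows, and golden holds only ids 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Stephany", "1/"),
        (2, "Karl", "2/"),
        (11, "Ann", "2/11/"),  # hypothetical descendant of user 2
        (21, "Bob", "3/21/"),  # hypothetical user outside the subtrees of 1 and 2
    ],
)
conn.execute("CREATE TABLE golden (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO golden VALUES (?)", [(1,), (2,)])

# Naive pattern: any path containing the digit matches, e.g. '11' or '21' for id 1
naive = conn.execute(
    "SELECT g.id, u.id FROM golden g JOIN users u "
    "ON u.path LIKE ('%' || g.id || '%') ORDER BY g.id, u.id"
).fetchall()

# Delimited pattern: prefix the path with '/' and match '/<id>/' as a whole segment
delimited = conn.execute(
    "SELECT g.id, u.id FROM golden g JOIN users u "
    "ON ('/' || u.path) LIKE ('%/' || g.id || '/%') ORDER BY g.id, u.id"
).fetchall()

print(naive)
print(delimited)
```

With these rows the naive pattern pairs both board ids with users 11 and 21, while the delimited pattern returns only each board user and its real descendants.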

Thanks to the comments and other answers, I finally got my query working.
Here is an example:
WITH BOARD AS(
SELECT * FROM users
WHERE ID IN ('id_1', 'id_2', 'id_3', 'id_4'...)
)
SELECT BOARD.id AS board_id,
u.*
FROM users AS u
INNER JOIN BOARD ON u.path LIKE CONCAT('%', BOARD.id, '%')
This way, it seems to work the way I wanted

How to find common matches for different column values in SQL

I have the following table "Friends":
The goal is to find out how many users have the exact same list of friends.
In this case, the result would be user_id 1 and user_id 4 since both user 1 and user 4 are friends with "2" and "3".
I think I am on the right track by using the code below:
SELECT * FROM Friends A, Friends B WHERE A.friend_id=B.friend_id AND A.user_id <> B.user_id
However, I am not able to figure out how to finish the query so that it calculates the matching list of friends. Does anyone have any suggestions?

You didn't provide your SQL type.
For MySQL, you can group concat the friends for every user and cross by this value. The default is ",". Then join the same table to compare by the same list of friends.
SELECT t1.user_id, t2.user_id AS matching_user_id, t1.friends FROM
(
SELECT user_id, GROUP_CONCAT(friend_id ORDER BY friend_id) AS friends
FROM friends
GROUP BY user_id
) AS t1
JOIN
(
SELECT user_id, GROUP_CONCAT(friend_id ORDER BY friend_id) AS friends
FROM friends
GROUP BY user_id
) AS t2
ON t1.friends = t2.friends
AND t1.user_id < t2.user_id
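As an alternative that does not depend on string concatenation, the self-join from the question can be finished by counting shared friends: two users have identical lists when they have the same number of friends and share all of them. The sketch below runs this in SQLite through Python's sqlite3 module as a stand-in. The sample rows are invented to match the description in the question (users 1 and 4 are both friends with 2 and 3), and it assumes each (user_id, friend_id) pair appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (4, 2), (4, 3)],  # hypothetical data
)

same_friends = conn.execute(
    """
    WITH counts AS (
        SELECT user_id, COUNT(*) AS n FROM friends GROUP BY user_id
    )
    SELECT a.user_id, b.user_id
    FROM friends a
    JOIN friends b ON a.friend_id = b.friend_id AND a.user_id < b.user_id
    JOIN counts ca ON ca.user_id = a.user_id
    JOIN counts cb ON cb.user_id = b.user_id
    WHERE ca.n = cb.n                 -- both users have equally long lists
    GROUP BY a.user_id, b.user_id, ca.n
    HAVING COUNT(*) = ca.n            -- and every friend is shared
    ORDER BY a.user_id, b.user_id
    """
).fetchall()

print(same_friends)
```

The a.user_id < b.user_id condition reports each matching pair once instead of twice.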

SQL selecting multiple record by different variable

Ok, so I have a table Assignment:
[UserId]
[GroupId]
[UpdatedBy]
[UpdatedAt]
Also, I have a function for returning users from a specific group:
select UserId
from dbo.GetGroupUsers() ggu
where ggu.GroupId = ?
I want to find all the groups in Assignment that a given user belongs to, and then select ALL users from those groups, without duplicates.
How can I achieve this?
Edit:
Sample output from selecting groupid = 4
For example, user "test1" also belongs to another group with id = 2, so I want to select all users from groups 2 and 4 (the groups test1 belongs to) without duplicate records.

All groups from one UserId (say UserId 10):
select GroupId from Assignment where UserId = 10
Select all users from those groups (without duplicates):
select distinct UserId
from dbo.GetGroupUsers() ggu
where ggu.GroupId in (select GroupId from Assignment where UserId = 10)
I hope this is what you wanted.

An inner self join should get you the IDs of the users you're looking for. Join the result with your user table (which you didn't post) to possibly get other information about these users.
SELECT DISTINCT
a2.userid
FROM assignment a1
INNER JOIN assignment a2
ON a2.groupid = a1.groupid
WHERE a1.userid = ?;
(Replace the ? with the ID of the user you want to start with.)
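A quick way to try the self-join is the sketch below, which runs it in SQLite through Python's sqlite3 module as a stand-in for your server. The assignment rows and the starting user id 10 are made-up sample data, and the dbo.GetGroupUsers() function from the question is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignment (userid INTEGER, groupid INTEGER)")
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?)",
    [(10, 2), (10, 4), (11, 4), (12, 2), (13, 5)],  # hypothetical memberships
)

start_user = 10  # the user whose groups we start from
group_mates = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT a2.userid
        FROM assignment a1
        INNER JOIN assignment a2 ON a2.groupid = a1.groupid
        WHERE a1.userid = ?
        ORDER BY a2.userid
        """,
        (start_user,),
    )
]

print(group_mates)
```

User 10 is in groups 2 and 4, so the result contains users 10, 11 and 12 once each, and user 13 (only in group 5) is left out.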

Assuming your input is a user id:test1 and assuming that you are just looking at one table (Assignment)
DECLARE @UserId INT = 2
;WITH GROUPS AS
(
SELECT DISTINCT GroupId FROM Assignment WHERE UserId = @UserId
)
SELECT DISTINCT assgn.UserId, gps.GroupId FROM Assignment assgn
INNER JOIN
GROUPS gps ON
assgn.GroupId = gps.GroupId
Please let me know if this helps

Check if any children exist

Considering these two tables:
Position(positionid, positiontext, reportstopositionid)
User(userid, positionid)
How can I check if a user has any subordinates in one query?
Is it even possible?
A subordinate:
A user (a) has subordinates if there is at least one user (b) whose position has a reportstopositionid equal to the positionid of user (a).

This will return users who have subordinates:
SELECT *
FROM User u
WHERE EXISTS (
SELECT 1
FROM Position p
WHERE p.reportstopositionid = u.positionid
)

How about this one?
SELECT DISTINCT a.*
FROM user a
INNER JOIN position b
ON a.positionID = b.reportstopositionID
This query returns the users whose positionid appears in the reportstopositionid column of the position table.

I think you want to do this with a where clause:
select u.*
from user u
where u.positionId in (select reportstopositionid from position p)
This gets the list of users who match, without duplicates.
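The queries above check whether some position reports to the user's position. The definition in the question also requires that a user actually holds that reporting position. The sketch below compares the two checks in SQLite through Python's sqlite3 module as a stand-in. All rows are invented sample data, including an Intern position that reports to Engineer but that nobody holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "position" (
        positionid INTEGER PRIMARY KEY,
        positiontext TEXT,
        reportstopositionid INTEGER
    );
    CREATE TABLE "user" (userid INTEGER PRIMARY KEY, positionid INTEGER);
    INSERT INTO "position" VALUES
        (1, 'CEO', NULL), (2, 'Manager', 1), (3, 'Engineer', 2), (5, 'Intern', 3);
    INSERT INTO "user" VALUES (100, 1), (101, 2), (102, 3);
    """
)

# Check 1: some position reports to the user's position (as in the answers above)
positions_only = [
    r[0]
    for r in conn.execute(
        """
        SELECT u.userid FROM "user" u
        WHERE EXISTS (
            SELECT 1 FROM "position" p
            WHERE p.reportstopositionid = u.positionid
        )
        ORDER BY u.userid
        """
    )
]

# Check 2: additionally require that a user holds the reporting position
with_holder = [
    r[0]
    for r in conn.execute(
        """
        SELECT u.userid FROM "user" u
        WHERE EXISTS (
            SELECT 1 FROM "position" p
            JOIN "user" s ON s.positionid = p.positionid
            WHERE p.reportstopositionid = u.positionid
        )
        ORDER BY u.userid
        """
    )
]

print(positions_only)
print(with_holder)
```

User 102 shows up only in the first result, because the Intern position that reports to it is vacant. Which check is right depends on whether vacant positions should count as subordinates.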

Getting two rows "in scope" for a PostgreSQL function call: cross join?

I have a user-defined function similarity() that compares two users and returns a score. Right now, the calling query looks like:
SELECT similarity(user1, user2)
FROM (
SELECT users.id
FROM users AS user1
WHERE users.id = 123
)
CROSS JOIN
(
SELECT users.id
FROM users AS user2
WHERE users.id = 456
);
This feels messy. Is there a better way to set up the two users for the function call?

If there is only one from each table (seems likely that id is a PK) then
SELECT similarity((SELECT a FROM users a WHERE a.id=123),
(SELECT b FROM users b WHERE b.id=456));

Simplified form that does exactly the same:
SELECT similarity(a, b)
FROM users a, users b
WHERE a.id = 123
AND b.id = 456;
