Hive sql expand multiple columns to rows - sql

I have a hive table which has the following format:
user | item | like | comment
Joe 5 1 0
Lan 3 0 1
Mack 5 1 1
I want to use Hive SQL to turn the like and comment columns into a single behavior column, keeping the user and item and adding the number of times each behavior occurred:
user | item | behavior | times
Joe 5 like 1
Joe 5 comment 0
Lan 3 like 0
Lan 3 comment 1
Mack 5 like 1
Mack 5 comment 1
Could you please give any advice?

Using map and explode.
select user,item,behavior,times
from tbl
lateral view explode(map('like',like,'comment',comment)) t as behavior,times
As a side note, you should avoid using reserved keywords like user, like, comment as column names.

Great answer by Prabhala and Linoff. Here I'm offering yet another way, the builtin UDTF stack, which is both intuitive and native.
select
stack(2, user, item, 'like', like,
user, item, 'comment', comment)
as (user, item, behavior, times)
from tbl;

One method uses union all:
select user, item, 'like' as behavior, like as times
from t
union all
select user, item, 'comment' as behavior, comment as times
from t;
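To try the union all approach without a Hive cluster, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite itself and the renamed columns (user_name, likes, comments, chosen to sidestep the reserved-keyword problem mentioned above) are my assumptions, not part of the original table. The ORDER BY is only there to make the output deterministic.

```python
import sqlite3

# In-memory database standing in for the Hive table (assumption: SQLite, not Hive).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (user_name TEXT, item INTEGER, likes INTEGER, comments INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("Joe", 5, 1, 0), ("Lan", 3, 0, 1), ("Mack", 5, 1, 1)],
)

# One SELECT per source column, glued together with UNION ALL.
rows = conn.execute(
    """
    SELECT user_name, item, 'like' AS behavior, likes AS times FROM tbl
    UNION ALL
    SELECT user_name, item, 'comment' AS behavior, comments AS times FROM tbl
    ORDER BY user_name, behavior DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Each source row shows up twice in rows, once per behavior, which is the shape asked for in the question.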

Related

Split non-atomic values into multiple rows with PostgreSQL

I have some non-atomic data in a database, like this:
ID | Component ID List
-----------------------
1 | 123, 456
2 | 123, 345
I need to transform this table into a view that exposes the "Component ID List" values one per row, so that I can use them in joins. Expected result:
ID | Component ID List
-----------------------
1 | 123
1 | 456
2 | 123
2 | 345
Because I have this situation in quite a few tables, I am looking for a reusable way to perform this transformation, e.g. with a SQL function. The tables have different column names, so the function would need a parameter, like this:
SELECT *, split_values("Component ID List") FROM xyz
I know the best way would be to fix the problem in the raw-data but that's not possible in this case.
Any suggestions how to solve this the best way possible?
You can use unnest(string_to_array(Component_ID_List, ', ')):
SELECT ID,
unnest(string_to_array(Component_ID_List, ', ')) as Component_ID_List
FROM table_name;

Case statement logic and substring

Say I have the following data:
Passes
ID | Pass_code
-----------------
100 | 2xBronze
101 | 1xGold
102 | 1xSilver
103 | 2xSteel
Passengers
ID | Passengers
-----------------
100 | 2
101 | 5
102 | 1
103 | 3
I want to count then create a ticket in the output of:
ID 100 | 2 pass (bronze)
ID 101 | 5 pass (because it is gold, we count all passengers)
ID 102 | 1 pass (silver)
ID 103 | 2 pass (steel)
I was thinking of something like the code below, but I am unsure how to finish my CASE statement. I want to take a substring of pass_code to get the number of passes, e.g. '2xBronze' should give me 2. For ID 103, we have 2 passes and 3 passengers, so we should output 2.
Also, is there a way to find '2xbronze' first if pass_code contained lots of other things, such as '101001, 1xbronze, FirstClass'? The format may change, so I don't want to rely on a fixed substring position. Could we search for '2xbronze' and then pull out the 2?
SELECT
CASE
WHEN Passes.pass_code like '%gold%' THEN Passengers.passengers
WHEN Passes.pass_code like '%steel%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%bronze%' THEN SUBSTRING(passes.pass_code, 1,1)
WHEN Passes.pass_code like '%silver%' THEN SUBSTRING(passes.pass_code, 1,1)
else 0 end as no,
Passes.ID,
Passes.Pass_code,
Passengers.Passengers
FROM Passes
JOIN Passengers ON Passes.ID = Passengers.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=db698e8562546ae7658270e0ec26ca54
So assuming you are indeed using Oracle (as your DB fiddle implies).
You can do some string magic by finding the position of a splitter character (in your case the x) and then taking substrings based on that. Obviously this has its problems, and x is a bad separator character as well, but it works for your current data set.
WITH PASSCODESPLIT AS
(
SELECT PASSES.ID,
TO_Number(SUBSTR(PASSES.PASS_CODE, 0, (INSTR(PASSES.PASS_CODE, 'x')) - 1)) AS NrOfPasses,
SUBSTR(PASSES.PASS_CODE, (INSTR(PASSES.PASS_CODE, 'x')) + 1) AS PassType
FROM Passes
)
SELECT
PASSCODESPLIT.ID,
CASE
WHEN PASSCODESPLIT.PassType = 'gold' THEN Passengers.Passengers
ELSE PASSCODESPLIT.NrOfPasses
END AS NrOfPasses,
PASSCODESPLIT.PassType,
Passengers.Passengers
FROM PASSCODESPLIT
INNER JOIN Passengers ON PASSCODESPLIT.ID = Passengers.ID
ORDER BY PASSCODESPLIT.ID ASC
Gives the result of:
ID NROFPASSES PASSTYPE PASSENGERS
100 2 bronze 2
101 5 gold 5
102 1 silver 1
103 2 steel 3
As can also be seen in this fiddle
But I would strongly advise you to fix your table design. Having multiple attributes in the same column leads to troubles like these. And the more variables/variations you start storing, the more 'magic' you need to keep doing.
In this particular example I see no reason why you don't simply have three columns in Passes, which would also give you the opportunity to add new columns going forward, e.g. to keep track of first class.
You can extract the numbers using regexp_substr(). So I think this does what you want:
SELECT (CASE WHEN p.pass_code LIKE '%gold%'
THEN pp.passengers
ELSE TO_NUMBER(REGEXP_SUBSTR(p.pass_code, '^[0-9]+'))
END) as num,
p.ID, p.Pass_code, pp.Passengers
FROM Passes p JOIN
Passengers pp
ON p.ID = pp.ID;
Here is a db<>fiddle.
This converts the leading digits in the code to a number. Also note the use of table aliases to simplify the query.
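For the second part of the question (finding something like '1xbronze' inside a longer pass_code), the same idea can be sketched outside the database. The snippet below is a plain Python illustration of the matching logic, not Oracle code. The pattern, the function name ticket_count and the fallback of 0 for unparseable codes are my assumptions, and it matches case-insensitively because the sample data mixes '2xBronze' and '%gold%'.

```python
import re

# Hypothetical pattern: a count, a literal 'x', then the pass type,
# found anywhere in the string rather than only at the start.
PASS_PATTERN = re.compile(r"\b(\d+)x([A-Za-z]+)")


def ticket_count(pass_code, passengers):
    """Return the passenger count for gold passes, else the number in the code."""
    match = PASS_PATTERN.search(pass_code)
    if match is None:
        return 0  # same fallback as the ELSE 0 branch in the question
    count = int(match.group(1))
    pass_type = match.group(2).lower()
    return passengers if pass_type == "gold" else count


passes = {100: "2xBronze", 101: "1xGold", 102: "1xSilver", 103: "2xSteel"}
passengers = {100: 2, 101: 5, 102: 1, 103: 3}

tickets = {pid: ticket_count(code, passengers[pid]) for pid, code in passes.items()}
print(tickets)

# The pass can also sit in the middle of a longer value.
print(ticket_count("101001, 1xbronze, FirstClass", 4))
```

In Oracle the equivalent search would be done with REGEXP_SUBSTR and a pattern that is not anchored to the start of the string.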

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine removing one of the them. I don't want to send the request twice with different select bodies. The count column could span to 1000+ so I would prefer to calculate it in SQL rather in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. which has not led me anywhere and this is different to other questions as I do not want to use the 'group by' clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
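To see that the window count really reports the total and not just the three returned rows, here is a small sketch using SQLite through Python's sqlite3 module. SQLite is my substitution for SQL Server, so TOP 3 becomes LIMIT 3, and window functions need SQLite 3.25 or newer. The sample accounts and follow relationships are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE relationships (account INTEGER, following INTEGER);
    """
)
# Made-up data: accounts 10..14 follow account 4, and account 8 follows 10..13.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(i, f"simon{i}") for i in (4, 8, 10, 11, 12, 13, 14)],
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, 4)", [(i,) for i in range(10, 15)]
)
conn.executemany(
    "INSERT INTO relationships VALUES (8, ?)", [(i,) for i in range(10, 14)]
)

# COUNT(*) OVER () is evaluated over all filtered rows before LIMIT trims them.
rows = conn.execute(
    """
    SELECT a.username, COUNT(*) OVER () AS cnt
    FROM relationships r
    JOIN accounts a ON r.account = a.id
    WHERE r.following = 4
      AND EXISTS (
          SELECT 1 FROM relationships r1
          WHERE r1.account = 8 AND r1.following = r.account
      )
    LIMIT 3
    """
).fetchall()

for username, cnt in rows:
    print(username, cnt)
```

Four accounts satisfy the filter here, so each of the three returned rows carries cnt = 4.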

I want a hierarchical order in my output but don't know how?

I want in the output a hierarchical order like so:
My Data :
Name | Cost | Level
----------------+--------+------
Car1 | 2000 | 1
Component1.1 | 3000 | 2
Component1.2 | 2300 | 3
Computer2 | 5000 | 1
Component2.1 | 2000 | 2
Component2.2 | Null | 3
Output: show all rows that have a cost, ordered by level in sequence: first 1, then 2, then 3, and after that start again with 1.
Name | Level
------------------+------
Car1 | 1
Component1.1 | 2
Component1.2 | 3
Computer2 | 1
Component2.1 | 2
What ORDER BY does is:
Name | Level
----------------+------
Car1 | 1
Computer2 | 1
Component1.1 | 2
Component2.1 | 2
Component1.2 | 3
Component2.2 | 3
I tried the CONNECT BY PRIOR function and it didn't work well
SELECT Name, Level
FROM Product
CONNECT BY PRIOR Level;
In MySQL you would normally use ORDER BY. So if you want to order by the column "level", your syntax would be something like this:
SELECT * FROM items ORDER BY level ASC
You can make use of ASC (Ascending) or DESC (descending).
Hope this will help you.
You can use the query below:
SELECT Name, Level
FROM Product
WHERE Cost IS NOT NULL
ORDER BY Name, Level;
Ordering by Name will print the result in hierarchical order, but if the table had a parent id column, this would be easier to show.
I suggest this:
SELECT Name, Level FROM Product WHERE Cost IS NOT NULL ORDER BY Name, Level ASC;
I hope this helps.
It is still not clear whether this is what you are really expecting. It seems to me from your data set, you want to numerically order the components based on some kind of a version number at the end of the component. If that's truly what you want then you may ignore the non-numeric characters in the name and order by pure numbers towards the end of string ( with the required where clause ).
ORDER BY REPLACE ( name, TRANSLATE(name,' .0123456789',' '),'');
If that's the case, then adding level to the ORDER BY as well shouldn't make any difference, as long as the numeric order of your versions and the levels are in sync.
A problem may appear if you have components like component2_name1.2 etc which could break this logic, for which you may require REGEXP to identify the required pattern. But, it doesn't appear so from your data and I assumed that to be the case and you may want to clarify if that's not what you may always have in your dataset.
Here's a demo of the result obtained for your sample data.
This will work if the numeric part is always a valid decimal with only one decimal point. If you have a more complex versioning system, say 1.1.8, 2.1.1 etc., it needs far more sophisticated ordering on top of REPLACE ( name, TRANSLATE(name,' .0123456789',' '),'').
You will find such examples in posts such as this one Here
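To illustrate what such a more sophisticated ordering has to do, here is a rough Python sketch (not Oracle SQL) that sorts names by their trailing version number, comparing each dot-separated part as an integer. The function name version_key and the regular expression are my own, and the sketch assumes the version always sits at the very end of the name.

```python
import re


def version_key(name):
    """Turn a trailing version like '1.1.8' into the tuple (1, 1, 8)."""
    match = re.search(r"(\d+(?:\.\d+)*)$", name)
    if match is None:
        return ()  # names without a trailing version sort first
    return tuple(int(part) for part in match.group(1).split("."))


names = ["Component2.1", "Car1", "Component1.2", "Computer2", "Component1.1"]
ordered = sorted(names, key=version_key)
print(ordered)
```

Comparing integer tuples rather than strings is what keeps 1.1.10 after 1.1.8, which a plain string sort would get wrong.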
Note: I would request you to also please read the instructions here to know how to ask a good question. This would avoid all confusion to people who try to understand and answer your question.

How can I select only rows with multiple hits for a specific column?

I am not sure how to phrase this question so I'll give an example:
Suppose there is a table called tagged that has two columns: tagger and taggee. What would the SQL query look like to return the taggee(s) that are in multiple rows? That is to say, they have been tagged 2 or more times by any tagger.
I would like a 'generic' SQL query and not something that only works on a specific DBMS.
EDIT: Added "tagged 2 or more times by any tagger."
HAVING can operate on the result of aggregate functions. So if you have data like this:
Row tagger | taggee
--------+----------
1. Joe | Cat
2. Fred | Cat
3. Denise | Dog
4. Joe | Horse
5. Denise | Horse
It sounds like you want Cat, Horse.
To get the taggees that are in multiple rows, you would execute:
SELECT taggee, count(*) FROM tagged GROUP BY taggee HAVING count(*) > 1
That being said, when you say "select only rows with multiple hits for a specific column", which row do you want? Do you want row 1 for Cat, or row 2?
select distinct t1.taggee from tagged t1 inner join tagged t2
on t1.taggee = t2.taggee and t1.tagger != t2.tagger;
This will give you all the taggees who have been tagged by more than one tagger.
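Both queries are plain standard SQL, so they can be tried on almost any engine. As a quick check, here is a minimal sketch that runs them against the sample rows above in SQLite via Python's sqlite3 module. The choice of SQLite and the added ORDER BY clauses (for stable output) are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (tagger TEXT, taggee TEXT)")
conn.executemany(
    "INSERT INTO tagged VALUES (?, ?)",
    [
        ("Joe", "Cat"),
        ("Fred", "Cat"),
        ("Denise", "Dog"),
        ("Joe", "Horse"),
        ("Denise", "Horse"),
    ],
)

# GROUP BY / HAVING: taggees that appear in two or more rows.
by_having = conn.execute(
    """
    SELECT taggee, COUNT(*) FROM tagged
    GROUP BY taggee HAVING COUNT(*) > 1
    ORDER BY taggee
    """
).fetchall()

# Self-join: taggees tagged by at least two different taggers.
by_join = conn.execute(
    """
    SELECT DISTINCT t1.taggee
    FROM tagged t1
    INNER JOIN tagged t2
      ON t1.taggee = t2.taggee AND t1.tagger != t2.tagger
    ORDER BY t1.taggee
    """
).fetchall()

print(by_having)
print(by_join)
```

On this data both return Cat and Horse. They would differ if the same tagger tagged the same taggee twice: the HAVING version counts rows, while the self-join only counts distinct taggers.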