HiveSQL: how to merge lines ? - hive

i have one question about Hive and SQL need to resolve, please help me !
now i have one table like this:
id col1 col2
1 10 100
1 20 100
1 30 200
2 10 100
2 10 200
2 20 200
then i want to be like this:
id 100 200
1 10,20 30
2 10 10,20
how should i do?

Related

How to calculate leftovers of each balance top-up using first in first out technique?

Imagine we have user balances. There's a table with top-up and withdrawals. Let's call it balance_updates.
transaction_id
user_id
current_balance
amount
created_at
1
1
100
100
...
2
1
0
-100
3
2
400
400
4
2
300
-100
5
2
200
-200
6
2
300
100
7
2
50
-50
What I want to get off this is a list of top-ups and their leftovers using the first in first out technique for each user.
So the result could be this
top_up
user_id
leftover
1
1
0
3
2
50
6
2
100
Honestly, I struggle to turn it to SQL. Tho I know how to do it on paper. Got any ideas?

SQL: Selecting Random Sample based on ID with multiple rows for each ID

My data has the following Structure
ID
Month
Year
Revenue
1
1
20
860
1
2
20
22
1
5
20
339
2
3
20
12098
3
3
20
12
3
4
20
10
3
6
20
9
3
7
20
122
3
8
20
11
There are 1000s of IDs and I want to select a random sample of 100 IDs. So if I randomly select ID 3, I need all rows of data for ID 3. I have to use SQL for this. I welcome any suggestions.
You can use following query.
For MS-Sql
Select top 100 * from table_name where ID=$randomId ORDER BY NEWID(); //like ID=3
For My-Sql
Select * from table_name where ID=$randomId ORDER BY RAND() LIMIT 100; //like ID=3

Sum Multiple Rows But Retain the Number of Rows in a Result

In my SQL Server 2008 stored procedure, I have a table variable with RecordID, TotalMinutes, ProcessID.
Declare #tblSum table(RecordID int, TotalMinutes int, ProcessID int)
RecordID is my primary key, total minutes is the total minutes, and I have different processes but these processes are repeated multiple times on my data.
Here is an example of my data:
RecordID TotalMinutes ProcessID
--------------------------------------------
1 10 1
2 20 1
3 30 1
4 10 2
5 40 2
6 10 2
7 10 3
8 55 3
9 60 3
10 15 4
My plan is to return the data by totaling or adding all the data with same ProcessID and put it on a new table variable with FinalMinutes column just like the table below:
RecordID TotalMinutes ProcessID FinalMinutes
-----------------------------------------------------
1 10 1 60
2 20 1 60
3 30 1 60
4 10 2 80
5 60 2 80
6 10 2 80
7 10 3 125
8 55 3 125
9 60 3 125
10 15 4 15
I cannot do a group by since it will cut the result into 4 rows. I need to retain the number of rows, and every data it has, I will just add a FinalMinutes column on a new table variable.
Here is one way using SUM()Over() windowed aggregate function
Select *,
FinalMinutes = sum(TotalMinutes)over(partition by ProcessID)
From yourtable

SQL Query to turn my columns un to rows

Im looking for a way to make this happen:
From this:
id date budget forecast actual
0 1/1/111 100 200 5
1 2/1/111 10 20 3
2 3/1/111 300 5000 1
3 4/1/111 400 800 0
To this:
column1 column2 column3 column4 column5
id 0 1 2 3
date 1/11/1111 2/11/1111 3/11/1111 4/11/1111
budget 100 10 300 400
forecast 200 20 5000 800
actual 5 3 1 0
Is there any way of doing this in SQL?
Thanks in advance.

Possible to group by counts?

I am trying to change something like this:
Index Record Time
1 10 100
1 10 200
1 10 300
1 10 400
1 3 500
1 10 600
1 10 700
2 10 800
2 10 900
2 10 1000
3 5 1100
3 5 1200
3 5 1300
into this:
Index CountSeq Record LastTime
1 4 10 400
1 1 3 500
1 2 10 700
2 3 10 1000
3 3 5 1300
I am trying to apply this logic per unique index -- I just included three indexes to show the outcome.
So for a given index I want to combine them by streaks of the same Record. So notice that the first four entries for Index 1 have Records 10, but it is more succinct to say that there were 4 entries with record 10, ending at time 400. Then I repeat the process going forward, in sequence.
In short I am trying to perform a count-grouping over sequential chunks of the same Record, within each index. In other words I am NOT looking for this:
select index, count(*) as countseq, record, max(time) as lasttime
from Table1
group by index,record
Which combines everything by the same record whereas I want them to be separated by sequence breaks.
Is there a way to do this in SQL?
It's hard to solve your problem without having a single primary key, so I'll assume you have a primary key column that increases each row (primkey). This request would return the same table with a 'diff' column that has value 1 if the previous primkey row has the same index and record as the current one, 0 otherwise :
SELECT *,
IF((SELECT index, record FROM yourTable p2 WHERE p1.primkey = p2.primkey)
= (SELECT index, record FROM yourTable p2 WHERE p1.primkey-1 = p2.primkey), 1, 0) as diff
FROM yourTable p1
If you use a temporary variable that increases each time the IF expression is false, you would get a result like this :
primkey Index Record Time diff
1 1 10 100 1
2 1 10 200 1
3 1 10 300 1
4 1 10 400 1
5 1 3 500 2
6 1 10 600 3
7 1 10 700 3
8 2 10 800 4
9 2 10 900 4
10 2 10 1000 4
11 3 5 1100 5
12 3 5 1200 5
13 3 5 1300 5
Which would solve your problem, you would just add 'diff' to the group by clause.
Unfortunately I can't test it on sqlite, but you should be able to use variables like this.
It's probably a dirty workaround but I couldn't find any better way, hope it helps.