Hive: convert columns into a row

Here is the data I have in Hive:
customers
id | name
---+-------
1 | n1
2 | n2
orders
oid | cid | amt
----+-----+----
1 | 1 | 10
2 | 1 | 20
3 | 1 | 30
4 | 2 | 10
I would like to get something like this:
cid, avg(amt), oid1, oid2, oid3, ... etc.
In other words, I want each customer ID, the average of amt, and all the order IDs associated with that cid in one row.
I have come up with something like this:
select c.id, avg(o.amt), .... from customers c join orders o on c.id = o.cid;
Could someone please fill in how to achieve this?

It would be difficult to have a column for each order id (I am assuming that there will be a varying number of orders for each customer) but you could collect them to an array and make it its own column. Also, you said you want custid, avg_amt, and all the orders; since this doesn't include name you don't need to join customers to orders.
Query
select cid
,AVG(amt) as avg_amt
,collect_list(oid) as orders_array
from orders
group by cid;
Output
1 20 [1,2,3]
2 10 [4]

Related

Join table A on table B and select only the first occurrence from B after specific date from table A

I'm trying to determine the best way to do the following. Table a has a specific start date. Table b has a bunch of dollar amounts with various dates, based on which payments were received and when. I only want to show the row from table b with the first date on or after (>=) the start date from table a. I also do not want to retrieve duplicate ID numbers, which is what I am encountering now.
I have something like this so far...
Select a.ID, a.Start_Dt
From a
Left Join (Select ID, Min(Recd_Dt) as Mindate, Total_Recd
From b
Group by ID, Total_Recd) b on a.ID = b.ID and a.Start_Dt <= b.Mindate
table a looks like this...
ID | Start_Dt
1 | 11/2/2017
2 | 11/3/2017
table b looks like this...
ID | Recd_Dt | Total_Recd
1 | 11/1/2017 | $600
1 | 11/10/2017 | $800
1 | 11/19/2017 | $100
2 | 11/2/2017 | $200
2 | 11/5/2017 | $600
2 | 11/6/2017 | $100
I'd like to see something like this...
ID | Recd_Dt | Total_Recd | Sum_of_Total_Recd_After_Start
1 | 11/10/2017 | $800 | $900
2 | 11/5/2017 | $600 | $700
Furthermore, I'd also like a second join on the same table b that gives me the sum of all amounts received after the Start_Dt.
Give this a try:
SELECT
a.ID,
b.Recd_Dt,
b.Total_Recd,
SUM(b.Total_Recd) OVER(PARTITION BY a.ID) AS Sum_of_Total_Recd_After_Start
FROM a
INNER JOIN b ON a.ID = b.ID AND b.Recd_Dt >= a.Start_Dt
QUALIFY ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY b.Recd_Dt) = 1
1) Get all rows from table "a"
2) Get related rows from table "b" with Recd_Dt >= Start_Dt
3) ROW_NUMBER orders the rows for each ID by Recd_Dt, earliest first
4) QUALIFY ... = 1 keeps only the first row per ID grouping
5) SUM(Total_Recd) adds up the Total_Recd column per each ID grouping
I haven't tested it, but let me know if it works.
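For engines without QUALIFY, the same idea can be sketched with a derived table that is filtered on the row number. The example below is a rough check using Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes the dates are stored as ISO strings so that string comparison matches date order, and it keeps payments on or after the start date, as the question asks; the alias names rn and Sum_After_Start are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ID INTEGER, Start_Dt TEXT);
    CREATE TABLE b (ID INTEGER, Recd_Dt TEXT, Total_Recd INTEGER);
    -- Sample data from the question, with dates rewritten as YYYY-MM-DD.
    INSERT INTO a VALUES (1, '2017-11-02'), (2, '2017-11-03');
    INSERT INTO b VALUES
        (1, '2017-11-01', 600), (1, '2017-11-10', 800), (1, '2017-11-19', 100),
        (2, '2017-11-02', 200), (2, '2017-11-05', 600), (2, '2017-11-06', 100);
""")

# The inner query numbers the qualifying payments per ID (earliest first)
# and computes the per-ID total; the outer query keeps only row number 1.
first_payments = conn.execute("""
    SELECT ID, Recd_Dt, Total_Recd, Sum_After_Start
    FROM (
        SELECT a.ID,
               b.Recd_Dt,
               b.Total_Recd,
               SUM(b.Total_Recd) OVER (PARTITION BY a.ID) AS Sum_After_Start,
               ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY b.Recd_Dt) AS rn
        FROM a
        JOIN b ON a.ID = b.ID AND b.Recd_Dt >= a.Start_Dt
    ) AS t
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

for row in first_payments:
    print(row)
```

With the sample data this yields one row per ID, matching the desired output in the question (800 with a total of 900 for ID 1, 600 with a total of 700 for ID 2).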

pulling data from max field

I have a table structure with columns similar to the following:
ID | line | value
1 | 1 | 10
1 | 2 | 5
2 | 1 | 6
3 | 1 | 7
3 | 2 | 4
Ideally, I'd like to pull the following:
ID | value
1 | 5
2 | 6
3 | 4
One solution would be to do something like the following:
select a.ID, a.value
from
myTable a
inner join (select id, max(line) as line from myTable group by id) b
on a.id = b.id and a.line = b.line
Given the size of the table and that this is just a part of a larger pull, I'd like to see if there's a more elegant / simpler way of pulling this directly.
This is a task for OLAP-functions:
select *
from myTable a
qualify
rank() -- assign a rank for each id
over (partition by id
order by line desc) = 1
This might return multiple rows per id if they share the same max line. If you want to return only one of them, add another column to the order by to make it unique, or switch to row_number to get an indeterminate row.
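The difference between rank and row_number on ties can be seen in a small sketch. It uses Python's sqlite3 module (SQLite 3.25 or later), which has no QUALIFY, so the filter moves into an outer WHERE. The extra row (3, 2, 9) is hypothetical and was added only to create a tie on the max line; the helper name latest_rows is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INTEGER, line INTEGER, value INTEGER);
    INSERT INTO myTable VALUES (1, 1, 10), (1, 2, 5), (2, 1, 6), (3, 1, 7), (3, 2, 4);
    -- Hypothetical extra row (not in the question): a second line 2 for ID 3.
    INSERT INTO myTable VALUES (3, 2, 9);
""")

def latest_rows(window_function):
    # window_function is "rank" or "row_number". It is interpolated into the
    # SQL text, so only pass trusted constants here.
    sql = f"""
        SELECT ID, value
        FROM (
            SELECT ID, value,
                   {window_function}() OVER (PARTITION BY ID ORDER BY line DESC) AS rn
            FROM myTable
        ) AS t
        WHERE rn = 1
        ORDER BY ID, value
    """
    return conn.execute(sql).fetchall()

with_rank = latest_rows("rank")              # keeps both tied rows for ID 3
with_row_number = latest_rows("row_number")  # keeps exactly one row per ID

print(with_rank)
print(with_row_number)
```

Which of the two tied rows row_number keeps for ID 3 is not defined, which is why adding a tie-breaking column to the order by is the safer choice when one specific row is needed.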

SUM in multi-currency

I am trying to do SUM() in a multi-currency setup. The following will demonstrate the problem that I am facing:-
Customer
-------------------------
Id | Name
1 | Mr. A
2 | Mr. B
3 | Mr. C
4 | Mr. D
-------------------------
Item
-------------------------
Id | Name | Cost | Currency
1 | Item 1 | 5 | USD
2 | Item 2 | 2 | EUR
3 | Item 3 | 10 | GBP
4 | Item 4 | 5 | GBP
5 | Item 5 | 50 | AUD
6 | Item 6 | 20 | USD
7 | Item 3 | 10 | EUR
-------------------------
Order
-------------------------
User_Id | Product_Id
1 | 1
2 | 1
1 | 2
3 | 3
1 | 5
1 | 7
1 | 5
2 | 6
3 | 4
4 | 2
-------------------------
Now, I want the output of a SELECT query that lists the Customer Name and the total amount worth of products purchased as:-
Customer Name | Amount
Mr. A | Multiple-currencies
Mr. B | 25 USD
Mr. C | 15 GBP
Mr. D | 2 EUR
So basically, I am looking for a way to add the cost of multiple products under the same customer, if all of them have the same currency, else simply show 'multiple-currencies'. Running the following query will not help:-
SELECT Customer.Name, SUM(Item.Cost) FROM Customer
INNER JOIN "Order" ON "Order".User_Id = Customer.Id
INNER JOIN Item ON Item.Id = "Order".Product_Id
GROUP BY Customer.Name
What should my query be? I am using SQLite.
I would suggest two output columns, one for the currency and one for the amount:
SELECT c.Name,
(case when max(i.Currency) = min(i.Currency) then sum(i.Cost)
end) as amount,
(case when max(i.Currency) = min(i.Currency) then max(i.Currency)
else 'Multiple Currencies'
end) as currency
FROM Customer c INNER JOIN
"Order" o
ON o.User_Id = c.Id INNER JOIN
Item i
ON i.Id = o.Product_Id
GROUP BY c.Name
If you want, you can concatenate these into a single string column. I just prefer to have the information in two different columns for something like this.
The above is standard SQL.
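As a quick check of the MIN/MAX comparison, here is a sketch against the sample data using Python's built-in sqlite3 module. It assumes the item price column is named Cost, as in the table listing, quotes "Order" because it is a reserved word, and concatenates amount and currency into the single column the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER, Name TEXT);
    CREATE TABLE Item (Id INTEGER, Name TEXT, Cost INTEGER, Currency TEXT);
    CREATE TABLE "Order" (User_Id INTEGER, Product_Id INTEGER);
    INSERT INTO Customer VALUES (1, 'Mr. A'), (2, 'Mr. B'), (3, 'Mr. C'), (4, 'Mr. D');
    INSERT INTO Item VALUES
        (1, 'Item 1', 5, 'USD'), (2, 'Item 2', 2, 'EUR'), (3, 'Item 3', 10, 'GBP'),
        (4, 'Item 4', 5, 'GBP'), (5, 'Item 5', 50, 'AUD'), (6, 'Item 6', 20, 'USD'),
        (7, 'Item 3', 10, 'EUR');
    INSERT INTO "Order" VALUES
        (1, 1), (2, 1), (1, 2), (3, 3), (1, 5), (1, 7), (1, 5), (2, 6), (3, 4), (4, 2);
""")

# If the smallest and largest currency codes in a group are equal, every
# row in that group uses the same currency and the sum is meaningful.
totals = conn.execute("""
    SELECT c.Name,
           CASE WHEN MIN(i.Currency) = MAX(i.Currency)
                THEN SUM(i.Cost) || ' ' || MAX(i.Currency)
                ELSE 'Multiple-currencies'
           END AS Amount
    FROM Customer c
    JOIN "Order" o ON o.User_Id = c.Id
    JOIN Item i ON i.Id = o.Product_Id
    GROUP BY c.Id, c.Name
    ORDER BY c.Name
""").fetchall()

for name, amount in totals:
    print(name, "|", amount)
```

Grouping by c.Id as well as c.Name is a small deviation from the query above; it keeps two customers who happen to share a name from being merged.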
I think your query should look like this:
SELECT
Data.Name AS [Customer Name],
CASE WHEN Data.Count > 1 THEN 'Multiple-currencies' ELSE CAST(Data.Amount AS NVARCHAR) END AS Amount
FROM
(SELECT
Customer.Name,
COUNT(DISTINCT Item.Currency) AS Count,
SUM(Item.Cost) AS Amount
FROM
Customer
INNER JOIN "Order" ON "Order".User_Id = Customer.Id
INNER JOIN Item ON Item.Id = "Order".Product_Id
GROUP BY
Customer.Name) AS Data
The subquery gets the count of distinct currencies per customer, and the main query uses it to show either the total or the text "Multiple-currencies".
Sorry if there is any mistake or typo; I don't have a database server to test it.
Hope this helps.
IMO I would start by standardizing column names. Why call it Id in the Customer table and User_Id in the Order table? Just a pet peeve. Anyway, you should learn how to build queries.
Start by joining the customer table to the order table, then join the result to the item table. The first join is on the customer ID and the second is on the product ID. Once you have that working, use SUM and GROUP BY.
OK, I managed to solve the problem this way:
SELECT innerQuery.Name AS Name, (CASE WHEN innerQuery.Currencies = 1 THEN (innerQuery.Amount || ' ' || innerQuery.Currency) ELSE 'Multiple-currencies' END) AS Amount FROM
(SELECT Customer.Name, SUM(Item.Cost) AS Amount, COUNT(DISTINCT Item.Currency) AS Currencies, Item.Currency AS Currency FROM Customer
INNER JOIN "Order" ON "Order".User_Id = Customer.Id
INNER JOIN Item ON Item.Id = "Order".Product_Id
GROUP BY Customer.Name) innerQuery

Select row that has max total value SQL Server

I have the following scheme (2 tables):
Customer (Id, Name) and
Sale (Id, CustomerId, Date, Sum)
How can I select the following data?
1) Best customer of all time (Customer, which has Max Total value in the Sum column)
For example, I have 2 tables (Customers and Sales respectively):
id CustomerName
---|--------------
1 | First
2 | Second
3 | Third
id CustomerId datetime Sum
---|----------|------------|-----
1 | 1 | 04/06/2013 | 50
2 | 2 | 04/06/2013 | 60
3 | 3 | 04/07/2013 | 30
4 | 1 | 03/07/2013 | 50
5 | 1 | 03/08/2013 | 50
6 | 2 | 03/08/2013 | 30
7 | 3 | 24/09/2013 | 20
Desired result:
CustomerName TotalSum
------------|--------
First | 150
2) Best customer of each month in the current year (the same as previous but for each month in the current year)
Thanks.
Try this for the best customer of all time:
SELECT TOP 1 WITH TIES c.CustomerName, SUM(s.Sum) AS TotalSum
FROM Customer c JOIN Sales s ON s.CustomerId = c.Id
GROUP BY c.Id, c.CustomerName
ORDER BY SUM(s.Sum) DESC
One option is to use RANK() combined with the SUM aggregate. This will get you the overall values.
select customername, sumtotal
from (
select c.customername,
sum(s.sum) sumtotal,
rank() over (order by sum(s.sum) desc) rnk
from customer c
join sales s on c.id = s.customerid
group by c.id, c.customername
) t
where rnk = 1
Grouping this by month and year should be trivial at that point.
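One way the per-month variant might look is sketched below with Python's sqlite3 module (SQLite 3.25 or later) rather than SQL Server. It assumes the sample dates are dd/mm/yyyy and stores them as ISO strings; in SQL Server the month key would more likely come from YEAR() and MONTH() than from strftime. The CTE names monthly and ranked and the :year parameter are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER, CustomerName TEXT);
    CREATE TABLE Sales (Id INTEGER, CustomerId INTEGER, "Date" TEXT, "Sum" INTEGER);
    INSERT INTO Customer VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    -- Sample rows from the question, read as dd/mm/yyyy and stored as YYYY-MM-DD.
    INSERT INTO Sales VALUES
        (1, 1, '2013-06-04', 50), (2, 2, '2013-06-04', 60), (3, 3, '2013-07-04', 30),
        (4, 1, '2013-07-03', 50), (5, 1, '2013-08-03', 50), (6, 2, '2013-08-03', 30),
        (7, 3, '2013-09-24', 20);
""")

# Step 1 (monthly): total per customer per month for the requested year.
# Step 2 (ranked): rank customers within each month by that total.
# Step 3: keep rank 1, which may return several customers on a tie.
best_per_month = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', s."Date") AS month,
               c.CustomerName,
               SUM(s."Sum") AS TotalSum
        FROM Customer c
        JOIN Sales s ON s.CustomerId = c.Id
        WHERE strftime('%Y', s."Date") = :year
        GROUP BY strftime('%Y-%m', s."Date"), c.Id, c.CustomerName
    ),
    ranked AS (
        SELECT month, CustomerName, TotalSum,
               RANK() OVER (PARTITION BY month ORDER BY TotalSum DESC) AS rnk
        FROM monthly
    )
    SELECT month, CustomerName, TotalSum
    FROM ranked
    WHERE rnk = 1
    ORDER BY month
""", {"year": "2013"}).fetchall()

for row in best_per_month:
    print(row)
```

The year is passed as a parameter here because the sample data is from 2013; for "the current year" it could be derived from the current date instead.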

SQL: keep one record from n-records with same foreign key

I have some records and want to keep only the lowest (min) number for each customer:
This:
Customer | Number
1 | 2
1 | 4
2 | 1
1 | 3
2 | 2
should be transformed via SQL to:
Customer | Number
1 | 2
2 | 1
I am using a Sybase ADS local table.
You should be able to use min() and group by the customer:
select customer, min(number)
from yourtable
group by customer
SELECT Customer, MIN(Number)
FROM your_table
GROUP BY Customer;