Finding the maximum price for a given customer id - hive

I need to write a Hive query. I am working on a data set that has three columns: Customer ID, Product ID and Price. I need a query that outputs the Customer ID and Product ID of the most expensive item bought by the customer.

SELECT o.customer, o.product FROM table o WHERE o.price = (SELECT MAX(t.price)
FROM table t WHERE t.customer = o.customer)
Something like this could work if you want to find the most expensive item that a customer has purchased. The outer table needs its own alias so that the subquery compares against the outer row's customer rather than its own. I'm not sure the syntax is 100% correct for Hive, but it should give you something to go from. I've added a cheat sheet below for Hive just in case.
Hive Cheat Sheet

Using row_number():
select Customer_ID, Product_ID
from
(select Customer_ID,
Product_ID,
row_number () over ( partition by Customer_ID order by Price desc) rn
from table
where customer_id=given_customer_id --add filter if necessary
)s
where rn=1;
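To see the row_number() approach on concrete rows, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module rather than in Hive. It assumes a bundled SQLite of version 3.25 or later (needed for window functions); the table name purchases and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (customer_id, product_id, price).
rows = [
    (1, "A", 10.0),
    (1, "B", 25.0),
    (2, "A", 10.0),
    (2, "C", 7.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, product_id TEXT, price REAL)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)

# Number each customer's purchases by price, highest first, and keep row 1.
top_purchases = conn.execute(
    """
    SELECT customer_id, product_id
    FROM (
        SELECT customer_id, product_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY price DESC) AS rn
        FROM purchases
    ) AS s
    WHERE rn = 1
    ORDER BY customer_id
    """
).fetchall()

print(top_purchases)
```

If two products tie for a customer's highest price, row_number() keeps only one of them, and which one is not defined unless the ORDER BY gets a tie-breaker.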

Related

Sum amount group by id in temporary table

This is my code:
SELECT ROW_NUMBER()OVER(order by date) as rowNo, id, amount
INTO #temptbl
FROM sales
WHERE code = 1000
I'm trying to group this temporary table and sum the amount by its id so the id will be unique.
*The row column is mandatory because it will be used later.
I've tried a few ways like this:
SELECT ROW_NUMBER()OVER(order by date) as rowNo, id, amount
INTO #temptbl
FROM sales
WHERE code = 1000
GROUP BY id
and even tried subqueries and nesting, but it won't work. I know the solution must be simple, I just can't see it yet. Thank you.
You can use SUM with Partition BY like this:
SELECT ROW_NUMBER()OVER(order by date) as rowNo, id,
Sum(amount) over (partition by id) sumAmount
FROM sales
WHERE code = 1000
Note that because you also want the row number, every source row is kept, so the summed amount is repeated on each row that shares an id.
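The difference between the repeated-sum result and a one-row-per-id result can be sketched as follows. This runs in SQLite via Python's sqlite3 module (version 3.25 or later assumed), not in SQL Server, so the temp-table step is left out. The column name sale_date and the sample rows are hypothetical, and the second query (aggregate first, then number the grouped rows) is an alternative that is not part of the answer above.

```python
import sqlite3

# Hypothetical sample data: (sale_date, id, amount, code).
rows = [
    ("2024-01-01", 1, 100, 1000),
    ("2024-01-02", 2, 50, 1000),
    ("2024-01-03", 1, 25, 1000),
    ("2024-01-04", 3, 70, 2000),  # excluded by the code = 1000 filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, id INTEGER, amount INTEGER, code INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Windowed sum: every source row survives, so the total repeats per id.
repeated = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY sale_date) AS rowNo,
           id,
           SUM(amount) OVER (PARTITION BY id) AS sumAmount
    FROM sales
    WHERE code = 1000
    ORDER BY rowNo
    """
).fetchall()

# Alternative: group first so each id appears once, then number the groups
# by the earliest sale date of each id.
unique_ids = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY first_date) AS rowNo, id, sumAmount
    FROM (
        SELECT id, MIN(sale_date) AS first_date, SUM(amount) AS sumAmount
        FROM sales
        WHERE code = 1000
        GROUP BY id
    ) AS grouped
    ORDER BY rowNo
    """
).fetchall()

print(repeated)
print(unique_ids)
```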

Redshift: Alternative to filter 1st row after "row_number() partition by", with better performance

In this example, I try to get the datetime and product name of the 1st order from each customer.
My query looks like this:
select * from(
select customerid,
orderdatetime,
productname,
row_number() over (partition by customerid order by orderdatetime) rn
from t
) where rn=1
In table t, customerid+orderdatetime can serve as the primary key, while productname is free text. There is a huge number of customers, and each customer has made a significant number of orders.
I feel that much calculation is wasted in the order by in this query, because I want only the earliest (minimum) row. Is there really such waste? Is there an alternative way to get the result that is faster?
I'm using Amazon Redshift.
You can try a correlated subquery. Since customerid and orderdatetime together form the primary key, the minimum orderdatetime identifies exactly one row per customer,
so it may give better performance:
select t.* from your_table t
where orderdatetime = (select min(orderdatetime) from your_table t1
where t1.customerid=t.customerid
)
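As a quick check that the correlated subquery returns the same rows as the row_number() version, here is a minimal sketch run in SQLite through Python's sqlite3 module (version 3.25 or later assumed). The sample rows are hypothetical, and the sketch says nothing about relative performance on Redshift, which would have to be measured there.

```python
import sqlite3

# Hypothetical sample data: (customerid, orderdatetime, productname).
rows = [
    (1, "2024-01-05 10:00", "pen"),
    (1, "2024-01-02 09:00", "book"),
    (2, "2024-02-01 12:00", "lamp"),
    (2, "2024-03-01 08:00", "desk"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (customerid INTEGER, orderdatetime TEXT, productname TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Original approach: number each customer's orders and keep the first.
via_row_number = conn.execute(
    """
    SELECT customerid, orderdatetime, productname
    FROM (
        SELECT customerid, orderdatetime, productname,
               ROW_NUMBER() OVER (PARTITION BY customerid
                                  ORDER BY orderdatetime) AS rn
        FROM t
    ) AS numbered
    WHERE rn = 1
    ORDER BY customerid
    """
).fetchall()

# Suggested alternative: keep the row whose timestamp is the customer's minimum.
via_subquery = conn.execute(
    """
    SELECT t.customerid, t.orderdatetime, t.productname
    FROM t
    WHERE t.orderdatetime = (SELECT MIN(t1.orderdatetime)
                             FROM t AS t1
                             WHERE t1.customerid = t.customerid)
    ORDER BY t.customerid
    """
).fetchall()

print(via_row_number)
print(via_subquery)
```

The two results match here only because customerid plus orderdatetime is unique; with duplicate timestamps the subquery version would return every tied row.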

SQL generate ranks of groups and subgroups based on third column

I want to write a SQL query to generate ranks of groups and subgroups based on a third column (Price in this case). I know we can use dense_rank() to generate ranks based on one column, but I have no idea how to generate the two rank columns shown below in a single query.
Both the rankings are based on price. So J3 comes first because J3 sum(price) is 1600. J1 comes second because J1 sum(price) is 1500 and so on.
Any inputs are appreciated.
I have provided the sample input and output. The name of the input table is "RENTAL"
First roll up the prices to the jet_type level, then rank all jet_types by the rolled-up price, and finally use a window function in the outer query, partitioned by jet_type and ordered by price descending, to create rank_service_wthin_jet:
select a.jet_type, b.rownum rank_jet, a.service_type, a.price,
row_number() over(partition by a.jet_type order by a.price desc) rank_service_wthin_jet
from yourtable a join (
select jet_type, row_number() over(order by price desc) rownum from (
select jet_type, sum(price) price from yourtable
group by jet_type)a)b on a.jet_type=b.jet_type
You can generate two columns as:
select t.*,
dense_rank() over (order by jet_type) as rank_jet,
row_number() over (partition by jet_type order by price desc) as rank_service_within_jet
. . .
This does not exactly return what is in your table. But the results are quite similar and -- even more important -- make sense.
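The roll-up-then-join approach from the first answer can be tried on a small table. The sketch below runs in SQLite via Python's sqlite3 module (version 3.25 or later assumed). The original sample input is not reproduced in this thread, so the rows are hypothetical; they are chosen only so that J3 totals 1600 and J1 totals 1500, as the question states. The alias names totals, total_price and rank_jet are also assumptions.

```python
import sqlite3

# Hypothetical rows: (jet_type, service_type, price).
rows = [
    ("J1", "S1", 1000),
    ("J1", "S2", 500),
    ("J2", "S1", 400),
    ("J3", "S1", 900),
    ("J3", "S2", 700),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (jet_type TEXT, service_type TEXT, price INTEGER)")
conn.executemany("INSERT INTO rental VALUES (?, ?, ?)", rows)

ranked = conn.execute(
    """
    SELECT a.jet_type, b.rank_jet, a.service_type, a.price,
           ROW_NUMBER() OVER (PARTITION BY a.jet_type
                              ORDER BY a.price DESC) AS rank_service_within_jet
    FROM rental AS a
    JOIN (
        -- Rank jet types by their summed price, highest total first.
        SELECT jet_type,
               ROW_NUMBER() OVER (ORDER BY total_price DESC) AS rank_jet
        FROM (
            SELECT jet_type, SUM(price) AS total_price
            FROM rental
            GROUP BY jet_type
        ) AS totals
    ) AS b ON a.jet_type = b.jet_type
    ORDER BY b.rank_jet, rank_service_within_jet
    """
).fetchall()

for row in ranked:
    print(row)
```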

select multiple records based on order by

i have a table with a bunch of customer IDs. these IDs are also in a customer table, but each id can appear on multiple records for the same customer. i want to select the most recently used record, which i can get by doing order by <my_field> desc
say i have 100 customer IDs in this table and in the customers table there are 120 records with these IDs (some are duplicates). how can i apply my order by condition to get only the most recent matching records?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudocode for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
You can use the date and time to pick out the most recent record for the customer.
Order by the column that holds the customer's date and time.
e.g.:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Sql query to select every distinct and last row along with other columns which are retrieved using inner join

I'm just starting with SQL, so I'll try to describe my problem as best I can. I've been reading other questions, but none seems to fit my case. So...
I need to find the last paid price of each Item at each supplier from the InventoryEntriesDetail table.
As the table InventoryEntriesDetail contains both sales and purchases documents, I've joined InventoryEntriesDetail table to the InventoryEntries table and filtered it to get only the purchases docs.
Now I have several rows with: supplier, item, date, docNr, etc. But I need only the last record for each supplier / item.
Any suggestions ?
As requested, here is where I get so far
SELECT
dbo.MA_InventoryEntries.CustSupp as Supplier,
dbo.MA_InventoryEntriesDetail.Item as Item,
dbo.MA_InventoryEntriesDetail.PostingDate as Date,
dbo.MA_InventoryEntries.InvRsn AS InvRsn,
dbo.MA_InventoryEntriesDetail.Qty as Qty,
dbo.MA_InventoryEntriesDetail.UnitValue as Price,
dbo.MA_InventoryEntriesDetail.DiscountFormula as Discount,
dbo.MA_InventoryEntriesDetail.EntryId as ID
FROM
dbo.MA_InventoryEntriesDetail
LEFT OUTER JOIN dbo.MA_InventoryEntries
ON dbo.MA_InventoryEntriesDetail.EntryId = dbo.MA_InventoryEntries.EntryId
WHERE
dbo.MA_InventoryEntries.CustSuppType = 6094849
ORDER BY
Supplier,
Item,
Date
Some notes:
the last EntryId is not always the record that I'm searching for
the left join is a 'must' because InvRsn was added to InventoryEntriesDetail only a few years ago (it has always been in InventoryEntries, not in the detail table)
the DB I'm using has its first records back in 2001, so the tables we are joining have over 11 million records
PS: thanks Chris Albert for correcting the question. Now it's much clearer. :)
You can use ROW_NUMBER() to identify the last record per supplier, item partition:
SELECT supplier, item, date, docNr, qty, price, DiscountFormula
FROM (
SELECT supplier, item, date, docNr, qty, price, DiscountFormula,
ROW_NUMBER() OVER (PARTITION BY supplier, item
ORDER BY date DESC) AS rn
FROM yourtable ) t
WHERE t.rn = 1
You can use RANK() in place of ROW_NUMBER() if a supplier / item pair can have more than one 'last' record (tied on date) and you want to select all of them.
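The difference between the two functions only shows up when dates tie. Here is a minimal sketch in SQLite via Python's sqlite3 module (version 3.25 or later assumed); the table name entries, the column posting_date and the sample rows are hypothetical stand-ins for the real tables.

```python
import sqlite3

# Hypothetical rows: (supplier, item, posting_date, price).
# Supplier S1 has two entries for item I1 on the same latest date.
rows = [
    ("S1", "I1", "2024-01-01", 10),
    ("S1", "I1", "2024-02-01", 12),
    ("S1", "I1", "2024-02-01", 13),
    ("S2", "I1", "2024-01-15", 9),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (supplier TEXT, item TEXT, posting_date TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)


def last_rows(rank_function):
    """Return the rows numbered 1 per supplier/item by the given function.

    rank_function must be the fixed string "ROW_NUMBER" or "RANK";
    it is interpolated into the SQL, so never pass user input here.
    """
    query = f"""
        SELECT supplier, item, posting_date, price
        FROM (
            SELECT supplier, item, posting_date, price,
                   {rank_function}() OVER (PARTITION BY supplier, item
                                           ORDER BY posting_date DESC) AS rn
            FROM entries
        ) AS t
        WHERE t.rn = 1
        ORDER BY supplier, item, price
    """
    return conn.execute(query).fetchall()


row_number_rows = last_rows("ROW_NUMBER")  # one row per supplier/item
rank_rows = last_rows("RANK")              # all rows tied on the latest date

print(row_number_rows)
print(rank_rows)
```

With ROW_NUMBER() the tied pair yields a single row, and which of the two is kept is arbitrary unless the ORDER BY includes a tie-breaker.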
EDIT:
This is a guess at what your actual query should look like (after the edit made in the OP):
SELECT Supplier, Item, Date, InvRsn, Qty, Price, Discount, ID
FROM (
SELECT
dbo.MA_InventoryEntries.CustSupp as Supplier,
dbo.MA_InventoryEntriesDetail.Item as Item,
dbo.MA_InventoryEntriesDetail.PostingDate as Date,
dbo.MA_InventoryEntries.InvRsn AS InvRsn,
dbo.MA_InventoryEntriesDetail.Qty as Qty,
dbo.MA_InventoryEntriesDetail.UnitValue as Price,
dbo.MA_InventoryEntriesDetail.DiscountFormula as Discount,
dbo.MA_InventoryEntriesDetail.EntryId as ID,
RANK() OVER (PARTITION BY dbo.MA_InventoryEntries.CustSupp,
dbo.MA_InventoryEntriesDetail.Item
ORDER BY dbo.MA_InventoryEntriesDetail.PostingDate DESC) AS rn
FROM
dbo.MA_InventoryEntriesDetail
LEFT OUTER JOIN dbo.MA_InventoryEntries
ON dbo.MA_InventoryEntriesDetail.EntryId = dbo.MA_InventoryEntries.EntryId
WHERE
dbo.MA_InventoryEntries.CustSuppType = 6094849 ) t
WHERE t.rn = 1
ORDER BY Supplier, Item, Date