Get distinct of minimum value - hive

I have this table test in Hive.
+----------+-------+-------+
| name | price | notes |
+----------+-------+-------+
| product1 | 100 | |
| product1 | 200 | note1 |
| product2 | 10 | note2 |
| product2 | 5 | note2 |
+----------+-------+-------+
and I expect to get this result (one row per product, the one with the minimum price):
+----------+-------+-------+
| name | price | notes |
+----------+-------+-------+
| product1 | 100 | |
| product2 | 5 | note2 |
+----------+-------+-------+
I can't use the following query because of different notes in product1.
SELECT name, MIN(price), notes
FROM test
GROUP BY name, notes;
+----------+-------+-------+
| name | price | notes |
+----------+-------+-------+
| product1 | 100 | |
| product1 | 200 | note1 |
| product2 | 5 | note2 |
+----------+-------+-------+

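Remove notes from the GROUP BY and try again (Hive rejects this form because notes is neither grouped nor aggregated; MySQL accepts it only with ONLY_FULL_GROUP_BY disabled, and then returns an arbitrary notes value for each name):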
SELECT name, MIN(price), notes
FROM test
GROUP BY name

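Try this (MySQL syntax; Hive has no GROUP_CONCAT):
SELECT name,
  -- ascending order puts the minimum price first in each concatenated list
  SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price ASC), ',', 1) AS min_price,
  -- GROUP_CONCAT skips NULLs, so a NULL note would shift this list out of step with price
  SUBSTRING_INDEX(GROUP_CONCAT(notes ORDER BY price ASC), ',', 1) AS note_value
FROM test
GROUP BY name;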

You can do this in Hive with windowing functions.
Query:
select distinct name
     , min_price
     , notes
from (
    select *
         , min(price) over (partition by name) as min_price
    from test
) x
where min_price = price;
Output:
product1 100
product2 5 note2
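A minimal sketch for checking this windowing pattern locally: it runs an equivalent query over the question's sample rows with Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) stands in for Hive here, the blank note for product1 is assumed to be an empty string rather than NULL, and the helper name cheapest_per_name is hypothetical.

```python
import sqlite3

# Sample rows from the question; the blank note is assumed to be '' (not NULL).
ROWS = [
    ("product1", 100, ""),
    ("product1", 200, "note1"),
    ("product2", 10, "note2"),
    ("product2", 5, "note2"),
]

# Tag every row with its group's minimum price, then keep only the rows
# whose own price equals that minimum.
QUERY = """
    SELECT DISTINCT name, min_price, notes
    FROM (
        SELECT *, MIN(price) OVER (PARTITION BY name) AS min_price
        FROM test
    ) AS x
    WHERE price = min_price
    ORDER BY name
"""


def cheapest_per_name(rows):
    """Load rows into an in-memory table named test and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (name TEXT, price INTEGER, notes TEXT)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(cheapest_per_name(ROWS))
# [('product1', 100, ''), ('product2', 5, 'note2')]
```

Rows that tie on the minimum price are all returned (one per distinct notes value); if exactly one row per name is required, ROW_NUMBER() would be the usual substitute for MIN() OVER.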

This can be found using a subquery as well.
select A.name, A.price, B.notes
from (select name, min(price) as price from products group by name) as A
inner join (select name, price, notes from products) as B
on A.name = B.name and A.price = B.price;
The above query will give the output as:
product1 100
product2 5 note2
But the subquery approach reads the same table twice, so it is not recommended for larger tables.
For larger tables, see GoBrewers14's window-function answer:
select name, price, notes
from (select *, min(price) over (partition by name) as min_price from products) as a
where a.price = a.min_price;
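For comparison, a sketch of the subquery-and-join version under the same assumptions (Python's sqlite3 standing in for Hive, a blank note stored as ''; the helper name cheapest_via_join is hypothetical). Note that products appears twice in the FROM clause, which is the second pass over the table that the answer warns about.

```python
import sqlite3

ROWS = [
    ("product1", 100, ""),
    ("product1", 200, "note1"),
    ("product2", 10, "note2"),
    ("product2", 5, "note2"),
]

# Aggregate first (a), then join back to the full table (b) to recover notes.
QUERY = """
    SELECT a.name, a.price, b.notes
    FROM (SELECT name, MIN(price) AS price FROM products GROUP BY name) AS a
    INNER JOIN products AS b
        ON a.name = b.name AND a.price = b.price
    ORDER BY a.name
"""


def cheapest_via_join(rows):
    """Load rows into an in-memory table named products and run QUERY on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE products (name TEXT, price INTEGER, notes TEXT)")
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(cheapest_via_join(ROWS))
# [('product1', 100, ''), ('product2', 5, 'note2')]
```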

Related

Retrieve the minimal create date with multiple rows

I have an issue with an SQL query that I am trying to write. For each inst (see table), I am trying to retrieve the row with the minimal create date, along with its amount (which isn't unique).
Unfortunately I can't use group by as the amount column isn't unique.
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company A | 400 | 4545 | 01/11/2018 |
| Company A | 200 | 4545 | 31/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
| Company B | 212 | 4893 | 04/10/2016 |
| Company B | 100 | 4893 | 10/10/2017 |
| Company B | 20 | 4893 | 04/10/2018 |
+--------------+--------+------+-------------+
In the above example I expect to see:
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
+--------------+--------+------+-------------+
Code:
SELECT
bill_company, bill_name, account_no
FROM
dbo.customer_information;
SELECT
balance_id, balance_id2, minus_balance,new_balance,
create_date, account_no
FROM
dbo.btr
SELECT
balance_id, balance_id2, expired_Date, amount, balance_type, account_no
FROM
dbo.btr_balance
SELECT
balance_ist, expired_date, account_no, balance_type
FROM
dbo.BALANCE_inst
Retrieve the minimal create date for a balance instance with the lowest balance for a balance inst.
(SELECT
bill_company,
bill_name,
account_no,
balance_ist,
amount,
MIN(create_date)
FROM
dbo.mtr btr
LEFT JOIN
btr_balance btrb ON btr.balance_id = btrb.balance_id
AND btr.balance_id2 = btrb.balance_id2
LEFT JOIN
balance_inst bali ON btr.account_no = bali.account_no
AND btrb.expired_date = bali.expired_date
GROUP BY
bill_company, bill_name, account_no,amount, balance_ist)
I have seen some solutions that use a correlated query, but I can't seem to get my head around it.
A common table expression (CTE) with ROW_NUMBER() will help you:
;with cte as (
select *, row_number() over(partition by company_name order by create_date) rn
from dbo.myTable
)
select * from cte
where rn = 1;
Use ROW_NUMBER(). I assumed bill_company is your company name:
select * from
( SELECT bill_company,
bill_name,
account_no,
balance_ist,
amount,
create_date,
row_number() over(partition by bill_company order by create_date) rn
FROM dbo.mtr btr left join btr_balance btrb
on btr.balance_id = btrb.balance_id and btr.balance_id2 = btrb.balance_id2
left join balance_inst bali
on btr.account_no = bali.account_no and btrb.expired_date = bali.expired_date
) t where t.rn=1
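A minimal, runnable sketch of the ROW_NUMBER() idea on the sample rows from the question, using Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for window functions). Two assumptions: the partition is by inst, since the question asks for one row per inst (in this sample that is the same as partitioning by company), and dates are stored as ISO-8601 text (YYYY-MM-DD) so that text order is chronological. The dd/mm/yyyy strings shown in the question would not sort correctly as text, so in a real table create_date should be a date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (company_name TEXT, amount INTEGER, inst INTEGER, create_date TEXT)"
)
# Dates converted from the question's dd/mm/yyyy to ISO-8601 for correct ordering.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("Company A", 1000, 4545, "2018-10-01"),
        ("Company A", 400, 4545, "2018-11-01"),
        ("Company A", 200, 4545, "2018-10-31"),
        ("Company B", 2000, 4893, "2016-10-01"),
        ("Company B", 212, 4893, "2016-10-04"),
        ("Company B", 100, 4893, "2017-10-10"),
        ("Company B", 20, 4893, "2018-10-04"),
    ],
)

# Number the rows of each inst by date; rn = 1 is the earliest row.
QUERY = """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY inst ORDER BY create_date) AS rn
        FROM myTable
    )
    SELECT company_name, amount, inst, create_date
    FROM cte
    WHERE rn = 1
    ORDER BY inst
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
# [('Company A', 1000, 4545, '2018-10-01'), ('Company B', 2000, 4893, '2016-10-01')]
```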

how to bake in a record count in a sql query

I have a query that looks like this:
select id, extension, count(distinct(id)) from publicids group by id,extension;
This is what the results look like:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 1
18459154909 | 9891114 | 1
18459154919 | 43244 | 1
18459154919 | 8776232 | 1
18766145025 | 12311 | 1
18766145025 | 1122111 | 1
18766145201 | 12422 | 1
18766145201 | 14141 | 1
But what I really want is for the results to look like this:
id | extension | count
-------------+-------------------------+-------
18459154909 | 12333 | 2
18459154909 | 9891114 | 2
18459154919 | 43244 | 2
18459154919 | 8776232 | 2
18766145025 | 12311 | 2
18766145025 | 1122111 | 2
18766145201 | 12422 | 2
18766145201 | 14141 | 2
I'm trying to get the count field to show the total number of records that have the same id.
Any suggestions would be appreciated.
I think you want to count distinct extensions, not ids.
Run this query:
select id
, extension
, (select count(*) from publicids p1 where p.id = p1.id) distinct_id_count
from publicids p
group by id, extension;
This is more or less the same as Pastor's answer. Depending on what the optimizer does, it might be faster on source tables with higher record counts.
select p.id, p.extension, p2.id_count
from publicids p
inner join (
select id, count(*) as id_count
from publicids group by id
) as p2 on p.id = p2.id
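A small sketch that runs both the correlated-subquery answer and the join answer over the sample ids with Python's sqlite3 module (a stand-in for the asker's database) and checks that they agree. Extensions are stored as text here, an assumption made only so that the ORDER BY is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publicids (id INTEGER, extension TEXT)")
conn.executemany(
    "INSERT INTO publicids VALUES (?, ?)",
    [
        (18459154909, "12333"), (18459154909, "9891114"),
        (18459154919, "43244"), (18459154919, "8776232"),
        (18766145025, "12311"), (18766145025, "1122111"),
        (18766145201, "12422"), (18766145201, "14141"),
    ],
)

# Correlated subquery: for each outer row, count the rows sharing its id.
CORRELATED = """
    SELECT id, extension,
           (SELECT COUNT(*) FROM publicids p1 WHERE p1.id = p.id) AS id_count
    FROM publicids p
    GROUP BY id, extension
    ORDER BY id, extension
"""

# Join: count rows per id once, then attach that count to every row.
JOINED = """
    SELECT p.id, p.extension, p2.id_count
    FROM publicids p
    INNER JOIN (SELECT id, COUNT(*) AS id_count FROM publicids GROUP BY id) AS p2
        ON p.id = p2.id
    ORDER BY p.id, p.extension
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(joined_rows[:2])
# [(18459154909, '12333', 2), (18459154909, '9891114', 2)]
```

The join form keeps one row per source row, so duplicate (id, extension) pairs in the source would appear twice; the correlated form collapses them through its GROUP BY.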

How to map data using multi-level text patterns?

How do I write this SQL statement?
Table_Product
+------------------+
| Product |
+------------------+
| AAA |
| ABB |
| ABC |
| ACC |
+------------------+
Table_Mapping
+---------------+---------------+
| ProductGroup | ProductName |
+---------------+---------------+
| A* | Product1 |
| ABC | Product2 |
+---------------+---------------+
I need the following result:
+------------+---------------+
| Product | ProductName |
+------------+---------------+
| AAA | Product1 |
| ABB | Product1 |
| ABC | Product2 |
| ACC | Product1 |
+------------+---------------+
Thanks,
TOM
The following query does what you describe when run from within the Access application itself:
SELECT Table_Product.Product, Table_Mapping.ProductName
FROM
Table_Product
INNER JOIN
Table_Mapping
ON Table_Product.Product = Table_Mapping.ProductGroup
WHERE InStr(Table_Mapping.ProductGroup, "*") = 0
UNION ALL
SELECT Table_Product.Product, Table_Mapping.ProductName
FROM
Table_Product
INNER JOIN
Table_Mapping
ON Table_Product.Product LIKE Table_Mapping.ProductGroup
WHERE InStr(Table_Mapping.ProductGroup, "*") > 0
AND Table_Product.Product NOT IN (SELECT ProductGroup FROM Table_Mapping)
ORDER BY 1
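To try the same two-part logic outside Access, here is a sketch using Python's sqlite3 module. SQLite's LIKE uses % where Access uses *, so the sketch assumes the wildcard can be translated on the fly with replace(); apart from that translation (and SQLite's instr() in place of InStr), it follows the structure of the Access query: exact matches first, then wildcard matches for products that have no exact mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_Product (Product TEXT)")
conn.execute("CREATE TABLE Table_Mapping (ProductGroup TEXT, ProductName TEXT)")
conn.executemany(
    "INSERT INTO Table_Product VALUES (?)", [("AAA",), ("ABB",), ("ABC",), ("ACC",)]
)
conn.executemany(
    "INSERT INTO Table_Mapping VALUES (?, ?)", [("A*", "Product1"), ("ABC", "Product2")]
)

QUERY = """
    -- Exact matches for mapping rows that contain no wildcard.
    SELECT p.Product, m.ProductName
    FROM Table_Product p
    INNER JOIN Table_Mapping m ON p.Product = m.ProductGroup
    WHERE instr(m.ProductGroup, '*') = 0
    UNION ALL
    -- Wildcard matches (* translated to %), skipping exactly mapped products.
    SELECT p.Product, m.ProductName
    FROM Table_Product p
    INNER JOIN Table_Mapping m ON p.Product LIKE replace(m.ProductGroup, '*', '%')
    WHERE instr(m.ProductGroup, '*') > 0
      AND p.Product NOT IN (SELECT ProductGroup FROM Table_Mapping)
    ORDER BY 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
# [('AAA', 'Product1'), ('ABB', 'Product1'), ('ABC', 'Product2'), ('ACC', 'Product1')]
```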
You would want to use CASE WHEN. Try this:
Select Product, (CASE
WHEN Product = 'AAA' THEN 'Product1'
WHEN Product = 'ABB' THEN 'Product1'
WHEN Product = 'ABC' THEN 'Product2'
WHEN Product = 'ACC' THEN 'Product1'
ELSE NULL END) AS ProductName
from Table_Product
order by Product
If this is literally how it is implemented, then you can use:
Select P.Product, M.ProductName
From Table_Product P
inner join Table_Mapping M
On P.Product like M.ProductGroup & '*'
You would have to change the A* ProductGroup value to A, though; i.e. you can't embed the wildcards in the record. If you want to have wildcards at the start, it will run a lot slower.

Sql Inner Join among 2 tables summing the qty field multiple times

I have two tables, A and B.
Table A contains:
OrderNo | StyleNo | Qty
O-20 | S-15 | 20
O-20 | S-18 | 40
O-25 | S-19 | 50
Table B contains:
OrderNo | StyleNo | Ship Qty
O-20 | S-15 | 5
O-20 | S-18 | 30
O-20 | S-15 | 12
O-20 | S-18 | 6
Required result:
OrderNo | StyleNo | Qty | Ship Qty
O-20 | S-15 | 20 | 17
O-20 | S-18 | 40 | 36
O-25 | S-19 | 50 | 0
The following query is not working:
select
B.Orderno, B.StyleNo, sum(A.Qty), sum(B.QtyShip)
from
A
inner join
B on A.OrderNo = B.OrderNo and A.StyleNo = B.StyleNo
group by
B.OrderNo, B.StyleNo
The issue you're having is that the join repeats each row of A once per matching row of B, so Qty is summed multiple times. Move the sums into subqueries and join those; a LEFT JOIN with COALESCE also keeps O-25, which has no shipments, and shows 0 for it:
select a.orderno, a.styleno, a.qty, coalesce(b.qtyship, 0) as qtyship
from (
  select orderno, styleno, sum(qty) qty
  from a
  group by orderno, styleno
) a
left join (
  select orderno, styleno, sum(qtyship) qtyship
  from b
  group by orderno, styleno
) b on a.orderno = b.orderno and a.styleno = b.styleno
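A sketch that reproduces both the double counting and the pre-aggregation fix on the question's sample data, using Python's sqlite3 module as a stand-in for the asker's database. The column name qtyship follows the query rather than the "Ship Qty" heading in the table, and the subquery aliases a_tot and b_tot are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (orderno TEXT, styleno TEXT, qty INTEGER)")
conn.execute("CREATE TABLE b (orderno TEXT, styleno TEXT, qtyship INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [("O-20", "S-15", 20), ("O-20", "S-18", 40), ("O-25", "S-19", 50)],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [("O-20", "S-15", 5), ("O-20", "S-18", 30), ("O-20", "S-15", 12), ("O-20", "S-18", 6)],
)

# Joining raw rows repeats each row of a once per matching row of b,
# so SUM(a.qty) is inflated (S-15 has two shipments, so 20 is counted twice).
NAIVE = """
    SELECT b.orderno, b.styleno, SUM(a.qty), SUM(b.qtyship)
    FROM a
    INNER JOIN b ON a.orderno = b.orderno AND a.styleno = b.styleno
    GROUP BY b.orderno, b.styleno
    ORDER BY b.orderno, b.styleno
"""

# Aggregate each side to one row per (orderno, styleno) first, then LEFT JOIN
# so that styles with no shipments are kept and reported as 0.
PRE_AGGREGATED = """
    SELECT a_tot.orderno, a_tot.styleno, a_tot.qty, COALESCE(b_tot.qtyship, 0) AS qtyship
    FROM (SELECT orderno, styleno, SUM(qty) AS qty
          FROM a GROUP BY orderno, styleno) AS a_tot
    LEFT JOIN (SELECT orderno, styleno, SUM(qtyship) AS qtyship
               FROM b GROUP BY orderno, styleno) AS b_tot
        ON a_tot.orderno = b_tot.orderno AND a_tot.styleno = b_tot.styleno
    ORDER BY a_tot.orderno, a_tot.styleno
"""

naive_rows = conn.execute(NAIVE).fetchall()
fixed_rows = conn.execute(PRE_AGGREGATED).fetchall()
print(naive_rows)  # [('O-20', 'S-15', 40, 17), ('O-20', 'S-18', 80, 36)]
print(fixed_rows)  # [('O-20', 'S-15', 20, 17), ('O-20', 'S-18', 40, 36), ('O-25', 'S-19', 50, 0)]
```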

Count rows grouped by condition in SQL

We have a table like this:
+----+--------+
| Id | ItemId |
+----+--------+
| 1 | 1100 |
| 1 | 1101 |
| 1 | 1102 |
| 2 | 2001 |
| 2 | 2002 |
| 3 | 1101 |
+----+--------+
We want to count how many items each guy has, and show the guys with 2 items or more. Like this:
+----+-----------+
| Id | ItemCount |
+----+-----------+
| 1 | 3 |
| 2 | 2 |
+----+-----------+
We didn't count the guy with Id = 3 because he's got only 1 item.
How can we do this in SQL?
SELECT id, COUNT(itemId) AS ItemCount
FROM YourTable
GROUP BY id
HAVING COUNT(itemId) > 1
Use this query:
SELECT *
FROM (
SELECT COUNT(ItemId) AS ItemCount, Id FROM ITEM
GROUP BY Id
) my_select
WHERE ItemCount > 1
SELECT id,
count(1)
FROM YOUR_TABLE
GROUP BY id
HAVING count(1) > 1;
select Id, count(ItemId) as ItemCount
from table_name
group by Id
having count(ItemId) > 1
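A quick sketch of the GROUP BY ... HAVING answer on the sample data, using Python's sqlite3 module; the table name items is hypothetical. HAVING filters whole groups after COUNT() has been computed, which is why the condition can't go in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Id INTEGER, ItemId INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 1100), (1, 1101), (1, 1102), (2, 2001), (2, 2002), (3, 1101)],
)

# Count items per Id, then keep only the groups with 2 or more items.
QUERY = """
    SELECT Id, COUNT(ItemId) AS ItemCount
    FROM items
    GROUP BY Id
    HAVING COUNT(ItemId) >= 2
    ORDER BY Id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 3), (2, 2)]
```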