GROUP BY in Hive not working the way I want

Hi, I am trying to output the city and playerid of the player with the most at bats (AB). The task is:
Output the birth city of the player who had the most at bats (AB) in
his career.
I get the result I want, Cinncinati, sander01, 14432, which is correct, but it shows up three times, as below. The same happens for every city, player and total, such as the second highest. I only need one entry; the other two are redundant. I think I did something wrong with GROUP BY. Any help, please?
Cinncinati, sander01, 14432
Cinncinati, sander01, 14432
Cinncinati, sander01, 14432
Chicago, dere90, 12324
Chicago, dere90, 12324
Chicago, dere90, 12324
SELECT a.bcity,a.id, b.ab FROM master a
JOIN
(SELECT id, SUM(ab) as ab FROM batting
GROUP by id) b
ON a.id = b.id
ORDER by b.ab DESC
limit 30;

Use DISTINCT to remove duplicate rows from a result set. As for your question, join the master table with only the top row of the subquery b:
select a.bcity,b.id,b.ab from master a
join
(select id,sum(ab) as ab from batting
group by id
order by ab desc
limit 1
) b
on a.id = b.id
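To illustrate the DISTINCT suggestion, here is a minimal sketch that runs the asker's join in SQLite through Python's sqlite3 module, not in Hive. It assumes the tripled output comes from master holding three identical rows per player; that is only a guess about the data, and the at-bat figures per season are made up so that they add up to the totals in the question.

```python
import sqlite3

# Hypothetical sample data: master holds three identical rows per player,
# which is one way the tripled output in the question could arise.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id TEXT, bcity TEXT);
CREATE TABLE batting (id TEXT, ab INTEGER);
INSERT INTO master VALUES
  ('sander01', 'Cinncinati'), ('sander01', 'Cinncinati'), ('sander01', 'Cinncinati'),
  ('dere90', 'Chicago'), ('dere90', 'Chicago'), ('dere90', 'Chicago');
INSERT INTO batting VALUES
  ('sander01', 14000), ('sander01', 432), ('dere90', 12324);
""")

# Same join as in the question; {distinct} is filled with "" or "DISTINCT".
query = """
SELECT {distinct} a.bcity, a.id, b.ab
FROM master a
JOIN (SELECT id, SUM(ab) AS ab FROM batting GROUP BY id) b
  ON a.id = b.id
ORDER BY b.ab DESC
"""

plain_rows = conn.execute(query.format(distinct="")).fetchall()
distinct_rows = conn.execute(query.format(distinct="DISTINCT")).fetchall()

print(len(plain_rows))  # one joined row per duplicate row in master
print(distinct_rows)    # one row per (city, player, total)
```

If master really has one row per player, the duplicates must come from somewhere else, and DISTINCT would only hide the symptom.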
Alternatively, change LIMIT 30 to LIMIT 1 in your original query to get the same result:
SELECT a.bcity,a.id, b.ab FROM master a
JOIN
(SELECT id, SUM(ab) as ab FROM batting
GROUP by id
) b
ON a.id = b.id
ORDER by b.ab DESC
limit 1;
Note: if several players are tied for the most at bats, LIMIT 1 returns only one of them, so the answer will be incomplete.
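One possible way to handle ties is to compare each player's total with the maximum total instead of using LIMIT 1. This is a different technique from the answers above, shown here as a sketch in SQLite via Python's sqlite3; the player ids, cities and numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id TEXT, bcity TEXT);
CREATE TABLE batting (id TEXT, ab INTEGER);
-- Hypothetical data: p1 and p2 both end up with 1000 career at bats.
INSERT INTO master VALUES ('p1', 'Boston'), ('p2', 'Denver'), ('p3', 'Austin');
INSERT INTO batting VALUES ('p1', 500), ('p1', 500), ('p2', 1000), ('p3', 10);
""")

# Keep every player whose career total equals the overall maximum.
tied_leaders = conn.execute("""
WITH totals AS (
  SELECT id, SUM(ab) AS ab FROM batting GROUP BY id
)
SELECT a.bcity, t.id, t.ab
FROM totals t
JOIN master a ON a.id = t.id
WHERE t.ab = (SELECT MAX(ab) FROM totals)
ORDER BY t.id
""").fetchall()

print(tied_leaders)
```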

Related

SELECT 100 last entries with maximum 3 entries per unique user id

I have the following query to get all artworks joined with their user info:
SELECT a.*, row_to_json(u.*) as users
FROM artworks a INNER JOIN users u USING(address)
WHERE (a.flag != "ILLEGAL" OR a.flag IS NULL)
ORDER BY a.date DESC
LIMIT 100
How could I write the same query but include no more than 3 entries per user?
Each user has a unique id called "address".
I think DISTINCT ON only works for 1 row per user; maybe ROW_NUMBER?
Thank you in advance, I'm pretty new to DB queries.

You need an extra column in which you specify the nth time that the user is in the table. This will look something like this:
USER | N
user1 | 1
user1 | 2
user1 | 3
user2 | 1
user2 | 2
The extra column can be computed in a common table expression, using the following code:
-- Number each user's artworks from newest to oldest in a CTE named T
WITH T AS (
SELECT
a.*,
row_to_json(u.*) as users,
ROW_NUMBER() OVER(PARTITION BY u.address ORDER BY a.date DESC) AS N
FROM artworks a INNER JOIN users u USING(address)
WHERE (a.flag != 'ILLEGAL' OR a.flag IS NULL) )
-- Keep at most three rows per user, then the 100 newest overall
SELECT * FROM T
WHERE T.N <= 3
ORDER BY T.date DESC
LIMIT 100

Just an addition to your original query will do. Count the resulting records for each user and then filter by the counter value.
I am using users.address as the user id.
SELECT * from
(
SELECT a.*, row_to_json(u.*) as userinfo,
row_number() over (partition by u.address order by a.date desc) as ucount
FROM artworks a INNER JOIN users u ON a.address = u.address
WHERE a.flag != 'ILLEGAL' OR a.flag IS NULL
) t
WHERE ucount <= 3
ORDER BY date DESC
LIMIT 100;
A remark - you have users as a column alias and as a table name which may cause confusion. I have changed the alias to userinfo.
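The row_number() approach can be tried out on a small data set. The sketch below runs in SQLite through Python's sqlite3 rather than PostgreSQL, so it leaves out row_to_json, and the table contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (address TEXT PRIMARY KEY, name TEXT);
CREATE TABLE artworks (id INTEGER PRIMARY KEY, address TEXT, date TEXT, flag TEXT);
-- Hypothetical data: user 0xA has five artworks (one flagged ILLEGAL), 0xB has one.
INSERT INTO users VALUES ('0xA', 'alice'), ('0xB', 'bob');
INSERT INTO artworks (id, address, date, flag) VALUES
  (1, '0xA', '2021-01-01', NULL),
  (2, '0xA', '2021-01-02', NULL),
  (3, '0xA', '2021-01-03', 'OK'),
  (4, '0xA', '2021-01-04', NULL),
  (5, '0xA', '2021-01-05', 'ILLEGAL'),
  (6, '0xB', '2021-01-06', NULL);
""")

# Number each user's artworks from newest to oldest, keep the first three
# per user, then take the 100 newest rows overall.
rows = conn.execute("""
SELECT id, address, date
FROM (
  SELECT a.id, a.address, a.date,
         ROW_NUMBER() OVER (PARTITION BY a.address ORDER BY a.date DESC) AS ucount
  FROM artworks a
  INNER JOIN users u ON a.address = u.address
  WHERE a.flag != 'ILLEGAL' OR a.flag IS NULL
) t
WHERE ucount <= 3
ORDER BY date DESC
LIMIT 100
""").fetchall()

print([r[0] for r in rows])  # artwork ids, newest first
```

Artwork 5 is dropped by the flag filter and artwork 1 is dropped because it is the user's fourth remaining entry.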

MySQL subquery with "IN": problem correlating it with the parent table

I am trying to write a query that selects the contact information (table invoice_contacts) together with the address (table invoice_adresses) that is used most often for that contact in table invoice_comptas.
For example, I have two contacts:
Mike
John
Mike has 2 addresses:
Paris
London
Mike has 1 invoice with Paris and 5 invoices with London, so I want the London address associated with Mike.
I tried this query, with a subquery that counts the invoices for each address of the contact (NB_ADRESSES) and selects only the largest (ORDER BY NB_ADRESSES DESC and LIMIT 1). It seems fine, but I get an error on where ia2.ID_CONTACT = ic.ID_CONTACT: ic.ID_CONTACT is not found (and I need to correlate the subquery with the contact).
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE in (
select ia3.ID_ADRESSE
from (
select ia2.ID_ADRESSE,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
where ia2.ID_CONTACT = ic.ID_CONTACT
group by ia2.ID_ADRESSE
order by NB_ADRESSES desc
limit 1
) as ia3
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
I also tried EXISTS and INNER JOIN instead of IN, but did not get good results, so this query seems the best approach to me; I just have not found the solution.
I hope you can help me :)
Thanks
UPDATE:
I finally found a solution with this query:
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE = (
select ia3.ID_ADRESSE
from (
select ia2.*,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
group by ia2.ID_ADRESSE
) as ia3
where ia3.ID_CONTACT = ic.ID_CONTACT
order by NB_ADRESSES desc
limit 1
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
Thanks

Let me rephrase the problem as finding the most common address for each contact across the invoices.
I find it hard to follow your query and your table naming, but this is the idea:
select contact, address
from (select contact, address, count(*) as cnt,
row_number() over (partition by contact order by count(*) desc) as seqnum
from invoices
group by contact, address
) ca
where seqnum = 1;
The subquery counts the number of times a given address (or city, if you prefer) occurs for each contact. The row_number() enumerates these, so the most common one has a value of "1". The outer query then chooses the most common value.
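Here is a small runnable sketch of the same idea, in SQLite via Python's sqlite3 rather than MySQL. It assumes a simplified invoices(contact, address) table like the one in the answer, splits the counting into a CTE for readability, and uses Mike's example from the question; John's address is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (contact TEXT, address TEXT);
-- Mike: 1 invoice with Paris, 5 with London (from the question).
-- John's single Berlin invoice is a hypothetical addition.
INSERT INTO invoices VALUES
  ('Mike', 'Paris'),
  ('Mike', 'London'), ('Mike', 'London'), ('Mike', 'London'),
  ('Mike', 'London'), ('Mike', 'London'),
  ('John', 'Berlin');
""")

# Step 1: count invoices per (contact, address).
# Step 2: rank the addresses of each contact by that count.
# Step 3: keep only the top-ranked address per contact.
most_used = conn.execute("""
WITH counts AS (
  SELECT contact, address, COUNT(*) AS cnt
  FROM invoices
  GROUP BY contact, address
),
ranked AS (
  SELECT contact, address, cnt,
         ROW_NUMBER() OVER (PARTITION BY contact ORDER BY cnt DESC) AS seqnum
  FROM counts
)
SELECT contact, address, cnt
FROM ranked
WHERE seqnum = 1
ORDER BY contact
""").fetchall()

print(most_used)
```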

T SQL Adress Table with the same Company need latest Contact

I have an address table with primary and secondary company locations, for example:
ADDRESSES:
ID  CompanyName     AdressType  MainID  Location
1   ExampleCompany  H           0       Germany
2   ExampleCompany  N           1       Sweden
3   ExampleCompany  N           1       Germany
We also have a contacts table that holds the latest contact for each of the company locations:
CONTACTS:
ID  SuperID  Datecreate  Notes
1   1        10.04.2018  XY
2   3        09.04.2018  YX
3   2        11.04.2018  XX
Now we want to select the latest contact per company and sort them, so that we get a list of all the customers we have not contacted in a long time.
I thought about something like this:
SELECT
ADDRH.ID,
ADDRH.COMPANY1,
TOPCONT.ID,
TOPCONT.DATECREATE,
TOPCONT.NOTES0
FROM dbo.ADDRESSES ADDRH
OUTER APPLY (SELECT TOP 1 ID, SUPERID, DATECREATE, CREATEDBY, NOTES0 FROM DBO.CONTACTS CONT WHERE ADDRH.ID = CONT.SUPERID ORDER BY DATECREATE DESC) TOPCONT
WHERE
TOPCONT.ID IS NOT NULL
ORDER BY TOPCONT.DATECREATE
But this still ignores the fact that the same company appears multiple times in the addresses table. How can I create a list that has each company with its latest contact?
Thanks for your help
Greetings

Well, you have to remove duplicates from address as well. Because of the structure of your data, I think the best approach is to use row_number():
SELECT ac.*
FROM (SELECT a.ID AS AddressID, a.COMPANY1, c.ID AS ContactID, c.DATECREATE, c.NOTES0,
ROW_NUMBER() OVER (PARTITION BY a.COMPANY1 ORDER BY c.DATECREATE DESC) as seqnum
FROM dbo.ADDRESSES a JOIN
DBO.CONTACTS c
ON a.ID = c.SUPERID
WHERE c.ID IS NOT NULL
) ac
WHERE seqnum = 1
ORDER BY ac.DATECREATE;
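A quick way to check this approach is to run it on the sample rows from the question. The sketch below uses SQLite through Python's sqlite3 instead of SQL Server, stores the dates as ISO strings so that they sort correctly, and adds one made-up second company (OtherCompany) to show the final ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ADDRESSES (ID INTEGER, COMPANY1 TEXT, Location TEXT);
CREATE TABLE CONTACTS (ID INTEGER, SUPERID INTEGER, DATECREATE TEXT, NOTES0 TEXT);
-- Rows 1-3 follow the question; row 4 is a hypothetical second company.
INSERT INTO ADDRESSES VALUES
  (1, 'ExampleCompany', 'Germany'),
  (2, 'ExampleCompany', 'Sweden'),
  (3, 'ExampleCompany', 'Germany'),
  (4, 'OtherCompany', 'France');
INSERT INTO CONTACTS VALUES
  (1, 1, '2018-04-10', 'XY'),
  (2, 3, '2018-04-09', 'YX'),
  (3, 2, '2018-04-11', 'XX'),
  (4, 4, '2018-03-01', 'ZZ');
""")

# Rank all contacts of a company (across its addresses) from newest to oldest,
# keep the newest one, and list the companies contacted longest ago first.
rows = conn.execute("""
SELECT COMPANY1, AddressID, ContactID, DATECREATE
FROM (
  SELECT a.COMPANY1, a.ID AS AddressID, c.ID AS ContactID, c.DATECREATE,
         ROW_NUMBER() OVER (PARTITION BY a.COMPANY1 ORDER BY c.DATECREATE DESC) AS seqnum
  FROM ADDRESSES a
  JOIN CONTACTS c ON a.ID = c.SUPERID
) ac
WHERE seqnum = 1
ORDER BY DATECREATE
""").fetchall()

print(rows)
```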

SQL display two results side-by-side

I have two tables, and am doing an ordered select on each of them. I would like to see the results of both ordered selects in one result.
Example (simplified):
"SELECT * FROM table1 ORDER BY visits;"
name|# of visits
----+-----------
AA | 5
BB | 9
CC | 12
.
.
.
"SELECT * FROM table2 ORDER BY spent;"
name|$ spent
----+-------
AA | 20
CC | 30
BB | 50
.
.
.
I want to display the results as two columns so I can visually get a feeling if the most frequent visitors are also the best buyers. (I know this example is bad DB design and not a real scenario. It is an example)
I want to get this:
name by visits|name by spent
--------------+-------------
AA | AA
BB | CC
CC | BB
I am using SQLite.

Select A.Name as NameByVisits, B.Name as NameBySpent
From (Select C.*, RowId as RowNumber From (Select Name From Table1 Order by visits) C) A
Inner Join
(Select D.*, RowId as RowNumber From (Select Name From Table2 Order by spent) D) B
On A.RowNumber = B.RowNumber

Try this:
select
COALESCE(ts.rn, tv.rn),
ts.name,
tv.name
from
(select *, (select count(*) from spent s where s.value >= spent.value) rn from spent) ts
full outer join
(select *, (select count(*) from visits v where v.visits >= visits.visits) rn from visits) tv
on ts.rn = tv.rn
order by COALESCE(ts.rn, tv.rn)
It creates a rank for each entry in each source table and joins the two on that rank. If several entries share the same rank, the result will contain duplicate rows.

I know this is not a direct answer, but I was searching for it, so in case someone needs it: this is a simpler solution for when each query returns a single value:
select
(select roleid from role where rolename='app.roles/anon') roleid, -- the name of the subselect will be the name of the column
(select userid from users where username='pepe') userid; -- same here
Result:
roleid | userid
--------------------------------------+--------------------------------------
31aa33c4-4e66-4da3-8525-42689e46e635 | 12ad8c95-fbef-4287-9834-7458a4b250ee

For an RDBMS that supports common table expressions and window functions (e.g., SQL Server, Oracle, PostgreSQL), I would use:
WITH most_visited AS
(
SELECT ROW_NUMBER() OVER (ORDER BY num_visits) AS num, name, num_visits
FROM visits
),
most_spent AS
(
SELECT ROW_NUMBER() OVER (ORDER BY amt_spent) AS num, name, amt_spent
FROM spent
)
SELECT mv.name, ms.name
FROM most_visited mv INNER JOIN most_spent ms
ON mv.num = ms.num
ORDER BY mv.num
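The same pattern can be sketched in SQLite, which the question uses, provided the SQLite version supports window functions (3.25 or later). The example below runs it through Python's sqlite3 with the table1/table2 data from the question; the column names visits and spent are assumed from the ORDER BY clauses there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, visits INTEGER);
CREATE TABLE table2 (name TEXT, spent INTEGER);
INSERT INTO table1 VALUES ('AA', 5), ('BB', 9), ('CC', 12);
INSERT INTO table2 VALUES ('AA', 20), ('CC', 30), ('BB', 50);
""")

# Number the rows of each table in its own sort order,
# then pair rows that have the same position.
pairs = conn.execute("""
WITH by_visits AS (
  SELECT ROW_NUMBER() OVER (ORDER BY visits) AS num, name FROM table1
),
by_spent AS (
  SELECT ROW_NUMBER() OVER (ORDER BY spent) AS num, name FROM table2
)
SELECT v.name AS name_by_visits, s.name AS name_by_spent
FROM by_visits v
INNER JOIN by_spent s ON v.num = s.num
ORDER BY v.num
""").fetchall()

for name_by_visits, name_by_spent in pairs:
    print(name_by_visits, "|", name_by_spent)
```

With an inner join, extra rows in the longer table are dropped when the two tables have different row counts.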

Just join table1 and table2 with name as the key, as below:
select a.name,
b.name,
a.NumOfVisitField,
b.TotalSpentField
from table1 a
left join table2 b on a.name = b.name

Top 5 with most friends

Hi, I'm new to SQL and I'm trying to figure out how to get the top 5 "bands" with the most friends (userId). This is what I have: a usertbl with userId as PK, a bandsTbl with bandId as PK, and a table bandfriends with the FKs userId and bandId.
bandfriends
userid | bandId
---------------
1 | 1
1 | 2
1 | 3
Thanks!

SELECT TOP 5 bandId, fanCount
FROM
(SELECT bandId, COUNT(*) as fanCount
FROM bandfriends
GROUP BY bandId) AS counts
ORDER BY fanCount DESC
You can also optionally specify WITH TIES in the select statement.

select top 5 b.b_name, count(friends) as numOfFriends
from bands b inner join link l on b.b_id = l.bands inner join
friends f on f.f_id = l.friends
group by b.b_name
order by numOfFriends desc
This works for me if you have a friends table, a bands table and a link table :)

Read up on COUNT and GROUP BY at mysql.org
You'll want something like this (I haven't tested it):
SELECT bandId, COUNT(*) as fans FROM bandfriends
GROUP BY bandId
ORDER BY fans DESC
LIMIT 5;
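For anyone who wants to try the GROUP BY / ORDER BY / LIMIT form, here is a minimal sketch in SQLite via Python's sqlite3 rather than MySQL. The bandfriends rows are generated so that band N has N fans, which is purely an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bandfriends (userid INTEGER, bandId INTEGER)")

# Hypothetical data: band N is liked by users 1..N, so band 6 has the most fans.
friend_rows = [(user, band) for band in range(1, 7) for user in range(1, band + 1)]
conn.executemany("INSERT INTO bandfriends VALUES (?, ?)", friend_rows)

# Count fans per band, sort by that count, keep the five largest.
top_bands = conn.execute("""
SELECT bandId, COUNT(*) AS fans
FROM bandfriends
GROUP BY bandId
ORDER BY fans DESC
LIMIT 5
""").fetchall()

print(top_bands)
```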
