SQL calculate item frequency using multiple / dependent columns? - sql

I'm completely new to SQL. I have read StackOverflow posts and other sources to try and figure this out, but I'm unable to do it in SQL. Here goes...
I have a table of 3 columns and thousands of rows, with data for first 2 columns. The third column is currently empty and I need to populate the third column based on data already in the first and second columns.
Say I have states in the first column and fruit entries in the second column. I need to write an SQL statement(s) that calculates the number of different states where each fruit comes from, and then inserts this popularity number into the third column for every row. A popularity number of 1 in that row means that fruit only comes from one state, a popularity number of 4 means the fruit comes from 4 states. So my table is currently like:
state fruit popularity
hawaii apple
hawaii apple
hawaii banana
hawaii kiwi
hawaii kiwi
hawaii mango
florida apple
florida apple
florida apple
florida orange
michigan apple
michigan apple
michigan apricot
michigan orange
michigan pear
michigan pear
michigan pear
texas apple
texas banana
texas banana
texas banana
texas grape
And I need to figure out how to calculate and then update the third column, named popularity, which is the number of states that export that fruit. The goal is to produce (sorry bad pun) the table below, where, based on the above table, "apple" appears in all 4 states, orange and banana appear in 2 states, and kiwi, mango, apricot, pear, and grape appear in only 1 state, hence their corresponding popularity numbers.
state fruit popularity
hawaii apple 4
hawaii apple 4
hawaii banana 2
hawaii kiwi 1
hawaii kiwi 1
hawaii mango 1
florida apple 4
florida apple 4
florida apple 4
florida orange 2
michigan apple 4
michigan apple 4
michigan apricot 1
michigan orange 2
michigan pear 1
michigan pear 1
michigan pear 1
texas apple 4
texas banana 2
texas banana 2
texas banana 2
texas grape 1
My small programmer brain says to loop through the data in some kind of script, but from reading up a little on SQL and databases, it seems you don't write long, slow looping scripts in SQL (I'm not even sure you can), and that there are better/faster ways to do this in SQL.
Anyone know how to, in SQL statement(s), calculate and update the third column for each row, which is here called popularity and corresponds to the number of states that each fruit comes from? Thanks for reading, very grateful for any help.
So far I have tried these SQL statements below, which output but don't quite get me what I need:
--outputs those fruits appearing multiple times in the table
SELECT fruit, COUNT(*)
FROM table
GROUP BY fruit
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--outputs those fruits appearing only once in the table
SELECT fruit, COUNT(*)
FROM table
GROUP BY fruit
HAVING COUNT(*) = 1
--outputs list of unique fruits in the table
SELECT COUNT (DISTINCT(fruit))
FROM table

If you simply want to update your table with the popularity, it would look like:
update my_table x
set popularity = ( select count(distinct state)
from my_table
where fruit = x.fruit )
If you want to select the data then you can use an analytic query:
select state, fruit
, count(distinct state) over ( partition by fruit ) as popularity
from my_table
This provides the number of distinct states, per fruit.
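To try the correlated-subquery UPDATE without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. Running it on SQLite is my assumption, not part of the answer, and the function name popularity_after_update is hypothetical. The inner table is aliased instead of the update target so the statement stays portable.

```python
import sqlite3

# Sample rows copied from the question: (state, fruit)
ROWS = [
    ("hawaii", "apple"), ("hawaii", "apple"), ("hawaii", "banana"),
    ("hawaii", "kiwi"), ("hawaii", "kiwi"), ("hawaii", "mango"),
    ("florida", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"),
    ("michigan", "apple"), ("michigan", "apple"), ("michigan", "apricot"),
    ("michigan", "orange"), ("michigan", "pear"), ("michigan", "pear"),
    ("michigan", "pear"),
    ("texas", "apple"), ("texas", "banana"), ("texas", "banana"),
    ("texas", "banana"), ("texas", "grape"),
]


def popularity_after_update():
    """Run the UPDATE on an in-memory table and return {fruit: popularity}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (state TEXT, fruit TEXT, popularity INTEGER)"
    )
    conn.executemany("INSERT INTO my_table (state, fruit) VALUES (?, ?)", ROWS)
    # Correlated subquery: for each row being updated, count the distinct
    # states among all rows that share its fruit.
    conn.execute(
        """
        UPDATE my_table
        SET popularity = (SELECT COUNT(DISTINCT inner_t.state)
                          FROM my_table AS inner_t
                          WHERE inner_t.fruit = my_table.fruit)
        """
    )
    result = dict(conn.execute("SELECT DISTINCT fruit, popularity FROM my_table"))
    conn.close()
    return result


print(popularity_after_update())
```

Every row of a given fruit ends up with the same popularity, which is why SELECT DISTINCT fruit, popularity yields exactly one pair per fruit.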

I ran this and got what (I think) you want:
WITH t
AS (SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
UNION ALL
SELECT 'hawaii' as STATE, 'mango' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'florida' as STATE, 'orange' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'apricot' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'orange' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'apple' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'banana' as fruit FROM dual
UNION ALL
SELECT 'texas' as STATE, 'grape' as fruit FROM dual)
SELECT state,
fruit,
count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
FROM t;
Returned
florida apple 4
florida apple 4
florida apple 4
hawaii apple 4
hawaii apple 4
michigan apple 4
michigan apple 4
texas apple 4
michigan apricot 1
hawaii banana 2
texas banana 2
texas banana 2
texas banana 2
texas grape 1
hawaii kiwi 1
hawaii kiwi 1
hawaii mango 1
florida orange 2
michigan orange 2
michigan pear 1
michigan pear 1
michigan pear 1
Obviously, you'd only need to run:
SELECT state,
fruit,
count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
FROM table_name;
Hope it helps...

If your table is #fruit...
To count the different states for each fruit
select fruit, COUNT(distinct state) statecount from #fruit group by fruit
and so to update the table with these values
update #fruit
set popularity
= statecount
from
#fruit
inner join
(select fruit, COUNT(distinct state) statecount from #fruit group by fruit) sc
on #fruit.fruit = sc.fruit

This should get you most of the way there. Basically you want to get a count of distinct states that the fruit is in and then use that to join back to the original table.
update t
set t.popularity = cnts.cnt
from table t
inner join
(
select fruit, count(distinct state) as cnt
from table
group by fruit) cnts
on cnts.fruit = t.fruit
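The two steps described above (count distinct states per fruit, then join back) can be sketched outside the database as well. This is only an illustration of the logic in plain Python, not a replacement for the SQL; the function name add_popularity and the sample rows are made up for the example.

```python
from collections import defaultdict


def add_popularity(rows):
    """rows: iterable of (state, fruit); returns [(state, fruit, popularity)]."""
    rows = list(rows)
    # Step 1: the GROUP BY fruit / COUNT(DISTINCT state) part.
    states_per_fruit = defaultdict(set)
    for state, fruit in rows:
        states_per_fruit[fruit].add(state)
    # Step 2: the join back to the original rows on fruit.
    return [(state, fruit, len(states_per_fruit[fruit])) for state, fruit in rows]


sample = [
    ("hawaii", "apple"), ("florida", "apple"), ("florida", "apple"),
    ("florida", "orange"), ("michigan", "orange"), ("texas", "grape"),
]
for row in add_popularity(sample):
    print(row)
```

Using a set per fruit is what makes duplicate (state, fruit) rows count only once, which is the role DISTINCT plays in the query.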

Another option:
SELECT fruit
, COUNT(*)
FROM
(
SELECT state
, fruit
, ROW_NUMBER() OVER (PARTITION BY state, fruit ORDER BY NULL) rn
FROM t
)
WHERE rn = 1
GROUP BY fruit
ORDER BY fruit;
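As I read it, the query above keeps one row per (state, fruit) pair (rn = 1) and then counts the surviving rows per fruit. A rough Python sketch of that same logic, with a hypothetical function name states_per_fruit and made-up sample rows:

```python
from collections import Counter


def states_per_fruit(rows):
    """rows: iterable of (state, fruit) tuples; returns Counter {fruit: n_states}."""
    # Deduplicating the pairs has the same effect as keeping rn = 1
    # within each (state, fruit) partition.
    unique_pairs = set(rows)
    # Counting the remaining pairs per fruit mirrors GROUP BY fruit / COUNT(*).
    return Counter(fruit for _state, fruit in unique_pairs)


sample = [
    ("hawaii", "kiwi"), ("hawaii", "kiwi"),
    ("hawaii", "banana"), ("texas", "banana"), ("texas", "banana"),
]
print(states_per_fruit(sample))
```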

Try this:
select a.*,b.total
from [table] as a
left join
(
SELECT fruit,count(distinct [state]) as total
FROM [table]
group by fruit
) as b
on a.fruit = b.fruit
Note that this is SQL Server code; adjust it for your database if necessary.

try this
create table states([state] varchar(10),fruit varchar(10),popularity int)
INSERT INTO states([state],fruit)
VALUES('hawaii','apple'),
('hawaii','apple'),
('hawaii','banana'),
('hawaii','kiwi'),
('hawaii','kiwi'),
('hawaii','mango'),
('florida','apple'),
('florida','apple'),
('florida','apple'),
('florida','orange'),
('michigan','apple'),
('michigan','apple'),
('michigan','apricot'),
('michigan','orange'),
('michigan','pear'),
('michigan','pear'),
('michigan','pear'),
('texas','apple'),
('texas','banana'),
('texas','banana'),
('texas','banana'),
('texas','grape')
update t set t.popularity=a.cnt
from states t inner join
(SELECT fruit,count(distinct [state]) as cnt
FROM states
group by fruit) a
on t.fruit =a.fruit

Related

SQL SELECT products which are available in multiple countries

i have this table of fruits with market and prices (table is actually just an excerpt)
Product Market Price
Apple UK 4
Apple DE 5
Apple US 4
Apple IT 3
Banana US 2
Orange UK 1
Kiwi ES 3
Kiwi DE 10
Kiwi US 12
Kiwi UK 11
Cucumb IT 5
Cucumb DE 4
Cucumb UK 3
Peach IT 12
Peach DE 10
Peach UK 10
Peach US 11
Now i only want to select (or group) products which are available in all four markets DE, UK, IT and US. Which should result in this table:
Product Market Price
Apple UK 4
Apple DE 5
Apple US 4
Apple IT 3
Peach IT 12
Peach DE 10
Peach UK 10
Peach US 11
I have tried it with group by+having+Count Distinct, but it does not work. See below
SELECT
market, product, AVG(price) as pr
FROM
fruits
WHERE
market IN (DE, IT, UK, US)
GROUP BY
market, product
HAVING Count (DISTINCT market=4)
I guess that I'm using COUNT DISTINCT the wrong way.
Please help. Thanks!
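The HAVING condition Count (DISTINCT market=4) should instead be COUNT(DISTINCT market) = 4. The market codes also need to be quoted as strings, and the grouping should be by product only, because grouping by market as well leaves exactly one market in every group:
SELECT
product, AVG(price) as pr
FROM
fruits
WHERE
market IN ('DE', 'IT', 'UK', 'US')
GROUP BY
product
HAVING COUNT(DISTINCT market) = 4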
You don't have duplicate markets for a product, so I would recommend a simple window function:
select f.*
from (select f.*, count(*) over (partition by product) as cnt
from fruits f
where market in ('DE', 'IT', 'UK', 'US')
) f
where cnt = 4;
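If you want to check the "available in all four markets" logic against the sample data, here is a small sketch using Python's sqlite3 module. SQLite is my assumption for testability (the question does not name a database), and products_in_all_markets is a hypothetical helper. It uses GROUP BY product with HAVING COUNT(DISTINCT market) in a subquery rather than the window function, so it also holds up if a product had duplicate rows for a market.

```python
import sqlite3

# Sample data from the question: (product, market, price)
FRUITS = [
    ("Apple", "UK", 4), ("Apple", "DE", 5), ("Apple", "US", 4), ("Apple", "IT", 3),
    ("Banana", "US", 2), ("Orange", "UK", 1),
    ("Kiwi", "ES", 3), ("Kiwi", "DE", 10), ("Kiwi", "US", 12), ("Kiwi", "UK", 11),
    ("Cucumb", "IT", 5), ("Cucumb", "DE", 4), ("Cucumb", "UK", 3),
    ("Peach", "IT", 12), ("Peach", "DE", 10), ("Peach", "UK", 10), ("Peach", "US", 11),
]


def products_in_all_markets(markets=("DE", "IT", "UK", "US")):
    """Return the rows of every product that is sold in all given markets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruits (product TEXT, market TEXT, price INTEGER)")
    conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", FRUITS)
    placeholders = ", ".join("?" for _ in markets)
    query = f"""
        SELECT product, market, price
        FROM fruits
        WHERE market IN ({placeholders})
          AND product IN (SELECT product
                          FROM fruits
                          WHERE market IN ({placeholders})
                          GROUP BY product
                          HAVING COUNT(DISTINCT market) = ?)
        ORDER BY product, market
    """
    # The market list is bound twice (outer and inner IN), then the target count.
    params = (*markets, *markets, len(markets))
    rows = conn.execute(query, params).fetchall()
    conn.close()
    return rows


for row in products_in_all_markets():
    print(row)
```

Kiwi is in four markets overall, but one of them is ES, so only three of the requested four match and it is filtered out.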

Adding a new column that auto increments based on existing field values

Not sure how to do this, but I want to add a new column "Jersey" that will increment by 1 depending on the "Sport" value in both tab_A and tab_B. So if a "Sport" already exists in tab_B, then just grab the max jersey_no and add 1 for the new column. That one is easy.
Now if the "Sport" does not exist in tab_B, then give it a "Jersey" value of 100 for the new column. However, if there is more than one of the same "Sport" in tab_A (but do not exist in tab_B), then it should start with 100 and increment by 1 for the next same Sport, and so on (e.g. see Garcia example below).
I created a sequence "seqnce", but that really didn't help at all. Is there another way to accomplish this? Thanks in advance!
Tab_A
Name State Sport
Garcia CA Basketball
Garcia AL Basketball
Garcia NY Basketball
McGee CA Swimming
Tontou CA Football
Tontou AL Swimming
Tab_B
Name Sport Jersey_No
Garcia Swimming 100
Garcia Football 100
McGee Swimming 101
Tontou Swimming 101
Tontou Swimming 102
Expected Output
Name State Sport Jersey
Garcia CA Basketball 100
Garcia AL Basketball 101
Garcia NY Basketball 102
McGee CA Swimming 102
Tontou CA Football 100
Tontou AL Swimming 103
My Code
select name, state, sport
,nvl ((select max(b.jersey_no + 1) from tab_b b
where b.sport = a.sport
and b.name = a.name),
(case
when not exists (select 1 from tab_b b
where b.sport = a.sport
and b.name = a.name
having count(a.sport) > 1)
then seqnce.nextval
else '100'
end )
) Jersey
from tab_a
If you only need this in a query result, use row_number(). If you need to update a column in the table, write a trigger.
For example:
WITH taba AS
(SELECT 'Garcia' Name, 'CA' State, 'Basketball' Sport from dual
UNION ALL
SELECT 'Garcia' Name, 'AL' State, 'Basketball' Sport from dual
UNION ALL
SELECT 'Garcia' Name, 'NY' State, 'Basketball' Sport from dual
UNION ALL
SELECT 'McGee' Name, 'CA' State, 'Swimming' Sport from dual
UNION ALL
SELECT 'Tontou' Name, 'CA' State, 'Football' Sport from dual
UNION ALL
SELECT 'Tontou' Name, 'AL' State, 'Swimming' Sport from dual),
tabb AS
(SELECT 'Garcia' Name, 'Swimming' Sport, 100 Jersey from dual
UNION ALL
SELECT 'Garcia' Name, 'Football', 100 from dual
UNION ALL
SELECT 'McGee' Name, 'Swimming', 101 from dual
UNION ALL
SELECT 'Tontou' Name, 'Swimming', 101 from dual
UNION ALL
SELECT 'Tontou' Name, 'Swimming', 102 from dual)
SELECT taba.Name,
taba.State ,
taba.Sport,
row_number() over(partition by taba.Name, taba.Sport ORDER BY taba.State)
+ nvl((SELECT MAX(tabb.Jersey)
FROM tabb
WHERE taba.name = tabb.name
AND taba.sport = tabb.sport), 99)
FROM taba
result:
Garcia AL Basketball 100
Garcia CA Basketball 101
Garcia NY Basketball 102
McGee CA Swimming 102
Tontou CA Football 100
Tontou AL Swimming 103
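The numbering rule in that query is: row number within each (name, sport) group, ordered by state, plus the highest jersey already in tab_b for that pair (or 99 when there is none). A plain-Python sketch of the same arithmetic, with a hypothetical function name assign_jerseys, might look like this:

```python
from collections import defaultdict

TAB_A = [  # (name, state, sport)
    ("Garcia", "CA", "Basketball"), ("Garcia", "AL", "Basketball"),
    ("Garcia", "NY", "Basketball"), ("McGee", "CA", "Swimming"),
    ("Tontou", "CA", "Football"), ("Tontou", "AL", "Swimming"),
]
TAB_B = [  # (name, sport, jersey_no)
    ("Garcia", "Swimming", 100), ("Garcia", "Football", 100),
    ("McGee", "Swimming", 101), ("Tontou", "Swimming", 101),
    ("Tontou", "Swimming", 102),
]


def assign_jerseys(tab_a, tab_b, start=100):
    """Return [(name, state, sport, jersey)] following the row_number + max rule."""
    # Highest jersey already used per (name, sport) in tab_b.
    max_used = {}
    for name, sport, jersey in tab_b:
        key = (name, sport)
        max_used[key] = max(jersey, max_used.get(key, jersey))
    # Collect the states of each (name, sport) group from tab_a.
    groups = defaultdict(list)
    for name, state, sport in tab_a:
        groups[(name, sport)].append(state)
    result = []
    for (name, sport), states in groups.items():
        # start - 1 plays the role of nvl(..., 99) in the query.
        base = max_used.get((name, sport), start - 1)
        # enumerate over sorted states mimics row_number() ... ORDER BY state.
        for offset, state in enumerate(sorted(states), start=1):
            result.append((name, state, sport, base + offset))
    return result


for row in assign_jerseys(TAB_A, TAB_B):
    print(row)
```

As in the SQL result, ordering by state means Garcia's AL row gets 100 and the CA row gets 101, which differs slightly from the row order in the expected output of the question.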

Related with count, max and group by

Here is my table. For each distinct person, I am trying to get the fruit that person has the most of.
persons | fruits
David apple
David apple
David apple
David banana
David orange
Sam apple
Sam banana
Sam orange
Sam orange
Sam orange
Sam orange
Tom apple
Tom banana
Tom banana
Tom orange
I want to see my result as:
persons | fruits
David apple
Sam orange
Tom banana
I tried using count and max functions and group by, but was not able to get right result.
You can use distinct on (a PostgreSQL feature):
select distinct on (person) person, fruit
from (select person, fruit, count(*) as cnt
from personfruits pf
group by person, fruit
) pf
order by person, cnt desc;
You can write this without the subquery as well:
select distinct on (person) person, fruit
from personfruits pf
group by person, fruit
order by person, count(*) desc;
However, that is a bit hard to follow for someone not really familiar with distinct on.
From what I understand, you want to see which fruit occurs most often, per person. If that's correct, this should work
SELECT persons, fruits
FROM (
SELECT
persons,
fruits,
RANK() OVER(PARTITION BY persons ORDER BY FruitCount DESC) AS FruitRank -- Rank fruit count per person
FROM (
SELECT
persons,
fruits,
count(*) FruitCount -- get # rows per (person, fruit) combination
FROM MyTable
GROUP BY persons, fruits
) src
) src
WHERE FruitRank = 1 -- Return fruit with largest FruitCount, per person
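To make the "count per (person, fruit), then keep the top count per person" idea concrete, here is a short Python sketch. It is only an illustration of the logic, the name top_fruits is hypothetical, and like RANK() it keeps every fruit that ties for first place.

```python
from collections import Counter, defaultdict

ROWS = [
    ("David", "apple"), ("David", "apple"), ("David", "apple"),
    ("David", "banana"), ("David", "orange"),
    ("Sam", "apple"), ("Sam", "banana"), ("Sam", "orange"),
    ("Sam", "orange"), ("Sam", "orange"), ("Sam", "orange"),
    ("Tom", "apple"), ("Tom", "banana"), ("Tom", "banana"), ("Tom", "orange"),
]


def top_fruits(rows):
    """Return {person: [fruits with the highest count]} (ties are all kept)."""
    # Inner query: count rows per (person, fruit) combination.
    counts = defaultdict(Counter)
    for person, fruit in rows:
        counts[person][fruit] += 1
    # Outer query: keep only the fruit(s) whose count is the maximum per person.
    result = {}
    for person, fruit_counts in counts.items():
        best = max(fruit_counts.values())
        result[person] = sorted(f for f, n in fruit_counts.items() if n == best)
    return result


print(top_fruits(ROWS))
```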

Oracle SQL Columns to Rows without UNPIVOT

What I currently have:
Team User Apples Oranges Pears
Red Adam 4 5 6
Red Avril 11 12 13
Blue David 21 22 23
What's needed:
Team User Product Count
Red Adam Apples 4
Red Adam Oranges 5
Red Adam Pears 6
Red Avril Apples 11
Red Avril Oranges 12
Red Avril Pears 13
Blue David Apples 21
....
This is to be implemented using Oracle SQL. I understand this can be done using UNPIVOT, but my Oracle SQL version is too old to support this method. Can someone provide an example of how to achieve this using CROSS APPLY or equivalent methods? Count changes depending on team-user-product combination, and the number of product types may change slightly in the future so a scalable solution might be necessary.
This is time-sensitive, so I appreciate the help.
You can do this using a cross join and some case statements by using a dummy subquery that holds the same number of rows as you have columns that you want to unpivot (since you want each column to go into its own row) like so:
WITH your_table AS (SELECT 'Red' Team, 'Adam' usr, 4 Apples, 5 Oranges, 6 Pears FROM dual UNION ALL
SELECT 'Red' Team, 'Avril' usr, 11 Apples, 12 Oranges, 13 Pears FROM dual UNION ALL
SELECT 'Blue' Team, 'David' usr, 21 Apples, 22 Oranges, 23 Pears FROM dual)
-- end of mimicking your table. See SQL below:
SELECT yt.team,
yt.usr,
CASE WHEN d.id = 1 THEN 'Apples'
WHEN d.id = 2 THEN 'Oranges'
WHEN d.id = 3 THEN 'Pears'
END product,
CASE WHEN d.id = 1 THEN yt.apples
WHEN d.id = 2 THEN yt.oranges
WHEN d.id = 3 THEN yt.pears
END count_of_product
FROM your_table yt
CROSS JOIN (SELECT LEVEL ID
FROM dual
CONNECT BY LEVEL <= 3) d -- number of columns to unpivot
ORDER BY team, usr, product;
TEAM USR PRODUCT COUNT_OF_PRODUCT
---- ----- ------- ----------------
Blue David Apples 21
Blue David Oranges 22
Blue David Pears 23
Red Adam Apples 4
Red Adam Oranges 5
Red Adam Pears 6
Red Avril Apples 11
Red Avril Oranges 12
Red Avril Pears 13
Doing it this way means that you only have to go through the table once, rather than multiple times if you were doing the union all method.
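If the cross join trick is hard to picture, this is roughly what it does, sketched in Python: pair every source row with every column descriptor, then pick that column's value. The names unpivot and COLUMNS are made up for the illustration, and the dictionaries stand in for table rows.

```python
from itertools import product

ROWS = [
    {"team": "Red", "usr": "Adam", "apples": 4, "oranges": 5, "pears": 6},
    {"team": "Red", "usr": "Avril", "apples": 11, "oranges": 12, "pears": 13},
    {"team": "Blue", "usr": "David", "apples": 21, "oranges": 22, "pears": 23},
]

# (display label, source column); stands in for d.id = 1, 2, 3 in the query.
COLUMNS = [("Apples", "apples"), ("Oranges", "oranges"), ("Pears", "pears")]


def unpivot(rows, columns):
    """Cross join rows with column descriptors and emit one tuple per pair."""
    return sorted(
        (row["team"], row["usr"], label, row[col])
        for row, (label, col) in product(rows, columns)
    )


for line in unpivot(ROWS, COLUMNS):
    print(line)
```

Adding a product later means adding one entry to COLUMNS, which corresponds to one more WHEN branch in each CASE plus a higher LEVEL limit in the SQL.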
ETA: Here's the method that Aleksej was referring to - I would suggest testing both methods against your set of data (which is hopefully large enough to be representative) to see which one is more performant:
WITH your_table AS (SELECT 'Red' Team, 'Adam' usr, 4 Apples, 5 Oranges, 6 Pears FROM dual UNION ALL
SELECT 'Red' Team, 'Avril' usr, 11 Apples, 12 Oranges, 13 Pears FROM dual UNION ALL
SELECT 'Blue' Team, 'David' usr, 21 Apples, 22 Oranges, 23 Pears FROM dual)
-- end of mimicking your table. See SQL below:
SELECT yt.team,
yt.usr,
CASE WHEN LEVEL = 1 THEN 'Apples'
WHEN LEVEL = 2 THEN 'Oranges'
WHEN LEVEL = 3 THEN 'Pears'
END product,
CASE WHEN LEVEL = 1 THEN yt.apples
WHEN LEVEL = 2 THEN yt.oranges
WHEN LEVEL = 3 THEN yt.pears
END count_of_product
FROM your_table yt
CONNECT BY PRIOR team = team
AND PRIOR usr = usr
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= 3
ORDER BY team, usr, product;
TEAM USR PRODUCT COUNT_OF_PRODUCT
---- ----- ------- ----------------
Blue David Apples 21
Blue David Oranges 22
Blue David Pears 23
Red Adam Apples 4
Red Adam Oranges 5
Red Adam Pears 6
Red Avril Apples 11
Red Avril Oranges 12
Red Avril Pears 13
You can use a big union all like this:
select
Team,
"User",
'Apples' Product,
Apples "Count"
from your_table
union all
select
Team,
"User",
'Oranges' Product,
Oranges "Count"
from your_table
union all
select
Team,
"User",
'Pears' Product,
Pears "Count"
from your_table
union all
. . .
Also, try not to use keywords such as User or Count as identifiers; otherwise, wrap them in double quotes like I did.

Using a query to return the most frequent value and the count within a group using SQL in MS Access

Say I have a table showing the type of fruit consumed by an individual over a 24 hour period that looks like this:
Name Fruit
Tim Apple
Tim Orange
Tim Orange
Tim Orange
Lisa Peach
Lisa Apple
Lisa Peach
Eric Plum
Eric Orange
Eric Plum
How would I get a table that shows only the most consumed fruit for each person, as well as how many times it was consumed? In other words, a table that looks like this:
Name Fruit Number
Tim Orange 3
Lisa Peach 2
Eric Plum 2
I tried
SELECT Name, Fruit, Count(Fruit)
FROM table
GROUP BY Name
But that returns an error because Fruit needs to be in the GROUP BY clause as well. Every other method I've tried returns the counts for ALL values rather than just the maximum values. MAX(COUNT()) doesn't appear to be a valid statement, so I'm not sure what else to do.
This is a pain, but you can do it. Start with your query and then use join:
SELECT t.Name, t.Fruit, t.cnt
FROM (SELECT Name, Fruit, Count(Fruit) as cnt
FROM table as t
GROUP BY Name, Fruit
) as t INNER JOIN
(SELECT Name, max(cnt) as maxcnt
FROM (SELECT Name, Fruit, Count(Fruit) as cnt
FROM table
GROUP BY Name, Fruit
) as t
GROUP BY Name
) as n
ON t.name = n.name and t.cnt = n.maxcnt;
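Here is a way to try the join-on-maximum idea without MS Access: a sketch using Python's sqlite3 module. SQLite is my assumption for testability, so the syntax may need small adjustments in Access, and the table name consumed and the function most_consumed are hypothetical. It selects the name, fruit, and count from the per-(name, fruit) counts that match each person's maximum.

```python
import sqlite3

# Sample data from the question: (name, fruit)
CONSUMED = [
    ("Tim", "Apple"), ("Tim", "Orange"), ("Tim", "Orange"), ("Tim", "Orange"),
    ("Lisa", "Peach"), ("Lisa", "Apple"), ("Lisa", "Peach"),
    ("Eric", "Plum"), ("Eric", "Orange"), ("Eric", "Plum"),
]


def most_consumed():
    """Return [(name, fruit, count)] for each person's most consumed fruit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE consumed (name TEXT, fruit TEXT)")
    conn.executemany("INSERT INTO consumed VALUES (?, ?)", CONSUMED)
    query = """
        SELECT t.name, t.fruit, t.cnt
        FROM (SELECT name, fruit, COUNT(*) AS cnt
              FROM consumed
              GROUP BY name, fruit) AS t
        INNER JOIN
             (SELECT name, MAX(cnt) AS maxcnt
              FROM (SELECT name, fruit, COUNT(*) AS cnt
                    FROM consumed
                    GROUP BY name, fruit) AS c
              GROUP BY name) AS n
          ON t.name = n.name AND t.cnt = n.maxcnt
        ORDER BY t.name
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in most_consumed():
    print(row)
```

If two fruits tie for a person's maximum, this join returns both rows for that person.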