I want to count how often each value of one column occurs for each value of another column.
Name | fruits
----------------
Vishal | orange
Manish | orange
Vishal | apple
Manish | orange
Manish | apple
Vishal | orange
Vishal | mango
Vishal | banana
Result should be
Name | Orange count | Apple count| mango | banana
--------------------------
Vishal | 2 | 1 | 1 | 1
Manish | 2 | 1 | 0 | 0
Another result should be
name | fruits
---------------
Vishal | orange, Apple , mango, banana
Manish | orange , Apple
You can use conditional aggregation for this:
select name,
count(case when fruits = 'orange' then 1 end) as orange_count,
count(case when fruits = 'apple' then 1 end) as apple_count,
count(case when fruits = 'mango' then 1 end) as mango_count,
count(case when fruits = 'banana' then 1 end) as banana_count
from the_table
group by name;
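To check the conditional-aggregation query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite and the helper name fruit_counts are assumptions made for illustration only; the SQL itself is the query shown above with an ORDER BY added to make the output deterministic.

```python
import sqlite3

# Sample rows copied from the question: (name, fruits)
SAMPLE_ROWS = [
    ("Vishal", "orange"), ("Manish", "orange"), ("Vishal", "apple"),
    ("Manish", "orange"), ("Manish", "apple"), ("Vishal", "orange"),
    ("Vishal", "mango"), ("Vishal", "banana"),
]

QUERY = """
SELECT name,
       COUNT(CASE WHEN fruits = 'orange' THEN 1 END) AS orange_count,
       COUNT(CASE WHEN fruits = 'apple'  THEN 1 END) AS apple_count,
       COUNT(CASE WHEN fruits = 'mango'  THEN 1 END) AS mango_count,
       COUNT(CASE WHEN fruits = 'banana' THEN 1 END) AS banana_count
FROM the_table
GROUP BY name
ORDER BY name
"""

def fruit_counts(rows):
    """Load the rows into a throwaway table and run the pivot query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (name TEXT, fruits TEXT)")
    con.executemany("INSERT INTO the_table VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

counts = fruit_counts(SAMPLE_ROWS)
for row in counts:
    print(row)
```

COUNT ignores NULL, and a CASE without ELSE yields NULL for non-matching rows, which is why only matching rows are counted.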
Some DBMS also support the ANSI SQL filter clause which makes this a bit more readable:
select name,
count(*) filter (where fruits = 'orange') as orange_count,
count(*) filter (where fruits = 'apple') as apple_count,
count(*) filter (where fruits = 'mango') as mango_count,
count(*) filter (where fruits = 'banana') as banana_count
from the_table
group by name;
Here is a generic pivot query which should work across most RDBMS:
SELECT Name,
SUM(CASE WHEN fruits = 'orange' THEN 1 ELSE 0 END) AS orange_count,
SUM(CASE WHEN fruits = 'apple' THEN 1 ELSE 0 END) AS apple_count,
SUM(CASE WHEN fruits = 'mango' THEN 1 ELSE 0 END) AS mango_count,
SUM(CASE WHEN fruits = 'banana' THEN 1 ELSE 0 END) AS banana_count
FROM yourTable
GROUP BY Name
If you are using SQL Server or Oracle, there is a built-in PIVOT operator which can simplify this and possibly improve performance as well. Postgres has no PIVOT operator, but its tablefunc extension provides crosstab() for the same purpose.
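The second result asked for in the question (one comma-separated list of fruits per name) is a string aggregation rather than a pivot. A minimal sketch, assuming SQLite and its group_concat function; other systems use a different name for the same idea (string_agg in Postgres, LISTAGG in Oracle, GROUP_CONCAT in MySQL). The helper name fruit_lists is hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (name, fruits)
SAMPLE_ROWS = [
    ("Vishal", "orange"), ("Manish", "orange"), ("Vishal", "apple"),
    ("Manish", "orange"), ("Manish", "apple"), ("Vishal", "orange"),
    ("Vishal", "mango"), ("Vishal", "banana"),
]

def fruit_lists(rows):
    """Return one (name, 'fruit,fruit,...') row per name."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE the_table (name TEXT, fruits TEXT)")
    con.executemany("INSERT INTO the_table VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT name, group_concat(DISTINCT fruits) AS fruits
        FROM the_table
        GROUP BY name
        ORDER BY name
        """
    ).fetchall()
    con.close()
    return result

lists = dict(fruit_lists(SAMPLE_ROWS))
for name, fruits in lists.items():
    print(name, "|", fruits)
```

SQLite does not guarantee the order of the items inside each list, so treat the list as a set unless your DBMS offers an ordering clause for its aggregate.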
For the Postgres query below, I use PIVOT in BigQuery. Besides PIVOT, is there any other way to write such a query in BigQuery?
-- Postgres SQL --
SELECT
Apple,
Orange,
Lemon,
CASE WHEN Apple >= 50 THEN 1 ELSE 0 END AS Apple50,
CASE WHEN Orange >= 50 THEN 1 ELSE 0 END AS Orange50,
CASE WHEN Lemon >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT td.timestamp,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 16), 0) as Apple,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 17), 0) as Orange,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 18), 0) as Lemon
FROM TableData td
WHERE td.attribute_id IN (16, 17, 18)
GROUP BY td.timestamp
) AS td2
ORDER BY td2.timestamp;
-- My attempt BigQuery Query --
SELECT
value_16 as Apple,
value_17 as Orange,
value_18 as Lemon,
CASE WHEN value_16 >= 50 THEN 1 ELSE 0 END as Apple50,
CASE WHEN value_17 >= 50 THEN 1 ELSE 0 END as Orange50,
CASE WHEN value_18 >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT * FROM(
SELECT
timestamp,
attribute_id,
value
FROM `PROJECT_ID.DB_NAME.FRUITS` as td
WHERE td.attribute_id IN (16,17,18)
)PIVOT
(
MAX(value) as value
FOR attribute_id IN (16,17,18)
)
)as td2
Below is the sample relation of the table.
-- TableData --
attribute_id | value | timestamp |
--------------+-----------+------------+
17 | 100 | 1618822794 |
17 | 100 | 1618822861 |
16 | 50 | 1618822794 |
16 | 50 | 1618822861 |
-- TableAttribute --
id | name |
--------------+----------+
16 | Apple |
17 | Orange |
18 | Lemon |
-- Expected Result --
timestamp | Apple | Orange | Lemon | Apple50 | Orange50 | Lemon50 |
--------------+---------+--------+-------+---------+----------+---------+
1618822794 | 50 | 100 | 0 | 1 | 1 | 0
1618822861 | 50 | 100 | 0 | 1 | 1 | 0
PIVOT is likely the best way to achieve what you want. Consider the following approach, though, as it might be simpler to manage:
with aggregate_data as (
select td.timestamp
, ta.name
, td.value as value
from TableData td
full outer join TableAttribute ta
on td.attribute_id = ta.id
)
select timestamp
, value_Apple as Apple
, value_Orange as Orange
, value_Lemon as Lemon
, _50_Apple as Apple50
, _50_Orange as Orange50
, _50_Lemon as Lemon50
from aggregate_data
pivot(max(value) value, max(case when value >=50 then 1 else 0 end) _50 for name in ('Apple', 'Orange', 'Lemon'))
where timestamp is not null
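As for a method other than PIVOT: the Postgres query from the question can also be expressed with plain conditional aggregation, which avoids both FILTER and PIVOT. The sketch below runs that form in SQLite via Python purely so it can be checked against the sample rows; SQLite and the helper name pivot_without_pivot are assumptions. In BigQuery the inner expressions would typically be written as MAX(IF(attribute_id = 16, value, NULL)), which I have not run here.

```python
import sqlite3

# Sample rows copied from the TableData listing in the question:
# (attribute_id, value, timestamp)
TABLE_DATA = [
    (17, 100, 1618822794),
    (17, 100, 1618822861),
    (16, 50, 1618822794),
    (16, 50, 1618822861),
]

QUERY = """
SELECT timestamp, Apple, Orange, Lemon,
       CASE WHEN Apple  >= 50 THEN 1 ELSE 0 END AS Apple50,
       CASE WHEN Orange >= 50 THEN 1 ELSE 0 END AS Orange50,
       CASE WHEN Lemon  >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
    SELECT timestamp,
           COALESCE(MAX(CASE WHEN attribute_id = 16 THEN value END), 0) AS Apple,
           COALESCE(MAX(CASE WHEN attribute_id = 17 THEN value END), 0) AS Orange,
           COALESCE(MAX(CASE WHEN attribute_id = 18 THEN value END), 0) AS Lemon
    FROM TableData
    WHERE attribute_id IN (16, 17, 18)
    GROUP BY timestamp
) AS td2
ORDER BY timestamp
"""

def pivot_without_pivot(rows):
    """Pivot attribute rows into columns using MAX(CASE ...) only."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE TableData (attribute_id INTEGER, value INTEGER, timestamp INTEGER)"
    )
    con.executemany("INSERT INTO TableData VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

pivoted = pivot_without_pivot(TABLE_DATA)
for row in pivoted:
    print(row)
```

The COALESCE turns the NULL produced for a missing attribute (Lemon in the sample) into 0, matching the expected result in the question.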
Say I have a column called 'Fruit' and it has these three values:
ID | Fruit |
010 | Apple |
020 | Orange |
010 | Pear |
Say there are other columns like Profile_ID. How do I get the table to read like this instead, where the values in that one column become columns of their own, and a row shows an 'X' when a Profile is associated with a given fruit:
ID | Apple | Orange | Pear
010 | x | | x
020 | | x |
You can use conditional aggregation. If you want a particular profile id then:
select id,
max(case when fruit = 'Apple' and profile_id = ? then 'X' end) as apple,
max(case when fruit = 'Orange' and profile_id = ? then 'X' end) as orange,
max(case when fruit = 'Pear' and profile_id = ? then 'X' end) as pear
from t
group by id;
If you just want to know if a row exists:
select id,
max(case when fruit = 'Apple' then 'X' end) as apple,
max(case when fruit = 'Orange' then 'X' end) as orange,
max(case when fruit = 'Pear' then 'X' end) as pear
from t
group by id;
Below is for BigQuery Standard SQL. It is generic enough to handle any number of distinct "Fruit" values and does not require typing out each of them explicitly:
EXECUTE IMMEDIATE (
SELECT """
SELECT id, """ ||
STRING_AGG("""MAX(IF(Fruit = '""" || Fruit || """', 'X', '')) AS """ || Fruit, ', ')
|| """
FROM `project.dataset.table`
GROUP BY id
"""
FROM (
SELECT DISTINCT Fruit
FROM `project.dataset.table`
ORDER BY Fruit
)
);
Applied to the sample data from your question, the output is:
Row | id  | Apple | Orange | Pear
1   | 010 | X     |        | X
2   | 020 |       | X      |
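The same idea (read the distinct values first, then generate one MAX(...) column per value) can be sketched outside BigQuery as well. The example below is an illustration in Python with SQLite, not the EXECUTE IMMEDIATE statement itself; the helper names quote_ident and dynamic_flags are hypothetical. Values are passed as bound parameters and column aliases are quoted, since fruit names containing spaces or quotes would otherwise break the generated SQL.

```python
import sqlite3

def quote_ident(name):
    """Quote a value for use as an SQL identifier (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'

def dynamic_flags(con):
    """Build and run a pivot with one 'X' flag column per distinct Fruit."""
    fruits = [r[0] for r in con.execute("SELECT DISTINCT Fruit FROM t ORDER BY Fruit")]
    select_list = ["id"] + [
        "MAX(CASE WHEN Fruit = ? THEN 'X' ELSE '' END) AS " + quote_ident(f)
        for f in fruits
    ]
    sql = "SELECT " + ", ".join(select_list) + " FROM t GROUP BY id ORDER BY id"
    cur = con.execute(sql, fruits)  # one bound parameter per generated column
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id TEXT, Fruit TEXT)")
# Sample rows copied from the question
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("010", "Apple"), ("020", "Orange"), ("010", "Pear")],
)

header, flag_rows = dynamic_flags(con)
print(header)
for row in flag_rows:
    print(row)
```

Because the column list depends on the data, the shape of the result can change between runs; callers that need a fixed schema are usually better served by the hand-written query.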
I have the following dataset:
... and I want to do a crosstab of sorts, counting the data against specific criteria e.g.:
Colour criteria: String contains "Blue", "Red", "Yellow" or "Green" (not case sensitive)
Type criteria: String contains "Car", "Lorry", or "Bus" (not case sensitive)
... and I would like the result to look like the following:
Is there an SQL query that I can run on the original data to produce the result I'm looking for?
You can use CROSS APPLY with conditional aggregation; CROSS APPLY simplifies the generation of the list of colours:
select c.colour,
sum(case when v.VehicleData like '%Car%' then 1 else 0 end) Car,
sum(case when v.VehicleData like '%Lorry%' then 1 else 0 end) Lorry,
sum(case when v.VehicleData like '%Bus%' then 1 else 0 end) Bus
from vehicles v
cross apply (values ('Blue'), ('Red'), ('Yellow'), ('Green')
) AS c(colour)
where v.VehicleData like '%' + c.colour + '%'
group by c.colour
Output:
colour Car Lorry Bus
Blue 3 1 0
Red 1 2 0
Yellow 0 1 1
Green 0 0 2
With conditional aggregation:
select c.colour,
count(case when t.VehicleData like '%Car%' then 1 end) Car,
count(case when t.VehicleData like '%Lorry%' then 1 end) Lorry,
count(case when t.VehicleData like '%Bus%' then 1 end) Bus
from (
select 'Blue' colour union all
select 'Red' union all
select 'Yellow' union all
select 'Green'
) c left join tbl1 t
on t.VehicleData like '%' + c.colour + '%'
group by c.colour
Results:
> colour | Car | Lorry | Bus
> :----- | --: | ----: | --:
> Blue | 3 | 1 | 0
> Red | 1 | 2 | 0
> Yellow | 0 | 1 | 1
> Green | 0 | 0 | 2
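The original dataset is not reproduced in this thread, so the sketch below uses a few made-up rows (clearly hypothetical, not the asker's data) to show how the LEFT JOIN variant behaves. It assumes SQLite, where string concatenation is written || instead of + and LIKE is case-insensitive for ASCII letters by default; the helper name colour_type_crosstab is also an assumption.

```python
import sqlite3

# Hypothetical rows for illustration only; the real data is not shown above.
VEHICLES = ["Blue car", "blue CAR", "Red lorry", "Green Bus"]

QUERY = """
SELECT c.colour,
       COUNT(CASE WHEN t.VehicleData LIKE '%Car%'   THEN 1 END) AS Car,
       COUNT(CASE WHEN t.VehicleData LIKE '%Lorry%' THEN 1 END) AS Lorry,
       COUNT(CASE WHEN t.VehicleData LIKE '%Bus%'   THEN 1 END) AS Bus
FROM (
    SELECT 'Blue' AS colour UNION ALL
    SELECT 'Red'            UNION ALL
    SELECT 'Yellow'         UNION ALL
    SELECT 'Green'
) AS c
LEFT JOIN tbl1 AS t
       ON t.VehicleData LIKE '%' || c.colour || '%'
GROUP BY c.colour
ORDER BY c.colour
"""

def colour_type_crosstab(values):
    """Count vehicle types per colour; colours without matches stay at 0."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl1 (VehicleData TEXT)")
    con.executemany("INSERT INTO tbl1 VALUES (?)", [(v,) for v in values])
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

crosstab = colour_type_crosstab(VEHICLES)
for row in crosstab:
    print(row)
```

With these rows no vehicle is yellow, yet the LEFT JOIN still returns a Yellow row with zero counts. A query that filters matches in the WHERE clause would drop such a colour from the result.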
Given the following table:
user | fruit
------------
1 | apple
1 | apple
1 | apple
2 | apple
2 | apple
1 | pear
I am trying to combine COUNT and GROUP BY to get:
user | apples | pears
---------------------
1 | 3 | 1
2 | 2 | 0
Any hints on how to proceed are appreciated.
Use case expressions to do conditional counting:
select user,
count(case when fruit = 'apple' then 1 end) as apples,
count(case when fruit = 'pear' then 1 end) as pears
from tablename
group by user
If you're working on Oracle, you can use the PIVOT operator:
SELECT *
FROM fruit t
PIVOT (COUNT(fruit) AS cnt
FOR(fruit) IN ('apple' AS apple
, 'pear' AS pear) );
More details and full examples of PIVOT / UNPIVOT can be found on the web (e.g. here: https://oracle-base.com/articles/11g/pivot-and-unpivot-operators-11gr1).
I'm trying to do some crosstabs in SQL Server 2008 R2. That part is alright, however, if I try to get percentages for each cell, I run into a problem.
Here is a distilled use case: A survey where people give their favorite color and their favorite fruit. I'd like to know how many like a given fruit AND a given color:
with survey as (
select 'banana' fav_fruit, 'yellow' fav_color
union select 'banana', 'red'
union select 'apple', 'yellow'
union select 'grape', 'red'
union select 'apple', 'blue'
union select 'orange', 'purple'
union select 'pomegranate', 'green'
)
select
s.fav_color,
sum(case
when s.fav_fruit = 'banana' then 1
else 0
end) as banana,
sum(case
when s.fav_fruit = 'banana' then 1
else 0
end) / sum(1) -- why does division always yield 0? "+", "-", and "*" all behave as expected.
* 100 as banana_pct,
sum(1) as total
from
survey s
group by
s.fav_color;
Results:
fav_color banana banana_pct total
------------------------------------
blue 0 0 1
green 0 0 1
purple 0 0 1
red 1 0 2
yellow 1 0 2
What I was expecting:
fav_color banana banana_pct total
------------------------------------
blue 0 0 1
green 0 0 1
purple 0 0 1
red 1 50 2
yellow 1 50 2
How can I get the result I was expecting?
You are using SQL Server. Here is a much simpler example that replicates the issue:
select 1/2
SQL Server does integer division.
Replace the denominator with something like sum(1.0), sum(cast(1 as float)), or sum(1e0) instead of sum(1).
Contrary to my expectation at least, SQL Server stores numbers with decimal points as numeric/decimal type rather than float. The fixed number of decimal places might affect subsequent operations.
Query:
SELECT s.fav_color,
sum( CASE WHEN s.fav_fruit = 'banana' THEN 1 ELSE 0 END ) AS banana,
sum( CASE WHEN s.fav_fruit = 'banana' THEN 1 ELSE 0 END) / sum(1.00) -- decimal denominator, so the division is no longer integer division
* 100 AS banana_pct,
sum(1) AS total
FROM survey s
GROUP BY s.fav_color
Result:
| FAV_COLOR | BANANA | BANANA_PCT | TOTAL |
-------------------------------------------
| blue | 0 | 0 | 1 |
| green | 0 | 0 | 1 |
| purple | 0 | 0 | 1 |
| red | 1 | 50 | 2 |
| yellow | 1 | 50 | 2 |
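The effect is easy to reproduce side by side. The sketch below uses SQLite through Python as a stand-in (an assumption; SQLite also truncates integer / integer, but its numeric types differ from SQL Server's) and compares three ways of writing the percentage over the survey rows from the question. The helper name banana_percentages and the column aliases are hypothetical.

```python
import sqlite3

# Survey rows copied from the question: (fav_fruit, fav_color)
SURVEY = [
    ("banana", "yellow"), ("banana", "red"), ("apple", "yellow"),
    ("grape", "red"), ("apple", "blue"), ("orange", "purple"),
    ("pomegranate", "green"),
]

QUERY = """
SELECT fav_color,
       -- integer / integer truncates to 0 before the multiplication
       SUM(CASE WHEN fav_fruit = 'banana' THEN 1 ELSE 0 END) / SUM(1) * 100   AS pct_int,
       -- a non-integer denominator keeps the fraction
       SUM(CASE WHEN fav_fruit = 'banana' THEN 1 ELSE 0 END) / SUM(1.0) * 100 AS pct_dec,
       -- multiplying first also works, but still truncates the final quotient
       100 * SUM(CASE WHEN fav_fruit = 'banana' THEN 1 ELSE 0 END) / SUM(1)   AS pct_mul_first
FROM survey
GROUP BY fav_color
ORDER BY fav_color
"""

def banana_percentages(rows):
    """Return (fav_color, pct_int, pct_dec, pct_mul_first) per colour."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE survey (fav_fruit TEXT, fav_color TEXT)")
    con.executemany("INSERT INTO survey VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

percentages = banana_percentages(SURVEY)
for row in percentages:
    print(row)
```

Multiplying by 100 before dividing gives the expected 50 here, but it remains integer arithmetic: one banana out of three answers would yield 33, not 33.33.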
I've recently discovered the IIF function (available from SQL Server 2012 onward, so not in 2008 R2). It makes things much cleaner. Taking Justin's example from above:
SELECT s.fav_color,
sum( IIF(s.fav_fruit = 'banana', 1, 0) ) AS banana,
sum( IIF(s.fav_fruit = 'banana', 1, 0) ) / sum(1.00)
* 100 AS banana_pct,
sum(1) AS total
FROM survey s
GROUP BY s.fav_color