BigQuery Standard SQL Group by aggregate multiple columns

Sample dataset:
|ownerId|category|aggCategory1|aggCategory2|
--------------------------------------------
| 1 | dog | animal | dogs |
| 1 | puppy | animal | dogs |
| 2 | daisy | flower | ignore |
| 3 | rose | flower | ignore |
| 4 | cat | animal | cats |
...
I'm looking for a GROUP BY that returns the number of distinct owners per value of category, aggCategory1, and aggCategory2, for example:
|# of owners|summaryCategory|
-----------------------------
| 1 | dog |
| 1 | puppy |
| 1 | daisy |
| 1 | rose |
| 1 | cat |
| 2 | animal |
| 2 | flower |
| 1 | dogs |
| 2 | ignore |
| 1 | cats |
Doesn't have to be that format but looking to get the above data points.
Thanks!

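One method is to use UNION ALL to unpivot the data and then aggregate in an outer query. COUNT(DISTINCT ownerID) is used rather than COUNT(*) because one owner can appear in several rows of the same aggregate category (owner 1 has two rows under animal):
SELECT category, COUNT(DISTINCT ownerID)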
FROM (SELECT ownerID, category
FROM t
UNION ALL
SELECT ownerID, aggCategory1
FROM t
UNION ALL
SELECT ownerID, aggCategory2
FROM t
) t
GROUP BY category
The more BigQuery'ish way to write this uses arrays:
SELECT cat, COUNT(DISTINCT ownerId)
FROM t CROSS JOIN
UNNEST(ARRAY[category, aggcategory1, aggcategory2]) cat
GROUP BY cat;
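
To check the unpivot-then-aggregate idea against the sample rows, here is a minimal sketch that runs the UNION ALL form in SQLite through Python's sqlite3 module. SQLite is only a stand-in for BigQuery here, and the alias names `summaryCategory`, `owners`, and `u` are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question; the table name `t` follows the answer above.
rows = [
    (1, "dog", "animal", "dogs"),
    (1, "puppy", "animal", "dogs"),
    (2, "daisy", "flower", "ignore"),
    (3, "rose", "flower", "ignore"),
    (4, "cat", "animal", "cats"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ownerId INTEGER, category TEXT, "
    "aggCategory1 TEXT, aggCategory2 TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Unpivot the three category columns into one, then count distinct owners.
query = """
SELECT summaryCategory, COUNT(DISTINCT ownerId) AS owners
FROM (
    SELECT ownerId, category AS summaryCategory FROM t
    UNION ALL
    SELECT ownerId, aggCategory1 FROM t
    UNION ALL
    SELECT ownerId, aggCategory2 FROM t
) AS u
GROUP BY summaryCategory
"""
owner_counts = dict(conn.execute(query).fetchall())

for name, owners in sorted(owner_counts.items()):
    print(owners, name)
```

With COUNT(*) instead of COUNT(DISTINCT ownerId), animal would come out as 3 and dogs as 2, because owner 1 contributes two rows to each.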

SELECT COUNT(T.ownerID), T.category
FROM (
SELECT ownerID, category
FROM table
UNION
SELECT ownerID, aggCategory1
FROM table
UNION
SELECT ownerID, aggCategory2
FROM table
) AS T
GROUP BY T.category
Grouping over a UNION of all the category columns should do it. UNION (rather than UNION ALL) removes duplicate (ownerID, category) pairs, so each owner is counted once per category.

Use UNION inside a CTE:
with cte as
(
SELECT ownerID, category as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory1 as summaryCategory
FROM table
UNION
SELECT ownerID, aggCategory2 as summaryCategory
FROM table
) select count(ownerID),summaryCategory from cte group by summaryCategory

Related

SQL group by a field and only return one joined row for each grouping

Table data
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 2 | 7 August | cat | Y |
| 3 | 10 August | cat | Z |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
What I want to do is group by the name, then for each group choose one of the rows with the earliest required by date.
For this data set, I would like to end up with either rows 1 and 4, or rows 2 and 4.
Expected result:
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 1 | 7 August | cat | X |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
OR
+-----+----------------+--------+----------------+
| ID | Required_by | Name | Another_Field |
+-----+----------------+--------+----------------+
| 2 | 7 August | cat | Y |
| 4 | 11 August | dog | A |
+-----+----------------+--------+----------------+
I have something that returns 1,2 and 4 but I'm not sure how to only pick one from the first group to get the desired result. I'm joining the grouping with the data table so that I can get the ID and another_field back after the grouping.
SELECT d.id, d.name, d.required_by, d.another_field
FROM
(
SELECT min(required_by) as min_date, name
FROM data
GROUP BY name
) agg
INNER JOIN
data d
on d.required_by = agg.min_date AND d.name = agg.name
This is typically solved using window functions:
select d.id, d.name, d.required_by, d.another_field
from (
select id, name, required_by, another_field,
row_number() over (partition by name order by required_by) as rn
from data
) d
where d.rn = 1;
In Postgres using distinct on() is typically faster:
select distinct on (name) *
from data
order by name, required_by
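
As a rough check of the row_number() approach, the sketch below runs it in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The dates are stored as ISO strings so they sort correctly, and the year 2020 is an assumption since the question gives none. The extra `id` in the ORDER BY is also an addition: it makes the choice between tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, required_by TEXT, name TEXT, another_field TEXT)"
)
# Rows from the question; the year is hypothetical.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "2020-08-07", "cat", "X"),
        (2, "2020-08-07", "cat", "Y"),
        (3, "2020-08-10", "cat", "Z"),
        (4, "2020-08-11", "dog", "A"),
    ],
)

# Number the rows of each name by date (ties broken by id), keep the first.
query = """
SELECT id, name, required_by, another_field
FROM (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY required_by, id) AS rn
    FROM data AS d
) AS ranked
WHERE rn = 1
ORDER BY id
"""
earliest = conn.execute(query).fetchall()
print(earliest)
```

Without the tie-breaker the database is free to return either row 1 or row 2 for cat, which the question says is acceptable.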
SELECT [id]
,[date]
,[name]
FROM [test].[dbo].[data]
WHERE date IN (SELECT min(date) FROM data GROUP BY name)

Oracle - Fill null values in a column with values from another column

I am using Oracle 11.1.1.9.0 and my goal is to fill the Null values with the first NOT NULL values in "Raw Materials" column by each product i.e A, B and C in Product column. An example table and the intended result are illustrated at the end of this request.
None of the code snippets below works:

CODE 1:
IFNULL(Raw Materials,
First_value(Raw Materials) OVER (PARTITION BY Product))

CODE 2:
IFNULL(Raw Materials, 
First_value(Raw Materials) OVER (PARTITION BY Product
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))

CODE 3:
COALESCE(lag(Raw Materials ignore null) OVER (partition by Product),
Raw Materials)

CODE 4:
IFNULL(Raw Materials, EVALUATE('LAG(%1, 1) OVER (PARTITION BY %2)' AS varchar2(20), Raw Materials, Product))
Note: IFNULL function does work in the environment. It was tested with IFNULL(Raw Materials, '1') and it resulted in all null values becoming 1 in Raw Materials column.
Thank you.
+---------+----------+ +---------+----------+
| product | material | | product | material |
+---------+----------+ +---------+----------+
| A | | | A | Apple |
| A | | | A | Apple |
| A | | | A | Apple |
| A | | | A | Apple |
| A | Apple | | A | Apple |
| B | | | B | Orange |
| B | | | B | Orange |
| B | | => | B | Orange |
| B | | | B | Orange |
| B | Orange | | B | Orange |
| C | | | C | Banana |
| C | | | C | Banana |
| C | | | C | Banana |
| C | | | C | Banana |
| C | Banana | | C | Banana |
+---------+----------+ +---------+----------+
Left is the example table data. Right is the intended result.
The below link "Oracle code environment" shows the code environment and samples of Oracle Logical SQL function.
Oracle code environment
Oracle Logical SQL manual: https://docs.oracle.com/middleware/11119/biee/BIEUG/appsql.htm#CHDDCFJI
For your dataset, you could simply do a window MAX() or MIN():
NVL(Raw_Materials, MAX(Raw_Materials) OVER(PARTITION BY Product))
If you have a column that can be used to order the rows (I assumed id), you can use LAG() with the IGNORE NULLS clause:
NVL(Raw_Materials, LAG(Raw_Materials IGNORE NULLS) OVER(PARTITION BY Product ORDER BY id))
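
The window MAX() idea can be tried outside Oracle as well. This is a small sketch in SQLite through Python's sqlite3 module, using COALESCE in place of NVL (SQLite has no NVL). The table name `mytable`, the `id` column, and the shortened data set are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, product TEXT, material TEXT)")
# A shortened version of the question's data: one non-NULL value per product.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "A", None), (2, "A", None), (3, "A", "Apple"),
        (4, "B", None), (5, "B", "Orange"),
        (6, "C", None), (7, "C", None), (8, "C", "Banana"),
    ],
)

# MAX() ignores NULLs, so within each product it returns the only known value.
query = """
SELECT product,
       COALESCE(material, MAX(material) OVER (PARTITION BY product)) AS filled
FROM mytable
ORDER BY id
"""
filled = conn.execute(query).fetchall()
print(filled)
```

This only gives a sensible answer when each product has a single distinct non-NULL material, as in the sample data; with several different values MAX() would pick the alphabetically last one.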
While you say that you are looking for some "first" value, your sample data suggests that you just want all same products to have the same material:
update mytable m1 set material =
(
select min(material)
from mytable m2
where m2.product = m1.product
);
If you just want to select this data, you can use this:
select product, min(material) over (partition by product)
from mytable;
According to the docs (https://docs.oracle.com/cd/E28280_01/bi.1111/e10540/sqlref.htm#BIEMG678) it seems OBIEE uses a special syntax for analytic window functions (e.g. MIN() OVER()):
select
product,
evaluate('min(%1) over (partition by %2)', material, product)
from mytable;
You must enable this by setting EVALUATE_SUPPORT_LEVEL accordingly.
(I hope I got this right. Otherwise read the docs on this and try something along the lines for yourself.)
You can try the query below. It uses the FIRST_VALUE analytic function, because NULLIF, COALESCE, etc. work on a single row, not across the rows of a column.
with temp as (select 'A' product,NULL raw_material from dual union all
select 'A',NULL from dual union all
select 'A',NULL from dual union all
select 'A',NULL from dual union all
select 'A','APPLE' from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B',NULL from dual union all
select 'B','ORANGE' from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C',NULL from dual union all
select 'C','Banana' from dual)
select a.*,FIRST_VALUE(raw_material IGNORE NULLS)
OVER (partition by product ORDER BY product) first_product from temp a;
Oracle does not have an IFNULL function. Your code would have worked if you swapped IFNULL for COALESCE in either of your first two code snippets:
SELECT t.*,
COALESCE(
raw_material,
FIRST_VALUE(raw_material)
IGNORE NULLS
OVER ( PARTITION BY product )
) AS updated_raw_material
FROM test_data t;
Outputs:
PRODUCT | RAW_MATERIAL | UPDATED_RAW_MATERIAL
:------ | :----------- | :-------------------
A | null | Apple
A | null | Apple
A | null | Apple
A | Apple | Apple
B | null | Orange
B | null | Orange
B | null | Orange
B | null | Orange
B | Orange | Orange
C | null | Banana
C | null | Banana
C | null | Banana
C | null | Banana
C | null | Banana
C | Banana | Banana

How to execute statement for each row and return entire result

This is a continuation of a previous question: Find groups with matching rows
I have a table which contains people and the cars that they own
+-------+-------+
| Name | Model |
+-------+-------+
| Bob | Camry |
| Bob | Civic |
| Bob | Prius |
| John | Camry |
| John | Civic |
| John | Prius |
| Kevin | Civic |
| Kevin | Focus |
| Mark | Civic |
| Lisa | Focus |
| Lisa | Civic |
+-------+-------+
This query gives me everyone who has the exact same cars as Lisa, as well as Lisa herself, which is fine.
;with cte as (
select *
, cnt = count(*) over (partition by name)
from t
)
, matches as (
select x2.name
from cte as x
inner join cte as x2
on x.model = x2.model
and x.cnt = x2.cnt
and x.name = 'Lisa'
group by x2.name, x.cnt
having count(*) = x.cnt
)
select t.*
from t
inner join matches m
on t.name = m.name
Result:
+-------+-------+
| name | model |
+-------+-------+
| Lisa | Civic |
| Lisa | Focus |
| Kevin | Civic |
| Kevin | Focus |
+-------+-------+
If I wanted to find all people who own the same cars as Bob, I would rerun the query and the result should give me John.
Right now, I have a list of names in Java, and for each name I run this query. It is really slow. Is there any way to find ALL people who own the same cars and partition the results into groups within a single database call?
For example, using the first table, I could run a query that groups the names. Notice that Mark has disappeared, because he does not own exactly the same cars as anyone else, only a subset.
+-------+-------+-------+
| Name | Model | Group |
+-------+-------+-------+
| Bob | Camry | 1 |
| Bob | Civic | 1 |
| Bob | Prius | 1 |
| John | Camry | 1 |
| John | Civic | 1 |
| John | Prius | 1 |
| Kevin | Civic | 2 |
| Kevin | Focus | 2 |
| Lisa | Focus | 2 |
| Lisa | Civic | 2 |
+-------+-------+-------+
This result set is also fine. I just need to know who belongs in which group; I can fetch their cars later.
+-------+-------+
| Name | Group |
+-------+-------+
| Bob | 1 |
| John | 1 |
| Kevin | 2 |
| Lisa | 2 |
+-------+-------+
I need to somehow loop over a list of names and find all people who own the same cars, and then combine it all into a single result set.
You can do this two ways. One way is to do the complex joins. The other way is a short-cut. Just aggregate the cars into a string and compare the strings.
with nc as (
select n.name,
stuff( (select ',' + t.model
from t
where t.name = n.name
order by t.model
for xml path ('')
), 1, 1, '') as cars
from (select distinct name from t) n
)
select nc.name, nc.cars, dense_rank() over (order by nc.cars)
from nc
order by nc.cars;
This creates a list with the names and the list of cars as a comma delimited list. If you like you can join back to the original table to get the original rows.
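
The same "build one signature per person and compare signatures" idea can be sketched outside SQL Server. The example below loads the question's rows into SQLite through Python's sqlite3 module, builds each person's sorted tuple of models on the client side, and numbers only the signatures shared by more than one person. Doing the concatenation in Python instead of with FOR XML PATH is a substitution made for portability, and the variable names are assumptions for the illustration.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("Bob", "Camry"), ("Bob", "Civic"), ("Bob", "Prius"),
        ("John", "Camry"), ("John", "Civic"), ("John", "Prius"),
        ("Kevin", "Civic"), ("Kevin", "Focus"),
        ("Mark", "Civic"),
        ("Lisa", "Focus"), ("Lisa", "Civic"),
    ],
)

# One ordered pass; the sorted tuple of models is each person's signature.
ordered = conn.execute("SELECT name, model FROM t ORDER BY name, model").fetchall()
signature = {
    name: tuple(model for _, model in grp)
    for name, grp in groupby(ordered, key=lambda row: row[0])
}

# Invert the mapping: signature -> names that own exactly those cars.
owners_by_signature = {}
for name, sig in signature.items():
    owners_by_signature.setdefault(sig, []).append(name)

# Keep only signatures shared by at least two people and number them.
shared = [names for _, names in sorted(owners_by_signature.items()) if len(names) > 1]
groups = {name: number for number, names in enumerate(shared, start=1) for name in names}
print(groups)
```

Mark is dropped because nobody else has his exact signature. That filter is not part of the dense_rank() query above, which numbers every distinct car list, including lists owned by a single person.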
Using the concatenation method like vkp's answer on the previous question would work here as well if we add dense_rank():
with cte as (
select
name
, models = stuff((
select
',' + i.model
from t i
where i.name=t.name
order by 1
for xml path(''), type).value('.','varchar(max)')
,1,1,'')
from t
group by name
)
select
name
, models
, dr = dense_rank() over (order by models)
from cte
rextester: http://rextester.com/GTT11495
results:
+-------+-------------------+----+
| name | models | dr |
+-------+-------------------+----+
| Bob | Camry,Civic,Prius | 1 |
| Mark | Civic | 2 |
| Kevin | Civic,Focus | 3 |
| Lisa | Civic,Focus | 3 |
+-------+-------------------+----+
Try this with other sample data.
It should work in all cases.
-- Sample data in a table variable
declare @t table(Name varchar(50),Model varchar(50))
insert into @t values
('Bob','Camry')
,('Bob','Civic')
,('Bob','Prius')
,('Kevin','Civic')
,('Kevin','Focus')
,('Mark','Civic')
,('Lisa','Focus')
,('Lisa','Civic')
,('John','Camry')
,('John','Civic')
,('John','Prius')
-- Person whose cars should be matched
declare @input varchar(50)='Bob'
;with
CTE1 AS
(
-- Cars owned by the input person, numbered 1..n
select name,model,ROW_NUMBER()over( order by name) rn
from @t
where name=@input
)
,cte2 as
(
-- Other people's cars that the input person also owns, numbered per person
select t.name,t.Model
,ROW_NUMBER()over(partition by t.name order by t.name) rn3
from @t t
inner JOIN
cte1 c on t.Model=c.model
where t.Name !=@input
)
-- Keep people who matched as many cars as the input person owns
select * from cte2 c
where exists(select rn3 from cte2 c1
where c1.name=c.name and c1.rn3=(select max(rn) from cte1)
)
Select *, row_number() Over(partition by Model order by name) as [Group]
from t
where model in(Select Model from t
Where name in('Bob', 'Lisa'))

SQL join rows in two tables

I'm not good at SQL and wonder if this can be done: I have two tables: table_a and table_b. Both tables have a TEXT type column named category.
Example:
Table_a
|-id-|-category-|
| 1 | fruits |
| 2 | meats |
| 3 | fruits |
| 4 | sweets |
| 5 | meats |
Table_b
|-id-|-category-|
| 1 | veggies |
| 2 | meats |
| 3 | veggies |
| 4 | veggies |
| 5 | meats |
What I need is to select all distinct categories from both tables in alphabetic order.
The result should be:
fruits
meats
sweets
veggies
Thank you
You should use UNION and an ORDER BY clause :
SELECT DISTINCT category
FROM Table_A
UNION
SELECT DISTINCT category
FROM Table_B
ORDER BY category
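
A quick way to confirm the UNION plus ORDER BY behaviour on the sample data is to run it in SQLite through Python's sqlite3 module, as sketched below. SQLite is an assumption here since the question does not name a database. UNION already removes duplicates, so the sketch leaves out the DISTINCT keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, category TEXT)")
conn.execute("CREATE TABLE table_b (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [(1, "fruits"), (2, "meats"), (3, "fruits"), (4, "sweets"), (5, "meats")],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [(1, "veggies"), (2, "meats"), (3, "veggies"), (4, "veggies"), (5, "meats")],
)

# UNION de-duplicates across both tables; ORDER BY applies to the combined result.
query = """
SELECT category FROM table_a
UNION
SELECT category FROM table_b
ORDER BY category
"""
categories = [row[0] for row in conn.execute(query)]
print(categories)
```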
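In SQL you could use UNION in a subquery and apply ORDER BY to the outer query (an ORDER BY is not allowed on the first branch of a UNION):
select distinct category from (
select category
from table_a
union
select category
from table_b ) t
order by category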

Sql two table query most duplicated foreign key

I have these two tables, sport and student:
First table sport:
|idsport | name |
_______________________
| 1 | bobsled |
| 2 | skating |
| 3 | boarding |
| 4 | iceskating |
| 5 | skiing |
Second table student:
foreign key
|idstudent | name | sport_idsport
__________________________________________
| 1 | john | 3 |
| 2 | pauly | 2 |
| 3 | max | 1 |
| 4 | jane | 2 |
| 5 | nico | 5 |
So far I have this, which outputs the sport id that appears most often, but I can't get it to work
with two tables:
SELECT sport_idsport
FROM (SELECT sport_idsport FROM student GROUP BY sport_idsport ORDER BY COUNT(*) desc)
WHERE ROWNUM<=1;
I need to output name of most popular sport, in that case it would be skating.
I use oracle sql.
with counter as (
Select sport_idsport,
count(*) as cnt,
dense_rank() over (order by count(*) desc) as rn
from student
group by sport_idsport
)
select s.*, c.cnt
from sport s
join counter c on c.sport_idsport = s.idsport and c.rn = 1;
SQLFiddle example: http://sqlfiddle.com/#!4/b76e21/1
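
For readers without an Oracle instance, the count, rank, and join steps can be sketched in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The CTE names `counts` and `ranked` are assumptions for the illustration, and the counting and ranking are split into two steps purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sport (idsport INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT, "
    "sport_idsport INTEGER REFERENCES sport(idsport))"
)
conn.executemany(
    "INSERT INTO sport VALUES (?, ?)",
    [(1, "bobsled"), (2, "skating"), (3, "boarding"), (4, "iceskating"), (5, "skiing")],
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "john", 3), (2, "pauly", 2), (3, "max", 1), (4, "jane", 2), (5, "nico", 5)],
)

# Count students per sport, rank the counts, then join to get the sport name.
query = """
WITH counts AS (
    SELECT sport_idsport, COUNT(*) AS cnt
    FROM student
    GROUP BY sport_idsport
),
ranked AS (
    SELECT sport_idsport, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rn
    FROM counts
)
SELECT s.name, r.cnt
FROM sport AS s
JOIN ranked AS r ON r.sport_idsport = s.idsport
WHERE r.rn = 1
"""
most_popular = conn.execute(query).fetchall()
print(most_popular)
```

Because DENSE_RANK() gives tied sports the same rank, this returns every sport that shares the top count, whereas a ROWNUM-style cut-off keeps only one of them.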
select cnt, sport_idsport from (
select count(*) cnt, sport_idsport
from student
group by sport_idsport
order by count(*) desc
)
where rownum = 1