BigQuery to aggregate with array - google-bigquery

Given 2 tables below
WITH id_tags AS
(
SELECT 1 as ID, ['Michael','New York'] as tags UNION ALL
SELECT 2 as ID, ['Michael','Jon', 'Texas'] as tags UNION ALL
SELECT 3 as ID, ['abcd','Washington'] as tags UNION ALL
SELECT 4 as ID, ['Washington','New York','Michael'] as tags UNION ALL
SELECT 5 as ID, ['America','Michael'] as tags UNION ALL
SELECT 6 as ID, ['Washington','Michael', 'defg'] as tags UNION ALL
SELECT 7 as ID, ['America','Burqq','defg'] as tags
),
tagsCategory AS(
SELECT 'Michael' as tags, 'Person' as category UNION ALL
SELECT 'Jon' as tags, 'Person' as category UNION ALL
SELECT 'Burqq' as tags, 'Person' as category UNION ALL
SELECT 'New York' as tags, 'City' as category UNION ALL
SELECT 'Washington' as tags, 'City' as category UNION ALL
SELECT 'Texas' as tags, 'City' as category
)
I want to display an exception list. An ID belongs on the exception list when its tags contain 0 or more than 1 person,
or when they contain 0 or more than 1 city (basically, each ID must have exactly 1 tag per category).
Expected results:
----------------------------------
ID | Reason
----------------------------------
2 | Person more than 1
3 | Only 1 city
4 | More than 1 city detected
5 | No City Detected
7 | No City Detected
Explanation
ID 1 is totally fine: it has 1 city and 1 person, therefore it is not in the list
ID 2 is listed because it has 2 people
ID 3 is listed because it has 1 city but no person; we can ignore 'abcd' as it is not in the tags category table
ID 4 is listed because it has 2 cities
ID 5 is listed because America is not a city, so there is no city in that row
ID 6 is fine. We can ignore 'defg' as it is not in the tags category table
ID 7 is listed for the same reason as ID 5
It looked easy to me at first glance; however, I keep finding bugs in my query. Do you have any suggestions for the logic, or even an example query?
I use BigQuery standard SQL.

WITH id_tags AS
(
SELECT 1 AS ID, ['Michael','New York'] AS tags UNION ALL
SELECT 2 AS ID, ['Michael','Jon', 'Texas'] AS tags UNION ALL
SELECT 3 AS ID, ['abcd','Washington'] AS tags UNION ALL
SELECT 4 AS ID, ['Washington','New York','Michael'] AS tags UNION ALL
SELECT 5 AS ID, ['America','Michael'] AS tags UNION ALL
SELECT 6 AS ID, ['Washington','Michael', 'defg'] AS tags UNION ALL
SELECT 7 AS ID, ['America','Burqq','defg'] AS tags
),
tagsCategory AS(
SELECT 'Michael' AS tags, 'Person' AS category UNION ALL
SELECT 'Jon' AS tags, 'Person' AS category UNION ALL
SELECT 'Burqq' AS tags, 'Person' AS category UNION ALL
SELECT 'New York' AS tags, 'City' AS category UNION ALL
SELECT 'Washington' AS tags, 'City' AS category UNION ALL
SELECT 'Texas' AS tags, 'City' AS category
)
SELECT id,
COUNTIF(category = 'City') AS cities,
COUNTIF(category = 'Person') AS names
FROM id_tags, UNNEST(tags) tag
JOIN tagsCategory ON tag = tagsCategory.tags
GROUP BY id
HAVING NOT cities = 1 OR NOT names = 1
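The same counting logic can be sketched outside BigQuery. The following is only an illustration using Python's built-in sqlite3 module, assuming the arrays are flattened to one row per tag (SQLite has no ARRAY type or UNNEST); the table names id_tag and tags_category are made up for this sketch. It uses a LEFT JOIN so that an ID whose tags match no category at all would still be reported, whereas an inner join would drop such an ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_tag (id INTEGER, tag TEXT);            -- one row per array element
CREATE TABLE tags_category (tag TEXT, category TEXT);  -- hypothetical table names
""")

# Sample data from the question, with the arrays flattened.
id_tags = {
    1: ["Michael", "New York"],
    2: ["Michael", "Jon", "Texas"],
    3: ["abcd", "Washington"],
    4: ["Washington", "New York", "Michael"],
    5: ["America", "Michael"],
    6: ["Washington", "Michael", "defg"],
    7: ["America", "Burqq", "defg"],
}
conn.executemany(
    "INSERT INTO id_tag VALUES (?, ?)",
    [(i, tag) for i, tags in id_tags.items() for tag in tags],
)
conn.executemany(
    "INSERT INTO tags_category VALUES (?, ?)",
    [("Michael", "Person"), ("Jon", "Person"), ("Burqq", "Person"),
     ("New York", "City"), ("Washington", "City"), ("Texas", "City")],
)

# Count cities and persons per ID; keep IDs where either count is not exactly 1.
exceptions = conn.execute("""
SELECT id, cities, persons
FROM (
    SELECT t.id,
           COUNT(CASE WHEN c.category = 'City'   THEN 1 END) AS cities,
           COUNT(CASE WHEN c.category = 'Person' THEN 1 END) AS persons
    FROM id_tag AS t
    LEFT JOIN tags_category AS c ON c.tag = t.tag
    GROUP BY t.id
)
WHERE cities <> 1 OR persons <> 1
ORDER BY id
""").fetchall()

for row in exceptions:
    print(row)  # (id, cities, persons)
```

From the two counts, a reason text such as "No city detected" can be derived with a CASE expression in the outer query.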

Related

Join a table that depends on another table

I have a POST table, a CATEGORY table, a TAG table and a MIGRATION_TAG table. The MIGRATION_TAG table records the movement of tags between categories. For example, if the tag with ID = 1 belongs to the category with ID = 10 and I change its category to 12, a row is added to the MIGRATION_TAG table as follows:
ID 1 TAG_ID 1 CATEGORY_ID 12
the POST table
id         title      content    tag_id
---------- ---------- ---------- ----------
1          title1     Text...    1
2          title2     Text...    3
3          title3     Text...    1
4          title4     Text...    2
5          title5     Text...    5
6          title6     Text...    4
the CATEGORY table
id         name
---------- ----------
1          category_1
2          category_2
3          category_3
the TAG table
id         name       first_category_id
---------- ---------- -----------------
1          tag_1      1
2          tag_2      1
3          tag_3      3
4          tag_4      1
5          tag_5      2
the MIGRATION_TAG table
id         tag_id     category_id
---------- ---------- -----------
9          1          3
8          5          1
7          1          2
5          3          1
4          2          2
3          5          3
2          3          3
1          1          3
So I would like to know how many posts are registered for each category.
If a tag has never changed category, it keeps its first category.
I managed to join the TAG table to the POST table via a LEFT JOIN, but the problem is that the join must depend on the MIGRATION_TAG table: it must check whether there has been a migration and, if so, return the latest migration (the row with the highest id) for each tag.
Here is my query:
select category, COUNT(*) AS number_of_posts
from(
select CATEGORY.name,
case
when POST.tag_id is not null then CATEGORY.name
end as category
from POST
left join TAG ON POST.tag_id = TAG.id
left join (
select id, MAX(tag_id) tag_id
from MIGRATION_TAG
group by id, tag_id
) MIGRATION_TAG
ON TAG.id = MIGRATION_TAG.tag_id
left join CATEGORY on MIGRATION_TAG.category_id = CATEGORY.id
)
GROUP BY category
;
Here is the result I want my query to display.
Important: the post with id = 6 has tag_id = 4, which was never migrated, so it uses the first_category_id from the TAG table.
category   number_of_posts
---------- ---------------
category_1 3
category_2 1
category_3 2
Best regards
You can use:
SELECT MAX(c.name) AS category,
COUNT(*)
FROM post p
INNER JOIN tag t
ON (p.tag_id = t.id)
LEFT OUTER JOIN (
SELECT tag_id,
MAX(category_id) KEEP (DENSE_RANK LAST ORDER BY id) AS category_id
FROM migration_tag
GROUP BY tag_id
) m
ON (t.id = m.tag_id)
INNER JOIN category c
ON ( COALESCE(m.category_id, t.first_category_id) = c.id )
GROUP BY c.id
ORDER BY category
Which, for the sample data:
CREATE TABLE POST ( id, title, content, tag_id ) AS
SELECT 1, 'title1', 'Text...', 1 FROM DUAL UNION ALL
SELECT 2, 'title2', 'Text...', 3 FROM DUAL UNION ALL
SELECT 3, 'title3', 'Text...', 1 FROM DUAL UNION ALL
SELECT 4, 'title4', 'Text...', 2 FROM DUAL UNION ALL
SELECT 5, 'title5', 'Text...', 5 FROM DUAL UNION ALL
SELECT 6, 'title6', 'Text...', 4 FROM DUAL;
CREATE TABLE CATEGORY ( id, name ) AS
SELECT 1, 'category_1' FROM DUAL UNION ALL
SELECT 2, 'category_2' FROM DUAL UNION ALL
SELECT 3, 'category_3' FROM DUAL;
CREATE TABLE TAG (id, name, first_category_id) AS
SELECT 1, 'tag_1', 1 FROM DUAL UNION ALL
SELECT 2, 'tag_2', 1 FROM DUAL UNION ALL
SELECT 3, 'tag_3', 3 FROM DUAL UNION ALL
SELECT 4, 'tag_4', 1 FROM DUAL UNION ALL
SELECT 5, 'tag_5', 2 FROM DUAL;
CREATE TABLE MIGRATION_TAG ( id, tag_id, category_id ) AS
SELECT 9, 1, 3 FROM DUAL UNION ALL
SELECT 8, 5, 1 FROM DUAL UNION ALL
SELECT 7, 1, 2 FROM DUAL UNION ALL
SELECT 5, 3, 1 FROM DUAL UNION ALL
SELECT 4, 2, 2 FROM DUAL UNION ALL
SELECT 3, 5, 3 FROM DUAL UNION ALL
SELECT 2, 3, 3 FROM DUAL UNION ALL
SELECT 1, 1, 3 FROM DUAL;
Outputs:
CATEGORY   COUNT(*)
---------- --------
category_1        3
category_2        1
category_3        2
One option uses a join to bring in the tag table, and then a lateral join (OUTER APPLY) to look up the latest migration, if any. We can then use COALESCE to fall back to the tag's first category when there is no migration:
select coalesce(t2.category_id, t.first_category_id) category, count(*) number_of_posts
from post p
inner join tag t on t.id = p.tag_id
outer apply (
select mt.category_id
from migration_tag mt
where mt.tag_id = p.tag_id
order by mt.id desc fetch first row only
) t2
group by coalesce(t2.category_id, t.first_category_id)
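For engines without KEEP (DENSE_RANK ...) or OUTER APPLY, the "latest migration per tag, else first category" rule can also be written with ROW_NUMBER. The following is a sketch only, run through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later for window functions) against the sample data from the question; the CTE name latest is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER, title TEXT, tag_id INTEGER);
CREATE TABLE category (id INTEGER, name TEXT);
CREATE TABLE tag (id INTEGER, name TEXT, first_category_id INTEGER);
CREATE TABLE migration_tag (id INTEGER, tag_id INTEGER, category_id INTEGER);

INSERT INTO post VALUES (1,'title1',1),(2,'title2',3),(3,'title3',1),
                        (4,'title4',2),(5,'title5',5),(6,'title6',4);
INSERT INTO category VALUES (1,'category_1'),(2,'category_2'),(3,'category_3');
INSERT INTO tag VALUES (1,'tag_1',1),(2,'tag_2',1),(3,'tag_3',3),
                       (4,'tag_4',1),(5,'tag_5',2);
INSERT INTO migration_tag VALUES (9,1,3),(8,5,1),(7,1,2),(5,3,1),
                                 (4,2,2),(3,5,3),(2,3,3),(1,1,3);
""")

posts_per_category = conn.execute("""
WITH latest AS (
    -- rn = 1 marks the most recent migration (highest id) of each tag
    SELECT tag_id, category_id,
           ROW_NUMBER() OVER (PARTITION BY tag_id ORDER BY id DESC) AS rn
    FROM migration_tag
)
SELECT c.name AS category, COUNT(*) AS number_of_posts
FROM post AS p
JOIN tag AS t ON t.id = p.tag_id
LEFT JOIN latest AS m ON m.tag_id = t.id AND m.rn = 1
-- tags that never migrated fall back to their first category
JOIN category AS c ON c.id = COALESCE(m.category_id, t.first_category_id)
GROUP BY c.id, c.name
ORDER BY c.name
""").fetchall()

for row in posts_per_category:
    print(row)
```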

Categorizing in select statement

So I have this table
id | object | type
--------------------------------
1 | blue | color
1 | burger | food
2 | sandwich | food
2 | red | color
2 | coke | beverage
3 | sprite | beverage
3 | coke | beverage
3 | red | color
4 | bacon | food
I have to create a select statement that returns a table with the columns id, color, food and beverage, ordered by id, with each object placed in the column for its type.
So my expected result is
id | color | food | beverage
-------------------------------------------
1 | blue | burger |
2 | red | sandwich | coke
3 | red | | sprite
3 | | | coke
4 | | bacon |
as of now i have this code
Select id as id,
Case When I.Type = 'color' Then I.Object End As color,
Case When I.Type = 'food' Then I.Object End As food,
Case When I.Type = 'beverage' Then I.Object End As beverage
From table I
order by id
But the problem with my code is that it doesn't group by ID, so it creates a separate row for every object.
TIA!
You are looking for a pivot query. What is challenging about your problem is that, for a given id and type, there can be more than one object present. To handle this, you can first do a GROUP BY query to CSV aggregate objects for a given type using LISTAGG:
SELECT id,
MAX(CASE WHEN t.type = 'color' THEN t.object ELSE NULL END) AS color,
MAX(CASE WHEN t.type = 'food' THEN t.object ELSE NULL END) AS food,
MAX(CASE WHEN t.type = 'beverage' THEN t.object ELSE NULL END) AS beverage
FROM
(
SELECT id,
LISTAGG(object, ',') WITHIN GROUP (ORDER BY object) AS object,
type
FROM yourTable
GROUP BY id, type
) t
GROUP BY t.id
The inner query first aggregates objects across both id and type, and the outer query is a simple pivot query as you might expect.
You can try with something like the following:
with test(id, object, type) as
(
select 1,'blue', 'color' from dual union all
select 1,'burger', 'food' from dual union all
select 2,'sandwich','food' from dual union all
select 2,'red', 'color' from dual union all
select 2,'coke', 'beverage' from dual union all
select 3,'sprite', 'beverage' from dual union all
select 3,'coke', 'beverage' from dual union all
select 3,'red', 'color' from dual union all
select 4,'bacon', 'food' from dual
)
select id,
max( case when type = 'color'
then object
else null
end
) as color,
max( case when type = 'food'
then object
else null
end
) as food,
max( case when type = 'beverage'
then object
else null
end
) as beverage
from (
select id, object, type, row_number() over ( partition by id, type order by object) row_for_id
from test
)
group by id, row_for_id
order by id, row_for_id
The inner query is the main part: it numbers the objects of each type within an id, which handles the case of a single id with many objects of one type. You can change how objects are paired across types by editing the ORDER BY object clause.
The outer query can be rewritten in different ways, for example with a PIVOT; I used MAX hoping to make it clear.
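To see the row-numbering step at work outside Oracle, here is a small sketch that runs the same ROW_NUMBER plus MAX(CASE ...) query through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later for window functions). The data is the sample from the question; nothing else is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, object TEXT, type TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, "blue", "color"), (1, "burger", "food"),
    (2, "sandwich", "food"), (2, "red", "color"), (2, "coke", "beverage"),
    (3, "sprite", "beverage"), (3, "coke", "beverage"), (3, "red", "color"),
    (4, "bacon", "food"),
])

pivoted = conn.execute("""
SELECT id,
       MAX(CASE WHEN type = 'color'    THEN object END) AS color,
       MAX(CASE WHEN type = 'food'     THEN object END) AS food,
       MAX(CASE WHEN type = 'beverage' THEN object END) AS beverage
FROM (
    -- number the objects of each type within an id (alphabetical order)
    SELECT id, object, type,
           ROW_NUMBER() OVER (PARTITION BY id, type ORDER BY object) AS row_for_id
    FROM test
)
GROUP BY id, row_for_id      -- one output row per (id, rank) pair
ORDER BY id, row_for_id
""").fetchall()

for row in pivoted:
    print(row)
```

Because objects are paired by alphabetical rank, id 3 pairs 'red' with 'coke' and leaves 'sprite' on its own row, which differs from the pairing in the question's expected output.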
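Try this
I achieved this using the PIVOT clause. Select * from the pivoted result, because the object and type columns no longer exist after the pivot:
select *
from (select id, object, type from yourtable)
pivot
(
LISTAGG(object, ',') WITHIN GROUP (ORDER BY object)
for type IN
(
'color' AS "color",
'food' AS "food",
'beverage' AS "beverage"
)
)
order by id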
Robin: My comment to you (under Tim Biegeleisen's answer) is partially incorrect. There IS a pivot-based solution; however, the "groups" are not by id, but instead they are by id AND rank within your three "categories". For this solution (or ANY solution that does not use dynamic SQL) to work, it is necessary that all the "types" (and their names) be known beforehand, and they must be hardcoded in the SQL query.
NOTE: In this solution, I assumed that for each id, the "objects" within the same "type" are associated to each other based on their alphabetical order (so, for example, for id = 3, "coke" is associated with "red" and "sprite" is associated with NULL, unlike your sample output). I asked you about that right below your Question - if there are additional rules you did not share with us, requiring a different pairing of objects of different types, it may or may not be possible to adapt the solution to meet those additional rules.
EDIT: On closer look, this is pretty much what Aleksej provided, without using the explicit pivot syntax. His solution has the advantage that it would work in older versions of Oracle (before 11.1 where pivot first became available).
QUERY (including test data in the first CTE):
with
inputs ( id, object, type ) as (
select 1, 'blue' , 'color' from dual union all
select 1, 'burger' , 'food' from dual union all
select 2, 'sandwich' , 'food' from dual union all
select 2, 'red' , 'color' from dual union all
select 2, 'coke' , 'beverage' from dual union all
select 3, 'sprite' , 'beverage' from dual union all
select 3, 'coke' , 'beverage' from dual union all
select 3, 'red' , 'color' from dual union all
select 4, 'bacon' , 'food' from dual
),
r ( id, object, type, rn ) as (
select id, object, type, row_number() over (partition by id, type order by object)
from inputs
)
select id, color, food, beverage
from r
pivot ( max(object) for type in ( 'color' as color, 'food' as food,
'beverage' as beverage))
order by id, rn
;
OUTPUT:
  ID COLOR    FOOD     BEVERAGE
---- -------- -------- --------
   1 blue     burger
   2 red      sandwich coke
   3 red               coke
   3                   sprite
   4          bacon

Oracle SQL displaying data from multiple records

Sorry if I am not explaining this properly, I am relatively new to SQL.
In Oracle, I have a table describing properties (city, property type, cost of rent per month, other information).
My question is: assuming 3 unique property types (hotel, house, empty lot), how can I show which cities do not have all 3 types of properties?
GROUP BY solution: return each city that has fewer than 3 different property types among the three:
select city
from tablename
where property_type in ('hotel', 'house', 'empty lot')
group by city
having count(distinct property_type) < 3
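One detail of this approach is worth checking on data: the WHERE clause removes a city entirely if it has none of the three types, so such a city never reaches the HAVING test. The sketch below, run through Python's built-in sqlite3 module, compares the query above with a variant that moves the filter into a conditional aggregate. The table name properties and the sample rows (including city 4 and the 'scrap yard' type) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (city INTEGER, property_type TEXT)")
conn.executemany("INSERT INTO properties VALUES (?, ?)", [
    (1, "hotel"), (1, "house"), (1, "empty lot"),                 # all three types
    (2, "hotel"), (2, "hotel"), (2, "house"), (2, "scrap yard"),  # two hotels, no empty lot
    (3, "empty lot"), (3, "house"),                               # no hotel
    (4, "scrap yard"),                                            # none of the three
])

# Filter first, then count: city 4 is filtered out before grouping.
filtered = [r[0] for r in conn.execute("""
SELECT city
FROM properties
WHERE property_type IN ('hotel', 'house', 'empty lot')
GROUP BY city
HAVING COUNT(DISTINCT property_type) < 3
ORDER BY city
""")]

# Conditional aggregate: every city is grouped, so city 4 is kept (count 0).
conditional = [r[0] for r in conn.execute("""
SELECT city
FROM properties
GROUP BY city
HAVING COUNT(DISTINCT CASE WHEN property_type IN ('hotel', 'house', 'empty lot')
                           THEN property_type END) < 3
ORDER BY city
""")]

print(filtered, conditional)
```

Which behaviour is wanted depends on whether a city with none of the three types should count as "not having all 3 types"; the question does not say.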
Your SQL query should be
SELECT City
FROM YourTable
WHERE hotel <> 'hotelname' and house <> 'housename' and emptylot <> 'name'
assuming Hotel, House and Emptylot are column names in your table.
There are two ways:
GROUP BY
Analytic COUNT() OVER()
For example, let's say I have sample data for 3 cities, where city 1 has all the property types and the other cities do not have all the required property types.
Using GROUP BY
SQL> -- sample table data
SQL> WITH DATA AS(
SELECT 1 city, 'hotel' property FROM dual UNION ALL
SELECT 1 city, 'house' property FROM dual UNION ALL
SELECT 1 city, 'empty' property FROM dual UNION ALL
SELECT 2 city, 'hotel' property FROM dual UNION ALL
SELECT 2 city, 'house' property FROM dual UNION ALL
SELECT 2 city, 'scrap' property FROM dual UNION ALL
SELECT 3 city, 'empty' property FROM dual UNION ALL
select 3 city, 'house' property from dual
)
-- query; DISTINCT keeps duplicate property rows from being counted twice
SELECT city
FROM data
WHERE property IN ('hotel', 'house', 'empty')
GROUP BY city
HAVING COUNT(DISTINCT property) < 3
/
      CITY
----------
         2
         3
SQL>
Using Analytic COUNT() OVER()
SQL> -- sample table data
SQL> WITH DATA AS(
SELECT 1 city, 'hotel' property FROM dual UNION ALL
SELECT 1 city, 'house' property FROM dual UNION ALL
SELECT 1 city, 'empty' property FROM dual UNION ALL
SELECT 2 city, 'hotel' property FROM dual UNION ALL
SELECT 2 city, 'house' property FROM dual UNION ALL
SELECT 2 city, 'scrap' property FROM dual UNION ALL
SELECT 3 city, 'empty' property FROM dual UNION ALL
select 3 city, 'house' property from dual
)
-- query; cnt is the number of distinct matching property types per city
SELECT DISTINCT city
FROM
(SELECT t.* ,
COUNT(DISTINCT property) OVER(PARTITION BY city) cnt
FROM DATA t
WHERE property IN ('hotel', 'house', 'empty')
)
WHERE cnt < 3
/
      CITY
----------
         2
         3
SQL>
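Could be something like this (a city qualifies if at least one of the three types is missing for it):
SELECT DISTINCT City
FROM YourTable t
WHERE NOT EXISTS (SELECT 1 FROM YourTable x WHERE x.City = t.City AND x.property_type = 'hotel') OR
NOT EXISTS (SELECT 1 FROM YourTable x WHERE x.City = t.City AND x.property_type = 'empty lot') OR
NOT EXISTS (SELECT 1 FROM YourTable x WHERE x.City = t.City AND x.property_type = 'house')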
(Edited) Try this query, which returns the cities that have none of the three property types:
select t.city from table_name t
where t.city NOT IN
(select city from table_name
where ( property_type ='hotel' or
property_type ='house' or
property_type ='Empty lot')
);
(Query returning cities where not all three types of property are present):
select t.city from table_name t
group by t.city
having count(distinct case when t.property_type in ('House','Hotel','Empty lot')
then t.property_type end) < 3 ;

How to do select count(*) group by and select * at same time?

For example, I have table:
ID | Value
1 hi
1 yo
2 foo
2 bar
2 hehe
3 ha
6 gaga
I want my query to get ID, Value; meanwhile the returned set should be in the order of frequency count of each ID.
I tried the query below but don't know how to get the ID and Value column at the same time:
SELECT COUNT(*) FROM TABLE group by ID order by COUNT(*) desc;
The count number doesn't matter to me, I just need the data to be in such order.
Desired result:
ID | Value
2 foo
2 bar
2 hehe
1 hi
1 yo
3 ha
6 gaga
As you can see because ID:2 appears most times(3 times), it's first on the list,
then ID:1(2 times) etc.
you can try this -
select id, value, count(*) over (partition by id) freq_count
from
(
select 2 as ID, 'foo' as value
from dual
union all
select 2, 'bar'
from dual
union all
select 2, 'hehe'
from dual
union all
select 1 , 'hi'
from dual
union all
select 1 , 'yo'
from dual
union all
select 3 , 'ha'
from dual
union all
select 6 , 'gaga'
from dual
)
order by 3 desc;
select t.id, t.value
from TABLE t
inner join
(
SELECT id, count(*) as cnt
FROM TABLE
group by ID
)
x on x.id = t.id
order by x.cnt desc
How about something like
SELECT t.ID,
t.Value,
c.Cnt
FROM TABLE t INNER JOIN
(
SELECT ID,
COUNT(*) Cnt
FROM TABLE
GROUP BY ID
) c ON t.ID = c.ID
ORDER BY c.Cnt DESC
I see the question is already answered, but since the most obvious and most simple solution is missing, I'm posting it anyway. It doesn't use self joins nor subqueries:
SQL> create table t (id,value)
as
select 1, 'hi' from dual union all
select 1, 'yo' from dual union all
select 2, 'foo' from dual union all
select 2, 'bar' from dual union all
select 2, 'hehe' from dual union all
select 3, 'ha' from dual union all
select 6, 'gaga' from dual
/
Table created.
SQL> select id
, value
from t
order by count(*) over (partition by id) desc
/
        ID VALU
---------- ----
         2 bar
         2 hehe
         2 foo
         1 yo
         1 hi
         6 gaga
         3 ha
7 rows selected.
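The same "window function in ORDER BY" idea can be tried without an Oracle instance. This is a sketch using Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later); the extra id tie-breaker in the ORDER BY is an addition for the example, so that IDs with equal frequency come out in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "hi"), (1, "yo"),
    (2, "foo"), (2, "bar"), (2, "hehe"),
    (3, "ha"), (6, "gaga"),
])

# Sort rows by how often their id occurs; the count itself is not selected.
rows = conn.execute("""
SELECT id, value
FROM t
ORDER BY COUNT(*) OVER (PARTITION BY id) DESC, id
""").fetchall()

for row in rows:
    print(row)
```

The order of the values within one ID is not specified by this query; add value (or another column) to the ORDER BY if it matters.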

Find Missing Pairs in SQL

Assume there's a relational database with 3 tables:
Courses {name, id},
Students {name, id},
Student_Course {student_id, course_id}
I want to write an SQL that gives me the student-course pairs that do NOT exist. If that is not feasible, at least it'd be good to know if there are missing pairs or not.
Also, since this is a small part of a larger problem I'd like to automate, seeing many different ways of doing it would be useful.
First find all possible pairs, then remove the pairs that are present (either with a left join plus an IS NULL check, or with NOT EXISTS):
select s.id as student_id, c.id as course_id
from Courses as c
cross join Students as s
left join Student_Course as sc on sc.student_id = s.id and sc.course_id = c.id
where sc.course_id is null -- any sc field defined as "not null"
with Courses as(
select 1 as id,'Math' as name union all
select 2 as id,'English' as name union all
select 3 as id,'Physics' as name union all
select 4 as id,'Chemistry' as name),
Students as(
select 1 as id,'John' as name union all
select 2 as id,'Joseph' as name union all
select 3 as id,'George' as name union all
select 4 as id,'Michael' as name
),
studcrse as(
select 1 as studid, 1 as crseid union all
select 1 as studid, 2 as crseid union all
select 1 as studid, 3 as crseid union all
select 2 as studid, 3 as crseid union all
select 2 as studid, 4 as crseid union all
select 3 as studid, 1 as crseid union all
select 3 as studid, 2 as crseid union all
select 3 as studid, 4 as crseid union all
select 3 as studid, 3 as crseid union all
select 4 as studid, 4 as crseid )
SELECT A.ID AS studentId,a.name as studentname,b.id as crseid,b.name as crsename
from Students as a
cross join
Courses as b
where not exists
(
select 1 from studcrse as c
where c.studid=a.id
and c.crseid=b.id)
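Since seeing several ways of doing it was requested, a third formulation is a set difference: all pairs EXCEPT the existing pairs. The sketch below runs it through Python's built-in sqlite3 module using the table names from the question and the sample rows from the answer above (EXCEPT is called MINUS in Oracle). The has_missing flag at the end is an illustrative addition for the "at least know whether any pair is missing" part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (id INTEGER, name TEXT);
CREATE TABLE Students (id INTEGER, name TEXT);
CREATE TABLE Student_Course (student_id INTEGER, course_id INTEGER);

INSERT INTO Courses VALUES (1,'Math'),(2,'English'),(3,'Physics'),(4,'Chemistry');
INSERT INTO Students VALUES (1,'John'),(2,'Joseph'),(3,'George'),(4,'Michael');
INSERT INTO Student_Course VALUES (1,1),(1,2),(1,3),(2,3),(2,4),
                                  (3,1),(3,2),(3,4),(3,3),(4,4);
""")

# Every possible (student, course) pair minus the pairs that exist.
missing = conn.execute("""
SELECT s.id AS student_id, c.id AS course_id
FROM Students AS s CROSS JOIN Courses AS c
EXCEPT
SELECT student_id, course_id FROM Student_Course
ORDER BY 1, 2
""").fetchall()

# Cheaper yes/no check: complete coverage means the link table holds
# (number of students) x (number of courses) distinct pairs.
has_missing = conn.execute("""
SELECT (SELECT COUNT(*) FROM Students) * (SELECT COUNT(*) FROM Courses)
     > (SELECT COUNT(*) FROM (SELECT DISTINCT student_id, course_id
                              FROM Student_Course))
""").fetchone()[0] == 1

print(missing, has_missing)
```

The count comparison assumes every row in Student_Course refers to an existing student and course; without foreign keys enforcing that, the EXCEPT query is the safer check.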