SQL: add example values from grouped data

Hello, I have a small question about Oracle SQL.
I have a table Auto_Parts:
Category,Manufacturer_id,Part_name
Tires,Michelin, Pilot Pro
Tires,Michelin, Power One
Tires,Bridgestone, Potenza
Tires,Bridgestone, Turanza
Tires,Bridgestone, Blizzak
The query:
select Category,Manufacturer_id,count(*) cnt,example_1,example_2,example_3
from auto_parts
group by Category,Manufacturer_id
Desired result:
Category,Manufacturer_id,cnt,example_1,example_2,example_3
Tires,Michelin,1000,Pilot Pro,Power One,Power Two
Tires,Bridgestone,200,Potenza,Turanza,Blizzak
Question: how can I pick 3 arbitrary part names per (Category, Manufacturer_id) group and present them as 3 extra columns in my query? A sample output is shown above; the extra columns are example_1, example_2 and example_3.

This should do the trick. You don't need the WITH block; I only used it to mimic your data, so your query would start at "select Category...".
with auto_parts as
(
select 'Tires' as Category,'Michelin' as Manufacturer_id, 'Pilot Pro' as part_name from dual union all
select 'Tires' as Category,'Michelin' as Manufacturer_id, 'Power One' as part_name from dual union all
select 'Tires' as Category,'Bridgestone' as Manufacturer_id, 'Potenza' as part_name from dual union all
select 'Tires' as Category,'Bridgestone' as Manufacturer_id, 'Turanza' as part_name from dual union all
select 'Tires' as Category,'Bridgestone' as Manufacturer_id, 'Blizzak' as part_name from dual
)
select
Category,
Manufacturer_id,
count(*) cnt ,
max(case when rn = 1 then part_name end) example_1,
max(case when rn = 2 then part_name end) example_2,
max(case when rn = 3 then part_name end) example_3
from (
select category, manufacturer_id, part_name, row_number() over (partition by category, manufacturer_id order by dbms_random.value) rn
from auto_parts
)
group by Category,Manufacturer_id;
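The answer above orders by dbms_random.value, so the three examples can change on every run. If repeatable output is preferred, a minimal variant of the same technique (assuming the real auto_parts table with the columns from the question) orders by part_name instead:

```sql
-- Pick the first three part names alphabetically per (category, manufacturer_id),
-- so the same examples come back on every run.
select category,
       manufacturer_id,
       count(*) as cnt,                                       -- all parts in the group
       max(case when rn = 1 then part_name end) as example_1,
       max(case when rn = 2 then part_name end) as example_2,
       max(case when rn = 3 then part_name end) as example_3  -- NULL if fewer than 3 parts
from (
  select category, manufacturer_id, part_name,
         row_number() over (partition by category, manufacturer_id
                            order by part_name) as rn
  from auto_parts
)
group by category, manufacturer_id;
```

With the five sample rows, Bridgestone should get Blizzak, Potenza and Turanza, while Michelin gets Pilot Pro, Power One and a NULL example_3, because it only has two parts. cnt is still the full row count per group, since the inner query keeps every row.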


How to SUM values from 2 separate tables that share the same column name in SQL

I have 2 tables that have the exact same columns but different data. The columns are 'name', 'gender' and 'count'. The first table is called names_2014 and the second names_2015. My goal is simply to find the top 5 most popular names amongst both these tables.
I know how to get the most popular names from one table:
SELECT name, count
FROM names_2014
ORDER BY count DESC
LIMIT 5;
However, the closest I've gotten to my goal is:
SELECT name, count
FROM names_2014
UNION DISTINCT -- I've tried UNION ALL as well
SELECT name, SUM(count)
FROM names_2015
GROUP BY name
ORDER BY count DESC
LIMIT 5
I've tried many similar variations, but none of them work. It seems that I need to combine both tables, then SUM(count) and GROUP BY name, but I guess I'm not combining the tables properly. Any help is much appreciated; I've spent hours on this and feel like the solution is close, but I just can't see it. I'm new to SQL and just trying to test my limits.
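You can perform the aggregation on a subquery that unions the two tables, as follows. Use UNION ALL rather than UNION DISTINCT: a name that has the same count in both years would otherwise be collapsed into a single row and under-counted.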
select name, sum(count) cnt
from
(
select name, count
from names_2014
union all
select name, count
from names_2015
) T
group by name
order by cnt desc
limit 5
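The sketch below runs the same query on a few made-up rows (hypothetical data, not from the question) and also keeps gender in the grouping, in case the same name appears for both genders and should be ranked separately. Ava has the same count in both years, which is exactly the case UNION DISTINCT would silently collapse into one row:

```sql
-- Hypothetical sample rows standing in for the real names_2014 / names_2015 tables.
WITH names_2014 AS (
  SELECT 'Emma' AS name, 'F' AS gender, 100 AS count UNION ALL
  SELECT 'Ava', 'F', 50 UNION ALL
  SELECT 'Jordan', 'M', 40 UNION ALL
  SELECT 'Jordan', 'F', 30
),
names_2015 AS (
  SELECT 'Emma' AS name, 'F' AS gender, 90 AS count UNION ALL
  SELECT 'Ava', 'F', 50 UNION ALL
  SELECT 'Liam', 'M', 120 UNION ALL
  SELECT 'Jordan', 'M', 20
)
SELECT name, gender, SUM(count) AS cnt
FROM (
  -- UNION ALL keeps both ('Ava', 'F', 50) rows; UNION DISTINCT would drop one.
  SELECT name, gender, count FROM names_2014
  UNION ALL
  SELECT name, gender, count FROM names_2015
) t
GROUP BY name, gender
ORDER BY cnt DESC
LIMIT 5;
-- Emma F 190, Liam M 120, Ava F 100, Jordan M 60, Jordan F 30
```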
From your question it's not clear whether you want the top 5 separated by source table or not. Here is one answer that might be what you're looking for:
with name_2014 as (
select 'a' as name, 'm' as gender, 1 as cnt
union all
select 'b' as name, 'f' as gender, 3 as cnt
union all
select 'c' as name, 'm' as gender, 2 as cnt
),
name_2015 as (
select 'd' as name, 'f' as gender, 10 as cnt
union all
select 'b' as name, 'f' as gender, 5 as cnt
union all
select 'e' as name, 'm' as gender, 1 as cnt
)
(select 'name_2014' as src_table_name, name, sum(cnt) as total_counts from name_2014 group by name order by 3 desc limit 1)
union all
(select 'name_2015' as src_table_name, name, sum(cnt) as total_counts from name_2015 group by name order by 3 desc limit 1)
This sample query returns the top name per table (change LIMIT 1 to LIMIT 5 to get the top 5 per table).
If you don't need the table names in the output, drop the src_table_name column from the query above.
If you don't care about the source tables at all and just want the overall top 5, then:
with name_2014 as (
select 'a' as name, 'm' as gender, 1 as cnt
union all
select 'b' as name, 'f' as gender, 3 as cnt
union all
select 'c' as name, 'm' as gender, 2 as cnt
),
name_2015 as (
select 'd' as name, 'f' as gender, 10 as cnt
union all
select 'b' as name, 'f' as gender, 5 as cnt
union all
select 'e' as name, 'm' as gender, 1 as cnt
)
select name, sum(cnt) as total_count from
(select name, cnt from name_2014
union all
select name, cnt from name_2015)
group by 1 order by 2 desc limit 5
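If you do want the top 5 per source table, but without repeating a LIMIT for every branch, a window function can rank names inside each table. A minimal sketch, assuming the real names_2014 / names_2015 tables from the question and a dialect with window functions (for example BigQuery, PostgreSQL or MySQL 8+):

```sql
SELECT src_table_name, name, total_counts
FROM (
  SELECT src_table_name,
         name,
         SUM(cnt) AS total_counts,
         -- Rank names within each source table by their summed count.
         ROW_NUMBER() OVER (PARTITION BY src_table_name
                            ORDER BY SUM(cnt) DESC) AS rn
  FROM (
    SELECT 'names_2014' AS src_table_name, name, count AS cnt FROM names_2014
    UNION ALL
    SELECT 'names_2015', name, count FROM names_2015
  ) t
  GROUP BY src_table_name, name
) ranked
WHERE rn <= 5
ORDER BY src_table_name, rn;
```

ROW_NUMBER breaks ties arbitrarily; RANK() would keep tied names, at the cost of sometimes returning more than 5 rows per table.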

BigQuery: UNION on repeated fields with a different order of fields

How can I make UNION ALL work on repeated fields when the order of the struct fields does not match?
In the example below I try to UNION data_1_nested and data_2_nested, where the repeated field nested has two fields, age and grade, but in a different order.
I could UNNEST and re-nest, but this would not be very helpful if I have more than one nested field that I need to UNION on.
Example:
with
data_1 as (
Select 'a123' as id, 1 as age, 'a' as grade
union all
Select 'a123' as id, 3 as age,'b' as grade
union all
Select 'a123' as id, 4.5 as age,'c' as grade
)
,
data_2 as (
Select 'b456' as id, 6 as age,'e' as grade
union all
Select 'b456' as id, 5 as age,'f' as grade
union all
Select 'b456' as id, 2.5 as age,'g' as grade
)
,
data_1_nested as (
SELECT id,
array_agg(STRUCT(
age,grade
)) as nested
from data_1
group by 1
)
,
data_2_nested as (
SELECT id,
array_agg(STRUCT(
grade, age
)) as nested
from data_2
group by 1
)
SELECT * from data_1_nested
union all
SELECT * from data_2_nested
The following should work for you:
select * from data_1_nested
union all
select id, array(select as struct age, grade from t.nested) from data_2_nested t
If applied to the sample data from your question, this returns one row per id (a123 and b456), with both nested arrays using the (age, grade) field order.
I modified your data a little to create two nested fields that both need to be unioned. I also added a JS function for parsing the JSON. It is an ugly solution, but it seems to work. I'm not sure it is scalable (how many functions would have to be created to convert different nested fields?).
CREATE TEMP FUNCTION JsonToItems(input STRING)
RETURNS ARRAY<STRUCT<age FLOAT64, grade STRING>>
LANGUAGE js AS """
return JSON.parse(input);
""";
with
data_1 as (
Select 'a123' as id, 1 as age, 'a' as grade
union all
Select 'a123' as id, 3 as age,'b' as grade
union all
Select 'a123' as id, 4.5 as age,'c' as grade
)
,
data_2 as (
Select 'b456' as id, 6 as age,'e' as grade
union all
Select 'b456' as id, 5 as age,'f' as grade
union all
Select 'b456' as id, 2.5 as age,'g' as grade
)
,
data_1_nested as (
SELECT id,
array_agg(STRUCT(
age,grade
)) as nested,
array_agg(STRUCT(
age,grade
)) as nested2
from data_1
group by 1
)
,
data_2_nested as (
SELECT id,
array_agg(STRUCT(
grade, age
)) as nested,
array_agg(STRUCT(
grade, age
)) as nested2
from data_2
group by 1
)
select id, JsonToItems(json) as nested, JsonToItems(json2) as nested2 from (
SELECT id, TO_JSON_STRING(nested) as json, TO_JSON_STRING(nested2) as json2 from data_1_nested
union all
SELECT id, TO_JSON_STRING(nested) as json, TO_JSON_STRING(nested2) as json2 from data_2_nested
);
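For the two-field case, the ARRAY(SELECT AS STRUCT ...) re-ordering from the first answer can also simply be repeated once per repeated field, which avoids the JavaScript UDF and the JSON round trip. A sketch, assuming the data_1, data_2, data_1_nested and data_2_nested definitions from the modified example above are placed in its WITH clause; WITH OFFSET and ORDER BY are added so each rebuilt array keeps its original element order:

```sql
#standardSQL
-- Prepend the WITH clause (data_1 ... data_2_nested) from the example above.
SELECT id, nested, nested2
FROM data_1_nested
UNION ALL
SELECT
  t.id,
  -- Rebuild each array with the fields in (age, grade) order,
  -- preserving the original element order via the offset.
  ARRAY(SELECT AS STRUCT n.age, n.grade
        FROM UNNEST(t.nested) AS n WITH OFFSET AS pos
        ORDER BY pos),
  ARRAY(SELECT AS STRUCT n.age, n.grade
        FROM UNNEST(t.nested2) AS n WITH OFFSET AS pos
        ORDER BY pos)
FROM data_2_nested AS t
```

This still needs one ARRAY(...) expression per repeated field, but no extra function per struct type.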

Get parent path of hierarchical data in oracle

I have three tables, CATEGORY, GROUPING and PERFORMER. A category's direct children can be other categories, groupings or performers, and a grouping's children can be other groupings or performers. Given a category id, grouping id or performer id, I need to get the whole parent path of that id. How can I do this using SQL in Oracle?
if performer_id= 300 then result should be 300->202->201->101->100
if grouping_id = 203 then result should be 203->102->101->100
if category_id = 103 then result should be 103->101->100
Stack Overflow is not a code-writing service; you should always include examples of what you have tried. That said, I found this so intriguing that I thought I would give it a try myself.
This is a totally brute-force method. I extract the hierarchies from each level with recursive subqueries, then use LISTAGG to bring them together:
WITH
category
AS
(SELECT '100' category_id, NULL parent_id
FROM DUAL
UNION ALL
SELECT '101' category_id, '100' parent_id
FROM DUAL
UNION ALL
SELECT '102' category_id, '101' parent_id
FROM DUAL
UNION ALL
SELECT '103' category_id, '101' parent_id
FROM DUAL),
GROUPING
AS
(SELECT '200' GROUPING_ID, NULL parent_id, '101' category_id
FROM DUAL
UNION ALL
SELECT '201' GROUPING_ID, '200' parent_id, '101' category_id
FROM DUAL
UNION ALL
SELECT '202' GROUPING_ID, '201' parent_id, '101' category_id
FROM DUAL
UNION ALL
SELECT '203' GROUPING_ID, NULL parent_id, '102' category_id
FROM DUAL),
performer
AS
(SELECT '300' performer_id, '202' GROUPING_ID, '101' category_id
FROM DUAL
UNION ALL
SELECT '301' performer_id, '201' GROUPING_ID, '101' category_id
FROM DUAL
UNION ALL
SELECT '302' performer_id, '203' GROUPING_ID, '103' category_id
FROM DUAL
UNION ALL
SELECT '303' performer_id, NULL GROUPING_ID, '102' category_id
FROM DUAL),
pset (p_gid, p_parentid, p_catid)
AS
(SELECT GROUPING.GROUPING_ID, parent_id, GROUPING.category_id
FROM performer INNER JOIN GROUPING ON performer.GROUPING_ID = GROUPING.GROUPING_ID
WHERE performer_id = '300'
UNION ALL
SELECT GROUPING_ID, parent_id, category_id
FROM pset INNER JOIN GROUPING ON GROUPING_ID = p_parentid),
cset (p_catid, p_parent)
AS
(SELECT p_catid, parent_id
FROM pset INNER JOIN category ON pset.p_catid = category.category_id
UNION ALL
SELECT category_id, parent_id
FROM cset INNER JOIN category ON category_id = p_parent),
dset
AS
(SELECT p_catid
FROM cset
UNION
SELECT p_gid
FROM pset
UNION
SELECT '300'
FROM DUAL)
SELECT LISTAGG (p_catid, '->') WITHIN GROUP (ORDER BY p_catid DESC) AS performer_chain
FROM dset
And the result is:
PERFORMER_CHAIN
300->202->201->200->101->100
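The brute-force query is hard-coded to performer '300' and relies on the ids happening to sort in path order. A more general sketch, assuming ids are unique across the three tables (true for the 100/200/300 sample ids, but an assumption for real data), first maps every row to one (node, parent) edge and then walks that single edge list with one recursive subquery, ordering by depth rather than by id:

```sql
-- Standalone form against the real CATEGORY, GROUPING and PERFORMER tables.
-- To try it with the sample data, add edges and chain to the WITH list above
-- instead of starting a new WITH.
WITH edges (node_id, parent_node) AS (
  -- A category's parent is its parent category.
  SELECT category_id, parent_id FROM category
  UNION ALL
  -- A grouping's parent is its parent grouping, or its category at the top.
  SELECT grouping_id, NVL(parent_id, category_id) FROM grouping
  UNION ALL
  -- A performer's parent is its grouping, or its category if it has none.
  SELECT performer_id, NVL(grouping_id, category_id) FROM performer
),
chain (node_id, parent_node, lvl) AS (
  SELECT node_id, parent_node, 1
  FROM edges
  WHERE node_id = '300'            -- change to the id you are looking for
  UNION ALL
  SELECT e.node_id, e.parent_node, c.lvl + 1
  FROM chain c
  JOIN edges e ON e.node_id = c.parent_node
)
SELECT LISTAGG(node_id, '->') WITHIN GROUP (ORDER BY lvl) AS parent_path
FROM chain;
```

With the sample data above, '300' should give 300->202->201->200->101->100, '203' gives 203->102->101->100 and '103' gives 103->101->100.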

What's the best way of re-using classification rules for multiple queries within big query standard SQL?

I'm using Big Query to analyse Google Analytics data.
I need to classify visits depending on whether they visited particular URLs that indicate they were in the booking process, made a purchase, etc.
There is a long list of URLs that represent each step, so it would be advantageous to put the classifications in a view and re-use it, with the appropriate joins, in whatever query requires the classification.
I have the following view that seems to do what I need:
SELECT
fullVisitorId,
visitID,
LOWER(h.page.pagePath) AS path,
CASE
WHEN
LOWER(h.page.pagePath) = '/' THEN '/'
WHEN
LOWER(h.page.pagePath) LIKE '{path-here}%' OR
.... .... ....
ELSE 'other'
END
AS path_classification,
_TABLE_SUFFIX AS date
FROM
`{project-id}.{data-id}.ga_sessions_*`, UNNEST(hits) AS h
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX, r'[0-9]{8}')
AND
h.type = 'PAGE'
I'm wondering if there's a simpler way of achieving this that doesn't require selecting from a pre-existing table, as that doesn't seem necessary just to define the classifications. I get the feeling that something more straightforward is possible, but I'm not sure how to do it.
Does anyone know how to put these definitions into a view without querying a table within the view?
Let's consider a simple example:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, '123' AS path UNION ALL
SELECT 2, '234' UNION ALL
SELECT 3, '345' UNION ALL
SELECT 4, '456'
)
SELECT
id,
path,
CASE path
WHEN '123' THEN 'a'
WHEN '234' THEN 'b'
WHEN '345' THEN 'c'
ELSE 'other'
END AS path_classification
FROM yourTable
ORDER BY id
The query above can be refactored as follows:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, '123' AS path UNION ALL
SELECT 2, '234' UNION ALL
SELECT 3, '345' UNION ALL
SELECT 4, '456'
)
SELECT
id,
path,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath = path LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification
FROM yourTable,
(SELECT ARRAY_AGG(STRUCT<cpath STRING, crule STRING>(path, rule)) AS rules
FROM `project.dataset.rules`) AS r
ORDER BY id
which relies on a rules view (project.dataset.rules) defined as:
#standardSQL
SELECT '123' AS path, 'a' AS rule UNION ALL
SELECT '234', 'b' UNION ALL
SELECT '345', 'c' UNION ALL
SELECT NULL, 'other'
As you can see, all classification rules now live only in the rules view!
You can play around with this approach using the following:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, '123' AS path UNION ALL
SELECT 2, '234' UNION ALL
SELECT 3, '345' UNION ALL
SELECT 4, '456'
),
rules AS (
SELECT '123' AS path, 'a' AS rule UNION ALL
SELECT '234', 'b' UNION ALL
SELECT '345', 'c' UNION ALL
SELECT NULL, 'other'
)
SELECT
id,
path,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath = path LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification
FROM yourTable,
(SELECT ARRAY_AGG(STRUCT<cpath STRING, crule STRING>(path, rule)) AS rules
FROM rules) AS r
ORDER BY id
This can be further "simplified" by moving ARRAY_AGG into the view:
#standardSQL
SELECT ARRAY_AGG(STRUCT<cpath STRING, crule STRING>(path, rule)) AS rules
FROM (
SELECT '123' AS path, 'a' AS rule UNION ALL
SELECT '234', 'b' UNION ALL
SELECT '345', 'c' UNION ALL
SELECT NULL, 'other'
)
In this case the final query is as simple as:
#standardSQL
SELECT
id,
path,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath = path LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification
FROM yourTable, rules AS r
ORDER BY id
Depending on your specific rules, the above can (and should) be adjusted and optimized, but I hope this gives you the general direction.
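For exact matches only, a plain LEFT JOIN against the non-aggregated rules rows is another option; the NULL-path row still supplies the default. A sketch using the same sample data, assuming each non-NULL path appears at most once in the rules (otherwise rows would be duplicated):

```sql
#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, '123' AS path UNION ALL
  SELECT 2, '234' UNION ALL
  SELECT 3, '345' UNION ALL
  SELECT 4, '456'
),
rules AS (
  SELECT '123' AS path, 'a' AS rule UNION ALL
  SELECT '234', 'b' UNION ALL
  SELECT '345', 'c' UNION ALL
  SELECT NULL, 'other'
)
SELECT
  t.id,
  t.path,
  -- Matched rule if any, otherwise the rule stored under the NULL path.
  COALESCE(r.rule, (SELECT rule FROM rules WHERE path IS NULL)) AS path_classification
FROM yourTable AS t
LEFT JOIN rules AS r ON r.path = t.path
ORDER BY t.id
```

This should classify ids 1 to 3 as a, b and c and id 4 as other, the same as the ARRAY_AGG version.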
Question in comment: does your solution allow matching with the LIKE keyword or with regular expressions?
The original question was: What's the … way of re-using classification rules for multiple queries within big query standard SQL?
So the examples in my initial answer just show how to make this happen (the focus is on “reuse”).
How you use the rules (matching with LIKE or with regular expressions) is totally up to you!
See the example below, and compare path_classification_exact_match, path_classification_like_match and path_classification_regex_match:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, '123' AS path UNION ALL
SELECT 2, '234' UNION ALL
SELECT 3, '345' UNION ALL
SELECT 4, '456' UNION ALL
SELECT 5, '234abc' UNION ALL
SELECT 6, '345bcd' UNION ALL
SELECT 7, '456cde'
),
rules AS (
SELECT ARRAY_AGG(STRUCT<cpath STRING, crule STRING>(path, rule)) AS rules
FROM (
SELECT '123' AS path, 'a' AS rule UNION ALL
SELECT '234', 'b' UNION ALL
SELECT '345', 'c' UNION ALL
SELECT NULL, 'other'
)
)
SELECT
id,
path,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath = path LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification_exact_match,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE path LIKE CONCAT('%',rr.cpath,'%') LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification_like_match,
IFNULL(
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE REGEXP_CONTAINS(path, rr.cpath) LIMIT 1),
( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1)
) AS path_classification_regex_match
FROM yourTable, rules AS r
ORDER BY id
Output is:
id path path_classification_exact_match path_classification_like_match path_classification_regex_match
1 123 a a a
2 234 b b b
3 345 c c c
4 456 other other other
5 234abc other b b
6 345bcd other c c
7 456cde other other other
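One caveat with the LIKE and regex variants: when several rules match the same path, LIMIT 1 without ORDER BY returns whichever matching rule comes first, which is not guaranteed. A sketch that adds a hypothetical priority field to each rule and orders by it (the rules and paths here are made up for illustration):

```sql
#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, '234abc' AS path UNION ALL
  SELECT 2, '23x' UNION ALL
  SELECT 3, '999'
),
rules AS (
  -- priority is a hypothetical extra field: lower value wins when several rules match.
  SELECT ARRAY_AGG(STRUCT<cpath STRING, crule STRING, priority INT64>(path, rule, priority)) AS rules
  FROM (
    SELECT '234' AS path, 'b' AS rule, 1 AS priority UNION ALL
    SELECT '23', 'b-prefix', 2 UNION ALL
    SELECT NULL, 'other', 999
  )
)
SELECT
  id,
  path,
  IFNULL(
    ( SELECT rr.crule FROM UNNEST(r.rules) AS rr
      WHERE REGEXP_CONTAINS(path, rr.cpath)
      ORDER BY rr.priority LIMIT 1 ),
    ( SELECT rr.crule FROM UNNEST(r.rules) AS rr WHERE rr.cpath IS NULL LIMIT 1 )
  ) AS path_classification_regex_match
FROM yourTable, rules AS r
ORDER BY id
```

Here '234abc' matches both '234' and '23', and the lower priority value makes it 'b'; '23x' only matches '23' and gets 'b-prefix'; '999' falls back to 'other'.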
Hope this helps :o)
It sounds like you may be interested in WITH clauses, which let you compose queries without having to use subqueries. For example,
#standardSQL
WITH Sales AS (
SELECT 1 AS sku, 3.14 AS price UNION ALL
SELECT 2 AS sku, 1.00 AS price UNION ALL
SELECT 3 AS sku, 9.99 AS price UNION ALL
SELECT 2 AS sku, 0.90 AS price UNION ALL
SELECT 1 AS sku, 3.56 AS price
),
ItemTotals AS (
SELECT sku, SUM(price) AS total
FROM Sales
GROUP BY sku
)
SELECT sku, total
FROM ItemTotals;
If you want to compose expressions, you can use CREATE TEMP FUNCTION statements to provide "macro-like" functionality:
#standardSQL
CREATE TEMP FUNCTION LooksLikeCheese(s STRING) AS (
LOWER(s) IN ('gouda', 'gruyere', 'havarti')
);
SELECT
s1,
LooksLikeCheese(s1) AS s1_is_cheese,
s2,
LooksLikeCheese(s2) AS s2_is_cheese
FROM (
SELECT 'spam' AS s1, 'ham' AS s2 UNION ALL
SELECT 'havarti' AS s1, 'crackers' AS s2 UNION ALL
SELECT 'gruyere' AS s1, 'ice cream' AS s2
);
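Applied to the question, the CASE expression itself could be wrapped in such a function. A sketch; the '/booking' and '/checkout/complete' prefixes are made-up placeholders for the real URL list. A TEMP FUNCTION only lives for the query or script that defines it, so for reuse across separate queries a persistent function created with CREATE FUNCTION in a dataset would be the closer fit:

```sql
#standardSQL
-- Hypothetical classification rules; replace the prefixes with the real URL list.
CREATE TEMP FUNCTION ClassifyPath(p STRING) AS (
  CASE
    WHEN LOWER(p) = '/' THEN '/'
    WHEN LOWER(p) LIKE '/booking%' THEN 'booking'
    WHEN LOWER(p) LIKE '/checkout/complete%' THEN 'purchase'
    ELSE 'other'
  END
);
SELECT path, ClassifyPath(path) AS path_classification
FROM UNNEST(['/', '/Booking/step1', '/checkout/complete', '/about']) AS path;
-- '/' -> '/', '/Booking/step1' -> 'booking',
-- '/checkout/complete' -> 'purchase', '/about' -> 'other'
```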

How do I combine data from multiple rows into one?

I’m using SQL Server 2008. I have data as in this table:
Team   Email             Groups
-----  ----------------  ------
Team1  email0#email.com  A
Team1  email1#email.com  B
Team1  email2#email.com  C
Team2  email3#email.com  A
Team2  email4#email.com  B
Team2  email5#email.com  C
I want to get the data in this format:
Team   A                 B                 C
-----  ----------------  ----------------  ----------------
Team1  email0#email.com  email1#email.com  email2#email.com
Team2  email3#email.com  email4#email.com  email5#email.com
How can I achieve this?
Using PIVOT, you can do the following:
With SampleData AS
(
SELECT 'Team1' as Team , 'email0#email.com' as email, 'A' as Groups
UNION SELECT 'Team1' as Team , 'email1#email.com' as email, 'B' as Groups
UNION SELECT 'Team1' as Team , 'email2#email.com' as email, 'C' as Groups
UNION SELECT 'Team2' as Team , 'email3#email.com' as email, 'A' as Groups
UNION SELECT 'Team2' as Team , 'email4#email.com' as email, 'B' as Groups
UNION SELECT 'Team2' as Team , 'email5#email.com' as email, 'C' as Groups
)
SELECT Team, A, B,C FROM
(SELECT * FROM SampleData) source
PIVOT
(MAX(email) FOR Groups IN ([A], [B], [C]) )as pvt
This produces:
Team A B C
----- ---------------- ---------------- ----------------
Team1 email0#email.com email1#email.com email2#email.com
Team2 email3#email.com email4#email.com email5#email.com
See a working Data.SE example
In a database that doesn't support PIVOT, you can instead join to your table multiple times. You may want to do that anyway since, as GBN pointed out, the MAX in the PIVOT is not really aggregating anything.
With SampleData AS
(
SELECT 'Team1' as Team , 'email0#email.com' as email, 'A' as Groups
UNION SELECT 'Team1' as Team , 'email1#email.com' as email, 'B' as Groups
UNION SELECT 'Team1' as Team , 'email2#email.com' as email, 'C' as Groups
UNION SELECT 'Team2' as Team , 'email3#email.com' as email, 'A' as Groups
UNION SELECT 'Team2' as Team , 'email4#email.com' as email, 'B' as Groups
UNION SELECT 'Team2' as Team , 'email5#email.com' as email, 'C' as Groups
)
SELECT
source.Team,
A.email,
B.email,
C.email
FROM
(SELECT DISTINCT TEAM From SampleData) source
LEFT JOIN SampleData A
ON source.Team = A.Team
AND A.GROUPS = 'A'
LEFT JOIN SampleData B
ON source.Team = B.Team
AND B.GROUPS = 'B'
LEFT JOIN SampleData C
ON source.Team = C.Team
AND C.GROUPS = 'C'
See a working Data.SE example
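A third option, which needs neither PIVOT nor self-joins, is conditional aggregation (the same MAX(CASE ...) technique used in the first thread above). A sketch on the same sample data; it should run on SQL Server 2008 as well as on databases without PIVOT:

```sql
WITH SampleData AS
(
    SELECT 'Team1' AS Team, 'email0#email.com' AS email, 'A' AS Groups
    UNION ALL SELECT 'Team1', 'email1#email.com', 'B'
    UNION ALL SELECT 'Team1', 'email2#email.com', 'C'
    UNION ALL SELECT 'Team2', 'email3#email.com', 'A'
    UNION ALL SELECT 'Team2', 'email4#email.com', 'B'
    UNION ALL SELECT 'Team2', 'email5#email.com', 'C'
)
SELECT
    Team,
    -- One MAX(CASE ...) per output column; each picks the email of that group.
    MAX(CASE WHEN Groups = 'A' THEN email END) AS A,
    MAX(CASE WHEN Groups = 'B' THEN email END) AS B,
    MAX(CASE WHEN Groups = 'C' THEN email END) AS C
FROM SampleData
GROUP BY Team;
```

Like the MAX inside PIVOT, this keeps only one email per team and group if there are several, whereas the join version would return multiple rows for that team.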