I would like to pivot a table, but PIVOT() requires an aggregation function such as max(), min(), avg(), count(), or sum(). I don't need these functions; I need to transform my table without using them.
Source table:

SOURCE | ATTRIBUTE | CATEGORY
-------+-----------+---------
GOOGLE | MOVIES    | 1
YAHOO  | JOURNAL   | 2
GOOGLE | MUSIC     | 1
AOL    | MOVIES    | 3
The new table should be like this:
ATTRIBUTE | GOOGLE | YAHOO | AOL
----------+--------+-------+----
MOVIES    | 1      |       | 3
I will be grateful if someone can help me.
The pivot syntax requires an aggregate function. It's not optional.
https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
SELECT ... FROM ... PIVOT ( <aggregate_function> ( <pivot_column> )
FOR <value_column> IN ( <pivot_value_1> [ , <pivot_value_2> ... ] ) )
[ ... ]
In the Snowflake documentation, optional syntax is in square brackets. The <aggregate_function> is not in square brackets, so it's required. If you only have one value per pivot cell, any of the aggregate functions you listed except count will work and give the same result.
create or replace table T1("SOURCE" string, ATTRIBUTE string, CATEGORY int);
insert into T1("SOURCE", attribute, category) values
('GOOGLE', 'MOVIES', 1),
('YAHOO', 'JOURNAL', 2),
('GOOGLE', 'MUSIC', 1),
('AOL', 'MOVIES', 3);
select *
from T1
PIVOT ( sum ( CATEGORY )
for "SOURCE" in ( 'GOOGLE', 'YAHOO', 'AOL' ));
ATTRIBUTE | 'GOOGLE' | 'YAHOO' | 'AOL'
----------+----------+---------+------
MOVIES    | 1        | null    | 3
MUSIC     | 1        | null    | null
JOURNAL   | null     | 2       | null
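Since each (ATTRIBUTE, SOURCE) pair has at most one row in this sample, min or max would produce the identical output; for example:

select *
from T1
    PIVOT ( max ( CATEGORY )
            for "SOURCE" in ( 'GOOGLE', 'YAHOO', 'AOL' ));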
Here's an alternative approach utilising ARRAY_AGG(), which may be more flexible. For example, if you wanted to pivot on the Attribute instead, easy peasy (see the sketch after the query below). The 'category' here appears to be more of a classification or label than something we'd want summed.
The array isn't constrained to any particular data type, and this can be extended to pivots with many objects.
WITH CTE AS (
    SELECT 'GOOGLE' SOURCE, 'MOVIES' ATTRIBUTE, 1 CATEGORY UNION ALL
    SELECT 'YAHOO', 'JOURNAL', 2 UNION ALL
    SELECT 'GOOGLE', 'MUSIC', 1 UNION ALL
    SELECT 'AOL', 'MOVIES', 3
)
SELECT
    ATTRIBUTE, GOOGLE[0] GOOGLE, YAHOO[0] YAHOO, AOL[0] AOL
FROM
    CTE PIVOT (ARRAY_AGG(CATEGORY) FOR SOURCE IN ('GOOGLE', 'YAHOO', 'AOL')
    ) AS A (ATTRIBUTE, GOOGLE, YAHOO, AOL);
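For example, pivoting on ATTRIBUTE instead (a sketch reusing the same CTE; the IN list is taken from the sample data):

WITH CTE AS (
    SELECT 'GOOGLE' SOURCE, 'MOVIES' ATTRIBUTE, 1 CATEGORY UNION ALL
    SELECT 'YAHOO', 'JOURNAL', 2 UNION ALL
    SELECT 'GOOGLE', 'MUSIC', 1 UNION ALL
    SELECT 'AOL', 'MOVIES', 3
)
SELECT
    SOURCE, MOVIES[0] MOVIES, MUSIC[0] MUSIC, JOURNAL[0] JOURNAL
FROM
    CTE PIVOT (ARRAY_AGG(CATEGORY) FOR ATTRIBUTE IN ('MOVIES', 'MUSIC', 'JOURNAL')
    ) AS A (SOURCE, MOVIES, MUSIC, JOURNAL);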
Related
Suppose I have a table like this in SQL Server 2017; let's call it "maps_and_cups":

some_code | quantity
----------+---------
big_map   | 6
tiny_map  | 5
big_cup   | 10
tiny_cup  | 4
I would like to know the best way to group the maps and the cups together, like this:

some_code | quantity
----------+---------
maps      | 11
cups      | 14
I know it involves "if" and "case", adding up the quantities and comparing whether it is a tiny_map, a big_map, and so on; I have seen several examples, but I cannot get them to compile.
You can indeed use a case when expression. For instance:
with base as
(select case some_code when 'big_map' then 'maps'
when 'tiny_map' then 'maps'
when 'big_cup' then 'cups'
when 'tiny_cup' then 'cups'
else 'other'
end grp,
quantity
from maps_and_cups)
select grp, sum(quantity) quantity from base group by grp;
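Against the sample data, this returns:

grp  | quantity
-----+---------
maps | 11
cups | 14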
However, if you're going to list each and every code explicitly, you might as well create a reference table for it:
some_code | grp
----------+-----
big_map   | maps
tiny_map  | maps
big_cup   | cups
tiny_cup  | cups
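A minimal sketch to create and fill that reference table (the name ref_maps_cups is assumed to match the query below):

create table ref_maps_cups (some_code varchar(20), grp varchar(10));

insert into ref_maps_cups (some_code, grp) values
('big_map', 'maps'),
('tiny_map', 'maps'),
('big_cup', 'cups'),
('tiny_cup', 'cups');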
...and then join that table into your query:
select grp, sum(quantity)
from maps_and_cups a left join ref_maps_cups b on a.some_code = b.some_code
group by grp;
You can solve this task using "case" and "charindex" functions, like this:
declare @t table (some_code varchar(20), quantity int)

insert into @t
values
('big_map', 6),
('tiny_map', 5),
('big_cup', 10),
('tiny_cup', 4)

select
    case
        when charindex('map', some_code) > 0 then 'map'
        when charindex('cup', some_code) > 0 then 'cup'
    end some_code,
    sum(quantity) quantity
from @t
group by
    case
        when charindex('map', some_code) > 0 then 'map'
        when charindex('cup', some_code) > 0 then 'cup'
    end
OUTPUT:

some_code | quantity
----------+---------
map       | 11
cup       | 14
If you just want the right three characters for aggregating, you can use right():
select right(some_code, 3) + 's', sum(quantity)
from maps_and_cups
group by right(some_code, 3) + 's';
You are creating a problem for yourself, as you're (probably) breaking first normal form by storing non-atomic values in the field "some_code". (Some field name, I'd say. ;)
Why not separate the value into [ type ] and [ size ]?
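For instance, a sketch that derives the two parts on the fly (the column aliases size_part and type_part are made up):

select
    left(some_code, charindex('_', some_code) - 1) as size_part,                       -- 'big' / 'tiny'
    substring(some_code, charindex('_', some_code) + 1, len(some_code)) as type_part,  -- 'map' / 'cup'
    quantity
from maps_and_cups;

Grouping by type_part then gives the sums from the question directly.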
Each array holds information about which lists (internal_list_id) a certain contact (vid) belongs to.
I'm trying to include all internal_list_id values (separated by commas) in one column, grouped by vid.
The end data should look something like:

ContactID | ListMembership
----------+---------------
3291601   | 1058,1060

I've tried the code below, but it returns information about the first object only:
SELECT list_memberships[offset(1)].vid ContactId, list_memberships[offset(1)].internal_list_id ListMembership FROM hs.contacts as c
The raw list_memberships column comes from:
SELECT list_memberships FROM hs.contacts as c
P.S. If you have any suggestions for a better title, please let me know. Thanks!
Use STRING_AGG(x) FROM UNNEST(array), like in:
WITH data AS (
SELECT visitStartTime, hits[OFFSET(0)].product
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
LIMIT 100
)
SELECT visitStartTime, (
SELECT STRING_AGG(FORMAT('$%i', localProductPrice), ', ')
FROM UNNEST(product)
) aggregated
FROM data
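Applied to the table in the question, that might look like this (a sketch, assuming internal_list_id is numeric and every element of list_memberships carries the contact's vid):

SELECT
  list_memberships[OFFSET(0)].vid AS ContactID,
  (
    SELECT STRING_AGG(CAST(internal_list_id AS STRING), ',')
    FROM UNNEST(list_memberships)
  ) AS ListMembership
FROM hs.contacts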
I have data which can have varying JSON keys. I want to store all of this data in BigQuery and then explore the available fields later.
My structure will be like so:
[
{id: 1111, data: {a:27, b:62, c: 'string'} },
{id: 2222, data: {a:27, c: 'string'} },
{id: 3333, data: {a:27} },
{id: 4444, data: {a:27, b:62, c:'string'} },
]
I wanted to use a STRUCT type but it seems all the fields need to be declared?
I then want to be able to query and see how often each key appears, and basically run queries over all records with, for example, a given key as though it were in its own column.
Side note: this data is coming from URL query strings; maybe it is best to push the full URL and then use functions to run the analysis?
There are two primary methods for storing semi-structured data as you have in your example:
Option #1: Store JSON String
You can store the data field as a JSON string, and then use the JSON_EXTRACT function to pull out the values it can find, and it will return NULL for any value it cannot find.
Since you mentioned needing to do mathematical analysis on the fields, let's do a simple SUM for the values of a and b:
# Creating an example table using the WITH statement, this would not be needed
# for a real table.
WITH records AS (
SELECT 1111 AS id, "{\"a\":27, \"b\":62, \"c\": \"string\"}" as data
UNION ALL
SELECT 2222 AS id, "{\"a\":27, \"c\": \"string\"}" as data
UNION ALL
SELECT 3333 AS id, "{\"a\":27}" as data
UNION ALL
SELECT 4444 AS id, "{\"a\":27, \"b\":62, \"c\": \"string\"}" as data
)
# Example Query
SELECT SUM(aValue) AS aSum, SUM(bValue) AS bSum FROM (
SELECT id,
CAST(JSON_EXTRACT(data, "$.a") AS INT64) AS aValue, # Extract & cast as an INT
CAST(JSON_EXTRACT(data, "$.b") AS INT64) AS bValue # Extract & cast as an INT
FROM records
)
# results
# Row | aSum | bSum
# 1 | 108 | 124
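If you also want to see how often each key appears while the data is stored as a JSON string, here is a regex-based sketch (the pattern assumes simple, unnested keys like those above):

SELECT key, COUNT(*) AS occurrences
FROM records, UNNEST(REGEXP_EXTRACT_ALL(data, r'"(\w+)"\s*:')) AS key
GROUP BY key
ORDER BY occurrences DESC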
There are some pros and cons to this approach:
Pros
The syntax is fairly straightforward
Less error prone
Cons
Storage costs will be slightly higher, since you have to store all the characters needed to serialize the JSON.
Queries will run slower than using pure native SQL.
Option #2: Repeated Fields
BigQuery has support for repeated fields, allowing you to take your structure and express it natively in SQL.
Using the same example, here is how we would do that:
## Using a with to create a sample table
WITH records AS (SELECT * FROM UNNEST(ARRAY<STRUCT<id INT64, data ARRAY<STRUCT<key STRING, value STRING>>>>[
(1111, [("a","27"),("b","62"),("c","string")]),
(2222, [("a","27"),("c","string")]),
(3333, [("a","27")]),
(4444, [("a","27"),("b","62"),("c","string")])
])),
## Using another WITH table to take records and unnest them to be joined later
recordsUnnested AS (
SELECT id, key, value
FROM records, UNNEST(records.data) AS keyVals
)
SELECT SUM(aValue) AS aSum, SUM(bValue) AS bSum
FROM (
SELECT R.id, CAST(RA.value AS INT64) AS aValue, CAST(RB.value AS INT64) AS bValue
FROM records R
LEFT JOIN recordsUnnested RA ON R.id = RA.id AND RA.key = "a"
LEFT JOIN recordsUnnested RB ON R.id = RB.id AND RB.key = "b"
)
# results
# Row | aSum | bSum
# 1 | 108 | 124
As you can see, performing a similar query is still rather complex. You also still have to store items as strings and CAST them to other types when necessary, since you cannot mix types in a repeated field.
Pros
Storage size will be smaller than JSON
Queries will typically execute faster.
Cons
The syntax is more complex, not as straightforward
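With this layout, counting how often each key appears is also a short query; a sketch reusing the records table from the example above:

SELECT keyVals.key, COUNT(*) AS occurrences
FROM records, UNNEST(records.data) AS keyVals
GROUP BY keyVals.key
ORDER BY occurrences DESC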
Hope that helps, good luck.
I have the following (simplified) layout for tables:
TABLE blocks (id)
TABLE content (id, blockId, order, data, type)
content.blockId is a foreign key to blocks.id. The idea is that in the content table you have many content entries with different types for one block.
I am now looking for a query that can provide me with an aggregation based on a blockId where all the content entries of the 3 different types are concatenated and put into respective columns.
I have already found the listagg function, which works well; the following statement lists all the content entries in one column:
SELECT listagg(c.data, ',') WITHIN GROUP (ORDER BY c.order) FROM content c WHERE c.blockId = 330;
Now, however, the concatenated string contains all the data elements of the block in one column. What I would like to achieve is that it's put into separate columns based on the type. For example, suppose the content table holds the following rows:

id | blockId | order | data       | type
---+---------+-------+------------+-----------
1  | 1       | 0     | "content1" | "FRAGMENT"
2  | 1       | 1     | "content2" | "BULK"
3  | 1       | 3     | "content4" | "FRAGMENT"
4  | 1       | 2     | "content3" | "FRAGMENT"
Now I want to get 2 columns as output, one FRAGMENT and one BULK, where FRAGMENT contains "content1;content3;content4" and BULK contains "content2".
Is there an efficient way of achieving this?
You can use case:
SELECT listagg(CASE WHEN c.type = 'FRAGMENT' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as fragments,
       listagg(CASE WHEN c.type = 'BULK' THEN c.data END, ',') WITHIN GROUP (ORDER BY c.order) as bulks
FROM content c
WHERE c.blockId = 330;
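Against the four sample rows, this returns a single row, roughly:

FRAGMENTS                  | BULKS
---------------------------+---------
content1,content3,content4 | content2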
As an alternative, if you want it more dynamic, you could pivot the outcome.
Note that this will only work on Oracle 11g Release 2 and later. Here's an example of how it could look:
select * from
(with dataSet as (select 1 idV, 1 bulkid, 0 orderV, 'content1' dataV, 'FRAGMENT' typeV from dual union
select 2, 1, 1, 'content2', 'BULK' from dual union
select 3, 1, 3, 'content4', 'FRAGMENT' from dual union
select 4, 1, 2, 'content3', 'FRAGMENT' from dual)
select typeV, listagg(dataSet.dataV ,',') WITHIN GROUP (ORDER BY orderV) OVER (PARTITION BY typeV) dataV from dataSet)
pivot
(
max(dataV)
for typeV in ('BULK', 'FRAGMENT')
)
Output:

BULK     | FRAGMENT
---------+----------------------------
content2 | content1,content3,content4
The important things here:
OVER (PARTITION BY typeV): this acts like a group by for the listagg, concatenating everything having the same typeV.
for typeV in ('BULK', 'FRAGMENT'): this gathers the data for BULK and FRAGMENT and produces a separate column for each.
max(dataV): simply there to provide an aggregate function, otherwise pivot won't work.
MySQL provides a string function named FIELD() which accepts a variable number of arguments. The return value is the location of the first argument in the list of the remaining ones. In other words:
FIELD('d', 'a', 'b', 'c', 'd', 'e', 'f')
would return 4 since 'd' is the fourth argument following the first.
This function provides the capability to sort a query's results based on a very specific ordering. For my current application there are four statuses that I need to manage: active, approved, rejected, and submitted. However, if I simply order by the status column, the usability of the resulting list suffers, since rejected and active status items are more important than submitted and approved ones.
In MySQL I could do this:
SELECT <stuff> FROM <table> WHERE <conditions> ORDER BY FIELD(status, 'rejected', 'active','submitted', 'approved')
and the results would be ordered such that rejected items were first, followed by active ones, and so on. Thus, the results were ordered in decreasing levels of importance to the visitor.
I could create a separate table that enumerates an importance level for the statuses and order the query by that in descending order, but this has come up a few times since I switched to MS SQL Server, so I thought I'd ask whether I can avoid the extra table and the somewhat more complex queries by using a built-in function similar to MySQL's FIELD().
Thank you,
David Kees
Use a CASE expression (SQL Server 2005+):
ORDER BY CASE status
WHEN 'active' THEN 1
WHEN 'approved' THEN 2
WHEN 'rejected' THEN 3
WHEN 'submitted' THEN 4
ELSE 5
END
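To get the exact priority from the question, just reorder the WHEN branches:

ORDER BY CASE status
         WHEN 'rejected'  THEN 1
         WHEN 'active'    THEN 2
         WHEN 'submitted' THEN 3
         WHEN 'approved'  THEN 4
         ELSE 5
         END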
You can use this syntax for more complex evaluation (including combinations, or if you need to use LIKE)
ORDER BY CASE
WHEN status LIKE 'active' THEN 1
WHEN status LIKE 'approved' THEN 2
WHEN status LIKE 'rejected' THEN 3
WHEN status LIKE 'submitted' THEN 4
ELSE 5
END
For your particular example you could:
ORDER BY CHARINDEX(
',' + status + ',',
',rejected,active,submitted,approved,'
)
Note that FIELD is supposed to return 0, 1, 2, 3, 4, whereas the above will return 0, 1, 10, 17 and 27, so this trick is only useful inside the ORDER BY clause.
A set-based approach would be to outer join with a table value constructor:
LEFT JOIN (VALUES
('rejected', 1),
('active', 2),
('submitted', 3),
('approved', 4)
) AS lu(status, sort_order)
...
ORDER BY lu.sort_order
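Put together (with <table> standing in for your table, as elsewhere in this thread), the full query might look like:

SELECT t.*
FROM <table> t
LEFT JOIN (VALUES
    ('rejected', 1),
    ('active', 2),
    ('submitted', 3),
    ('approved', 4)
) AS lu(status, sort_order)
    ON t.status = lu.status
ORDER BY lu.sort_order;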
I recommend a CTE (SQL Server 2005+).
No need to repeat the status codes or create the separate table.
WITH cte(status, RN) AS ( -- CTE to create ordered list and define where clause
SELECT 'active', 1
UNION SELECT 'approved', 2
UNION SELECT 'rejected', 3
UNION SELECT 'submitted', 4
)
SELECT <field1>, <field2>
FROM <table> tbl
INNER JOIN cte ON cte.status = tbl.status -- do the join
ORDER BY cte.RN -- use the ordering defined in the cte
Good luck,
Jason
ORDER BY CHARINDEX(','+convert(varchar,status)+',' ,
',rejected,active,submitted,approved,')
Just put a comma before and after the string you are searching in (the second parameter of CHARINDEX), and likewise surround the first parameter with commas, so that each status only matches as a whole item in the list.