BigQuery pivoting with a specific requirement - google-bigquery

I have used PIVOT in BigQuery before, but here is a specific use case with data that I need to show in Looker. I am trying a similar option in Looker, but wanted to know if I can do this directly in BigQuery.
This is how my data (sample) looks in the BigQuery table:
The output should be as below:
If you look at it, it's a pivot, but I need to assign the column names as shown (for the specific ranges), and for range 6 and above I need to combine the pivoted columns into one.
I don't see a pivot index or anything similar in BigQuery. Is there a way to sum up the column data after pivot index 6 or so? Any suggestions on how to achieve this?

Hope the approach below is helpful:
SELECT *
FROM (
  SELECT Node, bucket, total_code
  FROM sample, UNNEST([RANGE_BUCKET(data1, [1, 2, 3, 4, 5, 6, 7])]) bucket
)
PIVOT (SUM(total_code) `range` FOR bucket IN (1, 2, 3, 4, 5, 6, 7));
output:
RANGE_BUCKET - https://cloud.google.com/bigquery/docs/reference/standard-sql/mathematical_functions#range_bucket
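To handle the "range 6 and more" part of the question, the split-point array can simply stop at 6: RANGE_BUCKET then assigns every value of 6 or above to the same last bucket, so those rows collapse into one pivot column. A sketch, assuming the same sample table and column names as above:

```sql
SELECT *
FROM (
  SELECT
    Node,
    -- with split points ending at 6, any data1 >= 6 lands in bucket 6
    RANGE_BUCKET(data1, [1, 2, 3, 4, 5, 6]) AS bucket,
    total_code
  FROM sample
)
PIVOT (SUM(total_code) `range` FOR bucket IN (1, 2, 3, 4, 5, 6));
```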

Related

Aggregating one bigquery table to another bigquery table

I am trying to aggregate a multi-PB (around 7 PB) BigQuery table into another BigQuery table.
I have (partition_key, clusterkey1, clusterkey2, col1, col2, val),
where partition_key is used for BigQuery partitioning and the cluster keys are used for clustering.
For example
(timestamp1, timestamp2, 0, 1, 2, 1)
(timestamp3, timestamp4, 0, 1, 2, 7)
(timestamp31, timestamp22, 2, 1, 2, 2)
(timestamp11, timestamp12, 2, 1, 2, 3)
should result in
(0, 1, 2, 8)
(2, 1, 2, 5)
I want to aggregate val based on (clusterkey2, col1, col2), across all partition_key and all clusterkey1 values.
What is a feasible way to do this?
Should I write a custom loader and just read all data from it line by line, or is there a native way to do this?
Depending on where / how you are executing this, you can do it by writing a simple SQL script and defining the target output, for example:
SELECT clusterkey2
  , col1
  , col2
  , SUM(val) AS val
FROM table
GROUP BY clusterkey2, col1, col2
This will get you the desired results.
From here you can do a few things, but they are mostly all outlined here in the documentation:
https://cloud.google.com/bigquery/docs/writing-results#writing_query_results
Specifically, from the above you are looking to set the destination table.
One thing to note: you may want to include the partition key in the WHERE clause to help narrow down your data if you do not want aggregate results for the whole table.
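As an alternative to configuring a destination table on the query job, BigQuery can also write the results directly via DDL. A sketch, using hypothetical dataset and table names (mydataset.source, mydataset.target):

```sql
-- materialize the aggregate into a new table in one statement
CREATE OR REPLACE TABLE mydataset.target AS
SELECT
  clusterkey2,
  col1,
  col2,
  SUM(val) AS val
FROM mydataset.source
GROUP BY clusterkey2, col1, col2;
```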

How to create a pivot query together with group by statement with multiple columns

In order to achieve the following structure, using a "pivot" table and "group by" with multiple columns (i.e. as illustrated in the second image below),
what would be the SQL implementation?
The source query is:
SELECT
t1.date,
t1.area,
t1.canal,
SUM(t1.peso) AS peso
FROM table1 t1
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
and the source query generates an initial structure as in:
Then, the goal is to achieve a final structure grouped by the columns "area" and "canal", pivoting the column "date" but only for the column "peso".
Plus, a partial total for each area, named "total".
As illustrated in the image below.
After 24 long hours and a quick nap, I finally got an answer.
A simple and straightforward line, using PySpark:
df = dfa.groupBy("area", "canal").pivot("date").sum("peso")
Thanks to #andrew-svds and the Warmup example in his GitHub repository:
https://github.com/andrew-svds/spark-pivot-examples/tree/master/0-Warmup
The complete chunk is described below, for reference purposes:
qry = """
SELECT
  t1.date,
  t1.area,
  t1.canal,
  t1.peso
FROM table1 t1
"""
dfy = spark.sql(qry)
dfy = dfy.groupBy("area", "canal").pivot("date").sum("peso")
dfy = dfy.orderBy("area", "canal")
display(dfy)
I believe there are many ways to get the same result. That one was the simplest and most intuitive that I was able to write.
Perhaps tomorrow, after a good night's sleep, I'll get an even simpler line of code! :)
Best wishes,
I
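Since the question asked for a SQL implementation: in BigQuery the same reshaping can be sketched with the PIVOT operator. The date values in the IN list below are placeholders, and the sketch assumes date is stored as a string; BigQuery requires the pivoted values to be spelled out explicitly (or the query to be generated dynamically):

```sql
SELECT *
FROM (
  SELECT date, area, canal, peso
  FROM table1
)
-- one output column per listed date value, each holding SUM(peso)
PIVOT (SUM(peso) FOR date IN ('2021-01-01', '2021-01-02'))
ORDER BY area, canal;
```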

How to get data in a column in order by using SQL in operator

There is a data set as shown below.
When the input for event_type is 4, 1, 2, 3, for example, I would like to get 3, 999, 3, 9 from cnt_stamp, in that order. I created the SQL below, but it always returns 999, 3, 9, 3 regardless of the order of the input.
How can I fix the SQL to achieve this? Thank you for taking your time, and please let me know if you have any questions.
SELECT `cnt_stamp` FROM `stm_events` WHERE `event_type` in (4,1,2,3)
Add ORDER BY FIELD(event_type, 4, 1, 2, 3) in your query. It should look like:
SELECT cnt_stamp FROM stm_events WHERE event_type in (4,1,2,3) ORDER BY FIELD(event_type, 4, 1, 2, 3);
It cannot work by default, because the data is sorted in ascending order. If you want that result, it is better to create an extra column for indexing.
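Note that FIELD() is MySQL-specific. On engines that lack it, a CASE expression gives the same custom ordering; a sketch assuming the same stm_events table:

```sql
SELECT cnt_stamp
FROM stm_events
WHERE event_type IN (4, 1, 2, 3)
-- map each event_type to its desired rank in the output
ORDER BY CASE event_type
  WHEN 4 THEN 1
  WHEN 1 THEN 2
  WHEN 2 THEN 3
  WHEN 3 THEN 4
END;
```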

Doing a concat over a partition in SQL?

I have some data ordered like so:
date, uid, grouping
2018-01-01, 1, a
2018-01-02, 1, a
2018-01-03, 1, b
2018-02-01, 2, x
2018-02-05, 2, x
2018-02-01, 3, z
2018-03-01, 3, y
2018-03-02, 3, z
And I wanted a final form like:
uid, path
1, "a-a-b"
2, "x-x"
3, "z-y-z"
but running something like
select
a.uid
,concat(grouping) over (partition by date, uid) as path
from temp1 a
Doesn't seem to want to play well with SQL or Google BigQuery (which is the specific environment I'm working in). Is there an easy enough way to get the groupings concatenated that I'm missing? I imagine there's a way to brute force it by including a bunch of if-then statements as custom columns, then concatenating the result, but I'm sure that will be a lot messier. Any thoughts?
You are looking for string_agg():
select a.uid, string_agg(grouping, '-' order by date) as path
from temp1 a
group by a.uid;

SQL Sum one column based on various text in another

First question here, so not sure exactly how to describe my issue.
Suppose I have a very large table called "soitem". From this table, I would like to see the columns "soitem.productnum", "soitem.description", and "soitem.qtyfulfilled".
This will be filtered by "soitem.datelastfulfillment", which so far looks like this:
SELECT soitem.productnum, SUM(soitem.qtyfulfilled) AS Total_Sold
FROM soitem
WHERE soitem.datelastfulfillment BETWEEN '9/1/15' AND '9/30/15'
GROUP BY soitem.productnum
I would like to take this a step further and have it only sum qtyfulfilled when a different column, "qbclassid", is in (2, 3, 9, 12, 14).
If you notice, my query also does not show soitem.description yet, since including it usually gives me an error. If anyone can get that to work too, even better!
Try this:
SELECT soitem.productnum, soitem.description, SUM(soitem.qtyfulfilled) AS Total_Sold
FROM soitem
WHERE soitem.datelastfulfillment BETWEEN '9/1/15' AND '9/30/15'
AND soitem.qbclassid IN (2, 3, 9, 12, 14)
GROUP BY soitem.productnum, soitem.description
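If you ever want every product to stay in the result and only the sum to be restricted by qbclassid, conditional aggregation is an alternative sketch (moving the qbclassid test out of the WHERE clause and into the SUM):

```sql
SELECT soitem.productnum, soitem.description,
  SUM(CASE WHEN soitem.qbclassid IN (2, 3, 9, 12, 14)
           THEN soitem.qtyfulfilled ELSE 0 END) AS Total_Sold
FROM soitem
WHERE soitem.datelastfulfillment BETWEEN '9/1/15' AND '9/30/15'
GROUP BY soitem.productnum, soitem.description;
```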