BigQuery: Flattening all repeated fields in nested schema

BigQuery: Flattening all repeated fields in nested schema - google-bigquery

I am having so much trouble with querying from Big Query's nested schema.
I have the following fields.
I want to flatten the table and get something like this.
user | question_id | user_choices
123 | 1 | 1
123 | 1 | 2
123 | 1 | 3
123 | 1 | 4
From other resources, I got to a point where I can query from one of the records in the repeated columns. Such as the following:
SELECT user, dat.question_id FROM tablename, UNNEST(data) dat
It gives me this result.
But when I do this, I get another repeated columns again.
SELECT user, dat.question_id, dat.user_choices FROM tablename, UNNEST(data) dat
Can anyone help me how to UNNEST this table properly so I can have flattened schema for all data items?
Thanks!

Below is for BigQuery Standard SQL
#standardSQL
SELECT user, question_id, choice
FROM `project.dataset.table`,
UNNEST(data) question,
UNNEST(user_choices) choice
You can test, play with above using dummy data from your question like below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user,
[STRUCT<question_id INT64, user_choices ARRAY<INT64>>
(1,[1,2,3]),
(2,[2,5]),
(3,[1,3])
] data UNION ALL
SELECT 2 user,
[STRUCT<question_id INT64, user_choices ARRAY<INT64>>
(1,[2,3]),
(2,[4,5]),
(3,[2,6])
] data
)
SELECT user, question_id, choice
FROM `project.dataset.table`,
UNNEST(data) question,
UNNEST(user_choices) choice
ORDER BY user, question_id, choice
with result
Row user question_id choice
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 5
6 1 3 1
7 1 3 3
8 2 1 2
9 2 1 3
10 2 2 4
11 2 2 5
12 2 3 2
13 2 3 6

Related

Generate a serial number based on quantity column in sql

Hi Experts I have a table like this
T1
Order_no
Qty
1
3
2
5
3
1
4
3
I need to generate a column 'serial no' having values based on 'qty'
Output needed
OrderNo
Qty
SerailNo
1
3
1
1
3
2
1
3
3
2
5
1
2
5
2
2
5
3
2
5
4
2
5
5
3
1
1
4
3
1
4
3
2
4
3
3
Any suggestions?
Thanks in advance!!

You don't mention the specific database so I'll assume you are using PostgreSQL, aren't you?
You can use a Recursive CTE to expand the rows. For example:
with recursive
n as (
select order_no, qty, 1 as serial_no from t1
union all
select order_no, qty, serial_no + 1
from n
where serial_no < qty
)
select * from n order by order_no, serial_no
Result:
order_no qty serial_no
--------- ---- ---------
1 3 1
1 3 2
1 3 3
2 5 1
2 5 2
2 5 3
2 5 4
2 5 5
3 1 1
4 3 1
4 3 2
4 3 3
See running example at DB Fiddle.
EDIT FOR ORACLE
If you are using Oracle the query changes a bit to:
with
n (order_no, qty, serial_no) as (
select order_no, qty, 1 from t1
union all
select order_no, qty, serial_no + 1
from n
where serial_no < qty
)
select * from n order by order_no, serial_no
Result:
ORDER_NO QTY SERIAL_NO
--------- ---- ---------
1 3 1
1 3 2
1 3 3
2 5 1
2 5 2
2 5 3
2 5 4
2 5 5
3 1 1
4 3 1
4 3 2
4 3 3
See running example at db<>fiddle.

You should first provide the database you're using. Whether it's oracle, Sql Server, PostGreSQL will determine which procedural language to use. It's very likely that you'll need to do this in two steps:
1st: Duplicate the number of rows based on the column Qty using a decreasing loop
2nd: You'll need to create a sequential partionned column based on the Qty column

Add position column based on order by - Oracle

I have this table with the following records:
table1
id ele_id_1 ele_val ele_id_2
1 2 123 1
1 1 abc 1
1 4 xyz 2
1 4 456 1
2 5 22 1
2 4 344 1
2 3 6 1
2 2 Test Name 1
2 1 Hello 1
I am trying to add position for each id when ele_id_1 and ele_id_2 is order by ASC.
Here is the output:
id ele_id_1 ele_val ele_id_2 position
1 2 123 1 2
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 3
2 5 22 1 5
2 4 344 1 4
2 3 6 1 3
2 2 Test Name 1 2
2 1 Hello 1 1
I have 34 million rows in table1, so would like to use an efficient way of doing this.
Any idea on how I can add position with values?

I think you want row_number() used like this:
select row_number() over (partition by id
order by ele_id_1, ele_id_2
) as position
Oracle can use an index for this, on (id, ele_id_1, ele_id_2).
I should note that for your example data order by ele_id_1, ele_id_2 and order by ele_id_2, ele_id_1 produce the same result. Your question suggests that you want the first.
So, you would get
id ele_id_1 ele_val ele_id_2 position
1 1 123 2 2
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 3
Rather than:
id ele_id_1 ele_val ele_id_2 position
1 1 123 2 3
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 2
EDIT:
If you want to update the data, then merge is probably the best approach.
MERGE INTO <yourtable> dest
USING (select t.*,
row_number() over (partition by id
order by ele_id_1, ele_id_2
) as new_position
from <yourtable> t
) src
ON dest.id = src.id AND
dest.ele_id_1 = src.ele_id_1 AND
dest.ele_id_2 = src.ele_id_2
WHEN MATCHED THEN UPDATE
SET desc.postition = src.new_position;
Note that updating all the rows in a table is an expensive operation. Truncating the table and recreating it might be easier:
create table temp_t as
select t.*,
row_number() over (partition by id
order by ele_id_1, ele_id_2
) as new_position
from t;
truncate table t;
insert into t ( . . . )
select . . . -- all columns but position
from temp_t;
However, be very careful if you truncate the table. Be sure to back it up first!

How to apply Joins in Oracle to require result

I have Following 3 tables :
SHIFT_MASTER,PATTERN_MASTER,PATTERN_DETAILS
S_ID ,P_ID,P_D_ID are the priamry keys of SHIFT_MASTER,PATTERN_MASTER,PATTERN_DETAILS tables respectively.
SHIFT_MASTER
S_ID | S_NUMBER| S_Name
---------------------------------
1 A MORNING
2 B AFTERNOON
3 C NIGHT
PATTERN_MASTER
P_ID | P_NAME
----------------
1 Pattern 1
2 Pattern 2
PATTERN_DETAILS
P_D_ID|P_ID | S_ID| ...
---------------------
1 1 1
2 1 2
3 1 3
4 1 2
5 1 1
6 2 3
7 2 2
8 2 1
9 2 3
I GOT OUTPUT AS
P_ID | S_ID
1 1,2,3,2,1
2 3,2,1,3
USING QUERY
SELECT PATTERN_DETAILS.P_ID "PATTERN",
LISTAGG(PATTERN_DETAILS.S_ID, ', ')
WITHIN GROUP (ORDER BY PATTERN_DETAILS.P_D_ID) "SHIFT"
FROM PATTERN_DETAILS
GROUP BY PATTERN_DETAILS.P_ID;
WHAT I WANT IS
P_NAME | S_NUMBER
Pattern 1 A,B,C,B,A
Pattern 2 C,B,A,C
Any suggestion ??? Instead of P_ID i want to show pattern name and instead of shift id i want to show shift number .How to perform join operation along with listagg function ?

You need to join all three tables to get this,
SELECT pm.p_name "P_NAME",
listagg(sm.s_number, ', ') WITHIN GROUP (ORDER BY pd.p_d_id) "S_NUMBER"
FROM pattern_master pm,
pattern_details pd,
shift_master sm
WHERE sm.s_id= pd.s_id
AND pm.p_id = pd.p_id
GROUP BY pm.p_name;

Select to build groups by (analytic) sum

Please help me to build a sql select to assign (software development) tasks to a software release. Actually this is a fictive example to solve my real business specific problem.
I have a relation Tasks:
ID Effort_In_Days
3 3
1 2
6 2
2 1
4 1
5 1
I want to distribute the Tasks to releases which are at most 2 days long (tasks longer than 2 shall still be put into one release). In my real problem I have much more "days" available to distribute "tasks" to. Expected output:
Release Task_ID
1 3
2 1
3 6
4 2
4 4
5 5
I think I need to use analytic functions, something with sum(effort_in_days) over and so on, to get the result. But I'm I haven't used analytic functions much and didn't find an example that's close enough to my specific problem. I need to build groups (releases) if a sum (>= 2) is reached.

I would do something like:
with data as (
select 3 ID, 3 Effort_In_Days from dual union all
select 1 ID, 2 Effort_In_Days from dual union all
select 6 ID, 2 Effort_In_Days from dual union all
select 2 ID, 1 Effort_In_Days from dual union all
select 4 ID, 1 Effort_In_Days from dual union all
select 5 ID, 1 Effort_In_Days from dual
)
select id, effort_in_days, tmp, ceil(tmp/2) release
from (
select id, effort_in_days, sum(least(effort_in_days, 2)) over (order by effort_in_days desc rows unbounded preceding) tmp
from data
);
Which results in:
ID EFFORT_IN_DAYS TMP RELEASE
---------- -------------- ---------- ----------
3 3 2 1
1 2 4 2
6 2 6 3
2 1 7 4
4 1 8 4
5 1 9 5
Basically, I am using least() to convert everything over 2 down to 2. Then I am putting all rows in descending order by that value and starting to assign releases. Since they are in descending order with a max value of 2, I know I need to assign a new release every time when I get to a multiple of 2.
Note that if you had fractional values, you could end up with releases that do not have a full 2 days assigned (as opposed to having over 2 days assigned), which may or may not meet your needs.
Also note that I am only showing all columns in my output to make it easier to see what the code is actually doing.

This is an example of a bin-packing problem (see here). There is not an optimal solution in SQL, that I am aware of, except in some boundary cases. For instance, if all the tasks have the same length or if all the tasks are >= 2, then there is an easy-to-find optimal solution.
A greedy algorithm works pretty well. This is to put a given record in the first bin where it fits, probably going through the list in descending size order.
If your problem is really as you state it, then the greedy algorithm will work to produce an optimal solution. That is, if the maximum value is 2 and the efforts are integers. There might even be a way to calculate the solution in SQL in this case.
Otherwise, you will need pl/sql code to achieve an approximate solution.

SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE data AS
select 3 ID, 3 Effort_In_Days from dual union all
select 1 ID, 2 Effort_In_Days from dual union all
select 6 ID, 2 Effort_In_Days from dual union all
select 2 ID, 1 Effort_In_Days from dual union all
select 4 ID, 1 Effort_In_Days from dual union all
select 5 ID, 1 Effort_In_Days from dual union all
select 9 ID, 2 Effort_In_Days from dual union all
select 7 ID, 1 Effort_In_Days from dual union all
select 8 ID, 1 Effort_In_Days from dual;
Query 1:
Give the rows an index so that they can be kept in order easily;
Assign groups to the rows where the Effort_In_Days is 1 so that all adjacent rows with Effort_In_Days of 1 are in the same group and rows separated by higher values for Effort_In_Days are in different groups;
Assign a cost of 1 to each row where the Effort_In_Days is higher than 1 or where Effort_In_Days is 1 and the row has an odd row number within the group; then
Finally, the release is the sum of all the costs for the row and all preceding rows.
Like this:
WITH indexes AS (
SELECT ID,
Effort_In_Days,
ROWNUM AS idx
FROM Data
),
groups AS (
SELECT ID,
Effort_In_Days,
idx,
CASE Effort_In_Days
WHEN 1
THEN idx - ROW_NUMBER() OVER ( PARTITION BY Effort_In_Days ORDER BY idx )
END AS grp
FROM indexes
ORDER BY idx
),
costs AS (
SELECT ID,
Effort_In_Days,
idx,
CASE Effort_In_Days
WHEN 1
THEN MOD( ROW_NUMBER() OVER ( PARTITION BY grp ORDER BY idx ), 2 )
ELSE 1
END AS cost
FROM groups
ORDER BY idx
)
SELECT ID,
Effort_In_Days,
SUM( cost ) OVER ( ORDER BY idx ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS Release
FROM costs
ORDER BY idx
Results:
| ID | EFFORT_IN_DAYS | RELEASE |
|----|----------------|---------|
| 3 | 3 | 1 |
| 1 | 2 | 2 |
| 6 | 2 | 3 |
| 2 | 1 | 4 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 9 | 2 | 6 |
| 7 | 1 | 7 |
| 8 | 1 | 7 |

Add auto incrementing number based on column

I am trying to wrap my head around a problem I hit exporting data from one system to another.
Let's say I have a table like:
id | item_num
1 1
2 1
3 2
4 3
5 3
6 3
I need to add a column to the table and update it to contain an incrementing product_num field based on item. This would be the end result given the above table.
id | item_num | product_num
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3
Any ideas on going about this?
Edit: This is being done in Access 2010 from one system to another (sql server source, custom/unknown ODBC driven destination)

Perhaps you could create a view in your SQL Server database and then select from that in Access to insert into your destination.
Possible solutions in SQL Server:
-- Use row_number() to get product_num in SQL Server 2005+:
select id
, item_num
, row_number() over (partition by item_num order by id) as product_num
from MyTable;
-- Use a correlated subquery to get product_num in many databases:
select t.id
, t.item_num
, (select count(*) from MyTable where item_num = t.item_num and id <= t.id) as product_num
from MyTable t;
Same result:
id item_num product_num
----------- ----------- --------------------
1 1 1
2 1 2
3 2 1
4 3 1
5 3 2
6 3 3

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

BigQuery: Flattening all repeated fields in nested schema - google-bigquery

Related

Generate a serial number based on quantity column in sql

Add position column based on order by - Oracle

How to apply Joins in Oracle to require result

Select to build groups by (analytic) sum

Add auto incrementing number based on column

Categories

Resources