How can I flatten a table in SQL in Google BigQuery?

I have this table,
and I am trying to achieve the following output:
I found several articles (like this) on how to do it, but unfortunately they do not work with my table.
The schema of the table is the following:

Consider the approach below. It is less verbose and easier to manage if any adjustments are needed:
select * from (
  select id_car, kv.element.key, kv.element.value
  from `project.dataset.table`, unnest(table.keyvalue.list) as kv
)
pivot (min(value) for key in ('id', 'model', 'status', 'speed'))
If applied to the sample data in your question, the output is:

I created a table with the schema you mentioned and the data you gave:
I ran the following query on this table:
SELECT id_car, STRING_AGG(id, '') AS id, STRING_AGG(model, '') AS model,
  STRING_AGG(Status, '') AS status, STRING_AGG(speed, '') AS speed
FROM (
  SELECT id_car,
    IF(my.element.key = "id", my.element.value, '') AS id,
    IF(my.element.key = "model", my.element.value, '') AS `model`,
    IF(my.element.key = "Status", my.element.value, '') AS Status,
    IF(my.element.key = "speed", my.element.value, '') AS speed
  FROM `ProjectID.Dataset.Table`, UNNEST(table.keyvalue.list) AS my
)
GROUP BY id_car
This gives me the same output that you expect:
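For readers who want to see the logic outside BigQuery, the following is a minimal Python sketch of what both answers above do: expand each nested key/value list into pairs, then collapse the pairs into one row per id_car with one column per key. The sample rows and the function name flatten_key_values are hypothetical, since the original table contents are not shown; only the key names come from the queries above.

```python
def flatten_key_values(rows, keys=("id", "model", "status", "speed")):
    """Turn nested key/value pairs into one flat dict per id_car."""
    cars = {}
    for row in rows:
        # One output record per id_car, with every pivot column preset to None
        flat = cars.setdefault(row["id_car"], dict.fromkeys(keys))
        for pair in row["keyvalue"]:  # plays the role of UNNEST(...)
            key, value = pair["key"], pair["value"]
            if key in flat:
                # Keep the smallest value per key, like MIN(value) in the PIVOT
                flat[key] = value if flat[key] is None else min(flat[key], value)
    return [{"id_car": id_car, **flat} for id_car, flat in cars.items()]


# Hypothetical sample data (not taken from the question)
sample_rows = [
    {"id_car": 1, "keyvalue": [
        {"key": "id", "value": "A1"},
        {"key": "model", "value": "X"},
        {"key": "status", "value": "ok"},
        {"key": "speed", "value": "120"},
    ]},
    {"id_car": 2, "keyvalue": [
        {"key": "id", "value": "B2"},
        {"key": "speed", "value": "90"},
    ]},
]

flattened = flatten_key_values(sample_rows)
```

Keys that are missing for a car stay None here, which corresponds to NULL in the PIVOT version; the STRING_AGG version would produce an empty string instead.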

Related

How to Join two Queries in Sequelize

I am trying to join two queries and get a result based on the join. I am able to write the code in SQL. My SQL code is:
SELECT a.awbid,
       m.mpscount
FROM (
  SELECT a.awbid
  FROM awbmaster a
  WHERE a.batchid = 'B/117/15022022'
    AND a.hubid = '117'
) AS a
LEFT JOIN (
  SELECT count('mpsid') AS "mpscount",
         awbid
  FROM mpsmaster m
  WHERE m.batchid = 'B/117/15022022'
  GROUP BY "awbid"
) AS m ON a.awbid = m.awbid
However, I have not yet found a way to do this in Sequelize. How can I write the above SQL code in Sequelize?

First, we can simplify this SQL query by using a correlated subquery to compute mpscount:
SELECT a.awbid,
(SELECT count(*)
FROM mpsmaster m
WHERE m.batchid = 'B/117/15022022'
AND m.awbid=a.awbid ) AS mpscount
FROM awbmaster a
WHERE a.batchid ='B/117/15022022'
AND a.hubid ='117'
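Now, if we already have both models, MpsMaster and AwbMaster, and a correct association between them, then we can make a request something like this:
// Run inside an async function, since findAll returns a promise
const records = await AwbMaster.findAll({
  attributes: [
    'awbid',
    [Sequelize.literal('(SELECT count(*) FROM mpsmaster m WHERE m.batchid = \'B/117/15022022\' AND m.awbid=AwbMaster.awbid)'), 'mpscount']
  ],
  // Mirrors the outer WHERE of the SQL; assumes the model attributes are named after the columns
  where: { batchid: 'B/117/15022022', hubid: '117' }
})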
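As a quick check of the two SQL forms above, here is a small Python sketch that runs both against an in-memory SQLite database. The table contents are made up for illustration, and SQLite stands in for whatever database the question actually uses, so treat it as a sketch under those assumptions. It shows one difference worth knowing about: the LEFT JOIN form returns NULL for an awbid with no matching rows in mpsmaster, while the correlated subquery returns 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE awbmaster (awbid TEXT, batchid TEXT, hubid TEXT);
    CREATE TABLE mpsmaster (mpsid TEXT, awbid TEXT, batchid TEXT);
    -- Hypothetical rows: A1 has two MPS entries, A2 has none, A3 is another batch
    INSERT INTO awbmaster VALUES ('A1', 'B/117/15022022', '117'),
                                 ('A2', 'B/117/15022022', '117'),
                                 ('A3', 'B/999/01012022', '117');
    INSERT INTO mpsmaster VALUES ('M1', 'A1', 'B/117/15022022'),
                                 ('M2', 'A1', 'B/117/15022022');
""")

# Original shape: join against a grouped count
JOIN_SQL = """
    SELECT a.awbid, m.mpscount
    FROM awbmaster a
    LEFT JOIN (SELECT awbid, count(*) AS mpscount
               FROM mpsmaster
               WHERE batchid = :batch
               GROUP BY awbid) m ON a.awbid = m.awbid
    WHERE a.batchid = :batch AND a.hubid = :hub
    ORDER BY a.awbid
"""

# Simplified shape: correlated subquery in the select list
SUBQUERY_SQL = """
    SELECT a.awbid,
           (SELECT count(*) FROM mpsmaster m
            WHERE m.batchid = :batch AND m.awbid = a.awbid) AS mpscount
    FROM awbmaster a
    WHERE a.batchid = :batch AND a.hubid = :hub
    ORDER BY a.awbid
"""

params = {"batch": "B/117/15022022", "hub": "117"}
join_rows = conn.execute(JOIN_SQL, params).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL, params).fetchall()
```

If the NULL is needed rather than 0, the join form has to be kept; otherwise the subquery form is usually easier to express through Sequelize.literal.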

Extract values from repeated columns in an array with BigQuery

Each array holds information about which lists (internal_list_id) a given contact (vid) belongs to.
I'm trying to include all internal_list_id values (separated by commas) in one column, grouped by vid.
The end data should look something like:
ContactID | ListMembership
3291601   | 1058,1060
I've tried with the below code but it returns information about the first object only:
SELECT list_memberships[offset(1)].vid ContactId, list_memberships[offset(1)].internal_list_id ListMembership FROM hs.contacts as c
The results below were obtained via:
SELECT list_memberships FROM hs.contacts as c
P.S. If you have any suggestions for a better title, please let me know. Thanks!

Use STRING_AGG(x) FROM UNNEST(array), as in:
WITH data AS (
SELECT visitStartTime, hits[OFFSET(0)].product
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
LIMIT 100
)
SELECT visitStartTime, (
SELECT STRING_AGG(FORMAT('$%i', localProductPrice), ', ')
FROM UNNEST(product)
) aggregated
FROM data
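To connect this back to the question's data, the following Python sketch mimics what STRING_AGG over UNNEST does for each contact: walk the repeated list_memberships records and join every internal_list_id with a comma. The field names vid and internal_list_id and the sample values come from the question; the function name and the surrounding record layout are assumptions made for illustration.

```python
def list_memberships_by_contact(contacts):
    """Collect every internal_list_id per vid as a comma-separated string."""
    collected = {}
    for contact in contacts:
        # Iterating over the repeated field plays the role of UNNEST(list_memberships)
        for membership in contact["list_memberships"]:
            ids = collected.setdefault(membership["vid"], [])
            ids.append(str(membership["internal_list_id"]))
    # Joining the collected values plays the role of STRING_AGG(..., ',')
    return {vid: ",".join(ids) for vid, ids in collected.items()}


# Assumed record layout, using the example values from the question
contacts = [
    {"list_memberships": [
        {"vid": 3291601, "internal_list_id": 1058},
        {"vid": 3291601, "internal_list_id": 1060},
    ]},
]

memberships = list_memberships_by_contact(contacts)
```

The key point is that every element of the array is visited, whereas indexing with OFFSET only ever reads a single element.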

Single query to find differences in values across three tables in Oracle

I have three similar tables:
test_dev
test_qmg
test_prod
All the tables have the same columns. I want a single query to find the differences in values across the three tables.
Example:
select * from test_dev
minus
select * from test_qmg
minus
select * from test_prod
The column names are the same in all three tables. I want to find the differences in the column values.
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk where visa_type_id=1
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk_qmg where visa_type_id=1
select VALIDITY_DAYS_BEFORE_ENTRY,VALIDITY_DAYS_AFTER_ENTRY from visa_type_lk_prod where visa_type_id=1
Here the validity_days_before_entry and validity_days_after_entry columns will differ between tables. I want to find that difference.

I believe this is what you are looking for:
SELECT dev.visa_type_id,
(dev.VALIDITY_DAYS_BEFORE_ENTRY - qmg.VALIDITY_DAYS_BEFORE_ENTRY - prod.VALIDITY_DAYS_BEFORE_ENTRY) as difference_before,
(dev.VALIDITY_DAYS_AFTER_ENTRY - qmg.VALIDITY_DAYS_AFTER_ENTRY - prod.VALIDITY_DAYS_AFTER_ENTRY) as difference_after
FROM visa_type_lk dev INNER JOIN visa_type_lk_qmg qmg ON dev.visa_type_id = qmg.visa_type_id
INNER JOIN visa_type_lk_prod prod ON qmg.visa_type_id = prod.visa_type_id
WHERE dev.visa_type_id =1
Here's a link to an SQL fiddle to demonstrate: http://sqlfiddle.com/#!2/56e16/2
Are you sure this is really what you want to do though? I can't imagine how this data would be useful. By the way, all of these tables are in one database, right?

SELECT
MIN(environment_name) as environment_name,VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM
(
SELECT
'development' as environment_name, VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM visa_type_lk A
where visa_type_id in (select visa_type_id from visa_type_lk_prod)
UNION ALL
SELECT
'production' as environment_name, VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
FROM visa_type_lk_prod B
)
tmp
GROUP BY VISA_TYPE_ID,
VISA_TYPE_EN,
VISA_TYPE_AR,
VALIDITY_DAYS_BEFORE_ENTRY,
VALIDITY_DAYS_AFTER_ENTRY,
STAY_DAYS,
STAY_GRACE_DAYS,
EXTENSION1_DAYS,
EXTENSION1_GRACE_DAYS,
EXTENSION2_DAYS,
EXTENSION2_GRACE_DAYS,
IS_BORDER_VISA,
IS_MULTIPLE_ENTRY_VISA,
VIOLATION_GRACE_DAYS,
IS_ARCHIVED,
JOB_CLOSE_AFTER_DAYS,
IS_ALLOWED_FOR_ESTAB_QUOTA,
REPLACE_WITH_VISA_TYPE_ID
HAVING COUNT(*) = 1
order by visa_type_id,environment_name
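The query above is given without explanation, so here is a hedged Python sketch of the idea it appears to use: tag every row with its environment, group rows that are identical in all compared columns, and keep only the groups of size one, which is what HAVING COUNT(*) = 1 does. A row that exists unchanged in both environments forms a group of two and drops out. The three-column rows and the function name rows_that_differ are hypothetical simplifications of the much wider visa_type_lk tables.

```python
from collections import defaultdict


def rows_that_differ(dev_rows, prod_rows):
    """Return (environment, row) pairs for rows present in only one environment."""
    environments_by_row = defaultdict(list)
    # Equivalent of the UNION ALL of both tables with an environment_name tag
    for env, rows in (("development", dev_rows), ("production", prod_rows)):
        for row in rows:
            environments_by_row[row].append(env)
    # Equivalent of GROUP BY all columns ... HAVING COUNT(*) = 1
    differing = [
        (envs[0], row)
        for row, envs in environments_by_row.items()
        if len(envs) == 1
    ]
    # Equivalent of ORDER BY visa_type_id, environment_name
    return sorted(differing, key=lambda item: (item[1][0], item[0]))


# Hypothetical rows: (visa_type_id, validity_days_before_entry, validity_days_after_entry)
dev = [(1, 30, 60), (2, 10, 20)]
prod = [(1, 30, 90), (2, 10, 20)]

differences = rows_that_differ(dev, prod)
```

In this sample, visa type 2 is identical in both environments and disappears, while visa type 1 shows up once per environment so the two versions can be compared side by side. A third environment such as test_qmg could be added as another tagged input, although the group-size condition would then need rethinking.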

Bigquery: "Not enough memory"

BigQuery started giving me a "not enough memory" error when I ran this query this morning. The two tables involved contain no more than 5 GB of data. In addition, I'm using table decorators; 1407249067530 corresponds to around 10:30 am today (20140805). I wonder what the problem is.
Job ID: red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM
select * from
(
select t_connection.row_id AS debug_row_id,
t_connection.hardware_id AS hardware_id,
t_connection.debug_data AS debug_data,
t_connection.connection_status AS connection_status,
t_connection.date_time AS debug_date_time,
t_gps.hardware_id AS hardware_id2,
t_gps.latitude AS latitude,
t_gps.longitude AS longitude,
t_gps.date_time AS gps_date_time,
t_gps.zip_code AS zip_code,
ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
from(
select *,
ABS(t_gps.date_time-t_connection.date_time) AS time_diff
from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id,
gg.hardware_id as hardware_id,
gg.latitude as latitude,
gg.longitude as longitude,
gg.date_time as date_time,
gg.zip_code as zip_code
from [my data set.table1_20140805#1407249067530-] gg
) AS t_gps
INNER JOIN EACH
( select CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
dd.hardware_id as hardware_id,
dd.date_time as date_time,
dd.debug_data as debug_data,
case
when dd.debug_reason = 1 then 'Successful_Connection'
when dd.debug_reason = 2 then 'Dropped_Connection'
when dd.debug_reason = 3 then 'Failed_Connection'
end AS connection_status
from [my data set.table2_20140805#1407249067530-] dd
where dd.debug_reason in (50013, 50017, 50018)
) as t_connection
ON t_connection.hardware_id = t_gps.hardware_id
)
) WHERE row_num=1

You're hitting an odd corner case. When you use allowLargeResults with results that are nested or repeated and you don't set flattenResults=false, the query goes into a special mode. (When you use timestamps, you're really using a nested data structure, which was a design decision that spawned 1000 bugs and is hopefully changing soon.) This special query mode has some limitations, which are what you're hitting.
In general, we want this to be seamless, which is why it isn't documented. However, since you're running into a problem here, I'll explain a little about how to avoid it.
You have a couple of options to get around this:
If you're using nested or repeated results (it looks like you're not, which is good):
  - Rename your results without dots in the name.
  - Set the flattenResults field on the query to 'false'. This means that nested and repeated fields will be actually nested and repeated in the results.
If you're using timestamps in the results:
  - Convert your timestamps to strings or numeric values. Sorry.
If you don't really need large results:
  - Unset the allowLargeResults flag.
I realize that all of these options are deeply unsatisfying. This is an area we're actively working to improve.
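For reference, the two flags mentioned above live in the query section of the job configuration sent to the BigQuery REST API. The sketch below only builds that request body as a plain Python dict and does not call any client library. The project, dataset and table names are placeholders, and useLegacySql is included on the assumption that the query uses the legacy dialect shown in the question.

```python
def build_query_job_body(sql, project_id, dataset_id, table_id):
    """Build a jobs.insert request body with large, unflattened results."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": True,       # the [dataset.table] syntax is legacy SQL
                "allowLargeResults": True,  # requires a destination table
                "flattenResults": False,    # keep nested and repeated fields as they are
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Placeholder identifiers, not taken from the question
job_body = build_query_job_body(
    "SELECT 1", "my-project", "my_dataset", "my_results"
)
```

Dropping the allowLargeResults entry from this dict corresponds to the last option in the list above.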

Now, with allowLargeResults=true and flattenResults=false, and with the timestamps converted to numeric values in the first step:
select * from
(
select row_id AS debug_row_id,
hardware_id AS hardware_id,
debug_data AS debug_data,
connection_status AS connection_status,
date_time AS debug_date_time,
hardware_id2 AS hardware_id2,
latitude AS latitude,
longitude AS longitude,
date_time2 AS gps_date_time,
zip_code AS zip_code,
ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
from(
select *,
ABS(t_gps.date_time2-t_connection.date_time) AS time_diff
from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id_gps,
gg.hardware_id as hardware_id2,
gg.latitude as latitude,
gg.longitude as longitude,
TIMESTAMP_TO_MSEC(gg.date_time) as date_time2,
gg.zip_code as zip_code
from [test.gps32_20140805#1407249067530-] gg
) AS t_gps
INNER JOIN EACH
( select CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
dd.hardware_id as hardware_id,
TIMESTAMP_TO_MSEC(dd.date_time) as date_time,
dd.debug_data as debug_data,
case
when dd.debug_reason = 1 then 'Successful_Connection'
when dd.debug_reason = 2 then 'Dropped_Connection'
when dd.debug_reason = 3 then 'Failed_Connection'
end AS connection_status
from [test.debug_data_developer_20140805#1407249067530-] dd
where dd.debug_reason in (50013, 50017, 50018)
) as t_connection
ON t_connection.hardware_id = t_gps.hardware_id2
)
) WHERE row_num=1
This gives me:
Query Failed
Error: Resources exceeded during query execution.
Job ID: red-road-574:job_ikWQvffmPEUP6DtTvJaYpXHFJ2M

This is the functioning SQL with allowLargeResults=true and flattenResults=true. I don't know what I did to make this work; perhaps only adding a HAVING clause? In the JOIN, however, I changed one side to be the whole table instead of the decorated one used above, so the amount of data involved actually increased. I'm not sure whether it will keep succeeding or whether it's just temporary luck.

Aggregate values and Pivot

I am partly on my way to solving this, but have hit a stumbling block, which I think can be solved with pivot(s).
I have the following SQL query, combining two temporary table variables (I may change these to temporary tables, as I think performance may become a problem because they will be hit a large number of times):
SELECT MeterId, MeterDataOutput.BuildingId, MeterDataOutput.Value,
MeterDataOutput.TimeStamp, UtilityId, SnapshotId
FROM #MeterDataOutput as MeterDataOutput INNER JOIN #InsertOutput AS InsertOutput
ON MeterDataOutput.BuildingId = InsertOutput.BuildingId
AND MeterDataOutput.[Timestamp] = InsertOutput.[TimeStamp]
This produces the following table:
I then modified the query to group by BuildingId, SnapshotId, Timestamp and UtilityId, and applied the SUM() function to aggregate the Value field (dropping MeterId, as it's not required), as follows:
SELECT MeterDataOutput.BuildingId, SUM(MeterDataOutput.Value) AS Value, MeterDataOutput.TimeStamp, UtilityId, SnapshotId
FROM #MeterDataOutput as MeterDataOutput
INNER JOIN #InsertOutput AS InsertOutput
ON MeterDataOutput.BuildingId = InsertOutput.BuildingId
AND MeterDataOutput.[Timestamp] = InsertOutput.[TimeStamp]
GROUP BY MeterDataOutput.BuildingId, MeterDataOutput.TimeStamp, UtilityId, SnapshotId
This query then provides me with the following table:
Now the bit I'm having trouble with is transforming the UtilityId values into columns and placing the values from the Value field under each column, i.e.:
For reference, BuildingId, Timestamp, Snapshot and Value are variable. UtilityId value 6 is always 'Electricity', 7 is always 'Gas' and 8 is always 'Water'.
I'm actually starting to get the hang of this SQL lark :)

Maybe something like this:
SELECT
pvt.BuildingId,
pvt.SnapshotId,
pvt.TimeStamp,
pvt.[6] AS Electricity,
pvt.[7] AS Gas,
pvt.[8] AS Water
FROM
(
SELECT
MeterDataOutput.BuildingId,
MeterDataOutput.Value,
MeterDataOutput.TimeStamp,
UtilityId,
SnapshotId
FROM #MeterDataOutput as MeterDataOutput
INNER JOIN #InsertOutput AS InsertOutput
ON MeterDataOutput.BuildingId = InsertOutput.BuildingId
AND MeterDataOutput.[Timestamp] = InsertOutput.[TimeStamp]
) AS SourceTable
PIVOT
(
SUM(Value)
FOR UtilityId IN ([6],[7],[8])
) AS pvt
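To illustrate what the PIVOT above computes, here is a small Python sketch, written under the assumption that each input reading carries the five columns selected in the inner query. Readings are grouped by BuildingId, SnapshotId and TimeStamp, and the Value of each reading is summed into the column named after its UtilityId. The utility names come from the question; the sample readings and the function name pivot_utilities are made up.

```python
UTILITY_NAMES = {6: "Electricity", 7: "Gas", 8: "Water"}


def pivot_utilities(readings):
    """Sum Value per utility for each (BuildingId, SnapshotId, TimeStamp) group."""
    pivoted = {}
    for reading in readings:
        group = (reading["BuildingId"], reading["SnapshotId"], reading["TimeStamp"])
        # Every group starts with all three utility columns empty (NULL in SQL)
        row = pivoted.setdefault(group, dict.fromkeys(UTILITY_NAMES.values()))
        name = UTILITY_NAMES.get(reading["UtilityId"])
        if name is not None:
            # SUM(Value) FOR UtilityId IN ([6],[7],[8])
            row[name] = (row[name] or 0) + reading["Value"]
    return pivoted


# Hypothetical readings: two electricity meters and one gas meter in one building
readings = [
    {"BuildingId": 1, "SnapshotId": 10, "TimeStamp": "2012-01-01 00:00", "UtilityId": 6, "Value": 5.0},
    {"BuildingId": 1, "SnapshotId": 10, "TimeStamp": "2012-01-01 00:00", "UtilityId": 6, "Value": 2.5},
    {"BuildingId": 1, "SnapshotId": 10, "TimeStamp": "2012-01-01 00:00", "UtilityId": 7, "Value": 1.0},
]

pivot_table = pivot_utilities(readings)
```

Because the summing happens inside the pivot step, the separate GROUP BY query from the question is not needed as input; the raw joined rows are enough.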
