Merge statement? - sql

I am more of a beginner with SQL and would like some help on which statement would be best for my query. I have an app that records test data. Because a score could be 90 or 85.6, the values are stored in different columns: the former in value.int_value, the latter in value.double_value. I need to merge the two columns into one column for "test_score". Here is my current query; the results go to a table called "App_test_outcome":
SELECT event_date, timestamp_micros(event_timestamp) as Timestamp, user_pseudo_id, geo.country, geo.region, geo.city, geo.sub_continent,
(select value.string_value from unnest (event_params) where key = "test_passed") as Test_outcome,
(select value.string_value from unnest (event_params) where key = "test_category") as Test_outcome_category,
(select value.double_value from unnest (event_params) where key = "test_score") as Test_outcome_score,
FROM `Appname.analytics_number.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20200201' AND '20201130' AND
event_name = "test_completed"
Would I need to write another query and then merge its results with those of the query above that are already in that table, or is there a way to merge the two columns in a single query? I would prefer the latter if possible.
I did try appending the results of a query on value.double_value to the results of one on value.int_value, but got the error "invalid schema: Field Test_outcome_score has changed type from FLOAT to INTEGER", which makes me think I cannot merge the two columns anyway.
Any help would be great,
Many thanks,

Maybe IFNULL with CAST will help:
(select IFNULL(value.double_value, CAST(value.int_value AS FLOAT64)) from unnest (event_params) where key = "test_score") as Test_outcome_score
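To see why this fixes the FLOAT/INTEGER schema error, here is a minimal sketch that runs the same IFNULL/CAST expression in SQLite. SQLite is only a stand-in for BigQuery here (REAL plays the role of FLOAT64), and the flattened event_params table and its sample rows are hypothetical. Because the integer score is cast to a float before being combined, the output column has a single type.

```python
import sqlite3

# Hypothetical, flattened stand-in for BigQuery's event_params:
# int_value / double_value mirror value.int_value / value.double_value.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_params (event_id INTEGER, key TEXT, int_value INTEGER, double_value REAL)"
)
conn.executemany(
    "INSERT INTO event_params VALUES (?, ?, ?, ?)",
    [
        (1, "test_score", 90, None),    # whole-number score stored as an integer
        (2, "test_score", None, 85.6),  # fractional score stored as a double
    ],
)

# Prefer the double value; otherwise cast the integer to a float
# (REAL in SQLite, FLOAT64 in BigQuery) so every row has the same type.
rows = conn.execute(
    """
    SELECT event_id,
           IFNULL(double_value, CAST(int_value AS REAL)) AS test_outcome_score
    FROM event_params
    WHERE key = 'test_score'
    ORDER BY event_id
    """
).fetchall()

for event_id, score in rows:
    print(event_id, score, type(score).__name__)
```

In BigQuery itself, COALESCE(value.double_value, CAST(value.int_value AS FLOAT64)) is an equivalent way to write the same expression.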

Related

MERGE on multiple tables

I am trying to do the following, but it fails with "Illegal operation (write) on meta-table":
MERGE x.y.events_* as events
USING
(
select distinct
user_id,
user_pseudo_id
from x.y.events_*
where user_id is not null
and user_pseudo_id is not null
qualify row_number() over (partition by user_pseudo_id) = 1
order by user_pseudo_id
) user_ids
ON events.user_pseudo_id = user_ids.user_pseudo_id
WHEN MATCHED THEN
UPDATE SET events.user_id = user_ids.user_id
This works fine if I name a single shard such as x.y.events_20230115 after MERGE, but I have about 700 tables to update, and I would also like to run this every day so that it updates yesterday's shard. With the wildcard, BigQuery tells me that this is an "Illegal operation (write) on meta-table". That makes sense, but I can't figure out how to proceed.
I am aware that I can use something like _table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY)) in WHERE clauses, but that doesn't seem to be a solution here, since I'm trying to write to the tables rather than read from them.
Could anyone kindly point me in the right direction? How can I dynamically refer to the table suffix in MERGE x.y.events_, or is there perhaps a better way of doing this? Some sort of iteration?
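One possible approach, sketched here under assumptions rather than as a confirmed answer: since BigQuery refuses to write through the wildcard meta-table, generate one MERGE per concrete shard name and run each statement separately. The helper names below are hypothetical; shard_suffix mirrors the FORMAT_DATE/DATE_SUB expression from the question. DISTINCT and ORDER BY are dropped from the USING subquery because QUALIFY already keeps one row per user_pseudo_id. The sketch only builds statement text; submitting it (from a client library, or server-side with BigQuery scripting's EXECUTE IMMEDIATE) is left out.

```python
from datetime import date, timedelta

# Hypothetical template: the MERGE from the question, but targeting one
# concrete daily shard instead of the wildcard meta-table.
MERGE_TEMPLATE = """
MERGE `x.y.events_{suffix}` AS events
USING (
  SELECT user_id, user_pseudo_id
  FROM `x.y.events_*`
  WHERE user_id IS NOT NULL
    AND user_pseudo_id IS NOT NULL
  QUALIFY ROW_NUMBER() OVER (PARTITION BY user_pseudo_id) = 1
) AS user_ids
ON events.user_pseudo_id = user_ids.user_pseudo_id
WHEN MATCHED THEN
  UPDATE SET events.user_id = user_ids.user_id
"""


def shard_suffix(run_date: date) -> str:
    """Same as FORMAT_DATE('%Y%m%d', DATE_SUB(run_date, INTERVAL 1 DAY))."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def merge_for_shard(suffix: str) -> str:
    # Table names cannot be query parameters, so the identifier is formatted
    # into the text; validate it so only an 8-digit date suffix gets through.
    if not (len(suffix) == 8 and suffix.isdigit()):
        raise ValueError(f"unexpected shard suffix: {suffix!r}")
    return MERGE_TEMPLATE.format(suffix=suffix)


# Daily run: only yesterday's shard.
print(merge_for_shard(shard_suffix(date(2023, 1, 16))))

# One-off backfill: one statement per shard. This suffix list is hypothetical;
# in practice it could be read from the dataset's INFORMATION_SCHEMA.TABLES view.
backfill = [merge_for_shard(s) for s in ["20230113", "20230114", "20230115"]]
```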

How to use Index to combine data from 2 tables in SQL SERVER

I have 2 tables: one contains a list of customers (t_client) with their unique ID, the other contains a list of promotional codes (t_promo_code).
I have created an index on each table (idx_client and idx_code), and I want to join these 2 tables so that each client gets a promotional code.
I suppose there should be something like this in SQL Server?
SELECT *
FROM [EMAIL].[dbo].[T_client]
JOIN [EMAIL].[dbo].[T_promo_code] ON
(INDEX([EMAIL].[dbo].[T_client].idx_client)) = (INDEX ([EMAIL].[dbo].[T_promo_code].idx_code))
However, I cannot find anything like this, and I am really not familiar with indexes. If I could turn the index into a column, that would be much easier, but I don't know how to do that either.
I only found a SELECT statement like this:
SELECT @row_index := @row_index + 1 AS `index`
But it seems to work only in MySQL, while I am using SQL Server 2008.
Any ideas?
Sorry if I didn't make it clear. I had difficulty joining these 2 tables because t_promo_code doesn't have any column that matches t_client.
Hence, I was considering generating a shared key for them using an index. However, I then found another solution: using ROW_NUMBER instead of an index.
Eventually, I used the following SQL, and it works!
SELECT Email, 'test_campaign' AS Campaign, GETDATE() AS DATE, Code
FROM (
    SELECT Code, ROW_NUMBER() OVER (ORDER BY Code) AS row_num
    FROM [t_promo_code]
) A
JOIN (
    SELECT Email, ROW_NUMBER() OVER (ORDER BY Email) AS row_num
    FROM [t_client]
) B
    ON A.row_num = B.row_num
ORDER BY A.Code, B.Email
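The trick is that each side gets its own synthetic key from ROW_NUMBER(), and the join pairs the n-th code with the n-th client. A minimal sketch of the same join, assuming SQLite 3.25+ as a stand-in for SQL Server (ROW_NUMBER() OVER (ORDER BY ...) behaves the same way here) and using made-up sample rows. Note that with an inner join, any clients beyond the number of available codes get no row.

```python
import sqlite3

# Hypothetical sample data: 3 clients but only 2 promo codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_client (Email TEXT)")
conn.execute("CREATE TABLE t_promo_code (Code TEXT)")
conn.executemany(
    "INSERT INTO t_client VALUES (?)",
    [("b@example.com",), ("a@example.com",), ("c@example.com",)],
)
conn.executemany("INSERT INTO t_promo_code VALUES (?)", [("PROMO2",), ("PROMO1",)])

# Number each table independently, then join on the generated row number.
pairs = conn.execute(
    """
    SELECT B.Email, A.Code
    FROM (SELECT Code, ROW_NUMBER() OVER (ORDER BY Code) AS row_num
          FROM t_promo_code) AS A
    JOIN (SELECT Email, ROW_NUMBER() OVER (ORDER BY Email) AS row_num
          FROM t_client) AS B
      ON A.row_num = B.row_num
    ORDER BY A.row_num
    """
).fetchall()
print(pairs)
```

If every client must appear even when codes run out, a LEFT JOIN from the client side would keep the unmatched clients with a NULL code.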

How to extract record's table name when using Table wildcard functions [duplicate]

I have a set of day-sharded data where individual entries do not contain the day. I would like to use table wildcards to select all available data and get back data that is grouped by both the column I am interested in and the day that it was captured. Something, in other words, like this:
SELECT table_id, identifier, Sum(AppAnalytic) as AppAnalyticCount
FROM (TABLE_QUERY(database_main,'table_id CONTAINS "Title_" AND length(table_id) >= 4'))
GROUP BY identifier, table_id order by AppAnalyticCount DESC LIMIT 10
Of course, this does not actually work because table_id is not visible in the table aggregation resulting from the TABLE_QUERY function. Is there any way to accomplish this? Some sort of join on table metadata perhaps?
This functionality is available now in BigQuery through _TABLE_SUFFIX pseudocolumn. Full documentation is at https://cloud.google.com/bigquery/docs/querying-wildcard-tables.
A couple of things to note:
You will need to use Standard SQL to enable table wildcards.
You will have to alias _TABLE_SUFFIX to a different name in your SELECT list, as the following example illustrates:
SELECT _TABLE_SUFFIX as table_id, ... FROM `MyDataset.MyTablePrefix_*`
Not available today, but something I'd love to have too. The team takes feature requests seriously, so thanks for adding support for this one :).
In the meantime, a workaround is doing a manual union of a SELECT of each table, plus an additional column with the date data.
For example, instead of:
SELECT x, #TABLE_ID
FROM table201401, table201402, table201403
You could do:
SELECT x, month
FROM
(SELECT x, '201401' AS month FROM table201401),
(SELECT x, '201402' AS month FROM table201402),
(SELECT x, '201403' AS month FROM table201403)
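The same workaround can be generated instead of typed by hand when there are many shards. Below is a minimal sketch in SQLite with hypothetical shard tables and data. In BigQuery's legacy SQL the comma between subqueries acts as UNION ALL; standard SQL (and SQLite) spells it out as UNION ALL, which is what the sketch builds.

```python
import sqlite3

# Hypothetical monthly shards: rows carry no date, so the month lives only in the table name.
conn = sqlite3.connect(":memory:")
shards = {"201401": [1, 2], "201402": [3], "201403": [4, 5, 6]}
for month, values in shards.items():
    conn.execute(f"CREATE TABLE table{month} (x INTEGER)")
    conn.executemany(f"INSERT INTO table{month} VALUES (?)", [(v,) for v in values])

# One SELECT per shard that adds the suffix as a literal column, combined with UNION ALL.
union_sql = " UNION ALL ".join(
    f"SELECT x, '{month}' AS month FROM table{month}" for month in shards
)

# Group by the recovered month, as the question wanted to do with table_id.
counts = conn.execute(
    f"SELECT month, COUNT(*) AS n FROM ({union_sql}) GROUP BY month ORDER BY month"
).fetchall()
print(counts)
```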

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
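This works because the aggregate is evaluated in its own subquery, so the outer WHERE only compares plain values. A minimal sketch in SQLite with a hypothetical foo table. One behavioural difference from ORDER BY ... LIMIT 1 is that every row tied for the greatest timestamp is returned.

```python
import sqlite3

# Hypothetical table; ids 2 and 4 share the greatest timestamp on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, timestamp INTEGER)")
conn.executemany("INSERT INTO foo (timestamp) VALUES (?)", [(100,), (300,), (200,), (300,)])

# The scalar subquery computes max() once; the outer query filters on it.
latest = conn.execute(
    "SELECT * FROM foo WHERE timestamp = (SELECT max(timestamp) FROM foo) ORDER BY id"
).fetchall()
print(latest)
```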
There are many ways to skin a cat.
If you have an identity column with auto-increment, returning the last record by ID would give a faster query, because that column is indexed (unless, of course, you also put an index on the timestamp column). This only gives the same answer if rows are inserted in timestamp order.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
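A minimal sketch of this anti-join in SQLite, using a hypothetical table t with made-up rows. It selects T1.* rather than *, because the original SELECT * also returns T2's columns, which are always NULL for the surviving rows. As with the subquery version, ties on the greatest timestamp all come back.

```python
import sqlite3

# Hypothetical table; ids 2 and 4 share the greatest timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, timestamp INTEGER)")
conn.executemany("INSERT INTO t (timestamp) VALUES (?)", [(100,), (300,), (200,), (300,)])

# Keep each T1 row for which no T2 row has a later timestamp.
latest = conn.execute(
    """
    SELECT T1.*
    FROM t AS T1
    LEFT OUTER JOIN t AS T2 ON T2.timestamp > T1.timestamp
    WHERE T2.timestamp IS NULL
    ORDER BY T1.id
    """
).fetchall()
print(latest)
```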
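You can simply do:
SELECT *, max(timestamp) FROM table
Edit:
In most databases this gives an error, because bare columns can't be mixed with an aggregate function without a GROUP BY. (SQLite 3.7.11 and later accept it as a special case and take the bare columns from the row holding the maximum, but that is not portable.) I guess what SquareCog suggested is the best thing to do: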
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)
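For completeness, a minimal sketch of the SQLite-only bare-column behaviour mentioned above, using a hypothetical table t. With exactly one max() aggregate in the query, SQLite fills the bare columns from the row that holds the maximum; other databases would reject the same statement, which is why the subquery form is the safer choice.

```python
import sqlite3

# Hypothetical table with a single latest row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO t (name, timestamp) VALUES (?, ?)",
    [("a", 100), ("b", 300), ("c", 200)],
)

# SQLite-specific: bare columns (from *) come from the row containing max(timestamp).
row = conn.execute("SELECT *, max(timestamp) FROM t").fetchone()
print(row)
```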