BigQuery DML COUNT() across multiple tables

BigQuery DML COUNT() across multiple tables - google-bigquery

I'm looking for a mechanism to control the accuracy of data that I import daily on multiple BigQuery tables. Each table have similar format with a DATE and an ID column. The Table format looks like this:
Table_1
| DATE | ID |
| 2018-10-01 | A |
| 2018-10-01 | B |
| 2018-10-02 | A |
| 2018-10-02 | B |
| 2018-10-02 | C |
What I want to control is the evolution of the number of IDs, through such kind of output table:
CONTROL_TABLE
| DATE | COUNT(Table1.ID) | COUNT(Table2.ID) | COUNT(Table3.ID) |
| 2018-10-01 | 2 | 487654 | 675386 |
| 2018-10-02 | 3 | 488756 | 675447 |
I'm trying to do such through 1 single SQL query, but face several limits with the DML such as:
-> One single SELECT with all the tables jointed is out of question for performance purpose (20+ tables with millions lines)
-> I was thinking of going through temporary tables, but it seems I cannot run Multiple DELETE + INSERT functions on several tables with DML
-> I cannot use a wildcard table as the output of the query
Would anyone have an idea how to get such result in an optimized way, ideally through 1 single query ?

Related

Why is this Query not Updateable?

I was looking to provide an answer to this question in which the OP has two tables:
Table1
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | |
| 2 | |
| 3 | |
+--------+--------+
Table2
+----+--------+--------+--------+
| ID | testID | stepID | status |
+----+--------+--------+--------+
| 1 | 1 | 1 | pass |
| 2 | 1 | 2 | fail |
| 3 | 1 | 3 | pass |
| 4 | 2 | 1 | pass |
| 5 | 2 | 2 | pass |
| 6 | 3 | 1 | fail |
+----+--------+--------+--------+
Here, the OP is looking to update the status field for each testID in Table1 with pass if the status of all stepID records associated with the testID in Table2 have a status of pass, else Table1 should be updated with fail for that testID.
In this example, the result should be:
+--------+--------+
| testID | Status |
+--------+--------+
| 1 | fail |
| 2 | pass |
| 3 | fail |
+--------+--------+
I wrote the following SQL code in an effort to accomplish this:
update Table1 a inner join
(
select
b.testID,
iif(min(b.status)=max(b.status) and min(b.status)='pass','pass','fail') as v
from Table2 b
group by b.testID
) c on a.testID = c.testID
set a.testStatus = c.v
However, MS Access reports the all-too-familiar, 'operation must use an updateable query' response.
I know that a query is not updateable if there is a one-to-many relationship between the record being updated and the set of values, but in this case, the aggregated subquery would yield a one-to-one relationship between the two testID fields.
Which left me asking, why is this query not updateable?

You're joining in a query with an aggregate (Max).
Aggregates are not updateable. In Access, in an update query, every part of the query has to be updateable (with the exception of simple expressions, and subqueries in WHERE part of your query), which means your query is not updateable.
You can work around this by using domain aggregates (DMin and DMax) instead of real ones, but this query will take a large performance hit if you do.
You can also work around it by rewriting your aggregates to take place in an EXISTS or NOT EXISTS clause, since that's part of the WHERE clause thus doesn't need to be updateable. That would likely minimally affect performance, but means you have to split this query in two: 1 query to set all the fields to "pass" that meet your condition, another to set them to "fail" if they don't.

Last accessed timestamp of a Netezza table?

Does anyone know of a query that gives me details on the last time a Netezza table was accessed for any of the operations (select, insert or update) ?

Depending on your setup you may want to try the following query:
select *
from _v_qryhist
where lower(qh_sql) like '%tablename %'

There are a collection of history views in Netezza that should provide the information you require.

Netezza does not track this information in the catalog, so you will typically have to mine that from the query history database, if one is configured.
Modern Netezza query history information is typically stored in a dedicated database. Depending on permissions, you may be able to see if history collection is enabled, and which database it is using with the following command. Apologies in advance for the screen-breaking wrap to come.
SYSTEM.ADMIN(ADMIN)=> show history configuration;
CONFIG_NAME | CONFIG_DBNAME | CONFIG_DBTYPE | CONFIG_TARGETTYPE | CONFIG_LEVEL | CONFIG_HOSTNAME | CONFIG_USER | CONFIG_PASSWORD | CONFIG_LOADINTERVAL | CONFIG_LOADMINTHRESHOLD | CONFIG_LOADMAXTHRESHOLD | CONFIG_DISKFULLTHRESHOLD | CONFIG_STORAGELIMIT | CONFIG_LOADRETRY | CONFIG_ENABLEHIST | CONFIG_ENABLESYSTEM | CONFIG_NEXT | CONFIG_CURRENT | CONFIG_VERSION | CONFIG_COLLECTFILTER | CONFIG_KEYSTORE_ID | CONFIG_KEY_ID | KEYSTORE_NAME | KEY_ALIAS | CONFIG_SCHEMANAME | CONFIG_NAME_DELIMITED | CONFIG_DBNAME_DELIMITED | CONFIG_USER_DELIMITED | CONFIG_SCHEMANAME_DELIMITED
-------------+---------------+---------------+-------------------+--------------+-----------------+-------------+---------------------------------------+---------------------+-------------------------+-------------------------+--------------------------+---------------------+------------------+-------------------+---------------------+-------------+----------------+----------------+----------------------+--------------------+---------------+---------------+-----------+-------------------+-----------------------+-------------------------+-----------------------+-----------------------------
ALL_HIST_V3 | NEWHISTDB | 1 | 1 | 20 | localhost | HISTUSER | aFkqABhjApzE$flT/vZ7hU0vAflmU2MmPNQ== | 5 | 4 | 20 | 0 | 250 | 1 | f | f | f | t | 3 | 1 | 0 | 0 | | | HISTUSER | f | f | f | f
(1 row)
Also make note of the CONFIG_VERSION, as it will come into play when crafting the following query example. In my case, I happen to be using the version 3 format of the query history database.
Assuming history collection is configured, and that you have access to the history database, you can get the information you're looking for from the tables and views in that database. These are documented here. The following is an example, which reports when the given table was the target of a successful insert, update, or delete by referencing the "usage" column. Here I use one of the history table helper functions to unpack that column.
SELECT FORMAT_TABLE_ACCESS(usage),
hq.submittime
FROM "$v_hist_queries" hq
INNER JOIN "$hist_table_access_3" hta
USING (NPSID, NPSINSTANCEID, OPID, SESSIONID)
WHERE hq.dbname = 'PROD'
AND hta.schemaname = 'ADMIN'
AND hta.tablename = 'TEST_1'
AND hq.SUBMITTIME > '01-01-2015'
AND hq.SUBMITTIME <= '08-06-2015'
AND
(
instr(FORMAT_TABLE_ACCESS(usage),'ins') > 0
OR instr(FORMAT_TABLE_ACCESS(usage),'upd') > 0
OR instr(FORMAT_TABLE_ACCESS(usage),'del') > 0
)
AND status=0;
FORMAT_TABLE_ACCESS | SUBMITTIME
---------------------+----------------------------
ins | 2015-06-16 18:32:25.728042
ins | 2015-06-16 17:46:14.337105
ins | 2015-06-16 17:47:14.430995
(3 rows)
You will need to change the digit at the end of the $v_hist_table_access_3 view to match your query history version.

Oracle Recursive Select to Find Current ID Associated with a Customer

I have a table that contains the history of Customer IDs that have been merged in our CRM system. The data in the historical reporting Oracle schema exists as it was when the interaction records were created. I need a way to find the Current ID associated with a customer from potentially an old ID. To make this a bit more interesting, I do not have permissions to create PL/SQL for this, I can only create Select statements against this data.
Sample Data in customer ID_MERGE_HIST table
| OLD_ID | NEW_ID |
+----------+----------+
| 44678368 | 47306920 |
| 47306920 | 48352231 |
| 48352231 | 48780326 |
| 48780326 | 50044190 |
Sample Interaction table
| INTERACTION_ID | CUST_ID |
+----------------+----------+
| 1 | 44678368 |
| 2 | 48352231 |
| 3 | 80044190 |
I would like a query with a recursive sub-query to provide a result set that looks like this:
| INTERACTION_ID | CUST_ID | CUR_CUST_ID |
+----------------+----------+-------------+
| 1 | 44678368 | 50044190 |
| 2 | 48352231 | 50044190 |
| 3 | 80044190 | 80044190 |
Note: Cust_ID 80044190 has never been merged, so does not appear in the ID_MERGE_HIST table.
Any help would be greatly appreciated.

You can look at CONNECT BY construction.
Also, you might want to play with recursive WITH (one of the descriptions: http://gennick.com/database/understanding-the-with-clause). CONNECT BY is better, but ORACLE specific.
If this is frequent request, you may want to store first/last cust_id for all related records.
First cust_id - will be static, but will require 2 hops to get to the current one
Last cust_id - will give you result immediately, but require an update for the whole tree with every new record

Unique string table in SQL and replacing index values with string values during query

I'm working on an old SQL Server database that has several tables that look like the following:
|-------------|-----------|-------|------------|------------|-----|
| MachineName | AlarmName | Event | AlarmValue | SampleTime | ... |
|-------------|-----------|-------|------------|------------|-----|
| 3 | 180 | 8 | 6.780 | 2014-02-24 | |
| 9 | 67 | 8 | 1.45 | 2014-02-25 | |
| ... | | | | | |
|-------------|-----------|-------|------------|------------|-----|
There is a separate table in the database that only contains unique strings, as well as the index for each unique string. The unique string table looks like this:
|----------|--------------------------------|
| Id | String |
|----------|--------------------------------|
| 3 | MyMachine |
| ... | |
| 8 | High CPU Usage |
| ... | |
| 67 | 404 Error |
| ... | |
|----------|--------------------------------|
Thus, when we want to get something out of the database, we get the respective rows out, then lookup each missing string based on the index value.
What I'm hoping to do is to replace all of the string indexes with the actual values in a single query without having to do post-processing on the query result.
However, I can't figure out how to do this in a single query. Do I need to use multiple JOINs? I've only been able to figure out how to replace a single value by doing something like -
SELECT UniqueString.String AS "MachineName" FROM UniqueString
JOIN Alarm ON Alarm.MachineName = UniqueString.Id
Any help would be much appreciated!

Yes, you can do multiple joins to the UniqueStrings table, but change the order to start with the table you are reporting on and use unique aliases for the joined table. Something like:
SELECT MN.String AS 'MachineName', AN.String as 'AlarmName' FROM Alarm A
JOIN UniqueString MN ON A.MachineName = MN.Id
JOIN UniqueString AN ON A.AlarmName = AN.Id
etc for any other columns

Merge computed data from two tables back into one of them

I have the following situation (as a reduced example). Two tables, Measures1 and Measures2, each of which store an ID, a Weight in grams, and optionally a Volume in fluid onces. (In reality, Measures1 has a good deal of other data that is irrelevant here)
Contents of Measures1:
+----+----------+--------+
| ID | Weight | Volume |
+----+----------+--------+
| 1 | 100.0000 | NULL |
| 2 | 200.0000 | NULL |
| 3 | 150.0000 | NULL |
| 4 | 325.0000 | NULL |
+----+----------+--------+
Contents of Measures2:
+----+----------+----------+
| ID | Weight | Volume |
+----+----------+----------+
| 1 | 75.0000 | 10.0000 |
| 2 | 400.0000 | 64.0000 |
| 3 | 100.0000 | 22.0000 |
| 4 | 500.0000 | 100.0000 |
+----+----------+----------+
These tables describe equivalent weights and volumes of a substance. E.g. 10 fluid ounces of substance 1 weighs 75 grams. The IDs are related: ID 1 in Measures1 is the same substance as ID 1 in Measures2.
What I want to do is fill in the NULL volumes in Measures1 using the information in Measures2, but keeping the weights from Measures1 (then, ultimately, I can drop the Measures2 table, as it will be redundant). For the sake of simplicity, assume that all volumes in Measures1 are NULL and all volumes in Measures2 are not.
I can compute the volumes I want to fill in with the following query:
SELECT Measures1.ID, Measures1.Weight,
(Measures2.Volume * (Measures1.Weight / Measures2.Weight))
AS DesiredVolume
FROM Measures1 JOIN Measures2 ON Measures1.ID = Measures2.ID;
Producing:
+----+----------+-----------------+
| ID | Weight | DesiredVolume |
+----+----------+-----------------+
| 4 | 325.0000 | 65.000000000000 |
| 3 | 150.0000 | 33.000000000000 |
| 2 | 200.0000 | 32.000000000000 |
| 1 | 100.0000 | 13.333333333333 |
+----+----------+-----------------+
But I am at a loss for how to actually insert these computed values into the Measures1 table.
Preferably, I would like to be able to do it with a single query, rather than writing a script or stored procedure that iterates through every ID in Measures1. But even then I am worried that this might not be possible because the MySQL documentation says that you can't use a table in an UPDATE query and a SELECT subquery at the same time, and I think any solution would need to do that.
I know that one workaround might be to create a new table with the results of the above query (also selecting all of the other non-Volume fields in Measures1) and then drop both tables and replace Measures1 with the newly-created table, but I was wondering if there was any better way to do it that I am missing.

UPDATE Measures1
SET Volume = (Measures2.Volume * (Measures1.Weight / Measures2.Weight))
FROM Measures1 JOIN Measures2
ON Measures1.ID = Measures2.ID;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

BigQuery DML COUNT() across multiple tables - google-bigquery

Related

Why is this Query not Updateable?

Last accessed timestamp of a Netezza table?

Oracle Recursive Select to Find Current ID Associated with a Customer

Unique string table in SQL and replacing index values with string values during query

Merge computed data from two tables back into one of them

Categories

Resources