I'm trying to figure out how to flatten the BigQuery log tables (logs.cloudaudit_googleapis_com_data_access_20160404 etc.) so that I can see all completed jobs for any given destination table.
Ideally I just want something like the query below to show me all the entries for jobs that touched the table [dataset_xyz.table_abc], and then I can figure out how to make sense of some fields being populated depending on the type of job, etc.
SELECT
  *
FROM
  [logs.cloudaudit_googleapis_com_data_access_20160404]
WHERE
  (
    protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId = 'dataset_xyz'
    AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId = 'table_abc'
  )
  OR
  (
    protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.datasetId = 'dataset_xyz'
    AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.tableId = 'table_abc'
  )
  OR
  (
    protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId = 'dataset_xyz'
    AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId = 'table_abc'
  )
I was trying to do lots of nested FLATTENs, but I could not really figure it out, to be honest, as I find the log structure a bit complex.
I basically want to be able to query the log exports to say "show me everything that had anything to do with editing table [dataset_xyz.table_abc]", so I guess mainly load, tableCopy, and query jobs that either appended or overwrote any data in table [dataset_xyz.table_abc].
The only other thing I can think of is to literally pick apart the table along its nested records and then somehow join them all back together separately, but that seems like a crazy idea. I'm sure there is a way to flatten it repeatedly, but I just can't figure out how to flatten such a complicated structure. Even if I could just flatten everything under protoPayload.serviceData.jobCompletedEvent.* so I could do
select protoPayload.serviceData.jobCompletedEvent.* from flatten(...
Or maybe there is a much easier way to go about this that I'm missing?
P.S. I think this could be a good example for the guide, as I'd imagine it's a common enough thing people want to do.
Have you tried this:
SELECT protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId
FROM [audit_logs.cloudaudit_googleapis_com_data_access_20160406]
OMIT RECORD IF (SUM(
  (protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId = 'ds' AND
   protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId = 'table')
  OR
  (protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.datasetId = 'ds' AND
   protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.tableId = 'table')
  OR
  (protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId = 'ds' AND
   protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId = 'table')
) = 0)
Thank you for checking my question out!
I'm trying to write a query for a very specific problem we're having at my workplace and I can't seem to get my head around it.
Short version: I need to be able to target columns by their name, and more specifically by a part of their name that will be consistent throughout all the columns I need to combine or compare.
More details:
We have, for example, 5 different surveys. They each have many questions, but SOME of the questions are part of the same metric, and we need to create a generic field that holds it. There's more background to the "why" of that, but it's pretty important for us at this point.
We were able to more or less solve this with either COALESCE() or CASE statements, but the challenge is that, as more surveys/survey versions are added, our vendor inevitably generates new columns for each survey and its questions.
Take this example, which is what we do currently and works well enough:
CASE
WHEN SURVEY_NAME = 'Service1' THEN SERV1_REC
WHEN SURVEY_NAME = 'Notice1' THEN FNOL1_REC
WHEN SURVEY_NAME = 'Status1' THEN STAT1_REC
WHEN SURVEY_NAME = 'Sales1' THEN SALE1_REC
WHEN SURVEY_NAME = 'Transfer1' THEN Null
ELSE Null
END REC
And also this alternative which works well:
COALESCE(SERV1_REC, FNOL1_REC, STAT1_REC, SALE1_REC) as REC
But as I mentioned, eventually we will have a "SALE2_REC", for example, and we'll need them BOTH in this same statement. I want to create something where having to go into the SQL and make changes isn't needed. Given that the columns will ALWAYS be named "something#_REC" for this specific metric, is there any way to achieve something like:
COALESCE(all columns named LIKE '%_REC') as REC
Bonus! Related, might be another way around this same problem:
Would there also be a way to achieve this?
SELECT (columns named LIKE '%_REC') FROM ...
Thank you very much in advance for all your time and attention.
-Kendall
Table and column information in Db2 is managed in the system catalog. The relevant views are SYSCAT.TABLES and SYSCAT.COLUMNS. You could write:
select colname, tabname
from syscat.columns
where colname like some_expression
and tabname = 'MYTABLE'
Note that the LIKE predicate supports expressions based on a variable or the result of a scalar function. So you could match it against some dynamic input.
Have you considered storing the more complicated properties in JSON or XML values? Db2 supports both and you can query those values with regular SQL statements.
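The same catalog-driven idea can be sketched in any database that exposes column metadata: read the column names matching the pattern, then generate the COALESCE instead of hand-writing it. Here is a minimal sketch using sqlite3 (whose PRAGMA table_info plays the role of SYSCAT.COLUMNS); the surveys table and its data are invented for illustration, with column names following the "something_REC" convention from the question.

```python
import sqlite3

# Hypothetical survey table; column names follow the "something_REC"
# pattern described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE surveys (
        survey_name TEXT,
        SERV1_REC INTEGER,
        FNOL1_REC INTEGER,
        SALE1_REC INTEGER
    )
""")
conn.execute("INSERT INTO surveys VALUES ('Service1', 9, NULL, NULL)")
conn.execute("INSERT INTO surveys VALUES ('Sales1', NULL, NULL, 7)")

# Read column names from the catalog (row[1] is the column name) and
# keep those ending in _REC.
rec_cols = [row[1] for row in conn.execute("PRAGMA table_info(surveys)")
            if row[1].endswith("_REC")]

# Build the COALESCE dynamically, so a future SALE2_REC column is
# picked up automatically the next time the query is generated.
sql = f"SELECT survey_name, COALESCE({', '.join(rec_cols)}) AS REC FROM surveys"
print(conn.execute(sql).fetchall())  # [('Service1', 9), ('Sales1', 7)]
```

The SQL itself stays static per run; only the generation step needs to rerun when the vendor adds columns.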
I think my question is more related to SQL than to Laravel or its ORM, but I'm having the problem while programming in Laravel, so that's why I tagged it in the question.
My problem is as follows, I have the following model (sorry for the Spanglish):
I have the users table, nothing special here.
Then the juegos (games) table; it has a jornada column (it's like the week, used to know which games are played in a certain week).
And finally the pronosticos table (who the user says will win, which is stored in the diferencia column).
So I want to make a form where the user can make his bet. Basically this form will take its data from the pronosticos table, like this:
$juegos = Juego::where('jornada', $jor)
-> orderBy('expira')
-> get();
This produces what I want, a collection of models that I can iterate to show all the games for a given jornada (week).
Now, if the user has already made his bet, I want to also bring the score values the user is betting on with a query, so I thought I could use something like:
$juegos = Juego::where('jornada', $jor)
-> leftJoin('pronosticos', 'juegos.id', '=', 'pronosticos.juego_id')
-> addSelect(['pronosticos.user_id', 'juegos.id', 'expira', 'visitante', 'local', 'diferencia'])
-> having('pronosticos.user_id', $uid)
-> orderBy('expira')
-> get();
Now, the problem is, it brings back an empty set, and that's quite obvious: if the user has made his bet it will work, but if he hasn't, the having will filter out everything, giving the empty set.
So I think I'm not getting clearly how to make the having or where clause work correctly. Maybe what I want is to do a leftJoin not with the pronosticos table, but with the pronosticos table already filtered by a where clause.
Maybe I'm doing everything wrong and should do the leftJoin to a subselect? If that's so, I have no idea how to do it.
Or maybe my expectations are beyond what can be done in SQL, and I should return two different sets and process them in the app?
EDIT
This is the query I want to express in Laravel's ORM:
SELECT * from juegos
LEFT JOIN (SELECT * FROM pronosticos WHERE user_id=1) AS p
ON p.juego_id = juegos.id
WHERE jornada = 2 ORDER BY expira
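The target query above (a LEFT JOIN against a pre-filtered subquery) can be checked in plain SQL independently of the ORM. A minimal sketch with sqlite3, using the table and column names from the question but with invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE juegos (id INTEGER PRIMARY KEY, jornada INTEGER, expira TEXT);
    CREATE TABLE pronosticos (juego_id INTEGER, user_id INTEGER, diferencia INTEGER);
    INSERT INTO juegos VALUES (1, 2, '2016-01-01'), (2, 2, '2016-01-02');
    -- user 1 has only bet on game 1
    INSERT INTO pronosticos VALUES (1, 1, 3);
""")

# LEFT JOIN against the *already filtered* pronosticos, as in the target
# query: games without a bet survive, with NULL in diferencia.
rows = conn.execute("""
    SELECT juegos.id, p.diferencia
    FROM juegos
    LEFT JOIN (SELECT * FROM pronosticos WHERE user_id = 1) AS p
           ON p.juego_id = juegos.id
    WHERE jornada = 2
    ORDER BY expira
""").fetchall()
print(rows)  # [(1, 3), (2, None)]
```

In Laravel's query builder, a similar effect is usually achieved by moving the user_id condition into the join itself (a leftJoin with a closure calling ->on(...) and ->where(...)) rather than filtering after the join with having, so that unmatched games are not discarded.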
Say I have an articles table that has a column called slugs, storing the slugs of the article - for example example-article-2016.
I also have a log table that logs each visit to each article, and has a column called paths that stores the same data in a different format: /articles/example-article-2016.
I have thought about just processing the path column in a way that would remove the /articles/ part, and then joining, but I am curious if there is a way to join on these columns without actually modifying the data.
You don't have to modify the data permanently, but you do have to adjust it for the join. One way is to replace /articles/ with '', for example:
SELECT ...
FROM articles a
JOIN log l ON REPLACE(l.paths, '/articles/', '') = a.slugs
This won't use indexes and is not ideal, but works perfectly fine in ad-hoc scenarios. If you need to do this join a lot, you should consider a schema change.
You can just do:
SELECT
a.slugs /*, l.visited_at */
FROM
articles a
JOIN logs l ON substr(l.path, length('/articles/')+1) = a.slugs ;
The substr function should be quite fast to execute. You can obviously replace length('/articles/')+1 with the constant 11, but I think that leaving it as is makes it much clearer what you're actually doing... If the last bit of performance is needed, use the 11.
You will probably benefit from having the following computed index:
CREATE INDEX idx_logs_slug_from_path
ON logs ((substr(path, length('/articles/')+1))) ;
Check the whole setup at dbfiddle here
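Both approaches (REPLACE stripping the prefix, or substr skipping past it) can be verified quickly. A minimal sketch with sqlite3, using the column names from the question and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (slugs TEXT);
    CREATE TABLE logs (path TEXT);
    INSERT INTO articles VALUES ('example-article-2016');
    INSERT INTO logs VALUES ('/articles/example-article-2016'),
                            ('/articles/other-article');
""")

# Join on the transformed path: substr() skips past the '/articles/'
# prefix (length 10, so the slug starts at position 11);
# REPLACE(l.path, '/articles/', '') would work just as well here.
rows = conn.execute("""
    SELECT a.slugs, l.path
    FROM articles a
    JOIN logs l ON substr(l.path, length('/articles/') + 1) = a.slugs
""").fetchall()
print(rows)  # [('example-article-2016', '/articles/example-article-2016')]
```

Only the matching log row joins; the unrelated path is dropped, and neither table's stored data is changed.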
In a script I'm running the following query on an Oracle database:
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
where DATA.FILTER like (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ')
In the actual script it's a bit more complicated, but you get the idea. ;-)
DATA2.FILTER contains the filter which needs to be applied to DATA as a LIKE clause. The idea is to keep it as generic as possible, meaning it should be possible to filter on:
FILTER%
%FILTER
FI%LTER
%FILTER%
FILTER (as if the clause were DATA.FILTER = (select DATA2.FILTER from DATA2 where DATA2.FILTER2 = 'WYZ'))
The system the script runs on does not allow stored procedures for this kind of task, and I also can't make the script build the query dynamically before running it.
Whatever data needs to be fetched, it has to be done with one single query.
I've already tried numerous solutions I found online, but no matter what I do, I seem to be missing the mark.
What am I doing wrong here?
This one?
select nvl(max(to_char(DATA.VALUE)), 'OK')
from DATA
JOIN DATA2 on DATA.FILTER LIKE DATA2.FILTER
where DATA2.FILTER2 = 'WYZ'
Note: The performance of this query is not optimal, because Oracle will always perform a "Full Table Scan" on table DATA, but it works.
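The key idea, a LIKE in the join condition whose pattern comes from another table, is standard SQL and easy to try out. A minimal sketch with sqlite3, using the table names from the question (FILTER and VALUE are quoted because they can clash with SQL keywords) and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE DATA ("FILTER" TEXT, "VALUE" INTEGER);
    CREATE TABLE DATA2 ("FILTER" TEXT, FILTER2 TEXT);
    INSERT INTO DATA VALUES ('ABCDEF', 1), ('XYZ', 2);
    -- The pattern itself lives in DATA2, wildcards included.
    INSERT INTO DATA2 VALUES ('ABC%', 'WYZ');
''')

# LIKE in the join condition: each DATA row is matched against the
# pattern stored in DATA2, so the pattern stays data-driven.
rows = conn.execute('''
    SELECT DATA."VALUE"
    FROM DATA
    JOIN DATA2 ON DATA."FILTER" LIKE DATA2."FILTER"
    WHERE DATA2.FILTER2 = 'WYZ'
''').fetchall()
print(rows)  # [(1,)]
```

'ABCDEF' matches 'ABC%' and joins; 'XYZ' does not. Since both the pattern and the wildcards live in DATA2, the query itself never has to change.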
Suppose we have 20 columns in a table and I want to return 19 of them.
How can I do that?
select *
will give me all of them, but I want only 19.
Is there a good solution for that situation? Something like
select * - [columnName]
?!?
Nope, sorry. You can take *, or you can take them one at a time, but you can't take "all of them except for X, Y, or Z."
As has been said, you use SELECT * for all columns or list the columns individually if you don't want them all.
Listing columns does seem like a chore but there is an important reason why it's actually good.
While it's OK for ad hoc queries, it's highly recommended that you don't use SELECT * in code, because when the database schema changes you will get different columns in the results returned to your application, which is almost certainly not what you want. If you could do select * but address from customer, this would have the same problem: changing the DB would change the structure of your query's results, which is bad.
So not only can you not do it, I would recommend not doing it even if you could.
You can explicitly name each column you wish to select. That is the only way to exclude columns.
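If typing out the column list is the pain point, it can at least be generated from the catalog rather than by hand. A minimal sketch with sqlite3 (table, columns, and data invented for illustration), excluding one column by name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, secret TEXT)")
conn.execute("INSERT INTO t VALUES (1, 2, 'hide me')")

# Read the column list from the catalog (row[1] is the column name)
# and drop the unwanted one; the explicit list is then generated
# rather than typed by hand.
cols = [row[1] for row in conn.execute("PRAGMA table_info(t)")
        if row[1] != "secret"]
sql = f"SELECT {', '.join(cols)} FROM t"
print(conn.execute(sql).fetchall())  # [(1, 2)]
```

The resulting query still names every column explicitly, so it keeps the stability benefits described above while sparing you the typing.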