Finding the query that created a table in BigQuery - google-bigquery

I am a new employee at the company. The person before me built some tables in BigQuery, and I want to find the query that was used to create a particular table.
Things I would want to check using that query are:
What joins were used?
What other tables were used to build the table in question?
I have not worked with BigQuery before, but I did my due diligence by reading tutorials and the documentation; I could not find anything related there.

A brief outline of the steps:
Step 1 - gather all query jobs of that user using the Jobs.list API - you need the Is Owner permission on the respective projects to see someone else's jobs
Step 2 - keep only the jobs run by the user you mentioned that reference your table of interest, using the destination table attribute
Step 3 - for those jobs, simply check the respective query text, which tells you how the table was populated (a sketch of one way to do this is below)
Hth!

I had been looking for an answer for a long time and finally found it:
Go to the three-bars (navigation) menu at the top left.
From there, go to the Analytics section.
Select BigQuery, under which you will find the Scheduled queries option; click on that.
In the filter box you can enter keywords and find the required query for the table.

For me, I was able to go through my query history and find the query I used.
Step 1.
Go to the BigQuery UI; at the bottom there are Personal history and Project history tabs. If you can use the same account that was used to execute the query, I recommend Personal history.
Step 2.
Click on the tab and there will be a list of queries ordered from most recently run. Check the time the table was created and find a query that ran just before that creation time.
Since the query runs first and then creates the table, the timestamps will differ slightly; for me the gap was a few seconds.
Step 3.
After you find the query used to create the table, simply copy it. And you're done.

Related

Spotfire - Information Link - Filter Not Working

I am a beginner in Spotfire. I developed a simple information link.
Steps
I created 2 tables by adding columns.
Then created joins: 3 simple inner joins on the tables above. The reason for 3 joins is that the query runs faster than with only 1 join.
Then created an information link by adding the elements and joins.
This works perfectly well and the data is fetched properly, but as soon as I add a filter, it stops working.
I tried
Creating a filter -> and then adding it as an element to the information link
Adding a filter on the column itself: column E_ID - Expression %1 = 1000
Editing the SQL query in the information link: I added one more clause in the WHERE section: AND E1."E_ID" = 1000
None of these work. If I remove the filter, it works perfectly fine. The filter is on the same column on which one of the joins is based.
Please suggest where I am making a mistake.
Too long to comment...
So, I've noticed that joins in the Information Designer can be cumbersome. It's convenient for people who don't have access to the data source, but if you do have access (as you do in this scenario), I would handle all of the logic on the DB server side. That way you are supplying Spotfire with a flat result set which it can easily ingest and create visualizations on, and it prevents Spotfire from bogging down with data transformations as well.
With that being said, I would also recommend using stored procedures to serve up the data to Spotfire. Here are a couple of answers I posted explaining why, which should make your life easier (a small sketch of the DB-side approach follows the links).
https://stackoverflow.com/a/38247931/6167855
https://stackoverflow.com/a/39640197/6167855
https://stackoverflow.com/a/43523380/6167855
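To make the DB-side approach concrete, here is a minimal sketch of a view that pre-joins the data so Spotfire only sees a flat result; apart from E_ID, the table and column names are hypothetical, since the question only shows the E1 alias:

-- Hypothetical sketch: all join logic lives in the database and Spotfire just
-- reads the flat view (or a stored procedure returning the same shape).
-- EMPLOYEE, DEPARTMENT, E_NAME and D_NAME are made-up names for illustration.
CREATE OR REPLACE VIEW SPOTFIRE_FLAT_V AS
SELECT E1.E_ID,
       E1.E_NAME,
       D1.D_NAME
FROM   EMPLOYEE   E1
JOIN   DEPARTMENT D1
       ON D1.E_ID = E1.E_ID
WHERE  E1.E_ID = 1000;   -- or drop this and filter the flat result in Spotfire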

Shifting Window in google Big Query dataset

I have 30 daily sharded tables in Big Query from Nov 1 to Nov 30, 2016.
Each of these tables follow the naming convention of "sample_datamart_YYYYMMDD".
Each of these daily tables have a field called timestampServer.
My goal is to advance the data by 24 hours at 00:00:00 UTC every day, so that the data is kept current without me having to copy the tables.
Is there any way to:
1) do a calculation on the field timestampServer so that it gets updated every 24 hours?
2) and at the same time rename the table from sample_datamart_20161130 to sample_datamart_20161201?
I've read the other posts and I think those are more about aggregations over a 30-day window. My objective is not to do any aggregations; I just want to move the whole dataset forward by 24 hours so that when I search for the last day, there will always be data there.
Does anyone know if Google Cloud Datasets: Update would be able to perform these tasks?
https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update#try-it
Thanks very much for any guidance.
As for #2 - how to rename the table from sample_datamart_20161130 to sample_datamart_20161201?
This can be achieved by copying the table to a new table and then deleting the original table.
There is zero extra cost, as the copy job is free of charge.
The table can be copied with the Jobs: Insert API using a copy configuration, and then the original table can be deleted using the Tables: Delete API. A DDL sketch of the same idea is below.
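For illustration only, the same copy-then-delete can also be expressed in BigQuery standard SQL DDL, assuming DDL is available in your project; mydataset is a placeholder:

-- Sketch of the copy-then-delete rename using DDL instead of calling the
-- Jobs: Insert (copy) and Tables: Delete APIs directly.
CREATE TABLE mydataset.sample_datamart_20161201
COPY mydataset.sample_datamart_20161130;

DROP TABLE mydataset.sample_datamart_20161130;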
Just wanted to note that the above directly answers your (second) question, but somehow I feel you may be going in the wrong direction. If you describe in more detail what you are trying to achieve (as opposed to how you think you will implement it), we might be able to provide better help. If you go this way, I would recommend posting it as a separate question :o)

Oracle Application Express SQL Query to show meaningful information

I am trying to write a query that 1) works and 2) shows meaningful information.
However, I can't seem to get both at once. Both bits of code work to a degree. My SQL query shows all the useful information a user wants, but when you click the edit button it doesn't link properly, so it won't allow the user to update that row. The other shows only keys and the ROWID, but when you click edit it does show the information and allows it to be updated.
So as not to get another down-voted question, I have taken pictures of each scenario to show the problem, but, ultimately, I need to show meaningful information: an id or key isn't meaningful to the vast majority of users.
Here is my code
SELECT APPLICATIONS.APP_ID, APPLICATIONS.SRN, STUDENTS.SURNAME, STUDENTS.FORENAME, APP_STATUS.STATUS, METHODS.METHOD, JOBS.JOB_TITLE, APPLICATIONS.APP_DATE
FROM APPLICATIONS
JOIN STUDENTS
ON APPLICATIONS.SRN = STUDENTS.SRN
JOIN APP_STATUS
ON APPLICATIONS.STATUS_ID = APP_STATUS.STATUS_ID
JOIN METHODS
ON APPLICATIONS.METHOD_ID = METHODS.METHOD_ID
JOIN JOBS
ON APPLICATIONS.JOB_ID = JOBS.JOB_ID;
Pictures of it in action were attached to the post; below is the code that does not show meaningful information but does work:
select "ROWID",
"APP_ID",
"SRN",
"STATUS_ID",
"METHOD_ID",
"JOB_ID",
"APP_DATE"
from "#OWNER#"."APPLICATIONS"
If I knew how to properly use ROWID I am sure this would be a simple feat, but I don't, so any help would be useful.
// Edit
Whoever renamed this to "Application Expression" - why? What I am using is Oracle Application Express (APEX). That was relevant information which got changed to something wrong, and it might make it hard for someone with a similar problem to find this later.
In the second, simple query, APEX can determine which table (and record) you are trying to edit.
In the first query, with the joins, it can't tell which of the five tables in the query you want to edit. You probably want to have the edit link pass the primary key of the row from APPLICATIONS to the child page. You would then need to build into that page any logic (lists of values, etc.) that maps lookup tables (such as status) to the values needed in the APPLICATIONS table. A sketch of that report query is below.
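As a rough sketch of that suggestion, the report query can keep the human-readable columns while APP_ID serves as the value the edit link passes to the form page; the link itself is configured on the report column in APEX, not in the SQL:

-- Sketch: the joined query keeps the descriptive columns; the APP_ID column is
-- used as the value the edit link passes to the form page on APPLICATIONS.
SELECT A.APP_ID,        -- link column: pass this to the edit/form page
       S.SURNAME,
       S.FORENAME,
       ST.STATUS,
       M.METHOD,
       J.JOB_TITLE,
       A.APP_DATE
FROM   APPLICATIONS A
JOIN   STUDENTS     S  ON S.SRN        = A.SRN
JOIN   APP_STATUS   ST ON ST.STATUS_ID = A.STATUS_ID
JOIN   METHODS      M  ON M.METHOD_ID  = A.METHOD_ID
JOIN   JOBS         J  ON J.JOB_ID     = A.JOB_ID;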

Process Each Row in Kettle ONE AT A TIME?

I was wondering if it is possible to work on a per-row basis in Kettle?
I am trying to implement a reporting scheme which consists of a table where requests get queued for processing, and a Pentaho job that picks up the records in that table.
My job currently has 3 transformations in it:
1st is to get records from the queued requests table
2nd is to analyze the values in each record and come up with multiple results based on that record. For example, a user would request records of movies in the horror genre; it should then spit out the horror movies
3rd is to further retrieve information about the movies, such as the year, director, etc., which is to be output to an Excel file.
That is the idea, but it's a bit challenging to do in Pentaho as it does everything at the same time. Is there a way I can make my job work on records one by one?
EDIT.
Just to add, I have been trying to extend the implementation of the Pentaho cookbook sample, but compared to my design it is like step 2 and step 3 only.
I can't seem to make the Table Input step work one row at a time.
I just made it act like the implementation in the cookbook, with some adjustments. Instead of using two transformations to gather all the necessary fields, I retrieved all the information I need in 1 transformation.
After that I copied that information to the next steps, then ran some queries to complete the information, and it is now working.
Passing parameters between transformations is a bit confusing; there are parameters to be set on the transformation itself and also on the job where the transformations sit, so I kind of had to guess for some time just to make it work.

Use a Query from the Destination db to limit OLE DB Source task in SSIS 2008

All,
I have a package that I'm building as a data importer so I can copy sets of data from my production environment and develop on another instance.
I have two tables that contain header and detail rows for service tickets. Those service tickets are tied back to orders.
I am pulling the service tickets from a certain time window, however, the originating orders fall outside of the date range that I'm pulling for the tickets.
I want to be able to take the following steps in an SSIS package:
Import the header and detail rows within the given date range from prod to dev
Select the relevant order numbers from dev tables
Use the list of order numbers to import only the relevant orders from prod
I poked through other answers and couldn't find answers that addressed this directly, so I apologize if there is an answer out there and I missed it. I may not have been asking the question correctly. I'm assuming that I would need to pull those order numbers into a temp table or variable in order to apply them as a filter.
As I write this, it just crossed my mind to use a join on the source system between the ticket and order tables and still use the date range to limit, but I'm still posting the question to see if anyone has dealt with this before.
Your steps are already fairly clear; are you asking how to actually implement them? It looks like you can do all three steps by using SELECT statements in your data sources:
Build a SELECT statement dynamically with the correct dates to use in your data source. The dates could be programmatically generated in a script task, or saved in a database table and populated into variables. Then you copy the data across to the dev system.
Run a SELECT statement in the dev system that returns the order numbers, and copy the results to a table in the prod database.
Run a SELECT statement in the prod database that joins on the table from step 2 and copy the results back to dev.
An alternative to the table in steps 2 and 3 would be a lookup transformation, but if you have a large number of rows then using a table will probably be faster. A rough sketch of the queries for steps 2 and 3 follows.
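The table and column names below (ServiceTicketHeader, Orders, OrderNumber, and OrderNumbersFromDev as the staging table on prod that holds step 2's result) are hypothetical stand-ins for the real schema:

-- Step 2 (run against dev): collect the order numbers referenced by the
-- imported tickets; land this result in a staging table on prod,
-- e.g. dbo.OrderNumbersFromDev. All table names here are hypothetical.
SELECT DISTINCT h.OrderNumber
FROM   dbo.ServiceTicketHeader AS h;

-- Step 3 (OLE DB Source query against prod): import only the orders that
-- appear in the staging table.
SELECT o.*
FROM   dbo.Orders AS o
JOIN   dbo.OrderNumbersFromDev AS s
       ON s.OrderNumber = o.OrderNumber;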