github_repos: Join "commits" and "content" table - google-bigquery

I want to retrieve the content of a file at a given commit from Google's BigQuery GitHub dataset ("github_repos").
So I would expect something like SELECT content FROM sample_contents WHERE commit_id = abc (just one example; eventually it should be a join). Unfortunately, I can't find a link, i.e. a column shared by the commits table and the contents table.
How can I join the tables commits and contents?

You need to use the files table to join the commits and contents tables.
You can see this example:
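WITH example AS (
SELECT a.id AS file_id, b.id AS content_id, a.repo_name AS file_repo, c.commit AS commit_id
FROM `bigquery-public-data.github_repos.files` a
JOIN `bigquery-public-data.github_repos.contents` b ON a.id = b.id
-- commits.repo_name is an ARRAY<STRING>, so unnest it before joining on it
JOIN (
  SELECT commit, repo
  FROM `bigquery-public-data.github_repos.commits`, UNNEST(repo_name) AS repo
) c ON a.repo_name = c.repo
WHERE a.repo_name = 'conda-forge/feedstocks' AND a.id = 'c798dafa5d536d96203f762db8fd11cbde8f3139'
)
SELECT *
FROM example LIMIT 10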
The files table joins to contents on id, because both tables carry the same id value. It joins to commits on repo_name; since commits.repo_name is an array, it has to be unnested first. There is no column shared by commits and contents, so you can't join those two tables directly: you must go through files. Note that matching on repo_name pairs every commit of the repository with the file, so this join alone does not tell you which version of the file existed at a particular commit.
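To see the shape of this "bridge" join without querying BigQuery, here is a minimal sketch using Python's built-in sqlite3 module on a hypothetical, tiny copy of the three tables. The table `commit_repos` is an assumption standing in for commits with its repo_name array already unnested (one row per commit and repository); the rows and values are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of files / contents / commits. commit_repos models
# commits.repo_name after UNNEST: one row per (commit, repository) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files        (repo_name TEXT, path TEXT, id TEXT);
CREATE TABLE contents     (id TEXT, content TEXT);
CREATE TABLE commit_repos (commit_id TEXT, repo_name TEXT);

INSERT INTO files VALUES ('org/a', 'README.md', 'blob1'),
                         ('org/b', 'setup.py',  'blob2');
INSERT INTO contents VALUES ('blob1', 'hello'), ('blob2', 'print(1)');
INSERT INTO commit_repos VALUES ('c1', 'org/a'), ('c2', 'org/a'), ('c3', 'org/b');
""")

# contents joins files on id; files joins commits on repo_name.
# commits and contents share no column, so files sits in the middle.
rows = conn.execute("""
SELECT c.commit_id, f.path, ct.content
FROM files f
JOIN contents ct    ON ct.id = f.id
JOIN commit_repos c ON c.repo_name = f.repo_name
WHERE f.repo_name = 'org/a'
ORDER BY c.commit_id
""").fetchall()

# Every commit of org/a is paired with the same file row: the repo_name
# join says nothing about which file version belongs to which commit.
print(rows)
```

The output makes the caveat concrete: both commits of `org/a` come back with the same file content.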

Related

SQL DELETE FROM several tables

I have the following tables:
dataset, links, files
dataset has a field called tiComplete; if it is 0, the record is incomplete. links and files both have a field "biDataset" that references the record in the dataset table.
I'm trying to create a query that deletes all entries from dataset, links and files where tiComplete = 0, this is what I have:
DELETE
`datasets`.*,
`links`.*,
`files`.*
FROM
`datasets` `d`
INNER JOIN
`links` `l`
ON
`l`.biDataset=`d`.biPK
INNER JOIN
`files` `f`
ON
`f`.biDataset=`d`.biPK
WHERE
`d`.tiComplete=0;
However, when I try to save the procedure that contains this, I get:
SQL Error(1109): Unknown table `datasets` in MULTI DELETE
I'm using MariaDB version 10 with HeidiSQL version 11.0.0.5919
Your multiple table delete syntax is off. Use this version:
DELETE d, l, f
FROM datasets d
INNER JOIN links l ON l.biDataset = d.biPK
INNER JOIN files f ON f.biDataset = d.biPK
WHERE d.tiComplete = 0;
If you alias the tables, as you have done, the aliases of the tables you want to delete from must appear in the DELETE clause as a comma-separated list. Because both joins are INNER JOINs, a dataset row (and its children) is deleted only if it has at least one matching row in both links and files; use LEFT JOINs if an incomplete dataset may lack links or files.
Note that I removed the backticks everywhere; they weren't necessary and only clutter the code. An alternative approach here is cascading deletion (foreign keys declared with ON DELETE CASCADE). With that approach, deleting a record in the parent table automatically deletes all linked records in the child tables.
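As an illustration of the cascading-deletion alternative, here is a minimal sketch using Python's sqlite3 module as a stand-in for MariaDB (InnoDB also supports ON DELETE CASCADE, but the DDL details differ). The schema mirrors the question's column names; the `id` columns and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and therefore cascades) only when this
# pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE datasets (biPK INTEGER PRIMARY KEY, tiComplete INTEGER);
CREATE TABLE links (
    id INTEGER PRIMARY KEY,
    biDataset INTEGER REFERENCES datasets(biPK) ON DELETE CASCADE
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    biDataset INTEGER REFERENCES datasets(biPK) ON DELETE CASCADE
);
INSERT INTO datasets VALUES (1, 1), (2, 0), (3, 0);
INSERT INTO links (biDataset) VALUES (1), (2), (2);
INSERT INTO files (biDataset) VALUES (1), (2);   -- dataset 3 has no children
""")

# Delete only from the parent; the matching links/files rows go with it.
conn.execute("DELETE FROM datasets WHERE tiComplete = 0")
conn.commit()

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(count("datasets"), count("links"), count("files"))  # prints: 1 1 1
```

Unlike the INNER JOIN version, this also removes dataset 3, which has no links or files at all.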

Importing Data using joins; more than one row returned

I'm trying to move data from a source table to a master table. My source table has a category name but no ID. I need to take the category name from the source table, join it to the category table, and insert that ID into my target table.
Here is what I am working with so far (SQL Server):
Select C_CATID, C_CategoryDesc, b.Record_No -- Record_No differentiates the records, since I keep getting duplicates
from [Category_Tree_Lookup]
inner join [dbo].[TBL_C_Mapped SpendData_12_2014OLD03312015] b
on C_categorydescription = b.category
where C_CategoryDesc = b.category
I keep getting too many duplicate records, so I cannot insert them into the table.
Any help would be greatly appreciated!
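You can use the DISTINCT keyword like this:
select distinct C_CATID, C_CategoryDesc, b.Record_No from tablename
DISTINCT only removes rows that are identical in every selected column, so it helps when the lookup table repeats the same (C_CATID, C_CategoryDesc) pair. If one description maps to several different C_CATID values, the rows still differ and the lookup data itself has to be fixed.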
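A minimal sketch of where the duplicates come from and what DISTINCT does about them, using Python's sqlite3 module as a stand-in for SQL Server. `SpendData` is a hypothetical short name for the question's source table, and the rows are invented: the lookup deliberately repeats the row (10, 'Travel').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category_Tree_Lookup (C_CATID INTEGER, C_CategoryDesc TEXT);
CREATE TABLE SpendData (Record_No INTEGER, category TEXT);
-- 'Travel' appears twice in the lookup with the same ID: a repeated row.
INSERT INTO Category_Tree_Lookup VALUES (10, 'Travel'), (10, 'Travel'), (20, 'Office');
INSERT INTO SpendData VALUES (1, 'Travel'), (2, 'Office');
""")

join_sql = """
SELECT {distinct} l.C_CATID, l.C_CategoryDesc, b.Record_No
FROM Category_Tree_Lookup l
JOIN SpendData b ON l.C_CategoryDesc = b.category
ORDER BY b.Record_No
"""
# Each source row is repeated once per matching lookup row...
plain = conn.execute(join_sql.format(distinct="")).fetchall()
# ...and DISTINCT collapses rows that are identical in every column.
deduped = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
print(len(plain), len(deduped))  # prints: 3 2
```

If the two 'Travel' lookup rows had different C_CATID values, the joined rows would differ and DISTINCT would keep both.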

Query a table only to see the rows that match rows in two other tables

I have two folders containing files. The files and their data are stored in tables, one table per folder, and each folder table holds only the files currently in that folder. A third table holds the data of all files, including files that are no longer in either folder, to keep track of history.
I require a query that will show the contents of both tables, but not files that are no longer in the folders.
For more information:
Each file has a ID
Each ID is unique
Folder A shares no IDs with Folder B
This data is stored in Access 2010
What I thought would work:
I was thinking of an inner join using the third table and the other two folder tables, with a where clause that only shows:
TableC.ID = TableA.ID AND TableC.ID = TableB.ID
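But this did not work.
Try a UNION. Your join can't work because no ID appears in both A and B, so TableC.ID = TableA.ID AND TableC.ID = TableB.ID is never true. Since the folder tables already contain only the current files, you don't need the history table at all (UNION ALL also works and is cheaper, because no ID appears in both tables):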
select ID, detail1, detail2 from TableA
union
select ID, detail1, detail2 from TableB
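If each table has columns the other lacks, pad the missing ones with NULL so both SELECTs return the same columns:
select commonColumns, AOnlyColumn, NULL As BOnlyColumn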
from FolderA
UNION
select commonColumns, NULL, BOnlyColumn
from FolderB
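A minimal sketch of the NULL-padded union, run through Python's sqlite3 module as a stand-in for Access. The tables, the `a_only`/`b_only` columns and the rows are hypothetical; the History table is included only to show that the query never needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FolderA (ID INTEGER, name TEXT, a_only TEXT);
CREATE TABLE FolderB (ID INTEGER, name TEXT, b_only TEXT);
CREATE TABLE History (ID INTEGER, name TEXT);
INSERT INTO FolderA VALUES (1, 'a1.txt', 'x'), (2, 'a2.txt', 'y');
INSERT INTO FolderB VALUES (3, 'b1.txt', 'z');
-- History also keeps file 4, which is no longer in either folder.
INSERT INTO History VALUES (1, 'a1.txt'), (2, 'a2.txt'), (3, 'b1.txt'), (4, 'old.txt');
""")

# Both SELECTs must return the same number of columns, so each side pads
# the other side's extra column with NULL. History is not referenced.
rows = conn.execute("""
SELECT ID, name, a_only, NULL AS b_only FROM FolderA
UNION ALL
SELECT ID, name, NULL, b_only FROM FolderB
ORDER BY ID
""").fetchall()
print(rows)
```

File 4 never appears, because only the two folder tables are read.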

delete rows in parent-child tables that are found in another pair of parent-child tables

I am loading data into a parent-child pair of tables in a "staging" database schema. If there are duplicate records that were previously loaded into a parent-child pair of tables in a "master" database schema, I want to delete them from the "staging" database tables.
This query
SELECT A.*,B.*
FROM STG.AUTO_REPR_PAR_STG A
JOIN STG.AUTO_REPR_CHLD_STG B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
WHERE EXISTS ( SELECT A.*, B.*
FROM MST.AUTO_REPR_PAR A
JOIN MST.AUTO_REPR_CHLD B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
)
will show what's in staging that was previously loaded in master. But how do I delete from the parent-child pair of tables in the staging database? I am drawing a blank. I tried this, but it fails with "Tables not allowed in FROM clause":
DELETE FROM STG.AUTO_REPR_PAR_STG A
JOIN STG.AUTO_REPR_CHLD_STG B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
WHERE EXISTS (SELECT A.*, B.*
FROM MST.AUTO_REPR_PAR A
JOIN MST.AUTO_REPR_CHLD B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
)
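The back end is Teradata v13. I am currently researching the CASCADE DELETE option, but I am not even sure it is supported. Any ideas?
There's no way to delete from multiple tables in a single DELETE statement; you need one per table. Note also that the EXISTS subquery in the question is not correlated with the outer query, so it is true for every staging row as soon as master contains any parent-child pair; matching on TEST_SEQ_NUM, as below, fixes that: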
DELETE FROM STG.AUTO_REPR_PAR_STG
WHERE TEST_SEQ_NUM IN (
SELECT P.TEST_SEQ_NUM FROM MST.AUTO_REPR_PAR P JOIN MST.AUTO_REPR_CHLD C
ON P.TEST_SEQ_NUM = C.TEST_SEQ_NUM )
;DELETE FROM STG.AUTO_REPR_CHLD_STG
WHERE TEST_SEQ_NUM IN (
SELECT P.TEST_SEQ_NUM FROM MST.AUTO_REPR_PAR P JOIN MST.AUTO_REPR_CHLD C
ON P.TEST_SEQ_NUM = C.TEST_SEQ_NUM )
If you run this as a multi-statement request (in BTEQ, the semicolon at the start of the second DELETE is what makes both statements one request), the join is done only once.
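A minimal sketch of the "one DELETE per table" approach, using Python's sqlite3 module as a stand-in for Teradata. The lowercase table names are hypothetical short forms of the STG/MST tables, and the rows are invented. SQLite has no multi-statement requests, so the two DELETEs are instead wrapped in one transaction, which gives atomicity but not Teradata's shared-join optimization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_par  (TEST_SEQ_NUM INTEGER);
CREATE TABLE stg_chld (TEST_SEQ_NUM INTEGER, detail TEXT);
CREATE TABLE mst_par  (TEST_SEQ_NUM INTEGER);
CREATE TABLE mst_chld (TEST_SEQ_NUM INTEGER, detail TEXT);
INSERT INTO stg_par  VALUES (1), (2), (3);
INSERT INTO stg_chld VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO mst_par  VALUES (1), (2);
INSERT INTO mst_chld VALUES (1, 'a'), (2, 'b');
""")

# Keys already loaded into master (parent joined to child, as in the answer).
dup_keys = """
SELECT p.TEST_SEQ_NUM
FROM mst_par p JOIN mst_chld c ON p.TEST_SEQ_NUM = c.TEST_SEQ_NUM
"""

# One DELETE per table; the connection context manager wraps both in a
# single transaction (commit on success, rollback on error).
with conn:
    conn.execute(f"DELETE FROM stg_chld WHERE TEST_SEQ_NUM IN ({dup_keys})")
    conn.execute(f"DELETE FROM stg_par WHERE TEST_SEQ_NUM IN ({dup_keys})")

print(conn.execute("SELECT * FROM stg_par").fetchall(),
      conn.execute("SELECT * FROM stg_chld").fetchall())
```

Deleting the child rows first is a choice that keeps the sketch safe if a foreign key from child to parent is ever added.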
You may try something like this:
Instead of a subquery with EXISTS, use an OUTER JOIN from staging to master and select the rows whose master columns are NULL, i.e. the staging rows with no match in master;
Save the result of that query into a temporary table, then run two DELETE statements that remove the staging rows whose keys are not in it.
An OUTER JOIN can be considerably more efficient than a subquery with EXISTS on large data sets, although this depends on how the optimizer plans each query.
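One way to read these steps is sketched below with Python's sqlite3 module as a stand-in for Teradata, assuming the temporary table holds the keys of the new (unmatched) staging rows and the DELETEs remove everything else. Table names and rows are hypothetical, and for brevity only the master parent table is checked. The key columns are declared NOT NULL because NOT IN returns no rows when the subquery contains a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_par  (TEST_SEQ_NUM INTEGER NOT NULL);
CREATE TABLE stg_chld (TEST_SEQ_NUM INTEGER NOT NULL, detail TEXT);
CREATE TABLE mst_par  (TEST_SEQ_NUM INTEGER NOT NULL);
INSERT INTO stg_par  VALUES (1), (2), (3);
INSERT INTO stg_chld VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO mst_par  VALUES (1), (2);

-- Staging rows with no match in master (master side NULL) are the new ones.
CREATE TEMP TABLE new_keys AS
SELECT s.TEST_SEQ_NUM
FROM stg_par s
LEFT OUTER JOIN mst_par m ON m.TEST_SEQ_NUM = s.TEST_SEQ_NUM
WHERE m.TEST_SEQ_NUM IS NULL;

-- Two DELETEs remove everything that is not new, i.e. the duplicates.
DELETE FROM stg_chld WHERE TEST_SEQ_NUM NOT IN (SELECT TEST_SEQ_NUM FROM new_keys);
DELETE FROM stg_par  WHERE TEST_SEQ_NUM NOT IN (SELECT TEST_SEQ_NUM FROM new_keys);
""")

print(conn.execute("SELECT * FROM stg_par").fetchall(),
      conn.execute("SELECT * FROM stg_chld").fetchall())
```

The temporary table is computed once and then reused by both DELETEs, which is the point of this variant.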

How to get names present in both views?

I have a very large view with 5 million records; names repeat, and each row has a unique transaction number. A second view has 9,000 records with unique names. I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query takes very long to run. Is there a faster way?
Have you tried an INNER JOIN? It returns the rows of v1 whose name also appears in v2:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
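A quick way to convince yourself that the IN query and the INNER JOIN return the same rows when the names in v2 are unique is a small sketch with Python's sqlite3 module. The base tables `transactions` and `people`, and their rows, are hypothetical; v1 and v2 are defined as plain views over them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (txn_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE people       (name TEXT);
INSERT INTO transactions VALUES (1, 'ann'), (2, 'bob'), (3, 'ann'), (4, 'cy');
INSERT INTO people VALUES ('ann'), ('bob');     -- names are unique here
CREATE VIEW v1 AS SELECT txn_no, name FROM transactions;
CREATE VIEW v2 AS SELECT name FROM people;
""")

with_in = conn.execute(
    "SELECT * FROM v1 WHERE name IN (SELECT name FROM v2) ORDER BY txn_no").fetchall()
with_join = conn.execute(
    "SELECT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY v1.txn_no").fetchall()
print(with_in == with_join, with_in)
```

Whether the JOIN is actually faster depends on the database; many optimizers plan both forms the same way.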
If you need help learning JOIN syntax, a visual explanation of SQL joins is a good place to start.
If v2 could contain the same name more than once, add the DISTINCT keyword to remove the duplicate rows the join would then produce. Since the names in v2 are unique here, the join returns each matching v1 row only once.
Use a JOIN.
DISTINCT returns only unique rows. Because you are joining to another table, a row in v1 may match more than one row in v2 and would then appear more than once; DISTINCT removes those duplicates.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
For faster performance, index the name column, since the join is on it. v1 and v2 are views, so unless they are indexed or materialized views, the indexes belong on the name columns of the underlying base tables.
For more on joins, see "Visual Representation of SQL Joins".
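A minimal sketch of the DISTINCT and indexing advice, using Python's sqlite3 module. The base tables and rows are hypothetical; here v2 deliberately lists 'ann' twice so the join duplicates a row. In SQLite, views cannot be indexed, so the indexes go on the base tables' name columns; the sketch shows where they go, not how much faster they make the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (txn_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE people       (name TEXT);
INSERT INTO transactions VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO people VALUES ('ann'), ('ann'), ('bob');   -- 'ann' listed twice
CREATE VIEW v1 AS SELECT txn_no, name FROM transactions;
CREATE VIEW v2 AS SELECT name FROM people;

-- SQLite views cannot be indexed; index the base tables' name columns.
CREATE INDEX idx_transactions_name ON transactions(name);
CREATE INDEX idx_people_name       ON people(name);
""")

joined = conn.execute(
    "SELECT a.* FROM v1 a INNER JOIN v2 b ON a.name = b.name ORDER BY a.txn_no").fetchall()
deduped = conn.execute(
    "SELECT DISTINCT a.* FROM v1 a INNER JOIN v2 b ON a.name = b.name ORDER BY a.txn_no").fetchall()
print(joined)   # transaction 1 appears twice, once per 'ann' in v2
print(deduped)  # DISTINCT keeps it once
```

DISTINCT has its own cost on a 5-million-row result, so it is worth adding only when v2 can actually contain repeated names.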