Google BigQuery is extremely slow on a simple query


I have a simple query that counts records from 4 tables (NO JOINING):
SELECT count(tx._sequence_num) as txc,
count(o._sequence_num) as oc,
count(t._sequence_num) as tc,
count(ol._sequence_num) as olc
FROM `xxx.TAX_TRANSACTIONS` tx,
xxx.ORDER o,
xxx.TRANSACTION t,
xxx.ORDER_LINES ol
It never returns a result.
If I split it into 4 queries like this:
SELECT count(tx._sequence_num) as txc FROM `xxx.TAX_TRANSACTIONS` tx; --202685
SELECT count(o._sequence_num) as oc FROM xxx.ORDER o; --175642
SELECT count(t._sequence_num) as tc FROM xxx.TRANSACTION t; --199392
SELECT count(ol._sequence_num) as olc FROM xxx.ORDER_LINES ol; --174947
Each one returns after just 1-2 seconds (the --xxxxxx comment on the right is the record count).
The same happens with this simple join; I never get a result:
SELECT ol.DEVICE_ID AS VIN,
tx.TAX_LINES AS SKU,
o.USER_ID AS ACCOUNT_DN,
o.ORDER_NUMBER,
cast(t.AMOUNT as FLOAT64)/100 AS TOTAL_AMOUNT ,
t.TRANSACTION_STATUS,
t.TRANSACTION_TYPE,
t.TRANSACTION_TAG,
t.CREATED_ON ,
tx.TAX_CALCULATED,
tx.TRANSACTION_STATUS AS TAX_TXN_STATUS,
tx.ERROR_MESSAGE REMARKS,
tx.TRANSACTION_ID AS TAX_TXN_ID,
tx.TAXATION_TYPE AS TAX_TXN_TYPE,
tx.TRANSACTION_DATE TAX_TXN_DATE
FROM xxx.TAX_TRANSACTIONS tx join
`xxx.ORDER` o on o.ORDER_NUMBER = tx.ORDER_NUMBER join
xxx.TRANSACTION t on o.ORDER_NUMBER = t.ORDER_NUMBER join
xxx.ORDER_LINES ol on o.ID = ol.ORDER_ID
WHERE (t.TRANSACTION_TYPE IN ("purchase") AND t.TRANSACTION_STATUS ="approved" AND tx.TAXATION_TYPE = "SalesInvoice") or
(t.TRANSACTION_TYPE IN ("refund") AND tx.TAXATION_TYPE = "ReturnInvoice") or
(tx.TRANSACTION_STATUS IN ("Error"))
ORDER BY CREATED_ON DESC
Is there something wrong with my query? Please let me know how to resolve the problem (joining). Thank you

You say you're not doing any JOINs, but actually you are. Worse, you are doing CROSS JOINs. By putting 4 tables as you have done in your FROM clause you are implicitly joining all 4 of them together.
In other words, the number of rows produced by the join will be 202685 * 175642 * 199392 * 174947, which is about 1241835900000000000000 (roughly 1.24 × 10^21), a humongous number. That's why your query doesn't complete.
Maybe take a look at the execution graph, which is currently in preview (I can see it in your screenshot above); it might give an indication of what operation is being performed here.
If you want COUNTs of the number of rows in each of those tables, count each table on its own: either with 4 separate queries, as you have done, or with 4 scalar subqueries in a single SELECT.
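To make the point above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of BigQuery, purely so that it runs anywhere; the tables t1 and t2 and their row counts are hypothetical. It shows that listing tables with commas multiplies the row counts, while scalar subqueries count each table independently.

```python
import sqlite3

# In-memory database with two small hypothetical tables (30 and 40 rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (seq INTEGER);
    CREATE TABLE t2 (seq INTEGER);
""")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(30)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(40)])

# Comma-separated tables form an implicit CROSS JOIN: 30 * 40 rows.
cross_count = conn.execute("SELECT COUNT(*) FROM t1, t2").fetchone()[0]

# Scalar subqueries count each table on its own, with no join at all.
per_table = conn.execute("""
    SELECT (SELECT COUNT(seq) FROM t1) AS c1,
           (SELECT COUNT(seq) FROM t2) AS c2
""").fetchone()

print(cross_count)  # 1200
print(per_table)    # (30, 40)
```

With four tables of roughly 200,000 rows each, the same multiplication is what produces the enormous intermediate result.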
UPDATE: as a demonstration, I have a table that has 288 rows in it.
select count(*)
from `project.dataset.t` a
returns 288
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b
returns 82944
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c
returns 23887872
select count(*)
from `project.dataset.t` a,
`project.dataset.t` b,
`project.dataset.t` c,
`project.dataset.t` d
returns 6879707136 (about 6.9 billion). That's an enormous number, and that's for a table with only 288 rows in it. Your query will (as I said above) produce about 1241835900000000000000 rows.
Here is the execution graph for my query that returns 6879707136:

Related

Merge two columns into one in Postgres: the result must not be a concatenation of characters. I am looking to add the rows of both columns

I have the following table, the result of a select. What I need to do now is combine stage with phase_name and time with phase_time, into the stage and time columns respectively. The result should not be a concatenation; instead, I need as many rows in time and stage as the sum of the rows of phase_time and time, and of phase_name and stage.
This is the query:
select c.iqnum, t.*, "airgoLocator_phase".name as phase_name,"airgoLocator_phasehistory".timestamp as "phase_time"
from
"airgoLocator_surgerytimes" c cross join lateral(
values
(c.adminssion_time, 'adminssion_time'),
(c.pre_enter_time, 'pre_enter_time'),
(c.quiro_enter_time, 'quiro_enter_time'),
(c.quiro_exit_time, 'quiro_exit_time'),
(c.recu_enter_time, 'recu_enter_time'),
(c.exit_time, 'exit_time')
) as t(time, stage)
inner join
"airgoLocator_phasehistory"
on
c.id = "airgoLocator_phasehistory".surgery_id
inner join
"airgoLocator_phase"
on
"airgoLocator_phasehistory".phase_id = "airgoLocator_phase".level
order by c.id desc, "airgoLocator_phase".level asc
;
This is the result that I have at the moment as a result of the query above
This is the end result I want

Joining two subqueries with Cypher in Neo4J

I have a following SQL query:
SELECT q1.customerId, q1.invoiceId, q2.workId, sum(q2.price)
FROM (select customer.id as customerId, invoice.id as invoiceId, work.id as workId from customer, invoice, workinvoice, work where customer.id=invoice.customerid and invoice.id=workinvoice.invoiceId and workinvoice.workId=work.id
) as q1, (select work.id as workId, sum((price * hours * workhours.discount) + (purchaseprice * amount * useditem.discount)) as price from worktype,workhours,work,warehouseitem,useditem where worktype.id=workhours.worktypeid and workhours.workid=work.id and work.id=useditem.workid and useditem.warehouseitemid=warehouseitem.id group by work.id
) as q2
WHERE q1.workId = q2.workId group by q1.invoiceId;
This query should return me a sum of work prices for each invoice per customer.
I would be interested to know how to do this kind of query in Neo4J. I know that there is UNION https://neo4j.com/docs/cypher-manual/current/clauses/union/. However, that does not seem to do what I want. I need to make two subqueries and join them on the same node, as in the SQL example. What would be the correct way to do this with Cypher?

There's a quite complex example of how to do a join in cypher which you can find here: https://github.com/moxious/halin/blob/master/src/api/data/queries/dbms/3.5/tasks.js#L22
Basically, the technique is that you run the first query, collect the results. Then you run the second, collect the results. Then you unwind the second, match using a filter, and return the result.
In really simplified form, it looks something like this:
CALL something() YIELD a, b
WITH collect({ a: a, b: b }) as resultSet1
CALL somethingElse() YIELD a, c
WITH resultSet1, collect({ a: a, c: c }) as resultSet2
UNWIND resultSet2 as rec
WITH [item in resultSet1 WHERE item.a = rec.a][0] as match, rec
RETURN match.a, match.b, rec.c
The list comprehension bit is basically doing the join. Here we're joining on the "a" field.
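The same collect-then-filter idea can be sketched outside Cypher. The Python analogue below is only an illustration: the two lists are hypothetical stand-ins for the collected result sets, and the field names a, b and c mirror the placeholders used above. As a simplifying choice, records without a match are skipped, which gives inner-join behaviour.

```python
# Hypothetical stand-ins for the two collected result sets.
result_set_1 = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
result_set_2 = [{"a": 2, "c": 20}, {"a": 1, "c": 10}, {"a": 3, "c": 30}]

joined = []
for rec in result_set_2:  # plays the role of UNWIND
    # The list comprehension keeps the rows whose "a" equals rec["a"].
    matches = [item for item in result_set_1 if item["a"] == rec["a"]]
    # Take the first match, or None when there is none.
    match = matches[0] if matches else None
    if match is not None:
        joined.append((match["a"], match["b"], rec["c"]))

print(joined)  # [(2, 'y', 20), (1, 'x', 10)]
```

Note that the first list is scanned once per record of the second, so the cost grows with the product of the two sizes; that is acceptable for small result sets only.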

I figured out the solution I wanted:
MATCH (inv:invoice)-[:WORK_INVOICE]->(w:work)<-[h:WORKHOURS]-(wt:worktype)
WITH inv, w, SUM(wt.price * h.hours * h.discount) as workTimePrice
OPTIONAL MATCH (w)-[u:USED_ITEM]->(i:item)
WITH inv, workTimePrice + SUM(u.amount * u.discount * i.purchaseprice) as workItemPrice
RETURN inv, sum(workItemPrice) as invoicePrice

How to optimize this SQL query summing amounts across related tables?

I have an Access project that implements SQL, and I have been working on optimizing a reconciliation process. This process uses a voucher system which links all tables together.
Each record in each table has a specific voucher ID with which an amount is associated.
The vouchers themselves are unique and can contain multiple voucher numbers, which can be seen below.
Table: Rec_Vouchers
v_id  v_num     voucher
1     12341234  12341234
2     10101010  10101010;22222222
2     22222222  10101010;22222222
...
I have 8 other tables that are linked by these voucher IDs. I'm trying to join all of the tables together to show the distinct voucher ID and voucher, and all corresponding sums of amounts for each table with that specific voucher ID. Below is the query and a sample of the results. I've worked on this for a while now, and it's starting to give me a headache. This query works, but takes way too long to execute.
Also, at some point, I need to match all of these values together to determine if a voucher is "Matching", "Not matching", or "Matching with a difference". So far I've only tried creating a function within the below code that would return a string value of "M", "NM", or "MwD" to display in the column for each voucher. Again, this works, but takes an extremely long time. I've also tried letting VBA do the dirty work with the query's returned recordset, and this takes a good amount of time too, but not as long as creating the function within my sql query. This is the next step, so if you could help with all of this that would be great, but I really just need to optimize the query I have given.
I know this is a lot to wrap your head around, so let me know if you need any more information. Any help would be appreciated. Thanks!
select a.v_id, a.voucher,
(Select sum(b.amount) from rec_month_4349_test b where b.voucher = a.v_id) as GL,
(Select sum(c.payments) from rec_daily_balancing_test c where c.voucher = a.v_id) as DB,
(Select count(x.v_num) from rec_vouchers x where a.v_id = x.v_id and x.v_num not like 'ONL%') as GLcount,
(select count(c.batch_num) from rec_daily_balancing_test c where a.v_id = c.voucher) as DBcount,
(select sum(d.amount) from rec_ed_test d where a.v_id = d.voucher) as ED,
(select sum(e.batchtotal) from rec_eft_batches_new_test e where a.v_id = e.voucher) as EFT,
(select sum(f.batchtotal) from rec_check_batch_test f where a.v_id = f.voucher) as CHK,
(select sum(g.idxtotal) from rec_lockbox_test g where a.v_id = g.voucher) as LBX,
(select sum(h.amount) from rec_lcdl_test h where a.v_id = h.voucher) as LCDL,
((select sum(i.payment_amount) from rec_electronic_files_test i where a.v_id = i.voucher) + (select sum(j.amount) from rec_electronic_edits_test j where a.v_id = j.voucher)) as Elec
from rec_vouchers a
group by a.v_id, a.voucher
Sample Results:
v_id  GL         DB        GLcount  DBcount  ED    EFT    CHK   LBX   LCDL  Elec
6131  19204.00   19204.00  1        1        NULL  NULL   NULL  NULL  NULL  NULL
6132  125330.00  14932.00  6        6        NULL  NULL   NULL  NULL  NULL  14932.00
6133  18245.00   NULL      2        0        NULL  NULL   NULL  NULL  NULL  NULL
6175  98.93      98.93     1        1        NULL  98.93  NULL  NULL  NULL  NULL

It is tempting to say that the "traditional" way to write this query is by moving the tables to the from clause using join predicates. However, that would probably introduce unnecessary cartesian products. Your method is actually ok; the alternative would be doing left joins to aggregated subqueries.
The performance killer is probably the repeated cycling through the tables to find the matches. You can significantly improve performance by having an index on the fields used in the where clause of each subquery. For the first two tables, for instance, you should have an index on rec_month_4349_test(voucher) and rec_daily_balancing_test(voucher).
In SQL Server, you can further optimize this query by including the variable used for summation in the index as well. The following indexes would be better: rec_month_4349_test(voucher, amount) and rec_daily_balancing_test(voucher, payments) (or you can include them in the index without being searchable, which is a bit more advanced).
This optimization works in most databases (an index-scan rather than an index-lookup). I don't know if it works in MS Access (a software product that I try to avoid when possible).
Remember, you would need to do this for all the tables.
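The effect of such a composite index can be checked with a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; the answer above discusses SQL Server and Access, whose plans look different). The table name is taken from the question, while the index name idx_voucher_amount and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rec_month_4349_test (voucher INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO rec_month_4349_test VALUES (?, ?)",
    [(v % 100, float(v)) for v in range(1000)],
)

query = "SELECT SUM(amount) FROM rec_month_4349_test WHERE voucher = 7"


def plan(sql):
    # The last field of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet, so the whole table is scanned

# Composite index: the lookup column first, the summed column second.
conn.execute(
    "CREATE INDEX idx_voucher_amount ON rec_month_4349_test (voucher, amount)"
)
plan_after = plan(query)  # the index alone can now answer the query

print(plan_before)
print(plan_after)
```

In SQLite the second plan reports a covering index, meaning the sum is computed from the index without touching the table rows, which is the behaviour the (voucher, amount) suggestion is aiming for.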

Not sure if this is the best solution, but I created separate views for each table to select the voucher and the sum of the amounts for each specific voucher. Each view looked similar to the following:
rec_sum4349
SELECT voucher, sum(amount) AS GL
FROM rec_month_4349_test
GROUP BY voucher
I then have one view that combines all of the separate views together using full joins, like the following:
rec_vouch_test
SELECT a.voucher, a.GL, b.DB
FROM rec_sum4349 a
FULL JOIN rec_sumDB b
ON a.voucher = b.voucher
WHERE a.voucher IS NOT NULL AND a.voucher <> ''
ORDER BY a.voucher
After I saw that this worked really well, I created the views for the rest of the tables that I needed summed amounts for and added them to the above view. The results are exactly what I was looking for, and the run time was cut down from almost 2 minutes to a matter of a couple of seconds! Thanks for all the help. Now on to matching everything up!

Correlated subqueries are the last option I would choose.
I would suggest writing an aggregated subquery per table and joining them, so that each table can use its indexes.
Create the following indexes on each table and then see the query below.
rec_vouchers
Clustered Index (v_id , voucher)
Filtered Non Clustered Index(v_num) WHERE v_num NOT LIKE 'ONL%'
rec_month_4349_test
Non Clustered Index(voucher) Include (amount)
rec_daily_balancing_test
Non Clustered Index(voucher) Include (payments)
rec_ed_test
Non Clustered Index(voucher) Include (amount)
rec_eft_batches_new_test
Non Clustered Index(voucher) Include (batchtotal)
rec_check_batch_test
Non Clustered Index(voucher) Include (batchtotal)
rec_lockbox_test
Non Clustered Index(voucher) Include (idxtotal)
rec_lcdl_test
Non Clustered Index(voucher) Include (amount)
rec_electronic_files_test
Non Clustered Index(voucher) Include (payment_amount)
rec_electronic_edits_test
Non Clustered Index(voucher) Include (amount)
SELECT a.v_id, a.voucher,
       t1.GL, t2.DB, t3.GLcount,
       t4.DBcount, t5.ED, t6.EFT,
       t7.CHK, t8.LBX, t9.LCDL,
       (t10.Elec1 + t11.Elec2) AS Elec
FROM
( SELECT t0.v_id, t0.voucher
  FROM rec_vouchers t0
  GROUP BY t0.v_id, t0.voucher
) a
-- LEFT JOIN (rather than an inner JOIN) keeps vouchers that have no rows
-- in a given table, so they show NULL as in the original correlated query.
LEFT JOIN
( SELECT SUM(b.amount) AS GL, b.voucher
  FROM rec_month_4349_test b
  GROUP BY b.voucher
) t1
ON a.v_id = t1.voucher
LEFT JOIN
( SELECT SUM(c.payments) AS DB, c.voucher
  FROM rec_daily_balancing_test c
  GROUP BY c.voucher
) t2
ON a.v_id = t2.voucher
LEFT JOIN
( SELECT COUNT(x.v_num) AS GLcount, x.v_id
  FROM rec_vouchers x
  WHERE x.v_num NOT LIKE 'ONL%'
  GROUP BY x.v_id
) t3
ON a.v_id = t3.v_id
LEFT JOIN
( SELECT COUNT(c.batch_num) AS DBcount, c.voucher
  FROM rec_daily_balancing_test c
  GROUP BY c.voucher
) t4
ON a.v_id = t4.voucher
LEFT JOIN
( SELECT SUM(d.amount) AS ED, d.voucher
  FROM rec_ed_test d
  GROUP BY d.voucher
) t5
ON a.v_id = t5.voucher
LEFT JOIN
( SELECT SUM(e.batchtotal) AS EFT, e.voucher
  FROM rec_eft_batches_new_test e
  GROUP BY e.voucher
) t6
ON a.v_id = t6.voucher
LEFT JOIN
( SELECT SUM(f.batchtotal) AS CHK, f.voucher
  FROM rec_check_batch_test f
  GROUP BY f.voucher
) t7
ON a.v_id = t7.voucher
LEFT JOIN
( SELECT SUM(g.idxtotal) AS LBX, g.voucher
  FROM rec_lockbox_test g
  GROUP BY g.voucher
) t8
ON a.v_id = t8.voucher
LEFT JOIN
( SELECT SUM(h.amount) AS LCDL, h.voucher
  FROM rec_lcdl_test h
  GROUP BY h.voucher
) t9
ON a.v_id = t9.voucher
LEFT JOIN
( SELECT SUM(i.payment_amount) AS Elec1, i.voucher
  FROM rec_electronic_files_test i
  GROUP BY i.voucher
) t10
ON a.v_id = t10.voucher
LEFT JOIN
( SELECT SUM(j.amount) AS Elec2, j.voucher
  FROM rec_electronic_edits_test j
  GROUP BY j.voucher
) t11
ON a.v_id = t11.voucher
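A reduced version of this join-to-aggregated-subqueries pattern can be checked against the correlated form from the question. The sketch below is an illustration only: it runs on Python's built-in sqlite3 module rather than SQL Server, keeps just two of the detail tables, and uses made-up rows. It uses LEFT JOIN so that vouchers with no rows in a table still appear with NULL, as in the sample results in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rec_vouchers (v_id INTEGER, v_num TEXT, voucher TEXT);
    CREATE TABLE rec_month_4349_test (voucher INTEGER, amount REAL);
    CREATE TABLE rec_ed_test (voucher INTEGER, amount REAL);
    INSERT INTO rec_vouchers VALUES
        (1, '12341234', '12341234'),
        (2, '10101010', '10101010;22222222'),
        (2, '22222222', '10101010;22222222');
    INSERT INTO rec_month_4349_test VALUES (1, 10.0), (2, 5.0), (2, 7.5);
    INSERT INTO rec_ed_test VALUES (2, 3.0);
""")

# Correlated form: one subquery execution per voucher row.
correlated = conn.execute("""
    SELECT a.v_id, a.voucher,
           (SELECT SUM(b.amount) FROM rec_month_4349_test b
             WHERE b.voucher = a.v_id) AS GL,
           (SELECT SUM(d.amount) FROM rec_ed_test d
             WHERE d.voucher = a.v_id) AS ED
    FROM rec_vouchers a
    GROUP BY a.v_id, a.voucher
    ORDER BY a.v_id
""").fetchall()

# Joined form: each detail table is aggregated once, then joined.
joined = conn.execute("""
    SELECT a.v_id, a.voucher, t1.GL, t5.ED
    FROM (SELECT v_id, voucher FROM rec_vouchers GROUP BY v_id, voucher) a
    LEFT JOIN (SELECT voucher, SUM(amount) AS GL
               FROM rec_month_4349_test GROUP BY voucher) t1
           ON a.v_id = t1.voucher
    LEFT JOIN (SELECT voucher, SUM(amount) AS ED
               FROM rec_ed_test GROUP BY voucher) t5
           ON a.v_id = t5.voucher
    ORDER BY a.v_id
""").fetchall()

print(correlated)
print(joined)
```

Both queries return the same two rows here, including the voucher that has no ED amount, which an inner join would have dropped.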

SQL Select Where or Having

I'm attempting to get some records from a table based on certain factors.
One of the factors is simply with fields on the same table, the other is when joining to another table, I want to compare the number of records in the joined table to a field on the first table. Below is a sample code.
select * from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = 1486
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
-- Having COUNT(LC.ID) = D.Lines
Now, in the code above I can't have the COUNT function in the WHERE clause, and I can't have a field in the HAVING clause without it being in a function.
I'm probably missing something very simple here, but I can't figure it out.
Any help is appreciated.
EDIT: I do apologise, I should have explained the structure of the tables: each Destination is a single record, while the LiveCalls table can hold multiple records per Destination ID (foreign key).
Thank you very much for everyone's help. My final code:
select D.ID, D.Description, D.Lines, D.Active, D.AlternateFail, D.ConfigurationID, COUNT(LC.ID) AS LiveCalls from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = @ConfigurationID
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
GROUP BY D.ID, D.Description, D.Lines, D.Active, D.AlternateFail, D.ConfigurationID
HAVING COUNT(LC.ID) <= D.Lines

The simple thing you're missing is the GROUP BY clause.
As JNK mentioned in the comments below, you cannot combine an aggregate function (such as COUNT, AVG, SUM, MIN) with non-aggregated columns if you don't have a GROUP BY clause; without GROUP BY, the SELECT list may contain only aggregates and literal values.
Your code should probably be something like:
SELECT <someFields>
FROM tDestinations D
LEFT JOIN tLiveCalls LC on LC.DestinationID = D.ID
WHERE D.ConfigurationID = 1486
AND (D.Active = 1 AND D.AlternateFail > GETDATE())
GROUP BY <someFields>
HAVING COUNT(LC.ID) = D.Lines
Note that you have to specify the selected fields explicitly, in both the SELECT and GROUP BY clauses (no * allowed).
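The pattern above can be tried on a toy dataset. This is a minimal sketch on Python's built-in sqlite3 module rather than SQL Server; the rows are invented, and the ConfigurationID and AlternateFail filters are left out for brevity. It uses <= in the HAVING clause to keep destinations whose live-call count does not exceed their number of lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tDestinations (ID INTEGER PRIMARY KEY, Lines INTEGER, Active INTEGER);
    CREATE TABLE tLiveCalls (ID INTEGER PRIMARY KEY, DestinationID INTEGER);
    -- Destination 1: 2 lines, 2 calls; 2: 1 line, 3 calls;
    -- 3: inactive; 4: 3 lines, no calls.
    INSERT INTO tDestinations VALUES (1, 2, 1), (2, 1, 1), (3, 5, 0), (4, 3, 1);
    INSERT INTO tLiveCalls (DestinationID) VALUES (1), (1), (2), (2), (2);
""")

rows = conn.execute("""
    SELECT D.ID, D.Lines, COUNT(LC.ID) AS LiveCalls
    FROM tDestinations D
    LEFT JOIN tLiveCalls LC ON LC.DestinationID = D.ID
    WHERE D.Active = 1               -- row filter, applied before grouping
    GROUP BY D.ID, D.Lines           -- every non-aggregated column is listed
    HAVING COUNT(LC.ID) <= D.Lines   -- group filter, applied after counting
    ORDER BY D.ID
""").fetchall()

print(rows)  # [(1, 2, 2), (4, 3, 0)]
```

Destination 4 shows why the LEFT JOIN matters: COUNT(LC.ID) ignores the NULL produced for a destination without calls, so it is reported with a count of 0 instead of disappearing.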

You can only use HAVING with aggregations. HAVING is effectively the "WHERE clause" for aggregates, but you can still use WHERE on the columns that you are not aggregating.
For example:
SELECT TABLE_TYPE, COUNT(*)
FROM INFORMATION_SCHEMA.TABLES
where TABLE_TYPE='VIEW'
group by TABLE_TYPE
having COUNT(*)>1
In your case you need to use having count(*)=1,
so I think your query would be something like this:
select YOUR_COLUMN
from tDestinations D
left join tLiveCalls LC on LC.DestinationID = D.ID
where D.ConfigurationID = 1486 AND (D.Active = 1 AND D.AlternateFail > GETDATE())
group by YOUR_COLUMN
Having COUNT(LC.ID) = value

MS-Access -> SELECT AS + ORDER BY = error

I'm trying to make a query to retrieve the region which got the most sales for sweet products. 'grupo_produto' is the product type, and 'regiao' is the region. So I got this query:
SELECT TOP 1 r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao ORDER BY total DESC
Then when I run the query, MS-Access asks for the "total" parameter. Why doesn't it consider the newly created 'column' I made in the select clause?
Thanks in advance!

Old question, I know, but it may help someone to know that while you can't order by aliases, you can order by column index. For example, this will work without error:
SELECT
firstColumn,
IIF(secondColumn = '', thirdColumn, secondColumn) As yourAlias
FROM
yourTable
ORDER BY
2 ASC
The results would then be ordered by the values found in the second column, which is the alias "yourAlias".
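Ordering by column position can be tried with the sketch below. It runs on Python's built-in sqlite3 module, not on Access, so it only illustrates the ordinal syntax (SQLite, unlike Access, would also accept the alias in ORDER BY). A CASE expression stands in for IIF, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (firstColumn TEXT, secondColumn TEXT, thirdColumn TEXT);
    INSERT INTO yourTable VALUES
        ('r1', '', 'pear'),
        ('r2', 'apple', 'zzz'),
        ('r3', '', 'fig');
""")

rows = conn.execute("""
    SELECT firstColumn,
           CASE WHEN secondColumn = '' THEN thirdColumn
                ELSE secondColumn END AS yourAlias
    FROM yourTable
    ORDER BY 2 ASC   -- 2 means the second column of the SELECT list
""").fetchall()

print(rows)  # [('r2', 'apple'), ('r3', 'fig'), ('r1', 'pear')]
```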

Aliases are only usable in the query output. You can't use them in other parts of the query. Unfortunately, you'll have to copy and paste the entire subquery to make it work.

You can do it like this
select * from(
select a + b as c, * from table)
order by c
Access has some differences compared to Sql Server.

"Why it doesn't consider the newly created 'column' I made in the select clause?"
Because Access (ACE/Jet) is not compliant with the SQL-92 Standard.
Consider this example, which is valid SQL-92:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY x, y;
In fact, x and y are the only valid elements in the ORDER BY clause because all others are out of scope (ordinal numbers of columns in the SELECT clause are valid, though their use is deprecated).
However, Access chokes on the above syntax. The equivalent Access syntax is this:
SELECT a AS x, c - b AS y
FROM MyTable
ORDER
BY a, c - b;
However, I understand from @Remou's comments that a subquery in the ORDER BY clause is invalid in Access.

Try using a subquery and order the results in an outer query.
SELECT TOP 1 * FROM
(
SELECT
r.nm_regiao,
(SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) T1
ORDER BY total DESC
(Not tested.)

How about:
SELECT TOP 1 r.nm_regiao
FROM (SELECT Dw_Empresa.cod_regiao,
Count(Dw_Empresa.cod_regiao) AS CountOfcod_regiao
FROM Dw_Empresa
WHERE Dw_Empresa.[grupo_produto]='1'
GROUP BY Dw_Empresa.cod_regiao
ORDER BY Count(Dw_Empresa.cod_regiao) DESC) d
INNER JOIN tb_regiao AS r
ON d.cod_regiao = r.cod_regiao

I suggest using an intermediate query.
SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
FROM Dw_Empresa d INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
GROUP BY r.nm_regiao, d.grupo_produto;
If you call that GroupTotalsByRegion, you can then do:
SELECT TOP 1 nm_regiao, total FROM GroupTotalsByRegion
WHERE grupo_produto = '1' ORDER BY total DESC
You may think it's extra work to create the intermediate query (and, in a sense, it is), but you will also find that many of your other queries will be based off of GroupTotalsByRegion. You want to avoid repeating that logic in many other queries. By keeping it in one view, you provide a simplified route to answering many other questions.
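The intermediate-query idea can be sketched with a view. The example below runs on Python's built-in sqlite3 module instead of Access, so a view plays the role of the saved query and LIMIT 1 replaces TOP 1; the region names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_regiao (cod_regiao INTEGER PRIMARY KEY, nm_regiao TEXT);
    CREATE TABLE Dw_Empresa (cod_regiao INTEGER, grupo_produto TEXT);
    INSERT INTO tb_regiao VALUES (1, 'Norte'), (2, 'Sul');
    INSERT INTO Dw_Empresa VALUES
        (1, '1'), (2, '1'), (2, '1'),
        (1, '2'), (1, '2'), (1, '2'), (2, '2');

    -- The intermediate query, stored once and reused by later queries.
    CREATE VIEW GroupTotalsByRegion AS
    SELECT r.nm_regiao, d.grupo_produto, COUNT(*) AS total
    FROM Dw_Empresa d
    INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
    GROUP BY r.nm_regiao, d.grupo_produto;
""")

# Because total is a real column of the view, ordering by it is unambiguous.
top_sweet = conn.execute("""
    SELECT nm_regiao, total FROM GroupTotalsByRegion
    WHERE grupo_produto = '1' ORDER BY total DESC LIMIT 1
""").fetchone()

# The same view answers a different question without repeating the join.
top_other = conn.execute("""
    SELECT nm_regiao, total FROM GroupTotalsByRegion
    WHERE grupo_produto = '2' ORDER BY total DESC LIMIT 1
""").fetchone()

print(top_sweet)  # ('Sul', 2)
print(top_other)  # ('Norte', 3)
```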

How about use:
WITH xx AS
(
SELECT TOP 1 r.nm_regiao, (SELECT COUNT(*)
FROM Dw_Empresa
WHERE grupo_produto='1' AND
cod_regiao = d.cod_regiao) as total
FROM Dw_Empresa d
INNER JOIN tb_regiao r ON r.cod_regiao = d.cod_regiao
) SELECT * FROM xx ORDER BY total
