I'm facing a problem with one query. The easiest way to explain it is step by step:
First, I search for specific values in column1 of table1 with a query like this:
Query #1:
select column1
from table1
where column1 in('xxx','yyy','zzz')
group by column1
having count(*) >3
So now I have a list of the values from column1 that occur more than 3 times.
Then I need to use that list in the WHERE condition of another query:
select column1, column2, column3
from table1
where column1 in (query 1)
Unfortunately, when I use query 1 as a subquery, execution is really slow, and I need to find a different way to do this. Any suggestions on how I can improve performance?
Best regards and thank you in advance
If they are the same table, then use window functions:
select t.*
from (select t.*, count(*) over (partition by column1) as cnt
from table1 t
where column1 in ('xxx', 'yyy', 'zzz')
) t
where cnt > 3;
Both this and your original query will benefit from having an index on table1(column1).
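To check that the window-function version returns the same rows as the IN-subquery version, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the index name idx_table1_column1 are made up for illustration, and SQLite (3.25 or later, for window functions) stands in for whatever database you actually use, so it says nothing about timing on your system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 INTEGER, column3 TEXT)")

# Hypothetical sample data.
rows = [("xxx", i, "a") for i in range(4)]    # 4 rows: count > 3, kept
rows += [("yyy", i, "b") for i in range(2)]   # 2 rows: count <= 3, dropped
rows += [("qqq", i, "c") for i in range(5)]   # not in the IN list, dropped
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# The index suggested above (the name is an assumption).
conn.execute("CREATE INDEX idx_table1_column1 ON table1 (column1)")

# Original approach: query 1 used as a subquery.
subquery_sql = """
SELECT column1, column2, column3
FROM table1
WHERE column1 IN (SELECT column1
                  FROM table1
                  WHERE column1 IN ('xxx', 'yyy', 'zzz')
                  GROUP BY column1
                  HAVING COUNT(*) > 3)
ORDER BY column1, column2
"""

# Window-function approach: count per column1, then filter on the count.
window_sql = """
SELECT column1, column2, column3
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY column1) AS cnt
      FROM table1 t
      WHERE column1 IN ('xxx', 'yyy', 'zzz')) AS t
WHERE cnt > 3
ORDER BY column1, column2
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(window_rows)
```

Both queries should return only the four 'xxx' rows. Whether the window version is actually faster depends on your database and data, so compare the execution plans there.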
1) First of all, check whether the query is properly indexed.
You may have to add an index on column1.
2) Try this:
select column1, column2, column3
from table1 as T1 inner join (
select column1, column2, column3
from table1
where column1 in (query 1)) as T2
on t1.column1 = t2.column1
I have the query below:
SELECT ROW_NUMBER() OVER ( PARTITION BY COLUMN1, COLUMN2, COLUMN3 ORDER BY COLUMN1, COLUMN2) AS ROW_NUM, COLUMN1, COLUMN2, COLUMN3
FROM (SUBQUERY)
GROUP BY COLUMN1, COLUMN2, COLUMN3
OUTPUT of above query:-
I need to perform something equivalent to
IF (COLUMN2 == 'PQR' AND COLUMN3 IS NOT NULL)
THEN
"Delete whole partition from output having value A3 in column1"
Explanation:
If COLUMN2 has the value PQR and COLUMN3 has any DATE_TIME (i.e. is NOT NULL) in some row, then all rows with the corresponding COLUMN1 value should be absent from the output of the query.
OUTPUT required is:-
I tried to be as clear as I can be. Let me know if I need to clarify my question more.
NOTE:- I want to remove those rows only from output of the query not from actual table.
If you are doing this using a subquery, then you might want to use window functions:
SELECT s.*
FROM (SELECT ROW_NUMBER() OVER ( PARTITION BY COLUMN1, COLUMN2, COLUMN3 ORDER BY COLUMN1, COLUMN2) AS ROW_NUM,
COLUMN1, COLUMN2, COLUMN3,
COUNT(CASE WHEN COLUMN2 = 'PQR' THEN COLUMN3 END) OVER (PARTITION BY COLUMN1) as cnt
FROM (SUBQUERY)
GROUP BY COLUMN1, COLUMN2, COLUMN3
) s
WHERE cnt = 0;
This counts, for each COLUMN1 value, the number of non-NULL COLUMN3 values in rows where COLUMN2 = 'PQR' (COUNT ignores NULLs). It then returns only the rows where this count is 0.
The advantage of this approach is that it only evaluates the subquery once -- that can be a performance win (over NOT EXISTS) if it is complicated.
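As a rough check of the cnt = 0 idea, the sketch below runs the window-function filter and the NOT EXISTS form on the same made-up rows using Python's sqlite3 module. The table name your_table and the sample values are assumptions, and the ROW_NUMBER and GROUP BY parts of the real query are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT, column2 TEXT, column3 TEXT)")

# Hypothetical sample data.
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("A1", "ABC", "2020-01-01"),
    ("A1", "PQR", None),           # PQR but COLUMN3 is NULL: A1 stays
    ("A2", "XYZ", None),
    ("A3", "ABC", "2020-01-02"),
    ("A3", "PQR", "2020-01-03"),   # PQR with a date: every A3 row is dropped
])

# Window version: COUNT skips the NULLs produced by the CASE expression.
window_sql = """
SELECT column1, column2, column3
FROM (SELECT column1, column2, column3,
             COUNT(CASE WHEN column2 = 'PQR' THEN column3 END)
                 OVER (PARTITION BY column1) AS cnt
      FROM your_table) s
WHERE cnt = 0
ORDER BY column1, column2
"""

# NOT EXISTS version, which references the source twice.
not_exists_sql = """
SELECT column1, column2, column3
FROM your_table t1
WHERE NOT EXISTS (SELECT 1
                  FROM your_table t2
                  WHERE t1.column1 = t2.column1
                    AND t2.column2 = 'PQR'
                    AND t2.column3 IS NOT NULL)
ORDER BY column1, column2
"""

window_rows = conn.execute(window_sql).fetchall()
not_exists_rows = conn.execute(not_exists_sql).fetchall()
print(window_rows)
```

On this data both forms keep the A1 and A2 rows and drop the whole A3 partition.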
If you want a select query then you can use NOT EXISTS:
SELECT * FROM YOUR_TABLE T1
WHERE NOT EXISTS (SELECT 1 FROM YOUR_TABLE T2
WHERE T1.COLUMN1 = T2.COLUMN1
AND T2.COLUMN2 = 'PQR' AND T2.COLUMN3 IS NOT NULL);
You can use the EXISTS to delete such records as follows:
DELETE FROM YOUR_TABLE T1
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE T2
WHERE T1.COLUMN1 = T2.COLUMN1
AND T2.COLUMN2 = 'PQR' AND T2.COLUMN3 IS NOT NULL);
I have a table with several columns. I would like to write a query that, for each row, finds the total count of all rows that match one column of that row and also the count of all rows that match 2 columns. With these 2 values, I would like to find the percentage difference and print the result as column1, percentage(query1(column2)/query2(column2 and column3)).
Below is the query which I wrote
SELECT DISTINCT (t2.column1)
,(
SELECT count(DISTINCT column2)
FROM table1 t1
WHERE t1.column1 = t2.column1
ORDER BY column2
) AS total_count
,(
SELECT count(DISTINCT column2)
FROM table1 t1
WHERE t1.column1 = t2.column1
AND column3 IN (
10
,20
)
ORDER BY column1
,column2
,column3
) AS column3_count
FROM table1 t2;
The above query works but takes a lot of time to process.
I want it as
SELECT DISTINCT (column1)
,percentage(query1 that matches ALL rows WITH column1 / query2 that match ALL rows WITH column1
AND SOME other CONSTRAINT)
FROM TABLE t1
I would like to optimize the above query too. Please let me know
Thanks
I think you just want conditional aggregation. For the counts:
select t1.column1,
count(distinct column2) as num_column2,
count(distinct case when column3 in (10, 20) then column2 end) as num_column2_column3
from table1 t1
group by t1.column1;
I don't understand the calculation for the percentage, but it would seem to be based on these numbers.
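Here is a small sketch of the conditional aggregation on made-up data, run through Python's sqlite3 module. The percentage formula (filtered count divided by total count, times 100) and the pct column name are only a guess at what was intended, since the question does not define the calculation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 INTEGER, column3 INTEGER)")

# Hypothetical sample data.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("a", 1, 10), ("a", 2, 30), ("a", 3, 20), ("a", 3, 40),
    ("b", 5, 30),
])

sql = """
SELECT column1, num_column2, num_column2_column3,
       -- Assumed formula: share of distinct column2 values seen with column3 in (10, 20).
       ROUND(100.0 * num_column2_column3 / num_column2, 1) AS pct
FROM (SELECT column1,
             COUNT(DISTINCT column2) AS num_column2,
             COUNT(DISTINCT CASE WHEN column3 IN (10, 20) THEN column2 END)
                 AS num_column2_column3
      FROM table1
      GROUP BY column1)
ORDER BY column1
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

The table is read once, instead of running two correlated subqueries for every row.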
I'm querying a system that won't allow using DISTINCT, so my alternative is to use GROUP BY to get close to the result.
My desired query was meant to look like this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT(column3)) AS column3
FROM table
For the alternative, I think I'd need some type of nested query along the lines of this:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(SELECT column FROM table GROUP BY column) AS column3
FROM table
But it didn't work. Am I close?
You are using the wrong syntax for COUNT(DISTINCT). The DISTINCT part is a keyword, not a function. Based on the docs, this ought to work:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(DISTINCT column3) AS column3
FROM table
Do, however, read the docs. BigQuery's implementation of COUNT(DISTINCT) is a bit unusual, apparently so as to scale better for big data. If you are trying to count a large number of distinct values then you may need to specify a second parameter (and you have an inherent scaling problem).
Update:
If you have a large number of distinct column3 values to count, and you want an exact count, then perhaps you can perform a join instead of putting a subquery in the select list (which BigQuery seems not to permit):
SELECT *
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
)
CROSS JOIN (
SELECT count(*) AS column3
FROM (
SELECT column3
FROM table
GROUP BY column3
)
)
Update 2:
Not that joining two one-row tables would be at all expensive, but @FelipeHoffa got me thinking more about this, and I realized I had missed a simpler solution:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
COUNT(*) AS column3
FROM (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2
FROM table
GROUP BY column3
)
This one computes a subtotal of column1 and column2 values, grouping by column3, then counts and totals all the subtotal rows. It feels right.
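To sanity-check the subtotal approach, the sketch below compares it with a plain COUNT(DISTINCT) on a few made-up rows. It uses Python's sqlite3 module rather than BigQuery, and the table is named t here because table is a reserved word; both are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 TEXT)")

# Hypothetical sample data: three distinct column3 values.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 10, "x"), (2, 20, "x"), (3, 30, "y"), (4, 40, "z"), (5, 50, "z"),
])

# Reference result using COUNT(DISTINCT).
direct = conn.execute("""
SELECT SUM(column1), SUM(column2), COUNT(DISTINCT column3)
FROM t
""").fetchone()

# Subtotal per column3 value, then total and count the subtotal rows.
grouped = conn.execute("""
SELECT SUM(column1) AS column1,
       SUM(column2) AS column2,
       COUNT(*) AS column3
FROM (SELECT SUM(column1) AS column1,
             SUM(column2) AS column2
      FROM t
      GROUP BY column3)
""").fetchone()

print(direct, grouped)
```

Both queries give the same totals and the same count on this data.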
FWIW, the way you are trying to use DISTINCT isn't how it's normally used, as it's meant to show unique rows, not unique values for one column in a dataset. GROUP BY is more in line with what I believe you are ultimately trying to accomplish.
Depending on what you need, you could do one of a couple of things. Using your second query, you would need to modify your subquery to get a count, not the actual values, like:
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT sum(1) FROM (SELECT column FROM table GROUP BY column) AS g) AS column3
FROM table
Alternatively, you could do a query off your initial query, something like this:
SELECT sum(column1), sum(column2), sum(column4) from (
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
1 AS column4
FROM table GROUP BY column3)
GROUP BY column4
Edit: The above is generic SQL; I'm not too familiar with Google BigQuery.
You can probably use a CTE
WITH result as (select column from table group by column)
SELECT
SUM(column1) AS column1,
SUM(column2) AS column2,
(SELECT COUNT(*) FROM result) AS column3
FROM table
Instead of doing a COUNT(DISTINCT), you can get the same results by running a GROUP BY first, and then counting results.
For example, the number of different words that Shakespeare used by year:
SELECT corpus_date, COUNT(word) different_words
FROM (
SELECT word, corpus_date
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
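The same pattern can be tried on a toy table. This is only a sketch: the words table and its rows are invented, and it runs on SQLite through Python's sqlite3 module, not on BigQuery or the real samples.shakespeare data. COUNT(DISTINCT) is used here only as a reference to check the GROUP BY result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, corpus_date INTEGER)")

# Hypothetical sample data with repeated (word, corpus_date) pairs.
conn.executemany("INSERT INTO words VALUES (?, ?)", [
    ("the", 1600), ("the", 1600), ("king", 1600),
    ("the", 1601), ("queen", 1601), ("queen", 1601), ("crown", 1601),
])

# Inner GROUP BY removes duplicates; the outer query counts what is left.
grouped = conn.execute("""
SELECT corpus_date, COUNT(word) AS different_words
FROM (SELECT word, corpus_date
      FROM words
      GROUP BY word, corpus_date)
GROUP BY corpus_date
ORDER BY corpus_date
""").fetchall()

# Reference result, for databases that do allow COUNT(DISTINCT).
distinct = conn.execute("""
SELECT corpus_date, COUNT(DISTINCT word)
FROM words
GROUP BY corpus_date
ORDER BY corpus_date
""").fetchall()

print(grouped)
```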
As a bonus, let's add a column that identifies which books were written during each year:
SELECT corpus_date, COUNT(word) different_words, GROUP_CONCAT(UNIQUE(corpus)) books
FROM (
SELECT word, corpus_date, UNIQUE(corpus) corpus
FROM [publicdata:samples.shakespeare]
GROUP BY word, corpus_date
)
GROUP BY corpus_date
ORDER BY corpus_date
I have a series of SELECT COUNT queries tied together by UNION, e.g.:
Select Count(Column1) From Table1 where Table1 column1 = 1
union
Select Count(Column2) From Table1 where Table1 column2 = 1
It works fine, but the result comes back sorted in ascending or descending order. I want the rows in the order in which I wrote the queries: the first query should always come first in the result, no matter what its value is. Thanks for any help.
Run two queries?
You can add a column and sort on it
Select 1 as sequence, Count(Column1) From Table1 where Table1.column1 = 1
union
Select 2 as sequence, Count(Column2) From Table1 where Table1.column2 = 1
ORDER BY sequence
Try this:
SELECT COUNT(*) AS cnt, 1 AS SortOrder FROM Table1 WHERE column1 = 1
UNION ALL
SELECT COUNT(*) AS cnt, 2 AS SortOrder FROM Table1 WHERE column2 = 1
ORDER BY SortOrder
The main change I have made is to add a column which you can use to ORDER BY. Some of the other changes I have made:
You don't mean UNION, you mean UNION ALL. Otherwise with your query if the counts were the same you'd only get one row. In the new query this wouldn't happen, but you should still use UNION ALL because that's semantically what you mean.
Writing COUNT(column1) is unnecessary because your WHERE clause guarantees that column1 can never be NULL. Use COUNT(*). I imagine that even if you write COUNT(column1) most databases will see that column1 cannot be NULL and omit the unnecessary NULL check, but again there is nothing wrong with being explicit - you want to count all rows and COUNT(*) makes that clear.
You shouldn't have Table1 column1 with a space between. There should be a dot. Or simply omit the table name as it is not required here.
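The points above can be sketched with Python's sqlite3 module. The rows are made up so that both counts are equal at first, which shows plain UNION collapsing them into one row. One more row is then added to show that the SortOrder column keeps the first query first even when its count is larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (column1 INTEGER, column2 INTEGER)")

# Hypothetical sample data: two rows with column1 = 1, two with column2 = 1.
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, 0), (1, 1), (0, 1), (0, 0)])

# Plain UNION removes duplicates, so two equal counts become a single row.
plain_union = conn.execute("""
SELECT COUNT(column1) FROM Table1 WHERE column1 = 1
UNION
SELECT COUNT(column2) FROM Table1 WHERE column2 = 1
""").fetchall()

ordered_sql = """
SELECT COUNT(*) AS cnt, 1 AS SortOrder FROM Table1 WHERE column1 = 1
UNION ALL
SELECT COUNT(*) AS cnt, 2 AS SortOrder FROM Table1 WHERE column2 = 1
ORDER BY SortOrder
"""
equal_counts = conn.execute(ordered_sql).fetchall()

# Make the first count larger; it still comes first because of SortOrder.
conn.execute("INSERT INTO Table1 VALUES (1, 0)")
unequal_counts = conn.execute(ordered_sql).fetchall()

print(plain_union, equal_counts, unequal_counts)
```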
How can I query the results of two equally designed tables?
If table1 contains 1 column with data:
abc
def
hjj
and table2 contains 1 column with data:
uyy
iuu
pol
then I want my query to return
abc
def
hjj
uyy
iuu
pol
but I want to make sure that if I try to do the same task with multiple columns that the associations remain.
SELECT
Column1, Column2, Column3 FROM Table1
UNION
SELECT
Column1, Column2, Column5 AS Column3 FROM Table2
ORDER BY
Column1
Notice how I do an order by at the end and that Column5 in Table2 is the equivalent of Column3 in Table1. The Order By is of course optional, but allows you to control the order of items from both tables once they are combined.
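To see that the column associations survive the UNION, here is a minimal sketch with Python's sqlite3 module. The rows are invented, and Table2 deliberately repeats one Table1 row to show that UNION also removes the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Column1 TEXT, Column2 INTEGER, Column3 TEXT)")
conn.execute("CREATE TABLE Table2 (Column1 TEXT, Column2 INTEGER, Column5 TEXT)")

# Hypothetical sample data.
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)",
                 [("abc", 1, "x"), ("hjj", 3, "z")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                 [("def", 2, "y"), ("abc", 1, "x")])  # second row duplicates Table1

cursor = conn.execute("""
SELECT Column1, Column2, Column3 FROM Table1
UNION
SELECT Column1, Column2, Column5 AS Column3 FROM Table2
ORDER BY Column1
""")

# Columns are matched by position; the result takes its names from the first SELECT.
column_names = [d[0] for d in cursor.description]
combined = cursor.fetchall()
print(column_names, combined)
```

Each output row keeps the values that belonged together in its source table, and the Column5 values land in the third position.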
Use a UNION
SELECT *
FROM TABLE_A
UNION
SELECT *
FROM TABLE_B
UNION will give you only the distinct rows, whereas UNION ALL will give you all rows from both sets combined, including duplicates.
SELECT col FROM t1 UNION SELECT col FROM t2
Union reference.
Sev, since UNION is the solution to what you described and you say it didn't work, perhaps you can post the code you wrote that didn't work, as we are clearly missing part of the picture. Are you positive the second table has the records you want? How do you know for sure?