BigQuery SELECT * WHEN COUNT(DISTINCT value) does not work - google-bigquery

I have a BigQuery table with 30+ columns and I want to SELECT * where session is unique.
I've read almost all the questions on this subject on Stack Overflow, but none helped me achieve the expected result.
I've tried SELECT COUNT(DISTINCT session) FROM table.id, but the problem is that it returns only the session column and I need the whole row.
Then I tried:
SELECT *
FROM `table.id`
WHERE session IN (
SELECT session
FROM `table.id`
GROUP BY session
HAVING COUNT(*) = 1
)
But it returns far fewer rows than SELECT COUNT(DISTINCT session).
So, by the same logic, I tried:
SELECT *, COUNT(DISTINCT session) and SELECT * WHERE COUNT(DISTINCT session)
Neither works.
Can anyone help? Thanks in advance and kind regards,

I want to SELECT * where session is unique ...
Use the query below instead. Note the use of = in COUNT(*) = 1:
SELECT *
FROM `table.id`
WHERE session IN (
SELECT session
FROM `table.id`
GROUP BY session
HAVING COUNT(*) = 1
)

Your query seems alright with HAVING COUNT(*) = 1, as suggested by @Mikhail.
What is wrong is that you are trying to match this result against SELECT COUNT(DISTINCT session).
Note that DISTINCT returns every distinct value, including one copy of each duplicated value. HAVING COUNT(*) = 1, on the other hand, keeps only the values that have no duplicates at all.
For a simple example, if session has: 1, 1, 2, 3
DISTINCT will result in: 1, 2, 3
HAVING COUNT(*) = 1 will result in: 2, 3
Hence the difference you see between the two results.
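To make the 1, 1, 2, 3 example concrete, here is a small sketch that runs both queries. It uses Python's built-in sqlite3 module as a stand-in for BigQuery, and the table name events and the page column are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; `events` and `page` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d")],  # session 1 appears twice
)

# DISTINCT keeps one copy of every session value, duplicated or not.
distinct_sessions = [
    row[0]
    for row in conn.execute("SELECT DISTINCT session FROM events ORDER BY session")
]

# HAVING COUNT(*) = 1 keeps only sessions that occur exactly once.
single_sessions = [
    row[0]
    for row in conn.execute(
        "SELECT session FROM events "
        "GROUP BY session HAVING COUNT(*) = 1 ORDER BY session"
    )
]

print(distinct_sessions)  # [1, 2, 3]
print(single_sessions)    # [2, 3]
```

The two lists differ by exactly the duplicated session, which is why the row counts in the question do not match.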

Related

How to include the duplicated row when using HAVING COUNT(*) = 1 in BigQuery

I have a BigQuery table with 30+ columns and I want to SELECT * where session is unique.
I have this query:
SELECT *
FROM `table.id`
WHERE session IN (
SELECT session
FROM `table.id`
GROUP BY session
HAVING COUNT(*) = 1
)
And it works, but I just learned from another question that HAVING COUNT(*) = 1 excludes the duplicated rows entirely:
Note that DISTINCT is used to show distinct records including 1 record from duplicate too. On the other hand HAVING COUNT(*) = 1 is checking only records which are not duplicate.
For a simple example, if session has: 1, 1, 2, 3
DISTINCT will result in: 1, 2, 3
HAVING COUNT(*) = 1 will result in: 2, 3
I need the DISTINCT result, the one that includes one entry for each duplicate.
Can anyone help me? Thanks in advance, kind regards
Maybe ROW_NUMBER?
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY session) as row_num
FROM `table.id`
)
WHERE row_num = 1
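As a rough check of the ROW_NUMBER approach, the sketch below runs it against the same 1, 1, 2, 3 sample. It assumes Python's sqlite3 with SQLite 3.25 or later (needed for window functions) instead of BigQuery; the table events and column page are invented. The ORDER BY page inside OVER is my addition so that the kept row is predictable; without it, which of the duplicate rows gets row_num = 1 is arbitrary.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; `events` and `page` are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d")],
)

# Number the rows inside each session, then keep only the first one.
# ORDER BY page (an assumption, not in the original answer) fixes which row wins.
rows = conn.execute(
    """
    SELECT session, page
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY session ORDER BY page) AS row_num
        FROM events
    )
    WHERE row_num = 1
    ORDER BY session
    """
).fetchall()

print(rows)  # [(1, 'a'), (2, 'c'), (3, 'd')]
```

Every session appears once, including session 1, and the full row comes back rather than just the session value.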

Getting result basis on count of another SQL query

I have a table with the following columns:
bkng_date
bkng_id (varchar)
villa_id (varchar)
This query
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date;
returns the number of records for each date as cnt.
Now I need to find dates in the resultset of this query where cnt = 2.
I tried a couple of subqueries but I'm not getting the desired results.
The simplest, correct and safe solution is to add a having count(*) = 2 clause, as Gordon said.
For completeness, if you are curious how to solve it using subqueries (you didn't name your database vendor, though it very likely supports the having clause), it would be:
select x.bkng_date, x.cnt from (
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
) x
where x.cnt = 2
or
with x as (
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
)
select * from x where cnt = 2
The best option is to use the HAVING clause, as follows:
select bkng_date,count(*) as cnt
from tab_bkng_det
group by bkng_date
having count(*) = 2
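If it helps to see that the HAVING form and the subquery form agree, here is a small sketch using Python's sqlite3 as an assumed stand-in for the unnamed database. The table and column names come from the question; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab_bkng_det (bkng_date TEXT, bkng_id TEXT, villa_id TEXT)"
)
# Invented sample data: one date with 2 bookings, one with 1, one with 3.
conn.executemany(
    "INSERT INTO tab_bkng_det VALUES (?, ?, ?)",
    [
        ("2020-01-01", "b1", "v1"),
        ("2020-01-01", "b2", "v2"),
        ("2020-01-02", "b3", "v1"),
        ("2020-01-03", "b4", "v1"),
        ("2020-01-03", "b5", "v2"),
        ("2020-01-03", "b6", "v3"),
    ],
)

# Filter the groups directly with HAVING.
having_rows = conn.execute(
    """
    SELECT bkng_date, COUNT(*) AS cnt
    FROM tab_bkng_det
    GROUP BY bkng_date
    HAVING COUNT(*) = 2
    """
).fetchall()

# Same filter expressed as a WHERE on a derived table.
subquery_rows = conn.execute(
    """
    SELECT x.bkng_date, x.cnt
    FROM (
        SELECT bkng_date, COUNT(*) AS cnt
        FROM tab_bkng_det
        GROUP BY bkng_date
    ) x
    WHERE x.cnt = 2
    """
).fetchall()

print(having_rows)    # [('2020-01-01', 2)]
print(subquery_rows)  # [('2020-01-01', 2)]
```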

Get number of results but part of resultset

I have queries that can return large result sets (> 100K rows). I need to display the number of results to the user, and the user is able to page through the results in our application. However, nobody is going to page through 100K items when 25 are displayed on a page. So I want to limit the number of pageable results to 5K while still displaying the total number of results to the user.
Of course, I can fire two separate queries at the database: one counting all results, one returning the TOP(5000). But the queries can be expensive.
Is there a smart way to combine these two queries into one? The queries below are oversimplified:
SELECT COUNT(*) FROM TABLE WHERE field = 1;
SELECT TOP(5000) * FROM TABLE Where field = 1;
Can anyone help?
Do a cross join:
select * from
(SELECT TOP(5000) * FROM TABLE Where field = 1) a
,(SELECT COUNT(*) as cnt FROM TABLE WHERE field = 1) b
You can try the query below:
SELECT TOP(5000) *,(SELECT COUNT(*) FROM TABLE WHERE field = 1) AS cnt
FROM TABLE Where field = 1;
You can try the query below:
SELECT TOP 5000 *,
COUNT(*)
OVER(ORDER BY (SELECT NULL))
FROM table
WHERE field = 1;
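The window-function variant relies on COUNT(*) OVER being computed over all rows that pass the WHERE clause before the row limit is applied. A minimal sketch of that behaviour, assuming Python's sqlite3 (SQLite 3.25+) rather than SQL Server, so LIMIT replaces TOP; the items table and its contents are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, field INTEGER)")
# 20 invented rows; the 10 rows with an even id have field = 1.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, 1 if i % 2 == 0 else 0) for i in range(20)],
)

# LIMIT 3 plays the role of TOP(5000): the page is cut to 3 rows,
# but the window count still covers every row matching the WHERE clause.
rows = conn.execute(
    """
    SELECT id, COUNT(*) OVER () AS total
    FROM items
    WHERE field = 1
    ORDER BY id
    LIMIT 3
    """
).fetchall()

print(rows)  # [(0, 10), (2, 10), (4, 10)]
```

Whether this is actually cheaper than two separate queries depends on the engine and the query plan, so it is worth measuring on the real data.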

Get Total Sum with User Sum

SQL Table:
UserId ReportsRead
1 4
2 6
3 5
I would like to query that table so that I can get the following out:
UserId ReportsRead TotalReports
1 4 15
The problem is that, because I apply the WHERE clause, the sum I get is the same as that user's ReportsRead.
SELECT UserId, ReportsRead, SUM(ReportsRead) AS TotalReports FROM MyTable WHERE UserId = 1
Is there a built-in function that will allow me to do this? I would like to avoid subqueries entirely.
I don't usually recommend subqueries in this situation, but in this case, it seems like a simple approach:
SELECT UserId, ReportsRead,
(SELECT SUM(ReportsRead) from MyTable) AS TotalReports
FROM MyTable
WHERE UserId = 1;
If you want rows for all users, then window functions are the way to go:
select t.*, sum(reportsread) over () as totalreports
from mytable t;
However, you can't add a where clause to this same query and still expect to get the correct total, because the window function only sees the rows that pass the filter.
Use the sum window function.
SELECT UserId, ReportsRead, SUM(ReportsRead) OVER() AS TotalReports
FROM MyTable
To get a specific UserId, apply the filtering condition in an outer query, like this:
SELECT *
FROM (SELECT UserId, ReportsRead, SUM(ReportsRead) OVER() AS TotalReports
FROM MyTable
) t
WHERE UserId=1
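The difference between filtering inside and outside the windowed query can be checked with the three rows from the question. This sketch assumes Python's sqlite3 (SQLite 3.25+) as the engine; the variable names are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (UserId INTEGER, ReportsRead INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(1, 4), (2, 6), (3, 5)])

# WHERE at the same level: the window only sees user 1, so the "total" is 4.
filtered_inside = conn.execute(
    """
    SELECT UserId, ReportsRead, SUM(ReportsRead) OVER () AS TotalReports
    FROM MyTable
    WHERE UserId = 1
    """
).fetchall()

# WHERE in an outer query: the window sees all rows first, so the total is 15.
filtered_outside = conn.execute(
    """
    SELECT *
    FROM (
        SELECT UserId, ReportsRead, SUM(ReportsRead) OVER () AS TotalReports
        FROM MyTable
    ) t
    WHERE UserId = 1
    """
).fetchall()

print(filtered_inside)   # [(1, 4, 4)]
print(filtered_outside)  # [(1, 4, 15)]
```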

Adding a count() column on each row

I'm not sure if this is even a good question or not.
I have a complex query with lots of unions that searches multiple tables for a certain keyword (user input). All the tables that are searched are related to the table book.
There is paging on the result set using LIMIT, so there is always a maximum of 10 results returned.
However, I want an extra column in the result set displaying the total number of results found. I do not want to do this using a separate query. Is it possible to add a count() column to the result set that counts every result found?
The output would look like this:
ID Title Author Count(...)
1 book_1 auth_1 23
2 book_2 auth_2 23
4 book_4 auth_.. 23
...
Thanks!
This won't add the count to each row, but one way to get the total count without running a second query is to run your first query using the SQL_CALC_FOUND_ROWS option and then select FOUND_ROWS(). This is sometimes useful if you want to know how many total results there are so you can calculate the page count.
Example:
select SQL_CALC_FOUND_ROWS ID, Title, Author
from yourtable
limit 0, 10;
SELECT FOUND_ROWS();
From the manual:
http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_found-rows
The usual way of counting in a query is to group on the fields that are returned:
select ID, Title, Author, count(*) as Cnt
from ...
group by ID, Title, Author
order by Title
limit 0, 10
The Cnt column will contain the number of records in each group, i.e. for each title.
Regarding second query:
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl
cross join (select count(*) as cnt from tbl) as x
If you are not joining to other tables:
select tbl.id, tbl.title, tbl.author, x.cnt
from tbl, (select count(*) as cnt from tbl) as x
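To see how the cross join behaves together with paging, here is a small sketch. It assumes Python's sqlite3 instead of MySQL, and it fills tbl with 23 invented books to match the sample output in the question; the ORDER BY and LIMIT 10 are added to mimic one page of results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, title TEXT, author TEXT)")
# 23 invented rows, ids 1..23.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(i, f"book_{i}", f"auth_{i}") for i in range(1, 24)],
)

# The one-row count subquery is cross joined to every row,
# so each row of the page carries the total, not the page size.
rows = conn.execute(
    """
    SELECT tbl.id, tbl.title, tbl.author, x.cnt
    FROM tbl
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM tbl) AS x
    ORDER BY tbl.id
    LIMIT 10
    """
).fetchall()

print(len(rows))  # 10
print(rows[0])    # (1, 'book_1', 'auth_1', 23)
```

Note that the count subquery has to repeat the same filters as the main query, so for a complex union this still evaluates the search twice inside one statement.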
My Solution:
SELECT COUNT(1) over(partition BY text) totalRecordNumber
FROM (SELECT 'a' text, id_consult_req
FROM consult_req cr);
If your problem is simply the speed or cost of running a second (complex) query, I would suggest you select the result set into a hash table and count the rows from there while returning them. Even more efficiently, use the row count of the previous result set; then you do not have to recount at all.
This will add the total count on each row:
select count(*) over (order by (select 1)) as Cnt,*
from yourtable
Here is your answer:
SELECT *, @cnt count_rows FROM (
SELECT *, (@cnt := @cnt + 1) row_number FROM your_table
CROSS JOIN (SELECT @cnt := 0 AS variable) t
) t;
You simply cannot do this, you'll have to use a second query.