BigQuery CASE SENSITIVE query with LIMIT clause is not working? - google-bigquery

When running a BigQuery query such as:
Select Campaign FROM TABLE WHERE Campaign CONTAINS 'buy' GROUP BY Campaign IGNORE CASE LIMIT 100
the IGNORE CASE clause does not work when combined with the LIMIT clause.
It did work some time ago.
Is this a BigQuery fault, or has something changed?
Thanks a lot
Ramiro

A couple of things here:
Legacy SQL expects IGNORE CASE to appear at the end of the query, so you need to use LIMIT 100 IGNORE CASE instead of IGNORE CASE LIMIT 100
The BigQuery team advises using standard SQL instead of legacy SQL if you're working on new queries, since it tends to have better error messages, better performance, etc. and it's where we're focusing our efforts going forward. You may also be interested in the migration guide.
If you want to use standard SQL for your query, you could do:
Select LOWER(Campaign) AS Campaign
FROM TABLE
WHERE LOWER(Campaign) LIKE '%buy%'
GROUP BY LOWER(Campaign)
LIMIT 100
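The LOWER-based pattern above can be tried locally. The following is only a minimal sketch that runs the same query shape against an in-memory SQLite database rather than BigQuery; the table name `campaigns`, the sample rows, and the helper `campaigns_containing` are assumptions made for illustration, and an ORDER BY is added purely so the output is deterministic.

```python
import sqlite3

def campaigns_containing(conn, needle, limit=100):
    """Return distinct lower-cased campaign names containing `needle`, ignoring case."""
    sql = """
        SELECT LOWER(Campaign) AS Campaign
        FROM campaigns                      -- hypothetical table name
        WHERE LOWER(Campaign) LIKE '%' || LOWER(?) || '%'
        GROUP BY LOWER(Campaign)
        ORDER BY LOWER(Campaign)            -- added only for a stable result order
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (needle, limit))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (Campaign TEXT)")
conn.executemany(
    "INSERT INTO campaigns VALUES (?)",
    [("Buy Now",), ("buy now",), ("BUY MORE",), ("Sell",)],
)

result = campaigns_containing(conn, "buy")
print(result)
```

Rows that differ only in case collapse into one group because both the filter and the GROUP BY operate on the lower-cased value.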

Related

Big Query - different number of users when using legacy and normal sql

I have written a query in Google BigQuery and want to get the same number of users I see in Google Analytics. I ran it in both legacy and standard SQL and ended up with 3 different user counts, while the session counts were the same. What did I do wrong, or does anyone have an explanation or solution? Any help is appreciated!
Normal SQL
SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS
NULL,0,totals.visits)) AS sessions
FROM `XXX.XXX.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20181120' AND '20181120'
Legacy SQL
SELECT COUNT(DISTINCT fullVisitorId) AS users, SUM(IF(totals.visits IS
NULL,0,totals.visits)) AS sessions
FROM TABLE_DATE_RANGE([XXX:XXX.ga_sessions_], TIMESTAMP('2018-11-20'),
TIMESTAMP('2018-11-20'))
I think this warning from the documentation explains what is happening:
In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard SQL, it returns an exact count.
Standard SQL returns the exact number. You can test this by using EXACT_COUNT_DISTINCT() in legacy SQL.

Replace specific characters within SQL query

I'm struggling with some special characters that work fine in my SQL query but cause problems in a secondary system (Excel), so I would like to replace them during the query if possible.
TRANSACTIONS
ID DESC
1 14ft
2 15/16ft
3 17ft
This is just a dummy example, but "/" represents one of the characters I need to remove, and there are a few different ones. Although it should technically work, I can't use:
select ID, case when DESC = '15/16ft' then '15_16ft' else DESC end from TRANSACTIONS
I can't keep track of all the strings, so I need an approach based on individual characters. I'd prefer converting them to another character or removing them altogether.
Unfortunately I'm not sure of the exact DB engine, although there is a good chance it's an IBM-based product; most "generic" SQL queries tend to run fine. And just to emphasize: I'm looking to convert the data within the SQL query, not update the database records. Thanks a lot!
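One character-based approach is to nest REPLACE calls, one per unwanted character. The sketch below is only an illustration under assumptions: it runs against SQLite (the asker's engine is unknown, so whether REPLACE is available there is not confirmed), and the `BAD_CHARS` list, the `CLEAN_DESC` alias, and the helper `build_replace_expr` are hypothetical names. The column is quoted as "DESC" because DESC is a reserved word.

```python
import sqlite3

# Hypothetical list of characters that cause trouble downstream.
BAD_CHARS = ["/", "\\", "*"]

def build_replace_expr(column, bad_chars, replacement="_"):
    """Build a nested REPLACE(...) expression plus its bound parameters."""
    expr = column
    params = []
    for ch in bad_chars:
        # Each new REPLACE wraps the previous expression, so the placeholders
        # appear in the same left-to-right order as the params list.
        expr = f"REPLACE({expr}, ?, ?)"
        params.extend([ch, replacement])
    return expr, params

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE TRANSACTIONS (ID INTEGER, "DESC" TEXT)')
conn.executemany(
    "INSERT INTO TRANSACTIONS VALUES (?, ?)",
    [(1, "14ft"), (2, "15/16ft"), (3, "17ft")],
)

expr, params = build_replace_expr('"DESC"', BAD_CHARS)
rows = conn.execute(
    f"SELECT ID, {expr} AS CLEAN_DESC FROM TRANSACTIONS ORDER BY ID", params
).fetchall()
print(rows)
```

The stored rows are untouched; only the query output is cleaned. Passing an empty string as `replacement` removes the characters instead of substituting them.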

How to paginate results in Legacy SQL

We are using Legacy SQL for a specific request. We can't use standard SQL for internal reasons.
We would like to paginate our results because we have a lot of rows, like this:
SELECT ... FROM ... LIMIT 10000 OFFSET 30000 -- In standard SQL
But OFFSET doesn't exist in Legacy SQL. How can we do the same thing?
Edit:
I don't want to order. I want to paginate: for example, get 1000 rows after skipping 2000 rows. A simple LIMIT clause with an offset, as in a traditional SQL database or in BigQuery Standard SQL.
I want to do this using BigQuery Legacy SQL.
The pagination you're talking about is done via the tabledata.list API.
Based on your question and follow-up comments, it might be the way to go, even though it does not involve querying: just the API or the related method in the client of your choice.
The pageToken parameter allows you to page through the results.
By the way, another benefit of this approach is that it is free of charge.
If you still need to paginate via a query, your option is to use ROW_NUMBER().
In that case, you can prepare your data in a temp table with the query below:
SELECT <needed fields>, ROW_NUMBER() OVER() num
FROM `project.dataset.table`
Then, you can page it using num
SELECT <needed fields>
FROM `project.dataset.temp`
WHERE num BETWEEN 10000 AND 30000
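The ROW_NUMBER() approach can be sketched end to end as follows. This is an illustration only: it uses SQLite (window functions need SQLite 3.25 or newer) instead of BigQuery, and the `source` and `numbered` tables and the `fetch_page` helper are hypothetical. The window is ordered by `id` here just to make the numbering reproducible. Because BETWEEN is inclusive on both ends, the page bounds are computed so that consecutive pages do not overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 26)],  # 25 sample rows
)

# Step 1: materialize the data once, attaching a row number to every row.
conn.execute("""
    CREATE TEMP TABLE numbered AS
    SELECT id, payload, ROW_NUMBER() OVER (ORDER BY id) AS num
    FROM source
""")

def fetch_page(conn, page, page_size):
    """Return one zero-based page of rows from the numbered temp table."""
    first = page * page_size + 1      # row numbers start at 1
    last = first + page_size - 1      # BETWEEN includes both bounds
    return conn.execute(
        "SELECT id, payload FROM numbered WHERE num BETWEEN ? AND ? ORDER BY num",
        (first, last),
    ).fetchall()

page0 = fetch_page(conn, 0, 10)
page2 = fetch_page(conn, 2, 10)
print(len(page0), len(page2))
```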

Beginner SQL section: avoiding repeated expression

I'm entirely new at SQL, but let's say that on the StackExchange Data Explorer, I just want to list the top 15 users by reputation, and I wrote something like this:
SELECT TOP 15
DisplayName, Id, Reputation, Reputation/1000 As RepInK
FROM
Users
WHERE
RepInK > 10
ORDER BY Reputation DESC
Currently this gives an Error: Invalid column name 'RepInK', which makes sense, I think, because RepInK is not a column in Users. I can easily fix this by saying WHERE Reputation/1000 > 10, essentially repeating the formula.
So the questions are:
Can I actually use the RepInK "column" in the WHERE clause?
Do I perhaps need to create a virtual table/view with this column, and then do a SELECT/WHERE query on it?
Can I name an expression, e.g. Reputation/1000, so I only have to repeat the names in a few places instead of the formula?
What do you call this? A substitution macro? A function? A stored procedure?
Is there an SQL quicksheet, glossary of terms, language specification, anything I can use to quickly pick up the syntax and semantics of the language?
I understand that there are different "flavors"?
Can I actually use the RepInK "column" in the WHERE clause?
No, but you can rest assured that your database will evaluate (Reputation / 1000) once, even if you use it both in the SELECT fields and within the WHERE clause.
Do I perhaps need to create a virtual table/view with this column, and then do a SELECT/WHERE query on it?
Yes, a view is one option to simplify complex queries.
Can I name an expression, e.g. Reputation/1000, so I only have to repeat the names in a few places instead of the formula?
You could create a user-defined function, called something like convertToK, which would receive the rep value as an argument and return that argument divided by 1000. However, this is often not practical for a trivial case like the one in your example.
Is there an SQL quicksheet, glossary of terms, language specification, anything I can use to quickly pick up the syntax and semantics of the language?
I suggest practice. You may want to start following the mysql tag on Stack Overflow, where many beginner questions are asked every day. Download MySQL, and when you think there's a question within your reach, try to go for the solution. I think this will help you pick up speed, as well as awareness of the language's features. There's no need to post the answer at first, because there are some pretty fast guns on the topic over here, but with some practice I'm sure you'll be able to bring home some points :)
I understand that there are different "flavors"?
The flavors are actually extensions to ANSI SQL. Database vendors usually augment the SQL language with extensions such as Transact-SQL and PL/SQL.
You could simply re-write the WHERE clause
where reputation > 10000
This won't always be convenient. Alternatively, you can use an inline view:
SELECT
a.DisplayName, a.Id, a.Reputation, a.RepInK
FROM
(
SELECT TOP 15
DisplayName, Id, Reputation, Reputation/1000 As RepInK
FROM
Users
ORDER BY Reputation DESC
) a
WHERE
a.RepInK > 10
Regarding something like named expressions: while there are several possible alternatives, the query optimizer is going to do best if you just write out the formula Reputation / 1000 long-hand. If you really need to run a whole group of queries using the same evaluated value, your best bet is to create a view with the field defined, but you wouldn't want to do that for a one-off query.
As an alternative, (and in cases where performance is not much of an issue), you could try something like:
SELECT TOP 15
DisplayName, Id, Reputation, RepInk
FROM (
SELECT DisplayName, Id, Reputation, Reputation / 1000 as RepInk
FROM Users
) AS u -- "table" is a reserved word, so use a different alias
WHERE u.RepInk > 10
ORDER BY Reputation DESC
though I don't believe that's supported by all SQL dialects and, again, the optimizer is likely to do a much worse job with this kind of thing (since it will run the SELECT against the full Users table and then filter that result). Still, for some situations this sort of query is appropriate (there's a name for this... I'm drawing a blank at the moment).
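To experiment with the subquery form without access to the Data Explorer, a rough sketch in SQLite follows. It is an assumption-laden stand-in: SQLite uses LIMIT rather than SQL Server's TOP, the three sample users are made up, and the alias `u` is arbitrary. The point is only that the computed column gets a name in the inner SELECT, so the outer WHERE can refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Id INTEGER, DisplayName TEXT, Reputation INTEGER)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "alice", 25000), (2, "bob", 9000), (3, "carol", 12500)],  # made-up rows
)

# The inner SELECT names the expression; the outer query filters on that name.
sql = """
    SELECT DisplayName, Id, Reputation, RepInK
    FROM (
        SELECT DisplayName, Id, Reputation, Reputation / 1000 AS RepInK
        FROM Users
    ) AS u
    WHERE u.RepInK > 10
    ORDER BY Reputation DESC
    LIMIT 15   -- SQLite's counterpart of SQL Server's TOP 15
"""
top_users = conn.execute(sql).fetchall()
print(top_users)
```

Since Reputation is an integer column, Reputation / 1000 is integer division here, so 12500 becomes 12.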
Personally, when I started out with SQL, I found the W3 schools reference to be my constant stopping-off point. It fits my style for being something I can glance at to find a quick answer and move on. Eventually, however, to really take advantage of the database it is necessary to delve into the vendor's documentation.
Although SQL is "standardized", unfortunately (though, to some extent, fortunately), each database vendor implements its own version with its own extensions, which can lead to quite different syntax being the most appropriate (for a discussion of the incompatibilities of various databases on one issue, see the SQLite documentation on NULL handling). In particular, standard functions, e.g., for handling DATEs and TIMEs, tend to differ per vendor, and there are other, more drastic differences (particularly in not supporting subselects or not properly handling JOINs). If you care for some of the details, this document provides both the standard forms and deviations for several major databases.
You CAN refer to RepInK in the Order By clause, but in the Where clause you must repeat the expression. But, as others have said, it will only be executed once.
There are good answers for the technical problem already, so I'll only address some of the rest of your questions.
If you're just working with the DataExplorer, you'll want to familiarize yourself with SQL Server syntax since that's what it's running. The best place to find that, of course, is MSDN's reference.
Yes, there are different variations in SQL syntax. For example, the TOP clause in the query you gave is SQL Server specific; in MySQL you'd use the LIMIT clause instead (and these keywords don't necessarily appear in the same spot in the query!).

Maximum values in wherein clause of mysql

Does anyone know how many values I can put in a WHERE ... IN clause? I have 25000 values in a WHERE ... IN clause and MySQL is unable to execute the query. Any thoughts?
Although this is old, it still shows up in search results so is worth answering.
There is no hard-coded maximum in MySQL for the length of a query. This includes all parts of the query such as the WHERE clause.
However, there is a value called max_allowed_packet which determines the largest query you can run on the MySQL server process. It isn't to do with the number of elements in the query, but the total length of the query. So
SELECT * FROM mytable WHERE mycol IN (1,2,3);
is less likely to hit the limit than
SELECT * FROM mytable WHERE mycol IN ('This string','That string','Tother string');
The value of max_allowed_packet is configurable from server to server. But almost certainly, if you find yourself hitting the limit because you're writing SQL statements of epic length (rather than dealing with binary data which is a legitimate reason to hit it), then you need to re-think your SQL.
I think that if this restriction is a problem then you're doing something wrong.
Perhaps you could store the data from your where clause in a table and then join with it. This would probably be more efficient.
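The table-and-join idea can be sketched as below. This is a minimal illustration run against SQLite so that it is reproducible; in MySQL the corresponding statements would be CREATE TEMPORARY TABLE and INSERT IGNORE. The `wanted` table, the `id` key column, the sample data, and the helper `select_by_ids` are all assumptions made for the example.

```python
import sqlite3

def select_by_ids(conn, wanted_ids):
    """Load the lookup values into a temp table, then join instead of using IN (...)."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")  # start from an empty lookup table
    conn.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)",
        ((i,) for i in wanted_ids),
    )
    return conn.execute(
        "SELECT m.id, m.mycol "
        "FROM mytable AS m JOIN wanted AS w ON w.id = m.id "
        "ORDER BY m.id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycol TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, f"value-{i}") for i in range(1, 50001)],
)

# 25000 lookup values, matching the size mentioned in the question.
rows = select_by_ids(conn, range(1, 50001, 2))
print(len(rows))
```

The SQL text stays short no matter how many values are looked up, so the statement length no longer grows with the list.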
I think it has something to do with execution time.
I think you are doing something like this (correct me if I am wrong):
SELECT * FROM table WHERE V1='A1' AND V2='A1' AND V3='A3' AND ... Vn='An'
There is always an efficient way to write your SELECT. When working with a database, it is important to keep in mind that seconds matter.
If you can share what your query looks like, we can help you write an efficient SELECT statement.
I wish you success.