Difference between "Preview" and Query in BigQuery

I have the following table schema:
+------+---------+----------+
| chn  | INTEGER | NULLABLE |
| size | STRING  | NULLABLE |
| char | STRING  | REPEATED |
| ped  | INTEGER | NULLABLE |
+------+---------+----------+
When I click on 'preview' in the Google BigQuery Web UI, I get the following result:
But when I query my table, I get this result:
It seems like "preview" interprets my repeated field as an array. I would like to get the same result in a query, to limit the number of rows.
I did try unchecking "Use Legacy SQL", which gave me the same result as the preview, but the problem is that with my table the same query takes ~1.0 second to execute with "Use Legacy SQL" checked and ~12 seconds when it's unchecked.
I am looking for speed here, so unfortunately not using Legacy SQL is not an option...
Is there another way to render my repeated field the way the "preview" does?
Thanks for the help :)

In legacy SQL, BigQuery flattens the result of queries by default. This means two things:
All child fields of RECORD fields are propagated to the top-level, with their names changed from record.subrecord.leaf to record_subrecord_leaf. Parent records are removed from the schema.
All repeated fields are converted to fields of optional mode, with each repeated value expanded into its own row. (As a side note, this step is very similar to the FLATTEN function exposed in legacy SQL.)
What you see here is a product of #2. Each repeated value is becoming its own row (as you can see by the row count on the left-hand side in your two images) and the values from the other columns are, well, repeated for each new row.
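As an illustration (table and values are hypothetical), a single stored row whose repeated field holds two values comes back as two rows under legacy SQL's default flattening, but as one row containing an array under standard SQL:

-- One stored row: chn = 1, char = ['a', 'b']
-- Legacy SQL (flattened by default): two result rows
SELECT chn, char FROM [mydataset.mytable]
-- chn = 1, char = 'a'
-- chn = 1, char = 'b'
-- Standard SQL (never flattened): one result row with an array
SELECT chn, char FROM `mydataset.mytable`
-- chn = 1, char = ['a', 'b']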
You can prevent this behavior and receive "unflattened results" in a couple ways.
Using standard SQL, as you note in your original question. All standard SQL queries return unflattened results.
While using legacy SQL, setting the flattenResults parameter to false. This also requires specifying a destination table and setting allowLargeResults to true. These can be found in the Show Options panel beneath the query editor if you want to set them within the UI. Mikhail has some good suggestions for managing the temporary-ness of destination tables if you aren't interested in keeping them around.
I should note that there are a number of corner cases with legacy SQL with flattenResults set to false which might trip you up if you start writing more complex queries. A prominent example is that you can't output more than one independently repeated field in query results using legacy SQL, but you can output multiple with standard SQL. These issues are unlikely to be resolved in legacy SQL, and going forward we're suggesting people use standard SQL when they run into them.
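For example, this standard SQL query is legal, while no legacy SQL query can return the same two-independent-arrays shape in a single row (the array literals are just stand-ins):

SELECT [1, 2] AS nums, ['a', 'b'] AS letters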
If you could provide more details about your much slower query using standard SQL (e.g. job ID in legacy SQL, job ID in standard SQL, for comparison), I, and the rest of the BigQuery team, would be very interested in investigating further.

Is there another way to render my repeated field like it does in the "preview"?
To see the original, unflattened output in the Web UI with Legacy SQL, I used to set the respective options (click Show Options) to actually write the output to a table, with Allow Large Results checked and Flatten Results unchecked.
This not only saves the result into a table but also shows the result the same way the preview does (because it is, in fact, a preview of that table). To make sure the table gets removed afterwards, I have a "dedicated" dataset (temp) with a default expiration set to 1 day (or an hour, depending on how aggressive you want to be with your junk), so you don't need to worry about those tables: they get deleted automatically for you. Worth noting: this was quite a common pattern for us, and having to set the extra options got boring, so we ended up with our own custom UI that does all of this for the user automatically.
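For what it's worth, such a throwaway dataset can nowadays also be created with standard SQL DDL (the dataset name is just an example; the classic UI/API route works too):

CREATE SCHEMA temp OPTIONS (default_table_expiration_days = 1);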

What you see is called flattening.
By default the UI flattens the query output; there is currently no option to show query results the way you want. In order to produce unflattened results you must write them to a table, but that's a different thing.

Related

MS Access query (MS Access freezes)

I have this report and need to add totals for each person (the red circle in the screenshots):
existing report
new report
I cannot change the existing report, so I export data from MS SQL to MS Access and create a new report there. I got it working for one employee but have trouble with a query that would work for multiple employees.
This query extracts the data used as input:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[EMP_ID])=376) And (([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
This query populates the report:
SELECT *
FROM TIME1
WHERE RCD_NUM = (SELECT Max(RCD_NUM) FROM [TIME1] UQ WHERE UQ.PPERIOD = [TIME1].PPERIOD AND UQ.PC = [TIME1].PC);
The problem is that if I remove EMP_ID from the first query, like this:
SELECT [TIME].[RCD_NUM], [TIME].[EMP_ID], [TIME].[PPERIOD], [TIME].[PRUN], [TIME].[TDATE], [TIME].[PC], [TIME].[RATE], [TIME].[HOURS], [TIME].[AMOUNT], [TIME].[JOB_ID], [TIME].[UPDATED], [TIME].[UPDATED_BY], [TIME].[LOG_DATE], [TIME].[ORIGINAL_REC_NUM]
FROM [TIME]
WHERE ((([TIME].[TDATE])<=#12/31/2006# And ([TIME].[TDATE])>=#1/1/2006#) And (([TIME].[PC])<599));
then the second query doesn't work, and MS Access freezes when running it.
Any help or ideas, please?
Caveat: I won't pretend to know the precise cause of the problem, but I have had to repeatedly refactor queries in Access to get them working even though the original SQL statements were completely valid in syntax and logic. Sometimes I've had to convolute a sequence of queries just to avoid bugs in Access. Access is often rather dumb and will simply (re)execute queries and subqueries exactly as given, without optimization. At other times Access will attempt to combine queries by performing some internal optimizations, but sometimes those introduce frustrating bugs. Something as simple as a name change or column reordering can be the difference between a functioning query and one that crashes or freezes Access.
First consider:
Can you leave the data on SQL Server and link to the results in Access (rather than exporting/importing it into Access)? Even if you need or prefer Access for creating the actual report, you could use all the power of SQL Server for querying the data; it is likely less buggy and more efficient.
Common best practice is to create SQL Server stored procedures that return just the data you need in Access (see the sketch after this list). A pass-through query is created in Access to retrieve the data, but all data operations are performed on the server.
Perhaps this is just a performance issue where limiting the set by [EMP_ID] selects a small subset, but the full table is large enough to "freeze" Access.
How long have you let Access remain frozen before killing the process? Be patient... like many, many minutes (or hours). Start it in the morning and check after lunch. :) It might eventually return a result set. This does not imply it is tolerable or that there is no other solution, but it can be useful to know if it eventually returns data or not.
How many possible records are there?
Are the imported data properly indexed? Add indexes to all key fields and those which are used in WHERE clauses.
Is the database located on a network share or is it local? Try copying the database to a local drive.
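As a sketch of the stored-procedure route mentioned above (the procedure and parameter names are made up; adjust to your schema):

CREATE PROCEDURE dbo.GetTimeRecords
    @StartDate DATE,
    @EndDate   DATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Return only the rows the Access report needs; all filtering runs on the server.
    SELECT RCD_NUM, EMP_ID, PPERIOD, PRUN, TDATE, PC, RATE, HOURS, AMOUNT
    FROM dbo.[TIME]
    WHERE TDATE BETWEEN @StartDate AND @EndDate
      AND PC < 599;
END

The pass-through query in Access would then contain nothing but EXEC dbo.GetTimeRecords '2006-01-01', '2006-12-31'.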
Other hints:
Try the BETWEEN operator for dates in the WHERE clause (e.g. [TIME].[TDATE] BETWEEN #1/1/2006# AND #12/31/2006#).
Try refactoring the "second" query by performing a join in the FROM clause rather than the WHERE clause. In doing this, you may also want to save the subquery as a named query (just as [TIME1] is saved). Whether or not a query is saved or embedded in another statement CAN change the behavior of Access (see caveat) even though the results should be identical.
Here's a version with the embedded aggregate query. Notice how all column references are qualified with their source. Some of the original query's columns do not have a source alias prefixing the column name; remember the caveat, such picky details can affect Access behavior:
SELECT TIME1.*
FROM TIME1 INNER JOIN
(SELECT UQ.PPERIOD, UQ.PC, Max(UQ.RCD_NUM) As Max_RCD_NUM
FROM [TIME1] UQ
GROUP BY UQ.PPERIOD, UQ.PC) As TIMEAGG
ON (TIME1.PPERIOD = TIMEAGG.PPERIOD) And (TIME1.PC = TIMEAGG.PC)
AND (TIME1.RCD_NUM = TIMEAGG.Max_RCD_NUM)

Endeca search query on multiple fields

How do I create an Endeca query on a combination of multiple fields (just like a WHERE clause in a SQL query)? Suppose the following three fields are indexed:
empId
empName
empGender
Now, I need a query like "where empName like 's%' AND empGender=male"
Thanks.
First, check out Record Filters in the Advanced Development Guide.
If you want to use a Record Filter on a property, you will need to enable it explicitly for that property in Developer Studio, while your Dimensions automatically have the ability to apply a Record Filter. This helps when you have explicit values to filter on, for example empGender.
Your Record Filter can then look as follows:
Nr=AND(empGender:male)
You can further use the Ntk parameter to specify fields to search on, so assuming your empName field is enabled for wildcard searching (configure this in Developer Studio), searching this field will look as follows:
Ntk=empName&Ntt=s*
So assuming your properties have been configured correctly, your example above will probably end up looking as follows:
Nr=AND(empGender:male)&Ntk=empName&Ntt=s*
To take this one step further, you can specify Search Filters (i.e. Ntk + Ntt parameters) together. I haven't tried this with wildcards, so you'll need to confirm it yourself, but to combine Search Filters you delimit them with |:
Ntk=empName|empId&Ntt=s*|1234*
I suggest you manually build up queries in the Reference Application to confirm you get your expected results and then start to code this up in your application.
radimbe, the problem with record filters for this use case is that they need to be precise. This means you don't get spelling correction, thesaurus expansion, case insensitivity or stemming. It's very unlikely that a user will input precise information like this.
Saraubh, you can do a boolean search to perform OR text-search queries. You can also use the Endeca Query Language to specify a complex set of boolean logic that goes beyond boolean search and incorporates spelling correction, stemming, etc.
In general though, I think for an application like this, you should move away from searching specific individual fields simultaneously and make use of the faceting capabilities of dimensions to guide the user. Additionally, a search box that searches many fields in combination simultaneously in order of importance is really the way to go for a simplified user interface for this sort of application.

How can I make MS Access Query Parameters Optional?

I have a query that I would like to filter in different ways at different times. The way I do this right now is by placing parameters in the criteria field of the relevant query fields; however, there are many cases in which I do not want to filter on a given field but only on the other fields. Is there any way a wildcard of some sort can be passed to the criteria parameter so that I can bypass the filtering for that particular call of the query?
If you construct your query like so:
PARAMETERS ParamA Text ( 255 );
SELECT t.id, t.topic_id
FROM SomeTable t
WHERE t.id Like IIf(IsNull([ParamA]),"*",[ParamA])
All records will be selected if the parameter is not filled in.
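A variant of the same query that avoids wildcards entirely (and, with them, the Query Mode concerns discussed next) is to test the parameter for Null directly:

PARAMETERS ParamA Text ( 255 );
SELECT t.id, t.topic_id
FROM SomeTable t
WHERE t.id = [ParamA] OR [ParamA] Is Null;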
Note the * wildcard with the LIKE keyword will only have the desired effect in ANSI-89 Query Mode.
Many people mistakenly assume the wildcard character in Access/Jet is always *. Not so. Jet has two wildcards: % in ANSI-92 Query Mode and * in ANSI-89 Query Mode.
ADO is always ANSI-92 and DAO is always ANSI-89 but the Access interface can be either.
When using the LIKE keyword in a database object (i.e. something that will be persisted in the mdb file), you should think to yourself: what would happen if someone used this database in a Query Mode other than the one I usually use myself? Say you wanted to restrict a text field to numeric characters only and you'd written your Validation Rule like this:
NOT LIKE "*[!0-9]*"
If someone unwittingly (or otherwise) connected to your .mdb via ADO then the validation rule above would allow them to add data with non-numeric characters and your data integrity would be shot. Not good.
Better IMO to always code for both ANSI Query Modes. Perhaps this is best achieved by explicitly coding for both Modes, e.g.
NOT LIKE "*[!0-9]*" AND NOT LIKE "%[!0-9]%"
But with more involved Jet SQL DML/DDL, this can become very hard to achieve concisely. That is why I recommend using the ALIKE keyword, which uses the ANSI-92 Query Mode wildcard character regardless of Query Mode e.g.
NOT ALIKE "%[!0-9]%"
Note that ALIKE is undocumented (and I assume this is why my original post got marked down). I've tested this in Jet 3.51 (Access 97), Jet 4.0 (Access 2000 to 2003) and ACE (Access 2007) and it works fine. I've previously posted this in the newsgroups and had the approval of Access MVPs. Normally I would steer clear of undocumented features myself, but I make an exception in this case because Jet has been deprecated for nearly a decade, and the Access team who keep it alive don't seem interested in making deep changes to the engines (or bug fixes!), which has the effect of making the Jet engine a very stable product.
For more details on Jet's ANSI Query modes, see About ANSI SQL query mode.
Back to my example from your previous question. Your parameterized query is a string that looks like this:
qr = "Select Tbl_Country.* From Tbl_Country WHERE id_Country = [fid_country]"
Depending on the nature of fid_Country (number, text, GUID, date, etc.), you'll have to replace it with a joker value and the appropriate delimiting characters:
qr = replace(qr,"[fid_country]","""*""")
In order to fully allow wild cards, your original query could also be:
qr = "Select Tbl_Country.* From Tbl_Country _
WHERE id_Country LIKE [fid_country]"
You can then substitute wildcard values for fid_Country, such as:
qr = replace(qr,"[fid_country]","G*")
Once you're done with that, you can use the string to open a recordset:
set rs = currentDb.openRecordset(qr)
I don't think you can. How are you running the query?
I'd say if you need a query that has that many open variables, put it in a VBA module or class and call it, letting it build the string every time.
I'm not sure this helps, because I suspect you want to do this with a saved query rather than in VBA; however, the easiest thing you can do is build up the query line by line in VBA and then create a recordset from it.
A quite hackish way would be to re-write the saved query on the fly and then access that; however, if you have multiple people using the same DB you might run into conflicts, and you'll confuse the next developer down the line.
You could also programmatically pass a default value to the query (as discussed in your previous question).
Well, you can return non-null values by passing * as the parameter for fields you don't wish to use in the current filter. In Access 2003 (and possibly earlier and later versions), if you are using Like [paramName] as your criterion for a numeric, Text, Date, or Boolean field, an asterisk will display all records (that match the other criteria you specify). If you want to return null values as well, you can use Like [paramName] Or Is Null as the criterion so that it returns all records. (This works best if you are building the query in code. If you are using an existing query and you don't want to return null values when you do have a value for filtering, this won't work.)
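In SQL view, that criterion amounts to something like the following (table and field names are placeholders):

SELECT *
FROM SomeTable
WHERE SomeField Like [paramName] OR SomeField Is Null;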
If you're filtering a Memo field, you'll have to try another approach.

Is there a way to parser a SQL query to pull out the column names and table names?

I have 150+ SQL queries in separate text files that I need to analyze (just the actual SQL code, not the data results) in order to identify all column names and table names used, preferably with the number of times each column and table makes an appearance. Writing a brand-new SQL parsing program is trickier than it seems, with nested SELECT statements and the like.
There has to be a program, or code out there that does this (or something close to this), but I have not found it.
I actually ended up using a tool called SQL Pretty Printer. You can purchase a desktop version, but I just used the free online application: copy the query into the text box, set the Output to "List DB Object" and click the Format SQL button.
It worked great on around 150 different (and complex) SQL queries.
How about using the Execution Plan report in MS SQL Server? You can save this to an XML file, which can then be parsed.
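If you go that route, one way to capture the XML plan without actually executing the query is the SHOWPLAN setting (a sketch; the SELECT is a stand-in for one of your queries, and SET SHOWPLAN_XML must be the only statement in its batch):

SET SHOWPLAN_XML ON;
GO
SELECT c.CustomerName, o.OrderDate
FROM Customers c INNER JOIN Orders o ON o.CustomerID = c.CustomerID;
GO
SET SHOWPLAN_XML OFF;
GO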
You may want to look into something like this:
JSqlParser
which uses JavaCC to parse the query string and return it as an object graph. I've never used it, so I can't vouch for its quality.
If your application needs to do it, and has access to a database that has the tables etc., you could run something like:
SELECT TOP 0 * FROM MY_TABLE
Using ADO.NET. This would give you a DataTable instance for which you could query the columns and their attributes.
Please go with ANTLR: write a grammar and follow the steps given on the ANTLR site, and eventually you will get an AST (abstract syntax tree). For a given query, you can then traverse the tree and collect every table and column it references.
In DB2 you can append your query with something such as the following, but 1 is the minimum you can specify; it will throw an error if you try to specify 0:
FETCH FIRST 1 ROW ONLY

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual SQL query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help thinking there must be a more elegant solution, since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the SQL in code. It seems godawful. If I really try to support the requested ability to query any combination of any field in any table, this is going to be one MASSIVE set of if statements. *shiver*
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as a dynamic WHERE clause. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code; it makes testing convenient. Note: when you create your SQL queries in code, don't concatenate in user input. Use your @variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(@name, Name) AND
Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The COALESCE operator turns the null parameter into a self-comparison (Nickname = Nickname), which holds for every non-NULL value and so doesn't restrict the WHERE clause.
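One caveat, which touches on the NULL concern raised in the question: if the column itself is NULL, Nickname = Nickname is not true, so the COALESCE pattern silently drops those rows. A NULL-safe variant tests the parameter instead of the column:

SELECT Name, Nickname
FROM Users
WHERE (@name IS NULL OR Name = @name)
  AND (@nick IS NULL OR Nickname = @nick)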
Search and normalization can be at odds with each other. So probably the first thing to do is to get some kind of "view" that shows all the searchable fields as a single row, with a single key getting you the resume. Then you can put something like Lucene in front of that to give you a full-text index of those rows. The way that works is: you ask it for "x" against this view and it returns the key to you. It's a great solution, and it came recommended by Joel himself on the podcast, within the first 2 months IIRC.
What you need is something like Sphinx Search (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume composed of several fields:
Name,
Address,
Education (this could be a table of its own), or
Work experience (this could grow into its own table, where each row represents a previous job).
Searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead, you could change your frame of reference and think of the whole resume as what it is: a single document, and you just want to search that document.
That is what tools like Sphinx Search do. They create a FULL TEXT index of your 'document', and then you can query Sphinx and it will tell you where in the database that record was found.
Really good search results.
Don't worry about these tools not being part of your RDBMS; it will save you a lot of headaches to use the appropriate model ("documents" rather than "tables") for this application.
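As a rough sketch of the "one document per resume" idea using MySQL's built-in FULLTEXT search (MySQL 5.6+ for InnoDB full text; Sphinx and Lucene play the same role outside the RDBMS, and all names here are hypothetical):

-- All searchable resume fields concatenated into one "document" per resume
CREATE TABLE resume_search (
    resume_id INT PRIMARY KEY,
    body      TEXT,
    FULLTEXT KEY ft_body (body)
) ENGINE = InnoDB;

SELECT resume_id
FROM resume_search
WHERE MATCH(body) AGAINST ('project manager java' IN NATURAL LANGUAGE MODE);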