Use standard SQL queries in java bigquery API - google-bigquery

Is it possible to use standard SQL queries when using java bigquery API?
I am trying to execute a query, but it throws
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
"message" : "11.3 - 11.56: Unrecognized type FLOAT64"

There are two ways to use standard SQL with the BigQuery Java API. The first is to start your query text with #standardSQL, e.g.:
#standardSQL
SELECT ...
FROM YourTable;
The second is to set useLegacySql to false as part of the QueryJobConfiguration object. For example (taken from the documentation):
public static void runStandardSqlQuery(String queryString)
throws TimeoutException, InterruptedException {
QueryJobConfiguration queryConfig =
QueryJobConfiguration.newBuilder(queryString)
// To use standard SQL syntax, set useLegacySql to false.
// See: https://cloud.google.com/bigquery/sql-reference/
.setUseLegacySql(false)
.build();
runQuery(queryConfig);
}
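The first option (the #standardSQL prefix) can be applied defensively before a query is submitted. The following is a minimal sketch, not part of the BigQuery client library: the class name StandardSqlPrefix and the method ensureStandardSql are hypothetical, and the code assumes the prefix is matched case-insensitively and must be followed by a newline. It does not handle a query that already starts with #legacySQL.

```java
final class StandardSqlPrefix {
    private static final String PREFIX = "#standardSQL";

    private StandardSqlPrefix() {}

    /**
     * Returns the query unchanged if it already starts with the prefix
     * (ignoring case and leading whitespace); otherwise prepends the
     * prefix on its own line.
     */
    static String ensureStandardSql(String query) {
        String trimmed = query.trim();
        if (trimmed.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return query;
        }
        return PREFIX + "\n" + query;
    }
}
```

The returned string could then be passed as queryString to a method such as runStandardSqlQuery above, although setting useLegacySql to false already makes the prefix unnecessary there.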

Related

Is "--view_udf_resource" broken?

I would like to reference a UDF inside a View. According to BigQuery documentation ('bq help mk') and to this post How do I create a BigQuery view that uses a user-defined function?, it is possible to do it with the "--view_udf_resource" syntax.
However, when I try it I get the following error:
# gsutil cat gs://mybucket/bar.js
CREATE TEMP FUNCTION GetWord() AS ('fire');
# bq mk --nouse_legacy_sql --view_udf_resource="gs://mybucket/bar2.js" --view="SELECT 1 as one, GetWord() as myvalue" mydataset.myfoo
Error in query string: Function not found: GetWord at [1:18]
I have also tried it with the Java API and I get the same error:
public void foo(){
final String viewQuery = "#standardSQL\n SELECT 1 as one, GetWord() as myvalue";
UserDefinedFunction userDefinedFunction = UserDefinedFunction.inline("CREATE TEMP FUNCTION GetWord() AS ('fire');");
ViewDefinition tableDefinition = ViewDefinition.newBuilder(viewQuery)
.setUserDefinedFunctions(userDefinedFunction)
.build();
TableId viewTableId = TableId.of(projectName, dataSetName, "foobar");
final TableInfo tableInfo = TableInfo.newBuilder(viewTableId, tableDefinition).build();
bigQuery.create(tableInfo);
}
com.google.cloud.bigquery.BigQueryException: Function not found: GetWord at [2:19]
Am I doing something wrong? Or is Google's documentation misleading, and it is not possible to reference a custom UDF from a view?
You cannot (currently) create a view using standard SQL that uses UDFs. You need to have all of the logic be inline as part of the query itself, and the post that you are looking at is about JavaScript UDFs using legacy SQL. There is an open feature request to support permanent registration of UDFs, however, which would enable you to reference UDFs from views.

Major performance difference between Entity Framework generated sp_executesql and direct query in SSMS

I'm using Entity Framework for making a rather large query. Recently this query is failing due to timeout exceptions.
When I started investigating this issue I used LinqPad and directly copied the SQL output in SSMS and ran the query. This query returns within 1 second!
The query then looks like (only for illustration, the real query is much larger)
DECLARE @p__linq__0 DateTime2 = '2017-10-01 00:00:00.0000000'
DECLARE @p__linq__1 DateTime2 = '2017-10-31 00:00:00.0000000'
SELECT
[Project8].[Volgnummer] AS [Volgnummer],
[Project8].[FkKlant] AS [FkKlant],
-- rest omitted for brevity
Next I used SQL Profiler to capture the real SQL sent to the server. The query is exactly the same, except that it is wrapped in a call to sp_executesql, like this:
exec sp_executesql N'SELECT
[Project8].[Volgnummer] AS [Volgnummer],
[Project8].[FkKlant] AS [FkKlant],
-- rest omitted for brevity
',N'@p__linq__0 datetime2(7),@p__linq__1 datetime2(7)',
@p__linq__0='2017-10-01 00:00:00',@p__linq__1='2017-10-31 00:00:00'
When I copy/paste this query into SSMS it runs for 60 seconds, and thus results in a timeout when executed from EF with default settings!
I can't wrap my head around why this difference is occurring, as this is the same query, the only thing is, it is executed differently.
I read a lot about why EF uses sp_executesql and I understand why. I also read that sp_executesql is different from EXEC because it makes use of the queryplan cache, but I don't understand why the SQL optimizer has such difficulty in creating a performant query plan for the sp_executesql version whereas it is capable of creating a performant queryplan for the direct query version.
I'm not sure if the complete query itself adds to the question. If it does, let me know and I will make an edit.
Thanks to the supplied comments I managed two things:
I now understand the query plan and the differences between parameter sniffing and variables in queries
I implemented a DbCommandInterceptor to add OPTION (OPTIMIZE FOR UNKNOWN) to the query when needed.
The SQL query compiled by Entity Framework can be intercepted before it is sent to the server by registering an implementation with DbInterception.
Such an implementation is trivial to make:
public class QueryHintInterceptor : DbCommandInterceptor
{
public override void ReaderExecuting(DbCommand command,
DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
var queryHint = " OPTION (OPTIMIZE FOR UNKNOWN)";
if (!command.CommandText.EndsWith(queryHint))
{
command.CommandText += queryHint;
}
base.ReaderExecuting(command, interceptionContext);
}
}
// Add to the interception process:
DbInterception.Add(new QueryHintInterceptor());
As Entity Framework also caches the queries, I check whether the hint has already been added.
But this approach will intercept all queries, and obviously one should not do this. As the DbCommandInterceptionContext gives access to the DbContext, I added an interface with a single property (ISupportQueryHints) to my DbContext, which I set to an optimization hint when the query needs it.
This now looks like this:
public class QueryHintInterceptor : DbCommandInterceptor
{
public override void ReaderExecuting(DbCommand command,
DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
var dbContext =
interceptionContext.DbContexts.FirstOrDefault(d => d is ISupportQueryHints) as ISupportQueryHints;
if (dbContext != null)
{
var queryHint = $" OPTION ({dbContext.QueryHint})";
if (!command.CommandText.EndsWith(queryHint))
{
command.CommandText += queryHint;
}
}
base.ReaderExecuting(command, interceptionContext);
}
}
Where needed this can be used as:
public IEnumerable<SomeDto> QuerySomeDto()
{
using (var dbContext = new MyQuerySupportingDbContext())
{
dbContext.QueryHint = "OPTIMIZE FOR UNKNOWN";
return this.PerformQuery(dbContext);
}
}
Because my application makes use of a message based architecture surrounding commands and queries as described here my implementation consists of a decorator around the queryhandlers in need of optimization. This decorator sets the query hints to the DbContext whenever needed. This is however an implementation detail. The basic idea stays the same.
I updated @Ric.Net's QueryHintInterceptor class to handle the case where multiple contexts are being used for a query and may have their own hints:
public class QueryHintInterceptor : DbCommandInterceptor
{
public override void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
{
var contextHints = interceptionContext.DbContexts
.Select(c => (c as ISupportQueryHints)?.QueryHint)
.Where(h => !string.IsNullOrEmpty(h))
.Distinct()
.ToList();
var queryHint = $"{System.Environment.NewLine}OPTION ({ string.Join(", ", contextHints) })";
if (contextHints.Any() && !command.CommandText.EndsWith(queryHint))
{
command.CommandText += queryHint;
}
base.ReaderExecuting(command, interceptionContext);
}
}
Although honestly, if you're at that point, you might consider building a more robust solution like the one described here.
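The string handling at the core of these interceptors (collect the non-empty hints, de-duplicate them, append a single OPTION clause, and never append it twice) is independent of Entity Framework. Below is a minimal sketch of just that logic, written in Java for illustration; the class QueryHintAppender and the method appendHints are hypothetical names and do not correspond to any EF API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class QueryHintAppender {
    private QueryHintAppender() {}

    /**
     * Appends "OPTION (h1, h2, ...)" to the command text, skipping null or
     * empty hints and duplicates. The text is returned unchanged when there
     * are no hints or when it already ends with the same clause.
     */
    static String appendHints(String commandText, List<String> hints) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> distinct = new LinkedHashSet<>();
        for (String hint : hints) {
            if (hint != null && !hint.isEmpty()) {
                distinct.add(hint);
            }
        }
        if (distinct.isEmpty()) {
            return commandText;
        }
        String clause = System.lineSeparator()
                + "OPTION (" + String.join(", ", distinct) + ")";
        return commandText.endsWith(clause) ? commandText : commandText + clause;
    }
}
```

Because the result is checked with endsWith before appending, calling appendHints again on its own output leaves the text unchanged, which matters when the same cached command text passes through the interceptor more than once.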

can we use Spark sql for reporting queries in REST web services

A basic question regarding Spark: can we use Spark only in the context of processing jobs? In our use case we have a stream of position and motion data, which we refine and save to Cassandra tables. That is done with Kafka and Spark Streaming. But for a web user who wants to view a report with some search criteria, can we use Spark (Spark SQL)? Or should we restrict ourselves to CQL for this purpose? If we can use Spark, how can we invoke Spark SQL from a web service deployed in a Tomcat server?
Well, you can do it by passing the query via an HTTP URL, like:
http://yourwebsite.com/Requests?query=WOMAN
At the receiving point, the architecture will be something like:
Tomcat+Servlet --> Apache Kafka/Flume --> Spark Streaming --> Spark SQL inside a SS closure
In the servlet (if you don't know what a servlet is, better look it up), located in the web application folder of your Tomcat installation, you will have something like this:
public class QueryServlet extends HttpServlet{
@Override
public void doGet(HttpServletRequest request, HttpServletResponse response){
String requestChoice = request.getQueryString().split("=")[0];
String requestArgument = request.getQueryString().split("=")[1];
KafkaProducer<String, String> producer;
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("acks", "all");
properties.setProperty("retries", "0");
properties.setProperty("batch.size", "16384");
properties.setProperty("auto.commit.interval.ms", "1000");
properties.setProperty("linger.ms", "0");
properties.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.setProperty("block.on.buffer.full", "true");
producer = new KafkaProducer<>(properties);
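// The parameter name is used as the topic, the parameter value as the message.
producer.send(new ProducerRecord<String, String>(
requestChoice,
requestArgument));
// Close the producer so the record is flushed before the request returns.
producer.close();
}
}
In the Spark Streaming application (which needs to be running already in order to catch the queries; otherwise you know how long Spark takes to start), you need to have a Kafka receiver: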
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(batchInt*1000));
Map<String, Integer> topicMap = new HashMap<>();
topicMap.put("wearable", 1);
// The first DStream is a pair made of the topic and the value written to the topic
JavaPairReceiverInputDStream<String, String> kafkaStream =
KafkaUtils.createStream(jssc, "localhost:2181", "test", topicMap);
After this, what happens is that
You issue a GET, either setting the GET body or passing the argument in the query string.
The GET is caught by your servlet, which immediately creates a Kafka producer, sends the message, and closes the producer (it is possible to avoid the Kafka step altogether by sending the information to your Spark Streaming app in any other way; see Spark Streaming receivers).
Spark Streaming runs your Spark SQL code like any other submitted Spark application, but it keeps running, waiting for further queries to come.
Of course, in the servlet you should check the validity of the request, but this is the main idea. Or at least it is the architecture I've been using.
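As an illustration of the validity check mentioned above, here is a minimal sketch that parses a single name=value query string without assuming it is well formed (the servlet above would throw on a request such as ?query). The class QueryRequestParser, its ParsedRequest holder, and the ALLOWED_TOPICS whitelist are all hypothetical; which topics to accept depends on your application.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class QueryRequestParser {
    // Hypothetical whitelist of parameter names that may be used as Kafka topics.
    private static final Set<String> ALLOWED_TOPICS =
            new HashSet<>(Arrays.asList("query", "wearable"));

    static final class ParsedRequest {
        final String topic;
        final String argument;

        ParsedRequest(String topic, String argument) {
            this.topic = topic;
            this.argument = argument;
        }
    }

    private QueryRequestParser() {}

    /** Returns the parsed request, or empty if the query string is not acceptable. */
    static Optional<ParsedRequest> parse(String rawQueryString) {
        // Only a single name=value pair is accepted in this sketch.
        if (rawQueryString == null || rawQueryString.contains("&")) {
            return Optional.empty();
        }
        String[] parts = rawQueryString.split("=", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            return Optional.empty();
        }
        if (!ALLOWED_TOPICS.contains(parts[0])) {
            return Optional.empty();
        }
        try {
            // Decode percent-escapes; malformed escapes raise IllegalArgumentException.
            String argument = URLDecoder.decode(parts[1], "UTF-8");
            return Optional.of(new ParsedRequest(parts[0], argument));
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

In the servlet, an empty result would typically be answered with a 400 status instead of creating a producer at all.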

Is it possible to retrieve the namespace of a raised error?

When I raise an error from within an XQuery query, for instance with:
error( fn:QName( 'http://example.com', 'XMPL0001' ), 'Conflict' )
... the following is returned by BaseX (be it when communicating with the server, or from within the GUI)
Stopped at ., 1/7:
[XMPL0001] Conflict
Is it somehow possible to retrieve the namespace of the error (in this case http://example.com) as well?
I'm using a customized PHP client and I would like to use this information to prevent possible (future) conflicts with my custom error codes and parse the errors to throw either a standard BaseX\Exception or a custom SomeNamespace\Exception, depending on the namespace of the error.
I could, of course, simply use another error code pattern than the typical ABCD1234 XQuery pattern, to prevent possible (future) error code conflicts, but the possible use of a namespace appeals to me more, because I can then define a uniform Exception interface, such as:
interface ExceptionInterface
{
public function getCategory(); // the 4 alpha character part
public function getCode(); // the 4 digit part
}
I'm currently using BaseX 7.7.2, by the way.
Yes, you can retrieve information about the error using a few variables in the error namespace, which are in scope of the try-catch statement, like so:
declare namespace err = "http://www.w3.org/2005/xqt-errors";
try {
error( fn:QName( 'http://example.com', 'XMPL0001' ), 'Conflict' )
}
catch * {
namespace-uri-from-QName($err:code)
}
This assumes that you are using XQuery 3.0.
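Once the namespace URI and the local name of the error code are available on the client, they can be split into the parts named in the ExceptionInterface from the question. The following is a rough sketch in Java (the asker's client is PHP, so this only illustrates the parsing); the class XQueryErrorCode and its members are hypothetical, and the pattern assumes codes of exactly four uppercase letters followed by four digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XQueryErrorCode {
    // Assumed shape of the local name: 4 uppercase letters, then 4 digits.
    private static final Pattern CODE = Pattern.compile("([A-Z]{4})([0-9]{4})");

    final String namespaceUri;
    final String category; // the 4 alpha character part
    final String number;   // the 4 digit part

    private XQueryErrorCode(String namespaceUri, String category, String number) {
        this.namespaceUri = namespaceUri;
        this.category = category;
        this.number = number;
    }

    /** Parses a local name such as "XMPL0001" together with its namespace URI. */
    static XQueryErrorCode of(String namespaceUri, String localName) {
        Matcher m = CODE.matcher(localName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected error code: " + localName);
        }
        return new XQueryErrorCode(namespaceUri, m.group(1), m.group(2));
    }

    /** True if the error was raised in the given (custom) namespace. */
    boolean isInNamespace(String expectedNamespaceUri) {
        return expectedNamespaceUri.equals(namespaceUri);
    }
}
```

A client could then throw its custom exception type when isInNamespace("http://example.com") is true and fall back to the standard one otherwise.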

How to use dynamic LINQ in WCF?

I am using linq-to-entity to retrieve data dynamically and created a method as follows:
public List<object> getDynamicList(string tablename, List<string> colnames)
{
try
{
var query = DynamicQueryable.getDynamicData(dbcontext, tablename, colnames);
List<object> objQueryable = new List<object>();
object obj = query.AsQueryable();
objQueryable.Add(obj);
return objQueryable;
}
catch (Exception ex)
{
HandleError(ex);
}
}
This method in the WCF service internally uses the Dynamic class from the LINQ samples shipped with Visual Studio 2010 (C:\Program Files (x86)\Microsoft Visual Studio 10.0\Samples\1033).
When I pass the table name and columns dynamically, it works on the server side, but on the client side, consuming the method gives the error: The server did not provide a meaningful reply; this might be caused by a contract mismatch, a premature session shutdown or an internal server error.
Does WCF have an issue with an IQueryable return type?
Please suggest.
You should look into WCF Data Services, aka OData services.
Try returning the result of ToList() instead, because LINQ uses deferred execution: the query only runs when its results are enumerated. When ToList or the result is accessed on the client side, it tries to connect to the server to fetch the data, and that is where it fails. When you use this kind of ORM, it is recommended to detach your objects and send the materialized result to the client.
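Deferred execution is not specific to LINQ. As an analogy only (this is Java, not WCF or Entity Framework code), Java streams behave similarly: building the pipeline does no work, and the elements are only processed when a terminal operation such as collect runs. The class DeferredExecutionDemo and its members are hypothetical names used for this sketch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DeferredExecutionDemo {
    // Counts how many rows have actually been processed.
    final AtomicInteger evaluations = new AtomicInteger();

    /** Builds a lazy pipeline; nothing is evaluated until a terminal operation runs. */
    Stream<String> lazyQuery(List<String> rows) {
        return rows.stream().map(row -> {
            evaluations.incrementAndGet();
            return row.toUpperCase();
        });
    }

    /** Runs the pipeline immediately and returns a plain list that is safe to hand over. */
    List<String> materialized(List<String> rows) {
        return lazyQuery(rows).collect(Collectors.toList());
    }
}
```

Returning the lazy object across a service boundary hands the caller something that still depends on the server-side data source, whereas the materialized list is plain data.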