Defining an EXTRACT range from a SELECT statement - azure-data-lake

I intend to process, in batches, a dataset from Event Hub stored in Azure Data Lake. It seems logical to process intervals where my date falls between my last execution datetime and the current execution datetime.
I thought about saving the execution timestamps in a table so I can keep track of them, and doing the following:
DECLARE @my_file string = @"/data/raw/my-ns/my-eh/{date:yyyy}/{date:MM}/{date:dd}/{date:HH}/{date:mm}/{date:ss}/{*}.avro";
DECLARE @max_datetime DateTime = DateTime.Now;
@min_datetime =
    SELECT (DateTime) MAX(execution_datetime) AS min_datetime
    FROM my_adldb.dbo.watermark;
@my_json_bytes =
    EXTRACT Body byte[],
            date DateTime
    FROM @my_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{""type"":""record"",""name"":""EventData"",""namespace"":""Microsoft.ServiceBus.Messaging"",""fields"":[{""name"":""SequenceNumber"",""type"":""long""},{""name"":""Offset"",""type"":""string""},{""name"":""EnqueuedTimeUtc"",""type"":""string""},{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes"",""null""]}},{""name"":""Body"",""type"":[""null"",""bytes""]}]}");
How do I properly add this interval to my EXTRACT query? I tested it using a common WHERE clause with an interval defined by hand and it worked, but when I attempt to use @min_datetime it doesn't work, since its result is a rowset rather than a scalar.
I thought about applying the filter in a subsequent query, but I am afraid this means @my_json_bytes would extract my whole dataset and filter it afterwards, resulting in a suboptimal job.
Thanks in advance.

You should be able to apply the filter as part of a later SELECT. U-SQL can push predicates down into the EXTRACT under certain conditions, though I haven't been able to test this for your case. Try something like this:
@min_datetime =
    SELECT (DateTime) MAX(execution_datetime) AS min_datetime
    FROM my_adldb.dbo.watermark;
@my_json_bytes =
    EXTRACT Body byte[],
            date DateTime
    FROM @my_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{""type"":""record"",""name"":""EventData"",""namespace"":""Microsoft.ServiceBus.Messaging"",""fields"":[{""name"":""SequenceNumber"",""type"":""long""},{""name"":""Offset"",""type"":""string""},{""name"":""EnqueuedTimeUtc"",""type"":""string""},{""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},{""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes"",""null""]}},{""name"":""Body"",""type"":[""null"",""bytes""]}]}");
@working =
    SELECT *
    FROM @my_json_bytes AS j
         CROSS JOIN
         @min_datetime AS t
    WHERE j.date > t.min_datetime;
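Outside of U-SQL, the shape of this answer (a one-row MAX() rowset cross-joined back onto the data) is easy to sanity-check. Here is a minimal sketch using SQLite from Python; the table contents and column names are stand-ins, not your actual schema:

```python
import sqlite3

# SQLite standing in for the U-SQL rowsets: read MAX(execution_datetime)
# from the watermark table, then keep only events strictly after it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watermark (execution_datetime TEXT)")
conn.execute("CREATE TABLE events (body TEXT, date TEXT)")
conn.executemany("INSERT INTO watermark VALUES (?)",
                 [("2020-01-01 00:00:00",), ("2020-01-02 00:00:00",)])
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("a", "2020-01-01 12:00:00"), ("b", "2020-01-02 12:00:00")])

# Equivalent of the CROSS JOIN against the one-row MAX() rowset above.
rows = conn.execute("""
    SELECT e.body FROM events AS e
    CROSS JOIN (SELECT MAX(execution_datetime) AS min_dt FROM watermark) AS t
    WHERE e.date > t.min_dt
""").fetchall()
print([r[0] for r in rows])  # -> ['b']
```

Only the event newer than the latest recorded execution survives the filter, which is exactly the incremental-batch behavior you are after.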

Related

How to include date range in SQL query for Data Studio BigQuery community connector

I'm trying to make a community connector using the advanced services from Google's Data Studio to connect to my BigQuery data table. The connector is all set up and my getData function returns a query which looks like:
var sqlString = "SELECT * FROM `PROJECT.DATASET.TABLE` WHERE " +
    "DATE(timestamp) >= @startDate AND DATE(timestamp) <= @endDate;";
where PROJECT, DATASET, and TABLE are filled in with their respective IDs. The 'timestamp' field is a BigQuery field in my data table of type TIMESTAMP.
In my getConfig function, I'm setting the configuration to add a daterange object to the request passed into getData:
function getConfig() {
...
config.setDateRangeRequired(true);
...
}
I'm then returning the community connector object (defined as 'cc' variable in code below) in my getData function, setting the sql string, query parameters for startDate and endDate, and some other necessary info:
function getData(request) {
...
return cc
.newBigQueryConfig()
.setAccessToken(accessToken) // defined earlier
.setBillingProjectId(billingProjectId) // defined earlier
.setUseStandardSql(true)
.setQuery(sqlString)
.addQueryParameter('startDate', bqTypes.STRING,
request.dateRange.startDate)
.addQueryParameter('endDate', bqTypes.STRING,
request.dateRange.endDate)
}
When I run this connector in a report, it connects to BigQuery and even queries the table, but it does not return any data. When I replace @startDate and @endDate with string literals in 'yyyy-mm-dd' format, it works as expected, so it seems my only problem is that I can't figure out how to set the date range parameters in the query (which I assume I'm supposed to do to allow date range control in Data Studio reports). How do I configure this dateRange object so that people can use date range controls in Data Studio reports?
Edit: For clarification, I know how to add a date range control to a report. The problem is that the query does not return any data even when the date range query parameters are passed in.
I ended up fixing my SQL query. I made my WHERE condition
WHERE DATE(requestTimestamp) BETWEEN @startDate AND @endDate
and it actually returned data correctly. I didn't mention another parameter I was using in my query because I thought it was irrelevant, but I had quotes around another conditioned parameter, which may have screwed up the query. The condition before was more like:
WHERE id = '@id' AND DATE(requestTimestamp) BETWEEN @startDate AND @endDate
I think putting quotes around @id was the problem, because changing the query to:
WHERE id = @id AND DATE(requestTimestamp) BETWEEN @startDate AND @endDate
worked perfectly.
You can use a Date range control and configure the timestamp field for it. It should automatically pick the timestamp-type field.
Go to Insert and select Date range control to add it to your report.
You can then select the date range in view mode.

measuring the time of each operation in SQL query

I have a SQL query in the form of a tree (A⨝B)⨝(C⨝D)⨝(E⨝F)⨝(G⨝H)⨝(I⨝J) containing different joins. I want to know whether there is a method to find the time for each join operation separately, e.g. how much time the sub-expression (A⨝B) or (C⨝D) takes, instead of the whole expression. Or how can we find the time for only the sub-expression (A⨝B)⨝(C⨝D)? I have converted my SQL query into a tree using Java. I am using SQL Server to run my queries. Thanks in advance.
I'm not sure if this is exactly what you need, but you could try DATEDIFF if you can split each operation:
DECLARE @timer1 DATETIME
DECLARE @timer2 DATETIME
SET @timer1 = GETDATE()
--stuff to measure here
SET @timer2 = GETDATE()
SELECT DATEDIFF(millisecond, @timer1, @timer2) AS time_spent
This way you can compare the different sub-queries and see which one performs best.
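If you want per-sub-join timings programmatically rather than by editing the script, one approach is to materialize and time each sub-join on its own and reuse the results. A rough sketch in Python against SQLite (the tables A through D, their uniform contents, and the helper name timed are all invented; the same idea applies with the DATEDIFF pattern above in SQL Server):

```python
import sqlite3
import time

# Build four toy tables so each sub-join has something to chew on.
conn = sqlite3.connect(":memory:")
for t in ("A", "B", "C", "D"):
    conn.execute(f"CREATE TABLE {t} (k INTEGER)")
    conn.executemany(f"INSERT INTO {t} VALUES (?)", [(i,) for i in range(1000)])

def timed(sql):
    """Run one sub-expression and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

ab, t_ab = timed("SELECT A.k FROM A JOIN B ON A.k = B.k")  # time for (A JOIN B)
cd, t_cd = timed("SELECT C.k FROM C JOIN D ON C.k = D.k")  # time for (C JOIN D)
print(f"A|><|B: {t_ab:.6f}s ({len(ab)} rows), C|><|D: {t_cd:.6f}s ({len(cd)} rows)")
```

Since you already have the query as a tree in Java, you can walk the tree, emit one statement per internal node, and wrap each in this kind of timer.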

SQL using parenthesis to include multiple criteria after WHERE

I am wondering if it's possible to use parentheses to include multiple criteria after WHERE. For example, I am looking for data from multiple dates, and the original code looks like this:
SELECT * FROM MyDB
WHERE Date = '2016-06-30' OR Date = '2016-09-30' OR Date = '2016-12-31'
This code will become extremely long if I need to get data from more time periods, or if the column name is long and complex. I tried to change the code into the following format, but apparently it's not correct:
SELECT * FROM MyDB
WHERE Date = ('2016-06-30', '2016-09-30', '2016-12-31')
I am wondering if there is a way to write the query in the style described above so that it gets me data from all the dates (or other criteria)?
Thanks in advance!
Use IN:
SELECT * FROM MyDB
WHERE Date IN ('2016-06-30', '2016-09-30', '2016-12-31')
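For a quick demonstration that IN behaves exactly like the chained ORs, here is a small Python/SQLite sketch (the table contents are invented):

```python
import sqlite3

# Toy stand-in for MyDB: four rows, three of which hit the target dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyDB (id INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO MyDB VALUES (?, ?)",
    [(1, "2016-06-30"), (2, "2016-07-15"), (3, "2016-09-30"), (4, "2016-12-31")],
)

# IN is equivalent to "Date = x OR Date = y OR Date = z" but stays one
# readable list no matter how many dates you add.
rows = conn.execute(
    "SELECT id FROM MyDB WHERE Date IN ('2016-06-30', '2016-09-30', '2016-12-31')"
).fetchall()
print([r[0] for r in rows])  # -> [1, 3, 4]
```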

Date range comparison using varchar columns in Teradata

I've been tasked to take a calendar date range value from a form front-end and use it, among other things, to feed a query against a Teradata table that does not have a datetime column. Instead, the date is assembled from two varchar columns: one for the year (CY = current year, LY = last year, LY-1, etc.), and one for the date in MonDD format (like Jan13 or Dec08).
I'm using ColdFusion for the form and results page, so I have the ability to build the query dynamically, but I can't think of a good way to do it for all possible cases. Any ideas? Year differences aside, I can't think of anything other than a direct comparison on each day in the range, with a potentially huge number of OR clauses in the query. I'm light on SQL knowledge; maybe there's a better way to do it in the SQL itself, using some sort of conversion on the two varchar columns to form an actual date range that comparisons could then be made against?
Here is some SQL that will take the VARCHAR date value and perform some basic manipulations on it to get you started:
SELECT CAST(CAST('Jan18' || TRIM(EXTRACT(YEAR FROM CURRENT_DATE)) AS CHAR(9)) AS DATE FORMAT 'MMMDDYYYY') AS BaseDate_
     , CASE WHEN Col1 = 'CY'   THEN BaseDate_
            WHEN Col1 = 'LY'   THEN ADD_MONTHS(BaseDate_, -12)
            WHEN Col1 = 'LY-1' THEN ADD_MONTHS(BaseDate_, -24)
            ELSE BaseDate_
       END AS DateModified_
FROM {MyDB}.{MyTable};
The EXTRACT() function allows you to take apart a DATE, TIME, or TIMESTAMP value.
You have to use TRIM() around the EXTRACT to get rid of the whitespace that is added when converting the date part to a CHAR data type. Teradata is funny with dates and often requires a double CAST() to get things sorted out.
The CASE statement simply takes the encoded values you suggested will be used and uses the ADD_MONTHS() function to manipulate the date. Dates are INTEGER in Teradata so you can also add INTEGER values to them to move the date by a whole day. Unlike Oracle, you can't add fractional values to manipulate the TIME portion of a TIMESTAMP. DATE != TIMESTAMP in Teradata.
Rob gave you an SQL approach. Alternatively, you can use ColdFusion to generate values for the columns you have. Something like this might work:
sampleDate = CreateDate(2010,4,12); // this simulates user input
if (year(sampleDate) is year(now()))
    col1Value = 'CY';
else if (year(now()) - year(sampleDate) is 1)
    col1Value = 'LY';
else
    // 'LY-1' means two years back, so subtract 1 from the year difference
    col1Value = 'LY-' & (DateDiff("yyyy", sampleDate, now()) - 1);
col2Value = DateFormat(sampleDate, 'mmmdd');
Then you send col1Value and col2Value to your query as parameters.
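For illustration, here is the same labeling logic in Python (assuming my reading of the question's convention, namely that LY-1 means two years back; the function name watermark_parts is made up):

```python
from datetime import date

def watermark_parts(sample: date, today: date):
    """Derive the (year code, MonDD) pair used by the two varchar columns."""
    diff = today.year - sample.year
    if diff == 0:
        col1 = "CY"
    elif diff == 1:
        col1 = "LY"
    else:
        col1 = f"LY-{diff - 1}"          # LY-1 = two years back, etc.
    col2 = sample.strftime("%b%d")       # e.g. "Apr12"
    return col1, col2

print(watermark_parts(date(2010, 4, 12), date(2010, 7, 1)))  # ('CY', 'Apr12')
print(watermark_parts(date(2009, 4, 12), date(2010, 7, 1)))  # ('LY', 'Apr12')
```

Expanding a user-selected range day by day through a function like this is what produces the list of (col1, col2) pairs the dynamic query would match against.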

Sql Shorthand For Dates

Is there a way to write a query equivalent to
select * from log_table where dt >= 'nov-27-2009' and dt < 'nov-28-2009';
but where you could specify only 1 date and say you want the results for that entire day until the next one.
I'm just making this up, but something of the form:
select * from log_table where dt = 'nov-27-2009':+1;
I do not believe there is one method that is portable to all RDBMSes.
A check in one of my references (SQL Cookbook) shows that no one RDBMS solves the problem quite the same way. I would recommend checking out Chapter 8 of that book, which covers all of the different methods for DB2, Oracle, PostgreSQL, MySQL.
I've had to deal with this issue in SQLite, though, and SQL Cookbook doesn't address that RDBMS, so I'll mention a bit about it here. SQLite doesn't have a date/time data type; you have to create your own by storing all date/time data as TEXT and ensure that your application enforces its formatting. SQLite does have a set of date/time conversion functions that allow you to add nominal date/times while maintaining the data as strings. If you need to add two time durations (HH:MM:SS) to each other, though, based upon data that you've stored in text columns that you are treating as date/time data, you'll have to write your own functions (search for "Defining SQLite User Functions") and attach them to the database at runtime via a call to sqlite3_create_function(). If you want an example of some user functions that add time values, let me know.
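One concrete example of the nominal date arithmetic mentioned above: SQLite's built-in date() modifier can produce the exclusive upper bound, so a single input date covers the whole day without any user-defined functions. A Python sketch with an invented log_table:

```python
import sqlite3

# TEXT dates in ISO format compare correctly as strings, so a half-open
# [day, day+1) range needs only the date() function for the upper bound.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_table (dt TEXT)")
conn.executemany("INSERT INTO log_table VALUES (?)",
                 [("2009-11-27 08:00:00",),
                  ("2009-11-27 23:59:59",),
                  ("2009-11-28 00:00:00",)])

rows = conn.execute(
    "SELECT dt FROM log_table WHERE dt >= date(?) AND dt < date(?, '+1 day')",
    ("2009-11-27", "2009-11-27"),
).fetchall()
print(len(rows))  # -> 2 (both rows from Nov 27, not the one from Nov 28)
```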
For MS SQL Server, check out DATEPART.
/* dy = Day of Year */
select * from log_table where datepart(dy, dt) = datepart(dy, '2009-nov-27');
With SQL Server, you could:
Select * From table
Where dt >= DateAdd(day, DateDiff(day, 0, @ParamDate), 0)
  And dt < DateAdd(day, DateDiff(day, 0, @ParamDate), 1)
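For reference, the DateAdd/DateDiff pair above just floors the datetime to midnight and builds a half-open [day, day+1) range; here is the same idea spelled out in Python:

```python
from datetime import datetime, timedelta

# A parameter that carries a time-of-day component.
param = datetime(2009, 11, 27, 14, 35, 2)

day_start = datetime(param.year, param.month, param.day)  # floor to midnight
day_end = day_start + timedelta(days=1)                   # exclusive upper bound

in_range = day_start <= param < day_end
print(day_start, day_end, in_range)
```

The half-open comparison keeps every timestamp on the chosen day, including 23:59:59.997-style values that a BETWEEN against a date literal would handle awkwardly.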
As long as you are dealing with your database's native date data type, the following will work:
t.date_column + 1
...will add one day to the given date. But I have yet to find a db that allows for implicit data type conversion into a date.
SELECT '12-10-2009' + 1
...will fail on SQL Server because SQL Server only performs the implicit conversion when comparing to a datetime data type column. So you need to use:
SELECT CONVERT(DATETIME, '12-10-2009') + 1
For Oracle, you'd have to use the TO_DATE function; MySQL would use something like STR_TO_DATE, etc.
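The same point in Python terms: there is no implicit string-to-date conversion, so you parse explicitly first and then add a day, mirroring the CONVERT example above:

```python
from datetime import datetime, timedelta

# Explicit conversion first (like CONVERT(DATETIME, '12-10-2009')), then + 1 day.
d = datetime.strptime("12-10-2009", "%m-%d-%Y")
print(d + timedelta(days=1))  # -> 2009-12-11 00:00:00
```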
Have a column that holds just the date part (time is 00:00:00.000); then you can use a simple WHERE clause: WHERE dt = '2009-11-27'