u-sql: Loading files in u-sql script - azure-data-lake

I have thousands of CSV files spanning from January 2016 until today.
I want to load all files from the 25th of November 2016 until the 2nd of January 2017.
I know I can use a virtual path as below, but won't this load all my data from disk? I only want data from the period above. Will adding the @result query (modified to my time period) ensure that only the files I am interested in are loaded into memory?
DECLARE @file_set_path2 string = @dir + "{date:yyyy}/{date:MM}/{date:dd}/{date:MM}{date:dd}{date:yyyy}.csv";
@data =
    EXTRACT vala int,
            valb long,
            valc DateTime,
            date DateTime // virtual file set column
    FROM @file_set_path2
    USING Extractors.Csv();
@result =
    SELECT *
    FROM @data
    WHERE date > DateTime.Parse("2016-11-24")
    AND date < DateTime.Parse("2017-01-03");

If the predicate is comparing against values that the compiler can see (e.g., constants, constant foldable expressions or script parameters) and the predicate can be moved (e.g., you use AND and not && in the predicate for conjunction) then the optimizer will only touch the files inside the specified range. So the query above should be fine.
You should get a warning if the predicate is not one of the above.
If you do not get this behavior, please let me know.

Related

Access SQL Date Function

So I'm working on editing some SQL code and I've just begun learning it. I'm trying to fix an update query so it updates a table's value5 column with a corresponding database value. The value type from the database is a number, which I want to convert to a date and place into my table. The database number is in yyyymmdd format, so I've been trying to use datefromparts(), which doesn't work. Does anyone have any ideas?
UPDATE tbl INNER JOIN dB ON
(dB.value1= tbl.value1 OR
dB.value2 =tbl.value2 ) AND
(LEFT(dB.value3 ,5)=tbl.value3 ) AND
(dB.value4 =tbl.value4 )
SET tbl.value5 = DateFromParts(Left(dB.value5,4),Mid(dB.value5,5,2),Right(dB.value5,2))
WHERE tblInvoice.value5 IS NULL;
The current program uses the code
"SET tbl.value5 = dB.value5"
instead (it runs perfectly fine) and I am having another issue with testing the conversion SQL code (datefromparts()). Because I am converting from numbers to time/date, I have to go into the design view of the target table and change the input data type of the value5 column from numbers to time/date. When I run the query with the conversion SQL code, the query stalls for a bit and no values get updated, leaving me with just a blank value5 column. If I now want to fill in the original number values, I change the SQL code back into its original "SET tbl.value5 = dB.value5", change the input data type from time/date to numbers, and rerun the program. The query stalls and no values are updated, and I am again left with blank columns, even though the same code left me with the corrected update values before the modifications to the SQL and table input Data types. I come from a VBA background and I'm just really confused with how this is working. Any tips would be appreciated, thanks!
Have you tried with substring instead?
SELECT DATEFROMPARTS ( left('20101231',4), substring('20101231',5,2), right('20101231',2) ) AS Result;
MS Access (and MS Jet too) has no DateFromParts function. Use DateSerial instead:
SET tbl.value5 = DateSerial(Left(dB.value5, 4), Mid(dB.value5, 5, 2), Right(dB.value5, 2))
It's not clear if you work with T-SQL or Access SQL. In Access, you can use Format:
SET tbl.value5 = CDate(Format(dB.value5, "####\/##\/##"))
In T-SQL you could use a similar method.
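For the T-SQL side, one similar approach can be sketched as follows. It assumes the source value is an integer in yyyymmdd form; the variable name and sample value are made up for illustration. The number is turned into an 8-character string and parsed with style 112 (ISO yyyymmdd), which avoids DATEFROMPARTS (only available from SQL Server 2012 onward).

```sql
-- Assumption: the stored number looks like 20101231 (yyyymmdd); @num is a hypothetical sample.
DECLARE @num int = 20101231;

-- int -> char(8) -> date, using style 112 (yyyymmdd)
SELECT CONVERT(date, CONVERT(char(8), @num), 112) AS Result;
```

If some rows might hold values that are not valid dates, the conversion raises an error, so it may be worth checking the data first.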

Extracting the last file name from a field where the number of folders and length of file names is variable

I am trying to extract the last file name from a field in SQL where the separator is /, and there is sometimes also one after the last file name. (I am using this to create a new field in a BI Web Intelligence document.)
Filename1/filename2/filename3/filename4/ result required Filename4
File1/file2/file3/file4/file5/file6 result required file6
I have tried various combinations but without success. As you can see the file names are not of a standard length and the number of folders is variable.
Any help on this would really be appreciated.
Thank you
Lyn
Depending on your answer to my comment: does your input string end with "/" or not? I have put both types of test string in this query, using SQL Server 2008 as the DBMS. Just comment out either Set @tstString line to run each condition and you will see the two possible results.
Declare @tmpFirstMark int
Declare @tmpLastMark int
Declare @tmpUseMark int
Declare @tstString varchar(100)
Set @tstString = 'Filename1/filename2/filename2/filename4/'
Set @tstString = 'File1/file2/file3/file4/file5/file6'
-- Calculate 1st Occurrence of "/"
Set @tmpFirstMark = PATINDEX('%/%',@tstString)
-- Calculate last Occurrence of "/"
Set @tmpLastMark = (LEN(@tstString) - PATINDEX('%/%',REVERSE(@tstString)) + 1)
-- Calculate 2nd to last Occurrence of "/"
Set @tmpUseMark = @tmpLastMark - PATINDEX('%/%', REVERSE(SUBSTRING(@tstString, 1, @tmpLastMark-1)))
Select
@tstString
,@tmpFirstMark
,@tmpLastMark
,@tmpUseMark
,SUBSTRING(@tstString, @tmpLastMark + 1, LEN(@tstString)) as 'resultSTR'
,SUBSTRING(@tstString, @tmpUseMark + 1, @tmpLastMark-@tmpUseMark-1) as 'otherResult'
I would use a regular expression to retrieve the desired output:
([^/]+)/?$
This will match as many non-/ characters as possible (at least 1) before the end of the string, that may be followed by an optional /.
You will want to use the first group of the match to retrieve the filename of a directory without its trailing /.
You haven't specified your RDBMS and I'm not so comfortable with using regexps in SQL, so I hope you'll be able to piece that together in your SQL dialect.
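For a dialect without regular-expression functions, the same rule (take the last non-empty segment, ignoring one optional trailing /) can be sketched in T-SQL with plain string functions. This is not a regex and is only one possible translation; the variable names and sample value are assumptions for illustration.

```sql
-- Hypothetical sample input; swap in the real column or value.
DECLARE @path varchar(200) = 'Filename1/filename2/filename3/filename4/';

-- Drop a single trailing '/', if there is one.
DECLARE @trimmed varchar(200) =
    CASE WHEN RIGHT(@path, 1) = '/' THEN LEFT(@path, LEN(@path) - 1) ELSE @path END;

-- Take everything after the last remaining '/'.
-- Appending '/' to the reversed string makes a path with no '/' return the whole string.
SELECT RIGHT(@trimmed, CHARINDEX('/', REVERSE(@trimmed) + '/') - 1) AS LastName;
```

With the sample value above this should yield filename4, and File1/file2/file3/file4/file5/file6 should yield file6.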

SQL Multi-Function Variable

I have a fairly long bit of SQL code that creates a number of temp tables. Within the different creations there are some functions that occur multiple times. The functions are constant but they have an int at the end to change the result range, eg.
WHERE getdate() between mfg_ww_begin_datetime and mfg_ww_end_datetime) -2
When I want to change my overall query, I have to go in and manually change each of these ints - is there a way to set these ints at the top of my query so that I can change just one value and each time it is used in the rest, it references that value I have control of at the top?
Well I'm not the smartest, but this works after some more searching.
DECLARE @CurrentWW INT, @SampleSize INT, @RollingAvg INT
SET @CurrentWW = 7
SET @SampleSize = 25
SET @RollingAvg = 10
Then use those variable names in the rest of the query in place of the hard-coded ints. Each one can be referenced as many times as needed.
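As a rough illustration of how one of these variables might replace the hard-coded -2 from the question: the table name dbo.mfg_calendar, the column mfg_ww, and the alias start_ww below are assumptions made up for this sketch, since the original only shows a fragment of the subquery.

```sql
-- Set once at the top of the script.
DECLARE @RollingAvg int = 2;

-- dbo.mfg_calendar and mfg_ww are hypothetical names, used only for illustration.
SELECT (SELECT mfg_ww
        FROM dbo.mfg_calendar
        WHERE GETDATE() BETWEEN mfg_ww_begin_datetime AND mfg_ww_end_datetime)
       - @RollingAvg AS start_ww;
```

One thing to watch in a long script: local variables only live for the current batch, so a GO separator between the DECLARE and a later statement would make the variable undefined there.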

Is there a limit to the query plan size in dm_exec_text_query_plan? My plan is getting cut off

Using SSMS 2012 to query a 2008 R2 instance, I am trying to get a plan for a specific query using the DMVs like this:
SELECT t.text
, p.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle,0,-1) p
WHERE t.text like ...
The query plan column is getting cut off after 43,679 characters, ending with
</Outp
instead of
</ShowPlanXML>
I tested this with a smaller query and the whole text was returned. The query in question is not that complex, but has a lot of columns, which may be making it a bit more verbose. Also, the value returned is not a link to the plan but just the XML in text form.
Is there a limit to what is stored in the plan cache, or am I doing something wrong in SSMS so that it is not returning the value as a plan link in the column?
Even if the bug Aaron mentions in comments is in play here, you should be able to get around it with an SSMS tweak. You can directly cast the result of your query into the xml data type, and then return it that way.
If you're returning the XML in a grid view, go to Tools/Options/Query Results/SQL Server/Results to Grid and see what the setting for Maximum Characters Retrieved for XML data is, and bump it up to "Unlimited". This should allow you to circumvent the varchar limit.
Whoops! Wrong DMV. I needed to use dm_exec_query_plan not dm_exec_text_query_plan. That solved it - thanks for the replies.
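A sketch of what the corrected query might look like with sys.dm_exec_query_plan, which returns the plan as an xml column (so SSMS shows it as a clickable link in the grid). The LIKE pattern is a placeholder, since the original filter was not shown.

```sql
SELECT t.text,
       p.query_plan            -- xml column, rendered as a plan link in the grid
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS p
WHERE t.text LIKE N'%some distinctive fragment%';  -- placeholder filter
```

Note that this function takes only the plan handle, so it returns the plan for the whole batch rather than for a single statement offset.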
Posting another answer as this may help someone else out in the future. I've found a few approaches that work around the 43,679-character limit on output displayed in Grid View, which is what this question ran into. These approaches also work if your plan exceeds the SQL XML data type limit of 128 nested nodes.
The first, and arguably easiest option, is to fire up PowerShell ISE and follow the instructions outlined on this blog post by Patrick Keisler. Paste his script into the ISE editor, adjust the OFFSET values (recommended), output paths/names, and then run everything to generate the final sqlplan file.
If you don't want to use PowerShell, a kludgy TSQL method I cobbled together can also be used, as follows:
-- Quick and dirty script to output large execution plans from cache
-- Be sure to replace the plan handle and offset values below
DECLARE @query_plan_nvarmax NVARCHAR(MAX), @len_out INT, @sub_str INT = 1, @chunk_len INT = 43679
DECLARE @full_query_plan TABLE
(
    line INT IDENTITY(1,1),
    qp_line NVARCHAR(MAX)
)
SELECT @query_plan_nvarmax = query_plan,
       @len_out = LEN(query_plan)--, CAST(query_plan AS XML) as xml_query_plan
-- Usage: Get Query Offset Values From sys.dm_exec_query_stats DMV
-- sys.dm_exec_text_query_plan(plan_handle, OFFSET_START|DEFAULT, OFFSET_END|DEFAULT)
FROM sys.dm_exec_text_query_plan(0x050005005EDA4857307D56540300000001000000000000000000000000000000000000000000000000000000, 10078, 83616)
-- SUBSTRING takes (start, length) and is 1-based, so walk the plan in fixed-size chunks
WHILE @sub_str <= @len_out
BEGIN
    INSERT INTO @full_query_plan (qp_line)
    SELECT SUBSTRING(@query_plan_nvarmax, @sub_str, @chunk_len)
    SET @sub_str = @sub_str + @chunk_len
END
-- Save Output of qp_line column to text editor and remove newline characters \r\n
-- I prefer Notepad++, but any editor will suffice then save output as a .sqlplan and open in SSMS
SELECT *
FROM @full_query_plan
ORDER BY line
Take note, you'll need to edit the output of the query in an external text editor to remove the newline characters \r\n and save the result as a .sqlplan file.

Why would YEAR fail with a conversion error from a Date?

I have a view named 'FechasFirmaHorometros', defined as
SELECT IdFormulario,
CONVERT(Date, RValues) AS FechaFirma
FROM dbo.Respuestas
WHERE ( IdPreguntas IN (SELECT IdPregunta
FROM dbo.Preguntas
WHERE
( FormIdentifier = dbo.IdFormularioHorometros() )
AND ( Label = 'SLFYHDLR' )) )
I also have a function named [RespuestaPreguntaHorometrosFecha], defined as
SELECT Respuestas.RValues
FROM Respuestas
JOIN Preguntas
ON Preguntas.Label = @LabelPregunta
JOIN FechasFirmaHorometros
ON FechasFirmaHorometros.IdFormulario = Respuestas.IdFormulario
WHERE Respuestas.IdPreguntas = Preguntas.IdPregunta
AND YEAR(FechasFirmaHorometros.FechaFirma) = @Anio
AND MONTH(FechasFirmaHorometros.FechaFirma) = @Mes
with the parameters
@LabelPregunta VARCHAR(MAX)
@Anio INT
@Mes INT
I keep getting this message when execution hits that function while debugging another stored procedure that uses it:
Conversion failed when converting date and/or time from character string.
Yet I can freely do things like
SELECT DAY(FechaFirma) FROM FechasFirmaHorometros
Why is this happening, and how can I solve or work around it?
I assume that RValues is a string column of some type, for some reason. You should fix that and store date data using a date data type (obviously in a separate column than this mixed bag).
If you can't fix that, then you can prevent what Damien described above by:
CASE WHEN ISDATE(RValues) = 1 THEN CONVERT(Date, RValues) END AS FechaFirma
(Which will make the "date" NULL if SQL Server can't figure out how to convert it to a date.)
You can't prevent this simply by adding a WHERE clause, because SQL Server will often attempt the conversion in the SELECT list before performing the filter (it all depends on the plan). You also can't force the order of operations by using a subquery, CTE, join order hints, etc. There is an open Connect item about this issue - they are "aware of it" and "hope to address it in a future version."
Short of a CASE expression, which forces SQL Server to evaluate the ISDATE() result before attempting to convert (as long as no aggregates are present in any of the branches), you could:
dump the filtered results into a #temp table, and then subsequently select from that #temp table, and only apply the convert then.
just return the string, and treat it as a date on the client, and pull YEAR/MONTH etc. parts out of it there
just use string manipulation to pull YEAR = LEFT(col,4) etc.
use TRY_CONVERT() since I just noticed you're on SQL Server 2012:
TRY_CONVERT(DATE, RValues) AS FechaFirma
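The #temp table option from the list above could look roughly like the sketch below. It reuses the table, column, and function names from the question; the temp table name #FechasFirma is made up for illustration, and it assumes every row that passes the filter holds a convertible date string.

```sql
-- Step 1: apply the filter first and keep RValues as a plain string.
SELECT r.IdFormulario, r.RValues
INTO #FechasFirma              -- hypothetical temp table name
FROM dbo.Respuestas AS r
WHERE r.IdPreguntas IN (SELECT p.IdPregunta
                        FROM dbo.Preguntas AS p
                        WHERE p.FormIdentifier = dbo.IdFormularioHorometros()
                          AND p.Label = 'SLFYHDLR');

-- Step 2: convert only the rows that survived the filter.
SELECT IdFormulario,
       CONVERT(date, RValues) AS FechaFirma
FROM #FechasFirma;
```

Because views and user-defined functions cannot create or read #temp tables, this pattern would have to live in a stored procedure or an ad hoc batch rather than in the view itself.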