U-SQL Column Type Conversion - azure-data-lake

I have created a U-SQL query that reads the input files from Data Lake Store and converts the values. The final output is stored back in Data Lake Store.
DECLARE @in string = "system/dbotable{*}.tsv";
DECLARE @out string = "system/temp.tsv";
@searchlog =
EXTRACT
Id int,
Address string,
number int
FROM @in
USING Extractors.Tsv();
@transactions =
SELECT
*,
ROW_NUMBER()
OVER(PARTITION BY Id ORDER BY Id DESC) AS RowNumber
FROM @searchlog;
@result =
SELECT
Id,
Address,
number
FROM @transactions
WHERE RowNumber == 1;
OUTPUT @result
TO @out
USING Outputters.Tsv();
It fails with the following error:
Execution failed with error '1_SV1_Extract Error : '{"diagnosticCode":195887132,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR","message":"Invalid character when attempting to convert column data.","description":"HEX: \"2243616E696E6522\" Invalid character when converting input record.\nPosition: line 1, column index: 1, column name: \"Id\".","resolution":"Check the input for errors or use \"silent\" switch to ignore over(under)-sized rows in the input.\nConsider that ignoring \"invalid\" rows may influence job results and that types have to be nullable for conversion errors to be ignored.","helpLink":""

It seems that the Id column does not always contain an integer.
I would extract the Id column as string first and then, in a second step, try to convert it to int using a user-defined function, as shown here: https://msdn.microsoft.com/en-us/library/azure/mt621309.aspx (the example there is based on DateTime).
The other option is to use silent:true in your extractor, so that rows which fail the conversion are ignored automatically. As the error message notes, the column types have to be nullable (for example int?) for conversion errors to be ignored, and skipping rows may influence the job results.
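As a side note, the HEX field in the error message can be decoded to see which text failed to convert. The Python sketch below is only a diagnostic aid run outside U-SQL; the helper name decode_hex_field is made up for illustration, and it assumes the offending bytes are UTF-8 text.

```python
def decode_hex_field(hex_text: str) -> str:
    """Decode the HEX payload from a U-SQL extract error into readable text."""
    # Assumption: the offending bytes are UTF-8 encoded text.
    return bytes.fromhex(hex_text).decode("utf-8")


if __name__ == "__main__":
    # HEX value copied from the error message in the question.
    print(decode_hex_field("2243616E696E6522"))
```

For the message in the question this prints "Canine", including the double quotes, so the first line has a quoted text value where the Id is expected rather than an integer.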

Related

Optimal way to SET/DECLARE a list in SQL query?

I am writing a SQL query based on user input, as these inputs will change on a daily basis.
The goal of the query is to pull all data for only the IDs in the user-defined list. Example below.
However, I am getting the following error:
"Conversion failed when converting the varchar [...] to data type int"
Any idea what the optimal way is to specify a list and use that list in the "ID in (..)" clause?
I have tried converting the ID list into strings, but I still receive a similar error.
id_list = [12,16,22,42,1,24]
date = '2020-12-18'
query = (
"""
DECLARE @id varchar(1000), @date datetime
SET @id = '{}'
SET @date = '{}'
SELECT * from TABLE where ID in (@id) and Date = @Date
"""
.format(id_list,date))
The desired result is for a query to be able to take a list of IDs that could be utilized in the clause.
id in @id
SQL Server doesn't support lists or arrays, so the best method is a table variable:
declare @id_list table (id int);
insert into @id_list (id)
values (12), (16), (22), (42), (1), (24);
You can then use this wherever you would use a table. For instance:
where id in (select id from @id_list)
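Since the question builds the query from Python, another way to pass the list is to generate one parameter marker per ID instead of formatting the list into the SQL text. This is a minimal sketch, assuming a DB-API driver that uses "?" markers (pyodbc is one such driver); build_in_query and MyTable are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple


def build_in_query(ids: Sequence[int], date: str) -> Tuple[str, List[object]]:
    """Return a parameterized query and its parameter list for an ID filter."""
    if not ids:
        raise ValueError("ids must contain at least one value")
    # One "?" marker per ID, e.g. "?, ?, ?" for three IDs.
    placeholders = ", ".join("?" for _ in ids)
    # MyTable is a placeholder name, not taken from a real schema.
    sql = f"SELECT * FROM MyTable WHERE ID IN ({placeholders}) AND Date = ?"
    return sql, [*ids, date]


if __name__ == "__main__":
    sql, params = build_in_query([12, 16, 22, 42, 1, 24], "2020-12-18")
    print(sql)
    print(params)
    # With a DB-API cursor this would be run as: cursor.execute(sql, params)
```

Because each ID is sent as its own integer parameter, nothing has to be converted from varchar to int on the server.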

Transact-SQL Select statement results with bad GUID

We have a table with GUID primary keys. When I search for a specific key, I can use either:
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE6'
SELECT * FROM products WHERE productID LIKE '34594289-16B9-4EEF-9A1E-B35066531DE6'
RESULT (for both):
product_ID Prd_Model
------------------------------------ --------------------------------------------------
34594289-16B9-4EEF-9A1E-B35066531DE6 LW-100
(1 row affected)
We have a customer who uses our ID but adds more text to it to create some kind of compound field in their own system. They sent me one of these values to look up and I had an unexpected result. I meant to trim the suffix but forgot, so I ran this:
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
When I ran it, I unexpectedly got the same result:
product_ID Prd_Model
------------------------------------ --------------------------------------------------
34594289-16B9-4EEF-9A1E-B35066531DE6 LW-062
(1 row affected)
Now if I trim a value off the end of the GUID when searching I get nothing (GUID is 1 digit short):
SELECT * FROM products WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE'
Result:
product_ID Prd_Model
------------------------------------ --------------------------------------------------
(0 rows affected)
When I use the LIKE operator instead of '=' and add the suffix to the end, the statement returns zero results. This is what I would expect.
So why does the longer string with the suffix added to the end return a result when using '=' in the statement? It's obviously ignoring anything beyond the 36-character GUID length but I'm not sure why.
This behaviour is documented:
Converting uniqueidentifier Data
The uniqueidentifier type is considered a character type for the purposes of conversion from a character expression, and therefore is subject to the truncation rules for converting to a character type. That is, when character expressions are converted to a character data type of a different size, values that are too long for the new data type are truncated. See the Examples section.
So, the string value '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD' is truncated to '34594289-16B9-4EEF-9A1E-B35066531DE6' when it is implicitly cast (due to Data Type Precedence) to a uniqueidentifier and, unsurprisingly, '34594289-16B9-4EEF-9A1E-B35066531DE6' equals itself so the row is returned.
And the documentation does indeed give an example:
The following example demonstrates the truncation of data when the value is too long for the data type being converted to. Because the uniqueidentifier type is limited to 36 characters, the characters that exceed that length are truncated.
DECLARE @ID NVARCHAR(max) = N'0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong';
SELECT @ID, CONVERT(uniqueidentifier, @ID) AS TruncatedValue;
Here is the result set.
String TruncatedValue
-------------------------------------------- ------------------------------------
0E984725-C51C-4BF4-9960-E1C80E27ABA0wrong 0E984725-C51C-4BF4-9960-E1C80E27ABA0
I, however, find it odd that you say that the statement below returns no rows:
SELECT *
FROM products
WHERE productID='34594289-16B9-4EEF-9A1E-B35066531DE'
Although it is true that it won't return rows, it will also raise an error:
Conversion failed when converting from a character string to uniqueidentifier.
The fact that it doesn't implies your column isn't a uniqueidentifier, which would mean that your first statement isn't true, as the longer string would not be truncated. So one of the statements in the question is likely wrong: either your column is a uniqueidentifier, in which case the longer string returns a row but the shorter one raises an error, or it isn't, in which case neither statement returns a row. You can see this in the following demonstration:
CREATE TABLE dbo.YourTable (UID uniqueidentifier, String varchar(36));
INSERT INTO dbo.YourTable (UID,String)
VALUES('34594289-16B9-4EEF-9A1E-B35066531DE6','34594289-16B9-4EEF-9A1E-B35066531DE6');
GO
--Returns data
SELECT *
FROM dbo.YourTable
WHERE UID = '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
GO
--Errors
SELECT *
FROM dbo.YourTable
WHERE UID = '34594289-16B9-4EEF-9A1E-B35066531DE';
GO
--Returns no data
SELECT *
FROM dbo.YourTable
WHERE String = '34594289-16B9-4EEF-9A1E-B35066531DE6_GBR_USD'
GO
--Returns no data
SELECT *
FROM dbo.YourTable
WHERE String = '34594289-16B9-4EEF-9A1E-B35066531DE'
GO
DROP TABLE dbo.YourTable;
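The same two outcomes can be imitated outside the database. The Python sketch below is only a model of the documented rule (keep the first 36 characters, then parse); it is not how SQL Server implements the conversion, and to_uniqueidentifier is a made-up helper name.

```python
import uuid


def to_uniqueidentifier(text: str) -> uuid.UUID:
    """Mimic the documented truncation: keep the first 36 characters, then parse."""
    # Characters beyond position 36 are dropped, as the quoted documentation describes.
    return uuid.UUID(text[:36])


if __name__ == "__main__":
    guid = "34594289-16B9-4EEF-9A1E-B35066531DE6"
    # The suffixed value compares equal to the plain GUID after truncation.
    print(to_uniqueidentifier(guid + "_GBR_USD") == to_uniqueidentifier(guid))
    try:
        # One character short: not a valid GUID, so parsing fails.
        to_uniqueidentifier(guid[:-1])
    except ValueError as exc:
        print("conversion failed:", exc)
```

The suffixed string matches because the suffix is cut off before parsing, while the shortened string fails to parse at all instead of simply matching nothing.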

SQL query with column name obtained from another table SQL Server 2012

I am trying to run the query
select * from customers, TablesList where TablesList.TableName+'ID' = 10 and tableslist.tableid = 123
where the column name is obtained from another table. I get the following error:
Msg 245, Level 16, State 1, Line 1 Conversion failed when converting the nvarchar value 'CustomersID' to data type int.
I know I can do something like Select * from customers where customersID = 10
but I am trying to create the CustomersID column name dynamically from another table. The intent is to have TablesList.TableName+'ID' give me the string CustomersID, which I can then compare to 10.
My guess is that the value of TablesList.TableName is Customers, so when you do + 'ID' it results in 'CustomersID'. 'CustomersID' is the VALUE that is returned, not the FIELD NAME, and that value is what gets compared to 10.
Hence, when SQL Server tries to convert 'CustomersID' to an int in order to compare it with 10, you get an error message telling you that it is not an integer value.
As far as I know, you cannot get a field name from a field value directly through SQL; for that you would need a stored procedure or some kind of programming language to build the query dynamically.
TablesList.TableName+'ID' generates the string 'CustomersID'. You get the error because your comparison is actually made like this:
'CustomersID' = 10 -- The comparison NVARCHAR = INT produces the error
What I think you're trying to achieve requires dynamic SQL.
The problem with what you have is that your where clause is checking whether the value 'CustomersID' is equal to 10. It doesn't (and can't) use that string as a column name in that context. You need to use dynamic SQL.
Dynamic SQL is where you build up a string that contains the SQL you ultimately want to run. So, as an example, you could do something like this:
declare @sql varchar(max)
set @sql = 'select * from customers where ' + (select top 1 TableName from TablesList where tableId = 123) + 'ID = 10'
EXEC(@sql)
This sets the @sql variable to select * from customers where CustomersID = 10 and then runs that statement.
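If the query is built in application code instead, the same idea looks roughly like the Python sketch below. It assumes the table name has already been read from TablesList and that the driver uses "?" parameter markers; build_id_query is a hypothetical helper. Because a column name cannot be passed as a parameter, the name is checked against a plain-identifier pattern before it is placed into the SQL text.

```python
import re

# Plain identifiers only: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_id_query(table_name: str) -> str:
    """Build the customers lookup whose ID column is named <table_name>ID."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # The column name is part of the SQL text; only the compared value is a parameter.
    return f"SELECT * FROM customers WHERE [{table_name}ID] = ?"


if __name__ == "__main__":
    print(build_id_query("Customers"))
    # With a DB-API cursor: cursor.execute(build_id_query("Customers"), [10])
```

The value 10 is still supplied as a parameter at execution time, so only the identifier is ever concatenated into the statement.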
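Using Concat does not avoid the error:
select * from customers, TablesList where Concat('TablesList.TableName','ID') = 10 and tableslist.tableid = 123
Concat returns a string value (here the literal 'TablesList.TableNameID'), not a column reference, so comparing it with 10 still fails with a conversion error.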

SQL Server 2005 I am not able to read from a table

Suppose that in SQL Server 2005 you run the following query:
SELECT CHICKEN_CODE FROM ALL_CHICKENS WHERE MY_PARAMETER = 'N123123123';
you obtain:
31
as result.
Now, I would like to write a function that, given a value for MY_PARAMETER, yields the corresponding value of CHICKEN_CODE, found in the table ALL_CHICKENS.
I have written the following stored function in SQL Server 2005:
ALTER FUNCTION [dbo].[determines_chicken_code]
(
@input_parameter VARCHAR
)
RETURNS varchar
AS
BEGIN
DECLARE @myresult varchar
SELECT @myresult = CHICKEN_CODE
FROM dbo.ALL_CHICKENS
WHERE MY_PARAMETER = @input_parameter
RETURN @myresult
END
But if I launch the following query:
SELECT DBO.determines_chicken_code('N123123123')
it yields:
NULL
Why?
Thank you in advance for your kind cooperation.
Define the length of your varchar variables like this:
varchar(100)
Without the 100 (or whatever length you choose) the length is 1, so the parameter is truncated to 'N' and the where clause filters out the correct row.
Specify a length for your varchar (e.g. varchar(100)). Without a length, varchar = 1 char.
As the other answers say, you can store only one character in @myresult because you have not specified a length; in a variable or parameter declaration, a length of 1 is the default for the varchar data type.
Why you are getting NULL and not the first character:
If multiple records match the where clause in the ALL_CHICKENS table, the value of the CHICKEN_CODE column is picked up from the last row returned.
It seems that the last row has a null value in the CHICKEN_CODE column.
Specify a length for @input_parameter and @myresult, as by default the varchar length is 1.
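The effect of the missing length can be modelled in a few lines of Python. This is only a toy model, not SQL Server itself: the dictionary stands in for ALL_CHICKENS with the single row from the question, and as_varchar imitates the silent truncation that happens when a value is assigned to a varchar of the given length.

```python
from typing import Optional

# Toy stand-in for ALL_CHICKENS: MY_PARAMETER -> CHICKEN_CODE.
ALL_CHICKENS = {"N123123123": "31"}


def as_varchar(value: str, length: int = 1) -> str:
    """Model assigning to varchar(length): extra characters are silently dropped."""
    return value[:length]


def determines_chicken_code(input_parameter: str, length: int = 1) -> Optional[str]:
    """Model the function: truncate the parameter, look it up, truncate the result."""
    key = as_varchar(input_parameter, length)
    code = ALL_CHICKENS.get(key)  # None plays the role of NULL when no row matches
    return None if code is None else as_varchar(code, length)


if __name__ == "__main__":
    print(determines_chicken_code("N123123123"))       # length 1, as in the question
    print(determines_chicken_code("N123123123", 100))  # with an explicit length
```

With length 1 the lookup key becomes 'N', which matches no row in this model, so the result is None; with length 100 the key survives intact and the code is returned.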

SQL for nvarchar 0 = '' & = 0?

I was searching for integers in a nvarchar column. I noticed that if the row contains '' or 0 it is picked up if I search using just 0.
I'm assuming there is some implicit conversion happening which is saying that 0 is equal to ''. Why does it assign two values?
Here is a test:
--0 Test
create table #0Test (Test nvarchar(20))
GO
insert INTO #0Test (Test)
SELECT ''
UNION ALL
SELECT 0
UNION ALL
SELECT ''
Select *
from #0Test
Select *
from #0Test
Where test = 0
SELECT *
from #0Test
Where test = '0'
SELECT *
from #0Test
Where test = ''
drop table #0Test
The behavior you see is the one described in the product documentation. The rules of Data Type Precedence specify that int has higher precedence than nvarchar, therefore the comparison has to occur as an int type:
When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence
Therefore your query is actually as follows:
Select *
from #0Test
Where cast(test as int) = 0;
and the empty string N'' yields the value 0 when cast to int:
select cast(N'' as int)
-----------
0
(1 row(s) affected)
Therefore the expected result is the one you see, the rows with an empty string qualify for the predicate test = 0. Further proof that you should never mix types freely. For a more detailed discussion of the topic, see How Data Access Code Affects Database Performance.
You are implicitly converting the values to int with your UNION ALL statement.
Two empty strings and the integer 0 result in an int column, so each '' becomes 0. This happens BEFORE the insert into the nvarchar column, so the data type of the temp table is irrelevant at that point.
Try changing the second select in the UNION to:
SELECT '0'
And you will get the expected result.