Header row not appearing at the top of CSV while using Outputters - azure-data-lake

In a U-SQL query I am dumping data from one CSV file to another through the Outputters.Csv() function, but the header row with the column names appears at the end of the file instead of at the top. Please find my code below. Thanks for the help.
@telDataResult =
    SELECT
        "vin" AS vin,
        "outsideTemperature" AS outsideTemperature,
        "EventProcessedUtcTime" AS EventProcessedUtcTime,
        "PartitionId" AS PartitionId,
        "EventEnqueuedUtcTime" AS EventEnqueuedUtcTime,
        "IoTHub" AS IoTHub
    FROM @telData
    UNION
    SELECT
        t.vin,
        Convert.ToString(outsideTemperature) AS outsideTemperature,
        EventProcessedUtcTime,
        PartitionId,
        EventEnqueuedUtcTime,
        IoTHub
    FROM @telData AS t
    UNION
    SELECT
        t.vin,
        Convert.ToString(outsideTemperature) AS outsideTemperature,
        EventProcessedUtcTime,
        PartitionId,
        EventEnqueuedUtcTime,
        IoTHub
    FROM @telData1 AS t;

OUTPUT @telDataResult
TO @"wasb://blobcontainer@blobstorage.blob.core.windows.net/cluster/logs/2016/outputofADLA.csv"
USING Outputters.Csv();

When you use a native outputter, the individual rows are written in parallel by multiple vertices, so there is no guarantee of order. We are currently working on supporting the output of header rows natively. In the meantime, you can use our custom outputter that writes header rows to output files. The custom outputter can be found at https://github.com/Azure/usql/tree/master/Examples/HeaderOutputter. Using the HeaderOutputter, your code would look like the following:
@telDataResult =
    SELECT
        t.vin,
        Convert.ToString(outsideTemperature) AS outsideTemperature,
        EventProcessedUtcTime,
        PartitionId,
        EventEnqueuedUtcTime,
        IoTHub
    FROM @telData AS t;

OUTPUT @telDataResult TO <OutputFile>
USING new HeaderOutputter.HeaderOutputter(quoting: false);
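If your Azure Data Lake Analytics runtime is recent enough, the built-in outputter can also write the header itself through its outputHeader parameter, which avoids the custom outputter entirely. A minimal sketch, reusing the rowset and path from the question:

OUTPUT @telDataResult
TO @"wasb://blobcontainer@blobstorage.blob.core.windows.net/cluster/logs/2016/outputofADLA.csv"
USING Outputters.Csv(outputHeader: true); // writes the column names once, as the first row of the file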

Related

Extracting nested JSON using json_extract_array: getting null results on populated data fields?

I am attempting to extract the following configuration data from this nested json file (Image of JSON file), using this query:
SELECT
json_extract_scalar(f, '$.affiliationTypes') as affiliation_types
, json_extract_scalar(f, '$.organizationTypes') as organization_types
, json_extract_scalar(f, '$.verificationTypes') as verification_types
, json_extract_scalar(f, '$.notifierIds') as notifier_ids
, json_extract_scalar(f, '$.metadata') as metadata
, json_extract_scalar(body, '$.accountId') as account_id
FROM `my database`
left join unnest(json_extract_array(body, '$.request.config')) as f
LIMIT 100;
but I am getting null results for all of the selected config data. I've even filtered on accounts/requests where I know there is config data listed, and still nothing. Any ideas? (Many of these fields have data; I've just nulled them out for this post.)
As an example, I would like the first select statement to output: STUDENT_PART_TIME, EMPLOYEE, FACULTY, VETERAN, STUDENT_FULL_TIME, ACTIVE_DUTY
For those who don't want to open the image, this is the JSON chunk I'm interested in:
"body": {
"accountId":""
,"activeAffiliationTypes":[]
,"affiliations":[]
,"aggregateState":"OPEN"
,"birthMonth":null
,"birthYear":null
,"customerFacingConclusiveVerificationType":null
,"customerFacingRequestedAffiliationType":""
,"dataSourceErrorCodes":null
,"dataSourceResponseHttpStatusCategory":"UNKNOWN"
,"dataSourceResponseHttpStatusCode":-1
,"emailDomain":null
,"errorCodes":[]
,"errors":[]
,"issuedRewards":[]
,"metadata":{}
,"request": {
"accountId":""
,"active":false
,"assetMap":{}
,"config": {
"affiliationTypes":["STUDENT_PART_TIME","EMPLOYEE","FACULTY","VETERAN","STUDENT_FULL_TIME","ACTIVE_DUTY"]
,"assetTypes":[]
,"consolationRewardIds":[]
,"ignoreVerifierClasses":null
,"locale":"en_US"
,"metadata":{}
,"notifierIds":null
,"organizationTypes":[]
,"rewardIds":[]
,"testMode":false
,"verificationModelClass":""
,"verificationSourceClasses":[]
,"verificationTypes":["AUTHORITATIVE"]
}
,"created":null
The config values are arrays rather than scalars, which is why json_extract_scalar returns NULL for them; extracting the array instead works:
SELECT STRING(c)
FROM `my database`
LEFT JOIN UNNEST(
JSON_QUERY_ARRAY(body, '$.request.config.affiliationTypes')
) AS c
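To get the comma-separated list asked for in the question (STUDENT_PART_TIME, EMPLOYEE, ...), the array can be flattened back into a single string. A sketch, assuming body is stored as a JSON-formatted STRING column (as the question's use of json_extract_array suggests) and keeping the question's table name:

SELECT
  json_extract_scalar(body, '$.accountId') AS account_id,
  -- json_extract_array returns the elements as JSON strings;
  -- json_extract_scalar on each element strips the surrounding quotes,
  -- and ARRAY_TO_STRING joins them into one comma-separated value
  ARRAY_TO_STRING(
    ARRAY(
      SELECT json_extract_scalar(a, '$')
      FROM UNNEST(json_extract_array(body, '$.request.config.affiliationTypes')) AS a
    ),
    ', '
  ) AS affiliation_types
FROM `my database`
LIMIT 100;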

Convert DB2 MAX(DECODE(...)) to Hive

I have code that was written for DB2 and now need to rewrite it for Hive, but I am unable to find an equivalent for MAX(DECODE(...)) in Hive.
My current code:
SELECT
E.CHARGE_ARRANGEMENT_NUMBER ,
E.MIG_MAIN_PROD_CODE ,
MAX(DECODE(UPPER(E.MIG_EXTRA_ACTION), 'KEEP', E.MIG_EXTRA_ACTION)) AS EXTRAS_KEEP ,
MAX(DECODE(UPPER(E.MIG_EXTRA_ACTION), 'DROP', E.EXTRAS_LIST)) AS EXTRAS_DROP
FROM
EXTRA_MAPPINGS_PRE E
GROUP BY
E.CHARGE_ARRANGEMENT_NUMBER ,
E.MIG_MAIN_PROD_CODE
Hive has no DECODE in the DB2 sense, so you need to use IF or CASE ... WHEN ... THEN ... END:
SELECT
E.CHARGE_ARRANGEMENT_NUMBER ,
E.MIG_MAIN_PROD_CODE ,
MAX(IF(UPPER(E.MIG_EXTRA_ACTION) ='KEEP', E.MIG_EXTRA_ACTION,null) ) AS EXTRAS_KEEP ,
MAX(IF(UPPER(E.MIG_EXTRA_ACTION) ='DROP', E.EXTRAS_LIST,null)) AS EXTRAS_DROP
FROM
EXTRA_MAPPINGS_PRE E
GROUP BY
E.CHARGE_ARRANGEMENT_NUMBER ,
E.MIG_MAIN_PROD_CODE
You can use CASE ... WHEN if you are not comfortable with IF, as shown below.
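A sketch of the same query written with CASE ... WHEN, keeping the table and column names from the question:

SELECT
    E.CHARGE_ARRANGEMENT_NUMBER,
    E.MIG_MAIN_PROD_CODE,
    -- CASE with no ELSE yields NULL when no branch matches, so MAX keeps the matched value per group
    MAX(CASE WHEN UPPER(E.MIG_EXTRA_ACTION) = 'KEEP' THEN E.MIG_EXTRA_ACTION END) AS EXTRAS_KEEP,
    MAX(CASE WHEN UPPER(E.MIG_EXTRA_ACTION) = 'DROP' THEN E.EXTRAS_LIST END) AS EXTRAS_DROP
FROM
    EXTRA_MAPPINGS_PRE E
GROUP BY
    E.CHARGE_ARRANGEMENT_NUMBER,
    E.MIG_MAIN_PROD_CODE;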

Extracting data from JSON field in Amazon Redshift

I am trying to extract some data from a JSON field in Redshift.
Given below is a sample view of the data I am working with.
{"fileFormat":"excel","data":{"name":John,"age":24,"dateofbirth":1993,"Class":"Computer Science"}}
I am able to extract data for the first level namely data corresponding to
fileFormat and data as below:
select CONFIGURATION::JSON -> 'fileFormat' from table_name;
I am trying to extract information under data like name, age,dateofbirth
You could use Redshift's native function json_extract_path_text: https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html
SELECT
    json_extract_path_text(configuration, 'data', 'name') AS name,
    json_extract_path_text(configuration, 'data', 'age') AS age
    -- etc. for the remaining keys
FROM
    yourTable
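For completeness, a sketch of the full query against the sample record above, assuming the JSON is stored in a column named configuration as in the question (for that record, name would come back as John and age as 24):

SELECT
    -- each call walks the path data -> <key> and returns the value as text
    json_extract_path_text(configuration, 'data', 'name') AS name,
    json_extract_path_text(configuration, 'data', 'age') AS age,
    json_extract_path_text(configuration, 'data', 'dateofbirth') AS dateofbirth,
    json_extract_path_text(configuration, 'data', 'Class') AS class_name
FROM
    yourTable;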

Azure Stream analytics default field values for missing fields

I have some JSON values coming in from an IoT data source to Stream Analytics. They want to change the JSON in a later version to have extra fields, but older versions will not have these fields. Is there a way I can detect that a field is missing and set up a default value for it before it gets to the output? For example, they would like to add an e.OSversion which, if it did not exist, would default to "unknown". The output is a SQL database, as it happens.
WITH MetricsData AS
(
SELECT * FROM [MetricsData]
PARTITION BY LID
WHERE RecordType='UseList'
)
SELECT
e.LID as LID
,e.EventEnqueuedUtcTime AS SubmitDate
,CAST (e.UsedDate as DateTime) AS UsedDate
,e.Version as Version
,caUsedList.ArrayValue.Module AS Module
,caUsedList.ArrayValue.UsageCount AS UsedCount
INTO
[ModuleUseOutput]
FROM
Usagedata as e
CROSS APPLY getElements (e.UsedList) as caUsedList
Please use the CASE ... WHEN operator.
Example:
select j.id, case when j.version is null then 'unknown' else j.version end as version
from jsoninput as j
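Applied to the query in the question, the projection might look roughly like this (a sketch; it assumes the new field arrives as e.OSversion and that a missing field surfaces as NULL):

SELECT
    e.LID as LID
    ,e.EventEnqueuedUtcTime AS SubmitDate
    ,CAST (e.UsedDate as DateTime) AS UsedDate
    ,e.Version as Version
    -- OSversion only exists in newer payloads; default it when absent
    ,CASE WHEN e.OSversion IS NULL THEN 'unknown' ELSE e.OSversion END AS OSversion
    ,caUsedList.ArrayValue.Module AS Module
    ,caUsedList.ArrayValue.UsageCount AS UsedCount
INTO
    [ModuleUseOutput]
FROM
    Usagedata as e
CROSS APPLY getElements (e.UsedList) as caUsedList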
Or you could just set the default value on the SQL database column directly.

using multiple parameters in append query in Access 2010

I have been trying to get an append query to work, but I keep getting an error stating that 0 rows are being appended whenever I use more than one parameter in the query. This is for a
The table in question has one PK, which is a GUID [generating values with newid()], and one required field (Historical), which I am explicitly defining in the query.
INSERT INTO dbo_sales_quotas ( salesrep_id
, [year]
, territory_id
, sales_quota
, profit_quota
, product_super_group_uid
, product_super_group_desc
, class_9
, Historical
, sales_quotas_UID )
SELECT dbo_sales_quotas.salesrep_id
, dbo_sales_quotas.Year
, dbo_sales_quotas.territory_id
, dbo_sales_quotas.sales_quota
, dbo_sales_quotas.profit_quota
, dbo_sales_quotas.product_super_group_uid
, dbo_sales_quotas.product_super_group_desc
, dbo_sales_quotas.class_9
, dbo_sales_quotas.Historical
, dbo_sales_quotas.sales_quotas_UID
FROM dbo_sales_quotas
WHERE (((dbo_sales_quotas.salesrep_id)=[cboSalesRepID])
AND ((dbo_sales_quotas.Year)=[txtYear])
AND ((dbo_sales_quotas.territory_id)=[txtTerritoryID])
AND ((dbo_sales_quotas.sales_quota)=[txtSalesQuota])
AND ((dbo_sales_quotas.profit_quota)=[txtProfitQuota])
AND ((dbo_sales_quotas.product_super_group_uid)=[cboProdSuperGroup])
AND ((dbo_sales_quotas.product_super_group_desc)=[txtProductSuperGroupDesc])
AND ((dbo_sales_quotas.class_9)=[cboClass9])
AND ((dbo_sales_quotas.Historical)='No')
AND ((dbo_sales_quotas.sales_quotas_UID)='newid()'));
Even if I assign specific values, I still get a 0-rows error, except when I reduce the number of parameters to one (in which case it works perfectly, regardless of which parameter). I have verified that the parameters have the correct formats.
Can anyone tell me what I'm doing wrong?
Break out the SELECT part of your query and examine it separately. I'll suggest a simplified version which may be easier to study ...
SELECT
dsq.salesrep_id,
dsq.Year,
dsq.territory_id,
dsq.sales_quota,
dsq.profit_quota,
dsq.product_super_group_uid,
dsq.product_super_group_desc,
dsq.class_9,
dsq.Historical,
dsq.sales_quotas_UID
FROM dbo_sales_quotas AS dsq
WHERE
dsq.salesrep_id=[cboSalesRepID]
AND dsq.Year=[txtYear]
AND dsq.territory_id=[txtTerritoryID]
AND dsq.sales_quota=[txtSalesQuota]
AND dsq.profit_quota=[txtProfitQuota]
AND dsq.product_super_group_uid=[cboProdSuperGroup]
AND dsq.product_super_group_desc=[txtProductSuperGroupDesc]
AND dsq.class_9=[cboClass9]
AND dsq.Historical='No'
AND dsq.sales_quotas_UID='newid()';
I wonder about the last 2 conditions in the WHERE clause. Is the Historical field type bit instead of text? Does the string 'newid()' match sales_quotas_UID in any rows in the table?
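If those two conditions turn out to be the problem, a sketch of the simplified SELECT without them, assuming Historical is a Yes/No (bit) field and that sales_quotas_UID should not be filtered at all (the literal string 'newid()' will never equal a generated GUID):

SELECT dsq.*
FROM dbo_sales_quotas AS dsq
WHERE
    dsq.salesrep_id=[cboSalesRepID]
    AND dsq.Year=[txtYear]
    AND dsq.territory_id=[txtTerritoryID]
    AND dsq.sales_quota=[txtSalesQuota]
    AND dsq.profit_quota=[txtProfitQuota]
    AND dsq.product_super_group_uid=[cboProdSuperGroup]
    AND dsq.product_super_group_desc=[txtProductSuperGroupDesc]
    AND dsq.class_9=[cboClass9]
    AND dsq.Historical=False;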