I want to take a file from Azure Synapse and load it into ADLS using ADF. I want to read the last 13 months of data and create a separate file for each month.
I made a CSV file listing the start date and end date of each month and run a Lookup activity over this file. Then, using a ForEach activity, I load the file with a Copy activity.
Now I want to write a query for each month's data.
select * from edw.factbaldly where INSERT_DATE > #activity('Lookup1').output.value.startdate and INSERT_DATE < #activity('Lookup1').output.value.EndDate
select * from edw.factbaldly where INSERT_DATE > #item().startdate and INSERT_DATE < #item().EndDate
I use these two queries, but I am not able to read the Lookup activity's output and fetch the data.
Please help me with the query.
Thanks in advance.
I assume your Lookup1 CSV column headings are startdate and enddate.
In your ForEach > Settings > Items you will have @activity('Lookup1').output.value
Inside the ForEach block, your Copy activity source query will look like:
select * from edw.factbaldly where INSERT_DATE > '@{item().startdate}' and INSERT_DATE < '@{item().enddate}'
ADF substitutes each @{thing} with a string, so the dates arrive as quoted strings in the query.
You may also want one of the comparisons to be >= or <=.
In fact, you probably don't need to maintain the CSV at all: you can use a variable and the ADF functions utcnow(), addToTime() and startOfMonth() to compute the dates.
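If it helps to see the idea outside ADF, here is a rough Python sketch (the function name and output shape are mine, not anything ADF provides) that computes the same 13 month windows the CSV encodes, with exclusive end dates:

```python
from datetime import date

def month_windows(n_months, today=None):
    """Return (start, end_exclusive) date pairs for the last n_months
    calendar months, oldest first. `today` defaults to the current date."""
    today = today or date.today()
    # Walk backwards from the current month, collecting (year, month) pairs.
    year, month = today.year, today.month
    months = []
    for _ in range(n_months):
        months.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    windows = []
    for y, m in reversed(months):
        start = date(y, m, 1)
        # End is the first day of the next month (exclusive upper bound).
        end = date(y + 1, 1, 1) if m == 12 else date(y, m + 1, 1)
        windows.append((start, end))
    return windows
```

With exclusive end dates the per-month filter becomes `INSERT_DATE >= start AND INSERT_DATE < end`, which sidesteps the >= / <= question above.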
In the Lookup activity you will fetch @item().startdate and @item().EndDate, or rather, you have already set those in the Lookup before the ForEach. To use these details when you produce the new files, you must put the query from the question into the source of the Copy activity.
If you can't run the query directly against the file, you can import the whole file into a DB table and then use your query in the Copy activity source.
You can use an expression like this
@concat('select * from edw.factbaldly where INSERT_DATE > ''', item().startdate, ''' and INSERT_DATE < ''', item().EndDate, '''')
If I were you, I would add a Set Variable activity and test the above expression; the Set Variable output should show a syntactically correct T-SQL statement.
HTH
Related
I have this code in SAS and I'm trying to write the SQL equivalent. I have no experience in SAS.
data Fulls Fulls_Dupes;
set Fulls;
by name coeff week;
if rid = 0 and ^last.week then output Fulls_Dupes;
else output Fulls;
run;
I tried the following, but it didn't produce the same output:
Select * from Fulls where rid = 0 group by name, coeff, week
Is my SQL query correct?
SQL does not have a concept of observation order, so there is no direct equivalent of SAS's LAST. concept. If you have some variable that is monotonically increasing within the groups defined by distinct values of name, coeff, and week, then you can select the observation with the maximum value of that variable to find the LAST. observation.
For example, if you also had a variable named DAY that uniquely identified and ordered the observations in the same way as they currently exist in the FULLES dataset, then you could use the test DAY=MAX(DAY) to find the last observation. In PROC SQL you can use that test directly because SAS automatically remerges the aggregate value back onto all of the detail observations. In other SQL implementations you might need an extra query (or a window function) to get the max.
create table new_FULLES as
select * from FULLES
group by name, coeff, week
having day=max(day) or rid ne 0
;
SQL also has no concept of writing two datasets at once. But in this example, since the two generated datasets are disjoint and together contain all of the original observations, you can generate the second from the first using EXCEPT.
So once you have built the new FULLS, you can get FULLS_DUPES from the new FULLS and the old FULLS.
create table FULLS_DUPES as
select * from FULLES
except
select * from new_FULLES
;
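The same split can be checked against any SQL engine. Below is a small Python/sqlite3 sketch; the table and sample data are invented, `day` stands in for the hypothetical ordering variable, and a correlated MAX subquery replaces SAS's automatic remerge:

```python
import sqlite3

# Tiny in-memory stand-in for the FULLS dataset; `day` is the assumed
# ordering variable and `rid` matches the SAS logic.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fulls (name TEXT, coeff REAL, week INTEGER, day INTEGER, rid INTEGER);
INSERT INTO fulls VALUES
  ('a', 1.0, 1, 1, 0),   -- older duplicate: should land in fulls_dupes
  ('a', 1.0, 1, 2, 0),   -- last row of its group: kept
  ('b', 2.0, 1, 1, 5);   -- rid <> 0: always kept

CREATE TABLE new_fulls AS
SELECT * FROM fulls t
WHERE t.rid <> 0
   OR t.day = (SELECT MAX(u.day) FROM fulls u
               WHERE u.name = t.name AND u.coeff = t.coeff AND u.week = t.week);

-- Everything not kept is the duplicate set (SAS's Fulls_Dupes).
CREATE TABLE fulls_dupes AS
SELECT * FROM fulls EXCEPT SELECT * FROM new_fulls;
""")

print(con.execute("SELECT name, day, rid FROM fulls_dupes").fetchall())
```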
I have a lookup config table that stores the 1) source table and 2) list of variables to process, for example:
SQL Lookup Table:
tableA, variableX,variableY,variableZ <-- tableA has more than these 3 variables, i.e it has other variables such as variableV, variable W but they do not need to be processed
tableB, variableA,variableB <-- tableB has more than these 2 variables
Hence, I need to dynamically connect to each table and process the specific variables in each table. The processing step converts a julian date (stored as an integer) to a standard date (date format). Example SQL query:
select dateadd(dd, (variableX - ((variableX/1000) * 1000)) - 1, dateadd(yy, variableX/1000, 0)) FROM [dbo].[tableA]
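For reference, the dateadd arithmetic above treats the integer as years-since-1900 followed by a day-of-year (e.g. 119032 becomes 2019-02-01). A small Python sketch of the same conversion (the function name is mine), handy for sanity-checking values:

```python
from datetime import date, timedelta

def julian_to_date(j):
    """Convert a CYYDDD-style julian integer (years since 1900, then
    day-of-year) to a date, mirroring the T-SQL dateadd arithmetic."""
    years, day_of_year = divmod(j, 1000)
    return date(1900 + years, 1, 1) + timedelta(days=day_of_year - 1)
```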
The problem is that after setting up the Lookup and ForEach activities in ADF, I am unsure how to loop through the variable array (or string, since the SQL DB does not let me store array results) and convert all of these variables into the standard date format.
The result should be a processed dataset to be exported to a sink.
What would be the best way to achieve this in ADF?
Thank you!
I have reproduced this in my local environment. Please see the steps below.
Using a Lookup activity, first get the list of tables from the control table.
Pass the Lookup output to a ForEach activity.
Inside the ForEach activity, add another Lookup activity to get the variable list from the control table where the table name is the current item of the ForEach:
@concat('select table_variables from control_tb where table_name = ''', item().table_name, '''')
Convert the Lookup2 output value to an array using a Set Variable activity:
@split(activity('Lookup2').output.firstRow.table_variables, ',')
Create another pipeline (pipeline2) with two parameters (table name (string) and variables (array)) and add a ForEach activity in pipeline2.
Pass the array parameter to the ForEach activity in pipeline2 and use a Copy activity to copy data from source to sink.
In pipeline1, inside the ForEach activity, add an Execute Pipeline activity that calls pipeline2.
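The string-to-array step and the per-variable query construction can be mocked up in a few lines of Python (the table name, column list, and dateadd shape come from the question; the variable names are mine):

```python
# Stand-in for one row of the control table, as the lookup would return it.
table_name = "tableA"
table_variables = "variableX,variableY,variableZ"

# Mirror of ADF's split(): comma-separated string to array of column names.
variables = table_variables.split(",")

# One dateadd() conversion per variable, combined into a single SELECT,
# roughly the dynamic source query the copy activity would need.
conversions = [
    f"dateadd(dd, ({v} - (({v}/1000) * 1000)) - 1, dateadd(yy, {v}/1000, 0)) AS {v}"
    for v in variables
]
query = f"SELECT {', '.join(conversions)} FROM [dbo].[{table_name}]"
print(query)
```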
I'm trying to query my database to pull only duplicate/old data to write to a scratch section in Excel (using a macro passing SQL to the DB).
For now, I'm testing in Access alone to filter out only the old data.
First, I'm trying to filter my database by a specified WorkOrder, RunNumber, and Row.
The code below filters by WorkOrder, RunNumber, and Row... but SQL doesn't like it when I tack on the second AND clause, so this currently isn't working.
SELECT *
FROM DataPoints
WHERE (((DataPoints.[WorkOrder])=[WO2]) AND ((DataPoints.[RunNumber])=6) AND ((DataPoints.[Row]=1)
Once I figure that portion out...
Then, if there is only one entry with the specified WorkOrder, RunNumber, and Row, I want to filter it out (it's not needed in the scratch section, because its data is already written to the main section of my report).
If there are two or more entries with those criteria (WO, RN, and Row), then I want to filter out the newest entry based on RunDate and RunTime, and keep only the older entries.
For instance, in the clip below, the only item remaining in my filtered query would be the top entry with the timestamp 11:47:00 AM.
Are there any recommended commands to complete this problem? Any ideas are helpful. Thank you.
I would suggest something along the lines of the following:
select t.*
from datapoints t
where
t.workorder = [WO2] and
t.runnumber = 6 and
t.row = 1 and
exists
(
select 1
from datapoints u
where
u.workorder = t.workorder and
u.runnumber = t.runnumber and
u.row = t.row and
(u.rundate > t.rundate or (u.rundate = t.rundate and u.runtime > t.runtime))
)
Here, if the correlated subquery within the where clause finds a record with the same workorder, runnumber and row, but with either a later rundate or the same rundate and a later runtime, then the record is returned by the main query.
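The behaviour is easy to verify outside Access. A small Python/sqlite3 sketch, with sample data invented to match the question's 11:47 AM example and rundate/runtime stored as sortable text:

```python
import sqlite3

# Small in-memory stand-in for the DataPoints table; column names follow
# the answer's query.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE datapoints (workorder TEXT, runnumber INTEGER, row INTEGER,
                         rundate TEXT, runtime TEXT);
INSERT INTO datapoints VALUES
  ('WO2', 6, 1, '2021-01-05', '11:47:00'),
  ('WO2', 6, 1, '2021-01-05', '13:02:00'),
  ('WO2', 6, 2, '2021-01-05', '09:00:00');
""")

# Return only rows for which a newer row with the same key exists: the
# newest entry per key drops out, and single-entry keys return nothing.
old_rows = con.execute("""
  SELECT t.* FROM datapoints t
  WHERE t.workorder = 'WO2' AND t.runnumber = 6 AND t.row = 1
    AND EXISTS (SELECT 1 FROM datapoints u
                WHERE u.workorder = t.workorder
                  AND u.runnumber = t.runnumber
                  AND u.row = t.row
                  AND (u.rundate > t.rundate
                       OR (u.rundate = t.rundate AND u.runtime > t.runtime)))
""").fetchall()
print(old_rows)
```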
You need two more )'s at the end of your code snippet. Or you can delete the parentheses completely in this example; MS Access will add them back in as it deems necessary.
MS Access SQL can be tricky, as it is not standards-compliant: it either doesn't allow for very complex queries or needs an ugly workaround, like a parenthesis-nesting nightmare when trying to join more than two tables.
For these reasons, I suggest using multiple Access queries to produce your results.
Evening everyone,
I currently have a simple RecyclerView adapter populated from an SQLite database. The user can add information to the database from the app, which then builds a row inside the RecyclerView. When you run the application, it displays each row with its own date directly above it. I'm now looking to make the application look more professional by displaying a single date above multiple records as a header.
So far I've built two custom layouts: one displays the header along with the row, and the other is a standard row without a header. I also understand how to implement two layouts in a single adapter.
I've also added a column to my database that stores the date in a sortable format, e.g. 20190101.
My key question: when populating the adapter with the information from the SQLite database, how can I check whether the previous record has the same date? If it does, the row doesn't need the header layout; if the date is new, it does.
Thank you
/////////////////////////////////////////////////////////////////////////////
Follow-up question for Krokodilko: I've spent the last hour trying to work your implementation into my SQLite code, but still haven't found the right combination.
Below is the original SQLite line I currently use to simply fetch all the results.
Cursor cursor = sqLiteDatabase.rawQuery("SELECT * FROM " + Primary_Table + " " , null);
First you must define an order that determines which record is previous and which is next. As I understand it, you are simply using the date column.
Then the query is simple: use the LAG window function to pick a column value from the previous row. Here is a link to a simple demo (click the "Run" button):
https://sqliteonline.com/#fiddle-5c323b7a7184cjmyjql6c9jh
DROP TABLE IF EXISTS d;
CREATE TABLE d(
d date
);
insert into d values ( '2012-01-22'),( '2012-01-22'),( '2015-01-22');
SELECT *,
lag( d ) OVER (order by d ) as prev_date,
CASE WHEN d = lag( d ) OVER (order by d )
THEN 'Previous row has the same date'
ELSE 'Previous row has different date'
END as Compare_status
FROM d
ORDER BY d;
In the demo above, the d column is used in the OVER (ORDER BY d) clause to determine the order of rows used by the LAG function.
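Since the question is about SQLite specifically: window functions require SQLite 3.25 or newer. Here is a quick Python/sqlite3 sketch running the same idea, where show_header is an invented flag the adapter could use to choose between the two layouts:

```python
import sqlite3

# Window functions such as LAG need SQLite >= 3.25 (bundled with most
# modern Python builds).
assert sqlite3.sqlite_version_info >= (3, 25, 0)

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE d (d TEXT);
INSERT INTO d VALUES ('2012-01-22'), ('2012-01-22'), ('2015-01-22');
""")

# Compare each row's date with the previous row's date; show the header
# layout only where show_header is 1.
rows = con.execute("""
  SELECT d,
         CASE WHEN d = lag(d) OVER (ORDER BY d) THEN 0 ELSE 1 END AS show_header
  FROM d ORDER BY d
""").fetchall()
print(rows)
```

Note that the first row's LAG is NULL, so the comparison is not true and the first row correctly gets a header.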
It is important to apply calculations and business rules consistently across QlikView applications. We can store variables, connections, etc. in an external file and apply them across various QVWs.
Is there a standardized script for a time/calendar dimension that has practically everything you need regarding time, and which could be used across different QVWs without having to recreate it every time you develop a new QVW?
I would like something robust that has everything I need and that I can include in every QVW.
You can check Rob Wunderlich's Qlikview Components; there is a standard calendar function you can call.
You can also check his website, where there's a very good script I use each time I make a report. You can store the result of the script in a QVD and load it in every report you make.
It will look something like this:
MasterCalendar:
Load
TempDate AS OrderDate,
week(TempDate) As Week,
Year(TempDate) As Year,
Month(TempDate) As Month,
Day(TempDate) As Day,
'Q' & ceil(month(TempDate) / 3) AS Quarter,
Week(weekstart(TempDate)) & '-' & WeekYear(TempDate) as WeekYear,
WeekDay(TempDate) as WeekDay
;
//=== Generate a temp table of dates ===
LOAD
date(mindate + IterNo()) AS TempDate
,maxdate // Used in InYearToDate() above, but not kept
WHILE mindate + IterNo() <= maxdate;
//=== Get min/max dates from Field ===/
LOAD
AddYears(today(), -6) as mindate, // The first date you want
Today() as maxdate
AUTOGENERATE 1;
STORE MasterCalendar INTO 'Calendar.qvd' (qvd);
DROP TABLE MasterCalendar;
Two options:
You can create a .qvw that generates a calendar .qvd containing all of the date fields (say DayOfMonth, DayOfWeek, Month, etc.) as well as a Date field called, say, CalendarDate. Then in all of your .qvws you can left join your fact table against the data loaded from the .qvd through the CalendarDate field. For performance reasons I would not leave the calendar as a separate table if I can help it.
Alternatively, you can create a text file that holds the column definitions in a variable you can use as a macro, something like:
LET CalendarFields = '
Year($1) as DateYear,
Month($1) as DateMonthOfYear,
Week($1) as DateWeekOfYear,
Day($1) as DateDayOfMonth,
WeekDay($1) as DateDayOfWeek,
Date(MonthStart($1), ''YYYY MMM'') as DateMonthInYear,
Year(WeekStart($1)) & ''w'' & Num(Week($1), ''00'') as DateWeekInYear ';
You can include this file, say common.txt, with $(Must_Include=common.txt); or $(Include=common.txt);
Then in the load statement for your fact table you can expand the macro like this:
Facts:
LOAD
*,
$(CalendarFields(FactDateField));
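For comparison, the same derived fields can be prototyped in plain Python before wiring them into a QlikView script. A rough sketch (field names are illustrative; note that Python's isocalendar() weeks follow ISO 8601 and can differ from QlikView's Week()/WeekYear() at year boundaries):

```python
from datetime import date, timedelta

def calendar_rows(start, end):
    """Yield one row of calendar fields per day between start and end
    inclusive, a rough analogue of the MasterCalendar load above."""
    day = start
    while day <= end:
        yield {
            "Date": day.isoformat(),
            "Year": day.year,
            "Month": day.month,
            "Day": day.day,
            "Quarter": "Q%d" % ((day.month - 1) // 3 + 1),
            "WeekYear": "%02d-%d" % (day.isocalendar()[1], day.isocalendar()[0]),
            "WeekDay": day.strftime("%a"),
        }
        day += timedelta(days=1)
```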