Get data in the last three months using talend (Big Data Hive)

Get data in the last three months using talend (Big Data Hive) - hive

I have a query to get all data from big data hive as source using talend
this is the query i usually use:
SELECT
bd_bt_xyz.xllnis05_timestamp,
bd_bt_xyz.xllnis05_key,
.
. (too many field)
.
bd_bt_xyz.xln_cr_in_un_bl_dt,
bd_bt_xyz.date_pr
FROM newmisplus2.bd_bt_llnis05
LIMIT 1000000
And from now i need to modified the query to get only data in the last three months in talend and i still can't figured out how to do it.
*NOTE : field bd_bt_xyz.date_pr is a date of data creation.

Use filter:
where bd_bt_xyz.date_pr >= add_months(current_date, -3)
Something like this in Talend:
"select
...
where bd_bt_xyz.date_pr >= '" +TalendDate.addDate(TalendDate.getDate("yyyy-MM-dd"),"yyyy-MM-dd",-3,"MM")+ "'"

Related

Azure Data Factory -> Lookup , ForEach and copy activity

I want to take a file from the azure synapse and load it in ADLs using ADf. I want to read the data of the Last 13 months and make a different file for each month.
I made a CSV file where I wrote the start date and end date of each month and make a lookup activity over this file. Then using foreach activity, I load the file from the copy activity.
Now I want to write a query for each month's data.
select * from edw.factbaldly where INSERT_DATE > #activity('Lookup1').output.value.startdate and INSERT_DATE < #activity('Lookup1').output.value.EndDate
select * from edw.factbaldly where INSERT_DATE > #item().startdate and INSERT_DATE < #item().EndDate'
I use these to queries but not able to read the data of lookup activity and fetch the data.
Please help me with the query.
Thanks in advance.

I assume your Lookup1 CSV column headings are startdate and enddate
In your ForEach > Settings > Items you will have #activity('Lookup1').output.value
Inside the ForEach block, your Copy activity Source query will look like:
select * from edw.factbaldly where INSERT_DATE > '#{item().startdate}' and INSERT_DATE < '#{item().enddate}'
ADF will substitute #{thing} with a string so you'll get the dates as quoted strings in the query
Maybe also you want one of the signs as >= or <= ?
In fact you probably don't need to maintain the CSV because you can use a variable and ADF functions utcnow(), addToTime() and startOfMonth() to find the dates

In the lookup activity you will fetch the #item().startdate and #item().EndDate. Or I guess you have already set those in the lookup before ForEach. But to use this details when you produce new files, you must use the query from the question in source part of the Copy Activity.
If you can't use the query directly on the file, you can import the whole file to DB table and then use your query in the copy activity source.

You can use an expression like this
#concat('select * from edw.factbaldly where INSERT_DATE> >',item().startdate,'and INSERT_DATE <',item().EndDate)
If I where you , i could have added a set variable activity and tested the above expression . The set variable should give us a syntactically correct TSQL statement .
HTH

BigQuery can use wildcard table names and table_suffix, but I am looking for a similar solution like wildcard datasets and dataset_suffix

So if you process data daily and put the results into the same dataset, such as results, and each day will have the same table name (first part) and with date as table_suffix, such as result1_20190101, result1_20190102 etc., they you query the result tables use wildcard table names and table_suffix.
So your dataset/tables looks like
results/result1_20190101
results/result1_20190102
results/result2_20190101
results/result2_20190102
So I can query all the result1
select * from `xxxx.results.result1*`
But I arrange the results tables differently. Due to I have dozens tables processed each day. so to easily check and manage each day results. I use date as dataset.
so my dataset/tables look like
20190101/result1
20190101/result2
...
20190102/result1
20190102/result2
...
And my daily data process usually will not query cross dates(datasets). the daily results are pushed to next step data pipelines etc.
But once a while, I need to do some quick check, and I need to query across the dates(in my case, across the datasets)
so when I try to query result1, I have to hard code the dataset name.
select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...
1) First question is, are there anyway I could use wildcards and suffix on datasets, like we could with tables?
2) Second question: how could I use the date function, such as DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) to get the date value and use the data value in the below query
select * from `xxxxxx.20190101/result1`
union all
select * from `xxxxxx.20190102/result1`
union all
...
to replace the hard coded value, 20190101, 20190102 etc.

There is no wildcards and/or suffix available on BigQuery datasets (at least as of today)
Meantime, you can check a feature request for INFORMATION_SCHEMA that is in Alpha now. You can apply for it by submitting form that is available there.
In short: you will be able to query list of datasets in the projects and then use it to construct your query. Please note - you still need to use some sort of client to script all this properly

MS Access: Selecting Records from 2 Tables Without Foreign Key

I'm tracking my driving habits in MS Access 2016. I have a table called Miles:
In my Miles table, I'm recording information from my car's dash at the end of each drive.
I also have a 2nd table (actually a query) called Fuel:
My Fuel query shows when I purchased fuel and for how much.
I want to create a query that shows that shows the greatest Transaction_Date that is less than or equal to each Miles_Date. My expected output would look something like this:
I tried the following Select statement:
SELECT
Miles.Miles_ID,
DMax("[Transaction_Date]", "Fuel", "[Fuel]![Transaction_Date] <= [Miles]![Miles_Date]") AS Fuel_Date,
Miles.Miles_Date, Miles.Miles, Miles.MPG
FROM
Miles;
I get the error:
Microsoft Access cannot find the name [Miles]![Miles_Date]

When using a domain aggregate, you need to use string concatenation to pass values from the current row, like this:
SELECT
Miles.Miles_ID,
DMax("[Transaction_Date]", "Fuel", "[Fuel].[Transaction_Date] <= #" & Format(Miles.Miles_Date, "yyyy-mm-dd") & "#") AS Fuel_Date,
Miles.Miles_Date, Miles.Miles, Miles.MPG
FROM
Miles;
However, using a domain aggregate in a query is a bad practice, since it limits the influence of the optimizer. When possible, use a subquery instead:
SELECT
Miles.Miles_ID,
(SELECT Max([Transaction_Date]) FROM Fuel WHERE [Fuel].[Transaction_Date] <= Miles.Miles_Date) AS Fuel_Date,
Miles.Miles_Date, Miles.Miles, Miles.MPG
FROM
Miles;
This will both run faster, and not rely on string concatenation.

VBA Access Query for Day Summary

I use this forum all the time for VBA help but this is the first time I have to post something myself.
I am trying to make a report that provides a summary of various alarms stored in Access. I want to provide a simple Count of each alarm, each day. I have used some SQL queries but not really any Access. I took the fact that Access can do Pivot tables from Access itself. If there is a better way, please let me know.
Set CommandQuery.activeConnection = conn
commandQuery.CommandText = _
"TRANSFORM Count(FixAlarms.[Alm_NativeTimeLast]) AS CountOfAlm_NativeTimeLast " & _
"SELECT FixAlarms.Alm_Tagname, FixAlarms.Alm_Desc " & _
"FROM FixAlarms " & _
"WHERE ((FixAlarms.Alm_Tagname) <> """")) AND FixAlarms.Alm_NativeTimeIn > CellTime " & _
"GROUP BY FixAlarms.[Alm_Tagname], FixAlarms.Alm_Descr " & _
"PIVOT Format([Alm_NativeTimeIn],""Short Date"")"
rec.Open commandQuery
This is the code I am using. I had to retype it, so please forgive any typo. It does most of what I want but it does not give me any indication of what day each column is. I need a header on each column in case there were no alarms one day. I think the answer lies within the IN part of the PIVOT but I can't get it to work without syntax errors. I thought all I had to do was add on
PIVOT Format([Alm_NativeTimeIn],""Short Date"") IN 01/20/15"
Please help if you can.
Thanks.

In order to get the records for all day, even those where there were no activity you need to create these days. The simplest way to do so in access is to use a set of UNION statements to create a fake table for the days similar to this:
SELECT #2015-01-20# as dt FROM dual
UNION ALL
SELECT #2015-01-21# as dt FROM dual
UNION ALL
SELECT #2015-01-22# as dt FROM dual
If you try the above query in Access it will not work, as there is no table called dual. You will have to create it. Check this SO question.
After you created the above query you can LEFT JOIN it with the source table.
TRANSFORM Count(FixAlarms.[Alm_NativeTimeLast]) AS CountOfAlm_NativeTimeLast
SELECT FixAlarms.Alm_Tagname, FixAlarms.Alm_Desc
FROM
(SELECT #2015-01-20# as dt FROM dual
UNION ALL
SELECT #2015-01-21# as dt FROM dual
UNION ALL
SELECT #2015-01-22# as dt FROM dual) as dates LEFT JOIN
FixAlarms ON DateValue(FixAlarms.[Alm_NativeTimeIn]) = dates.dt
WHERE ((FixAlarms.Alm_Tagname) <> """")) AND FixAlarms.Alm_NativeTimeIn > CellTime
GROUP BY FixAlarms.[Alm_Tagname], FixAlarms.Alm_Descr
PIVOT Format(dates.dt, 'Short Date')
EDIT: I must add that this is not the only way of achieving it. Another way is to use a Numbers table. Create a table called Numbers with a single numeric column n and fill it with numbers 0 to 100 (depends on the maximum number of days you wish to include into your query). Then your query for the dates will be:
SELECT DateAdd('d', n, #2015-01-20#) as dt FROM numbers where n < 30;
And the resulting query will be:
TRANSFORM Count(FixAlarms.[Alm_NativeTimeLast]) AS CountOfAlm_NativeTimeLast
SELECT FixAlarms.Alm_Tagname, FixAlarms.Alm_Desc
FROM
(SELECT DateAdd('d', n, #2015-01-20#) as dt FROM numbers where n < 30) as dates LEFT JOIN
FixAlarms ON DateValue(FixAlarms.[Alm_NativeTimeIn]) = dates.dt
WHERE ((FixAlarms.Alm_Tagname) <> """")) AND FixAlarms.Alm_NativeTimeIn > CellTime
GROUP BY FixAlarms.[Alm_Tagname], FixAlarms.Alm_Descr
PIVOT Format(dates.dt, 'Short Date')

When using PIVOT columnName IN (ValueList) ValueList is
In parentheses
In quotes
Comma separated
So you're
PIVOT Format([Alm_NativeTimeIn],""Short Date"") IN 01/20/15"
Needs to become
PIVOT Format([Alm_NativeTimeIn],""Short Date"") IN (""01/20/15"")
With that said, this will not filter your records using PIVOTS in IN statement. You need to use the WHERE clause still.
If the end goal is to represent your data left to right then this will work. It will be a lot of extra work to make this work as a report though because your controls will not be bound to predictable columns. The Column names will change for different parameters.
You could leave this as a traditional query (not pivoted) and have a much easier time reporting it. If you are showing users the grid directly or exporting to Excel then this is not a problem.

So, I just wanted to add a header to my pivot table that would tell me what date the particular column was for.
The part of the code that I did not show was that I was using a rec.getrows to move all of my data into a simpler array variable. While this had all the data from Access, it did not have any headers to inform me what was a tagname, what was a description, and what was which date.
I found that in the recordset itself under fields.item(n) there was a Name attribute. This name told me where the column data came from or the date of the data. Using this and a simple day(date) function, I was able to make my monthly report summarizing all of the alarms.
Thanks for your help guys, but I either was not clear in my description of the problem or it was being over thought.

SQL query to produce a time x day grid from a list of timestamps?

Structures of my tables are as follow.
Table Name : timetable
timetable http://www.4shared.com/download/MYafV7-6ce/timetableTable.png
Table Name : slot_table
timetable http://www.4shared.com/download/9Lp_CBn2ba/slot_table.png
Table Name : instructor(this table is not required for this particular problem)
I want to show the resultant data in my android app in a timetable format somewhat like this:
random http://www.4shared.com/download/oAGiUXVAba/random.png
Question : What query i should write so that subjects of particular days with respective slots will be the result of the query?
1)The days should be in order like monday,tuesday,wednesday.
2)If monday has 2 subjects in 2 different slots then it should display like this :
Day 7:30-9:10AM 9:20-11:00AM
Monday Android Workshop Operating System
This is just a sample.
P.S:As timetable format is required,all the subjects with slot ids of all the days(monday to saturday) must be there in it.
Edit :
I tried this
select day,subject,slot from timetable,slot_table where timetable.slotid = slot_table.slotid
which gave a result :
a http://www.4shared.com/download/uMU7NA8Oce/random1.png
But i want it in a timetable format which i am not having an idea how to do that.
Edit :
Timetable sample format is something like this :
Edit :
I wrote a query
select timetable.day,count(slot_table.subject) as no_of_classes from timetable,slot_table where timetable.slotid = slot_table.slotid group by timetable.day
which resulted in
a http://www.4shared.com/download/rZW20_g8ce/random2.png
So now it shows monday has 2 classes in 2 slots,Tuesday has 1 class in 1 slot and so on.
Now any help on a query which can show the two slots(timings) on monday?
Solution :
select timetable.day,max(case when (slot='7:30-9:10AM') then slot_table.subject END) as "7:30-9:10AM",max(case when (slot='9:20-11:00AM') then slot_table.subject END) as "9:20-11:00AM",max(case when (slot='11:10-12:50PM') then slot_table.subject END) as "11:10-12:50PM",max(case when (slot='1:40-3:20PM') then slot_table.subject END) as "1:40-3:20PM", max(case when (slot='3:30-5:00PM') then slot_table.subject END) as "3:30-5:00PM" from timetable join slot_table on timetable.slotid = slot_table.slotid group by timetable.day
Result :
a http://www.4shared.com/download/1w7Tyicfce/random3.png

What you want is called a PIVOT query. In one of these, you have a select which gives the data in rows, like your result just under the EDIT (Day, subject, slot). Then you need to specify the values of the row you want to 'pivot' to become columns (slot in this example). Because a Pivot relies on the values of the column to be pivoted it can be difficult to write a general query, and the Postgres Wiki has an example using dymanic SQL and lots of code generating it at http://wiki.postgresql.org/wiki/Pivot_query
In your case, given that slots look like they're fixed and you might be able to hard-code them (that's a decision you'll have make yourself).
NB I am not a Postgres user, but it looks like it can do it (and I would have been very surprised if it couldn't).

This is a pivot or crosstab query. PostgreSQL has only limited support for these via the crosstab function in the tablefunc module.
It can sometimes be better to just deal with this in the application, accumulating the data into a table as you read each data point.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas