Analysis to be carried out in SQL

I have a database which captures, for every day, each column's data as binary 0 or 1 in a SQL database. Now I want to calculate the average number of days a user takes to complete that column. How can I write a query for that?
Please note the timeline for every user is different, and I want the data in 3-day slabs (for example 1-3, 4-6, and so on).
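A sketch of one way to do it, assuming a hypothetical table user_progress(user_id, activity_date, col_done), where col_done is the daily 0/1 flag for the column in question; DATEDIFF here is SQL Server syntax, and other databases have their own date arithmetic:

    WITH completion AS (
        SELECT user_id,
               MIN(activity_date) AS first_day,
               MIN(CASE WHEN col_done = 1 THEN activity_date END) AS done_day
        FROM   user_progress
        GROUP  BY user_id
    ),
    days_taken AS (
        SELECT user_id,
               DATEDIFF(day, first_day, done_day) + 1 AS days_to_complete
        FROM   completion
        WHERE  done_day IS NOT NULL            -- users who never completed are excluded
    ),
    slabbed AS (
        SELECT (days_to_complete - 1) / 3 AS slab   -- integer division: 1-3 -> 0, 4-6 -> 1, ...
        FROM   days_taken                           -- use FLOOR(...) where / is not integer division
    )
    SELECT slab * 3 + 1 AS slab_start,   -- 1, 4, 7, ...
           slab * 3 + 3 AS slab_end,     -- 3, 6, 9, ...
           COUNT(*)     AS users_in_slab
    FROM   slabbed
    GROUP  BY slab
    ORDER  BY slab;

The overall average would just be AVG(1.0 * days_to_complete) over days_taken.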

Related

Oracle SQL Plus - Query to fetch data in a specific point in time

I have a situation where I have to pull data from an Oracle database at different points in time during a day, but there is a condition.
For example, after running the query in the morning at 11 AM, the query fetched 2 records.
When I re-run the same query at 2 PM, it should return records excluding the ones fetched in the 11 AM execution.
I am playing around with createdinfo.time and timestamps, but I am not able to get the results in a single query.
I don't even know if it is possible to get the data filtered as described with a single query, or whether I have to use 2 different queries for the 2 time intervals.
Any suggestions are appreciated.
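One common shape for this, sketched with assumed names (my_table and created_time standing in for the real table and the createdinfo.time column), is a one-row watermark table that records when the last pull ran:

    SELECT t.*
    FROM   my_table t
    CROSS JOIN last_run r
    WHERE  t.created_time >  r.last_run_time   -- skip rows the 11 AM run already returned
    AND    t.created_time <= SYSTIMESTAMP;

    UPDATE last_run SET last_run_time = SYSTIMESTAMP;   -- advance after a successful pull

The same single query then works at 11 AM, 2 PM, or any other time, because the watermark carries the state between runs.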

Double or triple timestamp issue

I am using SQL Assistant, and my data brings in snapshots from a huge database in the form of timestamps. Occasionally there are multiple snapshots per hour. The data is correct; multiple snapshots do happen within an hour from time to time.
I am bringing this into Spotfire and viewing by the hour, and when more than one snapshot happens in the hour, the data shows as doubled.
I only want to display one per hour, preferably the last (max) timestamp for the hour. Example: for the 7 AM hour, the data has a snapshot for 7:10 AM and one for 7:55 AM.
These are correct, but I only want to display the last (max) timestamp, 7:55 AM in this case. I can't figure the issue out in Spotfire, so I am leaning towards a fix in SQL. How can I display only one row per hour?
You'd do this similarly to how you'd probably do it in SQL -- using a ranking/rownumber function.
The basic way Rank in Spotfire works is Rank(Order columns, order direction, partitioned columns, tie method)
You need to partition by the combination of Date and Hour, and then sort descending by your timestamp column.
So the code to identify the rows that you want to isolate should be something along the lines of:
Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first")
What you do with it from here will depend on how you plan to use the data. For example, you can Limit Data Using Expression and set the expression above = 1, which will limit your table accordingly (helpful if you don't want your users to accidentally forget to filter), or you can create a calculated column which turns it into a flag of some kind, like here:
If(Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first") = 1, "Latest", "Duplicate")
This allows your users to filter by the flag, so they still have the option to look at the extra rows.
Ultimately, though, if you only ever want to see these rows and have no use for the earlier records, I'd probably do it in SQL if you have that ability. This reduces the number of rows you have to load into your analytic.
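If you do push it down to SQL, the usual shape is a ROW_NUMBER() window query. A sketch with assumed names (a snapshots table with a snapshot_ts column); the day/hour expressions vary by database, e.g. Oracle would use a single TRUNC(snapshot_ts, 'HH') partition expression instead:

    SELECT *
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY CAST(s.snapshot_ts AS DATE),      -- the calendar day
                                EXTRACT(HOUR FROM s.snapshot_ts)  -- the hour bucket
                   ORDER BY s.snapshot_ts DESC                    -- latest first
               ) AS rn
        FROM   snapshots s
    ) ranked
    WHERE  rn = 1;   -- keep only the last snapshot in each hour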

Pentaho Data Integration issue with loading a kettle based on a condition

I have a Pentaho Data Integration job which has the following steps:
Generate Rows step which has an initial date (e.g. 2010-01-01) and the limit set to 10*366 = 3660 rows for 10 years.
Next step has an incrementer to increment the number of days.
Next step uses this information viz. initial date, limit, and the incrementer, to generate dates for each day for 10 years starting 2010-01-01 using javascript functions.
Final step loads a table with the generated dates.
All this works fine.
Now, I have a requirement where I do not want this table to be static with dates for 10 years only. If the max date in the date table is 2 years from today, I want to load dates for 10 more years in the table.
For the above example, with the 1st load loading dates for 10 years from 2010, I should be able to load 10 more years in 2018, the next 10 years in 2028 and so on and so forth.
What will be the best way to achieve this?
How can I:
1) Read the max date from my date table? - I know how to do this.
2) Use the read date to compare against today, and if the max date is within 2 years from today, populate the table with the next 10 years?
I don't know how to do step 2 in Pentaho Data Integration. I will really appreciate any pointers on a way to resolve this issue.
You need to read the current date (today) into a field, for example with a Get System Info step.
Then you can compare the two fields, max date and today, with a Filter Rows step.
As the previous step may give you more than one row, you need to use either a Unique Rows step (no field to provide) or a Group By step (no group field).
If any row gets through, you launch your generate-10-years process. As you cannot have a hop from a step into that second Generate Rows, you must use a Transformation Executor to launch your currently existing transformation.
Now, if your requirement gets even a little more complex than that, I strongly suggest using jobs to orchestrate your transformations.
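If the check itself is easier to express in SQL, a Table Input step could run a single query that returns a row only when a top-up is needed. The names here (date_dim, calendar_date) are assumptions, and the interval syntax varies by database:

    SELECT MAX(calendar_date) AS max_date
    FROM   date_dim
    HAVING MAX(calendar_date) < CURRENT_DATE + INTERVAL '2' YEAR;

When the query returns a row, the downstream hop fires and the existing transformation can generate the next 10 years starting from max_date + 1.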

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans) | worksite_code_fk (nullable)
I'm really struggling with how to wrap my head around recurring events. I could create, say, a year's worth of availability days following a pattern in an 'x'-day cycle. But how far ahead of time do we store information? I can see us running into problems when we reach the end of the data set.
I was thinking of storing, say, 6 months of information, then adding a server-side task that runs monthly to keep the tables topped up with 6 months of data, but my intuition is telling me this is a bad fix.
For absolute flexibility in the future, and to keep the data from bloating, my first thought would be something like:
Calendar Dimension Table - make it cover 100 years or whatever you want; include day-of-week information etc.
Time Dimension Table - hour, minutes, every 15 minutes, whatever granularity you need, but only for a 24-hour period.
Shifts Table - 1 record per shift, e.g. Day, Evening, and Night.
Specific Availability Table - relationship to Calendar & Time with starts and stops; I recommend 1 record per day, so even if they choose a range of 7 days, split that into 1 record per day and 1 record per shift.
Recurring Availability Table - for day of week (1-7), month, week of year, whatever you can think of. Again, 1 record per value, so if they are available Mondays and Tuesdays that would be 2 rows, and if there are multiple shifts it would be multiple rows.
Now, and here is perhaps the odd part, I would put an Available column on the Specific and Recurring Availability tables, maybe make it a tinyint, and store something like 0 = not available, 1 = available, 2 = maybe available, 3 = available with notice.
If you want to take Availability with Notice into account, you could add columns for that too, such as x number of days. If you want full flexibility, maybe that becomes a related table too.
The queries would be complex, but you could use a stored procedure or a table-valued function to handle them fairly routinely.
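A minimal DDL sketch of that design (every name and type here is an assumption, the time dimension is omitted for brevity, and exact types and date literals vary by database), plus the shape of the lookup query:

    CREATE TABLE calendar_dim (
        calendar_date DATE PRIMARY KEY,
        day_of_week   SMALLINT NOT NULL          -- 1-7
    );

    CREATE TABLE shift (
        shift_id   SMALLINT PRIMARY KEY,
        shift_name VARCHAR(10) NOT NULL          -- Day, Evening, Night
    );

    CREATE TABLE specific_availability (
        employee_id   INT      NOT NULL,
        calendar_date DATE     NOT NULL REFERENCES calendar_dim,
        shift_id      SMALLINT NOT NULL REFERENCES shift,
        available     SMALLINT NOT NULL,         -- 0=no, 1=yes, 2=maybe, 3=with notice
        PRIMARY KEY (employee_id, calendar_date, shift_id)
    );

    CREATE TABLE recurring_availability (
        employee_id INT      NOT NULL,
        day_of_week SMALLINT NOT NULL,           -- 1-7
        shift_id    SMALLINT NOT NULL REFERENCES shift,
        available   SMALLINT NOT NULL,
        PRIMARY KEY (employee_id, day_of_week, shift_id)
    );

    -- "Is employee 123 available for shift 1 on a given date?"
    -- A specific entry overrides the recurring pattern.
    SELECT COALESCE(sa.available, ra.available, 0) AS available
    FROM   calendar_dim c
    LEFT JOIN specific_availability sa
           ON  sa.calendar_date = c.calendar_date
           AND sa.employee_id   = 123            -- hypothetical ids
           AND sa.shift_id      = 1
    LEFT JOIN recurring_availability ra
           ON  ra.day_of_week = c.day_of_week
           AND ra.employee_id = 123
           AND ra.shift_id    = 1
    WHERE  c.calendar_date = DATE '2018-06-01';

The COALESCE order makes a specific entry win over the recurring pattern, and since every availability change is an ordinary row edit, change data capture can pick it up.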

MS Access latest date issue

I have 2 Excel files at work: in one I maintain the rates of assets and the dates when the rates were issued, and the other has the list of assets and the dates when they were sold.
So one excel file has the following columns:
Asset------Rate------Rate_Issued_On
1. X-------1500------21-Apr-2014
2. X-------2000------28-Aug-2013
3. Z-------2200------11-Jan-2014
4. X-------3000------1-Jan-2014
The other excel file has (let's suppose):
Asset-----Sold_Date
1. X------1-Dec-2013
2. Z------12-Mar-2014
Now, since the sold date of Asset X (1-Dec-2013) lies between 28-Aug-2013 and 1-Jan-2014, it should take the rate of 2000. If, for example, the sold date was 22-Apr-2014, it should take the rate 1500. If the sold date is 27-Aug-2013, it should display a blank record. So basically the rate should correspond to the latest issue date that comes before the sold date.
I can easily get this working in Excel, but the file has now become so large that it runs very slowly, so I want to move this into MS Access. Is this possible? (I am a novice in MS Access, so kindly go a little easy on me.)
Thanks
Yes - a few simple queries can match up the data the way you want. If your two tables are called Rates and Sales, you can use two queries to get the results you need. The first query would use the Sales and Rates tables to find the largest Rate_Issued_On that is less than the Sold_Date, and the second query would match this back to the Rates table to get the rate for that date.
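In Access SQL, the two queries might look like this; the Sales and Rates column names follow the example above, qryLatestRate is a hypothetical saved-query name, and the inequality join has to be typed in SQL view because the query designer cannot display it. The first query (saved as qryLatestRate) finds the latest issue date before each sale:

    SELECT s.Asset, s.Sold_Date, MAX(r.Rate_Issued_On) AS Latest_Issue
    FROM Sales AS s LEFT JOIN Rates AS r
         ON (s.Asset = r.Asset AND r.Rate_Issued_On < s.Sold_Date)
    GROUP BY s.Asset, s.Sold_Date;

The second query joins that result back to Rates to pick up the rate itself; a sale with no qualifying rate keeps a blank Rate, as required:

    SELECT q.Asset, q.Sold_Date, r.Rate
    FROM qryLatestRate AS q LEFT JOIN Rates AS r
         ON (q.Asset = r.Asset AND r.Rate_Issued_On = q.Latest_Issue);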
A very similar problem is described in How to use another table fields as a criteria for MS Access