Leading a value into another row based on scenario - sql

I want to lead a row's value into another row, depending on the scenario.
Here is my input in Hive table:
Output:
There should be only one entry per second. The Lat-Lang, V1 and V2 column values should be taken from the latest millisecond within that second that has a valid (non-null) value.
Please suggest a windowing function or Spark API to achieve this.
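The input and output tables are not shown, so this is a minimal sketch under an assumed schema: a hypothetical table `readings(ts_ms, lat, lng, v1, v2)` with a millisecond timestamp. It runs the SQL through Python's built-in `sqlite3` (window functions need SQLite 3.25 or later) only so it can be tried locally. For each column it takes `FIRST_VALUE` within the one-second partition, ordered so that non-null values come first and, among those, the latest millisecond wins. Spark SQL supports the same window functions; there, `/` returns a double, so use integer division (e.g. `ts_ms div 1000`) for the second bucket.

```python
import sqlite3

# Hypothetical column names standing in for Lat, Lang, V1, V2
COLUMNS = ["lat", "lng", "v1", "v2"]


def pick_latest_non_null(col):
    # Nulls sort last (1 after 0), then the newest millisecond first,
    # so the first value of each one-second partition is the latest non-null one.
    return (f"FIRST_VALUE({col}) OVER (PARTITION BY ts_ms / 1000 "
            f"ORDER BY CASE WHEN {col} IS NULL THEN 1 ELSE 0 END, ts_ms DESC) AS {col}")


def latest_non_null_per_second(rows):
    """rows: (ts_ms, lat, lng, v1, v2) tuples; returns one row per second."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts_ms INTEGER, lat REAL, lng REAL, v1 INTEGER, v2 TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
    # Every row in a partition gets the same picked values, so DISTINCT
    # collapses each second to a single entry.
    query = (f"SELECT DISTINCT ts_ms / 1000 AS sec, "
             f"{', '.join(pick_latest_non_null(c) for c in COLUMNS)} "
             f"FROM readings ORDER BY sec")
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1000, 10.0, 20.0, None, "a"),
        (1200, None, None, 5, None),
        (1900, 11.0, 21.0, None, None),
        (2100, None, None, 7, "b"),
        (2500, 12.0, 22.0, None, None),
    ]
    for row in latest_non_null_per_second(sample):
        print(row)
```

Each column is picked independently, so V1 for a given second may come from a different millisecond than Lat-Lang, which matches the "latest millisecond having a valid value" rule per column.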

Related

Forward fill in spark SQL based on column value condition

Can someone please help me forward-fill values in a CASE statement based on another column's value in Spark SQL?
I am trying to detect outliers in the dataset; so far I have identified them as values that lie many standard deviations away from the dataset's mean.
Now, wherever these outliers occur, I have to fill a new column with the last valid (authentic) value.
For example, after 1 in the first column I want to put 556 in the third column, and for 3 in the first column I want to put 561 in the third column.
So far I have identified the outliers, and I am guessing I can use the lag function to go back one row. But I also know this is not a good approach: if I get 10 outliers in a row, I would have to write 10 CASE statements.
If anyone has a better or more efficient approach, please help.
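A common way to avoid one CASE per outlier is the "running count of valid values" trick, sketched below under assumptions: a hypothetical table `readings(id, value, is_outlier)` where the outlier flag has already been computed, and Python's `sqlite3` used only so the SQL can be run locally (SQLite 3.25+ for window functions). Outliers are nulled out; `COUNT` ignores nulls, so a running count only increases on valid rows, and each valid row starts a group that absorbs the outliers after it. Taking `MAX` of the valid value per group then fills any run length of outliers. In Spark SQL, `last(valid_value, true)` (ignore nulls) over a `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` window is a shorter alternative.

```python
import sqlite3


def fill_last_valid(rows):
    """rows: (id, value, is_outlier) tuples; returns (id, value, last_valid)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, value INTEGER, is_outlier INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    query = """
        WITH flagged AS (
            -- Null out the outliers so only valid values remain
            SELECT id, value,
                   CASE WHEN is_outlier = 1 THEN NULL ELSE value END AS valid_value
            FROM readings
        ), grouped AS (
            -- COUNT skips nulls, so this running count only increases on valid rows
            SELECT id, value, valid_value,
                   COUNT(valid_value) OVER (ORDER BY id
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
            FROM flagged
        )
        -- Each group holds exactly one valid value followed by its outliers
        SELECT id, value, MAX(valid_value) OVER (PARTITION BY grp) AS last_valid
        FROM grouped
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical data: rows 2, 4 and 5 are flagged as outliers
    data = [(1, 556, 0), (2, 9999, 1), (3, 561, 0), (4, -500, 1), (5, 8000, 1), (6, 570, 0)]
    for row in fill_last_valid(data):
        print(row)
```

If the very first row is an outlier, its group has no valid value and `last_valid` stays null.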

SQL query - How to update subsequent rows by summing up the current row's value in a single SELECT query (need to avoid a WHILE loop)

The logic we are trying to achieve in a single query is as follows:
1. Loop over the rows in order of the row number column. In each iteration, add the remaining value and the new value; store the result in the "by summing up" column and its decimal (fractional) part in the decimal value column.
2. Sum the decimal value column, grouped by row number, and store the result in the remaining value column of the next row number.
3. Repeat steps 1-2 until the last record is reached.
We achieved this with a WHILE loop, but we are trying to do it without one.
Can someone please suggest an approach?
Please refer to the attached image for the table layout.
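The attached image is not available, so this is only a sketch under assumptions: one row per row number (aggregate first if there are several), hypothetical columns `rn` and `new_value`, and non-negative values. Under those assumptions the loop has a closed form. Because frac(frac(a) + b) = frac(a + b) (the two arguments differ by the integer floor(a)), the decimal part carried into row n is simply the fractional part of the running total of `new_value` over rows 1 to n-1, so a running `SUM(...) OVER (ORDER BY rn)` can replace the WHILE loop. The sketch runs the SQL through Python's `sqlite3` (SQLite 3.25+ for window functions) and compares it with a direct loop. `CAST(x AS INTEGER)` truncates, which equals FLOOR only for non-negative x. The sample values are exact in binary floating point so both versions can be compared with `==`.

```python
import sqlite3


def carry_with_window(values):
    """values: new_value per row number (1-based); returns
    (rn, remaining_value, summed_value, decimal_value) per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (rn INTEGER, new_value REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(values, start=1)))
    query = """
        WITH running AS (
            SELECT rn, new_value,
                   SUM(new_value) OVER (ORDER BY rn
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum
            FROM t
        )
        SELECT rn,
               -- fractional part of the running total before this row
               (cum - new_value) - CAST(cum - new_value AS INTEGER) AS remaining_value,
               (cum - new_value) - CAST(cum - new_value AS INTEGER) + new_value AS summed_value,
               -- fractional part of the running total including this row
               cum - CAST(cum AS INTEGER) AS decimal_value
        FROM running
        ORDER BY rn
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def carry_with_loop(values):
    """Reference version of the row-by-row loop described in the question."""
    out, remaining = [], 0.0
    for rn, new_value in enumerate(values, start=1):
        summed = remaining + new_value
        decimal = summed - int(summed)
        out.append((rn, remaining, summed, decimal))
        remaining = decimal  # carried into the next row number
    return out


if __name__ == "__main__":
    vals = [1.5, 2.75, 0.5, 3.25]
    print(carry_with_window(vals) == carry_with_loop(vals))
```

If values can be negative, or if the real logic carries something other than the fractional part, the identity no longer applies and a recursive CTE is the usual set-based fallback.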

Tableau: Get the ids that contain only the selected values from another column

I have the following question!
I have a table like this:
Data Source
I want to create a field (I suppose it's a field) that returns the apl_ids
whose service_offered values are only the ones I want.
For example, in the table above, suppose I want the apl_ids that have ONLY the service_offered values
Pending 1, Pending 2 and Pending 7.
In that case, I want to get apl_id = "13", since apl_id = "12" has one more service that I don't need.
What is the best way to do that?
Thank you in advance!
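Add a calculated field that returns 1 for the desired values and 0 for all other values. Add a second calculated field with a FIXED LOD on apl_id that sums the first one, and keep only the ids where that sum is 3. Because an extra service (as for apl_id 12) scores 0 and does not change the sum, also require that the id has exactly 3 records (e.g. a FIXED LOD count of records per apl_id); this assumes each service appears only once per id. I think that should work.
If not, tell me and I will post screenshots.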
You can create a set based on the field apl_id, defined by the condition
max([service_offered]="Pending 1") and
max([service_offered]="Pending 2") and
max([service_offered]="Pending 7") and
min([service_offered]="Pending 1" or [service_offered]="Pending 2" or [service_offered]="Pending 7")
This set will contain the apl_ids that have at least one record where service_offered is "Pending 1", at least one with "Pending 2" and at least one with "Pending 7", and where every record has one of those three values (i.e. no others).
The key is to realize that Tableau treats True as greater than False, so min() and max() over boolean expressions correspond to every() and any().
Once you have the set of apl_ids, you can use it on shelves and in calculated fields in many different ways.
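Tableau itself can't be run here, so this is a small Python sketch of the same boolean logic on made-up rows modeled on the example ("Pending 9" is a hypothetical extra service for apl_id 12). Python also orders True above False, so `max()` over booleans behaves like any() and `min()` like all(), mirroring the set condition above.

```python
from collections import defaultdict

REQUIRED = {"Pending 1", "Pending 2", "Pending 7"}


def ids_with_only_required(rows):
    """rows: (apl_id, service_offered) pairs; mirrors the set condition."""
    by_id = defaultdict(list)
    for apl_id, service in rows:
        by_id[apl_id].append(service)
    result = []
    for apl_id, services in by_id.items():
        # max() over booleans acts like any(): each required service appears at least once
        has_all = all(max(s == req for s in services) for req in REQUIRED)
        # min() over booleans acts like all(): every record is one of the required services
        only_required = min(s in REQUIRED for s in services)
        if has_all and only_required:
            result.append(apl_id)
    return sorted(result)


if __name__ == "__main__":
    rows = [
        ("12", "Pending 1"), ("12", "Pending 2"), ("12", "Pending 7"), ("12", "Pending 9"),
        ("13", "Pending 1"), ("13", "Pending 2"), ("13", "Pending 7"),
    ]
    print(ids_with_only_required(rows))
```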

Datatable Compute Method filter on row number

I use a query which fetches, say, 50 records and passes them to a DataTable. These records are then displayed in tabular form with pagination, 10 records at a time. The user can move to the next or previous page, or forward or backward by one record.
I have to find the Min and Max of a column for the records currently visible. I am planning to use the Compute method, but I am not sure whether it allows filtering on anything other than the columns in the DataTable.
Do I have to include a row number in my query, or is there a better solution (something along the lines of the code below)?
CType(dtLineup.Compute("Min(ArrivalDate)", dt.row(2) to dt.row(12)), Date)
There is nothing like your pseudo code in the MSDN documentation for DataColumn.Expression. You could include a row number in your query, as you said, but an alternative is to add a row-number column to your DataTable and use it in the filter expression.
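// Add an auto-incrementing, 1-based row number column to the DataTable
DataColumn col = new DataColumn("rownumber", typeof(int));
col.AutoIncrement = true;
col.AutoIncrementSeed = 1;
datatable.Columns.Add(col);
// Aggregate over the visible page only (rows 3 to 12); assumes ArrivalDate is a DateTime column and the page is not empty
DateTime minArrival = (DateTime)datatable.Compute("Min(ArrivalDate)", "rownumber >= 3 AND rownumber <= 12");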
Another alternative is to page with LINQ (Skip/Take) and compute the aggregate over the returned rows, but that may be a major departure from your current application structure.
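In LINQ the paging alternative would be `Skip(offset).Take(10)` followed by `Min`/`Max`. The sketch below shows the same windowing with Python list slicing, where `offset` is a hypothetical index of the first visible record, so moving forward or backward by one record is just `offset + 1` or `offset - 1`, and moving a page is `offset ± page_size`.

```python
from datetime import date


def visible_min_max(arrival_dates, offset, page_size=10):
    """Equivalent of Skip(offset).Take(page_size) followed by Min/Max."""
    page = arrival_dates[offset:offset + page_size]
    if not page:
        return None  # nothing visible, so no min/max
    return min(page), max(page)


if __name__ == "__main__":
    dates = [date(2024, 1, d) for d in (5, 3, 9, 1, 7, 2, 8, 4, 6, 10, 30, 20)]
    print(visible_min_max(dates, 0))  # first page
    print(visible_min_max(dates, 2))  # moved forward by two records
```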

QlikView Set Analysis: use either column number or other unique info from row

I am trying to use Set Analysis in the table below for the column labelled test. I am trying to get sum([Best Dollar]) for the date range specified by the Start and End columns.
This expression returns results, but it's naturally static for each row of the table:
=sum({$<AsAtDate={">=40959 and <=40960"}>} [Best Dollar])/1000
This is what I want to have but it returns 0:
=Sum({$<AsAtDate={">=(num(floor(BroadcastWeekStart2))) and <=(num(floor(BroadcastWeekStart2)))+6"}>} [Best Dollar])/1000
To obtain a unique start-date serial number for each row of the start column (BroadcastWeekStart2), I use the following expression:
=(num(floor(BroadcastWeekStart2)))
How can I specify that the values or calculations used for the start and end columns are used in Set Analysis for the field above?
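Set analysis is evaluated once for the whole chart rather than once per row, so a set expression cannot pick up the BroadcastWeekStart2 value of the row currently being calculated. For clarity, this Python sketch (hypothetical data, not QlikView code) shows the per-row calculation being asked for: for each week start serial, sum Best Dollar over AsAtDate in [start, start + 6] and divide by 1000.

```python
def weekly_best_dollar(week_starts, facts):
    """week_starts: date serial numbers (one per chart row);
    facts: (AsAtDate serial, Best Dollar) pairs. Returns {start: sum / 1000}."""
    result = {}
    for start in week_starts:
        # Each row uses its own start, which is what a chart-level set cannot do
        total = sum(amount for as_at, amount in facts if start <= as_at <= start + 6)
        result[start] = total / 1000
    return result


if __name__ == "__main__":
    facts = [(40959, 1000.0), (40962, 500.0), (40965, 2500.0),
             (40966, 4000.0), (40970, 2000.0), (40973, 999.0)]
    print(weekly_best_dollar([40959, 40966], facts))
```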
At least one piece of information is missing from your question:
do you want to select on fixed values, or should the sum depend on the current date?
For the static version, something like
=sum( {$<BroadcastWeekStart2={">=40959<=40960"}>} [Best Dollar])/1000
should work (assuming that BroadcastWeekStart2 contains this kind of value).
Here is an example of how I calculate values for the current week:
='Count: ' & count({$<start_week={"$(=WeekStart(Today()))"}>} Games)
where the start_week is set in the load script by:
WeekStart(date#(#4, 'YYYYMMDD')) as start_week,
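Serial numbers such as 40959 are day counts: Qlik, like Excel, counts days from 30 December 1899, so 40959 is 20 February 2012 (a Monday). The Python sketch below mimics the load-script line and the current-week count, assuming weeks start on Monday and one row per game; the function names are hypothetical helpers, not Qlik functions.

```python
from datetime import date, datetime, timedelta

QLIK_EPOCH = date(1899, 12, 30)  # day 0 of Qlik/Excel date serial numbers


def to_serial(d):
    return (d - QLIK_EPOCH).days


def week_start(d):
    """Monday of the week containing d (assumes Monday-based weeks)."""
    return d - timedelta(days=d.weekday())


def start_week_from_field(raw):
    """Mimics WeekStart(date#(raw, 'YYYYMMDD')) from the load script."""
    return week_start(datetime.strptime(raw, "%Y%m%d").date())


def count_games_this_week(raw_dates, today):
    """Mimics count({$<start_week={"$(=WeekStart(Today()))"}>} Games), one row per game."""
    current = week_start(today)
    return sum(1 for raw in raw_dates if start_week_from_field(raw) == current)


if __name__ == "__main__":
    print(to_serial(date(2012, 2, 20)))
    print(count_games_this_week(["20120220", "20120224", "20120227"], date(2012, 2, 22)))
```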