Selecting rows based on latest timestamp

I am new to SQL and would appreciate some help with a question about tables.
I have a table with 3 columns. The first column acts as a common key, so a given key may match multiple rows. For each key, I would like to select the row with the latest timestamp, which is stored in column 2. Column 3 can hold different values.
Example:
Col1 Col2 Col3
some_name 12:5:12 1
some_name 12:6:12 0
some_name1 12:5:12 1
some_name1 12:6:12 0
some_name2 12:5:12 0
some_name2 12:6:12 1
Output:
Col1 Col2 Col3
some_name 12:6:12 0
some_name1 12:6:12 0
some_name2 12:6:12 1
I would like to do this in Apache Spark.

In Spark, I think I would go for row_number():
select t.*
from (select t.*,
             row_number() over (partition by col1 order by col2 desc) as seqnum
      from t
     ) t
where seqnum = 1;
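The row_number() query can be tried without a Spark cluster. Below is a minimal sketch that runs the same SQL against an in-memory SQLite database from Python; SQLite 3.25 or newer is assumed for window functions, and the zero-padded timestamps are an assumption made for illustration. Since col2 is compared as text here, values such as 12:5:12 would need zero-padding or a real timestamp type to sort correctly in general.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Spark SQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("some_name", "12:05:12", 1),
        ("some_name", "12:06:12", 0),
        ("some_name1", "12:05:12", 1),
        ("some_name1", "12:06:12", 0),
        ("some_name2", "12:05:12", 0),
        ("some_name2", "12:06:12", 1),
    ],
)

# Number the rows of each col1 group, newest col2 first, and keep row 1.
latest = conn.execute(
    """
    SELECT col1, col2, col3
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) AS seqnum
        FROM t
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY col1
    """
).fetchall()

for row in latest:
    print(row)
```

Because row_number() assigns exactly one row the number 1 per partition, this returns a single row per key even when two rows share the same latest timestamp.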

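Using Spark's window functions:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Rank the rows within each col1 group, newest col2 first
val w = Window.partitionBy("col1").orderBy(col("col2").desc)
df.withColumn("latestTS", row_number().over(w))
  .where(col("latestTS") === 1) // keep only the newest row per key
  .drop("latestTS")
  .show(false)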
+----------+-------+----+
|col1      |col2   |col3|
+----------+-------+----+
|some_name |12:6:12|0   |
|some_name1|12:6:12|0   |
|some_name2|12:6:12|1   |
+----------+-------+----+

This query may help:
select *
from t ta
where ta.col2 = (select MAX(col2) from t where col1 = ta.col1);
It returns the latest row for each col1 value.
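One thing worth knowing about the correlated MAX subquery is how it behaves on ties. Here is a small sketch, again using in-memory SQLite from Python as a stand-in; the table name t and the sample rows (including the tied timestamps for key b) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "12:05:12", 1),
        ("a", "12:06:12", 0),
        ("b", "12:06:12", 1),
        ("b", "12:06:12", 0),  # same latest timestamp as the previous row
    ],
)

# Keep every row whose col2 equals the maximum col2 of its col1 group.
rows = conn.execute(
    """
    SELECT ta.col1, ta.col2, ta.col3
    FROM t AS ta
    WHERE ta.col2 = (SELECT MAX(col2) FROM t WHERE col1 = ta.col1)
    ORDER BY ta.col1, ta.col3
    """
).fetchall()

for row in rows:
    print(row)
```

Key b comes back twice because both of its rows carry the maximum timestamp. If exactly one row per key is required, a row_number() filter is the safer choice.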

Related

Select query eliminating unwanted rows

I'm new to SQLite and I am having trouble finding the solution.
I have TABLE1 with columns col1 and col2
col1 col2
-------------
a no
a no
a yes
b no
c yes
c no
d yes
I want each col1 value to appear only once, giving priority to rows where col2 is "yes".
I want something like this:
col1 col2
-------------
a yes
b no
c yes
d yes

You may try the following:
Approach 1
You may use ROW_NUMBER to number the rows within each col1 group, ordered by col2 in descending order, and then keep only the first row of each group, e.g.
SELECT
    col1,
    col2
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY col1
            ORDER BY col2 DESC
        ) rn
    FROM
        my_table
) t
WHERE rn = 1;
col1 col2
-------------
a yes
b no
c yes
d yes
Approach 2
Alternatively, simply group by col1 and take the MAX of col2. Because 'yes' sorts after 'no' in string comparison, MAX(col2) returns yes for a group if any of its rows has yes, and no otherwise.
SELECT
    col1,
    MAX(col2) AS col2
FROM
    my_table
GROUP BY
    col1;
col1 col2
-------------
a yes
b no
c yes
d yes
View working demo on DB Fiddle
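For anyone who wants to check the GROUP BY / MAX approach locally, here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. It relies on 'yes' comparing greater than 'no' as text, so it would not carry over to other value pairs without checking their sort order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("a", "no"), ("a", "no"), ("a", "yes"),
        ("b", "no"),
        ("c", "yes"), ("c", "no"),
        ("d", "yes"),
    ],
)

# MAX on text picks 'yes' over 'no' because 'y' sorts after 'n'.
result = conn.execute(
    """
    SELECT col1, MAX(col2) AS col2
    FROM my_table
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

for row in result:
    print(row)
```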

ggordon's answer will work well enough, but since a window function isn't strictly necessary, I figured I'd offer another solution:
select distinct
    a.col1,
    ifnull(b.col2, 'no') col2
from my_table a
left join (
    select distinct
        col1,
        col2
    from my_table
    where col2 = 'yes'
) b on a.col1 = b.col1
Output:
| col1 | col2 |
| ---- | ---- |
| a | yes |
| b | no |
| c | yes |
| d | yes |

First, do a distinct select on col1. Then use a CASE expression, which is essentially an if statement in other languages. The CASE should return 'yes' if col2 is 'yes' for that col1 value, and 'no' otherwise. The general form looks like this:
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN conditionN THEN resultN
    ELSE result
END;
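The answer only gives the general CASE template, so here is one possible way to fill it in. This is a sketch rather than the answerer's exact query: it flags each 'yes' row with an inner CASE, takes the maximum flag per col1 with GROUP BY, and maps it back to 'yes' or 'no'. It runs on in-memory SQLite from Python with the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("a", "no"), ("a", "no"), ("a", "yes"),
        ("b", "no"),
        ("c", "yes"), ("c", "no"),
        ("d", "yes"),
    ],
)

# Inner CASE: 1 for a 'yes' row, 0 otherwise.
# Outer CASE: 'yes' if any row in the group was flagged, else 'no'.
case_rows = conn.execute(
    """
    SELECT col1,
           CASE
               WHEN MAX(CASE WHEN col2 = 'yes' THEN 1 ELSE 0 END) = 1 THEN 'yes'
               ELSE 'no'
           END AS col2
    FROM my_table
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

for row in case_rows:
    print(row)
```

Unlike MAX(col2), this version does not depend on how the two strings happen to sort.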

BigQuery - replicate rows with modified values

The title of the post might not accurately represent what I want to do. I have a BigQuery table with a userId column and a bunch of feature columns. Let's say the table is like this.
_____________________________
|userId| col1 | col2 | col3 |
-------|------|------|-------
|u1 | 0.3 | 0.0 | 0.0 |
|u2 | 0.0 | 0.1 | 0.6 |
-----------------------------
Each row has a userId (userIds may or may not be distinct across rows), followed by some feature values. Most of those are 0 except a few.
Now, for each of the rows, I want to create additional rows where only one non-zero feature is substituted with 0. With the example above, the resulting table would look like this.
_____________________________
|userId| col1 | col2 | col3 |
-------|------|------|-------
|u1 | 0.3 | 0.0 | 0.0 |
|u1 | 0.0* | 0.0 | 0.0 |
|u2 | 0.0 | 0.1 | 0.6 |
|u2 | 0.0 | 0.0* | 0.6 |
|u2 | 0.0 | 0.1 | 0.0* |
-----------------------------
Values with asterisk represent the columns for which the non-zero value was set to 0. Since u1 had 1 nonzero feature, only one additional row was added to it with col1 value set to 0. u2 had 2 non-zero columns (col2 and col3). As such, two additional rows were added, one with col2 set to 0 and the other with col3 set to 0.
The table has around 2000 columns and more than 20 million rows.
Normally, I post the crude attempts I could come up with. However, in this case, I don't even know where to start from. I did have one bizarre idea of joining this table with an unpivoted version of it. But, I don't know how to unpivot a BQ table.

The script below is for BigQuery Standard SQL.
It is generic enough that you don't need to specify column names or repeat the same chunk of code 2000 times!
It assumes that your initial data is in the table project.dataset.table.
#standardSQL
create temp table flatten as
with temp as (
  select userid, offset,
    split(col_kv, ':')[offset(0)] as col,
    cast(split(col_kv, ':')[offset(1)] as float64) as val
  from `project.dataset.table` t,
  unnest(split(translate(to_json_string(t), '{}"', ''))) col_kv with offset
  where split(col_kv, ':')[offset(0)] != 'userid'
), numbers as (
  select * from unnest((
    select generate_array(1, max(offset))
    from temp)) as grp
), targets as (
  select userid, grp from temp, numbers
  where grp = offset and val != 0
), flatten_result as (
  select *, 0 as grp from temp union all
  select userid, offset, col, if(offset = grp, 0, val) as val, grp
  from temp left join targets using(userid)
)
select * from flatten_result;
execute immediate '''create temp table pivot as
select userid, ''' || (
  select string_agg(distinct "max(if(col = '" || col || "', val, null)) as " || col)
  from flatten
) || ''' from flatten group by userid, grp''';
select * from pivot order by userid;
Your final output is in the temp table pivot.

One method is brute force:
select userid, col1, col2, col3
from t
union all
select userid, 0 as col1, col2, col3
from t
where col1 <> 0
union all
select userid, col1, 0 as col2, col3
from t
where col2 <> 0
union all
select userid, col1, col2, 0 as col3
from t
where col3 <> 0;
This is verbose, and it becomes convoluted with hundreds of columns. I can't readily think of a simpler method.
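With many columns, the brute-force statement does not have to be typed by hand; it can be generated. The sketch below builds the UNION ALL text in Python, with one extra branch per feature column that zeroes that column for the rows where it is non-zero, and checks it on in-memory SQLite with the two sample rows. The helper name build_zeroing_query is hypothetical, and whether a statement with roughly 2000 branches is practical in BigQuery is not something this sketch verifies.

```python
import sqlite3


def build_zeroing_query(table, id_col, feature_cols):
    """Return SQL for the original rows plus one zeroed copy per non-zero feature."""
    all_cols = ", ".join([id_col] + feature_cols)
    parts = [f"SELECT {all_cols} FROM {table}"]
    for target in feature_cols:
        # Replace only the target column with 0; pass the others through.
        select_list = ", ".join(
            [id_col]
            + [f"0.0 AS {c}" if c == target else c for c in feature_cols]
        )
        parts.append(f"SELECT {select_list} FROM {table} WHERE {target} <> 0")
    return "\nUNION ALL\n".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (userid TEXT, col1 REAL, col2 REAL, col3 REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("u1", 0.3, 0.0, 0.0), ("u2", 0.0, 0.1, 0.6)],
)

sql = build_zeroing_query("t", "userid", ["col1", "col2", "col3"])
rows = sorted(conn.execute(sql).fetchall())

for row in rows:
    print(row)
```

For the sample data this yields five rows: the two originals, one extra row for u1, and two extra rows for u2.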

SQL - Create a formatted output with placeholder rows

Because of constraints from our IT department, I am stuck doing this entirely within an SQL query.
Simplified, I have this as an input table:
And I need to create this:
And I am just not sure where to start with this. In my normal C# way of thinking it's easy: Column1 is ordered, so if the value in Column1 is new, add a new row to the output and put the contents of Column1 in it. Then, while the contents of the input Column1 are unchanged, keep adding the contents of Column2 to new rows.
In SQL... nope, I just cannot see the right way to start!

This is a presentation issue that can easily be handled in the application or presentation layer. In SQL it can be clunky. The goal of a database is not to render a UI but to store and retrieve data quickly and efficiently, in order to serve as many clients as possible under the same hardware and software resource constraints.
The query that could do this can look like:
with
y as (
  select col1, row_number() over(order by col1) as r1
  from (select distinct col1 as col1 from t) x
),
z as (
  select
    t.col1, y.r1, t.col2,
    row_number() over(partition by t.col1 order by t.col2) as r2
  from t
  join y on y.col1 = t.col1
)
select col1, col2
from (
  select col1, null as col2, r1, 0 as r2 from y
  union all
  select null, col2, r1, r2 from z
) w
order by r1, r2
As you see, it looks clunky and bloated.

You need a header row for each group which will consist of col1 and null and all the rows of the table with null as col1.
You can do it with UNION ALL and conditional sorting:
select
  case when t.col2 is null then t.col1 end col1,
  t.col2
from (
  select col1, col2 from tablename
  union all
  select distinct col1, null from tablename
) t
order by
  t.col1,
  case when t.col2 is null then 1 else 2 end,
  t.col2
See the demo (written for MySQL, but it is standard SQL).
Results:
| col1 | col2 |
| ---- | ----- |
| SetA | |
| | BH101 |
| | BH102 |
| | BH103 |
| SetB | |
| | BH201 |
| | BH202 |
| | BH203 |
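Here is a runnable sketch of the UNION ALL approach, using Python's sqlite3 module as a stand-in database. The input rows are an assumption reconstructed from the results table, since the question's original input is not shown; the table name tablename follows the query in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        ("SetA", "BH101"), ("SetA", "BH102"), ("SetA", "BH103"),
        ("SetB", "BH201"), ("SetB", "BH202"), ("SetB", "BH203"),
    ],
)

# The second branch adds one header row (col2 NULL) per distinct col1.
# The ORDER BY puts each header first within its group, then the detail rows.
rows = conn.execute(
    """
    SELECT CASE WHEN t.col2 IS NULL THEN t.col1 END AS col1,
           t.col2
    FROM (
        SELECT col1, col2 FROM tablename
        UNION ALL
        SELECT DISTINCT col1, NULL FROM tablename
    ) AS t
    ORDER BY t.col1,
             CASE WHEN t.col2 IS NULL THEN 1 ELSE 2 END,
             t.col2
    """
).fetchall()

for row in rows:
    print(row)
```

NULL values come back as None in Python, so a header row prints as ('SetA', None) and a detail row as (None, 'BH101').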

I agree, formatting should be done outside of SQL, but if you have no choice, here is some SQL Server code that will generate your output:
select *
from (
    select top 100
        case
            when col2 is null then ' '+col1
            else '' end as firstCol,
        IsNull(col2,'') as Col2
    from dbo.test t1
    group by col1,col2 with rollup
    order by col1,col2
) x
where x.firstcol is not null

Update column value in all rows of a table on mod(rownum,10) = number

I have a table tab1 that looks like this:
col1 | col2 | col3
------|------|------
abc | 100 | text
abc | 100 | text
abc | 100 | text
... | ... | ...
I need to update col2 value in each row like this:
update tab1
set col2 = 1,23
when mod(rownum,10) = 1;
update tab1
set col2 = 12,34
when mod(rownum,10) = 2;
update tab1
set col2 = 123,45
when mod(rownum,10) = 3;
and so on, up to mod(rownum,10) = 9.
But obviously this query doesn't work, and the reason is that rownum always returns 1 in this situation, afaik. However, I've got the correct last digits for each row number with select mod(rownum,10) as lastDig from tab1 query. But I don't understand how to use the result of this select for my update when conditions.
Could you please provide an example of a query that will do the job in this situation? Do I need to use a subquery or select in a temporary table? Please explain. I'm a junior frontend guy, but I need to create a demo table this way. I believe, pl/sql is v10, as well as PL/SQL Developer.
Result wanted looks like this:
col1 | col2 | col3
------|-------|------
abc | 1.23 | text
abc | 12.34 | text
abc | 123.45| text
... | ... | ...

You could use a CASE expression or DECODE:
update tab1
set col2 = CASE mod(rownum,10) WHEN 1 THEN 1.23
                               WHEN 2 THEN 12.34
                               WHEN 3 THEN 123.45
                               -- ...
                               ELSE col2
           END
-- WHERE ...
;

UPDATE tab1
SET col2 = DECODE(mod(rownum,10), 1, 1.23, 2, 12.34, 3, 123.45, ..., col2)
-- WHERE ...;
DBFiddle Demo

You have not told us whether there is a specific order in which the rows should be treated as 1, 2, 3, and so on. If there is such an order, ROWNUM is unreliable and may not work; you would need row_number() with an explicit ORDER BY column. That can be combined with a MERGE statement.
MERGE INTO tab1 tgt USING (
    SELECT
        t.rowid AS rid,
        CASE mod( ROW_NUMBER() OVER(
                ORDER BY
                    col1 -- the column which is in order and unique
            ),10)
            WHEN 1 THEN 1.23
            WHEN 2 THEN 12.34
            WHEN 3 THEN 123.45
            --..
            --.. 9
            ELSE col2
        END AS col2
    FROM
        tab1 t
)
src ON ( tgt.rowid = src.rid ) --use primary key/unique key if there is one instead of rowid
WHEN MATCHED THEN UPDATE SET tgt.col2 = src.col2;
Demo
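To illustrate the idea of numbering rows in a defined order and mapping the last digit to a value, here is a small sketch on in-memory SQLite from Python. It is not Oracle syntax: SQLite has no ROWNUM or DECODE, so a correlated subquery over ROW_NUMBER() is used instead of MERGE, and the id column is a hypothetical unique key added for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 REAL, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO tab1 (id, col1, col2, col3) VALUES (?, 'abc', 100, 'text')",
    [(i,) for i in range(1, 13)],
)

# Number the rows by id, then pick the new col2 from the last digit of that number.
# Digits without a WHEN branch keep the existing col2 value.
conn.execute(
    """
    UPDATE tab1
    SET col2 = (
        SELECT CASE numbered.rn % 10
                   WHEN 1 THEN 1.23
                   WHEN 2 THEN 12.34
                   WHEN 3 THEN 123.45
                   ELSE tab1.col2
               END
        FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
            FROM tab1
        ) AS numbered
        WHERE numbered.id = tab1.id
    )
    """
)

rows = conn.execute("SELECT id, col2 FROM tab1 ORDER BY id").fetchall()
for row in rows:
    print(row)
```

The correlated subquery is fine for a small demo table, though it may be slow on large tables; the MERGE form in this answer is the Oracle approach.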

How to retrieve 2nd latest date from a table

I am trying to retrieve second latest date from a table. For example, consider this as my table:
COL1| COL2| COL3
---------------------
A | 1 | 25-JUN-14
B | 1 | 25-JUN-14
C | 1 | 25-JUN-14
A | 1 | 24-JUN-14
B | 1 | 24-JUN-14
C | 1 | 24-JUN-14
A | 1 | 23-JUN-14
B | 1 | 23-JUN-14
C | 1 | 23-JUN-14
I came up with this query, which gets the result I want (the 2nd latest date).
SELECT sub.COL1, sub.COL2, MAX(sub.COL3)
FROM (SELECT t.COL1, t.COL2, t.COL3
      FROM test t
      GROUP BY t.COL1, t.COL2, t.COL3
      HAVING MAX(t.COL3) < (
          SELECT MAX(COL3)
          FROM test sub
          WHERE sub.COL1=t.COL1 AND sub.COL2=t.COL2
          GROUP BY COL1, COL2)) sub
GROUP BY sub.COL1, sub.COL2;
As you can see, it's a big and messy statement with multiple nested subqueries just to get the 2nd latest date. I would love to learn an elegant solution for my problem rather than this mess. Appreciate your help.. :)
PS: I am not allowed to use 'WITH' command.. :(

If I understand correctly, you can do:
select t.*
from (select t.*,
             dense_rank() over (order by col3 desc) as seqnum
      from test t
     ) t
where seqnum = 2;
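Here is a sketch of the dense_rank() idea on in-memory SQLite from Python. Two things are assumptions on my part: the dates are stored as ISO strings (2014-06-24) so that text ordering matches date ordering, and a PARTITION BY col1, col2 is added to mirror the question's GROUP BY, which gives the second latest date per group rather than across the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col1 TEXT, col2 INTEGER, col3 TEXT)")
dates = ["2014-06-25", "2014-06-24", "2014-06-23"]
conn.executemany(
    "INSERT INTO test VALUES (?, 1, ?)",
    [(name, d) for d in dates for name in "ABC"],
)

# DENSE_RANK gives tied dates the same rank without gaps,
# so rank 2 is always the second distinct date in each group.
rows = conn.execute(
    """
    SELECT col1, col2, col3
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (
                   PARTITION BY col1, col2
                   ORDER BY col3 DESC
               ) AS seqnum
        FROM test AS t
    ) AS ranked
    WHERE seqnum = 2
    ORDER BY col1
    """
).fetchall()

for row in rows:
    print(row)
```

No WITH clause is needed, which fits the restriction mentioned in the question.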

You can try it like this:
SELECT col1, col2, MAX(col3)
FROM TEST
WHERE col3 < (SELECT MAX(col3)
              FROM TEST)
GROUP BY col1, col2;
Sql Fiddle Demo
