I am having trouble querying some data. The table I am trying to pull the data from is a LOG table, where I would like to see changes in the values next to each other (example below)
Table:
+-----------+----+-------------+----------+------------+
| UNIQUE_ID | ID | NAME | CITY | DATE |
+-----------+----+-------------+----------+------------+
| xa220 | 1 | John Smith | Berlin | 2020.05.01 |
| xa195 | 1 | John Smith | Berlin | 2020.03.01 |
| xa111 | 1 | John Smith | München | 2020.01.01 |
| xa106 | 2 | James Brown | Atlanta | 2018.04.04 |
| xa100 | 2 | James Brown | Boston | 2017.12.10 |
| xa76 | 3 | Emily Wolf | Shanghai | 2016.11.03 |
| xa20 | 3 | Emily Wolf | Shanghai | 2016.07.03 |
| xa15 | 3 | Emily Wolf | Tokyo | 2014.02.22 |
| xa12 | 3 | Emily Wolf | null | 2014.02.22 |
+-----------+----+-------------+----------+------------+
Desired outcome:
+----+-------------+----------+---------------+
| ID | NAME | CITY | PREVIOUS_CITY |
+----+-------------+----------+---------------+
| 1 | John Smith | Berlin | München |
| 2 | James Brown | Atlanta | Boston |
| 3 | Emily Wolf | Shanghai | Tokyo |
| 3 | Emily Wolf | Tokyo | null |
+----+-------------+----------+---------------+
I have been trying to use FIRST_VALUE and LAST_VALUE, but I cannot get the desired outcome.
select distinct id,
name,
city,
first_value(city) over (partition by id order by city) as previous_city
from test
Any help is appreciated!
Thank you!
Use the LAG function to get the city for the previous date, and display only the rows where the current city and the result of LAG differ:
WITH cte AS (
SELECT t.*, LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") LAG_CITY
FROM yourTable t
)
SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY
FROM cte
WHERE
CITY <> LAG_CITY OR
(CITY IS NULL AND LAG_CITY IS NOT NULL) OR
(CITY IS NOT NULL AND LAG_CITY IS NULL)
ORDER BY
ID, "DATE" DESC;
Some comments on how LAG is used here and how its value is checked. We use the three-parameter version of LAG. The second parameter is the number of records to look back, which in this case is 1 (the default). The third parameter is the value to return when a record is the first in its ID partition and so has no previous record. Here that default is the record's own CITY value, so the first record of each partition compares equal to its own lag value (or both are NULL) and never appears in the result set.
For the WHERE clause above, a matching record is one where the city and the lag city differ, or where exactly one of the two is NULL. The extra NULL checks are needed because CITY <> LAG_CITY evaluates to NULL, not true, when either side is NULL, so on its own it would not treat a NULL city and a non-NULL city as different.
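To see the default value and the NULL checks in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. It assumes a SQLite build with window functions (3.25 or newer). The table name log_table, the column name log_date, and the rows for ID 4 are hypothetical additions for illustration. The Emily Wolf rows are left out because two of them share a date, which makes their relative order ambiguous.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical ID 4 whose first city is NULL.
rows = [
    ("xa220", 1, "John Smith", "Berlin", "2020.05.01"),
    ("xa195", 1, "John Smith", "Berlin", "2020.03.01"),
    ("xa111", 1, "John Smith", "München", "2020.01.01"),
    ("xa106", 2, "James Brown", "Atlanta", "2018.04.04"),
    ("xa100", 2, "James Brown", "Boston", "2017.12.10"),
    ("h2", 4, "Test User", "Oslo", "2019.02.01"),   # hypothetical row
    ("h1", 4, "Test User", None, "2019.01.01"),     # hypothetical row
]

def city_changes(data):
    """Return (id, name, city, previous_city) for rows where the city changed."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE log_table "
        "(unique_id TEXT, id INTEGER, name TEXT, city TEXT, log_date TEXT)"
    )
    con.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?, ?)", data)
    query = """
        WITH cte AS (
            SELECT id, name, city, log_date,
                   -- default = own city, so the first row per id never differs
                   LAG(city, 1, city) OVER (PARTITION BY id ORDER BY log_date) AS lag_city
            FROM log_table
        )
        SELECT id, name, city, lag_city
        FROM cte
        WHERE city <> lag_city
           OR (city IS NULL AND lag_city IS NOT NULL)
           OR (city IS NOT NULL AND lag_city IS NULL)
        ORDER BY id, log_date DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(city_changes(rows))
```

The hypothetical ID 4 shows both effects: its first row (NULL city) is dropped because its lag value defaults to its own NULL, while the Oslo row is kept only thanks to the explicit NULL check.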
I have a rather complicated riddle to solve, and so far I have been unlucky.
I have 3 tables which I need to join to get the result.
Most importantly, I need the highest h_id per p_id. h_id is a unique entry in the log history, and I need the newest one for a given point (p_id -> num).
Apart from that, I need ext and name as well.
history
+----------------+---------+--------+
| h_id | p_id | str_id |
+----------------+---------+--------+
| 1 | 1 | 11 |
| 2 | 5 | 15 |
| 3 | 5 | 23 |
| 4 | 1 | 62 |
+----------------+---------+--------+
point
+----------------+---------+
| p_id | num |
+----------------+---------+
| 1 | 4564 |
| 5 | 3453 |
+----------------+---------+
street
+----------------+---------+-------------+
| str_id | ext | name |
+----------------+---------+-------------+
| 15 | | Mein st. 33 | - bad name
| 11 | | eck st. 42 | - bad name
| 62 | abc | Main st. 33 |
| 23 | efg | Back st. 42 |
+----------------+---------+-------------+
EXPECTED RESULT
+----------------+---------+-------------+-----+
| num | ext | name |h_id |
+----------------+---------+-------------+-----+
| 3453 | efg | Back st. 42 | 3 |
| 4564 | abc | Main st. 33 | 4 |
+----------------+---------+-------------+-----+
I'm using Oracle SQL. I tried the query below, but the result is not correct.
SELECT num, max(name), max(ext), MAX(h_id) maxm FROM history
INNER JOIN street on street.str_id = history.str_id
INNER JOIN point on point.p_id = history.p_id
GROUP BY point.num
In Oracle, you can use keep:
SELECT p.num,
MAX(h.h_id) as maxm,
MAX(s.name) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as name,
MAX(s.ext) KEEP (DENSE_RANK FIRST ORDER BY h.h_id DESC) as ext
FROM history h INNER JOIN
street s
ON s.str_id = h.str_id INNER JOIN
point p
ON p.p_id = h.p_id
GROUP BY p.num;
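The KEEP syntax lets an aggregate act like "first()" or "last()": here, MAX is applied only to the rows that rank first when ordered by h_id descending, i.e. the rows with the highest h_id in each group.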
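For readers without an Oracle instance, the same "take the other columns from the row with the highest h_id" idea can be sketched with ROW_NUMBER(). This is a swapped-in technique, not the KEEP syntax itself, run here against in-memory SQLite (3.25 or newer assumed) from Python. The blank ext values in the question are stored as NULL, which is an assumption.

```python
import sqlite3

def latest_street_per_point():
    """Return (num, ext, name, h_id) for the highest h_id of each point."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE history (h_id INTEGER, p_id INTEGER, str_id INTEGER);
        CREATE TABLE point (p_id INTEGER, num INTEGER);
        CREATE TABLE street (str_id INTEGER, ext TEXT, name TEXT);
        INSERT INTO history VALUES (1, 1, 11), (2, 5, 15), (3, 5, 23), (4, 1, 62);
        INSERT INTO point VALUES (1, 4564), (5, 3453);
        INSERT INTO street VALUES
            (15, NULL, 'Mein st. 33'), (11, NULL, 'eck st. 42'),
            (62, 'abc', 'Main st. 33'), (23, 'efg', 'Back st. 42');
    """)
    query = """
        SELECT num, ext, name, h_id
        FROM (
            SELECT p.num, s.ext, s.name, h.h_id,
                   -- rank history rows per point, newest (highest h_id) first
                   ROW_NUMBER() OVER (PARTITION BY p.num ORDER BY h.h_id DESC) AS rn
            FROM history h
            JOIN street s ON s.str_id = h.str_id
            JOIN point p ON p.p_id = h.p_id
        )
        WHERE rn = 1
        ORDER BY num
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(latest_street_per_point())
```

Unlike the plain MAX(name), MAX(ext) attempt in the question, every column in a result row comes from the same history entry.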
I have some data in a data lake:
Person | Date | Time | Number of Friends |
Bob | 02/01 | unix_ts1 | 5 |
Kate | 02/01 | unix_ts2 | 2 |
Jill | 02/01 | unix_ts3 | 3 |
Bob | 02/01 | unix_ts3 | 7 |
Kate | 02/02 | unix_ts4 | 10 |
Jill | 01/29 | unix_ts0 | 1 |
I would like to produce a table like so:
Person | Date | Time | Number of Friends DELTA | Found Diff Between
Bob | 02/01 | unix_ts1 | NaN | (5, NaN)
Kate | 02/01 | unix_ts2 | NaN | (2, NaN)
Jill | 02/01 | unix_ts3 | 2 | (3, 1)
Bob | 02/01 | unix_ts3 | 2 | (7, 5)
Kate | 02/02 | unix_ts4 | 8 | (10, 2)
So, I have a table where each row is identified by a person's name and a time at which the data was recorded. I would like a query that will go and find instances of "Bob" and find deltas for consecutive timestamps and then give the delta, as well as the two values it found the diff between. I would like this to happen for each person.
I found a method to do this when there is just one value, using lag() command, but that would not do a match by Person. I also know how to do this in Pandas if I downloaded the data, but I am wondering if there is a way to do this in Hive.
Is there a way to do this? Thank you!
Use the lag window function, partitioned by person so that each row is compared only with that person's previous row:
select person
,date
,time
,num_friends-lag(num_friends) over(partition by person order by time) as delta
,concat_ws(',',cast(num_friends as string),cast(lag(num_friends) over(partition by person order by time) as string)) as found_diff_between
from tbl
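The effect of partition by person can be checked locally with a small sketch. It uses in-memory SQLite (3.25 or newer assumed) from Python, not Hive, so treat it as an illustration of the window logic only. The integers 0 to 4 stand in for the placeholders unix_ts0 to unix_ts4, and the column names day and ts are hypothetical.

```python
import sqlite3

# Rows from the question; the third field replaces unix_tsN with the integer N.
data = [
    ("Bob", "02/01", 1, 5),
    ("Kate", "02/01", 2, 2),
    ("Jill", "02/01", 3, 3),
    ("Bob", "02/01", 3, 7),
    ("Kate", "02/02", 4, 10),
    ("Jill", "01/29", 0, 1),
]

def friend_deltas(rows):
    """Return (person, ts, num_friends, previous, delta) per row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (person TEXT, day TEXT, ts INTEGER, num_friends INTEGER)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT person, ts, num_friends,
               LAG(num_friends) OVER (PARTITION BY person ORDER BY ts) AS previous,
               num_friends
                   - LAG(num_friends) OVER (PARTITION BY person ORDER BY ts) AS delta
        FROM tbl
        ORDER BY person, ts
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in friend_deltas(data):
    print(row)
```

The first row of each person has no previous value, so both previous and delta come back as NULL (None in Python), which corresponds to the NaN entries in the desired table.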
I have a table of notes related to orders from an old terminal system in Oracle 12c. Each order reference has several lines of notes, ordered by a sequence number.
I want to concatenate all of the relevant notes together for each order reference so that I can try to pull some address data out of it. The address data could be spread over several different sequence numbers. The structure is:
| SEQ | NOTE_TEXT | ORDER | ... |
|-----|--------------------------|-------|-----|
| 1 | The address for this | | |
| 2 | is 123 The Street, City, | | |
| 3 | County, Postcode | | |
| 1 | This customer has ordered| | |
| 2 | this product on date | | |
| 1 | Some other note | | |
| 1 | This order is for A Smith| | |
| 2 | The address is 4 The Lane| | |
| 3 | City, County, Postcode | | |
------------------------------------------------
What I would like to turn this into is:
|--------|---------------------------------------------------------------------------|
| ORDER | NOTE_TEXT |
|--------|---------------------------------------------------------------------------|
| ABC123 | The address for this is 123 The Street, City, County, Postcode |
| DEF456 | This customer has ordered this product on date |
| GHI789 | Some other note |
| JKL012 | This order is for A Smith The address is 4 A Lane, City, County, Postcode |
|--------|---------------------------------------------------------------------------|
It would probably be good to trim each note row before concatenating, but I also need to make sure that I put a space at the join between two rows, just in case someone has filled the full line with text. Oh, and the sequences are out of order, so I need to order by SEQ first too.
Thanks for your help!
You can use listagg for this:
select "order",
listagg(trim(note_text), ' ') within group (order by seq) as note_text
from your_table
group by "order";
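Also, note that ORDER is a reserved word in Oracle. It is best to use some other identifier, or to escape it with double quotes. Quoted identifiers are case-sensitive, so the quoted name must match the column's actual case.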
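To make the intended semantics concrete (trim each line, sort by sequence within an order, join with a single space), here is a plain Python sketch of the same aggregation. It does not call Oracle. The function name concat_notes is hypothetical, and the pairing of order references with note lines is assumed from the desired output, since the ORDER column is blank in the sample.

```python
from itertools import groupby
from operator import itemgetter

# (seq, note_text, order_ref) tuples, deliberately out of sequence order.
notes = [
    (2, "is 123 The Street, City, ", "ABC123"),
    (1, "The address for this ", "ABC123"),
    (3, "County, Postcode", "ABC123"),
    (1, "Some other note", "GHI789"),
]

def concat_notes(rows):
    """Mimic listagg(trim(note_text), ' ') within group (order by seq)."""
    # Sort by order reference first, then by sequence number within it.
    ordered = sorted(rows, key=itemgetter(2, 0))
    return {
        order_ref: " ".join(text.strip() for _, text, _ in group)
        for order_ref, group in groupby(ordered, key=itemgetter(2))
    }

print(concat_notes(notes))
```

Trimming before joining means a note line that ends in a space does not produce a double space at the join.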
I need help including null values in my query. Let's say I have this table:
table1:
| in/out | year    | name |
| in     | 2011-12 | jim  |
| in     | 2011-12 | tim  |
| in     | 2012-13 | toby |
| out    | 2011-12 | ron  |
| out    | 2012-13 | jim  |
| out    | 2012-13 | joel |
I created this transform statement:
Transform Count(*)
SELECT [in/out] FROM table1 WHERE name = "jim" GROUP BY [in/out]
PIVOT year IN("2011-12", "2012-13");
To get this table:
| in/out | 2011-12 | 2012-13 |
| in     | 1       | 1       |
The thing is I want to include all in/out values even if they are null so for this example I'd want the table to look like this:
| in/out | 2011-12 | 2012-13 |
| in     | 1       | 1       |
| out    |         |         |
Any help would be greatly appreciated. Thanks!
I need to preserve one row per group of names from table:
ID | Name | Attribute1| Attribute2 | Attribute3
1 | john | true | 2012-20-10 | 12345670
2 | john | false | 2015-20-10 | 12345671
3 | james | false | 2010-02-01 | 12345672
4 | james | false | 2010-02-03 | 12345673
5 | james | false | 2010-02-06 | 12345674
6 | sara | true | 2011-02-02 | 12345675
7 | sara | true | 2011-02-02 | 12345676
...according to specified criteria. Rows with true in Attribute1 should be preferred (if present), then rows with the max date (Attribute2), and if that still does not leave a single row, the one with the max Attribute3.
Desired result is:
ID | Name | Attribute1 | Attribute2 | Attribute3
1 | john | true | 2012-20-10 | 12345670
5 | james | false | 2010-02-06 | 12345674
7 | sara | true | 2011-02-02 | 12345676
I tried to do that with nested joins, but that seems to be overly complicated.
One simple solution is to first sort the rows with ORDER BY:
CREATE TABLE output AS
SELECT
ID,
Name,
Attribute1,
Attribute2,
Attribute3
FROM input
ORDER BY
Name,
Attribute1 DESC,
Attribute2 DESC,
Attribute3 DESC;
and then loop over the rows, caching each name as it is seen: if the name has not occurred before, preserve the row (and cache the name in some global variable); otherwise delete the row.
Is there any other pure SQL solution?
For PostgreSQL:
select distinct on (name) *
from t
order by name, attribute1 desc, attribute2 desc, attribute3 desc
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
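DISTINCT ON is specific to PostgreSQL. On other databases, a commonly used equivalent is to number the rows per name with ROW_NUMBER() using the same ordering and keep only row 1. Below is a minimal sketch of that alternative, run against in-memory SQLite (3.25 or newer assumed) from Python. Storing Attribute1 as 1/0 and Attribute2 as text is an assumption made for the example.

```python
import sqlite3

# Rows from the question; true/false are stored as 1/0.
people = [
    (1, "john", 1, "2012-20-10", 12345670),
    (2, "john", 0, "2015-20-10", 12345671),
    (3, "james", 0, "2010-02-01", 12345672),
    (4, "james", 0, "2010-02-03", 12345673),
    (5, "james", 0, "2010-02-06", 12345674),
    (6, "sara", 1, "2011-02-02", 12345675),
    (7, "sara", 1, "2011-02-02", 12345676),
]

def keep_one_per_name(rows):
    """Return (id, name) of the preferred row for each name."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER, name TEXT, attribute1 INTEGER, "
        "attribute2 TEXT, attribute3 INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT id, name
        FROM (
            SELECT id, name,
                   -- same priority as the ORDER BY in the DISTINCT ON query
                   ROW_NUMBER() OVER (
                       PARTITION BY name
                       ORDER BY attribute1 DESC, attribute2 DESC, attribute3 DESC
                   ) AS rn
            FROM t
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(keep_one_per_name(people))
```

The selected IDs match the desired result in the question: 1 for john, 5 for james, and 7 for sara.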