I need to keep one row per group of names from this table:
ID | Name | Attribute1| Attribute2 | Attribute3
1 | john | true | 2012-20-10 | 12345670
2 | john | false | 2015-20-10 | 12345671
3 | james | false | 2010-02-01 | 12345672
4 | james | false | 2010-02-03 | 12345673
5 | james | false | 2010-02-06 | 12345674
6 | sara | true | 2011-02-02 | 12345675
7 | sara | true | 2011-02-02 | 12345676
...according to specified criteria. Rows with true in Attribute1 should be preferred (if present), then the row with the max date (Attribute2), and if that does not narrow it down to one row, the one with the max Attribute3.
Desired result is:
ID|Name|Attribute1|Attribute2|Attribute3
1 | john | true | 2012-20-10 | 12345670
5 | james | false | 2010-02-06 | 12345674
7 | sara | true | 2011-02-02 | 12345676
I tried to do that with nested joins, but that seems overly complicated.
A simple solution is to first store the result of an ORDER BY query:
CREATE TABLE output AS
SELECT
ID,
Name,
Attribute1,
Attribute2,
Attribute3
FROM input
ORDER BY
Name,
Attribute1 DESC,
Attribute2 DESC,
Attribute3 DESC;
and then loop over the rows, checking whether each name has occurred before: if not, keep the row (and cache the name in some global variable); otherwise delete the row.
Is there any other pure SQL solution?
For PostgreSQL:
select distinct on (name) *
from t
order by name, attribute1 desc, attribute2 desc, attribute3 desc
https://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
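For engines without DISTINCT ON, the same prioritisation can be sketched with ROW_NUMBER(). The snippet below is only an illustration, not part of the original answer: it runs the query through Python's sqlite3 module on the sample rows, and it assumes a SQLite build with window functions (3.25 or later), booleans stored as 1/0, and dates stored as text. The table name t matches the answer above; ROWS, QUERY and keep_one_row_per_name are made-up names.

```python
import sqlite3

# Sample rows from the question; Attribute1 is stored as 1/0 (assumption).
ROWS = [
    (1, "john", 1, "2012-20-10", 12345670),
    (2, "john", 0, "2015-20-10", 12345671),
    (3, "james", 0, "2010-02-01", 12345672),
    (4, "james", 0, "2010-02-03", 12345673),
    (5, "james", 0, "2010-02-06", 12345674),
    (6, "sara", 1, "2011-02-02", 12345675),
    (7, "sara", 1, "2011-02-02", 12345676),
]

# Number the rows inside each name group by priority and keep the first one.
QUERY = """
SELECT ID, Name, Attribute1, Attribute2, Attribute3
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY Name
               ORDER BY Attribute1 DESC, Attribute2 DESC, Attribute3 DESC
           ) AS rn
    FROM t
)
WHERE rn = 1
ORDER BY ID
"""


def keep_one_row_per_name(rows):
    """Load the rows into an in-memory table and return one row per name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ID INTEGER, Name TEXT, Attribute1 INTEGER, "
        "Attribute2 TEXT, Attribute3 INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in keep_one_row_per_name(ROWS):
        print(row)
```

On the sample data this keeps IDs 1, 5 and 7, which matches the desired result in the question.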
I have 2 source tables at the moment.
Table #1: sourceTableMain
|EmployeeNumber| DepartmentNumber | CostCenterNumber |
| -------------| ---------------- |------------------|
| 1 | 100 | 1001 |
| 2 | 200 | 1001 |
| 3 | 100 | 1002 |
Table #2: sourceTableEmployee
|EmployeeNumber| EmployeeFirstName | EmployeeLastName | EmployeeAddress |
| -------------| ---------------- |------------------|---------------- |
| 1 | Michael | Scott | 110 ABC Ln |
| 1 | Michael | Scott | 450 XYZ Ln |
| 2 | Dwight | Schrute | 321 PQR St |
| 3 | Jim | Halpert | 678 LMN Blvd |
I am trying to combine the rows and insert them into a third table named targetTableCombined, which has the following schema:
| FieldName                | Type    | Mode     |
| ------------------------ | ------- | -------- |
| employeeNumber           | INTEGER | NULLABLE |
| employeeDetails (struct) | RECORD  | REPEATED |
| employeeFirstName        | STRING  | NULLABLE |
| employeeLastName         | STRING  | NULLABLE |
| employeeAddress          | STRING  | NULLABLE |
Within the target table (targetTableCombined), I am trying to make sure that for each employeeNumber, all of the First Names, Last Names and Addresses are repeated under a single struct array. For example, EmployeeNumber 1 should have only 1 row in the target table, with the first name, last name and different addresses as part of the second column (struct), each in a separate row.
I wrote an insert script to do this, but I am going wrong:
insert into `dev.try_sbx.targetTableCombined`
select
main.employeeNumber,
array(
select as struct
emp.employeeFirstName,
emp.employeeLastName,
emp.employeeAddress
)
from
`dev.try_sbx.sourceTableMain` as main
inner join `dev.try_sbx.sourceTableEmployee` as emp
on main.EmployeeNumber = emp.EmployeeNumber;
This is the result I am getting when running the query above:
| EmployeeNumber | EmployeeDetails |
| ------------- | ------------------------------ |
| 1 | [Michael, Scott, 110 ABC Ln] |
| 1 | [Michael, Scott, 450 XYZ Ln] |
| 2 | [Dwight, Schrute, 321 PQR St] |
| 3 | [Jim, Halpert, 678 LMN Blvd] |
(Sorry about not being able to share screenshots; I don't have enough rep. To elaborate, I am expecting only 3 rows from the insert, since employee 1 should have a single array containing both addresses. Instead, I am getting 4 rows after the insert.)
Where am I going wrong with my script?
It's because ARRAY() is not an aggregation function. You should use ARRAY_AGG() along with GROUP BY to group the details for each employee into an array.
SELECT EmployeeNumber,
ARRAY_AGG((SELECT AS STRUCT EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
A more preferred way is:
SELECT EmployeeNumber,
ARRAY_AGG(STRUCT(EmployeeFirstName, EmployeeLastName, EmployeeAddress)) AS employeeDetails
FROM `dev.try_sbx.sourceTableEmployee`
GROUP BY 1;
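To make the effect of GROUP BY plus ARRAY_AGG concrete, here is a rough plain-Python sketch of the same grouping on the sample employee rows. It does not call BigQuery; EMPLOYEE_ROWS and group_details are illustrative names, and the dictionaries merely stand in for the struct values.

```python
from collections import defaultdict

# Sample rows from sourceTableEmployee in the question.
EMPLOYEE_ROWS = [
    (1, "Michael", "Scott", "110 ABC Ln"),
    (1, "Michael", "Scott", "450 XYZ Ln"),
    (2, "Dwight", "Schrute", "321 PQR St"),
    (3, "Jim", "Halpert", "678 LMN Blvd"),
]


def group_details(rows):
    """Collect one list of detail records per employee number."""
    grouped = defaultdict(list)
    for number, first, last, address in rows:
        # Each appended dict plays the role of one STRUCT inside the array.
        grouped[number].append(
            {
                "employeeFirstName": first,
                "employeeLastName": last,
                "employeeAddress": address,
            }
        )
    return dict(grouped)


if __name__ == "__main__":
    for number, details in group_details(EMPLOYEE_ROWS).items():
        print(number, details)
```

Employee 1 ends up with a single entry holding two address records, so the grouped result has 3 rows instead of 4.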
I am having trouble querying some data. The table I am trying to pull the data from is a LOG table, where I would like to see changes in the values next to each other (example below)
Table:
+-----------+----+-------------+----------+------------+
| UNIQUE_ID | ID | NAME | CITY | DATE |
+-----------+----+-------------+----------+------------+
| xa220 | 1 | John Smith | Berlin | 2020.05.01 |
| xa195 | 1 | John Smith | Berlin | 2020.03.01 |
| xa111 | 1 | John Smith | München | 2020.01.01 |
| xa106 | 2 | James Brown | Atlanta | 2018.04.04 |
| xa100 | 2 | James Brown | Boston | 2017.12.10 |
| xa76 | 3 | Emily Wolf | Shanghai | 2016.11.03 |
| xa20 | 3 | Emily Wolf | Shanghai | 2016.07.03 |
| xa15 | 3 | Emily Wolf | Tokyo | 2014.02.22 |
| xa12 | 3 | Emily Wolf | null | 2014.02.22 |
+-----------+----+-------------+----------+------------+
Desired outcome:
+----+-------------+----------+---------------+
| ID | NAME | CITY | PREVIOUS_CITY |
+----+-------------+----------+---------------+
| 1 | John Smith | Berlin | München |
| 2 | James Brown | Atlanta | Boston |
| 3 | Emily Wolf | Shanghai | Tokyo |
| 3 | Emily Wolf | Tokyo | null |
+----+-------------+----------+---------------+
I have been trying to use FIRST_VALUE and LAST_VALUE, but cannot get the desired outcome.
select distinct id,
name,
city,
first_value(city) over (partition by id order by city) as previous_city
from test
Any help is appreciated!
Thank you!
Use the LAG function to get the city for previous date and display only the rows where current city and the result of lag are different:
WITH cte AS (
SELECT t.*, LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") LAG_CITY
FROM yourTable t
)
SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY
FROM cte
WHERE
CITY <> LAG_CITY OR
CITY IS NULL AND LAG_CITY IS NOT NULL OR
CITY IS NOT NULL AND LAG_CITY IS NULL
ORDER BY
ID, "DATE" DESC;
Some comments on how LAG is being used and its values checked are warranted. We use the three parameter version of LAG here. The second parameter means the number of records to look back, which in this case is 1 (the default). The third parameter means the default value to use should a given record per ID partition be the first. In this case, we use the default as the same CITY value. This means that the first record would never appear in the result set.
For the WHERE clause above, a matching record is one for which the city and the lag city are different, or for which one of the two is NULL and the other is not NULL. This extra logic is needed because a plain <> comparison against NULL never evaluates to true, so a NULL city and a non-NULL city would otherwise not be treated as different.
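The query can be tried out locally with Python's sqlite3 module, which also offers the three-argument LAG (SQLite 3.25 or later is assumed). This is only a sketch on the question's sample rows, with one labelled change: the row with the NULL city is dated one day earlier, because in the original data it ties with the Tokyo row and the window order would otherwise be ambiguous. LOG_ROWS, QUERY and previous_cities are illustrative names.

```python
import sqlite3

LOG_ROWS = [
    ("xa220", 1, "John Smith", "Berlin", "2020.05.01"),
    ("xa195", 1, "John Smith", "Berlin", "2020.03.01"),
    ("xa111", 1, "John Smith", "München", "2020.01.01"),
    ("xa106", 2, "James Brown", "Atlanta", "2018.04.04"),
    ("xa100", 2, "James Brown", "Boston", "2017.12.10"),
    ("xa76", 3, "Emily Wolf", "Shanghai", "2016.11.03"),
    ("xa20", 3, "Emily Wolf", "Shanghai", "2016.07.03"),
    ("xa15", 3, "Emily Wolf", "Tokyo", "2014.02.22"),
    # Assumption: dated one day earlier than in the question to break the tie.
    ("xa12", 3, "Emily Wolf", None, "2014.02.21"),
]

# Same logic as the answer; parentheses only make the AND/OR grouping explicit.
QUERY = """
WITH cte AS (
    SELECT t.*,
           LAG(CITY, 1, CITY) OVER (PARTITION BY ID ORDER BY "DATE") AS LAG_CITY
    FROM yourTable t
)
SELECT ID, NAME, CITY, LAG_CITY AS PREVIOUS_CITY
FROM cte
WHERE CITY <> LAG_CITY
   OR (CITY IS NULL AND LAG_CITY IS NOT NULL)
   OR (CITY IS NOT NULL AND LAG_CITY IS NULL)
ORDER BY ID, "DATE" DESC
"""


def previous_cities(rows):
    """Return (ID, NAME, CITY, PREVIOUS_CITY) for every change of city."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE yourTable (UNIQUE_ID TEXT, ID INTEGER, NAME TEXT, '
        'CITY TEXT, "DATE" TEXT)'
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in previous_cities(LOG_ROWS):
        print(row)
```

The first record of each ID is skipped because its lag value defaults to its own city, and the Tokyo row is kept with a NULL previous city thanks to the third condition.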
I have the following problem:
I have a table with different columns describing objects. Let's assume one of these columns can contain the values 1,2,3,4,5,6,7,8,9,10. Within this table, an object can have all of these values or just some of them, for example 1,3,5 (so 0 to n values).
Now I want to find all the objects containing only the values 1 and 2. I do not want them in my result set if they contain 1,2,3 or any combination other than (1,2).
How do I write this SQL statement?
Sample data (Result set to be expected --> Mark and Michael):
+---------+--------------------+---------------------------+--+
| OBJ | OBJ_CHARACTERISTIC | CHARACTERISTIC_DATE_ADDED | |
+---------+--------------------+---------------------------+--+
| Mark | 1 | 15.01.2018 | |
| Mark | 2 | 15.02.2018 | |
| Jimmy | 1 | 31.01.2018 | |
| Jimmy | 2 | 11.02.2018 | |
| Jimmy | 4 | 15.03.2018 | |
| Jimmy | 5 | 15.04.2018 | |
| Jimmy | 6 | 15.04.2018 | |
| Harry | 1 | 08.01.2018 | |
| Harry | 2 | 11.01.2018 | |
| Harry | 3 | 15.02.2018 | |
| Michael | 1 | 15.06.2018 | |
| Michael | 2 | 15.07.2018 | |
| Dwayne | 4 | 15.01.2018 | |
| Dwayne | 5 | 15.01.2018 | |
| Dwayne | 6 | 15.01.2018 | |
+---------+--------------------+---------------------------+--+
You could use analytic counts to see how many characteristics each object has, and how many of the ones you are looking for; and then compare those counts:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
count(distinct obj_characteristic) over (partition by obj) as c1,
count(distinct case when obj_characteristic in (1,2) then obj_characteristic end)
over (partition by obj) as c2
from your_table
)
where c1 = c2;
With your sample data that gives:
OBJ OBJ_CHARACTERISTIC CHARACTERI
------- ------------------ ----------
Mark 1 2018-01-15
Mark 2 2018-02-15
Michael 1 2018-06-15
Michael 2 2018-07-15
From the way the question is worded it sounds like you want the complete rows, as above; from a comment you may only want the names. If so you can just change the outer select to:
select distinct obj
from ...
OBJ
-------
Mark
Michael
or use aggregates instead via a having clause:
select obj
from your_table
group by obj
having count(distinct obj_characteristic)
= count(distinct case when obj_characteristic in (1,2) then obj_characteristic end);
OBJ
-------
Mark
Michael
db<>fiddle demo of all three.
In this case, as 1 and 2 are contiguous, you could also do this with min/max, as an aggregate to just get the names:
select obj
from your_table
group by obj
having min (obj_characteristic) = 1
and max(obj_characteristic) = 2;
or analytically to get the complete rows:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
min(obj_characteristic) over (partition by obj) as min_char,
max(obj_characteristic) over (partition by obj) as max_char
from your_table
)
where min_char = 1
and max_char = 2;
but the earlier versions are more generic.
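A small sketch of why the count-based version is the more generic one, run through Python's sqlite3 module on the sample data (the date column is left out for brevity). Only the aggregate HAVING forms are used, since SQLite does not accept DISTINCT inside window functions. The helper names by_counts and by_min_max are made up for this illustration.

```python
import sqlite3

ROWS = [
    ("Mark", 1), ("Mark", 2),
    ("Jimmy", 1), ("Jimmy", 2), ("Jimmy", 4), ("Jimmy", 5), ("Jimmy", 6),
    ("Harry", 1), ("Harry", 2), ("Harry", 3),
    ("Michael", 1), ("Michael", 2),
    ("Dwayne", 4), ("Dwayne", 5), ("Dwayne", 6),
]


def _query(sql, params):
    """Run one query against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (obj TEXT, obj_characteristic INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", ROWS)
    names = [row[0] for row in conn.execute(sql, params)]
    conn.close()
    return names


def by_counts(a, b):
    """Objects whose distinct characteristics all fall inside {a, b}."""
    return _query(
        """
        SELECT obj FROM your_table
        GROUP BY obj
        HAVING COUNT(DISTINCT obj_characteristic)
             = COUNT(DISTINCT CASE WHEN obj_characteristic IN (?, ?)
                                   THEN obj_characteristic END)
        ORDER BY obj
        """,
        (a, b),
    )


def by_min_max(a, b):
    """Objects whose smallest characteristic is a and largest is b."""
    return _query(
        """
        SELECT obj FROM your_table
        GROUP BY obj
        HAVING MIN(obj_characteristic) = ? AND MAX(obj_characteristic) = ?
        ORDER BY obj
        """,
        (a, b),
    )


if __name__ == "__main__":
    print(by_counts(1, 2), by_min_max(1, 2))  # both agree for contiguous 1, 2
    print(by_counts(4, 6), by_min_max(4, 6))  # they differ for 4, 6
```

For the pair (1, 2) both helpers return Mark and Michael. For the non-contiguous pair (4, 6) the min/max form also lets Dwayne through, who additionally has 5, while the count comparison returns nothing.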
If the values are stored as a single comma-separated string and you are just looking for SQL that returns rows with the value '1,2' and nothing else, use:
select * from your_table where your_column = '1,2'
Please post an example of the data; that would make the problem easier to understand.
@dwin90 You could try:
SELECT obj
FROM your_table
GROUP BY obj
-- keep only objects whose characteristics are all 1 or 2
HAVING SUM(CASE WHEN obj_characteristic IN (1, 2) THEN 0 ELSE 1 END) = 0;
For a Table T1
+----------+-----------+-----------------+
| PersonID | Date | Employment |
+----------+-----------+-----------------+
| 1 | 2/28/2017 | Stayed the same |
| 1 | 4/21/2017 | Stayed the same |
| 1 | 5/18/2017 | Stayed the same |
| 2 | 3/7/2017 | Improved |
| 2 | 4/1/2017 | Stayed the same |
| 2 | 6/1/2017 | Stayed the same |
| 3 | 3/28/2016 | Improved |
| 3 | 5/4/2016 | Improved |
| 3 | 4/19/2017 | Worsened |
| 4 | 5/19/2016 | Worsened |
| 4 | 2/16/2017 | Improved |
+----------+-----------+-----------------+
I'm trying to calculate a Final Result field partitioning on Employment/PersonID fields, based on the latest result/person relative to prior results. What I mean by that is explained in the logic behind Final Result:
For every person:
(1) If all results for the person are "Stayed the same", then (and only then) the final result for that person should be "Stayed the same".
(2) If Worsened/Improved are in the result set for a person, the final result should be the latest Worsened/Improved result for that person, irrespective of any "Stayed the same" after a W/I result.
Eg:
Person 1 Final result -> Stayed the same, as per (1)
Person 2 Final result -> Improved, as per (2)
Person 3 Final result -> Worsened, as per (2)
Person 4 Final result -> Improved, as per (2)
Desired Result:
+----------+-----------------+
| PersonID | Final Result |
+----------+-----------------+
| 1 | Stayed the same |
| 2 | Improved |
| 3 | Worsened |
| 4 | Improved |
+----------+-----------------+
I know this might involve Window functions or Sub-queries but I'm struggling to code this.
Hmmm. This is a prioritization query. That sounds like row_number() is called for:
select t1.personid, t1.employment
from (select t1.*,
row_number() over (partition by personid
order by (case when employment <> 'Stayed the same' then 1 else 2 end),
date desc
) as seqnum
from t1
) t1
where seqnum = 1;
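The prioritisation can be checked on the sample table with Python's sqlite3 module. The sketch below makes two assumptions that are not in the question: the dates are stored as ISO text so that they sort chronologically, and the column is called result_date instead of date. SQLite 3.25 or later is needed for ROW_NUMBER(), and final_results is an illustrative helper name.

```python
import sqlite3

# Sample rows from T1, with dates rewritten as ISO text (assumption).
ROWS = [
    (1, "2017-02-28", "Stayed the same"),
    (1, "2017-04-21", "Stayed the same"),
    (1, "2017-05-18", "Stayed the same"),
    (2, "2017-03-07", "Improved"),
    (2, "2017-04-01", "Stayed the same"),
    (2, "2017-06-01", "Stayed the same"),
    (3, "2016-03-28", "Improved"),
    (3, "2016-05-04", "Improved"),
    (3, "2017-04-19", "Worsened"),
    (4, "2016-05-19", "Worsened"),
    (4, "2017-02-16", "Improved"),
]

# Non-"Stayed the same" rows sort first; within each class the latest wins.
QUERY = """
SELECT personid, employment
FROM (
    SELECT t1.*,
           ROW_NUMBER() OVER (
               PARTITION BY personid
               ORDER BY CASE WHEN employment <> 'Stayed the same' THEN 1 ELSE 2 END,
                        result_date DESC
           ) AS seqnum
    FROM t1
)
WHERE seqnum = 1
ORDER BY personid
"""


def final_results(rows):
    """Return one (personid, final result) pair per person."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (personid INTEGER, result_date TEXT, employment TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in final_results(ROWS):
        print(row)
```

On the sample rows this yields Stayed the same, Improved, Worsened and Improved for persons 1 to 4, as in the desired result.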
I have a table, and I'd like to select rows with the highest value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How may I do so? I assume it can be done by some Oracle function I am not aware of?
Thanks in advance :-)
You can use the MAX() function for that, grouping by the user column like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
Result:
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
See this SQLFiddle
If you have more than one column and want the whole row:
select "user", "index"
from (
select u.*, row_number() over (partition by "user" order by "index" desc) as rnk
from some_table u)
where rnk = 1
user is a reserved word - you should use a different name for the column.
select "user", max("index") as "index" from tbl
group by "user";
Alternatively, you can use analytic functions:
select "user", "index", max("index") over (partition by "user") as highest from YOURTABLE
Note: Try NOT to use words like user, index, date etc. as your column names, as they are reserved words in Oracle. If you do use them, put them in quotation marks, e.g. "index", "date"...
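A quick way to try the grouped and the analytic variants side by side is Python's sqlite3 module, where index is also a keyword and therefore has to be quoted. This is a sketch under assumptions: SQLite 3.25 or later rather than Oracle, and the names Table1, max_per_user and top_row_per_user are only illustrative.

```python
import sqlite3

ROWS = [(1, 1), (2, 1), (2, 2), (3, 4), (3, 7), (4, 1), (5, 1)]


def _connect():
    """Create the sample table; the keyword-like column names are quoted."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Table1 ("user" INTEGER, "index" INTEGER)')
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", ROWS)
    return conn


def max_per_user():
    """Highest index per user via GROUP BY and MAX()."""
    conn = _connect()
    result = conn.execute(
        'SELECT "user", MAX("index") AS "index" '
        'FROM Table1 GROUP BY "user" ORDER BY "user"'
    ).fetchall()
    conn.close()
    return result


def top_row_per_user():
    """Whole row with the highest index per user via ROW_NUMBER()."""
    conn = _connect()
    result = conn.execute(
        'SELECT "user", "index" FROM ('
        '  SELECT u.*, ROW_NUMBER() OVER ('
        '    PARTITION BY "user" ORDER BY "index" DESC) AS rnk'
        '  FROM Table1 u'
        ') WHERE rnk = 1 ORDER BY "user"'
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(max_per_user())
    print(top_row_per_user())
```

Both helpers return the same five rows here; the ROW_NUMBER() form becomes useful once further columns of the winning row are needed.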