Add Value column using another column as Key - sql

Hopefully the table itself states the problem. Essentially with the Type column on the left, is it possible to add a unique code/value column using Type as a hash key/set based on the appearance orders of the types:
Type | Code
-----------
ADA | 1
ADA | 1
BIM | 2
BIM | 2
CUR | 3
BIM | 2
DEQ | 4
ADA | 1
... | ...
We can't simply hard-code the conversion, as there is an arbitrary number of Types each time.

You can use dense_rank():
select type, dense_rank() over (order by type) as code
from t;
However, I would advise you to create another table and to use that:
create table Types as (
select row_number() over (order by type) as TypeId,
type
from t
group by type
);
Then, join that in:
select t.type, tt.TypeId
from t join
types tt
on t.type = tt.type;
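One caveat on the dense_rank() query: ordering by type numbers the codes alphabetically, which only matches the order of appearance when the types happen to show up in sorted order, as they do in the sample. Below is a minimal sketch of both variants. It assumes SQLite (3.25 or later, for window functions) as a stand-in for the unnamed database, and a hypothetical seq column that records insertion order; neither comes from the question.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database, and the
# hypothetical seq column records the order in which rows appeared.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (seq integer primary key, type text)")
conn.executemany(
    "insert into t (type) values (?)",
    [("CUR",), ("ADA",), ("CUR",), ("BIM",), ("ADA",)],
)

# Codes follow the sort order of type, as in the dense_rank() answer.
alphabetical = conn.execute(
    "select type, dense_rank() over (order by type) as code "
    "from t order by seq"
).fetchall()

# Codes follow first appearance: rank each type by the seq of its first row.
by_appearance = conn.execute(
    """
    select type, dense_rank() over (order by first_seen) as code
    from (select seq, type,
                 min(seq) over (partition by type) as first_seen
          from t)
    order by seq
    """
).fetchall()

print(alphabetical)
print(by_appearance)
```

With this data the first query gives CUR the code 3, while the second gives it 1 because CUR is seen first.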

Related

ORACLE SELECT DISTINCT VALUE ONLY IN SOME COLUMNS

+----+------+-------+---------+---------+
| id | order| value | type | account |
+----+------+-------+---------+---------+
| 1 | 1 | a | 2 | 1 |
| 1 | 2 | b | 1 | 1 |
| 1 | 3 | c | 4 | 1 |
| 1 | 4 | d | 2 | 1 |
| 1 | 5 | e | 1 | 1 |
| 1 | 5 | f | 6 | 1 |
| 2 | 6 | g | 1 | 1 |
+----+------+-------+---------+---------+
I need to select all fields of this table but get only one row for each combination of id+type (I don't care which row that is). I have tried several approaches without success.
As soon as I use DISTINCT, I can't include the rest of the fields to make them available in a subquery. If I add ROWNUM in the subquery, every row becomes different, so that does not work either.
Any ideas?
My best query so far is this:
SELECT ID, TYPE, VALUE, ACCOUNT
FROM MYTABLE
WHERE ROWID IN (SELECT DISTINCT MAX(ROWID)
FROM MYTABLE
GROUP BY ID, TYPE);
It seems you need to select one (random) row for each distinct combination of id and type. If so, you could do that efficiently using the row_number analytic function. Something like this:
select id, type, value, account
from (
select id, type, value, account,
row_number() over (partition by id, type order by null) as rn
from your_table
)
where rn = 1
;
order by null means an arbitrary (not truly random) ordering of rows within each group (partition) by (id, type); this means that the ordering step, which is usually time-consuming, will be trivial in this case. Also, Oracle optimizes such queries (for the filter rn = 1).
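To see the row_number pattern run end to end, here is a small sketch using the question's sample rows. It is not Oracle: it assumes SQLite (3.25+) as a stand-in, renames the reserved column name order to ord, and orders each partition by ord instead of null so that the chosen row is deterministic. All of these are choices made for the illustration only.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; "order" is renamed to ord.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table "
    "(id int, ord int, value text, type int, account int)"
)
conn.executemany(
    "insert into my_table values (?, ?, ?, ?, ?)",
    [
        (1, 1, "a", 2, 1),
        (1, 2, "b", 1, 1),
        (1, 3, "c", 4, 1),
        (1, 4, "d", 2, 1),
        (1, 5, "e", 1, 1),
        (1, 5, "f", 6, 1),
        (2, 6, "g", 1, 1),
    ],
)

# Number the rows inside each (id, type) group and keep only the first one.
# Ordering by ord (instead of null) picks the lowest ord in each group.
rows = conn.execute(
    """
    select id, type, value, account
    from (
        select id, type, value, account,
               row_number() over (partition by id, type order by ord) as rn
        from my_table
    )
    where rn = 1
    order by id, type
    """
).fetchall()

for row in rows:
    print(row)
```

Seven input rows collapse to five, one per distinct (id, type) pair.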
Or, in versions 12.1 and higher, you can get the same with the match_recognize clause:
select id, type, value, account
from my_table
match_recognize (
partition by id, type
all rows per match
pattern (^r)
define r as null is null
);
This partitions the rows by id and type, it doesn't order them (which means arbitrary ordering), and selects just the "first" row from each partition. Note that some analytic functions, including row_number(), require an order by clause (even when we don't care about the ordering) - order by null is customary, but it can't be left out completely. By contrast, in match_recognize you can leave out the order by clause (the default is an arbitrary order). On the other hand, you can't leave out the define clause, even if it imposes no conditions whatsoever. Why Oracle doesn't use a default for that clause too, only Oracle knows.

How do I merge and delete duplicated rows in SQL using UPDATE?

For example, I have a table of:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web
2 | 23 | xyz | 0 | mobile
3 | 24 | xyzc | 0 | web
4 | 25 | xyzc | 0 | web
I want the result to be:
id | code | name | type | deviceType
---+------+------+------+-----------
1 | 23 | xyz | 0 | web&mobile
2 | 24 | xyzc | 0 | web
3 | 25 | xyzc | 0 | web
How do I do this in SQL Server using UPDATE and DELETE statements?
Any help is greatly appreciated!
I might actually suggest just leaving the original data intact, and instead creating a view here:
CREATE VIEW yourView AS
SELECT ROW_NUMBER() OVER (ORDER BY MIN(id)) AS id,
code, name, type,
STRING_AGG(deviceType, '&') WITHIN GROUP (ORDER BY id) AS deviceType
FROM yourTable
GROUP BY code, name, type;
One main reason for not actually doing the update is that every time new data comes in, you might possibly have to run that update, over and over. Instead, just keeping the original data and running the view occasionally might perform better here.
Note that I assume that you are using SQL Server 2017 or later. If not, then STRING_AGG would have to be replaced with an uglier approach, but you should consider upgrading in this case.
To do what you want, you would need two separate statements.
This updates the "first" row of each group with all the device types in the group:
update t
set t.devicetype = t1.devicetype
from mytable t
inner join (
select min(id) as id, string_agg(devicetype, '&') within group(order by id) as devicetype
from mytable
group by code, name, type
having count(*) > 1
) t1 on t1.id = t.id
This deletes everything but the first row per group:
with t as (
select row_number() over(partition by code, name, type order by id) rn
from mytable
)
delete from t where rn > 1
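The same two-step idea (aggregate into the first row of each group, then delete the rest) can be tried without a SQL Server instance. The sketch below assumes SQLite as a stand-in, so STRING_AGG becomes group_concat and the UPDATE ... FROM join becomes a correlated subquery. group_concat does not guarantee the concatenation order, so treat the order of the joined values as unspecified.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable "
    "(id integer primary key, code int, name text, type int, deviceType text)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [
        (1, 23, "xyz", 0, "web"),
        (2, 23, "xyz", 0, "mobile"),
        (3, 24, "xyzc", 0, "web"),
        (4, 25, "xyzc", 0, "web"),
    ],
)

# Step 1: write the combined device types into the first row of each group.
conn.execute(
    """
    update mytable
    set deviceType = (
        select group_concat(m.deviceType, '&')
        from mytable m
        where m.code = mytable.code
          and m.name = mytable.name
          and m.type = mytable.type
    )
    where id in (select min(id) from mytable group by code, name, type)
    """
)

# Step 2: delete every row that is not the first row of its group.
conn.execute(
    """
    delete from mytable
    where id not in (select min(id) from mytable group by code, name, type)
    """
)

rows = conn.execute(
    "select id, code, deviceType from mytable order by id"
).fetchall()
print(rows)
```

Note that the surviving rows keep their original ids (1, 3 and 4 here); neither statement renumbers them.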

Unpack all arrays in a JSON column SQL Server 2019

Say I have a table Schema.table with these columns
id | json_col
of the form, e.g.
id=1
json_col ={"names":["John","Peter"],"ages":["31","40"]}
The lengths of names and ages are always equal but might vary from id to id (size is at least 1 but no upper limit).
How do we get an "exploded" table - a table with a row for each "names", "ages" e.g
id | names | ages
---+-------+------
1 | John | 31
1 | Peter | 40
2 | Jim | 17
3 | Foo | 2
.
.
I have tried OPENJSON and CROSS APPLY, but the following gives every combination of names and ages, which is not correct, so I need to do a lot of filtering afterwards:
SELECT *
FROM Schema.table
CROSS APPLY OPENJSON(Schema.table,'$.names')
CROSS APPLY OPENJSON(Schema.table,'$.ages')
Here's my suggestion
DECLARE @tbl TABLE(id INT,json_col NVARCHAR(MAX));
INSERT INTO @tbl VALUES(1,N'{"names":["John","Peter"],"ages":["31","40"]}')
,(2,N'{"names":["Jim"],"ages":["17"]}');
SELECT t.id
,B.[key] As ValueIndex
,B.[value] AS PersonName
,JSON_VALUE(A.ages,CONCAT('$[',B.[key],']')) AS PersonAge
FROM @tbl t
CROSS APPLY OPENJSON(t.json_col)
WITH(names NVARCHAR(MAX) AS JSON
,ages NVARCHAR(MAX) AS JSON) A
CROSS APPLY OPENJSON(A.names) B;
The idea in short:
We use OPENJSON with a WITH clause to read names and ages into new json variables.
We use one more OPENJSON to "explode" the names-array
As the key is the value's position within the array, we can use JSON_VALUE() to read the corresponding age-value by its position.
One general remark: If this JSON is under your control, you should change it to an entity-centered approach (array of objects). Such position-dependent storage can be quite error-prone... Try something like
{"persons":[{"name":"John","age":"31"},{"name":"Peter","age":"40"}]}
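The positional pairing described above (explode one array, then use each element's index to look up the matching element of the other array) can also be tried outside SQL Server. This sketch assumes SQLite's JSON functions as a stand-in: json_each plays the role of OPENJSON on the names array, and json_extract plays the role of JSON_VALUE. The table name tbl is only illustrative.

```python
import sqlite3

# Assumption: SQLite's json_each/json_extract stand in for
# SQL Server's OPENJSON/JSON_VALUE.
conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (id int, json_col text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [
        (1, '{"names":["John","Peter"],"ages":["31","40"]}'),
        (2, '{"names":["Jim"],"ages":["17"]}'),
    ],
)

# json_each explodes the names array; n.key is the element's array index,
# which is then used to build the path to the matching entry in ages.
rows = conn.execute(
    """
    select t.id,
           n.value as name,
           json_extract(t.json_col, '$.ages[' || n.key || ']') as age
    from tbl t, json_each(t.json_col, '$.names') as n
    order by t.id, n.key
    """
).fetchall()

for row in rows:
    print(row)
```

Each name is paired only with the age at the same index, so no cross product has to be filtered afterwards.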
Conditional aggregation combined with CROSS APPLY might be used:
SELECT id,
MAX(CASE WHEN RowKey = 'names' THEN value END) AS names,
MAX(CASE WHEN RowKey = 'ages' THEN value END) AS ages
FROM
(
SELECT id, Q0.[value] AS RowArray, Q0.[key] AS RowKey
FROM tab
CROSS APPLY OPENJSON(JsonCol) AS Q0
) r
CROSS APPLY OPENJSON(r.RowArray) v
GROUP BY id, v.[key]
ORDER BY id, v.[key]
id | names | ages
---+-------+------
1 | John | 31
1 | Peter | 41
2 | Jim | 17
3 | Foo | 2
The first argument to OPENJSON should be a JSON column value, not the table itself.

Greatest N Per Group with JOIN and multiple order columns

I have two tables:
Table0:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-18 | 100 |
| aa | 1 | 12-10 | 101 |
| bb | 2 | 12-10 | 102 |
| cc | 1 | 12-09 | 100 |
| cc | 2 | 12-12 | 103 |
| cc | 2 | 12-01 | 109 |
| cc | 1 | 12-07 | 101 |
| dd | 1 | 12-08 | 100 |
and
Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |
I'm trying to output results where:
ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.
Given my sample data, my results should look like this:
| ID | TYPE | TIME | SITE |
|----|------|-------|------|
| aa | 1 | 12-10 | 101 |
| cc | 2 | 12-01 | 109 |
| dd | 1 | 12-08 | 100 |
I've tried these statements:
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM
FROM TABLE0
GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MSD.MASTY =MA."TYPE"
...which generates a syntax error
INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
CASE WHEN MAS.MAB=1 THEN 'B'
WHEN MAS.MAB=2 THEN 'F'
ELSE NULL END,
"TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
TABLE0 AS MA,
(SELECT "ID", MAX("TYPE") AS MAB
FROM TABLE0
GROUP BY "ID") AS MAS,
((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY
FROM TABLE0
WHERE "TYPE" = 1
GROUP BY "ID")
UNION
(SELECT "ID", MIN("TIME"), MAX("TYPE")
FROM TABLE0
WHERE "TYPE" = 2
GROUP BY "ID")) AS MACU
WHERE QTS."ID" = MA."ID"
AND QTS."ID" = MAS."ID"
AND MACU."ID" = QTS."ID"
AND MA."TIME" = MACU.MACTM
AND MA."TYPE" = MACU.MACTB
... which is getting the wrong results.
Answering your direct question "how to avoid...":
You get this error when you specify a column in the SELECT list of a statement that isn't present in the GROUP BY clause and isn't part of an aggregate function like MAX, MIN or AVG.
With your data, I cannot say:
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id
I didn't say what to do with SITE; it's either a key of the group (in which case I'll get every unique combination of ID,site and the min time in each) or it should be aggregated (eg max site per ID)
These are ok:
SELECT
ID, max(site), min(time)
FROM
table
GROUP BY
id
SELECT
ID, site, min(time)
FROM
table
GROUP BY
id,site
I cannot simply not specify what to do with it- what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that ). The programmer of the database cannot make this decision for you; you must make it
Usually people ask this when they want to identify:
The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"
In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table
SELECT * FROM
(
SELECT
ID, min(time) as mintime
FROM
table
GROUP BY
id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime
What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)
Looking at your actual problem:
The ID value must exist in two tables.
The Type value must be largest group by id.
The Time value must be smallest in the largest type group.
Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:
You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time
SELECT t.id,min(t.time) FROM
(
SELECT
ID, max(type) as maxtype
FROM
table
GROUP BY
id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id
If you can't see why, let me know
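Putting the pieces together, the "group, then join back" theory can be carried one step further to return the full row, including SITE. The following is a sketch only: it assumes SQLite as a stand-in database, stores TIME as text in MM-DD form (so string comparison matches date order within one year), and the CTE names top_type and findmin are made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("create table table0 (id text, type int, time text, site int)")
conn.execute("create table table1 (id text)")
conn.executemany(
    "insert into table0 values (?, ?, ?, ?)",
    [
        ("aa", 1, "12-18", 100), ("aa", 1, "12-10", 101),
        ("bb", 2, "12-10", 102), ("cc", 1, "12-09", 100),
        ("cc", 2, "12-12", 103), ("cc", 2, "12-01", 109),
        ("cc", 1, "12-07", 101), ("dd", 1, "12-08", 100),
    ],
)
conn.executemany(
    "insert into table1 values (?)",
    [("aa",), ("cc",), ("cc",), ("dd",), ("dd",)],
)

rows = conn.execute(
    """
    with findmax as (      -- max type per id
        select id, max(type) as maxtype from table0 group by id
    ),
    top_type as (          -- only the rows that carry that max type
        select t.* from table0 t
        join findmax f on t.id = f.id and t.type = f.maxtype
    ),
    findmin as (           -- min time among those rows, per id
        select id, min(time) as mintime from top_type group by id
    )
    select tt.id, tt.type, tt.time, tt.site
    from top_type tt
    join findmin m on tt.id = m.id and tt.time = m.mintime
    where tt.id in (select id from table1)   -- id must exist in both tables
    order by tt.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the final join goes back to whole rows, SITE comes from the same row as the minimum TIME rather than from a separate aggregate.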
SELECT DISTINCT ON (t0.id)
t0.id,
type,
time,
first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time
ID must exist in both tables
This can be achieved by joining both tables against their ids. The result of inner joins are rows that exist in both tables.
SITE should be the value from the same row as the minimum TIME value.
This is the same as "Give me the first value of each group of ids ordered by time". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY). So you are getting groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.
TYPE must be the maximum for each ID.
To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...
TIME must be the minimum value for the maximum TYPE for each ID.
... Then you can order this result by time additionally to assure this condition.
Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.
DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.
I would write this using distinct on and in/exists:
select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;
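DISTINCT ON is specific to PostgreSQL. On engines without it, a commonly used equivalent is row_number() with the same ordering, keeping only the first row per id. The sketch below swaps in that technique and assumes SQLite (3.25+) as a stand-in database; it is an alternative formulation, not the query from the answers above.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, so row_number() replaces
# DISTINCT ON in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table table0 (id text, type int, time text, site int)")
conn.execute("create table table1 (id text)")
conn.executemany(
    "insert into table0 values (?, ?, ?, ?)",
    [
        ("aa", 1, "12-18", 100), ("aa", 1, "12-10", 101),
        ("bb", 2, "12-10", 102), ("cc", 1, "12-09", 100),
        ("cc", 2, "12-12", 103), ("cc", 2, "12-01", 109),
        ("cc", 1, "12-07", 101), ("dd", 1, "12-08", 100),
    ],
)
conn.executemany(
    "insert into table1 values (?)",
    [("aa",), ("cc",), ("cc",), ("dd",), ("dd",)],
)

# Per id: highest type first, then earliest time; rn = 1 is the wanted row.
rows = conn.execute(
    """
    select id, type, time, site
    from (
        select t0.*,
               row_number() over (
                   partition by t0.id
                   order by t0.type desc, t0.time
               ) as rn
        from table0 t0
        where exists (select 1 from table1 t1 where t1.id = t0.id)
    )
    where rn = 1
    order by id
    """
).fetchall()

for row in rows:
    print(row)
```

The exists filter avoids the duplicate rows that a plain join against table1 would produce for ids listed twice there.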

SQLite - select the newest row with a certain field value

I have an SQLite question which essentially boils down to the following problem.
id | key | data
1 | A | x
2 | A | x
3 | B | x
4 | B | x
5 | A | x
6 | A | x
New data is appended to the end of the table with an auto-incremented id.
Now, I want to create a query which returns the latest row for each key, like this:
id | key | data
4 | B | x
6 | A | x
I've tried some different queries but I have been unsuccessful. How do you select only the latest rows for each "key" value in the table?
Use this SQL query:
select * from tbl where id in (select max(id) from tbl group by key);
You could split the main task into two steps.
First, retrieve the id of the latest row for each key (A and B).
Then you can easily write a query that fetches the latest rows for A and B, because you have the id values for both keys.
SELECT *
FROM mytable
JOIN
( SELECT MAX(id) AS maxid
FROM mytable
GROUP BY "key"
) AS grp
ON grp.maxid = mytable.id
Side note: it's best not to use reserved words like key as identifiers (for tables, fields, etc.).
Without nested SELECTs or JOINs, but only if the field determining "newest" is the primary key (e.g. autoincrement). Note that GROUP BY ... DESC is older MySQL syntax that SQLite does not accept:
SELECT * FROM table GROUP BY key DESC;
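For SQLite specifically, the max(id) subquery can be checked with a few lines of Python. The sketch also shows a shortcut that SQLite documents: when an aggregate query uses max(), bare columns take their values from the row that holds the maximum. That behaviour is particular to SQLite and should not be relied on elsewhere. The table name tbl and the data values are made up so that the chosen rows are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table tbl '
    '(id integer primary key autoincrement, "key" text, data text)'
)
conn.executemany(
    'insert into tbl ("key", data) values (?, ?)',
    [("A", "a1"), ("A", "a2"), ("B", "b1"),
     ("B", "b2"), ("A", "a3"), ("A", "a4")],
)

# Portable form: find the max id per key, then fetch those rows.
via_subquery = conn.execute(
    """
    select id, "key", data
    from tbl
    where id in (select max(id) from tbl group by "key")
    order by id
    """
).fetchall()

# SQLite-only form: with max(), the bare columns "key" and data come
# from the same row as the maximum id.
via_bare_columns = conn.execute(
    """
    select max(id) as latest_id, "key", data
    from tbl
    group by "key"
    order by latest_id
    """
).fetchall()

print(via_subquery)
print(via_bare_columns)
```

Both queries return the rows with id 4 (key B) and id 6 (key A).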