I have been asked to speed up the following query. As it stands, it takes about 15 minutes to run and deletes no rows!
If I understand it correctly, the query deletes all duplicate rows based on a multi-column key and keeps only the row with the greatest date value, but I'm not sure.
DELETE FROM mytable F
WHERE f.f_elab = 'F'
AND EXISTS (SELECT 1 FROM mytable t
WHERE f.gldgj < t.gldgj
AND T.F_ELAB = 'F'
AND F.GLMCU = t.GLMCU
AND f.globj = t.globj
AND f.glsub = t.glsub
AND NVL(f.gmdl01,' ') = NVL(t.gmdl01,' ')
AND NVL(f.imitm,0) = NVL(t.imitm,0)
AND NVL(f.imlitm,' ') = NVL(t.imlitm,' ')
AND NVL(f.articolo_lunghezza_5,' ') = NVL(t.articolo_lunghezza_5,' ')
AND NVL(f.imdsc1,' ') = NVL(t.imdsc1,' ')
AND NVL(f.gmr022,' ') = NVL(t.gmr022,' ')
AND NVL(f.hfm,' ') = NVL(t.hfm,' ')
AND NVL(f.imglpt,' ') = NVL(t.imglpt,' ')
AND NVL(f.glsbl,' ') = NVL(t.glsbl,' ')
AND NVL(f.gldct,' ') = NVL(t.gldct,' ')
AND NVL(f.classe_coge,' ') = NVL(t.classe_coge,' ')
AND NVL(f.gldoc,0) = NVL(t.gldoc,0)
AND NVL(f.imsrp1,' ') = NVL(t.imsrp1,' ')
AND NVL(F.IMSRP4,' ') = NVL(t.IMSRP4,' ')
AND NVL(f.gllt,' ') = NVL(t.gllt,' ')
AND NVL(f.gmr030,' ') = NVL(t.gmr030,' ')
AND NVL(f.componente_costo,' ') = NVL(t.componente_costo,' ')
AND NVL(f.gmr033,' ') = NVL(t.gmr033,' ')
AND NVL(f.gmr034,' ') = NVL(t.gmr034,' ')
AND rownum<2);
Can someone help me on this? Thanks in advance.
I cannot really say if this will improve performance, but I would tackle this problem like so ...
DELETE FROM MYTABLE WHERE PRIMARY_KEY IN
(
SELECT MAX(PRIMARY_KEY) FROM MYTABLE
WHERE F_ELAB = 'F'
GROUP BY
GLMCU,
globj,
glsub,
gmdl01,
<... ETC ...>,
gmr034
HAVING COUNT(*) > 1
)
This assumes you HAVE a PK, of course. It also is only useful for duplicates, not triplicates (unless you run it more than once).
There is also a question as to the records you want to keep - first in, last in?
If you have no PK and it doesn't matter which record you keep, you can use ROWID instead of PRIMARY_KEY.
I finally found the solution; the query now runs in 4 seconds!
This is what I tried:
DELETE mytable WHERE ROWID IN
(
SELECT ROWID FROM
(
SELECT ROWID, ROW_NUMBER() OVER (PARTITION BY F_ELAB, GLMCU, globj, glsub, gmdl01, imitm, imlitm, articolo_lunghezza_5, imdsc1, gmr022, hfm, imglpt, glsbl, gldct , classe_coge , gldoc , imsrp1, IMSRP4, gllt, gmr030, componente_costo , gmr033, gmr034 ORDER BY gldgj DESC ) AS rn
FROM mytable F
WHERE f.f_elab = 'F'
)
WHERE rn > 1
)
This does exactly what the first query does, and it keeps only the record with the most recent date. My only remaining doubt concerns all of those fields in the PARTITION BY clause. Do I have to wrap them in NVL as in the first query, or will it work the same without it even if one or more fields are null?
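On the NVL question: in standard SQL, PARTITION BY (like GROUP BY) places NULLs in the same partition, so rows whose nullable key columns are both NULL are still treated as duplicates without NVL. The sketch below checks this on a cut-down, hypothetical version of the table, using Python's bundled SQLite (3.25 or later for window functions) as a stand-in for Oracle. One difference to keep in mind: the original NVL(col, ' ') also made a NULL equal to a single space, which a plain PARTITION BY does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column stand-in for mytable; gmdl01 is nullable.
conn.execute("CREATE TABLE mytable (glmcu TEXT, gmdl01 TEXT, gldgj INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("A", None, 1),  # duplicate key with NULL gmdl01, older row
        ("A", None, 2),  # duplicate key with NULL gmdl01, newer row
        ("A", "x", 1),   # different key, only one row
    ],
)

# Same shape as the ROW_NUMBER() delete, with no NVL around the nullable column.
conn.execute(
    """
    DELETE FROM mytable WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY glmcu, gmdl01 ORDER BY gldgj DESC
                   ) AS rn
            FROM mytable
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT glmcu, gmdl01, gldgj FROM mytable ORDER BY gldgj, gmdl01"
).fetchall()
print(remaining)
```

Only the older of the two NULL-keyed rows is removed, so the two NULLs were partitioned together.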
Related
I'm very new to SQL. I understand in MySQL there's the CONCAT_WS function, but BigQuery doesn't recognise this.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string, but some are NULL, and if one is NULL then the whole result will be NULL. Here's what I have so far:
CONCAT(m.track1, ", ", m.track2) AS Tracks,
I tried this but it returns NULL too:
CONCAT(m.track1, IFNULL(m.track2,CONCAT(", ", m.track2))) As Tracks,
Super grateful for any advice, thank you in advance.
Unfortunately, BigQuery doesn't support concat_ws(). So, one method is string_agg():
select t.*,
(select string_agg(track, ',')
from (select t.track1 as track union all select t.track2) x
) x
from t;
Actually a simpler method uses arrays:
select t.*,
array_to_string([track1, track2], ',')
from t;
Arrays with NULL values are not supported in result sets, but they can be used for intermediate results.
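The idea behind the string_agg() version is that plain concatenation propagates NULL, while an aggregate over unpivoted values skips it. Here is a minimal sketch of that contrast, using SQLite through Python as a stand-in for BigQuery: group_concat plays the role of string_agg, the three-column table is hypothetical, and the unpivot is simplified to a single-row table so no correlation is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (track1 TEXT, track2 TEXT, track3 TEXT)")
conn.execute("INSERT INTO t VALUES ('a', NULL, 'c')")

# Plain concatenation: one NULL operand makes the whole result NULL.
plain = conn.execute(
    "SELECT track1 || ', ' || track2 || ', ' || track3 FROM t"
).fetchone()[0]

# Unpivot the columns into rows, then aggregate; the aggregate ignores NULLs.
aggregated = conn.execute(
    """
    SELECT group_concat(track, ', ')
    FROM (SELECT track1 AS track FROM t
          UNION ALL SELECT track2 FROM t
          UNION ALL SELECT track3 FROM t)
    """
).fetchone()[0]

print(plain)       # None: the NULL track2 wiped out the whole string
print(aggregated)  # contains only the non-NULL tracks
```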
I have a bunch of twenty fields I need to CONCAT into one comma-separated string
Assuming that these are the only fields in the table, you can use the approach below. It is generic enough to handle any number of columns, whatever their names, without explicit enumeration.
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
Below is an oversimplified dummy example to test the approach:
#standardSQL
with `project.dataset.table` as (
select 1 track1, 2 track2, 3 track3, 4 track4 union all
select 5, null, 7, 8
)
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
with output
I was testing a query in SQL in which I need to concatenate values in the form of a comma-separated list, and it works, I just have the problem of duplicate values.
This is the query:
SELECT t0.id_marcas AS CodMarca,
t0.nombremarcas AS NombreMarca,
t0.imagenmarcas,
(SELECT String_agg((t2.name), ', ')
FROM exlcartu_devcit.store_to_cuisine t1
INNER JOIN exlcartu_devcit.cuisine t2
ON t1.cuisine_id = t2.cuisine_id
WHERE store_id = (SELECT TOP 1 store_id
FROM exlcartu_devcit.store
WHERE id_marcas = t0.id_marcas
AND status = 1)) AS Descripcion,
t0.logo,
t0.imagen,
(SELECT TOP 1 preparing_time
FROM exlcartu_devcit.store
WHERE id_marcas = t0.id_marcas
AND status = 1) AS Tiempo,
t0.orden,
(SELECT TOP 1 Avg(minimum_amount)
FROM exlcartu_devcit.store_delivery_zone
WHERE id_marcas = t0.id_marcas) AS MontoMinimo
FROM exlcartu_devcit.[marcas] t0
I thought the solution could be just adding a DISTINCT to the query to avoid repeated values in this way ...
(SELECT STRING_AGG(DISTINCT (t2.name), ', ') AS Descripcion
But apparently the STRING_AGG() function does not support it, any idea how to avoid repeated values?
Simplest way is just select from select, like this:
with dups as (select 1 as one union all select 1 as one)
select string_agg(one, ', ') from (select distinct one from dups) q;
vs original
with dups as (select 1 as one union all select 1 as one)
select string_agg(one, ', ') from dups;
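To see the difference between the two queries concretely, here is a small sketch run against SQLite from Python. SQLite is only a stand-in for SQL Server here, with group_concat in place of STRING_AGG; the dups table mirrors the CTE above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dups (one TEXT)")
conn.executemany("INSERT INTO dups VALUES (?)", [("1",), ("1",)])

# Aggregating the raw rows repeats the duplicate value.
with_dups = conn.execute(
    "SELECT group_concat(one, ', ') FROM dups"
).fetchone()[0]

# Deduplicating in a derived table first leaves one copy of each value.
deduped = conn.execute(
    "SELECT group_concat(one, ', ') FROM (SELECT DISTINCT one FROM dups) q"
).fetchone()[0]

print(with_dups)  # 1, 1
print(deduped)    # 1
```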
I have a table that stores personnel codes.
When I select from this table I get three rows, such as:
2129,3394,3508,3534
2129,3508
4056
I want the select to combine the results into one row, such as:
2129,3394,3508,3534,2129,3508,4056
or with distinct values only, such as:
2129,3394,3508,3534,4056
You should ideally avoid storing CSV data at all in your tables. That being said, for your first result set we can try using STRING_AGG:
SELECT STRING_AGG(col, ',') AS output
FROM yourTable;
Your second requirement is trickier; we can split the values into rows in a CTE to remove duplicates:
WITH cte AS (
SELECT DISTINCT VALUE AS col
FROM yourTable t
CROSS APPLY STRING_SPLIT(t.col, ',')
)
SELECT STRING_AGG(col, ',') WITHIN GROUP (ORDER BY CAST(col AS INT)) AS output
FROM cte;
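The split, deduplicate, sort, and re-aggregate steps can also be traced outside the database. This is only an illustrative sketch in plain Python using the three sample rows from the question, not a replacement for the SQL:

```python
rows = ["2129,3394,3508,3534", "2129,3508", "4056"]

# First requirement: join the rows as they are.
combined = ",".join(rows)

# Second requirement: split every row, drop duplicates,
# sort numerically (like ORDER BY CAST(col AS INT)), and rejoin.
codes = {code for row in rows for code in row.split(",")}
distinct = ",".join(sorted(codes, key=int))

print(combined)  # 2129,3394,3508,3534,2129,3508,4056
print(distinct)  # 2129,3394,3508,3534,4056
```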
I solved this by using STUFF and FOR XML PATH:
SELECT
STUFF((SELECT ',' + US.remain_uncompleted
FROM Table_request US
WHERE exclusive = 0 AND reqact = 1 AND reqend = 0
FOR XML PATH('')), 1, 1, '')
Thank you Tim
I have a requirement to concatenate the values of two rows that have the same IDs and to average another column. Here is the sample table I have:
Now I need to concatenate the Response column, concatenate the Response Rating column, and average the Rating Avg column for rows that have the same ParticipantId, UserId, QuestionId and ConductedById.
Here is the target data I want:
Here the Response and Response Rating columns are concatenated across the respective rows, and the Rating Avg column is averaged. I have previously concatenated a single column using the STUFF function. Can this be achieved using STUFF?
You can do the following. Just group by those columns and make 2 subselects for concatenated columns:
select UserID,
ConductedByID,
QuestionID,
(SELECT STUFF((SELECT ';' + Response
FROM TableName tn2 WHERE tn1.UserID = tn2.UserID and
tn1.ConductedByID = tn2.ConductedByID and
tn1.QuestionID = tn2.QuestionID and
tn1.ParticipantID = tn2.ParticipantID
FOR XML PATH('')) ,1,1,'')) as Response,
(SELECT STUFF((SELECT ';' + cast(Rating as varchar)
FROM TableName tn2 WHERE tn1.UserID = tn2.UserID and
tn1.ConductedByID = tn2.ConductedByID and
tn1.QuestionID = tn2.QuestionID and
tn1.ParticipantID = tn2.ParticipantID
FOR XML PATH('')) ,1,1,'')) as [Response Rating],
AVG(case when Rating = 'n/a' then 0 else cast(Rating as int) end) as [Rating Avg],
ParticipantID
from TableName tn1
group by UserID, ConductedByID, QuestionID, ParticipantID
This works perfectly
STUFF(
(
SELECT DISTINCT ',' + val_name
FROM t_t43_value_set
INNER JOIN t_t43_factory
ON val_id = fac_country
INNER JOIN t_t43_delivery delivery
ON pvs_part_version_id = del_part_version_id
AND pvs_supplier_id = del_supplier_id
AND del_factory_id = fac_factory_id FOR xml path('')),1,1,'') AS 'Country'
I finally found the query to execute to get all ids (comma separated) for one content in one row.
Following query did the trick:
You don't need to look at the query because it already does what it should do.
SELECT
taxonomy_item_id,
SUBSTRING(
(SELECT ', ' + CAST(taxonomy_id AS varchar) AS Expr1
FROM taxonomy_item_tbl AS t2
WHERE (t1.taxonomy_item_id = taxonomy_item_id) AND (taxonomy_language_id = 2067)
ORDER BY taxonomy_item_id, taxonomy_id FOR XML PATH('')
), 1, 1000) AS taxonomy_ids
FROM
taxonomy_item_tbl AS t1
WHERE
(taxonomy_language_id = 2067) AND (taxonomy_item_id = 180555)
GROUP BY
taxonomy_item_id
The only problem is the data result I get:
180555 | <Expr1>, 404</Expr1><Expr1>, 405</Expr1><Expr1>, 723</Expr1><Expr1>, 1086</Expr1><Expr1>, 1087</Expr1><Expr1>, 1118</Expr1><Expr1>, 1124</Expr1><Expr1>, 1126</Expr1>
I don't need the <Expr1> nodes. Is there a way to remove them? If I delete AS Expr1 in the query, it is automatically added back.
Thanks
If you don't want the <Expr1> - then just don't ask for it!
You have:
(SELECT ', ' + CAST(taxonomy_id AS varchar) AS Expr1
That AS Expr1 causes the <Expr1> to be added - so just don't have that expression there.
Try
SELECT
taxonomy_item_id,
SUBSTRING(
(SELECT ', ' + CAST(taxonomy_id AS VARCHAR)
FROM dbo.taxonomy_item_tbl AS t2
WHERE t1.taxonomy_item_id = taxonomy_item_id
AND taxonomy_language_id = 2067
ORDER BY taxonomy_item_id, taxonomy_id
FOR XML PATH('')
), 1, 1000) AS taxonomy_ids
FROM
dbo.taxonomy_item_tbl AS t1
WHERE
taxonomy_language_id = 2067
AND taxonomy_item_id = 180555
GROUP BY
taxonomy_item_id