Sorting and comparing columns in BigQuery SQL - sql

I have a requirement to sort and compare column values in a table that has 6 columns.
I need to sort A_Length, A_breadth and A_Width, and do the same sorting for B_length, B_breadth and B_width.
After sorting, the A_* and B_* columns need to be compared in their sorted order, like this:
After sorting:
Each comparison should output true or false:
(3<11 = true and 22<22 = false and 23<32 = true), so the overall result is false
(5<11 = true and 11<22 = true and 17<32 = true), so the overall result is true
(17<11 = false and 23<22 = false and 27<32 = true), so the overall result is false
In BigQuery I can use GREATEST and LEAST, but I am not sure how to get the third value (the one that is neither greatest nor least).
Please let me know if anyone can suggest the logic. It is a big table with many columns, and the 6 columns above are part of it.
Some more info below:
Sorting is smallest to largest. Suppose I have 6 columns with these values: A_Len=3, A_Bred=15, A_Wid=10, B_Len=20, B_Bred=11, B_Wid=7. First the A_* columns need to be sorted (3, 10, 15), then the B_* columns (7, 11, 20). Then a less-than comparison needs to be done, position by position, between the sorted A_* and B_* values (3<7 = true, 10<11 = true, 15<20 = true). I need the output of the comparison as true or false. I need a suggestion for how this can be done in GCP BigQuery; all 6 columns are part of one table.
Regards,

It seems like you want to do a comparison against something that is not quite the table you currently have. It seems that you want to sort each column individually, without affecting the others, and then compare.
You could try something like this (changing all the instances of TABLE for your table name):
SELECT
sorted_A_LENGTH.A_LENGTH,
sorted_A_BREADTH.A_BREADTH,
sorted_A_WIDTH.A_WIDTH,
sorted_B_LENGTH.B_LENGTH,
sorted_B_BREADTH.B_BREADTH,
sorted_B_WIDTH.B_WIDTH,
sorted_A_LENGTH.A_LENGTH < sorted_B_LENGTH.B_LENGTH,
sorted_A_BREADTH.A_BREADTH < sorted_B_BREADTH.B_BREADTH,
sorted_A_WIDTH.A_WIDTH < sorted_B_WIDTH.B_WIDTH
FROM
(SELECT A_LENGTH, row_number() OVER (ORDER BY A_LENGTH) AS row_num FROM TABLE) as sorted_A_LENGTH
LEFT JOIN
(SELECT A_BREADTH, row_number() OVER (ORDER BY A_BREADTH) AS row_num FROM TABLE) as sorted_A_BREADTH
ON sorted_A_LENGTH.row_num = sorted_A_BREADTH.row_num
LEFT JOIN
(SELECT A_WIDTH, row_number() OVER (ORDER BY A_WIDTH) AS row_num FROM TABLE) as sorted_A_WIDTH
ON sorted_A_LENGTH.row_num = sorted_A_WIDTH.row_num
LEFT JOIN
(SELECT B_LENGTH, row_number() OVER (ORDER BY B_LENGTH) AS row_num FROM TABLE) as sorted_B_LENGTH
ON sorted_A_LENGTH.row_num = sorted_B_LENGTH.row_num
LEFT JOIN
(SELECT B_BREADTH, row_number() OVER (ORDER BY B_BREADTH) AS row_num FROM TABLE) as sorted_B_BREADTH
ON sorted_A_LENGTH.row_num = sorted_B_BREADTH.row_num
LEFT JOIN
(SELECT B_WIDTH, row_number() OVER (ORDER BY B_WIDTH) AS row_num FROM TABLE) as sorted_B_WIDTH
ON sorted_A_LENGTH.row_num = sorted_B_WIDTH.row_num

Consider below
select *,
a_length < b_length as length_a_less_b,
a_breadth < b_breadth as breadth_a_less_b,
a_width < b_width as width_a_less_b
from (
select * from(select * from your_table limit 0) union all
select a_arr[offset(0)], a_arr[offset(1)], a_arr[offset(2)],
b_arr[offset(0)], b_arr[offset(1)], b_arr[offset(2)]
from your_table,
unnest([struct((
select array_agg(a order by a) from unnest([a_length, a_breadth, a_width]) a
) as a_arr)]),
unnest([struct((
select array_agg(b order by b) from unnest([b_length, b_breadth, b_width]) b
) as b_arr)])
)
if applied to sample data in your question
output is
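The per-row logic of the query above can also be checked outside BigQuery. Below is a minimal Python sketch (the function names `middle_of_three` and `a_fits_in_b` are hypothetical, not from the question) that sorts each triple, compares position by position, and combines the three results into the single true/false the question asks for. It also shows one way to get the middle value using only the greatest and least: sum minus greatest minus least, assuming numeric, non-null values.

```python
def middle_of_three(x, y, z):
    # The value that is neither greatest nor least: total minus max minus min.
    return x + y + z - max(x, y, z) - min(x, y, z)


def a_fits_in_b(a_dims, b_dims):
    # Sort each side smallest to largest, then compare position by position.
    a_sorted = sorted(a_dims)
    b_sorted = sorted(b_dims)
    checks = [a < b for a, b in zip(a_sorted, b_sorted)]
    # The overall result is true only if every pairwise comparison is true.
    return checks, all(checks)


# Values from the question: A = (3, 15, 10), B = (20, 11, 7)
checks, overall = a_fits_in_b((3, 15, 10), (20, 11, 7))
print(checks, overall)
```

In SQL, the overall flag would correspond to joining the three comparison results with AND.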

Related

SQL query to return duplicate rows for certain column, but with unique values for another column

I have written the query shown here that combines three tables and returns rows where the at_ticket_num from appeal_tickets is duplicated but against a different at_system_ref value:
select top 100
t.t_reference, at.at_system_ref, at_ticket_num, a.a_case_ref
from
tickets t, appeal_tickets at, appeals_2 a
where
t.t_reference in ('AB123','AB234') -- filtering on these values so that I can see that its working
and t.t_number = at.at_ticket_num
and at.at_system_ref = a.a_system_ref
and at.at_ticket_num IN (select at_ticket_num
from appeal_tickets
group by at_ticket_num
having count(distinct at_system_ref) > 1)
order by
t.t_reference desc
This is the output:
t_reference at_system_ref at_ticket_num a_case_ref
-------------------------------------------------------
AB123 30838974 23641583 1111979010
AB123 30838976 23641583 1111979010
AB234 30839149 23641520 1111977352
AB234 30839209 23641520 1111988003
I want to modify this so that it only returns records where t_reference is duplicated but against a different a_case_ref. So in above case only records for AB234 would be returned.
Any help would be much appreciated.
It seems you want all ticket appeals that have more than one system reference and more than one case reference. You can join the tables, count the distinct values per ticket, and then keep only the tickets that match these criteria.
select *
from
(
select
t.t_reference, at.at_system_ref, at.at_ticket_num, a.a_case_ref,
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
count(distinct a.a_case_ref) over (partition by at.at_ticket_num) as caserefs
from tickets t
join appeal_tickets at on at.at_ticket_num = t.t_number
join appeals_2 a on a.a_system_ref = at.at_system_ref
) counted
where sysrefs > 1 and caserefs > 1
order by t_reference, at_system_ref, at_ticket_num, a_case_ref;
Correction
It seems that SQL Server still doesn't support COUNT(DISTINCT ...) OVER (...). You can count distinct values in a subquery though. Replace
count(distinct a.a_system_ref) over (partition by at.at_ticket_num) as sysrefs,
by
(
select count(distinct a2.a_system_ref)
from appeal_tickets at2
join appeals_2 a2 on a2.a_system_ref = at2.at_system_ref
where at2.at_ticket_num = t.t_number
) as sysrefs,
An alternative workaround is to use DENSE_RANK in two directions (found here: https://stackoverflow.com/a/53518204/2270762):
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref) +
dense_rank() over (partition by at.at_ticket_num order by a.a_system_ref desc) -
1 as sysrefs,
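Why the DENSE_RANK workaround yields a distinct count may not be obvious, so here is a small Python sketch of the arithmetic (the helper names are hypothetical). For any row, the ascending dense rank counts the distinct values up to and including its own, and the descending dense rank counts those from its own value upward; the row's own value is counted twice, hence the minus 1. The sketch assumes no NULLs, which SQL would rank but COUNT(DISTINCT ...) would ignore.

```python
def dense_rank(values, reverse=False):
    # Map each value to its 1-based dense rank (ties share a rank, no gaps).
    ordered = sorted(set(values), reverse=reverse)
    rank_of = {v: i + 1 for i, v in enumerate(ordered)}
    return [rank_of[v] for v in values]


def distinct_count_via_dense_rank(values):
    asc = dense_rank(values)
    desc = dense_rank(values, reverse=True)
    # asc + desc - 1 is the same for every row: the number of distinct values.
    return [a + d - 1 for a, d in zip(asc, desc)]


# at_system_ref values for ticket 23641520 in the question's output
refs = [30839149, 30839209]
print(distinct_count_via_dense_rank(refs))
```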
with data as (
<your query plus one column>,
case when
min(a.a_case_ref) over (partition by t.t_reference)
<>
max(a.a_case_ref) over (partition by t.t_reference)
then 1 end as dup
)
select * from data where dup = 1
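The min/max comparison works because a group whose smallest and largest values differ must contain at least two distinct values. Here is a minimal Python sketch of that check, run on the rows from the question's output and assuming the compared column is a_case_ref, as the question describes (the function name is hypothetical):

```python
from collections import defaultdict

# Rows from the question: (t_reference, at_system_ref, at_ticket_num, a_case_ref)
rows = [
    ("AB123", 30838974, 23641583, 1111979010),
    ("AB123", 30838976, 23641583, 1111979010),
    ("AB234", 30839149, 23641520, 1111977352),
    ("AB234", 30839209, 23641520, 1111988003),
]


def rows_with_differing_case_refs(rows):
    # Collect the a_case_ref values seen for each t_reference.
    case_refs = defaultdict(list)
    for t_reference, _, _, a_case_ref in rows:
        case_refs[t_reference].append(a_case_ref)
    # min <> max within a group means at least two distinct values.
    dup = {ref for ref, vals in case_refs.items() if min(vals) != max(vals)}
    return [row for row in rows if row[0] in dup]


for row in rows_with_differing_case_refs(rows):
    print(row)
```

Only the two AB234 rows pass the filter, which matches the result the question asks for.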

Big query De-duplication query is not working properly

Can anyone please tell me why the query below is not working properly? It is supposed to delete only the duplicate records and keep one of them (the latest record), but it deletes all of the records instead of keeping one of the duplicates. Why is that?
delete
from
dev_rahul.page_content_insights
where
(sha_id,
etl_start_utc_dttm) in (
select
(a.sha_id,
a.etl_start_utc_dttm)
from
(
select
sha_id,
etl_start_utc_dttm,
ROW_NUMBER() over (Partition by sha_id
order by
etl_start_utc_dttm desc) as rn
from
dev_rahul.page_content_insights
where
(snapshot_dt) >= '2021-03-25' ) a
where
a.rn <> 1)
Query looks ok, though I don't use that syntax for cleaning up duplicates.
Can I confirm the following:
sha_id, etl_start_utc_dttm is your primary key?
You wish to keep sha_id and the latest row based on etl_start_utc_dttm field descending?
If so, try this two-query pattern:
create or replace table dev_rahul.rows_not_to_delete as
SELECT col.* FROM (
SELECT ARRAY_AGG(pci ORDER BY etl_start_utc_dttm desc LIMIT 1)[OFFSET(0)] col
FROM dev_rahul.page_content_insights pci
where snapshot_dt >= '2021-03-25'
GROUP BY sha_id
);
delete dev_rahul.page_content_insights p
where not exists (select 1 from dev_rahul.rows_not_to_delete d
where p.sha_id = d.sha_id and p.etl_start_utc_dttm = d.etl_start_utc_dttm
) and snapshot_dt >= '2021-03-25';
You could do this in a single query by putting the first statement into a CTE.
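The intent of the two-query pattern above (record the latest row per sha_id, then delete everything that does not match it) can be sketched in plain Python. This is only an illustration of the logic under assumed data: the sample rows and function names are hypothetical, and each row is reduced to a (sha_id, etl_start_utc_dttm) pair.

```python
def rows_to_keep(rows):
    # For each sha_id remember the latest etl_start_utc_dttm seen so far.
    latest = {}
    for sha_id, dttm in rows:
        if sha_id not in latest or dttm > latest[sha_id]:
            latest[sha_id] = dttm
    return latest


def delete_older_duplicates(rows):
    keep = rows_to_keep(rows)
    # Mirror of DELETE ... WHERE NOT EXISTS: drop rows with no match in keep.
    return [(s, d) for s, d in rows if keep[s] == d]


# Hypothetical sample data; ISO-formatted timestamps compare correctly as strings.
sample = [
    ("a1", "2021-03-25 10:00:00"),
    ("a1", "2021-03-26 09:30:00"),
    ("b2", "2021-03-25 08:00:00"),
]
print(delete_older_duplicates(sample))
```

Note that if two rows share both the same sha_id and the same etl_start_utc_dttm, both survive this filter, which is why it matters whether that pair really is the primary key.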

How to deselect duplicate entries in a query?

I've got a query like this:
SELECT *
FROM RecipeTable, RecipeIngredientTable, SyncRecipeIngredientTable
WHERE RecipeTable.recipe_id = SyncRecipeIngredientTable.recipe_id
AND RecipeIngredientTable.recipe_ingredient_id =
SyncRecipeIngredientTable.recipe_ingredient_id
AND RecipeIngredientTable.recipe_item_name in ("ayva", "pirinç", "su")
GROUP by RecipeTable.recipe_id
HAVING COUNT(*) >= 3;
and this query returns the result like this:
As you can see in the image, there are 3 duplicate, unnecessary entries (no, I can't delete them because of the multiple foreign keys). How can I exclude these duplicate entries from the query result? In the end I want to return 6 entries, not 9.
What you want to eliminate in the result set is not duplication of recipe_id values but recipe_name values.
You just need to partition by recipe_name using the ROW_NUMBER() analytic function:
SELECT recipe_id, author_name ...
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY recipe_name) AS rn,
sr.recipe_id, author_name ...
FROM SyncRecipeIngredientTable sr
JOIN RecipeIngredientTable ri
ON ri.recipe_ingredient_id = sr.recipe_ingredient_id
JOIN RecipeTable rt
ON rt.recipe_id = sr.recipe_id
WHERE ri.recipe_item_name in ("ayva", "pirinç", "su")
)
WHERE rn = 1
This way, you pick only the one record with rn = 1 for each recipe_name (an ORDER BY clause can be added to the analytic function after the PARTITION BY clause if a specific record needs to be picked).
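To make the numbering concrete, here is a rough Python emulation of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) followed by the rn = 1 filter. The recipe rows and the `with_row_number` helper are hypothetical; the real table contents are not shown in the question.

```python
from itertools import groupby
from operator import itemgetter


def with_row_number(rows, partition_by, order_by):
    # Sort by the partition key first, then by the ordering column inside it.
    ordered = sorted(rows, key=lambda r: (r[partition_by], r[order_by]))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        # Numbering restarts at 1 for every partition.
        for rn, row in enumerate(group, start=1):
            numbered.append({**row, "rn": rn})
    return numbered


# Hypothetical data: recipe "pilav" was entered twice under different ids.
recipes = [
    {"recipe_id": 1, "recipe_name": "pilav"},
    {"recipe_id": 4, "recipe_name": "pilav"},
    {"recipe_id": 2, "recipe_name": "komposto"},
]
numbered = with_row_number(recipes, "recipe_name", "recipe_id")
picked = [r for r in numbered if r["rn"] == 1]
print([r["recipe_id"] for r in picked])
```

Ordering by recipe_id inside each partition makes the choice deterministic: the lowest id per name is the one kept.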

Postgres: Making column in first row contain sum of same column in other rows

I'm a newbie in Postgres and I have a troubling issue.
Suppose the output of my SQL query is
123456789;"2014-11-20 12:30:35.454875";500;200;"2014-11-16 16:16:26.976258";300
123456789;"2014-11-20 12:30:35.454875";500;200;"2014-11-16 16:16:27.173523";100
What I want is to sum up the 4th column so that the first row contains the sum of the 4th column:
123456789;"2014-11-20 12:30:35.454875";500;400;"2014-11-16 16:16:26.976258";300
My query is
select l.phone_no, l.loan_time, l.cents_loaned/100, r.cents_deducted/100, r.event_time,
r.cents_balance/100
from tbl_table1 l
LEFT JOIN tbl_table2 r
ON l.tb1_id = r.tbl2_id
where l.phone_no=123456789
order by r.event_time desc
Any help will be appreciated.
Maybe this helps. It returns the first row with the sum of the 4th column in place of that row's own value, and leaves the other rows unchanged.
WITH query AS (
SELECT l.phone_no, l.loan_time, l.cents_loaned/100 AS cents_loaned,
r.cents_deducted/100 AS cents_deducted, r.event_time,
r.cents_balance/100 AS cents_balance,
ROW_NUMBER() OVER (ORDER BY r.event_time DESC) rn,
SUM(cents_deducted/100) OVER () AS sum_cents_deducted
FROM tbl_table1 l
LEFT JOIN tbl_table2 r
ON l.tb1_id = r.tbl2_id
WHERE l.phone_no=123456789
)
SELECT phone_no, loan_time, cents_loaned, cents_deducted, event_time, cents_balance
FROM query
WHERE rn > 1
UNION ALL
SELECT phone_no, loan_time, cents_loaned, sum_cents_deducted, event_time, cents_balance
FROM query
WHERE rn = 1
Use a window function over the whole set (OVER ()) as frame:
select l.phone_no, l.loan_time, l.cents_loaned/100
, sum(r.cents_deducted) OVER () / 100 AS total_cents_deducted
, r.event_time, r.cents_balance/100
FROM tbl_table1 l
LEFT JOIN tbl_table2 r ON l.tb1_id = r.tbl2_id
WHERE l.phone_no = 123456789
ORDER BY r.event_time desc
This will return all rows, not just the first. Your question is unclear as to that.
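What an empty OVER () does can be sketched in a few lines of Python: the aggregate is computed once over the whole result set and attached to every row, without collapsing the rows. The two rows below are the 4th and 5th columns from the question's output; the `with_total` name is hypothetical, and the sketch assumes no NULL values (which SQL's SUM would skip).

```python
# (cents_deducted / 100, event_time) for the two rows shown in the question
rows = [
    (200, "2014-11-16 16:16:26.976258"),
    (200, "2014-11-16 16:16:27.173523"),
]


def with_total(rows):
    # One total over the whole set, like SUM(...) OVER ().
    total = sum(deducted for deducted, _ in rows)
    # Every row keeps its own values and also carries the total.
    return [(deducted, event_time, total) for deducted, event_time in rows]


for row in with_total(rows):
    print(row)
```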

SQL Grouping even and odd

I have to group certain data so that it comes out in 2 sets.
The attached image has details of the actual data, the expected result, and the data from the query I used.
I am sure I am missing something in the group by or max option. Please help.
select agrmnt_id ,location_name, slab_no,target_start,target_end, tier_perc ,mod(RANK, 2) col from
(select agrmnt_id ,location_name, slab_no, target as target_start ,LAG(target) OVER (PARTITION BY location_name ORDER BY slab_no DESC)-1 as target_end ,PAY_PREC|| '%' as tier_perc,
DENSE_RANK() over(partition by agrmnt_id order by location_name) RANK
from plb_addnl_slab_details
where agrmnt_id='PLBCAI140262' order by location_name,slab_no
)) group by agrmnt_id,location_name ,slab_no
order by location_name1 ,slab_no1, location_name2 ,slab_no2
If I understand what you want, which is more than a little doubtful, it seems like you are able to generate a list of all the values you want, but you can't get them aligned in two sets? If so I think you need to treat your initial list as a base view and left outer join it to itself, using your col value to decide which is in first set and which in the second.
The criteria for joining seem a bit vague. If I add another ranking to stop the same values appearing twice in the second columns, I can get your expected result with this:
with t as (
select agrmnt_id, location_name, slab_no, target_start, target_end,
tier_perc , mod(col_rnk, 2) col, rnk
from (
select agrmnt_id, location_name, slab_no, target as target_start,
LAG(target) OVER (PARTITION BY location_name
ORDER BY slab_no DESC)-1 as target_end,
SLAB_PERC|| '%' as tier_perc,
DENSE_RANK() over(partition by agrmnt_id order by location_name) col_rnk,
RANK() over(partition by agrmnt_id, slab_no order by location_name) rnk
from plb_addnl_slab_details
where agrmnt_id='PLBCAI140262'
)
)
select t1.agrmnt_id as agrmnt_id_1, t1.location_name as location_name_1,
t1.slab_no as slab_no_1, t1.target_start as target_start_1,
t1.target_end as target_end_1,
t2.agrmnt_id as agrmnt_id_2, t2.location_name as location_name_2,
t2.slab_no as slab_no_2, t2.target_start as target_start_2,
t2.target_end as target_end_2
from t t1
left join t t2 on t2.agrmnt_id = t1.agrmnt_id
and t2.slab_no = t1.slab_no
and t2.rnk = t1.rnk + 1
and t2.col = 0
where t1.col = 1
order by t1.agrmnt_id, t1.location_name, t1.slab_no;
SQL Fiddle. I'm not convinced those join conditions (or the new rank) are quite right but can't really tell without more data, or more information about the logic you want to use. Hopefully this gives you something you can adapt though.