SQL to add position depending on multiple columns

I have a table to which I am adding a position column, and I need to assign a numbered position to every row already in the table. The numbering depends on 4 columns: rows whose values match in all 4 columns form a group, and the position counts up within each group. For example:
id| name| fax | cart| area |
1| jim | 1 | 4 | 1 |
2| jim | 1 | 4 | 1 |
3| jim | 2 | 4 | 1 |
4| jim | 2 | 4 | 1 |
5| bob | 1 | 4 | 1 |
6| bob | 1 | 4 | 1 |
7| bob | 2 | 5 | 1 |
8| bob | 2 | 5 | 2 |
9| bob | 2 | 5 | 2 |
10| bob | 2 | 5 | 2 |
would result in:
id| name| fax | cart| area | position
1| jim | 1 | 4 | 1 | 1
2| jim | 1 | 4 | 1 | 2
3| jim | 2 | 4 | 1 | 1
4| jim | 2 | 4 | 1 | 2
5| bob | 1 | 4 | 1 | 1
6| bob | 1 | 4 | 1 | 2
7| bob | 2 | 5 | 1 | 1
8| bob | 2 | 5 | 2 | 1
9| bob | 2 | 5 | 2 | 2
10| bob | 2 | 5 | 2 | 3
I need a SQL query that will go over the table and fill in the position.

Use row_number():
select
t.*,
row_number() over(partition by name, fax, cart, area order by id) position
from mytable t
If you want an update query instead (this UPDATE ... FROM form is PostgreSQL-style syntax):
update mytable as t
set position = rn
from (
select id, row_number() over(partition by name, fax, cart, area order by id) rn
from mytable
) x
where x.id = t.id
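Not every engine accepts UPDATE ... FROM or window functions. As a rough sketch of a more portable alternative, the position can also be computed with a correlated subquery that counts the rows in the same group with a smaller or equal id. The example below runs it through Python's sqlite3 module purely as a test harness; the in-memory database and the sample rows (copied from the question) are assumptions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, name TEXT, fax INT, cart INT, area INT, position INT)"
)
rows = [
    (1, "jim", 1, 4, 1), (2, "jim", 1, 4, 1), (3, "jim", 2, 4, 1), (4, "jim", 2, 4, 1),
    (5, "bob", 1, 4, 1), (6, "bob", 1, 4, 1), (7, "bob", 2, 5, 1),
    (8, "bob", 2, 5, 2), (9, "bob", 2, 5, 2), (10, "bob", 2, 5, 2),
]
conn.executemany(
    "INSERT INTO mytable (id, name, fax, cart, area) VALUES (?, ?, ?, ?, ?)", rows
)

# position = number of rows in the same (name, fax, cart, area) group
# whose id is less than or equal to this row's id
conn.execute("""
    UPDATE mytable
    SET position = (
        SELECT COUNT(*)
        FROM mytable AS m
        WHERE m.name = mytable.name
          AND m.fax  = mytable.fax
          AND m.cart = mytable.cart
          AND m.area = mytable.area
          AND m.id  <= mytable.id
    )
""")

positions = [p for (p,) in conn.execute("SELECT position FROM mytable ORDER BY id")]
print(positions)
```

Note that the `=` comparisons never match NULLs, so rows with a NULL in any of the four columns would need separate handling; on large tables the row_number() version is likely to be faster.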

Related

Group query in subquery to get column value as column name

The data I have in my database:
| id| some_id| status|
| 1| 1 | SUCCESS|
| 2| 2 | SUCCESS|
| 3| 1 | SUCCESS|
| 4| 3 | SUCCESS|
| 5| 1 | SUCCESS|
| 6| 4 | FAILED |
| 7| 1 | SUCCESS|
| 8| 1 | FAILED |
| 9| 4 | FAILED |
| 10| 1 | FAILED |
.......
I ran a query that groups by some_id and status to get the result below:
| some_id| count| status|
| 1 | 20| SUCCESS|
| 2 | 5 | SUCCESS|
| 3 | 10| SUCCESS|
| 2 | 15| FAILED |
| 3 | 12| FAILED |
| 4 | 25 | FAILED |
I want to use the above query as a subquery to get the result below, where the distinct status values become the column names.
| some_id| SUCCESS| FAILED|
| 1 | 20 | null/0|
| 2 | 5 | 15 |
| 3 | 10 | 12 |
| 4 | null/0| 25 |
Any other approach to get the final data is also appreciated. Let me know if you need more info.
Thanks
You may use a pivot query here with the help of FILTER:
SELECT
some_id,
COUNT(*) FILTER (WHERE status = 'SUCCESS') AS SUCCESS,
COUNT(*) FILTER (WHERE status = 'FAILED') AS FAILED
FROM yourTable
GROUP BY
some_id;
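If the FILTER clause is not available on your database, conditional aggregation with CASE gives the same pivot. A minimal sketch, run through Python's sqlite3 module only so it can be tried without a server; the table holds just the ten sample rows shown in the question, so the counts are smaller than the ones in the question's grouped output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, some_id INT, status TEXT)")
sample = [
    (1, 1, "SUCCESS"), (2, 2, "SUCCESS"), (3, 1, "SUCCESS"), (4, 3, "SUCCESS"),
    (5, 1, "SUCCESS"), (6, 4, "FAILED"), (7, 1, "SUCCESS"), (8, 1, "FAILED"),
    (9, 4, "FAILED"), (10, 1, "FAILED"),
]
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", sample)

# One SUM(CASE ...) per status value; rows that do not match contribute 0,
# so a some_id with no rows for a status shows 0 rather than NULL.
pivot = conn.execute("""
    SELECT some_id,
           SUM(CASE WHEN status = 'SUCCESS' THEN 1 ELSE 0 END) AS success,
           SUM(CASE WHEN status = 'FAILED'  THEN 1 ELSE 0 END) AS failed
    FROM yourTable
    GROUP BY some_id
    ORDER BY some_id
""").fetchall()
print(pivot)
```

Either form needs the status values listed by hand; a truly dynamic set of columns has to be generated by building the SQL text first.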

Cross join remaining combinations

I am trying to build a table that would give me a combination of all products that I could sell, based on the current ones.
Product Status Table
+-------------+--------------+----------------+
| customer_id | product_name | product_status |
+-------------+--------------+----------------+
| 1 | A | Active |
| 2 | B | Active |
| 2 | C | Active |
| 3 | A | Cancelled |
+-------------+--------------+----------------+
Now I am trying to cross join with a hard-coded table that would give me 4 rows per customer_id, based on all 4 products we have in our portfolio, and the statuses that I would like to apply.
Portfolio Table
+--------------+------------+----------+
| product_name | status_1 | status_2 |
+--------------+------------+----------+
| A | Inelegible | Inactive |
| B | Inelegible | Inactive |
| C | Ineligible | Inactive |
| D | Inelegible | Inactive |
+--------------+------------+----------+
In my code I tried to use a CROSS JOIN in order to get 4 rows per customer_id. Unfortunately, for customers that have more than one product, I get double or triple rows.
This is my code:
SELECT
p.customer_id,
CASE WHEN p.product_name = pt.product_name THEN p.product_name ELSE pt.product_name END AS product_name,
CASE
WHEN p.product_name = pt.product_name THEN p.product_status
ELSE pt.status_1
END AS product_status
FROM
products AS p
CROSS JOIN
portfolio as pt
This is my current output:
+----+-------------+--------------+----------------+
| # | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
| 1 | 1 | A | Active |
| 2 | 1 | B | Inelegible |
| 3 | 1 | C | Inelegible |
| 4 | 1 | D | Inelegible |
| 5 | 2 | A | Ineligible |
| 6 | 2 | A | Ineligible |
| 7 | 2 | B | Active |
| 8 | 2 | B | Ineligible |
| 9 | 2 | C | Active |
| 10 | 2 | C | Ineligible |
| 11 | 2 | D | Ineligible |
| 12 | 2 | D | Ineligible |
| 13 | 3 | A | Cancelled |
| 14 | 3 | B | Ineligible |
| 15 | 3 | C | Ineligible |
| 16 | 3 | D | Ineligible |
+----+-------------+--------------+----------------+
As you can see, for customer_id 2 I get two rows for each product, with products B and C also appearing with statuses different from the ones in the product_status table.
What I would like to achieve, in this case, is a table with 12 rows, in which the current product/status from the product_status table is shown, and the remaining product/statuses from the portfolio table are added.
Expected output
+----+-------------+--------------+----------------+
| # | customer_id | product_name | product_status |
+----+-------------+--------------+----------------+
| 1 | 1 | A | Active |
| 2 | 1 | B | Inelegible |
| 3 | 1 | C | Inelegible |
| 4 | 1 | D | Inelegible |
| 5 | 2 | A | Ineligible |
| 6 | 2 | B | Active |
| 7 | 2 | C | Active |
| 8 | 2 | D | Ineligible |
| 9 | 3 | A | Cancelled |
| 10 | 3 | B | Ineligible |
| 11 | 3 | C | Ineligible |
| 12 | 3 | D | Ineligible |
+----+-------------+--------------+----------------+
Not sure if the CROSS JOIN is the best alternative, but now I am running out of ideas.
EDIT:
I thought of another cleaner solution. Do a cross join first, then a right join on the customer_id and product_name, and coalesce the product statuses.
SELECT customer_id, product_name, coalesce(product_status, status_1) AS product_status
FROM products p
RIGHT JOIN (
SELECT *
FROM (SELECT DISTINCT customer_id FROM products) pro
CROSS JOIN portfolio
) pt
USING (customer_id, product_name)
ORDER BY customer_id, product_name
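The same idea can be written with a LEFT JOIN starting from the cross-joined side, which avoids RIGHT JOIN and USING on engines that lack them. A small sketch under a few assumptions: it runs in SQLite through Python's sqlite3 module rather than Spark SQL, and the status is spelled uniformly as 'Ineligible' in the sample portfolio rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (customer_id INT, product_name TEXT, product_status TEXT);
    INSERT INTO products VALUES
        (1, 'A', 'Active'), (2, 'B', 'Active'), (2, 'C', 'Active'), (3, 'A', 'Cancelled');
    CREATE TABLE portfolio (product_name TEXT, status_1 TEXT, status_2 TEXT);
    INSERT INTO portfolio VALUES
        ('A', 'Ineligible', 'Inactive'), ('B', 'Ineligible', 'Inactive'),
        ('C', 'Ineligible', 'Inactive'), ('D', 'Ineligible', 'Inactive');
""")

# 1) distinct customers x portfolio gives exactly 4 rows per customer
# 2) the left join attaches the real status where the customer owns the product
# 3) COALESCE falls back to the portfolio default otherwise
combined = conn.execute("""
    SELECT c.customer_id,
           pt.product_name,
           COALESCE(p.product_status, pt.status_1) AS product_status
    FROM (SELECT DISTINCT customer_id FROM products) AS c
    CROSS JOIN portfolio AS pt
    LEFT JOIN products AS p
           ON p.customer_id = c.customer_id
          AND p.product_name = pt.product_name
    ORDER BY c.customer_id, pt.product_name
""").fetchall()

customer_2 = [(name, status) for cid, name, status in combined if cid == 2]
print(len(combined), customer_2)
```

Cross joining against the distinct customer ids, rather than against the products table itself, is what removes the duplicate rows for customers with several products.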
Old answer:
The idea is to include information of all product names for a customer_id into a list, and check whether the product in portfolio is in that list.
(SELECT customer_id, pt_product_name as product_name, first(status_1) as product_status
FROM (
SELECT
customer_id,
p.product_name as p_product_name,
pt.product_name as pt_product_name,
product_status,
status_1,
status_2,
collect_list(p.product_name) over (partition by customer_id) AS product_list
FROM products p
CROSS JOIN portfolio pt
)
WHERE NOT array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)
UNION ALL
(SELECT customer_id, p_product_name as product_name, first(product_status) as product_status
FROM (
SELECT
customer_id,
p.product_name as p_product_name,
pt.product_name as pt_product_name,
product_status,
status_1,
status_2,
collect_list(p.product_name) over (partition by customer_id) AS product_list
FROM products p
CROSS JOIN portfolio pt)
WHERE array_contains(product_list, pt_product_name)
GROUP BY customer_id, product_name)
ORDER BY customer_id, product_name;
which gives
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| A| Active|
| 1| B| Inelegible|
| 1| C| Ineligible|
| 1| D| Inelegible|
| 2| A| Inelegible|
| 2| B| Active|
| 2| C| Active|
| 2| D| Inelegible|
| 3| A| Cancelled|
| 3| B| Inelegible|
| 3| C| Ineligible|
| 3| D| Inelegible|
+-----------+------------+--------------+
FYI the chunk before UNION ALL gives:
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| B| Inelegible|
| 1| C| Ineligible|
| 1| D| Inelegible|
| 2| A| Inelegible|
| 2| D| Inelegible|
| 3| B| Inelegible|
| 3| C| Ineligible|
| 3| D| Inelegible|
+-----------+------------+--------------+
And the chunk after UNION ALL gives:
+-----------+------------+--------------+
|customer_id|product_name|product_status|
+-----------+------------+--------------+
| 1| A| Active|
| 2| B| Active|
| 2| C| Active|
| 3| A| Cancelled|
+-----------+------------+--------------+
Hope that helps!

Creating Third Table Using Two Table With Help Of Spark-Sql or PySpark (No Use of Panda(Python))

I am trying to create a third table from two existing tables using Spark SQL or PySpark (without using pandas).
Dataframe One:
+---------+---------+------------+-----------+
| NAME | NAME_ID | CLIENT | CLIENT_ID |
+---------+---------+------------+-----------+
| RISHABH | 1 | SINGH | 5 |
| RISHABH | 1 | PATHAK | 3 |
| RISHABH | 1 | KUMAR | 2 |
| KEDAR | 2 | PATHAK | 3 |
| KEDAR | 2 | JADHAV | 1 |
| ANKIT | 3 | SRIVASTAVA | 6 |
| ANKIT | 3 | KUMAR | 2 |
| SUMIT | 4 | SINGH | 5 |
| SUMIT | 4 | SHARMA | 4 |
+---------+---------+------------+-----------+
Dataframe Two:
| NAME | NAME_ID | CLIENT | CLIENT_ID |
| RISHBAH | _____ | SRIVASTAVA | _____ |
| KEDAR | _____ | KUMAR | _____ |
| RISHABH | _____ | SINGH | _____ |
| KEDAR | _____ | PATHAK | _____ |
Required Dataframe Output:
+---------+---------+------------+-----------+
| NAME | NAME_ID | CLIENT | CLIENT_ID |
| RISHBAH | 1 | SRIVASTAVA | 6 |
| KEDAR | 2 | KUMAR | 2 |
| RISHABH | 1 | SINGH | 5 |
| KEDAR | 2 | PATHAK | 3 |
This should be done using Spark SQL or Spark.
I tried df1.join(df2, df1.NAME == df2.NAME, "left"),
but I am not getting the required output.
I would suggest the following spark-sql approach
val df1 = <assuming data loaded>
val df2 = <assuming data loaded>
// create views on top of the dataframes
df1.createOrReplaceTempView("tbl1")
df2.createOrReplaceTempView("tbl2")
// extract the unique names and name ids from the first df
val uniqueNameDF = sparkSession.sql("select distinct name, name_id from tbl1")
// extract the unique client names and client ids
val uniqueClientDF = sparkSession.sql("select distinct client, client_id from tbl1")
// create views on these temporary results
uniqueNameDF.createOrReplaceTempView("name")
uniqueClientDF.createOrReplaceTempView("client")
// join the above views with df2 to get the desired result
val resultDF = sparkSession.sql("select n.name, n.name_id, c.client, c.client_id from tbl2 join name n on tbl2.name = n.name join client c on tbl2.client = c.client")
# FROM DATAFRAME ONE AS df_with_key
# SPLIT OUT DISTINCT BY NAME AND CLIENT
nameDF=df_with_key.select("NAME","NAME_ID").distinct()
clientDF=df_with_key.select("CLIENT","CLIENT_ID").distinct()
# DATAFRAME TWO AS df_with_client
+-------+-------+----------+---------+
| NAME|NAME_ID| CLIENT|CLIENT_ID|
+-------+-------+----------+---------+
| KEDAR| null| KUMAR| null|
| KEDAR| null| PATHAK| null|
|RISHABH| null| SINGH| null|
|RISHBAH| null|SRIVASTAVA| null|
+-------+-------+----------+---------+
# NOW JOIN FIRST WITH NAME AND THEN CLIENT
(df_with_client.drop("NAME_ID")
    .join(nameDF, nameDF.NAME == df_with_client.NAME, "LEFT")
    .drop(nameDF.NAME)
    .drop("CLIENT_ID")
    .join(clientDF, df_with_client.CLIENT == clientDF.CLIENT)
    .drop(clientDF.CLIENT)
    .select("NAME", "NAME_ID", "CLIENT", "CLIENT_ID")
    .show())
+-------+-------+----------+---------+
| NAME|NAME_ID| CLIENT|CLIENT_ID|
+-------+-------+----------+---------+
| KEDAR| 2| KUMAR| 2|
| KEDAR| 2| PATHAK| 3|
|RISHABH| 1| SINGH| 5|
|RISHBAH| 1|SRIVASTAVA| 6|
+-------+-------+----------+---------+
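To try the two-lookup join without a Spark session, here is a rough sketch that runs the same SQL shape in SQLite via Python's sqlite3 module. This is a stand-in engine, not Spark, and the second table uses the spelling RISHABH throughout, which is an assumption: an equality join would not match a misspelled name such as RISHBAH, and with a left join that row would come back with a NULL NAME_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (name TEXT, name_id INT, client TEXT, client_id INT);
    INSERT INTO tbl1 VALUES
        ('RISHABH', 1, 'SINGH', 5), ('RISHABH', 1, 'PATHAK', 3), ('RISHABH', 1, 'KUMAR', 2),
        ('KEDAR', 2, 'PATHAK', 3), ('KEDAR', 2, 'JADHAV', 1),
        ('ANKIT', 3, 'SRIVASTAVA', 6), ('ANKIT', 3, 'KUMAR', 2),
        ('SUMIT', 4, 'SINGH', 5), ('SUMIT', 4, 'SHARMA', 4);
    CREATE TABLE tbl2 (name TEXT, client TEXT);
    INSERT INTO tbl2 VALUES
        ('RISHABH', 'SRIVASTAVA'), ('KEDAR', 'KUMAR'),
        ('RISHABH', 'SINGH'), ('KEDAR', 'PATHAK');
""")

# Look up NAME_ID and CLIENT_ID independently: each id depends only on its
# own name, so two distinct lookup sets are joined to the second table.
filled = conn.execute("""
    SELECT t.name, n.name_id, t.client, c.client_id
    FROM tbl2 AS t
    LEFT JOIN (SELECT DISTINCT name, name_id FROM tbl1) AS n ON n.name = t.name
    LEFT JOIN (SELECT DISTINCT client, client_id FROM tbl1) AS c ON c.client = t.client
    ORDER BY t.name, t.client
""").fetchall()
print(filled)
```

Joining the two tables directly on NAME, as in the question, pairs each name with every client row it has in the first table, which is why that attempt does not give the required output.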

summing dynamic rows using over partition by postgres

On Postgres 9.2:
| payer| effective_status | 1 | 2 | 3 | 4+
+------+ -----------------+-------+--------+-----+-----
| p1 | foo | 8 | 6000 | 4| 1
| p1 | bar | 10 | 5200 | 9| 2
| p1 | baz | 11 | 5200 | 11| 2
| p1 | zip | 9 | 4500 | 14| 4
| p1 | zap | 7 | 4200 | 45| 5
| p1 | status_n | 2 | 3900 | 71| 1
Suppose the above is the query output (minus the ??s). I am trying to sum columns 1, 2, 3, and 4+ per payer, across all of that payer's effective statuses. So for p1 there would be a column total including all effective_statuses, and then p2 would have its own group total.
| p1 | effective_status | 1 | 2 | 3 | 4+| 1 total | 2 total|3 total| 4+ total
+------+ -----------------+-------+--------+-----+---+---------+--------+-------+----------
| | foo | 8 | 6000 | 4| 1| 94 | 6230 | 154 | 15
| | bar | 10 | 5200 | 9| 2| 94 | 6230 | 154 | 15
| | baz | 11 | 5200 | 11| 2| 94 | 6230 | 154 | 15
| | zip | 9 | 4500 | 14| 4| 94 | 6230 | 154 | 15
| | zap | 7 | 4200 | 45| 5| 94 | 6230 | 154 | 15
| | status_n | 2 | 3900 | 71| 1| 94 | 6230 | 154 | 15
How would I calculate the ??s? I have tried:
payer
,effective_status
,status_check1
,SUM(status_check1) OVER (PARTITION BY payer) AS status_check1_total
,status_check2
,SUM(status_check2) OVER (PARTITION BY payer) AS status_check2_total
,status_check3
,SUM(status_check3) OVER (PARTITION BY payer) AS status_check3_total
,status_check4
,SUM(status_check4) OVER (PARTITION BY payer) AS status_check4_total
This seems to work most of the time, but on occasion the totals are wrong. Is this the correct approach?
If I understand correctly, you can use UNION ALL to combine a totals result set with your original table, then ORDER BY payer and the grp column so that each total row follows its payer's rows.
CREATE TABLE T(
payer varchar(50),
effective_status varchar(50),
status_check1 int,
status_check2 int,
status_check3 int,
status_check4 int
);
INSERT INTO T VALUES ('p1', 'foo',8 ,6000,4,1);
INSERT INTO T VALUES ('p1', 'bar',10,5200,9,2);
INSERT INTO T VALUES ('p1', 'baz',11,5200,11,2);
INSERT INTO T VALUES ('p1', 'zip',9 ,4500,14,4);
INSERT INTO T VALUES ('p1', 'zap',7 ,4200,45,5);
INSERT INTO T VALUES ('p1', 'status_n',2 ,3900,71,1);
INSERT INTO T VALUES ('p2', 'foo',5 ,3500,12,2);
INSERT INTO T VALUES ('p2', 'zip',1 ,5000,1,1);
Query 1:
SELECT *
FROM (
SELECT t1.payer
,effective_status
,status_check1
,status_check2
,status_check3
,status_check4
,1 grp
FROM T t1
UNION ALL
SELECT payer,
'',
SUM(status_check1),
SUM(status_check2),
SUM(status_check3),
SUM(status_check4),
2
FROM T
GROUP BY payer
) t1
ORDER BY payer,grp
Results:
| payer | effective_status | status_check1 | status_check2 | status_check3 | status_check4 | grp |
|-------|------------------|---------------|---------------|---------------|---------------|-----|
| p1 | foo | 8 | 6000 | 4 | 1 | 1 |
| p1 | bar | 10 | 5200 | 9 | 2 | 1 |
| p1 | baz | 11 | 5200 | 11 | 2 | 1 |
| p1 | zip | 9 | 4500 | 14 | 4 | 1 |
| p1 | zap | 7 | 4200 | 45 | 5 | 1 |
| p1 | status_n | 2 | 3900 | 71 | 1 | 1 |
| p1 | | 47 | 29000 | 154 | 15 | 2 |
| p2 | foo | 5 | 3500 | 12 | 2 | 1 |
| p2 | zip | 1 | 5000 | 1 | 1 | 1 |
| p2 | | 6 | 8500 | 13 | 3 | 2 |
I'm not sure why you are using window functions. This would appear to be union all:
select payer, effective_status, status_check1, status_check2, status_check3, status_check4
from t
union all
select payer, null, sum(status_check1), sum(status_check2), sum(status_check3), sum(status_check4)
from t
group by payer
order by payer, effective_status nulls last;
Postgres 9.5 supports grouping sets which simplifies such logic.
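If the goal is the layout in the question, with the payer total repeated on every row, the SUM(...) OVER (PARTITION BY payer) form the asker tried can be checked against the sample data from the first answer. A minimal sketch, assuming SQLite 3.25 or later (for window functions) driven through Python's sqlite3 module instead of Postgres, and showing only two of the four total columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (payer TEXT, effective_status TEXT,
                    status_check1 INT, status_check2 INT,
                    status_check3 INT, status_check4 INT);
    INSERT INTO T VALUES
        ('p1', 'foo', 8, 6000, 4, 1), ('p1', 'bar', 10, 5200, 9, 2),
        ('p1', 'baz', 11, 5200, 11, 2), ('p1', 'zip', 9, 4500, 14, 4),
        ('p1', 'zap', 7, 4200, 45, 5), ('p1', 'status_n', 2, 3900, 71, 1),
        ('p2', 'foo', 5, 3500, 12, 2), ('p2', 'zip', 1, 5000, 1, 1);
""")

# Every row keeps its own values and also carries the per-payer totals.
with_totals = conn.execute("""
    SELECT payer, effective_status, status_check1,
           SUM(status_check1) OVER (PARTITION BY payer) AS status_check1_total,
           SUM(status_check2) OVER (PARTITION BY payer) AS status_check2_total
    FROM T
    ORDER BY payer, effective_status
""").fetchall()

# Collapse to one (total1, total2) pair per payer for a quick check.
totals = {row[0]: (row[3], row[4]) for row in with_totals}
print(totals)
```

On this data the window totals agree with the GROUP BY totals in the results table above, so the approach itself looks sound; if the real query shows wrong totals, the input rows (for example duplicates introduced by an earlier join) would be the first thing to inspect.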
I'm not entirely sure what you are trying to do, but if you want the result grouped by payer and effective_status, it could look like this:
select
payer as p,
effective_status as es,
(sum(col1) + sum(col2) + sum(col3) + sum(col4)) as sum
from table_name
group by p, es
Hope this helps.

Converting rows to columns and keeping data pairs

I have the following problem in MSSQL: I have a table, which contains 4 columns.
Example table:
JunctionId | type| color| value
1 | a | red | 5|
1 | b | green | 10|
2 | a | orange | 40|
2 | b | yellow | 35|
3 | a | blue | 6|
3 | b | cyan | 9|
Now, I'd like the following result:
1 | a | red | 5 | b | green | 10
2 | a | orange | 40 | b | yellow | 35
3 | a | blue | 6 | b | cyan | 9
I tried using PIVOT, but it returned multiple rows because of the different values. I would use a self join, but I have 12 different 'type' values. Any ideas would be very welcome!
(note: I can't use this stackoverflow table thingy... sorry)
Self join time
select a1.junctionid,
a1.type as a_type,
a1.color as a_color,
a1.value as a_value,
a2.type as b_type,
a2.color as b_color,
a2.value as b_value
from MyTable a1
inner join MyTable a2
on a1.junctionid = a2.junctionid
where a1.type = 'a'
and a2.type = 'b'
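With 12 types the self join needs 11 extra joins, so conditional aggregation may scale better: group by junctionid and pick each type's color and value with MAX(CASE ...). The sketch below generates the column list from a list of type names. It runs in SQLite through Python's sqlite3 module rather than MSSQL, and the `types` list and generated column names are illustrative assumptions; the MAX(CASE ...) pattern itself is plain SQL that MSSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (junctionid INT, type TEXT, color TEXT, value INT);
    INSERT INTO MyTable VALUES
        (1, 'a', 'red', 5), (1, 'b', 'green', 10),
        (2, 'a', 'orange', 40), (2, 'b', 'yellow', 35),
        (3, 'a', 'blue', 6), (3, 'b', 'cyan', 9);
""")

# Trusted, hard-coded type names (extend to all 12); never build SQL like
# this from user input.
types = ["a", "b"]

# One (color, value) column pair per type. MAX ignores the NULLs produced
# for rows of other types, leaving the single matching value.
columns = ",\n".join(
    f"MAX(CASE WHEN type = '{t}' THEN color END) AS {t}_color,\n"
    f"MAX(CASE WHEN type = '{t}' THEN value END) AS {t}_value"
    for t in types
)
query = f"SELECT junctionid,\n{columns}\nFROM MyTable\nGROUP BY junctionid\nORDER BY junctionid"

result = conn.execute(query).fetchall()
print(result)
```

This assumes each junctionid has at most one row per type; if there can be several, MAX would silently pick one of them.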