Sum cases that meet multiple criteria (pandas) - sql

I need to count the cases in col2 where, across every set in col1 (A, B, C, ...), the col2 value has a Y in col3 100% of the time. In this case, B1 and
D1 meet this criterion, so N=2. Answers in pandas or SQL are helpful (both would be ideal).
| col1 | col2 | col3 | col4 | col5 |
|------|-------|-------|-------|-------|
| A | A1 | N | 1 | 256 |
| A | B1 | Y | 2 | 3 |
| A | C1 | N | 3 | 323 |
| B | F1 | N | 1 | 89 |
| B | B1 | Y | 2 | 256 |
| C | D1 | Y | 1 | 3 |
| D | A1 | N | 1 | 32 |
| D | C1 | Y | 2 | 893 |

Something like this in python pandas
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count()).sum()
Out[568]: 2
More detail :
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count())
Out[569]:
col2
A1 False
B1 True
C1 False
D1 True
F1 False
Name: col3, dtype: bool
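
An equivalent way to phrase the same check, as a minimal sketch: compare col3 to 'Y' first, then ask whether all rows in each col2 group are True. The DataFrame below is rebuilt from the sample table in the question; the variable names all_yes and n are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "C", "D", "D"],
    "col2": ["A1", "B1", "C1", "F1", "B1", "D1", "A1", "C1"],
    "col3": ["N", "Y", "N", "N", "Y", "Y", "N", "Y"],
    "col4": [1, 2, 3, 1, 2, 1, 1, 2],
    "col5": [256, 3, 323, 89, 256, 3, 32, 893],
})

# True for a col2 value only if every one of its rows has col3 == 'Y'
all_yes = df["col3"].eq("Y").groupby(df["col2"]).all()

# Number of col2 values that pass the check
n = int(all_yes.sum())
print(all_yes)
print(n)
```

With the sample data, only B1 and D1 pass, so n is 2.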

I don't see what col1 has to do with this. You can do this with a SQL query:
select count(*)
from (select col2
from t
group by col2
having min(col3) = max(col3) and min(col3) = 'Y'
) t;
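
To try a grouped version of this check without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name t comes from the query above; the in-memory SQLite database, the derived-table alias x, and the GROUP BY ... HAVING phrasing are assumptions of this sketch (aggregates such as min() generally have to be tested in HAVING rather than WHERE).

```python
import sqlite3

# Sample rows copied from the question's table
rows = [
    ("A", "A1", "N", 1, 256),
    ("A", "B1", "Y", 2, 3),
    ("A", "C1", "N", 3, 323),
    ("B", "F1", "N", 1, 89),
    ("B", "B1", "Y", 2, 256),
    ("C", "D1", "Y", 1, 3),
    ("D", "A1", "N", 1, 32),
    ("D", "C1", "Y", 2, 893),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (col1 text, col2 text, col3 text, col4 integer, col5 integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

# Keep a col2 group only if its smallest and largest col3 are both 'Y',
# i.e. every row in the group is 'Y'; then count the surviving groups.
query = """
select count(*)
from (select col2
      from t
      group by col2
      having min(col3) = 'Y' and max(col3) = 'Y'
     ) x
"""
(n,) = conn.execute(query).fetchone()
print(n)
```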


Pivot a table in SQL with some columns retained from the original table [duplicate]

I have some data in the following format:
col1 | col2 | col3 | rank
--------------------------
A | 1 | D1 | 1
A | 1 | D2 | 2
A | 1 | D3 | 3
B | 5 | E! | 1
B | 5 | E# | 2
B | 5 | E# | 3
B | 5 | E$ | 4
C | 3 | F1 | 1
C | 3 | F2 | 2
I want to pivot it by col3, but want to retain the columns col1, col2 in the resulting table. Also, when creating the pivot columns, I want to ensure that only up to a fixed rank is picked. For example, if the rank threshold is 3, the output would look like :
col1 | col2 | P1 | P2 | P3
------------------------------
A | 1 | D1 | D2 | D3
B | 5 | E! | E# | E#
Explanation:
1. In the output, the two rows with ``col1==C`` are dropped since they don't meet the rank threshold 3.
2. The row with ``col3==E$`` is dropped since its rank is higher than the rank threshold 3.
Is there a way to achieve this with SQL Server?
Try the following, which uses conditional aggregation with CASE expressions:
with cte as
(
select
col1,
col2,
max(case when rank = 1 then col3 end) as P1,
max(case when rank = 2 then col3 end) as P2,
max(case when rank = 3 then col3 end) as P3
from myTable
group by
col1,
col2
)
select
*
from cte
where P1 is not null and P2 is not null and P3 is not null
Output:
| col1 | col2 | p1 | p2 | p3 |
| ---- | ---- | --- | --- | --- |
| A | 1 | D1 | D2 | D3 |
| B | 5 | E! | E# | E# |
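
The query above can be checked locally with a small sketch. It runs the same conditional aggregation in an in-memory SQLite database through Python's sqlite3 module, not in SQL Server, so treat it as an approximation; the added ORDER BY and the variable name pivoted are assumptions made for a stable, readable result.

```python
import sqlite3

# Sample rows copied from the question: (col1, col2, col3, rank)
rows = [
    ("A", 1, "D1", 1), ("A", 1, "D2", 2), ("A", 1, "D3", 3),
    ("B", 5, "E!", 1), ("B", 5, "E#", 2), ("B", 5, "E#", 3), ("B", 5, "E$", 4),
    ("C", 3, "F1", 1), ("C", 3, "F2", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table myTable (col1 text, col2 integer, col3 text, rank integer)")
conn.executemany("insert into myTable values (?, ?, ?, ?)", rows)

# One output column per rank up to the threshold; groups missing any of
# the three ranks end up with a NULL and are filtered out afterwards.
query = """
with cte as (
    select col1, col2,
           max(case when rank = 1 then col3 end) as P1,
           max(case when rank = 2 then col3 end) as P2,
           max(case when rank = 3 then col3 end) as P3
    from myTable
    group by col1, col2
)
select * from cte
where P1 is not null and P2 is not null and P3 is not null
order by col1
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The rank 4 row (E$) never matches a CASE branch, and group C is dropped because its P3 is NULL.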

Round down to the nearest multiple of N

I have a SQL table as follows:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 10   |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I would like to transform the above table into the one below, where the transformation is
col4 = col4 - (col4 % min(col2) group by col1)
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 9    |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I could read the table in application code and do the transformation manually, but I was wondering whether it is possible to offload the transformation to SQL.
Just run a simple select query for this:
select col1, col2, col3,
col4 - (col4 % min(col2) over (partition by col1))
from t;
There is no need to actually modify the table.
You can use a multi-table UPDATE to achieve your desired result, joining your table to a table of MIN(col2) values:
UPDATE table1
SET col4 = col4 - (col4 % t2.col2min)
FROM (SELECT col1, MIN(col2) AS col2min
FROM table1
GROUP BY col1) t2
WHERE table1.col1 = t2.col1
Output:
col1 col2 col3 col4
a 3 d1 9
a 6 d2 15
b 2 d2 8
b 30 d1 50
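
As a rough check of the arithmetic, the window-function form can be run against an in-memory SQLite database from Python. This is only a sketch: it assumes a SQLite build with window-function support (3.25 or later), and the as col4 alias, the ORDER BY, and the variable name rounded are additions for readability.

```python
import sqlite3

# Sample rows copied from the question: (col1, col2, col3, col4)
rows = [
    ("a", 3, "d1", 10),
    ("a", 6, "d2", 15),
    ("b", 2, "d2", 8),
    ("b", 30, "d1", 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 integer, col3 text, col4 integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

# min(col2) over (partition by col1) is the smallest col2 within each col1
# group; subtracting the remainder rounds col4 down to a multiple of it.
query = """
select col1, col2, col3,
       col4 - (col4 % min(col2) over (partition by col1)) as col4
from t
order by col1, col2
"""
rounded = conn.execute(query).fetchall()
for row in rounded:
    print(row)
```

For group a the minimum col2 is 3, so 10 becomes 10 - (10 % 3) = 9 while 15 is already a multiple of 3 and stays unchanged.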

How to return values from same table?

I've two tables A and B. I want to return all records from A and only matching from B. I can use left join for this. But after joining, I want to return records based on a flag in the same table.
Table A:
| Col1 | Col2 |
|------|------|
| 123 | 12 |
| 456 | 34 |
| 789 | 56 |
Table B:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|------|------|
| 123 | 12 | NULL | I | 1 |
| 456 | 34 | NULL | E | 1 |
| 111 | 98 | NULL | I | 1 |
| 222 | 99 | NULL | E | 1 |
| 123 | 12 | AB | NULL | 2 |
| 456 | 34 | CD | NULL | 2 |
| 123 | 12 | EF | NULL | 2 |
| 111 | 98 | GH | NULL | 2 |
| 222 | 99 | IJ | NULL | 2 |
After left joining A and B, this is how the result will look:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|------|------|
| 123 | 12 | NULL | I | 1 |
| 456 | 34 | NULL | E | 1 |
| 123 | 12 | AB | NULL | 2 |
| 456 | 34 | CD | NULL | 2 |
| 123 | 12 | EF | NULL | 2 |
| 789 | 56 | NULL | NULL | NULL |
The values 1 and 2 in Col5 indicate which column is populated: 1 for Col4 and 2 for Col3.
I want to return all the records for 'I' in Col4 (but excluding the record that contains the 'I' itself), which would look like this:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 123 | 12 | AB | (null) | 2 |
| 123 | 12 | EF | (null) | 2 |
I also want to return records for 'E' in Col4 (again excluding the record that contains the 'E' itself), but with all the Col3 values other than the one belonging to that record's key. In this case that value is CD, so the result would look like this:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 456 | 34 | AB | (null) | 2 |
| 456 | 34 | EF | (null) | 2 |
| 456 | 34 | GH | (null) | 2 |
| 456 | 34 | IJ | (null) | 2 |
Can someone suggest how to handle this in SQL?
Ok, I believe the following two queries achieve your desired results.
Existence Rule:
select A.*
, B.Col3
, B.Col4
, B.Col5
from TableA A
JOIN TableB B
on A.Col1 = B.Col1
and A.Col2 = B.Col2
and B.Col5 = 2
where exists (select 1 from TableB C
where C.col1 = B.col1 and C.col2 = B.col2
and c.col4 = 'I' AND C.col5 = 1)
Results:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 123 | 12 | AB | (null) | 2 |
| 123 | 12 | EF | (null) | 2 |
Exclusion Rule:
select A.*
, B.Col3
, B.Col4
, B.Col5
from TableA A
CROSS JOIN TableB B
where b.col5 = 2
and exists (select 1 from TableB C
where C.col1 = a.col1 and C.col2 = a.col2
and c.col4 = 'E' AND C.col5 = 1)
and b.col3 not in (select d.col3 from TableB d
where d.col1 = a.col1 and d.col2 = a.col2 and d.col5 = 2)
Results:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 456 | 34 | AB | (null) | 2 |
| 456 | 34 | EF | (null) | 2 |
| 456 | 34 | GH | (null) | 2 |
| 456 | 34 | IJ | (null) | 2 |
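
Both rules can be exercised against the sample data with a short sketch. It uses Python's sqlite3 module and an in-memory SQLite database, which is an assumption (the thread does not name an engine); the inner alias D, the ORDER BY clauses, and the variable names are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table TableA (Col1 integer, Col2 integer)")
conn.execute(
    "create table TableB (Col1 integer, Col2 integer, Col3 text, Col4 text, Col5 integer)"
)
# Sample rows copied from the question
conn.executemany("insert into TableA values (?, ?)", [(123, 12), (456, 34), (789, 56)])
conn.executemany("insert into TableB values (?, ?, ?, ?, ?)", [
    (123, 12, None, "I", 1), (456, 34, None, "E", 1),
    (111, 98, None, "I", 1), (222, 99, None, "E", 1),
    (123, 12, "AB", None, 2), (456, 34, "CD", None, 2),
    (123, 12, "EF", None, 2), (111, 98, "GH", None, 2),
    (222, 99, "IJ", None, 2),
])

# Existence rule: detail rows (Col5 = 2) whose key also has an 'I' flag row
existence = """
select A.Col1, A.Col2, B.Col3, B.Col4, B.Col5
from TableA A
join TableB B
  on A.Col1 = B.Col1 and A.Col2 = B.Col2 and B.Col5 = 2
where exists (select 1 from TableB C
              where C.Col1 = B.Col1 and C.Col2 = B.Col2
                and C.Col4 = 'I' and C.Col5 = 1)
order by B.Col3
"""

# Exclusion rule: for keys flagged 'E', every detail value except their own
exclusion = """
select A.Col1, A.Col2, B.Col3, B.Col4, B.Col5
from TableA A
cross join TableB B
where B.Col5 = 2
  and exists (select 1 from TableB C
              where C.Col1 = A.Col1 and C.Col2 = A.Col2
                and C.Col4 = 'E' and C.Col5 = 1)
  and B.Col3 not in (select D.Col3 from TableB D
                     where D.Col1 = A.Col1 and D.Col2 = A.Col2 and D.Col5 = 2)
order by B.Col3
"""

i_rows = conn.execute(existence).fetchall()
e_rows = conn.execute(exclusion).fetchall()
print(i_rows)
print(e_rows)
```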
Result for I:-
;with cte1 As(select a.col1,a.col2 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 = 'I'),
cte2 As(select b.col3,b.col4,b.col5 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 <> 'I')
select a.col1,a.col2,b.col3,b.col4,b.col5 from cte1 a cross join cte2 b
Result for E:-
;with cte1 As(select a.col1,a.col2 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 = 'E'),
cte2 As(select b.col3,b.col4,b.col5 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 <> 'E')
select a.col1,a.col2,b.col3,b.col4,b.col5 from cte1 a cross join cte2 b
select c.col1, c.col2
from
(select a.col1, a.col2, b.col3 from a inner join table b on a.id = b.id
where "condition" ) c
where c.col1 = "condition"
That is the script. The explanation:
Inside the parentheses I wrote the first select. There you do the select with your joins and your conditions. At the end of the sub-select I wrote "c", which is the alias of the derived table the sub-select generates.
Then you select some values from that derived table and filter them with a where clause that acts on the results of the sub-select.
EDIT: I used the names from your question to make it easier.

Transforming Columns to Rows in SQL

I have a scenario in one of the implementations to be done on Hive for reporting. I have a table structure that currently looks as below -
+------+------+----------+----------+-------+-------+--------+--------+
| Col1 | Col2 | M1_Today | M2_Today | M1_LW | M2_LW | M1_L2W | M2_L2W |
+------+------+----------+----------+-------+-------+--------+--------+
| A | A1 | 10 | 200 | 9 | 190 | 11 | 210 |
| A | A2 | 12 | 210 | 11 | 200 | 13 | 220 |
| B | B1 | 15 | 300 | 14 | 290 | 16 | 310 |
| B | B2 | 18 | 310 | 17 | 300 | 19 | 320 |
+------+------+----------+----------+-------+-------+--------+--------+
The columns in the table need to be transformed to appear as below -
+------+------+-------+----+-----+
| Col1 | Col2 | Col3 | M1 | M2 |
+------+------+-------+----+-----+
| A | A1 | Today | 10 | 200 |
| A | A1 | LW | 9 | 190 |
| A | A1 | L2W | 11 | 210 |
| A | A2 | Today | 12 | 210 |
| A | A2 | LW | 11 | 200 |
| A | A2 | L2W | 13 | 220 |
| B | B1 | Today | 15 | 300 |
| B | B1 | LW | 14 | 290 |
| B | B1 | L2W | 16 | 310 |
| B | B2 | Today | 18 | 310 |
| B | B2 | LW | 17 | 300 |
| B | B2 | L2W | 19 | 320 |
+------+------+-------+----+-----+
How can this be achieved via SQL? I am using Hive as my datastore. Any help is much appreciated.
You could use this:
SELECT Col1, Col2, 'Today' AS Col3 , M1_Today AS M1, M2_Today AS M2
FROM table_name
UNION ALL
SELECT Col1, Col2, 'LW' AS Col3 , M1_LW AS M1, M2_LW AS M2
FROM table_name
UNION ALL
SELECT Col1, Col2, 'L2W' AS Col3 , M1_L2W AS M1, M2_L2W AS M2
FROM table_name
ORDER BY Col1, Col2, Col3 DESC;
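
To see what this produces on the sample data, the same UNION ALL can be run in an in-memory SQLite database from Python. This is a sketch under that assumption: it is not Hive, so engine-specific behavior (for example how ORDER BY applies to a union) may differ, and the variable name unpivoted is illustrative.

```python
import sqlite3

# Sample rows copied from the question's wide table
rows = [
    ("A", "A1", 10, 200, 9, 190, 11, 210),
    ("A", "A2", 12, 210, 11, 200, 13, 220),
    ("B", "B1", 15, 300, 14, 290, 16, 310),
    ("B", "B2", 18, 310, 17, 300, 19, 320),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table_name (Col1 text, Col2 text, "
    "M1_Today integer, M2_Today integer, M1_LW integer, M2_LW integer, "
    "M1_L2W integer, M2_L2W integer)"
)
conn.executemany("insert into table_name values (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# One SELECT per period; each labels its rows and renames the metric columns
query = """
SELECT Col1, Col2, 'Today' AS Col3, M1_Today AS M1, M2_Today AS M2 FROM table_name
UNION ALL
SELECT Col1, Col2, 'LW' AS Col3, M1_LW AS M1, M2_LW AS M2 FROM table_name
UNION ALL
SELECT Col1, Col2, 'L2W' AS Col3, M1_L2W AS M1, M2_L2W AS M2 FROM table_name
ORDER BY Col1, Col2, Col3 DESC
"""
unpivoted = conn.execute(query).fetchall()
for row in unpivoted:
    print(row)
```

Sorting Col3 in descending order happens to give Today, LW, L2W because of how those three strings compare; a different set of labels would need an explicit sort key.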

Dynamically Selecting value from column table based on column name from other table (reflection)

Assume I have 2 tables FOO and BAR as below, is it possible to use a sort of reflection on FOO, if you know the column name as a string based on a join of the BAR table?
SELECT DB, FOO.Name, FOO.Type, BAR.Field, I.DATA_TYPE AS FType, FOO.<BAR.FIELD> AS Value
FROM INFORMATION_SCHEMA.COLUMNS AS I
inner JOIN BAR ON I.COLUMN_NAME = BAR.Field
inner JOIN FOO ON FOO.TYPE = BAR.TYPE
WHERE DB = 4 AND FLAG = 1
That is, for each selected row, FOO.<BAR.FIELD> needs to resolve to the value of the matching column in FOO. For example, if one row has BAR {4, AC1, LO} and FOO {4, AC1, LO, COL1}, I want the value 1 to be picked.
I know that I can probably do this in 2 rounds and merge the data; however, I wondered if anybody knows a way to do this more efficiently in one go, saving a code path.
I should add, I generally have around 60 columns in either table, and they are pretty random, i.e. I cannot assume that either col1, 2 or 3 exist, I can only go by what is in the equivalent BAR table.
FOO:
+--------+--------+---------+---------+--------+-------+
| DB | Name | Type | Col1 | Col2 | Col3 |
+--------+--------+---------+---------+--------+-------+
| 4 | AC1 | LO | 1 | 10 | 2 |
| 4 | AC1 | HI | 2 | 20 | 4 |
| 1 | DC2 | HI-HI | 11 | 5 | 2 |
| 1 | DC2 | HI | 22 | 10 | 4 |
| 1 | DC2 | LO | 33 | 15 | 6 |
+--------+--------+---------+---------+--------+-------+
BAR:
+--------+--------+---------+---------+--------+
| DB | Name | Type | Field | Flag |
+--------+--------+---------+---------+--------+
| 4 | AC1 | LO | Col1 | 1 |
| 4 | AC1 | HI | Col1 | 1 |
| 1 | DC2 | HI-HI | Col1 | 1 |
| 1 | DC2 | HI | Col1 | 1 |
| 1 | DC2 | LO | Col1 | 1 |
| 4 | AC1 | LO | Col2 | 0 |
| 4 | AC1 | HI | Col2 | 0 |
| 1 | DC2 | LO | Col2 | 0 |
| 1 | DC2 | HI-HI | Col2 | 0 |
| 1 | DC2 | HI | Col2 | 0 |
| 4 | AC1 | LO | Col3 | 0 |
| 4 | AC1 | HI | Col3 | 0 |
| 1 | DC2 | LO | Col3 | 0 |
| 1 | DC2 | HI-HI | Col3 | 0 |
| 1 | DC2 | HI | Col3 | 0 |
+--------+--------+---------+---------+--------+
RESULT:
+--------+--------+---------+---------+--------+--------+
| DB | Name | Type | Field | FTYPE | VALUE |
+--------+--------+---------+---------+--------+--------+
| 4 | AC1 | LO | Col1 | float | 1 |
| 4 | AC1 | HI | Col1 | float | 2 |
| 4 | AC1 | LO | Col2 | float | 10 |
| 4 | AC1 | HI | Col2 | float | 20 |
| 4 | AC1 | LO | Col3 | float | 2 |
| 4 | AC1 | HI | Col3 | float | 4 |
+--------+--------+---------+---------+--------+--------+