In SQL, how do I match specific columns on specific rows? - sql

This is hard to describe in a title, so here is some sample data:
id  pub_type   general_suppl  book_suppl  catalogue_suppl  magazine_suppl
1   book       10             10          0                0
2   book       11             11          0                0
3   catalogue  10             0           10               0
4   magazine   9              0           0                9
5   other      10             0           0                0
6   magazine   8              0           0                10
Each item has a specific publication type, a general supplier, and a supplier for its type of publication. Items of type other only have a general_suppl. If I want to get all items for supplier value 10, the following conditions have to be met:
if pub_type == 'book'
    match on book_suppl == 10
elif pub_type == 'catalogue'
    match on catalogue_suppl == 10
elif pub_type == 'magazine'
    match on magazine_suppl == 10
else
    match on general_suppl == 10
As you can see above, if pub_type falls in book,catalogue,magazine, I ignore the column general_suppl.
The expected output on supplier value 10 will be:
id  pub_type   general_suppl  book_suppl  catalogue_suppl  magazine_suppl
1   book       10             10          0                0
3   catalogue  10             0           10               0
5   other      10             0           0                0
6   magazine   8              0           0                10
I can achieve the above by retrieving all the rows and perform filtering at the code level. Is there a single SQL way to get the above results? The database design and data are beyond my control, so I can't re-design the DB and will have to work with the above table structure.

It's ugly, but you can throw that logic into a CASE structure.
SELECT *
FROM table
WHERE 10 = CASE WHEN pub_type = 'book' THEN book_suppl
                WHEN pub_type = 'catalogue' THEN catalogue_suppl
                WHEN pub_type = 'magazine' THEN magazine_suppl
                ELSE general_suppl END

AND and OR to the rescue!
select *
from table
where (pub_type='book' and book_suppl=10)
or (pub_type='catalogue' and catalogue_suppl=10)
or (pub_type='magazine' and magazine_suppl=10)
or (pub_type not in ('book','catalogue','magazine') and general_suppl=10)
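
To check the two answers above against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name items is an assumption (the answers use the placeholder name table, which is a reserved word), and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# In-memory database; "items" is a hypothetical table name for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, pub_type TEXT, general_suppl INTEGER,"
    " book_suppl INTEGER, catalogue_suppl INTEGER, magazine_suppl INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "book", 10, 10, 0, 0),
        (2, "book", 11, 11, 0, 0),
        (3, "catalogue", 10, 0, 10, 0),
        (4, "magazine", 9, 0, 0, 9),
        (5, "other", 10, 0, 0, 0),
        (6, "magazine", 8, 0, 0, 10),
    ],
)

# Variant 1: pick the column to compare with a CASE expression.
case_query = """
SELECT id FROM items
WHERE :supplier = CASE WHEN pub_type = 'book' THEN book_suppl
                       WHEN pub_type = 'catalogue' THEN catalogue_suppl
                       WHEN pub_type = 'magazine' THEN magazine_suppl
                       ELSE general_suppl END
ORDER BY id
"""

# Variant 2: spell the same rules out with AND / OR.
bool_query = """
SELECT id FROM items
WHERE (pub_type = 'book' AND book_suppl = :supplier)
   OR (pub_type = 'catalogue' AND catalogue_suppl = :supplier)
   OR (pub_type = 'magazine' AND magazine_suppl = :supplier)
   OR (pub_type NOT IN ('book', 'catalogue', 'magazine')
       AND general_suppl = :supplier)
ORDER BY id
"""

ids_case = [row[0] for row in conn.execute(case_query, {"supplier": 10})]
ids_bool = [row[0] for row in conn.execute(bool_query, {"supplier": 10})]
print(ids_case, ids_bool)
```

Both variants select the same rows here. They differ if pub_type can be NULL: the CASE version falls through to ELSE and compares general_suppl, while NOT IN evaluates to unknown for a NULL pub_type, so the AND/OR version skips that row.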

Related

How to count unique combinations of Co-ordinates to find most customers in grid section

I have a customer table with their closest delivery hub on a grid based system and need to calculate what is the most populated area using a query.
This is the current query I have that lists all of the Co-ordinates per Customer.
SELECT Customers.HubID,
       TO_CHAR(Hubs.HubCoordX, 'FM999999999999') AS "X Co-ordinate",
       TO_CHAR(Hubs.HubCoordY, 'FM999999999999') AS "Y Co-ordinate"
FROM Customers
INNER JOIN Hubs ON Customers.HubID = Hubs.DestinationID
ORDER BY Hubs.HubCoordX, Hubs.HubCoordY
This query creates the following result.
HubID  X Co-ord  Y Co-ord
9      -3        1
11     -2        18
2      0         0
3      0         0
3      0         0
1      0         0
1      0         0
3      0         0
4      3         1
5      3         1
7      7         3
But I need a result like this
X Co-ordinate  Y Co-ordinate  Population
-3             1              1
-2             18             1
0              0              6
3              1              2
7              3              1
Thanks in advance
I attempted to use Count Unique; however, it only counted each individual co-ordinate once.
SELECT TO_CHAR(Hubs.HubCoordX, 'FM999999999999') AS "X Co-ordinate",
       TO_CHAR(Hubs.HubCoordY, 'FM999999999999') AS "Y Co-ordinate",
       COUNT(HubID) AS "Population"
FROM Customers
INNER JOIN Hubs ON Customers.HubID = Hubs.DestinationID
Group BY Hubs.HubCoordX, Hubs.HubCoordY
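
As a rough check of the GROUP BY approach, the sketch below rebuilds the sample rows in an in-memory SQLite database via Python's sqlite3 module. The CustomerID column and the exact table layout are assumptions made for illustration, and the Oracle-specific TO_CHAR formatting is left out because SQLite does not have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hubs (DestinationID INTEGER PRIMARY KEY,"
    " HubCoordX INTEGER, HubCoordY INTEGER)"
)
# CustomerID is a hypothetical key column; only HubID matters for the count.
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, HubID INTEGER)")

# Hub coordinates taken from the sample result in the question.
conn.executemany(
    "INSERT INTO Hubs VALUES (?, ?, ?)",
    [(9, -3, 1), (11, -2, 18), (2, 0, 0), (3, 0, 0),
     (1, 0, 0), (4, 3, 1), (5, 3, 1), (7, 7, 3)],
)
# One customer per row of the sample result.
conn.executemany(
    "INSERT INTO Customers (HubID) VALUES (?)",
    [(h,) for h in (9, 11, 2, 3, 3, 1, 1, 3, 4, 5, 7)],
)

# Group by the coordinate pair so each grid cell is counted once per customer.
population = conn.execute(
    """
    SELECT Hubs.HubCoordX, Hubs.HubCoordY, COUNT(*) AS Population
    FROM Customers
    INNER JOIN Hubs ON Customers.HubID = Hubs.DestinationID
    GROUP BY Hubs.HubCoordX, Hubs.HubCoordY
    ORDER BY Hubs.HubCoordX, Hubs.HubCoordY
    """
).fetchall()

# The most populated grid cell is the row with the largest count.
busiest = max(population, key=lambda row: row[2])
print(population)
print(busiest)
```

Grouping on both coordinates is what makes each (X, Y) pair a single output row; to get only the most populated cell from the database itself, ordering by the count in descending order and keeping the first row would be the usual route.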

Rearranging SQL Query Results

I am working on an application that queries a SQL database to get bill of materials information. I have successfully retrieved the data I wanted using the following query.
WITH bom (bomItem, partId, btmlvl)
AS
(
SELECT [bomItem], [partId], [btmlvl]
FROM [TESTDB].[dbo].[BOMTABLE]
WHERE [TESTDB].[dbo].[BOMTABLE].[bomItem] = 'PART# GOES HERE, PASSED IN BY THE APP'
UNION ALL
SELECT subQuery.bomItem, subQuery.partId, subQuery.btmlvl
FROM [TESTDB].[dbo].[BOMTABLE] AS subQuery
INNER JOIN bom AS mainQuery
on subQuery.bomItem = mainQuery.partId
)
SELECT [bomItem], [partId], [btmlvl]
FROM bom
The query results in (Sample Data):
bomItem partId btmlvl
---------------------------------
1 2 1
1 3 1
1 4 0
1 5 1
1 6 1
1 7 1
1 8 0
1 9 1
1 10 1
1 11 1
8 12 1
8 10 1
8 11 1
8 13 1
8 14 1
8 15 1
8 16 1
4 17 1
4 18 1
4 19 1
The data works as follows:
bomItem - The part number of the assembly that I am looking up the bill of materials for
partId - All of the parts and sub-assemblies tied to the bomItem I'm looking up
btmlvl - 0 indicates it's a sub-assembly, 1 indicates a standalone part (not really important, I was just using this to make sure I was getting the results that I wanted)
And what I want it to look like is (Expected Results):
bomItem partId btmlvl
---------------------------------
1 2 1
1 3 1
1 4 0
4 17 1
4 18 1
4 19 1
1 5 1
1 6 1
1 7 1
1 8 0
8 12 1
8 10 1
8 11 1
8 13 1
8 14 1
8 15 1
8 16 1
1 9 1
1 10 1
1 11 1
I could export it to a CSV or something else and write a script to rearrange the data, but I would prefer if it could be done as part of the SQL query. I messed around some with CASE statements but didn't quite achieve the desired results.
Bonus points if it could be aligned as follows (again the btmlvl isn't important, and will ultimately be removed). Again, novice SQL user here, but I think this would be possible using a PIVOT? (I just haven't gotten that far):
bomItem partId partId2
---------------------------------
...
1 8
8 12
8 10
8 11
8 13
8 14
8 15
8 16
...
Any help would be greatly appreciated, and please excuse the lengthiness of this post!
EDIT:
The only difference between my actual data and what I supplied is I simplified the part numbers from 11 characters strings to one-digit and two-digit integers and also removed other information I thought was irrelevant (e.g. qty on hand, purchasing cost, etc.). The first set of data is the sample data resulting from passing in "1" to WHERE [TESTDB].[dbo].[BOMTABLE].[bomItem] = 'PART# GOES HERE, PASSED IN BY THE APP'. This means that the first 10 lines of that data (again, anything with the bomItem == "1"), are all of the parts that make up bomItem == "1".
Looking at the "btmlvl" column for all bomItem == "1", you can see partId's 4 & 8 are assemblies (read sub-assemblies of bomItem == "1"), because btmlvl == "0". This means I need to perform additional queries to obtain the bill of materials for those part numbers. This is where the subQuery within the CTE comes in, giving me the line items starting with bomItem == "4" and bomItem == "8".
The difference between the output as it stands right now, and the output that is desired is the current output lists all parts for bomItem == "1", then all parts for bomItem == "8", and lastly all parts for bomItem == "4".
But what I want is where bomItem == "1" & partId's 1 through 3 to appear as they are right now, but once partId == "4", directly underneath that line I want all of the lines where bomItem == "4", then continue with the lines that contain bomItem == "1" & partId's 5, 6, 7, but again once partId 8 is reached, do a similar thing as with partId 4, then finish partId's 9, 10, 11 as normal.
This is inherently hierarchical data, so you need a self-join.
I think this gets you what you want.
SELECT a.* FROM bom a
LEFT JOIN bom b
ON a.bomitem = b.partID
ORDER BY ISNULL (b.partID, a.partID), a.partID
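
The self-join above orders one level of sub-assemblies. As an alternative sketch (not the method from the answer), the recursive CTE can carry a sort path so that rows come out parent first, then children, at any depth. The example runs on SQLite through Python's sqlite3 module; the table name bomtable, the sort_path column, and the zero-padding width are assumptions, and SQL Server would need its own string functions in place of printf and ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bomtable (bomItem INTEGER, partId INTEGER, btmlvl INTEGER)")
conn.executemany(
    "INSERT INTO bomtable VALUES (?, ?, ?)",
    [(1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 1), (1, 6, 1),
     (1, 7, 1), (1, 8, 0), (1, 9, 1), (1, 10, 1), (1, 11, 1),
     (8, 12, 1), (8, 10, 1), (8, 11, 1), (8, 13, 1), (8, 14, 1),
     (8, 15, 1), (8, 16, 1), (4, 17, 1), (4, 18, 1), (4, 19, 1)],
)

# sort_path is a hypothetical helper column: the zero-padded partIds from the
# top-level assembly down to the current row, joined with '/'.
query = """
WITH RECURSIVE bom (bomItem, partId, btmlvl, sort_path) AS (
    SELECT bomItem, partId, btmlvl, printf('%08d', partId)
    FROM bomtable
    WHERE bomItem = ?
    UNION ALL
    SELECT child.bomItem, child.partId, child.btmlvl,
           parent.sort_path || '/' || printf('%08d', child.partId)
    FROM bomtable AS child
    INNER JOIN bom AS parent ON child.bomItem = parent.partId
)
SELECT bomItem, partId, btmlvl
FROM bom
ORDER BY sort_path
"""

ordered = conn.execute(query, (1,)).fetchall()
for row in ordered:
    print(row)
```

Sorting on the path places each sub-assembly's parts directly under the sub-assembly row. Within one parent the parts are ordered by partId, so the children of 8 come out as 10, 11, 12, ... rather than in the 12, 10, 11 order of the sample. With fixed-width string part numbers the padding step would not be needed.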

Check if condition is true and if so add value to another column in sql

I have a postgres table that looks like this:
A B
5 4
10 10
13 15
100 25
20 Null
Using SQL, I would like to check whether the value in column A is larger than the value in column B and if so, then add a 1 to the column True. If the value in column A is smaller or equal to the value in column B or if column B contains a [NULL] value, I would like to add a 1 to the column False, like so:
A B True False
5 4 1 0
10 10 0 1
13 15 0 1
100 25 1 0
20 [NULL] 0 1
What is the best way to achieve this?
You can use case logic:
select t.*,
(case when A > B then 1 else 0 end) as true_col,
(case when A > B then 0 else 1 end) as false_col
from t;
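
The NULL row is the reason both CASE expressions test A > B instead of the second one testing A <= B. A minimal sketch with Python's sqlite3 module, using the rows from the expected output; the naive_false column is a hypothetical addition that shows what goes wrong with the A <= B form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(5, 4), (10, 10), (13, 15), (100, 25), (20, None)],
)

result = conn.execute(
    """
    SELECT A, B,
           CASE WHEN A > B THEN 1 ELSE 0 END AS true_col,
           CASE WHEN A > B THEN 0 ELSE 1 END AS false_col,
           -- hypothetical comparison column: misses the NULL row
           CASE WHEN A <= B THEN 1 ELSE 0 END AS naive_false
    FROM t
    ORDER BY rowid
    """
).fetchall()

for row in result:
    print(row)
```

Any comparison with NULL is unknown, so it never satisfies WHEN and always lands in ELSE. Writing the false column as the ELSE branch of A > B therefore counts the NULL row as false, which is what the question asks for.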

rolling sum of a column in pandas dataframe at variable intervals

I have a list of index numbers that represent index locations for a DF. list_index = [2,7,12]
I want to sum from a single column in the DF by rolling through each number in list_index and totaling the counts between the index points (and restart count at 0 at each index point). Here is a mini example.
The desired output is in OUTPUT column, which increments every time there is another 1 from COL 1 and RESTARTS the count at 0 on the location after the number in the list_index.
I was able to get it to work with a loop but there are millions of rows in the DF and it takes a while for the loop to run. It seems like I need a lambda function with a sum but I need to input start and end point in index.
Something like lambda x:x.rolling(start_index, end_index).sum()? Can anyone help me out on this.
You can take a cumulative sum of the 1 values and subtract its value at the most recent 0, so the count restarts after every 0; a rolling sum with different intervals is not possible.
a = df['col'].eq(1).cumsum()
df['output'] = a - a.mask(df['col'].eq(1)).ffill().fillna(0).astype(int)
Out:
col output
0 0 0
1 1 1
2 1 2
3 0 0
4 1 1
5 1 2
6 1 3
7 0 0
8 0 0
9 0 0
10 0 0
11 1 1
12 1 2
13 0 0
14 0 0
15 1 1
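
The code above restarts the count at every 0 and does not use list_index. If the restart points should instead come from list_index as described in the question, one possible sketch is to label each row with its segment and take a cumulative sum per segment. It assumes a default 0..n-1 integer index, a sorted list_index, and the column name col from the answer; the segment variable is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]})
list_index = [2, 7, 12]

# Segment label for each row: the number of restart points strictly below
# the row's index, so rows 0-2 get 0, rows 3-7 get 1, rows 8-12 get 2, etc.
segment = np.searchsorted(list_index, df.index.to_numpy(), side="left")

# Cumulative sum inside each segment; the count restarts on the row after
# each entry of list_index.
df["output"] = df.groupby(segment)["col"].cumsum()
print(df)
```

This stays vectorised, so it avoids the Python-level loop mentioned in the question. Unlike the answer's version, a 0 inside a segment keeps the running total instead of resetting it.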

SQL Server 2012 if one of the columns contain 1 function

I am trying to figure out how to do the following. I have a table like this:
ID FKeyID Complete
1 6 1
2 6 0
3 6 0
4 7 0
5 8 0
6 8 0
I want to create a function that takes an FKeyID value and returns 1 (true) if any row with that FKeyID has a 1 in the Complete column, and 0 if none does.
For the table above, FKeyID 6 should return 1 because one of its rows has Complete = 1, and FKeyID 8 should return 0 because none of its rows has Complete = 1.
CREATE function [dbo].f_x
(
@FKeyID int
)
RETURNS bit
as
begin
return case when exists
(select 1 from test where Complete = 1 and FKeyID = @FKeyID)
then 1 else 0 end
end
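
The EXISTS check at the core of the function can be tried outside SQL Server. The sketch below runs the same test through Python's sqlite3 module; the helper name any_complete is hypothetical, and a bound parameter takes the place of the T-SQL function argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER PRIMARY KEY, FKeyID INTEGER, Complete INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 6, 1), (2, 6, 0), (3, 6, 0), (4, 7, 0), (5, 8, 0), (6, 8, 0)],
)


def any_complete(connection, fkey_id):
    """Return 1 if any row for fkey_id has Complete = 1, otherwise 0."""
    row = connection.execute(
        "SELECT CASE WHEN EXISTS ("
        "  SELECT 1 FROM test WHERE Complete = 1 AND FKeyID = ?"
        ") THEN 1 ELSE 0 END",
        (fkey_id,),
    ).fetchone()
    return row[0]


for key in (6, 7, 8):
    print(key, any_complete(conn, key))
```

EXISTS can stop at the first matching row, and an FKeyID that does not appear in the table at all simply yields 0.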