I need to compare a value in one column against all the other values in the same column: the pattern obtained from each value has to be searched for among all the others.
For example, in the table below, I need to search for the xxx, yyy and zzz patterns taken from the Order ID values in all the other Order IDs, and create a new column that replaces the actual value with the pattern whenever it is found.
Suppliers | Order ID
A1        | xxx
A1        | 00xxx
A1        | xxx0
A1        | 200xx
A2        | yyy
A2        | 01yyy0
A2        | 45yyy
A3        | 45zzz
Obtaining:
Suppliers | Order ID | Order ID OK
A1        | xxx      | xxx
A1        | 00xxx    | xxx
A1        | xxx0     | xxx
A1        | 200xx    | 200xx
A2        | yyy      | yyy
A2        | 01yyy0   | yyy
A2        | 45yyy    | yyy
A3        | 45zzz    | zzz
I have a problem with the runtime here: as the table grows, the search space grows as well, and the runtime quickly becomes unmanageable. So I need to run the search logic *per supplier* (this is where the Suppliers column comes into play), that is, search all the values for supplier A1 and replace the pattern, then do the same for supplier A2, and so on.
In pandas this can be achieved with split-apply-combine techniques such as df.groupby(...).transform(...), but I am totally lost on how to do it in PostgreSQL or Vertica SQL.
Here is what I have tried so far, using a self join on the Order ID column to create the search space for each value against all the others, but the runtime is not feasible:
WITH cte AS (
    SELECT
        A."Suppliers" AS "Suppliers",
        A."Order ID" AS "Order ID",
        B."Order ID" AS "Order ID OK",
        ROW_NUMBER() OVER (PARTITION BY A."Order ID") AS "DUPLICATE_ORDER_ID"
    FROM table AS A
    LEFT JOIN table AS B
        ON A."Order ID" LIKE '%' || B."Order ID" || '%'
)
SELECT "Suppliers", "Order ID", "Order ID OK"
FROM cte
WHERE "DUPLICATE_ORDER_ID" = 1
Is there any way I could modify this same code so that it applies the search logic per supplier only and then concatenates the results? (The Order ID values are unique regardless of the supplier.)
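What I imagine is something along the lines of the sketch below (my_table is just a placeholder name; I am not sure whether this is correct or efficient):
WITH cte AS (
    SELECT
        A."Suppliers" AS "Suppliers",
        A."Order ID"  AS "Order ID",
        B."Order ID"  AS "Order ID OK",
        ROW_NUMBER() OVER (PARTITION BY A."Order ID") AS "DUPLICATE_ORDER_ID"
    FROM my_table AS A
    LEFT JOIN my_table AS B
        ON  A."Suppliers" = B."Suppliers"                    -- restrict the search space to one supplier
        AND A."Order ID" LIKE '%' || B."Order ID" || '%'
)
SELECT "Suppliers", "Order ID", "Order ID OK"
FROM cte
WHERE "DUPLICATE_ORDER_ID" = 1;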
Thank you!
Related
I have table like this :
ID | key | value
1 | A1 |o1
1 | A2 |o2
1 | A3 |o3
2 | A1 |o4
2 | A2 |o5
3 | A1 |o6
3 | A3 |o7
4 | A3 |o8
I want to write an Oracle query that can filter the value column based on the key column,
something like this:
select ID
where
if key = A1 then value ='o1'
and key = A3 then value ='o4'
please help me to write this query.
To clarify my question: I need the list of IDs for which all of the (key, value) conditions are true. For each ID I should check the key/value pairs (combined with AND), and if all conditions are true then that ID is acceptable.
thanks
IF means PL/SQL. In SQL, we use a CASE expression instead (or DECODE, if you want). Doing so, you'd move value out of the expression and use something like this:
where id = 1
and value = case when key = 'A1' then 'o1'
when key = 'A3' then 'o4'
end
You are mixing filtering and selection. List the columns that you want to display in the SELECT list and put the columns used to filter in the WHERE clause:
SELECT key, value
FROM my_table
WHERE ID = 1 AND key IN ('A1', 'A2')
If there is no value column in your table, you can use the DECODE function:
SELECT key, DECODE(key, 'A1', 'o1', 'A2', 'o4', key) AS value
FROM my_table
WHERE ID = 1
After the key, you must specify pairs of search and result values. The pairs can be followed by a default value. In this example, since we did not specify a result for 'A3', the result will be the key itself. If no default value was specified, NULL would be returned for missing search values.
Update
It seems that I have misunderstood the question (see #mathguy's comment). You can filter the way you want by simply using the Boolean operators AND and OR:
SELECT *
FROM my_table
WHERE
  ID = 1 AND
  (
    key = 'A1' AND value = 'o1' OR
    key = 'A3' AND value = 'o4'
  )
By using this pattern it is easy to add more constraints of this kind. Note that AND has precedence over OR (like * over +).
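If the requirement from the clarification is the list of IDs for which every listed (key, value) pair is present, one common way is to combine the same OR filter with GROUP BY and HAVING. A sketch, assuming (ID, key) is unique as in the sample data:
SELECT ID
FROM my_table
WHERE (key = 'A1' AND value = 'o1')
   OR (key = 'A3' AND value = 'o4')
GROUP BY ID
HAVING COUNT(*) = 2   -- both pairs must match for the same ID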
Let's say I have a database A with tables B1 and B2.
B1 has columns C1 and C2
and B2 has columns D1, D2 and D3.
I am looking for an Impala query that yields the following desired output:
B1 | "C1+C2"
B2 | "D1+D2+D3"
where "D1+D2+D3" and "C1+C2" are concatenated strings.
Do you want the concatenated columns in a new table, or do you want to add the concatenated columns to your existing tables? Either way, you can use the code below in Impala to concatenate the columns:
SELECT
CONCAT(C1,C2) AS concat_fields
, "B1" AS table_name
FROM B1
UNION
SELECT
CONCAT(D1,D2,D3) AS concat_fields
, "B2" AS table_name
FROM B2
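Two small caveats about the query above: CONCAT returns NULL if any input column is NULL, so wrapping the columns in COALESCE keeps the row non-NULL, and UNION ALL avoids the duplicate-elimination step that plain UNION performs. A sketch along those lines, assuming the columns are strings:
SELECT
  CONCAT(COALESCE(C1, ''), COALESCE(C2, '')) AS concat_fields
  , "B1" AS table_name
FROM B1
UNION ALL
SELECT
  CONCAT(COALESCE(D1, ''), COALESCE(D2, ''), COALESCE(D3, '')) AS concat_fields
  , "B2" AS table_name
FROM B2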
How do I count rows where a column value starts with another column value?
For example, I have table products shown below
---------------------------
id code abbreviation
---------------------------
1 AA01 AA
2 AB02 AB
3 AA03 AA
4 AA04 AB
---------------------------
I want to get the count of products whose code starts with abbreviation, with a query like this:
select count(*) from products where code ilike abbreviation+'%'
I am using PostgreSQL 9.5.3.
The string concatenation operator in PostgreSQL is ||:
select count(*) from products where code like abbreviation || '%';
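One caveat: if abbreviation can itself contain the LIKE wildcards % or _, the pattern match may return false positives. A prefix comparison avoids that; a sketch against the same products table:
select count(*)
from products
where left(code, length(abbreviation)) = abbreviation;
-- wrap both sides in lower() if you need the case-insensitive behaviour of ILIKE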
You can try:
select count(*) from products where code like '%' || abbreviation || '%'
But I am not sure why you need this type of query.
Given the following:
CREATE TABLE A (A1 INTEGER, A2 INTEGER, A3 INTEGER);
INSERT INTO A(A1, A2, A3) VALUES (1, 1, 1);
INSERT INTO A(A1, A2, A3) VALUES (2, 1, 1);
I want to select the maximum A1 given specific A2 and A3 values, and have those values (A2 and A3) also appear in the returned row (e.g. so that I may use them in a join since the SELECT below is meant for a sub-query).
It would seem logical to be able to do the following, given that A2 and A3 are hardcoded in the WHERE clause:
SELECT MAX(A1) AS A1, A2, A3 FROM A WHERE A2=1 AND A3=1
However, PostgreSQL (and I suspect other RDBMSs as well) balks at that and requests an aggregate function for A2 and A3 even though their values are fixed. So instead, I either have to do a:
SELECT MAX(A1) AS A1, MAX(A2), MAX(A3) FROM A WHERE A2=1 AND A3=1
or a:
SELECT MAX(A1) AS A1, 1, 1 FROM A WHERE A2=1 AND A3=1
I don't like the first alternative because I could have used MIN instead and it would still work, whereas the second alternative doubles the number of positional parameters to supply values for when used from a programming-language interface. Ideally I would want a UNIQUE aggregate function which would assert that all values are equal and return that single value, or even a RANDOM aggregate function which would return one value at random (since I know from the WHERE clause that they are all equal).
Is there an idiomatic way to write the above in PostgreSQL?
Even simpler, you only need ORDER BY / LIMIT 1:
SELECT a1, a2, a3 -- add more columns as you please
FROM a
WHERE a2 = 1 AND a3 = 1
ORDER BY 1 DESC -- 1 is just a positional reference (syntax shorthand)
LIMIT 1;
LIMIT 1 is Postgres specific syntax.
The SQL standard would be:
...
FETCH FIRST 1 ROWS ONLY
My first answer with DISTINCT ON was for the more complex case where you'd want to retrieve the maximum a1 per various combinations of (a2, a3).
Aside: I am using lower case identifiers for a reason.
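For reference, the DISTINCT ON variant mentioned above, which returns the row with the largest a1 for every (a2, a3) combination rather than for one hard-coded pair, might look roughly like this:
SELECT DISTINCT ON (a2, a3) a1, a2, a3
FROM a
ORDER BY a2, a3, a1 DESC;  -- the first row per (a2, a3) is the one with the largest a1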
How about GROUP BY?
select
a2
,a3
,MAX(a1) as maximumVal
from a
group by a2, a3
Does this work for you?
select max(A1),A2,A3 from A GROUP BY A2,A3;
EDIT
select A1, A2, A3 from A where A2 = 1 and A3 = 1 and A1 = (select max(A1) from A where A2 = 1 and A3 = 1) limit 1
A standard trick to obtain the maximal row without an aggregate function is to guarantee the absence of a larger value by means of a NOT EXISTS subquery. (This does not work when there are ties, but neither would the subquery with the max.) When needed, it would not be too difficult to add a tie-breaker condition.
Another solution would be a subquery with a window function such as row_number() or rank(); a sketch of that variant follows the query below.
SELECT *
FROM a src
WHERE NOT EXISTS ( SELECT * FROM a nx
WHERE nx.a2 = src.a2
AND nx.a3 = src.a3
AND nx.a1 > src.a1
);
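And a sketch of the row_number() variant mentioned above, restricted to the hard-coded (a2, a3) pair from the question:
SELECT a1, a2, a3
FROM (
    SELECT a1, a2, a3,
           row_number() OVER (ORDER BY a1 DESC) AS rn   -- rank rows by a1, largest first
    FROM a
    WHERE a2 = 1 AND a3 = 1
) ranked
WHERE rn = 1;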
How do I turn the comma-separated values stored in a SQL database column into individual values?
E.g. in the database the column stores comma-separated values as shown below:
EligibleGroup
A11,A12,A13
B11,B12,B13
I need to get
EligibleGroup
A11
A12
A13
B11
B12
...
I have written a query that fetches a list of employees with employee name and eligible group:
XXX A11
YYY B11
ZZZ C11
I need to check that each employee's (XXX, YYY, ZZZ) eligible group falls within this
EligibleGroup
A11,A12,A13
B11,B12,B13
and return only those rows.
use a "user defined function" like the one shown here (including source code) - it returns the splitted values as a "table" (one row per value) you can select from like
select txt_value from dbo.fn_ParseText2Table('A11,A12,A13')
returns
A11
A12
A13
You could use a subquery:
SELECT employee_name, eligible_group
FROM YourTable
WHERE eligible_group IN
(SELECT SPLIT(EligibleGroup)
FROM tblEligibleGroup
WHERE <some conditions here>)
I don't believe the "SPLIT" function exists in SQL Server, so you'll have to either create a user-defined function to handle that, or use the nifty workaround suggested here: How do I split a string so I can access item x?
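If you are on SQL Server 2016 or later, the built-in STRING_SPLIT function can stand in for the user-defined splitter. A sketch, with the table and column names assumed from the question:
SELECT e.employee_name, e.eligible_group
FROM YourTable AS e
WHERE EXISTS (
    SELECT 1
    FROM tblEligibleGroup AS g
    CROSS APPLY STRING_SPLIT(g.EligibleGroup, ',') AS s
    WHERE s.value = e.eligible_group   -- keep the employee only if their group appears in some list
);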
I think you can do it this way,
select left('A11,A12,A13',3) + SUBSTRING('A11,A12,A13',charindex(',','A11,A12,A13'),10)
I think you may not have to split EligibleGroup. You can do it another way:
select empId
from yourTempEmpTable t1, EligibleGroup t2
where t2.EligibleGroup like '%' + t1.EligibleGroup + '%'
I think it should work.
Assuming that EligibleGroup holds fixed-length data, you can try using SUBSTRING as follows:
select substring(EligibleGroup,1,3) from #test union all
select substring(EligibleGroup,5,3) from #test union all
select substring(EligibleGroup,9,3) from #test
This will return:
A11
A12
A13
B11
B12
...
You can try it in Data Explorer
And if you need to check which EligibleGroup an employee falls into, try this:
Select EligibleGroup from test where EligibleGroup like '%A11%'