TypeORM & Postgres: Count only unique distinct values from multiple columns

TypeORM & Postgres: Count only unique distinct values from multiple columns - sql

I have various SQL queries, which return me unique / distinct value from DB, (or count them),
like:
SELECT buyer as counterparty
FROM public.order
UNION
SELECT seller as counterparty
FROM public.order
or
SELECT COUNT(*)
FROM (
SELECT DISTINCT p
FROM public.order
CROSS JOIN LATERAL (VALUES(seller),(buyer)) AS C(p)
) AS internalQuery
Example structure of my table:
id buyer seller
0 A B
1 B A
2 B D
3 D A
4 A D
Desired result:
3 or A,B,D
I'd like to rewrite them with TypORM query builder, but I can't figure out, how to replace CROSS JOIN LATERAL (VALUES(seller),(buyer)) AS C(p) or UNION in my case. TypeORM is pretty poor with examples and doc coverage in this case.
Does there any option with that?
I have seen various methods like .getCount and .distinct(true) which could help me and easily find the solution for one column.
So I understood, that if I want to find the exact number, instead of doc results, I should use .getCount instead of .getMany
But I can't understand, how to select (and unite) values from multiple columns via typeORM to receive distinct values from multiple columns.
I am working with PostgrSQL, so when I am trying:
const query = repository.createQueryBuilder('order')
.distinctOn(['buyer', 'seller'])
.limit(100)
.getMany()
I receive docs with each distinct value in each field, so instead of 3 I get 6 values (3 distinct by column1, and 3 by column2)

Related

How to delete records in BigQuery based on values in an array?

In Google BigQuery, I would like to delete a subset of records, based on the value of a specific column. It's a query that I need to run repeatedly and that I would like to run automatically.
The problem is that this specific column is of the form STRUCT<column_1 ARRAY (STRING), column_2 ARRAY (STRING), ... >, and I don't know how to use such a column in the where-clause when using the delete-command.
Here is basically what I am trying to do (this code does not work):
DELETE
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
The error that I'm getting is: Syntax error: Expected end of input but got keyword LEFT at [3:1]
If I replace the DELETE with SELECT *, it does work:
SELECT *
FROM dataset.table t
LEFT JOIN UNNEST(t.category.column_1) AS type
WHERE t.partition_date = '2020-07-22'
AND type = 'some_value'
Does somebody know how to use such a column to delete a subset of records?
EDIT:
Here is some code to create a reproducible example with some silly data (fill in your own dataset and table name in all queries):
Suppose you want to delete all rows where category.type contains the value 'food'.
1 - create a table:
CREATE TABLE <DATASET>.<TABLE_NAME>
(
article STRING,
category STRUCT<
color STRING,
type ARRAY<STRING>
>
);
2 - Insert data into the new table:
INSERT <DATASET>.<TABLE_NAME>
SELECT "apple" AS article, STRUCT('red' AS color, ['fruit','food'] as type) AS category
UNION ALL
SELECT "cabbage" AS article, STRUCT('blue' AS color, ['vegetable', 'food'] as type) AS category
UNION ALL
SELECT "book" AS article, STRUCT('red' AS color, ['object'] as type) AS category
UNION ALL
SELECT "dog" AS article, STRUCT('green' AS color, ['animal', 'pet'] as type) AS category;
3 - Show that select works (return all rows where category.type contains the value 'food'; these are the rows I want to delete):
SELECT *
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Initial Result
4 - My attempt at deleting rows where category.type contains 'food' does not work:
DELETE
FROM <DATASET>.<TABLE_NAME>
LEFT JOIN UNNEST(category.type) type
WHERE type = 'food'
Syntax error: Unexpected keyword LEFT at [3:1]
Desired Result

This is the code I used to delete the desired records (the records where category.type contains the value 'food'.)
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS(SELECT 1 FROM UNNEST(t1.category.type) t2 WHERE t2 = 'food')
The embarrasing thing is that I've seen these kind of answers on similar questions (for example on update-queries). But I come from Oracle-SQL and I think that there you are required to connect your subquery with your main query in the WHERE-statement of the subquery (ie. connect t1 with t2), so I didn't understand these answers. That's why I posted this question.
However, I learned that BigQuery automatically understands how to connect table t1 and 'table' t2; you don't have to explicitly connect them.
Now it is possible to still do this (perhaps even recommended?):
DELETE
FROM <DATASET>.<TABLE_NAME> t1
WHERE EXISTS (SELECT 1 FROM <DATASET>.<TABLE_NAME> t2 LEFT JOIN UNNEST(t2.category.type) AS type WHERE type = 'food' AND t1.article=t2.article)
but a second difficulty for me was that my ID in my actual data is somehow hidden in an array>struct-construction, so I got stuck connecting t1 & t2. Fortunately this is not always an absolute necessity.

Since you did not provide any sample data I am going to explain using some dummy data. In case you add your sample data, I can update the answer.
Firstly,according to your description, you have only a STRUCT not an Array[Struct <col_1, col_2>].For this reason, you do not need to use UNNEST to access the values within the data. Below is an example how to access particular data within a STRUCT.
WITH data AS (
SELECT 1 AS id, STRUCT("Alex" AS name, 30 AS age, "NYC" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Leo" AS name, 18 AS age, "Sydney" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Robert" AS name, 25 AS age, "Paris" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Mary" AS name, 28 AS age, "London" AS city) AS info UNION ALL
SELECT 1 AS id, STRUCT("Ralph" AS name, 45 AS age, "London" AS city) AS info
)
SELECT * FROM data
WHERE info.city = "London"
Notice that the STRUCT is named info and the data we accessed is city and used it in the WHERE clause.
Now, in order to delete the rows that contains an specific value within the STRUCT , in your case I assume it would be your_struct.column_1, you can use DELETE or MERGE and DELETE. I have saved the above data in a table to execute the below examples, which have the same output,
First method: DELETE
DELETE FROM `project.dataset.table`
WHERE info.city = "Sydney"
Second method: MERGE and DELETE
MERGE `project.dataset.table` a
USING (SELECT * from `project.dataset.table` WHERE info.city ="London") b
ON a.info.city =b.info.city
WHEN matched and b.id=1 then
Delete
And the output for both queries,
Row id info.name info.age info.city
1 1 Alex 30 NYC
2 1 Robert 25 Paris
3 1 Ralph 45 London
4 1 Mary 28 London
As you can see the row where info.city = "Sydney" was deleted in both cases.
It is important to point out that your data is excluded from your source table. Therefore, you should be careful.
Note: Since you want to run this process everyday, you could use Schedule Query within BigQuery Console, appending or overwriting the results after each run. Also, it is a good practice not deleting data from your source table. Thus, consider creating a new table from your source table without the rows you do not desire.

Oracle 11 SQL lists and two table comparison

I have a list of 12 strings (strings of numbers) that I need to compare to an existing table in Oracle. However I don't want to create a table just to do the compare; that takes more time than it is worth.
select
column_value as account_number
from
table(sys.odcivarchar2list('27001', '900480', '589358', '130740',
'807958', '579813', '1000100462', '656025',
'11046', '945287', '18193', '897603'))
This provides the correct result.
Now I want to compare that list of 12 against an actual table with account numbers to find the missing values. Normally with two tables I would do a left join table1.account_number = table2.account_number and table two results will have blanks. When I attempt that using the above, all I get are the results where the two records are equal.
select column_value as account_number, k.acct_num
from table(sys.odcivarchar2list('27001',
'900480',
'589358',
'130740',
'807958',
'579813',
'1000100462',
'656025',
'11046',
'945287',
'18193',
'897603'
)) left join
isi_owner.t_known_acct k on column_value = k.acct_num
9 match, but 3 should be included in table1 and blank in table2
Thoughts?
Sean

Can I get duplicate results (from one table) in an INTERSECT operation between two tables?

I know the wording of the question is awkward, but I couldn't phrase it any better. Let me explain the situation.
There's table A which has a bunch of columns (a, b, c ... ) and I run a SELECT query on it like so:
SELECT a FROM A WHERE b IN ('....') (the ellipsis indicates a number of values to be matched to)
There's another table B which has a bunch of columns (d, e, f ... ) and I run a SELECT query on it like so:
SELECT d FROM B WHERE f = '...' (the ellipsis indicates a single value to be matched to)
Now I should say here that the two tables store different types of information about the same entity, but the columns a and d contain the exact same data (in this case, an ID). I want to find out the intersection of the two tables so I run this:
SELECT a FROM A WHERE b IN ('....') INTERSECT SELECT d FROM B WHERE f = '...'
Now here's the problem:
The first SELECT contains a set of values in the WHERE clause, right? So let's say the set is (1234, 2345,3456). Now, the result of this query when b is matched ONLY to 1234 is, let's say, abc. When it's matched to 2345, it's def, suppose. And matching to 3456, it gives abc.
Let's suppose these two results (abc and def) are also in the set of results from the second SELECT.
So, now, putting back the entire set of values to matched into the WHERE clause, the INTERSECT operation will give me abc and def. But I want abc twice since two values in the WHERE clause set match to the second SELECT.
Is there any way I can get that?
I hope it's not too complicated to understand my problem. This is a real-life problem I'm facing in my job.
Data structure and my code
Table A contains general information about a company:
company_id | branch_id | no_of_employees | city
Table B contains the financials of the company:
company_id | branch_id | revenue | profits
First SELECT:
SELECT branch_id FROM A WHERE CITY IN ('Dallas', 'Miami', 'New Orleans')
Now, running each city separately in the first SELECT, I get the branch_ids:
branch_id | city
23 | Dallas
45 | Miami
45 | New Orleans
Once again, this seems impractical as to how two cities can have the same branch ids, but please bear with me on this.
Second SELECT:
SELECT branch_id FROM B
WHERE REVENUE = 5000000
I know this is a little impractical, but for the purpose of this example, it suffices.
Running this query I get the following set:
11
23
45
22
10
So the INTERSECT will give me just 23 and 45. But I want 45 twice, since both Miami and New Orleans have that branch_id and that branch_id has generated a revenue of 5 million.

Directly from Microsoft's documentation (https://msdn.microsoft.com/en-us/library/ms188055.aspx)
:
"INTERSECT returns distinct rows that are output by both the left and right input queries operator."
So NO, it is not possible to get the same value twice when using INTERSECT because the results will be DISTINCT. However if you build an INNER JOIN correctly you can do essentially the same thing as INTERSECT except keep the repetitive results by NOT using distinct or group by.
SELECT
A.a
FROM
A
INNER JOIN B
ON A.a = B.d
AND B.F = '....'
WHERE b IN ('....')
And for your specific Example that you edited:
SELECT
branch_id
FROM
A
INNER JOIN B
ON A.branch_id = B.branch_id
AND B.REVENUE = 5000000
WHERE A.CITY IN ('Dallas', 'Miami', 'New Orleans')

You overcomplicated your task a lot:
SELECT *
FROM A
WHERE CITY IN (...)
AND EXISTS
(
SELECT 1 FROM B
WHERE B.REVENUE = 5000000
AND B.branch_id = A.branch_id
)
INTERSECT and EXCEPT are both returning row sets with DISTINCT applied.
Regular joining/filtering operations are not performed by INTERSECT or EXCEPT.

Oracle / SQL - Count number of occurrences of values in a single column

Okay, I probably could have come up with a better title, but wasn't sure how to word it so let me explain.
Say I have a table with the column 'CODE'. Each record in my table will have either 'A', 'B', or 'C' as it's value in the 'CODE' column. What I would like is to get a count of how many 'A's, 'B's, and 'C's I have.
I know I could accomplish this with 3 different queries, but I'm wondering if there is a way to do it with just 1.

Use:
SELECT t.code,
COUNT(*) AS numInstances
FROM YOUR_TABLE t
GROUP BY t.code
The output will resemble:
code numInstances
--------------------
A 3
B 5
C 1
If a code exists that has not been used, it will not show up. You'd need to LEFT JOIN to the table containing the list of codes in order to see those that don't have any references.

Reporting against a CSV field in a SQL server 2005 DB

Ok so I am writing a report against a third party database which is in sql server 2005. For the most part its normalized except for one field in one table. They have a table of users (which includes groups.) This table has a UserID field (PK), a IsGroup field (bit) , a members field (text) this members field has a comma separated list of all the members of this group or (if not a group) a comma separated list of the groups this member belongs to.
The question is what is the best way to write a stored procedure that displays what users are in what groups? I have a function that parses out the ids into a table. So the best way I could come up with was to create a cursor that cycles through each group and parse out the userid, write them to a temp table (with the group id) and then select out from the temp table?
UserTable
Example:
ID|IsGroup|Name|Members
1|True|Admin|3
2|True|Power|3,4
3|False|Bob|1,3
4|False|Susan|2
5|True|Normal|6
6|False|Bill|5
I want my query to show:
GroupID|UserID
1|3
2|3
2|4
5|6
Hope that makes sense...

If you have (or could create) a separate table containing the groups you could join it with the users table and match them with the charindex function with comma padding of your data on both sides. I would test the performance of this method with some fairly extreme workloads before deploying. However, it does have the advantage of being self-contained and simple. Note that changing the example to use a cross-join with a where clause produces the exact same execution plan as this one.
Example with data:
SELECT *
FROM (SELECT 1 AS ID,
'1,2,3' AS MEMBERS
UNION
SELECT 2,
'2'
UNION
SELECT 3,
'3,1'
UNION
SELECT 4,
'2,1') USERS
LEFT JOIN (SELECT '1' AS MEMBER
UNION
SELECT '2'
UNION
SELECT '3'
UNION
SELECT '4') GROUPS
ON CHARINDEX(',' + GROUPS.MEMBER + ',',',' + USERS.MEMBERS + ',') > 0
Results:
id members group
1 1,2,3 1
1 1,2,3 2
1 1,2,3 3
2 2 2
3 3,1 1
3 3,1 3
4 2,1 1
4 2,1 2

Your technique will probably be the best method.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

TypeORM & Postgres: Count only unique distinct values from multiple columns - sql

Related

How to delete records in BigQuery based on values in an array?

Oracle 11 SQL lists and two table comparison

Can I get duplicate results (from one table) in an INTERSECT operation between two tables?

Oracle / SQL - Count number of occurrences of values in a single column

Reporting against a CSV field in a SQL server 2005 DB

Categories

Resources