Selecting where keywords match both values but in different rows [duplicate]

This question already has answers here:
Postgresql: Query returning incorrect data
(2 answers)
PostgreSQL: select all types that have an entry corresponding to all entries in another table
(2 answers)
Find a value which contains in ALL rows of a table
(1 answer)
Closed 5 years ago.
I am trying to return the fIDs that contain both keywords. For example, my data is:
|fID|keyword|
|1 |word1 |
|1 |word2 |
|2 |word1 |
|3 |word2 |
If I run SELECT fID FROM table WHERE keyword = 'word1' AND keyword = 'word2';
it returns 0 results, I assume because it wants both values in the same row, when all I want is for both keywords to be connected to the same fID.
If I use OR instead of AND, it also shows the fIDs that don't have both keywords in.
I would expect the result to be fID 1. I have been messing around with brackets in various places and with GROUP BY, but cannot get any success with this.

Untested, but you could try something like this:
SELECT fID
FROM table
WHERE keyword = 'word1' OR keyword = 'word2'
GROUP BY fID
HAVING COUNT(DISTINCT keyword) = 2
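For reference, a minimal self-contained demo of that query against the sample data (the table name keywords is a stand-in, since table is a reserved word):
-- keywords stands in for the real table name
CREATE TABLE keywords (fID INT, keyword VARCHAR(50));

INSERT INTO keywords (fID, keyword) VALUES
    (1, 'word1'),
    (1, 'word2'),
    (2, 'word1'),
    (3, 'word2');

-- Keep only fIDs that match both distinct keywords.
SELECT fID
FROM keywords
WHERE keyword IN ('word1', 'word2')
GROUP BY fID
HAVING COUNT(DISTINCT keyword) = 2;
-- Returns: 1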

Is it possible to select distinct and non-distinct in PySpark? [duplicate]

This question already has answers here:
How to get distinct rows in dataframe using pyspark?
(2 answers)
Closed 2 years ago.
I need to select 2 columns from a fact table (attached below). The problem I find is that for one of the columns I need unique values, and for the other one I'm happy to have them duplicated, as they belong to a specific ticket ID.
Fact table used:
import pyspark.sql.functions as f

df = (
    spark.table(f'nn_table_{country}.fact_table')
    .filter(f.col('date_key').between(start_date, end_date))
    .filter(f.col('is_client_plus') == 1)
    .filter(f.col('source') == 'tickets')
    .filter(f.col('subtype') == 'item_pm')
    # keep only the two promotions of interest
    .filter(f.col('external_id').isin('DISC0000077144', 'DISC0000076895'))
    .filter(f.col('external_id').isNotNull())
    .select('customer_id', 'external_id').distinct()
    #.join(dim_promotions, 'external_id', 'left')
)

display(df)
As you can see, the select statement contains a customer_id and an external_id column, but I'm only interested in getting unique customer_id values.
.select('customer_id','external_id').distinct()
Desired output:
customer_id external_id
77000000505097070 DISC0000077144
77000002294023644 DISC0000077144
77000000385346302 DISC0000076895
77000000291101490 DISC0000076895
Any idea how to do that, or whether it's even possible?
Thanks in advance!
Use dropDuplicates:
df.select('customer_id','external_id').dropDuplicates(['customer_id'])
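Note that which external_id survives for each customer_id is arbitrary with dropDuplicates. If you need a deterministic pick, a ROW_NUMBER window is the usual pattern; a sketch in Spark SQL, assuming the filtered DataFrame has been registered as a temp view named tickets:
-- tickets is an assumed temp-view name over the filtered DataFrame.
-- Keep exactly one row per customer_id, choosing the lowest external_id.
SELECT customer_id, external_id
FROM (
    SELECT customer_id,
           external_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY external_id) AS rn
    FROM tickets
) t
WHERE rn = 1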

Is there any way to use the IN operator to specify pairs of values in SQL? [duplicate]

This question already has answers here:
SQL multiple columns in IN clause
(6 answers)
Closed 3 years ago.
In Oracle SQL I need to filter records by pairs of values using a SELECT query. There is a FIRST_ID and a SECOND_ID, and I want the data filtered by specific pairs only.
I first tried using concatenation, then I prepared a lot of pairs with the OR operator, but both ways need a lot of manual work.
select *
from table_data
where to_char(first_id) || ';' || to_char(second_id) in ('123;354', '422;563', ... '353;536');
or
select *
from table_data
where (first_id = 123 and second_id = 354)
or (first_id = 422 and second_id = 563)
or (first_id = 353 and second_id = 536);
So you see that I can't use two IN operators (one for first_id, one for second_id), because that would return all crossing pairs like 123-563, 123-536, etc. Any ideas how to do this fast and easily?
Oracle supports IN with tuples:
select *
from table_data
where (first_id, second_id) in ( (123, 354), (422, 563), (353, 536));
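This row-value form of IN is standard SQL and also works in PostgreSQL and MySQL. On a database that lacks it, such as SQL Server, joining against an inline VALUES list is a common workaround; an untested sketch:
-- pairs is just a derived-table alias for the inline value list
select t.*
from table_data t
join (values (123, 354), (422, 563), (353, 536)) as pairs (first_id, second_id)
    on t.first_id = pairs.first_id
    and t.second_id = pairs.second_id;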

Convert SQL rows into columns [duplicate]

This question already has answers here:
Efficiently convert rows to columns in sql server
(5 answers)
Closed 6 years ago.
I have this table:
Value | Name
300 | moshe
400 | yoni
500 | niv
And I would like to convert it into this:
nameColumn: moshe yoni niv
value: 300 400 500
The value is float type and name is nchar(20).
Anyone?
Thanks
Several databases (SQL Server and Oracle among them) have a PIVOT relational operator that turns the unique values of a specified column from multiple rows into multiple columns in the output (a cross-tab), effectively rotating the table.
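For the sample table above, a minimal T-SQL sketch (the table name src is a stand-in; if the nchar(20) padding gets in the way, apply RTRIM(Name) in the inner select):
-- src stands in for the real table name
SELECT [moshe], [yoni], [niv]
FROM (SELECT Name, Value FROM src) AS s
PIVOT (
    MAX(Value) FOR Name IN ([moshe], [yoni], [niv])
) AS p;
-- Returns one row: 300 | 400 | 500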

fetch values into a comma separated string in postgresql [duplicate]

This question already has answers here:
Postgresql GROUP_CONCAT equivalent?
(9 answers)
Closed 7 years ago.
This should be a simple one. A query like
SELECT code_id FROM code WHERE status = 1
gives normally a result like
code_id
10
11
12
But my goal is to fetch into a string like
10,11,12
in order to use it in another query:
SELECT x FROM table WHERE status in (10,12,13)
Preferably in the same query. Is this possible using "standard" PostgreSQL WITHOUT adding an extra extension?
Everything I have found so far uses extensions that are not available as standard.
Thanks in advance.
Whatever tool you are using just shows you the data like that for convenience. But you can also use the result set in a subquery, like this:
SELECT x FROM table WHERE status IN (
    SELECT code_id FROM code WHERE status = 1
)
To get the result as a comma-separated string, as in your stated goal, you can use string_agg. Note the cast: string_agg expects text, and code_id appears to be an integer:
SELECT string_agg(code_id::text, ',') FROM code WHERE status = 1
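A quick demo of both approaches against the data described in the question (other_table is a stand-in for the second table):
CREATE TABLE code (code_id INT, status INT);
INSERT INTO code VALUES (10, 1), (11, 1), (12, 1), (99, 0);

-- As one string: returns '10,11,12'
SELECT string_agg(code_id::text, ',' ORDER BY code_id)
FROM code WHERE status = 1;

-- Fed straight into the other query, no string needed:
SELECT x FROM other_table
WHERE status IN (SELECT code_id FROM code WHERE status = 1);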

sql logical compression of records

I have a table in SQL with more than 1 million records which I want to compress using the following algorithm, and now I'm looking for the best way to do that, preferably without using a cursor.
If the table contains all 10 possible last digits (0 through 9) for a prefix (like 252637 in the following example), we find the most used Source (in our example 'A'), then remove all of that prefix's rows where Source = 'A' and insert the collapsed prefix (here 252637) instead.
The example below should help for better understanding.
Original table:
Digit (bigint) | Source
2526370        | A
2526371        | A
2526372        | A
2526373        | B
2526374        | C
2526375        | A
2526376        | B
2526377        | A
2526378        | B
2526379        | B
Compressed result:
252637         | A
2526373        | B
2526374        | C
2526376        | B
2526378        | B
2526379        | B
This is just another version of Tom Morgan's accepted answer. It uses division instead of substring to trim the least significant digit off the BIGINT digit column:
SELECT
    t.Digit/10,
    (
        -- For each prefix, get the Source character that is most
        -- abundant (the statistical mode); ties are broken arbitrarily.
        SELECT TOP 1
            i.Source
        FROM
            table i
        WHERE
            i.Digit/10 = t.Digit/10
        GROUP BY
            i.Source
        ORDER BY
            COUNT(*) DESC
    )
FROM
    table t
GROUP BY
    t.Digit/10
HAVING
    COUNT(*) = 10
I think it'll be faster, but you should test it and see.
You could identify the rows which are candidates for compression without a cursor (I think) by GROUPing on a substring of the Digit (its length minus 1) HAVING count = 10. That would identify prefixes with 10 child rows. You could use this list to insert into a new table, then use it again to delete from the original table. What would be left would be rows that don't have all 10, which you'd also want to insert into the new table (or copy the new data back to the original); there is a sketch of that full flow after the query below.
Does that make sense? I can write it out a bit better if it doesn't.
Possible SQL Solution:
SELECT
    SUBSTRING(CAST(t.Digit AS VARCHAR(19)), 1, LEN(CAST(t.Digit AS VARCHAR(19))) - 1),
    (SELECT TOP 1 i.Source
     FROM table i
     WHERE SUBSTRING(CAST(i.Digit AS VARCHAR(19)), 1, LEN(CAST(i.Digit AS VARCHAR(19))) - 1)
         = SUBSTRING(CAST(t.Digit AS VARCHAR(19)), 1, LEN(CAST(t.Digit AS VARCHAR(19))) - 1)
     GROUP BY i.Source
     ORDER BY COUNT(*) DESC)
FROM table t
GROUP BY SUBSTRING(CAST(t.Digit AS VARCHAR(19)), 1, LEN(CAST(t.Digit AS VARCHAR(19))) - 1)
HAVING COUNT(*) = 10
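For completeness, a sketch of the full insert/delete flow described above, using the division form and a temp table (T-SQL; the table name src is a stand-in, and this is untested):
-- src stands in for the real table name.
-- 1. Collect the collapsible prefixes and each one's dominant Source.
SELECT t.Digit/10 AS Prefix,
       (SELECT TOP 1 i.Source
        FROM src i
        WHERE i.Digit/10 = t.Digit/10
        GROUP BY i.Source
        ORDER BY COUNT(*) DESC) AS Source -- ties broken arbitrarily
INTO #collapsed
FROM src t
GROUP BY t.Digit/10
HAVING COUNT(*) = 10;

-- 2. Delete the rows that the collapsed prefix now covers.
DELETE s
FROM src s
JOIN #collapsed c
    ON s.Digit/10 = c.Prefix
    AND s.Source = c.Source;

-- 3. Insert the collapsed prefixes in their place.
INSERT INTO src (Digit, Source)
SELECT Prefix, Source FROM #collapsed;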