I have some issues with the OpenSearch SQL engine: my queries do not work on OpenSearch.
For example, I have two columns:
Column #1  Column #2
---------  ---------
A          1
A          2
A          3
A          4
B          1
B          2
B          3
C          3
I need to get all the values in column #2 that are common to every element of column #1.
In this example, the output is 3.
One more thing: I also want to be able to do this with only some of the items in column #1. If I restrict it to A and B, the output is 1, 2, 3.
Can you help me find a query (compatible with OpenSearch) that gets this result?
Or any other solution other than SQL.
Thank you,
Vincent
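Outside SQL, the requirement amounts to a set intersection: collect the column #2 values for each column #1 key, then intersect those sets. The following is a minimal Python sketch of that idea, assuming the rows have already been fetched from OpenSearch as (column #1, column #2) pairs. The names common_values and ROWS are hypothetical, and no OpenSearch API is used.

```python
from collections import defaultdict

# Sample rows from the question: (column #1, column #2).
ROWS = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 1), ("B", 2), ("B", 3),
    ("C", 3),
]


def common_values(rows, keys=None):
    """Return the column #2 values shared by every selected column #1 key.

    keys=None means "use every key that appears in rows".
    """
    values_by_key = defaultdict(set)
    for key, value in rows:
        if keys is None or key in keys:
            values_by_key[key].add(value)
    if not values_by_key:
        return set()
    # Intersect the per-key sets to keep only values present for all keys.
    return set.intersection(*values_by_key.values())


print(sorted(common_values(ROWS)))                   # [3]
print(sorted(common_values(ROWS, keys={"A", "B"})))  # [1, 2, 3]
```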
I'm currently stuck on a SQL issue (mainly because I can't find a way to google it, and my SQL skills do not suffice to solve it myself).
I'm working on a system where documents are edited. When the editing process is finished, users mark the document as solved. In the MSSQL database, the corresponding row is not updated; instead, a new row is inserted. Thus, every document that has been processed has (or rather, should have) multiple rows in the DB.
See the following situation:
ID  ID2  AnotherCondition  Steps  Process  Solved
--  ---  ----------------  -----  -------  ------
1   1    yes               Three  ATAT     AF
2   2    yes               One    ATAT     FR
2   3    yes               One    ATAT     EG
2   4    yes               One    ATAT     AF
3   5    no                One    ABAT     AF
4   6    yes               One    ATAT     FR
5   7    no                One    AVAT     EG
6   8    yes               Two    SATT     FR
6   9    yes               Two    SATT     EG
6   10   yes               Two    SATT     AF
I need to select the rows which have not been processed yet. A "processed" document has "FR" in the "Solved" column. Sadly, other versions of the document exist in the DB, with other codes in the "Solved" column.
Now: if there is a row which has "FR" in the "Solved" column, I need to exclude every row with the same ID from my SELECT statement as well. Is this doable?
To achieve this, I have to exclude the rows with the IDs 2, 4 (because the system sadly isn't too reliable, I guess) and 6 from my SELECT statement. Is this possible in general?
What I could do is filter out the duplicates afterwards, in python/js/whatever. But I am curious whether I can "remove" these rows directly in the SQL statement as well.
To rephrase it once more: how can I write a SELECT statement which returns only (in this example) the rows with the IDs 1, 3 and 5?
If you need to delete every row whose id has no row with "Solved = 'no'", you can use a DELETE statement whose subquery collects all "id" values that have at least one "Solved = 'no'" row, and delete everything outside that set.
DELETE FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'no');
Check the demo here.
Edit. If you need to use a SELECT statement, you can simply reverse the condition in the subquery:
SELECT *
FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'yes');
Check the demo here.
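As a sketch of the same NOT IN pattern applied to the sample rows from the question, the following Python snippet loads them into an in-memory SQLite database and excludes every ID that has at least one row with Solved = 'FR' (the question's definition of "processed"). SQLite stands in for MSSQL here only so that the example is self-contained. The table name tab follows the answer above, and unprocessed_ids and ROWS are hypothetical names.

```python
import sqlite3

# Sample rows from the question:
# (ID, ID2, AnotherCondition, Steps, Process, Solved)
ROWS = [
    (1, 1, "yes", "Three", "ATAT", "AF"),
    (2, 2, "yes", "One", "ATAT", "FR"),
    (2, 3, "yes", "One", "ATAT", "EG"),
    (2, 4, "yes", "One", "ATAT", "AF"),
    (3, 5, "no", "One", "ABAT", "AF"),
    (4, 6, "yes", "One", "ATAT", "FR"),
    (5, 7, "no", "One", "AVAT", "EG"),
    (6, 8, "yes", "Two", "SATT", "FR"),
    (6, 9, "yes", "Two", "SATT", "EG"),
    (6, 10, "yes", "Two", "SATT", "AF"),
]


def unprocessed_ids(rows):
    """Return the IDs that have no row with Solved = 'FR'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (ID INTEGER, ID2 INTEGER, AnotherCondition TEXT, "
        "Steps TEXT, Process TEXT, Solved TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        SELECT DISTINCT ID
        FROM tab
        WHERE ID NOT IN (SELECT ID FROM tab WHERE Solved = 'FR')
        ORDER BY ID
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


print(unprocessed_ids(ROWS))  # [1, 3, 5]
```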
I'm not sure I understand your question correctly:
...every document that has been processed has [...] multiple rows in the DB
I need to find out which documents have not been processed yet
So it seems you need to find unique documents with no other versions. This could be done using a GROUP BY with a HAVING clause:
SELECT
Id
FROM dbo.TableName
GROUP BY Id
HAVING COUNT(*) = 1
What I'm trying to do is add another column to an existing table whose value will depend on an already existing column in the table. For example say I have this table:
Table1
|Letter|
A
C
R
A
I want to create another column (for example, numbers) that is chosen based on the letters. So let's say A corresponds with 10, C with 3 and R with 32 (this was chosen at random). My resulting table should be like this:
|Letter| Number |
A | 10
C | 3
R | 32
A | 10
Can anyone help me write a query that does this? I have over 20 different cases, so the simpler it looks, the better.
Thanks in advance!
Options:
Build a table that associates [Letter] with the numeric value. Include this table in query by joining on the common [Letter] fields.
A very long Switch() expression. However, query design grid cell has a limit of 1024 characters.
Better to provide example with your real data and criteria.
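The first option (a lookup table joined on [Letter]) can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not Access SQL. The lookup table name LetterMap and the function letters_with_numbers are hypothetical, and the letter-to-number values are the ones from the question.

```python
import sqlite3


def letters_with_numbers(letters, mapping):
    """Join each letter to its number through a lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (Letter TEXT)")
    # Hypothetical lookup table: one row per letter/number pair.
    conn.execute(
        "CREATE TABLE LetterMap (Letter TEXT PRIMARY KEY, Number INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Table1 VALUES (?)", [(letter,) for letter in letters]
    )
    conn.executemany(
        "INSERT INTO LetterMap VALUES (?, ?)", list(mapping.items())
    )
    rows = conn.execute(
        "SELECT t.Letter, m.Number "
        "FROM Table1 AS t "
        "INNER JOIN LetterMap AS m ON m.Letter = t.Letter "
        "ORDER BY t.rowid"  # keep the original row order of Table1
    ).fetchall()
    conn.close()
    return rows


print(letters_with_numbers(["A", "C", "R", "A"], {"A": 10, "C": 3, "R": 32}))
# [('A', 10), ('C', 3), ('R', 32), ('A', 10)]
```

With more than 20 cases, adding a case then means adding one row to the lookup table rather than editing the query.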
I just started out using Informatica and currently I am figuring out how to get this to a target output (flat file to Microsoft SSIS):
ID Letter Parent_ID
---- ------ ---------
1 A NULL
2 B 1
3 C 1
4 D 2
5 E 2
6 F 3
7 G 3
8 H 4
9 I 4
From (assuming that this is a comma-delimited flat file):
c1,c2,c3,c4
A,B,D,H
A,B,D,I
A,B,E
A,C,F
A,C,G
EDIT: c1, c2, c3 and c4 are the header row.
EDIT: A more descriptive representation of what I want to achieve:
EDIT: Here is what I have so far (a Normalizer to produce the Letter column and a Sequence Generator for the ID):
Thanks in advance.
I'd go with a two-phased approach. Here's the general idea (not a full, step-by-step solution).
Perform a pivot to get all values into separate rows (e.g. for "A,B,D,H", take substrings and union the data to get four rows).
Perform a sort with distinct and insert into the target to get IDs assigned. End of mapping one.
In mapping two, add a Sequence to assign row numbers.
Do the pivot again.
Use an expression variable to refer to the previous row and the previous RowID (How do I get previous row?).
If the current RowID doesn't match the previous RowID, this is a top node and has no parent.
If a previous row exists and the RowID matches, the previous row is the parent. Perform a lookup to get its ID from the DB and use it as Parent_ID. Send the update to the DB.
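To make the logic of the two phases concrete outside Informatica, here is a rough Python sketch of the same idea. Phase one collects the distinct letters and assigns IDs in sorted order, and phase two walks each input row and treats the previous value in that row as the parent. It assumes every letter has exactly one parent across the file, and the names build_nodes and SAMPLE are hypothetical.

```python
import csv
import io

# The comma-delimited input from the question, header included.
SAMPLE = """c1,c2,c3,c4
A,B,D,H
A,B,D,I
A,B,E
A,C,F
A,C,G
"""


def build_nodes(csv_text):
    """Return (ID, Letter, Parent_ID) tuples derived from path rows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the c1..c4 header row
    paths = [[cell for cell in row if cell] for row in reader]

    # Phase one: distinct letters, sorted, get sequential IDs.
    letters = sorted({letter for path in paths for letter in path})
    ids = {letter: i for i, letter in enumerate(letters, start=1)}

    # Phase two: the previous value in the same row is the parent;
    # the first value in a row is a top node and has no parent.
    parent_id = {}
    for path in paths:
        for pos, letter in enumerate(path):
            parent_id[letter] = ids[path[pos - 1]] if pos > 0 else None

    return [(ids[letter], letter, parent_id[letter]) for letter in letters]


for node in build_nodes(SAMPLE):
    print(node)
```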
Here is a sample of my data (n>3000) that ties two numbers together:
id a b
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
5 7030344 7030342
6 7030364 7008059
7 7030659 7066051
8 7030345 7030343
9 7031815 7045692
10 7032644 7102337
Now, the problem is that id=2 is a duplicate of id=5 and id=4 is a duplicate of id=8. So, when I tried to write if-then statements to map column a to column b, basically the numbers just get swapped. There are many cases like this in my full data.
So, my question is how to identify the duplicates and delete one row of each duplicate pair (either id=2 or id=5, for example). I would prefer to do this in Excel, but I could work with SQL Server or SAS, too.
Thank you in advance. Please comment if my question is not clear.
What I want:
id a b
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
6 7030364 7008059
7 7030659 7066051
9 7031815 7045692
10 7032644 7102337
All sorts of ways to do this.
In SAS or SQL, this is simple (for SQL Server, the SQL portion should be identical or nearly so):
data have;
input id a b;
datalines;
1 7028344 7181310
2 7030342 7030344
3 7030354 7030353
4 7030343 7030345
5 7030344 7030342
6 7030364 7008059
7 7030659 7066051
8 7030345 7030343
9 7031815 7045692
10 7032644 7102337
;;;;
run;
proc sql undopolicy=none;
  delete from have H where exists (
    select 1 from have V where V.id < H.id
    /* parenthesize the OR so that V.id < H.id applies to both cases */
    and ((V.a=H.a and V.b=H.b) or (V.a=H.b and V.b=H.a))
  );
quit;
The Excel solution would, I believe, require an additional column holding the concatenation of the two values in a fixed order (any order will do), and then a lookup to see whether that row is the first one with that value. I don't think you can do it without an additional column (or without VBA, which, if you can use it, also gives a fairly simple solution).
Edit:
Actually, the excel solution IS possible without creating a new column (well, you need to put this formula somewhere, but without ANOTHER additional column).
=IF(OR(COUNTIFS(B$1:B1,B2,C$1:C1,C2),COUNTIFS(B$1:B1,C2,C$1:C1,B2)),"DUPLICATE","")
Assuming ID is in A, B and C contain the values (and there is no header row). That formula goes in the second row (ie, B2/C2 values) and then is extended to further rows (so row 36 will have the arrays be B1:B35 and C1:C35 etc.). That puts DUPLICATE in the rows which are duplicates of something above and blank in rows that are unique.
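The "put the two values in a fixed order and check whether the pair has been seen before" idea can also be sketched in a few lines of Python, which may help when checking the Excel or SQL result. This is only an illustration using the sample rows from the question. The names drop_swapped_duplicates and ROWS are hypothetical.

```python
# Sample rows from the question: (id, a, b).
ROWS = [
    (1, 7028344, 7181310),
    (2, 7030342, 7030344),
    (3, 7030354, 7030353),
    (4, 7030343, 7030345),
    (5, 7030344, 7030342),
    (6, 7030364, 7008059),
    (7, 7030659, 7066051),
    (8, 7030345, 7030343),
    (9, 7031815, 7045692),
    (10, 7032644, 7102337),
]


def drop_swapped_duplicates(rows):
    """Keep the first row of each unordered (a, b) pair."""
    seen = set()
    kept = []
    for row_id, a, b in rows:
        # Ordering the pair makes (a, b) and (b, a) produce the same key.
        key = (min(a, b), max(a, b))
        if key not in seen:
            seen.add(key)
            kept.append((row_id, a, b))
    return kept


print([row_id for row_id, _, _ in drop_swapped_duplicates(ROWS)])
# [1, 2, 3, 4, 6, 7, 9, 10]
```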
I haven't tested this, but here is some food for thought: you could join the table against itself to get the IDs whose values appear swapped in another row.
SELECT
t1.id, t1.a, t1.b
FROM
[myTable] AS t1
INNER JOIN [myTable] AS t2
ON t1.a = t2.b
AND t1.b = t2.a
AND t1.id <> t2.id
I have a table in the form:
date / category (string) / count (integer)
--------------------------------------------
7/15 A 3
7/15 B 7
7/15 C 2
7/16 A 9
7/16 B 1
7/16 C 2
Basically, for each day, each category will have a count associated with it.
The problem is, I don't necessarily know what these categories will end up being. Say I know they are A, B, and C, but next week, there is a D, E, and F.
And this is the view that I want to build:
Date / A / B / C / .. (however many categories found)
---------------------------------------------------------
7/15 3 5 2 3 4
7/16 9 5 9 6 4
...
..
.
I usually know enough SQL to get by, but this one is racking my brain. I don't think I am using the right vocabulary when trying to google it, because I'm not finding the answers I am looking for.
The answer is simple: you cannot build a view that does what you would like, because a view has its columns predefined.
You could do one of the following:
1. Create a stored procedure that recreates the view every week. This stored procedure would analyze the data, determine the columns, and then use dynamic SQL to alter the view.
2. Change the definition of what you want and put the values in a single column, separated by commas (or some other character).
3. Predefine a list of acceptable columns, create the view (using pivot, say), and then periodically go through and modify it when new values arise.
4. Do the pivoting at the application layer. This is particularly easy in Excel.
One big caveat with (1) and (3). If anything uses the view as "select * from view", you need to be sure that those queries/stored procedures/user defined functions/etc. are recompiled. Otherwise, they will have the wrong list of columns (this may only apply to SQL Server).
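Pivoting at the application layer can be sketched as below. This is a minimal Python illustration, assuming the rows arrive as (date, category, count) tuples. The function name pivot is hypothetical, and filling a missing date/category combination with 0 is an assumption rather than something the question specifies.

```python
# Sample rows from the question: (date, category, count).
ROWS = [
    ("7/15", "A", 3), ("7/15", "B", 7), ("7/15", "C", 2),
    ("7/16", "A", 9), ("7/16", "B", 1), ("7/16", "C", 2),
]


def pivot(rows, missing=0):
    """Turn (date, category, count) rows into one row per date."""
    # The columns are discovered from the data, so new categories
    # (D, E, F, ...) appear without changing the code.
    categories = sorted({category for _, category, _ in rows})
    counts_by_date = {}
    for date, category, count in rows:
        counts_by_date.setdefault(date, {})[category] = count
    header = ["Date"] + categories
    table = [
        [date] + [counts.get(category, missing) for category in categories]
        for date, counts in counts_by_date.items()
    ]
    return header, table


header, table = pivot(ROWS)
print(header)  # ['Date', 'A', 'B', 'C']
for line in table:
    print(line)
# ['7/15', 3, 7, 2]
# ['7/16', 9, 1, 2]
```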