SQL: select same column multiple times (using online Oracle explorer)

I have exploratory access to a db using the online explorer (Oracle); only 1000 records are returned at a time.
Thus, I need to make sure that the data returned contains a sufficient subset.
As a basic example, take tableA with columns:
id  chartid
1   a
1   b
1   c
2   d
2   b
select id
from tableA
where chartid in ('a', 'd')
and chartid in ('b')
and chartid in ('c')
should return
id
1
I want to make sure that the ids I then run my query on contain enough of the needed data (otherwise, it's sparse).
-- thank you
How is this done in general?
Is this a limitation of the explorer/online interface?

This is an example of a "set-within-sets" query. I like to approach these using group by and having:
select id
from TableA
group by id
having sum(case when chartid in ('a', 'd') then 1 else 0 end) > 0 and
sum(case when chartid in ('b') then 1 else 0 end) > 0 and
sum(case when chartid in ('c') then 1 else 0 end) > 0;
Each condition in the having clause counts the rows, for a given id, that match that condition; the > 0 check guarantees that at least one such row exists. Applied to the sample data, only id 1 passes all three tests.
EDIT:
I'm not actually sure what your conditions are; it may not be all three that you need to meet. If you want to match just one of them, then use or. If you want two out of the three, then use:
having (max(case when chartid in ('a', 'd') then 1 else 0 end) +
max(case when chartid in ('b') then 1 else 0 end) +
max(case when chartid in ('c') then 1 else 0 end)
) >= 2;

The answer Gordon Linoff provided is probably the best, but just for reference, a general approach to this kind of problem is to break it down into smaller parts (i.e. form multiple sets) and then join them. Something like this:
select t1.id
from tableA t1
inner join tableA t2 on t1.id = t2.id
inner join tableA t3 on t1.id = t3.id
where
t1.chartid in ('a','d')
and t2.chartid = 'b'
and t3.chartid = 'c'
This query forms three distinct sets and then returns their intersection. At least for me, thinking in terms of sets is often helpful when writing relational database queries (although the solution might not be the most efficient).
Sample SQL Fiddle with both queries.
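Following the same set-based line of thought, the result can also be written directly with set operators, which Oracle supports. A minimal sketch against the sample table:
select id from tableA where chartid in ('a', 'd')
intersect
select id from tableA where chartid = 'b'
intersect
select id from tableA where chartid = 'c';
Each branch builds the set of ids having one of the required chartid values, and intersect keeps only the ids present in all three sets.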

Related

SQL Case When Statement for Id with multiple rows

I have a table like so
Id Code
1 03J0
1 0304
1 03HI
2 033I
2 03J5
3 03J4
4 030H
I want to do a case when statement: when there is any occurrence where the Id has a Code like '%03J', then Happy, otherwise Sad. How do I do this when an Id has multiple rows with different codes?
Intended output
Id Emotion
1 Happy
2 Happy
3 Happy
4 Sad
Is this what you want?
select id,
(case when sum(case when code like '03J%' then 1 else 0 end) > 0 then 'Happy' else 'Sad' end) as emotion
from t
group by id;
Using the ordering of strings ('Happy' sorts before 'Sad', so min() returns 'Happy' whenever at least one row matches), you can simplify this to:
select id,
min(case when code like '03J%' then 'Happy' else 'Sad' end) as emotion
from t
group by id;
Here is a db<>fiddle.
Using a self-join. Judging from your sample data, I think you want '03J%' instead of '%03J':
select distinct a.id, case when b.code is not null then 'Happy' else 'Sad' end as emotion
from mytable a
left join mytable b on a.id=b.id and b.code like '03J%';
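For reference, a correlated EXISTS is another way to phrase the same per-id test; a sketch under the same assumptions about the table name and the '03J%' pattern:
select distinct a.id,
       case when exists (select 1
                         from mytable b
                         where b.id = a.id
                           and b.code like '03J%')
            then 'Happy' else 'Sad'
       end as emotion
from mytable a;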

Multiple Selects from Subquery

I have multiple queries that look like this:
select count(*) from (
SELECT * FROM TABLE1 t
JOIN TABLE2 e
USING (EVENT_ID)
) s1
WHERE
s1.SOURCE_ID = 1;
where the only difference is the s1.SOURCE_ID = (some other number). I would like to turn these into a single query that selects from the subquery once, using a different SOURCE_ID for each column in the result, like this:
+----------------+----------------+----------------+
| source_1_count | source_2_count | source_3_count | ... so on
+----------------+----------------+----------------+
I am trying to avoid using the multiple queries as the join is on a very large table and takes some time, so I would rather do it once and query the result multiple times.
This is on a Snowflake data warehouse which I think uses something similar to PostgreSQL (also I'm fairly new to SQL so feel free to suggest a completely different solution as well).
Use conditional aggregation:
SELECT sum(case when SOURCE_ID = 1 then 1 else 0 end) AS source_1_count,
sum(case when SOURCE_ID = 2 then 1 else 0 end) AS source_2_count, ...
FROM TABLE1 t
JOIN TABLE2 e
USING (EVENT_ID)
You would put the results in separate rows, using group by:
SELECT SOURCE_ID, COUNT(*)
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID)
GROUP BY SOURCE_ID;
Putting the separate sources in columns is troublesome, unless you know the exact list of sources that you want in the result set.
EDIT:
If you know the exact list of sources, you can use conditional aggregation or pivot:
SELECT SUM(CASE WHEN SOURCE_ID = 1 THEN 1 ELSE 0 END) as source_id_1,
SUM(CASE WHEN SOURCE_ID = 2 THEN 1 ELSE 0 END) as source_id_2,
SUM(CASE WHEN SOURCE_ID = 3 THEN 1 ELSE 0 END) as source_id_3
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID);
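If you know the sources, the pivot variant mentioned above might look like this with Snowflake's PIVOT clause (a sketch using the question's table and column names; the output columns come back named after the pivoted values):
SELECT *
FROM (
    SELECT SOURCE_ID, EVENT_ID
    FROM TABLE1 t JOIN TABLE2 e USING (EVENT_ID)
) src
PIVOT (COUNT(EVENT_ID) FOR SOURCE_ID IN (1, 2, 3)) AS p;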
All the comments so far ignore the fact that you won't get the possible benefit of pruning the data during the scan, as there are no WHERE predicates. The join can also be slower than it needs to be because of that.
This is a possible improvement:
SELECT SUM(CASE WHEN SOURCE_ID = 1 THEN 1 ELSE 0 END) as source_id_1,
SUM(CASE WHEN SOURCE_ID = 2 THEN 1 ELSE 0 END) as source_id_2,
SUM(CASE WHEN SOURCE_ID = 3 THEN 1 ELSE 0 END) as source_id_3
FROM TABLE1 t JOIN
TABLE2 e
USING (EVENT_ID)
WHERE SOURCE_ID IN (1, 2, 3);

SQL - most efficient way to find if a pair of row does NOT exist

I can't seem to find a similar situation to mine online. I have a table for orders, called Order, and a table for details on those orders, called order_detail. An order is of a certain type if it has one of two pairs of order details (Value-Unit pairs). So, my order_detail table might look like this:
order_id | detail
---------|-------
1 | X
1 | Y
1 | Z
2 | X
2 | Z
2 | B
3 | A
3 | Z
3 | B
The two pairs that go together are (X & Y) and (A & B). What is an efficient way of retrieving only those order_ids that do NOT contain either one of these pairs? For the above table, e.g., I should receive only order_id 2.
The only solution I can come up with is essentially to use two queries and perform a self join:
select distinct o.order_id
from orders o
where o.order_id not in (
select distinct order_id
from order_detail od1 where od1.detail=X
join order_detail od2 on od2.order_id = od1.order_id and od2.detail=Y
)
and o.order_id not in (
select distinct order_id
from order_detail od1 where od1.detail=A
join order_detail od2 on od2.order_id = od1.order_id and od2.detail=B
)
The problem is that performance is an issue, my order_detail table is HUGE, and I am quite inexperienced in query languages. Is there a faster way to do this with a lower cardinality? I also have zero control over the schema of the tables, so I can't change anything there.
First and foremost I'd like to emphasise that finding the most efficient query is a combination of a good query and a good index. Far too often I see questions here where people look for magic to happen in only one or the other.
For example, of a variety of solutions, yours is the slowest (after fixing syntax errors) when there are no indexes, but it is quite a bit better with an index on (detail, order_id).
Please also note that you have the actual data and table structures. You'll need to experiment with various combinations of queries and indexes to find what works best; not least because you haven't indicated what platform you're using and results are likely to vary between platforms.
[/rant off]
Query
Without further ado, Gordon Linoff has provided some good suggestions. There's another option likely to offer similar performance. You said you can't control the schema, but you can use a sub-query to transform the data into a 'friendlier structure'.
Specifically, if you:
pivot the data so you have a row per order_id,
and a column for each detail you want to check,
and each intersection holds a count of how many of that order's rows have that detail...
Then your query is simply: where (x = 0 or y = 0) and (a = 0 or b = 0). The following uses SQL Server's temporary tables to demonstrate with sample data. The queries below work regardless of duplicate id, val pairs.
/*Set up sample data*/
create table #t (
id int,
val char(1)
)
insert #t(id, val)
values (1, 'x'), (1, 'y'), (1, 'z'),
(2, 'x'), (2, 'z'), (2, 'b'),
(3, 'a'), (3, 'z'), (3, 'b')
/*Option 1 manual pivoting*/
select t.id
from (
select o.id,
sum(case when o.val = 'a' then 1 else 0 end) as a,
sum(case when o.val = 'b' then 1 else 0 end) as b,
sum(case when o.val = 'x' then 1 else 0 end) as x,
sum(case when o.val = 'y' then 1 else 0 end) as y
from #t o
group by o.id
) t
where (x = 0 or y = 0) and (a = 0 or b = 0)
/*Option 2 using Sql Server PIVOT feature*/
select t.id
from (
select id ,[a],[b],[x],[y]
from (select id, val from #t) src
pivot (count(val) for val in ([a],[b],[x],[y])) pvt
) t
where (x = 0 or y = 0) and (a = 0 or b = 0)
It's interesting to note that the query plans for options 1 and 2 above are slightly different. This suggests the possibility of different performance characteristics over large data sets.
Indexes
Note that the above will likely process the whole table. So there is little to be gained from indexes. However, if the table has "long rows", an index on only the 2 columns you're working with means that less data needs to be read from disk.
The query structure you provided is likely to benefit from an index such as (detail, order_id), because the server can more efficiently check the NOT IN sub-query conditions. How beneficial depends on the distribution of data in your table.
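For example, a sketch of such an index (the name is just illustrative):
create index ix_order_detail_detail_id on order_detail (detail, order_id);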
As a side note I tested various query options including a fixed version of yours and Gordon's. (Only a small data size though.)
Without the above index, your query was slowest in the batch.
With the above index, Gordon's second query was slowest.
Alternative Queries
Your query (fixed):
select distinct o.id
from #t o
where o.id not in (
select od1.id
from #t od1
inner join #t od2 on
od2.id = od1.id
and od2.val = 'y'
where od1.val = 'x'
)
and o.id not in (
select od1.id
from #t od1
inner join #t od2 on
od2.id = od1.id
and od2.val='a'
where od1.val= 'b'
)
A mixture of Gordon's first and second queries; it fixes the duplicate issue in the first and the performance in the second:
select id
from #t od
group by id
having ( sum(case when val in ('x') then 1 else 0 end) = 0
or sum(case when val in ('y') then 1 else 0 end) = 0
)
and ( sum(case when val in ('a') then 1 else 0 end) = 0
or sum(case when val in ('b') then 1 else 0 end) = 0
)
Using INTERSECT and EXCEPT:
select id
from #t
except
(
select id
from #t
where val = 'a'
intersect
select id
from #t
where val = 'b'
)
except
(
select id
from #t
where val = 'x'
intersect
select id
from #t
where val = 'y'
)
I would use aggregation and having:
select order_id
from order_detail od
group by order_id
having sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and
sum(case when detail in ('A', 'B') then 1 else 0 end) < 2;
This assumes that orders do not have duplicate rows with the same detail. If that is possible:
select order_id
from order_detail od
group by order_id
having count(distinct case when detail in ('X', 'Y') then detail end) < 2 and
count(distinct case when detail in ('A', 'B') then detail end) < 2;

Joining two datasets with subqueries

I am attempting to join two large datasets using BigQuery. They have a common field; however, the common field has a different name in each dataset.
I want to count the number of rows and sum the results of my CASE logic for both table1 and table2.
I believe that I have errors resulting from the subquery (subselect?) and from syntax errors. I have tried to apply precedent from similar posts, but I still seem to be missing something. Any assistance in getting this sorted is greatly appreciated.
SELECT
table1.field1,
table1.field2,
(
SELECT COUNT (*)
FROM table1) AS table1_total,
sum(case when table1.mutually_exclusive_metric1 = "Y" then 1 else 0 end) AS t1_pass_1,
sum(case when table1.mutually_exclusive_metric1 = "Y" AND table1.mutually_exclusive_metric2 IS null OR table1.mutually_exclusive_metric3 = 'Y' then 1 else 0 end) AS t1_pass_2,
sum(case when table1.mutually_exclusive_metric3 ="Y" AND table1.mutually_exclusive_metric2 ="Y" AND table1.mutually_exclusive_metric3 ="Y" then 1 else 0 end) AS t1_pass_3,
(
SELECT COUNT (*)
FROM table2) AS table2_total,
sum(case when table2.metric1 IS true then 1 else 0 end) AS t2_pass_1,
sum(case when table2.metric2 IS true then 1 else 0 end) AS t2_pass_2,
(
SELECT COUNT (*)
FROM dataset1.table1 JOIN EACH dataset2.table2 ON common_field_table1 = common_field_table2) AS overlap
FROM
dataset1.table1,
dataset2.table2
WHERE
XYZ
Thanks in advance!
So. Let's take this one step at a time:
1) Using * is not explicit, and being explicit is good. Additionally, mixing explicit selects with * will duplicate columns under auto-renames: table1.field will become table1_field. Unless you are just playing around, don't use *.
2) You never joined. A query with a join looks like this (note the order of the WHERE and GROUP statements, and the naming of each):
SELECT
t1.field1 AS field1,
t2.field2 AS field2
FROM dataset1.table1 AS t1
JOIN dataset2.table2 AS t2
ON t1.field1 = t2.field1
WHERE t1.field1 = "some value"
GROUP BY field1, field2
Here t1.field1 and t2.field1 contain the corresponding values to join on. You wouldn't repeat those in the select.
3) Use whitespace to make your code easier to read. It helps everyone involved, including you.
4) Your subselects are pretty useless. A subselect is used in place of creating a new table; for example, you would use a subselect to group or filter out data from an existing table:
SELECT
subselect.field1 AS ssf1,
subselect.max_f1 AS ss_max_f1
FROM (
SELECT
t1.field1 AS field1,
MAX(t1.field1) AS max_f1
FROM dataset1.table1 AS t1
GROUP BY field1
) AS subselect
The subselect is practically a new table that you select from. Treat it logically like it happens first, and you take the results from that and use it in your main select.
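Putting points 1-4 together, the query from the question might be restructured roughly like this. This is only a sketch: the field and metric names are copied from the question, and the standalone table totals from the original would need to be computed separately:
SELECT
  t1.field1 AS field1,
  t1.field2 AS field2,
  COUNT(*) AS overlap,
  SUM(CASE WHEN t1.mutually_exclusive_metric1 = 'Y' THEN 1 ELSE 0 END) AS t1_pass_1,
  SUM(CASE WHEN t2.metric1 IS true THEN 1 ELSE 0 END) AS t2_pass_1
FROM dataset1.table1 AS t1
JOIN dataset2.table2 AS t2
  ON t1.common_field_table1 = t2.common_field_table2
GROUP BY field1, field2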
5) This was a terrible question. It didn't even look like you tried to figure things out one step at a time.

SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value

So basically there is 1 question and 1 problem:
1. question - when I have, say, 100 columns in a table (and no key or unique index is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. problem - the example below shows the first question and my actual SQL-statement problem
Example:
SELECT
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1,
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null END FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own select statement, then I have to put the actual column name into the GROUP BY, and the GROUP BY then groups on the actual value from the row instead of the NULL-value from the CASE. Because of that, I would have to either join or subselect with all columns (since there is no key and no unique index) or somehow find another solution.
The DB server is DB2.
So now to describe it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though an "order item" has only one of the two "departements" (ZD, EK), the fields for both "ZD" and "EK" are always filled. I need the grouping to consider the "departement": only when the designated "departement" value (ZD or EK) changes should a new group be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
ZD,
EK,
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
This worked in the SELECT, with ZD and EK in the GROUP BY. The only problem was: even if EK was not the designated DEPARTEMENT, a new group was still opened whenever its value changed, because the grouping used the real EK value and not the NULL from the CASE, as explained above.
And here, ladies and gentlemen, is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
@t-clausen.dk: Thank you!
@others: ...
Actually there is a wildcard equality test.
I am not sure why you would group by field1; that seems impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
C.FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for datatypes that are not comparable.
No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.baz)
but concatenation can produce false matches ('ab' + 'c' equals 'abc', as does 'a' + 'bc'), and either way, you're listing all of the fields.
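If you do resort to the concatenation hack, adding a delimiter that cannot occur in the data avoids such false matches; the '|' here is only an assumed-safe separator:
WHERE (a.foo + '|' + a.bar + '|' + a.baz) = (b.foo + '|' + b.bar + '|' + b.baz)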
I might tend to solve it something like this
WITH q as
(SELECT
DEPARTEMENT
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
DEPARTEMENT
, GRP
, DISTRIBUTOR
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR
If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)