I have two tables where second table is referenced as foreign key in first table and I'm using jetbrains exposed as ORM framework for my Kotlin code.
Table A
id | type | tableb_id
1 | abcd | 2
2 | abcd | 3
Table B
id | date | time
1 | 2022-12-12 | 2022-05-16 22:47:56.779378
The query I'm trying to achieve is
WITH cte AS
(
SELECT *
FROM a
WHERE type = 'abcd'),
cte2 AS
(
SELECT *
FROM b
WHERE id IN (cte.tableb_id)
ORDER BY date DESC,
time DESC limit 1)
SELECT *
FROM cte
WHERE tableb_id = cte2.id
Seems that Exposed doesn't support CTEs at the moment. They have opened ticket from 2018 in their github: https://github.com/JetBrains/Exposed/issues/423
Related
I'm trying to use a query to select all the data of a company that answers a specific condition but I have trouble doing so. The following is what I have done so far:
SELECT *
FROM company a
WHERE a.id IN (SELECT b.company_id
FROM provider b
WHERE b.service_id IN (2, 4));
What I intend for the role of the sub-query (using the table below) to be, is to select the company_id that possess the service_id 2 and 4.
So in this example, it would only return the company_id 5:
+----------------+
| provider TABLE |
+----------------+
+----------------+----------------+----------------+
| id | company_id | service_id |
+--------------------------------------------------+
| 1 | 3 | 2 |
| 2 | 5 | 2 |
| 3 | 5 | 4 |
| 4 | 9 | 6 |
| 5 | 9 | 7 |
| ... | ... | ... |
As you may have guessed, the use of the IN in the sub-query does not fulfill my needs, it will select the company_id 5 but also the company_id 3.
I understand why, IN exists to check if a value matches any value in a list of values so it is not really what I need.
So my question is:
How can I replace the IN in my sub-query to select company_id
having the service_id 2 and 4?
The subquery should be:
SELECT b.company_id
FROM provider b
WHERE b.service_id IN (2, 4)
GROUP BY b.company_id
HAVING COUNT(b.service) = 2
You can self-JOIN the provider table to find companies that own both needed services.
SELECT p1.company_id
FROM provider p1
INNER JOIN provider p2 on p2.company_id = p1.company_id and p2.service_id = 2
WHERE p1.service_id = 4
As well as the other answers are correct if you want to see all company detail instead of only Company_id you can use two EXISTS() for each service_id.
SELECT *
FROM Company C
WHERE EXISTS (SELECT 1
FROM Provider P1
WHERE C.company_id = P1.company_id
AND P1.service_id = 2)
AND EXISTS (SELECT 1
FROM Provider P2
WHERE C.company_id = P2.company_id
AND P2.service_id = 4)
I offer this option for the subquery...
SELECT b.company_id
FROM provider b WHERE b.service_id = 2
INTERSECT
SELECT b.company_id
FROM provider b WHERE b.service_id = 4
Often, I find the performance of these operations to be outstanding even with very large data sets...
UNION
INTERSECT
EXCEPT (or MINUS in Oracle)
This article has some good insights:
You Probably Don't Use SQL Intersect or Except Often Enough
Hope it helps.
after some transformation I have a result from a cross join (from table a and b) where I want to do some analysis on. The table for this looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
In a row are always two different IDs that are matched with each other. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values(meaning both have "1" in 10_1, 10_2 etc) two IDs have on average in common, but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could make something like this for example for different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_1 = 1) OR [...] then 1 end)
To count in general how often two IDs have one thing in common, but this would of course also count if they have two or more things in common. Plus, I would also like to know how often two IDS have two things, three things etc in common.
One "problem" in my case is also that I have like ~30 columns I want to look at, so I can hardly write down for each case every possible combination.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+
With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
select id, '10_1' where "10_1" = 1
union
select id, '10_2' where "10_2" = 1
union
select id, '11_1' where "11_1" = 1
union
select id, '11_2' where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
The proposed solution requires first to take one step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a temporary table. The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same and called tbl, but this assumption is not needed and tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses the JSON trick to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
sum(CASE
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)
Not sure how to title or ask this really. Say I am getting a result set like this on a join of two tables, one contains the Id (C), the other contains the Rating and CreatedDate (R) with a foreign key to the first table:
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 5 | 12/08/1981 |
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
| 5 | 2 | 10/10/2010 |
I want this result set (the newest ones only):
-----------------------------------
| C.Id | R.Rating | R.CreatedDate |
-----------------------------------
| 2 | 3 | 01/01/2001 |
| 5 | 1 | 11/11/2011 |
This is a very large data set, and my methods (I won't mention which so there is no bias) is very slow to do this. Any ideas on how to get this set? It doesn't necessarily have to be a single query, this is in a stored procedure.
Thank you!
You need a CTE with a ROW_NUMBER():
WITH CTE AS (
SELECT ID, Rating, CreatedDate, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) RowID
FROM [TABLESWITHJOIN]
)
SELECT *
FROM CTE
WHERE RowID = 1;
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by createddate desc) as seqnum
from table t
) t
where seqnum = 1;
If you are using SQL Server 2008 or later, you should consider using windowing functions. For example:
select ID, Rating, CreatedDate from (
select ID, Rating, CreatedDate,
rowseq=ROW_NUMBER() over (partition by ID order by CreatedDate desc)
from MyTable
) x
where rowseq = 1
Also, please understand that while this is an efficient query in and of itself, your overall performance depends even more heavily on the underlying tables and, in particular, the indexes and explain plans that are used when joining the tables in the first place, etc.
Each row in my table belongs to some category, has some value and other data.
I would like to select each category with the most common value for it (doesn't matter which one if there are multiple), ordered by category.
some_table: expected result:
+--------+-----+--- +--------+-----+
|category|value|... |category|value|
+--------+-----+--- +--------+-----+
| 1 | a | | 1 | a |
| 1 | a | | 2 | b |
| 1 | b | | 3 | a # or b
| 2 | a | +--------+-----+
| 2 | b |
| 2 | c |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | b |
| 3 | b |
+--------+-----+---
I have a solution (posting it as an answer) but it seems suboptimal to me. So I'm looking for better solutions.
My table will have up to 10000 rows (possibly, but not likely, beyond that).
I'm planning to use SQLite but I'm not tied to it, so I may reconsider if SQLite can't do this with reasonable performance.
I would be inclined to do this using a correlated subquery:
select distinct category,
(select value
from some_table t2
where t2.category = t.category
group by value
order by count(*) desc
limit 1
) as mode_value
from some_table t;
The name for the most common value is "mode" in statistics.
And, if you had a categories table, this would be written as:
select category,
(select value
from some_table t2
where t2.category = c.category
group by value
order by count(*) desc
limit 1
) as mode_value
from categories c;
Here is one option, but I think it's slow...
SELECT DISTINCT `category` AS `the_category`, `value`
FROM `some_table`
WHERE `value`=(
SELECT `value`
FROM `some_table`
WHERE `category`=`the_category`
GROUP BY `value`
ORDER BY COUNT(`value`) DESC LIMIT 1)
ORDER BY `category`;
You can replace a part of this with WHERE `id`=( SELECT `id` if the table has a unique/primary key column, then the LIMIT 1 is not needed.
select category, value, count(*) value_count
from some_table t
group by category, value
order by category, value_count DESC;
returns us amout of each value in each category
select category, value
from (
select category, value, count(*) value_count
from some_table t
group by category, value) sub
group by category
actually we need the first value because it's sorted.
I am not sure sqlite leaves the first one and can't test but IMHO it should work
This question already has answers here:
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
(22 answers)
Closed 9 years ago.
I have searched far and wide for an answer to this problem. I'm using a Microsoft SQL Server, suppose I have a table that looks like this:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 2 | 3968 | Spain | Spanish |
| 3 | 3968 | USA | English |
| 4 | 1234 | Greece | Greek |
| 5 | 1234 | Italy | Italian |
I want to perform one query which only selects the unique 'NUMBER' column (whether is be the first or last row doesn't bother me). So this would give me:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 4 | 1234 | Greece | Greek |
How is this achievable?
A very typical approach to this type of problem is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by number order by id) as seqnum
from t
) t
where seqnum = 1;
This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.
Since you don't care, I chose the max ID for each number.
select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id
Query Explanation
select
tbl.* -- give me all the data from the base table (tbl)
from
tbl
inner join ( -- only return rows in tbl which match this subquery
select
max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
from
tbl
group by
NUMBER -- how to group rows for the MAX aggregation
) maxID
on maxID.maxID = tbl.id -- join condition ie only return rows in tbl
-- whose ID is also a MAX ID for a given NUMBER
You will use the following query:
SELECT * FROM [table] GROUP BY NUMBER;
Where [table] is the name of the table.
This provides a unique listing for the NUMBER column however the other columns may be meaningless depending on the vendor implementation; which is to say they may not together correspond to a specific row or rows.