I got stuck on SQL subquery selection. Right now, I have a table products:
 id | name  | description
----+-------+----------------------
  6 | 123   | this is a           +
    |       | girl.
  7 | 124   | this is a           +
    |       | girl.
  8 | 125   | this is a           +
    |       | girl.
  9 | 126   | this is a           +
    |       | girl.
 10 | 127   | this is a           +
    |       | girl.
 11 | 127   | this is a           +
    |       | girl. Isn't this?
 12 | 345   | this is a cute slair
 13 | ggg   | this is a           +
    |       | girl
 14 | shout | this is a monster
 15 | haha  | they are cute
 16 | 123   | this is cute
What I want to do is find the total number of records and the first 5 records that contain '1' or 'this' in either the name or description column.
The best I could figure out is pretty ugly:
SELECT *, (select count(id)
           from (SELECT * from products
                 where description like any (array['%1%','%this%'])
                    or name like any (array['%1%','%this%'])
                ) as foo
          ) as total
from (SELECT * from products
      where description like any (array['%1%','%this%'])
         or name like any (array['%1%','%this%'])
     ) as fooo
limit 5;
You can use the aggregate function count() as a window function to compute the total count in the same query level:
SELECT id, name, description, count(*) OVER () AS total
FROM products p
WHERE description LIKE ANY ('{%1%,%this%}'::text[])
OR name LIKE ANY ('{%1%,%this%}'::text[])
ORDER BY id
LIMIT 5;
Quoting the manual on window functions:
In addition to these functions, any built-in or user-defined aggregate function can be used as a window function
This works, because LIMIT is applied after window functions.
I also use an alternative syntax for the array literal. One is as good as the other; this one is shorter for longer arrays. Sometimes an explicit type cast is needed; I am assuming text here.
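For illustration, a quick check showing the two notations are interchangeable (assuming the default text element type):

-- Both notations produce the same text[] value:
SELECT ARRAY['%1%', '%this%'] = '{%1%,%this%}'::text[];          -- true

-- So either form works with LIKE ANY:
SELECT 'this is a girl' LIKE ANY ('{%1%,%this%}'::text[]);      -- true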
It is simpler and a bit faster than the version with a CTE in my test.
BTW, this WHERE clause with a regular expression is shorter - but slower:
WHERE description ~ '(1|this)'
OR name ~ '(1|this)'
Ugly, but fast
One more test: I found the primitive version (similar to what you had already) to be even faster:
SELECT id, name, description
     , (SELECT count(*)
        FROM products
        WHERE description LIKE ANY ('{%1%,%this%}'::text[])
           OR name LIKE ANY ('{%1%,%this%}'::text[])
       ) AS total
FROM products p
WHERE description LIKE ANY ('{%1%,%this%}'::text[])
   OR name LIKE ANY ('{%1%,%this%}'::text[])
ORDER BY id
LIMIT 5;
Assuming you are using PostgreSQL 9.0+, you can use CTEs for this.
E.g.:
WITH p AS (
    SELECT *
    FROM products
    WHERE description LIKE ANY (ARRAY['%1%','%this%'])
       OR name LIKE ANY (ARRAY['%1%','%this%'])
)
SELECT *,
       (SELECT count(*) FROM p) AS total
FROM p
ORDER BY id
LIMIT 5;
I have event data that looks like this:
id | instance_id | value
1 | 1 | a
2 | 1 | ap
3 | 1 | app
4 | 1 | appl
5 | 2 | b
6 | 2 | bo
7 | 1 | apple
8 | 2 | boa
9 | 2 | boat
10 | 2 | boa
11 | 1 | appl
12 | 1 | apply
Basically, each row is a user typing a new letter. They can also delete letters.
I'd like to create a dataset that looks like this; let's call it data:
id | instance_id | value
7 | 1 | apple
9 | 2 | boat
12 | 1 | apply
My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.
To do so, I was planning to do a regex operation like so:
select *
from data d
where not exists (
    select * from data d2
    where d2.value ~ (d.value || '.')
)
Effectively, I'm trying to build a dynamic regex that matches one character more than is present, and that is specific to the row it's matching against.
The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?
Try the simple LIKE operator instead of regex patterns:
SELECT *
FROM data d1
WHERE NOT EXISTS (
    SELECT *
    FROM data d2
    WHERE d2.value LIKE d1.value || '_%'
)
Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39
Create an index on the value column; this should speed up the query a bit.
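One possible index, assuming PostgreSQL (the index name is made up; text_pattern_ops makes a plain B-tree index usable for left-anchored LIKE patterns when the database collation is not C):

-- hypothetical name; text_pattern_ops targets left-anchored LIKE
CREATE INDEX idx_data_value ON data (value text_pattern_ops);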
Window functions are a good choice for finding peaks in sequential data. You just need to compare each value with the previous and next ones using the lag() and lead() functions:
with cte as (
    select *,
           length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)), 0)
           and length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)), length(value))
           as is_peak
    from data
)
select * from cte where is_peak order by id;
Demo
I'm having quite a bit of trouble figuring out exactly how to rearrange a table. I have a large table that looks something like this:
+--------+-----------+
| NAME | ACCOUNT # |
+--------+-----------+
| Nike | 87 |
| Nike | 12 |
| Adidas | 80 |
| Adidas | 21 |
+--------+-----------+
And I want to rearrange it to look like this:
+------+--------+
| Nike | Adidas |
+------+--------+
| 87 | 80 |
| 12 | 21 |
+------+--------+
But I can't seem to figure out how. I tried using PIVOT, but that only works with aggregate functions. I tried using a FOR LOOP as well, but couldn't get it to work just right.
You can do this in several ways, but all of them begin by enumerating the rows. Here is an example using conditional aggregation:
select max(case when name = 'Nike' then account end) as Nike,
       max(case when name = 'Adidas' then account end) as Adidas
from (select t.*,
             row_number() over (partition by name order by account desc) as seqnum
      from t
     ) t
group by seqnum;
Consider again a pivot solution, but first add a row number for rolling Name group counts. The query below assumes an autonumber ID field:
SELECT * FROM
(
    SELECT Name, "Account #",
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) GrpRowNum
    /* ALT: (SELECT Count(*) FROM Table1 sub
     *       WHERE sub.Name = Table1.Name AND sub.ID <= Table1.ID) GrpRowNum */
    FROM Table1
)
PIVOT
(
    SUM("Account #")
    FOR Name IN ('Nike', 'Adidas')
)
ORDER BY GrpRowNum;
However, for your ~200 items, you cannot easily render the pivot's IN clause without workarounds such as PIVOT XML output or stored procedures in PL/SQL. Alternatively, you could use general-purpose code (Java, PHP, Python, R) to retrieve the SELECT DISTINCT Name FROM Table1 result set into a vector/array, join the element values (collapsing or imploding the array) with quotes and comma separators, and drop the entire list into the IN clause.
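As a sketch of that idea in SQL itself, Oracle 11gR2+ can build the quoted IN list with LISTAGG (names taken from the question; the output is a string you would paste into the PIVOT clause by hand or via dynamic SQL):

-- builds the list 'Adidas','Nike',... for the FOR Name IN (...) clause
SELECT LISTAGG('''' || Name || '''', ',')
       WITHIN GROUP (ORDER BY Name) AS in_list
FROM (SELECT DISTINCT Name FROM Table1);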
I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below,
This is the physical representation of the table:
seq | id  | name   | CRUD |
----|-----|--------|------|
 1  | 10  | john   | C    |
 2  | 10  | joe    | U    |
 3  | 11  | kent   | C    |
 4  | 12  | katie  | C    |
 5  | 12  | sue    | U    |
 6  | 13  | jill   | C    |
 7  | 14  | bill   | C    |
This is the logical representation of the table, considering the "most recent" records:
seq | id  | name   | CRUD |
----|-----|--------|------|
 2  | 10  | joe    | U    |
 3  | 11  | kent   | C    |
 5  | 12  | sue    | U    |
 6  | 13  | jill   | C    |
 7  | 14  | bill   | C    |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P1.ID = P.ID
)
...and I would receive this row:
seq | id  | name   | CRUD |
----|-----|--------|------|
 5  | 12  | sue    | U    |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the subquery already works for us.
Question: How can I use a CTE to simplify our predicates when we need the "most recent" logical view of the table? In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and reference it in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
    SELECT id, MAX(seq) AS latestseq
    FROM people
    GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, as would moving the CTE into the FROM clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
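For illustration, a sketch of the same query keyed on a hypothetical updated_at timestamp column; note the join then needs the id as well, since a timestamp, unlike seq, is not guaranteed unique across ids:

-- "updated_at" is an assumed column, for illustration only
WITH newp AS (
    SELECT id, MAX(updated_at) AS latest_ts
    FROM people
    GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.id = p.id AND n.latest_ts = p.updated_at)
ORDER BY p.id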
Following up on @Glenn's answer, here is an updated query which meets my original goal and is on par with @mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs. the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12
I have a table in MS Access which looks basically like this:
Table Name : Customer_Categories
+----------------------+------------+-------+
| Email                | CategoryID | Count |
+----------------------+------------+-------+
| jim@example.com      | 10         | 4     |
+----------------------+------------+-------+
| jim@example.com      | 2          | 1     |
+----------------------+------------+-------+
| simon@example.com    | 5          | 2     |
+----------------------+------------+-------+
| steven@example.com   | 10         | 16    |
+----------------------+------------+-------+
| steven@example.com   | 5          | 3     |
+----------------------+------------+-------+
In this table there are ≈ 350,000 records. The characteristics are these:
Duplicate values for Email, CategoryID and Count
Count refers to the number of times this customer has ordered from this category
What I want
I want to create a table that consists of a unique email address along with the CategoryID this customer has purchased from the most.
So the above example would be:
+----------------------+------------+
| Email                | CategoryID |
+----------------------+------------+
| jim@example.com      | 10         |
+----------------------+------------+
| simon@example.com    | 5          |
+----------------------+------------+
| steven@example.com   | 10         |
+----------------------+------------+
What I have tried
I have written a query that achieves what I want:
SELECT main.Email, (SELECT TOP 1 CategoryID
                    FROM Customer_Categories
                    WHERE main.Email = Email
                    GROUP BY CategoryID
                    ORDER BY MAX(Count) DESC, CategoryID ASC) AS Category
FROM Customer_Categories AS main
GROUP BY main.Email;
This works a treat and does exactly what I want, returning results in around 8 seconds. However, I need this data in a new table because I then want to update another table with the CategoryID. When I add INTO Customer_Favourite_Categories after the sub-query, so the data goes into a new table rather than just being returned as a result set, the query never finishes. I've left it running for about 45 minutes and it does nothing.
Is there any way around this?
If select into doesn't work, use insert into:
create table Customer_Favorite_Categories (
    email <email type>,
    FavoriteCategory <CategoryId type>
);
insert into Customer_Favorite_Categories
SELECT main.Email, (SELECT TOP 1 CategoryID
FROM Customer_Categories
WHERE main.Email = Email
GROUP BY CategoryID
ORDER BY MAX(Count) DESC, CategoryID ASC) AS Category
FROM Customer_Categories AS main
GROUP BY main.Email;
Try this:
SELECT Email, MAX(CategoryID)
FROM Customer_Categories
GROUP BY Email
I use sub-queries for this quite frequently. Your query in "What I have tried" is close, but just a little off in syntax. Something like the following should get what you are after. Count is in square brackets since it's a reserved word in SQL. The spacing I use in my SQL is conventional, so edit it to your liking.
SELECT m.Email,
       m.CategoryID
FROM MyTable AS m,
     (
       SELECT Email,
              MAX( [Count] ) AS mc
       FROM MyTable
       GROUP BY Email
     ) AS f
WHERE m.Email = f.Email
  AND m.[Count] = f.mc;
I need to create a view that automatically adds a virtual row number to the result. The data here is totally random; all I want to achieve is that the last column be created dynamically.
+--------+------------+-----+
| id     | variety    | num |
+--------+------------+-----+
| 234    | fuji       | 1   |
| 4356   | gala       | 2   |
| 343245 | limbertwig | 3   |
| 224    | bing       | 4   |
| 4545   | chelan     | 5   |
| 3455   | navel      | 6   |
| 4534345| valencia   | 7   |
| 3451   | bartlett   | 8   |
| 3452   | bradford   | 9   |
+--------+------------+-----+
Query:
SELECT id,
       variety,
       SOMEFUNCTIONTHATWOULDGENERATETHIS() AS num
FROM mytable
Use:
SELECT t.id,
       t.variety,
       (SELECT COUNT(*) FROM TABLE WHERE id < t.id) + 1 AS num
FROM TABLE t
It's not an ideal way of doing this, because the query for the num value executes once for every row returned. A better idea would be to create a NUMBERS table, with a single column containing a number starting at one and incrementing to an outrageously large number, and then join and reference the NUMBERS table in a manner similar to the variable example that follows.
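A sketch of seeding such a NUMBERS table (the table name, row count, and the information_schema cross join are my assumptions; in MySQL 8.0+ the ROW_NUMBER() window function makes all of this unnecessary):

CREATE TABLE numbers (n INT UNSIGNED NOT NULL PRIMARY KEY);

-- Cross-joining information_schema.columns with itself just supplies
-- enough rows to count off of.
INSERT INTO numbers (n)
SELECT @n := @n + 1
FROM information_schema.columns c1
CROSS JOIN information_schema.columns c2
CROSS JOIN (SELECT @n := 0) init
LIMIT 1000000;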
MySQL Ranking, or Lack Thereof
You can define a variable in order to get pseudo row number functionality, because MySQL doesn't have any ranking functions:
SELECT t.id,
       t.variety,
       @rownum := @rownum + 1 AS num
FROM TABLE t,
     (SELECT @rownum := 0) r
The SELECT @rownum := 0 defines the variable and sets it to zero.
The r is a subquery/table alias, because you'll get an error in MySQL if you don't define an alias for a subquery, even if you don't use it.
Can't Use A Variable in a MySQL View
If you do, you'll get error 1351, because you can't use a variable in a view, by design. The bug/feature behavior is documented here.
Oracle has a ROWNUM pseudo-column. In MySQL, you might have to go ugly:
SELECT id,
       variety,
       1 + (SELECT COUNT(*) FROM tbl WHERE t.id < id) AS num
FROM tbl t
This query is off the top of my head and untested, so take it with a grain of salt. Also, it assumes that you want to number the rows according to some sort criterion (id in this case), rather than the arbitrary numbering shown in the question.
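Since the original goal was a view, here is a sketch wrapping the count-subquery approach in one (the view name is an assumption); unlike the user-variable version, a scalar subquery in the SELECT list is legal inside a view:

-- hypothetical view name; avoids error 1351 from user variables
CREATE VIEW tbl_numbered AS
SELECT id,
       variety,
       1 + (SELECT COUNT(*) FROM tbl WHERE t.id < id) AS num
FROM tbl t;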