SQLite: finding abnormal values over time - sql

I have the following SQLite table:
CREATE TABLE test (
id INTEGER NOT NULL,
date TEXT,
account TEXT,
........
value TEXT,
.......
PRIMARY KEY (id),
CONSTRAINT composite UNIQUE (date, account)
)
I want to find all the account numbers where the value is greater than 0 on 2 separate dates. I'm thinking:
SELECT * from test WHERE value > 0 GROUP BY account
is probably a start, but I don't know how to evaluate the size of each group.

One way to phrase this query is to aggregate over accounts having a greater than zero value, and then retain those accounts having two or more distinct dates:
SELECT
account
FROM test
WHERE value > 0
GROUP BY account
HAVING COUNT(DISTINCT date) >= 2
I see that your value column is declared as TEXT. I think this should probably be an integer if you want to do numeric comparisons with this column.
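If changing the column type is not convenient, one workaround (a minimal sketch against your schema, not tested) is to cast value inside the query so the comparison is numeric rather than text-ordered:
-- cast the TEXT column so the comparison is numeric, not lexicographic
SELECT account
FROM test
WHERE CAST(value AS INTEGER) > 0
GROUP BY account
HAVING COUNT(DISTINCT date) >= 2;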

Related

How to find distinct values for another column which has critical data?

I currently have a column which has account numbers.
I want to create a unique value for each distinct account number in another column, so that I can use those values to get the count of distinct accounts.
For example:
Account number | unique value
12345          | 67
56738          | 87
28373          | 58
28373          | 58
So when I take the distinct of the unique value column, I get the same count as the distinct count of the account number column.
So here the distinct count will be 3.
dense_rank() should do what you want:
select t.*, dense_rank() over (order by account_number) as unique_id
from t;
Note that this can change over time as new accounts are added or removed. If that is an issue, my recommendation is to create a separate table account_numbers that has an integer primary key, and use that primary key as the unique number.
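A minimal sketch of that approach (the table and column names are illustrative, and the auto-increment syntax depends on your database):
-- hypothetical lookup table: one row per distinct account number
CREATE TABLE account_numbers (
    account_id INTEGER PRIMARY KEY,    -- make this an identity/auto-increment column in your dialect
    account_number VARCHAR(20) NOT NULL UNIQUE
);

-- populate it from the existing data
INSERT INTO account_numbers (account_number)
SELECT DISTINCT account_number FROM t;

-- join back to attach the stable unique id to each row
SELECT t.*, a.account_id AS unique_id
FROM t
JOIN account_numbers a ON a.account_number = t.account_number;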

How to find rows with max value without aggregate functions SQLPLUS Oracle11g

I'm trying to find the rows with max credit in my table:
CREATE TABLE Course(
CourseNr INTEGER,
CourseTitel VARCHAR(60),
CourseTyp VARCHAR(10),
Courselenght DECIMAL,
Credit DECIMAL,
PRIMARY KEY (CourseNr)
);
and there is more than one course with the max value. I don't want to use any aggregate functions for that; any ideas?
Presumably, you want the rows with the maximum credit. A common method is to find any rows that have no larger credit:
select c.*
from course c
where c.credit >= all (select c2.credit from course c2);
Get the rows with Credit for which there don't exist any rows with greater Credit:
SELECT
c.*
FROM Course c
WHERE
NOT EXISTS (
SELECT 1 FROM Course WHERE Credit > c.Credit
)

Select all rows of a SQL table which do not share a name

I have a table, call it widgets, which has columns name and created_at, among others. I want to run a query that returns the count of all the rows of widgets which share the same name and have been created within a millisecond of each other.
This is the query that I have come up with, but it returns a number greater than the total number of rows in the table. Can someone point out where I am going wrong?
SELECT COUNT (DISTINCT "t1"."id")
FROM
"tasks" "t1" ,"tasks" "t2"
WHERE
"t1"."name" = "t2"."name"
AND
date_trunc('milliseconds',"t1"."created_at") = date_trunc('milliseconds',"t2"."created_at")
You should add the condition:
and "t1"."id" <> "t2"."id"
where "id" is a primary key. In the lack of a primary key you can use ctid:
and "t1".ctid <> "t2".ctid

How to create an integer from a certain string cell in PSQL

I have three columns of interest in a table I am trying to query, say ID (char), Amount (bigint) and Reference (char). Here is a sample of a few entries from this table. The first two rows have no entry in the third column.
ID | Amount | Reference
16266| 24000|
16267| -12500|
16268| 25000| abc:185729000003412
16269| 25000| abc:185730000003412
What I am trying to get is a query or a function that will return the ids of the duplicate rows that have the same amount and the same modulus (%100000000) of the number in the string in the reference column.
The only cells in the reference column I am interested in will all have 'abc:' before the whole number, and nothing after the number. I need some way to convert that final field (a string) into an integer so I can take the modulus of that number.
Here is the script I will run once I get the reference field converted into a number without the 'abc:'
CREATE TEMP TABLE tableA (
    id int,
    amount int,
    referenceNo bigint
);

INSERT INTO tableA (id, amount, referenceNo)
SELECT id, net_amount, longnumber % 100000000 AS referenceNo
FROM deposit_item;

SELECT DISTINCT * FROM tableA WHERE referenceNo > 1 AND amount > 1;
Basically, how do I convert the reference field (abc:185729000003412) to an integer in PSQL (185729000003412 or 3412)?
Assuming that the reference id is always delimited by ':':
split_part(Reference, ':', 2)::bigint
should work (bigint rather than integer, since a value like 185729000003412 overflows a 4-byte integer).
Edit:
If you want to match abc: specifically - try this:
CASE
WHEN position('abc:' in Reference) > 0
THEN split_part(Reference, 'abc:', 2)::bigint
ELSE 0
END
But you should indeed consider storing the xxx: prefix separately.
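For the original goal, finding the ids of rows that share the same amount and the same modulus of the reference number, here is a sketch that applies the conversion directly and skips the temp table; deposit_item and net_amount come from your script, while the reference column name is an assumption:
SELECT a.id, b.id
FROM deposit_item a
JOIN deposit_item b
  ON  a.id < b.id                    -- report each duplicate pair once
  AND a.net_amount = b.net_amount
  AND split_part(a.reference, ':', 2)::bigint % 100000000
    = split_part(b.reference, ':', 2)::bigint % 100000000
WHERE a.reference LIKE 'abc:%'
  AND b.reference LIKE 'abc:%';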

How can I calculate the top % daily price changes using MySQL?

I have a table called prices which includes the closing price of stocks that I am tracking daily.
Here is the schema:
CREATE TABLE `prices` (
`id` int(21) NOT NULL auto_increment,
`ticker` varchar(21) NOT NULL,
`price` decimal(7,2) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `ticker` (`ticker`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2200 ;
I am trying to calculate the % price drop for anything that has a price value greater than 0 for today and yesterday. Over time, this table will be huge and I am worried about performance. I assume this will have to be done on the MySQL side rather than PHP because LIMIT will be needed here.
How do I take the last 2 dates and do the % drop calculation in MySQL though?
Any advice would be greatly appreciated.
One problem I see right off the bat is using a TIMESTAMP data type for the date; this will complicate your SQL query for two reasons. You will have to use a range or convert to an actual date in your WHERE clause, and, more importantly, since you state that you are interested in today's closing price and yesterday's closing price, you will have to keep track of the days when the market is open. Monday's query is different from Tuesday through Friday's, and any day the market is closed for a holiday will have to be accounted for as well.
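For the range option, a sketch of what that looks like against your current schema (it avoids wrapping the column in DATE(), which also keeps any future index on the column usable):
-- match today's rows with a range instead of DATE(`date`) = CURRENT_DATE
SELECT ticker, price, `date`
FROM prices
WHERE price > 0
  AND `date` >= CURRENT_DATE
  AND `date` <  CURRENT_DATE + INTERVAL 1 DAY;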
I would add a column like mktDay and increment it each day the market is open for business. Another approach might be to include a 'previousClose' column which makes your calculation trivial. I realize this violates normal form, but it saves an expensive self join in your query.
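And a rough sketch of the previousClose idea (the previous_close column name is illustrative, and you would populate it as part of your daily load):
-- denormalized: store yesterday's close on today's row
ALTER TABLE prices ADD COLUMN previous_close DECIMAL(7,2) NULL;

-- the percent change then needs no self join at all
SELECT ticker,
       `date`,
       (price - previous_close) / previous_close AS pct_change
FROM prices
WHERE price > 0
  AND previous_close > 0
  AND `date` >= CURRENT_DATE
ORDER BY pct_change DESC
LIMIT 10;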
If you cannot change the structure, then you will do a self join to get yesterday's close and you can calculate the % change and order by that % change if you wish.
Below is Eric's code, cleaned up a bit; it executed on my server running MySQL 5.0.27:
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_today.price > 0
and p_yest.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Note the back-ticks as some of your column names and Eric's aliases were reserved words.
Also note that using a WHERE clause for the first table makes for a less expensive query: the WHERE gets executed first, so the self join only has to consider the rows that are greater than zero and have today's date.
select
p_today.`ticker`,
p_today.`date`,
p_yest.price as `open`,
p_today.price as `close`,
((p_today.price - p_yest.price)/p_yest.price) as `change`
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.`date`) = date(p_yest.`date`) + INTERVAL 1 DAY
and p_yest.price > 0
where p_today.price > 0
and date(p_today.`date`) = CURRENT_DATE
order by `change` desc
limit 10
Scott brings up a great point about consecutive market days. I recommend handling this with a connector table like:
CREATE TABLE `market_days` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY USING BTREE (`market_day`),
UNIQUE KEY USING BTREE (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
As more market days elapse, just INSERT new date values in the table. market_day will increment accordingly.
When inserting prices data, use LAST_INSERT_ID() for today's rows, or look up the market_day value corresponding to a given date when loading past values.
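A quick sketch of that insert flow, using the prices/tickers schema proposed below (the ticker_id, price, and date values are placeholders):
-- register today's market day
INSERT INTO market_days (`date`) VALUES (CURRENT_DATE);

-- reuse the generated market_day for today's price rows
INSERT INTO prices (market_day, ticker_id, price)
VALUES (LAST_INSERT_ID(), 1, 42.50);

-- for a past date, look the market_day up instead
INSERT INTO prices (market_day, ticker_id, price)
SELECT market_day, 1, 42.50 FROM market_days WHERE `date` = '2009-06-01';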
As for the prices table itself, you can make storage, SELECT and INSERT operations much more efficient with a useful PRIMARY KEY and no AUTO_INCREMENT column. In the schema below, your PRIMARY KEY contains intrinsically useful information and isn't just a convention to identify unique rows. Using MEDIUMINT (3 bytes) instead of INT (4 bytes) saves an extra byte per row and more importantly 2 bytes per row in the PRIMARY KEY - all while still affording over 16 million possible dates and ticker symbols (each).
CREATE TABLE `prices` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`price` decimal (7,2) NOT NULL DEFAULT '00000.00',
PRIMARY KEY USING BTREE (`market_day`,`ticker_id`),
KEY `ticker_id` USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;
In this schema each row is unique across each pair of market_day and ticker_id. Here ticker_id corresponds to a list of ticker symbols in a tickers table with a similar schema to the market_days table:
CREATE TABLE `tickers` (
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`ticker_symbol` VARCHAR(5),
`company_name` VARCHAR(50),
/* etc */
PRIMARY KEY USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
This yields a query similar to the others proposed, but with two important differences: 1) There is no functional transformation on the date column (such a transformation destroys MySQL's ability to use keys on the join); in the query below MySQL will use part of the PRIMARY KEY to join on market_day. 2) MySQL can only use one key per JOIN or WHERE clause. In this query MySQL will use the full width of the PRIMARY KEY (market_day and ticker_id), whereas in the previous query it could only use one (MySQL will usually pick the more selective of the two).
SELECT
`market_days`.`date`,
`tickers`.`ticker_symbol`,
`yesterday`.`price` AS `close_yesterday`,
`today`.`price` AS `close_today`,
(`today`.`price` - `yesterday`.`price`) / (`yesterday`.`price`) AS `pct_change`
FROM
`prices` AS `today`
LEFT JOIN
`prices` AS `yesterday`
ON /* uses PRIMARY KEY */
`yesterday`.`market_day` = `today`.`market_day` - 1 /* this will join NULL for `today`.`market_day` = 0 */
AND
`yesterday`.`ticker_id` = `today`.`ticker_id`
INNER JOIN
`market_days` /* uses first 3 bytes of PRIMARY KEY */
ON
`market_days`.`market_day` = `today`.`market_day`
INNER JOIN
`tickers` /* uses KEY (`ticker_id`) */
ON
`tickers`.`ticker_id` = `today`.`ticker_id`
WHERE
`today`.`price` > 0
AND
`yesterday`.`price` > 0
;
A finer point is the need to also join against tickers and market_days in order to display the actual ticker_symbol and date, but these operations are very fast since they make use of keys.
Essentially, you can just join the table to itself to find the given % change. Then, order by change descending to get the largest changers on top. You could even order by abs(change) if you want the largest swings.
select
p_today.ticker,
p_today.date,
p_yest.price as open,
p_today.price as close,
-- Don't have to worry about 0 division here
(p_today.price - p_yest.price)/p_yest.price as change
from
prices p_today
inner join prices p_yest on
p_today.ticker = p_yest.ticker
and date(p_today.date) = date(date_add(p_yest.date, interval 1 day))
and p_today.price > 0
and p_yest.price > 0
and date(p_today.date) = CURRENT_DATE
order by change desc
limit 10