Round in SQL Server

I'm trying to figure out a way to do this in SQL:
100000 % 1000 = 0
1150000 % 1000000 = 150000
I don't know if there is anything like that in SQL. Or even if there is, I don't know what it is called.
Any idea?

SELECT 1150000 / 1000000 AS result, 1150000 % 1000000 AS remainder;
Which gives: result = 1 and remainder = 150000.
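The % operator is T-SQL's modulo operator. If the intent behind the title is rounding a value down to a multiple, a minimal sketch of that idea is to subtract the remainder back out:
SELECT 1150000 % 1000000 AS remainder,                     -- 150000
       1150000 - (1150000 % 1000000) AS rounded_down;      -- 1000000, i.e. rounded down to the nearest 1000000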


DSE: Query Timeout/Slow

I am currently running a cluster of 3 nodes with 200 million records, and the specific vertex label I'm querying has a total of 25 million vertices and 30 million edges. I am running the following query:
g.V().hasLabel('people_node').has("age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()
I have tried this query on a smaller set of ~100 vertices and edges, and the profiler showed that indexes were used for all parts of the query. However, I think the problem might be in my schema, which is shown below.
Schema
schema.propertyKey('id').Text().ifNotExists().create()
schema.propertyKey('name').Text().ifNotExists().create()
schema.propertyKey('age').Int().ifNotExists().create()
schema.propertyKey('location').Point().withGeoBounds().ifNotExists().create()
schema.propertyKey('gender').Text().ifNotExists().create()
schema.propertyKey('dob').Timestamp().ifNotExists().create()
schema.propertyKey('tags').Text().ifNotExists().create()
schema.propertyKey('date_posted').Timestamp().ifNotExists().create()
schema.vertexLabel('people_node').properties('id','name','location','gender','dob').create()
schema.vertexLabel('questions_node').properties('id','tags','date_posted').create()
schema.edgeLabel('posted_question').single().connection('people_node','questions_node').create()
Indexes Used
schema.vertexLabel("people_node").index("search").search().by("name").by("age").by("gender").by("location").by("dob").ifNotExists().add()
schema.vertexLabel("people_node").index("people_node_index").materialized().by("id").ifNotExists().add()
schema.vertexLabel("questions_node").index("search").search().by("date_posted").by("tags").ifNotExists().add()
schema.vertexLabel("questions_node").index("questions_node_index").materialized().by("id").ifNotExists().add()
I have also read about "OLAP" queries; I believe I have activated them, but the query is still way too slow. Any advice or insight on what is slowing it down will be greatly appreciated.
Profile Statement (OLTP)
gremlin> g1.V().has("people_node","age", inside(0,25)).filter(outE('posted_question').count().is(gt(1))).profile()
==>Traversal Metrics
Step                                                               Count  Traversers   Time (ms)    % Dur
==========================================================================================================
DsegGraphStep(vertex,[],(age < 25 & age > 0 & l...                     1           1      38.310    25.54
  query-optimizer                                                                          0.219
    \_condition=((age < 25 & age > 0 & label = people_node) & (true))
  query-setup                                                                              0.001
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
  index-query                                                                             26.581
    \_indexType=Search
    \_usesCache=false
    \_statement=SELECT "community_id", "member_id" FROM "MiniGraph"."people_node_p"
      WHERE "solr_query" = '{"q":"*:*", "fq":["age:{0 TO 25}"]}' LIMIT ?; with params (java.lang.Integer) 50000
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty,
      pagingState=null, pageSize=-1, user=Optional[cassandra], waitForSchemaAgreement=true, async=true}
TraversalFilterStep([DsegVertexStep(OUT,[posted...                                       111.471    74.32
  DsegVertexStep(OUT,[posted_question],edge,(di...                     1           1      42.814
    query-optimizer                                                                        0.227
      \_condition=((direction = OUT & label = posted_question) & (true))
    query-setup                                                                            0.036
      \_isFitted=true
      \_isSorted=false
      \_isScan=false
    vertex-query                                                                          29.908
      \_usesCache=false
      \_statement=SELECT * FROM "MiniGraph"."people_node_e" WHERE "community_id" = ? AND "member_id" = ?
        AND "~~edge_label_id" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1300987392,
        (java.lang.Long) 1026, (java.lang.Integer) 65584, (java.lang.Integer) 2
      \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty,
        pagingState=null, pageSize=-1, user=Optional[cassandra], waitForSchemaAgreement=true, async=true}
      \_usesIndex=false
  RangeGlobalStep(0,2)                                                  1           1       0.097
  CountGlobalStep                                                       1           1       0.050
  IsStep(gt(1))                                                                            68.209
DsegPropertyLoadStep                                                                        0.205     0.14
                                            >TOTAL                     -           -     149.986        -
Next, since the partial query is much faster, I assume the long execution time is due to the necessary graph traversals. Hence, is it possible to cache or activate the indexes (_usesIndex=false) so that OLAP queries become much faster?
Will you please post the output of the .profile statement?
Semantically, it looks like you're trying to find all "people" under the age of 25 that have more than 1 posted question. Is that accurate?

DB2 COALESCE - notable impact on query time execution

I've noticed that using COALESCE (in my case) to avoid a possible NULL value in a prepared statement causes a decrease in DB2 query execution performance. Can someone explain what the root cause is and how I can overcome the issue? Query samples below:
QUERY 1 (execution time 3 s):
SELECT TABLE_A.Y, TABLE_B.X
FROM ...
WHERE Z = ? AND TABLE_A.ABC = ? AND
TABLE_A.QWERTY = ? AND TABLE_A.Q = TABLE_B.Q;
QUERY 2 (execution time 210 s):
SELECT TABLE_A.Y, TABLE_B.X
FROM ...
WHERE Z = ? AND (
(COALESCE(?,'')='') OR
(TABLE_A.ABC = ? AND TABLE_A.QWERTY = ? AND TABLE_A.Q = TABLE_B.Q)
);
The only difference is using (COALESCE(?,'')='').
The bigger problem I see is that QUERY 1 has 3 placeholders whereas QUERY 2 has 4 placeholders.
I think what you're trying to do is make your placeholders optional.
A simple way to do this is to fix QUERY 1 as follows:
SELECT TABLE_A.Y, TABLE_B.X
FROM TABLE_A
INNER JOIN TABLE_B ON TABLE_A.Q = TABLE_B.Q
WHERE Z = ?
AND TABLE_A.ABC = COALESCE(?,TABLE_A.ABC)
AND TABLE_A.QWERTY = COALESCE(?,TABLE_A.QWERTY)
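To see why this makes the placeholder optional, here is a worked illustration with hypothetical literal values standing in for the bound parameter. One caveat: if TABLE_A.ABC itself can be NULL, that row is filtered out even when the parameter is NULL, because NULL = NULL does not evaluate to true.
-- parameter bound to NULL: the condition degenerates to ABC = ABC, so the filter is effectively disabled
SELECT TABLE_A.Y FROM TABLE_A WHERE TABLE_A.ABC = COALESCE(NULL, TABLE_A.ABC);
-- parameter bound to a value: the condition behaves like a plain equality filter
SELECT TABLE_A.Y FROM TABLE_A WHERE TABLE_A.ABC = COALESCE('some value', TABLE_A.ABC);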

How to specify a decimal filter in 'select into table where' in ABAP?

I need a query like this in ABAP, but this doesn't work.
SELECT * FROM table
INTO i_tab
WHERE amount = 100,15
I've tried:
WHERE amount = '100,15' but that doesn't work either.
How do I specify the decimal in my WHERE clause?
The correct syntax is:
SELECT *
FROM table
INTO i_tab
WHERE amount = '100.15'
I worked around it by changing the logic to:
WHERE amount > 100 AND amount < 101

Query Optimization for MS Sql Server 2012

I have a table containing ~5,000,000 rows of SCADA data, described by the following:
create table data (o int, m money).
Where:
- o is the PK with a clustered index on it; its fill factor is close to 100%. o represents the date of the meter reading and can be thought of as the X axis.
- m is a decimal value lying within the 1..500 region; it is the actual meter reading and can be thought of as the Y axis.
I need to find out about certain patterns i.e. when, how often and for how long they had been occurring.
Example: looking for all occurrences of m changing by an amount between 500 and 510 within 5 units (well, from 1 to 5) of o, I run the following query:
select d0.o as tFrom, d1.o as tTo, d1.m - d0.m as dValue
from data d0
inner join data d1
on (d1.o = d0.o + 1 or d1.o = d0.o + 2 or d1.o = d0.o + 3 or d1.o = d0.o + 4)
and (d1.m - d0.m) between 500 and 510
the query takes 23 seconds to execute.
The previous version took 30 minutes (90 times slower); I managed to optimize it using a naive approach, by replacing on (d1.o - d0.o) between 1 and 4 with on (d0.o = d1.o - 1 or d0.o = d1.o - 2 or d0.o = d1.o - 3 or d0.o = d1.o - 4).
It's clear to me why it's faster: on the one hand an indexed column scan should work fast enough, and on the other I can afford it, as dates are discrete (and I always give 5 minutes grace time to any o region, so for 120 minutes it's the 115..120 region). I can't use the same approach with m values though, as they are integral.
Things I've tried so far:
Soft sharding, by applying where o between @oRegionStart and @oRegionEnd at the bottom of my script and running it within a loop, fetching results into a temp table. Execution time: 25 seconds.
Hard sharding, by splitting the data into a number of physical tables. The result is 2 minutes, never mind the maintenance hassle.
Using some precooked data structures, like:
create table data_change_matrix (o int, dM5Min money, dM5Max money, dM10Min money, dM10Max money ... dM{N}Min money, dM{N}Max money)
where N is the max depth for which I run the analysis. Having such a table I could easily write a query:
select * from data_change_matrix where dM5Min between 500 and 510
The result: it went nowhere due to the tremendous size requirements (5M x ~250) and the maintenance-related costs; I would need to keep that matrix up to date in close to real time.
SQL CLR: don't even ask me what went wrong; it just didn't work out.
Right now I'm out of inspiration and looking for help.
All in all - is it possible to get a close to instant response time running such type of queries on large volumes of data?
Everything runs on MS SQL Server 2012. I didn't try it on MS SQL Server 2014, but I'm happy to do so if it makes sense.
Update - execution plan: http://pastebin.com/PkSSGHvH.
Update 2 - While I really love the LAG function suggested by usr, I wonder if there's a LAGS function allowing for
select o, MIN(LAGS(o, 4)) over(...) - or what's its shortest implementation in T-SQL?
I tried something very similar using SQL CLR and got it working but the performance was awful.
I assume you meant to write "on (d1.o = ..." and not "on (d.o = ...". Anyway, I got pretty drastic improvements just by simplifying the statement (making it easy for the query optimizer to pick a better plan I guess):
select d0.o as tFrom, d1.o as tTo, d1.m - d0.m as dValue
from data d0
inner join data d1
on d1.o between d0.o + 1 and d0.o + 4
and (d1.m - d0.m) between 500 and 510
Good luck with your query!
You say you've already tried CLR but don't give any code.
It was fastest in my test for my sample data.
CREATE TABLE data
(
o INT PRIMARY KEY,
m MONEY
);
INSERT INTO data
SELECT TOP 5000000 ROW_NUMBER() OVER (ORDER BY @@SPID),
1 + ABS(CAST(CRYPT_GEN_RANDOM(4) AS INT) %500)
FROM master..spt_values v1,
master..spt_values v2
None of the versions actually returns any results (it is impossible for m to be a decimal value lying within 1..500 and simultaneously for two m values to have a difference > 500), but disregarding this, the typical timings I got for the code submitted so far are:
+-----------------+--------------------+
| | Duration (seconds) |
+-----------------+--------------------+
| Lag/Lead | 39.656 |
| Original code | 40.478 |
| Between version | 21.037 |
| CLR | 13.728 |
+-----------------+--------------------+
The CLR code I used was based on that here
To call it use
EXEC [dbo].[WindowTest]
@WindowSize = 5,
@LowerBound = 500,
@UpperBound = 510
Full code listing
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class StoredProcedures
{
public struct DataRow
{
public int o;
public decimal m;
}
[Microsoft.SqlServer.Server.SqlProcedure]
public static void WindowTest(SqlInt32 WindowSize, SqlInt32 LowerBound, SqlInt32 UpperBound)
{
int windowSize = (int)WindowSize;
int lowerBound = (int)LowerBound;
int upperBound = (int)UpperBound;
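// circular buffer holding the most recent windowSize rows (overwritten via counter % windowSize below)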
DataRow[] window = new DataRow[windowSize];
using (SqlConnection conn = new SqlConnection("context connection=true;"))
{
SqlCommand comm = new SqlCommand();
comm.Connection = conn;
comm.CommandText = @"
SELECT o,m
FROM data
ORDER BY o";
SqlMetaData[] columns = new SqlMetaData[3];
columns[0] = new SqlMetaData("tFrom", SqlDbType.Int);
columns[1] = new SqlMetaData("tTo", SqlDbType.Int);
columns[2] = new SqlMetaData("dValue", SqlDbType.Money);
SqlDataRecord record = new SqlDataRecord(columns);
SqlContext.Pipe.SendResultsStart(record);
conn.Open();
SqlDataReader reader = comm.ExecuteReader();
int counter = 0;
while (reader.Read())
{
DataRow thisRow = new DataRow() { o = (int)reader[0], m = (decimal)reader[1] };
int i = 0;
while (i < windowSize && i < counter)
{
DataRow previousRow = window[i];
var diff = thisRow.m - previousRow.m;
if (((thisRow.o - previousRow.o) <= WindowSize-1) && (diff >= lowerBound) && (diff <= upperBound))
{
record.SetInt32(0, previousRow.o);
record.SetInt32(1, thisRow.o);
record.SetDecimal(2, diff);
SqlContext.Pipe.SendResultsRow(record);
}
i++;
}
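// overwrite the oldest entry in the circular buffer with the current row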
window[counter % windowSize] = thisRow;
counter++;
}
SqlContext.Pipe.SendResultsEnd();
}
}
}
This looks like a great case for windowed aggregate functions or LAG. Here is a version using LAG:
select *
from (
select o
, lag(m, 4) over (order by o) as m4
, lag(m, 3) over (order by o) as m3
, lag(m, 2) over (order by o) as m2
, lag(m, 1) over (order by o) as m1
, m as m0
from data
) x
where 0=1
or (m1 - m0) between 500 and 510
or (m2 - m0) between 500 and 510
or (m3 - m0) between 500 and 510
or (m4 - m0) between 500 and 510
Using a windowed aggregate function you should be able to remove the manual expansion of those LAG calls.
SQL Server implements these things using a special execution plan operator called Window Spool. That makes it quite efficient.
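As a rough illustration of that idea (a sketch, not a drop-in replacement): MIN(m) over the preceding four rows bounds the largest possible rise, so a windowed aggregate can at least pre-filter candidate rows before the exact per-offset comparison is applied:
select o, m, m - mPrevMin as maxRise
from (
    select o
         , m
         , min(m) over (order by o rows between 4 preceding and 1 preceding) as mPrevMin
    from data
) x
-- some preceding row is at least 500 lower; the exact 500..510 band still needs the per-offset check
where m - mPrevMin >= 500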

MySQL - getting from random column but specifying amounts

I would like to do this
SELECT *
FROM Thoughts
ORDER BY RAND()
LIMIT 1 WHERE ups > 5
...but it is returning an error. Do you know a way around that? I'm a bit new to MySQL, thanks.
The order of the clauses is important. Do:
SELECT * FROM Thoughts WHERE ups > 5 ORDER BY RAND() LIMIT 1
Also, in the future, post the error that you're getting. "An error" is amazingly unspecific.
ORDER BY RAND() may cause performance issues; instead, try to do it the following way:
// what NOT to do:
$r = mysql_query("SELECT * FROM Thoughts WHERE ups > 5 ORDER BY RAND() LIMIT 1");
// much better:
$r = mysql_query("SELECT count(*) FROM Thoughts WHERE ups > 5 ");
$d = mysql_fetch_row($r);
$rand = mt_rand(0,$d[0] - 1);
$r = mysql_query("SELECT * FROM Thoughts WHERE ups > 5 LIMIT $rand, 1");