SQL in KDB or am I crazy?

I am trying to see if I can use KDB for some of my current work. I have a fair bit of code in legacy SQL, and the prospect of reusing it seems exciting.
Which is when I came across: http://code.kx.com/q/interfaces/q-client-for-odbc/
This link only speaks of SQL select - is it OK to use this for insert and delete as well? What about performance?

Based on your question, I'm not sure this will do what you are hoping for. You seem to want to reuse SQL code on a non-SQL database.
This driver does not run SQL against the current database; it allows you to connect to an external database and pull back data using the SQL capability of that other database. (ODBC is a standardised driver system for connecting to various kinds of databases, sending queries, and returning data.)
This would only be useful if you intended to leave two different databases running side by side and needed them to interact at the database level (rather than, as @millimoose mentions above, connecting to them individually from your application).

It is seldom used, but there is a way to use ANSI SQL with KDB. Just prefix the query with s)
q)t:([]col1:1 1 2 2;col2:10 10 20 20; col3:5.0 2.0 2.3 2.4; grp:`a`a`b`c)
q)t
col1 col2 col3 grp
------------------
1 10 5 a
1 10 2 a
2 20 2.3 b
2 20 2.4 c
q) /standard select
q)select from t
col1 col2 col3 grp
------------------
1 10 5 a
1 10 2 a
2 20 2.3 b
2 20 2.4 c
q)/SQL type select with select *
q)select * from t
'rank
q) /Prefix the query with s)
q)s)select * from t
col1 col2 col3 grp
------------------
1 10 5 a
1 10 2 a
2 20 2.3 b
2 20 2.4 c
That said, this feature is rarely used, the parser is not optimized for this kind of usage, and resources on it are scarce. You would probably spend more time debugging issues with it than you would converting your code to q. Hope this helps.
Another option is to use the qodbc server -- http://code.kx.com/q/interfaces/q-server-for-odbc/

Related

Cross join in Excel or SQL

I am using Gephi to create a network graph, here is a small subset of the data that I have:
ID Label
1 Sleep quality
2 Stress
3 Healthy Eating
4 Tremor
5 Balance
6 Drooling
7 Exercise
8 Mood
9 Speech
10 Parkinson's On-Off
So I want my graph to have these 10 nodes.
Then for the edges, I have:
Source Target User
1 5 5346
5 4 5346
4 7 5346
7 6 5346
6 9 5346
9 3 5346
3 2 5346
2 8 5346
8 10 5346
The "User" column is something I have added to explain the problem I am having. I am using a big database (in SQL) to obtain this data. On a mobile phone app, users select 10 of the different choices available (as listed in the nodes). In SQL I can query the data easily so that I can obtain the 10 choices of each of the users.
It is easy to create a graph from the information in the edges table, but I would also like to connect each node to all other nodes; this is important for me. So for example, 1 connects to all those in "Target". Then 5 connects to all those in "Target", and so on until all nodes are connected to each other for each user.
I can do this manually but the original data set has 2000+ users and this will take a long time. I know that there is a way of using cross join, possibly in Excel or in SQL... but I'm unsure how to do this..
Thanks!

You can drop this cross join into your SQL (it will list every Source with every possible Target):
(SELECT e.Source as Source, n.ID as Target
FROM
(SELECT DISTINCT Source FROM tblEdges) as e
cross join (SELECT DISTINCT ID FROM tblNodes) as n
) as xCross
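For anyone who wants to try it, here is a minimal sketch that runs the same cross join on the sample data from the question, using Python's built-in sqlite3 module. The table names tblEdges and tblNodes come from the answer above; the in-memory database and the column types are assumptions made for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblNodes (ID INTEGER PRIMARY KEY, Label TEXT);
    CREATE TABLE tblEdges (Source INTEGER, Target INTEGER, User INTEGER);
""")

# Sample nodes and edges copied from the question.
labels = ["Sleep quality", "Stress", "Healthy Eating", "Tremor", "Balance",
          "Drooling", "Exercise", "Mood", "Speech", "Parkinson's On-Off"]
conn.executemany("INSERT INTO tblNodes VALUES (?, ?)", enumerate(labels, start=1))
edges = [(1, 5), (5, 4), (4, 7), (7, 6), (6, 9), (9, 3), (3, 2), (2, 8), (8, 10)]
conn.executemany("INSERT INTO tblEdges VALUES (?, ?, 5346)", edges)

# Every distinct Source paired with every node ID.
pairs = conn.execute("""
    SELECT e.Source AS Source, n.ID AS Target
    FROM (SELECT DISTINCT Source FROM tblEdges) AS e
    CROSS JOIN (SELECT DISTINCT ID FROM tblNodes) AS n
    ORDER BY e.Source, n.ID
""").fetchall()

print(len(pairs))  # 9 distinct sources x 10 nodes = 90 rows
```

Note that the result includes self-pairs such as (1, 1); a WHERE e.Source <> n.ID filter would drop them if they are not wanted.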

Aggregation over order-dependent partition?

I have a source data set like this (simplified to be more clear):
Key F1 F2
1 X 4
2 X 5
3 Y 6
4 X 9
5 X 7
6 X 8
7 Y 9
8 X 6
9 X 5
10 Y 3
The data is sorted by the Key field. Now, I want to compute an aggregate of the F2 field over partitions that are defined by the F1 field: A partition starts at the first X value and ends with the first subsequent Y value.
So, for example, I might want to compute the MIN() over the partitions defined as described above. Then the result set would look like this:
rownum MIN(F2)
1 4
2 7
3 3
I have tried a number of resources (incl. our own intranet community and of course stackoverflow) but found nothing for my case. Usually partitioning only works with a field that can be used to identify the partitions. Here, the partitions are defined by a change in a field's content with respect to a given order.
Although I am aware that I may have to resort to writing a procedural solution I would prefer to solve this in pure SQL.
Any ideas how such a partitioning could be achieved with a SQL select statement?
Thanks and regards
Kai.

A little bit shorter solution: http://sqlfiddle.com/#!12/7390d/24
Query:
select min(f2)
from t t1
group by (select max(key)
from t t2
where t2.f1='Y' and
t1.key > t2.key)
Result:
| MIN |
-------
| 4 |
| 7 |
| 3 |
The idea is to find the key of preceding 'Y' for each row and group by it. Should work with any SQL engine.
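A small sketch of that idea on the sample data, using Python's built-in sqlite3 module. The table name t and the columns key, f1, f2 follow the query above; the in-memory database is an assumption. One deliberate difference: instead of grouping directly by the correlated subquery, this version first computes the preceding 'Y' key in a derived table (the hypothetical names tagged and prev_y are mine) and then groups by it, which also makes it easy to order the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" is quoted because KEY is a keyword in several SQL dialects.
conn.execute('CREATE TABLE t ("key" INTEGER PRIMARY KEY, f1 TEXT, f2 INTEGER)')
rows = [(1, "X", 4), (2, "X", 5), (3, "Y", 6), (4, "X", 9), (5, "X", 7),
        (6, "X", 8), (7, "Y", 9), (8, "X", 6), (9, "X", 5), (10, "Y", 3)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Tag each row with the key of the closest preceding 'Y' row
# (NULL for the first partition), then aggregate per tag.
query = """
    SELECT MIN(f2)
    FROM (
        SELECT t1.f2,
               (SELECT MAX(t2."key") FROM t AS t2
                WHERE t2.f1 = 'Y' AND t2."key" < t1."key") AS prev_y
        FROM t AS t1
    ) AS tagged
    GROUP BY prev_y
    ORDER BY prev_y
"""
minima = [row[0] for row in conn.execute(query)]
print(minima)  # [4, 7, 3]
```

SQLite sorts NULL before other values, so the first partition (which has no preceding 'Y') comes out first; other engines may need an explicit NULLS FIRST or a COALESCE.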

You didn't specify engine or dialect or version so I assumed SQL Server 2012.
Example that you can run to see the solution: http://sqlfiddle.com/#!6/f5d38/21
You solve it by creating correct partitions in your set. Code looks like this.
WITH groupLimits as
(
SELECT
[Key] AS groupend
,COALESCE(LAG([Key]) OVER (order by [Key]),0)+1 AS groupstart
FROM sourceData
WHERE F1 = 'Y'
)
SELECT
MIN(sourceData.F2)
FROM groupLimits
INNER JOIN sourceData
ON sourceData.[Key] BETWEEN groupLimits.groupstart and groupLimits.groupend
GROUP BY groupLimits.groupstart
ORDER BY groupLimits.groupstart

how does a SQL query work?

How does a SQL query work?
How does it get compiled?
Is the from clause compiled first to see if the table exists?
How does it actually retrieve data from the database?
How and in what format are the tables stored in a database?
I am using phpmyadmin, is there any way I can peek into the files where data is stored?
I am using MySQL

Logical SQL execution order:
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT
A SQL query works in three main phases:
1) Row filtering, phase 1: done by the FROM, WHERE, GROUP BY and HAVING clauses.
2) Column filtering: columns are filtered by the SELECT clause.
3) Row filtering, phase 2: done by the DISTINCT, ORDER BY and LIMIT clauses.
I will explain with an example. Suppose we have a students table as follows:
id_ name_ marks section_
1 Julia 88 A
2 Samantha 68 B
3 Maria 10 C
4 Scarlet 78 A
5 Ashley 63 B
6 Abir 95 D
7 Jane 81 A
8 Jahid 25 C
9 Sohel 90 D
10 Rahim 80 A
11 Karim 81 B
12 Abdullah 92 D
Now we run the following sql query:
select section_,sum(marks) from students where id_<10 GROUP BY section_ having sum(marks)>100 order by section_ LIMIT 2;
Output of the query is:
section_ sum
A 247
B 131
But how did we get this output?
The query is explained step by step below:
1. FROM, WHERE clause execution
The FROM clause runs first, so from students where id_<10 eliminates the rows whose id_ is greater than or equal to 10. The following rows remain after executing from students where id_<10:
id_ name_ marks section_
1 Julia 88 A
2 Samantha 68 B
3 Maria 10 C
4 Scarlet 78 A
5 Ashley 63 B
6 Abir 95 D
7 Jane 81 A
8 Jahid 25 C
9 Sohel 90 D
2. GROUP BY clause execution
Next comes the GROUP BY clause. After executing GROUP BY section_, the rows are grouped as shown below:
id_ name_ marks section_
9 Sohel 90 D
6 Abir 95 D
1 Julia 88 A
4 Scarlet 78 A
7 Jane 81 A
2 Samantha 68 B
5 Ashley 63 B
3 Maria 10 C
8 Jahid 25 C
3. HAVING clause execution
having sum(marks)>100 eliminates groups. sum(marks) is 185 for group D, 247 for group A, 131 for group B and 35 for group C. The sum for group C is not greater than 100, so group C is eliminated and the table looks like this:
id_ name_ marks section_
9 Sohel 90 D
6 Abir 95 D
1 Julia 88 A
4 Scarlet 78 A
7 Jane 81 A
2 Samantha 68 B
5 Ashley 63 B
4. SELECT clause execution
select section_,sum(marks) only decides which columns to print. Here it prints the section_ and sum(marks) columns.
section_ sum
D 185
A 247
B 131
5. ORDER BY clause execution
order by section_ sorts the rows in ascending order of section_.
section_ sum
A 247
B 131
D 185
6. LIMIT clause execution
LIMIT 2; prints only the first 2 rows.
section_ sum
A 247
B 131
This is how we got our final output .
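To check the walkthrough, here is a minimal sketch that loads the same students table into an in-memory SQLite database (an assumption; the question is about MySQL, but this query uses nothing engine-specific) and runs the query with Python's built-in sqlite3 module. The variable names after_having and final are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id_ INTEGER, name_ TEXT, marks INTEGER, section_ TEXT)"
)
students = [
    (1, "Julia", 88, "A"), (2, "Samantha", 68, "B"), (3, "Maria", 10, "C"),
    (4, "Scarlet", 78, "A"), (5, "Ashley", 63, "B"), (6, "Abir", 95, "D"),
    (7, "Jane", 81, "A"), (8, "Jahid", 25, "C"), (9, "Sohel", 90, "D"),
    (10, "Rahim", 80, "A"), (11, "Karim", 81, "B"), (12, "Abdullah", 92, "D"),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", students)

# Steps 1-5: everything except LIMIT, to see which groups survive HAVING.
after_having = conn.execute("""
    SELECT section_, SUM(marks) FROM students
    WHERE id_ < 10
    GROUP BY section_
    HAVING SUM(marks) > 100
    ORDER BY section_
""").fetchall()
print(after_having)  # [('A', 247), ('B', 131), ('D', 185)]

# Step 6: the full query from the answer, including LIMIT 2.
final = conn.execute("""
    SELECT section_, SUM(marks) FROM students
    WHERE id_ < 10
    GROUP BY section_
    HAVING SUM(marks) > 100
    ORDER BY section_
    LIMIT 2
""").fetchall()
print(final)  # [('A', 247), ('B', 131)]
```

Group C (sum 35) never appears because HAVING removes it before SELECT, and group D is cut only at the very end by LIMIT.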

Well...
First you have a syntax check, followed by the generation of an expression tree. At this stage you can also test whether elements exist and "line up" (i.e. the fields do exist WITHIN the table). This is the first step: on any error here, you just tell the submitter to get real.
Then you have... analysis. A SQL query is different from a program in that it does not say HOW to do something, just WHAT THE RESULT IS. Set-based logic. So a query analyzer comes in (ranging from bad to good depending on the product: Oracle had crappy ones for a long time, DB2 the most sensitive ones, even measuring disk speed) to decide how best to approach this result. This is a really complicated beast: it may try dozens or hundreds of approaches to find the one it believes to be fastest (cost-based, basically using some statistics).
Then that gets executed.
The query analyzer, by the way, is where you see huge differences. Not sure about MySQL. SQL Server (Microsoft) shines not because it has the best one (though it is one of the good ones), but because it has really nice visual tools to SHOW the query plan and compare the analyzer's estimates to the real numbers (if they differ too much, table statistics may be off, so the analyzer THINKS a large table is small).
DB2 had a great optimizer for some time, measuring (as I already said) disk speed to feed into its estimates. Oracle went "left to right" (no real analysis) for a long time, and took user-provided query hints (crap approach). I think MySQL was VERY primitive at the start too; not sure where it is now.
The table format in the database etc. is really something you should not care about. It is documented (certainly for an open source database), but why should you care? I have done SQL work for nearly 15 years and never had that need, and that includes quite high-end work in some areas. Unless you are building a database file repair tool, it makes no sense to bother.

The order of SQL statement clause execution:
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
My answer is specific to Oracle database, which provides tutorials pertaining to your queries. When the database engine processes any SQL statement, it first parses it, and within parsing it performs three checks: Syntax, Semantic and Shared Pool. To learn how these checks work, follow the link below.
Once parsing is done, it triggers the execution plan. But hey, Database Engine, you are smart enough! You check whether this SQL query has already been parsed (Soft Parse); if so, you jump directly to the execution plan, or else you dive deep and optimize the query (Hard Parse). While performing a hard parse, you also use a piece of software for Row Source Generation, which produces an iterative execution plan from the plan received from the optimizer. Enough! See the SQL query processing stages below.
Note - Before the execution plan, it also performs Bind operations for variable values; once the query is executed, it performs Fetch to obtain the records and finally stores them in the result set. So in short, the order is:
PARSE -> BIND -> EXECUTE -> FETCH
And for in depth details, this tutorial is waiting for you.
This may be helpful to someone.

If you're using SSMS for SQL Server and want to know where your data files are stored, you can use this query:
SELECT
mdf.database_id,
mdf.name,
mdf.physical_name as data_file,
ldf.physical_name as log_file,
db_size = CAST((mdf.size * 8.0)/1024 AS DECIMAL(8,2)),
log_size = CAST((ldf.size * 8.0 / 1024) AS DECIMAL(8,2))
FROM (SELECT * FROM sys.master_files WHERE type_desc = 'ROWS' ) mdf
JOIN (SELECT * FROM sys.master_files WHERE type_desc = 'LOG' ) ldf
ON mdf.database_id = ldf.database_id

Looking for an SQL statement which groups by type

First, I was pretty lost trying to give this question a correct title.
I'm working on a system which allows me to find specific networking devices. A network device (called "system" in my example) has a number of ports, where each port can have a specific configuration. An example would be: Return all devices which have at least 2 ports of type 100BASE-TX and at least 1 port of 1000BASE-TX.
Here's my example table which is named "ports":
system port type
1 1 10BASE-T
1 1 100BASE-TX
1 1 1000BASE-TX
1 2 10BASE-T
1 2 100BASE-TX
1 2 1000BASE-TX
1 3 10BASE-T
1 3 100BASE-TX
1 3 1000BASE-TX
2 1 100BASE-TX
2 2 100BASE-TX
2 3 100BASE-TX
Column descriptions:
"system" is the ID of the system which contains the ports
"port" is the ID of the port
"type" is the type which that single port can have
I'm pretty lost here, and I don't ask for a complete query, maybe some hints are enough for me to figure out the rest. I already tried to join the table with itself to retrieve all possible port combinations, but from that point I was lost again.
Here's my pseudo-code:
SELECT system FROM ports WHERE (number-of-possible-100base-tx-ports >= 2 AND number-of-possible-1000base-tx-ports >= 1)
Here's my expected result:
system
1
It is important to know that a port can be either of one or another type. Basically I want the user to ask: "List all devices which support 2 100BASE-TX ports and at least 1 1000BASE-TX port at the same time". For example, the following pseudo-sql should not return any results:
SELECT system FROM ports WHERE (number-of-possible-100base-tx-ports >= 2 AND number-of-possible-1000base-tx-ports >= 2)
This query shouldn't return any result since no device has more than three ports overall.
EDIT
Here's another pseudo-SQL which represent the question better:
SELECT system FROM ports WHERE (at-least-1-type = 1000BASE-TX AND at-least-2-other-types = 100BASE-TX) AND portid-from-type-1000BASE-TX <> portid-from-type-100BASE-TX
EDIT #2
After one night, I realized that it might not be possible using plain SQL. What I need would be an intermediate table containing all possible configurations per system, and I believe that table would be quite huge. Given the example table above, I would already have 27 different combinations for system 1; regular networking devices have 12, 24 or 48 ports and storing all combinations in a database wouldn't be very efficient. I have to think of a programmatic way to solve this problem.
Thanks in advance!
Timo

I've had a bash at this using SQLite and this query seems to work ok for the limited test data I've tested it against.
select sys as system from (
select a.sys, count(distinct a.port) as want_a, count(distinct b.port) as want_b
from test a left join test b
on a.sys=b.sys and a.port<>b.port and a.type<>b.type
where
a.type='$type_a'
and (b.type='$type_b' or b.type is null)
and a.sys in (
select sys from test group by sys having count(distinct port) >= $want_a+$want_b
)
group by a.sys
having want_a >= $want_a and want_b >= $want_b
) z;
Where $want_a is the count of ports for $type_a and $want_b is the count for $type_b. So your initial query up there has want_a=2, type_a='100BASE-TX', want_b=1, type_b='1000BASE-TX'.
In the gist, the first file is mysql.sh (test driver script, ./mysql.sh < test.txt), the second is test.txt (test data), the third is gotest.sh (sqlite3 driver script, ./gotest.sh < test.txt), and the fourth is the SQL. All tests PASS in mysql and sqlite3, so that's promising.

I think you need to clarify this requirement a bit further:
Return all devices which have at least
2 ports of type 100BASE-TX and at
least 1 port of 1000BASE-TX.
Since a single port can have more than one type, would this device satisfy the query or not?
port type
1 100BASE-TX
1 1000BASE-TX
2 100BASE-TX
Taking your requirement literally, I think this device qualifies, but I suspect what you really want is a device which can support 2 100BASE-TX and 1 1000BASE-TX connections at the same time, so would need to have at least 3 ports.

The answer here is to normalize the data.
Table 1:
System ID - key
Description
Table 2:
Port Type ID - key
Description
Table 3:
System ID - Key
Port ID - Key
Port Type
Select count(*), port_type from table_1 a, table_3 c
where a.system_id=c.system_id
group by port_type
having count(*) > 2
I hope that gets you close.

Well, first I might question this table structure... but from your description of the system you're working on, it may be difficult to change...
I would suggest a query with two sub-queries like this:
SELECT *
FROM
(
SELECT COUNT(port) AS Port1Count, system
FROM ports
WHERE type = '100BASE-TX'
GROUP BY system
HAVING COUNT(port) >= 2
) AS A
INNER JOIN
(
SELECT COUNT(port) AS Port2Count, system
FROM ports
WHERE type = '1000BASE-TX'
GROUP BY system
HAVING COUNT(port) >= 1
) AS B ON A.system = B.system
This may not be EXACT depending on the flavor of SQL and I may have some syntax wrong here or there since I didn't actually try building your table. Hope it helps!
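As a rough sketch of how the counting idea plays out on the sample ports table, here is a version run through Python's built-in sqlite3 module. It is not the two-sub-query form above: it uses conditional aggregation in a single GROUP BY, and it adds a third check (enough distinct ports overall for both requests) in the spirit of the SQLite answer earlier in the thread. The function name matching_systems and the in-memory database are assumptions for illustration, and the claim that these three counts are enough is only argued for the two-type case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (system INTEGER, port INTEGER, type TEXT)")

# Sample data from the question: system 1 has three ports that each support
# all three types; system 2 has three ports that support 100BASE-TX only.
types = ["10BASE-T", "100BASE-TX", "1000BASE-TX"]
rows = [(1, p, t) for p in (1, 2, 3) for t in types]
rows += [(2, p, "100BASE-TX") for p in (1, 2, 3)]
conn.executemany("INSERT INTO ports VALUES (?, ?, ?)", rows)


def matching_systems(type_a, want_a, type_b, want_b):
    """Return systems that can offer want_a ports of type_a and want_b
    ports of type_b on different ports (hypothetical helper)."""
    query = """
        SELECT system
        FROM ports
        GROUP BY system
        HAVING COUNT(DISTINCT CASE WHEN type = :type_a THEN port END) >= :want_a
           AND COUNT(DISTINCT CASE WHEN type = :type_b THEN port END) >= :want_b
           AND COUNT(DISTINCT CASE WHEN type IN (:type_a, :type_b) THEN port END)
               >= :want_a + :want_b
        ORDER BY system
    """
    params = {"type_a": type_a, "want_a": want_a,
              "type_b": type_b, "want_b": want_b}
    return [row[0] for row in conn.execute(query, params)]


print(matching_systems("100BASE-TX", 2, "1000BASE-TX", 1))  # [1]
print(matching_systems("100BASE-TX", 2, "1000BASE-TX", 2))  # []
```

The second call returns nothing because system 1 has only three ports in total, which matches the behaviour the question asks for. Without the third condition, system 1 would be returned in both cases.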

SQL Recursive Tables

I have the following tables, the groups table which contains hierarchically ordered groups and group_member which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
ID PARENT_ID NAME
---------------------------
1 NULL Cerebra
2 1 CATS
3 2 CATS 2.0
4 1 Cerepedia
5 4 Cerepedia 2.0
6 1 CMS
ID GROUP_ID USER_ID
---------------------------
1 1 3
2 1 4
3 1 5
4 2 7
5 2 6
6 4 6
7 5 12
8 4 9
9 1 10
I want to retrieve the visible groups for a given user. That is to say, the groups a user belongs to and the children of these groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.

Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has less than, say, 20 levels, then you could probably get away with it by creating a view that showed 20 levels from every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END IF;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/

Joe Celko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely, by using nested sets. That complicates the updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.

I don't think that this can be accomplished without using recursion. You can accomplish it with a single stored procedure using MySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain how much impact this would have on performance versus the multiple-query approach. MySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.

Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild

A similar question has already been raised.
Here is my answer (a bit edited):
I am not sure I understand your question correctly, but this could work: My take on trees in SQL.
The linked post describes a method of storing a tree in a database (PostgreSQL in that case), but the method is clear enough that it can easily be adapted to any database.
With this method you can easily update all the nodes that depend on a modified node K with about N simple SELECT queries, where N is the distance of K from the root node.
Good Luck!

I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?

There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.
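For illustration, the SQL:1999 feature in question is the recursive common table expression (WITH RECURSIVE). Below is a minimal sketch on the sample data from the question, run against an in-memory SQLite database with Python's built-in sqlite3 module (an assumption; the question uses MySQL, which, as far as I know, supports the same syntax from version 8.0). The CTE name visible and the helper visible_groups are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "groups" is quoted because GROUPS is a keyword in some engines.
conn.executescript("""
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE group_member (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);
    INSERT INTO "groups" VALUES
        (1, NULL, 'Cerebra'), (2, 1, 'CATS'), (3, 2, 'CATS 2.0'),
        (4, 1, 'Cerepedia'), (5, 4, 'Cerepedia 2.0'), (6, 1, 'CMS');
    INSERT INTO group_member VALUES
        (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 7), (5, 2, 6),
        (6, 4, 6), (7, 5, 12), (8, 4, 9), (9, 1, 10);
""")


def visible_groups(user_id):
    """Groups the user belongs to, plus all their descendants."""
    query = """
        WITH RECURSIVE visible(id) AS (
            -- anchor: groups the user is a direct member of
            SELECT group_id FROM group_member WHERE user_id = ?
            UNION
            -- recursive step: children of groups found so far
            SELECT g.id
            FROM "groups" AS g
            JOIN visible AS v ON g.parent_id = v.id
        )
        SELECT id FROM visible ORDER BY id
    """
    return [row[0] for row in conn.execute(query, (user_id,))]


print(visible_groups(9))   # [4, 5]
print(visible_groups(12))  # [5]
print(visible_groups(3))   # [1, 2, 3, 4, 5, 6]
```

For user 3 this returns group 3 as well, since CATS 2.0 is a child of CATS, which is a child of Cerebra. UNION (rather than UNION ALL) removes duplicates when a user belongs to both a group and one of its descendants.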
