How to apply combinatorics on pairs in sql database?

How to apply combinatorics on pairs in sql database? - sql

I want to perform some kind of combinatorics from a database.
my database table:
start | end | costs | date
____________________________________
berlin | Moscow | 100 | 2014-12-10
berlin | paris | 200 | 2014-12-13
Moscow | berlin | 150 | 2014-12-20
Moscow | berlin | 100 | 2014-12-11
Possible pairs are all start-end combinations that have an equal combination ofend-start.
In my table, this applies to berlin-moscow and moscow-berlin.
I want to compute "roundtrips" going from one city to another, and returning later to the same startcity.
The resulting table I want to achieve would be:
start | end | costs | away | wayback
berlin | moscow | 250 | 2014-12-10 | 2014-12-20
berlin | moscow | 200 | 2014-12-10 | 2014-12-11
(this implies that when starting in berlin and going to moscow, the wayback will be moscow-berlin).
Is that possible with database queries?
First of all: how could I selfjoin a tables and get all distinct start-end pairs?

As a comment, i would advise you to not use reserved words as End for user defined objects as tables or columns.
SELECT F.Start,
F."end",
F.costs + R.costs AS costs,
F.date AS away,
R.date AS wayback
FROM Table1 F
JOIN Table1 R
ON F."end" = R.Start
AND R.date>F.date
AND F.start = R."end"
SQL Fiddle Demo

Related

SQL - specific requirement to compare tables

I'm trying to merge 2 queries into 1 (cuts the number of daily queries in half): I have 2 tables, I want to do a query against 1 table, then the same query against the other table that has the same list just less entries.
Basically its a list of (let's call it for obfuscation) people and hobby. One table is ALL people & hobby, the other shorter list is people & hobby that I've met. Table 2 would all be found in table 1. Table 1 includes entries (people I have yet to meet) not found in table 2
The tables are synced up from elsewhere, what I'm looking to do is print a list of ALL people in the first column then print the hobby ONLY of people that are on both lists. That way I can see the lists merged, and track the rate at which the gap between both lists is closing. I have tried a number of SQL combinations but they either filter out the first table and match only items that are true for both (i.e. just giving me table 2) or just adding table 2 to table 1.
Example of what I'm trying to do below:
+---------+----------+--+----------+---------+--+---------+----------+
| table1 | | | table2 | | | query | |
+---------+----------+--+----------+---------+--+---------+----------+
| name | hobby | | activity | person | | name | hobby |
| bob | fishing | | fishing | bob | | bob | fishing |
| bill | vidgames | | hiking | sarah | | bill | |
| sarah | hiking | | planking | sabrina | | sarah | hiking |
| mike | cooking | | | | | mike | |
| sabrina | planking | | | | | sabrina | planking |
+---------+----------+--+----------+---------+--+---------+----------+
Normally I'd just take the few days to learn SQL a bit better however I'm stretched pretty thin at work as it is!
I should mention the table 2 is flipped and the headings are all unique (don't think this matters)!

I think you just want a left join:
select t1.name, t2.activity as hobby
from table1 t1 left join
table2 t2
on t1.name = t2.person;

SQL - UNION vs NULL functions. Which is better?

I have three tables: ACCT, PERS, ORG. Each ACCT is owned by either a PERS or ORG. The PERS and ORG tables are very similar and so are all of their child tables, but all PERS and ORG data is separate.
I'm writing a query to get PERS and ORG information for each account in ACCT and I'm curious what the best method of combining the information is. Should I use a series of left joins and NULL functions to fill in the blanks, or should I write the queries separately and use UNION to combine?
I've already written separate queries for PERS ACCT's and another for ORG ACCT's and plan on using UNION. My question more pertains to best practice in the future.
I'm expecting both to give me my desired my results, but I want to find the most efficient method both in development time and run time.
EDIT: Sample Table Data
ACCT Table:
+---------+---------+--------------+-------------+
| ACCTNBR | ACCTTYP | OWNERPERSNBR | OWNERORGNBR |
+---------+---------+--------------+-------------+
| 555001 | abc | 3010 | |
| 555002 | abc | | 2255 |
| 555003 | tre | 5125 | |
| 555004 | tre | 4485 | |
| 555005 | dsa | | 6785 |
+---------+---------+--------------+-------------+
PERS Table:
+---------+--------------+---------------+----------+-------+
| PERSNBR | PHONE | STREET | CITY | STATE |
+---------+--------------+---------------+----------+-------+
| 3010 | 555-555-5555 | 1234 Main St | New York | NY |
| 5125 | 555-555-5555 | 1234 State St | New York | NY |
| 4485 | 555-555-5555 | 6542 Vine St | New York | NY |
+---------+--------------+---------------+----------+-------+
ORG Table:
+--------+--------------+--------------+----------+-------+
| ORGNBR | PHONE | STREET | CITY | STATE |
+--------+--------------+--------------+----------+-------+
| 2255 | 222-222-2222 | 1000 Main St | New York | NY |
| 6785 | 333-333-3333 | 400 4th St | New York | NY |
+--------+--------------+--------------+----------+-------+
Desired Output:
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
| ACCTNBR | ACCTTYP | OWNERPERSNBR | OWNERORGNBR | PHONE | STREET | CITY | STATE |
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
| 555001 | abc | 3010 | | 555-555-5555 | 1234 Main St | New York | NY |
| 555002 | abc | | 2255 | 222-222-2222 | 1000 Main St | New York | NY |
| 555003 | tre | 5125 | | 555-555-5555 | 1234 State St | New York | NY |
| 555004 | tre | 4485 | | 555-555-5555 | 6542 Vine St | New York | NY |
| 555005 | dsa | | 6785 | 333-333-3333 | 400 4th St | New York | NY |
+---------+---------+--------------+-------------+--------------+---------------+----------+-------+
Query Option 1: Write 2 queries and use UNION to combine them:
select a.acctnbr, a.accttyp, a.ownerpersnbr, a.ownerorgnbr, p.phone, p.street, p.city, p.state
from acct a
inner join pers p on p.persnbr = a.ownerpersnbr
UNION
select a.acctnbr, a.accttyp, a.ownerpersnbr, a.ownerorgnbr, o.phone, o.street, o.city, o.state
from acct a
inner join org o on o.orgnbr = a.ownerorgnbr
Option 2: Use NVL() or Coalesce to return a single data set:
SELECT a.acctnbr,
a.accttyp,
NVL(a.ownerpersnbr, a.ownerorgnbr) Owner,
NVL(p.phone, o.phone) Phone,
NVL(p.street, o.street) Street,
NVL(p.city, o.city) City,
NVL(p.state, o.state) State
FROM
acct a
LEFT JOIN pers p on p.persnbr = a.ownerpersnbr
LEFT JOIN org o on o.orgnbr = a.ownerorgnbr
There are way more fields in each of the 3 tables as well as many more PERS and ORG tables in my actual query. Is one way better (faster, more efficient) than another?

That depends, on what you consider "better".
Assuming, that you will always want to pull all rows from ACCT table, I'd say to go for the LEFT OUTER JOIN and no UNION. (If using UNION, then rather go for UNION ALL variant.)
EDIT: As you've already shown your queries, mine is no longer required, and did not match your structures. Removing this part.
Why LEFT JOIN? Because with UNION you'd have to go through ACCT twice, based on "parent" criteria (whether separate or done INNER JOIN criteria), while with plain LEFT OUTER JOIN you'll probably get just one pass through ACCT. In both cases, rows from "parents" will most probably be accessed based on primary keys.
As you are probably considering performance, when looking for "better", as always: Test your queries and look at the execution plans with adequate and fresh database statistics in place, as depending on the data "layout" (histograms, etc.) the "better" may be something completely different.

I think you misunderstand what a Union does versus a join statement. A union takes the records from multiple tables, generally similar or the same structure and combines them into a single resultset. It is not meant to combine multiple dissimilar tables.
What I am seeing is that you have two tables PERS and ORG with some of the same data in it. In this case I suggest you union those two tables and then join to ACCT to get the sample output.
In this case to get the output as you have shown you would want to use Outer joins so that you don't drop any records without a match. That will give you nulls in some places but most of the time that is what you want. It is much easier to filter those out later.
Very rough sample code.
SELECT a.*, b.*
from Acct as a
FULL OUTER JOIN (
Select * from PERS UNION Select * from ORG
) as b
ON a.ID = b.ID

Remove newest redundant row and update timestamp

I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date("now")
AND (_dateExpired IS NULL OR date(_dateExpired) > date("now"))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well...I'm going for the ability to query data at a given point in time - for example, "what was Jim's status around 2016-05-06?"
Thanks in advance!

Consider using a staging table where the dump file goes into a DumpTable (regularly cleaned out before each dump) and then an INSERT...SELECT query migrates to your final table.
Now the SELECT portion maintains a correlated subquery (to calculate new [_dateExpired] for needed rows) and derived table subquery (to filter out non-dups according to your criteria). Finally, the LEFT JOIN...NULL with FinalTable is to ensure no duplicate records are appended, assuming [_id] is a unique identifier. Below is the routine:
Clean Out DumpTable
DELETE FROM DumpTable;
Run Dump Routine to be appended into DumpTable
Append Records to FinalTable
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
[name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
(SELECT Min(date(sub.[_dateEffective], '-1 day'))
FROM DumpTable sub
WHERE sub.[name] = DumpTable.[name]
AND sub.[_dateEffective] > DumpTable.[_dateEffective]
AND sub.status <> DumpTable.status) As calcExpired
d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
(SELECT Min(DumpTable.[_id]) AS min_id,
DumpTable.name, DumpTable.status
FROM DumpTable
GROUP BY DumpTable.name, DumpTable.status) AS c
ON (c.name = d.name)
AND (c.min_id = d.[_id])
AND (c.status = d.status)
LEFT JOIN FinalTable f
ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA

Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key, 1 or more columns in the data that uniquely identify the data. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare them unique to prevent accidental duplication.

Can I merge SQL Server tables if they have not exactly the same structure?

I have two tables, source and target.
source:
+--------+------+-------------+
| Name | Year | City |
+--------+------+-------------+
| Toyota | 2002 | Los Angeles |
| Seat | 2012 | Madrid |
+--------+------+-------------+
target:
+----+---------+------+----------+
| ID | Name | Year | City |
+----+---------+------+----------+
| 1 | Bentley | 1969 | Budapest |
| 2 | Toyota | 1988 | New York |
| 3 | Ford | 2001 | Tokyo |
| 4 | Seat | 1995 | Madrid |
| 5 | Bugatti | 1995 | London |
+----+---------+------+----------+
I want to merge source into target. I know the MERGE command, it's fine. The issue is that the source has no column ID so that it won't match.
Since Name column in both are unique I only need to match if they are equal, then if not exists insert into target, if exists update target.
I could do it using NOT EXIST statement, but we are talking about billions of rows so MERGE would be a much quicker solution.
So can I somehow set the MERGE command to take only that column into account when matching?

Yes, you can:
MERGE target t
USING source s
ON t.name = s.name
WHEN NOT MATCHED
INSERT (Name, Year, City)
VALUES (s.Name, s.Year, s.City)
WHEN MATCHED THEN
UPDATE SET Year = s.Year,
City = s.City;
If your ID column in target is not IDENTITY column you can create sequence to populate it.

How to add column with the value of another dimension?

I appologize if the title does not make sense. I am trying to do something that is probably simple, but I have not been able to figure it out, and I'm not sure how to search for the answer. I have the following MDX query:
SELECT
event_count ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
which returns something like this:
| | event_count |
+---------------+-------------+
| P Davis | 123 |
| J Davis | 123 |
| A Brown | 120 |
| K Thompson | 119 |
| R White | 119 |
| M Wilson | 118 |
| D Harris | 118 |
| R Thompson | 116 |
| Z Williams | 115 |
| X Smith | 114 |
I need to include an additional column (gender). Gender is not a metric. It's just another dimension on the data. For instance, consider this query:
SELECT
gender.children ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
But this is not what I want! :(
| | female | male | unknown |
+--------------+--------+------+---------+
| P Davis | | | 123 |
| J Davis | | 123 | |
| A Brown | | 120 | |
| K Thompson | | 119 | |
| R White | 119 | | |
| M Wilson | | | 118 |
| D Harris | | | 118 |
| R Thompson | | | 116 |
| Z Williams | | | 115 |
| X Smith | | | 114 |
Nice try, but I just want three columns: name, event_count, and gender. How hard can it be?
Obviously this reflects lack of understanding about MDX on my part. Any pointers to quality introductory material would be appreciated.

It's important to understand that in MDX you are building sets of members on each axis, and not specifying column names like a tabular rowset. You are describing a 2-dimensional grid of results, not a linear rowset. If you imagine each dimension as a table, the member set is the set of unique values from a single column in that table.
When you choose a Measure as the member (as in your first example), it looks as if you're selecting from a table, so it's easy to misunderstand. When you choose a Dimension, you get many members, and a cross-join between the rows and columns (which is sparse in this case because the names and genders are 1-to-1).
So, you could crossjoin these two dimensions on a single axis, and then filter out the null cells:
SELECT
event_count ON 0,
TOPCOUNT(
NonEmptyCrossJoin(name.children, gender.children),
10,
event_count) ON 1
FROM
events
Which should give you results that have a single column (event_count) and 10 rows, where each row is composed of the tuple (name, gender).
I hope that sets you on the right path, and please feel free to ask you want me to clarify.
For general introductory material, I think the book "MDX Solutions" is a good place to start:
http://www.amazon.ca/MDX-Solutions-Microsoft-Analysis-Services/dp/0471748080/

For an online MDX introductory material, you can have a look to this gentle introduction that presents the main MDX concepts.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to apply combinatorics on pairs in sql database? - sql

Related

SQL - specific requirement to compare tables

SQL - UNION vs NULL functions. Which is better?

Remove newest redundant row and update timestamp

Can I merge SQL Server tables if they have not exactly the same structure?

How to add column with the value of another dimension?

Categories

Resources