Custom ordering in sqlite - sql

Is there a way to have a custom order by query in sqlite?
For example, I have essentially an enum
_id|Name|Key
------------
1 | One | Named
2 | Two | Contributing
3 | Three | Named
4 | Four | Key
5 | Five | Key
6 | Six | Contributing
7 | Seven | Named
And the 'key' columns have ordering. Say Key > Named > Contributing.
Is there a way to make
SELECT * FROM table ORDER BY Key
return something to the effect of
_id|Name|Key
------------
4 | Four | Key
5 | Five | Key
1 | One | Named
3 | Three | Named
7 | Seven | Named
2 | Two | Contributing
6 | Six | Contributing
this?

SELECT _id, Name, Key
FROM my_table t
ORDER BY CASE WHEN Key = 'Key' THEN 0
WHEN Key = 'Named' THEN 1
WHEN Key = 'Contributing' THEN 2 END, _id;
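As a quick check of the CASE approach, here is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. The in-memory database and the table name my_table are assumptions made for the demo.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (_id INTEGER PRIMARY KEY, Name TEXT, Key TEXT)")
conn.executemany(
    "INSERT INTO my_table (_id, Name, Key) VALUES (?, ?, ?)",
    [
        (1, "One", "Named"), (2, "Two", "Contributing"), (3, "Three", "Named"),
        (4, "Four", "Key"), (5, "Five", "Key"), (6, "Six", "Contributing"),
        (7, "Seven", "Named"),
    ],
)

# Map each Key value to a rank, then break ties by _id.
rows = conn.execute(
    """
    SELECT _id, Name, Key
    FROM my_table
    ORDER BY CASE WHEN Key = 'Key' THEN 0
                  WHEN Key = 'Named' THEN 1
                  WHEN Key = 'Contributing' THEN 2
             END,
             _id
    """
).fetchall()

ordered_ids = [row[0] for row in rows]
print(ordered_ids)  # [4, 5, 1, 3, 7, 2, 6]
```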

If you have a lot of CASEs (or a complicated set of conditions), Adam's solution may result in an extremely large query.
SQLite does allow you to write your own functions (through its C API, which many language bindings also expose). You could write a function that returns values the same way Adam's CASE does, but because it is ordinary code, it can work with a much larger set of conditions (or a separate table, etc.).
Once the function is written, you can refer to it in your SELECT as if it were a built-in function:
SELECT * FROM my_table ORDER BY MyOrder(Key)
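A user-defined function does not have to be written against the C API directly. As a sketch, Python's sqlite3 module can register one with Connection.create_function. The RANK dictionary and the my_order helper below are hypothetical names chosen for this demo; only MyOrder comes from the answer.

```python
import sqlite3

# Hypothetical ranking; any Python logic could stand in for this lookup.
RANK = {"Key": 0, "Named": 1, "Contributing": 2}

def my_order(key):
    """Return the sort rank for a Key value; unknown values sort last."""
    return RANK.get(key, len(RANK))

conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL function named MyOrder.
conn.create_function("MyOrder", 1, my_order)

conn.execute("CREATE TABLE my_table (_id INTEGER PRIMARY KEY, Name TEXT, Key TEXT)")
conn.executemany(
    "INSERT INTO my_table (_id, Name, Key) VALUES (?, ?, ?)",
    [
        (1, "One", "Named"), (2, "Two", "Contributing"), (3, "Three", "Named"),
        (4, "Four", "Key"), (5, "Five", "Key"), (6, "Six", "Contributing"),
        (7, "Seven", "Named"),
    ],
)

ordered_names = [
    row[0]
    for row in conn.execute("SELECT Name FROM my_table ORDER BY MyOrder(Key), _id")
]
print(ordered_names)
```

The function is called once per row during the sort, so for large tables the CASE expression is likely to be cheaper.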

Did you try (not tested on my side but relying on a technique I previously used):
ORDER BY Key = 'Key' DESC,
Key = 'Named' DESC,
Key = 'Contributing' DESC

Related

Sort SQL results and include missing keys

I have a Postgres table like this (greatly simplified):
id | object_id (foreign id) | key (text) | value (text)
1 | 1 | A | 0foo
2 | 1 | B | 1bar
3 | 1 | C | 2baz
4 | 1 | D | 3ham
5 | 2 | C | 4sam
6 | 3 | F | 5pam
…
(billions of rows)
I select object_ids according to some query (not relevant here), and then sort them according to the value of a specified key.
def sort_query_results(query, sort_by, limit, offset):
    return query\
        .with_entities(Table.object_id)\
        .filter(Table.key == sort_by)\
        .order_by(desc(Table.value))\
        .limit(limit).offset(offset).subquery()
For example, assume a query matches object_ids 1 and 2 above. When sort_by=C, I want the result to be returned in the order [2, 1], because 4sam > 2baz.
This works well but there's one big problem:
Object ids that are returned by query but do not have any row for the sort_by key, are not returned at all.
For example, for a query that matches object_ids 1 and 2, sort_query_results(query, sort_by='D') == [1]. The object_id 2 is dropped because it has no D, which is undesirable.
Instead, I'd like to return all object_ids from the query. Those without the sort key should be sorted at the end, in any order: sort_query_results(query, sort_by='D') == [1, 2].
What's the best way to achieve that?
Note: I do not have the freedom to change the DB schema or business logic. But I can change the query code. I use SQLAlchemy ORM from Python, but could execute raw Postgres commands if necessary. Thank you.
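One possible direction, sketched here with Python's sqlite3 module rather than Postgres and SQLAlchemy: keep the matched object_ids as the left side of a LEFT JOIN, join only the rows for the sort key, and order so that missing values come last. The table name kv, the helper sort_object_ids, and the assumption of at most one row per (object_id, key) are all made up for this demo. In Postgres the ORDER BY could be written as ORDER BY s.value DESC NULLS LAST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kv (id INTEGER PRIMARY KEY, object_id INTEGER, key TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO kv VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", "0foo"), (2, 1, "B", "1bar"), (3, 1, "C", "2baz"),
        (4, 1, "D", "3ham"), (5, 2, "C", "4sam"), (6, 3, "F", "5pam"),
    ],
)

def sort_object_ids(conn, matched_ids, sort_by):
    """Return every matched object_id, sorted by the value stored under sort_by.

    Objects without a row for sort_by are kept and placed at the end.
    """
    placeholders = ",".join("?" for _ in matched_ids)
    sql = f"""
        SELECT m.object_id
        FROM (SELECT DISTINCT object_id FROM kv
              WHERE object_id IN ({placeholders})) AS m
        LEFT JOIN kv AS s
               ON s.object_id = m.object_id AND s.key = ?
        ORDER BY s.value IS NULL, s.value DESC
    """
    return [row[0] for row in conn.execute(sql, [*matched_ids, sort_by])]

print(sort_object_ids(conn, [1, 2], "C"))  # [2, 1]
print(sort_object_ids(conn, [1, 2], "D"))  # [1, 2]
```

The key point is that the sort-key condition lives in the ON clause, not in WHERE, so objects without that key survive the join with a NULL value.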

Recursive self join over file data

I know there are many questions about recursive self joins, but they're mostly in a hierarchical data structure as follows:
ID | Value | Parent id
-----------------------------
But I was wondering if there was a way to do this in a specific case that I have where I don't necessarily have a parent id. My data will look like this when I initially load the file.
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
Essentially, it's a CSV file where each row in the table is a line in the file. Lines 1 and 5 identify an object header and lines 3, 4, 7, and 8 identify the rows belonging to the object. The object header lines can have only 40 attributes, which is why the object is broken up across multiple sections in the CSV file.
What I'd like to do is take the table, separate out the record # column, and join it with itself multiple times so it achieves something like this:
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,5,6,7,8,...
2 | *,record,abc,efg,hij,lmn,opq,rst
3 | ,,1,x,y,z,t,u,v,...
4 | ,,2,q,r,s,l,m,n,...
I know it's probably possible, I'm just not sure where to start. My initial idea was to create a view that separates out the first and second columns, and use that view to join repeatedly on those two columns. However, I have some problems:
I don't know how many sections will occur in the file for the same object.
The file can contain other objects as well, so joining on the first two columns would be problematic if you have something like
ID | Line |
-------------------------
1 | 3,Formula,1,2,3,4,...
2 | *,record,abc,efg,hij,...
3 | ,,1,x,y,z,...
4 | ,,2,q,r,s,...
5 | 3,Formula,5,6,7,8,...
6 | *,record,lmn,opq,rst,...
7 | ,,1,t,u,v,...
8 | ,,2,l,m,n,...
9 | ,4,Data,1,2,3,4,...
10 | *,record,lmn,opq,rst,...
11 | ,,1,t,u,v,...
In the above case, my plan could join rows from the Data object in row 9 with the first rows of the Formula object by matching the record value of 1.
UPDATE
I know this is somewhat confusing. I tried doing this with C# a while back, but I basically had to write a recursive descent parser for the specific file format, and it simply took too long because I had to get the data into the database afterwards and it was too much for Entity Framework. It was taking hours just to convert one file, since these files are excessively large.
Either way, @Nolan Shang has the closest result to what I want. The only difference is this (sorry for the bad formatting):
+----+------------+--------------------------+----------------------------------+
| ID | header     | x                        | value                            |
+----+------------+--------------------------+----------------------------------+
| 1  | 3,Formula, | ,1,2,3,4,5,6,7,8         | 3,Formula,1,2,3,4,5,6,7,8        |
| 2  | ,,         | ,1,x,y,z,t,u,v           | ,1,x,y,z,t,u,v                   |
| 3  | ,,         | ,2,q,r,s,l,m,n           | ,2,q,r,s,l,m,n                   |
| 4  | *,record,  | ,abc,efg,hij,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst |
| 5  | ,4,        | ,Data,1,2,3,4            | ,4,Data,1,2,3,4                  |
| 6  | *,record,  | ,lmn,opq,rst             | ,lmn,opq,rst                     |
| 7  | ,,         | ,1,t,u,v                 | ,1,t,u,v                         |
+----+------------+--------------------------+----------------------------------+
I agree that it would be better to export this to a scripting language and do it there. This will be a lot of work in TSQL.
You've intimated that there are other possible scenarios you haven't shown, so I obviously can't give a comprehensive solution. I'm guessing this isn't something you need to do quickly on a repeated basis. More of a one-time transformation, so performance isn't an issue.
One approach would be to do a LEFT JOIN to a hard-coded table of the possible identifying sub-strings like:
3,Formula,
*,record,
,,1,
,,2,
,4,Data,
Looks like it pretty much has to be human-selected and hard-coded because I can't find a reliable pattern that can be used to SELECT only these sub-strings.
Then you SELECT from this artificially-created table (or derived table, or CTE) and LEFT JOIN to your actual table with a LIKE to get all the rows that use each of these values as their starting substring, strip out the starting characters to get the rest of the string, and use the STUFF..FOR XML trick to build the desired Line.
How you get the ID column depends on what you want. For instance, in your second example, I don't know what ID you want for the ,4,Data,... line. Do you want 5 because that's the next number in the results, or do you want 9 because that's the ID of the first occurrence of that sub-string? Code accordingly. If you want 5, it's a ROW_NUMBER(). If you want 9, you can add an ID column to the artificial table you created at the start of this approach.
BTW, there's really nothing recursive about what you need done, so if you're still thinking in those terms, now would be a good time to stop. This is more of a "Group Concatenation" problem.
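Since exporting to a scripting language was suggested above, here is a minimal Python sketch of the same prefix idea: a hand-picked list of identifying prefixes, a starts-with match in place of LIKE, and concatenation of whatever follows the prefix. The function name merge_sections is hypothetical, and the sample lines are the question's first example with the trailing ",..." dropped.

```python
def merge_sections(lines, prefixes):
    """Concatenate the remainders of all lines that share an identifying prefix.

    Output order follows the order of `prefixes`; unused prefixes are skipped.
    """
    remainders = {prefix: [] for prefix in prefixes}
    for line in lines:
        for prefix in prefixes:
            if line.startswith(prefix):
                # Keep only the part after the prefix, as the LIKE + strip step does.
                remainders[prefix].append(line[len(prefix):])
                break
    return [
        prefix + ",".join(parts)
        for prefix, parts in remainders.items()
        if parts
    ]

# Hard-coded, human-selected prefixes, as in the approach described above.
prefixes = ["3,Formula,", "*,record,", ",,1,", ",,2,"]
lines = [
    "3,Formula,1,2,3,4",
    "*,record,abc,efg,hij",
    ",,1,x,y,z",
    ",,2,q,r,s",
    "3,Formula,5,6,7,8",
    "*,record,lmn,opq,rst",
    ",,1,t,u,v",
    ",,2,l,m,n",
]

merged = merge_sections(lines, prefixes)
for row in merged:
    print(row)
```

Like the SQL version, this does not tell objects apart: a second object whose rows also start with ,,1, would be merged into the same group, so a real solution would need to track which object header each row belongs to.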
Here is a sample, though its output differs somewhat from what you need.
That is because I use everything up to the second comma as the group header, so the ,,1 and ,,2 rows are treated as the same group. It would be better if you could use a parent id to indicate the group.
DECLARE @testdata TABLE(ID int,Line varchar(8000))
INSERT INTO @testdata
SELECT 1,'3,Formula,1,2,3,4,...' UNION ALL
SELECT 2,'*,record,abc,efg,hij,...' UNION ALL
SELECT 3,',,1,x,y,z,...' UNION ALL
SELECT 4,',,2,q,r,s,...' UNION ALL
SELECT 5,'3,Formula,5,6,7,8,...' UNION ALL
SELECT 6,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 7,',,1,t,u,v,...' UNION ALL
SELECT 8,',,2,l,m,n,...' UNION ALL
SELECT 9,',4,Data,1,2,3,4,...' UNION ALL
SELECT 10,'*,record,lmn,opq,rst,...' UNION ALL
SELECT 11,',,1,t,u,v,...'
;WITH t AS(
SELECT *,REPLACE(SUBSTRING(t.Line,LEN(c.header)+1,LEN(t.Line)),',...','') AS data
FROM @testdata AS t
CROSS APPLY(VALUES(LEFT(t.Line,CHARINDEX(',',t.Line, CHARINDEX(',',t.Line)+1 )))) c(header)
)
SELECT MIN(ID) AS ID,t.header,c.x,t.header+STUFF(c.x,1,1,'') AS value
FROM t
OUTER APPLY(SELECT ','+tb.data FROM t AS tb WHERE tb.header=t.header FOR XML PATH('') ) c(x)
GROUP BY t.header,c.x
+----+------------+------------------------------------------+-----------------------------------------------+
| ID | header | x | value |
+----+------------+------------------------------------------+-----------------------------------------------+
| 1 | 3,Formula, | ,1,2,3,4,5,6,7,8 | 3,Formula,1,2,3,4,5,6,7,8 |
| 3 | ,, | ,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v | ,,1,x,y,z,2,q,r,s,1,t,u,v,2,l,m,n,1,t,u,v |
| 2 | *,record, | ,abc,efg,hij,lmn,opq,rst,lmn,opq,rst | *,record,abc,efg,hij,lmn,opq,rst,lmn,opq,rst |
| 9 | ,4, | ,Data,1,2,3,4 | ,4,Data,1,2,3,4 |
+----+------------+------------------------------------------+-----------------------------------------------+

SQL Server: Use a column to save order of the record

I'm facing a database that keeps the ORDERING in columns of the table.
It's like:
Id Name Description Category OrderByName OrderByDescription OrderByCategory
1 Aaaa bbbb cccc 1 2 3
2 BBbbb Aaaaa bbbb 2 1 2
3 cccc cccc aaaaa 3 3 1
So, when the user wants to order by name, the SQL uses ORDER BY OrderByName.
I think this doesn't make any sense, since that's what indexes are for. I tried to find an explanation for it but haven't found one. Is this faster than using indexes? Is there any scenario where this is really useful?
It can make sense for many reasons but mainly when you don't want to follow the "natural order" given by the ORDER BY clause.
This is a scenario where this can be useful :
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE Table1
([Id] int, [Name] varchar(15), [OrderByName] int)
;
INSERT INTO Table1
([Id], [Name], [OrderByName])
VALUES
(1, 'Del Torro', 2 ),
(2, 'Delson', 1),
(3, 'Delugi', 3)
;
Query 1:
SELECT *
FROM Table1
ORDER BY Name
Results:
| ID | NAME | ORDERBYNAME |
|----|-----------|-------------|
| 1 | Del Torro | 2 |
| 2 | Delson | 1 |
| 3 | Delugi | 3 |
Query 2:
SELECT *
FROM Table1
ORDER BY OrderByName
Results:
| ID | NAME | ORDERBYNAME |
|----|-----------|-------------|
| 2 | Delson | 1 |
| 1 | Del Torro | 2 |
| 3 | Delugi | 3 |
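The same comparison can be reproduced outside SQL Fiddle. This is a sketch using Python's sqlite3 module instead of SQL Server, so the default collation is SQLite's binary one, an assumption that happens to give the same result for these three names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Name TEXT, OrderByName INTEGER)")
conn.executemany(
    "INSERT INTO Table1 (Id, Name, OrderByName) VALUES (?, ?, ?)",
    [(1, "Del Torro", 2), (2, "Delson", 1), (3, "Delugi", 3)],
)

# Natural order: the space in 'Del Torro' sorts before any letter.
by_name = [row[0] for row in conn.execute("SELECT Id FROM Table1 ORDER BY Name")]

# Stored order: whoever maintains OrderByName decided 'Delson' comes first.
by_stored_order = [
    row[0] for row in conn.execute("SELECT Id FROM Table1 ORDER BY OrderByName")
]

print(by_name)          # [1, 2, 3]
print(by_stored_order)  # [2, 1, 3]
```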
I think it makes little sense for two reasons:
Who is going to maintain this set of values in the table? You need to update them every time any row is added, updated, or deleted. You can do this with triggers, or horribly buggy and unreliable constraints using user-defined functions. But why? The information that seems to be in those columns is already there. It's redundant because you can get that order by ordering by the actual column.
You still have to use massive conditionals or dynamic SQL to tell the application how to order the results, since you can't say ORDER BY @column_name.
Now, I'm basing my assumptions on the fact that the ordering columns still reflect the alphabetical order in the relevant columns. It could be useful if there is some customization possible, e.g. if you wanted all Smiths listed first, and then all Morts, and then everyone else. But I don't see any evidence of this in the question or the data.
This could be useful if the ordering was customizable - that is, if users did not want to see the list in alphabetical order, but rather in some custom order.
An index on the int columns would be smaller than an index on the column that holds the actual text, but I don't see that there is any real benefit to this in most cases.

Increasing a +1 to the id without changing the content of a column

I have this random table with random contents.
id | name| mission
1 | aaaa | kitr
2 | bbbb | etre
3 | ccccc| qwqw
4 | dddd | qwert
5 | eeee | potentials
6 | ffffffff | toto
What I want is to add a row to the above table with id=3 and a different name and mission, BUT I want the OLD id=3 row to get id=4 while keeping the name and mission it had as id=3, the OLD id=4 row to become id=5 while keeping its name and mission, and so on.
It's like I want to insert a row in the middle of the table and increase the id of every row below it by 1, while the rest of each row stays the same. Example below:
id | name| mission
1 | aaaa | kitr
2 | bbbb | etre
3 | zzzzzz| zzzzz
4 | ccccc| qwqw
5 | dddd | qwert
6 | eeee | potentials
7 | ffffffff | toto
Why do I want to do this? I have a table that has 2 CLOBs. Inside those CLOBs there are different queries, e.g. id=1 has a CLOB that creates a table, id=2 has inserts for its columns, id=3 creates another table, and id=4 has functions.
If you concatenate all of these ids into one text (or CLOB), they run create, then inserts, then create, then functions. The table is like one huge script.
Why am I doing this? The developers are building their application and want the SQL to run in a specific order. I have 6 developers, and I am organizing the data modeling, the performance, and how the scripts are run. So the above table organizes the order in which the scripts they want are called.
Simply put, don't do it.
This case highlights why you should never use any business value, i.e. any 'real world values' for a Primary Key.
In your case I would recommend primary keys not be used for any other purposes.
I recommend you add an extra column 'order' and then change THAT column in order to re-order the rows. That way your primary key and all the other records will not need to be touched.
This avoids the problem that your approach would need to change the primary key of ALL the database records below the current record, which seems like a really bad approach. Just imagine trying to undo that update ;)
Some more info here: https://stackoverflow.com/a/8777574/631619
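A minimal sketch of the separate ordering column, using Python's sqlite3 module. The table name scripts, the column name sort_order (chosen because ORDER is a reserved word), and the helper insert_at are assumptions for this demo. Rows after the insertion point still have their sort_order shifted, but the primary keys never change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scripts ("
    "id INTEGER PRIMARY KEY, name TEXT, mission TEXT, sort_order INTEGER)"
)
conn.executemany(
    "INSERT INTO scripts (id, name, mission, sort_order) VALUES (?, ?, ?, ?)",
    [
        (1, "aaaa", "kitr", 1), (2, "bbbb", "etre", 2), (3, "ccccc", "qwqw", 3),
        (4, "dddd", "qwert", 4), (5, "eeee", "potentials", 5),
        (6, "ffffffff", "toto", 6),
    ],
)

def insert_at(conn, position, name, mission):
    """Insert a row at `position` in the run order without touching any id."""
    with conn:  # one transaction: shift later rows, then insert the new one
        conn.execute(
            "UPDATE scripts SET sort_order = sort_order + 1 WHERE sort_order >= ?",
            (position,),
        )
        conn.execute(
            "INSERT INTO scripts (name, mission, sort_order) VALUES (?, ?, ?)",
            (name, mission, position),
        )

insert_at(conn, 3, "zzzzzz", "zzzzz")

run_order = conn.execute(
    "SELECT id, name FROM scripts ORDER BY sort_order"
).fetchall()
print(run_order)
```

The new row gets the next free id (7 here) but runs third, and anything that references the old ids keeps working.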
UPDATE random_table r1
SET id =
(SELECT CASE WHEN id > 2 THEN id+1 ELSE id END id FROM random_table r2
WHERE r1.mission=r2.mission
)
Then insert the new value.

Access 2007 select first value of query results

I am running into a rather annoying thingy in Access (2007) and I am not sure if this is a feature or if I am asking for the impossible.
Although the actual database structure is more complex, my problem boils down to this:
I have a table with data about Units for specific years. This data comes from different sources and might overlap.
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
...
Now I would like the user to select certain sources, order them by priority and then extract one data value for each year.
For example, if the user selects source 1, 2 and 3 and orders them by (3, 1, 2), then I would like the following result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2011 | 90 | 1 |
I am able to order the initial table, based on a specific order. I do this with the following query
SELECT Unit, IYR, X1, Source
FROM TestTable
WHERE Source In (1,2,3)
ORDER BY Unit, IYR,
IIf(Source=3,1,IIf(Source=1,2,IIf(Source=2,3,4)))
This gives me the following intermediate result:
Unit | IYR | X1 | Source |
-----------------------------
A | 2009 | 55 | 1 |
A | 2010 | 150 | 3 |
A | 2010 | 80 | 1 |
A | 2010 | 101 | 2 |
A | 2011 | 90 | 1 |
Next step is to only get the first value of each year. I was thinking to use the following query:
SELECT X.Unit, X.IYR, first(X.X1) as FirstX1
FROM (...) AS X
GROUP BY X.Unit, X.IYR
Where (…) is the above query.
Now Access goes bananas. Whatever order I give to the intermediate results, the result of this query is:
Unit | IYR | X1 |
--------------------
A | 2009 | 55 |
A | 2010 | 80 |
A | 2011 | 90 |
In other words, for year 2010 it shows the value of source 1 instead of 3. It seems that Access does not care about the ordering of the nested query when it applies the FIRST() function and sticks to the original ordering of the data.
Is this a feature of Access or is there a different way of achieving the desired results?
Ps: Next step would be to use a self join to add the source column to the results again, but I first need to resolve above problem.
Rather than use first it may be better to determine the MIN Priority and then join back e.g.
SELECT
t.UNIT,
t.IYR,
t.X1,
t.Source ,
t.PrioritySource
FROM
(SELECT
Unit,
IYR,
X1,
Source,
SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
) as t
INNER JOIN
(SELECT
Unit,
IYR,
MIN(SWITCH ( [Source]=3, 1,
[Source]=1, 2,
[Source]=2, 3)) as PrioritySource
FROM
TestTable
WHERE
Source In (1,2,3)
GROUP BY
Unit,
IYR ) as MinPriortiy
ON t.Unit = MinPriortiy.Unit and
t.IYR = MinPriortiy.IYR and
t.PrioritySource = MinPriortiy.PrioritySource
which will produce this result (Note I include Source and priority source for demonstration purposes only)
UNIT | IYR | X1 | Source | PrioritySource
----------------------------------------------
A | 2009 | 55 | 1 | 2
A | 2010 | 150 | 3 | 1
A | 2011 | 90 | 1 | 2
Note the first subquery is to handle the fact that Access won't let you join on a Switch
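The logic of the MIN-priority join can be checked outside Access. The sketch below uses Python's sqlite3 module, so it is SQLite SQL and not Access SQL: CASE stands in for SWITCH, and a common table expression replaces the repeated subquery. The alias names ranked and best are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Unit TEXT, IYR INTEGER, X1 INTEGER, Source INTEGER)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?, ?)",
    [
        ("A", 2009, 55, 1), ("A", 2010, 80, 1), ("A", 2010, 101, 2),
        ("A", 2010, 150, 3), ("A", 2011, 90, 1),
    ],
)

sql = """
WITH ranked AS (
    -- Translate the user's source order (3, 1, 2) into a numeric priority.
    SELECT Unit, IYR, X1, Source,
           CASE Source WHEN 3 THEN 1 WHEN 1 THEN 2 WHEN 2 THEN 3 END AS PrioritySource
    FROM TestTable
    WHERE Source IN (1, 2, 3)
)
SELECT r.Unit, r.IYR, r.X1, r.Source
FROM ranked AS r
JOIN (
    -- Best (lowest) priority available for each unit and year.
    SELECT Unit, IYR, MIN(PrioritySource) AS PrioritySource
    FROM ranked
    GROUP BY Unit, IYR
) AS best
  ON r.Unit = best.Unit
 AND r.IYR = best.IYR
 AND r.PrioritySource = best.PrioritySource
ORDER BY r.Unit, r.IYR
"""

chosen = conn.execute(sql).fetchall()
print(chosen)
```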
Yes, FIRST() does use an arbitrary ordering. From the Access Help:
These functions return the value of a specified field in the first or
last record, respectively, of the result set returned by a query. If
the query does not include an ORDER BY clause, the values returned by
these functions will be arbitrary because records are usually returned
in no particular order.
I don't know whether FROM (...) AS X means you are using an ORDER BY inline (assuming that is actually possible) or if you are using a VIEW ('stored Query object') here but either way I assume the ORDER BY is being disregarded (because an ORDER BY should only apply to the final result).
The alternative is to use MIN() (or possibly MAX()).
This is the most concise way I have found to write such queries in Access that require pulling back all columns that correspond to the first row in a group of records that are ordered in a particular way.
First, I added a UniqueID to your table. In this case, it's just an AutoNumber field. You may already have a unique value in your table, in which case you can use that.
This will choose the row with a Source 3 first, then Source 1, then Source 2. If there is a tie, it picks the one with the higher X1 value. If there is a further tie, it is broken by the UniqueID value:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=
(SELECT TOP 1 [UniqueID] FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, UniqueID)
This yields:
Unit IYR X1 Source UniqueID
A 2009 55 1 1
A 2010 150 3 4
A 2011 90 1 5
I recommend that (1) you create an index on the IYR field, which will dramatically increase your performance for this type of query, and (2) you keep in mind that if you have a lot of records (>~100K), this isn't the best choice. I find it works quite well for tables in the 1-70K range. For larger datasets, I like to use my GroupIncrement function to partition each group (similar to SQL Server's ROW_NUMBER() OVER statement).
The Choose() function is a VBA function and may not be clear here. In your case, it sounds like there is some interactivity required. For that, you could create a second table called "Choices", like so:
Rank Choice
1 3
2 1
3 2
Then, you could substitute the following:
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.UniqueID=(SELECT TOP 1 [UniqueID] FROM
[TestTable] t2 INNER JOIN [Choices] c
ON t2.Source=c.Choice
WHERE t.IYR=t2.IYR ORDER BY c.[Rank], t2.X1 DESC, t2.UniqueID);
Indexing Source on TestTable and Choice on the Choices table may be helpful here, too, depending on the number of choices required.
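The Choices-table variant can be tried the same way. This sketch uses Python's sqlite3 module, so it is SQLite SQL and not Access SQL: LIMIT 1 replaces TOP 1, and the rows are selected rather than written into [Chosen Rows]. It also correlates on Unit as well as IYR, which is an assumption added here for data with more than one unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TestTable ("
    "UniqueID INTEGER PRIMARY KEY, Unit TEXT, IYR INTEGER, X1 INTEGER, Source INTEGER)"
)
conn.executemany(
    "INSERT INTO TestTable (Unit, IYR, X1, Source) VALUES (?, ?, ?, ?)",
    [
        ("A", 2009, 55, 1), ("A", 2010, 80, 1), ("A", 2010, 101, 2),
        ("A", 2010, 150, 3), ("A", 2011, 90, 1),
    ],
)

# The user's preferred source order, stored as data instead of in the query.
conn.execute("CREATE TABLE Choices (Rank INTEGER, Choice INTEGER)")
conn.executemany("INSERT INTO Choices VALUES (?, ?)", [(1, 3), (2, 1), (3, 2)])

sql = """
SELECT t.Unit, t.IYR, t.X1, t.Source, t.UniqueID
FROM TestTable AS t
WHERE t.UniqueID = (
    SELECT t2.UniqueID
    FROM TestTable AS t2
    JOIN Choices AS c ON t2.Source = c.Choice
    WHERE t2.Unit = t.Unit AND t2.IYR = t.IYR
    ORDER BY c.Rank, t2.X1 DESC, t2.UniqueID
    LIMIT 1
)
ORDER BY t.Unit, t.IYR
"""

chosen_rows = conn.execute(sql).fetchall()
print(chosen_rows)
```

Changing the user's priorities then only means rewriting the three rows in Choices.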
Q:
Can you get this to work without the need for surrogate key? For
example what if the unique key is the composite of
{Unit,IYR,X1,Source}
A:
If you have a compound key, you can do it like this-- however I think that if you have a large dataset, it will totally kill the performance of the query. It may help to index all four columns, but I can't say for sure because I don't regularly use this method.
SELECT t.* INTO [Chosen Rows]
FROM TestTable AS t
WHERE t.Unit & t.IYR & t.X1 & t.Source =
(SELECT TOP 1 Unit & IYR & X1 & Source FROM [TestTable]
WHERE t.IYR=IYR ORDER BY Choose([Source],2,3,1), X1 DESC, Unit, IYR)
In certain cases, you may have to coerce some of the individual parts of the key to strings as follows (though Access generally will coerce values automatically):
t.Unit & CStr(t.IYR) & CStr(t.X1) & CStr(t.Source)
You could also use a query in your FROM statements instead of the actual table. The query itself would build a composite of the four fields used in the key, and then you'd use the new key name in the WHERE clause of the top SELECT statement, and in the SELECT TOP 1 [key] of the subquery.
In general, though, I will either: (a) create a new table with an AutoNumber field, (b) add an AutoNumber field, (c) add an integer and populate it with a unique number using VBA - this is useful when you get a MaxLocks error when trying to add an AutoNumber, or (d) use an already indexed unique key.