Find most common key, value pairs with Hstore in postgres - sql

I am collecting a list of items and version numbers in an hstore column in postgres. I'm interested in seeing the 100 most common key/value pairs. For example, if this were my data set:
"foo"=> "22",
"foo"=> "33",
"bar"=> "55",
"baz"=> "77",
"foo"=> "22"
I would want to know that "foo"=>"22" is the most common key/value pair in my database. Let's say for ease of talking about the problem that the table name is widgets and the hstore column name is items.
select ??? from widgets;
Is it possible to get a list of the top key value pairs using only SQL?

Oh, this is pretty easy. Here:
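-- expand every hstore in widgets.items into (key, value) rows, then count each pair
SELECT key, value, count(*) FROM
(SELECT (each(items)).* FROM widgets)
AS stat
GROUP BY key, value
ORDER BY count DESC, key, value
LIMIT 100;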

To get key/value pairs as a set, the relevant function is each():
select * from each('a=>1,b=>2')
http://www.postgresql.org/docs/current/static/hstore.html#HSTORE-FUNC-TABLE
A simple count with a limit can do the trick:
SELECT (item).key, (item).value, count(*) as count
FROM (SELECT each(items) as item FROM widgets) as t
GROUP BY (item).key, (item).value
ORDER BY 3 DESC, (item).key, (item).value
LIMIT 100
If you're only interested in the keys, you can use the simpler skeys() instead:
SELECT k, count(*) as count
FROM (SELECT skeys(items) as k FROM widgets) as t
GROUP BY k
ORDER BY 2 DESC, k
LIMIT 100
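To make the grouping step concrete, here is a minimal runnable sketch in Python with SQLite. SQLite has no hstore type, so the table expanded (a hypothetical name, with hypothetical columns k and v) stands in for the rows that each(items) would produce; the sample rows are the ones from the question. It only illustrates the GROUP BY on both key and value, not hstore itself.

```python
import sqlite3

# Stand-in for the output of each(items): one (k, v) row per hstore pair.
# These are the sample pairs from the question.
PAIRS = [("foo", "22"), ("foo", "33"), ("bar", "55"), ("baz", "77"), ("foo", "22")]


def top_pairs(pairs, limit=100):
    """Return (k, v, count) rows, most frequent pair first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expanded (k TEXT, v TEXT)")
    conn.executemany("INSERT INTO expanded VALUES (?, ?)", pairs)
    rows = conn.execute(
        """
        SELECT k, v, count(*) AS cnt
        FROM expanded
        GROUP BY k, v              -- group on the pair, not just the key
        ORDER BY cnt DESC, k, v    -- ties broken alphabetically
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


result = top_pairs(PAIRS)
print(result[0])
```

Grouping on the key alone would merge "foo"=>"22" and "foo"=>"33" into one bucket, which is why both columns appear in the GROUP BY.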

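If each row's items column holds exactly one key/value pair, grouping on the whole column should do it (with several pairs per row, this counts identical hstore values rather than individual pairs):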
select items, count(*) as cnt
from widgets
group by items
order by 2 desc
limit 100

Related

SQL Server : index for finding latest value which is greater than a passed value

I have a table with 4 columns
USER_ID: numeric
EVENT_DATE: date
VERSION: date
SCORE: decimal
I have a clustered index on (USER_ID, EVENT_DATE, VERSION). These three values together are unique.
I need to get the latest EVENT_DATE for each of a set of USER_IDs (~1000 different ids) where the SCORE is larger than a specific value, considering only those entries with a specific VERSION.
SELECT M.*
FROM (VALUES
( 5237 ),
………1000 more
( 27054 ) ) C (USER_ID)
CROSS APPLY
(SELECT TOP 1 C.USER_ID, M.EVENT_DATE, M.SCORE
FROM MY_HUGE_TABLE M
WHERE C.USER_ID = M.USER_ID
AND M.VERSION = 'xxxx-xx-xx'
AND M.SCORE > 2 --Comment M.SCORE > 2
ORDER BY M.EVENT_DATE DESC) M
When I execute the query, the runtime is poor, which I suppose is due to a missing index on the SCORE column.
If I remove the filter “M.SCORE > 2”, I get my results ten times faster; however, the latest scores may then be less than 2.
Could anyone please give me a hint on how to set up an index that would improve my query performance?
Thank you very much in advance.
For your query, the optimal index would be on (USER_ID, VERSION, EVENT_DATE desc, SCORE).
Unfortunately, your clustered index on (USER_ID, EVENT_DATE, VERSION) doesn't match. The columns need to match in order, and VERSION comes after EVENT_DATE in your index. So, only the USER_ID can help but that probably doesn't do much to filter the data.
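As a rough illustration of that column order, the sketch below builds the suggested index in SQLite (not SQL Server) and runs the per-user "latest row above a score" lookup. The index name, the sample rows, and the helper latest_event are all made up for the example, and it says nothing about the plan SQL Server would actually choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_huge_table ("
    " user_id INTEGER, event_date TEXT, version TEXT, score REAL,"
    " PRIMARY KEY (user_id, event_date, version))"
)
# Equality columns first (user_id, version), then the sort column,
# then the range-filtered column.
conn.execute(
    "CREATE INDEX ix_user_version_date_score"
    " ON my_huge_table (user_id, version, event_date DESC, score)"
)
# Invented sample data, for illustration only.
conn.executemany(
    "INSERT INTO my_huge_table VALUES (?, ?, ?, ?)",
    [
        (5237, "2020-01-01", "2020-06-30", 3.5),
        (5237, "2020-02-01", "2020-06-30", 1.0),  # latest, but score too low
        (5237, "2020-03-01", "2019-12-31", 9.0),  # different version
        (27054, "2020-01-15", "2020-06-30", 2.5),
    ],
)


def latest_event(user_id, version, min_score):
    """Latest (event_date, score) for one user with score above min_score."""
    return conn.execute(
        "SELECT event_date, score FROM my_huge_table"
        " WHERE user_id = ? AND version = ? AND score > ?"
        " ORDER BY event_date DESC LIMIT 1",
        (user_id, version, min_score),
    ).fetchone()


latest = {uid: latest_event(uid, "2020-06-30", 2) for uid in (5237, 27054)}
print(latest)
```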

Extracting key from the value of json object in postgres

Let us say I have a json object {"key1":0.5,"key2":0.3,"key3":0.1} in a particular column in a table called test. I want to return the key of the highest value. To get the highest value, in postgres, I can write this query:
select greatest(column1->'key1',column1->'key2',column1->'key3') from test
Now, it returns the greatest value. But the one I want is the key associated with the highest value. Is this possible in postgres json querying?
You need to extract all key/value pairs as rows. Once you have done that, it's a greatest-n-per-group problem - without the "group" though as you are looking at all rows.
select k,val
from (
select t.*, row_number() over (order by t.val::numeric desc) as rn
from jsonb_each_text('{"key1":0.5,"key2":0.3,"key3":0.1}'::jsonb) as t(k,val)
) t
where rn = 1;
Online example: http://rextester.com/OLBM23414
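The same two steps (explode the object into key/value rows, then keep the row ranked first by numeric value) can be sketched outside the database in Python. The function name key_of_max is hypothetical; this is only meant to show the logic, not to replace the query.

```python
import json


def key_of_max(doc: str) -> str:
    """Return the key whose value is numerically largest in a JSON object."""
    # Step 1: one (key, value) pair per entry, like jsonb_each_text.
    pairs = json.loads(doc).items()
    # Step 2: order by the numeric value, descending, and keep the first row,
    # like row_number() over (order by val::numeric desc) ... where rn = 1.
    ranked = sorted(pairs, key=lambda kv: float(kv[1]), reverse=True)
    return ranked[0][0]


best = key_of_max('{"key1":0.5,"key2":0.3,"key3":0.1}')
print(best)
```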

Logically determine a composite key in SQL

I'm working with an MSSQL table that does not have a primary or unique key constraint defined. There are two fields, let's call them xId and yId, that I believe together would be a composite key, but I want to confirm this by examining the data.
I'm thinking that I should be able to write a SQL count statement that I can compare to the total number of records on the table that would logically determine if the combination of xId and yId (or a third column if necessary) could in fact act as a composite key. However, I'm having trouble coming up with the right GROUP BY or other type of clause that would confirm or disprove this.
Any ideas?
Use group by and having:
select xid,yid
from YourTable
group by xid,yid
having count(1) > 1
This will show any pairs that are non-unique, so if no rows are returned, it's a good key.
Just do a count of the total rows of the table, and then do
select count(1)
from(
select xid,yid
from YourTable
group by xid,yid
)a;
If all pairs of xid and yid form a unique identifier, then the two numbers will be the same.
Alternatively, you could count the rows for each pair of xid and yid and find the largest such count:
select max(num_rows)
from(
select xid,yid,count(1) as num_rows
from YourTable
group by xid,yid
)a;
The result of this query is 1 if and only if (xid,yid) pairs form a unique identifier for your table.
This will list all the problem combinations (if any) of xid,yid:
SELECT
COUNT(*),xid,yid
FROM YourTable
GROUP BY xid,yid
HAVING COUNT(*)>1
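For a quick way to try the GROUP BY / HAVING check, here is a small sketch using Python's sqlite3 module instead of SQL Server. The helper duplicate_pairs, the payload column, and the sample rows are invented for the example.

```python
import sqlite3


def duplicate_pairs(rows):
    """Return (xid, yid, count) for every pair that occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (xid INTEGER, yid INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    dups = conn.execute(
        "SELECT xid, yid, COUNT(*) FROM YourTable"
        " GROUP BY xid, yid HAVING COUNT(*) > 1"
        " ORDER BY xid, yid"
    ).fetchall()
    conn.close()
    return dups


unique_rows = [(1, 1, "a"), (1, 2, "b"), (2, 1, "c")]
clashing_rows = unique_rows + [(1, 2, "d")]  # (1, 2) now appears twice

print(duplicate_pairs(unique_rows))
print(duplicate_pairs(clashing_rows))
```

An empty result means (xid, yid) is unique in the current data; it does not guarantee that future rows will keep it unique, which only a constraint can enforce.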

How to get the position of a record in a table (SQL Server)

Following problem:
I need to get the position of a record in the table. Let's say I have the following records in the table:
Name: john doe, ID: 1
Name: jane doe, ID: 2
Name: Frankie Boy, ID: 4
Name: Johnny, ID: 9
Now, "Frankie Boy" is in the third position in the table. But how do I get this information from the SQL server? I could count IDs, but they are not reliable: Frankie has the ID 4, but is in the third position because the record with the ID '3' was deleted.
Is there a way? I am aware of ROW_NUMBER, but it would be costly, because I basically need to select the whole set first before I can rank the rows.
I am using MS SQL Server 2008 R2.
Tables don't have 'position'. Rows in a table (=set) are identified by their primary key value. Only result rows have 'position', which can be deterministic when an ORDER BY clause is present. Assuming that tables (=sets) have a position will lead only to problems and is the wrong mindset.
You can use row_number() to "label" rows. You've got to specify a way to order the rows:
select row_number() over (order by id) as PositionInTable
, *
from YourTable
If performance is an issue, you could store the position in a new column:
update yt1
set PositionInTable = rn
from YourTable yt1
join (
select row_number() over (order by id) as rn
, id
from YourTable
) yt2
on yt1.id = yt2.id
With an index on PositionInTable, this would be lightning fast. But you would have to update this after each insert on the table.
Tables are [conceptually] without order. Unless you specify ORDER BY in a select statement to order a result set, results may be returned in any order. Repeated executions of the exact same SQL may return the result set in a different order for each execution.
To get the row number in a particular result set, use the row_number() function:
select row = row_number() over( order by id ) , *
from sysobjects
This will assign a row number to each row in sysobjects as if the table were ordered by id.
A simple way to do this without having to use ROW_NUMBER would be to count how many rows in the table have an ID less than or equal to the selected ID; when rows are ordered by ID, this gives the row number.
SELECT COUNT(*) FROM YourTable WHERE ID <= 4 -- Frankie Boy, Result = 3
This may not be the most efficient way to do it for your particular scenario, but it's a simple way of achieving it.
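The counting approach can be checked against the question's sample rows with a short sketch, here using Python's sqlite3 module rather than SQL Server. The helper position_of is a made-up name, and the position is only meaningful relative to an ordering by id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, name TEXT)")
# The rows from the question; ID 3 is missing because it was deleted.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "john doe"), (2, "jane doe"), (4, "Frankie Boy"), (9, "Johnny")],
)


def position_of(row_id):
    """Position of a row when the table is ordered by id (1-based)."""
    (pos,) = conn.execute(
        "SELECT COUNT(*) FROM YourTable WHERE id <= ?", (row_id,)
    ).fetchone()
    return pos


all_rows = conn.execute("SELECT id, name FROM YourTable ORDER BY id").fetchall()
positions = {name: position_of(row_id) for row_id, name in all_rows}
print(positions["Frankie Boy"])
```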

Ordering MySQL Query By Random and a Field?

I am trying to find a way to pull 10 random records and then order those 10 records by a field. I have tried the following:
SELECT name FROM users ORDER BY RAND(), name LIMIT 10
but it does not order the 10 returned rows by name; it just returns 10 random records in arbitrary order. Is there a way to order by rand() and a field within a single MySQL query?
SELECT name
FROM (
SELECT name
FROM users
ORDER BY
RAND()
LIMIT 10
) q
ORDER BY
name
I ended up just doing the sort in PHP.
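The subquery approach (pick 10 random rows first, then sort only those) can be tried with the sketch below. It uses SQLite through Python, so RANDOM() replaces MySQL's RAND(), and the generated user names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
# 50 made-up users: user00 .. user49
conn.executemany(
    "INSERT INTO users VALUES (?)", [(f"user{i:02d}",) for i in range(50)]
)

names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM ("
        "  SELECT name FROM users ORDER BY RANDOM() LIMIT 10"  # random sample
        ") AS q ORDER BY name"  # sort only the sampled rows
    )
]
print(names)
```

The inner query decides which 10 rows are kept; the outer ORDER BY only rearranges them, so each run returns a different sample that is always alphabetically sorted.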