To retrieve first N objects in query results

I am using Documentum Developer Edition 6.6.
I need to (using DFS) execute the following DQL expression
select "r_version_label","i_chronicle_id", "i_position", "r_modify_date" , "subject","title","r_object_type","object_name","r_object_id" from "dm_document" where FOLDER (ID('0cde75d180000107')) and "r_object_type"='dm_document' order by "r_modify_date" asc, "i_position" desc
But I need only the first N objects that the select returns. To be clear: N objects, not rows (this matters because r_version_label, one of the selected attributes, is a repeating attribute).
I tried to do this using the following DQL:
select "r_version_label","i_chronicle_id", "i_position", "r_modify_date" , "subject","title","r_object_type","object_name","r_object_id" from "dm_document" where FOLDER (ID('0cde75d180000107')) and "r_object_type"='dm_document' order by "r_modify_date" asc, "i_position" desc ENABLE (OPTIMIZE_TOP , RETURN_TOP )
But the result was limited by rows, not by objects. This is because my Documentum server has the default setting return_top_results_row_based = true. Changing that parameter in server.ini is not acceptable for me: I have to write an application that behaves the same way whatever the value of return_top_results_row_based is.
I have also tried RETURN_RANGE, SQL_DEF_RESULT_SET and FETCH_ALL_RESULTS instead of RETURN_TOP, but their N counts rows too.
So now I see only one way to do this. I will use the following DQL:
select "r_version_label","i_chronicle_id", "i_position", "r_modify_date" , "subject","title","r_object_type","object_name","r_object_id" from "dm_document" where FOLDER (ID('0cde75d180000107')) and "r_object_type"='dm_document' order by "r_modify_date" asc, "i_position" desc ENABLE (OPTIMIZE_TOP , RETURN_TOP )
While processing the result, my application will use only the first N of the returned objects. I hope OPTIMIZE_TOP will minimize the time spent reading the objects I will not use. My DBMS is MSSQL, and the DQL Reference says that OPTIMIZE_TOP does have an effect on MSSQL.
Maybe someone can propose a better solution?
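
The client-side trimming described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the rows have already been fetched into dictionaries keyed by attribute name (the DFS call itself is not shown), all rows belonging to one object are adjacent in the result, and the function name first_n_objects is hypothetical.

```python
def first_n_objects(rows, n):
    """Collapse row-based results into objects and keep the first n.

    Assumes rows of the same object are contiguous. Input shape is
    hypothetical: one dict per row, where the repeating attribute
    r_version_label holds a single value per row.
    """
    objects = []          # collapsed objects, in result order
    current_id = None
    for row in rows:
        if row["r_object_id"] != current_id:
            if len(objects) == n:
                break     # the (n+1)-th object started: stop reading
            current_id = row["r_object_id"]
            obj = dict(row)
            obj["r_version_label"] = []
            objects.append(obj)
        # gather every repeating value under the same object
        objects[-1]["r_version_label"].append(row["r_version_label"])
    return objects


# Made-up sample rows; the object ids are placeholders, not real ids.
rows = [
    {"r_object_id": "09a", "object_name": "spec", "r_version_label": "1.0"},
    {"r_object_id": "09a", "object_name": "spec", "r_version_label": "CURRENT"},
    {"r_object_id": "09b", "object_name": "plan", "r_version_label": "1.0"},
    {"r_object_id": "09c", "object_name": "memo", "r_version_label": "1.0"},
]
top2 = first_n_objects(rows, 2)
```

Because the loop stops as soon as the (N+1)-th object begins, rows for objects that will not be used are never read from the collection.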

Try this query :
select "r_object_id", "r_version_label","i_chronicle_id", "i_position", "r_modify_date" , "subject","title","r_object_type","object_name","r_object_id" from "dm_document" where FOLDER (ID('0cde75d180000107')) and "r_object_type"='dm_document' order by "r_object_id", "r_modify_date" asc, "i_position" desc
Ordering on r_object_id should force DQL to aggregate rows into objects in the result collection.
I don't really know how it interacts with OPTIMIZE_TOP/RETURN_TOP, though.

How to build a Top Conversion Paths report in BigQuery

I have a problem building a report like "Top Conversion Paths" in Google Analytics. Any ideas how I can create this?
I found something like this, but it doesn't work (https://lastclick.city/top-conversion-paths-in-ga-and-bigquery.html):
SELECT
REGEXP_REPLACE(touchpointPath, 'Conversion >.*', 'Conversion') as touchpointPath, COUNT(touchpointPath) AS TOP
FROM (SELECT
GROUP_CONCAT(touchpoint,' > ') AS touchpointPath
FROM (SELECT
*
FROM (SELECT
fullVisitorId,
'Conversion' AS touchpoint,
(visitStartTime+hits.time) AS timestamp
FROM
TABLE_DATE_RANGE([pro-tracker-id.ga_sessions_], TIMESTAMP('2018-10-01'), TIMESTAMP('2018-10-05'))
WHERE
hits.eventInfo.eventAction="Email Submission success")
,
(SELECT
fullVisitorId,
CONCAT(trafficSource.source,'/',trafficSource.medium) AS touchpoint,
(visitStartTime+hits.time) AS timestamp
FROM
TABLE_DATE_RANGE([pro-tracker-id.ga_sessions_], TIMESTAMP('2018-10-01'), TIMESTAMP('2018-10-05'))
WHERE
hits.hitNumber=1)
ORDER BY
timestamp)
GROUP BY
fullVisitorId
HAVING
touchpointPath LIKE '%Conversion%')
GROUP BY
touchpointPath
ORDER BY
TOP DESC

It doesn't work because you have to modify the query to your needs.
This line needs to be changed to match your specific event action:
hits.eventInfo.eventAction="YOUR EVENT ACTION HERE")
The table reference and the dates need to be changed too:
TABLE_DATE_RANGE([pro-tracker-id.ga_sessions_], TIMESTAMP('2018-10-01'), TIMESTAMP('2018-10-05'))
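
To check the adapted query against a small sample, it can help to see what it is meant to compute. The following Python sketch mirrors the query's steps on in-memory data. The function name, the tuple layout and the sample events are all assumptions made for illustration; it does not talk to BigQuery.

```python
import re
from collections import Counter, defaultdict


def top_conversion_paths(events):
    """events: iterable of (visitor_id, touchpoint, timestamp) tuples."""
    per_visitor = defaultdict(list)
    # ORDER BY timestamp, then GROUP BY fullVisitorId
    for visitor, touchpoint, _ts in sorted(events, key=lambda e: e[2]):
        per_visitor[visitor].append(touchpoint)

    counts = Counter()
    for touchpoints in per_visitor.values():
        path = " > ".join(touchpoints)      # GROUP_CONCAT(touchpoint, ' > ')
        if "Conversion" not in path:        # HAVING ... LIKE '%Conversion%'
            continue
        # drop everything after the first conversion, like REGEXP_REPLACE
        path = re.sub(r"Conversion >.*", "Conversion", path)
        counts[path] += 1
    return counts.most_common()             # ORDER BY TOP DESC


# Made-up sample: u1 and u2 convert after an organic visit, u3 never converts.
events = [
    ("u1", "google/organic", 1),
    ("u1", "Conversion", 2),
    ("u1", "newsletter/email", 3),
    ("u2", "google/organic", 5),
    ("u2", "Conversion", 6),
    ("u3", "(direct)/(none)", 4),
]
result = top_conversion_paths(events)
```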

The article you shared links to information about the FLATTEN function in BigQuery Legacy SQL, which is the dialect this query is written in.
As far as I know, queries in the new BigQuery UI run as Standard SQL by default; however, you can set the SQL dialect by adding a prefix (#legacySQL or #standardSQL) as the first line of your query in the web UI, in a REST API call, or when using the Cloud client library.

Spark SQL query: org.apache.spark.sql.AnalysisException

I am trying to write a query for a twitter json file to extract the most influential person by looking at retweetCount. I need to group my output by the user, their time zone and the number of retweets in descending order.
When I run the query below I keep getting the exception:
org.apache.spark.sql.AnalysisException:
cannot resolve 'total_retweets' given input columns
t.retweeted_screen_name, t.tz, total_retweets, tweet_count;
sqlContext.sql("""
SELECT
t.retweeted_screen_name,
t.tz,
sum(retweets) AS total_retweets,
count(*) AS tweet_count
FROM (SELECT
actor.displayName as retweeted_screen_name,
body,
actor.twitterTimeZone as tz,
max(retweetCount) as retweets
FROM tweetTable WHERE body <> ''
GROUP BY actor.displayName, actor.twitterTimeZone,
body) t
GROUP BY t.retweeted_screen_name, t.tz
ORDER BY total_retweets DESC
LIMIT 10 """).collect.foreach(println)
When I try to simplify this query I run into errors like:
Column total_retweets is invalid in the select list because it is not
contained in either an aggregate function or the GROUP BY clause.
I would much appreciate any help.

When you run a SQL query, the aliases are not resolved until after the WHERE, JOIN, GROUP BY and ORDER BY clauses have run (but they are resolved before any HAVING clauses). You therefore can't ORDER BY total_retweets; you will need to ORDER BY sum(retweets) instead.
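
A minimal sketch of the rewritten query shape, run against SQLite from Python as a stand-in engine. The flattened tweets table and its rows are invented for illustration. SQLite would also accept the alias in ORDER BY, so this shows only the form of the workaround and does not reproduce the Spark exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened version of tweetTable
conn.execute(
    "CREATE TABLE tweets (screen_name TEXT, tz TEXT, body TEXT, retweets INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        ("alice", "UTC", "hello", 5),
        ("alice", "UTC", "hello", 7),
        ("alice", "UTC", "again", 2),
        ("bob", "EST", "hi", 8),
    ],
)

query = """
SELECT t.screen_name,
       t.tz,
       SUM(t.retweets) AS total_retweets,
       COUNT(*) AS tweet_count
FROM (SELECT screen_name, tz, body, MAX(retweets) AS retweets
      FROM tweets
      WHERE body <> ''
      GROUP BY screen_name, tz, body) t
GROUP BY t.screen_name, t.tz
ORDER BY SUM(t.retweets) DESC   -- aggregate expression repeated, not the alias
LIMIT 10
"""
top = conn.execute(query).fetchall()
```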

Finding most popular and most unique records using SQL

My mom wanted a baby name game for my brother's baby shower. Wanting to learn python, I volunteered to do it. I pretty much have the python bit, it's the SQL that is throwing me.
The way the game is supposed to work is everyone at the shower writes down names on paper, I manually enter them into Excel (normalizing spellings as much as possible) and export to MS Access. Then I run my python program to find the player with the most popular names and the player with the most unique names. The database, called "babynames", is just four columns.
ID | BabyFirstName | BabyMiddleName | PlayerName
---|---------------|----------------|-----------
My mom has changed things every so often, but as they stand right now, I have to figure out :
a) The most popular name (or names if there is a tie) out of all first and middle names
b) The most unique name (or names if there is a tie) out of all the first and middle names
c) The player that has the most number of popular names (wins a prize)
d) The player that has the most number of unique names (wins a prize)
I've been working on this for about a week now and can't even get a SQL query for a) and b) to work, much less c) and d). I'm more than just a bit frustrated.
BTW, I'm just looking at spellings of the names, not phonetics. As I manually enter names, I will change names like "Kris" to "Chris" and "Xtina" to "Christina" etc.
Editing to add a couple of the most recent queries I tried for a)
SELECT [BabyFirstName],
COUNT ([BabyFirstName]) AS 'FirstNameOccurrence'
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY 'FirstNameOccurrence' DESC
LIMIT 1
and
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC
LIMIT 1)
These both lead to syntax errors.
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error in ORDER BY clause. (-3508) (SQLExecDirectW)')
I've tried using [FirstNameOccurrence] and just FirstNameOccurrence as well with the same error. Not sure why it's not recognizing it by that column name to order by.
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error. in query expression 'COUNT(*) = (SELECT COUNT(*) FROM [babynames] GROUP BY [BabyFirstName] ORDER BY COUNT(*) DESC LIMIT 1)'. (-3100) (SQLExecDirectW)")
I'll admit that I'm not really grokking all of the COUNT(*) commands here, but this was a solution for a similar issue here in stackoverflow that I figured I'd try when my other idea didn't pan out.

For A and B, use a GROUP BY clause in your SQL, then count, and order by the count. Use descending order for A and ascending order for B, and just take the first result for each.
For C and D, use essentially the same strategy, but add the PlayerName to the grouping (e.g. group by babyname, playername) and again order descending or ascending as needed.
Here's Microsoft's write-up for a group by clause in MS Access: https://office.microsoft.com/en-us/access-help/group-by-clause-HA001231482.aspx
Here's an even better write-up demonstrating how to do both group by and order by at the same time: http://rogersaccessblog.blogspot.com/2009/06/select-queries-part-3-sorting-and.html
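
A rough sketch of A and B, including ties, using Python's built-in sqlite3 module as a stand-in for Access. The sample rows and the helper name names_by_count are made up. Note that SQLite accepts LIMIT while Access expects TOP, so the SQL text would need adapting before running it through pyodbc.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE babynames (ID INTEGER PRIMARY KEY, BabyFirstName TEXT, "
    "BabyMiddleName TEXT, PlayerName TEXT)"
)
conn.executemany(
    "INSERT INTO babynames (BabyFirstName, BabyMiddleName, PlayerName) "
    "VALUES (?, ?, ?)",
    [
        ("Chris", "Lee", "Ann"),
        ("Christina", "Chris", "Ann"),
        ("Lee", "Chris", "Bob"),
        ("Zed", "Lee", "Bob"),
    ],
)

# First and middle names stacked into a single column called "name"
ALL_NAMES = """
    SELECT BabyFirstName AS name FROM babynames
    UNION ALL
    SELECT BabyMiddleName FROM babynames
"""


def names_by_count(order):
    """Return every name tied for the highest ('DESC') or lowest ('ASC') count."""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    sql = f"""
        SELECT name FROM ({ALL_NAMES}) GROUP BY name
        HAVING COUNT(*) = (SELECT COUNT(*) FROM ({ALL_NAMES})
                           GROUP BY name ORDER BY COUNT(*) {order} LIMIT 1)
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql)]


most_popular = names_by_count("DESC")
most_unique = names_by_count("ASC")
```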

For the first query you tried, change it to:
SELECT TOP 1 [BabyFirstName],
COUNT([BabyFirstName]) AS FirstNameOccurrence
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT([BabyFirstName]) DESC
For the second, change it to:
SELECT [BabyFirstName]
FROM [babynames]
GROUP BY [BabyFirstName]
HAVING COUNT(*) =
(SELECT TOP 1 COUNT(*)
FROM [babynames]
GROUP BY [BabyFirstName]
ORDER BY COUNT(*) DESC)
In Access, you limit the number of records a SQL statement returns by adding a TOP clause directly after SELECT, not with ORDER BY ... LIMIT. Access also does not resolve a column alias in ORDER BY, and a quoted name there is just a string constant, so the first query repeats the COUNT expression in ORDER BY instead.
Also, TOP in Access returns every record that ties on the ORDER BY value for the top n (or n percent), so if two or more records tie for first place and TOP 1 is specified, you'll see them all.
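
The tie behaviour can be pictured with a small pure-Python sketch. It is only an analogy for how the result set is shaped, not how Access implements it, and the function name and sample counts are hypothetical.

```python
def top_with_ties(rows, n, key):
    """Return the first n rows by descending key, plus rows tied with the n-th."""
    ordered = sorted(rows, key=key, reverse=True)
    if n <= 0 or not ordered:
        return []
    if len(ordered) <= n:
        return ordered
    cutoff = key(ordered[n - 1])            # sort value of the n-th row
    return [row for row in ordered if key(row) >= cutoff]


# Made-up name counts: two names share the highest count
counts = [("Chris", 3), ("Lee", 3), ("Zed", 1)]
top1 = top_with_ties(counts, 1, key=lambda r: r[1])
```

Here a "top 1" request yields both Chris and Lee, which is convenient for the game because tied names do not need a separate query.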

What's wrong with this Oracle query?

Below is a query generated by the PetaPoco ORM for .NET. I don't have an Oracle client right now to debug it and I can't see anything obviously wrong (but I'm a SQL Server guy). Can anyone tell me why it is producing this error:
Oracle.DataAccess.Client.OracleException ORA-00923: FROM keyword not found where expected
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS" ) peta_paged
WHERE peta_rn>0 AND peta_rn<=20
Edit: Just in case it helps, this is a paging query. Regular queries (select all, select by ID) are working fine.

The problem is that the SELECT NULL in the ORDER BY clause of your analytic function is syntactically incorrect: Oracle expects a FROM clause after it, hence ORA-00923.
over (ORDER BY (SELECT NULL))
could be rewritten
over (ORDER BY (SELECT NULL from dual))
or more simply
over (ORDER BY null)
Of course, it doesn't really make sense to get a row_number if you aren't ordering the results by anything. There is no reason to expect that the set of rows that are returned would be consistent-- you could get any set of 20 rows arbitrarily. And if you go to the second page of results, there is no reason to expect that the second page of results would be completely different than the first page or that any particular result would appear on any page if you page through the entire result set.

There should be an order defined within the ORDER BY clause. For example, let's say your elements are displayed in order of column "ON_CUST_MAS"."CU_NO"; then your query should look like:
SELECT *
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY ("ON_CUST_MAS"."CU_NO")) peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS") peta_paged
WHERE peta_rn > 0
AND peta_rn <= 20
If a different column sets the order, just switch it within the ORDER BY clause. In fact, some order should always be defined; otherwise it's not guaranteed that the order won't change, and you can't be sure what will be displayed on any page.
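
For illustration, a paging query of this shape could be assembled by a small helper like the one below. This is a hypothetical Python sketch, not PetaPoco's implementation. The function name and parameters are assumptions, and it assumes the ordering column (CU_NO here) is unique so that pages do not overlap.

```python
def build_paged_query(table, columns, order_by, page, page_size):
    """Build a ROW_NUMBER() paging query with an explicit ORDER BY column.

    Identifiers are double-quoted as in the query above. This hypothetical
    helper does no escaping, so pass only trusted names.
    """
    first = (page - 1) * page_size          # number of rows to skip
    last = page * page_size                 # last row number on this page
    select_list = ",\n       ".join(f'"{table}"."{c}"' for c in columns)
    return (
        "SELECT *\n"
        f'FROM (SELECT ROW_NUMBER() OVER (ORDER BY "{table}"."{order_by}") peta_rn,\n'
        f"       {select_list}\n"
        f'FROM "{table}") peta_paged\n'
        f"WHERE peta_rn > {first} AND peta_rn <= {last}"
    )


sql = build_paged_query(
    "ON_CUST_MAS", ["CU_NO", "CU_NAME"], "CU_NO", page=1, page_size=20
)
```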

Why does ROW_NUMBER OVER (ORDER BY column) return a different result order than just ORDER BY column?

I'm on SQL Server 2008, using NHibernate as persistence layer (although this problem is purely SQL, I believe).
I've boiled down my problem to the following SQL statement:
SELECT TOP 2
this_.Id as Id36_0_,
this_.Name as Name36_0_,
ROW_NUMBER() OVER (ORDER BY this_.IsActive) as MyOrder
FROM Campsites this_
ORDER BY this_.IsActive /* a bit field */
This is part of the query that NH generates for retrieving a paged result set. The above statement gives me the following result:
Id36_0_ Name36_0_ MyOrder
9806 Camping A Cassagnau   1
8869 Camping a la ferme La Bergamotte 2
However, if I omit the ROW_NUMBER() OVER (ORDER BY this_.IsActive) - which is what NH generates for retrieving results on the first page - I get two completely different table entries in my result:
SELECT TOP 2
this_.Id as Id36_0_,
this_.Name as Name36_0_
/* ROW_NUMBER() OVER(ORDER BY this_.IsActive) as MyOrder */
FROM Campsites this_
ORDER BY this_.IsActive /* a bit field */
returns
Id36_0_ Name36_0_
22876 Centro Vacanze Pra delle Torri
22135 Molecaten Park Napoleon Hoeve
This completely confuses me and leads to a bug in our app where I get the same Campsite entry as the first element on the first AND the second page of our search.
Why does the same ORDER BY clause work differently inside the ROW_NUMBER OVER() expression?

ORDER BY this_.IsActive /* a bit field */
Since that is a bit field, it can only be 0 or 1. I assume you have many rows where this bit field is 0 or 1, so ordering by it alone doesn't make sense. What if 90% are active? You are not really ordering correctly in that case, because you don't have a second ordering column.
Why don't you pick something that is unique, maybe this_.Name for example?
Or what about this?
ROW_NUMBER() OVER (ORDER BY this_.IsActive, this_.Name)

It's basically random in both instances because a bit field is bad for any ordering (as SQL Menace noted). They are separately evaluated by the DB Engine because they have nothing to do with each other.
Note:
The internal ORDER BY only applies to the ROW_NUMBER() value ordering.
Your output ORDER BY is only this_.IsActive

You expect the result to be ordered by Name too, but only have IsActive in the ORDER BY clause.
The nature of SELECT is set-based, so you should not rely on arbitrary (but seemingly correct) ordered results of your query if you do not explicitly define the order.
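
The shape of the fix, ordering by the bit field plus a unique tiebreaker in both places, can be sketched in plain Python. The sample rows and the page helper are hypothetical, and names are assumed to be unique. Python's sort is deterministic, so this cannot reproduce the arbitrary ordering SQL Server is allowed to return; it only shows why a complete ordering yields pages that never overlap.

```python
# Made-up campsites; three of the four share the same IsActive value
campsites = [
    {"Id": 1, "Name": "Delta", "IsActive": 1},
    {"Id": 2, "Name": "Alpha", "IsActive": 1},
    {"Id": 3, "Name": "Charlie", "IsActive": 0},
    {"Id": 4, "Name": "Bravo", "IsActive": 1},
]


def page(rows, page_number, page_size):
    """Return one page, ordering by IsActive and breaking ties on Name."""
    ordered = sorted(rows, key=lambda r: (r["IsActive"], r["Name"]))
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]


page1 = page(campsites, 1, 2)
page2 = page(campsites, 2, 2)
```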
