Sample Oracle SQL Randomly - in absence of ROWID - sql

I have a weird problem using the SAMPLE clause. Why does the first query fail while the second one works fine?
SELECT * FROM SYS.ALL_TABLES SAMPLE(10)
SELECT * FROM MIDAS.GERMPLASM SAMPLE(10)
I'm trying to SAMPLE a SQL query, not just a table, but I could not figure out how I should use the SAMPLE clause. Is there any other way besides the SAMPLE clause? Note: I want to do this in a random manner, not just take the first N rows.
Update:
First of all, thank you for reading this question and helping. I already know that the first query does not work because the SAMPLE clause relies on a hidden column, ROWID. What I don't know is how to do this if ROWID does not exist in the table.
Here is a reproducible example of the query that I am trying to sample randomly:
SELECT cols.table_name, cols.column_name, cols.position, cons.status, cons.owner, cons.constraint_type
FROM all_constraints cons, all_cons_columns cols
WHERE cons.constraint_name = cols.constraint_name
AND cons.owner = cols.owner
ORDER BY cols.table_name, cols.position
I want to get a small random subset of the data (from the query) to compute statistical properties of the table columns before fetching everything from the DB.
Thank you

The error message you get when you try to run the first query is a pretty big clue:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.
It's pretty clear to me from this that the SAMPLE functionality requires access to ROWID to work. As ROWID is a pseudocolumn that the database uses to physically locate a row, any query where the ROWID is indeterminate (such as when the data is aggregated), cannot use SAMPLE on the outer query. In the case of ALL_ALL_TABLES, the fact that it is a view that combines two tables via UNION blocks access to the ROWID.
From your revised question, the first thing that jumps out at me is that the SAMPLE clause must be in the FROM clause, between the table name and any alias. I was able to sample in a query with joins like this:
SELECT *
FROM table_a SAMPLE (10) a
JOIN table_b SAMPLE (10) b
ON a.column1 = b.column1
Regarding your actual query, I tried using the tables (again, actually views) that you're trying to sample one at a time:
select * from all_constraints sample(10)
ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table
select * from all_cons_columns sample(10)
ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table
This message is pretty clear: none of the tables in these views are key-preserved (i.e. guaranteed to return each row no more than once), so you can't sample them.
The following query should work to manually create a random sample, using DBMS_RANDOM.
SELECT *
FROM (SELECT cols.table_name,
             cols.column_name,
             cols.position,
             cons.status,
             cons.owner,
             cons.constraint_type,
             DBMS_RANDOM.VALUE rnd
      FROM all_constraints cons
      JOIN all_cons_columns cols
        ON cons.constraint_name = cols.constraint_name
       AND cons.owner = cols.owner)
WHERE rnd < .1
ORDER BY table_name, position
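If a fixed number of rows is more useful than an approximate percentage, one common variant is to shuffle the joined result and keep the first N rows. The following is a minimal sketch, assuming a sample size of 100 (an arbitrary value chosen for illustration). Note that it sorts the whole result, so it can be slow on large row sets:

```sql
-- Shuffle the joined rows, then keep the first 100 of the shuffled set.
-- The sample size (100) is an assumption, not something from the question.
SELECT *
FROM (SELECT cols.table_name,
             cols.column_name,
             cols.position,
             cons.status,
             cons.owner,
             cons.constraint_type
      FROM all_constraints cons
      JOIN all_cons_columns cols
        ON cons.constraint_name = cols.constraint_name
       AND cons.owner = cols.owner
      ORDER BY DBMS_RANDOM.VALUE)
WHERE ROWNUM <= 100;
```

The ROWNUM filter has to sit outside the inline view so that it is applied after the random ordering rather than before it.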

Related

Determine datatypes of columns - SQL selection

Is it possible to determine the data type of each column after a SQL selection, based on the received results? I know it is possible through information_schema.columns, but the data I receive comes from multiple tables that are joined together, and the columns are renamed. Besides that, I'm not able to see or use this query or execute other queries myself.
My job is to store this received data in another table, but without knowing beforehand what I will receive. I'm obviously able to check, for example, whether a certain column contains numbers or text, but not whether it was originally stored as a TINYINT(1) or a BIGINT(128). How should I approach this? To clarify, it is all right if the data types of the source and destination columns aren't entirely the same, but I don't want to reserve too much space beforehand (or too little, for that matter).
As I'm typing, I realize I'm formulating the question wrong. What would be the best approach to handle the described situation? I thought about altering tables on the fly (e.g. increasing the size if needed), but that seems a bit, well, wrong and not the proper way.
Thanks
Can you issue the following query about your new table after you create it?
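-- Materialize the joined result into a new table
SELECT *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID;
-- Then inspect the column types the database chose for it
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'JoinedQueryResults';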
Is the query too big to run before knowing how big the results will be? If so, get an idea of how many rows it may return first. The trick with queries that contain joins is to group on the columns you are joining on, which helps the estimate return more quickly. Here's an example that returns just the row count of the query that would have created the JoinedQueryResults table above.
SELECT SUM(A.NumRows * B.NumRows)
FROM (SELECT ID, COUNT(*) AS NumRows
FROM TableA
GROUP BY ID) AS A
INNER JOIN (SELECT ID, COUNT(*) AS NumRows
FROM TableB
GROUP BY ID) AS B ON A.ID = B.ID
The query above will run faster if all you need is a record count to help you estimate a size.
Also try instantiating a table for your results with a query like this.
SELECT TOP 0 *
INTO JoinedQueryResults
FROM TableA AS A
INNER JOIN TableB AS B ON A.ID = B.ID

Translating query from Firebird to PostgreSQL

I have a Firebird query which I should rewrite into PostgreSQL code.
SELECT TRIM(RL.RDB$RELATION_NAME), TRIM(FR.RDB$FIELD_NAME), FS.RDB$FIELD_TYPE
FROM RDB$RELATIONS RL
LEFT OUTER JOIN RDB$RELATION_FIELDS FR ON FR.RDB$RELATION_NAME = RL.RDB$RELATION_NAME
LEFT OUTER JOIN RDB$FIELDS FS ON FS.RDB$FIELD_NAME = FR.RDB$FIELD_SOURCE
WHERE (RL.RDB$VIEW_BLR IS NULL)
ORDER BY RL.RDB$RELATION_NAME, FR.RDB$FIELD_NAME
I understand SQL, but have no idea how to work with system tables like RDB$RELATIONS. It would be really great if someone could help me with this, but even some links explaining these tables would be OK.
The query is embedded in C++ code, and when I try to run it like this:
pqxx::connection conn(serverAddress.str());
pqxx::work trans(conn);
pqxx::result res(trans.exec(/* the SQL query above */)); // the error is raised here
it reports:
RDB$RELATIONS doesn't exist.
Postgres stores information about database objects differently, in what it calls the system catalogs.
In Firebird your query basically returns a row for every column of a table in every schema with an additional Integer column that maps to a field datatype.
In Postgres using system tables in pg_catalog schema something similar can be achieved using this query:
SELECT
TRIM(c.relname) AS table_name, TRIM(a.attname) AS column_name, a.atttypid AS field_type
FROM pg_class c
LEFT JOIN pg_attribute a ON
c.oid = a.attrelid
AND a.attnum > 0 -- only ordinary columns, without system ones
WHERE c.relkind = 'r' -- only tables
ORDER BY 1,2
The above query returns the system catalogs as well. If you'd like to exclude them, you need to add another JOIN to pg_namespace and a WHERE clause with pg_namespace.nspname <> 'pg_catalog', because this is the schema where the system catalogs are stored.
If you'd also like to see datatype names instead of their representative numbers add a JOIN to pg_type.
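Both suggestions can be combined into a single query. This is a minimal sketch rather than a drop-in replacement: the output aliases (schema_name, type_name) are illustrative, and excluding information_schema and dropped columns goes beyond what is described above:

```sql
-- One row per column of every ordinary user table, with the type name
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       a.attname AS column_name,
       t.typname AS type_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_attribute a
       ON a.attrelid = c.oid
      AND a.attnum > 0            -- only ordinary columns, without system ones
      AND NOT a.attisdropped      -- assumption: dropped columns are not wanted
LEFT JOIN pg_type t ON t.oid = a.atttypid
WHERE c.relkind = 'r'             -- only tables
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2, 3;
```

The LEFT JOINs mirror the Firebird query, so a table without columns still produces a row with NULL column and type names.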
The information schema consists of a collection of views. In most cases you don't need everything that the query behind a view does, so querying the system tables directly will usually perform better. You can still inspect the view definitions to get started on the tables and conditions used to produce the output.
I think you are looking for the information_schema.
The tables are listed here: https://www.postgresql.org/docs/current/static/information-schema.html
So for example you can use:
select * from information_schema.tables;
select * from information_schema.columns;
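To get closer to the shape of the Firebird result with these views, the column listing can be narrowed down. A rough sketch, assuming only user schemas are of interest (the schema filter is an assumption, not part of the original query):

```sql
-- Table name, column name and declared type for base tables in user schemas
SELECT c.table_name,
       c.column_name,
       c.data_type
FROM information_schema.columns c
JOIN information_schema.tables t
  ON t.table_schema = c.table_schema
 AND t.table_name = c.table_name
WHERE t.table_type = 'BASE TABLE'   -- skip views, like RDB$VIEW_BLR IS NULL
  AND c.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.table_name, c.column_name;
```

Unlike the Firebird query, this returns a type name rather than a numeric type code.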

Oracle 11g : Meta data query very slow

I have this view that should display comments and constraints (including check conditions where applicable) for the columns of some tables in a schema.
Essentially I'm (left) joining ALL_COL_COMMENTS to ALL_CONS_COLUMNS to ALL_CONSTRAINTS.
However, this is really slow for some reason (it takes around 10 seconds), even though I have a very small number of tables (just 7) and a very small number of columns (58 in total). So the query returns few rows, and it's still slow. What can I do?
CREATE OR REPLACE FORCE VIEW "MYDB"."COMMENTS_VIEW" ("TABLE_NAME", "COLUMN_NAME", "COMMENTS", "CONSTRAINT_TYPE", "CHECK_CONDITION") AS
SELECT r.TABLE_NAME, r.COLUMN_NAME, r.COMMENTS, DECODE(q.CONSTRAINT_TYPE,'P', 'Primary Key', 'C', 'Check Constraint', 'R', 'Referential Integrity Constraint' ), q.SEARCH_CONDITION AS CHECK_CONDITION
FROM ALL_COL_COMMENTS r -- ALL_COL_COMMENTS has the COMMENTS
LEFT JOIN ALL_CONS_COLUMNS p ON (p.TABLE_NAME = r.TABLE_NAME AND p.OWNER = 'MYDB' AND p.COLUMN_NAME = r.COLUMN_NAME) -- ALL_CONS_COLUMNS links COLUMNS to CONSTRAINTS
LEFT JOIN ALL_CONSTRAINTS q ON (q.OWNER = 'MYDB' AND q.CONSTRAINT_NAME = p.CONSTRAINT_NAME AND q.TABLE_NAME = p.TABLE_NAME AND (q.CONSTRAINT_TYPE = 'C' OR q.CONSTRAINT_TYPE = 'P' OR q.CONSTRAINT_TYPE = 'R' ) ) -- this gives us INFO on CONSTRAINTS
WHERE r.OWNER = 'MYDB'
AND
r.TABLE_NAME IN ('TABLE1', 'TABLE2', 'TABLE3', 'TABLE4', 'TABLE5', 'TABLE6', 'TABLE7')
AND
r.COLUMN_NAME NOT IN ('CREATED', 'MODIFIED', 'CREATED_BY', 'MODIFIED_BY')
ORDER BY r.TABLE_NAME, r.COLUMN_NAME, r.COMMENTS;
Ensure the dictionary and fixed object statistics are up-to-date. Checking for up-to-date statistics is a good first step for almost any SQL performance problem. The dictionary and fixed objects are unusual, and there's a good chance nobody has considered gathering statistics on them before.
begin
dbms_stats.gather_fixed_objects_stats;
dbms_stats.gather_dictionary_stats;
end;
/
Try to join on table and column IDs instead of names where possible, even for OWNER if you can. Example:
ON p.TABLE_ID = r.TABLE_ID
Also, you are selecting from objects that are already views of who knows how many underlying tables. The query optimizer is probably having a hard time (and maybe giving up in some aspects). Try to translate your query into using the base tables.
I would either use a query profiler, or (simpler) just remove parts of your query until it gets super fast. For example, remove the DECODE() call, maybe that's doing it.
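Before removing parts of the query by trial and error, it may help to look at the execution plan. A minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN, with the view name taken from the question:

```sql
-- Ask the optimizer for its plan without running the query
EXPLAIN PLAN FOR
SELECT * FROM MYDB.COMMENTS_VIEW;

-- Show the plan that was just stored in the plan table
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Steps with a large estimated row count or cost are reasonable candidates to simplify first.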

ORA-01446 occurs if I try to select random rows using SAMPLE clause in Oracle

Since the number of rows in the table is too large I switched from "ORDER BY dbms_random.value" construction for getting 1000 random rows to SAMPLE clause. It takes less than a second instead of 3 minutes to complete. But on some tables I get this error
ORA-01446: cannot select ROWID from view with DISTINCT, GROUP BY, etc
My query looks like this:
SELECT t1.columnA FROM
(SELECT columnA FROM table1 sample(1) where rownum <= 1000) t1
JOIN table2 t2
ON (t1.columnA = t2.columnA)
WHERE t2.columnB IS NOT NULL
and it works fine on some tables, but fails on others. I gave up googling, could you please advise any workaround in my situation.
As I expected SAMPLE clause works faster than all other solutions (Here you can see some of them)
Because I'm new to Oracle DBs generally and Oracle SQL Developer in particular, I mistakenly called a view a "table". After I found that out, the solution was clear.
SOLUTION: I had to look at the SQL query that forms the view and replace the view name with that query.
For example, my table1 was actually a view, so I replaced its name with the SELECT query that forms that view:
SELECT t1.columnA FROM
(SELECT columnA FROM (select distinct tt1.columnA, tt2.columnC
from table22 tt2, table11 tt1
where tt2.columnC = tt1.columnA) sample(1) where rownum <= 1000) t1
JOIN table2 t2
ON (t1.columnA = t2.columnA)
WHERE t2.columnB IS NOT NULL
After that I could work with tables and apply SAMPLE to them! Thank you everybody, great website! =)
PS: sorry for my English and ugly code facepalm.jpg
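For anyone in the same situation, the data dictionary can show whether a name refers to a table or a view, and what query a view is built on. A small sketch, using TABLE1 as a placeholder for the object name (unquoted names are stored in upper case):

```sql
-- Is the object a TABLE or a VIEW?
SELECT owner, object_name, object_type
FROM all_objects
WHERE object_name = 'TABLE1';

-- If it is a view, show the query that defines it
SELECT text
FROM all_views
WHERE view_name = 'TABLE1';
```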

Oracle Join View - which rowid is used

CREATE VIEW EVENT_LOCATION ("EVENT_ID", "STREET", "TOWN") AS
SELECT A.EVENT_ID, A.STREET, A.TOWN
FROM TBLEVENTLOCATION A
JOIN TBLEVENTS B
ON A.EVENT_ID = B.EVENT_ID
WHERE B.REGION = 'South';
If I run
SELECT ROWID, STREET, TOWN FROM EVENT_LOCATION
then which ROWID should I get back?
Reason I'm asking is:
In the database there are many views with the above 'pattern'. It seems to differ which rowid is being returned from different views. ie. I am getting both A.ROWID or B.ROWID ...
UPDATE:
I have resolved this using the following view, which essentially guarantees that the ROWID comes from the right table. Thanks for your replies!
CREATE VIEW EVENT_LOCATION ("EVENT_ID", "STREET", "TOWN") AS
SELECT A.EVENT_ID, A.STREET, A.TOWN
FROM TBLEVENTLOCATION A
WHERE A.EVENT_ID IN (SELECT EVENT_ID FROM TBLEVENTS WHERE REGION = 'South');
Try looking at
select * from user_updatable_columns where table_name = 'EVENT_LOCATION'
The columns that are updatable should indicate the table (and hence the rowid) which Oracle says is the child.
Bear in mind that, if you use multi-table clusters (not common, but possible), then different tables in the same cluster can have records with the same ROWID.
Personally, I'd recommend (a) don't use ROWID in your code anywhere and (b) if you do, then include an explicit evt.rowid evt_rowid column in the view.
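Option (b) could look roughly like this for the view in the question. It is a sketch only: the column name EVT_ROWID is an illustrative choice, and the ROWID is taken from TBLEVENTLOCATION on the assumption that this is the table callers care about:

```sql
CREATE OR REPLACE VIEW EVENT_LOCATION ("EVENT_ID", "STREET", "TOWN", "EVT_ROWID") AS
SELECT A.EVENT_ID, A.STREET, A.TOWN,
       A.ROWID            -- explicitly the ROWID of TBLEVENTLOCATION
FROM TBLEVENTLOCATION A
JOIN TBLEVENTS B
  ON A.EVENT_ID = B.EVENT_ID
WHERE B.REGION = 'South';

-- Callers then select the named column instead of the pseudocolumn
SELECT EVT_ROWID, STREET, TOWN FROM EVENT_LOCATION;
```

This removes any ambiguity about which base table the row identifier belongs to.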
Since you get ORA-01445 if none of the tables you use are key-preserved, I think it will return the ROWID of one of the key-preserved tables. I don't know what will happen if several tables are key-preserved.