I created a test table in HBase with 2 column families of 3 columns each, every cell holding the same value, 100,000 rows in total.
SELECT COUNT(*) does not return the correct row count (it returns far fewer rows).
However, doing SELECT on a subset of those columns gives the right count.
If I reduce the number of columns per family to 2, SELECT COUNT(*) gives the right count.
Later I tried tables with only one column family; Drill returns the right count only when the number of columns is smaller than 6.
What could be the reason for this?
Have I missed any Drill configuration?
Postgres limits the number of columns in a SELECT statement to 1664; beyond that it raises the "target lists can have at most 1664 entries" error.
In our app, we build SELECT statements dynamically and may end up with joins whose column count exceeds this limit.
My idea was to check the number of columns in the resulting SELECT statement and, if it exceeds e.g. 1000, split the statement into several SELECTs with the same WHERE clauses but different column lists, and then join the results of those SELECTs in memory, e.g.
select first 1000 columns
select second 1000 columns ... as necessary
But in order to join the partial result sets, each should contain some unique row id to enable a join.
So we introduced
SELECT DENSE_RANK() OVER (ORDER BY root.id) AS ROW_NUM, <1st 1000 column names>
FROM root
JOIN ..
but that won't work correctly since some joins represent one-to-many relations, and a single root table item may be related to multiple child items.
Is it possible to solve this problem at all?
Thanks for the contributions so far. After more digging I am re-stating the question (and indeed the title of the question) as follows:
I am selecting just 2 columns from a view that contains several columns. The view returns 50,497 rows if I select all columns, but only 50,496 (i.e. 1 fewer) when I select just 2 columns, these being [Patient_ID] (which is a bigint column) and [Condition_Code] (a varchar(6) column).
Version 1:
SELECT * FROM [vw_Query1]
returns 50,497 rows.
But:
SELECT [Patient_ID], [Condition_Code] FROM [vw_Query1]
returns 50,496 rows.
I can post the code for [vw_Query1] if required, but the key question for me is understanding, at a fundamental level, how this can happen when no GROUP BY clause has been used.
UPDATE:
It turns out that if I exclude one particular column, I get the lower count of 50,496 rows. This column is the only one with a case-sensitive collation. I still don't understand why one particular row is dropped, but at least I am getting closer to an understanding.
I am new to VBA so I apologize in advance if this seems basic to you experts but I appreciate all of the help I can get.
I have a table containing a column of reference numbers that can grow or shrink weekly. I also have a query that pulls back price list data that has changed since last week, so its results vary weekly. What I need to do is pair every row of the query results with each reference number and have all of that end up in a table created by a make-table query. For example, if there are 10 reference numbers and the query returns 10 rows, then 100 rows would be added to the table (with the reference number added at the beginning of each row). This sounds like some sort of loop, but you're the experts, not me.
Thanks in advance!
You can solve it with a cross join. In a cross join you join two tables without specifying a join condition. Such a query returns all possible combinations of rows from the two tables (this is called a Cartesian product).
SELECT col_a, col_b INTO newTable
FROM table_a, table_b
If table_a contains 10 rows and table_b contains 5 rows, this returns 50 rows.
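As a rough check of that arithmetic, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The sample data is made up, and because SQLite has no SELECT ... INTO, CREATE TABLE ... AS stands in for the make-table query here.

```python
import sqlite3

# In-memory SQLite database; this is a stand-in for Access, not the same engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (col_a INTEGER)")
conn.execute("CREATE TABLE table_b (col_b TEXT)")

# Hypothetical sample data: 10 reference numbers and 5 price rows.
conn.executemany("INSERT INTO table_a VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO table_b VALUES (?)", [(f"price{i}",) for i in range(5)])

# Listing both tables with no join condition produces the Cartesian product.
# CREATE TABLE ... AS replaces SELECT ... INTO, which SQLite does not support.
conn.execute("CREATE TABLE newTable AS SELECT col_a, col_b FROM table_a, table_b")

row_count = conn.execute("SELECT COUNT(*) FROM newTable").fetchone()[0]
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col_a, col_b FROM newTable)"
).fetchone()[0]
print(row_count, distinct_pairs)  # 50 50
```

Every reference number appears once with every price row, so no loop is needed.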
I have the following code
SELECT C_Record.BunchOfColumns, Count(*) AS Degrees
FROM C_Record
WHERE (((C_Record.[C#])=[Enter Value])) //Parameter Input from User
GROUP BY C_Record.BunchofColumns;
My Degrees column never increments, it shows 1 always no matter how many rows are returned from the query. I am suspecting that I have not implemented my GROUP BY method properly. If I understand it correctly, all columns that are selected and are not part of the aggregate function (COUNT in my case) should be put together in GROUP BY. Any help is much appreciated. Thanks in advance
Edit: What I am trying to achieve is to check how many rows have a particular value in a column, then select all other relevant columns and add an index column. For example, if three rows meet my requirement:
Col1 Col2 Degrees
A X 1
B Y 2
C Z 3
and if only 2 rows meet my requirement then
Col1 Col2 Degrees
P X 1
Q Y 2
P.S - my C_Record.BunchofColumns consists of about 10 columns that I did not include for the sake of brevity.
P.P.S - If I try to leave any column out of the GROUP BY, I get the error "You tried to execute a query that does not include the specified expression <<column_name>> as part of an aggregate function".
When you use Count() with a GROUP BY, the count returned is the number of rows in each group. So to get a count greater than one, you would need more than one row in your table with exactly the same values in the grouped columns. If you are selecting 10 different columns, it seems likely that no two rows in the table have the same values in all 10 of them.
If you start by selecting and grouping by a single column, you will see counts greater than one.
That is not how GROUP BY works.
GROUP BY completely changes the meaning of your query. Each row of the result is an "aggregate grouping" of the original rows. Each aggregate grouping consists of all the rows with a particular combination of values for their GROUP BY columns. So if you GROUP BY ten columns, each grouping will consist of rows which are identical on all ten columns.
Once these groupings have been formed, you SELECT various aggregate values like count() or sum(), which provide you with information about the group as a whole. count(*) gives you the number of rows in the group, while count(column) gives you the number of rows in which column is non-NULL. You can also select any of the columns which appear in the GROUP BY clause, because those columns are identical across the whole group.
You are getting a count(*) of one because each of your groups only contains a single row. This is probably because you are grouping by ten columns, and there are no two rows which are identical for all ten columns.
If you just want a count of how many rows satisfy some query, and you don't want this aggregation at all, you write it like this:
SELECT count(*)
FROM something
WHERE something
-- no GROUP BY
;
That will form a single aggregate group of your whole query, and count the rows.
If you want something else, you will need to further explain what you're trying to do.
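The difference can be seen in a small sketch using Python's built-in sqlite3 module. The table and column names (records, c_no, col1, col2) and the data are hypothetical, and SQLite stands in for Access here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for C_Record: c_no plays the role of [C#].
conn.execute("CREATE TABLE records (c_no INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(7, "A", "X"), (7, "B", "Y"), (7, "C", "Z"), (8, "P", "X")],
)

# Grouping by every selected column: each group holds one row, so COUNT(*) is 1.
grouped = conn.execute(
    "SELECT col1, col2, COUNT(*) FROM records WHERE c_no = ? "
    "GROUP BY col1, col2 ORDER BY col1",
    (7,),
).fetchall()

# No GROUP BY: the whole filtered result is one group.
total = conn.execute(
    "SELECT COUNT(*) FROM records WHERE c_no = ?", (7,)
).fetchone()[0]

# Grouping by a single column with repeated values gives counts above one.
by_key = conn.execute(
    "SELECT c_no, COUNT(*) FROM records GROUP BY c_no ORDER BY c_no"
).fetchall()

print(grouped)  # [('A', 'X', 1), ('B', 'Y', 1), ('C', 'Z', 1)]
print(total)    # 3
print(by_key)   # [(7, 3), (8, 1)]
```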
This query takes about a minute to give results:
SELECT MAX(d.docket_id), MAX(cus.docket_id) FROM docket d, Cashup_Sessions cus
Yet this one:
SELECT MAX(d.docket_id) FROM docket d UNION SELECT MAX(cus.docket_id) FROM Cashup_Sessions cus
gives its results instantly. I can't see what the first one is doing that would take so much longer - I mean they both simply check the same two lists of numbers for the greatest one and return them. What else could it be doing that I can't see?
I'm using jet SQL on an MS Access database via Java.
The first one is doing a cross join between the 2 tables, while the second one is not.
That's all there is to it.
The first one uses a Cartesian product to form its source data, which means that every row from the first table is paired with every row from the second one. Only after that does it scan the combined rows to find the max values of the two columns.
The second doesn't join the tables. It just finds the max from the first table and the max from the second table, and then returns two rows.
The first query makes a cross join between the tables before getting the maximums. That means each record in one table is joined with every record in the other table.
If you have two tables with 1,000 items each, you get a result with 1,000,000 items to go through to find the maximums.
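A minimal sketch of the row counts involved, using Python's built-in sqlite3 module instead of Jet. The table sizes (200 and 300 rows) are made up, and the scalar-subquery form at the end is just one possible way to get both maximums in a single row without a join; whether Jet accepts it is not checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docket (docket_id INTEGER)")
conn.execute("CREATE TABLE Cashup_Sessions (docket_id INTEGER)")

# Hypothetical sizes: 200 dockets and 300 cash-up sessions.
conn.executemany("INSERT INTO docket VALUES (?)", [(i,) for i in range(1, 201)])
conn.executemany("INSERT INTO Cashup_Sessions VALUES (?)", [(i,) for i in range(1, 301)])

# Rows the cross join has to produce before MAX() can be evaluated.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM docket d, Cashup_Sessions cus"
).fetchone()[0]

# The slow form from the question: same answer, but computed over the product.
cross_max = conn.execute(
    "SELECT MAX(d.docket_id), MAX(cus.docket_id) FROM docket d, Cashup_Sessions cus"
).fetchone()

# Each table scanned once on its own, combined into one row with scalar subqueries.
separate_max = conn.execute(
    "SELECT (SELECT MAX(docket_id) FROM docket), "
    "(SELECT MAX(docket_id) FROM Cashup_Sessions)"
).fetchone()

print(joined_rows)   # 60000
print(cross_max)     # (200, 300)
print(separate_max)  # (200, 300)
```

Both forms return the same maximums; only the amount of intermediate data differs (200 x 300 = 60,000 combined rows versus 200 + 300 = 500 rows read).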