Return the row with the maximum value of a specific column when two rows share the same value - sql

I have a result from a SQL query, which is displayed below.
I want to build a SQL query that, when two or more rows have the same number in the first column, returns only the row with the maximum number in the last column.
For instance, in the table you can see that the top two rows have the same number in the first column, 2195333. The query should return the first row and the rest of the rows, discarding only the 2nd row, since the last column of the 2nd row is 1, which is smaller than the 2 in the 1st row.
I was thinking about using a WHILE loop in SQL: loop from the first row to the last and, whenever rows have the same value in the first column, return the row with the maximum value in the last column. Since I am new to SQL, I have no idea how to implement it. Please help me. Thanks.

The question, sample data, and desired results are lacking a bit.
But if I understand your question, you can use the WITH TIES clause in concert with Row_Number()
Example
Select Top 1 with ties *
From YourTable
Order By Row_Number() over (Partition By YourCol1 Order By YourLastCol Desc)
Edit: Use Dense_Rank() instead if you want to see ties.
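TOP 1 WITH TIES is specific to SQL Server. As a rough sketch of the same idea in a more portable shape, the row number can be computed in a subquery and filtered on. It is run here through Python's sqlite3 module only so that it is self-contained; the table name your_table, the column names, and the sample rows are made up for illustration, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the asker's query result
conn.execute(
    "CREATE TABLE your_table (your_col1 INTEGER, other_col TEXT, your_last_col INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(2195333, "a", 2), (2195333, "b", 1), (2195334, "c", 1), (2195335, "d", 3)],
)

# Number the rows inside each your_col1 group, highest your_last_col first,
# then keep only the first row of every group.
query = """
SELECT your_col1, other_col, your_last_col
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY your_col1
                              ORDER BY your_last_col DESC) AS rn
    FROM your_table AS t
) AS ranked
WHERE rn = 1
ORDER BY your_col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Swapping ROW_NUMBER() for DENSE_RANK() in the subquery would keep every row tied for the maximum instead of one arbitrary row per group.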

Related

How can I remove Null value from first column but keep the value of the 2nd and thirds columns

I am Omar, a new learner of SQL.
I have a large excel sheet that I want to analyze by SQL.
It has the following columns (Manufacturers, Products, sales)
The problem is that in the first column, 'Manufacturers', each manufacturer name has been entered only once. The cells in the rows below it are empty until the next manufacturer.
Please refer to the attached image for more understanding.
How can I remove these null values in my query results while keeping the values of the product column value?
thank you
The main problem you have is that SQL tables represent unordered sets. So, if you have only your specified columns, you cannot reconstruct the Excel format.
To solve this, you want to load the data into a table that has an identity or auto-incremented column, in order to preserve the insertion order. The exact details depend on the database. Let me call this column id.
Then you can "spread" the value where it is missing. One method is:
select t.*,
       max(manufacturer) over (partition by manufacturer_grp) as imputed_manufacturer
from (select t.*,
             count(manufacturer) over (order by id) as manufacturer_grp
      from t
     ) t
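To see why this works: count(manufacturer) ignores NULLs, so the running count only increases on rows where a manufacturer name is present, and every blank row inherits the group number of the name above it. A minimal sketch, run through Python's sqlite3 module so it is self-contained. The sample rows are invented, the empty Excel cells are assumed to have been loaded as NULL (not as empty strings), and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id preserves the insertion order, as suggested above
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "manufacturer TEXT, product TEXT, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO t (manufacturer, product, sales) VALUES (?, ?, ?)",
    [("Acme", "bolt", 10), (None, "nut", 5), (None, "washer", 7),
     ("Globex", "gear", 3), (None, "spring", 8)],
)

query = """
SELECT id, product, sales,
       MAX(manufacturer) OVER (PARTITION BY manufacturer_grp) AS imputed_manufacturer
FROM (SELECT t.*,
             COUNT(manufacturer) OVER (ORDER BY id) AS manufacturer_grp
      FROM t) AS g
ORDER BY id
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```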

Renaming Row Count Column in SQL

I can't find how to rename the row counting column in a table in a SQL Server RDBMS. When you create a table and you have user-created columns, A and B for example, to the farthest right of those columns you have the Row Number column.
It does not have a title. It just sequentially counts all the rows in your table. It's default. Is it possible to manipulate this column denoting the row numbers? Meaning, can I rename it, put its contents in descending order, etc. If so, how?
And if not, what are the alternatives to have a sequentially counting column counting all the rows in my table?
No. You can create your own column with sequential values using an identity column. This is usually a primary key.
Alternatively, when you query the table, you can assign a sequential number (with no gaps) using row_number(). In general, you want a column that specifies the ordering:
select t.*, row_number() over (order by <ordering column>) as my_sequential_column
from t;
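As an illustration of the row_number() approach, here is a small sketch that produces both an ascending and a descending sequence over the same ordering column. It runs through Python's sqlite3 module to stay self-contained; the table t, its columns a and b, and the sample rows are hypothetical, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("x", 30), ("y", 10), ("z", 20)])

# Two gap-free sequences computed at query time; b is the ordering column.
numbered = conn.execute("""
    SELECT a, b,
           ROW_NUMBER() OVER (ORDER BY b)      AS seq_asc,
           ROW_NUMBER() OVER (ORDER BY b DESC) AS seq_desc
    FROM t
    ORDER BY b
""").fetchall()
for row in numbered:
    print(row)
```

The column alias is chosen freely in the query, which covers the renaming part of the question.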

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row of the highest value for timestamp (an integer) for each person (that has multiple entries) in a table. Additionally, I am only interested in rows with the field containing ABCD, but this should be done after filtering to return the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the size of expectation. Is there some step here that I am not getting correct?
You can first build a table of max(timestamp) per person and then join it back to the original table in an outer select. The query is as follows:
SELECT t."person", t."timestamp"
FROM table t
JOIN (SELECT "person", MAX("timestamp") AS max_ts
      FROM table GROUP BY "person") m
  ON m."person" = t."person" AND m.max_ts = t."timestamp"
WHERE t."type" = 1 AND t."field" LIKE '%ABCD%'
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I am surprised your query worked at all. Your HAVING clause referenced a column not in your query. From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
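The order of the two steps changes the result. Filtering in WHERE picks the latest row among the matching rows, while the question seems to ask for the latest row per person first and the ABCD filter afterwards. A rough sketch of both, run through Python's sqlite3 module so it is self-contained. The table name events, the column ts (standing in for timestamp), and the sample rows are assumptions made for illustration, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (person TEXT, ts INTEGER, type INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [("ann", 1, 1, "xxABCDxx"), ("ann", 2, 1, "other"),
     ("bob", 1, 1, "other"), ("bob", 5, 1, "ABCD")],
)

# Filter first: latest timestamp among the rows that already match ABCD.
filter_first = conn.execute("""
    SELECT person, MAX(ts)
    FROM events
    WHERE type = 1 AND field LIKE '%ABCD%'
    GROUP BY person
    ORDER BY person
""").fetchall()

# Latest first: pick each person's newest type-1 row, then test it for ABCD.
latest_first = conn.execute("""
    SELECT person, ts
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY person ORDER BY ts DESC) AS rn
          FROM events AS e
          WHERE type = 1) AS ranked
    WHERE rn = 1 AND field LIKE '%ABCD%'
    ORDER BY person
""").fetchall()

print(filter_first)
print(latest_first)
```

In this sample ann's newest row does not contain ABCD, so she appears only in the filter-first result.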

What value is selected into parameter in SQL query without where clause

For example, I have this query
SELECT @param = column from table
What value is pulled into @param?
I tried this and can't figure out the value that is being pulled. It is not the old record or newer one.
The documentation states:
the variable is assigned the last value that is returned
But without a WHERE clause that uniquely identifies a row, or an ORDER BY clause that specifies a unique value for ordering, the row chosen for the variable assignment is undefined and not deterministic when the table has more than one row.
You could add ORDER BY to the query to return the last ordered row. A more efficient method would be to use SELECT TOP(1)...ORDER BY...DESC. Conversely, SELECT TOP(1)...ORDER BY...ASC will return the first ordered row. Again, the ORDER BY column(s) need to be unique for a deterministic value.
This is the value in the column referenced. It seems like it should have a TOP 1 in it, with a WHERE Clause designed to fetch 1 row only.
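A small sketch of the deterministic variant, run through Python's sqlite3 module so it is self-contained. SQLite has no variables or TOP, so LIMIT 1 stands in for TOP(1) and the fetched value stands in for the variable; the table t, its columns, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(2, "b"), (3, "c"), (1, "a")])

# Ordering by a unique column makes the chosen row well defined.
last_value = conn.execute("SELECT col FROM t ORDER BY id DESC LIMIT 1").fetchone()[0]
first_value = conn.execute("SELECT col FROM t ORDER BY id ASC LIMIT 1").fetchone()[0]
print(first_value, last_value)
```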

PowerPivot - only newest values on current context

I have a problem with PowerPivot.
Let's have a look at only 3 columns in my data source:
date - clientid - category
Category can only be 1 or 2.
In the data source you can often find the same clientid several times within a given time period, sometimes with a different category.
So in my pivot table, I can see the distinct count of my clients depending on the chosen timeline.
But, of course, the sum of clients for cat=1 and cat=2 is bigger than the distinct count.
Is it possible to count only the newest entries for every clientid, so that the sum of the two cats is the same as the distinct count of my clients?
Thanks in advance to everybody who helps and spends time on this.
Stefan
This was fun! Thanks for an interesting problem. Normally for this sort of thing we might flag the most recent entry for a given clientid in an extra field, but yours needs to be dynamic at runtime based on your date filter selection.
Here we go. Be warned, it's a doozy.
CountCat:=
COUNTROWS(
    FILTER(
        GENERATE(
            VALUES( ClientCats[clientid] )
            ,CALCULATETABLE(
                SAMPLE(
                    1
                    ,SUMMARIZE(
                        ClientCats
                        ,ClientCats[date]
                        ,ClientCats[category]
                    )
                    ,ClientCats[date]
                    ,DESC
                )
                ,ALL( ClientCats[category] )
            )
        )
        ,CONTAINS(
            VALUES( ClientCats[category] )
            ,ClientCats[category]
            ,ClientCats[category]
        )
    )
)
Let's work through it.
COUNTROWS() is trivial.
FILTER() takes a table as its first argument. It creates a row context by iterating row-by-row through this table. It evaluates a boolean expression in each row context and returns the rows for which the expression returns true. We're not getting to that expression for a little while here. Let's look at the table we'll be filtering.
GENERATE() takes a table as its input and creates a row context by iterating row-by-row through that table. For each row context it evaluates a second table, and cross joins the rows that exist in the second table expression in the current row context from the first table with the row from the first table.
Our first table is VALUES( ClientCats[clientid] ), which is simply a distinct list of all [clientid]s in context from the pivot table.
We then evaluate CALCULATETABLE() for each row context, aka for each [clientid]. CALCULATETABLE() evaluates a table expression in the filter context determined by its second and subsequent arguments.
SAMPLE() is the table we'll evaluate. SAMPLE() is like TOPN(), but with ties broken non-deterministically. SAMPLE( 1, ... ) always returns one row. TOPN( 1, ... ) returns all rows that are tied for first position.
SAMPLE(), here, will return one row from the table defined by SUMMARIZE(). SUMMARIZE() groups by the fields in a table that are named. Thus we have a table of all distinct values of [date] and [category] that are included based on the context determined by our CALCULATETABLE(). SAMPLE()'s third argument defines a sort-by column to determine which rows are first, and its fourth determines the sort order. Thus for each [clientid] we are returning the latest row in the SUMMARIZE() for that [clientid].
The ALL() in our CALCULATETABLE() strips the context from the field [category] that might be coming in from our pivot table. This means that every time we evaluate our GENERATE() (remember we're still in that function here), we get a table of all [clientid]s that exist in context, and their most recent [category], even when we're evaluating in a pivot cell that has filtered [category].
That sounds like a problem - we'd expect the same count now for every pivot cell. And that's what we'd get if we did COUNTROWS( GENERATE() ). But wait, we're still in FILTER()!
Now we get to the boolean expression which will filter the rows of that GENERATE(). CONTAINS() takes a table as its first argument, a reference to a column in that table as its second argument, and a scalar value as its third argument. It returns true if the column in argument 2, of the table in argument 1, contains the value in argument 3.
We are outside of the CALCULATETABLE(), and therefore context exists on [category]. VALUES() returns the unique rows in context. In any pivot cell filtered by [category], this will be a 1x1 table, but in our grand total, it will have multiple rows.
So, the column in that VALUES() we want to test is [category] (the only column that exists in that VALUES()).
The value we want to test for is referred to by ClientCats[category]. That third argument evaluates [category] in the row context determined by FILTER(). Thus we return true for every row that matches the current filter context (in a pivot cell) of ClientCats[category]. Mind-bending stuff here.
Anyway, the upshot is that in a [category]-filtered pivot cell, we get the number of distinct [clientid]s that have, for the time frame selected, that [category] value as their most recent category.
For the grand total we get every [clientid] in context.
This will probably not have a very good performance curve.
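For readers who find the DAX hard to follow, the logic of the measure can be sketched in plain Python. This is only an illustration of what the measure computes, not how Power Pivot evaluates it; the sample rows, the function name count_cat, and its parameters are hypothetical. Unlike SAMPLE(), ties on the same date are resolved here by keeping the first row seen.

```python
from datetime import date

# Hypothetical rows: (date, clientid, category)
rows = [
    (date(2015, 1, 5), "c1", 1),
    (date(2015, 2, 9), "c1", 2),
    (date(2015, 1, 7), "c2", 1),
    (date(2015, 3, 1), "c3", 2),
    (date(2015, 3, 4), "c3", 1),
]

def count_cat(rows, start, end, categories=(1, 2)):
    """Count clients whose newest row within [start, end] has a category in `categories`."""
    latest = {}  # clientid -> (date, category) of the newest row in the date range
    for d, client, cat in rows:
        if start <= d <= end and (client not in latest or d > latest[client][0]):
            latest[client] = (d, cat)
    # Keep only clients whose most recent category matches the pivot cell's filter
    return sum(1 for _, cat in latest.values() if cat in categories)

year = (date(2015, 1, 1), date(2015, 12, 31))
cat1 = count_cat(rows, *year, categories=(1,))
cat2 = count_cat(rows, *year, categories=(2,))
total = count_cat(rows, *year)
print(cat1, cat2, total)
```

The two category counts add up to the grand total, which is the property the question asked for, and narrowing the date range changes which row counts as the newest for each client.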
Here's a sample workbook to play with the functioning measure defined.
Edit
Based on replies below.
Do you need to maintain in the model all the rows that have [UseClient] <> 1? Deduping and flagging is always easier in tools other than Power Pivot.
I have no idea how you've determined the values for 1 in [UseClient]. None are the most recent entry for a given [ClientID]. If you want to just flag the most recent row, which is what it sounds like you want, but not what your workbook looks like, you can do a calculated column much more easily than doing this in a measure:
=SAMPLE(
    1
    ,CALCULATETABLE( // return all dates for the [clientid] on current row
        VALUES( ClientCats[date] )
        ,ALLEXCEPT( ClientCats, ClientCats[clientid] )
    )
    ,ClientCats[date]
    ,DESC
) = ClientCats[date] // row context in table
This will return true when the value of [date] on a given row is equal to the maximum [date] for the client on that row.
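The same flag can be sketched outside Power Pivot, for example when preparing the data before loading it. A minimal Python version with invented rows; ISO-formatted date strings are assumed so that string comparison matches chronological order.

```python
# Hypothetical rows: (date, clientid, category)
rows = [
    ("2015-01-05", "c1", 1),
    ("2015-02-09", "c1", 2),
    ("2015-01-07", "c2", 1),
    ("2015-03-01", "c3", 2),
    ("2015-03-04", "c3", 1),
]

# First pass: newest date per client
max_date = {}
for d, client, _ in rows:
    max_date[client] = max(d, max_date.get(client, d))

# Second pass: a row is flagged when its date equals its client's newest date
is_latest = [d == max_date[client] for d, client, _ in rows]
print(is_latest)
```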
One thing you could easily do in Power Query is to group [clientid] and take the max date for each [clientid]. Then you have one row per client.
This is all different than your original question, though, because your original wants to find the maxes based on date selection. But a calculated column is not updated based on filter context. It's only recalculated at model refresh time. If you're willing to use a calculated column, then just deal with your data issues before bringing it into Power Pivot.