Only display first instance of event for each unique person - sql

I have a table below (that also has several other columns, but for the purpose of this example, I'll exclude them) where I only want to include the very first instance for each person (unique_id) by date, which is in DATETIME format.
In the past I've used something like:
SELECT
*,
least(min(date_event), min(date_event)) as min_date
FROM
table
GROUP BY
unique_id ,issue, date_event, age_at_event
However, this is still returning multiple records for each person, rather than just the very first instance?
unique_id | issue   | date_event          | age_at_event
---------------------------------------------------------
1234      | issue_a | 2016-04-01T00:00:00 | 6
1234      | issue_a | 2016-04-01T00:00:00 | 6
1234      | issue_b | 2018-04-01T00:00:00 | 8
5678      | issue_a | 2019-09-01T00:00:00 | 2
5678      | issue_a | 2021-09-01T00:00:00 | 4
65431     | issue_c | 2019-09-01T00:00:00 | 1
1234      | issue_a | 2022-09-01T00:00:00 | 12

You can use the QUALIFY clause, if your database supports it, to filter on the result of a window function.
With the sample data you provided, the following query:
select *
from sample_data
qualify row_number() over (partition by unique_id order by date_event) = 1
returns only the earliest row (by date_event) for each unique_id.
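Not every engine has QUALIFY. As a rough sketch of the same idea, the window function can be moved into a subquery and filtered with WHERE. The example below runs that form against an in-memory SQLite database from Python, which is an assumption made purely so the query can be executed; the table name sample_data comes from the answer, and the alias ranked and the column rn are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_data "
    "(unique_id INTEGER, issue TEXT, date_event TEXT, age_at_event INTEGER)"
)
# Rows copied from the question's sample table.
conn.executemany(
    "INSERT INTO sample_data VALUES (?, ?, ?, ?)",
    [
        (1234, "issue_a", "2016-04-01T00:00:00", 6),
        (1234, "issue_a", "2016-04-01T00:00:00", 6),
        (1234, "issue_b", "2018-04-01T00:00:00", 8),
        (5678, "issue_a", "2019-09-01T00:00:00", 2),
        (5678, "issue_a", "2021-09-01T00:00:00", 4),
        (65431, "issue_c", "2019-09-01T00:00:00", 1),
        (1234, "issue_a", "2022-09-01T00:00:00", 12),
    ],
)

# Number each person's rows by date, then keep only row 1.
first_events = conn.execute(
    """
    SELECT unique_id, issue, date_event, age_at_event
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY unique_id ORDER BY date_event
               ) AS rn
        FROM sample_data
    ) AS ranked
    WHERE rn = 1
    ORDER BY unique_id
    """
).fetchall()

for row in first_events:
    print(row)
```

Because ROW_NUMBER() always assigns distinct numbers, exact duplicates (like the two 2016 rows for 1234) still collapse to a single row.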

Related

merging multiple rows into one based on id

I have the data in this format in an Amazon Redshift database:
id | answer
-----------
1  | house
1  | apple
1  | moon
1  | money
2  | 123
2  | xyz
2  | abc
and what I am looking for would be:
id | answer
-------------------------------
1  | house, apple, moon, money
2  | 123, xyz, abc
Any idea? I cannot hard-code the answers because they will vary, so I would prefer a solution that simply collects the answers from each id's rows and joins them with a delimiter.
You can use the aggregate function listagg:
select id , listagg(answer,',')
from table
group by id
You can use string_agg(concat(answer,''),',') with GROUP BY, like this:
select id , string_agg(concat(answer,''),',') as answer
from table
group by id
Edit:
You don't need the concat; you can just use string_agg(answer,',')
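To try the aggregation idea without a Redshift or PostgreSQL instance, here is a small sketch using SQLite's group_concat from Python. This is a stand-in for listagg/string_agg, not the same function, and the table name answers is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (id INTEGER, answer TEXT)")
# Rows copied from the question.
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [
        (1, "house"), (1, "apple"), (1, "moon"), (1, "money"),
        (2, "123"), (2, "xyz"), (2, "abc"),
    ],
)

# One output row per id, with that id's answers joined by ', '.
merged = conn.execute(
    """
    SELECT id, group_concat(answer, ', ') AS answer
    FROM answers
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in merged:
    print(row)
```

SQLite does not guarantee the order of the joined values here. If order matters on Redshift, listagg accepts a WITHIN GROUP (ORDER BY ...) clause.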

SQL: Select, dynamically created values as column

I have to select data from one column and then show those values as separate columns. The difficulty is that new data keeps arriving in that column, so new cells have to be created.
Product_Table:
ID NAME
1 apple
2 orange
Selling_Table:
ID PRODUCT_ID DATE
1 1 2020-06-12
2 1 2020-05-03
3 2 2020-01-01
4 1 2020-07-23
What I Want
NAME SELLING_DATE_1 SELLING_DATE_2 SELLING_DATE_3
APPLE 2020-06-12 2020-05-03 2020-07-23
ORANGE 2020-01-01 NULL NULL
When there is a new date in the selling table, I want my SQL to create another SELLING_DATE column dynamically. As you can see, where there is no SELLING_DATE data the cell is filled with NULL, or we could replace it with plain text such as 'not sold'.
You can use window functions and conditional aggregation:
select
name,
max(case when rn = 1 then date end) selling_date_1,
max(case when rn = 2 then date end) selling_date_2,
max(case when rn = 3 then date end) selling_date_3
from (
select p.*, s.date, row_number() over(partition by p.id order by s.date) rn
from product_table p
inner join selling_table s on s.product_id = p.id
) t
group by id, name
You can expand the query with more columns (that is, more conditional max()s) to handle more dates.
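Since plain SQL needs a fixed column list, one way to get the "dynamic" part is to build the conditional max() columns in the calling program. The sketch below does that from Python against an in-memory SQLite database, which is an assumption made for the sake of a runnable example. It numbers sales by selling_table.id rather than by date, so the columns come out in the order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE selling_table (id INTEGER PRIMARY KEY, product_id INTEGER, date TEXT);
    INSERT INTO product_table VALUES (1, 'apple'), (2, 'orange');
    INSERT INTO selling_table VALUES
        (1, 1, '2020-06-12'), (2, 1, '2020-05-03'),
        (3, 2, '2020-01-01'), (4, 1, '2020-07-23');
    """
)

# Find how many date columns are needed: the largest sale count of any product.
max_sales = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM selling_table GROUP BY product_id)"
).fetchone()[0] or 0

# Build one MAX(CASE ...) column per sale position (i is a Python int, not user input).
select_list = ["name"] + [
    f"MAX(CASE WHEN rn = {i} THEN date END) AS selling_date_{i}"
    for i in range(1, max_sales + 1)
]

query = f"""
    SELECT {', '.join(select_list)}
    FROM (
        SELECT p.id, p.name, s.date,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.id) AS rn
        FROM product_table p
        JOIN selling_table s ON s.product_id = p.id
    ) t
    GROUP BY id, name
    ORDER BY id
"""
pivot = conn.execute(query).fetchall()

for row in pivot:
    print(row)
```

Missing positions come back as NULL (None in Python). Wrapping each MAX in COALESCE(..., 'not sold') would give the text placeholder instead.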
I really don't think it's practical to solve your problem that way.
A couple of things you could try instead:
Just select each product once per sale, i.e. join them, which in practice is almost the same as replacing the ID's in your selling_table with the names of the products. That should give you something like:
PRODUCT SELLING DATE
apple 2020-06-12
apple 2020-05-03
apple 2020-07-23
orange 2020-01-01
You can try selecting all dates for each product together and displaying them as a string (this may require some research and work if you're new to SQL). This answer to a somewhat similar question may help you.
Perhaps you are calling this from some higher level program? If you are using C# for instance, you could probably manipulate whatever result you get pretty easily using e.g. LINQ. That all depends on what the bigger picture looks like though, and how you want to present your final result.
To that end, it would be useful if you could update your question with more info about your overall architecture.

SQL - Query to split original sort

I hope my title is OK, as I really don't know what to call it.
Anyway, I have a table with the following :
ID - Num (Primary Key)
Category - VarChar
Name - VarChar
DateForName - Date
Data looks like that :
1 100 111 31/12/2017
2 101 210 30/12/2017
3 100 112 29/12/2017
4 101 203 27/12/2017
5 100 117 20/12/2017
6 103 425 08/12/2017
To generate this table, I just sorted by date DESC.
Is there a way to add a new column with the order per Category like :
1 100|1
2 101|1
3 100|2
4 101|2
5 100|3
6 103|1
Max
You want the analytic function row_number():
select t.*
from (select *, row_number() over (partition by Category order by DateForName desc) Seq
from table
) t
order by id;
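As a quick check of the per-category numbering, here is a sketch that loads the question's rows into an in-memory SQLite database from Python. The table name items is hypothetical, and the dates are rewritten in ISO format as an assumption so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items "
    "(ID INTEGER PRIMARY KEY, Category TEXT, Name TEXT, DateForName TEXT)"
)
# Rows from the question, with dd/mm/yyyy dates converted to yyyy-mm-dd.
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "100", "111", "2017-12-31"),
        (2, "101", "210", "2017-12-30"),
        (3, "100", "112", "2017-12-29"),
        (4, "101", "203", "2017-12-27"),
        (5, "100", "117", "2017-12-20"),
        (6, "103", "425", "2017-12-08"),
    ],
)

# Seq restarts at 1 for each Category, counting from the newest date.
numbered = conn.execute(
    """
    SELECT ID, Category,
           ROW_NUMBER() OVER (
               PARTITION BY Category ORDER BY DateForName DESC
           ) AS Seq
    FROM items
    ORDER BY DateForName DESC
    """
).fetchall()

for row in numbered:
    print(row)
```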
Yes, SQL has a couple of options for adding a column that ranks the rows within each category by date.
If you just want to add a column to the select statement, I recommend using the RANK() function.
See more details here:
https://learn.microsoft.com/en-us/sql/t-sql/functions/rank-transact-sql?view=sql-server-2017
For your current table, try the following select statement:
SELECT
[ID],
[Category],
[Name],
[DateForName],
RANK() OVER (PARTITION BY [Category] ORDER BY [DateForName] DESC) AS [CategoryOrder]
FROM [TableName]
Alternatively, if you want to add a permanent column (aka a field) to the existing table, I recommend treating this as a calculated column. See more information here:
https://learn.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
Because the new column would be based entirely on two pre-existing columns, and only those two columns, SQL can do a great job of maintaining it for you.
Hope this helps!

Need help combining columns from 2 tables and keep remaining data in rows based on parameters in sql

I am needing some help with this! I have been steered toward the intersect function, but it seems limited as it only matches and returns matched values. I am trying to combine 2 tables on a common column value and return the rows based on a date parameter. Is this even possible using SQL? Thanks in advance!
My starting tables look like this:
name date doc info
janedoe 7/21 jones 47
jonwall 7/1 nick 21
name date doc info
janedoe 6/21 jones 74
jonwall 8/31 hall 22
I want to combine these rows by duplicate name. And keep the remaining data based on most recent date. so the end result should look like this.
name date doc info
janedoe 7/21 jones 47
jonwall 8/31 hall 22
Is there any way anyone could help me with this? I am currently using SQL Server Express.
WITH allRows AS (
SELECT * FROM tableA
UNION ALL
SELECT * FROM tableB
), mostRecent AS (
SELECT *,
ROW_NUMBER() OVER
(PARTITION BY name ORDER BY date DESC) as rn
FROM allRows
)
SELECT *
FROM mostRecent
WHERE rn = 1
You should have some ID column; otherwise you risk merging two different people who share the same name.
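The query above can be tried out with a small sketch like the following, run from Python against an in-memory SQLite database. Two things are assumptions here: the question only gives month/day values, so a year (2023) is invented to make the dates sortable, and SQLite stands in for SQL Server Express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (name TEXT, date TEXT, doc TEXT, info INTEGER);
    CREATE TABLE tableB (name TEXT, date TEXT, doc TEXT, info INTEGER);
    -- The year 2023 is an assumption; the question shows only month/day.
    INSERT INTO tableA VALUES
        ('janedoe', '2023-07-21', 'jones', 47),
        ('jonwall', '2023-07-01', 'nick', 21);
    INSERT INTO tableB VALUES
        ('janedoe', '2023-06-21', 'jones', 74),
        ('jonwall', '2023-08-31', 'hall', 22);
    """
)

# Stack both tables, rank each name's rows newest first, keep the newest.
latest = conn.execute(
    """
    WITH allRows AS (
        SELECT * FROM tableA
        UNION ALL
        SELECT * FROM tableB
    ), mostRecent AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY date DESC) AS rn
        FROM allRows
    )
    SELECT name, date, doc, info
    FROM mostRecent
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()

for row in latest:
    print(row)
```

If the real date column is stored as text like '7/21', it would need converting to a proper date type first, because ordering such strings does not follow calendar order.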

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first line for each of chip_id = 1, 2, 3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
set a variable to 0
order your table by chip_id
read the table row by row
if the row's chip_id is greater than the variable, store the row in a result array and set the variable to that chip_id
loop until done
return your result array
Though depending on your DB, query, and versions, you'll probably get unpredictable or unreliable results.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
from your_table
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
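As a sketch of that last point, the example below gives the table an auto-incrementing key and orders by it instead of rand(). It runs against an in-memory SQLite database from Python. The table name chip_samples and the column row_id are hypothetical, and "first" is taken here to mean "inserted first".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id is a hypothetical auto-incrementing key that defines what "first" means.
conn.execute(
    "CREATE TABLE chip_samples "
    "(row_id INTEGER PRIMARY KEY AUTOINCREMENT, chip_id INTEGER, sample_id INTEGER)"
)
conn.executemany(
    "INSERT INTO chip_samples (chip_id, sample_id) VALUES (?, ?)",
    [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)],
)

# Same pattern as the answer, with rand() replaced by the ordering column.
first_samples = conn.execute(
    """
    SELECT chip_id, sample_id
    FROM (
        SELECT chip_id, sample_id,
               ROW_NUMBER() OVER (PARTITION BY chip_id ORDER BY row_id) AS seqnum
        FROM chip_samples
    ) t
    WHERE seqnum = 1
    ORDER BY chip_id
    """
).fetchall()

for row in first_samples:
    print(row)
```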
Provided I understood your output, if you are using PostgreSQL 9, you can use this (the cast is needed because string_agg expects text):
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.