sql create columns from group by collection - sql

I have a table in the following form
chain |branch
________|________|
a |UK
a |US
b |ISRAEL
b |UK
b |FRANCE
b |BELGIUM
c |NIGERIA
and I would like to create a new table in the following format:
chain |branch_1|branch_2|branch_3|branch_4
________|________|________|________|________|
a | UK | US |--------|--------|
b | ISRAEL| UK | FRANCE |BELGIUM |
c | NIGERIA|--------|--------|--------|
For further clarification, imagine that you can do a group by (chain) where the aggregate function is the identity so that
group_1->(element1,element2,element3,..,elementM)
group_2->(element1,element2,element3,..,elementN)
...
group_X->(element1,element2,element3,..,elementZ)
so a new table will be created which will have
R+K columns where R are the number of columns that we group by (in our case that is the column 'chain' so R=1) and K is the max count of the groups (in our case that is four, corresponding to chain 'b')
I am sure that this must be a common question, so my apologies if this has been answered before, but I could not find anything.
EDIT:
THIS IS NOT A PIVOT TABLE
A pivot table in that case would be
chain |UK |US |ISRAEL |FRANCE |BELGIUM |NIGERIA |
________|________|________|________|________|________|________|
____a___|____1___|____1___|____0___|____0___|____0___|____0___|
____b___|____1___|____0___|____1___|____1___|____1___|____0___|
____c___|____0___|____0___|____0___|____0___|____0___|____1___|
Thanks!

You can do this with conditional aggregation and row_number():
select chain,
       max(case when seqnum = 1 then branch end) as branch_1,
       max(case when seqnum = 2 then branch end) as branch_2,
       max(case when seqnum = 3 then branch end) as branch_3,
       max(case when seqnum = 4 then branch end) as branch_4
from (select t.*,
             row_number() over (partition by chain order by branch) as seqnum
      from table t  -- "table" is a placeholder; substitute your table name
     ) t
group by chain;
Note: Your table doesn't have a column specifying the ordering of the rows. SQL tables represent unordered sets. Without such a column, there is no concept of one row being before or after another. So, this version orders by the branch name. You can order by whatever you like by changing the order by clause for row_number().
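To see what ordering by branch name does to the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The table name chains is an assumption (the question does not name the table), and the window function needs SQLite 3.25 or later.

```python
import sqlite3

# In-memory database; "chains" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chains (chain TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO chains VALUES (?, ?)",
    [("a", "UK"), ("a", "US"), ("b", "ISRAEL"), ("b", "UK"),
     ("b", "FRANCE"), ("b", "BELGIUM"), ("c", "NIGERIA")],
)

# Number the branches inside each chain, then spread them over columns.
rows = conn.execute("""
    SELECT chain,
           MAX(CASE WHEN seqnum = 1 THEN branch END) AS branch_1,
           MAX(CASE WHEN seqnum = 2 THEN branch END) AS branch_2,
           MAX(CASE WHEN seqnum = 3 THEN branch END) AS branch_3,
           MAX(CASE WHEN seqnum = 4 THEN branch END) AS branch_4
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY chain ORDER BY branch) AS seqnum
          FROM chains t) t
    GROUP BY chain
    ORDER BY chain
""").fetchall()

for row in rows:
    print(row)
# ('a', 'UK', 'US', None, None)
# ('b', 'BELGIUM', 'FRANCE', 'ISRAEL', 'UK')
# ('c', 'NIGERIA', None, None, None)
```

Chain b comes out alphabetically (BELGIUM, FRANCE, ISRAEL, UK) rather than in the order shown in the question, which is exactly the consequence of the table having no ordering column.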

Related

Selecting with multiple conditions from one dataset

I have a table with 5 columns.
First, you can see that I'm German. Second, you can see that much of the data differs only in category and value.
I want to find all the rows that have category 1 and value 1.
It should give me this table
Next, I want to find all the entries in the initial TableA that match on Name, Date and City (all three must match for a given row), but where the category is 2 instead of 1 and the value is 0.
So for TableA the result should be:
I hope I didn't make any mistakes in the example and that it is clear what I am trying to do.
I know the WHERE clause supports IN, which checks whether a value is in a list of values. But I don't know how to use it to check three values at once. If I just do three separate list checks, I would also get every entry that is a combination of values from my three lists, regardless of which row each value comes from.
So instead of checking whether Name[0], City[0] and Date[0] are found together, I need to avoid matching something like Name[0], City[4] and Date[12] (the number in brackets is the row number).
The code I would have thought of:
Select*
FROM tablea
WHERE
(SELECT name, date, city
FROM tablea
WHERE tablea.Category=1 AND tablea.Value=0) as tableafiltered
WHERE tablea in tableafiltered
That's what I thought might work, but I'm pretty sure it wouldn't, because I'm trying to match 3 columns. And the IN in the WHERE clause is only valid for one column, right?
The first dataset that you describe can be a subquery and you can join it to the table:
select t.*
from tablea t inner join (
select distinct name, date, city
from tablea
where category = 1 and value = 1
) d on d.name = t.name and d.date = t.date and d.city = t.city
where t.category = 2 and t.value = 0
Another way of doing it is with EXISTS:
select t.*
from tablea t
where t.category = 2 and t.value = 0
and exists (
select 1
from tablea
where name = t.name and date = t.date and city = t.city and category = 1 and value = 1
)
See the demo.
Results:
> name | date | city | category | value
> :----- | :--------- | :------ | -------: | ----:
> Albert | 01.01.2000 | Berlin | 2 | 0
> Albert | 01.01.2000 | Hamburg | 2 | 0
One way to do this would be to create two selects, one for category = 1 and value = 1, and one for the category = 2 and value = 0 combination. Then inner join the two result sets so that matching rows end up side by side, requiring the remaining columns to match, for example table1.column1 = table2.column1 and table1.column2 = table2.column2. You can choose the columns any way you like, which is why I give this generic form.
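As a rough check that the join and EXISTS versions above agree, the following sketch runs both through Python's built-in sqlite3 module. The rows are hypothetical (the question's tables are not reproduced here); they are chosen so that only Albert's Berlin and Hamburg entries have a matching category 1 / value 1 row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablea (name TEXT, date TEXT, city TEXT, category INT, value INT)"
)
# Hypothetical sample rows, not the asker's real data.
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?, ?)",
    [
        ("Albert", "01.01.2000", "Berlin", 1, 1),
        ("Albert", "01.01.2000", "Berlin", 2, 0),
        ("Albert", "01.01.2000", "Hamburg", 1, 1),
        ("Albert", "01.01.2000", "Hamburg", 2, 0),
        ("Berta", "02.01.2000", "Berlin", 1, 0),   # category 1 but value 0
        ("Berta", "02.01.2000", "Berlin", 2, 0),
        ("Carl", "01.01.2000", "Berlin", 2, 0),    # no category 1 row at all
    ],
)

# Version 1: join against the distinct (name, date, city) keys.
join_rows = conn.execute("""
    SELECT t.*
    FROM tablea t
    INNER JOIN (
        SELECT DISTINCT name, date, city
        FROM tablea
        WHERE category = 1 AND value = 1
    ) d ON d.name = t.name AND d.date = t.date AND d.city = t.city
    WHERE t.category = 2 AND t.value = 0
    ORDER BY t.name, t.city
""").fetchall()

# Version 2: correlated EXISTS on the same three columns.
exists_rows = conn.execute("""
    SELECT t.*
    FROM tablea t
    WHERE t.category = 2 AND t.value = 0
      AND EXISTS (
          SELECT 1
          FROM tablea x
          WHERE x.name = t.name AND x.date = t.date AND x.city = t.city
            AND x.category = 1 AND x.value = 1
      )
    ORDER BY t.name, t.city
""").fetchall()

print(join_rows)
print(join_rows == exists_rows)
```

Because all three columns are compared within the same row of the subquery, a name from one row can never be paired with a city or date from another, which was the concern with three separate IN lists.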

Postgresql/SQL insert into table based on multiple column values

My origin table looks like this
company_name | feature_name | feature_value
abc | income | 315
abc | cash | 213
abc | val | 9
goo | income | 123
goo | cash | 487
goo | val | 990
I want to insert into a new table so that the new table looks like this, and the new table won't contain column for cash.
company_name | income_name | income_value | val_name | val_value
abc | income | 315 | val | 9
goo | income | 123 | val | 990
I've checked a lot of posts but still don't know how to do that. Any suggestion is appreciated.
To pivot data without using any proprietary syntax you have a couple of options. Group/max on conditionals or left join multiple times:
SELECT
  company_name,
  MAX(CASE WHEN feature_name = 'income' THEN feature_value END) as income,
  MAX(CASE WHEN feature_name = 'val' THEN feature_value END) as val,
  ...
FROM
  table
GROUP BY company_name
To see how it works, run it without the group by and max operations; you'll see how the case when spreads the single value column out into multiple columns, with multiple rows per company. The group by and max collapse these multiple rows into a single row, because max ignores the nulls.
If your table is just a snippet, you can add more columns where the ... is by copying the pattern. Remember to remove the trailing comma from the last select list item.
The join way:
SELECT DISTINCT  -- t has one row per feature, so deduplicate to one row per company
  t.company_name,
  income.feature_value as income,
  val.feature_value as val,
  ...
FROM
  table t
  LEFT JOIN table income ON t.company_name = income.company_name and income.feature_name = 'income'
  LEFT JOIN table val ON t.company_name = val.company_name and val.feature_name = 'val'
  ...
This virtually splits the table into multiple tables each having the company name and just some of the rows, making a table per feature, then joins them all back together into one multi column result set
I've always preferred the group way but you'd have to trial which was more efficient for your situation and which is easier to maintain for your understanding
Inserting data from a select into another table is a very common operation so I've omitted that part for clarity- this is just to show you how to pivot the data. The insert will be one of:
INSERT INTO existingtable(company_name, income, val)
SELECT ...
CREATE TABLE newtable AS
SELECT ...
You can try with conditional aggregation using case when expression
select company_name,
       max(case when feature_name='income' then feature_name end) as income_name,
       max(case when feature_name='income' then feature_value end) as income_value,
       max(case when feature_name='val' then feature_name end) as val_name,
       max(case when feature_name='val' then feature_value end) as val_value
from tablename
group by company_name
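The conditional aggregation above can be tried against the question's own rows with a short sketch using Python's built-in sqlite3 module. The table name features is an assumption, and SQLite stands in for PostgreSQL here; the query itself uses only standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "features" is a hypothetical name for the origin table.
conn.execute(
    "CREATE TABLE features (company_name TEXT, feature_name TEXT, feature_value INT)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [("abc", "income", 315), ("abc", "cash", 213), ("abc", "val", 9),
     ("goo", "income", 123), ("goo", "cash", 487), ("goo", "val", 990)],
)

# One row per company; "cash" is simply never selected.
rows = conn.execute("""
    SELECT company_name,
           MAX(CASE WHEN feature_name = 'income' THEN feature_name  END) AS income_name,
           MAX(CASE WHEN feature_name = 'income' THEN feature_value END) AS income_value,
           MAX(CASE WHEN feature_name = 'val'    THEN feature_name  END) AS val_name,
           MAX(CASE WHEN feature_name = 'val'    THEN feature_value END) AS val_value
    FROM features
    GROUP BY company_name
    ORDER BY company_name
""").fetchall()

for row in rows:
    print(row)
# ('abc', 'income', 315, 'val', 9)
# ('goo', 'income', 123, 'val', 990)
```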

SQL/Postgres - collapse every N rows into 1 based on row position in group

I have a set of ordered results from a Postgres table, where every group of 4 rows represents a set of related data. I want to process this set of results further, so that every group of 4 rows are collapsed into 1 row with aliased column names where the value for each column is based on that row's position in the group - I'm close, but I can't quite get the query right (nor am I confident that I'm approaching this in the optimal manner). Here's the scenario:
I am collecting survey results - each survey has 4 questions, but each answer is stored in a separate row in the database. However, they are associated with each other by a submission event_id, and the results are guaranteed to be returned in a fixed order. A set of survey_results will look something like:
event_id | answer
----------------------------
a | 10
a | foo
a | 9
a | bar
b | 2
b | baz
b | 4
b | zip
What I would like to be able to do is query this result so that the final output comes out with each set of 4 results on their own line, with aliased column names.
event_id | score_1 | reason_1 | score_2 | reason_2
----------------------------------------------------------
a | 10 | foo | 9 | bar
b | 2 | baz | 4 | zip
The closest that I've been able to get is
SELECT survey_answers.event_id,
(SELECT survey_answers.answer FROM survey_answers FETCH NEXT 1 ROWS ONLY) AS score_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_1,
(SELECT survey_answers.answer FROM survey_answers OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY) AS score_2,
(SELECT survey_answers.answer FROM survey_answers OFFSET 3 ROWS FETCH NEXT 1 ROWS ONLY) AS reason_2
FROM survey_answers
GROUP BY survey_answers.event_id
But this, understandably, returns the correct number of rows, but with the same values (other than event_id):
event_id | score_1 | reason_1 | score_2 | reason_2
----------------------------------------------------------
a | 10 | foo | 9 | bar
b | 10 | foo | 9 | bar
How can I structure my query so that it applies the OFFSET/FETCH behaviors every batch of 4 rows, or, maybe more accurately, within every unique set of event_ids?
demo: db<>fiddle
First of all, this looks like a very bad design:
There is no guaranteed order! Without an ORDER BY, a database may store and return rows in any order. You really need an order column. In this small case it might work by accident.
You should have two columns, one for score and one for reason. Mixing the types in one column is not a good idea.
Nevertheless, for this simple and short example this could be a solution (remember, this is not recommended for production tables):
WITH data AS (
SELECT
*,
row_number() OVER (PARTITION BY event_id) -- 1
FROM
survey_results
)
SELECT
event_id,
MAX(CASE WHEN row_number = 1 THEN answer END) AS score_1, -- 2
MAX(CASE WHEN row_number = 2 THEN answer END) AS reason_1,
MAX(CASE WHEN row_number = 3 THEN answer END) AS score_2,
MAX(CASE WHEN row_number = 4 THEN answer END) AS reason_2
FROM
data
GROUP BY event_id
1. The row_number() window function adds a row count for each event_id, in this case from 1 to 4. This can be used to identify the type of each answer (see the intermediate step in the fiddle). In production code you should use an order column to guarantee the order. The window function would then look like PARTITION BY event_id ORDER BY order_column.
2. This is a simple pivot on event_id and the type id (row_number), which does exactly what you expect.
You need a column that specifies the ordering. In your case, that should probably be a serial column, which is guaranteed to be increasing for each insert. I would call such a column survey_result_id.
With such a column, you can do:
select event_id,
max(case when seqnum = 1 then answer end) as score_1,
max(case when seqnum = 2 then answer end) as reason_1,
max(case when seqnum = 3 then answer end) as score_2,
max(case when seqnum = 4 then answer end) as reason_2
from (select sr.*,
row_number() over (partition by event_id order by survey_result_id) as seqnum
from survey_results sr
) sr
group by event_id;
Without such a column, you cannot reliably do what you want, because SQL tables represent unordered sets.
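As an illustration of the ordering-column approach, here is a minimal sketch using Python's built-in sqlite3 module. An INTEGER PRIMARY KEY AUTOINCREMENT column stands in for the PostgreSQL serial column suggested above; the column name survey_result_id follows the answer, and SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# survey_result_id plays the role of the serial ordering column.
conn.execute("""
    CREATE TABLE survey_results (
        survey_result_id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_id TEXT,
        answer   TEXT
    )
""")
conn.executemany(
    "INSERT INTO survey_results (event_id, answer) VALUES (?, ?)",
    [("a", "10"), ("a", "foo"), ("a", "9"), ("a", "bar"),
     ("b", "2"), ("b", "baz"), ("b", "4"), ("b", "zip")],
)

# Position within each event is now well defined by the insert order.
rows = conn.execute("""
    SELECT event_id,
           MAX(CASE WHEN seqnum = 1 THEN answer END) AS score_1,
           MAX(CASE WHEN seqnum = 2 THEN answer END) AS reason_1,
           MAX(CASE WHEN seqnum = 3 THEN answer END) AS score_2,
           MAX(CASE WHEN seqnum = 4 THEN answer END) AS reason_2
    FROM (SELECT sr.*,
                 ROW_NUMBER() OVER (PARTITION BY event_id
                                    ORDER BY survey_result_id) AS seqnum
          FROM survey_results sr) sr
    GROUP BY event_id
    ORDER BY event_id
""").fetchall()

for row in rows:
    print(row)
# ('a', '10', 'foo', '9', 'bar')
# ('b', '2', 'baz', '4', 'zip')
```

The scores come back as text because score and reason share one column, which is the design concern raised in the first answer.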

SQL Server, complex query

I have an Azure SQL Database table which is filled by importing XML-files.
The order of the files is random so I could get something like this:
ID | Name | DateFile | IsCorrection | Period | Other data
1 | Mr. A | March, 1 | false | 3 | Foo
20 | Mr. A | March, 1 | true | 2 | Foo
13 | Mr. A | Apr, 3 | true | 2 | Foo
4 | Mr. B | Feb, 1 | false | 2 | Foo
This table is joined with another table, which is also joined with a 3rd table.
I need to get the join of these 3 tables for the person with the newest data, based on Period, DateFile and Correction.
In my above example, Id=1 is the original data for Period 3, I need this record.
But in the same file was also a correction for Period 2 (Id=20) and in the file of April, the data was corrected again (Id=13).
So for Period 3, I need Id=1, for Period 2 I need Id=13 because it has the last corrected data and I need Id=4 because it is another person.
I would like to do this in a view, but using a stored procedure would not be a problem.
I have no idea how to solve this. Any pointers will be much appreciated.
EDIT:
My datamodel is of course much more complex than this sample. DateFile and Period are DateTime types in the table. Actually Period is two DateTime columns: StartPeriod and EndPeriod.
Well, looking at your data, I believe we can disregard the IsCorrection column and just pick the latest row for each user/period.
Let's start by ordering the rows, placing the latest on top:
SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER BY DateFile DESC), *
FROM YourTable  -- placeholder; the question does not name the table
And from this result you select all with row number 1:
;with numberedRows as (
    SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER BY DateFile DESC) as rowIndex, *
    FROM YourTable  -- placeholder; the question does not name the table
)
select * from numberedRows where rowIndex = 1
The PARTITION BY tells ROW_NUMBER() to restart the counter for each distinct combination of Period and Name. The ORDER BY tells ROW_NUMBER() that we want the newest row to be number 1, followed by the older rows. We only need the latest row.
The WITH declares a "common table expression", which is a kind of named subquery.
Not knowing your exact data, I might recommend something wrong, but you should be able to join this last query with your other tables to get the desired result.
Something like:
;with numberedRows as (
    SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name ORDER BY DateFile DESC) as rowIndex, *
    FROM YourTable  -- placeholder; the question does not name the table
)
select * from numberedRows a
JOIN periods b on b.empId = a.Id
JOIN msg c on b.msgId = c.Id
where a.rowIndex = 1
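The "latest row per Name and Period" step can be checked against the four sample rows with a small sketch using Python's built-in sqlite3 module. The table name imports and the year 2020 in the dates are assumptions (the question gives only "March, 1" and similar), and SQLite stands in for Azure SQL; the joins to the other two tables are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "imports" is a hypothetical table name; dates use an assumed year.
conn.execute("""
    CREATE TABLE imports (
        Id INT, Name TEXT, DateFile TEXT, IsCorrection INT, Period INT
    )
""")
conn.executemany(
    "INSERT INTO imports VALUES (?, ?, ?, ?, ?)",
    [(1, "Mr. A", "2020-03-01", 0, 3),
     (20, "Mr. A", "2020-03-01", 1, 2),
     (13, "Mr. A", "2020-04-03", 1, 2),
     (4, "Mr. B", "2020-02-01", 0, 2)],
)

# Rank rows newest-first inside each (Period, Name) and keep rank 1.
ids = [r[0] for r in conn.execute("""
    WITH numberedRows AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY Period, Name
                                  ORDER BY DateFile DESC) AS rowIndex, *
        FROM imports
    )
    SELECT Id FROM numberedRows WHERE rowIndex = 1 ORDER BY Id
""")]

print(ids)  # [1, 4, 13]
```

Ids 1, 13 and 4 are the rows the question asks for: the original for Period 3, the April correction for Period 2, and Mr. B's only row.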

How to get max of multiple columns in oracle

Here is a sample table:
| customer_token | created_date | orders | views |
+--------------------------------------+------------------------------+--------+-------+
| 93a03e36-83a0-494b-bd68-495f54f406ca | 10-NOV-14 14.41.09.000000000 | 1 | 0 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 20-NOV-14 14.41.47.000000000 | 0 | 1 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 26-OCT-14 16.14.30.000000000 | 2 | 0 |
| 93a03e36-83a0-494b-bd68-495f54f406ca | 11-OCT-14 16.31.11.000000000 | 0 | 2 |
In this customer data table I store all of the dates when a given customer has placed an order or viewed a product. Now, for a report, I want to write a query that generates, for each customer (customer_token), the last_order_date (latest row where orders > 0) and the last_view_date (latest row where views > 0).
I am looking for an efficient query as I have millions of records.
select customer_token,
max(case when orders > 0 then created_date else NULL end),
max(case when views > 0 then created_date else NULL end)
from Customer
group by customer_token;
Update: This query is quite efficient because Oracle is likely to scan the table only once. Also there is an interesting thing with grouping - when you use GROUP BY a select list can only contain columns which are in the GROUP BY or aggregate functions. In this query MAX is calculated for the column created_date, but you don't need to put orders and views in a GROUP BY because they are in the expression inside MAX function. It's not very common.
When you want the largest value in a column, use the MAX() aggregate function. When you select other columns alongside an aggregate, those columns must appear in the GROUP BY clause.
In this case, you want to group by customer_token. That way, you'll receive one row per group, and the aggregate function will give you the value for that group.
However, you only want to see the dates where the cell value is greater than 0, so I recommend you put a CASE expression inside your MAX() function like this:
SELECT customer_token,
MAX(CASE WHEN orders > 0 THEN created_date ELSE NULL END) AS latestOrderDate,
MAX(CASE WHEN views > 0 THEN created_date ELSE NULL END) AS latestViewDate
FROM customer
GROUP BY customer_token;
This will give you the max date only among rows where orders is positive, and only among rows where views is positive. Without the CASE expression, both columns would simply return the latest created_date for the customer, regardless of whether that row was an order or a view.
Here is an oracle reference for aggregate functions.
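To check the conditional MAX against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module in place of Oracle. The timestamps are rewritten as ISO text (an assumption made so that string comparison orders them correctly in SQLite), and the aliases last_order_date and last_view_date follow the names used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_token TEXT, created_date TEXT, orders INT, views INT
    )
""")
token = "93a03e36-83a0-494b-bd68-495f54f406ca"
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(token, "2014-11-10 14:41:09", 1, 0),
     (token, "2014-11-20 14:41:47", 0, 1),
     (token, "2014-10-26 16:14:30", 2, 0),
     (token, "2014-10-11 16:31:11", 0, 2)],
)

# Each MAX only sees dates from rows that pass its own CASE filter.
rows = conn.execute("""
    SELECT customer_token,
           MAX(CASE WHEN orders > 0 THEN created_date END) AS last_order_date,
           MAX(CASE WHEN views  > 0 THEN created_date END) AS last_view_date
    FROM customer
    GROUP BY customer_token
""").fetchall()

print(rows)
# [('93a03e36-83a0-494b-bd68-495f54f406ca',
#   '2014-11-10 14:41:09', '2014-11-20 14:41:47')]
```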