I have a table in which employee punches are saved. For each employee and each date there are columns in the table named Punch1, Punch2, up to Punch10.
I want all of this punch data in a single column. E.g. if in a row I have dates stored in Punch1, Punch2, Punch3, Punch4 and so on, I want all of that data in a single column.
How to achieve this?
UNPIVOT can be used to normalize your table.
If your table is called EmployeePunches, it would look like this:
SELECT UserID, Punch
FROM
(
    SELECT UserID, Punch1, Punch2, Punch3, Punch4
    FROM EmployeePunches
) AS ep
UNPIVOT
(
    Punch FOR Punches IN (Punch1, Punch2, Punch3, Punch4)
) AS up
Using UNION ALL works too, but there you will have one SELECT statement per Punch.
With UNPIVOT you only need one statement, and you just add the Punch columns you need.
"Horizontal" (string) concatenation:
If for each row, you want to derive a new column Punches1To10 that contains all the timestamps as e.g. a comma-separated list (such as 'xxxx-xx-xa, xxxx-xx-xb, xxxx-xx-xc, …'), then FOR XML will be what you're looking for.
See this article for a tutorial on this, and the SO question "Row concatenation with FOR XML, but with multiple columns?"
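As a rough illustration, here is a minimal sketch of that approach, assuming SQL Server, the EmployeePunches table from above, and that the punches are datetime values (only four columns shown):
SELECT ep.UserID,
       STUFF((SELECT ', ' + CONVERT(varchar(10), v.p, 120)  -- style 120 yields yyyy-mm-dd
              FROM (VALUES (ep.Punch1), (ep.Punch2),
                           (ep.Punch3), (ep.Punch4)) AS v(p)
              WHERE v.p IS NOT NULL
              FOR XML PATH('')), 1, 2, '') AS Punches1To10  -- STUFF strips the leading ', '
FROM EmployeePunches AS ep;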
"Vertical" (table) concatenation:
Visually speaking, if you want to vertically stack the single columns Punch1, Punch2, etc. then you would concatenate the result of several select statements using UNION ALL. For just two columns, this would look like this:
SELECT Punch1 AS Punch FROM YourTable
UNION ALL
SELECT Punch2 AS Punch FROM YourTable
With three columns, it's going to be:
SELECT Punch1 AS Punch FROM YourTable
UNION ALL
SELECT Punch2 AS Punch FROM YourTable
UNION ALL
SELECT Punch3 AS Punch FROM YourTable;
Either way, consider normalizing your table first!
This could quickly get out of hand the more PunchN columns you have.
Therefore, may I recommend that you first redesign this table into something a little more normalized.
For example, instead of having several columns named Punch1, Punch2, etc. (where each of them contains the same type of data), just have two columns: one containing the 1, 2, etc. from the PunchN column names, the other containing the timestamps:
PunchN  Date
1       xxxx-xx-xa
1       xxxx-xx-xb
1       xxxx-xx-xc
2       xxxx-xx-xd
2       xxxx-xx-xe
…
Like this answer shows, the database system can do something like this for you through UNPIVOT.
Now, no matter how many Punch columns you have, your query, e.g. for "vertical" concatenation, would always be the same:
SELECT Date FROM Punches;
(The "horizontal" concatenation would become simpler, too.)
I think you are looking for something like this:
select Employee, Punch_date, Identifier
from YourTable
cross apply (values (Punch1, 'Punch1'),
                    (Punch2, 'Punch2'),
                    (Punch3, 'Punch3'),
                    -- ... and so on, up to ...
                    (Punch10, 'Punch10')) tc (Punch_date, Identifier)
The Identifier column helps you see which Punch number each punch date came from for each Employee.
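For illustration (hypothetical values), a row with Employee = 1, Punch1 = '2020-01-01' and Punch2 = '2020-01-02' would produce:
Employee  Punch_date  Identifier
1         2020-01-01  Punch1
1         2020-01-02  Punch2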
I have a table called VIEWS with Id, Day, Month, name of video, name of browser... but I'm interested only in Id, Day and Month.
The ID can appear multiple times because the same user (ID) can watch a video on multiple days in multiple months.
This is the query for the minimum date and the maximum date.
SELECT ID, CONCAT(MIN(DAY), '/', MIN(MONTH)) AS MIN_DATE,
       CONCAT(MAX(DAY), '/', MAX(MONTH)) AS MAX_DATE
FROM Views
GROUP BY ID
I want to add this select's two columns (MIN_DATE and MAX_DATE) to the table as two new columns with insert into.
What would that insert into query look like?
To do what you are trying to do (there are some issues with your solution, please read my comments below), first you need to add the new columns to the table.
ALTER TABLE Views ADD MIN_DATE VARCHAR(10)
ALTER TABLE Views ADD MAX_DATE VARCHAR(10)
Then you need to UPDATE your new columns (not INSERT, because you don't want new rows). Determine the min/max for each ID, then join the result back to the table to be able to update each row. You can't update directly from a GROUP BY, because grouped rows no longer map back to the original rows.
;WITH MinMax AS
(
SELECT
ID,
CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE,
CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE
FROM
Views AS V
GROUP BY
ID
)
UPDATE V SET
MIN_DATE = M.MIN_DATE,
MAX_DATE = M.MAX_DATE
FROM
MinMax AS M
INNER JOIN Views AS V ON M.ID = V.ID
The problems that I see with this design are:
Storing aggregated columns: you usually want to do this only for performance reasons (which I believe is not the case here), as querying the aggregated (grouped) rows is faster because there are fewer rows to read. The problem is that you will have to update the grouped values each time one of the original rows is updated, which adds extra processing time. Another option would be to update the aggregated values periodically, but then you have to accept that for a period of time the grouped values do not really represent the tracking table.
Keeping aggregated columns on the same table as the data they aggregate: this is a normalization problem. Updating or inserting a row will trigger an update of all rows with the same ID, as the min/max values might have changed. Also, the min/max values will be repeated on every row that belongs to the same ID, which wastes space. If you had to save aggregated data, you would need to save it in a different table, which causes the problems I listed in the previous point.
Using a text data type to store dates: you always want to work with dates using a proper DATETIME data type. This will not only let you use date functions like DATEADD or DATEDIFF, but also save space (varchars that store dates need more bytes than DATETIME). I don't see a year part in your query; it should be considered when computing a min/max (this may depend on what you are storing in this table).
Computing the min/max incorrectly: If you have the following rows:
ID  DAY  MONTH
1   5    1
1   3    2
The current result of your query would be 3/1 as MIN_DATE and 5/2 as MAX_DATE, which I believe is not what you are trying to find: the earliest date here is the 5th of January and the latest the 3rd of February. This is a consequence of storing the date parts as independent values instead of the whole date as a DATETIME.
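To illustrate the fix, here is a minimal sketch, assuming SQL Server 2012+ and a placeholder year (the year is missing from the original table, so 2023 below is an assumption):
SELECT ID,
       MIN(DATEFROMPARTS(2023, [MONTH], [DAY])) AS MIN_DATE,  -- real dates compare correctly
       MAX(DATEFROMPARTS(2023, [MONTH], [DAY])) AS MAX_DATE
FROM Views
GROUP BY ID;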
What you usually want to do in this scenario is group directly in the query that needs the grouped data, so you do the GROUP BY in the SELECT that needs the min/max. An index on ID would make the grouping very fast. Thus, you save the storage space you would use to keep the aggregated values, and the result always reflects the real grouped values at the time you query.
It would be something like the following:
;WITH MinMax AS
(
SELECT
ID,
CONCAT(MIN(V.DAY), '/', MIN(V.MONTH)) AS MIN_DATE, -- Date problem (varchar + min/max computed separately)
CONCAT(MAX(V.DAY), '/', MAX(V.MONTH)) AS MAX_DATE -- Date problem (varchar + min/max computed separately)
FROM
Views AS V
GROUP BY
ID
)
SELECT
V.*,
M.MIN_DATE,
M.MAX_DATE
FROM
MinMax AS M
INNER JOIN Views AS V ON M.ID = V.ID
I have a requirement to create a report that counts a total from 2 date fields into one. A simplified example of the table I'm querying is:
ID, FirstName, LastName, InitialApplicationDate, UpdatedApplicationDate
I need to query the two date fields in a way that creates similar output to the following:
Date | TotalApplications
I would need the date output to include both InitialApplicationDate and
UpdatedApplicationDate fields, and the TotalApplications output to be a count of the total for both date fields. Originally I thought a UNION might work; however, that returns two separate records for each date. Any ideas how I might accomplish this?
The simplest way, I think, is to unpivot using apply and then aggregate:
select v.thedate, count(*)
from t cross apply
(values (InitialApplicationDate), (UpdatedApplicationDate)) v(thedate)
group by v.thedate;
You might want to add where thedate is not null if either column could be NULL.
Note that the above will count the same application twice, once for each date. That appears to be your intention.
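Putting that together, a sketch of the full query with the NULL filter added (same table alias t and columns as above):
select v.thedate, count(*) as TotalApplications
from t cross apply
     (values (InitialApplicationDate), (UpdatedApplicationDate)) v(thedate)
where v.thedate is not null
group by v.thedate;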
I am trying to find the best DB design for the following problem.
I have 20000 data sets which look like this:
1. id, name, color, width, xxxx ... 150 attributes
2. id, name, color, width, xxxx ... 150 attributes
3. ...
That means I have 20000 entities and 150 attributes like color, width, etc. for each of them.
I need all these attributes, and maybe 15 of them are used more than the others. This is used in a web application and it has to perform.
Solutions I thought about:
[1] Normalized two-table approach (one-to-one):
id, name and a few "more important" attributes in one main table
in another table (one-to-one relation): id and the other, less important attributes, each one in a different column
[2] Everything in one monster table:
id, name, color, width ...
[3] Normalized two-table approach (one-to-many):
main table with: id and name
another table (one-to-many relation) with: id, attr_name, value
I like [3] most, but I am not sure it is going to perform if I need a lot of data, because every "id" has 150 values, and I would have to do things like:
SELECT mt.id, mt.name, at.attr_name, at.value
FROM main_table mt
INNER JOIN attr_table at ON at.id = mt.id
AND at.attr_name IN ('width', 'color', 'a', 'b', 'c' .....)
AND at.id IN (1,3,9...)
ORDER BY 1
Having maybe 15-20 different values in "attr_name IN (...)" does not look optimal, and if I need 10-30 different data sets (which I usually do), it looks even less appealing.
The output of this would probably be 200-300 lines, and I would have to normalize this output in the code.
[2] is pretty dirty and simple, but I am not sure how it performs. Having 150 columns in one monster table also does not look optimal.
What I like about this approach is that I can do a lot of stuff in SQL instead of later in code, like attr1 - attr2 (e.g. "max_width - width" or "weight - max_weight/4").
[1] I don't like because it does not seem clean to have "some" attributes in one table and all other attributes of the same type in another.
What is the best solution for this specific problem?
I found some similar but not same questions:
Best to have hundreds of columns or split into multiple tables?
Is it better to have many columns, or many tables?
With as few rows as 20,000 I would have no doubt about going fully normalized. In my opinion, even with many more rows I would still do it, instead of taking on the whole new set of problems and weaknesses that an additional JSON column brings.
output of this would be probably 200-300 lines and I would have to normalize this output in the code
Create a view so you don't have to write the join in every query:
create view the_view as
select id, mt.name, at.attr_name, at.value
from main_table mt
inner join attr_table at using (id)
Filter when selecting. Note that a single row of the view holds only one attribute, so conditions on different attributes cannot be ANDed together on the same row; instead, accept rows matching any of the attribute/value pairs and require that every pair is present per id:
select id
from the_view
where
    ((attr_name = 'color' and "value" = 'red')
     or (attr_name = 'width' and "value" = '30'))
    and id in (1, 3, 9)
group by id
having count(distinct attr_name) = 2
150 columns suggest that the attribute list is not stable. If you create a table with 150 columns, you will always be altering the table to add new columns. The normalized approach is flexible: you create attributes at will simply by adding a row to a table.
There should be 3 tables: main_table, attr_table and main_table_attr_table. The main_table_attr_table is an n-to-m (many-to-many) link between the other two, as sketched below.
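A minimal DDL sketch of that three-table layout (all names and types here are assumptions):
create table main_table (
    id   int primary key,
    name varchar(100) not null
);

create table attr_table (
    attr_id   int primary key,
    attr_name varchar(50) not null unique   -- e.g. 'color', 'width'
);

create table main_table_attr_table (
    id      int references main_table (id),
    attr_id int references attr_table (attr_id),
    value   varchar(255),
    primary key (id, attr_id)               -- one value per entity/attribute pair
);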
E.g.: I have two tables, EPL1 and EPL2, which contain data on footballers who have scored, assisted, played matches, etc. The structure of both tables is exactly the same.
First Table contains stats of Ronaldo, Messi with 2 goals each.
Second Table contains stats of Ronaldo, Messi with 3 goals each.
Now I want to combine both these tables and want an output which has Ronaldo, Messi with 5 goals each.
The most important thing to notice is that both tables have the exact same structure and column names; I just want to combine (add) the values of all the columns in both tables.
So what kind of join should I use for this in Oracle?
Simplest way is with a UNION ALL statement.
select player, sum(goals) as goals
from
( select *
from table1
union all
select *
from table2 )
group by player
This works well when the two tables have an identical structure (or you're just selecting a projection) and you want to select all rows from all tables. This approach is easy to extend to three or more tables, as sketched further below.
Note that you need to use UNION ALL. The plain UNION operator would produce the wrong result if, say, you had ('XAVI', 2) in table1 and ('XAVI', 2) in table2: it applies a distinct filter, so you would get a final result of ('XAVI', 2) instead of ('XAVI', 4).
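For instance, extending the same pattern to a hypothetical third table (table3 is an assumption):
select player, sum(goals) as goals
from
( select * from table1
  union all
  select * from table2
  union all
  select * from table3 )
group by player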
insert into NewTable
select player, sum(goals)
from (select * from EPL1 union all select * from EPL2)
group by player
Also, having two tables with the same structure is not a correct design. You can combine the rows of both tables like this:
SELECT * FROM epl1
UNION ALL
SELECT * FROM epl2;
Without knowing the table structure, it's a bit difficult to know the answer, but you probably need something like this (the NVL handles players with no row in epl2):
select epl1.name, sum(epl1.goals + nvl(epl2.goals, 0))
from epl1
left join epl2
on epl1.name = epl2.name
group by epl1.name
I have a table with an amount column, a reference field and an id column. What I need to do is sum the amount based on different combinations of IDs for each reference. There are nine different combinations in total that I then need to insert into a separate table.
The best way I've found to do this is to use a cursor: do each SUM separately, assign the amount to a variable, and update the table for each reference and each combination.
Hope that makes sense!
What I was hoping to find out is - is there a better way to do it?
thanks.
You could do something like:
SELECT SUM(CASE WHEN Id = 9 THEN Val ELSE 0 END) AS ConditionalSum
FROM dbo.[Table]
You can have many of those SUMs with different conditions in one query, for example:
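A sketch with several conditional sums in one pass (the Id values 9, 10 and 11 are placeholders for whatever your combinations are):
SELECT SUM(CASE WHEN Id = 9  THEN Val ELSE 0 END) AS Combo9Sum,
       SUM(CASE WHEN Id = 10 THEN Val ELSE 0 END) AS Combo10Sum,
       SUM(CASE WHEN Id IN (9, 10, 11) THEN Val ELSE 0 END) AS Combo9To11Sum
FROM dbo.[Table];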
You can create a table called something like combos with the following columns:
Name of combination
reference id in combination
(and perhaps other useful columns like an id and creation time, but that is not important here).
Insert your combinations into this table, something like:
First10     1
First10     2
...
First10     10
MyFavorite  42
Whatever the pairs are.
Then you can do what you want with a single query:
select c.comboName, sum(val) as ConditionalSum
from t
join combos c on t.referenceId = c.referenceId
group by c.comboName