SQL Server combine 2 rows into 1 [closed]

In a SQL Server query, I'm trying to figure out how to combine two rows of data into one row for specific records.
The following is an example of the table data, and below it is how I would like the data displayed. I want to show all available columns for each employee, but on a single row. I tried GROUP BY, but that did not work because I want all the columns displayed.
I'd like to display only one row for certain employees who currently have two. I can use EMP ID because it identifies a specific employee. Any suggestions for the best way to accomplish this in SQL Server?

In SQL Server, you can use the GROUP BY clause and an aggregate function to combine multiple rows of data into one for specific records. The following query, for example, will group the rows by EMP ID and return the sum and count of the specified column for each group:
SELECT EMP_ID, SUM(column_name) AS column_name_total, COUNT(column_name) AS column_name_count
FROM your_table
GROUP BY EMP_ID;
The data will be organized by employee ID, and a summary of the column specified for each group of records with the same employee ID will be provided.

If you simply wish to display the employee IDs without regard to any of the other columns in the table, then this can be accomplished using the DISTINCT keyword:
select distinct emp_id from table;
This will return employee IDs without any duplicate values being returned.
If you are looking to aggregate data (which I believe is your intention), then you need an aggregate function combined with GROUP BY. For example, given emp_id and a numeric column x, one possible query is:
select emp_id, sum(x) from table group by emp_id order by emp_id;
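Since the question does not show a concrete schema, here is a minimal runnable sketch (SQLite via Python; the `emp` table, `emp_id`, and `hours` columns are invented for illustration) showing how GROUP BY with aggregates collapses an employee's two rows into one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, hours REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 8.0), (1, 4.0), (2, 8.0)])

# Employee 1 has two rows; GROUP BY emp_id collapses them into one
# aggregated row while employee 2's single row passes through unchanged.
rows = conn.execute(
    "SELECT emp_id, SUM(hours) AS total_hours, COUNT(*) AS n "
    "FROM emp GROUP BY emp_id ORDER BY emp_id"
).fetchall()
# rows == [(1, 12.0, 2), (2, 8.0, 1)]
```

Note that this only works when the extra columns can be aggregated (SUM, MIN, MAX, etc.); every column in the SELECT list must either appear in the GROUP BY or sit inside an aggregate.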

Related

SQL Dynamic Query generation [closed]

I have 2 tables like below
Here I need to get all the rows from the second table based on each row from the first table that matches the Field1value and Field2value combination. The column to select from the second table is decided by the first table's Field1 and Field2 respectively, and I need to remove any duplicate rows; for example, the last row in the second table satisfies the condition of rows 1 and 3 of the first table.
How do I build this query?
Are you looking for something like this?
select 'select distinct percentage from table2 where ' + Field1 + ' = ''' + Field1value + ''' and ' + Field2 + ' = ''' + Field2value + ''''
from table11
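The idea is a two-step, generate-then-run pattern. Here is a runnable sketch of it (SQLite via Python; the schemas and column names `ColA`, `ColB`, `Percentage` are invented, and SQLite's `||` replaces T-SQL's `+` for string concatenation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Field1 TEXT, Field2 TEXT, Field1value TEXT, Field2value TEXT);
CREATE TABLE table2 (ColA TEXT, ColB TEXT, Percentage INTEGER);
INSERT INTO table1 VALUES ('ColA', 'ColB', 'x', 'y');
INSERT INTO table2 VALUES ('x', 'y', 40), ('x', 'z', 60), ('x', 'y', 40);
""")

# Step 1: generate one SELECT statement per table1 row, using that row's
# Field1/Field2 as the column names to filter on.
generated = [row[0] for row in conn.execute(
    "SELECT 'SELECT DISTINCT Percentage FROM table2 WHERE ' || Field1 || "
    "' = ''' || Field1value || ''' AND ' || Field2 || ' = ''' || "
    "Field2value || '''' FROM table1")]

# Step 2: run each generated statement; DISTINCT drops the duplicate row.
results = [conn.execute(q).fetchall() for q in generated]
# results == [[(40,)]]
```

Be aware that this kind of string-built dynamic SQL is injection-prone if the metadata values come from user input; in SQL Server, QUOTENAME on the column names is the usual safeguard.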

How to avoid group by? [closed]

Say there are two tables: one table has a column name, and the other has a column occupation. I'm trying to find out how many people have more than 6 occupations in my records. I've tried to COUNT the occupations, but the problem is that I then need to GROUP BY. When I group by name, a problem arises when there are two people called "Alex Jones", each having 4 occupations, and so the GROUP BY gives me "Alex Jones: 8".
I'm not sure how to avoid this; some advice would be great. Thanks in advance!
If your problem is that when you group by "name" you end up grouping two names that are identical but refer to different people, then your "name" column is not unique. Try grouping by a combination of columns that together are unique, or by a single unique column.
For example, you can GROUP BY name, other_column, where other_column is a column that in conjunction with "name" uniquely identifies the person. Even better, GROUP BY personal_id, if you have a unique column such as a social security number.
As another option, you can use window functions to count without grouping. For example:
select
    ...,
    name,
    COUNT(occupation) OVER (PARTITION BY name),
    ...
from
    my_table
You can learn how to use it from here:
https://www.postgresql.org/docs/current/tutorial-window.html
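A small runnable sketch tying the two suggestions together (SQLite via Python; the `jobs` table and `person_id` column are invented). It shows the name-collision pitfall, the unique-id fix, and the window-function variant, which attaches the count to every row instead of collapsing them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (person_id INTEGER, name TEXT, occupation TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", [
    (1, 'Alex Jones', 'writer'),
    (1, 'Alex Jones', 'editor'),
    (2, 'Alex Jones', 'pilot'),  # a different person with the same name
])

# Grouping by name alone merges the two people into one count of 3.
by_name = conn.execute(
    "SELECT name, COUNT(occupation) FROM jobs GROUP BY name").fetchall()

# Grouping by a unique id keeps them apart: 2 and 1.
by_id = conn.execute(
    "SELECT person_id, COUNT(occupation) FROM jobs "
    "GROUP BY person_id ORDER BY person_id").fetchall()

# Window-function form: the per-person count is attached to every row,
# with no collapsing at all.
windowed = conn.execute(
    "SELECT name, COUNT(occupation) OVER (PARTITION BY person_id) "
    "FROM jobs ORDER BY person_id").fetchall()
```

Note the window function still needs the unique id in its PARTITION BY to avoid the same name collision.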

How can I find duplicate records in clickhouse [closed]

I want to know how I can find duplicate data entries within one table in ClickHouse.
I am investigating a MergeTree table and have already thrown OPTIMIZE statements at it, but that didn't do the trick: the duplicate entries still persist.
Preferred would be to have a universal strategy without referencing individual column names.
I only want to see the duplicate entries, since I am working on very large tables.
The straight forward way would be to run this query.
SELECT
*,
count() AS cnt
FROM myDB.myTable
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
If that query gets too big, you can run it in pieces:
SELECT
*,
count() AS cnt
FROM myDB.myTable
WHERE (date >= '2020-08-01') AND (date < '2020-09-01')
GROUP BY *
HAVING cnt > 1
ORDER BY date ASC
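The same group-by-every-column pattern can be sketched outside ClickHouse (SQLite via Python here; the table and columns are invented, and the columns are spelled out since `GROUP BY *` is ClickHouse-specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (date TEXT, k TEXT, v INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    ('2020-08-01', 'a', 1),
    ('2020-08-01', 'a', 1),   # exact duplicate of the first row
    ('2020-08-02', 'b', 2),
])

# Group by every column; any group with more than one row is a set of
# fully identical rows, i.e. duplicates.
dupes = conn.execute(
    "SELECT date, k, v, COUNT(*) AS cnt FROM myTable "
    "GROUP BY date, k, v HAVING COUNT(*) > 1 ORDER BY date").fetchall()
# dupes == [('2020-08-01', 'a', 1, 2)]
```

Only the duplicated row comes back, together with how many copies exist, which matches the "only show me the duplicates" requirement.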

Picking unique records in SQL [closed]

Say I have a table with records of people's names, and I draw a prize winner every month. What query can I use in SQL so that I pick a unique record every month and a person does not get picked again the next month? I do not want to delete that person's record.
Create a new column as a flag, named something like 'prizeFlag'. Make it boolean, taking only 0 and 1 (or anything you like), with a default value of 0, meaning the person has not won a prize yet.
When you select a random row, update this field to 1, meaning the person has won a prize.
When you select a random row the next month, add a condition in the WHERE clause that prizeFlag is not equal to 1, to avoid picking the same person twice.
One should store whether a person has already won. A date would make sense, to allow people to win again after, say, 10 years:
ALTER TABLE persons ADD won DATE;
A portable way would be to use a random number function outside the SQL. First find the ID range:
SELECT MIN(id), MAX(id), COUNT(*) FROM persons
With a random number one can then get the next valid ID of a person who has not yet won:
SELECT MIN(ID) FROM persons WHERE won IS NULL AND ID >= ?
int randomno = minId + new Random().nextInt(maxId - minId);
preparedStatement.setInt(1, randomno)
UPDATE persons SET won = CURRENT_DATE WHERE ID = ?
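The whole won-date scheme can be sketched end to end (SQLite via Python; the `persons` rows are invented, and the eligible-ID selection is done in application code rather than with the MIN(ID) trick):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, won DATE)")
conn.executemany("INSERT INTO persons (id, name) VALUES (?, ?)",
                 [(1, 'Ann'), (2, 'Bob'), (3, 'Cid')])

def draw_winner(conn, today):
    # Only people who have not won yet (won IS NULL) are eligible.
    eligible = [r[0] for r in conn.execute(
        "SELECT id FROM persons WHERE won IS NULL")]
    winner = random.choice(eligible)
    # Record the win date so this person is skipped in later draws.
    conn.execute("UPDATE persons SET won = ? WHERE id = ?", (today, winner))
    return winner

first = draw_winner(conn, '2024-01-01')
second = draw_winner(conn, '2024-02-01')
remaining = conn.execute(
    "SELECT COUNT(*) FROM persons WHERE won IS NULL").fetchone()[0]
```

Each draw shrinks the eligible pool by one, so the same person can never win two months in a row, and no record is ever deleted.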

Fastest query to separate half of the table columns as part of DB normalization process [closed]

I use SQL Server 2012. I have a huge table (30 GB) and a pretty basic PC for this amount of data. The table has a column (let's name it COL1) such that, for each COL1 value, plenty of other columns hold just one unique value. I want to start by moving this duplicated data into a separate table where only the unique values will be stored. Now the question is how to do that in the fastest way. Selecting the count of distinct values for each column, grouping by COL1, took me about 5 hours; now I know which columns need to be moved out of the table, but I don't want to wait another 6-8 hours to do it. I have a non-clustered index on COL1 and a primary key on the record id; please let me know if your solution will work better with some other indexes created.
Table has 50 million rows and about 100 columns. about 40 of columns contain time series data for many companies and about 60 contain descriptive data for each company, which is repeated. COL1 is the unique id of the company. As a result I would like to separate time series data from company description data, so that company description will be in a separate table and will have 1 line per company. There are about 22 thousand unique company ids in the dataset. Most of the company description columns are varchar.
I can't find a way to just take TOP 1 element for each COL1 value. I guess other options will take longer time to execute.
Examples of queries that I can think of:
select distinct tbl.COL1, tbl.add1, tbl.add2, other columns with duplicates...
into newtable
from tbl;

select COL1, min(add1), min(add2), min of other columns with duplicates...
into newtable
from tbl
group by COL1;
Thanks!
Create a clustered index on COL1. If you haven't got a clustered index, your table is a heap and every query will involve a table scan. Create a covering index on the columns you want to return. A SELECT DISTINCT (excluding col1) should produce the results you want. Insert into a table with a clustered index on your preferred sort order only.
Assuming your data is non-varying, you can then loop with WHILE and insert in batches, taking values between N*1000 and (N+1)*1000 - 1.
Then add any further indexes which are helpful for returning your data.
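The core move, splitting the repeated descriptive columns into a one-row-per-company table with SELECT DISTINCT, can be sketched like this (SQLite via Python; the `big` table and its columns are invented, and SQL Server's SELECT ... INTO becomes CREATE TABLE ... AS SELECT in SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big "
             "(COL1 INTEGER, company_name TEXT, month TEXT, price REAL)")
conn.executemany("INSERT INTO big VALUES (?, ?, ?, ?)", [
    (1, 'Acme', '2023-01', 10.0),
    (1, 'Acme', '2023-02', 11.0),   # description repeats, time series differs
    (2, 'Brix', '2023-01', 20.0),
])

# SELECT DISTINCT over only the descriptive columns collapses the
# repeats to one row per company id.
conn.execute("CREATE TABLE company AS "
             "SELECT DISTINCT COL1, company_name FROM big")
companies = conn.execute(
    "SELECT COL1, company_name FROM company ORDER BY COL1").fetchall()
# companies == [(1, 'Acme'), (2, 'Brix')]
```

After this, the time-series columns plus COL1 stay in the big table, and the descriptive columns can be dropped from it, with COL1 serving as the join key back to the new company table.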