Structure of nested queries in SQL that are based on frequencies

Noob question.
Trying to put together a nested query and having some issues.
So I have one table that contains information on store transactions.
I am trying to pull the store name and value for those stores where there are many occurrences.
So something like this made sense to me. I realize that some nested query is needed, where I add a condition to the WHERE section so that I pick only those stores that meet the activity threshold.
SELECT store, value
FROM table
WHERE group = 'C'
AND (
SELECT store,
count(*) as total_activity
FROM table
WHERE group = 'C'
GROUP BY store
) > 1000
The issue is that I'm not sure how to structure the query.
Expected output would be
store value
101 3442
101 3434
101 3433
102
102
102
..
The table has a store column, so the count just measures how often each store appears. I just want the rows for stores that occur frequently (more than 1000 times).

SELECT store, value
FROM (
SELECT store, value,
count(*) as total_activity
FROM table
WHERE group = 'C', total_activity > 1000
GROUP BY store )
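One way to structure it (a sketch only, using the table and column names from the question; note that table and group are reserved words in most dialects and would usually need quoting or renaming) is to find the busy stores in a GROUP BY ... HAVING subquery and join back to the detail rows:

SELECT t.store, t.value
FROM table t
INNER JOIN (
    SELECT store
    FROM table
    WHERE group = 'C'
    GROUP BY store
    HAVING COUNT(*) > 1000   -- stores with more than 1000 rows in group C
) busy ON busy.store = t.store
WHERE t.group = 'C'

The aggregate condition lives in HAVING rather than WHERE because it is evaluated per store after grouping.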

Related

Approaching SQL query building

I'm unsure what method to use to build this query. Each week I need to pull a count of invoices grouped by status type, along with the most recent invoice entered.
I have many vendor tables that store sales records, and I create a report each week that pulls the following.
Select Invoice_status, COUNT(Invoice_status) As Total, Max(Invoice_date)
From VendorABCRecordsTable
Group By Invoice_Status
Results for each vendor:
| Invoice_status | Total | column3 |
I run this for VendorABCRecordsTable, Vendor123RecordsTable, and VendorXYZRecordsTable and copy-paste the results into a spreadsheet.
How would I write it so the results would come out
Vendor | Invoice_status | Total | column3
VendorABC |
Vendor123 |
VendorXYZ |
SELECT
'VendorABC' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from VendorABCRecordsTable
group by Invoice_Status
UNION ALL SELECT
'Vendor123' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from Vendor123RecordsTable
group by Invoice_Status
UNION ALL SELECT
'VendorXYZ' Vendor
,Invoice_status
,COUNT(Invoice_status) Total
,MAX(Invoice_date) MaxInvoiceDate
from VendorXYZRecordsTable
group by Invoice_Status
Note the UNION ALL, as there will be no duplicate rows to remove. If the result needs to be ordered somehow, add an ORDER BY clause to (only!) the last query. (And you can alias MAX(Invoice_date) as "Column3", if necessary.)
Definitely not an optimized schema. However, perhaps a UNION query will help you deal with this.
SELECT *, "ABC" AS Vendor FROM VendorABCRecordsTable
UNION SELECT *, "123" FROM Vendor123RecordsTable
UNION SELECT *, "XYZ" FROM VendorXYZRecordsTable;
In order to use the * wildcard, the tables must have the same number of fields, in the same order, with the same data types. Otherwise, explicitly list the fields in each SELECT. There is a limit of 50 SELECT lines. The first SELECT defines the field names and data types.
Now use that query in subsequent queries.
You cannot edit data via a UNION query.
You could specify fields to do aggregate calculations and grouping in each SELECT. However, building the UNION from the raw data provides a dataset that can serve as the source for various manipulations of the data. This UNION is essentially what a properly designed table would look like.
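For example, a sketch of aggregating over the combined raw data, assuming the UNION query above has been saved as a query (or view) named AllVendorRecords, a name used here purely for illustration:

SELECT Vendor,
       Invoice_status,
       COUNT(Invoice_status) AS Total,
       MAX(Invoice_date) AS Column3
FROM AllVendorRecords
GROUP BY Vendor, Invoice_status;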

How can I remove duplicate rows from a table but keeping the summation of values of a column

Suppose there is a table which has several identical rows. I can copy the distinct values by
SELECT DISTINCT * INTO DESTINATIONTABLE FROM SOURCETABLE
But suppose the table has a column named value, and for the sake of simplicity its value is 1 for one particular item in that table. Now that row has another 9 duplicates, so the summation of the value column for that particular item is 10. I want to remove the 9 duplicates (or copy the distinct values, as I mentioned) and have the value for that item show 10 instead of 1. How can this be achieved?
item| value
----+----------------
A | 1
A | 1
A | 1
A | 1
B | 1
B | 1
I want to show this as below
item| value
----+----------------
A | 4
B | 2
Thanks in advance
You can try to use SUM and GROUP BY:
SELECT item, SUM(value) AS value
FROM T
GROUP BY item
SQL Fiddle: http://sqlfiddle.com/#!18/fac26/1
[Results]:
| item | value |
|------|-------|
| A | 4 |
| B | 2 |
Broadly speaking, you can just use SUM and a GROUP BY clause.
Something like:
SELECT column1, SUM(column2) AS Count
FROM SOURCETABLE
GROUP BY column1
Here it is in action: Sum + Group By
Since your table probably isn't just two columns of data, here is a slightly more complex example showing how to do this to a larger table: SQL Fiddle
Note that I've selected my rows individually so that I can access the necessary data, rather than using
SELECT *
And I have achieved this result without the need for selecting data into another table.
EDIT 2:
Further to your comments, it sounds like you want to alter the actual data in your table rather than just query it. There may be a more elegant way to do this, but a simple way is to use the above query to populate a temporary table, delete the contents of the existing table, and then move all the data back. To do this in my existing example:
WITH MyQuery AS (
SELECT name, type, colour, price, SUM(number) AS number
FROM MyTable
GROUP BY name, type, colour, price
)
SELECT * INTO MyTable2 FROM MyQuery;
DELETE FROM MyTable;
INSERT INTO MyTable(name, type, colour, price, number)
SELECT * FROM MyTable2;
DROP TABLE MyTable2;
WARNING: If you're going to try this, please use a development environment first (i.e. one you don't mind breaking!) to ensure it does exactly what you want it to do. It's imperative that your initial query captures ALL the data you want.
Here is the SQL Fiddle of this example in action: SQL Fiddle

Assign unique ID's to three tables in SELECT query, ID's should not overlap

I am working on SQL Server and I want to assign unique ids to rows being pulled from three tables, but the ids should not overlap.
Let's say table one contains cars data, table two contains house data, and table three contains city data. I want to pull all this data into a single table with a unique id for each row, say cars from 1-100, houses from 101-200 and cities from 300-400.
How can I achieve this using only SELECT queries? I can't use INSERT statements.
To be more precise,
I have one table with computer systems/servers host information, which has ids from 500-700.
I have two other tables, storage devices (ids from 200-600) and routers (ids from 700-900). I have already collected the systems data. Now I want to pull the storage and router data in such a way that the consolidated data at my end has a unique id for every record. This needs to be done only by using SELECT queries.
I was using SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS UniqueID and storing it in temp tables (separate ones for storage and routers). But I believe that this may lead to some overlapping. Please suggest any other way to do this.
An extension to this question:
Creating a consistent integer from a string:
All I have is various strings like this
String1
String2Hello123
String3HelloHowAreYou
I need to convert them into positive integers, something like
String1 = 12
String2Hello123 = 25
String3HelloHowAreYou = 4567
Note that I am not expecting the numbers in any order. The only requirement is that the number generated for one string should not conflict with the number for any other string.
Now, later, after a reboot, suppose I no longer have the 2nd string and instead there is a new string:
String1 = 12
String3HelloHowAreYou = 4567
String2Hello123HowAreyou = 28
Note that the number 25 generated for the 2nd string earlier cannot be used for the new string.
Using extra storage (temp tables) is not allowed.
If you don't care where the data comes from:
WITH dat AS (
    SELECT 't1' AS src, id FROM table1
    UNION ALL
    SELECT 't2' AS src, id FROM table2
    UNION ALL
    SELECT 't3' AS src, id FROM table3
)
SELECT *,
       id2 = ROW_NUMBER() OVER (ORDER BY _some_column_)
FROM dat
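If the ids also need to stay stable between runs (ROW_NUMBER depends on the ordering each time it is evaluated), another sketch, assuming hypothetical table names and that each source table already carries its own integer id, is to keep each source's id and push every source into its own disjoint range with a fixed offset:

SELECT 'systems' AS src, id           AS unified_id FROM ComputerSystems
UNION ALL
SELECT 'storage' AS src, id + 1000000 AS unified_id FROM StorageDevices   -- offset keeps the ranges disjoint
UNION ALL
SELECT 'routers' AS src, id + 2000000 AS unified_id FROM Routers;

As long as each offset exceeds the largest possible id of every other source, the ranges can never overlap. This does not solve the string-to-integer part of the question, though: hashing arbitrary strings into a fixed-size integer can always collide, so some stored mapping would be needed to make it truly conflict-free.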

SQL Query relating to grouping entries

When I normalised my database, I used a text value to group together entries giving them the same foreign key. However, I also had 2 other fields prior to normalisation which used reference numbers to group together entries, one into pairs of entries and one into groups of entries. Because these grouped entries did not necessarily have the same text value, some entries will have been left out of the grouping (shared foreign key id).
I need a query which selects all entries which share a PairRef or GroupRef where the entries in that group do not all have the same ForeignKeyID.
Example:
ID | PairRef | GroupRef | ForeignKeyID | TextValue (in linked table2)
 1 |      25 |       25 |          123 | 'Text value 123'
 2 |      25 |       25 |          255 | 'Text value 255'
 3 |       1 |       50 |          201 | 'Text value 201'
 4 |       1 |       50 |          201 | 'Text value 201'
 5 |       2 |       50 |          202 | 'Text value 202'
 6 |       2 |       50 |          202 | 'Text value 202'
 7 |       3 |       50 |          203 | 'Text value 203'
 8 |       3 |       50 |          203 | 'Text value 203'
I then need to be able to edit the data to group them together. The problem is that in order to do this, I would need the query to be from more than one table because I need to see the text associated with the foreign key. I have found that using phpMyAdmin, although I can create queries from more than one table using inner joins, the results of these queries cannot be edited in the way that queries from one table can.
I guess the alternative would be to do an update query on the query results. Could you give an example of a quick and easy way of doing an update query on query results, without losing the original query, which needs to be used repeatedly?
In the above example, for the regrouping of ID 1 and ID 2, which share the same PairRef, I would need to physically look at TextValue 123 and 255 and, depending on which one was the more appropriate text label, decide which entry to change. Let's say that 'Text value 123' was the value I wanted to retain for that grouping. I would update ID 2 to ForeignKeyID 123, which would obviously automatically change the TextValue for that entry to 'Text value 123'.
For the regrouping of IDs 3 to 8, which share the same GroupRef, if I decided after looking at the data to re-group them all as 'Text value 201', I would change IDs 5,6,7 and 8 to ForeignKeyID = 201, which would automatically change all the TextValues for those entries to 'Text Value 201'.
IDs 1 to 8 would then no longer appear on the query results because the grouping problem would have been resolved and they would no longer meet the query criteria.
I need to find the easiest way possible of doing this, as grouping entries together is one of the main purposes of the databases and there is a lot of this editing to do.
Thank you
For the first part (select all entries which share a PairRef or GroupRef where the entries in that group do not all have the same ForeignKeyID), the following query can be used. It groups by PairRef and selects PairRefs which have more than 1 distinct ForeignKeyID. Then, all entries which have these PairRefs are selected. Similarly, the data is grouped by GroupRef also. All GroupRefs which have more than 1 distinct ForeignKeyID are selected. Then, all entries which have these GroupRefs are selected.
SELECT
T1.*
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ForeignKeyID = T2.ForeignKeyID
WHERE PairRef IN
(
SELECT
PairRef
FROM table1
GROUP BY PairRef
HAVING Count(DISTINCT ForeignKeyID) > 1
)
OR GroupRef IN
(
SELECT
GroupRef
FROM table1
GROUP BY GroupRef
HAVING Count(DISTINCT ForeignKeyID) > 1
);
For the second part (edit the data to group them together), I do not understand why you would need to see the TextValue from table2 (if it corresponds to the ForeignKeyID in table1). Anyway, once you have seen the PairRefs / GroupRefs which have different ForeignKeyID values, you can run an update statement for each PairRef / GroupRef, since it seems like a manual process.
UPDATE Table1
SET ForeignKeyID = <ForeignKeyID to be set>
WHERE PairRef = <PairRef to update>;
UPDATE Table1
SET ForeignKeyID = <ForeignKeyID to be set>
WHERE GroupRef = <GroupRef to update>;
You may need to run the first query again to check the data, because the UPDATE query for GroupRef might result in different values for PairRef.
Here is a SQL Fiddle demo. Thank you, @JohnLBevan, for the stub.
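If the diagnostic query has to be re-run repeatedly, as the question mentions, one option (a sketch in MySQL syntax, using the table names above and a hypothetical view name) is to save it as a view so it does not have to be retyped between editing sessions:

CREATE VIEW MismatchedGroups AS
SELECT T1.*, T2.TextValue
FROM Table1 T1
INNER JOIN Table2 T2
    ON T1.ForeignKeyID = T2.ForeignKeyID
WHERE T1.PairRef IN (
        SELECT PairRef FROM Table1
        GROUP BY PairRef
        HAVING COUNT(DISTINCT ForeignKeyID) > 1
      )
   OR T1.GroupRef IN (
        SELECT GroupRef FROM Table1
        GROUP BY GroupRef
        HAVING COUNT(DISTINCT ForeignKeyID) > 1
      );

After each round of UPDATE statements, SELECT * FROM MismatchedGroups; shows whatever still needs regrouping; the edits themselves still happen through the UPDATE statements on Table1.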

SQL - Insert using Column based on SELECT result

I currently have a table called tempHouses that looks like:
avgprice | dates | city
dates are stored as yyyy-mm-dd
However I need to move the records from that table into a table called houses that looks like:
city | year2002 | year2003 | year2004 | year2005 | year2006
The information in tempHouses contains average house prices from 1995 - 2014.
I know I can use SUBSTRING to get the year from the dates:
SUBSTRING(dates, 1, 4)
So basically, for each city in tempHouses.city, I need to get the average house price for each of the above years into one record.
Any ideas on how I would go about doing this?
This is a SQL Server approach, and a PIVOT may be a better fit, but here's one way:
SELECT City,
       AVG(year2002) AS year2002,
       AVG(year2003) AS year2003,
       AVG(year2004) AS year2004
FROM (
    SELECT City,
           CASE WHEN Dates BETWEEN '2002-01-01T00:00:00' AND '2002-12-31T23:59:59' THEN avgprice
                END AS year2002,   -- no ELSE: rows from other years become NULL, which AVG ignores
           CASE WHEN Dates BETWEEN '2003-01-01T00:00:00' AND '2003-12-31T23:59:59' THEN avgprice
                END AS year2003,
           CASE WHEN Dates BETWEEN '2004-01-01T00:00:00' AND '2004-12-31T23:59:59' THEN avgprice
                END AS year2004
           -- Repeat for each year
    FROM tempHouses
) AS yearly
GROUP BY City
The inner query gets the data into the correct format for each record (City, year2002, year2003, year2004), whilst the outer query gets the average for each City.
There may be many ways to do this, and performance may be the deciding factor in which one to choose.
The best way would be to use a script to perform the query execution for you, because you will need to run it multiple times, extracting the data year by year. Make sure that the only required columns are city and row id:
http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
INSERT INTO <table> (city) SELECT DISTINCT `city` FROM <old_table>;
Then for each city extract the average values, insert them into a temporary table and then insert into the main table.
SELECT city, AVG(avgprice) AS avg_price, SUBSTRING(dates, 1, 4) AS yr FROM <old_table> GROUP BY city, yr;
Otherwise you're looking at a combination query using joins and potentially unions to extrapolate the data. Because you're flattening the table into a single row per city, it's going to be a little tough to do. You should create an index on the date column first if you don't want the query to hit memory limits or just take a very long time to execute.
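Putting the pieces together, a sketch (MySQL syntax, assuming the houses column list shown in the question) that loads houses in a single INSERT ... SELECT using conditional aggregation, so no temporary table or per-city looping is needed:

INSERT INTO houses (city, year2002, year2003, year2004, year2005, year2006)
SELECT city,
       AVG(CASE WHEN YEAR(dates) = 2002 THEN avgprice END) AS year2002,   -- other years yield NULL, ignored by AVG
       AVG(CASE WHEN YEAR(dates) = 2003 THEN avgprice END) AS year2003,
       AVG(CASE WHEN YEAR(dates) = 2004 THEN avgprice END) AS year2004,
       AVG(CASE WHEN YEAR(dates) = 2005 THEN avgprice END) AS year2005,
       AVG(CASE WHEN YEAR(dates) = 2006 THEN avgprice END) AS year2006
FROM tempHouses
GROUP BY city;

Rows from years outside 2002-2006 fall out of every CASE and are simply ignored.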