"unpivot" data from Excel - sql

I need to clean up some ugly data. What I have looks like this:
ID,someFields,Supplier,Supplier_1,Supplier_2,Price,Price_1,Price_2,Weight,Weight_1,Weight_2
and so forth. The suffixes go up to _9, and there are actually 8 different fields repeated this way with _1 to _9. Of course, Price_1 belongs to Supplier_1, and so forth.
I would now like to unpivot to
ID,someFields,Supplier,Price,Weight
by duplicating ID and someFields.
An important note: the _1 to _9 fields can be null, and in fact most of them are.
Tools I have:
Excel
MS Access
an Oracle schema I have access to, which I could (mis)use
I found this question:
How to simulate UNPIVOT in Access 2010?
However, that approach also multiplies rows that have only one supplier.
Any ideas?

You can use a union query. Filter each branch on its own supplier column, so empty slots don't produce extra rows:
SELECT * INTO NewTable FROM
(SELECT ID, someFields, Supplier, Price, Weight FROM [Table]
WHERE Supplier Is Not Null
UNION ALL
SELECT ID, someFields, Supplier_1, Price_1, Weight_1 FROM [Table]
WHERE Supplier_1 Is Not Null
<...>) AS u
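A minimal sketch of the same union-query idea, run against an in-memory SQLite database from Python instead of Access, with one branch generated per suffix rather than typed out by hand. The table name wide, the sample rows, and the rule that a slot counts as empty when its Supplier column is NULL are assumptions for illustration; only the suffixes "", _1 and _2 are shown.

```python
import sqlite3

# Hypothetical wide table modelled on the question (only suffixes "", _1, _2).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wide (ID INTEGER, someFields TEXT, "
    "Supplier TEXT, Supplier_1 TEXT, Supplier_2 TEXT, "
    "Price REAL, Price_1 REAL, Price_2 REAL, "
    "Weight REAL, Weight_1 REAL, Weight_2 REAL)"
)
conn.executemany(
    "INSERT INTO wide VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # Row 1 has a single supplier; row 2 has two.
        (1, "a", "ACME", None, None, 9.5, None, None, 1.0, None, None),
        (2, "b", "ACME", "Globex", None, 9.0, 8.5, None, 1.0, 1.2, None),
    ],
)

# One SELECT per suffix. ID and someFields are repeated in every branch, and
# the WHERE clause drops slots whose supplier is NULL, so single-supplier
# rows are not multiplied.
suffixes = ["", "_1", "_2"]
branches = [
    f"SELECT ID, someFields, Supplier{s} AS Supplier, Price{s} AS Price, "
    f"Weight{s} AS Weight FROM wide WHERE Supplier{s} IS NOT NULL"
    for s in suffixes
]
sql = " UNION ALL ".join(branches) + " ORDER BY ID, Supplier"
long_rows = conn.execute(sql).fetchall()

for row in long_rows:
    print(row)
```

Row 1 comes back once and row 2 twice, which is the behaviour the question asks for. Extending to _9 and to the other field groups only means adding to suffixes and to the column template.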

Related

IF Field Exists in StandardSQL

I have a table with these columns:
Apples
Bananas
Peaches - however, this column may or may not appear. The table is dropped and reloaded every 5 hours, and I need to be ready for the situation where the column "Peaches" is not available.
I have found a couple of similar questions here on StackOverflow, but they all used LegacySQL to solve the problem.
I was trying something like this:
SELECT *
FROM project.dataset.fruits
WHERE EXISTS(
SELECT peaches
FROM project.dataset.fruits
)
If the "fruits" table doesn't currently have the column, the whole query fails with an error saying "peaches" is an unknown name.
Any ideas how to get around this?
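Below is for BigQuery Standard SQL. It never references peaches by name: it serializes each row with TO_JSON_STRING and searches that text for the key, so the query still compiles when the column is missing.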
#standardSQL
SELECT * FROM `project.dataset.fruits`
WHERE EXISTS (
SELECT 1 FROM `project.dataset.fruits` t
WHERE REGEXP_CONTAINS(TO_JSON_STRING(t), '[{,]"peaches":')
LIMIT 1
)
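A rough Python analogue of the regex check, to show why the pattern is anchored on '{' or ','. A plain dict stands in for a row, and json.dumps with compact separators mimics the no-spaces layout the pattern expects; this illustrates the idea and is not BigQuery's own serializer. The helper name row_has_peaches is hypothetical, and the match is case-sensitive.

```python
import json
import re

# Key name at an object boundary: preceded by '{' or ',' and followed by ':'.
PEACHES_KEY = re.compile(r'[{,]"peaches":')

def row_has_peaches(row):
    """Check a dict (standing in for a table row) for a 'peaches' key."""
    # Compact separators (no spaces after ',' or ':'), which the regex relies on.
    text = json.dumps(row, separators=(",", ":"))
    return PEACHES_KEY.search(text) is not None

print(row_has_peaches({"apples": 3, "bananas": 5}))
print(row_has_peaches({"apples": 3, "bananas": 5, "peaches": None}))
# Quotes inside values are escaped, so this value cannot fake a key.
print(row_has_peaches({"note": ',"peaches":'}))
```

A NULL peaches value still serializes as "peaches":null, so the check detects the column even when every value in it is empty.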
You may use INFORMATION_SCHEMA to check whether the column exists; this query returns a row only if it does:
SELECT
1
FROM
`project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE
table_name="fruits" AND column_name="peaches"
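The catalog check lets you decide, before building the query, whether peaches may be referenced at all. SQLite has no INFORMATION_SCHEMA, so this sketch uses its pragma_table_info() table-valued function (SQLite 3.16 and later) as a stand-in and then builds the select list in Python. The helper name column_exists and the sample table are assumptions.

```python
import sqlite3

def column_exists(conn, table, column):
    """Return True if `table` has a column named `column` (case-insensitive).

    pragma_table_info() plays the role of INFORMATION_SCHEMA.COLUMNS here.
    """
    row = conn.execute(
        "SELECT 1 FROM pragma_table_info(?) WHERE name = ? COLLATE NOCASE",
        (table, column),
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (apples INTEGER, bananas INTEGER)")
print(column_exists(conn, "fruits", "peaches"))  # column not loaded yet

conn.execute("ALTER TABLE fruits ADD COLUMN peaches INTEGER")
print(column_exists(conn, "fruits", "peaches"))

# Only put peaches into the query text once we know it exists.
select_list = ["apples", "bananas"]
if column_exists(conn, "fruits", "peaches"):
    select_list.append("peaches")
rows = conn.execute(f"SELECT {', '.join(select_list)} FROM fruits").fetchall()
```

In BigQuery the equivalent would be a scripting step that runs the INFORMATION_SCHEMA query first and branches on its result.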

I need to SELECT a group of columns not null

Basically, I have a table with a series of columns named:
ATTRIBUTE10, ATTRIBUTE11, ATTRIBUTE12 ... ATTRIBUTE50
I want a query that returns whichever of the columns ATTRIBUTE10 to ATTRIBUTE50 are not null.
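As others have commented, we aren't exactly sure of your requirements, but if you want a list of attribute/value pairs, UNPIVOT can do that. UNPIVOT also skips NULL values, so only the non-null attributes come back (all the listed columns must have the same data type):
SELECT attribute, value
FROM
(SELECT * FROM YourFile) p
UNPIVOT
(value FOR attribute IN
(ATTRIBUTE10, ATTRIBUTE11, ATTRIBUTE12 /* , ... through ATTRIBUTE50 */)
) AS unpvt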
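For databases without UNPIVOT, a common substitute is to cross join every row with a list of column names and pick the matching column with CASE. A minimal sketch in SQLite via Python, assuming a hypothetical YourFile table with an ID column and only three of the ATTRIBUTE columns; the final NULL filter reproduces UNPIVOT's behaviour of skipping NULLs.

```python
import sqlite3

# Hypothetical table with only three of the ATTRIBUTE columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourFile (ID INTEGER, ATTRIBUTE10 TEXT, "
    "ATTRIBUTE11 TEXT, ATTRIBUTE12 TEXT)"
)
conn.executemany(
    "INSERT INTO YourFile VALUES (?, ?, ?, ?)",
    [(1, "red", None, "large"), (2, None, None, "small")],
)

# Cross join each row with the list of column names, pick the matching
# column with CASE, then drop NULLs to mimic UNPIVOT.
UNPIVOT_SQL = """
WITH attrs(attribute) AS (
    VALUES ('ATTRIBUTE10'), ('ATTRIBUTE11'), ('ATTRIBUTE12')
)
SELECT ID, attribute, value
FROM (
    SELECT p.ID AS ID,
           a.attribute AS attribute,
           CASE a.attribute
               WHEN 'ATTRIBUTE10' THEN p.ATTRIBUTE10
               WHEN 'ATTRIBUTE11' THEN p.ATTRIBUTE11
               WHEN 'ATTRIBUTE12' THEN p.ATTRIBUTE12
           END AS value
    FROM YourFile AS p CROSS JOIN attrs AS a
)
WHERE value IS NOT NULL
ORDER BY ID, attribute
"""
pairs = conn.execute(UNPIVOT_SQL).fetchall()
for pair in pairs:
    print(pair)
```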
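Maybe you can use a WHERE condition covering all the columns, or use the BETWEEN operator as below.
For all columns:
where ATTRIBUTE10 is not null and ATTRIBUTE11 is not null ...... and ATTRIBUTE50 is not null
Using the BETWEEN operator:
where ATTRIBUTE10 between ATTRIBUTE11 and ATTRIBUTE50
Note that BETWEEN compares values rather than testing for NULL: it keeps only rows where ATTRIBUTE10, ATTRIBUTE11 and ATTRIBUTE50 are all non-null and ATTRIBUTE10 lies between the other two, and it says nothing about ATTRIBUTE12 through ATTRIBUTE49, so it is not equivalent to the explicit checks.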
One way to approach the problem is to unfold your table-with-a-zillion-like-named-attributes into one in which you've got one attribute per row, with appropriate foreign keys back to the original table. So something like:
CREATE TABLE ATTR_TABLE AS
SELECT ID_ATTR, ID_TABLE_WITH_ATTRS, ATTR
FROM (SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+1 AS ID_ATTR, ID_TABLE_WITH_ATTRS, ATTRIBUTE10 AS ATTR FROM TABLE_WITH_ATTRS UNION ALL
SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+2, ID_TABLE_WITH_ATTRS, ATTRIBUTE11 FROM TABLE_WITH_ATTRS UNION ALL
SELECT ((ID_TABLE_WITH_ATTRS-1)*100)+3, ID_TABLE_WITH_ATTRS, ATTRIBUTE12 FROM TABLE_WITH_ATTRS);
This only unfolds ATTRIBUTE10, ATTRIBUTE11, and ATTRIBUTE12, but you should be able to get the idea - the rest of the attributes just requires a little cut-n-paste on your part.
You can then query this table to find your non-NULL attributes as
SELECT *
FROM ATTR_TABLE
WHERE ATTR IS NOT NULL
ORDER BY ID_ATTR
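Rather than cut-n-pasting the remaining branches, you can generate them. A sketch in Python that builds the same CREATE TABLE ... AS statement for ATTRIBUTE10 through ATTRIBUTE50 and runs it on an in-memory SQLite table; the demo data is hypothetical, and it keeps the answer's ((ID-1)*100)+n key scheme, which stays unique only while there are fewer than 100 attribute columns.

```python
import sqlite3

# ATTRIBUTE10 .. ATTRIBUTE50, as in the question (41 columns).
ATTR_COLS = [f"ATTRIBUTE{n}" for n in range(10, 51)]

def build_unfold_sql(cols):
    """Return CREATE TABLE ATTR_TABLE AS ... with one UNION ALL branch per column."""
    branches = [
        f"SELECT ((ID_TABLE_WITH_ATTRS - 1) * 100) + {i} AS ID_ATTR, "
        f"ID_TABLE_WITH_ATTRS, {col} AS ATTR FROM TABLE_WITH_ATTRS"
        for i, col in enumerate(cols, start=1)
    ]
    return "CREATE TABLE ATTR_TABLE AS\n" + "\nUNION ALL\n".join(branches)

# Small in-memory demo: one row with only two attributes filled in.
conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"{c} TEXT" for c in ATTR_COLS)
conn.execute(f"CREATE TABLE TABLE_WITH_ATTRS (ID_TABLE_WITH_ATTRS INTEGER, {col_defs})")
conn.execute(
    "INSERT INTO TABLE_WITH_ATTRS (ID_TABLE_WITH_ATTRS, ATTRIBUTE10, ATTRIBUTE37) "
    "VALUES (1, 'x', 'y')"
)
conn.execute(build_unfold_sql(ATTR_COLS))

non_null = conn.execute(
    "SELECT * FROM ATTR_TABLE WHERE ATTR IS NOT NULL ORDER BY ID_ATTR"
).fetchall()
print(non_null)
```

The generated statement can be printed and pasted into Oracle as-is, since it uses only plain SELECT and UNION ALL.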
Hopefully the difficulty you're encountering in dealing with this table-with-a-zillion-repeated-fields teaches you a hard lesson about exactly why tables with repeated fields or groups of fields are a Bad Idea.

Compare tables and Find the missing records

I am trying to compare a table T1 and a view V1, find the records missing from table T1, and display the results in Excel when a button is clicked. I am trying to wrap this up in a stored procedure and call it from VBA code, but I am not sure how to start. The field names are different in the two objects, although they hold the same data. Any help will be much appreciated. I have tried many code samples, but none of them did what I want.
Table T1
alpha.FileID
Master Policy Number
Insurance Name
View V1
FileID
PolNO
InsName
These are a few of the columns. Though they have different field names, the data is the same. Sometimes records are missing from table T1, and I need to compare the two and find the missing records.
SELECT View_v1.[Insured Name]
FROM View_v1
WHERE View_v1.alpha.FileID NOT IN
(
SELECT Table_t1.FileID
FROM Table_t1
)
An EXCEPT clause is the easiest way to do this:
SELECT FileID, PolNO, InsName
FROM V1
EXCEPT
SELECT FileID, [Master Policy Number], [Insurance Name]
FROM T1
This will give you the rows in the first SELECT that do not exist in the second one (depending on your desired results, you might flip the top and bottom SELECTs). As long as both SELECTs have the same number of columns with compatible data types, the field names don't matter; your result set will show the field names of the first SELECT.
Also, since you didn't specify your DBMS: some systems, such as Oracle, use MINUS instead of EXCEPT.
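A small runnable check of the EXCEPT approach in SQLite via Python, showing that columns are matched by position and that the result takes the first SELECT's names. V1 is created as a plain table rather than a view, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: T1 keeps the table's column names, V1 (a plain
# table here rather than a view) uses the view's names.
conn.execute(
    'CREATE TABLE T1 (FileID INTEGER, "Master Policy Number" TEXT, "Insurance Name" TEXT)'
)
conn.execute("CREATE TABLE V1 (FileID INTEGER, PolNO TEXT, InsName TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(1, "P-100", "Acme"), (2, "P-200", "Globex")])
conn.executemany("INSERT INTO V1 VALUES (?, ?, ?)",
                 [(1, "P-100", "Acme"), (2, "P-200", "Globex"), (3, "P-300", "Initech")])

# Columns are matched by position, so the differing names don't matter.
cur = conn.execute(
    """
    SELECT FileID, PolNO, InsName FROM V1
    EXCEPT
    SELECT FileID, "Master Policy Number", "Insurance Name" FROM T1
    """
)
missing = cur.fetchall()
result_names = [d[0] for d in cur.description]
print(result_names, missing)
```

EXCEPT returns distinct rows, so duplicate rows in V1 collapse to one line in the output.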
I believe this is what you're looking for, based on your description.
I'm comparing every field, not just FileID as your example appears to be attempting. So if you truly want to look only for missing FileIDs, just remove the other join conditions.
SELECT View_v1.FileID, View_v1.PolNO, View_v1.InsName
FROM View_v1
LEFT JOIN Table_t1
on View_v1.FileID = Table_t1.FileID
and View_v1.PolNO = Table_t1.[Master Policy Number]
and View_v1.InsName = Table_t1.[Insurance Name]
WHERE Table_t1.FileID is null
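A sketch of the LEFT JOIN anti-join in SQLite via Python, run once with all three join conditions and once with FileID only, to show what removing the extra conditions changes. The tables and rows are hypothetical; FileID 2 differs only in its insurance name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: T1 (table) and V1 (a plain table instead of a view).
conn.execute(
    'CREATE TABLE T1 (FileID INTEGER, "Master Policy Number" TEXT, "Insurance Name" TEXT)'
)
conn.execute("CREATE TABLE V1 (FileID INTEGER, PolNO TEXT, InsName TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(1, "P-100", "Acme"), (2, "P-200", "Globex Corp")])
conn.executemany("INSERT INTO V1 VALUES (?, ?, ?)",
                 [(1, "P-100", "Acme"), (2, "P-200", "Globex"), (3, "P-300", "Initech")])

def missing_rows(conn, all_fields):
    """Anti-join: rows of V1 with no matching row in T1."""
    extra = (
        ' AND v.PolNO = t."Master Policy Number"'
        ' AND v.InsName = t."Insurance Name"'
    ) if all_fields else ""
    sql = (
        "SELECT v.FileID, v.PolNO, v.InsName "
        "FROM V1 AS v LEFT JOIN T1 AS t "
        f"ON v.FileID = t.FileID{extra} "
        "WHERE t.FileID IS NULL "
        "ORDER BY v.FileID"
    )
    return conn.execute(sql).fetchall()

print(missing_rows(conn, all_fields=True))   # FileID 2 (name differs) and 3
print(missing_rows(conn, all_fields=False))  # FileID 3 only
```

One difference from EXCEPT: if a compared column is NULL on both sides, the equality in the join fails and the row is reported as missing, whereas EXCEPT treats the two NULLs as equal.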

SQL Server query with extremely large IN clause results in numerous queries in activity monitor

SQL Server 2014 database. Table with 200 million rows.
Very large query with HUGE IN clause.
I originally wrote this query for them, but they have grown the IN clause to over 700 entries. The CTE looks unnecessary because I have omitted all the select columns and their substring() transformations for simplicity.
The focus is on the IN clause. 700+ pairs of these.
WITH cte AS (
SELECT *
FROM [AODS-DB1B]
WHERE
Source+'-'+Target
IN
(
'ACY-DTW',
'ACY-ATL',
'ACY-ORD',
-- ...
-- 700+ of these pairs
-- ...
'HTS-PGD',
'PIE-BMI',
'PGD-HTS'
)
)
SELECT *
FROM cte
order by Source, Target, YEAR, QUARTER
When running, this query shoots CPU to 100% for hours - not unexpectedly.
There are indexes on all columns involved.
Question 1: Is there a better, or more efficient, way to accomplish this query than the huge IN clause? Would 700 UNION ALLs be better?
Question 2: When this query runs, it creates a Session_ID that contains 49 "threads" (49 processes that all have the same Session_ID). Every one of them is an instance of this query, with its "Command" being this query text.
21 of them SUSPENDED,
14 of them RUNNING, and
14 of them RUNNABLE.
This changes rapidly as the task is running.
WHAT the heck is going on there? Is this SQL Server breaking the query up into pieces to work on it?
I recommend you store your 700+ strings in a permanent table, as it is generally considered bad practice to keep that much metadata in a script. You can create the table like this:
CREATE TABLE dbo.Lookup(Source varchar(250), Target varchar(250))
CREATE INDEX IX_Lookup_Source_Target on dbo.Lookup(Source,Target)
INSERT INTO dbo.Lookup (Source,Target)
SELECT 'ACY','DTW'
UNION
SELECT 'ACY','ATL'
.......
and then you can simply join on this table:
SELECT a.* FROM [AODS-DB1B] a
INNER JOIN dbo.Lookup lt ON lt.Source = a.Source
AND lt.Target = a.Target
ORDER BY a.Source, a.Target, a.[YEAR], a.[QUARTER]
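A runnable miniature of the lookup-table join, using SQLite from Python in place of SQL Server. The fact-table columns, sample rows and the composite primary key on Lookup are assumptions; the point is that the pairs live in a keyed table and are matched column by column instead of through a concatenated string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the fact table; the real one has far more columns.
conn.execute(
    'CREATE TABLE "AODS-DB1B" (Source TEXT, Target TEXT, YEAR INTEGER, QUARTER INTEGER)'
)
conn.executemany(
    'INSERT INTO "AODS-DB1B" VALUES (?, ?, ?, ?)',
    [("ACY", "DTW", 2016, 1), ("ACY", "ORD", 2016, 1),
     ("HTS", "PGD", 2015, 4), ("PGD", "HTS", 2015, 4)],
)

# Lookup table keyed on (Source, Target); in practice the 700+ pairs would be
# loaded from a file or another table instead of being typed into the query.
conn.execute("CREATE TABLE Lookup (Source TEXT, Target TEXT, PRIMARY KEY (Source, Target))")
conn.executemany("INSERT INTO Lookup VALUES (?, ?)",
                 [("ACY", "DTW"), ("ACY", "ATL"), ("HTS", "PGD")])

rows = conn.execute(
    """
    SELECT a.*
    FROM "AODS-DB1B" AS a
    INNER JOIN Lookup AS lt
        ON lt.Source = a.Source AND lt.Target = a.Target
    ORDER BY a.Source, a.Target, a.YEAR, a.QUARTER
    """
).fetchall()
print(rows)
```

Qualifying the ORDER BY columns with the a. alias matters here: both tables have Source and Target, so unqualified names would be ambiguous.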
However, even better would be to normalise the AODS-DB1B table and store SourceId and TargetId INT values instead, with the VARCHAR values stored in Source and Target tables. You can then write a query that only performs integer comparisons rather than string comparisons and this should be much faster.
Put all of your codes into a temporary table (or a permanent one if suitable), then join to it:
SELECT *
FROM [AODS-DB1B]
INNER JOIN NEW_TABLE ON Source+'-'+Target = NEW_TABLE.Code
WHERE
...
...
You can create a temp table with all those values and then JOIN to it; that should make the process a lot faster.
I like the answer from Jaco.
Make sure there is an index on (Source, Target).
It may be worth giving this a try:
where ( source = 'ACY' and target in ('DTW', 'ATL', 'ORD') )
or ( source = 'HTS' and target in ('PGD') )
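With 700+ pairs, writing the grouped predicate by hand is error-prone, so it can be generated. A sketch in Python that groups targets under their source and emits a parameterized WHERE clause, then checks it against a hypothetical SQLite table; the function name grouped_predicate is made up for illustration.

```python
import sqlite3
from collections import defaultdict

def grouped_predicate(pairs):
    """Build '(Source = ? AND Target IN (?, ...)) OR ...' with one group per source.

    Returns the WHERE-clause text and the parameter list in matching order.
    """
    by_source = defaultdict(list)
    for source, target in pairs:
        by_source[source].append(target)
    clauses, params = [], []
    for source, targets in by_source.items():
        placeholders = ", ".join("?" for _ in targets)
        clauses.append(f"(Source = ? AND Target IN ({placeholders}))")
        params.append(source)
        params.extend(targets)
    return " OR ".join(clauses), params

pairs = [("ACY", "DTW"), ("ACY", "ATL"), ("ACY", "ORD"), ("HTS", "PGD")]
where_sql, params = grouped_predicate(pairs)
print(where_sql)
print(params)

# Quick check against an in-memory SQLite table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (Source TEXT, Target TEXT)")
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [("ACY", "ORD"), ("ACY", "BOS"), ("HTS", "PGD"), ("PGD", "HTS")])
matched = conn.execute(
    f"SELECT Source, Target FROM flights WHERE {where_sql} ORDER BY Source, Target",
    params,
).fetchall()
print(matched)
```

Compared with Source+'-'+Target IN (...), this keeps Source and Target as bare columns, which gives the optimizer a chance to use the (Source, Target) index.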

SQL Contains query

I have two tables (A and B) that contain IDs; however, in table B some records have these IDs grouped together, e.g. the IDExec column may contain a value like 'id1 id2'. I'm trying to find the IDs in table A that do not appear in table B. I thought I could use something like:
SELECT *
FROM A
WHERE NOT EXISTS( SELECT *
FROM B
WHERE Contains(A.ExecID, B.ExecID))
This isn't working, because CONTAINS needs its second parameter to be a string, TEXT_LEX or variable.
Do you guys have a solution to this problem?
To shed more light on the above problem, the table structures are as follows:
Table A (IDExec, ProdName, BuySell, Quantity, Price, DateTime)
Table B (IDExec, ClientAccountNo, Quantity)
The C# code I've written to process the buy/sell data in Table A groups together all the buys/sells of the same product on a given day. The question now is: how would you guys normalise this so I'm not bastardizing IDExec? Would it be better to create a new ID column in Table B called AllocID and link the two tables like that? So something like this:
Table A (IDExec, AllocID, ProdName, BuySell, Quantity, Price, DateTime)
Table B (AllocID, ClientAccountNo, Quantity)
This data should be normalized, storing multiple values in one field is a bad idea.
A workaround is using LIKE:
SELECT *
FROM A
WHERE NOT EXISTS( SELECT *
FROM B
WHERE ' '+B.ExecID+' ' LIKE '% '+A.ExecID+' %')
This is using space delimited values per your example.
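A runnable comparison in SQLite via Python, which concatenates with || instead of +. The sample IDs are hypothetical and chosen so that id1 is a prefix of id10: with the padding, only whole space-delimited tokens match; without it, id1 is wrongly found inside 'id10 id2'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ExecID TEXT)")
conn.execute("CREATE TABLE B (ExecID TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", [("id1",), ("id10",), ("id3",)])
# One grouped, space-delimited value in B (hypothetical data).
conn.executemany("INSERT INTO B VALUES (?)", [("id10 id2",)])

# Padding both sides with spaces makes LIKE match whole tokens only.
PADDED = """
SELECT ExecID FROM A
WHERE NOT EXISTS (
    SELECT * FROM B
    WHERE ' ' || B.ExecID || ' ' LIKE '% ' || A.ExecID || ' %'
)
ORDER BY ExecID
"""
# Without padding, 'id1' is found inside 'id10 id2' (a false hit).
UNPADDED = """
SELECT ExecID FROM A
WHERE NOT EXISTS (
    SELECT * FROM B
    WHERE B.ExecID LIKE '%' || A.ExecID || '%'
)
ORDER BY ExecID
"""
print(conn.execute(PADDED).fetchall())
print(conn.execute(UNPADDED).fetchall())
```

IDs containing % or _ would still act as LIKE wildcards, so this remains a workaround rather than a substitute for normalized data.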
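This is kind of crude, but it will give you all of the entries in A whose ExecID does not exactly match an ExecID in B. Grouped values such as 'id1 id2' won't match their individual IDs, and if B.ExecID contains any NULLs, NOT IN returns no rows at all.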
SELECT * FROM A WHERE A.ExecID not in (SELECT ExecID from B);
I have a very simple solution. It's called normalization. Proper modeling can work wonders for query simplicity and accuracy.
However, you may be stuck with what you have. Assuming ExecID is a string in both tables, try this:
select *
from A
where not exists(
select *
from B
where ExecID like '%' || a.ExecID || '%');
This is a horrible query as it performs a complete table scan of B for every row in A and the subquery is susceptible to false hits, so maybe you can do better, but your best course ultimately is a touch of database refactoring.
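One possible first step of that refactoring is to split the grouped values into one row per ID, after which "missing from B" becomes an exact-match anti-join. A sketch in SQLite via Python using a recursive CTE; the table B_exec, the sample account numbers and the assumption that IDs are separated by single spaces are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ExecID TEXT)")
conn.execute("CREATE TABLE B (ExecID TEXT, ClientAccountNo TEXT)")
conn.executemany("INSERT INTO A VALUES (?)",
                 [("id1",), ("id2",), ("id3",), ("id10",)])
conn.executemany("INSERT INTO B VALUES (?, ?)",
                 [("id1 id2", "ACC1"), ("id10", "ACC2")])

# Hypothetical normalized table: one ExecID per row.
conn.execute("CREATE TABLE B_exec (ExecID TEXT, ClientAccountNo TEXT)")
# Recursive CTE peels one space-delimited ID off 'rest' per step.
conn.execute(
    """
    WITH RECURSIVE split(ClientAccountNo, one_id, rest) AS (
        SELECT ClientAccountNo, '', ExecID || ' ' FROM B
        UNION ALL
        SELECT ClientAccountNo,
               substr(rest, 1, instr(rest, ' ') - 1),
               substr(rest, instr(rest, ' ') + 1)
        FROM split
        WHERE rest <> ''
    )
    INSERT INTO B_exec (ExecID, ClientAccountNo)
    SELECT one_id, ClientAccountNo FROM split WHERE one_id <> ''
    """
)

# With one ID per row, the check is a plain exact-match anti-join.
missing = conn.execute(
    """
    SELECT ExecID FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B_exec AS b WHERE b.ExecID = A.ExecID)
    ORDER BY ExecID
    """
).fetchall()
print(missing)
```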