I've set up a view which combines all the data across several tables. Is there a way to write this so that only columns which contain non-null data are displayed, and those columns which contain all NULL values are not included?
ADDED:
Sorry, still studying and working on my first big project so every day seems to be a new experience at the minute. I haven't been very clear, and that's partly because I'm not sure I'm going about things the right way! The client is an academic library, and the database records details of specific collections. The view I mentioned is to display all the data held about an item, so it is bringing together tables on publication, copy, author, publisher, language and so on. A small number of items in the collection are papers, so have additional details over and above the standard bibliographic details. What I didn't want was a user to get all the empty fields relating to papers if what was returned only consisted of books, therefore the paper table fields were all null. So I thought perhaps there would be a way to not show these. Someone has commented that this is the job of the client application rather than the database itself, so I can leave this until I get to that phase of the project.
There is no way to do this in SQL.
CREATE VIEW dbo.YourView
AS
SELECT (list of fields)
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t1.ID = t2.FK_ID
WHERE t1.SomeColumn IS NOT NULL
AND t2.SomeOtherColumn IS NOT NULL
In your view definition, you can include WHERE conditions to exclude rows in which certain columns are NULL.
Update: you cannot really filter out columns - the list of columns that make up your view is fixed in the view definition and cannot be changed dynamically.
What you might be able to do is use an ISNULL(column, '') construct to replace those NULLs with an empty string. Otherwise you need to handle excluding those columns in your display front end - not in the SQL view definition.
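For instance, a minimal sketch of the ISNULL approach, reusing the placeholder tables and columns from the view above:
SELECT t1.ID,
       ISNULL(t1.SomeColumn, '')      AS SomeColumn,      -- NULL shows up as ''
       ISNULL(t2.SomeOtherColumn, '') AS SomeOtherColumn  -- NULL shows up as ''
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t1.ID = t2.FK_ID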
The only thing I see you could do is make sure to select only those columns from the view that you know aren't NULL:
SELECT (list of non-null fields) FROM dbo.YourView
WHERE (column1 IS NOT NULL)
and so forth - but there's no simple or magic way to select all columns that aren't NULL in one SELECT statement...
You cannot do this in a view, but you can do it fairly easily using dynamic SQL in a stored procedure.
Of course, having a schema which shifts is not necessarily good for clients who consume the data, but it can be efficient if you have very sparse data AND the consuming client understands the varying schema.
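For instance, a rough sketch of such a procedure in T-SQL (dbo.YourView is just the placeholder name from above; the procedure probes each column of the view and returns only those with at least one non-NULL value, and assumes at least one column qualifies):
CREATE PROCEDURE dbo.SelectNonEmptyColumns
AS
BEGIN
    DECLARE @cols NVARCHAR(MAX) = N'';
    DECLARE @col SYSNAME, @sql NVARCHAR(MAX), @cnt INT;

    -- Walk the view's columns and keep only those containing some data
    DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT name FROM sys.columns
        WHERE object_id = OBJECT_ID(N'dbo.YourView');

    OPEN col_cursor;
    FETCH NEXT FROM col_cursor INTO @col;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- COUNT(column) ignores NULLs, so 0 means the column is entirely NULL
        SET @sql = N'SELECT @c = COUNT(' + QUOTENAME(@col) + N') FROM dbo.YourView';
        EXEC sp_executesql @sql, N'@c INT OUTPUT', @c = @cnt OUTPUT;

        IF @cnt > 0
            SET @cols = @cols + CASE WHEN @cols = N'' THEN N'' ELSE N', ' END + QUOTENAME(@col);

        FETCH NEXT FROM col_cursor INTO @col;
    END
    CLOSE col_cursor;
    DEALLOCATE col_cursor;

    -- Return only the columns that actually contain data
    SET @sql = N'SELECT ' + @cols + N' FROM dbo.YourView';
    EXEC sp_executesql @sql;
END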
If you have to have a view, you can put a "header" row in your view, which the client can inspect on the first row of its loop to decide whether to bother showing a given column in the grid. You can do something like this:
SELECT * FROM (
    -- This is the view code
    SELECT 'data' as typ
        ,int_col
        ,varchar_col
    FROM YourTable
    UNION ALL
    SELECT 'hdr' as typ
        -- note that different types have to be handled differently
        ,CASE WHEN COUNT(int_col) = 0 THEN NULL ELSE 0 END
        ,CASE WHEN COUNT(varchar_col) = 0 THEN NULL ELSE '' END
    FROM YourTable
) AS X
-- have to get the header row first
ORDER BY typ DESC -- add other sort criteria here
If we're reading your question right, there won't be a way to do this in SQL. The output of a view must be a relation - in (over-)simplified terms, it must be rectangular. That is, each row must have the same number of columns.
If you can tell us more about your data and give us some idea of what you want to do with the output, we can perhaps offer more positive suggestions.
In general, add a WHERE clause to your query, e.g.
WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL
Here, a, b and c are your column names.
If you are joining tables together on potentially NULL columns, then use an INNER JOIN, and NULL values will not be included.
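A small illustration (the table and column names are invented): rows whose join column is NULL simply drop out of an INNER JOIN, because NULL never compares equal to anything.
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
-- Orders rows with a NULL CustomerID are excluded here;
-- a LEFT JOIN would keep them, with NULLs in the Customers columns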
EDIT: I may have misunderstood - the above filters out rows, but you may be asking to filter out columns, e.g. you have several columns and you only want to display those columns that contain at least one non-NULL value across all the rows you are returning. Dynamic SQL offers a solution, since the set of columns varies depending upon your data.
Here's a SQL query that builds another SQL query containing the appropriate columns. You could run this query and then submit its result as another query. It assumes 'pk' is some column that is always non-null, e.g. a primary key - this means we can prefix each additional column name with a comma.
SELECT CONCAT('SELECT pk',
       CASE COUNT(columnA) WHEN 0 THEN '' ELSE ', columnA' END,
       CASE COUNT(columnB) WHEN 0 THEN '' ELSE ', columnB' END,
       -- etc.
       ' FROM (YourQuery) base')
FROM (YourQuery) AS base
The query works because COUNT(column) ignores NULL values, and so returns 0 for a column consisting entirely of NULLs. The query builder assumes that YourQuery uses aliases to ensure there are no duplicate column names.
While you can't put this into a view, you could wrap it up as a stored procedure that copies the data to another table - the result table. You could also set up a trigger so that the result table is updated whenever the base tables change.
I suspect what's going on is that an end user is running CrystalReports and complaining about all the empty columns that have to be removed manually.
It would actually be possible to create a stored procedure that would create a view on the fly, leaving out dataless columns. But then you would have to run this proc before using the view.
Is that acceptable?
Related
I need to update values in a column in a specific table that exists in all our databases, but do not know the name of the column as it is user-generated.
I have two tables, one of them, tab_Case, with user-generated columns. In this table there is an attachment column that I need to update if the following condition applies: WHERE attachment = '0' (if true, the value needs to be changed to NULL).
In its simplest form the update query would look something like this:
UPDATE tab_Case
SET attachment = NULL
WHERE attachment = '0'
This table is used in all our databases, so I need to write a query general enough to be usable across all of them.
The problem is that, as the table uses user-generated columns, I have no way of knowing the exact names of the columns concerned, or exactly how many of those columns exist in the table.
I can, however, find out the type of the column by looking it up in another table tbl_itemPart inner joined with tbl_ValueType, like this:
SELECT ip.DbReference, ip.DbTableName, vt.ValueDescription
FROM tbl_itemPart ip
INNER JOIN tbl_ValueTypes vt ON ip.ValueTypeId = vt.ValuetypeId
WHERE vt.ValueDescription = 'file'
AND ip.DbTableName = 'tab_Case'
The columns I need are always of type 'file' and as the tab_Case table is referenced in tbl_ItemPart it is easy to find out 1) if any columns of type 'file' exist in this table, and 2) when true, what their respective names are.
So great, now I know the names of the columns that I need to potentially update. But, this is where I get lost: how do I use that information in my update query?
How do I write a script that first checks tbl_itemPart for the existence of any columns in tab_Case of type 'file', then retrieves the actual values (= the names of those columns) from the DbReference column in tbl_itemPart, and finally uses those values in the update query for tab_Case?
Remember that this script needs to do this automatically for each of our databases, so I do not want to look up column names manually per database and then adjust my script accordingly for each of the databases.
I am very new to programming, and may be missing something very obvious, but so far I haven't been able to find a solution, or any relevant information to help me on my way.
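A rough sketch of how that could look in T-SQL, reusing the metadata query above (DbReference is assumed to hold the actual column name, as described; how you run it per database - e.g. a loop over your databases or the undocumented sp_MSforeachdb - is a separate concern):
DECLARE @col SYSNAME, @sql NVARCHAR(MAX);

-- One UPDATE per 'file' column registered for tab_Case in the metadata
DECLARE file_cols CURSOR LOCAL FAST_FORWARD FOR
    SELECT ip.DbReference
    FROM tbl_itemPart ip
    INNER JOIN tbl_ValueTypes vt ON ip.ValueTypeId = vt.ValuetypeId
    WHERE vt.ValueDescription = 'file'
      AND ip.DbTableName = 'tab_Case';

OPEN file_cols;
FETCH NEXT FROM file_cols INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- DbReference is assumed to hold the actual column name
    SET @sql = N'UPDATE tab_Case SET ' + QUOTENAME(@col) + N' = NULL '
             + N'WHERE ' + QUOTENAME(@col) + N' = ''0''';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM file_cols INTO @col;
END
CLOSE file_cols;
DEALLOCATE file_cols;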
I have two tables I am left joining together. The first table has transaction-level detail, causing the key I join to the second table to duplicate. When I left join the second table, the measure "company_spend" is highly inflated.
I need a way to keep only a single value of the duplicated data. My thought was to run DISTINCT on only those columns, but I am not seeing that BigQuery supports DISTINCT on a subset of columns rather than all of them.
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
UPPER(cwnextt.Replacement_Contract_Number) AS Replacement_Contract_Number,
UPPER(cwnextt.Current_Contract_Name) AS Current_Contract_Name,
UPPER(cwnextt.Supplier_Top_Parent_Entity_Code) AS Supplier_Top_Parent_Entity_Code,
UPPER(cwnextt.Supplier_Top_Parent_Name) AS Supplier_Top_Parent_Name,
UPPER(cwnextt.company_Entity_Code) AS company_Entity_Code,
UPPER(cwnextt.Facility_Name) AS Facility_Name,
smart.company_Spend AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
--this table is what is causing the below table to duplicate,
--but I need all of this data AS well in its current format.
LEFT JOIN `test.trans_analysis` tsa
ON TRIM(UPPER(cwnextt.company_entity_code)) = TRIM(UPPER(tsa.company_entity_code))
AND TRIM(UPPER(cwnextt.Supplier_Top_Parent_Entity_Code)) = TRIM(UPPER(tsa.manufacturer_top_parent_entity_code))
AND TRIM(UPPER(cwnextt.Current_Contract_Name)) = TRIM(UPPER(tsa.contract_category))
AND cwnextt.spend_period_yyyyqmm = tsa.spend_period_yyyyqmm
--this table contains "company_spend" which is now duplicated
LEFT JOIN `test_etl_field.ecr_smart_data` smart
ON smart.company_entity_code = cwnextt.company_entity_code
AND (smart.contract_number = cwnextt.current_contract_number
OR smart.contract_number = cwnextt.replacement_contract_number)
AND smart.month_key = cwnextt.spend_period_yyyyqmm
If something can be created that will keep company_spend from duplicating on the second left join, that is what I am after.
I'm not sure I understand all the details of your problem, but here's a fact from the BigQuery docs:
SELECT DISTINCT
A SELECT DISTINCT statement discards duplicate rows
and returns only the remaining rows.
You can't apply DISTINCT to specific columns only, because it doesn't make sense. Let's say you have 4 columns and call DISTINCT on 3 of them - what is SQL supposed to do with the last one?
You must tell SQL which value to keep for the remaining column, and GROUP BY is the right solution here.
So if you want to:
Remove a column that has been duplicated: just adjust your SELECT to get only the columns you want.
Remove lines that have the same value in specific columns: I would suggest a GROUP BY on the targeted columns, taking the aggregation you want (first, avg, sum or whatever) for the remaining ones - see the sketch after this list.
Remove the value from a row if another row has the same: you may not want to do that. A row has to keep its value. Besides, same problem: which row do you want to keep?
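For example, a trimmed-down sketch of the GROUP BY idea against the query in the question (only a couple of columns shown, and whether MAX, SUM or ANY_VALUE is the right aggregation for company_Spend depends on what one row of ecr_smart_data represents):
SELECT UPPER(cwnextt.Current_Contract_Number) AS Current_Contract_Number,
       UPPER(cwnextt.company_Entity_Code)     AS company_Entity_Code,
       -- one value per group instead of one copy per duplicated transaction row
       MAX(smart.company_Spend)               AS companySpend
FROM `test_etl_field.contracts_with_member_entity_codes_test_view_2` cwnextt
LEFT JOIN `test_etl_field.ecr_smart_data` smart
       ON smart.company_entity_code = cwnextt.company_entity_code
      AND smart.month_key = cwnextt.spend_period_yyyyqmm
GROUP BY 1, 2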
Hope this helps! Feel free to give clarification on your problem if you want more specific answers.
While I couldn't resolve this issue in SQL, I used Tableau via a FIXED LOD to aggregate the data past the duplicates so the end user could visualize the output accurately. Not ideal, but the SQL route didn't make sense.
I want to retrieve some data from a table based on two columns; see the table structure below.
Update
I want the output data based on two conditions:
1. The code value is 'Web' or 'Offline'.
2. The Memo column has the same data as the Pre_memo column.
Output should be as shown below
So far I got the output by using the same table two times, but I want to get the result by using the table only once to avoid performance issues, as this table holds a huge amount of data.
select OrderTable.Memo,
       max(OrderTable.Memo_Date) as Date1,
       max(ot.Pre_Memo_Date) as Date2
from OrderTable
inner join OrderTable ot
        on OrderTable.Memo = ot.Pre_Memo
where OrderTable.code in ('Web')
  and ot.code in ('Offline')
group by OrderTable.Memo
Can anyone help with this? I'd like to use OrderTable only once in the query and filter based on the Memo and Pre_memo columns, since they hold the same data.
You can use union all and do conditional aggregation:
select Memo, max(case when code = 'Offline' then Date end) as Memo_date,
max(case when code = 'Web' then Date end) as Pre_Memo_date
from (select Date, 'Web' as code, Pre_memo as Memo
from OrderTable o
where code = 'Web'
union all
select Date, 'Offline', Memo
from OrderTable o
where code = 'Offline'
) t
group by Memo;
"I wanted to retrieve some data from a table based on two columns see the below table structure"
Providing a sample is helpful to illustrate the problem (and it is desirable to do so on SO), but it is not sufficient and thus not a replacement for defining the problem, which you have failed to do.
Absent such a definition of the problem, we can only guess what you're trying to achieve. E.g.:
from the subset of tuples that have 'Offline' for 'code' value, take the MAX() 'Date' value per appearing value of 'Memo'.
Match that (using some matching condition) to the subset of tuples that have 'Web' for 'code' value and retain the 'Date' value from those as 'Memo_date' in the result set.
matching condition being that 'Memo' value of [a tuple in] the former is equal to 'Pre_memo' value in [the matching tuple in] the latter.
If all that is correct, then that explains why it is impossible to do this in SQL without having at least two references. You cannot avoid doing some kind of matching, and matching by definition takes two distinct things to match (even if the two distinct things are distinct subsets of one and the same thing). In fact it is almost certainly a fundamental design mistake for you to have those two distinct things in one single table, probably under the totally misguided belief that "having everything in one table makes things easier".
"So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data"
From the way you have presented the question, I suspect that you were hoping for some means to exploit the fact that those 'Offline' tuples are "the next" after a 'Web' tuple, and that you could write the SQL in such a way that the engine could then derive a sort of "single pass" algorithm (which you probably assume would go faster).
It does not work like that. SQL tables have no inherent ordering and as a consequence there simply ain't no such thing as "the next" in a table.
I'm trying to achieve 2 joins. If I run the 1st join alone it pulls 4 lots of results, which is correct. However, when I add the 2nd join, which queries the same reference table using the results from the select statement, it pulls in additional results. Please see the attached image. The squared section should not be returned.
So I removed the 2nd join to try and explain better. See pic2. I'm trying to get another column which looks up InvolvedInternalID against the initial reference table IRIS.Practice.idvClient.
Your database is simply doing as you tell it. When you add in the second join (confusingly aliased as tb1 in a three-table query), the database is finding matching rows that obey the predicate/truth statement in the ON part of the join.
If you don't want those rows in there then one of two things must be the case:
1) The truth you specified in the ON clause is faulty; for example, saying SELECT * FROM person INNER JOIN shoes ON person.age = shoes.size is wrong - two people aged 13 and two shoes of size 13 will produce 4 results, and shoe size has nothing to do with age anyway.
2) There were rows in the joined table that didn't apply to the results you were looking for, but you forgot to filter them out with a WHERE clause (or an additional restriction in the ON clause). For example, a table holds all historical data as well as current data, and the current record is the one with NULL in the DeletedOn column. If you forget to say WHERE DeletedOn IS NULL, then your data will multiply as all the past rows that don't apply to your query are brought in (see the small sketch below).
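A tiny sketch of case 2 (the Person and Address tables and the DeletedOn column are invented for illustration):
SELECT p.PersonID, a.Street
FROM Person p
INNER JOIN Address a ON a.PersonID = p.PersonID
WHERE a.DeletedOn IS NULL -- without this filter, every historical address row multiplies the result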
Don't alias tables with tbX, tbY etc. Make the names meaningful! Not only do aliases like tbX have no relation to the original table name (so you encounter tbX and then have to go searching the rest of the query to find where it's declared, so you can say "ah, it's the addresses table"), but in this case you join idvClient in twice and give the copies unhelpful aliases like tb1 and tb3, when really you should alias them with something that describes the relationship between them and the rest of the query tables.
For example, ParentClient and SubClient or OriginatingClient/HandlingClient would be better names, if these tables are in some relationship with each other.
Whatever the purpose of joining this table in twice is, alias it in relation to the purpose. It may make what you've done wrong easier to spot, for example "oh, of course.. I'm missing a WHERE parentclient.type = 'parent'" (or WHERE handlingclient.handlingdate is not null etc..)
The first step to wisdom is calling things by their proper names.
I want to return columns based on some metadata in another table, i.e. I have a table which contains 10 columns, and another table which lists those columns, denormalised, with metadata about them.
i.e.
Table - Car:
columns - Make,Model,Colour
and another table called "Flags" which has a row for each of the above columns and each row has a column for "IsSearchable" and "ShowOnGrid" - that sort of thing.
The query I want is one which will return all columns from the Car table that are flagged in the "Flags" table as "ShowOnGrid".
----EDIT
Apologies, I should have stated that this is on SQL Server 2008.
Also, I don't want to have to explicitly state the columns I would like to return, i.e. if I add a column to the Car table, then add it into the Flags table and declare it to be searchable, I don't want to have to state in the SQL query that I want to return that column - I want it to pull through automatically.
You need to use dynamic SQL; this can easily be done with a stored procedure.
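A rough sketch of that idea for SQL Server 2008 (the question doesn't show which Flags column stores the column name, so ColumnName below is a guess, and ShowOnGrid is assumed to be a bit flag):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the column list from the metadata table.
-- ColumnName is a hypothetical column holding the name of each Car column.
SELECT @cols = STUFF((
        SELECT ', ' + QUOTENAME(f.ColumnName)
        FROM Flags f
        WHERE f.ShowOnGrid = 1
        FOR XML PATH('')), 1, 2, '');

SET @sql = N'SELECT ' + @cols + N' FROM Car';
EXEC sp_executesql @sql;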
Something like this might work:
Select
    D.CarID,
    Case D.ShowMake When 1 Then D.Make Else NULL End As Make
    ...
From
    (Select
         C.CarID, C.Make, C.Model, C.Colour, F.IsSearchable, F.ShowOnGrid, F.ShowMake
     From
         Cars C
         Inner Join Flags F
             On C.CarID = F.CarID) D
I didn't write in all the case statements and don't know how many flags you're working with, but you can give it a try. It would require filtering on null values in your application. If you actually want the columns omitted on the basis of the Flags column value, the other answer and comment are both right on: either use dynamic SQL or build your query outside in another language first.