Power BI DAX Masterclass

Thu, 18 Dec 2025 17:48:56 GMT

Calculated column vs Measure

In Power BI, measures and calculated columns serve distinct purposes:

Calculated Columns:

A calculated column creates a new physical column in your data table.
It calculates values row by row based on a formula. Hence, it physically exists in your table.
Use calculated columns when you need persistent, row-level values that don’t depend on report context, for example a category or flag that you want to slice, filter, or sort by.

Measures:

A measure, on the other hand, does not physically exist in the table. It is calculated dynamically when needed, typically when you visualize data in reports.
Measures are evaluated in the context of the filters and slicers applied in your reports, allowing for more flexible and complex calculations.
They are highly reusable across different visuals and reports, which makes them beneficial for calculations such as totals, averages, and variances.

Key Differences:

Calculated columns operate at the row level and are static until refreshed, while measures are dynamic and depend on the visual filtering context.
Calculated columns are stored in the model, so they add to its size and refresh time, especially when they involve complex calculations over large tables.

Choosing between them depends on your specific needs for data analysis and visualization. Generally, measures are preferred for their flexibility and dynamic nature.
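As a rough illustration of the difference, the sketch below expresses the same revenue logic both ways. The Sales table and its Quantity and Price columns are assumed names for illustration, not taken from a specific model.

```dax
-- Calculated column on the assumed Sales table:
-- evaluated row by row at refresh time and stored in the table
LineRevenue = Sales[Quantity] * Sales[Price]

-- Measure: nothing is stored; it is evaluated at query time
-- in whatever filter context the visual provides
Total Revenue = SUMX ( Sales, Sales[Quantity] * Sales[Price] )
```

The column can be used on a slicer or axis, while the measure recalculates whenever the filters in the report change.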

Date Tables

In Power BI, both the CALENDARAUTO and CALENDAR functions are used to create date tables, but they differ in their approach and flexibility:

CALENDARAUTO:

Automatically generates a date table based on the existing date values in your model.
It finds the earliest and latest dates in your model and returns a contiguous range, with no missing dates, covering the full calendar years that contain them.
You don’t need to specify start or end dates, making it quicker to implement. It accepts an optional fiscal year end month, but it does not let you set an exact start or end date.

CALENDAR:

Requires you to manually specify the start and end dates for the date table.
This provides more flexibility as you can set exact dates and adjust for fiscal years.
For example, you might call it with CALENDAR(DATE(2020, 1, 1), DATE(2025, 12, 31)) to define the exact range you desire.

Key Considerations:

If you want an automatic date range based on your existing data without manual input, use CALENDARAUTO.
If you need precise control over the date range and fiscal settings, opt for CALENDAR.
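The two approaches can be sketched side by side as calculated tables. The table names are hypothetical, and the value 6 (June as the last fiscal month) is only an example of the optional argument that CALENDARAUTO accepts.

```dax
-- Range derived from the date columns in the model,
-- expanded to whole calendar years
DateTableAuto = CALENDARAUTO ()

-- Same idea, but treating June as the last month of the fiscal year
DateTableFiscal = CALENDARAUTO ( 6 )

-- Explicit, fixed range
DateTableFixed = CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2025, 12, 31 ) )
```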

Date Table Script

In our case, the script is based on the date column of the existing sales table.

DateTable = 
ADDCOLUMNS ( 
CALENDAR ( MINX ( 'sales', 'sales'[date] ), MAXX ( 'sales', 'sales'[date] ) ),
-- built arithmetically so the result is a whole number rather than text
"DateAsInteger", YEAR ( [date] ) * 10000 + MONTH ( [date] ) * 100 + DAY ( [date] ),
"Year", YEAR ( [date] ), 
"MonthNo", FORMAT ( [date], "MM" ), 
"YearMonthNo", FORMAT ( [date], "YYYY/MM" ), 
"YearMonth", FORMAT ( [date], "YYYY/mmm" ), 
"MonthShort", FORMAT ( [date], "mmm" ),
"MonthLong", FORMAT ( [date], "mmmm" ), 
-- WEEKNUM returns the week of the year; WEEKDAY would return the day of the week
"WeekNo", WEEKNUM ( [date] ), 
"WeekDay", FORMAT ( [date], "dddd" ), 
"WeekDayShort", FORMAT ( [date], "ddd" ), 
"Quarter", "Q" & FORMAT ( [date], "Q" ), 
"YearQuarter", FORMAT ( [date], "YYYY" ) & "/Q" & FORMAT ( [date], "Q" ) )

Key Measures table

Purpose:

A Key Measures table consolidates all your measures in one place, making them easier to manage and reference across your reports. This organization helps improve readability and access to all key calculations.

Creating the Table:

Go to the home ribbon in Power BI.
Click on "Enter Data" to create a new table.
You don’t need to input any data; simply name the table “Key Measures Table”.
Click "Load" to create the empty table.

Moving Measures:

Once your Key Measures table is created, you can start moving existing measures into this new table for better organization.
Select the measure you wish to relocate and, under Measure tools, change its Home table to the Key Measures table.

Benefits:

By organizing your measures into a Key Measures table, you can not only make your data model cleaner but also leverage the full benefits of measure reusability across different visuals and reports. This organization aids in quicker access and clearer insights during your analysis.

COUNT Aggregation functions

In Power BI, the DAX functions COUNT, COUNTA, COUNTBLANK, DISTINCTCOUNT, and COUNTROWS are important for tallying data based on different criteria:

COUNT:

Counts the number of rows that contain non-blank values in a specified column. It works on numbers, dates, and text, but not on TRUE/FALSE columns; use COUNTA for those.

COUNTA:

Counts the number of rows that are not empty in a specified column, regardless of the data type (numeric or text). Use this function when you want to count all non-blank entries.

COUNTBLANK:

Counts the number of blank rows in a specified column. This is useful to determine how many entries are missing data.

DISTINCTCOUNT:

Counts the number of unique values in a specified column, excluding duplicates. Use it when you want to analyze how many distinct entries exist in that column.

COUNTROWS:

Counts the number of rows in a table or a table expression. This function is helpful when you want to know the total number of rows available in a specific data table.

Summary:

Use COUNT for a non-blank count of numbers, dates, or text.
Use COUNTA for a non-blank count of any data type, including TRUE/FALSE.
Use COUNTBLANK for counting missing data.
Use DISTINCTCOUNT for counting unique values.
Use COUNTROWS for total row count in a table.
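A minimal sketch of the five functions as measures, assuming a Sales table with a CustomerID column and a numeric Discount column that is blank when no discount was given (both names are hypothetical):

```dax
-- Total number of rows in the assumed Sales table
Rows In Sales     = COUNTROWS ( Sales )

-- Rows where Discount holds a non-blank value
Discounted Rows   = COUNT ( Sales[Discount] )

-- Rows where CustomerID is not blank, whatever its data type
Customer Entries  = COUNTA ( Sales[CustomerID] )

-- Rows where Discount is missing
Missing Discounts = COUNTBLANK ( Sales[Discount] )

-- Number of different customers that appear in Sales
Unique Customers  = DISTINCTCOUNT ( Sales[CustomerID] )
```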

X functions

In Power BI, the X functions (often referred to as iterator functions) are a powerful category of DAX functions that allow you to perform operations on rows of a table, returning a result based on each individual row's context. Here are a few key examples:

SUMX:

Iterates through a table, evaluating an expression for each row and returning the sum of those values. Use it when the total depends on a per-row calculation, such as quantity times price, that is not already stored in a column.

AVERAGEX:

Similar to SUMX, but instead, it computes the average of the values returned by the expression for each row in the table.

MINX and MAXX:

These functions return the smallest or largest value, respectively, from a set of values returned by an expression evaluated for each row.

COUNTX:

Counts the number of rows that contain non-blank results from the expression evaluated for each row in the table.

MEDIANX:

Returns the median of a set of values specified by evaluating an expression for each row in the table.

General Use Cases:

The X functions are especially useful for calculations that depend on row context, enabling you to iterate through tables and apply complex logic that standard aggregation functions (like SUM or AVERAGE) cannot handle by themselves.

Key Advantages:

By using iterator functions, you can create more dynamic and context-aware calculations, which can lead to deeper insights in your Power BI reports.

Using X functions can greatly enhance your DAX formulas, providing versatility and power in your data analysis. Understanding how to effectively apply these functions will significantly improve your capability to create sophisticated reports and dashboards in Power BI.
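The iterators described above can be sketched as measures over the same per-row expression. The Sales table and its Quantity and Price columns are assumptions made for illustration.

```dax
-- Each measure evaluates Quantity * Price once per row of Sales,
-- then aggregates the per-row results
Total Revenue        = SUMX ( Sales, Sales[Quantity] * Sales[Price] )
Average Line Revenue = AVERAGEX ( Sales, Sales[Quantity] * Sales[Price] )
Largest Line Revenue = MAXX ( Sales, Sales[Quantity] * Sales[Price] )
Median Line Revenue  = MEDIANX ( Sales, Sales[Quantity] * Sales[Price] )

-- Counts the rows where Price is not blank
Lines With Price     = COUNTX ( Sales, Sales[Price] )
```

Note that SUM ( Sales[Quantity] ) * SUM ( Sales[Price] ) would generally give a different number from Total Revenue, because it multiplies two totals instead of summing row-level products.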

Power BI vs Excel

Power BI and Excel are both powerful tools for data analysis, but they serve different purposes and have distinct functionalities.

Data Structure:

In Excel, calculations are centered around individual cells. You reference these cells directly to perform calculations. For instance, if you want to calculate revenue, you multiply values from specific cells.
In contrast, Power BI operates on a model where the focus is on tables and columns rather than individual cells. The data is structured in tables with relationships between them, allowing for more complex data analysis.

Analysis Capabilities:

Excel is great for quick calculations and straightforward data manipulation, making it ideal for smaller datasets or less complex analyses.
Power BI excels in handling larger datasets and provides advanced analytical capabilities. It employs DAX (Data Analysis Expressions) for creating measures and calculated columns, enabling users to perform sophisticated data modeling and insights generation.

Visualization:

While Excel offers chart and graph capabilities, Power BI provides richer visualizations with dynamic features. It allows users to create intricate dashboards that can interact with the data in real-time.

Collaboration and Sharing:

Power BI offers enhanced features for collaboration and sharing reports online, making it better suited for team environments and organizational use.

Overall, if you need detailed visualizations and work with large datasets, Power BI is typically the better choice. For simpler tasks or if you are already experienced in using Excel, it may suffice for your needs.

Filter vs Row context

In Power BI, understanding the difference between filter context and row context is crucial for effectively using DAX (Data Analysis Expressions).

Filter Context

Definition: Filter context refers to the set of filters applied to the data before performing calculations. It determines which rows of data are included in the calculation.
Usage: An example of using filter context is through the CALCULATE function. For instance, you might use CALCULATE to compute total sales for a specific category:

TotalSalesElectronics = CALCULATE(SUM(Sales[SalesAmount]), FILTER(Sales, Sales[ProductCategory] = "Electronics"))

Here, FILTER creates a new table containing only the rows where the product category is "Electronics," thereby modifying the filter context for the SUM function.

Row Context

Definition: Row context exists when a calculation is performed on a per-row basis. This means that for each row in a table, certain expressions are evaluated before any aggregation occurs.
Usage: You often encounter row context when using functions like SUMX or AVERAGEX. For example, with SUMX, it calculates an expression for each row (like quantity times price) and then aggregates those results:

TotalRevenue = SUMX(Sales, Sales[Quantity] * Sales[Price])

In this case, the expression is calculated for each row before summing the results.

Key Differentiations

Context Transition: When CALCULATE is evaluated inside a row context, it converts that row context into an equivalent filter context, so the expression is evaluated only for the values of the current row.
Performance: Iterators evaluate their expression once per row, so a complex expression over a large table can be slower than a plain aggregation such as SUM.

Understanding these contexts will help you to build more effective DAX measures in Power BI, ultimately enhancing your data analysis capabilities.
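Context transition is easiest to see in a calculated column. The sketch below assumes a Customer table with a one-to-many relationship to Sales; both column names are hypothetical.

```dax
-- Calculated columns on the assumed Customer table.

-- Without CALCULATE there is a row context but no filter context,
-- so every customer row receives the grand total of Sales
Sales All Customers = SUM ( Sales[SalesAmount] )

-- With CALCULATE the current Customer row is turned into a filter
-- (context transition), so each row receives only that customer's sales
Sales This Customer = CALCULATE ( SUM ( Sales[SalesAmount] ) )
```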

CALCULATE function

The CALCULATE function in DAX is a powerful tool used to change the context in which a calculation is performed in Power BI. Here’s a breakdown of how the CALCULATE function works:

Purpose

CALCULATE modifies the filter context before performing calculations, allowing for nuanced data analysis based on specified criteria.

Syntax

The basic syntax of the CALCULATE function is:

CALCULATE(<expression>, <filter1>, <filter2>, ...)

<expression>: This is the calculation you want to evaluate (e.g., SUM, AVERAGE, etc.).
<filter1>, <filter2>, ...: These are the filters that modify the context of the calculation.

Example

For instance, if you want to calculate total sales for a specific category, the formula would look something like this:

TotalSalesElectronics = CALCULATE(SUM(Sales[SalesAmount]), FILTER(Sales, Sales[ProductCategory] = "Electronics"))

In this example:

SUM(Sales[SalesAmount]) is the expression that computes total sales.
FILTER(Sales, Sales[ProductCategory] = "Electronics") modifies the filter context so that only sales data for the "Electronics" category is considered.

Key Points

CALCULATE can be used to apply multiple filters, making it flexible for various scenarios.
It’s essential for creating dynamic measures that can adapt to different user selections and slicers in your reports.
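As a sketch of applying multiple filters, the measure below passes two Boolean filter arguments to CALCULATE. The Region column is an assumption; the other names follow the example above.

```dax
Electronics Sales North =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    Sales[ProductCategory] = "Electronics",  -- first filter
    Sales[Region] = "North"                  -- second filter, combined with AND
)
```

A Boolean filter like these replaces any existing filter on the same column, whereas FILTER ( Sales, ... ) iterates the Sales table as it is already filtered, so the two forms can return different results when a slicer on that column is active.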

Importance

Understanding how to effectively use CALCULATE is crucial for building robust, context-aware measures in Power BI that can provide deeper insights into your data.

FILTER function

The FILTER function in DAX is a powerful tool for creating custom filters on tables. It is often used as a filter argument of the CALCULATE function.

Purpose

The FILTER function is primarily used to return a table that includes only the rows that meet specific criteria. It operates in a way that lets you specify the filtering conditions dynamically within your DAX formulas.

Syntax

The basic syntax of the FILTER function is:

FILTER(<table>, <filter>)

<table>: The table you want to filter.
<filter>: An expression that defines the conditions that must be met for rows to be included in the returned table.

Example

Suppose you have a Products table and want to create a new table that includes only products with a price greater than $5. You would use the FILTER function as follows:

FilteredProducts = FILTER(Products, Products[Price] > 5)

In this case, FilteredProducts will contain only those rows from the Products table where the price exceeds $5.

Key Points

Table Function: FILTER is considered a table function because it returns a table as a result. This is distinct from scalar functions, which return a single value.
Used in CALCULATE: Often, the FILTER function is used within the CALCULATE function to modify the evaluation context of a measure. This allows you to perform calculations based on a specific subset of data.
Considerations

Using FILTER effectively can lead to more sophisticated data analysis, allowing for greater insights by dynamically adjusting the data subsets being analyzed.
It’s crucial to understand how it interacts with row and filter context to leverage its full potential.

ALL function

The ALL function in DAX is used to remove filters from a specified table or column. This can be particularly useful when you want to perform calculations that require analyzing the complete dataset without any applied filters.
Purpose
The ALL function allows you to ignore any filters in your context, returning the entire table or column, and is often used in conjunction with functions like CALCULATE to create metrics that require a different evaluation context.
Syntax

The syntax for the ALL function is:

ALL(<tableOrColumn>)

<tableOrColumn>: This specifies the table or column from which you want to remove filters.

Example

For instance, if you want to calculate the total sales regardless of any filters applied in your report visuals, you could write:
TotalSalesAll = CALCULATE(SUM(Sales[SalesAmount]), ALL(Sales))
In this example:

SUM(Sales[SalesAmount]) calculates the total sales amount.
ALL(Sales) removes any filters applied to the Sales table, ensuring that the total sales calculation considers all rows in the table.

If you want to calculate each state's percentage of total revenue in your report visuals, you could write:
Revenue filtered by a state = DIVIDE([Revenue Measure], CALCULATE([Revenue Measure], ALL(location[state])))
In this example:

[Revenue Measure] is the revenue in the current filter context, calculated with SUMX.
ALL(location[state]) removes any filter on the state column of the location table, so the denominator is the revenue across all states.
Key Points
The ALL function is beneficial for creating calculated measures that need overall insights, such as calculating percentages of total sales or comparing values against grand totals.
Different variations of ALL exist, including ALLSELECTED, which removes filters coming from inside the visual but keeps those applied by the user’s selections in slicers and other outside filters.
Using the ALL function allows for a more profound and comprehensive analysis across your data.

ALLSELECTED function

The ALLSELECTED function in DAX removes the filters that come from inside the current visual (for example its rows and columns) while keeping filters applied from outside it, such as slicers and other user selections.
Purpose
It enables you to compute totals over everything the user has selected, so a value can be compared against the total shown in the visual rather than the grand total of the whole table.
Syntax
The syntax for ALLSELECTED is:
ALLSELECTED(<tableOrColumn>)

<tableOrColumn>: This is the specific table or column for which you want to retain the filters from the user’s selections while removing others.
Example
Suppose you want to calculate the percentage of sales compared to total sales considering only the filters from slicers. You could use:
SalesPercentage = DIVIDE(SUM(Sales[SalesAmount]), CALCULATE(SUM(Sales[SalesAmount]), ALLSELECTED(Sales)))
In this formula:

SUM(Sales[SalesAmount]) calculates the sales for the current context.
ALLSELECTED(Sales) removes any filters on the Sales table but respects filters from slicers, ensuring that the calculation reflects the intended scope.
Key Points
ALLSELECTED is particularly useful for creating responsive measures in reports where user interaction (like slicers or filters) needs to modify results dynamically while still allowing a full range of data to be analyzed.
It provides flexibility when crafting insights that depend on user-driven contexts, such as dashboards where users might want to see insights filtered by certain criteria while still maintaining a broader view of the data.

ALLEXCEPT function

The ALLEXCEPT function in DAX is used to remove filters from all columns in a table except for the specified columns. This function is particularly useful when you want to maintain certain filters while disregarding others during calculations.
Purpose
The ALLEXCEPT function is ideal when you want to summarize data while keeping specific dimensions in the filter context, allowing for more targeted analysis.
Syntax
The syntax for ALLEXCEPT is:
ALLEXCEPT(<table>, <column1>, <column2>, ...)

<table>: The table from which to remove all filters.
<column1>, <column2>, ...: The columns that you want to keep the filters for.
Example
For instance, if you have a Sales table and want to calculate the total sales while keeping the filter for Region, you might write the following:
TotalSalesByRegion = CALCULATE(SUM(Sales[SalesAmount]), ALLEXCEPT(Sales, Sales[Region]))
In this example:

SUM(Sales[SalesAmount]) computes the total sales.
ALLEXCEPT(Sales, Sales[Region]) allows the Region filters to remain while ignoring other filters in the Sales table.
Key Points
ALLEXCEPT is useful for creating measures that need to respect specific dimensions while performing calculations over the entire dataset.
It helps in scenarios where you want to create comparisons (e.g., percentage calculations) that focus on certain filter aspects without losing other important insights from your data.
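A percentage calculation of that kind could be sketched as follows, reusing the Sales table and Region column from the example above. The measure name is hypothetical.

```dax
Share Of Region Sales =
DIVIDE (
    -- sales in the current filter context (e.g. one product within a region)
    SUM ( Sales[SalesAmount] ),
    -- sales for the whole region: every filter on Sales except Region is removed
    CALCULATE ( SUM ( Sales[SalesAmount] ), ALLEXCEPT ( Sales, Sales[Region] ) )
)
```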

Logical Operators

In DAX, logical operators allow you to create complex filtering conditions in your calculations. Here are the primary logical operators used in DAX:

AND Operator (&&)
The AND operator is used to combine multiple conditions, where all conditions must be true for the entire expression to evaluate to true.
Example:
IF(Sales[Amount] > 1000 && Sales[Region] = "North", "High Sale", "Low Sale")

OR Operator (||)
The OR operator allows for conditions where at least one of the conditions must be true for the expression to evaluate to true.
Example:
IF(Sales[Amount] < 500 || Sales[Region] = "South", "Low Sale or in South", "Regular Sale")

NOT Operator (NOT)
The NOT operator is used to negate a condition. If the condition is true, NOT makes it false and vice versa.
Example:
IF(NOT(Sales[Region] = "East"), "Not in East", "In East")

Combining Conditions
You can combine these logical operators to create more complex conditions. However, be mindful of the order of operations; the AND operator is evaluated before the OR operator.
Example:
IF((Sales[Amount] > 1000 && Sales[Region] = "North") || (Sales[Amount] < 500), "Specific Sale Condition", "Other")
Usage in Filtering
These operators are often used with functions like CALCULATE to create dynamic measures based on complex filtering criteria. For example, you could filter a dataset to include products that are either above a certain price point or within a specific category.
Understanding how to effectively use logical operators can help you build sophisticated analytics in your DAX queries.
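The price-or-category case can be sketched with the OR operator inside FILTER. The Products table, its Price and Category columns, its relationship to Sales, and the threshold of 100 are all assumptions made for illustration.

```dax
Premium Or Electronics Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    FILTER (
        Products,
        -- keep a product if either condition holds
        Products[Price] > 100 || Products[Category] = "Electronics"
    )
)
```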

VALUES and AVERAGEX function

The VALUES function and the AVERAGEX function in DAX serve distinct purposes but can work together effectively in your calculations.
VALUES Function
The VALUES function returns a one-column table that contains the distinct values from the specified column or table. It can be used to create a unique list of values for further calculations.
Example Use: If you want to get a list of unique values from a date column, you would use:
VALUES(DateTable[Date])
AVERAGEX Function
The AVERAGEX function iterates through a table, evaluating an expression and returning the average of those values. It is an iterator function, meaning it processes each row of the table specified.
Syntax:
AVERAGEX(<table>, <expression>)
Example Use: To calculate the average sales amount by iterating over a table of sales data:
AVERAGEX(SalesTable, SalesTable[SalesAmount])
Combining VALUES and AVERAGEX
You can combine these two functions to calculate averages based on distinct values. For example, to calculate the monthly average revenue, you could write:
MonthlyAverageRevenue = AVERAGEX(VALUES(DateTable[YearMonth]), [RevenueMeasure])
In this case:
VALUES(DateTable[YearMonth]) provides a unique list of year-month combinations, and for each combination, the [RevenueMeasure] is evaluated and averaged.
This combination allows for powerful analytics where averages are calculated based on distinct segments of your data, providing insights tailored to your reporting needs.

RANKX function

The RANKX function in DAX is used to rank values within a specified context. It allows you to determine the rank of an expression evaluated for each row across a table. This is particularly useful for identifying top-performing items based on a certain measure, such as revenue.
Purpose
The RANKX function assigns a rank (1 for the highest value, 2 for the next highest, etc.) to each row in a specified table based on the evaluation of an expression.
Syntax
The syntax for RANKX is:
RANKX(<table>, <expression>[, <value>[, <order>]])

<table>: The table containing the values to rank.
<expression>: The expression to evaluate and rank.
<value> (optional): A scalar expression whose rank is to be found; when omitted, the value of <expression> for the current row is used.
<order> (optional): Sort order (0 for descending, 1 for ascending; defaults to descending).
Example
For example, if you want to rank customers based on their quarterly average revenue, you might create a measure like this:
Ranking by Quarterly Average = CALCULATE( RANKX ( ALLSELECTED( customer ) , [Quarterly Average Revenue] ) ,
ALL ( DateTable[Year] ) )
In this example:

ALLSELECTED( customer ) is the table being ranked, with any customer slicer selection still applied.
[Quarterly Average Revenue] is the measure for which ranks are calculated.
ALL ( DateTable[Year] ) removes all filters on the Year column of the DateTable table.
Practical Use

You can use RANKX in tandem with other measures to filter the top performers or analyze performance over time. For instance, to see only the top ten customers ranked by quarterly average revenue, you could create a calculated table or use this measure in a visual-level filter.
Combining RANKX with a helper table, such as a Top N filter, can also improve your analysis by giving you dynamic control over how many top-ranked items to display.

IF function & Top-N filter

The IF function in DAX can be effectively used in conjunction with a Top-N filter to dynamically filter data based on ranking. Here's how you can implement a Top-N filter using an IF statement.
Creating a Top-N Filter

Helper Table: First, create a helper table that specifies your Top-N options such as Top 3, Top 5, Top 10, etc. This table will allow you to control how many entries you want to see in your report.

Creating the Measure: You will then need to create a measure that utilizes the ranking and implements the IF statement to determine whether to include a particular value based on its rank.
Sample Measure: Here’s an example of how you might create a measure that filters to only show revenue for the top N ranked customers:
TopN Revenue = IF ( [Ranking by Quarterly Average] <= MAX(TopNFilter[TopNValue]) , 
	[Revenue Measure] , 
	BLANK() ) 
In this example, [Ranking by Quarterly Average] is the measure that calculates the rank, and [Revenue Measure] calculates the revenue. MAX(TopNFilter[TopNValue]) reads the Top-N value currently selected from the helper table, and the IF statement checks if the rank is within that value.
How It Works

The measure calculates the ranking for each customer and then checks if it falls within the defined Top-N range. If true, it returns the corresponding revenue; if false, it returns a blank.
This allows for flexible reporting and analysis where you can easily adjust the Top-N filter to view different segments of your data.
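The helper table mentioned above could be built as a small calculated table. This is one possible sketch: the table and column names match the sample measure, while the values 3, 5 and 10 are just the options listed earlier.

```dax
-- Disconnected helper table: no relationship to other tables is needed,
-- it only feeds a slicer whose selection the measure reads with MAX
TopNFilter =
DATATABLE (
    "TopNValue", INTEGER,
    { { 3 }, { 5 }, { 10 } }
)
```

Put TopNFilter[TopNValue] on a slicer; with nothing selected, MAX returns the largest option, so the measure falls back to the top 10.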
Practical Application
Using this approach, you can dynamically analyze which customers, products, or any entities perform within the top range based on your chosen criteria, enhancing your insights into business performance.

Variables

In DAX, variables are a powerful feature that allows you to store values and expressions for later use within a formula. This enhances the readability and maintainability of your DAX code, and can also improve performance by avoiding repeated calculations.
Creating Variables
You can define variables in a DAX formula using the VAR keyword followed by an assignment, and then use these variables in the subsequent calculations. The structure is as follows:
MeasureName = 
VAR VariableName = Expression
RETURN
    AnotherExpressionUsing(VariableName)
Example
Here's a practical example of using a variable in a measure designed to calculate total sales while excluding a specific product category:
TotalSalesExcludingCategory = 
VAR TotalSales = SUM(Sales[SalesAmount])
VAR ExcludedSales = CALCULATE(SUM(Sales[SalesAmount]), Sales[Category] = "ExcludedCategory")
RETURN
    TotalSales - ExcludedSales
In this example:

TotalSales  holds the total sales amount.
ExcludedSales calculates sales for a category that needs to be excluded.
The measure subtracts ExcludedSales from TotalSales and returns the result.
Benefits of Using Variables

Improved Readability: Using variables makes your DAX formulas easier to understand.
Performance: Instead of calculating the same expression multiple times, you calculate it once and reference it through the variable.
Complex Calculations: Helps in breaking down complex calculations into simpler parts, making debugging easier.
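A second sketch shows how variables split a ratio into named steps. The Cost column is an assumption; the other names follow the example above.

```dax
Profit Margin % =
VAR TotalSales = SUM ( Sales[SalesAmount] )
VAR TotalCost  = SUM ( Sales[Cost] )
VAR Profit     = TotalSales - TotalCost
RETURN
    -- DIVIDE returns BLANK instead of an error when TotalSales is zero
    DIVIDE ( Profit, TotalSales )
```

TotalSales is used twice but evaluated only once, which is where the performance benefit comes from.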

Time Intelligence & DATEADD

The DATEADD function in DAX is a powerful Time Intelligence function used for shifting dates by a specified number of intervals, which can be days, months, quarters, or years. This function allows you to create calculations that compare data from different time periods, making it essential for time-based analysis.
Syntax
The syntax for DATEADD is:
DATEADD(<dates>, <number_of_intervals>, <interval>)
<dates>: A column that contains dates.
<number_of_intervals>: The number of intervals to add (can be negative for subtraction).
<interval>: The interval to use (e.g., DAY, MONTH, QUARTER, YEAR).
Example
For instance, if you have a measure that calculates revenue, and you want to see the revenue from two days ago, you can use DATEADD like this:
RevenueTwoDaysAgo = 
CALCULATE(
    [RevenueMeasure],
    DATEADD(DateTable[Date], -2, DAY)
)
In this example:

[RevenueMeasure] is the measure you're calculating.
DateTable[Date] is the date column, -2 specifies to go back two intervals, and DAY sets the interval to days.
Use Case
The DATEADD function is particularly useful when you want to create reports that compare current metrics with those from previous periods. By shifting the context of your calculations, you can easily compute growth rates, year-over-year changes, and other time-based insights.
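A year-over-year comparison could be sketched like this, reusing [RevenueMeasure] and DateTable[Date] from the example above. The measure name is hypothetical, and the sketch assumes DateTable holds a contiguous range of dates.

```dax
Revenue YoY % =
VAR CurrentRevenue = [RevenueMeasure]
VAR PriorRevenue =
    -- same measure, with the dates in context shifted back one year
    CALCULATE ( [RevenueMeasure], DATEADD ( DateTable[Date], -1, YEAR ) )
RETURN
    -- DIVIDE returns BLANK when there is no prior-year revenue
    DIVIDE ( CurrentRevenue - PriorRevenue, PriorRevenue )
```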

Data Catalog 3.0
Sun, 07 Dec 2025 19:28:59 GMT
Role of the Data Catalog in Data Mesh
A Data Mesh is a decentralized architectural paradigm, while a Data Catalog is a tool or component used within it. A Data Mesh focuses on organizational principles like domain ownership and treating data as a product, whereas a Data Catalog provides a way to discover and inventory all the data assets, including these data products, across the decentralized domains. In practice, a data catalog is foundational to a data mesh because it provides a central point of access and discovery for decentralized data.
Starting around 2016, the modern data stack went mainstream. This refers to a flexible collection of tools and capabilities that help businesses today store, manage, and use their data. These tools are unified by three key ideas:

Self-service for a diverse range of users
“Agile” data management
Cloud-first and cloud-native

Key elements and tools in the modern Data Stack



dbt on Databricks
Sun, 07 Dec 2025 19:03:47 GMT
dbt Labs
dbt Labs is the company behind dbt (data build tool), which is an open-source analytics engineering tool. It enables data professionals to transform their raw data into structured datasets that can provide valuable insights through SQL-based transformations. dbt Labs focuses on enhancing data transformation processes by providing a modular, version-controlled framework that facilitates integration with various data platforms like Databricks, Snowflake, and Microsoft Fabric.
The key offerings from dbt Labs include:

dbt Core: The open-source version of dbt that allows users to create their models and manage data transformations.
dbt Cloud: A hosted version of dbt that offers additional features such as a user interface, collaboration tools, and scheduling capabilities to streamline workflows.
Support and Community: dbt Labs encourages community contributions and has an active ecosystem where users share knowledge and best practices.

Databricks
Databricks is a unified data analytics platform that provides a collaborative environment for data engineering, data science, and machine learning. It is built around Apache Spark and integrates with various cloud services, allowing organizations to efficiently process and analyze large amounts of data.
Key features of Databricks include:

Lakehouse Architecture: Combines data lakes and data warehouses into a single architecture, enabling easier data management and analytics.
Collaborative Workspace: Offers notebooks that support multiple languages (Python, R, Scala, SQL) for data scientists and analysts to collaborate in real-time.
Unified Analytics: Allows users to perform tasks related to data processing, analytics, and machine learning in a seamless way without needing separate tools.
Integration with Other Tools: Databricks easily integrates with various external tools and platforms, including dbt, which helps in transforming raw data into structured insights using SQL.
Scalability and Performance: Provides high-performance capabilities to handle demanding workloads and large datasets, making it suitable for enterprises.

Databricks enhances data accessibility and usability, helping organizations leverage their data effectively for decision-making and strategic planning.
Connect dbt to Databricks
To connect the Databricks Unity Catalog to your dbt project, follow these steps:

Installation Prerequisites: Ensure that you have both dbt and the Databricks adapter installed. You can check the official documentation for installation instructions.

Create SQL Warehouse: Create a SQL warehouse; its connection details will be used to connect the dbt project.


Create a new Unity Catalog: Create a Unity Catalog that uses the warehouse from the previous step; it will then be connected to the dbt project.


Create a new Connection: Create a connection that points to the SQL warehouse from the earlier step:

Go to the Account Settings/Connection tab.
Select Databricks
Set the server hostname and HTTP path. Obtain these from your Databricks SQL warehouse connection details.
Optionally, set the name of the Unity Catalog you created.




Set Up Access Tokens:

Go to the Databricks Settings
Generate a new token under Developer/Access Tokens tab
Copy the token



Create and init dbt project:

Go to the Account Settings/Projects tab.
Select New project
Enter name of the project
Select created Connection
Select Token as the Auth method and paste the copied Databricks token
Leave a schema as it is
Setup a repository (1)
Create a repository (2)
Go to Studio Tab
Init and commit repository to new branch (3)





This procedure integrates Databricks Unity Catalog into your dbt project, allowing you to effectively manage and utilize your data assets.
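The connection details collected in the steps above (server hostname, HTTP path, access token, and the optional catalog name) can be sanity-checked before they are pasted into dbt. The following is a minimal sketch and not part of dbt or Databricks: the function name `build_connection_settings`, the environment variable names, and the validation rules are assumptions made for illustration.

```python
import os


def build_connection_settings(env=None):
    """Collect Databricks SQL warehouse settings and check their basic shape.

    The environment variable names used here are hypothetical.
    """
    env = os.environ if env is None else env
    settings = {
        "server_hostname": env.get("DATABRICKS_HOST", "").strip(),
        "http_path": env.get("DATABRICKS_HTTP_PATH", "").strip(),
        "access_token": env.get("DATABRICKS_TOKEN", "").strip(),
        # The catalog is optional, mirroring the optional field in the steps above.
        "catalog": env.get("DATABRICKS_CATALOG", "").strip() or None,
    }

    required = ("server_hostname", "http_path", "access_token")
    missing = [key for key in required if not settings[key]]
    if missing:
        raise ValueError(f"Missing connection settings: {', '.join(missing)}")

    # Assumption: the hostname is entered without a scheme and the
    # HTTP path begins with a slash, as in the warehouse connection details.
    if "://" in settings["server_hostname"]:
        raise ValueError("server_hostname must not include http:// or https://")
    if not settings["http_path"].startswith("/"):
        raise ValueError("http_path should start with '/'")
    return settings
```

Keeping the token in an environment variable rather than in a file avoids committing it to the repository that is created in the last step.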


Power BI Incremental Refresh
Sun, 07 Dec 2025 19:08:20 GMT
Incremental refresh is a feature in Power BI that allows you to increase efficiency when refreshing data by only updating the most recent data instead of completely rewriting the entire dataset. This is especially useful for large datasets as it minimizes the amount of data handled during the refresh process, reducing load times and resource usage.
Here’s how incremental refresh works:

Data Partitioning: The data is partitioned based on defined time frames, like days or months. This allows Power BI to identify which data partitions need to be refreshed based on changes.
Change Detection: It can check for changes in specific fields (like the maximum value of a date field) and trigger a refresh only if there’s new data to incorporate. For example, if today's date doesn't have new data since the last refresh, it won't load that day's data again.
Limitations: Incremental refresh policies are defined in Power BI Desktop but are applied only after the model is published to the Power BI service. Semantic models support incremental refresh with a Pro license, while more extensive capabilities, such as incremental refresh in dataflows, require Power BI Premium.

This approach helps keep data up-to-date while keeping performance optimal, particularly with frequently changing datasets.
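The partitioning and change-detection ideas above can be illustrated with a small sketch in plain Python. It is not how Power BI is implemented internally. It only mimics the logic: rows are grouped into monthly partitions, and a partition is marked for refresh when the maximum value of a change-tracking column differs from the value remembered at the last refresh. The column names `order_date` and `last_modified` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def partition_by_month(rows, date_key="order_date"):
    """Group rows into monthly partitions keyed by (year, month)."""
    partitions = defaultdict(list)
    for row in rows:
        d = row[date_key]
        partitions[(d.year, d.month)].append(row)
    return dict(partitions)


def partitions_to_refresh(partitions, last_seen_max, change_key="last_modified"):
    """Return keys of partitions whose max change value differs from the stored one."""
    stale = []
    for key, rows in sorted(partitions.items()):
        current_max = max(row[change_key] for row in rows)
        if last_seen_max.get(key) != current_max:
            stale.append(key)
    return stale


rows = [
    {"order_date": datetime(2025, 10, 3), "last_modified": datetime(2025, 10, 3)},
    {"order_date": datetime(2025, 11, 12), "last_modified": datetime(2025, 11, 12)},
    # This November row was edited after the last refresh.
    {"order_date": datetime(2025, 11, 20), "last_modified": datetime(2025, 12, 1)},
]
# Maximum last_modified per partition, as remembered from the previous refresh.
last_seen = {
    (2025, 10): datetime(2025, 10, 3),
    (2025, 11): datetime(2025, 11, 12),
}

stale = partitions_to_refresh(partition_by_month(rows), last_seen)
print(stale)  # [(2025, 11)]
```

Only the November partition is reloaded. October is skipped because its maximum `last_modified` value has not changed.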
Benefits of incremental refresh
Incremental refresh offers several benefits, particularly when dealing with large datasets in Power BI. Here are the main advantages:

Improved Performance: By only refreshing the most recent data partitions instead of the entire dataset, incremental refresh drastically reduces the time and resources needed during the refresh process.
Efficient Data Handling: It minimizes the workload on both the server and network, leading to faster report availability and improved user experience, especially when using data sources that receive frequent updates.
Cost-effective: For organizations using Power BI Premium, incremental refresh can lead to significant savings in compute resources, as less data processing time translates to lower costs.
Reduced Load Times: Users can access reports with current data more quickly since only the new or changed data is processed, rather than waiting for large dataset refreshes.
Better Management of Historical Data: It allows users to keep a subset of historical data while efficiently managing updates to the most recent data, which is particularly valuable for data models that rely on time-based analysis.
Automation Capability: Incremental refresh is ideal for scheduled refreshes, allowing organizations to automate data updates at specific intervals without manual intervention.

These benefits make incremental refresh a crucial feature for data analysts and organizations that need to work with large and frequently changing datasets in Power BI.
Setting incremental refresh
To set up incremental refresh in Power BI, follow these steps:

Prerequisites: Ensure you have Power BI Pro or Premium, as incremental refresh is not available in the free tier.
Dataset Preparation:

Start by opening your Power BI Desktop and loading your dataset.
You may need to specify which table or data source will benefit from incremental refresh.


Define Parameters:

Create two parameters in Power Query:

RangeStart: This parameter will define the start of the time range for data loading.
RangeEnd: This parameter will define the end of the time range.




Filter the Data:

After creating the parameters, apply a filter to your data query that uses these parameters to filter rows based on the date or time field that you want to use for incremental refresh.
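For example, filter your data to include only records where your date field is greater than or equal to RangeStart and strictly less than RangeEnd. Only one boundary should include equality; otherwise rows that fall exactly on a boundary are loaded into two adjacent partitions.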


Configure Incremental Refresh:

In Power BI Desktop, right-click the table and select "Incremental refresh" to set up the incremental refresh policy.
Set how much historical data you'd like to store and how much of the most recent data you want to refresh. For instance, you might store the last 5 years of data but refresh only the last 5 days on each run.


Publish the Report:

Once you’ve set the parameters and filters, publish the report to Power BI Service.
After publishing, the first refresh in Power BI Service loads all data in the defined range; later refreshes load only the incremental period.


Define Refresh Plan in Service:

In Power BI Service, you can then set the refresh schedule for your dataset.
Any subsequent refreshes will honor the incremental refresh configuration you set up in Power BI Desktop, refreshing only the latest data.



Following these steps allows you to efficiently handle large datasets by only updating the necessary portions rather than the entire dataset.
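A short sketch of why the date filter on RangeStart and RangeEnd should be half-open (inclusive at the start, exclusive at the end). The code is plain Python rather than Power Query, and the sample timestamps and partition boundaries are made up for illustration. It counts how many rows get loaded when the same filter is applied to two adjacent monthly partitions.

```python
from datetime import datetime


def in_range(ts, range_start, range_end):
    """Half-open filter: RangeStart <= ts < RangeEnd."""
    return range_start <= ts < range_end


def in_range_closed(ts, range_start, range_end):
    """Closed filter (both ends inclusive), shown only for contrast."""
    return range_start <= ts <= range_end


# Three source rows; the second sits exactly on a partition boundary.
rows = [
    datetime(2024, 12, 31, 23, 59),
    datetime(2025, 1, 1, 0, 0),
    datetime(2025, 1, 15, 0, 0),
]
# Two adjacent partitions: December 2024 and January 2025.
boundaries = [datetime(2024, 12, 1), datetime(2025, 1, 1), datetime(2025, 2, 1)]


def count_loaded(predicate):
    """Count rows loaded across all partitions with the given filter."""
    return sum(
        1
        for start, end in zip(boundaries, boundaries[1:])
        for ts in rows
        if predicate(ts, start, end)
    )


print(count_loaded(in_range))         # 3: every row is loaded exactly once
print(count_loaded(in_range_closed))  # 4: the boundary row is loaded twice
```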
Set Up Incremental Refresh Step by Step
Step 1: Open a report in Power BI Desktop

Open the report in Power BI Desktop
Click Home → Transform Data to open Power Query Editor


Step 2: Create Range Parameters

In the Power Query Editor, navigate to Manage Parameters → New Parameter.
Create two parameters:

RangeStart (Date/Time) — Set a default value (01.01.2020 0:00:00).
RangeEnd (Date/Time) — Set a default value (01.01.2021 0:00:00).




Step 3: Apply Filters to the Data

Select the date column you want to filter by.
Click Filter → Custom Filter.
Set the filter condition:

Greater than or equal to → RangeStart.
Less than → RangeEnd.


Click Close & Apply to apply changes.


Step 4: Enable Incremental Refresh

In Power BI Desktop, right-click the table → Incremental Refresh.
Configure settings:

Store data for (like 5 years).
Refresh data for (like the last 1 month).


Click Apply.


Step 5: Publish to Power BI Service

Click Publish and upload the report to the Power BI Service.
In Power BI Service, navigate to Dataset Settings → Refresh → Scheduled Refresh.


Step 6: Test and Verify Refresh

Run a manual refresh to verify if only recent data updates.
Monitor refresh logs for errors.
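The "Store data for" and "Refresh data for" settings from Step 4 describe a rolling window. The sketch below shows one simplified way to think about it with monthly partitions. It assumes that both periods are counted in whole calendar months and include the current month; the exact boundaries Power BI uses may differ, and the function names are hypothetical.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def month_from_index(i):
    """Inverse of month_index: first day of the month with that number."""
    return date(i // 12, i % 12 + 1, 1)


def plan_refresh(today, store_months, refresh_months):
    """Split the stored window into archived and refreshed monthly partitions."""
    current = month_index(today)
    first_kept = current - store_months + 1
    first_refreshed = current - refresh_months + 1
    archived = [month_from_index(i) for i in range(first_kept, first_refreshed)]
    refreshed = [month_from_index(i) for i in range(first_refreshed, current + 1)]
    return archived, refreshed


# Store 5 years (60 months), refresh the last 1 month.
archived, refreshed = plan_refresh(date(2025, 12, 7), store_months=60, refresh_months=1)
print(archived[0], archived[-1], len(archived))  # 2021-01-01 2025-11-01 59
print(refreshed)                                 # [datetime.date(2025, 12, 1)]
```

Partitions older than the first archived month drop out of the model as the window rolls forward, which is why the stored range stays bounded.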


Limitations of Power BI Incremental Refresh

Premium Required for Dataflows: Incremental refresh in dataflows is available only in Power BI Premium, Premium Per User (PPU), or Fabric capacities, so Pro users cannot use it there. Incremental refresh for semantic models is also available with a Pro license.
Requires a Date/Time Column: Incremental refresh depends on a Date/Time column to filter new (or modified) data. If the dataset lacks such a column, you’ll need additional transformations before implementing it.
Cannot Refresh Deleted Records: Incremental refresh only updates new (or modified) records but does not automatically handle deleted records unless designed using custom logic or soft delete.
Limited Data Source Support: Not all data sources support incremental refresh. It typically works with SQL databases, certain cloud-based sources, and Azure. Direct API-based sources may not be compatible.

References:

Power BI Incremental Refresh: A Complete Guide
Udemy PL-300 certification prep



Microsoft Fabric as an all-in-one analytics solution
Sun, 07 Dec 2025 19:07:31 GMT
In recent years, we have witnessed repeated transitions from centralized to decentralized governance and vice versa. I have been involved in these changes, both in the corporate sphere in the IT field and in the field of IT solution providers.
In this article, I would like to take a closer look at the transition from decentralized to centralized Data Management using Microsoft Fabric. Snowflake and Databricks (built on Apache Spark) are also moving in a similar direction.
Centralized Data Management
Centralized data involves gathering data from different sources and storing it in one central database, warehouse, or data lake. The data repository offers a centralized point for managing, storing, and using data, allowing for easier maintenance and management of data.
Decentralized Data Management
Decentralized Data involves the storage, cleaning, and use of data in a decentralized way. That is, there is no central repository. Data is distributed across different nodes, giving teams more direct access to data without the need for third parties.
Comparison of Centralized versus Decentralized Data Management

Data Mesh
Data Mesh is a decentralized approach to data architecture that promotes domain-oriented ownership and management of data. It advocates for treating data as a product, with each domain (or business unit) responsible for its own data pipelines, governance, and quality. The primary goal of data mesh is to address the limitations of traditional centralized data architectures by enabling scalability, agility, and autonomy of independent domains.
Data Fabric
Data Fabric is a centralized approach to data architecture and management. It is an end-to-end, unified analytics platform that brings together all the data and analytics tools that organizations need.
Pros and Cons of Data Fabric

Microsoft Fabric
Microsoft Fabric is Microsoft's solution for a centralized Data Fabric approach. It’s designed to address the challenges of a fragmented data and AI technology market by integrating various technologies like Azure Data Factory, Azure Synapse Analytics, Power BI, and Azure OpenAI Service into a single unified product.
It is an all-in-one analytics solution that covers everything from data movement to data science, real-time analytics, and business intelligence. Data lake, data engineering, data integration, Power BI, and real-time analytics are all integrated in one environment. Everything is managed for us, and we simply use the software as a service (SaaS). We don't need to move data between different tools, services, and vendors.
Microsoft Fabric Fundamentals
OneLake
OneLake in Microsoft Fabric serves as the central data repository, functioning like a managed data lake. Here are the key aspects of OneLake:

Unified Storage: OneLake acts as a single, unified storage system for all your data assets. It simplifies data management by consolidating storage in one place, eliminating the need to piece together various tools and services.
Data Accessibility: Rather than physically moving data from other locations (like AWS or Azure), OneLake allows you to create shortcuts to external files. This means you can access data without complex data pipelines, making your data management process more efficient.
Integration: OneLake integrates seamlessly with other components in Microsoft Fabric, enabling various analytical processes without the typical barriers that exist in traditional tooling setups.
Data Governance and Security: It includes features for data governance and security, ensuring your sensitive information is protected while providing access to authorized users within your organization.

In summary, OneLake is a highly managed data lake as a service that allows users to store, access, and manage their data effectively within Microsoft Fabric.

Workspace
A workspace in Microsoft Fabric acts as a dedicated environment for managing and organizing data projects. Here are the key points about workspaces:

Segmentation: Think of a workspace as a folder or segment specifically designated for a certain project or department. This helps in organizing different types of workloads, such as data pipelines, reports, and analytics.
Collaboration: Within a workspace, team members can collaborate on various items. The creator of a workspace typically controls who has access to it by adding users and assigning them specific roles (e.g., admin, member, contributor, or viewer).
Creation of Items: In a workspace, you can create various data artifacts including but not limited to datasets, reports, data pipelines, notebooks, and dashboards. This allows for a comprehensive approach to data management and analysis.
Capacity Assignment: Workspaces must be assigned to a specific capacity in Microsoft Fabric to utilize its features effectively. This capacity allows for the necessary computational power to handle the data operations within that workspace.

Overall, a workspace serves as a central hub for conducting data-related activities in Microsoft Fabric, tailored to specific needs and collaborations.

Lakehouse
A Lakehouse in Microsoft Fabric is an item created within a workspace that combines the functionalities of both data lakes and data warehouses. Here’s what you need to know about Lakehouses:

Hybrid Storage: A Lakehouse allows you to store various types of data, including structured data (like tables) and unstructured or semi-structured data (like CSV or JSON files). This flexibility makes it suitable for complex analytics workloads and machine learning projects.
Delta Tables: Within a Lakehouse, you can create Delta Tables, which are optimized for high performance and support both batch and real-time data processing. This enhances analytics and reporting capabilities.
Centralized Location: The Lakehouse serves as a central location for storing, managing, and analyzing files and data. This integration makes it easier to connect with other tools and processes within Microsoft Fabric.
Compatibility and Integration: The Lakehouse integrates seamlessly with various tools and technologies, including open-source technologies like Apache Spark and Delta Lake, facilitating advanced analytics and AI-driven insights.

Overall, Lakehouses are designed to provide a flexible yet powerful data storage and management solution, bridging the gap between traditional data warehouses and modern data lakes.

SQL Analytics Endpoint
The SQL Analytics Endpoint in Microsoft Fabric is a connection interface that allows users to interact with their data stored in the Lakehouse using SQL queries. Here are the key points regarding the SQL Analytics Endpoint:

Connection String: The SQL Analytics Endpoint provides a connection string which can be utilized to connect other tools, such as SQL Server Management Studio (SSMS) or Power BI, to the Lakehouse. This connection string facilitates accessing tables and executing SQL queries.
Data Preview: Through the SQL Analytics Endpoint, users can explore the data structure within the Lakehouse. You can expand schemas and view tables (e.g., a sales table) directly, allowing you to preview data visually.
Visual Interface: The endpoint offers a visual explorer, making it easier to navigate through your data without needing to write extensive queries initially. This interface helps users to become familiar with the structure of their datasets.
Usage in BI Tools: The SQL connection can be utilized with BI tools like Power BI, enabling users to create reports and dashboards based on the data stored in the Lakehouse.
Integration: The SQL Analytics Endpoint is part of the unified analytics architecture in Microsoft Fabric, ensuring that data is easily accessible and manageable within a single platform.

In summary, the SQL Analytics Endpoint simplifies data interaction by providing a straightforward way for users to connect to their data and perform analytics using SQL language, enhancing the overall data management experience in Microsoft Fabric.
Visual Query
A visual query in Microsoft Fabric is a user-friendly tool that allows users, particularly those with less experience in SQL, to construct SQL queries using a graphical interface instead of writing code directly. Here are the key features and functionality:

Graphical Interface: Visual queries enable users to create and manipulate queries through a visual environment. This makes it easier to understand the relationships between different data elements and to build queries without extensive SQL knowledge.
Ease of Use: The visual query tool is designed for ease, allowing users to drag and drop elements, select tables, and specify filters or joins without needing to understand complex SQL syntax.
Accessing Visual Query Tool: To create a new visual query, users can click on a specific icon in the interface, which opens the visual query creation options. This makes it accessible for users who might not feel comfortable with traditional coding.
Support for Beginners: Visual queries are particularly beneficial for individuals new to data analytics or SQL, allowing them to engage with data analysis without the technical overhead of writing SQL code.

In summary, the visual query feature in Microsoft Fabric bridges the gap for users who are less familiar with SQL, providing a way to construct queries visually and efficiently.

Shortcuts
A shortcut in the context of Microsoft Fabric refers to a reference to a data table that allows you to access it without creating a redundant copy of the data. Here’s how it works:

Definition: A shortcut acts as a link to a data table, enabling you to interact with it just like a normal table while avoiding duplication.
Creation:

You can create a shortcut by selecting the table you want to reference and checking the appropriate option for creating a shortcut in your workspace.
This process allows you to connect to data from various sources, like SQL databases, and use it in a lakehouse without ingesting (copying) the data.


Benefits: Using shortcuts prevents unnecessary data storage and allows for seamless updates; any modifications made to the original table will be reflected when accessing it through the shortcut.
Use Cases: Shortcuts are especially useful when you want to visualize or analyze data without the need for multiple copies, ensuring that your work remains efficient and organized.

Power BI Semantic Model
The Power BI semantic model is essentially a data structure that organizes data and defines relationships between various tables, serving as a foundation for creating reports and visualizations in Power BI. Here’s a breakdown of its key aspects:
Definition: Previously known as datasets, semantic models are created when you establish a lakehouse. They centralize data management, allowing seamless access and reporting.
Relationships: The semantic model maintains the relationships between different tables, which helps ensure data integrity and enables accurate data analysis in reports.
Usage: You can leverage the semantic model to create Power BI reports. When setting up a report, you can select an existing semantic model as the data source, allowing you to utilize the relationships and structures defined within it.
Measures: The semantic model can also include measures, which are calculations defined in DAX and used within reports. These calculations aren’t stored physically but are computed on-the-fly when the report is run.
Performance and Efficiency: By using a semantic model, you avoid data redundancy since reports directly reference the centralized data in the lakehouse. This means there’s no unnecessary duplication of data, and performance can be optimized through well-structured queries.
Overall, a semantic model enhances the ability to create effective and insightful reports within Power BI, making data analysis more efficient and coherent.
Semantic Model vs Tabular Model
The Power BI semantic model and the tabular model are both crucial elements for data analysis but serve different purposes. Here’s a breakdown of their differences and similarities:

Definition:

Power BI Semantic Model: This is a specific model used within Power BI that organizes and connects data from a lakehouse or other sources, allowing users to create reports seamlessly. It includes relationships between tables and serves as the foundation for visualizations.
Tabular Model: This is the model type generally used in SQL Server Analysis Services (SSAS), which also offers a separate multidimensional mode. It focuses on in-memory caching for efficient querying and includes structured data in the form of tables and relationships.


Data Storage:

Semantic Model: Data is referenced from a lakehouse, avoiding redundancy and maintaining efficient storage. Reports leverage the centralized dataset directly without duplicating data.
Tabular Model: Data can be stored in-memory or queried directly from a relational database. It can involve data import that might create duplicates unless effectively managed.


Usage Context:

Semantic Model: It is preferred for storage efficiency and centralized data management when creating reports directly in the Power BI service. Reports directly leverage this existing model.
Tabular Model: It is often used in more advanced modeling scenarios requiring robust transformations before publishing, usually managed within a local environment like Power BI Desktop.


Performance:

Semantic Model: Performance can depend on whether direct query or import mode is used. It’s structured to optimize data access efficiently.
Tabular Model: It generally provides fast query performance through in-memory data caching, but it may require additional management to optimize performance for reporting.


Relationships:

Both models maintain relationships between tables, essential for accurate reporting, but the management methodologies can differ. Power BI's semantic model can automatically infer relationships, making it easier to create actionable insights.
In conclusion, while both models allow structured data interaction, the Power BI semantic model focuses on integration and efficiency within the Power BI ecosystem, while the tabular model is broader, used primarily in various analytical contexts.



Row Level Security (RLS)
Row-level security in Power BI is a feature that allows you to restrict data access for specific users or groups. This means different users can see different data in the same report based on their roles or permissions. Here’s how it works:
Roles Creation: You define roles within your Power BI model. Each role specifies a filter that determines what data is visible to people assigned to that role. For instance, a manager might see all data, while an employee might only see their own department's data.
DAX Filters: You can use Data Analysis Expressions (DAX) to specify access rules. This involves writing DAX expressions that evaluate the current user and filter the data accordingly.
User Assignment: After defining roles, you test them in Power BI Desktop with the "View as" option and assign users or groups to them in the Power BI service after publishing the report.
Dynamic Filtering: RLS can also use dynamic filtering based on the user’s identity. This means you can automatically filter the data shown to the user based on their login credentials, which can be obtained through functions like USERNAME() or USERPRINCIPALNAME().
Implementing RLS helps protect sensitive data and ensures that users see only the information relevant to them, enhancing data privacy and compliance.
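The effect of a dynamic RLS rule can be illustrated outside Power BI with a few lines of Python. In the model itself the rule would be a DAX filter on a role, for example one that compares an email column with USERPRINCIPALNAME(). The sketch below only simulates the outcome. The table contents, the user-to-department mapping, and the function name `visible_rows` are hypothetical.

```python
def visible_rows(rows, user_principal_name, user_departments, unrestricted_users=()):
    """Return only the rows the given user is allowed to see."""
    if user_principal_name in unrestricted_users:
        return list(rows)  # e.g. a manager role with no filter
    allowed = user_departments.get(user_principal_name, set())
    return [row for row in rows if row["department"] in allowed]


sales = [
    {"department": "Finance", "amount": 100},
    {"department": "HR", "amount": 40},
    {"department": "Finance", "amount": 60},
]
# Security mapping table: which departments each user may see.
user_departments = {"anna@example.com": {"Finance"}}

anna_view = visible_rows(sales, "anna@example.com", user_departments)
print(sum(row["amount"] for row in anna_view))  # 160

# A user without any mapping sees nothing rather than everything.
print(visible_rows(sales, "guest@example.com", user_departments))  # []
```

Defaulting to an empty result for unmapped users is a deliberate choice in this sketch: a missing mapping should hide data, not expose it.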
Warehouse
A warehouse, in the context of data analytics, refers to a specialized layer designed for high-performance analytics and reporting on structured data. It is primarily optimized for structured data, such as relational data sourced from databases, and utilizes SQL for querying and analysis.
To break it down further:

Purpose: Data warehouses provide strategic insights by consolidating and managing data from multiple sources, making them essential for business intelligence and reporting purposes.
Data Structure: They are designed to handle structured data, which is organized in a format that is easily accessible and analyzable.
Integration with OneLake and Lakehouse: In Microsoft Fabric, data warehouses function within a broader framework that includes OneLake (a unified storage system) and Lakehouse (which combines the features of data lakes and data warehouses). This means that data warehouses can access data stored in OneLake and are suitable for performing complex analytics workloads.
Performance: They are optimized for high-performance needs, meaning they can handle large volumes of data efficiently, making them suitable for businesses that require timely and accurate data analysis.

In summary, a warehouse is integral for organizations that need to perform robust analytics and reporting on structured data, enabling informed decision-making based on comprehensive data insights.
Difference between Lakehouse and Warehouse
The main differences between a lakehouse and a warehouse can be summarized as follows:

Data Types Supported:

Lakehouse: Supports structured, semi-structured, and unstructured data, making it ideal for diverse data types. It allows the storage of files such as CSV or JSON alongside structured tables, providing flexibility in data management.
Warehouse: Primarily designed for structured data, such as relational data from databases. It is optimized specifically for high-performance analytics and reporting tasks.


Architecture and Flexibility:

Lakehouse: Combines the advantages of data lakes and warehouses, allowing for both rigid structured tables and flexible file storage. It supports real-time and batch processing for complex analytics workloads, machine learning, and data science projects.
Warehouse: A specialized layer focused on high-performance analytics tailored for structured queries, making it familiar for traditional data analysts.


Query Capabilities:

Lakehouse: Can be queried through a SQL endpoint, but it is read-only when using SQL, meaning you cannot perform write or update operations via SQL queries.
Warehouse: Allows for both read and write operations, making it suitable for executing complex queries and data updates.


Use Cases:

Lakehouse: Best suited for projects involving machine learning, real-time analytics, and processing diverse data formats. It serves as a centralized location for managing and analyzing various data types.
Warehouse: Ideal for organizations focusing on high-performance reporting and analytics tasks that rely heavily on structured data.



In summary, the lakehouse offers a more flexible and comprehensive approach to data management, while the warehouse is specialized for efficient and performant analytics solely on structured data.

Underlying format
The underlying format of a warehouse, particularly in the context of Microsoft Fabric, involves the use of the Delta Parquet format for data storage. Here are the key points related to its underlying format:

Delta Tables: Warehouses utilize delta tables, which are built on the Parquet format. This allows for efficient data processing and storage. Delta tables provide features like ACID transactions, efficient data updates, and schema enforcement, enhancing the reliability of data operations.
Integration with OneLake: The warehouse operates within the OneLake storage system, which serves as a unified storage solution. This integration enables warehouses to access and utilize data stored across different formats and sources.
SQL Support: The data warehouse environment also facilitates the use of T-SQL (Transact-SQL) for creating and managing tables, as well as for running analytics queries. This is a critical feature that distinguishes it from the lakehouse, whose SQL analytics endpoint is read-only, so tables there cannot be created with T-SQL.
Performance Optimization: The warehouse is specifically optimized for high-performance analytics and reporting on structured data, making it suitable for traditional BI use cases.

In summary, the warehouse is constructed around delta tables in the Parquet format and is designed to deliver high-performance query capabilities while facilitating robust data management features.
Apache Spark
Apache Spark offers several strengths that make it a powerful tool for data processing and analytics:

Distributed Computing: Spark is designed to operate across a network of machines, allowing it to efficiently handle large datasets and complex data processing tasks. This distributed nature means computations are executed in parallel, making it much faster than processing on a single machine.
In-Memory Processing: By caching data in memory instead of relying on disk reads, Spark significantly speeds up data processing. This enables quicker access and manipulation of data, which is vital for real-time applications.
Resilient Distributed Datasets (RDDs): Spark's core abstraction for handling data is RDDs, which are fault-tolerant collections of data partitions distributed across the cluster. RDDs ensure that data can be recovered from errors, maintaining data integrity and processing performance.
Flexibility with Programming Languages: Spark supports multiple programming languages, including Scala, Python, and Java. This flexibility allows data engineers and data scientists to choose a language they are comfortable with, facilitating easier development.
Support for Both Batch and Real-Time Processing: Spark can handle both batch processing and streaming data, making it versatile for different types of analytics tasks.
Integration with Machine Learning: Spark includes libraries for machine learning, data science, and graph processing, enabling advanced analytics directly within the framework.
Optimized for Large-Scale Data Operations: It is well-suited for processing large amounts of data efficiently, making it an ideal choice for large-scale ETL processes and analytics.

These strengths position Apache Spark as a vital tool in the toolkit of data scientists and engineers, especially when working with large and complex data environments.
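The idea behind distributed, partitioned computation can be shown with a toy example in plain Python. This is only an analogy and not Spark: it runs on one machine with threads, and it has no cluster, no fault tolerance, and no lazy evaluation. It does show the same pattern of splitting data into partitions, computing a partial result per partition in parallel, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous partitions of roughly equal size."""
    size, remainder = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def sum_of_squares(partition):
    """Work done independently on each partition (the 'map' side)."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(data, num_partitions=4):
    """Compute partial sums per partition in parallel, then combine them."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(sum_of_squares, partitions))
    return sum(partial_sums)  # the 'reduce' side


print(parallel_sum_of_squares(list(range(1, 11))))  # 385
```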

Loading data from a lakehouse
To load data from a lakehouse into a DataFrame using Databricks, you can follow these steps:

Set Up Your Environment: Make sure you have access to the lakehouse in Databricks.



Load Data: Use Spark's read functionality to load data into a DataFrame. The syntax generally looks like this:

# Inferring the Schema Automatically

df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("lakehouse_path")

Replace lakehouse_path with the actual path to your file in the lakehouse.

Adjust Options as Needed: Depending on your data format (e.g., CSV, JSON, Parquet), you might need to adjust the .format and include other options, such as delimiters or schema definitions.

Work with Your DataFrame: After loading the data, you can perform various operations such as showing some rows, processing, or analyzing data:


df.show()


Error Handling (Optional): If there’s an error loading the DataFrame, review the path and format options to ensure everything is correctly specified.
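As a complement to schema inference, the sketch below reads a CSV file with an explicit schema and a custom delimiter. It assumes `spark` is the SparkSession that the notebook provides. The helper name `load_csv_with_schema` and the column names in the usage line are hypothetical.

```python
def load_csv_with_schema(spark, path, schema_ddl, delimiter=","):
    """Read a CSV file into a DataFrame using an explicit schema.

    `spark` is the SparkSession available in the notebook; `schema_ddl`
    is a DDL-formatted string such as "order_id INT, amount DOUBLE".
    """
    return (
        spark.read.format("csv")
        .option("header", "true")   # first line holds the column names
        .option("sep", delimiter)   # field separator, e.g. ";" for some exports
        .schema(schema_ddl)         # explicit schema instead of inferSchema
        .load(path)
    )
```

In a notebook it could be called as `df = load_csv_with_schema(spark, "lakehouse_path", "order_id INT, order_date DATE, amount DOUBLE", delimiter=";")`. An explicit schema avoids the extra pass over the data that schema inference needs and keeps column types stable between loads.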

Writing a DataFrame back to a lakehouse

Prepare Your DataFrame: Ensure you have the DataFrame ready that you want to write to the lakehouse.

Write as a Delta Table: The preferred method for storing data in a lakehouse is as a Delta table. This allows for optimized performance and compatibility with tools like Power BI. You can use the following code snippet:


df.write.format("delta").saveAsTable("tablename")

Replace tablename with the name you want to assign to your table in the lakehouse.

Write as a CSV File (Alternative Option): If you prefer to write the DataFrame as a file, such as a CSV, you can use:

df.write.csv("lakehouse_path")

Here, replace lakehouse_path with the path where you want to save the CSV output. Spark writes a folder at that path containing one or more part files, and Databricks will create the necessary directories if they do not exist.

Check the Path: For the CSV option, ensure that the path you specify is correct, as that is where the output folder will be generated.

Verify the Write Operation: After the write operation, you can verify that the data has been saved correctly by reading it back into a DataFrame using:


new_df = spark.read.table("tablename")  # read back the Delta table saved with saveAsTable
new_df.show()
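When a table is written repeatedly, the save mode matters: by default Spark raises an error if the table already exists. The sketch below wraps the write and the read-back in two small helpers. The helper names are hypothetical; `df` and `spark` are assumed to be the DataFrame and SparkSession from the notebook.

```python
VALID_MODES = {"overwrite", "append", "ignore", "error", "errorifexists"}


def save_as_delta_table(df, table_name, mode="overwrite"):
    """Write a DataFrame to a Delta table with an explicit save mode.

    "overwrite" replaces existing data, "append" adds rows,
    "ignore" skips the write if the table exists, and
    "error"/"errorifexists" fail if it exists.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported save mode: {mode!r}")
    df.write.format("delta").mode(mode).saveAsTable(table_name)


def read_table(spark, table_name):
    """Read a saved table back into a DataFrame for verification."""
    return spark.read.table(table_name)
```

A typical round trip would be `save_as_delta_table(df, "tablename", mode="append")` followed by `read_table(spark, "tablename").show()`.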

Temporary views
To create and use temporary views in a Databricks notebook, follow these steps:

Create Temporary Views: You can create a temporary view using the createOrReplaceTempView method. For example, if you have a DataFrame called sales, you can create a temporary view like this:

sales.createOrReplaceTempView("sales_temp_view")


Query the Temporary View: After creating the temporary view, you can run SQL queries against it. For instance:

result = spark.sql("SELECT * FROM sales_temp_view")
result.show()


Session Scope: Keep in mind that temporary views are session-scoped; they will only exist during the active notebook session. Once the session ends, the view will no longer be accessible.

Use Multiple Views: You can create multiple temporary views from different DataFrames. For example, if you have another DataFrame called products, you can create a temporary view for it as well:


products.createOrReplaceTempView("products_temp_view")


Combine Queries: Temporary views allow you to run complex SQL queries involving multiple views and DataFrames, combining the power of Spark with SQL syntax for more flexibility.
This approach enables effective data manipulation and querying without the need for permanent storage, facilitating quick data analysis within your current session.
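
To illustrate combining views, here is a hedged sketch that joins the two temporary views in a single SQL statement. The column names product_id, product_name, and amount are hypothetical, as is the helper name sales_by_product; substitute the columns your DataFrames actually contain.

```python
# Column names below are assumptions made for illustration.
JOIN_QUERY = """
SELECT p.product_name, SUM(s.amount) AS total_amount
FROM sales_temp_view AS s
JOIN products_temp_view AS p
  ON s.product_id = p.product_id
GROUP BY p.product_name
ORDER BY total_amount DESC
"""


def sales_by_product(spark, sales, products):
    """Register both DataFrames as temporary views and return the joined result."""
    sales.createOrReplaceTempView("sales_temp_view")
    products.createOrReplaceTempView("products_temp_view")
    return spark.sql(JOIN_QUERY)


# Example call inside a notebook:
# sales_by_product(spark, sales, products).show()
```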



Set Power BI Row-Level Security to SAP Cost Center
Sun, 07 Dec 2025 19:05:47 GMT
In organizations where data is shared between different departments, it is crucial to restrict access to only the information that is necessary. To this end, Microsoft Power BI offers Row-Level Security (RLS) and Object-Level Security (OLS).
Row-Level Security
Row-Level Security (RLS) is a security feature in Power BI that restricts access to rows in a table based on the identity of the user viewing the report. Rather than duplicating reports for different user groups, RLS allows you to apply filters at the data level so that each user sees only the data they are permitted to view.
This is crucial for preserving data confidentiality and integrity, especially in scenarios involving sensitive or proprietary information. RLS operates within the Power BI data model and ensures that unauthorized users cannot access restricted data, even through indirect methods such as slicers or drill-downs.
Setting up row-level security:

Create Roles: Use DAX (Data Analysis Expressions) or logical statements to define roles that filter the data. For example, you might create a role for "Territory Managers" that only allows them to see data for their respective territories.
Testing: After setting up the roles, test them within Power BI Desktop using the "View As" feature to ensure that the data is being filtered correctly according to the defined roles.
Deployment: Finally, publish the report to the Power BI service and confirm the RLS configurations are working as expected in that environment.

Static vs Dynamic RLS Architectures
Row-level security can be split into two types: static and dynamic.
Static RLS implementation
Static RLS involves creating roles with hardcoded DAX filters. Each role corresponds to a specific group or segment, such as a geographic region or department.
Here are the general steps to implement a static RLS:

Create a role named, e.g., "Region_East."
Apply a filter such as [Region] = "East" to that role.
Assign specific users to the role in Power BI Service.

Dynamic RLS implementation
Dynamic RLS uses functions like USERNAME() or USERPRINCIPALNAME() combined with mapping tables to dynamically filter data based on user identity.
Here are the general steps to implement a dynamic RLS:

Create a mapping table linking users to access levels. This will be your security table. This table should include columns like user emails, their access regions, and their names.
Write a DAX filter like: [Region] = RELATED(UserRegion[Region])
Filter that table with: UserRegion[Email] = USERPRINCIPALNAME()
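
The difference between the two approaches can be simulated outside Power BI. The sketch below is plain Python, not DAX, and all sample rows, email addresses, and function names are made up for illustration: the static role hardcodes its filter, while the dynamic role looks up the signed-in user in the mapping table.

```python
# Hypothetical sample data: fact rows and a security (mapping) table.
SALES = [
    {"Region": "East", "Amount": 100},
    {"Region": "West", "Amount": 250},
    {"Region": "East", "Amount": 75},
]
USER_REGION = [
    {"Email": "anna@example.com", "Region": "East"},
    {"Email": "ben@example.com", "Region": "West"},
]


def static_role_region_east(rows):
    """Static RLS: the role hardcodes the filter [Region] = "East"."""
    return [r for r in rows if r["Region"] == "East"]


def dynamic_role(rows, user_principal_name):
    """Dynamic RLS: find the user's regions in the mapping table, then filter."""
    allowed = {
        m["Region"]
        for m in USER_REGION
        if m["Email"].lower() == user_principal_name.lower()
    }
    return [r for r in rows if r["Region"] in allowed]


east_rows = static_role_region_east(SALES)
ben_rows = dynamic_role(SALES, "ben@example.com")
```

With the dynamic variant, granting a user access to another region means adding a row to the mapping table rather than creating and assigning a new role.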

Example: RLS based on SAP roles granted to user
The access rights are based on SAP tables that record which cost centers are assigned to individual users via business roles.
The resulting authorization table, derived from these SAP tables, contains the email address that matches the logged-in user together with the assigned cost centers. A user's email address appears once for each assigned cost center, so it repeats across several rows.

Dynamic RLS uses the USERPRINCIPALNAME() function to identify the logged-in user and applies a filter on the Cost Center table:
VAR Logged_User =
    LOWER ( USERPRINCIPALNAME () )
RETURN
    CALCULATE (
        COUNTROWS ( 'Cost Center' ),
        'Cost Center'[Cost Center Key]
            IN CALCULATETABLE (
                VALUES ( Authorization[Cost Center] ),
                FILTER ( ALL ( Authorization ), Authorization[User Email] = Logged_User )
            )
    ) > 0

The logged-in user then sees only data that is linked to the assigned cost centers.



Dynamic Management Views (DMVs)
Sun, 07 Dec 2025 19:03:18 GMT
DMVs
Dynamic Management Views (DMVs) are special system views that expose internal server state for monitoring and troubleshooting.
You can join DMVs together, but there are some limitations depending on the context:

SQL Server DMVs: (like sys.dm_exec_sessions, sys.dm_exec_requests) support normal JOIN syntax.
Analysis Services DMVs: (like DISCOVER_SESSIONS) have a restricted SQL-like syntax and do not support JOIN — you must query separately and join in your application code.
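
Joining in application code can be as simple as a dictionary lookup. The following minimal sketch assumes the rows of $SYSTEM.TMSCHEMA_TABLES and $SYSTEM.TMSCHEMA_COLUMNS have already been fetched by two separate queries and converted to dictionaries; the sample rows and the function name are hypothetical, and how the rows are fetched is left out.

```python
def join_columns_to_tables(tables, columns):
    """Inner-join column rows to table rows on TableID, as a SQL JOIN would."""
    table_names = {t["ID"]: t["Name"] for t in tables}
    return [
        {"TableName": table_names[c["TableID"]], "ColumnName": c["ExplicitName"]}
        for c in columns
        if c["TableID"] in table_names
    ]


# Hypothetical rows, shaped like the results of the two DMV queries.
tables = [{"ID": 1, "Name": "Sales"}, {"ID": 2, "Name": "Date"}]
columns = [
    {"ID": 10, "TableID": 1, "ExplicitName": "Amount"},
    {"ID": 11, "TableID": 2, "ExplicitName": "Year"},
    {"ID": 12, "TableID": 99, "ExplicitName": "Orphan"},  # no matching table
]

joined = join_columns_to_tables(tables, columns)
```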

Analysis Services DMVs
Here’s a breakdown of what Analysis Services DMVs are and how they're useful:

Purpose of DMVs: DMVs allow users to retrieve metadata about objects in a model, such as tables, relationships, and hierarchies. They help in understanding how data is structured and managed.
Types of Information: DMVs can provide details about table relationships, data partitions, hierarchies, and compatibility levels of models. This information is crucial for analysis and optimization of the data models.
Querying DMVs: You can query DMVs directly using tools like SQL Server Management Studio (SSMS) or DAX Studio. This querying allows you to gather comprehensive data about your model’s schema and performance metrics.
Usefulness in Analysis: By utilizing DMVs, users can gain better insights into models that might not be immediately visible in graphical tools like Power BI. Some tables created by Power BI, for example, may not appear directly in the interface but can be accessed through DMVs.

Understanding and executing queries on DMVs can greatly enhance your ability to manage your data effectively and improve your data model's performance.
Query DMV in DAX studio
To query Dynamic Management Views (DMVs) in DAX Studio, you can follow these steps:

Connect to Your Model: Start by connecting to your data model in DAX Studio. Ensure that you are connected to the appropriate instance of Analysis Services that contains your data model.
Access the DMV Tab: Once connected, navigate to the 'DMV' tab within DAX Studio. This tab provides access to various DMVs available in your connected model.
Execute DMV Queries: Type in the desired DMV query to retrieve metadata about your model. For instance, you can use the DISCOVER_SESSIONS DMV to gather session-related information or the DBSCHEMA_CATALOGS DMV to view catalog details. You may also use the tables and columns DMVs to understand the structure of your model.
Review Results: After executing the query, you can view the results directly in DAX Studio. This allows you to analyze the output, which may include component names and hierarchical relationships that are not visible through standard Power BI interfaces.
Consider Limitations: Keep in mind that while DMVs are valuable for inspecting your models, some features might be better realized through SQL Server Management Studio (SSMS). For certain advanced queries, you may need to enable ad hoc distributed queries by using the sp_configure command as outlined in the course.

Here are the most common DMV queries:
-- Query to retrieve all tables in a model
select * from $SYSTEM.TMSCHEMA_TABLES

-- Query to get all columns (note the SortByColumnId column)
Select * From $SYSTEM.TMSCHEMA_COLUMNS

-- Query to get all calculated columns
Select * From $SYSTEM.TMSCHEMA_COLUMNS Where [Type] = 2

-- Query to get all measures
select * from $SYSTEM.TMSCHEMA_MEASURES

-- Query to get all dependencies
select * from $system.discover_calc_dependency

-- Get unique row counts for all tables and columns
select * from $SYSTEM.DISCOVER_STORAGE_TABLES order by rows_count desc

-- Query to get all the roles, associated permissions and role memberships defined in the model
select * from $SYSTEM.TMSCHEMA_Roles
select * from $SYSTEM.TMSCHEMA_TABLE_PERMISSIONS
select * from $SYSTEM.TMSCHEMA_Role_Memberships

-- Query to get all the KPIs defined in the model
select * from $SYSTEM.TMSCHEMA_KPIS

-- Query to get session information
select * from $SYSTEM.DISCOVER_SESSIONS

-- Query to get all relationships
select * from $SYSTEM.TMSCHEMA_RELATIONSHIPS

-- Queries to get hierarchy information
select * from $SYSTEM.TMSCHEMA_ATTRIBUTE_HIERARCHIES
select * from $SYSTEM.TMSCHEMA_ATTRIBUTE_HIERARCHY_STORAGES

-- Query to get information about each model:
select * from $SYSTEM.TMSCHEMA_MODEL

-- Query to get information about each partition:
select * from $SYSTEM.TMSCHEMA_PARTITIONS

-- Query to get perspective information:
select * from $SYSTEM.TMSCHEMA_PERSPECTIVES

-- Query to get catalog information (especially compatibility level):
select * from $SYSTEM.DBSCHEMA_CATALOGS 

Query DMV in SSMS
To use Dynamic Management Views (DMVs) in SQL Server Management Studio (SSMS), follow these steps:

Connect to SQL Server: Open SSMS and connect to your SQL Server instance. If you're using Analysis Services, select "Connect" and choose "Analysis Services," entering the appropriate server name and port if necessary.
Open a New Query Window: Once connected, navigate to the "New Query" option to start writing your SQL queries. If you are getting prompts for MDX queries, ensure you are connected to the right service.
Query DMVs: You can write SQL queries to access DMVs. For example, you may want to look up the DISCOVER_SESSIONS or DBSCHEMA_CATALOGS DMVs, which provide session details and catalog information, respectively.
Enable Ad Hoc Distributed Queries: If you plan to use OPENROWSET to access DMVs, make sure that ad hoc distributed queries are enabled on your SQL instance. You can do this by executing:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;


Run Your Query: Execute your query to see the results. This will allow you to retrieve useful information about your model that might not be visible through other means.

This approach allows you to use JOIN to connect, for example, a view with columns and a view with tables:
------------------------------------------------------------------------------------------------------
-- This query establishes a connection to the SSAS server represented in the OPENROWSET connection
-- string field and the catalog references in the connection string. It then pulls table and column
-- information 
------------------------------------------------------------------------------------------------------

-- It may be necessary to execute these statements first.
sp_configure 'show advanced options', 1;  
RECONFIGURE;
GO 

sp_configure 'Ad Hoc Distributed Queries', 1;  
RECONFIGURE; 
GO

-- Query the table and column data in the model or models.
;WITH [Tables] AS
(
	SELECT 
		ID as TableId
		,ModelId
		,[Name] as TableName
		,DataCategory
		,[Description] as Description
		,IsHidden								-- Is the table treated as hidden by a client visualization tool
		,ModifiedTime							-- The time the table was last modified
		FROM OPENROWSET('MSOLAP','DATASOURCE= localhost:60921; Initial Catalog=4f655bd0-716b-4bf6-9424-727ec45e4b47;','SELECT * FROM $SYSTEM.TMSCHEMA_TABLES')
)

,[Columns] AS
(
SELECT 
	ID as Id
	,TableId
	,ExplicitName as ColumnName
	,InferredName																					-- Engine generated name (Calculated columns only)
	,CASE
		When ExplicitDataType = 1 Then 'Automatic'
		When ExplicitDataType = 2 Then 'String'
		When ExplicitDataType = 6 Then 'Int64'
		When ExplicitDataType = 8 Then 'Double'
		When ExplicitDataType = 9 Then 'DateTime'
		When ExplicitDataType = 10 Then 'Decimal'
		When ExplicitDataType = 11 Then 'Boolean'
		When ExplicitDataType = 17 Then 'Binary'
		When ExplicitDataType = 19 Then 'Unknown'
		Else 'N/A'
	 END as DataType
	,DataCategory
	,Description
	,IsHidden																						-- Treated as hidden by a client visualization tool?
	,IsUnique																						-- Can the column contain duplicate values?
	,IsKey																							-- Is the column a key of the table?
	,IsNullable																						-- Can the column contain null values?
	,CASE
		WHEN T.Type = 1 Then 'From data source'
		WHEN T.Type = 2 Then 'Calculated'
		WHEN T.Type = 3 Then 'Row number'
		ELSE 'N/A'
	END as ColumnType
	,SourceColumn																					-- Source column name
	,Expression																						-- The calculated column DAX expression
	,FormatString																					-- String controlling the formatting of the column
	,SortByColumnId																					-- Specifies the column that is controlling the sorting of this column
	,AttributeHierarchyId																			-- A reference to an AttributeHierarchy object
	,ModifiedTime																					-- The time the column was last modified
	,ErrorMessage																					-- A string explaining the error state of the column
	FROM OPENROWSET('MSOLAP','DATASOURCE= localhost:60921; Initial Catalog=4f655bd0-716b-4bf6-9424-727ec45e4b47;','SELECT * FROM $SYSTEM.TMSCHEMA_COLUMNS') T
)

Select 
	T.TableName, 
	T.ModelId, 
	C.*, 
	ISNULL(C2.ColumnName,'') as SortByColumnName
From Tables T
	Inner Join Columns C ON T.TableId = C.TableId
	Left Join Columns C2 ON C2.Id = C.SortByColumnId

When selecting from OPENROWSET, replace the DATASOURCE and Initial Catalog values in the connection string with those of your own model.


Orchard Core Shapes
Sun, 07 Dec 2025 19:07:42 GMT
Orchard Core doesn't render HTML directly, but instead will usually render something called a Shape, which is an object that represents the thing to render and has all the necessary data and metadata to render HTML.
When rendering a Shape, Orchard Core will look for specific templates, passing the Shape to this template.
Orchard Core can match with many templates for the same Shape. These potential templates are called Alternates.
What is a Shape

An object implementing the IShape interface
A dynamic data model that contains:

Data that will be rendered by ASP.NET views
Metadata on how to render it



Benefits of Shapes

No view name is hard-coded: the view name is derived from the shape
Priority based view resolution - Alternates
Theming - User defined Templates (views)
Dynamic caching
Wrapping
Placement

Zones
Ordering


Multiple sources

Database
Files
Code


Events

Creating and rendering shapes

Create a shape with the shape factory and give it a name

 var factory = context.RequestServices.GetRequiredService<IShapeFactory>();
 var shape = await factory.CreateAsync("Car");


Create the HTML content with the display helper

 var displayHelper = context.RequestServices.GetRequiredService<IDisplayHelper>();
 var htmlContent = await displayHelper.ShapeExecuteAsync(shape);


Create a view named after the shape: Car.cshtml
Write the HTML content to the response body

 await using var sw = new StreamWriter(context.Response.Body);
 htmlContent.WriteTo(sw, HtmlEncoder.Default);



Code is available on Github in the branch: Test_Shape_With_Razor_View
Rendering shapes with Liquid templates
Liquid is a safe, customer-facing templating language originally created by Shopify. It's designed to be secure, flexible, and easy to understand, making it perfect for generating dynamic content where you need to combine static templates with variable data from your application.
To render shapes using Liquid templates, we only need to add the OrchardCore.DisplayManagement.Liquid package to the dependencies and register the AddLiquidViews() service in Program.cs.
Add data to shapes
The shape, as an instance of the IShape interface, exposes a Properties property of type IDictionary<string, object>, into which we can insert data through the indexer; that data is then visible in the template. In the template, we display the data through the Model object. The ShapeExecuteAsync method passes the shape to the template as the model. Besides implementing IShape, the shape is also a DynamicObject (through its Composite base class) and therefore provides the TryGetMember method. This allows us to access the properties directly, e.g., Model.Brand instead of Model.Properties["Brand"].
Code for this part is available on Github in the branch: Test_ShapeData_With_Liquid_Template
Strongly typed shapes
Since the dynamic approach carries some overhead, we can instead use the generic CreateAsync method to create the shape, with a POCO class as the type argument. In our case, this is the Car class.
public class Car
{
    public string? Brand { get; set; }
    public string? Color { get; set; }
}

var shape = await factory.CreateAsync<Car>("Car", c => { c.Brand = "Renault"; c.Color = "Red"; });

Now we can type the model as Car and get the value directly from the class properties:
@using OrchardCore.DisplayManagement;
@model Car

This is a car @Model.Brand with @Model.Color color

Adding metadata to a shape
As mentioned, IShape carries metadata such as Id, TagName, Classes, and Attributes that a template can use when rendering the shape. In the template, we render this metadata with a helper class. Here is the code that sets the metadata:
   shape.Id = "my-renault";
   shape.TagName = "h3";
   shape.Classes.Add("car");
   shape.Classes.Add("brand-renault");
   shape.Attributes.Add("data-brand", "renault");

and here is the template that renders it:
@using OrchardCore.DisplayManagement;
@model Car

@{
    var shape = Model as IShape;
    var tagBuilder = shape.GetTagBuilder();
}

@tagBuilder.RenderStartTag()
This is a car @Model.Brand with @Model.Color
@tagBuilder.RenderEndTag()

The page shows the rendered sentence, and in the HTML source the shape metadata appears on the wrapping element: the tag name, id, classes, and data attribute set above.

Code for this part is available on Github in the branch: Test_Shape_With_Metadata
References:

Orchard Harvest 2024: Demystifying Shapes, Part 1



General Delta Table processing
Sun, 07 Dec 2025 19:07:54 GMT
Python is an object-oriented programming language, and it supports the four main pillars of OOP: encapsulation, inheritance, polymorphism, and abstraction.
