Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
How are locks maintained within Delta Lake? For instance, let's say there are two simple tables, customer_details and orders. Let's say I am running a job that inserts an order into the orders table for $100 for a specific customerId, it ...
I am trying to read data into a dataframe from Azure SQL DB, using JDBC. Here is the code I am using.
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "server.database.windows.net"
database_port = "1433"
database_name = "dat...
With the introduction of Unity Catalog in Databricks, many of us have become familiar with creating catalogs. However, did you know that Unity Catalog also allows you to create foreign catalogs? You can register databases from the following s...
Hello! I am currently exploring the possibility of implementing incremental changes in our company's ETL pipeline and looking into the Change Data Feed option. There are a couple of challenges I'm uncertain about. For instance, we have a piece of logic lik...
Hello, I created a storage credential and an external location. The test is OK and I'm able to browse it from the portal. I have a notebook to create a table:
%sql
CREATE OR REPLACE TABLE myschema.mytable (data1 string, data2 string) USING DELTA LOCATION "abf...
Hi all, I've been trying to make use of some of the more recent tools for debugging in Databricks: pdb in the Databricks web interface with the variable explorer described in this article. I've also been trying to debug locally using the VSCode extensi...
Hi all, I'm trying to join 2 views in the SQL editor for some analysis. I get the following error: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.PARSE_DATETIME_BY_NEW_PARSER] You may get a different result due to the upgrading to Spark >= 3.0: Fail to parse '22/12/...
Hi Kaniz, I found the equivalent SQL code for this, but it didn't seem to store the setting past the execution. I.e., I would run the code to configure settings, then run the troublesome code afterwards and still get the same result. The problem has b...
Hi, I am wondering if multithreading will help with the performance of Z-ordering optimization on multiple Delta tables. We are periodically running optimization on thousands of tables and it easily takes a few days to finish the job. So we are looking...
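For readers wondering what "multithreading" could look like for the question above, here is a minimal sketch that submits one `OPTIMIZE ... ZORDER BY` statement per table through a thread pool. The function name `optimize_tables`, the `run_sql` callable, and the table and column names are all hypothetical; in a notebook `run_sql` would presumably be `spark.sql`. Whether this actually shortens the job depends on how much spare capacity the cluster has, so treat it as something to benchmark rather than a guaranteed win.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def optimize_tables(run_sql, tables, max_workers=4):
    """Run OPTIMIZE ... ZORDER BY for several tables concurrently.

    run_sql: callable that executes one SQL string (assumed: spark.sql).
    tables:  dict mapping table name -> list of Z-order columns.
    Returns (sorted list of tables that succeeded, dict of table -> exception).
    """
    def optimize_one(table, columns):
        run_sql(f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})")

    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its table so failures can be reported.
        futures = {
            pool.submit(optimize_one, table, columns): table
            for table, columns in tables.items()
        }
        for future in as_completed(futures):
            table = futures[future]
            try:
                future.result()
                succeeded.append(table)
            except Exception as exc:  # keep going if one table fails
                failed[table] = exc
    return sorted(succeeded), failed
```

Collecting failures instead of raising on the first one means a single problematic table does not abort a run over thousands of tables.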
I am using the %run command to import shared resources for each of my processes, because it was the easiest way to import my common libraries. However, that way, pyflakes can't resolve the dependencies very well. And I end up working in code with ma...
You could use something like flake8 and customize the rules in the .flake8 file, or ignore specific lines with # noqa.
https://flake8.pycqa.org/en/latest/user/configuration.html
Hi All, I am wondering if Pandas 2.x will be available soon, or whether it is an available option to install. I have a small job I built to manipulate some strings from a database table which technically did the job, but doesn't scale with older versions of pan...
I am attempting to ingest data into Databricks via CSV with the statement below, which brings in my data looking perfect: However, the bad part is that I have to group this data by summing the highlighted amt field. Given it is a string it spits ou...
Hello all, I'm new to Databricks and can't figure out why I'm getting an error in my SQL code. Error in SQL statement: ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'if'.(line 1, pos 0) == SQL == if OBJECT_ID('tempdb.#InitialData') IS N...
Hi folks! I would like to know if there is a way to pass parameters to a "run job" task. For example, let's have a Job A with: a notebook task A.1 that takes as input a parameter year-month in the format yyyymm; a "run job" task A.2 that calls a Job B. I wou...
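The general idea behind this kind of problem is to convert the string column to a numeric type before aggregating. The sketch below shows that step in plain Python on made-up data; the helper `sum_amounts`, the `customer` column, and the assumption that the strings may carry thousands separators or a currency sign are all illustrative and not taken from the original post. In Spark the corresponding step would be casting the column to a numeric type before the group-by.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def sum_amounts(csv_text, key_field, amount_field):
    """Group rows by key_field and sum amount_field, parsing it from text."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed clean-up: drop thousands separators and a currency sign.
        raw = row[amount_field].replace(",", "").replace("$", "").strip()
        totals[row[key_field]] += Decimal(raw)
    return dict(totals)
```

Using `Decimal` rather than `float` avoids rounding surprises when the field holds monetary amounts.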
I am trying to write a process that will programmatically update the “run_as_user_name” parameter for all jobs in an Azure Databricks workspace, using PowerShell to interact with the Jobs API. I have been trying to do this with a test job without suc...
The solution you've submitted is for a different topic (permission to run a job; the job still runs as the user in the run_as_user_name field). Here is an example of changing "run_as_user_name". Docs: https://docs.databricks.com/api/azure/workspace/job...
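Since the example in the reply above is cut off, here is a hedged sketch of what such a request body might look like. It assumes the Jobs API 2.1 `jobs/update` endpoint accepts a `run_as` object with a `user_name` inside `new_settings`; please verify this against the linked docs before relying on it. The helper name `build_run_as_update` and the sample values are hypothetical, and the sketch only builds the payload without sending anything.

```python
import json


def build_run_as_update(job_id, user_name):
    """Build a request body for changing a job's run-as user.

    Assumption: the body is POSTed to /api/2.1/jobs/update and the
    `run_as` setting takes a `user_name` field.
    """
    if not user_name:
        raise ValueError("user_name must not be empty")
    return {
        "job_id": job_id,
        "new_settings": {"run_as": {"user_name": user_name}},
    }


# Hypothetical values, for illustration only.
body = json.dumps(build_run_as_update(123, "someone@example.com"))
```

To cover every job in a workspace, the same payload would be built once per job ID returned by the jobs list call.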