Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe, and then writes this data...
@daniel_sahal, this is the code snippet:

lsn_incr_batch = spark.sql(f"""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn,
           cast('{current_run_ts}' as timestamp) as appended
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '...
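For the caching question above, here is a minimal sketch (not from the thread): cache() marks the dataframe for in-memory storage and the first action populates it, so later transformations and the write reuse the cached rows instead of re-querying the federated source. current_run_ts is the variable from the poster's snippet, and watermark_ts is a purely hypothetical stand-in for the truncated filter.

lsn_incr_batch = spark.sql(f"""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn,
           cast('{current_run_ts}' as timestamp) as appended
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '{watermark_ts}'  -- hypothetical placeholder for the truncated predicate
""").cache()

lsn_incr_batch.count()  # an action forces the cache to be populated once
# subsequent transformations and the final write read from the cached data,
# not from the federated source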
I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...
@amar1995 - Can you try this streaming approach and see if it works for your use case (using autoloader) - https://kb.databricks.com/streaming/stream-xml-auto-loader
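For reference, a minimal sketch of the streaming approach the KB article points at, assuming a Databricks Runtime with native XML support for Auto Loader (14.3+); the paths, rowTag, and target table are illustrative placeholders.

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "xml")          # Auto Loader picks up new XML files incrementally
    .option("rowTag", "record")                  # placeholder: the XML element that maps to one row
    .option("cloudFiles.schemaLocation", "/Volumes/main/default/chk/xml_schema")
    .load("/Volumes/main/default/landing/xml/")
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/chk/xml_stream")
    .trigger(availableNow=True)                  # process what is available, then stop
    .toTable("main.default.xml_bronze")          # placeholder target table
)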
I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I have already created Databricks notebooks that run these queries. The Flask server will pass some parameters in ...
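One common pattern for this, sketched below rather than taken from the thread, is to wrap the existing notebook in a Databricks job and trigger it from Flask through the Jobs API; the host, job ID, and token handling are illustrative.

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
JOB_ID = 123  # placeholder: a job that runs the existing notebook

def trigger_notebook_job(token: str, params: dict) -> int:
    """Trigger the job, passing Flask request parameters as notebook widget values."""
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": JOB_ID, "notebook_params": params},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["run_id"]  # poll /api/2.1/jobs/runs/get with this run_id for completion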
I would like to share the following links:
https://www.databricks.com/product/machine-learning/large-language-models
https://docs.databricks.com/en/large-language-models/index.html
Following the instructions on Job Parameter dynamic values, I am able to use {{job.id}}, {{job.name}}, {{job.run_id}}, {{job.repair_count}}, and {{job.start_time.[argument]}}. However, when I set trigger_type as trigger_type: {{job.trigger.type}} and hit SAVE, ...
Hi all! I'm in a project where I need to connect Azure DevOps and Databricks using a managed identity, to avoid using service accounts, PATs, etc. The thing is, I can't move forward with the connection since I cannot take ownership of the files wh...
Hi Team,
Is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/
Regards, Janga
Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....
Thanks a lot. Then I guess we will try to use dbmanagedidentity for most of our needs, and create service principals + secret scopes when there are more specific needs, such as for limiting access to sensitive data. A bit of a hassle to scale, probabl...
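As a rough sketch of that approach, not taken from the thread: with the Databricks SDK for Python, a Databricks-backed secret scope can be created and read access granted to a specific service principal. The scope name, secret key, and application ID below are placeholders.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import AclPermission

w = WorkspaceClient()  # picks up workspace auth from the environment

scope_name = "sensitive-data"                               # placeholder scope name
sp_application_id = "00000000-0000-0000-0000-000000000000"  # placeholder service principal

w.secrets.create_scope(scope=scope_name)
w.secrets.put_secret(scope=scope_name, key="storage-key", string_value="<secret-value>")
w.secrets.put_acl(scope=scope_name, principal=sp_application_id, permission=AclPermission.READ)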
Hi all, I have recently enabled Unity Catalog in my DBX workspace. I have created a new catalog with an external location on Azure data storage. I can create new schemas (databases) in the new catalog, but I can't create a table. I get the below error wh...
@Snoonan First of all, check the networking tab on the storage account to see if it's behind a firewall. If it is, make sure that Databricks/Storage networking is properly configured (https://learn.microsoft.com/en-us/azure/databricks/security/network/...
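Once networking and external-location permissions check out, a quick way to re-test table creation is a small smoke test like the sketch below; the catalog, schema, and table names are placeholders.

spark.sql("CREATE SCHEMA IF NOT EXISTS my_catalog.bronze")  # placeholder catalog/schema
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_catalog.bronze.smoke_test (id INT, note STRING)
""")
spark.sql("INSERT INTO my_catalog.bronze.smoke_test VALUES (1, 'write path ok')")
display(spark.table("my_catalog.bronze.smoke_test"))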
Hello Community,
Can someone help refactor the following T-SQL code to Databricks SQL?

CONVERT(DECIMAL(26, 8), ISNULL(xxx.xxxxxxx * ISNULL(RH.xxxxx, 1 / NULLIF(ST.xxxxxx, 0)), ST.xxxxx)) AS Amount

When I attempt to execute the above code I get the followi...
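A hedged sketch of one possible rewrite: Databricks SQL does not support T-SQL's CONVERT(type, expr) or two-argument ISNULL, but CAST and COALESCE (with NULLIF, which works as-is) express the same logic. The obfuscated table and column names are kept from the post.

refactored_expr = """
    CAST(
        COALESCE(
            xxx.xxxxxxx * COALESCE(RH.xxxxx, 1 / NULLIF(ST.xxxxxx, 0)),
            ST.xxxxx
        ) AS DECIMAL(26, 8)
    ) AS Amount
"""
# Drop the expression into the original SELECT, e.g. spark.sql(f"SELECT {refactored_expr} FROM ...")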
Hello - We're migrating from T-SQL to Spark SQL and have a significant number of queries to migrate. "datediff(unit, start, end)" is different between these two implementations (in a good way). For the purpose of migration, we'd like to stay as consiste...
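For context, a small illustration of the semantics in Databricks SQL (assuming DBR 10.4+ for timestampdiff): the two-argument datediff(end, start) counts days, while timestampdiff(unit, start, end) returns whole elapsed units, unlike T-SQL's DATEDIFF, which counts unit boundaries crossed. The date literals are made up for the example.

df = spark.sql("""
    SELECT
        datediff(DATE'2024-01-01', DATE'2023-12-31') AS days_between,  -- 1 day: datediff(end, start)
        timestampdiff(
            YEAR,
            TIMESTAMP'2023-12-31 00:00:00',
            TIMESTAMP'2024-01-01 00:00:00'
        ) AS elapsed_years  -- 0: no full year elapsed, whereas T-SQL DATEDIFF(YEAR, ...) returns 1
""")
df.show()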
Good morning, I have a DLT process with CDC incremental load and I need to ingest the history, as the CDC transactions are only recent. To do this I need to ingest data into the __databricks_internal catalog. In my case, as I am a full admin, I can do it, how...
I have a situation where source files in .json.gz sometimes arrive with invalid syntax containing multiple roots separated by empty brackets []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...
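One way to fail fast on such files, sketched under the assumption that each file should contain exactly one JSON document: decompress the file and attempt a single json.loads before the normal spark.read.json pass, since any extra roots after the first document raise an "Extra data" error. The helper name and paths are illustrative.

import gzip
import json

def validate_single_json_root(path: str) -> None:
    """Raise if a .json.gz file does not parse as exactly one JSON document."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        payload = f.read()
    try:
        json.loads(payload)  # multiple roots (e.g. doc [] doc) fail with 'Extra data'
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON in {path}: {e}") from e

# Illustrative usage before the regular ingest, via the /dbfs FUSE mount:
# for f in dbutils.fs.ls("dbfs:/mnt/landing/"):
#     validate_single_json_root("/dbfs" + f.path.removeprefix("dbfs:"))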