Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Rahul_Samant
by Contributor
  • 4814 Views
  • 4 replies
  • 1 kudos

Resolved! Spark SQL Connector

I am trying to read data from an Azure SQL database from Databricks. The Azure SQL database is created with a private link endpoint. Using a DBR 10.4 LTS cluster, and the expectation is that the connector is pre-installed as per the documentation. Using the below code to fetch...

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

It seems that .option("databaseName", "test") is redundant here, as you need to include the database name in the URL. Please verify that you are using a connector compatible with your cluster's Spark version: Apache Spark connector: SQL Server & Azure SQL
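
For reference, a minimal sketch of a read with the database name embedded in the JDBC URL; the server, table, and credentials below are placeholders:

```python
# Minimal sketch, assuming the Apache Spark connector for SQL Server & Azure SQL
# is available on the cluster; all connection values are placeholders.
df = (spark.read
      .format("com.microsoft.sqlserver.jdbc.spark")
      # databaseName goes in the URL, so no separate .option("databaseName", ...)
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=test")
      .option("dbtable", "dbo.my_table")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())
```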

3 More Replies
mick042
by New Contributor III
  • 1428 Views
  • 1 replies
  • 0 kudos

Does spark utilise a temporary stage when writing to snowflake? How does that work?

Folks, when I want to push data to Snowflake, I need to use a stage for files before copying data over. However, when I utilise the net.snowflake.spark.snowflake.Utils library and do a spark.write as in... spark.read.format("csv").option("header", ...

Latest Reply
mick042
New Contributor III
  • 0 kudos

Yes, it uses a temporary stage. I should have just looked in the Snowflake query history.
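
For reference, a minimal write sketch; the connector stages the data automatically, so no explicit stage is created by the user. Connection values are placeholders:

```python
# Minimal sketch, assuming the Snowflake Spark connector is installed on the
# cluster; all connection values below are placeholders.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# The connector stages the DataFrame (a temporary internal stage by default)
# and issues a COPY INTO behind the scenes.
(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "TARGET_TABLE")
   .mode("append")
   .save())
```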

165036
by New Contributor III
  • 2057 Views
  • 3 replies
  • 1 kudos

Resolved! Error message when editing schedule cron expression on job

When attempting to edit the schedule cron expression on one of our jobs, we receive the following error message: Cluster validation error: Validation failed for spark_conf, spark.databricks.acl.dfAclsEnabled must be false (is "true"). The spark.databric...

Latest Reply
165036
New Contributor III
  • 1 kudos

FYI this was a temporary Databricks bug. Seems to be resolved now.

2 More Replies
Anonymous
by Not applicable
  • 869 Views
  • 0 replies
  • 4 kudos

Happy August! On August 25th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have...

Happy August! On August 25th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social! Join us for our August ...

AP
by New Contributor III
  • 4373 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE and VACUUM commands: order, production implementation best practices

So Databricks gives us a great toolkit in the form of optimization and vacuum. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...
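
For context, a minimal sketch of the knobs being discussed; the table name is a placeholder and the retention value shown is just the default:

```python
# Session-level settings (can also be set as table properties or cluster conf).
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Periodic maintenance, typically run as a scheduled job:
spark.sql("OPTIMIZE my_db.my_table ZORDER BY (event_date)")  # compact + co-locate
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")          # 7-day default retention
```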

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
Jayesh
by New Contributor III
  • 2849 Views
  • 5 replies
  • 3 kudos

Resolved! How can we do data copy from Databricks SQL using notebook?

Hi Team, we have a scenario where we have to connect to Databricks SQL instance 1 from another Databricks instance 2 using a notebook or Azure Data Factory. Can you please help?
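
One common pattern (a hedged sketch, not necessarily the approach settled on in the thread) is to read from the remote workspace's SQL warehouse over JDBC; the hostname, HTTP path, and token below are placeholders:

```python
# Hypothetical sketch: read a table from another workspace's Databricks SQL
# warehouse over JDBC. Assumes the Databricks JDBC driver is installed on the
# cluster; host, HTTP path, and personal access token are placeholders.
jdbc_url = (
    "jdbc:databricks://<workspace-host>:443/default;"
    "transportMode=http;ssl=1;AuthMech=3;"
    "httpPath=<sql-warehouse-http-path>;"
    "UID=token;PWD=<personal-access-token>"
)

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("driver", "com.databricks.client.jdbc.Driver")
      .option("dbtable", "my_schema.my_table")
      .load())
```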

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for jumping in to help, @Arvind Ravish, @Hubert Dudek and @Artem Sheiko!

4 More Replies
Jeade
by New Contributor II
  • 3176 Views
  • 3 replies
  • 1 kudos

Resolved! Pulling data from Azure Boards into databricks

Looking for best practices/examples on how to pull data (epics, features, PBIs) from Azure Boards into Databricks for analysis. Any ideas/help appreciated!

Latest Reply
artsheiko
Databricks Employee
  • 1 kudos

You can use export to CSV (link), push the file to storage mounted to Databricks, or just import the obtained file to DBFS.
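
For example, once the exported CSV lands in DBFS, a minimal read sketch (the path is a placeholder):

```python
# Minimal sketch: load an Azure Boards CSV export that was uploaded to DBFS.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("dbfs:/FileStore/azure_boards_export.csv"))
df.display()
```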

2 More Replies
cralle
by New Contributor II
  • 6452 Views
  • 7 replies
  • 2 kudos

Resolved! Cannot display DataFrame when I filter by length

I have a DataFrame that I have created based on a couple of datasets and multiple operations. The DataFrame has multiple columns, one of which is an array of strings. But when I take the DataFrame and try to filter based upon the size of this array co...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Strange, works fine here. What version of Databricks are you on? What you could do to identify the issue is to output the query plan (.explain). Also, creating a new df for each transformation could help. That way you can check step by step where...
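
For reference, the pattern under discussion as a minimal sketch; the column name is a placeholder:

```python
from pyspark.sql import functions as F

# Filter rows by the length of an array column.
filtered = df.filter(F.size(F.col("tags")) > 2)

# Inspect the query plan step by step, as suggested above.
filtered.explain()
filtered.display()
```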

6 More Replies
tej1
by New Contributor III
  • 3908 Views
  • 5 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live Tables pipeline where we ingest CSV files in AWS S3 using cloudFiles, and it is necessary to access the file modification timestamp of the file. As documented here, we tried selecting the `_metadata` column in a task in delta live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test the `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend production workloads when using "preview" mode, but nevertheless, glad to be using this feature in DLT.
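
A minimal sketch of the pattern, assuming a DLT channel whose runtime exposes the `_metadata` column; the bucket path and table name are placeholders:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
def bronze_events():
    # _metadata is only populated when selected explicitly.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load("s3://my-bucket/landing/")
        .select("*", F.col("_metadata.file_modification_time").alias("file_mod_time"))
    )
```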

4 More Replies
alexgv12
by New Contributor III
  • 2774 Views
  • 2 replies
  • 3 kudos

Delta table: separate gold zone by tenant

Hello, currently we have a process that builds the bronze and silver zones with Delta tables, and when it reaches gold we must create specific zones for each client because the schema changes. For this we create databases and separate tables, but when ...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 3 kudos

Hi @alexander grajales vanegas, are you creating all the databases and tables in the gold zone manually? If so, please check out DLT (https://docs.databricks.com/data-engineering/delta-live-tables/index.html); it will take care of your complete pipeline by ...
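
One way DLT can generate per-tenant gold tables is its table-in-a-loop metaprogramming pattern; a hedged sketch where the tenant list and source table name are placeholders:

```python
import dlt

tenants = ["acme", "globex"]  # placeholder tenant list

def make_gold(tenant):
    # Factory function binds `tenant` per iteration, avoiding late binding.
    @dlt.table(name=f"gold_{tenant}")
    def gold():
        return dlt.read("silver_events").where(f"tenant_id = '{tenant}'")
    return gold

for t in tenants:
    make_gold(t)
```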

1 More Replies
GKKarthi
by New Contributor
  • 5459 Views
  • 6 replies
  • 2 kudos

Resolved! Databricks - Simba SparkJDBCDriver 500550 exception

We have a Denodo big data platform hosted on Databricks. Recently we have been facing the exception with message '[Simba][SparkJDBCDriver](500550)' with Databricks, which interrupts the Databricks connection after a certain time interval, usuall...

Latest Reply
PFBOLIVEIRA
New Contributor II
  • 2 kudos

Hi All, we are also experiencing the same behavior: [Simba][SimbaSparkJDBCDriver] (500550) The next rowset buffer is already marked as consumed. The fetch thread might have terminated unexpectedly. Foreground thread ID: xxxx. Background thread ID: yyyy...

5 More Replies
pankaj92
by New Contributor II
  • 4534 Views
  • 4 replies
  • 0 kudos

Extract latest files from ADLS Gen2 mount point in Databricks using PySpark

Hi Team, I am trying to get the latest files from an ADLS mount point directory. I am not sure how to extract the latest files and their last modified date using PySpark from an ADLS Gen2 storage account. Please let me know asap. Thanks! I am looking forward to your re...

Latest Reply
Sha_1890
New Contributor III
  • 0 kudos

Hi @pankaj92, I wrote Python code to pick the latest file from the mnt location:

import os

path = "/dbfs/mnt/xxxx"
filelist = []
for file_item in os.listdir(path):
    filelist.append(file_item)
file = len(filelist)
print(filelist[file - 1])

Thanks
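
Note that os.listdir does not guarantee any ordering by modification time, so a more robust variant sorts on mtime; a minimal sketch with a placeholder path:

```python
import os

# Pick the most recently modified file under a mount point; /dbfs/... exposes
# the mount through the driver's local FUSE view. Path is a placeholder.
path = "/dbfs/mnt/xxxx"
latest = max(
    (os.path.join(path, f) for f in os.listdir(path)),
    key=os.path.getmtime,  # sort key: last-modified timestamp
)
print(latest)
```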

3 More Replies
ivanychev
by Contributor II
  • 8208 Views
  • 5 replies
  • 2 kudos

Resolved! How to find out why the cluster is in PENDING state for so long?

I'm using Databricks on AWS. Our clusters are typically in PENDING state for 5-8 minutes after they are created. I would like to find out why (EC2 instance provisioning? Slow Docker image download? ...?). The cluster logs are not helpful enough be...

Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Sergey Ivanychev, while the cluster is starting you can see the status on the compute page. Hover the mouse pointer over the green rotating circle to the left of the cluster name. It will give a notification of what is happening on the cluster. Wh...
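
For a programmatic view, the cluster event log can also be pulled via the Clusters API; a hedged sketch where the host, token, and cluster ID are placeholders:

```python
import requests

# Pull recent cluster lifecycle events to see where startup time goes.
host = "https://<workspace-host>"
token = "<personal-access-token>"

resp = requests.post(
    f"{host}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": "<cluster-id>", "limit": 50},
)
for event in resp.json().get("events", []):
    print(event["timestamp"], event["type"])
```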

4 More Replies
