Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sathya08
by New Contributor III
  • 4363 Views
  • 3 replies
  • 1 kudos

Databricks Python function achieving Parallelism

Hello everyone, I have a very basic question regarding Databricks Spark parallelism. I have a Python function within a for loop, so I believe this is running sequentially. The Databricks cluster is enabled with Photon and with Spark 15.x; does that mean the driver...
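A note on the situation described: Photon accelerates Spark query execution, not driver-side Python, so a plain for loop over a Python function still runs one call at a time. One common pattern to fan the calls out is a driver-side thread pool; a minimal sketch, where `process_table` is a hypothetical stand-in for the per-iteration work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_table(name):
    # Placeholder for the per-iteration work (e.g. kicking off a Spark read/write per table).
    return f"processed {name}"

tables = ["a", "b", "c", "d"]

# Submit the calls concurrently instead of one per loop iteration.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_table, tables))

print(results)
```

Each thread can trigger its own Spark actions, which the Spark scheduler runs concurrently if the cluster has free capacity.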

Latest Reply
sathya08
New Contributor III
  • 1 kudos

Any help here? Thanks.

2 More Replies
TeachingWithDat
by New Contributor II
  • 7667 Views
  • 3 replies
  • 2 kudos

I am getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

I am teaching a class for BYU Idaho, and every table in every database has been imploded for my class. We keep getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: ...

Latest Reply
aparna123
New Contributor II
  • 2 kudos

I am facing the same issue when trying to execute code. Error message: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

2 More Replies
User16685683696
by Databricks Employee
  • 2378 Views
  • 1 reply
  • 2 kudos

Free Training: Databricks Lakehouse Fundamentals

Free Training: Databricks Lakehouse Fundamentals. The demand for technology roles is only growing – it's projected that over 150 million jobs will be added in the next five years. Across industries and regions, this is translating to increased demand f...

Latest Reply
Eddie_AZ
New Contributor II
  • 2 kudos

I watched all 4 videos but am getting an error when I try to take the test. How do I complete the test and get my badge?

Gaurav_Lokhande
by New Contributor II
  • 3081 Views
  • 7 replies
  • 3 kudos

We are trying to connect to AWS RDS MySQL instance from DBX with PySpark using JDBC

We are trying to connect to an AWS RDS MySQL instance from DBX with PySpark using JDBC: jdbc_df = (spark.read.format("jdbc").options(url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}", driver="com.mysql.cj.jdbc.Driver", dbtable="(SE...
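For reference, a hedged completion of the truncated snippet above, wrapped in a function. The `creds` keys and the query are assumptions carried over from the snippet; note that a subquery passed as `dbtable` must be given an alias:

```python
# Sketch only: assumes a live SparkSession (`spark`), a `creds` dict with
# host/port/database/user/password keys, and the MySQL Connector/J driver
# installed on the cluster.
def read_mysql_query(spark, creds, query):
    return (
        spark.read.format("jdbc")
        .options(
            url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}",
            driver="com.mysql.cj.jdbc.Driver",
            dbtable=f"({query}) AS t",  # a subquery must be aliased
            user=creds["user"],
            password=creds["password"],
        )
        .load()
    )
```

As the reply below notes, the read will still hang or time out if there is no network path (e.g. VPC peering) between the Databricks VPC and the RDS VPC.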

Latest Reply
arjun_kr
Databricks Employee
  • 3 kudos

@Gaurav_Lokhande  With Spark JDBC usage, connectivity happens between your Databricks VPC (in your AWS account) and RDS VPC, assuming you are using non-serverless clusters. You may need to ensure this connectivity works (like by peering).

6 More Replies
trentlglover
by New Contributor
  • 719 Views
  • 1 reply
  • 0 kudos

Notebooks running long in workflow

I have deployed a new Databricks environment for development. I've copied a workflow from production to this environment with exactly the same compute configuration. Four notebooks that complete within minutes do not complete after 2 hours in develop...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @trentlglover, It sounds like you're experiencing a significant performance issue with your notebooks in the new development environment. Here are a few potential areas to investigate: Cluster Configuration: Even though you mentioned that the comp...

isai-ds
by New Contributor
  • 708 Views
  • 0 replies
  • 0 kudos

Salesforce LakeFlow connect - Deletion Salesforce records

Hello, I am new to Databricks and to data engineering. I am running a POC to sync data between a Salesforce sandbox and Databricks using LakeFlow Connect. I have already made the connection, and I successfully synced data between Salesforce and Databr...

RajeshRK
by Contributor II
  • 14987 Views
  • 7 replies
  • 3 kudos

Resolved! Download event, driver, and executor logs

Hi Team, I can see logs in the Databricks console by navigating workflow -> job name -> logs. These logs are very generic, like stdout, stderr, and log4j-active.log. How do I download event, driver, and executor logs at once for a job? Regards, Rajesh.

Latest Reply
RajeshRK
Contributor II
  • 3 kudos

@Kaniz Fatma @John Lourdu @Vidula Khanna Hi Team, I managed to download logs using the Databricks command line as below: installed the Databricks command line on my desktop (pip install databricks-cli), configured the Databricks cluster URL and perso...
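A sketch of the CLI route the reply describes, for anyone landing here later. It assumes the cluster was configured with cluster log delivery to a DBFS path; the `dbfs:/cluster-logs/<cluster-id>` location and cluster ID below are assumptions, not values from the thread:

```shell
# Legacy Databricks CLI, as used in the reply above.
pip install databricks-cli
databricks configure --token            # prompts for workspace URL and a personal access token

# With cluster log delivery enabled, driver, executor, and event logs are
# written under one folder per cluster; copy the whole folder at once.
databricks fs cp -r dbfs:/cluster-logs/0123-456789-abcde1 ./cluster-logs
```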

6 More Replies
hkmodi
by New Contributor II
  • 2573 Views
  • 3 replies
  • 0 kudos

Perform row_number() filter in autoloader

I have created an autoloader job that reads data from S3 (files with no extension) containing JSON, using (cloudFiles.format, text). Now this job is supposed to run every 4 hours and read all the new data that arrived. But before writing into a delta table...
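One practical note: streaming DataFrames do not support non-aggregate window functions like row_number() directly, so the usual pattern is to apply them per micro-batch inside a foreachBatch sink. A minimal sketch, with key and ordering column names as assumptions:

```python
def dedupe_latest(df, key_cols, order_col):
    """Keep only the newest row per key within a (micro-)batch DataFrame."""
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w = Window.partitionBy(*key_cols).orderBy(F.col(order_col).desc())
    return (
        df.withColumn("rn", F.row_number().over(w))
          .where(F.col("rn") == 1)
          .drop("rn")
    )

# Hooked into the Auto Loader stream, something like (sketch):
# (df.writeStream
#    .foreachBatch(lambda batch, _: dedupe_latest(batch, ["id"], "event_ts")
#                  .write.mode("append").saveAsTable("bronze.events"))
#    .trigger(availableNow=True)
#    .start())
```

Note this deduplicates within each batch only; duplicates across batches need a MERGE or the silver-layer approach the reply below recommends.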

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @hkmodi, basically, as @daniel_sahal said, the bronze layer should reflect the source system. The silver layer is dedicated to deduplication/cleaning/enrichment of the dataset. If you still need to deduplicate at the bronze layer you have 2 options: - use me...

2 More Replies
vibhakar
by New Contributor
  • 5895 Views
  • 3 replies
  • 1 kudos

Not able to mount ADLS Gen2 in Databricks

py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.util.Map) is ...

Latest Reply
cpradeep
New Contributor III
  • 1 kudos

Hi, have you sorted this issue? Can you please let me know the solution?

2 More Replies
fabien_arnaud
by New Contributor II
  • 2362 Views
  • 6 replies
  • 0 kudos

Data shifted when a pyspark dataframe column only contains a comma

I have a dataframe containing several columns, one of which contains, for one specific record, just a comma, nothing else. When displaying the dataframe with the command display(df_input.where(col("erp_vendor_cd") == 'B6SA-VEN0008838')), the data is dis...

Latest Reply
MilesMartinez
New Contributor II
  • 0 kudos

Thank you so much for the solution.

5 More Replies
oakhill
by New Contributor III
  • 633 Views
  • 1 reply
  • 0 kudos

How to optimize queries on a 150B table? ZORDER, LC, or partitioning?

Hi! I am struggling to understand how to properly manage my table to make queries effective. My table has columns date_time_utc, car_id, car_owner, etc. date_time_utc, car_id and position are usually the ZORDER or Liquid Clustering columns. Selecting max...
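For context on the LC option: on recent runtimes, Liquid Clustering is declared on the table itself rather than per-OPTIMIZE as ZORDER is. A sketch of switching an existing Delta table to clustering on the columns mentioned in the post (the table name and the helper are assumptions):

```python
def apply_liquid_clustering(spark, table="cars", cols=("date_time_utc", "car_id")):
    # Declare Liquid Clustering keys on an existing Delta table,
    # then request an incremental re-layout of existing files.
    spark.sql(f"ALTER TABLE {table} CLUSTER BY ({', '.join(cols)})")
    spark.sql(f"OPTIMIZE {table}")
```

Unlike ZORDER, the clustering keys can later be changed with another ALTER TABLE without rewriting the whole table up front.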

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

1. According to Databricks, yes. But as always, I recommend performing benchmarks yourself. There are a lot of blog posts saying that it's not always the case. Yesterday, I was at a data community event and the presenter did several benchmarks and ...

AlvaroCM
by New Contributor III
  • 1223 Views
  • 2 replies
  • 0 kudos

Resolved! DLT error at validation

Hello, I'm creating a DLT pipeline with Databricks on AWS. After creating an external location for my bucket, I encountered the following error: DataPlaneException: [DLT ERROR CODE: CLUSTER_LAUNCH_FAILURE.CLIENT_ERROR] Failed to launch pipeline cluster...

Latest Reply
AlvaroCM
New Contributor III
  • 0 kudos

Hi! The error was related to the roles and permissions created when the workspace was set up. I reloaded the setup script in a new workspace, and it worked without problems. Hope it helps anyone in the future. Thanks!

1 More Replies
AntonDBUser
by New Contributor III
  • 1319 Views
  • 1 reply
  • 2 kudos

Lakehouse Federation with OAuth connection to Snowflake

Hi! We have a lot of use cases where we need to load data from Snowflake into Databricks, where users are using both R and Python for further analysis and machine learning. For this we have been using Lakehouse Federation combined with basic auth, but are...

Latest Reply
AntonDBUser
New Contributor III
  • 2 kudos

For anyone interested: we solved this by building an OAuth integration to Snowflake ourselves using Entra ID: https://community.snowflake.com/s/article/External-oAuth-Token-Generation-using-Azure-AD. We also created some simple Python and R packages tha...

JonHMDavis
by New Contributor II
  • 6316 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
New Contributor II
  • 2 kudos

Hi @MuthuLakshmi, as per the documentation, graphframes comes preinstalled in the Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes, I get a "no module found" error. from graphframes i...
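If the Python module really is absent on a given runtime image, one workaround (an assumption, not official guidance) is to install the PyPI `graphframes` wrapper on the cluster, e.g. via `%pip install graphframes`, and then probe the import defensively:

```python
def import_graphframes():
    # Returns the GraphFrame class if the library is importable on this
    # cluster, or None if the Python module is missing (the error in the
    # thread above). The underlying Scala JAR must also be on the cluster.
    try:
        from graphframes import GraphFrame
        return GraphFrame
    except ModuleNotFoundError:
        return None
```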

4 More Replies
naveenreddy1
by New Contributor II
  • 19585 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a Databricks 3-node cluster with 32 GB memory. It is working fine, but sometimes it automatically throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 0 kudos

If your job fails, follow this. According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

3 More Replies
