Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

TeachingWithDat
by New Contributor II
  • 8297 Views
  • 3 replies
  • 2 kudos

I am getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

I am teaching a class for BYU Idaho and every table in every database has been imploded for my class. We keep getting this error: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: com.databricks.rpc.UnknownRemoteException: ...

Latest Reply
aparna123
New Contributor II
  • 2 kudos

I am facing this issue when I try to execute code. Error message: com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

2 More Replies
User16685683696
by Databricks Employee
  • 3151 Views
  • 1 replies
  • 2 kudos

Free Training: Databricks Lakehouse Fundamentals

The demand for technology roles is only growing – it's projected that over 150 million jobs will be added in the next five years. Across industries and regions, this is translating to increased demand f...

Latest Reply
Eddie_AZ
New Contributor II
  • 2 kudos

I watched all 4 videos but am getting an error when I try to take the test. How do I complete the test and get my badge?

Gaurav_Lokhande
by Databricks Partner
  • 4709 Views
  • 7 replies
  • 3 kudos

We are trying to connect to an AWS RDS MySQL instance from DBX with PySpark using JDBC

We are trying to connect to an AWS RDS MySQL instance from DBX with PySpark using JDBC: jdbc_df = (spark.read.format("jdbc").options(url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}", driver="com.mysql.cj.jdbc.Driver", dbtable="(SE...

Latest Reply
arjun_kr
Databricks Employee
  • 3 kudos

@Gaurav_Lokhande With Spark JDBC usage, connectivity happens between your Databricks VPC (in your AWS account) and the RDS VPC, assuming you are using non-serverless clusters. You may need to ensure this connectivity works (for example, through VPC peering).

6 More Replies
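
For the JDBC thread above, a minimal sketch of the two things worth checking separately: that the reader options are well formed, and that the cluster can open a TCP connection to the database at all. The helper names build_mysql_jdbc_options and can_reach are hypothetical, the host and credentials in the usage note are placeholders, and the sketch uses only the Python standard library, so it says nothing about what the original poster's truncated query contained.

```python
import socket
from typing import Dict


def build_mysql_jdbc_options(host: str, port: int, database: str,
                             user: str, password: str, dbtable: str) -> Dict[str, str]:
    """Build the option dict for spark.read.format("jdbc") (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",
        # dbtable may be a table name or a parenthesized subquery with an alias
        "dbtable": dbtable,
        "user": user,
        "password": password,
    }


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Attempt a plain TCP connect (hypothetical helper).

    False suggests a network-level problem (routing, security groups, DNS)
    rather than a problem with the JDBC driver or the credentials.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

On a cluster, something like can_reach(creds['host'], int(creds['port'])) could be run in a notebook cell first; if it returns False, the VPC connectivity mentioned in the reply is the likely place to look. If it returns True, the options can be passed on with spark.read.format("jdbc").options(**opts).load().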
trentlglover
by New Contributor
  • 1074 Views
  • 1 replies
  • 0 kudos

Notebooks running long in workflow

I have deployed a new Databricks environment for development. I've copied a workflow from production to this environment with exactly the same compute configuration. Four notebooks that complete within minutes do not complete after 2 hours in develop...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @trentlglover, It sounds like you're experiencing a significant performance issue with your notebooks in the new development environment. Here are a few potential areas to investigate: Cluster Configuration: Even though you mentioned that the comp...

RajeshRK
by Contributor II
  • 17431 Views
  • 7 replies
  • 3 kudos

Resolved! Download event, driver, and executor logs

Hi Team, I can see logs in the Databricks console by navigating to workflow -> job name -> logs. These logs are very generic, like stdout, stderr and log4j-active.log. How can I download event, driver, and executor logs at once for a job? Regards, Rajesh.

Latest Reply
RajeshRK
Contributor II
  • 3 kudos

@Kaniz Fatma @John Lourdu @Vidula Khanna Hi Team, I managed to download logs using the Databricks command line as below: Installed the Databricks command line on my Desktop (pip install databricks-cli). Configured the Databricks cluster URL and perso...

6 More Replies
hkmodi
by New Contributor II
  • 3613 Views
  • 3 replies
  • 0 kudos

Perform row_number() filter in autoloader

I have created an autoloader job that reads JSON data from S3 (files with no extension) using (cloudFiles.format, text). This job is supposed to run every 4 hours and read all the new data that has arrived. But before writing into a delta table...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @hkmodi, Basically, as @daniel_sahal said, the bronze layer should reflect the source system. The silver layer is dedicated to deduplication/cleaning/enrichment of the dataset. If you still need to deduplicate at the bronze layer, you have 2 options: - use me...

2 More Replies
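
For the row_number() thread above, a small plain-Python illustration of what the filter is meant to do: keep one row per key, the one with the highest ordering value, which is what ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts DESC) = 1 selects. The function name keep_latest_per_key and the column names id and ts are made up for the example; this is not the poster's code, and it is not either of the two options the reply goes on to list.

```python
from typing import Any, Dict, Iterable, List


def keep_latest_per_key(rows: Iterable[Dict[str, Any]],
                        key: str, order_by: str) -> List[Dict[str, Any]]:
    """Keep, for each key value, the row with the largest order_by value.

    On ties, the row seen first is kept. Output follows the order in which
    each key first appeared in the input.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


batch = [
    {"id": 1, "ts": 10, "payload": "a"},
    {"id": 2, "ts": 5, "payload": "b"},
    {"id": 1, "ts": 20, "payload": "c"},
]
deduped = keep_latest_per_key(batch, key="id", order_by="ts")
```

In a streaming job the same rule would typically be applied to each micro-batch, for instance inside a foreachBatch function using a Window partitioned by the key with row_number(). Note that this only removes duplicates within one batch; duplicates arriving in later batches need a separate step against the target table.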
vibhakar
by New Contributor
  • 6604 Views
  • 3 replies
  • 1 kudos

Not able to mount ADLS Gen2 in Databricks

py4j.security.Py4JSecurityException: Method public com.databricks.backend.daemon.dbutils.DBUtilsCore$Result com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.util.Map) is ...

Latest Reply
cpradeep
New Contributor III
  • 1 kudos

Hi, have you sorted out this issue? Can you please let me know the solution?

2 More Replies
fabien_arnaud
by New Contributor II
  • 3490 Views
  • 6 replies
  • 0 kudos

Data shifted when a pyspark dataframe column only contains a comma

I have a dataframe containing several columns, one of which contains, for one specific record, just a comma and nothing else. When displaying the dataframe with the command display(df_input.where(col("erp_vendor_cd") == 'B6SA-VEN0008838')) The data is dis...

Latest Reply
MilesMartinez
New Contributor II
  • 0 kudos

Thank you so much for the solution.

5 More Replies
oakhill
by New Contributor III
  • 1002 Views
  • 1 replies
  • 0 kudos

How to optimize queries on a 150B table? ZORDER, LC or partitioning?

Hi! I am struggling to understand how to properly manage my table to make queries effective. My table has columns date_time_utc, car_id, car_owner etc. date_time_utc, car_id and position are usually the ZORDER or Liquid Clustering columns. Selecting max...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

1. According to Databricks, yes. But as always, I recommend running benchmarks yourself. There are a lot of blog posts saying that it's not always the case. Yesterday, I was at a data community event and the presenter did several benchmarks and ...

AlvaroCM
by Databricks Partner
  • 1741 Views
  • 2 replies
  • 0 kudos

Resolved! DLT error at validation

Hello, I'm creating a DLT pipeline with Databricks on AWS. After creating an external location for my bucket, I encountered the following error: DataPlaneException: [DLT ERROR CODE: CLUSTER_LAUNCH_FAILURE.CLIENT_ERROR] Failed to launch pipeline cluster...

Latest Reply
AlvaroCM
Databricks Partner
  • 0 kudos

Hi! The error was related to the roles and permissions created when the workspace was set up. I reloaded the setup script in a new workspace, and it worked without problems. Hope it helps anyone in the future. Thanks!

1 More Replies
AntonDBUser
by New Contributor III
  • 1919 Views
  • 1 replies
  • 2 kudos

Lakehouse Federation with OAuth connection to Snowflake

Hi! We have a lot of use cases where we need to load data from Snowflake into Databricks, where users are using both R and Python for further analysis and machine learning. For this we have been using Lakehouse Federation combined with basic auth, but are...

Latest Reply
AntonDBUser
New Contributor III
  • 2 kudos

For anyone interested: We solved this by building an OAuth integration to Snowflake ourselves using Entra ID: https://community.snowflake.com/s/article/External-oAuth-Token-Generation-using-Azure-AD We also created some simple Python and R packages tha...

JonHMDavis
by New Contributor II
  • 7666 Views
  • 5 replies
  • 2 kudos

Graphframes not importing on Databricks 9.1 LTS ML

Is Graphframes for Python meant to be installed by default on Databricks 9.1 LTS ML? Previously I was running the attached Python command on 7.3 LTS ML with no issue, however now I am getting "no module named graphframes" when trying to import the pa...

Latest Reply
malz
Databricks Partner
  • 2 kudos

Hi @MuthuLakshmi, According to the documentation, graphframes comes preinstalled in Databricks Runtime for Machine Learning, but when trying to import the Python module of graphframes, I get a "no module found" error. from graphframes i...

4 More Replies
naveenreddy1
by New Contributor II
  • 20493 Views
  • 4 replies
  • 0 kudos

Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace

We are using a 3-node Databricks cluster with 32 GB memory. It works fine, but sometimes it throws the error: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.

Latest Reply
RodrigoDe_Freit
Databricks Partner
  • 0 kudos

If your job fails, follow this: According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and ma...

3 More Replies
ArturOA
by New Contributor III
  • 6612 Views
  • 7 replies
  • 0 kudos

Attaching to Serverless from Azure Data Factory via Service Principal

Hi, We have issues trying to run Databricks notebooks orchestrated with Azure Data Factory. We have been doing this for a while now without any issues when we use Job Clusters, Existing General Purpose Clusters, or Cluster Pools. We use an Azure Data ...

Latest Reply
h_h_ak
Contributor
  • 0 kudos

Does the service principal have access and permission to the notebook?

6 More Replies
HamidHamid_Mora
by New Contributor II
  • 5434 Views
  • 4 replies
  • 3 kudos

Ganglia is unavailable on DBR 13.0

We created a library in Databricks to ingest Ganglia metrics for all jobs in our Delta tables. However, endpoint 8652 is no longer available on DBR 13.0. Is there any other endpoint available? We need to log all metrics for all executed jobs not on...

Latest Reply
h_h_ak
Contributor
  • 3 kudos

You should have a look here: https://community.databricks.com/t5/data-engineering/azure-databricks-metrics-to-prometheus/td-p/71569

3 More Replies