Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jcozar
by Contributor
  • 5515 Views
  • 5 replies
  • 2 kudos

Resolved! CDC and raw data

Hi, I am using Debezium Server to send data from Postgres to a Kafka topic (in fact, Azure Event Hubs). My question is: what are the best practices and recommendations for saving raw data and then implementing a medallion architecture? For clarification, I wa...

Latest Reply
jcozar
Contributor
  • 2 kudos

Thank you very much @Palash01 ! It has been really helpful!

  • 2 kudos
4 More Replies
rhevarr
by New Contributor II
  • 853 Views
  • 1 reply
  • 0 kudos

Course: Apache Spark Programming with Databricks ID: E-P0W7ZV // Issue Classroom-Setup

Hello, I am trying to run the Classroom-Setup from the course files notebook (ASP 1.1 - Databricks Platform) (Course: Apache Spark™ Programming with Databricks, ID: E-P0W7ZV). Instructions: "Setup: Run classroom setup to mount Databricks training datasets an...

Data Engineering
academy
Course
Databricks
spark
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @rhevarr, Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).

  • 0 kudos
karthik-kobai
by New Contributor II
  • 958 Views
  • 1 reply
  • 0 kudos

Databricks-jdbc and vulnerabilities CVE-2021-36090 CVE-2023-6378 CVE-2023-6481

The latest version of Databricks-jdbc available through Maven (2.6.36) now has these three vulnerabilities:
https://www.cve.org/CVERecord?id=CVE-2021-36090
https://www.cve.org/CVERecord?id=CVE-2023-6378
https://www.cve.org/CVERecord?id=CVE-2023-6481
All ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @karthik-kobai, Thank you for bringing this to my attention! Let’s address the vulnerabilities in the Databricks JDBC driver. The current version of the Databricks JDBC driver you mentioned is 2.6.36. It appears that this version has dependen...

  • 0 kudos
subrat
by New Contributor
  • 1098 Views
  • 1 reply
  • 0 kudos

Missing 'DBAcademy DLT' as a Cluster Policy when creating Delta Live Tables pipeline

Hi there, I'm currently going through Module 4 of the Data Engineering Associate pathway, specifically lesson 4.1 - DLT UI Walkthrough. We are instructed to specify the Cluster Policy as 'DBAcademy DLT' when configuring the pipeline. However, this opt...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @subrat, Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).

  • 0 kudos
rt-slowth
by Contributor
  • 2542 Views
  • 5 replies
  • 0 kudos

Error: If you expect to delete or update rows to the source table in the future...

Flow 'user_silver' has FAILED fatally. An error occurred because we detected an update or delete to one or more rows in the source table. Streaming tables may only use append-only streaming sources. If you expect to delete or update rows to the sourc...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @rt-slowth Just checking in if the provided solution was helpful to you. If yes, please accept this as a Best Solution so that this thread can be considered closed.

  • 0 kudos
4 More Replies
rt-slowth
by Contributor
  • 9502 Views
  • 6 replies
  • 0 kudos

Questions about the design of bronze, silver, and gold for live streaming pipelines

I'm envisioning a live streaming pipeline. The bronze layer, or data ingestion, is fetched using the directory listing mode of Auto Loader. I'm not using File Notification Mode because I detect about 2-300 data changes per hour. I'm thinking about im...

Data Engineering
Delta Live Table
spark
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

  • 0 kudos
5 More Replies
Dilorom
by New Contributor
  • 2911 Views
  • 1 reply
  • 0 kudos

How to connect to Dynamics CRM server in Databricks.

Currently I have access to Dynamics CRM backend server via AAD, and I can query tables via XRM tool. I am trying to connect to Dynamics CRM backend server in Databricks, and I am not sure how the connection needs to be set up or if any other access n...

Latest Reply
sheridan06
New Contributor III
  • 0 kudos

Hi Dilorom - did you ever solve your issue? I'm trying to connect to Microsoft Dynamics Business Central and get an error when I run %pip install dynamics365bc. ERROR: Could not find a version that satisfies the requirement dynamics365bc (from version...

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 6398 Views
  • 2 replies
  • 0 kudos

Resolved! How does Delta solve the large number of small file problems?

Delta creates more small files during merge and update operations.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Delta solves the problem of a large number of small files using the operations below, which are available for a Delta table. Optimized writes help optimize the write operation by adding an additional shuffle step and reducing the number of output files. By defau...

  • 0 kudos
1 More Replies
ranged_coop
by Valued Contributor II
  • 14883 Views
  • 24 replies
  • 29 kudos

How to install Chromium Browser and Chrome Driver on DBX runtime 10.4 and above?

Hi Team, We are wondering if there is a recommended way to install the Chromium browser and Chrome driver on Databricks Runtime 10.4 and above? I have been through the site and have come across several links to this effect, but they all seem to be ins...

Latest Reply
Kaizen
Valued Contributor
  • 29 kudos

Look into Playwright instead of Selenium. I went through the same process y'all went through here (ended up writing an init script to install the drivers, etc.). This is all done for you in Playwright. Refer to this post - I hope it helps!! https://communit...

  • 29 kudos
23 More Replies
noimeta
by Contributor II
  • 10541 Views
  • 17 replies
  • 12 kudos

Resolved! Error when creating an external location using code

I'm trying to create an external location from a notebook, and I got this error: [PARSE_SYNTAX_ERROR] Syntax error at or near 'LOCATION'(line 1, pos 16) == SQL == CREATE EXTERNAL LOCATION IF NOT EXISTS test_location URL 's3://test-bronze/db/tes...

Latest Reply
Lokeshv
New Contributor II
  • 12 kudos

Hey everyone, I'm facing an issue with retrieving data from a volume or table that contains a string with a symbol, for example, 'databricks+'. Whenever I try to retrieve this data, I encounter a syntax error. Can anyone help me resolve this issue?

  • 12 kudos
16 More Replies
seefoods
by New Contributor III
  • 859 Views
  • 2 replies
  • 0 kudos

cluster metrics collection

Hello @Debayan, how can I collect the metrics provided by cluster metrics for Databricks Runtime 13.1 or later using a Bash shell script? Cordially, Aubert EMAKO

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, Cluster metrics is a UI tool and is available in the UI only. For reference: https://docs.databricks.com/en/compute/cluster-metrics.html

  • 0 kudos
1 More Replies
chari
by Contributor
  • 4995 Views
  • 3 replies
  • 0 kudos

writing spark dataframe as CSV to a repo

Hi, I wrote a Spark DataFrame as CSV to a repo (synced with GitHub). But when I checked the folder, the file wasn't there. Here is my code: spark_df.write.format('csv').option('header','true').mode('overwrite').save('/Repos/abcd/mno/data') No error mes...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The folder 'Repos' is not your repo, it's `dbfs:/Repos`. Please check dbutils.fs.ls('/Repos/abcd/mno/data')

  • 0 kudos
2 More Replies
abelian-grape
by New Contributor II
  • 7090 Views
  • 5 replies
  • 0 kudos

Intermittent error databricks job kept running

Hi, I have the following error, but the job kept running. Is that normal? { "message": "The service at /api/2.0/jobs/runs/get?run_id=899157004942769 is temporarily unavailable. Please try again later. [TraceId: -]", "error_code": "TEMPORARILY_U...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @abelian-grape, It may just be a temporary setback that you can overlook for now. However, if your job seems to be stuck or making no progress, it's best to pause it and come back to it at a later time. Nervous about that strange [TraceId: -] i...

  • 0 kudos
4 More Replies
Salman1
by New Contributor
  • 872 Views
  • 1 reply
  • 0 kudos

Cannot find UDF on subsequent job runs on same cluster.

Hello, I am trying to run jobs with a JAR task type using databricks on AWS on an all-purpose cluster. The issue I'm facing is that the job will complete the first run successfully but on any subsequent runs, it will fail. I have to restart my cluste...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Salman1, first, navigate to the corresponding job in the Databricks Jobs UI. From there, go to the Runs tab, where you can see all current and completed runs, including any that have failed. If you hover over the failed task, you can see further ...

  • 0 kudos
chari
by Contributor
  • 2729 Views
  • 3 replies
  • 0 kudos

Fatal error when writing a big pandas DataFrame

Hello DB community, I was trying to write a pandas DataFrame containing 100000 rows as an Excel file. Moments into the execution I received a fatal error: "Python kernel is unresponsive." However, I am constrained from increasing the number of clusters or other...

Data Engineering
Databricks
excel
python
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hey there! Thanks a bunch for being part of our awesome community!  We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution...

  • 0 kudos
2 More Replies
