Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

scholar
by New Contributor II
  • 2712 Views
  • 3 replies
  • 2 kudos

How to read data from a Kafka topic using Spark Streaming

I have installed kafka-2.10-0.10.2 and am using a cluster with this configuration: Runtime 6.4 Extended Support (Scala 2.11, Spark 2.4.5). After this I am able to get messages on the producer and consumer, but when I try to read data with spark.readStream and tr...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

You can just use display(orders_df3) for debugging purposes.

2 More Replies
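For anyone landing here with the same question, a minimal Structured Streaming read from Kafka looks roughly like this. A sketch only: the broker address and topic name are placeholders; the Kafka source ships with Databricks runtimes, while plain Spark needs the matching spark-sql-kafka package.

    from pyspark.sql.functions import col

    # Read the topic as a stream; broker and topic are placeholders.
    orders_df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-host:9092")
        .option("subscribe", "orders")
        .option("startingOffsets", "earliest")
        .load())

    # Kafka keys/values arrive as binary; cast to string before parsing.
    messages = orders_df.select(col("key").cast("string").alias("key"),
                                col("value").cast("string").alias("value"))

    # As the reply notes, display() is a convenient way to debug the stream.
    display(messages)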
palzor
by New Contributor III
  • 9207 Views
  • 4 replies
  • 4 kudos

Getting an error when using CDC in Delta Live Tables

Hi, I am trying to use CDC for a Delta Live Table, and when I run the pipeline a second time I get an error: org.apache.spark.sql.streaming.StreamingQueryException: Query tbl_cdc [id = ***-xx-xx-bf7e-6cb8b0deb690, runId = ***-xxxx-4031-ba74-b4b22be05...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Palzor Lama, a streaming live table can only process append queries; that is, queries where new rows are inserted into the source table. Processing updates from source tables, for example merges and deletes, is not supported. To process updates,...

3 More Replies
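As the reply explains, CDC-style updates need APPLY CHANGES rather than a plain streaming live table. A minimal Python sketch, where the table names, key, and sequencing column are placeholders (on older DLT runtimes the target declaration was create_target_table):

    import dlt
    from pyspark.sql.functions import col

    # Declare the SCD Type 1 target table.
    dlt.create_streaming_table("tbl_cdc")

    # Apply inserts, updates, and deletes from the CDC feed into the target.
    dlt.apply_changes(
        target="tbl_cdc",
        source="cdc_source",          # placeholder: streaming source table/view
        keys=["id"],                  # placeholder primary key
        sequence_by=col("event_ts"),  # placeholder ordering column
        stored_as_scd_type=1,
    )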
JeromeB974
by New Contributor II
  • 6708 Views
  • 5 replies
  • 6 kudos

Can we use spark-xml with Delta Live Tables?

Hi, is there a way to use spark-xml with Delta Live Tables (Azure Databricks)? I've tried something like this without any success for the moment: CREATE LIVE TABLE df17 USING com.databricks.spark.xml AS SELECT * FROM cloud_files("/mnt/dev/bronze/xml/s432799...

Latest Reply
Zachary_Higgins
Contributor
  • 6 kudos

This is a tough one, since the only magic command available is %pip, but spark-xml is a Maven package. The only way I found to do this was to install the spark-xml jar from the Maven repo using the databricks-cli. You can reference the cluster ID usin...

4 More Replies
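Once the spark-xml jar is attached to the pipeline cluster (installed from Maven as described above), a Python DLT definition could look roughly like this. A sketch only: the rowTag and path are placeholders, and since cloud_files()/Auto Loader has no XML reader here, a batch read is used:

    import dlt

    @dlt.table(name="df17")
    def df17():
        # spark-xml must already be installed on the pipeline cluster.
        return (spark.read
            .format("com.databricks.spark.xml")
            .option("rowTag", "record")      # placeholder row tag
            .load("/mnt/dev/bronze/xml/"))   # placeholder path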
Taha_Hussain
by Databricks Employee
  • 1013 Views
  • 0 replies
  • 1 kudos

Databricks Office Hours

Register for Office Hours to participate in a live Q&A session with Databricks experts! Our next events are scheduled for June 8th & June 22nd from 8:00 am - 9:00 am PT. This is your opportunity to connect directly with our experts...

thaipham
by New Contributor III
  • 1960 Views
  • 3 replies
  • 4 kudos

Resolved! How would I export the latest revision of a notebook?

I've been trying to export some notebooks from my Databricks workspace to my laptop. I can't use Git Repos because the company has restricted access to external services from the control plane. However, it looks to me like I always export the previous re...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Too bad you are not allowed to use Repos; it can be a life saver. Can you mark your answer as the best answer so the question is marked as solved?

2 More Replies
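Without Repos, the Workspace API exports the current revision of a notebook directly. A minimal sketch using the REST endpoint, where the host, token, and paths are placeholders:

    import base64
    import requests

    HOST = "https://<your-workspace-host>"   # placeholder
    TOKEN = "<personal-access-token>"        # placeholder

    resp = requests.get(
        f"{HOST}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
    )
    resp.raise_for_status()

    # The API returns the notebook body base64-encoded in the "content" field.
    with open("my_notebook.py", "wb") as f:
        f.write(base64.b64decode(resp.json()["content"]))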
Ruby8376
by Valued Contributor
  • 2018 Views
  • 2 replies
  • 0 kudos

Primary/Foreign Key Constraints on Delta Tables?

Hi All! I am using Databricks in a data migration project. We need to transform the data before loading it into Salesforce. Can we define primary key/foreign key constraints on Databricks Delta tables?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Ruby Rubi, following up: did you get a chance to check @Werner Stinckens's previous comments, or do you need any further help on this?

1 More Replies
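For what it's worth, Delta tables managed by Unity Catalog do accept primary and foreign key constraints, but they are informational only and not enforced. A sketch, assuming Unity Catalog tables with NOT NULL key columns (all names are placeholders):

    # Informational only: Databricks records but does not enforce PK/FK.
    spark.sql("ALTER TABLE main.sales.customers "
              "ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id)")

    spark.sql("ALTER TABLE main.sales.orders "
              "ADD CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id) "
              "REFERENCES main.sales.customers (customer_id)")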
laurencewells
by New Contributor III
  • 3504 Views
  • 3 replies
  • 1 kudos

Resolved! Log4J Custom Filter Not Working

Hi All, hoping you can help. I am looking to set up a custom logging process that captures application ETL logs and streaming logs. I have set up multiple custom logging appenders using the guide here: https://kb.databricks.com/clusters/overwrite-log4...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Laurence Wells! Hope you are doing great. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Thanks!

2 More Replies
lizou
by Contributor II
  • 4206 Views
  • 1 reply
  • 1 kudos

Never use the float data type

SELECT float('92233464567.33') returns 92,233,466,000. I expected the result to be around 92,233,464,567.xx; therefore, the float data type should be avoided. Using double or decimal works as expected. But I see the float data type widely used, assuming most num...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Float is an approximate-number data type, which means that not all values in the data type range can be represented exactly. Decimal/Numeric is a fixed-precision data type, which means that all the values in the data type range can be represented exactly w...

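The precision loss is easy to reproduce: a float is 32-bit and keeps only about 7 significant decimal digits, while double and decimal hold the value above exactly. A quick sketch:

    # float rounds to the nearest representable 32-bit value (~9.2233466E10);
    # double and decimal return 92233464567.33 as expected.
    spark.sql("""
        SELECT CAST('92233464567.33' AS FLOAT)          AS as_float,
               CAST('92233464567.33' AS DOUBLE)         AS as_double,
               CAST('92233464567.33' AS DECIMAL(20, 2)) AS as_decimal
    """).show(truncate=False)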
Krish-685291
by New Contributor III
  • 2248 Views
  • 6 replies
  • 2 kudos

Can I merge a Delta Lake table into an RDBMS table directly? What is the preferred way in Databricks?

Hi, I am dealing with updating master data. I'll do the UPSERT operations on the Delta Lake table, but after my UPSERT is complete I'd like to update the master data in the RDBMS table as well. Is there any support from Databricks to perform this operation...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I get your point and concerns. If there are plans in that direction, it will have to be a joint effort between Databricks and the DB vendor.

5 More Replies
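There is no built-in way to MERGE from Delta into an external RDBMS, so the usual workaround is to push the changed rows to a staging table over JDBC and run the merge on the database side. A rough sketch, where the connection details, table names, and change filter are placeholders:

    # 1) After the Delta upsert completes, pick up the rows that changed.
    changes = (spark.read.table("master_data")
               .filter("last_updated >= current_date()"))  # placeholder filter

    # 2) Push them to a staging table in the RDBMS over JDBC ...
    (changes.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://<host>:5432/<db>")  # placeholder
        .option("dbtable", "master_data_staging")
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("append")
        .save())

    # 3) ... then merge staging into the master table on the database itself
    #    (stored procedure, trigger, or scheduled SQL).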
BenBauer
by New Contributor III
  • 853 Views
  • 0 replies
  • 4 kudos

How to prevent creation of __apply_changes_* tables during the DLT create_target_table process

Hey, we are using DLT along with SCD Type 1 via the create_target_table function. It does not actually create the table as defined, but rather a view. However, on top of the expected table we see system-generated tables, e.g. __apply_changes_*. Is there a w...

naveenmamidala
by New Contributor II
  • 21603 Views
  • 1 reply
  • 1 kudos

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Error: ConnectionError: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: /api/2.0/workspace/list?path=%2F (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001CAF52B4640>: Failed to establis...

Latest Reply
Sajith
New Contributor II
  • 1 kudos

Set the HTTPS proxy server in the CLI and it started working without any error: set HTTPS_PROXY=http://username:password@{proxy host}:{port}

rakeshdey
by New Contributor II
  • 1825 Views
  • 0 replies
  • 1 kudos

Why is providing a list of filenames to spark.read.csv([file1, file2, file3]) much faster than providing a directory with a wildcard, spark.read.csv("/path/*")?

I have a huge number of small files in S3, and I was going through a few blogs where people say that providing a list of files, like spark.read.csv([file1, file2, file3]), is faster than giving a directory with a wildcard. Reason: Spark actually does fi...

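The usual explanation: with a wildcard, Spark must first list every object under the prefix (paginated S3 LIST calls, which are slow for huge numbers of small files), while an explicit list skips the listing step entirely. A sketch with hypothetical paths:

    # Explicit list: Spark goes straight to the named objects, no listing.
    files = [f"s3://my-bucket/path/part-{i:05d}.csv" for i in range(3)]
    df_list = spark.read.csv(files, header=True)

    # Wildcard: Spark enumerates everything under the prefix before reading.
    df_glob = spark.read.csv("s3://my-bucket/path/*.csv", header=True)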
Sunny
by New Contributor III
  • 1065 Views
  • 1 reply
  • 0 kudos

Update task status from external application

I have a workflow with a task that is dependent on an external application's execution (not residing in Databricks). After the external application finishes, how can I update the status of the task to complete? Currently, the Jobs API doesn't support status updat...

Latest Reply
Sunny
New Contributor III
  • 0 kudos

Any inputs on this one, please?

GC-James
by Contributor II
  • 11180 Views
  • 8 replies
  • 10 kudos

RserveException: eval failed

Sometimes when I am running R code in a Databricks notebook I am given this error. The cell I am running fails, and my whole R 'session' seems to get screwed up. For example, my stored variables disappear and I have to reload my packages, etc. It is ...

Latest Reply
data_warrior
New Contributor III
  • 10 kudos

The error file is attached here.

7 More Replies
Braxx
by Contributor II
  • 4242 Views
  • 2 replies
  • 1 kudos

Resolved! delta table storage

I couldn't find it clearly explained anywhere, so I hope somebody here can shed some light on it. A few questions: 1) Where are Delta tables stored? The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage." So where exactly i...

Latest Reply
Braxx
Contributor II
  • 1 kudos

thanks, very helpful

1 More Replies
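For readers with the same question: a Delta table is a directory of Parquet data files plus a _delta_log/ folder of JSON commit files in your cloud storage (DBFS for managed tables). A quick sketch for inspecting this yourself, with a placeholder table name and path:

    # Where does the table live?
    spark.sql("DESCRIBE DETAIL my_delta_table") \
         .select("location").show(truncate=False)

    # Inside that location: Parquet data files plus the _delta_log/ directory
    # of JSON commit files that version the table.
    path = "dbfs:/user/hive/warehouse/my_delta_table"
    display(dbutils.fs.ls(path))
    display(dbutils.fs.ls(path + "/_delta_log"))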
