Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

wyzer
by Contributor II
  • 3131 Views
  • 2 replies
  • 2 kudos

Resolved! Insert data into an on-premise SQL Server

Hello, is it possible to insert data from Databricks into an on-premise SQL Server? Thanks.

Latest Reply
wyzer
Contributor II

Hello, yes, we found out how to do it by installing a JDBC connector. It works fine. Thanks.
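
For reference, a minimal PySpark sketch of the JDBC write the reply describes, assuming the cluster can reach the on-premise server and has the Microsoft SQL Server JDBC driver installed; the hostname, database, table, and credentials below are placeholders:

```python
# Sketch: append a DataFrame to an on-premise SQL Server over JDBC.
# Hostname, database, table, and credentials are placeholders.
df = spark.createDataFrame([(1, "example")], ["id", "val"])  # stand-in data

jdbc_url = "jdbc:sqlserver://onprem-host:1433;databaseName=mydb"

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.target_table")
   .option("user", "svc_user")
   .option("password", "***")
   .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
   .mode("append")
   .save())
```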

1 More Replies
Abel_Martinez
by Contributor
  • 1721 Views
  • 1 reply
  • 1 kudos

Resolved! Create Databricks service account

Hi all, I need to create service account users who can only query some Delta tables. I guess I do that by creating the user and granting SELECT rights on the desired tables. But Databricks requests a mail account for these users. Is there a way to cr...

Latest Reply
Abel_Martinez
Contributor

Hi @Kaniz Fatma, I've checked the link, but the standard method requires a mailbox and user creation via the SCIM API looks too complicated. I solved the issue: I created a mailbox for the service account and created the user using that mailbox....
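
For anyone landing here, a hedged sketch of the SCIM call the reply refers to, using the Databricks SCIM Users API; the workspace URL, token, and user name are placeholders:

```python
import requests

# Sketch: create a workspace user via the Databricks SCIM API.
# Workspace URL, token, and userName below are placeholders.
resp = requests.post(
    "https://<workspace-url>/api/2.0/preview/scim/v2/Users",
    headers={
        "Authorization": "Bearer <personal-access-token>",
        "Content-Type": "application/scim+json",
    },
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": "svc-reporting@example.com",
    },
)
resp.raise_for_status()
```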

Vibhor
by Contributor
  • 4681 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks Data Type Conversion error

In Databricks, while writing data to the curated layer, I see the error: Failed to execute user defined function (Double => decimal(38,18)). Has anyone faced this issue, and how did you resolve it?

Latest Reply
-werners-
Esteemed Contributor III

What happens if you explicitly cast it? I remember having such issues with implicit casting when going from Spark 2.x to 3.x, but these were solved by using explicit casting (not round()).
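
A minimal sketch of the explicit cast being suggested; "amount" is a hypothetical column name:

```python
from pyspark.sql import functions as F

# Sketch: cast the double column explicitly before writing, instead of
# relying on implicit conversion. "amount" is a hypothetical column.
df = spark.createDataFrame([(1.5,)], ["amount"])  # stand-in data
df = df.withColumn("amount", F.col("amount").cast("decimal(38,18)"))
```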

4 More Replies
Santosh09
by New Contributor II
  • 5399 Views
  • 4 replies
  • 3 kudos

Resolved! Writing a Spark DataFrame to ADLS takes a huge amount of time when the DataFrame contains text data.

A Spark DataFrame with text data, when the schema is a Struct type, takes too much time to write/save/push to ADLS or a SQL DB, or to download as CSV.

Latest Reply
User16764241763
Honored Contributor

@shiva Santosh Have you checked the count of the DataFrame that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, the explode can result in 1-to-many rows; better to check the DataFrame count and see if Spark OOMs in the workspace.
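
To make the suggested check concrete, a small sketch comparing row counts before and after an explode; the column names are hypothetical:

```python
from pyspark.sql import functions as F

# Sketch: see how much an explode inflates the row count before running
# the expensive write. "items" is a hypothetical array column.
df = spark.createDataFrame([(1, ["a", "b", "c"])], ["id", "items"])
before = df.count()
exploded = df.withColumn("item", F.explode("items"))
after = exploded.count()
print(f"rows before explode: {before}, after: {after}")
```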

3 More Replies
BorislavBlagoev
by Valued Contributor III
  • 3848 Views
  • 8 replies
  • 4 kudos

Resolved! Spark data limits

How much data is too much for Spark, and what is the best strategy to partition 2 GB of data?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

2 GB is quite small, so usually the default settings are best (in most cases the better result comes from not setting anything like repartition and leaving everything to the Catalyst optimizer). If you want to set custom partitioning: please remember about avoidi...
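
A sketch of the two options the reply contrasts; paths and the "event_date" column are placeholders:

```python
# Sketch: for ~2 GB, rely on Spark's defaults and the Catalyst optimizer.
# Only partition explicitly with a concrete key in mind.
df = spark.read.format("delta").load("/mnt/raw/events")  # placeholder source

# Default behaviour: no manual repartitioning.
df.write.format("delta").mode("overwrite").save("/mnt/curated/events")

# Custom partitioning, if a natural key exists:
(df.repartition("event_date")
   .write.partitionBy("event_date")
   .format("delta")
   .mode("overwrite")
   .save("/mnt/curated/events_by_date"))
```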

7 More Replies
venkyv
by New Contributor II
  • 1959 Views
  • 1 reply
  • 3 kudos

Resolved! Can I use Databricks to join data from S3 and Postgres using SQL?

Hello, I'm very new to Databricks and I'm finding it hard to tell whether it's the right solution for our needs. Requirement: we have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries to join d...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Yes, you can. You can ETL to data lake storage, register your tables in the metastore, and register your SELECT with JOINs as a VIEW, or even better, additionally create jobs and store your JOINED table. From BI you can connect to Databricks SQL or to the data lake...
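
A sketch of the setup described, assuming a Delta table on S3 and a reachable Postgres instance; paths, hosts, credentials, and column names are placeholders:

```python
# Sketch: register an S3 Delta table and a Postgres JDBC table in the
# metastore, then expose a joined VIEW queryable from a SQL endpoint.
spark.sql("""
  CREATE TABLE IF NOT EXISTS s3_orders
  USING DELTA LOCATION 's3://my-bucket/curated/orders'
""")

spark.sql("""
  CREATE TABLE IF NOT EXISTS pg_customers
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:postgresql://pg-host:5432/mydb',
    dbtable 'public.customers',
    user 'svc_user',
    password '***'
  )
""")

spark.sql("""
  CREATE OR REPLACE VIEW orders_enriched AS
  SELECT o.*, c.customer_name
  FROM s3_orders o
  JOIN pg_customers c ON o.customer_id = c.customer_id
""")
```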

thomasthomas
by New Contributor II
  • 2097 Views
  • 4 replies
  • 0 kudos

Resolved! Customer deployment

Hi, I have a bunch of scripts in Databricks that perform a decent amount of data wrangling. All of these scripts contain sensitive information and I have no intention of making them public. I would like to provide a service to my customers, so they ca...

Latest Reply
Atanu
Esteemed Contributor

@Tamas D I understood your concern. For cluster creation in a different subscription, I think that's by design at this moment. But I would like to request you to add your use case to https://feedback.azure.com/d365community/forum/2efba7dc-ef24-ec11-b6...

3 More Replies
Development
by New Contributor III
  • 617 Views
  • 0 replies
  • 0 kudos

Hi All, I hope you're doing well. I am facing an issue while installing a Python library on an ADB cluster. Lib: PyCaret (latest version). It's not gett...

Hi All, I hope you're doing well. I am facing an issue while installing a Python library on an ADB cluster. Lib: PyCaret (latest version). It's not getting installed and shows a 'Failed' status. It would be great if you could help here! Thanks

William_Scardua
by Valued Contributor
  • 8186 Views
  • 6 replies
  • 3 kudos

Resolved! How do you create a sandbox in your data environment?

Hi guys, how do you create a sandbox in your data environment? Any ideas? Azure/AWS + Data Lake + Databricks

Latest Reply
missyT
New Contributor III

In a sandbox environment, you will find the Designer enabled. You can activate Designer by selecting the design icon on a page, or by choosing the Design menu item in the Settings menu.

5 More Replies
User16857281869
by New Contributor II
  • 1956 Views
  • 1 reply
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my structured streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should be using the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10 ms to log the transaction data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please mount cheaper storage (LRS) as a custom mount and set your checkpoints there; please clear data regularly; if you are using foreachBatch in a stream, it will save every DataFrame on DBFS; please remember not to use display() in production; if on th...
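
A sketch combining both suggestions, an explicit trigger interval plus a checkpoint on a cheaper mount; the paths and source table are placeholders:

```python
# Sketch: throttle micro-batches with an explicit trigger and point the
# checkpoint at a cheaper (e.g. LRS) mount. Paths are placeholders.
stream_df = spark.readStream.format("delta").load("/mnt/raw/source_table")

(stream_df.writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/lrs-checkpoints/my_stream")
   .trigger(processingTime="1 minute")  # instead of near-continuous batches
   .start("/mnt/curated/my_table"))
```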

sarvesh
by Contributor III
  • 3767 Views
  • 0 replies
  • 0 kudos

Can we read an Excel file with many sheets by their indexes?

I am trying to read an Excel file which has 3 sheets whose names are integers: sheet 1 name = 21, sheet 2 name = 24, sheet 3 name = 224. I got this data from a user so I can't change the sheet names, but reading these with Spark is an issue. Code - v...
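
One possible approach, sketched with the third-party spark-excel library (com.crealytics), which must be installed on the cluster; the file path is a placeholder:

```python
# Sketch: read a sheet whose name is a number by quoting it in the
# dataAddress option of the spark-excel (com.crealytics) library.
df_21 = (spark.read
         .format("com.crealytics.spark.excel")
         .option("dataAddress", "'21'!A1")   # sheet named "21"
         .option("header", "true")
         .option("inferSchema", "true")
         .load("/mnt/raw/user_workbook.xlsx"))
```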

sarvesh
by Contributor III
  • 5104 Views
  • 3 replies
  • 6 kudos

Resolved! Can we use Spark Streaming to read/write data from MySQL? I can't find an example.

If someone can link me to an example where a stream is used to read from or write to MySQL, please do.

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Regarding writing, a sink is possible without problems via foreachBatch. I use it in production: I stream-autoload CSVs from the data lake and write via foreachBatch to SQL (inside the foreachBatch function you have a temporary DataFrame with the records, and just use w...
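
A sketch of the foreachBatch sink described; the MySQL URL, table, credentials, and paths are placeholders:

```python
# Sketch: streaming sink to MySQL via foreachBatch. Inside the function,
# batch_df is a plain (non-streaming) DataFrame, so the regular JDBC
# writer applies. URL, table, credentials, and paths are placeholders.
def write_to_mysql(batch_df, batch_id):
    (batch_df.write
       .format("jdbc")
       .option("url", "jdbc:mysql://mysql-host:3306/mydb")
       .option("dbtable", "target_table")
       .option("user", "svc_user")
       .option("password", "***")
       .mode("append")
       .save())

stream_df = spark.readStream.format("delta").load("/mnt/raw/source_table")

(stream_df.writeStream
   .foreachBatch(write_to_mysql)
   .option("checkpointLocation", "/mnt/checkpoints/mysql_sink")
   .start())
```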

2 More Replies
sarvesh
by Contributor III
  • 3212 Views
  • 5 replies
  • 8 kudos

Catch rejected data (rows) while reading with Apache Spark.

I work with Spark-Scala and I receive data in different formats (.csv/.xlsx/.txt etc.). When I try to read/write this data from different sources to any database, many records get rejected due to various issues like special characters, data type ...

Latest Reply
-werners-
Esteemed Contributor III

Or maybe schema evolution on Delta Lake is enough, in combination with Hubert's answer.
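
One common way to capture the rejected rows rather than lose them, sketched here with the Databricks-specific badRecordsPath option (on open-source Spark, PERMISSIVE mode with a corrupt-record column is the analogue); paths are placeholders:

```python
# Sketch: route malformed rows to a side location instead of failing or
# silently dropping them. badRecordsPath is Databricks-specific.
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("badRecordsPath", "/mnt/rejected/my_source")
      .load("/mnt/raw/my_source/*.csv"))
```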

4 More Replies
Anonymous
by Not applicable
  • 2426 Views
  • 3 replies
  • 7 kudos

Resolved! How does 73% of the data go unused for analytics or decision-making?

Is Lakehouse the answer? Here's a good resource that was just published: https://dbricks.co/3q3471X

Latest Reply
Anonymous
Not applicable

@Alexis Lopez - If @Dan Zafar's or @Harikrishnan Kunhumveettil's answers solved the issue, would you be happy to mark one of their answers as best so other members can find the solution more easily?

2 More Replies