Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 3347 Views
  • 2 replies
  • 13 kudos

Resolved! Something like AWS Macie to perform scans on Azure Data Lake

Does anyone know an alternative to AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit card numbers, etc.). I would like a similar ready-made scanner for Azure Data Lake.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 13 kudos

Thank you, I checked and yes, it is definitely the way to go.

1 More Replies
ahana
by New Contributor III
  • 21439 Views
  • 11 replies
  • 2 kudos
Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @ahana, did any of the replies help you solve this issue? Would you be happy to mark that answer as best so that others can quickly find the solution? Thank you.

10 More Replies
Chris_Shehu
by Valued Contributor III
  • 2480 Views
  • 2 replies
  • 2 kudos
Latest Reply
Prabakar
Databricks Employee
  • 2 kudos

Hi @Christopher Shehu, did @Piper Wilson's response help solve your question? If so, would you be happy to mark her answer as best so that others can quickly find the solution in the future?

1 More Replies
Orianh
by Valued Contributor II
  • 10103 Views
  • 4 replies
  • 2 kudos

Resolved! Read JSON with backslash.

Hello guys. I'm trying to read a JSON file which contains backslashes, and pyspark fails to read it. I tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but pyspark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@orian hindi​ - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

3 More Replies
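A minimal local sketch of the workaround discussed in this thread: read the raw text and escape lone backslashes before parsing. This assumes the file contains no legitimate JSON escape sequences already (doubling would corrupt those); the file content below is hypothetical. In pyspark the same idea applies after reading the file with spark.read.text.

```python
import json

def parse_json_with_backslashes(raw: str) -> dict:
    # Double every backslash so text like "C:\Users" becomes a valid
    # JSON escape ("C:\\Users") before parsing. Assumes the input
    # contains no intentional JSON escapes.
    return json.loads(raw.replace("\\", "\\\\"))

raw = r'{"path": "C:\Users\data\file.json"}'   # invalid JSON as-is
doc = parse_json_with_backslashes(raw)
print(doc["path"])  # C:\Users\data\file.json
```

Replacing "\" with "/" (as the poster suggested) also works if the backslashes are only path separators and you don't need to preserve them.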
dataEngineer3
by New Contributor II
  • 8000 Views
  • 8 replies
  • 0 kudos

Hi All, I am trying to read a CSV file from the data lake and load the data into a SQL table using COPY INTO. Am facing an issue. Here I created one table wit...

Hi All, I am trying to read a CSV file from the data lake and load the data into a SQL table using COPY INTO, but am facing an issue. I created a table with 6 columns, the same as the data in the CSV file, but I am unable to load the data. Can anyone help me with this?

Latest Reply
dataEngineer3
New Contributor II
  • 0 kudos

Thanks Werners for your reply. How do I pass a schema (column names and types) for the CSV file?

7 More Replies
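A hedged sketch of the pattern asked about in this thread: with COPY INTO the schema is not passed to the CSV file itself; it is declared on the target table, and the reader is told how to map the file's columns. Table, column, container, and storage-account names below are hypothetical; option names are from the Databricks COPY INTO documentation.

```sql
-- Declare the schema (column names and types) on the target table
CREATE TABLE IF NOT EXISTS my_table (
  id INT,
  name STRING,
  amount DOUBLE,
  created_at TIMESTAMP,
  country STRING,
  active BOOLEAN
);

-- Load the CSV; 'header' = 'true' maps file columns to table columns
-- by name, and 'inferSchema' = 'true' lets the reader type the values
COPY INTO my_table
FROM 'abfss://container@mystorageaccount.dfs.core.windows.net/raw/data.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```

If the CSV has no header row, the columns are matched by position instead, so the file's column order must match the table definition.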
kjoth
by Contributor II
  • 8821 Views
  • 4 replies
  • 3 kudos

Resolved! Pyspark logging - custom to Azure blob mount directory

I'm using the logging module to log events from the job, but the log file is created with only 1 line; consecutive log events are not being recorded. Is there any reference for custom logging in Databricks?

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@karthick J​ - If Jose's answer helped solve the issue, would you be happy to mark their answer as best so that others can find the solution more easily?

3 More Replies
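A minimal sketch of the usual fix for the symptom above: attach a dedicated FileHandler, write to a local path first, and flush the handlers before copying the file to the mounted directory (buffered appends going straight to a blob-mounted path are a common reason only the first line survives). The logger name and paths are hypothetical.

```python
import logging

def get_job_logger(log_path: str) -> logging.Logger:
    logger = logging.getLogger("my_job")      # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:                   # avoid duplicate handlers on re-runs
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger

logger = get_job_logger("/tmp/job.log")       # local path, not the blob mount
logger.info("step 1 done")
logger.info("step 2 done")
for h in logger.handlers:                     # flush before copying to the mount
    h.flush()
```

After the job finishes, the local file can be copied to the mount in one shot (e.g. with dbutils.fs.cp from "file:/tmp/job.log" to the mounted directory), which avoids append semantics on blob storage entirely.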
AjayHN
by New Contributor II
  • 5060 Views
  • 1 reply
  • 2 kudos

Resolved! Notebook failing in job-cluster but runs fine in all-purpose-cluster with the same configuration

I have a notebook with many joins and a few persist operations, which runs fine on an all-purpose cluster (worker nodes i3.xlarge, autoscaling enabled), but the same notebook fails on a job cluster with the same cluster definition (to be frank the ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Ajay Nanjundappa, check the "Event log" tab and search for any spot termination events. It seems all your nodes are spot instances, and the "FetchFailedException" error is associated with spot instance terminations.

Andriy_Shevchen
by New Contributor
  • 4121 Views
  • 2 replies
  • 3 kudos

Resolved! yarn.nodemanager.resource.memory-mb parameter update

I am currently working on determining the proper cluster size for my Spark application, and I have a question regarding the Hadoop configuration parameter yarn.nodemanager.resource.memory-mb. From what I see, this parameter is responsible for setting the phys...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Andriy Shevchenko, Databricks does not use YARN. I recommend trying Databricks Community Edition to get familiar and explore. You can check the Ganglia UI to see cluster utilization: memory, CPU, I/O, etc.

1 More Replies
Sebastian
by Contributor
  • 9216 Views
  • 3 replies
  • 1 kudos

How to access a Databricks secret in a global init file

How do I access a Databricks secret in a global init file? {{secrets/scope/key}} doesn't work. Do I have to put it inside quotes?

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @SEBIN THOMAS, I would like to share the docs here. Are you getting any error messages? As @Hubert Dudek mentioned, please share more details and the error message, if you are getting any.

2 More Replies
Mohit_m
by Databricks Employee
  • 3590 Views
  • 5 replies
  • 2 kudos

Which REST API to use in order to list the groups a specific user belongs to

Which REST API should I use in order to list the groups that a specific user belongs to?

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

@Mohit Miglani, make sure to mark the best answer so the post moves to the top and helps other users who have this question in the future.

4 More Replies
Nosa
by New Contributor II
  • 4239 Views
  • 3 replies
  • 4 kudos

Resolved! adding databricks to my application

I am developing an application with Python and Godot, and I want to use Databricks in it. How can I integrate Databricks into my application?

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi @Ensiyeh Shojaei, which cloud service are you using? Depending on the cloud provider, you will have a list of tools that can help you connect and interact with Databricks from your application.

2 More Replies
schmit89
by New Contributor
  • 4270 Views
  • 1 reply
  • 1 kudos

Resolved! Downstream duration timeout

I'm trying to upload a file that is 0.5 GB for a school lab, and when I drag the file to DBFS it uploads for about 30 seconds and then I receive a downstream duration timeout error. What can I do to solve this issue?

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Jason Schmit, your file might be too large for the upload interface (see the docs). I recommend splitting it into smaller files. You can also use the DBFS CLI or dbutils to upload your file.

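As a local sketch of the "split it into smaller files" suggestion above: chunk the file on your machine, then upload each part separately (for example with the DBFS CLI). The chunk size and part-naming scheme below are arbitrary choices, not a Databricks convention.

```python
def split_file(src: str, chunk_bytes: int) -> list[str]:
    """Split src into numbered part files, each at most chunk_bytes."""
    parts = []
    with open(src, "rb") as f:
        i = 0
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            part = f"{src}.part{i:03d}"   # e.g. data.csv.part000
            with open(part, "wb") as out:
                out.write(chunk)
            parts.append(part)
            i += 1
    return parts
```

Each part can then be uploaded individually (e.g. `databricks fs cp data.csv.part000 dbfs:/tmp/` with the DBFS CLI) and either reassembled on DBFS or read as a set of files.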
Raghav1
by New Contributor II
  • 12616 Views
  • 7 replies
  • 3 kudos

How to avoid DataBricks Secret Scope from exposing the value of the key resides in Azure Key Vault?

I have created a key in Azure Key Vault to store my secrets in it. In order to use it securely in Azure Databricks, I created a secret scope and configured the Azure Key Vault properties. Out of curiosity, just wanted to check whether my key is ...

Latest Reply
prasadvaze
Valued Contributor II
  • 3 kudos

@Kaniz Fatma, is any fix coming soon for this? This is a big security loophole. The docs say that "to ensure proper control of secrets you should use Workspace object access control (limiting permission to run commands)" --- if I prevent access to ru...

6 More Replies
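The loophole discussed in this thread stems from how redaction works: the notebook output masks exact matches of the secret string, so any transformation of the value slips through. The toy simulation below illustrates the mechanism; the redact function is an illustration of exact-match masking, not Databricks code, and the secret value is made up.

```python
def redact(output: str, secrets: list[str]) -> str:
    # Toy model of output masking: replace exact secret matches only.
    for s in secrets:
        output = output.replace(s, "[REDACTED]")
    return output

secret = "s3cr3t-value"                       # hypothetical secret
print(redact(f"token={secret}", [secret]))    # token=[REDACTED]

# Any transformation of the value defeats exact-match masking:
print(redact("-".join(secret), [secret]))     # s-3-c-r-3-t---v-a-l-u-e
```

This is why the docs quoted above fall back on workspace access control: if exact-match masking is the only barrier, limiting who can run commands against the scope is the real protection.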
tigger
by New Contributor III
  • 4911 Views
  • 3 replies
  • 2 kudos

Resolved! Is it possible to disable retryWrites using .option()?

Hello everyone, I'm trying to write to DocumentDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.1. The DocDB is version 4, which doesn't support Retryable Writes, so I disabled the feature by setting option "retryWrites" to "false" (also tried wit...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Hugh Vo​ - If Sajehs's answer resolved the issue, would you be happy to mark their answer as best?

2 More Replies
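One workaround commonly reported when a connector appears to ignore .option("retryWrites", "false") is to put the flag directly in the connection URI instead. A small helper sketch (the host, credentials, and database below are hypothetical):

```python
def disable_retry_writes(uri: str) -> str:
    """Append retryWrites=false to a MongoDB/DocumentDB connection URI."""
    if "retryWrites=" in uri:
        return uri  # already set explicitly; leave as-is
    sep = "&" if "?" in uri else "?"
    return f"{uri}{sep}retryWrites=false"

uri = disable_retry_writes("mongodb://user:pw@docdb.example.com:27017/db?ssl=true")
print(uri)  # mongodb://user:pw@docdb.example.com:27017/db?ssl=true&retryWrites=false
```

The resulting URI is then passed through the connector's output-URI option (the option key varies by connector version), so the driver sees the flag regardless of how the connector forwards per-write options.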
