Data Engineering

Forum Posts

Jack
by New Contributor II
  • 5406 Views
  • 1 replies
  • 1 kudos

Python: Generate new DataFrames from a list of DataFrames using a for loop

I have a list of dataframes (for this example 2) and want to apply a for-loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe called df_final: First, I create 2 dataframes: df2_b2c_fast, df2_b2b_fast: for x i...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

thanks

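For readers who land on this thread: a minimal sketch of the pattern the question describes, keeping the derived frames in a dict instead of generating variable names dynamically. The input DataFrames and the filter condition are illustrative assumptions, not the original poster's code.

# Hypothetical input frames standing in for df2_b2c_fast / df2_b2b_fast.
frames = {"b2c_fast": df2_b2c_fast, "b2b_fast": df2_b2b_fast}

new_frames = {}
for name, df in frames.items():
    # Apply the per-frame transformation here; this filter is a placeholder.
    new_frames[name] = df.filter(df.status == "active")

# e.g. new_frames["b2c_fast"].show()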
isaac_gritz
by Valued Contributor II
  • 722 Views
  • 1 replies
  • 6 kudos

Databricks Security Review

Conducting a security review or vendor assessment of Databricks and looking to learn more about our security features, compliance information, and privacy policies? You can find the latest on Databricks security features, architecture, compliance and ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 6 kudos

thanks man

SRK
by Contributor III
  • 1478 Views
  • 3 replies
  • 5 kudos

Resolved! I met with an issue when trying to use Autoloader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted.

I met with an issue when trying to use Autoloader to read JSON files from Azure ADLS Gen2. I am getting this issue for specific files only. I checked that the files are good and not corrupted. Following is the issue: java.lang.IllegalArgumentException:...

Latest Reply
SRK
Contributor III
  • 5 kudos

I got the issue resolved. The issue was that, by mistake, we had duplicate columns in the schema files. Because of that it was showing that error. However, the error is totally misleading, which is why I wasn't able to rectify it at first.

2 More Replies
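Since the accepted answer traces the error to duplicate column names in the schema files, here is a hedged sketch of a pre-flight check; the schema path and the Spark-style JSON schema layout ({"fields": [{"name": ...}]}) are assumptions for illustration.

import json
from collections import Counter

# Load the schema file and flag any column name that appears more than once.
with open("/dbfs/schemas/landing_schema.json") as f:  # hypothetical path
    fields = json.load(f)["fields"]

counts = Counter(field["name"] for field in fields)
duplicates = [name for name, n in counts.items() if n > 1]
if duplicates:
    raise ValueError(f"Duplicate columns in schema: {duplicates}")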
KVNARK
by Honored Contributor II
  • 691 Views
  • 2 replies
  • 12 kudos

Resolved! How to get the list of users who created tables in different workspaces and the operations they have performed.

Hi, I have 10 workspaces linked to different departments. We have overall 4 users doing some activity on these 10 workspaces. I want to get the list of users who are operating on which tables and what operations they have performed, and all in all ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 12 kudos

Hi Ranjit, for tables, I believe it's hard, but if you want to combine all 10 workspaces you can use the Databricks API for cluster lists https://docs.databricks.com/dev-tools/api/latest/index.html and then you can check their IAM roles to understand w...

1 More Replies
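A hedged sketch of the approach the reply suggests: call the clusters list endpoint in each workspace and collect creator information. The workspace URLs and token are placeholders.

import requests

workspaces = ["https://adb-1111111111111111.1.azuredatabricks.net"]  # placeholder hosts
token = "dapi..."  # personal access token (placeholder)

for host in workspaces:
    # GET /api/2.0/clusters/list returns the clusters defined in that workspace.
    resp = requests.get(f"{host}/api/2.0/clusters/list",
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    for cluster in resp.json().get("clusters", []):
        print(host, cluster["cluster_id"], cluster.get("creator_user_name"))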
Thanapat_S
by Contributor
  • 1751 Views
  • 3 replies
  • 5 kudos

Resolved! How could I export an Alert object for deployment to another Azure Databricks resource?

Introduction: I would like to use the Alert feature to monitor job status (from a log table) in Databricks SQL. So, I have written a query in a query notebook (or object) to return results from the log table. Also, I have set the alert object for monitoring and tri...

Latest Reply
Harun
Honored Contributor
  • 5 kudos

I am not seeing any direct option to export or version-control the alert object other than the migrate option. https://docs.databricks.com/sql/api/queries-dashboards.html - check this link; it might help you in another way.

2 More Replies
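Consistent with the Queries & Dashboards API linked in the reply, one workaround is to pull the alert definitions as JSON so they can be version-controlled and re-created on the target workspace. A hedged sketch; the preview endpoint path reflects the API as documented around the time of this thread and may since have changed.

import json
import requests

host = "https://adb-1111111111111111.1.azuredatabricks.net"  # placeholder source workspace
token = "dapi..."  # placeholder token

# List the alert objects and save them for deployment elsewhere.
resp = requests.get(f"{host}/api/2.0/preview/sql/alerts",
                    headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
with open("alerts_export.json", "w") as f:
    json.dump(resp.json(), f, indent=2)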
KVNARK
by Honored Contributor II
  • 711 Views
  • 2 replies
  • 6 kudos

Resolved! Scope of Data Governance in Databricks

What is the scope of data governance in Databricks? How can we implement it, and is there any data limit for implementing it? I would also like to know more about the cost side.

Latest Reply
KVNARK
Honored Contributor II
  • 6 kudos

I see. Thank you @karthik p. Got it.

1 More Replies
Taha_Hussain
by Valued Contributor II
  • 2179 Views
  • 1 replies
  • 5 kudos

Ask your technical questions at Databricks Office Hours!

Ask your technical questions at Databricks Office Hours! November 16 - 8:00 AM - 9:00 AM PT: Register Here. November 30 - 11:00 AM - 12:00 PM PT: Register Here. Databricks Office Hours connects you directly with experts to answer all your Databricks quest...

Latest Reply
Taha_Hussain
Valued Contributor II
  • 5 kudos

Q&A Recap from 11/30 Office Hours. Q: What is the downside of using z-ordering and auto optimize? It seems like there could be a tradeoff with writing small files (whereas it is good at reading a larger file), is that true? A: By default, Delta Lake on ...

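For context on the z-ordering question in the recap, a minimal sketch of the command under discussion; the table and column names are illustrative.

# OPTIMIZE compacts small files; ZORDER BY co-locates rows on the named column,
# trading a more expensive write/compaction for faster selective reads.
spark.sql("OPTIMIZE events ZORDER BY (event_date)")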
Ancil
by Contributor II
  • 9082 Views
  • 11 replies
  • 1 kudos

Can anyone please suggest how we can effectively loop through a PySpark DataFrame?

Scenario: I have a dataframe with more than 1000 rows, each row having a file path and a result data column. I need to loop through each row and write files to the file path, with data from the result column. What is the easiest and most time-effective way ...

Latest Reply
NhatHoang
Valued Contributor II
  • 1 kudos

Hi, I agree with Werners: try to avoid looping over a PySpark DataFrame. If your dataframe is small, as you said, only about 1000 rows, you may consider using Pandas. Thanks.

10 More Replies
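A hedged sketch of the Pandas route the reply recommends for a ~1000-row frame; the column names path and result are assumptions taken from the question's description.

# Collect the small DataFrame to the driver, then write one file per row.
pdf = df.select("path", "result").toPandas()

for row in pdf.itertuples(index=False):
    # row.path is the destination file, row.result the payload (assumed names).
    with open(row.path, "w") as f:
        f.write(str(row.result))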
pabloaus
by New Contributor III
  • 3535 Views
  • 2 replies
  • 4 kudos

Resolved! How to read sql file from a Repo to string

I am trying to read a SQL file in the repo into a string. I have tried: with open("/Workspace/Repos/xx@***.com//file.sql","r") as queryFile: queryText = queryFile.read() And I get the following error: [Errno 1] Operation not permitted: '/Workspace/Repos/***@*...

Latest Reply
Senthil1
Contributor
  • 4 kudos

I checked on my Unity Catalog enabled cluster; I am able to access the file under Repos, read it, and display it.

1 More Replies
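For reference, a minimal sketch of the pattern that works once the cluster supports workspace file access, as the reply reports; the Repos path mirrors the masked one in the question and is a placeholder.

# Files in a repo are exposed under /Workspace/Repos/<user>/<repo>/.
with open("/Workspace/Repos/user@example.com/my-repo/file.sql", "r") as query_file:
    query_text = query_file.read()

df = spark.sql(query_text)  # run the loaded SQL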
SRK
by Contributor III
  • 1762 Views
  • 4 replies
  • 7 kudos

How to handle schema validation for JSON files using Databricks Autoloader?

Following are the details of the requirement: 1. I am using a Databricks notebook to read data from a Kafka topic and write into an ADLS Gen2 container, i.e., my landing layer. 2. I am using Spark code to read data from Kafka and write into landing...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Hi @Swapnil Kamle, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Than...

3 More Replies
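A hedged sketch of the usual starting point for schema validation with Auto Loader on JSON: supply an explicit schema so non-conforming fields land in _rescued_data. The path and column names are illustrative.

from pyspark.sql.types import StructField, StructType, StringType

# Explicit schema; fields that do not match are captured in _rescued_data.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("payload", StringType(), True),
])

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(schema)
      .load("abfss://landing@account.dfs.core.windows.net/events/"))  # placeholder path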
shamly
by New Contributor III
  • 2424 Views
  • 9 replies
  • 2 kudos

Resolved! Need to remove a double-dagger delimiter from a CSV using Databricks

My CSV data looks like this: ‡‡companyId‡‡,‡‡empId‡‡,‡‡regionId‡‡,‡‡companyVersion‡‡,‡‡Question‡‡ I tried this code: dff = spark.read.option("header", "true").option("inferSchema", "true").option("delimiter", "‡,").csv(f"/mnt/data/path/datafile.csv") But I...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @shamly pt, I took a slightly different approach since I guess no one can be sure of the encoding of the data you showed. Sample data I took: ‡‡companyId‡‡,‡‡empId‡‡,‡‡regionId‡‡,‡‡companyVersion‡‡,‡‡Question‡‡‡‡1‡‡,‡‡121212‡‡,‡‡R‡‡,‡‡1.0A‡‡,‡‡NA‡‡...

8 More Replies
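A hedged sketch of the text-first approach the accepted reply describes: read the file as plain text, strip the ‡‡ quoting, then parse the result as CSV. The mount path comes from the question; assuming the file is readable as UTF-8.

from pyspark.sql import functions as F

raw = spark.read.text("/mnt/data/path/datafile.csv")

# Remove the double-dagger quoting, leaving ordinary comma-separated lines.
clean = raw.select(F.regexp_replace("value", "‡‡", "").alias("line"))

clean.show(truncate=False)  # or write back out and re-read with spark.read.csv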
Andrei_Radulesc
by Contributor III
  • 1023 Views
  • 2 replies
  • 0 kudos

Terraform can set ALL_PRIVILEGES and USE_CATALOG on catalogs for 'account users', but not SELECT or USE_SCHEMA

Only the GUI seems to allow SELECT and USE_SCHEMA 'account users' permissions on catalogs. Terraform gives me an error. Here is my Terraform config: resource "databricks_grants" "staging" { provider = databricks.workspace catalog = databricks_catalog....

Latest Reply
Pat
Honored Contributor III
  • 0 kudos

Hi @Andrei Radulescu-Banu, which version of the provider are you using? I did check the GitHub repo; it should work: https://github.com/databricks/terraform-provider-databricks/blob/d65ef3518074a48e079080d94e1ab33a80bf7e0f/catalog/resource_grants.go#L1...

1 More Replies
tom_shaffner
by New Contributor III
  • 7732 Views
  • 6 replies
  • 8 kudos

Resolved! Is there some form of enablement required to use Delta Live Tables (DLT)?

I'm trying to use Delta Live Tables, but if I import even the example notebooks I get a warning saying `ModuleNotFoundError: No module named 'dlt'`. If I try and install via pip it attempts to install a deep learning framework of some sort. I checked ...

Latest Reply
Insight6
New Contributor II
  • 8 kudos

Here's the solution I came up with... Replace `import dlt` at the top of your first cell with the following:
try:
    import dlt  # When run in a pipeline, this package will exist (no way to import it here)
except ImportError:
    class dlt...

5 More Replies
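The reply above is truncated, so here is a hedged reconstruction of the stub pattern it sketches, letting the notebook import cleanly outside a pipeline; the stub body is illustrative, not the poster's exact code.

try:
    import dlt  # available only when the notebook runs inside a DLT pipeline
except ImportError:
    class dlt:  # minimal stand-in so the notebook parses interactively
        @staticmethod
        def table(*args, **kwargs):
            # Support both @dlt.table and @dlt.table(...) as no-ops.
            if len(args) == 1 and callable(args[0]) and not kwargs:
                return args[0]
            def decorator(func):
                return func
            return decorator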
Swapnil1998
by New Contributor III
  • 1365 Views
  • 4 replies
  • 7 kudos

Resolved! Ingest Cosmos Mongo DB data using Databricks by applying filters

I would need to add a filter condition while ingesting data from a Cosmos Mongo DB using Databricks. I am using the below query to ingest data of a Cosmos collection: df = spark.read \ .format('com.mongodb.spark.sql.DefaultSource') \ .option('uri', sourc...

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @Swapnil Sarkar, the error message means the stage name in your aggregation pipeline request wasn't recognised. The solution will be to ensure that all aggregation pipeline names are valid in your request. This article describes common errors and ...

3 More Replies
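A hedged sketch of pushing the filter down through the connector's aggregation pipeline option, following the read pattern shown in the question; source_uri is the variable from the original post, and the $match condition is an illustrative assumption.

import json

# $match runs server side, so only matching documents are ingested.
pipeline = json.dumps([{"$match": {"status": "active"}}])  # placeholder condition

df = (spark.read
      .format("com.mongodb.spark.sql.DefaultSource")
      .option("uri", source_uri)       # connection string, as in the post
      .option("pipeline", pipeline)
      .load())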
dineshg
by New Contributor III
  • 1771 Views
  • 3 replies
  • 6 kudos

Resolved! pyspark - execute dynamically framed action statement stored in string variable

I need to execute a union statement which is framed dynamically and stored in a string variable. I framed the union statement, but I am stuck with executing it. Does anyone know how to execute a union statement stored in a string variable? I'm using p...

Latest Reply
Shalabh007
Honored Contributor
  • 6 kudos

@Dineshkumar Gopalakrishnan, Python's exec() function can be used to execute a Python statement, which in your case could be a PySpark union statement. Refer to the sample code snippet below. df1 = spark.sparkContext.parallelize([(1, 2...

2 More Replies
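A runnable sketch of the exec() approach from the reply; the DataFrames and the dynamically framed statement are illustrative stand-ins.

df1 = spark.createDataFrame([(1, 2)], ["a", "b"])
df2 = spark.createDataFrame([(3, 4)], ["a", "b"])

# The union statement arrives as a string, e.g. assembled from frame names.
stmt = "result = df1.union(df2)"

# Pass an explicit namespace so the assigned 'result' can be retrieved afterwards.
scope = {"df1": df1, "df2": df2}
exec(stmt, scope)
scope["result"].show()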